`/api/v1/marker` (docs here) to generate parse output, followed by a request to `/api/v1/marker_rules` to apply rules.
Here is an example in Python:
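A minimal sketch of the two-step flow, not an official client. It assumes the parse step uses `chunks` output, and it takes a hypothetical `submit` callable that handles the HTTP details (file upload, authentication, waiting for the final result). The `rules_fields` argument is a placeholder for whatever the rules endpoint expects; only `output_format`, `save_checkpoint`, and `checkpoint_id` come from this page.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: given an endpoint path and request fields, it sends
# the request (file upload, auth, polling) and returns the final JSON result.
Submit = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def parse_then_apply_rules(submit: Submit, rules_fields: Dict[str, Any]) -> Dict[str, Any]:
    """Run the Marker parse, then the rules request, reusing the checkpoint."""
    # Step 1: parse the document and ask the API to keep a checkpoint.
    parsed = submit(
        "/api/v1/marker",
        {"output_format": "chunks", "save_checkpoint": True},
    )

    # The rules endpoint needs the checkpoint_id from the final parse result.
    checkpoint_id = parsed.get("checkpoint_id")
    if not checkpoint_id:
        raise RuntimeError("Marker result has no checkpoint_id; was save_checkpoint set?")

    # Step 2: apply rules against the cached parse.
    # `rules_fields` is a placeholder, not a documented schema.
    return submit(
        "/api/v1/marker_rules",
        {**rules_fields, "checkpoint_id": checkpoint_id},
    )
```

Passing the transport in as a callable keeps the sketch independent of any particular HTTP library and makes the ordering of the two requests easy to test with a stub.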
`chunks` output format
The `chunks` output format looks a lot like our `json` format (documented here) with two important changes:
- No `children` field on blocks.
- The `html` field renders HTML from all nested children without recursive references.

When you set `output_format` to `chunks` in your Marker request, all the response fields will be the same (see them here) except you will also have a new key, `chunks`.
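As a sketch of working with that key, the snippet below groups the entries of `chunks` by page. The `sample_result` dictionary is a hand-written stand-in rather than real API output, and its IDs and block types are illustrative only.

```python
from collections import defaultdict
from typing import Any, Dict, List


def chunks_by_page(result: Dict[str, Any]) -> Dict[int, List[Dict[str, Any]]]:
    """Group the entries of result["chunks"] by their page number."""
    pages = defaultdict(list)
    for chunk in result.get("chunks") or []:
        pages[chunk["page"]].append(chunk)
    return dict(pages)


# Hand-written stand-in for a final Marker result (illustrative values only).
sample_result = {
    "chunks": [
        {"id": "/page/0/Text/1", "block_type": "Text", "page": 0, "html": "<p>Intro</p>"},
        {"id": "/page/0/Figure/9", "block_type": "Figure", "page": 0, "html": "<p>Fig.</p>"},
        {"id": "/page/1/Text/2", "block_type": "Text", "page": 1, "html": "<p>More</p>"},
    ]
}

for page, chunks in sorted(chunks_by_page(sample_result).items()):
    print(page, [c["block_type"] for c in chunks])
```

Because the list is flat, a single pass is enough; there is no block tree to walk.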
The `chunks` key contains a list of JSON objects, each of which has these fields:
- `id` is the block ID
- `block_type` is the block type
- `page` is the page number
- `section_hierarchy` indicates the section that the block is part of
- `html` contains fully rendered HTML without recursive references to child blocks (which are not available in `chunks` output)
- `bbox` is an `[x1, y1, x2, y2]` bounding box for the block
- `polygon` is a 4-corner version of `bbox` in `[[x1,y1], [x2,y2], [x3,y3], [x4,y4]]` format
- `images` is a JSON object with block ID keys and base64-encoded image data values

Images are referenced in `html` like this: `<img src='/page/0/Figure/9'>`.
The string in `src` is a key to the `images` field in the chunk. To render images, you’ll need to substitute that key with the base64-encoded image value in the `images` field described above.
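The substitution can be sketched as follows, turning each `src` key into a data URI so the HTML renders on its own. The image format is not stated on this page, so the MIME type guess is an assumption: base64 data beginning with the PNG signature is treated as PNG and everything else as JPEG. The helper names are hypothetical.

```python
from typing import Any, Dict


def guess_mime(b64_data: str) -> str:
    """Guess the image type from the start of the base64 string (assumption)."""
    # "iVBORw0KGgo" is the base64 encoding of the 8-byte PNG signature.
    if b64_data.startswith("iVBORw0KGgo"):
        return "image/png"
    # Fallback is an assumption; adjust it if your images use another format.
    return "image/jpeg"


def inline_images(chunk: Dict[str, Any]) -> str:
    """Return the chunk's HTML with image keys replaced by data URIs."""
    html = chunk["html"]
    for key, b64_data in (chunk.get("images") or {}).items():
        data_uri = f"data:{guess_mime(b64_data)};base64,{b64_data}"
        # Handle both single- and double-quoted src attributes.
        html = html.replace(f"src='{key}'", f"src='{data_uri}'")
        html = html.replace(f'src="{key}"', f'src="{data_uri}"')
    return html


# Illustrative chunk with truncated, made-up image data.
example_chunk = {
    "html": "<p><img src='/page/0/Figure/9'></p>",
    "images": {"/page/0/Figure/9": "iVBORw0KGgoAAAA"},
}
rendered = inline_images(example_chunk)
```

Matching on the full `src='...'` attribute rather than the bare key avoids rewriting the same string if it also appears in the visible text.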
`save_checkpoint` is required for the Rules API

Set `save_checkpoint` to `True` as shown in the example on this page.
`save_checkpoint` temporarily saves a cache of your initial Marker request parse in our system and returns a `checkpoint_id` in the final result. The `/api/v1/marker_rules` endpoint requires this `checkpoint_id`.

`save_checkpoint` is a new feature and is currently only used with the Rules API. We’ll be using `save_checkpoint` for more upcoming document post-processing features and, over time, rolling out convenient ways to parse and execute rules in one pass in our API and SDK.