The Rules API requires a request to /api/v1/marker (docs here) to generate parse output followed by a request to /api/v1/marker_rules to apply rules. Here is an example in Python:
import requests

MARKER_URL = "https://www.datalab.to/api/v1/marker"
MARKER_RULES_URL = "https://www.datalab.to/api/v1/marker_rules"
api_key = "YOUR_API_KEY"
headers = {"X-Api-Key": api_key}

### Make a `marker` request
#
# - `output_format` must be `chunks`
# - `save_checkpoint` must be True
# - `skip_cache` must be True
form_data = {
    'file': ('test.pdf', open('./test.pdf', 'rb'), 'application/pdf'),
    'output_format': (None, 'chunks'),
    'save_checkpoint': (None, True),
    'skip_cache': (None, True)
}
response = requests.post(MARKER_URL, files=form_data, headers=headers)
data = response.json()

# Poll for your `marker` result (`poll_result` not shown)
result = poll_result(response.json()["request_check_url"], api_key)

### Make a `marker_rules` request
#
# - `checkpoint_id` must be `checkpoint_id` from your `marker` result
# - `block_correction_prompt` is your prompt
checkpoint_id = result["checkpoint_id"]
rules_prompt = "Merge tables across pages"  # example prompt; replace with your own
response = requests.post(
    MARKER_RULES_URL,
    json={
        "checkpoint_id": checkpoint_id,
        "block_correction_prompt": rules_prompt,
    },
    headers=headers
)

# Poll for your `marker_rules` result (`poll_result` not shown)
result = poll_result(response.json()["request_check_url"], api_key)
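The example above calls `poll_result` without showing it. One possible shape for it is sketched below, assuming the check URL returns JSON with a `status` field that reads `complete` once processing has finished. That field name, the polling interval, and the `fetch_json` helper are assumptions made for illustration and are not confirmed on this page. The sketch uses only the standard library, and the `fetch` parameter exists so the loop can be exercised without a network call.

```python
import json
import time
import urllib.request


def fetch_json(url, api_key):
    """GET a URL with the X-Api-Key header and decode the JSON body."""
    req = urllib.request.Request(url, headers={"X-Api-Key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_result(check_url, api_key, max_polls=300, delay=2.0, fetch=fetch_json):
    """Poll check_url until the (assumed) `status` field reads "complete"."""
    for _ in range(max_polls):
        data = fetch(check_url, api_key)
        # Assumption: a finished request reports status == "complete".
        if data.get("status") == "complete":
            return data
        time.sleep(delay)
    raise TimeoutError(f"No result after {max_polls} polls of {check_url}")
```

Raising on timeout rather than returning a partial response keeps a missing `checkpoint_id` from surfacing later as a confusing `KeyError`.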
A standalone runnable example (using Python and uv) is available on this GitHub gist. You can run it from your command line like so:
uv run \
https://gist.githubusercontent.com/voberoi/10edd8bc939b5510d80b7af928f94559/raw/610503609eaa09e93c14f27abe43d6ef13ebb405/main.py \
--api-key <your-api-key> \
--rules-prompt "Merge tables across pages" \
<file-path>
The Rules API uses two new Marker API features shown in the example and documented below:
  • A new output format: chunks
  • A new capability: save_checkpoint

The chunks output format

The Rules API only works on Marker requests that use the chunks output format. The chunks output format looks a lot like our json format (documented here) with two important changes:
  • All blocks are flattened into a single list.
  • Because all blocks are flattened:
    • … there are no page blocks in the output.
    • … only top-level blocks on each page are in the output.
    • … there are no children on blocks.
    • … the html field renders HTML from all nested children without recursive references.

Response Fields

When you set output_format to chunks in your Marker request, all the response fields will be the same (see them here) except you will also have a new key, chunks. The chunks key contains a list of JSON objects, each of which has these fields:
  • id is the block id
  • block_type is the block type
  • page is the page number
  • section_hierarchy indicates the section that the block is part of
  • html contains fully-rendered HTML without recursive references to child blocks (which are not available in chunks output)
  • bbox is an [x1, y1, x2, y2] bounding box for the block
  • polygon is a 4-corner version of bbox in [[x1,y1], [x2,y2], [x3,y3], [x4,y4]] format
  • images is a JSON object with block ID keys and base64-encoded image data values
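Because chunks arrive as one flat list, any per-page view has to be rebuilt from the page field. The sketch below groups chunks by page. The `chunks_by_page` helper and the sample values are hypothetical, and only the field names come from the list above.

```python
from collections import defaultdict


def chunks_by_page(result):
    """Group the flat `chunks` list by page number, preserving block order."""
    pages = defaultdict(list)
    for chunk in result.get("chunks", []):
        pages[chunk["page"]].append(chunk)
    return dict(pages)


# Hypothetical result with made-up values, shaped like the fields listed above.
sample_result = {
    "chunks": [
        {"id": "/page/0/SectionHeader/0", "block_type": "SectionHeader",
         "page": 0, "html": "<h1>Report</h1>"},
        {"id": "/page/0/Table/1", "block_type": "Table",
         "page": 0, "html": "<table></table>"},
        {"id": "/page/1/Text/0", "block_type": "Text",
         "page": 1, "html": "<p>Summary</p>"},
    ]
}

grouped = chunks_by_page(sample_result)
block_types_on_page_0 = [chunk["block_type"] for chunk in grouped[0]]
```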

Rendering images in HTML

When your chunks have images in them, you’ll see them rendered in html like this: <img src='/page/0/Figure/9'>. The string in src is a key to the images field in the chunk. To render images, you’ll need to substitute that key with the base64-encoded image value in the images field described above.
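The substitution described above can be sketched as a plain string replacement that turns each key into a data URI. The `inline_images` helper is hypothetical, and the `image/jpeg` MIME type is an assumption, since this page does not state which image format is returned; adjust it to match what you actually receive.

```python
def inline_images(chunk, mime_type="image/jpeg"):
    """Return the chunk's HTML with each image key replaced by a data URI.

    Assumption: mime_type matches the encoded image data; this page does
    not say which format the API returns.
    """
    html = chunk["html"]
    for key, b64_data in (chunk.get("images") or {}).items():
        data_uri = f"data:{mime_type};base64,{b64_data}"
        # Handle both single- and double-quoted src attributes.
        for quote in ("'", '"'):
            html = html.replace(
                f"src={quote}{key}{quote}",
                f"src={quote}{data_uri}{quote}",
            )
    return html


# Hypothetical chunk; "QUJD" is placeholder base64, not a real image.
sample_chunk = {
    "html": "<p><img src='/page/0/Figure/9'></p>",
    "images": {"/page/0/Figure/9": "QUJD"},
}
rendered = inline_images(sample_chunk)
```

Matching the full `src='...'` attribute, rather than the bare key, avoids touching the same string if it happens to appear in the block's text.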

save_checkpoint is required for the Rules API

In order to use the Rules API, your initial Marker request must set save_checkpoint to True as shown in the example on this page. save_checkpoint temporarily saves a cache of your initial Marker request's parse in our system and returns a checkpoint_id in the final result. The /api/v1/marker_rules endpoint requires this checkpoint_id. save_checkpoint is a new feature and is currently only used with the Rules API. We'll be using save_checkpoint for more upcoming document post-processing features and, over time, rolling out convenient ways to parse and execute rules in one pass in our API and SDK.
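Since the rules request cannot proceed without a checkpoint_id, it can help to fail early when the Marker result lacks one, for example because save_checkpoint was not set. The sketch below shows one way to do that. The `build_rules_payload` helper is hypothetical; the two payload keys are the ones used in the example on this page.

```python
def build_rules_payload(marker_result, rules_prompt):
    """Build the JSON body for /api/v1/marker_rules from a Marker result."""
    checkpoint_id = marker_result.get("checkpoint_id")
    if not checkpoint_id:
        raise ValueError(
            "Marker result has no checkpoint_id; was save_checkpoint set to True?"
        )
    return {
        "checkpoint_id": checkpoint_id,
        "block_correction_prompt": rules_prompt,
    }


# Hypothetical Marker result with a made-up checkpoint id.
payload = build_rules_payload(
    {"checkpoint_id": "example-checkpoint"}, "Merge tables across pages"
)
```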