Datalab Marker Prompt API

Datalab’s Marker Prompt API allows you to use natural language to correct or tailor Marker output to your preferences. It’s designed for cases when you want to nudge Marker in a different direction and steer its output. You can use it to:
  • Merge tables across pages.
  • Correct OCR errors.
  • Fill in missing data.
  • Handle unique edge cases you encounter with your documents.
Forge Playground is designed to help you visualize and evaluate your Marker and Marker Prompt API output across multiple documents easily. Learn how to use the Marker Prompt API here.

Prompting Tips

We recommend using Forge Playground to evaluate your prompts and iterate on them. The usual prompting advice applies to the Marker Prompt API as well: be as explicit as you can in your instructions and provide context for your changes. The context we provide alongside your prompt includes:
  • Our own prompt, which tells the LLM to adhere to yours as closely as possible and provides other context (e.g. “we’ll show you an image of a page + JSON blocks, here is how those blocks are formatted, here is how page bounding boxes are normalized, here is how you can use them as evidence when deciding to make changes”, etc.).
  • Blocks formatted as JSON, each of which has an html key.
  • Images of pages.
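To make the block context above more concrete, here is a minimal sketch of reading the HTML out of a list of JSON blocks. Only the html key comes from the description above; the id and block_type fields, the sample values, and the block_html helper are hypothetical and are not the documented schema.

```python
import json

# Hypothetical block payload. Only the "html" key is described in the text;
# "id" and "block_type" are illustrative assumptions, not the real schema.
raw_blocks = """
[
  {"id": "block-0", "block_type": "SectionHeader",
   "html": "<h2>Quarterly Results</h2>"},
  {"id": "block-1", "block_type": "Table",
   "html": "<table><tr><th>Region</th><th>Revenue</th></tr><tr><td>EMEA</td><td>1,2O0</td></tr></table>"}
]
"""


def block_html(blocks):
    """Return the HTML of each block, skipping blocks without an html key."""
    return [block["html"] for block in blocks if "html" in block]


blocks = json.loads(raw_blocks)
html_fragments = block_html(blocks)

# Print what the model would read for each block. The second fragment holds
# a made-up OCR slip (letter O instead of zero) of the kind a prompt might fix.
for fragment in html_fragments:
    print(fragment)
```

Looking at the HTML this way shows the text a prompt is actually written against, which can help when wording instructions.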
If you need to provide examples in your prompt, we recommend doing so with HTML, since that is what the LLM sees. We haven’t needed to use examples yet, but sufficiently complicated or extremely particular parse preferences may require them.
In our testing, we’ve found that prompts do not work as well when they’re too general: they either do not make the changes we want, or they make other changes that technically adhere to the prompt but that we didn’t foresee. Iterating in Forge Playground helps us write a prompt that works, and works more consistently.
The current iteration of the Marker Prompt API analyzes your prompt for intent and passes it through our own prompts in two modes:
  • Block rewriting: we use your prompt to decide if blocks need to be rewritten on every single page.
  • Cross-page merging: we use your prompt to decide if blocks need to be merged across every two pages.
Notably missing here are block reorders and merges across more than two pages: we don’t do these yet, but we’ll support them in a future release.
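One way to picture the two modes is as the set of page windows each one looks at. The sketch below is illustrative only and is not Datalab’s implementation. It assumes that “every two pages” means each pair of consecutive pages, and the function names are hypothetical.

```python
from typing import List, Tuple


def single_page_windows(page_count: int) -> List[Tuple[int]]:
    """Block rewriting: one window per page (zero-based page indices)."""
    return [(index,) for index in range(page_count)]


def page_pair_windows(page_count: int) -> List[Tuple[int, int]]:
    """Cross-page merging: one window per pair of consecutive pages.

    Assumption: "every two pages" is read here as adjacent pairs.
    """
    return [(index, index + 1) for index in range(page_count - 1)]


rewrite_windows = single_page_windows(4)
merge_windows = page_pair_windows(4)

print(rewrite_windows)  # [(0,), (1,), (2,), (3,)]
print(merge_windows)    # [(0, 1), (1, 2), (2, 3)]
```

Under this reading, a table that runs across pages 0, 1 and 2 never appears whole in a single window, which matches the limitation on merges across more than two pages noted above.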

Public Beta Availability & Support

The Marker Prompt API and Forge Parse are in public beta. Both are available to all paid customers, meaning anyone with an active subscription or credits (you get $5 in free credits when you provide credit card details). If you need to modify, enrich, or correct your Marker parse output, the Marker Prompt API is designed for you. Give it a spin and reach out to us at support@datalab.to or on Discord with questions.