7/1/2025

  • Structured extraction beta - pass page_schema to the marker endpoint to extract structured data from documents. The schema should be a JSON schema, for example one generated from a pydantic model with .model_json_schema().
  • Support the new chunks output format for marker, which is a simplified list of blocks with their full html, ideal for chunking/RAG.
  • Marker endpoint is now promptable - pass block_correction_prompt to the marker endpoint to correct the output of marker with your custom logic.
  • We support additional configuration parameters for marker via the additional_config parameter. This is a JSON object that maps configuration option names to their values. You can see the exact options in the API schema.
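
As a rough sketch of how these parameters might be assembled on the client side, the snippet below builds form fields for a marker request. The names page_schema, block_correction_prompt, and additional_config come from the notes above. The helper build_marker_fields, the example Invoice schema, the some_option key, and the choice to send structured values as JSON-encoded strings are all assumptions for illustration; check the API schema for the exact encoding.

```python
import json

# Hand-written JSON schema. With pydantic v2, an equivalent dict could come
# from Model.model_json_schema(). The "Invoice" fields are purely illustrative.
INVOICE_SCHEMA = {
    "title": "Invoice",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}


def build_marker_fields(page_schema=None, block_correction_prompt=None,
                        additional_config=None):
    """Collect optional marker parameters into a dict of form fields.

    Assumption: structured values are sent as JSON-encoded strings.
    """
    fields = {}
    if page_schema is not None:
        fields["page_schema"] = json.dumps(page_schema)
    if block_correction_prompt is not None:
        fields["block_correction_prompt"] = block_correction_prompt
    if additional_config is not None:
        fields["additional_config"] = json.dumps(additional_config)
    return fields


fields = build_marker_fields(
    page_schema=INVOICE_SCHEMA,
    block_correction_prompt="Write all dates in ISO 8601 format.",
    additional_config={"some_option": True},  # hypothetical option name
)
```

Parameters left as None are omitted from the request entirely, so the endpoint's defaults would apply.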

6/26/2025

  • Support multiple output formats for one doc by passing them as comma-separated values in output_format for marker.
  • Complete redesign of the dashboard, with a new look and feel. This will also make it easier for us to improve functionality in the future.
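
A minimal sketch of building the comma-separated output_format value described above. The helper join_output_formats is hypothetical, and the exact format names the API accepts are an assumption here; "markdown" and "chunks" are used only as examples.

```python
def join_output_formats(formats):
    """Return a comma-separated output_format value.

    Duplicates are dropped and the original order is kept.
    """
    seen = []
    for fmt in formats:
        name = fmt.strip().lower()
        if name and name not in seen:
            seen.append(name)
    if not seen:
        raise ValueError("at least one output format is required")
    return ",".join(seen)


# Form field for a single request that asks for two formats at once.
fields = {"output_format": join_output_formats(["markdown", "chunks", "markdown"])}
```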

6/18/2025

  • Improve the playground to make it more functional (easier to test options)
  • Significantly improve styling in the playground
  • Add a public version of the playground to make marker easier to test

6/3/2025

  • Initial launch of playground, for testing marker parsing configurations

5/27/2025

  • New OCR model that benchmarks better overall, handles inline math, and gives detailed character bboxes.
  • Add format_lines flag to marker to add inline math and formatting to lines. This also automatically OCRs any lines that need it.

3/26/2025

  • Add support for more file formats (spreadsheets, epub, html) in addition to the existing document, image, pdf, and presentation formats.
  • Improve inline math and formatting when passing use_llm.
  • use_llm (the high accuracy mode) now costs the same as regular inference.

1/30/2025

Marker:
  • Integrate a new table recognition model, which handles rowspans and colspans better. This is a significant improvement on the old model.
  • Improve the --use_llm option to merge tables across pages, OCR handwriting, OCR forms, and generally have much higher quality than before.
  • Integrate a new LaTeX OCR model that is significantly more accurate.
  • Add links and references to the markdown - the references include internal links.
General:
  • Speed up inference time.
  • Remove the line detection endpoint - it had low usage.
  • Improve the table_rec endpoint - it now takes the --use_llm flag, and should run much faster.

1/3/2025

  • Add the use_llm option to the marker API - this uses an LLM to make conversion much more accurate for tables, forms, inline math, and complex pages. It’s a beta feature, and will currently double the cost per request.
  • Added other options to the marker endpoint.
    • Use disable_image_extraction to disable image extraction for marker.
    • Use strip_existing_ocr to strip all existing OCR text and re-OCR (if it was added by something like tesseract)
  • Better automatic heuristics for when to OCR with marker.
  • Better text extraction and layout detection for marker.
  • Speed up the marker and OCR endpoints by ~30%.

12/4/2024

  • Uploaded files can now be up to 200MB in size.
  • Improved speed by optimizing file handling on the backend.

12/3/2024

  • We now offer $5 in free credits to new signups
  • Additional bugfixes to improve markdown output quality

12/2/2024

  • We sped up file operations internally, which should result in a decent API speed boost
  • We now handle blockquotes and nested lists with the marker endpoint

11/27/2024

  • Marker is now at v1, with a lot of improvements - it’s 4x faster than a month ago, and quality is much higher across all document types
  • The layout model has been upgraded to a new version, with more potential prediction types

10/31/2024

  • More API speedups, on the order of 15-20% for marker.
  • Bump concurrency/rate limits to 200.
  • Improve stability of service under load.
  • If you cancel, you will now retain your credits until the end of the month.
  • Visual improvements on the marketing site.
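
A client can stay under a concurrency limit by capping the number of requests in flight. The sketch below does this with asyncio.Semaphore, assuming the limit of 200 applies to concurrent requests. The names run_limited and fake_request are hypothetical, and fake_request is a stand-in for a real API call.

```python
import asyncio

MAX_CONCURRENT = 200  # limit quoted in the note above


async def run_limited(jobs, limit=MAX_CONCURRENT):
    """Run coroutine factories with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Wait for a free slot before starting the job.
        async with semaphore:
            return await job()

    # gather returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_request(i):
    # Stand-in for a real API call; replace with your HTTP client.
    await asyncio.sleep(0)
    return i * 2


results = asyncio.run(
    run_limited([lambda i=i: fake_request(i) for i in range(5)], limit=2)
)
```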

10/28/2024

  • Significant API speedups, on the order of 40% faster.

10/25/2024

  • Flatten form fields into pdf when extracting tables and markdown
  • Fix page separators: they now appear at the start of every page and include a page number

10/23/2024

  • Speed up marker, layout, and detection by 20-30%
  • Fix various bugs that cause edge case errors in conversion
  • Increase concurrent request limit to 100

10/21/2024

  • Significantly improve marker output quality
    • Include header levels like h1, h2, etc.
    • Parse tables very accurately
    • Improve block type detection and markdown quality
    • Fix many output bugs
  • Add in new table recognition model at the /table_rec endpoint
    • This will detect and convert tables into a given format
  • Improve OCR, layout, text detection quality
  • Fix memory leaks and improve performance
  • Fix bugs with pagination and marker

8/19/2024

  • Add in new OCR model with better accuracy across the board
  • Language is now optional for marker and OCR model
  • Increase max page count and max pixel width

7/20/2024

  • Drop prices for marker and surya inference.

7/12/2024

  • Speed up marker and surya text detection/layout by 10-15%.

7/10/2024

  • Increase concurrent request limit to 50.

7/6/2024

  • Major infrastructure stability improvements.

7/3/2024

  • Added response caching for up to 1 hour. If you send the same document to the same endpoint, with the same options, within that time, you’ll get a cache hit and won’t be billed again.
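
To illustrate the rule above (same document, same endpoint, same options, within one hour), here is a small client-side sketch that predicts whether a request would repeat an earlier one. It is not the service's implementation; cache_key, is_cache_hit, and the example option are hypothetical.

```python
import hashlib
import json

CACHE_TTL_SECONDS = 60 * 60  # "up to 1 hour" per the note above


def cache_key(document: bytes, endpoint: str, options: dict) -> str:
    """Hash the document, endpoint, and options into one key (illustrative only)."""
    digest = hashlib.sha256()
    digest.update(endpoint.encode("utf-8"))
    digest.update(b"\0")
    # sort_keys makes the key independent of option ordering.
    digest.update(json.dumps(options, sort_keys=True).encode("utf-8"))
    digest.update(b"\0")
    digest.update(document)
    return digest.hexdigest()


def is_cache_hit(seen: dict, key: str, now: float) -> bool:
    """Return True if `key` was recorded less than CACHE_TTL_SECONDS ago.

    Otherwise record `now` as the new first-seen time and return False.
    """
    first_seen = seen.get(key)
    if first_seen is not None and now - first_seen < CACHE_TTL_SECONDS:
        return True
    seen[key] = now
    return False


seen = {}
key = cache_key(b"example document bytes", "marker", {"paginate": True})
first = is_cache_hit(seen, key, now=0.0)      # first request
second = is_cache_hit(seen, key, now=1800.0)  # 30 minutes later
third = is_cache_hit(seen, key, now=4000.0)   # more than an hour later
```

Changing any one of the three inputs produces a different key, which matches the note's requirement that the document, endpoint, and options all be the same.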

7/2/2024

  • Improved parsing for PowerPoint presentations and Word documents.
  • Add status page and changelog.

6/26/2024

  • Increase concurrency limits for all users

6/25/2024

  • Return page count from all endpoints
  • Users can now disable marker image extraction
  • Webhooks are now supported as an alternative to polling. A webhook will ping a given URL when inference is complete.

6/21/2024

  • Initial support for Microsoft Word and Microsoft PowerPoint documents (docx/doc/pptx/ppt).

6/18/2024

  • Enable paginating marker output.

5/31/2024

  • Initial launch of marker and surya APIs.