8/5/2025
- Launch a new OCR model with improved math performance.
- Improve marker quality in cases where there are inline equations or other text that needs OCR.
7/25/2025
- Improve speed of LLM mode and when outputting multiple output formats.
7/20/2025
- Launch a visual editor for structured extraction that lets you edit schemas and visualize results.
7/15/2025
- Add a visual editor for marker prompts that lets you see how the document was changed, test across documents, and save prompts.
7/1/2025
- Structured extraction beta - pass `page_schema` to the marker endpoint to extract structured data from documents. The schema should be a pydantic schema generated with `.model_json_schema()`, or another JSON schema format.
- Support the new `chunks` output format for marker, which is a simplified list of blocks with their full html, ideal for chunking/RAG.
- Marker endpoint is now promptable - pass `block_correction_prompt` to the marker endpoint to correct the output of marker with your custom logic.
- We support additional configuration parameters for marker via the `additional_config` parameter. This is a JSON object where the keys are the configuration options and the values are the values for those options. You can see the exact options in the API schema.
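As a rough sketch of how the parameters in this entry might be assembled, the snippet below builds the form fields for a marker request using only the standard library. The field names `page_schema`, `block_correction_prompt`, `additional_config`, and `output_format` come from the notes above. The helper name `build_marker_fields`, the example invoice schema, the prompt text, and the `example_option` key are hypothetical, and sending JSON values as strings is an assumption. Check the API schema for the real option names and encoding.

```python
import json

# Hypothetical JSON schema describing the data to extract.
# A pydantic model's JSON schema could be used here instead.
INVOICE_SCHEMA = {
    "title": "Invoice",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}


def build_marker_fields(page_schema=None, block_correction_prompt=None,
                        additional_config=None, output_format="chunks"):
    """Assemble form fields for a marker request (illustrative only)."""
    fields = {"output_format": output_format}
    if page_schema is not None:
        # Assumption: the schema is sent as a JSON-encoded string.
        fields["page_schema"] = json.dumps(page_schema)
    if block_correction_prompt:
        fields["block_correction_prompt"] = block_correction_prompt
    if additional_config:
        # Assumption: the JSON object is also serialized to a string.
        fields["additional_config"] = json.dumps(additional_config)
    return fields


fields = build_marker_fields(
    page_schema=INVOICE_SCHEMA,
    block_correction_prompt="Fix currency symbols that were misread.",
    additional_config={"example_option": True},  # hypothetical key
)
print(sorted(fields))
```

The resulting dictionary could then be sent alongside the file with whichever HTTP client you already use.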
6/26/2025
- Support multiple output formats for one doc by passing them as comma-separated values in `output_format` for marker.
- Complete redesign of the dashboard, with a new look and feel. This will also make it easier for us to improve functionality in the future.
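The comma-separated `output_format` value described in this entry can be built with a small helper. This is a minimal sketch: the function name `join_output_formats` is hypothetical, the format names are examples, and trimming, lowercasing, and de-duplicating are choices made here rather than documented API behavior.

```python
def join_output_formats(formats):
    """Join format names into one comma-separated `output_format` value."""
    cleaned = []
    for name in formats:
        name = name.strip().lower()
        if not name:
            continue  # skip empty entries
        if "," in name:
            raise ValueError(f"format name must not contain a comma: {name!r}")
        if name not in cleaned:
            cleaned.append(name)  # keep first occurrence, preserve order
    if not cleaned:
        raise ValueError("at least one output format is required")
    return ",".join(cleaned)


value = join_output_formats(["markdown", "chunks", "markdown"])
print(value)  # markdown,chunks
```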
6/18/2025
- Improve the playground to make it more functional (easier to test options)
- Significantly improve styling in the playground
- Add a public version of the playground to make marker easier to test
6/3/2025
- Initial launch of playground, for testing marker parsing configurations
5/27/2025
- New OCR model which benchmarks better overall, handles inline math, and gives detailed character bboxes.
- Add the `format_lines` flag to marker to add inline math and formatting to lines. This also automatically OCRs lines that need it.
3/26/2025
- Add support for multiple file formats - spreadsheets, epub, html, in addition to existing document, image, pdf, and presentation formats.
- Improve inline math and formatting when passing `use_llm`.
- `use_llm` (the high accuracy mode) now costs the same as regular inference.
1/30/2025
Marker:
- Integrate a new table recognition model, which handles rowspans and colspans better. This is a significant improvement on the old model.
- Improve the `--use_llm` option to merge tables across pages, OCR handwriting, OCR forms, and generally have much higher quality than before.
- Integrate a new LaTeX OCR model that is significantly more accurate.
- Add links and references to the markdown - the references include internal links.
General:
- Speed up inference time.
- Remove the line detection endpoint - it had low usage.
- Improve the `table_rec` endpoint - it now takes the `--use_llm` flag, and should run much faster.
1/3/2025
- Add the `use_llm` option to the marker API - this uses an LLM to make conversion much more accurate for tables, forms, inline math, and complex pages. It’s a beta feature, and will currently double the cost per request.
- Added other options to the marker endpoint.
- Use `disable_image_extraction` to disable image extraction for marker.
- Use `strip_existing_ocr` to strip all existing OCR text and re-OCR (if it was added by something like Tesseract).
- Better automatic heuristics for when to OCR with marker.
- Better text extraction and layout detection for marker.
- Speed up the marker and OCR endpoints by ~30%.
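To illustrate how the flags in this entry might be combined, here is a minimal sketch that collects them into request fields. The flag names `use_llm`, `disable_image_extraction`, and `strip_existing_ocr` come from the notes above. The helper name `marker_option_fields`, sending only enabled flags, and encoding them as the string "true" are all assumptions; the API schema is the authority on the actual encoding.

```python
def marker_option_fields(use_llm=False, disable_image_extraction=False,
                         strip_existing_ocr=False):
    """Return form fields for the enabled marker flags (illustrative only)."""
    flags = {
        "use_llm": use_llm,
        "disable_image_extraction": disable_image_extraction,
        "strip_existing_ocr": strip_existing_ocr,
    }
    # Assumption: omitted flags fall back to the server defaults,
    # and enabled flags are sent as the lowercase string "true".
    return {name: "true" for name, enabled in flags.items() if enabled}


# Re-OCR a scanned PDF with the LLM mode on, keeping image extraction.
option_fields = marker_option_fields(use_llm=True, strip_existing_ocr=True)
print(option_fields)
```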
12/4/2024
- Uploaded files can now be up to 200MB in size.
- Improved speed by optimizing file handling on the backend.
12/3/2024
- We now offer $5 in free credits to new signups
- Additional bugfixes to improve markdown output quality
12/2/2024
- We sped up file operations internally, which should result in a decent API speed boost
- We now handle blockquotes and nested lists with the marker endpoint
11/27/2024
- Marker is now at v1, with a lot of improvements - it’s 4x faster than a month ago, and quality is much higher across all document types
- The layout model has been upgraded to a new version, with more potential prediction types
10/31/2024
- More API speedups, on the order of 15-20% for marker.
- Bump concurrency/rate limits to 200.
- Improve stability of service under load.
- If you cancel, you will now retain your credits until the end of the month.
- Visual improvements on the marketing site.
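A client that submits many documents may want to stay under the limit mentioned in this entry. The sketch below caps the number of in-flight jobs with an `asyncio.Semaphore`. It assumes the limit of 200 applies to concurrent requests; `run_limited` and `fake_request` are hypothetical names, and the worker is a stand-in rather than a real API call.

```python
import asyncio

MAX_CONCURRENT = 200  # limit stated in the note above


async def run_limited(jobs, worker, limit=MAX_CONCURRENT):
    """Run worker(job) for every job with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:  # wait for a free slot before starting
            return await worker(job)

    # gather returns results in the same order as the input jobs
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_request(job):
    """Stand-in for an API call; yields once and doubles the input."""
    await asyncio.sleep(0)
    return job * 2


results = asyncio.run(run_limited(range(5), fake_request, limit=2))
print(results)  # [0, 2, 4, 6, 8]
```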
10/28/2024
- Significant API speedups, on the order of 40% faster.
10/25/2024
- Flatten form fields into pdf when extracting tables and markdown
- Fix page separators: they now appear at the start of every page and include a page number
10/23/2024
- Speed up marker, layout, and detection by 20-30%
- Fix various bugs that cause edge case errors in conversion
- Increase concurrent request limit to 100
10/21/2024
- Significantly improve marker output quality
- Include header levels like h1, h2, etc.
- Parse tables very accurately
- Improve block type detection and markdown quality
- Fix many output bugs
- Add in new table recognition model at the /table_rec endpoint
- This will detect and convert tables into a given format
- Improve OCR, layout, text detection quality
- Fix memory leaks and improve performance
- Fix bugs with pagination and marker
8/19/2024
- Add in new OCR model with better accuracy across the board
- Language is now optional for marker and OCR model
- Increase max page count and max pixel width
7/20/2024
- Drop prices for marker and surya inference.
7/12/2024
- Significant speedup for marker and surya text detection/layout: 10-15% faster.
7/10/2024
- Increase concurrent request limit to 50.
7/6/2024
- Major infrastructure stability improvements.
7/3/2024
- Added response caching for up to 1 hour. If you send the same document to the same endpoint, with the same options, within that time, you’ll get a cache hit and won’t be billed again.
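The caching rule above (same document, same endpoint, same options, within one hour) can be modeled with a content hash and a time-to-live. This is only a conceptual sketch of that rule, not the service's implementation: `cache_key`, `ResponseCache`, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 3600  # "up to 1 hour" from the note above


def cache_key(endpoint, document, options):
    """Hash the endpoint, options, and document bytes into one key."""
    digest = hashlib.sha256()
    digest.update(endpoint.encode("utf-8"))
    digest.update(b"\0")
    # sort_keys makes the key independent of option ordering
    digest.update(json.dumps(options, sort_keys=True).encode("utf-8"))
    digest.update(b"\0")
    digest.update(document)
    return digest.hexdigest()


class ResponseCache:
    """In-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock(), value)
```

Changing any one of the three inputs produces a different key, which matches the note that a cache hit requires the same document, endpoint, and options.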
7/2/2024
- Improved parsing for PowerPoint presentations and Word documents.
- Add status page and changelog.
6/26/2024
- Increase concurrency limits for all users
6/25/2024
- Return page count from all endpoints
- Users can now disable marker image extraction
- Webhooks are now supported instead of polling. Webhooks will ping a given URL when inference is complete.
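On the receiving side, a webhook needs an HTTP endpoint that accepts the ping. The sketch below uses only the standard library and assumes the ping is a POST with a JSON body. The payload shape is not specified in these notes, so the handler simply stores whatever JSON arrives; `WebhookHandler`, `start_receiver`, and the `received` list are hypothetical names.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # payloads collected by the handler


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the ping is a POST carrying a JSON body.
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            payload = json.loads(raw or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        received.append(payload)
        self.send_response(200)  # acknowledge quickly, process later
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def start_receiver(port=0):
    """Start the receiver on localhost in a background thread."""
    server = HTTPServer(("127.0.0.1", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_receiver(8000)` would listen on port 8000. In practice the URL registered for the webhook must be reachable from the public internet, which a localhost listener is not.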
6/21/2024
- Initial support for Microsoft Word and Microsoft PowerPoint documents (docx/doc/pptx/ppt).
6/18/2024
- Enable paginating marker output.
5/31/2024
- Initial launch of marker and surya APIs.