10/23/2025
- Workflows beta launch! You can now use the API and SDK to compose various steps like parse, extract, segment, and conditional logic to create document processing workflows that are reusable.
10/22/2025
- New model launch! Our SOTA model, Chandra, is now publicly available, open source, and accessible via our API (when using the `balanced` and `accurate` modes).
10/20/2025
- During the global AWS outage we put mitigations in place to work around issues our upstream providers were experiencing. With these mitigations, despite ongoing upstream outages, we restored API service to our customers.
10/10/2025
- If a parse takes over 10 seconds in the playground, users are offered an email notification when it completes.
- Fixes and improvements to long-document processing in the playground.
- Fixes to how request statuses are updated (e.g., from “processing” to “complete”), so they update properly and on time.
10/8/2025
- v1.0.7 of our container released with stability improvements for very long-running containers (self-serve and enterprise customers only).
10/6/2025
- Users can now click on “View in Playground” on API requests in the Usage tab to view how their document was parsed, segmented, or extracted. This feature is enabled as long as users have the correct data retention settings.
10/3/2025
- v1.0.5 of our container released with settings to significantly reduce log output, useful for highly-scaled workloads.
9/25/2025
- Improvements to Segment/Extract UX in the playground.
- Fixes and improvements to segmentation results.
9/18/2025
- High Accuracy Mode launch: API users can select `mode: "accurate"` for our highest-accuracy document processing, at the cost of higher latency and price.
- Public playground launch: unauthenticated users can access the same playground experience as subscribers (with limitations) at https://www.datalab.to/playground
9/16/2025
- New playground launch: we now offer a significantly improved playground where users can inspect how their documents are parsed or view document segmentation/structured extraction results.
- Segmentation V1 launch: API users can segment documents automatically or with a schema.
9/5/2025
- v1.0.2 of our container released supporting both self-serve and enterprise customers with improved functionality and stability.
- Added `marker_lite` support to our container to measure OCR-likelihood.
9/1/2025
- Users can view showcased static examples in the Datalab playground.
- RTF file format support added to the API.
8/27/2025
- Launched our self-serve on-prem container, purchasable via Stripe checkout, with no sales or contracting process required.
- Added support for our `/ocr` endpoint in the container in addition to `/marker`.
8/20/2025
- Users can generate schemas automatically based on document content in the playground.
- Improvements to structured extraction quality and latency.
8/15/2025
- Users can view citation highlights from structured extraction requests in the playground.
- If parse quality scores are available, they will now be returned in the `/marker` response.
8/5/2025
- Launch a new OCR model with improved math performance.
- Improve marker quality in cases where there are inline equations or other text that needs OCR.
7/25/2025
- Improve speed of LLM mode and when outputting multiple output formats.
7/20/2025
- Launch a visual editor for structured extraction that lets you edit schemas and visualize results.
7/15/2025
- Add a visual editor for marker prompts that lets you see how the document was changed, test across documents, and save prompts.
7/1/2025
- Structured extraction beta: pass `page_schema` to the `marker` endpoint to extract structured data from documents. The schema should be a pydantic schema generated with `.model_json_schema()`, or another JSON schema format.
- Support the new `chunks` output format for marker, which is a simplified list of blocks with their full html, ideal for chunking/RAG.
- Marker endpoint is now promptable: pass `block_correction_prompt` to the marker endpoint to correct the output of marker with your custom logic.
- We support additional configuration parameters for marker via the `additional_config` parameter. This is a JSON object where the keys are the configuration options and the values are the values for those options. You can see the exact options in the API schema.
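
As a rough illustration of the options in this release, the sketch below assembles request fields for the marker endpoint using a hand-written JSON Schema. The parameter names `page_schema`, `block_correction_prompt`, and `additional_config` come from the entries above. The invoice schema, the helper `build_marker_fields`, the `example_option` key, and the choice to send JSON values as encoded strings are all assumptions made for illustration; the API schema is the authority on exact formats.

```python
import json

# Hypothetical JSON Schema describing the fields to extract.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}


def build_marker_fields(page_schema, correction_prompt=None, additional_config=None):
    """Assemble request fields for the marker endpoint (assumed wire format)."""
    # Assumption: the schema is sent as a JSON-encoded string.
    fields = {"page_schema": json.dumps(page_schema)}
    if correction_prompt is not None:
        fields["block_correction_prompt"] = correction_prompt
    if additional_config is not None:
        # Keys are configuration options, values are their settings.
        fields["additional_config"] = json.dumps(additional_config)
    return fields


fields = build_marker_fields(
    INVOICE_SCHEMA,
    correction_prompt="Normalize all dates to ISO 8601.",
    additional_config={"example_option": True},  # hypothetical option name
)
```

The resulting `fields` dictionary could then be passed to whatever HTTP client you use to call the endpoint.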
6/26/2025
- Support multiple output formats for one doc by passing them as comma-separated values in `output_format` for marker.
- Complete redesign of the dashboard, with a new look and feel. This will also make it easier for us to improve functionality in the future.
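
A minimal sketch of building the comma-separated `output_format` value follows. The helper name `join_output_formats` is hypothetical, and the format name `markdown` is an assumption; only `chunks` is named elsewhere in this changelog.

```python
def join_output_formats(formats):
    """Return a comma-separated output_format value without duplicates."""
    seen = []
    for fmt in formats:
        name = fmt.strip().lower()
        # Skip blanks and repeated names, keeping first-seen order.
        if name and name not in seen:
            seen.append(name)
    if not seen:
        raise ValueError("at least one output format is required")
    return ",".join(seen)


# Format names here are illustrative; check the API schema for valid values.
output_format = join_output_formats(["markdown", "chunks", "Markdown"])
```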
6/18/2025
- Improve the playground to make it more functional (easier to test options)
- Significantly improve styling in the playground
- Add a public version of the playground to make marker easier to test
6/3/2025
- Initial launch of playground, for testing marker parsing configurations
5/27/2025
- New OCR model which benchmarks better overall, handles inline math, and gives detailed character bboxes.
- Add `format_lines` flag to marker to add inline math and formatting to lines. This also automatically OCRs lines that need it.
3/26/2025
- Add support for multiple file formats - spreadsheets, epub, html, in addition to existing document, image, pdf, and presentation formats.
- Improve inline math and formatting when passing `use_llm`.
- `use_llm` (the high accuracy mode) now costs the same as regular inference.
1/30/2025
Marker:
- Integrate a new table recognition model, which handles rowspans and colspans better. This is a significant improvement on the old model.
- Improve the `--use_llm` option to merge tables across pages, OCR handwriting, OCR forms, and generally have much higher quality than before.
- Integrate a new LaTeX OCR model that is significantly more accurate.
- Add links and references to the markdown - the references include internal links.
- Speed up inference time.
- Remove the line detection endpoint - it had low usage.
- Improve the `table_rec` endpoint - it now takes the `--use_llm` flag, and should run much faster.
1/3/2025
- Add the `use_llm` option to the marker API - this uses an LLM to make conversion much more accurate for tables, forms, inline math, and complex pages. It’s a beta feature, and will currently double the cost per request.
- Added other options to the marker endpoint.
- Use `disable_image_extraction` to disable image extraction for marker.
- Use `strip_existing_ocr` to strip all existing OCR text and re-OCR (if it was added by something like tesseract).
- Better automatic heuristics for when to OCR with marker.
- Better text extraction and layout detection for marker.
- Speed up the marker and OCR endpoints by ~30%.
12/4/2024
- Uploaded files can now be up to 200MB in size.
- Improved speed by optimizing file handling on the backend.
12/3/2024
- We now offer $5 in free credits to new signups
- Additional bugfixes to improve markdown output quality
12/2/2024
- We sped up file operations internally, which should result in a decent API speed boost
- We now handle blockquotes and nested lists with the marker endpoint
11/27/2024
- Marker is now at v1, with a lot of improvements - it’s 4x faster than a month ago, and quality is much higher across all document types
- The layout model has been upgraded to a new version, with more potential prediction types
10/31/2024
- More API speedups, on the order of 15-20% for marker.
- Bump concurrency/rate limits to 200.
- Improve stability of service under load.
- If you cancel, you will now retain your credits until the end of the month.
- Visual improvements on the marketing site.
10/28/2024
- Significant API speedups, on the order of 40% faster.
10/25/2024
- Flatten form fields into pdf when extracting tables and markdown
- Fix page separators, they now appear at the start of every page, and include a page number
10/23/2024
- Speed up marker, layout, and detection by 20-30%
- Fix various bugs that cause edge case errors in conversion
- Increase concurrent request limit to 100
10/21/2024
- Significantly improve marker output quality
- Include header levels like h1, h2, etc.
- Parse tables very accurately
- Improve block type detection and markdown quality
- Fix many output bugs
- Add in new table recognition model at the /table_rec endpoint
- This will detect and convert tables into a given format
- Improve OCR, layout, text detection quality
- Fix memory leaks and improve performance
- Fix bugs with pagination and marker
8/19/2024
- Add in new OCR model with better accuracy across the board
- Language is now optional for marker and OCR model
- Increase max page count and max pixel width
7/20/2024
- Drop prices for marker and surya inference.
7/12/2024
- Significant speedup for marker and surya text detection/layout. 10-15% faster.
7/10/2024
- Increase concurrent request limit to 50.
7/6/2024
- Major infrastructure stability improvements.
7/3/2024
- Added response caching for up to 1 hour. If you send the same document to the same endpoint, with the same options, within that time, you’ll get a cache hit and won’t be billed again.
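
To make the cache-hit condition concrete, here is a conceptual sketch of keying a cache on the document, endpoint, and options with a one-hour expiry. It is not the service's actual implementation; the names `cache_key` and `ResponseCache`, the SHA-256 hashing, and the in-memory storage are all assumptions for illustration.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 3600  # "up to 1 hour", per the entry above


def cache_key(endpoint, document, options):
    """Hash the endpoint, document bytes, and options into one key."""
    digest = hashlib.sha256()
    digest.update(endpoint.encode("utf-8"))
    digest.update(b"\0")
    digest.update(hashlib.sha256(document).digest())
    digest.update(b"\0")
    # sort_keys makes the key independent of option ordering.
    digest.update(json.dumps(options, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


class ResponseCache:
    """In-memory cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop and report a miss
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock(), value)
```

Changing any one of the three inputs (document bytes, endpoint, or options) produces a different key, which corresponds to a cache miss in this sketch.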
7/2/2024
- Improved parsing for Powerpoint presentations and Word documents.
- Add status page and changelog.
6/26/2024
- Increase concurrency limits for all users
6/25/2024
- Return page count from all endpoints
- Users can now disable marker image extraction
- Webhooks are now supported instead of polling. Webhooks will ping a given URL when inference is complete.
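
A minimal sketch of a webhook receiver, using only the Python standard library, is shown below. It assumes the webhook arrives as an HTTP POST with a JSON object body; the payload fields are not specified in this changelog, so the handler simply records whatever object it receives. The names `record_webhook`, `WebhookHandler`, and `serve`, and port 8000, are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

completed = []  # payloads received so far


def record_webhook(body):
    """Parse a webhook body as a JSON object and record it; return True on success."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    if not isinstance(payload, dict):
        return False
    completed.append(payload)
    return True


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender declared.
        length = int(self.headers.get("Content-Length", 0))
        ok = record_webhook(self.rfile.read(length))
        self.send_response(200 if ok else 400)
        self.end_headers()


def serve(port=8000):
    """Block and listen for webhook calls on the given port."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener; the URL of that server is what you would register as the webhook target.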
6/21/2024
- Initial support for Microsoft Word and Microsoft Powerpoint documents (docx/doc/pptx/ppt).
6/18/2024
- Enable paginating marker output.
5/31/2024
- Initial launch of marker and surya APIs.