Welcome to Datalab

Datalab builds state-of-the-art document intelligence models that convert complex PDFs and other unstructured formats into structured, machine-readable outputs quickly, accurately, and at scale. We offer a fully managed platform, painless on-prem deployment options if you have sensitive documents, and open-source tools for developers. We support over 90 languages and let you:

Accurately parse PDFs into Markdown, HTML, or JSON using Chandra, Marker, and Surya
Recognize and isolate tables and math equations from documents (we’re SoTA on math!)
Extract key information with citations back to source document bounding boxes for data lineage
Run OCR on documents with Surya, our document OCR toolkit, which covers text detection, text recognition, layout analysis, reading order determination, table recognition, and LaTeX OCR

What do you want to do?

Parse PDFs into layout-aware HTML, Markdown, or JSON for RAG / ETL → Parse with Marker
Pull tables out of PDFs, documents, or websites → Try our Table Recognition API
Extract key information out of documents → Run Structured Extraction
Automagically segment digitally stapled PDFs → Try Auto-Segmentation

Who uses Datalab?

Datalab is built for anyone working with messy, high-stakes, or high-volume documents. Our users span industries, teams, and use cases. Some examples include:

AI/ML teams building agents or structured data pipelines: Feed RAG systems, knowledge graphs, or automation workflows with clean, structured outputs. Ideal for converting unstructured PDFs into JSON, Markdown, or HTML for downstream use.
Enterprises with compliance-heavy document processing needs: Automate high-volume document review and extraction with auditability, bounding boxes, and deterministic parsing.
Product or engineering teams in edtech, legaltech, fintech, and AI research labs: Turn scanned textbooks, legal filings, financial statements, tax forms, and research papers into product-ready content at scale.
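
For the RAG use case above, one common downstream step is to split parsed Markdown into heading-scoped chunks before indexing. The sketch below is only an illustration of that idea, not part of Datalab's SDK or API: the `Chunk` class and `chunk_markdown` function are hypothetical names, and it assumes you already have Markdown text from a parser and that headings use the `#` style.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical container for one section of parsed Markdown."""
    heading: str  # nearest heading above the text ("" if none)
    text: str     # body text found under that heading


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split Markdown into one chunk per heading section.

    Simplification: lines starting with '#' inside fenced code blocks
    are also treated as headings.
    """
    chunks: list[Chunk] = []
    heading = ""
    body: list[str] = []

    def flush() -> None:
        # Store the section collected so far, skipping empty sections.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(heading=heading, text=text))

    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            flush()
            heading = match.group(1).strip()
            body = []
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    sample = "# Invoice\nTotal due: 42 USD\n\n## Line items\n- Widget x2\n- Gadget x1\n"
    for chunk in chunk_markdown(sample):
        print(repr(chunk.heading), "->", repr(chunk.text))
```

Keeping the heading alongside each chunk gives a retriever some context about where the text came from; a real pipeline would likely also cap chunk length and carry page or bounding-box metadata when the parser provides it.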

Getting started

Whether you want to host Datalab securely in your own environment or use our hosted API, we make it easy to get started.

Datalab SDK

Our powerful Python library.

Datalab API

Our hosted service.

Playground

Test out a sample document!

Open Source

Run our models locally

Support

Can’t find what you’re looking for? Email [email protected] and a member of the team will get back to you!

Service Status

Check the status of Datalab’s services.
