- A Datalab account with an API key (new accounts include $5 in free credits)
- Python 3.10+ installed
- The Datalab SDK: `pip install datalab-python-sdk`
- Your `DATALAB_API_KEY` environment variable set
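A missing key is easier to diagnose before any request is sent. Below is a minimal standard-library sketch of that check. `get_api_key` is a hypothetical helper, not part of the Datalab SDK. It only confirms that the variable is set and non-empty; it does not confirm that the key is valid.

```python
import os


def get_api_key(var_name: str = "DATALAB_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper: it checks presence only, not validity.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before using the Datalab SDK, "
            f"e.g. `export {var_name}=<your key>`"
        )
    return key
```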
Quick Start
When to Use
Segmentation is useful when:
- Batch-scanned documents are combined into a single PDF
- Multiple document types are stapled together
- You need to apply different processing to different sections
Response Format
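The exact response schema is not reproduced on this page. The sketch below assumes a hypothetical shape in which each segment has a `name` and a list of page indices under `pages`. It converts that data into small typed objects, which makes later per-segment processing easier to read. The field names, the `Segment` class, and the 0-based page indexing are all assumptions; check an actual response before relying on them.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Segment:
    name: str
    pages: tuple[int, ...]  # sorted page indices (0-based indexing is an assumption)

    @property
    def page_range(self) -> tuple[int, int]:
        """First and last page of the segment, inclusive."""
        return self.pages[0], self.pages[-1]


def parse_segments(response: dict[str, Any]) -> list[Segment]:
    """Turn a raw segmentation response into Segment objects.

    Assumes a hypothetical shape:
    {"segments": [{"name": str, "pages": [int, ...]}, ...]}
    Segments without any pages are skipped.
    """
    segments: list[Segment] = []
    for raw in response.get("segments", []):
        pages = tuple(sorted(int(p) for p in raw.get("pages", [])))
        if pages:
            segments.append(Segment(name=str(raw["name"]), pages=pages))
    return segments


if __name__ == "__main__":
    example = {  # made-up data in the assumed shape
        "segments": [
            {"name": "Invoice", "pages": [0, 1]},
            {"name": "Contract", "pages": [2, 3, 4]},
        ]
    }
    for seg in parse_segments(example):
        print(seg.name, seg.page_range)
```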
Process Each Segment
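Segments can be routed to different pipelines based on their names. The sketch below shows one way to dispatch them, assuming each segment is a dict with `name` and `pages` keys (a hypothetical shape). `process_segments` and the handler stubs are illustrative only. In a real pipeline, a handler would probably call the Convert or Extract API for that segment's pages.

```python
from typing import Callable

# A handler receives a segment name and its page indices and returns a result.
# The handlers below are stubs that only describe what they would do.
Handler = Callable[[str, list[int]], str]


def process_segments(
    segments: list[dict],
    handlers: dict[str, Handler],
    default: Handler,
) -> list[tuple[str, str]]:
    """Route each segment to the handler registered for its name.

    Results are (name, result) pairs in segment order, so two segments that
    share a name (e.g. two invoices in one batch scan) are both kept.
    """
    results: list[tuple[str, str]] = []
    for seg in segments:
        handler = handlers.get(seg["name"], default)
        results.append((seg["name"], handler(seg["name"], seg["pages"])))
    return results


if __name__ == "__main__":
    segments = [  # hypothetical segmentation output
        {"name": "Invoice", "pages": [0, 1]},
        {"name": "Contract", "pages": [2, 3, 4]},
        {"name": "Cover Letter", "pages": [5]},
    ]
    handlers: dict[str, Handler] = {
        "Invoice": lambda name, pages: f"extract fields from pages {pages}",
        "Contract": lambda name, pages: f"convert pages {pages} to Markdown",
    }
    for name, result in process_segments(
        segments, handlers, default=lambda name, pages: f"skip {name}"
    ):
        print(f"{name}: {result}")
```

With a fallback handler, any segment type you did not plan for is still handled (here, skipped) instead of raising a `KeyError`.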
After segmentation, process each segment separately.
Using Checkpoints
If you already converted a document with `save_checkpoint=True` using the Convert API, pass its `checkpoint_id` to `SegmentOptions` to skip re-parsing. This saves time and cost when you run segmentation on a document you have already converted.
Custom Segmentation Schema
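Describing each expected segment type in a sentence or two gives the segmenter more context than a bare label. The sketch below serializes a name-to-description mapping to JSON and rejects empty or duplicate names. The list-of-objects layout with `name` and `description` keys is a hypothetical format, and `build_segmentation_schema` is an illustrative helper. Check the `SegmentOptions` reference for the shape it actually accepts.

```python
import json


def build_segmentation_schema(segment_types: dict[str, str]) -> str:
    """Serialize expected segment types (name -> description) to JSON.

    The [{"name": ..., "description": ...}] layout is a hypothetical format,
    not a confirmed SegmentOptions schema.
    """
    if not segment_types:
        raise ValueError("at least one segment type is required")
    entries = [
        {"name": name.strip(), "description": description.strip()}
        for name, description in segment_types.items()
    ]
    names = [entry["name"] for entry in entries]
    # Stripping whitespace can make distinct keys collide ("Invoice" vs "Invoice ").
    if "" in names or len(set(names)) != len(names):
        raise ValueError("segment names must be unique and non-empty")
    return json.dumps(entries, indent=2)


if __name__ == "__main__":
    print(
        build_segmentation_schema(
            {
                "Invoice": "Itemized bill with line items, totals, and a due date",
                "Contract": "Signed agreement naming the parties and their terms",
            }
        )
    )
```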
Define expected segment types for better accuracy.
Next Steps
Structured Extraction
Extract structured data from document segments using JSON schemas.
Handling Long Documents
Tips for TOC-based segmentation on documents with 50+ pages.
Document Conversion
Convert documents to Markdown, HTML, JSON, or chunks.
Workflows
Create and execute document processing pipelines.