- A Datalab account with an API key (new accounts include $5 in free credits)
- Python 3.10+ installed
- The Datalab SDK: `pip install datalab-python-sdk`
- Your `DATALAB_API_KEY` environment variable set
Quick Start
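The snippet below is a minimal, stdlib-only sketch of the preparation step before an extraction call. It checks that `DATALAB_API_KEY` is set, defines a small invoice schema, and serializes it for `page_schema`. Several details are assumptions: the helper `prepare_extraction` and the schema fields are hypothetical, the SDK is assumed to read the key from the environment, and `page_schema` is assumed to be passed as a JSON string. The SDK call itself is omitted because its exact signature is not shown on this page.

```python
import json
import os

# Hypothetical example schema: two fields from an invoice.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier printed on the document"},
        "total_amount": {"type": "number", "description": "Total amount due, without currency symbol"},
    },
    "required": ["invoice_number", "total_amount"],
}


def prepare_extraction(file_path: str, schema: dict) -> dict:
    """Hypothetical helper: check prerequisites and build extraction inputs.

    Assumes the SDK reads DATALAB_API_KEY from the environment and that
    page_schema is passed as a JSON string.
    """
    if not os.environ.get("DATALAB_API_KEY"):
        raise RuntimeError("Set the DATALAB_API_KEY environment variable first")
    return {"file": file_path, "page_schema": json.dumps(schema)}
```

The returned values map onto the `file` and `page_schema` parameters; adapt them to `ExtractOptions` once you have its exact signature.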
Schema Format
Use JSON Schema format to define what you want to extract.
Tips for Better Extraction
- Use descriptive field names: `invoice_number` is clearer than `id`.
- Add descriptions: the `description` field helps the model understand context.
- Specify types correctly: use `number` for numeric values and `string` for text.
- Use arrays for repeating data: line items, table rows, and similar repeated structures.
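To make the tips concrete, here is an illustrative schema that follows all four, plus a small lint function that flags properties breaking them. Both are sketches: the field names are examples, and `lint_schema` is a hypothetical helper whose "descriptive name" check is a crude heuristic (names of two characters or fewer, such as `id`, are flagged).

```python
# Illustrative schema: descriptive names, descriptions, explicit types,
# and an array for the repeating line-item rows.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier, e.g. INV-1042"},
        "invoice_date": {"type": "string", "description": "Date the invoice was issued"},
        "total_amount": {"type": "number", "description": "Grand total, as a number"},
        "line_items": {
            "type": "array",
            "description": "One entry per row in the line-item table",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string", "description": "Item or service description"},
                    "quantity": {"type": "number", "description": "Units billed"},
                    "unit_price": {"type": "number", "description": "Price per unit"},
                },
            },
        },
    },
}

# Core JSON Schema type names.
VALID_TYPES = {"string", "number", "integer", "boolean", "array", "object", "null"}


def lint_schema(schema: dict, path: str = "$") -> list[str]:
    """Hypothetical helper: return warnings for properties that ignore the tips."""
    warnings = []
    for name, spec in schema.get("properties", {}).items():
        where = f"{path}.{name}"
        if len(name) <= 2:  # crude heuristic for non-descriptive names
            warnings.append(f"{where}: name is too short to be descriptive")
        if "description" not in spec:
            warnings.append(f"{where}: missing description")
        if spec.get("type") not in VALID_TYPES:
            warnings.append(f"{where}: missing or unknown type")
        # Recurse into nested objects and into array item schemas.
        if spec.get("type") == "object":
            warnings.extend(lint_schema(spec, where))
        elif spec.get("type") == "array" and isinstance(spec.get("items"), dict):
            warnings.extend(lint_schema(spec["items"], where + "[]"))
    return warnings
```

Running `lint_schema` on a schema like `{"properties": {"id": {"type": "int"}}}` would report three problems: a short name, a missing description, and an unknown type (`int` is not a JSON Schema type; use `integer` or `number`).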
Response
The extracted data is returned in `extraction_schema_json`.
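A minimal sketch for reading the result, assuming the response can be treated as a dict with an `extraction_schema_json` key that holds either a JSON-encoded string or an already-decoded object. The helper name `parse_extraction` is hypothetical, and the sample response in the usage note is made up for illustration, not real API output.

```python
import json
from typing import Any


def parse_extraction(result: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return the extracted fields from a response dict.

    Assumes extraction_schema_json is either a JSON string or a decoded object.
    """
    raw = result.get("extraction_schema_json")
    if raw is None:
        raise KeyError("response has no extraction_schema_json")
    if isinstance(raw, str):
        raw = json.loads(raw)  # decode the JSON-encoded payload
    if not isinstance(raw, dict):
        raise TypeError("expected extraction_schema_json to decode to an object")
    return raw
```

For example, `parse_extraction({"extraction_schema_json": '{"invoice_number": "INV-1042"}'})` returns `{"invoice_number": "INV-1042"}`. Accepting both shapes keeps the helper usable whether the payload arrives as raw JSON text or has already been decoded.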
Citation Tracking
Each extracted field includes citations to the source blocks. Use these citations together with the `json` output to trace extracted values back to the source document.
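The sketch below shows one way to resolve citations against the `json` output. It rests on two assumptions that this page does not confirm: that citations for a field `x` appear under a key `x_citations` as a list of block IDs, and that the `json` output is a tree of blocks, each with an `id` and an optional `children` list. `iter_blocks` and `trace_citations` are hypothetical helpers.

```python
from typing import Any, Iterator


def iter_blocks(block: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Depth-first walk over a block tree (assumes nested 'children' lists)."""
    yield block
    for child in block.get("children") or []:
        yield from iter_blocks(child)


def trace_citations(extracted: dict[str, Any], document: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Map each cited field to the source blocks it references.

    Assumes citations live under '<field>_citations' as lists of block IDs.
    Citation IDs that are not found in the document are skipped.
    """
    # Index every block by ID once, so each citation lookup is O(1).
    index = {b["id"]: b for b in iter_blocks(document) if "id" in b}
    traced: dict[str, list[dict[str, Any]]] = {}
    for key, value in extracted.items():
        if key.endswith("_citations"):
            field = key[: -len("_citations")]
            traced[field] = [index[block_id] for block_id in value if block_id in index]
    return traced
```

With the resolved blocks in hand, you can display the cited text or page location next to each extracted value for review.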
Schema Examples
Financial Document
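An illustrative schema for a financial statement, written as a Python dict and serialized for `page_schema`. The field names are examples only, not a prescribed set, and sending the schema as a JSON string is an assumption.

```python
import json

# Illustrative financial-statement schema (field names are examples only).
FINANCIAL_SCHEMA = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string", "description": "Legal name of the reporting company"},
        "fiscal_period": {"type": "string", "description": "Reporting period, e.g. 'FY2023' or 'Q2 2024'"},
        "currency": {"type": "string", "description": "ISO 4217 currency code used in the statements"},
        "total_revenue": {"type": "number", "description": "Total revenue for the period"},
        "net_income": {"type": "number", "description": "Net income for the period"},
        "balance_sheet_items": {
            "type": "array",
            "description": "One entry per row of the balance sheet",
            "items": {
                "type": "object",
                "properties": {
                    "label": {"type": "string", "description": "Row label, e.g. 'Total assets'"},
                    "value": {"type": "number", "description": "Reported amount"},
                },
            },
        },
    },
    "required": ["company_name", "fiscal_period"],
}

# Serialized form, assuming page_schema takes a JSON string.
page_schema = json.dumps(FINANCIAL_SCHEMA)
```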
Scientific Paper
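An illustrative schema for a research paper. The fields are assumptions chosen to show nested arrays (authors with affiliations) and a simple string array (keywords). Serializing the schema with `json.dumps` assumes `page_schema` takes a JSON string.

```python
import json

# Illustrative scientific-paper schema (field names are examples only).
PAPER_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Full title of the paper"},
        "authors": {
            "type": "array",
            "description": "Authors in the order listed on the first page",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Author's full name"},
                    "affiliation": {"type": "string", "description": "Institution listed for the author"},
                },
            },
        },
        "abstract": {"type": "string", "description": "Abstract text, verbatim"},
        "keywords": {
            "type": "array",
            "description": "Keywords listed by the authors",
            "items": {"type": "string"},
        },
        "publication_year": {"type": "integer", "description": "Year of publication"},
    },
    "required": ["title", "authors"],
}

# Serialized form, assuming page_schema takes a JSON string.
page_schema = json.dumps(PAPER_SCHEMA)
```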
Contract
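An illustrative schema for a contract. The field names are examples, and dates are kept as strings "as written" because contracts often use free-form date phrasing. Serializing the schema assumes `page_schema` takes a JSON string.

```python
import json

# Illustrative contract schema (field names are examples only).
CONTRACT_SCHEMA = {
    "type": "object",
    "properties": {
        "contract_title": {"type": "string", "description": "Title or name of the agreement"},
        "parties": {
            "type": "array",
            "description": "Every party that signs the agreement",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Legal name of the party"},
                    "role": {"type": "string", "description": "Role in the contract, e.g. 'Licensor'"},
                },
            },
        },
        "effective_date": {"type": "string", "description": "Date the agreement takes effect, as written"},
        "termination_date": {"type": "string", "description": "End or expiry date, if stated"},
        "governing_law": {"type": "string", "description": "Jurisdiction whose law governs the contract"},
        "payment_terms": {"type": "string", "description": "Summary of payment amounts and schedule"},
    },
    "required": ["parties", "effective_date"],
}

# Serialized form, assuming page_schema takes a JSON string.
page_schema = json.dumps(CONTRACT_SCHEMA)
```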
Using Checkpoints
If you already converted a document with `save_checkpoint=True` using the Convert API, pass the `checkpoint_id` to `ExtractOptions` to skip re-parsing. This saves time and cost when running extraction on a previously converted document.
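To reuse checkpoints across runs, you need to remember which checkpoint belongs to which document. The class below is a hypothetical sketch: it keys checkpoint IDs by the SHA-256 digest of the file contents (so a renamed copy still matches) and persists them in a small JSON file. This page does not show how to read the checkpoint ID from a conversion result, so `remember` takes it as an argument. Returning `save_checkpoint=True` when no checkpoint is known is also a design choice of this sketch.

```python
import hashlib
import json
from pathlib import Path


class CheckpointCache:
    """Hypothetical helper: remember checkpoint IDs per document contents."""

    def __init__(self, store: Path) -> None:
        self.store = store
        # Load previously saved digest -> checkpoint_id mappings, if any.
        self.entries: dict[str, str] = json.loads(store.read_text()) if store.exists() else {}

    @staticmethod
    def digest(file_path: Path) -> str:
        """SHA-256 of the file bytes, used as a content-based key."""
        return hashlib.sha256(file_path.read_bytes()).hexdigest()

    def options_for(self, file_path: Path) -> dict:
        """Reuse a known checkpoint, or ask for one to be saved this time."""
        checkpoint_id = self.entries.get(self.digest(file_path))
        if checkpoint_id:
            return {"checkpoint_id": checkpoint_id}
        return {"save_checkpoint": True}

    def remember(self, file_path: Path, checkpoint_id: str) -> None:
        """Record a checkpoint ID and persist the mapping to disk."""
        self.entries[self.digest(file_path)] = checkpoint_id
        self.store.write_text(json.dumps(self.entries, indent=2))
```

Hashing contents rather than paths means edited files get a fresh checkpoint automatically, while unchanged files skip re-parsing.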
Extraction accepts the following parameters: `file`, `page_schema` (required), `mode`, `max_pages`, `page_range`, `save_checkpoint`, `checkpoint_id`, and `webhook_url`.
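A client-side check of these parameters can catch typos before a request is sent. The sketch below uses the parameter names listed above; everything else is an assumption: the helper `build_extract_params` is hypothetical, the positive-integer rule for `max_pages` is a guess at sensible input, and a dict `page_schema` is serialized on the assumption that the API takes a JSON string.

```python
import json
from typing import Any

# Parameter names as listed on this page.
ALLOWED_PARAMS = {
    "file", "page_schema", "mode", "max_pages", "page_range",
    "save_checkpoint", "checkpoint_id", "webhook_url",
}


def build_extract_params(**params: Any) -> dict[str, Any]:
    """Hypothetical helper: validate extraction parameters before sending."""
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if params.get("page_schema") is None:
        raise ValueError("page_schema is required")
    if isinstance(params["page_schema"], dict):
        params["page_schema"] = json.dumps(params["page_schema"])
    max_pages = params.get("max_pages")
    if max_pages is not None and (not isinstance(max_pages, int) or max_pages < 1):
        raise ValueError("max_pages must be a positive integer")
    # Drop unset optional parameters so server-side defaults apply.
    return {key: value for key, value in params.items() if value is not None}
```

For example, `build_extract_params(file="a.pdf", page_schema={"type": "object"}, max_pages=None)` returns `{"file": "a.pdf", "page_schema": '{"type": "object"}'}`, while a misspelled keyword such as `max_page=5` raises `ValueError`.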
Auto-Generate Schemas
Don’t want to write schemas by hand? Use the schema generation endpoint to automatically suggest schemas for your document. This requires a checkpoint from a previous conversion.
Using Forge Playground
Create and test schemas visually in Forge Playground:
- Upload a sample document
- Define fields in the visual editor
- Switch to JSON Editor to copy the schema
- Test extraction before deploying