Before you begin, make sure you have:
  1. A Datalab account with an API key (new accounts include $5 in free credits)
  2. Python 3.10+ installed
  3. The Datalab SDK: pip install datalab-python-sdk
  4. Your DATALAB_API_KEY environment variable set

Overview

Saved Schemas let you store extraction schemas in Datalab and reference them by ID (schema_id) when calling /api/v1/extract. Instead of sending a full JSON schema with every request, you save it once and pass its stable ID. Saved schemas also support versioning: you can update a schema while keeping a history of previous versions, and pin extractions to a specific version with schema_version.

Create a Schema

Create and manage extraction schemas in the Datalab UI. Each schema is assigned a schema_id (e.g. sch_k8Hx9mP2nQ4v) that you can reference in extraction requests.

Extract Using a Saved Schema

Pass schema_id to /api/v1/extract instead of page_schema:
curl -X POST https://www.datalab.to/api/v1/extract \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "schema_id=sch_k8Hx9mP2nQ4v" \
  -F "mode=balanced"

# Poll request_check_url from response until status is "complete"
page_schema and schema_id are mutually exclusive — provide exactly one. If you pass both, the API returns a 400 error.
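The polling step noted in the comment above can be sketched in Python. This is a minimal sketch, not the SDK's implementation: poll_until_complete, fetch_json, and their parameters are hypothetical names, and the only response details relied on are the request_check_url value and the "complete" status described above. The interval and poll limit are arbitrary defaults.

```python
import json
import os
import time
import urllib.request
from typing import Any, Callable, Dict


def fetch_json(url: str) -> Dict[str, Any]:
    """GET a URL with the X-API-Key header and decode the JSON body."""
    req = urllib.request.Request(
        url, headers={"X-API-Key": os.environ["DATALAB_API_KEY"]}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_until_complete(
    check_url: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
    interval_s: float = 2.0,
    max_polls: int = 150,
) -> Dict[str, Any]:
    """Call fetch(check_url) until the response reports status "complete".

    Raises TimeoutError if the status is still not "complete" after
    max_polls attempts.
    """
    for attempt in range(max_polls):
        result = fetch(check_url)
        if result.get("status") == "complete":
            return result
        if attempt < max_polls - 1:
            time.sleep(interval_s)  # wait before the next check
    raise TimeoutError(f"{check_url} not complete after {max_polls} polls")
```

Passing fetch as a parameter keeps the loop independent of the HTTP client, so it can be swapped for another client or for a stub in tests.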

Schema Versioning

When you update a schema in the Datalab UI, you can choose to create a new version. This saves the current state to version history and increments the version number.

Pin to a specific version

Pass schema_version alongside schema_id to use a specific version:
curl -X POST https://www.datalab.to/api/v1/extract \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "schema_id=sch_k8Hx9mP2nQ4v" \
  -F "schema_version=1"
Omitting schema_version always uses the latest version.
We recommend always specifying schema_version alongside schema_id. This ensures your extractions produce consistent results even if the schema is updated later.
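One way to follow this recommendation is to read the current version from a schema object once and store it alongside the schema_id. The sketch below assumes a schema object with the schema_id, version, and archived fields listed in the API reference; pinned_fields is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict


def pinned_fields(schema: Dict[str, Any]) -> Dict[str, str]:
    """Build extract form fields pinned to the schema's current version."""
    if schema.get("archived"):
        # Archived schemas cannot be used for new extractions.
        raise ValueError(f"schema {schema['schema_id']} is archived")
    return {
        "schema_id": schema["schema_id"],
        "schema_version": str(int(schema["version"])),
    }
```

The returned dict can be merged into the form fields of an extraction request, so later schema updates do not change which version is used.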

List Schemas

# List active schemas
curl "https://www.datalab.to/api/v1/extraction_schemas" \
  -H "X-API-Key: $DATALAB_API_KEY"

# Include archived schemas
curl "https://www.datalab.to/api/v1/extraction_schemas?include_archived=true" \
  -H "X-API-Key: $DATALAB_API_KEY"
The response includes schemas (array) and total (count). Schemas are ordered by creation date, newest first.
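The same listing calls can be sketched in Python with the standard library. The helper names (list_url, fetch_schemas, schema_ids) are hypothetical, and the sketch assumes each entry in schemas is a schema object carrying a schema_id field.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Any, Dict, List

BASE_URL = "https://www.datalab.to/api/v1/extraction_schemas"


def list_url(include_archived: bool = False) -> str:
    """Return the list endpoint URL, optionally including archived schemas."""
    if not include_archived:
        return BASE_URL
    return BASE_URL + "?" + urllib.parse.urlencode({"include_archived": "true"})


def fetch_schemas(include_archived: bool = False) -> Dict[str, Any]:
    """GET the schema list; the body contains "schemas" and "total"."""
    req = urllib.request.Request(
        list_url(include_archived),
        headers={"X-API-Key": os.environ["DATALAB_API_KEY"]},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def schema_ids(payload: Dict[str, Any]) -> List[str]:
    """Extract schema IDs in response order (newest first)."""
    # Assumes every entry exposes a "schema_id" field.
    return [entry["schema_id"] for entry in payload["schemas"]]
```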

Get a Schema

curl "https://www.datalab.to/api/v1/extraction_schemas/sch_k8Hx9mP2nQ4v" \
  -H "X-API-Key: $DATALAB_API_KEY"
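To work with the result in Python, the response can be mapped onto a small dataclass. This is a sketch under two assumptions: that the response body is the schema object itself, with the fields listed in the API reference, and that the timestamps arrive as strings. SavedSchema and from_response are hypothetical names, not SDK types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SavedSchema:
    schema_id: str
    name: str
    schema_json: Dict[str, Any]
    version: int
    archived: bool
    created: str  # kept as a raw string; the timestamp format is not assumed
    updated: str
    description: Optional[str] = None
    version_history: List[Any] = field(default_factory=list)

    @classmethod
    def from_response(cls, data: Dict[str, Any]) -> "SavedSchema":
        """Build a SavedSchema from a decoded JSON schema object."""
        return cls(
            schema_id=data["schema_id"],
            name=data["name"],
            schema_json=data["schema_json"],
            version=int(data["version"]),
            archived=bool(data["archived"]),
            created=data["created"],
            updated=data["updated"],
            description=data.get("description"),
            version_history=list(data.get("version_history") or []),
        )
```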

Archive a Schema

Archiving soft-deletes a schema — it no longer appears in list results (unless include_archived=true) and cannot be used for new extractions:
curl -X DELETE "https://www.datalab.to/api/v1/extraction_schemas/sch_k8Hx9mP2nQ4v" \
  -H "X-API-Key: $DATALAB_API_KEY"
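The archive call can be sketched in Python using the standard library. archive_request is a hypothetical helper that only builds the DELETE request; sending it is left to the caller.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://www.datalab.to/api/v1/extraction_schemas"


def archive_request(schema_id: str, api_key: str) -> urllib.request.Request:
    """Build the DELETE request that archives (soft-deletes) a schema."""
    # Quote the ID so unexpected characters cannot alter the URL path.
    url = f"{BASE_URL}/{urllib.parse.quote(schema_id, safe='')}"
    return urllib.request.Request(
        url, method="DELETE", headers={"X-API-Key": api_key}
    )
```

To send it, pass the result to urllib.request.urlopen, for example urllib.request.urlopen(archive_request("sch_k8Hx9mP2nQ4v", os.environ["DATALAB_API_KEY"])) after importing os.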

API Reference

Schema Object

Field | Type | Description
schema_id | string | Stable string ID (e.g. sch_k8Hx9mP2nQ4v)
name | string | Human-readable name (max 200 chars)
description | string or null | Optional description
schema_json | object | JSON schema with a properties key
version | int | Current version number (starts at 1)
version_history | array | Previous versions saved with create_new_version: true
archived | bool | Whether the schema is archived
created | datetime | Creation timestamp
updated | datetime | Last update timestamp

Parameters accepted by /api/v1/extract:

Parameter | Type | Description
schema_id | string | ID of a saved schema. Mutually exclusive with page_schema.
schema_version | int | Version to use. Only valid with schema_id. Defaults to latest.
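
The two parameter rules above (schema_id and page_schema are mutually exclusive, and schema_version is only valid with schema_id) can be checked client-side before a request is sent. The sketch below uses a hypothetical helper, build_extract_fields, and assumes an inline page_schema is sent as a JSON-encoded form field.

```python
import json
from typing import Any, Dict, Optional


def build_extract_fields(
    schema_id: Optional[str] = None,
    page_schema: Optional[Dict[str, Any]] = None,
    schema_version: Optional[int] = None,
    mode: Optional[str] = None,
) -> Dict[str, str]:
    """Validate schema arguments and return form fields for /api/v1/extract."""
    # Exactly one of schema_id / page_schema must be provided.
    if (schema_id is None) == (page_schema is None):
        raise ValueError("provide exactly one of schema_id or page_schema")
    if schema_version is not None and schema_id is None:
        raise ValueError("schema_version is only valid with schema_id")

    fields: Dict[str, str] = {}
    if schema_id is not None:
        fields["schema_id"] = schema_id
        if schema_version is not None:
            fields["schema_version"] = str(schema_version)
    else:
        # Assumption: the inline schema travels as a JSON string.
        fields["page_schema"] = json.dumps(page_schema)
    if mode is not None:
        fields["mode"] = mode
    return fields
```

Failing early with a ValueError avoids a round trip that would otherwise end in the 400 error the API returns when both schema parameters are sent.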

Next Steps

Structured Extraction: Full guide to extraction with inline schemas, checkpoints, and options.
Confidence Scoring: Score extraction results with per-field confidence ratings.
Forge Evals: Compare extraction results across configurations using saved schemas.
Handling Long Documents: Strategies for extracting from 100+ page documents.