Saved Schemas

Before you begin, make sure you have:

A Datalab account with an API key (new accounts include $5 in free credits)
Python 3.10+ installed
The Datalab SDK: pip install datalab-python-sdk
Your DATALAB_API_KEY environment variable set

Overview

Saved Schemas let you store extraction schemas in Datalab and reference them by a stable ID (schema_id) when calling /api/v1/extract, instead of sending the full JSON schema with every request. Saved schemas are also versioned: you can update a schema while keeping a history of previous versions, and pin extractions to a specific version with schema_version.

Create a Schema

Create schemas via the SDK or the Datalab UI. Each schema is assigned a schema_id (e.g. sch_k8Hx9mP2nQ4v) that you can reference in extraction requests.

from datalab_sdk import DatalabClient

client = DatalabClient()

schema = client.create_extraction_schema(
    name="Invoice Schema",
    description="Extracts key fields from invoices",
    schema_json={
        "properties": {
            "invoice_number": {"type": "string", "description": "Invoice ID"},
            "total_amount": {"type": "number", "description": "Total amount due"},
            "vendor_name": {"type": "string", "description": "Vendor or supplier name"},
            "due_date": {"type": "string", "description": "Payment due date"},
        }
    },
)
print(schema.schema_id)  # e.g. sch_k8Hx9mP2nQ4v
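Because a malformed schema only surfaces as a problem at save or extraction time, it can help to check the payload locally first. The sketch below mirrors the two constraints stated in the Schema Object reference on this page (schema_json has a `properties` key; name is at most 200 characters) and adds two sanity checks of our own (a non-empty name, and each property declaring a `type`). `validate_schema_payload` is a hypothetical helper, not part of datalab_sdk, and the server may enforce rules this does not cover.

```python
from typing import Any

MAX_NAME_LENGTH = 200  # Schema Object reference: name is max 200 chars


def validate_schema_payload(name: str, schema_json: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload looks OK.

    Hypothetical client-side check, not part of datalab_sdk. Only the
    'properties' key and the 200-char name limit come from the docs; the
    non-empty-name and per-property 'type' checks are our own assumptions.
    """
    problems: list[str] = []
    if not name or not name.strip():
        problems.append("name must not be empty")
    elif len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")

    properties = schema_json.get("properties") if isinstance(schema_json, dict) else None
    if not isinstance(properties, dict) or not properties:
        problems.append("schema_json needs a non-empty 'properties' object")
    else:
        for field_name, spec in properties.items():
            if not isinstance(spec, dict) or "type" not in spec:
                problems.append(f"property {field_name!r} should declare a 'type'")
    return problems


print(validate_schema_payload("Invoice Schema", {"properties": {"invoice_number": {"type": "string"}}}))
# []
```

Running it before create_extraction_schema keeps obvious mistakes out of your saved schemas and their version history.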

Extract Using a Saved Schema

Pass schema_id to /api/v1/extract instead of page_schema:

from datalab_sdk import DatalabClient, ExtractOptions
import json

client = DatalabClient()

result = client.extract(
    "invoice.pdf",
    options=ExtractOptions(
        schema_id="sch_k8Hx9mP2nQ4v",
        mode="balanced",
    ),
)
extracted = json.loads(result.extraction_schema_json)
print(extracted)

page_schema and schema_id are mutually exclusive — provide exactly one. If you pass both, the API returns a 400 error.

Schema Versioning

When you update a schema, either in the Datalab UI or with the SDK (create_new_version=True, see Update a Schema), you can choose to create a new version. This saves the current state to version history and increments the version number.
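As a mental model only, the versioning rule above can be written as a tiny in-memory class. This is not how Datalab stores schemas: `SavedSchemaModel` is hypothetical, the history entry layout (`version`, `schema_json`) is an assumption, and so is the idea that an update without a new version edits the current version in place.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SavedSchemaModel:
    """Toy model of a saved schema's versioning (hypothetical, illustration only)."""

    schema_json: dict[str, Any]
    version: int = 1  # versions start at 1
    version_history: list[dict[str, Any]] = field(default_factory=list)

    def update(self, new_schema_json: dict[str, Any], create_new_version: bool = False) -> None:
        if create_new_version:
            # Snapshot the current state into history, then bump the version.
            self.version_history.append(
                {"version": self.version, "schema_json": copy.deepcopy(self.schema_json)}
            )
            self.version += 1
        # Assumption: without create_new_version the current version is overwritten.
        self.schema_json = copy.deepcopy(new_schema_json)


s = SavedSchemaModel({"properties": {"invoice_number": {"type": "string"}}})
s.update({"properties": {"line_items": {"type": "array"}}}, create_new_version=True)
print(s.version, len(s.version_history))
# 2 1
```

The point of the model is the ordering: the old state is saved before the new one replaces it, which is what makes pinning to an earlier version possible.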

Pin to a specific version

Pass schema_version alongside schema_id to use a specific version:

curl -X POST https://www.datalab.to/api/v1/extract \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "schema_id=sch_k8Hx9mP2nQ4v" \
  -F "schema_version=1"

Omitting schema_version always uses the latest version.

We recommend always specifying schema_version alongside schema_id. This ensures your extractions produce consistent results even if the schema is updated later.
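One way to follow this recommendation is to keep every schema_id/schema_version pair in one place in your code, so moving to a new version is a deliberate, reviewable change. In the sketch below, `PINNED_SCHEMAS`, the `"invoice"` key, `pinned_fields`, and `find_outdated_pins` are hypothetical names; in practice the latest versions could be gathered with client.get_extraction_schema(schema_id).version.

```python
# Hypothetical registry: document type -> (schema_id, pinned schema_version).
PINNED_SCHEMAS: dict[str, tuple[str, int]] = {
    "invoice": ("sch_k8Hx9mP2nQ4v", 1),
}


def pinned_fields(doc_type: str) -> dict[str, str]:
    """Form fields for /api/v1/extract that always include schema_version."""
    schema_id, version = PINNED_SCHEMAS[doc_type]
    return {"schema_id": schema_id, "schema_version": str(version)}


def find_outdated_pins(latest_versions: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {schema_id: (pinned, latest)} for pins behind the latest version.

    latest_versions maps schema_id -> current version number.
    """
    outdated: dict[str, tuple[int, int]] = {}
    for schema_id, pinned in PINNED_SCHEMAS.values():
        latest = latest_versions.get(schema_id)
        if latest is not None and latest > pinned:
            outdated[schema_id] = (pinned, latest)
    return outdated
```

A periodic check with find_outdated_pins tells you when a schema has moved on, without your extractions changing behavior until you update the pin.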

List Schemas

from datalab_sdk import DatalabClient

client = DatalabClient()

result = client.list_extraction_schemas(limit=50, include_archived=False)
for s in result["schemas"]:
    print(f"{s.schema_id}: {s.name} (v{s.version})")

The response includes schemas (array) and total (count). Schemas are ordered by creation date, newest first.

Get a Schema

from datalab_sdk import DatalabClient

client = DatalabClient()

schema = client.get_extraction_schema("sch_k8Hx9mP2nQ4v")
print(schema.name, schema.version)

Update a Schema

Update a schema's fields. Pass create_new_version=True to save the current state to version history and increment the version number before the update is applied:

from datalab_sdk import DatalabClient

client = DatalabClient()

# Update schema fields and create a new version
schema = client.update_extraction_schema(
    "sch_k8Hx9mP2nQ4v",
    schema_json={
        "properties": {
            "invoice_number": {"type": "string"},
            "total_amount": {"type": "number"},
            "line_items": {"type": "array", "items": {"type": "string"}},  # New field
        }
    },
    create_new_version=True,
)
print(f"Now at v{schema.version}")
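Before creating a new version, it can help to see exactly which fields change. The hypothetical helper below compares the `properties` of two schema_json payloads. Run against the create and update examples on this page, it reports line_items as added, vendor_name and due_date as removed, and invoice_number and total_amount as changed (their descriptions were dropped). Whether update_extraction_schema replaces schema_json wholesale or merges it is not stated here; if it replaces it, those removals would apply to the new version.

```python
from typing import Any


def diff_properties(old: dict[str, Any], new: dict[str, Any]) -> dict[str, list[str]]:
    """Compare the 'properties' of two schema_json payloads field by field.

    Hypothetical review helper; not part of datalab_sdk.
    """
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    return {
        "added": sorted(new_props.keys() - old_props.keys()),
        "removed": sorted(old_props.keys() - new_props.keys()),
        "changed": sorted(
            name
            for name in old_props.keys() & new_props.keys()
            if old_props[name] != new_props[name]
        ),
    }
```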

Archive a Schema

Archiving soft-deletes a schema: it no longer appears in list results (unless you pass include_archived=True) and can no longer be used for new extractions. In the SDK, archiving is done with delete_extraction_schema:

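from datalab_sdk import DatalabClient

client = DatalabClient()

# Archives (soft-deletes) the schema rather than permanently removing it
client.delete_extraction_schema("sch_k8Hx9mP2nQ4v")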

API Reference

Schema Object

Field	Type	Description
`schema_id`	string	Stable string ID (e.g. `sch_k8Hx9mP2nQ4v`)
`name`	string	Human-readable name (max 200 chars)
`description`	string | null	Optional description
`schema_json`	object	JSON schema with a `properties` key
`version`	int	Current version number (starts at 1)
`version_history`	array	Previous versions saved with `create_new_version: true`
`archived`	bool	Whether the schema is archived
`created`	datetime	Creation timestamp
`updated`	datetime	Last update timestamp

/extract Parameters (schema-related)

Parameter	Type	Description
`schema_id`	string	ID of a saved schema. Mutually exclusive with `page_schema`.
`schema_version`	int	Version to use. Only valid with `schema_id`. Defaults to latest.

Next Steps

Structured Extraction

Full guide to extraction with inline schemas, checkpoints, and options.

Confidence Scoring

Score extraction results with per-field confidence ratings.

Forge Evals

Compare extraction results across configurations using saved schemas.

Handling Long Documents

Strategies for extracting from 100+ page documents.
