This guide helps you migrate from deprecated Datalab API endpoints to their current replacements.

Marker → Dedicated Endpoints

The /api/v1/marker endpoint is deprecated. The monolithic endpoint has been replaced with a dedicated endpoint for each operation; migrate to the one that matches how you currently call /marker:
| Old usage | New endpoint | SDK method |
|---|---|---|
| /marker (basic conversion) | POST /api/v1/convert | client.convert() |
| /marker with page_schema | POST /api/v1/extract | client.extract() |
| /marker with segmentation_schema | POST /api/v1/segment | client.segment() |
| /marker with extras=track_changes | POST /api/v1/track-changes | client.track_changes() |
| /marker with pipeline_id | POST /api/v1/custom-processor | client.run_custom_processor() |
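If your code builds /marker requests from a parameter dict, a small lookup can point each call at its replacement while you migrate. This is a minimal sketch that encodes the table above: route_marker_request is a hypothetical helper, not part of the SDK, and both the precedence used when several legacy parameters are set at once and the treatment of extras as a comma-separated string are assumptions.

```python
from typing import Dict, Tuple


def route_marker_request(params: Dict[str, str]) -> Tuple[str, str]:
    """Return (new_endpoint, sdk_method) for a legacy /api/v1/marker parameter dict.

    Hypothetical helper. The check order below is an assumed precedence for
    requests that set more than one legacy parameter.
    """
    if params.get("page_schema"):
        return "/api/v1/extract", "extract"
    if params.get("segmentation_schema"):
        return "/api/v1/segment", "segment"
    if params.get("pipeline_id"):
        return "/api/v1/custom-processor", "run_custom_processor"
    # Assumption: extras may hold several comma-separated values.
    extras = [e.strip() for e in (params.get("extras") or "").split(",")]
    if "track_changes" in extras:
        return "/api/v1/track-changes", "track_changes"
    # Anything else was a basic conversion.
    return "/api/v1/convert", "convert"
```

For example, route_marker_request({"pipeline_id": "cp_XXXXX"}) points the call at /api/v1/custom-processor and client.run_custom_processor().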

SDK upgrade

Update to the latest SDK for the new dedicated methods:
pip install --upgrade datalab-python-sdk
SDK users who only call client.convert() do not need to change their code. It continues to work and now calls /api/v1/convert internally.
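Before switching call sites, you can confirm that the installed SDK actually exposes the dedicated methods. This is a minimal sketch: missing_methods is a hypothetical helper, and the method names are simply those listed in the endpoint table of this guide.

```python
from typing import List

# Method names from the endpoint table in this guide.
DEDICATED_METHODS = ("convert", "extract", "segment", "track_changes", "run_custom_processor")


def missing_methods(client_cls: type) -> List[str]:
    """Return the dedicated-endpoint methods that client_cls does not define."""
    return [name for name in DEDICATED_METHODS if not callable(getattr(client_cls, name, None))]


# Usage (requires datalab-python-sdk to be installed):
#   from datalab_sdk import DatalabClient
#   missing = missing_methods(DatalabClient)
#   if missing:
#       raise RuntimeError(f"Upgrade datalab-python-sdk; missing: {missing}")
```

An empty list means every method from the table is available on the class.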

Document Conversion

# No changes needed: convert() works the same
from datalab_sdk import DatalabClient

client = DatalabClient()
result = client.convert("document.pdf")
print(result.markdown)

Structured Extraction

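# Old: page_schema on ConvertOptions (deprecated; use client.extract() with ExtractOptions)
import json
from datalab_sdk import DatalabClient, ConvertOptions
client = DatalabClient()
schema = {"invoice_number": {"type": "string"}}
options = ConvertOptions(page_schema=json.dumps(schema))
result = client.convert("invoice.pdf", options=options)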

Document Segmentation

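# Old: segmentation_schema on ConvertOptions (deprecated; use client.segment() with SegmentOptions)
import json
from datalab_sdk import DatalabClient, ConvertOptions
client = DatalabClient()
options = ConvertOptions(segmentation_schema=json.dumps({"sections": ["Header", "Body", "Footer"]}))
result = client.convert("document.pdf", options=options)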

Track Changes

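# Old: extras parameter on ConvertOptions (deprecated; use client.track_changes())
from datalab_sdk import DatalabClient, ConvertOptions
client = DatalabClient()
options = ConvertOptions(extras="track_changes", output_format="html")
result = client.convert("contract.docx", options=options)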

Checkpoint reuse

The new endpoints support a checkpoint system to avoid re-parsing documents. Convert once, then extract or segment multiple times:
from datalab_sdk import DatalabClient, ConvertOptions, ExtractOptions, SegmentOptions
import json

client = DatalabClient()

# Step 1: Convert and save checkpoint
options = ConvertOptions(save_checkpoint=True)
result = client.convert("document.pdf", options=options)
checkpoint_id = result.checkpoint_id

# Step 2: Extract using checkpoint (no re-parsing)
extract_opts = ExtractOptions(
    checkpoint_id=checkpoint_id,
    page_schema=json.dumps({"invoice_number": {"type": "string"}})
)
extracted = client.extract(options=extract_opts)

# Step 3: Segment using same checkpoint
segment_opts = SegmentOptions(
    checkpoint_id=checkpoint_id,
    segmentation_schema=json.dumps({"sections": ["Header", "Body", "Footer"]})
)
segmented = client.segment(options=segment_opts)
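If the same file is processed repeatedly within one process, you can avoid converting it more than once by caching its checkpoint ID. The sketch below is a hypothetical helper, not part of the SDK: it keys checkpoints by a SHA-256 hash of the file contents, and convert_fn is assumed to wrap the ConvertOptions(save_checkpoint=True) call from step 1. This guide does not say how long a checkpoint stays valid on the server, so a long-lived cache is an assumption to verify.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict


class CheckpointCache:
    """Hypothetical helper (not part of the SDK): converts each distinct file
    at most once per process and reuses its checkpoint_id afterwards."""

    def __init__(self, convert_fn: Callable[[str], str]) -> None:
        # convert_fn takes a file path and returns a checkpoint_id.
        self._convert_fn = convert_fn
        # Maps a SHA-256 digest of the file contents to its checkpoint_id.
        self._by_digest: Dict[str, str] = {}

    def checkpoint_for(self, path: str) -> str:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        if digest not in self._by_digest:
            # Only unseen content triggers a conversion (and a new checkpoint).
            self._by_digest[digest] = self._convert_fn(path)
        return self._by_digest[digest]


# Usage with the SDK (not executed here):
#   client = DatalabClient()
#   cache = CheckpointCache(
#       lambda p: client.convert(p, options=ConvertOptions(save_checkpoint=True)).checkpoint_id
#   )
#   checkpoint_id = cache.checkpoint_for("document.pdf")
```

Hashing the contents rather than the path means a renamed copy of the same file reuses its checkpoint, while an edited file triggers a fresh conversion.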

Workflows → Pipelines

The Workflows API (/api/v1/workflows) is deprecated. Use Pipelines for all new integrations and migrate existing workflows.
Pipelines replace Workflows with a simpler API, per-step status tracking, versioning, and a visual editor in Forge.
| Workflows | Pipelines |
|---|---|
| POST /api/v1/workflows/workflows | POST /api/v1/pipelines (via SDK: client.create_pipeline()) |
| POST /api/v1/workflows/workflows/{id}/execute | POST /api/v1/pipelines/{id}/run (via SDK: client.run_pipeline()) |
| GET /api/v1/workflows/executions/{id} | GET /api/v1/pipelines/executions/{id} (via SDK: client.get_pipeline_execution()) |
See Pipelines for a full walkthrough.
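To find call sites that still hit the Workflows routes, a quick scan of your source can list each legacy route next to its Pipelines replacement. This is a sketch: find_workflow_routes is a hypothetical helper that uses the route mapping from the table above, and it leaves {id} as a placeholder because this guide does not say whether a workflow ID can be reused as a pipeline ID.

```python
import re
from typing import List, Tuple

# Legacy Workflows routes and their Pipelines replacements (from the table above).
# IDs are matched loosely: word characters, dots, hyphens, or f-string braces.
WORKFLOW_ROUTES = [
    (re.compile(r"/api/v1/workflows/workflows/[\w{}.-]+/execute"), "POST /api/v1/pipelines/{id}/run"),
    (re.compile(r"/api/v1/workflows/executions/[\w{}.-]+"), "GET /api/v1/pipelines/executions/{id}"),
    # Bare create route: not followed by another path segment.
    (re.compile(r"/api/v1/workflows/workflows(?![/\w])"), "POST /api/v1/pipelines"),
]


def find_workflow_routes(source: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, legacy_route, replacement) for each legacy route found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, replacement in WORKFLOW_ROUTES:
            for match in pattern.finditer(line):
                hits.append((lineno, match.group(0), replacement))
    return hits
```

Run it over each source file (for example, find_workflow_routes(Path("client.py").read_text())) and work through the reported lines.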

Custom Pipeline → Custom Processor

POST /api/v1/custom-pipeline is deprecated (sunset: September 30, 2026). Migrate to POST /api/v1/custom-processor. The management routes /api/v1/custom_pipelines/* are also deprecated; use /api/v1/custom_processors/* instead.
from datalab_sdk import DatalabClient, CustomProcessorOptions

client = DatalabClient()
options = CustomProcessorOptions(pipeline_id="cp_XXXXX")
result = client.run_custom_processor("document.pdf", options=options)
The response format is identical. CustomPipelineOptions remains as a backward-compatible alias for CustomProcessorOptions.
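For code that builds URLs by hand, the renames described above amount to two path substitutions. A minimal sketch, assuming the path that follows /custom_pipelines/ carries over unchanged to /custom_processors/; rewrite_custom_pipeline_url is a hypothetical helper.

```python
import re

# Deprecated custom-pipeline routes and their custom-processor replacements.
# The lookahead stops partial matches such as "custom-pipelines" or "custom-pipeline-x".
_RENAMES = [
    (re.compile(r"/api/v1/custom-pipeline(?![\w-])"), "/api/v1/custom-processor"),
    # Assumption: everything after /custom_pipelines/ is kept as-is.
    (re.compile(r"/api/v1/custom_pipelines(?![\w-])"), "/api/v1/custom_processors"),
]


def rewrite_custom_pipeline_url(url: str) -> str:
    """Rewrite a deprecated custom-pipeline URL to its custom-processor equivalent."""
    for pattern, replacement in _RENAMES:
        url = pattern.sub(replacement, url)
    return url
```

URLs that do not use a deprecated route pass through unchanged, so the helper can be applied to every request URL during the transition.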

Table Recognition → Document Conversion

The standalone Table Recognition endpoint (/api/v1/table_rec) is deprecated. Use the Document Conversion endpoint with JSON output instead.

Before (deprecated)

# Old: Dedicated table recognition endpoint
import requests

API_KEY = "your-api-key"  # placeholder

with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://www.datalab.to/api/v1/table_rec",
        files={"file": ("doc.pdf", f, "application/pdf")},
        headers={"X-API-Key": API_KEY},
    )

After (current)

from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()

options = ConvertOptions(
    output_format="json",
    mode="balanced"
)

result = client.convert("document.pdf", options=options)

# Tables are in the JSON output with block_type "Table".
# Top-level children are page blocks, so look one level down.
for page in result.json.get("children", []):
    for block in page.get("children") or []:
        if block.get("block_type") == "Table":
            print(f"Table: {block['id']}")
            print(f"Bounding box: {block['bbox']}")
            # Access cells in block['children']
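To collect every table regardless of where it sits in the block tree (for example, nested under a page or under a grouping block), a recursive walk avoids hard-coding the depth. A sketch, assuming each block is a dict with optional block_type and children keys as in the JSON output above; iter_blocks is a hypothetical helper.

```python
from typing import Any, Dict, Iterator


def iter_blocks(block: Dict[str, Any], block_type: str) -> Iterator[Dict[str, Any]]:
    """Yield every block of the given block_type at any depth below `block`."""
    for child in block.get("children") or []:
        if child.get("block_type") == block_type:
            yield child
        # Keep descending so blocks nested inside other blocks are found too.
        yield from iter_blocks(child, block_type)
```

With a conversion result, tables = list(iter_blocks(result.json, "Table")) returns the table blocks in document order.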

OCR → Document Conversion

The standalone OCR endpoint (/api/v1/ocr) is deprecated. Use the Document Conversion endpoint instead, which includes OCR as part of its processing pipeline.

Before (deprecated)

# Old: Dedicated OCR endpoint
import requests

API_KEY = "your-api-key"  # placeholder

with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://www.datalab.to/api/v1/ocr",
        files={"file": ("doc.pdf", f, "application/pdf")},
        headers={"X-API-Key": API_KEY},
    )

After (current)

from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()

# For text extraction, use markdown output
result = client.convert("document.pdf")
print(result.markdown)

# For page-level text, use JSON output
options = ConvertOptions(output_format="json")
result = client.convert("document.pdf", options=options)
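To turn the JSON output into one text string per page, you can strip the markup from each block. This sketch assumes the top-level children of result.json are page blocks and that each block under a page carries its rendered content in an html field, as in Marker-style JSON; check both assumptions against a real response. page_texts and html_to_text are hypothetical helpers built on the standard library's html.parser.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List


class _TextCollector(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    parser = _TextCollector()
    parser.feed(fragment)
    parser.close()
    # Join text runs and collapse runs of whitespace to single spaces.
    return " ".join(" ".join(parser.parts).split())


def page_texts(doc: Dict[str, Any]) -> List[str]:
    """Return one plain-text string per page, one line per non-empty block."""
    pages = []
    for page in doc.get("children") or []:
        texts = [html_to_text(block.get("html") or "") for block in page.get("children") or []]
        pages.append("\n".join(t for t in texts if t))
    return pages
```

Calling page_texts(result.json) on the JSON result above gives a list indexed by page.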

Next Steps

Document Conversion: full guide to the current conversion API
Structured Extraction: extract structured data using JSON schemas
Changelog: see all API changes and deprecations
SDK Reference: use the SDK for the simplest migration path