Migration Guide - Datalab Documentation

This guide helps you migrate from deprecated Datalab API endpoints to their current replacements.

Marker → Dedicated Endpoints

The /api/v1/marker endpoint is deprecated. Migrate to the new dedicated endpoints below.

The monolithic /api/v1/marker endpoint has been replaced with dedicated endpoints for each operation:

Old Usage	New Endpoint	SDK Method
`/marker` (basic conversion)	`POST /api/v1/convert`	`client.convert()`
`/marker` with `page_schema`	`POST /api/v1/extract`	`client.extract()`
`/marker` with `segmentation_schema`	`POST /api/v1/segment`	`client.segment()`
`/marker` with `extras=track_changes`	`POST /api/v1/track-changes`	`client.track_changes()`
`/marker` with `pipeline_id`	`POST /api/v1/custom-processor`	`client.run_custom_processor()`
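When auditing existing integrations, the table above can be applied mechanically. The sketch below is a hypothetical helper (`route_marker_request` is not part of the SDK). It takes the form fields a legacy /marker request sent and returns the replacement endpoint and SDK method. The table does not say which endpoint wins when one request sets several of these fields, so the check order used here is an assumption.

```python
# Hypothetical migration helper: maps the form fields of a legacy /api/v1/marker
# request to the dedicated endpoint and SDK method from the table above.
# The check order (pipeline_id first, plain conversion last) is an assumption.

def route_marker_request(fields: dict) -> tuple:
    """Return (endpoint, sdk_method) for a legacy /marker request."""
    if fields.get("pipeline_id"):
        return "POST /api/v1/custom-processor", "client.run_custom_processor()"
    if fields.get("page_schema"):
        return "POST /api/v1/extract", "client.extract()"
    if fields.get("segmentation_schema"):
        return "POST /api/v1/segment", "client.segment()"
    # Tolerates a comma-separated extras value; whether the API accepted
    # several extras at once is an assumption.
    extras = {e.strip() for e in fields.get("extras", "").split(",") if e.strip()}
    if "track_changes" in extras:
        return "POST /api/v1/track-changes", "client.track_changes()"
    # Anything else was a basic conversion.
    return "POST /api/v1/convert", "client.convert()"


legacy = {"output_format": "markdown", "page_schema": '{"invoice_number": {"type": "string"}}'}
print(route_marker_request(legacy))  # ('POST /api/v1/extract', 'client.extract()')
```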

SDK upgrade

Update to the latest SDK for the new dedicated methods:

pip install --upgrade datalab-python-sdk

SDK users who only use client.convert() do not need to change code — it continues to work and now calls /api/v1/convert internally.

Document Conversion

# No changes needed — convert() works the same
from datalab_sdk import DatalabClient

client = DatalabClient()
result = client.convert("document.pdf")
print(result.markdown)

# Old
curl -X POST https://www.datalab.to/api/v1/marker \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"

# New
curl -X POST https://www.datalab.to/api/v1/convert \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"

Structured Extraction

# Old: page_schema on ConvertOptions
from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()
options = ConvertOptions(page_schema=schema)  # schema: your extraction schema, defined elsewhere
result = client.convert("invoice.pdf", options=options)

# New: Dedicated extract() method with ExtractOptions
from datalab_sdk import DatalabClient, ExtractOptions
import json

client = DatalabClient()
options = ExtractOptions(
    page_schema=json.dumps(schema)
)
result = client.extract("invoice.pdf", options=options)
extracted = json.loads(result.extraction_schema_json)
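Note that page_schema is passed as a JSON string and extraction_schema_json comes back as a string, so both directions go through the json module. Below is a minimal sketch of two hypothetical helpers for that round trip. Treating a missing or empty extraction_schema_json as "nothing extracted" is a choice made for this sketch, not documented SDK behavior.

```python
import json
from typing import Optional


def to_page_schema(schema: dict) -> str:
    """Serialize a schema dict for ExtractOptions(page_schema=...)."""
    return json.dumps(schema)


def parse_extraction(raw: Optional[str]) -> dict:
    """Parse result.extraction_schema_json into a dict.

    A missing or empty value is treated as "nothing extracted"
    (an assumption of this sketch, not documented SDK behavior).
    """
    if not raw:
        return {}
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data


# Usage with the SDK (requires an API key):
# options = ExtractOptions(page_schema=to_page_schema({"invoice_number": {"type": "string"}}))
# result = client.extract("invoice.pdf", options=options)
# extracted = parse_extraction(result.extraction_schema_json)
```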

Document Segmentation

# Old: segmentation_schema on ConvertOptions
import json
from datalab_sdk import DatalabClient, ConvertOptions
client = DatalabClient()
options = ConvertOptions(segmentation_schema=json.dumps(schema))
result = client.convert("document.pdf", options=options)

# New: Dedicated segment() method with SegmentOptions
from datalab_sdk import DatalabClient, SegmentOptions
import json

client = DatalabClient()
options = SegmentOptions(
    segmentation_schema=json.dumps(schema)
)
result = client.segment("document.pdf", options=options)
segments = result.segmentation_results

Track Changes

# Old: extras parameter on ConvertOptions
from datalab_sdk import DatalabClient, ConvertOptions
client = DatalabClient()
options = ConvertOptions(extras="track_changes", output_format="html")
result = client.convert("contract.docx", options=options)

# New: Dedicated track_changes() method
from datalab_sdk import DatalabClient, TrackChangesOptions

client = DatalabClient()
options = TrackChangesOptions(output_format="markdown,html,chunks")
result = client.track_changes("contract.docx", options=options)
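TrackChangesOptions takes multiple output formats as one comma-separated string. If that list is assembled at runtime, a small helper keeps the string clean. `join_output_formats` is hypothetical, and KNOWN_FORMATS contains only the format names that appear in this guide (markdown, html, json, chunks); whether every one of them is valid for track changes is an assumption to check against the API reference.

```python
# Hypothetical helper: build the comma-separated output_format string used by
# TrackChangesOptions(output_format="markdown,html,chunks").
# KNOWN_FORMATS lists only the formats named in this guide (an assumption).
KNOWN_FORMATS = ("markdown", "html", "json", "chunks")


def join_output_formats(*formats: str) -> str:
    cleaned = []
    for fmt in formats:
        name = fmt.strip().lower()
        if name not in KNOWN_FORMATS:
            raise ValueError(f"unknown output format: {fmt!r}")
        if name not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(name)
    return ",".join(cleaned)


print(join_output_formats("markdown", "html", "chunks"))  # markdown,html,chunks
```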

Checkpoint reuse

The new endpoints support a checkpoint system to avoid re-parsing documents. Convert once, then extract or segment multiple times:

from datalab_sdk import DatalabClient, ConvertOptions, ExtractOptions, SegmentOptions
import json

client = DatalabClient()

# Step 1: Convert and save checkpoint
options = ConvertOptions(save_checkpoint=True)
result = client.convert("document.pdf", options=options)
checkpoint_id = result.checkpoint_id

# Step 2: Extract using checkpoint (no re-parsing)
extract_opts = ExtractOptions(
    checkpoint_id=checkpoint_id,
    page_schema=json.dumps({"invoice_number": {"type": "string"}})
)
extracted = client.extract(options=extract_opts)

# Step 3: Segment using same checkpoint
segment_opts = SegmentOptions(
    checkpoint_id=checkpoint_id,
    segmentation_schema=json.dumps({"sections": ["Header", "Body", "Footer"]})
)
segmented = client.segment(options=segment_opts)
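Since a checkpoint lets later extract() and segment() calls skip parsing, an application that processes the same files repeatedly can remember the checkpoint_id per document. Below is a sketch of a hypothetical in-memory cache keyed by a SHA-256 hash of the file contents; `CheckpointCache` and its `convert_fn` argument are illustrative names, not SDK API. This guide does not say how long checkpoints stay valid on the server, so a cache that outlives the process would need an expiry policy you would have to confirm with Datalab.

```python
import hashlib
from typing import Callable, Dict


class CheckpointCache:
    """Hypothetical helper: parse each distinct document once, reuse its checkpoint_id."""

    def __init__(self, convert_fn: Callable[[str], str]) -> None:
        # convert_fn(path) is assumed to call client.convert(...) with
        # save_checkpoint=True and return result.checkpoint_id.
        self._convert_fn = convert_fn
        self._ids: Dict[str, str] = {}

    def checkpoint_for(self, path: str) -> str:
        # Key on file contents, so a renamed copy of the same PDF is not re-parsed.
        with open(path, "rb") as f:
            key = hashlib.sha256(f.read()).hexdigest()
        if key not in self._ids:
            self._ids[key] = self._convert_fn(path)  # only convert on a cache miss
        return self._ids[key]
```

In practice, convert_fn could wrap Step 1 above, e.g. `lambda p: client.convert(p, options=ConvertOptions(save_checkpoint=True)).checkpoint_id`, and the returned ID then goes into ExtractOptions(checkpoint_id=...) or SegmentOptions(checkpoint_id=...).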

Workflows → Pipelines

The Workflows API (/api/v1/workflows) is deprecated. Use Pipelines for all new integrations and migrate existing workflows.

Pipelines replace Workflows with a simpler API, per-step status tracking, versioning, and a visual editor in Forge.

Workflows	Pipelines
`POST /api/v1/workflows/workflows`	`POST /api/v1/pipelines` (via SDK: `client.create_pipeline()`)
`POST /api/v1/workflows/workflows/{id}/execute`	`POST /api/v1/pipelines/{id}/run` (via SDK: `client.run_pipeline()`)
`GET /api/v1/workflows/executions/{id}`	`GET /api/v1/pipelines/executions/{id}` (via SDK: `client.get_pipeline_execution()`)

See Pipelines for a full walkthrough.
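The Workflows-to-Pipelines mapping is route-for-route, which makes legacy calls easy to flag in logs or code. The hypothetical helper below classifies a concrete Workflows request and returns the matching Pipelines route template from the table. It deliberately returns `{id}` as a placeholder instead of copying the old ID across, because this guide does not say whether workflow IDs carry over to pipelines; assume you need the ID returned by `client.create_pipeline()` or by the run.

```python
import re
from typing import Optional

# Legacy Workflows routes (method, path regex) -> Pipelines route template,
# taken from the table above. IDs stay as {id} placeholders on purpose.
_ROUTES = (
    ("POST", re.compile(r"/api/v1/workflows/workflows/?"), "POST /api/v1/pipelines"),
    ("POST", re.compile(r"/api/v1/workflows/workflows/[^/]+/execute"), "POST /api/v1/pipelines/{id}/run"),
    ("GET", re.compile(r"/api/v1/workflows/executions/[^/]+"), "GET /api/v1/pipelines/executions/{id}"),
)


def pipeline_route_for(method: str, path: str) -> Optional[str]:
    """Return the Pipelines route template for a legacy Workflows call, or None."""
    for legacy_method, pattern, template in _ROUTES:
        if method.upper() == legacy_method and pattern.fullmatch(path):
            return template
    return None


print(pipeline_route_for("POST", "/api/v1/workflows/workflows/wf_123/execute"))  # POST /api/v1/pipelines/{id}/run
```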

Custom Pipeline → Custom Processor

POST /api/v1/custom-pipeline is deprecated (sunset: September 30, 2026). Migrate to POST /api/v1/custom-processor. The management routes /api/v1/custom_pipelines/* are also deprecated; use /api/v1/custom_processors/* instead.

# Old: run_custom_pipeline() (deprecated)
from datalab_sdk import DatalabClient, CustomProcessorOptions

client = DatalabClient()
options = CustomProcessorOptions(pipeline_id="cp_XXXXX")
result = client.run_custom_pipeline("document.pdf", options=options)

# New: run_custom_processor()
from datalab_sdk import DatalabClient, CustomProcessorOptions

client = DatalabClient()
options = CustomProcessorOptions(pipeline_id="cp_XXXXX")
result = client.run_custom_processor("document.pdf", options=options)

# Old
curl -X POST https://www.datalab.to/api/v1/custom-pipeline \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@document.pdf" \
  -F "pipeline_id=cp_XXXXX"

# New
curl -X POST https://www.datalab.to/api/v1/custom-processor \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@document.pdf" \
  -F "pipeline_id=cp_XXXXX"

Both endpoints return the same response format. CustomPipelineOptions remains a backward-compatible alias for CustomProcessorOptions.
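The renames are one-for-one (the pipeline_id field and the response format do not change), so URLs in configuration can be rewritten mechanically. Below is a minimal sketch of a hypothetical helper. It assumes the sub-paths under /api/v1/custom_processors/ mirror those under /api/v1/custom_pipelines/, which the wildcard mapping above implies but does not spell out.

```python
import re

# Hypothetical rewrite rules for the deprecated custom pipeline routes.
# The rest of the URL (sub-path, query string) is kept unchanged.
_RENAMES = (
    (re.compile(r"/api/v1/custom-pipeline(?=$|[?#])"), "/api/v1/custom-processor"),
    (re.compile(r"/api/v1/custom_pipelines(?=$|[/?#])"), "/api/v1/custom_processors"),
)


def migrate_custom_pipeline_url(url: str) -> str:
    for pattern, replacement in _RENAMES:
        url = pattern.sub(replacement, url)
    return url


print(migrate_custom_pipeline_url("https://www.datalab.to/api/v1/custom_pipelines/cp_XXXXX"))
# https://www.datalab.to/api/v1/custom_processors/cp_XXXXX
```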

Table Recognition → Document Conversion

The standalone Table Recognition endpoint (/api/v1/table_rec) is deprecated. Use the Document Conversion endpoint with JSON output instead.

Before (deprecated)

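# Old: Dedicated table recognition endpoint
import os
import requests

API_KEY = os.environ["DATALAB_API_KEY"]
with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://www.datalab.to/api/v1/table_rec",
        files={"file": ("doc.pdf", f, "application/pdf")},
        headers={"X-API-Key": API_KEY},
    )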

After (current)

from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()

options = ConvertOptions(
    output_format="json",
    mode="balanced"
)

result = client.convert("document.pdf", options=options)

# Tables are nested under pages in the JSON output, with block_type "Table"
for page in result.json.get("children", []):
    for block in page.get("children") or []:
        if block.get("block_type") == "Table":
            print(f"Table: {block['id']}")
            print(f"Bounding box: {block['bbox']}")
            # Access cells in block['children']

curl -X POST https://www.datalab.to/api/v1/convert \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@document.pdf" \
  -F "output_format=json" \
  -F "mode=balanced"
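Depending on the document, a table may not sit at a fixed depth in the JSON output; it can be nested inside a grouping block (for example, a table grouped with its caption). A recursive walk avoids hard-coding the depth. This is a sketch assuming each block is a dict with a block_type and an optional children list, as in the loop above; `iter_blocks` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, Iterator


def iter_blocks(node: Dict[str, Any], block_type: str) -> Iterator[Dict[str, Any]]:
    """Depth-first walk over the JSON output, yielding every block of block_type."""
    if node.get("block_type") == block_type:
        yield node
    # Leaf blocks may have children set to None rather than an empty list.
    for child in node.get("children") or []:
        yield from iter_blocks(child, block_type)


# Usage, with result from client.convert(..., options=ConvertOptions(output_format="json")):
# for table in iter_blocks(result.json, "Table"):
#     print(table["id"], table["bbox"])
```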

OCR → Document Conversion

The standalone OCR endpoint (/api/v1/ocr) is deprecated. Use the Document Conversion endpoint instead, which includes OCR as part of its processing pipeline.

Before (deprecated)

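# Old: Dedicated OCR endpoint
import os
import requests

API_KEY = os.environ["DATALAB_API_KEY"]
with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://www.datalab.to/api/v1/ocr",
        files={"file": ("doc.pdf", f, "application/pdf")},
        headers={"X-API-Key": API_KEY},
    )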

After (current)

from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()

# For text extraction, use markdown output
result = client.convert("document.pdf")
print(result.markdown)

# For page-level text, use JSON output
options = ConvertOptions(output_format="json")
result = client.convert("document.pdf", options=options)

curl -X POST https://www.datalab.to/api/v1/convert \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"
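For page-level text from JSON output, the per-block HTML has to be flattened. Below is a sketch assuming, as in Marker's JSON output, that the top-level children are pages, each block carries an html field, and container blocks list their contents under children. `page_texts` and its helpers are hypothetical, and the tag stripping uses only the standard library's html.parser, so layout such as table structure is lost.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Join text nodes with spaces so words split by tags (e.g. <br>) don't
    # run together, then collapse runs of whitespace.
    return " ".join(" ".join(collector.parts).split())


def leaf_texts(block: Dict[str, Any]) -> List[str]:
    """Text of every leaf block under `block`, in document order."""
    children = block.get("children")
    if children:
        texts: List[str] = []
        for child in children:
            texts.extend(leaf_texts(child))
        return texts
    text = html_to_text(block.get("html") or "")
    return [text] if text else []


def page_texts(doc: Dict[str, Any]) -> List[str]:
    """One plain-text string per page (top-level child) of the JSON output."""
    return ["\n".join(leaf_texts(page)) for page in doc.get("children") or []]
```

With the SDK, `page_texts(result.json)` would give one string per page for a result converted with output_format="json".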

Next Steps

Document Conversion

Full guide to the current conversion API

Structured Extraction

Extract structured data using JSON schemas

Changelog

See all API changes and deprecations

SDK Reference

Use the SDK for the simplest migration path
