This guide helps you migrate from deprecated Datalab API endpoints to their current replacements.

Marker → Dedicated Endpoints

The monolithic /api/v1/marker endpoint is deprecated. It has been replaced with a dedicated endpoint for each operation:
| Old Usage | New Endpoint | SDK Method |
| --- | --- | --- |
| /marker (basic conversion) | POST /api/v1/convert | client.convert() |
| /marker with page_schema | POST /api/v1/extract | client.extract() |
| /marker with segmentation_schema | POST /api/v1/segment | client.segment() |
| /marker with extras=track_changes | POST /api/v1/track-changes | client.track_changes() |
| /marker with pipeline_id | POST /api/v1/custom-pipeline | client.run_custom_pipeline() |
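
When auditing existing /marker calls, the mapping above can be written as a small lookup. This is only a sketch: `pick_replacement` is a hypothetical helper, not part of the SDK. The order in which parameters are checked and the handling of `extras` as a comma-separated string are both assumptions.

```python
from typing import Mapping, Tuple


def pick_replacement(old_params: Mapping[str, object]) -> Tuple[str, str]:
    """Return (new endpoint, SDK method) for a legacy /marker request.

    Hypothetical helper. The check order below is an assumption for
    requests that set more than one of these parameters.
    """
    # Assumption: extras is a comma-separated string such as "track_changes".
    extras = str(old_params.get("extras") or "")
    extras_set = {item.strip() for item in extras.split(",") if item.strip()}

    if old_params.get("pipeline_id"):
        return "POST /api/v1/custom-pipeline", "client.run_custom_pipeline()"
    if old_params.get("page_schema"):
        return "POST /api/v1/extract", "client.extract()"
    if old_params.get("segmentation_schema"):
        return "POST /api/v1/segment", "client.segment()"
    if "track_changes" in extras_set:
        return "POST /api/v1/track-changes", "client.track_changes()"
    # No special parameters: plain conversion.
    return "POST /api/v1/convert", "client.convert()"
```

Passing the keyword arguments of an old call, for example `pick_replacement({"page_schema": "..."})`, returns the row of the table that applies to it.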

SDK upgrade

Update to SDK v0.3.0 for the new dedicated methods:
pip install --upgrade datalab-python-sdk
If you only call client.convert(), no code changes are needed: it continues to work and now calls /api/v1/convert internally.

Document Conversion

# No changes needed — convert() works the same
from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()
result = client.convert("document.pdf")
print(result.markdown)

Structured Extraction

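# Old: page_schema on ConvertOptions (deprecated; use client.extract() instead)
from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()
# `schema` is your existing extraction schema
options = ConvertOptions(page_schema=schema)
result = client.convert("invoice.pdf", options=options)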
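
For comparison, the replacement call might look like the sketch below. `ExtractOptions(page_schema=...)` with a JSON string and `client.extract(options=...)` both appear in the checkpoint workflow later in this guide. Passing the file path as the first argument to `client.extract()` is an assumption carried over from `client.convert()`, and `extract_fields` is a hypothetical wrapper.

```python
import json


def extract_fields(client, file_path, schema):
    """Run structured extraction through the dedicated extract method.

    Assumption (not confirmed by this guide): client.extract() accepts the
    file path as its first positional argument, like client.convert().
    """
    # Imported lazily so the sketch can be read without the SDK installed.
    from datalab_sdk import ExtractOptions  # available from SDK v0.3.0

    # The schema is serialized to a JSON string, as in the checkpoint example.
    options = ExtractOptions(page_schema=json.dumps(schema))
    return client.extract(file_path, options=options)
```

With a real client this would be called as `extract_fields(DatalabClient(), "invoice.pdf", schema)`.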

Document Segmentation

# Old: segmentation_schema on ConvertOptions (deprecated; use client.segment() instead)
import json
from datalab_sdk import ConvertOptions

options = ConvertOptions(segmentation_schema=json.dumps(schema))
result = client.convert("document.pdf", options=options)

Track Changes

# Old: extras parameter on ConvertOptions (deprecated; use client.track_changes() instead)
options = ConvertOptions(extras="track_changes", output_format="html")
result = client.convert("contract.docx", options=options)

Checkpoint workflow

The new endpoints support a checkpoint system to avoid re-parsing documents. Convert once, then extract or segment multiple times:
from datalab_sdk import DatalabClient, ConvertOptions, ExtractOptions, SegmentOptions
import json

client = DatalabClient()

# Step 1: Convert and save checkpoint
options = ConvertOptions(save_checkpoint=True)
result = client.convert("document.pdf", options=options)
checkpoint_id = result.checkpoint_id

# Step 2: Extract using checkpoint (no re-parsing)
extract_opts = ExtractOptions(
    checkpoint_id=checkpoint_id,
    page_schema=json.dumps({"invoice_number": {"type": "string"}})
)
extracted = client.extract(options=extract_opts)

# Step 3: Segment using same checkpoint
segment_opts = SegmentOptions(
    checkpoint_id=checkpoint_id,
    segmentation_schema=json.dumps({"sections": ["Header", "Body", "Footer"]})
)
segmented = client.segment(options=segment_opts)
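
Because the checkpoint is reusable, several schemas can be run against one parsed document in a loop. The helper below is a hypothetical sketch, not an SDK function. It takes the extraction call as a parameter (`run_extract`) so that it stays independent of the SDK.

```python
import json
from typing import Any, Callable, Dict


def extract_many(
    run_extract: Callable[[str, str], Any],
    checkpoint_id: str,
    schemas: Dict[str, dict],
) -> Dict[str, Any]:
    """Apply each named schema to the same checkpoint and collect the results."""
    results: Dict[str, Any] = {}
    for name, schema in schemas.items():
        # Serialize each schema to a JSON string, as the workflow above does.
        results[name] = run_extract(checkpoint_id, json.dumps(schema))
    return results
```

With the SDK, `run_extract` could be `lambda cid, s: client.extract(options=ExtractOptions(checkpoint_id=cid, page_schema=s))`, which mirrors step 2 above. The document is parsed once and each additional schema reuses the saved checkpoint.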

Table Recognition → Document Conversion

The standalone Table Recognition endpoint (/api/v1/table_rec) is deprecated. Use the Document Conversion endpoint with JSON output instead.

Before (deprecated)

# Old: Dedicated table recognition endpoint
import requests

with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://www.datalab.to/api/v1/table_rec",
        files={"file": ("doc.pdf", f, "application/pdf")},
        headers={"X-API-Key": API_KEY},  # API_KEY: your Datalab API key
    )

After (current)

from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()

options = ConvertOptions(
    output_format="json",
    mode="balanced"
)

result = client.convert("document.pdf", options=options)

# Tables are in the JSON output with block_type "Table"
for block in result.json.get("children", []):
    if block.get("block_type") == "Table":
        print(f"Table: {block['id']}")
        print(f"Bounding box: {block['bbox']}")
        # Access cells in block['children']
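
The loop above only inspects the top-level children. If tables are nested deeper in the JSON tree (for example under per-page blocks, which is an assumption about the output shape), a recursive walk finds them at any depth. The sketch relies only on the `block_type`, `id`, `bbox`, and `children` keys used above. The sample data, including the "Document", "Page", and "Text" block types, is made up for illustration.

```python
from typing import Any, Dict, Iterator


def iter_tables(block: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every block whose block_type is "Table", at any nesting depth."""
    if block.get("block_type") == "Table":
        yield block
    # `children` may be missing or None on leaf blocks.
    for child in block.get("children") or []:
        yield from iter_tables(child)


# Hypothetical sample; real output will contain more fields.
sample = {
    "block_type": "Document",
    "children": [
        {
            "block_type": "Page",
            "id": "page-0",
            "children": [
                {"block_type": "Text", "id": "text-0", "children": None},
                {"block_type": "Table", "id": "table-0", "bbox": [0, 0, 100, 50], "children": []},
            ],
        },
        {"block_type": "Table", "id": "table-1", "bbox": [0, 0, 10, 10]},
    ],
}

table_ids = [table["id"] for table in iter_tables(sample)]
```

With the SDK result from above, the call would be `iter_tables(result.json)`.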

OCR → Document Conversion

The standalone OCR endpoint (/api/v1/ocr) is deprecated. Use the Document Conversion endpoint instead, which includes OCR as part of its processing pipeline.

Before (deprecated)

# Old: Dedicated OCR endpoint
import requests

with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://www.datalab.to/api/v1/ocr",
        files={"file": ("doc.pdf", f, "application/pdf")},
        headers={"X-API-Key": API_KEY},  # API_KEY: your Datalab API key
    )

After (current)

from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()

# For text extraction, use markdown output
result = client.convert("document.pdf")
print(result.markdown)

# For page-level text, use JSON output
options = ConvertOptions(output_format="json")
result = client.convert("document.pdf", options=options)
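
One way to turn the JSON output into per-page text is sketched below. It rests on two assumptions that this guide does not confirm: that each top-level child of the JSON output is a page, and that leaf blocks carry their content in an `html` field. Tag stripping here is a crude regular expression, not a full HTML parser, and both helper names are hypothetical.

```python
import re
from html import unescape
from typing import Any, Dict, List

_TAG = re.compile(r"<[^>]+>")


def block_text(block: Dict[str, Any]) -> List[str]:
    """Collect the text of all leaf blocks beneath `block`, in document order."""
    children = block.get("children") or []
    if not children:
        # Assumption: leaf blocks keep their content in an "html" field.
        raw = block.get("html") or ""
        text = " ".join(unescape(_TAG.sub(" ", raw)).split())
        return [text] if text else []
    parts: List[str] = []
    for child in children:
        parts.extend(block_text(child))
    return parts


def text_by_page(doc: Dict[str, Any]) -> List[str]:
    """Return one text string per top-level child (assumed to be a page)."""
    return [" ".join(block_text(page)) for page in doc.get("children") or []]


# Hypothetical sample shaped like the assumptions above.
sample_doc = {
    "children": [
        {
            "block_type": "Page",
            "children": [
                {"block_type": "Text", "html": "<p>Hello &amp; welcome</p>"},
                {"block_type": "Text", "html": "<p>Second <b>line</b></p>"},
            ],
        },
        {"block_type": "Page", "children": []},
    ]
}

pages = text_by_page(sample_doc)
```

With the SDK result from above, the call would be `text_by_page(result.json)`.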

Next Steps