This guide helps you migrate from deprecated Datalab API endpoints to their current replacements.
Marker → Dedicated Endpoints
The monolithic /api/v1/marker endpoint is deprecated. It has been replaced with dedicated endpoints for each operation; migrate to the one that matches your usage:
| Old Usage | New Endpoint | SDK Method |
| --- | --- | --- |
| /marker (basic conversion) | POST /api/v1/convert | client.convert() |
| /marker with page_schema | POST /api/v1/extract | client.extract() |
| /marker with segmentation_schema | POST /api/v1/segment | client.segment() |
| /marker with extras=track_changes | POST /api/v1/track-changes | client.track_changes() |
| /marker with pipeline_id | POST /api/v1/custom-processor | client.run_custom_processor() |
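When auditing existing /marker calls, the mapping in the table can be expressed as a small lookup. The following is only a sketch: `pick_endpoint` is a hypothetical helper, not part of the Datalab SDK, and the order in which parameters are checked is an assumption, since this guide does not say what happens when an old request combined several of them.

```python
# Hypothetical helper (not part of the Datalab SDK): maps the parameters of an
# old /marker request to the replacement endpoint listed in the table.

def pick_endpoint(params: dict) -> str:
    """Return the replacement endpoint path for an old /marker request."""
    # Assumption: extras is a comma-separated string, e.g. "track_changes".
    extras = [e.strip() for e in str(params.get("extras") or "").split(",")]
    # Assumption: the check order below is arbitrary; precedence is not documented.
    if params.get("pipeline_id"):
        return "/api/v1/custom-processor"
    if params.get("page_schema"):
        return "/api/v1/extract"
    if params.get("segmentation_schema"):
        return "/api/v1/segment"
    if "track_changes" in extras:
        return "/api/v1/track-changes"
    return "/api/v1/convert"


print(pick_endpoint({"page_schema": "{...}"}))
print(pick_endpoint({"extras": "track_changes"}))
print(pick_endpoint({}))
```

Running this over logged request parameters gives a quick inventory of which new endpoints an integration will need.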
SDK upgrade
Update to the latest SDK for the new dedicated methods:
pip install --upgrade datalab-python-sdk
SDK users who only use client.convert() do not need to change code — it continues to work and now calls /api/v1/convert internally.
Document Conversion
Python SDK
# No changes needed: convert() works the same
from datalab_sdk import DatalabClient, ConvertOptions
client = DatalabClient()
result = client.convert("document.pdf")
print(result.markdown)
Structured Extraction
Python SDK (before)
# Old: page_schema on ConvertOptions
from datalab_sdk import DatalabClient, ConvertOptions
client = DatalabClient()
schema = {"invoice_number": {"type": "string"}}  # example schema
options = ConvertOptions(page_schema=schema)
result = client.convert("invoice.pdf", options=options)
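For the new usage, the schema moves from ConvertOptions to ExtractOptions and the call becomes client.extract(), as in the checkpoint example later in this guide. The sketch below only covers building the schema string. `build_page_schema` is a hypothetical helper, and the field-to-type shape mirrors the `{"invoice_number": {"type": "string"}}` example used in that section. Whether client.extract() also accepts a file path directly, as convert() does, is an assumption and is shown only in a comment.

```python
import json


def build_page_schema(fields: dict) -> str:
    """Serialize {field_name: json_type} into a page_schema JSON string."""
    return json.dumps({name: {"type": json_type} for name, json_type in fields.items()})


schema_json = build_page_schema({"invoice_number": "string", "total": "number"})
print(schema_json)

# Assumed usage with the SDK (not executed here):
#   options = ExtractOptions(page_schema=schema_json)
#   result = client.extract("invoice.pdf", options=options)  # file-path argument is an assumption
```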
Document Segmentation
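Python SDK (before)
# Old: segmentation_schema on ConvertOptions (schema is your segmentation schema dict)
import json
from datalab_sdk import DatalabClient, ConvertOptions
client = DatalabClient()
options = ConvertOptions(segmentation_schema=json.dumps(schema))
result = client.convert("document.pdf", options=options)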
Track Changes
Python SDK (before)
# Old: extras parameter on ConvertOptions
from datalab_sdk import DatalabClient, ConvertOptions
client = DatalabClient()
options = ConvertOptions(extras="track_changes", output_format="html")
result = client.convert("contract.docx", options=options)
Checkpoint reuse
The new endpoints support a checkpoint system to avoid re-parsing documents. Convert once, then extract or segment multiple times:
from datalab_sdk import DatalabClient, ConvertOptions, ExtractOptions, SegmentOptions
import json

client = DatalabClient()

# Step 1: Convert and save checkpoint
options = ConvertOptions(save_checkpoint=True)
result = client.convert("document.pdf", options=options)
checkpoint_id = result.checkpoint_id

# Step 2: Extract using checkpoint (no re-parsing)
extract_opts = ExtractOptions(
    checkpoint_id=checkpoint_id,
    page_schema=json.dumps({"invoice_number": {"type": "string"}}),
)
extracted = client.extract(options=extract_opts)

# Step 3: Segment using same checkpoint
segment_opts = SegmentOptions(
    checkpoint_id=checkpoint_id,
    segmentation_schema=json.dumps({"sections": ["Header", "Body", "Footer"]}),
)
segmented = client.segment(options=segment_opts)
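The convert-once pattern can be wrapped in a small cache so that repeated extract or segment calls for the same file reuse one checkpoint. This is a minimal sketch: `checkpoint_for` and `fake_convert` are hypothetical names, and the stand-in converter exists only so the example runs without the SDK. It also assumes a checkpoint stays valid for as long as the cache lives, which this guide does not state.

```python
from typing import Callable, Dict


def checkpoint_for(path: str, cache: Dict[str, str], convert: Callable[[str], str]) -> str:
    """Return a cached checkpoint ID for path, converting only on first use."""
    if path not in cache:
        cache[path] = convert(path)
    return cache[path]


# Stand-in for a real conversion. With the SDK, this function would return
# client.convert(path, options=ConvertOptions(save_checkpoint=True)).checkpoint_id
calls = []


def fake_convert(path: str) -> str:
    calls.append(path)
    return f"ckpt-{len(calls)}"


cache: Dict[str, str] = {}
first = checkpoint_for("document.pdf", cache, fake_convert)
second = checkpoint_for("document.pdf", cache, fake_convert)
print(first == second, len(calls))
```

The second lookup returns the stored ID without calling the converter again, which is the behavior the checkpoint system is meant to enable.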
Workflows → Pipelines
The Workflows API (/api/v1/workflows) is deprecated. Use Pipelines for all new integrations and migrate existing workflows.
Pipelines replace Workflows with a simpler API, per-step status tracking, versioning, and a visual editor in Forge.
| Workflows | Pipelines |
| --- | --- |
| POST /api/v1/workflows/workflows | POST /api/v1/pipelines (via SDK: client.create_pipeline()) |
| POST /api/v1/workflows/workflows/{id}/execute | POST /api/v1/pipelines/{id}/run (via SDK: client.run_pipeline()) |
| GET /api/v1/workflows/executions/{id} | GET /api/v1/pipelines/executions/{id} (via SDK: client.get_pipeline_execution()) |
See Pipelines for a full walkthrough.
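To locate Workflows calls in an existing codebase, the route mapping above can be written as a set of rewrite rules. This is a sketch only: `to_pipeline_path` is a hypothetical helper, and it translates paths alone. Request and response bodies may differ between the two APIs, which this guide does not cover.

```python
import re
from typing import Optional

# Rewrite rules taken from the Workflows -> Pipelines route table.
_RULES = [
    (re.compile(r"^/api/v1/workflows/workflows/([^/]+)/execute$"), r"/api/v1/pipelines/\1/run"),
    (re.compile(r"^/api/v1/workflows/executions/([^/]+)$"), r"/api/v1/pipelines/executions/\1"),
    (re.compile(r"^/api/v1/workflows/workflows$"), "/api/v1/pipelines"),
]


def to_pipeline_path(path: str) -> Optional[str]:
    """Return the Pipelines route for a Workflows route, or None if unmapped."""
    for pattern, replacement in _RULES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return None


print(to_pipeline_path("/api/v1/workflows/workflows/42/execute"))
print(to_pipeline_path("/api/v1/workflows/executions/abc"))
```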
Custom Pipeline → Custom Processor
POST /api/v1/custom-pipeline is deprecated (sunset: September 30, 2026). Migrate to POST /api/v1/custom-processor. The management routes /api/v1/custom_pipelines/* are also deprecated; use /api/v1/custom_processors/* instead.
Python SDK (before)
# Before: run_custom_pipeline() targets the deprecated endpoint.
# The replacement SDK method listed in the endpoint table is client.run_custom_processor().
from datalab_sdk import DatalabClient, CustomProcessorOptions
client = DatalabClient()
options = CustomProcessorOptions(pipeline_id="cp_XXXXX")
result = client.run_custom_pipeline("document.pdf", options=options)
The response format is identical. CustomPipelineOptions remains as a backward-compatible alias for CustomProcessorOptions.
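For raw HTTP integrations, the route changes and the sunset date can be checked with a few lines of code. The helpers below are hypothetical and not part of the SDK. They cover only the two route families named in this section and use the stated sunset date of September 30, 2026.

```python
from datetime import date

SUNSET = date(2026, 9, 30)  # sunset date stated for POST /api/v1/custom-pipeline


def migrate_custom_route(path: str) -> str:
    """Rewrite deprecated custom-pipeline routes; return other paths unchanged."""
    if path == "/api/v1/custom-pipeline":
        return "/api/v1/custom-processor"
    prefix = "/api/v1/custom_pipelines"
    if path == prefix or path.startswith(prefix + "/"):
        return "/api/v1/custom_processors" + path[len(prefix):]
    return path


def days_until_sunset(today: date) -> int:
    """Days remaining before the sunset date (negative once it has passed)."""
    return (SUNSET - today).days


print(migrate_custom_route("/api/v1/custom_pipelines/cp_XXXXX"))
print(days_until_sunset(date.today()))
```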
Table Recognition → Document Conversion
The standalone Table Recognition endpoint (/api/v1/table_rec) is deprecated. Use the Document Conversion endpoint with JSON output instead.
Before (deprecated)
# Old: Dedicated table recognition endpoint
import requests
API_KEY = "YOUR_API_KEY"  # replace with your Datalab API key
with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://www.datalab.to/api/v1/table_rec",
        files={"file": ("doc.pdf", f, "application/pdf")},
        headers={"X-API-Key": API_KEY},
    )
After (current)
from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()
options = ConvertOptions(
    output_format="json",
    mode="balanced",
)
result = client.convert("document.pdf", options=options)

# Tables are in the JSON output with block_type "Table"
for block in result.json.get("children", []):
    if block.get("block_type") == "Table":
        print(f"Table: {block['id']}")
        print(f"Bounding box: {block['bbox']}")
        # Access cells in block['children']
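The loop above inspects only the top-level children. If tables sit deeper in the tree, for example under page blocks, a recursive walk finds them at any depth. This is a sketch under assumptions: `find_tables` is a hypothetical helper, and the sample dictionary is made up to resemble the JSON output described here (blocks with block_type, id, bbox, and children), not taken from a real response.

```python
from typing import Iterator


def find_tables(block: dict) -> Iterator[dict]:
    """Yield every block whose block_type is "Table", at any nesting depth."""
    if block.get("block_type") == "Table":
        yield block
    # children may be missing or None on leaf blocks
    for child in block.get("children") or []:
        yield from find_tables(child)


# Made-up sample shaped like the JSON output; real IDs and nesting may differ.
sample = {
    "block_type": "Document",
    "children": [
        {
            "block_type": "Page",
            "id": "/page/0/Page/0",
            "children": [
                {"block_type": "Text", "id": "/page/0/Text/0", "children": None},
                {"block_type": "Table", "id": "/page/0/Table/1", "bbox": [0, 0, 100, 50], "children": []},
            ],
        }
    ],
}

for table in find_tables(sample):
    print(table["id"], table["bbox"])
```

With the SDK, the same function would be called as find_tables(result.json).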
OCR → Document Conversion
The standalone OCR endpoint (/api/v1/ocr) is deprecated. Use the Document Conversion endpoint instead, which includes OCR as part of its processing pipeline.
Before (deprecated)
# Old: Dedicated OCR endpoint
import requests
API_KEY = "YOUR_API_KEY"  # replace with your Datalab API key
with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://www.datalab.to/api/v1/ocr",
        files={"file": ("doc.pdf", f, "application/pdf")},
        headers={"X-API-Key": API_KEY},
    )
After (current)
from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()

# For text extraction, use markdown output
result = client.convert("document.pdf")
print(result.markdown)

# For page-level text, use JSON output
options = ConvertOptions(output_format="json")
result = client.convert("document.pdf", options=options)
Next Steps
Document Conversion: Full guide to the current conversion API
Structured Extraction: Extract structured data using JSON schemas
Changelog: See all API changes and deprecations
SDK Reference: Use the SDK for the simplest migration path