This guide helps you migrate from deprecated Datalab API endpoints to their current replacements.
Marker → Dedicated Endpoints
The monolithic /api/v1/marker endpoint is deprecated. It has been replaced with a dedicated endpoint for each operation:
Old Usage | New Endpoint | SDK Method
/marker (basic conversion) | POST /api/v1/convert | client.convert()
/marker with page_schema | POST /api/v1/extract | client.extract()
/marker with segmentation_schema | POST /api/v1/segment | client.segment()
/marker with extras=track_changes | POST /api/v1/track-changes | client.track_changes()
/marker with pipeline_id | POST /api/v1/custom-pipeline | client.run_custom_pipeline()
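When auditing existing /marker calls, it can help to encode the mapping above as a small lookup. The sketch below is a hypothetical helper, not part of the SDK. It assumes that extras may be a comma-separated string and that, when several legacy parameters are present, pipeline_id takes precedence over segmentation_schema and page_schema; neither assumption is confirmed by this guide.

```python
# Hypothetical helper (not part of the Datalab SDK): pick the dedicated
# endpoint that replaces a legacy /api/v1/marker request.

# Checked in order; the precedence is an assumption made for this sketch.
LEGACY_PARAM_TO_ENDPOINT = [
    ("pipeline_id", "/api/v1/custom-pipeline"),
    ("segmentation_schema", "/api/v1/segment"),
    ("page_schema", "/api/v1/extract"),
]


def replacement_endpoint(params: dict) -> str:
    """Return the new endpoint path for the given legacy /marker parameters."""
    # Assumption: extras can hold several comma-separated values.
    extras = str(params.get("extras") or "")
    if "track_changes" in [item.strip() for item in extras.split(",")]:
        return "/api/v1/track-changes"
    for key, endpoint in LEGACY_PARAM_TO_ENDPOINT:
        if params.get(key):
            return endpoint
    # No special parameters: plain conversion.
    return "/api/v1/convert"
```

For example, replacement_endpoint({"page_schema": "{...}"}) returns "/api/v1/extract", and a request with no special parameters maps to "/api/v1/convert".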
SDK upgrade
Update to SDK v0.3.0 for the new dedicated methods:
pip install --upgrade datalab-python-sdk
SDK users who only use client.convert() do not need to change code — it continues to work and now calls /api/v1/convert internally.
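To confirm that an environment has picked up the upgrade, the installed version can be compared against 0.3.0. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (0, 3, 0)  # SDK version named in this guide


def parse_version(text: str) -> tuple:
    """Turn '0.3.1' into (0, 3, 1), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_dedicated_methods(dist_name: str = "datalab-python-sdk") -> bool:
    """Return True if the installed SDK is at least MIN_VERSION."""
    try:
        return parse_version(version(dist_name)) >= MIN_VERSION
    except PackageNotFoundError:
        # SDK is not installed in this environment.
        return False
```

Calling has_dedicated_methods() at startup gives an early, readable failure instead of an AttributeError on client.extract() later.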
Document Conversion
Python SDK
# No changes needed: convert() works the same
from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()
result = client.convert("document.pdf")
print(result.markdown)
Structured Extraction
Python SDK (before)
# Old: page_schema on ConvertOptions
from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()
# schema is your extraction schema, defined elsewhere
options = ConvertOptions(page_schema=schema)
result = client.convert("invoice.pdf", options=options)
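After migration, the same request goes through client.extract() with ExtractOptions, both of which appear in the checkpoint workflow later in this guide. The sketch below is an assumption-laden illustration: build_page_schema and extract_fields are hypothetical helpers, and passing the file path as the first argument to extract() is assumed by analogy with convert(), not confirmed here.

```python
import json


def build_page_schema(fields: dict) -> str:
    """Serialize {"field": "type"} pairs into the JSON shape used in this guide."""
    return json.dumps({name: {"type": kind} for name, kind in fields.items()})


def extract_fields(path: str, fields: dict):
    """Run extraction on a file with the dedicated method (sketch)."""
    # Imported lazily so build_page_schema works without the SDK installed.
    from datalab_sdk import DatalabClient, ExtractOptions

    client = DatalabClient()
    options = ExtractOptions(page_schema=build_page_schema(fields))
    # Assumption: extract() accepts the file path first, like convert().
    return client.extract(path, options=options)
```

A call such as extract_fields("invoice.pdf", {"invoice_number": "string"}) would then replace the old ConvertOptions(page_schema=...) usage.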
Document Segmentation
Python SDK (before)
# Old: segmentation_schema on ConvertOptions
import json
from datalab_sdk import DatalabClient, ConvertOptions
client = DatalabClient()
options = ConvertOptions(segmentation_schema=json.dumps(schema))
result = client.convert("document.pdf", options=options)
Track Changes
Python SDK (before)
# Old: extras parameter on ConvertOptions
from datalab_sdk import DatalabClient, ConvertOptions
client = DatalabClient()
options = ConvertOptions(extras="track_changes", output_format="html")
result = client.convert("contract.docx", options=options)
Checkpoint workflow
The new endpoints support a checkpoint system to avoid re-parsing documents. Convert once, then extract or segment multiple times:
from datalab_sdk import DatalabClient, ConvertOptions, ExtractOptions, SegmentOptions
import json

client = DatalabClient()

# Step 1: Convert and save checkpoint
options = ConvertOptions(save_checkpoint=True)
result = client.convert("document.pdf", options=options)
checkpoint_id = result.checkpoint_id

# Step 2: Extract using checkpoint (no re-parsing)
extract_opts = ExtractOptions(
    checkpoint_id=checkpoint_id,
    page_schema=json.dumps({"invoice_number": {"type": "string"}}),
)
extracted = client.extract(options=extract_opts)

# Step 3: Segment using same checkpoint
segment_opts = SegmentOptions(
    checkpoint_id=checkpoint_id,
    segmentation_schema=json.dumps({"sections": ["Header", "Body", "Footer"]}),
)
segmented = client.segment(options=segment_opts)
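Because the checkpoint is reusable, several schemas can be run against one parsed document in a loop. The following is a rough sketch of that pattern: extract_many is a hypothetical helper, and the client and options class are passed in as arguments so the function makes no assumptions beyond the extract(options=...) call shown above.

```python
import json


def extract_many(client, options_cls, checkpoint_id, schemas):
    """Run one extract call per schema against a single saved checkpoint.

    client       -- object with an extract(options=...) method
    options_cls  -- options class accepting checkpoint_id and page_schema
    schemas      -- mapping of a label to a schema dict
    """
    results = {}
    for name, schema in schemas.items():
        options = options_cls(
            checkpoint_id=checkpoint_id,
            page_schema=json.dumps(schema),
        )
        # The document is not parsed again; only the checkpoint is reused.
        results[name] = client.extract(options=options)
    return results
```

With the SDK, this would be called as extract_many(client, ExtractOptions, checkpoint_id, {"invoice": {...}, "vendor": {...}}), where the schema contents are your own.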
Table Recognition → Document Conversion
The standalone Table Recognition endpoint (/api/v1/table_rec) is deprecated. Use the Document Conversion endpoint with JSON output instead.
Before (deprecated)
# Old: Dedicated table recognition endpoint
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://www.datalab.to/api/v1/table_rec",
        files={"file": ("doc.pdf", f, "application/pdf")},
        headers={"X-API-Key": API_KEY},
    )
After (current)
from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()
options = ConvertOptions(
    output_format="json",
    mode="balanced",
)
result = client.convert("document.pdf", options=options)

# Tables are in the JSON output with block_type "Table"
for block in result.json.get("children", []):
    if block.get("block_type") == "Table":
        print(f"Table: {block['id']}")
        print(f"Bounding box: {block['bbox']}")
        # Access cells in block['children']
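The loop above inspects only the top-level children. If tables sit deeper in the tree (for example under per-page blocks, which is an assumption about the output layout rather than something this guide states), a recursive walk is more robust. A minimal sketch, with find_tables as a hypothetical helper that operates on the plain dict:

```python
def find_tables(block: dict) -> list:
    """Collect every block with block_type "Table", in document order."""
    found = []
    stack = [block]
    while stack:
        current = stack.pop()
        if current.get("block_type") == "Table":
            found.append(current)
            continue  # do not descend into the table's own cells
        # Push children reversed so they are popped in their original order.
        stack.extend(reversed(current.get("children") or []))
    return found
```

It would be called as find_tables(result.json), and each returned block still carries the id, bbox, and children keys used above.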
OCR → Document Conversion
The standalone OCR endpoint (/api/v1/ocr) is deprecated. Use the Document Conversion endpoint instead, which includes OCR as part of its processing pipeline.
Before (deprecated)
# Old: Dedicated OCR endpoint
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://www.datalab.to/api/v1/ocr",
        files={"file": ("doc.pdf", f, "application/pdf")},
        headers={"X-API-Key": API_KEY},
    )
After (current)
from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()

# For text extraction, use markdown output
result = client.convert("document.pdf")
print(result.markdown)

# For page-level text, use JSON output
options = ConvertOptions(output_format="json")
result = client.convert("document.pdf", options=options)
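To turn the JSON output into plain text per page, one option is to walk each top-level child and strip markup from its leaf blocks. The sketch below rests on two assumptions not confirmed by this guide: that each top-level child represents a page, and that leaf blocks carry their content in an "html" key. The helper names are hypothetical.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and discards tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup: str) -> str:
    """Return the text content of an HTML fragment with whitespace collapsed."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def text_by_page(doc: dict) -> list:
    """Return one text string per top-level child (assumed to be a page)."""
    pages = []
    for page in doc.get("children") or []:
        leaves, stack = [], [page]
        while stack:
            block = stack.pop()
            children = block.get("children") or []
            if children:
                stack.extend(reversed(children))  # keep reading order
            elif block.get("html"):  # assumed key holding block content
                leaves.append(strip_tags(block["html"]))
        pages.append("\n".join(leaves))
    return pages
```

With these assumptions, text_by_page(result.json) yields a list whose index matches the page order of the document.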
Next Steps