# Migration Guide

> Migrate from deprecated endpoints to the current API.

This guide helps you migrate from deprecated Datalab API endpoints to their current replacements.

## Marker → Dedicated Endpoints

<Warning>
  The `/api/v1/marker` endpoint is deprecated. Migrate to the new dedicated endpoints below.
</Warning>

The monolithic `/api/v1/marker` endpoint has been replaced with dedicated endpoints for each operation:

| Old Usage                             | New Endpoint                    | SDK Method                      |
| ------------------------------------- | ------------------------------- | ------------------------------- |
| `/marker` (basic conversion)          | `POST /api/v1/convert`          | `client.convert()`              |
| `/marker` with `page_schema`          | `POST /api/v1/extract`          | `client.extract()`              |
| `/marker` with `segmentation_schema`  | `POST /api/v1/segment`          | `client.segment()`              |
| `/marker` with `extras=track_changes` | `POST /api/v1/track-changes`    | `client.track_changes()`        |
| `/marker` with `pipeline_id`          | `POST /api/v1/custom-processor` | `client.run_custom_processor()` |
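
When many call sites build `/marker` requests dynamically, the table above can be turned into a small lookup. The following is a minimal sketch, not part of the SDK: `pick_endpoint` is a hypothetical helper, and both the precedence between fields and the treatment of `extras` as a comma-separated list are assumptions.

```python
from typing import Mapping

# Old /marker request field -> dedicated endpoint, taken from the table above.
# Checked in order; the first matching field wins (this ordering is an assumption).
_FIELD_TO_ENDPOINT = (
    ("pipeline_id", "/api/v1/custom-processor"),
    ("segmentation_schema", "/api/v1/segment"),
    ("page_schema", "/api/v1/extract"),
)


def pick_endpoint(marker_params: Mapping[str, object]) -> str:
    """Return the dedicated endpoint that replaces a given /marker call."""
    # Assumption: extras is a comma-separated list that may contain "track_changes".
    extras = str(marker_params.get("extras") or "")
    if "track_changes" in [e.strip() for e in extras.split(",")]:
        return "/api/v1/track-changes"
    for field, endpoint in _FIELD_TO_ENDPOINT:
        if marker_params.get(field):
            return endpoint
    # No special fields present: plain conversion.
    return "/api/v1/convert"
```

For example, `pick_endpoint({"page_schema": "{}"})` selects `/api/v1/extract`, while a request with only `output_format` falls through to `/api/v1/convert`.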

### SDK upgrade

Update to the latest SDK for the new dedicated methods:

```bash theme={null}
pip install --upgrade datalab-python-sdk
```

If you only use `client.convert()` in the SDK, no code changes are needed: it continues to work and now calls `/api/v1/convert` internally.

### Document Conversion

<CodeGroup>
  ```python Python SDK theme={null}
  # No changes needed: convert() works the same
  from datalab_sdk import DatalabClient

  client = DatalabClient()
  result = client.convert("document.pdf")
  print(result.markdown)
  ```

  ```bash cURL (before) theme={null}
  # Old
  curl -X POST https://www.datalab.to/api/v1/marker \
    -H "X-API-Key: $DATALAB_API_KEY" \
    -F "file=@document.pdf" \
    -F "output_format=markdown"
  ```

  ```bash cURL (after) theme={null}
  # New
  curl -X POST https://www.datalab.to/api/v1/convert \
    -H "X-API-Key: $DATALAB_API_KEY" \
    -F "file=@document.pdf" \
    -F "output_format=markdown"
  ```
</CodeGroup>

### Structured Extraction

<CodeGroup>
  ```python Python SDK (before) theme={null}
  # Old: page_schema on ConvertOptions
  from datalab_sdk import DatalabClient, ConvertOptions

  client = DatalabClient()
  options = ConvertOptions(page_schema=schema)
  result = client.convert("invoice.pdf", options=options)
  ```

  ```python Python SDK (after) theme={null}
  # New: Dedicated extract() method with ExtractOptions
  from datalab_sdk import DatalabClient, ExtractOptions
  import json

  client = DatalabClient()
  options = ExtractOptions(
      page_schema=json.dumps(schema)
  )
  result = client.extract("invoice.pdf", options=options)
  extracted = json.loads(result.extraction_schema_json)
  ```
</CodeGroup>

### Document Segmentation

<CodeGroup>
  ```python Python SDK (before) theme={null}
  # Old: segmentation_schema on ConvertOptions
  import json
  from datalab_sdk import DatalabClient, ConvertOptions
  client = DatalabClient()
  options = ConvertOptions(segmentation_schema=json.dumps(schema))
  result = client.convert("document.pdf", options=options)
  ```

  ```python Python SDK (after) theme={null}
  # New: Dedicated segment() method with SegmentOptions
  from datalab_sdk import DatalabClient, SegmentOptions
  import json

  client = DatalabClient()
  options = SegmentOptions(
      segmentation_schema=json.dumps(schema)
  )
  result = client.segment("document.pdf", options=options)
  segments = result.segmentation_results
  ```
</CodeGroup>

### Track Changes

<CodeGroup>
  ```python Python SDK (before) theme={null}
  # Old: extras parameter on ConvertOptions
  from datalab_sdk import DatalabClient, ConvertOptions

  client = DatalabClient()
  options = ConvertOptions(extras="track_changes", output_format="html")
  result = client.convert("contract.docx", options=options)
  ```

  ```python Python SDK (after) theme={null}
  # New: Dedicated track_changes() method
  from datalab_sdk import DatalabClient, TrackChangesOptions

  client = DatalabClient()
  options = TrackChangesOptions(output_format="markdown,html,chunks")
  result = client.track_changes("contract.docx", options=options)
  ```
</CodeGroup>

### Checkpoint reuse

The new endpoints support a checkpoint system to avoid re-parsing documents. Convert once, then extract or segment multiple times:

```python theme={null}
from datalab_sdk import DatalabClient, ConvertOptions, ExtractOptions, SegmentOptions
import json

client = DatalabClient()

# Step 1: Convert and save checkpoint
options = ConvertOptions(save_checkpoint=True)
result = client.convert("document.pdf", options=options)
checkpoint_id = result.checkpoint_id

# Step 2: Extract using checkpoint (no re-parsing)
extract_opts = ExtractOptions(
    checkpoint_id=checkpoint_id,
    page_schema=json.dumps({"invoice_number": {"type": "string"}})
)
extracted = client.extract(options=extract_opts)

# Step 3: Segment using same checkpoint
segment_opts = SegmentOptions(
    checkpoint_id=checkpoint_id,
    segmentation_schema=json.dumps({"sections": ["Header", "Body", "Footer"]})
)
segmented = client.segment(options=segment_opts)
```

## Workflows → Pipelines

<Warning>
  The Workflows API (`/api/v1/workflows`) is deprecated. Use [Pipelines](/docs/recipes/pipelines/pipeline-overview) for all new integrations and migrate existing workflows.
</Warning>

Pipelines replace Workflows with a simpler API, per-step status tracking, versioning, and a visual editor in Forge.

| Workflows                                       | Pipelines                                                                            |
| ----------------------------------------------- | ------------------------------------------------------------------------------------ |
| `POST /api/v1/workflows/workflows`              | `POST /api/v1/pipelines` (via SDK: `client.create_pipeline()`)                       |
| `POST /api/v1/workflows/workflows/{id}/execute` | `POST /api/v1/pipelines/{id}/run` (via SDK: `client.run_pipeline()`)                 |
| `GET /api/v1/workflows/executions/{id}`         | `GET /api/v1/pipelines/executions/{id}` (via SDK: `client.get_pipeline_execution()`) |

See [Pipelines](/docs/recipes/pipelines/pipeline-overview) for a full walkthrough.
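
If your integration calls the REST routes directly, the route table above can be expressed as a lookup. This is a rough sketch under stated assumptions: `pipelines_route` is a hypothetical helper, it only translates method and path, and it does not imply that request bodies or existing workflow IDs carry over to Pipelines unchanged.

```python
import re

# (method, deprecated path pattern, Pipelines path template), from the table above.
_ROUTES = [
    ("POST", re.compile(r"^/api/v1/workflows/workflows$"),
     "/api/v1/pipelines"),
    ("POST", re.compile(r"^/api/v1/workflows/workflows/(?P<id>[^/]+)/execute$"),
     "/api/v1/pipelines/{id}/run"),
    ("GET", re.compile(r"^/api/v1/workflows/executions/(?P<id>[^/]+)$"),
     "/api/v1/pipelines/executions/{id}"),
]


def pipelines_route(method: str, path: str):
    """Return the Pipelines path for a deprecated Workflows route, or None if unlisted."""
    for old_method, pattern, template in _ROUTES:
        match = pattern.match(path)
        if old_method == method.upper() and match:
            # The {id} slot is filled positionally; treat it as a placeholder only.
            return template.format(**match.groupdict())
    return None
```

Routes that are not in the table return `None`, which makes unmapped calls easy to flag for manual review.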

## Custom Pipeline → Custom Processor

<Warning>
  `POST /api/v1/custom-pipeline` is deprecated (sunset: September 30, 2026). Migrate to `POST /api/v1/custom-processor`. The management routes `/api/v1/custom_pipelines/*` are also deprecated; use `/api/v1/custom_processors/*` instead.
</Warning>

<CodeGroup>
  ```python Python SDK (before) theme={null}
  from datalab_sdk import DatalabClient, CustomProcessorOptions

  client = DatalabClient()
  options = CustomProcessorOptions(pipeline_id="cp_XXXXX")
  result = client.run_custom_pipeline("document.pdf", options=options)
  ```

  ```python Python SDK (after) theme={null}
  from datalab_sdk import DatalabClient, CustomProcessorOptions

  client = DatalabClient()
  options = CustomProcessorOptions(pipeline_id="cp_XXXXX")
  result = client.run_custom_processor("document.pdf", options=options)
  ```

  ```bash cURL (before) theme={null}
  curl -X POST https://www.datalab.to/api/v1/custom-pipeline \
    -H "X-API-Key: $DATALAB_API_KEY" \
    -F "file=@document.pdf" \
    -F "pipeline_id=cp_XXXXX"
  ```

  ```bash cURL (after) theme={null}
  curl -X POST https://www.datalab.to/api/v1/custom-processor \
    -H "X-API-Key: $DATALAB_API_KEY" \
    -F "file=@document.pdf" \
    -F "pipeline_id=cp_XXXXX"
  ```
</CodeGroup>

The response format is identical. `CustomPipelineOptions` remains as a backward-compatible alias for `CustomProcessorOptions`.

## Table Recognition → Document Conversion

The standalone Table Recognition endpoint (`/api/v1/table_rec`) is deprecated. Use the Document Conversion endpoint with JSON output instead.

### Before (deprecated)

```python theme={null}
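# Old: Dedicated table recognition endpoint
import os
import requests

API_KEY = os.environ["DATALAB_API_KEY"]

with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://www.datalab.to/api/v1/table_rec",
        files={"file": ("doc.pdf", f, "application/pdf")},
        headers={"X-API-Key": API_KEY},
    )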
```

### After (current)

<CodeGroup>
  ```python Python SDK theme={null}
  from datalab_sdk import DatalabClient, ConvertOptions

  client = DatalabClient()

  options = ConvertOptions(
      output_format="json",
      mode="balanced"
  )

  result = client.convert("document.pdf", options=options)

  # Tables are in the JSON output with block_type "Table"
  for block in result.json.get("children", []):
      if block.get("block_type") == "Table":
          print(f"Table: {block['id']}")
          print(f"Bounding box: {block['bbox']}")
          # Access cells in block['children']
  ```

  ```bash cURL theme={null}
  curl -X POST https://www.datalab.to/api/v1/convert \
    -H "X-API-Key: $DATALAB_API_KEY" \
    -F "file=@document.pdf" \
    -F "output_format=json" \
    -F "mode=balanced"
  ```
</CodeGroup>
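
The SDK example above checks only the top-level `children`. If tables sit deeper in the tree (for instance under page blocks), a recursive walk may be more robust. The sketch below assumes only the `block_type`, `id`, and `children` keys used above; `iter_tables` is a hypothetical helper and the `sample` structure is invented for illustration, not real API output.

```python
from typing import Any, Dict, Iterator


def iter_tables(block: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every block whose block_type is "Table", at any nesting depth."""
    if block.get("block_type") == "Table":
        yield block
    # Assumption: children may be missing or None on leaf blocks.
    for child in block.get("children") or []:
        yield from iter_tables(child)


# Hypothetical output shape, for illustration only: a document with one page.
sample = {
    "block_type": "Document",
    "children": [
        {
            "block_type": "Page",
            "id": "/page/0/Page/0",
            "children": [
                {"block_type": "Text", "id": "/page/0/Text/0", "children": None},
                {"block_type": "Table", "id": "/page/0/Table/1",
                 "bbox": [0, 0, 10, 10], "children": []},
            ],
        }
    ],
}

table_ids = [table["id"] for table in iter_tables(sample)]
```

With a real response you would pass `result.json` instead of `sample`.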

## OCR → Document Conversion

The standalone OCR endpoint (`/api/v1/ocr`) is deprecated. Use the Document Conversion endpoint instead, which includes OCR as part of its processing pipeline.

### Before (deprecated)

```python theme={null}
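# Old: Dedicated OCR endpoint
import os
import requests

API_KEY = os.environ["DATALAB_API_KEY"]

with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://www.datalab.to/api/v1/ocr",
        files={"file": ("doc.pdf", f, "application/pdf")},
        headers={"X-API-Key": API_KEY},
    )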
```

### After (current)

<CodeGroup>
  ```python Python SDK theme={null}
  from datalab_sdk import DatalabClient, ConvertOptions

  client = DatalabClient()

  # For text extraction, use markdown output
  result = client.convert("document.pdf")
  print(result.markdown)

  # For page-level text, use JSON output
  options = ConvertOptions(output_format="json")
  result = client.convert("document.pdf", options=options)
  ```

  ```bash cURL theme={null}
  curl -X POST https://www.datalab.to/api/v1/convert \
    -H "X-API-Key: $DATALAB_API_KEY" \
    -F "file=@document.pdf" \
    -F "output_format=markdown"
  ```
</CodeGroup>
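
To check whether a codebase still references any deprecated endpoint covered in this guide, a simple text scan can help. This is a minimal sketch: `find_deprecated` and the `DEPRECATED` mapping are hypothetical names, and the replacement strings are only short reminders of the sections above.

```python
import re

# Deprecated path -> replacement, as named in this guide.
DEPRECATED = {
    "/api/v1/marker": "/api/v1/convert, /extract, /segment, /track-changes or /custom-processor",
    "/api/v1/workflows": "/api/v1/pipelines",
    "/api/v1/custom-pipeline": "/api/v1/custom-processor",
    "/api/v1/custom_pipelines": "/api/v1/custom_processors",
    "/api/v1/table_rec": "/api/v1/convert with output_format=json",
    "/api/v1/ocr": "/api/v1/convert",
}

# Match a deprecated path only when no further name characters follow it,
# so an unrelated longer path segment is not reported.
_PATTERN = re.compile(
    "(" + "|".join(re.escape(path) for path in DEPRECATED) + r")(?![A-Za-z0-9_-])"
)


def find_deprecated(source: str):
    """Return (line_number, deprecated_path, replacement) for each hit in source."""
    hits = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            path = match.group(1)
            hits.append((line_number, path, DEPRECATED[path]))
    return hits
```

You could feed it the contents of each source file (for example via `pathlib.Path.read_text()`) and print the hits as a migration checklist.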

## Next Steps

<CardGroup cols={2}>
  <Card title="Document Conversion" icon="file-text" href="/docs/recipes/conversion/conversion-api-overview">
    Full guide to the current conversion API
  </Card>

  <Card title="Structured Extraction" icon="table" href="/docs/recipes/structured-extraction/api-overview">
    Extract structured data using JSON schemas
  </Card>

  <Card title="Changelog" icon="clock-rotate-left" href="/platform/changelog">
    See all API changes and deprecations
  </Card>

  <Card title="SDK Reference" icon="code" href="/docs/welcome/sdk">
    Use the SDK for the simplest migration path
  </Card>
</CardGroup>
