Get Your API Key

Sign up at datalab.to/auth/sign_up — new accounts include $5 in free credits, enough to process hundreds of pages. Then grab your API key from the API Keys dashboard.
Want to try before writing code? Upload a document to the Forge Playground to see results instantly — no API key required.

Installation

Install the Datalab SDK:
pip install datalab-python-sdk
Set your API key as an environment variable:
export DATALAB_API_KEY=your_api_key_here
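A script can check for the key up front so that a missing variable fails with a clear message. This is a minimal sketch using only the standard library; the helper name `require_api_key` is an illustration and not part of the SDK.

```python
import os


def require_api_key(env=None):
    """Return the Datalab API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    key = env.get("DATALAB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DATALAB_API_KEY is not set; export it before creating a client."
        )
    return key
```

Calling `require_api_key()` at the top of a script surfaces the problem before any document is sent.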

Convert a Document

The SDK provides a simple interface to convert documents to Markdown, HTML, JSON, or chunks.
from datalab_sdk import DatalabClient

client = DatalabClient()  # Uses DATALAB_API_KEY env var

# Convert PDF to markdown
result = client.convert("document.pdf")
print(result.markdown)

# Save output and images
result.save_output("output/")
Common mistakes:
  • Forgetting to set the DATALAB_API_KEY environment variable
  • Using file_url with a private/authenticated URL (must be publicly accessible)
  • Not polling for results when calling the API directly: the initial response contains only a request_id, not the actual output
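For code that handles a request_id itself, the polling pattern looks roughly like the sketch below. It is deliberately generic: `check` is a hypothetical callable supplied by the caller (for example, a function that looks up the request and returns `None` while it is still processing), and nothing here is a Datalab API call.

```python
import time


def poll_until_complete(check, max_polls=300, delay=1.0, sleep=time.sleep):
    """Call check() until it returns a non-None result or max_polls is reached.

    check: hypothetical callable returning None while the request is pending
           and the finished result once it is ready.
    sleep: injectable so the loop can be tested without real waiting.
    """
    for _ in range(max_polls):
        result = check()
        if result is not None:
            return result
        sleep(delay)
    raise TimeoutError(f"no result after {max_polls} polls")
```

The cap on polls keeps a stuck request from blocking the caller forever; the delay value is a placeholder to tune.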

Conversion Options

Control the conversion with options:
from datalab_sdk import DatalabClient, ConvertOptions

client = DatalabClient()

options = ConvertOptions(
    output_format="markdown",  # "markdown", "html", "json", "chunks"
    mode="balanced",           # "fast", "balanced", "accurate"
    paginate=True,             # Add page delimiters
    page_range="0-10",         # Process specific pages (0-indexed)
)

result = client.convert("document.pdf", options=options)
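Because page_range is 0-indexed, page numbers as a reader sees them (starting at 1) need shifting by one. The helper below is a small sketch of that conversion; `to_page_range` is a hypothetical name, and it only reproduces the "start-end" string form shown above. Whether the end of the range is inclusive is not stated here and is not assumed.

```python
def to_page_range(first_page, last_page):
    """Convert 1-based first/last page numbers to a 0-indexed "start-end" string."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    # Shift both ends down by one to move from 1-based to 0-based numbering.
    return f"{first_page - 1}-{last_page - 1}"
```

For example, `to_page_range(1, 11)` yields the "0-10" string used in the options above.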

Processing Modes

Mode      Description
fast      Lowest latency, good for simple documents (SDK default)
balanced  Balance of speed and accuracy
accurate  Highest accuracy, best for complex layouts
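When the mode comes from user input or a config file, validating it against the three names listed above catches typos before a request is made. This is a sketch only; `normalize_mode` and `VALID_MODES` are illustrative names, not SDK symbols.

```python
VALID_MODES = ("fast", "balanced", "accurate")


def normalize_mode(mode):
    """Trim, lower-case, and validate a mode name against the documented modes."""
    cleaned = mode.strip().lower()
    if cleaned not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    return cleaned
```

The returned string can then be passed as the `mode` value in the conversion options.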

Fill PDF Forms

Fill forms in PDFs or images with structured data:
from datalab_sdk import DatalabClient, FormFillingOptions

client = DatalabClient()

options = FormFillingOptions(
    field_data={
        "full_name": {"value": "John Doe", "description": "Full legal name"},
        "date": {"value": "2024-01-15", "description": "Today's date"},
        "signature": {"value": "John Doe", "description": "Signature field"},
    }
)

result = client.fill("form.pdf", options=options)
result.save_output("filled_form.pdf")
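The field_data mapping above nests a value and a description under each field name. If the values already live in application data, a small helper can build that shape. The sketch below assumes values should be sent as strings, as in the example; `build_field_data` is a hypothetical helper name.

```python
import datetime


def build_field_data(fields):
    """Turn {name: (value, description)} into the nested mapping used above.

    Values are converted with str(), so a datetime.date becomes "YYYY-MM-DD".
    """
    return {
        name: {"value": str(value), "description": description}
        for name, (value, description) in fields.items()
    }


field_data = build_field_data({
    "full_name": ("John Doe", "Full legal name"),
    "date": (datetime.date(2024, 1, 15), "Today's date"),
})
```

The resulting dictionary has the same structure as the literal passed to FormFillingOptions above.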

Upload and Manage Files

Upload files to Datalab for use in pipelines:
from datalab_sdk import DatalabClient

client = DatalabClient()

# Upload files
uploaded = client.upload_files(["doc1.pdf", "doc2.pdf"])
for file in uploaded:
    print(f"{file.original_filename}: {file.reference}")
    # Output: doc1.pdf: datalab://file-abc123

# List your files
files = client.list_files(limit=50)
print(f"Total files: {files['total']}")
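To reuse the references later, it can help to index them by filename. The sketch below relies only on the two attributes shown above (`original_filename` and `reference`); the function name is an illustration, and the stand-in objects exist only to make the example runnable without uploading anything.

```python
from types import SimpleNamespace


def references_by_filename(uploaded):
    """Map each uploaded file's original_filename to its reference string."""
    return {f.original_filename: f.reference for f in uploaded}


# Stand-in objects mimicking the attributes printed in the example above.
sample = [
    SimpleNamespace(original_filename="doc1.pdf", reference="datalab://file-abc123"),
]
lookup = references_by_filename(sample)
```

With real uploads, the list returned by `client.upload_files(...)` would take the place of `sample`.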

CLI

The SDK includes a command-line interface:
# Convert a single document
datalab convert document.pdf --format markdown

# Convert with options
datalab convert document.pdf --mode accurate --paginate

# Convert a directory
datalab convert ./documents/ --output_dir ./output/
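When the CLI is driven from a Python script, building the argument list in one place avoids quoting mistakes. The sketch below uses only the flags shown above (`--format`, `--mode`, `--paginate`, `--output_dir`); that they can be freely combined is an assumption, and `build_convert_command` is a hypothetical helper.

```python
def build_convert_command(path, fmt=None, mode=None, paginate=False, output_dir=None):
    """Build an argv list for `datalab convert` from the flags shown above."""
    args = ["datalab", "convert", path]
    if fmt:
        args += ["--format", fmt]
    if mode:
        args += ["--mode", mode]
    if paginate:
        args.append("--paginate")
    if output_dir:
        args += ["--output_dir", output_dir]
    return args
```

The list can be passed to `subprocess.run(args, check=True)`, which avoids shell quoting entirely.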

Run a Pipeline

Pipelines chain processors (convert, extract, segment) into a single reusable call. Create them in Forge or via the SDK, then run them by pipeline ID:
from datalab_sdk import DatalabClient

client = DatalabClient()

# Run an existing pipeline
execution = client.run_pipeline(
    "pl_abc123",              # Your pipeline ID
    file_path="document.pdf"
)

# Poll until complete
execution = client.get_pipeline_execution(
    execution.execution_id,
    max_polls=300
)

# Get extraction results (step index 1 = extract step)
result = client.get_step_result(execution.execution_id, step_index=1)
print(result)
See Pipelines for creating, versioning, and running pipelines.

Async Support

For high-throughput applications, use the async client:
import asyncio
from datalab_sdk import AsyncDatalabClient

async def convert_documents():
    async with AsyncDatalabClient() as client:
        result = await client.convert("document.pdf")
        print(result.markdown)

asyncio.run(convert_documents())
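The async client pays off when several documents are converted at once. The sketch below shows one common pattern, bounded concurrency with a semaphore. It takes any async callable as `convert`, so it does not depend on SDK internals; `convert_all` is a hypothetical helper, and the limit of 4 is an arbitrary placeholder rather than a documented rate limit.

```python
import asyncio


async def convert_all(convert, paths, limit=4):
    """Run convert(path) for every path with at most `limit` calls in flight.

    Results are returned in the same order as `paths`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def convert_one(path):
        # The semaphore caps how many conversions run concurrently.
        async with semaphore:
            return await convert(path)

    return await asyncio.gather(*(convert_one(p) for p in paths))
```

Inside an `async with AsyncDatalabClient() as client:` block, this would be called as `await convert_all(client.convert, paths)`.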

Next Steps

SDK Reference

Full Python SDK documentation with typed clients and async support.

API Reference

REST API reference for document conversion, form filling, and file management.

Pipelines

Chain processors into versioned, reusable pipelines.

Document Conversion

Detailed guide to converting PDFs and documents to Markdown, HTML, or JSON.