
Installation

pip install datalab-python-sdk
Requires Python 3.10 or higher.

Authentication

Set your API key as an environment variable (recommended):
export DATALAB_API_KEY=your_api_key_here
Or pass it directly to the client:
from datalab_sdk import DatalabClient

client = DatalabClient(api_key="your_api_key_here")
Get your API key from the API Keys dashboard.
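
When the key comes from the environment, it can help to fail early with a clear message if the variable is missing. The following is a minimal sketch using only the standard library; `require_api_key` is a hypothetical helper for illustration and is not part of the SDK.

```python
import os


def require_api_key(env_var: str = "DATALAB_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of datalab_sdk.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or pass api_key explicitly"
        )
    return key
```

The returned value can then be passed explicitly, for example `DatalabClient(api_key=require_api_key())`.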

Quick Example

from datalab_sdk import DatalabClient

client = DatalabClient()

# Convert a document to markdown
result = client.convert("document.pdf")
print(result.markdown)

# Save output with images
result.save_output("output/")

Client Options

Both sync and async clients accept the same configuration options:
from datalab_sdk import DatalabClient, AsyncDatalabClient

# Synchronous client (blocking)
client = DatalabClient(
    api_key="your_key",           # Or use DATALAB_API_KEY env var
    base_url="https://www.datalab.to",  # API endpoint
    timeout=300,                  # Request timeout in seconds
)

# Asynchronous client (non-blocking)
async_client = AsyncDatalabClient(
    api_key="your_key",
    base_url="https://www.datalab.to",
    timeout=300,
)
Parameter | Type | Default | Description
--- | --- | --- | ---
api_key | str | DATALAB_API_KEY env var | Your Datalab API key
base_url | str | https://www.datalab.to | API base URL
timeout | int | 300 | Request timeout in seconds

Async Support

For high-throughput applications, use AsyncDatalabClient:
import asyncio
from datalab_sdk import AsyncDatalabClient

async def process_documents():
    async with AsyncDatalabClient() as client:
        result = await client.convert("document.pdf")
        print(result.markdown)

asyncio.run(process_documents())
The async client is recommended when processing multiple documents concurrently.
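
One way to process several documents concurrently is to fan out with `asyncio.gather` while capping the number of requests in flight. The sketch below is generic: `convert_all` and its `max_concurrency` default of 4 are assumptions for illustration, not SDK features or documented limits. It accepts any coroutine function that takes a path, so it does not depend on the SDK itself.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def convert_all(
    convert: Callable[[str], Awaitable[T]],
    paths: Sequence[str],
    max_concurrency: int = 4,  # arbitrary cap, chosen for illustration
) -> list[T]:
    """Run `convert` on every path with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def convert_one(path: str) -> T:
        # The semaphore blocks here once the cap is reached.
        async with semaphore:
            return await convert(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(convert_one(p) for p in paths))
```

With the SDK, this could be called inside an `async with AsyncDatalabClient() as client:` block as `results = await convert_all(client.convert, ["a.pdf", "b.pdf"])`.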

Error Handling

The SDK raises specific exceptions for different error types:
from datalab_sdk import DatalabClient
from datalab_sdk.exceptions import (
    DatalabAPIError,
    DatalabTimeoutError,
    DatalabFileError,
    DatalabValidationError,
)

client = DatalabClient()

try:
    result = client.convert("document.pdf")
except DatalabAPIError as e:
    print(f"API error {e.status_code}: {e.response_data}")
except DatalabTimeoutError:
    print("Request timed out")
except DatalabFileError as e:
    print(f"File error: {e}")
except DatalabValidationError as e:
    print(f"Invalid input: {e}")
Exception | Description
--- | ---
DatalabAPIError | API returned an error response (includes status_code and response_data)
DatalabTimeoutError | Request exceeded timeout
DatalabFileError | File not found or cannot be read
DatalabValidationError | Invalid parameters provided

Automatic Retries

The SDK automatically retries requests for:
  • 408 Request Timeout
  • 429 Rate Limit Exceeded
  • 5xx Server Errors
Retries use exponential backoff. You can control polling behavior with max_polls and poll_interval parameters on individual methods.
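
To illustrate what exponential backoff means in general, the sketch below computes a retry schedule and checks whether a status code falls in the retryable set listed above. The base delay, growth factor, and cap are assumed values chosen for the example; the SDK's actual retry parameters are not stated here, and `backoff_delays` and `is_retryable` are hypothetical names.

```python
def backoff_delays(
    max_retries: int,
    base: float = 1.0,    # assumed first delay, in seconds
    factor: float = 2.0,  # assumed growth factor per attempt
    cap: float = 30.0,    # assumed upper bound on any single delay
) -> list[float]:
    """Delay before each retry: base * factor**attempt, capped at `cap`."""
    return [min(cap, base * factor**attempt) for attempt in range(max_retries)]


def is_retryable(status_code: int) -> bool:
    """True for 408, 429, and any 5xx status code."""
    return status_code in (408, 429) or 500 <= status_code <= 599


print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Each delay doubles until it reaches the cap, which keeps repeated failures from hammering the server while still bounding the wait.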

SDK Features

Method Summary

Method | Description
--- | ---
convert() | Convert documents to markdown, HTML, JSON, or chunks
extract() | Extract structured data from documents using JSON schemas
segment() | Segment documents into sections using a schema
track_changes() | Extract tracked changes from DOCX documents
create_document() | Create DOCX from markdown with track changes
run_custom_pipeline() | Execute a custom pipeline
fill() | Fill PDF or image forms with field data
upload_files() | Upload files to Datalab storage
list_files() | List uploaded files
get_file_metadata() | Get metadata for a specific file
get_file_download_url() | Generate presigned download URL
delete_file() | Delete an uploaded file
create_workflow() | Create a workflow definition
execute_workflow() | Execute a workflow
get_execution_status() | Check workflow execution status
list_workflows() | List all workflows
get_workflow() | Get a workflow by ID
delete_workflow() | Delete a workflow
get_step_types() | Get available workflow step types
ocr() | (Deprecated) Use convert() instead

Next Steps