
Installation

```bash
pip install datalab-python-sdk
```
Requires Python 3.10 or higher.
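You may want to script the version requirement, for example in a setup or CI step. The sketch below is a minimal way to do that with the standard library. `supports_sdk` is a hypothetical helper, not part of the SDK.

```python
import sys
from typing import Optional, Sequence

# Minimum interpreter version stated above for datalab-python-sdk.
MIN_PYTHON = (3, 10)


def supports_sdk(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if `version` (default: the running interpreter) is 3.10 or newer."""
    if version is None:
        version = sys.version_info
    # Compare (major, minor) only; tuples compare element by element.
    return tuple(version[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(n) for n in sys.version_info[:3])
    print(f"Python {current}:", "OK" if supports_sdk() else "too old, need 3.10+")
```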

Authentication

Set your API key as an environment variable (recommended):
```bash
export DATALAB_API_KEY=your_api_key_here
```
Or pass it directly to the client:
```python
from datalab_sdk import DatalabClient

client = DatalabClient(api_key="your_api_key_here")
```
Get your API key from datalab.to/settings.
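You might want to fail fast in your own code before constructing the client. In that case you can check the precedence this section describes yourself: an explicit `api_key` argument first, otherwise the `DATALAB_API_KEY` environment variable. This is a minimal sketch. `resolve_api_key` is a hypothetical helper, not part of `datalab_sdk`, and the SDK's own behavior when no key is found may differ.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to DATALAB_API_KEY."""
    if env is None:
        env = os.environ
    # An explicitly passed key wins, like DatalabClient(api_key=...).
    if explicit:
        return explicit
    key = env.get("DATALAB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "No API key: pass api_key=... or set the DATALAB_API_KEY environment variable"
        )
    return key
```

The `env` parameter exists only so the lookup can be tested without touching the real environment. In normal use you would call `resolve_api_key()` with no arguments.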

Quick Example

```python
from datalab_sdk import DatalabClient

client = DatalabClient()

# Convert a document to markdown
result = client.convert("document.pdf")
print(result.markdown)

# Save output with images
result.save_output("output/")
```

Client Options

Both sync and async clients accept the same configuration options:
```python
from datalab_sdk import DatalabClient, AsyncDatalabClient

# Synchronous client (blocking)
client = DatalabClient(
    api_key="your_key",           # Or use DATALAB_API_KEY env var
    base_url="https://www.datalab.to",  # API endpoint
    timeout=300,                  # Request timeout in seconds
)

# Asynchronous client (non-blocking)
async_client = AsyncDatalabClient(
    api_key="your_key",
    base_url="https://www.datalab.to",
    timeout=300,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | `DATALAB_API_KEY` env var | Your Datalab API key |
| `base_url` | `str` | `https://www.datalab.to` | API base URL |
| `timeout` | `int` | `300` | Request timeout in seconds |

Async Support

For high-throughput applications, use AsyncDatalabClient:
```python
import asyncio
from datalab_sdk import AsyncDatalabClient

async def process_documents():
    async with AsyncDatalabClient() as client:
        result = await client.convert("document.pdf")
        print(result.markdown)

asyncio.run(process_documents())
```
The async client is recommended when processing multiple documents concurrently.
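When converting many documents at once, you usually want to cap how many requests are in flight. The sketch below shows one common asyncio pattern for that: an `asyncio.Semaphore` combined with `asyncio.gather`. `run_bounded` and `fake_convert` are hypothetical names. `fake_convert` stands in for `client.convert` so the example runs offline, and the default limit of 5 is an arbitrary assumption, not a documented rate limit.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def run_bounded(
    paths: Iterable[str],
    worker: Callable[[str], Awaitable[Any]],
    limit: int = 5,
) -> List[Any]:
    """Await worker(path) for every path, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(path: str) -> Any:
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(path)

    # gather runs the tasks concurrently and returns results in input order.
    return list(await asyncio.gather(*(guarded(p) for p in paths)))


async def fake_convert(path: str) -> str:
    # Offline stand-in for `await client.convert(path)`.
    await asyncio.sleep(0.01)
    return f"# converted {path}"


if __name__ == "__main__":
    print(asyncio.run(run_bounded(["a.pdf", "b.pdf", "c.pdf"], fake_convert, limit=2)))
```

With the real client, you would call `await run_bounded(paths, client.convert, limit=...)` inside the `async with AsyncDatalabClient() as client:` block. Each element of the returned list is then a conversion result, so you can read its `.markdown`.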

Error Handling

The SDK raises specific exceptions for different error types:
```python
from datalab_sdk import DatalabClient
from datalab_sdk.exceptions import (
    DatalabAPIError,
    DatalabTimeoutError,
    DatalabFileError,
    DatalabValidationError,
)

client = DatalabClient()

try:
    result = client.convert("document.pdf")
except DatalabAPIError as e:
    print(f"API error {e.status_code}: {e.response_data}")
except DatalabTimeoutError:
    print("Request timed out")
except DatalabFileError as e:
    print(f"File error: {e}")
except DatalabValidationError as e:
    print(f"Invalid input: {e}")
```
| Exception | Description |
|---|---|
| `DatalabAPIError` | API returned an error response (includes `status_code` and `response_data`) |
| `DatalabTimeoutError` | Request exceeded the timeout |
| `DatalabFileError` | File not found or cannot be read |
| `DatalabValidationError` | Invalid parameters provided |
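The SDK reports missing or unreadable files itself through `DatalabFileError`. When processing a batch, though, you may prefer to filter such files out locally before any request is made, so one bad path doesn't interrupt the run. This is a minimal standard-library sketch. `preflight_check` and `split_readable` are hypothetical helpers, and they check only existence, file type, and read permission, not whether the content is a supported document.

```python
import os
from pathlib import Path
from typing import Iterable, List, Tuple


def preflight_check(path: str) -> Path:
    """Hypothetical local check for the conditions DatalabFileError reports."""
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"{p} does not exist")
    if not p.is_file():
        raise ValueError(f"{p} is not a regular file")
    if not os.access(p, os.R_OK):
        raise PermissionError(f"{p} is not readable")
    return p


def split_readable(paths: Iterable[str]) -> Tuple[List[Path], List[Tuple[str, str]]]:
    """Partition a batch into readable files and (path, reason) pairs to skip."""
    ok: List[Path] = []
    skipped: List[Tuple[str, str]] = []
    for path in paths:
        try:
            ok.append(preflight_check(path))
        except (OSError, ValueError) as exc:
            # FileNotFoundError and PermissionError are both OSError subclasses.
            skipped.append((path, str(exc)))
    return ok, skipped
```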

Automatic Retries

The SDK automatically retries requests for:
  • 408 Request Timeout
  • 429 Too Many Requests (rate limit exceeded)
  • 5xx Server Errors
Retries use exponential backoff. Polling for the result of a long-running request is a separate mechanism; you can tune it with the `max_polls` and `poll_interval` parameters on individual methods.
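The SDK handles retries and polling for you. The sketch below only illustrates the two mechanisms in plain standard-library Python: which status codes qualify for a retry, what an exponential backoff schedule looks like, and how a loop bounded by `max_polls` and `poll_interval` terminates. The helper names (`is_retryable`, `backoff_delays`, `poll`) and the `base`/`cap` values are hypothetical. The SDK's actual backoff parameters and polling defaults are not documented here.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")

# Statuses listed above as retried automatically; any 5xx also qualifies.
RETRYABLE_STATUSES = {408, 429}


def is_retryable(status: int) -> bool:
    """True for 408, 429, and every 5xx status code."""
    return status in RETRYABLE_STATUSES or 500 <= status <= 599


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential backoff schedule: base * 2**attempt seconds, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def poll(
    check: Callable[[], Optional[T]],
    max_polls: int,
    poll_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call check() until it returns a result, sleeping poll_interval between tries."""
    for attempt in range(max_polls):
        result = check()
        if result is not None:
            return result
        # No point sleeping after the final attempt.
        if attempt < max_polls - 1:
            sleep(poll_interval)
    raise TimeoutError(f"no result after {max_polls} polls")
```

For example, `backoff_delays(5, base=1.0, cap=10.0)` yields waits of 1, 2, 4, 8, and 10 seconds. The total time a poll loop can take is roughly `max_polls * poll_interval` plus the time spent in each check.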

SDK Features

Method Summary

| Method | Description |
|---|---|
| `convert()` | Convert documents to markdown, HTML, JSON, or chunks |
| `fill()` | Fill PDF or image forms with field data |
| `upload_files()` | Upload files to Datalab storage |
| `list_files()` | List uploaded files |
| `get_file_metadata()` | Get metadata for a specific file |
| `get_file_download_url()` | Generate a presigned download URL |
| `delete_file()` | Delete an uploaded file |
| `create_workflow()` | Create a workflow definition |
| `execute_workflow()` | Execute a workflow |
| `get_execution_status()` | Check workflow execution status |
| `list_workflows()` | List all workflows |
| `get_workflow()` | Get a workflow by ID |
| `delete_workflow()` | Delete a workflow |
| `get_step_types()` | Get available workflow step types |
