## Installation
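The SDK installs from PyPI; the package name below is an assumption, so check the project's listing:

```shell
pip install datalab-python-sdk
```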
## Authentication

Set your API key as an environment variable (recommended):

## Quick Example
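A minimal sketch of the flow above, assuming the synchronous client is named `DatalabClient` (mirroring the documented `AsyncDatalabClient`), that the module is importable as `datalab`, and that `convert()` takes a file path and returns a result with a `markdown` attribute; all three are assumptions:

```python
import os

from datalab import DatalabClient  # module and class name are assumptions

# The client falls back to the DATALAB_API_KEY environment variable
# when api_key is not passed explicitly.
os.environ.setdefault("DATALAB_API_KEY", "your-api-key")

client = DatalabClient()

# convert() is documented; the call shape and the .markdown attribute
# shown here are assumptions.
result = client.convert("invoice.pdf")
print(result.markdown)
```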
## Client Options

Both sync and async clients accept the same configuration options:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | `DATALAB_API_KEY` env var | Your Datalab API key |
| `base_url` | `str` | `https://www.datalab.to` | API base URL |
| `timeout` | `int` | `300` | Request timeout in seconds |
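The three options in the table map onto the client constructor; a sketch, assuming the synchronous client class is named `DatalabClient`:

```python
from datalab import DatalabClient  # module and class name are assumptions

client = DatalabClient(
    api_key="your-api-key",             # omit to fall back to DATALAB_API_KEY
    base_url="https://www.datalab.to",  # API base URL (default shown)
    timeout=300,                        # request timeout in seconds (default shown)
)
```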
## Async Support

For high-throughput applications, use `AsyncDatalabClient`:
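`AsyncDatalabClient` is documented; the context-manager usage and `convert()` call shape below are assumptions. Concurrent conversions might look like:

```python
import asyncio

from datalab import AsyncDatalabClient  # module name is an assumption


async def main() -> None:
    # Assumes the async client supports "async with" for connection cleanup.
    async with AsyncDatalabClient() as client:
        # Fan out several conversions concurrently.
        results = await asyncio.gather(
            client.convert("report-1.pdf"),
            client.convert("report-2.pdf"),
        )
    for result in results:
        print(result.markdown)  # result attribute is an assumption


asyncio.run(main())
```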
## Error Handling

The SDK raises specific exceptions for different error types:

| Exception | Description |
|---|---|
| `DatalabAPIError` | API returned an error response (includes `status_code` and `response_data`) |
| `DatalabTimeoutError` | Request exceeded timeout |
| `DatalabFileError` | File not found or cannot be read |
| `DatalabValidationError` | Invalid parameters provided |
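The exception names and the `status_code`/`response_data` attributes come from the table above; the import paths and client usage in this sketch are assumptions:

```python
from datalab import (  # import paths are assumptions
    DatalabAPIError,
    DatalabClient,
    DatalabFileError,
    DatalabTimeoutError,
    DatalabValidationError,
)

client = DatalabClient()

try:
    result = client.convert("report.pdf")
except DatalabFileError:
    print("File not found or cannot be read")
except DatalabValidationError as exc:
    print(f"Invalid parameters: {exc}")
except DatalabTimeoutError:
    print("Request exceeded the configured timeout")
except DatalabAPIError as exc:
    # status_code and response_data are documented attributes.
    print(f"API error {exc.status_code}: {exc.response_data}")
```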
## Automatic Retries

The SDK automatically retries requests for:

- `408` Request Timeout
- `429` Rate Limit Exceeded
- `5xx` Server Errors

Polling for long-running operations can be tuned with the `max_polls` and `poll_interval` parameters on individual methods.
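For example, a large document could be given a longer polling budget per call; `max_polls` and `poll_interval` are documented parameter names, while the client class and `convert()` call shape are assumptions:

```python
from datalab import DatalabClient  # module and class name are assumptions

client = DatalabClient()

result = client.convert(
    "large-scan.pdf",
    max_polls=120,    # stop after 120 status checks
    poll_interval=5,  # wait 5 seconds between checks
)
```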
## SDK Features

- **Document Conversion**: Convert PDFs, images, and documents to Markdown, HTML, JSON, or chunks.
- **Structured Extraction**: Extract structured data from documents using JSON schemas.
- **Document Segmentation**: Segment documents into logical sections.
- **Form Filling**: Fill PDF and image forms with structured field data.
- **Pipelines**: Chain processors into versioned, reusable pipelines.
- **File Management**: Upload, list, and manage files in Datalab storage.
- **CLI**: Command-line interface for document conversion.
## Method Summary
| Method | Description |
|---|---|
| `convert()` | Convert documents to Markdown, HTML, JSON, or chunks |
| `extract()` | Extract structured data from documents using JSON schemas |
| `segment()` | Segment documents into sections using a schema |
| `track_changes()` | Extract tracked changes from DOCX documents |
| `create_document()` | Create DOCX from Markdown with track changes |
| `run_custom_processor()` | Execute a custom processor on a document |
| `fill()` | Fill PDF or image forms with field data |
| `upload_files()` | Upload files to Datalab storage |
| `list_files()` | List uploaded files |
| `get_file_metadata()` | Get metadata for a specific file |
| `get_file_download_url()` | Generate presigned download URL |
| `delete_file()` | Delete an uploaded file |
| `create_pipeline()` | Create a new pipeline |
| `list_pipelines()` | List pipelines for your team |
| `get_pipeline()` | Get a pipeline by ID |
| `update_pipeline()` | Update pipeline steps (creates a draft) |
| `save_pipeline()` | Promote a pipeline draft to a named, published version |
| `archive_pipeline()` | Archive a pipeline |
| `unarchive_pipeline()` | Restore an archived pipeline |
| `create_pipeline_version()` | Snapshot the current pipeline steps as an immutable version |
| `list_pipeline_versions()` | List all versions of a pipeline |
| `discard_pipeline_draft()` | Discard draft changes and revert to a published version |
| `get_pipeline_rate()` | Get per-page rate for a pipeline |
| `run_pipeline()` | Execute a pipeline on a file |
| `get_pipeline_execution()` | Poll pipeline execution status |
| `list_pipeline_executions()` | List recent executions for a pipeline |
| `get_step_result()` | Fetch the result of a specific pipeline step |
| `list_custom_processors()` | List custom processors for your team |
| `get_custom_processor_status()` | Check custom processor generation status |
| `list_custom_processor_versions()` | List versions of a custom processor |
| `set_active_processor_version()` | Set the active version of a custom processor |
| `archive_custom_processor()` | Archive a custom processor |
| `create_extraction_schema()` | Create a reusable extraction schema |
| `list_extraction_schemas()` | List saved extraction schemas |
| `get_extraction_schema()` | Get a schema by ID |
| `update_extraction_schema()` | Update schema fields or create a new version |
| `delete_extraction_schema()` | Archive (soft-delete) an extraction schema |
| `run_custom_pipeline()` | (Deprecated) Use `run_custom_processor()` instead |
| `ocr()` | (Deprecated) Use `convert()` instead |
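The pipeline methods in the table compose into an upload, create, run, and poll workflow. In this sketch only the method names come from the table; every argument name, return attribute, and call shape is an assumption:

```python
from datalab import DatalabClient  # module and class name are assumptions

client = DatalabClient()

# Upload the input file to Datalab storage (return shape is an assumption).
uploaded = client.upload_files(["contract.pdf"])

# Create a pipeline; the steps format here is hypothetical.
pipeline = client.create_pipeline(
    name="contract-review",
    steps=[{"processor": "convert"}, {"processor": "extract"}],
)

# Run the pipeline on the uploaded file, then poll its status.
execution = client.run_pipeline(pipeline.id, file_id=uploaded[0].id)
status = client.get_pipeline_execution(execution.id)
print(status)
```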
## Next Steps

- **Document Conversion**: Convert PDFs, images, and documents to Markdown, HTML, JSON, or chunks.
- **Structured Extraction**: Extract structured data from documents using JSON schemas.
- **Document Segmentation**: Segment documents into logical sections.
- **Form Filling**: Fill PDF and image forms with structured field data.
- **Pipelines**: Chain processors into versioned, reusable pipelines.
- **File Management**: Upload, list, and manage files in Datalab storage.