Before you begin, make sure you have:
A Datalab account with an API key (new accounts include $5 in free credits)
Python 3.10+ installed
The Datalab SDK: pip install datalab-python-sdk
Your DATALAB_API_KEY environment variable set
Version Lifecycle
Every pipeline goes through a predictable lifecycle:
State      active_version  Description
Draft      0               Edits auto-save. No published version yet.
Saved      0               Named pipeline, still no published version.
Published  1, 2, …         Immutable version snapshots exist.
When you edit a published pipeline, your changes go into a draft. The published version is untouched until you explicitly publish again.
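The lifecycle can be illustrated with a small in-memory model. This is only a sketch: PipelineModel and its methods are hypothetical names invented for illustration and are not part of the Datalab SDK. It also assumes that publishing makes the new version the active one, which this page implies but does not state outright.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class PipelineModel:
    """Hypothetical in-memory stand-in for a pipeline (not the Datalab SDK)."""

    draft_steps: list = field(default_factory=list)
    published: dict = field(default_factory=dict)  # version number -> step snapshot
    active_version: int = 0  # 0 means nothing has been published yet

    def edit(self, steps):
        # Edits only ever touch the draft, never a published snapshot.
        self.draft_steps = copy.deepcopy(steps)

    def publish(self):
        # Snapshot the draft under the next version number.
        # Assumption: the newly published version becomes the active one.
        self.active_version = max(self.published, default=0) + 1
        self.published[self.active_version] = copy.deepcopy(self.draft_steps)
        return self.active_version


pipeline = PipelineModel()
pipeline.edit([{"type": "convert"}])
v1 = pipeline.publish()

# Editing after publishing changes the draft only.
pipeline.edit([{"type": "convert"}, {"type": "extract"}])
steps_v1_after_edit = pipeline.published[v1]

v2 = pipeline.publish()
```

In this model the v1 snapshot still holds a single step after the second edit, which mirrors the rule that a published version stays untouched until you publish again.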
Publish a Version
Create an immutable snapshot of the current pipeline steps:
Python SDK
from datalab_sdk import DatalabClient

client = DatalabClient()

# Publish version 1
version = client.create_pipeline_version(
    "pl_abc123",
    description="Initial production release",
)
print(f"Published v{version.version}")  # v1
Each call increments the version number. Published versions are immutable — their steps cannot be changed.
Edit and Iterate
After publishing, any edits create a draft that is separate from the published version:
from datalab_sdk import PipelineProcessor

# Edit steps: this creates a draft
client.update_pipeline("pl_abc123", steps=[
    PipelineProcessor(type="convert", settings={"mode": "accurate"}),  # Changed
    PipelineProcessor(type="extract", settings={
        "page_schema": {"type": "object", "properties": {
            "title": {"type": "string"},
            "author": {"type": "string"},  # Added field
        }}
    }),
])

# Test the draft
execution = client.run_pipeline("pl_abc123", file_path="test.pdf", version=0)

# Happy with changes? Publish a new version
version = client.create_pipeline_version("pl_abc123", description="Added author field")
print(f"Published v{version.version}")  # v2
version=0 explicitly runs the draft. Omitting version runs the active published version. See Run a Pipeline for version parameter details.
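The version selection rule can be written out as a small function. This is a hypothetical helper for illustration only; the service performs this resolution itself, and resolve_version is not an SDK function. How an omitted version behaves before anything is published is not covered on this page, so the error raised for that case is an assumption.

```python
from typing import Optional


def resolve_version(requested: Optional[int], active_version: int) -> int:
    """Return the version a run would target under the rules described above.

    Hypothetical sketch, not part of the Datalab SDK.
    """
    if requested is None:
        if active_version == 0:
            # Assumption: with nothing published, an omitted version cannot resolve.
            raise ValueError("no published version yet; pass version=0 to run the draft")
        return active_version  # omitted version -> active published version
    if requested < 0:
        raise ValueError("version must be 0 (draft) or a positive published version")
    return requested  # 0 -> draft, N -> that published version


draft_target = resolve_version(0, active_version=2)
default_target = resolve_version(None, active_version=2)
pinned_target = resolve_version(1, active_version=2)
```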
Discard a Draft
Discard the draft's changes and restore a published version's steps:
Python SDK
# Discard draft, revert to active version
pipeline = client.discard_pipeline_draft("pl_abc123")

# Or revert to a specific version
pipeline = client.discard_pipeline_draft("pl_abc123", version=1)
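To make the two forms of the call concrete, here is a minimal model of what discarding does to the draft. It is a sketch under stated assumptions: discard_draft and the published mapping are hypothetical and do not come from the SDK, and the real service may validate or report errors differently.

```python
import copy


def discard_draft(published, active_version, version=None):
    """Return fresh draft steps copied from a published snapshot.

    Hypothetical model of the behavior described above, not SDK code.
    `published` maps version numbers to lists of step dicts.
    """
    source = active_version if version is None else version
    if source not in published:
        raise KeyError(f"version {source} has not been published")
    # Copy so that later draft edits cannot alter the published snapshot.
    return copy.deepcopy(published[source])


published = {
    1: [{"type": "convert"}],
    2: [{"type": "convert"}, {"type": "extract"}],
}
draft_from_active = discard_draft(published, active_version=2)
draft_from_v1 = discard_draft(published, active_version=2, version=1)
```

The deep copy is the design point: the restored draft starts out equal to the published steps but is a separate object, so editing it again leaves the snapshot intact.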
Browse Version History
List all published versions for a pipeline:
result = client.list_pipeline_versions("pl_abc123")
for v in result["versions"]:
    print(f"v{v.version}: {v.description} (created {v.created})")
    print(f"  Steps: {[s['type'] for s in v.steps]}")
Versions are returned newest-first.
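When reviewing history, it can help to see which step types changed between two versions. The helper below is a sketch that works on plain lists of step dicts with a "type" key, matching the shape used in the listing above; step_type_changes is a hypothetical name, not an SDK function.

```python
def step_type_changes(old_steps, new_steps):
    """Return (added, removed) step types between two versions' steps.

    Hypothetical helper; assumes each step is a dict with a "type" key.
    """
    old_types = [s["type"] for s in old_steps]
    new_types = [s["type"] for s in new_steps]
    added = [t for t in new_types if t not in old_types]
    removed = [t for t in old_types if t not in new_types]
    return added, removed


v1_steps = [{"type": "convert"}]
v2_steps = [{"type": "convert"}, {"type": "extract"}]
added, removed = step_type_changes(v1_steps, v2_steps)
```

This compares step types only; changes to a step's settings would need a deeper comparison.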
Best Practices
Pin production integrations to a specific version. When calling run_pipeline() from production code, pass an explicit version number. This protects you from accidental changes:
# Production code — pinned to v2
execution = client.run_pipeline(
    "pl_abc123",
    file_path="document.pdf",
    version=2,  # Always runs v2, even if v3 is published later
)
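One way to keep the pinned number out of the code itself is to read it from configuration. The sketch below assumes an environment variable named PIPELINE_VERSION, which is a hypothetical name chosen for this example and is not something the SDK reads on its own.

```python
import os


def pinned_version(env=os.environ, name="PIPELINE_VERSION"):
    """Read a pinned pipeline version from the environment.

    `PIPELINE_VERSION` is a hypothetical variable name chosen for this sketch.
    """
    raw = env.get(name)
    if raw is None:
        raise RuntimeError(f"{name} is not set; refusing to run an unpinned pipeline")
    version = int(raw)  # raises ValueError on non-numeric input
    if version < 1:
        # 0 selects the draft, which production code should not run.
        raise ValueError(f"{name} must be a published version (>= 1), got {version}")
    return version


example = pinned_version({"PIPELINE_VERSION": "2"})
```

The returned value could then be passed as the version argument to run_pipeline(). Failing loudly when the variable is missing or set to 0 keeps production from silently running the draft or whichever version happens to be active.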
Test drafts before publishing. Use version=0 to run the draft version against test documents:
# Test draft changes
execution = client.run_pipeline(
    "pl_abc123",
    file_path="test_document.pdf",
    version=0,  # Runs draft
)
Use descriptions. Include a meaningful description when publishing so your team can understand what changed:
client.create_pipeline_version(
    "pl_abc123",
    description="Switch to accurate mode, add line_items extraction",
)
Archive unused pipelines. Keep your pipeline list clean:
client.archive_pipeline("pl_old123")

# Include archived pipelines in the listing when you need them
result = client.list_pipelines(include_archived=True)
Next Steps
Run a Pipeline: Execute pipelines with version selection, overrides, and polling.
Create a Pipeline: Build pipelines with Forge or the SDK.
Pipeline Overview: Processor types, composition rules, and when to use pipelines.
SDK Reference: Full SDK reference for all pipeline methods.