Before you begin, make sure you have:
  1. A Datalab account with an API key (new accounts include $5 in free credits)
  2. Python 3.10+ installed
  3. The Datalab SDK: pip install datalab-python-sdk
  4. Your DATALAB_API_KEY environment variable set
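A quick way to confirm the last prerequisite is to check the environment before constructing a client. The sketch below is illustrative only: `require_api_key` is a hypothetical helper, not part of the Datalab SDK, and it assumes the client reads the key from `DATALAB_API_KEY`.

```python
import os


def require_api_key(env=None):
    """Return the API key from the environment, or fail fast if it is missing.

    Hypothetical helper for illustration; not part of the Datalab SDK.
    """
    env = os.environ if env is None else env
    key = env.get("DATALAB_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DATALAB_API_KEY is not set")
    return key
```

Calling `require_api_key()` at startup surfaces a missing key immediately rather than on the first API request.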

Version Lifecycle

Every pipeline goes through a predictable lifecycle:
State | active_version | Description
Draft | 0 | Edits auto-save. No published version yet.
Saved | 0 | Named pipeline, still no published version.
Published | 1, 2, … | Immutable version snapshots exist.
When you edit a published pipeline, your changes go into a draft. The published version is untouched until you explicitly publish again.
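The draft/published split can be pictured with a small in-memory model. This is a toy sketch for intuition only: `ToyPipeline` and its methods are hypothetical and do not reflect how the Datalab service is implemented. It assumes only what this guide states: edits go to a draft, publishing snapshots the draft under the next version number, and version 0 refers to the draft.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPipeline:
    """Illustrative stand-in for a pipeline; not part of the Datalab SDK."""

    draft_steps: list = field(default_factory=list)
    versions: dict = field(default_factory=dict)  # version number -> frozen steps
    active_version: int = 0  # 0 means nothing has been published yet

    def edit(self, steps):
        # Edits only ever touch the draft; published snapshots stay as they are.
        self.draft_steps = list(steps)

    def publish(self):
        # Snapshot the current draft under the next version number.
        number = max(self.versions, default=0) + 1
        self.versions[number] = tuple(self.draft_steps)  # tuple: read-only copy
        self.active_version = number
        return number

    def steps_for(self, version=None):
        # version=0 selects the draft; omitting it selects the active version.
        if version == 0:
            return tuple(self.draft_steps)
        if version is None:
            version = self.active_version
        if version not in self.versions:
            # Toy behaviour; the real service's response is not covered here.
            raise LookupError(f"no published version {version}")
        return self.versions[version]
```

In this model, editing after `publish()` changes what `steps_for(0)` returns while `steps_for()` and `steps_for(1)` keep returning the published snapshot.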

Publish a Version

Create an immutable snapshot of the current pipeline steps:
from datalab_sdk import DatalabClient

client = DatalabClient()

# Publish version 1
version = client.create_pipeline_version(
    "pl_abc123",
    description="Initial production release"
)
print(f"Published v{version.version}")  # v1
Each call increments the version number. Published versions are immutable — their steps cannot be changed.

Edit and Iterate

After publishing, any edits create a draft that is separate from the published version:
from datalab_sdk import PipelineProcessor

# Edit steps — this creates a draft
client.update_pipeline("pl_abc123", steps=[
    PipelineProcessor(type="convert", settings={"mode": "accurate"}),  # Changed
    PipelineProcessor(type="extract", settings={
        "page_schema": {"type": "object", "properties": {
            "title": {"type": "string"},
            "author": {"type": "string"}  # Added field
        }}
    })
])

# Test the draft
execution = client.run_pipeline("pl_abc123", file_path="test.pdf", version=0)

# Happy with changes? Publish a new version
version = client.create_pipeline_version("pl_abc123", description="Added author field")
print(f"Published v{version.version}")  # v2
version=0 explicitly runs the draft. Omitting version runs the active published version. See Run a Pipeline for version parameter details.
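The test-then-publish loop above could be wrapped in a small helper. The following is a minimal sketch, not an SDK feature: `publish_if_draft_passes` and the `looks_ok` callback are hypothetical names. It relies only on the `run_pipeline` and `create_pipeline_version` calls shown in this guide, and it leaves the caller to decide what a good result looks like (including any polling for completion, which is described in Run a Pipeline).

```python
def publish_if_draft_passes(client, pipeline_id, test_files, looks_ok, description):
    """Run the draft on each test file and publish only if every run passes.

    `client` is expected to behave like the DatalabClient used in this guide.
    `looks_ok` is a caller-supplied function (hypothetical) that inspects
    whatever run_pipeline returns and answers True or False.
    Returns the new version object, or None if any check failed.
    """
    for path in test_files:
        # version=0 runs the draft rather than the active published version.
        execution = client.run_pipeline(pipeline_id, file_path=path, version=0)
        if not looks_ok(execution):
            return None  # stop early and leave the draft unpublished
    return client.create_pipeline_version(pipeline_id, description=description)
```

Returning `None` on failure keeps the draft in place, so it can be fixed and retested or discarded.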

Discard a Draft

Discard unpublished draft changes and restore the steps of a published version:
# Discard draft, revert to active version
pipeline = client.discard_pipeline_draft("pl_abc123")

# Or revert to a specific version
pipeline = client.discard_pipeline_draft("pl_abc123", version=1)

Browse Version History

List all published versions for a pipeline:
result = client.list_pipeline_versions("pl_abc123")

for v in result["versions"]:
    print(f"v{v.version}: {v.description} (created {v.created})")
    print(f"  Steps: {[s['type'] for s in v.steps]}")
Versions are returned newest-first.
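Because the list is newest-first, the first entry is the most recent release. The helpers below are a sketch under that assumption; `latest_version` and `changelog` are hypothetical names, and they expect the same shape used in the loop above (a mapping with a "versions" list whose items expose `version` and `description`).

```python
def latest_version(result):
    """Return the newest published version, or None if nothing is published."""
    versions = result["versions"]
    return versions[0] if versions else None


def changelog(result):
    """Return one line per version, oldest first, for release notes."""
    # The listing is newest-first, so reverse it for chronological order.
    return [f"v{v.version}: {v.description}" for v in reversed(result["versions"])]
```

For example, `latest_version(client.list_pipeline_versions("pl_abc123"))` would give the entry to compare against a pinned production version.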

Best Practices

Pin production integrations to a specific version. When calling run_pipeline() from production code, pass an explicit version number. This protects you from accidental changes:
# Production code — pinned to v2
execution = client.run_pipeline(
    "pl_abc123",
    file_path="document.pdf",
    version=2  # Always runs v2, even if v3 is published later
)
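One way to keep the pin out of the source code is to read it from configuration. This is a sketch only: the `PIPELINE_VERSION` variable name and the `pinned_version` helper are hypothetical, not something the SDK defines. It rejects 0 so that production code can never run a draft by accident.

```python
import os


def pinned_version(env=None, var="PIPELINE_VERSION"):
    """Read a published version number from the environment.

    Hypothetical helper; the variable name is an assumption for this example.
    """
    env = os.environ if env is None else env
    raw = env.get(var)
    if raw is None:
        raise RuntimeError(f"{var} is not set; refusing to run unpinned")
    version = int(raw)  # raises ValueError for non-numeric input
    if version < 1:
        # 0 means the draft in this guide, which production should not use.
        raise ValueError(f"{var} must be a published version (1 or higher)")
    return version
```

The result can then be passed as `version=pinned_version()` to `run_pipeline()`, so moving production to a new version becomes a configuration change.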
Test drafts before publishing. Use version=0 to run the draft version against test documents:
# Test draft changes
execution = client.run_pipeline(
    "pl_abc123",
    file_path="test_document.pdf",
    version=0  # Runs draft
)
Use descriptions. Include a meaningful description when publishing so your team can understand what changed:
client.create_pipeline_version(
    "pl_abc123",
    description="Switch to accurate mode, add line_items extraction"
)
Archive unused pipelines. Keep your pipeline list clean:
client.archive_pipeline("pl_old123")

# Pass include_archived=True to list archived pipelines as well
result = client.list_pipelines(include_archived=True)

Next Steps

Run a Pipeline

Execute pipelines with version selection, overrides, and polling.

Create a Pipeline

Build pipelines with Forge or the SDK.

Pipeline Overview

Processor types, composition rules, and when to use pipelines.

SDK Reference

Full SDK reference for all pipeline methods.