Pipelines chain processors — convert, extract, segment, and custom — into a single reusable unit. Define a pipeline once, version it, and run it against any document with one API call.

Before you begin, make sure you have:
  1. A Datalab account with an API key (new accounts include $5 in free credits)
  2. Python 3.10+ installed
  3. The Datalab SDK: pip install datalab-python-sdk
  4. Your DATALAB_API_KEY environment variable set

Why Pipelines

Individual endpoints like /convert and /extract work well for one-off tasks. Pipelines are better when you need to:
  • Chain processors — Convert a document, then extract structured data, in one call
  • Version your configuration — Pin production integrations to a specific version while iterating on drafts
  • Standardize processing — Share pipeline configurations across your team
  • Track execution — Monitor each processor’s status as a pipeline runs
You can build pipelines visually in Forge or programmatically via the SDK and API.

How Pipelines Work

A pipeline is an ordered chain of processors. Each processor transforms the document and passes its output to the next via checkpoints.
convert → segment → extract
The convert processor always runs first. Downstream processors depend on it.
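The checkpoint hand-off described above can be sketched as a simple fold over the processor list. This is a conceptual illustration only — real execution happens on Datalab's servers, and the toy `convert` and `segment` functions here are stand-ins, not the actual processors:

```python
# Conceptual sketch of checkpoint passing between processors.
def run_chain(document, processors):
    checkpoint = document  # the initial input is the document itself
    for processor in processors:
        checkpoint = processor(checkpoint)  # each output feeds the next step
    return checkpoint

# Toy stand-ins for the convert and segment processors
def convert(doc):
    return {"markdown": f"# {doc}"}

def segment(checkpoint):
    return {"sections": [checkpoint["markdown"]]}

result = run_chain("invoice.pdf", [convert, segment])
assert result == {"sections": ["# invoice.pdf"]}
```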

Processor Types

| Processor | Description | Can Follow |
|-----------|-------------|------------|
| convert | Parse document to markdown/HTML/JSON | Must be first |
| segment | Split document into logical sections | convert |
| extract | Extract structured data using a JSON schema | convert, segment, custom |
| custom | Run a custom processor | convert |

Composition Rules

  • Every pipeline starts with a convert processor
  • extract is always terminal (nothing can follow it)
  • segment can feed into extract
  • custom can feed into extract
Common patterns:
| Pattern | Use Case |
|---------|----------|
| convert | Simple document parsing |
| convert → extract | Parse and extract structured fields |
| convert → segment | Parse and split into sections |
| convert → segment → extract | Split, then extract from each section |
| convert → custom → extract | Apply custom processing, then extract |
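The composition rules above can be expressed as a small validator over lists of step types. This is a hypothetical helper for illustration — the API enforces these rules server-side:

```python
# Which processor types each step is allowed to follow, per the
# composition rules (convert must be first, so it has no predecessors).
ALLOWED_PREDECESSORS = {
    "convert": set(),
    "segment": {"convert"},
    "extract": {"convert", "segment", "custom"},
    "custom": {"convert"},
}

def validate_chain(types):
    if not types or types[0] != "convert":
        return False  # every pipeline starts with convert
    for prev, cur in zip(types, types[1:]):
        if prev == "extract":
            return False  # extract is always terminal
        if prev not in ALLOWED_PREDECESSORS[cur]:
            return False
    return True

assert validate_chain(["convert", "segment", "extract"])
assert validate_chain(["convert", "custom", "extract"])
assert not validate_chain(["segment", "extract"])            # must start with convert
assert not validate_chain(["convert", "extract", "segment"]) # extract is terminal
```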

Pipeline Lifecycle

Pipelines have three states:
  1. Draft — Edits auto-save. Not versioned yet.
  2. Saved — Named and visible in your pipeline list.
  3. Published — An immutable version snapshot. Safe to use in production.
Create (draft) → Save (named) → Publish version (immutable)
         ↑                              |
         └──── Edit (new draft) ←───────┘
When you edit a published pipeline, your changes go into a draft. The published version remains unchanged until you publish a new version. You can discard the draft at any time to revert. See Pipeline Versioning for the full lifecycle.
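The lifecycle above can be sketched as a transition table. This is purely illustrative — Datalab tracks these states server-side:

```python
# Lifecycle transitions: draft -> saved -> published, and editing a
# published pipeline forks a new draft.
TRANSITIONS = {
    ("draft", "save"): "saved",
    ("saved", "publish"): "published",
    ("published", "edit"): "draft",
}

state = "draft"
for action in ("save", "publish", "edit"):
    state = TRANSITIONS[(state, action)]
assert state == "draft"  # editing a published pipeline opens a new draft
```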

Quick Example

Create a pipeline that converts a document and extracts invoice data:
```python
from datalab_sdk import DatalabClient, PipelineProcessor

client = DatalabClient()

# Define steps
steps = [
    PipelineProcessor(type="convert", settings={"mode": "balanced"}),
    PipelineProcessor(type="extract", settings={
        "page_schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "total_amount": {"type": "number"},
                "vendor_name": {"type": "string"}
            }
        }
    })
]

# Create and save
pipeline = client.create_pipeline(steps=steps)
pipeline = client.save_pipeline(pipeline.pipeline_id, name="Invoice Extractor")

# Run on a document
execution = client.run_pipeline(
    pipeline.pipeline_id,
    file_path="invoice.pdf"
)

# Poll until complete
execution = client.get_pipeline_execution(
    execution.execution_id,
    max_polls=300
)

# Get extraction result
result = client.get_step_result(execution.execution_id, step_index=1)
print(result)
```

Pipelines vs Individual Endpoints

| | Individual Endpoints | Pipelines |
|---|---------------------|-----------|
| Processors | One at a time | Chain multiple processors |
| Versioning | None | Draft, saved, published versions |
| Configuration | Pass options per request | Configure once, reuse |
| Forge UI | Playground | Full pipeline builder |
| Best for | Quick tests, simple tasks | Production integrations |
Individual endpoints (/convert, /extract, /segment) are not going away. Use them for simple, one-off processing. Use Pipelines when you need repeatability, versioning, or multi-processor chains.

Next Steps

Create a Pipeline

Build your first pipeline with Forge or the SDK.

Pipeline Versioning

Manage drafts, versions, and production deployments.

Run a Pipeline

Execute pipelines with overrides, polling, and webhooks.

SDK Reference

Full SDK reference for all pipeline methods.