- A Datalab account with an API key (new accounts include $5 in free credits)
- Python 3.10+ installed
- The Datalab SDK: `pip install datalab-python-sdk`
- Your `DATALAB_API_KEY` environment variable set
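For example, on macOS or Linux you can set the key for your current shell session (replace the placeholder with your own key):

```shell
# Make the API key available to the SDK for this shell session
export DATALAB_API_KEY="your-api-key-here"
```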
## Why Pipelines
Individual endpoints like `/convert` and `/extract` work well for one-off tasks. Pipelines are better when you need to:
- Chain processors — Convert a document, then extract structured data, in one call
- Version your configuration — Pin production integrations to a specific version while iterating on drafts
- Standardize processing — Share pipeline configurations across your team
- Track execution — Monitor each processor’s status as a pipeline runs
You can build pipelines visually in Forge or programmatically via the SDK and API.
## How Pipelines Work
A pipeline is an ordered chain of processors. Each processor transforms the document and passes its output to the next via checkpoints. The `convert` processor always runs first; downstream processors depend on its output.
### Processor Types
| Processor | Description | Can Follow |
|---|---|---|
| `convert` | Parse document to markdown/HTML/JSON | Must be first |
| `segment` | Split document into logical sections | `convert` |
| `extract` | Extract structured data using a JSON schema | `convert`, `segment`, `custom` |
| `custom` | Run a custom processor | `convert` |
### Composition Rules
- Every pipeline starts with a `convert` processor
- `extract` is always terminal (nothing can follow it)
- `segment` can feed into `extract`
- `custom` can feed into `extract`
| Pattern | Use Case |
|---|---|
| `convert` | Simple document parsing |
| `convert → extract` | Parse and extract structured fields |
| `convert → segment` | Parse and split into sections |
| `convert → segment → extract` | Split, then extract from each section |
| `convert → custom → extract` | Apply custom processing, then extract |
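The composition rules and patterns above can be expressed as a small checker. This is illustrative standalone Python, not part of the Datalab SDK:

```python
# Which processor types each processor may directly follow, per the
# composition rules above. convert must be first, so it follows nothing.
ALLOWED_PREDECESSORS = {
    "convert": set(),
    "segment": {"convert"},
    "custom": {"convert"},
    "extract": {"convert", "segment", "custom"},
}

def is_valid_chain(types):
    """Return True if a list of processor type names forms a legal pipeline."""
    if not types or types[0] != "convert":
        return False  # every pipeline starts with convert
    for prev, curr in zip(types, types[1:]):
        if prev == "extract":
            return False  # extract is terminal; nothing can follow it
        if prev not in ALLOWED_PREDECESSORS.get(curr, set()):
            return False
    return True

# All five patterns from the table are accepted:
for chain in (["convert"],
              ["convert", "extract"],
              ["convert", "segment"],
              ["convert", "segment", "extract"],
              ["convert", "custom", "extract"]):
    assert is_valid_chain(chain)
```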
## Pipeline Lifecycle

Pipelines have three states:

- Draft — Edits auto-save. Not versioned yet.
- Saved — Named and visible in your pipeline list.
- Published — An immutable version snapshot. Safe to use in production.
## Quick Example
Create a pipeline that converts a document and extracts invoice data:
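A minimal sketch of the quick example above (the `convert → extract` pattern for invoice data). The schema and processor-chain shapes are illustrative, and the SDK names in the comments (`DatalabClient`, `create_pipeline`, `run`) are assumptions, not confirmed API; see the SDK Reference for the real calls.

```python
# JSON schema describing the fields the extract processor should return.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "due_date": {"type": "string"},
    },
    "required": ["invoice_number", "total"],
}

# Ordered processor chain: convert must come first, extract is terminal.
processors = [
    {"type": "convert"},
    {"type": "extract", "schema": invoice_schema},
]

# Hypothetical SDK usage (names are assumptions; check the SDK Reference):
# from datalab_sdk import DatalabClient
# client = DatalabClient()  # reads DATALAB_API_KEY from the environment
# pipeline = client.create_pipeline(name="invoice-extraction",
#                                   processors=processors)
# result = pipeline.run("invoice.pdf")
```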
## Pipelines vs Individual Endpoints

| | Individual Endpoints | Pipelines |
|---|---|---|
| Processors | One at a time | Chain multiple processors |
| Versioning | None | Draft, saved, published versions |
| Configuration | Pass options per request | Configure once, reuse |
| Forge UI | Playground | Full pipeline builder |
| Best for | Quick tests, simple tasks | Production integrations |
The individual endpoints (`/convert`, `/extract`, `/segment`) are not going away. Use them for simple, one-off processing. Use pipelines when you need repeatability, versioning, or multi-processor chains.
## Next Steps
- **Create a Pipeline** — Build your first pipeline with Forge or the SDK.
- **Pipeline Versioning** — Manage drafts, versions, and production deployments.
- **Run a Pipeline** — Execute pipelines with overrides, polling, and webhooks.
- **SDK Reference** — Full SDK reference for all pipeline methods.