- A Datalab account with an API key (new accounts include $5 in free credits)
- Python 3.10+ installed
- The Datalab SDK: `pip install datalab-python-sdk`
- Your `DATALAB_API_KEY` environment variable set
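The examples in this guide assume the key is visible to the Python process. A quick sanity check, using only the standard library and nothing Datalab-specific:

```python
import os

# The SDK examples below assume DATALAB_API_KEY is set in the environment.
if not os.environ.get("DATALAB_API_KEY"):
    raise RuntimeError("DATALAB_API_KEY is not set; export it before continuing")
```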
Why Pipelines
Individual endpoints like `/convert` and `/extract` work well for one-off tasks. Pipelines are better when you need to:
- Chain processors — Convert a document, then extract structured data, in one call
- Version your configuration — Pin production integrations to a specific version while iterating on drafts
- Standardize processing — Share pipeline configurations across your team
- Track execution — Monitor each processor’s status as a pipeline runs
You can build pipelines visually in Forge or programmatically via the SDK and API.
How Pipelines Work
A pipeline is an ordered chain of processors. Each processor processes the document and passes its output to the next via checkpoints. The `fill` processor is the exception — it runs as a standalone step and cannot be chained.
Processor Types
| Processor | Description | Can Follow |
|---|---|---|
| `convert` | Parse document to markdown/HTML/JSON | Must be first |
| `segment` | Split document into logical sections | `convert` |
| `extract` | Extract structured data using a JSON schema | `convert`, `segment`, `custom` |
| `custom` | Run a custom processor | `convert` |
| `fill` | Fill form fields in a PDF or image | Standalone only |
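The `extract` processor is driven by a JSON schema describing the fields you want back. As a rough illustration only (the exact schema dialect and options the API accepts are not confirmed here), an invoice schema might look like this:

```python
# Illustrative only: a JSON-style schema for an extract processor.
# Field names and the exact schema dialect the API expects are assumptions.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "string", "description": "ISO 8601 date"},
        "vendor_name": {"type": "string"},
        "total_amount": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                },
            },
        },
    },
    "required": ["invoice_number", "total_amount"],
}
```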
Composition Rules
- Every pipeline starts with a `convert` or `fill` processor
- `extract` is always terminal (nothing can follow it)
- `segment` can feed into `extract`
- `custom` can feed into `extract`
- `fill` is always standalone — it cannot follow or precede other processors
| Pattern | Use Case |
|---|---|
| `convert` | Simple document parsing |
| `convert` → `extract` | Parse and extract structured fields |
| `convert` → `segment` | Parse and split into sections |
| `convert` → `segment` → `extract` | Split, then extract from each section |
| `convert` → `custom` → `extract` | Apply custom processing, then extract |
| `fill` | Version and track form-filling workflows |
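If you generate pipeline configurations in code, these rules are cheap to check before calling the API. The sketch below encodes them as plain Python; the list-of-names representation is purely illustrative and is not the SDK's own config format.

```python
# Illustrative only: validate a processor chain against the composition rules above.
# The list-of-strings representation is a stand-in, not the SDK's config format.
ALLOWED_PREDECESSORS = {
    "convert": set(),                                 # must be first
    "segment": {"convert"},
    "extract": {"convert", "segment", "custom"},
    "custom": {"convert"},
    "fill": set(),                                    # standalone only
}

def validate_chain(steps: list[str]) -> None:
    if not steps:
        raise ValueError("pipeline must contain at least one processor")
    if steps[0] not in ("convert", "fill"):
        raise ValueError("pipeline must start with convert or fill")
    if "fill" in steps and len(steps) > 1:
        raise ValueError("fill is standalone and cannot be chained")
    for prev, step in zip(steps, steps[1:]):
        if prev == "extract":
            raise ValueError("extract is terminal; nothing can follow it")
        if prev not in ALLOWED_PREDECESSORS[step]:
            raise ValueError(f"{step} cannot follow {prev}")

validate_chain(["convert", "segment", "extract"])     # ok
```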
Pipeline Lifecycle
Pipelines have three states:
- Draft — Edits auto-save. Not versioned yet.
- Saved — Named and visible in your pipeline list.
- Published — An immutable version snapshot. Safe to use in production.
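From code, the lifecycle roughly follows that same progression. The sketch below is hedged heavily: the import path, client class, and method names (`DatalabClient`, `pipelines.create`, `save`, `publish`) are assumptions for illustration, not confirmed SDK API; see the SDK reference for the real calls.

```python
# Hypothetical sketch of the draft -> saved -> published flow.
# The import path and every method name here are assumptions, not documented API.
from datalab_sdk import DatalabClient  # assumed import path

client = DatalabClient()  # assumed to read DATALAB_API_KEY from the environment

pipeline = client.pipelines.create(name="invoice-intake")  # starts life as a draft
pipeline.save()                                            # named and visible in your list
version = pipeline.publish()                               # immutable snapshot for production
print(version.id)
```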
Quick Example
Create a pipeline that converts a document and extracts invoice data:
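A minimal sketch of what this could look like with the SDK. The import path, client class, method names (`DatalabClient`, `pipelines.create`, `run`), and the processor config shape are assumptions rather than confirmed API, so treat it as a shape sketch.

```python
# Hypothetical sketch only: names and config shapes below are assumptions,
# not the documented SDK API. It illustrates a convert -> extract pipeline.
from datalab_sdk import DatalabClient  # assumed import path

client = DatalabClient()  # assumed to read DATALAB_API_KEY from the environment

invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}

# convert runs first, then extract applies the schema (see Composition Rules above).
pipeline = client.pipelines.create(
    name="invoice-extraction",
    processors=[
        {"type": "convert"},
        {"type": "extract", "schema": invoice_schema},
    ],
)

result = pipeline.run(file_path="invoice.pdf")  # assumed run helper
print(result)
```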
Pipelines vs Individual Endpoints
| | Individual Endpoints | Pipelines |
|---|---|---|
| Processors | One at a time | Chain multiple processors |
| Versioning | None | Draft, saved, published versions |
| Configuration | Pass options per request | Configure once, reuse |
| Forge UI | Playground | Full pipeline builder |
| Best for | Quick tests, simple tasks | Production integrations |
The individual endpoints (`/convert`, `/extract`, `/segment`) are not going away. Use them for simple, one-off processing. Use Pipelines when you need repeatability, versioning, or multi-processor chains.
Next Steps
- Create a Pipeline — Build your first pipeline with Forge or the SDK.
- Pipeline Versioning — Manage drafts, versions, and production deployments.
- Run a Pipeline — Execute pipelines with overrides, polling, and webhooks.
- SDK Reference — Full SDK reference for all pipeline methods.