Overview
Pipelines chain processors (convert, extract, segment, custom) into reusable, versioned configurations. See Pipeline Overview for concepts.
Basic Usage
Models
PipelineProcessor
Defines a single processor in a pipeline.

| Field | Type | Required | Description |
|---|---|---|---|
| type | str | Yes | "convert", "extract", "segment", or "custom" |
| settings | dict | Yes | Step-specific configuration |
| custom_processor_id | str | No | Custom processor ID for "custom" steps |
| eval_rubric_id | int | No | Evaluation rubric to apply |
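The field table above can be illustrated with a small sketch that builds step definitions as plain dictionaries and checks the `type` value. The helper name `make_step` is hypothetical and not part of the SDK, and the empty `settings` dictionaries are placeholders, since the step-specific keys are not listed here.

```python
from typing import Any, Dict, Optional

# The four processor types named in the table above.
VALID_TYPES = {"convert", "extract", "segment", "custom"}


def make_step(
    step_type: str,
    settings: Dict[str, Any],
    custom_processor_id: Optional[str] = None,
    eval_rubric_id: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build one step definition as a plain dict."""
    if step_type not in VALID_TYPES:
        raise ValueError(f"unknown processor type: {step_type!r}")
    step: Dict[str, Any] = {"type": step_type, "settings": settings}
    # Optional fields are only included when provided.
    if custom_processor_id is not None:
        step["custom_processor_id"] = custom_processor_id
    if eval_rubric_id is not None:
        step["eval_rubric_id"] = eval_rubric_id
    return step


# Placeholder settings; real keys depend on the processor type.
steps = [
    make_step("convert", {}),
    make_step("extract", {}, eval_rubric_id=7),
]
```

Validating the type up front surfaces typos before a pipeline is created, although the service may apply its own validation as well.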
PipelineConfig
Returned by pipeline CRUD methods.

| Field | Type | Description |
|---|---|---|
| pipeline_id | str | Unique ID (pl_XXXXX) |
| steps | list | Ordered list of step definitions |
| name | str | Pipeline name (set via save_pipeline) |
| is_saved | bool | Whether pipeline has been saved |
| archived | bool | Whether pipeline is archived |
| active_version | int | Current published version (0 = no published version) |
| created | datetime | Creation timestamp |
| updated | datetime | Last update timestamp |
PipelineVersion
Immutable snapshot of pipeline steps at a point in time.

| Field | Type | Description |
|---|---|---|
| version | int | Version number |
| steps | list | Steps at this version |
| description | str | Version description |
| created | datetime | When version was published |
PipelineExecution
Result from running a pipeline.

| Field | Type | Description |
|---|---|---|
| execution_id | str | Unique ID (pex_XXXXX) |
| pipeline_id | str | Pipeline that was executed |
| pipeline_version | int | Version used (0 = draft) |
| status | str | pending, running, completed, completed_with_errors, failed |
| steps | list | List of PipelineExecutionStepResult |
| started_at | datetime | Execution start time |
| completed_at | datetime | Execution end time |
| created | datetime | When execution was created |
| config_snapshot | dict | Frozen step configuration used |
| input_config | dict | Input file details |
| rate_breakdown | dict | Billing breakdown |
PipelineExecutionStepResult
Status of a single step within an execution.

| Field | Type | Description |
|---|---|---|
| step_index | int | Position in pipeline |
| step_type | str | Step type |
| status | str | pending, dispatched, running, completed, failed, skipped |
| result_url | str | URL to fetch step result |
| checkpoint_id | str | Checkpoint passed to downstream steps |
| started_at | datetime | Step start time |
| finished_at | datetime | Step end time |
| error_message | str | Error details if failed |
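As a rough illustration of how the step fields above might be consumed, the sketch below models a step result with a local dataclass and derives per-step durations and the list of failed steps. The `StepResult` class and both helpers are stand-ins written for this example, not SDK types, and the sample data is made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class StepResult:
    """Local stand-in mirroring a subset of the fields in the table above."""
    step_index: int
    step_type: str
    status: str
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    error_message: Optional[str] = None


def step_duration_seconds(step: StepResult) -> Optional[float]:
    """Return elapsed seconds, or None if either timestamp is missing."""
    if step.started_at is None or step.finished_at is None:
        return None
    return (step.finished_at - step.started_at).total_seconds()


def failed_steps(steps: List[StepResult]) -> List[StepResult]:
    """Return the steps whose status is "failed"."""
    return [s for s in steps if s.status == "failed"]


# Made-up sample data for illustration only.
sample_steps = [
    StepResult(0, "convert", "completed",
               started_at=datetime(2024, 1, 1, 12, 0, 0),
               finished_at=datetime(2024, 1, 1, 12, 1, 30)),
    StepResult(1, "extract", "failed",
               started_at=datetime(2024, 1, 1, 12, 1, 30),
               error_message="timeout"),
]

durations = [step_duration_seconds(s) for s in sample_steps]
errors = [(s.step_index, s.error_message) for s in failed_steps(sample_steps)]
```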
Pipeline Management
Create
Save
Update
If the pipeline already has a published version, updating it creates a draft.
List
Get
Archive / Unarchive
Versioning
Publish a Version
List Versions
Discard Draft
Get Rate
Execution
Run
| Parameter | Type | Default | Description |
|---|---|---|---|
| pipeline_id | str | Required | Pipeline to run |
| file_path | str | - | Local file path |
| file_url | str | - | URL to document |
| page_range | str | - | Pages to process ("0-5,10") |
| output_format | str | - | Override output format |
| skip_cache | bool | False | Skip cached results |
| run_evals | bool | False | Run eval rubrics on steps |
| webhook_url | str | - | Webhook URL for completion |
| version | int | - | Version to run (omit=active, 0=draft, N=specific) |
| max_polls | int | 1 | Polling attempts |
| poll_interval | int | 1 | Seconds between polls |
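The `version` parameter encodes three cases in a single optional integer. The sketch below spells out that convention as read from the table; `resolve_version` and `describe_version` are hypothetical helpers for illustration, and the service performs the actual resolution.

```python
from typing import Optional


def resolve_version(requested: Optional[int], active_version: int) -> int:
    """Hypothetical helper mirroring the `version` convention in the table.

    None -> the pipeline's active (published) version
    0    -> the draft
    N    -> that specific published version
    """
    if requested is None:
        # What happens when active_version is 0 (nothing published)
        # is not specified here; this sketch simply returns it.
        return active_version
    if requested < 0:
        raise ValueError("version must be 0 (draft) or a positive integer")
    return requested


def describe_version(version: int) -> str:
    """Human-readable label for a resolved version number."""
    return "draft" if version == 0 else f"published v{version}"


labels = [
    describe_version(resolve_version(None, active_version=3)),
    describe_version(resolve_version(0, active_version=3)),
    describe_version(resolve_version(2, active_version=3)),
]
```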
Poll Execution
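Polling can be pictured as a bounded loop governed by `max_polls` and `poll_interval`. The sketch below is a generic version of that idea, not the SDK's implementation: `fetch` is a hypothetical callable that returns the current execution as a dictionary, and treating `completed`, `completed_with_errors`, and `failed` as terminal statuses is an assumption based on the status list in PipelineExecution.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal statuses; "pending" and "running" are treated as in progress.
TERMINAL_STATUSES = {"completed", "completed_with_errors", "failed"}


def poll_until_done(
    fetch: Callable[[], Dict[str, Any]],
    max_polls: int = 1,
    poll_interval: float = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call `fetch` up to `max_polls` times, stopping at a terminal status."""
    result = fetch()
    for _ in range(max_polls - 1):
        if result["status"] in TERMINAL_STATUSES:
            break
        sleep(poll_interval)
        result = fetch()
    # May still be non-terminal if max_polls was exhausted.
    return result


# Simulated execution that finishes on the third poll.
_statuses = iter(["pending", "running", "completed"])
final = poll_until_done(
    lambda: {"status": next(_statuses)},
    max_polls=5,
    poll_interval=0,
    sleep=lambda seconds: None,  # skip real sleeping in this demo
)
```

Because the loop is bounded, callers should check the returned status rather than assume the execution finished.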
List Executions
Get Step Result
Async Usage
All pipeline methods are also available on AsyncDatalabClient.
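One reason to reach for an async client is to run a pipeline over several files concurrently. The sketch below shows that pattern with `asyncio.gather`. It uses a stub class in place of the real client so that it runs on its own; the method name `run_pipeline` and its return shape are assumptions made for this example, not confirmed SDK signatures.

```python
import asyncio
from typing import Dict, List


class StubAsyncClient:
    """Hypothetical stand-in; NOT the real AsyncDatalabClient."""

    async def run_pipeline(self, pipeline_id: str, file_path: str) -> Dict[str, str]:
        # Simulate a network round trip without doing any real I/O.
        await asyncio.sleep(0)
        return {"pipeline_id": pipeline_id, "file_path": file_path, "status": "completed"}


async def run_many(client: StubAsyncClient, pipeline_id: str, paths: List[str]) -> List[Dict[str, str]]:
    """Start one run per file and wait for all of them together."""
    tasks = [client.run_pipeline(pipeline_id, file_path=p) for p in paths]
    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*tasks)


results = asyncio.run(run_many(StubAsyncClient(), "pl_XXXXX", ["a.pdf", "b.pdf"]))
```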
Next Steps
Pipeline Overview: Concepts, processor types, and when to use pipelines.
Create a Pipeline: Step-by-step guide to building pipelines.
Pipeline Versioning: Manage drafts and publish versions.
Run a Pipeline: Execution, overrides, and result retrieval.