Custom Processors are currently in beta. Contact support@datalab.to for access.
Custom processors build on the `convert` processor. When standard conversion doesn’t produce exactly what you need (edge-case layouts, domain-specific formatting, or use-case-specific output transformations), custom processors let you fine-tune the result.
Before you begin, make sure you have:
- A Datalab account with an API key (new accounts include $5 in free credits)
- Python 3.10+ installed
- The Datalab SDK: `pip install datalab-python-sdk`
- Your `DATALAB_API_KEY` environment variable set
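A quick preflight check for the prerequisites above can be sketched as follows. The helper name `check_prerequisites` is hypothetical and not part of the Datalab SDK. It only inspects the interpreter version and the `DATALAB_API_KEY` environment variable, and does not verify that the key is valid or that the SDK is installed.

```python
import os
import sys


def check_prerequisites(env=None, version_info=None):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration, not part of the Datalab SDK.
    """
    env = os.environ if env is None else env
    version_info = sys.version_info if version_info is None else version_info

    problems = []
    # The docs ask for Python 3.10 or newer.
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    # The API key is expected in the DATALAB_API_KEY environment variable.
    if not env.get("DATALAB_API_KEY"):
        problems.append("DATALAB_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"Missing prerequisite: {problem}")
```

The `env` and `version_info` parameters exist only so the check can be exercised without touching the real environment.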
How Custom Processors Work
A custom processor applies modifications on top of document conversion. The flow is:
- The `convert` processor parses your document into structured output
- The custom processor applies your modifications to refine that output

Modifications fall into three categories:
- Block-level — Modify individual blocks (e.g., rewrite table captions, summarize content)
- Page-level — Modify entire pages with full structural control (e.g., reorder blocks, add/remove elements)
- Classification — Classify pages into categories for downstream routing
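To make the difference in scope concrete, here is a toy model in plain Python. It does not use the Datalab SDK and says nothing about how custom processors are implemented; the `Block` type and the three helper functions are hypothetical. A block-level edit sees one block at a time, a page-level edit sees the whole list of blocks on a page, and classification returns a label per page without changing the content.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for one block of converted output."""
    kind: str  # e.g. "heading", "table", "text"
    text: str


def apply_block_level(pages, edit):
    """Call edit(block) on every block independently."""
    return [[edit(block) for block in page] for page in pages]


def apply_page_level(pages, edit):
    """Call edit(blocks) once per page; it may reorder, add, or drop blocks."""
    return [edit(list(page)) for page in pages]


def classify_pages(pages, label):
    """Return one category per page, leaving the pages untouched."""
    return [label(page) for page in pages]


pages = [
    [Block("text", "intro"), Block("heading", "Report")],
    [Block("table", "q1 | q2")],
]

# Block-level: rewrite headings only, one block at a time.
upper_headings = apply_block_level(
    pages,
    lambda b: replace(b, text=b.text.upper()) if b.kind == "heading" else b,
)

# Page-level: move headings to the top of each page (sorted() is stable).
headings_first = apply_page_level(
    pages,
    lambda blocks: sorted(blocks, key=lambda b: b.kind != "heading"),
)

# Classification: tag each page for downstream routing.
labels = classify_pages(
    pages,
    lambda blocks: "tabular" if any(b.kind == "table" for b in blocks) else "prose",
)
```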
Creating a Custom Processor
The recommended way to create a custom processor is through Forge:
- Open Forge and start a new custom processor
- Describe what you want in natural language — for example, “Summarize all tables into bullet points” or “Extract only the financial data sections”
- Upload example documents that represent your use case
- The system generates a processor configuration based on your description and examples
- Review the results on your test documents and iterate if needed
Each custom processor is identified by an ID of the form `cp_XXXXX`.
Using a Custom Processor
Standalone
Run a custom processor directly on a document.
In a Pipeline
Use a custom processor as part of a pipeline by adding it as a custom processor step.
CustomProcessorOptions
| Option | Type | Default | Description |
|---|---|---|---|
| `pipeline_id` | str | Required | Custom processor ID (`cp_XXXXX`) |
| `version` | int | Active version | Specific processor version to run |
| `run_eval` | bool | False | Run evaluation rules after processing |
| `mode` | str | "fast" | Processing mode: "fast", "balanced", "accurate" |
| `output_format` | str | "markdown" | Output format: "markdown", "html", "json", "chunks" |
| `paginate` | bool | False | Add page delimiters |
| `add_block_ids` | bool | False | Add block IDs for citation tracking |
| `disable_image_extraction` | bool | False | Don’t extract images |
| `disable_image_captions` | bool | False | Don’t generate image captions |
| `webhook_url` | str | - | Webhook URL for completion notification |
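The table above can be mirrored in a small stand-in dataclass that checks values before a request is sent. This is a sketch only: `OptionsSketch` is a hypothetical name and is not the SDK's `CustomProcessorOptions` class. The field names, defaults, and allowed values come from the table; the `cp_` prefix check is an assumption based on the `cp_XXXXX` placeholder, and `None` is used here to stand for "active version" and "no webhook".

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values as listed in the options table.
MODES = ("fast", "balanced", "accurate")
OUTPUT_FORMATS = ("markdown", "html", "json", "chunks")


@dataclass
class OptionsSketch:
    """Hypothetical mirror of the options table, not the SDK class."""
    pipeline_id: str                   # required
    version: Optional[int] = None      # None stands for the active version
    run_eval: bool = False
    mode: str = "fast"
    output_format: str = "markdown"
    paginate: bool = False
    add_block_ids: bool = False
    disable_image_extraction: bool = False
    disable_image_captions: bool = False
    webhook_url: Optional[str] = None  # None stands for no webhook

    def __post_init__(self):
        # Assumption: IDs start with "cp_", as in the cp_XXXXX placeholder.
        if not self.pipeline_id.startswith("cp_"):
            raise ValueError(f"unexpected processor ID: {self.pipeline_id!r}")
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {MODES}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"output_format must be one of {OUTPUT_FORMATS}")


# Only pipeline_id is required; everything else falls back to the defaults.
opts = OptionsSketch(pipeline_id="cp_XXXXX", output_format="json", paginate=True)
```

Validating locally like this catches a mistyped mode or format before any credits are spent, though the service remains the authority on what it accepts.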
Versioning
Custom processors support versioning. Each iteration creates a new version, letting you refine behavior over time.
Managing Custom Processors
Next Steps
Pipeline Overview
Processor types, composition rules, and when to use pipelines.
Create a Pipeline
Build pipelines that include custom processors.
Document Conversion
Understand the convert processor that custom processors build on.
Contact Support
Request beta access to Custom Processors.