Custom Processors are currently in beta. Contact support@datalab.to for access.
Custom processors build on the convert processor. When standard conversion doesn’t produce exactly what you need (edge-case layouts, domain-specific formatting, or use-case-specific output transformations), custom processors let you fine-tune the result.
Before you begin, make sure you have:
- A Datalab account with an API key (new accounts include $5 in free credits)
- Python 3.10+ installed
- The Datalab SDK: pip install datalab-python-sdk
- Your DATALAB_API_KEY environment variable set
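The two machine-checkable prerequisites above (Python version and API key) can be verified before running anything else. This is a minimal sketch using only the standard library; the helper name missing_prerequisites is hypothetical and not part of the Datalab SDK.

```python
import os
import sys


def missing_prerequisites(env=None, version=None):
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper for illustration, not part of the Datalab SDK.
    `env` and `version` default to the real environment and interpreter.
    """
    env = os.environ if env is None else env
    version = sys.version_info if version is None else version

    problems = []
    # The docs ask for Python 3.10 or newer.
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    # The docs ask for the DATALAB_API_KEY environment variable to be set.
    if not env.get("DATALAB_API_KEY"):
        problems.append("DATALAB_API_KEY is not set")
    return problems


for problem in missing_prerequisites():
    print("Missing:", problem)
```

The check does not confirm that the SDK is installed or that the key is valid; it only catches the two most common setup gaps early.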
How Custom Processors Work
A custom processor applies modifications on top of document conversion. The flow is:
- The convert processor parses your document into structured output
- The custom processor applies your modifications to refine that output
There are three types of custom processor:
- Block-level — Modify individual blocks (e.g., rewrite table captions, summarize content)
- Page-level — Modify entire pages with full structural control (e.g., reorder blocks, add/remove elements)
- Classification — Classify pages into categories for downstream routing
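To make the difference between the three types concrete, here is a toy model in plain Python. It is only an illustration of the scope each type works at: the Block class, the function names, and the example rules are all assumptions made for this sketch, and none of it reflects how Datalab represents documents or implements processors.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Block:
    """Toy stand-in for one piece of converted output (hypothetical)."""
    kind: str  # e.g. "heading", "table", "text"
    text: str


def block_level(page, kind, rewrite):
    """Block-level: rewrite matching blocks one at a time.

    The number and order of blocks on the page do not change.
    """
    return [replace(b, text=rewrite(b.text)) if b.kind == kind else b for b in page]


def page_level(page, keep):
    """Page-level: structural changes across the whole page.

    Here: drop unwanted blocks, then move headings to the front.
    sorted() is stable, so the remaining blocks keep their relative order.
    """
    kept = [b for b in page if keep(b)]
    return sorted(kept, key=lambda b: b.kind != "heading")


def classify(page):
    """Classification: return a label for routing; the page is not modified."""
    return "tabular" if any(b.kind == "table" for b in page) else "narrative"


page = [
    Block("text", "intro"),
    Block("heading", "Q3 Results"),
    Block("table", "revenue | 10"),
]

print(block_level(page, "table", str.upper))
print(page_level(page, lambda b: b.kind != "table"))
print(classify(page))
```

In this model a block-level change never alters page structure, a page-level change may add, remove, or reorder blocks, and classification produces a label without touching the content.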
Creating a Custom Processor
The recommended way to create a custom processor is through Forge. The creation flow is a 3-step guided wizard:
- Describe — Use the chat-driven builder to articulate what your processor should do. Describe your goal in natural language (e.g., “Summarize all tables into bullet points” or “Extract only the financial data sections”) and the AI assistant will help you refine and confirm the specification before generating the processor.
- Documents — Upload example documents that represent your use case. These are used to generate and validate the processor configuration.
- Review — See the generated processor run on your examples. If the results aren’t right, use the Improve tab in the sidebar to describe what to change and generate a new version. The History tab shows all past versions and lets you revert to any of them; Details shows the active configuration.
Each custom processor is identified by an ID of the form cp_XXXXX.
Using a Custom Processor
Standalone
Run a custom processor directly on a document.
In a Pipeline
Use a custom processor as part of a pipeline by adding it as a custom processor.
CustomProcessorOptions
| Option | Type | Default | Description |
|---|---|---|---|
| pipeline_id | str | Required | Custom processor ID (cp_XXXXX) |
| version | int | Active version | Specific processor version to run |
| run_eval | bool | False | Run evaluation rules after processing |
| mode | str | "fast" | Processing mode: "fast", "balanced", "accurate" |
| output_format | str | "markdown" | Output format: "markdown", "html", "json", "chunks" |
| paginate | bool | False | Add page delimiters |
| add_block_ids | bool | False | Add block IDs for citation tracking |
| disable_image_extraction | bool | False | Don’t extract images |
| disable_image_captions | bool | False | Don’t generate image captions |
| webhook_url | str | - | Webhook URL for completion notification |
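The table above can be mirrored as a small dataclass to validate settings before sending a request. This is a sketch under stated assumptions: ProcessorOptionsSketch is a hypothetical stand-in written for illustration, not the SDK's CustomProcessorOptions class, and the check that IDs start with "cp_" is inferred from the cp_XXXXX pattern rather than documented.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values, copied from the options table.
MODES = {"fast", "balanced", "accurate"}
OUTPUT_FORMATS = {"markdown", "html", "json", "chunks"}


@dataclass
class ProcessorOptionsSketch:
    """Hypothetical mirror of the options table (not the SDK class)."""
    pipeline_id: str                      # required, e.g. "cp_XXXXX"
    version: Optional[int] = None         # None means "use the active version"
    run_eval: bool = False
    mode: str = "fast"
    output_format: str = "markdown"
    paginate: bool = False
    add_block_ids: bool = False
    disable_image_extraction: bool = False
    disable_image_captions: bool = False
    webhook_url: Optional[str] = None     # no default in the table

    def __post_init__(self):
        # Assumption: processor IDs always carry the "cp_" prefix.
        if not self.pipeline_id.startswith("cp_"):
            raise ValueError(f"unexpected processor ID: {self.pipeline_id!r}")
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {sorted(MODES)}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")


options = ProcessorOptionsSketch(pipeline_id="cp_XXXXX", mode="accurate", paginate=True)
print(options)
```

Validating locally catches a misspelled mode or output format before any credits are spent; the service remains the authority on which values it accepts.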
Versioning
Custom processors support versioning. Each iteration creates a new version, letting you refine behavior over time.
Managing Custom Processors
Next Steps
Pipeline Overview
Processor types, composition rules, and when to use pipelines.
Create a Pipeline
Build pipelines that include custom processors.
Document Conversion
Understand the convert processor that custom processors build on.
Contact Support
Request beta access to Custom Processors.