Custom Processors

Custom Processors are currently in beta. Contact support@datalab.to for access.

Custom processors refine the output of the convert processor. When standard conversion doesn’t produce exactly what you need (edge-case layouts, domain-specific formatting, or use-case-specific output transformations), custom processors let you fine-tune the result.

Before you begin, make sure you have:

A Datalab account with an API key (new accounts include $5 in free credits)
Python 3.10+ installed
The Datalab SDK: pip install datalab-python-sdk
Your DATALAB_API_KEY environment variable set
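
A quick pre-flight check can confirm these prerequisites before any API call is made. The sketch below uses only the standard library; the helper name `check_prerequisites` is hypothetical and not part of the Datalab SDK.

```python
import os
import sys


def check_prerequisites(env=None, version_info=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    # The guide asks for Python 3.10 or newer.
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    # The guide asks for DATALAB_API_KEY to be set in the environment.
    if not env.get("DATALAB_API_KEY", "").strip():
        problems.append("DATALAB_API_KEY is not set")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the running interpreter and the real environment; passing a dict and a version tuple makes the check easy to test.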

How Custom Processors Work

A custom processor applies modifications on top of document conversion. The flow is:

The convert processor parses your document into structured output
The custom processor applies your modifications to refine that output

Modifications can operate at different levels:

Block-level — Modify individual blocks (e.g., rewrite table captions, summarize content)
Page-level — Modify entire pages with full structural control (e.g., reorder blocks, add/remove elements)
Classification — Classify pages into categories for downstream routing
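
To make the three levels concrete, here is a toy model in plain Python. It is only an illustration of the idea: the `Block` and `Page` classes and the three helper functions are hypothetical and do not reflect how Datalab represents documents or runs processors internally.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str   # e.g. "heading", "table", "text"
    text: str


@dataclass
class Page:
    blocks: list
    label: str = ""   # filled in by classification


def block_level(page, fn):
    """Block-level: rewrite each block independently; structure is unchanged."""
    return Page([fn(b) for b in page.blocks], page.label)


def page_level(page, fn):
    """Page-level: fn sees the whole block list and may reorder or drop blocks."""
    return Page(fn(list(page.blocks)), page.label)


def classify(page, fn):
    """Classification: attach a category label; content is unchanged."""
    return Page(list(page.blocks), fn(page))


page = Page([Block("table", "Q1 revenue"), Block("heading", "Results")])

# Block-level: upper-case the text of table blocks only.
page = block_level(page, lambda b: Block(b.kind, b.text.upper()) if b.kind == "table" else b)
# Page-level: move headings ahead of everything else (sorted is stable).
page = page_level(page, lambda bs: sorted(bs, key=lambda b: b.kind != "heading"))
# Classification: label the page for downstream routing.
page = classify(page, lambda p: "financial" if any(b.kind == "table" for b in p.blocks) else "other")

print([b.text for b in page.blocks], page.label)
```

The difference to notice is what each function is allowed to see: a block-level change never knows about neighbouring blocks, a page-level change controls the whole list, and classification leaves the content alone.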

Creating a Custom Processor

The recommended way to create a custom processor is through Forge. The creation flow is a 3-step guided wizard:

Describe — Use the chat-driven builder to articulate what your processor should do. Describe your goal in natural language (e.g., “Summarize all tables into bullet points” or “Extract only the financial data sections”) and the AI assistant will help you refine and confirm the specification before generating the processor.
Documents — Upload example documents that represent your use case. These are used to generate and validate the processor configuration.
Review — See the generated processor run on your examples. If the results aren’t right, use the Improve tab in the sidebar to describe what to change and generate a new version. The History tab shows all past versions and lets you revert to any of them; Details shows the active configuration.

Each custom processor gets an ID in the format cp_XXXXX.
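
If processor IDs come from configuration or user input, a cheap shape check can catch typos before a request is sent. The sketch below assumes the ID is `cp_` followed by one or more ASCII letters or digits, which matches the `cp_abc123` example used in this guide but is not a documented guarantee; the function name is hypothetical.

```python
import re

# Assumption: "cp_" followed by one or more ASCII letters or digits.
_CP_ID = re.compile(r"cp_[A-Za-z0-9]+")


def looks_like_processor_id(value):
    """Return True if value matches the assumed cp_XXXXX shape."""
    return isinstance(value, str) and _CP_ID.fullmatch(value) is not None
```

A passing check only means the string is well-formed; whether the processor exists is still decided by the API.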

Using a Custom Processor

Standalone

Run a custom processor directly on a document:

from datalab_sdk import DatalabClient, CustomProcessorOptions

client = DatalabClient()  # uses the DATALAB_API_KEY environment variable

options = CustomProcessorOptions(
    pipeline_id="cp_abc123",    # Your custom processor ID
    mode="balanced",
    output_format="markdown",
)

result = client.run_custom_processor("document.pdf", options=options)
print(result.markdown)

In a Pipeline

Add a custom processor to a pipeline as a step with type="custom" and its ID in custom_processor_id:

from datalab_sdk import DatalabClient, PipelineProcessor

client = DatalabClient()

pipeline = client.create_pipeline(steps=[
    PipelineProcessor(type="convert", settings={"mode": "balanced"}),
    PipelineProcessor(type="custom", settings={}, custom_processor_id="cp_abc123"),
    PipelineProcessor(type="extract", settings={
        "page_schema": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"}
            }
        }
    })
])

This chains convert → custom → extract: the document is parsed, your custom modifications are applied, then structured data is extracted from the customized output.
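
Because a custom processor works on converted output, it can help to sanity-check a step list before creating the pipeline. The sketch below works on plain dicts that mirror the keyword arguments shown above (`type`, `settings`, `custom_processor_id`). The helper `check_steps` is hypothetical, and the ordering rule it applies (a custom step must follow a convert step) is an assumption inferred from the description in this guide, not a documented composition rule.

```python
def check_steps(steps):
    """Return a list of problems found in a list of step dicts."""
    problems = []
    seen_convert = False
    for index, step in enumerate(steps):
        kind = step.get("type")
        if kind == "convert":
            seen_convert = True
        elif kind == "custom":
            # A custom step is meaningless without the processor it refers to.
            if not step.get("custom_processor_id"):
                problems.append(f"step {index}: custom step has no custom_processor_id")
            # Assumed rule: custom processors refine output produced by convert.
            if not seen_convert:
                problems.append(f"step {index}: custom step appears before any convert step")
    return problems


steps = [
    {"type": "convert", "settings": {"mode": "balanced"}},
    {"type": "custom", "settings": {}, "custom_processor_id": "cp_abc123"},
    {"type": "extract", "settings": {"page_schema": {"type": "object"}}},
]
print(check_steps(steps))  # prints [] for this convert → custom → extract chain
```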

CustomProcessorOptions

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `pipeline_id` | str | Required | Custom processor ID (`cp_XXXXX`) |
| `version` | int | Active version | Specific processor version to run |
| `run_eval` | bool | `False` | Run evaluation rules after processing |
| `mode` | str | `"fast"` | Processing mode: `"fast"`, `"balanced"`, `"accurate"` |
| `output_format` | str | `"markdown"` | Output format: `"markdown"`, `"html"`, `"json"`, `"chunks"` |
| `paginate` | bool | `False` | Add page delimiters |
| `add_block_ids` | bool | `False` | Add block IDs for citation tracking |
| `disable_image_extraction` | bool | `False` | Don’t extract images |
| `disable_image_captions` | bool | `False` | Don’t generate image captions |
| `webhook_url` | str | - | Webhook URL for completion notification |
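
Since `mode` and `output_format` accept only the values listed above, a small wrapper can reject typos locally instead of waiting for the API to do so. This is a minimal sketch: `build_option_kwargs` is a hypothetical helper, and the allowed values and defaults are copied from the table rather than read from the SDK.

```python
ALLOWED_MODES = ("fast", "balanced", "accurate")
ALLOWED_FORMATS = ("markdown", "html", "json", "chunks")


def build_option_kwargs(pipeline_id, mode="fast", output_format="markdown", **extra):
    """Validate the enumerated options and return keyword arguments as a dict."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {mode!r}")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {ALLOWED_FORMATS}, got {output_format!r}")
    # Remaining options (paginate, add_block_ids, ...) are passed through untouched.
    return {"pipeline_id": pipeline_id, "mode": mode, "output_format": output_format, **extra}


kwargs = build_option_kwargs("cp_abc123", mode="balanced", paginate=True)
print(kwargs)
```

The resulting dict can then be unpacked into the options class, for example `CustomProcessorOptions(**kwargs)`.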

Versioning

Custom processors are versioned. Each change you generate (for example, from the Improve tab in Forge) is saved as a new version, so you can refine behavior over time and switch the active version when needed:

# List versions
versions = client.list_custom_processor_versions("cp_abc123")
for v in versions["versions"]:
    print(f"v{v.version}: {v.description}")

# Switch active version
client.set_active_processor_version("cp_abc123", version=2)
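
A common follow-up is to pick the newest version from the listing. The sketch below assumes the response has the shape used in the loop above (a `"versions"` list whose entries expose `.version` and `.description`); the helper name is hypothetical, and the stub entries are stand-in data so the example runs without network access.

```python
from types import SimpleNamespace


def newest_version(response):
    """Return the entry with the highest version number, or None if there are none."""
    return max(response["versions"], key=lambda v: v.version, default=None)


# Stand-in for the value returned by client.list_custom_processor_versions(...).
stub = {
    "versions": [
        SimpleNamespace(version=1, description="initial"),
        SimpleNamespace(version=3, description="third attempt"),
        SimpleNamespace(version=2, description="second attempt"),
    ]
}

newest = newest_version(stub)
print(f"v{newest.version}: {newest.description}")
```

With a real client, `newest.version` is the value you would pass as `version=` when switching the active version.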

Managing Custom Processors

# List your custom processors
result = client.list_custom_processors(limit=50)
for p in result["processors"]:
    print(f"{p.processor_id}: {p.name} (v{p.active_version})")

# Archive
client.archive_custom_processor("cp_abc123")
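
When you know a processor by name rather than by ID, the listing can be filtered client-side. This sketch assumes the result shape used in the loop above (a `"processors"` list whose entries expose `.processor_id`, `.name`, and `.active_version`); `find_by_name` is a hypothetical helper, and the stub entries are stand-in data.

```python
from types import SimpleNamespace


def find_by_name(result, name):
    """Return processors whose name matches exactly, ignoring case."""
    wanted = name.casefold()
    return [p for p in result["processors"] if p.name.casefold() == wanted]


# Stand-in for the value returned by client.list_custom_processors(...).
stub = {
    "processors": [
        SimpleNamespace(processor_id="cp_abc123", name="Table Summaries", active_version=2),
        SimpleNamespace(processor_id="cp_def456", name="Financial Sections", active_version=1),
    ]
}

matches = find_by_name(stub, "table summaries")
print([p.processor_id for p in matches])
```

Returning a list rather than a single item keeps the helper honest about the possibility that names are not unique.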

Next Steps

Pipeline Overview

Processor types, composition rules, and when to use pipelines.

Create a Pipeline

Build pipelines that include custom processors.

Document Conversion

Understand the convert processor that custom processors build on.

Contact Support

Request beta access to Custom Processors.
