Custom Processors are currently in beta. Contact support@datalab.to for access.
Custom processors customize the output of the convert processor. When standard conversion doesn’t produce exactly what you need (edge-case layouts, domain-specific formatting, or use-case-specific output transformations), custom processors let you fine-tune the result.

Before you begin, make sure you have:
  1. A Datalab account with an API key (new accounts include $5 in free credits)
  2. Python 3.10+ installed
  3. The Datalab SDK: pip install datalab-python-sdk
  4. Your DATALAB_API_KEY environment variable set
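
A quick preflight check for the prerequisites above might look like the sketch below. The helper name check_prerequisites is hypothetical and not part of the Datalab SDK; it only inspects the Python version and the DATALAB_API_KEY variable named in the list, and does not verify that the key is valid.

```python
import os
import sys


def check_prerequisites(env=None, version_info=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    # The guide asks for Python 3.10 or newer.
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    # An empty string counts as "not set".
    if not env.get("DATALAB_API_KEY"):
        problems.append("DATALAB_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"Missing prerequisite: {problem}")
```

The env and version_info parameters exist only so the check can be exercised without changing the real environment.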

How Custom Processors Work

A custom processor applies modifications on top of document conversion. The flow is:
  1. The convert processor parses your document into structured output
  2. The custom processor applies your modifications to refine that output
Modifications can operate at different levels:
  • Block-level — Modify individual blocks (e.g., rewrite table captions, summarize content)
  • Page-level — Modify entire pages with full structural control (e.g., reorder blocks, add/remove elements)
  • Classification — Classify pages into categories for downstream routing
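
To make the three levels concrete, here is a minimal local sketch. The Block and Page classes and the three functions are illustrative assumptions, not Datalab's internal data model or how Forge-generated processors are implemented; they only show what each level is allowed to touch.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str  # hypothetical block kinds, e.g. "heading", "text", "table"
    text: str


@dataclass
class Page:
    blocks: list = field(default_factory=list)


def uppercase_headings(block):
    """Block-level: rewrite one block without looking at its neighbours."""
    if block.kind == "heading":
        return Block(block.kind, block.text.upper())
    return block


def tables_last(page):
    """Page-level: reorder a whole page (stable sort, tables moved to the end)."""
    return Page(sorted(page.blocks, key=lambda b: b.kind == "table"))


def classify(page):
    """Classification: label a page so downstream steps can route it."""
    return "tabular" if any(b.kind == "table" for b in page.blocks) else "prose"


page = Page([Block("heading", "Results"), Block("table", "a|b"), Block("text", "Notes")])
page = Page([uppercase_headings(b) for b in page.blocks])
page = tables_last(page)
print([b.text for b in page.blocks], classify(page))
```

The block-level function sees a single block, the page-level function sees and restructures the full block list, and the classifier returns a label without modifying anything.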

Creating a Custom Processor

The recommended way to create a custom processor is through Forge:
  1. Open Forge and start a new custom processor
  2. Describe what you want in natural language — for example, “Summarize all tables into bullet points” or “Extract only the financial data sections”
  3. Upload example documents that represent your use case
  4. The system generates a processor configuration based on your description and examples
  5. Review the results on your test documents and iterate if needed
Each custom processor gets an ID in the format cp_XXXXX.

Using a Custom Processor

Standalone

Run a custom processor directly on a document:
from datalab_sdk import DatalabClient, CustomProcessorOptions

client = DatalabClient()

options = CustomProcessorOptions(
    pipeline_id="cp_abc123",    # Your custom processor ID
    mode="balanced",
    output_format="markdown",
)

result = client.run_custom_processor("document.pdf", options=options)
print(result.markdown)

In a Pipeline

Use a custom processor inside a pipeline by adding a step with type="custom" and your processor ID:
from datalab_sdk import DatalabClient, PipelineProcessor

client = DatalabClient()

pipeline = client.create_pipeline(steps=[
    PipelineProcessor(type="convert", settings={"mode": "balanced"}),
    PipelineProcessor(type="custom", settings={}, custom_processor_id="cp_abc123"),
    PipelineProcessor(type="extract", settings={
        "page_schema": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"}
            }
        }
    })
])
This chains convert → custom → extract: the document is parsed, your custom modifications are applied, then structured data is extracted from the customized output.
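
The page_schema in the extract step constrains what each extracted record may contain. As a rough illustration, the sketch below checks a record against that same schema locally. The record shape and the matches_schema helper are assumptions for illustration (this page does not show the extract step's actual output), and the check covers only top-level property types, not the full JSON Schema specification.

```python
PAGE_SCHEMA = {
    "type": "object",
    "properties": {"summary": {"type": "string"}},
}

# Only the JSON types used in the schema above are mapped here.
JSON_TO_PYTHON = {"string": str, "object": dict}


def matches_schema(record, schema):
    """Check top-level property types only; not a full JSON Schema validator."""
    if not isinstance(record, dict):
        return False
    for name, spec in schema.get("properties", {}).items():
        expected = JSON_TO_PYTHON[spec["type"]]
        if name in record and not isinstance(record[name], expected):
            return False
    return True


print(matches_schema({"summary": "Quarterly revenue rose."}, PAGE_SCHEMA))
print(matches_schema({"summary": 42}, PAGE_SCHEMA))
```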

CustomProcessorOptions

Option                    Type  Default         Description
pipeline_id               str   Required        Custom processor ID (cp_XXXXX)
version                   int   Active version  Specific processor version to run
run_eval                  bool  False           Run evaluation rules after processing
mode                      str   "fast"          Processing mode: "fast", "balanced", "accurate"
output_format             str   "markdown"      Output format: "markdown", "html", "json", "chunks"
paginate                  bool  False           Add page delimiters
add_block_ids             bool  False           Add block IDs for citation tracking
disable_image_extraction  bool  False           Don’t extract images
disable_image_captions    bool  False           Don’t generate image captions
webhook_url               str   -               Webhook URL for completion notification
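
Because mode and output_format accept only a fixed set of strings, it can help to validate them locally before building CustomProcessorOptions. The sketch below is one way to do that; validate_options is a hypothetical helper, not an SDK function, and whether the SDK or API performs its own validation is not stated here. The "cp_" prefix check is inferred from the documented ID format.

```python
VALID_MODES = {"fast", "balanced", "accurate"}
VALID_OUTPUT_FORMATS = {"markdown", "html", "json", "chunks"}


def validate_options(pipeline_id, mode="fast", output_format="markdown"):
    """Return the options as a dict, or raise ValueError on a bad value."""
    if not pipeline_id.startswith("cp_"):
        raise ValueError(f"expected an ID like cp_XXXXX, got {pipeline_id!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if output_format not in VALID_OUTPUT_FORMATS:
        raise ValueError(
            f"output_format must be one of {sorted(VALID_OUTPUT_FORMATS)}, "
            f"got {output_format!r}"
        )
    return {"pipeline_id": pipeline_id, "mode": mode, "output_format": output_format}


# The defaults mirror the table above: mode="fast", output_format="markdown".
print(validate_options("cp_abc123", mode="balanced"))
```

The returned dict can then be unpacked into the options constructor, for example CustomProcessorOptions(**validate_options("cp_abc123", mode="balanced")).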

Versioning

Custom processors support versioning. Each iteration creates a new version, letting you refine behavior over time:
# List versions
versions = client.list_custom_processor_versions("cp_abc123")
for v in versions["versions"]:
    print(f"v{v.version}: {v.description}")

# Switch active version
client.set_active_processor_version("cp_abc123", version=2)

Managing Custom Processors

# List your custom processors
result = client.list_custom_processors(limit=50)
for p in result["processors"]:
    print(f"{p.processor_id}: {p.name} (v{p.active_version})")

# Archive
client.archive_custom_processor("cp_abc123")

Next Steps

Pipeline Overview

Processor types, composition rules, and when to use pipelines.

Create a Pipeline

Build pipelines that include custom processors.

Document Conversion

Understand the convert processor that custom processors build on.

Contact Support

Request beta access to Custom Processors.