Run Custom Processor

import requests

url = "https://www.datalab.to/api/v1/custom-processor"

# Form fields are sent as strings; replace the placeholders with real values.
payload = {
    "pipeline_id": "<string>",
    "file_url": "<string>",
    "version": "123",
    "run_eval": "false",
    "mode": "fast",
    "max_pages": "123",
    "page_range": "<string>",
    "output_format": "<string>",
    "paginate": "false",
    "add_block_ids": "false",
    "include_markdown_in_chunks": "false",
    "disable_image_extraction": "false",
    "disable_image_captions": "false",
    "skip_cache": "false",
    "webhook_url": "<string>",
    "workflowstepdata_id": "123",
    "model_override_settings": "<string>",
}
headers = {"X-API-Key": "<api-key>"}

# Upload the document under the multipart field named "file" (see Body below),
# and close the file handle once the request has been sent.
with open("example-file", "rb") as f:
    files = {"file": ("example-file", f)}
    response = requests.post(url, data=payload, files=files, headers=headers)

print(response.text)

Authorizations

X-API-Key

string

header

required

Cookies

wos-session

string

datalab_active_team

string

Body

multipart/form-data

pipeline_id

string

required

The custom pipeline ID or template ID to execute. Must be a completed pipeline created via the custom pipeline API, or a valid template slug.

file_url

string | null

Optional file URL (http/https). If provided, the server will download and process it.

version

integer | null

Optional version number to use. If not provided, the active version of the pipeline will be used.

run_eval

boolean

default:false

Run evaluation rules defined for this custom pipeline.

mode

string

default:fast

Output mode for the underlying parsing step.

max_pages

integer | null

The maximum number of pages to process.

page_range

string | null

The page range to process, given as a comma-separated list of page numbers and ranges, for example 0,5-10,20.
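To make the page_range syntax concrete, the following sketch expands a spec into individual page numbers on the client side. The helper name parse_page_range is hypothetical and not part of the API. It assumes that ranges include both endpoints and that numbering starts at 0, as the example value suggests; neither point is stated on this page.

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a spec such as "0,5-10,20" into a sorted list of page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "0,,5"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            # Assumption: both endpoints of a range are included.
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```

A check like this can catch malformed specs before a request is submitted, but the server remains the authority on what it accepts.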

output_format

string | null

The output format. Can be 'json', 'html', 'markdown', or 'chunks'. Defaults to 'markdown'.

paginate

boolean

default:false

Whether to paginate the output. If set to True, each page of the output will be separated by a horizontal rule that contains the page number.

add_block_ids

boolean

default:false

Add data-block-id attributes to HTML elements for citation tracking. Only applies when output_format includes 'html'.

include_markdown_in_chunks

boolean

default:false

Include markdown field in chunks and JSON output. When enabled, each chunk will have a 'markdown' field with the markdown representation of that block. Only applies when output_format includes 'json' or 'chunks'.

disable_image_extraction

boolean

default:false

Disable image extraction from the PDF. Defaults to False.

disable_image_captions

boolean

default:false

Disable synthetic image captions/descriptions in output. Images will be rendered as plain img tags without alt text. Defaults to False.

skip_cache

boolean

default:false

Skip the cache and re-run the request.

webhook_url

string | null

Optional webhook URL to call when the request is complete.

workflowstepdata_id

integer | null

Optional workflow step data ID to associate with this request.

model_override_settings

string | null

file

file | null

Input PDF, Word document, PowerPoint, or image file, uploaded as multipart form data. Images must be in PNG, JPG, or WebP format.
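Because the body is multipart/form-data, every non-file field travels as a string. The sketch below shows one way to turn Python values into such form fields. The helper build_form_fields is hypothetical. It assumes booleans should be written as lowercase "true"/"false", matching the request sample at the top of this page, and that optional fields can simply be left out when unset.

```python
from typing import Any


def build_form_fields(pipeline_id: str, **options: Any) -> dict[str, str]:
    """Convert Python values into string form fields for a multipart body."""
    fields = {"pipeline_id": pipeline_id}
    for name, value in options.items():
        if value is None:
            continue  # omit unset optional fields instead of sending "None"
        if isinstance(value, bool):
            # bool is checked before the generic case so True becomes "true",
            # not "True".
            fields[name] = "true" if value else "false"
        else:
            fields[name] = str(value)
    return fields
```

The result can be passed as the data argument of requests.post, with the document itself passed separately through files or referenced by file_url.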

Response

Successful Response

request_id

string

required

The ID of the request. This ID can be used to check the status of the request.

request_check_url

string

required

The URL to check the status of the request and get results.
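Since results are retrieved from request_check_url rather than returned directly, a client typically polls that URL. The sketch below is a generic polling loop, not the official client. The names poll_until, fetch, and is_done are hypothetical, and the shape of the status payload is not described on this page, so the completion test is left to a caller-supplied predicate.

```python
import time
from typing import Any, Callable


def poll_until(
    fetch: Callable[[], dict[str, Any]],
    is_done: Callable[[dict[str, Any]], bool],
    interval: float = 2.0,
    max_attempts: int = 150,
) -> dict[str, Any]:
    """Call fetch() repeatedly until is_done(result) is true, then return it."""
    for attempt in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_attempts - 1:
            time.sleep(interval)  # wait before the next status check
    raise TimeoutError(f"not finished after {max_attempts} attempts")
```

In practice fetch could wrap a GET on request_check_url with the same X-API-Key header, for example lambda: requests.get(check_url, headers=headers).json(), while is_done inspects whichever field of the status response signals completion.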

success

boolean

default:true

Whether the request was successful.

error

string | null

If the request was not successful, this will contain an error message.

versions

A dictionary of the versions of the libraries used in the request.
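The response fields above can be checked before any polling begins. The sketch below reads the parsed JSON body, raises when success is false, and otherwise returns the two required fields. SubmissionError and extract_check_url are hypothetical names introduced for illustration.

```python
from typing import Any


class SubmissionError(RuntimeError):
    """Raised when the API reports that the request was not successful."""


def extract_check_url(body: dict[str, Any]) -> tuple[str, str]:
    """Return (request_id, request_check_url) from a parsed response body."""
    # success defaults to true when the field is absent, per the schema above.
    if not body.get("success", True):
        raise SubmissionError(body.get("error") or "request was not successful")
    return body["request_id"], body["request_check_url"]
```

With the request sample from the top of this page, this would be called as extract_check_url(response.json()).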
