Convert Document

POST /api/v1/convert

import requests

url = "https://www.datalab.to/api/v1/convert"

# Form fields. Every value below is a placeholder: replace it or drop the key.
payload = {
    "file_url": "<string>",
    "mode": "fast",
    "max_pages": "123",
    "page_range": "<string>",
    "paginate": "false",
    "add_block_ids": "false",
    "include_markdown_in_chunks": "false",
    "disable_image_extraction": "false",
    "disable_image_captions": "false",
    "fence_synthetic_captions": "false",
    "output_format": "<string>",
    "token_efficient_markdown": "false",
    "skip_cache": "false",
    "save_checkpoint": "false",
    "additional_config": "<string>",
    "workflowstepdata_id": "123",
    "extras": "<string>",
    "webhook_url": "<string>",
    "force_new": "false",
}
headers = {"X-API-Key": "<api-key>"}

# Upload the document under the "file" form field described in Body.
# The context manager closes the file handle once the request has been sent.
with open("example-file", "rb") as f:
    files = {"file": ("example-file", f)}
    response = requests.post(url, data=payload, files=files, headers=headers)

print(response.text)

{
  "request_id": "<string>",
  "request_check_url": "<string>",
  "success": true,
  "error": "<string>",
  "versions": {}
}

Authorizations

X-API-Key

string

header

required

Cookies

wos-session

string

access_token

string

datalab_active_team

string

Body

multipart/form-data

file_url

string | null

Optional file URL (http/https). If provided, the server will download and process it.

mode

string

default:fast

Which output mode to use. Valid values: 'fast' (lowest latency), 'balanced' (balanced accuracy and latency), 'accurate' (highest accuracy).

max_pages

integer | null

The maximum number of pages in the document to convert.

page_range

string | null

The page range to convert, given as a comma-separated list of pages and ranges, e.g. 0,5-10,20. Overrides max_pages if provided.
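A page_range value in the format shown above can be generated from a list of page numbers. The following is a minimal sketch; `to_page_range` is a hypothetical helper name, not part of the API, and it assumes the caller already knows which page numbers the server expects.

```python
def to_page_range(pages):
    """Collapse page numbers into a string such as '0,5-10,20'."""
    ordered = sorted(set(pages))
    if not ordered:
        return ""

    def fmt(start, end):
        # A single page is written alone; a run is written as start-end.
        return str(start) if start == end else f"{start}-{end}"

    parts = []
    start = prev = ordered[0]
    for page in ordered[1:]:
        if page == prev + 1:
            prev = page  # extend the current run
            continue
        parts.append(fmt(start, prev))
        start = prev = page
    parts.append(fmt(start, prev))
    return ",".join(parts)


print(to_page_range([0, 5, 6, 7, 8, 9, 10, 20]))  # prints 0,5-10,20
```

Duplicates are dropped and the input is sorted first, so the order of the list passed in does not matter.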

paginate

boolean

default:false

Whether to paginate the output. Each page will be separated by a horizontal rule with the page number.

add_block_ids

boolean

default:false

Add data-block-id attributes to HTML elements for citation tracking. Only applies when output_format includes 'html'.
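Once HTML output carries data-block-id attributes, they can be collected with the standard library parser. This is a sketch only: the sample markup and the ID values `b1` and `b2` are invented for illustration, and the real ID format is not specified here.

```python
from html.parser import HTMLParser


class BlockIdCollector(HTMLParser):
    """Record (tag, block id) for every element with a data-block-id attribute."""

    def __init__(self):
        super().__init__()
        self.block_ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-block-id" and value is not None:
                self.block_ids.append((tag, value))


def collect_block_ids(html):
    collector = BlockIdCollector()
    collector.feed(html)
    collector.close()
    return collector.block_ids


# Hypothetical sample markup; real output may look different.
sample = '<h1 data-block-id="b1">Title</h1><p data-block-id="b2">Text</p><p>No id</p>'
print(collect_block_ids(sample))  # prints [('h1', 'b1'), ('p', 'b2')]
```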

include_markdown_in_chunks

boolean

default:false

Include markdown field in chunks and JSON output.

disable_image_extraction

boolean

default:false

Disable image extraction from the document.

disable_image_captions

boolean

default:false

Disable synthetic image captions/descriptions in output.

fence_synthetic_captions

boolean

default:false

Wrap synthetic image captions with HTML comment markers for easy identification/removal.

output_format

string | null

The output format: 'json', 'html', 'markdown', or 'chunks'. Defaults to 'markdown'. Separate multiple formats with commas.
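Because output_format is a single comma-separated string, it can help to validate the names client-side before sending. A minimal sketch, assuming the four values listed above are the only ones accepted; `build_output_format` is a hypothetical helper.

```python
VALID_FORMATS = ("json", "html", "markdown", "chunks")


def build_output_format(formats):
    """Return a comma-separated output_format value, or None to use the default."""
    selected = []
    for fmt in formats:
        name = fmt.strip().lower()
        if name not in VALID_FORMATS:
            raise ValueError(f"unsupported output format: {fmt!r}")
        if name not in selected:
            selected.append(name)  # keep first occurrence, drop duplicates
    return ",".join(selected) if selected else None


print(build_output_format(["markdown", "html"]))  # prints markdown,html
```

Returning None for an empty list lets the caller omit the field, so the server default of 'markdown' applies.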

token_efficient_markdown

boolean

default:false

Optimize markdown for LLM token usage (compact tables, single-space indents).

skip_cache

boolean

default:false

Skip the cache and re-run the conversion.

save_checkpoint

boolean

default:false

Save a checkpoint after conversion. The checkpoint_id in the response can be used with /extract or /segment to skip re-parsing.

additional_config

string | null

Additional configuration as a JSON string. Supported keys: 'keep_pageheader_in_output', 'keep_pagefooter_in_output', 'keep_spreadsheet_formatting'.
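Since additional_config is sent as a JSON string rather than a nested object, it has to be serialized before it goes into the form. A sketch under one assumption not stated above: that the supported keys take boolean values. `build_additional_config` is a hypothetical helper.

```python
import json

SUPPORTED_CONFIG_KEYS = {
    "keep_pageheader_in_output",
    "keep_pagefooter_in_output",
    "keep_spreadsheet_formatting",
}


def build_additional_config(**options):
    """Serialize supported options to the JSON string expected by additional_config."""
    unknown = set(options) - SUPPORTED_CONFIG_KEYS
    if unknown:
        raise ValueError(f"unsupported additional_config keys: {sorted(unknown)}")
    # ASSUMPTION: values are booleans; the source does not state the value type.
    return json.dumps(options)


print(build_additional_config(keep_pageheader_in_output=True))
```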

workflowstepdata_id

integer | null

Optional workflow step data ID to associate with this request.

extras

string | null

Comma-separated list of extra features: 'track_changes', 'chart_understanding', 'table_row_bboxes', 'extract_links', 'infographic', 'new_block_types'.

webhook_url

string | null

Optional webhook URL to call when the request is complete.

force_new

boolean

default:false

Internal: force Modal backend.

file

file | null

Input PDF, Word document, PowerPoint, or image file, uploaded as multipart form data. Images must be in PNG, JPG, or WebP format.
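Every body field travels as a form value, so Python booleans and integers need converting to strings, and unset options are best left out so the server defaults apply. A minimal sketch of that conversion; `encode_form_fields` is a hypothetical helper, and the lowercase "true"/"false" spelling follows the request sample on this page.

```python
def encode_form_fields(options):
    """Convert Python values to form strings, skipping options set to None."""
    fields = {}
    for name, value in options.items():
        if value is None:
            continue  # omit the field so the server default is used
        if isinstance(value, bool):
            fields[name] = "true" if value else "false"
        else:
            fields[name] = str(value)
    return fields


print(encode_form_fields({
    "mode": "balanced",
    "paginate": True,
    "max_pages": 10,
    "page_range": None,
}))
```

The resulting dictionary can be passed as the `data` argument of `requests.post`, alongside `files` for the upload.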

Response

Successful Response

request_id

string

required

The ID of the request. This ID can be used to check the status of the request.

request_check_url

string

required

The URL to check the status of the request and get results.
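Conversion runs asynchronously, so a client typically polls request_check_url until the result is ready. The sketch below makes several assumptions that this page does not confirm: that the check URL accepts the same X-API-Key header, that it returns JSON, and that a `status` field equal to "complete" marks a finished job. The function names, polling interval, and poll limit are illustrative.

```python
import json
import time
import urllib.request


def fetch_json(url, api_key):
    """GET a URL with the API key header and decode the JSON body."""
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(check_url, api_key, fetch=fetch_json,
                    interval=2.0, max_polls=150, sleep=time.sleep):
    """Poll check_url until the job reports completion or the poll limit is hit."""
    for _ in range(max_polls):
        result = fetch(check_url, api_key)
        # ASSUMPTION: the field name "status" and value "complete" are not
        # documented on this page; adjust to the actual response shape.
        if result.get("status") == "complete":
            return result
        sleep(interval)
    raise TimeoutError(f"no result after {max_polls} polls of {check_url}")
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised in tests without network access or real delays.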

success

boolean

default:true

Whether the request was successful.

error

string | null

If the request was not successful, this will contain an error message.

versions

A dictionary of the versions of the libraries used in the request.
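The response fields above can be checked before any follow-up request is made. A minimal sketch, assuming the body is the JSON object shown in the response sample; `parse_submission` is a hypothetical helper.

```python
import json


def parse_submission(body):
    """Return (request_id, request_check_url), or raise if the request failed."""
    data = json.loads(body)
    # success defaults to true when the field is absent.
    if not data.get("success", True):
        raise RuntimeError(data.get("error") or "conversion request failed")
    return data["request_id"], data["request_check_url"]


# Hypothetical response body for illustration.
body = '{"request_id": "abc", "request_check_url": "https://example.invalid/abc", "success": true, "error": null, "versions": {}}'
print(parse_submission(body))
```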

