POST /api/v1/segment
Segment Document
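import requests

url = "https://www.datalab.to/api/v1/segment"

# Form fields are sent as strings.
# Provide either a file (file or file_url) or a checkpoint_id.
payload = {
    "segmentation_schema": "<string>",
    # "file_url": "<string>",       # alternative to uploading a file
    # "checkpoint_id": "<string>",  # alternative to providing a file
    "mode": "fast",
    "max_pages": "123",
    "page_range": "<string>",
    "save_checkpoint": "false",
    "skip_cache": "false",
    "webhook_url": "<string>",
    "workflowstepdata_id": "123",
}
headers = {"X-API-Key": "<api-key>"}

# Open the file in a context manager so the handle is closed after the upload.
# The multipart field name matches the "file" body parameter.
with open("example-file", "rb") as f:
    files = {"file": ("example-file", f)}
    response = requests.post(url, data=payload, files=files, headers=headers)

print(response.text)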
{
  "request_id": "<string>",
  "request_check_url": "<string>",
  "success": true,
  "error": "<string>",
  "versions": {}
}

Authorizations

X-API-Key
string
header
required

Cookies

wos-session
string
access_token
string
datalab_active_team
string

Body

multipart/form-data
segmentation_schema
string
required

The JSON schema for document segmentation. It should contain segment names and descriptions used to identify the page ranges of the different document sections.
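
As a rough illustration, a schema can be built as a Python dict and serialized to a string before being sent as a form field. The exact shape the endpoint expects is not specified on this page; the `segments` key and the `name` and `description` fields below are assumptions based only on the description of this parameter.

```python
import json

# Hypothetical schema shape: a list of segments, each with a name and a description.
# The key names ("segments", "name", "description") are assumptions, not confirmed here.
schema = {
    "segments": [
        {
            "name": "cover_letter",
            "description": "Introductory letter addressed to the recipient.",
        },
        {
            "name": "financial_statements",
            "description": "Balance sheet, income statement, and accompanying notes.",
        },
    ]
}

# Multipart form fields are strings, so the schema is serialized to a JSON string.
segmentation_schema = json.dumps(schema)
```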

file_url
string | null

Optional URL of the input file. Provide either a file (file or file_url) or a checkpoint_id.

checkpoint_id
string | null

Checkpoint ID from a previous /convert request (with save_checkpoint=true). Skips re-parsing when provided.
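
The choice between a file source and a checkpoint can be checked client-side before the request is sent. The sketch below uses a hypothetical helper, `build_segment_payload`, which is not part of any Datalab client. It assumes the two options are mutually exclusive, which this page implies but does not state outright.

```python
def build_segment_payload(segmentation_schema, file_url=None, checkpoint_id=None, has_file=False):
    """Build form fields for the segment request (hypothetical helper).

    has_file signals that a file will be uploaded separately as multipart data.
    """
    has_source = has_file or file_url is not None
    has_checkpoint = checkpoint_id is not None

    # Assumption: exactly one of (file or file_url) and checkpoint_id is given.
    if has_source == has_checkpoint:
        raise ValueError("Provide either file/file_url or checkpoint_id.")

    payload = {"segmentation_schema": segmentation_schema}
    if has_checkpoint:
        payload["checkpoint_id"] = checkpoint_id
    elif file_url is not None:
        payload["file_url"] = file_url
    return payload
```

For example, `build_segment_payload(schema_json, checkpoint_id="<string>")` yields only the schema and the checkpoint ID, so no file is re-parsed.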

mode
string
default:fast

Output mode for parsing (only used when providing a file, not a checkpoint).

max_pages
integer | null

The maximum number of pages to process.

page_range
string | null

The page range to process, given as a comma-separated list of pages and ranges, e.g. 0,5-10,20.
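
To make the format concrete, the following sketch expands a page_range string into individual page numbers. It is a client-side illustration only: `parse_page_range` is a hypothetical helper, and it assumes ranges are inclusive at both ends and that page numbers are non-negative, neither of which is confirmed on this page.

```python
def parse_page_range(page_range):
    """Expand a string like "0,5-10,20" into a list of page numbers.

    Assumes ranges such as "5-10" include both endpoints.
    """
    pages = []
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range: {part}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


print(parse_page_range("0,5-10,20"))  # [0, 5, 6, 7, 8, 9, 10, 20]
```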

save_checkpoint
boolean
default:false

Save a checkpoint after processing for future extraction/segmentation calls.

skip_cache
boolean
default:false

Skip the cache and re-run the request.

webhook_url
string | null

Optional webhook URL to call when the request is complete.

workflowstepdata_id
integer | null

Optional workflow step data ID to associate with this request.

file
file | null

Input PDF, Word document, PowerPoint, or image file, uploaded as multipart form data. Images must be in PNG, JPG, or WebP format.

Response

Successful Response

request_id
string
required

The ID of the request. This ID can be used to check the status of the request.

request_check_url
string
required

The URL to check the status of the request and get results.
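
Because results are retrieved from request_check_url rather than returned directly, a client typically polls that URL. The sketch below shows one way to do so. The shape of the status response is not documented on this page, so the `status` field and its `"complete"` value are assumptions, and `poll_until_complete` is a hypothetical helper. The fetch step is passed in as a function so the loop can be exercised without network access.

```python
import time


def poll_until_complete(fetch_status, max_polls=60, delay_seconds=2.0):
    """Call fetch_status() repeatedly until it reports completion.

    fetch_status must return a dict. Assumption: the dict has a "status"
    key that equals "complete" once results are ready.
    """
    for _ in range(max_polls):
        result = fetch_status()
        if result.get("status") == "complete":
            return result
        time.sleep(delay_seconds)
    raise TimeoutError(f"Request not complete after {max_polls} polls.")


# Possible usage with requests (not executed here):
#   import requests
#   result = poll_until_complete(
#       lambda: requests.get(check_url, headers={"X-API-Key": "<api-key>"}).json()
#   )
```

If webhook_url is set on the request, polling may be unnecessary.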

success
boolean
default:true

Whether the request was successful.

error
string | null

If the request was not successful, this will contain an error message.
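
A minimal sketch of handling the success and error fields of the submission response follows. `check_submission` is a hypothetical helper; it relies only on the fields listed in this Response section and treats a missing success field as true, matching the documented default.

```python
def check_submission(body):
    """Return (request_id, request_check_url) or raise if the request failed.

    body is the parsed JSON response of the segment request.
    """
    # "success" defaults to true, so a missing key is treated as success.
    if not body.get("success", True):
        message = body.get("error") or "Request failed without an error message."
        raise RuntimeError(message)
    return body["request_id"], body["request_check_url"]
```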

versions

A dictionary of the versions of the libraries used in the request.