POST /api/v1/ocr
OCR
curl --request POST \
  --url https://www.datalab.to/api/v1/ocr \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key: <api-key>' \
  --form max_pages=123 \
  --form 'page_range=<string>' \
  --form 'langs=<string>' \
  --form skip_cache=false \
  --form file=@example-file
{
  "success": true,
  "error": "<string>",
  "request_id": "<string>",
  "request_check_url": "<string>",
  "versions": {}
}

Authorizations

X-API-Key
string
header
required

Cookies

access_token
string

Body

multipart/form-data
max_pages
integer | null

The maximum number of pages in the PDF to convert.

page_range
string | null

The pages to parse, given as a comma-separated list of zero-indexed page numbers and inclusive ranges, such as 0,5-10,20. If provided, this overrides max_pages. Example: '0,2-4' processes pages 0, 2, 3, and 4.
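
The page_range syntax can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own parser: expand_page_range is a hypothetical helper name, and its handling of whitespace, stray commas, and descending ranges is an assumption that the server may not share.

```python
def expand_page_range(page_range: str) -> list[int]:
    """Expand a page_range string such as '0,2-4' into sorted page indices."""
    pages: set[int] = set()
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            continue  # assumption: tolerate stray commas such as '0,,2'
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))  # both ends are inclusive
        else:
            pages.add(int(part))
    return sorted(pages)


print(expand_page_range("0,2-4"))  # [0, 2, 3, 4]
```

Expanding the string locally makes it easy to confirm how many pages a request will cover before uploading a large file.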

langs
string | null

Deprecated: this parameter is no longer used and has no effect. It previously specified the languages to use for OCR as a comma-separated list of up to 4 languages, given as names or codes from https://github.com/datalab-to/surya/blob/master/surya/languages.py. Unrecognized values were ignored, and the default was 'en'.

skip_cache
boolean
default:false

If set to true, the cache is skipped and inference is re-run. Defaults to false.

file
file | null

Input PDF, Word document, PowerPoint, or image file, uploaded as multipart form data. Images must be in PNG, JPG, or WebP format.
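
The body fields above map onto a multipart/form-data request much like the curl example. The sketch below builds such a request with only the Python standard library. The helper names encode_multipart and build_ocr_request are hypothetical, the choice to omit null fields and to send booleans as "true"/"false" is an assumption based on the curl example, and guessing the file part's content type from the filename is also an assumption.

```python
import mimetypes
import urllib.request
import uuid

OCR_URL = "https://www.datalab.to/api/v1/ocr"  # endpoint from the curl example


def encode_multipart(fields: dict, filename: str, file_bytes: bytes) -> tuple[str, bytes]:
    """Encode form fields plus one 'file' part; return (content_type, body)."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        if value is None:
            continue  # assumption: nullable fields are simply left out
        if isinstance(value, bool):
            value = "true" if value else "false"  # mirrors skip_cache=false in curl
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # Assumption: fall back to a generic type when the extension is unknown.
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    chunks.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n'.encode()
    )
    chunks.append(file_bytes)
    chunks.append(f"\r\n--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


def build_ocr_request(api_key: str, filename: str, file_bytes: bytes, **fields):
    """Build (but do not send) a POST request for the OCR endpoint."""
    content_type, body = encode_multipart(fields, filename, file_bytes)
    return urllib.request.Request(
        OCR_URL,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "X-API-Key": api_key},
    )


# Placeholder bytes stand in for a real document; nothing is sent here.
request = build_ocr_request(
    "<api-key>", "example.pdf", b"%PDF-placeholder", page_range="0,2-4", skip_cache=False
)
print(request.get_method(), request.full_url)  # POST https://www.datalab.to/api/v1/ocr
```

Passing the returned object to urllib.request.urlopen would submit it. Building the request separately from sending it keeps the encoding step easy to inspect and test offline.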

Response

Successful Response

request_id
string
required

The ID of the request. This ID can be used to check the status of the request.

request_check_url
string
required

The URL to check the status of the request and get results.

success
boolean
default:true

Whether the request was successful.

error
string | null

If the request was not successful, this will contain an error message.

versions

A dictionary of the versions of the libraries used in the request.
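
The response fields above suggest a two-step flow: validate the submit response, then poll request_check_url until results are ready. The sketch below illustrates that flow under stated assumptions. The names OcrSubmitError, check_submit_response, and poll are hypothetical, and because this page does not describe the payload returned by request_check_url, the completion test (is_done) and the HTTP call (fetch_json) are supplied by the caller rather than guessed.

```python
import time
from typing import Any, Callable


class OcrSubmitError(RuntimeError):
    """Raised when the submit response reports success == false."""


def check_submit_response(payload: dict) -> tuple[str, str]:
    """Return (request_id, request_check_url), or raise if the request failed."""
    if not payload.get("success", True):  # success defaults to true in the schema
        raise OcrSubmitError(payload.get("error") or "unknown error")
    return payload["request_id"], payload["request_check_url"]


def poll(
    check_url: str,
    fetch_json: Callable[[str], Any],
    is_done: Callable[[Any], bool],
    interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch_json(check_url) until is_done(result) is true, then return it."""
    for _ in range(max_polls):
        result = fetch_json(check_url)
        if is_done(result):
            return result
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"no result after {max_polls} polls of {check_url}")


# Example with a made-up submit payload shaped like the response schema above.
request_id, check_url = check_submit_response(
    {
        "success": True,
        "error": None,
        "request_id": "abc123",
        "request_check_url": "https://example.invalid/check/abc123",
        "versions": {},
    }
)
print(request_id)  # abc123
```

Capping the number of polls avoids waiting forever on a request that never completes; the interval and cap shown are arbitrary defaults, not values recommended by the service.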