OCR - Datalab Documentation

OCR

curl --request POST \
  --url https://www.datalab.to/api/v1/ocr \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key: <api-key>' \
  --form max_pages=123 \
  --form 'page_range=<string>' \
  --form 'langs=<string>' \
  --form skip_cache=false \
  --form file=@example-file

{
  "success": true,
  "error": "<string>",
  "request_id": "<string>",
  "request_check_url": "<string>",
  "versions": {}
}

Authorizations

X-API-Key

string

header

required

Cookies

access_token

string

Body

multipart/form-data

max_pages

integer | null

The maximum number of pages in the PDF to convert.

page_range

string | null

The pages to parse, given as a comma-separated list of page numbers and ranges, such as 0,5-10,20. If provided, this overrides max_pages. Example: '0,2-4' processes pages 0, 2, 3, and 4.
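To make the page_range syntax concrete, here is a minimal sketch of how such a string expands into individual page numbers. The function name `expand_page_range` is hypothetical and not part of the Datalab API; the server's own parsing may differ in details such as whitespace handling or how it treats duplicates.

```python
def expand_page_range(spec: str) -> list[int]:
    """Expand a page_range string such as '0,2-4' into [0, 2, 3, 4].

    Illustrative client-side helper only; not the server's implementation.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas (an assumption, the API may reject them)
        if "-" in part:
            # A range like '5-10' is inclusive on both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(expand_page_range("0,2-4"))  # [0, 2, 3, 4]
```

A helper like this can be useful for validating a range on the client before uploading a large file.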

langs

string | null

Deprecated: this parameter is no longer used. It formerly specified the languages to use for OCR as a comma-separated list of up to 4 languages, given as names or codes from https://github.com/datalab-to/surya/blob/master/surya/languages.py. Any other inputs were ignored, and the default was 'en'.

skip_cache

boolean

default:false

If set to true, the cache is skipped and inference is re-run. Defaults to false.

file

file | null

Input PDF, Word document, PowerPoint, or image file, uploaded as multipart form data. Images must be in PNG, JPG, or WebP format.
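The body fields above can be assembled into a request equivalent to the curl example. The following is a rough sketch using only the Python standard library; `build_ocr_request` is a hypothetical helper, the default `application/pdf` content type is an assumption, and the deprecated langs field is left out. Optional fields are sent only when a value is given.

```python
import urllib.request
import uuid

OCR_URL = "https://www.datalab.to/api/v1/ocr"


def build_ocr_request(api_key, filename, file_bytes,
                      content_type="application/pdf",
                      max_pages=None, page_range=None, skip_cache=False):
    """Build (but do not send) a multipart/form-data POST for the OCR endpoint."""
    boundary = uuid.uuid4().hex

    # Text fields; None means "omit the field entirely".
    fields = {"skip_cache": "true" if skip_cache else "false"}
    if max_pages is not None:
        fields["max_pages"] = str(max_pages)
    if page_range is not None:
        fields["page_range"] = page_range

    chunks = []
    for name, value in fields.items():
        chunks.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    # File part. The filename is not escaped here, so avoid quotes in it.
    chunks.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode()
    )
    chunks.append(file_bytes)
    chunks.append(f"\r\n--{boundary}--\r\n".encode())

    return urllib.request.Request(
        OCR_URL,
        data=b"".join(chunks),
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned object can be sent with `urllib.request.urlopen(request)`, and the JSON body of the reply read with `json.load`. Building the request separately from sending it keeps the encoding logic easy to inspect without network access.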

Response

Successful Response

request_id

string

required

The ID of the request. This ID can be used to check the status of the request.

request_check_url

string

required

The URL to check the status of the request and get results.

success

boolean

default:true

Whether the request was successful.

error

string | null

If the request was not successful, this will contain an error message.

versions

A dictionary of the versions of the libraries used to process the request.
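Because the response only returns a request_id and a request_check_url, a client has to poll that URL to obtain results. The sketch below shows one way to do this. The shape of the check response is not described on this page, so the `status == "complete"` test is an assumption, as is sending the X-API-Key header on the check request; all function names are hypothetical.

```python
import json
import time
import urllib.request


def check_submission(payload: dict) -> str:
    """Return request_check_url, or raise if the submission was not successful."""
    # 'success' defaults to true per the response schema above.
    if not payload.get("success", True):
        raise RuntimeError(payload.get("error") or "OCR request failed")
    return payload["request_check_url"]


def fetch_json(url: str, api_key: str) -> dict:
    """GET a URL with the API key header and decode the JSON body."""
    req = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_until_done(check_url, api_key, fetch=fetch_json,
                    is_done=lambda r: r.get("status") == "complete",
                    interval=2.0, max_polls=150, sleep=time.sleep):
    """Poll check_url until is_done(result) is true, then return the result.

    ASSUMPTION: the default is_done predicate guesses at the check response
    format; pass a different predicate if the real response differs.
    """
    for _ in range(max_polls):
        result = fetch(check_url, api_key)
        if is_done(result):
            return result
        sleep(interval)
    raise TimeoutError(f"no result after {max_polls} polls of {check_url}")
```

The `fetch`, `is_done`, and `sleep` parameters are injectable so the loop can be exercised without network access and adjusted once the actual check response format is known.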
