POST /api/v1/ocr
[DEPRECATED] OCR
import requests

url = "https://www.datalab.to/api/v1/ocr"

# Optional form fields; omit any you do not need.
payload = {
    "max_pages": "123",
    "page_range": "<string>",
    "skip_cache": "false",
}
headers = {"X-API-Key": "<api-key>"}

# Upload the document in the multipart "file" field and close it afterwards.
with open("example-file", "rb") as f:
    files = {"file": ("example-file", f)}
    response = requests.post(url, data=payload, files=files, headers=headers)

print(response.text)
{
  "request_id": "<string>",
  "request_check_url": "<string>",
  "success": true,
  "error": "<string>",
  "versions": {}
}

Authorizations

X-API-Key (string, header, required)

Cookies

wos-session (string)
access_token (string)
datalab_active_team (string)

Body

multipart/form-data
max_pages (integer | null)

The maximum number of pages of the document to convert.

page_range (string | null)

The pages to process, as a comma-separated list of page numbers and inclusive ranges, such as 0,5-10,20. Page numbers start at 0. If provided, this overrides max_pages. Example: '0,2-4' processes pages 0, 2, 3, and 4.
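The page_range syntax can be checked locally before a request is sent. The following is a minimal sketch, not the service's own parser: expand_page_range is a hypothetical helper name, and the handling of stray commas and reversed ranges is an assumption, since the page only documents the '0,2-4' style of input.

```python
def expand_page_range(page_range: str) -> list[int]:
    """Expand a string such as "0,2-4" into a sorted list of page numbers.

    Hypothetical client-side helper; the server may parse edge cases differently.
    """
    pages: set[int] = set()
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            continue  # assumption: tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    return sorted(pages)


print(expand_page_range("0,2-4"))  # [0, 2, 3, 4]
```

Running a check like this first makes it easy to confirm how many pages a request will cover, which matters because page_range takes precedence over max_pages.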

langs (string | null)

Deprecated: this parameter is no longer used. It previously set the languages to use for OCR as a comma-separated list of up to 4 languages, given as names or codes from https://github.com/datalab-to/surya/blob/master/surya/languages.py. Any other inputs were ignored, and the default was 'en'.

skip_cache (boolean, default: false)

If true, skip the cache and re-run the inference.

file (file | null)

Input PDF, Word document, PowerPoint, or image file, uploaded as multipart form data. Images must be in PNG, JPG, or WebP format.

Response

Successful Response

request_id (string, required)

The ID of the request, which can be used to check the status of the request.

request_check_url (string, required)

The URL to poll for the status of the request and to retrieve its results.
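Because results are retrieved from request_check_url rather than from the submission response, a client typically polls that URL. The sketch below shows one way to structure the loop. The names poll_until_done, fetch, and is_done are hypothetical, and the completion test is passed in by the caller because this page does not document the shape of the status response.

```python
import time
from typing import Any, Callable


def poll_until_done(
    fetch: Callable[[], dict[str, Any]],
    is_done: Callable[[dict[str, Any]], bool],
    interval_s: float = 2.0,
    max_attempts: int = 150,
) -> dict[str, Any]:
    """Call fetch() repeatedly until is_done(result) is true, then return the result."""
    for _ in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        time.sleep(interval_s)  # wait before the next status check
    raise TimeoutError(f"no result after {max_attempts} attempts")


# Example wiring (needs network access and a valid key). Two assumptions here:
# that the check URL accepts the same X-API-Key header, and that the status
# response carries a "status" field equal to "complete" when finished.
#
# import requests
# fetch = lambda: requests.get(request_check_url, headers={"X-API-Key": "<api-key>"}).json()
# result = poll_until_done(fetch, is_done=lambda r: r.get("status") == "complete")
```

Keeping the HTTP call and the completion test outside the loop means the polling logic can be exercised without network access and adjusted once the actual status schema is known.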

success (boolean, default: true)

Whether the request was successful.

error (string | null)

If the request was not successful, this contains an error message.
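The success and error fields can be checked before any polling starts. This is a minimal sketch under stated assumptions: check_submission and SubmissionError are hypothetical names, and treating a missing success key as true simply mirrors the documented default.

```python
from typing import Any


class SubmissionError(RuntimeError):
    """Raised when the submission response reports a failure."""


def check_submission(body: dict[str, Any]) -> tuple[str, str]:
    """Return (request_id, request_check_url), or raise if success is false."""
    # The documented default for success is true, so a missing key counts as success.
    if not body.get("success", True):
        raise SubmissionError(body.get("error") or "request failed without an error message")
    return body["request_id"], body["request_check_url"]
```

A caller would pass the parsed JSON body of the POST response, for example check_submission(response.json()), and hand the returned URL to whatever polling logic it uses.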

versions

A dictionary of the versions of the libraries used in the request.