SDK

Our SDK is the fastest way to get started with Datalab. Install it with pip:
pip install datalab-python-sdk
Then get your API key here.

Python

from datalab_sdk import DatalabClient

# Use AsyncDatalabClient for the async version of each call
# Instead of passing api_key, you can set the DATALAB_API_KEY environment variable
client = DatalabClient(api_key="YOUR_KEY_HERE")

# Convert PDF to markdown
result = client.convert("document.pdf")
print(result.markdown)

# OCR a document
ocr_result = client.ocr("document.pdf")
print(ocr_result.pages)
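
To keep the key out of source code, one option is to read it from the DATALAB_API_KEY environment variable mentioned above and fail early when it is missing. The sketch below is only an illustration: the helper name `load_api_key` is hypothetical and not part of the SDK.

```python
import os


def load_api_key(env_var="DATALAB_API_KEY"):
    """Return the API key stored in env_var (hypothetical helper, not part of the SDK)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail with a clear message instead of sending an empty key to the API
        raise RuntimeError(f"{env_var} is not set; export it before running")
    return key


# Usage, assuming the datalab-python-sdk package is installed:
#   from datalab_sdk import DatalabClient
#   client = DatalabClient(api_key=load_api_key())
```

Failing at startup makes a missing key easier to diagnose than an authentication error returned later by the API.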

CLI

datalab convert document.pdf --api_key YOUR_API_KEY

REST

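If you need more customization, you can call our REST endpoints directly. Each POST endpoint submits a document for processing, and the matching GET endpoint reports the status of a given request_id.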
POST /api/v1/marker
GET /api/v1/marker/{request_id}

POST /api/v1/ocr
GET /api/v1/ocr/{request_id}

Python REST example


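import os
import time

import requests

url = "https://www.datalab.to/api/v1/marker"
headers = {"X-Api-Key": "YOUR_API_KEY"}

# open() does not expand "~", so resolve the home directory explicitly
pdf_path = os.path.expanduser("~/pdfs/test.pdf")

# Submit the document; the context manager closes the file after the upload
with open(pdf_path, "rb") as f:
    form_data = {"file": ("test.pdf", f, "application/pdf")}
    response = requests.post(url, files=form_data, headers=headers)

# Stop here on an HTTP error (for example, an invalid API key)
response.raise_for_status()
data = response.json()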

max_polls = 300
check_url = data["request_check_url"]

for i in range(max_polls):
    time.sleep(2)
    response = requests.get(check_url, headers=headers)
    data = response.json()

    if data["status"] == "complete":
        break
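
The polling loop above exits silently if the job never completes within max_polls attempts. As a rough sketch, the same loop could be wrapped in a helper that returns the final payload or raises on timeout. The names `poll_until_complete`, `fetch_status`, and the injectable `sleep` parameter are hypothetical and introduced only for illustration. The only response field it relies on is `status` with the value "complete", as in the example above.

```python
import time


def poll_until_complete(fetch_status, max_polls=300, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch_status() until it reports completion (hypothetical helper).

    fetch_status is any zero-argument callable returning the decoded JSON
    dict from the check URL. Raises TimeoutError if the job is still not
    complete after max_polls attempts.
    """
    for _ in range(max_polls):
        sleep(delay_seconds)
        data = fetch_status()
        if data.get("status") == "complete":
            return data
    raise TimeoutError(f"job not complete after {max_polls} polls")


# Usage with the variables from the example above:
#   result = poll_until_complete(
#       lambda: requests.get(check_url, headers=headers).json()
#   )
```

Passing the fetch step in as a callable keeps the helper independent of the HTTP library, so it can be exercised with a stub instead of a live request.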