Datalab implements various limits to ensure fair usage and maintain service quality for all users. This guide covers each limit, how it works, and how to request higher limits if you need them.

File Size Limits

Current Limits by File Type

File Type        | Maximum Size | Notes
PDF Documents    | 200 MB      | All pages combined
Images           | 200 MB      | Per image file
Office Documents | 200 MB      | .docx, .xlsx, .pptx

Handling Large Files

For files exceeding 200 MB, you have several options:
  1. File Splitting: Break large PDFs into smaller chunks
  2. Compression: Reduce file size before upload
  3. Enterprise Limits: Contact support for increased limits
  4. Batch Processing: Process files in segments
# Split a large PDF into 50-page chunks
import PyPDF2

def split_pdf(input_file, pages_per_chunk=50):
    pdf_reader = PyPDF2.PdfReader(input_file)
    total_pages = len(pdf_reader.pages)

    for start in range(0, total_pages, pages_per_chunk):
        pdf_writer = PyPDF2.PdfWriter()
        end = min(start + pages_per_chunk, total_pages)

        for page in range(start, end):
            pdf_writer.add_page(pdf_reader.pages[page])

        output_file = f"chunk_{start//pages_per_chunk + 1}.pdf"
        with open(output_file, 'wb') as output:
            pdf_writer.write(output)
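Before choosing one of these options, you can check locally whether a file exceeds the limit. A minimal sketch; `exceeds_limit` is an illustrative helper, not part of the Datalab API, and the 200 MB figure comes from the table above:

```python
import os

MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # 200 MB limit from the table above

def exceeds_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is larger than the upload limit."""
    return os.path.getsize(path) > limit
```

Running this check before upload avoids a round trip to the API for files that would be rejected anyway.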

Signed URLs

We do not currently support signed URLs but plan to implement them soon.

Rate Limits

For most accounts, the rate limit is 200 documents per minute. When you exceed it, requests return a 429 error and you will need to wait up to 60 seconds before retrying. Custom rate limits are available on enterprise plans.
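Rather than reacting to 429 errors, you can throttle requests client-side so you stay under the cap. A minimal sketch; `RateLimiter` is an illustrative helper (the 200-per-minute default comes from the limit above), not part of any Datalab SDK:

```python
import time

class RateLimiter:
    """Spaces out calls so at most `max_calls` happen per `period` seconds."""

    def __init__(self, max_calls=200, period=60.0):
        self.min_interval = period / max_calls  # seconds between calls
        self.last_call = 0.0

    def wait(self):
        """Block until enough time has passed since the previous call."""
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()
```

Call `limiter.wait()` immediately before each API request; at the default settings this paces requests to one every 0.3 seconds.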

Implementing Your Own Retry Logic

import time
import requests

def api_call_with_retry(url, data, max_retries=3):
    """POST `data` to `url`, retrying when the server responds with 429."""
    for attempt in range(max_retries):
        response = requests.post(url, json=data)

        if response.status_code == 429:
            # Honor the server's Retry-After header if present,
            # otherwise wait the documented maximum of 60 seconds.
            wait = int(response.headers.get("Retry-After", 60))
            time.sleep(wait)
            continue

        return response

    raise Exception("Max retries exceeded")