SDK Batch Processing
Process multiple files using Python’s async capabilities.

Async Batch Processing
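The per-file call depends on the SDK, so the sketch below stubs it out with a placeholder `convert_file`; the surrounding pattern (a semaphore to cap concurrency, `asyncio.gather` with `return_exceptions=True`) is the reusable part:

```python
import asyncio

# convert_file is a placeholder for the real SDK/API call; swap in
# your actual conversion request here.
async def convert_file(path: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for network latency
    return {"file": path, "success": True}

async def convert_with_limit(sem: asyncio.Semaphore, path: str) -> dict:
    # The semaphore caps in-flight requests (start with 5-10).
    async with sem:
        return await convert_file(path)

async def convert_batch(paths: list[str], concurrency: int = 5) -> list:
    sem = asyncio.Semaphore(concurrency)
    tasks = [convert_with_limit(sem, p) for p in paths]
    # return_exceptions=True keeps one failure from aborting the batch.
    return await asyncio.gather(*tasks, return_exceptions=True)

results = asyncio.run(convert_batch([f"doc_{i}.pdf" for i in range(20)]))
failed = [r for r in results if isinstance(r, Exception)]
print(f"{len(results) - len(failed)} succeeded, {len(failed)} failed")
```

Results come back in the same order as the input paths, so pairing outputs with files is straightforward.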
For higher throughput, fan out conversions concurrently, but cap the number of in-flight requests.

CLI Batch Processing
The CLI handles directory processing automatically.

REST API Batch Processing
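When calling the API directly, you own the retry logic. The request itself is stubbed below (`post_convert` is a stand-in that simulates an occasional 429); the retry loop with exponential backoff and jitter is the pattern to keep:

```python
import random
import time

_attempts = {"n": 0}

# post_convert stands in for an HTTP POST to the conversion endpoint;
# it returns (status_code, body) and simulates an occasional 429.
def post_convert(path: str):
    _attempts["n"] += 1
    if _attempts["n"] % 3 == 1:
        return 429, None
    return 200, {"file": path, "success": True}

def convert_with_retry(path: str, max_retries: int = 5):
    delay = 0.5
    for _ in range(max_retries):
        status, body = post_convert(path)
        if status == 200:
            return body
        if status == 429:
            # Exponential backoff with jitter before retrying.
            time.sleep(delay + random.uniform(0, 0.1))
            delay *= 2
            continue
        raise RuntimeError(f"unexpected status {status}")
    raise RuntimeError(f"gave up on {path} after {max_retries} retries")

print(convert_with_retry("doc.pdf"))  # retried once, then succeeded
```

Run conversions like this in parallel with a thread pool or an async HTTP client, keeping the worker count below the concurrency limits described below.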
For raw API usage, implement parallel requests yourself and retry on 429 responses with backoff.

Rate Limits
- Request rate limit: 400 requests per minute per account (429 when exceeded)
- Concurrent request limit: 400 concurrent requests (429 when exceeded)
- Page concurrency limit: 5,000 pages in flight across all requests — this is enforced during processing, not at submission. Results return with `success: false` if exceeded; always check the `success` field when polling for results.
- The SDK and CLI handle request rate limiting and retries automatically
- For higher limits, contact support@datalab.to
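Because the page concurrency limit only surfaces in results, a polling loop should treat "complete" and "successful" as separate checks. A sketch, with `get_result` standing in for the real results endpoint (the exact response shape here is illustrative):

```python
import time

_polls = {"n": 0}

# get_result stands in for fetching a request's result from the API;
# replace it with a real GET against the results endpoint.
def get_result(request_id: str) -> dict:
    _polls["n"] += 1
    if _polls["n"] < 3:
        return {"status": "processing"}
    return {"status": "complete", "success": True, "markdown": "# Doc"}

def poll_result(request_id: str, interval: float = 2.0,
                timeout: float = 300.0) -> dict:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = get_result(request_id)
        if result.get("status") == "complete":
            # A completed request can still have failed, e.g. when the
            # page concurrency limit was exceeded during processing.
            if not result.get("success"):
                raise RuntimeError(f"{request_id} failed: {result}")
            return result
        time.sleep(interval)
    raise TimeoutError(f"{request_id} did not finish within {timeout}s")

doc = poll_result("req_123", interval=0.01)
```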
Tips
- Use async for high throughput - Async processing handles many concurrent requests efficiently
- Limit concurrency - Start with 5-10 concurrent requests and adjust based on your rate limits
- Handle failures gracefully - Use `return_exceptions=True` with `asyncio.gather` to continue processing on errors
- Save progress - Write results incrementally to avoid losing work on long batches
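The last two tips combine naturally: write each result as a JSON line as soon as it arrives, and skip already-recorded files on restart. A sketch, with `convert_file` as a stand-in for the real conversion call:

```python
import asyncio
import json
from pathlib import Path

# convert_file stands in for the real conversion request.
async def convert_file(path: str) -> dict:
    await asyncio.sleep(0.01)
    return {"file": path, "markdown": f"# {path}"}

async def run_batch(paths: list[str], out_path: str) -> None:
    out = Path(out_path)
    done = set()
    if out.exists():
        # Resume: skip files recorded by an earlier, interrupted run.
        with out.open() as f:
            done = {json.loads(line)["file"] for line in f}
    with out.open("a") as f:
        for path in paths:
            if path in done:
                continue
            try:
                result = await convert_file(path)
            except Exception as exc:
                result = {"file": path, "error": str(exc)}
            # One JSON line per file, flushed immediately, so progress
            # survives a crash partway through the batch.
            f.write(json.dumps(result) + "\n")
            f.flush()

asyncio.run(run_batch([f"doc_{i}.pdf" for i in range(3)],
                      "batch_results.jsonl"))
```

Re-running the same batch is safe: completed files are skipped, and only the remainder is converted.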
Next Steps
Document Conversion
Learn more about Marker’s conversion API and output formats.
API Limits
Understand rate limits and how to optimize throughput.
Workflows
Chain batch processing into multi-step document workflows.
Webhooks
Get notified when batch conversions complete via webhooks.