Restrict to Specific Pages
If you know which pages contain the data you need, use `page_range` to restrict extraction to those pages.
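The exact shape of `page_range` depends on the API. As a minimal sketch, assuming it takes an inclusive, 1-based start and end page, a small helper can validate a range against the document length before you send it. `PageRange` and `make_page_range` are hypothetical names for illustration, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRange:
    """Inclusive, 1-based page range (hypothetical stand-in for page_range)."""
    start: int
    end: int

    def pages(self) -> list[int]:
        """Every page number covered by the range."""
        return list(range(self.start, self.end + 1))


def make_page_range(start: int, end: int, total_pages: int) -> PageRange:
    """Reject ranges that are empty, start before page 1, or run past the end."""
    if start < 1 or end < start:
        raise ValueError(f"invalid page range {start}-{end}")
    if end > total_pages:
        raise ValueError(f"page {end} is past the last page ({total_pages})")
    return PageRange(start, end)


# Example: only pages 12-18 of a 120-page report hold the income statement,
# so only those 7 pages need to be processed.
income_statement = make_page_range(12, 18, total_pages=120)
print(income_statement, len(income_statement.pages()))
```

Validating up front catches off-by-one and out-of-bounds ranges locally instead of paying for a failed or oversized extraction.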
Segment and Chain Extractions
For documents with distinct sections (like financial reports or contracts), extract the table of contents first, then process each section separately.
Step 1: Extract Table of Contents
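A sketch of what the table-of-contents extraction might target, assuming a schema with a list of sections, each carrying a title and a starting page. The schema shape and the field names `sections`, `title`, and `start_page` are illustrative assumptions, not a required format; `parse_toc` is a hypothetical helper that turns the extracted result into entries sorted by page.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical JSON Schema for the TOC extraction; field names are illustrative.
TOC_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "sections": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "start_page": {"type": "integer"},
                },
                "required": ["title", "start_page"],
            },
        }
    },
    "required": ["sections"],
}


@dataclass(frozen=True)
class TocEntry:
    title: str
    start_page: int


def parse_toc(result: dict[str, Any]) -> list[TocEntry]:
    """Turn an extracted TOC (shaped like TOC_SCHEMA) into entries sorted by page.

    Entries without a usable start page are dropped, because they cannot be
    turned into page ranges in the next step.
    """
    entries = []
    for item in result.get("sections", []):
        page = item.get("start_page")
        if isinstance(page, int) and page >= 1:
            entries.append(TocEntry(str(item.get("title", "")).strip(), page))
    return sorted(entries, key=lambda entry: entry.start_page)


# Example: a hand-written result standing in for the real extraction output.
toc = parse_toc({"sections": [
    {"title": "Income Statement", "start_page": 12},
    {"title": "Letter to Shareholders", "start_page": 3},
    {"title": "Notes", "start_page": None},
]})
print([entry.title for entry in toc])
```

Sorting by page matters because extracted TOCs are not guaranteed to come back in document order, and the next step derives each section's end page from the section that follows it.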
Step 2: Extract Each Section
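Once you have each section's starting page, the page ranges follow directly: a section runs until the page before the next section starts, and the last section runs to the end of the document. This is a minimal sketch; `extract` is a hypothetical callable standing in for a real extraction call restricted to a page range, and TOC entries are plain `(title, start_page)` tuples here.

```python
from typing import Callable


def section_page_ranges(
    toc: list[tuple[str, int]], total_pages: int
) -> dict[str, tuple[int, int]]:
    """Map each section title to an inclusive (start, end) page range."""
    ordered = sorted(toc, key=lambda entry: entry[1])
    ranges: dict[str, tuple[int, int]] = {}
    for i, (title, start) in enumerate(ordered):
        # The next section's first page bounds this one; the last section
        # runs to the final page of the document.
        next_start = ordered[i + 1][1] if i + 1 < len(ordered) else total_pages + 1
        # max() keeps at least one page if two sections start on the same page.
        ranges[title] = (start, max(start, next_start - 1))
    return ranges


def extract_sections(
    ranges: dict[str, tuple[int, int]],
    extract: Callable[[int, int], dict],
) -> dict[str, dict]:
    """Run one extraction per section, restricted to that section's pages."""
    return {title: extract(start, end) for title, (start, end) in ranges.items()}


# Example: a 120-page report whose TOC lists three sections. Pages before the
# first section (here, the cover on pages 1-2) are not processed at all.
toc = [("Letter to Shareholders", 3), ("Income Statement", 12), ("Notes", 40)]
ranges = section_page_ranges(toc, total_pages=120)
results = extract_sections(ranges, lambda start, end: {"pages": end - start + 1})
print(ranges)
```

Because each call only sees its own pages, each section can use a schema tailored to its content instead of one schema that has to fit the whole report.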
Use Document Segmentation
For documents without a clear table of contents, use Document Segmentation to automatically split by section headers.
Full Example
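The complete workflow for processing a 100+ page financial report combines the steps above: extract the table of contents, build a page range for each section, then extract each section separately.
Tips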
- Process only the pages you need: use `page_range` to avoid processing unnecessary pages.
- Extract the TOC first: build page ranges dynamically from the document structure.
- Use the appropriate mode: `balanced` is usually sufficient; use `accurate` for complex tables.
- Handle errors: some sections may not match your schema exactly.
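Combining these tips, a per-section loop can pick a mode per section and record failures instead of aborting the whole run. In this sketch, `extract` is a hypothetical callable standing in for the real extraction call, the mode strings mirror `balanced` and `accurate` above (how modes are actually passed depends on the API), and `required` and `table_heavy` are illustrative inputs you would define per document.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical signature: extract(start_page, end_page, mode) -> extracted fields.
ExtractFn = Callable[[int, int, str], dict[str, Any]]


@dataclass
class RunReport:
    results: dict[str, dict[str, Any]] = field(default_factory=dict)
    failed: dict[str, str] = field(default_factory=dict)


def run_sections(
    ranges: dict[str, tuple[int, int]],
    extract: ExtractFn,
    required: dict[str, list[str]],
    table_heavy: set[str],
) -> RunReport:
    """Extract every section, tolerating sections that error or miss fields."""
    report = RunReport()
    for title, (start, end) in ranges.items():
        # balanced is usually enough; switch to accurate for complex tables.
        mode = "accurate" if title in table_heavy else "balanced"
        try:
            data = extract(start, end, mode)
        except Exception as exc:  # one bad section should not stop the run
            report.failed[title] = f"extraction error: {exc}"
            continue
        # A section can "succeed" yet not match the schema; treat missing
        # required fields as a failure so it can be reviewed or retried.
        missing = [key for key in required.get(title, []) if data.get(key) is None]
        if missing:
            report.failed[title] = f"missing fields: {', '.join(missing)}"
        else:
            report.results[title] = data
    return report


# Example with a fake extractor: one section succeeds, one raises, and one
# comes back without a field its schema requires.
def fake_extract(start: int, end: int, mode: str) -> dict[str, Any]:
    if start == 40:
        raise RuntimeError("timeout")
    return {"mode": mode, "revenue": 1000 if start == 12 else None}


report = run_sections(
    {"Income Statement": (12, 39), "Notes": (40, 120), "Letter": (3, 11)},
    fake_extract,
    required={"Income Statement": ["revenue"], "Letter": ["summary"]},
    table_heavy={"Income Statement"},
)
print(report.results)
print(report.failed)
```

Collecting failures in a report rather than raising lets you rerun just the failed sections, for example with `accurate` mode or an adjusted schema.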
Next Steps
Structured Extraction
Learn the full structured extraction API and schema options.
Document Segmentation
Automatically split documents by section headers.
Batch Processing
Process multiple long documents efficiently in parallel.
Pipelines
Chain processors into versioned, reusable pipelines.