Basic Usage
Extract Options
Use `ExtractOptions` to configure extraction behavior:
| Option | Type | Default | Description |
|---|---|---|---|
| `page_schema` | str | Required | JSON schema defining the fields to extract |
| `checkpoint_id` | str | None | Checkpoint ID from a previous `convert()` call |
| `mode` | str | `"fast"` | Processing mode: `"fast"`, `"balanced"`, or `"accurate"` |
| `output_format` | str | `"markdown"` | Output format: `"markdown"`, `"html"`, `"json"`, or `"chunks"` |
| `save_checkpoint` | bool | False | Save a checkpoint for reuse in subsequent calls |
| `max_pages` | int | None | Maximum number of pages to process |
| `page_range` | str | None | Specific pages to process (e.g., `"0-5,10"`) |
| `skip_cache` | bool | False | Skip cached results and force reprocessing |
| `webhook_url` | str | None | Webhook URL to notify when processing completes |
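As a reading aid, the option table can be mirrored as a small Python dataclass. This is a hypothetical stand-in, not the SDK's class: `ExtractOptionsSketch` and `parse_page_range` are illustrative names, the validation only checks values against the choices listed in the table, and the page-range helper assumes that a span like `0-5` is inclusive, which the table does not state.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in mirroring the documented options; not the SDK's ExtractOptions.
VALID_MODES = {"fast", "balanced", "accurate"}
VALID_OUTPUT_FORMATS = {"markdown", "html", "json", "chunks"}


@dataclass
class ExtractOptionsSketch:
    page_schema: str  # required: JSON schema (as a string) describing fields to extract
    checkpoint_id: Optional[str] = None
    mode: str = "fast"
    output_format: str = "markdown"
    save_checkpoint: bool = False
    max_pages: Optional[int] = None
    page_range: Optional[str] = None
    skip_cache: bool = False
    webhook_url: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the choices listed in the table.
        if self.mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.output_format not in VALID_OUTPUT_FORMATS:
            raise ValueError(f"unknown output_format: {self.output_format!r}")
        # page_schema is documented as a JSON schema, so make sure it parses.
        json.loads(self.page_schema)


def parse_page_range(spec: str) -> list[int]:
    """Expand a spec like "0-5,10" into page indices (assumes inclusive spans)."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


options = ExtractOptionsSketch(
    page_schema=json.dumps({"type": "object", "properties": {"title": {"type": "string"}}}),
    mode="balanced",
    page_range="0-5,10",
)
print(parse_page_range(options.page_range))  # [0, 1, 2, 3, 4, 5, 10]
```

Defaults in the sketch follow the table, so leaving an option out behaves as documented (for example, `mode` falls back to `"fast"`).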
Checkpoint Workflow
Use checkpoints to avoid re-parsing a document when you run extraction after conversion. First call `convert()` with `save_checkpoint=True`, then pass the returned `checkpoint_id` to the extraction call.
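The checkpoint workflow can be illustrated with an in-memory mock. `MockClient`, its `convert()` and `extract()` signatures, and the dictionary results are assumptions made for this sketch; the real client's call shape may differ. The parse counter shows the point of the workflow: reusing the `checkpoint_id` from `convert()` lets extraction skip a second parse.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MockClient:
    """Hypothetical in-memory client; counts parses to show what a checkpoint saves."""

    parse_count: int = 0
    checkpoints: dict = field(default_factory=dict)

    def _parse(self, path: str) -> str:
        # Stand-in for the expensive document parse.
        self.parse_count += 1
        return f"parsed contents of {path}"

    def convert(self, path: str, save_checkpoint: bool = False) -> dict:
        parsed = self._parse(path)
        result = {"markdown": parsed, "checkpoint_id": None}
        if save_checkpoint:
            # Store the parse under a fresh ID so later calls can reuse it.
            checkpoint_id = uuid.uuid4().hex
            self.checkpoints[checkpoint_id] = parsed
            result["checkpoint_id"] = checkpoint_id
        return result

    def extract(self, path: str, page_schema: str, checkpoint_id: Optional[str] = None) -> dict:
        # Reuse the saved parse for a known checkpoint_id; otherwise parse again.
        if checkpoint_id is not None and checkpoint_id in self.checkpoints:
            parsed = self.checkpoints[checkpoint_id]
        else:
            parsed = self._parse(path)
        # Dummy extraction: report which schema fields were requested.
        requested = json.loads(page_schema).get("properties", {})
        return {"source": parsed, "fields": sorted(requested)}


client = MockClient()
converted = client.convert("report.pdf", save_checkpoint=True)
schema = json.dumps({"type": "object", "properties": {"title": {"type": "string"}}})
extracted = client.extract("report.pdf", schema, checkpoint_id=converted["checkpoint_id"])
print(client.parse_count)  # prints 1 (the document was parsed only once)
```

Calling `extract()` without a `checkpoint_id` in this mock triggers a second parse, which is the cost the checkpoint avoids.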