Accurate mode runs a multi-pass extraction pipeline with independent verification. Every extracted field includes an audit trail: where the value came from, how it was derived, and whether an independent check confirmed it.

Before you begin, make sure you have:
- A Datalab account with an API key (new accounts include $5 in free credits)
- Python 3.10+ installed
- The Datalab SDK: `pip install datalab-python-sdk`
- Your `DATALAB_API_KEY` environment variable set
When to Use Accurate vs Balanced
| | Balanced (default) | Accurate |
|---|---|---|
| Price | $6 / 1K pages | $25 / 1K pages |
| Latency | Fast | Slower — trades speed for accuracy via multi-pass verification |
| Per-field citations | Yes | Yes |
| Extraction status | No | Yes (EXTRACTED / NOT_RESOLVABLE) |
| Per-field reasoning | No | Yes |
| Independent verification | No | Yes (PASS / FAIL) |
| Best for | High-volume workflows: invoices, forms, bank statements | Compliance, financial, legal, and medical workflows where every field needs an audit trail |
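To make the price difference concrete, the per-1K-page rates in the table can be turned into a quick estimate. This is a minimal sketch: `estimate_cost` is an illustrative helper, and it assumes cost scales linearly with page count (no minimums, tiers, or discounts), which this page does not confirm.

```python
# Rates from the comparison table, in USD per 1,000 pages.
RATE_PER_1K_PAGES = {"balanced": 6.00, "accurate": 25.00}


def estimate_cost(pages: int, extraction_mode: str = "balanced") -> float:
    """Estimate extraction cost in USD, assuming simple linear pricing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if extraction_mode not in RATE_PER_1K_PAGES:
        raise ValueError(f"unknown extraction mode: {extraction_mode!r}")
    return round(pages / 1000 * RATE_PER_1K_PAGES[extraction_mode], 2)


if __name__ == "__main__":
    # Compare both modes for the same hypothetical 40,000-page workload.
    for mode in RATE_PER_1K_PAGES:
        print(mode, estimate_cost(40_000, mode))
```

An estimate like this can help decide whether to run accurate mode on every document or only on the subset that needs an audit trail.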
Quick Start
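A request has two independent settings: `mode` for the document parsing stage and `extraction_mode` for the extraction pipeline. The sketch below only validates the two values and returns them as a plain dictionary. `build_options` is a hypothetical helper, not part of the Datalab SDK, and how the options are handed to the SDK or API is deliberately left out because it depends on the client you use. The allowed values are the ones listed on this page.

```python
# Allowed values as listed on this page.
PARSE_MODES = ("fast", "balanced", "accurate")  # `mode`: document parsing stage
EXTRACTION_MODES = ("balanced", "accurate")     # `extraction_mode`: extraction pipeline


def build_options(*, mode: str, extraction_mode: str = "balanced") -> dict:
    """Hypothetical helper: validate both settings and return them as a dict."""
    if mode not in PARSE_MODES:
        raise ValueError(f"mode must be one of {PARSE_MODES}, got {mode!r}")
    if extraction_mode not in EXTRACTION_MODES:
        raise ValueError(
            f"extraction_mode must be one of {EXTRACTION_MODES}, got {extraction_mode!r}"
        )
    return {"mode": mode, "extraction_mode": extraction_mode}


if __name__ == "__main__":
    # Fast parsing combined with the accurate extraction pipeline.
    print(build_options(mode="fast", extraction_mode="accurate"))
```

Validating early catches the easy mistake of passing `"fast"` as an extraction mode, which is only a valid parsing mode.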
`extraction_mode` controls the extraction pipeline (`balanced` or `accurate`). This is separate from `mode`, which controls the document parsing stage (`fast`, `balanced`, or `accurate`). You can set the two independently, for example `mode="fast"` with `extraction_mode="accurate"`.
Response Format
In accurate mode, each extracted field is returned as three sibling keys: the value itself, a `_citations` key, and a `_meta` key. The `_citations` key uses the same format as balanced mode, so citation-consuming code continues to work if you switch between modes. The `_meta` key is accurate-mode-only and contains the full audit trail.
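As an illustration of the sibling-key layout, the sketch below groups a field's value with its citations and metadata. It assumes the sibling keys are named by appending `_citations` and `_meta` to the field name; the sample payload, the field name `invoice_total`, and the block ID format are invented for the example.

```python
from typing import Any

SIBLING_SUFFIXES = ("_citations", "_meta")


def field_names(result: dict[str, Any]) -> list[str]:
    """Return keys that hold extracted values rather than sibling data.

    Caveat: a schema field whose own name ends in one of the suffixes
    would be misclassified by this simple check.
    """
    return [key for key in result if not key.endswith(SIBLING_SUFFIXES)]


def field_bundle(result: dict[str, Any], name: str) -> dict[str, Any]:
    """Collect the value, citations, and metadata for one field."""
    return {
        "value": result.get(name),
        "citations": result.get(f"{name}_citations", []),
        "meta": result.get(f"{name}_meta"),  # None when the key is absent
    }


# Invented sample payload (assumed shape, not real API output).
sample = {
    "invoice_total": "1,250.00",
    "invoice_total_citations": ["block-12"],
    "invoice_total_meta": {
        "extraction_status": "EXTRACTED",
        "reasoning": "Total taken from the summary row (block-12).",
        "citations": ["block-12"],
        "verification": {"status": "PASS", "feedback": ""},
    },
}

if __name__ == "__main__":
    for name in field_names(sample):
        print(name, field_bundle(sample, name))
```

Because `meta` falls back to `None`, the same helper can read a result that carries only the value and `_citations` keys.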
Field Metadata
Each `_meta` object contains:
| Field | Description |
|---|---|
| `extraction_status` | How the value was produced: `EXTRACTED` (value found in the document) or `NOT_RESOLVABLE` (document doesn’t contain this information) |
| `reasoning` | Audit-ready prose explaining how the value was produced, with block ID citations |
| `citations` | Block IDs from the source document that support the value |
| `verification` | Independent verification result with status and feedback |
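One way to work with these four fields in Python is to load each `_meta` object into a small typed record. This is a sketch, not an SDK type: the `FieldMeta` class is hypothetical, and it assumes `verification` is a nested object with `status` and `feedback` keys, as the table describes.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FieldMeta:
    """Typed view of one `_meta` object (hypothetical helper class)."""

    extraction_status: str
    reasoning: str = ""
    citations: list[str] = field(default_factory=list)
    verification_status: Optional[str] = None
    verification_feedback: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "FieldMeta":
        # Tolerate a missing or null `verification` object.
        verification = raw.get("verification") or {}
        return cls(
            extraction_status=raw["extraction_status"],
            reasoning=raw.get("reasoning", ""),
            citations=list(raw.get("citations", [])),
            verification_status=verification.get("status"),
            verification_feedback=verification.get("feedback"),
        )


if __name__ == "__main__":
    # Invented example input.
    meta = FieldMeta.from_dict(
        {
            "extraction_status": "EXTRACTED",
            "reasoning": "Read from the header block.",
            "citations": ["block-3"],
            "verification": {"status": "PASS", "feedback": ""},
        }
    )
    print(meta)
```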
Extraction Status
| Status | Meaning | Value |
|---|---|---|
| `EXTRACTED` | The value was found in or derived from the document | The extracted value |
| `NOT_RESOLVABLE` | The document does not contain or imply this value | `null` |
Verification Status
| Status | Meaning |
|---|---|
| `PASS` | The value and citations were independently confirmed against the source document |
| `FAIL_UNRESOLVABLE` | The document does not support a value for this field |
| `FAIL_FIX` | The value was flagged as incorrect during verification; the document supports a different value |
| `FAIL_CITATIONS` | The value is correct but the citations are wrong or insufficient |
| `ITEMS_MISSING` | (List fields only) The document contains entries that are not present in the extraction |
Fields typically end as `PASS` or `FAIL_UNRESOLVABLE` after verification. The other statuses indicate cases where the verifier flagged an issue that could not be fully resolved automatically.
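The status values in the table can be modeled as an enum so that typos and unexpected values fail loudly. The sketch below is illustrative: `VerificationStatus` and `review_hint` are hypothetical names, and the hint texts simply paraphrase the table's meanings for a reviewer.

```python
from enum import Enum


class VerificationStatus(str, Enum):
    """Verification statuses as listed in the table above."""

    PASS = "PASS"
    FAIL_UNRESOLVABLE = "FAIL_UNRESOLVABLE"
    FAIL_FIX = "FAIL_FIX"
    FAIL_CITATIONS = "FAIL_CITATIONS"
    ITEMS_MISSING = "ITEMS_MISSING"


# Reviewer-facing hints paraphrasing the table's meanings.
REVIEW_HINTS = {
    VerificationStatus.PASS: "Value and citations confirmed.",
    VerificationStatus.FAIL_UNRESOLVABLE: "Document does not support a value for this field.",
    VerificationStatus.FAIL_FIX: "Value flagged as incorrect; the document supports a different one.",
    VerificationStatus.FAIL_CITATIONS: "Value is correct; citations are wrong or insufficient.",
    VerificationStatus.ITEMS_MISSING: "List is incomplete; the document has more entries.",
}


def review_hint(status: str) -> str:
    """Map a raw status string to a hint; raises ValueError if unknown."""
    return REVIEW_HINTS[VerificationStatus(status)]


if __name__ == "__main__":
    for status in VerificationStatus:
        print(f"{status.value}: {review_hint(status.value)}")
```

Raising on unknown strings is a deliberate choice here: if a new status ever appears, the code stops instead of silently treating it as a pass.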
Building Workflows with Verification Metadata
The per-field metadata enables automated quality gates.
Common Workflow Patterns
- Auto-approve when all fields have `verification.status == "PASS"`; no human review is needed.
- Flag for review when any field is `NOT_RESOLVABLE` or has a `FAIL_*` verification status; the document may be missing information or the extraction needs a human check.
- Show citations to reviewers so they can verify in seconds; each field links back to specific blocks in the document.
- Use reasoning as an audit trail. For compliance workflows, the per-field reasoning documents exactly how each value was produced, with block-level citations back to the source document.
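The first two patterns can be sketched as a simple gate over an extraction result. This is a minimal illustration under stated assumptions: metadata lives in keys ending in `_meta`, each with `extraction_status` and a nested `verification.status`; `review_decision` is a hypothetical helper. It also makes two conservative choices that this page does not prescribe: any status other than `PASS` (including `ITEMS_MISSING`) is sent to review, and a result with no metadata at all is never auto-approved.

```python
from typing import Any


def review_decision(result: dict[str, Any]) -> tuple[str, list[str]]:
    """Return ("auto_approve" or "needs_review", names of flagged fields)."""
    flagged: list[str] = []
    seen_meta = False
    for key, meta in result.items():
        if not key.endswith("_meta") or not isinstance(meta, dict):
            continue
        seen_meta = True
        name = key[: -len("_meta")]
        status = (meta.get("verification") or {}).get("status")
        # Flag unresolved fields and anything that did not pass verification.
        if meta.get("extraction_status") == "NOT_RESOLVABLE" or status != "PASS":
            flagged.append(name)
    if not seen_meta:
        # No audit metadata (for example, a balanced-mode result): do not auto-approve.
        return "needs_review", []
    return ("needs_review" if flagged else "auto_approve", flagged)


if __name__ == "__main__":
    # Invented example result with one passing and one failing field.
    result = {
        "vendor": "Acme",
        "vendor_meta": {"extraction_status": "EXTRACTED", "verification": {"status": "PASS"}},
        "due_date": None,
        "due_date_meta": {
            "extraction_status": "NOT_RESOLVABLE",
            "verification": {"status": "FAIL_UNRESOLVABLE"},
        },
    }
    print(review_decision(result))
```

The list of flagged field names can drive the reviewer view, so that only those fields and their citations are shown for checking.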
Next Steps
- Structured Extraction Overview: Schema format, response structure, and extraction tips
- Confidence Scoring: Additional per-field confidence scores (works with both modes)
- Saved Schemas: Save and version schemas for reuse across requests
- Handling Long Documents: Tips for extracting from 100+ page documents