Score your structured extraction results to get per-field confidence ratings (1–5) with reasoning that explains what evidence was found or missing.
Extraction scoring is in beta. We're working on scoring every extraction automatically so you can fetch and view scores later. We'd love your feedback; reach out at support@datalab.to. Scoring is free.
Before you begin, make sure you have:
  1. A Datalab account with an API key (new accounts include $5 in free credits)
  2. Python 3.10+ installed
  3. The Datalab SDK: pip install datalab-python-sdk
  4. Your DATALAB_API_KEY environment variable set

How It Works

After extracting structured data from a document, you can score each field to understand how confident the model is in its extraction. Each field receives:
  • A score from 1 (very low confidence) to 5 (high confidence)
  • A reasoning string explaining what evidence supports or undermines the extracted value
There are two ways to get scores:
Approach | Method | Tradeoffs
Asynchronous (recommended) | Extract first, then call /extract/score | Faster extraction; scoring failures don't affect results
Synchronous | Pass include_scores=true to /extract | Single request, but slower and all-or-nothing
Score a Checkpoint (Asynchronous)

The Playground uses the asynchronous approach: it serves your extraction first, then scores it separately. Extract with save_checkpoint=true, then submit the returned checkpoint_id to /extract/score. This decouples scoring from extraction, so a scoring failure never blocks your results.
# Step 1: Extract with checkpoint
curl -X POST https://www.datalab.to/api/v1/extract \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@invoice.pdf" \
  -F 'page_schema={"type":"object","properties":{"invoice_number":{"type":"string","description":"Invoice ID"},"total_amount":{"type":"number","description":"Total due"},"vendor_name":{"type":"string","description":"Vendor or company name"}}}' \
  -F "save_checkpoint=true"

# Poll request_check_url until status is "complete", then grab checkpoint_id

# Step 2: Score the extraction
curl -X POST https://www.datalab.to/api/v1/extract/score \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"checkpoint_id": "<checkpoint_id>"}'

# Poll request_check_url until status is "complete"
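Both steps above end with polling request_check_url until the status is "complete". A minimal polling helper can be sketched as below; the function name poll_until_complete is illustrative (not part of the Datalab SDK), and fetch_status stands in for however you fetch the check URL (e.g. a requests.get call with your X-API-Key header):

```python
import time

def poll_until_complete(fetch_status, interval=2.0, max_attempts=60):
    """Call fetch_status() until the returned body reports status == "complete".

    fetch_status should return the parsed JSON body of request_check_url.
    Raises TimeoutError if the job does not finish within max_attempts polls.
    """
    for _ in range(max_attempts):
        body = fetch_status()
        if body.get("status") == "complete":
            return body
        time.sleep(interval)
    raise TimeoutError("extraction/scoring did not complete in time")
```

With the requests library, fetch_status could be something like `lambda: requests.get(check_url, headers={"X-API-Key": api_key}).json()`. After Step 1 completes, read checkpoint_id from the returned body and pass it to /extract/score.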

Get Scores Inline (Synchronous)

Pass include_scores=true to receive scores in the same request as extraction.
Tradeoffs of synchronous scoring:
  1. Slower results — the request takes longer because scoring runs after extraction.
  2. All-or-nothing — if scoring fails, the entire request fails and you lose the extraction result too.
For production use, we recommend the asynchronous approach above.
curl -X POST https://www.datalab.to/api/v1/extract \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@invoice.pdf" \
  -F 'page_schema={"type":"object","properties":{"invoice_number":{"type":"string","description":"Invoice ID"},"total_amount":{"type":"number","description":"Total due"},"vendor_name":{"type":"string","description":"Vendor or company name"}}}' \
  -F "include_scores=true"

# Poll request_check_url until status is "complete"

Response Format

Without scoring, extraction_schema_json contains fields and citations:
{
  "invoice_number": "INV-2024-001",
  "invoice_number_citations": ["block_123"],
  "total_amount": 1500.00,
  "total_amount_citations": ["block_456"]
}
With scoring, each field also gets a _score object, and the top-level response includes an extraction_score_average:
{
  "invoice_number": "INV-2024-001",
  "invoice_number_citations": ["block_123"],
  "invoice_number_score": {
    "score": 5,
    "reasoning": "Value found verbatim in the document header with a matching citation."
  },
  "total_amount": 1500.00,
  "total_amount_citations": ["block_456"],
  "total_amount_score": {
    "score": 4,
    "reasoning": "Amount found in the totals row; minor ambiguity due to a subtotal nearby."
  }
}
Here extraction_score_average is 4.5: the mean of all per-field scores (5 and 4 in this example).
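The averaging can be reproduced directly from extraction_schema_json. A small sketch (the helper name average_field_score is illustrative, not an SDK function):

```python
import json

def average_field_score(extraction_schema_json: str) -> float:
    """Mean of all per-field confidence scores in a scored extraction."""
    scored = json.loads(extraction_schema_json)
    scores = [v["score"] for k, v in scored.items() if k.endswith("_score")]
    return sum(scores) / len(scores)
```

Applied to the example above, the two field scores 5 and 4 average to 4.5, matching the reported extraction_score_average.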

Score Rubric

Score | Meaning
5 | High confidence: clear match with strong citation support
4 | Good confidence: match found with minor ambiguity
3 | Moderate confidence: partial match or uncertain citation
2 | Low confidence: match is inferred or weakly supported
1 | Very low confidence: no clear evidence found

Using Scores in Practice

Use extraction_score_average for a quick quality check, then inspect individual _score fields to flag low-confidence results:
import json

# After getting a scored result (from either approach)
avg = result["extraction_score_average"]
print(f"Average score: {avg}")

scored = json.loads(result["extraction_schema_json"])
for key, value in scored.items():
    if not key.endswith("_score"):
        continue

    # removesuffix only strips the trailing "_score"; replace() would also
    # mangle any occurrence of "_score" in the middle of a field name
    field = key.removesuffix("_score")
    if value["score"] <= 2:
        print(f"Low confidence for '{field}': {value['reasoning']}")
    elif value["score"] >= 4:
        print(f"'{field}' = {scored[field]}")
This is useful for building review workflows — auto-accept high-confidence fields and route low-confidence ones to a human reviewer.
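One way to sketch such a review workflow, assuming the scored dict shape shown above (the triage function and its thresholds are illustrative, not part of the SDK):

```python
def triage(scored: dict, accept_at: int = 4, review_at: int = 2):
    """Split scored fields into auto-accepted values and ones needing human review."""
    accepted, needs_review = {}, {}
    for key, val in scored.items():
        if not key.endswith("_score"):
            continue
        field = key.removesuffix("_score")
        if val["score"] >= accept_at:
            accepted[field] = scored[field]
        elif val["score"] <= review_at:
            needs_review[field] = val["reasoning"]
    return accepted, needs_review
```

Note that with these thresholds, fields scoring 3 land in neither bucket; depending on your tolerance for errors you may want to route that middle band to review as well.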

Next Steps

Structured Extraction

Full extraction API reference and schema examples

Handling Long Documents

Strategies for extracting from 100+ page documents

Workflows

Chain extraction and scoring with other processing steps

Document Conversion

Convert documents to various formats