Score your structured extraction results to get per-field confidence ratings (1–5) with reasoning that explains what evidence was found or missing.
Extraction scoring is in beta. We're working on scoring every extraction automatically so you can fetch and view scores later. We'd love your feedback; reach out at support@datalab.to. Scoring is free.
Before you begin, make sure you have:
  1. A Datalab account with an API key (new accounts include $5 in free credits)
  2. Python 3.10+ installed
  3. The Datalab SDK: pip install datalab-python-sdk
  4. Your DATALAB_API_KEY environment variable set

How It Works

After extracting structured data from a document, you can score each field to understand how confident the model is in its extraction. Each field receives:
  • A score from 1 (very low confidence) to 5 (high confidence)
  • A reasoning string explaining what evidence supports or undermines the extracted value
There are two ways to get scores:
Approach | Method | Tradeoffs
Asynchronous (recommended) | Extract first, then call /extract/score | Faster extraction; scoring failures don't affect results
Synchronous | Pass include_scores=true to /extract | Single request, but slower and all-or-nothing
Score a Checkpoint (Asynchronous)

The Playground uses the asynchronous approach: it serves your extraction first, then scores it separately. Extract with save_checkpoint=true, then submit the returned checkpoint_id to /extract/score. This decouples scoring from extraction, so a scoring failure never blocks your results.
# Step 1: Extract with checkpoint
curl -X POST https://www.datalab.to/api/v1/extract \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@invoice.pdf" \
  -F 'page_schema={"type":"object","properties":{"invoice_number":{"type":"string","description":"Invoice ID"},"total_amount":{"type":"number","description":"Total due"},"vendor_name":{"type":"string","description":"Vendor or company name"}}}' \
  -F "save_checkpoint=true"

# Poll request_check_url until status is "complete", then grab checkpoint_id

# Step 2: Score the extraction
curl -X POST https://www.datalab.to/api/v1/extract/score \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"checkpoint_id": "<checkpoint_id>"}'

# Poll request_check_url until status is "complete"
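Both steps above end with polling request_check_url until the status is "complete". A minimal polling helper can be sketched as below; the function name poll_until_complete is illustrative (not part of the Datalab SDK), and fetch_status stands in for however you fetch the check URL (e.g. a requests.get call with your X-API-Key header):

```python
import time

def poll_until_complete(fetch_status, interval=2.0, max_attempts=60):
    """Call fetch_status() until the returned body reports status == "complete".

    fetch_status should return the parsed JSON body of request_check_url.
    Raises TimeoutError if the job does not finish within max_attempts polls.
    """
    for _ in range(max_attempts):
        body = fetch_status()
        if body.get("status") == "complete":
            return body
        time.sleep(interval)
    raise TimeoutError("extraction/scoring did not complete in time")
```

With the requests library, fetch_status could be something like `lambda: requests.get(check_url, headers={"X-API-Key": api_key}).json()`. After Step 1 completes, read checkpoint_id from the returned body and pass it to /extract/score.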

Get Scores Inline (Synchronous)

Pass include_scores=true to receive scores in the same request as extraction.
Tradeoffs of synchronous scoring:
  1. Slower results — the request takes longer because scoring runs after extraction.
  2. All-or-nothing — if scoring fails, the entire request fails and you lose the extraction result too.
For production use, we recommend the asynchronous approach above.
curl -X POST https://www.datalab.to/api/v1/extract \
  -H "X-API-Key: $DATALAB_API_KEY" \
  -F "file=@invoice.pdf" \
  -F 'page_schema={"type":"object","properties":{"invoice_number":{"type":"string","description":"Invoice ID"},"total_amount":{"type":"number","description":"Total due"},"vendor_name":{"type":"string","description":"Vendor or company name"}}}' \
  -F "include_scores=true"

# Poll request_check_url until status is "complete"

Response Format

Without scoring, extraction_schema_json contains fields and citations:
{
  "invoice_number": "INV-2024-001",
  "invoice_number_citations": ["block_123"],
  "total_amount": 1500.00,
  "total_amount_citations": ["block_456"]
}
With scoring, each field also gets a _score object, and the top-level response includes an extraction_score_average:
{
  "invoice_number": "INV-2024-001",
  "invoice_number_citations": ["block_123"],
  "invoice_number_score": {
    "score": 5,
    "reasoning": "Value found verbatim in the document header with a matching citation."
  },
  "total_amount": 1500.00,
  "total_amount_citations": ["block_456"],
  "total_amount_score": {
    "score": 4,
    "reasoning": "Amount found in the totals row; minor ambiguity due to a subtotal nearby."
  }
}
Here extraction_score_average is 4.5: the mean of all per-field scores (5 and 4 in this example).
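The averaging can be reproduced directly from extraction_schema_json. A small sketch (the helper name average_field_score is illustrative, not an SDK function):

```python
import json

def average_field_score(extraction_schema_json: str) -> float:
    """Mean of all per-field confidence scores in a scored extraction."""
    scored = json.loads(extraction_schema_json)
    scores = [v["score"] for k, v in scored.items() if k.endswith("_score")]
    return sum(scores) / len(scores)
```

Applied to the example above, the two field scores 5 and 4 average to 4.5, matching the reported extraction_score_average.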

Score Rubric

Score | Meaning
5 | High confidence: clear match with strong citation support
4 | Good confidence: match found with minor ambiguity
3 | Moderate confidence: partial match or uncertain citation
2 | Low confidence: match is inferred or weakly supported
1 | Very low confidence: no clear evidence found

Using Scores in Practice

Use extraction_score_average for a quick quality check, then inspect individual _score fields to flag low-confidence results:
import json

# After getting a scored result (from either approach)
avg = result["extraction_score_average"]
print(f"Average score: {avg}")

scored = json.loads(result["extraction_schema_json"])
for key, value in scored.items():
    if not key.endswith("_score"):
        continue

    # removesuffix only strips the trailing "_score"; replace() would also
    # mangle any occurrence of "_score" in the middle of a field name
    field = key.removesuffix("_score")
    if value["score"] <= 2:
        print(f"Low confidence for '{field}': {value['reasoning']}")
    elif value["score"] >= 4:
        print(f"'{field}' = {scored[field]}")
This is useful for building review workflows — auto-accept high-confidence fields and route low-confidence ones to a human reviewer.
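One way to sketch such a review workflow, assuming the scored dict shape shown above (the triage function and its thresholds are illustrative, not part of the SDK):

```python
def triage(scored: dict, accept_at: int = 4, review_at: int = 2):
    """Split scored fields into auto-accepted values and ones needing human review."""
    accepted, needs_review = {}, {}
    for key, val in scored.items():
        if not key.endswith("_score"):
            continue
        field = key.removesuffix("_score")
        if val["score"] >= accept_at:
            accepted[field] = scored[field]
        elif val["score"] <= review_at:
            needs_review[field] = val["reasoning"]
    return accepted, needs_review
```

Note that with these thresholds, fields scoring 3 land in neither bucket; depending on your tolerance for errors you may want to route that middle band to review as well.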

Next Steps

Structured Extraction

Full extraction API reference and schema examples

Handling Long Documents

Strategies for extracting from 100+ page documents

Workflows

Chain extraction and scoring with other processing steps

Document Conversion

Convert documents to various formats