Extraction scoring is in beta. We’re working on scoring every extraction automatically so you can fetch and view scores later. We’d love your feedback: reach out at support@datalab.to. Scoring is free.
- A Datalab account with an API key (new accounts include $5 in free credits)
- Python 3.10+ installed
- The Datalab SDK: pip install datalab-python-sdk
- Your DATALAB_API_KEY environment variable set
How It Works
After extracting structured data from a document, you can score each field to understand how confident the model is in its extraction. Each field receives:
- A score from 1 (very low confidence) to 5 (high confidence)
- A reasoning string explaining what evidence supports or undermines the extracted value
| Approach | Method | Tradeoffs |
|---|---|---|
| Asynchronous (recommended) | Extract first, then call /extract/score | Faster extraction, scoring failures don’t affect results |
| Synchronous | Pass include_scores=true to /extract | Single request, but slower and all-or-nothing |
Score a Previous Extraction (Recommended)
Extract with save_checkpoint=true, then submit the checkpoint_id to /extract/score. This decouples scoring from extraction so a scoring failure never blocks your results.
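The two-step flow can be sketched roughly as follows. This is a minimal illustration and not the SDK's API: `with_checkpoint`, `score_later`, and the `submit_score` callable (standing in for whatever client code posts to /extract/score) are hypothetical names, and the sketch assumes the extraction response carries a `checkpoint_id` key.

```python
from __future__ import annotations

from typing import Any, Callable


def with_checkpoint(extract_params: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an extract request with checkpointing enabled."""
    return {**extract_params, "save_checkpoint": True}


def score_later(
    extraction: dict[str, Any],
    submit_score: Callable[[dict[str, Any]], dict[str, Any]],
) -> tuple[dict[str, Any], dict[str, Any] | None]:
    """Submit the checkpoint for scoring, keeping the extraction if scoring fails.

    Assumption: the extraction response exposes its checkpoint under
    the "checkpoint_id" key.
    """
    payload = {"checkpoint_id": extraction["checkpoint_id"]}
    try:
        scores = submit_score(payload)
    except Exception:
        # Scoring is treated as best-effort: the extraction is still returned.
        scores = None
    return extraction, scores
```

Because the extraction is returned either way, a caller can retry scoring later with the same checkpoint instead of repeating the extraction.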
Get Scores Inline (Synchronous)
Pass include_scores=true to receive scores in the same request as extraction.
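A rough sketch of the inline variant is shown below. `extract_with_inline_scores` and the `send_extract` callable (standing in for the code that posts to /extract) are hypothetical; only the `include_scores` flag and the `extraction_score_average` field come from this page.

```python
from __future__ import annotations

from typing import Any, Callable


def extract_with_inline_scores(
    extract_params: dict[str, Any],
    send_extract: Callable[[dict[str, Any]], dict[str, Any]],
) -> tuple[dict[str, Any], float | None]:
    """Send one extract request with scoring enabled.

    Returns the full response and its average score, if present.
    """
    request = {**extract_params, "include_scores": True}
    # Any failure raises here, so there is no partial result to fall back on.
    response = send_extract(request)
    return response, response.get("extraction_score_average")
```

Unlike the checkpoint approach, an error anywhere in this single call leaves the caller with neither extraction nor scores.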
Response Format
Without scoring, extraction_schema_json contains fields and citations:
With scoring, each field also gets a _score object, and the top-level response includes extraction_score_average, the average of all field scores.
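To make the shape concrete, here is an illustrative response and a helper that collects the per-field scores. The field names and values are invented for this sketch, and several details are assumptions: that score objects live under `<field>_score` keys, that they use `score` and `reasoning` keys, and that `extraction_schema_json` may arrive as a JSON string.

```python
import json
from statistics import mean

# Illustrative payload only; not actual API output.
sample_response = {
    "extraction_schema_json": json.dumps({
        "invoice_number": "INV-001",
        "invoice_number_score": {"score": 5, "reasoning": "Exact match in header."},
        "total": "120.00",
        "total_score": {"score": 4, "reasoning": "Two totals appear on the page."},
    }),
    "extraction_score_average": 4.5,
}


def field_scores(response: dict) -> dict:
    """Map each field name to its numeric score."""
    data = response["extraction_schema_json"]
    if isinstance(data, str):  # assumption: may be a JSON-encoded string
        data = json.loads(data)
    return {
        key.removesuffix("_score"): value["score"]
        for key, value in data.items()
        if key.endswith("_score") and isinstance(value, dict)
    }


scores = field_scores(sample_response)
# The average of the field scores matches the top-level average in this sample.
average = mean(scores.values())
```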
Score Rubric
| Score | Meaning |
|---|---|
| 5 | High confidence — clear match with strong citation support |
| 4 | Good confidence — match found with minor ambiguity |
| 3 | Moderate confidence — partial match or uncertain citation |
| 2 | Low confidence — match is inferred or weakly supported |
| 1 | Very low confidence — no clear evidence found |
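If you want the rubric available in code, for logging or review queues, a small lookup table is one option. The labels below are shortened from the table above; `RUBRIC` and `describe_score` are hypothetical names for this sketch.

```python
RUBRIC = {
    5: "High confidence",
    4: "Good confidence",
    3: "Moderate confidence",
    2: "Low confidence",
    1: "Very low confidence",
}


def describe_score(score: int) -> str:
    """Return the rubric label for a score, rejecting values outside 1 to 5."""
    if score not in RUBRIC:
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    return RUBRIC[score]
```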
Using Scores in Practice
Use extraction_score_average for a quick quality check, then inspect individual _score fields to flag low-confidence results:
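One possible way to apply this check is sketched below. The thresholds (`min_average=4.0`, `min_field_score=3`) are arbitrary choices for illustration, and the `<field>_score` key pattern with `score` and `reasoning` keys is an assumption about the response shape.

```python
import json


def review_extraction(response: dict, min_average: float = 4.0, min_field_score: int = 3):
    """Return (passes, flagged), where flagged maps field name to reasoning."""
    data = response.get("extraction_schema_json", {})
    if isinstance(data, str):  # assumption: may be a JSON-encoded string
        data = json.loads(data)

    flagged = {}
    for key, value in data.items():
        if key.endswith("_score") and isinstance(value, dict):
            # A missing score is treated as 0, so the field gets flagged.
            if value.get("score", 0) < min_field_score:
                flagged[key.removesuffix("_score")] = value.get("reasoning", "")

    average = response.get("extraction_score_average")
    passes = average is not None and average >= min_average and not flagged
    return passes, flagged
```

Flagged fields could then be routed to manual review along with their reasoning strings, while extractions that pass go straight through.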
Next Steps
- Structured Extraction: Full extraction API reference and schema examples
- Handling Long Documents: Strategies for extracting from 100+ page documents
- Workflows: Chain extraction and scoring with other processing steps
- Document Conversion: Convert documents to various formats