Extraction scoring is in beta and free to use. We'd love your feedback: reach out at support@datalab.to.

Prerequisites
- A Datalab account with an API key (new accounts include $5 in free credits)
- Python 3.10+ installed
- The Datalab SDK: pip install datalab-python-sdk
- Your DATALAB_API_KEY environment variable set
How It Works
Scoring runs automatically after every extraction. When you poll request_check_url, the extraction result initially contains just the extracted fields and citations. If you keep polling the same URL, the response will eventually include _score fields and an extraction_score_average once scoring completes.
Each scored field receives:
- A score from 1 (very low confidence) to 5 (high confidence)
- A reasoning string explaining what evidence supports or undermines the extracted value
Example
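The SDK call that starts an extraction is not reproduced here; this sketch assumes you already have a request_check_url and polls it with the standard library. The X-Api-Key header name and the exact response shape are assumptions; check the API reference for your account.

```python
import json
import os
import time
import urllib.request


def has_scores(result: dict) -> bool:
    # Scoring is done once the top-level average appears (assumption
    # based on the polling behavior described above).
    return "extraction_score_average" in result


def poll_for_scores(request_check_url: str, interval: float = 2.0,
                    timeout: float = 120.0) -> dict:
    """Poll request_check_url until the result includes _score fields."""
    # Assumption: the API key is passed in an X-Api-Key header.
    headers = {"X-Api-Key": os.environ["DATALAB_API_KEY"]}
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        req = urllib.request.Request(request_check_url, headers=headers)
        with urllib.request.urlopen(req) as resp:
            result = json.load(resp)
        if has_scores(result):
            return result
        time.sleep(interval)  # extraction may be done while scoring still runs
    raise TimeoutError("scoring did not complete within the timeout")


if __name__ == "__main__":
    scored = poll_for_scores("<your request_check_url>")
    print(scored["extraction_score_average"])
```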
Response Format
Without scoring, extraction_schema_json contains just the extracted fields and their citations. With scoring, each field also includes a _score object, and the top-level response gains an extraction_score_average (4.5 in this case), the mean of all field scores.
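As an illustration, a scored response might look like the sketch below. The field names, values, and citation strings are hypothetical; only _score, score, reasoning, extraction_schema_json, and extraction_score_average are documented above.

```json
{
  "extraction_schema_json": {
    "invoice_total": {
      "value": "$1,250.00",
      "citations": ["page 2: 'Total due: $1,250.00'"],
      "_score": {
        "score": 5,
        "reasoning": "The value exactly matches the cited 'Total due' line."
      }
    },
    "due_date": {
      "value": "2024-03-15",
      "citations": ["page 2: 'Payment terms: net 30'"],
      "_score": {
        "score": 4,
        "reasoning": "Date inferred from invoice date plus net-30 terms; not stated verbatim."
      }
    }
  },
  "extraction_score_average": 4.5
}
```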
Score Rubric
| Score | Meaning |
|---|---|
| 5 | High confidence — clear match with strong citation support |
| 4 | Good confidence — match found with minor ambiguity |
| 3 | Moderate confidence — partial match or uncertain citation |
| 2 | Low confidence — match is inferred or weakly supported |
| 1 | Very low confidence — no clear evidence found |
Using Scores in Practice
Use extraction_score_average for a quick quality check, then inspect individual _score fields to flag low-confidence results.
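A minimal sketch of that triage, assuming each field in extraction_schema_json is a dict that may carry a "_score" entry (the exact nesting may differ in your responses):

```python
def flag_low_confidence(result: dict, threshold: int = 3) -> list[str]:
    """Return names of fields whose score falls below the threshold.

    Assumes the shape {"extraction_schema_json": {name: {"_score":
    {"score": int, "reasoning": str}, ...}, ...}} described above.
    """
    flagged = []
    for name, field in result.get("extraction_schema_json", {}).items():
        score = field.get("_score", {}).get("score")
        if score is not None and score < threshold:
            flagged.append(name)
    return flagged


result = {
    "extraction_score_average": 3.0,
    "extraction_schema_json": {
        "total": {"value": "$10", "_score": {"score": 5, "reasoning": "exact match"}},
        "due_date": {"value": "?", "_score": {"score": 1, "reasoning": "no evidence"}},
    },
}
if result["extraction_score_average"] < 4:  # quick quality gate
    print(flag_low_confidence(result))  # → ['due_date'], fields needing review
```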
Next Steps
Structured Extraction
Full extraction API reference and schema examples
Handling Long Documents
Strategies for extracting from 100+ page documents
Pipelines
Chain processors into versioned, reusable pipelines
Document Conversion
Convert documents to various formats