What is Forge Evals?
Forge Evals allows you to:
- Upload up to 10 documents at once
- Test up to 5 different parsing configurations simultaneously
- Compare results side-by-side with visual diff highlighting
- Identify the optimal parsing settings for your document types
- Determine which parsing mode (Fast, Balanced, or Accurate) works best for your documents
- Evaluate special features like Track Changes or Chart Understanding
- Compare parsing results across different document types
- Optimize for speed vs. accuracy trade-offs
Getting started
Access Forge Evals at https://www.datalab.to/app/evals
Step 1: Upload documents
Upload the documents you want to evaluate. You can:
- Drag and drop files directly into the upload zone
- Click to browse and select files
- Upload up to 10 documents per evaluation session
Spreadsheet files (XLS, XLSX, CSV, ODS) are processed automatically without additional configuration options.
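The spreadsheet rule above can be sketched as a small helper. This is an illustrative sketch, not part of the product; the function name `needs_configuration` is hypothetical.

```python
from pathlib import Path

# Spreadsheet extensions that are processed automatically
# (no parsing-mode selection applies to these).
SPREADSHEET_EXTS = {".xls", ".xlsx", ".csv", ".ods"}

def needs_configuration(filename: str) -> bool:
    """Return True if the file accepts parsing-configuration options."""
    return Path(filename).suffix.lower() not in SPREADSHEET_EXTS
```

For example, `needs_configuration("report.pdf")` is `True`, while `needs_configuration("data.xlsx")` is `False`.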
Step 2: Select configurations
Choose which parsing configurations to test. You can select from preset configurations or create custom ones.
Preset configurations
- Fast Mode: Lowest latency, great for real-time use cases
- Balanced Mode: Balanced accuracy and latency, works well with most documents
- Accurate Mode: Highest accuracy and latency, good for complex documents
- Track Changes: Extract tracked changes from DOCX files (DOCX only)
- Chart Understanding: Extract data from charts and graphs
Custom configurations
Create custom configurations to test specific combinations of:
- Processing mode (Fast, Balanced, or Accurate)
- Page range selection
- Special features (Track Changes, Chart Understanding)
- Output options (pagination, headers, footers)
Track Changes only works with DOCX files. The grid will show “N/A” for incompatible document/configuration combinations.
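The compatibility rule can be expressed as a simple check. This is a hypothetical sketch (the `run_label` helper and configuration dict shape are assumptions, not the product's API):

```python
def run_label(doc_name: str, config: dict) -> str:
    """Label a grid cell for a document/configuration pair.

    Track Changes applies only to DOCX files, so any other file type
    paired with a Track Changes configuration is marked "N/A".
    """
    if config.get("track_changes") and not doc_name.lower().endswith(".docx"):
        return "N/A"
    return config.get("mode", "balanced")
```

Under this sketch, a PDF paired with a Track Changes configuration yields `"N/A"`, while a DOCX with the same configuration runs normally.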
Step 3: Run evaluation
Click “Start Evaluation” to begin processing. The system will:
- Process each document with each selected configuration
- Display progress in a grid view
- Show completion status and processing time for each run
While runs are in progress, you can:
- Monitor progress in real time
- Cancel all runs if needed
- Retry failed runs
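Because every document is paired with every selected configuration, an evaluation of D documents and C configurations produces D × C runs. A minimal sketch of that grid enumeration (the `build_run_grid` helper is hypothetical):

```python
from itertools import product

def build_run_grid(documents: list[str], configs: list[str]) -> list[tuple[str, str]]:
    """Pair every document with every configuration: D documents and
    C configurations yield D * C runs."""
    return list(product(documents, configs))
```

For example, 2 documents tested against 3 configurations produce 6 runs, which is why keeping the document count manageable matters.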
Step 4: Compare results
Once runs complete, click any two cells in the grid to compare their results side-by-side. The comparison view shows:
- Parallel view: Full documents side-by-side with inline diff highlighting
- Multiple output formats: Switch between Markdown, HTML, and JSON
- Processing metrics: Duration and configuration details for each run
- Diff statistics: Lines added, removed, and changed
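Line-level diff statistics like these can be computed with a standard sequence-matching diff. A sketch in that spirit, using Python's `difflib` (the `diff_stats` helper is an assumption, not how Forge Evals computes its numbers):

```python
import difflib

def diff_stats(old: str, new: str) -> dict:
    """Count lines added, removed, and changed between two parse outputs."""
    stats = {"added": 0, "removed": 0, "changed": 0}
    sm = difflib.SequenceMatcher(a=old.splitlines(), b=new.splitlines())
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "insert":
            stats["added"] += j2 - j1
        elif op == "delete":
            stats["removed"] += i2 - i1
        elif op == "replace":
            stats["changed"] += max(i2 - i1, j2 - j1)
    return stats
```

Comparing `"a\nb\nc"` against `"a\nx\nc\nd"` reports one changed line and one added line.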
Excluding runs
Right-click any cell in the grid to exclude that specific document/configuration combination from running. This is useful when:
- You know certain configurations won’t work for specific documents
- You want to reduce the total number of runs
- You need to focus on specific comparisons
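Conceptually, excluding a cell removes that document/configuration pair from the set of runs before processing starts. A minimal sketch of that filtering step (the `filter_runs` helper is hypothetical):

```python
def filter_runs(
    grid: list[tuple[str, str]],
    excluded: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Drop excluded document/configuration pairs before running,
    mirroring the right-click 'exclude' action in the grid."""
    excluded_set = set(excluded)
    return [pair for pair in grid if pair not in excluded_set]
```

For instance, excluding `("scan.pdf", "track_changes")` from a grid leaves every other pair untouched.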
Best practices
Choosing configurations
- Start with the three preset modes (Fast, Balanced, Accurate) to establish a baseline
- Add Track Changes if you’re working with DOCX files that contain revisions
- Add Chart Understanding if your documents contain charts or graphs
- Create custom configurations to test specific parameter combinations
Document selection
- Include representative samples of your document types
- Test edge cases (complex layouts, mixed content, etc.)
- Keep document count manageable (3-5 documents is often sufficient)
Interpreting results
- Compare processing times to understand speed/accuracy trade-offs
- Use the diff view to identify where configurations produce different outputs
- Pay attention to “N/A” cells indicating incompatible combinations
- Look for patterns across similar document types
Limitations
- Maximum 10 documents per evaluation session
- Maximum 5 run configurations per session
- Track Changes feature only works with DOCX files
- Spreadsheet files use automatic configuration (no mode selection)