What is Forge Evals?
Forge Evals allows you to:
- Upload up to 10 documents at once
- Test up to 5 different parsing configurations simultaneously
- Compare results side-by-side with visual diff highlighting
- Identify the optimal parsing settings for your document types
Use it to:
- Determine which parsing mode (Fast, Balanced, or Accurate) works best for your documents
- Evaluate special features like Track Changes or Chart Understanding
- Compare parsing results across different document types
- Optimize for speed vs. accuracy trade-offs
Getting started
Access Forge Evals at https://www.datalab.to/app/evals
Step 1: Upload documents
Upload the documents you want to evaluate. You can:
- Drag and drop files directly into the upload zone
- Click to browse and select files
- Upload up to 10 documents per evaluation session
Spreadsheet files (XLS, XLSX, CSV, ODS) are processed automatically without additional configuration options.
Step 2: Select configurations
Choose which parsing configurations to test. Configurations are organized into three tabs:
Datalab tab
Select from Datalab’s preset configurations or create custom ones.
Preset configurations:
- Fast Mode: Lowest latency, great for real-time use cases
- Balanced Mode: Balanced accuracy and latency, works well with most documents
- Accurate Mode: Highest accuracy and latency, good for complex documents
- Track Changes: Extract tracked changes from DOCX files (DOCX only)
- Chart Understanding: Extract data from charts and graphs
Custom configurations let you adjust (see the sketch below):
- Processing mode (Fast, Balanced, or Accurate)
- Page range selection
- Special features (Track Changes, Chart Understanding)
- Output options (pagination, headers, footers)
- Run count (1-3×): Run the same configuration multiple times to test consistency
Track Changes only works with DOCX files. The grid will show “N/A” for incompatible document/configuration combinations.
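For reference, these options map naturally onto a small configuration object. A minimal Python sketch, purely illustrative; the field names and defaults below are assumptions, not Datalab’s actual schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Mode(Enum):
    FAST = "fast"          # lowest latency
    BALANCED = "balanced"  # default accuracy/latency trade-off
    ACCURATE = "accurate"  # highest accuracy

@dataclass
class EvalConfig:
    """Illustrative model of one parsing configuration (not Datalab's schema)."""
    mode: Mode = Mode.BALANCED
    page_range: Optional[str] = None   # e.g. "1-5"; None means all pages
    track_changes: bool = False        # DOCX files only
    chart_understanding: bool = False
    paginate_output: bool = False
    include_headers_footers: bool = True
    run_count: int = 1                 # 1-3 iterations to test consistency

    def __post_init__(self) -> None:
        if not 1 <= self.run_count <= 3:
            raise ValueError("run_count must be between 1 and 3")
```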
Other Models tab
Compare Datalab against other open-source models hosted on our infrastructure:
- OlmoOCR
- RolmoOCR
- DotsOCR
- DeepSeekOCR
External Providers tab
Access to external providers is currently limited to select users. If you’re actively evaluating Datalab against other providers, contact us to request access.
Step 3: Run evaluation
Click “Start Evaluation” to begin processing. The system will:
- Process each document with each selected configuration
- Display progress in a grid view
- Show completion status and processing time for each run
While runs are in progress, you can:
- Monitor progress in real time
- Cancel all runs if needed
- Retry failed runs
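Under the hood, the run grid amounts to the cross product of documents and configurations, expanded by run count. A rough sketch in Python (document and configuration names are illustrative):

```python
from itertools import product

documents = ["report.pdf", "contract.docx", "slides.pptx"]
configs = {"Fast": 1, "Balanced": 1, "Accurate": 3}  # name -> run count

# Each grid cell is one (document, configuration, iteration) run.
runs = [
    (doc, name, i + 1)
    for doc, (name, count) in product(documents, configs.items())
    for i in range(count)
]
print(len(runs))  # 3 documents x (1 + 1 + 3) runs each = 15 runs
```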
Step 4: Compare results
Once runs complete, click any two cells in the grid to compare their results side-by-side. The comparison view shows:
- Parallel view: Full documents side-by-side with inline diff highlighting
- Multiple output formats: Switch between Markdown, HTML, JSON, and Chunks
- Rendered output: Toggle between raw and rendered views for HTML, Markdown, and JSON formats
- Visual diffs: When enabled with rendered output, see word-level highlighting of changes
- JSON visualization: View JSON output with document thumbnails and bounding boxes overlaid
- Processing metrics: Duration and configuration details for each run
- Diff statistics: Lines added, removed, and changed
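The diff statistics reflect a standard line-level comparison. For intuition, here is a minimal sketch of how such counts could be derived with Python’s difflib (a simplification; the UI’s exact counting may differ):

```python
import difflib

def diff_stats(old: str, new: str) -> dict:
    """Count added and removed lines between two parser outputs (simplified)."""
    added = removed = 0
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            removed += 1
    return {"added": added, "removed": removed}

fast_output = "# Title\nSome text\n"
accurate_output = "# Title\nSome text\n| a | b |\n"
print(diff_stats(fast_output, accurate_output))  # {'added': 1, 'removed': 0}
```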
Viewing modes
- Raw view: See the original output text with line numbers
- Rendered view: View formatted HTML/Markdown or visualized JSON with thumbnails
- Diff view: Compare outputs with line-by-line or word-level highlighting
- Rendered diff: Combine rendered output with word-level diff highlighting (HTML/Markdown only)
Rendered diff view is only available for HTML and Markdown formats. JSON rendered view shows bounding boxes but does not support diff highlighting.
Visualization features
Rendered output
Toggle the “Render” button to view formatted output instead of raw text:
- HTML/Markdown: See the fully rendered document with proper formatting, including math equations rendered with MathJax
- JSON: View document thumbnails with bounding boxes overlaid on detected blocks (text, tables, figures, etc.)
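For intuition, an overlay like the JSON visualization can be reproduced with Pillow. The block structure below is hypothetical; the actual JSON schema may differ:

```python
from PIL import Image, ImageDraw

# Hypothetical block list; the actual JSON schema may differ.
blocks = [
    {"type": "text",  "bbox": [40, 50, 560, 120]},
    {"type": "table", "bbox": [40, 160, 560, 380]},
]
colors = {"text": "blue", "table": "green", "figure": "red"}

page = Image.open("page_thumbnail.png").convert("RGB")
draw = ImageDraw.Draw(page)
for block in blocks:
    color = colors.get(block["type"], "gray")
    draw.rectangle(block["bbox"], outline=color, width=2)
    draw.text((block["bbox"][0], block["bbox"][1] - 12), block["type"], fill=color)
page.save("page_with_boxes.png")
```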
Diff highlighting
When comparing two runs, enable “Show Diff” to see differences:
- Raw diff: Line-by-line comparison with added/removed lines highlighted
- Rendered diff: Word-level highlighting within rendered HTML/Markdown output, preserving formatting and math rendering. The rendered diff shows:
  - Changed paragraphs with block-level highlighting
  - Specific changed words within modified paragraphs
  - Preserved math equations with accurate semantic comparison
Rendered diff is not available for JSON format. Use raw diff view to compare JSON outputs.
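Word-level highlighting of this kind can be approximated by diffing token sequences instead of lines. A minimal sketch using Python’s difflib, emitting <del>/<ins> markup (illustrative only, not the UI’s implementation):

```python
from difflib import SequenceMatcher

def word_diff(old: str, new: str) -> str:
    """Wrap changed words in <del>/<ins> tags (simplified word-level diff)."""
    a, b = old.split(), new.split()
    out = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
        if op in ("replace", "delete"):
            out.append("<del>" + " ".join(a[i1:i2]) + "</del>")
        if op in ("replace", "insert"):
            out.append("<ins>" + " ".join(b[j1:j2]) + "</ins>")
    return " ".join(out)

print(word_diff("total revenue was 1.2M", "total revenue was 1.4M"))
# total revenue was <del>1.2M</del> <ins>1.4M</ins>
```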
Multiple iterations
When a configuration is set to run multiple times (2× or 3×), each iteration appears as a separate column in the grid (e.g., “Accurate #1”, “Accurate #2”). This allows you to:
- Compare consistency across multiple runs of the same configuration
- Identify variability in parsing results
- Validate that your configuration produces stable outputs
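If you export the text of each iteration, consistency can also be quantified outside the UI. A rough sketch (the runs list is illustrative):

```python
from difflib import SequenceMatcher
from itertools import combinations

def stability(outputs: list[str]) -> float:
    """Mean pairwise similarity across repeated runs (1.0 = identical)."""
    pairs = list(combinations(outputs, 2))
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

runs = ["# Title\nRow 1 | Row 2", "# Title\nRow 1 | Row 2", "# Title\nRow 1|Row 2"]
print(f"{stability(runs):.3f}")  # values near 1.0 indicate a stable configuration
```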
Excluding runs
Right-click any cell in the grid to exclude that specific document/configuration combination from running. This is useful when:
- You know certain configurations won’t work for specific documents
- You want to reduce the total number of runs
- You need to focus on specific comparisons
Best practices
Choosing configurations
- Start with the three preset modes (Fast, Balanced, Accurate) to establish a baseline
- Add Track Changes if you’re working with DOCX files that contain revisions
- Add Chart Understanding if your documents contain charts or graphs
- Create custom configurations to test specific parameter combinations
Document selection
- Include representative samples of your document types
- Test edge cases (complex layouts, mixed content, etc.)
- Keep document count manageable (3-5 documents is often sufficient)
Interpreting results
- Compare processing times to understand speed/accuracy trade-offs
- Use the diff view to identify where configurations produce different outputs
- Toggle between raw and rendered views to see formatted output
- Use rendered diff view for word-level highlighting of changes in HTML/Markdown
- Visualize JSON output with bounding boxes to see document structure
- Pay attention to “N/A” cells indicating incompatible combinations
- Look for patterns across similar document types
- Run configurations multiple times (using run count) to test consistency and identify variability
Limitations
- Maximum 10 documents per evaluation session
- Maximum 5 run configurations per session
- Maximum 3 iterations per configuration
- Track Changes feature only works with DOCX files
- Spreadsheet files use automatic configuration (no mode selection)
- Rendered diff view only available for HTML and Markdown formats
- External provider access is limited to select users (contact us for access)