Forge Evals is a powerful tool for evaluating and comparing different parsing configurations across multiple documents. Use it to determine which settings work best for your specific document types and use cases.

What is Forge Evals?

Forge Evals allows you to:
  • Upload up to 10 documents at once
  • Test up to 5 different parsing configurations simultaneously
  • Compare results side-by-side with visual diff highlighting
  • Identify the optimal parsing settings for your document types
This is particularly useful when you need to:
  • Determine which parsing mode (Fast, Balanced, or Accurate) works best for your documents
  • Evaluate special features like Track Changes or Chart Understanding
  • Compare parsing results across different document types
  • Optimize for speed vs. accuracy trade-offs

Getting started

Access Forge Evals at https://www.datalab.to/app/evals

Step 1: Upload documents

Upload the documents you want to evaluate. You can:
  • Drag and drop files directly into the upload zone
  • Click to browse and select files
  • Upload up to 10 documents per evaluation session
Supported formats: PDF, DOCX, XLSX, PPTX, images, and more. See supported file types for the complete list.
Spreadsheet files (XLS, XLSX, CSV, ODS) are processed automatically without additional configuration options.

Step 2: Select configurations

Choose which parsing configurations to test. Configurations are organized into three tabs:

Datalab tab

Select from Datalab’s preset configurations or create custom ones.
Preset configurations:
  • Fast Mode: Lowest latency, great for real-time use cases
  • Balanced Mode: Balanced accuracy and latency, works well with most documents
  • Accurate Mode: Highest accuracy and latency, good for complex documents
  • Track Changes: Extract tracked changes from DOCX files (DOCX only)
  • Chart Understanding: Extract data from charts and graphs
Custom configurations: Build your own configurations (see the example sketch after this list) to test specific combinations of:
  • Processing mode (Fast, Balanced, or Accurate)
  • Page range selection
  • Special features (Track Changes, Chart Understanding)
  • Output options (pagination, headers, footers)
  • Run count (1-3×): Run the same configuration multiple times to test consistency
Track Changes only works with DOCX files. The grid will show “N/A” for incompatible document/configuration combinations.
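To make the options above concrete, here is a minimal Python sketch of how a custom configuration could be represented as data. The class and field names are illustrative assumptions only; they are not the Datalab API schema or part of the Evals UI.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical representation of a custom Forge Evals configuration.
# Field names are illustrative only; this is not the Datalab API schema.
@dataclass
class EvalConfig:
    name: str
    mode: str = "balanced"            # "fast", "balanced", or "accurate"
    page_range: Optional[str] = None  # e.g. "1-5"; None means all pages
    track_changes: bool = False       # DOCX files only
    chart_understanding: bool = False
    paginate_output: bool = False
    run_count: int = 1                # 1-3 iterations to test consistency

# Example: three configurations covering the speed/accuracy spectrum
configs = [
    EvalConfig(name="Fast", mode="fast"),
    EvalConfig(name="Accurate", mode="accurate", run_count=3),
    EvalConfig(name="Charts", chart_understanding=True),
]
```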

Other Models tab

Compare Datalab against other open source models hosted on our infrastructure:
  • OlmoOCR
  • RolmoOCR
  • DotsOCR
  • DeepSeekOCR
These models are hosted by Datalab and don’t require any API credentials. Because Datalab’s own models receive additional optimizations on our managed API, a timing comparison with other hosted models would be misleading, so we omit their timing numbers. If you’d like to see additional models, or want help with custom evals or timings, contact us at [email protected].

External Providers tab

You can use Evals to compare Datalab outputs against other proprietary document processing providers. Access to external providers is currently limited to select users; if you’re actively evaluating Datalab against other providers, contact us to request access.

Step 3: Run evaluation

Click “Start Evaluation” to begin processing. The system will:
  1. Process each document with each selected configuration
  2. Display progress in a grid view
  3. Show completion status and processing time for each run
You can:
  • Monitor progress in real-time
  • Cancel all runs if needed
  • Retry failed runs
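Conceptually, an evaluation is a grid of documents × configurations (× iterations). The sketch below illustrates that pairing in Python; parse_document is a placeholder callable standing in for whatever call actually produces the parsed output, not a Datalab function.

```python
import itertools

# Conceptual sketch of the evaluation grid: each document is paired with each
# selected configuration, once per requested iteration. parse_document is a
# placeholder callable, not part of the Datalab API.
def run_grid(documents, configs, parse_document):
    results = {}
    for doc, cfg in itertools.product(documents, configs):
        for iteration in range(1, cfg.run_count + 1):
            key = (doc, cfg.name, iteration)
            try:
                results[key] = {"status": "complete", "output": parse_document(doc, cfg)}
            except Exception as exc:
                # Failed cells keep their error so they can be retried individually
                results[key] = {"status": "failed", "error": str(exc)}
    return results
```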

Step 4: Compare results

Once runs complete, click any two cells in the grid to compare their results side-by-side. The comparison view shows:
  • Parallel view: Full documents side-by-side with inline diff highlighting
  • Multiple output formats: Switch between Markdown, HTML, JSON, and Chunks
  • Rendered output: Toggle between raw and rendered views for HTML, Markdown, and JSON formats
  • Visual diffs: When enabled with rendered output, see word-level highlighting of changes
  • JSON visualization: View JSON output with document thumbnails and bounding boxes overlaid
  • Processing metrics: Duration and configuration details for each run
  • Diff statistics: Lines added, removed, and changed
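The diff statistics correspond to a standard line-level diff. A rough equivalent using Python’s built-in difflib (an illustration of the idea, not the tool’s implementation) looks like this:

```python
import difflib

# Rough sketch of the "lines added/removed/changed" statistics shown in the
# comparison view, computed with Python's standard difflib module.
def diff_stats(output_a: str, output_b: str) -> dict:
    added = removed = changed = 0
    matcher = difflib.SequenceMatcher(None, output_a.splitlines(), output_b.splitlines())
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            added += j2 - j1
        elif tag == "delete":
            removed += i2 - i1
        elif tag == "replace":
            changed += max(i2 - i1, j2 - j1)
    return {"lines_added": added, "lines_removed": removed, "lines_changed": changed}

# Example: compare the Markdown output of two runs
stats = diff_stats("# Title\nA table row\n", "# Title\nA corrected table row\n")
print(stats)  # {'lines_added': 0, 'lines_removed': 0, 'lines_changed': 1}
```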

Viewing modes

  • Raw view: See the original output text with line numbers
  • Rendered view: View formatted HTML/Markdown or visualized JSON with thumbnails
  • Diff view: Compare outputs with line-by-line or word-level highlighting
  • Rendered diff: Combine rendered output with word-level diff highlighting (HTML/Markdown only)
Rendered diff view is only available for HTML and Markdown formats. JSON rendered view shows bounding boxes but does not support diff highlighting.
Use the “Switch Runs” button to select different runs for comparison without leaving the comparison view.

Visualization features

Rendered output

Toggle the “Render” button to view formatted output instead of raw text:
  • HTML/Markdown: See the fully rendered document with proper formatting, including math equations rendered with MathJax
  • JSON: View document thumbnails with bounding boxes overlaid on detected blocks (text, tables, figures, etc.)
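As an illustration of what the JSON visualization does, the sketch below draws block bounding boxes onto a page image with Pillow. The field names ("bbox", "block_type") and pixel coordinates are assumptions for the example, not a documented schema.

```python
from PIL import Image, ImageDraw

# Illustrative only: overlays detected blocks on a page image, assuming each
# block exposes a pixel-space bounding box and a block type label.
def draw_blocks(page_image_path: str, blocks: list[dict]) -> Image.Image:
    image = Image.open(page_image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for block in blocks:
        x0, y0, x1, y1 = block["bbox"]
        draw.rectangle([x0, y0, x1, y1], outline="red", width=2)
        draw.text((x0, max(y0 - 12, 0)), block.get("block_type", ""), fill="red")
    return image

annotated = draw_blocks("page_1.png", [{"bbox": [40, 60, 560, 120], "block_type": "Table"}])
annotated.save("page_1_annotated.png")
```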

Diff highlighting

When comparing two runs, enable “Show Diff” to see differences:
  • Raw diff: Line-by-line comparison with added/removed lines highlighted
  • Rendered diff: Word-level highlighting within rendered HTML/Markdown output, preserving formatting and math rendering
The rendered diff view intelligently highlights:
  • Changed paragraphs with block-level highlighting
  • Specific changed words within modified paragraphs
  • Preserved math equations with accurate semantic comparison
Rendered diff is not available for JSON format. Use raw diff view to compare JSON outputs.
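For intuition, word-level highlighting can be approximated with difflib by diffing on words instead of lines. The snippet below is a minimal sketch of that idea, not the rendered diff view’s actual implementation:

```python
import difflib

# Minimal sketch of word-level diffing: split two versions of a paragraph
# into words, then wrap removed words in [-...-] and added words in {+...+}.
def word_diff(text_a: str, text_b: str) -> str:
    words_a, words_b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.extend(words_b[j1:j2])
        else:
            pieces.extend(f"[-{w}-]" for w in words_a[i1:i2])    # removed words
            pieces.extend(f"{{+{w}+}}" for w in words_b[j1:j2])  # added words
    return " ".join(pieces)

print(word_diff("Total revenue was 4.2 million", "Total revenue was 4.5 million"))
# Total revenue was [-4.2-] {+4.5+} million
```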

Multiple iterations

When a configuration is set to run multiple times (2× or 3×), each iteration appears as a separate column in the grid (e.g., “Accurate #1”, “Accurate #2”). This allows you to:
  • Compare consistency across multiple runs of the same configuration
  • Identify variability in parsing results
  • Validate that your configuration produces stable outputs
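One simple way to quantify consistency across iterations is to compare each pair of outputs and look at the lowest similarity score. The sketch below uses difflib.SequenceMatcher for this; the run outputs are placeholder strings.

```python
import difflib
import itertools

# Sketch of a consistency check across repeated runs of one configuration:
# compare every pair of outputs and report the worst pairwise similarity.
def consistency(outputs: list[str]) -> float:
    ratios = [
        difflib.SequenceMatcher(None, a, b).ratio()
        for a, b in itertools.combinations(outputs, 2)
    ]
    return min(ratios) if ratios else 1.0

# Example with three iterations of an "Accurate" run (placeholder strings)
runs = ["# Report\nRevenue: 4.2M", "# Report\nRevenue: 4.2M", "# Report\nRevenue: 4.3M"]
print(f"worst pairwise similarity: {consistency(runs):.3f}")
```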

Excluding runs

Right-click any cell in the grid to exclude that specific document/configuration combination from running. This is useful when:
  • You know certain configurations won’t work for specific documents
  • You want to reduce the total number of runs
  • You need to focus on specific comparisons
Excluded cells appear with a yellow background and can be re-included by clicking them again.

Best practices

Choosing configurations

  • Start with the three preset modes (Fast, Balanced, Accurate) to establish a baseline
  • Add Track Changes if you’re working with DOCX files that contain revisions
  • Add Chart Understanding if your documents contain charts or graphs
  • Create custom configurations to test specific parameter combinations

Document selection

  • Include representative samples of your document types
  • Test edge cases (complex layouts, mixed content, etc.)
  • Keep document count manageable (3-5 documents is often sufficient)

Interpreting results

  • Compare processing times to understand speed/accuracy trade-offs
  • Use the diff view to identify where configurations produce different outputs
  • Toggle between raw and rendered views to see formatted output
  • Use rendered diff view for word-level highlighting of changes in HTML/Markdown
  • Visualize JSON output with bounding boxes to see document structure
  • Pay attention to “N/A” cells indicating incompatible combinations
  • Look for patterns across similar document types
  • Run configurations multiple times (using run count) to test consistency and identify variability
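If you note down the per-run durations from the grid’s processing metrics, a few lines of Python are enough to summarize the speed side of the trade-off per configuration. The values below are placeholders, not real measurements:

```python
from statistics import mean

# Placeholder durations (seconds) copied from the grid's processing metrics;
# the numbers are illustrative, not real measurements.
durations = {
    "Fast": [2.1, 2.3, 2.0],
    "Balanced": [4.8, 5.1, 4.9],
    "Accurate": [9.8, 10.4, 10.1],
}
for config, times in durations.items():
    print(f"{config}: avg {mean(times):.1f}s over {len(times)} runs")
```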

Limitations

  • Maximum 10 documents per evaluation session
  • Maximum 5 run configurations per session
  • Maximum 3 iterations per configuration
  • Track Changes feature only works with DOCX files
  • Spreadsheet files use automatic configuration (no mode selection)
  • Rendered diff view only available for HTML and Markdown formats
  • External provider access is limited to select users (contact us for access)

Custom evaluations

For larger document sets or custom evaluation needs, contact us to discuss enterprise evaluation options.