Forge Evals is a powerful tool for evaluating and comparing different parsing configurations across multiple documents. Use it to determine which settings work best for your specific document types and use cases.

What is Forge Evals?

Forge Evals allows you to:
  • Upload up to 10 documents at once
  • Test up to 5 different parsing configurations simultaneously
  • Compare results side-by-side with visual diff highlighting
  • Identify the optimal parsing settings for your document types
This is particularly useful when you need to:
  • Determine which parsing mode (Fast, Balanced, or Accurate) works best for your documents
  • Evaluate special features like Track Changes or Chart Understanding
  • Compare parsing results across different document types
  • Optimize for speed vs. accuracy trade-offs

Getting started

Access Forge Evals at https://www.datalab.to/app/evals

Step 1: Upload documents

Upload the documents you want to evaluate. You can:
  • Drag and drop files directly into the upload zone
  • Click to browse and select files
  • Upload up to 10 documents per evaluation session
Supported formats: PDF, DOCX, XLSX, PPTX, images, and more. See supported file types for the complete list.
Spreadsheet files (XLS, XLSX, CSV, ODS) are processed automatically without additional configuration options.

Step 2: Select configurations

Choose which parsing configurations to test. Configurations are organized into three tabs:

Datalab tab

Select from Datalab’s preset configurations or create custom ones.
Preset configurations:
  • Fast Mode: Lowest latency, great for real-time use cases
  • Balanced Mode: Balanced accuracy and latency, works well with most documents
  • Accurate Mode: Highest accuracy and latency, good for complex documents
  • Track Changes: Extract tracked changes from DOCX files (DOCX only)
  • Chart Understanding: Extract data from charts and graphs
Custom configurations: Build your own configurations (see the example sketch after this list) to test specific combinations of:
  • Processing mode (Fast, Balanced, or Accurate)
  • Page range selection
  • Special features (Track Changes, Chart Understanding)
  • Output options (pagination, headers, footers)
  • Run count (1-3×): Run the same configuration multiple times to test consistency
Track Changes only works with DOCX files. The grid will show “N/A” for incompatible document/configuration combinations.
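To make the options above concrete, here is a minimal Python sketch of how a custom configuration could be represented as data. The class and field names are illustrative assumptions only; they are not the Datalab API schema or part of the Evals UI.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical representation of a custom Forge Evals configuration.
# Field names are illustrative only; this is not the Datalab API schema.
@dataclass
class EvalConfig:
    name: str
    mode: str = "balanced"            # "fast", "balanced", or "accurate"
    page_range: Optional[str] = None  # e.g. "1-5"; None means all pages
    track_changes: bool = False       # DOCX files only
    chart_understanding: bool = False
    paginate_output: bool = False
    run_count: int = 1                # 1-3 iterations to test consistency

# Example: three configurations covering the speed/accuracy spectrum
configs = [
    EvalConfig(name="Fast", mode="fast"),
    EvalConfig(name="Accurate", mode="accurate", run_count=3),
    EvalConfig(name="Charts", chart_understanding=True),
]
```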

Other Models tab

Compare Datalab against other open source models hosted on our infrastructure:
  • OlmoOCR
  • RolmoOCR
  • DotsOCR
  • DeepSeekOCR
These models are hosted by Datalab and don’t require any API credentials. Because Datalab’s own models receive additional optimizations on our managed API, a timing comparison with other hosted models would be misleading, so we omit their timing numbers. If you’d like to see additional models, or want help with custom evals or timings, contact us at [email protected].

External Providers tab

You can use Evals to compare Datalab outputs against other proprietary document processing providers. Access to external providers is currently limited to select users; if you’re actively evaluating Datalab against other providers, contact us to request access.

Step 3: Run evaluation

Click “Start Evaluation” to begin processing. The system will:
  1. Process each document with each selected configuration
  2. Display progress in a grid view
  3. Show completion status and processing time for each run
You can:
  • Monitor progress in real-time
  • Cancel all runs if needed
  • Retry failed runs
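Conceptually, an evaluation is a grid of documents × configurations (× iterations). The sketch below illustrates that pairing in Python; parse_document is a placeholder callable standing in for whatever call actually produces the parsed output, not a Datalab function.

```python
import itertools

# Conceptual sketch of the evaluation grid: each document is paired with each
# selected configuration, once per requested iteration. parse_document is a
# placeholder callable, not part of the Datalab API.
def run_grid(documents, configs, parse_document):
    results = {}
    for doc, cfg in itertools.product(documents, configs):
        for iteration in range(1, cfg.run_count + 1):
            key = (doc, cfg.name, iteration)
            try:
                results[key] = {"status": "complete", "output": parse_document(doc, cfg)}
            except Exception as exc:
                # Failed cells keep their error so they can be retried individually
                results[key] = {"status": "failed", "error": str(exc)}
    return results
```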

Step 4: Compare results

Once runs complete, click any two cells in the grid to compare their results side-by-side. The comparison view shows:
  • Parallel view: Full documents side-by-side with inline diff highlighting
  • Multiple output formats: Switch between Markdown, HTML, JSON, and Chunks
  • Rendered output: Toggle between raw and rendered views for HTML, Markdown, and JSON formats
  • Visual diffs: When enabled with rendered output, see word-level highlighting of changes
  • JSON visualization: View JSON output with document thumbnails and bounding boxes overlaid
  • Processing metrics: Duration and configuration details for each run
  • Diff statistics: Lines added, removed, and changed
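The diff statistics correspond to a standard line-level diff. A rough equivalent using Python’s built-in difflib (an illustration of the idea, not the tool’s implementation) looks like this:

```python
import difflib

# Rough sketch of the "lines added/removed/changed" statistics shown in the
# comparison view, computed with Python's standard difflib module.
def diff_stats(output_a: str, output_b: str) -> dict:
    added = removed = changed = 0
    matcher = difflib.SequenceMatcher(None, output_a.splitlines(), output_b.splitlines())
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            added += j2 - j1
        elif tag == "delete":
            removed += i2 - i1
        elif tag == "replace":
            changed += max(i2 - i1, j2 - j1)
    return {"lines_added": added, "lines_removed": removed, "lines_changed": changed}

# Example: compare the Markdown output of two runs
stats = diff_stats("# Title\nA table row\n", "# Title\nA corrected table row\n")
print(stats)  # {'lines_added': 0, 'lines_removed': 0, 'lines_changed': 1}
```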

Viewing modes

  • Raw view: See the original output text with line numbers
  • Rendered view: View formatted HTML/Markdown or visualized JSON with thumbnails
  • Diff view: Compare outputs with line-by-line or word-level highlighting
  • Rendered diff: Combine rendered output with word-level diff highlighting (HTML/Markdown only)
Rendered diff view is only available for HTML and Markdown formats. JSON rendered view shows bounding boxes but does not support diff highlighting.
Use the “Switch Runs” button to select different runs for comparison without leaving the comparison view.

Visualization features

Rendered output

Toggle the “Render” button to view formatted output instead of raw text:
  • HTML/Markdown: See the fully rendered document with proper formatting, including math equations rendered with MathJax
  • JSON: View document thumbnails with bounding boxes overlaid on detected blocks (text, tables, figures, etc.)
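As an illustration of what the JSON visualization does, the sketch below draws block bounding boxes onto a page image with Pillow. The field names ("bbox", "block_type") and pixel coordinates are assumptions for the example, not a documented schema.

```python
from PIL import Image, ImageDraw

# Illustrative only: overlays detected blocks on a page image, assuming each
# block exposes a pixel-space bounding box and a block type label.
def draw_blocks(page_image_path: str, blocks: list[dict]) -> Image.Image:
    image = Image.open(page_image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for block in blocks:
        x0, y0, x1, y1 = block["bbox"]
        draw.rectangle([x0, y0, x1, y1], outline="red", width=2)
        draw.text((x0, max(y0 - 12, 0)), block.get("block_type", ""), fill="red")
    return image

annotated = draw_blocks("page_1.png", [{"bbox": [40, 60, 560, 120], "block_type": "Table"}])
annotated.save("page_1_annotated.png")
```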

Diff highlighting

When comparing two runs, enable “Show Diff” to see differences:
  • Raw diff: Line-by-line comparison with added/removed lines highlighted
  • Rendered diff: Word-level highlighting within rendered HTML/Markdown output, preserving formatting and math rendering
The rendered diff view intelligently highlights:
  • Changed paragraphs with block-level highlighting
  • Specific changed words within modified paragraphs
  • Preserved math equations with accurate semantic comparison
Rendered diff is not available for JSON format. Use raw diff view to compare JSON outputs.
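For intuition, word-level highlighting can be approximated with difflib by diffing on words instead of lines. The snippet below is a minimal sketch of that idea, not the rendered diff view’s actual implementation:

```python
import difflib

# Minimal sketch of word-level diffing: split two versions of a paragraph
# into words, then wrap removed words in [-...-] and added words in {+...+}.
def word_diff(text_a: str, text_b: str) -> str:
    words_a, words_b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.extend(words_b[j1:j2])
        else:
            pieces.extend(f"[-{w}-]" for w in words_a[i1:i2])    # removed words
            pieces.extend(f"{{+{w}+}}" for w in words_b[j1:j2])  # added words
    return " ".join(pieces)

print(word_diff("Total revenue was 4.2 million", "Total revenue was 4.5 million"))
# Total revenue was [-4.2-] {+4.5+} million
```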

Multiple iterations

When a configuration is set to run multiple times (2× or 3×), each iteration appears as a separate column in the grid (e.g., “Accurate #1”, “Accurate #2”). This allows you to:
  • Compare consistency across multiple runs of the same configuration
  • Identify variability in parsing results
  • Validate that your configuration produces stable outputs
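One simple way to quantify consistency across iterations is to compare each pair of outputs and look at the lowest similarity score. The sketch below uses difflib.SequenceMatcher for this; the run outputs are placeholder strings.

```python
import difflib
import itertools

# Sketch of a consistency check across repeated runs of one configuration:
# compare every pair of outputs and report the worst pairwise similarity.
def consistency(outputs: list[str]) -> float:
    ratios = [
        difflib.SequenceMatcher(None, a, b).ratio()
        for a, b in itertools.combinations(outputs, 2)
    ]
    return min(ratios) if ratios else 1.0

# Example with three iterations of an "Accurate" run (placeholder strings)
runs = ["# Report\nRevenue: 4.2M", "# Report\nRevenue: 4.2M", "# Report\nRevenue: 4.3M"]
print(f"worst pairwise similarity: {consistency(runs):.3f}")
```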

Excluding runs

Right-click any cell in the grid to exclude that specific document/configuration combination from running. This is useful when:
  • You know certain configurations won’t work for specific documents
  • You want to reduce the total number of runs
  • You need to focus on specific comparisons
Excluded cells appear with a yellow background and can be re-included by clicking them again.

Best practices

Choosing configurations

  • Start with the three preset modes (Fast, Balanced, Accurate) to establish a baseline
  • Add Track Changes if you’re working with DOCX files that contain revisions
  • Add Chart Understanding if your documents contain charts or graphs
  • Create custom configurations to test specific parameter combinations

Document selection

  • Include representative samples of your document types
  • Test edge cases (complex layouts, mixed content, etc.)
  • Keep document count manageable (3-5 documents is often sufficient)

Interpreting results

  • Compare processing times to understand speed/accuracy trade-offs
  • Use the diff view to identify where configurations produce different outputs
  • Toggle between raw and rendered views to see formatted output
  • Use rendered diff view for word-level highlighting of changes in HTML/Markdown
  • Visualize JSON output with bounding boxes to see document structure
  • Pay attention to “N/A” cells indicating incompatible combinations
  • Look for patterns across similar document types
  • Run configurations multiple times (using run count) to test consistency and identify variability
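If you note down the per-run durations from the grid’s processing metrics, a few lines of Python are enough to summarize the speed side of the trade-off per configuration. The values below are placeholders, not real measurements:

```python
from statistics import mean

# Placeholder durations (seconds) copied from the grid's processing metrics;
# the numbers are illustrative, not real measurements.
durations = {
    "Fast": [2.1, 2.3, 2.0],
    "Balanced": [4.8, 5.1, 4.9],
    "Accurate": [9.8, 10.4, 10.1],
}
for config, times in durations.items():
    print(f"{config}: avg {mean(times):.1f}s over {len(times)} runs")
```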

Limitations

  • Maximum 10 documents per evaluation session
  • Maximum 5 run configurations per session
  • Maximum 3 iterations per configuration
  • Track Changes feature only works with DOCX files
  • Spreadsheet files use automatic configuration (no mode selection)
  • Rendered diff view only available for HTML and Markdown formats
  • External provider access is limited to select users (contact us for access)

Custom evaluations

For larger document sets or custom evaluation needs, contact us to discuss enterprise evaluation options.