The usage analytics endpoint provides comprehensive metrics about inference requests processed by your on-premises container. Use this endpoint to monitor request volumes, success rates, performance statistics, and current system status.
This endpoint is only available in on-premises deployments and requires a valid license.
Endpoint
GET /api/v1/usage
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `start_date` | string (ISO 8601) | 24 hours ago | Start of time range for analytics |
| `end_date` | string (ISO 8601) | Now | End of time range for analytics |
The time range cannot exceed 7 days. Requests with larger ranges will return a 400 error.
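A client-side guard can fail fast before sending a query the server would reject (a sketch; the messages mirror the 400 responses documented in Error Responses below):

```python
from datetime import datetime, timedelta, timezone

MAX_RANGE = timedelta(days=7)

def validate_range(start: datetime, end: datetime) -> None:
    """Raise before sending a query the server would reject with 400."""
    if start >= end:
        raise ValueError("start_date must be before end_date.")
    if end - start > MAX_RANGE:
        raise ValueError("Time range must not exceed 7 days.")

now = datetime.now(timezone.utc)
validate_range(now - timedelta(days=3), now)  # in range: no exception
```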
Authentication
This endpoint requires a valid on-premises license. If your license is invalid or expired, the endpoint returns a 423 (Locked) status code.
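A small helper can map the documented status codes to descriptive errors before parsing the response body (a sketch; `check_usage_response` and the chosen exception types are assumptions, not part of the API):

```python
def check_usage_response(status_code: int) -> None:
    """Raise a descriptive error for the documented failure modes."""
    if status_code == 423:
        # 423 Locked: license invalid or expired
        raise PermissionError("License validation failed (423 Locked)")
    if status_code == 400:
        # 400 Bad Request: invalid query parameters (e.g. range over 7 days)
        raise ValueError("Invalid query parameters (400 Bad Request)")
    if status_code != 200:
        raise RuntimeError(f"Unexpected status code: {status_code}")
```

Call it right after `requests.get(...)`, before reading `response.json()`.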
Response Structure
The endpoint returns an analytics object with five main sections:
Period
The effective time range for the query (normalized to UTC):
{
  "period": {
    "start_date": "2024-06-01T00:00:00+00:00",
    "end_date": "2024-06-01T23:59:59+00:00"
  }
}
Summary
Aggregate statistics across all request types:
{
  "summary": {
    "total_requests": 1250,
    "successful_requests": 1200,
    "failed_requests": 50,
    "successful_pages_processed": 15000,
    "failed_pages_processed": 500,
    "success_rate": 0.96
  }
}
| Field | Type | Description |
|---|---|---|
| `total_requests` | int | Total completed requests in time range |
| `successful_requests` | int | Requests completed without errors |
| `failed_requests` | int | Requests that failed with errors |
| `successful_pages_processed` | int | Total pages from successful requests |
| `failed_pages_processed` | int | Total pages from failed requests |
| `success_rate` | float | Ratio of successful to total requests (0-1) |
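The derived fields follow directly from the raw counts; a sketch recomputing them from the example payload (`derive_summary` is a hypothetical helper, and rounding to two places is an assumption, not documented behavior):

```python
def derive_summary(successful: int, failed: int) -> dict:
    """Recompute the derived summary fields from raw request counts."""
    total = successful + failed
    return {
        "total_requests": total,
        # 0-1 ratio; rounding to 2 places is an assumption for display
        "success_rate": round(successful / total, 2) if total else None,
    }

# Matches the example above: 1200 successful + 50 failed
print(derive_summary(1200, 50))  # {'total_requests': 1250, 'success_rate': 0.96}
```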
By Request Type
Per-type breakdown of the same metrics:
{
  "by_request_type": {
    "marker": {
      "total_requests": 1000,
      "successful_requests": 980,
      "failed_requests": 20,
      "successful_pages_processed": 12000,
      "failed_pages_processed": 200
    },
    "ocr": {
      "total_requests": 250,
      "successful_requests": 220,
      "failed_requests": 30,
      "successful_pages_processed": 3000,
      "failed_pages_processed": 300
    }
  }
}
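In the example payloads, the per-type counts sum to the summary totals; a quick consistency check (a sketch, not part of the API):

```python
def totals_from_types(by_request_type: dict) -> dict:
    """Sum per-type metrics back into summary-level totals."""
    keys = ("total_requests", "successful_requests", "failed_requests",
            "successful_pages_processed", "failed_pages_processed")
    return {k: sum(t[k] for t in by_request_type.values()) for k in keys}

example = {
    "marker": {"total_requests": 1000, "successful_requests": 980,
               "failed_requests": 20, "successful_pages_processed": 12000,
               "failed_pages_processed": 200},
    "ocr": {"total_requests": 250, "successful_requests": 220,
            "failed_requests": 30, "successful_pages_processed": 3000,
            "failed_pages_processed": 300},
}
totals = totals_from_types(example)
print(totals["total_requests"])  # 1250, matching the summary example
```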
Performance
Processing time and queue wait statistics (only successful requests are included):
{
  "performance": {
    "average_processing_time_secs": 12.5,
    "median_processing_time_secs": 10.2,
    "p95_processing_time_secs": 25.8,
    "p99_processing_time_secs": 35.4,
    "average_queue_wait_secs": 2.3
  }
}
| Field | Type | Description |
|---|---|---|
| `average_processing_time_secs` | float | Mean time from start to completion |
| `median_processing_time_secs` | float | 50th percentile processing time |
| `p95_processing_time_secs` | float | 95th percentile processing time |
| `p99_processing_time_secs` | float | 99th percentile processing time |
| `average_queue_wait_secs` | float | Mean time from submission to start |
Performance metrics are null when there are no successful requests in the time range. Failed requests are excluded from performance calculations.
Current Status
Live snapshot of in-progress and queued requests (not filtered by time range):
{
  "current_status": {
    "requests_in_progress": 5,
    "requests_queued": 12
  }
}
| Field | Type | Description |
|---|---|---|
| `requests_in_progress` | int | Requests currently being processed |
| `requests_queued` | int | Requests waiting to be processed |
Examples
Basic Usage (Default 24-Hour Window)
# The Python SDK does not yet support the usage endpoint
# Use the requests library directly
import requests

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    headers={"X-API-Key": "any-value"},  # Not validated in on-prem
)
data = response.json()
print(f"Total requests: {data['summary']['total_requests']}")
print(f"Success rate: {data['summary']['success_rate']:.2%}")
Custom Time Range
import requests
from datetime import datetime, timedelta, timezone

# Query the last 7 days
end_date = datetime.now(timezone.utc)
start_date = end_date - timedelta(days=7)

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    params={
        "start_date": start_date.isoformat(),
        "end_date": end_date.isoformat(),
    },
    headers={"X-API-Key": "any-value"},
)
data = response.json()
Monitoring Dashboard Example
import requests

def get_usage_metrics():
    """Fetch current usage metrics for the monitoring dashboard."""
    response = requests.get(
        "http://localhost:8000/api/v1/usage",
        headers={"X-API-Key": "any-value"},
    )
    if response.status_code != 200:
        raise Exception(f"Failed to fetch metrics: {response.status_code}")
    return response.json()

def print_dashboard():
    """Print a simple monitoring dashboard."""
    data = get_usage_metrics()
    print("=" * 60)
    print("DATALAB ON-PREM USAGE DASHBOARD")
    print("=" * 60)

    # Summary
    summary = data["summary"]
    print("\n📊 SUMMARY (Last 24 Hours)")
    print(f"  Total Requests: {summary['total_requests']:,}")
    print(f"  Successful: {summary['successful_requests']:,}")
    print(f"  Failed: {summary['failed_requests']:,}")
    print(f"  Success Rate: {summary['success_rate']:.2%}")
    print(f"  Pages Processed: {summary['successful_pages_processed']:,}")

    # By type
    print("\n📈 BY REQUEST TYPE")
    for req_type, metrics in data["by_request_type"].items():
        print(f"  {req_type.upper()}:")
        print(f"    Requests: {metrics['total_requests']:,} "
              f"({metrics['successful_requests']:,} successful)")
        print(f"    Pages: {metrics['successful_pages_processed']:,}")

    # Performance (null when there are no successful requests)
    perf = data["performance"]
    if perf["average_processing_time_secs"] is not None:
        print("\n⚡ PERFORMANCE")
        print(f"  Avg Processing: {perf['average_processing_time_secs']:.2f}s")
        print(f"  Median Processing: {perf['median_processing_time_secs']:.2f}s")
        print(f"  P95 Processing: {perf['p95_processing_time_secs']:.2f}s")
        print(f"  P99 Processing: {perf['p99_processing_time_secs']:.2f}s")
        print(f"  Avg Queue Wait: {perf['average_queue_wait_secs']:.2f}s")

    # Current status
    status = data["current_status"]
    print("\n🔄 CURRENT STATUS")
    print(f"  In Progress: {status['requests_in_progress']}")
    print(f"  Queued: {status['requests_queued']}")
    print("=" * 60)

if __name__ == "__main__":
    print_dashboard()
Error Responses
400 Bad Request
Invalid query parameters:
{
  "detail": "start_date must be before end_date."
}

{
  "detail": "Time range must not exceed 7 days."
}
423 Locked
License validation failed:
{
  "detail": "License validation failed"
}
Implementation Notes
- Only completed requests (those with `end_time` set) are included in summary statistics
- Failed requests are counted in totals but excluded from performance metrics
- Performance percentiles use linear interpolation for accurate calculation
- Queue wait time is calculated as `start_time - submission_time`
- Processing time is calculated as `end_time - start_time`
- Naive datetimes (without timezone) are treated as UTC
- The `current_status` section provides a live snapshot and is not filtered by the time range
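The percentile calculation noted above can be sketched as follows, assuming the common linear-interpolation-between-closest-ranks method (the container's exact implementation may differ):

```python
def percentile_linear(values, q):
    """q-th percentile with linear interpolation between closest ranks."""
    xs = sorted(values)
    if not xs:
        return None  # mirrors the null performance metrics when there is no data
    rank = (len(xs) - 1) * q / 100
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)

times = [8.0, 10.2, 12.5, 25.8, 40.0]
print(percentile_linear(times, 50))  # 12.5 (the median)
```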
Use Cases
Capacity Planning
Monitor request volumes and processing times to plan infrastructure scaling:
import requests
from datetime import datetime, timedelta, timezone

# Get the last 7 days of data
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    params={"start_date": start.isoformat(), "end_date": end.isoformat()},
    headers={"X-API-Key": "any-value"},
)
data = response.json()

avg_daily_requests = data["summary"]["total_requests"] / 7
avg_daily_pages = data["summary"]["successful_pages_processed"] / 7
print(f"Average daily requests: {avg_daily_requests:.0f}")
print(f"Average daily pages: {avg_daily_pages:.0f}")
Performance Monitoring
Track processing times to identify performance degradation:
import requests

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    headers={"X-API-Key": "any-value"},
)
perf = response.json()["performance"]

# Alert if P95 exceeds a threshold
p95 = perf["p95_processing_time_secs"]
if p95 is not None and p95 > 30:
    print(f"ALERT: P95 processing time is {p95:.1f}s")
Queue Monitoring
Monitor queue depth to detect bottlenecks:
import requests

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    headers={"X-API-Key": "any-value"},
)
status = response.json()["current_status"]

if status["requests_queued"] > 50:
    print(f"WARNING: {status['requests_queued']} requests in queue")
Next Steps