The usage analytics endpoint provides comprehensive metrics about inference requests processed by your on-premises container. Use this endpoint to monitor request volumes, success rates, performance statistics, and current system status.
This endpoint is only available in on-premises deployments and requires a valid license.

Endpoint

GET /api/v1/usage

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| start_date | string (ISO 8601) | 24 hours ago | Start of time range for analytics |
| end_date | string (ISO 8601) | Now | End of time range for analytics |
The time range cannot exceed 7 days. Requests with larger ranges will return a 400 error.
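Because the server rejects ranges wider than 7 days with a 400, it can help to validate the range client-side before calling. A minimal sketch (the helper name `build_usage_params` is illustrative, not part of any SDK):

```python
from datetime import datetime, timedelta, timezone

MAX_RANGE = timedelta(days=7)  # the server returns 400 for wider ranges

def build_usage_params(start_date: datetime, end_date: datetime) -> dict:
    """Build query parameters, applying the endpoint's documented limits."""
    if start_date >= end_date:
        raise ValueError("start_date must be before end_date")
    if end_date - start_date > MAX_RANGE:
        raise ValueError("Time range must not exceed 7 days")
    return {
        "start_date": start_date.isoformat(),
        "end_date": end_date.isoformat(),
    }

end = datetime(2024, 6, 8, tzinfo=timezone.utc)
params = build_usage_params(end - timedelta(days=7), end)
```

Passing timezone-aware datetimes avoids relying on the server's naive-datetime-as-UTC fallback.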

Authentication

This endpoint requires a valid on-premises license. If your license is invalid or expired, the endpoint returns a 423 (Locked) status code.

Response Structure

The endpoint returns an analytics object with five main sections:

Period

The effective time range for the query (normalized to UTC):
{
  "period": {
    "start_date": "2024-06-01T00:00:00+00:00",
    "end_date": "2024-06-01T23:59:59+00:00"
  }
}

Summary

Aggregate statistics across all request types:
{
  "summary": {
    "total_requests": 1250,
    "successful_requests": 1200,
    "failed_requests": 50,
    "successful_pages_processed": 15000,
    "failed_pages_processed": 500,
    "success_rate": 0.96
  }
}
| Field | Type | Description |
|---|---|---|
| total_requests | int | Total completed requests in time range |
| successful_requests | int | Requests completed without errors |
| failed_requests | int | Requests that failed with errors |
| successful_pages_processed | int | Total pages from successful requests |
| failed_pages_processed | int | Total pages from failed requests |
| success_rate | float | Ratio of successful to total requests (0-1) |
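The `success_rate` field is derived from the request counts, and the request and page counts each split into successful and failed portions. A quick check of those relationships, using the sample payload above:

```python
# Sample summary payload from the response above.
summary = {
    "total_requests": 1250,
    "successful_requests": 1200,
    "failed_requests": 50,
    "successful_pages_processed": 15000,
    "failed_pages_processed": 500,
    "success_rate": 0.96,
}

# success_rate is the ratio of successful to total requests
assert summary["successful_requests"] / summary["total_requests"] == summary["success_rate"]

# request counts split into successful + failed
assert summary["successful_requests"] + summary["failed_requests"] == summary["total_requests"]

# total pages processed, combining successful and failed requests
total_pages = summary["successful_pages_processed"] + summary["failed_pages_processed"]
```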

By Request Type

Per-type breakdown of the same metrics:
{
  "by_request_type": {
    "marker": {
      "total_requests": 1000,
      "successful_requests": 980,
      "failed_requests": 20,
      "successful_pages_processed": 12000,
      "failed_pages_processed": 200
    },
    "ocr": {
      "total_requests": 250,
      "successful_requests": 220,
      "failed_requests": 30,
      "successful_pages_processed": 3000,
      "failed_pages_processed": 300
    }
  }
}
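The per-type breakdown omits `success_rate`, but it can be derived the same way as the summary value. A sketch using the sample payload above (`success_rates` is an illustrative helper):

```python
# Sample by_request_type payload from the response above.
by_type = {
    "marker": {"total_requests": 1000, "successful_requests": 980,
               "failed_requests": 20, "successful_pages_processed": 12000,
               "failed_pages_processed": 200},
    "ocr": {"total_requests": 250, "successful_requests": 220,
            "failed_requests": 30, "successful_pages_processed": 3000,
            "failed_pages_processed": 300},
}

def success_rates(by_request_type: dict) -> dict:
    """Compute a per-type success rate, guarding against empty types."""
    return {
        name: (m["successful_requests"] / m["total_requests"]
               if m["total_requests"] else None)
        for name, m in by_request_type.items()
    }

rates = success_rates(by_type)  # marker: 0.98, ocr: 0.88
```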

Performance

Processing time and queue wait statistics (only includes successful requests):
{
  "performance": {
    "average_processing_time_secs": 12.5,
    "median_processing_time_secs": 10.2,
    "p95_processing_time_secs": 25.8,
    "p99_processing_time_secs": 35.4,
    "average_queue_wait_secs": 2.3
  }
}
| Field | Type | Description |
|---|---|---|
| average_processing_time_secs | float | Mean time from start to completion |
| median_processing_time_secs | float | 50th percentile processing time |
| p95_processing_time_secs | float | 95th percentile processing time |
| p99_processing_time_secs | float | 99th percentile processing time |
| average_queue_wait_secs | float | Mean time from submission to start |
Performance metrics are null when there are no successful requests in the time range. Failed requests are excluded from performance calculations.

Current Status

Live snapshot of in-progress and queued requests (not filtered by time range):
{
  "current_status": {
    "requests_in_progress": 5,
    "requests_queued": 12
  }
}
| Field | Type | Description |
|---|---|---|
| requests_in_progress | int | Requests currently being processed |
| requests_queued | int | Requests waiting to be processed |

Examples

Basic Usage (Default 24-Hour Window)

# The Python SDK does not yet support the usage endpoint
# Use the requests library directly
import requests

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    headers={"X-API-Key": "any-value"}  # Not validated in on-prem
)

data = response.json()
print(f"Total requests: {data['summary']['total_requests']}")
print(f"Success rate: {data['summary']['success_rate']:.2%}")

Custom Time Range

import requests
from datetime import datetime, timedelta, timezone

# Query last 7 days
end_date = datetime.now(timezone.utc)
start_date = end_date - timedelta(days=7)

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    params={
        "start_date": start_date.isoformat(),
        "end_date": end_date.isoformat()
    },
    headers={"X-API-Key": "any-value"}
)

data = response.json()

Monitoring Dashboard Example

import requests
from datetime import datetime, timezone

def get_usage_metrics():
    """Fetch current usage metrics for monitoring dashboard."""
    response = requests.get(
        "http://localhost:8000/api/v1/usage",
        headers={"X-API-Key": "any-value"}
    )
    
    if response.status_code != 200:
        raise RuntimeError(f"Failed to fetch metrics: {response.status_code}")
    
    return response.json()

def print_dashboard():
    """Print a simple monitoring dashboard."""
    data = get_usage_metrics()
    
    print("=" * 60)
    print("DATALAB ON-PREM USAGE DASHBOARD")
    print("=" * 60)
    
    # Summary
    summary = data["summary"]
    print(f"\n📊 SUMMARY (Last 24 Hours)")
    print(f"  Total Requests:     {summary['total_requests']:,}")
    print(f"  Successful:         {summary['successful_requests']:,}")
    print(f"  Failed:             {summary['failed_requests']:,}")
    print(f"  Success Rate:       {summary['success_rate']:.2%}")
    print(f"  Pages Processed:    {summary['successful_pages_processed']:,}")
    
    # By type
    print(f"\n📈 BY REQUEST TYPE")
    for req_type, metrics in data["by_request_type"].items():
        print(f"  {req_type.upper()}:")
        print(f"    Requests: {metrics['total_requests']:,} ({metrics['successful_requests']:,} successful)")
        print(f"    Pages: {metrics['successful_pages_processed']:,}")
    
    # Performance
    perf = data["performance"]
    if perf["average_processing_time_secs"] is not None:
        print(f"\n⚡ PERFORMANCE")
        print(f"  Avg Processing:     {perf['average_processing_time_secs']:.2f}s")
        print(f"  Median Processing:  {perf['median_processing_time_secs']:.2f}s")
        print(f"  P95 Processing:     {perf['p95_processing_time_secs']:.2f}s")
        print(f"  P99 Processing:     {perf['p99_processing_time_secs']:.2f}s")
        print(f"  Avg Queue Wait:     {perf['average_queue_wait_secs']:.2f}s")
    
    # Current status
    status = data["current_status"]
    print(f"\n🔄 CURRENT STATUS")
    print(f"  In Progress:        {status['requests_in_progress']}")
    print(f"  Queued:             {status['requests_queued']}")
    
    print("=" * 60)

if __name__ == "__main__":
    print_dashboard()

Error Responses

400 Bad Request

Invalid query parameters:
{
  "detail": "start_date must be before end_date."
}
{
  "detail": "Time range must not exceed 7 days."
}

423 Locked

License validation failed:
{
  "detail": "License validation failed"
}

Implementation Notes

  • Only completed requests (with end_time set) are included in summary statistics
  • Failed requests are counted in totals but excluded from performance metrics
  • Performance percentiles use linear interpolation for accurate calculation
  • Queue wait time is calculated as start_time - submission_time
  • Processing time is calculated as end_time - start_time
  • Naive datetimes (without timezone) are treated as UTC
  • The current_status section provides a live snapshot and is not filtered by the time range
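The linear-interpolation percentile mentioned above can be sketched in a few lines of stdlib Python. This mirrors the common definition (e.g. NumPy's default method); it is a sketch of the technique, not the container's actual implementation:

```python
def percentile(values: list[float], p: float) -> float:
    """p-th percentile (0-100) with linear interpolation between ranks."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    k = (len(xs) - 1) * (p / 100)          # fractional rank into sorted data
    lo, hi = int(k), min(int(k) + 1, len(xs) - 1)
    frac = k - lo                          # distance between the two ranks
    return xs[lo] + (xs[hi] - xs[lo]) * frac

# Example: processing times in seconds for five successful requests
times = [10.0, 12.0, 15.0, 20.0, 40.0]
median = percentile(times, 50)  # 15.0
p95 = percentile(times, 95)     # interpolates between 20.0 and 40.0
```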

Use Cases

Capacity Planning

Monitor request volumes and processing times to plan infrastructure scaling:
import requests
from datetime import datetime, timedelta, timezone

# Get last 7 days of data
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    params={"start_date": start.isoformat(), "end_date": end.isoformat()},
    headers={"X-API-Key": "any-value"}
)

data = response.json()
avg_daily_requests = data["summary"]["total_requests"] / 7
avg_daily_pages = data["summary"]["successful_pages_processed"] / 7

print(f"Average daily requests: {avg_daily_requests:.0f}")
print(f"Average daily pages: {avg_daily_pages:.0f}")

Performance Monitoring

Track processing times to identify performance degradation:
import requests

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    headers={"X-API-Key": "any-value"}
)

perf = response.json()["performance"]

# Alert if P95 exceeds threshold
if perf["p95_processing_time_secs"] and perf["p95_processing_time_secs"] > 30:
    print(f"ALERT: P95 processing time is {perf['p95_processing_time_secs']:.1f}s")

Queue Monitoring

Monitor queue depth to detect bottlenecks:
import requests

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    headers={"X-API-Key": "any-value"}
)

status = response.json()["current_status"]

if status["requests_queued"] > 50:
    print(f"WARNING: {status['requests_queued']} requests in queue")

Next Steps