The usage analytics endpoint provides comprehensive metrics about inference requests processed by your on-premises container. Use this endpoint to monitor request volumes, success rates, performance statistics, and current system status.
This endpoint is only available in on-premises deployments and requires a valid license.
Endpoint
GET /api/v1/usage
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `start_date` | string (ISO 8601) | 24 hours ago | Start of time range for analytics |
| `end_date` | string (ISO 8601) | Now | End of time range for analytics |
The time range cannot exceed 7 days. Requests with larger ranges will return a 400 error.
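A client-side guard can fail fast before sending a query the server would reject (a sketch; the messages mirror the 400 responses documented in Error Responses below):

```python
from datetime import datetime, timedelta, timezone

MAX_RANGE = timedelta(days=7)

def validate_range(start: datetime, end: datetime) -> None:
    """Raise before sending a query the server would reject with 400."""
    if start >= end:
        raise ValueError("start_date must be before end_date.")
    if end - start > MAX_RANGE:
        raise ValueError("Time range must not exceed 7 days.")

now = datetime.now(timezone.utc)
validate_range(now - timedelta(days=3), now)  # in range: no exception
```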
Authentication
This endpoint requires a valid on-premises license. If your license is invalid or expired, the endpoint returns a 423 (Locked) status code.
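A small helper can map the documented status codes to descriptive errors before parsing the response body (a sketch; `check_usage_response` and the chosen exception types are assumptions, not part of the API):

```python
def check_usage_response(status_code: int) -> None:
    """Raise a descriptive error for the documented failure modes."""
    if status_code == 423:
        # 423 Locked: license invalid or expired
        raise PermissionError("License validation failed (423 Locked)")
    if status_code == 400:
        # 400 Bad Request: invalid query parameters (e.g. range over 7 days)
        raise ValueError("Invalid query parameters (400 Bad Request)")
    if status_code != 200:
        raise RuntimeError(f"Unexpected status code: {status_code}")
```

Call it right after `requests.get(...)`, before reading `response.json()`.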
Response Structure
The endpoint returns an analytics object with five main sections:
Period
The effective time range for the query (normalized to UTC):
{
  "period": {
    "start_date": "2024-06-01T00:00:00+00:00",
    "end_date": "2024-06-01T23:59:59+00:00"
  }
}
Summary
Aggregate statistics across all request types:
{
  "summary": {
    "total_requests": 1250,
    "successful_requests": 1200,
    "failed_requests": 50,
    "successful_pages_processed": 15000,
    "failed_pages_processed": 500,
    "success_rate": 0.96
  }
}
| Field | Type | Description |
|---|---|---|
| `total_requests` | int | Total completed requests in time range |
| `successful_requests` | int | Requests completed without errors |
| `failed_requests` | int | Requests that failed with errors |
| `successful_pages_processed` | int | Total pages from successful requests |
| `failed_pages_processed` | int | Total pages from failed requests |
| `success_rate` | float | Ratio of successful to total requests (0-1) |
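The derived fields follow directly from the raw counts; a sketch recomputing them from the example payload (`derive_summary` is a hypothetical helper, and rounding to two places is an assumption, not documented behavior):

```python
def derive_summary(successful: int, failed: int) -> dict:
    """Recompute the derived summary fields from raw request counts."""
    total = successful + failed
    return {
        "total_requests": total,
        # 0-1 ratio; rounding to 2 places is an assumption for display
        "success_rate": round(successful / total, 2) if total else None,
    }

# Matches the example above: 1200 successful + 50 failed
print(derive_summary(1200, 50))  # {'total_requests': 1250, 'success_rate': 0.96}
```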
By Request Type
Per-type breakdown of the same metrics:
{
  "by_request_type": {
    "marker": {
      "total_requests": 1000,
      "successful_requests": 980,
      "failed_requests": 20,
      "successful_pages_processed": 12000,
      "failed_pages_processed": 200
    },
    "ocr": {
      "total_requests": 250,
      "successful_requests": 220,
      "failed_requests": 30,
      "successful_pages_processed": 3000,
      "failed_pages_processed": 300
    }
  }
}
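In the example payloads, the per-type counts sum to the summary totals; a quick consistency check (a sketch, not part of the API):

```python
def totals_from_types(by_request_type: dict) -> dict:
    """Sum per-type metrics back into summary-level totals."""
    keys = ("total_requests", "successful_requests", "failed_requests",
            "successful_pages_processed", "failed_pages_processed")
    return {k: sum(t[k] for t in by_request_type.values()) for k in keys}

example = {
    "marker": {"total_requests": 1000, "successful_requests": 980,
               "failed_requests": 20, "successful_pages_processed": 12000,
               "failed_pages_processed": 200},
    "ocr": {"total_requests": 250, "successful_requests": 220,
            "failed_requests": 30, "successful_pages_processed": 3000,
            "failed_pages_processed": 300},
}
totals = totals_from_types(example)
print(totals["total_requests"])  # 1250, matching the summary example
```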
Performance
Processing time and queue wait statistics (only successful requests are included):
{
  "performance": {
    "average_processing_time_secs": 12.5,
    "median_processing_time_secs": 10.2,
    "p95_processing_time_secs": 25.8,
    "p99_processing_time_secs": 35.4,
    "average_queue_wait_secs": 2.3
  }
}
| Field | Type | Description |
|---|---|---|
| `average_processing_time_secs` | float | Mean time from start to completion |
| `median_processing_time_secs` | float | 50th percentile processing time |
| `p95_processing_time_secs` | float | 95th percentile processing time |
| `p99_processing_time_secs` | float | 99th percentile processing time |
| `average_queue_wait_secs` | float | Mean time from submission to start |
Performance metrics are null when there are no successful requests in the time range. Failed requests are excluded from performance calculations.
Current Status
Live snapshot of in-progress and queued requests (not filtered by time range):
{
  "current_status": {
    "requests_in_progress": 5,
    "requests_queued": 12
  }
}
| Field | Type | Description |
|---|---|---|
| `requests_in_progress` | int | Requests currently being processed |
| `requests_queued` | int | Requests waiting to be processed |
Examples
Basic Usage (Default 24-Hour Window)
# The Python SDK does not yet support the usage endpoint
# Use the requests library directly
import requests

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    headers={"X-API-Key": "any-value"},  # Not validated in on-prem
)
data = response.json()
print(f"Total requests: {data['summary']['total_requests']}")
print(f"Success rate: {data['summary']['success_rate']:.2%}")
Custom Time Range
import requests
from datetime import datetime, timedelta, timezone

# Query the last 7 days
end_date = datetime.now(timezone.utc)
start_date = end_date - timedelta(days=7)

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    params={
        "start_date": start_date.isoformat(),
        "end_date": end_date.isoformat(),
    },
    headers={"X-API-Key": "any-value"},
)
data = response.json()
Monitoring Dashboard Example
import requests

def get_usage_metrics():
    """Fetch current usage metrics for the monitoring dashboard."""
    response = requests.get(
        "http://localhost:8000/api/v1/usage",
        headers={"X-API-Key": "any-value"},
    )
    if response.status_code != 200:
        raise Exception(f"Failed to fetch metrics: {response.status_code}")
    return response.json()

def print_dashboard():
    """Print a simple monitoring dashboard."""
    data = get_usage_metrics()
    print("=" * 60)
    print("DATALAB ON-PREM USAGE DASHBOARD")
    print("=" * 60)

    # Summary
    summary = data["summary"]
    print("\n📊 SUMMARY (Last 24 Hours)")
    print(f"  Total Requests: {summary['total_requests']:,}")
    print(f"  Successful: {summary['successful_requests']:,}")
    print(f"  Failed: {summary['failed_requests']:,}")
    print(f"  Success Rate: {summary['success_rate']:.2%}")
    print(f"  Pages Processed: {summary['successful_pages_processed']:,}")

    # By type
    print("\n📈 BY REQUEST TYPE")
    for req_type, metrics in data["by_request_type"].items():
        print(f"  {req_type.upper()}:")
        print(f"    Requests: {metrics['total_requests']:,} "
              f"({metrics['successful_requests']:,} successful)")
        print(f"    Pages: {metrics['successful_pages_processed']:,}")

    # Performance (null when there are no successful requests)
    perf = data["performance"]
    if perf["average_processing_time_secs"] is not None:
        print("\n⚡ PERFORMANCE")
        print(f"  Avg Processing: {perf['average_processing_time_secs']:.2f}s")
        print(f"  Median Processing: {perf['median_processing_time_secs']:.2f}s")
        print(f"  P95 Processing: {perf['p95_processing_time_secs']:.2f}s")
        print(f"  P99 Processing: {perf['p99_processing_time_secs']:.2f}s")
        print(f"  Avg Queue Wait: {perf['average_queue_wait_secs']:.2f}s")

    # Current status
    status = data["current_status"]
    print("\n🔄 CURRENT STATUS")
    print(f"  In Progress: {status['requests_in_progress']}")
    print(f"  Queued: {status['requests_queued']}")
    print("=" * 60)

if __name__ == "__main__":
    print_dashboard()
Error Responses
400 Bad Request
Invalid query parameters:
{
  "detail": "start_date must be before end_date."
}

{
  "detail": "Time range must not exceed 7 days."
}
423 Locked
License validation failed:
{
  "detail": "License validation failed"
}
Implementation Notes
- Only completed requests (those with `end_time` set) are included in summary statistics
- Failed requests are counted in totals but excluded from performance metrics
- Performance percentiles use linear interpolation for accurate calculation
- Queue wait time is calculated as `start_time - submission_time`
- Processing time is calculated as `end_time - start_time`
- Naive datetimes (without timezone) are treated as UTC
- The `current_status` section provides a live snapshot and is not filtered by the time range
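The percentile calculation noted above can be sketched as follows, assuming the common linear-interpolation-between-closest-ranks method (the container's exact implementation may differ):

```python
def percentile_linear(values, q):
    """q-th percentile with linear interpolation between closest ranks."""
    xs = sorted(values)
    if not xs:
        return None  # mirrors the null performance metrics when there is no data
    rank = (len(xs) - 1) * q / 100
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)

times = [8.0, 10.2, 12.5, 25.8, 40.0]
print(percentile_linear(times, 50))  # 12.5 (the median)
```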
Use Cases
Capacity Planning
Monitor request volumes and processing times to plan infrastructure scaling:
import requests
from datetime import datetime, timedelta, timezone

# Get the last 7 days of data
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    params={"start_date": start.isoformat(), "end_date": end.isoformat()},
    headers={"X-API-Key": "any-value"},
)
data = response.json()

avg_daily_requests = data["summary"]["total_requests"] / 7
avg_daily_pages = data["summary"]["successful_pages_processed"] / 7
print(f"Average daily requests: {avg_daily_requests:.0f}")
print(f"Average daily pages: {avg_daily_pages:.0f}")
Performance Monitoring
Track processing times to identify performance degradation:
import requests

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    headers={"X-API-Key": "any-value"},
)
perf = response.json()["performance"]

# Alert if P95 exceeds a threshold
p95 = perf["p95_processing_time_secs"]
if p95 is not None and p95 > 30:
    print(f"ALERT: P95 processing time is {p95:.1f}s")
Queue Monitoring
Monitor queue depth to detect bottlenecks:
import requests

response = requests.get(
    "http://localhost:8000/api/v1/usage",
    headers={"X-API-Key": "any-value"},
)
status = response.json()["current_status"]

if status["requests_queued"] > 50:
    print(f"WARNING: {status['requests_queued']} requests in queue")
Next Steps