Follow these practices to keep your Datalab integration secure in production.

API Key Management

Store keys in environment variables

Never hardcode API keys in source code. Use environment variables:
export DATALAB_API_KEY="your-api-key"

# The SDK reads DATALAB_API_KEY automatically
from datalab_sdk import DatalabClient
client = DatalabClient()  # Uses env var
Never commit API keys to version control. Add .env files to your .gitignore.
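If you want a missing key to surface at startup instead of as failed requests later, a small check like the following can help. This is a minimal sketch: `require_api_key` is a hypothetical helper, not part of the Datalab SDK, and it only reads the same `DATALAB_API_KEY` variable the SDK uses.

```python
import os


def require_api_key(var_name: str = "DATALAB_API_KEY") -> str:
    """Return the API key from the environment, failing fast if it is missing.

    Hypothetical helper; the Datalab SDK reads the variable on its own.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail at startup rather than sending unauthenticated requests later.
        raise RuntimeError(f"{var_name} is not set; export it before starting the app")
    return key
```

Call it once during startup, before constructing `DatalabClient()`.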

Use per-key spend limits

Create separate API keys for different environments and set spend limits on each:
  • Development key — low spend limit for testing
  • Staging key — moderate limit for integration testing
  • Production key — appropriate limit for your expected usage
Manage keys at datalab.to/app/keys.
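One way to keep each environment on its own key is to store the keys under separate variable names and select one at startup. The variable names below (`DATALAB_API_KEY_DEV` and so on) and the `api_key_for` helper are hypothetical; setting a different `DATALAB_API_KEY` in each deployment achieves the same separation.

```python
import os

# Hypothetical variable names: one key per environment, each with its own spend limit.
KEY_VARS = {
    "development": "DATALAB_API_KEY_DEV",
    "staging": "DATALAB_API_KEY_STAGING",
    "production": "DATALAB_API_KEY_PROD",
}


def api_key_for(environment: str) -> str:
    """Return the key for one environment so keys are never shared across them."""
    try:
        var_name = KEY_VARS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key
```

If you take this route, you could copy the selected value into `DATALAB_API_KEY` before creating the client, since that is the variable the SDK reads.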

Rotate keys regularly

Rotate keys on a regular schedule, and immediately if you suspect a key has been compromised:
  1. Create a new API key at datalab.to/app/keys
  2. Update your application to use the new key
  3. Revoke the old key
Create the new key before revoking the old one to avoid downtime.
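Step 2 is easier if your application picks up a new key without a restart. The sketch below assumes the key is delivered as a file (for example, a mounted secret) and re-reads it whenever the file's modification time changes. `RotatingKey` and the file-based setup are assumptions for illustration, not a Datalab feature.

```python
import os
from typing import Optional


class RotatingKey:
    """Serve an API key from a file, reloading it when the file changes.

    Hypothetical setup: the key lives in a file that your secret manager updates.
    """

    def __init__(self, path: str) -> None:
        self._path = path
        self._mtime_ns: Optional[int] = None
        self._key = ""

    def get(self) -> str:
        mtime_ns = os.stat(self._path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            # First call, or the file was rewritten: load the current key.
            with open(self._path, encoding="utf-8") as f:
                self._key = f.read().strip()
            self._mtime_ns = mtime_ns
        return self._key
```

Read the key through `get()` for each request (or batch) so that writing the new key to the file is enough to switch over; revoke the old key only after traffic is using the new one.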

Webhook Security

Always use HTTPS

Configure your webhook endpoint to use HTTPS. Webhook payloads contain request data that should be encrypted in transit.
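A cheap safeguard is to reject plain-HTTP webhook URLs before they ever reach your configuration. This is a minimal sketch; `require_https` is a hypothetical helper, and it does not reflect how Datalab itself validates webhook URLs.

```python
from urllib.parse import urlparse


def require_https(url: str) -> str:
    """Reject webhook URLs that would deliver payloads over plain HTTP."""
    parsed = urlparse(url)  # urlparse lowercases the scheme
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"webhook URL must use https:// and include a host: {url!r}")
    return url
```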

Verify webhook signatures

Always verify the webhook signature before processing the payload:
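import hashlib
import hmac
import json
import os

from fastapi import FastAPI, HTTPException, Request

app = FastAPI()
# Hypothetical variable name: load the secret from the environment, never hardcode it.
WEBHOOK_SECRET = os.environ["DATALAB_WEBHOOK_SECRET"]

@app.post("/webhook")
async def handle_webhook(request: Request):
    # Sign the raw bytes exactly as received; re-serialized JSON might not match.
    body = await request.body()
    # Default to "" so a missing header fails the check instead of raising TypeError.
    signature = request.headers.get("X-Webhook-Signature", "")

    expected = hmac.new(
        WEBHOOK_SECRET.encode(),
        body,
        hashlib.sha256,
    ).hexdigest()

    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise HTTPException(status_code=401, detail="Invalid signature")

    # Parse only after the signature checks out.
    payload = json.loads(body)
    # Process the webhook payload
    return {"status": "ok"}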
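Pulling the check into plain functions makes it easy to unit-test without running a web server. This sketch mirrors the scheme in the example above (hex-encoded HMAC-SHA256 over the raw request body); the function names are hypothetical, and if Datalab documents a different encoding, adjust `sign` to match.

```python
import hashlib
import hmac
from typing import Optional


def sign(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body, the same computation as the handler above."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: str, body: bytes, signature: Optional[str]) -> bool:
    """Constant-time check; a missing or empty header counts as invalid."""
    if not signature:
        return False
    # Compare bytes so non-ASCII header values return False instead of raising.
    return hmac.compare_digest(signature.encode(), sign(secret, body).encode())
```

In tests, `sign` lets you build correctly signed requests for your handler, and tampered bodies or wrong secrets should be rejected.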

Handle duplicate events

Webhook deliveries may be retried on 5xx errors or timeouts. Use the request_id field to deduplicate:
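# In-memory set for illustration only: it is lost on restart and not shared
# across worker processes. Use a database in production.
processed_ids = set()

@app.post("/webhook")
async def handle_webhook(request: Request):
    # Verify the signature first, as in the previous example.
    payload = await request.json()
    request_id = payload["request_id"]

    if request_id in processed_ids:
        # Respond with 2xx so the delivery is not retried again.
        return {"status": "already processed"}

    # ... process the payload here ...
    # Record the ID only after processing succeeds, so a failed attempt can be retried.
    processed_ids.add(request_id)
    return {"status": "ok"}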
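For the database the comment above mentions, a unique constraint lets the database perform deduplication atomically, which a check-then-add on a Python set cannot guarantee across several workers. A minimal sketch using SQLite from the standard library; the table and function names are hypothetical.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # PRIMARY KEY makes the database ignore a second insert of the same request_id.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_webhooks (request_id TEXT PRIMARY KEY)"
    )
    conn.commit()


def claim_request(conn: sqlite3.Connection, request_id: str) -> bool:
    """Atomically record request_id; return False if it was already recorded."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_webhooks (request_id) VALUES (?)",
        (request_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means a duplicate


def release_request(conn: sqlite3.Connection, request_id: str) -> None:
    """Undo a claim when processing fails, so a retried delivery is handled."""
    conn.execute("DELETE FROM processed_webhooks WHERE request_id = ?", (request_id,))
    conn.commit()
```

In the handler, call `claim_request` after verifying the signature and return the "already processed" response when it returns `False`. If processing then raises, call `release_request` before returning an error.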
Do not log webhook secrets or full webhook payloads containing sensitive document data.
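An allowlist is a simple way to follow this rule: log only fields you have confirmed carry no document content. In this sketch only `request_id` is allowlisted, because it is the one payload field named on this page; the helper names are hypothetical.

```python
import logging

logger = logging.getLogger("webhooks")

# Extend only with fields you have confirmed contain no document data.
SAFE_FIELDS = {"request_id"}


def loggable(payload: dict) -> dict:
    """Keep only allowlisted fields so document content never reaches the logs."""
    return {k: v for k, v in payload.items() if k in SAFE_FIELDS}


def log_webhook(payload: dict) -> None:
    logger.info("webhook received: %s", loggable(payload))
```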

Data Handling

Results expiration

Conversion results are automatically deleted from Datalab servers one hour after processing completes. Retrieve and store results in your own infrastructure promptly.

You can control whether your documents are used to improve Datalab’s models. This is an opt-in setting configurable in your team settings. Teams that opt in receive discounted rates.
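If your pipeline queues result retrieval, it can help to track the one-hour window explicitly. This sketch assumes you record the completion time yourself (for example, when your webhook fires); the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Results are deleted one hour after processing completes.
RESULT_TTL = timedelta(hours=1)


def retrieval_deadline(completed_at: datetime) -> datetime:
    """Latest time the result can still be fetched from Datalab."""
    return completed_at + RESULT_TTL


def seconds_left(completed_at: datetime, now: Optional[datetime] = None) -> float:
    """Seconds remaining before deletion; 0.0 once the window has passed."""
    now = now if now is not None else datetime.now(timezone.utc)
    return max(0.0, (retrieval_deadline(completed_at) - now).total_seconds())
```

A scheduler could prioritize jobs with the least time left, or alert when a result is about to expire unfetched.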

Minimize data exposure

  • Only send documents that need to be processed — avoid sending unnecessary files
  • Use page_range to process only the pages you need rather than entire documents
  • Download and delete results as soon as they’re available
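If you select pages programmatically, a helper can collapse them into a compact range string for `page_range`. This sketch assumes `page_range` accepts comma-separated, zero-indexed page numbers and inclusive hyphenated ranges (for example `"0-2,5"`); that format is an assumption, so confirm it against the API reference before relying on it.

```python
from typing import Iterable


def to_page_range(pages: Iterable[int]) -> str:
    """Collapse page indices into a range string, e.g. [0, 1, 2, 5] -> "0-2,5"."""
    ordered = sorted(set(pages))
    if not ordered:
        raise ValueError("no pages selected")

    def fmt(start: int, end: int) -> str:
        return f"{start}-{end}" if end > start else str(start)

    parts = []
    start = prev = ordered[0]
    for page in ordered[1:]:
        if page == prev + 1:
            prev = page  # extend the current run
            continue
        parts.append(fmt(start, prev))  # close the run and start a new one
        start = prev = page
    parts.append(fmt(start, prev))
    return ",".join(parts)
```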

Network Security

For on-premises deployments

  • Place the Datalab container behind a reverse proxy with TLS termination
  • Restrict network access to the container’s port (8000) to trusted clients only
  • The on-premises container does not require API key authentication by default — implement authentication at the network or reverse proxy level
  • See On-Premises Overview for deployment details
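Because the container has no authentication of its own by default, something in front of it has to check credentials. The sketch below shows a bearer-token check you might run in a small gateway or behind a reverse proxy's auth subrequest hook (nginx's `auth_request`, for instance). The header format and token list are assumptions; the container itself does not define them.

```python
import hmac
from typing import Iterable, Optional


def is_authorized(auth_header: Optional[str], valid_tokens: Iterable[str]) -> bool:
    """Check an 'Authorization: Bearer <token>' header against known tokens."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    presented = auth_header[len("Bearer "):].encode()
    # Compare against every token in constant time instead of stopping at the first match.
    matched = False
    for token in valid_tokens:
        matched |= hmac.compare_digest(presented, token.encode())
    return matched
```

Load the valid tokens from the environment or a secret store, the same way as the API key above.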

IP restrictions

For additional security, consider restricting API access to known IP addresses using your infrastructure’s firewall or WAF rules.
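Firewall or WAF rules are the right place for this, but the same allowlist logic can also serve as an application-level backstop. A minimal sketch with the standard `ipaddress` module; the networks are placeholders (203.0.113.0/24 is a documentation range). Behind a proxy, take the client IP only from a header your proxy is configured to set.

```python
import ipaddress
from typing import Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Placeholder allowlist: replace with your own office or VPC ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def ip_allowed(client_ip: str, networks: Sequence[Network] = ALLOWED_NETWORKS) -> bool:
    """Return True if client_ip falls inside any allowed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed input is never allowed
    # An IPv6 address is simply not "in" an IPv4 network, so mixed versions return False.
    return any(addr in net for net in networks)
```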
