Follow these practices to keep your Datalab integration secure in production.
API Key Management
Store keys in environment variables
Never hardcode API keys in source code. Use environment variables:
```shell
export DATALAB_API_KEY="your-api-key"
```

```python
# The SDK reads DATALAB_API_KEY automatically
from datalab_sdk import DatalabClient

client = DatalabClient()  # Uses the environment variable
```
Never commit API keys to version control. Add .env files to your .gitignore.
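A small sketch of the fail-fast pattern this implies: read the key from the environment at startup and raise immediately if it is missing, rather than letting requests fail later with an opaque authentication error. The helper name is illustrative, not part of the SDK.

```python
import os


def get_api_key(var: str = "DATALAB_API_KEY") -> str:
    """Read the API key from the environment, failing fast if it is missing."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting the app")
    return key
```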
Use per-key spend limits
Create separate API keys for different environments and set spend limits on each:
- Development key — low spend limit for testing
- Staging key — moderate limit for integration testing
- Production key — appropriate limit for your expected usage
Manage keys at datalab.to/app/keys.
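One way to wire up per-environment keys is a naming convention in the environment itself, so each deployment only ever sees its own key. The variable names below (`APP_ENV`, `DATALAB_API_KEY_*`) are illustrative conventions, not part of the SDK:

```python
import os


def key_for_environment(env: str) -> str:
    """Look up the API key for a deployment environment, e.g.
    DATALAB_API_KEY_PRODUCTION. One key per environment keeps spend
    limits and revocation independent."""
    return os.environ[f"DATALAB_API_KEY_{env.upper()}"]
```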
Rotate keys regularly
If you suspect a key has been compromised:
- Create a new API key at datalab.to/app/keys
- Update your application to use the new key
- Revoke the old key
Create the new key before revoking the old one to avoid downtime.
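To make rotation truly zero-downtime, avoid caching the key in a long-lived variable at import time. A minimal sketch (the class is illustrative): re-read the environment on each access, so updating the variable is enough to pick up the new key without a restart.

```python
import os


class RotatableKey:
    """Re-read the API key from the environment on every access so a
    rotated key takes effect without restarting the process."""

    def __init__(self, var: str = "DATALAB_API_KEY"):
        self.var = var

    @property
    def value(self) -> str:
        key = os.environ.get(self.var)
        if not key:
            raise RuntimeError(f"{self.var} is not set")
        return key
```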
Webhook Security
Always use HTTPS
Configure your webhook endpoint to use HTTPS. Webhook payloads contain request data that should be encrypted in transit.
Verify webhook signatures
Always verify the webhook signature before processing the payload:
```python
import hashlib
import hmac
import os

from fastapi import FastAPI, HTTPException, Request

app = FastAPI()

# Load the secret from the environment — never hardcode it
WEBHOOK_SECRET = os.environ["DATALAB_WEBHOOK_SECRET"]


@app.post("/webhook")
async def handle_webhook(request: Request):
    body = await request.body()
    signature = request.headers.get("X-Webhook-Signature")
    expected = hmac.new(
        WEBHOOK_SECRET.encode(),
        body,
        hashlib.sha256,
    ).hexdigest()
    # Reject requests with a missing or mismatched signature;
    # compare_digest prevents timing attacks
    if signature is None or not hmac.compare_digest(signature, expected):
        raise HTTPException(status_code=401, detail="Invalid signature")
    # Process the webhook payload
    payload = await request.json()
    return {"status": "ok"}
```
Handle duplicate events
Webhook deliveries may be retried on 5xx errors or timeouts. Use the `request_id` field to deduplicate:
```python
from fastapi import FastAPI, Request

app = FastAPI()
processed_ids = set()  # Use a database in production


@app.post("/webhook")
async def handle_webhook(request: Request):
    payload = await request.json()
    request_id = payload["request_id"]
    if request_id in processed_ids:
        return {"status": "already processed"}
    processed_ids.add(request_id)
    # Process the payload
    return {"status": "ok"}
```
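For the "use a database in production" part, a sketch of durable deduplication using SQLite's primary-key constraint (any database with a unique constraint works the same way): inserting an already-seen ID fails atomically, so retried deliveries are skipped even across process restarts.

```python
import sqlite3


class WebhookDeduplicator:
    """Persist processed request IDs so retried deliveries are skipped,
    even after a restart (unlike an in-memory set)."""

    def __init__(self, path: str = "webhooks.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS processed (request_id TEXT PRIMARY KEY)"
        )

    def seen_before(self, request_id: str) -> bool:
        try:
            # The PRIMARY KEY constraint makes this an atomic check-and-mark
            with self.conn:
                self.conn.execute(
                    "INSERT INTO processed (request_id) VALUES (?)", (request_id,)
                )
            return False  # first delivery of this ID
        except sqlite3.IntegrityError:
            return True  # duplicate delivery
```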
Do not log webhook secrets or full webhook payloads containing sensitive document data.
Data Handling
Results expiration
Conversion results are automatically deleted from Datalab servers one hour after processing completes. Retrieve and store results in your own infrastructure promptly.
Data retention consent
You can control whether your documents are used to improve Datalab’s models. This is an opt-in setting configurable in your team settings. Teams that opt in receive discounted rates.
Minimize data exposure
- Only send documents that need to be processed — avoid sending unnecessary files
- Use `page_range` to process only the pages you need rather than entire documents
- Download and delete results as soon as they’re available
Network Security
For on-premises deployments
- Place the Datalab container behind a reverse proxy with TLS termination
- Restrict network access to the container’s port (8000) to trusted clients only
- The on-premises container does not require API key authentication by default — implement authentication at the network or reverse proxy level
- See On-Premises Overview for deployment details
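The first three points above can be sketched as a minimal nginx configuration. The hostname, certificate paths, and the `10.0.0.0/8` allowlist are placeholders for your environment; only the upstream port 8000 comes from the container's defaults noted above.

```nginx
server {
    listen 443 ssl;
    server_name datalab.internal.example.com;

    # TLS termination at the proxy
    ssl_certificate     /etc/nginx/tls/server.crt;
    ssl_certificate_key /etc/nginx/tls/server.key;

    # Network-level restriction to trusted clients only
    allow 10.0.0.0/8;
    deny  all;

    location / {
        # Forward to the Datalab container on its default port
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
    }
}
```

Keep port 8000 itself unreachable from outside the host (e.g. bind the container to 127.0.0.1) so all traffic passes through the proxy.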
IP restrictions
For additional security, consider restricting API access to known IP addresses using your infrastructure’s firewall or WAF rules.
Next Steps