The File Upload API allows you to upload files to Datalab’s storage and reference them in workflows and API requests. This is particularly useful for batch processing, workflow automation, and managing large document collections.

Overview

The File Upload API provides:
  • Direct R2 uploads - Upload files directly to cloud storage using presigned URLs
  • File management - List, retrieve metadata, and delete uploaded files
  • Workflow integration - Reference uploaded files in workflow executions
  • File references - Use datalab://file-{id} URLs to reference files across API calls

Upload Flow

The recommended upload flow uses presigned URLs for direct client-side uploads:
  1. Request upload URL - Get a presigned URL from /api/v1/files/upload
  2. Upload file - Upload directly to R2 using the presigned URL
  3. Confirm upload - Call /api/v1/files/{file_id}/confirm to verify and finalize
This approach is more efficient than uploading through the API server and supports larger files.

Request Upload URL

First, request a presigned upload URL:
import requests

url = "https://www.datalab.to/api/v1/files/upload"
headers = {"X-API-Key": "YOUR_API_KEY"}

payload = {
    "filename": "document.pdf",
    "content_type": "application/pdf"
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # Stop here on a 4xx/5xx response instead of parsing an error body
data = response.json()

print(data)
Response:
{
  "file_id": 123,
  "upload_url": "https://presigned-url-to-r2...",
  "expires_in": 3600,
  "reference": "datalab://file-abc123xyz"
}
Supported Content Types:
  • PDFs: application/pdf
  • Images: image/png, image/jpeg, image/webp, image/gif, image/tiff
  • Documents: application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document
  • Spreadsheets: application/vnd.ms-excel, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • HTML: text/html
  • And more - see Supported File Types
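When uploading many files, the content type can often be derived from the filename rather than hard-coded. The sketch below is one possible approach using Python's standard `mimetypes` module; the helper name `build_upload_payload` is hypothetical and not part of the Datalab API, and a guessed type is not guaranteed to be one the service accepts.

```python
import mimetypes
import os


def build_upload_payload(path):
    """Build the JSON body for /api/v1/files/upload from a local path.

    Hypothetical helper: the MIME type is guessed from the file extension,
    so it should still be checked against the supported content types.
    """
    content_type, _encoding = mimetypes.guess_type(path)
    if content_type is None:
        raise ValueError(f"Cannot guess a content type for {path!r}")
    return {
        "filename": os.path.basename(path),
        "content_type": content_type,
    }
```

The returned dictionary has the same shape as the `payload` in the request above, so it could be passed as `json=build_upload_payload("document.pdf")`.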

Upload to R2

Use the presigned URL to upload your file directly:
# Upload file using the presigned URL
with open("document.pdf", "rb") as f:
    upload_response = requests.put(
        data["upload_url"],
        data=f,
        headers={"Content-Type": "application/pdf"}
    )

if upload_response.status_code == 200:
    print("Upload successful!")
else:
    print(f"Upload failed: {upload_response.status_code}")
Important: The presigned URL expires after 1 hour. If you need more time, request a new URL.
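If uploads are queued or retried, it can help to track when a presigned URL stops being usable. The following is a minimal sketch that relies on the `expires_in` field shown in the response above (assumed to be in seconds); the class name `PresignedUpload` and the `safety_margin` parameter are illustrative assumptions, not part of the API.

```python
import time


class PresignedUpload:
    """Remember when a presigned upload URL should be considered expired."""

    def __init__(self, upload_data, clock=time.monotonic, safety_margin=60):
        self.file_id = upload_data["file_id"]
        self.upload_url = upload_data["upload_url"]
        self._clock = clock
        # Treat the URL as expired slightly early so an upload does not
        # start just before the real deadline.
        self._deadline = clock() + upload_data["expires_in"] - safety_margin

    def is_expired(self):
        return self._clock() >= self._deadline
```

A caller could check `is_expired()` before each upload attempt and request a new URL from `/api/v1/files/upload` when it returns `True`.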

Confirm Upload

After uploading, confirm the upload to verify the file and get the actual file size:
file_id = data["file_id"]
confirm_url = f"https://www.datalab.to/api/v1/files/{file_id}/confirm"

response = requests.get(confirm_url, headers=headers)
confirm_data = response.json()

print(confirm_data)
Response:
{
  "success": true,
  "file_id": 123,
  "reference": "datalab://file-abc123xyz",
  "message": "Upload confirmed successfully"
}
The file is now ready to use in workflows and API requests.

File Management

List Files

Get a paginated list of your uploaded files:
url = "https://www.datalab.to/api/v1/files"
params = {
    "limit": 50,
    "offset": 0
}

response = requests.get(url, params=params, headers=headers)
files = response.json()
Response:
{
  "files": [
    {
      "file_id": 123,
      "original_filename": "document.pdf",
      "content_type": "application/pdf",
      "file_size": 1048576,
      "upload_status": "completed",
      "created": "2025-12-08T10:00:00Z",
      "reference": "datalab://file-abc123xyz"
    }
  ],
  "total": 150,
  "limit": 50,
  "offset": 0
}
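To walk through every file rather than a single page, the `limit`, `offset`, and `total` fields can drive a simple loop. This is a sketch under the assumption that the response always has the shape shown above; `iter_all_files` is a hypothetical helper, and `session` is expected to be a `requests.Session()` (or the `requests` module itself).

```python
FILES_URL = "https://www.datalab.to/api/v1/files"


def iter_all_files(session, headers, limit=50):
    """Yield every file entry, requesting one page at a time."""
    offset = 0
    while True:
        response = session.get(
            FILES_URL,
            params={"limit": limit, "offset": offset},
            headers=headers,
            timeout=30,
        )
        response.raise_for_status()
        page = response.json()
        files = page["files"]
        yield from files
        offset += len(files)
        # Stop on an empty page or once every reported file has been seen.
        if not files or offset >= page["total"]:
            break
```

For example, `for entry in iter_all_files(requests.Session(), headers): print(entry["reference"])` would print the reference of each uploaded file.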

Get File Metadata

Retrieve metadata for a specific file:
file_id = 123
url = f"https://www.datalab.to/api/v1/files/{file_id}"

response = requests.get(url, headers=headers)
metadata = response.json()

Download File

Generate a presigned download URL:
file_id = 123
url = f"https://www.datalab.to/api/v1/files/{file_id}/download"
params = {"expires_in": 3600}  # URL valid for 1 hour

response = requests.get(url, params=params, headers=headers)
download_data = response.json()

# Use the download URL
download_url = download_data["download_url"]
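Once a download URL has been obtained, the file itself still has to be fetched. The sketch below streams it to disk in chunks so large files are not held in memory. It assumes the presigned URL needs no API key (typical for presigned URLs, but not confirmed by this page); `download_file` is a hypothetical helper and `session` is expected to be a `requests.Session()`.

```python
def download_file(session, headers, file_id, dest_path, expires_in=3600):
    """Request a presigned download URL, then stream the file to dest_path."""
    meta = session.get(
        f"https://www.datalab.to/api/v1/files/{file_id}/download",
        params={"expires_in": expires_in},
        headers=headers,
        timeout=30,
    )
    meta.raise_for_status()
    download_url = meta.json()["download_url"]

    # Assumption: the presigned URL carries its own authorization,
    # so the API key header is deliberately not sent to storage.
    with session.get(download_url, stream=True, timeout=300) as response:
        response.raise_for_status()
        with open(dest_path, "wb") as out:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                out.write(chunk)
    return dest_path
```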

Delete File

Remove a file from storage:
file_id = 123
url = f"https://www.datalab.to/api/v1/files/{file_id}"

response = requests.delete(url, headers=headers)
result = response.json()
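For routine cleanup, deletion can be combined with the `created` timestamps returned by the list endpoint. The following sketch only selects candidates and leaves the actual delete call to the caller. It assumes `created` is always an ISO 8601 UTC timestamp like the one in the list response; `select_stale_files` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def select_stale_files(files, max_age_days, now=None):
    """Return the file_id of every entry created more than max_age_days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for entry in files:
        # Replace the "Z" suffix: datetime.fromisoformat only accepts it
        # directly on Python 3.11 and later.
        created = datetime.fromisoformat(entry["created"].replace("Z", "+00:00"))
        if created < cutoff:
            stale.append(entry["file_id"])
    return stale
```

Each returned ID could then be passed to the delete request shown above.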

Using Files in Workflows

Once uploaded, you can reference files in workflow executions using the datalab://file-{id} format:
# Execute workflow with uploaded file
workflow_url = "https://www.datalab.to/api/v1/workflows/456/execute"

payload = {
    "input_config": {
        "type": "single_file",
        "file_url": "datalab://file-abc123xyz"
    }
}

response = requests.post(workflow_url, json=payload, headers=headers)
For multiple files:
payload = {
    "input_config": {
        "type": "file_list",
        "file_urls": [
            "datalab://file-abc123xyz",
            "datalab://file-def456uvw",
            "datalab://file-ghi789rst"
        ]
    }
}
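The two payload shapes above can be produced by one small function that picks the type from the number of references. This is an illustrative sketch; `build_input_config` is a hypothetical name, and whether the API also accepts a one-item `file_list` is not stated here, so a single reference is mapped to `single_file`.

```python
def build_input_config(references):
    """Return an input_config dict for one or more datalab:// references."""
    if isinstance(references, str):
        references = [references]
    refs = list(references)
    if not refs:
        raise ValueError("At least one file reference is required")
    for ref in refs:
        # Only the datalab://file-{id} form described in this section is handled.
        if not ref.startswith("datalab://file-"):
            raise ValueError(f"Not a datalab file reference: {ref!r}")
    if len(refs) == 1:
        return {"type": "single_file", "file_url": refs[0]}
    return {"type": "file_list", "file_urls": refs}
```

The result would be used as `payload = {"input_config": build_input_config(refs)}`.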
You can also use file references in the Marker API:
url = "https://www.datalab.to/api/v1/marker"

form_data = {
    "file_url": (None, "datalab://file-abc123xyz"),
    "output_format": (None, "markdown"),
    "use_llm": (None, True)
}

response = requests.post(url, files=form_data, headers=headers)

Complete Example

Here’s a complete example that uploads a file and uses it in a workflow:
import requests
import time

API_KEY = "YOUR_API_KEY"
headers = {"X-API-Key": API_KEY}

# Step 1: Request upload URL
upload_request = {
    "filename": "invoice.pdf",
    "content_type": "application/pdf"
}

response = requests.post(
    "https://www.datalab.to/api/v1/files/upload",
    json=upload_request,
    headers=headers
)
upload_data = response.json()

file_id = upload_data["file_id"]
upload_url = upload_data["upload_url"]
file_reference = upload_data["reference"]

# Step 2: Upload file to R2
with open("invoice.pdf", "rb") as f:
    upload_response = requests.put(
        upload_url,
        data=f,
        headers={"Content-Type": "application/pdf"}
    )

if upload_response.status_code != 200:
    raise Exception(f"Upload failed: {upload_response.status_code}")

# Step 3: Confirm upload
confirm_response = requests.get(
    f"https://www.datalab.to/api/v1/files/{file_id}/confirm",
    headers=headers
)
confirm_data = confirm_response.json()

if not confirm_data.get("success"):
    raise Exception(f"Upload confirmation failed: {confirm_data}")

print(f"File uploaded successfully: {file_reference}")

# Step 4: Use file in workflow
workflow_response = requests.post(
    "https://www.datalab.to/api/v1/workflows/123/execute",
    json={
        "input_config": {
            "type": "single_file",
            "file_url": file_reference
        }
    },
    headers=headers
)

execution_data = workflow_response.json()
execution_id = execution_data["execution_id"]

# Step 5: Poll for workflow completion
max_polls = 300
for i in range(max_polls):
    time.sleep(5)

    status_response = requests.get(
        f"https://www.datalab.to/api/v1/workflows/executions/{execution_id}",
        headers=headers
    )
    status_data = status_response.json()

    if status_data["status"] == "COMPLETED":
        print("Workflow completed successfully!")
        print(status_data["steps"])
        break
    elif status_data["status"] == "FAILED":
        print("Workflow failed")
        break
else:
    # The loop used all 300 polls (about 25 minutes) without a final status
    print("Timed out waiting for workflow to finish")
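The polling step above can also be factored into a reusable helper that raises on failure or timeout instead of printing. This is a sketch, not an official client: `wait_for_execution` is a hypothetical name, and it only relies on the `COMPLETED` and `FAILED` status values used in the example. Any other status is treated as "still running".

```python
import time


def wait_for_execution(fetch_status, poll_interval=5, max_polls=300, sleep=time.sleep):
    """Call fetch_status() until the workflow completes, fails, or times out.

    fetch_status is any callable returning the parsed JSON of the
    execution status endpoint.
    """
    for _ in range(max_polls):
        status_data = fetch_status()
        status = status_data["status"]
        if status == "COMPLETED":
            return status_data
        if status == "FAILED":
            raise RuntimeError("Workflow failed")
        sleep(poll_interval)
    raise TimeoutError(f"Workflow did not finish after {max_polls} polls")
```

With `requests`, `fetch_status` could be `lambda: requests.get(status_url, headers=headers).json()`, where `status_url` is the execution URL from Step 5.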

Best Practices

  1. Always confirm uploads - Call the confirm endpoint to verify files were uploaded successfully, or use our SDK to streamline this
  2. Handle errors gracefully - Check upload status and handle failures appropriately
  3. Clean up unused files - Delete files you no longer need to manage storage
  4. Use file references - Reference uploaded files by their datalab://file-{id} URL for consistency
  5. Set appropriate expiry times - Use shorter expiry times for download URLs when possible

Limits

  • Maximum file size: 300 MB
  • Upload URL expiry: 1 hour
  • Download URL expiry: 1 minute to 24 hours (configurable)
For larger files or custom requirements, contact [email protected].
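These limits can be checked on the client before any request is made. The sketch below is illustrative: the helper names are hypothetical, and because this page does not say whether "300 MB" means 10^6 or 2^20 bytes per MB, the smaller decimal reading is assumed.

```python
import os

# Limits quoted above. The decimal interpretation of "MB" is an assumption.
MAX_FILE_SIZE_BYTES = 300 * 1000 * 1000
MIN_DOWNLOAD_EXPIRY = 60            # 1 minute, in seconds
MAX_DOWNLOAD_EXPIRY = 24 * 60 * 60  # 24 hours, in seconds


def check_upload_size(path):
    """Return the file size in bytes, or raise if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > MAX_FILE_SIZE_BYTES:
        raise ValueError(f"{path!r} is {size} bytes, above the 300 MB limit")
    return size


def clamp_download_expiry(seconds):
    """Clamp a requested expires_in value into the documented range."""
    return max(MIN_DOWNLOAD_EXPIRY, min(int(seconds), MAX_DOWNLOAD_EXPIRY))
```

Calling `check_upload_size` before requesting an upload URL avoids starting a transfer that would be rejected, and `clamp_download_expiry` keeps `expires_in` within the configurable range.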