Overview

Datalab provides file storage for documents you want to process with workflows or reuse across multiple API calls. Uploaded files get a reference URL (datalab://file-xxx) that you can use in workflows.

Upload Files

Upload one or more files to Datalab storage:
from datalab_sdk import DatalabClient

client = DatalabClient()

# Upload a single file
file = client.upload_files("document.pdf")
print(f"Uploaded: {file.original_filename}")
print(f"Reference: {file.reference}")  # datalab://file-abc123

# Upload multiple files
files = client.upload_files(["doc1.pdf", "doc2.pdf", "doc3.pdf"])
for f in files:
    print(f"{f.original_filename}: {f.reference}")

Upload Result

The UploadedFileMetadata object contains:
| Field | Type | Description |
|---|---|---|
| file_id | int | Unique file ID |
| original_filename | str | Original filename |
| content_type | str | MIME type |
| reference | str | Datalab reference URL (datalab://file-xxx) |
| upload_status | str | Upload status: "pending", "completed", or "failed" |
| file_size | int | File size in bytes |
| created | str | Upload timestamp |
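Because upload_status can take three values, it may help to sort a batch of results by status before using their references. The following is a minimal sketch, not part of the SDK: `group_by_status` is a hypothetical helper that relies only on the `upload_status` attribute listed above.

```python
from collections import defaultdict


def group_by_status(files):
    """Group file metadata objects by their upload_status attribute.

    Hypothetical helper (not part of datalab_sdk): it only assumes each
    object exposes `upload_status`, as listed in the table above.
    """
    groups = defaultdict(list)
    for f in files:
        groups[f.upload_status].append(f)
    # Return a plain dict so a missing status raises KeyError instead of
    # silently creating an empty list.
    return dict(groups)
```

For example, `group_by_status(files).get("failed", [])` would give the files worth re-uploading, and the "completed" group holds the files whose references are ready to use.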

List Files

List all uploaded files with pagination:
# List first 50 files
result = client.list_files(limit=50, offset=0)

print(f"Total files: {result['total']}")
for file in result['files']:
    print(f"  {file.original_filename} ({file.file_size} bytes)")
    print(f"    Reference: {file.reference}")
    print(f"    Status: {file.upload_status}")

Pagination

# Page through all files
offset = 0
limit = 50

while True:
    result = client.list_files(limit=limit, offset=offset)

    for file in result['files']:
        print(file.original_filename)

    if offset + limit >= result['total']:
        break

    offset += limit
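The same loop can be wrapped in a generator so that callers iterate over files without tracking offsets. This is a sketch under one assumption: `iter_files` is a hypothetical helper, and it expects `list_files` to return the `'files'` and `'total'` keys shown above.

```python
def iter_files(client, page_size=50):
    """Yield every uploaded file, fetching one page per request.

    Hypothetical helper (not part of datalab_sdk). `client` is any object
    whose list_files(limit=..., offset=...) returns a mapping with
    'files' and 'total', as in the examples above.
    """
    offset = 0
    while True:
        result = client.list_files(limit=page_size, offset=offset)
        yield from result["files"]
        offset += page_size
        # Stop once the next offset would start past the last file.
        if offset >= result["total"]:
            break
```

Usage would look like `for file in iter_files(client): print(file.original_filename)`. Pages are fetched lazily, so stopping the loop early also stops further requests.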

Get File Metadata

Get details for a specific file:
# By file ID (integer)
file = client.get_file_metadata(123)

# By hashid (string from reference URL)
file = client.get_file_metadata("abc123")

print(f"Filename: {file.original_filename}")
print(f"Size: {file.file_size} bytes")
print(f"Type: {file.content_type}")
print(f"Created: {file.created}")
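If you only have a stored reference URL, the hashid has to be extracted before calling get_file_metadata. The sketch below assumes that the hashid is everything after the `datalab://file-` prefix. That is inferred from the examples on this page (`datalab://file-abc123` and `"abc123"`), not from a documented format, and `hashid_from_reference` is a hypothetical helper.

```python
REFERENCE_PREFIX = "datalab://file-"  # assumed from the examples on this page


def hashid_from_reference(reference):
    """Return the hashid portion of a Datalab reference URL.

    Hypothetical helper: assumes references look like
    'datalab://file-<hashid>'.
    """
    if not reference.startswith(REFERENCE_PREFIX):
        raise ValueError(f"Not a Datalab file reference: {reference!r}")
    hashid = reference[len(REFERENCE_PREFIX):]
    if not hashid:
        raise ValueError("Reference has an empty hashid")
    return hashid
```

With this helper, `client.get_file_metadata(hashid_from_reference(file.reference))` would look up a file from its reference.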

Get Download URL

Generate a presigned URL to download a file:
result = client.get_file_download_url(
    file_id=123,
    expires_in=3600  # URL valid for 1 hour (default)
)

print(f"Download URL: {result['download_url']}")
print(f"Expires in: {result['expires_in']} seconds")

# Download the file
import requests
response = requests.get(result['download_url'])
response.raise_for_status()  # raise on an HTTP error instead of saving an error page
with open("downloaded.pdf", "wb") as f:
    f.write(response.content)
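The example above reads the whole response into memory before writing it, which can be costly for large files. As an alternative, here is a minimal sketch that streams the body to disk in chunks. It uses only the standard library (`urllib.request`) instead of requests, and `download_to_path` is a hypothetical helper name.

```python
import urllib.request


def download_to_path(url, dest_path, chunk_size=64 * 1024, timeout=60):
    """Stream the body at `url` into `dest_path`; return bytes written.

    Hypothetical helper using the standard library. For HTTP(S) URLs,
    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    written = 0
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(dest_path, "wb") as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:  # empty bytes means end of stream
                    break
                out.write(chunk)
                written += len(chunk)
    return written
```

It would be called as `download_to_path(result['download_url'], "downloaded.pdf")`, and memory use stays bounded by `chunk_size` regardless of file size.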

Expiration Options

The expires_in parameter accepts values from 60 to 86400 seconds (1 minute to 24 hours):
# Short-lived URL (1 minute)
result = client.get_file_download_url(file_id=123, expires_in=60)

# Long-lived URL (24 hours)
result = client.get_file_download_url(file_id=123, expires_in=86400)
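This page does not say how the API responds to values outside that range, so when expires_in comes from user input or configuration it may be safer to clamp it client-side first. A small sketch, with `clamp_expires_in` as a hypothetical helper:

```python
MIN_EXPIRES_IN = 60      # 1 minute, the documented lower bound
MAX_EXPIRES_IN = 86400   # 24 hours, the documented upper bound


def clamp_expires_in(seconds):
    """Clamp a requested lifetime into the documented 60..86400 range.

    Hypothetical helper (not part of datalab_sdk).
    """
    return max(MIN_EXPIRES_IN, min(MAX_EXPIRES_IN, int(seconds)))
```

The result can be passed straight through, for example `expires_in=clamp_expires_in(requested_seconds)`.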

Delete File

Delete an uploaded file:
result = client.delete_file(123)

if result['success']:
    print(f"Deleted: {result['message']}")

Using Files in Workflows

File references can be used in workflow inputs:
from datalab_sdk import DatalabClient, InputConfig

client = DatalabClient()

# Upload files
files = client.upload_files(["invoice1.pdf", "invoice2.pdf"])
references = [f.reference for f in files]
# ['datalab://file-abc123', 'datalab://file-def456']

# Use in workflow
input_config = InputConfig(file_urls=references)

execution = client.execute_workflow(
    workflow_id=42,
    input_config=input_config
)

See Workflows for more details.

Async Usage

import asyncio
from datalab_sdk import AsyncDatalabClient

async def manage_files():
    async with AsyncDatalabClient() as client:
        # Upload
        files = await client.upload_files(["doc.pdf"])

        # List
        result = await client.list_files(limit=10)

        # Get metadata
        file = await client.get_file_metadata(files[0].file_id)

        # Download URL
        url = await client.get_file_download_url(files[0].file_id)

        # Delete
        await client.delete_file(files[0].file_id)

asyncio.run(manage_files())
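One reason to use the async client is to run several requests at once. The sketch below fetches metadata for many files concurrently while capping the number of in-flight calls. It assumes that a single AsyncDatalabClient can serve overlapping calls, which this page does not confirm. `fetch_metadata_many` and `max_concurrency` are hypothetical names.

```python
import asyncio


async def fetch_metadata_many(client, file_ids, max_concurrency=5):
    """Fetch metadata for several files concurrently, in input order.

    Hypothetical helper: `client` is any object with an awaitable
    get_file_metadata(file_id), such as the async client shown above.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(file_id):
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await client.get_file_metadata(file_id)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(fetch_one(fid) for fid in file_ids))
```

Inside `manage_files`, this could be called as `metadata = await fetch_metadata_many(client, [f.file_id for f in files])`.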

Example: Batch Upload and Process

from datalab_sdk import DatalabClient
from pathlib import Path

client = DatalabClient()

# Find all PDFs in a directory
pdf_files = list(Path("./documents").glob("*.pdf"))

# Upload all files
uploaded = client.upload_files([str(p) for p in pdf_files])

print(f"Uploaded {len(uploaded)} files:")
for file in uploaded:
    print(f"  {file.original_filename}: {file.reference}")

# Store references for later use
references = {f.original_filename: f.reference for f in uploaded}
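The `references` dictionary only lives as long as the process does. To reuse the references in a later run, one option is to persist the mapping as JSON. This is a minimal sketch using only the standard library, and the helper names and file path are hypothetical.

```python
import json
from pathlib import Path


def save_references(references, path):
    """Write a {filename: reference} mapping to a JSON file."""
    text = json.dumps(references, indent=2, sort_keys=True)
    Path(path).write_text(text, encoding="utf-8")


def load_references(path):
    """Read back a mapping written by save_references."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

After the upload, `save_references(references, "references.json")` would store the mapping, and a later script could call `load_references("references.json")` to build workflow inputs without re-uploading.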

Supported File Types

See Supported File Types for a complete list of supported formats.
