
# File Management

> Upload, list, and manage files in Datalab storage using the SDK.

## Overview

Datalab provides file storage for documents you want to process with pipelines or reuse across multiple API calls. Each uploaded file receives a reference URL (`datalab://file-xxx`) that you can pass to a pipeline as input.

## Upload Files

Upload one or more files to Datalab storage:

```python theme={null}
from datalab_sdk import DatalabClient

client = DatalabClient()

# Upload a single file
file = client.upload_files("document.pdf")
print(f"Uploaded: {file.original_filename}")
print(f"Reference: {file.reference}")  # datalab://file-abc123

# Upload multiple files
files = client.upload_files(["doc1.pdf", "doc2.pdf", "doc3.pdf"])
for f in files:
    print(f"{f.original_filename}: {f.reference}")
```

### Upload Result

`upload_files` returns one `UploadedFileMetadata` object for a single path, or a list of them for a list of paths. Each object contains:

| Field               | Type | Description                                    |
| ------------------- | ---- | ---------------------------------------------- |
| `file_id`           | int  | Unique file ID                                 |
| `original_filename` | str  | Original filename                              |
| `content_type`      | str  | MIME type                                      |
| `reference`         | str  | Datalab reference URL (`datalab://file-xxx`)   |
| `upload_status`     | str  | Status: `"pending"`, `"completed"`, `"failed"` |
| `file_size`         | int  | File size in bytes                             |
| `created`           | str  | Upload timestamp                               |

## List Files

List uploaded files one page at a time. The result is a dictionary holding the current page of `files` and the `total` file count:

```python theme={null}
# List first 50 files
result = client.list_files(limit=50, offset=0)

print(f"Total files: {result['total']}")
for file in result['files']:
    print(f"  {file.original_filename} ({file.file_size} bytes)")
    print(f"    Reference: {file.reference}")
    print(f"    Status: {file.upload_status}")
```

### Pagination

To visit every file, advance `offset` by `limit` until it reaches `total`:

```python theme={null}
# Page through all files
offset = 0
limit = 50

while True:
    result = client.list_files(limit=limit, offset=offset)

    for file in result['files']:
        print(file.original_filename)

    if offset + limit >= result['total']:
        break

    offset += limit
```
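
The same loop can be wrapped in a generator so callers iterate over files without tracking offsets. This is a minimal sketch, assuming only the `list_files(limit=..., offset=...)` call and the `files` / `total` keys shown above; `iter_files` and `page_size` are hypothetical names, not SDK functions.

```python
from typing import Any, Iterator


def iter_files(client: Any, page_size: int = 50) -> Iterator[Any]:
    """Yield every file by requesting successive pages from `client.list_files`."""
    offset = 0
    while True:
        result = client.list_files(limit=page_size, offset=offset)
        files = result["files"]
        yield from files
        offset += page_size
        # Stop on an empty page or once the offset has passed the reported total.
        if not files or offset >= result["total"]:
            break
```

Usage would look like `for file in iter_files(client): print(file.original_filename)`. The empty-page check guards against looping forever if `total` changes while paging.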

## Get File Metadata

Get details for a specific file:

```python theme={null}
# By file ID (integer)
file = client.get_file_metadata(123)

# By hashid (string from reference URL)
file = client.get_file_metadata("abc123")

print(f"Filename: {file.original_filename}")
print(f"Size: {file.file_size} bytes")
print(f"Type: {file.content_type}")
print(f"Created: {file.created}")
```
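
If you only have a stored reference URL, the hashid can be recovered from it. The helper below is a sketch that assumes the hashid is everything after the `datalab://file-` prefix, which is inferred from the `datalab://file-abc123` and `"abc123"` examples on this page rather than from a documented guarantee. `hashid_from_reference` is a hypothetical name.

```python
REFERENCE_PREFIX = "datalab://file-"  # assumed from the examples on this page


def hashid_from_reference(reference: str) -> str:
    """Return the part of a reference URL that follows `datalab://file-`."""
    if not reference.startswith(REFERENCE_PREFIX):
        raise ValueError(f"Not a Datalab file reference: {reference!r}")
    hashid = reference[len(REFERENCE_PREFIX):]
    if not hashid:
        raise ValueError("Reference has no hashid after the prefix")
    return hashid
```

The result could then be passed to `client.get_file_metadata(...)` in place of an integer ID.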

## Get Download URL

Generate a presigned URL to download a file:

```python theme={null}
result = client.get_file_download_url(
    file_id=123,
    expires_in=3600  # URL valid for 1 hour (default)
)

print(f"Download URL: {result['download_url']}")
print(f"Expires in: {result['expires_in']} seconds")

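# Download the file (requires the third-party `requests` package)
import requests

response = requests.get(result['download_url'], timeout=30)
response.raise_for_status()  # fail loudly instead of saving an error response
with open("downloaded.pdf", "wb") as f:
    f.write(response.content)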
```
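
The example above reads the whole response into memory before writing it. For large files, a streamed copy may be preferable. The sketch below uses only the Python standard library and assumes the presigned URL can be fetched with a plain GET request; `download_to_file` is a hypothetical helper, not an SDK method.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Union


def download_to_file(url: str, destination: Union[str, Path], timeout: float = 30.0) -> int:
    """Stream the content at `url` to `destination` and return the bytes written."""
    destination = Path(destination)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(destination, "wb") as out:
            # copyfileobj copies in fixed-size chunks, so memory use stays bounded.
            shutil.copyfileobj(response, out)
    return destination.stat().st_size
```

It would be called as `download_to_file(result['download_url'], "downloaded.pdf")`.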

### Expiration Options

The `expires_in` parameter accepts values from 60 to 86400 seconds (1 minute to 24 hours):

```python theme={null}
file_id = 123  # ID of a previously uploaded file

# Short-lived URL (1 minute)
result = client.get_file_download_url(file_id, expires_in=60)

# Long-lived URL (24 hours)
result = client.get_file_download_url(file_id, expires_in=86400)
```
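
When the lifetime comes from configuration or user input, it can help to bring it into the documented range before making the call. This page does not say how the API responds to out-of-range values, so the sketch below simply clamps them; `clamp_expires_in` and the two constants are hypothetical names.

```python
MIN_EXPIRES_IN = 60       # 1 minute, the documented lower bound
MAX_EXPIRES_IN = 86_400   # 24 hours, the documented upper bound


def clamp_expires_in(seconds: int) -> int:
    """Clamp a requested URL lifetime to the 60..86400 second range."""
    return max(MIN_EXPIRES_IN, min(MAX_EXPIRES_IN, seconds))
```

For example, `client.get_file_download_url(file_id, expires_in=clamp_expires_in(requested))`.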

## Delete File

Delete an uploaded file:

```python theme={null}
result = client.delete_file(123)

if result['success']:
    print(f"Deleted: {result['message']}")
```
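
To clean up several files at once, you can loop over `delete_file` and collect the outcomes. The sketch below assumes the result is a dictionary with the `success` and `message` keys shown above, and that a failed request may raise an exception; `delete_files` is a hypothetical helper, not an SDK method.

```python
from typing import Any, Dict, Iterable, List, Tuple


def delete_files(client: Any, file_ids: Iterable[Any]) -> Tuple[List[Any], Dict[Any, str]]:
    """Delete each file and return (deleted_ids, {failed_id: reason})."""
    deleted: List[Any] = []
    failed: Dict[Any, str] = {}
    for file_id in file_ids:
        try:
            result = client.delete_file(file_id)
        except Exception as exc:  # keep going so one bad ID does not stop the batch
            failed[file_id] = str(exc)
            continue
        if result.get("success"):
            deleted.append(file_id)
        else:
            failed[file_id] = result.get("message", "unknown error")
    return deleted, failed
```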

## Using Files in Pipelines

Pass a file's `reference` as the `file_url` argument when running a pipeline:

```python theme={null}
from datalab_sdk import DatalabClient

client = DatalabClient()

# Upload files
files = client.upload_files(["invoice1.pdf", "invoice2.pdf"])

# Run pipeline on each uploaded file
for f in files:
    execution = client.run_pipeline(
        "pl_abc123",
        file_url=f.reference  # e.g., 'datalab://file-abc123'
    )
    print(f"{f.original_filename}: {execution.execution_id}")
```

See [Pipelines](/docs/recipes/pipelines/pipeline-overview) for more details.

## Async Usage

`AsyncDatalabClient` provides the same file methods as coroutines:

```python theme={null}
import asyncio
from datalab_sdk import AsyncDatalabClient

async def manage_files():
    async with AsyncDatalabClient() as client:
        # Upload
        files = await client.upload_files(["doc.pdf"])

        # List
        result = await client.list_files(limit=10)

        # Get metadata
        file = await client.get_file_metadata(files[0].file_id)

        # Download URL
        url = await client.get_file_download_url(files[0].file_id)

        # Delete
        await client.delete_file(files[0].file_id)

asyncio.run(manage_files())
```
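
The async client also makes it possible to issue several requests concurrently. The sketch below fetches metadata for many files with a cap on in-flight requests. It assumes only the awaitable `get_file_metadata` call shown above; `fetch_metadata` and `max_concurrency` are hypothetical names, and the default of 5 is arbitrary rather than a documented limit.

```python
import asyncio
from typing import Any, Iterable, List


async def fetch_metadata(client: Any, file_ids: Iterable[Any], max_concurrency: int = 5) -> List[Any]:
    """Fetch metadata for all IDs concurrently, returning results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(file_id: Any) -> Any:
        # The semaphore keeps at most `max_concurrency` requests in flight.
        async with semaphore:
            return await client.get_file_metadata(file_id)

    return await asyncio.gather(*(fetch_one(file_id) for file_id in file_ids))
```

Inside `manage_files` this would be used as `metas = await fetch_metadata(client, [f.file_id for f in files])`.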

## Example: Batch Upload and Process

```python theme={null}
from datalab_sdk import DatalabClient
from pathlib import Path

client = DatalabClient()

# Find all PDFs in a directory
pdf_files = list(Path("./documents").glob("*.pdf"))

# Upload all files
uploaded = client.upload_files([str(p) for p in pdf_files])

print(f"Uploaded {len(uploaded)} files:")
for file in uploaded:
    print(f"  {file.original_filename}: {file.reference}")

# Store references for later use
references = {f.original_filename: f.reference for f in uploaded}
```
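
For very large directories, you may prefer to upload in smaller groups so that a failure affects only one group. This page does not document a per-call file limit, so the batch size of 20 below is an arbitrary assumption, and `chunked` and `upload_in_batches` are hypothetical helpers. The sketch relies on `upload_files` returning a list when given a list, as in the examples above.

```python
from typing import Any, Iterator, List, Sequence


def chunked(items: Sequence[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def upload_in_batches(client: Any, paths: Sequence[str], batch_size: int = 20) -> List[Any]:
    """Upload `paths` in groups of `batch_size` and return all results in order."""
    uploaded: List[Any] = []
    for batch in chunked(paths, batch_size):
        uploaded.extend(client.upload_files(batch))
    return uploaded
```

With the example above, this would replace the single call: `uploaded = upload_in_batches(client, [str(p) for p in pdf_files])`.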

## Supported File Types

See [Supported File Types](/docs/common/supportedfiletypes) for a complete list of supported formats.

## Next Steps

<CardGroup cols={2}>
  <Card title="File Upload Recipe" icon="cloud-arrow-up" href="/docs/recipes/file-management/file-upload-api">
    Step-by-step guide for uploading and managing files via the API.
  </Card>

  <Card title="Pipelines" icon="workflow" href="/docs/recipes/pipelines/pipeline-overview">
    Chain processors into versioned, reusable pipelines.
  </Card>

  <Card title="Conversion SDK" icon="file-export" href="/docs/welcome/sdk/conversion">
    Convert documents to Markdown, HTML, JSON, or chunks.
  </Card>

  <Card title="API Limits" icon="gauge-high" href="/docs/common/limits">
    Understand rate limits and file size constraints.
  </Card>
</CardGroup>
