# Form Filling

> Fill PDF and image forms with structured field data using the Datalab SDK.

## Overview

The form filling API lets you programmatically fill forms in PDFs and images. It works with both:

* **Native PDF forms** - PDFs with embedded, fillable form fields
* **Image-based forms** - Scanned forms or images with visual form layouts

The API matches your field data to the form's fields and returns the filled form as a PDF or, for image input, a PNG.

## Basic Usage

```python
from datalab_sdk import DatalabClient, FormFillingOptions

client = DatalabClient()

options = FormFillingOptions(
    field_data={
        "full_name": {"value": "John Doe", "description": "Full legal name"},
        "date_of_birth": {"value": "1990-01-15", "description": "Date of birth"},
        "address": {"value": "123 Main St, City, ST 12345", "description": "Mailing address"},
    }
)

result = client.fill("form.pdf", options=options)
result.save_output("filled_form.pdf")
```

## Form Filling Options

| Option                 | Type  | Default  | Description                                                                          |
| ---------------------- | ----- | -------- | ------------------------------------------------------------------------------------ |
| `field_data`           | dict  | Required | Field names mapped to values and descriptions                                        |
| `context`              | str   | None     | Additional context to help match fields                                              |
| `confidence_threshold` | float | `0.5`    | Minimum confidence for field matching (0.0-1.0)                                      |
| `max_pages`            | int   | None     | Maximum pages to process                                                             |
| `page_range`           | str   | None     | Specific pages to process (e.g., `"0-2"`). For spreadsheets, filters by sheet index. |
| `skip_cache`           | bool  | `False`  | Skip cached results                                                                  |

### Field Data Format

Each entry in `field_data` maps a key of your choosing to a dictionary with a `value` and a `description`:

```python
field_data = {
    "field_key": {
        "value": "The value to fill",
        "description": "Description to help match the field"
    }
}
```

The `description` helps the API match your field key to the actual form field, especially when field names in the PDF don't match your data structure.
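When your data lives in a flat record, the nested shape above can be built with a small helper. The sketch below is one way to do it. `build_field_data` is a hypothetical helper, not part of the SDK, and converting values with `str()` assumes the API expects string values, as in every example on this page.

```python
from typing import Mapping


def build_field_data(
    values: Mapping[str, object],
    descriptions: Mapping[str, str],
) -> dict[str, dict[str, str]]:
    """Combine flat values and per-key descriptions into the field_data shape."""
    field_data = {}
    for key, value in values.items():
        if value is None:
            continue  # Skip missing values so no empty field is sent
        field_data[key] = {
            # Assumption: the API expects string values
            "value": str(value),
            # Fall back to a readable form of the key when no description is given
            "description": descriptions.get(key, key.replace("_", " ")),
        }
    return field_data


record = {"full_name": "John Doe", "date_of_birth": "1990-01-15", "middle_name": None}
descriptions = {"full_name": "Full legal name", "date_of_birth": "Date of birth"}

field_data = build_field_data(record, descriptions)
```

The resulting `field_data` can be passed directly to `FormFillingOptions(field_data=field_data)`.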

### Example with Multiple Field Types

```python
options = FormFillingOptions(
    field_data={
        # Text fields
        "name": {"value": "Jane Smith", "description": "Full name"},
        "email": {"value": "jane@example.com", "description": "Email address"},

        # Date fields
        "date": {"value": "2024-01-15", "description": "Today's date"},

        # Numeric fields
        "amount": {"value": "1500.00", "description": "Total amount"},

        # Checkbox (use a descriptive value such as "Yes")
        "agree_terms": {"value": "Yes", "description": "Agreement checkbox"},

        # Signature (the value is rendered as text)
        "signature": {"value": "Jane Smith", "description": "Signature field"},
    },
    context="This is an employment application form"
)
```

### Using Context

The `context` parameter provides additional information to improve field matching:

```python
options = FormFillingOptions(
    field_data={
        "ssn": {"value": "123-45-6789", "description": "Social Security Number"},
        "employer": {"value": "Acme Corp", "description": "Current employer name"},
    },
    context="W-4 tax withholding form for new employee onboarding"
)
```

### Confidence Threshold

Adjust `confidence_threshold` to control field matching strictness:

```python
options = FormFillingOptions(
    field_data={...},
    confidence_threshold=0.7,  # Higher = stricter matching
)
```

* **Lower values (0.3-0.5)**: More fields are filled, at a higher risk of incorrect matches
* **Higher values (0.7-0.9)**: Fewer fields are filled, but the matches are more reliable
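One way to act on this trade-off is to start strict and loosen the threshold only when fields remain unmatched. The sketch below illustrates that idea under stated assumptions: `fill_with_fallback` and `fill_once` are hypothetical names, the threshold sequence is arbitrary, and each attempt is a separate API call that fills the original form from scratch.

```python
from typing import Callable, Sequence


def fill_with_fallback(
    fill_once: Callable[[float], object],
    thresholds: Sequence[float] = (0.9, 0.7, 0.5),
):
    """Try thresholds from strictest to loosest; stop once nothing is unmatched.

    fill_once(threshold) is expected to return a result object that exposes
    fields_not_found, as described in the result table on this page.
    """
    if not thresholds:
        raise ValueError("thresholds must not be empty")

    result = None
    for threshold in thresholds:
        result = fill_once(threshold)
        if not result.fields_not_found:
            break  # Every requested field matched at this strictness
    # Returns the first fully matched result, or the loosest attempt otherwise
    return result
```

With the SDK, `fill_once` could be a function such as `lambda t: client.fill("form.pdf", options=FormFillingOptions(field_data=field_data, confidence_threshold=t))`. Because every attempt is a full request, check `cost_breakdown` before using this pattern at scale.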

## Form Filling Result

```python
result = client.fill("form.pdf", options=options)

# Check results
print(result.success)           # True if filling succeeded
print(result.status)            # "complete" when done
print(result.output_format)     # "pdf" or "png"
print(result.fields_filled)     # List of successfully filled fields
print(result.fields_not_found)  # List of fields that couldn't be matched
print(result.page_count)        # Number of pages processed
print(result.cost_breakdown)    # Cost details
```

### Result Fields

| Field              | Type  | Description                               |
| ------------------ | ----- | ----------------------------------------- |
| `success`          | bool  | Whether form filling succeeded            |
| `status`           | str   | Processing status                         |
| `output_format`    | str   | Output type: `"pdf"` or `"png"`           |
| `output_base64`    | str   | Base64-encoded filled form                |
| `fields_filled`    | list  | Field names that were successfully filled |
| `fields_not_found` | list  | Field names that couldn't be matched      |
| `page_count`       | int   | Number of pages processed                 |
| `runtime`          | float | Processing time in seconds                |
| `cost_breakdown`   | dict  | Cost details                              |
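The fields above can be condensed into a short summary for logging or alerting. This is a minimal sketch: `summarize_fill` is a hypothetical helper, and the match rate assumes that `fields_filled` and `fields_not_found` together account for every field you requested.

```python
def summarize_fill(result) -> dict:
    """Summarize a fill result using only the documented result fields."""
    filled = list(result.fields_filled or [])
    missing = list(result.fields_not_found or [])
    total = len(filled) + len(missing)
    return {
        # Treat the run as fully OK only when it succeeded and nothing is missing
        "ok": bool(result.success) and not missing,
        "filled": len(filled),
        "missing": missing,
        # Share of requested fields that were filled; 0.0 when none were requested
        "match_rate": len(filled) / total if total else 0.0,
    }
```

Calling `summarize_fill(result)` after `client.fill(...)` gives a dictionary you can log or compare against a minimum acceptable match rate.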

## Saving the Filled Form

```python
# Save to file
result.save_output("filled_form.pdf")

# Or access the raw base64 data
import base64
pdf_bytes = base64.b64decode(result.output_base64)
with open("filled.pdf", "wb") as f:
    f.write(pdf_bytes)
```
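Since the output may be a PDF or a PNG, it can help to derive the file extension from `output_format` rather than hard-coding it. The sketch below does this with the raw base64 data. `save_with_matching_extension` is a hypothetical helper, and it assumes `output_format` is exactly `"pdf"` or `"png"` as listed in the result table.

```python
import base64
from pathlib import Path


def save_with_matching_extension(result, stem: str) -> Path:
    """Decode output_base64 and write it to '<stem>.<output_format>'."""
    # e.g. stem "filled_form" becomes filled_form.pdf or filled_form.png
    path = Path(f"{stem}.{result.output_format}")
    path.write_bytes(base64.b64decode(result.output_base64))
    return path
```

For example, `save_with_matching_extension(result, "filled_form")` returns the path that was written, so the same call works for both PDF and image forms.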

## Filling Image Forms

The API also works with image-based forms (PNG, JPG, etc.):

```python
result = client.fill("scanned_form.png", options=options)
result.save_output("filled_form.png")  # Writes the filled image to disk
```

For images, the output is a PNG with the field values rendered onto the image.

## From URL

Pass `file_url` instead of a local path to fill a form hosted at a URL:

```python
result = client.fill(
    file_url="https://example.com/form.pdf",
    options=options
)
```

## Async Usage

```python
import asyncio
from datalab_sdk import AsyncDatalabClient, FormFillingOptions

async def fill_form():
    async with AsyncDatalabClient() as client:
        options = FormFillingOptions(
            field_data={
                "name": {"value": "John Doe", "description": "Full name"},
            }
        )
        result = await client.fill("form.pdf", options=options)
        result.save_output("filled.pdf")

asyncio.run(fill_form())
```
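The async client is most useful when several forms need the same data. The sketch below shows one way to fill them concurrently while capping the number of in-flight requests. `fill_many` is a hypothetical helper, `client` is expected to behave like `AsyncDatalabClient` (an awaitable `fill(path, options=...)`), and the default limit of 4 is an arbitrary assumption rather than a documented rate limit.

```python
import asyncio


async def fill_many(client, paths, options, max_concurrent: int = 4):
    """Fill several forms with the same options, limiting concurrent requests."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fill_one(path):
        async with semaphore:  # At most max_concurrent fills run at once
            return await client.fill(path, options=options)

    # gather preserves input order, so results[i] corresponds to paths[i]
    return await asyncio.gather(*(fill_one(path) for path in paths))
```

Inside `async with AsyncDatalabClient() as client:`, a call such as `results = await fill_many(client, ["a.pdf", "b.pdf"], options)` returns one result per input path. If any fill raises, `asyncio.gather` propagates the first exception to the caller.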

## Handling Unmatched Fields

Inspect `fields_not_found` to see which fields couldn't be matched:

```python
result = client.fill("form.pdf", options=options)

if result.fields_not_found:
    print("These fields couldn't be matched:")
    for field in result.fields_not_found:
        print(f"  - {field}")

    # Consider adjusting descriptions or lowering confidence threshold
```
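To follow up on unmatched fields, you can build a smaller `field_data` that contains only those fields, with clearer descriptions, and use it for a second attempt. The sketch below is one way to prepare that data. `retry_field_data` is a hypothetical helper, and it assumes the names in `fields_not_found` are the same keys you used in `field_data`.

```python
def retry_field_data(field_data, not_found, revised_descriptions):
    """Build field_data for a second pass containing only the unmatched keys."""
    retry = {}
    for key in not_found:
        if key not in field_data:
            continue  # Ignore names that were not part of the original request
        entry = dict(field_data[key])  # Copy so the original mapping is untouched
        # Use the revised description when one is provided, else keep the old one
        entry["description"] = revised_descriptions.get(key, entry["description"])
        retry[key] = entry
    return retry
```

A second call could then use `FormFillingOptions(field_data=retry_field_data(field_data, result.fields_not_found, {"ssn": "Social Security Number (9 digits)"}))`, optionally with a lower `confidence_threshold`.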

## Example: Tax Form

```python
from datalab_sdk import DatalabClient, FormFillingOptions

client = DatalabClient()

options = FormFillingOptions(
    field_data={
        "first_name": {"value": "John", "description": "First name"},
        "last_name": {"value": "Doe", "description": "Last name"},
        "ssn": {"value": "123-45-6789", "description": "Social Security Number"},
        "address": {"value": "123 Main Street", "description": "Street address"},
        "city": {"value": "Springfield", "description": "City"},
        "state": {"value": "IL", "description": "State abbreviation"},
        "zip": {"value": "62701", "description": "ZIP code"},
        "filing_status": {"value": "Single", "description": "Filing status"},
        "signature": {"value": "John Doe", "description": "Taxpayer signature"},
        "date": {"value": "2024-04-15", "description": "Date signed"},
    },
    context="IRS W-4 Employee's Withholding Certificate"
)

result = client.fill("w4_form.pdf", options=options)

print(f"Filled {len(result.fields_filled)} fields")
print(f"Unmatched: {result.fields_not_found}")

result.save_output("w4_filled.pdf")
```

## Next Steps

<CardGroup cols={2}>
  <Card title="Form Filling Recipe" icon="file-pen" href="/docs/recipes/form-filling/form-filling-api-overview">
    Detailed guide on form filling with field matching and templates.
  </Card>

  <Card title="File Management" icon="folder-open" href="/docs/welcome/sdk/file-management">
    Upload, list, and manage files in Datalab storage.
  </Card>

  <Card title="Conversion SDK" icon="file-export" href="/docs/welcome/sdk/conversion">
    Convert documents to Markdown, HTML, JSON, or chunks.
  </Card>

  <Card title="Pipelines" icon="workflow" href="/docs/recipes/pipelines/pipeline-overview">
    Chain processors into versioned, reusable pipelines.
  </Card>
</CardGroup>
