Overview

The form filling API lets you programmatically fill forms in PDFs and images. It works with both:
  • Native PDF forms - Forms with actual form fields
  • Image-based forms - Scanned forms or images with visual form layouts
The API matches your field data to form fields and returns a filled PDF or image.

Basic Usage

from datalab_sdk import DatalabClient, FormFillingOptions

client = DatalabClient()

options = FormFillingOptions(
    field_data={
        "full_name": {"value": "John Doe", "description": "Full legal name"},
        "date_of_birth": {"value": "1990-01-15", "description": "Date of birth"},
        "address": {"value": "123 Main St, City, ST 12345", "description": "Mailing address"},
    }
)

result = client.fill("form.pdf", options=options)
result.save_output("filled_form.pdf")

Form Filling Options

Option                 Type    Default    Description
field_data             dict    Required   Field names mapped to values and descriptions
context                str     None       Additional context to help match fields
confidence_threshold   float   0.5        Minimum confidence for field matching (0.0-1.0)
max_pages              int     None       Maximum pages to process
page_range             str     None       Specific pages to process (e.g., "0-2")
skip_cache             bool    False      Skip cached results
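
Because a bad option value only surfaces after a request is sent, it can help to check the values locally first. The sketch below is a hypothetical helper (build_fill_kwargs is not part of the SDK) that validates the documented ranges and returns plain keyword arguments. Rejecting an empty field_data and a non-positive max_pages are assumptions of this sketch, not documented API behavior.

```python
def build_fill_kwargs(field_data, context=None, confidence_threshold=0.5,
                      max_pages=None, page_range=None, skip_cache=False):
    """Validate option values locally and return them as keyword arguments.

    Hypothetical helper: the option names mirror the table above, but the
    checks are local conventions rather than rules enforced by the API.
    """
    if not field_data:
        # Assumption: an empty mapping is treated as a caller error.
        raise ValueError("field_data is required and must not be empty")
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")
    if max_pages is not None and max_pages < 1:
        # Assumption: a page limit below 1 is never meaningful.
        raise ValueError("max_pages must be a positive integer")
    return {
        "field_data": field_data,
        "context": context,
        "confidence_threshold": confidence_threshold,
        "max_pages": max_pages,
        "page_range": page_range,  # passed through unchecked, e.g. "0-2"
        "skip_cache": skip_cache,
    }
```

Assuming the option names in the table are accepted as constructor keywords, the result could be passed on as FormFillingOptions(**build_fill_kwargs(...)).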

Field Data Format

Each field in field_data is a dictionary with:
field_data = {
    "field_key": {
        "value": "The value to fill",
        "description": "Description to help match the field"
    }
}
The description helps the API match your field key to the actual form field, especially when field names in the PDF don’t match your data structure.
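
When the source data is a flat record, the nested shape above can be built mechanically. This is a minimal sketch with a hypothetical helper, make_field_data. Coercing values to strings follows the examples in this guide (which pass strings), and rejecting empty descriptions is a local choice, not a documented requirement.

```python
def make_field_data(entries):
    """Turn {key: (value, description)} into the nested field_data shape."""
    field_data = {}
    for key, (value, description) in entries.items():
        if not description:
            # Local policy: descriptions drive matching, so require one.
            raise ValueError(f"field {key!r} needs a description")
        # Assumption: values are sent as strings, as in the examples here.
        field_data[key] = {"value": str(value), "description": description}
    return field_data


field_data = make_field_data({
    "full_name": ("John Doe", "Full legal name"),
    "amount": (1500, "Total amount"),
})
```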

Example with Multiple Field Types

options = FormFillingOptions(
    field_data={
        # Text fields
        "name": {"value": "Jane Smith", "description": "Full name"},
        "email": {"value": "jane.smith@example.com", "description": "Email address"},

        # Date fields
        "date": {"value": "2024-01-15", "description": "Today's date"},

        # Numeric fields
        "amount": {"value": "1500.00", "description": "Total amount"},

        # Checkbox (use descriptive value)
        "agree_terms": {"value": "Yes", "description": "Agreement checkbox"},

        # Signature (text is rendered)
        "signature": {"value": "Jane Smith", "description": "Signature field"},
    },
    context="This is an employment application form"
)

Using Context

The context parameter provides additional information to improve field matching:
options = FormFillingOptions(
    field_data={
        "ssn": {"value": "123-45-6789", "description": "Social Security Number"},
        "employer": {"value": "Acme Corp", "description": "Current employer name"},
    },
    context="W-4 tax withholding form for new employee onboarding"
)

Confidence Threshold

Adjust confidence_threshold to control field matching strictness:
options = FormFillingOptions(
    field_data={...},
    confidence_threshold=0.7,  # Higher = more strict matching
)
  • Lower values (0.3-0.5): More fields matched, but may have incorrect matches
  • Higher values (0.7-0.9): Fewer fields matched, but more accurate

Form Filling Result

result = client.fill("form.pdf", options=options)

# Check results
print(result.success)           # True if filling succeeded
print(result.status)            # "complete" when done
print(result.output_format)     # "pdf" or "png"
print(result.fields_filled)     # List of successfully filled fields
print(result.fields_not_found)  # List of fields that couldn't be matched
print(result.page_count)        # Number of pages processed
print(result.cost_breakdown)    # Cost details

Result Fields

Field              Type    Description
success            bool    Whether form filling succeeded
status             str     Processing status
output_format      str     Output type: "pdf" or "png"
output_base64      str     Base64-encoded filled form
fields_filled      list    Field names that were successfully filled
fields_not_found   list    Field names that couldn't be matched
page_count         int     Number of pages processed
runtime            float   Processing time in seconds
cost_breakdown     dict    Cost details
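
The two field lists can be condensed into a quick summary for logging. The sketch below uses a hypothetical summarize helper and a stand-in object instead of a real result. It assumes that fields_filled and fields_not_found together account for every submitted field, which this guide does not state explicitly.

```python
from types import SimpleNamespace


def summarize(result):
    """Return counts and a fill rate from a result-like object."""
    filled = list(result.fields_filled or [])
    missing = list(result.fields_not_found or [])
    total = len(filled) + len(missing)
    # Assumption: filled + missing covers all fields that were submitted.
    rate = len(filled) / total if total else 0.0
    return {"filled": len(filled), "missing": missing, "fill_rate": rate}


# Stand-in with the same attribute names as the result fields above.
fake_result = SimpleNamespace(
    fields_filled=["name", "email", "date"],
    fields_not_found=["signature"],
)
summary = summarize(fake_result)
print(summary["fill_rate"])  # 0.75
```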

Saving the Filled Form

# Save to file
result.save_output("filled_form.pdf")

# Or access the raw base64 data
import base64
pdf_bytes = base64.b64decode(result.output_base64)
with open("filled.pdf", "wb") as f:
    f.write(pdf_bytes)
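
When decoding output_base64 by hand, the file signature can confirm whether the bytes are a PDF or a PNG before a file extension is chosen. This is a sketch with a hypothetical decode_output helper. The signatures are the standard PDF and PNG magic bytes, and treating anything else as an error is this sketch's choice.

```python
import base64

PDF_MAGIC = b"%PDF-"              # every PDF file starts with this header
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG signature


def decode_output(output_base64):
    """Decode base64 output and report its format from the leading bytes."""
    data = base64.b64decode(output_base64)
    if data.startswith(PDF_MAGIC):
        return data, "pdf"
    if data.startswith(PNG_MAGIC):
        return data, "png"
    raise ValueError("decoded output is neither a PDF nor a PNG")
```

The returned format string can then be used as the extension when writing the bytes, for example open(f"filled.{fmt}", "wb").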

Filling Image Forms

The API also works with image-based forms (PNG, JPG, etc.):
result = client.fill("scanned_form.png", options=options)
result.save_output("filled_form.png")  # Saves the filled image
For image inputs, the output is a PNG with the field values rendered onto the image.

From URL

Fill a form from a URL:
result = client.fill(
    file_url="https://example.com/form.pdf",
    options=options
)
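
If inputs arrive as a mix of local paths and URLs, a small dispatcher can choose between the two call shapes shown in this guide. The helpers is_url and fill_from are hypothetical. The client is passed in, so the sketch relies only on the positional-path and file_url= forms of client.fill used above.

```python
from urllib.parse import urlparse


def is_url(source):
    """True for http(s) URLs; local paths, including C:\\ paths, are False."""
    return urlparse(str(source)).scheme in ("http", "https")


def fill_from(client, source, options):
    """Call client.fill with file_url= for URLs, positionally for paths."""
    if is_url(source):
        return client.fill(file_url=source, options=options)
    return client.fill(source, options=options)
```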

Async Usage

import asyncio
from datalab_sdk import AsyncDatalabClient, FormFillingOptions

async def fill_form():
    async with AsyncDatalabClient() as client:
        options = FormFillingOptions(
            field_data={
                "name": {"value": "John Doe", "description": "Full name"},
            }
        )
        result = await client.fill("form.pdf", options=options)
        result.save_output("filled.pdf")

asyncio.run(fill_form())
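
The async client is most useful when several forms are filled at once. The sketch below bounds concurrency with a semaphore. fill_many is a hypothetical helper, and it assumes that one client instance can serve overlapping fill calls, which this guide does not confirm.

```python
import asyncio


async def fill_many(client, jobs, max_concurrency=4):
    """Fill several forms concurrently.

    jobs is a list of (input_path, options) pairs; results come back in
    the same order as jobs.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fill_one(path, options):
        # Limit the number of requests in flight at any moment.
        async with semaphore:
            return await client.fill(path, options=options)

    return await asyncio.gather(*(fill_one(p, o) for p, o in jobs))
```

Inside an async with AsyncDatalabClient() as client: block, this could be awaited as results = await fill_many(client, jobs), after which each result is saved with save_output as usual.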

Handling Unmatched Fields

Check which fields couldn’t be matched:
result = client.fill("form.pdf", options=options)

if result.fields_not_found:
    print("These fields couldn't be matched:")
    for field in result.fields_not_found:
        print(f"  - {field}")

    # Consider adjusting descriptions or lowering confidence threshold
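
One way to act on unmatched fields is to retry with progressively lower thresholds and stop once everything matches. This is a sketch, not SDK behavior: fill_with_fallback and the fill_at callable are hypothetical, and each attempt is a separate request that may be billed separately. Lower thresholds also raise the risk of incorrect matches, so the output should be reviewed.

```python
def fill_with_fallback(fill_at, thresholds=(0.7, 0.5, 0.3)):
    """Try thresholds from strictest to loosest.

    fill_at(threshold) must return an object with a fields_not_found
    attribute. Returns (last_result, threshold_used).
    """
    if not thresholds:
        raise ValueError("at least one threshold is required")
    for threshold in thresholds:
        result = fill_at(threshold)
        if not result.fields_not_found:
            break  # every field matched; no need to loosen further
    return result, threshold


# Possible wiring with the client from this guide (not executed here):
# def fill_at(threshold):
#     options = FormFillingOptions(field_data=field_data,
#                                  confidence_threshold=threshold)
#     return client.fill("form.pdf", options=options)
```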

Example: Tax Form

from datalab_sdk import DatalabClient, FormFillingOptions

client = DatalabClient()

options = FormFillingOptions(
    field_data={
        "first_name": {"value": "John", "description": "First name"},
        "last_name": {"value": "Doe", "description": "Last name"},
        "ssn": {"value": "123-45-6789", "description": "Social Security Number"},
        "address": {"value": "123 Main Street", "description": "Street address"},
        "city": {"value": "Springfield", "description": "City"},
        "state": {"value": "IL", "description": "State abbreviation"},
        "zip": {"value": "62701", "description": "ZIP code"},
        "filing_status": {"value": "Single", "description": "Filing status"},
        "signature": {"value": "John Doe", "description": "Taxpayer signature"},
        "date": {"value": "2024-04-15", "description": "Date signed"},
    },
    context="IRS W-4 Employee's Withholding Certificate"
)

result = client.fill("w4_form.pdf", options=options)

print(f"Filled {len(result.fields_filled)} fields")
print(f"Unmatched: {result.fields_not_found}")

result.save_output("w4_filled.pdf")
