The form filling API automatically fills PDF and image forms with your structured data. It supports PDFs with or without native form fields.

How it works

The form filling process:
  1. Upload your form (PDF or image) and provide field data
  2. The API detects form fields
  3. Field names are matched to your data
  4. The form is filled and returned as a PDF or PNG

Running form filling

Form submission

The form filling endpoint is available at /api/v1/fill. Here is an example request in Python:
import json
import os

import requests

url = "https://www.datalab.to/api/v1/fill"

# Define the field data to fill
field_data = {
    "name": {"value": "John Doe", "description": "Full name of the person"},
    "email": {"value": "[email protected]", "description": "Email address"},
    "date": {"value": "12/15/2024", "description": "Today's date"},
    "is_citizen": {"value": "yes", "description": "US citizenship status"}
}

# open() does not expand "~", so resolve the home directory explicitly
form_path = os.path.expanduser('~/forms/form.pdf')

form_data = {
    'file': ('form.pdf', open(form_path, 'rb'), 'application/pdf'),
    'field_data': (None, json.dumps(field_data)),
    'context': (None, 'Filling out a general form'),
    # Multipart form values must be strings, not floats
    'confidence_threshold': (None, '0.5')
}

headers = {"X-Api-Key": "YOUR_API_KEY"}

response = requests.post(url, files=form_data, headers=headers)
data = response.json()
Parameters:
  • file - the input form file to process (PDF or image).
  • file_url - a URL pointing to the input file. Either file or file_url must be provided. Supports:
    • HTTP/HTTPS URLs (e.g., https://example.com/form.pdf)
    • Datalab file references (e.g., datalab://file-abc123xyz) - see File Upload API
  • field_data - required JSON string mapping field keys to values and descriptions. Format:
    {
      "field_key": {
        "value": "field value",
        "description": "description of what this field represents"
      }
    }
    
  • context - optional context to guide form filling (e.g., “Initial hire for new employee”, “Tax year 2024”). Helps the LLM make smarter matching decisions.
  • confidence_threshold - minimum confidence for field matching (0.0-1.0). Fields below this threshold won’t be filled. Defaults to 0.5.
  • page_range - specific pages to process, comma separated like 0,5-10,20. Example: 0,2-4 will process pages 0, 2, 3, and 4.
  • skip_cache - skip the cache and re-run the inference. Defaults to False.
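As a rough sketch of how the parameters above fit together, the helper below assembles the non-file parts of the multipart request. The function name build_fill_fields is hypothetical and not part of the API, and the sketch assumes that file_url is sent as an ordinary form field in the same way as field_data in the example request.

```python
import json


def build_fill_fields(field_data, file_url=None, context=None,
                      confidence_threshold=None, page_range=None):
    """Build the non-file multipart parts for a fill request (illustrative only)."""
    # Each part is a (filename, value) tuple; a filename of None marks a plain form field.
    parts = {"field_data": (None, json.dumps(field_data))}
    if file_url is not None:
        parts["file_url"] = (None, file_url)
    if context is not None:
        parts["context"] = (None, context)
    if confidence_threshold is not None:
        if not 0.0 <= confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0.0 and 1.0")
        # Multipart values must be strings, so the number is serialized here.
        parts["confidence_threshold"] = (None, str(confidence_threshold))
    if page_range is not None:
        parts["page_range"] = (None, page_range)
    return parts
```

The returned dictionary can be passed as the files argument of requests.post, optionally with a 'file' entry added when uploading the form directly instead of using file_url.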
You can see a full list of parameters in the Form Filling API reference. Submitting the request returns a response like the following:
{
  'success': True,
  'error': None,
  'request_id': "PpK1oM-HB4RgrhsQhVb2uQ",
  'request_check_url': 'https://www.datalab.to/api/v1/fill/PpK1oM-HB4RgrhsQhVb2uQ'
}

Polling for completion

You will then need to poll request_check_url, like this:
import time

max_polls = 300
check_url = data["request_check_url"]

for i in range(max_polls):
    time.sleep(2)
    response = requests.get(check_url, headers=headers)
    data = response.json()

    if data["status"] == "complete":
        break
Eventually, the status field will be set to complete, and you will get an object that looks like this:
{
    "status": "complete",
    "success": True,
    "output_format": "pdf",
    "output_base64": "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC...",
    "fields_filled": ["name", "email", "date"],
    "fields_not_found": ["middle_initial"],
    "runtime": 3.45,
    "page_count": 2
}
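The polling loop can also be wrapped in a small reusable helper that reports failures and timeouts explicitly. This is a minimal sketch, not part of the API: poll_until_done and its arguments are hypothetical names, and the "failed" status is assumed from the full code sample later on this page.

```python
import time


def poll_until_done(fetch_status, max_polls=300, interval=2.0, sleep=time.sleep):
    """Call fetch_status() until the request is no longer processing.

    fetch_status is any zero-argument callable returning the decoded JSON dict,
    for example: lambda: requests.get(check_url, headers=headers).json()
    """
    for _ in range(max_polls):
        result = fetch_status()
        status = result.get("status")
        if status == "complete":
            return result
        if status == "failed":  # assumed status value, see lead-in
            raise RuntimeError(result.get("error") or "form filling failed")
        sleep(interval)
    raise TimeoutError(f"request still processing after {max_polls} polls")
```

Passing the fetch function and the sleep function as arguments keeps the helper independent of requests, which makes it easy to exercise without network access.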

Response fields

  • status - indicates the status of the request (complete or processing).
  • success - indicates if the request completed successfully. True or False.
  • output_format - output format: pdf or png.
  • output_base64 - base64-encoded filled form (PDF or PNG). Decode with base64.b64decode(value).
  • fields_filled - list of field keys that were successfully filled.
  • fields_not_found - list of field keys that couldn’t be matched to form fields.
  • runtime - processing time in seconds.
  • page_count - number of pages processed.
  • error - if there was an error, this contains the error message.
Important: All response data is deleted from Datalab servers one hour after processing completes, so retrieve your results before then.
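Because results expire, it can help to write the decoded file to disk as soon as the request completes. The sketch below shows one way to do that from the response fields described above; save_filled_form and output_stem are hypothetical names chosen for illustration.

```python
import base64
from pathlib import Path


def save_filled_form(result, output_stem):
    """Decode output_base64 from a completed response and write it to disk."""
    if not result.get("success"):
        raise RuntimeError(result.get("error") or "form filling did not succeed")
    # output_format is documented as "pdf" or "png", so it doubles as the extension.
    output_path = Path(f"{output_stem}.{result['output_format']}")
    output_path.write_bytes(base64.b64decode(result["output_base64"]))
    return output_path
```

For example, save_filled_form(data, "form.filled") would write form.filled.pdf or form.filled.png, depending on output_format.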

Field data format

The field_data parameter accepts structured data with values and descriptions:

Basic fields

field_data = {
    "first_name": {
        "value": "John",
        "description": "First name"
    },
    "last_name": {
        "value": "Doe",
        "description": "Last name"
    }
}

Nested fields

You can use nested structures for complex forms:
field_data = {
    "employee": {
        "name": {
            "value": "John Doe",
            "description": "Employee's full name"
        },
        "hire_date": {
            "value": "01/01/2024",
            "description": "Date of hire"
        }
    }
}
These will be flattened to dot-notation keys like employee.name and employee.hire_date.
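To preview which keys a nested structure produces, the flattening can be approximated locally. This is only an illustration of the dot-notation described above, not the API's actual implementation; it assumes a leaf is any dictionary that carries a "value" key, so a nested group that itself contains a field named "value" would be treated as a leaf.

```python
def flatten_field_data(data, prefix=""):
    """Flatten nested field_data into dot-notation keys (illustrative only)."""
    flat = {}
    for key, entry in data.items():
        path = f"{prefix}.{key}" if prefix else key
        # A dict without "value" is treated as a nested group and recursed into.
        if isinstance(entry, dict) and "value" not in entry:
            flat.update(flatten_field_data(entry, path))
        else:
            flat[path] = entry
    return flat
```

The resulting keys are the ones to look for in fields_filled and fields_not_found when checking the response.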

Checkboxes and radio buttons

For checkboxes and radio buttons, use boolean-like values:
field_data = {
    "is_citizen": {
        "value": "yes",  # or True, "1", "checked", "x"
        "description": "US citizenship status"
    },
    "gender": {
        "value": "Male",
        "description": "Gender selection"
    }
}
Values like "yes", "true", "1", "checked", "x" will check the box. Values like "no", "false", "0" will leave it unchecked.
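If your source data uses other conventions for booleans, you may want to check locally how a value would be interpreted before sending it. The sketch below encodes only the values listed above; checkbox_state is a hypothetical helper, and the case-insensitive comparison is an assumption rather than documented behavior.

```python
CHECKED = {"yes", "true", "1", "checked", "x"}
UNCHECKED = {"no", "false", "0"}


def checkbox_state(value):
    """Return True (checked), False (unchecked), or None (not boolean-like)."""
    # str() lets Python booleans such as True map onto "true".
    token = str(value).strip().lower()
    if token in CHECKED:
        return True
    if token in UNCHECKED:
        return False
    return None
```

A result of None, as for a radio button value like "Male", indicates that the value is a selection label rather than a checkbox state.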

Compound data

The API can automatically split compound data across multiple fields:
field_data = {
    "full_address": {
        "value": "123 Main St, New York, NY, 10001",
        "description": "Complete address"
    }
}
The API splits this value across the form's street address, city, state, and ZIP code fields.

Context parameter

The context parameter helps guide field matching for forms with multiple use cases:
# For employment forms (I-9, W-4)
context = "Initial hire for new employee"
# or
context = "Rehire - employee returning after break"

# For tax forms
context = "Tax year 2024"

# For general forms
context = "Standard form filling for typical use case"
If not provided, the API defaults to the most common use case for the form type.

Full code sample

import os
import time
import requests
import json
import base64
from pathlib import Path

API_URL = "https://www.datalab.to/api/v1/fill"
API_KEY = os.getenv("DATALAB_API_KEY")

def fill_form(
    form_path: Path,
    field_data: dict,
    context: str = None
):
    url = API_URL

    #
    # Submit initial request
    #
    with open(form_path, 'rb') as f:
        form_data = {
            'file': (form_path.name, f, 'application/pdf'),
            'field_data': (None, json.dumps(field_data)),
        }
        
        if context:
            form_data['context'] = (None, context)

        headers = {"X-Api-Key": API_KEY}

        response = requests.post(url, files=form_data, headers=headers)
        data = response.json()

    #
    # Poll for completion
    #
    max_polls = 300
    check_url = data["request_check_url"]
    
    for i in range(max_polls):
        response = requests.get(check_url, headers=headers)
        check_result = response.json()

        if check_result['status'] == 'complete':
            if check_result['success']:
                # Decode the filled form
                filled_form_bytes = base64.b64decode(check_result['output_base64'])
                
                # Save to file
                output_ext = 'pdf' if check_result['output_format'] == 'pdf' else 'png'
                output_path = form_path.with_suffix(f'.filled.{output_ext}')
                with open(output_path, 'wb') as f:
                    f.write(filled_form_bytes)
                
                print("Form filled successfully!")
                print(f"Fields filled: {check_result['fields_filled']}")
                print(f"Fields not found: {check_result['fields_not_found']}")
                print(f"Saved to: {output_path}")
                
                return output_path
            else:
                print(f"Form filling failed: {check_result.get('error')}")
                return None
                
        elif check_result["status"] == "failed":
            print("Failed to fill form")
            return None
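        else:
            print("Waiting 2 more seconds to re-check status")
            time.sleep(2)

    # The loop ended without reaching a final status
    print(f"Gave up after {max_polls} polls")
    return None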

# Example usage
field_data = {
    "name": {"value": "John Doe", "description": "Full name"},
    "email": {"value": "[email protected]", "description": "Email address"},
    "date": {"value": "12/15/2024", "description": "Date"}
}

fill_form(
    Path("form.pdf"),
    field_data,
    context="General form filling"
)

Supported form types

The form filling API supports:
  • PDF with native AcroForm fields - uses pypdf to fill existing form fields
  • PDF with visual fields - uses an LLM to detect field locations and adds text overlays
  • Images (PNG, JPG) - uses an LLM to detect field locations and draws text on the image
The API automatically detects the input type and uses the appropriate method.
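Detection happens on the server, but the upload itself still needs a content type. A client-side sketch like the one below can pick one from the file's leading bytes instead of hardcoding application/pdf; guess_form_mime is a hypothetical helper and is not how the API performs its own detection.

```python
def guess_form_mime(header: bytes):
    """Guess a MIME type for a form file from its first bytes, or return None."""
    if header.startswith(b"%PDF-"):
        return "application/pdf"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if header.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    return None
```

Reading the first eight bytes of the file is enough for all three signatures, and a None result is a cue to reject the file before uploading it.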

Try it out

Sign up for Datalab and try out form filling - it’s free, and we’ll include credits. If you need a self-hosted solution, contact us for on-prem pricing. As always, write to us at [email protected] if you want credits or have any specific questions / requests!