Convert markdown to Word documents (DOCX) with support for track changes, insertions, deletions, and comments. This is useful for generating legal documents, contracts with redlines, and collaborative review documents.

Quick Start

from datalab_sdk import DatalabClient

client = DatalabClient()

markdown = (
    "# Contract\n\n"
    "This agreement is between "
    '<ins data-revision-author="Editor" data-revision-datetime="2024-01-15T10:00:00Z">'
    "Acme Corp</ins> and the client."
)

result = client.create_document(markdown=markdown)
result.save_output("contract")  # saves contract.docx
print(f"Document created: {result.page_count} page(s)")

SDK Usage

Use client.create_document() to create a DOCX from markdown:
from datalab_sdk import DatalabClient

client = DatalabClient()

result = client.create_document(
    markdown="# Title\n\nDocument content here.",
    output_format="docx",       # Only 'docx' is supported
    webhook_url=None,           # Optional completion webhook
    save_output="output/doc",   # Optional: saves output/doc.docx automatically
)

print(result.success)         # True if creation succeeded
print(result.page_count)      # Number of pages
print(result.cost_breakdown)  # Cost details
result.save_output("output/contract")  # Saves output/contract.docx

SDK Method Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| markdown | str | Required | Markdown content with optional track changes markup |
| output_format | str | "docx" | Output format (only "docx" is supported) |
| webhook_url | str | None | Optional webhook URL for completion notification |
| save_output | str/Path | None | File path to save the output DOCX |
| max_polls | int | 300 | Maximum polling attempts |
| poll_interval | int | 1 | Seconds between polls |

SDK Result Fields

| Field | Type | Description |
| --- | --- | --- |
| success | bool | Whether document creation succeeded |
| status | str | "complete" when done |
| output_format | str | "docx" |
| output_base64 | str | Base64-encoded DOCX file |
| runtime | float | Processing time in seconds |
| page_count | int | Pages in the generated document |
| cost_breakdown | dict | Cost details |
| error | str | Error message if creation failed |
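
The fields above can be checked before the output is used. The following is a minimal sketch, not part of the SDK: summarize_result is a hypothetical helper that accepts any object exposing the documented attributes, and the stand-in object below replaces a real API call so the snippet runs on its own.

```python
from types import SimpleNamespace


def summarize_result(result):
    """Return a one-line summary, or raise if document creation failed.

    Hypothetical helper: relies only on the result fields documented above.
    """
    if not result.success:
        raise RuntimeError(f"Document creation failed: {result.error}")
    return (
        f"{result.output_format} document, {result.page_count} page(s), "
        f"{result.runtime:.1f}s"
    )


# Stand-in object with the documented fields (values are made up for illustration).
fake_result = SimpleNamespace(
    success=True, status="complete", output_format="docx",
    output_base64="", runtime=1.5, page_count=2,
    cost_breakdown={}, error=None,
)
print(summarize_result(fake_result))
```

With a real result from client.create_document(), the same function could be called in place of the individual print statements.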

How It Works

  1. Send markdown content with optional track changes markup
  2. The API converts it to a DOCX file with proper Word formatting
  3. Track changes tags become native Word revision marks
  4. The DOCX file is returned as a base64-encoded string
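
Step 4 means the caller has to decode the string before writing a file. The sketch below shows one way to do that and to sanity-check the bytes, relying on the fact that DOCX files are ZIP archives. decode_docx is a hypothetical helper, and the sample payload is a tiny stand-in archive rather than a real API response.

```python
import base64
import io
import zipfile


def decode_docx(output_base64):
    """Decode a base64 string and confirm the bytes look like a DOCX (ZIP) file."""
    docx_bytes = base64.b64decode(output_base64)
    # DOCX files are ZIP archives, so a valid payload must open as one.
    if not zipfile.is_zipfile(io.BytesIO(docx_bytes)):
        raise ValueError("Decoded payload is not a ZIP archive, so not a DOCX")
    return docx_bytes


# Demo with a tiny stand-in archive rather than a real API response.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<placeholder/>")
sample_base64 = base64.b64encode(buffer.getvalue()).decode("ascii")

docx_bytes = decode_docx(sample_base64)
print(f"Decoded {len(docx_bytes)} bytes")
```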

Track Changes Markup

Insertions

Mark inserted text with <ins> tags:
<ins data-revision-author="John Doe" data-revision-datetime="2024-01-15T10:00:00Z">newly added text</ins>
| Attribute | Required | Description |
| --- | --- | --- |
| data-revision-author | Yes | Author name for the insertion |
| data-revision-datetime | Yes | ISO 8601 timestamp (e.g., 2024-01-15T10:00:00Z) |

Deletions

Mark deleted text with <del> tags:
<del data-revision-author="Jane Smith" data-revision-datetime="2024-01-15T11:00:00Z">removed text</del>
| Attribute | Required | Description |
| --- | --- | --- |
| data-revision-author | Yes | Author name for the deletion |
| data-revision-datetime | Yes | ISO 8601 timestamp |
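
Writing these tags by hand is error-prone, so it can help to generate them. The helpers below are a sketch under stated assumptions: mark_inserted, mark_deleted, and replace_text are hypothetical names, and HTML-escaping the author attribute is an assumption about how the API parses attributes, not documented behavior. The timestamp must be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from html import escape


def revision_attrs(author, when):
    """Build the two required revision attributes with an ISO 8601 UTC timestamp."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f'data-revision-author="{escape(author, quote=True)}" '
        f'data-revision-datetime="{stamp}"'
    )


def mark_inserted(text, author, when):
    return f"<ins {revision_attrs(author, when)}>{text}</ins>"


def mark_deleted(text, author, when):
    return f"<del {revision_attrs(author, when)}>{text}</del>"


def replace_text(old, new, author, when):
    """Express a replacement as a deletion immediately followed by an insertion."""
    return mark_deleted(old, author, when) + mark_inserted(new, author, when)


when = datetime(2024, 6, 1, 10, 0, tzinfo=timezone.utc)
line = f"continues for {replace_text('12', '24', 'Legal', when)} months."
print(line)
```

Pairing a deletion with an insertion is how a redlined replacement is usually written, since there is no separate tag for replaced text.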

Comments

Add comments with <comment> tags:
<comment data-comment-author="Reviewer" data-comment-datetime="2024-01-15T12:00:00Z" text="Please verify this clause">annotated text</comment>
| Attribute | Required | Description |
| --- | --- | --- |
| data-comment-author | Yes | Author/reviewer name |
| text | Yes | The comment text |
| data-comment-datetime | No | ISO 8601 timestamp (defaults to current time) |
| data-comment-initial | No | Author initials (auto-generated from name if omitted) |
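
Because two of the four attributes are optional, a small builder can leave them out when they are not supplied. This is a sketch only: make_comment is a hypothetical helper, and escaping quotes in attribute values is an assumption about the parser rather than documented behavior.

```python
from html import escape


def make_comment(annotated, author, text, when=None, initial=None):
    """Wrap `annotated` in a <comment> tag; optional attributes are omitted when None."""
    attrs = [f'data-comment-author="{escape(author, quote=True)}"']
    if when is not None:
        attrs.append(f'data-comment-datetime="{when}"')
    if initial is not None:
        attrs.append(f'data-comment-initial="{escape(initial, quote=True)}"')
    # Escape the comment text so a double quote cannot end the attribute early.
    attrs.append(f'text="{escape(text, quote=True)}"')
    return f"<comment {' '.join(attrs)}>{annotated}</comment>"


tag = make_comment("January 1, 2025", "Reviewer", "Confirm start date with finance")
print(tag)
```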

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| markdown | string | Yes | Markdown content with optional track changes markup |
| output_format | string | No | Output format (currently only docx is supported) |
| webhook_url | string | No | Webhook URL to notify when processing completes |
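
The request body can be assembled from these three parameters. The sketch below validates them locally before anything is sent. build_payload is a hypothetical helper, the webhook URL is a placeholder, and omitting unset optional keys (rather than sending null) is a design choice made here, not a documented requirement.

```python
def build_payload(markdown, output_format="docx", webhook_url=None):
    """Assemble the JSON body for a create-document request."""
    if not markdown:
        raise ValueError("markdown is required")
    if output_format != "docx":
        raise ValueError("only 'docx' is currently supported")
    payload = {"markdown": markdown, "output_format": output_format}
    # Leave optional keys out entirely rather than sending null values.
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url
    return payload


payload = build_payload("# Title\n\nBody.", webhook_url="https://example.com/hook")
print(sorted(payload))
```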

Response

The response follows the standard async pattern: submit the request, then poll for the result.
Initial response:
{
  "success": true,
  "request_id": "abc123",
  "request_check_url": "https://www.datalab.to/api/v1/create-document/abc123"
}
Final response (when polling):
| Field | Type | Description |
| --- | --- | --- |
| status | string | processing or complete |
| success | bool | Whether document creation succeeded |
| output_format | string | docx |
| output_base64 | string | Base64-encoded DOCX file |
| runtime | float | Processing time in seconds |
| page_count | int | Pages in the generated document |
| cost_breakdown | object | Cost details |
| error | string | Error message if creation failed |
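
The submit-then-poll pattern can be wrapped in a bounded loop so that a stuck request does not block forever. The sketch below is an illustration, not the SDK's implementation: poll_until_complete is a hypothetical helper, its defaults are borrowed from the SDK's max_polls and poll_interval, and the fetch callable stands in for a GET request to request_check_url so the example runs without network access.

```python
import time


def poll_until_complete(fetch, max_polls=300, poll_interval=1.0, sleep=time.sleep):
    """Call `fetch()` until it reports status "complete" or the poll budget runs out."""
    for _ in range(max_polls):
        result = fetch()
        if result.get("status") == "complete":
            if not result.get("success"):
                raise RuntimeError(f"Document creation failed: {result.get('error')}")
            return result
        sleep(poll_interval)
    raise TimeoutError(f"Not complete after {max_polls} polls")


# Simulated responses stand in for GET requests to request_check_url.
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "complete", "success": True, "page_count": 1},
])
final = poll_until_complete(lambda: next(responses), poll_interval=0, sleep=lambda s: None)
print(final["page_count"])
```

In real use, fetch could be something like lambda: requests.get(check_url, headers=headers).json().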

Full Example

A contract with insertions, deletions, and reviewer comments:
import base64
import os
import time

import requests

headers = {"X-API-Key": os.getenv("DATALAB_API_KEY")}

markdown = """# Service Agreement

## Parties

This agreement is between <ins data-revision-author="Legal" data-revision-datetime="2024-06-01T09:00:00Z">Acme Corporation ("Provider")</ins> and <del data-revision-author="Legal" data-revision-datetime="2024-06-01T09:00:00Z">the Client</del><ins data-revision-author="Legal" data-revision-datetime="2024-06-01T09:00:00Z">GlobalTech Inc. ("Client")</ins>.

## Terms

The service period begins on <comment data-comment-author="Reviewer" text="Confirm start date with finance">January 1, 2025</comment> and continues for <del data-revision-author="Legal" data-revision-datetime="2024-06-01T10:00:00Z">12</del><ins data-revision-author="Legal" data-revision-datetime="2024-06-01T10:00:00Z">24</ins> months.

## Payment

The total contract value is <ins data-revision-author="Finance" data-revision-datetime="2024-06-02T14:00:00Z">$150,000</ins> payable in quarterly installments.
"""

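# Submit the request; the API responds immediately with a URL to poll.
response = requests.post(
    "https://www.datalab.to/api/v1/create-document",
    json={"markdown": markdown, "output_format": "docx"},
    headers=headers,
)
response.raise_for_status()

check_url = response.json()["request_check_url"]

# Poll until the document is ready, giving up after about 10 minutes.
for _ in range(300):
    result = requests.get(check_url, headers=headers).json()
    if result["status"] == "complete":
        if not result.get("success"):
            raise RuntimeError(f"Document creation failed: {result.get('error')}")
        docx_bytes = base64.b64decode(result["output_base64"])
        with open("service_agreement.docx", "wb") as f:
            f.write(docx_bytes)
        print(f"Document saved ({result['page_count']} pages)")
        break
    time.sleep(2)
else:
    raise TimeoutError("Document creation did not complete in time")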
The generated DOCX file opens in Microsoft Word with native track changes visible, allowing reviewers to accept or reject each change.
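
To confirm that revisions are present without opening Word, the file can be inspected directly. In WordprocessingML, tracked insertions and deletions are stored as w:ins and w:del elements. The sketch below counts them, assuming the main document part lives at the conventional path word/document.xml. count_revisions is a hypothetical helper, and the sample archive is a hand-built stand-in, not output from the API.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_revisions(docx_bytes):
    """Count w:ins and w:del elements in a DOCX's main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return {
        "insertions": sum(1 for _ in root.iter(f"{{{W}}}ins")),
        "deletions": sum(1 for _ in root.iter(f"{{{W}}}del")),
    }


# Hand-built stand-in archive with one insertion and one deletion.
sample_xml = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:ins w:author="Legal"><w:r><w:t>24</w:t></w:r></w:ins>'
    '<w:del w:author="Legal"><w:r><w:delText>12</w:delText></w:r></w:del>'
    "</w:p></w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", sample_xml)

counts = count_revisions(buffer.getvalue())
print(counts)
```

The counts are element counts, so they may differ from the number of changes Word lists in its review pane, for example when one edit spans several runs.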

Use Cases

  • Legal document generation — create contracts with tracked revisions
  • Contract redlining — mark up agreements with insertions and deletions
  • Collaborative review — add reviewer comments to documents
  • Document automation — generate Word documents from templates with dynamic content

Next Steps

  • Track Changes Extraction: extract track changes from existing Word documents
  • Document Conversion: convert documents to markdown, HTML, or JSON
  • Webhooks: get notified when document creation completes
  • Pipelines: chain processors into versioned, reusable pipelines