Datalab supports the following file types for document conversion:

PDF

Extension  MIME Type
.pdf       application/pdf

Spreadsheets

Extension  MIME Type
.xls       application/vnd.ms-excel
.xlsx      application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
.xlsm      application/vnd.ms-excel.sheet.macroEnabled.12
.xltx      application/vnd.openxmlformats-officedocument.spreadsheetml.template
.csv       text/csv
.ods       application/vnd.oasis.opendocument.spreadsheet

Word Documents

Extension  MIME Type
.doc       application/msword
.docx      application/vnd.openxmlformats-officedocument.wordprocessingml.document
.odt       application/vnd.oasis.opendocument.text

Presentations

Extension  MIME Type
.ppt       application/vnd.ms-powerpoint
.pptx      application/vnd.openxmlformats-officedocument.presentationml.presentation
.odp       application/vnd.oasis.opendocument.presentation

HTML

Extension  MIME Type
.html      text/html

Ebooks

Extension  MIME Type
.epub      application/epub+zip

Images

Extension  MIME Type
.png       image/png
.jpg       image/jpeg
.jpeg      image/jpeg
.webp      image/webp
.gif       image/gif
.tiff      image/tiff

Detecting MIME Types

To automatically detect a file’s MIME type in Python:
import filetype

# guess() inspects the file's leading bytes and returns None if the type is not recognized
mime = filetype.guess("document.pdf")
if mime:
    print(mime.mime)  # application/pdf
Install with pip install filetype.
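
Signature-based detection may not recognize plain-text formats such as .csv or .html, which have no fixed leading bytes. As a complement, the sketch below looks up the MIME type from the file extension, using only the values listed in the tables on this page. The names SUPPORTED_MIME_TYPES and mime_for_path are hypothetical helpers for illustration, not part of any Datalab SDK, and the lookup trusts the extension without inspecting the file's content.

```python
from pathlib import Path
from typing import Optional

# Extension -> MIME type, copied from the tables on this page.
# Hypothetical helper table; not part of any Datalab SDK.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".xlsm": "application/vnd.ms-excel.sheet.macroEnabled.12",
    ".xltx": "application/vnd.openxmlformats-officedocument.spreadsheetml.template",
    ".csv": "text/csv",
    ".ods": "application/vnd.oasis.opendocument.spreadsheet",
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".odt": "application/vnd.oasis.opendocument.text",
    ".ppt": "application/vnd.ms-powerpoint",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".odp": "application/vnd.oasis.opendocument.presentation",
    ".html": "text/html",
    ".epub": "application/epub+zip",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
    ".tiff": "image/tiff",
}


def mime_for_path(path: str) -> Optional[str]:
    """Return the MIME type for a listed extension, or None if it is not listed."""
    # Compare case-insensitively so "REPORT.PDF" matches ".pdf".
    suffix = Path(path).suffix.lower()
    return SUPPORTED_MIME_TYPES.get(suffix)


if __name__ == "__main__":
    for name in ("report.PDF", "data.csv", "notes.txt"):
        print(name, "->", mime_for_path(name))
```

A None result means the extension does not appear in the tables above, so the file can be rejected or converted before it is submitted.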

Size Limits

See API Limits for file size and page limits.

Next Steps

Quickstart

Get started converting documents in minutes.

Document Conversion

Detailed guide to converting documents to Markdown, HTML, or JSON.

API Limits

Understand file size limits, page limits, and rate limiting.

File Upload

Upload files to Datalab storage for use in pipelines.