A few things to try to streamline Structured Extraction on very long files (50 or 100+ pages):
Use the page_range parameter in the API to ensure we only process the relevant pages. You’ll only be charged for those (even if your document is much longer).
When you submit your marker request, set page_range to the right values. For example: 0,2-4 will process pages 0, 2, 3, and 4. Note that this overrides max_pages if you set that too, and that our page ranges are 0-indexed (so 0 is the first page).
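To sanity-check a page_range value before submitting it, the string can be expanded locally. The following is a minimal sketch, not part of the API: expand_page_range is a hypothetical helper that assumes the comma-and-dash syntax shown above and 0-indexed pages.

```python
def expand_page_range(spec: str) -> list[int]:
    """Expand a page_range string such as "0,2-4" into sorted 0-indexed pages.

    Hypothetical local helper for checking a value before sending it;
    it only mirrors the syntax described in the text above.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    return sorted(pages)


print(expand_page_range("0,2-4"))  # [0, 2, 3, 4]
```

Counting the returned list gives the number of pages the request should cover, which is a quick way to confirm the range before you submit.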
If the document has a Table of Contents, first set page_range to 0-6 (or whichever range includes the entire Table of Contents). Run it with an Extraction schema that’s designed to pull out a table of contents.
Then use the result to work out page_range values for each section.
Finally, run marker using each page_range and the corresponding extraction schema for the info you know is in them.
In code, this workflow involves marker submission, polling, and dynamic page range extraction.
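The table-of-contents workflow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the official client: section_page_ranges and poll_until_complete are hypothetical helpers, the offset parameter (what to add to a printed page number to get its 0-indexed position) is an assumption about how your TOC maps to the file, and the "status" / "complete" response fields are assumed rather than taken from the text. The HTTP submission itself is left to a callable you supply.

```python
import time
from typing import Callable


def section_page_ranges(
    toc: list[tuple[str, int]], total_pages: int, offset: int = 0
) -> dict[str, str]:
    """Turn (title, printed_start_page) TOC entries into page_range strings.

    Each section runs up to the page before the next section starts;
    the last section runs to the end of the document. If sections can
    end mid-page, you may want to extend each range by one page.
    """
    ranges: dict[str, str] = {}
    for i, (title, start) in enumerate(toc):
        first = start + offset
        if i + 1 < len(toc):
            last = toc[i + 1][1] + offset - 1
        else:
            last = total_pages - 1
        last = max(first, last)  # two sections starting on the same page
        ranges[title] = str(first) if first == last else f"{first}-{last}"
    return ranges


def poll_until_complete(
    fetch_status: Callable[[], dict], interval_s: float = 2.0, max_polls: int = 150
) -> dict:
    """Call fetch_status until it reports completion (assumed field names)."""
    for _ in range(max_polls):
        result = fetch_status()
        if result.get("status") == "complete":  # assumed response shape
            return result
        time.sleep(interval_s)
    raise TimeoutError("request did not complete in time")


# Example: entries pulled from the TOC pass, printed page 1 at index 7.
toc = [("Introduction", 1), ("Methods", 5), ("Results", 12)]
print(section_page_ranges(toc, total_pages=30, offset=6))
# {'Introduction': '7-10', 'Methods': '11-17', 'Results': '18-29'}
```

Each resulting string can then be passed as page_range in its own marker request, paired with the extraction schema for that section, with poll_until_complete wrapping whatever function checks the request's status.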