Input Format "Pdf"
This input format allows you to split a multi-page PDF file into several individual files. The split is performed based on certain characteristics of the PDF file's text content, which is evaluated page by page. Only the native content of the PDF file is used to perform the split. Text from embedded image data that requires OCR processing is not included.
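
The page-by-page evaluation can be pictured with a small sketch. This is not the product's implementation: the function name `split_pages`, the predicate `starts_new_part`, and the rule that a matching page opens a new partial file are assumptions made purely for illustration. The actual behavior depends on the configured split mode.

```python
from typing import Callable, List


def split_pages(page_texts: List[str],
                starts_new_part: Callable[[str], bool]) -> List[List[int]]:
    """Group 1-based page numbers into parts.

    Assumption (hypothetical rule): a page whose native text satisfies
    the predicate opens a new part; the first page always opens part one.
    """
    parts: List[List[int]] = []
    for page_no, text in enumerate(page_texts, start=1):
        # Open a new part on the very first page or on a matching page.
        if not parts or starts_new_part(text):
            parts.append([])
        parts[-1].append(page_no)
    return parts


# Hypothetical native text of a five-page PDF, one string per page.
pages = ["Invoice 1001", "continued", "Invoice 1002", "continued", "continued"]
print(split_pages(pages, lambda text: text.startswith("Invoice")))
# → [[1, 2], [3, 4, 5]]
```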
| Property | Description |
|---|---|
| InputFormat[].SplitMode | Definition of the separation mode. The split mode defines how to identify the pages at which a split into a new partial file is performed. When splitting, the original document is discarded and a copy of it is generated for each partial file instead. The partial file is added to this copy as another attachment. The split documents generated are given the name suffix The following modes are available: For the In The |
| InputFormat[].SplitFieldDef(*) | Definition of the extraction range in Use the same syntax here as for the PDF index data reader (see Index Data Reader "Pdf"). A page range need not be specified, because in this context the evaluation is performed implicitly for each page. |
| InputFormat[].SplitValue[](*) | Definition of one or more search terms in Caution: Specify the page number for the The search terms can be wildcard expressions (with wildcards |
| InputFormat[].Tolerance | Tolerance range in millimeters. The tolerance range specifies how far the coordinates of a text fragment may deviate from a given value and still be considered a match for that value. Default value: |
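
The two matching ideas in the table, a coordinate tolerance in millimeters and wildcard search terms, can be sketched as follows. Both helper names are hypothetical, and several details are assumptions rather than documented behavior: the tolerance bound is treated as inclusive, and the wildcard syntax is taken to be the `*` and `?` conventions of Python's `fnmatch` module, which may differ from the wildcards this input format actually supports.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def within_tolerance(actual_mm: float, expected_mm: float,
                     tolerance_mm: float) -> bool:
    """True if a coordinate deviates from the expected value by at most
    the tolerance (all values in millimeters; inclusive bound assumed)."""
    return abs(actual_mm - expected_mm) <= tolerance_mm


def matches_any(text: str, patterns: Iterable[str]) -> bool:
    """True if the text matches at least one wildcard pattern.

    Assumption: '*' matches any run of characters and '?' matches a
    single character, as in fnmatch; matching is case-sensitive.
    """
    return any(fnmatchcase(text, pattern) for pattern in patterns)


# A fragment found 0.4 mm away from the expected position, with 0.5 mm tolerance.
print(within_tolerance(20.4, 20.0, 0.5))            # → True
print(within_tolerance(21.0, 20.0, 0.5))            # → False
print(matches_any("Invoice 1001", ["Invoice *"]))   # → True
```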