Reading Out File Contents and File Properties
EvalKositValidatorReport()
This function evaluates a report of the KoSIT validator which was previously created with the file macro CreateKositValidatorReport(). The function returns the validation result (i.e., the recommendation to accept/reject the document from the "assessment" node).
Optionally, additional information about the recognized scenario and further details can be read out and assigned to specific target fields.
Return type: Boolean
Parameter | Data Type | Description |
|---|---|---|
1 | Text | Name filter for the file attachment to be processed, whereby only the first attachment found is taken into account Default value: |
2 | Text | Name of a target field in the document to which the name of the recognized validation scenario is assigned (optional) Return type: text |
3 | Text | Name of a target field in the document to which detailed information about the validation result extracted from the report is assigned (optional). Return type: text. Syntactically, this information is provided in the form of a JSON expression. |
Examples
EvalKositValidatorReport("*.report.xml", "Scenario", "ReportDetails") returns the validation result from the report file attachment found (e.g., TRUE). The name of the recognized scenario and checked details are written to the Scenario and ReportDetails fields.
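The evaluation can be pictured outside the program as follows. This is a minimal Python sketch, not the program's implementation. It assumes a report layout in which an `assessment` element contains either an `accept` or a `reject` child and a `scenarioMatched` element contains a `name` element. The helper names `local` and `eval_report`, the namespace, and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def eval_report(xml_text):
    """Return (accepted, scenario_name) for a validator report.

    Assumed layout (hypothetical): <assessment> holds <accept> or <reject>,
    and <scenarioMatched> holds a <name> element somewhere below it.
    """
    root = ET.fromstring(xml_text)
    accepted = False
    scenario = ""
    for elem in root.iter():
        name = local(elem.tag)
        if name == "assessment":
            # The recommendation is the presence of an <accept> child.
            accepted = any(local(child.tag) == "accept" for child in elem)
        elif name == "scenarioMatched":
            for sub in elem.iter():
                if local(sub.tag) == "name" and sub.text:
                    scenario = sub.text.strip()
                    break
    return accepted, scenario

# Hypothetical sample report; the namespace URI is a placeholder.
SAMPLE = """<rep:report xmlns:rep="urn:example:report">
  <rep:scenarioMatched><rep:name>Sample scenario</rep:name></rep:scenarioMatched>
  <rep:assessment><rep:accept/></rep:assessment>
</rep:report>"""

print(eval_report(SAMPLE))
```

The sketch matches elements by local name only, so it does not depend on the namespace prefix used in a particular report.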
ExtractFullTextOcr()
Using OCR, this function extracts the text content of an image file or of a PDF file that contains raster images.
Common image file formats are supported, including multi-page files in the case of TIFF. For PDF files, the native text content is extracted and OCR is additionally performed on embedded images.
Return type: text
Parameter | Data Type | Description |
|---|---|---|
1 | Text | Name filter for the file attachment to be processed, whereby only the first attachment found is taken into account Default value: |
2 | Text | Pages to be included: |
3 | Text | Language of the OCR dictionary to be used (e.g., The appropriate dictionary file must be available in the program directory for the respective language (e.g., |
4 | Bool | Boolean value determining whether, in the OCR, only the full-page images contained in the PDF will be included (e.g., scanned pages included in PDF) Default value: Otherwise, all images embedded in a PDF page are processed by OCR. |
5 | Number | Timeout value defining the number of seconds after which OCR processing of a single page is aborted if no result has become available (optional). The text content of such a page is then not adopted, and processing continues with the next page. |
Examples
ExtractFullTextOcr("*.tif", "1-3", "German") returns the text content of the first three pages of a TIFF file.
ExtractFullTextPdf()
This function reads the native text content of a PDF file attachment.
Return type: text
Parameter | Data Type | Description |
|---|---|---|
1 | Text | Name filter for the file attachment to be processed, whereby only the first attachment found is taken into account Default value: |
2 | Text | Pages to be included: |
Examples
ExtractFullTextPdf("*.pdf", "First") returns the text content of the first page of a PDF file.
FindEInvoiceFileByFormat(), FindEInvoiceFilesByFormat()
The FindEInvoiceFileByFormat() function reads the name of the first file attachment found that corresponds to a specific electronic invoice format (return type: text). If no matching file attachment is found, the return value is an empty string.
The FindEInvoiceFilesByFormat() function reads the names of all file attachments found that correspond to a specific electronic invoice format (return type: array of text values). If no matching file attachments are found, the return value is an empty array.
XML is supported as a file type and PDF is also supported for the ZUGFeRD format. However, ZUGFeRD files in the outdated 1.x format cannot be processed by the program, for which reason they will not appear in search results.
Parameter | Data Type | Description |
|---|---|---|
1 | Text | Name filter for the file attachments to be processed Default value: |
2* | Text | Identification of the e-invoice format you are looking for or comma-separated list of multiple formats: |
Examples
FindEInvoiceFileByFormat("*.xml", "XRechnung") returns the name of the first XRechnung file attachment found (e.g., "invoice1.xml" or "" if there is no match).
FindEInvoiceFilesByFormat("*.xml", "XRechnung") returns the names of all XRechnung file attachments found (e.g., ["invoice1.xml", "invoice2.xml"] or [] if there are no hits).
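The difference between the two functions (first match versus all matches, empty string versus empty array) can be sketched in Python. This is an illustration under assumptions, not the program's implementation: the attachment list with pre-detected formats is hypothetical, and shell-style wildcards via `fnmatch` stand in for the program's own name filter syntax.

```python
from fnmatch import fnmatch

def find_files_by_format(attachments, name_filter, wanted_formats):
    """Return the names of all attachments that match the filter and format.

    `attachments` is a hypothetical list of (file_name, detected_format) pairs;
    `wanted_formats` is a comma-separated list such as "XRechnung,Zugferd".
    """
    wanted = {fmt.strip() for fmt in wanted_formats.split(",")}
    return [name for name, fmt in attachments
            if fnmatch(name, name_filter) and fmt in wanted]

def find_file_by_format(attachments, name_filter, wanted_formats):
    """Return the first matching name, or an empty string if there is none."""
    matches = find_files_by_format(attachments, name_filter, wanted_formats)
    return matches[0] if matches else ""

# Hypothetical attachments with formats assumed to be detected beforehand.
ATTACHMENTS = [("invoice1.xml", "XRechnung"),
               ("notes.xml", "Unknown"),
               ("invoice2.xml", "XRechnung")]

print(find_file_by_format(ATTACHMENTS, "*.xml", "XRechnung"))
print(find_files_by_format(ATTACHMENTS, "*.xml", "Zugferd"))
```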
GetEInvoiceFileFormat()
This function determines for an XML file attachment or a (ZUGFeRD) PDF file attachment with an embedded XML file which known e-invoice format this file corresponds to.
For this purpose, only the relevant XML node with the format identifier is evaluated. No further validation of the XML content is performed. Optionally, additional information about the recognized format and version can be extracted and assigned to specific target fields.
Return type: text (PeppolInvoice, PeppolPintInvoice, XRechnung, Zugferd, or Unknown)
Parameter | Data Type | Description |
|---|---|---|
1 | Text | Name filter for the file attachment to be processed, whereby only the first attachment found is taken into account Default value: |
2 | Text | Name of a target field in the document (optional). The recognized PEPPOL document type, XRechnung syntax, or ZUGFeRD version is assigned to the target field. Return type: text |
3 | Text | Name of a target field in the document (optional). The recognized PEPPOL version, XRechnung version, or ZUGFeRD profile is assigned to the target field. Return type: text |
4 | Text | Name of a target field in the document (optional). An error message text is assigned to the target field if no known e-invoice format is recognized. Return type: text |
Examples
GetEInvoiceFileFormat("*.xml", "Syntax", "Version", "Error") returns the type of an XML file attachment (e.g., "XRechnung"). The syntax and version of the file attachment are also written to the relevant target fields (e.g., "UblInvoice" and "Version_3_0"). If the type is not recognized (type Unknown), the cause of the error is written to the Error field.
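Detection based on the format identifier node alone can be sketched in Python as follows. This is a rough illustration, not the program's detection logic: the fragment list `PATTERNS` and the function name `detect_format` are assumptions, and the sketch only looks at the UBL `CustomizationID` element or the CII `GuidelineSpecifiedDocumentContextParameter` element without validating anything else.

```python
import xml.etree.ElementTree as ET

# Assumed identifier fragments (illustrative, not the program's pattern list).
PATTERNS = [
    ("xrechnung", "XRechnung"),
    ("peppol", "PeppolInvoice"),
    ("factur-x", "Zugferd"),
    ("zugferd", "Zugferd"),
]

def detect_format(xml_text):
    """Classify an e-invoice by its format identifier node only."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop '{namespace}' prefix
        # UBL carries the identifier in CustomizationID; CII carries it in
        # GuidelineSpecifiedDocumentContextParameter/ID.
        if tag == "CustomizationID":
            ident = elem.text or ""
        elif tag == "GuidelineSpecifiedDocumentContextParameter":
            ident = "".join(child.text or "" for child in elem)
        else:
            continue
        ident = ident.lower()
        for fragment, name in PATTERNS:
            if fragment in ident:
                return name
        return "Unknown"
    return "Unknown"

SAMPLE_UBL = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
  xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:CustomizationID>urn:cen.eu:en16931:2017#compliant#urn:xeinkauf.de:kosit:xrechnung_3.0</cbc:CustomizationID>
  <cbc:ID>00004711</cbc:ID>
</Invoice>"""

print(detect_format(SAMPLE_UBL))
```

Because only the identifier is inspected, a file classified this way may still be invalid in other respects.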
GetExternalFileContent()
This function reads the content of an external file from the file system as a text value.
Return type: text
Parameter | Data Type | Description |
|---|---|---|
1* | Text | Full path of the file in the file system |
Examples
GetExternalFileContent("c:/test.txt") returns the text content of the specified file.
GetExternalFileProperty()
This function reads a property of an external file from the file system.
The return type varies depending on the requested property.
Parameter | Data Type | Description |
|---|---|---|
1* | Text | Full path of the file in the file system |
2* | Text | Name of the property to be read: |
GetFileContent()
This function reads the content of a file attachment as a text value.
Return type: text
Parameter | Data Type | Description |
|---|---|---|
1* | Text | Name filter for the file attachment to be read in, whereby only the first attachment found is taken into account Default value: |
2* | Text | Restriction of the search to file attachments of a certain type (default value: all attachments) (optional): |
Examples
GetFileContent("*.txt", "Original") returns the content of a text file attachment of the type "Original".
GetImageProperty()
This function reads a property of an image file attachment. The common raster image file formats are supported.
The return type varies depending on the requested property.
Parameter | Data Type | Description |
|---|---|---|
1 | Text | Name filter for the file attachment to be processed, whereby only the first attachment found is taken into account Default value: |
2* | Text | Name of the property to be read: |
Examples
GetImageProperty("*.tif", "PageCount") returns the number of pages of a TIFF file attachment (e.g., 3).
GetJsonProperty()
This function reads the value of a JSON property from a field value or from a file that constitutes a JSON document in terms of content.
The return type varies depending on the data type of the value. For date values that are saved as a string in JSON format, the parser only performs an implicit conversion to a date value for a common syntax (e.g., ISO format).
If multiple values are read depending on the third call parameter, the values will be returned in an array, even if only a single value is found. If a property is not found at all or has the value NULL, the return value will be an empty string or an empty array.
Parameter | Data Type | Description |
|---|---|---|
1 | Text | Value or name of a field, or name filter for the file attachment to be processed, whereby only the first attachment found is taken into account. First, an attempt is made to interpret the value directly as a JSON document. Then, an attempt is made to find a field with the same name. If no field is found, a search is performed for a matching file. Default value: |
2* | Text | JSONPath expression for addressing the JSON property to be read out. Use the same syntax as with the index data reader "Json". |
3 | Bool | Boolean value determining whether all values are read in for a property with potentially multiple values (i.e., an array). If not, only the first value is adopted. Default value: |
Examples
GetJsonProperty("JsonData", "$.Name") returns the value of a Name property from the JSON data in a field called JsonData (e.g., "Value1").
GetJsonProperty("JsonData", "$.Name") is comparable to the previous example. Here, the source field is not addressed as a variable, but by its name.
GetJsonProperty("*.json", "$.Names[*]", TRUE) returns all values of an array Names from a JSON file attachment (e.g., ["Value1", "Value2"]).
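The return rules described above (array when all values are requested, empty string or empty array when nothing is found or the value is NULL) can be sketched in Python. This is a simplified illustration, not the program's parser: the function `get_json_property` is hypothetical and resolves only a tiny JSONPath subset (`$`, `.name` steps, and a `[*]` suffix).

```python
import json

def get_json_property(document, path, all_values=False):
    """Resolve a tiny JSONPath subset: '$', '.name' steps and a '[*]' suffix."""
    nodes = [json.loads(document)]
    for step in path.lstrip("$").strip(".").split("."):
        if not step:
            continue
        key, wildcard = (step[:-3], True) if step.endswith("[*]") else (step, False)
        found = []
        for node in nodes:
            # Missing keys and null values both count as "not found".
            if isinstance(node, dict) and node.get(key) is not None:
                value = node[key]
                if wildcard and isinstance(value, list):
                    found.extend(value)
                else:
                    found.append(value)
        nodes = found
    if all_values:
        return nodes  # always a list, even for a single hit
    return nodes[0] if nodes else ""

DOC = '{"Name": "Value1", "Names": ["Value1", "Value2"], "Empty": null}'

print(get_json_property(DOC, "$.Name"))
print(get_json_property(DOC, "$.Names[*]", True))
```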
GetPdfProperty()
This function reads a property of a PDF file attachment.
The return type varies depending on the requested property.
Parameter | Data Type | Description |
|---|---|---|
1 | Text | Name filter for the file attachment to be processed, whereby only the first attachment found is taken into account Default value: |
2* | Text | Name of the property to be read: |
3 | (variable) | First additional parameter that applies only to certain properties (optional) |
4 | (variable) | Second additional parameter that applies only to certain properties (optional) |
Examples
GetPdfProperty("*.pdf", "EmbeddedNames", "*.xml") returns the names of embedded XML files in a PDF file attachment (e.g., ["factur-x.xml"]).
GetXmlNode()
This function reads the text content of a node from a field value or file attachment that represents an XML document in terms of content.
The return value is always of type "Text." If required, the return value must be converted to the desired target type. If multiple nodes are read in depending on the fourth call parameter, the values are returned in an array, even if only a single node is found. If a node is not found at all, the return value will be an empty string or an empty array.
Parameter | Data Type | Description |
|---|---|---|
1 | Text | Value or name of a field, or name filter for the file attachment to be processed, whereby only the first attachment found is taken into account. First, an attempt is made to interpret the value directly as an XML document. Then, an attempt is made to find a field with the same name. If no field is found, a search is performed for a matching file. Default value: |
2* | Text | XPath expression for addressing the XML node to be read out. Use the same syntax as for the index data reader "Xml". |
3 | Bool | Boolean value determining whether namespace information contained in XML documents is removed from them. Removing the namespace information can avoid parsing problems. An XPath expression for referencing nodes must then also be specified without a namespace prefix. Default value: |
4 | Bool | Boolean value determining whether all values of a node that may occur multiple times are read in (as an array). If not, only the first value of the node is adopted. Default value: |
Examples
GetXmlNode("XmlData", "/ubl:Invoice/cbc:ID") returns the value of the ID node from the XML data in a field named XmlData (e.g., "00004711").
GetXmlNode("XmlData", "/ubl:Invoice/cbc:ID") is similar to the previous example. Here, the source field is not addressed as a variable, but by its name.
GetXmlNode("*.xml", "/Invoice/InvoiceLine/ID", TRUE, TRUE) returns the values of all ID nodes from an XML file attachment (e.g., ["1", "2"]).
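The effect of removing namespace information before evaluating a prefix-free path can be sketched in Python. This is an illustration only, not the program's implementation: the function `get_xml_node` and the sample document are hypothetical, and because ElementTree supports only a limited XPath subset, the path is written relative to the root element rather than as an absolute path.

```python
import xml.etree.ElementTree as ET

def get_xml_node(xml_text, path, remove_namespaces=False, all_values=False):
    """Read node text via ElementTree's limited XPath, relative to the root."""
    root = ET.fromstring(xml_text)
    if remove_namespaces:
        for elem in root.iter():
            # ElementTree stores qualified names as '{uri}local'.
            elem.tag = elem.tag.rsplit("}", 1)[-1]
    values = [node.text or "" for node in root.findall(path)]
    if all_values:
        return values  # always a list, even for a single node
    return values[0] if values else ""

# Hypothetical document with a default namespace.
SAMPLE = """<Invoice xmlns="urn:example:invoice">
  <InvoiceLine><ID>1</ID></InvoiceLine>
  <InvoiceLine><ID>2</ID></InvoiceLine>
</Invoice>"""

print(get_xml_node(SAMPLE, "./InvoiceLine/ID", True, True))
print(repr(get_xml_node(SAMPLE, "./InvoiceLine/ID")))
```

Without namespace removal, the prefix-free path finds nothing in this sample, which is why the second call yields an empty string.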
IsPeppolXml(), IsPeppolPintXml(), IsUblInvoiceXml(), IsXRechnungXml(), IsZugferdXml(), IsZugferdPdf()
These functions determine whether an XML file attachment or a (ZUGFeRD) PDF file attachment with an embedded XML file corresponds to a known PEPPOL, UBL, XRechnung, or ZUGFeRD format.
For this purpose, only the relevant XML node with the format identifier is evaluated. No further validation of the XML content is performed. Optionally, additional information about the recognized format and version can be read out and assigned to specific target fields.
The syntax of the version identifier may vary. Only detection patterns for major versions are stored in the program, so the program does not have to be adapted for each new minor version. For example, if the major version 2.x is known to the program, the returned identifier is Version_2_x. For unknown major versions, the entire version number is read dynamically from the ID string of the e-invoice; in this case, the identifier contains the minor version number instead of x.
Return type: Boolean
Parameter | Data Type | Description |
|---|---|---|
1 | Text | Name filter for the file attachment to be processed, whereby only the first attachment found is taken into account Default value for Default value for Notice: If a name pattern that is also suitable for PDF files is transferred to the |
2 | Text | Name of a target field in the document to which the recognized PEPPOL/UBL document type, XRechnung syntax, or ZUGFeRD version is assigned (optional) Return type: text |
3 | Text | Name of a target field in the document to which the recognized PEPPOL/UBL version, XRechnung version, or recognized ZUGFeRD profile is assigned (optional) Return type: text |
4 | Text | Name of a target field in the document to which an error message text is assigned if the file attachment was not recognized as the desired format (optional) Return type: text |
Examples
IsXRechnungXml("*.xml", "Syntax", "Version", "Error") returns whether an XML file attachment is an XRechnung (e.g., TRUE). The syntax and version are also written to the relevant target fields (e.g., "UblInvoice" and "Version_3_0"). In the event of a negative result, the cause of the error is written to the Error field.
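The rule for version identifiers described for these functions can be sketched in Python. This is a hedged illustration: the function `version_identifier` and the set of known major versions are hypothetical, and the exact spelling of the identifier for unknown major versions is an assumption derived from the description.

```python
import re

def version_identifier(version, known_majors):
    """Build an identifier such as 'Version_2_x' from a version string.

    The spelling used for unknown major versions is an assumption here.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if not match:
        return ""
    major, minor = match.group(1), match.group(2) or "0"
    if int(major) in known_majors:
        return f"Version_{major}_x"    # minor version deliberately masked
    return f"Version_{major}_{minor}"  # taken from the ID string as-is

print(version_identifier("2.3", {1, 2}))
print(version_identifier("4.1", {1, 2}))
```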
ReadBarcode()
This function reads barcode values from a (multi-page) TIFF or PDF file attachment. If only a single value is read, the return type will be text. If multiple values are read, the return type will be an array.
A scalar value is returned only if the search is restricted to a single page (First, Last, or a single page number) and to a single value on that page (First or Last) by means of the parameters below. If no value is found, an empty string or an empty array is returned.
Parameter | Data Type | Description |
|---|---|---|
1 | Text | Name filter for the file attachment to be processed, whereby only the first attachment found is taken into account Default value: |
2 | Text | Type of barcodes to search for: |
3 | Text | Pages to be included: |
4 | Text | Within a page, found barcode locations to be used: |
5 | Text | Filter to limit the search to barcodes with specific content or structure (optional) Syntax: see Name Filter Syntax. Default value: |
6 | Number | Resolution (dpi) for implicit conversion to raster images required for PDF pages before barcode recognition Default value: |
Examples
ReadBarcode("*.tif", "Simple", , , "A#######") returns the value of a barcode of a given pattern on the first page of a TIFF file attachment (e.g., "A0000001" or "" if there is no match).
ReadBarcode("*.tif", "Simple", "All", "All") returns the values of all barcodes on all pages of a TIFF file attachment (e.g., ["A0000001", "A0000002", "B0000001"] or [] if there is no match).
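The distinction between a scalar and an array result can be sketched in Python. This is an illustration of the rule as described, not the program's implementation: the function names are hypothetical, `values` stands for barcode values that have already been selected, and the defaults of "First" for pages and locations are an assumption based on the first example.

```python
def is_scalar_search(pages="First", locations="First"):
    """True if the search can yield at most one value (assumed rule)."""
    single_page = pages in ("First", "Last") or pages.isdigit()
    single_hit = locations in ("First", "Last")
    return single_page and single_hit

def shape_result(values, pages="First", locations="First"):
    """Return text for a scalar search, otherwise a list."""
    if is_scalar_search(pages, locations):
        return values[0] if values else ""
    return list(values)

print(repr(shape_result(["A0000001"])))
print(shape_result(["A0000001", "A0000002"], "All", "All"))
print(repr(shape_result([])))
```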