The Processing Step
The processing step follows the input step. The processing worker periodically checks the administrative database for documents that are available for further processing. Usually, these documents originate from the preceding input step; however, they may also have been in the processing step before. In that case, they were put on hold there because of an error or a check. Once the issue behind the hold is resolved, the documents are queued for processing again.
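The polling behavior described above can be sketched as follows. This is a minimal illustration and not the product's implementation: the status names, the Document and AdminDatabase classes, and the poll_once function are all hypothetical, and an in-memory list stands in for the administrative database.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical status names; the real database schema is not described here.
READY = "ready"
ON_HOLD = "on_hold"
IN_PROCESS = "in_process"


@dataclass
class Document:
    doc_id: int
    status: str = READY
    hold_reason: Optional[str] = None


@dataclass
class AdminDatabase:
    """In-memory stand-in for the administrative database (assumption)."""
    documents: list = field(default_factory=list)

    def fetch_ready(self):
        # Documents from the input step and released holds look the same here.
        return [d for d in self.documents if d.status == READY]

    def release_hold(self, doc_id):
        # Called once the error or check behind the hold has been resolved.
        for d in self.documents:
            if d.doc_id == doc_id and d.status == ON_HOLD:
                d.status = READY
                d.hold_reason = None


def poll_once(db):
    """One polling cycle: claim every document that is ready for processing."""
    claimed = db.fetch_ready()
    for doc in claimed:
        doc.status = IN_PROCESS
    return [d.doc_id for d in claimed]


db = AdminDatabase([Document(1), Document(2, ON_HOLD, "error")])
first = poll_once(db)    # only document 1 is picked up
db.release_hold(2)       # the issue behind the hold is resolved
second = poll_once(db)   # document 2 is queued for processing again
```

A real worker would repeat the polling cycle at a configured interval; the sketch runs it twice by hand to show that a held document is only picked up after its hold is released.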
Once the processing worker has found documents, the actual processing starts. The initial step is extraction of the document index data. Index data readers for specific input formats are used for this purpose. The basis for reading the index data is always a file attachment in the corresponding format. This could be, for example, an XML file that was read in the input step. If a document contains alternate or parallel file attachments in different formats, several index data readers can also be executed one after the other.
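As a rough sketch of running several index data readers one after the other, the following example selects a reader by file extension and merges the results. The reader functions, the READERS table, the file names, and the rule that later readers overwrite values from earlier ones are assumptions made for illustration; the actual readers and their merge behavior are defined by the configuration.

```python
import json
import xml.etree.ElementTree as ET


def read_xml(raw):
    # Assumes a flat XML layout: each child of the root element is one field.
    root = ET.fromstring(raw)
    return {child.tag: child.text for child in root}


def read_json(raw):
    # Assumes a flat JSON object with the fields at the top level.
    return json.loads(raw)


# Hypothetical mapping of attachment formats to index data readers.
READERS = {".xml": read_xml, ".json": read_json}


def extract_index_data(attachments):
    """Run every matching reader in attachment order and merge the results."""
    index_data = {}
    for name, raw in attachments:
        for suffix, reader in READERS.items():
            if name.lower().endswith(suffix):
                index_data.update(reader(raw))
    return index_data


attachments = [
    ("invoice.xml", "<invoice><Number>4711</Number></invoice>"),
    ("index.json", '{"Subject": "Invoice 4711"}'),
    ("scan.pdf", ""),  # no reader is registered for this format, so it is skipped
]
result = extract_index_data(attachments)
```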
If an input system, such as a native email interface, does not explicitly provide the index data as a file, an artificial index file attachment is generated in the input step for the reader to refer to. For more information on the name and structure of this artificial file attachment, see the description of the respective input system. Usually, the artificial file attachment is in JSON format and has the file name index.json, so it can be processed with the index data reader for JSON files. The content structure is kept as simple and flat as possible. In the case of mail interfaces, for example, the properties are arranged at the top level and can be addressed with the names "Subject" and "Body".
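To illustrate the flat structure, the following sketch parses a hypothetical index.json for an email and reads the two properties named above. Only the property names "Subject" and "Body" come from the description; the sample values and the read_json_index function are assumptions, and the real file may contain further properties.

```python
import json

# Hypothetical content of an artificial index.json generated for an email.
index_json = '{"Subject": "Invoice 4711", "Body": "Please find the invoice attached."}'


def read_json_index(raw):
    """Parse a flat JSON index file into a dict of top-level properties."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("index file must contain a JSON object at the top level")
    return data


index_data = read_json_index(index_json)
subject = index_data["Subject"]  # properties are addressed by their top-level names
body = index_data["Body"]
```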
While the index data is being read, the document is assigned to a field catalog. A field catalog consists of a fixed set of fields that a document must have. These fields may originate in the index data, be pre-populated, or be generated internally.
It is also possible to declare other field catalogs as alternatives, to be assigned dynamically at runtime. When alternative field catalogs are declared, a default field catalog is read in first. Its contents can be used to formulate conditions that select a different field catalog. With regard to header data, a document always includes all fields of the catalog assigned to it (with some fields empty if there is no value to fill them with). In the case of item data, on the other hand, only as many item lines are generated as are available in the index data, plus any lines that the configuration requires to be added artificially. Each generated item line always includes all the fields belonging to that line.
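The header and item rules above can be sketched as follows. The FieldCatalog class, the two example catalogs, their field names, and the DocType condition are hypothetical; in the product, catalogs and conditions are defined in the configuration. Lines added artificially by the configuration are left out of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldCatalog:
    name: str
    header_fields: tuple
    item_fields: tuple


# Hypothetical catalogs for illustration only.
DEFAULT = FieldCatalog("default", ("DocType", "Number", "Vendor"), ("Position", "Amount"))
CREDIT = FieldCatalog(
    "credit_note", ("DocType", "Number", "Vendor", "Reference"), ("Position", "Amount")
)


def apply_catalog(catalog, index_header, index_items):
    # Header: every catalog field is present; missing values stay empty.
    header = {f: index_header.get(f, "") for f in catalog.header_fields}
    # Items: one line per line in the index data, each with all item fields.
    items = [{f: line.get(f, "") for f in catalog.item_fields} for line in index_items]
    return header, items


def choose_catalog(default_header):
    # Hypothetical condition formulated on the contents of the default catalog.
    return CREDIT if default_header["DocType"] == "CreditNote" else DEFAULT


index_header = {"DocType": "CreditNote", "Number": "4711"}
index_items = [{"Position": "1", "Amount": "10.00"}, {"Position": "2"}]

# The default catalog is read in first, then the condition picks the final one.
default_header, _ = apply_catalog(DEFAULT, index_header, index_items)
catalog = choose_catalog(default_header)
header, items = apply_catalog(catalog, index_header, index_items)
```

In this example the header ends up with empty "Vendor" and "Reference" fields, while the item data contains exactly the two lines found in the index data.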
If required, the name of an alternative processing scenario can be determined dynamically from the contents of the default field catalog. The alternative processing scenario is determined before the alternative field catalog, because a document that belongs to a different scenario is not processed any further at the current position anyway. Instead, the document is put on hold and waits in the processing step until it is processed again by the processing worker responsible for the scenario just determined.
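The order of the two decisions can be sketched as follows. The route function, the "Scenario" and "DocType" field names, the scenario names, and the shape of the returned dictionary are assumptions chosen for illustration; they are not part of the product's interface.

```python
def route(default_header, current_scenario):
    """Decide the scenario first; choose a field catalog only if the document stays."""
    # Hypothetical rule: a non-empty "Scenario" field names the alternative scenario.
    target = default_header.get("Scenario") or current_scenario
    if target != current_scenario:
        # The document is put on hold for the worker of the other scenario,
        # so no alternative field catalog is determined here.
        return {"action": "hold", "scenario": target}
    # Hypothetical catalog condition, evaluated only when processing continues.
    is_credit = default_header.get("DocType") == "CreditNote"
    catalog = "credit_note" if is_credit else "default"
    return {"action": "process", "scenario": current_scenario, "catalog": catalog}


moved = route({"Scenario": "returns", "DocType": "CreditNote"}, "invoices")
stayed = route({"Scenario": "", "DocType": "CreditNote"}, "invoices")
```

Here the first document is held for the hypothetical "returns" scenario without a catalog being chosen, while the second stays in the current scenario and receives the alternative catalog.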
After the index data has been read, macro functions are used to prepare content and convert the data and the file attachments of the document in line with configuration. For more information on macro functions, see Functions.