AIE supports conditional routing in both ingestion and query-processing workflows. It is useful for splitting document lists by some criterion, for balancing load in a multi-node or multi-process configuration, and for expanding attachments or archives. Conditional routing is implemented by components that extend the AbstractDocumentRouter or AbstractQueryRouter class.
For the purposes of this discussion, "routers" and "splitters" both refer to a component that can conditionally send IngestDocuments from one workflow to another.
For a tutorial demonstration of routing, see Content Ingestion - Concepts and Tools.
Splitters use a rejoin switch that is coded as an XML property, but which is not an actual property of the underlying Java objects. This switch tells AIE whether to return the split documents back to their point of origin. If true (the default), AIE automatically constructs a joiner component and adds it to the workflow after the splitter.
Because rejoin operates at a higher level than the splitter class itself, it does not appear in the Javadoc for the splitter.
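As a rough illustration of how a configuration-level rejoin switch could lead to an automatically inserted joiner, the sketch below expands a configured stage list into an effective one. It is a toy model only: the Stage class, the expand method, and the ".joiner" naming are hypothetical and are not part of the AIE API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of rejoin handling; none of these names come from AIE. */
final class RejoinExpansionSketch {

    /** A configured stage: its name, whether it is a splitter, and its rejoin setting. */
    static final class Stage {
        final String name;
        final boolean splitter;
        final Boolean rejoin; // null means the switch was omitted from the configuration

        Stage(String name, boolean splitter, Boolean rejoin) {
            this.name = name;
            this.splitter = splitter;
            this.rejoin = rejoin;
        }
    }

    /** Returns the effective stage names, adding a joiner after each rejoining splitter. */
    static List<String> expand(List<Stage> configured) {
        List<String> effective = new ArrayList<>();
        for (Stage stage : configured) {
            effective.add(stage.name);
            // An omitted switch is treated as true, matching the documented default.
            boolean rejoin = stage.rejoin == null || stage.rejoin;
            if (stage.splitter && rejoin) {
                effective.add(stage.name + ".joiner"); // assumed naming, for illustration
            }
        }
        return effective;
    }
}
```

In this model, a splitter configured without a rejoin value gains a joiner stage, while one configured with rejoin set to false does not.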
Supported Routing Patterns
AIE currently supports four routing patterns: subflow, split, cycle, and split and rejoin.
A subflow is a non-conditional branch from one workflow to another. The second workflow behaves like a classic subroutine. Documents are processed by the subflow and are then returned to the original workflow.
The subflow stage is a predefined feature of the AIE workflow toolkit. Use it to call one workflow from another:
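The stage configuration itself is not reproduced here. As a minimal sketch of the call-and-return behavior only, the Java below models a workflow as a list of functions over a document (simplified to a String) and a subflow as a stage that runs another workflow to completion before the caller continues. The names SubflowSketch, run, subflow, and demo are hypothetical and do not correspond to AIE classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical model of a subflow stage; not the AIE workflow toolkit. */
final class SubflowSketch {

    /** Runs a document through each stage of a workflow in order. */
    static String run(List<UnaryOperator<String>> workflow, String doc) {
        String current = doc;
        for (UnaryOperator<String> stage : workflow) {
            current = stage.apply(current);
        }
        return current;
    }

    /** Wraps a whole workflow as one stage, so the calling workflow resumes afterwards. */
    static UnaryOperator<String> subflow(List<UnaryOperator<String>> target) {
        return doc -> run(target, doc);
    }

    /** Builds a two-stage main workflow whose first stage calls a cleanup workflow. */
    static String demo() {
        List<UnaryOperator<String>> cleanup =
            Arrays.asList(s -> s.trim(), s -> s.toLowerCase());
        List<UnaryOperator<String>> mainWorkflow =
            Arrays.asList(subflow(cleanup), doc -> doc + " [indexed]");
        return run(mainWorkflow, "  Hello AIE  ");
    }
}
```

Every document takes the same path here: the subflow is unconditional, which is what distinguishes it from the splitter patterns.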
AIE supports a number of "splitter" components, each of which can send an IngestDocument to one of N different workflows based on some criterion. In a pure "split" pattern, the processing paths depart the original workflow and do not return to it.
The most common splitter in AIE workflows is SplitDocumentListByFieldValue.
The simplest example is a router that sends documents to different workflows based on a field value. If the value of the document's sensitivity field is sensitive, then send the document to the sensitiveWorkflow. If the value of the sensitivity field is very sensitive, then send the document to the verySensitiveWorkflow. Otherwise, send it to the normalWorkflow:
You can specify the routing by creating the workflows and components and then opening the workflow in the AIE Administrator.
The underlying code is as follows:
In the example above, the rejoin property is false, which makes this a pure splitter pattern. (See "Split and Rejoin", below, for the opposite example.)
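The routing decision described for the sensitivity field can be modeled in a few lines of Java. This is a sketch of the decision logic only, assuming a document is represented as a plain map of field names to values. The class and method names are hypothetical, and the code does not use or reproduce the SplitDocumentListByFieldValue component.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of field-value routing; not the AIE splitter API. */
final class SensitivityRouterSketch {

    // Field value -> target workflow name, taken from the example in the text.
    private static final Map<String, String> ROUTES = new LinkedHashMap<>();
    static {
        ROUTES.put("sensitive", "sensitiveWorkflow");
        ROUTES.put("very sensitive", "verySensitiveWorkflow");
    }

    private static final String DEFAULT_WORKFLOW = "normalWorkflow";

    /** Returns the name of the workflow that should receive the document. */
    static String route(Map<String, String> documentFields) {
        String value = documentFields.get("sensitivity");
        // A missing or unrecognized value falls through to the default workflow.
        return ROUTES.getOrDefault(value, DEFAULT_WORKFLOW);
    }
}
```

Because nothing in this model sends the document back, it corresponds to the pure split pattern, where rejoin is false.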
A cyclic router sends documents back to an upstream workflow, which eventually returns those same documents to the router. The most common example involves a zip/archive file extractor component. Each time the extractor sees a zip file, it extracts all of the contained files, making a new IngestDocument out of each one. The next component in the workflow is usually a cyclic splitter that sends these new documents back to the beginning of the ingestion path. The textextraction workflow contains an example of a cyclic router.
In the following example, all new documents begin the ingestion process at the StartHere workflow. They all pass through the UnzipArchives workflow, where the ZipArchiveExtractor intercepts the documents that have zipped attachments. These are opened and the contents are extracted into new IngestDocuments, with the isNew field set to "true".
All the IngestDocuments, old and new, are then examined by the FilterNewDocs splitter. If isNew is "false" or is not set, the document is passed on to the following workflow. If isNew is "true", the document is sent back to StartHere, where isNew is reset to "false".
Eventually the "new" documents pass through the UnzipArchives workflow again. When all nested Zip archives have been extracted, all of these documents eventually pass the splitter and proceed to the next workflow.
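The cycle described above can be simulated with a queue standing in for the StartHere workflow. The following is a toy model under stated assumptions: the Doc class, its zippedChildren list, and the ingest method are hypothetical, documents are processed one at a time, and no real archive handling or AIE component is involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical simulation of the unzip cycle; not the AIE ZipArchiveExtractor. */
final class CyclicRoutingSketch {

    /** A toy document: a name plus the documents zipped inside it (empty for plain files). */
    static final class Doc {
        final String name;
        final List<Doc> zippedChildren;
        boolean isNew;

        Doc(String name, List<Doc> zippedChildren) {
            this.name = name;
            this.zippedChildren = zippedChildren;
        }
    }

    /** Returns the names of documents in the order they pass on to the next workflow. */
    static List<String> ingest(List<Doc> incoming) {
        Deque<Doc> startHere = new ArrayDeque<>(incoming);
        List<String> passedOn = new ArrayList<>();
        while (!startHere.isEmpty()) {
            Doc doc = startHere.removeFirst();
            doc.isNew = false; // StartHere resets the flag

            // UnzipArchives: each contained file becomes a new document flagged isNew.
            List<Doc> batch = new ArrayList<>();
            batch.add(doc);
            for (Doc child : doc.zippedChildren) {
                child.isNew = true;
                batch.add(child);
            }

            // FilterNewDocs: new documents cycle back, the rest move on.
            for (Doc d : batch) {
                if (d.isNew) {
                    startHere.addLast(d);
                } else {
                    passedOn.add(d.name);
                }
            }
        }
        return passedOn;
    }
}
```

The loop ends once no document produces further children, which mirrors the point at which all nested archives have been extracted.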
Note that the rejoin property is still false, meaning that the routed documents do not automatically return to the splitter. The following example shows what happens when rejoin is "true".
Split and Rejoin
When the rejoin property of a splitter is set to "true", the target workflows behave like subroutines. The splitter conditionally dispatches IngestDocuments to one or more external workflows. The workflows process the documents and then return them to the splitter. (Technically, the documents are returned to a Joiner component that automatically appears right after the splitter in the workflow.) The returned documents rejoin the main flow.
In the example below, documents that contain a specific field value are shunted aside to a special workflow, processed there, and are then returned to the original workflow where they rejoin the flow.
Note that the rejoin property is "true" this time. This automatically creates and inserts a "joiner" stage following the splitter in the workflow. (Note that "true" is the default value for this property, so split-and-join components often omit it from the XML configuration.)
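To contrast the two settings of rejoin, the sketch below models a main workflow with one splitter followed by one further stage. It is an illustration under simplifying assumptions: documents are Strings, the side workflow is a single function, and the class and method names are hypothetical rather than AIE API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical model of split-and-rejoin; not the AIE splitter or joiner classes. */
final class SplitAndRejoinSketch {

    /** Returns the documents that reach the end of the main workflow. */
    static List<String> process(List<String> docs,
                                Predicate<String> matches,
                                UnaryOperator<String> sideWorkflow,
                                boolean rejoin) {
        List<String> afterSplitter = new ArrayList<>();
        for (String doc : docs) {
            if (!matches.test(doc)) {
                afterSplitter.add(doc); // not routed: stays in the main flow
                continue;
            }
            String processed = sideWorkflow.apply(doc);
            if (rejoin) {
                afterSplitter.add(processed); // the joiner returns it to the main flow
            }
            // With rejoin == false the document has left this workflow for good.
        }

        // The stage that follows the splitter (and joiner) in the main workflow.
        List<String> finished = new ArrayList<>();
        for (String doc : afterSplitter) {
            finished.add(doc + " [indexed]");
        }
        return finished;
    }
}
```

With rejoin set to true, routed documents pass through the side workflow and then through the remaining main stages. With rejoin set to false, only the unrouted documents reach those stages.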
Do not reuse splitters!
Each rejoining splitter must be used in exactly one workflow. If you insert the same splitter instance into multiple workflows, you will find that the split documents all return to the same workflow. This behavior is usually not desirable.
Unsupported Routing Patterns
Combining the patterns above in a single routing configuration is not allowed. The example pictured below shows an invalid combination of a cycle and a splitter, along with the proper approach to achieving the same result.