Overview

AIE supports conditional routing in both ingestion and query processing workflows. It is useful for criteria-based document list splitting, for load balancing work in a multi-node or multi-process configuration, and for expanding attachments or archives. Conditional routing is implemented by components that extend the AbstractDocumentRouter or AbstractQueryRouter classes.

For the purposes of this discussion, "routers" and "splitters" both refer to a component that can conditionally send IngestDocuments from one workflow to another.

Routing Tutorial

For a tutorial demonstration of routing, see Content Ingestion - Concepts and Tools.

Rejoin Property

Splitters use a rejoin switch that is coded as an XML property, but which is not an actual property of the underlying Java objects. This switch tells AIE whether to return the split documents back to their point of origin. If true (the default), AIE automatically constructs a joiner component and adds it to the workflow after the splitter.

Because rejoin is not a property of the underlying Java objects, it does not appear in the Javadoc for the splitter classes. It operates at a higher level.

Supported Routing Patterns

At present, four routing patterns are supported.

Subflow

A subflow is a non-conditional branch from one workflow to another. The second workflow behaves like a classic subroutine. Documents are processed by the subflow and are then returned to the original workflow.

The subflow stage is a predefined feature of the AIE workflow toolkit. Use it to call one workflow from another:

SubflowDiagram
<workflow name="originalWorkflow" type="ingest">
  <documentTransformer name="stage1" />
  <subflow name="otherWorkflow" />
  <documentTransformer name="stage2" />
</workflow>
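
As a minimal sketch, the called workflow is assumed here to be an ordinary ingest workflow defined with the same syntax as the caller. The stage names stageA and stageB are hypothetical placeholders, not predefined AIE components:

```xml
<!-- Hypothetical definition of the workflow called by the subflow stage above. -->
<!-- stageA and stageB are placeholder names for illustration only. -->
<workflow name="otherWorkflow" type="ingest">
  <documentTransformer name="stageA" />
  <documentTransformer name="stageB" />
</workflow>
```

Under this assumption, a document passes through stage1, then stageA and stageB in otherWorkflow, and then returns to originalWorkflow to continue with stage2.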

Split Flow

The most common splitter in AIE workflows is SplitDocumentListByFieldValue .

AIE supports a number of "splitter" components, each of which can send an IngestDocument to one of N different workflows based on some criterion. In a pure "split" pattern, the processing paths depart the original workflow and do not return to it.

The simplest example is a router that sends documents to different workflows based on a field value. If the value of the document's sensitivity field is sensitive, then send the document to the sensitiveWorkflow. If the value of the sensitivity field is very sensitive, then send the document to the verySensitiveWorkflow. Otherwise, send it to the normalWorkflow:

DividingRouter

You can specify the routing by creating the workflows and components and then opening the workflow in the AIE Administrator.

 

The underlying code is as follows:

<components>
  <component name="splitSensitiveDocs"
    class="com.attivio.platform.transformer.ingest.routing.SplitDocumentListByFieldValue">
    <properties>
      <property name="input" value="sensitivity" />
      <property name="rejoin" value="false" />
      <map name="routingMap">
        <property name="sensitive" value="sensitiveWorkflow" />
        <property name="very sensitive" value="verySensitiveWorkflow" />
      </map>
      <property name="routingMapElse" value="normalWorkflow" />
    </properties>
  </component>
</components>

<workflow name="filterSensitiveDocs" type="ingest">
  <documentTransformer name="stage1" />
  <splitter name="splitSensitiveDocs" />
</workflow>

 

In the example above, the rejoin property is false, which makes this a pure splitter pattern. (See "Split and Rejoin", below, for the opposite example.)

Cyclic Flow

A cyclic router sends documents back to an upstream workflow, which eventually returns those same documents to the router. The most common example is a zip/archive file extractor component. Each time the extractor sees a zip file, it extracts all of the contained files, making a new IngestDocument out of each one. The next component in the workflow is usually a cyclic splitter that sends these new documents back to the beginning of the ingestion path. The textextraction workflow contains an example of a cyclic router.

In the following example, all new documents begin the ingestion process at the StartHere workflow. They all pass through the UnzipArchives workflow, where the ZipArchiveExtractor intercepts the documents that have zipped attachments. These are opened and the contents are extracted into new IngestDocuments, with the isNew field set to "true".

All the IngestDocuments, old and new, are then examined by the FilterNewDocs splitter. If isNew is "false" or is not set, the document is passed on to the following workflow. If isNew is "true", the document is sent back to StartHere, where isNew is reset to "false".

Eventually the "new" documents pass through the UnzipArchives workflow again. When all nested Zip archives have been extracted, all of these documents eventually pass the splitter and proceed to the next workflow.

CycleRouter
<components>
  <component name="FilterNewDocs"
    class="com.attivio.platform.transformer.ingest.routing.SplitDocumentListByFieldValue">
    <properties>
      <property name="input" value="isNew" />
      <property name="rejoin" value="false" />
      <map name="routingMap">
        <property name="true" value="StartHere" /> <!-- the cycle -->
        <!-- no else clause, so we fall through to the next component or workflow -->
      </map>
    </properties>
  </component>
</components>

<workflow name="UnzipArchives" type="ingest">
  <documentTransformer name="ZipArchiveExtractor" />
  <splitter name="FilterNewDocs" />
</workflow>

Note that the rejoin property is still false, meaning that the routed documents do not automatically return to the splitter. The following example shows what happens when rejoin is "true".

Split and Rejoin

When the rejoin property of a splitter is set to "true", the target workflows behave like subroutines. The splitter conditionally dispatches IngestDocuments to one or more external workflows. The workflows process the documents and then return them to the splitter. (Technically, the documents are returned to a Joiner component that automatically appears right after the splitter in the workflow.) The returned documents rejoin the main flow.

In the example below, documents that contain a specific field value are shunted aside to a special workflow, processed there, and are then returned to the original workflow where they rejoin the flow. 

SplitRouter
<component xmlns="http://www.attivio.com/configuration/type/componentType" 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
   name="mySplitter" 
   class="com.attivio.platform.transformer.ingest.routing.SplitDocumentListByFieldValue"
   xsi:schemaLocation="http://www.attivio.com/configuration/type/componentType http://www.attivio.com/configuration/type/componentType.xsd ">
  <properties>
    <property name="input" value="keyFieldName"/>
    <property name="rejoin" value="true" />
    <map name="routingMap">
        <property name="keyvalue" value="specialWorkflow" /> 
        <!-- no else clause, so we fall through to the next component or workflow -->
    </map> 
  </properties>
</component>

<workflow name="mainWorkflow" type="ingest">
  <documentTransformer name="stage1" />
  <splitter name="mySplitter" />
  <documentTransformer name="stage2" />
</workflow>

Note that the rejoin property is "true" this time. This automatically creates and inserts a "joiner" stage following the splitter in the workflow. (Note that "true" is the default value for this property, so split-and-join components often omit it from the XML configuration.)
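
Since "true" is the default, the same splitter could presumably be declared without the rejoin property. The sketch below is the mySplitter component from above with that one line dropped, wrapped in a components element as in the earlier examples. It has not been verified against a running AIE instance:

```xml
<components>
  <component name="mySplitter"
    class="com.attivio.platform.transformer.ingest.routing.SplitDocumentListByFieldValue">
    <properties>
      <property name="input" value="keyFieldName" />
      <!-- rejoin is omitted, so it takes its default value of "true" -->
      <map name="routingMap">
        <property name="keyvalue" value="specialWorkflow" />
        <!-- no else clause, so other documents fall through to the next stage -->
      </map>
    </properties>
  </component>
</components>
```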

Do not reuse splitters!

Each rejoining splitter must be used in exactly one workflow.  If you insert the same splitter instance into multiple workflows, you will find that the split documents all return to the same workflow.  This behavior is usually not desirable.
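
One way to follow this rule is to declare a separate splitter component for each workflow, even when the routing logic is identical. The following sketch assumes two hypothetical workflows, workflowA and workflowB, with hypothetical splitter names. Only the element and property names are taken from the examples above:

```xml
<components>
  <!-- One rejoining splitter per workflow; both component names are hypothetical. -->
  <component name="splitterForWorkflowA"
    class="com.attivio.platform.transformer.ingest.routing.SplitDocumentListByFieldValue">
    <properties>
      <property name="input" value="keyFieldName" />
      <property name="rejoin" value="true" />
      <map name="routingMap">
        <property name="keyvalue" value="specialWorkflow" />
      </map>
    </properties>
  </component>
  <component name="splitterForWorkflowB"
    class="com.attivio.platform.transformer.ingest.routing.SplitDocumentListByFieldValue">
    <properties>
      <property name="input" value="keyFieldName" />
      <property name="rejoin" value="true" />
      <map name="routingMap">
        <property name="keyvalue" value="specialWorkflow" />
      </map>
    </properties>
  </component>
</components>

<!-- Each workflow references its own splitter instance. -->
<workflow name="workflowA" type="ingest">
  <splitter name="splitterForWorkflowA" />
  <documentTransformer name="stage2" />
</workflow>

<workflow name="workflowB" type="ingest">
  <splitter name="splitterForWorkflowB" />
  <documentTransformer name="stage2" />
</workflow>
```

With separate instances, documents split from workflowA should rejoin workflowA, and documents split from workflowB should rejoin workflowB.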

 

Unsupported Routing Patterns

Combining the patterns above into a single routing configuration is not allowed. The example pictured below shows an invalid combination of a cycle and a splitter, along with the proper approach to achieving the same result.

UnsupportedRouter
