
Overview

The Attivio Intelligence Engine (AIE) includes a default set of components for processing content. These components include document transformers, query transformers, response transformers and routers.  (There are also some special-purpose components.) 

A component is a container for a transformer, which is an instance of a Java transformer class.  (A list of available transformer classes is here.) The component provides the transformer with a set of configuration properties that tell the transformer what we want it to do.  The component is then inserted into a workflow, which brings IngestDocuments, QueryRequests or SearchDocuments to the component and then takes the processed document on to the next stage of the workflow.  A component typically does not know anything about where the documents come from or where they are going next.  For this reason, a component can be used in multiple workflows simultaneously.

The same transformer may be used in multiple components (all with different names) if different sets of properties are required.

Use the AIE Administrator to modify components!

Most AIE components can be created, removed, modified and maintained directly from the AIE Administrator user interface using Dynamic Configuration.


Component Editing

AIE has many transformers, nearly all of which can be configured as components in the AIE Administrator interface.  The new component can then be added to a workflow, also using the AIE Administrator.  This section shows a simple example of creating, configuring, and using a custom component.

Using the factbook example (from the Quick Start Tutorial), navigate to the Palette and click the New link.  This opens a New Component dialog.  Filter for "capitalize" and select the Capitalize component type from the list.  Click OK to open a New Component Editor that is linked to the Capitalize transformer class.  This class takes text from an IngestDocument field and applies capitalization to it.  It can uppercase, lowercase or title case the text, and then write the result into the same or a different field.

Name the new component myCapitalizer.  Configure the field map to take text from the title field, modify it, and write it back into the title field.  The default mode is to uppercase the text.  Save the component.

Navigate to the AIE Administrator Workflows > Ingest page, and open the ingest workflow for editing. Use the Add Existing Component button to add the myCapitalizer component to the list of stages.  It will appear at the bottom of the list.  Use the Move Up button to put the new component in the next-to-last position, just before the indexer subflow. Save the changes.

Run (or rerun) the country and news feeds of the factbook demo, and use SAIL to view some results.  Use the Search Options debug checkbox to expose all document fields in the search results.  You'll see that the value of the title field is now entirely uppercase.

To view the XML configuration of the new component and the modified workflow, go to the AIE CLI window and issue the update command.  This writes the dynamic changes to configuration files in the <project-dir>\conf\ directory tree.  This is the XML configuration of myCapitalizer:

<project-dir>\conf\components\myCapitalizer.xml
<?xml version="1.0" encoding="UTF-8"?>

<component xmlns="http://www.attivio.com/configuration/type/componentType" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
name="myCapitalizer" 
class="com.attivio.platform.transformer.ingest.field.Capitalize" 
xsi:schemaLocation="http://www.attivio.com/configuration/type/componentType http://www.attivio.com/configuration/type/componentType.xsd ">
  <!--Generated configuration-->
  <properties>
    <map name="fieldMapping">
      <property name="title" value="title"/>
    </map>
  </properties>
</component>

Although most components can be configured entirely in the AIE Administrator interface, a few are too complex to modify in that environment.  For those few you'll have to copy and modify an XML configuration.  

This page describes the XML elements, attributes and properties that you might encounter when editing an XML component configuration.

Component XML Configuration

Components can be configured using AIE's XML configuration features.  (See the <install-dir>\conf\core-app\attivio-components.xml file for many examples.)

XML Syntax

To define a component, follow this general pattern:

XML configuration file
<components>
  <component name="<component-name>" class="<full-component-class-name>" override="<true/false>">
    <description>your description here</description>
    <performance maxInstances="<number of instances to run>" />
    <properties>
      <property name="<property-name>" value="<property-value>"/>
      ...
      <map name="<map-name>">
        <property name="<mapped-name>" value="<mapped-value>"/>
        ...
      </map>
    </properties>
  </component>
  ...
</components>

Each component supports a variety of properties, but not all properties are supported by all components. For example, not all components use a <map> block.

Component definitions are contained within the <components> element of an XML configuration file. Each component has a unique name and identifies the transformer class that will be used when the component is instantiated.
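
As an illustration of unique names sharing one transformer class, the following sketch defines two components that both use the Capitalize class but differ in their mode property. The component names upperTitle and lowerTitle are hypothetical; the class name, the fieldMapping map and the mode values are taken from the examples on this page.

```xml
<components>
  <!-- Hypothetical component: uppercases the title field in place -->
  <component name="upperTitle" class="com.attivio.platform.transformer.ingest.field.Capitalize">
    <properties>
      <property name="mode" value="UPPER"/>
      <map name="fieldMapping">
        <property name="title" value="title"/>
      </map>
    </properties>
  </component>
  <!-- Hypothetical component: same class, different name and mode -->
  <component name="lowerTitle" class="com.attivio.platform.transformer.ingest.field.Capitalize">
    <properties>
      <property name="mode" value="LOWER"/>
      <map name="fieldMapping">
        <property name="title" value="title"/>
      </map>
    </properties>
  </component>
</components>
```

Each component could then be added to a different workflow, or to different stages of the same workflow.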

All components automatically have the system's standard input and output ports. A component can define additional inputs or outputs as needed for communication in a multi-node or multi-process system. After the input definitions, the component can optionally define a performance element (see below).

Component Element

The component element has the following attributes.

  • name - A required attribute which names the component. Component names must be unique. It is an error to reuse a component name without using the override attribute.
  • class - An attribute which defines the implementation Java class for the component. This attribute must be defined unless the override attribute is used.
  • override - The override attribute is used to replace a previous component definition with a new one. For instance, when you edit a component in the AIE Administrator, the resulting dynamic.xml file contains a new component definition with the override attribute set to "true". It is an error to override a component that has not been defined.
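
A minimal sketch of an override, assuming a component named myCapitalizer has already been defined as in the example earlier on this page. The description text and the choice of LOWER are illustrative only; LOWER is one of the mode values the Capitalize class accepts.

```xml
<components>
  <!-- Replaces the earlier myCapitalizer definition.
       override="true" permits reuse of the existing component name. -->
  <component name="myCapitalizer" class="com.attivio.platform.transformer.ingest.field.Capitalize" override="true">
    <description>Illustrative override that lowercases the title field</description>
    <properties>
      <property name="mode" value="LOWER"/>
      <map name="fieldMapping">
        <property name="title" value="title"/>
      </map>
    </properties>
  </component>
</components>
```

Because an override replaces the previous definition, the sketch restates every property the component needs instead of listing only the changed one.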

Description Element

The description element adds a comment to the component definition.

Performance Element

<performance maxInstances="4"/>

By default, AIE starts multiple instances of each workflow component to provide parallel processing in the ingestion and query workflows.  It starts N instances of each component, where N is the number of CPUs on the server; if the computer has only one CPU, AIE starts two instances of each component.  For details, see Processor Utilization Tuning.

You can set a manual default for all ingest or all query components using the default-instances property in the project configuration file.

You can override any of the previous defaults in the configuration of a specific component using the maxInstances attribute of the component's performance element (see example below). You can set maxInstances by editing the component in the AIE Administrator.  Using a value less than "1" reverts to the default behavior.

This performance element example is from an <install>/conf/<module>/module.xml file:
    <component name="localeDetector" class="com.attivio.basistech.transformer.ingest.linguistics.BasisTechDetectLocale" override="true">
      <performance maxInstances="4"/>
      <properties>
        <property name="languages" value="languages"/>
        <list name="input">
          <entry value="text"/>
        </list>
        <property name="minimumLength" value="50"/>
      </properties>
    </component>

Thread Limits in Linux

When altering maxInstances, be careful about the thread limits imposed by your Linux operating system. If you exceed this thread limit, you may run into unexpected behavior caused by critical threads being unable to start.

To see if you are in danger of running out of threads, run these commands:

ulimit -u
ps -eaFm | wc -l

If the number from the second command starts approaching the number from the first command, you are in danger of running out of threads.

From here, you have two options:

  • Decrease maxInstances to reduce the number of threads you are using.
  • Raise the OS thread limit for the Attivio user account:

    su - attivio
    ulimit -u <newValue>

    Note: Consult with your IT team before taking this step. Extreme thread counts can add processor overhead and cause other processor-related problems.

 

Properties Element

A list of component properties is enclosed between <properties></properties> tags.

Property Element

As a general rule, the XML properties of a component mirror the "set" methods of the underlying Java transformer class and take this general form:

<property name="<property-name>" value="<property-value>" />

A property has a name attribute and a value attribute.  For instance, the Capitalize transformer class has a setMode() method, which accepts values of UPPER, LOWER and TITLE.  The myCapitalizer component shown earlier on this page relies on the default mode, so its XML contains no mode property.  To set the mode explicitly, the component's XML configuration uses the mode property to supply the value UPPER to this set method:

<property name="mode" value="UPPER" />

Map Element

A map element is generally employed to map input fields to output fields using a series of property elements.  The name attribute is usually interpreted as the input field name, and the value attribute is usually interpreted as the output field name.  If both field names are the same, the output value overwrites the input value.  Note that other types of mappings are common, but the syntax always follows this pattern:

      <map name="<map-name>">
        <property name="<mapped-name>" value="<mapped-value>"/>
        ...
      </map>

The map element, if any, is enclosed between the <properties></properties> tags.
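
To make the input-to-output mapping concrete, here is a hedged sketch of a properties block for a Capitalize component that writes its result to a different field. The output field name title_display is hypothetical and assumes such a field is permitted by the project schema; the fieldMapping map name and the TITLE mode come from the examples on this page.

```xml
<properties>
  <!-- Title-case the text instead of using the default uppercase mode -->
  <property name="mode" value="TITLE"/>
  <map name="fieldMapping">
    <!-- name = input field, value = output field.
         "title_display" is a hypothetical field used for illustration. -->
    <property name="title" value="title_display"/>
  </map>
</properties>
```

With different input and output names, the original title value is left untouched and the modified text lands in the output field.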
