Overview
This page describes the types of configuration available for an Attivio node or cluster, and how to create projects and build custom applications based on Attivio's default configuration files.
Glossary
Individual values you can substitute in configuration files at startup time.
XML-based configuration for cooperating sets of Attivio components and workflows, supporting high-level descriptions of indexes, spellchecking, multi-node layouts, etc.
Configures how content is indexed.
Definition of components (Java objects that process content or queries) that can be used within Attivio. The component configuration defines named instances of specific components, such as transformers, the main building block of application logic. You can use components as services or within workflows.
Workflows define the available processing paths in Attivio, for example, the sequence in which components execute to process content or queries.
XML-based configuration for most Java objects. You can customize many components and Attivio Features with Spring beans.
Multi-Node Topologies | Models a set of cooperating nodes in an Attivio system. Topologies let you use the same set of configuration files for all nodes in a system and automatically connect workflows running on different nodes.
Configuration Files
When Attivio starts, you must specify one or more XML configuration files. The default configuration file generated by createProject is <project-dir>\conf\configuration.xml.
Top-level Configuration
Attivio supports configuration options that control system-wide behavior such as:
- Default message queue sizes
- Component sizes
- Controlling message ordering
- Triggering events based on memory use
System-wide Configuration
The <configuration> element controls system-wide configuration. The <project-dir>\conf\configuration.xml file contains the default configuration of system-wide options for your project.
<message-queue>
The <message-queue> sub-element controls default settings for message queues in the system. The size attribute controls the default size of message queues. See Component Configuration for details.
<default-performance>
By default, Attivio starts multiple instances of each workflow component to provide parallel processing in the ingestion and query workflows. The default behavior is to start N instances of each component where N is the number of CPUs on the server as reported by Runtime.availableProcessors(). If the computer has only one CPU, however, Attivio starts two instances of each component.
Hyperthreading
If the CPUs are hyper-threaded, Runtime.availableProcessors() may report twice as many available cores as are physically present. For example, a system with 12 hyper-threaded cores may present as 24 processors. In that situation, Attivio may start twice as many component instances.
The <default-performance> sub-element lets you set the number of instances to start. Its maxQueryInstances attribute controls the number of instances used for query-processing components, and its maxIngestInstances attribute controls the number used for ingestion components. Values less than 1 select the system default, which is always at least two instances, as described above.
<configuration projectName="factbook">
  ...
  <default-performance maxQueryInstances="1" maxIngestInstances="1"/>
</configuration>
You can override these settings for individual components by setting the component's maxInstances property, for example when you edit the component in the AIE Administrator.
Main article: Processor Utilization Tuning
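The defaulting rule for instance counts described in this section can be restated as a small Java sketch. The class and method names below (InstanceDefaults, systemDefault, resolve) are hypothetical and are not part of the Attivio API; the sketch only illustrates the arithmetic, assuming the rule is exactly "one instance per reported processor, never fewer than two, and configured values below 1 fall back to that default".

```java
// Hypothetical helper, not part of the Attivio API. It only restates the
// defaulting rule described in the text above.
final class InstanceDefaults {

    /** System default: one instance per reported processor, but never fewer than two. */
    static int systemDefault(int availableProcessors) {
        return Math.max(2, availableProcessors);
    }

    /** Configured values below 1 fall back to the system default. */
    static int resolve(int configured, int availableProcessors) {
        return configured < 1 ? systemDefault(availableProcessors) : configured;
    }

    public static void main(String[] args) {
        // On hyper-threaded hardware this may be twice the physical core count.
        int reported = Runtime.getRuntime().availableProcessors();
        System.out.println("reported processors: " + reported);
        System.out.println("configured 0 resolves to " + resolve(0, reported));
        System.out.println("configured 1 resolves to " + resolve(1, reported));
    }
}
```

Running the sketch on the target server shows the processor count the JVM reports, which is a quick way to check whether hyper-threading is inflating the default before choosing explicit maxQueryInstances and maxIngestInstances values.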
Controlling the Number of Documents Attivio Processes Concurrently
<messageDomains>
Message domains are an advanced feature that lets you allocate ingestion workflow resources based on a classification of messages, the basic units of Attivio processing. The message type in question is the documentList message, which contains a batch of AttivioDocuments.
In practice, message domains are typically used so that a few large documents do not starve resources from a large number of small documents.
To control the number of concurrent documents during processing stages, insert a messageDomains element inside the project's <configuration> element. Within this element, you can specify any number of <messageDomain> sub-elements. A messageDomain can be used to logically group connectors and the processing stages in their ingest workflows. For example, you might have a set of connectors which scan "large" files. By assigning these connectors to their own messageDomain, you can effectively throttle how many large documents will be processed per component at any point in time, thereby ensuring that processing resources will be available for other connectors.
The perComponent attribute limits the number of documentLists any one component can process at a time. It defaults to -1, which means the number of documentLists in flight is not limited by this setting. Note that the effective value of perComponent in any message domain is capped by maxIngestInstances. That is, if maxIngestInstances is set to (or defaults to) 8, then any value of perComponent greater than 8 is effectively 8 at runtime.
<configuration projectName="factbook">
  <messageDomains>
    <messageDomain name="large">
      <maxMessagesInFlight perComponent="1"/>
    </messageDomain>
    <messageDomain name="small">
      <maxMessagesInFlight perComponent="5"/>
    </messageDomain>
  </messageDomains>
</configuration>
In the above example, there are two message domains, "large" and "small".
- For documentLists marked with the "small" message domain, Attivio will allow 5 lists to be processed concurrently.
- For documentLists marked with the "large" message domain, Attivio will allow 1 list to be processed at a time.
Internally, this is implemented with separate processing queues at each component for each message domain. The consequence of this is that messages from one message domain will not delay processing of messages marked with another message domain.
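The interaction between perComponent and maxIngestInstances can be sketched in Java as follows. MessageDomainLimits and effectivePerComponent are hypothetical names, not Attivio classes. The sketch assumes that maxIngestInstances has already been resolved to a positive count and that an unlimited perComponent (-1) is in practice bounded by the number of component instances. The source does not say how a value of 0 is treated, so the sketch rejects it.

```java
// Hypothetical illustration, not Attivio code: computes how many documentLists
// one component could have in flight for a message domain.
final class MessageDomainLimits {

    /** Value documented as "unlimited by this setting". */
    static final int UNLIMITED = -1;

    static int effectivePerComponent(int perComponent, int maxIngestInstances) {
        if (maxIngestInstances < 1) {
            throw new IllegalArgumentException("maxIngestInstances must already be resolved to >= 1");
        }
        if (perComponent == 0) {
            // Assumption: 0 is not described in the documentation, so it is rejected here.
            throw new IllegalArgumentException("perComponent of 0 is not covered by this sketch");
        }
        if (perComponent < 0) {
            // Assumption: with no per-domain limit, the instance count is the only bound.
            return maxIngestInstances;
        }
        // Values above the instance count are capped at the instance count.
        return Math.min(perComponent, maxIngestInstances);
    }

    public static void main(String[] args) {
        int maxIngestInstances = 8;
        System.out.println("large: " + effectivePerComponent(1, maxIngestInstances));
        System.out.println("small: " + effectivePerComponent(5, maxIngestInstances));
        System.out.println("oversized: " + effectivePerComponent(12, maxIngestInstances));
    }
}
```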
You can assign a connector to use one or more messageDomains by adding the sizeToDomain property to its feeder:
<feeder class="com.attivio.connector.DirectMessagePublisher">
  <properties>
    <map name="sizeToDomain">
      <property name="1" value="small"/>
      <property name="10" value="large"/>
    </map>
  </properties>
</feeder>
In the above example, each entry's name is a size threshold in megabytes. DocumentLists that are larger than 1 MB but smaller than 10 MB are marked with the "small" message domain. DocumentLists that are larger than 10 MB are marked with the "large" message domain. DocumentLists smaller than 1 MB are not marked with a message domain. To make a connector map all documents to a particular domain, create a single entry with the size "0" and the desired message domain name.
The size used to determine the message domain is the combined size of the AttivioDocuments in a DocumentList message, including the size of any ContentPointers.
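The threshold lookup that sizeToDomain describes can be sketched in Java with a sorted map. SizeToDomain, put, and domainFor are hypothetical names, not the connector's actual implementation. The sketch assumes that 1 MB is 1024 * 1024 bytes and that a size exactly equal to a threshold falls into that threshold's domain. The documentation does not specify either point.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical illustration, not Attivio code: picks the message domain whose
// threshold is the largest one not exceeding the documentList size.
final class SizeToDomain {

    // Assumption: megabytes are binary (1024 * 1024 bytes).
    private static final long BYTES_PER_MB = 1024L * 1024L;

    private final TreeMap<Long, String> thresholds = new TreeMap<>();

    /** Registers a threshold in megabytes, mirroring one <property> entry. */
    SizeToDomain put(long thresholdMb, String domain) {
        thresholds.put(thresholdMb * BYTES_PER_MB, domain);
        return this;
    }

    /** Returns the domain for a size in bytes, or empty if no threshold applies. */
    Optional<String> domainFor(long sizeBytes) {
        Map.Entry<Long, String> entry = thresholds.floorEntry(sizeBytes);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }

    public static void main(String[] args) {
        SizeToDomain mapping = new SizeToDomain().put(1, "small").put(10, "large");
        System.out.println(mapping.domainFor(512L * 1024L));        // below 1 MB: no domain
        System.out.println(mapping.domainFor(5L * BYTES_PER_MB));   // between 1 MB and 10 MB
        System.out.println(mapping.domainFor(50L * BYTES_PER_MB));  // above 10 MB
    }
}
```

A single entry with threshold 0 acts as a catch-all in this sketch, because every non-negative size is at or above it, which matches the "size 0" behavior described above.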
<security>
You can use the <security> sub-element to secure access to various parts of the system. See the Security Guide for details.
Importing Configuration Files
Attivio loads all of the configuration files in the <project-dir>\conf\ directory tree.
Other Configuration Information
Dynamic Configuration
You can make changes to components and workflows while Attivio is running. These changes work in single and multi-node environments and persist across system restarts.
Main article: Dynamic Configuration
Logging Configuration
Logging inside Attivio is provided by a central AttivioLogger class. You can change the logging configuration at any time, and the change takes effect system-wide.
Main article: Attivio Logger