Overview
Most of the AIE documentation concerns mechanisms that operate within the AIE application, such as document ingestion and querying. This page is concerned with the external applications and services that surround and support the AIE nodes. Developing in AIE requires some understanding of these external programs and their relationships with one another.
This is a schematic showing most of the entities in a running AIE project. We'll elaborate this basic diagram to illustrate some of the essential concepts and tools of AIE development.
AIE Applications, Services, and Directories
This section introduces the various AIE applications and services that work together to manage an AIE project.
AIE Agent
The AIE Agent is the keystone application for an AIE project. It manages communications among all of the critical elements of AIE. It must be started first before any other AIE activity can begin. It must be running on every server where AIE has been installed. The Agent instances "find" each other and form a control network across all the AIE nodes of the project.
For production systems, it is most convenient to run the AIE Agent as a service that starts automatically when the server boots. This is the default setting during installation, and it is especially convenient for multi-node systems spread across multiple servers.
The AIE Agent can also be run as an application (<install-dir>\bin\agent.exe). It runs in a command window.
To stop it, just close the window. This option is convenient during early development, when it is common practice to delete all previous project data and restart the system. While the Agent is running, you can't delete the data files; you have to stop it first. Stopping and restarting the Agent application is easier than stopping and restarting the service.
For the most part, AIE developers start the Agent and then ignore it. It runs in the background and requires no attention. The Agent has a monitoring UI you can view at http://<host>:16999/admin that shows the location of the data-agent directory (see next section) and the current list of processes that the Agent is running.
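As a rough illustration, the monitoring page can also be probed from a script. This is a minimal sketch that assumes the page answers a plain, unauthenticated HTTP GET with status 200; the function names are hypothetical and only the port and path come from the URL above.

```python
from urllib.error import URLError
from urllib.request import urlopen

AGENT_PORT = 16999  # port taken from the monitoring URL above


def agent_admin_url(host: str, port: int = AGENT_PORT) -> str:
    """Build the Agent monitoring URL for one host."""
    return f"http://{host}:{port}/admin"


def agent_is_reachable(host: str, timeout: float = 3.0) -> bool:
    """Return True if the monitoring page answers with HTTP 200.

    Assumption: the page needs no authentication. Any connection
    error or timeout is reported as "not reachable".
    """
    try:
        with urlopen(agent_admin_url(host), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        return False
```

Calling agent_is_reachable("localhost") on a server where the Agent is stopped should simply return False rather than raise.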
Main Article: Starting and Stopping Attivio
Data-Agent Directory
The AIE Agent maintains a data directory (../data-agent/) where it stores runtime information about the AIE project(s). Note that the project's source files are not kept here. Those are in the project directory.
Information stored in the data-agent directory includes the project's index files, the data accumulated by the Performance Monitor, the current Store files (document store, connector history, and much more), and all log files. Look for the aie-agent.log file in the data-agent directory itself.
Do not edit the data-agent files!
We would not normally edit the files that are stored in the data-agent area. We do open and examine the log files.
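Since examining logs is the one routine reason to look inside this directory, a read-only helper can be handy. The sketch below assumes that log files carry a .log extension, as aie-agent.log does; the function name is hypothetical, and nothing is modified.

```python
from pathlib import Path


def find_log_files(data_agent_dir: str) -> list[Path]:
    """Return every *.log file under the data-agent tree, sorted by path.

    Read-only: files are listed, never opened for writing.
    Assumption: log files end in '.log', like aie-agent.log.
    """
    root = Path(data_agent_dir)
    return sorted(p for p in root.rglob("*.log") if p.is_file())
```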
The index files in the data-agent area are not usually in a restorable state, so backing them up is not helpful.
During early development, when the project topology and index design are still changing, and when the project nodes are all on a single host, it is a common practice to delete the data-agent directory files periodically. This lets us rebuild the project from a clean slate. Be advised that deleting the data-agent directory throws away the project index, so you have to reload all of your documents.
In a multi-host system, there is one data-agent directory on each host. This directory contains files that are pertinent to the AIE nodes that run on that host.
The location of the data-agent directory is configurable when you start AIE Agent. The default location is ../data-agent, meaning it will be a sibling of the AIE installation directory, wherever you installed it. Using default values during the installation, this turns out to be <drive>\attivio\data-agent.
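The "sibling of the installation directory" rule is easy to check with a few lines of path arithmetic. This sketch is only an illustration: the installation directory C:\attivio\aie is a hypothetical example, and the function is not part of AIE.

```python
from pathlib import PureWindowsPath


def default_data_agent_dir(install_dir: str) -> PureWindowsPath:
    """Resolve the default '../data-agent' relative to the installation directory."""
    return PureWindowsPath(install_dir).parent / "data-agent"


# Hypothetical installation directory, used for illustration only.
print(default_data_agent_dir(r"C:\attivio\aie"))  # C:\attivio\data-agent
```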
If you are starting the AIE Agent manually, you can set the data-agent location with a command-line switch. If you are using the AIE Agent service, you can set the data-agent location by using the Windows Registry Editor to edit HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\aie-agent-service\ImagePath.
Main Article: Starting and Stopping AIE
Configuration Server(s)
AIE coordinates the content of multi-node systems through configuration servers. Configuration servers have two missions:
- They deploy a single set of project configuration files, which are provided to all AIE nodes on startup. That way all nodes are always running the same configuration.
- They publish the master indexer's index revision number, which coordinates identical index updates across all nodes.
Configuration servers are started by the CLI, working from the <configservers> elements in <project-dir>\conf\environments\<environment>\topology-nodes.xml, through the AIE Agent. Configuration servers can run on any node where AIE is installed, but it is not necessary to run a configuration server on every AIE node.
A multi-node production system requires at least three configuration servers to provide fault-tolerant recovery. A development system can get by with only one. A project with exactly two configuration servers is not allowed, because fault tolerance cannot be guaranteed in that configuration.
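The counting rules can be written down as a small check. This is a sketch of the rules as stated on this page, not AIE's own validation; the function name and return strings are hypothetical, and rejecting a count of zero is an assumption.

```python
def check_configserver_count(count: int, production: bool) -> str:
    """Classify a configuration-server count according to the rules above."""
    if count < 1:
        # Assumption: every project needs at least one configuration server.
        raise ValueError("a project needs at least one configuration server")
    if count == 2:
        raise ValueError("exactly two configuration servers is not allowed")
    if production and count < 3:
        raise ValueError("production needs at least three configuration servers")
    return "fault-tolerant" if count >= 3 else "single server, no fault tolerance"


print(check_configserver_count(3, production=True))   # fault-tolerant
print(check_configserver_count(1, production=False))  # single server, no fault tolerance
```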
Information maintained by the configuration servers will be found in the data-agent directory tree, although generally not in a user-readable form.
Configuration server status may be monitored in the project servers list in the CLI.
Main Article: Multi-Node Topologies
Store
AIE manages a data-storage layer referred to as the store. The store provides storage for documents and ingestion history outside of the AIE index. It can operate in fault-tolerant mode (where data is automatically replicated) or non-fault-tolerant mode. AIE manages the configuration and startup of the store via the <store> elements in the topology-nodes.xml file.
One does not normally interact with the store. It runs in the background and does not need user attention.
The AIE store consists of multiple data stores that are outside of the index. These include:
- Document Store: The document store lets AIE save local copies of documents so that they can be more easily reloaded.
- Connector History: Connector incremental update history is maintained by the store.
Information maintained by the store will be found in the data-agent directory tree, although some of it will not be user-readable.
Store status may be monitored in the project servers list in the CLI.
Performance Monitor
An AIE project has a single performance monitor service that collects metrics and events from all of the AIE nodes and makes them available for inspection in the AIE Administrator. There is also a separate perfmonviewer.exe application for viewing the data when the AIE nodes are not running.
Performance trends and events may be viewed as a time graph in the AIE Administrator. The usual metrics involve document ingestion rates and query processing rates, but the menu of available metrics is very comprehensive. You can create customized graphs as required.
The performance monitoring service is configured in the <perfmonserver> element of topology-nodes.xml.
The performance monitoring service is started from the CLI. The service's status may be monitored there.
The data collected by the performance monitor is stored in the data-agent directory tree.
Main Article: Performance Monitoring
AIE Nodes
An "AIE node" is a running instance of AIE. This is where document ingestion, index maintenance, and query serving occur. An AIE project must have at least one node, and may have hundreds of nodes.
In a typical production system, each AIE node runs on its own dedicated server. During development and testing, however, it is possible to run multiple nodes on a single server. A project can have multiple environments, meaning different topological allocations of nodes to servers. The developer can start up the system in whichever environment suits the day's activities. The environments share the project's configuration files, but maintain separate indexes.
The diagram below is from the Using Java APIs page. It shows the general architecture of an AIE node, with document ingestion on the left and query serving on the right. The AIE Universal Index is in the center.
AIE nodes are started/stopped from the CLI. Their index files are stored in the data-agent directory tree.
Main Article: Attivio Intelligence Engine (AIE)
AIE Administrator
The AIE Administrator is a browser-based control panel for an AIE node. It contains many control, editing, and monitoring features. For instance, this is where we perform and monitor ingestion.
The AIE Administrator is a complex environment with many different screens. This is the System Information screen.
The AIE Administrator offers dynamic configuration of workflows, workflow components, and connectors. One can use forms-based editors to make changes in these components and the edited objects will become active immediately in the running system. This is very convenient when adapting AIE to the needs of project-specific kinds of documents. Note that the dynamic changes reside in the configuration servers, and must eventually be "updated" in order to add them to the permanent configuration files in the project directory.
To run the AIE Administrator, you can direct your browser to any running AIE node in this fashion:
http://<host>:<baseport>/admin/info
Main Article: Use the Attivio Administrator
Version Control and Backup
You may back up the project configuration files in the project directory.
You should not back up the files in the data-agent area. The project configuration files are not all present here. Simply copying the data files at an arbitrary moment does not capture a consistent or complete view of the index.
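A backup of the project directory can be as simple as a timestamped archive. The following is a minimal sketch under stated assumptions: both paths are supplied by you, the backup location lies outside the project directory, and the function name is hypothetical. It deliberately never touches the data-agent area.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_project(project_dir: str, backup_root: str) -> Path:
    """Zip the project directory into backup_root and return the archive path."""
    project = Path(project_dir).resolve()
    if not project.is_dir():
        raise NotADirectoryError(str(project))
    destination = Path(backup_root)
    destination.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = destination / f"{project.name}-{stamp}"
    # Archive the folder itself, so the zip unpacks to '<project-name>/...'.
    archive = shutil.make_archive(
        str(base_name), "zip", root_dir=str(project.parent), base_dir=project.name
    )
    return Path(archive)
```

A version control system serves the same purpose and also records history; the archive is just the simplest option.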
Command Line Interface (CLI)
The AIE Command Line Interface is a small-footprint utility that runs in an interactive command window. It lets us start, stop and monitor multiple servers, and also provides tools for managing the relationship between the project's source files and their image as reflected on the configuration servers.
The CLI is a very simple interactive utility:
To run it, execute <install-dir>\bin\aie-cli.exe.
Main Article: Starting and Stopping Attivio
Editing local files vs. Administrator vs. AIE CLI
The AIE Administrator and the local project files are the twin pillars of AIE project development. Developers regularly shift from one to the other in the course of their work. Sometimes this is a little confusing for new users.
The Command Line Interface (AIE-CLI, or just CLI) is the third tool: a small-footprint console for controlling a running project.
The table below provides a side-by-side comparison of the three tools, along with observations about their use.
Task | AIE Administrator | Editing Local Files | Command Line Interface (CLI) | Comments |
---|---|---|---|---|
Used for Development | Yes | Yes | No | |
Used for Production | Yes | No | Yes | Administrator lets us view detailed performance data. |
Start and stop AIE project servers. | No | No | Yes | You can stop one AIE node from the Administrator. You can't start it from the Administrator because the node is the webserver for the Administrator. From the CLI you can start/stop all of the project servers and all of the AIE nodes in the project. |
Create a new project | No | Yes | No | See also the createproject.exe tool. |
Add modules to a project | No | Yes | No | The createproject.exe tool can modify a project by incrementally adding modules to it. |
Update and Deploy a project. (Plus other tools for managing the project on the configuration servers.) | No | No | Yes | The CLI has tools for restoring a consistent project state after a configuration server or node crash. |
Compare and reconcile conflicts between local files and dynamic configuration files. | No | No | Yes | CLI has some helpful tools for this situation but doesn't show you the files. |
Configure or create connectors, components, and workflows. | Yes, mostly. | Yes | No | The Administrator offers dynamic configuration, which makes changes instantly available for testing. However, not every property and attribute is editable. When editing files locally, a redeploy and restart (of all AIE nodes) are required before you can test them. |
Create new transformer classes in Java. | No | Yes | No | |
Execute custom Java Client API code | No | Yes | No | |
Start and stop document connectors | Yes | No | No | |
Delete the index | Yes | No | No | When developing and testing a connector, it is very convenient to be able to delete the index and start over. |
Display system information | Yes | No | Some | Administrator has many displays devoted to system information, lists of active services, environment variables, and similar information. |
Diagnostic Export | Yes | No | No | Administrator can generate a .zip file containing the configuration files, log files, and status of an AIE node, for debugging purposes. |
Logging Level Settings | Yes | No | No | |
View Server Log | Yes | Yes | Some | CLI has a limited ability to display error messages from server logs. |
Attivio Business Center and User Role Management | Yes | No | No | |
Dictionary Management | Yes | No | No | |
Debug Search and Search UI | Yes | No | No | |
Edit the project topology files | No | Yes | No | |
Understanding Deploy and Update
It is critical for the AIE project developer to understand the significance of the "deploy" and "update" features of the CLI.
A developing AIE project has configuration files in two different locations:
- The master configuration files are in the project directory. This is where the "official" project sources reside. Files in this directory might be in a partly-edited, inconsistent state that cannot be loaded on an AIE node.
- A "deployed" project is a snapshot of the configuration files at a point when they are loadable onto all AIE nodes. These files are managed by the configuration servers.
When we "deploy" a project, we release a new version of the project to the AIE nodes, via the configuration servers.
This would all be very straightforward except for one thing. The AIE Administrator provides for dynamic configuration, which is such a convenience that just about everyone uses it. This feature lets us make configuration changes to connectors, workflows and components using form-based editors, and to see the changes take effect on running AIE nodes without requiring a restart. The unexpected detail is that the altered configuration files reside with the configuration servers, not in the project directory.
The dynamic configuration changes are not reflected in the project source files until we "update" the sources with the new changes. At that point, the configuration servers generate new XML configuration files and save them in the project directory. This creates a single consistent set of sources again.
At this point, it is a best practice to wipe the project off the configuration servers and redeploy it. We can do this easily from the CLI by executing the "deploy force" command.
This option deletes the current deployed project from the configuration servers, and replaces it with a complete set of configuration files from the project directory.
At this point the "dynamic" changes have been erased, the equivalent source files are in place, and the project has been redeployed.
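The update and deploy cycle described above can be pictured with a toy model. This is purely a conceptual sketch, not AIE code: the class, its method names, and the use of dictionaries in place of XML files are all hypothetical.

```python
class ProjectModel:
    """Toy model: a configuration is a dict of object name -> settings."""

    def __init__(self, local: dict[str, str]):
        self.local = dict(local)  # files in the project directory
        self.deployed: dict[str, str] = {}  # snapshot on the configuration servers
        self.dynamic: dict[str, str] = {}  # Administrator edits, also on the servers

    def deploy_force(self) -> None:
        """Replace the deployed project with the local files; dynamic edits are dropped."""
        self.deployed = dict(self.local)
        self.dynamic = {}

    def edit_dynamically(self, name: str, value: str) -> None:
        """An Administrator edit: active at once, but not in the local files."""
        self.dynamic[name] = value

    def running_config(self) -> dict[str, str]:
        """What the AIE nodes see: the deployed snapshot plus dynamic edits."""
        return {**self.deployed, **self.dynamic}

    def update(self) -> None:
        """Write the dynamic edits back into the local files."""
        self.local.update(self.dynamic)


project = ProjectModel({"connector": "v1"})
project.deploy_force()
project.edit_dynamically("connector", "v2")
print(project.local["connector"], project.running_config()["connector"])  # v1 v2
project.update()
project.deploy_force()
print(project.local == project.running_config())  # True
```

The first line of output shows the gap that dynamic configuration opens between the sources and the running system; the second shows it closed again after an update followed by a forced deploy.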
Resolving Conflicts
For the most part, updating and deploying a project is very straightforward. The only place you can go wrong is to make dynamic modifications to a component, workflow, or connector, while simultaneously making different updates to the same object on disk. That creates a version conflict that must be resolved before updating or deploying again.
If there is a conflict, the CLI will refuse to redeploy the project until you resolve all conflicts. Files with changes made via the AIE Administrator will have a .remote extension. Use a diff tool to examine the differences and make the appropriate edits to resolve the conflicts. Delete the .remote version of each file once you are done.
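If no diff tool is at hand, the comparison can be scripted. The sketch below makes one loud assumption: that a conflicting file named x.xml is accompanied by x.xml.remote in the same folder. Check the actual naming in your project directory before relying on it. The function name is hypothetical, and the script only reads files.

```python
import difflib
from pathlib import Path


def remote_conflicts(project_dir: str) -> dict[Path, str]:
    """Map each local file that has a '.remote' sibling to a unified diff."""
    diffs: dict[Path, str] = {}
    for remote in sorted(Path(project_dir).rglob("*.remote")):
        # Assumption: 'x.xml.remote' pairs with 'x.xml' in the same folder.
        local = remote.with_suffix("")
        local_lines = local.read_text().splitlines(keepends=True) if local.exists() else []
        remote_lines = remote.read_text().splitlines(keepends=True)
        diff = difflib.unified_diff(
            local_lines, remote_lines, fromfile=str(local), tofile=str(remote)
        )
        diffs[local] = "".join(diff)
    return diffs
```

Printing each value of the returned dictionary shows, file by file, which lines differ between the local version and the version edited dynamically.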
Once the local source files are up to date, be sure to redeploy them to the configuration servers using the "deploy force" feature of the CLI. Only at that point are the local and remote files in sync again.