Overview
This page contains several detailed diagrams that illustrate AIE's architecture.
Top Level Architecture
Figure 1: AIE Architecture Overview
Section | Description |
---|---|
Content/Data Sources | AIE can ingest structured and unstructured data and content from almost any source, including databases, file systems, content management systems, email systems, and more. |
API Layer | Exposes Java endpoints for API communication. |
Ingest Workflows and Services | Asynchronous workflows for cleansing and enriching content before it is persisted in the Universal Index. |
Universal Index | Repository that stores all structured and unstructured content within AIE and is responsible for executing all query requests. |
Query and Response Services and Workflows | Synchronous workflows for modifying queries and query results between the query client and the Universal Index. |
Query API Layer | Exposes Java and HTTP/REST endpoints for API communication. |
System Services | Services that are not specifically tied to AIE workflows; for example, the Attivio Logger. |
ZooKeeper | Centralized coordination system for service discovery, single-sourced configuration, distributed locks, leader election, and other global data. |
Storage | Centralized system for durable content, audit, document, and ingestion history storage. |
Content/Data Sources and API Layer
Associated Documentation:
- Load Data and Content
- Java Client API
Figure 2: Content/Data Sources and API Interactions
Ingest Workflows and Services
Figure 3: Ingestion Workflows and Services
Universal Index
Figure 4: The Universal Index
Query and Response Services and Workflows
Figure 5: Query and Response Services and Workflows
Query API Layer
Associated Documentation:
- SAIL
- Java Client API
Figure 6: Query Clients and API Interactions
Storage
AIE requires durable storage to support a number of features, including content storage (sometimes referred to as the Content Store), ingestion history (which supports incremental ingestion), audit, fault tolerance, and document storage. The data behind these features is not appropriate for the AIE index and is therefore stored externally.
When running in unclustered mode, where only a single host is used, this data is stored in a combination of an embedded SQL database and the local filesystem. Writes to storage are atomic. The database is housed in the aie-store process, and Storage API services in the unclustered configuration access it via JDBC connections.
When running in clustered mode, where multiple hosts are used, AIE requires a Hadoop backend, which it uses for scalable, durable, highly available file storage (HDFS) and key-value storage (HBase). In this configuration the aie-store process provides an endpoint for REST APIs and nothing else. Storage API services in the clustered configuration use HBase and HDFS client code directly and do not interact directly with any AIE process.
Summary
Feature | Unclustered | Clustered |
---|---|---|
aie-store process | Hosts the embedded database | Provides a REST API endpoint only; no storage role |
file storage | Local filesystem | External HDFS |
data storage | Embedded SQL database | External HBase |
fault tolerance | Limited to node failures. Disk failures will result in potential loss of data. | Complete fault tolerance when Hadoop cluster appropriately configured |
high availability | None | Highly available as provided by Hadoop cluster |
scalability | Limited | Highly scalable data and processing |
services | JDBC connections to embedded database | Direct HDFS and HBase client connections to external cluster |
Fault tolerance is not supported for production with AIE 5.2. However, the concept is documented to describe system behavior when fault tolerance is supported and enabled for production use.