
Overview

This page contains several detailed diagrams that illustrate AIE's architecture.  


Top Level Architecture

Top Level

Figure 1: AIE Architecture Overview

| Section | Description |
|---|---|
| Content/Data Sources | AIE can ingest structured and unstructured data and content from almost any source, including databases, file systems, content management systems, email systems, and more. |
| API Layer (Ingestion) | Exposes Java endpoints for API communication. |
| Ingestion Services and Workflows | Asynchronous workflows for cleansing and enriching content before it is persisted in the Universal Index. |
| Universal Index | Repository that stores all structured and unstructured content within AIE and is responsible for executing all query requests. |
| Query and Response Services and Workflows | Synchronous workflows for modifying queries and query results between the query client and the Universal Index. |
| API Layer (Query) | Exposes Java and HTTP/REST endpoints for API communication. |
| System Services | ZooKeeper: Centralized coordination system for service discovery, single-sourced configuration, distributed locks, leader election, and other global data. Storage: Centralized system for durable content, audit, document, and ingestion history storage. |
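
The contrast between asynchronous ingestion workflows and synchronous query workflows can be illustrated with a minimal Java sketch. This is not AIE's API: the class `PipelineSketch`, its methods, and the in-memory map standing in for the Universal Index are all hypothetical, and the cleansing and enrichment steps are toy string operations chosen only to show the shape of the flow.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from AIE.
final class PipelineSketch {
    // Stand-in for the Universal Index: document id -> indexed text.
    private final Map<String, String> index = new ConcurrentHashMap<>();

    // Ingestion side: cleanse and enrich asynchronously, then persist.
    // The caller gets a future and is not blocked while the stages run.
    CompletableFuture<Void> ingest(String id, String rawText) {
        return CompletableFuture
                .supplyAsync(() -> rawText.trim())                // cleanse (toy step)
                .thenApply(text -> text.toLowerCase(Locale.ROOT)) // enrich (toy step)
                .thenAccept(text -> index.put(id, text));         // persist in the index
    }

    // Query side: rewrite the query, execute it, and shape the results
    // synchronously, so the caller receives the final answer directly.
    List<String> query(String rawQuery) {
        String normalized = rawQuery.trim().toLowerCase(Locale.ROOT); // modify query
        return index.entrySet().stream()
                .filter(entry -> entry.getValue().contains(normalized)) // execute
                .map(Map.Entry::getKey)
                .sorted()                                               // modify results
                .collect(Collectors.toList());
    }
}
```

A caller would typically wait on the future returned by `ingest` (for example with `join()`) before expecting a document to appear in `query` results, which mirrors the point that content becomes searchable only after the ingestion workflow has persisted it.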

Content/Data Sources and API Layer

Associated Documentation:

Data Sources

Figure 2: Content/Data Sources and API Interactions

Ingest Workflows and Services

Associated Documentation:

Ingest

Figure 3: Ingestion Workflows and Services

Universal Index

Associated Documentation:

Universal Index

Figure 4: The Universal Index

Query and Response Services and Workflows

Associated Documentation:

Query and Response

Figure 5: Query and Response Services and Workflows

Query API Layer

Associated Documentation:

Query API Layer

Figure 6: Query Clients and API Interactions

Storage

AIE requires durable storage to support a number of features. These include, but are not limited to, content storage (sometimes referred to as the Content Store), ingestion history (which supports incremental ingestion), audit, fault tolerance, and document storage. The data behind these features is not appropriate for the AIE index and is therefore stored externally.

When running in unclustered mode, where only a single host is used, this data is stored in a combination of an embedded SQL database and the local filesystem. Writes to storage are atomic. The embedded database is housed in the aie-store process, and Storage API services in the unclustered configuration access it via JDBC connections.

When running in clustered mode, where multiple hosts are used, AIE requires a Hadoop backend. AIE uses this backend for scalable, durable, highly available file storage (HDFS) and key-value storage (HBase). In this configuration the aie-store process provides an endpoint for REST APIs and nothing else. Storage API services in the clustered configuration use direct HBase and HDFS client code and do not interact directly with any AIE process.

Summary

|  | Unclustered | Clustered |
|---|---|---|
| aie-store process | Used to embed database | None |
| file storage | Local filesystem | External HDFS |
| data storage | Embedded SQL database | External HBase |
| fault tolerance | Limited to node failures. Disk failures will result in potential loss of data. | Complete fault tolerance when Hadoop cluster appropriately configured |
| high availability | None | Highly available as provided by Hadoop cluster |
| scalability | Limited | Highly scalable data and processing |
| services | JDBC connections to embedded database | Direct HDFS and HBase client connections to external cluster |
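
The storage rows of this summary can also be read as a simple lookup keyed by deployment mode. The Java sketch below is illustrative only: `StorageMode` and `forHostCount` are hypothetical names, not part of AIE, and choosing the mode from a host count is an assumption based on the description of unclustered mode as single-host and clustered mode as multi-host.

```java
// Hypothetical illustration only; not an AIE class or configuration API.
enum StorageMode {
    UNCLUSTERED("local filesystem",
                "embedded SQL database",
                "JDBC connections to embedded database"),
    CLUSTERED("external HDFS",
              "external HBase",
              "direct HDFS and HBase client connections to external cluster");

    final String fileStorage;   // where files are kept
    final String dataStorage;   // where structured or key-value data is kept
    final String serviceAccess; // how Storage API services reach the backend

    StorageMode(String fileStorage, String dataStorage, String serviceAccess) {
        this.fileStorage = fileStorage;
        this.dataStorage = dataStorage;
        this.serviceAccess = serviceAccess;
    }

    // Assumption: one host means unclustered, more than one means clustered.
    static StorageMode forHostCount(int hostCount) {
        if (hostCount < 1) {
            throw new IllegalArgumentException("hostCount must be at least 1");
        }
        return hostCount == 1 ? UNCLUSTERED : CLUSTERED;
    }
}
```

For example, `StorageMode.forHostCount(3).dataStorage` evaluates to the clustered entry, `external HBase`, while a single host resolves to the embedded SQL database.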

Fault tolerance is not supported for production with AIE 5.2. However, the concept is documented to describe system behavior when fault tolerance is supported and enabled for production use.

storage architecture details

 
