
Overview

Indexing and query execution in AIE are performed by the Attivio Engine Service. See the Configure the Index page for information on configuring indexing.


Segment Life Cycle: Ingest, Commit, Optimize

When documents are ingested by AIE, the new index records are first accumulated in memory until a block of sufficient size can be written (or "flushed") to disk as a new index segment. A segment is a file containing index records. New segments are added to the index as long as document ingestion continues. See Segment Flushing, below.

These new segments cannot be queried until a Commit Message is issued. A commit makes sure that all new segments are written to disk and reloads all memory-resident data related to them. Its resource utilization is proportional to the size of the new segments and, to a lesser extent, if caches are configured to be loaded at commit time, to the size of the whole index.
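The flush and commit behavior described above can be sketched with a small in-memory model. This is an illustrative toy, not the AIE API: the class `ToyIndex`, its methods, and the `flush_threshold` parameter are all hypothetical names chosen for the example, and a "segment" here is just a Python list.

```python
class ToyIndex:
    """Toy model of the ingest/flush/commit cycle (hypothetical, not the AIE API)."""

    def __init__(self, flush_threshold=3):
        # Assumption: a segment is flushed once this many records accumulate.
        self.flush_threshold = flush_threshold
        self.buffer = []      # records accumulated in memory
        self.segments = []    # segments that have been flushed to "disk"
        self.searchable = 0   # how many segments are visible to queries

    def ingest(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the in-memory block out as a new segment.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def commit(self):
        # Make sure everything is on disk, then expose all segments to queries.
        self.flush()
        self.searchable = len(self.segments)

    def query(self, term):
        # Only committed segments are consulted.
        hits = []
        for segment in self.segments[:self.searchable]:
            hits.extend(r for r in segment if term in r)
        return hits
```

In this model, records ingested after the last commit exist either in the buffer or in a flushed segment, but `query` does not see them until `commit` is called.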

At any time, the index may consist of one to many segments of various sizes. Incoming queries have to be run against each segment in turn, and the query results from all segments must be merged. If there are many segments, query performance can suffer.  For this reason AIE combines small segments into larger ones that can be queried more efficiently.
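To see why many segments cost more at query time, consider a minimal sketch in which every segment is searched separately and the per-segment hit lists are then merged into one ranked list. The functions `search_segment` and `search_index`, the dictionary representation of a segment, and the term-count score are assumptions made for illustration only; they do not describe how AIE scores or merges results.

```python
import heapq


def search_segment(segment, term):
    """Return (score, doc_id) hits for one segment, best first.

    The score is a toy term count, used only for this illustration.
    """
    hits = [(text.split().count(term), doc_id) for doc_id, text in segment.items()]
    hits = [hit for hit in hits if hit[0] > 0]
    return sorted(hits, reverse=True)


def search_index(segments, term, limit=10):
    """Run the query against each segment in turn, then merge the results."""
    per_segment = [search_segment(segment, term) for segment in segments]
    # heapq.merge expects each input to be sorted already (descending here).
    merged = heapq.merge(*per_segment, reverse=True)
    return list(merged)[:limit]
```

The work in `search_index` grows with the number of segments, both in the per-segment searches and in the merge step, which is the cost that combining small segments is meant to reduce.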

In systems that perform frequent updates to indexed documents, a significant number of record deletions occur.  When a record is deleted, it is not actually removed from its segment.  It is simply flagged as being inactive.  These "deleted" records accumulate and gradually degrade query performance. 

AIE addresses these issues in two ways:

  • Automatic Merging: AIE monitors the set of index segments and automatically merges smaller segments into larger ones. In the merging process, deleted records are dropped. This is a background task that operates on all segment files that are below a certain size. The strategy for selecting the best segments to merge is embodied in the Default Merge Policy. The goal of the merge policy is to respond to ongoing indexing and deletion by maintaining a mix of segments that are relatively efficient to query and are relatively free of deleted records.  See Automatic Segment Merging, below.
  • On-Demand Optimization: When desired, you may issue an Optimize Message to AIE. Optimization aggressively merges segments to create a minimal number of large, deletion-free segments.  This has little impact on query performance, but is essential for updating facet finder.

    On-demand optimization can be performed as a foreground task that blocks new ingest messages until it is completed, or as a background task that operates on current segments but allows ingestion to continue and new segments to be created. In either mode, however, its aggressive merging of segments requires significant I/O and CPU resources, and degrades both query and ingestion performance until it completes. If system resources are limited, run it during off hours if possible. AIE lets you control this task manually so that you can choose an appropriate time to run it. See On-Demand Optimization, below.
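The difference between the two approaches can be sketched as follows. This is a simplified model under stated assumptions, not the Default Merge Policy: the `Segment` class, the `small_limit` threshold, and the rule "merge every segment below the threshold" are hypothetical stand-ins for whatever selection strategy AIE actually applies.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    records: dict = field(default_factory=dict)  # doc_id -> text
    deleted: set = field(default_factory=set)    # doc_ids flagged as inactive


def merge(segments):
    """Combine segments into one, dropping records flagged as deleted."""
    merged = Segment()
    for segment in segments:
        for doc_id, text in segment.records.items():
            if doc_id not in segment.deleted:
                merged.records[doc_id] = text
    return merged


def auto_merge(segments, small_limit):
    """Toy automatic merge: combine all segments with fewer than small_limit records."""
    small = [s for s in segments if len(s.records) < small_limit]
    large = [s for s in segments if len(s.records) >= small_limit]
    if len(small) < 2:
        return segments  # nothing worth merging
    return large + [merge(small)]


def optimize(segments):
    """Toy on-demand optimize: merge everything into one deletion-free segment."""
    return [merge(segments)] if segments else []
```

In this sketch `auto_merge` leaves large segments, and any deleted records inside them, untouched, while `optimize` rewrites every segment. That is why the second operation is so much more expensive in I/O and CPU.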

Remember that newly ingested records are not visible to queries until they are committed. Query service is still available during both a commit and an optimize, although performance may suffer.

Indexing Messages

The AttivioEngine responds to five types of indexing messages from Content Dispatchers. (See Load Data and Content for more information on feeding documents and other indexing messages.)

DocumentList Message

When the AttivioEngine receives a DocumentList message, all IngestDocuments contained in the DocumentList will be indexed. These documents will not be searchable until the next Commit Message or Optimize Message is processed.

Commit Message

When the AttivioEngine receives a Commit message, it will flush all outstanding documents to disk as a new segment. Uncommitted segments will then be made searchable for future queries.

Optimize Message

When the AttivioEngine receives an Optimize message, it will first execute a commit, i.e., flush all outstanding documents to disk. Then it will merge all the index segments into one segment or, optionally, a few segments. Once merging is complete, the new index will be put live and any outstanding documents will be searchable.

On-demand optimization can be performed as either a foreground or background task:

  • If executed in the foreground, optimization blocks new ingest messages until it is completed. When it completes, all the index segments have been merged into one optimized segment (or more, if so requested), and all the updates up to that point are searchable.
  • If executed in the background, it commits and optimizes all current updates, but allows ingestion to continue and new segments to be created. The sequence of operations is as follows:
    • Commits all current updates (i.e., flushes them to disk). While this is taking place, ingestion is blocked.
    • Schedules a background merge and resumes ingestion.
    • Makes the committed segments searchable.
    • While the background optimize proceeds, new commit messages can be sent. When it completes, it does not automatically make the optimized segments searchable. They become searchable after the next commit.
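The background sequence above can be traced with a small state model. It is a sketch under assumptions, not the engine's implementation: `ToyEngine` and its methods are hypothetical names, segments are plain tuples, and the background merge is split into explicit "start" and "finish" calls instead of running on a separate thread.

```python
class ToyEngine:
    """Toy model of a background optimize (hypothetical, not the AIE API)."""

    def __init__(self):
        self.pending = []      # ingested documents not yet flushed
        self.on_disk = []      # segments on disk, each a tuple of documents
        self.live = []         # segments currently visible to queries
        self._snapshot = None  # segments being merged in the background

    def ingest(self, doc):
        self.pending.append(doc)

    def commit(self):
        # Flush outstanding documents as a new segment, then publish what is on disk.
        if self.pending:
            self.on_disk.append(tuple(self.pending))
            self.pending = []
        self.live = list(self.on_disk)

    def start_background_optimize(self):
        # Commit current updates and make them searchable, then schedule
        # a merge over the segments that exist at this moment.
        self.commit()
        self._snapshot = list(self.on_disk)

    def finish_background_optimize(self):
        # Replace the snapshot segments on disk with one merged segment.
        # Segments flushed after the snapshot are kept as they are.
        # The live view is deliberately left alone until the next commit.
        merged = tuple(doc for segment in self._snapshot for doc in segment)
        newer = self.on_disk[len(self._snapshot):]
        self.on_disk = [merged] + newer
        self._snapshot = None
```

In this model, calling `finish_background_optimize` changes `on_disk` but not `live`. The merged segment only becomes visible once `commit` is called again, mirroring the last step in the list.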

Optimize operations may impact query performance because they perform a large amount of disk I/O in order to merge segments.

Optimize operations may also take some time. It is recommended that they not be performed during peak hours because of their impact on content ingestion and query performance.

For large indexes, the Optimize operation can take a long time. To ensure that outstanding documents become searchable as soon as possible, it is recommended that you send a Commit Message before sending an Optimize Message.
