Overview

This document describes the different in-memory caching techniques used within the Attivio Intelligence Engine (AIE) to improve query performance. For each caching technique, this document explains how the cache works. Sample configurations for each technique are also provided. 

Query Caches

Query caches contain internal document IDs and an optional bitset. If faceting is enabled, the cached value is larger because it includes a bitset covering the whole index (document count / 8 bytes in size). Each query's cache memory usage is typically dominated by the cache key (the query itself). Queries and documents are cached separately: query caches hold results on a per-query basis, whereas document caches hold the actual document data on a per-document basis.
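The bitset size mentioned above is simple arithmetic. The following is a minimal sketch of it, not AIE code; the function name is hypothetical, and rounding up to a whole byte is an assumption.

```python
def facet_bitset_bytes(doc_count: int) -> int:
    """Bytes needed for a bitset holding one bit per document in the index."""
    # One bit per document, rounded up to a whole byte (rounding is an assumption).
    return (doc_count + 7) // 8


# A 100-million-document index needs roughly 12.5 MB per cached bitset.
print(facet_bitset_bytes(100_000_000))  # 12500000
```

Because this cost is paid per cached query, the maximum cache size matters more for large indexes with faceting enabled.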

AIE provides multiple caching mechanisms to speed up query execution. There are three query caches you can configure through the Index Feature. Each cache uses an LRU (Least Recently Used) eviction policy, and you can configure the maximum size of each cache independently.

See Index Searcher Properties for configuring engine-level query caches.
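To illustrate what LRU eviction with a maximum size means in practice, here is a generic sketch. It is not AIE's implementation; the class and method names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Generic LRU cache: evicts the least recently used entry when full."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._entries:
            return None
        # A hit makes the entry the most recently used one.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            # Drop the least recently used entry (front of the ordering).
            self._entries.popitem(last=False)


cache = LRUCache(max_size=2)
cache.put("q1", [1, 2])
cache.put("q2", [3])
cache.get("q1")          # q1 is now more recently used than q2
cache.put("q3", [4])     # exceeds max_size, so q2 is evicted
print(cache.get("q2"))   # None
```

A larger maximum size keeps more queries cached at the cost of more memory, which is why each cache is sized independently.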

Query Requests and Caching

This section describes how various AIE features and QueryRequest settings affect caching in AIE.

Query Request Settings

You can explicitly enable or disable caching on a per-query basis. To disable the Search, Filter, and Document caches for a query, set the QueryRequest's cacheable property to false. By default, all QueryRequests are cacheable. In addition, a query is not cached if the QueryRequest's debug property is set to true.
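The rule above reduces to a single boolean condition. The sketch below models it with a hypothetical stand-in class; it does not use the real QueryRequest API, only the two property names given in the text.

```python
from dataclasses import dataclass


@dataclass
class RequestFlags:
    """Hypothetical stand-in for the two QueryRequest properties described above."""
    cacheable: bool = True   # default per the text: all requests are cacheable
    debug: bool = False


def should_cache(flags: RequestFlags) -> bool:
    # A query is cached only if it is cacheable and not a debug query.
    return flags.cacheable and not flags.debug


print(should_cache(RequestFlags()))                 # True
print(should_cache(RequestFlags(cacheable=False)))  # False
print(should_cache(RequestFlags(debug=True)))       # False
```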

Freshness

Using the Freshness score function in a query request causes cache entries to expire whenever a new Center Time is used. The Center Time determines the relative freshness of results, so if it changes for a particular query, the cached results are no longer valid and are removed from the cache. By default, when Freshness is used, the Center Time changes once an hour, at which time any existing cache entries expire.

The default relevancy model uses Freshness to determine query result relevancy ranking, so this hourly expiration applies to queries that use the default model.
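One way to picture the hourly expiration is to treat the Center Time as part of the cache key. The sketch below is an illustration only: the function names are hypothetical, and truncating the time to the top of the hour is an assumption (the text says only that the Center Time changes once an hour).

```python
from datetime import datetime, timezone


def center_time(now: datetime) -> datetime:
    # Assumption: the Center Time is the current time truncated to the hour.
    return now.replace(minute=0, second=0, microsecond=0)


def cache_key(query: str, now: datetime):
    # The key changes when the Center Time changes, so older entries stop matching.
    return (query, center_time(now))


t1 = datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 1, 10, 55, tzinfo=timezone.utc)
t3 = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
print(cache_key("laptop", t1) == cache_key("laptop", t2))  # True
print(cache_key("laptop", t1) == cache_key("laptop", t3))  # False
```

In this sketch stale entries simply stop matching; AIE, as described above, removes them from the cache.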

Real Time Fields

When a QueryRequest uses Real-Time Updates, any cache entries containing real-time updates expire when a Commit message using COMMIT_REAL_TIME_FIELDS mode is sent to the AIE Index. The message commits the pending real-time updates, which in turn requires removing any cache entries that contain old real-time values.

Field Caches

Field caches hold all values for a particular field in memory. This caching enables high-speed queries that use features such as facets, sorting, and JOINs. This section describes AIE's field-based caches and how to configure their loading policies (which affect query latency) and retention policies (when the caches are garbage collected).

Field Cache Configuration

For every field cache type, there is a Schema Field Property to control the cache loading policy.

Field Cache Retention Policy

The field cache policy controls when the field cache is loaded and when it is removed from memory (that is, when or if the field cache objects are reclaimed by garbage collection). You can control the retention policy on a per-cache basis using the cachename.cachePolicy schema field property, where cachename identifies the cache type (for example, facet.cachePolicy).

Field Cache Configuration Table

| Cache Policy | Description |
|---|---|
| static | cache loads at commit time and is locked in memory (not reclaimed by garbage collection) |
| soft | cache loads upon demand after commit and may be garbage collected upon a low memory event |

The default setting for all caches is Cache Policy = soft.
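The difference between the two policies can be sketched as eager versus lazy loading. This is a conceptual model, not AIE's implementation: the class, its methods, and the low-memory hook are hypothetical, and it assumes that a static cache is never dropped, in line with the property descriptions stating that "static" locks the cache into memory.

```python
class FieldCache:
    """Conceptual model of a field cache with a 'static' or 'soft' policy."""

    def __init__(self, policy, loader):
        self.policy = policy      # "static" or "soft"
        self._loader = loader     # callable that builds the cached values
        self._values = None

    def on_commit(self):
        # New index data: static caches reload right away, soft caches wait.
        self._values = self._loader() if self.policy == "static" else None

    def on_low_memory(self):
        # Only soft caches may be reclaimed (assumption: static stays pinned).
        if self.policy == "soft":
            self._values = None

    def get(self):
        # Soft caches pay the load cost on the first query that needs them.
        if self._values is None:
            self._values = self._loader()
        return self._values

    @property
    def loaded(self):
        return self._values is not None


static_cache = FieldCache("static", lambda: [10, 20, 30])
soft_cache = FieldCache("soft", lambda: [10, 20, 30])
for cache in (static_cache, soft_cache):
    cache.on_commit()
print(static_cache.loaded, soft_cache.loaded)  # True False
```

The first call to `soft_cache.get()` after a commit is where the extra query latency would show up.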

Recommended Field Cache Configurations

| Cache Policy | Use Case |
|---|---|
| static | cache is used frequently and cache load time should not impact the first query after a commit. |
| soft | cache is used infrequently. Slower queries while the cache is loading are acceptable. |

Memory Warnings

To produce the most stable AIE memory profile, set the Cache Policy to static for only the most frequently-used caches.

Setting most/all field caches for all fields to Cache Policy = static is strongly discouraged. Doing so can result in running out of memory if a large number of fields are configured in this way.

Field caches generally use memory proportional to the number of documents in the index. This calculation includes deleted documents not yet reclaimed via an AIE Index OPTIMIZE.

Whenever you update a document, the old version is implicitly deleted.

To reduce the overall memory usage required for field-based caches, run OPTIMIZE on the index frequently to reclaim old documents. Optimizing also reduces the load time required for field caches.
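A rough estimate shows why unreclaimed deletions matter. The sketch below is illustrative only: the function name and the 4-bytes-per-document figure are hypothetical, since the real per-document cost depends on the field type and the cache.

```python
def field_cache_bytes(live_docs: int, deleted_docs: int, bytes_per_doc: int) -> int:
    """Estimated cache size; deleted-but-unreclaimed documents still count."""
    return (live_docs + deleted_docs) * bytes_per_doc


# Hypothetical index: 10M live documents plus 5M old versions not yet reclaimed.
before = field_cache_bytes(10_000_000, 5_000_000, bytes_per_doc=4)
after = field_cache_bytes(10_000_000, 0, bytes_per_doc=4)
print(before, after)  # 60000000 40000000
```

In this hypothetical case, optimizing would cut the memory for one such cache by a third, and the same saving would repeat for every cached field.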

Facet Field Cache

The facet field cache speeds up facet generation during query execution.

SchemaField Properties:

| Schema Field Property | Default Value | Description |
|---|---|---|
| facet.cachePolicy | soft | retention policy for the facet cache. Set to "static" to lock cache into memory |

Memory Usage

The amount of memory used by the facet cache grows with the number of unique terms in the field and with the average number of facet values per document multiplied by the number of documents in the index.

Recommended Configurations

Configuration for frequently used facet field:

<field name="facetField" type="string" facet="true">
  <properties>
    <property name="facet.cachePolicy" value="static"/>
  </properties>
</field>

Configuration for rarely used facet field:

<field name="facetField" type="string" facet="true">
  <properties>
    <property name="facet.cachePolicy" value="soft"/>
  </properties>
</field>

Field Vector Cache

The field vector cache holds a single value per document and is used for sorting, boosting, and filtering operations. This cache is available when the schema sets sort="true | realtime" or index="true | realtime" for a particular field. If sorting is enabled, the sort values are used for this cache, because the sorting index is single-valued. If sorting is disabled for the field but indexing is enabled, the indexed values are used instead.

Because this cache holds only one value per document, caching from indexed values can have unforeseen consequences if the field is multi-valued.

SchemaField Properties:

| Schema Field Property | Default Value | Description |
|---|---|---|
| vector.cachePolicy | soft | retention policy for the field vector cache. Set to "static" to lock cache into memory |

Deprecated Schema Field Properties:

The unified field vector property (described above) replaces the following schema field properties. For backwards compatibility, the properties below are still used when the vector property is not specified.

| Schema Field Property | Default Value | Description |
|---|---|---|
| sort.cachePolicy | soft | retention policy for the sort cache. Set to "static" to lock cache into memory |
| boost.cachePolicy | soft | retention policy for the static boost cache. Set to "static" to lock cache into memory |
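The fallback from the unified property to the deprecated ones can be sketched as a lookup order. This is an assumption about how the fallback might behave, not AIE code: the function name and the `usage` parameter are hypothetical, and treating each deprecated property as applying only to its own usage (sort or boost) is a guess.

```python
def resolve_vector_policy(props: dict, usage: str) -> str:
    """Pick the cache policy for a field; usage is "sort" or "boost"."""
    # The unified property wins when it is present.
    if "vector.cachePolicy" in props:
        return props["vector.cachePolicy"]
    # Otherwise fall back to the deprecated per-usage property, then the default.
    return props.get(f"{usage}.cachePolicy", "soft")


print(resolve_vector_policy({"vector.cachePolicy": "static"}, "sort"))  # static
print(resolve_vector_policy({"sort.cachePolicy": "static"}, "sort"))    # static
print(resolve_vector_policy({"sort.cachePolicy": "static"}, "boost"))   # soft
```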

Memory Usage

The memory usage for a field vector cache is proportional to the number of documents in the system.

Recommended Configurations

Configuration for frequently used sort field:

<field name="sortField" type="integer" sort="true">
  <properties>
    <property name="vector.cachePolicy" value="static"/>
  </properties>
</field>

Configuration for rarely used sort field:

<field name="sortField" type="integer" sort="true">
  <properties>
    <property name="vector.cachePolicy" value="soft"/>
  </properties>
</field>

Join Field Caches

Join Field Caches are used in the execution of Relational (JOIN) queries against AIE.

SchemaField Properties:

| Schema Field Property | Default Value | Description |
|---|---|---|
| join.cachePolicy | soft | retention policy for the JOIN cache. Set to "static" to lock cache into memory |
| terms.cachePolicy | soft | retention policy for the per-segment component of a JOIN cache. Set to "static" to lock cache into memory |

Setting join.cachePolicy to "static" implicitly sets terms.cachePolicy to "static" as well.
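The implicit relationship between the two properties can be expressed as a small resolution rule. The sketch below is illustrative; the function name is hypothetical and it models only the behavior stated above.

```python
def effective_join_policies(props: dict) -> dict:
    """Resolve the effective join and terms cache policies for one field."""
    join = props.get("join.cachePolicy", "soft")
    # A static JOIN cache forces its per-segment terms component to be static too.
    if join == "static":
        terms = "static"
    else:
        terms = props.get("terms.cachePolicy", "soft")
    return {"join.cachePolicy": join, "terms.cachePolicy": terms}


print(effective_join_policies({"join.cachePolicy": "static"}))
# {'join.cachePolicy': 'static', 'terms.cachePolicy': 'static'}
```

The reverse does not hold in this model: setting only terms.cachePolicy to "static" leaves join.cachePolicy at its default of "soft".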

Memory Usage

The memory usage for a JOIN cache is proportional to the number of documents in the index.

Recommended Configurations

Configuration for frequently used JOIN field:

<field name="joinField" type="string" indexed="true" tokenize="false">
  <properties>
    <property name="join.cachePolicy" value="static"/>
  </properties>
</field>

Configuration for rarely used JOIN field:

<field name="joinField" type="string" indexed="true" tokenize="false">
  <properties>
    <property name="join.cachePolicy" value="soft"/>
  </properties>
</field>

Configuration for a JOIN field used only for multi-node joins or graph queries:

<field name="joinField" type="string" indexed="true" tokenize="false">
  <properties>
    <property name="terms.cachePolicy" value="static"/>
  </properties>
</field>

Multiple Field Join

The cache policy for multiple field JOINs is always "soft". You can significantly improve the cache load time for multiple field JOINs by setting the terms.cachePolicy field property to "static" for both fields involved in the join.

Configuration for optimizing caching of a multi-field join:

<field name="joinField1" type="string" indexed="true" tokenize="false">
  <properties>
    <property name="terms.cachePolicy" value="static"/>
  </properties>
</field>
<field name="joinField2" type="string" indexed="true" tokenize="false">
  <properties>
    <property name="terms.cachePolicy" value="static"/>
  </properties>
</field>

Real-Time Update Caches

Real-Time Updates require mapping Document IDs between the portions of AIE used to manage real-time and non-real-time fields. Real-time update caches store these mappings.

If sorting/faceting/boosting are enabled for a real-time field, sort/facet/boost caches for the field are loaded in addition to the real-time update cache for translating document IDs.

Memory Usage

Using real-time updates requires memory for holding real-time caches. Memory usage is proportional to the number of documents in the system.

Configuration Properties

| Property | Default Value | Description |
|---|---|---|
| rtfCachePolicy | SOFT | cache policy for real-time field caches |

Default Configuration

This is the default configuration for real-time caches. With this configuration, the first queries that use real-time fields after a commit/refresh incur the cache load time. To avoid this cost at query time, use the recommended configuration (next section).

<f:index name="index">
  <f:searchers>
    <f:property name="rtfCachePolicy" value="SOFT"/>
  </f:searchers>
</f:index>

Recommended Configuration

This is the recommended real-time cache configuration to reduce query latency.

<f:index name="index">
  <f:searchers>
    <!-- configure real time field caches to be loaded on commit/refresh (not latent)
      and stay in memory (no garbage collection) -->
    <f:property name="rtfCachePolicy" value="STATIC"/>
  </f:searchers>
</f:index>