Overview

You can configure the AIE Schema, and individual fields in the schema, through attributes on the <field> definition and by defining properties with the <properties> element. The following sections describe all the AIE-defined properties; however, you can add a property with any name and refer to it from your own custom code.


Example: Setting a Field Property

For example, you can use a field property to disable JMX registration for the title field:

<schema name="default">
  <fields default-search-field="content">
    <field name="date" type="date" indexed="true" stored="true"/>
    <field name="title" type="string" indexed="true" stored="true">
      <properties>
        <property name="jmx.enabled" value="false"/>
      </properties>
    </field>
    ...
  </fields>
</schema>

Setting a Property at Global Level

Use a <properties> element beneath the <schema> tag and before the <fields> tag to define properties that apply to all fields in the schema. For example, to give all faceted fields a maximum length of 50:

<schema name="default">
  <properties>
    <property name="facet.maxLength" value="50"/>
  </properties>

  <fields default-search-field="content">
    <field name="date" type="date" indexed="true" stored="true"/>
    <field name="title" type="string" indexed="true" stored="true"/>
    ...
  </fields>
</schema>

Setting a Property at Field Level

Use a <properties> element beneath the appropriate <field> or <realtimeField> tag to define properties that apply only to a specific field. For example, to give only the date and title fields a maximum length of 50:

<schema name="default">
  <fields default-search-field="content">
    <field name="date" type="date" indexed="true" stored="true">
      <properties>
        <property name="facet.maxLength" value="50"/>
      </properties>
    </field>
    <field name="title" type="string" indexed="true" stored="true">
      <properties>
        <property name="facet.maxLength" value="50"/>
      </properties>
    </field>
    ...
  </fields>
</schema>

Properties set at field level override the same properties set globally. For example, if you set facet.maxLength to 50 globally (at the schema level) and set the same property to 25 for a specific field, that field uses the value 25, while all other fields that do not define a field-level facet.maxLength property use the global setting of 50.
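The override rule can be sketched with a schema that combines both levels. This is a minimal illustration reusing the facet.maxLength values from the paragraph above; the field names are taken from the earlier examples, and the other field definitions are elided with a comment.

```xml
<schema name="default">
  <!-- Global (schema-level) value: applies to every field unless overridden -->
  <properties>
    <property name="facet.maxLength" value="50"/>
  </properties>

  <fields default-search-field="content">
    <!-- No field-level property: inherits the global value of 50 -->
    <field name="date" type="date" indexed="true" stored="true"/>
    <!-- Field-level property: overrides the global value, so title uses 25 -->
    <field name="title" type="string" indexed="true" stored="true">
      <properties>
        <property name="facet.maxLength" value="25"/>
      </properties>
    </field>
    <!-- other field definitions -->
  </fields>
</schema>
```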

Changing Properties

Some properties require re-indexing content if they change. Failure to re-index after changing one of these properties can result in query failures, unexpected query results, or other runtime errors.

Each property falls into one of three levels, depending on whether re-indexing is required after a change:

  • required - Re-indexing is required after changing this setting to prevent query failures and other errors.
  • latent - Re-indexing is not required; however, previously indexed documents do not reflect the setting change and may not be properly represented in search results.
  • no - Re-indexing is not required when changing this property.

The level associated with each property is shown in the Reindex column of the tables that follow.

List of Defined Properties

Indexed Field Properties

Property

Type

Default

Reindex

Description

index.tokenizer

string

"default"

required

The name of the tokenizer to use for this field. See Tokenizer Registration for more information.

index.relevancy

boolean

true

no

Enables or disables query boosts on this field. If false, no relevancy boost is applied to query terms against this field.

index.termVector

boolean

false

required

Enable/Disable term vectors.

index.termVectorPositions

boolean

false

required

Enable/Disable term vector positions (index.termVector must be enabled first).

index.termVectorOffsets

boolean

false

required

Enable/Disable term vector offsets (index.termVector must be enabled first).

index.omitNorms

boolean

false

required

If true, omits field length normalization for boosting (applies only to string fields).

index.maxLength

int

4096

latent

Maximum length of a field value. Values that exceed this length are dropped. Defaults to 4096 for string fields; unlimited for text fields.

index.maxTokens

int

10000

latent

Maximum number of tokens for a field value. Set to -1 for unlimited. This affects only the information indexed, not the information stored (see store.maxChars).

date.resolution

enum: milliseconds, seconds, minutes, hours, days

seconds

latent

The resolution used for indexing date values. NOTE: valid only for the DATE data type. This is also the resolution used when calculating freshness scores against the field.

index.precisionStep

int

4 (integer, float), 6 (date, long, money, double)

required

Defines the precision step for numeric indexing. A value of 0 disables precision-step-based indexing. Use this setting to specify a different encoding for numeric values that supports faster range searches. For integer/float fields, the default is 4. For date/long/money/double fields, the default is 6. Smaller values result in faster range searches and a larger index. Larger values result in slower range searches and a smaller index. The maximum value for integer/float fields is 32 (no optimization); the maximum for date/long/double fields is 64 (no optimization).
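As a sketch of the trade-off described for index.precisionStep, the fragment below lowers the step on a long field to favor range-search speed over index size. The field name filesize is hypothetical, and type="long" is assumed to be the schema's name for the long data type. Because the Reindex level is required, content must be re-indexed after this change.

```xml
<!-- Hypothetical long field: a step of 4 instead of the default 6
     gives faster range searches at the cost of a larger index -->
<field name="filesize" type="long" indexed="true" stored="true">
  <properties>
    <property name="index.precisionStep" value="4"/>
  </properties>
</field>
```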

Wildcard Properties

Property

Type

Default

Reindex

Description

wildcard.mode

enum: default, prefix

default

required

Wildcard optimization mode. Set to prefix to enable optimized prefix wildcard queries.

wildcard.step

int

2

required

Step for indexing optimized prefix wildcards. Smaller values result in faster prefix wildcard queries, but require more disk space for indexing.

wildcard.maxLength

int

10

required

Maximum length of an optimized prefix wildcard term. Prefix wildcard queries longer than this length receive no optimization; however, longer prefixes generally match fewer terms.

wildcard.maxTerms

int

-1 (unlimited)

no

Maximum number of terms allowed during simple wildcard query expansion. Setting this value to 0 disallows wildcard queries against this field (the query fails with an exception). NOTE: this setting also applies to range query expansion.

wildcard.maxPhraseTerms

int

5000

no

Maximum number of terms allowed during phrase wildcard query expansion. Setting this value to 0 disallows wildcard queries in phrases (the query fails with an exception). WARNING: setting this value too high (or unlimited) can degrade the system and/or cause memory errors. In general, leave the default as is, or set the value lower.
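A field-level configuration combining the wildcard properties might look like the sketch below. The field name sku and the chosen values are illustrative assumptions, not recommendations. wildcard.mode, wildcard.step, and wildcard.maxLength are marked required, so changing them means re-indexing; wildcard.maxTerms can be changed without re-indexing.

```xml
<!-- Hypothetical product-code field that is often searched as AB12* -->
<field name="sku" type="string" indexed="true" stored="true">
  <properties>
    <!-- Enable optimized prefix wildcard queries -->
    <property name="wildcard.mode" value="prefix"/>
    <!-- Smaller step: faster prefix queries, more disk space -->
    <property name="wildcard.step" value="2"/>
    <!-- Prefixes longer than 10 characters receive no optimization -->
    <property name="wildcard.maxLength" value="10"/>
    <!-- Cap simple wildcard (and range) expansion at 1000 terms -->
    <property name="wildcard.maxTerms" value="1000"/>
  </properties>
</field>
```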

Facet Field Properties

Property

Type

Default

Reindex

Description

facet.minCutoff

integer

1

no

Default minimum frequency for facet counts returned in search.

facet.distributedMinCutoff

integer

1

no

Default minimum frequency for facet counts returned to top level dispatchers.

facet.maxNumBuckets

integer

10

no

Default maximum number of facet buckets to return.

facet.distributedMaxNumBuckets

integer

1000

no

Default maximum number of facet buckets to return to top level dispatchers.

facet.cached

boolean

true

no

If true, facet caches are held in memory. Setting this to false makes AIE perform facet calculations directly from disk, which results in slower facet execution.
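The facet defaults can be adjusted per field, as in this sketch. The field name category is hypothetical, and how faceting itself is enabled on the field is outside the scope of this fragment. All of these properties have a Reindex level of no.

```xml
<!-- Hypothetical faceted field -->
<field name="category" type="string" indexed="true" stored="true">
  <properties>
    <!-- Omit facet buckets whose count is below 5 -->
    <property name="facet.minCutoff" value="5"/>
    <!-- Return up to 20 buckets by default instead of 10 -->
    <property name="facet.maxNumBuckets" value="20"/>
  </properties>
</field>
```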

Enum Facet Properties (Expert)

Faceting can be tuned to support improved performance for fields where the number of unique terms is low.

Property

Type

Default

Reindex

Description

facet.enum

boolean

false

no

Requests enum-based facets if possible. Enum facets are characterized as having very few possible unique terms. If the field does not support enum faceting due to a high term count, the default implementation is used instead.

facet.enumMaxTerms

int

1024

no

Specifies the maximum number of terms to use for enum-based faceting. Larger values result in higher per-query memory usage. The maximum value is 16384.

facet.enumFactor

int

100

no

Specifies the cutoff factor for using a compressed enum facet when it is optimal. Setting this value to 0 disables compressed enum facets. Setting facet.enum to "true" changes the default for this value to 132. The maximum value is 200; higher settings are treated as 200.

Performance of enum facets is proportional to the number of documents that match the query. If the query matches all documents in the index (*:*) or a large percentage of them, enum-based faceting performs worse on average than default faceting. Enum-based faceting is recommended when most queries issued against the system match less than 50% of the indexed documents.
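For a field with only a handful of distinct values, enum faceting might be requested as in this sketch. The field name language and the value 256 are illustrative assumptions. If the field turns out to have too many unique terms, AIE uses the default implementation instead.

```xml
<!-- Hypothetical low-cardinality field -->
<field name="language" type="string" indexed="true" stored="true">
  <properties>
    <!-- Request enum-based faceting where possible -->
    <property name="facet.enum" value="true"/>
    <!-- Lower than the 1024 default to reduce per-query memory (maximum 16384) -->
    <property name="facet.enumMaxTerms" value="256"/>
  </properties>
</field>
```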

Facet Tuning Parameters (Expert)

You can prune the number of terms used for faceting with the parameters below. Using these parameters causes facets to return approximate counts. They can improve memory usage and facet performance at the cost of accuracy and recall.

Property

Type

Default

Reindex

Description

facet.maxTerms

integer

unlimited

no

Sets the maximum number of terms loaded for facet computation.

facet.outlierFactor

double

2.0

no

Factor used for removing high-frequency outlier terms. When values are ranked by frequency, this is the number of standard deviations to include in the facet list. Outliers beyond that point are discarded. Set to 0.0 to disable the outlier cutoff (applies only when using facet.maxTerms).

These settings are applied on a per-cache basis. When facets are segmented, or a partitioned index is used, counts on returned buckets are not guaranteed to be accurate.
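A sketch of facet pruning on a high-cardinality field follows. The field name author and both values are hypothetical; with these settings, facet counts for the field become approximate.

```xml
<!-- Hypothetical field with a very large number of unique terms -->
<field name="author" type="string" indexed="true" stored="true">
  <properties>
    <!-- Load at most 100000 terms for facet computation -->
    <property name="facet.maxTerms" value="100000"/>
    <!-- Keep terms within 3 standard deviations; only applies with facet.maxTerms -->
    <property name="facet.outlierFactor" value="3.0"/>
  </properties>
</field>
```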

Sort Field Properties

Property

Type

Default

Reindex

Description

collation.language

string

<none>

yes

Locale language for collation. If specified, a collation index is created for this field according to a language-specific sort order. If left unspecified, sorting follows Unicode sort order.

collation.country

string

<none>

yes

Locale country for collation comparisons.

collation.variant

string

<none>

yes

Locale variant for collation comparisons.

collation.strength

enum: primary, secondary, tertiary, identical

primary

yes

Expert: Strength for collation comparisons.

collation.decomposition

enum: none, canonical

none

yes

Expert: Decomposition setting for collation comparisons.
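The collation properties might be combined as in this sketch for a German-language sort order. The field name lastname is hypothetical, and the values assume ISO language and country codes as accepted by java.util.Locale. The strength comment describes typical Java/ICU collator semantics, which can vary by locale. All collation properties require re-indexing.

```xml
<!-- Hypothetical field sorted according to German collation rules -->
<field name="lastname" type="string" indexed="true" stored="true">
  <properties>
    <property name="collation.language" value="de"/>
    <property name="collation.country" value="DE"/>
    <!-- Expert: secondary strength typically distinguishes accents but not case -->
    <property name="collation.strength" value="secondary"/>
  </properties>
</field>
```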

Stored Field Properties

Property

Type

Default

Reindex

Description

store.maxChars

integer

unlimited

latent

Specifies the maximum number of characters to store per field value. This affects only the stored value of the field, not the information indexed for it (see index.maxTokens). Use it to reduce the amount of text stored for very large text field values.

You can store up to store.maxChars characters for each value of a multi-valued field.

store.hidden

boolean

false

no

If true, the field is not returned when asking for all fields (*) (useful for very large fields).

store.docValues

boolean

false

yes

If true, stored values are indexed to support optimal per-field access. This has the potential to optimize SQL/streaming access in some situations.

Experimental: setting this value to true may require re-indexing when upgrading to the next version.
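The stored-field properties can limit what is kept and returned for a very large field, as sketched below. The field name body and type="text" are assumptions for illustration. store.maxChars is latent, so documents indexed before the change keep their previously stored values until re-indexed.

```xml
<!-- Hypothetical field holding full extracted document bodies -->
<field name="body" type="text" indexed="true" stored="true">
  <properties>
    <!-- Store only the first 10000 characters of each value; indexing is unaffected -->
    <property name="store.maxChars" value="10000"/>
    <!-- Exclude this field when all fields (*) are requested -->
    <property name="store.hidden" value="true"/>
  </properties>
</field>
```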

Highlighting Properties

See Highlighting for examples of these field properties in use.

Property

Type

Default

Description

highlight.enabled

boolean

false

If true, highlight this field if QueryRequest.setHighlight(true) was set in the query.

highlight.method

enum

offsets

Sets the method for performing highlighting.
index = create term vectors in the index to support highlighting
offsets = create more compact term vectors at greater CPU investment
document = don’t create term vectors

See Highlighting Results

highlight.fragment

boolean

false

If true, splits the response field into fragments clustered around highlighted terms.
If false, returns the whole field with all matching terms highlighted.

highlight.mergeFragments

boolean

true

If true and fragmenting is enabled, fragments are merged into a single text value, separated by highlight.separator.

highlight.fallbackField

string

the field the property is set on

Field to use for fallback if fragmenting is enabled and no recommended fragments are found.

Text falls back on teaser.

The AIE Schema configures the text field to fall back on the teaser field when highlighting is on but there are no keyword matches in the text field. In that case, the teaser field value is displayed in place of the text field value.

highlight.numFragments

integer

3

Number of fragments to split response field into if highlight.fragment is true.

highlight.fragmentScope

string

null

If set, specifies the scope to use for creating snippets during highlighting.

highlight.fragmentSize

integer

100

Approximate size of each fragment in characters. NOTE: this setting is not used when highlight.fragmentScope is configured.

highlight.separator

string

"..."

Separator text to place between returned fragments.

highlight.whitelist

string

null

Comma-separated list of fields. This list defines which query terms are extracted from the query for highlighting. If not specified, all query terms are extracted regardless of the field they are specified for. If specified, only query terms for fields in the whitelist are used for highlighting.
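Several highlighting properties are typically used together. The sketch below configures fragmented highlighting with a teaser fallback on a hypothetical abstract field (the field name and values are illustrative assumptions); highlighting still runs only for queries that call QueryRequest.setHighlight(true).

```xml
<!-- Hypothetical field with fragmented highlighting -->
<field name="abstract" type="text" indexed="true" stored="true">
  <properties>
    <property name="highlight.enabled" value="true"/>
    <!-- Return up to 2 fragments of roughly 150 characters each -->
    <property name="highlight.fragment" value="true"/>
    <property name="highlight.numFragments" value="2"/>
    <property name="highlight.fragmentSize" value="150"/>
    <property name="highlight.separator" value=" ... "/>
    <!-- If no recommended fragments are found, use the teaser field instead -->
    <property name="highlight.fallbackField" value="teaser"/>
    <!-- Highlight only query terms aimed at the title and abstract fields -->
    <property name="highlight.whitelist" value="title,abstract"/>
  </properties>
</field>
```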

Decimal Field Properties

The following properties are for configuring the "decimal" field type.

Property

Type

Default

Reindex

Description

decimal.precision

integer

38

latent

Maximum precision allowed for values in this field, up to a maximum of 38. Document values that exceed this precision result in a warning.

decimal.scale

integer

0

latent

Number of decimal digits following the decimal point.

decimal.precisionStep

integer

0

required

Precision step for optimized range search indexing. Setting this to a value greater than 0 adds an optimized range search index.
Smaller values result in larger indexes (and faster queries). Larger values result in smaller indexes but less optimal range searches.
When frequent range searches are run against a decimal field, the recommended values are 1, 2, or 4.
NOTE: you must reindex after changing this setting. Failure to reindex may result in range searches not matching all desired documents.

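A decimal field intended for currency amounts and frequent range searches might be configured as in this sketch. The field name price and the precision and scale values are hypothetical; the precision step uses one of the recommended values above, which means re-indexing is required.

```xml
<!-- Hypothetical amount field: up to 12 digits, 2 of them after the decimal point -->
<field name="price" type="decimal" indexed="true" stored="true">
  <properties>
    <property name="decimal.precision" value="12"/>
    <property name="decimal.scale" value="2"/>
    <!-- Add an optimized range search index -->
    <property name="decimal.precisionStep" value="4"/>
  </properties>
</field>
```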

Query/Index Shared Properties

Property

Type

Default

Reindex

Description

stopwords.mode

enum: off, query, index

off

latent

Specifies when stopwords are removed, if at all.

If stopwords.mode = query, stopwords are removed from queries. 

If stopwords.mode = index, stopwords are removed during indexing, and also during queries.

Main article: Stopword Removal

spellcheck.field

string

null

no

If set, uses the specified field's spell-check dictionary instead of its own.
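Both shared properties are set per field, as in this sketch. The field name summary is hypothetical; pointing spellcheck.field at text assumes that field has a spell-check dictionary worth sharing.

```xml
<!-- Hypothetical natural-language field -->
<field name="summary" type="text" indexed="true" stored="true">
  <properties>
    <!-- Remove stopwords from queries only; indexing is unaffected -->
    <property name="stopwords.mode" value="query"/>
    <!-- Use the text field's spell-check dictionary instead of this field's own -->
    <property name="spellcheck.field" value="text"/>
  </properties>
</field>
```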

Document Transformer Properties

Property

Type

Default

Reindex

Description

workflow.date.format

string

""

latent

Date and time pattern suitable for java.text.SimpleDateFormat. This format is added to the list of default date formats that AIE can parse into java.util.Date objects during ingestion.

workflow.date.defaultTimezone

string

"UTC"

latent

Default timezone for dates if not specified in the date/time itself.
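The two transformer properties might be used together on a date field whose source values use a day-first format with no time zone. The field name published is hypothetical, and the timezone value assumes standard Java time zone IDs are accepted. Both properties are latent, so documents already ingested are not reparsed.

```xml
<!-- Hypothetical date field fed values such as 31/12/2015 23:59 -->
<field name="published" type="date" indexed="true" stored="true">
  <properties>
    <!-- Additional SimpleDateFormat pattern accepted during ingestion -->
    <property name="workflow.date.format" value="dd/MM/yyyy HH:mm"/>
    <!-- Applied when a value carries no explicit time zone -->
    <property name="workflow.date.defaultTimezone" value="America/New_York"/>
  </properties>
</field>
```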

Field Cache Properties

The properties below control the creation and retention of field caches used for search execution.

The *.cachePolicy settings determine the loading policy for a cache: static locks the cache into memory; soft loads the cache on demand and releases it when memory is low.

Property

Type

Default

Reindex

Description

cache.compressed

boolean

false

no

If true, memory caches for this field are compressed. Enabling this reduces memory usage in general, but results in slower access to caches for sorting/faceting/boosting/joining.

facet.cachePolicy

enum: static, soft

soft

no

Cache retention policy for the facet cache.

join.cachePolicy

enum: static, soft

soft

no

Cache retention policy for the join cache.
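As a sketch, a heavily faceted field could keep its facet cache resident while compressing it to save memory. The field name category is hypothetical; none of these settings require re-indexing.

```xml
<!-- Hypothetical faceted field used on most queries -->
<field name="category" type="string" indexed="true" stored="true">
  <properties>
    <!-- Lock the facet cache into memory instead of loading it on demand -->
    <property name="facet.cachePolicy" value="static"/>
    <!-- Compress in-memory caches: less memory, slower cache access -->
    <property name="cache.compressed" value="true"/>
  </properties>
</field>
```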

Scope Properties

The following field property relates to Scope Search and Loading XML Content. Specifically, it controls the extractScopes component of the xmlScopeIngest workflow, which copies XML elements into the text field as scopes. Typically, you set this property on the text field to create attribute scopes as well as element scopes.

Property

Type

Default

Reindex

Description

scope.xmlAttributes

boolean

false

no

If true, the extractScopes transformer extracts XML attributes into attribute scopes (prefixed with @).

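The typical use case described above, enabling attribute scopes on the text field, might look like this sketch. The type="text" attribute and other field attributes are assumptions; the project's own default.xml may define the text field differently.

```xml
<field name="text" type="text" indexed="true" stored="true">
  <properties>
    <!-- extractScopes also creates attribute scopes (for example @id) alongside element scopes -->
    <property name="scope.xmlAttributes" value="true"/>
  </properties>
</field>
```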
Natural-Language Fields

Many parts of the AIE documentation refer casually to "natural-language fields." These are the AIE Schema fields that have the naturalLanguage property set to true. By default, this includes the title and text fields, but it also includes any dynamic fields whose names end in _text, such as the hypothetical field testimony_text.

Property

Type

Default

Reindex

Description

naturalLanguage

boolean

false

no

If true, this field is identified as a natural-language field and is suitable for advanced linguistic processing.
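A field that neither is one of the defaults nor follows the _text naming convention can be marked as natural language explicitly, as in this sketch. The field name abstract is hypothetical.

```xml
<!-- Hypothetical field that should receive advanced linguistic processing -->
<field name="abstract" type="text" indexed="true" stored="true">
  <properties>
    <property name="naturalLanguage" value="true"/>
  </properties>
</field>
```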

Your project's schema is located at <project-dir>\conf\schema\default.xml.