This is what we mean by "highlighting results." We searched for "nation." AIE highlighted "nation," "national," and "nationalities."
Index-based vs. Document-based Highlighting
AIE offers index-based highlighting, in which the data is tokenized during ingestion, and document-based highlighting, in which tokenization occurs during a query.
Index-based highlighting can be configured in the Schema for tokenized string or text fields that are indexed, such as the title and text fields. (Highlighting is not supported for untokenized datatypes such as dates.)
To enable index-based highlighting for a field, you must configure the schema field in one of two ways:
- Set the highlight.enabled field property to "true", or
- Set highlight.enabled to "false", but explicitly set highlight.method to "offsets".
If a field has highlight.enabled set to "true" in the schema, then AIE highlights the field in every query for which highlighting is turned on. If highlight.enabled is "false", but highlight.method is set to "offsets", then index-based highlighting of this field can be enabled on demand by using the Teaser field expression in a query.
To enable document-based highlighting, set highlight.method to "document". This streamlines the ingestion process and slims the index, but slows down querying when highlighting is requested.
The Teaser field expression can be applied to any field or field expression, regardless of the highlight.method used during ingestion. If the field has been prepared for highlighting (highlight.method is "offsets"), Teaser uses index-based highlighting. If the underlying expression has not been prepared for highlighting (highlight.method is "document"), Teaser tokenizes the value at query time and makes a best-effort attempt to highlight the result. You can apply this approach to non-string fields and expressions, with varying success.
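The rules above can be summarized as a small decision function. The following is only an illustrative model, not AIE code: the class, enum, and parameter names are hypothetical, and it assumes that a schema-enabled field is highlighted only when highlighting is also turned on for the query, while a Teaser expression needs no such switch.

```java
/** Illustrative model of the highlighting rules described above; not part of AIE. */
final class HighlightRules {

    /** Hypothetical names for the possible outcomes. */
    enum Strategy { NONE, INDEX_BASED, DOCUMENT_BASED }

    /**
     * @param enabled          value of highlight.enabled in the schema
     * @param method           value of highlight.method in the schema, or null if unset
     * @param requestHighlight whether highlighting is turned on for the whole query
     * @param teaser           whether the field is wrapped in a Teaser expression
     */
    static Strategy resolve(boolean enabled, String method,
                            boolean requestHighlight, boolean teaser) {
        // highlight.enabled = "true" makes highlight.method default to "offsets".
        String effective = (method == null && enabled) ? "offsets" : method;
        // A field is "prepared" when highlighting data was built during ingestion.
        boolean prepared = "offsets".equals(effective) || "index".equals(effective);

        if (teaser) {
            // Teaser works on any field: index-based if prepared, else query-time tokenization.
            return prepared ? Strategy.INDEX_BASED : Strategy.DOCUMENT_BASED;
        }
        if (enabled && requestHighlight) {
            return prepared ? Strategy.INDEX_BASED : Strategy.DOCUMENT_BASED;
        }
        return Strategy.NONE;
    }
}
```

Under these assumptions, a field with highlight.enabled "false" and highlight.method "offsets" resolves to NONE for an ordinary query and to INDEX_BASED when requested through Teaser.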
Schema Field Properties
Example Highlighted Field
This is an example of configuring the title field to support result highlighting:
Note that setting highlight.enabled to "true" means that highlight.method will default to "offsets", and therefore does not need to be specified in the example.
There are two primary methods for highlighting documents at query time: index-based highlighting (which has two sub-types) and document-based highlighting. Each method has its own performance and disk space characteristics, and different methods can produce different highlighted results.
Note that index-based highlighting requires that the schema field have the indexed attribute set to "true".
Index-based highlighting prepares data for highlighting during ingestion, in order to accelerate highlighting performance in queries. This approach requires extra disk space to store special data structures (known as term vectors) for each document. This method also allows for better highlighting of phrases, as well as proper highlighting of stems, entities, synonyms, and other inflected forms generated at index time.
The required disk space/performance can be tuned by storing the term vector data in one of two different ways.
Option 1. highlight.method = "index"
Setting the highlight.method to "index" requires AIE to build term vectors for a field and store them in the index, enabling very fast result highlighting during a query. This requires the most disk space of the three highlighting options, but it also produces the fastest queries.
Option 2. highlight.method = "offsets"
Setting the highlight.method to "offsets" requires AIE to build term vectors for a field and store them in the index, just like the index option, but in this case the term vectors are compressed to take up less disk space in the index. Query-time highlighting performance will be slower as a result, especially for wildcard queries.
Query performance of this method can be improved by setting the index.termVector field property to "true" for the field. This will require more disk space (but less than is required for the index method), and it will accelerate the performance of highlighting wildcard/expansion queries.
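To make the index-based idea concrete, here is a minimal sketch of highlighting from precomputed offsets. It is not how AIE stores term vectors: the class name, the in-memory map, and the simple letter-run tokenizer are all assumptions chosen for illustration, and the sketch does no stemming or synonym expansion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: tokenize once at "ingestion", reuse the offsets at query time. */
final class OffsetHighlighter {
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    private final String text;
    // term -> list of {start, end} character offsets, in document order
    private final Map<String, List<int[]>> offsets = new HashMap<>();

    OffsetHighlighter(String text) {
        this.text = text;
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String term = m.group().toLowerCase(Locale.ROOT);
            offsets.computeIfAbsent(term, k -> new ArrayList<>())
                   .add(new int[] {m.start(), m.end()});
        }
    }

    /** Wraps each stored occurrence of queryTerm in the given tag; no re-tokenization needed. */
    String highlight(String queryTerm, String tag) {
        List<int[]> hits = offsets.getOrDefault(
                queryTerm.toLowerCase(Locale.ROOT), Collections.<int[]>emptyList());
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int[] hit : hits) {
            out.append(text, pos, hit[0])
               .append('<').append(tag).append('>')
               .append(text, hit[0], hit[1])
               .append("</").append(tag).append('>');
            pos = hit[1];
        }
        return out.append(text, pos, text.length()).toString();
    }
}
```

The trade-off matches the one described above: the offsets map costs extra storage per document, but a query only performs a lookup and a few string copies.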
Document-based highlighting does not require indexing the field, because it tokenizes stored fields at query time instead of during ingestion. As a result, highlighting large documents at query time (especially using slow tokenizers) can be very CPU-intensive.
This method can be enabled by setting the highlight.method property of the field to "document".
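As a rough illustration of why document-based highlighting costs CPU at query time, the sketch below tokenizes the stored value on every call. The class name and tokenizer are hypothetical, and the prefix comparison is only a crude stand-in for real stemming; AIE's own tokenizers are not shown here.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: nothing is precomputed, so every call scans the whole stored value. */
final class QueryTimeHighlighter {
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    static String highlight(String storedValue, String queryTerm, String tag) {
        String wanted = queryTerm.toLowerCase(Locale.ROOT);
        Matcher m = WORD.matcher(storedValue);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        // Tokenization happens here, at query time, once per highlighted document.
        while (m.find()) {
            // Prefix match as a simplistic approximation of matching inflected forms.
            if (m.group().toLowerCase(Locale.ROOT).startsWith(wanted)) {
                out.append(storedValue, pos, m.start())
                   .append('<').append(tag).append('>')
                   .append(m.group())
                   .append("</").append(tag).append('>');
                pos = m.end();
            }
        }
        return out.append(storedValue, pos, storedValue.length()).toString();
    }
}
```

For example, `QueryTimeHighlighter.highlight("nation and national", "nation", "highlight")` returns `<highlight>nation</highlight> and <highlight>national</highlight>`. The work grows with the length of the stored value, which is why large documents are expensive to highlight this way.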
Enabling Highlighting At Query Time
Highlighting at query time is enabled by calling QueryRequest.setHighlight(true) . This will result in all requested fields with highlight.enabled set to "true" in the schema being highlighted on every query.
Teaser and ScopeTeaser Field Expressions
The Teaser and ScopeTeaser Field Expressions let us request highlighting on individual fields at query time. They use the form of highlighting that is appropriate to the field. This makes it possible to highlight fields that are not configured for highlighting in the schema. It also lets us use different highlighting settings for fields that were configured for highlighting in the schema.
Using the Teaser or ScopeTeaser Field Expressions does not require turning on highlighting for the QueryRequest.
The following example is for Teaser. The ScopeTeaser field expression extends Teaser by tying the size of the result snippet to the position of scope tags in the text. See the Field Expressions page for more information.
Highlight Scope and Mode
In search results, the highlighted text is set off by scope tags. The text of the scope tag (default "highlight") is set by the QueryRequest.setHighlightScope() method.
The tag can be set by the following HTML REST parameter:
or in Java code:
The tag can be returned in one of two modes, either as an XML tag or as an HTML <span> tag.
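The difference between the two modes can be pictured as plain string wrapping. This sketch is not AIE's implementation: the class and enum names are hypothetical, and the exact markup, in particular the use of the scope name as the class attribute of the span, is an assumption.

```java
/** Hypothetical sketch of how a scope name might be rendered in each mode. */
final class ScopeTags {

    /** Illustrative mode names; the real constants may differ. */
    enum Mode { XML, HTML }

    static String wrap(String match, String scope, Mode mode) {
        if (mode == Mode.XML) {
            // XML mode: the scope name becomes the element name.
            return "<" + scope + ">" + match + "</" + scope + ">";
        }
        // HTML mode (assumed form): a span whose class carries the scope name.
        return "<span class=\"" + scope + "\">" + match + "</span>";
    }
}
```

With the default scope, `ScopeTags.wrap("nation", "highlight", ScopeTags.Mode.XML)` yields `<highlight>nation</highlight>`, while the HTML mode yields `<span class="highlight">nation</span>` under the assumption stated above.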
These modes are set using the QueryRequest.setHighlightMode() method. The HTML REST examples are:
In Java it looks like this:
Highlighting Large Documents
Very large stored fields can degrade highlighting performance. In general, indexing very large stored fields is not recommended, because it increases disk usage as well as memory usage during query retrieval.
If large stored fields will routinely be highlighted, the following recommendations help reduce the CPU and memory burden.
Example Schema Configuration
This schema configuration has two text fields: text and text.full. text will be returned by default and will be highlighted if highlighting is enabled for the query. text.full must be explicitly requested. It is recommended that text.full is only requested for one document at a time. See query examples below.
This configuration requires that you copy the value of the text field to the text.full field in the ingestion workflow.
Example that highlights the text field:
Example that requests the "full text" for a specified document: