A facet is a list of distinct values for a specific field within an Attivio Intelligence Engine (AIE) schema. The individual values within a facet are referred to as facet values. For example, given a set of documents about countries, each document might have a "region" field; the facet for the "region" field would then contain facet values such as "Asia" and "North America".
Facets also provide counts, referred to as facet value counts, of the number of documents that contain each specific facet value; so, in the country document example, the "region" facet might provide facet value counts of 22 for "Asia" and 5 for "North America", indicating that there are 5 documents marked with "North America" in the "region" field.
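The relationship between facet values and facet value counts can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the AIE client API; the sample documents and the `facet_value_counts` helper are hypothetical.

```python
from collections import Counter

# Hypothetical sample documents; field names and values are illustrative only.
documents = [
    {"title": "Japan", "region": "Asia"},
    {"title": "India", "region": "Asia"},
    {"title": "Canada", "region": "North America"},
    {"title": "Atlantis"},  # no "region" value, so it adds to no facet value
]

def facet_value_counts(docs, field):
    """Count how many documents carry each distinct value of `field`."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
    return dict(counts)

region_facet = facet_value_counts(documents, "region")
print(region_facet)
```

Each key of the resulting dictionary plays the role of a facet value, and each number is its facet value count.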
Attivio can also compute facet statistics for the facet values of numeric fields, such as the sum, average, minimum, maximum, and standard deviation.
Facets allow users to filter search results along a single dimension. For example, if a user's search results contain a list of countries with region metadata for each result, the user can request a facet on the region field and restrict the list of countries to "Far East" or "North America". Facets also provide a unique way of summarizing a set of information or viewing it along a different dimension; instead of showing field values on a per-document basis, facets show distinct field values across all of the documents that share those values.
Facets for Filtering
Below is an example of using facets to filter search results. The first image shows a standard search result listing with facets presented on the left-hand side. Notice that the "Table" facet contains multiple values with a count next to each value. The count indicates the number of documents containing that value in the faceted field. For example, the facet value "site" lists 22 next to it, meaning that there are 22 documents whose "Table" field contains the value "site". The second image shows the filtered results after a date range was chosen and the Key Phrase "Computer Science" was clicked and applied as a filter to the original search for "machine learning". Notice that the result count above the results now reads "6 results found".
Figure 1: Listing of Facets for the query "machine learning"
Figure 2: Results for the query "machine learning" filtered using the "Date" and "Key Phrases" facets
Facets for Summarizing/Exploring
Below is an example of using facets to summarize information. In this example, facets are used to provide a dashboard-like view of the information in a search result set. An end user could look at the charts below and quickly realize that "Computing" is the most common category.
Figure 3: Facets used to summarize or explore information
Facets are requested as part of a QueryRequest through one of the AIE Client APIs (Java or HTTP/REST).
Facet Request Types
The simplest example of faceting is requesting a Discrete Facet on a category field. This facet will return a list of all categories available across matching documents, along with the number of documents that are in each category.
Main Article: Discrete Facets
Range facets are similar to Discrete Facets, except they aggregate facet value counts across all values in specified ranges. These ranges consist of a minimum value, a maximum value and a label.
Main Article: Range Facets
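The range aggregation described above can be sketched as follows. This is a conceptual illustration rather than AIE code: the `range_facet` helper and the sample data are hypothetical, and treating the minimum as inclusive and the maximum as exclusive is an assumption, not documented AIE behavior.

```python
def range_facet(values, ranges):
    """Count values per (label, minimum, maximum) range.

    Assumption: minimum is inclusive and maximum is exclusive.
    A value that falls into overlapping ranges is counted in each of them.
    """
    counts = {label: 0 for label, _, _ in ranges}
    for value in values:
        for label, minimum, maximum in ranges:
            if minimum <= value < maximum:
                counts[label] += 1
    return counts

# Hypothetical populations in millions, one per matching document.
populations_millions = [0.4, 5, 38, 126, 331, 1400]
population_ranges = [
    ("small", 0, 10),
    ("medium", 10, 200),
    ("large", 200, 10_000),
]
population_facet = range_facet(populations_millions, population_ranges)
print(population_facet)
```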
Filter-based facet requests provide the greatest flexibility in defining facet buckets. Filter-Based Facets allow any query Filter to be used to define a facet bucket. The bucket count is the number of documents that match both the query and the bucket's filter.
Main Article: Filter-Based Facets
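As a rough sketch of the counting rule for filter-based buckets (documents must match both the query and the bucket filter), the example below models the query and each filter as plain Python predicates. The function name, predicates, and documents are all hypothetical and stand in for real AIE queries and Filters.

```python
def filter_based_facet(docs, query, bucket_filters):
    """Return {label: count} where each count is the number of documents
    that match both the query and that bucket's filter."""
    matching = [doc for doc in docs if query(doc)]
    return {
        label: sum(1 for doc in matching if bucket_filter(doc))
        for label, bucket_filter in bucket_filters.items()
    }

# Hypothetical documents.
articles = [
    {"title": "Intro to machine learning", "year": 2019, "category": "Computing"},
    {"title": "Machine learning in medicine", "year": 2021, "category": "Health"},
    {"title": "Gardening basics", "year": 2021, "category": "Home"},
    {"title": "Deep machine learning", "year": 2022, "category": "Computing"},
]

def ml_query(doc):
    """Stand-in for the search query "machine learning"."""
    return "machine learning" in doc["title"].lower()

bucket_filters = {
    "recent": lambda d: d["year"] >= 2021,
    "older computing": lambda d: d["category"] == "Computing" and d["year"] < 2021,
    "health or home": lambda d: d["category"] in ("Health", "Home"),
}

filter_facet = filter_based_facet(articles, ml_query, bucket_filters)
print(filter_facet)
```

Note that "Gardening basics" satisfies two of the bucket filters but is never counted, because it does not match the query.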
SchemaFacetRequest objects provide the ability to determine which fields are populated for the documents matching a query. This type of facet can provide a view of the AIE Schema that is restricted to just the documents that match a particular query. For example, using the Quick Start Tutorial sample data, a SchemaFacetRequest could be used to determine which fields are present in country documents.
Main Article: Schema Facet
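Conceptually, a schema facet reports, for each field, how many of the matching documents have that field populated. The sketch below illustrates the idea in plain Python; it does not use SchemaFacetRequest, and the helper name, the sample documents, and the rule for what counts as "populated" are assumptions made for illustration.

```python
from collections import Counter

def schema_facet(docs):
    """Count, per field name, how many documents have a non-empty value."""
    counts = Counter()
    for doc in docs:
        for field, value in doc.items():
            # Assumption: None, empty strings, and empty lists are unpopulated.
            if value not in (None, "", []):
                counts[field] += 1
    return dict(counts)

# Hypothetical country documents matching some query.
country_docs = [
    {"title": "Japan", "region": "Asia", "population": 126},
    {"title": "Canada", "region": "North America"},
    {"title": "Nauru", "population": None},
]

populated_fields = schema_facet(country_docs)
print(populated_fields)
```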
Each facet requested with a query request will return a response with zero or more buckets. Each FacetResponse represents a different dimension that can be used for filtering results. Each FacetBucket contains a label, a filter, and the number of documents that match the bucket's filter in the current result set. A bucket can be used to narrow down the result set through Facet Filtering.
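The label, filter, and count carried by a bucket, and the way a bucket's filter narrows a result set, can be sketched as below. The `SketchBucket` class and both helper functions are hypothetical stand-ins and are not the AIE FacetBucket or FacetResponse classes.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SketchBucket:
    """Hypothetical stand-in for a facet bucket: label, filter, and count."""
    label: str
    filter: Callable[[dict], bool]
    count: int

def build_buckets(results, field):
    """Build one bucket per distinct value of `field` in the result set."""
    buckets = []
    for value in sorted({doc[field] for doc in results if field in doc}):
        # Bind `value` as a default argument so each predicate keeps its own value.
        predicate = lambda doc, value=value: doc.get(field) == value
        count = sum(1 for doc in results if predicate(doc))
        buckets.append(SketchBucket(value, predicate, count))
    return buckets

def drill_down(results, bucket):
    """Narrow the result set to the documents matching the bucket's filter."""
    return [doc for doc in results if bucket.filter(doc)]

# Hypothetical result set.
country_results = [
    {"name": "Japan", "region": "Asia"},
    {"name": "India", "region": "Asia"},
    {"name": "Canada", "region": "North America"},
]

region_buckets = build_buckets(country_results, "region")
narrowed = drill_down(country_results, region_buckets[0])
print(region_buckets[0].label, region_buckets[0].count, [d["name"] for d in narrowed])
```

The number of documents left after drilling down equals the count the bucket reported beforehand.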
All facets can specify the ordering used to sort buckets. If a facet's buckets are sorted by COUNT, the buckets will be ordered by the number of documents mapped to each bucket. If the buckets are sorted by VALUE, the buckets will be ordered by the labels assigned to each bucket. Facet bucket sort order can be specified as ascending or descending.
Bucket sorting is controlled by the following settings:
Whether to sort buckets by bucket labels or by counts
The order for the primary sort
The order for the secondary sort
A secondary sort will be performed if the facet bucket values or counts contain duplicate values (depending on which one is used for the primary sort). If the primary sort is COUNT , then the bucket labels will be used as the secondary sort. If the primary sort is VALUE , then the bucket counts will be used as the secondary sort.
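The primary and secondary sort rules above can be sketched with two stable sorts. The `sort_buckets` function and its parameter names are hypothetical and do not correspond to the actual AIE setting names; only the COUNT and VALUE options come from the text above.

```python
def sort_buckets(buckets, sort_by="COUNT", descending=True, secondary_descending=False):
    """Sort (label, count) pairs by a primary key, breaking ties with the other key."""
    if sort_by == "COUNT":
        primary, secondary = (lambda b: b[1]), (lambda b: b[0])
    elif sort_by == "VALUE":
        primary, secondary = (lambda b: b[0]), (lambda b: b[1])
    else:
        raise ValueError("sort_by must be 'COUNT' or 'VALUE'")
    # Python's sort is stable, so sorting by the secondary key first and the
    # primary key second leaves ties in secondary order.
    by_secondary = sorted(buckets, key=secondary, reverse=secondary_descending)
    return sorted(by_secondary, key=primary, reverse=descending)

# Hypothetical buckets with three tied counts.
region_counts = [("Europe", 5), ("Asia", 22), ("Africa", 5), ("North America", 5)]

by_count = sort_buckets(region_counts, sort_by="COUNT", descending=True)
by_value = sort_buckets(region_counts, sort_by="VALUE", descending=False)
print(by_count)
print(by_value)
```

With COUNT as the primary sort, the three buckets tied at 5 fall back to label order.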
Facets can be requested with the option to only return buckets with counts greater than a specified value and/or to request only the top N buckets.
minBucketCount > 0
AIE enforces the requirement that the minBucketCount parameter must be positive (> 0). A value of minBucketCount = 0 therefore behaves as minBucketCount = 1.
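A minimal sketch of these two limits follows. It assumes that a bucket is kept when its count is at least minBucketCount (the exact comparison is an assumption), and the `limit_buckets` function and `max_buckets` parameter are hypothetical names introduced for illustration.

```python
def limit_buckets(buckets, min_bucket_count=1, max_buckets=None):
    """Drop low-count buckets, then keep only the top N by count."""
    # Mirrors the rule above: values below 1 are treated as 1.
    effective_min = max(1, min_bucket_count)
    # Assumption: the threshold is inclusive (count >= minimum).
    kept = [(label, count) for label, count in buckets if count >= effective_min]
    # Highest counts first; ties broken by label.
    kept.sort(key=lambda b: (-b[1], b[0]))
    return kept if max_buckets is None else kept[:max_buckets]

# Hypothetical buckets, including one with no matching documents.
all_buckets = [("Asia", 22), ("Europe", 5), ("Oceania", 1), ("Antarctica", 0)]

with_zero = limit_buckets(all_buckets, min_bucket_count=0)
with_one = limit_buckets(all_buckets, min_bucket_count=1)
top_one = limit_buckets(all_buckets, min_bucket_count=2, max_buckets=1)
print(with_zero, top_one)
```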
When requesting statistics for a facet, a FacetResponse object containing the computed statistics will be returned along with any buckets that are generated for the facet.
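To make the statistics concrete, the sketch below computes the sum, average, minimum, maximum, and standard deviation over the numeric values of a field. It is not AIE code: the `facet_statistics` helper and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the formula AIE uses is not stated here.

```python
import statistics

def facet_statistics(values):
    """Compute summary statistics over the numeric values of a field."""
    return {
        "sum": sum(values),
        "average": statistics.fmean(values),
        "minimum": min(values),
        "maximum": max(values),
        # Assumption: population standard deviation.
        "stddev": statistics.pstdev(values),
    }

# Hypothetical integer field values across the matching documents.
field_values = [2, 4, 4, 4, 5, 5, 7, 9]
stats = facet_statistics(field_values)
print(stats)
```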
Facet Statistics Example:
In this example, the *:* query is used to calculate the facet over all documents. Supplying any other query will limit the facet calculation to only the documents returned by the specified query. The field "field_i" (which is an Integer field due to the Dynamic Field naming pattern of *_i) is used in this example. Any numeric field specified in the schema that has facets enabled could be used.
Sample Schema Field Definition:
Facets in a Multi-partition Index Configuration
In a Multi-partition Index Configuration, facet calculations are performed against each index partition and then merged at the QueryDispatcher layer. In order to get accurate counts for facet buckets, all possible facet buckets must be calculated for each index and sent to the dispatcher.
Multi-Partition Index Facet Performance Considerations
Calculating accurate facet buckets and counts across multiple index partitions increases CPU load and, when the partitions are spread across multiple physical nodes, network load. Both can be reduced by having the index partitions return only a subset of the calculated buckets. Returning a subset of buckets improves performance; however, some accuracy in the counts may be lost. By default, AIE is configured to preserve facet accuracy in a multi-partition index configuration at the cost of some performance.
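The accuracy trade-off can be illustrated with a small simulation of the merge step. This is a conceptual sketch only, not the QueryDispatcher implementation; `merge_partition_facets`, `per_partition_limit`, and the per-partition counts are hypothetical.

```python
from collections import Counter

def merge_partition_facets(partition_counts, per_partition_limit=None):
    """Merge per-partition bucket counts.

    With per_partition_limit=None every partition sends all of its buckets.
    Otherwise each partition sends only its highest-count buckets.
    """
    merged = Counter()
    for counts in partition_counts:
        sent = Counter(counts).most_common(per_partition_limit)
        merged.update(dict(sent))
    return dict(merged)

# Hypothetical bucket counts computed on two index partitions.
partition_a = {"Asia": 10, "Europe": 6, "Africa": 5}
partition_b = {"Africa": 9, "Asia": 4, "Europe": 3}

accurate = merge_partition_facets([partition_a, partition_b])
truncated = merge_partition_facets([partition_a, partition_b], per_partition_limit=2)
print(accurate)
print(truncated)
```

When each partition sends only its top two buckets, "Africa" and "Europe" are undercounted after the merge, because each was dropped by one partition.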
Sample code for reducing the performance impact of faceting across multiple index partitions.
Filtering Results using a Facet
Sample code for filtering a query using a facet bucket:
This example shows drilling down on a field-based facet, and drilling down on a range-based facet.