Overview
A facet is a list of the distinct values of a specific field within an Attivio Intelligence Engine (AIE) schema. The individual values within a facet are referred to as facet values. For example, given a set of documents about countries, each document might have a "region" field; the facet for the "region" field would then contain facet values such as "Asia" and "North America".
Facet Features
Facets also provide counts, referred to as facet value counts, of the number of documents that contain each specific facet value; so, in the country document example, the "region" facet might provide facet value counts of 22 for "Asia" and 5 for "North America", indicating that there are 5 documents marked with "North America" in the "region" field.
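The idea of facet value counts can be illustrated without AIE at all. The following is a minimal sketch in plain Java, assuming documents are represented as simple maps; the class name FacetCountSketch and the method facetValueCounts are hypothetical and are not part of the AIE API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java maps stand in for indexed documents.
class FacetCountSketch {

    /** Counts how many documents carry each distinct value of one field. */
    static Map<String, Integer> facetValueCounts(List<Map<String, String>> docs, String field) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by value for stable output
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value != null) { // documents without the field contribute nothing
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("title", "Japan", "region", "Asia"),
            Map.of("title", "India", "region", "Asia"),
            Map.of("title", "Canada", "region", "North America"));
        // prints {Asia=2, North America=1}
        System.out.println(facetValueCounts(docs, "region"));
    }
}
```

Each key of the returned map corresponds to a facet value, and each map value to its facet value count.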
Attivio can also compute facet statistics for the facet values of numeric fields, such as the sum, average, minimum, maximum, and standard deviation.
Facets allow users to filter search results along a single dimension. For example, if a user's search results contain a list of countries with region metadata for each result, the user can request a facet on the region field and restrict the list of countries to "Far East" or "North America". Facets also provide a way of summarizing a set of information or viewing it along a different dimension: instead of showing field values on a per-document basis, facets show distinct field values across all of the documents that share those values.
Facets for Filtering
Below is an example of using facets to filter search results. The first image shows a standard search result listing with facets presented on the left-hand side. Notice that the "Table" facet contains multiple values with a count next to each value. The count indicates the number of documents containing that value in the faceted field. For example, the facet value "site" lists 22 next to it, meaning that there are 22 documents whose "Table" field contains the value "site". The second image shows the filtered results after a date range was chosen and the Key Phrase "Computer Science" was clicked and applied as a filter to the original search for "machine learning". Notice that above the results, the result count now lists "6 results found".
Figure 1: Listing of Facets for the query "machine learning"
Figure 2: Results for the query "machine learning" filtered using the "Date" and "Key Phrases" facets
Facets for Summarizing/Exploring
Below is an example of using facets to summarize information. In this example, facets are used to provide a dashboard-like view of the information in a search result set. An end user could look at the charts below and quickly realize that "Computing" is the most common category.
Figure 3: Facets used to summarize or explore information
Using Facets
Facets are requested as part of a QueryRequest through one of the AIE Client APIs (Java or HTTP/REST).
Facet Request Types
The simplest example of faceting is requesting a Discrete Facet on a category field. This facet will return a list of all categories available across matching documents, along with the number of documents that are in each category.
Main Article: Discrete Facets
Range facets are similar to Discrete Facets, except they aggregate facet value counts across all values in specified ranges. These ranges consist of a minimum value, a maximum value and a label.
Main Article: Range Facets
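As a rough illustration of how range buckets aggregate counts, here is a minimal sketch in plain Java. The Range class and rangeCounts method are hypothetical stand-ins rather than AIE types, and the sketch assumes an inclusive minimum and an exclusive maximum; the actual boundary handling is described in the Range Facets article.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RangeFacetSketch {

    /** Hypothetical range definition: a minimum, a maximum, and a label. */
    static final class Range {
        final double min;
        final double max;
        final String label;

        Range(double min, double max, String label) {
            this.min = min;
            this.max = max;
            this.label = label;
        }

        // Assumption: lower bound inclusive, upper bound exclusive.
        boolean contains(double value) {
            return value >= min && value < max;
        }
    }

    /** Counts how many values fall into each range, in the order the ranges were given. */
    static Map<String, Integer> rangeCounts(List<Double> values, List<Range> ranges) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Range range : ranges) {
            counts.put(range.label, 0); // empty ranges are kept here for clarity
        }
        for (double value : values) {
            for (Range range : ranges) {
                if (range.contains(value)) {
                    counts.merge(range.label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Range> ranges = List.of(
            new Range(0, 500, "small"),
            new Range(500, 900, "medium"),
            new Range(900, 5000, "large"));
        // prints {small=2, medium=2, large=1}
        System.out.println(rangeCounts(List.of(120.0, 450.0, 500.0, 899.0, 900.0), ranges));
    }
}
```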
Filter-based facet requests provide the greatest flexibility in defining facet buckets. Filter-Based Facets allow any query Filter to define a facet bucket. The bucket count is the number of documents that match both the query and the bucket's filter.
Main Article: Filter-Based Facets
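The counting rule for filter-based buckets (a document counts only if it matches both the query and the bucket filter) can be sketched with ordinary Java predicates. This is an illustration only: FilterFacetSketch and bucketCounts are hypothetical names, and predicates stand in for AIE queries and Filters.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class FilterFacetSketch {

    /** For each labeled filter, counts the documents matching both the query and that filter. */
    static <D> Map<String, Long> bucketCounts(
            List<D> docs, Predicate<D> query, Map<String, Predicate<D>> bucketFilters) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<D>> bucket : bucketFilters.entrySet()) {
            long matches = docs.stream()
                .filter(query.and(bucket.getValue())) // query AND bucket filter
                .count();
            counts.put(bucket.getKey(), matches);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> titles = List.of(
            "machine learning basics", "deep learning", "deep sea fishing", "learning to cook");
        Map<String, Predicate<String>> filters = new LinkedHashMap<>();
        filters.put("short title", t -> t.length() < 20);
        filters.put("mentions deep", t -> t.contains("deep"));
        // "deep sea fishing" satisfies both filters but not the query, so it is never counted.
        // prints {short title=2, mentions deep=1}
        System.out.println(bucketCounts(titles, t -> t.contains("learning"), filters));
    }
}
```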
SchemaFacetRequest objects provide the ability to determine which fields are populated for the documents matching a query. This type of facet can provide a view of the AIE Schema that is restricted to just the documents that match a particular query. For example, using the Quick Start Tutorial sample data, a SchemaFacetRequest could be used to determine which fields are present in country documents.
Main Article: Schema Facet
Facet Responses
Each facet requested with a query request returns a FacetResponse with zero or more buckets. Each FacetResponse represents a different dimension that can be used for filtering results. Each FacetBucket contains a label, a filter, and the number of documents that match the bucket's filter in the current result set. A bucket can be used to narrow the result set through Facet Filtering.
Bucket Sorting
All facets can specify the order in which their buckets are sorted. If a facet's buckets are sorted by COUNT, the buckets are ordered by the number of documents mapped to each bucket. If the buckets are sorted by VALUE, they are ordered by the labels assigned to each bucket. The sort order can be specified as ascending or descending.
Bucket sorting is controlled by the following settings:
Setting | Description |
---|---|
Whether to sort buckets by bucket labels or counts. | |
The order for the primary sort | |
The order for the secondary sort |
Secondary Sort
A secondary sort will be performed if the facet bucket values or counts contain duplicate values (depending on which one is used for the primary sort). If the primary sort is COUNT , then the bucket labels will be used as the secondary sort. If the primary sort is VALUE , then the bucket counts will be used as the secondary sort.
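The primary and secondary sort rule can be expressed as a chained comparator. The sketch below uses plain Java; the Bucket class and the boolean parameters are hypothetical and do not correspond to the AIE setting names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BucketSortSketch {

    /** Hypothetical stand-in for a facet bucket: a label and a document count. */
    static final class Bucket {
        final String label;
        final long count;

        Bucket(String label, long count) {
            this.label = label;
            this.count = count;
        }

        @Override
        public String toString() {
            return label + "=" + count;
        }
    }

    /**
     * Sorts by COUNT (byCountFirst = true) or VALUE (false) as the primary key;
     * the other key breaks ties, as described for the secondary sort.
     */
    static List<Bucket> sort(List<Bucket> buckets, boolean byCountFirst,
                             boolean primaryDescending, boolean secondaryDescending) {
        Comparator<Bucket> count = Comparator.comparingLong(b -> b.count);
        Comparator<Bucket> label = Comparator.comparing(b -> b.label);
        Comparator<Bucket> primary = byCountFirst ? count : label;
        Comparator<Bucket> secondary = byCountFirst ? label : count;
        if (primaryDescending) {
            primary = primary.reversed();
        }
        if (secondaryDescending) {
            secondary = secondary.reversed();
        }
        List<Bucket> sorted = new ArrayList<>(buckets); // leave the input untouched
        sorted.sort(primary.thenComparing(secondary));
        return sorted;
    }

    public static void main(String[] args) {
        List<Bucket> buckets = List.of(
            new Bucket("Europe", 12), new Bucket("Asia", 22), new Bucket("Africa", 12));
        // COUNT descending, ties broken by label ascending:
        // prints [Asia=22, Africa=12, Europe=12]
        System.out.println(sort(buckets, true, true, false));
    }
}
```

Here "Africa" and "Europe" share a count of 12, so only the secondary sort decides their relative order.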
Bucket Filtering
Facets can be requested with the option to return only buckets whose counts meet a specified minimum and/or to return only the top N buckets.
minBucketCount > 0
AIE enforces the requirement that the minBucketCount parameter must be positive (> 0). A value of minBucketCount = 0 therefore behaves as minBucketCount = 1.
Example (Java)
// Request at most 10 buckets, each with a count greater than or equal to 5
FacetRequest facet = new FacetRequest("category");
facet.setMaxBuckets(10);     // return at most 10 buckets
facet.setMinBucketCount(5);  // return only those buckets whose count is at least 5
Example (HTTP/REST)
q=*:*&facet=category(maxBuckets=10,minBucketCount=5)
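To make the effect of these two options concrete, the following plain-Java sketch applies a minimum count and a bucket limit to a map of counts. It is an illustration, not AIE code: BucketFilterSketch and select are hypothetical, and the sketch assumes buckets are ranked by count (descending, ties by label) before the limit is applied. The treatment of a minimum of 0 as 1 follows the note above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BucketFilterSketch {

    /** Returns the labels of at most maxBuckets buckets whose count is at least minBucketCount. */
    static List<String> select(Map<String, Integer> counts, int minBucketCount, int maxBuckets) {
        final int effectiveMin = Math.max(1, minBucketCount); // 0 behaves as 1
        Comparator<Map.Entry<String, Integer>> byCountDesc =
            (a, b) -> Integer.compare(b.getValue(), a.getValue());
        Comparator<Map.Entry<String, Integer>> byLabel =
            (a, b) -> a.getKey().compareTo(b.getKey());
        return counts.entrySet().stream()
            .filter(e -> e.getValue() >= effectiveMin)   // drop buckets below the minimum
            .sorted(byCountDesc.thenComparing(byLabel))  // assumed ranking before truncation
            .limit(maxBuckets)                           // keep only the top N
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of(
            "Asia", 22, "Europe", 12, "North America", 5, "Oceania", 3, "Antarctica", 0);
        System.out.println(select(counts, 5, 2));  // prints [Asia, Europe]
        System.out.println(select(counts, 0, 10)); // prints [Asia, Europe, North America, Oceania]
    }
}
```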
Facet Statistics
Facet Statistics compute statistical information about a facet's values. Statistics can be requested on any FacetRequest or RangeFacetRequest on a numeric field (integer, long, double, float).
When requesting statistics for a facet, a FacetResponse object containing the computed statistics will be returned along with any buckets that are generated for the facet.
Facet Statistics Example:
In this example, the *:* query is used to calculate the facet over all documents. Supplying any other query will limit the facet calculation to only the documents returned by the specified query. The field "field_i" (which is an Integer field due to the Dynamic Field naming pattern of *_i) is used in this example. Any numeric field specified in the schema that has facets enabled could be used.
Sample Schema Field Definition:
... <field name="myfield" type="INTEGER" indexed="yes" facet="yes" .../> ...
Example (Java)
SearchClient client = ...; // Create the search client

// Create the query request
QueryRequest request = new QueryRequest("*:*");
FacetRequest facetRequest = new FacetRequest("field_i");
facetRequest.setCalculateStatistics(true); // Ask for statistics
facetRequest.setFacetFinder(false);        // Disable facet finder (otherwise FacetRequest may be altered)
request.addFacet(facetRequest);            // Add to query request

// Get the query response
QueryResponse response = client.search(request);
FacetResponse facet = response.getFacet("field_i");
FacetResponseStatistics stats = facet.getStatistics();

// Show the statistics
System.err.println("Facet Statistics:");
System.err.printf(" Count: %d\n", stats.getCount());
System.err.printf(" Min: %d\n", stats.getMin());
System.err.printf(" Max: %d\n", stats.getMax());
System.err.printf(" Sum: %d\n", stats.getSum());
System.err.printf(" Sum of Squares: %d\n", stats.getSumOfSquares());
System.err.printf(" Mean: %.3f\n", stats.getMean());
System.err.printf(" Mid Point: %.3f\n", stats.getMidPoint());
System.err.printf(" Variance: %.3f\n", stats.getVariance());
System.err.printf(" Stdev: %.3f\n", stats.getStandardDeviation());
Example (HTTP/REST)
q=*:*&facet=field_i(statistics=true)
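For readers who want to see how the reported quantities relate to one another, here is a small standalone sketch that derives them from raw values. It is not the AIE implementation: FacetStatsSketch is a hypothetical class, the mid point is assumed to be the average of the minimum and maximum, and the variance is assumed to be the population variance (AIE may define either differently).

```java
class FacetStatsSketch {
    final long count;
    final double min;
    final double max;
    final double sum;
    final double sumOfSquares;

    FacetStatsSketch(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        double lo = values[0];
        double hi = values[0];
        double total = 0;
        double squares = 0;
        for (double v : values) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
            total += v;
            squares += v * v;
        }
        count = values.length;
        min = lo;
        max = hi;
        sum = total;
        sumOfSquares = squares;
    }

    double mean() {
        return sum / count;
    }

    // Assumption: mid point is halfway between min and max.
    double midPoint() {
        return (min + max) / 2;
    }

    // Assumption: population variance, computed as E[x^2] - (E[x])^2.
    double variance() {
        return sumOfSquares / count - mean() * mean();
    }

    double standardDeviation() {
        return Math.sqrt(variance());
    }

    public static void main(String[] args) {
        FacetStatsSketch stats = new FacetStatsSketch(new double[] {2, 4, 4, 4, 5, 5, 7, 9});
        System.out.println("mean=" + stats.mean() + " variance=" + stats.variance()
            + " stdev=" + stats.standardDeviation());
    }
}
```

For the eight sample values, the sum is 40 and the sum of squares is 232, giving a mean of 5, a variance of 4, and a standard deviation of 2 under the assumptions above.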
Additional Statistics
The Median and Mode can be computed quite easily using facets.
Main Article: Facet Median
Main Article: Facet Mode
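One way to see why facets make these two statistics easy: given the facet value counts of a numeric field, the mode is the value with the highest count, and the median can be found by walking the values in order while accumulating counts. The sketch below shows this in plain Java. It is not necessarily the approach taken in the linked articles; MedianModeSketch is a hypothetical class, and for an even number of documents it returns the lower of the two middle values.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class MedianModeSketch {

    /** Mode: the value with the highest count (the smallest such value on ties). */
    static int mode(SortedMap<Integer, Integer> countsByValue) {
        int best = countsByValue.firstKey(); // throws NoSuchElementException if empty
        for (Map.Entry<Integer, Integer> e : countsByValue.entrySet()) {
            if (e.getValue() > countsByValue.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    /** Median: the value at which the running count first reaches the middle position. */
    static int median(SortedMap<Integer, Integer> countsByValue) {
        long total = 0;
        for (int c : countsByValue.values()) {
            total += c;
        }
        long target = (total + 1) / 2; // 1-based position of the (lower) median
        long seen = 0;
        for (Map.Entry<Integer, Integer> e : countsByValue.entrySet()) {
            seen += e.getValue();
            if (seen >= target) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("no buckets");
    }

    public static void main(String[] args) {
        // Facet value -> facet value count, ordered by value.
        SortedMap<Integer, Integer> counts = new TreeMap<>(Map.of(1, 3, 2, 1, 5, 4, 9, 1));
        System.out.println("median=" + median(counts) + " mode=" + mode(counts)); // prints median=5 mode=5
    }
}
```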
Facets in a Multi-partition Index Configuration
In a Multi-partition Index Configuration, facet calculations are performed against each index partition and then merged at the QueryDispatcher layer. In order to get accurate counts for facet buckets, all possible facet buckets must be calculated for each index and sent to the dispatcher.
Multi-Partition Index Facet Performance Considerations
Calculating accurate facet buckets and counts across multiple index partitions increases network load (when partitions are spread across multiple physical nodes) and CPU load. Both can be reduced by having the index partitions return only a subset of the calculated buckets. Returning a subset of buckets improves performance; however, some accuracy in the counts may be lost. By default, AIE is configured to preserve facet accuracy in a multi-partition index configuration at the cost of some performance.
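The trade-off between accuracy and load can be seen in a small simulation. The sketch below is plain Java and only models the idea: each partition reports its top buckets, and a dispatcher sums what it receives. PartitionMergeSketch, topBuckets, and merge are hypothetical names, and the way real partitions choose which buckets to send is an assumption here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitionMergeSketch {

    /** What one partition sends: its buckets ranked by count, truncated to 'limit'. */
    static Map<String, Integer> topBuckets(Map<String, Integer> partitionCounts, int limit) {
        Map<String, Integer> top = new LinkedHashMap<>();
        partitionCounts.entrySet().stream()
            .sorted((a, b) -> {
                int byCount = Integer.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            })
            .limit(limit)
            .forEach(e -> top.put(e.getKey(), e.getValue()));
        return top;
    }

    /** What the dispatcher computes: the sum of the counts it actually received. */
    static Map<String, Integer> merge(List<Map<String, Integer>> partitions, int limit) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> partition : partitions) {
            topBuckets(partition, limit)
                .forEach((label, n) -> merged.merge(label, n, Integer::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> partitions = List.of(
            Map.of("news", 10, "sports", 6, "tech", 5),
            Map.of("tech", 9, "news", 4, "sports", 3));
        System.out.println(merge(partitions, 3)); // prints {news=14, sports=9, tech=14}
        System.out.println(merge(partitions, 2)); // prints {news=14, sports=6, tech=9}
    }
}
```

With every bucket returned, the merged counts are exact. When each partition returns only its top two buckets, "tech" drops from 14 to 9 and "sports" from 9 to 6, because the dispatcher never sees the counts that were cut off.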
Example (Java)
Sample code for reducing the performance impact of faceting across multiple index partitions.
FacetRequest facet = new FacetRequest("category");
facet.setMaxBuckets(10);
facet.setMinBucketCount(5);

// Set the maximum number of buckets to return to the dispatcher.
// This should always be greater than the max buckets requested
// so that at least the max buckets number of buckets is returned
// if that many buckets exist across the multiple index partitions.
facet.setDistributedMaxBuckets(160);

// Set the minimum count a bucket must have to be returned to the dispatcher.
// This should always be less than or equal to the min bucket count requested
// so that the user does not end up receiving no buckets when there are
// buckets that satisfy the min bucket count.
facet.setDistributedMinBucketCount(2);
Example (HTTP/REST)
q=*:*&facet=category(maxBuckets=10,minBucketCount=5,distributedMaxBuckets=160,distributedMinBucketCount=2)
Filtering Results using a Facet
Facet buckets returned as part of a facet request can be used to filter the results of a subsequent Query Request.
Example (Java)
Sample code for filtering a query using a facet bucket:
// Execute the search query
QueryRequest query = new QueryRequest("\"hello world\"");
query.addFacet(new FacetRequest("category"));
QueryResponse response = client.search(query);

// Filter results using the first bucket returned for the category facet
// (assumes at least one bucket was returned)
FacetResponse facet = response.getFacet("category");
QueryRequest filtered = new QueryRequest("\"hello world\"");
// Filter by the first category in the category facet
filtered.addFacetFilter(facet.get(0));

// Execute the filtered query
response = client.search(filtered);

// Display the filtered response
for (SearchDocument doc : response.getDocuments()) {
    System.out.println(doc);
}
Example (HTTP/REST)
This example shows drilling down on a field-based facet, and drilling down on a range-based facet.
// Filter on discrete facet value 'Research & Development' for the 'department' field
q=*:*&facet.filter=department:"Research & Development"

// Filter on a facet range of the 'size' field
q=*:*&rangefacet.filter=size:[500 TO 900}
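When such a filter is sent in a real URL, the parameter values generally need to be URL-encoded; otherwise the "&" inside "Research & Development" would be read as a parameter separator. The sketch below uses only the standard java.net.URLEncoder class to build the query string. The class and method names are hypothetical, the parameter names q and facet.filter are taken from the example above, and the host and path of the endpoint are deliberately left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FacetFilterUrlSketch {

    /** Builds a query string with a URL-encoded query and facet filter. */
    static String queryString(String query, String facetFilter) {
        return "q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
            + "&facet.filter=" + URLEncoder.encode(facetFilter, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints q=*%3A*&facet.filter=department%3A%22Research+%26+Development%22
        System.out.println(queryString("*:*", "department:\"Research & Development\""));
    }
}
```

URLEncoder produces form-style encoding, so spaces become "+"; this sketch assumes the receiving endpoint decodes query parameters in the usual way.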