
Overview

A facet is a list of distinct values for a specific field within an Attivio Intelligence Engine (AIE) schema. The individual values within a facet are referred to as facet values. For example, given a set of documents about countries, each document might have a "region" field; the facet for the "region" field would then contain facet values such as "Asia" and "North America".


Facet Features

Facets also provide counts, referred to as facet value counts, of the number of documents that contain each specific facet value. In the country document example, the "region" facet might provide facet value counts of 22 for "Asia" and 5 for "North America", indicating that 22 documents have "Asia" and 5 documents have "North America" in the "region" field.

Attivio can also compute facet statistics for the facet values of numeric fields, such as the sum, average, minimum, maximum, and standard deviation.

Facets allow users to filter search results based on a single dimension in the search results. For example, if a user's search results contain a list of countries with region metadata for each result, the user can request a facet on the region field and restrict the list of countries to "Far East" or "North America". Facets also provide a unique way of summarizing a set of information or viewing it along a different dimension; instead of showing field values on a per document basis, facets show distinct field values across all of the documents that share those values.

Facets for Filtering

Below is an example of using facets to filter search results. The first image shows a standard search result listing with facets presented on the left-hand side. Notice that the "Table" facet contains multiple values with a count next to each value. The count indicates the number of documents containing that value in the faceted field. For example, the facet value "site" lists 22 next to it, meaning that 22 documents have a "Table" field containing the value "site". The second image shows the filtered results after a date range was chosen and the Key Phrase "Computer Science" was clicked, applying both as filters to the original search for "machine learning". Notice that the result count above the results now reads "6 results found".

Figure 1: Listing of Facets for the query "machine learning"

Figure 2: Results for the query "machine learning" filtered using the "Date" and "Key Phrases" facets

Facets for Summarizing/Exploring

Below is an example of using facets to summarize information. In this example, facets are used to provide a dashboard-like view of the information in a search result set. An end user could look at the charts below and quickly realize that "Computing" is the most common category.

Figure 3: Facets used to summarize or explore information

Using Facets

Facets are requested as part of a QueryRequest through one of the AIE Client APIs (Java or HTTP/REST).

Facet Request Types

The simplest example of faceting is requesting a Discrete Facet on a category field. This facet will return a list of all categories available across matching documents, along with the number of documents that are in each category.

Main Article: Discrete Facets
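
The counting behind a discrete facet can be pictured with a small, self-contained sketch. It does not use the AIE API; the class name DiscreteFacetSketch, the method countValues, and the sample region values are hypothetical and exist only to illustrate how distinct values map to document counts.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: counts distinct field values the way a discrete facet reports them. */
class DiscreteFacetSketch {

    /** Returns each distinct value mapped to the number of documents carrying it. */
    static Map<String, Integer> countValues(List<String> fieldValues) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : fieldValues) {
            // Add 1 to the running count for this value (or start at 1)
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One "region" value per matching document (hypothetical sample data)
        List<String> regions = List.of("Asia", "North America", "Asia", "Europe", "Asia");
        System.out.println(countValues(regions));
    }
}
```

Each map entry corresponds to one facet bucket: the key plays the role of the facet value and the number is its facet value count.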

Range facets are similar to Discrete Facets, except they aggregate facet value counts across all values in specified ranges. These ranges consist of a minimum value, a maximum value and a label.

Main Article: Range Facets
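
As a rough illustration of range aggregation, the following plain-Java sketch groups numeric values into labeled ranges. It is not the AIE implementation; the Range class, the countRanges method, and the choice of an inclusive minimum with an exclusive maximum are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: aggregates numeric values into labeled ranges. */
class RangeFacetSketch {

    /** A labeled range; this sketch assumes min is inclusive and max is exclusive. */
    static final class Range {
        final String label;
        final double min;
        final double max;

        Range(String label, double min, double max) {
            this.label = label;
            this.min = min;
            this.max = max;
        }

        boolean contains(double value) {
            return value >= min && value < max;
        }
    }

    /** Returns, for each range label, the number of values that fall inside the range. */
    static Map<String, Integer> countRanges(List<Double> values, List<Range> ranges) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Range range : ranges) {
            int count = 0;
            for (double value : values) {
                if (range.contains(value)) {
                    count++;
                }
            }
            counts.put(range.label, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical ranges and values for a numeric "size" field
        List<Range> ranges = List.of(
            new Range("small", 0, 500),
            new Range("medium", 500, 900),
            new Range("large", 900, Double.MAX_VALUE));
        List<Double> sizes = List.of(120.0, 500.0, 650.0, 899.0, 900.0, 2400.0);
        System.out.println(countRanges(sizes, ranges));
    }
}
```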

Filter-based facet requests provide the greatest flexibility in defining facet buckets. Filter-Based Facets allow any query Filter to define a facet bucket. The bucket count is the number of documents that match both the query and the bucket's filter.

Main Article: Filter-Based Facets
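
The "match both the query and the bucket filter" rule can be sketched in plain Java with predicates standing in for queries and filters. This is not AIE code; FilterFacetSketch, countBuckets, and the sample titles are hypothetical names and data chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative only: each bucket is defined by an arbitrary filter. */
class FilterFacetSketch {

    /** Counts, per bucket, the documents that match both the query and the bucket's filter. */
    static <T> Map<String, Long> countBuckets(List<T> docs,
                                              Predicate<T> query,
                                              Map<String, Predicate<T>> buckets) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<T>> bucket : buckets.entrySet()) {
            long count = docs.stream().filter(query.and(bucket.getValue())).count();
            counts.put(bucket.getKey(), count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical documents, represented only by their titles
        List<String> titles = List.of(
            "machine learning basics", "deep learning", "machine vision", "cooking");
        Predicate<String> query = title -> title.contains("learning");

        Map<String, Predicate<String>> buckets = new LinkedHashMap<>();
        buckets.put("mentions machine", title -> title.contains("machine"));
        buckets.put("short title", title -> title.length() < 15);

        System.out.println(countBuckets(titles, query, buckets));
    }
}
```

Because buckets are independent filters, a single document may be counted in more than one bucket, or in none.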

SchemaFacetRequest objects provide the ability to determine which fields are populated for the documents matching a query. This type of facet can provide a view of the AIE Schema that is restricted to just the documents that match a particular query. For example, using the Quick Start Tutorial sample data, a SchemaFacetRequest could be used to determine which fields are present in country documents.

Main Article: Schema Facet
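
As a rough picture of what a schema facet reports, the sketch below counts how many documents populate each field. It models documents as simple maps and does not call the AIE API; SchemaFacetSketch, populatedFields, and the sample documents are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: reports which fields are populated across a set of documents. */
class SchemaFacetSketch {

    /** Returns each field name mapped to the number of documents with a non-null value for it. */
    static Map<String, Integer> populatedFields(List<Map<String, Object>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Object> doc : docs) {
            for (Map.Entry<String, Object> field : doc.entrySet()) {
                if (field.getValue() != null) {
                    counts.merge(field.getKey(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical country documents; the values are placeholders
        Map<String, Object> first = Map.of("title", "Country A", "region", "Asia");
        Map<String, Object> second = Map.of("title", "Country B", "region", "Europe", "population", 1000);
        System.out.println(populatedFields(List.of(first, second)));
    }
}
```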

Facet Responses

Each facet requested with a query request returns a FacetResponse with zero or more buckets. Each FacetResponse represents a different dimension that can be used for filtering results. Each FacetBucket contains a label, a filter, and the number of documents in the current result set that match the bucket's filter. A bucket can be used to narrow the result set through Facet Filtering.

Bucket Sorting

All facets can specify the ordering used to sort buckets. If a facet's buckets are sorted by COUNT, the buckets will be ordered by the number of documents mapped to each bucket. If the buckets are sorted by VALUE, the buckets will be ordered by the labels assigned to each bucket. Facet bucket sort order can be specified as ascending or descending.

Bucket sorting is controlled by the following settings:

Setting | Description
Sort Field | Whether to sort buckets by bucket labels or by counts.
Primary Sort Order | The order (ascending or descending) for the primary sort.
Secondary Sort Order | The order (ascending or descending) for the secondary sort.

Secondary Sort

A secondary sort will be performed if the facet bucket values or counts contain duplicate values (depending on which one is used for the primary sort). If the primary sort is COUNT , then the bucket labels will be used as the secondary sort. If the primary sort is VALUE , then the bucket counts will be used as the secondary sort.
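
The primary and secondary sort rule can be sketched with a pair of chained comparators. This is plain Java rather than the AIE API; BucketSortSketch, its Bucket class, and the chosen orders (count descending, then label ascending) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: a COUNT primary sort with bucket labels as the secondary sort. */
class BucketSortSketch {

    static final class Bucket {
        final String label;
        final long count;

        Bucket(String label, long count) {
            this.label = label;
            this.count = count;
        }

        @Override
        public String toString() {
            return label + " (" + count + ")";
        }
    }

    /** Sorts by count descending; buckets with equal counts are ordered by label ascending. */
    static List<Bucket> sortByCount(List<Bucket> buckets) {
        Comparator<Bucket> byCountDescending =
            Comparator.comparingLong((Bucket b) -> b.count).reversed();
        Comparator<Bucket> byLabelAscending =
            Comparator.comparing((Bucket b) -> b.label);

        List<Bucket> sorted = new ArrayList<>(buckets);
        sorted.sort(byCountDescending.thenComparing(byLabelAscending));
        return sorted;
    }

    public static void main(String[] args) {
        // "Europe" and "Asia" tie on count, so the label decides their order
        List<Bucket> buckets = List.of(
            new Bucket("Europe", 5), new Bucket("Asia", 5), new Bucket("Africa", 9));
        System.out.println(sortByCount(buckets));
    }
}
```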

Bucket Filtering

Facets can be requested with the option to return only buckets whose counts are greater than or equal to a specified value and/or to return only the top N buckets.

minBucketCount > 0

AIE enforces the requirement that the minBucketCount parameter must be positive (> 0).  A value of minBucketCount = 0 therefore behaves as minBucketCount = 1.

 

Example (Java)

// Request at most 10 buckets, each with a count greater than or equal to 5
FacetRequest facet = new FacetRequest("category");
facet.setMaxBuckets(10);    // return at most 10 buckets
facet.setMinBucketCount(5); // return only those buckets whose count is at least 5

Example (HTTP/REST)

q=*:*&facet=category(maxBuckets=10,minBucketCount=5)

 

Facet Statistics

Facet Statistics compute statistical information about a facet's values. Statistics can be requested on any FacetRequest or RangeFacetRequest on a numeric field (integer, long, double, float).

When requesting statistics for a facet, a FacetResponse object containing the computed statistics will be returned along with any buckets that are generated for the facet.

Facet Statistics Example:

In this example, the *:* query is used to calculate the facet over all documents. Supplying any other query will limit the facet calculation to only the documents returned by the specified query. The field "field_i" (which is an Integer field due to the Dynamic Field naming pattern of *_i) is used in this example. Any numeric field specified in the schema that has facets enabled could be used.

Sample Schema Field Definition:

<project_dir>/conf/schema.xml
...
   <field name="myfield" type="INTEGER" indexed="yes" facet="yes" .../>
...

Example (Java)

SearchClient client = ...; // Create the search client

// Create the query request
QueryRequest request = new QueryRequest("*:*");
FacetRequest facetRequest = new FacetRequest("field_i");
facetRequest.setCalculateStatistics(true); // Ask for statistics
facetRequest.setFacetFinder(false); // Disable facet finder (otherwise FacetRequest may be altered)
request.addFacet(facetRequest); // Add to query request

// Get the query response
QueryResponse response = client.search(request);
FacetResponse facet = response.getFacet("field_i");
FacetResponseStatistics stats = facet.getStatistics();

// Show the statistics
System.err.println("Facet Statistics:");
System.err.printf ("           Count: %d\n",   stats.getCount());
System.err.printf ("             Min: %d\n",   stats.getMin());
System.err.printf ("             Max: %d\n",   stats.getMax());
System.err.printf ("             Sum: %d\n",   stats.getSum());
System.err.printf ("  Sum of Squares: %d\n",   stats.getSumOfSquares());
System.err.printf ("            Mean: %.3f\n", stats.getMean());
System.err.printf ("       Mid Point: %.3f\n", stats.getMidPoint());
System.err.printf ("        Variance: %.3f\n", stats.getVariance());
System.err.printf ("           Stdev: %.3f\n", stats.getStandardDeviation());

Example (HTTP/REST)

q=*:*&facet=field_i(statistics=true)
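
For readers who want to see how the statistics relate to one another, here is a self-contained sketch that derives them from raw values. It is not the AIE implementation: FacetStatisticsSketch and its members are hypothetical, and the sketch assumes a population variance (dividing by the count) and a mid point defined as the average of the minimum and maximum, which may differ from what AIE computes.

```java
/** Illustrative only: derives common statistics from a set of numeric field values. */
class FacetStatisticsSketch {

    static final class Stats {
        final long count;
        final double min, max, sum, sumOfSquares, mean, midPoint, variance, stdev;

        Stats(long count, double min, double max, double sum, double sumOfSquares) {
            this.count = count;
            this.min = min;
            this.max = max;
            this.sum = sum;
            this.sumOfSquares = sumOfSquares;
            this.mean = sum / count;
            // Assumption: mid point is halfway between min and max
            this.midPoint = (min + max) / 2.0;
            // Assumption: population variance, E[x^2] - (E[x])^2, clamped at zero
            this.variance = Math.max(0.0, sumOfSquares / count - mean * mean);
            this.stdev = Math.sqrt(variance);
        }
    }

    static Stats compute(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        double min = values[0];
        double max = values[0];
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (double value : values) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            sumOfSquares += value * value;
        }
        return new Stats(values.length, min, max, sum, sumOfSquares);
    }

    public static void main(String[] args) {
        Stats stats = compute(new double[] {2, 4, 4, 4, 5, 5, 7, 9});
        System.out.printf("count=%d min=%.1f max=%.1f sum=%.1f%n",
            stats.count, stats.min, stats.max, stats.sum);
        System.out.printf("mean=%.3f midPoint=%.3f variance=%.3f stdev=%.3f%n",
            stats.mean, stats.midPoint, stats.variance, stats.stdev);
    }
}
```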

 

Additional Statistics

The Median and Mode can be computed quite easily using facets.

Main Article: Facet Median
Main Article: Facet Mode

Facets in a Multi-partition Index Configuration

In a Multi-partition Index Configuration, facet calculations are performed against each index partition and then merged at the QueryDispatcher layer. In order to get accurate counts for facet buckets, all possible facet buckets must be calculated for each index and sent to the dispatcher.

Multi-Partition Index Facet Performance Considerations

Calculating accurate facet buckets and counts across multiple index partitions increases network load (in the case of partitioning across multiple physical nodes) and CPU load. This networking and CPU load can be reduced by having the index partitions only return a subset of the calculated buckets. Returning a subset of buckets will result in increased performance; however, some accuracy in the counts may be lost. By default, AIE is configured to preserve facet accuracy in a multi-partition index configuration despite a loss in performance.

Example (Java)

Sample code for reducing the performance impact of faceting across multiple index partitions.

FacetRequest facet = new FacetRequest("category");
facet.setMaxBuckets(10);
facet.setMinBucketCount(5);

// Set the maximum number of buckets each partition returns to the dispatcher.
// This should always be greater than the max buckets requested
// so that at least the max buckets number of buckets is returned
// if that many buckets exist across the multiple index partitions.
facet.setDistributedMaxBuckets(160);

// Set the minimum count a bucket must have to be returned to the dispatcher.
// This should always be less than or equal to the min bucket count requested
// so that the user does not end up receiving no buckets when there are
// buckets that satisfy the min bucket count.
facet.setDistributedMinBucketCount(2);

Example (HTTP/REST)

q=*:*&facet=category(maxBuckets=10,minBucketCount=5,distributedMaxBuckets=160,distributedMinBucketCount=2)
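
The accuracy trade-off described in this section can be illustrated with a small simulation. The sketch below is not AIE code: PartitionMergeSketch, topBuckets, merge, and the sample counts are hypothetical, and the per-partition limit only plays a role analogous to distributedMaxBuckets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: merging per-partition facet buckets at a dispatcher. */
class PartitionMergeSketch {

    /** Keeps only the highest-count buckets of one partition, up to the given limit. */
    static Map<String, Integer> topBuckets(Map<String, Integer> partition, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(partition.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : entries.subList(0, Math.min(limit, entries.size()))) {
            kept.put(entry.getKey(), entry.getValue());
        }
        return kept;
    }

    /** Sums the counts of the buckets that each partition sends back. */
    static Map<String, Integer> merge(List<Map<String, Integer>> partitions, int limitPerPartition) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> partition : partitions) {
            for (Map.Entry<String, Integer> entry : topBuckets(partition, limitPerPartition).entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical bucket counts held by two index partitions
        Map<String, Integer> partitionA = Map.of("java", 10, "python", 9, "go", 1);
        Map<String, Integer> partitionB = Map.of("go", 8, "rust", 7, "python", 2);
        List<Map<String, Integer>> partitions = List.of(partitionA, partitionB);

        System.out.println("All buckets sent:    " + merge(partitions, Integer.MAX_VALUE));
        System.out.println("Top 2 per partition: " + merge(partitions, 2));
    }
}
```

With every bucket sent, "python" totals 11 and is the largest bucket. When each partition sends only its top two buckets, "python" is reported as 9 and "go" as 8, so both are undercounted and "java" appears to be the largest bucket.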

 

Filtering Results using a Facet

Facet buckets returned as part of a facet request can be used to filter the results of a subsequent Query Request.

Example (Java)

Sample code for filtering a query using a facet bucket:

// Execute the search query
QueryRequest query = new QueryRequest("\"hello world\"");
query.addFacet( new FacetRequest("category") );
QueryResponse response = client.search(query);

// Filter results using the first bucket returned for the category facet.
//  (assumes at least one bucket was returned)
FacetResponse facet = response.getFacet("category");
QueryRequest filtered = new QueryRequest("\"hello world\"");

//filter by the first category in the category facet
filtered.addFacetFilter( facet.get(0) );

// Execute the filtered query
response = client.search(filtered);

// Display the filtered response
for (SearchDocument doc : response.getDocuments()) {
 System.out.println(doc);
}

Example (HTTP/REST)

This example shows drilling down on a field-based facet, and drilling down on a range-based facet.

// Filter on discrete facet value 'Research & Development' for 'department' field
// (when sent over HTTP, the '&' inside the value must be URL-encoded as %26)
q=*:*&facet.filter=department:"Research & Development"

// Filter on a facet range of the 'size' field
q=*:*&rangefacet.filter=size:[500 TO 900}
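
Because facet values can contain characters that are reserved in URLs, a client that builds these requests by hand would normally encode the parameter value. The following sketch uses the standard java.net.URLEncoder; the class FacetFilterUrlSketch and the helper facetFilterParam are hypothetical and not part of the AIE client API, and the sketch assumes the value itself contains no double quotes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative only: URL-encodes a facet.filter parameter value. */
class FacetFilterUrlSketch {

    /** Builds a facet.filter parameter of the form field:"value" with the value part encoded. */
    static String facetFilterParam(String field, String value) {
        String raw = field + ":\"" + value + "\"";
        // URLEncoder applies form encoding: spaces become '+', '&' becomes %26
        return "facet.filter=" + URLEncoder.encode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(facetFilterParam("department", "Research & Development"));
        // prints: facet.filter=department%3A%22Research+%26+Development%22
    }
}
```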