
Overview

Multi-level faceting shows a hierarchy over the documents matching a query and lets users filter those documents at any level of that hierarchy. This supports deeper analytics, across multiple fields, over all documents matching a query. Facet requests now support multiple levels: each FacetRequest can include a "next level", which is itself a FacetRequest. Likewise, each facet bucket now contains the responses for the next level under that bucket.
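The nesting described above can be pictured as a recursive structure. The following Java sketch is a hypothetical model, not the Attivio Java API. The class FacetRequestSketch, its Level record, and the toRequestString() and depth() methods are illustrative names. The sketch only shows how a chain of "next level" requests maps onto the nested childFacet=... form used in the REST examples on this page.

```java
/**
 * Hypothetical model of a multi-level facet request (not the Attivio Java API).
 * Each level names a field and may hold a "next level", which is itself a Level.
 */
public class FacetRequestSketch {

    /** One facet level; nextLevel is null at the deepest level. */
    record Level(String field, Level nextLevel) {

        /** Renders this level and its descendants in the nested childFacet=... form. */
        String toRequestString() {
            if (nextLevel == null) {
                return field;
            }
            return field + "(childFacet=" + nextLevel.toRequestString() + ")";
        }

        /** Counts the levels from this one down to the deepest. */
        int depth() {
            return nextLevel == null ? 1 : 1 + nextLevel.depth();
        }
    }

    public static void main(String[] args) {
        // table -> country -> language: a three-level request
        Level request = new Level("table", new Level("country", new Level("language", null)));
        System.out.println(request.toRequestString()); // table(childFacet=country(childFacet=language))
        System.out.println(request.depth());           // 3
    }
}
```

The three-level request built in main matches the table(childFacet=country(childFacet=language)) request used in the search UI example later on this page.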

Example

Given the five documents below, faceting on state, with city as the next level, lets users filter documents by different combinations of state and city.

Source Content:

  • Document 1: state = MA, city = Boston
  • Document 2: state = PA, city = Pittsburgh
  • Document 3: state = MA, city = Newton
  • Document 4: state = MA, city = Boston
  • Document 5: state = NY, city = <null>

Request:

  • Facet on state, then on city

Expected Results:

  • State: MA (count 3)
    • City: Boston (count 2)
    • City: Newton (count 1)
  • State: PA (count 1)
    • City: Pittsburgh (count 1)
  • State: NY (count 1)
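The expected results can be reproduced with a small in-memory sketch. It illustrates the counting only and is not how Attivio computes facets. TwoLevelFacetSketch, Doc, Bucket, and facetStateThenCity are hypothetical names. Values are kept in first-seen order. A document with a null city (Document 5) counts toward its state but adds no city bucket, which is why NY has no child bucket.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-level facet counter (state, then city); not the Attivio API. */
public class TwoLevelFacetSketch {

    /** A document reduced to the two faceted fields; city may be null. */
    record Doc(String state, String city) {}

    /** A facet bucket with its next-level buckets. */
    record Bucket(String value, int count, List<Bucket> children) {}

    static List<Bucket> facetStateThenCity(List<Doc> docs) {
        // LinkedHashMap keeps buckets in the order their values are first seen.
        Map<String, Integer> stateCounts = new LinkedHashMap<>();
        Map<String, Map<String, Integer>> cityCounts = new LinkedHashMap<>();
        for (Doc d : docs) {
            stateCounts.merge(d.state(), 1, Integer::sum);
            Map<String, Integer> cities = cityCounts.computeIfAbsent(d.state(), k -> new LinkedHashMap<>());
            if (d.city() != null) { // a missing city produces no level-2 bucket
                cities.merge(d.city(), 1, Integer::sum);
            }
        }
        List<Bucket> result = new ArrayList<>();
        for (Map.Entry<String, Integer> s : stateCounts.entrySet()) {
            List<Bucket> children = new ArrayList<>();
            for (Map.Entry<String, Integer> c : cityCounts.get(s.getKey()).entrySet()) {
                children.add(new Bucket(c.getKey(), c.getValue(), List.of()));
            }
            result.add(new Bucket(s.getKey(), s.getValue(), children));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("MA", "Boston"),
            new Doc("PA", "Pittsburgh"),
            new Doc("MA", "Newton"),
            new Doc("MA", "Boston"),
            new Doc("NY", null));
        for (Bucket state : facetStateThenCity(docs)) {
            System.out.println("State: " + state.value() + " (count " + state.count() + ")");
            for (Bucket city : state.children()) {
                System.out.println("  City: " + city.value() + " (count " + city.count() + ")");
            }
        }
    }
}
```

Running main prints the same hierarchy as the expected results above.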

Using Multi-level Facets

Multi-level facet requests can be made through both the Java and REST APIs; detailed instructions are in the pages below. Those pages also describe mixed facet types (the parent and child facets can be of different facet types). Streaming queries do not currently support multi-level faceting.

Facet filtering and Response

Each facet request returns a response with zero or more buckets. Each level in an N-level facet request can specify its own configuration, including sorting, the number of requested buckets, and so on. That configuration applies only to the buckets returned at that level.

Details on sorting, filtering, and other options can be found in the Facets documentation.

Sorting

Results can be sorted in ascending (ASC) or descending (DESC) order. Sorting can be applied at any facet level, and each level is sorted independently.

Sorting on the level 3 facet (the sort applies only to the level 3 buckets):

"facets": [ "size(filter=small(\"size:>120\"), childFacet=table(childFacet=.fields(sortBy=COUNT, PrimarySort=ASC)))" ],

Sorting on the level 2 facet (the sort applies only to the level 2 buckets):

"facets": [ "size(filter=small(\"size:>120\"), childFacet=table(childFacet=.fields, sortBy=COUNT, PrimarySort=ASC))" ],
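Per-level sorting can be modeled as reordering only the bucket list at one depth of the response tree. The sketch below is hypothetical: PerLevelSortSketch, Bucket, and sortLevel are illustrative names, not Attivio API. It also assumes that sortBy=COUNT with PrimarySort=ASC means ascending by bucket count. Using the state and city buckets from the Example section, it sorts level 2 only. The cities under MA are reordered, while the state order is left as it was.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: sort only the buckets at one level of a facet tree (not the Attivio API). */
public class PerLevelSortSketch {

    record Bucket(String value, int count, List<Bucket> children) {}

    /** Returns a copy of the tree in which only buckets at targetLevel (1-based) are sorted. */
    static List<Bucket> sortLevel(List<Bucket> buckets, int level, int targetLevel, Comparator<Bucket> order) {
        List<Bucket> copy = new ArrayList<>();
        for (Bucket b : buckets) {
            copy.add(new Bucket(b.value(), b.count(), sortLevel(b.children(), level + 1, targetLevel, order)));
        }
        if (level == targetLevel) {
            copy.sort(order); // List.sort is stable, so ties keep their original order
        }
        return copy;
    }

    public static void main(String[] args) {
        List<Bucket> states = List.of(
            new Bucket("MA", 3, List.of(new Bucket("Boston", 2, List.of()), new Bucket("Newton", 1, List.of()))),
            new Bucket("PA", 1, List.of(new Bucket("Pittsburgh", 1, List.of()))),
            new Bucket("NY", 1, List.of()));

        // Ascending by count at level 2 only; level 1 (states) is untouched.
        List<Bucket> sorted = sortLevel(states, 1, 2, Comparator.comparingInt(Bucket::count));
        for (Bucket s : sorted) {
            System.out.println(s.value() + " " + s.children().stream().map(Bucket::value).toList());
        }
        // prints "MA [Newton, Boston]", then "PA [Pittsburgh]", then "NY []"
    }
}
```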

Bucket Filtering

Facets can be requested so that they return only buckets whose counts are greater than a specified value (minBucketCount), only the top N buckets (maxBuckets), or both. This filtering can be applied at any level of the facet request.

Limiting buckets on the first and third levels (returns at most 10 buckets with counts greater than 100 at level 1, and at most 10 buckets with counts greater than 50 at level 3):

"facets": [ "size(filter=small(\"size:>120\"), maxBuckets=10, minBucketCount=100, childFacet=table(childFacet=.fields(maxBuckets=10, minBucketCount=50)))" ],
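The two options can be sketched as a per-level filter applied while walking the bucket tree. This is a hypothetical sketch: BucketFilterSketch, LevelFilter, and filterLevels are illustrative names, not Attivio API. It assumes minBucketCount keeps buckets whose count is strictly greater than the threshold, as described above. It also assumes that "top N" means the N buckets with the highest counts. The thresholds in main are scaled down to fit the five example documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-level bucket filtering (not the Attivio API). */
public class BucketFilterSketch {

    record Bucket(String value, int count, List<Bucket> children) {}

    /** Per-level options: keep at most maxBuckets buckets whose count is > minBucketCount. */
    record LevelFilter(int maxBuckets, int minBucketCount) {}

    /** Filters the tree; levels (1-based) absent from the map are left unfiltered. */
    static List<Bucket> filterLevels(List<Bucket> buckets, Map<Integer, LevelFilter> filters, int level) {
        LevelFilter f = filters.get(level);
        List<Bucket> kept = new ArrayList<>();
        for (Bucket b : buckets) {
            if (f == null || b.count() > f.minBucketCount()) {
                kept.add(new Bucket(b.value(), b.count(), filterLevels(b.children(), filters, level + 1)));
            }
        }
        if (f != null) {
            // Assumed meaning of "top N": highest counts first; the stable sort keeps ties in order.
            kept.sort((x, y) -> Integer.compare(y.count(), x.count()));
            if (kept.size() > f.maxBuckets()) {
                kept = new ArrayList<>(kept.subList(0, f.maxBuckets()));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Bucket> states = List.of(
            new Bucket("MA", 3, List.of(new Bucket("Boston", 2, List.of()), new Bucket("Newton", 1, List.of()))),
            new Bucket("PA", 1, List.of(new Bucket("Pittsburgh", 1, List.of()))),
            new Bucket("NY", 1, List.of()));

        // Level 1: at most 2 buckets with count > 0; level 2: at most 10 buckets with count > 1.
        Map<Integer, LevelFilter> filters = Map.of(1, new LevelFilter(2, 0), 2, new LevelFilter(10, 1));
        for (Bucket s : filterLevels(states, filters, 1)) {
            System.out.println(s.value() + " " + s.children().stream().map(Bucket::value).toList());
        }
        // prints "MA [Boston]", then "PA []"
    }
}
```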

Statistics and Field Expressions

Statistics can be produced by setting statistics=true at any level, as long as the field type is integer, long, double, or float.

Similarly, field expressions can be included at any level. Details on these expressions can be found at Field Expressions.

Field expression set at level 2:

"facets": ["table(childFacet=language(function=UPPERCASE(language)))"],

Field expression set at level 1:

"facets": ["table(function=SUBSTRING(table,0,3), childFacet=language)"],
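A field expression changes the values that are bucketed at the level where it is set, so distinct raw values can merge into one bucket. The sketch below is hypothetical. bucketWith is an illustrative helper, and the input values are made up for the demonstration. Java's toUpperCase and substring stand in for the UPPERCASE and SUBSTRING expressions, assuming SUBSTRING(table,0,3) keeps the first three characters.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: bucket values after applying an expression (not the Attivio API). */
public class FieldExpressionSketch {

    /** Counts values after applying the expression to each one, in first-seen order. */
    static Map<String, Integer> bucketWith(List<String> values, UnaryOperator<String> expression) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            counts.merge(expression.apply(v), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up language values; an UPPERCASE-style expression merges "en" and "EN".
        List<String> languages = List.of("en", "EN", "fr");
        System.out.println(bucketWith(languages, String::toUpperCase)); // {EN=2, FR=1}

        // Made-up table names; a SUBSTRING(table,0,3)-style expression keeps the first 3 characters.
        List<String> tables = List.of("orders", "order_items", "customers");
        System.out.println(bucketWith(tables, t -> t.substring(0, Math.min(3, t.length())))); // {ord=2, cus=1}
    }
}
```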

Visualizing in search UI

Multi-level facets can be configured in the search UI by changing the configuration.properties.js file as described in the Search UI Configuration Quick Start.

Below is an example of the facet request table(childFacet=country(childFacet=language)):

On the left-hand side of the Search UI, the multi-level facets appear under the table facet. Clicking a child facet filters the documents further.




Performance Impact

Multi-level facet requests can have a significant impact on query performance. Attivio recommends limiting their use to fields with a low number of unique values (low cardinality) and benchmarking the performance impact before adding them to a production environment.

Observations: performance varies when high-cardinality fields are present at deeper facet levels (for example, level 3).

The factors that can impact performance are:

  • Total number of documents containing the field
  • Total count of documents containing each field value
  • Unique values of a field
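One way to see why unique values matter is a rough worst-case estimate: the number of buckets at the deepest level is at most the product of the unique-value counts of the levels. The sketch below is a back-of-the-envelope illustration, not Attivio's cost model, and worstCaseLeafBuckets is a hypothetical helper. It uses unique-value counts from the benchmark table that follows. Real bucket counts are usually far lower, because only value combinations that co-occur in matching documents produce buckets. For example, the field2, field3, field4 bound exceeds a trillion, far more than the index size of 25,116,274. The product also ignores level order, whereas the measured times in the benchmark vary with it.

```java
/** Hypothetical worst-case bucket estimate for a multi-level facet (not Attivio's cost model). */
public class FacetCardinalitySketch {

    /** Product of the unique-value counts per level; throws ArithmeticException on long overflow. */
    static long worstCaseLeafBuckets(long... uniqueValuesPerLevel) {
        long product = 1;
        for (long n : uniqueValuesPerLevel) {
            product = Math.multiplyExact(product, n);
        }
        return product;
    }

    public static void main(String[] args) {
        // Unique-value counts from the benchmark's "Breakdown of content by field".
        System.out.println(worstCaseLeafBuckets(5, 1, 2008));  // field5 > field6 > field4: 10040
        System.out.println(worstCaseLeafBuckets(5, 1, 37995)); // field5 > field6 > field2: 189975
        System.out.println(worstCaseLeafBuckets(37995, 13869, 2008)); // field2 > field3 > field4
    }
}
```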

The following sample benchmark results demonstrate the impact these nested facets can have:

Test setup:

  • Index size: 25,116,274
  • Number of requests per test: 500
  • Percentage of single-, double-, and triple-term queries: 50%, 40%, 10%

Breakdown of content by field:

Field     Unique values
field1    > 100,000
field2    37,995
field3    13,869
field4    2,008
field5    5
field6    1

Results:

Facet request                                   Time taken (mm:ss)              Throughput
field5(childFacet=field6(childFacet=field4))    00:37                           13.3/sec
field5(childFacet=field6(childFacet=field2))    00:50                           9.9/sec
field5(childFacet=field6(childFacet=field1))    03:13                           2.6/sec
field2(childFacet=field3(childFacet=field4))    03:59                           2.1/sec
field3(childFacet=field2(childFacet=field4))    05:35                           1.5/sec
field3(childFacet=field4(childFacet=field2))    09:34                           52.3/min
field1(childFacet=field2(childFacet=field1))    60 samples took half an hour    2.4/min
field2(childFacet=field1(childFacet=field1))    21:24                           23.4/min
field1(childFacet=field1(childFacet=field2))    16:46                           29.8/min
field1(childFacet=field1)                       03:05                           2.1/sec