Overview
Multi-level faceting lets you visualize a hierarchy across the documents matching a query and filter those documents at any level of that hierarchy. It enables deeper analytics over the matching documents by combining multiple fields. A facet request can now specify multiple levels: each FacetRequest may include a "next level", which is itself a FacetRequest. Likewise, each facet bucket in the response contains the results for its next level.
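The recursive shape described above can be sketched in a few lines of Python. This is only an illustration: the FacetRequest class, its field and child attributes, and the to_expression method are hypothetical names chosen for this sketch, not the product's Java or REST API. The rendered string follows the nested childFacet form used by the facet examples on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FacetRequest:
    """Hypothetical model: one facet level plus an optional next level."""
    field: str
    child: Optional["FacetRequest"] = None

    def to_expression(self) -> str:
        # A leaf level is just the field name; otherwise nest the next level.
        if self.child is None:
            return self.field
        return f"{self.field}(childFacet={self.child.to_expression()})"


# Three levels: table, then country, then language.
request = FacetRequest("table", FacetRequest("country", FacetRequest("language")))
expression = request.to_expression()
print(expression)
```

Because each level holds at most one next level, the request forms a simple chain, and the depth of the chain is the number of facet levels.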
Example
Given the five documents below, faceting on state and then on city at the next level lets a user filter documents by any state and city combination.
Source Content:
- Document 1: state = MA, city = Boston
- Document 2: state = PA, city = Pittsburgh
- Document 3: state = MA, city = Newton
- Document 4: state = MA, city = Boston
- Document 5: state = NY, city = <null>
Request:
- Facet on state, then on city
Expected Results:
- State: MA (count 3)
  - City: Boston (count 2)
  - City: Newton (count 1)
- State: PA (count 1)
  - City: Pittsburgh (count 1)
- State: NY (count 1)
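The counts above can be reproduced with a minimal Python sketch. It is not how the engine computes facets; the two_level_facet function and the dictionary layout of its result are assumptions made for illustration. A document with a null child value still counts toward its parent bucket but adds no child bucket, which is why NY has no city entries.

```python
from collections import Counter, defaultdict

documents = [
    {"state": "MA", "city": "Boston"},
    {"state": "PA", "city": "Pittsburgh"},
    {"state": "MA", "city": "Newton"},
    {"state": "MA", "city": "Boston"},
    {"state": "NY", "city": None},
]


def two_level_facet(docs, parent_field, child_field):
    """Count documents per parent value, then per child value within each parent."""
    parent_counts = Counter()
    child_counts = defaultdict(Counter)
    for doc in docs:
        parent = doc.get(parent_field)
        if parent is None:
            continue  # no parent value, so no bucket at any level
        parent_counts[parent] += 1
        child = doc.get(child_field)
        if child is not None:
            child_counts[parent][child] += 1
    return {
        parent: {"count": count, "children": dict(child_counts[parent])}
        for parent, count in parent_counts.items()
    }


result = two_level_facet(documents, "state", "city")
for state, bucket in result.items():
    print(state, bucket["count"], bucket["children"])
```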
Usage of the Multi-level facets
Multi-level facet requests can be made through both the Java and REST APIs; detailed instructions are in the pages below. Those pages also cover mixed facet types (parent and child facets can be of different facet types). Streaming queries do not currently support multi-level faceting.
Facet filtering and Response
Each facet request returns a response with zero or more buckets. Each level in an N-level facet request can specify its own configuration, including sorting and the number of requested buckets. That configuration applies only to the buckets returned for that level.
For details on sorting, filtering, and related options, see the Facets documentation.
Sorting
Results can be sorted in ascending (ASC) or descending (DESC) order. Sorting can be specified at any facet level and applies independently to each level.
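To make "independent per level" concrete, here is a small Python sketch that sorts only one level of a nested bucket tree by count and leaves every other level in its original order. The bucket layout (value, count, children) and the sort_level function are assumptions for illustration, not the product's response format.

```python
def sort_level(buckets, target_level, ascending=True, level=1):
    """Return a copy of the bucket tree with only `target_level` sorted by count."""
    if level == target_level:
        buckets = sorted(buckets, key=lambda b: b["count"], reverse=not ascending)
    # Recurse so deeper levels are copied (and sorted if they are the target).
    return [
        {**b, "children": sort_level(b.get("children", []), target_level, ascending, level + 1)}
        for b in buckets
    ]


tree = [
    {"value": "small", "count": 5, "children": [
        {"value": "tableB", "count": 3, "children": []},
        {"value": "tableA", "count": 2, "children": []},
    ]},
    {"value": "large", "count": 9, "children": [
        {"value": "tableC", "count": 7, "children": []},
        {"value": "tableD", "count": 1, "children": []},
    ]},
]

# Sort level 2 ascending by count; level 1 keeps its original order.
sorted_tree = sort_level(tree, target_level=2, ascending=True)
for bucket in sorted_tree:
    print(bucket["value"], [child["value"] for child in bucket["children"]])
```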
Sorting on level 3 facet - sort only applies to the level 3 buckets.
"facets": [ "size(filter=small(\"size:>120\"), childFacet=table(childFacet=.fields(sortBy=COUNT, PrimarySort=ASC)))" ],
Sorting on level 2 facet - sort only applies to the level 2 buckets.
"facets": [ "size(filter=small(\"size:>120\"), childFacet=table(childFacet=.fields, sortBy=COUNT, PrimarySort=ASC))" ],
Bucket Filtering
Facets can be requested with the option to only return buckets with counts greater than a specified value and/or to request only the top N buckets. This filtering can be done at any level of the facet request.
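The two options described above can be sketched as a filter applied to the buckets of a single level. This Python snippet is illustrative only: limit_buckets is a hypothetical helper, and it assumes that "top N" means the N buckets with the highest counts and that the minimum count is a strict "greater than" threshold, as the text above states.

```python
def limit_buckets(buckets, max_buckets=None, min_bucket_count=None):
    """Keep buckets whose count exceeds the threshold, then the top N by count."""
    kept = [
        b for b in buckets
        if min_bucket_count is None or b["count"] > min_bucket_count
    ]
    kept.sort(key=lambda b: b["count"], reverse=True)
    return kept if max_buckets is None else kept[:max_buckets]


level_buckets = [
    {"value": "a", "count": 250},
    {"value": "b", "count": 100},  # dropped: not strictly greater than 100
    {"value": "c", "count": 101},
    {"value": "d", "count": 40},
    {"value": "e", "count": 500},
]

limited = limit_buckets(level_buckets, max_buckets=2, min_bucket_count=100)
print([b["value"] for b in limited])
```

In a multi-level request, each level would apply its own thresholds to its own buckets, so limits at level 1 do not change which buckets survive at level 3.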
Limiting buckets on the first and third levels - returns at most 10 buckets at each of those levels, keeping only buckets whose count is greater than 100 at level 1 and greater than 50 at level 3.
"facets": [ "size(filter=small(\"size:>120\"), maxBuckets=10, minBucketCount=100, childFacet=table(childFacet=.fields(maxBuckets=10, minBucketCount=50)))" ],
Statistics and Field Expressions
Statistics can be produced by setting statistics=true at any level, as long as the field type is integer, long, double, or float.
Similarly, field expressions can be included at any level. Details on these expressions can be found in Field Expressions.
Field expression set at level 2:
"facets": ["table(childFacet=language(function=UPPERCASE(language)))"],
Field expression set at level 1:
"facets": ["table(function=SUBSTRING(table,0,3), childFacet=language)"],
Visualizing in search UI
Multi-level facets can be configured in the search UI by changing the configuration.properties.js file, as described in the Search UI Configuration Quick Start.
For example, with the facet request table(childFacet=country(childFacet=language)), the multi-level facets for the table facet appear on the left-hand side of the Search UI. Clicking a child facet filters the documents further.
Performance Impact
Multi-level facet requests can have a significant impact on query performance. Attivio recommends their use be limited to fields which have a low number of unique values (low cardinality) and that you benchmark the performance impact prior to adding them to a production environment.
Observation: performance varies noticeably when high-cardinality fields are present at deeper facet levels (for example, level 3).
The factors that can impact performance are:
- Total number of documents containing the field
- Total count of documents containing each field value
- Number of unique values of the field (its cardinality)
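As a rough way to reason about the cardinality factor, the number of buckets at each level can never exceed the running product of the unique-value counts of the fields down to that level. The Python sketch below computes that upper bound. The cardinalities are made-up numbers, bucket_upper_bounds is a hypothetical helper, and the bound says nothing about how the engine actually evaluates the request; real bucket counts are lower because only value combinations present in the matching documents produce buckets.

```python
from itertools import accumulate
from operator import mul


def bucket_upper_bounds(cardinalities):
    """Upper bound on bucket count per level: running product of unique-value counts."""
    return list(accumulate(cardinalities, mul))


# Hypothetical three-level requests: unique values per field, from level 1 to level 3.
low = bucket_upper_bounds([5, 10, 4])
high = bucket_upper_bounds([5, 10, 1000])
print("low-cardinality level 3: ", low)
print("high-cardinality level 3:", high)
```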
The following sample benchmark results demonstrate the impact nested facets can have:
Parameter | Value
---|---
Index size | 25,116,274
Number of requests per test | 500
Percentage of single-, double-, triple-term queries | 50%, 40%, 10%
Breakdown of content by field |

Facet request | Time taken (mm:ss) | Throughput
---|---|---
field5(childFacet=field6(childFacet=field4)) | 00:37 | 13.3/sec
field5(childFacet=field6(childFacet=field2)) | 00:50 | 9.9/sec
field5(childFacet=field6(childFacet=field1)) | 03:13 | 2.6/sec
field2(childFacet=field3(childFacet=field4)) | 03:59 | 2.1/sec
field3(childFacet=field2(childFacet=field4)) | 05:35 | 1.5/sec
field3(childFacet=field4(childFacet=field2)) | 09:34 | 52.3/min
field1(childFacet=field2(childFacet=field1)) | about 30:00 (60 samples only) | 2.4/min
field2(childFacet=field1(childFacet=field1)) | 21:24 | 23.4/min
field1(childFacet=field1(childFacet=field2)) | 16:46 | 29.8/min
field1(childFacet=field1) | 03:05 | 2.1/sec
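The throughput figures above mix per-second and per-minute units, which makes them hard to compare at a glance. The short Python sketch below converts a few of the reported values to requests per second. The numbers are copied from the results above; the variable and function names are only illustrative.

```python
# Throughput figures copied from a few rows of the benchmark results above.
reported = {
    "field5(childFacet=field6(childFacet=field4))": (13.3, "sec"),
    "field3(childFacet=field4(childFacet=field2))": (52.3, "min"),
    "field1(childFacet=field2(childFacet=field1))": (2.4, "min"),
    "field1(childFacet=field1)": (2.1, "sec"),
}

SECONDS_PER_UNIT = {"sec": 1, "min": 60}


def per_second(value, unit):
    """Convert a throughput given per `unit` into requests per second."""
    return value / SECONDS_PER_UNIT[unit]


normalized = {req: per_second(value, unit) for req, (value, unit) in reported.items()}
fastest = max(normalized.values())

for req, rate in sorted(normalized.items(), key=lambda item: item[1], reverse=True):
    print(f"{req}: {rate:.2f} requests/sec ({rate / fastest:.1%} of the fastest)")
```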