Overview
Attivio can group, or "collapse", results that share the same value for a field (for example, duplicate results with the same headline) into a single query result. This feature is called Field Collapsing. For each unique value of the specified field, Field Collapsing filters out all but one of the matching results. It is enabled by setting a FieldCollapse specification on the QueryRequest.
JoinRollupMode TREE
Not compatible with Streaming Queries
Note that Field Collapsing cannot be used with Streaming Queries.
Field Collapse Specification
Field collapsing can be applied to a QueryRequest by using a FieldCollapse specification.
REST Syntax
collapse=FIELDNAME(mode=MODE, sort=SORT, facet=FACET, rows=ROWS)
Supported Parameters
Parameter | Default | Description |
---|---|---|
FIELDNAME | required | Specify the field to use for collapsing (must be joinable, i.e., not a TEXT or SHAPE field) |
MODE | DEFAULT | Specify the mode for field collapsing |
SORT | mode specific | Specify the sort order for documents in a group |
FACET | true | See #Faceting |
ROWS | 1 | Number of rows per group (must be >= 1) |
Example
collapse=cat(mode=DEFAULT, sort=$score:desc, facet=true, rows=2)
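As a rough illustration of the syntax above, the following sketch assembles a collapse parameter string from its parts. CollapseParam and build are hypothetical helpers and not part of the Attivio API; only the parameter names and the ROWS constraint come from the table above. Whether a bare field name without parentheses is accepted is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an Attivio class): formats a collapse= REST parameter.
final class CollapseParam {
    // Pass null for any optional parameter to leave it out of the string.
    static String build(String field, String mode, String sort, Boolean facet, Integer rows) {
        if (field == null || field.isEmpty()) {
            throw new IllegalArgumentException("FIELDNAME is required");
        }
        if (rows != null && rows < 1) {
            throw new IllegalArgumentException("ROWS must be >= 1");
        }
        List<String> parts = new ArrayList<>();
        if (mode != null) parts.add("mode=" + mode);
        if (sort != null) parts.add("sort=" + sort);
        if (facet != null) parts.add("facet=" + facet);
        if (rows != null) parts.add("rows=" + rows);
        // Assumption: with no options set, the field name alone is sent.
        String options = parts.isEmpty() ? "" : "(" + String.join(", ", parts) + ")";
        return "collapse=" + field + options;
    }
}
```

Calling CollapseParam.build("cat", "DEFAULT", "$score:desc", true, 2) reproduces the example string shown above.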
Field Collapsing Modes
Two field collapsing modes are currently supported:
- DEFAULT - provides standard field collapsing, scaled to support a large number of unique field values.
- TWO_DIMENSIONAL - provides a two-dimensional view of the index by returning a result set for each unique field value.
Default Field Collapsing
Sorting
The sort specification for field collapsing in default mode determines the ordering of documents in each group, and therefore which document is returned as the root document for each group. Default field collapsing supports only a single-level sort. Specifying a multi-level sort specification will result in an exception.
Default Order
The default ordering for field collapsing is by natural index order. This method should be used unless business requirements require more explicit selection of a group's root document.
// Explicitly specify default order
FieldCollapse collapse = new FieldCollapse("site");
collapse.setSort(SortSpecification.DOCUMENT_ORDER_SORT);
QueryRequest request = new QueryRequest("*:*");
request.setFieldCollapse(collapse);
request.setJoinRollupMode(JoinRollupMode.TREE);
This ordering will perform the best and has the lowest memory requirements.
Relevancy Order
This root document selection method will result in the document with the highest relevancy score for a group being selected as the root document. This method requires more memory and CPU than selecting by natural ordering.
// Rest syntax: collapse=site(sort=$score:desc)
FieldCollapse collapse = new FieldCollapse("site");
collapse.setSort(SortSpecification.RELEVANCY_SORT);
QueryRequest request = new QueryRequest("*:*");
request.setFieldCollapse(collapse);
request.setJoinRollupMode(JoinRollupMode.TREE);
Arbitrary Sort Order
Any valid single-level sort specification can be used to order groups as desired. This method requires more memory and CPU than selecting by natural ordering.
// Rest syntax: collapse=site(sort=date:desc)
FieldCollapse collapse = new FieldCollapse("site");
collapse.setSort("date", Sort.SortOrder.DESCENDING);
QueryRequest request = new QueryRequest("*:*");
request.setFieldCollapse(collapse);
request.setJoinRollupMode(JoinRollupMode.TREE);
Faceting
Inclusion of collapsed documents in facet calculations can be disabled for default field collapsing if desired. When disabled, facet counts will only reflect the first document for each group.
// Rest syntax: collapse=site(facet=false)
FieldCollapse collapse = new FieldCollapse("site");
collapse.setFacet(false);
QueryRequest request = new QueryRequest("*:*");
request.setFieldCollapse(collapse);
request.setJoinRollupMode(JoinRollupMode.TREE);
Result Format
When more than one row is requested for each group, additional rows will be returned as children of the root document for each group.
Java API
Child documents can be retrieved in the Java API via SearchDocument.getChildDocuments() .
SearchDocument root = results.getDocument(0); // get root document
SearchDocument[] children = root.getChildren();
// Process child documents.
REST API
Child documents will be attached to a <SearchDocument> element in the <children> sub element.
NULL values
Each document that does not populate the field collapsing field will be returned in its own "group" for default field collapsing.
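The default-mode behavior described above (one root document per group, extra rows attached as children, and each document without a value forming its own group) can be modeled in plain Java. This is a toy in-memory sketch, not Attivio's implementation; DefaultCollapseModel, Doc, Group, and collapse are hypothetical names, and the input list is assumed to be in the desired sort order already.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of default field collapsing; none of these types are Attivio classes.
final class DefaultCollapseModel {
    record Doc(String id, String site) {}
    record Group(Doc root, List<Doc> children) {}

    // docs must already be sorted; the first document seen for a value becomes the root.
    static List<Group> collapse(List<Doc> docs, int rows) {
        Map<String, Group> byValue = new LinkedHashMap<>();
        List<Group> out = new ArrayList<>();
        for (Doc d : docs) {
            if (d.site() == null) {
                // A document with no value for the field gets a group of its own.
                out.add(new Group(d, new ArrayList<>()));
                continue;
            }
            Group g = byValue.get(d.site());
            if (g == null) {
                g = new Group(d, new ArrayList<>());
                byValue.put(d.site(), g);
                out.add(g);
            } else if (g.children().size() < rows - 1) {
                // Rows beyond the root are attached as children, up to rows - 1 of them.
                g.children().add(d);
            }
            // Any further documents in the group are filtered out.
        }
        return out;
    }
}
```

With rows set to 2, a group holding three documents keeps its root and one child, while two documents that lack the field come back as two separate groups.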
Multi-partition limitations
Default field collapsing has limitations when used with an AIE index which has multiple partitions.
- Result count is approximate (and may be larger than final result when paging through full result set).
- Facet counts are approximate.
- Child documents will only be provided from one partition.
See the following sections for more information.
Total result count may not be accurate
The number of matching rows reported for a collapsed result set may not be fully accurate. Each partition collapses its own documents, and the top N results from each partition (where N is the number of requested hits) are then collapsed together. The total row count therefore accounts only for documents collapsed within each partition and for documents in the first N results returned from each partition.
Total row count can be guaranteed for a number of search result hits up to a configurable point. By default, this point is the number of rows requested in the response. For example, when requesting hits=10 and offset=0 (i.e., results 1-10), the total number of hits will be accurate; however, when the user moves to the next page of results (i.e., results 11-20), the total number of hits may differ from the value reported for the first page.
The point up to which the total result count is accurate can be increased by setting the QueryRequest's Search Depth parameter. If a search has a total result count less than or equal to the search depth, the total result count will be accurate. If there are more results than the search depth, the total result count will be accurate for paged result requests where hits + offset * hits is less than or equal to the search depth. For example, setting the search depth to 30 would be useful for maintaining consistent total result counts for the first 3 pages of 10 results.
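The search depth condition can be checked with a few lines of arithmetic. The sketch below applies the expression hits + offset * hits exactly as written above, which treats offset as a zero-based page index; that reading is an assumption drawn from the 3-pages-of-10 example. SearchDepthCheck and countIsStable are hypothetical names, not part of the Attivio API.

```java
// Hypothetical helper: evaluates the search depth condition stated in the text.
final class SearchDepthCheck {
    // Assumption: offset is a zero-based page index, so page 1 is offset 0.
    static boolean countIsStable(int hits, int offset, int searchDepth) {
        return hits + offset * hits <= searchDepth;
    }
}
```

With hits=10 and a search depth of 30, pages at offset 0, 1, and 2 satisfy the condition (10, 20, and 30), while offset 3 gives 40 and falls outside it.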
Facet counts may not be accurate
When merging facet results from each index partition, the facets from each node are calculated before collapsing is performed for the results returned from each index partition. Since collapsing reduces the number of documents in the result set, facet counts for buckets may be higher than the actual number of results returned when a particular facet bucket filter is applied to the original query.
Child documents
If the same "group" of documents is found on multiple nodes, then child documents will only be returned from one of the nodes. The rest of the child documents will be discarded.
For example, if the documents in the group "1" are indexed in multiple partitions, the documents for all but one partition will be discarded in the final result set.
From first partition:
"1" -> Document 7 (group leader on first partition)
    |-> Document 2
    |-> Document 5
From second partition:
"1" -> Document 9 (group leader on second partition)
    |-> Document 6
    |-> Document 10
If "Document 7" is selected as the primary document for the group during query dispatching, then "Document 9", "Document 6" and "Document 10" will be discarded and not included in the group.
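The discard behavior in this example can be sketched as a merge that keeps one partition's group per key and drops the others whole. This is a simplified model rather than the actual dispatcher logic; PartitionMergeModel, Group, and merge are hypothetical names, and letting the first partition in the list win is an assumption made only to reproduce the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of merging collapsed groups from several partitions.
final class PartitionMergeModel {
    record Group(String key, int leader, List<Integer> children) {}

    // Keeps the first group seen for each key; later groups with the same key
    // are discarded together with their leader and children.
    static Map<String, Group> merge(List<List<Group>> partitions) {
        Map<String, Group> merged = new LinkedHashMap<>();
        for (List<Group> partition : partitions) {
            for (Group g : partition) {
                merged.putIfAbsent(g.key(), g);
            }
        }
        return merged;
    }
}
```

Merging the two partitions from the example leaves group "1" with Document 7 as its leader and Documents 2 and 5 as its children; Documents 9, 6, and 10 do not appear in the result.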
2-D Search Field Collapsing
2-D Search field collapsing will segment the result set using the specified field. This will result in returning the top N documents for each unique field value. This method of field collapsing can be enabled by setting the mode for field collapsing to FieldCollapse.Mode.TWO_DIMENSIONAL.
Rows Per Group
The number of rows returned per group can be specified with FieldCollapse.setRows() . By default, this value is 1, which will result in a single document per group.
// Rest syntax: collapse=cat(mode=2D, rows=10)
FieldCollapse collapse = new FieldCollapse("cat");
collapse.setMode(FieldCollapse.Mode.TWO_DIMENSIONAL);
collapse.setRows(10);
QueryRequest request = new QueryRequest("*:*");
request.setFieldCollapse(collapse); // pass the FieldCollapse, not the request itself
request.setJoinRollupMode(JoinRollupMode.TREE);
Sorting
Any arbitrary sort can be used to order rows in each group. By default, a group's rows will be ordered by score descending.
Single Level Sort
// Order rows in each group by title ascending
// Rest Syntax: collapse=cat(mode=2D, sort=title:asc)
FieldCollapse collapse = new FieldCollapse("cat");
collapse.setMode(FieldCollapse.Mode.TWO_DIMENSIONAL);
collapse.setSort("title", Sort.SortOrder.ASCENDING);
QueryRequest request = new QueryRequest("*:*");
request.setFieldCollapse(collapse);
request.setJoinRollupMode(JoinRollupMode.TREE);
Multi-Level Sort
// Order rows in each group by title ascending, then by author descending
// Rest Syntax: collapse=cat(mode=2D, sort=title:asc, sort=author:desc)
FieldCollapse collapse = new FieldCollapse("cat");
collapse.setMode(FieldCollapse.Mode.TWO_DIMENSIONAL);
SortSpecification sort = new SortSpecification();
sort.add(new Sort("title", Sort.SortOrder.ASCENDING));
sort.add(new Sort("author", Sort.SortOrder.DESCENDING));
collapse.setSort(sort);
QueryRequest request = new QueryRequest("*:*");
request.setFieldCollapse(collapse);
request.setJoinRollupMode(JoinRollupMode.TREE);
Result Format
When using 2-Dimensional search, one document will be returned for each group. This document will contain no fields. It will contain the top N rows for the group as child documents. The total number of rows in the group will be retrievable via SearchDocument.getTotalChildren() .
NULL Values
All documents that do not populate the field collapsing field will be grouped together for two-dimensional search.
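The two-dimensional behavior described in this section (top N rows per unique value, a total child count per group, and all documents without a value sharing one group) can be modeled as follows. This is an in-memory sketch under simplifying assumptions, not Attivio's implementation; TwoDimensionalModel, Doc, Group, and collapse are hypothetical names, and rows are ordered by score descending to mirror the stated default.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of two-dimensional field collapsing; none of these types are Attivio classes.
final class TwoDimensionalModel {
    record Doc(String id, String cat, double score) {}
    record Group(List<Doc> rows, int totalChildren) {}

    static Map<String, Group> collapse(List<Doc> docs, int rows) {
        // Bucket documents by field value; a null value is a single shared bucket.
        Map<String, List<Doc>> buckets = new LinkedHashMap<>();
        for (Doc d : docs) {
            buckets.computeIfAbsent(d.cat(), k -> new ArrayList<>()).add(d);
        }
        Map<String, Group> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Doc>> e : buckets.entrySet()) {
            List<Doc> sorted = new ArrayList<>(e.getValue());
            sorted.sort(Comparator.comparingDouble(Doc::score).reversed());
            // Keep the top N rows but remember how many documents the group held.
            List<Doc> top = List.copyOf(sorted.subList(0, Math.min(rows, sorted.size())));
            out.put(e.getKey(), new Group(top, sorted.size()));
        }
        return out;
    }
}
```

In this model the group-level entry carries no fields of its own, only its rows and a total, which corresponds to the empty group document and the total child count described under Result Format.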
Limitations
The field being collapsed on for 2-Dimensional search must contain at most 1024 unique values. In general, this feature should only be used when the number of possible groups is small. If the number of unique values for a field is large, default field collapsing should be used instead.