Overview
The Simple Query Language lets untrained users pose simple queries at search portals. It is similar to the familiar Google and Yahoo query languages.
Users who are more familiar with simple query syntax can query specific fields using more elaborate Boolean conditions. The simple query language also matches ranges of numbers and dates.
All terms in the query are ANDed together by default. There is no need to specify AND conditions in the query.
Therefore, a Simple Query Language query containing the term "AND" returns documents that contain the word "and".
Default Behavior
The Simple Query Language is intended for untrained seekers making unsophisticated queries. It is a human-friendly query language.
When you submit a keyword query, Attivio treats each word as a "term". It then launches a series of queries against various fields in the index.
For instance, suppose we enter "hello world" in a search field and click the Search button.
Attivio interprets this request as two keywords that must both be found in the same document.
Because this query did not specify a field to search, Attivio searches the content field of each document, first for "hello" and then for "world". Both words must be present for a match. (The content field concatenates the document's title, author, and text values, plus the values of all *_s and *_text dynamic fields, into a single field value. It is the default-search-field of the Attivio Schema.)
Then Attivio automatically evaluates the matching documents for relevancy. It checks the title field for "hello" and for "world", then for the phrase "hello world" anywhere in the title, and finally for titles that exactly match the phrase "hello world". Each of these cases boosts the document's score. An exact match on the whole title gets a very large boost, sufficient to propel that document to the top of the result list.
Viewing Queries
If you use Attivio's Debug Search page, you can type in keywords, launch the search, and then view the resulting query and boost behavior in the <query> and <boostQueries> elements of the returned Legacy-XML.
Default OR
Although terms are ANDed by default, you can switch the default to OR, either with an additional CGI parameter on REST search requests or with a configuration property. The CGI parameter overrides any setting selected through the configuration property. The valid values are defaultAnd (the default) and defaultOr.
Via an additional CGI parameter:
q.SimpleQueryParser.defaultBooleanMode=defaultOr
Via property:
rest.search.api.q.parameters=SimpleQueryParser.defaultBooleanMode=defaultOr
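As a rough illustration of the CGI-parameter approach, the sketch below builds a URL-encoded query string that carries the defaultOr setting. The parameter name and value come from the text above; the `q` parameter for the query text and the helper name `search_query_string` are assumptions made for this example, and no endpoint URL is implied.

```python
from urllib.parse import urlencode

# Parameter name taken from the text above.
BOOLEAN_MODE_PARAM = "q.SimpleQueryParser.defaultBooleanMode"


def search_query_string(query: str, default_or: bool = False) -> str:
    """Return a URL-encoded query string for a search request.

    Assumption: the query text travels in a parameter named "q".
    """
    params = {"q": query}
    if default_or:
        params[BOOLEAN_MODE_PARAM] = "defaultOr"
    return urlencode(params)


print(search_query_string("hello world", default_or=True))
```

Leaving `default_or` at its default omits the parameter, so the request falls back to whatever the configuration property selects.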
Simple Query Examples
The fastest way to appreciate the features of the Simple Query Language is to view a short list of example queries. Each of these links down the page to a section that describes the syntax in more detail.
Example Query | Description |
---|---|
hello | Match documents where the content field contains the term "hello". |
-hello | Match documents where the content field does not contain the term "hello". |
"hello world" | Match documents where the content field contains the phrase "hello world". |
hello OR world | Match all documents that contain either "hello" or "world", or both. |
hello^200 OR world | Match all documents where the content field contains either "hello" or "world", and double the score of documents that contain "hello". |
title:hello | Match documents that contain the term "hello" in the "title" field |
title:"hello world" | Match documents that contain the phrase "hello world" in the "title" field |
h?llo | Match documents that satisfy the wildcard expression "h?llo". Matching terms will include "hello", "hillo", "hallo", etc. |
cat* | Match documents that satisfy the wildcard expression "cat*". Matching terms will include "cat", "cats", "catapult", etc. |
cat~66 | Match documents that have terms that are 66% similar to the term "cat". Sixty-six percent similarity to a three-letter word means that one-letter substitutions are permissible. Matching terms may include "hat", "cats", and "bat". |
1899 | Match documents where the content field contains the string "1899". To search a numeric field, you have to name it. See next entry. |
 | Perform a numeric match on the |
.id:NA | Match the one document whose document-ID is "NA". Note the period before "id", denoting an internal Attivio field. |
.id:"NAM\-2" | If the ID contains special characters, put it inside double-quotes and escape the characters. |
size:>55 | Match documents with values greater than 55 (exclusive) |
size:[55 TO *] | Match documents with values greater than or equal to 55 (inclusive) |
size:{55 TO *] | Match documents with values greater than 55 (exclusive) |
date:["2007-01-01" TO "2007-01-04"] | Match documents with date values between midnight Jan 1 and midnight Jan 4, 2007. |
date:["2007-01-01T00:00:00" TO "2007-01-04T00:00:00"] | Match date values with full precision. |
Each of these examples is explained in more detail below.
Wildcard Searching
To perform wildcard searches, specify the "*" or "?" characters in the search term. The "*" will match any pattern of zero or more characters, and the "?" will match any single character.
Example Query | Description |
---|---|
h?llo | Match documents that satisfy the wildcard expression "h?llo" (matching terms will include "hello", "hillo", "hallo", etc.) |
cat* | Match documents that satisfy the wildcard expression "cat*" (matching terms will include "cat", "cats", "catapult", etc.) |
Don't use * with single-character terms, please.
If you use the asterisk wildcard (zero or more characters) with a single-character term (such as A) it can sometimes generate enough matches to overwhelm system memory. For this reason Attivio has internal safeguards (wildcard.maxTerms and wildcard.maxPhraseTerms) to force a query error if the wildcard pattern gets too many matches.
The pattern *A matches every word in the index that ends in "a". Similarly, A* matches every word that begins with "a". Adding even one more letter to the pattern cuts down the number of matches dramatically.
You can sometimes encounter this error if your search term contains punctuation. For instance, the query term *R&D resolves into a search for *R and a parallel search for D, because Attivio drops punctuation before querying.
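The "*" and "?" semantics described in this section coincide with shell-style globbing, so Python's standard fnmatch module can be used to preview which terms a pattern would cover. This is only an illustration of the matching rules against a made-up term list; it says nothing about how Attivio evaluates wildcards internally, and fnmatch additionally treats square brackets as character classes, which is not part of the syntax described here.

```python
from fnmatch import fnmatchcase


def matching_terms(pattern, terms):
    """Return the terms that the whole wildcard pattern matches."""
    return [term for term in terms if fnmatchcase(term, pattern)]


# Hypothetical index terms, for illustration only.
terms = ["hello", "hallo", "hillo", "hollow", "cat", "cats", "catapult", "scat"]

print(matching_terms("h?llo", terms))  # "?" stands for exactly one character
print(matching_terms("cat*", terms))   # "*" stands for zero or more characters
```

Running a single-letter pattern such as "a*" against a realistic term list shows quickly how many terms it can expand to.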
Wildcards, Phrases, and Highlighting
When you use a wildcard term ("?ello*"), Attivio highlights all the terms that match the wildcard term ("hello", "jello", "fellow", "bellow").
When you search for a phrase ("fine fellow"), Attivio highlights only the exact phrase. It does not highlight the words individually.
However, when you use a wildcard term in a phrase ("fine ?ello*"), Attivio highlights both the full phrase and all of the words that matched the wildcard term individually.
Terms
Terms and phrases can be searched at query time.
The syntax for a Term is:
VALUE[MODIFIERS]
VALUE can be a String, Number, or Date.
Example Query | Description |
---|---|
hello | Match documents where the content field contains the term "hello". |
"hello world" | Match documents where the content field contains the phrase "hello world". |
Boost Modifier
A boost can be applied to a term in order to more heavily weight the value of matching that term in relevancy scoring.
To boost a term, append a caret ("^") and an integer boost factor, expressed as a percentage, to the search term or phrase. The default boost factor is 100, which is effectively no boost.
Example Query | Description |
---|---|
hello^200 OR world | Match all documents that contain either "hello" or "world", and score documents that contain "hello" higher than documents that contain "world". |
Real-Time Updates
Note that this feature is not supported with Real-Time Update fields.
See Machine Learning Relevancy for more information on relevancy/boosting.
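To make the percentage notation concrete, the following sketch splits a boosted term into its value and a score multiplier. It assumes that the percentage maps linearly to a multiplier (100 means 1.0, 200 means 2.0); the function name `parse_boost` is hypothetical, and the sketch ignores escaped carets and quoted phrases that contain a caret.

```python
def parse_boost(term: str) -> tuple[str, float]:
    """Split 'value^N' into (value, multiplier).

    A term without a caret gets the default factor of 100, i.e. 1.0.
    """
    value, caret, factor = term.rpartition("^")
    if not caret:
        return term, 1.0
    return value, int(factor) / 100


print(parse_boost("hello^200"))
print(parse_boost("world"))
```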
Similarity Modifier
A term can be used in a similarity search. A similarity search finds terms that are similar to the search term.
To use a term in a similarity search, append a tilde ("~") to the search term, optionally followed by a similarity factor (default: 50).
The similarity factor is a percentage value between 1 and 100.
Example Query | Description |
---|---|
cat~80 | Match documents that have terms that are similar to the term "cat" (matching terms may include "hat", "cats", "bat") |
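The overview example states that 66% similarity on a three-letter word permits a one-letter change. The sketch below reproduces that arithmetic with an edit-distance ratio. Attivio's actual similarity metric is not specified in this document, so the formula (one minus the edit distance divided by the shorter word's length) is an assumption borrowed from classic fuzzy matching, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity_percent(term: str, candidate: str) -> float:
    """Assumed metric: 100 * (1 - distance / length of the shorter word)."""
    shortest = min(len(term), len(candidate))
    if shortest == 0:
        return 0.0
    return max(0.0, 1 - edit_distance(term, candidate) / shortest) * 100


for word in ["hat", "cats", "bat", "dog"]:
    print(word, similarity_percent("cat", word) >= 66)
```

Under this assumed metric, one edit on a three-letter word yields about 66.7%, which is why a factor of 66 admits "hat", "cats", and "bat".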
Fields
You can restrict a search term or range to a specific field.
Field Modifier Syntax:
FIELD:EXPRESSION
FIELD is the name of the field to search.
EXPRESSION is a search term or range.
Any search terms that are not field-specific are applied to the default search field that is specified in the schema.
Example Query | Description |
---|---|
title:hello | Match documents that contain the term "hello" in the "title" field |
title:"hello world" | Match documents that contain the phrase "hello world" in the "title" field |
Document ID Field
The Attivio document ID field is .id. You can write Simple Query Language queries that directly match this field.
Example Query | Description |
---|---|
.id:NA | Match the one document whose document-ID is "NA". Note the period before "id", denoting an internal Attivio field. |
.id:"NAM\-2" | If the ID contains special characters, put it inside double-quotes and escape the characters. |
Ranges
Range searches match all documents that have values within a given range for a given field.
Range queries can be applied to all data types, including strings. Numeric fields are sorted numerically. String fields are sorted lexicographically, even if the strings appear to be numbers.
- Numeric: 1, 2, 3, 10, 20, 30, 100, 200, 300
- Lexical: 1, 10, 100, 2, 20, 200, 3, 30, 300
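The two orderings above are easy to reproduce: sorting the same values as integers and as strings gives the numeric and lexical sequences respectively. The snippet is a plain Python illustration, not Attivio code.

```python
values = [300, 20, 1, 100, 3, 10, 200, 2, 30]

numeric_order = sorted(values)                       # compares numbers
lexical_order = sorted(str(v) for v in values)       # compares character by character

print(numeric_order)
print(lexical_order)
```

Because "10" sorts before "2" as a string, a range over a string field can include or exclude values that a numeric reading of the same range would not.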
Fielded Ranges
It is a best practice to explicitly identify the field that you want to evaluate for a range of matches. For fielded ranges, the ">" and "<" inequality operators are treated as lower and upper limits on the desired range.
Operator | Query | Description |
---|---|---|
Greater Than | size:>55 | Match documents with values greater than 55 (exclusive) |
Greater Than | size:[55 TO *] | Match documents with values greater than or equal to 55 (inclusive) |
Greater Than | size:{55 TO *] | Match documents with values greater than 55 (exclusive) |
Less Than | size:<55 | Match documents with values less than 55 (exclusive) |
Less Than | size:[* TO 55] | Match documents with values less than or equal to 55 (inclusive) |
Less Than | size:[* TO 55} | Match documents with values less than 55 (exclusive) |
In Range | size:[5 TO 50] | Match documents with values between 5 and 50 inclusive |
In Range | size:{5 TO 50} | Match documents with values between 5 and 50 exclusive |
In Range | size:[5 TO 50} | Match documents with values between 5 (inclusive) and 50 (exclusive) |
In Range | size:{5 TO 50] | Match documents with values between 5 (exclusive) and 50 (inclusive) |
In Date Range | date:["2007-01-01" TO NOW] | Match documents with date values between Jan 1 2007 and the current moment. |
In Date Range | date:["2007-01-01" TO "2007-01-04"] | Match documents with date values between Jan 1 and Jan 4th 2007 |
In Date Range | date:["2007-01-01T00:00:00" TO "2007-01-04T00:00:00"] | Match documents with date values between Jan 1 and Jan 4th 2007 with full precision |
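The bracket conventions in the table can be summarized in a few lines of code: a square bracket includes its endpoint, a curly brace excludes it, and "*" leaves that side open. The sketch below is a minimal numeric-only evaluator written for illustration; the function name `in_range` is hypothetical, it does not handle dates or NOW, and it is not how Attivio evaluates ranges internally.

```python
import re

# Matches expressions such as "[5 TO 50}" or "{55 TO *]".
RANGE = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+)\s*([\]}])$")


def in_range(value: float, expr: str) -> bool:
    """Return True if value satisfies the numeric range expression."""
    match = RANGE.match(expr)
    if match is None:
        raise ValueError(f"not a range expression: {expr!r}")
    open_bracket, low, high, close_bracket = match.groups()
    if low != "*":
        lower = float(low)
        # "{" excludes the lower endpoint, "[" includes it.
        if value < lower or (open_bracket == "{" and value == lower):
            return False
    if high != "*":
        upper = float(high)
        # "}" excludes the upper endpoint, "]" includes it.
        if value > upper or (close_bracket == "}" and value == upper):
            return False
    return True


print(in_range(55, "[55 TO *]"))  # inclusive lower bound
print(in_range(55, "{55 TO *]"))  # exclusive lower bound
```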
Unfielded Ranges
If you issue a range query without specifying the field, Attivio applies the request to the default search field. This is normally the content field, which is a concatenation of various text fields in the document (title, author, text, and others specified in the Attivio Schema). Therefore, unfielded range comparisons will always be lexicographic rather than numeric.
Unfielded range queries support the TO syntax. The ">" and "<" operators are not supported for unfielded queries.
Exclusion Modifier
Use the exclusion modifier to match only documents that do not contain a specific term.
To exclude a term, prefix the term with a -. Do not leave a blank space between the - and the term you wish to exclude. If the term includes a field, place the - prior to the field name.
Example Query | Description |
---|---|
-cat | Only match documents that do not contain the term cat in the default search field |
-title:cat | Only match documents that do not contain the term cat in the title field |
OR Operator
Use the OR operator to require that a matching document contain at least one of many possible values.
The OR operator must be specified in upper case (otherwise the search is for the term "or").
Example Query | Description |
---|---|
cat OR dog | Matches documents that contain the term "cat" or the term "dog" |
cat OR "hound dog" | Matches documents that contain the term "cat" or the phrase "hound dog" |
To search for the term OR (in upper case), wrap it in double quotes: "OR".
Special Characters
The following characters are special characters that must be escaped with a backslash unless wrapped in double quotes:
- < > [ ] { } : ~ ^ <space>
The following characters must always be escaped with a backslash:
\ "
Special characters must be either escaped or wrapped in double quotes when used in search terms or ranges. See the table below for some example search terms, and how they must be queried in either an escaped or quoted form.
Search Term | Escaped Value | Quoted Value |
---|---|---|
dog | dog | "dog" |
attivio engine | attivio\ engine | "attivio engine" |
dog^cat | dog\^cat | "dog^cat" |
double"quote | double\"quote | "double\"quote" |
back\slash | back\\slash | "back\\slash" |
Some fields may be indexed without tokenization (for example, product ids and other identification strings). In order to query these fields, any search terms must be either quoted or properly escaped to ensure correct matches.
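Client code that builds queries from user input can apply the two escaping rules mechanically. The helpers below are a minimal sketch based only on the character lists in this section; the names `escape_term` and `quote_term` are hypothetical, and wildcard characters ("*" and "?") are deliberately left alone because this section does not list them as special.

```python
# Characters that need a backslash unless the term is quoted.
SPECIAL = set("-<>[]{}:~^ ")
# Characters that always need a backslash, even inside quotes.
ALWAYS = set('\\"')


def escape_term(term: str) -> str:
    """Return the backslash-escaped (unquoted) form of a search term."""
    return "".join(
        "\\" + ch if ch in SPECIAL or ch in ALWAYS else ch for ch in term
    )


def quote_term(term: str) -> str:
    """Return the double-quoted form, escaping only backslash and quote."""
    body = "".join("\\" + ch if ch in ALWAYS else ch for ch in term)
    return '"' + body + '"'


print(escape_term("attivio engine"))
print(quote_term('double"quote'))
```

Quoting is usually the simpler choice for untokenized identifier fields, since only two characters ever need escaping inside quotes.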
Data Types
Strings
Strings can be either quoted or unquoted. Unquoted strings start with any non-special character and end at the first unescaped special character or space. Quoted strings start with a double quote and end with an unescaped double quote. Special characters are escaped with the "\" character.
Examples | Description |
---|---|
test | Unquoted string |
"test query" | Quoted String with space |
"test-query" | Quoted String with special character |
test\-query | Unquoted string with escaped special character |
Dates
A Date value must be specified in UTC format, wrapped in double quotes, because a fully-qualified date contains special characters. Valid formats are "YYYY-MM-DDThh:mm:ss" and "YYYY-MM-DD hh:mm:ss".
Examples | Notes |
---|---|
"2007-08-09T05:55:00" | Fully specified. |
"2007-08-09 05:55:00" | Using a space instead of T. |
"2007-08-09" | Date only. Assumes dates in the index are for midnight, that is, 00:00:00. |
Date queries require field names
To search for a date, you must use a field name, as in date:"2014-07-17". This tells Attivio to convert the date string into a 64-bit number that will match that field in the index. If you omit the field name, Attivio will try to match the string against the document's content field. The content field concatenates the document's title, author, and text values, plus the values of all *_s and *_text dynamic fields, into a single field value. It does not contain any dates. Therefore, unfielded date queries won't match any documents.
For the details of Attivio's date-matching behavior, see Dates and Date Formats.
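When date queries are generated by a program, the value has to be converted to UTC, formatted, quoted, and prefixed with a field name. The sketch below shows one way to do that with Python's standard datetime module. The helper name `date_term` is hypothetical, and the field name "date" is simply the one used in the examples above.

```python
from datetime import datetime, timedelta, timezone


def date_term(field: str, moment: datetime) -> str:
    """Format a timezone-aware datetime as a fielded, quoted UTC date term."""
    utc = moment.astimezone(timezone.utc)
    stamp = utc.strftime("%Y-%m-%dT%H:%M:%S")
    return f'{field}:"{stamp}"'


# A local time two hours ahead of UTC is converted before formatting.
local = datetime(2007, 8, 9, 7, 55, tzinfo=timezone(timedelta(hours=2)))
print(date_term("date", local))
```

Passing a timezone-aware datetime avoids the ambiguity of naive values, which Python would otherwise interpret as local time.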
Numbers
Numbers can be searched at query time. To search for numbers (rather than searching for a string that contains a number), direct your query to numeric fields. These are the fields that are defined as integer, double, float, or long in Configure the Attivio Schema.
To search for a negative number, the number must be wrapped in double quotes, because the minus sign (-) is a special character.
Examples |
---|
532 |
"-532" |
5.5 |
5.05 |
"-32.8" |
When querying an int field using a float or double constant in the query string, the constant will be truncated to an int before the match proceeds.
- A document contains a field with value 7. It will be returned when querying for 7.9. It will NOT be found by a query for 6.99.
- A range query for 1.1 to 9.9 will be interpreted as 1 to 9.
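The truncation rule can be checked with a couple of lines of arithmetic. The sketch below mimics it with Python's int(), which drops the fractional part; the function name is hypothetical, and whether Attivio truncates negative constants toward zero in the same way is not stated in this document.

```python
def coerce_for_int_field(constant: str) -> int:
    """Truncate a numeric query constant the way the text describes."""
    return int(float(constant))


stored_value = 7
print(coerce_for_int_field("7.9") == stored_value)   # query for 7.9 matches 7
print(coerce_for_int_field("6.99") == stored_value)  # query for 6.99 becomes 6
print(coerce_for_int_field("1.1"), coerce_for_int_field("9.9"))  # range 1.1 to 9.9
```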