Overview

The Simple Query Language lets untrained users pose simple queries at search portals. It is similar to the familiar Google and Yahoo query languages.

Users who are more familiar with the syntax can query specific fields and build more elaborate Boolean conditions. The Simple Query Language also matches ranges of numbers and dates.

All terms in the query are ANDed together by default. There is no need to specify AND conditions in the query.

Because AND is implicit, the word "AND" in a query is not an operator: a Simple Query Language query containing the term "AND" returns documents that contain the word "and".

Default Behavior

The Simple Query Language is intended for untrained seekers making unsophisticated queries. It is a human-friendly query language.

When you submit a keyword query, Attivio treats each word as a "term". It then launches a series of queries against various fields in the index.

For instance, suppose we enter "hello world" in a search field and click the Search button.

Attivio interprets this request as two keywords that should both be found in the same document:

Resulting internal query
AND(hello, world)


Because this query did not specify a field to search, Attivio searches the content field of each document, first for "hello" and then for "world". Both words must be present for a match. (The content field concatenates the document's title, author, and text values, plus the values of all *_s and *_text dynamic fields, into a single field value. It is the default-search-field of the Attivio Schema.)

Then Attivio automatically evaluates the matching documents for relevancy. It checks the title fields for "hello" and for "world", then for the phrase "hello world" anywhere in the title, and finally it looks for titles that exactly match the phrase "hello world". Each of these cases gets a boost in its score. An exact-match to the whole title gets a very large boost, sufficient to propel that document to the top of the result list.
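To make the default interpretation concrete, the following Python sketch turns a keyword string into the internal form shown above. It is only an illustration, not Attivio's parser: the function name to_internal_query is hypothetical, and the sketch ignores phrases, fields, modifiers, and the OR operator.

```python
def to_internal_query(keywords: str) -> str:
    """Join whitespace-separated keywords into an AND(...) expression.

    Hypothetical helper for illustration only; it does not handle
    quoted phrases, fields, modifiers, or the OR operator.
    """
    terms = keywords.split()
    if not terms:
        return ""
    if len(terms) == 1:
        return terms[0]
    return "AND(" + ", ".join(terms) + ")"


print(to_internal_query("hello world"))  # AND(hello, world)
```

Because every token is treated as a term, a query such as "cats AND dogs" produces three terms in this sketch, which mirrors the statement that the word "AND" is searched like any other word.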

Viewing Queries

If you use Attivio's Debug Search page, you can type in keywords, launch the search, and then view the resulting query and boost behavior in the <query> and <boostQueries> elements of the returned Legacy-XML.

Default OR

Although terms are ANDed by default, you can switch the default to OR, either with an additional CGI parameter on REST search requests or with a configuration property. The CGI parameter overrides the value set by the configuration property. The valid values are defaultAnd (the default) and defaultOr.

Via an additional CGI parameter:

CGI parameter
q.SimpleQueryParser.defaultBooleanMode=defaultOr

Via property:

Property
rest.search.api.q.parameters=SimpleQueryParser.defaultBooleanMode=defaultOr
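As a rough sketch, a client could attach the CGI parameter when it builds the request URL. The base URL below is a placeholder, and passing the query text in a parameter named q is an assumption; only the q.SimpleQueryParser.defaultBooleanMode parameter and its defaultOr value come from this page.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the search endpoint of your installation.
BASE_URL = "http://search.example.com/rest/search"


def build_search_url(query: str, default_or: bool = False) -> str:
    """Build a search URL, optionally switching the default operator to OR."""
    params = {"q": query}  # assumes the query string is passed as "q"
    if default_or:
        params["q.SimpleQueryParser.defaultBooleanMode"] = "defaultOr"
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("hello world", default_or=True))
```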


Simple Query Examples

The fastest way to appreciate the features of the Simple Query Language is to view a short list of example queries. Each of these links down the page to a section that describes the syntax in more detail.

Example Query

Description

hello

Match documents where the content field contains the term "hello".

-hello

Match documents where the content field does not contain the term "hello".

"hello world"

Match documents where the content field contains the phrase "hello world".

hello OR world

Match all documents that contain either "hello" or "world", or both.

hello^200 OR world

Match all documents where the content field contains either "hello" or "world", and double the score of documents that contain "hello".

title:hello

Match documents that contain the term "hello" in the "title" field

title:"hello world"

Match documents that contain the phrase "hello world" in the "title" field

h?llo

Match documents that satisfy the wildcard expression "h?llo". Matching terms will include "hello", "hillo", "hallo", etc.

cat*

Match documents that satisfy the wildcard expression "cat*". Matching terms will include "cat", "cats", "catapult", etc.

cat~66

Match documents that have terms that are 66% similar to the term "cat". Sixty-six percent similarity to a three-letter word means that one-letter substitutions are permissible. Matching terms may include "hat", "cats", and "bat".

1899

Match documents where the content field contains the string "1899". To search a numeric field, you have to name it. See next entry.

size:1899

Perform a numeric match on the size field. The size field is not included in the "default" search, but it can be queried by name.

.id:NA

Match the one document whose document-ID is "NA". Note the period before "id", denoting an internal Attivio field.

.id:"NAM\-2"

If the ID contains special characters, put it inside double-quotes and escape the characters.

size:>55

Match documents with values greater than 55 (exclusive)

size:[55 TO *]

Match documents with values greater than or equal to 55 (inclusive)

size:{55 TO *]

Match documents with values greater than 55 (exclusive)

date:["2007-01-01" TO "2007-01-04"]

Match documents with date values between midnight Jan 1 and midnight Jan 4, 2007.

date:["2007-01-01 00:00:00" TO "2007-01-01 00:00:00"]

Match date values with full precision.

Each of these examples is explained in more detail below.

Wildcard Searching

To perform wildcard searches, specify the "*" or "?" characters in the search term. The "*" will match any pattern of zero or more characters, and the "?" will match any single character.

Example Query

Description

h?llo

Match documents that satisfy the wildcard expression "h?llo" (matching terms will include "hello", "hillo", "hallo", etc.)

cat*

Match documents that satisfy the wildcard expression "cat*" (matching terms will include "cat", "cats", "catapult", etc.)
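The matching rules for "*" and "?" can be sketched by translating a wildcard term into a regular expression. This is an illustration of the semantics only, not how Attivio evaluates wildcards against its index; the function names and the small vocabulary are made up for the example, and terms are assumed to be lowercase already.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard term into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matching_terms(pattern, vocabulary):
    """Return the vocabulary terms that the whole pattern matches."""
    regex = wildcard_to_regex(pattern)
    return [term for term in vocabulary if regex.fullmatch(term)]


vocabulary = ["hello", "hallo", "hillo", "hllo", "cat", "cats", "catapult", "scat"]
print(matching_terms("h?llo", vocabulary))  # ['hello', 'hallo', 'hillo']
print(matching_terms("cat*", vocabulary))   # ['cat', 'cats', 'catapult']
```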

Don't use * with single-character terms, please.

If you use the asterisk wildcard (zero or more characters) with a single-character term (such as A) it can sometimes generate enough matches to overwhelm system memory. For this reason Attivio has internal safeguards (wildcard.maxTerms and wildcard.maxPhraseTerms) to force a query error if the wildcard pattern gets too many matches.

The pattern *A matches every word in the index that ends in "a". Similarly, A* matches every word that begins with "a". Adding even one more letter to the pattern cuts down the number of matches dramatically.

You can sometimes encounter this error if your search term contains punctuation. For instance, the query term *R&D resolves into a search for *R and a parallel search for D, because Attivio drops punctuation before querying.
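The safeguard described above can be pictured as a cap on the number of index terms that a wildcard pattern may expand to. The sketch below is an assumption about the general idea, not Attivio's implementation: MAX_TERMS, expand_wildcard, and TooManyTermsError are hypothetical names, and the limit of 3 is chosen only to keep the example small.

```python
from fnmatch import fnmatchcase

MAX_TERMS = 3  # hypothetical cap, standing in for wildcard.maxTerms


class TooManyTermsError(Exception):
    """Raised when a wildcard pattern expands to too many index terms."""


def expand_wildcard(pattern, vocabulary, max_terms=MAX_TERMS):
    """Expand a wildcard pattern, failing if it matches too many terms."""
    matches = [term for term in vocabulary if fnmatchcase(term, pattern)]
    if len(matches) > max_terms:
        raise TooManyTermsError(
            f"{pattern!r} matched {len(matches)} terms (limit {max_terms})"
        )
    return matches


vocabulary = ["apple", "axe", "arm", "art", "ant", "bat"]
print(expand_wildcard("ar*", vocabulary))  # ['arm', 'art']
try:
    expand_wildcard("a*", vocabulary)      # five matches exceed the cap
except TooManyTermsError as err:
    print("query error:", err)
```

Adding one more letter to the pattern ("ar*" instead of "a*") brings the expansion back under the cap, which is the effect the text describes.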

Wildcards, Phrases, and Highlighting

When you use a wildcard term ("?ello*"), Attivio highlights all the terms that match the wildcard term ("hello", "jello", "fellow", "bellow").

When you search for a phrase ("fine fellow"), Attivio highlights only the exact phrase.  It does not highlight the words individually.

However, when you use a wildcard term in a phrase ("fine ?ello*"), Attivio highlights both the full phrase and all of the words that matched the wildcard term individually. 

 

Terms

Terms and phrases can be searched at query time.

The syntax for a Term is:
VALUE[MODIFIERS]

VALUE can be either a String, Number, or Date.

Example Query

Description

hello

Match documents where the content field contains the term "hello".

"hello world"

Match documents where the content field contains the phrase "hello world".

Boost Modifier

A boost can be applied to a term in order to more heavily weight the value of matching that term in relevancy scoring.

To boost a term, append a caret ("^") and an integer boost factor, expressed as a percentage, to the search term or phrase. The default boost factor is 100, which is effectively no boost.

Example Query

Description

hello^200 OR world

Match all documents that contain either "hello" or "world", and score documents that contain "hello" higher than documents that contain "world".
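The boost syntax can be sketched as a split on the trailing caret. This is a simplified illustration with hypothetical names (split_boost, DEFAULT_BOOST); it does not handle escaped carets or quoted phrases that contain a caret.

```python
import re

DEFAULT_BOOST = 100  # per the text above, 100 means "no boost"

_BOOST_RE = re.compile(r"^(?P<term>.+?)\^(?P<boost>\d+)$")


def split_boost(token):
    """Split 'term^N' into (term, N); N defaults to 100 when absent."""
    match = _BOOST_RE.match(token)
    if match:
        return match.group("term"), int(match.group("boost"))
    return token, DEFAULT_BOOST


print(split_boost("hello^200"))  # ('hello', 200)
print(split_boost("world"))      # ('world', 100)
```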

Real-Time Updates

Note that this feature is not supported with Real-Time Update fields.  

See Machine Learning Relevancy for more information on relevancy/boosting.

Similarity Modifier

A term can be used in a similarity search.  A similarity search finds terms that are similar to the search term.

To use a term in a similarity search, append a tilde ("~") to the search term, optionally followed by a similarity factor.
The similarity factor is a percentage value between 1 and 100. The default is 50.

Example Query

Description

cat~66

Match documents that have terms that are at least 66% similar to the term "cat" (matching terms may include "hat", "cats", "bat")
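One plausible reading of the similarity factor, consistent with the remark earlier on this page that 66% similarity on a three-letter word permits a one-letter substitution, is a normalized edit distance. Attivio's actual metric is not described here, so the formula below is an assumption used only to build intuition.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance using a rolling row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_percent(term, candidate):
    """Assumed metric: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(term), len(candidate))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(term, candidate) / longest)


for candidate in ["hat", "cats", "bat", "dog"]:
    print(candidate, round(similarity_percent("cat", candidate), 1))
```

Under this assumed metric, "hat", "cats", and "bat" all clear a 66% threshold, while "dog" does not.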

Fields

You can restrict a search term or range to a specific field.

Field Modifier Syntax:
FIELD:EXPRESSION

FIELD is the name of the field to search.  EXPRESSION is a search term or range.

Any search terms that are not field-specific are applied to the default search field that is specified in the schema. 

Example Query

Description

title:hello

Match documents that contain the term "hello" in the "title" field

title:"hello world"

Match documents that contain the phrase "hello world" in the "title" field
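The FIELD:EXPRESSION syntax can be sketched as a prefix split that falls back to the default search field. The helper name split_field and the regular expression are illustrative assumptions; the default field name "content" is taken from this page.

```python
import re

DEFAULT_FIELD = "content"  # the default search field named on this page

# A field name (optionally starting with "."), then ":", then the expression.
_FIELD_RE = re.compile(r"^(?P<field>\.?[A-Za-z_][A-Za-z0-9_]*):(?P<expr>.+)$")


def split_field(token):
    """Return (field, expression); unfielded tokens use the default field."""
    match = _FIELD_RE.match(token)
    if match:
        return match.group("field"), match.group("expr")
    return DEFAULT_FIELD, token


print(split_field('title:"hello world"'))  # ('title', '"hello world"')
print(split_field("hello"))                # ('content', 'hello')
```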

Document ID Field

The Attivio document ID field is .id. You can write Simple Query Language queries that directly match this field.

Example Query

Description

.id:NA

Match the one document whose document-ID is "NA". Note the period before "id", denoting an internal Attivio field.

.id:"NAM\-2"

If the ID contains special characters, put it inside double-quotes and escape the characters.

Ranges

Range searches match all documents that have values within a given range for a given field.

Range queries can be applied to all data types, including strings. Numeric fields are sorted numerically.  String fields are sorted lexicographically, even if the strings appear to be numbers. 

  • Numeric:  1, 2, 3, 10, 20, 30, 100, 200, 300
  • Lexical:    1, 10, 100, 2, 20, 200, 3, 30, 300
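The difference between the two orderings is easy to reproduce. The following Python lines sort the same values as numbers and as strings, and then show why a lexicographic range can contain values that look out of range numerically.

```python
values = [1, 2, 3, 10, 20, 30, 100, 200, 300]

numeric_order = sorted(values)                  # compares numbers
lexical_order = sorted(str(v) for v in values)  # compares character by character

print(numeric_order)
print(lexical_order)

# A lexicographic range from "1" to "3" therefore also contains "10", "200", ...
in_lexical_range = [s for s in lexical_order if "1" <= s <= "3"]
print(in_lexical_range)
```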

Fielded Ranges

It is a best practice to explicitly identify the field that you want to evaluate for a range of matches.  For fielded ranges, the ">" and "<" inequality operators are treated as lower and upper limits on the desired range.

Operator

Query

Description

Greater Than

size:>55

Match documents with values greater than 55 (exclusive)

Greater Than

size:[55 TO *]

Match documents with values greater than or equal to 55 (inclusive)

Greater Than

size:{55 TO *]

Match documents with values greater than 55 (exclusive)

Less Than

size:<55

Match documents with values less than 55 (exclusive)

Less Than

size:[* TO 55]

Match documents with values less than or equal to 55 (inclusive)

Less Than

size:[* TO 55}

Match documents with values less than 55 (exclusive)

In Range

size:[5 TO 50]

Match documents with values between 5 and 50 inclusive

In Range

size:{5 TO 50}

Match documents with values between 5 and 50 exclusive

In Range

size:[5 TO 50}

Match documents with values between 5 (inclusive) and 50 (exclusive)

In Range

size:{5 TO 50]

Match documents with values between 5 (exclusive) and 50 (inclusive)

In Date Range

date:["2007-01-01" TO NOW]

Match documents with date values between Jan 1 2007 and the current moment.

In Date Range

date:["2007-01-01" TO "2007-01-04"]

Match documents with date values between Jan 1 and Jan 4th 2007

In Date Range

date:["2007-01-01T00:00:00" TO "2007-01-04T00:00:00"]

Match documents with date values between Jan 1 and Jan 4, 2007, with full precision
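The bracket conventions in the table can be summarized in a small checker: square brackets include the bound, curly braces exclude it, and "*" leaves that side open. This sketch handles numeric ranges only and is not Attivio's parser; in_range is a hypothetical helper name.

```python
import re

_RANGE_RE = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+)\s*([\]}])$")


def in_range(value, expr):
    """Check a number against a range expression such as '[5 TO 50}' or '>55'."""
    if expr.startswith(">"):
        return value > float(expr[1:])
    if expr.startswith("<"):
        return value < float(expr[1:])
    match = _RANGE_RE.match(expr)
    if not match:
        raise ValueError(f"not a range expression: {expr!r}")
    open_bracket, low, high, close_bracket = match.groups()
    if low != "*":
        bound = float(low)
        # "[" includes the lower bound, "{" excludes it
        if value < bound or (open_bracket == "{" and value == bound):
            return False
    if high != "*":
        bound = float(high)
        # "]" includes the upper bound, "}" excludes it
        if value > bound or (close_bracket == "}" and value == bound):
            return False
    return True


print(in_range(55, "[55 TO *]"))  # True: inclusive lower bound
print(in_range(55, "{55 TO *]"))  # False: exclusive lower bound
```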

Unfielded Ranges

If you issue a range query without specifying the field, Attivio applies the request to the default search field. This is normally the content field, which is a concatenation of various text fields in the document (title, author, text, and others specified in the Attivio Schema). Therefore, unfielded range comparisons will always be lexicographic rather than numeric.

Unfielded range queries support the TO syntax.  The ">" and "<" operators are not supported for unfielded queries.

Exclusion Modifier

Use the exclusion modifier to match only documents that do not contain a specific term.

To exclude a term, prefix the term with a -. Do not leave a blank space between the - and the term you wish to exclude. If the term includes a field, place the - prior to the field name.

Example Query

Description

-cat

Only match documents that do not contain the term cat in the default search field

-title:cat

Only match documents that do not contain the term cat in the title field

OR Operator

Use the OR operator to require that a matching document contain at least one of many possible values.

The OR operator must be specified in upper case (otherwise the search is for the term "or").

Example Query

Description

cat OR dog

Matches documents that contain the term "cat" or the term "dog"

cat OR "hound dog"

Matches documents that contain the term "cat" or the phrase "hound dog"

To search for the term OR (in upper case), wrap OR in double quotes (ex: "OR")

Special Characters

The following characters are special characters that must be escaped with a backslash unless wrapped in double quotes:
- < > [ ] { } : ~ ^ <space> 

The following characters must always be escaped with a backslash:

\ "

Special characters must be either escaped or wrapped in double quotes when used in search terms or ranges. See the table below for some example search terms, and how they must be queried in either an escaped or quoted form.

Search Term

Escaped Value

Quoted Value

dog

dog

"dog"

attivio engine

attivio\ engine

"attivio engine"

dog^cat

dog\^cat

"dog^cat"

double"quote

double\"quote

"double\"quote"

back\slash

back\\slash

"back\\slash"

Some fields may be indexed without tokenization (for example, product ids and other identification strings). In order to query these fields, any search terms must be either quoted or properly escaped to ensure correct matches.
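When queries are built programmatically, the two escaping rules above can be applied with a pair of small helpers. The function names escape_term and quote_term are hypothetical; the character sets are copied from the lists in this section.

```python
# Characters that the text above lists as special outside double quotes.
SPECIAL_CHARS = set('-<>[]{}:~^ ')
# Characters that must always be escaped, even inside double quotes.
ALWAYS_ESCAPED = set('\\"')


def escape_term(term):
    """Backslash-escape every special character for use in an unquoted term."""
    return "".join(
        "\\" + ch if ch in SPECIAL_CHARS or ch in ALWAYS_ESCAPED else ch
        for ch in term
    )


def quote_term(term):
    """Wrap a term in double quotes, escaping only backslashes and quotes."""
    body = "".join("\\" + ch if ch in ALWAYS_ESCAPED else ch for ch in term)
    return '"' + body + '"'


print(escape_term("attivio engine"))  # attivio\ engine
print(quote_term('double"quote'))     # "double\"quote"
```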

Data Types

Strings

Strings can be either quoted or unquoted. Unquoted strings start with any non-special character and end at the first unescaped special character or space. Quoted strings start with a double quote and end with an unescaped double quote. Special characters are escaped with the "\" character.

Examples

Description

test

Unquoted string

"test query"

Quoted string with space

"test-query"

Quoted string with special character

test\-query

Unquoted string with escaped special character

Dates

A Date value must be specified in UTC format, wrapped in double quotes, because a fully-qualified date contains special characters. Valid formats are "YYYY-MM-DDThh:mm:ss" and "YYYY-MM-DD hh:mm:ss".

Examples

Notes

"2007-08-09T05:55:00"

fully specified

"2007-08-09 05:55:00"

Using space instead of T.

"2007-08-09"

assumes dates in the index are for midnight, that is, 00:00:00
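The three accepted spellings can be normalized on the client side before a query is sent. The sketch below uses Python's standard datetime module; parse_query_date is a hypothetical helper, and treating every value as UTC follows the statement above rather than any documented API.

```python
from datetime import datetime, timezone

# The two full formats named above, plus the date-only shorthand.
_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_query_date(text):
    """Parse a quoted date value from a query into a UTC datetime."""
    value = text.strip('"')
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # strptime fills missing time fields with 00:00:00 (midnight)
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f"unsupported date format: {text!r}")


print(parse_query_date('"2007-08-09"'))
```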

Date queries require field names

To search for a date, you must use a field name, as in date:"2014-07-17". This tells Attivio to convert the date string into a 64-bit number that will match that field in the index. If you omit the field name, Attivio will try to match the string against the document's content field. The content field concatenates the document's title, author, and text values, plus the values of all *_s and *_text dynamic fields, into a single field value. It does not contain any dates. Therefore, unfielded date queries won't match any documents.

For the details of Attivio's date-matching behavior, see Dates and Date Formats.

Numbers

Numbers can be searched at query time. To search for numbers (rather than searching for a string that contains a number), direct your query to numeric fields. These are the fields that are defined as integer, double, float, or long in Configure the Attivio Schema.

To search for a negative number, the number must be wrapped in double quotes, because the minus sign (-) is a special character.

Examples

532

"-532"

5.5

5.05

"-32.8"

When querying an int field using a float or double constant in the query string, the constant will be truncated to an int before the match proceeds.

  • A document contains a field with value 7. It will be returned when querying for 7.9. It will NOT be found by a query for 6.99.
  • A range query for 1.1 to 9.9 will be interpreted as 1 to 9.
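The truncation rule can be checked with a few lines of Python, where int() also truncates toward zero. The helper names are hypothetical, and the behavior for negative constants is an assumption, since the text above only gives positive examples.

```python
def to_int_constant(text):
    """Truncate a numeric query constant toward zero, as int() does."""
    return int(float(text))


def int_field_matches(field_value, constant):
    """Compare an int field value with a truncated query constant."""
    return field_value == to_int_constant(constant)


print(int_field_matches(7, "7.9"))   # True: 7.9 is truncated to 7
print(int_field_matches(7, "6.99"))  # False: 6.99 is truncated to 6
print(to_int_constant("1.1"), to_int_constant("9.9"))  # 1 9
```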