Overview

AIE performs linguistic transformation on incoming documents in order to create the richest possible index. Query strings must undergo the same linguistic transformations in order to match indexed documents accurately.

AIE identifies the language of incoming documents (and of parts of documents), and then applies the transformations that are appropriate to that language.

Queries, however, are usually too short for language-identification algorithms to work reliably. For this reason, you must tell AIE which language a non-English query is written in. Otherwise AIE analyzes the query as if it were written in English, and search results may not be optimal.

Related Topics

Language identification is performed by both Attivio Core Language Identification and ALM Language Identification. The AIE core recognizes more than 25 languages; the ALM recognizes over 50.

Locale codes are mapped to language-specific tokenizers in <project-dir>\conf\features\core\TokenizerModel... files.

Exposing Locale Properties demonstrates how to inspect locale settings of an IngestDocument during ingestion.

Examples of Non-English Queries

Designating the language of a non-English query lets AIE apply a specific set of linguistic transformations to the text of the query. The examples that follow illustrate differences in Lemmatization among several languages.

  • If we search for the word "domino" using the default English query processing, AIE looks for documents that contain "domino".
  • However, if we tell AIE that "domino" is written in Spanish or Portuguese, AIE searches for "domino" and also for its lemma "dominar".
  • If we identify "domino" as Italian, AIE searches for "domino" and "domare".
  • In Hungarian, AIE searches for "domino" and "dominó".
  • For Polish, AIE matches documents containing "domino" and "domina".

From these simple examples you can imagine the impact this can have on complex queries. The default English version of the query could be significantly different from the non-English version, with a corresponding difference in matching documents.
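The "domino" examples above can be restated as a small lookup table. The sketch below is only an illustration of how one term expands differently per locale: the class name, the map, and the `expand` method are hypothetical, and AIE derives these expansions from its linguistic components rather than from a hard-coded table. The fallback to English mirrors the default behavior described in the Overview.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DominoLemmaDemo {
    // Hypothetical table holding only the expansions listed in this section.
    private static final Map<String, List<String>> EXPANSIONS = new LinkedHashMap<>();
    static {
        EXPANSIONS.put("en", List.of("domino"));
        EXPANSIONS.put("es", List.of("domino", "dominar"));
        EXPANSIONS.put("pt", List.of("domino", "dominar"));
        EXPANSIONS.put("it", List.of("domino", "domare"));
        // \u00f3 is the accented "o" in the Hungarian form.
        EXPANSIONS.put("hu", List.of("domino", "domin\u00f3"));
        EXPANSIONS.put("pl", List.of("domino", "domina"));
    }

    /** Returns the terms searched for "domino" under the given locale, falling back to English. */
    public static List<String> expand(String locale) {
        return EXPANSIONS.getOrDefault(locale, EXPANSIONS.get("en"));
    }

    public static void main(String[] args) {
        // Print the expansion for every locale in the table.
        for (String locale : EXPANSIONS.keySet()) {
            System.out.println(locale + " -> " + expand(locale));
        }
    }
}
```

A locale that is missing from the table (or a query submitted with no locale at all) falls through to the English entry, which is why an unlabeled Italian query would never match "domare".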

Setting the Locale in the UI

Language of the Query

In this context, the "language" of the query means the spoken language in which the words and phrases of the query were written, such as English, French, German, or Thai.

How you specify the query's language depends in part on which of AIE's query languages is in use: the Simple Query Language or the Advanced Query Language.

SAIL Interface

In the SAIL interface, set the query locale in the Locale field on the Results View tab of the Preferences dialog. Type a single ISO-639-1 two-letter code for the language. For instance, enter "fr" to pose a query in French.

Debug Search Interface

The Debug Search interface uses the Locale control to set the language of both Simple and Advanced queries. To use the locale field, type in the ISO-639-1 two-letter code for the language.
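Both interfaces expect a two-letter ISO-639-1 code, so an application that collects the locale from a user may want to check the value before submitting it. The following is a minimal sketch using only the standard JDK; the class and method names are hypothetical, and the set of codes the JDK knows is not necessarily the set of languages AIE supports.

```java
import java.util.Arrays;
import java.util.Locale;

public class LocaleCodeCheck {
    /**
     * Returns true if the code is a lowercase two-letter language code
     * known to the JDK (Locale.getISOLanguages lists ISO 639 two-letter codes).
     */
    public static boolean isTwoLetterLanguageCode(String code) {
        if (code == null || code.length() != 2) {
            return false;
        }
        return Arrays.asList(Locale.getISOLanguages()).contains(code);
    }

    public static void main(String[] args) {
        // Check a few candidate values before using them as a query locale.
        for (String candidate : new String[] {"fr", "de", "FR", "fra"}) {
            System.out.println(candidate + " valid: " + isTwoLetterLanguageCode(candidate));
        }
    }
}
```

The check is case-sensitive on purpose, since the examples in this page use lowercase codes such as "fr".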

Setting the Spoken Language in the Query

Under some circumstances you can specify the spoken language of the query, or of individual parts of the query, from within the query itself.

Simple Query Language

There is no construct within the Simple Query Language that lets you directly specify the spoken language of the query.

However, simple queries can be submitted in a variety of ways that let you designate the spoken language outside of the actual query string. See the section about the Debug Search interface, above.

Advanced Query Language

The Advanced Query Language offers a shared language parameter that can be appended to most of the advanced query operators (such as the TERM operator).

TERM(domino,language="de")
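When advanced queries are assembled in application code, the language parameter can be added while building the query string. The helper below is a hypothetical sketch, not part of the AIE API: it only reproduces the TERM form shown above, and it assumes the term contains no characters that need escaping in the Advanced Query Language.

```java
public class TermQueryBuilder {
    /**
     * Builds a string of the form TERM(term,language="code").
     * Hypothetical helper: performs no escaping or validation of its inputs.
     */
    public static String termWithLanguage(String term, String languageCode) {
        return "TERM(" + term + ",language=\"" + languageCode + "\")";
    }

    public static void main(String[] args) {
        // Reproduces the example query from this section.
        System.out.println(termWithLanguage("domino", "de"));
    }
}
```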

 

Setting the Spoken Language in Java

When you build a query with the Java Client API, set its language with the setLocale method of the QueryRequest class.

Setting the Spoken Language of a Query
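// Build a Simple Query Language request, mark it as French, then run it on an existing client.
QueryRequest req = new QueryRequest("*:*", QueryLanguages.SIMPLE);
req.setLocale("fr");
QueryResponse resp = searchclient.search(req);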