
Overview

Attivio's Dictionary Management User Interface lets you add, edit, and delete the dictionaries and dictionary terms that let users make word associations to improve query results.

Required Modules

These features require the inclusion of the businesscenter module when you run createproject or Attivio Designer to create the project directories. (This module is included in the demo module group.)

Dictionaries edited in the Dictionary Manager are automatically applied to incoming queries once they have been published. They do not impact the ingestion process.


Attivio Dictionaries

Dictionary Types

Attivio supports six types of dictionary:

  • Acronym dictionaries, for associating abbreviations found in user queries with their expanded forms
  • Autocomplete dictionaries, for completing partially-matched terms found in user queries (Available in Attivio Platform 5.6.3 and later releases)
  • Entity dictionaries, for use in dictionary-based entity extraction and other functions
  • Spell check dictionaries, for correcting misspelled terms found in user queries
  • Stop word dictionaries, for preventing search on common terms found in user queries
  • Synonym dictionaries, for associating terms found in user queries with equivalent terms or phrases

Attivio uses these dictionaries to perform linguistic operations on search queries to increase the number of relevant documents retrieved for the user.

Saving, Approving, and Publishing Attivio Dictionaries

IMPORTANT

Attivio Dictionary Management uses a multi-stage approval process to effect dictionary changes. Only the published version of a dictionary is available as a resource in the Store. This system permits data entry, manager approval, and final release of dictionary changes.

Making an Attivio dictionary live (or deleting a dictionary) requires all stages of this process.

  • Save – when you create or change a dictionary and save it, the dictionary Status becomes New or Modified, and the Version becomes Pending.
  • Approve – when you approve the changes to a dictionary, the dictionary Status remains New or Modified, and the Version becomes Approved.
  • Publish – when you publish a dictionary, the Status becomes Published, and the Version becomes Live.

For example, if you have a live (published) dictionary and you want to add one more term to it, you cannot just add that term to the live dictionary. You must add the term to the dictionary, then Save, Approve and Publish the whole dictionary again to see that change in the live dictionary.
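
The three stages can be pictured as a small state machine. The sketch below is illustrative only: the class and method names are hypothetical, it is not Attivio's API, and it assumes that a dictionary is "New" until its first publication and "Modified" afterwards. It mirrors the Status and Version values listed above.

```python
from dataclasses import dataclass


@dataclass
class DictionaryState:
    """Hypothetical model of the Status/Version pair shown in the UI."""
    status: str = "New"
    version: str = "Pending"
    ever_published: bool = False

    def save(self) -> None:
        # Assumption: "New" until first publication, "Modified" afterwards.
        self.status = "Modified" if self.ever_published else "New"
        self.version = "Pending"

    def approve(self) -> None:
        if self.version != "Pending":
            raise ValueError("only a pending version can be approved")
        self.version = "Approved"  # Status stays New or Modified.

    def publish(self) -> None:
        if self.version != "Approved":
            raise ValueError("only an approved version can be published")
        self.status = "Published"
        self.version = "Live"
        self.ever_published = True


d = DictionaryState()
d.save()
d.approve()
d.publish()
print(d.status, d.version)  # Published Live
```

Adding one term to the live dictionary corresponds to calling save(), approve(), and publish() again; skipping a stage raises an error in this sketch.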

Managing Dictionaries

To manage dictionaries, do the following:

  1. Select Manage Dictionaries from the Business Center Admin. The Dictionary Administration screen appears showing any existing dictionaries.
     

    From here you can:

    • Click View All to access individual dictionaries by name.
    • Access dictionaries by type or version, by clicking the desired link in the associated area.
    • Enter a known dictionary name in the Find a dictionary by name field and click Go to find that dictionary.
    • Click Create A New Dictionary to create a new dictionary.
    • Click Import a Dictionary to import a CSV-formatted existing dictionary.
  2. Click View All. The Dictionaries screen appears.
     
    From here, you can:
    • Search for a dictionary by name in the Find a dictionary… field.
    • Click the Create a New Dictionary or Import a Dictionary buttons to perform those actions.
    • Click the Limit Results To drop-down control to filter which dictionaries appear in your list.
    • Click the Sort By drop-down control to order the dictionary list by modified date, published date, or dictionary name.
    • Click on a dictionary's name link to view or edit the general details of that dictionary.
    • Click the Actions drop-down menu button for a dictionary to edit that dictionary's details or terms, or to delete or export the dictionary.

Creating a New Dictionary

To create a new dictionary, do the following:

  1. Click the Create a New Dictionary button. The General Details screen appears.
  2. Enter a name for the dictionary.
  3. Select the desired dictionary Type.
  4. Enter a name in the Group field to help you identify this dictionary from within a family of others. For example, English Australian, English UK, English USA.
    • Specify if you want lookups against this dictionary to be punctuation-sensitive or case-sensitive by selecting the corresponding checkbox. See Punctuation Sensitivity in Queries for additional information on this option.

  5. Click the magnifying glass icon to specify the desired Locale(s). The Locales screen appears.

  6. Search for your desired locales by name or use the page controls to locate them. Check the box for each desired locale, then click Select. The selected locales appear in the Dictionary details. See Locale and Profile for details.
    • For synonym dictionaries, select the Bidirectional option from the Expansion Mode list to make the dictionary bidirectional if desired. For acronym dictionaries, select the Bidirectional option to make the dictionary bidirectional if desired. You can also make acronym dictionaries bidirectional later by editing their details.

      Unidirectional and Bidirectional Dictionaries

      Attivio dictionaries are unidirectional by default, so term expansion occurs in only one direction. For example, suppose the term is USA and the expansion terms are United States and United States of America. If the query is "USA", Attivio modifies the query to include a search for "United States" and "United States of America." However, if the query is "United States," Attivio searches for only that phrase, since there is no reverse expansion linking United States to USA or United States of America.

      Bidirectional dictionaries link aliases back to the root term. In bidirectional dictionaries, any term listed in a given entry in the dictionary expands to include all other listed terms. For example, if the query is "United States," the query expands to search for all three variations: the root term ("USA"), the expansion term the user entered ("United States"), and any other listed term expansions ("United States of America").

  7. Click Save. The dictionary saves.
  8. When you are ready to make the dictionary "live," click Approve and then Publish.
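
The difference between unidirectional and bidirectional expansion described in the steps above can be sketched in a few lines. This is a simplified illustration, not Attivio's query rewriting: expand_query and the entries mapping are hypothetical names, and matching is exact and case-sensitive for brevity.

```python
def expand_query(query, entries, bidirectional=False):
    """Return the phrases to search for, given {root: [expansions]} entries."""
    phrases = [query]
    for root, expansions in entries.items():
        group = [root, *expansions]
        if query == root:
            # One-way expansion: root term -> its expansions.
            phrases += [p for p in expansions if p not in phrases]
        elif bidirectional and query in expansions:
            # Two-way expansion: any listed term -> all other listed terms.
            phrases += [p for p in group if p not in phrases]
    return phrases


entries = {"USA": ["United States", "United States of America"]}
print(expand_query("USA", entries))
# ['USA', 'United States', 'United States of America']
print(expand_query("United States", entries))
# ['United States']
print(expand_query("United States", entries, bidirectional=True))
# ['United States', 'USA', 'United States of America']
```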

Creating a Derived Autocomplete or Entity Dictionary

(Autocomplete dictionaries are available in Attivio Platform 5.6.3 and later releases.)

A "derived" autocomplete or entity dictionary is a special case in which dictionary entries are dynamically derived from the values of a field in your Attivio index's ingested documents. The values are drawn directly from the Attivio index and are stored as a new dictionary in the Store.

Dictionary Entity Extraction

Creating an entity dictionary is only the first half of extracting entities. See Dictionary Entity Extraction for the rest of the story.

When defining a new dictionary, first select "Autocomplete" or "Entity" as the dictionary's Type, then select "Derived" as its Dictionary Mode. New controls will appear on the screen.

  • Available Sources is the list of tables in the Attivio index. In this example we selected the city table from the Factbook demo, meaning that dictionary terms will be derived from ingested documents tagged with table field values of "city".
  • The Entity Types control lets you select the field(s) from which terms will be extracted. In this case, we are extracting terms from country field values found on documents in the city table.
  • The Partial Matches checkbox enables additional matching for terms that partially match a user's query. This matching is primarily useful for autocomplete dictionaries and for entity dictionaries that will be used for autocomplete; it should not be enabled for entity dictionaries used for entity extraction at ingestion. When this setting is enabled, the entity "Sample Entity" is matched by autocomplete if the user enters "Sam" and also if the user enters "Ent". Partial matches start only at word boundaries.

Enabling the Partial Matches option will significantly increase the size of the compiled autocomplete or entity dictionary.

  • The Auto Refresh control allows you to specify the period of time in hours after which Attivio should attempt to refresh (that is, re-derive and re-publish) the dictionary. This is set to "default", meaning 24 hours, for newly-created dictionaries. You can set values down to 0.25 hours (15 minutes). If you do not want a derived-mode dictionary to be refreshed, you can set its period to "never"; this is also the default value for dictionaries which were created in older versions of Attivio. (This control is available in Attivio Platform 5.6.3 and later releases.)
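
The word-boundary rule for Partial Matches can be illustrated with a short sketch. It assumes that words are separated by whitespace and that matching ignores case; the autocomplete function is a hypothetical stand-in and does not reflect how the compiled dictionary is actually stored.

```python
def autocomplete(prefix, entities, partial_matches=False):
    """Return entities matched by a typed prefix (case-insensitive here)."""
    prefix = prefix.lower()
    matches = []
    for entity in entities:
        # Without partial matches, only the start of the whole entity counts.
        starts = [entity.lower()]
        if partial_matches:
            # With partial matches, every later word start is a candidate too.
            words = entity.lower().split()
            starts += [" ".join(words[i:]) for i in range(1, len(words))]
        if any(s.startswith(prefix) for s in starts):
            matches.append(entity)
    return matches


entities = ["Sample Entity"]
print(autocomplete("Sam", entities))                        # ['Sample Entity']
print(autocomplete("Ent", entities))                        # []
print(autocomplete("Ent", entities, partial_matches=True))  # ['Sample Entity']
print(autocomplete("mple", entities, partial_matches=True)) # []
```

In this sketch each additional word start is one more string to index per entity, which gives a rough intuition for why enabling the option increases the size of the compiled dictionary.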

Be sure to save, approve and publish the dictionary.

If you add documents to the entity source table and want to update the dictionary, use the Refresh button. Refreshing re-derives terms from the Attivio index and re-publishes the dictionary with the new term set.

Derived Dictionary Behaviors

Derived dictionaries have some special behaviors that might be puzzling.

  • Even though the dictionary has been published, the Published Terms list will be empty when you view it. Use the search box to find specific terms. (You can search for * to view the whole list.) The displayed terms are read-only.
  • You cannot export terms from a derived dictionary. An export file is created, but it contains only the metadata header.
  • The Entity Types list contains only facetable fields, as defined in the Attivio Schema. That's why schema fields like title don't appear in the list.
  • The displayed terms are prefix-matched terms. For example, a search for term "inter" will display matched terms "international business machines", "intercom", "internal" and so on, but not "business international", "football inter milan", etc.


Punctuation Sensitivity in Queries

Before setting punctuation sensitivity (as described above), it is important to understand how punctuation sensitivity can impact search results. For example, consider an Acronym dictionary formatted as follows:

"#TYPE","ACRONYM"
"#NAME","ACR"
"#GROUP","english"
"#PUNCTUATION_SENSITIVE","true"
"#LOCALE","en"
"U-S-A","University of South Australia^100"
"U.S.A.","United States of America^100"

If your ingested content includes two documents with content "University of South Australia" and "United States of America", query behavior will differ depending on the dictionary's punctuation sensitivity setting.

If punctuation sensitivity is set to true, as shown above, then a query for "U-S-A" with dashes will match "University of South Australia" and not "United States of America", while a query for "U.S.A." with dots will match "United States of America" but not "University of South Australia".

If punctuation sensitivity is set to false, the dashes and dots will be ignored and only the alphanumeric characters in the query term will be considered for acronym matching. As a result, querying for either "U-S-A" with dashes or "U.S.A." with dots will match "University of South Australia" and "United States of America".
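
The two outcomes can be reproduced with a small lookup sketch. It is only a model of the behavior described above, assuming that the insensitive case compares the alphanumeric characters of the query and of each dictionary key; the function names are hypothetical.

```python
ACRONYMS = {
    "U-S-A": "University of South Australia",
    "U.S.A.": "United States of America",
}


def strip_punctuation(term):
    """Keep only alphanumeric characters, as in the insensitive case."""
    return "".join(ch for ch in term if ch.isalnum())


def lookup(query, punctuation_sensitive):
    """Return the expansions whose key matches the query."""
    if punctuation_sensitive:
        return [exp for key, exp in ACRONYMS.items() if key == query]
    wanted = strip_punctuation(query)
    return [exp for key, exp in ACRONYMS.items()
            if strip_punctuation(key) == wanted]


print(lookup("U-S-A", punctuation_sensitive=True))
# ['University of South Australia']
print(lookup("U.S.A.", punctuation_sensitive=False))
# ['University of South Australia', 'United States of America']
```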

Adding Dictionary Terms

Dictionary terms let you make associations between similar words. The nature of the association depends on the dictionary type. For example:

  • Acronym dictionaries for associating abbreviations (such as "USA") with names ("United States of America").
  • Synonym dictionaries for associating one word or phrase with another that shares the same meaning or usage (such as "automobile" and "car").

To add a dictionary term, do the following:

  1. Create a new dictionary or open an existing dictionary for editing, then click the Edit Terms button. The Terms screen for that dictionary appears.

    • To upload terms from a CSV file, click Import Terms, browse to a CSV file in the correct format for the selected dictionary type (see Dictionary Formats, below), and click OK. The CSV file's terms upload to the dictionary, overwriting any existing matching terms and adding new ones. Importing terms does not create duplicates.

    • To create new terms individually, continue from step 2.

    To avoid excessive loading times, the Terms screen only displays a maximum of 5,000 dictionary terms. If your dictionary contains more than 5,000 terms, use the Find Item(s) search field to locate terms.

  2. Click Add New Term. The New Term dialog box for the selected dictionary type appears. Different types of dictionaries have different data-entry dialog boxes. This one is for synonyms:
  3. Enter the new term, and depending on the dictionary type, any desired synonyms, expansions, etc.
    • Complete any optional fields such as Boost.

      Boosting Terms

      When defining synonyms, you may want to set a boost value. Boosting the term's value lets you affect the result order when sorting by result relevancy, so as to give preference to documents containing the term a user actually entered over the term's variations. Documents with the original query term(s) should generally appear before the documents that were obtained using variations. Boost value guidelines:

      • 0 – variations are ignored when relevancy scores are calculated.
      • 50 – documents found using variations count half as much as documents found with the original term.
      • 100 – documents found using the variations count the same as documents found with the original term.
      • Greater than 100 – documents matching the variations count more than documents matching the user-entered term.

      Term boosting is available in both the Simple Query Language and the Advanced Query Language, and is discussed in detail on the Machine Learning Relevancy page.

  4. Click Save. The dictionary saves and the new terms appear on the Pending Terms tab awaiting approval.
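
The boost guidelines in the steps above read as a percentage of the original term's weight. A minimal sketch, assuming integer boosts and the phrase^boost notation used in the CSV samples on this page; the function names are hypothetical and the real relevancy calculation is more involved than a single multiplier.

```python
def parse_expansion(text):
    """Split 'phrase^boost' into (phrase, boost); boost is None if absent."""
    phrase, sep, boost = text.rpartition("^")
    if not sep:
        return text, None
    return phrase, int(boost)


def relative_weight(boost):
    """Weight of a variation relative to the original term (100 = equal)."""
    return boost / 100


print(parse_expansion("United States of America^100"))
# ('United States of America', 100)
print(parse_expansion("Access Control List"))
# ('Access Control List', None)
print(relative_weight(50))   # 0.5
print(relative_weight(150))  # 1.5
```

The sketch returns None when no boost is given rather than guessing a default, because the value Attivio applies in that case is not stated here.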

     

Editing Existing Terms

To edit existing dictionary terms, do the following:

  1. From the Dictionary Administration page, click View All to access the list of existing dictionaries. The Dictionaries screen appears.
  2. Click the Actions button in the row for the desired dictionary, and select Edit Details (to add locales) or Edit Terms to add or delete terms from the selected dictionary. The Terms page appears showing terms on either the Pending Terms, Approved Terms or Published Terms tab depending on the dictionary's state.
  3. Click the Edit Term icon to access the term and its associated components for editing, or click the Revert icon to remove the term from the dictionary.

    Once the term is approved and published, the Revert icon becomes a Delete icon.

Importing, Exporting, and Deleting Dictionaries

You can export dictionaries as CSV files for purposes such as distributing or backing up a dictionary. Exported dictionaries can be imported into Attivio instances at other sites or for other projects.

Note that many of Attivio's default dictionaries are not compatible with the Dictionary Management format, and therefore cannot be imported.

Importing Dictionaries

To import a dictionary file, do the following:

  1. From the Dictionaries screen, click Import a Dictionary. You are prompted to select a valid CSV file.
  2. Click Choose File, and browse to and select the desired CSV dictionary file.
  3. Click Import. The selected file imports.
  4. Click OK to dismiss the "Dictionary imported successfully" message.
  5. Use the Limit Results To: drop-down control to display the newly-imported dictionary: choose "All", or the appropriate "Dictionaries By Type" sub-entry (e.g., "Entity" if you imported an entity dictionary), or the "Pending" sub-entry under "Dictionaries By Version". The imported dictionary should now appear in the list of dictionaries.

Exporting Dictionaries

To export an existing dictionary, do the following:

  • From the Dictionaries screen, click the Actions button in the row for the desired dictionary, and select Export. The selected dictionary exports as a CSV file to the default download location specified in your browser.

Deleting Dictionaries

Before deleting a dictionary, consider exporting it to a CSV file in case you need it again later. Once you mark a dictionary for deletion there is no way to change your mind and back out. Export it first.

A dictionary deletion must be approved and published, just like any other dictionary change. Until the publication step, the dictionary is still live.

To delete a dictionary, do the following:

  1. From the Dictionaries screen, click the Actions button in the row for the desired dictionary, and select Delete. You are prompted to confirm the deletion.

  2. Click Confirm. The dictionary is marked for deletion, and the row containing the dictionary is shaded.
  3. Click Approve in the row containing the dictionary.
  4. Click Publish in the row containing the dictionary. The dictionary is permanently deleted.

Dictionary Formats

Dictionary CSV Metadata

The following section refers to the format of exported dictionaries. Dictionaries for import must conform to these CSV standards.

When importing complete dictionaries, all dictionaries require metadata in the following pattern.

"#TYPE","<TYPE>"
"#NAME","<NAME>"
"#GROUP","<GROUP_NAME>"
"#LOCALE","<LOCALE_LANGUAGE_TAG>"

For example:

Acronym dictionary metadata example
"#TYPE","ACRONYM"
"#NAME","NetworkAcronymDictionary"
"#GROUP","english"
"#LOCALE","en"

The best practice is to enclose all strings in quotes.
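
One way to follow the quoting practice when generating a file programmatically is to let a CSV library quote every field. The sketch below uses Python's standard csv module; dictionary_csv is a hypothetical helper, and the "\n" line terminator is an assumption, since the line endings accepted by the import service are not stated here.

```python
import csv
import io


def dictionary_csv(dict_type, name, group, locale, rows):
    """Build dictionary CSV text with every field enclosed in quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["#TYPE", dict_type])
    writer.writerow(["#NAME", name])
    writer.writerow(["#GROUP", group])
    writer.writerow(["#LOCALE", locale])
    writer.writerows(rows)
    return buffer.getvalue()


text = dictionary_csv("ACRONYM", "NetworkAcronymDictionary", "english", "en",
                      [["ACL", "Access Control List"]])
print(text)
```

To save the result, write the text to a file opened with encoding="utf-8" and newline="" so that no extra characters are introduced.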

Some dictionary types require extra settings in the metadata. For example, a derived-mode spell check dictionary includes additional settings:

Spell check dictionary metadata example
"#TYPE","SPELLCHECK"
"#NAME","NewsSpelling-BlackList"
"#GROUP","english"
"#DICT_MODE","DERIVED"
"#ALLOW_NUMERIC_TERMS","true"
"#MAX_TERM_LEN","20"
"#MIN_TERM_LEN","10"
"#MIN_TERM_FREQ","1"
"#LOCALE","en"

Warning

Opening and saving dictionary CSV files in Excel can corrupt their format, as Excel may add extra commas to lines which are not accepted by the dictionary import service. It is better to edit the CSV files in a UTF-8 text editor.

Dictionary Term Formats

The following section describes the term formats for the available dictionary types and provides examples of each dictionary type in CSV format.

Acronyms Dictionary Term Format

TERM,ACRONYM[^BOOST][,ACRONYM[^BOOST]]

At least one acronym expansion for the root term is required.

Sample Dictionary:

"#TYPE","ACRONYM"
"#NAME","myAcronyms"
"#GROUP","English"
"#EXPANSION","BIDIRECTIONAL"
"#LOCALE","en"
"ACE","Access Control Entry^100"
"ACL","Access Control List"
"CA","certification authority"

Autocomplete Dictionary Term Format

(Autocomplete dictionaries are available in Attivio Platform 5.6.3 and later releases.)

TERM[,PAYLOAD]

Sample Dictionary:

"#TYPE","AUTOCOMPLETE"
"#NAME","MyAutocompleteDictionary"
"#GROUP","MyGroup"
"#CASE_SENSITIVE","false"
"#PUNCTUATION_SENSITIVE","false"
"#PARTIAL_MATCHES","false"
"#LOCALE","en"
"aardvark","aardvark payload"
"anteater"

Entity Dictionary Term Format

TERM[,LABEL]

Sample Dictionary:

#TYPE,ENTITY
#NAME,location
#GROUP,English
#LOCALE,en
#CASE_SENSITIVE,true
#PUNCTUATION_SENSITIVE,true
"Santa","St. Nicholas"
"Santy","St. Nicholas"
"Santa Claus","St. Nicholas"
"Saint Nick","St. Nicholas"
"Father Winter","St. Nicholas"
"Saint Nicholas","St. Nicholas"

Entity dictionaries support optional entity normalization. This feature lets Attivio recognize an entity from one string, and then label the entity using a second string. The sample dictionary entries above show how to extract references to St. Nicholas in many of his guises.

Dictionary Entity Extraction

Creating an entity dictionary is only the first half of extracting entities. See Dictionary Entity Extraction for the rest of the story.

Spell Check Dictionary Term Format

TERM

Sample Dictionary:

"#TYPE","SPELLCHECK"
"#NAME","Spell_Custom"
"#GROUP","english"
"#LOCALE","en"
"Illustration"
"Conundrum"
"Corroborate"

Stopword Dictionary Term Format

TERM[,BOOST]

Sample Dictionary:

"#TYPE","STOPWORD"
"#NAME","MyStopWords"
"#GROUP","MyGroup"
"#LOCALE","en"
"before"
"of","50"
"off","100"

Synonyms Dictionary Term Format

[EXPANSION_MODE,]TERM,EXPANSION[^BOOST][,EXPANSION[^BOOST]]

The expansion mode field is omitted for DEFAULT expansion mode.

At least one synonym expansion for the base term is required.

Expansion Modes:

  • DEFAULT – use default specified on dictionary (or EXPAND if dictionary also specifies DEFAULT)
  • EXPAND – one way expansion
  • BIDIRECTIONAL – two way expansion
  • REWRITE – replaces the term with its expansions
  • SUGGEST – offers the expansions as suggestions

Expansion Mode Behavior:

The table below shows the outcome for each explicit expansion mode when querying with a synonym dictionary containing term "automobile" and expansion "car":

CSV Entity | Expansion Mode | Results
&expand; | EXPAND | Querying for "automobile" matches "automobile" and "car" results.
&bidir; | BIDIRECTIONAL | Querying for "automobile" or for "car" matches "automobile" and "car" results.
&rewrite; | REWRITE | Querying for "automobile" matches "car" results (and not "automobile" results).
&suggest; | SUGGEST | Querying for "automobile" matches "automobile" results, offers user a "Did you mean" suggestion pointing to "car" results.

Sample Dictionary:

"#TYPE","SYNONYM"
"#NAME","mySynonyms"
"#GROUP","English"
"#EXPANSION","EXPAND"
"#LOCALE","en"
"HDFS","Hadoop Distributed File System^100"
"Attivio","Attivio, Inc."
"&suggest;","globe","circle^100"
"&bidir;","happy","glad^100"
"&rewrite;","principal","vital^100","important^100"

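The synonym sample above can be read back with a few lines of code, which also makes the row layout explicit: metadata rows start with "#", an optional first column carries the expansion-mode entity, and each expansion may end in ^BOOST. This is a hedged sketch using Python's csv module; parse_synonym_csv and the result structure are hypothetical and this is not the import service's own parser.

```python
import csv
import io

# Entity prefixes as they appear in the first CSV column.
MODE_ENTITIES = {
    "&expand;": "EXPAND",
    "&bidir;": "BIDIRECTIONAL",
    "&rewrite;": "REWRITE",
    "&suggest;": "SUGGEST",
}

SAMPLE = '''\
"#TYPE","SYNONYM"
"#NAME","mySynonyms"
"#GROUP","English"
"#EXPANSION","EXPAND"
"#LOCALE","en"
"HDFS","Hadoop Distributed File System^100"
"Attivio","Attivio, Inc."
"&suggest;","globe","circle^100"
"&bidir;","happy","glad^100"
"&rewrite;","principal","vital^100","important^100"
'''


def parse_synonym_csv(text):
    """Return (metadata, terms) from synonym dictionary CSV text."""
    metadata, terms = {}, []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        if row[0].startswith("#"):
            metadata[row[0][1:]] = row[1] if len(row) > 1 else ""
            continue
        mode = "DEFAULT"  # no entity column: use the dictionary's default
        if row[0] in MODE_ENTITIES:
            mode, row = MODE_ENTITIES[row[0]], row[1:]
        term, expansions = row[0], []
        for item in row[1:]:
            phrase, sep, boost = item.rpartition("^")
            expansions.append((phrase, int(boost)) if sep else (item, None))
        terms.append({"mode": mode, "term": term, "expansions": expansions})
    return metadata, terms


metadata, terms = parse_synonym_csv(SAMPLE)
print(metadata["EXPANSION"])  # EXPAND
print(terms[4])
```

Because the fields are quoted, the comma inside "Attivio, Inc." stays within a single expansion.
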
Best Practice Guidelines

Acronym Dictionaries

Acronym dictionary terms make associations between abbreviations and their expanded words or phrases. For example, CIA is an abbreviation for Central Intelligence Agency and IBM is an abbreviation for International Business Machines. If a user enters the abbreviation or the full name, it is often desirable to find documents that use either variation. For that reason, many acronym dictionaries are bidirectional rather than unidirectional.

When creating terms, decide which form of each entry serves as the root term. For example, you might create a dictionary where the FULL TERM field is always an abbreviation (such as USA). In that case, the values you specify in the Expansion field would be the phrases you want to associate with the abbreviation (such as United States and United States of America). In another dictionary, you could specify the root phrase in the Acronym field (such as United States of America) and list all of its abbreviations as variations (USA, U.S.A., U.S., USofA, etc.). Whichever format you select, the important thing is consistency for the sake of long-term maintenance, tuning, and testing.

Main article: Querying of Unstructured Data

Synonym Dictionaries

Synonym dictionary terms make associations between words of similar meanings or usage. For example, car and automobile are terms that can reference the same thing. With synonyms, you may want to define terms as unidirectional rather than bidirectional. For example, a search for the word "car" would benefit if you also include terms such as sedan, coupe, mini-van, station-wagon, convertible, etc. However, if a user searches for convertible, then it is better to only return results of that car type. While synonym dictionaries are more frequently unidirectional, they can also be bidirectional. This frequently occurs when working with proper names. Remember that bidirectional dictionaries work in both directions. If all terms are truly equal to each other, set the dictionary as bidirectional. If some terms are general while others are specific, unidirectional may be a better choice.

Main article: Querying of Unstructured Data

Dictionary Locales and Profile Language/Country Settings

Attivio analyzes a query's profile to determine which dictionaries to apply to a given query. Only dictionaries with locales matching the search profile's language/country will be applied. For example, if a query specifies a search profile which has its language/country set to "English", then only dictionaries with the "English" dictionary locale assigned will be applied on that query.

Dictionaries can have multiple locales assigned, but each search profile can only have a single language/country setting.

Some sample dictionaries might include:

Dictionary Name | Locale | Profile Name | Applies to
attivio_en | en | "Default" | Queries with an empty profile and empty locale (assuming an English system)
attivio_jp | jp | "Default" | Queries with an empty profile and locale set to "jp"
eng_en | en | "Engineering" | Queries with profile "Engineering" and locale set to "en" or left empty (assuming an English system)
eng_jp | jp | "Engineering" | Queries with profile "Engineering" and locale set to "jp"
sales_en | en | "Sales" | Queries with profile "Sales" and locale set to "en" or left empty (assuming an English system)
sales_jp | jp | "Sales" | Queries with profile "Sales" and locale set to "jp"

If a query does not specify a locale, Attivio uses the system locale.
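
The selection rules above can be sketched as a simple filter. This is a simplified model, assuming an English system locale and treating the profile as a property of each dictionary for brevity; DictionaryInfo and applicable are hypothetical names, not part of Attivio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryInfo:
    name: str
    locales: frozenset  # a dictionary may have several locales
    profile: str


def applicable(dictionaries, profile="", locale="", system_locale="en"):
    """Names of dictionaries that apply to a query's profile and locale."""
    profile = profile or "Default"    # empty profile -> "Default"
    locale = locale or system_locale  # empty locale -> system locale
    return [d.name for d in dictionaries
            if d.profile == profile and locale in d.locales]


DICTS = [
    DictionaryInfo("attivio_en", frozenset({"en"}), "Default"),
    DictionaryInfo("attivio_jp", frozenset({"jp"}), "Default"),
    DictionaryInfo("eng_en", frozenset({"en"}), "Engineering"),
    DictionaryInfo("eng_jp", frozenset({"jp"}), "Engineering"),
    DictionaryInfo("sales_en", frozenset({"en"}), "Sales"),
    DictionaryInfo("sales_jp", frozenset({"jp"}), "Sales"),
]

print(applicable(DICTS))                       # ['attivio_en']
print(applicable(DICTS, "Engineering", "jp"))  # ['eng_jp']
print(applicable(DICTS, "Sales"))              # ['sales_en']
```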

To use a particular type of dictionary, you must enable the corresponding query property. For example, the Synonym setting for the query must be ON for the synonym dictionary that matches a query's locale and profile to be applied to the query. See Querying of Unstructured Data for more details.


Configuring Spell Checking in Attivio

IMPORTANT

In past releases, Attivio had an XML-configured spell check feature and a derived spell check dictionary based on the entire index. This is no longer the case.

Spell checking changed in the following ways starting in AIE 4.x:

  • Attivio provides spell checking and correction at query time, with multiple modes to ensure that the user experience is as efficient as possible.
  • Attivio spell checking is now configured through Dictionary Management.
  • Attivio does not contain a default spell check dictionary. To enable spell checking, you must create a spell check dictionary in Dictionary Management.
  • Spell check dictionaries update/refresh when you publish a dictionary through Dictionary Management or the Business Center.

Spell checking is an important part of any effective full-text search system, as users who misspell a query term are easily frustrated when the query returns poor results.

Attivio provides the following spell checking and correction features:

  • Corpus-specific spelling dictionary compiled from the Attivio index when the dictionary is published.
  • Spelling suggestions for misspelled query terms, providing multiple suggestions ordered by likelihood.
  • Automatic correction of misspelled query terms, using:
    • Only the most likely suggestion for each misspelled term.
    • All suggestions (including the original terms) up to some configurable number.
  • Automatic correction of misspelled query terms when a query results in zero hits, using either or both correction modes.


Interaction with Text Analytics

Spelling correction, when enabled, occurs after stopword extraction but before other natural-language transformations in the default query workflow. Spelling correction can sometimes change a search term in a way that triggers unexpected synonym expansion, etc. It is best to turn spelling correction off while testing natural-language features.

Interaction with Wildcards

Spelling correction is automatically disabled when the query contains a wildcard.

 

Creating a Spell Check Dictionary

To create a spell check dictionary, do the following:

  1. From the Dictionary Management screen, click New Dictionary. The New Dictionary screen appears.
  2. Select Spell Check from the Type list. The spell check options appear on the screen.
  3. Enter a name for the dictionary.
  4. Specify the desired dictionary Type and Group.
  5. Click the magnifying glass icon to specify the desired Locale(s). The Locales screen appears.
     

  6. Search for your desired locales by name or use the page controls to locate them. Check the box for each desired locale, then click Select. The selected locales appear in the Dictionary details. See Locale and Profile for details. Note that the "locale" setting on a spell check dictionary does not limit dictionary terms to this locale but rather says: "use this spell check dictionary for this locale".
  7. To specify your spell check data sources, select the desired data sources from the Available Sources list and use the >> and << buttons to move them to the Selected Sources list for use in this dictionary. Note:
    • If you select data sources, the dictionary is created on those sources in the index.
    • If you do not select data sources, the dictionary is created on the entire index. If your sources contain multiple languages, terms for all languages will be included in the spell check dictionary, which may or may not be desirable.
  8. Select one of the two Dictionary Modes:
    • Derived – includes terms extracted from one or more data sources in the index. Configured dictionary terms are used as a blacklist – i.e., user-configured terms will never be offered as spell check corrections or suggestions. You cannot edit derived-mode dictionaries and their extracted terms are hidden and cannot be exported.
    • Custom – includes only user-configured dictionary terms, which are used as a whitelist – i.e., only user-configured terms will be offered as spell check corrections or suggestions.
  9. Specify the remaining Spell Check Options for derived dictionaries as follows:
    • Maximum Term Length – the maximum length for terms added to spell check dictionary. Default is 0.
    • Minimum Term Length – the minimum length for terms added to spell check dictionary. Default is 0.
    • Minimum Term Frequency – the minimum frequency allowed for terms. Default is 0.
    • Allow Numeric Terms – if enabled, terms containing numbers are allowed. Default is disabled.

      These options are only available for derived-mode dictionaries (i.e., not available for custom-mode dictionaries).

  10. Click Save to save your dictionary.
  11. Click Approve Dictionary.
  12. Click Publish Dictionary to make your spell check dictionary active.
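
The derived-mode options in the steps above act as filters on the terms extracted from the index. The following sketch shows one plausible reading of them. It is not Attivio's implementation: derive_terms is a hypothetical function, a maximum length of 0 is assumed to mean "no maximum", and the blacklist comparison is assumed to ignore case.

```python
def derive_terms(term_freqs, blacklist=(), max_len=0, min_len=0,
                 min_freq=0, allow_numeric=False):
    """Filter {term: frequency} the way the derived-mode options describe."""
    blocked = {t.lower() for t in blacklist}
    kept = {}
    for term, freq in term_freqs.items():
        if term.lower() in blocked:
            continue  # configured terms act as a blacklist in derived mode
        if len(term) < min_len or freq < min_freq:
            continue
        if max_len and len(term) > max_len:
            continue  # assumption: 0 means "no maximum"
        if not allow_numeric and any(ch.isdigit() for ch in term):
            continue
        kept[term] = freq
    return kept


freqs = {"search": 12, "serach": 1, "x86": 5, "a": 40, "teh": 3}
print(derive_terms(freqs, blacklist=["teh"], max_len=255,
                   min_len=2, min_freq=2))
# {'search': 12}
```

In the example, "serach" is dropped by the minimum frequency, "x86" by the numeric-term rule, "a" by the minimum length, and "teh" by the blacklist.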

To configure a basic, English spell checking dictionary, use the following values:

    • Dictionary Name: SpellCheckDictionary
    • Type: Spell Check
    • Group: SpellCheck
    • Locales: English
    • Dictionary Mode: Derived
    • Maximum Term Length: 255
    • Minimum Term Length: 1
    • Minimum Term Frequency: 1
To test, add this spell check dictionary to a profile in Attivio Business Center, then specify that profile on a test query containing misspelled search terms. See the Manage Search with The Attivio Business Center documentation page for more details on creating, editing, and testing profiles.

 

Changing Spell Check Dictionaries

Spell check dictionaries only change on publication. To incorporate new frequencies into the spell check dictionary, you must re-publish the spell check dictionary. When you publish a spell check dictionary the following process occurs:

  • Modified dictionary terms/config are put into the published state
  • The configured dictionary is compiled and published to Attivio Store
    • This is the whitelist/blacklist
  • Spell check extractors extract term frequencies and apply to configured terms
  • If multiple partitions for index exist, spell check service merges the output from all extractors
  • Final compiled spell check dictionary is published to Store
  • Query transformers are notified of new spell check dictionary

Per-Query Parameters

Each query request has three parameters that affect spell checking: the spell check mode, expansion size, and the resubmit count.

Spell check mode: Specifies the spell check processing applied before the query is executed, and affects how the query is processed if it is resubmitted after returning no results. The mode is specified in the search profile. Its value is one of:

  • OFF - no spell checking. This is the default.
  • SUGGEST - returns in the query response a list of possible corrections for each word that does not appear in the spell-checking dictionary, in order of likelihood.
  • AUTO_CORRECT - replaces each word in the query that does not appear in the spell-checking dictionary with the most likely correction.
  • AUTO_EXPAND - ORs each query word that does not appear in the spell-checking dictionary with its most likely corrections.

Expansion size: Specifies the maximum number of suggested spelling corrections that can be used in AUTO_EXPAND mode. This can be specified in the search profile by clicking the "Advanced Options" button, selecting the "Query Parameters" tab, and entering the "l.spellexpandsize" parameter.
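
To make the difference between the modes concrete, here is a minimal sketch of how a single query term might be rewritten under each mode. It is not Attivio code: the class, the `rewrite` method, and the `OR(...)` output string are assumptions chosen for illustration, and the list of corrections is supplied by the caller rather than looked up in a real dictionary.

```java
import java.util.List;

/** Hypothetical illustration of the four spell check modes. */
final class SpellCheckModeSketch {

  enum Mode { OFF, SUGGEST, AUTO_CORRECT, AUTO_EXPAND }

  /**
   * Returns the text that would be searched for one term.
   * corrections is ordered most likely first and is empty when the term
   * already appears in the spell-checking dictionary.
   */
  static String rewrite(String term, List<String> corrections, Mode mode, int expandSize) {
    if (corrections.isEmpty()) {
      return term; // known word: nothing to correct
    }
    switch (mode) {
      case AUTO_CORRECT:
        return corrections.get(0); // replace with the most likely correction
      case AUTO_EXPAND: {
        // Use at most expandSize corrections, as the expansion size limits.
        int n = Math.max(0, Math.min(expandSize, corrections.size()));
        if (n == 0) {
          return term;
        }
        return "OR(" + term + ", " + String.join(", ", corrections.subList(0, n)) + ")";
      }
      default:
        // OFF does nothing; SUGGEST only reports corrections in the response.
        return term;
    }
  }
}
```

With the corrections `markham` and `mark` for the term `markhm`, AUTO_CORRECT would search for `markham`, while AUTO_EXPAND with an expansion size of 1 would search for the original term ORed with `markham`.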

Max resubmits: The maxResubmit parameter specifies how many times the query can be resubmitted for re-execution. Setting it to zero prevents resubmission of queries that return no results after spell correction. This can be specified in the search profile by clicking the "Advanced Options" button, selecting the "Query Parameters" tab, and entering the "q.maxResubmits" parameter.

Configuring Spell Check Resubmission

The spell check resubmission feature operates on queries which return no results: it re-submits such queries to the engine, using progressively more permissive spelling settings, in an effort to find spell check matches.
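
The control flow of resubmission can be sketched as a bounded retry loop. This is an illustration under stated assumptions, not the feature's actual implementation: this page does not say which spelling settings are tried or in what order, so the escalation list is passed in by the caller, and the `engine` function stands in for executing a query and returning its hit count.

```java
import java.util.List;
import java.util.function.ToIntBiFunction;

/** Hypothetical illustration of zero-result resubmission. */
final class SpellCheckResubmitSketch {

  enum Mode { OFF, SUGGEST, AUTO_CORRECT, AUTO_EXPAND }

  /**
   * Runs the query once with the initial mode. While it returns no results,
   * retries with the next, more permissive mode from escalation, up to
   * maxResubmits times. Returns the hit count of the last attempt.
   */
  static int searchWithResubmit(String query, Mode initial, List<Mode> escalation,
                                int maxResubmits, ToIntBiFunction<String, Mode> engine) {
    int hits = engine.applyAsInt(query, initial);
    for (int i = 0; hits == 0 && i < maxResubmits && i < escalation.size(); i++) {
      hits = engine.applyAsInt(query, escalation.get(i));
    }
    return hits;
  }
}
```

In this sketch a maxResubmits value of zero means the query runs exactly once, which matches the behavior described for that parameter.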

This feature is disabled by default. To enable it, create a <PROJECT_DIR>/conf/features/core/SpellCheckResubmit.xml configuration file with this content:

<PROJECT_DIR>/conf/features/core/SpellCheckResubmit.xml
<ff:features xmlns:ff="http://www.attivio.com/configuration/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:fbase="http://www.attivio.com/configuration/features/base" xmlns:f="http://www.attivio.com/configuration/features/core" xsi:schemaLocation="http://www.attivio.com/configuration/config http://www.attivio.com/configuration/config.xsd http://www.attivio.com/configuration/features/base http://www.attivio.com/configuration/features/baseFeatures.xsd http://www.attivio.com/configuration/features/core http://www.attivio.com/configuration/features/coreFeatures.xsd">
  <f:spellCheckResubmit enabled="true" index="index" spellCheckWorkflow="querySpellCheck">
    <f:where position="last" skip-if-exists="false" workflow="search"/>
  </f:spellCheckResubmit>
</ff:features>

The feature's properties are described below.

Property | Default Value | Description
index | index | The name of the index feature for which you are configuring the spell check resubmission.
spellCheckWorkflow | querySpellCheck | The name of the workflow containing the spell check query transformer.
skip-if-exists | false | If the named component already exists at the specified point in the destination workflow, silently do nothing. This feature prevents double-insertion of components that are independently created by multiple modules.
workflow | search | The workflow into which you are inserting the spell check resubmission.
position | last | The location within the workflow for the spell check resubmission. (first, last, before, after)
relativeComponent | not set | The relative component to use when the resubmit position is after or before.

To verify that the feature is operating:

  1. Create a spell check dictionary.
  2. Create a search profile with spelling mode set to Off and set it to use the new spell check dictionary.
  3. Issue a query for a misspelled search term, with the new search profile enabled and maxResubmit set to 1 or more. You should see query feedback messages named spelling (indicating what resubmission action was taken) and spellcheck.dictionary (indicating which dictionary was used); if Attivio found a spell check match, you will also see one or more suggestion, expansion, or correction messages.

Configuring Spell Check Schema Fields

By default, Business Center only compiles spell check dictionaries for the title and content schema fields. This means that spell check will return suggestions for queries like title:markhm (which is restricted to the title field) or content:markhm (explicitly restricted to the content field) or markhm (implicitly restricted to the content field because it is the schema's default field), but will not return suggestions for queries restricted to other fields, such as author:markhm.

If you want Business Center to include other schema fields when compiling spell check dictionaries, add a new biz:spellCheckField entry for each such field in the <PROJECT_DIR>/conf/features/businesscenter/BusinessCenter.xml file:

<PROJECT_DIR>/conf/features/businesscenter/BusinessCenter.xml
<ff:features xmlns:ff="http://www.attivio.com/configuration/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:biz="http://www.attivio.com/configuration/features/businesscenter" xsi:schemaLocation="http://www.attivio.com/configuration/config http://www.attivio.com/configuration/config.xsd http://www.attivio.com/configuration/features/businesscenter http://www.attivio.com/configuration/features/businesscenterFeatures.xsd">
  <biz:businessCenter component="applyBusinessCenter" enabled="true" index="abc-index" nodeset="searchers" searchWorkflow="search" service="businessCenterService">
    <biz:where position="after" relative-component="queryParser" skip-if-exists="false" workflow="queryInit"/>
    <biz:spellCheckField>title</biz:spellCheckField>
    <biz:spellCheckField>content</biz:spellCheckField>
    <!-- User-added entry to enable spellcheck on "author" field-restricted queries -->
    <biz:spellCheckField>author</biz:spellCheckField>
  </biz:businessCenter>
</ff:features>

Re-deploy the Attivio project configuration with this change. Any new spell check dictionaries created after this deployment should return suggestions for author:markhm and other author field-restricted queries. To update existing spell check dictionaries after making this change, you must re-save, re-approve, and re-publish them following the steps given in the previous sections.
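
The field rule described in this section can be summarized in a few lines of code. The sketch below is only a model of the documented behavior, using a hypothetical helper that treats a query term as either `field:term` or a bare term that falls back to the schema's default field. It does not parse real Attivio query syntax.

```java
import java.util.Set;

/** Hypothetical model of which query terms receive spell check suggestions. */
final class SpellCheckFieldSketch {

  /**
   * Returns true when the field targeted by the query term is one of the
   * fields for which spell check dictionaries are compiled.
   */
  static boolean isSpellChecked(String queryTerm, Set<String> spellCheckFields,
                                String defaultField) {
    int colon = queryTerm.indexOf(':');
    // A bare term such as "markhm" implicitly searches the default field.
    String field = colon < 0 ? defaultField : queryTerm.substring(0, colon);
    return spellCheckFields.contains(field);
  }
}
```

With the default fields title and content, `author:markhm` is not spell checked. After author is added as a spell check field and the dictionaries are re-published, it is.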

Query Stopword Removal

Query stopword removal is provided by the queryStopwords component located in the search > defaultQuery > queryAttivioLinguistics query workflow. This component will remove stopwords for query terms searching the title field or the content field (the default search field).

By default, this component is configured to use terms configured in a stop word dictionary named stopwords. In order to enable stopword removal, you must create and publish a stop word dictionary named stopwords.

Query Synonym Expansion

Query synonym expansion is provided by the querySynonymizer component located in the search > defaultQuery > queryAttivioLinguistics query workflow. This component will expand synonyms for query terms searching the title field or the content field (the default search field).

By default, this component is configured to use terms configured in a synonym dictionary named synonyms. In order to enable synonym expansion, you must create and publish a synonym dictionary named synonyms.

Query Acronym Expansion

Query acronym expansion is provided by the queryAcronymExpander component located in the search > defaultQuery > queryAttivioLinguistics query workflow. This component will expand acronyms for query terms searching the title field or the content field (the default search field).

By default, this component is configured to use terms configured in an acronym dictionary named acronyms. In order to enable acronym expansion, you must create and publish an acronym dictionary named acronyms.

Dictionary URI

Once a dictionary is published, it is available in the Attivio Store as a unique resource. The identifier for the dictionary uses the following format:

acs://<CONTENT_STORE_NAME>/dictionaries/<DICTIONARY_TYPE>/<DICTIONARY_NAME>/<DICTIONARY_GROUP>

For example, if we have the following attributes in the system…

  • Store Name: contentStore
  • Dictionary Type: AUTOCOMPLETE
  • Dictionary Name: myAutocomplete
  • Dictionary Group: autocompleteDictionaries

…then, once published, this dictionary will be available via the following URI:

acs://contentStore/dictionaries/AUTOCOMPLETE/myAutocomplete/autocompleteDictionaries
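
If you need to assemble such a URI in code, a minimal sketch using only the standard java.net.URI class might look like the following. The helper class and method names are hypothetical. The sketch only builds the identifier and makes no assumption about how the acs scheme is resolved at runtime.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper that assembles a dictionary URI from its parts. */
final class DictionaryUriSketch {

  /** Builds acs://<store>/dictionaries/<type>/<name>/<group>. */
  static URI dictionaryUri(String store, String type, String name, String group) {
    String path = "/dictionaries/" + type + "/" + name + "/" + group;
    try {
      // scheme, authority, path, query, fragment
      return new URI("acs", store, path, null, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Invalid dictionary URI parts", e);
    }
  }
}
```

For the attributes listed above, `dictionaryUri("contentStore", "AUTOCOMPLETE", "myAutocomplete", "autocompleteDictionaries")` yields the URI shown.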

This dictionary URI provides an Input Stream which you can read in the following manner:

// try-with-resources closes the stream and the reader when done.
try (InputStream in = dictionaryUri.toURL().openStream();
     CSVReader csv = new CSVReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
  // Read through the terms, one CSV row at a time.
  for (String[] row = csv.readNext(); row != null; row = csv.readNext()) {
    ...
  }
}