Overview
Many languages change the form of words systematically to mark grammatical features. Changes that mark number (singular/plural), gender, or verb tense or mood are known in linguistics as inflection. For example, the terms book, booked, and booking are inflectionally related in English.
Legacy search engines sometimes use a process called stemming, which removes suffixes from words using a set of language-specific rules. This process indexes booked and booking as the simple stem, book. All documents that use inflected versions of book can then be found in a single search for "book".
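The idea can be illustrated with a minimal sketch. This is not the rule set AIE or any particular engine uses; the suffix list and the names toy_stem and build_index are hypothetical, chosen only to show how inflected forms collapse onto a single index key.

```python
# Toy suffix-stripping stemmer (illustrative only; NOT AIE's rule set).
SUFFIXES = ("ing", "ed", "s")  # assumed rules, tried in this order


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


def build_index(docs: dict) -> dict:
    """Map each stem to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(toy_stem(token), set()).add(doc_id)
    return index


docs = {1: "booked a flight", 2: "booking hotels", 3: "read a book"}
index = build_index(docs)

# The query term is stemmed the same way as the indexed text.
print(sorted(index[toy_stem("book")]))  # [1, 2, 3]
```

Real stemmers apply far more careful rules than this; the sketch only shows why a single search for "book" reaches all three documents.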
Stemming is not recommended in AIE!
AIE has lemmatization enabled for all supported languages. Stemming is incompatible with lemmatization, and is disabled by default.
Enabling Stemming
Stemming is not available for all languages. See the individual language pages for more information.
To enable stemming for core AIE English linguistics, modify the EnglishTokenizer to perform stemming instead of lemmatization. Edit the <project-dir>/conf/features/core/TokenizerModel.english.xml file, adding an f:property element with name "lemmas" and value "stem":
<?xml version="1.0" encoding="UTF-8"?>
<ff:features xmlns:ff="http://www.attivio.com/configuration/config"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns:fbase="http://www.attivio.com/configuration/features/base"
             xmlns:f="http://www.attivio.com/configuration/features/core"
             xsi:schemaLocation="http://www.attivio.com/configuration/config http://www.attivio.com/configuration/config.xsd http://www.attivio.com/configuration/features/base http://www.attivio.com/configuration/features/baseFeatures.xsd http://www.attivio.com/configuration/features/core http://www.attivio.com/configuration/features/coreFeatures.xsd">
  <f:tokenizer class="com.attivio.platform.tokenizer.EnglishTokenizer" enabled="true" fallbackLocale="en" name="english">
    <f:property name="lemmas" value="stem"/>
  </f:tokenizer>
</ff:features>
Stemming modifies incoming text, so you'll have to re-index any documents that have been ingested up to this point. A mix of stemmed and unstemmed documents in an index produces incomplete search results.
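A minimal sketch of why a mixed index returns incomplete results is shown here. It assumes that, once stemming is enabled, query terms are stemmed the same way as newly ingested text; the toy_stem function and the two-document index are hypothetical and do not reflect AIE internals.

```python
def toy_stem(word: str) -> str:
    """Hypothetical stemmer: strips "ing" or "ed" when at least 3 characters remain."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Index built in two phases, both documents containing the word "booking".
index = {}
index.setdefault("booking", set()).add(1)            # doc 1: ingested before stemming was enabled
index.setdefault(toy_stem("booking"), set()).add(2)  # doc 2: ingested after, stored as "book"

# Assumption: with stemming on, the query term is stemmed too.
query_term = toy_stem("booking")
hits = index.get(query_term, set())
print(sorted(hits))  # [2]
```

Document 1 contains the same word but is never matched, because its index entry was written under the unstemmed form. Re-indexing rewrites every entry under the same rule.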