Linguistic analysis tools operate primarily at the tokenization level, preparing unstructured text to support richer, more accurate, and more convenient search features by:

  • Identifying the language(s) used in the document, to select the best approach to tokenization.
  • Tokenizing the incoming text by splitting it into individual words and sentences.
  • Adding tokens, through synonym expansion and lemmatization, to improve the accuracy and recall of keyword-matching searches.
  • Making it possible to accurately highlight matching phrases in search results.
  • Preparing the text for subsequent metadata extraction and facet creation through advanced text analytics features. 
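
As a rough illustration of how the tokenization, token-expansion, and highlighting steps above fit together, here is a minimal Python sketch. It is not the product's implementation: the `Token` class, the `tokenize`, `expand`, and `highlight` functions, and the tiny `LEMMAS` and `SYNONYMS` dictionaries are all hypothetical names invented for this example. Language identification and sentence breaking are left out, and the text is assumed to be English.

```python
import re
from dataclasses import dataclass

# Hypothetical toy resources (assumptions for this sketch only);
# a real system would load much larger, per-language dictionaries.
LEMMAS = {"ran": "run", "running": "run", "cars": "car"}
SYNONYMS = {"car": ["automobile"]}


@dataclass(frozen=True)
class Token:
    text: str   # indexed term: lowercased surface form, lemma, or synonym
    start: int  # character offset of the surface word in the original text
    end: int    # offset one past the last character of the surface word


def tokenize(text):
    """Split text into lowercased word tokens, keeping character offsets."""
    return [Token(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def expand(tokens):
    """Add lemma and synonym tokens at the same offsets as the surface word."""
    out = []
    for tok in tokens:
        out.append(tok)
        lemma = LEMMAS.get(tok.text, tok.text)
        if lemma != tok.text:
            out.append(Token(lemma, tok.start, tok.end))
        for syn in SYNONYMS.get(lemma, []):
            out.append(Token(syn, tok.start, tok.end))
    return out


def highlight(text, tokens, query):
    """Wrap each original span whose token matches the query in [brackets]."""
    spans = sorted({(t.start, t.end) for t in tokens if t.text == query.lower()})
    parts, pos = [], 0
    for start, end in spans:
        parts.append(text[pos:start])
        parts.append("[" + text[start:end] + "]")
        pos = end
    parts.append(text[pos:])
    return "".join(parts)


text = "Two cars ran past."
tokens = expand(tokenize(text))
print(highlight(text, tokens, "automobile"))
```

Because the added lemma and synonym tokens carry the offsets of the word they were derived from, a query for "automobile" or "run" can still be highlighted against the original wording of the document.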

These tools are described in detail in the pages listed below.

