

Linguistic analysis tools operate primarily at the tokenization level, preparing unstructured text to support richer, more accurate, and more convenient search features. They do this by:

  • Identifying the language(s) used in the document, to select the best approach to tokenization.
  • Tokenizing the incoming text by breaking out individual words and sentences.
  • Adding tokens, through synonym expansion and lemmatization, to improve the accuracy and recall of keyword-matching searches.
  • Making it possible to accurately highlight matching phrases in search results.
  • Preparing the text for subsequent metadata extraction and facet creation through advanced text analytics features. 
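The token-level steps above can be sketched in miniature. This is a hedged illustration only: the `LEMMAS` and `SYNONYMS` tables are hypothetical stand-ins for the curated dictionaries and lemmatizers a real analysis pipeline would use, and the regex tokenizer is a simplification of real word- and sentence-breaking logic.

```python
import re

# Hypothetical synonym table; a real engine loads a curated dictionary.
SYNONYMS = {"car": ["automobile"], "quick": ["fast"]}

# Toy lemma table standing in for a real lemmatizer.
LEMMAS = {"running": "run", "cars": "car"}

def tokenize(text):
    """Break incoming text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def expand(tokens):
    """Add lemmas and synonyms alongside the original tokens,
    improving recall of keyword-matching searches."""
    out = []
    for tok in tokens:
        out.append(tok)
        lemma = LEMMAS.get(tok, tok)
        if lemma != tok:
            out.append(lemma)
        out.extend(SYNONYMS.get(lemma, []))
    return out

print(expand(tokenize("Quick cars running")))
# → ['quick', 'fast', 'cars', 'car', 'automobile', 'running', 'run']
```

Because the expanded tokens are stored next to the originals, a query for "automobile" can still match a document that only says "cars", while the original token positions remain available for phrase highlighting.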

These tools are described in detail in the pages listed below.

