Linguistic analysis tools operate primarily at the tokenization level, preparing unstructured text to support richer, more accurate, and more convenient search features by:
- Identifying the language(s) used in the document, to select the best approach to tokenization.
- Tokenizing the incoming text by breaking out individual words and sentences.
- Adding tokens through synonym expansion and lemmatization, improving the recall of keyword-matching searches.
- Making it possible to accurately highlight matching phrases in search results.
- Preparing the text for subsequent metadata extraction and facet creation through advanced text analytics features.
These tools are described in detail in the pages listed below.
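The tokenization, lemmatization, and synonym-expansion steps above can be sketched in a few lines. This is a minimal, hypothetical illustration using hand-rolled lookup tables; real analyzers rely on language-specific models and full dictionaries, and the `SYNONYMS` and `LEMMAS` entries here are assumptions chosen purely for the example.

```python
import re

# Illustrative lookup tables only; production analyzers use much larger,
# language-aware resources.
SYNONYMS = {"quick": ["fast"], "car": ["automobile"]}
LEMMAS = {"running": "run", "cars": "car", "ran": "run"}

def analyze(text):
    """Tokenize text, reduce tokens to lemmas, and expand synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # tokenization
    expanded = []
    for tok in tokens:
        lemma = LEMMAS.get(tok, tok)                 # lemmatization
        expanded.append(lemma)
        expanded.extend(SYNONYMS.get(lemma, []))     # synonym expansion
    return expanded

print(analyze("The quick car was running"))
# → ['the', 'quick', 'fast', 'car', 'automobile', 'was', 'run']
```

Because the extra tokens ("fast", "automobile", "run") are added at index time, a keyword query for any of them now matches this document, which is how expansion raises recall without the user rephrasing the query.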