
This page contains an alphabetical list of concepts and terms used in the Attivio platform documentation.  Most of the entries link out to a document that uses or illustrates the concept.   


A

Access Control Entry (ACE) - A specific entry in a given ACL. For example, on a specific document, there may be an ACE that maps the Read permission type to the list of principals (user1, group55) who are allowed to read the document. More...

Access Control List (ACL) - A list of permissions attached to an object. An ACL specifies which users and/or groups are granted access to objects, as well as what operations are allowed to be performed on these objects. An ACL is stored with each document in the index and is used to filter query results based on the user's credentials. More...
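
As a rough illustration of how an ACL stored with each document can be used to filter query results, the following Python sketch keeps only the documents whose Read entry names one of the user's principals. It is not Attivio's implementation; the dictionary layout and the names can_read and filter_results are assumptions made for this example.

```python
# Hypothetical sketch: each document carries an ACL that maps a permission
# type to the set of principals granted that permission (one ACE per entry).

def can_read(acl, principals):
    """Return True if any of the user's principals appears in the Read ACE."""
    return bool(acl.get("Read", set()) & set(principals))


def filter_results(results, principals):
    """Keep only the documents that the given principals are allowed to read."""
    return [doc for doc in results if can_read(doc["acl"], principals)]


results = [
    {"id": "doc1", "acl": {"Read": {"user1", "group55"}}},
    {"id": "doc2", "acl": {"Read": {"group7"}}},
]

# The user is known as user1 and belongs to group55, so only doc1 is visible.
visible = filter_results(results, ["user1", "group55"])
print([doc["id"] for doc in visible])
```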

Accuracy - In the context of search, "accuracy" refers to the quality of the search results, and is measured through precision and recall.

Advanced Query Language - A query language suitable for programmers who need to write sophisticated queries in Attivio. The advanced query language includes operators for terms, Booleans, term anchoring (text starts with...), proximity search (NEAR), embedded queries (for embedding a user query or application parameters), etc. See also Simple Query Language. More...

Authentication - Prompting a user for credential information to validate their identity. More...

Authorization - Applying permissions based on a user's credentials. More...

Availability - In the context of Attivio, "availability" generally refers to maintaining a low query-response time. This is achieved through proper index partitioning and duplication. More...

AVM - Attivio Virtual Memory, a transport mechanism for passing messages among the Attivio components of a single node. More...

B

Base Linguistics - Used in the context of the Advanced Linguistics Module to designate language-specific linguistic features that do not require that language's Advanced Module. See also Core Linguistics.

Boolean Search - A search allowing the inclusion or exclusion of documents containing certain words through the use of operators such as AND, NOT and OR. More...
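
The three operators map naturally onto set operations. The sketch below assumes a toy inverted index (a made-up mapping from each term to the ids of the documents that contain it); it illustrates the idea only and says nothing about how Attivio evaluates Boolean queries internally.

```python
# Hypothetical inverted index: term -> set of ids of documents containing it.
index = {
    "search": {1, 2, 3},
    "engine": {2, 3},
    "database": {3, 4},
}


def docs(term):
    """Return the set of document ids that contain the term."""
    return index.get(term, set())


# search AND engine: documents containing both terms.
both = docs("search") & docs("engine")

# search OR database: documents containing either term.
either = docs("search") | docs("database")

# search NOT database: documents containing "search" but not "database".
without = docs("search") - docs("database")

print(sorted(both), sorted(either), sorted(without))
```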

Boosting - Boosting may be used to alter the relevancy value of a document compared to other documents in a search index, typically because it is perceived to be a more valuable resource. It is the addition or subtraction of a value to a document's rank (relevancy). More...

C

CJK - "Chinese, Japanese, and Korean" languages. More...

Classification or Categorization - Document classification/categorization is the task of assigning an electronic document to one or more categories based on content. More...

Component - A component is a general data processing unit derived from the core class, PlatformComponent. Components may perform simple conditional evaluation, transform content, or provide a custom implementation that wraps embedded components or interacts with external ones. More...

Completeness - Related to relevancy. A gauge of how well the document matches superior document contexts such as the title or the URL. More...

Concept Extraction - The ability to mine and extract domain-specific terms ("concepts") from documents. More...

Concept Search - A search for documents related conceptually to a word, rather than specifically containing the word itself. More...

Connector - Connectors are Attivio platform components that retrieve data or content from particular sources and pass that data or content on to an Attivio workflow for further processing. More...

Content - In the context of Attivio, "content" usually refers to unstructured text documents such as white papers, newspaper articles, and email messages. Contrast with data. More...

Core Linguistics - Used to denote Attivio's native linguistic-analysis tools, without recourse to the Advanced Linguistics Module.

Corpus (pl. Corpora) - A "body" or collection of documents that have some common theme or domain, such as all sharing the same source, the same topic, or the same author.

Crawling - The act of accessing Web servers and/or file systems in order to extract information to feed into the enterprise search platform. See also Harvesting.  

D

Data - In the context of Attivio, "data" and "structured data" refer to database records. Contrast with content. More...

Dictionary - Within the context of search, a dictionary supports linguistic processing of content and queries against a list of words/terms/phrases in order to improve recall and/or precision for a query. More...

Document - This term generally refers to a unit of unstructured content, such as a report, article, or email message. More...

Duplication - Attivio can create and manage multiple copies of the index partitions, a strategy that supports high query loads. Duplication can be managed using "shared" and "replicated" partitions. More...

E

Electronic Discovery (e-discovery) - A process in which electronic data is sought, located, secured, and searched with the intent of using it as evidence in a civil or criminal legal case. More...

Entity Extraction - The ability of an enterprise search platform to parse and recognize informational entities, such as geographic names, personal names, and company names. More...

Exact match - To match query terms to document words exactly. This will not allow fuzzy matching based on spell check and/or linguistic normalization.

Endpoint - The address or connection point to a web service, such as an Attivio connector or API. It is typically represented by a simple HTTP URL string.

Entity Sentiment - A document may mention multiple entities, some positively and others negatively. Entity sentiment is a sentiment score for an individual entity mentioned in a document. Contrast to Sentiment Analysis. More...

Excluded Terms - The ability to disqualify a matching document if it contains a specific term. More...

F

False positive - In the search context, a returned result that does not actually relate to the query. 

Facet - Facets are drill-downs or filters, offered within a graphical user interface, that help a user navigate search results more effectively and efficiently. Common examples of facets seen on e-commerce websites are product category, brand, or price range. More...
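
A minimal sketch of the idea, assuming search results are plain dictionaries: a facet is computed by counting the values of one field across the result list, and selecting a facet value filters the list. The field names and the helpers facet_counts and apply_filter are hypothetical and are not part of any Attivio API.

```python
from collections import Counter

# Made-up result list for a product search.
results = [
    {"title": "Laptop A", "brand": "Acme"},
    {"title": "Laptop B", "brand": "Globex"},
    {"title": "Laptop C", "brand": "Acme"},
]


def facet_counts(results, field):
    """Count how many results carry each value of the given field."""
    return Counter(doc[field] for doc in results if field in doc)


def apply_filter(results, field, value):
    """Drill down: keep only the results whose field equals the chosen value."""
    return [doc for doc in results if doc.get(field) == value]


brand_facet = facet_counts(results, "brand")
acme_only = apply_filter(results, "brand", "Acme")
print(dict(brand_facet), [doc["title"] for doc in acme_only])
```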

Federated search - A search over multiple search systems. Attivio combines content from multiple sources into one index, which makes typical federated search unnecessary.  In addition, Attivio indexes can be partitioned into multiple nodes that are then federated into a single search.  More...

Field Query - A field query requires that a keyword match a specific document field, such as the title. More...

File Traversing - The act of scanning the content of a file system directory for content ingestion.

Footprint - The resources (disk, memory, and so on) occupied by a software system. More...

Filter - An optional search query parameter that restricts the query to documents that meet certain criteria. See Facet.

Freshness - A function computed on the age of a document. Usually used to boost the relevancy of newer documents, putting them nearer the top of the list of results. More...

Fuzzy Matching - A match that is based on morphological and lexical techniques to reduce words to their core and then match all forms of the word. More...

G

Group - A set of users typically classified (grouped together) by a set of common traits. More...

GWT - Google Web Toolkit (GWT) is an open-source set of tools that allows web developers to create and maintain complex JavaScript front-end applications in Java. More...

H

Harvesting - Web harvesting can be thought of as focused or directed Web crawling.

Highlighting - The search term is highlighted wherever it appears in the list of results. More...

I

IngestDocument - IngestDocuments are the content containers that flow through the ingestion workflows. They are created by connectors, are transformed by workflow components, and eventually are processed into the Attivio index. More...

Ingestion - The process of gathering content and processing it into Attivio. More...

Index - The searchable catalog of documents created by the search engine software. More...

J

JOIN - Return a matching document plus additional documents and records that are related to it by JOINing on a common key in multiple Attivio index "tables."  More...

K

Key Phrase - A word or phrase that is used unusually frequently in a document, compared to its overall frequency in the document's language. More...

L

Language Model - A language model is a record of how probable words and sequences of words are, measured over a large amount of text.  (Language models are large enough that they are packaged as separate downloads that are added to Attivio as needed.) More...

Lemmatization - The process of determining the "lemma" for a given word. Lemmatization allows inflected forms of a word to be analyzed as a single item. Lemmatization is closely related to "stemming", but unlike stemming, which operates only on a single word at a time, lemmatization operates on the full text, and can therefore discriminate between words that have different meanings depending on context. More...

  • Lemmatization By Expansion - a type of lemmatization that expands terms to their full set of inflected forms.
  • Lemmatization By Reduction - a type of lemmatization that normalizes terms to their grammatical base form.

Linguistics - The study of the nature, structure, and variation of language. In advanced enterprise search platforms, linguistics analysis enables transformation of content and queries for the purposes of improving relevancy, recall, and precision. More...

Link Cardinality - The number of links in a set that refer to a given document. It is best used to determine the relevancy of a Web page by factoring in how many other pages refer to the page under consideration.

M

Module - A module is a portion of the Attivio product that provides a specific set of features to a project. The core Attivio installation includes many modules. There are also add-on modules that can be downloaded and added to Attivio projects. Add-on modules typically extend the core product by adding new features, transformer classes and components, and workflows. Add-on modules often require special licensing and must be installed separately from Attivio. More...

Morphology - The structure and form of words in language, including inflection, derivation, and the formation of compounds.

Multi-Level Sorting - Sorting by multiple fields. More...
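
For illustration, sorting by multiple fields can be expressed in Python with a tuple-valued sort key: the first element is the primary sort field and later elements break ties. The documents and field names below are invented for this sketch.

```python
# Made-up result documents with two sortable fields.
docs = [
    {"title": "Beta", "year": 2011},
    {"title": "Alpha", "year": 2012},
    {"title": "Alpha", "year": 2011},
]

# Primary level: year, newest first (negated so larger years sort earlier).
# Secondary level: title, A to Z, used only to break ties on year.
ordered = sorted(docs, key=lambda d: (-d["year"], d["title"]))

for doc in ordered:
    print(doc["year"], doc["title"])
```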

N

Navigation - Information discovery through drill-down into query results. See also Facet.

Node - In the Attivio documentation, a "node" usually means a single Attivio instance running on a server.  Nodes are usually allocated one to a server, but in some circumstances more than one node can run on a single server.  More...

Nodeset - A nodeset is any logical grouping of nodes. (Note that every node is automatically a nodeset of one element.) Nodesets can overlap.  More...

Noun Phrase Extraction - Extracting noun phrases from text during the ingestion process. More...

O

OCR (Optical Character Recognition) - The translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into electronic representations of that text. One example of OCR is translating a PDF file into a flat text file. More...

P

Parsing - The process of analyzing input to determine its grammatical structure with respect to a formal grammar. A parser is a computer program that carries out this task.

Partition - An Attivio index may be subdivided into multiple "partitions" to obtain various efficiency improvements. It is usual to place each partition in its own node, but it is possible to have a single node that contains multiple partitions.

Performance - There are several areas where Attivio can be tuned for optimum performance given a specific index and hardware configuration. More...

Permission - A right granted or denied to specific users or groups in a system. Examples of permission types are: Read, Write, Delete. Note that the current implementation of the Attivio Content Security only supports the Read permission type. More...

Phrasing - The recognition and grouping of an idiom such as "red zone" or "diamond lane". More...

Phrase Search - A search for documents containing an exact sentence or phrase specified by a user, such as "New York." More...

Precision - The fraction of the result list that matches the query. For example, if a search engine lists 20 documents but only 5 of them really match the query, then the precision would be 25%. See also recall.

Prefix Notation - A form of notation used in logic, arithmetic and algebra that places operators to the left of their operands. For example, the expression "1 + 2" would be expressed as "+ 1 2" in prefix notation. More...
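
Because the operator always comes first, a prefix expression can be evaluated by a short recursive routine with no need for parentheses. The sketch below is a generic evaluator for space-separated arithmetic expressions, written for this glossary; it is not a parser for any Attivio query syntax.

```python
import operator

# Binary operators supported by this sketch.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_prefix(expression):
    """Evaluate a space-separated prefix expression such as '+ 1 2'."""
    tokens = expression.split()

    def parse(pos):
        # Returns (value, position of the next unread token).
        token = tokens[pos]
        if token in OPS:
            left, pos = parse(pos + 1)   # first operand follows the operator
            right, pos = parse(pos)      # second operand follows the first
            return OPS[token](left, right), pos
        return float(token), pos + 1

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return value


print(eval_prefix("+ 1 2"))      # the glossary's example, "1 + 2"
print(eval_prefix("* + 1 2 3"))  # infix equivalent: (1 + 2) * 3
```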

Principal - An entity that may perform actions on objects within a system; may be an individual user or a group of users. More...

Principal ID - A unique identifier for a user or a group. For example, the SID in Active Directory.

Proximity Boosting - For a query that contains multiple words, a proximity boost will cause a document that contains those words close to each other to move higher on the results list than another document that contains the same number of instances of those words, but where those words appear farther apart from each other. More...

Proximity Search - A search specifying the maximum distance by which the query terms may be separated. Proximity search is a hard constraint, as opposed to proximity boosting (above). More...
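
The hard-constraint nature of proximity search can be sketched as a yes/no test on word positions. This example assumes that distance means the difference between the word positions of the two terms after splitting on whitespace; the exact distance definition used by Attivio's NEAR operator is not stated here, so treat that as an assumption.

```python
def within_distance(text, term_a, term_b, max_distance):
    """Return True if term_a and term_b occur at most max_distance
    word positions apart anywhere in the text."""
    words = text.lower().split()
    positions_a = [i for i, word in enumerate(words) if word == term_a]
    positions_b = [i for i, word in enumerate(words) if word == term_b]
    return any(
        abs(a - b) <= max_distance
        for a in positions_a
        for b in positions_b
    )


text = "the quick brown fox jumps"

# "quick" is at position 1 and "fox" at position 3, so they are 2 apart.
print(within_distance(text, "quick", "fox", 2))
print(within_distance(text, "quick", "fox", 1))
```

A document that fails the test is excluded from the results entirely, whereas proximity boosting would only lower its rank.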

Q

Quality - In relation to relevancy, the quality of the document, and how important it is as viewed by the content owner or search application.

Query - The word or words used for searching, plus any other options defined by the search engine, such as proximity boosting or linguistic processing. More...

Query-By-Example - A search where a user instructs an engine to find more documents that are similar to a particular document (also known as "more like this"). More...

R

Range Restrictions - The ability to limit a search to a specified range of values. For example, a search for a "desktop computer" that costs between $600 and $800. More...

Ranking - To arrange result documents according to their relevancy score for a given query. More...

Real-Time Update - Real-time fields are Attivio index fields that support high-speed updates and can be updated without having to re-index entire documents. These fields can be queried, retrieved and faceted as if they were normal fields in the index. More...

Recall - Related to precision, this is the degree to which a search engine returns all the matching documents in a collection. If the search should have found 100 documents, but in fact returned only 80 of them, recall would be 80%.
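
The two measures can be written as simple set computations. The sketch below reproduces the figures used in the Precision and Recall entries of this glossary; the function names and the use of integer document ids are illustrative only.

```python
def precision(returned, relevant):
    """Fraction of the returned documents that really match the query."""
    if not returned:
        return 0.0
    return len(returned & relevant) / len(returned)


def recall(returned, relevant):
    """Fraction of the matching documents that were actually returned."""
    if not relevant:
        return 0.0
    return len(returned & relevant) / len(relevant)


# Precision example: 20 documents listed, only 5 of them really match.
listed = set(range(20))
matching = set(range(5))
print(precision(listed, matching))

# Recall example: 100 documents should have been found, 80 were returned.
should_find = set(range(100))
found = set(range(80))
print(recall(found, should_find))
```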

Regular Expression (regex) - A regular expression provides a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. More...

Relational Search - Attivio respects the relations it finds in ingested database records, and enhances them by extracting relationships among entities it finds in unstructured content. This lets us write queries that exploit relations, such as "List the salaries of all athletes who were named in newspaper articles about drug abuse." More...

Relevancy - Relevancy is a measure of how well a particular document matches a query. Relevancy is calculated based on various characteristics of the document, and is used to rank the result list. More...

S

Scanner - Scanners connect to external data sources and feed results to Attivio. Remote systems could include file systems, CRM systems, databases, CSV, XML, or custom data feeds. Scanners implement the minimum amount of logic necessary to convert the source data format into an IngestDocument. A scanner is one of the three essential elements of a connector. More...  

Schema - The Attivio Schema is the list of fields in the Attivio Universal Index. More...

Schema-Neutral - Attivio is "schema-neutral" in that it allows you to extend your system with tables of data and content at any time, without having to change the underlying data model or re-ingest information. This is in contrast to conventional databases, where creating new tables is a major inconvenience.

Security Realm - A collection of users and groups that are controlled by the same authentication policy. More...

Seed - The starting point from which a harvester begins processing pages.

Segment - A segment is a file containing index records. When documents are ingested by Attivio, the new index records are first accumulated in memory until enough of them are available to be written (or "flushed") to disk as a new index segment. More...

SELinux - Security-Enhanced Linux is a Linux feature that provides access control security policies in the Linux environment. Enterprise server software packages often must disable this feature in order to run properly.

Sentiment Analysis - The evaluation of the sentiment - typically positive or negative - of a document based on the usage of language. Compare to Entity Sentiment. More...

Silos - "Silo" is used to indicate an isolated information repository such as a database, an email system, a wiki system, HTTP web pages, spreadsheets, trouble tickets, and so forth. Silos are traditionally searched one at a time because it is so inconvenient to combine them into a single search experience. Combining multiple silos into a single search experience is one of the many strengths of the Attivio platform.

Simple Query Language - Simple search queries return only documents that contain all query terms in a search request. The simple query language also supports phrase matches, wildcard search, similarity search, term boosting, field queries, numeric values, dates and ranges. Users can also exclude terms or use the 'OR' operator explicitly. See also Advanced Query Language.

Similarity Search - Find words that are similar to the search term. For instance, the search might request matches that are 80% similar to "retrieval," such as "retrieve," "retrieved," and "retrievable." More...

Spell Checking - Query terms and phrases are spell-checked and corrected using a dictionary based on ingested content.  More...

Stacking - When several tokens apply to the same word (as stem, lemma, and synonyms might do), the tokens are said to be stacked. 

Stage - A simple input/output component in a workflow. The term is used loosely to include all steps in a workflow, even though some of them technically are not "stages." More...

Stemming - The process for reducing words to their root or stem form. Also see Lemmatization. More...

Stop Words - "Stop words" are words that are too common to be useful for text matching. These words are ignored during ingestion.  More...

Storage Area Network (SAN) - A Storage Area Network (SAN) is a network of storage disks. In large enterprises, a SAN connects multiple servers to a centralized pool of disk storage. More...

Surprise Value - A statistical measurement of how "surprising" it is to have found a word (bigram, etc.) in a specific context. More...

Synonym Expansion - To expand a query or document with a defined list of synonyms for the words it originally contains. More...

T

Table - The Attivio Universal Index can contain many "virtual tables" identified by unique "table" labels found in every record.   More...

Term Boosting - The ability to support different relevance weight for different terms in a query. For instance, new^200 would increase the relevancy score for documents that contain the term 'new'.  More...

Text Analytics - This is a general term for statistical analysis of unstructured text, and may include entity extraction, classification, and sentiment analysis. More...

TF-IDF (Term Frequency - Inverse Document Frequency) - A weight used to evaluate how important a word is to a document in a collection of documents or the full corpus. The importance increases proportionally to the number of times a word appears in the document (TF), but is offset by the frequency of the word in the index (IDF). More...
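
One common textbook formulation multiplies the raw term count by the logarithm of the inverse document frequency. The sketch below uses that formulation on a tiny made-up corpus; it is an assumption for illustration, since the exact weighting that Attivio applies is not given here.

```python
import math


def tf_idf(term, document, corpus):
    """Textbook TF-IDF: term count in the document times log(N / df),
    where N is the corpus size and df the number of documents with the term."""
    tf = document.count(term)
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Each document is a list of already tokenized, lowercased words.
corpus = [
    ["attivio", "search", "index"],
    ["search", "query", "search"],
    ["database", "record"],
]

# "search" appears in two of three documents, so each occurrence counts less
# than an occurrence of "database", which appears in only one document.
print(tf_idf("search", corpus[1], corpus))
print(tf_idf("database", corpus[2], corpus))
```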

Tokenization - Tokenization splits text into individual strings that are single words, punctuation marks, numbers, etc. Word boundaries are determined by punctuation, including especially the space character. Some languages, such as the CJK languages, require more sophisticated algorithms using semantic analysis, as there are no clear boundaries between words. More...
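
For languages that separate words with spaces, a simple regular expression is enough to sketch the idea. The pattern below is a simplification chosen for this example and is not Attivio's tokenizer: it treats each run of word characters as one token and each punctuation mark as its own token.

```python
import re

# A run of word characters, or a single character that is neither
# a word character nor whitespace (that is, a punctuation mark).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Attivio indexes 3 documents, quickly.")
print(tokens)
```

Applied to unsegmented CJK text, this pattern would return a whole run of characters as a single token, which is why those languages need the more sophisticated analysis mentioned above.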

Transformer - Transformers are instances of com.attivio.platform.transformer. They operate on the contents of IngestDocuments, performing a wide variety of processing and analysis tasks. A transformer is the heart of a workflow component.

U

Unified Information Access (UIA) - Attivio delivers the power of unified information access to a variety of interfaces, including dashboards, reports and search results.  Related content such as emails, logs and other text files can also be displayed in response to a single query for information on a topic.

Universal Index - Attivio's "universal index" accommodates many different kinds of content and data in a single format. More...

W

Weight - See Term Boosting.

Wildcard Search - Using a character such as "?" to take the place of an unknown character in a query term.  A search for n?w would match both 'new' and 'now'. More...
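
The single-character wildcard behaves the same way in shell-style pattern matching, which Python exposes through the standard fnmatch module. The sketch below matches the glossary's n?w pattern against a small made-up term list; it illustrates the semantics of "?" only and does not go through an Attivio query.

```python
from fnmatch import fnmatchcase

# Made-up list of indexed terms.
terms = ["new", "now", "news", "nw"]

# "?" stands for exactly one unknown character, so "news" (too long)
# and "nw" (too short) do not match.
matches = [term for term in terms if fnmatchcase(term, "n?w")]
print(matches)
```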

Workflow - A series of services and transformers. In addition to linear pipelines of transformers, workflows can include conditional logic (branching), loops and the duplication of messages to multiple endpoints. More...

Z

Zone - A zone is a subset of an index, containing a subset of the indexed documents. Each zone is implemented as a separate set of index files. Users can update each zone separately but query all of them together as one index. More...
