
General Questions

When should I commit or optimize an index?

Indexed documents are not available for searching until a Commit has been sent. In addition, any documents that have been indexed but not committed will be lost if the system shuts down unexpectedly. When performing a large or initial feed of content, a Commit is only necessary after all content has been fed. By default, all connectors send a Commit after sending the last document. Commits require that all caches be reloaded, and their resource utilization is proportional to the size and complexity of the index.

As new content is added to the index, AIE manages deleting, updating and adding documents by modifying the underlying index data structures on disk and by marking deleted documents without actually removing them. Over time, these stored deleted documents can accumulate and affect performance. An Optimize is necessary to remove all of the previously deleted documents. The Optimize can take significant resources and should be performed during off hours.

Search is still available during both a Commit and an Optimize; however, search times may be degraded if either operation takes significant resources or the index is on an underpowered machine.
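The behavior described above can be illustrated with a toy in-memory model. This is only a sketch for intuition: the class `ToyIndex` and all of its methods are hypothetical, and it is not AIE's API or its actual on-disk implementation. It shows that fed documents are invisible to search until a commit, and that deleted documents stay in storage until an optimize.

```python
class ToyIndex:
    """Toy model only; hypothetical names, not the AIE API or its internals."""

    def __init__(self):
        self.pending = {}     # fed but not yet committed (lost on a crash)
        self.stored = {}      # committed documents, including deleted ones
        self.deleted = set()  # ids marked as deleted but still stored

    def feed(self, doc_id, text):
        # Fed documents are not searchable until commit() is called.
        self.pending[doc_id] = text

    def commit(self):
        # Make everything fed so far visible to search.
        self.stored.update(self.pending)
        self.pending.clear()

    def delete(self, doc_id):
        # Mark the document as deleted without reclaiming its storage.
        if doc_id in self.stored:
            self.deleted.add(doc_id)

    def optimize(self):
        # Physically remove every document that was marked as deleted.
        for doc_id in self.deleted:
            self.stored.pop(doc_id, None)
        self.deleted.clear()

    def search(self, word):
        # Only committed, non-deleted documents can match.
        return sorted(
            doc_id
            for doc_id, text in self.stored.items()
            if doc_id not in self.deleted and word in text.split()
        )

    def stored_count(self):
        # Number of documents occupying storage, deleted or not.
        return len(self.stored)
```

In this model, a bulk feed calls `feed` many times and `commit` once at the end, which mirrors the recommendation to commit only after all content has been fed.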

How do I delete the data that has been indexed?

AIE has several methods for deleting documents. You can delete documents by providing a list of documents or even by providing a query. View the Deleting Content guide for more information.

What is the largest size file that can be processed by the Attivio Intelligence Engine (AIE)?

AIE has a memory-based processing model; therefore, the amount of available memory limits the maximum file size that can be processed. The amount of memory required to ingest a file is a function of the number of text extraction instances, the number of files being processed simultaneously, the size of the other files being processed at the same time, queue sizes, the file size and the number of tokens in the file. Attivio strongly recommends using the connector's maximum file size setting and the tokenizer's maximum token limit to ensure that files can be processed within the supplied memory constraints. AIE is configured with default values for the maxTokens and max-file-size constraints.

How do I configure AIE to support the AND operator? My users use the AND operator in their queries.

The Attivio Simple Query Language is a default-AND query language, so users do not need to specify AND in their queries. For example, if a user queries "Blue Sky", this is treated internally as "blue AND sky". By default, if a user includes the word AND in a query, AIE treats it as an ordinary search term and matches documents that contain the word AND. If this is not the desired behavior, it is possible to configure the RewritePatterns QueryTransformer to remove occurrences of the word AND from queries.
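The effect of such a rewrite can be sketched outside of AIE. The function below is a hypothetical illustration, not the RewritePatterns QueryTransformer configuration itself, and it assumes that only the standalone uppercase word AND should be removed.

```python
import re


def strip_and_operator(query):
    """Remove the standalone uppercase word AND from a query string.

    Hypothetical helper for illustration; it is not part of the AIE API.
    """
    # \b word boundaries keep words such as "ANDROID" or "sand" intact.
    without_and = re.sub(r"\bAND\b", " ", query)
    # Collapse the whitespace left behind by the removal.
    return " ".join(without_and.split())
```

With this rewrite applied, the query "blue AND sky" becomes "blue sky", which a default-AND query language already treats as requiring both terms.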

What is ordered messaging and when should I use it?

See the Message Ordering documentation for more information.

API Questions

When I search, I always get back 10 documents even though more should match, am I doing something wrong?

The getDocuments() method returns the list of documents in the first result set (page) of a query, not all of the documents matching that query. The total number of documents that match a query can be obtained via the API method QueryResponse.getTotalRows(). When creating the query, you can set the size of the result set (10 is the default) via the Rows property and change the starting position within the rowset via the Offset property. Note that setting these to large values will have performance, memory and bandwidth implications.
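The paging logic described above can be sketched as follows. This is a minimal illustration rather than AIE client code: `run_query` is a hypothetical callable standing in for whatever issues the query, and it is assumed to accept an offset and a row count and to return the total number of matching rows together with one page of documents.

```python
def page_offsets(total_rows, rows=10):
    """Return the starting offset of each page needed to cover total_rows."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return list(range(0, total_rows, rows))


def fetch_all(run_query, rows=10):
    """Collect every matching document by requesting one page at a time.

    run_query(offset, rows) is a hypothetical callable that returns
    (total_rows, documents_on_this_page).
    """
    documents = []
    offset = 0
    while True:
        total_rows, page = run_query(offset, rows)
        documents.extend(page)
        offset += rows
        # Stop once the offset passes the total or a page comes back empty.
        if offset >= total_rows or not page:
            break
    return documents
```

Fetching modest pages in a loop keeps each response small, which is in line with the caution about large Rows and Offset values.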

Multi-Node Questions

Given a powerful multi-core machine with plenty of RAM, is there any performance benefit from splitting up a very large index into multiple smaller local ones inside the same AIE instance? For example, one 8-million doc index vs. four 2-million doc indexes?

Yes. This can improve the performance of commits and other indexing tasks.

Is it safe to use a delete-by-query connector in a multi-node/index environment?

Yes. There is no difference.
