This guide details the ordering of messages in the Attivio platform.
Attivio does not enforce the order of messages as they flow through the system. Messages may arrive out of order at any component in any workflow, including the index engine. As a result, the order in which the index engine processes documents may differ from the order in which the connector(s) sent them. For general bulk loads, or systems without high update rates for single documents, unordered messaging is not a problem, and the system defaults allow for the highest throughput and lowest memory utilization.
For example, a client may send a series of documents in one order, but those documents may arrive at, and be processed by, the indexer in a different order.
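The reordering described above can be illustrated with a small, self-contained Java simulation. It is a sketch under stated assumptions, not Attivio code: the class `UnorderedDelivery` and its `deliver` method are hypothetical, and a seeded `Collections.shuffle` stands in for the nondeterministic scheduling of parallel workflow stages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical simulation, not an Attivio API: models a workflow whose
// parallel stages can hand documents to the indexer in any order.
final class UnorderedDelivery {
    private UnorderedDelivery() {}

    // Returns the order in which the simulated indexer receives the documents.
    // A seeded shuffle stands in for nondeterministic thread scheduling, so a
    // given seed always produces the same arrival order.
    static List<String> deliver(List<String> sent, long seed) {
        List<String> arrived = new ArrayList<>(sent); // never modify the caller's list
        Collections.shuffle(arrived, new Random(seed));
        return arrived;
    }
}
```

Calling `UnorderedDelivery.deliver(List.of("doc1", "doc2", "doc3"), 42L)` returns the same three documents, possibly in a different order. The fixed seed only makes a run repeatable; a real workflow offers no such predictability.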
Issues with Unordered Messaging
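Unordered messaging can be a problem if updates or deletes are sent between index commits. For example, a client might send doc1 and doc2, then send a delete for doc1 and an updated version of doc2, all without an intervening commit.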
In the series above, unordered messaging gives no guarantee that the original versions of doc1 and doc2 will reach the indexer first and then be deleted or replaced by the later messages. For example, the delete could be processed first, followed by the update, followed by the original versions, which would leave outdated documents in the index. To solve this problem, a commit must be issued and completed before the series of updates and deletes is sent.
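The stale-document problem can be reproduced with a toy index. This is a minimal sketch, assuming an index that simply keeps whichever version of a document it processed last; `LastWriterIndex` and its `Message` type are hypothetical names, not Attivio classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not an Attivio API: the index keeps whichever version
// of a document it processed last, and a delete removes the document.
final class LastWriterIndex {
    // One ingest message. For "DELETE" the version is ignored (may be null).
    static final class Message {
        final String op;      // "ADD", "UPDATE" or "DELETE"
        final String docId;
        final String version;

        Message(String op, String docId, String version) {
            this.op = op;
            this.docId = docId;
            this.version = version;
        }
    }

    private LastWriterIndex() {}

    // Applies messages in the order the indexer processes them and returns
    // the resulting index contents (document id -> version).
    static Map<String, String> apply(List<Message> processingOrder) {
        Map<String, String> index = new HashMap<>();
        for (Message m : processingOrder) {
            if ("DELETE".equals(m.op)) {
                index.remove(m.docId); // deleting a not-yet-indexed doc is a no-op
            } else {
                index.put(m.docId, m.version); // ADD and UPDATE both overwrite
            }
        }
        return index;
    }
}
```

Applied in the order sent (add doc1, add doc2, delete doc1, update doc2 to v2), the messages leave doc1 deleted and doc2 at v2. Applied as delete, update, then the two originals, they leave doc1 present at v1 and doc2 at v1, which is the stale result described above.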
Ordered Commits Mode
By default, Attivio connectors run with "ordered commits" mode enabled. In this mode, whenever a commit is sent to a workflow, the connector first waits for all previously sent documents to be flushed through the workflow and indexed; only then is the commit message sent.
For example, a client may send doc1, doc2, and doc3 through the API and then call commit() before sending further documents.
Using ordered commits, the feeder waits for doc1, doc2, and doc3 to reach the end of the workflow before the commit is sent. This guarantees that an update for doc1 sent later will not arrive at the indexer before the original document. See the javadoc for ContentFeeder for more information on ordered commits.
Because of unordered messaging, the indexer may still see doc1, doc2, and doc3 in any order (for example, doc3, doc1, doc2). The important point is that each group of documents sent between commits is guaranteed to have completed the workflow before the commit completes and before any other documents are sent to Attivio.
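The ordered-commits behavior can be sketched in plain Java with a thread pool. This is an illustration under stated assumptions, not the ContentFeeder implementation: `OrderedCommitFeeder`, `send`, and `indexerLog` are hypothetical names, the pool stands in for a workflow that processes documents concurrently, and the string "COMMIT" in the log marks the point at which the commit is sent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the ordered-commits idea, not the ContentFeeder API.
// Documents are processed concurrently (so they reach the "indexer" in no fixed
// order), but commit() first waits for every in-flight document to finish.
final class OrderedCommitFeeder implements AutoCloseable {
    private final ExecutorService workflow = Executors.newFixedThreadPool(4);
    private final List<Future<?>> inFlight = new ArrayList<>(); // caller thread only
    // Thread-safe record of what the simulated indexer saw, in arrival order.
    final List<String> indexerLog = new CopyOnWriteArrayList<>();

    void send(String docId) {
        inFlight.add(workflow.submit(() -> { indexerLog.add(docId); }));
    }

    void commit() {
        // Ordered commit: block until all previously sent documents are indexed.
        for (Future<?> f : inFlight) {
            try {
                f.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("document processing failed", e.getCause());
            }
        }
        inFlight.clear();
        indexerLog.add("COMMIT"); // only now is the commit "sent"
    }

    @Override
    public void close() {
        workflow.shutdown();
    }
}
```

If a client sends doc1, doc2, and doc3, commits, then sends doc4 and doc5 and commits again, the order within each group varies from run to run, but the first three log entries are always doc1 to doc3 in some order, followed by the first commit, with doc4 and doc5 only after it.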
Enabling/Disabling Ordered Commits Mode on a Connector
The default "ordered commits" mode can be overridden in the connector configuration.
Disable Ordered Commits
Configure the connector component with the following property set:
Enable Ordered Commits
Configure the connector component with the following property set:
Issues with Ordered Commits Mode
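Ordered commits mode guarantees only that all documents sent before a commit will be committed. It makes no guarantee that other messages will not also be committed: a document sent after a commit can overtake that commit in the workflow. For instance, if a client sends doc1, doc2, a commit, doc3, doc4, and a second commit, the indexer may see doc1, doc2, doc3, the first commit, doc4, and then the second commit, so doc3 is committed by the first commit rather than the second.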
If client code depends on the number of commits, it should call waitForCompletion() after each call to commit(), or ordered messaging should be enabled. Client code may depend on the number of commits if, for example, it sends the number of documents committed, or sends updates and deletes together.
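The effect on commit boundaries can be sketched with a small hypothetical Java helper, `CommitBatches.batches`. It takes the order in which messages reach the indexer, with the marker string "COMMIT" standing in for a commit message, and groups documents by the commit that covers them. It models message arrival only and is not an Attivio API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not an Attivio API: given the order in which messages
// reach the indexer, work out which documents each commit actually covers.
final class CommitBatches {
    private CommitBatches() {}

    static List<List<String>> batches(List<String> arrivalOrder) {
        List<List<String>> result = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String msg : arrivalOrder) {
            if (msg.equals("COMMIT")) {
                result.add(pending);          // everything seen so far is committed
                pending = new ArrayList<>();
            } else {
                pending.add(msg);
            }
        }
        return result; // documents after the last commit are not yet committed
    }
}
```

For the sent order doc1, doc2, COMMIT, doc3, doc4, COMMIT, `batches` returns [[doc1, doc2], [doc3, doc4]]. If doc3 overtakes the first commit, it returns [[doc1, doc2, doc3], [doc4]]. Calling waitForCompletion() after each commit(), as recommended above, is intended to keep the real arrival order in the first shape.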
Manual Ordered Messaging
In addition to ordered commits mode, any ingest client can call waitForCompletion() at any time to ensure that all documents previously sent to Attivio have completed processing. This method has significant overhead, because the client must wait for all messages to be processed, which may take a long time if the server is busy or otherwise resource-bound. However, it lets clients ensure message order when necessary without the added cost of a commit. For example, a client can send a document, call waitForCompletion(), and then send an update to that document, knowing that the original has already been processed.
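A minimal sketch of this pattern follows, assuming a simulated workflow built on a Java thread pool. `ManualOrderingClient` is a hypothetical class; its `waitForCompletion()` is modeled on the call described above but is not the ContentFeeder implementation, and the map named `index` stands in for the index engine. No commit is involved: the client simply blocks until in-flight documents finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the ContentFeeder API: waitForCompletion() blocks
// until every previously sent document has been processed, without a commit.
final class ManualOrderingClient implements AutoCloseable {
    private final ExecutorService workflow = Executors.newFixedThreadPool(4);
    private final List<Future<?>> inFlight = new ArrayList<>(); // caller thread only
    // Simulated index: document id -> last version processed.
    final Map<String, String> index = new ConcurrentHashMap<>();

    void send(String docId, String version) {
        inFlight.add(workflow.submit(() -> { index.put(docId, version); }));
    }

    void waitForCompletion() {
        for (Future<?> f : inFlight) {
            try {
                f.get(); // wait for this document to finish processing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("document processing failed", e.getCause());
            }
        }
        inFlight.clear();
    }

    @Override
    public void close() {
        workflow.shutdown();
    }
}
```

Sending doc1 as "original", calling `waitForCompletion()`, and then sending doc1 as "update" always leaves "update" in the simulated index. Without the wait, the two tasks could run in either order.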
There is no coordination between separate clients feeding the same document. This applies both to clients in separate threads and to clients in the same thread (the ContentFeeder client is NOT thread-safe). If the same document (based on its unique Attivio document id) is sent from different clients, an external synchronization method based on waiting for commits to complete must be implemented. Timing alone cannot guarantee message ordering between clients, because SEDA, fault-tolerant configurations, multiple component instances, and a variety of other workflow stages could reorder the messages.
For example, suppose one client deletes doc1 while a second client sends an update for doc1, and both clients send their own versions of doc2 and doc3. There is no guarantee which versions of the documents will get indexed: doc1 will either be deleted or updated, and doc2 and doc3 will both be in the index, but there is no guarantee which client's version will be returned from a search.
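One possible shape for such external synchronization is a handoff: the second client may send its version of a document only after the first client's commit has completed. The sketch below is hypothetical and uses no Attivio API. A `java.util.concurrent.CountDownLatch` stands in for whatever coordination mechanism real clients would share, and a single `index.put` stands in for "send the document, commit, and wait for the commit to complete".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of external coordination between two independent
// clients; none of these names are Attivio APIs. Client B may send its
// version of doc1 only after client A's commit has completed.
final class TwoClientHandoff {
    private TwoClientHandoff() {}

    static Map<String, String> run() {
        Map<String, String> index = new ConcurrentHashMap<>(); // simulated index
        CountDownLatch aCommitted = new CountDownLatch(1);

        Thread clientA = new Thread(() -> {
            index.put("doc1", "clientA"); // stands in for send + commit + wait
            aCommitted.countDown();       // signal: A's commit has completed
        });
        Thread clientB = new Thread(() -> {
            try {
                aCommitted.await();       // do not send until A's commit is done
                index.put("doc1", "clientB");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        clientB.start(); // started first on purpose: it still waits for A
        clientA.start();
        try {
            clientA.join();
            clientB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return index;
    }
}
```

Because client B blocks on the latch until client A signals, client B's version of doc1 is always the one left in the simulated index, regardless of which thread starts first. Relying on timing alone would give no such guarantee.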