Overview

Backing up and restoring an Attivio cluster requires no special setup or configuration. Because Attivio controls the embedded Hadoop services (HDFS, HBase, and ZooKeeper), the user that Attivio runs as has all the permissions needed to perform the commands and actions described below.

An Attivio project depends on three fundamental types of data, each of which needs to be backed up on a regular basis. This is in addition to the configuration and custom source files in the project directory, which are assumed to be backed up already in a source code management tool such as Git.

  1. Indexes - the files which comprise an Attivio index, such as the default "index" or "abc-index" used by the Business Center.
  2. Store - operational data such as incremental connector tracking info, trigger definitions, and signals used for Machine Learning relevancy.
  3. ZooKeeper - application configuration and state information.

Back up Indexes

Backing up an index involves creating a snapshot in HDFS, where the index files are stored. Snapshots are read-only, point-in-time copies of the file system. 

The implementation of snapshots is efficient:

  • Snapshot creation is instantaneous.
  • There is no data copying.
  • Snapshots do not adversely affect regular operations.
  • Over time, as the data diverges from the snapshot, more disk space will be consumed.

To create an index snapshot, execute the following command via the Attivio CLI, once per index, passing the index name as the last argument:

snapshot index index
snapshot index abc-index

(warning) The index must be running. Ingestion will be paused momentarily to ensure a consistent view of the data. The snapshot will capture the live/committed state of the index for restoration at a later date.

If you want to ensure uncommitted documents are included in the backup, do a commit first.

Back up the Store

When the Attivio store is backed up, each Attivio-related HBase table will be snapshotted in sequence. 

To create a store snapshot, execute the following command via the Attivio CLI:

snapshot store

(warning) All HBase and HDFS services must be running to successfully create a store snapshot.

Back up ZooKeeper

When ZooKeeper is backed up, the contents of the "attivio" znode will be serialized to a file.  

To create a ZooKeeper snapshot, execute the following command via the Attivio CLI:

snapshot zk

(warning) The ZooKeeper snapshot is not technically a snapshot in the same sense as the index and store. A ZooKeeper snapshot walks through the ZooKeeper tree and serializes the contents to a file.

Back up All

Attivio provides a convenience command which will create a snapshot of the indexes, store and ZooKeeper data all at once.

To create all snapshots at once, execute the following command via the Attivio CLI:

aie> snapshot all
Creating index snapshot: index_20161111_161210
Creating store snapshot: store_20161111_161210
Creating zookeeper snapshot: zk_20161111_161210

List Snapshots

To list all snapshots, newest first (sorted by the time they were requested), execute the following command via the Attivio CLI:

listsnapshots
2016-11-11T16:12:15.346-05:00 zk_20161111_161210
2016-11-11T16:12:15.129-05:00 store_20161111_161210
2016-11-11T16:12:10.046-05:00 index_20161111_161210
2016-11-11T16:12:05.446-05:00 zk_20161111_161204
2016-11-11T16:12:01.601-05:00 store_20161111_161157
2016-11-11T16:11:47.441-05:00 index_20161111_161146

Delete a Snapshot

Older snapshots should be deleted periodically. To delete a snapshot, execute the following command via the Attivio CLI:

deletesnapshot zk_20161111_161210
deletesnapshot store_20161111_161210
deletesnapshot index_20161111_161210
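
Periodic cleanup can be scripted around the two commands above. The following is a minimal sketch, not an Attivio-provided tool: the function name prune_commands is hypothetical, and it assumes that snapshot names always end in _YYYYMMDD_HHMMSS, as they do in the listings shown on this page. It reads listsnapshots output on standard input and prints one deletesnapshot command for each snapshot dated before a cutoff.

```shell
#!/bin/sh
# Hypothetical helper: turn "listsnapshots" output into "deletesnapshot" commands.
# Assumes each snapshot line has two fields (timestamp, name) and that names
# end in _YYYYMMDD_HHMMSS, as in the examples on this page.
# $1: cutoff date as YYYYMMDD; snapshots dated strictly before it are selected.
prune_commands() {
  awk -v cutoff="$1" '
    NF == 2 {
      n = split($2, parts, "_")      # e.g. index_20161111_161146
      day = parts[n - 1]             # the YYYYMMDD component of the name
      if (n >= 3 && day ~ /^[0-9]+$/ && length(day) == 8 && day + 0 < cutoff + 0)
        print "deletesnapshot " $2
    }'
}
```

Lines that do not look like snapshot entries (such as the echoed command itself) are ignored. Review the generated commands before running them in the CLI; the sketch only prints them and deletes nothing by itself.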

Restore a Snapshot

You can restore the various elements of an Attivio project to an earlier state by restoring snapshots. 

To restore the state of the index, store, or ZooKeeper from a previous snapshot, execute the following command via the Attivio CLI:

restoresnapshot index_20161111_161146

(tick) Plan Restore Downtime: The restore can be a long-running operation depending on the amount of data in the index and store. Downtime should be planned.

(tick) Take a snapshot before restoring a snapshot. This way, if any errors are encountered during restore, the most recent data is still recoverable.

(warning) Attivio connectors must be stopped in order to perform a restore. If you forget to stop the connectors before a restore, the indexers may become locked. You should be able to unlock the indexes at the end of the restore.

What happens during restoration?

  • When restoring indexes, the index will be stopped, the data will be copied back into place, and the index will be started.
  • When restoring the store, the HBase tables will be disabled, restored, and enabled (one by one).
  • When restoring ZooKeeper, the ZooKeeper instance must be running; the restore itself incurs no downtime.
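
The advice to take a snapshot before restoring can be captured in a small script. This is a sketch under stated assumptions: restore_commands is a hypothetical helper name, and the name check assumes the _YYYYMMDD_HHMMSS suffix seen in the examples on this page. It only prints the CLI commands in the recommended order; stopping the connectors first remains a manual step.

```shell
#!/bin/sh
# Hypothetical helper: print the CLI commands for a guarded restore.
# $1: name of the snapshot to restore, e.g. index_20161111_161146
restore_commands() {
  case "$1" in
    *_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "unexpected snapshot name: $1" >&2; return 1 ;;
  esac
  echo "snapshot all"          # safety snapshot of the current state
  echo "restoresnapshot $1"    # then restore the requested snapshot
}
```

Rejecting names that do not match the expected pattern guards against pasting a timestamp or a path where a snapshot name is expected.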

Export a Snapshot

When planning a disaster recovery strategy, you will want to copy the index, store and ZooKeeper backups to a separate location, potentially in another datacenter. 

To export a snapshot to a local or attached drive, execute the following command via the Attivio CLI:

localexportsnapshot store_20161111_161157 /opt/attivio/projects/myproject/snapshots/

(warning) The target directory must exist before attempting to export the snapshot.

(warning) In the target location, the data is no longer recognized as an HDFS snapshot. To bring back a snapshot that was previously exported, use the localimportsnapshot command rather than restoresnapshot.
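
Because the target directory must exist before the export, a wrapper can create it first. The sketch below is illustrative only: export_command is a hypothetical name, and it assumes the script runs on the host whose local or attached drive receives the export. It creates the directory if needed and prints the localexportsnapshot command to run in the CLI.

```shell
#!/bin/sh
# Hypothetical helper: make sure the export target exists, then print the
# CLI command for the export.
# $1: snapshot name, $2: target directory on a local or attached drive
export_command() {
  mkdir -p "$2" || return 1    # the CLI requires an existing directory
  echo "localexportsnapshot $1 $2"
}
```

For example, export_command store_20161111_161157 /opt/attivio/projects/myproject/snapshots/ prints the same command shown above after ensuring the directory is present.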

Import a Snapshot

Restoring a snapshot that was exported requires the localimportsnapshot command rather than the restoresnapshot command. Execute the following command via the Attivio CLI:

localimportsnapshot /opt/attivio/projects/myproject/snapshots_20170808/store_20170808_152916

ZooKeeper snapshots should not be restored in a different data center from where they were created. ZooKeeper stores IP addresses and hostnames which will naturally be different in a DR environment. Only store and index snapshots should be imported in the DR environment. The topology file of the DR project is expected to reflect the appropriate hostnames for that environment.

After restoring a DR environment from snapshots, all dictionaries, search profiles, triggers and relevancy models should be re-published, because the publish status of each of these is stored in ZooKeeper.
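
The rule that only store and index snapshots are imported in a DR environment can be enforced when generating the import commands. The following is a sketch, assuming that the export directory holds one entry per snapshot, named after the snapshot (as in the path shown above), and that ZooKeeper snapshot names start with zk_. The function name dr_import_commands is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print localimportsnapshot commands for every exported
# snapshot in a directory, skipping ZooKeeper snapshots.
# $1: directory containing exported snapshots
dr_import_commands() {
  for path in "$1"/*; do
    [ -e "$path" ] || continue           # empty directory: nothing to do
    name=$(basename "$path")
    case "$name" in
      zk_*) ;;                           # never import ZooKeeper data in DR
      *_[0-9]*_[0-9]*) echo "localimportsnapshot $path" ;;
    esac
  done
}
```

Entries that do not look like snapshot names are ignored, so unrelated files in the directory do not produce commands.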

Scheduled Backups

Backups may be scheduled using cron or a similar tool, by launching the CLI in non-interactive mode.

Any of the above commands can be executed in non-interactive mode by piping it to the CLI, as in the following example:

echo "snapshot index" | /path/to/aie/bin/aie-cli -p /path/to/project -ni
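
For cron, the invocation above can be wrapped in a small script. This is a minimal sketch: the function names, the AIE_CLI and PROJECT_DIR variables, and the script path in the crontab comment are all hypothetical, and the two default paths are the placeholders from the example above and must be replaced with real locations.

```shell
#!/bin/sh
# Hypothetical cron wrapper around the non-interactive CLI invocation.
# Override AIE_CLI and PROJECT_DIR in the environment or edit the defaults.
AIE_CLI="${AIE_CLI:-/path/to/aie/bin/aie-cli}"
PROJECT_DIR="${PROJECT_DIR:-/path/to/project}"

# Run a single CLI command in non-interactive mode.
# $1: the command, e.g. "snapshot all"
run_cli() {
  echo "$1" | "$AIE_CLI" -p "$PROJECT_DIR" -ni
}

# Snapshot indexes, store and ZooKeeper in one step.
nightly_backup() {
  run_cli "snapshot all"
}

# A script that ends by calling nightly_backup could be scheduled with a
# crontab entry such as (path is hypothetical):
#   0 2 * * * /path/to/nightly-backup.sh
```

Keeping the CLI path and project directory in variables lets the same script serve several projects, and run_cli returns the exit status of the CLI so that cron can report failures.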