
This page applies to single-node, unclustered systems.

For clustered system restore, see Backup and Restore.

Overview

The Attivio Platform can be configured to back up the index by running a system command through a Backup Service. The Backup Service guarantees that the backup command is executed within the context of a commit, so the Attivio index is in a known and consistent state. Backups can also be scheduled to run periodically, for example during off-peak usage times.

The backup mechanism consists of copying the Attivio index directory tree to another location using rsync (Linux), robocopy (Windows) or a script/executable specified by the user.

 

Required Modules

These features require that the indexbackup module be included when you run createproject to create the project directories.


Why not just back up the data-agent directory?

The Attivio index is a complex, multi-part system subject to constant updates. The most recently ingested documents are held in memory and have not yet been written to disk, so simply copying the data files at an arbitrary moment does not capture a consistent and complete view of the index. The Attivio Backup Service commits recent changes and puts the indexes into a restorable state before copying them.


In many cases it is useful to use a property value as a source or destination for backup operations. To ensure that the correct path separators (forward slash or backslash) are used, the $PATH property evaluator can be used in configuration files.

Using the indexbackup Module

The indexbackup module consists of two separate services that together manage backups of a live index. To use them, configure a Backup Connector and a Backup Service to match the operating system's backup commands and the Attivio index configuration. See the configuration samples below for details.

Backup Service

The BackupService runs the system command (rsync, robocopy, etc.) that performs the backup. It is a normal service that is also configured as a notification endpoint for an index engine. When the AttivioEngine receives a Backup message from a Backup Connector, the engine notifies the Backup Service to run the backup command once it has finished handling the Backup message.

Environment Variables for Backup Service Configuration

Attivio sets three backup-related environment variables that can be referenced in your Backup Service configuration. When a backup runs, each reference is replaced with the value shown below for the index partition being backed up:

AIE_BACKUP_DATA_DIR (reference as $ENV{AIE_BACKUP_DATA_DIR})
The path to the index partition's data directory, used as the source directory for rsync or robocopy backup commands.
Example: /attivio/data-agent/projects/MyProject/staging/data/index/index-part0

AIE_BACKUP_ENGINE_NAME (reference as $ENV{AIE_BACKUP_ENGINE_NAME})
The name of the index partition's engine, used to construct backup destination directory paths.
Examples: index.writer or index.reader

AIE_BACKUP_INDEX_NAME (reference as $ENV{AIE_BACKUP_INDEX_NAME})
The name of the index partition being backed up, used to construct backup destination directory paths.
Example: index-part0

By abstracting out the settings that are partition-specific, these variables enable you to configure an rsync or robocopy command that will work correctly for all master index partitions.
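As an illustration, the argument list below is a variant of the Linux rsync example on this page. It also uses AIE_BACKUP_ENGINE_NAME, so that partitions belonging to different engines are written to separate destination directories. Treat it as a sketch rather than a shipped configuration. The host myAttivioBackupHost and the path /attivio/backups are the example values from the Linux instructions. It also assumes, as the AIE_BACKUP_DATA_DIR example suggests, that the data directory path has no trailing slash. In that case rsync creates a directory named after the partition (for example index-part0) inside the destination.

<list name="argumentList">
  <entry value="-a" />
  <entry value="--delete" />
  <!-- source, e.g. /attivio/data-agent/projects/MyProject/staging/data/index/index-part0 -->
  <entry value="$ENV{AIE_BACKUP_DATA_DIR}" />
  <!-- destination, e.g. myAttivioBackupHost:/attivio/backups/index.writer/ (rsync creates index-part0 inside it) -->
  <entry value="myAttivioBackupHost:/attivio/backups/$ENV{AIE_BACKUP_ENGINE_NAME}/" />
</list>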

Example Linux Configuration

Create a new configuration file in the <project-dir>/conf/components directory. It can have any name, but we recommend an intuitive name such as backupService.xml. Insert the following example content into the file, replacing <HOST> and <PATH> with appropriate values for the server to which you want the backup to be written (for example, replace <HOST> with myAttivioBackupHost and <PATH> with /attivio/backups):

<component xmlns="http://www.attivio.com/configuration/type/componentType" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="backupService" class="com.attivio.platform.service.BackupService" xsi:schemaLocation="http://www.attivio.com/configuration/type/componentType http://www.attivio.com/configuration/type/componentType.xsd">

  <properties>
    <!-- name of the index feature to provide backup for -->
    <property name="index" value="index"/>
    <property name="exec" value="rsync" />
    <list name="argumentList">
      <entry value="-a" />
      <entry value="--delete" />
      <entry value="$ENV{AIE_BACKUP_DATA_DIR}" />
      <!-- replace <HOST> and <PATH> with appropriate values for your backup server -->
      <entry value="<HOST>:/<PATH>/$ENV{AIE_BACKUP_INDEX_NAME}/" />
    </list>
    <property name="workingDirectory" value="${attivio.project}" />
  </properties>
</component>

Note: on some systems, the full path to the command used by the Backup Service may need to be specified. For example, on a Linux system the following may be required:

<property name="exec" value="/usr/local/bin/rsync" />

Example Windows Configuration

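Create a new configuration file in the <project-dir>\conf\components directory. It can have any name, but we recommend an intuitive name such as backupService.xml. Insert the following example content into the file, replacing the destination path C:\attivio-backup\factbook with the directory to which you want the backup to be written: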

<component xmlns="http://www.attivio.com/configuration/type/componentType" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="backupService" class="com.attivio.platform.service.BackupService" xsi:schemaLocation="http://www.attivio.com/configuration/type/componentType http://www.attivio.com/configuration/type/componentType.xsd">

  <properties>
    <!-- name of the index feature to provide backup for -->
    <property name="index" value="index"/>
    <property name="exec" value="robocopy" />
    <list name="argumentList">
      <entry value="$ENV{AIE_BACKUP_DATA_DIR}" />
      <entry value="C:\attivio-backup\factbook\$ENV{AIE_BACKUP_INDEX_NAME}\" />
      <entry value="/MIR" />
    </list>
    <property name="workingDirectory" value="${attivio.project}" />
  </properties>
</component>

Note: robocopy requires the /MIR switch to prevent obsolete segment files from accumulating in the destination directory.

Service Definition

Once the Backup Service component is defined, an Attivio service must be defined in the <PROJECT-DIR>/conf/services.xml file so that the Backup Service is started with Attivio.

<PROJECT-DIR>/conf/services.xml
<services>
  <service name="backupService" />
</services>

To back up multiple index features, create multiple Backup Service components, one for each index feature you want to back up.
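For example, suppose the project has a second index feature named archiveIndex (a hypothetical name). A second component file could then look like the sketch below, which follows the Linux example above. The component name backupServiceArchive and the destination directory are also illustrative. The component needs its own name, its index property must name the second feature, and it writes to a separate destination so the two features' backups are kept apart. Both services are then listed in services.xml.

<component xmlns="http://www.attivio.com/configuration/type/componentType" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="backupServiceArchive" class="com.attivio.platform.service.BackupService" xsi:schemaLocation="http://www.attivio.com/configuration/type/componentType http://www.attivio.com/configuration/type/componentType.xsd">

  <properties>
    <!-- hypothetical second index feature to provide backup for -->
    <property name="index" value="archiveIndex"/>
    <property name="exec" value="rsync" />
    <list name="argumentList">
      <entry value="-a" />
      <entry value="--delete" />
      <entry value="$ENV{AIE_BACKUP_DATA_DIR}" />
      <!-- illustrative destination, kept separate from the first service's backups -->
      <entry value="myAttivioBackupHost:/attivio/backups/archiveIndex/$ENV{AIE_BACKUP_INDEX_NAME}/" />
    </list>
    <property name="workingDirectory" value="${attivio.project}" />
  </properties>
</component>

<PROJECT-DIR>/conf/services.xml
<services>
  <service name="backupService" />
  <service name="backupServiceArchive" />
</services>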

Other notes:

  • The script or process that performs the backup should be synchronous: when the command returns, the entire backup must be complete and all reads from the index directory must have finished.
  • On Linux hosts, the user account under which the node/engine is running must have the execute permission on the backup script/command.
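The backup command can also be a script or executable of your own rather than rsync or robocopy. The sketch below shows such a configuration. It assumes a hypothetical script at /opt/attivio/scripts/backup-index.sh that takes the data directory, engine name, and index name as positional arguments. Per the notes above, the script should exit only after the copy has completely finished, and the account running Attivio needs execute permission on it.

<component xmlns="http://www.attivio.com/configuration/type/componentType" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="backupService" class="com.attivio.platform.service.BackupService" xsi:schemaLocation="http://www.attivio.com/configuration/type/componentType http://www.attivio.com/configuration/type/componentType.xsd">

  <properties>
    <!-- name of the index feature to provide backup for -->
    <property name="index" value="index"/>
    <!-- hypothetical user-supplied script; it must not return until the backup is complete -->
    <property name="exec" value="/opt/attivio/scripts/backup-index.sh" />
    <list name="argumentList">
      <!-- partition-specific values passed to the script as positional arguments -->
      <entry value="$ENV{AIE_BACKUP_DATA_DIR}" />
      <entry value="$ENV{AIE_BACKUP_ENGINE_NAME}" />
      <entry value="$ENV{AIE_BACKUP_INDEX_NAME}" />
    </list>
    <property name="workingDirectory" value="${attivio.project}" />
  </properties>
</component>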

Backup Connector

Note that a Backup Connector is not a stand-alone mechanism. The connector simply sends a Backup message to the Attivio Backup Service via a workflow. The Backup Service must be configured (as shown above) before the Backup Connector can function.

The BackupConnector sends a Backup message through an ingest workflow to trigger a backup. It takes an option to force a commit before the backup is performed. Like other connectors, it can be scheduled through the normal connector scheduling mechanisms, so backups can follow a policy such as "Run backup at 2am every weekday."

To create a Backup Connector, navigate to the Connectors page of the Attivio Administrator and click the New link. Select Backup Connector from the New Connector dialog box. A connector editor will open.

Give the connector a name. To bypass most ingestion stages and reduce overhead for the connector, you can change the default ingest workflow to indexer.

Note that the editor's Scheduler tab lets you trigger a backup event on a regular schedule (see below).

Triggering a Backup

Manually, via Connector

If a Backup Connector is configured, the Administrator UI can be used to start that connector from the Connectors page.

Via the Scheduler

You can configure a schedule for the Backup Connector in the Scheduler tab of its editor to run backups on a timed basis. See "Scheduling Connectors" in the Scheduling Tasks page for more details on scheduling the connector.

Programmatically, via Java SDK

Attivio provides Java APIs that allow users to trigger a backup. One approach is to use the Attivio APIs to write a Java executable that can be run from the command line. The executable can be written to return a success or failure signal.

The relevant Javadoc is located at https://attivio.github.io/sdk-5.5-javadoc/com/attivio/sdk/client/ConnectorControlApi.html.

The following page has examples of using the ServiceFactory from within and outside Attivio:
https://github.com/attivio/sdk/blob/5.5/service_factory.md

While connectors are running, users can monitor System Events using https://attivio.github.io/sdk-5.5-javadoc/com/attivio/sdk/event/EventQuery.html. An EventStoreApi client can be obtained from the ServiceFactory:

EventStoreApi eventClient = ServiceFactory.getService(EventStoreApi.class);

Single-Node Restore

Restoration of a backed-up Attivio index in a single-node system requires these steps:

  1. Ensure the system is not ingesting content.
  2. Navigate to the Indexes page from the Admin UI.
  3. Delete the index. In the confirmation dialog that opens, type "index" into the text box and click OK.
  4. Shut down Attivio. This lets you overwrite the index files.
  5. Remove the index directory from the Project Index Directory, for example: /opt/attivio/data-agent/projects/<project-name>/default/data/
  6. Copy the contents of the backed-up index directory created by the backup into the Project Index Directory.
  7. Restart Attivio.
  8. Navigate back to the Indexes page in the Admin UI and verify that the index reflects the rollback.
  9. Reload any sources that were ingested after the date of the backup.

 
