
Overview

Clustered (multi-node) AIE projects use Apache™ Hadoop®, along with YARN, the Hadoop® File System (HDFS), Apache ZooKeeper™, and Apache HBase™ to create a scalable, highly available AIE configuration. A few one-time configuration steps are required to prepare YARN, HDFS, ZooKeeper, and HBase before you can run AIE projects in Hadoop. This page describes those one-time configuration steps.

Before continuing, verify your Hadoop system meets AIE base system requirements as explained on the System Requirements page under the Hadoop section.

Multi-node Semantic Data Catalog

The Semantic Data Catalog can be run in a clustered configuration to allow full SQL and BI tool access. Certain capabilities may be limited in clustered mode. See here for details.


Setting Up HDFS

A one-time configuration is required for HDFS before AIE projects can use it. You will need a directory in HDFS to hold AIE libraries and index data. By default, the directory is expected to be on HDFS under the path /attivio, but this may be changed as long as all Attivio projects have the hdfs.store.root property set to this different path.

The permissions on this directory should be such that the AIE user running the project has full read and write access.

To create this directory in HDFS, execute a command such as this:

Create HDFS Attivio Directory
sudo hdfs dfs -mkdir /attivio

For production systems, or systems that should be secure, the HDFS "/attivio" directory should have permissions that allow only the proper AIE user to have read and write access.
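For example, assuming the AIE service runs as a user named aieuser (a hypothetical account name; substitute the actual user that runs AIE), the directory could be locked down as follows:

```shell
# 'aieuser' is a hypothetical account name; substitute the user that runs AIE.
# Owner gets full access; group members get read/execute; others get nothing.
sudo hdfs dfs -chown aieuser:supergroup /attivio
sudo hdfs dfs -chmod 750 /attivio
```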

For quick setup of test systems where security is not a concern, one way to let all users read and write this directory, and create, edit, or delete any project, is to execute the following:

  • sudo hdfs dfs -chown hdfs:supergroup /attivio
  • sudo hdfs dfs -chmod 777 /attivio

Setting up HBase

AIE needs permission to create and update tables in HBase. If security is enabled on HBase, then either the user running AIE must have full HBase Admin permissions, or an administrator must create an HBase namespace for that user.

Option 1: AIE User with HBase Admin Privileges

The user running AIE is granted Read, Write, Create, and Admin permissions.  For example:

sudo su hbase
hbase shell
hbase> grant '<user>','RCWA'

The format of the namespace created by Attivio is: attivio_<project-name>_<environment-name>.

For example: attivio_myproject_test

Note the following:

  • For the default environment, the last component of the name is dropped. e.g. attivio_myproject
  • '-' and '.' characters in the project name are replaced by '_'.
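As an illustration of the naming rule above (a sketch, not part of the product tooling), a project named my-project.v2 deployed to a test environment maps to its namespace like this:

```shell
# Hypothetical shell sketch of the namespace naming convention described above.
project="my-project.v2"
environment="test"
# '-' and '.' characters in the project name are replaced by '_'
ns="attivio_$(echo "$project" | tr '.-' '__')_${environment}"
echo "$ns"
```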

Option 2: HBase Namespace Created by an Administrator

The HBase administrator will create a namespace with Read, Write, and Create privileges for the AIE user before AIE is deployed and started. The namespace must be specified in the attivio.properties file using the hbase.store.namespace property. A different namespace must be created for each project and each environment when they share the same HBase instance.
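For example, the administrator might create the namespace and grant privileges in the HBase shell (the namespace and user names below are examples, not fixed values):

```shell
# Run as the HBase admin user; 'attivio_myproject_test' and 'aieuser' are example names.
hbase shell <<'EOF'
create_namespace 'attivio_myproject_test'
grant 'aieuser', 'RWC', '@attivio_myproject_test'
EOF
```

The project would then set hbase.store.namespace=attivio_myproject_test in its attivio.properties file.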


Setting up Dedicated ZooKeeper Instance

AIE can be configured to use a dedicated ZooKeeper instance for coordination among all AIE components, both those within and outside the Hadoop cluster.

To do this, follow these steps:

  1. Install an Apache ZooKeeper instance on a system that all AIE components can access.
  2. Confirm the entries in the configuration file conf/zoo.cfg.
  3. Start the ZooKeeper server:

bin/zkServer.sh start

Once the dedicated ZooKeeper instance is running, use its host and port for the -z parameter when running aieuploadclusterinfo and in the topology.xml file for use with the aie-cli.
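For step 2, a minimal standalone zoo.cfg might look like the following (the dataDir path and clientPort value are examples; adjust them for your environment):

```
# minimal standalone ZooKeeper configuration; values are examples
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
```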

Setting up ZooKeeper

AIE needs to access various configuration settings of the Hadoop cluster and uses ZooKeeper to store these. A one-time upload of Hadoop configuration information is required before running any AIE programs against Hadoop.

Warning: If ZooKeeper, HDFS, HBase, and YARN are in separate clusters, see Single vs. Separate Hadoop Clusters for exceptions to the following configuration steps.

To do this, follow these steps:

  1. Install your version of AIE on a Linux node with access to the Hadoop cluster.
  2. Download the YARN, HDFS, and HBase client configuration files from the Hadoop cluster into a local directory, for example /opt/attivio/sitefiles. The site files can be downloaded through the applicable Hadoop cluster UI (Ambari for Hortonworks and Cloudera Manager for Cloudera):
    1. Select the service, such as YARN, HDFS, or HBase.
    2. Under 'Actions' there is an option to download the client configuration. The downloaded files will be compressed files in .zip or .tar.gz format.
    3. Store all of these files in the same directory. There is no need to uncompress them. On Cloudera clusters, these files will be named something like yarn-clientconfig.zip, hbase-clientconfig.zip, and hdfs-clientconfig.zip. On Hortonworks clusters, they will be named something like YARN_CLIENT-configs.tar.gz, HBASE_CLIENT-configs.tar.gz, and HDFS_CLIENT-configs.tar.gz.
  3. Determine the host and port of your ZooKeeper instance. For this example, we will assume it is example.com:2181.
  4. Run the aieuploadclusterinfo program as follows:
     <aie-install-dir>/bin/aieuploadclusterinfo -z example.com:2181 -d /opt/attivio/sitefiles
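On Cloudera clusters, the client configurations in step 2 can also be fetched from the Cloudera Manager REST API rather than the UI. The sketch below assumes a CM host, API version, cluster name, service name, and credentials that you would substitute with your own:

```shell
# Hypothetical host, credentials, cluster, and service names; substitute your own.
# The clientConfig endpoint returns the same .zip archive the UI download produces.
curl -u admin:admin -o /opt/attivio/sitefiles/hdfs-clientconfig.zip \
  "http://cm-host.example.com:7180/api/v19/clusters/Cluster1/services/hdfs/clientConfig"
```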


Upon successful completion of the upload, aieuploadclusterinfo will display the message "Successfully uploaded properties to ZooKeeper." 


Options for the aieuploadclusterinfo tool are:

  • -d (default: none): Specifies the directory that contains the site configuration files.
  • -z (default: none): Comma-delimited list of ZooKeeper hosts, along with their port numbers, for your Hadoop cluster. If using a dedicated ZooKeeper instance, use that host and port. Example: -z "host1.example.com:2181,host2.example.com:2181"
  • -p (default: none): Comma-separated list of key/value pairs for resolving properties that include variables in their values (used for Hortonworks). Example: -p hdp.version=HDP2.3.0.0,user.name=jdoe

If Hadoop, YARN, HDFS, or HBase configuration values change, you must re-run the aieuploadclusterinfo tool using newly downloaded configuration files so that AIE has the most up-to-date configuration settings.

Single vs. Separate Hadoop Clusters

Attivio supports integration with either a single Hadoop cluster or a deployment in which each major Hadoop component (specifically ZooKeeper, HDFS, HBase, and YARN) runs in its own cluster, provided all clusters are managed by the same KDC and KMS system.

When integrating with separate clusters for each Hadoop component, there are some exceptions to the steps outlined above:

  • Use the ZooKeeper cluster's host and port for the -z parameter when running the aieuploadclusterinfo tool and in the topology.xml file for use with the aie-cli.
  • Sitefiles: Download the HDFS client configuration from the HDFS service of the HDFS cluster, the YARN client configuration from the YARN service of the YARN cluster, and the HBase client configuration from the HBase service of the HBase cluster.
  • Do NOT include all three tars/zips as inputs to the aieuploadclusterinfo tool. Instead:
    • Create a sitefiles directory in a temporary location and insert the following into it:
      • from the HDFS site files, hdfs-site.xml and core-site.xml
      • from the HBase site files, hbase-site.xml
      • from the YARN site files, yarn-site.xml and mapred-site.xml
    • Point the aieuploadclusterinfo tool to this directory using the -d parameter.
    • All other steps remain the same as when integrating with a single cluster.
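The assembly steps above can be sketched as follows, assuming Cloudera-style archive names and a temporary directory (substitute the actual archive names from your cluster managers):

```shell
# Archive and directory names are examples; adjust them to match your downloads.
mkdir -p /tmp/aie-sitefiles
# -j drops internal archive paths so the xml files land directly in the directory
unzip -j hdfs-clientconfig.zip  '*hdfs-site.xml' '*core-site.xml'    -d /tmp/aie-sitefiles
unzip -j hbase-clientconfig.zip '*hbase-site.xml'                    -d /tmp/aie-sitefiles
unzip -j yarn-clientconfig.zip  '*yarn-site.xml' '*mapred-site.xml'  -d /tmp/aie-sitefiles
<aie-install-dir>/bin/aieuploadclusterinfo -z zkhost.example.com:2181 -d /tmp/aie-sitefiles
```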

Ready for AIE Project Creation

Once all previous steps are done, your Hadoop cluster should be ready for creating and deploying AIE projects. Follow the steps on the Create an Attivio Project on Hadoop page for project creation and configuration.

Removing AIE from the Cluster

During the early stages of project development, it is sometimes expedient to delete all derived data to return the system to a clean state. Instructions for clearing the HDFS index files and HBase utility tables are available here.