Introduction

The Cloud Support module provides components which interface with AWS to provide on-premise connector support to Attivio Managed Services. On-premise, the customer uses a client installation of Attivio that transmits the output of connector execution to private AWS S3 and SQS channels. On the managed services side, an Attivio server is configured to listen for events on the SQS queue then download and ingest content from S3 as indicated by the events. To use, the cloudsupport external module must be added to an existing Attivio installation. To install the module run:

$ATTIVIO_HOME/bin/aie-exec modulemanager -i cloudsupport-1.0.0.zip

System Requirements

See Cloud Support Module - Client System Requirements for the client system requirements.

On-Premise Use

Quick Start

  1. Run the onprem command to generate a sample onprem.json configuration file:

    aie-exec onprem -g > onprem.json
  2. Update the onprem.json file with the customerId, accessId, and secretKey. The secretKey must be encrypted using the Attivio encryption tool. The intention is that the secretKey is never shared directly with the customer; they should just have access to the encrypted form, which will ensure that the use of AWS services is constrained to Attivio code.

    aie-exec encrypt --password 928smI8N+y5QJ6xZFsniEX1eYTsDiwrmYdAaZo+2
  3. Place the onprem.json file in a directory by itself. This will become the project directory. Then run the onprem command, specifying the file as the argument to the -f flag. The system may then be started using standard aie-cli commands.

    aie-exec onprem -f onprem.json

Details

The cloudsupport module adds special support for automatically configuring the system to support on-premise connector execution. A limited JSON configuration file specifying only the topology of the system and a license is all that is necessary to get started:

onprem.json
{
  "store" : {
    "host" : "localhost",
    "port" : 16970
  },
  "zookeeper" : [ {
    "host" : "localhost",
    "port" : 16980
  } ],
  "nodes" : [ {
    "host" : "localhost",
    "port" : 17000
  } ],
  "region" : "us-east-2",
  "customerId" : "...",
  "accessId" : "...",
  "secretKey" : "...",

  // generally required only for test environments, bucket name is determined by appending bucketSuffix to customerId
  "bucketSuffix" : ".attivio.com",

  // optional, allows for advanced client configuration, such as proxy support. 
  "clientConfig" : { ... }
} 


Note that in this simple configuration, the license is included and the secretKey for AWS access is provided in an encrypted form. On-premise customers can control the number of nodes and location of the store process along with their respective ports. Note, the customer must be licensed for the onprem client feature. 
Additionally, optional service endpoints for S3 and SQS may be provided. When provided, they are used in place of the default AWS endpoints for S3 and/or SQS.

onprem.json
...
"region" : "us-east-2",
"sqsServiceEndpoint" : "...",
"s3ServiceEndpoint" : "...",
...
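
Before running the onprem command, it can help to sanity-check the file. The following is a minimal Python sketch, not part of the cloudsupport module. It assumes the file has been reduced to strict JSON (the // comments removed and the clientConfig placeholder filled in or dropped). The helper names, and the choice to treat "..." as an unfilled value, are illustrative assumptions. The bucket name is derived by appending bucketSuffix to customerId, as described in the sample file; the caller supplies the fallback suffix because this sketch does not assume what the module uses when bucketSuffix is absent.

```python
import json

# Fields the Quick Start says must be filled in by hand.
CREDENTIAL_FIELDS = ("customerId", "accessId", "secretKey")


def missing_credentials(config):
    """Return credential fields that are absent, empty, or still the "..." placeholder."""
    return [
        name
        for name in CREDENTIAL_FIELDS
        if config.get(name) in (None, "", "...")
    ]


def bucket_name(config, default_suffix):
    """Append bucketSuffix to customerId.

    default_suffix is a caller-supplied assumption, used only when the
    configuration has no bucketSuffix field.
    """
    return config["customerId"] + config.get("bucketSuffix", default_suffix)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    sample = json.loads("""
    {
      "region": "us-east-2",
      "customerId": "acme",
      "accessId": "...",
      "secretKey": "...",
      "bucketSuffix": ".attivio.com"
    }
    """)
    print("unfilled fields:", missing_credentials(sample))
    print("bucket:", bucket_name(sample, default_suffix=".example.invalid"))
```

A check like this only catches unfilled placeholders; it does not verify that the credentials are valid or that the secretKey is in its encrypted form.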

JVM Heap Memory Configuration

Starting in version 1.0.3 of the cloudsupport module, you can specify a maxMemory field in any store, zookeeper, or nodes entry in the JSON configuration file, at the same level as the host and port fields. The value of this field is the maximum JVM heap size for the specified store, zookeeper, or node process, expressed in megabytes; if not specified, it defaults to 4 GB (4096 MB).

In this example, the maximum JVM heap-size limit for the single on-premise node is increased from the default to 8 GB (8192 MB):

onprem.json
{
  "store" : {
    "host" : "localhost",
    "port" : 16970
  },
  "zookeeper" : [ {
    "host" : "localhost",
    "port" : 16980
  } ],
  "nodes" : [ {
    "host" : "localhost",
    "port" : 17000,
    "maxMemory" : 8192
  } ],
...

This setting is not present in the sample onprem.json configuration file; it is not usually required.
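
The defaulting rule can be sketched in a few lines of Python. This is an illustration of the documented behavior only, not the module's implementation; the function name heap_limits and the tuple layout it returns are assumptions made for the example.

```python
DEFAULT_MAX_MEMORY_MB = 4096  # documented default: 4 GB


def heap_limits(config):
    """List (kind, host, port, max heap in MB) for every store, zookeeper, and node entry."""
    entries = []
    if "store" in config:
        # "store" is a single object, while "zookeeper" and "nodes" are arrays.
        entries.append(("store", config["store"]))
    for kind in ("zookeeper", "nodes"):
        for entry in config.get(kind, []):
            entries.append((kind, entry))
    return [
        (kind, entry["host"], entry["port"],
         entry.get("maxMemory", DEFAULT_MAX_MEMORY_MB))
        for kind, entry in entries
    ]


if __name__ == "__main__":
    config = {
        "store": {"host": "localhost", "port": 16970},
        "zookeeper": [{"host": "localhost", "port": 16980}],
        "nodes": [{"host": "localhost", "port": 17000, "maxMemory": 8192}],
    }
    for row in heap_limits(config):
        print(row)
```

With the example configuration above, only the node entry carries an explicit limit; the store and zookeeper entries fall back to 4096 MB.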

AWS Client Configuration

This simplified configuration also supports modification of the AWS client configuration using clientConfig. All fields and values correspond to the AWS ClientConfiguration class. Only the subset of fields related to proxy configuration is mapped. The full set of mapped properties can be seen in the example below.

onprem.json
  "clientConfig" : {
    "proxyDomain" : "CORP",
    "proxyHost" : "myproxy.com",
    "proxyPassword" : "ROgdLg55F1/oirus32RzYQ==",
    "proxyPort" : 55,
    "proxyUsername" : "user",
    "proxyWorkstation" : "workstation",
    "nonProxyHosts" : "localhost|127.0.0.1",
    "https" : true
  }

aie-exec onprem

The new aie-exec executable onprem is used to start the system. The onprem executable is a simplified extension of the aie-cli program. It automatically generates the Attivio configuration necessary to support on-premise connector activity. No use of createproject is necessary or supported in this context. To run the onprem command, all that is needed is a running aie-agent (started as normal) and the limited configuration file above. For example:

aie-exec onprem -f ~/onprem/onprem.json

When the onprem command runs, it generates an Attivio configuration in the same directory as the input file. For this reason it is recommended that the input file (onprem.json in the example) be placed in its own directory. The onprem command then provides an aie-cli prompt, with extraneous commands (such as hdfs, zk, snapshots, etc.) removed. The generated project does not contain the Admin UIs for changing workflows and other advanced actions. The standard diagnostic tools for debugging a running system are available. Note that the input file may be named anything; onprem.json is only an example. The Attivio project name is set to the name of the file with the suffix stripped.
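
As a rough illustration of the naming rule, the Python sketch below derives the project name and project directory from the path passed to -f. It is not the module's code. It assumes "suffix" means the final file extension, which is what pathlib's stem strips; how the onprem command treats names with several dots is not stated here.

```python
from pathlib import Path


def project_layout(config_file):
    """Return the project name and directory implied by an onprem input file path."""
    path = Path(config_file)
    return {
        # File name with the final suffix stripped, e.g. "acme.json" -> "acme".
        "project_name": path.stem,
        # The configuration is generated next to the input file.
        "project_dir": path.parent,
    }


if __name__ == "__main__":
    layout = project_layout("/opt/onprem/acme.json")  # hypothetical path
    print(layout["project_name"], layout["project_dir"])
```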

Option  Description
-f      Specifies the onprem configuration file. Required unless the -g flag is used. The project name will be set to this file name minus any suffix.
-g      Prints a sample configuration file to the console.
-sc     Enables simple console mode. No status bar is displayed at the bottom of the console.
-h      Prints help.

Supplying no arguments results in no action.

Escape hatch

Since the generated project is a standard (albeit slimmed down) Attivio project running against a standard installation, the full complement of Attivio tools is available. aie-cli can be used directly with the project, createproject can be used to incrementally add functionality to an existing project, etc. These tools can be used to overcome missing features in the cloudsupport module in the field. However, any such need should be reported as a bug or feature request, as the intention is that such use should not be required. If this step is required, the following properties must then be configured:

Property                        Value
cloudsupport.onprem             true
cloudsupport.onprem.showmenus   true
enableDynamicConfiguration      true
projectLockTimeoutInSeconds     30

Additionally, the onprem command will have placed a copy of the JSON configuration file in the resources directory of the generated project. Its name will always be onprem.json, regardless of the original name, and it must not be changed (the server looks specifically for this resource name). If aie-cli is used to start and stop the system, this file must be modified whenever changes to the clientConfig, region, or other connection information are required. Note that if the onprem command is used later, it will overwrite this file.

Connectors

Connectors in an on-premise environment are added and configured normally. The ingestWorkflow that is set for the connector will be used as the target workflow for the connector ingestion on the Attivio cloud server. In the on-premise environment, all connector data will always be routed to the local ingest workflow. The local ingest workflow contains the component which uploads data to AWS.

Document batch size

When configuring connectors, the Document batch size should be set depending on the type of connector being configured and the characteristics of the data. See the table below for guidance:

Connector Type             Content                    Recommended Document batch size
JDBC Database              No BLOB or CLOB fields     50
JDBC Database              With BLOB or CLOB fields   1
CSV Files                  -                          50
All other connector types  -                          1
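
The guidance in the table can be written as a small lookup, shown here as a hedged Python sketch. The connector type labels "jdbc" and "csv" and the has_lob_fields flag are hypothetical names chosen for the example; they are not identifiers used by Attivio.

```python
def recommended_batch_size(connector_type, has_lob_fields=False):
    """Return the recommended Document batch size for a connector.

    connector_type: "jdbc", "csv", or any other label (hypothetical names).
    has_lob_fields: True if a JDBC source returns BLOB or CLOB fields.
    """
    if connector_type == "jdbc":
        # Large binary or character fields call for one document per batch.
        return 1 if has_lob_fields else 50
    if connector_type == "csv":
        return 50
    # All other connector types.
    return 1


if __name__ == "__main__":
    print(recommended_batch_size("jdbc"))
    print(recommended_batch_size("jdbc", has_lob_fields=True))
    print(recommended_batch_size("filescanner"))
```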

Attivio Server Use

On the server side of the system, configuration is done manually. After installing the cloudsupport module and adding it to the managed services project, two steps are required:

  1. Add an onprem.json file to the project resources directory. This file only needs to contain the customerId, accessId, and secretKey information. Other information (such as region and clientConfig) is optional.
  2. Add a conf/bean/cloudcontrolchannellistener.xml file to the project. This bean defines a listener that waits for events on the customer's associated SQS queue and, in response, generates temporary connectors to ingest the associated data. This listener is a PeriodicGlobalRunnable, meaning that it runs on every node of the server system, with one copy as master and the others ready to take over in the event of a server node crash.
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd"
       default-lazy-init="true"  >
  <bean name="cloudcontrolchannellistener" class="com.attivio.cloudsupport.CloudControlChannelListener">
     <!-- The private secret key if using data encryption -->
     <property name="privateKeyFilename" value="secret.prv"/>
     <!-- customize region.  only necessary if onprem.json is being overridden to define multiple listeners -->
     <property name="region" value="us-east-2"/>
  </bean>
</beans>

Once this bean is added to the system, it will use the SQS queue as a control channel, kicking off ingestion runs as the on-prem connectors post uploads. If the privateKeyFilename property is present, it should refer to an Attivio resource file in the system. See Data Encryption below for details on creation of the secret.prv file.

Note: The Attivio cloudsupport.onprem property must be absent or set to false on the server side of the system.

Data Encryption

The cloud support module supports AWS S3 Server Side Encryption with Customer Provided Encryption Keys. When activated, the data for each connector execution is encrypted with a unique key generated at upload time. This encryption key is included with the connector execution metadata that is transmitted via AWS SQS. The S3 key is encrypted using the public portion of an RSA public/private key pair. The private key is available only within the managed service server for the specific customer.
Encryption support is activated when the controlEncryptionKey is added to the JSON configuration file. This encryption key is a text version of the RSA public key. A public/private key pair for data encryption can be generated using the onprem command with the --generate-key-pair argument (the key file directory must exist):

> aie-exec onprem --generate-key-pair /tmp/keys/secret.prv
Private key saved to /tmp/keys/secret.prv.  Do not share with customer.
Public key (add to json configuration file):
"controlEncryptionKey":"rO0ABXNyABRqYXZhLnNlY3VyaXR5LktleVJlcL35T7OImqVDAgAETAAJYWxnb3JpdGhtdAASTGphdmEvbGFuZy9TdHJpbmc7WwAHZW5jb2RlZHQAAltCTAAGZm9ybWF0cQB+AAFMAAR0eXBldAAbTGphdmEvc2VjdXJpdHkvS2V5UmVwJFR5cGU7eHB0AANSU0F1cgACW0Ks8xf4BghU4AIAAHhwAAABJjCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAKgS1Cp23PRO1XxbjESILKE6a05t3r1XpVEw4RRunltSLgRStipCjRBIeC52OgE8EwAnh/AzcB85p66lQn5yTIGPAWWXAdyBQ7N0qRjPv3J1C5HZ1eRu9Cj0UiTmyrfZi/bwn+qnUvNBRnjTpker9WhJ4oeFXeNq7jEgR1m0bUfLF1+SiP7zUQ8GlyxIqc0+4neBbUmG/7x4xy+sZ2v4aDSoJDppoCxIBwThqRXThEFv4YdkkjSulbMHIauNof+PKqCbLWH3kT0E/M0A26YM2ddegcledd+QeYLd16jCedtl4CDVUYfgZ1D07fHKpcCLDi7yM75atTmwk4A/CUCuIvsCAwEAAXQABVguNTA5fnIAGWphdmEuc2VjdXJpdHkuS2V5UmVwJFR5cGUAAAAAAAAAABIAAHhyAA5qYXZhLmxhbmcuRW51bQAAAAAAAAAAEgAAeHB0AAZQVUJMSUM="

The /tmp/keys/secret.prv file is kept and referenced within the managed services system. The controlEncryptionKey setting is added to the json configuration file. See Attivio Server Use section above for details on how to configure the server to reference the secret.prv file.

Currently, encryption cannot be enabled for use of the web crawler on-prem due to a known issue. This will be resolved in a future version.


AWS Provisioning

Using the on-premise connector requires prior creation of an S3 bucket and two SQS queues. The bucket name must follow the naming convention of customerId.attiviocloud.com. The SQS queues must be named customerId and customerId-deadletter. The S3 bucket and SQS queues must be created in the same region. Given these, the following permissions may then be added to the user whose accessId is in use (customerId is cloudsupport in this example).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::cloudsupport.attiviosupport.com"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::cloudsupport.attiviosupport.com/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "sqs:DeleteMessage",
                "sqs:GetQueueUrl",
                "sqs:ReceiveMessage",
                "sqs:SendMessage",
                "sqs:GetQueueAttributes",
                "sqs:ListQueueTags",
                "sqs:CreateQueue",
                "sqs:PurgeQueue",
                "sqs:SetQueueAttributes"
            ],
            "Resource": "arn:aws:sqs:*:*:cloudsupport*"
        }
    ]
}
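
Since the policy differs between customers only in the customerId, it can be generated rather than edited by hand. The Python sketch below reproduces the statements of the example policy above for a given customerId. The function name onprem_policy is an assumption, the bucket suffix is passed in by the caller rather than assumed, and the optional Sid fields are omitted.

```python
import json


def onprem_policy(customer_id, bucket_suffix):
    """Build the example IAM policy document for one customer."""
    bucket_arn = f"arn:aws:s3:::{customer_id}{bucket_suffix}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                # Object operations apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
            {
                # The trailing wildcard covers both the primary queue and
                # the -deadletter queue.
                "Effect": "Allow",
                "Action": [
                    "sqs:DeleteMessage",
                    "sqs:GetQueueUrl",
                    "sqs:ReceiveMessage",
                    "sqs:SendMessage",
                    "sqs:GetQueueAttributes",
                    "sqs:ListQueueTags",
                    "sqs:CreateQueue",
                    "sqs:PurgeQueue",
                    "sqs:SetQueueAttributes",
                ],
                "Resource": f"arn:aws:sqs:*:*:{customer_id}*",
            },
        ],
    }


if __name__ == "__main__":
    policy = onprem_policy("cloudsupport", ".attiviosupport.com")
    print(json.dumps(policy, indent=4))
```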


It is recommended to use the following settings for the primary SQS queue customerId:

  * Message Retention Period: 1 day
  * Redrive Policy: Checked
  * Dead Letter Queue: customerId-deadletter
  * Maximum Receives: 5

The Message Retention Period for the DLQ should then be set to 14 days (the maximum).
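
For scripted provisioning, the recommended settings translate into SQS queue attributes. The Python sketch below only builds the attribute maps; it makes no AWS calls. MessageRetentionPeriod (in seconds) and RedrivePolicy (a JSON string with deadLetterTargetArn and maxReceiveCount) are standard SQS attribute names, while the function names and the example ARN are assumptions made for illustration.

```python
import json

SECONDS_PER_DAY = 24 * 60 * 60


def queue_names(customer_id):
    """Return the (primary, dead letter) queue names for a customer."""
    return customer_id, f"{customer_id}-deadletter"


def dead_letter_attributes():
    """Attributes for the dead letter queue: 14 day retention (the maximum)."""
    return {"MessageRetentionPeriod": str(14 * SECONDS_PER_DAY)}


def primary_attributes(dead_letter_arn, max_receives=5):
    """Attributes for the primary queue: 1 day retention plus a redrive policy."""
    return {
        "MessageRetentionPeriod": str(1 * SECONDS_PER_DAY),
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dead_letter_arn,
                "maxReceiveCount": max_receives,
            }
        ),
    }


if __name__ == "__main__":
    primary, dead_letter = queue_names("cloudsupport")
    # Hypothetical ARN; the real one is returned by AWS once the
    # dead letter queue exists.
    arn = f"arn:aws:sqs:us-east-2:123456789012:{dead_letter}"
    print(dead_letter, dead_letter_attributes())
    print(primary, primary_attributes(arn))
```

The dead letter queue has to be created first, because the primary queue's redrive policy refers to its ARN. Maps like these could then be passed as the queue attributes to whichever AWS SDK or CLI is used for provisioning.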

AWS Client diagnostics

Detailed logging of AWS client interactions can be turned on by following these instructions. Logging output will be included in the aie-node.log file. Both the on-prem and server side use AWS clients and will produce output when configured to do so.