Introduction
The Cloud Support module provides components that interface with AWS to add on-premise connector support to Attivio Managed Services. On-premise, the customer uses a client installation of Attivio that transmits the output of connector execution to private AWS S3 and SQS channels. On the managed services side, an Attivio server is configured to listen for events on the SQS queue, then download and ingest content from S3 as indicated by those events. To use this feature, the cloudsupport external module must be added to an existing Attivio installation. To install the module, run:
$ATTIVIO_HOME/bin/aie-exec modulemanager -i cloudsupport-1.0.0.zip
System Requirements
Please see Cloud Support Module - Client System Requirements.
On-Premise Use
Quick Start
1. Run the onprem command to generate a sample onprem.json configuration file:
   aie-exec onprem -g > onprem.json
2. Update the onprem.json file with the customerId, accessId, and secretKey. The secretKey must be encrypted using the Attivio encryption tool; the intention is that the secretKey is never shared directly with the customer, who should have access only to the encrypted form, ensuring that the use of AWS services is constrained to Attivio code (see the example after these steps):
   aie-exec encrypt --password 928smI8N+y5QJ6xZFsniEX1eYTsDiwrmYdAaZo+2
3. Place the onprem.json file in a directory by itself; this will become the project directory. Then run the onprem command, specifying the file as the argument to the -f flag. The system may then be started using standard aie-cli commands.
   aie-exec onprem -f onprem.json
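To illustrate step 2, the relevant fields in onprem.json would then look something like the sketch below; the customerId value is a hypothetical placeholder, and the secretKey value is the encrypted string printed by aie-exec encrypt (never the plaintext secret):

{
  ...
  "customerId" : "examplecorp",
  "accessId" : "...",
  "secretKey" : "<encrypted value printed by aie-exec encrypt>",
  ...
}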
Details
The cloudsupport module adds special support for automatically configuring the system to support on-premise connector execution. A limited JSON configuration file specifying only the topology of the system and a license is all that is necessary to get started:
{ "store" : { "host" : "localhost", "port" : 16970 }, "zookeeper" : [ { "host" : "localhost", "port" : 16980 } ], "nodes" : [ { "host" : "localhost", "port" : 17000 } ], "region" : "us-east-2", "customerId" : "...", "accessId" : "...", "secretKey" : "...", // generally required only for test environments, bucket name is determined by appending bucketSuffix to customerId "bucketSuffix" : ".attivio.com", // optional, allows for advanced client configuration, such as proxy support. "clientConfig" : { ... } }
Note that in this simple configuration, the license is included and the secretKey for AWS access is provided in encrypted form. On-premise customers can control the number of nodes and the location of the store process, along with their respective ports. Note that the customer must be licensed for the onprem client feature.
Additionally, optional values for the S3 and SQS service endpoints may be provided. When provided, the default AWS endpoints for S3 and/or SQS are ignored.
... "region" : "us-east-2", "sqsServiceEndpoint" : "...", "s3ServiceEndpoint" : "...", ...
JVM Heap Memory Configuration
Starting in version 1.0.3 of the cloudsupport module, you can specify a maxMemory field in any store, zookeeper, or nodes entry in the JSON configuration file, at the same level as the host and port fields. The value of this field is the maximum JVM heap size for the specified store, zookeeper, or node process, expressed in megabytes; if not specified, it defaults to 4 GB (4096 MB).
In this example, the maximum JVM heap-size limit for the single on-premise node is increased from the default to 8 GB (8192 MB):
{ "store" : { "host" : "localhost", "port" : 16970 }, "zookeeper" : [ { "host" : "localhost", "port" : 16980 } ], "nodes" : [ { "host" : "localhost", "port" : 17000, "maxMemory" : 8192 } ], ...
This setting is not present in the sample onprem.json configuration file; it is not usually required.
AWS Client Configuration
This simplified configuration also supports modification of the AWS client configuration using clientConfig. Its fields and values correspond to the AWS ClientConfiguration class, but only the subset of fields related to proxy configuration is mapped. The full set of mapped properties can be seen in the example below.
"clientConfig" : { "proxyDomain" : "CORP", "proxyHost" : "myproxy.com", "proxyPassword" : "ROgdLg55F1/oirus32RzYQ==", "proxyPort" : 55, "proxyUsername" : "user", "proxyWorkstation" : "workstation", "nonProxyHosts" : "localhost|127.0.0.1", "https" : true }
aie-exec onprem
The new aie-exec executable onprem is used to start the system. The onprem executable is a simplified extension of the aie-cli program. It automatically generates the Attivio configuration necessary to support on-premise connector activity; no use of createproject is necessary or supported in this context. To run the onprem command, all that is needed is to start the aie-agent as normal and to provide the limited configuration file described above. For example:
aie-exec onprem -f ~/onprem/onprem.json
When the onprem command runs, it generates an Attivio configuration in the same directory as the input file. For this reason, it is recommended that the input file (onprem.json in the example) be placed in its own directory. The onprem command then provides an aie-cli prompt, with extraneous commands (such as hdfs, zk, snapshots, etc.) removed. The generated project does not contain the Admin UIs for changing workflows and other advanced actions; the standard diagnostic tools for debugging a running system are available. Note that the input file (onprem.json here) may be named anything. The Attivio project name is set to the name of the file with the suffix stripped.
Option | Description |
---|---|
-f | Specifies the onprem configuration file. Required unless -g flag is used. The project name will be set to this file name minus any suffix. |
-g | Prints a sample configuration file to the console. |
-sc | Enables simple console mode. No status bar is displayed at the bottom of the console. |
-h | Prints help. |
Supplying no arguments results in no action.
Escape hatch
Since the generated project is a standard (albeit slimmed-down) Attivio project running against a standard installation, the full complement of Attivio tools is available. aie-cli can be used directly with the project, createproject can be used to incrementally add functionality to an existing project, and so on. These tools can be used in the field to work around features missing from the cloudsupport module. However, any such need should be reported as a bug or feature request, as the intention is that such use should not be required. If this step is required, the following properties must then be configured:
Property | Value |
---|---|
cloudsupport.onprem | true |
cloudsupport.onprem.showmenus | true |
enableDynamicConfiguration | true |
projectLockTimeoutInSeconds | 30 |
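As a sketch only, assuming these are exposed as ordinary key/value properties in the project's configuration (the exact properties file and override mechanism depend on your installation), the settings would look like:

cloudsupport.onprem=true
cloudsupport.onprem.showmenus=true
enableDynamicConfiguration=true
projectLockTimeoutInSeconds=30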
Additionally, the onprem-generated project will have placed a copy of the JSON configuration file in the resources directory. Its name will always be onprem.json, regardless of the original name, and it must not be changed (the server looks specifically for this resource name). If using aie-cli to start and stop the system, this file must be modified if any changes to the clientConfig, region, or other connection information are required. Note that if the onprem command is later used, it will overwrite this file.
Connectors
Connectors in an on-premise environment are added and configured normally. The ingestWorkflow that is set for the connector will be used as the target workflow for the connector's ingestion on the Attivio cloud server. In the on-premise environment, all connector data will always be routed to the local ingest workflow, which contains the component that uploads data to AWS.
Document batch size
When configuring connectors, the Document batch size should be set depending on the type of connector being configured and the characteristics of the data. See the table below for guidance:
Connector Type | Content | Recommended Document batch size |
---|---|---|
JDBC Database | No BLOB or CLOB fields | 50 |
JDBC Database | With BLOB or CLOB fields | 1 |
CSV Files | | 50 |
All other connector types | | 1 |
Attivio Server Use
On the server side of the system, configuration is done manually. After installing the cloudsupport module and adding it to the managed services project, two steps are required:
- Add an onprem.json file to the project resources directory. This file only needs to contain the customerId, accessId, and secretKey information; other information (such as region and clientConfig) is optional. A minimal example appears below.
- Add a conf/bean/cloudcontrolchannellistener.xml file to the project. This bean defines a listener that waits for events on the customer's associated SQS queue and generates temporary connectors in response to ingest the associated data. This listener is a PeriodicGlobalRunnable, meaning that it runs on every node of the server system, with one copy as master and the others ready to take over in the event of a server node crash.
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd"
       default-lazy-init="true">
  <bean name="cloudcontrolchannellistener" class="com.attivio.cloudsupport.CloudControlChannelListener">
    <!-- The private secret key if using data encryption -->
    <property name="privateKeyFilename" value="secret.prv"/>
    <!-- customize region. only necessary if onprem.json is being overridden to define multiple listeners -->
    <property name="region" value="us-east-2"/>
  </bean>
</beans>
Once this bean is added to the system, it will use the SQS queue as a control channel, kicking off ingestion runs as the on-prem connectors post uploads. If the privateKeyFilename property is present, it should refer to an Attivio resource file in the system. See Data Encryption below for details on creating the secret.prv file.
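For reference, a minimal server-side onprem.json (step 1 above) might look like the sketch below; all values are placeholders, and region is shown only as an example of the optional additional information:

{
  "customerId" : "...",
  "accessId" : "...",
  "secretKey" : "...",
  "region" : "us-east-2"
}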
Note: The Attivio cloudsupport.onprem property must be missing or set to false on the server side of the system.
Data Encryption
The cloud support module supports AWS S3 Server-Side Encryption with Customer-Provided Encryption Keys. When activated, the data for each connector execution is encrypted with a unique key generated at upload time. This encryption key is included with the connector execution metadata that is transmitted via AWS SQS. The S3 key is encrypted using the public portion of an RSA public/private key pair. The private key is available only within the managed service server for the specific customer.
Activation of encryption support occurs when the controlEncryptionKey is added to the configuration JSON file. This encryption key is a text version of the RSA public key. A public/private key pair for data encryption can be generated using the onprem command with the --generate-key-pair argument (the key file directory must exist):
> aie-exec onprem --generate-key-pair /tmp/keys/secret.prv
Private key saved to /tmp/keys/secret.prv. Do not share with customer.
Public key (add to json configuration file):
"controlEncryptionKey":"rO0ABXNyABRqYXZhLnNlY3VyaXR5LktleVJlcL35T7OImqVDAgAETAAJYWxnb3JpdGhtdAASTGphdmEvbGFuZy9TdHJpbmc7WwAHZW5jb2RlZHQAAltCTAAGZm9ybWF0cQB+AAFMAAR0eXBldAAbTGphdmEvc2VjdXJpdHkvS2V5UmVwJFR5cGU7eHB0AANSU0F1cgACW0Ks8xf4BghU4AIAAHhwAAABJjCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAKgS1Cp23PRO1XxbjESILKE6a05t3r1XpVEw4RRunltSLgRStipCjRBIeC52OgE8EwAnh/AzcB85p66lQn5yTIGPAWWXAdyBQ7N0qRjPv3J1C5HZ1eRu9Cj0UiTmyrfZi/bwn+qnUvNBRnjTpker9WhJ4oeFXeNq7jEgR1m0bUfLF1+SiP7zUQ8GlyxIqc0+4neBbUmG/7x4xy+sZ2v4aDSoJDppoCxIBwThqRXThEFv4YdkkjSulbMHIauNof+PKqCbLWH3kT0E/M0A26YM2ddegcledd+QeYLd16jCedtl4CDVUYfgZ1D07fHKpcCLDi7yM75atTmwk4A/CUCuIvsCAwEAAXQABVguNTA5fnIAGWphdmEuc2VjdXJpdHkuS2V5UmVwJFR5cGUAAAAAAAAAABIAAHhyAA5qYXZhLmxhbmcuRW51bQAAAAAAAAAAEgAAeHB0AAZQVUJMSUM="
The /tmp/keys/secret.prv file is kept and referenced within the managed services system. The controlEncryptionKey setting is added to the JSON configuration file. See the Attivio Server Use section above for details on how to configure the server to reference the secret.prv file.
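For illustration, the public key line from the command output is pasted into the on-premise JSON configuration file, presumably as a top-level field alongside entries such as customerId (the key value below is truncated):

...
"customerId" : "...",
"controlEncryptionKey" : "rO0ABXNyABRqYXZhLnNlY3VyaXR5...",
...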
Currently, encryption cannot be enabled for use of the web crawler on-prem due to a known issue. This will be resolved in a future version.
AWS Provisioning
Using the on-premise connector requires prior creation of an S3 bucket and two SQS queues. The bucket name must follow the naming convention customerId.attiviocloud.com. The SQS queues must be named customerId and customerId-deadletter. The S3 bucket and SQS queues must be created in the same region. Given these, the following permissions may then be added to the user whose accessId is in use (the customerId is cloudsupport in this example).
{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": "s3:ListBucket", "Resource": "arn:aws:s3:::cloudsupport.attiviosupport.com" }, { "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::cloudsupport.attiviosupport.com/*" ] }, { "Sid": "VisualEditor1", "Effect": "Allow", "Action": [ "sqs:DeleteMessage", "sqs:GetQueueUrl", "sqs:ReceiveMessage", "sqs:SendMessage", "sqs:GetQueueAttributes", "sqs:ListQueueTags", "sqs:CreateQueue", "sqs:PurgeQueue", "sqs:SetQueueAttributes" ], "Resource": "arn:aws:sqs:*:*:cloudsupport*" } ] }
It is recommended to use the following settings for the primary SQS queue customerId:
- Message Retention Period: 1 day
- Redrive Policy: Checked
- Dead Letter Queue: customerId-deadletter
- Maximum Receives: 5
The Message Retention Period for the DLQ should then be set to 14 days (the maximum).
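As a sketch of how this provisioning might be scripted with the AWS CLI (the customerId cloudsupport, the region us-east-2, the account ID 123456789012, the IAM user name, and the policy file name are assumptions for illustration):

# Create the S3 bucket following the customerId.attiviocloud.com convention
aws s3api create-bucket --bucket cloudsupport.attiviocloud.com --region us-east-2 \
  --create-bucket-configuration LocationConstraint=us-east-2

# Create the dead-letter queue (14-day retention), then the primary queue
# (1-day retention, redrive to the DLQ after 5 receives)
aws sqs create-queue --queue-name cloudsupport-deadletter --region us-east-2 \
  --attributes '{"MessageRetentionPeriod":"1209600"}'
aws sqs create-queue --queue-name cloudsupport --region us-east-2 \
  --attributes '{"MessageRetentionPeriod":"86400","RedrivePolicy":"{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-2:123456789012:cloudsupport-deadletter\",\"maxReceiveCount\":\"5\"}"}'

# Attach the permissions shown above to the user whose accessId is in use
aws iam put-user-policy --user-name cloudsupport --policy-name cloudsupport-onprem \
  --policy-document file://onprem-policy.json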
AWS Client diagnostics
Detailed logging of AWS client interactions can be turned on by following these instructions. Logging output will be included in the aie-node.log file. Both the on-prem and server sides use AWS clients and will produce output when configured to do so.