The Cloud Support module provides components that interface with AWS to give Attivio Managed Services on-premise connector support. On-premise, the customer runs a client installation of Attivio that transmits the output of connector execution to private AWS S3 and SQS channels. On the managed services side, an Attivio server is configured to listen for events on the SQS queue, then download and ingest content from S3 as indicated by those events.
To use it, the cloudsupport external module must be added to an existing Attivio installation. To install the module, run:
Please follow Cloud Support Module - Client System Requirements
Use the onprem command to generate a sample onprem.json file with the customerId, accessId, and secretKey. The secretKey must be encrypted using the Attivio encryption tool. The secretKey is never meant to be shared directly with the customer; the customer should only have access to the encrypted form, which ensures that use of the AWS services is constrained to Attivio code.
Place the onprem.json file in a directory by itself. This will become the project directory. Then run the onprem command, specifying the file as the argument to the -f flag. The system may then be started using standard
The cloudsupport module adds special support for automatically configuring the system to support on-premise connector execution. A limited JSON configuration file specifying only the topology of the system and a license is all that is necessary to get started:
Note that in this simple configuration, the license is included and the secretKey for AWS access is provided in an encrypted form. On-premise customers can control the number of nodes and the location of the store process, along with their respective ports. Note that the customer must be licensed for the onprem client feature.
Additionally, optional values for the S3 and SQS service endpoints may be provided. When provided, they are used in place of the default AWS endpoints for S3 and/or SQS.
JVM Heap Memory Configuration
Starting in version 1.0.3 of the cloudsupport module, you can specify a maxMemory field in any nodes entry in the JSON configuration file, at the same level as the port fields. The value of this field is the maximum JVM heap size for the specified store, zookeeper, or node process, expressed in megabytes; if not specified, it defaults to 4 GB (4096 MB).
In this example, the maximum JVM heap-size limit for the single on-premise node is increased from the default to 8 GB (8192 MB):
This setting is not present in the sample onprem.json configuration file; it is not usually required.
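The megabyte arithmetic and the default described above can be checked with a minimal Python sketch. Only the maxMemory field name, the megabyte unit, and the 4096 MB default come from this page; the helper names and the shape of the example entries (including the port value) are assumptions made for illustration, not the module's actual code.

```python
# Documented default: 4 GB, expressed in megabytes.
DEFAULT_MAX_MEMORY_MB = 4096


def gb_to_mb(gigabytes: int) -> int:
    """Convert whole gigabytes to the megabyte value expected by maxMemory."""
    return gigabytes * 1024


def effective_max_memory_mb(entry: dict) -> int:
    """Return the heap limit for one process entry (hypothetical helper).

    Falls back to the documented default when maxMemory is absent.
    """
    value = entry.get("maxMemory", DEFAULT_MAX_MEMORY_MB)
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"maxMemory must be a positive integer (MB), got {value!r}")
    return value


# Hypothetical entries: the port value is a placeholder, not a documented default.
node_without_limit = {"port": 17000}
node_with_8_gb = {"port": 17000, "maxMemory": gb_to_mb(8)}

print(effective_max_memory_mb(node_without_limit))  # 4096
print(effective_max_memory_mb(node_with_8_gb))      # 8192
```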
AWS Client Configuration
This simplified configuration also supports modification of the AWS client configuration using clientConfig. All fields and values correspond to the AWS ClientConfiguration class. Only the subset of fields related to proxy configuration is mapped. The full set of mapped properties can be seen in the example below.
onprem is used to start the system. The onprem executable is a simplified extension of the aie-cli program. It automatically generates the Attivio configuration necessary to support on-premise connector activity. Use of createproject is neither necessary nor supported in this context. To run the onprem command, all that is needed is a running aie-agent, started as normal, and the limited configuration file described above. For example:
When the onprem command runs, it generates an Attivio configuration in the same directory as the input file. For this reason, it is recommended that the input file (onprem.json in the example) be placed in its own directory. The onprem command then provides an aie-cli prompt, with extraneous commands (such as hdfs, zk, snapshots, etc.) removed. The generated project does not contain the Admin UIs for changing workflows and other advanced actions. The standard diagnostic tools for debugging a running system are available. Note that the input file onprem.json may be named anything. The Attivio project name is set to the name of the file with the suffix stripped.
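The naming rule above can be illustrated with a short Python sketch. It assumes that "suffix" means the final file extension only; how onprem treats names with several dots is not stated here, so that case is an assumption. The function names and example paths are hypothetical.

```python
from pathlib import PurePosixPath


def project_name(config_file: str) -> str:
    """Project name: the configuration file name with its last suffix stripped."""
    return PurePosixPath(config_file).stem


def project_directory(config_file: str) -> str:
    """The generated configuration lands in the directory holding the input file."""
    return str(PurePosixPath(config_file).parent)


# Hypothetical paths, used only to show the rule.
print(project_name("/opt/projects/onprem/onprem.json"))      # onprem
print(project_name("/opt/projects/acme/acme.json"))          # acme
print(project_directory("/opt/projects/acme/acme.json"))     # /opt/projects/acme
```

Keeping each configuration file in its own directory, as recommended above, means the value returned by project_directory is dedicated to a single generated project.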
Specifies the onprem configuration file. Required unless the -g flag is used. The project name will be set to this file name minus any suffix.
Prints a sample configuration file to the console.
Enables simple console mode. No status bar is displayed at the bottom of the console.
Supplying no arguments results in no action.
Since the generated project is a standard (albeit slimmed-down) Attivio project running against a standard installation, the full complement of Attivio tools is available. For example, aie-cli can be used directly with the project, and createproject can be used to incrementally add functionality to an existing project. These tools can be used in the field to work around features missing from the cloudsupport module. However, any such need should be reported as a bug or feature request, because the intention is that such use should not be required. If this step is required, the following properties must then be configured:
The onprem generated project will have placed a copy of the JSON configuration file in the resources directory. Its name will always be onprem.json, regardless of the original name, and it must not be changed (the server looks specifically for this resource name). If aie-cli is used to start and stop the system, this file must be modified whenever the region or other connection information needs to change. Note that if the onprem command is used later, it will overwrite this file.
Connectors in an on-premise environment are added and configured normally. The ingestWorkflow that is set for the connector will be used as the target workflow for the connector ingestion on the Attivio cloud server. In the on-premise environment, all connector data will always be routed to the local ingest workflow. The local ingest workflow contains the component that uploads data to AWS.
Document batch size
When configuring connectors, the Document batch size should be set depending on the type of connector being configured and the characteristics of the data. See the table below for guidance:
Recommended Document batch size
No BLOB or CLOB fields
With BLOB or CLOB fields
All other connector types
Attivio Server Use
On the server side of the system, configuration is done manually. After installing the cloudsupport module and adding it to the managed services project, two steps are required:
- Add an onprem.json file to the project resources directory. This file only needs to contain the secretKey information. Other information (such as clientConfig) is optional.
- Add a conf/bean/cloudcontrolchannellistener.xml file to the project. This bean defines a listener that waits for events on the customer's associated SQS queue and, in response, generates temporary connectors to ingest the associated data. This listener is a PeriodicGlobalRunnable, meaning that it runs on every node of the server system, with one copy as master and the others ready to take over if a server node crashes.
Once this bean is added to the system, it will use the SQS queue as a control channel, kicking off ingestion runs as the on-prem connectors post uploads. If the privateKeyFilename property is present, it should refer to an Attivio resource file in the system. See Data Encryption below for details on creation of the
Note: The Attivio cloudsupport.onprem property must be missing or set to false on the server side of the system.
The cloud support module supports AWS S3 Server Side Encryption with Customer Provided Encryption Keys. When activated, the data for each connector execution is encrypted with a unique key generated at upload time. This encryption key is included with the connector execution metadata that is transmitted via AWS SQS. The S3 key is encrypted using the public portion of an RSA public/private key pair. The private key is available only within the managed service server for the specific customer.
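To make the per-execution key concrete, here is a minimal Python sketch of what S3 Server Side Encryption with Customer Provided Encryption Keys expects from a client: a 256-bit key sent with each request as three x-amz-server-side-encryption-customer-* headers. This is not the module's implementation; the function names are hypothetical, and the RSA step that protects the key before it is placed in the SQS metadata is omitted because it needs a cryptography library outside the Python standard library.

```python
import base64
import hashlib
import secrets


def new_sse_c_key() -> bytes:
    """Generate a fresh 256-bit key, one per connector execution (assumed policy)."""
    return secrets.token_bytes(32)


def sse_c_headers(key: bytes) -> dict:
    """Build the request headers S3 requires for SSE-C with the given key."""
    if len(key) != 32:
        raise ValueError("SSE-C requires a 256-bit (32-byte) key")
    return {
        # S3 only accepts AES256 for customer-provided keys.
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        # The key itself, base64-encoded.
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        # Base64-encoded MD5 digest of the key, used by S3 as an integrity check.
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode("ascii"),
    }


execution_key = new_sse_c_key()
for name in sse_c_headers(execution_key):
    print(name)
```

Because S3 does not store the key, the same key must be presented again to download the object, which is why the key has to travel to the server with the execution metadata.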
Encryption support is activated when the controlEncryptionKey is added to the JSON configuration file. This encryption key is a text version of the RSA public key. A public/private key pair for data encryption can be generated using the onprem command with the -generate-key-pair argument (the key file directory must exist):
The /tmp/keys/secret.prv file is kept and referenced within the managed services system. The controlEncryptionKey setting is added to the JSON configuration file. See the Attivio Server Use section above for details on how to configure the server to reference the
Currently, encryption cannot be enabled for use of the web crawler on-prem due to a known issue. This will be resolved in a future version.
Using the on-premise connector requires prior creation of an S3 bucket and two SQS queues. The bucket name must follow the naming convention customerId.attiviocloud.com. The SQS queues must be named customerId and customerId-deadletter. The S3 bucket and SQS queues must be created in the same region. Given these, the following permissions may then be added to the user whose accessId is in use (customerId is cloudsupport in this example).
It is recommended to use the following settings for the primary SQS queue customerId:
- Message Retention Period: 1 day
- Redrive Policy: Checked
- Dead Letter Queue: customerId-deadletter
- Maximum Receives: 5
The Message Retention Period for the DLQ should then be set to 14 days (the maximum).
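The naming convention and the recommended queue settings can be expressed as a minimal Python sketch. The attribute names MessageRetentionPeriod and RedrivePolicy (with deadLetterTargetArn and maxReceiveCount) are the standard SQS queue attributes; retention is given in seconds and all attribute values are strings. The function names, the account ID, and the region in the example ARN are placeholders chosen for illustration.

```python
import json

SECONDS_PER_DAY = 24 * 60 * 60


def resource_names(customer_id: str) -> dict:
    """Derive the bucket and queue names required for a given customerId."""
    return {
        "bucket": f"{customer_id}.attiviocloud.com",
        "queue": customer_id,
        "dead_letter_queue": f"{customer_id}-deadletter",
    }


def primary_queue_attributes(dead_letter_queue_arn: str) -> dict:
    """Recommended attributes for the primary queue: 1 day retention, 5 receives."""
    return {
        "MessageRetentionPeriod": str(1 * SECONDS_PER_DAY),
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dead_letter_queue_arn, "maxReceiveCount": "5"}
        ),
    }


def dead_letter_queue_attributes() -> dict:
    """Recommended attributes for the dead letter queue: 14 days, the SQS maximum."""
    return {"MessageRetentionPeriod": str(14 * SECONDS_PER_DAY)}


names = resource_names("cloudsupport")
# Placeholder ARN: the region and account ID are not taken from this page.
dlq_arn = f"arn:aws:sqs:us-east-1:123456789012:{names['dead_letter_queue']}"

print(names["bucket"])                    # cloudsupport.attiviocloud.com
print(primary_queue_attributes(dlq_arn))
print(dead_letter_queue_attributes())     # {'MessageRetentionPeriod': '1209600'}
```

Dictionaries of this shape can be passed as the queue attributes when creating the queues with an AWS SDK or entered by hand in the SQS console.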
AWS Client diagnostics
Detailed logging of AWS client interactions can be turned on by following these instructions. Logging output will be included in the aie-node.log file. Both the on-prem and server sides use AWS clients and will produce output when configured to do so.