The aie-exec loadgen command generates artificial ingestion load against an AIE cluster through the Java API. Loadgen generates realistic emails and documents whose sizes vary according to a normal distribution, optionally mixed with a power law distribution for outliers.
Loadgen sends two kinds of documents: plain files and emails. Plain files come from the filesDirectory; for each one, the load generator draws a random target size and chooses a file based on that size. Emails are files that have attachments, and the files used as attachments reside in the attachmentDirectory. The percentageAttachments parameter sets the ratio of emails with attachments to plain files.
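The Java sketch below illustrates the selection step just described. It is not loadgen's implementation: the class and method names (`DocumentPicker`, `nextIsEmail`, `closestSize`) are hypothetical, and it assumes that choosing a plain file "based on a random file size" means picking the available file whose size is closest to the target.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of how a load generator might pick its next document. */
public class DocumentPicker {

    /** True with probability percentageAttachments / 100, meaning: send an email with attachments. */
    static boolean nextIsEmail(Random rng, int percentageAttachments) {
        // nextInt(100) is uniform over 0..99.
        return rng.nextInt(100) < percentageAttachments;
    }

    /** Returns the size in fileSizes (bytes) closest to targetSize; ties keep the earlier entry. */
    static long closestSize(List<Long> fileSizes, long targetSize) {
        long best = fileSizes.get(0);
        for (long size : fileSizes) {
            if (Math.abs(size - targetSize) < Math.abs(best - targetSize)) {
                best = size;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Hypothetical sizes of the files found in filesDirectory.
        List<Long> sizes = List.of(1_000L, 50_000L, 2_000_000L);
        long target = 40_000L; // stands in for a draw from the size distribution
        if (nextIsEmail(rng, 25)) {
            System.out.println("send an email with attachments from attachmentDirectory");
        } else {
            System.out.println("send the plain file of size " + closestSize(sizes, target));
        }
    }
}
```

With percentageAttachments set to 25, roughly one document in four is an email with attachments; the rest are plain files whose size tracks the drawn target.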
File sizes are determined by one of two algorithms: a normal distribution or a power law distribution. The normal distribution uses a mean and standard deviation to simulate the 'average' file size passing through the system; it tests the scenario where most files are of average size. The power law distribution uses a min, max, exponent, and seed to simulate outliers, in particular the infrequent very large file. The dpldUsage parameter determines how often the power law algorithm is used: it is the number of times out of 10 that the power law algorithm is chosen.
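A minimal Java sketch of this two-mode size generator follows, under stated assumptions. The power law is modeled as a continuous density proportional to x^(-exponent) on [min, max], sampled by inverse transform and rounded to whole bytes; normal draws are clamped to at least one byte; and the seed is assumed to seed only the power law draws. `SizeGenerator` and its constructor arguments are hypothetical names, not loadgen options.

```java
import java.util.Random;

/**
 * Hypothetical two-mode file-size generator: a normal distribution for typical
 * files and a bounded power law for rare, very large outliers.
 */
public class SizeGenerator {
    private final Random rng = new Random();   // drives the mode choice and normal draws
    private final Random powerLawRng;          // seeded, so power law draws are reproducible
    private final double mean, stdDev;         // normal distribution parameters (bytes)
    private final double min, max, exponent;   // power law bounds (bytes) and exponent
    private final int dpldUsage;               // times out of 10 to use the power law

    SizeGenerator(double mean, double stdDev, double min, double max,
                  double exponent, long seed, int dpldUsage) {
        this.mean = mean;
        this.stdDev = stdDev;
        this.min = min;
        this.max = max;
        this.exponent = exponent;
        this.powerLawRng = new Random(seed);
        this.dpldUsage = dpldUsage;
    }

    /** Draws one file size in bytes. */
    long nextSize() {
        if (rng.nextInt(10) < dpldUsage) {
            return Math.round(powerLaw(powerLawRng.nextDouble()));
        }
        double sample = mean + stdDev * rng.nextGaussian();
        return Math.max(1L, Math.round(sample)); // a normal draw can be <= 0; clamp to 1 byte
    }

    /** Inverse-CDF sample from a density proportional to x^-exponent on [min, max], u in [0, 1). */
    double powerLaw(double u) {
        if (Math.abs(exponent - 1.0) < 1e-12) {
            return min * Math.pow(max / min, u); // exponent 1: log-uniform special case
        }
        double a = 1.0 - exponent;
        double lo = Math.pow(min, a);
        double hi = Math.pow(max, a);
        return Math.pow(lo + u * (hi - lo), 1.0 / a);
    }

    public static void main(String[] args) {
        // Mean 50 KB, sd 10 KB; power law from 1 KB to 100 MB, exponent 2, seed 7; 1 draw in 10.
        SizeGenerator gen = new SizeGenerator(50_000, 10_000, 1_000, 100_000_000, 2.0, 7L, 1);
        for (int i = 0; i < 10; i++) {
            System.out.println(gen.nextSize());
        }
    }
}
```

Seeding only the power law generator keeps the sequence of outlier sizes reproducible across runs while the mix of normal and power law draws still varies; whether loadgen applies its seed the same way is not stated here.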
|Option||Description||Default|
|-z, --zookeeper||ZooKeeper connection string for the system||None, required|
|-n, --project||Project name||None, required|
|--filesDirectory||Source directory for the files used in random generation. It should contain files of a variety of sizes.||None, required|
|-e, --projectEnvironment||Project environment name||default|
|--docsPerSecond||Number of documents per second to generate||10|
|--percentageAttachments||The percentage of documents to send as emails with attachments||0|
|--attachmentDirectory||The directory containing all of the files to be used as attachments||None|
|--attachmentPaths||A file containing newline-separated paths, within the files directory, of email files that have attachments||None|
The percentage of documents to create from structured data; the rest are created from unstructured data.
The CSV file containing structured data to ingest when percentageStructured > 0. Rows are comma-delimited; escape quotation marks (") by doubling them ("").
|--commitOnNewClient||True if the ingested data should be committed on a new client||true|
|--ingestWorkflowName||Name of the ingest workflow to use when feeding the documents to the index||fileIngest|
|--maxDocuments||The maximum number of documents to ingest before the tool terminates; -1 means no limit||-1|
|--newClientThreshold||Number of documents sent before a new client is used||1000000|
|--quiet||Do not print the status of load generation||false|
|--runtime||The runtime for data generation, in seconds; -1 means run indefinitely||-1|
|--tableName||The name of the table field for unstructured documents||None|
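The structured-data CSV file must have embedded quotation marks doubled. The hypothetical Java helper below (`CsvRow` is not part of AIE) formats one row that way. It additionally assumes RFC 4180-style quoting, wrapping any field that contains a comma, quotation mark, or line break in quotation marks; the documentation above only specifies the doubling rule.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that formats one row for the structured-data CSV file. */
public class CsvRow {

    /**
     * Doubles embedded quotation marks and, as an RFC 4180-style assumption,
     * wraps the field in quotation marks if it contains a comma, quote, or line break.
     */
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String doubled = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + doubled + "\"" : doubled;
    }

    /** Escapes each field and joins them into one comma-delimited row. */
    static String toRow(List<String> fields) {
        return fields.stream().map(CsvRow::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toRow(List.of("42", "Acme, Inc.", "said \"hello\"")));
        // prints: 42,"Acme, Inc.","said ""hello"""
    }
}
```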
Not for Production Systems
The load generator and query generator can overwhelm a system with input and mimic a denial-of-service (DoS) attack. Do not use these tools on a production system.
If your system uses a non-default hdfs.store.root
aie-exec -J-Dhdfs.store.root=/user/systemtest/attivio loadgen -z linenghdp07:2181 -n vtest --filesDirectory /data/bulk
Working with Kerberos enabled systems
loadgen interacts directly with the AIE client APIs and requires authentication information when working in Kerberized environments. The AIE properties security.hadoop.principal and security.hadoop.keytab can be supplied to aie-exec by prefixing the standard Java -Dproperty=value syntax with -J. For example:
aie-exec -J-Dsecurity.hadoop.principal=<principal-name> -J-Dsecurity.hadoop.keytab=<path-to-keytab> loadgen -z linenghdp07:2181 -n vtest --filesDirectory /data/bulk