
The aie-exec loadgen command generates artificial ingestion load against an AIE cluster via the Java API. Loadgen generates realistic emails and documents whose sizes vary according to a normal distribution or a power law distribution.

The documents that loadgen sends are either plain files or emails. Plain files come from the filesDirectory and are chosen based on a random file size that the load generator wants to send. Emails are files that have attachments; the files used as attachments reside in the attachmentDirectory. The user sets the ratio of emails with attachments to plain files with the percentageAttachments parameter.
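The selection described above can be pictured with a minimal Python sketch. It is not loadgen's actual implementation: the function names `choose_kind` and `pick_closest`, the sample file list, and the "closest size wins" rule are all assumptions made for illustration.

```python
import bisect
import random


def choose_kind(rng, percentage_attachments):
    """Return "email" for roughly percentage_attachments percent of calls."""
    return "email" if rng.random() * 100 < percentage_attachments else "plain"


def pick_closest(sized_files, target_size):
    """Pick the path whose size is nearest to target_size.

    sized_files is a list of (size_in_bytes, path) tuples sorted by size.
    Assumption: the nearest size wins; the real tool may match differently.
    """
    if not sized_files:
        raise ValueError("no candidate files")
    sizes = [size for size, _ in sized_files]
    i = bisect.bisect_left(sizes, target_size)
    # Only the entries just below and just above the target can be nearest.
    neighbours = sized_files[max(0, i - 1):i + 1]
    return min(neighbours, key=lambda item: abs(item[0] - target_size))[1]


# Hypothetical directory contents: (size in bytes, file name).
files = sorted([(120, "note.txt"), (48_000, "report.pdf"), (9_500_000, "scan.tiff")])
rng = random.Random(7)
kind = choose_kind(rng, percentage_attachments=25)
path = pick_closest(files, target_size=50_000)
```

In practice the size list would be built once by scanning the files directory, which is why a directory holding a variety of file sizes gives the generator more to choose from.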

File sizes are determined by one of two algorithms: a normal distribution or a power law distribution. The normal distribution uses a mean and standard deviation to simulate the 'average' file size going through the system. This tests the scenario where most files are of average size. The power law distribution uses a min, max, exponent and seed. The power law simulates outliers, in particular the infrequent very large file. The dpldUsage parameter determines how often the power law algorithm is used: it is the number of times out of 10 that the power law algorithm should be used.
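As a rough illustration of how the two distributions could be mixed, here is a minimal Python sketch. It is an assumption-laden model, not the tool's code: the function names, the inverse-transform form of the power law, the clamping of negative normal draws, and the single seeded generator are all choices made for this example.

```python
import random


def sample_power_law(rng, min_size, max_size, exponent):
    """Draw from p(x) proportional to x**(-exponent) on [min_size, max_size].

    Uses inverse-transform sampling; requires min_size > 0.
    """
    if exponent == 1.0:
        raise ValueError("exponent == 1 needs a logarithmic form, omitted in this sketch")
    k = 1.0 - exponent
    lo, hi = min_size ** k, max_size ** k
    u = rng.random()  # uniform in [0, 1)
    return (lo + u * (hi - lo)) ** (1.0 / k)


def sample_file_size(rng, mean, std_dev, min_size, max_size, exponent, dpld_usage):
    """Return a target file size in bytes.

    dpld_usage is the number of draws out of 10 that use the power law.
    """
    if rng.randrange(10) < dpld_usage:
        return sample_power_law(rng, min_size, max_size, exponent)
    # Clamp at zero: a normal draw can be negative, a file size cannot.
    return max(0.0, rng.gauss(mean, std_dev))


rng = random.Random(42)  # a fixed seed makes a run repeatable
sizes = [sample_file_size(rng, 50_000, 10_000, 1_000, 500_000_000, 2.0, 2)
         for _ in range(10)]
```

With a usage value of 2, about two draws in ten come from the power law, so most sizes cluster near the mean while an occasional draw lands far out in the tail.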

Usage

| Flag | Description | Default Value |
|---|---|---|
| -z, --zookeeper | ZooKeeper connection string for the system | None, required |
| -n, --project | Project name | None, required |
| --filesDirectory | Source of the files that will be used for random generation. A variety of file sizes is expected to be present. | None, required |
| -e, --projectEnvironment | Project environment name | default |
| --docsPerSecond | Number of documents per second to generate | 10 |
| --percentageAttachments | The percentage of email files to send with attachments | 0 |
| --attachmentDirectory | The directory containing all of the files to be used as attachments | None |
| --attachmentPaths | A file containing newline-separated paths, within the files directory, of files (emails) with attachments | None |
| --percentageStructured | The percentage of documents to be created from structured data; the rest will be created from unstructured data | 0 |
| --csvStructuredDataFile | The CSV file containing structured data to ingest if percentageStructured > 0. Rows are comma delimited; escape quotation marks (") with double quotation marks ("") | None |
| --commitOnNewClient | True if the ingested data should be committed when a new client is used | true |
| --ingestWorkflowName | Name of the ingest workflow to use when feeding the documents to the index | fileIngest |
| --maxDocuments | The maximum number of documents to ingest before the tool terminates; -1 means indefinite | -1 |
| --newClientThreshold | Number of documents sent before a new client is used | 1000000 |
| --quiet | Do not print the status of load generation | false |
| --runtime | The runtime for data generation in seconds; -1 means indefinite | -1 |
| --tableName | The name of the table field for unstructured documents | None |
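The quoting rule for the structured data file (comma-delimited rows, with quotation marks escaped by doubling them) matches what Python's standard csv module writes by default, so a test file could plausibly be produced as sketched below. The field values are hypothetical, and whether loadgen expects a header row or particular columns is not stated here.

```python
import csv
import io

# Hypothetical rows; real column meanings depend on your project.
rows = [
    ["1", "Quarterly report", 'She said "ship it"'],
    ["2", "Notes, draft", "plain text"],
]

buffer = io.StringIO()
# The default dialect is comma delimited and doubles embedded quotation marks.
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Fields containing a comma or a quotation mark are wrapped in quotation marks automatically; other fields are written as-is.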

Not for Production Systems

The load generator and query generator can overwhelm a system with input and mimic a Denial of Service (DoS) attack. Do not use these tools on a production system.


Examples

aie-exec loadgen -z linenghdp07:2181 -n vtest --filesDirectory /data/bulk

If your system uses a non-default hdfs.store.root:

aie-exec -J-Dhdfs.store.root=/user/systemtest/attivio loadgen -z linenghdp07:2181 -n vtest --filesDirectory /data/bulk


Working with Kerberos-enabled systems

loadgen interacts directly with AIE client APIs and requires authentication information when working in Kerberized environments. The AIE properties security.hadoop.principal and security.hadoop.keytab may be supplied to aie-exec by prefixing the standard Java -Dproperty=value syntax with -J. For example:

aie-exec -J-Dsecurity.hadoop.principal=<principal-name> -J-Dsecurity.hadoop.keytab=<path-to-keytab> loadgen -z linenghdp07:2181 -n vtest --filesDirectory /data/bulk
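When load runs are scripted, the -J prefix convention is easy to get wrong by hand. The Python sketch below assembles the argument list in the same order as the examples on this page. The helper `build_loadgen_command` and its parameters are hypothetical, it only builds the command without running it, and it assumes every extra flag takes a value (flags such as --quiet may need different handling).

```python
import shlex


def build_loadgen_command(zookeeper, project, files_directory,
                          jvm_properties=None, extra_flags=None):
    """Return the aie-exec loadgen command as a list of arguments."""
    command = ["aie-exec"]
    # Java system properties go before the subcommand, each prefixed with -J.
    for name, value in (jvm_properties or {}).items():
        command.append(f"-J-D{name}={value}")
    command += ["loadgen", "-z", zookeeper, "-n", project,
                "--filesDirectory", files_directory]
    # Assumption: each extra flag is passed as "--name value".
    for flag, value in (extra_flags or {}).items():
        command += [f"--{flag}", str(value)]
    return command


command = build_loadgen_command(
    "linenghdp07:2181", "vtest", "/data/bulk",
    jvm_properties={
        "security.hadoop.principal": "<principal-name>",
        "security.hadoop.keytab": "<path-to-keytab>",
    },
    extra_flags={"maxDocuments": 1000},
)
print(shlex.join(command))  # shell-quoted form, for logging or review
```

Passing the list to subprocess.run rather than a joined string avoids shell quoting problems with principal names or keytab paths.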