Overview

The Attivio Intelligence Engine (AIE) supports document classification and sentiment analysis using a statistical classification engine. Statistical classification is an application of machine learning: statistical models are trained on manually labeled training data. See Building Sentiment Models for instructions on building models. A sentiment model is a black box that accepts a document as input and computes a score for each possible output label, "pos" or "neg". Even when a document contains no significant positive or negative sentiment, it always receives one of the two labels.

Document-level sentiment analysis is performed by the AddSentimentDocument ingest transformer component. When the transformer receives a document, it converts some of the document's fields into a "bag of words". This bag of words is compared against the word statistics that were created for each label in the sentiment model during training, and the resulting sentiment label and score are written to configurable output fields.
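
As a rough illustration of comparing a bag of words against per-label word statistics, the sketch below scores a document with a simple naive-Bayes-style calculation. This page does not state which algorithm AIE uses, so the method, the WORD_STATS table, the UNSEEN constant, and the score_labels function are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical per-label word statistics standing in for a trained model.
# Each value is an assumed probability of seeing the word under that label.
WORD_STATS = {
    "pos": {"great": 0.05, "tasty": 0.04, "slow": 0.005, "service": 0.02},
    "neg": {"great": 0.005, "tasty": 0.004, "slow": 0.05, "service": 0.02},
}
UNSEEN = 0.001  # assumed probability for words missing from the statistics


def score_labels(bag):
    """Return a score between 0.0 and 1.0 per label; the scores sum to 1.0."""
    log_scores = {}
    for label, stats in WORD_STATS.items():
        log_scores[label] = sum(
            count * math.log(stats.get(word, UNSEEN))
            for word, count in bag.items()
        )
    # Normalize so that each label receives a relative confidence.
    top = max(log_scores.values())
    weights = {label: math.exp(value - top) for label, value in log_scores.items()}
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


scores = score_labels(Counter(["great", "tasty", "service"]))
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

In this sketch a document with no informative words still receives a score for both labels, which mirrors the point above that every document is assigned one of the two labels.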


Required Modules

The features on this page require that you include the sentiment module when you run createproject to create the project directories. In addition, the classifier module must be installed in AIE, but does not need to be added to the project. 


Default Behavior

When you include the sentiment module in a project, it modifies the ingestion workflow by adding a new component, addSentimentDocument. This component is defined in <project-dir>\conf\components\addSentimentDocument.xml. To modify this component, edit the component in the Palette area of the Administration User Interface. 

The sentiment module adds some fields to your project schema file, <project-dir>\conf\schema\default.xml:

<project-dir>\conf\schema\default.xml
    <field name="sentiment" displayName="Sentiment" type="STRING" indexed="true" stored="true" sort="false" tokenize="no"/>
    <field name="sentiment.score" displayName="Score" type="FLOAT" indexed="true" stored="true" sort="false"/>
    <field name="sentiment.explain" displayName="Sentiment Explanation" type="STRING" indexed="true" stored="true" sort="false"/>
    <field name="sentiment.explain.score" displayName="Sentiment Explanation Score" type="FLOAT" indexed="true" stored="true" sort="false"/>

 

AddSentimentDocument

Note that the locations of the stop-word list and the sentiment model are under your control, and you'll probably need to modify those filepaths in the definition of addSentimentDocument. To modify it, edit the addSentimentDocument component through the AIE Administrator.

There are three parts to addSentimentDocument. The first part specifies the URI of the sentiment model to be loaded. The second part specifies how the "bag of words" is computed, and the third part specifies where the sentiment label and score should be output.

Model parameters

| Parameter | Value | Description |
|---|---|---|
| Model Name | String (no default) | URI of the model to be loaded, either acs://contentStoreName/name or a filename. |

Bag-of-words parameters

| Parameter | Value | Description |
|---|---|---|
| Stopwords Dictionary | List | Specifies the list of stop-word dictionaries (by language). Stop words are ignored when building the bag of words. |
| LowerCase | Boolean (default: false) | If true, all words are forced to lower case before being added to the bag of words. |
| Negation Patterns | List of strings (no default) | The list of words which trigger special negation processing when building the bag of words. |
| isUseAllNaturalLanguageFields | Boolean (default: true) | When true, all words in all natural language fields in the document are included in the bag of words. When false, only the words found in fields explicitly selected by the fields parameter are included. |
| Fields | List | Specifies the list of fields to use when building the bag of words when useAllNaturalLanguageFields is false. |

The bag-of-words parameters used here must be exactly the same as those used to train the model. Otherwise, the bag of words no longer matches the word statistics in the model, and the sentiment analysis will be meaningless.

All fields used by sentiment analysis need to be tokenized. If any of the fields used in computing the "bag of words" is added after tokenization, a warning will be printed during sentiment analysis. Main article: Ingestion Tokenization.
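
The sketch below shows one way the bag-of-words parameters could interact. It is not AIE code: the build_bag_of_words function, the regular-expression tokenizer, and the case-insensitive stop-word check are assumptions, and negation processing is left out because this page does not describe how it works.

```python
import re
from collections import Counter


def build_bag_of_words(fields, field_names=None, stopwords=frozenset(), lower_case=False):
    """Build a word-count bag from selected document fields.

    fields: dict mapping field name to text.
    field_names: names to include; None means every field, a stand-in
        for "all natural language fields".
    """
    names = fields.keys() if field_names is None else field_names
    bag = Counter()
    for name in names:
        # Simplified tokenizer (assumption): runs of letters and apostrophes.
        for token in re.findall(r"[A-Za-z']+", fields.get(name, "")):
            word = token.lower() if lower_case else token
            if word.lower() in stopwords:  # assumed case-insensitive match
                continue
            bag[word] += 1
    return bag


doc = {"title": "The Food", "text": "The food was great and the service was great"}
stop = {"the", "was", "and"}

bag = build_bag_of_words(doc, stopwords=stop, lower_case=True)
text_only = build_bag_of_words(doc, field_names=["text"], stopwords=stop, lower_case=True)
print(bag)
print(text_only)
```

The two calls produce different counts for the same document, which is why a model trained with one set of bag-of-words parameters cannot be applied reliably with another.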

Output parameters for AddSentimentDocument

| Parameter | Value | Description |
|---|---|---|
| Output | String (default: "sentiment") | When a sentiment label is output, it is written to this field. |
| Output Score | String (default: "sentiment.score") | When a sentiment score is output, it is written to this field. The sentiment score is a float in the range 0.0 to 1.0, which expresses the statistical confidence that the document has been assigned to the right sentiment category. |
| Output Score Threshold | Float (default: 0.5) | The minimum sentiment score for a label to be output. Note that lowering the threshold sufficiently can result in a document that has both a positive and a negative sentiment score. |
| Explaining | Boolean (default: false) | If true, turns on the "explanation" feature of the classifier, so the document keeps a record of the words and phrases that contributed most to its label. |
| Explanation Fieldname | String (default: "sentiment.explain") | The desired number of explanatory words and phrases are written to this field. |
| Explanation Score Field Name | String (default: "sentiment.explain.score") | For each explanatory word or phrase, its contribution to the raw output score is written to this field. |
| Explanation Length | Integer (default: 5) | The maximum number of top-weighted contributors to add to the explanation. |

The addSentimentDocument component always outputs every sentiment label whose score exceeds the value of the outputScoreThreshold parameter.

Note that the algorithm tends to be "more certain" (scores tend to be close to 1.0) on longer documents.
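
The effect of the score threshold can be sketched as a simple filter. The labels_to_output function and the example scores are hypothetical, and the strict comparison at the boundary is an assumption, since this page does not say whether a score exactly equal to the threshold is output.

```python
def labels_to_output(scores, threshold=0.5):
    """Return (label, score) pairs whose score exceeds the threshold, best first."""
    kept = [(label, score) for label, score in scores.items() if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for one document.
scores = {"pos": 0.62, "neg": 0.38}

default_labels = labels_to_output(scores)        # default threshold of 0.5
lowered_labels = labels_to_output(scores, 0.3)   # lowered threshold
print(default_labels)
print(lowered_labels)
```

With the default threshold only the "pos" label passes; with the threshold lowered to 0.3 both labels pass, so the document would carry both a positive and a negative score.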

 

The explanation feature can be turned on at any time, with any sentiment model. You don't need to train a special explanation model.
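
As a rough picture of what the explanation output might look like, the sketch below keeps the top-weighted contributors and writes them to the two explanation fields. The explain function, the contribution values, and the use of two parallel lists are assumptions for illustration; this page does not describe how AIE computes or stores the contributions.

```python
def explain(contributions, explanation_length=5):
    """Return up to explanation_length (term, contribution) pairs, largest first."""
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:explanation_length]


# Hypothetical per-term contributions to the raw score of the "pos" label.
contributions = {
    "great": 2.3, "tasty": 2.1, "friendly": 1.4,
    "menu": 0.2, "table": 0.1, "service": 0.0, "bill": -0.3,
}

top = explain(contributions, explanation_length=5)
document = {
    "sentiment.explain": [term for term, _ in top],
    "sentiment.explain.score": [weight for _, weight in top],
}
print(document)
```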

 

Installing a Sentiment Model

For demonstration purposes, Attivio provides a generic sentiment model and a small set of test documents to use with it. The sample model was trained on a set of positive and negative consumer reviews of products and restaurants, written in modern English.

Limitations of Sentiment Model

The ability of any model to detect sentiment decreases rapidly as the text diverges from the training data. For example, the default sentiment model will not perform well on abbreviated "tweets." The model will not work at all on content in languages other than English.

Attivio recommends training a sentiment model on samples of the data you need to analyze. See Building Sentiment Models for instructions on building sentiment models.

 

 

Using the Example Sentiment Analysis Model

To install the sample sentiment analysis model:

  1. Note that the sentiment and classifier modules are Add-on Modules that must be downloaded and installed before using createproject. (Unzip the modules into the top-level AIE <install-dir>.)
  2. Restart AIE and use createproject to create your sentiment project. Make sure the project uses the sentiment module (required). We recommend adding the demo group of AIE modules for this exercise. It is not necessary to include the classifier module for sentiment analysis, although it must be downloaded and installed.

    <install-dir>\bin\createproject -n myproject -m sentiment -g demo -o c:\attivio-projects
    
  3. Optionally, to use a different model or stop-word list, edit addSentimentDocument in the AIE Administrator.

  4. Create a "start directory" such as c:\documents. Copy the nine sample files from <install-dir>\example\sentiment\genericSentimantDemo\conf\sampleInput into this directory. (It is not actually necessary to use c:\documents. It just makes the demo code easier to view.)
  5. Create a "GenericReviews" File Connector to ingest the documents. (Main article: Loading File Content). Give it the location of the start directory, and send the incoming documents to the textFileIngest workflow.

  6. Start AIE using the AIE Agent and its Command-Line Interface (CLI):

    1. Run the Agent in a Command Window:

      <install-dir>\bin\aie-agent.exe -d <data-agent-dir>
    2. Run the Command-Line Interface in a second Command Window. Note that the CLI is invoked for a specific project.

      <install-dir>\bin\aie-cli -p <project-dir>
    3. To run the project, enter the start all command in the Command-Line Interface.
  7. Navigate to the Connectors page of the Administration UI.

    http://localhost:17000/admin/connectors?title=Connectors
    
  8. Start the GenericReviews connector to ingest the nine documents. This takes only a few seconds.
  9. Navigate to the Search UI (Query > Search UI).

  10. Set the Details toggle to "ON."
  11. Search for *:*.  This returns all nine documents.
  12. Note that the sentiment and sentiment.score fields have been filled in with appropriate values.