Overview
Knowing the overall sentiment of a document is often useful, but it can also be useful to analyze the sentiment of smaller parts of a document. The Attivio Intelligence Engine (AIE) provides several ways to perform Entity Extraction. The Entity Sentiment module assigns sentiment weights to the extracted entities, based on their context within a document.
The sentiment model built by sentiment training is used to assign sentiment to entities as well. Building such a model is the first, and most challenging, step.
Required Modules
These features require that you include the entitysentiment module when you run createproject to create the project directories. In addition, both the classifier module and the sentiment module must be installed in AIE. You do not need to add the classifier and sentiment modules to the project.
Entity Sentiment Highlighting in SAIL
If you are exploring entity sentiment analysis, note that SAIL uses brackets to indicate entity sentiment in search results:
- Positive: Green brackets
- Negative: Red brackets
These bracket styles are defined in <install-dir>\webapps\sail\resources\css\scopesearch.css.
Prerequisites
To use Entity Sentiment:
- Entity extraction must be turned on. See Dictionary Entity Extraction.
- A sentiment model must exist. See Building Sentiment Models.
- The entity-extraction handlers must include the ScopeTagger. See Scope Search.
Configuration
Setting up Entity Sentiment is primarily a matter of configuring the entity sentiment module's addEntitySentiment component and adding it to your ingest workflow.
When you install AIE with the Entity Sentiment module, your AIE configuration includes the following definition:
<component name="addEntitySentiment" class="com.attivio.platform.transformer.ingest.document.AddEntitySentiment">
  <properties>
    <map name="stopWordDictionaries">
      <!-- A minimal stopword dictionary is better for sentiment -->
      <property name="en" value="classifier/dictionaries/small_stopwords_en.csv" />
      <!-- add stopword dictionaries for other languages here -->
    </map>
    <list name="negationPatterns">
      <entry value="not" />
      <entry value=".*n't" />
    </list>
    <property name="modelName" value="${sentiment.defaultModelFileName}" />
    <property name="lowerCase" value="true" />
    <property name="posOutput" value="entity.sentiment.pos"/>
    <property name="posOutputScore" value="entity.sentiment.pos.score"/>
    <property name="negOutput" value="entity.sentiment.neg"/>
    <property name="negOutputScore" value="entity.sentiment.neg.score"/>
    <property name="includingSentences" value="true" />
    <property name="includingNounPhrases" value="false" />
    <property name="useEmptyModelIfNoneFound" value="true" />
    <property name="blurriness" value="10" />
    <property name="minimumPositiveValue" value="0.5"/>
    <property name="minimumNegativeValue" value="0.5"/>
  </properties>
</component>
Each property controls one aspect of the component's operation:
- the ignoreScopes list (not shown above) specifies scopes that entity sentiment analysis ignores. By default, this list includes the number, date, extracteddate, time, and url scopes.
- the stopWordDictionaries map specifies, for each language, a file of stopwords for that language.
The stopword dictionaries have no impact on the identification of entities. You cannot eliminate an entity by adding it to a stopword dictionary.
The stopword dictionaries can, however, be used to eliminate loaded words that skew the sentiment values associated with entities.
- the negationPatterns list specifies the tokens or token patterns that connote negation or otherwise reverse sentiment polarity. The default list contains "not" and the pattern ".*n't", which matches tokens such as "isn't" and "don't".
- the modelName property is the URI of the sentiment model file that this component uses.
default modelName value
Change the value of modelName before use. The default value, empty-sentiment.model, has no effect and triggers empty-model warnings in the log. The generic-sentiment.model model is recommended unless you train your own sentiment model. You can make this change directly in the configuration file or by changing the value in the attivio.entitysentiment.properties file.
- the lowerCase property specifies, if true, that tokens are lowercased before being looked up in the model. (This value should match the value used when the model was trained.)
- the posOutput property is the name of the field in which to put the entities assigned positive sentiment by the component.
- the posOutputScore property is the name of the field in which to put the scores for the entities assigned positive sentiment by the component.
- the negOutput property is the name of the field in which to put the entities assigned negative sentiment by the component.
- the negOutputScore property is the name of the field in which to put the scores for the entities assigned negative sentiment by the component.
- the includingSentences property is a boolean that specifies whether sentential information should be used for calculating entity sentiment. It is suggested that this property remain set to "true".
- the includingNounPhrases property is a boolean that specifies whether extracted noun phrases should be assigned sentiment in the same way as other entity types.
- the useEmptyModelIfNoneFound property is a boolean. Its meaning is the same as in the sentiment module: If "true", load an empty model if no model file can be found.
- the blurriness property is a positive integer. After computing an initial sentiment value for each token, the algorithm spreads this value to the N tokens before and after it, where N is the blurriness value. The default value is 10.
- the minimumPositiveValue and minimumNegativeValue properties define the minimum sentiment value that an entity must have in a document before it is included in the list of entities with positive or negative sentiment, respectively. Their default value is 0.5.
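As an illustration of the two list properties, the fragment below is a hypothetical sketch of entries placed inside the <properties> element of addEntitySentiment. It assumes that ignoreScopes takes the same <list>/<entry value="..."/> form as negationPatterns in the default definition; that form is not shown in this document, so verify it against your installation. The sketch lists the default ignored scopes minus url (so URL entities would receive sentiment) and adds "never" as a hypothetical extra negation token.

```xml
<!-- Hypothetical sketch: entries inside addEntitySentiment's <properties> element -->
<!-- Assumes ignoreScopes uses the same entry form as negationPatterns -->
<list name="ignoreScopes">
  <!-- Default ignored scopes, minus "url", so URL entities would receive sentiment -->
  <entry value="number" />
  <entry value="date" />
  <entry value="extracteddate" />
  <entry value="time" />
</list>
<list name="negationPatterns">
  <!-- Default negation patterns -->
  <entry value="not" />
  <entry value=".*n't" />
  <!-- Hypothetical addition: treat "never" as reversing polarity -->
  <entry value="never" />
</list>
```

Whether an explicit ignoreScopes list replaces or extends the built-in default is also an assumption worth checking before relying on it.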
Default Minimum Entity Sentiment value
The default value of minimumPositiveValue and minimumNegativeValue is 0.5, but this may not be the optimal value for every application of entity sentiment in AIE. Calibrating the minimum values may take some experimentation, especially if you use a custom sentiment model. Initially, a value between 0.25 and 0.5 is suggested for both minimumPositiveValue and minimumNegativeValue.
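Taken together, the two notes above suggest changing three properties in the addEntitySentiment definition. The fragment below is a sketch of those modified properties, assuming the model file name resolves the same way as the default ${sentiment.defaultModelFileName} value does; 0.3 is simply an illustrative starting point within the suggested 0.25-0.5 range, not a recommended setting.

```xml
<!-- Sketch: modified properties inside addEntitySentiment's <properties> element -->
<!-- Replace the default empty-sentiment.model with the recommended generic model -->
<property name="modelName" value="generic-sentiment.model" />
<!-- Illustrative starting thresholds within the suggested 0.25-0.5 range; calibrate on your content -->
<property name="minimumPositiveValue" value="0.3"/>
<property name="minimumNegativeValue" value="0.3"/>
```

The ${sentiment.defaultModelFileName} placeholder in the default definition suggests that attivio.entitysentiment.properties defines that key, so changing it there may be the alternative to editing the configuration file directly; confirm the key name in your copy of the file.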
The fields used by this configuration must be specified in a schema addition. When you create a project with createproject and include the entitysentiment module, your project's schema includes the appropriate fields:
<schema name="default" merge="true">
  <fields>
    <!-- default-search-field="content" -->
    <!-- Just like the "cat" field -->
    <field name="sentiment" type="string" tokenize="false" indexed="true" stored="true" facet="true" displayName="Sentiment" />
    <field name="sentiment.score" type="float" indexed="true" stored="true" facet="false" displayName="Score" />
    <field name="entity.sentiment.pos" type="string" tokenize="false" indexed="false" stored="true" facet="true" displayName="PositiveEntities" />
    <field name="entity.sentiment.pos.score" type="float" tokenize="false" indexed="false" stored="true" facet="false" displayName="PositiveEntityScores" />
    <field name="entity.sentiment.neg" type="string" tokenize="false" indexed="false" stored="true" facet="true" displayName="NegativeEntities" />
    <field name="entity.sentiment.neg.score" type="float" tokenize="false" indexed="false" stored="true" facet="false" displayName="NegativeEntityScores" />
  </fields>
</schema>
If your schema includes these fields and you add the addEntitySentiment component to an ingest workflow, entity sentiment is added to the documents that workflow processes.
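The workflow step might look roughly like the sketch below. The <workflow> and <documentTransformer> element names, the workflow name, and the type attribute are assumptions not confirmed by this page, so mirror the structure of the ingest workflows already defined in your project's configuration. Because the component assigns sentiment to extracted entities, it presumably needs to run after the entity-extraction stage.

```xml
<!-- Hypothetical sketch: element names and the workflow name are assumptions;
     copy the structure of an existing ingest workflow in your project -->
<workflow name="ingest" type="ingest">
  <!-- ... existing transformers, including entity extraction with the ScopeTagger ... -->
  <!-- Placed after entity extraction so that entities exist to be scored -->
  <documentTransformer name="addEntitySentiment" />
  <!-- ... remaining transformers ... -->
</workflow>
```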
Best Practices
The following practices are highly recommended:
- Feed entity sentiment only clean text that is appropriate to the sentiment model: no navigation, menus, advertisements, or other extraneous content. Ideally, the underlying sentiment model should be trained on a subset of the same content.
- The current sentiment model (downloaded from Attivio) is trained on product, movie, book, and company reviews written in modern English. The model's ability to detect sentiment decreases rapidly as the text diverges from the training data, so it will not work well on news, politics, general business content, Twitter posts, and similar text. Texts written by British authors, or written more than 20 years ago, can be expected to use different positive and negative terms. The existing sentiment model will not work at all on content in languages other than English.
- In general, use only dictionary-based entity extraction with entity sentiment, and have the dictionary target entities of particular interest in the domain. Other types of entity extraction may introduce errors.