The following exercise builds on the Attivio Quick Start Tutorial and takes a closer look at scope tags, scope queries, creation of an entity dictionary, configuration of a new entity-extraction document transformer, and addition of the new transformer to the existing entity-extraction ingestion workflow.
1. Deploy the Attivio Factbook project
Open the Attivio Quick Start Tutorial in a new tab in your browser, follow the instructions for Steps 1–6 to deploy and start the Factbook project and run its connectors, and return to this page when complete. If you already have the project deployed, just follow Step 5 to start it and Step 6 to run its connectors.
2. Query and view extracted entities (scopes)
You can use Attivio's Search UI web application to see terms in document text which are tagged with scopes by Attivio's entity-extraction components.
Navigate to http://localhost:17000/searchui/ (or click the Administrator UI's Query > Search UI menu link) in your web browser to open the Search UI web application.
Log in with username aieadmin and password attivio when prompted.
In the search field, change the query string to table:country and click the Go button to view all documents which have this table field value assigned.
At the top right of the screen, click the Details > On button to display full details for the documents.
Look at the text field values shown for the documents and observe that some terms in these field values are wrapped in span HTML tags with class attribute values starting with "scope", as in this example:
<span class="scope jobtitle">President</span>
These tags indicate that the term in question has been tagged with a scope by Attivio's entity-extraction functionality. In this example, the type of the scope for the term "President" is jobtitle; other scopes tagged in the indexed documents include person and date.
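To see which scopes appear in a piece of displayed field text, the span tags can be read programmatically. The following is a minimal sketch using only Python's standard library; it assumes the class attribute has the form "scope <type>" (as in the span class="scope person" tags mentioned in this exercise), and the sample markup in the usage note is made up for illustration, not copied from Search UI output.

```python
from html.parser import HTMLParser


class ScopeCollector(HTMLParser):
    """Collect (scope type, tagged text) pairs from scope-marked HTML."""

    def __init__(self):
        super().__init__()
        self.pairs = []     # finished (scope, text) pairs, in closing order
        self._scopes = []   # scope name (or None) for each open <span>
        self._buffers = []  # text collected for each open <span>

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Assumed convention: class="scope <type>", e.g. "scope jobtitle".
        is_scope = len(classes) > 1 and classes[0] == "scope"
        self._scopes.append(classes[1] if is_scope else None)
        self._buffers.append([])

    def handle_data(self, data):
        # Text belongs to every span that is currently open (spans may nest).
        for buffer in self._buffers:
            buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or not self._scopes:
            return
        scope = self._scopes.pop()
        text = "".join(self._buffers.pop())
        if scope:
            self.pairs.append((scope, text))


def collect_scopes(html):
    """Return the list of (scope, text) pairs found in an HTML fragment."""
    parser = ScopeCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

For example, collect_scopes('The <span class="scope jobtitle">President</span> spoke.') returns a single pair for the jobtitle scope and the term "President". Nested spans are reported innermost first, because a pair is recorded when its closing tag is reached.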
3. Issue a scope query
Scope search, using the Advanced Query Language's SCOPE operator, allows you to return only documents which contain terms tagged with a particular class of scope.
In Search UI, enter text:SCOPE(person) as a new query.
Click the blue down-arrow symbol to the right of the search field and change the query language from Simple to Advanced. (The SCOPE operator is not supported in the Simple Query Language.)
Click the Go button to view all documents which include terms tagged with the person scope. These terms should appear wrapped in span class="scope person" HTML tags when Search UI's Details function is set to "On"; even when it's set to "Off", Search UI highlights many scopes in query results.
Repeat this exercise for other scopes, such as date or jobtitle.
If you see no results for a scope query, double-check that the query language is still set to Advanced.
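When repeating the query for several scopes, it can help to generate the query strings rather than retype them. The sketch below only assembles text in the field:SCOPE(name) form shown above; the function name and the identifier check are illustrative assumptions, and nothing here calls an Attivio API.

```python
import re

# Assumed: field and scope names are simple identifiers (letters, digits, underscore).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def scope_query(field, scope):
    """Build an Advanced Query Language string such as 'text:SCOPE(person)'."""
    for value in (field, scope):
        if not _IDENTIFIER.match(value):
            raise ValueError(f"not a simple identifier: {value!r}")
    return f"{field}:SCOPE({scope})"


if __name__ == "__main__":
    # Print one query per scope to paste into Search UI (Advanced mode).
    for scope_name in ("person", "date", "jobtitle"):
        print(scope_query("text", scope_name))
```

Each printed line can be pasted into the Search UI search field with the query language set to Advanced.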
You can also query for documents which include specific terms or scopes within a given scope tag.
In Search UI, enter text:sentence:AND(scope(date),scope(person),president) as a new query.
Click the blue down-arrow symbol to the right of the search field and ensure that the query language is set to Advanced.
Click the Go button to view all documents which have text field values that include a date scope, a person scope, and the term "president" all within a single sentence scope tag.
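The nested query above has a regular shape: a field, an enclosing scope, and an AND over terms and inner scopes. As a rough sketch, the string can be composed from its parts as follows; the helper names are hypothetical and the code does nothing beyond string formatting.

```python
def scope(name):
    """Format an inner scope clause, e.g. 'scope(date)'."""
    return f"scope({name})"


def all_within(field, container, parts):
    """Require every part (plain term or scope clause) inside one container scope.

    Mirrors the shape of the query used in this exercise:
    text:sentence:AND(scope(date),scope(person),president)
    """
    if not parts:
        raise ValueError("at least one term or scope clause is required")
    return f"{field}:{container}:AND({','.join(parts)})"


if __name__ == "__main__":
    query = all_within("text", "sentence", [scope("date"), scope("person"), "president"])
    print(query)
```

Swapping the list contents produces variations on the same pattern, for example a jobtitle scope and a person scope within one sentence.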
4. Create a new entity dictionary in Business Center
Now we'll walk through creation of a custom dictionary-based entity-extraction component to allow you to tag your documents' content with custom scopes.
First, we need to create a new entity dictionary which contains the terms we want this component to tag. We do this using the dictionary tools in Attivio Business Center.
Navigate to http://localhost:17000/dictionaryadmin/ (or click the Administrator UI's Business Center > Dictionary Administration menu link) in your web browser to open the Dictionary Administration web application.
Log in with username aieadmin and password attivio if prompted.
Create a new entity dictionary:
- Click the Create a New Dictionary button to access the New Dictionary screen.
- Enter a Dictionary Name value (e.g., CustomScopes).
- Click the Type drop-down control and select Entity as the type.
- Enter a Group name (e.g., Test).
- Click the Save button to save the new dictionary, then click OK to dismiss the save notification.
Add terms to the new dictionary:
- Click the Edit Terms button to access the Terms screen.
- Click the Add New Term button at the top right of the screen to open the New Term dialog box.
- Enter World War II as a Term value and click the Save button.
- Click the OK button to dismiss the notification message for the new term.
- Repeat the above steps to add two further terms: Great Depression and Cold War.
Approve and publish the new dictionary:
- Click the Approve button to approve the new dictionary (making it eligible for publication), then click OK to dismiss the approval notification.
- Click the Publish button to publish the approved dictionary (making it active), then click OK to dismiss the publication notification.
5. Configure an entity-extraction document transformer
Next we'll create a new ingest transformer component which tags the terms found in our new entity dictionary with a custom scope.
Navigate to http://localhost:17000/admin/workflow_ingest (or click the Administrator UI's System Management > Workflows > Ingest menu link) to access the Ingest workflow editor screen.
Click the extractBaseEntities workflow to open its editor dialog box.
This dialog box shows the list of stages (components) included in this workflow: these are document transformers which operate on documents routed through this workflow.
These stages run in order, so each document is first processed by sentenceFinder, then by companyFinder, and so on.
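The idea of stages running in order can be pictured as a list of transformers applied one after another to each document. The sketch below is a conceptual model only: the stage names sentenceFinder and companyFinder come from the workflow above, but the dictionary-based document, the processed_by trace, and the no-op transformers are assumptions made for illustration and are not Attivio's implementation.

```python
def run_workflow(stages, document):
    """Apply each (name, transform) stage to the document, in list order."""
    for name, transform in stages:
        document = transform(document)
        # Record the stage name so the processing order is visible afterwards.
        document["processed_by"].append(name)
    return document


def annotate(key):
    """Return a toy transformer that just flags the document under `key`."""
    def transform(document):
        document[key] = True
        return document
    return transform


if __name__ == "__main__":
    stages = [
        ("sentenceFinder", annotate("sentences_marked")),
        ("companyFinder", annotate("companies_marked")),
    ]
    doc = {"text": "Example text.", "processed_by": []}
    print(run_workflow(stages, doc)["processed_by"])
```

Appending a new entry to the stages list corresponds to adding a component at the end of the workflow: it sees each document only after all earlier stages have run.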
Create a new workflow component (stage) that instantiates the ExtractDictionaryEntities class:
- Click the Add New Component button to open the Platform Component selection dialog box.
- Type "Dictionary" into the filter field and click the Filter button to view only component classes whose names contain this term.
- Select the ExtractDictionaryEntities class and click the OK button to open the New ExtractDictionaryEntities dialog box.
Configure the new component:
- Enter historyFinder in the Name property field. This is the name of our new document transformer component.
- Enter history in the Entity Type property field. This specifies the name of the custom scope tag this component will assign to matched terms.
- Enter CustomScopes (the name of the new entity dictionary) in the Dictionary Name property field.
- Enter en (English) in the Default Locale property field.
- Click the Save button to save the new document transformer component.
The new historyFinder component should appear at the end of the list of workflow stages. (We could use the Move Up and Move Down buttons to re-order the workflow stages, but we don't need to for this exercise.)
Click the Save button to save the modified extractBaseEntities workflow.
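Conceptually, a dictionary-based extractor looks for each dictionary term in a document's text and tags every match with the configured entity type. The following toy sketch shows that idea using the terms and the history entity type from this exercise. It is not the ExtractDictionaryEntities implementation; the exact, case-sensitive, whole-word matching and the span output format are assumptions chosen to keep the example short.

```python
import re


def tag_terms(text, terms, entity_type):
    """Wrap each dictionary term found in text in a scope-style span tag."""
    if not terms:
        return text
    # Try longer terms first so a longer term wins over any shorter term it contains.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")
    return pattern.sub(
        lambda match: f'<span class="scope {entity_type}">{match.group(1)}</span>',
        text,
    )


if __name__ == "__main__":
    custom_scopes = ["World War II", "Great Depression", "Cold War"]
    print(tag_terms("After World War II, the Cold War began.", custom_scopes, "history"))
```

With the three terms entered in Step 4, only those phrases are tagged; text that contains none of them passes through unchanged.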
6. Re-ingest the Factbook data with the updated configuration
The new entity-extraction stage will operate on documents during ingestion, so we need to re-run our connectors, re-feeding our documents so that they can be processed by the new stage.
Repeat Step 6 from the Attivio Quick Start Tutorial to run the four sample Factbook connectors.
In the Search UI, enter text:SCOPE(history) into the search field, set the query language to Advanced, and click the Go button.
You should see only documents which include the terms "World War II", "Great Depression", or "Cold War", and those terms should be highlighted in the results.
Now, try extracting your own custom entities. Repeat steps 4, 5, and 6, creating your own custom entity dictionary and a corresponding component. For example, you could extract the names of famous people, such as Napoleon, Osama bin Laden, or Christopher Columbus, all of whom appear in the country data of the Factbook project.