Scanners are designed to connect to external data sources and feed their results into the Attivio Intelligence Engine (AIE). A scanner can then be run from the AIE Administrator.
Inside AIE, scanners are wrapped in connectors, which connect to an external data source and feed IngestDocuments to ingestion workflows.
Create the Project
This example carries on from the project creation example on the Using Java APIs page.
Creating Custom Code
This section demonstrates how to create a custom scanner (implemented in a connector) by creating a new Java class in the AIE Designer. This process includes:
- Creating the custom scanner class.
- Compiling the class.
- Deploying the project to your AIE nodes.
- Creating and configuring a new connector.
- Running the connector and examining the new documents in SAIL.
The example does not demonstrate how to connect to your specific data source.
Create a Custom Scanner in Java
In general, the AbstractScanner class should be used as a base class for all scanner development, as it provides common logic to make scanner development simple and consistent across different applications. There are several useful variants of the AbstractScanner class available in the com.attivio.connector package.
Every scanner has two critical methods. When AIE starts a scanner, it first invokes validateConfiguration() to verify the scanner's configuration, and then calls start() to begin crawling the external data source.
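The call order can be modeled in plain Java. This is a minimal sketch, not the AIE API: SketchScanner and OrderDemoScanner are hypothetical stand-ins, and launch() plays the part AIE itself plays when it starts a scanner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AbstractScanner; NOT the real AIE class.
abstract class SketchScanner {
    // Records the order in which the lifecycle methods run (illustration only).
    final List<String> calls = new ArrayList<>();

    abstract void validateConfiguration();

    abstract void start();

    // Plays the role of AIE starting the scanner: validate first, then crawl.
    // If validateConfiguration() throws, start() is never reached.
    final void launch() {
        validateConfiguration();
        start();
    }
}

class OrderDemoScanner extends SketchScanner {
    @Override
    void validateConfiguration() {
        calls.add("validateConfiguration");
    }

    @Override
    void start() {
        calls.add("start");
    }
}
```

Because validation runs first, a scanner with a bad configuration fails before it touches the data source.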
A custom scanner is configured in the same way as any other Attivio connector. All properties from the configuration are set automatically through the standard JavaBean get/set methods. For example, the SampleScanner supports a single property, myVariable, through its getMyVariable() and setMyVariable() methods.
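The general JavaBean technique can be illustrated with the JDK's java.beans.Introspector. This sketch reflects an assumption about how bean-style configuration works in general, not a description of AIE's implementation: BeanStyleScanner and BeanConfigurer are hypothetical, and only String properties are handled.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;

// Illustrative bean with one property, mirroring myVariable in SampleScanner.
class BeanStyleScanner {
    private String myVariable;

    public String getMyVariable() {
        return myVariable;
    }

    public void setMyVariable(String myVariable) {
        this.myVariable = myVariable;
    }
}

// Hypothetical helper: copies configuration values onto matching bean setters.
class BeanConfigurer {
    static void apply(Object target, Map<String, String> config) {
        try {
            BeanInfo info = Introspector.getBeanInfo(target.getClass());
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                String value = config.get(pd.getName());
                Method setter = pd.getWriteMethod();
                // This sketch only handles writable String properties.
                if (value != null && setter != null && pd.getPropertyType() == String.class) {
                    setter.invoke(target, value);
                }
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("could not configure " + target.getClass().getName(), e);
        }
    }
}
```

The property name "myVariable" is derived from the getMyVariable()/setMyVariable() pair, so the configuration key and the accessor names must follow the JavaBean naming convention.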
The code file, SampleScanner.java, is available as an attachment to this page.
Create a New Class
With the project open in AIE Designer, follow these steps to create the new scanner class.
- In the Package Explorer view, right-click the project name. Select New > Class from the popup menu.
- In the New Java Class dialog box, enter the Package. We used com.acme.examples.
- Enter the class Name. We used SampleScanner. Click the Finish button.
- This opens an editing buffer in AIE Designer for the code that defines the new class. If you wish, you may copy the code from the link above and paste it into the buffer.
Package and Imports
The boilerplate at the top of the file must include the following import statements.
Optional Class Annotations
Classes defined in AIE Designer are automatically made available for editing in the AIE Administrator. You can make this integration more graceful by supplying optional annotations prior to the class definition, as shown here.
The significance of these lines is:
- The workflow field in the editor will be populated with "ingest".
- "SampleScanner Connector" will appear as the class name in the New Connector list on the Connectors page of the AIE Administrator.
- If you hover the mouse over "SampleScanner Connector," AIE will pop up a label saying "Demonstrates scanner mechanism."
- The myVariable property field will appear on the "Other" tab of the connector editor.
See Adding New Components to the AIE Administrator for more information.
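One plausible way an administration UI can use such annotations is to read them back at runtime through reflection. In this sketch the annotation HypotheticalScannerInfo and the ConnectorCatalog helper are invented for illustration; the real AIE annotation names and elements are not reproduced here, so take those from SampleScanner.java.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation invented for this sketch; NOT the real AIE annotation.
// RUNTIME retention is what makes it readable through reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface HypotheticalScannerInfo {
    String displayName();

    String description();

    String defaultWorkflow();
}

@HypotheticalScannerInfo(
        displayName = "SampleScanner Connector",
        description = "Demonstrates scanner mechanism.",
        defaultWorkflow = "ingest")
class AnnotatedSampleScanner {
}

// Models how an admin UI could look up the metadata for a scanner class.
class ConnectorCatalog {
    // Label for the New Connector list; falls back to the simple class name.
    static String listLabel(Class<?> scannerClass) {
        HypotheticalScannerInfo info = scannerClass.getAnnotation(HypotheticalScannerInfo.class);
        return info != null ? info.displayName() : scannerClass.getSimpleName();
    }

    // Value used to prefill the workflow field, or null if none is declared.
    static String defaultWorkflow(Class<?> scannerClass) {
        HypotheticalScannerInfo info = scannerClass.getAnnotation(HypotheticalScannerInfo.class);
        return info != null ? info.defaultWorkflow() : null;
    }
}
```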
The validateConfiguration() method runs automatically when you start the connector. It checks that all required inputs are present and valid; if they are not, it stops the scanner by throwing an exception. In this example, a very simple validateConfiguration() method checks that the myVariable property is not null.
The significance of this block of code is:
- The new class, SampleScanner, extends the AbstractScanner class.
- We declared myVariable to be a string property of the class.
- If the myVariable property is null when the scanner starts to run, AIE will throw an exception.
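A minimal sketch of that null check, assuming a hypothetical unchecked ConfigurationException in place of AttivioException. ValidatingScanner does not extend AbstractScanner here so that the example compiles without the AIE libraries.

```java
// Hypothetical stand-in for AttivioException; the real type and its
// constructors are not reproduced here.
class ConfigurationException extends RuntimeException {
    ConfigurationException(String message) {
        super(message);
    }
}

// Mirrors the SampleScanner validation described above. In the real class this
// would extend AbstractScanner and throw an AttivioException instead.
class ValidatingScanner {
    private String myVariable;

    public String getMyVariable() {
        return myVariable;
    }

    public void setMyVariable(String myVariable) {
        this.myVariable = myVariable;
    }

    public void validateConfiguration() {
        // Fail fast so the crawl never starts with a missing property.
        if (myVariable == null) {
            throw new ConfigurationException("myVariable must be set");
        }
    }
}
```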
The start() method is called when the scanner runs. It contains all the logic for obtaining the incoming records and mapping their values into the fields of an IngestDocument.
Get the Content
This portion of the scanner is concerned with connecting to your content source and extracting documents from it. Then one typically loops over the individual documents, building IngestDocuments. You'll have to fill in the blanks here.
The next section of the example demonstrates the rudiments of assembling an IngestDocument.
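The loop that turns source records into documents might look like the following sketch. It assumes each record arrives as a Map with a hypothetical "key" entry used as the document ID; SketchDocument is a simplified stand-in for IngestDocument, not its real API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified document: an ID plus named fields.
class SketchDocument {
    private final String id;
    private final Map<String, String> fields = new LinkedHashMap<>();

    SketchDocument(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }

    void setField(String name, String value) {
        fields.put(name, value);
    }

    String getField(String name) {
        return fields.get(name);
    }
}

// Source records are modeled as maps; a real scanner would read them from
// the external data source it connects to.
class RecordLoopScanner {
    private final List<Map<String, String>> records;
    final List<SketchDocument> built = new ArrayList<>();

    RecordLoopScanner(List<Map<String, String>> records) {
        this.records = records;
    }

    void start() {
        for (Map<String, String> record : records) {
            // Use a stable key from the source as the document ID.
            SketchDocument doc = new SketchDocument(record.get("key"));
            // Map the remaining source values onto document fields.
            for (Map.Entry<String, String> e : record.entrySet()) {
                if (!e.getKey().equals("key")) {
                    doc.setField(e.getKey(), e.getValue());
                }
            }
            built.add(doc);
        }
    }
}
```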
Document ID cannot be changed
Once an IngestDocument has been created, its ID field value cannot be changed.
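One way to model a fixed ID, sketched with a hypothetical FixedIdDocument class: the ID is a final field with no setter, so using a different ID means building a new document. The copyWithId() helper and the non-empty ID check are choices made for this sketch, not documented IngestDocument behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical document class: the ID is final and has no setter.
final class FixedIdDocument {
    private final String id;
    private final Map<String, String> fields = new LinkedHashMap<>();

    FixedIdDocument(String id) {
        // This sketch rejects missing IDs up front.
        if (id == null || id.isEmpty()) {
            throw new IllegalArgumentException("id is required");
        }
        this.id = id;
    }

    String getId() {
        return id;
    }

    void setField(String name, String value) {
        fields.put(name, value);
    }

    String getField(String name) {
        return fields.get(name);
    }

    // To use a different ID, build a new document and copy the fields across.
    FixedIdDocument copyWithId(String newId) {
        FixedIdDocument copy = new FixedIdDocument(newId);
        copy.fields.putAll(fields);
        return copy;
    }
}
```

Choosing the ID carefully before the document is created therefore matters more than any later field edits.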
Feed the Documents
"Feeding" the document releases it into an ingestion workflow.
Note that you can use a scanner to delete indexed documents! (Not every scanner does this.)
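The difference between feeding and deleting can be sketched with a hypothetical DocumentSink interface. The interface, RecordingSink, FeedingExample, and the document IDs are assumptions for illustration; AIE's own feeding methods are not shown. Two feeds plus one delete add up to the three processed documents reported when the sample connector runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sink standing in for the AIE feeding mechanism.
interface DocumentSink {
    // Send a document into the ingestion workflow.
    void feed(String docId);

    // Ask for a previously indexed document to be removed.
    void delete(String docId);
}

// Records every operation so the counts can be checked.
class RecordingSink implements DocumentSink {
    final List<String> indexed = new ArrayList<>();
    final List<String> deleted = new ArrayList<>();

    @Override
    public void feed(String docId) {
        indexed.add(docId);
    }

    @Override
    public void delete(String docId) {
        deleted.add(docId);
    }

    int operations() {
        return indexed.size() + deleted.size();
    }
}

class FeedingExample {
    // Two feeds and one delete request.
    static void run(DocumentSink sink) {
        sink.feed("doc1");
        sink.feed("doc2");
        sink.delete("doc3");
    }
}
```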
Catch and Throw Errors
One AttivioException in particular must be rethrown by a scanner. If the scanner does not rethrow the CRAWL_STOPPED exception, it will be impossible to pause or stop the scanner once it starts running.
Other exceptions you may handle as you please.
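The rethrow rule can be sketched as follows. SketchErrorCode, SketchAttivioException, and StoppableLoop are hypothetical stand-ins for AttivioException and its CRAWL_STOPPED code; how the real exception exposes its error code is not shown here. feedOne() simulates a feed call that may fail.

```java
// Hypothetical stand-ins for AttivioException and its error codes; the real
// class, its constructors, and its accessors are not reproduced here.
enum SketchErrorCode { CRAWL_STOPPED, SOURCE_UNAVAILABLE }

class SketchAttivioException extends Exception {
    final SketchErrorCode code;

    SketchAttivioException(SketchErrorCode code, String message) {
        super(message);
        this.code = code;
    }
}

class StoppableLoop {
    int fed = 0;
    int handledErrors = 0;

    // Simulated feed call: a non-null outcome makes it fail with that code.
    void feedOne(SketchErrorCode outcome) throws SketchAttivioException {
        if (outcome != null) {
            throw new SketchAttivioException(outcome, "simulated " + outcome);
        }
        fed++;
    }

    void start(SketchErrorCode[] outcomes) throws SketchAttivioException {
        for (SketchErrorCode outcome : outcomes) {
            try {
                feedOne(outcome);
            } catch (SketchAttivioException e) {
                if (e.code == SketchErrorCode.CRAWL_STOPPED) {
                    // Must propagate, or the connector can never be paused or stopped.
                    throw e;
                }
                // Any other error: count it and continue with the next item.
                handledErrors++;
            }
        }
    }
}
```

A catch-all handler that logs and continues would silently swallow the stop request, which is exactly the failure the rule guards against.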
Get() and Set() Methods
Each scanner property, such as myVariable, needs a pair of get() and set() methods. This is another place where we can insert optional annotations to guide how the AIE Administrator will present this property in the Connector Editor.
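Property-level annotations follow the same idea as class-level ones but sit on an accessor. In this sketch HypotheticalOption is an invented annotation placed on the getter, and EditorLabels shows how a UI could read the label and tab back; the real AIE annotation, and which accessor it belongs on, are not reproduced here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical property annotation; NOT the real AIE annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface HypotheticalOption {
    String displayName();

    String tab() default "Other";
}

class AnnotatedPropertyScanner {
    private String myVariable;

    @HypotheticalOption(displayName = "Value of MyVariable")
    public String getMyVariable() {
        return myVariable;
    }

    public void setMyVariable(String myVariable) {
        this.myVariable = myVariable;
    }
}

class EditorLabels {
    // Returns "tab: label" for a public getter, or null if it carries no annotation.
    static String describe(Class<?> type, String getterName) {
        try {
            Method m = type.getMethod(getterName);
            HypotheticalOption opt = m.getAnnotation(HypotheticalOption.class);
            return opt == null ? null : opt.tab() + ": " + opt.displayName();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such getter: " + getterName, e);
        }
    }
}
```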
Creating a New Connector
The new scanner is only a part of the new connector. We have to use the AIE Administrator to create the actual connector.
Follow this procedure to create the new connector.
- In the AIE Designer, go to the AIE Runtime menu and select Start All Project Servers. Wait until the AIE node is running.
- Open your browser to the URL of the AIE Administrator, which is usually http://host:17000/admin.
- Navigate to the System Management > Connectors page.
- Click New. From the New Connector list, select SampleScanner Connector. Click OK.
- In the New Connector dialog, on the Scanner tab, give the connector a name. We used mySimpleScanner.
- Note that the Ingest Workflow field is prepopulated with the value "ingest" because of the annotation added to the scanner definition.
- Switch to the Other tab. Note the "Value of MyVariable" field. Again, the label of the field is due to one of our annotations.
- Fill in any value for MyVariable. We used "c:\documents", but any string will do. (If you leave the field empty, validateConfiguration() will throw an error when you start the connector.)
- Click Save.
Test the New Connector
To test the new connector, follow this procedure:
- On the AIE Administrator Connectors page, right-click mySimpleScanner and select Start from the context menu.
- The scanner runs to completion in a few seconds and claims to have processed three documents: two indexed documents and one deletion attempt.
- Navigate to Query > SAIL.
- Search for *:*.
- Open the Search Options display, and check the Debug box. Click the Search button.
- Examine the search results. You should see two documents, with the fields and field values specified in the scanner class definition.