Overview

Scanners connect to external data sources and feed their results into the Attivio Intelligence Engine (AIE). A scanner can then be run from the AIE Administrator.

Inside AIE, scanners are wrapped in connectors, which connect to an external data source and feed IngestDocuments to ingestion workflows.

Create the Project

This example carries on from the project creation example on the Using Java APIs page.

Creating Custom Code

This section demonstrates how to create a custom scanner (implemented in a connector) by creating a new Java class in the AIE Designer. This process includes:

  • Creating the custom scanner class.
  • Compiling the class.
  • Deploying the project to your AIE nodes.
  • Creating and configuring a new connector.
  • Running the connector and examining the new documents in SAIL.

The example does not demonstrate how to connect to your specific data source. 

Create a Custom Scanner in Java

In general, use the AbstractScanner class as the base class for all scanner development. It provides common logic that keeps scanner development simple and consistent across applications. Several useful variants of the AbstractScanner class are available in the com.attivio.connector package.

There are two critical methods to every scanner: 

  • validateConfiguration()
  • start()

When AIE starts a scanner, it first invokes validateConfiguration() to verify the scanner's configuration, and then calls start() to begin crawling the external data source.
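
The call order described above can be pictured with a small stand-alone sketch. The LifecycleScanner interface, the ScannerRunner class, and the RecordingScanner class are hypothetical stand-ins written for illustration. They are not part of the AIE SDK, they use unchecked exceptions for brevity, and the real engine does considerably more.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two scanner entry points; not an AIE type.
interface LifecycleScanner {
  void validateConfiguration();
  void start();
}

// Hypothetical runner: validates first, and only starts if validation passed.
class ScannerRunner {
  static void run(LifecycleScanner scanner) {
    scanner.validateConfiguration(); // an exception here prevents start()
    scanner.start();
  }
}

// Records the order of calls so the sequence can be inspected.
class RecordingScanner implements LifecycleScanner {
  final List<String> calls = new ArrayList<>();
  private final boolean valid;

  RecordingScanner(boolean valid) {
    this.valid = valid;
  }

  @Override
  public void validateConfiguration() {
    calls.add("validateConfiguration");
    if (!valid) {
      throw new IllegalStateException("configuration is incomplete");
    }
  }

  @Override
  public void start() {
    calls.add("start");
  }
}
```

With a valid configuration both calls are recorded in order. With an invalid one, the exception from validateConfiguration() means start() is never reached.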

A custom scanner is configured in the same way as any other Attivio connector. All properties from the configuration are set automatically through the standard JavaBean get/set methods. For example, SampleScanner supports a single property, myVariable.
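
To make the JavaBean convention concrete, here is a minimal sketch of how a property name such as myVariable can be mapped to a setMyVariable() call using plain Java reflection. SampleBean and BeanConfigurer are hypothetical names invented for this illustration. This is not AIE's own configuration code, which is not shown on this page, and the sketch handles only String properties.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical bean with one configurable property, mirroring myVariable.
class SampleBean {
  private String myVariable;

  public String getMyVariable() {
    return myVariable;
  }

  public void setMyVariable(String myVariable) {
    this.myVariable = myVariable;
  }
}

// Hypothetical helper: copies name/value pairs onto matching String setters.
class BeanConfigurer {
  static void apply(Object bean, Map<String, String> properties) {
    for (Map.Entry<String, String> entry : properties.entrySet()) {
      String name = entry.getKey();
      // JavaBean convention: property "myVariable" maps to "setMyVariable"
      String setterName = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
      try {
        Method setter = bean.getClass().getMethod(setterName, String.class);
        setter.invoke(bean, entry.getValue());
      } catch (ReflectiveOperationException e) {
        throw new IllegalArgumentException("no usable setter for property " + name, e);
      }
    }
  }
}
```

The practical consequence is that a property only becomes configurable when the class exposes a public setter whose name matches the property name.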

The code file, SampleScanner.java, is available as an attachment to this page.

Create a New Class

With the project open in AIE Designer, follow these steps to create the new scanner class.

  1. In the Package Explorer view, right-click the project name.  Select New > Class from the popup menu.
  2. In the New Java Class dialog box, enter the Package.  We used com.acme.examples.
  3. Enter the class Name.  We used SampleScanner.  Click the Finish button.
  4. This opens an editing buffer in AIE Designer for the code that defines the new class.  If you wish, you may copy the code from the link above and paste it into the buffer.

Package and Imports

The boilerplate at the top of the file must include the following import statements.

SampleScanner.java
package com.acme.examples;
 
import java.util.Date;
import com.attivio.connector.AbstractScanner;
import com.attivio.sdk.AttivioException;
import com.attivio.sdk.error.ConfigurationError;
import com.attivio.sdk.error.ConnectorError;
import com.attivio.sdk.ingest.IngestDocument;
import com.attivio.sdk.schema.FieldNames;
import com.attivio.sdk.server.annotation.ConfigurationOption;
import com.attivio.sdk.server.annotation.ConfigurationOptionInfo;
import com.attivio.sdk.server.annotation.ScannerInfo;

Optional Class Annotations

Classes defined in AIE Designer are automatically made available for editing in the AIE Administrator.  You can make this integration more graceful by supplying optional annotations prior to the class definition, as shown here.

SampleScanner.java
/** A sample scanner to retrieve data from a 3rd party repository. */
@ScannerInfo(suggestedWorkflow = "ingest")
@ConfigurationOptionInfo(
      displayName="SampleScanner Connector",
      description="Demonstrates scanner mechanism.", 
      groups = { 
          @ConfigurationOptionInfo.Group(
              path=ConfigurationOptionInfo.OTHER ,
              propertyNames={"myVariable"})
      })

The significance of these lines is:

  • The workflow field in the editor will be populated with "ingest". 
  • "SampleScanner Connector" will appear as the class name in the New Connector list on the Connectors page of the AIE Administrator.
  • If you hover the mouse over "SampleScanner Connector," AIE will pop up a label saying "Demonstrates scanner mechanism."
  • The myVariable property field will appear on the "Other" tab of the connector editor.

See Adding New Components to the AIE Administrator for more information.

validateConfiguration()

The validateConfiguration() method is run automatically when you start up the connector.  It checks to see that all required inputs are present and correct.  If not, it stops scanner execution by throwing an error. In this example, a very simple validateConfiguration() method checks to be sure that the myVariable property is not null.

SampleScanner.java
public class SampleScanner extends AbstractScanner {
 
  // a getter/setter will enable the variable to be configured via xml/code
  private String myVariable;
  
  @Override
  public void validateConfiguration() throws AttivioException {
    super.validateConfiguration();
    if (myVariable == null) {
      throw new AttivioException(ConfigurationError.MISSING_PROPERTY, "myVariable property must be set.");
    }
  }

The significance of this block of code is:

  • The new class, SampleScanner, extends the AbstractScanner class.
  • We declared myVariable to be a string property of the class.
  • If the myVariable property is null when the connector starts, validateConfiguration() throws an AttivioException and the scanner does not run.
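
A null check does not catch a value that is present but blank. Whether AIE passes an empty editor field as null or as an empty string is not stated on this page, so a stricter check may or may not be needed. The following sketch shows one way to reject both cases. SketchConfigurationException and ValidatedSettings are hypothetical stand-ins, not AIE types.

```java
// Hypothetical unchecked stand-in for a configuration failure; not an AIE type.
class SketchConfigurationException extends RuntimeException {
  SketchConfigurationException(String message) {
    super(message);
  }
}

// Hypothetical holder for the single property used in this example.
class ValidatedSettings {
  private String myVariable;

  void setMyVariable(String myVariable) {
    this.myVariable = myVariable;
  }

  // Rejects both a missing value and one that contains only whitespace.
  void validateConfiguration() {
    if (myVariable == null || myVariable.trim().isEmpty()) {
      throw new SketchConfigurationException("myVariable property must be set.");
    }
  }
}
```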

start()

The start() method is called when the scanner is run.  It contains all the logic related to obtaining the incoming records and sorting the values into the fields of an IngestDocument. 

Get the Content

This portion of the scanner is concerned with connecting to your content source and extracting documents from it.  Then one typically loops over the individual documents, building IngestDocuments. You'll have to fill in the blanks here.

SampleScanner.java
  @Override
  public void start() throws AttivioException {
    try {
      // connect to the 3rd party system

      // loop over all of the documents in the repository
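
Because the connection logic depends entirely on your data source, the following is only a sketch of the looping step, using an in-memory list in place of a real repository. SourceRecord, SketchDocument, and CrawlLoopSketch are hypothetical names invented for this illustration. SketchDocument is a simplified stand-in and does not reproduce the IngestDocument API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a record fetched from the external system.
class SourceRecord {
  final String key;
  final String title;

  SourceRecord(String key, String title) {
    this.key = key;
    this.title = title;
  }
}

// Hypothetical stand-in for a document: an ID fixed at creation plus named fields.
class SketchDocument {
  private final String id;
  private final Map<String, Object> fields = new LinkedHashMap<>();

  SketchDocument(String id) {
    this.id = id;
  }

  String getId() {
    return id;
  }

  void setField(String name, Object value) {
    fields.put(name, value);
  }

  Object getField(String name) {
    return fields.get(name);
  }
}

class CrawlLoopSketch {
  // Builds one document per record, deriving a unique ID from the record key.
  static List<SketchDocument> crawl(List<SourceRecord> records, String sourceName) {
    List<SketchDocument> docs = new ArrayList<>();
    for (SourceRecord record : records) {
      SketchDocument doc = new SketchDocument(sourceName + "-" + record.key);
      doc.setField("title", record.title);
      docs.add(doc); // a real scanner would feed the document here instead
    }
    return docs;
  }
}
```

Prefixing the record key with a source name is one simple way to keep IDs unique when several sources feed the same index. Any scheme that yields a stable, unique ID per document would serve.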

Construct IngestDocuments

The next section of the example demonstrates the rudiments of assembling an IngestDocument.

SampleScanner.java
      // create a new IngestDocument for each, use/generate a unique ID for each document
      IngestDocument doc = new IngestDocument("1");
      // populate the document with the source document's metadata
      doc.setField(FieldNames.TITLE, "my document title");
      doc.setField(FieldNames.DATE, new Date());
      // use variables for configuration or document information
      doc.setField(FieldNames.SOURCEURI, myVariable);
      // set binary content for the file, use other put methods to stream in binary content
      doc.setField(FieldNames.CONTENT_POINTER, super.put("1", new byte[0]));

Document ID cannot be changed

Once an IngestDocument has been created, its ID field value cannot be changed.

Feed the Documents

"Feeding" the document releases it into an ingestion workflow.

SampleScanner.java
      // feed each document; batching etc. will be handled automatically
      super.feed(doc);

      // feed more documents
      super.feed(new IngestDocument("2"));

Delete Documents

A scanner can also delete indexed documents, although not every scanner does.

SampleScanner.java
      // delete documents (if necessary); use other methods to delete by query, etc.
      super.getMessagePublisher().delete("3");

Catch and Throw Errors

One AttivioException in particular must be rethrown by a scanner.  If the scanner does not rethrow the CRAWL_STOPPED exception, it will be impossible to pause or stop the scanner once it starts running.

Other exceptions you may handle as you please.

SampleScanner.java
      // The CRAWL_STOPPED exception must be re-thrown, or you won't be able to pause
      // or stop the connector.
    } catch (AttivioException e) {
      if (e.getErrorCode() == ConnectorError.CRAWL_STOPPED) {
        throw e;
      }
      // Handle other exceptions as you please; this example simply rethrows them.
      throw e;
    }
  }
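
As one illustration of handling other exceptions differently, the sketch below skips records that fail while still propagating the stop signal. SketchError, SketchException, RecordHandler, and StopAwareLoop are hypothetical stand-ins invented here. They mimic the shape of the error-code check above but are not AIE classes, and the skip-and-continue policy is only one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical error codes standing in for ConnectorError values.
enum SketchError { CRAWL_STOPPED, BAD_RECORD }

// Hypothetical unchecked stand-in for AttivioException.
class SketchException extends RuntimeException {
  private final SketchError code;

  SketchException(SketchError code, String message) {
    super(message);
    this.code = code;
  }

  SketchError getErrorCode() {
    return code;
  }
}

// Hypothetical per-record step that may fail.
interface RecordHandler {
  void handle(String recordId);
}

class StopAwareLoop {
  // Processes records, skipping bad ones but always propagating a stop request.
  static List<String> run(List<String> recordIds, RecordHandler handler) {
    List<String> skipped = new ArrayList<>();
    for (String id : recordIds) {
      try {
        handler.handle(id);
      } catch (SketchException e) {
        if (e.getErrorCode() == SketchError.CRAWL_STOPPED) {
          throw e; // never swallow the stop signal
        }
        skipped.add(id); // local policy for other errors: note and move on
      }
    }
    return skipped;
  }
}
```

The important detail is the order of the checks: the stop code is tested before any recovery logic runs, so no handling policy can accidentally absorb it.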

Get() and Set() Methods

Each scanner property, such as myVariable, needs a pair of get() and set() methods.  This is another place where we can insert optional annotations to guide how the AIE Administrator will present this property in the Connector Editor.

SampleScanner.java
  @ConfigurationOption(displayName="MyVariable",
      description = "Enter value string.")
  public String getMyVariable() {
    return myVariable;
  }
 
  public void setMyVariable(String myVariable) {
    this.myVariable = myVariable;
  }
 
}

Creating a New Connector

The new scanner is only a part of the new connector.  We have to use the AIE Administrator to create the actual connector.

Follow this procedure to create the new connector.

  1. In the AIE Designer, go to the AIE Runtime menu and select Start All Project Servers. Wait until the AIE node is running.
  2. Open your browser to the URL of the AIE Administrator, which is usually http://host:17000/admin.
  3. Navigate to the System Management > Connectors page.
  4. Click New.  From the New Connector list, select SampleScanner Connector.  Click OK.
  5. In the New Connector dialog, on the Scanner tab, give the connector a name.  We used mySimpleScanner.
  6. Note that the Ingest Workflow field is prepopulated with the value "ingest" because of the annotation that was added to the scanner definition.
  7. Switch to the Other tab.  Note the "Value of MyVariable" field. Again, the label of the field is due to one of our annotations.
  8. Fill in any value for MyVariable. We used "c:\documents" but any string will do. (If you leave the field empty, you'll see validateConfiguration() throw an error when you start the connector.)
  9. Click Save.

Test the New Connector

To test the new connector, follow this procedure:

  1. On the AIE Administrator Connectors page, right-click mySimpleScanner and select Start from the context menu.
  2. The scanner will run to completion in a few seconds.  It will claim to have processed three documents.  That's two indexed documents and one attempt at a deletion.
  3. Navigate to Query > SAIL.
  4. Search for *:*.
  5. Open the Search Options display, and check the Debug box.  Click the Search button.
  6. Examine the search results.  You should see two documents, with the fields and field values specified in the scanner class definition.
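
The figure of three processed documents in step 2 follows from counting each feed and each delete as one message. A minimal sketch of that bookkeeping, using a hypothetical CountingPublisher class that is not part of the AIE SDK and does not reflect how AIE tracks its statistics internally:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that records each message a scanner sends.
class CountingPublisher {
  private final List<String> messages = new ArrayList<>();

  void feed(String docId) {
    messages.add("feed:" + docId);
  }

  void delete(String docId) {
    messages.add("delete:" + docId);
  }

  // Total messages sent, whether feeds or deletes.
  int processed() {
    return messages.size();
  }

  // Only the feeds, which are the documents that can appear in search results.
  long fed() {
    return messages.stream().filter(m -> m.startsWith("feed:")).count();
  }
}
```

Feeding documents "1" and "2" and deleting "3", as the sample scanner does, gives three processed messages but only two fed documents, which matches the two results expected in SAIL.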

 

 
