The following exercise builds on the Attivio Quick Start tutorial to introduce the concept of building a custom connector using the Attivio Connector SDK. We'll deploy the Factbook project and then create a custom connector which will connect to an email server via IMAP and ingest the emails present in the Inbox for the account into our Attivio index.
In order to complete the following exercise, you'll need to have the following software installed:
|Application||Recommended Version||Download Location||Notes|
|Maven||3.5 or later||https://maven.apache.org/|
|Eclipse||Oxygen or later||https://www.eclipse.org/||Be sure to install the M2Eclipse plugin or use the Eclipse IDE for Java Developers installer which includes Maven integration.|
1. Deploy the Attivio Factbook Project
Open the Attivio Quick Start Tutorial in a new tab in your browser, follow the instructions to deploy the Factbook project, and return to this page when complete.
2. Create an Attivio Java SDK project
The Attivio Java SDK allows custom connectors to be added to your Attivio projects. The SDK produces a custom module which you can install in any compatible Attivio project. In the next several steps, we will create a custom connector which will log into an email server via IMAP and ingest all the emails that are in the Inbox.
Navigate to the directory where you want to create your connector module. This should be a separate location from the Attivio installation or any Attivio projects you have created.
Create a new module using the Maven archetype command:
This will download a number of artifacts and prompt you for the following:
- groupId: enter 'com.attivio.platform'
- artifactId: enter 'imapconnector'
- package: accept the default package
- Confirm the settings by typing 'Y'
This command will result in a module project directory as follows:
A number of sample source code files and their respective tests are included by default. Feel free to review them, such as
Import the imapconnector project into Eclipse:
- In Eclipse, choose File > Import.
- Expand the Maven group and select Existing Maven Projects and click Next.
- Navigate to the project directory.
- The project's pom.xml file should be displayed and selected. Click Finish.
- In Eclipse's Package Explorer, expand the
- In the next step, we will create the code for the IMAP Connector in a class named
3. Copy the SampleDataSourceScanner Example Class
The Connector SDK provides a number of code samples to get you started in creating all types of connectors. To begin, copy the file at
/imapconnector/src/main/java/com/attivio/platform/imapconnector/SampleDataSourceScanner.java and create a new file from it named
IMapConnector.java in the same package.
4. Define the Configuration of the Connector
Observe the first part of the code that contains a comment, followed by the
@ConfigurationOptionInfo annotations. These describe the purpose of the connector to users when using the Connector Admin wizard and specify its preferred workflow. We can also configure any custom properties we want users to be able to set and where they will be displayed.
Edit the comment to adequately describe the connector.
Since emails can have attachments which will require text extraction, we'll configure our connector to submit the documents it feeds to the fileIngest workflow which Attivio provides for this purpose.
Next, let's specify the name and description which will be displayed when users are creating a new connector in the Connector Admin. Ignore the
groups section for now; we'll come back to that.
Next, we will identify all the properties we wish to allow users to set. These include:
||The mail server which supports IMap, such as
The port you need to use to connect using IMAP securely, such as
||The username which will be used when logging in to the server||String|
||The password which will be used when logging in to the server||String|
||The list of folders from which to retrieve emails||List of Strings|
The sample code includes a single property definition as well as its getter and setter. We'll modify this existing one for our
Notice the naming convention of the methods. This is important for the Connector Admin UI to function properly. The getter is named by prepending "get" to the name of the variable with its first character capitalized. The setter is likewise named by prepending "set". Therefore, the getter and setter for the
@ConfigurationOption annotation of the getter gives us the ability to provide a
description which will provide guidance to the user in the UI. We've also specified that this field is required using the
Next, we'll add the additional properties.
Notice we've made each of these required and we've used a
folders getters so that the text of the
password field is masked in the UI and the
folders property is presented as a list in the UI, complete with buttons to add and remove entries. There are a number of other
formEntryClass options which you might find useful when building connectors. Feel free to explore the examples in the
Now we can return to the @ConfigurationOptionInfo annotation of our class and edit the list of property names.
We've specified that our 5 properties should be displayed on a tab labelled "IMap Settings" in the new connector wizard. We could also choose to display them on an existing tab.
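The getter and setter naming convention described in this step can be sketched in plain Java. This is only an illustration, not the connector code itself: `ImapSettingsSketch`, `accessorName`, and `hasAccessors` are hypothetical names, only two of the five properties are shown, and the Attivio annotations are left out because their attributes are not reproduced here.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the connector's properties; Attivio annotations are omitted. */
final class ImapSettingsSketch {
    private String emailServer;
    private List<String> folders = new ArrayList<>();

    public String getEmailServer() { return emailServer; }
    public void setEmailServer(String emailServer) { this.emailServer = emailServer; }

    public List<String> getFolders() { return folders; }
    public void setFolders(List<String> folders) { this.folders = folders; }

    /** Builds an accessor name: the prefix plus the property name with its first character capitalized. */
    static String accessorName(String prefix, String property) {
        return prefix + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    /** Returns true when the type exposes both a public getter and a public setter for the property. */
    static boolean hasAccessors(Class<?> type, String property) {
        String getter = accessorName("get", property);
        String setter = accessorName("set", property);
        boolean foundGetter = false;
        boolean foundSetter = false;
        for (Method m : type.getMethods()) {
            if (m.getName().equals(getter) && m.getParameterCount() == 0) {
                foundGetter = true;
            }
            if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                foundSetter = true;
            }
        }
        return foundGetter && foundSetter;
    }
}
```

A check such as `hasAccessors(ImapSettingsSketch.class, "emailServer")` could go in a unit test to catch a misnamed getter or setter before the module is installed.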
5. Create a Test Class and Define a Temporary
Before we get too far along, we should take some time to establish a test class. Once again, we'll copy the provided sample code.
Copy the file at
/imapconnector/src/test/java/com/attivio/platform/imapconnector/SampleDataSourceScannerTest.java and create a new file from it named
IMapConnectorTest.java in the same package.
Modify the source code of the test class to match the following:
Replace the definition of the
start method of the
IMapConnector class with the following:
- Temporarily comment out the validateConfiguration method. We'll return to it later.
- Right-click the IMapConnectorTest.java class in Eclipse's Package Explorer and choose Run As > JUnit Test. The test should pass. However, we've yet to implement our custom functionality. Let's continue on; now that we have a test class, we can update it as we progress.
6. Implement Authentication
Now the fun begins. We can start to implement the logic of the connector. We'll start with logging in.
Start by modifying the project's
pom.xml file to add the following dependency. We'll be using this library in our connector code.
Add the following imports to our connector class:
Next, modify the
start method to match the following. This will connect to the server, open one or more folders and create a document with the name of the folder as its title for each.
Let's update our test class to see if our connector code successfully logs into the server. Update your test class to match the following, filling in your own values for
password in place of
Re-run the test. If it does not pass, take the time to debug your code now and continue once your connector can successfully log into the server.
7. Iterate Through Messages and Create Documents
Now that we can connect to our server and open and close one or more folders, we're ready to start ingesting the messages themselves.
Edit the imports, the start method, and the new setEnvelope method to match the following:
Edit the test class to match the following, filling in your own values for
password in place of
Re-run the test to confirm we have ingested the first message.
Sidebar on IngestDocument
In some of the previous code snippets, we've created
IngestDocument objects. These represent each searchable document we are adding to the Attivio index and are a central part of creating custom connectors.
Following are some basics when working with the
|Import statement which allows you to create and modify documents to feed into Attivio|
|Creates a new |
|Import statement which allows you to access the |
|Contains document field names that are frequently used in Attivio.|
|Submits the document to the ingestion workflow. This should be done once each document is completely constructed.|
8. Handle Attachments
So far, we've only generated code to create some simple
IngestDocument objects with ids, titles, and a couple of other metadata fields. Emails retrieved using IMAP can be plain text or HTML and can also have attachments. The following code should create the necessary documents for us. Beneath the code is an explanation of the snippets we have not seen yet.
In the most recent code, we've introduced a bit more of the SDK:
|To avoid consuming an excessive amount of memory while ingesting documents, when we have binary content, such as a PDF, or even a large amount of text (which some emails have), we put the binary payload in the Content Store and only carry a pointer to this content in the |
|It's likely we'll be interested in displaying an email along with its attachments in our search application. It is a best practice to maintain this "parent-child" relationship by populating a field named "parentid" on the child documents with the .id value of the document from which they came.|
|Notice when we create the attachment documents, we take care to give them unique ids by prepending the original id of the parent document with "attachment-" and appending the filename of the attachment.|
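The id and parent-child conventions in the table above can be sketched without the SDK. In this illustration a plain `Map` stands in for the child `IngestDocument`. The class and method names, the "-" separator before the filename, and the `filename` field are assumptions; the "attachment-" prefix, the `.id` value, and the `parentid` field come from the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper showing how an attachment document's id and parent link are built. */
final class AttachmentDocSketch {

    /** Prepends "attachment-" to the parent id and appends the filename. */
    static String attachmentId(String parentId, String filename) {
        // The "-" separator before the filename is an assumption.
        return "attachment-" + parentId + "-" + filename;
    }

    /** Returns the fields a child document would carry; a Map stands in for IngestDocument. */
    static Map<String, String> attachmentFields(String parentId, String filename) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(".id", attachmentId(parentId, filename));
        fields.put("parentid", parentId);   // links the attachment back to its email
        fields.put("filename", filename);   // hypothetical field name
        return fields;
    }
}
```

If one email can carry two attachments with the same filename, this scheme would produce duplicate ids, so an index or counter may need to be added to the id.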
9. Add Logging
In the above code, you may have noticed several
System.out.println() statements. While these are convenient while developing and testing exclusively in Eclipse, they are not what we want when our connector is ready for deployment to a running Attivio project. Instead, we want to write to the same logs as all other connectors.
Add the following import:
Add the following to the class:
Replace all the
System.out.println() statements with
LOG.trace() statements as appropriate.
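The replacement described above looks roughly like the following sketch. The logger import and declaration used by the tutorial are not reproduced here, so this example substitutes the JDK's own `System.Logger` (Java 9 and later) purely as a stand-in. The class name, method names, and message text are hypothetical, and the connector's real `LOG.trace()` call will look slightly different.

```java
/** Hypothetical example of swapping a println for a trace-level log call. */
final class LoggingSketch {
    // Stand-in logger; the connector itself would use the logger added in this step.
    private static final System.Logger LOG =
            System.getLogger(LoggingSketch.class.getName());

    /** Formats the progress message that previously went to System.out. */
    static String describe(String folder, int messageCount) {
        return "Opened folder " + folder + " containing " + messageCount + " messages";
    }

    static void reportFolder(String folder, int messageCount) {
        // Before: System.out.println(describe(folder, messageCount));
        if (LOG.isLoggable(System.Logger.Level.TRACE)) {
            LOG.log(System.Logger.Level.TRACE, describe(folder, messageCount));
        }
    }
}
```

Trace-level output is usually disabled by default, so these messages appear only when the log level is lowered for the connector.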
10. Add Configuration Validation
Next, we want to catch any bad configurations. We can validate the values set for our custom properties. We would implement any such logic in the provided
For example, we could check whether the port that is set is a number greater than 0.
We could take this much further to ensure that the
emailServer property is either a valid domain or IP address. If there are rules for the
folders, we could validate those as well.
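A minimal sketch of such checks follows, written in plain Java so it stands alone. `ConfigValidationSketch` and `InvalidConfigurationException` are hypothetical stand-ins for the connector class and the SDK exception. The upper bound of 65535 on the port and the non-empty checks on the server and folders are additions beyond the "greater than 0" rule mentioned above.

```java
import java.util.List;

/** Hypothetical validation logic for the connector's custom properties. */
final class ConfigValidationSketch {

    /** Stand-in for the exception the connector would throw on a bad configuration. */
    static final class InvalidConfigurationException extends Exception {
        InvalidConfigurationException(String message) {
            super(message);
        }
    }

    static void validate(String emailServer, int port, List<String> folders)
            throws InvalidConfigurationException {
        if (emailServer == null || emailServer.trim().isEmpty()) {
            throw new InvalidConfigurationException("emailServer must not be empty");
        }
        // The tutorial's rule is port > 0; 65535 is the highest valid TCP port.
        if (port <= 0 || port > 65535) {
            throw new InvalidConfigurationException("port must be between 1 and 65535: " + port);
        }
        if (folders == null || folders.isEmpty()) {
            throw new InvalidConfigurationException("at least one folder is required");
        }
    }

    /** Convenience wrapper for tests: true when validate() raises no exception. */
    static boolean isValid(String emailServer, int port, List<String> folders) {
        try {
            validate(emailServer, port, folders);
            return true;
        } catch (InvalidConfigurationException e) {
            return false;
        }
    }
}
```

Keeping the checks in one method that throws on the first problem means the user sees a specific message rather than a failure partway through a run.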
11. Handle Exceptions
Another improvement we should make is to throw appropriate errors when things go wrong. In the validation code we added in the previous step, you can see that we throw an
AttivioException when we hit an issue. We have access to a number of error codes via the
ConnectorError object. You can pick the most appropriate error code to throw, defaulting to
12. Set the Module Configuration
Edit the src/main/resources/attivio.module.json file to add the new component [connector]:
13. Build and Install the imapconnector Module in the Factbook Project
At this point, our IMapConnector class should look like the following:
Now, let's build it and install it in our Factbook project.
When building a production-quality connector, you should delete all the sample classes and their corresponding tests before building your module.
Run the following commands to build the project:
Be sure you modify your test to only ingest a small number of emails, or else you may be waiting a long time while the test executes.
This will create a new file in the target directory of your module named
imapconnector-0.1.0-SNAPSHOT-dist.zip. Next, we'll install this module into our Factbook project.
Run the following command to install the imapconnector module:
Confirm the module has been installed:
Incrementally add the imapconnector module to the Factbook project:
Open the Attivio CLI:
update command and hit
deploy command and hit
Once the project is running again, move on to the next step.
14. Add an Instance of the ImapConnector
- Click Business Center > Connector Admin UI
- Click New Connector
- Select the Imap Connector type and click Next
- Name the connector "emails" and click Next
- On the IMap Settings tab of the Configure page of the new connector wizard, enter the following:
|Field||Value|
|Name||emails|
|Port||993|
|Username||redacted|
|Password||redacted|
|Folders||Inbox|
- Click Validate
- Click Next
- Once the field mappings are previewed, click Save
- Run the connector
15. Test the Results
- Go to http://localhost:17000/searchui/ to open Search UI. Log in with username
- Execute a search for
- The Search UI should display emails from the specified mailbox(es), along with any attachments they contain.
We've covered a lot in this tutorial. We created a custom connector and adjusted how the custom properties it requires are displayed. Our connector logs to the Attivio standard and error logs and throws appropriate exceptions when things go wrong.
If we were to continue building our connector for production use, we'd want to make our connector ingest documents incrementally, meaning each time it runs it only ingests content that is new or has been edited or deleted. We may want to make it ingest more quickly by making it run multiple threads concurrently. Some sources push back when you make requests too frequently. If we can determine what those "push backs" look like, we can make our connector respond to them in a graceful way, such as halting ingestion for a period of time.
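To make the idea of incremental ingestion concrete, here is one possible sketch. It is not part of the Connector SDK. It assumes the connector can record a signature for each message id on every run (for example a hash or a last-modified value) and compare it with the previous run. `IncrementalScanSketch`, `toIngest`, and `toDelete` are hypothetical names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical comparison of two runs, keyed by message id with a non-null signature per message. */
final class IncrementalScanSketch {

    /** Ids that are new, or whose signature differs from the previous run. */
    static Set<String> toIngest(Map<String, String> previous, Map<String, String> current) {
        Set<String> ids = new TreeSet<>();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            // previous.get() returns null for unseen ids, which never equals a signature.
            if (!entry.getValue().equals(previous.get(entry.getKey()))) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    /** Ids present in the previous run that no longer exist in the source. */
    static Set<String> toDelete(Map<String, String> previous, Map<String, String> current) {
        Set<String> ids = new TreeSet<>(previous.keySet());
        ids.removeAll(current.keySet());
        return ids;
    }
}
```

Only the ids returned by `toIngest` would be fetched and fed again, while those returned by `toDelete` would be removed from the index.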
These advanced concepts will be handled in future tutorials. For now, we encourage you to start building connectors and reach out to us with any questions.