
Overview

This page presents a step-by-step example of using the Java Client API to create a Java application that feeds documents to an Attivio server. (Do not confuse this with a "connector," which runs on an Attivio server and pulls documents from a remote source.)


Create an Ingest Application in Java

This exercise demonstrates how to create an independent Java application that can push documents to an Attivio server for ingestion. 

The example reads lines from the medals.csv file (from the Quick Start Tutorial) and converts each line into an IngestDocument.  It then batches up the 87 documents and submits them to an Attivio server.  When the documents have been completely ingested, it reports success.

We'll run the example in the Attivio Designer console to review the results. 

Examine the Raw Data

A Java application that sends documents to an Attivio server must get its data from somewhere. In this example we have opened the medals.csv file from the Quick Start Tutorial. The first few lines look like this:

<install-dir>\conf\sdk\content\medals.csv
"id","country","gold_i","silver_i","bronze_i","total_i"
1,"United States",36,38,36,110
2,"China",51,21,28,100
3,"Russia",23,21,28,72
4,"Britain",19,13,15,47

There are 87 entries in the file. The top row contains the names of the IngestDocument fields that correspond to each data entry. (The "_i" suffix attached to four of the field names tells Attivio to treat these fields as integers.)
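As an illustration of the naming convention described above, the following sketch splits the header row and lists the field names that carry the "_i" suffix. The class and method names (HeaderFields, fieldNames, integerFields) are hypothetical and are not part of the Attivio SDK. The suffix check is plain string handling, not the way Attivio itself resolves field types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Attivio SDK.
class HeaderFields {

    /** Splits the header row at commas and strips the double quotes around each name. */
    static List<String> fieldNames(String headerLine) {
        List<String> names = new ArrayList<>();
        for (String raw : headerLine.split(",")) {
            names.add(raw.trim().replace("\"", ""));
        }
        return names;
    }

    /** Returns only the names that end in "_i" (the integer-field suffix). */
    static List<String> integerFields(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.endsWith("_i")) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String header = "\"id\",\"country\",\"gold_i\",\"silver_i\",\"bronze_i\",\"total_i\"";
        List<String> names = fieldNames(header);
        System.out.println("All fields:     " + names);
        System.out.println("Integer fields: " + integerFields(names));
    }
}
```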

Loading CSV Content with AIE

Attivio includes a very sophisticated CSV Connector. There is no need to build your own. Opening a CSV file was the simplest way to show real data moving through this example.

Create the Project

Create a New Class

To create a new Java class for this application:

  1. In Designer, select the project node in the Project Explorer view.  In this example the project is plusjava.
  2. Select New > Class from the File menu.  This exposes the New Java Class dialog.
  3. In the New Java Class dialog, set the Source Folder to plusjava/src.
  4. In the New Java Class dialog, enter a new package name such as com.acme.client.  The object is to create a package that does not already exist in the project to prevent any possible name collisions.
  5. Continuing in the New Java Class dialog, enter a name for the new class.  In this example we used PushDocumentTest.
  6. Click the Finish button to create the class.  This will open the new class in an editing view.
  7. For the purposes of this exercise, open the PushDocumentTest.java file (attached to this page) and paste the content into the editor.  Save the file.

 

Import Required Classes

The PushDocumentTest.java file begins with a package statement and a list of imported classes.  (The library files that support these classes are already part of the Eclipse project generated by createproject.)

PushDocumentTest.java
package com.acme.client;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import com.attivio.sdk.ingest.DocumentList;
import com.attivio.sdk.AttivioException;
import com.attivio.sdk.client.IngestClient;
import com.attivio.sdk.ingest.IngestDocument;
import com.attivio.sdk.service.Platform;
import com.attivio.sdk.service.ServiceFactory;

Setting up Main

The PushDocumentTest class will run independently of Attivio, and will connect to a "remote" Attivio server.

PushDocumentTest.java
public class PushDocumentTest {

To run this class as an independent application, we must supply it with a main method. 

PushDocumentTest.java
public static void main(String[] arg) throws IOException {

The main() method expects a string array of arguments, which we will not be using in this example.  It declares IOException because the file-reading calls later in the method can throw it.

Create an Ingest Client

An Ingest Client is a tool kit for sending IngestDocuments to an Attivio ingestion workflow. 

PushDocumentTest.java
            // create an Ingest Client
            System.setProperty("AIE_ZOOKEEPER", "localhost:16980");
            Platform.instance.setProjectName("remote");  // must be the name of the remote project
            try (IngestClient feeder = ServiceFactory.getService(IngestClient.class)) {

              // set the default workflow
              feeder.setIngestWorkflowName("ingest");

In the lines of code above, we first set the AIE_ZOOKEEPER system property and the project name. Then we asked the ServiceFactory for an IngestClient (called feeder). The AIE_ZOOKEEPER property contains the location string for the project's configuration servers (the same as the zookeepers list in the topology.xml file).

Note that we have to tell the ingest client which Attivio ingestion workflow to use.  In this example we used the default ingest workflow.
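The hard-coded address in the fragment above could instead be supplied when the program is launched. The following is a minimal sketch that assumes nothing about the SDK beyond the property name used above: it keeps a value passed on the command line as -DAIE_ZOOKEEPER=host:port and falls back to localhost:16980 otherwise. ConnectionSettings and ensureZookeeperAddress are hypothetical names.

```java
// Hypothetical helper, not part of the Attivio SDK.
class ConnectionSettings {

    static final String PROPERTY = "AIE_ZOOKEEPER";
    static final String DEFAULT_ADDRESS = "localhost:16980";

    /**
     * Leaves the property alone if it is already set (for example with -D),
     * otherwise sets it to the default. Returns the effective value.
     */
    static String ensureZookeeperAddress() {
        String current = System.getProperty(PROPERTY);
        if (current == null || current.isEmpty()) {
            System.setProperty(PROPERTY, DEFAULT_ADDRESS);
            return DEFAULT_ADDRESS;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println("Using configuration servers at " + ensureZookeeperAddress());
    }
}
```

Calling ensureZookeeperAddress() before the IngestClient is requested would take the place of the System.setProperty line in the example.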

Create a Document List

A Document List is a container for multiple IngestDocuments.  We can store documents in the list as they are created, and then send them all to the Attivio Server in one batch.

PushDocumentTest.java
              // create a document list to put docs into
              DocumentList docs = new DocumentList();
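
The accumulate-then-send pattern described above can be sketched without the SDK. In the sketch below, Batch is a hypothetical stand-in for a document list and the sender callback is a stand-in for the call that submits the list. Neither reflects the actual DocumentList or IngestClient implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a document list; not the Attivio DocumentList class.
class Batch<T> {

    private final List<T> items = new ArrayList<>();

    /** Stores one item until the batch is sent. */
    void add(T item) {
        items.add(item);
    }

    int size() {
        return items.size();
    }

    /**
     * Hands a read-only snapshot of every stored item to the sender in a
     * single call, then empties the batch. Returns the number of items sent.
     */
    int sendAll(Consumer<List<T>> sender) {
        List<T> snapshot = Collections.unmodifiableList(new ArrayList<>(items));
        sender.accept(snapshot);
        items.clear();
        return snapshot.size();
    }

    public static void main(String[] args) {
        Batch<String> docs = new Batch<>();
        docs.add("1");
        docs.add("2");
        docs.add("3");
        int sent = docs.sendAll(list -> System.out.println("Sending " + list));
        System.out.println("Sent " + sent + " items in one call");
    }
}
```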

Loading the Data

Reading CSV data into a Java program is straightforward for a simple file like this one.  We will read the content line by line from a BufferedReader, and then convert each line into a String array by cutting the line apart at the commas.

PushDocumentTest.java
            // Connect to the target file
            BufferedReader CSVFile = new BufferedReader(
                    new FileReader("C:\\attivio50\\conf\\factbook\\content\\medals.csv"));

            String dataRow = CSVFile.readLine(); // Read first line.
            dataRow = CSVFile.readLine();        // Read next line

In the final two lines of this code fragment we have read the top line of the file (the field names) and then discarded it in favor of the second line, which contains actual data.  To keep the example code simple, we'll just type in the field labels instead of trying to manipulate them in the array.
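
The header-skipping read described above can also be written with try-with-resources so that the reader is closed automatically. The following is a minimal sketch, not the code in the attached PushDocumentTest.java. The names CsvLines and dataRows are hypothetical, an in-memory StringReader stands in for the FileReader, and skipping blank lines is an added assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Attivio SDK.
class CsvLines {

    /**
     * Discards the first line (the field names) and returns the remaining
     * non-blank lines. The reader is closed when the method returns.
     * IOException is wrapped so callers do not need a throws clause.
     */
    static List<String> dataRows(Reader source) {
        List<String> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine();          // header row, discarded
            if (line == null) {
                return rows;                          // empty input
            }
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    rows.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = "\"id\",\"country\",\"gold_i\"\n"
                + "1,\"United States\",36\n"
                + "2,\"China\",51\n";
        for (String row : dataRows(new StringReader(sample))) {
            System.out.println(row);
        }
    }
}
```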

Iterate on Rows

The next section of the example occurs within a while loop.  Each time CSVFile.readLine() reads a new line into the dataRow variable, the while loop tests it for a null value.  A null value means that the reader has reached the end of the file.

If dataRow contains a line of data, the next step is to split the string up at the commas and put the values into a string array, dataArray[].

PushDocumentTest.java
            while (dataRow != null) {
                String[] dataArray = dataRow.split(",");
                // let's look at the data as it goes by
                for (int i = 0; i <= 5; i++) {
                    System.out.println("Element at index " + i + ": "
                            + dataArray[i]);
                }

We added a nested for loop to send the values to the console so the user can view them.
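
Splitting with split(",") leaves the double quotes on quoted values (the console output later on this page shows "Venezuela" with its quotes) and would cut a value apart if it contained a comma. The sketch below shows one way to split a line while honoring quotes. CsvSplitter is a hypothetical helper, and the "Korea, South" row in main is invented for illustration and does not come from medals.csv.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Attivio SDK.
class CsvSplitter {

    /**
     * Splits one CSV line at commas that are outside double quotes.
     * Surrounding quotes are removed, and a doubled quote ("") inside a
     * quoted value is read as one literal quote character.
     */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');   // escaped quote inside a quoted value
                    i++;                   // skip the second quote of the pair
                } else {
                    inQuotes = !inQuotes;  // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());    // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("1,\"United States\",36,38,36,110"));
        // Invented row showing a comma inside a quoted value.
        System.out.println(split("99,\"Korea, South\",1,2,3,6"));
    }
}
```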

Create an IngestDocument

Within the while loop, create a new IngestDocument bound to the variable doc.

PushDocumentTest.java
                // Create a new document
                IngestDocument doc = new IngestDocument(dataArray[0]);

The IngestDocument has an id value drawn from dataArray[0], which is to say, from column 1 of the CSV file.

Set IngestDocument Field Values

The IngestDocument is a container for named fields and their values.  We have to name the fields and set the values one-by-one.

PushDocumentTest.java
                // set field values from the string array
                doc.addValue("title", dataArray[1]);    // country
                doc.addValue("gold_i", dataArray[2]);   // gold
                doc.addValue("silver_i", dataArray[3]); // silver
                doc.addValue("bronze_i", dataArray[4]); // bronze
                doc.addValue("total_i", dataArray[5]);  // total
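
Because the field names are typed in by hand and the values arrive as strings, a row with a missing column or a non-numeric count would only surface as a problem later. The sketch below pairs the same field names with the row values and checks them first. RowFields and toFields are hypothetical names, a plain Map stands in for the IngestDocument, and the integer check is a local sanity test rather than the validation Attivio performs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the Map is a stand-in for an IngestDocument.
class RowFields {

    // Field names for columns 1..5, in the order used by the example.
    static final String[] FIELD_NAMES = {"title", "gold_i", "silver_i", "bronze_i", "total_i"};

    /**
     * Maps columns 1..5 of a split row to their field names.
     * Throws IllegalArgumentException if the row has the wrong number of
     * columns or if an "_i" column is not an integer.
     */
    static Map<String, String> toFields(String[] dataArray) {
        if (dataArray.length != FIELD_NAMES.length + 1) {
            throw new IllegalArgumentException(
                    "Expected " + (FIELD_NAMES.length + 1) + " columns but found " + dataArray.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < FIELD_NAMES.length; i++) {
            String name = FIELD_NAMES[i];
            String value = dataArray[i + 1].trim();   // column 0 is the document id
            if (name.endsWith("_i")) {
                // NumberFormatException is a subclass of IllegalArgumentException.
                Integer.parseInt(value);
            }
            fields.put(name, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] row = {"4", "Britain", "19", "13", "15", "47"};
        System.out.println(toFields(row));
    }
}
```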

 

Add Document to Document List

Just before the while loop ends, there are two tasks to perform.

First, add the new IngestDocument to the DocumentList (docs).   Second, read in the next line of the CSV file before looping back to the beginning of the while loop.

PushDocumentTest.java
                // add the doc to a document list
                docs.add(doc);

                // Read next line from file.
                dataRow = CSVFile.readLine(); 

            }   // return to while

            // Close the file once all data has been read.
            CSVFile.close();

After exiting the while loop, be sure to close the connection to the file.

Submit the Document List

The Ingest Client (feeder) is already aware of the location of the Attivio server.  To submit the documents for ingestion, we use the feeder.feed() method.

PushDocumentTest.java
            // feed the documents to the system
            feeder.feed(docs);

Commit the Changes

Newly-ingested documents cannot be seen in search results until the index is committed.  We can send Attivio a commit message through the IngestClient.  The IngestClient will automatically hold the commit until all of the documents have been ingested.

PushDocumentTest.java
            // commit the documents to make them searchable
            feeder.commit();

            // Notify the user that the job is done.
            System.out.println("Ingested " + feeder.getDocumentsFed() + " documents." );

The final line prints a message to the console telling the user how many documents have been ingested.

Zone-Specific Commits

In some use cases, it is desirable to commit only those documents in a specific zone. In these cases, the following code can replace the code in the example above:

			// commit the documents to make them searchable
			feeder.commit("zone-a","zone-b");

			// Notify the user that the job is done.
			System.out.println("Ingested " + feeder.getDocumentsFed() + " documents."); 

Disconnect the Feeder

It is important to disconnect the content feeder when you are finished with it.

PushDocumentTest.java
            }  // Close feeder.  IngestClient is Closeable and we are using the try-with-resources idiom.
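
The try-with-resources idiom mentioned in the comment above can be demonstrated without the SDK. In the sketch below, FakeFeeder is a hypothetical stand-in for any closeable client. It only records whether close() was called, to show that the resource is closed both on the normal path and when the body throws.

```java
// Hypothetical demonstration; FakeFeeder is not an Attivio class.
class TryWithResourcesDemo {

    /** Stand-in for a closeable client that records what happened to it. */
    static class FakeFeeder implements AutoCloseable {
        boolean closed = false;
        int fed = 0;

        void feed(int count) {
            if (count < 0) {
                throw new IllegalArgumentException("negative count");
            }
            fed += count;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Feeds inside try-with-resources and returns the feeder for inspection. */
    static FakeFeeder run(int count) {
        FakeFeeder handle = new FakeFeeder();
        try (FakeFeeder feeder = handle) {
            feeder.feed(count);
        } catch (IllegalArgumentException e) {
            // The resource has already been closed by the time this block runs.
            System.err.println("Feed failed: " + e.getMessage());
        }
        return handle;
    }

    public static void main(String[] args) {
        System.out.println("Closed after success: " + run(87).closed);
        System.out.println("Closed after failure: " + run(-1).closed);
    }
}
```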

Catch Exceptions

We have to include some error-catching boilerplate at the end.  Note that this free-standing Java application might have to handle an AttivioException, but should not throw one.  Attivio isn't there to catch it.

    } catch (AttivioException e) {
      System.err.println("Category: " + e.getErrorCode().getCategory());
      System.err.println("Code: " + e.getErrorCode().getCode());
      e.printStackTrace();
    }  // end catch
  }    // end main
}      // end class

Compile the Class

To compile PushDocumentTest in Attivio Designer, simply save the file. With default settings, this automatically generates a new <project>.jar file. 

There is no need to redeploy the project for this test.

Run the Program

To run PushDocumentTest from Designer, we'll have to create a Run Configuration for it.  

  1. In Designer, navigate from the Run menu to Run Configurations.  This opens the Run Configurations dialog box.
  2. Right-click Java Application and select New.
  3. Create a new run configuration for PushDocumentTest in project plusjava.
  4. On the Arguments Tab, paste in the following VM arguments. (Edit the file path to suit your situation.)

    -Dattivio.log.printStackTraces=true -Dattivio.log.level=INFO 
    -Dattivio.log.directory="C:\attivio-projects\plusjava\build\logs"
  5. Apply the changes.   Close the dialog box.
 

Example Ingest Run

AIE must be running!

The Attivio server must be up and running before you attempt to run PushDocumentTest.

To run the application, use the Designer Run menu.  Select the Run command.

The console window scrolls up rapidly as the field values display, ending with these messages:

Command Window
Element at index 0: 87
Element at index 1: "Venezuela"
Element at index 2: 0
Element at index 3: 0
Element at index 4: 1
Element at index 5: 1
2013-04-04 12:22:38,046 INFO  ContentFeeder - Connected client 11512@10.255.97.109_352119c9-f4ba-4c77-a79d-cb14506837c4 to http://localhost:17001/doc
2013-04-04 12:22:38,053 INFO  IngestClient - Saw new clientID: 11512@10.255.97.109_352119c9-f4ba-4c77-a79d-cb14506837c4
2013-04-04 12:22:38,053 INFO  IngestClient - Saw new workflow: [wf=[ingest], rem=false, ri=false] for client: 11512@10.255.97.109_352119c9-f4ba-4c77-a79d-cb14506837c4
2013-04-04 12:22:39,303 INFO  ContentFeeder - client 11512@10.255.97.109_352119c9-f4ba-4c77-a79d-cb14506837c4 waiting for message results, timeout=-1 workflows=[wf=[ingest], rem=false, ri=false]
2013-04-04 12:22:40,391 INFO  ContentFeeder - client 11512@10.255.97.109_352119c9-f4ba-4c77-a79d-cb14506837c4 waiting for message results, timeout=-1 workflows=[wf=[ingest], rem=false, ri=false]
Ingested 87 documents.

 

The IngestClient and ContentFeeder messages show normal activity, and are followed by the summary line, "Ingested 87 documents."

 
