
 

Overview

Ingest transformers (also called document transformers) operate on IngestDocuments as they pass through ingestion workflows. A typical transformer reads one or more field values and then places the resulting value in a new field.

IngestionTransformerContext

Document transformers that implement the DocumentModifyingTransformer interface must implement a processDocument() method. This method performs whatever modifications are necessary on the IngestDocument and then returns true to pass the document to the next component, or false to drop the document from the workflow:

Based on SampleSimpleIngestTransformer.java
  public boolean processDocument(IngestDocument doc) throws AttivioException {
    // A really simple example: set a field value.
    // "field" and "value" are assumed to be member variables of the transformer class.
    doc.setField(field, value);
    // Returning true indicates that everything went as expected.
    return true;
  }

The processDocument() method can perform any business logic required, including adding fields, removing fields, and modifying field values. The IngestDocument class contains many useful methods for modifying documents in a transformer.
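
The true/false contract described above can be illustrated with a minimal sketch. It does not use the AIE SDK: SketchDocument and SketchTransformer are hypothetical stand-ins for IngestDocument and DocumentModifyingTransformer, and the firstValueOf() accessor and the "title" and "title_length" field names are assumptions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for IngestDocument, reduced to what this sketch needs.
final class SketchDocument {
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // Replaces any existing values of the field with the single given value.
    void setField(String name, Object value) {
        List<Object> values = new ArrayList<>();
        values.add(value);
        fields.put(name, values);
    }

    // Hypothetical accessor: first value of the field, or null if the field is absent.
    Object firstValueOf(String name) {
        List<Object> values = fields.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }
}

// Hypothetical stand-in for DocumentModifyingTransformer.
interface SketchTransformer {
    boolean processDocument(SketchDocument doc);
}

// Drops documents without a usable title; annotates the ones it keeps.
final class RequireTitleTransformer implements SketchTransformer {
    @Override
    public boolean processDocument(SketchDocument doc) {
        Object title = doc.firstValueOf("title");
        if (title == null || title.toString().trim().isEmpty()) {
            return false; // false: the document is dropped from the workflow
        }
        doc.setField("title_length", title.toString().trim().length());
        return true; // true: the document moves on to the next component
    }
}

final class DropDocumentSketch {
    public static void main(String[] args) {
        SketchTransformer transformer = new RequireTitleTransformer();

        SketchDocument withTitle = new SketchDocument();
        withTitle.setField("title", "Paris");
        System.out.println(transformer.processDocument(withTitle));    // prints true

        SketchDocument withoutTitle = new SketchDocument();
        System.out.println(transformer.processDocument(withoutTitle)); // prints false
    }
}
```

In a real transformer the same decision would be made inside processDocument(IngestDocument doc), using the accessors that the IngestDocument class actually provides.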

Document ID cannot be changed

Once an IngestDocument has been created, its ID field value cannot be changed.

Datatypes in IngestDocuments

The AIE Schema describes the datatypes and behavior of strongly-typed fields in the Universal Index. Many users assume that these strong datatype restrictions also apply to IngestDocument fields, but they do not.

Schema field definitions are not binding on the fields of an IngestDocument, even when the fields have the same names as Schema fields. An IngestDocument is a scratch pad where AIE connectors and workflow components work up a description of an index entry. Field values can be transformed from one datatype to another during this process.

The schema field definitions are applied during indexing. The indexer attempts to cast the document's field values into the types required by the schema. If a field value cannot be correctly cast, the value is dropped.
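
The cast-or-drop behavior can be pictured with a small sketch. This is not the AIE indexer: the FieldType enum and the castForIndex() helper are hypothetical, and the real casting rules may be richer. The sketch only shows the idea that a loosely typed document value either converts to the schema type or is dropped.

```java
import java.util.Optional;

final class SchemaCastSketch {
    // Hypothetical subset of schema datatypes, for illustration only.
    enum FieldType { STRING, INTEGER, BOOLEAN }

    // Returns the value cast to the schema type, or empty if the cast fails
    // (which models the value being dropped at indexing time).
    static Optional<Object> castForIndex(Object raw, FieldType type) {
        if (raw == null) {
            return Optional.empty();
        }
        String text = raw.toString().trim();
        switch (type) {
            case STRING:
                return Optional.of(raw.toString());
            case INTEGER:
                try {
                    return Optional.of(Integer.valueOf(text));
                } catch (NumberFormatException e) {
                    return Optional.empty();
                }
            case BOOLEAN:
                if (text.equalsIgnoreCase("true")) {
                    return Optional.of(Boolean.TRUE);
                }
                if (text.equalsIgnoreCase("false")) {
                    return Optional.of(Boolean.FALSE);
                }
                return Optional.empty();
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A string value that parses as an integer survives the cast.
        System.out.println(castForIndex("2601", FieldType.INTEGER));    // prints Optional[2601]
        // A value that cannot be cast is dropped.
        System.out.println(castForIndex("unknown", FieldType.INTEGER)); // prints Optional.empty
    }
}
```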


Creating Custom Code

This section presents the simplest-possible example of extending the AIE ingestion process with custom code. Tasks include:

  • Creating a new field transformer in Java. This transformer inserts the value "Hello World" into the greeting_s field of each city record in the Quick Start Tutorial Factbook demo.
  • Creating a new schema field.
  • Wrapping the new transformer in a new component.
  • Putting the new component in a new workflow.
  • Inserting the new workflow between the cityConnector and the ingest workflow, so that all incoming city records receive the "Hello World" message.
  • Viewing city records in SAIL to see the "Hello World" message appear in the search results.

Create a Custom Field Transformer in Java

A field transformer is one that operates on the fields of an IngestDocument.

To create the world's simplest field transformer, do the following:

  1. In AIE Designer, in the Package Explorer, right-click the appropriate project (AIE - plusjava in this case) and click the New -> Class menu item. The New Java Class dialog box appears.
  2. Specify the Source folder as the /src folder of your project, in this case AIE - plusjava/src.
  3. Select (or invent) an appropriate Java Package for the new transformer. In this case we chose com.acme.examples.
  4. Enter the transformer's name. In this case it is DemoFieldTransformer.
  5. Accept the remaining defaults and click Finish. This opens an editor for DemoFieldTransformer.java.
  6. Edit the DemoFieldTransformer class definition to import the necessary classes and interfaces, as shown here:

    DemoFieldTransformer.java
    package com.acme.examples;
    
    import com.attivio.sdk.AttivioException;
    import com.attivio.sdk.ingest.IngestDocument;
    import com.attivio.sdk.server.component.ingest.DocumentModifyingTransformer;
  7. Add two member variables to the class definition. We're going to add the value "Hello World" to a new string field called greeting_s. The "_s" suffix is a shorthand way of telling AIE that we want to treat this new field as a string.

    DemoFieldTransformer
    public class DemoFieldTransformer implements DocumentModifyingTransformer {
    
        private String field = "greeting_s";
        private String value = "Hello World";
  8. Edit the processDocument() method. This is the method that processes each IngestDocument that appears in the input queue of the transformer. In this case, it adds "Hello World" to the greeting_s field. The method returns true, meaning that the modified document can be passed to the next stage of the workflow. (Returning false would result in the document being dropped.)

    DemoFieldTransformer
        public boolean processDocument(IngestDocument doc) throws AttivioException {
            doc.addValue(field, value);
            return true; 
        }
    }
  9. Save the file. If Build Automatically is checked in the Project menu (the default), saving the file automatically rebuilds the project and creates a new Factbook.jar file. Otherwise, use the Project -> Build Project menu command.
  10. Once the transformer has been saved and the Factbook.jar file is in place, use AIE Designer or AIE-CLI to deploy the project.
    1. From AIE Designer, use AIE Runtime > Deploy Project Configuration.
    2. Alternatively, from AIE-CLI, use the deploy command.

Note that the first attempt to run a new project through Designer or AIE-CLI will automatically deploy it.

At this point the new transformer is ready.


 

Configure a Component, Workflow, and Connector

Using dynamic configuration, we're going to modify the Factbook cityConnector so that it sends IngestDocuments to a new workflow called myIngest. This workflow will include a DemoTransformer component, which encapsulates an instance of the new DemoFieldTransformer Java class. The component will add the greeting "Hello World" to each IngestDocument. The new workflow will then pass the IngestDocument to the standard AIE ingest workflow for indexing.

All of this needs to be configured a step at a time in the AIE Administrator.

Start the AIE Node

In AIE Designer, right-click the project in the Project Explorer view. Select AIE Runtime > Start All Project Servers.

Wait until the AIE node is running.

Create a Component

We need to configure a new component, DemoTransformer, to encapsulate the DemoFieldTransformer instance.

  1. Direct your browser to http://localhost:17000/admin (or substitute your host and port).
  2. Open the Palette.
  3. Click the New button. This opens a dialog box to select a platform component type.
  4. Open the list of Document Transformers.
  5. Scroll down to the DemoFieldTransformer. Select it and click OK. This opens a Component Editor.
  6. On the Platform Component tab of the editor, enter the component's name. We used "DemoTransformer."
  7. When finished, click the Save button. 


 

Create a Workflow

The next step is to insert the component into a custom workflow. The workflow myIngest will have a DemoTransformer stage followed by an ingest subflow. The ingest workflow is the entry point into AIE's usual linguistic processing and indexing.

  1. Navigate to System Management > Workflows > Ingest in the AIE Administrator interface.
  2. Click the New button. This opens an Ingest Workflow Editor.
  3. Give the workflow a name ("myIngest").
  4. Use the Add Existing Component button to add the DemoTransformer component to the workflow.
  5. Use the Add Subflow button to add the ingest subflow to the workflow.
  6. Click the Save button.

Figure: MyIngestWorkflowEditor
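
Conceptually, the myIngest workflow is an ordered chain of stages: the DemoTransformer stage runs first and the ingest subflow runs second. The sketch below models that ordering in plain Java. It is an illustration only, not how AIE executes workflows: the document is a plain map, each stage is a Predicate that returns true to pass the document on, and the "index" list is a hypothetical stand-in for the ingest subflow and the index behind it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class WorkflowSketch {
    // Runs the stages in order. A stage that returns false drops the document,
    // so later stages never see it.
    static boolean runWorkflow(List<Predicate<Map<String, Object>>> stages,
                               Map<String, Object> doc) {
        for (Predicate<Map<String, Object>> stage : stages) {
            if (!stage.test(doc)) {
                return false;
            }
        }
        return true;
    }

    // Builds a model of myIngest: the DemoTransformer stage, then the ingest subflow.
    static List<Predicate<Map<String, Object>>> myIngest(List<Map<String, Object>> index) {
        Predicate<Map<String, Object>> demoTransformer = doc -> {
            doc.put("greeting_s", "Hello World");
            return true;
        };
        // Hypothetical stand-in for the ingest subflow: it just records the document.
        Predicate<Map<String, Object>> ingestSubflow = doc -> {
            index.add(doc);
            return true;
        };
        return Arrays.asList(demoTransformer, ingestSubflow);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> index = new ArrayList<>();
        Map<String, Object> city = new HashMap<>();
        city.put("title", "Paris");

        runWorkflow(myIngest(index), city);
        System.out.println(index.get(0).get("greeting_s")); // prints Hello World
    }
}
```

Because the transformer stage precedes the ingest stage, every document that reaches the stand-in index already carries the greeting_s value.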

Modify CityConnector

We need to make one more modification. We're going to change the output of the Factbook cityConnector from the ingest workflow to the myIngest workflow.

  1. Navigate to System Management > Connectors, and click on the cityConnector. This opens the cityConnector in a Connector Editor.
  2. Scroll down the Scanner tab to the Ingest Workflow field.
  3. Change "ingest" to "myIngest" in the Ingest Workflow field.
  4. Click the Save button.

Figure: cityConnectorEditor

Update and Deploy

Although it is not necessary for this demonstration, this is the point where an AIE developer would make the dynamic connector and workflow changes permanent by updating and redeploying the configuration. This writes the dynamic changes on the configuration server back to the AIE project sources as XML files, making them a permanent part of the project.

  1. Once the connector, workflow, and transformer have been saved, use AIE Designer or AIE-CLI to update the project and then deploy it.
    1. From AIE Designer, use AIE Runtime > Update Local Configuration. Then use AIE Runtime > Deploy Project Configuration with the Ignore Changes and Replace Existing Project option.
    2. Alternatively, from AIE-CLI, use the update command, followed by the deploy force command.

Load Cities

To try out the new transformer, start the Factbook demo.

  1. In AIE Administrator, browse to System Management -> Connectors.
  2. Check the box beside cityConnector and click the Start link. It loads 2601 records.

Search for Cities

Open SAIL and look for the Hello World message.

  1. Browse to the Query -> SAIL search page.
  2. Open the SAIL settings (the "gear" icon) to the Field Expressions tab. Scroll down to the bottom of that display and add "greeting_s" to the list of Other Display Fields. Save the settings.
  3. Using the Simple Query Language, search for *:*.
  4. Look for the greeting_s field and the "Hello World" value.

 

 
