
Quick Start

The following exercise builds on the Attivio Quick Start Tutorial. It takes a closer look at creating connectors, mapping document fields for indexing, and creating a custom ingestion workflow that includes an additional document transformer.

1. Deploy the Attivio Factbook Project

Open the Attivio Quick Start Tutorial in a new tab in your browser, follow the instructions to deploy the Factbook project, and return to this page when complete.

2. Download the sample data

Download the subscribers.csv file and save it to C:\temp\subscribers.csv. The first few lines of the file are:

id,first_name,last_name,email,gender,ip_address
1,Melissa,Lotte,mlotte0@washington.edu,Female,112.122.116.179
2,Lavinie,Rickis,lrickis1@symantec.com,Female,118.90.216.140
3,Lorraine,Reyner,lreyner2@fema.gov,Female,10.178.208.251
4,Shawnee,Royste,sroyste3@reference.com,Female,183.243.127.111
5,Gonzales,Vassano,gvassano4@issuu.com,Male,202.103.224.210
6,Darrin,Dandy,ddandy5@time.com,Male,32.37.204.46 

Notice a few characteristics of our CSV file:

  • Our first row contains column headers
  • The id column is unique and can serve as the document ID
  • The comma is the delimiter
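To make these three properties concrete, here is a minimal Python sketch (not Attivio code) that reads the sample rows the way the connector will be configured to: the header row supplies the field names, the id column becomes the document ID, and commas separate the values. The helper name `read_documents` is hypothetical.

```python
import csv
import io

# The first three data rows of subscribers.csv, copied from above.
SAMPLE = """id,first_name,last_name,email,gender,ip_address
1,Melissa,Lotte,mlotte0@washington.edu,Female,112.122.116.179
2,Lavinie,Rickis,lrickis1@symantec.com,Female,118.90.216.140
3,Lorraine,Reyner,lreyner2@fema.gov,Female,10.178.208.251
"""

def read_documents(text, id_field="id", delimiter=","):
    """Hypothetical helper: turn CSV text into a {doc_id: fields} dict."""
    # DictReader takes the field names from the header row.
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    docs = {}
    for row in reader:
        doc_id = row.pop(id_field)  # the unique column serves as the document ID
        if doc_id in docs:
            raise ValueError(f"duplicate document ID: {doc_id}")
        docs[doc_id] = row
    return docs

docs = read_documents(SAMPLE)
print(docs["2"]["last_name"])  # Rickis
```

If the id column were not unique, later rows would collide with earlier ones, which is why the sketch rejects duplicate IDs instead of silently overwriting them.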

3. Create a New Connector

    Go to http://localhost:17000/admin/connectors to open the Attivio Admin UI.

    Click the button to create a new connector.

    Search for and select CSV Files, then click OK.

    The New Connector dialog will open up. Enter values for the following fields:

    • Name: subscribers
    • Start Directory: C:\temp
    • ID Fields: id
    • Wildcard Include Filter: subscribers.csv

    Click on the Field Mappings tab.

    Create the following Field Mappings:

    first_name → first_name_s
    last_name → last_name_s
    email → email_s
    gender → gender_s
    ip_address → ip_address_s

    Attivio projects include a default schema with fields such as title, author, and text, but it does not include fields such as first_name and last_name that our sample file contains. In this case, we take advantage of Attivio's dynamic fields, a shortcut that lets you index content without first modifying the schema. This shortcut is convenient for prototyping and for building a demo in real time.

    When you create a dynamic field (in your connector or in a workflow component), you add a special suffix to the field name. The suffix tells Attivio how to index the field. We've appended "_s" to our field names, which tells Attivio to create a string field. To learn more about the Attivio schema, see Configure the Attivio Schema.


    Create the following Static Field Values entry:

    table = subscribers

    By doing this, we are telling the connector to create a field called table on every document it creates with a value of subscribers. This will allow us to filter these documents in Search UI later.

    Click Save & Test

    The Save & Test feature lets you preview the first 10 documents the connector would feed in. Click any document ID in the left panel to display that document's details on the right. Notice that our field mappings and static field values have been applied.

    Click OK

    Click Save

    Run the connector by selecting it and clicking the button in the top toolbar.

    Click Start in the confirmation dialog window.

    Once the connector finishes, you will see the number of documents it has ingested.
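The connector settings from this step can be summarized as a small Python sketch. This is an illustration under stated assumptions, not Attivio's implementation: each row has its mapped columns renamed, the static `table` field is added, and Save & Test corresponds to converting only the first 10 rows. The function names are hypothetical, and keeping unmapped columns (such as id) under their original names is an assumption of the sketch.

```python
from itertools import islice

# Values entered on the Field Mappings tab (source column -> indexed field).
FIELD_MAPPINGS = {
    "first_name": "first_name_s",
    "last_name": "last_name_s",
    "email": "email_s",
    "gender": "gender_s",
    "ip_address": "ip_address_s",
}
# Value entered under Static Field Values.
STATIC_FIELD_VALUES = {"table": "subscribers"}

def to_document(row):
    """Hypothetical: rename mapped columns, then add the static fields.

    Assumption: unmapped columns keep their original names.
    """
    doc = {FIELD_MAPPINGS.get(name, name): value for name, value in row.items()}
    doc.update(STATIC_FIELD_VALUES)
    return doc

def is_string_dynamic_field(name):
    """The "_s" suffix marks a dynamic string field (the only suffix used here)."""
    return name.endswith("_s")

def preview(rows, limit=10):
    """Like Save & Test: convert only the first `limit` rows."""
    return [to_document(row) for row in islice(rows, limit)]

# 25 synthetic rows, made up for illustration.
rows = [{"id": str(n), "first_name": f"First{n}", "last_name": f"Last{n}"} for n in range(1, 26)]
docs = preview(rows)
print(len(docs))  # 10
print(docs[0])    # {'id': '1', 'first_name_s': 'First1', 'last_name_s': 'Last1', 'table': 'subscribers'}
```

Because the static value is applied after the mappings, every document carries table = subscribers regardless of which columns the file contains.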

4. View the Results

    1. Go to http://localhost:17000/searchui/ to open Search UI. Log in with username aieadmin and password attivio.



    2. Submit the default search for *:* and then select the "subscribers" value under the table facet.

    3. Search UI does not show dynamic fields by default, so let's show all fields by setting the Details button to On.



    4. We're now looking at the ingested documents from our CSV file.
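Conceptually, the table facet counts the values of the table field across the matching documents, and selecting a value narrows the results to documents with that value. The following Python sketch shows only that idea; it is not how Search UI or the Attivio index computes facets, and the function names and sample documents are hypothetical.

```python
from collections import Counter

def table_facet(docs):
    """Count documents per value of the 'table' field; docs without it are skipped."""
    return Counter(d["table"] for d in docs if "table" in d)

def select_facet_value(docs, field, value):
    """Mimic clicking a facet value: keep documents where field == value."""
    return [d for d in docs if d.get(field) == value]

docs = [
    {"first_name_s": "Melissa", "table": "subscribers"},
    {"first_name_s": "Lavinie", "table": "subscribers"},
    {"title": "a document from another source"},  # hypothetical, has no table field
]
print(table_facet(docs))  # Counter({'subscribers': 2})
print(len(select_facet_value(docs, "table", "subscribers")))  # 2
```

This is why the static table value added by the connector is useful: it gives every subscriber document a common value to filter on.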


5. Create a New Ingest Workflow

    When we created our subscribers connector, we did not change its Ingest Workflow property. By default, CSV Scanners submit their documents to the workflow named ingest. In the next steps, we transform our documents slightly during ingestion by upper-casing the first and last names. It is a best practice not to alter the out-of-the-box workflows, so we'll create a new workflow, add our transformer to it, and then call the ingest workflow as a subflow.
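The plan above (a new workflow that runs one transformer and then hands off to the unchanged ingest workflow) can be pictured with the following Python sketch. It models a workflow as an ordered list of functions and a subflow as a whole workflow wrapped into a single stage. This is an analogy, not Attivio's workflow engine; `run_workflow`, `subflow`, and the `mark_ingested` placeholder (with its made-up `ingested` field) are hypothetical.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Stage = Callable[[Document], Document]

def run_workflow(stages: List[Stage], doc: Document) -> Document:
    """Send a document through each stage, in order."""
    for stage in stages:
        doc = stage(doc)
    return doc

def subflow(stages: List[Stage]) -> Stage:
    """Wrap a whole workflow so another workflow can call it as one stage."""
    return lambda doc: run_workflow(stages, doc)

def uppercase_names(doc: Document) -> Document:
    """Our single transformer: upper-case first_name_s and last_name_s."""
    out = dict(doc)
    for field in ("first_name_s", "last_name_s"):
        if field in out:
            out[field] = out[field].upper()
    return out

def mark_ingested(doc: Document) -> Document:
    """Placeholder for the built-in ingest workflow (the real one does far more)."""
    return {**doc, "ingested": "yes"}  # hypothetical marker, not an Attivio field

ingest = [mark_ingested]                                 # out-of-the-box, left unchanged
ingest_subscribers = [uppercase_names, subflow(ingest)]  # our new workflow

result = run_workflow(ingest_subscribers, {"first_name_s": "Melissa", "last_name_s": "Lotte"})
print(result)  # {'first_name_s': 'MELISSA', 'last_name_s': 'LOTTE', 'ingested': 'yes'}
```

The design point is that `ingest` itself is never edited: the customization lives entirely in `ingest_subscribers`, which simply delegates to `ingest` after its own stage.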

      Go to http://localhost:17000/admin/workflow_ingest to open the Ingest Workflows page of the Attivio Admin UI.


      Click the New button to create a new workflow.

      The New Workflow dialog will open up. Enter values for the following fields:

      • Name: ingestSubscribers

      Click the Add New Component button to create an instance of a built-in transformer type.

      Type capitalize in the text box and click Filter.

      Expand the Document Transformers node and select Capitalize.

      Click OK.

      The New Component dialog will open up. Enter values for the following fields:

      • Name: uppercaseNames

      • Field Mappings:

        first_name_s → first_name_s
        last_name_s → last_name_s

      Click Save.

      Click the Add Subflow button.

      Type ingest in the text box and click Filter.

      Expand the INGEST node and select ingest.

      Click OK.

      Click Save.

      Return to http://localhost:17000/admin/connectors to open the Connectors page of the Attivio Admin UI.

      Edit the subscribers connector by clicking on it.

      Set Ingest Workflow to ingestSubscribers.

      Click Save.

      Re-run the subscribers connector by selecting it and clicking the button in the top toolbar.

6. View the Results Again

      1. Return to http://localhost:17000/searchui/ to open Search UI, and repeat the earlier steps to show all fields for the documents in the subscribers table.

      2. Notice that the first_name_s and last_name_s fields are now uppercase.
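The uppercaseNames component maps each field to itself, which is why the names are changed in place rather than copied. The Python sketch below models a transformer configured by input → output field mappings, based on the uppercase result shown above. It is a sketch under that assumption, not the built-in Capitalize component; `make_uppercase_transformer` is a hypothetical name.

```python
def make_uppercase_transformer(field_mappings):
    """Return a transformer that writes upper(doc[src]) into doc[dst] per mapping.

    Mapping a field to itself (as uppercaseNames does) overwrites the value in
    place; mapping it to a different name would keep the original value too.
    """
    def transform(doc):
        out = dict(doc)
        for src, dst in field_mappings.items():
            if src in doc:
                out[dst] = doc[src].upper()
        return out
    return transform

uppercase_names = make_uppercase_transformer(
    {"first_name_s": "first_name_s", "last_name_s": "last_name_s"}
)
doc = {"first_name_s": "Lavinie", "last_name_s": "Rickis", "email_s": "lrickis1@symantec.com"}
print(uppercase_names(doc))
# {'first_name_s': 'LAVINIE', 'last_name_s': 'RICKIS', 'email_s': 'lrickis1@symantec.com'}
```

Fields that are not listed in the mappings, such as email_s here, pass through untouched.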

Exercise

      Now that we've gone through the basics of ingesting CSV data, mapping fields, creating new workflows, and adding built-in transformers, try the following exercise yourself:

      1. Download the movies.csv file.
      2. Create a new CSV connector named movies.
      3. Create the following field mappings:

        movie_title → movie_title_s
        genre → genre_s
        url → uri
        description → text
      4. Create the following Static Field Values entry:

        table = movies
      5. Create a new workflow to submit the movie documents to.
      6. Add an instance of the Copy component to copy the movie_title_s field into the title field.
      7. Call the ingest subflow.
      8. Edit the CSV connector to submit documents to your new workflow.
      9. Run the movies connector and view the results in Search UI.
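If you want to check your expectations before running the exercise, the following Python sketch shows roughly what a movie document should look like after the connector's mappings, the static field, and a Copy step. It is an illustration, not Attivio code: the helper names are hypothetical, the sample row is made up, and no movies.csv columns are assumed beyond the four in the mappings.

```python
MOVIE_FIELD_MAPPINGS = {
    "movie_title": "movie_title_s",
    "genre": "genre_s",
    "url": "uri",
    "description": "text",
}
MOVIE_STATIC_FIELDS = {"table": "movies"}

def map_movie_row(row):
    """Connector step: rename mapped columns, then add the static table field."""
    doc = {MOVIE_FIELD_MAPPINGS.get(name, name): value for name, value in row.items()}
    doc.update(MOVIE_STATIC_FIELDS)
    return doc

def copy_field(doc, source, target):
    """Copy-style workflow step: duplicate source's value into target, keeping source."""
    out = dict(doc)
    if source in doc:
        out[target] = doc[source]
    return out

# Made-up row for illustration only.
row = {
    "movie_title": "Example Movie",
    "genre": "Drama",
    "url": "http://example.com/movie",
    "description": "A placeholder description.",
}
doc = copy_field(map_movie_row(row), "movie_title_s", "title")
print(doc["title"], doc["table"])  # Example Movie movies
```

Unlike the subscribers example, the url and description columns map to existing schema fields (uri and text), so they need no dynamic-field suffix.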
