The following exercise builds on the Attivio Quick Start Tutorial and takes a closer look at creating connectors, mapping the document fields for indexing, and creating a custom ingestion workflow with an added document transformer.
1. Deploy the Attivio Factbook Project
Open the Attivio Quick Start Tutorial in a new tab in your browser, follow the instructions to deploy the Factbook project, and return to this page when complete.
2. Download the sample data
Download the subscribers.csv file and save it to C:\temp\subscribers.csv. Following are the top few lines of the file:
id,first_name,last_name,email,gender,ip_address
1,Melissa,Lotte,email@example.com,Female,22.214.171.124
2,Lavinie,Rickis,firstname.lastname@example.org,Female,126.96.36.199
3,Lorraine,Reyner,email@example.com,Female,10.178.208.251
4,Shawnee,Royste,firstname.lastname@example.org,Female,188.8.131.52
5,Gonzales,Vassano,email@example.com,Male,184.108.40.206
6,Darrin,Dandy,firstname.lastname@example.org,Male,220.127.116.11
Notice a few characteristics of our CSV file:
- Our first row contains column headers
- We have a column which is unique and can serve as the document ID
- The comma is the delimiter
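These three characteristics can be checked before configuring the connector. The following is a minimal Python sketch, not part of Attivio, that parses the first rows quoted above; the helper names read_rows and is_unique are made up for illustration.

```python
import csv
import io

# First rows of subscribers.csv, copied from the listing in this tutorial.
SAMPLE = """id,first_name,last_name,email,gender,ip_address
1,Melissa,Lotte,email@example.com,Female,22.214.171.124
2,Lavinie,Rickis,firstname.lastname@example.org,Female,126.96.36.199
3,Lorraine,Reyner,email@example.com,Female,10.178.208.251
"""


def read_rows(text, delimiter=","):
    """Parse CSV text whose first row holds the column headers."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    headers = reader.fieldnames  # consumes the header row
    return headers, list(reader)


def is_unique(rows, column):
    """Return True if no value in the given column repeats."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


headers, rows = read_rows(SAMPLE)
print(headers)                 # column names taken from the first row
print(is_unique(rows, "id"))   # whether "id" can serve as the document ID
```

To check the full file, read C:\temp\subscribers.csv into a string and pass it to read_rows instead of SAMPLE.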
3. Create a New Connector
Go to http://localhost:17000/admin/connectors to open the Attivio Admin UI.
Click the button to create a new connector.
Search for and select CSV Files, then press OK.
The New Connector dialog will open up. Enter values for the following fields:
- Name: subscribers
- Start Directory: C:\temp
- ID Fields: id
- Wildcard Include Filter: subscribers.csv
Click on the Field Mappings tab.
Create the following Field Mappings:
Attivio projects include a default schema with fields such as text, but it does not include fields such as last_name, which our sample file contains. So, in this case, we're taking advantage of Attivio's dynamic fields, a shortcut that lets you index your content without first modifying the schema. This shortcut is very convenient during prototyping and when creating a demo in real time.
When you create the dynamic field (in your connector or workflow component), you can add a special suffix to the field name. The suffix tells Attivio how to index the field. We've appended "_s" to our field names, which tells Attivio to create a string field. To learn more about the Attivio Schema, see Configure the Attivio Schema.
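The naming convention can be illustrated outside Attivio. The Python sketch below is only a model of what a field mapping does to a parsed row. It is not Attivio code, the helper names are hypothetical, and it maps only first_name and last_name, the two dynamic fields this tutorial names explicitly.

```python
STRING_SUFFIX = "_s"  # the only dynamic-field suffix this tutorial describes


def dynamic_string_field(source_name):
    """Build a dynamic string field name, e.g. last_name -> last_name_s."""
    return source_name + STRING_SUFFIX


def apply_field_mappings(row, mappings):
    """Rename the keys of one row using a {source: target} mapping.

    Keys without a mapping keep their original name.
    """
    return {mappings.get(name, name): value for name, value in row.items()}


# Hypothetical mapping table covering two of the CSV columns.
mappings = {name: dynamic_string_field(name) for name in ("first_name", "last_name")}

row = {"id": "1", "first_name": "Melissa", "last_name": "Lotte"}
doc = apply_field_mappings(row, mappings)
print(doc)
```

The original row is left untouched; the function returns a new dictionary keyed by the mapped field names.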
Create the following Static Field Values entry:
By doing this, we are telling the connector to create a field called table, with a value of subscribers, on every document it creates. This will allow us to filter these documents in Search UI later.
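As a rough model of the effect, the Python sketch below stamps the same static value onto every document and then filters on it. The function name add_static_fields is an assumption made for illustration, and the connector's real implementation may differ.

```python
def add_static_fields(doc, static_values):
    """Return a copy of doc with every static field added."""
    merged = dict(doc)
    merged.update(static_values)
    return merged


# Two placeholder documents; only the id field matters for this sketch.
docs = [{"id": "1"}, {"id": "2"}]
tagged = [add_static_fields(d, {"table": "subscribers"}) for d in docs]

# Because every document carries the same value, one filter selects them all.
subscribers_only = [d for d in tagged if d.get("table") == "subscribers"]
print(len(subscribers_only))
```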
Click Save & Test.
The Save & Test feature allows you to preview the first 10 documents that would be fed in by the connector. Click on any document id in the left panel and that document's details will be displayed to the right. Notice our field mappings and static field values have been applied.
Run the connector by selecting it and clicking the button in the top toolbar.
Click Start in the confirmation dialog window.
Once the connector finishes, you will see the number of documents it has ingested.
4. View the Results
- Go to http://localhost:17000/searchui/ to open Search UI. Login with username
- Submit the default search for *:* and then select the "subscribers" value under the table facet.
- Search UI does not show dynamic fields by default, so let's show all fields by setting the Details button to On.
- We're now looking at the ingested documents from our CSV file.
5. Create a New Ingest Workflow
When we created our subscribers connector, we did not edit the default Ingest Workflow property. By default, CSV Scanners submit their documents to the workflow named ingest. In the next steps, we're going to transform our documents slightly during ingestion by upper-casing the first and last names. It is a best practice not to alter the out-of-the-box workflows, so we'll create a new workflow, insert our transformer, then call the ingest workflow as a subflow.
Go to http://localhost:17000/admin/workflow_ingest to open the Ingest Workflows page of the Attivio Admin UI.
Click the New button to create a new workflow.
The New Workflow dialog will open up. Enter values for the following fields:
- Name: ingestSubscribers
Click the Add New Component button to create an instance of a built-in transformer type.
Type capitalize in the text box and click Filter.
Expand the Document Transformers node and select Capitalize.
The New Component dialog will open up. Enter values for the following fields:
Click the Add Subflow button.
Type ingest in the text box and click Filter.
Expand the INGEST node and select ingest.
Return to http://localhost:17000/admin/connectors to open the Attivio Admin UI.
Edit the subscribers connector by clicking on it.
Set Ingest Workflow to ingestSubscribers.
Re-run the subscribers connector by selecting it and clicking the button in the top toolbar.
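Conceptually, the new workflow is a short chain: the transformer runs first, then the document is handed to the existing ingest workflow. The Python sketch below models that ordering only. The functions are stand-ins rather than Attivio APIs, and the field names passed to the transformer are assumed from the results described in step 6.

```python
def capitalize(doc, fields):
    """Return a copy of doc with the named fields upper-cased.

    Fields missing from the document are skipped.
    """
    out = dict(doc)
    for name in fields:
        if name in out:
            out[name] = out[name].upper()
    return out


def ingest(doc):
    """Stand-in for the built-in ingest workflow; it passes the document through."""
    return doc


def ingest_subscribers(doc):
    """Model of ingestSubscribers: transform first, then call the ingest subflow."""
    doc = capitalize(doc, ["first_name_s", "last_name_s"])
    return ingest(doc)


result = ingest_subscribers(
    {"id": "1", "first_name_s": "Melissa", "last_name_s": "Lotte", "table": "subscribers"}
)
print(result["first_name_s"], result["last_name_s"])  # prints "MELISSA LOTTE"
```

Keeping the transformer in a separate wrapper workflow leaves the out-of-the-box ingest workflow unmodified, which is the best practice noted above.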
6. View the Results Again
- Return to http://localhost:17000/searchui/ to open Search UI and use the steps used earlier to show all fields for the documents in the subscribers table.
- Notice now that the first_name_s and last_name_s fields are uppercase.
Now that we've gone through the basics of ingesting CSV data, mapping fields, creating new workflows and built-in transformers, try the following exercise out for yourself:
- Download the movies.csv file.
- Create a new CSV connector named movies.
Create the following field mappings:
- movie_title -> movie_title_s
- genre -> genre_s
- url -> uri
- description -> text
Create the following Static Field Values entry:
- Create a new workflow to submit the movie documents to.
- Add an instance of the Copy component to copy the movie_title_s field into the title field.
- Call the ingest subflow.
- Edit the CSV connector to submit documents to your new workflow.
- Run the movies connector and view the results in Search UI.
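If it helps to picture what the Copy component should achieve in this exercise, the Python sketch below models the intended effect on one document. The function name copy_field and the sample title value are illustrative assumptions, not Attivio code or data from movies.csv.

```python
def copy_field(doc, source, target):
    """Return a copy of doc with the source field's value also stored under target."""
    out = dict(doc)
    if source in out:
        out[target] = out[source]
    return out


# Placeholder document; the title value is made up for this sketch.
movie = copy_field({"movie_title_s": "Example Movie"}, "movie_title_s", "title")
print(movie)
```

After the copy, the document carries both movie_title_s and title, so the schema-defined title field is populated while the dynamic field remains available.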