Page tree
Skip to end of metadata
Go to start of metadata

Overview

The JoiningDatabaseScanner is designed to efficiently load documents from a database in which parts of each document have been stored in multiple tables that share a common key, such as you would see in a star schema. The Joining Database Scanner supports multiple SQL queries, which lets us pull records from multiple database tables in a single operation. 

The Joining Database Scanner assembles document parts from different tables into whole documents before submitting them to the AIE ingestion process. 

Required Modules

These features require that the dbconnector module be included when you run createproject to create the project directories.

 

View incoming links.

Configure a Joining Database Connector

The JoiningDatabaseScanner retrieves multiple records that are the parts of larger documents. It then assembles these records into a single IngestDocument. These assembled IngestDocuments are then ingested in the usual manner.

In this exercise we will create a Joining Database Connector use it to retrieve records from multiple SQL tables. The primary SQL query retrieves country descriptions. Secondary queries extract records from city and economy tables.

Primary SQL Query
SELECT country.code as code,
    country.name as title,
    country.capital as text
    FROM country ORDER BY code

Note that the Joining Database Connector requires that the results be ordered by the join key (ORDER BY code). Ordering the query results helps AIE locate all the records that should be assembled into a specific document.

The secondary SQL queries must also order results by country code. Note the use of two dynamic field names, cityname_mvs and GDP_i. The "_s" and "_i" suffixes on the variable names are a shorthand method of telling AIE to index these fields as a string and an integer, respectively.

Secondary SQL Queries
SELECT city.country as code,
    city.name as cityname_mvs
    FROM city ORDER BY code

SELECT economy.country as code,
    economy.GDP as GDP_i
    FROM economy ORDER BY code

AIE will combine the three sets of query results into single records that describe a country, its GDP, and its major cities.

Defining the Connector

For this example we need to create a connector, a component, and a workflow. We can do it all in the AIE Administrator.

Start AIE. This will will make the AIE Administrator available at http://<host>:17000/admin.

In the AIE Administrator, navigate to System Management > Connectors. Click New in the menu bar. Select the Joining Database Connector from the list.

NewJoiningDBConnector

On the Scanner tab of the resulting dialog box, enter the Connector Name (JoiningDBConnector). You'll also need the three SQL queries and the database connection parameters.

The ID Column will be code, referring to the document ID field in the results of the primary query. Set the workflow name to ingest.

JoiningDBScanner

 

Click Save.The Connector UI writes out the connection configuration to the project's configuration servers.

Running the Joining Database Connector

Erasing the Index

While testing a new connector, you will frequently need to empty the index and try again. Four methods of deleting the index are described here.

To run the Database Connector, open the AIE Administrator, and navigate to the System Management > Connectors page. Right-click on JoiningDBConnector and click on Start.

StartJoinDBScanner

Then navigate to SAIL, which is Query > SAIL. Search for *:*, which retrieves all records in all tables. We can see that the query results contain combined information from all three SQL queries.

JoinScannerResults

The documents returned are for each country, but the values from the city and economy tables have been collapsed into the multi-value fields on the country document.

There are many cases in which it is not desirable to collapse tables together. In these cases, AIE's JOIN operator allows for query-time linking of related table results.

Main Article:Relational Querying JOIN

Joining Database Scanner Properties

The Joining Database Scanner can be configured by setting properties in fields of the Connector Editor UI.

JDBC Database
Scanner Editor
Remarks
Connector Name

The name of the connector as seen in the UI or in XML.

Namespace IDThis is an internal field (display only).
Node SetThe nodeset the connector should run on. Defaults to default-service-nodeset. The Editor can set this value only on new, unsaved connectors.
JDBC Driver Class

The database JDBC driver, as in "com.mysql.jdbc.Driver". Required.

JDBC Connection StringThe database connection string, such as "jdbc:mysql://localhost:3306/mondial". Required.
User NameThe database user's username. Required.
PasswordThe database user's password. Required.
SQLThe SQL query that pulls the desired content from the database. Required.
ID columnName of the database field to be used as the document ID. Required.
Reuse ConnectionReuse connection for all SQL. Must be false for certain databases like MySQL databases that can have only one cursor open at a time.
Other SQLOne or more other SQL queries to execute.
Document ID PrefixAppend this prefix to the Document ID during processing.
Ingest WorkflowIngestion workflow to receive the ingested documents. String.
Advanced 
JDBC fetch size hintUsed when retrieving result sets.
Pre SQLThe SQL that will be run before the execution of the main SQL statement.
Post SQLThe SQL that will be run after the execution of the main SQL statement.
Delete SQLSQL that will be return a list of document Ids that should be removed from AIE.
RecordSetFetcher Class

The class that filters unchanged rows from a SQL statement, as in "com.attivio.connector.IncrementalResultSetFetcher".

Auto CommitWhether the connector should issue a commit message after ingesting the records.
failOnAutoCommitFailure
 
 
Read Only

Expert use only. Puts this connection in read-only mode as a hint to the driver to enable database optimizations.

failOnReadonlyFailure
 
 
SQL Query Timeout
 
The connector will wait this many seconds before returning from a failed query. -1 means don't time out.
useSimpleStatementCreation 
Incremental 
Incremental Mode ActivatedEnables incremental updates.  Boolean.
Incremental DeletesOptional. Used with 'incremental-activated' parameter to control if AIE should delete documents that have been removed from the source files. Default is true.

The table above is for the Joining Database Scanner Tab.  The other tabs in the Connector Editor are described on the Connectors page.

 

Incremental Updating

The JoiningDatabaseScanner supports incremental deletes but does not support incremental updates. 

  • No labels