Overview
The JoiningDatabaseScanner is designed to efficiently load documents from a database in which parts of each document have been stored in multiple tables that share a common key, such as you would see in a star schema. The Joining Database Scanner supports multiple SQL queries, which lets us pull records from multiple database tables in a single operation.
The Joining Database Scanner assembles document parts from different tables into whole documents before submitting them to the AIE ingestion process.
Required Modules
These features require that the dbconnector module be included when you run createproject to create the project directories.
View incoming links.
Configure a Joining Database Connector
The JoiningDatabaseScanner retrieves multiple records that are the parts of larger documents. It then assembles these records into a single IngestDocument. These assembled IngestDocuments are then ingested in the usual manner.
In this exercise we will create a Joining Database Connector use it to retrieve records from multiple SQL tables. The primary SQL query retrieves country descriptions. Secondary queries extract records from city and economy tables.
SELECT country.code as code, country.name as title, country.capital as text FROM country ORDER BY code
Note that the Joining Database Connector requires that the results be ordered by the join key (ORDER BY code). Ordering the query results helps AIE locate all the records that should be assembled into a specific document.
The secondary SQL queries must also order results by country code. Note the use of two dynamic field names, cityname_mvs and GDP_i. The "_s" and "_i" suffixes on the variable names are a shorthand method of telling AIE to index these fields as a string and an integer, respectively.
SELECT city.country as code, city.name as cityname_mvs FROM city ORDER BY code SELECT economy.country as code, economy.GDP as GDP_i FROM economy ORDER BY code
AIE will combine the three sets of query results into single records that describe a country, its GDP, and its major cities.
Defining the Connector
For this example we need to create a connector, a component, and a workflow. We can do it all in the AIE Administrator.
Start AIE. This will will make the AIE Administrator available at http://<host>:17000/admin.
In the AIE Administrator, navigate to System Management > Connectors. Click New in the menu bar. Select the Joining Database Connector from the list.
On the Scanner tab of the resulting dialog box, enter the Connector Name (JoiningDBConnector). You'll also need the three SQL queries and the database connection parameters.
The ID Column will be code, referring to the document ID field in the results of the primary query. Set the workflow name to ingest.
Click Save.The Connector UI writes out the connection configuration to the project's configuration servers.
Running the Joining Database Connector
Erasing the Index
While testing a new connector, you will frequently need to empty the index and try again. Four methods of deleting the index are described here.
To run the Database Connector, open the AIE Administrator, and navigate to the System Management > Connectors page. Right-click on JoiningDBConnector and click on Start.
Then navigate to SAIL, which is Query > SAIL. Search for *:*, which retrieves all records in all tables. We can see that the query results contain combined information from all three SQL queries.
The documents returned are for each country, but the values from the city and economy tables have been collapsed into the multi-value fields on the country document.
There are many cases in which it is not desirable to collapse tables together. In these cases, AIE's JOIN operator allows for query-time linking of related table results.
Main Article:Relational Querying JOIN
Joining Database Scanner Properties
The Joining Database Scanner can be configured by setting properties in fields of the Connector Editor UI.
JDBC Database Scanner Editor | Remarks |
---|---|
Connector Name | The name of the connector as seen in the UI or in XML. |
Namespace ID | This is an internal field (display only). |
Node Set | The nodeset the connector should run on. Defaults to default-service-nodeset. The Editor can set this value only on new, unsaved connectors. |
JDBC Driver Class | The database JDBC driver, as in "com.mysql.jdbc.Driver". Required. |
JDBC Connection String | The database connection string, such as "jdbc:mysql://localhost:3306/mondial". Required. |
User Name | The database user's username. Required. |
Password | The database user's password. Required. |
SQL | The SQL query that pulls the desired content from the database. Required. |
ID column | Name of the database field to be used as the document ID. Required. |
Reuse Connection | Reuse connection for all SQL. Must be false for certain databases like MySQL databases that can have only one cursor open at a time. |
Other SQL | One or more other SQL queries to execute. |
Document ID Prefix | Append this prefix to the Document ID during processing. |
Ingest Workflow | Ingestion workflow to receive the ingested documents. String. |
Advanced | |
JDBC fetch size hint | Used when retrieving result sets. |
Pre SQL | The SQL that will be run before the execution of the main SQL statement. |
Post SQL | The SQL that will be run after the execution of the main SQL statement. |
Delete SQL | SQL that will be return a list of document Ids that should be removed from AIE. |
RecordSetFetcher Class | The class that filters unchanged rows from a SQL statement, as in "com.attivio.connector.IncrementalResultSetFetcher". |
Auto Commit | Whether the connector should issue a commit message after ingesting the records. |
failOnAutoCommitFailure | |
Read Only | Expert use only. Puts this connection in read-only mode as a hint to the driver to enable database optimizations. |
failOnReadonlyFailure | |
SQL Query Timeout | The connector will wait this many seconds before returning from a failed query. -1 means don't time out. |
useSimpleStatementCreation | |
Incremental | |
Incremental Mode Activated | Enables incremental updates. Boolean. |
Incremental Deletes | Optional. Used with 'incremental-activated' parameter to control if AIE should delete documents that have been removed from the source files. Default is true. |
The table above is for the Joining Database Scanner Tab. The other tabs in the Connector Editor are described on the Connectors page.
Incremental Updating
The JoiningDatabaseScanner supports incremental deletes but does not support incremental updates.