
Overview

The Attivio Intelligence Engine (AIE) can ingest content from a database through a JDBC interface.

This page provides a brief tutorial overview of this process, covering the JDBC Database Scanner, the Joining Database Scanner, and the Collapsing Database Scanner. The JDBC Database Scanner (the standard scanner) extracts records based on a single SQL query. Any joins performed by the query occur in the database engine. The Joining Database Scanner uses multiple SQL queries to extract related records from multiple database tables, and joins them into normalized records in an AIE workflow. The Collapsing Database Scanner pulls in unjoined records and joins them within the scanner, which avoids flooding the ingest pathway with many partial documents.
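
The multi-query strategy described above can be sketched outside of AIE. The following Python example is only an illustration of the idea, not AIE code or configuration: it uses an in-memory SQLite database as a stand-in for a JDBC source, and the `book` and `author` tables, the field names, and both function names are hypothetical. It runs two independent queries with no SQL JOIN and merges the partial records by key on the client side.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE author (book_id INTEGER, name TEXT);
        INSERT INTO book VALUES (1, 'Search Basics'), (2, 'Index Design');
        INSERT INTO author VALUES (1, 'Ann'), (1, 'Bob'), (2, 'Cy');
        """
    )
    return conn


def join_partial_documents(conn):
    """Run two queries without a SQL JOIN and merge the results by key."""
    docs = {}
    # Query 1: one partial document per parent row.
    for book_id, title in conn.execute("SELECT id, title FROM book ORDER BY id"):
        docs[book_id] = {"id": book_id, "title": title, "author": []}
    # Query 2: child rows, attached to the parent as a multi-valued field.
    child_sql = "SELECT book_id, name FROM author ORDER BY book_id, name"
    for book_id, name in conn.execute(child_sql):
        if book_id in docs:  # ignore children whose parent was not retrieved
            docs[book_id]["author"].append(name)
    return list(docs.values())


if __name__ == "__main__":
    for doc in join_partial_documents(build_demo_db()):
        print(doc)
```

Each query touches a single table, so the database engine never performs a join, and every merged record keeps all of its child values.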

More detailed information about each scanner can be found on these pages:

Main article: JDBC Database Scanner
Main article: Joining Database Scanner
Main article: Collapsing Database Scanner

Required Modules

These features require that the dbconnector module be included when you run createproject to create the project directories.


Which Database Scanner is best for you?

Ingesting database records can be extremely simple, but the usual case is complicated by the need to JOIN records from multiple tables into what is ultimately a single AIE query result entry. AIE offers different ways to do this depending on your situation.

There are several scenarios for ingesting these multi-part documents:

  • We could use a JDBC Database Scanner to run multiple SQL queries, each of which retrieves partial documents from a different database table, and store the records in multiple AIE index tables. Then we would use query-side joins to assemble the parts of the desired documents at query time. This approach produces efficient ingestion but slow querying, and it tends to inflate the index with many records.
  • We could use a JDBC Database Scanner to run a SQL JOIN query, forcing the database engine to retrieve and assemble the parts of each document before handing it to AIE for ingestion. This can place a heavy load on the database engine, which might not be acceptable to the database administrator. In addition, SQL JOIN queries do not handle multiple-valued fields when combining similarly named fields from different tables. Only the first value for each field would be reported to AIE.
  • We could use a Joining Database Scanner to:
    • Run multiple SQL queries to efficiently pull parts of documents from the database without asking the database engine to perform any JOINs. This alleviates concerns about overloading the database engine.
    • Then use a Collapse Grouped Messages component to assemble the document parts into complete documents as part of an ingestion workflow. This stores each reassembled document as a single record in the AIE index. This keeps the AIE index lean, and avoids loading down the system with extra work at query time.
  • We could use a Collapsing Database Scanner to run a SQL JOIN query on the database server, and then collapse the incoming records in the AIE scanner, thereby avoiding a flood of partial-document messages in the ingest workflow.
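
To make the last scenario concrete, here is a minimal Python sketch of collapsing JOIN output on the client side. It illustrates the general technique only and does not show the Collapsing Database Scanner's actual implementation or configuration: SQLite stands in for the JDBC source, and the `article` and `tag` tables, the field names, and the function names are all hypothetical. A JOIN query returns one row per parent/child pair, so the rows are grouped by the parent key and folded into one record with a multi-valued field.

```python
import sqlite3
from itertools import groupby

# The ORDER BY matters: groupby() only merges adjacent rows with the same key.
JOIN_SQL = """
    SELECT a.id, a.title, t.label
    FROM article AS a LEFT JOIN tag AS t ON t.article_id = a.id
    ORDER BY a.id, t.label
"""


def build_demo_db():
    """Create a small in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tag (article_id INTEGER, label TEXT);
        INSERT INTO article VALUES
            (1, 'Scanner setup'), (2, 'Query joins'), (3, 'Untagged note');
        INSERT INTO tag VALUES (1, 'jdbc'), (1, 'ingest'), (2, 'sql');
        """
    )
    return conn


def collapse_joined_rows(rows):
    """Fold rows that share a parent id into one multi-valued record."""
    docs = []
    for article_id, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        docs.append(
            {
                "id": article_id,
                "title": group[0][1],
                # LEFT JOIN yields NULL (None) for parents with no children.
                "tag": [row[2] for row in group if row[2] is not None],
            }
        )
    return docs


if __name__ == "__main__":
    rows = build_demo_db().execute(JOIN_SQL).fetchall()
    print(len(rows), "joined rows")
    for doc in collapse_joined_rows(rows):
        print(doc)
```

In this sketch the joined result set contains more rows than there are articles, and collapsing them before they enter the pipeline is what keeps partial records out of the ingest workflow.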

These alternative paths let you work around the most sensitive pain point in your project's database ingestion step.

 
