Welcome to the Attivio technical training and certification program! We are delighted to have this opportunity to acquaint you with our company, Attivio Inc., and our flagship product, the Attivio platform.
The Attivio training program covers all aspects of applying the Attivio platform to your projects. In this segment of the training, we'll introduce ourselves and our product, and we'll demonstrate how we bring value to the marketplace.
Attivio - Who We Are
Attivio is an award-winning enterprise software company headquartered in downtown Boston, Massachusetts, USA.
Our mission is to put AI-powered answers and insights at the core of every enterprise.
Attivio is a powerful platform for unified information access (UIA). The platform is made up of multiple layers that build on one another to deliver that unified access.
Data and Content Integration
The problem with business information is that it is broken up into many isolated repositories. Information consumers must explore and dig for each separate nugget of information, and then assemble the pieces into a meaningful picture. It isn't just that there are many different repositories to explore, but that some of the repositories are very difficult to access except by skilled personnel. Some repositories cannot be opened at all except by software engineers using complex APIs. Each repository can have its own API or user interface, and every interface is different.
We often depict the multi-repository problem in diagrams like this one:
How does Attivio solve this problem? Attivio can read more than 800 file formats, which include nearly every kind of document you would normally find on a computer file system. Attivio can harvest pages from Web sites, content management systems, document management systems, and email systems, and can connect to most common databases through SQL. In addition, Attivio provides an easily extensible connector framework for integrating with virtually any repository.
Attivio can reach into all of these silos, and can then convert the data and content into records of a single universal index.
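To make the idea of a single universal index more concrete, here is a minimal Python sketch in which two hypothetical connectors, one for a database table and one for email messages, map very different inputs onto the same record shape. The `IndexRecord` class and every field name in it (`table`, `from`, `subject`, `body`, and so on) are illustrative assumptions, not Attivio's actual document model or connector API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexRecord:
    """Hypothetical universal record: an ID plus multi-valued named fields."""
    id: str
    fields: Dict[str, List[str]] = field(default_factory=dict)


def db_row_to_record(table: str, key_column: str, row: Dict[str, object]) -> IndexRecord:
    # Structured source: every column becomes a field; values are stored as text.
    record_id = f"{table}-{row[key_column]}"
    fields = {column: [str(value)] for column, value in row.items()}
    fields["table"] = [table]
    return IndexRecord(record_id, fields)


def email_to_record(message_id: str, sender: str, recipients: List[str],
                    subject: str, body: str) -> IndexRecord:
    # Unstructured source: header values become fields; the free text goes in "body".
    return IndexRecord(message_id, {
        "table": ["email"],
        "from": [sender],
        "to": list(recipients),
        "subject": [subject],
        "body": [body],
    })


# Both sources end up as the same kind of record (hypothetical sample data).
records = [
    db_row_to_record("players", "player_id", {"player_id": 101, "name": "David Ortiz"}),
    email_to_record("msg-1", "scout@example.com", ["gm@example.com"],
                    "Ortiz", "Great game last night."),
]
```

Once every source is expressed as the same record type, downstream processing and querying no longer need to know where a record came from.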
Data Mining, Text Analytics, and Data Transformation
Attivio's extremely powerful linguistics modules perform a variety of text analytics in 68 languages. Among the many linguistics tools that Attivio provides out of the box, several recognize and index significant phrases, concepts, and the names of persons, locations, products, corporations, and other "entities." Other text-analytics tools normalize terms so that the same ideas are indexed the same way across all types of documents. Attivio also provides tools for classifying information and for rating both the overall sentiment of content and the sentiment expressed toward specific entities within it.
These capabilities build a de facto web of relationships among the documents and records in Attivio's universal index. This adds a layer of structured metadata around what was formerly unstructured content, and builds conceptual links between database records and free-text documents.
Combining linguistics capabilities, a library of data transformation components and a workflow layer, Attivio provides a robust content enrichment and data transformation framework that does not require external ETL-like tools or content processing pipelines to clean, enhance and manipulate information as it is brought into Attivio.
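The idea of adding structured metadata around unstructured text can be illustrated with the simplest possible entity recognizer: a dictionary lookup. The sketch below is an assumption-laden stand-in, not Attivio's linguistics; the `ENTITY_DICTIONARY` contents, the `enrich` helper, and the `body` field name are all hypothetical. It finds known names in a record's text and copies them into new, typed fields.

```python
import re
from typing import Dict, List, Set

# Hypothetical entity dictionary; real entity extraction uses far richer models.
ENTITY_DICTIONARY: Dict[str, Set[str]] = {
    "person": {"david ortiz", "ted williams"},
    "location": {"boston", "fenway park"},
    "corporation": {"attivio"},
}


def extract_entities(text: str, dictionary: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return entity type -> sorted list of dictionary names found in the text."""
    lowered = text.lower()
    found: Dict[str, List[str]] = {}
    for entity_type, names in dictionary.items():
        # \b word boundaries stop "boston" from matching inside "bostonian".
        hits = sorted(n for n in names if re.search(r"\b" + re.escape(n) + r"\b", lowered))
        if hits:
            found[entity_type] = hits
    return found


def enrich(record_fields: Dict[str, List[str]],
           dictionary: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    # Keep the original fields and add one structured field per entity type found.
    enriched = dict(record_fields)
    text = " ".join(record_fields.get("body", []))
    for entity_type, names in extract_entities(text, dictionary).items():
        enriched[entity_type] = names
    return enriched


fields = enrich({"body": ["David Ortiz homered at Fenway Park in Boston."]}, ENTITY_DICTIONARY)
```

After enrichment, a free-text record carries `person` and `location` fields that can be searched, faceted, or joined just like database columns.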
Unifying Structured and Unstructured Worlds
Attivio seamlessly bridges the gulf between structured business data (in databases, spreadsheets, and CRM systems) and unstructured content (email, news feeds, whitepapers, reports, memos).
Attivio enriches unstructured text while respecting the utility of database records. Attivio's text-analytics modules operate on both structured and unstructured content, indexing entities and significant concepts wherever they are encountered.
Attivio ingests news articles, database records, email messages, spreadsheets, and pages from content-management systems with equal ease, and turns them all into Attivio universal index entries. From that point on, the distinction between structured and unstructured information is no longer relevant. Inside Attivio, it is all the same, and it is all available.
Multiple Query Options
Attivio not only combines information from multiple repositories but also exposes a rich set of query interfaces to support a wide range of business questions: a Google-like keyword search interface; a sophisticated query language capable of relational search; and a SQL-based query language that lets Attivio provide content to external SQL-based applications.
Simple Query Language
Attivio's Simple Query Language is recommended for untrained users. It is similar to the query languages used by Google, Yahoo, and other commonly used Web search engines. It offers keyword matching, simple AND/OR/NOT logic, numeric ranges, character wildcards, and a limited facility for score boosting. Most users just type in a word or two and then search.
The Simple Query Language is easy to use but limited: it cannot adequately express detailed or complex queries.
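Keyword matching with AND/OR/NOT logic boils down to set operations over an inverted index. The following Python sketch shows that core idea only; it is not Attivio's parser or query syntax. It takes already-parsed lists of lowercase terms (`must`, `should`, `must_not` are hypothetical parameter names) and ignores ranges, wildcards, and boosting.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase, whitespace-separated term to the IDs of the docs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def boolean_search(index: Dict[str, Set[str]], all_ids: Iterable[str],
                   must=(), should=(), must_not=()) -> Set[str]:
    """AND over `must`, OR over `should`, NOT over `must_not` (terms in lowercase)."""
    result = set(all_ids)
    for term in must:      # AND: keep docs containing every term
        result &= index.get(term, set())
    if should:             # OR: keep docs containing at least one term
        result &= set().union(*(index.get(t, set()) for t in should))
    for term in must_not:  # NOT: drop docs containing the term
        result -= index.get(term, set())
    return result


docs = {
    "d1": "red sox win the pennant",
    "d2": "yankees win in extra innings",
    "d3": "red sox lose a close game",
}
index = build_inverted_index(docs)
```

For example, `boolean_search(index, docs, must=["red", "sox"], must_not=["lose"])` keeps only `d1`.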
Advanced Query Language
The Advanced Query Language is more complex than the Simple Query Language, and would not normally be employed by end users. It is more commonly used by application developers who can assemble the queries programmatically, possibly in response to control settings on a graphical user interface or Web page. This is a prefix-notation language that includes the following operators: DATE, TERM, ENTITY, FUZZY, REGEX, STARTSWITH, ENDSWITH, NEAR, ONEAR, PHRASE, AND/OR/NOT, BOOST, QUERY, SUBQUERY, INNER/OUTER JOIN, FILTER, DISTANCE, and POLYGON.
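Because application code, not end users, typically assembles these queries, it helps to see how a prefix-notation query can be built as a tree from form settings and then rendered. The Python sketch below borrows the operator names `TERM`, `AND`, and `NOT` from the list above, but the rendered string format (quoting, separators, argument order) and the `query_from_form` helper are hypothetical; check the Advanced Query Language reference for the actual syntax.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass(frozen=True)
class Op:
    """One prefix-notation operator applied to its arguments."""
    name: str
    args: tuple  # each argument is a str literal or another Op


def render(node: Union[str, Op]) -> str:
    # Literals are double-quoted; operators print as NAME(arg1, arg2, ...).
    if isinstance(node, str):
        return '"' + node.replace('"', '\\"') + '"'
    return node.name + "(" + ", ".join(render(a) for a in node.args) + ")"


def query_from_form(keywords: Sequence[str], excluded: Sequence[str] = ()) -> Op:
    """Build a query tree from hypothetical search-form fields."""
    clauses: List[Op] = [Op("TERM", (k,)) for k in keywords]
    clauses += [Op("NOT", (Op("TERM", (x,)),)) for x in excluded]
    if not clauses:
        raise ValueError("the form produced no query clauses")
    return clauses[0] if len(clauses) == 1 else Op("AND", tuple(clauses))
```

Keeping the query as a tree until the last moment lets a GUI add or remove clauses (for example, when a checkbox changes) without string manipulation.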
The Advanced Query Language offers additional search options such as the Relational Querying JOIN feature, which lets us perform structured JOIN operations in combination with full-text searches.
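The semantics of combining a full-text search with a structured JOIN can be shown with a plain in-memory inner join. Attivio performs such joins inside its index engine; the Python sketch below only illustrates the result, and the record layouts (`player_id`, `name`, `title`, `text`) and sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def fulltext_join(players: List[Dict], articles: List[Dict], term: str) -> List[Tuple[str, str]]:
    """Inner-join articles whose text contains `term` with the player records they reference."""
    # Full-text side: keep articles containing the term (case-insensitive, whole word).
    matching = [a for a in articles if term.lower() in a["text"].lower().split()]
    # Structured side: index player records by primary key.
    players_by_id = {p["player_id"]: p for p in players}
    # JOIN on articles.player_id = players.player_id; unmatched articles are dropped.
    return [
        (players_by_id[a["player_id"]]["name"], a["title"])
        for a in matching
        if a["player_id"] in players_by_id
    ]


players = [
    {"player_id": 1, "name": "Player A", "team": "Boston"},
    {"player_id": 2, "name": "Player B", "team": "New York"},
]
articles = [
    {"title": "Slump ends", "player_id": 1, "text": "A walk-off homer ended the slump"},
    {"title": "Trade rumors", "player_id": 2, "text": "Rumors of a trade before the deadline"},
    {"title": "Orphan", "player_id": 99, "text": "A homer by an unknown rookie"},
]
```

Here `fulltext_join(players, articles, "homer")` returns only the article that both matches the text search and references a known player.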
Query Using SQL
SQL can be used to query Attivio indexes through the Data Source Discovery product.
The architecture of Attivio is based on the Staged Event-Driven Architecture (SEDA) pattern. In SEDA, each pool of components has a work queue in front of it. Components take work from their input queue and forward system messages to the queue of the next component in the workflow. Because content is processed asynchronously, Attivio can manage throughput by sizing the queues and the number of component instances in each pool. The architecture can be scaled up to operate seamlessly on billions of documents in large server farms, or scaled down to provide decision support on individual laptops.
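The SEDA pattern can be sketched with Python's standard `queue` and `threading` modules. This is a minimal sketch of the pattern, not Attivio's implementation: each stage has a bounded input queue served by its own pool of worker threads, and the `workers_per_stage` and `queue_capacity` parameters (hypothetical names) are the two sizing knobs the paragraph above describes. The stage handlers in the usage example are placeholders.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_STOP = object()  # sentinel that tells one worker thread to exit


def _worker(inbox: queue.Queue, outbox: queue.Queue, handler: Callable[[Any], Any]) -> None:
    # A worker repeatedly takes an event from its stage's input queue, processes
    # it, and forwards the result to the next stage's queue. No error handling:
    # an exception in `handler` would stop this worker.
    while True:
        event = inbox.get()
        if event is _STOP:
            return
        outbox.put(handler(event))


def run_seda(events: Iterable[Any], handlers: List[Callable[[Any], Any]],
             workers_per_stage: int = 2, queue_capacity: int = 8) -> List[Any]:
    """Push events through one stage per handler; each stage has its own queue and thread pool."""
    # queues[i] feeds stage i; the extra final queue collects finished events.
    queues = [queue.Queue(maxsize=queue_capacity) for _ in handlers]
    queues.append(queue.Queue())
    pools = []
    for i, handler in enumerate(handlers):
        pool = [threading.Thread(target=_worker, args=(queues[i], queues[i + 1], handler))
                for _ in range(workers_per_stage)]
        for t in pool:
            t.start()
        pools.append(pool)
    for event in events:
        queues[0].put(event)  # blocks while stage 0's queue is full (back-pressure)
    # Shut stages down in order: one sentinel per worker, queued after all real events.
    for i, pool in enumerate(pools):
        for _ in pool:
            queues[i].put(_STOP)
        for t in pool:
            t.join()
    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    return results  # order is not guaranteed: workers run concurrently


# Usage: a two-stage pipeline with placeholder handlers.
doubled = sorted(run_seda(range(10), [lambda x: x + 1, lambda x: x * 2]))
```

The bounded queues are what make the pattern self-regulating: when a slow stage falls behind, its full input queue blocks the stage in front of it instead of letting work pile up without limit.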
This is the basic diagram of information flow through the Active Intelligence Engine. As you progress through the training materials and the documentation, you'll see increasingly detailed versions of this diagram. This version, however, describes the core elements of Attivio operation.
- Content and Data: Attivio can ingest unstructured text (called "content") and structured database records (called "data") in a very wide variety of formats.
- Connectors: A connector is a utility that reads a particular kind of information from a particular source. It includes a "scanner" to read the data, and a "publication manager" that packages the information into a form that Attivio can process. (The publication manager is often referred to as the "feeder.")
- Ingestion Workflows: The incoming information is processed through a "workflow," which may be thought of as a chain of processing stages that cleanse and normalize content and data and enhance the quality of the information in various ways. Ingestion workflows are highly modular, so Attivio applies only the processing that you require for each type of input. The ingestion workflow ends by writing records into the Attivio universal index.
- Universal Index: The core of Attivio is a disk-based universal index. The diagram shows a monolithic, single index, which is the simplest case. Attivio allows us to subdivide the index (partitioning) to spread a large number of records across many servers, resulting in faster query execution and the ability to scale up linearly in volume. We can also duplicate the index across multiple servers to handle a high volume of query traffic and provide high availability.
- Query Applications: At the right edge of the diagram are three examples of query applications. A query application can be any kind of application that directs queries to Attivio. Typical examples are a search portal, a business intelligence reporting tool, or an active dashboard system.
- Query Workflow: If the document text was modified by linguistic processing, such as by removing stop words or by stemming, it makes sense to perform the same steps on incoming queries. Otherwise the query terms may not match the ones that were stored in the index. Queries can also be expanded by adding synonyms or expanding acronyms found in the query.
- Response Workflow: The index engine applies the query to the index and generates a list of matching items. The response workflow handles tasks that cannot be performed until the result set is known. In addition, the response workflow can automatically resubmit a failed query, substituting terms that have been corrected for spelling errors.
- Attivio Administrator: The Attivio Administrator can dynamically create, configure and maintain most aspects of an Attivio application.
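The point made under Query Workflow, that queries must be normalized the same way as the indexed text, can be demonstrated in a few lines. In this Python sketch the stop-word list and the `crude_stem` suffix stripper are assumptions standing in for real linguistic processing (a production system would use a proper stemmer or lemmatizer). One `normalize` function is shared by the ingestion side and the query side.

```python
import re
from typing import List, Set

STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # hypothetical stop list


def crude_stem(token: str) -> str:
    # Naive suffix stripping; a stand-in for a real stemmer.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def normalize(text: str) -> List[str]:
    """Shared by ingestion and query: tokenize, lowercase, drop stop words, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def index_terms(document: str) -> Set[str]:
    # Ingestion side: store only the normalized terms.
    return set(normalize(document))


def matches(terms: Set[str], query: str) -> bool:
    # Query side: apply the same normalization before matching.
    return all(t in terms for t in normalize(query))


terms = index_terms("Ortiz hits homers in the playoffs")
```

The index stores `homer`, not `homers`, so a raw query term `homers` would find nothing; after the query passes through the same `normalize` step, "Homers in the Playoffs" matches.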
Once the basic architecture of Attivio is understood, the next topic is "information security." This means securing the contents of the universal index against unauthorized viewing.
The following diagram is a simplified view of the Attivio architecture. Note that the workflows have been temporarily removed from the diagram to enhance clarity, but we still have the ingestion process on the left side of the index and the query process on the right.
Attivio can ingest user records, group records, and access control lists (ACLs) along with the usual documents and data. As a result, the index engine can filter not only by query criteria but also by access criteria.
In a secure system, every incoming query is accompanied by the user's security credentials. As part of the query workflow, Attivio's Security Module rewrites the query to JOIN the query results with ACLs to filter the set of documents or records that the user is allowed to view. The index engine returns only the matching information that this user can view.
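The access check itself can be sketched in Python. Note the difference in placement: Attivio's Security Module rewrites the query into a JOIN that the index engine executes, so unauthorized records never leave the index, while this sketch post-filters an already computed hit list. It shows what is checked, not where Attivio checks it. The ACL and group-membership structures, and the default-deny rule for documents without an ACL, are assumptions.

```python
from typing import Dict, Iterable, List, Set


def principals_for(user: str, group_membership: Dict[str, Set[str]]) -> Set[str]:
    # A user's effective credentials: the user ID plus every group the user belongs to.
    return {user} | group_membership.get(user, set())


def secure_results(hits: Iterable[str], acls: Dict[str, Set[str]], user: str,
                   group_membership: Dict[str, Set[str]]) -> List[str]:
    """Keep only hits whose ACL grants access to the user or one of the user's groups."""
    principals = principals_for(user, group_membership)
    # Documents with no ACL entry are denied (default-deny), an assumption of this sketch.
    return [doc for doc in hits if acls.get(doc, set()) & principals]


acls = {"d1": {"sales"}, "d2": {"alice"}, "d3": {"hr"}}
groups = {"alice": {"sales", "everyone"}, "bob": {"everyone"}}
```

With these sample ACLs, `alice` sees `d1` (through the `sales` group) and `d2` (granted directly), while `bob` sees nothing.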
Examples of Attivio Applications
Attivio installations vary in the number and type of repositories, application use cases, and scale and breadth of deployment. Generally, these aspects of an Attivio deployment determine where a given Attivio installation falls on Attivio's maturity model. A number of Attivio applications have been Web-site information "portals." These types of applications are generally considered level 1 or level 2 of the maturity model.
Attivio applications that take an active role in seeking out important information and bringing it to the user's desktop are considered to be level 3 and level 4 of the maturity model. These applications actively seek new or changed content and turn data and content into actionable information. They then deliver the new content directly to end-user interfaces, along with any related documents or records that might be affected by it. The business world refers to these systems as "dashboards," because they deliver real-time information directly into the user's view, like the gauges on the dashboard of a car. Attivio-powered dashboards are referred to as Active Dashboards.
When Attivio is deployed enterprise-wide as a key component of enterprise architecture, the deployment is considered level 5, the highest level of the maturity model. In these deployments, Attivio provides a common information access layer that supports a broad range of applications and uses: keyword-search-based, discovery-oriented applications; active dashboards for specific line-of-business users; integration into existing applications; and Business Intelligence (BI) analytics and reporting.
BI Reporting about Unstructured Data
Attivio's SQL Software Development Kit (sqlsdk) lets us apply standard SQL queries to Attivio's universal index. This lets us ingest unstructured content, and then analyze it using a standard Business Intelligence (BI) reporting tool like JasperSoft's iReport. This SDK provides JDBC and ODBC interfaces that let external applications connect directly to Attivio.
These drivers let BI applications poll the universal index and extract content as if they were querying any SQL database. This gives the reporting tool access to all the unstructured content that Attivio has ingested, and all of the metadata that Attivio has generated to describe this content. You don't just get news articles about your products; you also get summaries of the sentiment trends in those articles. BI developers can explore this rich new resource using the SQL query syntax they already know.
The link below this paragraph takes you to a video of a JDBC demo. In this demo, we loaded 100 years of baseball statistics and baseball-related news articles into Attivio. From the reporting tool, it is trivial to view either the player statistics (structured data) or the news articles (unstructured). The demo continues with a series of SQL queries:
- First is a query across "all tables" for any record containing a string similar to the name "Ortiz," which is the name of several well-known ball players over the last few decades. The results contain structured player records and also unstructured news articles, all of which mention the name "Ortiz."
- Another query uses SQL to find all news articles where the word "steroids" appears near a player's name. The ability to use Attivio's language analysis tools from the report platform is a major new capability for the report writer.
- Finally, a SQL query generates a table of players, ranked by the number of news articles that have linked the player's name with steroids. Certain names just naturally float to the top of that table. (The following illustration shows the iReport interface after running this query.)
Watch the video of the JDBC demo.
Connecting your reporting tool to Attivio is just like connecting to any other SQL database. Just enter the database connection parameters into your reporting tool and go to work.
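The last two demo queries rest on a proximity test: does one word occur near another? In the demo this analysis was done by Attivio and reached through SQL from the reporting tool; the Python sketch below only approximates the idea outside Attivio. It assumes that "near" means "within `window` token positions" (the default of 5 is arbitrary), that player names are single lowercase surnames, and that the article texts are hypothetical samples.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def near(text: str, a: str, b: str, window: int = 5) -> bool:
    """True if tokens `a` and `b` occur within `window` token positions of each other."""
    tokens = tokenize(text)
    positions_a = [i for i, t in enumerate(tokens) if t == a]
    positions_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in positions_a for j in positions_b)


def rank_players(articles: List[str], players: List[str], term: str = "steroids",
                 window: int = 5) -> List[Tuple[str, int]]:
    """Count, per player, the articles in which the player's name appears near `term`."""
    counts: Counter = Counter()
    for text in articles:
        for player in players:
            if near(text, player, term, window):
                counts[player] += 1
    return counts.most_common()  # highest article count first


articles = [
    "Report links Smith to steroids use",
    "Jones denies steroids claims",
    "Smith hits a homer in the ninth",
    "Steroids scandal: Smith named again",
]
```

Here `rank_players(articles, ["smith", "jones"])` ranks `smith` first with two articles, mirroring the shape of the ranked table in the demo.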
Key Technology Differentiators