Web Crawler


The Web Crawler provides a highly scalable, rich set of website indexing capabilities to the Attivio Platform. With the Web Crawler, users can configure one or more crawls of websites using a list of seed URLs or XML Sitemaps, adjust the speed at which pages are requested, provide credentials for sites requiring authentication, index pages generated dynamically by JavaScript, and filter the content to be indexed by crawl depth, URL pattern, file extension, or MIME type, among many other capabilities.
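To illustrate the filtering concepts mentioned above (crawl depth, URL pattern, and file extension), here is a minimal, hypothetical sketch in Python. It is not the Web Crawler's actual implementation; the function name, defaults, and patterns are assumptions for illustration only.

```python
import re
from urllib.parse import urlparse

def should_crawl(url, depth, max_depth=3,
                 include_patterns=(r"^https://example\.com/",),
                 exclude_extensions=(".jpg", ".png", ".css", ".js")):
    """Hypothetical URL filter: a page is crawled only if it passes
    every configured filter (depth, URL pattern, file extension)."""
    if depth > max_depth:                                       # crawl-depth filter
        return False
    if not any(re.search(p, url) for p in include_patterns):    # URL-pattern filter
        return False
    path = urlparse(url).path.lower()
    if path.endswith(exclude_extensions):                       # file-extension filter
        return False
    return True
```

In practice the Web Crawler applies filters like these via connector configuration rather than code; see the Crawl Filtering page below for the actual filtering logic.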

Feature List



The Web Crawler is included with the installer.

Quick Start

Web Crawler Quick Start
Get the Web Crawler up and running in a few easy steps.


Web Crawler Reference
A listing of all available configuration options and properties for the Web Crawler connector.

Web Crawler Store Cleaner Reference
A listing of all available configuration options and properties for the Web Crawler Store Cleaner connector.

Status UI
A guide to understanding the information displayed by the Web Crawler Status UI.

Crawl Store
A brief overview of the storage layer that Web Crawler connectors use for advanced processing.

Web Crawler Incremental Updating
An introduction and overview of the incremental crawling feature.

JavaScript Execution
Learn how to index pages with dynamic content generated by JavaScript.

Duplicate Detection
An explanation of how duplicate web pages are detected and how they are handled.

Crawl Filtering
A breakdown of the logic behind custom Web Crawler URL filters and general URL filtration.

Authentication
A description of all the authentication protocols the Web Crawler supports for accessing secured web content.

Web Crawler Performance
A summary of the customization options the Web Crawler offers for improving crawl performance.


How to Configure Incremental Crawling Using a Sitemap

How to Explicitly Crawl Specific Sets of URLs


Web Crawler Frequently Asked Questions


Web Crawler Troubleshooting Guide


