Web Crawler

Overview

The Web Crawler adds a highly scalable, rich set of website indexing capabilities to the Attivio Platform. With the Web Crawler, users can configure one or more crawls of websites from a list of seed URLs or XML Sitemaps, adjust the rate at which pages are requested, provide credentials for sites that require authentication, index pages generated dynamically with JavaScript, and filter the content to be indexed by crawl depth, URL pattern, file extension, or MIME type.

Feature List


Download

The Web Crawler is included with the installer.

Quick Start

Web Crawler Quick Start
Get the Web Crawler up and running in a few easy steps.

Documentation

Web Crawler Reference
A listing of all available configuration options and properties for the Web Crawler connector.

Web Crawler Store Cleaner Reference
A listing of all available configuration options and properties for the Web Crawler Store Cleaner connector.

Status UI
A guide to understanding the information displayed by the Web Crawler Status UI.

Crawl Store
A brief overview of the storage layer that Web Crawler connectors use for advanced processing.

Web Crawler Incremental Updating
An introduction and overview of the incremental crawling feature.

JavaScript Execution
Learn how to index pages with dynamic content generated by JavaScript.

Deduplication
An explanation of how duplicate web pages are found and what is done with them.

Crawl Filtering
A breakdown of the logic behind custom Web Crawler URL filters and general URL filtering.

Authentication
A description of all the authentication protocols the Web Crawler supports to access secured web content.

Web Crawler Performance
A summary of the tuning options the Web Crawler offers for improving crawl performance.

Tutorials

How to Configure Incremental Crawling Using a Sitemap

How to Explicitly Crawl Specific Sets of URLs

FAQs

Web Crawler Frequently Asked Questions

Troubleshooting

Web Crawler Troubleshooting Guide