For future documentation improvements, we maintain a Documentation Wishlist (http://webarchive.jira.com/wiki/display/Heritrix/Documentation+Wishlist).
An Introduction to Heritrix (https://webarchive.jira.com/wiki/download/attachments/5441/Mohr-et-al-2004.pdf) provides more detailed information on the structure and design of Heritrix. Some very old information can still be gleaned from the old wiki (http://web.archive.org/web/*/http://crawler.archive.org/cgi-bin/wiki.pl?HomePage).
# Mailing lists
Algolia provides a developer-friendly RESTful API for instant search on websites and in apps. Most web services and mobile apps, such as Spotify, Salesforce, or Amazon, need to provide fast and meaningful access to database objects via a simple search box. People want to find songs, invoices, or products in just a few keystrokes.
tags: api application-search developer-tools full-text-search indexed-search

Mixnode is a fast, flexible and massively scalable web crawler in the cloud. Using Mixnode eliminates the need for upfront investment in infrastructure, hardware, software and labour that would be required if you built or ran your own web crawler.
tags: crawling web-crawler web-crawling web-scraper web-scraping

Apache Nutch --
tags: web-crawler web-crawling web-scraper

Expertrec site search engine helps you add ultra fast search to your website. It adds a superb autosuggest and search listing pages where people can find products with a few keystrokes. Along with this you get complete control over your search results with complete merchandising options and access to real time search analytics.
tags: angularjs node.js objective python ruby

ACHE is a web crawler for domain-specific search
tags: web-crawler web-crawling web-scraper web-scraping

StormCrawler is an open source SDK for building distributed web crawlers with Apache Storm. The project is under Apache license v2 and consists of a collection of reusable resources and components, written mostly in Java.
tags: web-crawler

With Google Custom Search, add a search box to your homepage to help people find what they need on your website.
tags: embeddable search-engine