grab-site is a crawler for archiving websites to WARC files. It includes a dashboard for monitoring multiple crawls, and supports changing URL ignore patterns during the crawl.
tags: command-line-interface offline-website web-archive web-crawler website-downloader
Darcy Ripper is a powerful, pure-Java, multi-platform web crawler with great workload and speed capabilities. Darcy is a standalone, multi-platform graphical user interface application that ordinary users as well as programmers can use to download web-related resources on the fly.
tags: web-crawler web-spider
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
tags: web-crawler web-crawling web-data-crawling
Apache Nutch --
tags: web-crawler web-crawling web-scraper
ACHE is a web crawler for domain-specific search.
tags: web-crawler web-crawling web-scraper web-scraping