Scrapy is an open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.
tags: framework data-mining web-scraping
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
tags: web-crawler web-crawling web-data-crawling
Apache Nutch --
tags: web-crawler web-crawling web-scraper
ACHE is a web crawler for domain-specific search.
tags: web-crawler web-crawling web-scraper web-scraping