A high-performance web crawler in Elixir.
```elixir
Crawler.crawl("http://elixir-lang.org", max_levels: 2)
```
Option | Type | Default Value | Description |
---|---|---|---|
`:max_levels` | integer | 3 | Maximum nested level of pages to crawl. |
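To illustrate what the `:max_levels` option controls, here is a minimal sketch of depth-limited crawling. This is not Crawler's internal implementation; `fetch_links/1` is a hypothetical placeholder for fetching a page and extracting its links.

```elixir
defmodule DepthLimitedCrawl do
  # Stop recursing once the current level exceeds the maximum.
  def crawl(_url, level, max_levels) when level > max_levels, do: :ok

  def crawl(url, level, max_levels) do
    # Visit each linked page one level deeper than the current page.
    for link <- fetch_links(url) do
      crawl(link, level + 1, max_levels)
    end

    :ok
  end

  # Placeholder: a real crawler would fetch the page and parse its HTML.
  defp fetch_links(_url), do: []
end
```

With `max_levels: 2`, the start page counts as level 1 and pages it links to as level 2; links found on level-2 pages are not followed.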
Crawler is under active development. Below is a non-comprehensive list of features to be implemented:
- Set the maximum crawl level.
- Save to disk.
- Set timeouts.
- The ability to manually stop/pause/restart the crawler.
- Restrict crawlable domains, paths or file types.
- Limit concurrent crawlers.
- Limit rate of crawling.
- Set crawler's user agent.
- The ability to retry a failed crawl.
- DSL for scraping page content.
Please see CHANGELOG.md.
Licensed under MIT.