Webspider (anemone) with a Rails analysis app on top.
- Edit your database.yml and config/spider.yml
- bundle install
- migrate
- rake spider:start
- Wait a night
- rake spider:refresh
- Will refresh the cached count columns for every page. This is necessary for sorting and displaying of links and backlinks count in the overview
- open localhost:3000/pages
- Validator: This also supports to check for W3 Parsing Errors. See spider.yml “w3c_url:” to a private W3 Installation, or set it to nil
Tested with ruby 1.8.7 only!
When you like to crawl again or a new page clear out the database before (TODO):
rake spider:clear rake spider:start
Using the unix tool “screen” is very nice for not having to run my client machine all the time while crawling on a different machine btw.
- i18n
- more filtering options
- support multiple domains (up to now, you can specify one in the config/spider.yml and have to clear out everything
- Speed of Crawling… ideas? anyone? threading seems not to work
- …