Site Structure Analyzer

A web spider (built on anemone) with a Rails analysis app on top.

Usage

  • Edit config/database.yml and config/spider.yml
  • Run bundle install
  • Run the database migrations (in a standard Rails setup: rake db:migrate)
  • Run rake spider:start
  • Wait for the crawl to finish; expect it to take about a night
  • Run rake spider:refresh
    • This refreshes the cached count columns for every page. The overview needs them to sort by and display the link and backlink counts
  • Open localhost:3000/pages
  • Validator: the app can also check pages for W3C parsing errors. Point “w3c_url:” in spider.yml at a private W3C validator installation, or set it to nil
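
The reason for the refresh step can be illustrated with a small sketch in plain Ruby. It is not the project's actual implementation: the LINKS data, the refresh_counts helper and the :links_count / :backlinks_count keys are hypothetical stand-ins for the real tables and columns. The idea is that both counters are recomputed once from the crawled links, so the overview can sort on stored numbers instead of counting links for every row.

```ruby
# Hypothetical in-memory stand-in for the crawled link table:
# each pair is [source_url, target_url].
LINKS = [
  ["/", "/about"],
  ["/", "/contact"],
  ["/about", "/contact"],
  ["/contact", "/"]
]

# Recompute both counters for every page in one pass over the links.
# Returns { url => { :links_count => n, :backlinks_count => m } }.
def refresh_counts(links)
  counts = Hash.new do |hash, url|
    hash[url] = { :links_count => 0, :backlinks_count => 0 }
  end
  links.each do |source, target|
    counts[source][:links_count] += 1      # outgoing link from source
    counts[target][:backlinks_count] += 1  # incoming link to target
  end
  counts
end

counts = refresh_counts(LINKS)

# With the counters cached, sorting the overview is a plain lookup:
# most backlinks first, ties broken by URL.
by_backlinks = counts.sort_by { |url, c| [-c[:backlinks_count], url] }
by_backlinks.each do |url, c|
  puts "#{url}: #{c[:links_count]} links, #{c[:backlinks_count]} backlinks"
end
```

In the real app the counters would live in database columns, which is why they go stale after a crawl until rake spider:refresh is run.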

Tested with Ruby 1.8.7 only!

To crawl again, or to crawl a different site, clear out the database first (TODO):

  rake spider:clear
  rake spider:start

By the way, the Unix tool “screen” is handy when the crawl runs on a different machine: the crawl keeps going without the client machine having to stay on the whole time.

TODO

  • i18n
  • more filtering options
  • support multiple domains (for now you can specify only one in config/spider.yml and have to clear out everything before switching)
  • crawling speed: ideas welcome; threading does not seem to work

Contributors

  • zealot128
  • berthartm
