
WebBookCrawler

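WebBookCrawler targets most Chinese web-novel sites, such as 起点中文网 (Qidian), 3G书城, and 17k. It crawls each site's book-catalog pages (for example, http://all.qidian.com/Default.aspx ) and extracts each book's ID, title, and related information.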

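The program runs multiple fetcher threads (FetcherThread) to retrieve pages and uses regular expressions to extract each field of every book. A single writer thread (WriterThread) stores the results in the database.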
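The fetcher/writer split described above is a producer-consumer arrangement. The following is a minimal sketch of that idea, not the project's actual code: it is written in Python for illustration, and `FAKE_PAGES`, `BOOK_RE`, `fetcher`, `writer`, and `crawl` are hypothetical names. In-memory strings stand in for downloaded pages and a plain list stands in for the database.

```python
import queue
import re
import threading

# Hypothetical stand-in for downloaded catalog pages; a real crawler
# would fetch these over HTTP.
FAKE_PAGES = {
    "page-1": '<a href="/Book/101.aspx">First Book</a>'
              '<a href="/Book/102.aspx">Second Book</a>',
    "page-2": '<a href="/Book/203.aspx">Third Book</a>',
}

# Hypothetical pattern: captures a numeric book ID and a title per link.
BOOK_RE = re.compile(r'<a href="/Book/(\d+)\.aspx">([^<]+)</a>')

STOP = object()  # sentinel that tells a thread to exit


def fetcher(url_queue, record_queue):
    """Take page keys off url_queue, extract (id, title) pairs, pass them on."""
    while True:
        url = url_queue.get()
        if url is STOP:
            break
        html = FAKE_PAGES[url]
        for book_id, title in BOOK_RE.findall(html):
            record_queue.put((int(book_id), title))


def writer(record_queue, store):
    """Single consumer: the only thread that touches the store."""
    while True:
        record = record_queue.get()
        if record is STOP:
            break
        store.append(record)


def crawl(urls, fetcher_count=3):
    url_queue, record_queue = queue.Queue(), queue.Queue()
    store = []  # stands in for the database
    fetchers = [
        threading.Thread(target=fetcher, args=(url_queue, record_queue))
        for _ in range(fetcher_count)
    ]
    writer_thread = threading.Thread(target=writer, args=(record_queue, store))
    for thread in fetchers:
        thread.start()
    writer_thread.start()

    for url in urls:
        url_queue.put(url)
    for _ in fetchers:
        url_queue.put(STOP)  # one sentinel per fetcher
    for thread in fetchers:
        thread.join()

    # All fetchers are done, so every record is already queued.
    record_queue.put(STOP)
    writer_thread.join()
    return sorted(store)


books = crawl(["page-1", "page-2"])
```

Funnelling every record through one writer means the store needs no locking of its own, which is presumably the reason for a single WriterThread.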

The Page class is abstract. To support a specific site, subclass Page and implement a few of its methods; the key part is the regular expressions.
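As an illustration of that extension point, here is a minimal sketch assuming a simplified Page with a single abstract method. The method names `book_pattern` and `books`, the subclass `ExampleSitePage`, and the sample markup are all hypothetical; the real Page class may declare different methods.

```python
import re
from abc import ABC, abstractmethod


class Page(ABC):
    """Site-independent parsing; subclasses supply the site-specific regex."""

    def __init__(self, html):
        self.html = html

    @abstractmethod
    def book_pattern(self):
        """Return a compiled regex with named groups 'id' and 'title'."""

    def books(self):
        # Shared logic: apply the subclass's pattern to the page text.
        return [
            (match.group("id"), match.group("title").strip())
            for match in self.book_pattern().finditer(self.html)
        ]


class ExampleSitePage(Page):
    # Hypothetical markup for an imaginary site, not a real site's HTML.
    _PATTERN = re.compile(r'<li data-id="(?P<id>\d+)">(?P<title>[^<]+)</li>')

    def book_pattern(self):
        return self._PATTERN


sample = '<ul><li data-id="7">Alpha </li><li data-id="8">Beta</li></ul>'
parsed = ExampleSitePage(sample).books()
```

Under this layout, supporting another site would only require a new subclass with its own pattern.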

See the comments in the code for further details.

Contributors

zehao
