
wooyun's Introduction

wooyun crawler and search

wooyun.org bug search

Screenshots: index.html, list.html

1. Required components

Python (2.7 recommended), pip
mongodb
scrapy
Flask
pymongo
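
Before crawling, it can help to confirm that the Python packages listed above are importable. The following is a minimal sketch, not part of the project: the helper name `missing_packages` is hypothetical, and it only checks the Python packages (a running mongodb server has to be verified separately).

```python
# Import names of the Python packages listed above.
REQUIRED = ["scrapy", "flask", "pymongo"]


def missing_packages(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# Prints an empty list when everything is installed.
print(missing_packages(REQUIRED))
```

Any name that appears in the printed list can be installed with pip.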

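2. Crawling wooyun public vulnerabilities

  • Create the following folders: wooyun/web/app/static/wooyun_res/htmls and wooyun/web/app/static/wooyun_res/images
  • In wooyun/, run the default command scrapy crawl wooyun to crawl all data. Three parameters control how the crawl runs.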

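 **-a page_max:** controls the number of pages to crawl. 0 (default): crawl everything; num (greater than 0): crawl that many pages. e.g. scrapy crawl -a page_max=2 wooyun  # crawls two pages (the first and the second)

 **-a local_store:** controls whether pages and images are downloaded locally. true (default): download pages and images and store them locally; false: do not download pages or images, and store only titles, other metadata, and the related links. e.g. scrapy crawl -a local_store=true wooyun

 **-a update:** controls whether the crawl is incremental. false (default): non-incremental (full) crawl; true: incremental crawl that starts from the previous crawl position and works from back to front. e.g. scrapy crawl -a update=true wooyun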
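
Scrapy hands each `-a name=value` pair to the spider as a string keyword argument, so `page_max=2` arrives as `"2"` and `update=true` as `"true"`. The sketch below shows one way such strings could be normalized using the defaults described above. The helper `parse_crawl_args` is hypothetical and is not taken from the project's spider code.

```python
def parse_crawl_args(page_max="0", local_store="true", update="false"):
    """Convert raw -a strings into typed options (hypothetical helper)."""

    def to_bool(value):
        # Only the literal string "true" (any case) counts as enabled.
        return str(value).strip().lower() == "true"

    pages = int(page_max)
    if pages < 0:
        raise ValueError("page_max must be 0 (crawl everything) or greater")
    return {
        "page_max": pages,  # 0 means crawl all pages
        "local_store": to_bool(local_store),
        "update": to_bool(update),
    }


# Equivalent to: scrapy crawl -a page_max=2 -a update=true wooyun
print(parse_crawl_args(page_max="2", update="true"))
```

With this strict comparison a misspelled value such as `ture` is treated as false, which is why the spelling of `true` matters on the command line.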

  • Crawler settings are stored in wooyun/wooyun/spider/settings.py and can be changed as needed
  • Web settings are stored in wooyun/web/app/views_py/settings.py
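
The two resource folders listed at the start of this section can also be created with a short script. This is a minimal sketch that assumes it is run from the directory containing wooyun/; the function name `create_resource_dirs` is hypothetical.

```python
import os

# Folder paths from this section, relative to the directory that contains
# wooyun/ (an assumption about where the script is run).
RESOURCE_DIRS = [
    "wooyun/web/app/static/wooyun_res/htmls",
    "wooyun/web/app/static/wooyun_res/images",
]


def create_resource_dirs(base="."):
    """Create the resource folders under `base` and return their paths."""
    created = []
    for relative in RESOURCE_DIRS:
        path = os.path.join(base, *relative.split("/"))
        # Skip folders that already exist so the script can be re-run safely.
        if not os.path.isdir(path):
            os.makedirs(path)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in create_resource_dirs():
        print(path)
```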

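3. Crawling the wooyun knowledge base

  • Create the following folders: wooyun/web/app/static/wooyun_res/htmls and wooyun/web/app/static/wooyun_res/images
  • In wooyun/, run the default command scrapy runspider wooyun/spider/wooyun_doc_spider.py to crawl all data. Two parameters control how the crawl runs.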

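 **-a page_max:** controls the number of pages to crawl. 0 (default): crawl everything; num (greater than 0): crawl that many pages. e.g. scrapy runspider -a page_max=2 wooyun/spider/wooyun_doc_spider.py  # crawls two pages (the first and the second)

 **-a local_store:** controls whether pages and images are downloaded locally. true (default): download pages and images and store them locally; false: do not download pages or images, and store only titles, other metadata, and the related links. e.g. scrapy runspider -a local_store=true wooyun/spider/wooyun_doc_spider.py

  • The page does not expose the total number of knowledge-base articles, so the amount of new content cannot be determined from a count. When updating, the parameters must be supplied manually.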
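
Since knowledge-base updates rely on manually chosen parameters, a small wrapper can make repeated runs less error-prone. The sketch below only assembles the command line from the two documented parameters; the helper `build_doc_crawl_command` is hypothetical, and choosing a suitable `page_max` for an update is still up to the user.

```python
SPIDER_PATH = "wooyun/spider/wooyun_doc_spider.py"


def build_doc_crawl_command(page_max=0, local_store=True):
    """Return the scrapy runspider command as an argument list."""
    return [
        "scrapy", "runspider",
        "-a", "page_max=%d" % page_max,
        "-a", "local_store=%s" % ("true" if local_store else "false"),
        SPIDER_PATH,
    ]


# Re-crawl only the first two pages to pick up recent articles.
command = build_doc_crawl_command(page_max=2)
print(" ".join(command))
```

To run the command rather than print it, the list can be passed to `subprocess.call(command)` from within the wooyun/ directory.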

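4. Web search

The web interface uses the Flask framework as the web server and bootstrap for the front end.

Starting the web server: run python run.py in the web directory. The default port is 5000.

Searching: open http://localhost:5000 in a browser to search for vulnerabilities. Multiple keywords can be separated by spaces.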
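
The project's actual query logic is not shown here, so the following is only a sketch of how a space-separated keyword search could be turned into a MongoDB filter. It assumes that every keyword must match (AND semantics) and that the searched field is called `title`; both are assumptions, as is the helper name `build_search_filter`.

```python
import re


def build_search_filter(query, field="title"):
    """Build a MongoDB filter requiring every keyword to appear in `field`."""
    keywords = query.split()  # split on any run of whitespace
    if not keywords:
        return {}  # an empty filter matches all documents
    return {
        "$and": [
            # Escape the keyword so it is matched literally, ignoring case.
            {field: {"$regex": re.escape(word), "$options": "i"}}
            for word in keywords
        ]
    }


print(build_search_filter("sql injection"))
```

The resulting dictionary could be passed to a pymongo collection's `find()` method.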

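5. Other

This program is intended only for technical research and personal use. All of its components are open source, the vulnerability data comes from wooyun's public vulnerabilities, and the copyright belongs to wooyun.org.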


wooyun's People

Contributors

mysterymask

