
wooyun's Introduction

wooyun crawler and search

wooyun.org bug search

index.html

list.html

1. Dependencies

Python (2.7 recommended), pip
mongodb
scrapy
Flask
pymongo
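
Before crawling, it can help to confirm that the Python packages above are importable. The following is a minimal sketch, not part of the project: `missing_modules` is a hypothetical helper, and the import names `scrapy`, `flask`, and `pymongo` are the usual ones for these packages. MongoDB itself is a separate server and is not checked here.

```python
def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names of the Python packages listed above (assumed, see lead-in).
    absent = missing_modules(["scrapy", "flask", "pymongo"])
    print("missing: " + (", ".join(absent) or "none"))
```

Anything reported as missing can then be installed with pip.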

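2. Crawling wooyun public vulnerabilities

  • Create the following folders: wooyun/web/app/static/wooyun_res/htmls and wooyun/web/app/static/wooyun_res/images
  • In wooyun/, run the default command scrapy crawl wooyun to crawl all of the data. Three parameters control how the crawl runs.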

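 -a page_max: controls the number of pages to crawl. 0 (default) crawls everything; any number greater than 0 crawls that many pages. E.g. scrapy crawl -a page_max=2 wooyun crawls two pages (the first and the second).

 -a local_store: controls whether pages and images are downloaded locally. true (default) downloads the pages and images and saves them locally; false skips the download and saves only the title and similar metadata plus the related links. E.g. scrapy crawl -a local_store=true wooyun

 -a update: controls whether the crawl is incremental. false (default) runs a full, non-incremental crawl; true runs an incremental crawl that starts from the previous crawl position and works from back to front. E.g. scrapy crawl -a update=true wooyun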
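
Scrapy passes every -a value to the spider as a string, so the spider has to convert these three parameters itself. The sketch below shows one way that conversion could look, using the defaults documented above. The helper names `parse_bool` and `parse_crawl_args` are hypothetical and are not taken from the project's code.

```python
def parse_bool(value):
    """Interpret a command-line string such as 'true' or 'False' as a bool."""
    return str(value).strip().lower() == "true"


def parse_crawl_args(page_max="0", local_store="true", update="false"):
    """Convert raw -a values (always strings) into typed options.

    The defaults mirror the ones documented above.
    """
    pages = int(page_max)
    if pages < 0:
        raise ValueError("page_max must be 0 (all pages) or a positive number")
    return {
        "page_max": pages,  # 0 means crawl every page
        "local_store": parse_bool(local_store),
        "update": parse_bool(update),
    }
```

With this sketch, scrapy crawl -a page_max=2 wooyun would correspond to parse_crawl_args(page_max="2").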

  • Crawler settings are stored in wooyun/wooyun/spider/settings.py and can be changed as needed
  • Web settings are stored in wooyun/web/app/views_py/settings.py
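
The folder-creation step at the start of this section can also be scripted. This is a minimal sketch that assumes `base` is the directory containing the wooyun/ project folder; `ensure_dirs` is a hypothetical helper, not part of the project.

```python
import os

# Relative paths taken from the first step of this section.
RESOURCE_DIRS = [
    os.path.join("wooyun", "web", "app", "static", "wooyun_res", "htmls"),
    os.path.join("wooyun", "web", "app", "static", "wooyun_res", "images"),
]


def ensure_dirs(base, paths=RESOURCE_DIRS):
    """Create each path under `base` if it does not exist yet.

    Returns the list of directories that were actually created.
    """
    created = []
    for rel in paths:
        full = os.path.join(base, rel)
        if not os.path.isdir(full):
            os.makedirs(full)
            created.append(full)
    return created
```

Calling ensure_dirs(".") from the parent directory of wooyun/ creates both folders. A second call creates nothing and returns an empty list.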

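3. Crawling the wooyun knowledge base

  • Create the following folders: wooyun/web/app/static/wooyun_res/htmls and wooyun/web/app/static/wooyun_res/images
  • In wooyun/, run the default command scrapy runspider wooyun/spider/wooyun_doc_spider.py to crawl all of the data. Two parameters control how the crawl runs.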

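 -a page_max: controls the number of pages to crawl. 0 (default) crawls everything; any number greater than 0 crawls that many pages. E.g. scrapy runspider -a page_max=2 wooyun/spider/wooyun_doc_spider.py crawls two pages (the first and the second).

 -a local_store: controls whether pages and images are downloaded locally. true (default) downloads the pages and images and saves them locally; false skips the download and saves only the title and similar metadata plus the related links. E.g. scrapy runspider -a local_store=true wooyun/spider/wooyun_doc_spider.py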

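  • The page does not expose the total number of knowledge-base articles, so the amount of new content cannot be determined from a count. When updating, supply the parameters manually.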
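
Because updates need the parameters supplied by hand, a small wrapper can make repeated runs less error-prone. The sketch below only assembles and runs the same command line shown above. `build_doc_crawl_cmd` and `run_doc_crawl` are hypothetical names, and the `cwd="wooyun"` default assumes the script is started from the parent of the wooyun/ directory.

```python
import subprocess

SPIDER_PATH = "wooyun/spider/wooyun_doc_spider.py"  # path as given above


def build_doc_crawl_cmd(page_max=0, local_store=True):
    """Build the argv list for the knowledge-base spider."""
    return [
        "scrapy", "runspider",
        "-a", "page_max=%d" % page_max,
        "-a", "local_store=%s" % ("true" if local_store else "false"),
        SPIDER_PATH,
    ]


def run_doc_crawl(page_max=0, local_store=True, cwd="wooyun"):
    """Run the spider from the project directory and return its exit code."""
    return subprocess.call(build_doc_crawl_cmd(page_max, local_store), cwd=cwd)
```

For example, run_doc_crawl(page_max=2) would re-crawl only the first two pages to pick up recent articles.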

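4. Web search

The web interface uses the Flask framework as the web server and Bootstrap for the front end.

Start the web server: run python run.py in the web directory. The default port is 5000.

Search: open http://localhost:5000 in a browser to search for vulnerabilities. Separate multiple keywords with spaces.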
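
To illustrate how a space-separated query might map onto MongoDB, the sketch below builds a filter document in which every keyword must match. It is an assumption-laden illustration, not the project's actual query logic: the field name `title`, the AND semantics, and the helper name `build_search_filter` are all hypothetical.

```python
import re


def build_search_filter(query, field="title"):
    """Turn a space-separated query into a MongoDB filter document.

    Every keyword must match `field` as a case-insensitive substring.
    An empty query yields {}, which matches all documents.
    """
    keywords = query.split()
    if not keywords:
        return {}
    return {
        "$and": [
            # Escape each keyword so it is matched literally, not as a regex.
            {field: {"$regex": re.escape(kw), "$options": "i"}}
            for kw in keywords
        ]
    }
```

With pymongo, the result could be passed directly to a collection's find method, for example collection.find(build_search_filter("sql injection")), where collection is whichever collection holds the crawled records.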

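5. Miscellaneous

This program is intended only for technical research and personal use. All of its components are open source, the vulnerabilities come from wooyun's public vulnerability listings, and copyright belongs to wooyun.org.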


wooyun's People

Contributors

mysterymask
