MSpider

Talk

The information security department of 360 Company is hiring on an ongoing basis. If you are interested, contact zhangxin1[at]360.cn.

Installation

On Ubuntu, install the following libraries first.

You can install them with pip, easy_install, or apt-get.

  • lxml
  • chardet
  • splinter
  • gevent
  • phantomjs
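Before the first run, it can help to confirm that everything above is actually installed. The script below is a hypothetical helper, not part of MSpider. It assumes the Python import names match the package names listed above and that phantomjs, being a standalone executable, is expected to be on PATH.

```python
import importlib.util
import shutil

# Python packages from the list above (assumed to match their import names).
PYTHON_MODULES = ["lxml", "chardet", "splinter", "gevent"]
# phantomjs is an executable, so it is looked up on PATH instead of imported.
EXECUTABLES = ["phantomjs"]


def missing_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return modules that cannot be imported and executables not found on PATH."""
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    missing += [name for name in executables if shutil.which(name) is None]
    return missing


if __name__ == "__main__":
    absent = missing_dependencies()
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All dependencies found.")
```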

Example

  1. Use MSpider to collect vulnerability information from wooyun.org.
	python mspider.py -u "http://www.wooyun.org/bugs/" --focus-domain "wooyun.org" --filter-keyword "xxx" --focus-keyword "bugs" -t 15 --random-agent true
  2. Use MSpider to collect news from news.sina.com.cn.
	python mspider.py -u "http://news.sina.com.cn/c/2015-12-20/doc-ifxmszek7395594.shtml" --focus-domain "news.sina.com.cn"  -t 15 --random-agent true
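The examples combine --focus-domain, --focus-keyword and --filter-keyword to keep the crawl on topic. MSpider's exact matching rules are not documented here, so the following is only a sketch under stated assumptions: keywords are plain substring matches against the full URL, and a domain matches its own hostname plus any subdomain. The function `in_scope` is hypothetical.

```python
from urllib.parse import urlparse


def in_scope(url, focus_domain=None, filter_domain=None,
             focus_keyword=None, filter_keyword=None):
    """Decide whether a discovered URL should be crawled (assumed semantics)."""
    host = urlparse(url).hostname or ""

    def matches_domain(domain):
        # Assumption: "wooyun.org" matches wooyun.org and any of its subdomains.
        return host == domain or host.endswith("." + domain)

    if focus_domain and not matches_domain(focus_domain):
        return False  # outside the focus domain
    if filter_domain and matches_domain(filter_domain):
        return False  # explicitly excluded domain
    if focus_keyword and focus_keyword not in url:
        return False  # required keyword missing from the URL
    if filter_keyword and filter_keyword in url:
        return False  # excluded keyword present in the URL
    return True


# Applied to the settings of the first example (the URL is illustrative):
print(in_scope("http://www.wooyun.org/bugs/example", focus_domain="wooyun.org",
               focus_keyword="bugs", filter_keyword="xxx"))  # True
```

Matching on the parsed hostname rather than on the raw URL string avoids accepting look-alike hosts such as notwooyun.org.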

ToDo

  1. Crawling and storing information.
  2. Distributed crawling.

MSpider's help

Usage:
  __  __  _____       _     _
 |  \/  |/ ____|     (_)   | |
 | \  / | (___  _ __  _  __| | ___ _ __
 | |\/| |\___ \| '_ \| |/ _` |/ _ \ '__|
 | |  | |____) | |_) | | (_| |  __/ |
 |_|  |_|_____/| .__/|_|\__,_|\___|_|
               | |
               |_|
                        Author: Manning23


Options:
  -h, --help            show this help message and exit
  -u MSPIDER_URL, --url=MSPIDER_URL
                        Target URL (e.g. "http://www.site.com/")
  -t MSPIDER_THREADS_NUM, --threads=MSPIDER_THREADS_NUM
                        Max number of concurrent HTTP(s) requests (default 10)
  --depth=MSPIDER_DEPTH
                        Crawling depth
  --count=MSPIDER_COUNT
                        Crawling number
  --time=MSPIDER_TIME   Crawl time
  --referer=MSPIDER_REFERER
                        HTTP Referer header value
  --cookies=MSPIDER_COOKIES
                        HTTP Cookie header value
  --spider-model=MSPIDER_MODEL
                        Crawling mode: Static_Spider: 0  Dynamic_Spider: 1
                        Mixed_Spider: 2
  --spider-policy=MSPIDER_POLICY
                        Crawling strategy: Breadth-first 0  Depth-first 1
                        Random-first 2
  --focus-keyword=MSPIDER_FOCUS_KEYWORD
                        Focus keyword in URL
  --filter-keyword=MSPIDER_FILTER_KEYWORD
                        Filter keyword in URL
  --filter-domain=MSPIDER_FILTER_DOMAIN
                        Filter domain
  --focus-domain=MSPIDER_FOCUS_DOMAIN
                        Focus domain
  --random-agent=MSPIDER_AGENT
                        Use randomly selected HTTP User-Agent header value
  --print-all=MSPIDER_PRINT_ALL
                        Will show more information
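The --spider-policy values (0 breadth-first, 1 depth-first, 2 random-first) and --depth describe how the crawl frontier is ordered and bounded. The code below is not MSpider's implementation; it is a minimal sketch of those three strategies, assuming depth is counted from the start URL at 0. `Frontier`, `crawl_order` and the toy link graph are hypothetical.

```python
import random
from collections import deque

BREADTH_FIRST, DEPTH_FIRST, RANDOM_FIRST = 0, 1, 2


class Frontier:
    """Queue of (url, depth) pairs; the pop order depends on the crawling strategy."""

    def __init__(self, policy=BREADTH_FIRST, max_depth=None, seed=None):
        self.policy = policy
        self.max_depth = max_depth
        self.items = deque()
        self.seen = set()
        self.rng = random.Random(seed)

    def push(self, url, depth):
        # Skip URLs already queued and URLs beyond the depth limit.
        if url in self.seen or (self.max_depth is not None and depth > self.max_depth):
            return
        self.seen.add(url)
        self.items.append((url, depth))

    def pop(self):
        if self.policy == BREADTH_FIRST:
            return self.items.popleft()  # oldest first (FIFO)
        if self.policy == DEPTH_FIRST:
            return self.items.pop()      # newest first (LIFO)
        # Random-first: swap a random item to the end, then pop it.
        index = self.rng.randrange(len(self.items))
        self.items[index], self.items[-1] = self.items[-1], self.items[index]
        return self.items.pop()

    def __len__(self):
        return len(self.items)


def crawl_order(graph, start, policy, max_depth=None, seed=None):
    """Return the order in which pages of a toy link graph would be visited."""
    frontier = Frontier(policy, max_depth, seed)
    frontier.push(start, 0)
    order = []
    while frontier:
        url, depth = frontier.pop()
        order.append(url)
        for link in graph.get(url, []):
            frontier.push(link, depth + 1)
    return order


GRAPH = {"/": ["/a", "/b"], "/a": ["/a/1"], "/b": ["/b/1"]}
print(crawl_order(GRAPH, "/", BREADTH_FIRST))  # ['/', '/a', '/b', '/a/1', '/b/1']
print(crawl_order(GRAPH, "/", DEPTH_FIRST))    # ['/', '/b', '/b/1', '/a', '/a/1']
```

Breadth-first stays close to the start URL, depth-first follows one branch to the end before backtracking, and random-first spreads requests unpredictably across the site. A depth limit caps all three.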
