๐Ÿ•ท๏ธ SpiderSel ๐Ÿ•ท๏ธ

A Python 3 script that crawls and spiders websites for keywords via Selenium


💎 Features

SpiderSel provides the following features:

  • Crawling of HTTP and HTTPS websites for keywords via Selenium (native JS support)
  • Spidering of new URLs found within the page source (adjustable depth, stays on the same site)
  • Filtering keywords by length and removing nonsense (paths, emails, protocol handlers, etc.)
  • Storing keywords and ignored strings into a separate results directory (txt files)

Similar to CeWL or CeWLeR, but with support for websites that require JavaScript.
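
The filtering and same-site behaviour listed in the features can be sketched roughly as follows. This is a minimal illustration only, not SpiderSel's actual code: the function names `filter_keywords` and `same_site_links`, the regular expressions, and the reading of "same site" as "identical host name" are all assumptions made for this example.

```python
import re
from urllib.parse import urljoin, urlparse

# Hypothetical patterns; the real script may use different rules.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
PROTOCOL_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")  # e.g. mailto:, https:
WORD_RE = re.compile(r"^[^\W\d_]+(?:[-'][^\W\d_]+)*$")  # letters, inner - or '


def filter_keywords(tokens, min_length=4, lowercase=False, include_emails=False):
    """Split tokens into (kept, ignored) sets using simple heuristics."""
    kept, ignored = set(), set()
    for token in tokens:
        token = token.strip(".,;!?()[]{}\"'")  # drop surrounding punctuation
        if not token:
            continue
        if lowercase:
            token = token.lower()
        if EMAIL_RE.match(token):
            (kept if include_emails else ignored).add(token)
        elif "/" in token or "\\" in token or PROTOCOL_RE.match(token):
            ignored.add(token)  # paths and protocol handlers
        elif len(token) >= min_length and WORD_RE.match(token):
            kept.add(token)
        else:
            ignored.add(token)  # too short or not word-like
    return kept, ignored


def same_site_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep only links on the same host."""
    base_host = urlparse(base_url).netloc.lower()
    links = set()
    for href in hrefs:
        absolute = urljoin(base_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() == base_host:
            links.add(absolute.split("#", 1)[0])  # ignore fragments
    return links
```

Keeping the ignored strings in their own set mirrors the feature of storing keywords and ignored strings separately, so discarded tokens can still be reviewed afterwards.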

🎓 Usage

usage: spidersel.py [-h] --url URL [--depth DEPTH] [--min-length MIN_LENGTH]
                    [--lowercase] [--include-emails]

Web Crawler and Keyword Extractor

options:
  -h, --help                  show this help message and exit
  --url URL                   URL of the website to crawl
  --depth DEPTH               Depth of subpage spidering (default: 1)
  --min-length MIN_LENGTH     Minimum keyword length (default: 4)
  --lowercase                 Convert all keywords to lowercase
  --include-emails            Include emails as keywords
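
For readers who want to see how such a command line maps onto code, the options above could be declared with `argparse` roughly like this. It is a sketch reconstructed from the help text only; the helper name `build_parser` and the choice of `int` for the numeric options are assumptions, and the actual script may be organised differently.

```python
import argparse


def build_parser():
    """Build a parser matching the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="spidersel.py",
        description="Web Crawler and Keyword Extractor",
    )
    parser.add_argument("--url", required=True,
                        help="URL of the website to crawl")
    parser.add_argument("--depth", type=int, default=1,
                        help="Depth of subpage spidering (default: 1)")
    parser.add_argument("--min-length", type=int, default=4,
                        help="Minimum keyword length (default: 4)")
    parser.add_argument("--lowercase", action="store_true",
                        help="Convert all keywords to lowercase")
    parser.add_argument("--include-emails", action="store_true",
                        help="Include emails as keywords")
    return parser


if __name__ == "__main__":
    # Parse a fixed argument list so the example runs without a real CLI call.
    args = build_parser().parse_args(["--url", "https://www.apple.com", "--lowercase"])
    print(args.depth, args.min_length, args.lowercase, args.include_emails)
    # prints: 1 4 True False
```

Only `--url` is mandatory; the other options fall back to the defaults shown in the help text.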

๐Ÿณ Example 1 - Docker Run

External Dockerhub Image

docker run -v ${PWD}:/app/results --rm l4rm4nd/spidersel:latest --url https://www.apple.com --lowercase --include-emails

You will find your scan results in the current directory.

Local Docker Build Image

If you don't trust my image on Dockerhub, please go ahead and build the image yourself:

git clone https://github.com/Haxxnet/SpiderSel && cd SpiderSel
docker build -t spidersel .
docker run -v ${PWD}:/app/results --rm spidersel --url https://www.apple.com --lowercase --include-emails

๐Ÿ Example 2 - Native Python

Installation

# clone repository and change directory
git clone https://github.com/Haxxnet/SpiderSel && cd SpiderSel

# optionally install google-chrome if not available yet
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb

# install python dependencies; optionally use a virtual environment (e.g. virtualenv, pipenv, etc.)
pip3 install -r requirements.txt

Running

python3 spidersel.py --url https://www.apple.com/ --lowercase --include-emails

The extracted keywords will be stored in an output file within the results folder.
