httpscreenshot

Installation via Docker

docker pull jesseosiecki/httpscreenshot
docker run jesseosiecki/httpscreenshot

Installation on Ubuntu

Via Script

Run the install-dependencies.sh script as root.

This script has been tested on Ubuntu 20.04 as root (sudo).

Manually

apt-get install swig swig2.0 libssl-dev python-dev python-pip
pip install -r requirements.txt

If you run into the error 'module' object has no attribute 'PhantomJS', run pip install selenium (or pip install --upgrade selenium).

If installing on Kali Linux, PhantomJS might not be in the repositories. You can download it from https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.8-linux-x86_64.tar.bz2 and symlink it into /usr/bin like so:

sudo ln -s /path/to/phantomjs /usr/bin/phantomjs

README and Use Cases

HTTPScreenshot is a tool for grabbing screenshots and HTML of large numbers of websites. The goal is for it to be both thorough and fast, two aims that sometimes work against each other.

Before getting into documentation - this is what I USUALLY use for options if I want to screenshot a bunch of sites:

./httpscreenshot.py -i <gnmapFile> -p -w 40 -a -vH

Notice there are a ton of worker threads (40). This can be problematic, so I make up for failures that may have been caused by too many threads with a second run using fewer workers:

./httpscreenshot.py -i <gnmapFile> -p -w 5 -a -vH

YMMV

The options are as follows:

  -h, --help            show this help message and exit
  -l LIST, --list LIST  List of input URLs
  -i INPUT, --input INPUT
                        nmap gnmap output file
  -p, --headless        Run in headless mode (using phantomjs)
  -w WORKERS, --workers WORKERS
                        number of threads
  -t TIMEOUT, --timeout TIMEOUT
                        time to wait for pageload before killing the browser
  -v, --verbose         turn on verbose debugging
  -a, --autodetect      Automatically detect if listening services are HTTP or
                        HTTPS. Ignores NMAP service detection and URL schemes.
  -vH, --vhosts         Attempt to scrape hostnames from SSL certificates and
                        add these to the URL queue
  -dB DNS_BRUTE, --dns_brute DNS_BRUTE
                        Specify a DNS subdomain wordlist for bruteforcing on
                        wildcard SSL certs
  -r RETRIES, --retries RETRIES
                        Number of retries if a URL fails or times out
  -tG, --trygui         Try to fetch the page with Firefox when headless fails
  -sF, --smartfetch     Enables smart fetching to reduce network traffic, also
                        increases speed if certain conditions are met.
  -pX PROXY, --proxy PROXY
                        SOCKS5 Proxy in host:port format

Some of the above options have non-obvious use-cases, so the following provides some more detail:

-l, --list -> Takes as input a file with a simple list of input URLs in the format "http(s)://<URL>"

-i, --input -> Takes a gnmap file as input. This includes masscan gnmap output.
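As a rough illustration of what reading gnmap input involves, here is a minimal Python 3 sketch that pulls the host and open TCP ports out of one grepable line. It is not the parser httpscreenshot.py actually uses; the names parse_gnmap_line and PORT_ENTRY are made up for this example.

```python
import re

# One open TCP entry in the "Ports:" field looks like
# "80/open/tcp//http//" (port/state/protocol/owner/service/...).
PORT_ENTRY = re.compile(r"(\d+)/open/tcp/[^/]*/([^/]*)/")


def parse_gnmap_line(line):
    """Return (ip, [(port, service), ...]) for a gnmap host line, else None.

    Hypothetical helper for illustration only.
    """
    host = re.search(r"Host:\s+(\S+)", line)
    if host is None or "Ports:" not in line:
        return None  # comment lines and status-only lines are skipped
    ports_field = line.split("Ports:", 1)[1]
    ports = [(int(port), service) for port, service in PORT_ENTRY.findall(ports_field)]
    return host.group(1), ports


if __name__ == "__main__":
    sample = "Timestamp: 1568130528 Host: 10.10.10.51 () Ports: 80/open/tcp//http//"
    print(parse_gnmap_line(sample))
```

masscan leaves the service field empty unless it knows the port, which is one reason the -a option described below is useful with masscan output.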

-p, --headless -> I find myself using this option more and more. By default the script "drives" Firefox. As the number of threads increases this becomes really ugly, with 20 or 30 Firefox windows open at once. This option uses "phantomjs", which doesn't have a GUI but still does a decent job of parsing JavaScript.

-w, --workers -> The number of threads to use. Increase for more speed. The list of input URLs is automatically shuffled to avoid hammering IP addresses that are close to each other where possible. If you add too many threads, you might start seeing timeouts in responses; adjust for your network and machine.
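The shuffling idea can be sketched in a few lines of Python 3. This is only an illustration of the technique, not the script's own code; shuffled_targets and its seed parameter are hypothetical names.

```python
import random


def shuffled_targets(urls, seed=None):
    """Return a shuffled copy of urls; the input list is left untouched.

    Hypothetical helper. A seed is accepted only so runs can be repeated
    while debugging.
    """
    rng = random.Random(seed)
    shuffled = list(urls)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    targets = ["http://10.0.0.%d" % i for i in range(1, 6)]
    print(shuffled_targets(targets))
```

Because neighbouring addresses end up spread across the queue, a burst of workers is less likely to hit the same subnet at the same moment.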

-t, --timeout -> How long to wait for a response from the server before calling it quits.

-v, --verbose -> Will spit out some extra debugging output.

-a, --autodetect -> Without this option enabled, HTTPScreenshot will behave as follows:

If a LIST of URLs is specified as input, sites with scheme "http://" are treated as non-SSL and sites with scheme "https://" are treated as SSL-enabled.

For GNMAP input, the script will scrape the input and try to use any SSL detection performed by nmap. Unfortunately this is unreliable: nmap doesn't always tell you that something is SSL-enabled. Further, masscan doesn't do any version or service detection.

The -a or --autodetect option throws away all SSL hints from the input file and tries to detect on its own.
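One plausible way to do this kind of detection is to attempt a TLS handshake and fall back to plain HTTP when the handshake fails. The Python 3 sketch below shows that idea only; it is an assumption about the approach, not a copy of what httpscreenshot.py does, and detect_scheme is a hypothetical name.

```python
import socket
import ssl


def detect_scheme(host, port, timeout=5.0):
    """Return "https" if a TLS handshake succeeds on host:port, else "http".

    Hypothetical helper. Certificate validation is disabled on purpose,
    since the goal is only to learn whether the service speaks TLS.
    Connection errors and timeouts are not caught and reach the caller.
    """
    context = ssl.create_default_context()
    context.check_hostname = False  # must be cleared before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return "https"
    except ssl.SSLError:
        # The peer answered with something that is not TLS.
        return "http"
```

A caveat with this sketch: a TLS server that only supports protocol versions the local OpenSSL refuses would also raise ssl.SSLError and be reported as "http", and a plaintext server that silently waits for a request surfaces as a timeout instead.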

-vH, --vhosts -> Often when visiting websites by their IP address (e.g. https://192.168.1.30), we will receive a different page than expected, or an error. This is because the site is expecting a certain "virtual host" or hostname instead of the IP address. Sometimes a single HTTP server will respond with many different pages for different hostnames.

For plaintext "http" websites, we can use reverse DNS, Bing reverse IP search, etc. to try to find the hostnames associated with an IP address. This is not currently a feature in HTTPScreenshot, but may be implemented later.

For SSL-enabled "https" sites, this can be a little easier. The SSL certificate will provide us with a hint at the domain name in the CN field. In the "subject alt names" field of the certificate, when it exists, we may get a whole list of other domain names potentially associated with this IP. Often these are in the form "*.google.com" (wildcard certificate), but sometimes they will be linked to a single hostname only, like "www.google.com".

The -vH or --vhosts flag will, for each SSL-enabled website, extract the hostnames from the CN and subject alt names fields and add them to the list of URLs to be screenshotted. For wildcard certificates, the "*." part of the name is dropped.
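The name handling described above can be sketched as a small Python 3 function. It assumes the CN and subject alt names have already been read from the certificate; hostnames_from_cert_names is a hypothetical name, and this is not the tool's actual code.

```python
def hostnames_from_cert_names(common_name, subject_alt_names):
    """Collect candidate hostnames from a certificate's CN and SAN entries.

    Hypothetical helper: a leading "*." is dropped from wildcard names,
    and duplicates are removed while keeping first-seen order.
    """
    hostnames = []
    for name in [common_name, *subject_alt_names]:
        if not name:
            continue
        name = name.strip().lower()
        if name.startswith("*."):
            name = name[2:]  # "*.google.com" -> "google.com"
        if name and name not in hostnames:
            hostnames.append(name)
    return hostnames


if __name__ == "__main__":
    print(hostnames_from_cert_names("*.google.com", ["*.google.com", "www.google.com"]))
```

Each returned name would then be queued as an extra URL for the same IP address and port.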

-dB, --dns_brute -> Must be used with -vH for it to make sense. This flag specifies a file containing a list of potential subdomains. For any wildcard certificate, e.g. "*.google.com", HTTPScreenshot will try to bruteforce valid subdomains and add them to the list of URLs to be screenshotted.
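A minimal Python 3 sketch of the bruteforce step, assuming a candidate counts as valid when it resolves in DNS. The function name brute_subdomains and the injectable resolve parameter are inventions for this example, not part of httpscreenshot.

```python
import socket


def brute_subdomains(wildcard_name, words, resolve=socket.gethostbyname):
    """Return the word.base names that resolve, for a "*.base" wildcard name.

    Hypothetical helper. resolve can be swapped for a stub in tests;
    the default performs a real DNS lookup.
    """
    if not wildcard_name.startswith("*."):
        return []
    base = wildcard_name[2:]
    found = []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank lines from the wordlist
        candidate = word + "." + base
        try:
            resolve(candidate)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        found.append(candidate)
    return found
```

Note that domains with wildcard DNS records resolve every candidate, so a real run would probably want an extra check for that case.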

-r, --retries -> Sometimes Firefox or PhantomJS times out when fetching a page. This could be due to a number of factors: too many threads running at once, a network hiccup, etc. This specifies the number of times to "retry" a given host when it fails.
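The retry behaviour can be illustrated with a generic Python 3 wrapper. This is a sketch of the general pattern under the assumption that a failed fetch raises an exception; fetch_with_retries and its fetch argument are hypothetical, and the real script may handle retries differently (for example by re-queueing the URL).

```python
def fetch_with_retries(fetch, url, retries=0):
    """Call fetch(url), retrying up to `retries` extra times on failure.

    Hypothetical helper: the last exception is re-raised once every
    attempt has failed.
    """
    last_error = None
    for _attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # broad on purpose: any failure retries
            last_error = error
    raise last_error
```

With retries=2 a URL gets at most three attempts in total.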

-tG, --trygui -> Upon failure to fetch with the headless browser PhantomJS, will pop open Firefox and try again.

-sF, --smartfetch -> Enables smart fetching to reduce network traffic, also increases speed if certain conditions are met.

httpscreenshot's People

Contributors

breenmachine, hlein, jesse-osiecki, jessekrembs, jstnkndy, khasmek, libraryax, phin3has

httpscreenshot's Issues

Kali and HttpScreenshot FYI.

More of an FYI than an issue. It appears that PhantomJS is not part of the Kali Linux repositories. I'm working to get this corrected, as it seems like Kali and HTTPScreenshot would be handy together.

Put screenshots in a directory

How do you feel about putting all the screenshots in a directory rather than just putting them in the folder that the script was run from? Seems like this would make for a whole lot less clutter. Unrelated: have you had phantomjs work in Kali? I'm updated to the latest selenium and I followed the instructions for symlinking phantomjs to bin, but I still get:

"Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.phantomjs.service.Service object at 0x209afd0>> ignored
Message: Unable to start phantomjs with ghostdriver."

httpscreenshot not working

I edited masshttp.sh to point to masscan and to httpscreenshot.py.
When I first ran it, the webdriver_prefs.json file was not found in the usr directory.
I copied the file into that directory and ran it again, and then there was a geckodriver error.
My geckodriver version was 0.26, so I removed it, replaced it with geckodriver 0.30, and ran it again.
I still got an error like this:

./masshttp.sh
Starting masscan 1.3.2 (http://bit.ly/14GZzcT) at 2022-12-10 11:56:15 GMT
Initiating SYN Stealth Scan
Scanning 256 hosts [2 ports/host]
mkdir: cannot create directory ‘httpscreenshots’: File exists
Nmap version detection not used! Discovery module may miss some hosts!
Message: Process unexpectedly closed with status 1

Message: Process unexpectedly closed with status 1

I need help; please tell me if I have done something wrong.

Missing dependency geckodriver

Installed httpscreenshot on 12/5/16 from the repo onto Kali rolling, fully updated, and received an error: Message: 'geckodriver' executable must be in PATH.

All listed dependencies were met. Manually downloaded geckodriver 0.11.1 from: https://github.com/mozilla/geckodriver/releases

Steps to fix:

mkdir /root/tools/gecko
cd /root/tools/gecko
wget https://github.com/mozilla/geckodriver/releases/download/v0.11.1/geckodriver-v0.11.1-linux64.tar.gz

tar -xzvf geckodriver-v0.11.1-linux64.tar.gz

ln -s /root/tools/gecko/geckodriver /usr/bin

Workflow Script.

Hey Guys

When you gave your talk at ShmooCon, you showed off a little script that helps automate the whole process (scan, create directories, screenshot, cluster?). Will that be available soon? Seems mighty handy.

The clusters.html not showing the results of scanning httpscreenshot

After running a scan, I open clusters.html to see the results, but the web page is blank; it just says WEB catalog Application catalog.

While the scan is running, the virtual machine freezes, so I have to power it off before I can try to view the results. I am using Kali Linux 2.0.

If anyone knows any solution I would greatly appreciate it.

Regards.

Dependency install issues

Reading package lists... Done
Building dependency tree
Reading state information... Done
Package xvfb is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Unable to locate package swig
E: Unable to locate package swig3.0
E: Couldn't find any package by glob 'swig3.0'
E: Couldn't find any package by regex 'swig3.0'
E: Unable to locate package libssl-dev
E: Unable to locate package libjpeg-dev
E: Package 'xvfb' has no installation candidate

I get this when I run ./install-dependencies.sh.
Any clue what could be causing those packages not to be found? I can't seem to apt-get them or pip them either...
Thanks!

Dependency install issue

Hi,
I tried to install it on Kali 2021.1 but got some issues.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package firefox is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Unable to locate package swig3.0
E: Couldn't find any package by glob 'swig3.0'
E: Couldn't find any package by regex 'swig3.0'
E: Package 'firefox' has no installation candidate
E: Unable to locate package firefox-geckodriver

And when I tried to install swig, I got version 4.0.
Any ideas?

Thanks!

Does not work on OSX qsize(): not supported

On OSX, Queue.qsize() breaks for multiprocessing queues: it raises NotImplementedError because sem_getvalue() is not implemented on that platform. There are two calls to this function in httpscreenshot.

Locally I was able to get httpscreenshot to run on OSX by changing these to use the .empty() method, but this results in not getting any stdout feedback on how many items are left in the queue, unless you hack in some external global counter. But then you have to contend with thread-safe (and multi-process-safe) access to that, so I'm not sure it's worth a pull request. Hopefully someone will update the OSX libs to implement qsize(), but in the meantime, if anyone wants to run this on OSX, here is what worked for me:

@@ -194,12 +194,12 @@ def worker(urlQueue, tout, debug, headless, doProfile, vhosts, subs, extraHosts,

        while True:
                #Try to get a URL from the Queue
-               if urlQueue.qsize() > 0:
+               if not urlQueue.empty():
                        try:                    
                                curUrl = urlQueue.get(timeout=tout)
                        except Queue.Empty:
                                continue
-                       print '[+] '+str(urlQueue.qsize())+' URLs remaining'
+                       print '[+] more URLs remaining'
                        screenshotName = quote(curUrl[0], safe='')
                        if(debug):
                                print '[+] Got URL: '+curUrl[0]
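For what it's worth, the "external global counter" idea could look roughly like the Python 3 sketch below, which pairs a multiprocessing.Queue with a multiprocessing.Value guarded by its own lock. This is an untested-against-httpscreenshot illustration, not a patch; the CountingQueue class and its method names are made up, and the diff above is Python 2 while this is Python 3.

```python
import multiprocessing


class CountingQueue:
    """A multiprocessing queue that tracks its own length.

    Hypothetical wrapper: avoids Queue.qsize(), which is not supported
    on OSX, by keeping a shared integer next to the queue.
    """

    def __init__(self):
        self._queue = multiprocessing.Queue()
        self._count = multiprocessing.Value("i", 0)  # shared signed int

    def put(self, item):
        with self._count.get_lock():
            self._count.value += 1
        self._queue.put(item)

    def get(self, timeout=None):
        # Raises queue.Empty on timeout, like multiprocessing.Queue.get
        item = self._queue.get(timeout=timeout)
        with self._count.get_lock():
            self._count.value -= 1
        return item

    def remaining(self):
        return self._count.value
```

The counter is bumped before the item is queued and lowered after it is taken out, so remaining() can briefly read one too high; that should be fine for a progress message but it is not an exact replacement for qsize().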

request: nmap XML for input

Just a request to add support for xml input in addition to gnmap. All the other various importers take in xml.

install-dependencies.sh fails on kali linux 2016.1 rolling swig2.0 is not available

cd /opt/httpscreenshot/ && ./install-dependencies.sh
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package swig2.0 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
swig:i386 swig

E: Package 'swig2.0' has no installation candidate

Recommended approach please?

headless mode?

I am running httpscreenshot with the headless (-p) option enabled, but still getting the following warning:

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead

Will httpscreenshot.py be updated to do this by default, or is this user error? If user error, what needs to be changed? Thanks!

nmap version error when running the script

root@kali:~/playbook2/httpscreenshot# cat masshttp.sh
#!/bin/sh
masscan -p80,443 -iL networks.txt -oG http.gnmap --rate 5000 -e tun0
cd httpscreenshots
python ~/playbook2/httpscreenshot/httpscreenshot.py -i ../http.gnmap -p -t 30 -w 50 -a -vH -r 1
python ~/playbook2/httpscreenshot/httpscreenshot.py -i ../http.gnmap -p -t 10 -w 10 -a -vH
cd ..
python screenshotClustering/cluster.py -d screenshots/

root@kali:~/playbook2/httpscreenshot# ./masshttp.sh
Starting masscan 1.0.6 (http://bit.ly/14GZzcT) at 2019-09-10 15:48:48 GMT
-- forced options: -sS -Pn -n --randomize-hosts -v --send-eth
Initiating SYN Stealth Scan
Scanning 1 hosts [2 ports/host]
Nmap version detection not used! Discovery module may miss some hosts!
[Errno 8] Exec format error
[Errno 8] Exec format error
[Errno 8] Exec format error
[Errno 8] Exec format error
[Errno 8] Exec format error

root@kali:~/playbook2/httpscreenshot# cat http.gnmap
# Masscan 1.0.6 scan initiated Tue Sep 10 15:48:48 2019
# Ports scanned: TCP(1;80-80,) UDP(0;) SCTP(0;) PROTOCOLS(0;)
Timestamp: 1568130528 Host: 10.10.10.51 () Ports: 80/open/tcp//http//
# Masscan done at Tue Sep 10 15:48:59 2019

Firefox mode blockage

setupBrowserProfile() seems to have a blockage at browser = webdriver.Firefox(fp).
When running in non-headless mode, it just stops at this point, it opens the browser instance but never returns.

Running on Ubuntu 14.04, python 2.7.6.

I'm looking into it, but I figured I'd let you know.

OSError: [Errno 9] Bad file descriptor

I am using httpscreenshot to see if sites are up and to understand what they are hosting. This list has 675 domains and I am using the --headless flag. The list of domains consists of sites that may or may not be hosting anything. Testing against 50 sites known to be hosting content, the script works pretty well. When testing against a list of sites that might not be hosting anything, it fails with the following:

[-] Something bad happened with URL:
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "../httpscreenshot.py", line 370, in worker
browser.quit()
File "/home//.local/lib/python2.7/site-packages/selenium/webdriver/phantomjs/webdriver.py", line 80, in quit
self.service.stop()
File "/home//.local/lib/python2.7/site-packages/selenium/webdriver/common/service.py", line 151, in stop
self.send_remote_shutdown_command()
File "/home//.local/lib/python2.7/site-packages/selenium/webdriver/phantomjs/service.py", line 67, in send_remote_shutdown_command
os.close(self._cookie_temp_file_handle)
OSError: [Errno 9] Bad file descriptor

I've sought help on this error but can't get to a conclusion as to why this does not work. I'm trying to understand sites available to conduct further analysis on those hosting. Any help would be much appreciated.

Thanks.

Issue when running

How or where to fix:
/usr/local/lib/python2.7/dist-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '

Add an output directory option for storing png/html files

Hello, first of all, thanks for your tool!

Can you add this feature in a future release of your tool:

Add an option to redirect all .html / .png files to a folder. Because when you have more than 10 targets, it's a big mess in the "." folder ^^

Best regards,

Missing some sites

Unsure why at the moment exactly. Leaving this issue for tracking purposes. Seems to be related to the merge of the smart-fetch feature.

Missing libjpeg-dev

Missing requirement in the install script. I had to specifically run apt-get install libjpeg-dev when Pillow was failing to install.

Target Operating Systems

@breenmachine this is just a question I wanted to get out of the way before I do some more work.

What are the Operating Systems/version httpscreenshot should aim to work with and be set up on?

I was thinking -
Ubuntu 14.04
Kali
Mac OS X

Kali selenium

Running does not work on Kali:
httpscreenshot/httpscreenshot.py", line 10, in <module>
from selenium import webdriver
ImportError: No module named selenium

There is no selenium or python-selenium package available via apt-get install.
