breenmachine / httpscreenshot

httpscreenshot

Installation via Docker

docker pull jesseosiecki/httpscreenshot
docker run jesseosiecki/httpscreenshot

Installation on Ubuntu

Via Script

Run the install-dependencies.sh script as root.

This script has been tested on Ubuntu 20.04 as root (sudo).

Manually

apt-get install swig swig2.0 libssl-dev python-dev python-pip
pip install -r requirements.txt

If you run into the error 'module' object has no attribute 'PhantomJS', run pip install selenium (or pip install --upgrade selenium).

If installing on Kali Linux, PhantomJS might not be in the repositories. In that case, download it from https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.8-linux-x86_64.tar.bz2, extract it, and symlink the binary into /usr/bin like so:

sudo ln -s /path/to/phantomjs /usr/bin/phantomjs

README and Use Cases

HTTPScreenshot is a tool for grabbing screenshots and HTML of large numbers of websites. The goal is for it to be both thorough and fast, two aims that sometimes conflict.

Before getting into documentation - this is what I USUALLY use for options if I want to screenshot a bunch of sites:

./httpscreenshot.py -i <gnmapFile> -p -w 40 -a -vH

Notice there are a ton of worker threads (40). This can be problematic, so I make up for failures that may have been caused by too many threads with a second run using fewer workers:

./httpscreenshot.py -i <gnmapFile> -p -w 5 -a -vH

YMMV

The options are as follows:

  -h, --help            show this help message and exit
  -l LIST, --list LIST  List of input URLs
  -i INPUT, --input INPUT
                        nmap gnmap output file
  -p, --headless        Run in headless mode (using phantomjs)
  -w WORKERS, --workers WORKERS
                        number of threads
  -t TIMEOUT, --timeout TIMEOUT
                        time to wait for pageload before killing the browser
  -v, --verbose         turn on verbose debugging
  -a, --autodetect      Automatically detect if listening services are HTTP or
                        HTTPS. Ignores NMAP service detection and URL schemes.
  -vH, --vhosts         Attempt to scrape hostnames from SSL certificates and
                        add these to the URL queue
  -dB DNS_BRUTE, --dns_brute DNS_BRUTE
                        Specify a DNS subdomain wordlist for bruteforcing on
                        wildcard SSL certs
  -r RETRIES, --retries RETRIES
                        Number of retries if a URL fails or times out
  -tG, --trygui         Try to fetch the page with Firefox when headless fails
  -sF, --smartfetch     Enables smart fetching to reduce network traffic, also
                        increases speed if certain conditions are met.
  -pX PROXY, --proxy PROXY
                        SOCKS5 Proxy in host:port format

Some of the above options have non-obvious use-cases, so the following provides some more detail:

-l, --list -> Takes as input a file with a simple list of input URLs in the format "http(s)://<URL>"

-i, --input -> Takes a gnmap file as input. This includes masscan gnmap output.
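As a rough illustration of what reading gnmap input involves, here is a minimal Python 3 sketch that pulls open TCP ports out of one line of greppable output. It is not the parser httpscreenshot.py actually uses; the name parse_gnmap_line and the regular expression are assumptions made for this example.

```python
import re

# Matches one "port/open/tcp//service/" entry inside the "Ports:" field.
# The service name may be empty, as it is in masscan output.
PORT_ENTRY = re.compile(r"(\d+)/open/tcp//([^/]*)/")


def parse_gnmap_line(line):
    """Return a list of (host, port, service) tuples for open TCP ports."""
    if "Host:" not in line or "Ports:" not in line:
        return []  # comment lines and status-only lines carry no ports
    host = line.split("Host:")[1].split()[0]
    ports_field = line.split("Ports:")[1]
    return [(host, int(port), service)
            for port, service in PORT_ENTRY.findall(ports_field)]
```

The service column is only a hint (and masscan leaves it empty), which is why the -a option described below exists.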

-p, --headless -> I find myself using this option more and more. By default the script "drives" Firefox. As the number of threads increases this becomes really ugly, with 20 or 30 Firefox windows open at once. This option uses "phantomjs", which has no GUI but still does a decent job of parsing JavaScript.

-w, --workers -> The number of threads to use. Increase for more speed. The list of input URLs is automatically shuffled so that, where possible, IP addresses that are close to each other are not hit back to back. If you add too many threads, you might start seeing timeouts in responses; adjust for your network and machine.
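The shuffle-then-fan-out idea can be sketched in a few lines of Python 3. This is only an illustration: the function name run_shuffled, the fetch callback and the seed parameter are hypothetical, and a thread pool is used here for brevity rather than whatever worker model the tool itself uses.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_shuffled(urls, fetch, workers=5, seed=None):
    """Shuffle urls, then call fetch(url) on each using a pool of workers.

    Returns a dict mapping each url to the result of fetch(url).
    """
    queue = list(urls)
    # Shuffling spreads neighbouring addresses across the run.
    random.Random(seed).shuffle(queue)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch, queue))
    return dict(zip(queue, results))
```

For example, run_shuffled(urls, take_screenshot, workers=40) would mirror the first run shown earlier, where take_screenshot stands in for whatever function grabs a single page.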

-t TIMEOUT, --timeout -> How long to wait for a response from the server before calling it quits

-v, --verbose -> Will spit out some extra debugging output.

-a, --autodetect -> Without this option enabled, HTTPScreenshot will behave as follows:

If a LIST of urls is specified as input, sites with scheme "http://" are treated as non-ssl and sites with scheme "https://" are treated as ssl-enabled

For GNMAP input the script will scrape the input and try to use any SSL detection performed by nmap. Unfortunately this is unreliable: nmap does not always report that a service is SSL-enabled. Further, masscan doesn't do any version or service detection.

The -a or --autodetect option throws away all SSL hints from the input file and tries to detect on its own.
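One way such detection could work is to attempt a TLS handshake and see whether it succeeds. The Python 3 sketch below shows that idea under stated assumptions; detect_scheme is a hypothetical name and this is not necessarily how httpscreenshot.py implements -a.

```python
import socket
import ssl


def detect_scheme(host, port, timeout=5.0):
    """Return 'https' if a TLS handshake succeeds on host:port, else 'http'."""
    context = ssl.create_default_context()
    # Only test whether TLS is spoken, not whether the certificate is valid.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "https"
    except (ssl.SSLError, OSError):
        # Simplification: any failure, including a closed port, maps to 'http'.
        return "http"
```

A fuller version would probably distinguish "plain HTTP" from "nothing listening" instead of folding both into the same answer.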

-vH, --vhosts -> Often when visiting websites by their IP address (e.g. https://192.168.1.30), we receive a different page than expected, or an error. This is because the site expects a certain "virtual host" or hostname instead of the IP address. Sometimes a single HTTP server will respond with many different pages for different hostnames.

For plaintext "http" websites, we can use reverse DNS, Bing reverse IP search, and similar techniques to try to find the hostnames associated with an IP address. This is not currently a feature in HTTPScreenshot, but may be implemented later.

For SSL-enabled "https" sites, this can be a little easier. The SSL certificate gives us a hint at the domain name in the CN field. In the "subject alt names" field of the certificate, when it exists, we may get a whole list of other domain names potentially associated with this IP. Often these are in the form "*.google.com" (a wildcard certificate), but sometimes they are tied to a single hostname only, like "www.google.com".

The -vH or --vhosts flag will, for each SSL-enabled website, extract the hostnames from the CN and subject alt names fields and add them to the list of URLs to be screenshotted. For wildcard certificates, the "*." part of the name is dropped.
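The extraction step described above can be sketched as follows in Python 3. The function name hostnames_from_cert is hypothetical, and the input is assumed to be a dict shaped like the one returned by ssl.SSLSocket.getpeercert(); the tool itself may read certificates differently.

```python
def hostnames_from_cert(cert):
    """Collect hostnames from the CN and subjectAltName of a certificate dict.

    Wildcard names lose their leading "*." and duplicates are dropped,
    keeping the order in which names were first seen.
    """
    names = []
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS":
            names.append(value)

    cleaned = []
    for name in names:
        if name.startswith("*."):
            name = name[2:]  # "*.google.com" becomes "google.com"
        if name not in cleaned:
            cleaned.append(name)
    return cleaned
```

Note that getpeercert() returns an empty dict when the certificate was not validated, so a scanner that disables verification would need to fetch the certificate in binary form and decode it separately.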

-dB, --dns_brute -> Must be used with -vH for it to make sense. This flag specifies a file containing a list of potential subdomains. For any wildcard certificate, e.g. "*.google.com", HTTPScreenshot will try to bruteforce valid subdomains and add them to the list of URLs to be screenshotted.
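A minimal Python 3 sketch of the bruteforce idea: combine each word from the list with the base of the wildcard name and keep the candidates that resolve. The name brute_subdomains and the injectable resolve parameter are assumptions for illustration, not the tool's own code.

```python
import socket


def brute_subdomains(wildcard, words, resolve=socket.gethostbyname):
    """Return the candidate hosts under a wildcard name that resolve."""
    if not wildcard.startswith("*."):
        return []
    base = wildcard[2:]
    found = []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank lines in the wordlist
        candidate = f"{word}.{base}"
        try:
            resolve(candidate)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        found.append(candidate)
    return found
```

With a wordlist file this might be called as brute_subdomains("*.google.com", open("subdomains.txt")), where subdomains.txt is a placeholder file name.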

-r, --retries -> Sometimes Firefox or PhantomJS times out when fetching a page. This could be due to a number of factors: too many threads running at once, a network hiccup, and so on. This option specifies the number of times to "retry" a given host when it fails.
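The retry behaviour amounts to a small loop around the fetch. The Python 3 sketch below assumes a hypothetical fetch callable that raises an exception on failure; fetch_with_retries is an illustrative name, not a function from httpscreenshot.py.

```python
def fetch_with_retries(fetch, url, retries=1):
    """Call fetch(url); on failure, try again up to `retries` more times.

    Returns the first successful result, or re-raises the last error
    once every attempt has failed.
    """
    last_error = None
    for _attempt in range(max(0, retries) + 1):
        try:
            return fetch(url)
        except Exception as exc:  # timeouts, browser crashes, and so on
            last_error = exc
    raise last_error
```

So retries=1 means at most two attempts in total: the original one plus a single retry.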

-tG, --trygui -> Upon failure to fetch with the headless browser PhantomJS, will pop open Firefox and try again.

-sF, --smartfetch -> Enables smart fetching to reduce network traffic, also increases speed if certain conditions are met.

httpscreenshot's Issues

Kali and HttpScreenshot FYI.

More of an FYI than an issue. It appears that PhantomJS is not part of the Kali Linux repositories. I'm working to get this corrected, as it seems like Kali and HTTPScreenshot would be handy together.

Put screenshots in a directory

How do you feel about putting all the screenshots in a directory rather than just putting them in the folder that the script was run from? Seems like this would make for a whole lot less clutter. Unrelated: have you had phantomjs work in Kali? I've updated to the latest selenium and followed the instructions for symlinking phantomjs to bin, but I still get:

"Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.phantomjs.service.Service object at 0x209afd0>> ignored
Message: Unable to start phantomjs with ghostdriver."

httpscreenshot not working

I edited masshttp.sh to point to masscan and to httpscreenshot.py.
On the first run, the webdriver_prefs.json file was not found in the usr directory.
I uploaded it to that directory and ran the script again, and then got a geckodriver error.
My geckodriver version was 0.26, so I removed it, replaced it with geckodriver 0.30, and ran the script again.
I then got the following error:

./masshttp.sh
Starting masscan 1.3.2 (http://bit.ly/14GZzcT) at 2022-12-10 11:56:15 GMT
Initiating SYN Stealth Scan
Scanning 256 hosts [2 ports/host]
mkdir: cannot create directory ‘httpscreenshots’: File exists
Nmap version detection not used! Discovery module may miss some hosts!
Message: Process unexpectedly closed with status 1

Message: Process unexpectedly closed with status 1

I need help. Please tell me if I have done something wrong.

Missing dependency geckodriver

Installed httpscreenshot on 12/5/16 from the repo onto Kali rolling, fully updated, and received an error: Message: 'geckodriver' executable must be in PATH.

All listed dependencies were met. Manually downloaded geckodriver 0.11.1 from: https://github.com/mozilla/geckodriver/releases

Steps to fix:

mkdir -p /root/tools/gecko
cd /root/tools/gecko
wget https://github.com/mozilla/geckodriver/releases/download/v0.11.1/geckodriver-v0.11.1-linux64.tar.gz

# Extract in one step. Running gunzip first renames the archive to .tar,
# so a following tar command on the .tar.gz name would fail.
tar -xzvf geckodriver-v0.11.1-linux64.tar.gz

ln -s /root/tools/gecko/geckodriver /usr/bin/geckodriver
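Before a long run it can help to confirm that the browser drivers are actually on PATH. The following Python 3 sketch does that with the standard library; missing_executables is a hypothetical helper, and the list of names to check is an assumption based on the tools mentioned in this document.

```python
import shutil


def missing_executables(names):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables(["geckodriver", "phantomjs", "firefox"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All expected executables were found.")
```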

Workflow Script.

Hey Guys

When you did your talk at Shmoocon you showed off a little script that helps automate the whole process (scan, create directories, screenshot, cluster?). Will that be available soon? Seems mighty handy.

The clusters.html not showing the results of scanning httpscreenshot

After running a scan I opened clusters.html to see the results, but the web page is blank; it just says "WEB catalog Application catalog".

While scanning, the virtual machine freezes. I have to power it off and then try to view the results. I am using Kali Linux 2.0.

If anyone knows any solution I would greatly appreciate it.

Regards.

Dependency install issues

Reading package lists... Done
Building dependency tree
Reading state information... Done
Package xvfb is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Unable to locate package swig
E: Unable to locate package swig3.0
E: Couldn't find any package by glob 'swig3.0'
E: Couldn't find any package by regex 'swig3.0'
E: Unable to locate package libssl-dev
E: Unable to locate package libjpeg-dev
E: Package 'xvfb' has no installation candidate

I get this when I run ./install-dependencies.sh
Any clue what could be causing apt to fail to locate those packages? I seem to be unable to apt-get them or pip them too...
Thanks!

Dependency install issue

Hi,
I tried to install it on Kali 2021.1 but ran into some issues.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package firefox is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Unable to locate package swig3.0
E: Couldn't find any package by glob 'swig3.0'
E: Couldn't find any package by regex 'swig3.0'
E: Package 'firefox' has no installation candidate
E: Unable to locate package firefox-geckodriver

And when I tried to install swig, I got version 4.0.
Any ideas?

Thanks!

Does not work on OSX qsize(): not supported

On OSX, Queue.qsize() breaks for multiprocessing queues. There are two calls to this function in httpscreenshot.

Locally I was able to get httpscreenshot to run on OSX by changing these to use the .empty() method, but this means you get no stdout feedback on how many items are left in the queue, unless you hack in some external global counter. Then you have to contend with thread-safe (and multi-process-safe) access to that counter, so I'm not sure it's worth a pull request. Hopefully someone will update the OSX libs to implement qsize(), but in the meantime, if anyone wants to run this on OSX, here is what worked for me:

@@ -194,12 +194,12 @@ def worker(urlQueue, tout, debug, headless, doProfile, vhosts, subs, extraHosts,

        while True:
                #Try to get a URL from the Queue
-               if urlQueue.qsize() > 0:
+               if not urlQueue.empty():
                        try:                    
                                curUrl = urlQueue.get(timeout=tout)
                        except Queue.Empty:
                                continue
-                       print '[+] '+str(urlQueue.qsize())+' URLs remaining'
+                       print '[+] more URLs remaining'
                        screenshotName = quote(curUrl[0], safe='')
                        if(debug):
                                print '[+] Got URL: '+curUrl[0]
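For anyone who does want the remaining-count output back, the "external counter" mentioned above could look roughly like the Python 3 sketch below. It is an untested-on-macOS illustration, not a patch for httpscreenshot.py (whose code in the diff is Python 2); the class name CountedQueue is hypothetical. The counter is a multiprocessing.Value guarded by its own lock, so it can be shared between worker processes.

```python
import multiprocessing


class CountedQueue:
    """multiprocessing.Queue wrapper that tracks its own size.

    Useful on platforms where Queue.qsize() raises NotImplementedError.
    The count is approximate while other processes are mid-operation.
    """

    def __init__(self):
        self._queue = multiprocessing.Queue()
        self._size = multiprocessing.Value("i", 0)

    def put(self, item):
        # Count first so a consumer never drives the counter below zero.
        with self._size.get_lock():
            self._size.value += 1
        self._queue.put(item)

    def get(self, timeout=None):
        # Raises queue.Empty on timeout, like multiprocessing.Queue.get.
        item = self._queue.get(timeout=timeout)
        with self._size.get_lock():
            self._size.value -= 1
        return item

    def qsize(self):
        return self._size.value
```

A worker could then print str(urlQueue.qsize()) + ' URLs remaining' as before, with urlQueue created as CountedQueue() instead of a plain queue.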

request: nmap XML for input

Just a request to add support for xml input in addition to gnmap. All the other various importers take in xml.

install-dependencies.sh fails on kali linux 2016.1 rolling swig2.0 is not available

cd /opt/httpscreenshot/ && ./install-dependencies.sh
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package swig2.0 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
swig:i386 swig

E: Package 'swig2.0' has no installation candidate

Recommended approach please?

headless mode?

I am running httpscreenshot with the headless (-p) option enabled, but I still get the following warning:

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
Will httpscreenshot.py be updated to do this by default, or is this user error? If it is user error, what needs to be changed? Thanks!

nmap version error when running the script

root@kali:~/playbook2/httpscreenshot# cat masshttp.sh
#!/bin/sh
masscan -p80,443 -iL networks.txt -oG http.gnmap --rate 5000 -e tun0
cd httpscreenshots
python ~/playbook2/httpscreenshot/httpscreenshot.py -i ../http.gnmap -p -t 30 -w 50 -a -vH -r 1
python ~/playbook2/httpscreenshot/httpscreenshot.py -i ../http.gnmap -p -t 10 -w 10 -a -vH
cd ..
python screenshotClustering/cluster.py -d screenshots/

root@kali:~/playbook2/httpscreenshot# ./masshttp.sh
Starting masscan 1.0.6 (http://bit.ly/14GZzcT) at 2019-09-10 15:48:48 GMT
-- forced options: -sS -Pn -n --randomize-hosts -v --send-eth
Initiating SYN Stealth Scan
Scanning 1 hosts [2 ports/host]
Nmap version detection not used! Discovery module may miss some hosts!
[Errno 8] Exec format error
[Errno 8] Exec format error
[Errno 8] Exec format error
[Errno 8] Exec format error
[Errno 8] Exec format error

root@kali:~/playbook2/httpscreenshot# cat http.gnmap
# Masscan 1.0.6 scan initiated Tue Sep 10 15:48:48 2019
# Ports scanned: TCP(1;80-80,) UDP(0;) SCTP(0;) PROTOCOLS(0;)
Timestamp: 1568130528 Host: 10.10.10.51 () Ports: 80/open/tcp//http//
# Masscan done at Tue Sep 10 15:48:59 2019

Feature Request : SOCKS Proxy

Have you thought about implementing a SOCKS proxy? Would be beneficial when working with remote sites.

Firefox mode blockage

setupBrowserProfile() seems to have a blockage at browser = webdriver.Firefox(fp).
When running in non-headless mode, it just stops at this point, it opens the browser instance but never returns.

Running on Ubuntu 14.04, python 2.7.6.

I'm looking into it, but I figured I'd let you know.

OSError: [Errno 9] Bad file descriptor

I am using httpscreenshot to check whether sites are up and to understand what they are hosting. The list has 675 domains and I am using the --headless flag. The list consists of sites that may or may not be hosting anything. When tested against 50 sites known to be hosting content, the script works pretty well. When tested against a list of sites that might not be hosting anything, it fails with the following:

[-] Something bad happened with URL:
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "../httpscreenshot.py", line 370, in worker
browser.quit()
File "/home//.local/lib/python2.7/site-packages/selenium/webdriver/phantomjs/webdriver.py", line 80, in quit
self.service.stop()
File "/home//.local/lib/python2.7/site-packages/selenium/webdriver/common/service.py", line 151, in stop
self.send_remote_shutdown_command()
File "/home//.local/lib/python2.7/site-packages/selenium/webdriver/phantomjs/service.py", line 67, in send_remote_shutdown_command
os.close(self._cookie_temp_file_handle)
OSError: [Errno 9] Bad file descriptor

I've sought help on this error but can't reach a conclusion as to why this does not work. I'm trying to find out which sites are available so I can conduct further analysis on those that are hosting content. Any help would be much appreciated.

Thanks.

Close project - what's the use of the specific font ?

Hello @breenmachine,

Since I'm also developing a similar (yet older) project of httpscreenshot, named webscreenshot (so much creativity though :p) also based on phantomjs, I'd like to know why you are using specific font LiberationSerif-BoldItalic.ttf: is it because of a particular bug (that I should be aware of) ?

Cheers.

Issue when running

How or where to fix:
/usr/local/lib/python2.7/dist-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '

Add an output directory option for storing png/html files

Hello, first of all, thanks for your tool!

Could you add this feature in a future release of your tool:

Add an option to redirect all .html / .png files to a folder. When you have more than 10 targets, it's a big mess in the "." folder ^^

Best regards,

Stuck at '[-] Something bad happened with URL: http://xx.xx.xx.xx:80'

I have a problem where I get stuck at [-] Something bad happened with URL: http://xx.xx.xx.xx:80. Most of the time it simply moves along, but sometimes it gets stuck and I have to cancel out, remove the affected IP from http.gnmap and start all over. This gets quite frustrating when you have to do it every 5-10 minutes.

I was thinking -
Ubuntu 14.04
Kali
Mac OS X

Kali selenium

Running it does not work on Kali:
httpscreenshot/httpscreenshot.py", line 10, in <module>
from selenium import webdriver
ImportError: No module named selenium

Neither apt-get install selenium nor apt-get install python-selenium works.