urlextractor's Introduction

URLextractor

Information gathering & website reconnaissance

Usage: ./extractor http://www.hackthissite.org/

Tips:

  • Colorex: adds color to the output. Install it with pip install colorex and use it like ./extractor http://www.hackthissite.org/ | colorex -g "INFO" -r "ALERT"
  • Tldextract: used by the dnsenumeration function. Install it with pip install tldextract

Features:

  • IP and hosting info like city and country (using FreegeoIP)
  • DNS servers (using dig)
  • ASN, Network range, ISP name (using RISwhois)
  • Load balancer test
  • Whois for abuse mail (using Spamcop)
  • PAC (Proxy Auto Configuration) file
  • Compares hashes to diff code
  • robots.txt (recursively looking for hidden stuff)
  • Source code (looking for passwords and users)
  • External links (frames from other websites)
  • Directory FUZZ (like Dirbuster and Wfuzz, using Dirbuster's directory list)
  • URLvoid API - checks Google page rank, Alexa rank and possible blacklists
  • Provides useful links at other websites to correlate with IP/ASN
  • Option to open ALL results in browser at the end
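
To illustrate the robots.txt feature, here is a minimal sketch of how the paths named in Allow/Disallow rules might be pulled out of a robots.txt body and de-duplicated. The function name robots_paths is hypothetical and not taken from the script; it reads robots.txt content on stdin and does not follow the paths it finds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from URLextractor): print the unique paths named
# in Allow:/Disallow: rules of a robots.txt body read from stdin.
robots_paths() {
  tr -d '\r' \
    | grep -iE '^[[:space:]]*(dis)?allow:' \
    | sed -E 's/^[^:]*:[[:space:]]*//; s/[[:space:]]*(#.*)?$//' \
    | grep -v '^$' \
    | sort -u
}
```

It could be fed from curl, for example curl -s http://www.hackthissite.org/robots.txt | robots_paths, with each printed path then requested in turn.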

Changelog to version 0.2.0:

  • [Fix] Changed GeoIP from freegeoip to ip-api
  • [Fix/Improvement] Remove duplicates from robots.txt
  • [Improvement] Better whois abuse contacts (abuse.net)
  • [Improvement] Top passwords collection added to sourcecode checking
  • [New feature] First-run verification to install dependencies if needed
  • [New feature] Log file
  • [New feature] Check for hostname on log file
  • [New feature] Check if hostname is listed on Spamhaus Domain Blacklist
  • [New feature] Run a quick dnsenumeration with common server names

Changelog to version 0.1.9:

  • Abuse mail using lynx instead of curl
  • Target server name parsing fixed
  • More verbose about HTTP codes and directory discovery
  • MD5 collection for IP fixed
  • Links found now show unique URLs from array
  • [New feature] Google results
  • [New feature] Bing IP check for other hosts/vhosts
  • [New feature] Opened ports from Shodan
  • [New feature] VirusTotal information about IP
  • [New feature] Alexa Rank information about $TARGET_HOST

Requirements:

Tested on Kali Light mini and OS X 10.11.3 with brew

sudo apt-get install bc curl dnsutils libxml2-utils whois md5sha1sum lynx openssl -y
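
A check for missing tools, along the lines of the first-run verification mentioned in the changelog, could look roughly like the sketch below. The function name missing_commands is an assumption for illustration. Note that package names do not always match command names: dnsutils provides dig and libxml2-utils provides xmllint.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from URLextractor): print every command from the
# argument list that cannot be found on PATH, one per line.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

For example, missing_commands bc curl dig xmllint whois lynx openssl prints nothing when every listed tool is installed.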

Configuration file:

CURL_TIMEOUT=15 #timeout in --connect-timeout
CURL_UA=Mozilla #user-agent (keep it simple)
INTERNAL=NO #YES OR NO (show internal network info)
URLVOID_KEY=your_API_key #using API from http://www.urlvoid.com/
FUZZ_LIMIT=10 #how many lines it will read from fuzz file
OPEN_TARGET_URLS=NO #open found URLs at the end of script
OPEN_EXTERNAL_LINKS=NO #open external links (frames) at the end of script
FIRST_TIME=YES #if first time, check for dependencies
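
To make the role of the curl settings concrete, here is a minimal sketch of how a bash script might load such a file and pass CURL_TIMEOUT and CURL_UA to curl. The functions load_config and fetch, and the fallback defaults, are illustrative assumptions rather than code from URLextractor.

```shell
#!/usr/bin/env bash
# Hypothetical loader: source the given config file if it is readable, then
# fall back to the documented defaults for anything still unset.
load_config() {
  [ -r "$1" ] && . "$1"
  CURL_TIMEOUT="${CURL_TIMEOUT:-15}"
  CURL_UA="${CURL_UA:-Mozilla}"
}

# Hypothetical wrapper: fetch a URL using the configured timeout and user-agent.
fetch() {
  curl --silent --connect-timeout "$CURL_TIMEOUT" --user-agent "$CURL_UA" "$1"
}
```

After load_config ./config.sh, a call such as fetch http://www.hackthissite.org/ would use whatever values the file sets.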

Todo list:

  • Upload to github :)
  • Check for installed packages
  • Integration with other APIs
  • Export to CSV
  • Integration with CipherScan


urlextractor's People

Contributors

eschultze

urlextractor's Issues

Default user-agent trips IDS tools

The default user-agent used by curl inadvertently triggers intrusion detection tools (e.g. Snort), which might end up temporarily blocking access to the target URL.

The fix is simple though: in the config.sh file change CURL_UA=Mozilla to CURL_UA=Mozilla/5.0. Just append "/5.0" and it should be good to go.
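
The same change can be scripted. The sketch below assumes config.sh contains the CURL_UA line in the form shown in the configuration section; the function name set_curl_ua is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the CURL_UA assignment in a config file,
# keeping any trailing comment on that line.
#   $1 = config file
#   $2 = new user-agent (no spaces, and no `|` or `&`, which sed treats specially)
set_curl_ua() {
  sed -E "s|^CURL_UA=[^[:space:]#]*|CURL_UA=$2|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

Usage: set_curl_ua config.sh Mozilla/5.0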

[FALSE ALARM, nevermind] Bug report: single quote in the URL is treated as an end-of-URL.

Page tested: https://www.uchinokomato.me/chara/show/44405

When it tries to obtain this URL:

https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/105/454/original/%E3%83%AA%E3%83%B3%E3%82%AB%E3%81%A1%E3%82%83%E3%82%93%2889'MO%E3%81%95%E3%82%93%E3%81%8B%E3%82%89%E3%81%AE%E9%A0%82%E3%81%8D%E7%89%A9%29.png?1459753344

(Note that the ' symbol is INSIDE the URL and is part of the string.)
It extracts this instead:

https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/105/454/original/%E3%83%AA%E3%83%B3%E3%82%AB%E3%81%A1%E3%82%83%E3%82%93%2889

What happened is that the script got confused, thinking that the single quote (or apostrophe) inside the URL was the end of the string, but it wasn't. Here is the HTML code that the extractor script is seeing:

<a data-lightbox="gallery" data-title="Uploaded at 2016-4-4 7:01
89'MO様に描いていただきました" href="https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/105/454/original/%E3%83%AA%E3%83%B3%E3%82%AB%E3%81%A1%E3%82%83%E3%82%93%2889'MO%E3%81%95%E3%82%93%E3%81%8B%E3%82%89%E3%81%AE%E9%A0%82%E3%81%8D%E7%89%A9%29.png?1459753344"><img src="https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/105/454/medium/%E3%83%AA%E3%83%B3%E3%82%AB%E3%81%A1%E3%82%83%E3%82%93%2889'MO%E3%81%95%E3%82%93%E3%81%8B%E3%82%89%E3%81%AE%E9%A0%82%E3%81%8D%E7%89%A9%29.png?1459753344" alt="%e3%83%aa%e3%83%b3%e3%82%ab%e3%81%a1%e3%82%83%e3%82%93%2889'mo%e3%81%95%e3%82%93%e3%81%8b%e3%82%89%e3%81%ae%e9%a0%82%e3%81%8d%e7%89%a9%29" style="height: 234.283px;"></a>

Note the URL is wrapped in double quotes.

Single and double quotes can be combined to nest strings inside a quoted attribute value, as in JavaScript:

onchange="Function('Arg1', 'Arg2'); Calculate()"

Also " cannot be used in a filename (reserved character).
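
The point about double-quote delimiters can be shown with a small sketch. It is not how URLextractor parses links; the function name extract_hrefs is hypothetical, and it only handles lowercase href attributes wrapped in double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of every double-quoted href attribute
# found in HTML read from stdin. Only `"` ends a value, so a `'` inside the
# URL is kept as part of it.
extract_hrefs() {
  grep -oE 'href="[^"]*"' | sed -E 's/^href="//; s/"$//'
}
```

Fed the anchor tag quoted above, this would print the full .png URL, including the ' character, rather than stopping at 89.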

Feature request: ++Python translation and CipherScan for SSL stuff

Thanks for the script!
I would get behind the python translation, and also adding a function for checking SSL stuff with cipherscan (https://github.com/jvehent/cipherscan)
With the python integration I would also add doing a scapy traceroute to the target and generating the map path.

from scapy.all import traceroute  # sending probes usually requires root privileges

res, unans = traceroute(["www.target.com"], dport=[80, 443], maxttl=20, retry=-2)
res.graph()  # piped to ImageMagick's display program
res.graph(type="ps", target="| lp")  # piped to postscript printer
res.graph(target="> /workingfolder/traceroute.svg")  # saved to file

Cheers

how to run

./extractor.sh: 13: ./extractor.sh: source: not found
-e \e[1;32m##################################################
-e # URLextractor #
-e # Information Gathering & Website Reconnaissance #
-e # coded by eschultze #
-e # version - 0.1.9 #
-e ##################################################\e[m
[INFO] Date: 28/08/17 | Time: 01:52:10
./extractor.sh: 27: ./extractor.sh: [[: not found
./extractor.sh: 156: ./extractor.sh: Syntax error: Bad for loop variable
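
The messages above (source: not found, [[: not found, Bad for loop variable, and the literal -e in front of each banner line) are what a POSIX sh such as dash prints when it meets bash-only syntax, which suggests the script was interpreted by sh rather than bash. Running it as bash ./extractor.sh followed by the target URL is the likely remedy. A minimal sketch of a guard that reports this up front, assuming the script is meant for bash; the function name require_bash is hypothetical:

```shell
# Hypothetical guard: succeed only when the current shell is bash,
# otherwise print a hint on stderr and return 1.
require_bash() {
  if [ -z "${BASH_VERSION:-}" ]; then
    echo "This script needs bash; run it as: bash ./extractor.sh <url>" >&2
    return 1
  fi
}
```

Calling require_bash || exit 1 near the top of a script stops it before the first bash-specific line is reached.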
