webanalyze's Introduction

webanalyze

This is a port of Wappalyzer in Go. The tool is designed to be performant and can test huge lists of hosts.

Because Wappalyzer removed public access to its app definitions, webanalyze currently loads definitions from enthec.

Installation and usage

Precompiled releases

Precompiled releases can be downloaded directly here.

Build

If you want to build for yourself:

$ go install -v github.com/rverton/webanalyze/cmd/webanalyze@latest
$ webanalyze -update # loads new technologies.json file from wappalyzer project
$ webanalyze -h
Usage of webanalyze:
  -apps string
        app definition file. (default "technologies.json")
  -crawl int
        links to follow from the root page (default 0)
  -host string
        single host to test
  -hosts string
        filename with hosts, one host per line.
  -output string
        output format (stdout|csv|json) (default "stdout")
  -search
        searches all urls with same base domain (i.e. example.com and sub.example.com) (default true)
  -silent
        avoid printing header (default false)
  -update
        update apps file
  -worker int
        number of worker (default 4)

The -update flag downloads a current version of technologies.json from the wappalyzer repository to the current folder.

Docker

# Clone the repo
git clone https://github.com/rverton/webanalyze.git
# Build the container
docker build -t webanalyze:latest webanalyze
# Run the container
docker run -it webanalyze:latest -h

Development / Usage as a lib

See cmd/webanalyze/main.go for an example on how to use this as a library.

Example

$ ./webanalyze -host robinverton.de -crawl 1
 :: webanalyze        : v1.0
 :: workers           : 4
 :: apps              : technologies.json
 :: crawl count       : 1
 :: search subdomains : true

https://robinverton.de/hire/ (0.5s):
    Highlight.js,  (Miscellaneous)
    Netlify,  (Web Servers, CDN)
    Google Font API,  (Font Scripts)
http://robinverton.de (0.8s):
    Highlight.js,  (Miscellaneous)
    Netlify,  (Web Servers, CDN)
    Hugo, 0.42.1 (Static Site Generator)
    Google Font API,  (Font Scripts)

$ ./webanalyze -host robinverton.de -crawl 1 -output csv
 :: webanalyze        : v1.0
 :: workers           : 4
 :: apps              : technologies.json
 :: crawl count       : 1
 :: search subdomains : true

Host,Category,App,Version
https://robinverton.de/hire/,Miscellaneous,Highlight.js,
https://robinverton.de/hire/,Font Scripts,Google Font API,
https://robinverton.de/hire/,"Web Servers,CDN",Netlify,
http://robinverton.de,"Web Servers,CDN",Netlify,
http://robinverton.de,Static Site Generator,Hugo,0.42.1
http://robinverton.de,Miscellaneous,Highlight.js,
http://robinverton.de,Font Scripts,Google Font API,

webanalyze's People

Contributors

aidanhall34, benmcginnis, caspergn, cathalgarvey, derekperkins, enthec-opensource, hrbrmstr, nikosrig, nl5887, paulwhitings2, rverton, thierry-f-78

webanalyze's Issues

Redirects are followed to other domains with no notification or options

First, A W E S O M E T O O L !

Thank you a billion.

Problem:
When analyzing a subdomain that immediately redirects to a completely different subdomain, the tool analyzes the destination domain. There is no notification of this in the output, and there is no option to prohibit this behavior. When performing recon inside a scope, this behavior is problematic.

Solution:
Flag to ignore redirects and quit. And/or provide notification in the output that the results are for the technologies of a domain other than the one supplied.

Doesn't work with recent apps.json

$ webanalyze -update
2017/04/19 01:22:30 app definition file updated from  https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/apps.json
$ webanalyze -host google.com
2017/04/19 01:22:47 error initializing: json: cannot unmarshal string into Go struct field App.cats of type int
2017/04/19 01:22:47 Scanning with 4 workers.
2017/04/19 01:22:47 error creating export file: json: cannot unmarshal string into Go struct field App.cats of type int
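
One way to tolerate this kind of upstream type change is a field type that accepts both JSON numbers and JSON strings. The following is a minimal sketch under that assumption; CategoryID, app and parseCats are hypothetical names and do not reflect webanalyze's actual structs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CategoryID is a hypothetical type that accepts a category written either
// as a JSON string ("22") or as a JSON number (22).
type CategoryID string

// UnmarshalJSON first tries a string, then falls back to a number.
func (c *CategoryID) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		*c = CategoryID(s)
		return nil
	}
	var n json.Number
	if err := json.Unmarshal(data, &n); err != nil {
		return err
	}
	*c = CategoryID(n.String())
	return nil
}

// app models only the field relevant to this sketch.
type app struct {
	Cats []CategoryID `json:"cats"`
}

// parseCats decodes the cats field of a single app definition.
func parseCats(raw string) ([]CategoryID, error) {
	var a app
	if err := json.Unmarshal([]byte(raw), &a); err != nil {
		return nil, err
	}
	return a.Cats, nil
}

func main() {
	for _, raw := range []string{`{"cats": [22, 18]}`, `{"cats": ["22", "18"]}`} {
		cats, err := parseCats(raw)
		fmt.Println(cats, err)
	}
}
```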

Detection quality low

A lot of technologies which were detected in the past are not detected anymore. I guess some format changed in the upstream technologies.json, which needs to be verified.

IPv6 error

Hello,

When I tried to scan an IPv6 address, I got the following error every time:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x702474]

goroutine 34 [running]:
github.com/rverton/webanalyze.(*WebAnalyzer).Process(0xc00000e580, 0xc000051f58, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/ubuntu/go/src/github.com/rverton/webanalyze/webanalyze.go:75 +0x74
main.main.func1(0xc00001a120, 0xc000011390, 0xc000011398, 0xc000016330)
	/home/ubuntu/go/src/github.com/rverton/webanalyze/cmd/webanalyze/main.go:115 +0x36d
created by main.main
	/home/ubuntu/go/src/github.com/rverton/webanalyze/cmd/webanalyze/main.go:111 +0x3f2

Support PCRE

Hi,

Some rules cannot be compiled with regexp.Compile() (PCRE error):

	var list []AppRegexp

	for _, regexString := range s {

		// Split version detection
		splitted := strings.Split(regexString, "\\;")

		regex, err := regexp.Compile(splitted[0]) //here
		if err != nil {
			// ignore failed compiling for now
			// log.Printf("warning: compiling regexp for failed: %v", regexString, err)
		}
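
Go's regexp package implements RE2 syntax, which rejects Perl/PCRE constructs such as lookaheads and backreferences. As a sketch of how the failures could be surfaced instead of silently ignored (compileAll is a hypothetical helper, not part of webanalyze), the patterns that fail can be collected together with their errors:

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAll compiles every pattern it can and returns the patterns that
// failed together with their errors, instead of dropping them silently.
func compileAll(patterns []string) ([]*regexp.Regexp, map[string]error) {
	var compiled []*regexp.Regexp
	failed := make(map[string]error)
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			failed[p] = err
			continue
		}
		compiled = append(compiled, re)
	}
	return compiled, failed
}

func main() {
	patterns := []string{
		`jquery[.-]([\d.]+)`, // plain RE2 syntax
		`foo(?=bar)`,         // lookahead, PCRE only
		`(a)\1`,              // backreference, PCRE only
	}
	compiled, failed := compileAll(patterns)
	fmt.Printf("compiled %d pattern(s), %d failed\n", len(compiled), len(failed))
	for p, err := range failed {
		fmt.Printf("  %q: %v\n", p, err)
	}
}
```

Reporting the failed patterns makes it visible which upstream rules would need a PCRE-capable engine or a rewritten expression.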

Confidence Tag

First of all thank you! SO FAST! :)

Any plans to add the "confidence" tag from the wappalyzer app? "Implies" works; adding "confidence" would be really useful! From wappalyzer docs: "confidence Indicates a less reliable pattern that may cause false positives. The aim is to achieve a combined confidence of 100%. Defaults to 100% if not specified."

Thanks again!

build yourself doesn't work anymore

$ go get -v -u github.com/rverton/webanalyze/cmd/webanalyze

github.com/rverton/webanalyze/cmd/webanalyze is invalid. There is a tree master now.

error: can not open host file : open : no such file or directory

Apparently the command line binary tries to read the hosts file when using the "update" flag. When using this flag, the check should stop right after the update:

if update {
		err = webanalyze.DownloadFile(webanalyze.WappalyzerURL, "apps.json")
		if err != nil {
			log.Fatalf("error: can not update apps file: %v", err)
		}

		log.Println("app definition file updated from ", webanalyze.WappalyzerURL)

	}

// DEN1AL - this should not run!
	// check single host or hosts file
	if host != "" {
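
A minimal sketch of the suggested control flow follows. It is an assumption-laden illustration, not the project's code: run, updateFn and scanFn are hypothetical, the -hosts flag is left out for brevity, and the update and scan steps are injected as functions so the flow can be exercised without network access. The sketch returns right after a successful update when no host was given, and still scans when -update and -host are combined.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// run is a hypothetical stand-in for the CLI's main logic.
func run(args []string, updateFn func() error, scanFn func(host string) error) error {
	fs := flag.NewFlagSet("webanalyze", flag.ContinueOnError)
	update := fs.Bool("update", false, "update apps file")
	host := fs.String("host", "", "single host to test")
	if err := fs.Parse(args); err != nil {
		return err
	}

	if *update {
		if err := updateFn(); err != nil {
			return fmt.Errorf("can not update apps file: %w", err)
		}
		// Update-only invocation: there is nothing to scan, so stop here
		// instead of falling through to the host checks.
		if *host == "" {
			return nil
		}
	}

	if *host == "" {
		return errors.New("no host given")
	}
	return scanFn(*host)
}

func main() {
	update := func() error { fmt.Println("definitions updated"); return nil }
	scan := func(h string) error { fmt.Println("scanning", h); return nil }

	fmt.Println("update only:", run([]string{"-update"}, update, scan))
	fmt.Println("update and scan:", run([]string{"-update", "-host", "example.com"}, update, scan))
}
```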

Fails to analyze host:port URL

webanalyze -host google.com:80
 :: webanalyze        : v1.0
 :: workers           : 4
 :: apps              : apps.json
 :: crawl count       : 0
 :: search subdomains : true

google.com:80 error: Failed to retrieve: Get "google.com:80": unsupported protocol scheme "google.com"
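
The error message shows that "google.com" was taken for the URL scheme because the input has no scheme of its own. A small sketch of one possible fix is to prepend a default scheme before requesting; withScheme is a hypothetical helper, and defaulting to http:// is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// withScheme returns host unchanged if it already starts with http:// or
// https://, and prepends http:// otherwise.
func withScheme(host string) string {
	if strings.HasPrefix(host, "http://") || strings.HasPrefix(host, "https://") {
		return host
	}
	return "http://" + host
}

func main() {
	for _, h := range []string{"google.com:80", "google.com", "https://google.com:443"} {
		fmt.Printf("%-24s -> %s\n", h, withScheme(h))
	}
}
```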

Crawling subdomains

if urlResolved.Hostname() != base.Hostname() {

It looks like the parseLinks function returns nil if the hostname of the parsed URL differs from that of the base URL (which is great for cases like hostname.com and nothostname.com). This works for most cases, but it would be nice to add an option for searching subdomains like app.hostname.com with a base URL of hostname.com while crawling. I have some thoughts on this and would be happy to open a PR if there is interest.
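
The proposed check could look roughly like the sketch below. sameBaseDomain is a hypothetical helper and the URLs are examples only; comparing against "." + base is what keeps nothostname.com from matching hostname.com.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sameBaseDomain reports whether host equals base or is a subdomain of it.
// The leading dot in the suffix check prevents "nothostname.com" from
// matching "hostname.com".
func sameBaseDomain(host, base string) bool {
	host = strings.ToLower(host)
	base = strings.ToLower(base)
	return host == base || strings.HasSuffix(host, "."+base)
}

func main() {
	base, err := url.Parse("https://hostname.com/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	links := []string{
		"https://hostname.com/about",
		"https://app.hostname.com/login",
		"https://nothostname.com/",
	}
	for _, l := range links {
		u, err := url.Parse(l)
		if err != nil {
			continue
		}
		fmt.Printf("%-34s follow=%v\n", l, sameBaseDomain(u.Hostname(), base.Hostname()))
	}
}
```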

does not detect most client side app versions

I'm having an issue with version detection for most client-side apps like jQuery. This also applies to PHP. Wappalyzer is able to detect most versions, but I only get the application name with webanalyze.
webanalyze - webanalyze
wapalyzer - wappalyzer

Latest version showing wrong version

The latest version reports itself as v0.3.7:

:: webanalyze : v0.3.7
:: workers : 4
:: technologies : technologies.json
:: crawl count : 0
:: search subdomains : true
:: follow redirects : false

Crawl functionality

Hi @rverton

Is it possible to add crawl functionality that crawls, e.g., the first 20 (configurable) subpages found?

Wappalyzer has these variables available:
maxDepth
maxUrls
maxWait
recursive

Mysterious NivCMS on different sites results

Analysis results for different sites contain NivCMS; however, this is incorrect according to Wappalyzer:

$ webanalyze -update
2020/04/24 15:13:09 app definition file updated from  https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/apps.json
$ webanalyze -host https://archive.org/
 :: webanalyze        : v1.0
 :: workers           : 4
 :: apps              : apps.json
 :: crawl count       : 0
 :: search subdomains : true

https://archive.org/ (3.6s):
    PHP,  (Programming languages)
    NivCMS,  (CMS)
    jQuery,  (JavaScript libraries)
    Nginx, 1.16.1 (Web servers, Reverse proxies)
    React,  (JavaScript frameworks)
    jQuery UI,  (JavaScript libraries)
    jQuery,  (JavaScript libraries)
    Polyfill,  (JavaScript libraries)
    Ubuntu,  (Operating systems)
    Bootstrap,  (UI frameworks)
    Clipboard.js,  (Miscellaneous)

Output flag breaks connection

Hello,

Just wanted to let you know that when I run the tool with -output, it is not able to connect to anything. Once I remove the flag, the tool works fine again.

apps.json is now technologies.json upstream

$ webanalyze -update
2020/09/04 15:24:50 app definition file updated from  https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/apps.json

$ cat apps.json
404: Not Found

cannot unmarshal object into Go value of type map[int]webanalyze.Category

Fresh Ubuntu 16.04 installation:

$ uname -a
Linux instance-1 4.13.0-1011-gcp #15-Ubuntu SMP Mon Feb 12 16:29:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"

$ sudo apt install golang-go
...

$ go version
go version go1.6.2 linux/amd64

$ export GOPATH=$HOME

$ go get -u github.com/rverton/webanalyze/...

$ webanalyze -update
2018/03/07 09:13:00 app definition file updated from  https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/apps.json

$ webanalyze -host fb.me
2018/03/07 09:18:14 error initializing: json: cannot unmarshal object into Go value of type map[int]webanalyze.Category
2018/03/07 09:18:14 Scanning with 4 workers.
^C

-silent option doesn't filter the -update output status

When executing the following command, the "-silent" option does not suppress the "-update" status line in the output:

./webanalyze -apps technologies.json -update -silent -output json -host 10.10.10.238

The Output:

2021/07/01 20:43:09 app definition file updated from https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/technologies.json
{"hostname":"http://10.10.10.238","matches":[{"app":{"cats":["22"],"category_names":["Web servers"],"cookies":null,"headers":{"Server":"(?:Apache(?:$|/([\\d.]+)|[^/-])|(?:^|\\b)HTTPD)\\;version:\\1"},"meta":null,"html":null,"script":null,"url":null,"website":"http://apache.org","implies":null},"app_name":"Apache","matches":[["Apache/2.4.29","2.4.29"]],"version":"2.4.29"},{"app":{"cats":["28"],"category_names":["Operating systems"],"cookies":null,"headers":{"Server":"Ubuntu","X-Powered-By":"Ubuntu"},"meta":null,"html":null,"script":null,"url":null,"website":"http://www.ubuntu.com/server","implies":null},"app_name":"Ubuntu","matches":[["Ubuntu"]],"version":""}]}

Notice how the first line, "2021/07/01 20:43:09 app definition file updated from https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/technologies.json" is still being included in the output.

Can we get that silenced too?

BTW - Great app!

More workers = More "Failed to Retrieve"

When I use more workers, I get "Failed to Retrieve" on a lot of URLs that worked with fewer workers. The more workers I add, the more "Failed to Retrieve" errors I get. Any ideas as to why this may be happening?

Schema updated

Hi @rverton

Wappalyzer has updated its schema.

Additional regexes (same structure as meta):

  • cookies
  • js

Additional app infos:

  • icon

Furthermore @AliasIO have added two new concepts: excludes and implies.

If the program detects WordPress, it implies that PHP is also used. The same principle applies to excludes: if the program detects SPDY, HTTP/2.0 cannot be detected.

There is a problem with implies and excludes, though: each can be either an array or a string. The app WordPress has an array for implies, but Adminer has implies as a string.
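
A field that may be either a string or an array can be handled in Go with a custom json.Unmarshaler. The following is a minimal sketch; StringArray, appDef and the sample values are hypothetical and only illustrate the technique.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StringArray is a hypothetical type that decodes both "PHP" and
// ["PHP", "MySQL"] into a []string.
type StringArray []string

// UnmarshalJSON accepts null, a single string, or an array of strings.
func (s *StringArray) UnmarshalJSON(data []byte) error {
	if string(data) == "null" {
		return nil
	}
	var single string
	if err := json.Unmarshal(data, &single); err == nil {
		*s = StringArray{single}
		return nil
	}
	var many []string
	if err := json.Unmarshal(data, &many); err != nil {
		return err
	}
	*s = many
	return nil
}

// appDef models only the two fields discussed above.
type appDef struct {
	Implies  StringArray `json:"implies"`
	Excludes StringArray `json:"excludes"`
}

// parseApp decodes a single app definition.
func parseApp(raw string) (appDef, error) {
	var a appDef
	err := json.Unmarshal([]byte(raw), &a)
	return a, err
}

func main() {
	for _, raw := range []string{
		`{"implies": "PHP"}`,
		`{"implies": ["PHP", "MySQL"]}`,
	} {
		a, err := parseApp(raw)
		fmt.Println(a.Implies, err)
	}
}
```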

support go module

Can you guys add go mod? Easy for others to use.

It's an awesome project, thx.

Process hangs when crawl specified and an error occurs

The process hangs if you specify -crawl > 0 and an error occurs.

# ./webanalyze -output json -host http://saldkjklasdj.com -crawl 100
2020/01/09 19:00:15 Scanning with 4 workers.
2020/01/09 19:00:15 [-] Error for http://saldkjklasdj.com: Failed to retrieve
{"hostname":"http://saldkjklasdj.com","matches":null}
^C

Docker build error missing shared library libresolv.so.2 when running update in last stage in Dockerfile

Running the build with Docker on Windows with WSL2 returns an error:

 => [stage-1 2/6] RUN apk add --no-cache libc6-compat
 => [stage-1 3/6] WORKDIR /app
 => [build-env 2/2] RUN go install github.com/rverton/webanalyze/cmd/webanalyze@latest
 => [stage-1 4/6] COPY --from=build-env /go/bin/webanalyze .
 => [stage-1 5/6] RUN mkdir -p /app     && adduser -D webanalyze     && chown -R webanalyze:webanalyze /app
 => ERROR [stage-1 6/6] RUN ["./webanalyze", "-update"]
 > [stage-1 6/6] RUN ["./webanalyze", "-update"]:
#12 0.426 Error loading shared library libresolv.so.2: No such file or directory (needed by ./webanalyze)

Add version detection from Wappalyzers regular expressions

Wappalyzer makes use of appended strings for version detection like here:

        "IIS": {
		"cats": [
			"22"
		],
		"headers": {
			"Server": "IIS(?:/([\\d.]+))?\\;version:\\1"
		},
		"icon": "IIS.png",
		"implies": "Windows Server",
		"website": "http://www.iis.net"
	},

If we split off this "additional field", which starts at the ;, we can also parse this information.
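
The idea can be sketched as follows, using the IIS pattern from above as it appears after JSON decoding. parsePattern and extractVersion are hypothetical helpers, not the project's code, and the sketch handles only the version: suffix with simple \N placeholders.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// parsePattern splits a pattern at `\;` into the regular expression and an
// optional version template such as `\1`.
func parsePattern(p string) (*regexp.Regexp, string, error) {
	parts := strings.Split(p, `\;`)
	re, err := regexp.Compile(parts[0])
	if err != nil {
		return nil, "", err
	}
	tmpl := ""
	for _, part := range parts[1:] {
		if strings.HasPrefix(part, "version:") {
			tmpl = strings.TrimPrefix(part, "version:")
		}
	}
	return re, tmpl, nil
}

// extractVersion matches input against re and fills the \N placeholders of
// the template with the corresponding capture groups.
func extractVersion(re *regexp.Regexp, tmpl, input string) (string, bool) {
	m := re.FindStringSubmatch(input)
	if m == nil {
		return "", false
	}
	version := tmpl
	for i := 1; i < len(m); i++ {
		version = strings.ReplaceAll(version, fmt.Sprintf(`\%d`, i), m[i])
	}
	return version, true
}

func main() {
	re, tmpl, err := parsePattern(`IIS(?:/([\d.]+))?\;version:\1`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, header := range []string{"Microsoft-IIS/8.5", "Microsoft-IIS", "nginx"} {
		version, ok := extractVersion(re, tmpl, header)
		fmt.Printf("%-18s matched=%v version=%q\n", header, ok, version)
	}
}
```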

Doesn't detect all the available applications

Looks like webanalyze (with this apps.json) doesn't detect as many applications as AliasIO/wappalyzer:

$ webanalyze -host="http://stackshare.io"
2017/04/19 13:44:32 Scanning with 4 workers.
2017/04/19 13:44:34 [+] http://stackshare.io (1.346715838s):
2017/04/19 13:44:34     - Google Font API        - [17]
2017/04/19 13:44:34     - Nginx  - [22]
2017/04/19 13:44:34     - Express        - [18 22]
2017/04/19 13:44:34     - Ruby on Rails  - [18]
$ docker run --rm wappalyzer/cli http://stackshare.io | jq '.applications | .[] | .name'
"Algolia Realtime Search"
"AngularJS"
"Express"
"Handlebars"
"Intercom"
"List.js"
"Mailchimp"
"Moment.js"
"New Relic"
"Nginx"
"React"
"Segment"
"Snap.svg"
"SweetAlert"
"Twitter Bootstrap"
"UserVoice"
"Varnish"
"jQuery"
"Node.js"

panic: Negative WaitGroup counter

deepakgadupudi@cloudshell:/Vishnu/Stack$ cat hosts | wc
55195 55195 1585167
deepakgadupudi@cloudshell:/Vishnu/Stack$ webanalyze -hosts hosts -output csv > results.csv
2020/02/06 02:38:41 Scanning with 4 workers.
2020/02/06 02:38:46 [-] Error for http://0addomain.com: Failed to retrieve
panic: sync: negative WaitGroup counter

goroutine 24 [running]:
sync.(*WaitGroup).Add(0xc0000962f0, 0xffffffffffffffff)
/usr/local/go/src/sync/waitgroup.go:74 +0x139
sync.(*WaitGroup).Done(...)
/usr/local/go/src/sync/waitgroup.go:99
github.com/rverton/webanalyze.process(0xc00252ccc0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/deepakgadupudi/gopath/src/github.com/rverton/webanalyze/webanalyze.go:251 +0x1a23
github.com/rverton/webanalyze.worker(0xc000070120, 0xc0000700c0, 0xc0000962e0)
/home/deepakgadupudi/gopath/src/github.com/rverton/webanalyze/webanalyze.go:137 +0x73
created by github.com/rverton/webanalyze.initWorker
/home/deepakgadupudi/gopath/src/github.com/rverton/webanalyze/webanalyze.go:122 +0x77

Not correctly detecting WordPress Installs

Hi,

If I run the following URL 3VALLEYSMANUFACTURING.CO.UK through webanalyze then it does not report WordPress as being used.

But if I run the same URL through the Wappalyzer website then it does report WordPress.

Any idea why the two are not matching?

Several errors - running parrot linux

./webanalyze.go -h
./webanalyze.go: line 1: package: command not found
./webanalyze.go: line 3: syntax error near unexpected token `newline'
./webanalyze.go: line 3: `import ('

Memory leak

Hi friend,
I am using webanalyze in my web crawler, and there seems to be a memory leak in the process func that I cannot resolve. Can you help me?

(Image: memory graph captured with pprof)

De-duplication of results

Hi Robin,

Firstly thanks for creating this amazing tool :)
I noticed that a technology sometimes gets matched multiple times and, as a result, is shown multiple times in the output. As the screenshot below shows, PHP is listed more than once.

./webanalyze -host https://www.bugcrowd.com/

image

--
Regards,
@bugbaba
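
De-duplication of this kind can be sketched by keying results on the app name. The match type and dedupe function below are hypothetical and do not mirror webanalyze's result structs; the sketch keeps the first occurrence of each app and fills in a version if only a later duplicate carries one.

```go
package main

import "fmt"

// match is a hypothetical, simplified detection result.
type match struct {
	App     string
	Version string
}

// dedupe keeps one entry per app name, preserving first-seen order.
// If the kept entry has no version, a version from a later duplicate is used.
func dedupe(matches []match) []match {
	index := make(map[string]int) // app name -> position in out
	var out []match
	for _, m := range matches {
		if i, ok := index[m.App]; ok {
			if out[i].Version == "" {
				out[i].Version = m.Version
			}
			continue
		}
		index[m.App] = len(out)
		out = append(out, m)
	}
	return out
}

func main() {
	results := []match{
		{App: "PHP"},
		{App: "jQuery"},
		{App: "PHP", Version: "7.2"},
		{App: "jQuery"},
	}
	for _, m := range dedupe(results) {
		fmt.Printf("%s, %s\n", m.App, m.Version)
	}
}
```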

unable to unmarshal: wappalyzer schema definition file changed

I installed your tool as described in the README (on Arch Linux) and it keeps giving the error below:

$ go run cmd/webanalyze/main.go -host https://host/path/ 
2017/08/10 16:24:18 error initializing: json: cannot unmarshal number into Go struct field App.cats of type string

Is this normal? After the previous error, execution seems to proceed but then hangs.

2017/08/10 16:24:18 Scanning with 4 workers.
