rverton / webanalyze
Port of Wappalyzer (uncovers technologies used on websites) to automate mass scanning.
License: MIT License
The latest version reports itself as v0.3.7:
:: webanalyze : v0.3.7
:: workers : 4
:: technologies : technologies.json
:: crawl count : 0
:: search subdomains : true
:: follow redirects : false
As in title. :)
First, A W E S O M E T O O L !
Thank you a billion.
Problem:
When analyzing a subdomain that immediately redirects to a completely different subdomain, the tool analyzes the destination domain. There is no notification of this in the output, and there is no option to prohibit this behavior. When performing recon inside a scope, this behavior is problematic.
Solution:
Flag to ignore redirects and quit. And/or provide notification in the output that the results are for the technologies of a domain other than the one supplied.
webanalyze -host google.com:80
:: webanalyze : v1.0
:: workers : 4
:: apps : apps.json
:: crawl count : 0
:: search subdomains : true
google.com:80 error: Failed to retrieve: Get "google.com:80": unsupported protocol scheme "google.com"
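The error suggests that Go's URL parsing treats the text before the first colon as the scheme when no "http://" prefix is present. A possible workaround, sketched with a hypothetical helper normalizeTarget (an assumption, not existing webanalyze code), is to add a default scheme before the request is made:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeTarget is a hypothetical helper: it prepends "http://" when the
// input has no explicit http or https scheme, so "host:port" inputs are not
// mistaken for "scheme:opaque" URLs.
func normalizeTarget(host string) string {
	if strings.HasPrefix(host, "http://") || strings.HasPrefix(host, "https://") {
		return host
	}
	return "http://" + host
}

func main() {
	for _, h := range []string{"google.com:80", "https://example.com", "example.org"} {
		fmt.Println(normalizeTarget(h))
	}
}
```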
go install -v github.com/rverton/webanalyze/cmd/webanalyze@latest
Hello,
When I tried running a scan of an IPv6 address, I got the following error every time:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x702474]
goroutine 34 [running]:
github.com/rverton/webanalyze.(*WebAnalyzer).Process(0xc00000e580, 0xc000051f58, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/ubuntu/go/src/github.com/rverton/webanalyze/webanalyze.go:75 +0x74
main.main.func1(0xc00001a120, 0xc000011390, 0xc000011398, 0xc000016330)
/home/ubuntu/go/src/github.com/rverton/webanalyze/cmd/webanalyze/main.go:115 +0x36d
created by main.main
/home/ubuntu/go/src/github.com/rverton/webanalyze/cmd/webanalyze/main.go:111 +0x3f2
./webanalyze -update no longer works due to a format change at https://github.com/AliasIO/wappalyzer/tree/master/src/technologies
$ ./webanalyze -update
2021/09/09 19:14:06 app definition file updated from https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/technologies.json
$ cat technologies.json
404: Not Found%
Hi @rverton
Is it possible to add crawl functionality that crawls, for example, the first 20 (configurable) subpages found?
Wappalyzer has these variables available:
maxDepth
maxUrls
maxWait
recursive
The current schema used by wappalyzer differs slightly from the schema implemented in webanalyze. It has to be checked which fields differ first and then we can work on them one by one.
I'm wondering about the decision not to follow redirects with http.Client within the fetchHost function: https://github.com/rverton/webanalyze/blob/master/webanalyze.go#L105
Following HTTP redirects is crucial for our use case.
What do you think about adding the ability to provide a custom http.Client to NewWebAnalyzer?
I'm going to submit a pull request to better describe the idea.
According to the documentation and the issue here, fields like scriptSrc, html and many others are not always strings; they can also be a map or an array of strings.
It looks like webanalyze only handles the string type. Can you fix it?
Line 204 in 6012713
It looks like the parseLinks function returns nil if the parsed url is different from the base url (which is great for cases like the following hostname.com & nothostname.com). This works for most cases, but it would be nice to add an option for searching subdomains like app.hostname.com with a base url of hostname.com while crawling. I have some thoughts on a PR and would be happy to PR if there is interest.
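A rough sketch of such a check, using a hypothetical function inScope (not the existing parseLinks): comparing against the base host with a leading dot accepts app.hostname.com while still rejecting nothostname.com.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// inScope is a hypothetical check: it reports whether link points at the base
// host itself or at one of its subdomains.
func inScope(base, link string) bool {
	b, err := url.Parse(base)
	if err != nil {
		return false
	}
	l, err := url.Parse(link)
	if err != nil {
		return false
	}
	baseHost := strings.ToLower(b.Hostname())
	linkHost := strings.ToLower(l.Hostname())
	if baseHost == "" || linkHost == "" {
		return false
	}
	// The leading dot keeps "nothostname.com" from matching "hostname.com".
	return linkHost == baseHost || strings.HasSuffix(linkHost, "."+baseHost)
}

func main() {
	base := "http://hostname.com"
	for _, link := range []string{
		"http://hostname.com/about",
		"http://app.hostname.com/login",
		"http://nothostname.com/",
	} {
		fmt.Println(link, inScope(base, link))
	}
}
```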
I get this error when I run it on Apple Silicon:
zsh: exec format error: webanalyze
$ webanalyze -update
2020/09/04 15:24:50 app definition file updated from https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/apps.json
$ cat apps.json
404: Not Found
$ go get -v -u github.com/rverton/webanalyze/cmd/webanalyze
github.com/rverton/webanalyze/cmd/webanalyze is invalid. There is tree master now.
The process hangs if you specify -crawl > 0 and an error occurs.
# ./webanalyze -output json -host http://saldkjklasdj.com -crawl 100
2020/01/09 19:00:15 Scanning with 4 workers.
2020/01/09 19:00:15 [-] Error for http://saldkjklasdj.com: Failed to retrieve
{"hostname":"http://saldkjklasdj.com","matches":null}
^C
Analysis results for various sites include NivCMS, but this is incorrect (according to Wappalyzer):
$ webanalyze -update
2020/04/24 15:13:09 app definition file updated from https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/apps.json
$ webanalyze -host https://archive.org/
:: webanalyze : v1.0
:: workers : 4
:: apps : apps.json
:: crawl count : 0
:: search subdomains : true
https://archive.org/ (3.6s):
PHP, (Programming languages)
NivCMS, (CMS)
jQuery, (JavaScript libraries)
Nginx, 1.16.1 (Web servers, Reverse proxies)
React, (JavaScript frameworks)
jQuery UI, (JavaScript libraries)
jQuery, (JavaScript libraries)
Polyfill, (JavaScript libraries)
Ubuntu, (Operating systems)
Bootstrap, (UI frameworks)
Clipboard.js, (Miscellaneous)
Wappalyzer makes use of appended strings for version detection like here:
"IIS": {
  "cats": [
    "22"
  ],
  "headers": {
    "Server": "IIS(?:/([\\d.]+))?\\;version:\\1"
  },
  "icon": "IIS.png",
  "implies": "Windows Server",
  "website": "http://www.iis.net"
},
If we split off this "additional field", which starts at the ;, we can also parse this information.
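The splitting idea can be sketched as follows. The helper detectVersion is hypothetical, and the pattern is written as it looks after JSON decoding, where the separator is a backslash followed by a semicolon. Only the version: field is handled here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// detectVersion is a hypothetical helper. pattern is a Wappalyzer-style
// string such as `IIS(?:/([\d.]+))?\;version:\1`. It returns whether value
// matched and the version built from the capture groups.
func detectVersion(pattern, value string) (matched bool, version string, err error) {
	parts := strings.Split(pattern, `\;`)
	re, err := regexp.Compile(parts[0])
	if err != nil {
		return false, "", err
	}
	groups := re.FindStringSubmatch(value)
	if groups == nil {
		return false, "", nil
	}
	for _, extra := range parts[1:] {
		if !strings.HasPrefix(extra, "version:") {
			continue
		}
		version = strings.TrimPrefix(extra, "version:")
		// Replace \1, \2, ... with the matching capture groups.
		for i := 1; i < len(groups); i++ {
			version = strings.ReplaceAll(version, fmt.Sprintf(`\%d`, i), groups[i])
		}
	}
	return true, version, nil
}

func main() {
	matched, version, err := detectVersion(`IIS(?:/([\d.]+))?\;version:\1`, "Microsoft-IIS/8.5")
	fmt.Println(matched, version, err)
}
```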
When I use more workers, I get "Failed to retrieve" on a lot of URLs that worked with fewer workers. The more workers I add, the more "Failed to retrieve" errors I get. Any ideas as to why this may be happening?
When I put a symbolic link to the binary file in /usr/local/bin/, I cannot run webanalyze from another directory, because the technologies.json file is not looked up relative to the webanalyze directory. Can you fix that, please? :)
Hi,
Some rules cannot be compiled with regexp.Compile() (PCRE error):
var list []AppRegexp
for _, regexString := range s {
	// Split version detection
	splitted := strings.Split(regexString, "\\;")
	regex, err := regexp.Compile(splitted[0]) // here
	if err != nil {
		// ignore failed compiling for now
		// log.Printf("warning: compiling regexp for failed: %v", regexString, err)
	}
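Go's regexp package implements RE2 syntax, which rejects some Perl/PCRE constructs such as lookaheads and backreferences. The sketch below, built around a hypothetical helper compileAll, keeps the patterns that compile and collects the rejected ones so they can be reported instead of silently ignored. The two sample patterns are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAll is a hypothetical helper: it compiles every pattern it can and
// returns the ones Go's RE2-based regexp package rejected.
func compileAll(patterns []string) (ok []*regexp.Regexp, failed []string) {
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			failed = append(failed, p)
			continue
		}
		ok = append(ok, re)
	}
	return ok, failed
}

func main() {
	// The second pattern uses a negative lookahead, which RE2 does not support.
	ok, failed := compileAll([]string{`jquery[.-]([\d.]+)`, `wp-(?!admin)`})
	fmt.Println(len(ok), "compiled;", len(failed), "skipped:", failed)
}
```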
Hello,
Just wanted to let you know that when I run the tool with -output, it is not able to connect to anything. Once I remove the flag, the tool works fine again.
deepakgadupudi@cloudshell:/Vishnu/Stack$ cat hosts | wc
55195 55195 1585167
deepakgadupudi@cloudshell:/Vishnu/Stack$ webanalyze -hosts hosts -output csv > results.csv
2020/02/06 02:38:41 Scanning with 4 workers.
2020/02/06 02:38:46 [-] Error for http://0addomain.com: Failed to retrieve
panic: sync: negative WaitGroup counter
goroutine 24 [running]:
sync.(*WaitGroup).Add(0xc0000962f0, 0xffffffffffffffff)
/usr/local/go/src/sync/waitgroup.go:74 +0x139
sync.(*WaitGroup).Done(...)
/usr/local/go/src/sync/waitgroup.go:99
github.com/rverton/webanalyze.process(0xc00252ccc0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/deepakgadupudi/gopath/src/github.com/rverton/webanalyze/webanalyze.go:251 +0x1a23
github.com/rverton/webanalyze.worker(0xc000070120, 0xc0000700c0, 0xc0000962e0)
/home/deepakgadupudi/gopath/src/github.com/rverton/webanalyze/webanalyze.go:137 +0x73
created by github.com/rverton/webanalyze.initWorker
/home/deepakgadupudi/gopath/src/github.com/rverton/webanalyze/webanalyze.go:122 +0x77
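The "negative WaitGroup counter" panic is raised when Done is called more often than Add. The following is a sketch of one common way to avoid that, not the actual webanalyze worker code: each job gets exactly one Add, and the matching Done is deferred so it runs once on both the error path and the success path. The job type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// job is a hypothetical unit of work; fail simulates "Failed to retrieve".
type job struct {
	host string
	fail bool
}

// process calls wg.Done exactly once through defer, whether or not the
// job fails.
func process(j job, wg *sync.WaitGroup, results chan<- string) {
	defer wg.Done()
	if j.fail {
		results <- j.host + ": failed to retrieve"
		return
	}
	results <- j.host + ": ok"
}

// runAll starts one goroutine per job and waits for all of them.
func runAll(jobs []job) []string {
	var wg sync.WaitGroup
	results := make(chan string, len(jobs)) // buffered so workers never block
	for _, j := range jobs {
		wg.Add(1) // one Add per job, matched by the single deferred Done
		go process(j, &wg, results)
	}
	wg.Wait()
	close(results)

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	out := runAll([]job{{host: "a.example", fail: true}, {host: "b.example"}})
	fmt.Println(len(out), "results")
}
```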
$ webanalyze -update
2017/04/19 01:22:30 app definition file updated from https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/apps.json
$ webanalyze -host google.com
2017/04/19 01:22:47 error initializing: json: cannot unmarshal string into Go struct field App.cats of type int
2017/04/19 01:22:47 Scanning with 4 workers.
2017/04/19 01:22:47 error creating export file: json: cannot unmarshal string into Go struct field App.cats of type int
When running the following command, the "-update" log line still appears in the output even though "-silent" is set:
./webanalyze -apps technologies.json -update -silent -output json -host 10.10.10.238
The Output:
2021/07/01 20:43:09 app definition file updated from https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/technologies.json
{"hostname":"http://10.10.10.238","matches":[{"app":{"cats":["22"],"category_names":["Web servers"],"cookies":null,"headers":{"Server":"(?:Apache(?:$|/([\\d.]+)|[^/-])|(?:^|\\b)HTTPD)\\;version:\\1"},"meta":null,"html":null,"script":null,"url":null,"website":"http://apache.org","implies":null},"app_name":"Apache","matches":[["Apache/2.4.29","2.4.29"]],"version":"2.4.29"},{"app":{"cats":["28"],"category_names":["Operating systems"],"cookies":null,"headers":{"Server":"Ubuntu","X-Powered-By":"Ubuntu"},"meta":null,"html":null,"script":null,"url":null,"website":"http://www.ubuntu.com/server","implies":null},"app_name":"Ubuntu","matches":[["Ubuntu"]],"version":""}]}
Notice how the first line, "2021/07/01 20:43:09 app definition file updated from https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/technologies.json" is still being included in the output.
Can we get that silenced too?
BTW - Great app!
webanalyze -hosts domains > stack
Do you have a way to save results??
Running build on Docker windows with WSL2 returns an error while building
=> [stage-1 2/6] RUN apk add --no-cache libc6-compat
=> [stage-1 3/6] WORKDIR /app
=> [build-env 2/2] RUN go install github.com/rverton/webanalyze/cmd/webanalyze@latest
=> [stage-1 4/6] COPY --from=build-env /go/bin/webanalyze .
=> [stage-1 5/6] RUN mkdir -p /app && adduser -D webanalyze && chown -R webanalyze:webanalyze /app
=> ERROR [stage-1 6/6] RUN ["./webanalyze", "-update"]
> [stage-1 6/6] RUN ["./webanalyze", "-update"]:
#12 0.426 Error loading shared library libresolv.so.2: No such file or directory (needed by ./webanalyze)
Can you guys add go mod? Easy for others to use.
It's an awesome project, thx.
Add cli flag and functionality to use a proxy server.
dest url : https://angularjs.org
First of all thank you! SO FAST! :)
Any plans to add the "confidence" tag from the Wappalyzer app? "Implies" works; adding "confidence" would be really useful! From the Wappalyzer docs: "confidence Indicates a less reliable pattern that may cause false positives. The aim is to achieve a combined confidence of 100%. Defaults to 100% if not specified."
Thanks again!
For example: User-Agent
webanalyze -apps /tmp/technologies.json -host examples.com
error: can not open apps file : could not find technologies file
Hey there!
Thanks for this port, this is really awesome!
I'm having an issue. Are you seeing this too?
2021/02/04 19:21:07 initialization failed: json: cannot unmarshal array into Go struct field App.technologies.meta of type string
version 0.3.2
Ty
When installing the tool using go install github.com/rverton/webanalyze/cmd/webanalyze@latest, it throws this error:
/home/michael/go/pkg/mod/golang.org/x/[email protected]/idna/idna10.0.0.go:25:2: golang.org/x/[email protected]: unexpected EOF
Thanks for this tool..!
I'm just having a problem storing the results in JSON or CSV format.
Version 0.3.3 tries to download the technologies.json file from https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/technologies.json, which is incorrect.
I installed your tool as described in the README (on Arch Linux) and it keeps giving the error below:
$ go run cmd/webanalyze/main.go -host https://host/path/
2017/08/10 16:24:18 error initializing: json: cannot unmarshal number into Go struct field App.cats of type string
Is this normal? After the previous error the execution seems to proceed but hang.
2017/08/10 16:24:18 Scanning with 4 workers.
Apparently the command line binary tries to read the hosts file when using the "update" flag. When using this flag, the check should stop right after the update:
if update {
	err = webanalyze.DownloadFile(webanalyze.WappalyzerURL, "apps.json")
	if err != nil {
		log.Fatalf("error: can not update apps file: %v", err)
	}
	log.Println("app definition file updated from ", webanalyze.WappalyzerURL)
}
// DEN1AL - this should not run!
// check single host or hosts file
if host != "" {
Looks like webanalyze (with this apps.json) doesn't detect as many applications as AliasIO/wappalyzer:
$ webanalyze -host="http://stackshare.io"
2017/04/19 13:44:32 Scanning with 4 workers.
2017/04/19 13:44:34 [+] http://stackshare.io (1.346715838s):
2017/04/19 13:44:34 - Google Font API - [17]
2017/04/19 13:44:34 - Nginx - [22]
2017/04/19 13:44:34 - Express - [18 22]
2017/04/19 13:44:34 - Ruby on Rails - [18]
$ docker run --rm wappalyzer/cli http://stackshare.io | jq '.applications | .[] | .name'
"Algolia Realtime Search"
"AngularJS"
"Express"
"Handlebars"
"Intercom"
"List.js"
"Mailchimp"
"Moment.js"
"New Relic"
"Nginx"
"React"
"Segment"
"Snap.svg"
"SweetAlert"
"Twitter Bootstrap"
"UserVoice"
"Varnish"
"jQuery"
"Node.js"
Hi Robin,
Firstly thanks for creating this amazing tool :)
I noticed that a technology sometimes gets matched multiple times and, as a result, is shown multiple times in the output. In the example below, PHP is listed more than once.
./webanalyze -host https://www.bugcrowd.com/
--
Regards,
@bugbaba
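A small sketch of how duplicate matches could be collapsed before printing. The match type and the dedupe function are hypothetical and much simpler than the tool's real result structure; the first entry per technology is kept, and a version from a later duplicate is used if the first had none.

```go
package main

import "fmt"

// match is a hypothetical, simplified result entry.
type match struct {
	App     string
	Version string
}

// dedupe keeps the first entry per app name and fills in a version if a
// later duplicate carries one.
func dedupe(in []match) []match {
	index := make(map[string]int) // app name -> position in out
	var out []match
	for _, m := range in {
		if i, seen := index[m.App]; seen {
			if out[i].Version == "" {
				out[i].Version = m.Version
			}
			continue
		}
		index[m.App] = len(out)
		out = append(out, m)
	}
	return out
}

func main() {
	results := []match{{App: "PHP"}, {App: "jQuery"}, {App: "PHP", Version: "7.4"}}
	for _, m := range dedupe(results) {
		fmt.Printf("%s, %s\n", m.App, m.Version)
	}
}
```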
Hi,
If I run the following URL 3VALLEYSMANUFACTURING.CO.UK through webanalyze then it does not report WordPress as being used.
But if I run the same URL through the Wappalyzer website then it does report WordPress.
Any idea why the two are not matching?
Hi @rverton
Wappalyzer has updated its schema.
Additional regexes (same structure as meta):
Additional app infos:
Furthermore, @AliasIO has added two new concepts: excludes and implies.
If the program detects Wordpress, it implies that PHP is also used. By the same principle, if the program detects SPDY, HTTP/2.0 cannot be detected.
There exists a problem with implies and excludes though; it can either be an array or a string. The app Wordpress has an array for implies but Adminer has implies as a string.
error initializing: json: cannot unmarshal number into Go struct field App.cats of type string
A lot of technologies which were detected in the past are not detected anymore. I guess some format changed in the upstream technologies.json, which needs to be verified.
./webanalyze.go -h
./webanalyze.go: line 1: package: command not found
./webanalyze.go: line 3: syntax error near unexpected token `newline'
./webanalyze.go: line 3: `import ('
Fresh Ubuntu 16.04 installation:
$ uname -a
Linux instance-1 4.13.0-1011-gcp #15-Ubuntu SMP Mon Feb 12 16:29:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
$ sudo apt install golang-go
...
$ go version
go version go1.6.2 linux/amd64
$ export GOPATH=$HOME
$ go get -u github.com/rverton/webanalyze/...
$ webanalyze -update
2018/03/07 09:13:00 app definition file updated from https://raw.githubusercontent.com/AliasIO/Wappalyzer/master/src/apps.json
$ webanalyze -host fb.me
2018/03/07 09:18:14 error initializing: json: cannot unmarshal object into Go value of type map[int]webanalyze.Category
2018/03/07 09:18:14 Scanning with 4 workers.
^C