webappanalyzer

This project is a continuation of the iconic Wappalyzer, which went private in August 2023.

First and foremost, Enthec commits to never making this repository private, since doing so would fall outside the scope of the company's business.

Our interest is to keep it growing, so it can be helpful to the community as it has been until now.

There are no changes to be expected in the library. We will update it with the same JSON structure currently in use so the user experience will not be modified.

Specification

A long list of regular expressions is used to identify technologies on web pages. Wappalyzer inspects HTML code, as well as JavaScript variables, response headers and more.

Patterns (regular expressions) are kept in src/technologies/. The following is an example of an application fingerprint.

Example

"Example": {
  "description": "A short description of the technology.",
  "cats": [
    1
  ],
  "cookies": {
    "cookie_name": "Example"
  },
  "dom": {
    "#example-id": {
      "exists": "",
      "attributes": {
        "class": "example-class"
      },
      "properties": {
        "example-property": ""
      },
      "text": "Example text content"
    }
  },
  "dns": {
    "MX": [
      "example\\.com"
    ]
  },
  "js": {
    "Example.method": ""
  },
  "excludes": "Example",
  "headers": {
    "X-Powered-By": "Example"
  },
  "text": "\\bexample\\b",
  "css": "\\.example-class",
  "robots": "Disallow: /unique-path/",
  "implies": "PHP\\;confidence:50",
  "requires": "WordPress",
  "requiresCategory": "Ecommerce",
  "meta": {
    "generator": "(?:Example|Another Example)"
  },
  "probe": {
    "/path": ""
  },
  "scriptSrc": "example-([0-9.]+)\\.js\\;confidence:50\\;version:\\1",
  "scripts": "function webpackJsonpCallback\\(data\\) {",
  "url": "example\\.com",
  "xhr": "example\\.com",
  "oss": true,
  "saas": true,
  "pricing": ["mid", "freemium", "recurring"],
  "website": "https://example.com"
}

JSON fields

Find the JSON schema at schema.json.

Required properties

Field Type Description Example
cats Array One or more category IDs. [1, 6]
website String URL of the application's website. "https://example.com"

Optional properties

Field Type Description Example
description String A short description of the technology in British English (max. 250 characters). Write in a neutral, factual tone; not like an ad. "A short description."
icon String Application icon filename. "WordPress.svg"
cpe String CPE is a structured naming scheme for technologies. To check if a CPE is valid and exists (using v2.3), use the search. "cpe:2.3:a:apache:http_server:*:*:*:*:*:*:*:*"
saas Boolean The technology is offered as a Software-as-a-Service (SaaS), i.e. hosted or cloud-based. true
oss Boolean The technology has an open-source license. true
pricing Array Cost indicator (based on a typical plan or average monthly price) and available pricing models. For paid products only. ["low", "freemium"]

One of:

  • low: Less than US $100 / mo
  • mid: Between US $100 and $1,000 / mo
  • high: More than US $1,000 / mo

Plus any of:

  • freemium: Free plan available
  • onetime: One-time payments accepted
  • recurring: Subscriptions available
  • poa: Price on asking
  • payg: Pay as you go (e.g. commissions or usage-based fees)
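
The pricing rules above can be checked mechanically. The following is a minimal sketch, assuming that "One of" means at most one cost tier per entry; the function name `pricing_problems` is hypothetical and not part of this repository.

```python
from typing import List

# Values taken from the lists above.
TIERS = {"low", "mid", "high"}
MODELS = {"freemium", "onetime", "recurring", "poa", "payg"}


def pricing_problems(pricing: List[str]) -> List[str]:
    """Return human-readable problems found in a `pricing` array."""
    problems = []
    unknown = [value for value in pricing if value not in TIERS | MODELS]
    if unknown:
        problems.append(f"unknown values: {unknown}")
    # Assumption: a technology carries at most one cost tier.
    if sum(1 for value in pricing if value in TIERS) > 1:
        problems.append("more than one cost tier")
    if len(set(pricing)) != len(pricing):
        problems.append("duplicate values")
    return problems


ok = pricing_problems(["low", "freemium"])
bad = pricing_problems(["low", "high"])
```

An empty list means the array satisfies these assumed rules.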

Implies, requires and excludes (optional)

Field Type Description Example
implies String | Array The presence of one application can imply the presence of another, e.g. WordPress means PHP is also in use. "PHP"
requires String | Array Similar to implies but detection only runs if the required technology has been identified. Useful for themes for a specific CMS. "WordPress"
requiresCategory int | Array Similar to requires; detection only runs if a technology in the required category id has been identified. "Ecommerce"
excludes String | Array Opposite of implies. The presence of one application can exclude the presence of another. "Apache"
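
Because these fields accept either a single string or an array, consumers usually normalise them before use. The following is a minimal sketch of that step; the helper name `as_list` is hypothetical and not part of this repository.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Normalise a 'String | Array' field: None -> [], scalar -> [scalar]."""
    if value is None:
        return []
    if isinstance(value, list):
        return list(value)
    return [value]


# Illustrative fingerprint fragment using values from the example above.
fingerprint = {"implies": "PHP\\;confidence:50", "requires": ["WordPress"]}

implies = as_list(fingerprint.get("implies"))
requires = as_list(fingerprint.get("requires"))
excludes = as_list(fingerprint.get("excludes"))
```

After this step every relationship field can be iterated the same way, whether the JSON used a string, an array, or omitted the field.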

Patterns (optional)

Field Type Description Example
cookies Object Cookies. { "cookie_name": "Cookie value" }
dom String | Array | Object Uses a query selector to inspect element properties, attributes and text content. { "#example-id": { "properties": { "example-prop": "" } } }
dns Object DNS records: supports MX, TXT, SOA and NS. { "MX": "example\\.com" }
js Object JavaScript properties (case sensitive). Avoid short property names to prevent matching minified code. { "jQuery.fn.jquery": "" }
headers Object HTTP response headers. { "X-Powered-By": "^WordPress$" }
text String | Array Matches plain text. Should only be used in very specific cases where other methods can't be used. "\\bexample\\b"
css String | Array CSS rules. Unavailable when a website enforces a same-origin policy. For performance reasons, only a portion of the available CSS rules are used to find matches. "\\.example-class"
probe Object Request a URL to test for its existence or match text content. { "/path": "Example text" }
robots String | Array Robots.txt contents. "Disallow: /unique-path/"
url String | Array Full URL of the page. "^https?://.+\\.wordpress\\.com"
xhr String | Array Hostnames of XHR requests. "cdn\\.netlify\\.com"
meta Object HTML meta tags, e.g. generator. { "generator": "^WordPress$" }
scriptSrc String | Array URLs of JavaScript files included on the page. "jquery\\.js"
scripts String | Array JavaScript source code. Inspects inline and external scripts. For performance reasons, avoid scripts where possible and use js instead. "function webpackJsonpCallback\\(data\\) {"
html (deprecated) String | Array HTML source code. Patterns must include an HTML opening tag to avoid matching plain text. For performance reasons, avoid html where possible and use dom instead. "<a [^>]*href=\"index.html"

Patterns

Patterns are essentially JavaScript regular expressions written as strings, but with some additions.

Quirks and pitfalls

  • Because of the string format, the escape character itself must be escaped when using special characters such as the dot (\\.). Double quotes must be escaped only once (\"). Slashes do not need to be escaped (/).
  • Flags are not supported. Regular expressions are treated as case-insensitive.
  • Capture groups (()) are used for version detection. In other cases, use non-capturing groups ((?:)).
  • Use start and end of string anchors (^ and $) where possible for optimal performance.
  • Short or generic patterns can cause applications to be identified incorrectly. Try to find unique strings to match.
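
The escaping and case-insensitivity rules above can be illustrated with a short sketch. It uses Python's `re` module purely for illustration; the patterns are written for JavaScript, so not every pattern is guaranteed to behave identically in Python.

```python
import json
import re

# A fragment as it would appear in a technology file: the backslash is
# doubled inside the JSON string, while the double quote is escaped once.
raw = r'{"css": "\\.example-class", "html": "<a [^>]*href=\"index.html"}'
patterns = json.loads(raw)

# After JSON decoding a single backslash remains, so the regex engine
# sees an escaped (literal) dot.
css_re = re.compile(patterns["css"], re.IGNORECASE)  # no flags in the JSON

matched = css_re.search(".EXAMPLE-CLASS { color: red }") is not None
```

Here `patterns["css"]` is the two-character escape `\.` followed by `example-class`, and the match succeeds regardless of case.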

Tags

Tags (a non-standard syntax) can be appended to patterns (and implies and excludes, separated by \\;) to store additional information.

Tag Description Example
confidence Indicates a less reliable pattern that may cause false positives. The aim is to achieve a combined confidence of 100%. Defaults to 100% if not specified. "js": { "Mage": "\\;confidence:50" }
version Gets the version number from a pattern match using a special syntax. "scriptSrc": "jquery-([0-9.]+)\\.js\\;version:\\1"
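
As a rough illustration of how tags can be separated from the regular expression, the sketch below splits a decoded pattern string on the `\;` separator. The function name `split_tags` is hypothetical, and this is not necessarily how any particular consumer of the library parses tags.

```python
from typing import Dict, Tuple


def split_tags(pattern: str) -> Tuple[str, Dict[str, str]]:
    """Split a decoded pattern into its regex part and a dict of tags."""
    regex, *raw_tags = pattern.split("\\;")
    tags = {}
    for raw_tag in raw_tags:
        # Each tag has the form name:value, e.g. confidence:50.
        name, _, value = raw_tag.partition(":")
        tags[name] = value
    return regex, tags


# The Python literal below equals the JSON-decoded form of the
# scriptSrc value in the example fingerprint (with jquery as the name).
regex, tags = split_tags("jquery-([0-9.]+)\\.js\\;confidence:50\\;version:\\1")
```

A pattern that consists only of tags, such as the `Mage` example above, yields an empty regex part.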

Version syntax

Application version information can be obtained from a pattern using a capture group. A condition can be evaluated using the ternary operator (?:).

Example Description
\\1 Returns the first match.
\\1?a: Returns a if the first match contains a value, nothing otherwise.
\\1?a:b Returns a if the first match contains a value, b otherwise.
\\1?:b Returns nothing if the first match contains a value, b otherwise.
foo\\1 Returns foo with the first match appended.
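
The table above can be read as a small template language. The following sketch resolves such a template against a regex match; it is an illustration under the semantics listed in the table, not the library's actual implementation, and the name `resolve_version` is hypothetical.

```python
import re

_TERNARY = re.compile(r"\\(\d)\?([^:]*):(.*)$")
_BACKREF = re.compile(r"\\(\d)")


def resolve_version(template: str, match: "re.Match[str]") -> str:
    """Resolve a decoded version template such as \\1 or \\1?a:b."""

    def group(index: int) -> str:
        # Missing or unmatched groups count as empty.
        if index > match.re.groups:
            return ""
        return match.group(index) or ""

    def ternary(m: "re.Match[str]") -> str:
        return m.group(2) if group(int(m.group(1))) else m.group(3)

    resolved = _TERNARY.sub(ternary, template)
    return _BACKREF.sub(lambda m: group(int(m.group(1))), resolved)


found = re.search(r"example-([0-9.]+)\.js", "example-1.2.3.js")
empty = re.search(r"example-([0-9.]*)\.js", "example-.js")

plain = resolve_version("\\1", found)
prefixed = resolve_version("foo\\1", found)
fallback = resolve_version("\\1?a:b", empty)
```

The ternary form is handled first so that back-references inside either branch are still substituted afterwards.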

webappanalyzer's Issues

IIS technology CPE typo/correction

The CPE for IIS is currently defined in the signatures as
cpe:2.3:a:microsoft:internet_information_server:*:*:*:*:*:*:*:*

It should be
cpe:2.3:a:microsoft:internet_information_services:*:*:*:*:*:*:*:*

Wagtail regex error

Platform
macOS 12.7 with Python 3.10.

Describe the bug
The Wagtail DOM regexes ([style*='images/'], img[src*='images/'], etc.) do not compile in Python 3.10.

To Reproduce

import re
re.compile("(?:\\.[a-z]+|/media)(?:/[\\w-]+)?/(?:original_images/[\\w-]+|images/[\\w-.]+\\.(?:(?:fill|max|min)-\\d+x\\d+(?:-c\\d+)?|(?:width|height|scale)-\\d+|original))\\.")

Expected behavior
The regex should compile.

Additional context
The source of the bug is [\\w-.]. When modified to [\\w-\\.] everything works accordingly. Before opening a PR to fix this problem, I wanted to be sure I didn't miss something about how JavaScript regexes work.
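
For reference, the failure can be reduced to the character class alone. The sketch below is an illustration rather than the fix adopted by the project: it shows that Python rejects a hyphen placed between a class escape and another character, and that one portable alternative (an assumption on my part, not taken from the report) is to move the hyphen to the end of the class.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Reduced from the Wagtail pattern above.
original = r"images/[\w-.]+\."
# Hyphen moved to the end of the class, where it is always literal.
hyphen_last = r"images/[\w.-]+\."

original_ok = compiles(original)
hyphen_last_ok = compiles(hyphen_last)
```

Both spellings are intended to match the same characters (word characters, dot and hyphen).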

`await wappalyzer.destroy();` hangs forever

Platform
macOS

Describe the bug
After await wappalyzer.init();, calling await wappalyzer.destroy(); hangs forever, even though I have checked that I am calling it the right way.

When doing console.log(wappalyzer.destroy) I get [AsyncFunction: destroy].

It was in their last public documentation https://www.npmjs.com/package/wappalyzer/v/6.10.66 so I don't understand what I'm doing wrong 😢.

Additional context
Version v6.10.66

Actual pattern validation?

Is your feature request related to a problem? Please describe.
As part of the CI workflow for PRs (etc.), would it be possible to validate the regex patterns and DOM selectors?

In the past, with the upstream AliasIO project, we encountered invalid regex patterns and invalid selectors that had been added to the technology files.

The two "normal" cases seemed to be:

  • when a regex contained curly braces which either weren't matched in a repetition declaration or when they weren't escaped.
  • when a dom selector had unbalanced single or double quotes.

There are plenty of other things that can make a regex or DOM selector invalid, so it would be good to catch and fix these early.

Describe the solution you'd like
I believe this could be added to the existing Python based validation. In Java a pattern can be compiled (Pattern.compile(String)) at which point an exception would be thrown if invalid. We also came up with something similar for DOM selectors. I assume something similar can be done with Python.
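
As an illustration of the idea, a check along these lines could look like the following. This is a sketch only: the function names and the field lists are assumptions, it is not the repository's existing validator, and Python's `re` dialect differs from JavaScript's, so some patterns that are valid in JavaScript may still be reported.

```python
import re
from typing import Any, Dict, Iterator, List, Tuple

# Assumed subset of pattern fields, chosen for illustration.
STRING_FIELDS = ("url", "xhr", "html", "scripts", "scriptSrc", "text", "css", "robots")
OBJECT_FIELDS = ("cookies", "headers", "meta", "js")


def iter_patterns(tech: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield (field, pattern) pairs for every pattern of one technology."""
    for field in STRING_FIELDS:
        value = tech.get(field, [])
        for pattern in [value] if isinstance(value, str) else value:
            yield field, pattern
    for field in OBJECT_FIELDS:
        for value in tech.get(field, {}).values():
            for pattern in [value] if isinstance(value, str) else value:
                yield field, pattern


def find_invalid(technologies: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (technology, field, message) for patterns that fail to compile."""
    errors = []
    for name, tech in technologies.items():
        for field, pattern in iter_patterns(tech):
            regex = pattern.split("\\;")[0]  # drop tags such as \;version:\1
            try:
                re.compile(regex)
            except re.error as exc:
                errors.append((name, field, f"{regex!r}: {exc}"))
    return errors


# Hypothetical sample data; "Bad" has an unbalanced parenthesis.
sample = {
    "Good": {"url": "example\\.com\\;confidence:50"},
    "Bad": {"scripts": "foo(bar"},
}
errors = find_invalid(sample)
```

A CI job could fail the build whenever the returned list is non-empty.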

Describe alternatives you've considered

  • Live with the errors and correct them as they're noticed.

Additional context
Not sure what else to say here. Mainly I was thinking that catching potential errors as close to introduction as possible would be the easiest way to address/prevent them.

ZAP use of webappanalyzer

Hiya, ZAP team here (including @thc202, @kingthorin, @ricekot),

As you may know we have a ZAP add-on which wraps the old Wappalyzer functionality: https://www.zaproxy.org/docs/desktop/addons/technology-detection/

Obviously this is no longer being updated ☹️

Are you ok with us migrating to use webappanalyzer instead?
We will update that link to give you credit.

FYI we have a GitHub action which automatically pulls the source. This also has some unit tests - we've found bugs in Wappalyzer regexes quite a few times, and always report these back.

Include information on vulnerable tech

Describe the solution you'd like
Included with each tech found, indicate if it's vulnerable and a link to the vulnerability such as a CWE. Expose this data in such a way that another tool, such as ZAP, can leverage it.

Describe alternatives you've considered
Searching using other tools.

Python regex compilation errors

Platform
macOS 12.7 with Python 3.10.

Describe the bug
After my last issue, I tried to test all the regexes with Python and found 3861 errors.

To Reproduce
See this gist

Expected behavior
All regexes should compile in Python.

Additional context
No

Should migrate schema.json

It would probably be a good idea for this repo to include the latest schema.json from the original project.

Consistency issues in JSON schema

Describe the bug
The definition of the schema in schema.json specifies that some fields may be one of several types.

For instance, the field implies can either be a string or an array (of strings):

"implies": {
  "oneOf": [
    {
      "type": "array",
      "items": {
        "$ref": "#/definitions/non-empty-non-blank-string"
      }
    },
    {
      "$ref": "#/definitions/non-empty-non-blank-string"
    }
  ]
},

This makes it really hard to parse the JSON object in languages like Go where you need to define the types statically:

type Techno struct {
    Categories  []int    `json:"cats"`
    Implies     []string `json:"implies"`  // This may just be a string!
}

Trying to deserialize a JSON object into an instance of this struct will return an error if the JSON input is using a string type instead of an array of strings:

json: cannot unmarshal string into Go struct field Techno.implies of type []string

Expected behavior
Have a single type for each field.
