
JAW: A Graph-based Security Analysis Framework for Client-side JavaScript

Home Page: https://ja-w.me

License: GNU Affero General Public License v3.0


JAW

Website | Docs | Setup | Crawler | Quick Start Docker (Example)


An open-source prototype implementation of property graphs for JavaScript, based on the Esprima parser and the ESTree SpiderMonkey spec. JAW can be used for analyzing the client-side of web applications and JavaScript-based programs.

This project is licensed under the GNU Affero General Public License v3.0. See here for more information.

JAW has a GitHub Pages website available at https://soheilkhodayari.github.io/JAW/.

Content

Overview of JAW

  1. Test Inputs
  2. Data Collection
  3. HPG Construction
  4. Analysis and Outputs

Setup

  1. Installation

Quick Start

  1. Running the Pipeline
  2. Quick Example
  3. Crawling and Data Collection
  4. Graph Construction
  5. Security Analysis
  6. Test Web Application

Further Information

  1. Detailed Documentation
  2. Contribution and Code of Conduct
  3. Academic Publication

Overview of JAW

The architecture of JAW is shown below.

Test Inputs

JAW can be used in two distinct ways:

  1. Arbitrary JavaScript Analysis: Utilize JAW for modeling and analyzing any JavaScript program by specifying the program's file system path.

  2. Web Application Analysis: Analyze a web application by providing a single seed URL.

Data Collection

  • JAW features several JavaScript-enabled web crawlers for collecting web resources at scale.

HPG Construction

  • Use the collected web resources to create a Hybrid Program Graph (HPG), which will be imported into a Neo4j database.

  • Optionally, supply the HPG construction module with a mapping of semantic types to custom JavaScript language tokens, facilitating the categorization of JavaScript functions based on their purpose (e.g., HTTP request functions).
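As an illustration of such a mapping, the sketch below pairs semantic types with the JavaScript tokens they should annotate. The type names and token lists here are hypothetical; the actual input format is described in the JAW documentation.

```python
# Hypothetical mapping of semantic types to JavaScript language tokens.
# The type names (e.g. "REQ") and token lists below are illustrative only;
# the real input format is described in the JAW documentation.
SEMANTIC_TYPES = {
    "REQ": ["fetch", "XMLHttpRequest.open", "$.ajax"],       # HTTP request functions
    "STORAGE": ["localStorage.setItem", "document.cookie"],  # storage writes
}

def types_for_token(token, mapping=SEMANTIC_TYPES):
    """Return every semantic type whose token list contains `token`."""
    return sorted(t for t, tokens in mapping.items() if token in tokens)
```

With a mapping like this, a graph query can then categorize, say, every call to fetch as an HTTP request function.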

Analysis and Outputs

  • Query the constructed Neo4j graph database for various analyses. JAW offers utility traversals for data flow analysis, control flow analysis, reachability analysis, and pattern matching. These traversals can be used to develop custom security analyses.

  • JAW also includes built-in traversals for detecting client-side CSRF, DOM Clobbering and request hijacking vulnerabilities.

  • The outputs will be stored in the same folder as that of the input.

Setup

The installation script relies on the following prerequisites:

  • The latest version of the npm package manager (Node.js)
  • Any stable version of Python 3.x
  • The Python pip package manager

Afterwards, install the necessary dependencies via:

$ ./install.sh

For detailed installation instructions, please see here.

Quick Start

Running the Pipeline

You can run an instance of the pipeline in a background screen via:

$ python3 -m run_pipeline --conf=config.yaml

The CLI provides the following options:

$ python3 -m run_pipeline -h

usage: run_pipeline.py [-h] [--conf FILE] [--site SITE] [--list LIST] [--from FROM] [--to TO]

This script runs the tool pipeline.

optional arguments:
  -h, --help            show this help message and exit
  --conf FILE, -C FILE  pipeline configuration file. (default: config.yaml)
  --site SITE, -S SITE  website to test; overrides config file (default: None)
  --list LIST, -L LIST  site list to test; overrides config file (default: None)
  --from FROM, -F FROM  the first entry to consider when a site list is provided; overrides config file (default: -1)
  --to TO, -T TO        the last entry to consider when a site list is provided; overrides config file (default: -1)

Input Config: JAW expects a .yaml config file as input. See config.yaml for an example.

Hint. The config file specifies different passes (e.g., crawling, static analysis, etc.) which can be enabled or disabled for each vulnerability class. This allows running the tool's building blocks individually, or in a different order (e.g., crawl all webapps first, then conduct the security analysis).
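As a sketch of how such a configuration is shaped, the snippet below models the enabled/disabled passes as a plain dictionary and extracts the active ones. The key names mirror the example config.yaml excerpts later in this README; the helper itself is illustrative, not JAW's actual loader.

```python
# Illustrative model of the pass toggles in config.yaml (key names taken
# from the example config later in this README); not JAW's actual loader.
example_config = {
    "cs_csrf": {
        "enabled": True,
        "passes": {"crawling": True, "static": True, "static_neo4j": False},
    },
    "domclobbering": {"enabled": False},
}

def enabled_passes(config, component):
    """Return the enabled pass names for a vulnerability component,
    or an empty list when the component itself is disabled."""
    section = config.get(component, {})
    if not section.get("enabled"):
        return []
    return [name for name, on in section.get("passes", {}).items() if on]
```

Disabling a component skips all of its passes, which is what lets you crawl everything first and analyze later.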

Quick Example

To run a quick example demonstrating how to build a property graph and run Cypher queries over it, do:

$ python3 -m analyses.example.example_analysis --input=$(pwd)/data/test_program/test.js

Crawling and Data Collection

This module collects the data (i.e., JavaScript code and state values of web pages) needed for testing. If you want to test a specific JavaScript file that you already have on your file system, you can skip this step.

JAW has crawlers based on Selenium (JAW-v1), Puppeteer (JAW-v2, v3) and Playwright (JAW-v3). For the most up-to-date features, it is recommended to use the Puppeteer- or Playwright-based versions.

Playwright CLI with Foxhound

This web crawler employs foxhound, an instrumented version of Firefox, to perform dynamic taint tracking as it navigates through webpages. To start the crawler, do:

$ cd crawler
$ node crawler-taint.js --seedurl=https://google.com --maxurls=100 --headless=true --foxhoundpath=<optional-foxhound-executable-path>

By default, foxhoundpath is set to the directory crawler/foxhound/firefox, which is expected to contain a binary named firefox.

Note: you need a build of foxhound to use this version. An Ubuntu build is included in the JAW-v3 release.

Puppeteer CLI

To start the crawler, do:

$ cd crawler
$ node crawler.js --seedurl=https://google.com --maxurls=100 --browser=chrome --headless=true

See here for more information.

Selenium CLI

To start the crawler, do:

$ cd crawler/hpg_crawler
$ vim docker-compose.yaml # set the websites you want to crawl here and save
$ docker-compose build
$ docker-compose up -d

Please refer to the documentation of the hpg_crawler here for more information.

Graph Construction

HPG Construction CLI

To generate an HPG for a given (set of) JavaScript file(s), do:

$ node engine/cli.js  --lang=js --graphid=graph1 --input=/in/file1.js --input=/in/file2.js --output=$(pwd)/data/out/ --mode=csv

optional arguments:
  --lang: 	language of the input program
  --graphid:  an identifier for the generated HPG
  --input: 	path of the input program(s)
  --output: 	path of the output HPG
  --mode: 	determines the output format (csv or graphML)

HPG Import CLI

To import an HPG into a Neo4j graph database (Docker instance), do:

$ python3 -m hpg_neo4j.hpg_import --rpath=<path-to-the-folder-of-the-csv-files> --id=<xyz> --nodes=<nodes.csv> --edges=<rels.csv>
$ python3 -m hpg_neo4j.hpg_import -h

usage: hpg_import.py [-h] [--rpath P] [--id I] [--nodes N] [--edges E]

This script imports a CSV of a property graph into a neo4j docker database.

optional arguments:
  -h, --help  show this help message and exit
  --rpath P   relative path to the folder containing the graph CSV files inside the `data` directory
  --id I      an identifier for the graph or docker container
  --nodes N   the name of the nodes csv file (default: nodes.csv)
  --edges E   the name of the relations csv file (default: rels.csv)

HPG Construction and Import CLI (v1)

To create a hybrid property graph for the output of the hpg_crawler and import it into a local Neo4j instance, you can also do:

$ python3 -m engine.api <path> --js=<program.js> --import=<bool> --hybrid=<bool> --reqs=<requests.out> --evts=<events.out> --cookies=<cookies.pkl> --html=<html_snapshot.html>

Specification of Parameters:

  • <path>: absolute path to the folder containing the program files for analysis (must be under the engine/outputs folder).
  • --js=<program.js>: name of the JavaScript program for analysis (default: js_program.js).
  • --import=<bool>: whether the constructed property graph should be imported into an active Neo4j database (default: true).
  • --hybrid=<bool>: whether the hybrid mode is enabled (default: false). This implies that the tester wants to enrich the property graph by inputting files for any of the HTML snapshot, fired events, HTTP requests and cookies, as collected by the JAW crawler.
  • --reqs=<requests.out>: for hybrid mode only, name of the file containing the sequence of observed network requests; pass the string false to exclude (default: request_logs_short.out).
  • --evts=<events.out>: for hybrid mode only, name of the file containing the sequence of fired events, pass the string false to exclude (default: events.out).
  • --cookies=<cookies.pkl>: for hybrid mode only, name of the file containing the cookies, pass the string false to exclude (default: cookies.pkl).
  • --html=<html_snapshot.html>: for hybrid mode only, name of the file containing the DOM tree snapshot, pass the string false to exclude (default: html_rendered.html).

For more information, you can use the help CLI provided with the graph construction API:

$ python3 -m engine.api -h

Security Analysis

The constructed HPG can then be queried using Cypher or the NeoModel ORM.
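For a flavour of what such queries look like, the helper below assembles a simple Cypher pattern match over AST nodes. The node label (ASTNode) and property names (Type, Code) are placeholders, not JAW's actual schema; consult the HPG data model for the real node labels and properties.

```python
# Assemble a Cypher query matching call expressions of a given callee.
# "ASTNode", "Type" and "Code" are hypothetical placeholders; the real
# schema is defined by JAW's HPG data model. In production, pass the
# callee as a query parameter instead of interpolating it into the string.
def call_expression_query(callee):
    return (
        "MATCH (n:ASTNode {Type: 'CallExpression'}) "
        f"WHERE n.Code STARTS WITH '{callee}(' "
        "RETURN n"
    )
```

Such a query string can then be executed against the imported graph through any Neo4j client.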

Running Custom Graph Traversals

You should place and run your queries in analyses/<ANALYSIS_NAME>.

Option 1: Using the NeoModel ORM (Deprecated)

You can use the NeoModel ORM to query the HPG. To write a query:

  • (1) Check out the HPG data model and syntax tree.
  • (2) Check out the ORM model for HPGs.
  • (3) See the example query file provided, example_query_orm.py, in the analyses/example folder:

$ python3 -m analyses.example.example_query_orm

For more information, please see here.

Option 2: Using Cypher Queries

You can use Cypher to write custom queries. For this:

  • (1) Check out the HPG data model and syntax tree.
  • (2) See the example query file provided, example_query_cypher.py, in the analyses/example folder:

$ python3 -m analyses.example.example_query_cypher

For more information, please see here.

Vulnerability Detection

This section describes how to configure and use JAW for vulnerability detection, and how to interpret the output. JAW contains, among others, self-contained queries for detecting client-side CSRF and DOM Clobbering.

Step 1. Enable the analysis component for the vulnerability class in the input config.yaml file:

request_hijacking:
  enabled: true 
  # [...]
  # 
domclobbering:
  enabled: false
  # [...]

cs_csrf:
  enabled: false
  # [...]

Step 2. Run an instance of the pipeline with:

$ python3 -m run_pipeline --conf=config.yaml

Hint. You can run multiple instances of the pipeline under different screens:

$ screen -dmS s1 bash -c 'python3 -m run_pipeline --conf=conf1.yaml; exec sh'
$ screen -dmS s2 bash -c 'python3 -m run_pipeline --conf=conf2.yaml; exec sh'
$ # [...]

To generate parallel configuration files automatically, you may use the generate_config.py script.

How to Interpret the Output of the Analysis?

The outputs will be stored in a file called sink.flows.out in the same folder as that of the input. For client-side CSRF, for example, JAW outputs an entry for each detected HTTP request, marking the set of semantic types (a.k.a. semantic tags or labels) associated with the elements constructing the request (i.e., the program slices). For example, an HTTP request marked with the semantic type ['WIN.LOC'] is forgeable through the window.location injection point, whereas a request marked with ['NON-REACH'] is not forgeable.

An example output entry is shown below:

[*] Tags: ['WIN.LOC']
[*] NodeId: {'TopExpression': '86', 'CallExpression': '87', 'Argument': '94'}
[*] Location: 29
[*] Function: ajax
[*] Template: ajaxloc + "/bearer1234/"
[*] Top Expression: $.ajax({ xhrFields: { withCredentials: "true" }, url: ajaxloc + "/bearer1234/" })

1:['WIN.LOC'] variable=ajaxloc
	0 (loc:6)- var ajaxloc = window.location.href

This entry shows that on line 29, there is a $.ajax call expression, and this call expression triggers an ajax request with the URL template value of ajaxloc + "/bearer1234/", where the parameter ajaxloc is a program slice reading its value at line 6 from window.location.href, and is thus forgeable through ['WIN.LOC'].
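Assuming the entry format shown above, the header fields can be pulled out with a few lines of Python. The parser is illustrative and not part of JAW; it relies only on the [*] Key: value shape of the example.

```python
import ast

# Parse the "[*] Key: value" header lines of an output entry. The field
# names are inferred from the example entry above; this parser is an
# illustration, not shipped with JAW.
def parse_entry(text):
    fields = {}
    for line in text.splitlines():
        if line.startswith("[*] ") and ": " in line:
            key, _, value = line[4:].partition(": ")
            fields[key] = value
    return fields

entry = """[*] Tags: ['WIN.LOC']
[*] Location: 29
[*] Function: ajax"""

fields = parse_entry(entry)
tags = ast.literal_eval(fields["Tags"])   # "['WIN.LOC']" -> ['WIN.LOC']
forgeable = "NON-REACH" not in tags       # NON-REACH marks unreachable requests
```

A post-processing script can use such a parser to aggregate forgeable requests across many analyzed pages.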

Test Web Application

To streamline the testing process for JAW and ensure that your setup is correct, we provide a simple Node.js web application with which you can test JAW.

First, install the dependencies via:

$ cd tests/test-webapp
$ npm install

Then, run the application in a new screen:

$ screen -dmS jawwebapp bash -c 'PORT=6789 npm run devstart; exec sh'

Detailed Documentation

For more information, visit our wiki page here. Below is a table of contents for quick access.

The Web Crawler of JAW

Data Model of Hybrid Property Graphs (HPGs)

Graph Construction

Graph Traversals

Contribution and Code Of Conduct

Pull requests are always welcome. This project is intended to be a safe, welcoming space, and contributors are expected to adhere to the contributor code of conduct.

Academic Publication

If you use JAW for academic research, we encourage you to cite the following paper:

@inproceedings{JAW,
  title = {JAW: Studying Client-side CSRF with Hybrid Property Graphs and Declarative Traversals},
  author= {Soheil Khodayari and Giancarlo Pellegrino},
  booktitle = {30th {USENIX} Security Symposium ({USENIX} Security 21)},
  year = {2021},
  address = {Vancouver, B.C.},
  publisher = {{USENIX} Association},
}   

Acknowledgements

JAW has come a long way and we want to give our contributors a well-deserved shoutout here!

@tmbrbr, @c01gide, @jndre, and Sepehr Mirzaei.


jaw's Issues

Feature: option for incremental storage of graph construction output during construction

Feature

Introduce an option in the configuration file to sequentially save the outcomes of the graph construction phase to disk as nodes and edges are generated.

Why Needed?

This functionality is especially valuable when a timeout is configured: instead of losing everything, we retain partial results.

Notes

This feature used to exist (partially) in v1. It has to be adapted to the new implementation.

[Crawler] New bs4 scriptElement.text returns empty for non-empty scripts

The new version of the Beautiful Soup library is buggy with respect to the scriptElement.text API, which is used to fetch the script content in the hpg_crawler component. This API randomly returns None or empty for some webpages while the script content is non-empty, which leads to missing JavaScript in the concatenated output program.

The code must use the scriptElement.string API instead for more reliable script collection and compatibility with the newer version.

Parallelization of IPCG alias check

The alias check function during the IPCG build is one of the most time-consuming steps when analyzing large codebases (30k+ lines).
I wrote a proof-of-concept C++ executable that executes the fixed alias check from #11 on several threads.

The PoC calls the executable via execSync and exchanges inputs and outputs in JSON files, which is not efficient.
A native Node.js module that exchanges data via Node-API would probably be more efficient.

// #define JSON_DIAGNOSTICS 1
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include "./json.hpp"
#include <thread>
#include <shared_mutex>

std::shared_mutex rw_mutex;

size_t isValidSubstringPosition(const std::string& str, const std::string& substring) {
    size_t pos = str.find(substring);
    if (pos == std::string::npos) {
        return std::string::npos;
    }

    bool validStart = pos == 0 || str[pos - 1] == '.' || substring.front() == '.';
    bool validEnd = pos + substring.length() == str.length() || str[pos + substring.length()] == '.' || substring.back() == '.';

    if (validStart && validEnd) {
        return pos;
    }
    return std::string::npos;
}

void processChunk(const std::vector<std::pair<std::string, std::string>>& chunk,
                  const std::vector<std::pair<std::string, std::string>>& shared_function_map_pairs, std::vector<std::pair<std::string, std::string>>& shared_new_alias_pairs) {

    for (const auto& alias_pair : chunk) {
        std::vector<std::pair<std::string, std::string>> local_pairs;
        std::vector<std::pair<std::string, std::string>> new_pairs;

        {
            std::shared_lock<std::shared_mutex> readLock(rw_mutex);
            local_pairs = shared_new_alias_pairs;
        }

        for (const auto& function_map_pair : shared_function_map_pairs) {
            size_t pos = isValidSubstringPosition(function_map_pair.first, alias_pair.first);
            if (pos != std::string::npos) {
                std::string newKey = function_map_pair.first;
                newKey.replace(pos, alias_pair.first.length(), alias_pair.second);
                new_pairs.push_back({newKey, function_map_pair.second});
            }
        }

        for (const auto& function_map_pair : local_pairs) {
            size_t pos = isValidSubstringPosition(function_map_pair.first, alias_pair.first);
            if (pos != std::string::npos) {
                std::string newKey = function_map_pair.first;
                newKey.replace(pos, alias_pair.first.length(), alias_pair.second);
                new_pairs.push_back({newKey, function_map_pair.second});
            }
        }

        // Update the shared container with the new pairs
        if (!new_pairs.empty()) {
            std::unique_lock<std::shared_mutex> writeLock(rw_mutex);
            shared_new_alias_pairs.insert(shared_new_alias_pairs.end(), new_pairs.begin(), new_pairs.end());
        }
    }
}

int main(int argc, char* argv[]) {
    if (argc != 3) {
        std::cerr << "Usage: program <pairs_file> <function_map_file>" << std::endl;
        return 1;
    }

    std::ifstream aliasPairsStream(argv[1]);
    std::ifstream functionMapStream(argv[2]);

    nlohmann::json aliasPairsJson;
    nlohmann::json functionMapJson;

    aliasPairsStream >> aliasPairsJson;
    std::cout << "decoded alias pairs" << std::endl;
    functionMapStream >> functionMapJson;
    std::cout << "decoded function map" << std::endl;


    std::vector<std::pair<std::string, std::string>> alias_pairs;
    for (const auto& pair : aliasPairsJson) {
        if (pair[0].is_null() || pair[1].is_null()) {
            continue;
        }
        alias_pairs.emplace_back(pair[0].get<std::string>(), pair[1].get<std::string>());
    }

    std::vector<std::pair<std::string, std::string>> function_map_pairs;
    for (auto it = functionMapJson.begin(); it != functionMapJson.end(); ++it) {
        function_map_pairs.emplace_back(it.key(), it.key());
    }

    std::vector<std::pair<std::string, std::string>> new_alias_pairs;

    const int num_threads = std::thread::hardware_concurrency();
    std::vector<std::thread> threads;
    int chunk_size = alias_pairs.size() / num_threads;

    for (int i = 0; i < num_threads; ++i) {
        int start_idx = i * chunk_size;
        int end_idx = (i == num_threads - 1) ? alias_pairs.size() : start_idx + chunk_size;

        std::vector<std::pair<std::string, std::string>> chunk(alias_pairs.begin() + start_idx, alias_pairs.begin() + end_idx);
        threads.push_back(std::thread(processChunk, chunk, std::ref(function_map_pairs), std::ref(new_alias_pairs)));
    }

    // Join threads
    for (auto& th : threads) {
        th.join();
    }

    // Serialize and write to out.json
    nlohmann::json outputJson = new_alias_pairs;
    std::ofstream out("out.json");
    out << outputJson.dump(4);
    out.close();

    return 0;
}

checkFunctionMapForPartialAliasing does not check if the detected substring is a valid alias

The function checkFunctionMapForPartialAliasing checks for aliasing.
For an existing assignment let alias = foo; the function searches whether foo is a substring of any functionName in the functionMap. If such a substring is found, the function replaces foo with alias and creates a new entry for the resulting string in the functionMap.

However, the function does not check that the substring is correctly terminated as shown by the following example:

<html>

<body>
    <script>
        let source = location.hash;

        class Foo {
            constructor() { }

            unsafe(param) {
                fetch(param);
            }

            otherUnsafe(otherParam) {
                fetch(otherParam);
            }
        }

        let foo = new Foo();
        let alias = foo;
        alias.unsafe(source); // correct invocation

        let foobar = new Foo();
        let uzzy = oobar;
        fuzzy.otherUnsafe(source); // false invocation
    </script>
</body>

</html>

sinks.flows.out:

====================================================
[timestamp] generated on 27/10/2023 06:41:16
====================================================

[*] webpage URL: http://localhost/test-aliasing.html

----------------------------------------------------

[*] webpage: bd1dbaa1763f35715a8dc4101fcf31a97cdf82e4da035aaf5732fa04edcba7f2
[*] script: 0.js
[*] semantic_types: ['RD_WIN_LOC', 'WR_REQ_URL']
[*] node_id: 21
[*] cfg_node_id: 20
[*] loc: 10
[*] sink_type: fetch
[*] sink_code: fetch(param)

1:['RD_WIN_LOC'] variable=param
	1 (loc:6)- param = FUNCTION_ARGUMENT
	2 (loc:17)- param <--(invocation-value)-- [def-scope-id=43] source
	3 (loc:1)- [scope-id=43]  let source = location.hash


----------------------------------------------------
[*] webpage: bd1dbaa1763f35715a8dc4101fcf31a97cdf82e4da035aaf5732fa04edcba7f2
[*] script: 0.js
[*] semantic_types: ['RD_WIN_LOC', 'WR_REQ_URL']
[*] node_id: 30
[*] cfg_node_id: 29
[*] loc: 15
[*] sink_type: fetch
[*] sink_code: fetch(otherParam)

1:['RD_WIN_LOC'] variable=otherParam
	1 (loc:10)- otherParam = FUNCTION_ARGUMENT
	2 (loc:21)- otherParam <--(invocation-value)-- [def-scope-id=58] source
	3 (loc:1)- [scope-id=58]  let source = location.hash


----------------------------------------------------

The function falsely detects a flow at line 21 of the code, in the nonexistent function fuzzy.
This happens because oobar in foobar is replaced with uzzy.

As a side effect, this bug leads us to find many more aliases during IPCG creation, and the exponential growth of the functionMap drastically impacts performance.

The fix is to check if the discovered substring is correctly terminated by . :

var checkFunctionMapForPartialAliasing = function(pairs){
  for(var i=0; i< pairs.length; i++){
          var partialActualName = pairs[i][0];
          var partialAliasName = pairs[i][1];
  
          if(partialActualName !== undefined && partialAliasName !== undefined){
              // console.log(i + "/" + pairs.length + " " +  Object.keys(functionMap).length);
              for(var functionName in functionMap){
                  if (functionName !== undefined) {
                      let len = partialActualName.length;
                      let idx = functionName.indexOf(partialActualName);
  
                      if(idx == -1){
                          continue;
                      }
  
                      if((partialActualName[0] == '.' || idx == 0 || functionName[idx-1] == '.' ) && (partialActualName[len-1] == '.' || idx+len == functionName.length || functionName[idx+len] == '.') ){
                          var newName = functionName.replace(partialActualName, partialAliasName);
                          functionMap[newName] = functionMap[functionName];
                          // console.log(functionName + " <=> " + newName + "\n" + Object.keys(functionMap).length);
                      }
                  }
              }
          }
      }
}

I am probably missing some edge-case involving [] member expressions.
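To make the boundary condition concrete, here is a Python transcription of the same delimiter check (my own sketch of the logic, not code from JAW): an alias may only replace a substring of a function name when the occurrence is bounded by a dot or sits at the start or end of the name.

```python
# Sketch of the proposed delimiter check: `alias` may only match inside
# `function_name` when the occurrence is bounded by '.' or by the
# string's start/end, so "oobar" no longer matches inside "foobar".
def is_valid_alias_match(function_name, alias):
    idx = function_name.find(alias)
    if idx == -1:
        return False
    end = idx + len(alias)
    start_ok = idx == 0 or function_name[idx - 1] == "." or alias[0] == "."
    end_ok = end == len(function_name) or function_name[end] == "." or alias[-1] == "."
    return start_ok and end_ok
```

Under this check, foo still aliases foo.unsafe, but oobar no longer matches inside foobar.otherUnsafe.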

does it work with Aura?

Before trying, it would be good to ask whether using an Aura DB (or perhaps any remote DB) is currently possible.

Crawler configuration

Rewrite the crawler's code such that one can choose what type of information to store, e.g., saving a website's URLs without having to store any other information.

Data flow analysis does not find the bindings between function-call arguments and function definition params

The analysis step that connects function-call arguments and function-definition params is not executed. Thus, the data flow analysis does not traverse function calls. This affects both XSRF and DOM Clobbering detection. I appended my test case for XSRF detection. The fetch inside unsafe() is not reached according to the results.

website hosted at http://localhost:

<html>
        <body>
                <script>
                        let csrf_src = window.location.hash;
                        fetch("https://" + csrf_src + "/");

                        function unsafe(x){
                                fetch("https://" + x + "/");
                        }

                        unsafe(csrf_src);
                </script>
        </body>
</html>

config.yaml

# 1. which webapps to test?
testbed: 
  ## option 1: test a specific website
  site: http://localhost
  ## option 2: provide a top-site list (e.g., Alexa, Tranco, etc)
  #sitelist: /input/tranco_Y3JG_unique.csv
  #from_row: 1
  #to_row: 5000


# 2. crawler configuration
crawler:
  # max number of urls to visit
  maxurls: 2
  # time budget for crawling each site in seconds
  sitetimeout: 1800 # 30 mins;
  # browser to use for crawling
  browser:
    name: chrome
    headless: true


# 3. static analysis configuration
staticpass:
  # time budget for static analysis of each site in seconds
  sitetimeout: 10800 # 3 hrs
  # max amount of available memory for static analysis per process
  memory: 32000


# 4. dynamic analysis configuration
dynamicpass:
  # time budget for dynamic analysis of each site in seconds
  sitetimeout: 10800 # 3 hrs
  # which browser to use
  browser:
    name: chrome
    # use remote browserstack browsers or not
    use_browserstack: false
    browserstack_username: xyz
    browserstack_password: xyz
    browserstack_access_key: xyz
    


# 5. choose the vulnerability analysis component to run
# only one component must have the `enable` option as true
general_vuln_analysis:
  # performs a general vulnerability analysis (i.e., more than one vulnerability class)
  # when enabled, other vulnerability-specific components (as configured below) 
  # will not run individually anymore
  enable: false

domclobbering:
  enabled: false
  # enable or disable the passes, useful for large-scale analysis 
  # e.g., first crawl all websites, then analyze them,
  # as opposed to crawling and analyzing sequentially at the same time
  passes:
    crawling: true
    static: true
    static_neo4j: true
    dynamic: true


cs_csrf:
  enabled: true
  passes:
    crawling: true
    static: true
    static_neo4j: true

sink.flows.out:

====================================================
[timestamp] generated on 22/08/2023 09:21:11
====================================================

[*] NavigationURL: http://localhost

----------------------------------------------------
[*] Tags: ['NON-REACH']
[*] NodeId: {'TopExpression': '6', 'CallExpression': '7', 'Argument': '9'}
[*] Location: 5
[*] Function: Fetch
[*] Template: "https://" + x + "/"
[*] Top Expression: fetch("https://" + x + "/")
----------------------------------------------------

----------------------------------------------------
[*] Tags: ['WIN.LOC-READ']
[*] NodeId: {'TopExpression': '22', 'CallExpression': '23', 'Argument': '25'}
[*] Location: 2
[*] Function: Fetch
[*] Template: "https://" + csrf_src + "/"
[*] Top Expression: fetch("https://" + csrf_src + "/")

1:['WIN.LOC-READ'] variable=csrf_src
        0 (loc:1)- let csrf_src = window.location.hash
----------------------------------------------------

RecursionError in _get_varname_value_from_context

When parsing JS code that has some kind of recursion and includes a sink, the data flow analysis infinitely recurses until we receive RecursionError: maximum recursion depth exceeded.

Example for request hijacking:

<html>

<body>
    <script>
        let source = location.hash;

        function foo(param, depth) {
            if (depth > 10) {
                fetch(param);
            } else {
                bar(param, depth + 1);
            }
        }

        function bar(param, depth) {
            foo(param, depth + 1);
        }

        foo(source, 0);
    </script>
</body>

</html>

The recursion in the data flow analysis goes:

  • _get_varname_value_from_context
  • get_function_def_of_block_stmt
  • check_if_function_has_param
  • get_function_call_values_of_function_definitions
  • _get_varname_value_from_context ...

This bug was triggered by some third-party library scripts
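A common mitigation for this class of bug is to thread a visited set (or an explicit depth cap) through the mutually recursive calls. The sketch below shows the idea on a toy call graph; it is independent of JAW's actual data structures.

```python
# Toy sketch: guard a mutually recursive def-use traversal with a visited
# set so cyclic call chains (foo -> bar -> foo) terminate. The dict-based
# "call graph" here is illustrative, not JAW's representation.
def resolve(name, call_graph, visited=None):
    """Collect every name reachable from `name`, stopping on cycles."""
    if visited is None:
        visited = set()
    if name in visited:
        return []              # already expanding this name: break the cycle
    visited.add(name)
    reached = []
    for nxt in call_graph.get(name, []):
        reached.append(nxt)
        reached.extend(resolve(nxt, call_graph, visited))
    return reached
```

With the guard in place, the foo/bar example above terminates instead of exhausting the recursion limit.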

where do the inputs go?

Following along with the readme instructions, I'm putting my JS program in the hpg_construction/outputs folder (I made a new directory called test_inputs).

From the root, I'm running various versions of:

python3 -m hpg_construction.api test_inputs/ --js=test_program

but I can't seem to get it to run without error:

2021-07-08 19:05:20,711 - JAW - ERROR - Path of the program under analysis must be within the outputs folder. (api.py:62)

I tried:

	base_path = args['path']
	if constantsModule.OUTPUT_NODES_RELS_PATH not in base_path:
		logger.error(OUTPUT_NODES_RELS_PATH)
		sys.exit(1)

to see what it was looking for and got:

NameError: name 'OUTPUT_NODES_RELS_PATH' is not defined

I got the same issue doing the test script:

python3 -m hpg_construction.api ./hpg_construction/outputs/test_website/test_webpage --hybrid=false

Data flow analysis fails for parameters of member functions

After applying the patch from #9, the data flows of the parameters of regular functions are detected, but invocations of member functions are not discovered. This affects both CSRF and DOMC analysis. The following example is for the CSRF analysis:

<html>

<body>
    <script>
        let source = window.location.hash;

        function regularFunction(a) {
            fetch("https://" + a + "/"); // found flow
        }
        regularFunction(source);

        const functionExpression = function (b) {
            fetch("https://" + b + "/"); // found flow
        };
        functionExpression(source);

        class Foo {
            memberFunction(c) {
                fetch("https://" + c + "/"); // no flow found
            }

            static staticFunction(d) {
                fetch("https://" + d + "/"); // no flow found
            }
        }
        Foo.staticFunction(source);

        let baz = {
            shorthandFunction: function (e) {
                fetch("https://" + e + "/"); // found flow only up to 'source' identifier
            }
        };
        baz.shorthandFunction(source);

        let foo = new Foo();
        foo.memberFunction(source);

        foo.addedFunction = function (f) {
            fetch("https://" + f + "/"); // found flow only up to 'source' identifier
        };
        foo.addedFunction(source);

        Foo.prototype.functionAddedToPrototype = function (g) {
            fetch("https://" + g + "/"); // not found
        };
        foo.functionAddedToPrototype(source);
    </script>
</body>

</html>

sink.flows.out:

====================================================
[timestamp] generated on 07/09/2023 07:32:24
====================================================

[*] NavigationURL: http://localhost

----------------------------------------------------
[*] Tags: ['WIN.LOC-READ']
[*] NodeId: {'TopExpression': '14', 'CallExpression': '15', 'Argument': '17'}
[*] Location: 4
[*] Function: Fetch
[*] Template: "https://" + a + "/"
[*] Top Expression: fetch("https://" + a + "/")

1:['WIN.LOC-READ'] variable=a
	0 (loc:3)- a = LOCAL_FUNCTION_ARGUMENT
	1 (loc:6)- a <--(invocation-value)-- [def-scope-id=23] source
	2 (loc:1)- [scope-id=23]  let source = window.location.hash
----------------------------------------------------

----------------------------------------------------
[*] Tags: ['WIN.LOC-READ']
[*] NodeId: {'TopExpression': '32', 'CallExpression': '33', 'Argument': '35'}
[*] Location: 9
[*] Function: Fetch
[*] Template: "https://" + b + "/"
[*] Top Expression: fetch("https://" + b + "/")

1:['WIN.LOC-READ'] variable=b
	0 (loc:8)- b = LOCAL_FUNCTION_ARGUMENT
	1 (loc:11)- b <--(invocation-value)-- [def-scope-id=41] source
	2 (loc:1)- [scope-id=41]  let source = window.location.hash
----------------------------------------------------

----------------------------------------------------
[*] Tags: ['NON-REACH']
[*] NodeId: {'TopExpression': '52', 'CallExpression': '53', 'Argument': '55'}
[*] Location: 15
[*] Function: Fetch
[*] Template: "https://" + c + "/"
[*] Top Expression: fetch("https://" + c + "/")

1:['NON-REACH'] variable=c
	0 (loc:14)- c = LOCAL_FUNCTION_ARGUMENT
----------------------------------------------------

----------------------------------------------------
[*] Tags: ['NON-REACH']
[*] NodeId: {'TopExpression': '65', 'CallExpression': '66', 'Argument': '68'}
[*] Location: 19
[*] Function: Fetch
[*] Template: "https://" + d + "/"
[*] Top Expression: fetch("https://" + d + "/")

1:['NON-REACH'] variable=d
	0 (loc:18)- d = LOCAL_FUNCTION_ARGUMENT
----------------------------------------------------

----------------------------------------------------
[*] Tags: ['NON-REACH']
[*] NodeId: {'TopExpression': '88', 'CallExpression': '89', 'Argument': '91'}
[*] Location: 26
[*] Function: Fetch
[*] Template: "https://" + e + "/"
[*] Top Expression: fetch("https://" + e + "/")

1:['NON-REACH'] variable=e
	0 (loc:25)- e = LOCAL_FUNCTION_ARGUMENT
	1 (loc:29)- e <--(invocation-value)-- [def-scope-id=97] source
----------------------------------------------------

----------------------------------------------------
[*] Tags: ['NON-REACH']
[*] NodeId: {'TopExpression': '121', 'CallExpression': '122', 'Argument': '124'}
[*] Location: 35
[*] Function: Fetch
[*] Template: "https://" + f + "/"
[*] Top Expression: fetch("https://" + f + "/")

1:['NON-REACH'] variable=f
	0 (loc:34)- f = LOCAL_FUNCTION_ARGUMENT
	1 (loc:37)- f <--(invocation-value)-- [def-scope-id=130] source
----------------------------------------------------

----------------------------------------------------
[*] Tags: ['NON-REACH']
[*] NodeId: {'TopExpression': '145', 'CallExpression': '146', 'Argument': '148'}
[*] Location: 40
[*] Function: Fetch
[*] Template: "https://" + g + "/"
[*] Top Expression: fetch("https://" + g + "/")

1:['NON-REACH'] variable=g
	0 (loc:39)- g = LOCAL_FUNCTION_ARGUMENT
----------------------------------------------------
