

TaintMini

TaintMini is a framework for detecting flows of sensitive data in Mini-Programs with static taint analysis. It uses a novel universal data flow graph approach to capture data flows within and across mini-programs.


We implemented TaintMini based on pdg_js (from DoubleX by Aurore Fass et al.). For more implementation details, please refer to our paper and the DoubleX paper.


Prerequisites

Environment

For optimal performance, we recommend allocating at least 4 CPU cores and 16 GiB of memory to the tool. For the best I/O performance during analysis, we also recommend SSDs over hard disk drives, because Mini-Programs typically consist of a large number of small files (each smaller than a single page). For reference, we built and validated our artifact evaluation submission on a host with 16 vCPUs of an Intel Xeon Silver 4314, 128 GiB of 3200 MHz DDR4 memory, and a 2 TiB NVMe SSD (700 KIOPS).

Dependencies

Install Node.js dependencies for pdg_js first.

# make sure node.js and npm are installed
node --version && cd pdg_js && npm i

Install the Python requirements.

# install requirements
pip install -r requirements.txt

Pre-processing

TaintMini operates on unpacked WeChat Mini-Programs, so each Mini-Program must first be unpacked with a WeChat Mini-Program unpacking tool. Due to potential legal implications, we cannot provide such a tool directly; please obtain one from external sources.
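Because TaintMini expects an already-unpacked Mini-Program, it can help to sanity-check a directory before starting a long run. The sketch below is a hypothetical helper (looks_like_unpacked_miniprogram is not part of TaintMini) that assumes an unpacked WeChat Mini-Program keeps its global app.json at the directory root. It does not verify that the unpacking itself succeeded.

```python
from pathlib import Path


def looks_like_unpacked_miniprogram(path: str) -> bool:
    """Heuristic check that `path` is an unpacked Mini-Program directory.

    Assumption: an unpacked WeChat Mini-Program has its global
    configuration file, app.json, at the root of the directory.
    """
    root = Path(path)
    # Must be a directory (not the app.json file itself) containing app.json.
    return root.is_dir() and (root / "app.json").is_file()
```

Note that the check returns False when given the path of app.json itself, since the usage text below asks for a Mini-Program directory or an index file.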

Usage

usage: mini-taint [-h] -i path [-o path] [-c path] [-j number] [-b]

optional arguments:
  -h, --help            show this help message and exit
  -i path, --input path
                        path of input mini program(s). Single mini program directory or index files will both be fine.
  -o path, --output path
                        path of output results. The output file will be stored outside of the mini program directories.
  -c path, --config path
                        path of config file. See default config file for example. Leave the field empty to include all results.
  -j number, --jobs number
                        number of workers.
  -b, --bench           enable benchmark data log. Default: False

Results are written to the directory given by the -o/--output flag. Result files are named $(basename <directory>)-result.csv, plus $(basename <directory>)-bench.csv if the -b/--bench option is present.
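As a sketch of the naming rule above, the hypothetical helper result_paths (not part of TaintMini) computes the files to expect for one Mini-Program directory, assuming output_dir is the value passed to -o/--output. os.path.normpath removes a trailing slash first, so the name agrees with the shell's basename.

```python
import os
from typing import List


def result_paths(miniprogram_dir: str, output_dir: str, bench: bool = False) -> List[str]:
    """Return the CSV paths expected for one analyzed Mini-Program.

    Follows the README's rule: $(basename <directory>)-result.csv, plus
    $(basename <directory>)-bench.csv when -b/--bench is given.
    """
    # normpath strips a trailing slash so basename matches the shell's `basename`
    name = os.path.basename(os.path.normpath(miniprogram_dir))
    paths = [os.path.join(output_dir, f"{name}-result.csv")]
    if bench:
        paths.append(os.path.join(output_dir, f"{name}-bench.csv"))
    return paths
```

For an index file, the same rule would presumably apply to each directory listed in it.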

Config

config.json is a JSON file with two fields, sources and sinks:

  • sources is an array of the source APIs to include. Note the special value [double_binding], which denotes data flows from WXML.
  • sinks is an array of the sink APIs to include.

For an example, please refer to the default config.json file.
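The following sketch writes a config file with the two documented fields. The helper name write_config and the check that every entry is a string are assumptions; the README only states that both fields are arrays.

```python
import json
from typing import Iterable

# Special source value documented above: data flows from WXML.
DOUBLE_BINDING = "[double_binding]"


def write_config(path: str, sources: Iterable[str], sinks: Iterable[str]) -> dict:
    """Write a config with the two documented fields and return it."""
    config = {"sources": list(sources), "sinks": list(sinks)}
    for field, values in config.items():
        # Both fields are arrays; requiring string entries is our assumption.
        if not all(isinstance(v, str) for v in values):
            raise TypeError(f"{field} must contain only strings")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return config
```

For instance, write_config("config.json", ["wx.getStorageSync", DOUBLE_BINDING], ["console.log", "wx.navigateTo"]) produces a file that can then be passed with -c config.json.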

Examples

Single MiniProgram

Analyze a single MiniProgram, including all sources and sinks, with multiprocessing across all available CPU cores and no benchmark logging.

python main.py -i /path/to/miniprogram -o ./results -j $(nproc)

Multiple MiniPrograms

Analyze multiple MiniPrograms, including all sources and sinks, with multiprocessing across all available CPU cores and benchmark logging enabled.

# generate index
find /path/to/miniprograms -maxdepth 1 -type d -name "wx*" > index.txt
# start analysis
python main.py -i ./index.txt -o ./results -j $(nproc) -b
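On systems without GNU find (for example, Windows), the index can be generated in Python instead. This hypothetical write_index helper lists the immediate subdirectories of the root whose names start with wx, skipping symbolic links as find -type d does, and writes one path per line, the same layout the find command produces. Unlike find, it never lists the root directory itself, and it sorts the paths. It writes UTF-8 explicitly; that TaintMini reads index files as UTF-8 is an assumption.

```python
from pathlib import Path
from typing import List


def write_index(miniprograms_root: str, index_path: str = "index.txt") -> List[str]:
    """Approximate `find <root> -maxdepth 1 -type d -name "wx*" > index.txt`."""
    root = Path(miniprograms_root)
    dirs = sorted(
        str(p)
        for p in root.iterdir()
        # find -type d does not match symlinks, so skip them here too
        if p.is_dir() and not p.is_symlink() and p.name.startswith("wx")
    )
    # Explicit UTF-8 so the file does not depend on the platform's locale.
    with open(index_path, "w", encoding="utf-8") as f:
        f.writelines(d + "\n" for d in dirs)
    return dirs
```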

Citation

If you find TaintMini useful, please consider citing our paper and DoubleX:

@inproceedings{wang2023taintmini,
  title={TAINTMINI: Detecting Flow of Sensitive Data in Mini-Programs with Static Taint Analysis},
  author={Wang, Chao and Ko, Ronny and Zhang, Yue and Yang, Yuqing and Lin, Zhiqiang},
  booktitle={Proceedings of the 45th International Conference on Software Engineering},
  year={2023}
}

@inproceedings{fass2021doublex,
  author={Aurore Fass and Doli{\`e}re Francis Som{\'e} and Michael Backes and Ben Stock},
  title={{\textsc{DoubleX}: Statically Detecting Vulnerable Data Flows in Browser Extensions at Scale}},
  booktitle={ACM CCS},
  year={2021}
}

License

This project is licensed under the terms of the AGPLv3 license.


Contributors

chaowangsec


taintmini's Issues

Question about data flows between different pages of the same mini-program

Dear TaintMini authors,

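I have recently been trying to use TaintMini to analyze mini-programs. The paper states that TaintMini can handle data flows between different pages, i.e., the data-flow arrow labeled ③ in Figure 2 of the paper. However, when I tested it with the code snippet provided in the paper and with other samples, the output did not contain the expected results. I read through the relevant code but could not find any code that handles data flows between mini-program pages (including data passed via the url parameter of navigateTo and data passed via eventChannel). Where is this part implemented?
The key test code is shown below; the attachment contains the complete code files. I expected the tool to detect the data flow from var d = wx.getStorageSync("phone"); to console.log(obj);, but the actual output does not include it.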

// config.json
{
  "sources": ["wx.getStorageSync"],
  "sinks": ["console.log", "wx.navigateTo"]
}
// index.js
Page({
    data: {
    },
    onLoad: function (options) {
        var d = wx.getStorageSync("phone");

        wx.navigateTo({
            url: "/pages/test1/test1",
            success: function(res) {
                res.eventChannel.emit('sinkPage', {data: 
                    {phone: d}
                });
            }
        });
    },
    globalData: {
        userInfo: null
    }
});
// test1.js
Page({
    data: {
    },
    onLoad: function (option) {
        var evChannel = this.getOpenerEventChannel();
        evChannel.on("sinkPage", function (data) {
            var obj = data.data.phone; // the payload was emitted as {data: {phone: d}}
            console.log(obj);
        })
    }
})

TestGWY.zip

Help: error when running

The input path is the mini-program's app.json.
Command: python main.py -i <path>
I get the following error and would appreciate any help: UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position 472: illegal multibyte sequence
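This error commonly appears when Python on a Chinese-locale Windows system opens a UTF-8 file without an explicit encoding: open() then falls back to the locale's code page (GBK/cp936). Whether that is the cause here, and where inside TaintMini the failing read happens, are assumptions. Per the usage text, -i also expects a Mini-Program directory or an index file rather than app.json. Running Python in UTF-8 mode (python -X utf8 main.py -i <path>, Python 3.7+) or reading files with an explicit encoding, as in the hypothetical helper below, removes the locale dependence.

```python
from pathlib import Path


def read_source_text(path: str) -> str:
    """Read a Mini-Program source file as UTF-8 regardless of the OS locale.

    'utf-8-sig' decodes UTF-8 and also strips a leading byte-order mark if present.
    """
    return Path(path).read_text(encoding="utf-8-sig")
```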
