ParmeSan: Sanitizer-guided Greybox Fuzzing

ParmeSan is a sanitizer-guided greybox fuzzer based on Angora.

Published Work

USENIX Security 2020: ParmeSan: Sanitizer-guided Greybox Fuzzing.

The paper can be found here: ParmeSan: Sanitizer-guided Greybox Fuzzing

Building ParmeSan

See the instructions for Angora.

Run the following scripts to install the dependencies and build ParmeSan:

build/install_rust.sh
PREFIX=/path/to/install/llvm build/install_llvm.sh
build/install_tools.sh
build/build.sh

ParmeSan also builds a tool bin/llvm-diff-parmesan, which can be used for target acquisition.

Building a target

First compile your program into a bitcode file using clang (e.g., base64.bc). Then use angora-clang to build the bitcode files for target acquisition, once without and once with your selected sanitizer enabled, as well as the binaries to be fuzzed. To get a single bitcode file for larger projects, the easiest solution is to use gllvm.

# Build the bitcode files for target acquisition
USE_FAST=1 $(pwd)/bin/angora-clang -emit-llvm -o base64.fast.bc -c base64.bc
USE_FAST=1 $(pwd)/bin/angora-clang -fsanitize=address -emit-llvm -o base64.fast.asan.bc -c base64.bc
# Build the actual binaries to be fuzzed
USE_FAST=1 $(pwd)/bin/angora-clang -o base64.fast base64.bc
USE_TRACK=1 $(pwd)/bin/angora-clang -o base64.track base64.bc

Then acquire the targets using:

bin/llvm-diff-parmesan -json base64.fast.bc base64.fast.asan.bc

This will output a file targets.json, which you provide to ParmeSan with the -c flag.

For example:

$(pwd)/bin/fuzzer -c ./targets.json -i in -o out -t ./base64.track -- ./base64.fast -d @@

Options

ParmeSan's SanOpt option can speed up the fuzzing process by dynamically switching over to a sanitized binary only once the fuzzer reaches one of the targets specified in the targets.json file.

Enable it with the -s [SANITIZED_BIN] option.

Build the sanitized binary in the following way:

USE_FAST=1 $(pwd)/bin/angora-clang -fsanitize=address -o base64.asan.fast base64.bc

Targets input file

The targets input file is a JSON file with the following format:

{
  "targets":  [1,2,3,4],
  "edges":   [[1,2], [2,3]],
  "callsite_dominators": {"1": [3,4,5]}
}

Here, targets lists the IDs of the cmp instructions to target (i.e., the IDs assigned by the __angora_trace_cmp() calls), and edges is the overlay graph of cmp IDs (i.e., which cmps are connected to each other). The edges field can be empty, since ParmeSan adds newly discovered edges automatically, but performance will be better if you provide the static CFG.

It is also possible to run ParmeSan in pure directed mode (-D option), in which it only considers new seeds whose coverage lies on a direct path to one of the specified targets. Note that this requires a reasonably complete static CFG: an incomplete CFG might contain no paths to the targets at all, in which case no new coverage will be considered.

How to get started

Have a look at BUILD_TARGET.md for a step-by-step tutorial on how to get started fuzzing with ParmeSan.

FAQ

  • Q: I get a warning like ==1561377==WARNING: DataFlowSanitizer: call to uninstrumented function gettext when running the (track) instrumented program.
  • A: In many cases you can ignore this, but taint is lost at that call, which degrades performance. To fix it, add the function to the ABI list (e.g., llvm_mode/dfsan_rt/dfsan/done_abilist.txt) and add a custom DFSan wrapper (in llvm_mode/dfsan_rt/dfsan/dfsan_custom.cc). See the Angora documentation for more info.
  • Q: I get a compiler error when building the track binary.
  • A: ParmeSan/Angora uses DFSan for dynamic data-flow analysis. In certain cases, building target applications can be tricky (especially C++ targets). Disable as much inline assembly as possible and make sure that you link the correct libraries/LLVM libc++. Some programs also make an indirect call to a vararg function, which DFSan does not currently support; the easy solution is to patch out these calls or apply something like indirect call promotion.
  • Q: llvm-diff-parmesan generates too many targets!
  • A: You can do target pruning using the scripts in tools/ (in particular tools/prune.py) or use ASAP to generate a target bitcode file with fewer sanitizer targets.

Docker image

You can also get the pre-built docker image of ParmeSan.

docker pull vusec/parmesan
docker run --rm -it vusec/parmesan
# In the container you can build objdump
/parmesan/misc/build_objdump.sh

Contributors

dcasenove, jbn605, microsvuln, sirmc

ParmeSan issues

IDAssigner: collectCallSiteDominators

Problem

I've started looking at code related to indirect calls, specifically the code in the IDAssigner pass, which is used to extract static CFG information.

The function collectCallSiteDominators handles indirect call sites and is used to populate CallSiteDominatorsMap, which maps call site IDs to CmpIds:

CallSiteDominatorsMap[PrevCallSiteId] = CSDominatorCmpIds;

To test this function, I adapted the code from this thread into the following small test:


#include <stdlib.h>
#include <stdio.h>

void example_fun2(void)
{
    printf("Example Fun 2\n");
}

void example_fun(int param)
{
    printf("Example Fun\n:");

    if(param == 10){
        void (*fp) (void) = &example_fun2;
        (*fp)();
    }
}

int main(void)
{
    int i = 0;

    if(i == 0) {
     example_fun(10);
    }
}

Current result


{
"targets": [3],
"edges": [[0,0], [3,0], [3,3], [9,0], [9,9]],
"callsite_dominators":{}
}

With the current implementation, no call site dominator is found, and execution stops after the following code runs:

auto ArgI = CI->getArgOperand(0);
if (!ArgI)
    continue;
Instruction *Inst = dyn_cast<Instruction>(ArgI);
if (!Inst)
    continue;

Based on these results, I'm having trouble understanding what this function is supposed to do and what the intended results are.
On bigger test cases such as objdump it does find call site dominators, but I can't understand why it doesn't work on a simple test like this one.

Possible fix

Based on the thread, a possible solution would be to remove

if (CI->getNumArgOperands() < 1)
    continue;
auto ArgI = CI->getArgOperand(0);
if (!ArgI)
    continue;
Instruction *Inst = dyn_cast<Instruction>(ArgI);

and replace it with


 Instruction *Inst = dyn_cast<Instruction>(CI->getCalledValue());

Result


{
"targets": [3],
"edges": [[0,0], [3,0], [3,3], [9,0], [9,9]],
"callsite_dominators":{"327263": [3]}
}

Here, CallSiteId 327263 is the indirect call below, and 3 is the cmpId assigned to the conditional that guards it:


if(param == 10){
    void (*fp) (void) = &example_fun2;
    (*fp)();
}

This implementation also finds call site dominators on objdump, but the results differ substantially from those of the current implementation.

EDIT:

Example that assigns multiple CmpIds to a single CallSiteId:

#include <stdlib.h>
#include <stdio.h>

void example_fun3(void)
{
    printf("Example Fun 3\n");
}

void example_fun2(void)
{
    printf("Example Fun 2\n");
}

void example_fun(int param)
{
    printf("Example Fun\n:");

    void (*fp) (void);
    int param2 = 10;

    if(param == 10){
        fp = example_fun3;
    }

    if(param2 == 10){
        fp = example_fun2;
    }
    (*fp)();
}

int main(void)
{
    int i = 0;

    if(i == 0) {
     example_fun(10);
    }
}
