In Nomine Function

Models and code for the paper:

In Nomine Function: Naming Functions in Stripped Binaries with Neural Networks [PDF]

TL;DR: We used a transformer model to predict function names in stripped binaries.

If you use this code, please cite:

@article{artuso2019nomine,
  title={In Nomine Function: Naming Functions in Stripped Binaries with Neural Networks},
  author={Artuso, Fiorella and Di Luna, Giuseppe Antonio and Massarelli, Luca and Querzoni, Leonardo},
  journal={arXiv},
  year={2019}
}

Quickstart

Using our model to predict function names for a stripped binary is straightforward. This example uses the same gonnacry sample we used in our paper.

Requirements: To predict names for your own binary, you need radare2 installed on your machine. For installation instructions, see https://github.com/radareorg/radare2.


Clone this repo and install dependencies

First of all clone this repository:

git clone --recursive https://github.com/lucamassarelli/in_nomine_function 

After cloning the repo, install all requirements:

pip install -r requirements.txt

Download trained model

Now you need to download the trained model. To download the transformer_pt model:

python downloader.py --transformer_pt

This will download the pretrained transformer model described in the paper into the folder data/model/.


Disassemble the program and dump the assembly code

Then, you need to dump the assembly code for the unnamed functions in your stripped binary:

python dump_data_from_binary.py -i gonnacry.o -o data/gonnacry -s

This will create two files: data/gonnacry.asm, where each line corresponds to the dumped assembly code for one function, and data/gonnacry.meta, where each line corresponds to the address and the name of that function. The -s option tells the script to dump all functions in the binary, including the ones that are referenced by a symbol.


Predict!

Finally, launch the predictions:

./predict.sh data/gonnacry.asm data/gonnacry.pred data/model/model.transformer_asm_name_step_219400.pt

This will predict the names for the functions in your binary and write them to the prediction file. Each line in the prediction file is the predicted name for the function on the corresponding line of data/gonnacry.asm and data/gonnacry.meta.
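Because the files are line-aligned, predictions can be joined back to the function metadata with a few lines of Python. The following is a minimal sketch, not part of the repository: the helper names pair_predictions and pair_prediction_files are hypothetical, and each line of the .meta file is treated as an opaque string rather than assuming a particular field layout.

```python
def pair_predictions(meta_lines, pred_lines):
    """Pair each metadata line with the prediction on the same line number."""
    meta = [line.rstrip("\n") for line in meta_lines]
    pred = [line.rstrip("\n") for line in pred_lines]
    # The files are expected to be line-aligned; a mismatch means they
    # do not come from the same dump/predict run.
    if len(meta) != len(pred):
        raise ValueError(
            f"line count mismatch: {len(meta)} metadata lines vs {len(pred)} predictions"
        )
    return list(zip(meta, pred))


def pair_prediction_files(meta_path, pred_path):
    """Convenience wrapper (hypothetical) that reads both files from disk."""
    with open(meta_path) as meta_file, open(pred_path) as pred_file:
        return pair_predictions(meta_file.readlines(), pred_file.readlines())
```

For the example above, pair_prediction_files("data/gonnacry.meta", "data/gonnacry.pred") would return one (metadata, predicted name) pair per function.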


Of course, you can replace the gonnacry binary with any Unix x86 executable of your choice.
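To run the same two steps on another binary, the commands from this quickstart can be generated programmatically. The sketch below only builds the argument lists shown above. The function name build_pipeline_commands and the choice to name the output files after the binary's stem are assumptions made for illustration, mirroring the gonnacry example.

```python
from pathlib import Path


def build_pipeline_commands(binary, model_path, out_dir="data"):
    """Return the dump and predict commands (as argv lists) for one binary."""
    # Assumption: outputs are named after the binary, e.g. gonnacry.o -> data/gonnacry.*
    prefix = (Path(out_dir) / Path(binary).stem).as_posix()
    dump_cmd = [
        "python", "dump_data_from_binary.py",
        "-i", str(binary),
        "-o", prefix,
        "-s",  # dump all functions, as in the quickstart example
    ]
    predict_cmd = ["./predict.sh", prefix + ".asm", prefix + ".pred", str(model_path)]
    return [dump_cmd, predict_cmd]
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the root of the cloned repository.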

Reproducing paper results

We are committed to making our research results easy to reproduce. We hope the information below allows anyone to reproduce the results in our paper.

Downloading the Ubuntu dataset

To download the whole Ubuntu dataset described in the paper (this operation needs 40 GB of free space), run:

python downloader.py --all_data

If you don't need the whole dataset, you can download only the test set:

python downloader.py --test_data

Reproducing results on the test set

Once you have downloaded the Ubuntu test set, you can run:

./predict.sh data/ubuntu_test_data/ubuntu_ds_test.asm \
             ubuntu_ds_test.pred \
             data/model/model.transformer_asm_name_step_219400.pt \
             data/ubuntu_test_data/ubuntu_ds_test.name

This will predict names for the functions in the test set and then compute precision, recall and F1 score on them.
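The actual scores are computed by the repository's own scripts. As a rough illustration of what precision, recall and F1 mean for name prediction, here is a minimal sketch that assumes each name is a whitespace-separated sequence of tokens and compares predicted and reference tokens as multisets. The function name token_prf, the tokenization, and the per-name scoring are assumptions and may differ from the evaluation used in the paper.

```python
from collections import Counter


def token_prf(predicted, reference):
    """Token-level precision, recall and F1 for one predicted name."""
    pred_tokens = Counter(predicted.split())
    ref_tokens = Counter(reference.split())
    # Multiset intersection: tokens that appear in both names.
    overlap = sum((pred_tokens & ref_tokens).values())
    precision = overlap / sum(pred_tokens.values()) if pred_tokens else 0.0
    recall = overlap / sum(ref_tokens.values()) if ref_tokens else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Under these assumptions, a prediction that contains only correct tokens but misses some reference tokens gets perfect precision and a lower recall.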

You can also predict names using our pretrained seq2seq model. To download it:

python downloader.py --seq2seq_pt

Training a model from scratch

You can also train a model from scratch. First, download the whole Ubuntu dataset as explained above.

Once you have downloaded the dataset, you need to preprocess it:

./preprocess.sh

This script assumes the dataset is in the default folder data/ubuntu_all_data/; modify it to match your local setup. The preprocessing writes binary data files into the folder data/preprocessed_ubuntu_dataset.

Finally, you can start the training.

To train the transformer model:

python OpenNMT-py/train.py --config model_configs/config_transformer.yaml

To train the seq2seq model:

python OpenNMT-py/train.py --config model_configs/config_seq2seq.yaml

Acknowledgements

In our code we use godown to download data from Google Drive. We thank circulosmeos, the creator of godown. Giuseppe Di Luna was supported by the AXA Fellowship.
