Source2binary Dataset Construction

This is the repository for the paper "One to One or One to many? What function inline brings to binary similarity analysis".

Construction

The "construction" folder contains scripts for building and extracting the binaries. "construction\Dockerfile_source2binary" is a Dockerfile that compiles coreutils v8.29 with clang-10 at optimization levels O0 through O3. Run "docker build -t image_owner/image_name -f Dockerfile_source2binary ." from inside that folder to build an image containing the coreutils source and binaries.
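The build command above can also be assembled from Python, for example when scripting several image builds. This is a minimal sketch and not part of the repository: the helper name `docker_build_command` is hypothetical, and "image_owner/image_name" is the placeholder tag from the command above.

```python
import subprocess


def docker_build_command(image_tag, dockerfile="Dockerfile_source2binary", context="."):
    """Return the argv list for the docker build invocation described above."""
    return ["docker", "build", "-t", image_tag, "-f", dockerfile, context]


if __name__ == "__main__":
    cmd = docker_build_command("image_owner/image_name")
    print(" ".join(cmd))
    # Uncomment to actually run the build (requires Docker and must be
    # started from the "construction" folder):
    # subprocess.run(cmd, check=True)
```

Returning the argument list instead of a shell string avoids quoting problems if the tag or path contains unusual characters.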

Labeling

The "ground_truth_building" folder contains the code that automatically labels the dataset built above. The code is organized as follows:

| dir | file | function |
| --- | --- | --- |
| IDA_pro_scripts | extract_binary_range.py | Extracts binary function boundaries with IDA 7.0 and lower. |
| | extract_binary_range_75.py | Extracts binary function boundaries with IDA 7.5. |
| extract_debug_information | extract_debug_dump.py | Extracts the line mapping from the .debug_line section of a binary using readelf. |
| extract_source_information | use_understand_to_extract_entity.py | Uses Understand to extract the source line-to-function mapping. |
| mapping | binary2source_mapping.py | Extends the line mapping to a function-level mapping by combining it with the binary address-to-function mapping and the source line-to-function mapping. |
| - | binary2source_mapping_using_understand.py | Main entry point; runs the labeling for all binaries and source projects. |
| | summary_for_inline_staticstics.py | Summarizes the metrics for all binaries. |
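The mapping step described above can be pictured with a small sketch. It is an illustration under stated assumptions, not the code in binary2source_mapping.py: the function name `build_function_mapping`, the input shapes, and the sample data are all hypothetical, and binary function ranges are assumed not to overlap.

```python
from bisect import bisect_right
from collections import defaultdict


def build_function_mapping(addr_to_line, line_to_func, binary_ranges):
    """Combine three mappings into a binary-function -> source-functions mapping.

    addr_to_line:  dict of address -> (source_file, line), e.g. from .debug_line
    line_to_func:  dict of (source_file, line) -> source function name
    binary_ranges: list of (start, end, binary_function_name), end exclusive
    """
    ranges = sorted(binary_ranges)
    starts = [r[0] for r in ranges]
    mapping = defaultdict(set)
    for addr, location in addr_to_line.items():
        # Find the last range whose start is <= addr.
        i = bisect_right(starts, addr) - 1
        if i < 0:
            continue
        start, end, bin_func = ranges[i]
        if not (start <= addr < end):
            continue  # address falls in a gap between functions
        src_func = line_to_func.get(location)
        if src_func is not None:
            mapping[bin_func].add(src_func)
    return dict(mapping)


# Hypothetical example: "helper" has been inlined into "main".
addr_to_line = {0x1000: ("a.c", 10), 0x1004: ("a.c", 3), 0x1008: ("a.c", 11)}
line_to_func = {("a.c", 10): "main", ("a.c", 11): "main", ("a.c", 3): "helper"}
binary_ranges = [(0x1000, 0x1010, "main")]
result = build_function_mapping(addr_to_line, line_to_func, binary_ranges)
```

In this example the binary function "main" maps to two source functions, which is the one-to-many case that inlining produces.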

Several paths must be set before running the labeling scripts. "binary2source_mapping_using_understand.py" contains the paths to IDA, the Understand Python interpreter, the Understand tool, the dataset, and the helper scripts. Running the scripts requires IDA Pro, Understand, readelf, and python3 to be installed. The current version was developed on Linux, but it should also be usable on Windows.
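Before starting a long labeling run, it can help to confirm that the configured paths exist and the required command-line tools are on PATH. The sketch below is one possible check, not something the repository provides: the function name `missing_prerequisites` and the labels and paths in the usage example are hypothetical.

```python
import shutil
from pathlib import Path


def missing_prerequisites(paths, tools=("readelf", "python3")):
    """Return the labels of configured paths and tools that cannot be found.

    paths: dict of label -> filesystem path (e.g. the IDA or dataset path)
    tools: executables that must be discoverable on PATH
    """
    missing = []
    for label, path in paths.items():
        if not Path(path).exists():
            missing.append(label)
    for tool in tools:
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    # Placeholder values; replace them with the paths used on your machine.
    configured = {"ida": "/path/to/ida", "dataset": "/path/to/dataset"}
    print(missing_prerequisites(configured))
```

An empty list means every configured path exists and every listed tool was found.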

Contributors

island255
