binseeker's Introduction

Repository of BinSeeker

I. Introduction of BinSeeker-

BinSeeker- is a vulnerability search tool for cross-platform binaries. Given a vulnerable function f, BinSeeker- identifies whether a binary program contains the same vulnerability as f. It currently supports three architectures: x86, ARM32, and MIPS32.

II. Prerequisites

To use BinSeeker-, the following tools must be installed:

  • IDA Pro - for generating the LSFG (labeled semantic flow graph, built from the data flow graph and control flow graph) and extracting features of basic blocks
  • Python 2.7 - all of the source code is written in Python 2.7
  • miasm - for converting assembly programs to IR. We extended it to support more assembly instructions. Please copy the miasm2 directory provided by us directly into the python directory of IDA Pro.

III. Directory structure

  • 0_Libs/search_program: contains the binary files used as the target in which BinSeeker searches for vulnerabilities.
  • 1_Features/search_program: contains the instruction features, control flow graph, and data flow graph of each function in the target.
  • 5_CVE_Feature: contains the instruction features, control flow graph, and data flow graph of each version of the two vulnerabilities (CVE-2014-3508, CVE-2015-1791).
  • 6_Search_TFRecord: contains TFRecord data files. A TFRecord file is a binary file that stores data and labels in a unified format, which makes better use of memory and allows fast copying, moving, reading, and storing in TensorFlow.
  • 7_Search_Result: all search result lists are stored here.
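
Before running the tool, it can help to confirm that the layout above exists. The following is a minimal sketch, not part of BinSeeker itself: the directory names come from the list above, while the helper names `missing_dirs` and `create_layout` are hypothetical.

```python
import os

# Directory names are taken from the list above.
# The helper functions below are hypothetical and not shipped with BinSeeker.
EXPECTED_DIRS = [
    os.path.join("0_Libs", "search_program"),
    os.path.join("1_Features", "search_program"),
    "5_CVE_Feature",
    "6_Search_TFRecord",
    "7_Search_Result",
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(root, d))]


def create_layout(root):
    """Create every expected sub-directory that is still missing under root."""
    for d in missing_dirs(root):
        os.makedirs(os.path.join(root, d))
```

Calling `missing_dirs(".")` from the repository root returns an empty list when the layout is complete.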

IV. Usage

  1. Modify the config.py file. All dependency directories are set there. A minimal modification is shown below; it must follow the directory structure defined above:
IDA32_DIR = "installation directory of 32-bit IDA Pro program"
IDA64_DIR = "installation directory of 64-bit IDA Pro program"
  2. Put the programs to be searched in the 0_Libs/search_program directory.
  3. Run the command.py file to generate the labeled semantic flow graphs and extract the initial numerical vectors of basic blocks. The result files should be placed in the 1_Features/search_program directory.
  4. Execute the 7_search_by_list_binseeker file to obtain the embedding vectors of the functions and get the function list in descending order of similarity score.
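
The last step ranks candidate functions by their similarity to the vulnerable function. The sketch below illustrates that ranking idea only. It assumes cosine similarity over plain Python lists, which may differ from the similarity measure BinSeeker actually uses, and the names `cosine_similarity` and `rank_functions` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_functions(query_vec, candidates):
    """Rank candidate functions against a query embedding.

    candidates maps a function name to its embedding vector (assumed format).
    Returns (name, score) pairs in descending order of similarity score.
    """
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

For example, `rank_functions([1, 0], {"a": [1, 0], "b": [0, 1], "c": [1, 1]})` lists `a` first, then `c`, then `b`.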

Note: All steps can be executed on a Linux system.

V. Build BinSeeker- from source code for model modification and retraining

Optional installation and configuration: Python-2.7.13

If you already have a suitable Python 2.7 version, you can skip this installation. Make sure that your Python is built with UCS-4 Unicode encoding, which tensorflow-1.1.0 requires. You can distinguish UCS-2 from UCS-4 with the following code.

>>> import sys
>>> print sys.maxunicode
1114111  # this value means UCS-4 encoding
65535  # this value means UCS-2 encoding; you need to reinstall Python
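
The same check can be wrapped in a small function so that a setup script can test it automatically. This is a minimal sketch and the function name `unicode_build` is hypothetical. Note that Python 3.3 and later always report 1114111.

```python
import sys


def unicode_build(maxunicode=None):
    """Classify the interpreter's Unicode build from sys.maxunicode.

    Returns "ucs4" when code points above 0xFFFF are supported, else "ucs2".
    The optional argument exists only to make the function easy to test.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "ucs4" if maxunicode > 0xFFFF else "ucs2"
```

A setup script could stop early when `unicode_build()` returns `"ucs2"`.
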
  1. install the required libraries to avoid build problems
sudo apt-get install python-dev libffi-dev libssl-dev libxml2-dev libxslt-dev libmysqlclient-dev libsqlite3-dev zlib1g-dev libgdbm-dev
  2. download and install Python-2.7.13
wget -c https://www.python.org/ftp/python/2.7.13/Python-2.7.13.tar.xz
xz -d Python-2.7.13.tar.xz
tar xf Python-2.7.13.tar
cd Python-2.7.13
./configure --prefix=/usr/local/python2713 --enable-unicode=ucs4
make
make install
  3. install the setuptools and pip packages
wget https://bootstrap.pypa.io/ez_setup.py -O - | sudo python
curl -O https://bootstrap.pypa.io/get-pip.py
python get-pip.py
  4. link pip and python into the bin path
rm /usr/bin/pip
rm /usr/bin/pip2
ln -s /usr/local/python2713/bin/pip /usr/bin/pip2
ln -s /usr/local/python2713/bin/pip /usr/bin/pip
rm /usr/bin/python
rm /usr/bin/python2
ln -s /usr/local/python2713/bin/python /usr/bin/python2
ln -s /usr/local/python2713/bin/python /usr/bin/python
  5. add environment variables
export PATH="$PATH:/usr/local/python2713/lib/python2.7/site-packages:/usr/local/python2713/bin"

Required installation

If you want to train your own network model, you need to install tensorflow-1.1.0. We built this version of TensorFlow from source. The following are the detailed installation instructions (for CPU-only TensorFlow) on an Ubuntu 14 machine.

  1. install dependent packages
sudo apt-get install zlib1g-dev swig python-wheel pkg-config zip g++ unzip python-numpy python-dev
# download and install six
wget https://pypi.python.org/packages/c8/0a/b6723e1bc4c516cb687841499455a8505b44607ab535be01091c0f24f079/six-1.10.0-py2.py3-none-any.whl#md5=3ab558cf5d4f7a72611d59a81a315dc8
sudo pip install six-1.10.0-py2.py3-none-any.whl
sudo pip install networkx
sudo pip install pyparsing
sudo pip install numpy
  2. install the bazel build tool
  • Download bazel-0.4.2-installer-linux-x86_64.sh from https://github.com/bazelbuild/bazel .
  • chmod +x bazel-0.4.2-installer-linux-x86_64.sh
  • ./bazel-0.4.2-installer-linux-x86_64.sh --user
  • add the bazel file path to the PATH environment variable, e.g.: export PATH="$PATH:$HOME/bin"
  3. install java8/openjdk8
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk
sudo update-alternatives --config java   # note: select the appropriate version
sudo update-alternatives --config javac
  4. install tensorflow
  • git clone --recurse-submodules https://github.com/tensorflow/tensorflow.git -b r1.1 (downloads the source code; --recurse-submodules also downloads the dependent tools, and -b r1.1 selects the tensorflow-1.1.0 version)
  • enter the tensorflow directory, run ./configure, and select the Python path, e.g., /usr/bin/python
note: the following selections were made during the configuration process.
malloc implementation: Y
Google Cloud Platform support: N
Hadoop File System support: N
XLA just-in-time compiler: N
Python library paths: the default is [/usr/local/lib/python2.7/dist-packages], but you can select a different path.
OpenCL support: N
CUDA support: N
Configuration finished.
  • execute bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package to build the TensorFlow source code
  • execute ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg to produce the installation wheel tensorflow-1.1.0-cp27-cp27mu-linux_x86_64.whl
  • install the TensorFlow package with sudo pip install /tmp/tensorflow_pkg/tensorflow-1.1.0-cp27-cp27mu-linux_x86_64.whl; this also installs funcsigs, mock, pbr, and protobuf.
  • verify the installation
$ python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, tensorflow!')
>>> sess = tf.Session()
>>> print sess.run(hello)
Hello, tensorflow!
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> print sess.run(a + b)
42
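
The wheel name produced by the build encodes the interpreter it targets: in the ABI tag cp27mu, the trailing "u" marks a CPython 2 build with wide (UCS-4) Unicode, which matches the UCS-4 requirement stated earlier. The sketch below splits a wheel filename into its tags. It assumes the five-field form without the optional build tag, and the helper names `parse_wheel_name` and `is_wide_unicode_abi` are hypothetical.

```python
def parse_wheel_name(filename):
    """Split a wheel filename of the form
    name-version-python_tag-abi_tag-platform_tag.whl into a dict.

    Assumption: the optional build tag is absent, so there are five fields.
    """
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) != 5:
        raise ValueError("unexpected wheel filename: %r" % filename)
    keys = ("name", "version", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, parts))


def is_wide_unicode_abi(abi_tag):
    """True for CPython 2 ABI tags that carry the wide-Unicode 'u' flag."""
    return abi_tag.startswith("cp2") and abi_tag.endswith("u")
```

For the wheel built above, `parse_wheel_name("tensorflow-1.1.0-cp27-cp27mu-linux_x86_64.whl")["abi_tag"]` is `"cp27mu"`, and `is_wide_unicode_abi` returns `True` for that tag. A UCS-2 interpreter cannot install this wheel.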

Usage

Usage is the same as described in Section IV.

binseeker's Issues

Semantic Emulation

Hello, I want to know the execution sequence of the subsequent semantic simulation, which is not mentioned in the documentation. The registeroffset module referenced in emulate-x86.py also could not be found. Looking forward to your reply.
