
vdpython's Introduction

VDPython

VulDeePecker algorithm implemented in Python

VulDeePecker

  • Detects exploitable code in C/C++
  • Uses N-grams and deep learning with LSTMs to train detection model
  • Introduces the idea of code gadgets to represent semantically related code
    • Code gadgets are vectorized for input to neural network
    • The training/testing set for this project consists of existing code gadgets and their vulnerability classifications
  • Trained on two vulnerability types
  • Paper
  • GitHub

Running project

  • To run program, use this command: python vuldeepecker.py [gadget_file], where gadget_file is one of the text files containing a gadget set
  • Program has 3 parts:
    • Clean gadget
      • Remove comments and the contents of string/character literals
      • Replace all user-defined variables and functions with VAR# and FUN#, respectively
        • The # is an integer identifying the user-defined variable/function within the gadget
        • Note: this identifier only applies within the scope of the gadget
    • Vectorize gadget
      • Gadgets are parsed, tokenized, and transformed to vectors of embeddings
      • Vectors are normalized to a constant length through either truncation or padding
    • Train and test neural model
      • Gadget vectors are used as input to train the neural model
      • Data is split into training set and testing set
      • Neural model is trained and tested, and its accuracy is reported
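
The cleaning step described above can be illustrated with a minimal sketch. This is not the project's clean_gadget.py: the keyword and library-function sets are deliberately tiny placeholders, comments are only handled within a single line, an identifier is assumed to be a function when it is directly followed by "(", and numbering from 1 is an assumption.

```python
import re

# Hypothetical, deliberately small sets; a real cleaner needs the full C/C++
# keyword list and a much longer list of known library functions.
KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return", "sizeof"}
LIBRARY_FUNCS = {"strcpy", "memcpy", "malloc", "free", "printf"}

STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')
CHAR_LITERAL = re.compile(r"'(?:\\.|[^'\\])*'")
COMMENT = re.compile(r"/\*.*?\*/|//.*")  # single-line comments only
IDENT = re.compile(r"\b[A-Za-z_]\w*\b")


def clean_gadget(lines):
    """Rename user-defined identifiers to VAR#/FUN#, scoped to one gadget."""
    var_ids, fun_ids = {}, {}

    def rename(match):
        name = match.group(0)
        if name in KEYWORDS or name in LIBRARY_FUNCS:
            return name  # keywords and library calls are left untouched
        # Assumption: an identifier directly followed by "(" is a function.
        rest = match.string[match.end():].lstrip()
        table, prefix = (fun_ids, "FUN") if rest.startswith("(") else (var_ids, "VAR")
        if name not in table:
            table[name] = f"{prefix}{len(table) + 1}"
        return table[name]

    cleaned = []
    for line in lines:
        # Empty the literals first so "//" inside a string is not read as a comment.
        line = STRING_LITERAL.sub('""', line)
        line = CHAR_LITERAL.sub("''", line)
        line = COMMENT.sub("", line)
        cleaned.append(IDENT.sub(rename, line).strip())
    return cleaned


gadget = ['char buf[10]; // local buffer', 'strcpy(buf, src);', 'helper(buf, "hello");']
for cleaned_line in clean_gadget(gadget):
    print(cleaned_line)
```

Because the two lookup tables are created inside each call, the VAR#/FUN# numbers restart for every gadget, which matches the note that an identifier only applies within the scope of its gadget.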

Code Files

  • vuldeepecker.py
    • Interface to project, uses functionality from other code files
    • Fetches each gadget, cleans, buffers, trains Word2Vec model, vectorizes, passes to neural net
  • clean_gadget.py
    • For each gadget, replaces all user variables with "VAR#" and user functions with "FUN#"
    • Removes content from string and character literals
  • vectorize_gadget.py
    • Converts gadgets into vectors
    • Tokenizes gadget (converts to symbols, operators, keywords)
    • Uses Word2Vec to convert tokens to embeddings
    • Combines token embeddings in a gadget to create 2D gadget vector
  • blstm.py
    • Defines Bidirectional Long Short Term Memory neural network for training/prediction of vulnerabilities
    • Gets gadget vectors as input
    • Implements functions for both training and testing the model
    • Uses parameters defined in VulDeePecker paper
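
The tokenizing and length-normalizing steps listed for vectorize_gadget.py can be sketched as follows. This is a simplified illustration rather than the project's code: a plain dictionary stands in for the trained Word2Vec model, unknown tokens map to a zero vector, and truncation and padding are both applied at the end of the gadget. All of these are assumptions, as are the names `tokenize` and `vectorize`.

```python
import re

# Multi-character operators are listed first so "->" is not split into "-" and ">".
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|->|\+\+|--|<=|>=|==|!=|&&|\|\||\S")


def tokenize(line):
    """Split one line of a cleaned gadget into identifiers, numbers and symbols."""
    return TOKEN.findall(line)


def vectorize(gadget_lines, embed, dim, length):
    """Return a `length` x `dim` list of lists (a 2D gadget vector).

    `embed` is a hypothetical token -> embedding mapping standing in for Word2Vec.
    """
    zero = [0.0] * dim
    rows = [list(embed.get(tok, zero))
            for line in gadget_lines
            for tok in tokenize(line)]
    rows = rows[:length]                                     # truncate long gadgets
    rows += [list(zero) for _ in range(length - len(rows))]  # zero-pad short ones
    return rows


print(tokenize("VAR1[10] = FUN1(VAR2);"))
print(vectorize(["VAR1 ;"], {"VAR1": [1.0, 2.0]}, dim=2, length=4))
```

Fixing the number of rows is what lets gadgets of different sizes be stacked into a single input array for the neural network.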

vdpython's People

Contributors

alyssakatz, deveshr, johnb110, rtx-brian-duncan

vdpython's Issues

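Environment setup

I'm a beginner and would like to ask what environment this code needs. Could someone take a look?

What Python environment does the code need to run?

Using TensorFlow 2.x as the Keras backend raises the following error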

Traceback (most recent call last):
  File ".\vuldeepecker.py", line 100, in <module>
    main()
  File ".\vuldeepecker.py", line 96, in main
    blstm.train()
  File "xxx\blstm.py", line 72, in train
    epochs=4, class_weight=self.class_weight)
  File "C:\Anaconda\Lib\site-packages\tensorflow\python\keras\engine\training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "C:\Anaconda\Lib\site-packages\tensorflow\python\keras\engine\training.py", line 815, in fit
    model=self)
  File "C:\Anaconda\Lib\site-packages\tensorflow\python\keras\engine\data_adapter.py", line 1124, in __init__
    if class_weight:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
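
The traceback ends with Keras evaluating `if class_weight:`, which raises this ValueError when `class_weight` is a NumPy array with more than one element. Keras documents `class_weight` as a dictionary mapping class indices to weights, so one possible workaround, assuming blstm.py currently passes an array of per-class weights, is to pass a dict instead. The sketch below computes balanced weights in plain Python; `balanced_class_weight` is a hypothetical helper and not part of the project.

```python
from collections import Counter


def balanced_class_weight(labels):
    """Map each class label to n_samples / (n_classes * class_count).

    This is the same "balanced" heuristic scikit-learn uses, but the result
    is a dict, which has an unambiguous truth value.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in sorted(counts.items())}


weights = balanced_class_weight([0, 0, 0, 1])
print(weights)
```

If the weights are already available as an array ordered by class index, `dict(enumerate(weights))` produces the same kind of mapping, which could then be passed as `class_weight=` in the `fit` call.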

Something wrong with the parameters when configuring the BLSTM

I get a ValueError after cloning the code and running it according to the instructions.
The traceback is as follows:

Traceback (most recent call last):
  File "vuldeepecker.py", line 100, in <module>
    main()
  File "vuldeepecker.py", line 95, in main
    blstm = BLSTM(df,name=base)
  File "D:\vuldetect\dataset\VulDeePecker-master\VDPython-master\blstm.py", line 40, in __init__
    test_size=0.2, stratify=labels[resampled_idxs])
  File "F:\Anaconda\envs\tensorflow\lib\site-packages\sklearn\model_selection_split.py", line 2100, in train_test_split
    default_test_size=0.25)
  File "F:\Anaconda\envs\tensorflow\lib\site-packages\sklearn\model_selection_split.py", line 1782, in _validate_shuffle_split
    train_size)
ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
