Learning from what we know: How to perform vulnerability prediction using noisy historical data

This repository contains the source code and dataset for the paper Learning from what we know: How to perform vulnerability prediction using noisy historical data, published in Empirical Software Engineering (EMSE).

The paper is available here: Paper

The bib entry for citing the paper is available here: Cite

In addition to the source code of our proposed approach, TROVON, this repository contains our own implementations of the existing approaches that we compare TROVON with, because the original authors' implementations were unavailable. Please refer to the details below.


Dataset

The dataset is composed of the following:

  1. We gathered vulnerabilities (i.e., the vulnerable components and the corresponding fixed components) from 36 releases of the Linux kernel, 10 releases of OpenSSL, and 10 releases of Wireshark. For this task, we used VulData7, a vulnerability patch gathering tool that uses commit IDs provided by the National Vulnerability Database (NVD). The gathered vulnerabilities are available in the vulnerabilities directory.

  2. We also gathered the codebase for the aforementioned releases. For this task, we used FrameVPM, a framework built to evaluate and compare vulnerability prediction models. The framework is available here.


Source code

The source code of the vulnerability prediction approaches, both TROVON and the existing approaches that we compared it with, is available as follows:

  1. Source code of our proposed approach TROVON is available in the code directory.

  2. Source code to replicate the Software Metrics, Text Mining, Imports, and Function Calls approaches is available in the FrameVPM repository.

  3. Source code of our implementation of the approach Devign is available in the devign directory.

  4. Source code of our implementation of the approaches LSTM and LSTM-RF is available in the lstm-rf directory.


Required tools and dependencies

  1. Apache Maven
  2. srcML
  3. seq2seq
  4. Tkinter
  5. TensorFlow
  6. PyYAML
  7. Perl
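
Before running the scripts, it can help to confirm that the tools listed above are reachable. The following is a minimal sketch and not part of the repository. It assumes the command-line names mvn, srcml, and perl, and the Python import names seq2seq, tkinter, tensorflow, and yaml (the import name of PyYAML). Adjust these names if your installation differs, for example Tkinter under Python 2.

```python
import importlib.util
import shutil

# Assumed executable names for the command-line tools listed above.
COMMANDS = {"Apache Maven": "mvn", "srcML": "srcml", "Perl": "perl"}

# Assumed import names for the Python packages listed above.
MODULES = {
    "seq2seq": "seq2seq",
    "Tkinter": "tkinter",
    "TensorFlow": "tensorflow",
    "PyYAML": "yaml",
}


def module_available(name):
    """Return True if the module can be located, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def check_dependencies():
    """Map each dependency name to True (found) or False (missing)."""
    report = {tool: shutil.which(cmd) is not None for tool, cmd in COMMANDS.items()}
    report.update({pkg: module_available(mod) for pkg, mod in MODULES.items()})
    return report


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only reports whether each tool can be located. It does not verify versions, which this README does not specify.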

Model training

Please refer to the script train.sh

./train.sh [dirpath] [training-samples-num * epoch-num] [dirpath]/model [config] 1 [training-samples-num] [training-samples-num] 0

For the model configuration, please refer to length_50-l-1-2.yml. It is configured to train on sequences of length 50, which can be changed to suit your requirements.
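
The second argument of train.sh is the product of the number of training samples and the number of epochs, which is easy to get wrong when typed by hand. The sketch below assembles the argument list in the order of the usage line above. The helper name build_train_command and the example values (20000 samples, 10 epochs, a ./data directory) are hypothetical, and the literal 1 and 0 are copied from the usage line unchanged because their meaning is not documented here.

```python
def build_train_command(dirpath, num_samples, num_epochs, config):
    """Assemble the train.sh argument list in the order of the usage line."""
    if num_samples <= 0 or num_epochs <= 0:
        raise ValueError("num_samples and num_epochs must be positive")
    total = num_samples * num_epochs  # [training-samples-num * epoch-num]
    return [
        "./train.sh",
        dirpath,
        str(total),
        f"{dirpath}/model",
        config,
        "1",  # fixed value copied from the usage line
        str(num_samples),
        str(num_samples),
        "0",  # fixed value copied from the usage line
    ]


if __name__ == "__main__":
    # Hypothetical values: 20000 training samples, 10 epochs, data in ./data.
    cmd = build_train_command("./data", 20000, 10, "length_50-l-1-2.yml")
    print(" ".join(cmd))
```

The printed line can be pasted into a shell in the directory that contains train.sh, or the list can be passed to subprocess.run.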


Model testing

Please refer to the script test.sh

./test.sh [dirpath]/test [dirpath]/model [desired-generated-sequences-file-name]
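
The test.sh usage line above expects a test path and a model path under the same [dirpath]. As a rough sketch, the snippet below checks that both exist before assembling the command. The helper names, the ./data directory, and the output name generated.txt are hypothetical, and it is an assumption that [dirpath] is the same directory that was passed to train.sh.

```python
import os


def build_test_command(dirpath, output_name):
    """Assemble the test.sh argument list in the order of the usage line."""
    return ["./test.sh", f"{dirpath}/test", f"{dirpath}/model", output_name]


def missing_inputs(dirpath):
    """Return the expected input paths that do not exist yet."""
    expected = [f"{dirpath}/test", f"{dirpath}/model"]
    return [path for path in expected if not os.path.exists(path)]


if __name__ == "__main__":
    dirpath = "./data"  # hypothetical; assumed to match the training [dirpath]
    missing = missing_inputs(dirpath)
    if missing:
        print("Missing before testing:", ", ".join(missing))
    else:
        print(" ".join(build_test_command(dirpath, "generated.txt")))
```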
