Giter Club home page Giter Club logo

benchmarkids's Introduction

BenchmarkIDS

Java CLI application to benchmark IDS, using WEKA's Java API and CSE-CIC-IDS2018 dataset.

This is the supporting application of my MSc dissertation on "Benchmarking behavior-based IDS using bio-inspired algorithms".

It's a very specific application, mainly aimed at automating the specific tests used on my dissertation.

Still, it´s a nice example on how to use WEKA's Java API.

The BenchmarkIDS application was created as a result of meeting some requirements related to this work, namely the flexibility provided by the API and trying to overcome some limitations of the graphic environment. The application, developed in Java, has no graphical environment and the results are only shown on the screen (they are not saved in a file).

Structure

The application consists of 4 modules: the initialization module (setup), the training module, the testing module and the scenarios module.

The initialization module is responsible for creating the data files that will be used by the training and testing modules.

Starting from the parameters passed on the command line, open the corresponding files from the dataset, performs the pre-processing tasks and generate the files for training the model (randomly selects 200000 lines from the file and divides them into two files, one with 140,000 lines (70%) for training and the other with the remaining 60,000 (30%), used to test the model) and for zero-day attack simulation testing.

The training module reads the previously generated data file and trains the model, using predefined algorithms. The parameterization of the algorithms is also pre-defined, with 3 models of each algorithm being generated.

The test module loads the previously generated models, reads the test file and the Zero-Day attack simulation file and tests and evaluates the models, generating a set of metrics indicative of the performance of each model.

The scenarios module, which implements the 4 test scenarios of my work, loads the files previously generated from the CSE-CIC-IDS2018 dataset by the Orange data-mining application, selects from one of them 140000 lines for the training phase and 60,000 lines of the second one for the test phase and applies this data to the 3 algorithms used. For reasons of statistical significance, this procedure is repeated ten times, with the data selected each time being completely independent of the others.

Compiling

I compiled the code using Netbeans version 11.2

It's absolutely necessary, for the compilation to succeed:

  • Include the Weka API, by means of including the file Weka.jar from your Weka installation directory
  • Include the Wekaclassalgos library. This library, originally developed by Jason Brownlee, is available in github at https://github.com/fracpete/wekaclassalgos

For those using Maven, there's also a Maven dependency that deals with including Wekaclassalgos library (see https://github.com/fracpete/wekaclassalgos for more information).

Information on how to use the WEKA API in Java applications can be found at Use weka in your java code

Usage

To use this application it is necessary to open the Windows command line (or, if using Linux, a shell or terminal) and change to the folder (directory) where the application distribution is located.

Initialization module

In order to execute the initialization module and thus generate the necessary data files, a command with the following structure must be executed:

java -jar BenchmarkIDS.jar setup <fich1> <fich2>

The parameters mean the following:

  • - file number that will serve as the basis for training the model
  • - file number that will serve as the basis for the zero-day attack simulation

So, for example, the command

java -jar BenchmarkIDS.jar setup 02 03

will execute the initialization module, accessing the Data_02.csv file to generate the model training data and the Data_03.csv file to create the file that will simulate the zero-day attack.

Training module

In the training module, the command structure is as follows:

java -jar BenchmarkIDS.jar training <algoritmo>

The algoritmo parameter can only assume 3 distinct values:

  • clonalg - to select the CLONALG algorithm
  • mlp - to select the Multi-Layer Perceptron algorithm
  • lvq - to select the Learning Vector Quantization algorithm

If, for example, we intended to train CLONALG models, the command to be executed would be, then:

java -jar BenchmarkIDS.jar training clonalg

Test Module

The test module includes two different options, one to test a certain algorithm and the other to generate and test the ensembles of the 3 algorithms, grouped 2 to 2.

The command structure for testing a given algorithm, including the zero-day attack simulation test, is as follows:

java -jar BenchmarkIDS.jar test <algoritmo>

The valid and supported values for the parameter algoritmo are the same as for the parameter with the same name in the training module.

To generate the ensembles and perform the respective tests, simply execute the command:

java -jar BenchmarkIDS.jar ensemble

Scenarios Module

To use the scenarios module, simply issue the command:

java -jar BenchmarkIDS.jar training <cenario>

The valid values for the parameter cenario are :

  • cenario1 - will perform the defined tests for scenario 1
  • cenario2 - will perform the defined tests for scenario 2
  • cenario3 - will perform the defined tests for scenario 3
  • cenario4 - will perform the defined tests for scenario 4

Notes

As mentioned before, this application was developed to meet specific requirements of my work. As such, some things were just simplified, like:

  • The location of the data files is hardcoded
  • The algorithms used are just the ones I used on my work
  • The patametrization of each algorithm is also hardcoded
  • The name of the data files resembles what I used on my work and is also hardcoded

About the authors

Paulo Ferreira ([email protected]) - I'm a MSc student - Cybersecurity and Computer Forensics - at Polytechnic Institute of Leiria (IPL), School of Technology and Management (ESTG)

Mário Antunes ([email protected]) - professor and Coordinator of the Master in Cybersecurity and Computer Forensics at Polytechnic Institute of Leiria (IPL), School of Technology and Management (ESTG), Department of Computer Engineering (DEI).

benchmarkids's People

Contributors

paulo-ferreira-mcif avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.