western-oc2-lab / intrusion-detection-system-using-machine-learning

Code for IDS-ML: intrusion detection system development using machine learning algorithms (decision tree, random forest, extra trees, XGBoost, stacking, k-means, Bayesian optimization, etc.)

License: MIT License

Jupyter Notebook 100.00%
machine-learning random-forest decision-tree xgboost bayesian-optimization hyperparameter-optimization hpo kmeans python-examples intrusion-detection

intrusion-detection-system-using-machine-learning's Introduction

Intrusion-Detection-System-Using-Machine-Learning

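This repository contains the code for the project "IDS-ML: Intrusion Detection System Development Using Machine Learning". The code and the proposed intrusion detection systems (IDSs) are general models that can be used in any IDS or anomaly detection application. Three papers have been published in this project:

  • Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles
  • MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles
  • LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles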

The code introduction of this repository is publicly available at:

This repository proposes three intrusion detection systems built with a range of machine learning algorithms: tree-based algorithms (decision tree, random forest, XGBoost, LightGBM, CatBoost, etc.), unsupervised learning algorithms (k-means), ensemble learning algorithms (stacking and the proposed LCCDE), and hyperparameter optimization techniques (Bayesian optimization).

Paper Abstract

Paper 1: Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles

  The use of autonomous vehicles (AVs) is a promising technology in Intelligent Transportation Systems (ITSs) to improve safety and driving efficiency. Vehicle-to-everything (V2X) technology enables communication among vehicles and other infrastructures. However, AVs and Internet of Vehicles (IoV) are vulnerable to different types of cyber-attacks such as denial of service, spoofing, and sniffing attacks. An intelligent IDS is proposed in this paper for network attack detection that can be applied to not only Controller Area Network (CAN) bus of AVs but also on general IoVs. The proposed IDS utilizes tree-based ML algorithms including decision tree (DT), random forest (RF), extra trees (ET), and Extreme Gradient Boosting (XGBoost). The results from the implementation of the proposed intrusion detection system on standard data sets indicate that the system has the ability to identify various cyber-attacks in the AV networks. Furthermore, the proposed ensemble learning and feature selection approaches enable the proposed system to achieve high detection rate and low computational cost simultaneously.

Figure 1: The overview of the tree-based IDS model.
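
The abstract above mentions feature selection on top of tree-based learners without giving details. One plausible way to do this, sketched below, is to average the feature importances reported by several tree models and keep the top-ranked features until their cumulative importance reaches a threshold. The averaging rule, the 0.9 threshold, the feature names, and the importance values are all assumptions for illustration, not values taken from the paper or the notebooks.

```python
def select_features(importances_per_model, feature_names, threshold=0.9):
    """Rank features by importance averaged over models and keep the
    top ones until their cumulative share reaches `threshold`."""
    n_models = len(importances_per_model)
    # Average each feature's importance across all models.
    averaged = [sum(column) / n_models for column in zip(*importances_per_model)]
    total = sum(averaged)
    ranked = sorted(zip(feature_names, averaged), key=lambda item: item[1], reverse=True)

    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


# Hypothetical feature names and importances from three tree models.
feature_names = ["flow_duration", "packet_length_mean", "fwd_packets", "idle_time"]
importances = [
    [0.50, 0.30, 0.15, 0.05],  # e.g. decision tree
    [0.40, 0.40, 0.10, 0.10],  # e.g. random forest
    [0.45, 0.35, 0.15, 0.05],  # e.g. extra trees
]
selected = select_features(importances, feature_names)
print(selected)
```

With real models, the importance vectors would come from the fitted learners rather than being typed in by hand.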

Paper 2: MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles

  Modern vehicles, including connected vehicles and autonomous vehicles, nowadays involve many electronic control units connected through intra-vehicle networks to implement various functionalities and perform actions. Modern vehicles are also connected to external networks through vehicle-to-everything technologies, enabling their communications with other vehicles, infrastructures, and smart devices. However, the improving functionality and connectivity of modern vehicles also increase their vulnerabilities to cyber-attacks targeting both intra-vehicle and external networks due to the large attack surfaces. To secure vehicular networks, many researchers have focused on developing intrusion detection systems (IDSs) that capitalize on machine learning methods to detect malicious cyber-attacks. In this paper, the vulnerabilities of intra-vehicle and external networks are discussed, and a multi-tiered hybrid IDS that incorporates a signature-based IDS and an anomaly-based IDS is proposed to detect both known and unknown attacks on vehicular networks. Experimental results illustrate that the proposed system can accurately detect various types of known attacks on the CAN-intrusion-dataset representing the intra-vehicle network data and the CICIDS2017 dataset illustrating the external vehicular network data.
  The proposed MTH-IDS framework consists of two traditional ML stages (data pre-processing and feature engineering) and four tiers of learning models:

  1. Four tree-based supervised learners — decision tree (DT), random forest (RF), extra trees (ET), and extreme gradient boosting (XGBoost) — used as multi-class classifiers for known attack detection;
  2. A stacking ensemble model and a Bayesian optimization with tree Parzen estimator (BO-TPE) method for supervised learner optimization;
  3. A cluster labeling (CL) k-means used as an unsupervised learner for zero-day attack detection;
  4. Two biased classifiers and a Bayesian optimization with Gaussian process (BO-GP) method for unsupervised learner optimization.

Figure 2: The overview of the MTH-IDS model.
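
The third tier listed above, cluster labeling (CL) k-means, can be illustrated with a small standard-library sketch: cluster the samples, label each cluster with the majority class of its members, and classify a new sample by the label of its nearest centroid. This is a simplified illustration, not the notebook's implementation. The farthest-first initialization, the toy 2-D points, and the function names are assumptions made to keep the example short and deterministic.

```python
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def farthest_first_init(points, k):
    """Deterministic initialization: start from the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def fit_cl_kmeans(points, labels, k, iterations=20):
    centroids = farthest_first_init(points, k)
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    # Cluster labeling: each cluster takes the majority class of its members.
    assignment = [nearest(p, centroids) for p in points]
    cluster_labels = {}
    for c in range(k):
        votes = Counter(lab for lab, a in zip(labels, assignment) if a == c)
        cluster_labels[c] = votes.most_common(1)[0][0] if votes else "unknown"
    return centroids, cluster_labels


def predict(point, centroids, cluster_labels):
    return cluster_labels[nearest(point, centroids)]


# Toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["normal"] * 4 + ["attack"] * 4
centroids, cluster_labels = fit_cl_kmeans(points, labels, k=2)
print(predict((9, 9), centroids, cluster_labels))
print(predict((2, 1), centroids, cluster_labels))
```

In practice the number of clusters is itself a hyperparameter, which is where an optimizer such as BO-GP from the fourth tier would come in.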

Paper 3: LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles

  Modern vehicles, including autonomous vehicles and connected vehicles, have adopted an increasing variety of functionalities through connections and communications with other vehicles, smart devices, and infrastructures. However, the growing connectivity of the Internet of Vehicles (IoV) also increases the vulnerabilities to network attacks. To protect IoV systems against cyber threats, Intrusion Detection Systems (IDSs) that can identify malicious cyber-attacks have been developed using Machine Learning (ML) approaches. To accurately detect various types of attacks in IoV networks, we propose a novel ensemble IDS framework named Leader Class and Confidence Decision Ensemble (LCCDE). It is constructed by determining the best-performing ML model among three advanced ML algorithms (XGBoost, LightGBM, and CatBoost) for every class or type of attack. The class leader models with their prediction confidence values are then utilized to make accurate decisions regarding the detection of various types of cyber-attacks. Experiments on two public IoV security datasets (Car-Hacking and CICIDS2017 datasets) demonstrate the effectiveness of the proposed LCCDE for intrusion detection on both intra-vehicle and external networks.

Figure 3: The overview of the LCCDE IDS model.
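
The leader-class idea described in the abstract can be sketched in a few lines. The code below picks, for every class, the model with the best validation F1-score (the "leader"), and then resolves disagreements between models using the leaders and their prediction confidence. It is a simplified illustration and not the paper's exact decision procedure. The tie-breaking rules, the fallback to the most confident model, and all scores below are assumptions.

```python
def find_class_leaders(f1_scores):
    """f1_scores: {model_name: {class_label: f1}} -> {class_label: leader_model}."""
    classes = next(iter(f1_scores.values())).keys()
    return {c: max(f1_scores, key=lambda m: f1_scores[m][c]) for c in classes}


def lccde_decide(predictions, leaders):
    """predictions: {model_name: (predicted_class, confidence)}.

    1. If all models agree, return the shared prediction.
    2. Otherwise keep only predictions made by the leader of the predicted
       class and return the most confident of them.
    3. If no model is the leader of its own prediction, fall back to the
       most confident prediction overall (an assumption of this sketch).
    """
    predicted = {label for label, _ in predictions.values()}
    if len(predicted) == 1:
        return predicted.pop()
    trusted = [
        (confidence, label)
        for model, (label, confidence) in predictions.items()
        if leaders.get(label) == model
    ]
    candidates = trusted or [(conf, label) for label, conf in predictions.values()]
    return max(candidates)[1]


# Hypothetical per-class validation F1-scores for the three base models.
f1_scores = {
    "xgboost": {"benign": 0.99, "dos": 0.95},
    "lightgbm": {"benign": 0.98, "dos": 0.97},
    "catboost": {"benign": 0.97, "dos": 0.96},
}
leaders = find_class_leaders(f1_scores)

# The models disagree: CatBoost is the most confident, but it is not the
# leader for "benign", so the LightGBM prediction for "dos" is used.
sample = {
    "xgboost": ("benign", 0.6),
    "lightgbm": ("dos", 0.8),
    "catboost": ("benign", 0.9),
}
print(lccde_decide(sample, leaders))
```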

Implementation

Dataset

CICIDS2017 dataset, a popular network traffic dataset for intrusion detection problems

  • Publicly available at: https://www.unb.ca/cic/datasets/ids-2017.html
  • To display the experimental results in Jupyter Notebook, sampled subsets of CICIDS2017 are used in the sample code. The subsets are in the "data" folder.
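
How the provided subsets were drawn is not described here. If you need to build a smaller subset of your own from the full CSV files, one possible approach is per-class reservoir sampling: it streams rows one at a time, so the full dataset never has to fit in memory, and it caps the number of rows kept per class. The column name `Label`, the cap of 50 rows per class, and the tiny in-memory CSV below are assumptions for illustration.

```python
import csv
import io
import random


def sample_rows_per_class(rows, label_key, max_per_class, seed=0):
    """Keep a uniform random sample of at most `max_per_class` rows for
    each class label while reading `rows` in a single pass."""
    rng = random.Random(seed)
    reservoirs = {}  # label -> sampled rows
    seen = {}        # label -> number of rows seen so far
    for row in rows:
        label = row[label_key]
        seen[label] = seen.get(label, 0) + 1
        bucket = reservoirs.setdefault(label, [])
        if len(bucket) < max_per_class:
            bucket.append(row)
        else:
            # Reservoir sampling: replace a stored row with
            # probability max_per_class / seen[label].
            j = rng.randrange(seen[label])
            if j < max_per_class:
                bucket[j] = row
    return reservoirs


# Stand-in for a large CSV file: 1000 benign rows and 5 attack rows.
demo_csv = io.StringIO(
    "Flow Duration,Label\n"
    + "".join(f"{i},BENIGN\n" for i in range(1000))
    + "".join(f"{i},DoS\n" for i in range(5))
)
subset = sample_rows_per_class(csv.DictReader(demo_csv), "Label", max_per_class=50)
print({label: len(rows) for label, rows in subset.items()})
```

To sample across several per-day files, the same function can be fed one continuous stream of rows, for example by chaining the `csv.DictReader` objects with `itertools.chain`.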

CAN-intrusion dataset, a benchmark network security dataset for intra-vehicle intrusion detection

Code

  • Tree-based_IDS_GlobeCom19.ipynb: code for the paper "Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles"
  • MTH_IDS_IoTJ.ipynb: code for the paper "MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles"
  • LCCDE_IDS_GlobeCom22.ipynb: code for the paper "LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles"

Machine Learning Algorithms

  • Decision tree (DT)
  • Random forest (RF)
  • Extra trees (ET)
  • XGBoost
  • LightGBM
  • CatBoost
  • Stacking
  • K-means

Hyperparameter Optimization Methods

  • Bayesian Optimization with Gaussian Processes (BO-GP)
  • Bayesian Optimization with Tree-structured Parzen Estimator (BO-TPE)

If you are interested in hyperparameter tuning of machine learning algorithms, please see the code in the following link:
https://github.com/LiYangHart/Hyperparameter-Optimization-of-Machine-Learning-Algorithms

Requirements & Libraries

Contact-Info

Please feel free to contact us for any questions or cooperation opportunities. We will be happy to help.

Citation

If you find this repository useful in your research, please cite one of the following three articles as:

L. Yang, A. Moubayed, I. Hamieh and A. Shami, "Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles," 2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1-6, doi: 10.1109/GLOBECOM38437.2019.9013892.

@INPROCEEDINGS{9013892,
  author={Yang, Li and Moubayed, Abdallah and Hamieh, Ismail and Shami, Abdallah},
  booktitle={2019 IEEE Global Communications Conference (GLOBECOM)}, 
  title={Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles}, 
  year={2019},
  pages={1-6},
  doi={10.1109/GLOBECOM38437.2019.9013892}
  }

L. Yang, A. Moubayed, and A. Shami, "MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles," IEEE Internet of Things Journal, vol. 9, no. 1, pp. 616-632, Jan. 1, 2022, doi: 10.1109/JIOT.2021.3084796.

@ARTICLE{9443234,
  author={Yang, Li and Moubayed, Abdallah and Shami, Abdallah},
  journal={IEEE Internet of Things Journal}, 
  title={MTH-IDS: A Multitiered Hybrid Intrusion Detection System for Internet of Vehicles}, 
  year={2022},
  volume={9},
  number={1},
  pages={616-632},
  doi={10.1109/JIOT.2021.3084796}}

L. Yang, A. Shami, G. Stevens, and S. DeRusett, "LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles," in 2022 IEEE Global Communications Conference (GLOBECOM), 2022, pp. 3545-3550, doi: 10.1109/GLOBECOM48099.2022.10001280.

@INPROCEEDINGS{10001280,
  author={Yang, Li and Shami, Abdallah and Stevens, Gary and de Rusett, Stephen},
  booktitle={GLOBECOM 2022 - 2022 IEEE Global Communications Conference}, 
  title={LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles}, 
  year={2022},
  pages={3545-3550},
  doi={10.1109/GLOBECOM48099.2022.10001280}}

intrusion-detection-system-using-machine-learning's People

Contributors

liyanghart

intrusion-detection-system-using-machine-learning's Issues

Regarding CICIDS2017 Dataset

I have the CICIDS2017 dataset files, but after concatenating the CSVs for all days, the merged file is too large to process. Could you please provide the full CICIDS2017 file (only a sample file is given in the data folder) so that the code can be executed successfully?

I have a question about code

In the function Anomaly_IDS, there is a line acc = metrics.accuracy_score(y2, result2), but I get an error like "Unresolved reference 'y2'". Can I get your help?

Problems with oversampling

After setting up the oversampling parameters, the following error occurs when oversampling the training set (X and y). Please tell me how to solve it.
ValueError: Unknown label type: unknown. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.

Lack of memory

How exactly did you run the tree-based Jupyter notebook on the original CICIDS2017 dataset? It's insane! 16 GB of memory is far from enough to support the full experiments.
