network-traffic-analysis-using-machine-learning's Introduction

ML Classification - Network Traffic Analysis

This project analyzes and classifies real network traffic records as malicious or benign. It compares and fine-tunes several Machine Learning algorithms, aiming for the highest accuracy with the lowest false-positive and false-negative rates.

Dataset: Aposemat IoT-23

The dataset utilized in this project is CTU-IoT-Malware-Capture-34-1, a part of the Aposemat IoT-23 dataset. This labeled dataset includes malicious and benign IoT network traffic and was created through the Avast AIC laboratory with support from Avast Software.

Data Classification Process

The project consists of four phases, each represented by a corresponding notebook within the notebooks directory. Intermediate data files are stored in the data directory, while trained models are kept in the models directory.

Phase 1: Initial Data Cleaning

Notebook: initial-data-cleaning.ipynb

This phase involves the initial exploration and cleaning of the dataset:

  1. Load the raw dataset into a pandas DataFrame.
  2. Review dataset summary and statistics.
  3. Fix combined columns.
  4. Remove irrelevant columns.
  5. Correct unset values and validate data types.
  6. Inspect the cleaned dataset.
  7. Save the cleaned dataset to a CSV file.
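The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the inline sample data, the column names (`combined`, `uid`, and so on), the three-space separator inside the combined column, and the use of `-` as the unset marker are all assumptions chosen for the example.

```python
import io

import numpy as np
import pandas as pd

# Tiny stand-in for the raw capture file (hypothetical columns and values).
raw = io.StringIO(
    "ts\tuid\tduration\torig_bytes\tcombined\n"
    "1545403816.96\tCrDn63\t2.99\t0\t-   Benign   -\n"
    "1545403824.18\tCY9lJW\t-\t-\t-   Malicious   C&C\n"
)

# 1. Load the raw data into a DataFrame.
df = pd.read_csv(raw, sep="\t")

# 2. Review summary information.
print(df.dtypes)

# 3. Fix the combined column by splitting it on runs of 2+ whitespace characters.
parts = df["combined"].str.split(r"\s{2,}", expand=True)
parts.columns = ["tunnel_parents", "label", "detailed_label"]
df = pd.concat([df.drop(columns="combined"), parts], axis=1)

# 4. Remove columns assumed to be irrelevant for classification.
df = df.drop(columns=["uid", "tunnel_parents"])

# 5. Turn the unset marker into NaN and convert numeric columns.
df = df.replace("-", np.nan)
for col in ["duration", "orig_bytes"]:
    df[col] = pd.to_numeric(df[col])

# 6. Inspect the cleaned data.
print(df.head())

# 7. Serialize to CSV (returned as a string here; pass a path to write a file).
cleaned_csv = df.to_csv(index=False)
```

In the project itself the CSV would be written to a file in the data directory rather than kept as a string.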

Phase 2: Data Processing

Notebook: data-preprocessing.ipynb

In this phase, the focus is on processing and transforming the data:

  1. Load the dataset into a pandas DataFrame.
  2. Review dataset summary and statistics.
  3. Analyze the target attribute.
  4. Encode the target attribute using LabelEncoder.
  5. Handle outliers with the Inter-quartile Range (IQR).
  6. Encode IP addresses.
  7. Manage missing values.
  8. Scale numerical attributes with MinMaxScaler.
  9. Handle categorical features: manage rare values and apply One-Hot Encoding.
  10. Verify the processed dataset and save it to a CSV file.
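One possible shape for these preprocessing steps is sketched below on a toy DataFrame. Several choices here are assumptions rather than the notebook's confirmed approach: outliers are clipped to the IQR fences (they could equally be dropped), IP addresses are encoded as integers, missing values are filled with the median, and a category counts as rare when it appears fewer than two times. The column names are hypothetical as well.

```python
import ipaddress

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Toy data with hypothetical column names.
df = pd.DataFrame({
    "id.orig_h": ["192.168.1.195", "192.168.1.195", "192.168.1.1",
                  "192.168.1.195", "192.168.1.2", "192.168.1.195"],
    "duration": [0.5, 1.0, np.nan, 2.0, 1.5, 500.0],
    "proto": ["tcp", "tcp", "udp", "tcp", "icmp", "tcp"],
    "label": ["Benign", "Malicious", "Benign",
              "Malicious", "Malicious", "Malicious"],
})

# Encode the target; LabelEncoder sorts classes, so Benign -> 0, Malicious -> 1.
df["label"] = LabelEncoder().fit_transform(df["label"])

# Handle outliers: clip to [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] (assumed strategy).
q1, q3 = df["duration"].quantile([0.25, 0.75])
iqr = q3 - q1
df["duration"] = df["duration"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Encode IP addresses as numbers (assumed encoding).
df["id.orig_h"] = (
    df["id.orig_h"].map(lambda ip: int(ipaddress.ip_address(ip))).astype("float64")
)

# Manage missing values: fill with the column median (assumed strategy).
df["duration"] = df["duration"].fillna(df["duration"].median())

# Scale numerical attributes to the [0, 1] range.
num_cols = ["id.orig_h", "duration"]
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])

# Group rare categories under "other", then one-hot encode.
counts = df["proto"].value_counts()
rare = counts[counts < 2].index
df["proto"] = df["proto"].where(~df["proto"].isin(rare), "other")
df = pd.get_dummies(df, columns=["proto"])

print(df)
```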

Phase 3: Model Training

Notebook: model-training.ipynb

This phase includes training and evaluating various classification models:

  1. Naive Bayes: ComplementNB
  2. Decision Tree: DecisionTreeClassifier
  3. Logistic Regression: LogisticRegression
  4. Random Forest: RandomForestClassifier
  5. Support Vector Classifier: SVC
  6. K-Nearest Neighbors: KNeighborsClassifier
  7. XGBoost: XGBClassifier
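A training loop over these classifiers might look like the sketch below. It runs on synthetic data from `make_classification` rather than the project's dataset, uses default hyperparameters, and leaves out `XGBClassifier` so that the example depends only on scikit-learn. All of these are simplifications for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import ComplementNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the processed dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# ComplementNB needs non-negative features, so scale to [0, 1].
# The scaler is fitted on the training split only; test values are clipped.
scaler = MinMaxScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = np.clip(scaler.transform(X_test), 0.0, 1.0)

models = {
    "Naive Bayes": ComplementNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Support Vector Classifier": SVC(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    # XGBClassifier from the xgboost package could be added here the same way.
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```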

Evaluation Method:

The results of each model are analyzed and compared.
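The project goal mentions accuracy together with false positive and false negative rates. As a hedged illustration of how those three numbers can be derived from a confusion matrix, the helper below (`binary_rates` is a hypothetical name, not taken from the notebooks) assumes labels encoded as 0 for benign and 1 for malicious.

```python
from sklearn.metrics import confusion_matrix


def binary_rates(y_true, y_pred):
    """Return accuracy, false positive rate, and false negative rate."""
    # For labels [0, 1], ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "fpr": fp / (fp + tn),  # benign records flagged as malicious
        "fnr": fn / (fn + tp),  # malicious records missed
    }


# Hand-made example: 4 benign and 6 malicious records.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

metrics = binary_rates(y_true, y_pred)
print(metrics)  # accuracy 0.7, fpr 0.25, fnr 2/6 (about 0.333)
```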

Phase 4: Model Tuning

Notebook: model-tuning.ipynb

This phase focuses on fine-tuning the best-performing model.

The performance of the model is analyzed both before and after tuning.
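A before-and-after comparison of this kind could be set up as in the sketch below. The README does not say which model performed best or which search method was used, so the choice of `RandomForestClassifier`, the use of `GridSearchCV`, the parameter grid, and the synthetic data are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the processed dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: default hyperparameters.
baseline = RandomForestClassifier(random_state=0).fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Tuning: exhaustive search over a small, illustrative grid with 3-fold CV.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# best_estimator_ is refit on the full training split by default.
tuned_acc = search.best_estimator_.score(X_test, y_test)

print("best parameters:", search.best_params_)
print(f"accuracy before tuning: {baseline_acc:.3f}")
print(f"accuracy after tuning:  {tuned_acc:.3f}")
```

Tuning does not always improve the held-out score, especially on small grids, which is why both numbers are reported side by side.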


By following these steps, the project aims to effectively classify network traffic and detect malicious activities with high accuracy.

Contributors

ashraygattani
