rimtouny / feature-selection

Delved into advanced techniques to enhance ML performance during the uOttawa 2023 ML course. This repository offers Python implementations of Naïve Bayes (NB) and K-Nearest Neighbor (KNN) classifiers on the MCS dataset.

License: MIT License

Jupyter Notebook 100.00%
auto-encoder bernoulli-naive-bayes chi-square dimensionality-reduction feature-selection information-gain knn-classification mutual-information pca tsne-visualization


Mobile Crowd Sensing (MCS) Data Analysis with NB and KNN Classifiers Using Different Feature Selection Methods

This repository contains Python implementations of Naïve Bayes (NB) and K-Nearest Neighbor (KNN) classifiers applied to the MCS dataset. The work explores advanced techniques for improving machine learning performance and was carried out during the 2023 uOttawa ML course.

  • Required libraries: scikit-learn, pandas, matplotlib.
  • Execute cells in a Jupyter Notebook environment.
  • The uploaded code has been executed and tested successfully within the Google Colab environment.

Binary classification problem

The task is to classify each record in the MCS dataset by its legitimacy status: Legitimate or Fake.

Independent Variables:

  • Features include ID, Latitude, Longitude, Day, Hour, Minute, Duration, RemainingTime, Resources, Coverage, OnPeakHours, GridNumber.

Target variable:

  • 'Legitimacy' column represents the target with two classes: 'Legitimate' and 'Fake'.

Key Tasks Undertaken

  1. Dataset Splitting based on 'Day' Feature:

    • Created the training set (days 0, 1, 2) and the test set (day 3) from the values of the 'Day' feature.
  2. Baseline Performance of NB and KNN:

    • Presented confusion matrices and F1 scores as baseline performance measures for both classifiers.

      • Bernoulli Naive Bayes [figure: confusion matrix and F1 score]

      • K-Nearest Neighbors [figure: confusion matrix and F1 score]

      • 2D t-SNE plots of the training and test sets [figure]
  3. Dimensionality Reduction (DR) using PCA and Autoencoder (AE):

    • Explored PCA and AE methods to determine the optimal reduced dimensionality based on F1 scores on the test dataset.

    • Plotted the F1 score against the number of components for both classifiers to identify the best-performing setting. [figure: F1 score vs. number of components]

      • PCA + Bernoulli Naive Bayes: maximum F1 score 93.32% at n_components = 10 [figure]
      • PCA + K-Nearest Neighbors: maximum F1 score 94.81% at n_components = 2 [figure]
  4. Feature Selection with Filter and Wrapper Methods:

    • Applied filter methods (Information Gain, Mutual Information, Variance Threshold, and Chi-Square) to determine the optimal number of features, analyzing how the F1 score varies with the number of selected features; the selected subsets improved on the baseline performance. [figure: F1 score vs. number of features, filter methods]

    • Applied wrapper methods (Forward Feature Selection, Backward Feature Elimination, and Recursive Feature Elimination) to evaluate feature relevance, and examined how the F1 score varies with the number of selected features; these subsets also improved on the baseline performance. [figure: F1 score vs. number of features, wrapper methods]

    • Visualized the results with 2D t-SNE plots of the training and test datasets, using the features chosen by the best method. [figures: 2D t-SNE plots of the training and test sets]

  5. Clustering Analysis using Latitude and Longitude:

    • Explored clustering methods (K-means, SOFM, DBSCAN) on latitude and longitude features to identify legitimate-only clusters.
    • For each algorithm, plotted the total number of members in legitimate-only clusters against the number of clusters. [figure]
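A minimal sketch of the day-based split in step 1, assuming the data sits in a pandas DataFrame with a column named `Day` (name taken from the feature list above). The helper `split_by_day` and the toy frame are hypothetical stand-ins for the notebook's own code and the real MCS file. The README does not say whether `Day` is kept as a feature afterwards, so that choice is left to the caller.

```python
import numpy as np
import pandas as pd


def split_by_day(df, train_days=(0, 1, 2), test_days=(3,), day_col="Day"):
    """Split rows into training and test frames by the value of the day column."""
    train = df[df[day_col].isin(train_days)].reset_index(drop=True)
    test = df[df[day_col].isin(test_days)].reset_index(drop=True)
    return train, test


# Hypothetical toy frame standing in for the MCS data: 5 rows per day, days 0-3.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Day": np.repeat([0, 1, 2, 3], 5),
    "Latitude": rng.uniform(45.0, 46.0, 20),
    "Legitimacy": rng.choice(["Legitimate", "Fake"], 20),
})

train_df, test_df = split_by_day(toy)
print(len(train_df), len(test_df))  # 15 5
```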
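The baseline in step 2 can be sketched with scikit-learn's `BernoulliNB` and `KNeighborsClassifier`, reporting a confusion matrix and an F1 score on the test set. This is an illustration rather than the notebook's code: the 0/1 label encoding (1 = Legitimate), `n_neighbors=5`, standardizing before KNN, and the helper `evaluate_baselines` are all assumptions, and the random data only stands in for the MCS features. Note that `BernoulliNB` binarizes every feature at `binarize=0.0` by default, so for raw continuous columns such as Latitude (always positive) the threshold matters a great deal.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_baselines(X_train, y_train, X_test, y_test):
    """Fit both baseline classifiers and return each one's test confusion matrix and F1."""
    models = {
        # binarize=0.0 (the default) thresholds each feature at zero
        "Bernoulli Naive Bayes": BernoulliNB(binarize=0.0),
        # KNN is distance-based, so features are standardized first (assumed, not stated in the README)
        "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1]),
            "f1": f1_score(y_test, y_pred, pos_label=1),
        }
    return results


# Hypothetical stand-in data; labels use 1 = Legitimate, 0 = Fake.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

results = evaluate_baselines(X[:150], y[:150], X[150:], y[150:])
for name, r in results.items():
    print(f"{name}: F1 = {r['f1']:.3f}")
    print(r["confusion_matrix"])
```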
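For the 2D t-SNE plots mentioned in steps 2 and 4, a minimal sketch using scikit-learn's `TSNE` follows. The helper name `tsne_2d`, the standardization step, and the perplexity value are assumptions. Because `TSNE` has no `transform` method, the training and test sets are embedded separately, so the two plots show structure within each set rather than a shared coordinate system. Each embedding can then be drawn with matplotlib's `scatter`, colored by the Legitimacy label.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def tsne_2d(X, perplexity=10.0, seed=0):
    """Standardize the features and embed them in two dimensions with t-SNE."""
    X_scaled = StandardScaler().fit_transform(X)
    # perplexity must be smaller than the number of samples
    return TSNE(n_components=2, perplexity=perplexity, random_state=seed).fit_transform(X_scaled)


# Hypothetical stand-ins for the training and test feature matrices.
rng = np.random.default_rng(7)
X_train = rng.normal(size=(120, 6))
X_test = rng.normal(size=(40, 6))

emb_train = tsne_2d(X_train)
emb_test = tsne_2d(X_test)
print(emb_train.shape, emb_test.shape)  # (120, 2) (40, 2)
```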
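The component sweep in step 3 amounts to refitting a PCA-plus-classifier pipeline for each candidate `n_components` and recording the test F1 score. The sketch covers PCA only (the autoencoder variant would need a deep-learning library); the scaling step, the helper `pca_f1_sweep`, and the toy data are assumptions. As in the README, the best setting is chosen on the test set, which can make the reported maximum somewhat optimistic; a separate validation split avoids that.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def pca_f1_sweep(make_clf, X_train, y_train, X_test, y_test):
    """Return {n_components: test F1} for scaler -> PCA -> classifier pipelines."""
    scores = {}
    for n in range(1, X_train.shape[1] + 1):
        model = make_pipeline(StandardScaler(), PCA(n_components=n), make_clf())
        model.fit(X_train, y_train)
        scores[n] = f1_score(y_test, model.predict(X_test))
    return scores


# Hypothetical stand-in data with 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

sweeps = {}
for name, make_clf in [("Bernoulli Naive Bayes", BernoulliNB), ("K-Nearest Neighbors", KNeighborsClassifier)]:
    sweeps[name] = pca_f1_sweep(make_clf, X[:200], y[:200], X[200:], y[200:])
    best_n = max(sweeps[name], key=sweeps[name].get)
    print(f"{name}: best n_components = {best_n}, F1 = {sweeps[name][best_n]:.3f}")
```

Plotting the keys of `sweeps[name]` against its values gives a components-vs-F1 curve of the kind described in step 3.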
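A sketch of the filter-method sweep in step 4, using scikit-learn's `SelectKBest` with `chi2` and `mutual_info_classif`, followed by KNN. scikit-learn has no separate information-gain scorer; information gain with respect to the class label equals the mutual information between feature and label, so `mutual_info_classif` is a common stand-in (it estimates MI for continuous features with a nearest-neighbor method, whereas a classic information-gain computation discretizes first). Min-max scaling is included because `chi2` requires non-negative inputs. Variance Threshold works differently: it drops low-variance columns without looking at the label. The helper `filter_f1_curve`, the KNN choice, and the data are assumptions.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2, mutual_info_classif
from sklearn.metrics import f1_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def filter_f1_curve(score_func, X_train, y_train, X_test, y_test):
    """Return {k: test F1} when keeping the k best features under score_func."""
    curve = {}
    for k in range(1, X_train.shape[1] + 1):
        model = make_pipeline(
            MinMaxScaler(),  # chi2 needs non-negative feature values
            SelectKBest(score_func=score_func, k=k),
            KNeighborsClassifier(),
        )
        model.fit(X_train, y_train)
        curve[k] = f1_score(y_test, model.predict(X_test))
    return curve


# Hypothetical stand-in data: only features 0 and 1 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(240, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

curves = {
    "Chi-Square": filter_f1_curve(chi2, X[:160], y[:160], X[160:], y[160:]),
    # a fixed random_state keeps the mutual-information estimate reproducible
    "Mutual Information": filter_f1_curve(
        partial(mutual_info_classif, random_state=0), X[:160], y[:160], X[160:], y[160:]
    ),
}

# Variance Threshold: a constant (zero-variance) column is removed, the rest are kept.
X_with_constant = np.column_stack([X, np.ones(len(X))])
X_reduced = VarianceThreshold(threshold=0.0).fit_transform(X_with_constant)
print(X_with_constant.shape[1], "->", X_reduced.shape[1])  # 7 -> 6
```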
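The wrapper methods in step 4 map onto scikit-learn's `SequentialFeatureSelector` (with `direction="forward"` or `"backward"`) and `RFE`. In this sketch KNN is the model wrapped by the sequential searches. `RFE` needs an estimator exposing `coef_` or `feature_importances_`, which KNN does not, so `LogisticRegression` is used there purely as a stand-in ranking model. The helper `wrapper_f1_curve`, the `cv=3` and `scoring="f1"` settings, and the data are assumptions, not the notebook's configuration.

```python
import numpy as np
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.neighbors import KNeighborsClassifier


def wrapper_f1_curve(make_selector, X_train, y_train, X_test, y_test, max_k):
    """Return {k: test F1} for KNN trained on the k features chosen by the selector."""
    curve = {}
    for k in range(1, max_k + 1):
        selector = make_selector(k).fit(X_train, y_train)
        clf = KNeighborsClassifier().fit(selector.transform(X_train), y_train)
        curve[k] = f1_score(y_test, clf.predict(selector.transform(X_test)))
    return curve


def sfs(direction):
    # Sequential search wrapped around KNN, scored by cross-validated F1 on the training set
    return lambda k: SequentialFeatureSelector(
        KNeighborsClassifier(), n_features_to_select=k, direction=direction, scoring="f1", cv=3
    )


selectors = {
    "Forward Feature Selection": sfs("forward"),
    "Backward Feature Elimination": sfs("backward"),
    # Stand-in ranking model: RFE needs coef_ or feature_importances_
    "Recursive Feature Elimination": lambda k: RFE(LogisticRegression(), n_features_to_select=k),
}

# Hypothetical stand-in data with 6 features.
rng = np.random.default_rng(3)
X = rng.normal(size=(240, 6))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)

# SequentialFeatureSelector needs k < n_features, so the sweep stops at 5.
curves = {
    name: wrapper_f1_curve(make, X[:160], y[:160], X[160:], y[160:], max_k=5)
    for name, make in selectors.items()
}
for name, curve in curves.items():
    print(name, {k: round(v, 3) for k, v in curve.items()})
```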
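A sketch of the legitimate-only cluster count from step 5: cluster the latitude/longitude pairs, then add up the members of every cluster whose points are all legitimate. K-means is swept over the number of clusters; DBSCAN has no cluster-count parameter, so `eps` is what would be varied there. The helper `legit_only_members`, the coordinates, and the DBSCAN settings are hypothetical, and SOFM is omitted because scikit-learn does not include one. Euclidean distance on raw degrees is only approximate (away from the equator a degree of longitude is shorter than a degree of latitude); DBSCAN with `metric="haversine"` on coordinates in radians avoids that distortion.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans


def legit_only_members(labels, is_legit):
    """Total size of the clusters whose members are all legitimate; DBSCAN noise (-1) is skipped."""
    total = 0
    for cluster in np.unique(labels):
        if cluster == -1:
            continue
        members = labels == cluster
        if is_legit[members].all():
            total += int(members.sum())
    return total


# Hypothetical coordinates: two legitimate-only hot spots and one mixed area (30 points each).
rng = np.random.default_rng(0)
coords = np.vstack([
    rng.normal([45.42, -75.69], 0.01, size=(30, 2)),
    rng.normal([45.50, -75.60], 0.01, size=(30, 2)),
    rng.normal([45.35, -75.80], 0.01, size=(30, 2)),
])
is_legit = np.r_[np.ones(60, dtype=bool), np.tile([True, False], 15)]

kmeans_curve = {
    k: legit_only_members(KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(coords), is_legit)
    for k in range(2, 8)
}
dbscan_labels = DBSCAN(eps=0.03, min_samples=5).fit_predict(coords)
print(kmeans_curve, legit_only_members(dbscan_labels, is_legit))
```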
