
rimtouny / user-forest-cover-type-prediction


Predicting Colorado forest cover types using diverse ML models for classification. Baseline creation, feature selection, comparison, and tuning optimize accuracy in this University of Ottawa Master's Machine Learning course final project (2023).

License: MIT License

Jupyter Notebook 100.00%
baseline-performance decisiontreeclassifier eda feature-selection knn logisticregression machine-learning naive-bayes-classifier svm tsne-visualization


User Forest Cover Type Prediction

This project predicts Colorado forest cover types with a range of ML classification models. Baseline creation, feature selection, model comparison, and tuning are used to optimize accuracy on the Forest Cover Type Prediction dataset. It is the final project for a University of Ottawa Master's Machine Learning course (2023).

  • Required libraries: scikit-learn, pandas, matplotlib.
  • Execute cells in a Jupyter Notebook environment.
  • The uploaded code has been executed and tested successfully within the Google Colab environment.

Multi-class classification problem

The task is to classify each sample of the Forest Cover Type Prediction dataset into one of seven cover types: Spruce/Fir, Lodgepole Pine, Ponderosa Pine, Cottonwood/Willow, Aspen, Douglas-fir, and Krummholz.

Independent Variables:

  • 54 geographical features: 'Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology', 'Vertical_Distance_To_Hydrology', 'Horizontal_Distance_To_Roadways', 'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm', 'Horizontal_Distance_To_Fire_Points', 'Wilderness_Area1' to 'Wilderness_Area4', and 'Soil_Type1' to 'Soil_Type40'.

Target variable:

  • The 'Cover Type' column is the target, with 7 classes.
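The variable layout above can be written down as a small schema, which makes the count of 54 features easy to check. This is a minimal sketch rather than the project's own code: the integer codes 1 to 7 follow the encoding commonly used for this dataset, and the default target column name `Cover_Type` (with an underscore) and the helper `split_features_target` are assumptions for illustration.

```python
# Feature names as listed above; 10 numeric + 4 wilderness + 40 soil = 54.
NUMERIC_FEATURES = [
    "Elevation", "Aspect", "Slope",
    "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]
WILDERNESS_FEATURES = [f"Wilderness_Area{i}" for i in range(1, 5)]
SOIL_FEATURES = [f"Soil_Type{i}" for i in range(1, 41)]
FEATURES = NUMERIC_FEATURES + WILDERNESS_FEATURES + SOIL_FEATURES

# Assumed integer encoding of the seven target classes.
COVER_TYPES = {
    1: "Spruce/Fir",
    2: "Lodgepole Pine",
    3: "Ponderosa Pine",
    4: "Cottonwood/Willow",
    5: "Aspen",
    6: "Douglas-fir",
    7: "Krummholz",
}


def split_features_target(df, target="Cover_Type"):
    """Hypothetical helper: return (X, y) from a DataFrame with the columns above.

    The target column name is an assumption; adjust it to match the CSV in use.
    """
    return df[FEATURES], df[target]
```

With a pandas DataFrame `df` loaded from the dataset's CSV, `X, y = split_features_target(df)` would yield the 54-column feature matrix and the class labels.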

Key Tasks Undertaken

  1. Problem’s Overview:

    • Create a conceptual figure showcasing the end-to-end data flow.
    • Illustrate insights into the problem through data flow visualization.
  2. Dataset’s Overview (EDA):

    • Present numerical information about the dataset.
  3. General Flowchart:

    • Develop a detailed flowchart illustrating each step of the project's implementation.

  4. Visualize Training and Test Sets:

    • Generate TSNE plots separately for the training and test sets to understand the problem's complexity.
      • Figure: Problem Complexity

      • Figure: Reduction and Transformation
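As a rough illustration of the t-SNE step, the sketch below embeds a feature matrix into two dimensions with scikit-learn. It is not the notebook's code: the synthetic data from `make_classification` stands in for the real training set, and the helper name `tsne_embed`, the perplexity, and the scaling step are assumptions. scikit-learn's `TSNE` offers `fit_transform` but no `transform` for unseen data, which is one reason to embed the training and test sets separately.

```python
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def tsne_embed(X, perplexity=30, random_state=0):
    """Scale the features, then project them to 2-D with t-SNE."""
    X_scaled = StandardScaler().fit_transform(X)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=random_state)
    return tsne.fit_transform(X_scaled)


# Synthetic stand-in for the training set (7 classes, like the cover types).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=8,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
embedding = tsne_embed(X_demo)  # one 2-D point per sample
```

The two embedding columns can then be drawn with `matplotlib.pyplot.scatter(embedding[:, 0], embedding[:, 1], c=y_demo)`; heavily overlapping class clusters in such a plot suggest a harder classification problem.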

  5. Obtain Baseline Performance:

    • Apply diverse ML methods (KNN, LogisticRegression, SVM, DecisionTreeClassifier, Naive Bayes Classifier) to establish a baseline.

    • Figure: Champion Model
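A baseline comparison of the five model families named above might look like the following sketch. Everything beyond the model names is an assumption: default hyperparameters, standard scaling for the distance- and margin-based models, `GaussianNB` as the Naive Bayes variant, test-set accuracy as the metric, and synthetic data in place of the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def baseline_scores(X_train, X_test, y_train, y_test):
    """Fit each baseline model and return its test-set accuracy by name."""
    models = {
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "LogisticRegression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
        "NaiveBayes": GaussianNB(),
    }
    return {name: model.fit(X_train, y_train).score(X_test, y_test)
            for name, model in models.items()}


# Synthetic stand-in data with seven classes.
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

scores = baseline_scores(X_train, X_test, y_train, y_test)
champion = max(scores, key=scores.get)  # model with the highest accuracy
```

Picking the champion by a single train/test split is the simplest option; cross-validation would give a less noisy ranking at the cost of more training time.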

  6. First Improvement Strategy: Feature Selection:

    • Implement feature selection methods, including:

      • Filter Selection Methods (Information Gain/Mutual Information, Feature Selection, Variance Threshold, Chi-Square)
      • Wrapper Selection Methods (Forward Feature Selection, Backward Feature Elimination, Recursive Feature Elimination)
    • Proceed with the best-performing feature subset and ML model for subsequent stages.

      • Champion Model in Filter Selection: Information Gain
           Maximum of Feature Selection-K-Nearest Neighbors: 73.96721311475409
           Best number of n_components Feature Selection-K-Nearest Neighbors: 12
        
           Maximum of Feature Selection-Decision Tree Classifier: 76.65573770491804
           Best number of n_components Feature Selection-Decision Tree Classifier: 8

      • Champion Model in Wrapper Selection: Recursive Feature Elimination
           Maximum of Recursive_FE-K-Nearest Neighbors: 73.96721311475409
           Best number of n_components Recursive_FE-K-Nearest Neighbors: 12
        
           Maximum of Recursive_FE-Decision Tree Classifier: 76.26229508196721
           Best number of n_components Recursive_FE-Decision Tree Classifier: 10
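The search for the best number of features can be sketched as a sweep over candidate feature counts, once with a filter method (mutual information) and once with a wrapper method (recursive feature elimination). This is an illustrative outline, not the notebook's procedure: the helper names, the candidate counts, the Decision Tree as the downstream model, and the synthetic data are all assumptions.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


def filter_selector(k):
    """Filter method: keep the k features with the highest mutual information."""
    score_func = partial(mutual_info_classif, random_state=0)
    return SelectKBest(score_func=score_func, k=k)


def wrapper_selector(k):
    """Wrapper method: recursively drop features until k remain."""
    return RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=k)


def sweep(make_selector, X_train, X_test, y_train, y_test, k_values):
    """Return (best_k, best_accuracy) over the candidate feature counts."""
    results = {}
    for k in k_values:
        model = make_pipeline(make_selector(k),
                              DecisionTreeClassifier(random_state=0))
        results[k] = model.fit(X_train, y_train).score(X_test, y_test)
    best_k = max(results, key=results.get)
    return best_k, results[best_k]


# Synthetic stand-in data with seven classes.
X, y = make_classification(n_samples=400, n_features=15, n_informative=8,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

k_values = [4, 8, 12]
filter_best = sweep(filter_selector, X_train, X_test, y_train, y_test, k_values)
wrapper_best = sweep(wrapper_selector, X_train, X_test, y_train, y_test, k_values)
```

Placing the selector inside the pipeline means it is fitted on the training split only, so the test accuracy is not influenced by the selection step.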

  7. Adding More Machine Learning Models:

    • Implement advanced models (Random Forest, ensemble techniques) to enhance performance.

    • Compare new technique performance with the initial improvement through confusion matrices.
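A Random Forest run followed by a confusion matrix might be outlined as below. This is a hedged sketch on synthetic data: the number of trees, the split ratio, and the variable names are assumptions, and the project's actual ensemble settings are not stated here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with seven classes.
X, y = make_classification(n_samples=700, n_features=20, n_informative=10,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=np.unique(y))
```

Off-diagonal cells of `cm` show which classes are confused with each other, so comparing the matrices of two models shows where an accuracy gain actually comes from.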
