machine-learning-challenge's Introduction

Machine Learning - Exoplanet Exploration Analysis

Objective: Create machine learning models capable of classifying candidate exoplanets from the raw NASA Kepler space telescope dataset.

Background: The Kepler Space Observatory had verified 1,284 new exoplanets as of May 2016, and as of October 2017 there were over 3,000 confirmed exoplanets in total. The raw dataset exoplanet_data.csv is a cumulative record of all observed Kepler "objects of interest" (Source).

The models below were chosen for binary classification, a predictive modeling task in which a class label is predicted for each example of input data (Source). Each object is classified as either a confirmed exoplanet or not.
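As a minimal sketch of how a binary target could be derived, the example below maps a disposition column to 1 (confirmed) or 0 (anything else). The column names `koi_disposition` and `koi_period`, the label strings, and the sample rows are assumptions made for illustration; the actual layout of exoplanet_data.csv is not described in this report.

```python
import pandas as pd

# Hypothetical sample rows standing in for exoplanet_data.csv.
# Column names and label values are assumptions, not taken from the report.
df = pd.DataFrame({
    "koi_disposition": ["CONFIRMED", "FALSE POSITIVE", "CANDIDATE", "CONFIRMED"],
    "koi_period": [9.49, 19.90, 1.74, 2.53],
})


def to_binary_label(disposition: pd.Series) -> pd.Series:
    """Return 1 where the object is confirmed, 0 for every other disposition."""
    return (disposition == "CONFIRMED").astype(int)


# Target vector and feature matrix for a binary classifier.
y = to_binary_label(df["koi_disposition"])
X = df.drop(columns=["koi_disposition"])
print(y.tolist())
```

Collapsing every non-confirmed disposition into a single negative class is one possible choice; a different grouping would change the class balance and the resulting accuracy.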

Analysis Report

Of the five algorithms compared below, only Random Forests and Logistic Regression exceeded 85% test accuracy, with Random Forests highest at 89%. For predicting exoplanets, Random Forests is therefore the best of these five models. The accuracy scores have a limitation, however: no features were specifically selected or dropped before training, so erroneous or uninformative data may have been carried into the hyperparameter tuning phase and skewed the test accuracy.

To improve accuracy, the following may be considered:

  • Choosing effective and efficient hyperparameter value ranges when tuning with GridSearchCV
  • Removing features that contribute little to classifying the exoplanets, which also reduces processing time
  • Checking each model for overfitting and underfitting
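The three suggestions above can be combined in one scikit-learn pipeline. The sketch below is not the configuration used for this report: it runs on synthetic data from `make_classification`, and the choice of `SelectKBest`, the grid values, and the fold count are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Kepler data (assumption: 20 features, 5 informative).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# Scaling and feature selection live inside the pipeline so that they are
# re-fit on each training fold and never see the validation fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# A deliberately small grid: 3 x 3 = 9 candidates.
param_grid = {
    "select__k": [5, 10, 20],
    "model__C": [0.1, 1, 10],
}

grid = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

# Comparing train and test accuracy gives a rough over/underfitting check.
train_score = grid.score(X_train, y_train)
test_score = grid.score(X_test, y_test)
print(grid.best_params_)
print(f"cv={grid.best_score_:.3f} train={train_score:.3f} test={test_score:.3f}")
```

A training score well above the test score hints at overfitting, while low scores on both hint at underfitting. Treating the number of kept features `k` as a tunable parameter lets the search itself decide how many features are worth keeping.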

Model Rank

1. Random Forests

  • Predictive Test Accuracy: 0.890
  • Best Grid score: 0.873

2. Logistic Regression

  • Predictive Test Accuracy: 0.880
  • Best Grid score: 0.885

3. Decision Trees

  • Predictive Test Accuracy: 0.799
  • Best Grid score: 0.790

4. K-Nearest Neighbors (KNN)

  • Predictive Test Accuracy: 0.660
  • Best Grid score: 0.672

5. Support Vector Machine (SVM)

  • Predictive Test Accuracy: 0.600
  • Best Grid score: 0.605
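A ranking like the one above can be produced by fitting each model on the same split and sorting by test accuracy. The sketch below uses synthetic data and default hyperparameters, so its scores will not match the figures reported here. Wrapping each model in a `MinMaxScaler` pipeline is an assumption: distance-based models such as KNN and SVM are sensitive to feature scale, and the report does not state whether the original runs scaled the features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Kepler data (assumption).
X, y = make_classification(
    n_samples=500, n_features=15, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# The five model families compared in this report, with default settings.
models = {
    "Random Forests": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Trees": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors (KNN)": KNeighborsClassifier(),
    "Support Vector Machine (SVM)": SVC(),
}

scores = {}
for name, model in models.items():
    # Fit the scaler on the training data only, then score on the held-out set.
    clf = make_pipeline(MinMaxScaler(), model)
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)

# Highest test accuracy first.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for rank, (name, accuracy) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {accuracy:.3f}")
```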

Additional Information

Fastest run time

  • Decision Trees at 5.1 seconds with 216 fits

Slowest run time

  • K-Nearest Neighbors (KNN) at 29.1 minutes with 28420 fits

Challenges

  • Capturing all Hyperparameter Tuning model parameters for each model
  • Run time for fitting after GridSearchCV
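Run time under GridSearchCV grows with the number of fits, which is the number of parameter combinations multiplied by the number of cross-validation folds. The sketch below counts fits before any training starts. The grid and fold count are hypothetical, chosen only so the arithmetic lands on 216; the grids actually used for this report are not listed.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical decision-tree grid (not the grid used in this report).
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "criterion": ["gini", "entropy"],
}
cv_folds = 3  # assumed number of cross-validation folds

# ParameterGrid enumerates every combination: 4 * 3 * 3 * 2 = 72 candidates.
n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * cv_folds

print(f"{n_candidates} candidates x {cv_folds} folds = {n_fits} fits")
```

Because the count is a product, adding one more value to any single parameter multiplies the total, which is how a search can grow to tens of thousands of fits. Checking the count first makes it easier to trim a grid before committing to a long run.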

Model and Dataset Visualizations

  • Extra visualizations can be found in the data-visualizations directory

Contributors

  • diannejardinez
