Mars-spectrometry-14th-place-solution

This is my final solution to the Mars Spectrometry challenge by NASA, hosted on DrivenData.

Check out the competition here.

Why this challenge?

TLDR:

NASA's rover sends back data after conducting evolved gas analysis (EGA) on the soil samples it collects. We need to model this data to predict whether each of 10 given compounds is present in a sample.

The story so far:

We are all curious about our neighbouring red planet, so NASA has been sending rovers to the surface of Mars. These rovers move across the surface and collect soil samples. They are also equipped with evolved gas analysis (EGA) instruments. The collected soil samples are heated through a range of temperatures and the evolved gaseous ions are observed. Based on the abundance of the evolved ions, we can tell what the chemical composition of the soil sample is. This process is called EGA.

The data generated by EGA is sent back to Earth. Scientists then need to model this data to find clues about the presence of life in the Martian soil.

The data is available on the competition website (you have to log in to DrivenData). As data scientists or machine learning practitioners, our job is to find the best possible model for this data and make the best predictions.

About the Problem:

  • The domain and nature of the problem are unusual.
  • The data has few samples and many features.
  • The problem requires us to classify the presence of 10 targets (carbonate, iron_oxide, sulfate, etc.).
  • The metric used is aggregated_log_loss.
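As a rough illustration of the metric, the sketch below assumes that aggregated log loss is the binary log loss computed separately for each target column and then averaged over the targets. The function name `aggregated_log_loss` and the clipping constant `eps` are illustrative choices, not taken from the competition code.

```python
import math


def aggregated_log_loss(y_true, y_pred, eps=1e-15):
    """Average of per-target binary log losses (assumed definition).

    y_true, y_pred: lists of rows, one row per sample and one
    column per target. y_true holds 0/1 labels, y_pred holds
    predicted probabilities.
    """
    n_samples = len(y_true)
    n_targets = len(y_true[0])
    per_target = []
    for j in range(n_targets):
        total = 0.0
        for i in range(n_samples):
            # Clip probabilities so log() never sees 0 or 1 exactly.
            p = min(max(y_pred[i][j], eps), 1 - eps)
            y = y_true[i][j]
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        per_target.append(total / n_samples)
    # Aggregate: simple mean over the target columns.
    return sum(per_target) / n_targets
```

For example, predicting 0.5 for every label gives a loss of ln 2 (about 0.693), which is a useful baseline to compare against.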

My solution:

  • My final solution is a simple average of the calibrated predictions of an ensemble of models.
  • I used logistic regression to find the most relevant features (feature selection), selecting tens of features from more than 10,000.
  • Added other features such as total_abundance for each sample, relative abundance, and changes in ion abundance (feature engineering).
  • Generated 8 different types of training sets based on various temperature and time bins (feature engineering).
  • Used 20-fold cross-validation.
  • Used 10 CatBoost classifiers to predict the 10 targets (one binary classifier per target) on each dataset.
  • Calibrated every model's predictions to better match the targets.
  • Combined the predictions by simple averaging.
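The binning idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: it assumes each sample is a sequence of `(temp, mz, abundance)` readings with non-negative abundances, and the function name `bin_features`, the 100-degree bin width, and the feature naming scheme are all hypothetical.

```python
from collections import defaultdict


def bin_features(rows, bin_width=100):
    """Build per-sample features from raw EGA readings.

    rows: iterable of (temp, mz, abundance) tuples for one sample
    (assumed layout). Returns a dict with the maximum abundance per
    (ion, temperature bin) plus the sample's total abundance.
    """
    features = defaultdict(float)  # starts at 0.0; abundances assumed >= 0
    total = 0.0
    for temp, mz, abundance in rows:
        # Lower edge of the temperature bin this reading falls into.
        lo = int(temp // bin_width) * bin_width
        key = f"mz{int(mz)}_temp{lo}"
        features[key] = max(features[key], abundance)
        total += abundance
    features["total_abundance"] = total
    return dict(features)


sample = [(35.0, 18, 2.0), (80.0, 18, 5.0), (150.0, 18, 1.0), (150.0, 44, 3.0)]
feats = bin_features(sample)
```

Changing `bin_width`, or binning on time instead of temperature, is one way to arrive at several differently shaped training sets from the same raw readings.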

My best predictions are avg_preds.csv from the caliberate and predict notebook, with an aggregated log loss of 0.13 on the private leaderboard.
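The simple averaging step might look something like the sketch below. It assumes every model's calibrated predictions are stored as a list of rows (one row per sample, one column per target) in the same order; the function name `average_predictions` is illustrative and not taken from the notebook.

```python
def average_predictions(prediction_sets):
    """Element-wise mean of several models' predictions.

    prediction_sets: list with one entry per model, each a list of
    rows shaped samples x targets (assumed layout).
    """
    n_models = len(prediction_sets)
    n_samples = len(prediction_sets[0])
    n_targets = len(prediction_sets[0][0])
    return [
        [
            sum(preds[i][j] for preds in prediction_sets) / n_models
            for j in range(n_targets)
        ]
        for i in range(n_samples)
    ]


model_a = [[0.2, 0.8], [0.9, 0.1]]
model_b = [[0.4, 0.6], [0.7, 0.3]]
avg = average_predictions([model_a, model_b])
```

Averaging probabilities tends to pull extreme predictions toward the middle, which is usually helpful for log loss because confident mistakes are penalised heavily.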

Other things I have tried

  • Neural networks: not very good for this competition.
  • Autoencoders and denoising autoencoders: these did not work out either.
  • Upsampling the minority class.
  • Automated feature engineering (Featuretools): did not work out because of the large data size and limited compute resources.

Thank you

Have a nice day 😊
