capstone_westnilevirus's Introduction

Capstone: West Nile Virus

Competition Description:

West Nile virus is most commonly spread to humans through infected mosquitos. Around 20% of people who become infected with the virus develop symptoms ranging from a persistent fever to serious neurological illnesses that can result in death.

In 2002, the first human cases of West Nile virus were reported in Chicago. By 2004 the City of Chicago and the Chicago Department of Public Health (CDPH) had established a comprehensive surveillance and control program that is still in effect today.

Every week from late spring through the fall, mosquitos in traps across the city are tested for the virus. The results of these tests influence when and where the city will spray airborne pesticides to control adult mosquito populations.

Given weather, location, testing, and spraying data, this competition asks you to predict when and where different species of mosquitos will test positive for West Nile virus. A more accurate method of predicting outbreaks of West Nile virus in mosquitos will help the City of Chicago and CDPH more efficiently and effectively allocate resources towards preventing transmission of this potentially deadly virus.

Submissions are evaluated on the area under the ROC curve between the predicted probability that West Nile virus is present and the observed outcomes.
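To make the evaluation metric concrete, here is a minimal sketch of ROC AUC computed from its pairwise definition: the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. The function name `roc_auc` and the toy data are illustrative assumptions, not code from this repo; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count as half).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = virus present in the trap sample, 0 = absent.
observed = [0, 0, 1, 0, 1, 0]
predicted = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05]
auc = roc_auc(observed, predicted)
print(auc)  # 7 of the 8 positive/negative pairs are ranked correctly
```

Because the metric depends only on how samples are ranked, the predicted probabilities do not need to be well calibrated to score highly.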

Approach Summary:

Data Analysis:

  • Matplotlib and Seaborn used to visualise key features.

Data Pre-Processing:

  • PCA used to compress the weather features
  • Random oversampling to compensate for the minority class making up only 5% of observations
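As an illustration of what random oversampling does, here is a minimal plain-Python sketch that duplicates randomly chosen minority rows until the classes are the same size. It is a stand-in for a library sampler, not the code used in the notebook; the helper `random_oversample` and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement to make up the shortfall for this class.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 19 negatives and 1 positive (a 5% minority class).
X = [[float(i)] for i in range(20)]
y = [0] * 19 + [1]
X_res, y_res = random_oversample(X, y)
balanced_counts = Counter(y_res)
```

After resampling, both classes contain 19 rows. No new information is created: the extra positives are exact copies, which is why resampling has to be kept away from any data used for validation.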

Modelling:

  • imblearn pipeline so that re-sampling is applied only to the training folds during GridSearchCV cross-validation
  • Classification models compared: Logistic Regression, Decision Tree, Random Forest, Bagging, Gradient Boosting.
  • Cross-validation scoring metric: ROC AUC (scoring='roc_auc')

Prediction:

  • Tableau used to map quality of predictions geographically and temporally.
  • Tableau .twb file also provided in this repo

Outcome:

  • XGBoost was the selected classifier and outperformed the other classifiers by a wide margin in cross-validation. However, despite a cross-validated ROC AUC of 93%, the test score was only 69%, so the cross-validated estimate did not generalise to the test set.
  • Conclusions are detailed in the Python notebook

Potential Improvements:

  • Try a neural network to improve performance on the test set.

Contributors:

  • noahberhe
