ml_classification_kickstarter's Issues

Data Exploration

  • Check for correlation
  • Create "fast" plots
  • Check data distribution
  • Check balancing of categories
  • Check hypothesis (still feasible?)
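
The checks above could be sketched roughly as follows. The column names (`goal`, `backers`, `state`) and all values are invented for illustration and are not taken from the project data.

```python
import pandas as pd

# Hypothetical toy data; the real project data will have other columns.
df = pd.DataFrame({
    "goal": [1000, 5000, 250, 12000, 800, 3000],
    "backers": [10, 40, 3, 90, 12, 25],
    "state": ["successful", "failed", "failed", "successful", "failed", "failed"],
})

# Pairwise Pearson correlation between the numeric columns.
correlation = df[["goal", "backers"]].corr()

# Share of each target class, to see how balanced the categories are.
class_balance = df["state"].value_counts(normalize=True)

# Summary statistics as a quick look at the distributions.
summary = df.describe()

print(correlation)
print(class_balance)
print(summary)
```

A strongly skewed `class_balance` would be a reason to revisit the chosen metric before any modeling starts.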

Evaluation

  • Set benchmark model (DummyClassifier)
  • Check defined metric
  • Define standardized evaluation functions
  • Analyze a subgroup of errors
  • Partial dependence plots (?)
  • Computation time (issue?)
  • Show difference between the three selected models
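
One possible shape for the benchmark and a standardized evaluation function is sketched below. It assumes scikit-learn; the synthetic data, the helper name `evaluate`, and the choice of accuracy and F1 are placeholders, since the project metric is still to be defined.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Return the same metrics for every model so results stay comparable."""
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }


# Synthetic, imbalanced stand-in data (an assumption, not the project data).
X, y = make_classification(n_samples=500, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Baseline that always predicts the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_scores = evaluate(baseline, X_test, y_test)
print(baseline_scores)
```

Passing each of the three selected models through the same `evaluate` call would make their differences directly comparable against the baseline.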

Data Mining

  • Merge CSV Data
  • Check the different CSV files for integrity
  • Check column names
  • Update Requirements.txt
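
A minimal sketch of the merge and integrity check, assuming that "merge" means stacking files that share the same columns. The in-memory strings stand in for the real CSV files, and the column names are hypothetical.

```python
import io

import pandas as pd

# In-memory stand-ins for CSV files; real code would pass file paths instead.
csv_parts = [
    io.StringIO("id,goal,state\n1,1000,successful\n2,5000,failed\n"),
    io.StringIO("id,goal,state\n3,250,failed\n2,5000,failed\n"),
]

frames = [pd.read_csv(part) for part in csv_parts]

# Integrity check: every file must have the same columns in the same order.
expected_columns = list(frames[0].columns)
for i, frame in enumerate(frames):
    if list(frame.columns) != expected_columns:
        raise ValueError(f"File {i} has unexpected columns: {list(frame.columns)}")

# Stack the files and drop rows that appear in more than one file.
merged = (
    pd.concat(frames, ignore_index=True)
    .drop_duplicates()
    .reset_index(drop=True)
)
print(merged)
```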

Data cleaning

  • Create MASTER notebook
  • Create "personal" notebooks
  • Find a strategy for dealing with NaN, missing values, categorical variables...
  • Define the use of "log scaling" depending on the feature
  • Check for data leakage
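
One possible cleaning strategy is sketched below. Median imputation, an explicit "unknown" level, and one-hot encoding are just one set of options, and the column names are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a missing value in each column type.
df = pd.DataFrame({
    "goal": [1000.0, np.nan, 250.0, 12000.0],
    "category": ["games", "music", None, "games"],
})

# Numeric gaps: fill with the median (one option among several).
df["goal"] = df["goal"].fillna(df["goal"].median())

# Categorical gaps: keep them visible as their own level.
df["category"] = df["category"].fillna("unknown")

# Log scaling for a right-skewed feature; log1p also handles zeros.
df["goal_log"] = np.log1p(df["goal"])

# One-hot encode the categorical column.
encoded = pd.get_dummies(df, columns=["category"])
print(encoded)
```

Note that the median here is computed on the full table. Computing it before the train/test split is itself a small source of leakage, so in the real workflow such statistics would be fitted on the training set only.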

Predictive Modeling

  • Which classifier to use?
  • Build three different models
  • Split train/test
  • Use of pipeline? (scaling)?
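
The split and the pipeline question could look like the sketch below. `LogisticRegression` is only a placeholder classifier and the data is synthetic; which classifiers to use is still open.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not the project data).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Stratified split keeps the class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The scaler is fitted on the training data only, inside the pipeline.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```

Wrapping the scaler in a pipeline means the test data never influences the scaling parameters, which also helps with the leakage check.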

Data Visualization

  • Seaborn & Matplotlib
  • Create meaningful plots first, then beautify them
  • Uniform colormap/palette
  • Error plot (?)
  • Use plot to validate or reject hypothesis
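
A uniform palette could be set once and reused in every plot, for example as in this Matplotlib-only sketch. The colors and counts are arbitrary placeholders.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from cycler import cycler

# One shared palette for all figures (hypothetical color choice).
PALETTE = ["#1b9e77", "#d95f02", "#7570b3"]
plt.rcParams["axes.prop_cycle"] = cycler(color=PALETTE)

# Hypothetical class counts for a quick class-balance plot.
counts = {"failed": 4, "successful": 2}

fig, ax = plt.subplots()
bars = ax.bar(list(counts.keys()), list(counts.values()), color=PALETTE[:2])
ax.set_xlabel("Project state")
ax.set_ylabel("Number of projects")
ax.set_title("Class balance")

# Render to an in-memory buffer instead of writing a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```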

Final Test

  • Check errors on train / test data
  • Evaluate Bias / Variance
  • Plot confusion matrix, check evaluation metric
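
The final checks could be sketched as follows. The unpruned decision tree is chosen on purpose because it tends to overfit, which makes the train/test gap easy to see; it is not meant as the project model, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the project data).
X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A large gap between the two errors points to high variance (overfitting);
# two similarly high errors point to high bias (underfitting).
train_error = 1 - model.score(X_train, y_train)
test_error = 1 - model.score(X_test, y_test)

# Rows are the true classes, columns the predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))

print(train_error, test_error)
print(cm)
```

For the plot itself, scikit-learn's `ConfusionMatrixDisplay` can render `cm` with Matplotlib.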

Feature Engineering

  • Necessary?
  • String extraction?
  • Polynomial features?
  • Length of the description?
  • Check data leakage
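
If feature engineering turns out to be necessary, the ideas above might look like this sketch. The column names (`blurb`, `goal`, `duration_days`) and the keyword are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy data; the real column names may differ.
df = pd.DataFrame({
    "blurb": ["A board game about trains", "New album", None],
    "goal": [1000.0, 5000.0, 250.0],
    "duration_days": [30.0, 45.0, 14.0],
})

# Length of the description; a missing description counts as length 0.
df["blurb_length"] = df["blurb"].fillna("").str.len()

# String extraction: flag whether the description mentions a keyword.
df["mentions_game"] = df["blurb"].fillna("").str.contains("game", case=False)

# Degree-2 polynomial features: the inputs, their squares, and their product.
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_values = poly.fit_transform(df[["goal", "duration_days"]])

print(df[["blurb_length", "mentions_game"]])
print(poly_values.shape)
```

For the leakage check, each engineered feature should only use information that is available at the time a project is launched.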

Model and Hyperparameter Tuning

  • Define hyperparameters subject to optimization
  • Random search followed by grid search (?)
  • Documentation of hyperparameters of best model
  • Upsampling / downsampling (?)
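
The two-stage search could be sketched like this, assuming scikit-learn and SciPy. The random forest, the parameter ranges, and the synthetic data are placeholders, not the project's choices.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data (an assumption, not the project data).
X, y = make_classification(n_samples=300, random_state=0)

# Step 1: coarse random search over wide ranges.
random_search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=30, random_state=0),
    param_distributions={
        "max_depth": randint(2, 12),
        "min_samples_leaf": randint(1, 10),
    },
    n_iter=8,
    cv=3,
    random_state=0,
)
random_search.fit(X, y)
best = random_search.best_params_


def neighbours(value):
    """Small grid of values around the best random-search result."""
    return sorted({max(1, value - 1), value, value + 1})


# Step 2: fine grid search around the best random-search result.
param_grid = {
    "max_depth": neighbours(best["max_depth"]),
    "min_samples_leaf": neighbours(best["min_samples_leaf"]),
}
grid_search = GridSearchCV(
    RandomForestClassifier(n_estimators=30, random_state=0),
    param_grid,
    cv=3,
)
grid_search.fit(X, y)

# Record these values to document the best model.
print(grid_search.best_params_, grid_search.best_score_)
```

Because the grid contains the best random-search combination, the grid search can only match or improve on that score under the same cross-validation setup.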

Create Slides

  • Use Google Slides
  • Define storytelling
  • Define audience
  • Present extra findings

Business Understanding

  • Get background information
  • Formulate hypothesis
  • Read through Kaggle competition comments?
  • Are there other things to predict besides funding?
  • What metric to use?
