data-mining-techniques's Introduction

DM

data-mining-techniques's People

Contributors

jmitnik, ruimtekapper, thomasdegier

data-mining-techniques's Issues

Todo's Assignment 2

Code

Task 0: Preparation

  • Download dataset
  • Load the dataset using a load_expedia_data function.
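
A minimal sketch of what load_expedia_data could look like, assuming the dataset is a CSV file read with pandas. The `date_time` column name and the `nrows` parameter are assumptions for illustration and are not taken from the issue.

```python
import io

import pandas as pd


def load_expedia_data(path_or_buffer, nrows=None):
    """Load the Expedia CSV into a DataFrame (hypothetical helper)."""
    # pandas already treats the literal string "NULL" as missing by default;
    # it is listed explicitly here only to make that behaviour visible.
    df = pd.read_csv(path_or_buffer, nrows=nrows, na_values=["NULL"])
    # Assumption: the search timestamp lives in a `date_time` column.
    if "date_time" in df.columns:
        df["date_time"] = pd.to_datetime(df["date_time"])
    return df


# Tiny in-memory sample so the sketch runs without the real file.
sample = io.StringIO(
    "srch_id,date_time,prop_review_score\n"
    "1,2013-04-04 08:32:15,NULL\n"
    "1,2013-04-04 08:32:15,4.5\n"
)
df = load_expedia_data(sample)
print(df.dtypes)
```

Passing `nrows` is a cheap way to iterate on a subset while the pipeline is still changing.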

Task 1: Business Understanding

Related work research, etc.

Task 2: Data Understanding

  • Transform variables for pandas

    • Impute null / missing variables:
      • visitor_hist_starrating,
      • visitor_hist_adr_usd,
      • prop_review_score,
      • prop_location_score2,
      • srch_query_affinity_score,
      • orig_destination_distance,
      • comp1_rate,
      • comp1_inv,
      • comp1_rate_percent_diff,
      • comp2_rate,
      • comp2_inv,
      • comp2_rate_percent_diff,
      • comp3_rate,
      • comp3_inv,
      • comp3_rate_percent_diff,
      • comp4_rate,
      • comp4_inv,
      • comp4_rate_percent_diff,
      • comp5_rate,
      • comp5_inv,
      • comp5_rate_percent_diff,
      • comp6_rate,
      • comp6_inv,
      • comp6_rate_percent_diff,
      • comp7_rate,
      • comp7_inv,
      • comp7_rate_percent_diff,
      • comp8_rate,
      • comp8_inv,
      • comp8_rate_percent_diff,
      • gross_bookings_usd
  • Generate statistics of the data (mean, spread, etc)

  • Generate meaningful plots

  • Mark a number of manual features as most interesting (high correlations, etc)?

  • Maybe do some more in-depth analysis, such as Chi-square test for categorical variables?
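
The statistics and "most interesting features" steps above could be sketched as follows. This is only an illustration on a toy frame: the helper names `summarize` and `rank_by_correlation`, and the columns `price`, `stars`, and `booked`, are hypothetical.

```python
import numpy as np
import pandas as pd


def summarize(df):
    """Per-column mean, spread, range, and fraction of missing values."""
    numeric = df.select_dtypes(include="number")
    summary = numeric.describe().T[["mean", "std", "min", "max"]]
    summary["missing_frac"] = numeric.isna().mean()
    return summary


def rank_by_correlation(df, target):
    """Absolute Pearson correlation of each numeric column with `target`."""
    numeric = df.select_dtypes(include="number")
    # DataFrame.corr ignores missing values pairwise.
    corr = numeric.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)


# Toy data standing in for the real dataset.
toy = pd.DataFrame({
    "price": [100.0, 150.0, np.nan, 250.0],
    "stars": [2, 3, 4, 5],
    "booked": [0, 0, 1, 1],
})
summary = summarize(toy)
ranking = rank_by_correlation(toy, "booked")
print(summary)
print(ranking)
```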

Task 3: Data Preparation

  • Impute missing values
  • Apply feature selection
  • Encode the features (scikit-learn)
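
One possible way to wire imputation and encoding together with scikit-learn is sketched below. The choice of median and most-frequent imputation, and the `site_id` column, are assumptions for illustration. `handle_unknown="ignore"` makes the encoder emit all zeros for a category that was not present during fitting.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

numeric_cols = ["prop_review_score", "orig_destination_distance"]
categorical_cols = ["site_id"]  # assumed categorical column

preprocess = ColumnTransformer([
    # Numeric columns: replace missing values with the column median.
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    # Categorical columns: fill with the mode, then one-hot encode.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

train = pd.DataFrame({
    "prop_review_score": [4.0, np.nan, 3.0],
    "orig_destination_distance": [120.5, 80.0, np.nan],
    "site_id": [5, 5, 14],
})
test = pd.DataFrame({
    "prop_review_score": [np.nan],
    "orig_destination_distance": [10.0],
    "site_id": [99],  # category never seen during fit
})

X_train = preprocess.fit_transform(train)
X_test = preprocess.transform(test)
print(X_train.shape, X_test.shape)
```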

Task 4: Modeling and Evaluation

  • Test various models (motivate your choices here)

Paper

  • Setup a paper format
    ...

Iteration 2: Todo's for improving our pipeline

General

  • Deal with categories in categorical variables that might not exist in the test set
  • Add CSV results with the DCG score for each config setup.
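
For reference, a DCG score per search could be computed as in the sketch below. It assumes the common exponential-gain form, (2^rel - 1) / log2(rank + 1), and adds a normalized variant. The function names and the relevance grades in the example are illustrative, not taken from the issue.

```python
import math


def dcg_at_k(relevances, k):
    """DCG of a ranked list of relevance grades, truncated at position k."""
    # enumerate() is 0-based, so position i has discount log2(i + 2).
    return sum(
        (2 ** rel - 1) / math.log2(i + 2)
        for i, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance grades in the order the model ranked the hotels (made-up values).
ranked = [0, 1, 0]
print(dcg_at_k(ranked, k=3), ndcg_at_k(ranked, k=3))
```

Averaging `ndcg_at_k` over all searches gives one number per config setup, which is convenient to write as a row in a results CSV.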

Feature Engineering [Sami]

  • Combine competitor
  • Average location
  • IsInSameCountry
  • IsInCountryAsDestination
  • DateTime
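
A rough sketch of the "combine competitor" and "DateTime" items, assuming the competitor columns follow the `compN_rate` naming used elsewhere in this issue and that the timestamp is stored in a `date_time` column (an assumption). The new feature names are hypothetical.

```python
import numpy as np
import pandas as pd


def add_features(df):
    """Return a copy of df with combined competitor and date features."""
    out = df.copy()
    # Matches comp1_rate ... comp8_rate but not compN_rate_percent_diff.
    rate_cols = [c for c in out.columns
                 if c.startswith("comp") and c.endswith("_rate")]
    # Mean skips missing values; it stays NaN when every competitor is missing.
    out["comp_rate_mean"] = out[rate_cols].mean(axis=1)
    out["comp_rate_count"] = out[rate_cols].notna().sum(axis=1)
    if "date_time" in out.columns:
        ts = pd.to_datetime(out["date_time"])
        out["search_month"] = ts.dt.month
        out["search_dayofweek"] = ts.dt.dayofweek  # Monday = 0
        out["search_hour"] = ts.dt.hour
    return out


toy = pd.DataFrame({
    "comp1_rate": [1.0, np.nan],
    "comp2_rate": [-1.0, np.nan],
    "comp1_rate_percent_diff": [5.0, 6.0],
    "date_time": ["2013-04-04 08:32:15", "2012-12-31 23:00:00"],
})
features = add_features(toy)
print(features)
```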

Feature Impute [Thomas]

  • Groupby hotel or user, and fill in their nans
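
A sketch of group-wise imputation with pandas, assuming a hotel identifier column named `prop_id` (an assumption) and using the group median as the fill value. Groups with no observed value fall back to the global median. The helper name `impute_by_group` is hypothetical.

```python
import numpy as np
import pandas as pd


def impute_by_group(df, column, key):
    """Fill NaNs in `column` with the median of the row's `key` group."""
    # transform("median") returns a Series aligned with df's index.
    group_median = df.groupby(key)[column].transform("median")
    filled = df[column].fillna(group_median)
    # Groups that are entirely missing still hold NaN: use the global median.
    return filled.fillna(df[column].median())


toy = pd.DataFrame({
    "prop_id": [1, 1, 2, 2, 3],
    "prop_review_score": [4.0, np.nan, 2.0, 3.0, np.nan],
})
toy["prop_review_score"] = impute_by_group(toy, "prop_review_score", "prop_id")
print(toy)
```

The same helper works for grouping by user by passing a different `key`.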

Feature Selection [Thomas]

  • Explore which models are better (SVC)

  • Optional: apply PCA on the data -> throw this into the algorithm as well.
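
If PCA is tried, one option is to standardize first and keep enough components to explain a fixed share of the variance. The sketch below uses random data, and the 95% threshold is an arbitrary assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Make the second column nearly a copy of the first, so it is redundant.
X[:, 1] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=100)

# A float n_components keeps the smallest number of components whose
# cumulative explained variance reaches that fraction.
reducer = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
)
X_reduced = reducer.fit_transform(X)
print(X.shape, "->", X_reduced.shape)
```

The reduced matrix can then be passed to the model in place of, or alongside, the original features.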

Business Understanding [Thomas / Sami / Jonathan]

  • Find concrete inspirations (Tips for each)

Todo's Assignment 1

Todo's part 1

Coding

  • Finish data transformations [Sami]
  • Encode unfinished data transformations (sklearn stuff) [Jonathan]
  • Move all classes below 5 to an 'others' class
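
One reading of the last item is that categories occurring fewer than 5 times get merged into a single 'others' label. Under that assumption, a sketch with pandas could look like this (the helper name `lump_rare` is hypothetical):

```python
import pandas as pd


def lump_rare(series, min_count=5, other="others"):
    """Replace categories seen fewer than min_count times with `other`."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; everything else becomes `other`.
    return series.where(~series.isin(rare), other)


toy = pd.Series(["a"] * 6 + ["b"] * 2 + ["c"])
lumped = lump_rare(toy)
print(lumped.value_counts())
```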

Writing

  • Write about preprocessing: feature engineering, explanations

  • Statistics and correlations

  • Plots about features and resulting models

  • Discussion about the dataset, observations, models, etc.

  • Algorithms and parameters descriptions

  • Which features were selected (algorithm-wise)

  • Evaluation (performances)

  • Compare results

Todo's part 2

Coding

  • Finish plots [Thomas]
  • Get some basic statistics (variance, correlations, etc.)
  • Finish data encoding [Jonathan]
  • Train 2 models [jonathan]
  • Gather a few metrics (Recall / Precision)
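
For the metrics item, precision and recall for a binary classifier can be computed directly from the counts of true positives, false positives, and false negatives. A dependency-free sketch, with made-up labels:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```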

Writings

  • Write about plots and maybe a few statistics of the features [thomas]

  • Write about our feature transformations (which one we left out, etc) [thomas/jona]

  • Write about classifiers, their results and why we used them [jonathan]

Todo's part 3

3a - Kaggle [sami]

  • Describe competition
  • Describe a winning technique
  • Do some analysis

3b - MAE vs MSE [Thomas]

Done
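
As a quick illustration of the difference between the two measures (with made-up numbers), MSE squares each error and so reacts much more strongly to a single large error than MAE does:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0.0, 0.0, 0.0, 0.0]
small_errors = [1.0, 1.0, 1.0, 1.0]
one_outlier = [1.0, 1.0, 1.0, 10.0]

# With uniform errors of 1 both measures agree.
print(mean_absolute_error(y_true, small_errors),
      mean_squared_error(y_true, small_errors))
# A single large error moves MSE far more than MAE.
print(mean_absolute_error(y_true, one_outlier),
      mean_squared_error(y_true, one_outlier))
```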

3c - SmsCollective [jonathan]

  • Describe techniques used
  • Describe what to do for transformations
  • Describe which models to use
  • Analysis
