Otto23

Kaggle Tabular Competition: OTTO – Multi-Objective Recommender System 2023
- My refactored solution, which placed in the top 7% (bronze medal) of this Kaggle competition
- Uses cuDF and Polars to speed up data processing

Leaderboard Scores

Your scores should be close to those below. Slight variations come from the covisitation matrix creation step.

Architecture        Private LB   Public LB
Covisit             0.57828      0.57807
Covisit + Ranker    0.58047      0.57966
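For context, OTTO evaluates submissions with Recall@20 computed separately for clicks, carts and orders, then weighted 0.10 / 0.30 / 0.60. The snippet below is a minimal plain-Python sketch of that metric, handy for local validation; the dict-based input layout (session id to predicted aids or ground-truth aids) is an assumption, not the format the notebooks use.

```python
from typing import Dict, List, Set

# Competition weights per event type.
WEIGHTS = {"clicks": 0.10, "carts": 0.30, "orders": 0.60}


def recall_at_20(preds: Dict[int, List[int]], labels: Dict[int, Set[int]]) -> float:
    """Recall@20 for one event type.

    preds:  session -> ranked predicted aids (only the first 20 count).
    labels: session -> ground-truth aids (for clicks, a single aid).
    """
    hits = 0
    possible = 0
    for session, truth in labels.items():
        if not truth:
            continue  # sessions without labels for this type are ignored
        top20 = set(preds.get(session, [])[:20])
        hits += len(top20 & truth)
        possible += min(20, len(truth))
    return hits / possible if possible else 0.0


def otto_score(preds_by_type, labels_by_type) -> float:
    """Weighted sum of Recall@20 over clicks, carts and orders."""
    return sum(
        w * recall_at_20(preds_by_type[t], labels_by_type[t])
        for t, w in WEIGHTS.items()
    )
```

Because orders carry 60% of the weight, small gains on orders move the leaderboard score far more than gains on clicks.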

Quick start

For covisitation-only predictions, run steps 1-5.
For covisitation + LGBMRanker predictions, run steps 1-6.

  1. Download the Kaggle OTTO data and unzip it into a folder named data.
  2. Run NB 01_01: preprocesses the JSON data into Parquet files and saves the aid type dictionaries as pickle files.
  3. Run NB 01_02: creates the local train and validation data files. For local validation, we train on weeks 1-3 and validate on week 4. The week 4 validation set follows the "Train/Test Split" described in the OTTO GitHub repo. The final train set uses weeks 2-4 for training, with the original test set (week 5) as the test data.
  4. Run NB 01_03: creates smaller versions of the validation data (5 / 10 / 25 / 50 / 100 %) so you can iterate faster. Pick the smallest validation split that meets your needs; 5% is a good starting point.
  5. Run NB 02_01: creates the covisitation matrices (function preprocess_covisits) and saves them as pickle files. Set CV_NUM to the preferred validation split, and set DO_LOCAL_VALIDATION (False if you want to submit to Kaggle OTTO). This notebook only produces predictions based on the covisitation matrices.
  6. Run NB 03_01: builds on NB 02_01 to get predictions with LGBMRanker. The notebook takes 50 covisitation candidates and finds the top 20 for each aid type, using user and item features it creates. Each aid type needs its own hyperparameter sweep; currently only 'carts' and 'orders' are set up. As in NB 02_01, set CV_NUM and DO_LOCAL_VALIDATION. Final submissions use the LGBMRanker predictions.
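Step 2 starts from the raw OTTO JSONL files, where each line is one session: {"session": 0, "events": [{"aid": 1517085, "ts": 1659304800025, "type": "clicks"}, ...]}. A minimal sketch of flattening those lines into one row per event; the integer type mapping and the millisecond-to-second conversion are assumptions for illustration and may differ from what NB 01_01 does.

```python
import json
from typing import Iterable, Iterator, Tuple

# Assumed integer encoding of event types (NB 01_01's dictionaries may differ).
TYPE2ID = {"clicks": 0, "carts": 1, "orders": 2}


def flatten_sessions(lines: Iterable[str]) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (session, aid, ts_seconds, type_id) rows from OTTO JSONL lines."""
    for line in lines:
        record = json.loads(line)
        session = record["session"]
        for event in record["events"]:
            # Raw timestamps are in milliseconds; integer-divide to seconds.
            yield session, event["aid"], event["ts"] // 1000, TYPE2ID[event["type"]]
```

The rows can then be written to Parquet, e.g. with Polars: `pl.DataFrame(list(rows), schema=["session", "aid", "ts", "type"], orient="row").write_parquet(path)`. Storing small integer codes instead of strings keeps the Parquet files compact and fast to load with cuDF or Polars.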
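The week 4 validation labels in step 3 come from truncating each session at a random point: the events before the cut are the model input, and the labels are what happens afterwards (the next click, plus every aid carted or ordered later in the session). The function below is a simplified sketch of that idea for a single session, not the OTTO repo's actual split script; the tuple layout and the name truncate_session are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Event = Tuple[int, int, int]  # (aid, ts, type_id); 0=clicks, 1=carts, 2=orders


def truncate_session(
    events: List[Event], rng: random.Random
) -> Optional[Tuple[List[Event], Dict[str, Set[int]]]]:
    """Split one time-sorted session into (visible events, labels).

    Hypothetical helper: a simplified version of the OTTO-style split.
    """
    if len(events) < 2:
        return None  # nothing left to predict after a cut
    cut = rng.randint(1, len(events) - 1)  # keep at least one event on each side
    visible, future = events[:cut], events[cut:]
    future_clicks = [aid for aid, _, t in future if t == 0]
    labels = {
        "clicks": set(future_clicks[:1]),  # only the next click counts
        "carts": {aid for aid, _, t in future if t == 1},
        "orders": {aid for aid, _, t in future if t == 2},
    }
    return visible, labels
```

Passing an explicit `random.Random(seed)` keeps the split reproducible across runs.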
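For the smaller validation sets in step 4, one option is to sample sessions by hashing their id rather than drawing them at random. This is an alternative technique, not necessarily what NB 01_03 does, and in_sample is a hypothetical name. Hashing gives the same subset on every run, and because a session kept at 5% has a hash bucket below 0.05, it is also kept at 10%, 25% and so on, so the splits are nested.

```python
import hashlib


def in_sample(session: int, fraction: float, salt: str = "otto-val") -> bool:
    """Deterministically keep roughly `fraction` of sessions (hypothetical helper)."""
    digest = hashlib.md5(f"{salt}:{session}".encode()).digest()
    # Top 53 bits of the hash -> float in [0, 1) with no rounding up to 1.0.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < fraction
```

Nested splits mean that a result measured on the 5% set stays comparable when you move up to the 25% or 100% set.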
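The covisitation matrices in step 5 count how often two aids appear close together in the same session, then keep each aid's strongest neighbours as candidates. Below is a generic sketch of that technique, not the repo's preprocess_covisits: the 24-hour window, the last-30-events limit and the type weights are illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Event = Tuple[int, int, int]  # (aid, ts_seconds, type_id); 0=clicks, 1=carts, 2=orders

# Illustrative weights by the type of the co-visited event (an assumption,
# not necessarily the values used in NB 02_01).
TYPE_WEIGHT = {0: 1.0, 1: 6.0, 2: 3.0}


def build_covisits(
    sessions: Dict[int, List[Event]],
    window_s: int = 24 * 3600,
    last_n: int = 30,
    top_k: int = 20,
) -> Dict[int, List[int]]:
    """Return aid -> top_k co-visited aids, ranked by summed pair weight."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for events in sessions.values():
        # Only the most recent events of each session form pairs.
        recent = sorted(events, key=lambda e: e[1])[-last_n:]
        seen = set()  # count each directed pair once per session
        for (a1, t1, ty1), (a2, t2, ty2) in combinations(recent, 2):
            if a1 == a2 or abs(t2 - t1) > window_s:
                continue
            if (a1, a2) not in seen:
                counts[a1][a2] += TYPE_WEIGHT[ty2]
                seen.add((a1, a2))
            if (a2, a1) not in seen:
                counts[a2][a1] += TYPE_WEIGHT[ty1]
                seen.add((a2, a1))
    return {aid: [b for b, _ in c.most_common(top_k)] for aid, c in counts.items()}
```

On the full dataset this pure-Python loop would be far too slow; the same pair-and-aggregate logic is what cuDF or Polars joins and group-bys are used for.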
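In step 6 the ranker scores every (session, candidate aid) row, and the 20 best-scoring candidates per session become the prediction. LightGBM's `LGBMRanker.fit(X, y, group=...)` needs `group` as the number of rows in each session, with rows sorted so each session is contiguous. The two helpers below are a hypothetical sketch of that bookkeeping in plain Python, not code from NB 03_01.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple


def group_sizes(session_ids: Sequence[int]) -> List[int]:
    """Rows per session, in order: the `group` argument for LGBMRanker.fit.

    Rows must already be sorted (contiguous) by session.
    """
    return [len(list(g)) for _, g in groupby(session_ids)]


def top_k_per_session(
    session_ids: Sequence[int], aids: Sequence[int], scores: Sequence[float], k: int = 20
) -> Dict[int, List[int]]:
    """Keep the k highest-scoring candidate aids for each session."""
    per_session: Dict[int, List[Tuple[float, int]]] = {}
    for s, a, sc in zip(session_ids, aids, scores):
        per_session.setdefault(s, []).append((sc, a))
    return {
        s: [a for _, a in sorted(cands, key=lambda x: -x[0])[:k]]
        for s, cands in per_session.items()
    }
```

Typical use: `ranker = lightgbm.LGBMRanker(objective="lambdarank")`, `ranker.fit(X, y, group=group_sizes(train_sessions))`, then `top_k_per_session(test_sessions, test_aids, ranker.predict(X_test))`.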

Todos

  • Update NB 02_01 with the code changes from NB 03_01 (should be minor code cleanup)
  • Refactor the suggest_carts function into otto_utils.py while keeping the same processing speed. Use starmap?
  • Add a note on how the validation data is created
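One possible shape for the starmap idea in the Todos: a helper that applies a per-session function to argument tuples, either serially with `itertools.starmap` or across processes with `multiprocessing.Pool.starmap`, which share the same call pattern. map_sessions is a hypothetical name and this is only a sketch; with the "spawn" start method, the function passed in must be defined at module level (for example in otto_utils.py) so it can be pickled.

```python
from itertools import starmap
from multiprocessing import Pool
from typing import Callable, Iterable, List, Optional, Sequence


def map_sessions(
    func: Callable, arg_tuples: Iterable[Sequence], processes: Optional[int] = None
) -> List:
    """Apply func(*args) to each argument tuple (hypothetical helper).

    processes=None runs serially; otherwise Pool.starmap spreads the calls
    over worker processes and returns results in input order.
    """
    if processes is None:
        return list(starmap(func, arg_tuples))
    with Pool(processes) as pool:
        return pool.starmap(func, arg_tuples)
```

A design caveat: every argument tuple is pickled per task, so large objects such as the covisitation dictionaries should not be repeated in each tuple. Loading them once per worker through `Pool(processes, initializer=..., initargs=...)` avoids that overhead, which matters for keeping the current processing speed.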

Acknowledgements

Thanks all