

kaggledays-2019-gbdt

Original workshop in Paris:

Open in Colab

First part of workshop (basics of Skopt and GBM):

Open in Colab

Second part of workshop (LightGBM, XGBoost, CatBoost, NAS):

Open in Colab

Kaggle Days Paris

Competitive GBDT Specification and Optimization Workshop

Instructors

About the workshop

Gradient Boosting Decision Trees (GBDT) currently represent the state of the art for building predictors on flat, tabular data. However, they seldom perform best out-of-the-box (with default values) because of the many hyper-parameters to tune. Especially in the most recent GBDT implementations, such as LightGBM, the sheer number of interacting hyper-parameters makes finding the optimal settings by hand or by simple grid search difficult, owing to the high combinatorial complexity and the long running times of the experiments.

Random Optimization (Bergstra, James; Bengio, Yoshua. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 2012, 13(Feb): 281-305) and Bayesian Optimization (Snoek, Jasper; Larochelle, Hugo; Adams, Ryan P. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems. 2012. pp. 2951-2959) are often the answer you will get from experts.
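The random search idea from Bergstra and Bengio can be sketched in a few lines of pure Python. The `validation_error` function below is a hypothetical stand-in for a real cross-validation run, and the two sampled hyper-parameters (learning rate and number of leaves) are illustrative choices, not part of the original workshop code:

```python
import random

# Hypothetical toy "validation error" as a function of two
# hyper-parameters (learning_rate, num_leaves); its minimum is
# near lr = 0.1, leaves = 31. In practice this would be a full
# cross-validated model evaluation.
def validation_error(lr, leaves):
    return (lr - 0.1) ** 2 + ((leaves - 31) / 100) ** 2

rng = random.Random(0)
best = None
for _ in range(200):
    lr = 10 ** rng.uniform(-3, 0)   # log-uniform over [0.001, 1]
    leaves = rng.randint(2, 256)    # uniform over the integer range
    err = validation_error(lr, leaves)
    if best is None or err < best[0]:
        best = (err, lr, leaves)

print(best)  # (best error, best lr, best leaves)
```

Sampling the learning rate log-uniformly is the key trick: it spends the trial budget evenly across orders of magnitude instead of wasting most draws on large values.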

In this workshop we demonstrate how to use different optimization approaches based on Scikit-Optimize, a library built on top of NumPy, SciPy, and Scikit-Learn, and we present an easy and fast way to get them up and running.

Prerequisites

You should be aware of the role and importance of hyper-parameter optimization in machine learning.

Obtaining the Tutorial Material

In order to make the workshop easily accessible, we are offering cloud access:

We also have a brief exercise that can be found at:

The solution can be found here.

All the materials can be cloned from GitHub at the kaggledays-2019-gbdt repository. We have also prepared a stand-alone Windows installation based on WinPython (just ask us for the link).

Local installation notes

In order to run this workshop successfully on your local computer, you need a Python 3 installation (we suggest the most recent Anaconda distribution) and at least the following packages:

  • numpy >= 1.15.4
  • pandas >= 0.23.4
  • scipy >= 1.1.0
  • skopt >= 0.5.2
  • sklearn >= 0.20.2
  • lightgbm >= 2.2.2
  • xgboost >= 0.81
  • catboost >= 0.12.2
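As a convenience, the list above can be installed in one pip command. Note that on PyPI `skopt` is published as `scikit-optimize` and `sklearn` as `scikit-learn`; the minimum versions below simply mirror the list:

```shell
pip install "numpy>=1.15.4" "pandas>=0.23.4" "scipy>=1.1.0" \
            "scikit-optimize>=0.5.2" "scikit-learn>=0.20.2" \
            "lightgbm>=2.2.2" "xgboost>=0.81" "catboost>=0.12.2"
```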

Contributors

lmassaron
