open-solution-avito-demand-prediction's Introduction

Avito Demand Prediction Challenge: open solution

This is an open solution to the Avito Demand Prediction Challenge.

More competitions 🎇

Check our collection of public projects, where you can find multiple Kaggle competitions with code, experiments and outputs.

The goal

Create an (entirely) open solution to this competition. We are opening not only the code, but also the process of creating it. The rules are simple:

  • Clean code and an extendable solution are, in the long run, much better than the current public LB position.
  • This solution should, by itself, establish a solid benchmark, as well as provide a good base for your custom ideas and experiments.

Disclaimer

In this open source solution you will find references to neptune.ml. It is a free platform for community users, which we use daily to keep track of our experiments. Please note that using neptune.ml is not necessary to proceed with this solution. You may run it as a plain Python script 😉.

Installation

  1. clone this repository: git clone https://github.com/minerva-ml/open-solution-avito-demand-prediction.git
  2. install requirements
  3. register with Neptune (if you wish to use it)
  4. update the neptune.yaml configuration file with your data file paths
  5. run experiment
  • with neptune:
$ neptune login
$ neptune experiment run --config neptune.yaml main.py -- train_evaluate_predict --pipeline_name main

collect the submission from the /output/solution-1 directory.

  • with pure Python:
$ python main.py -- train_evaluate_predict --pipeline_name main

collect the submission from the experiment_dir directory specified in neptune.yaml.

Get involved

You are welcome to contribute your code and ideas to this open solution. To get started:

  1. Check the competition project on GitHub to see what we are working on right now.
  2. Express your interest in a particular task by commenting on it, or by creating a new task with your fresh idea.
  3. We will get back to you quickly in order to start working together.
  4. Check CONTRIBUTING for more information.

User support

There are several ways to seek help:

  1. Kaggle discussion is our primary way of communication.
  2. Read the project's Wiki, where we publish descriptions of the code, pipelines and supporting tools such as neptune.ml.
  3. Submit an issue directly in this repo.

open-solution-avito-demand-prediction's Issues

train model on 'category name' and 'parent category name'

Train a model on images with two targets (multi-output model):

  • category name
  • parent category name

Features are two vectors:

  • probability distribution over category name (softmax)
  • probability distribution over parent category name (softmax)
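
A minimal sketch of how the two feature vectors could be formed, assuming the image model exposes raw scores (logits) for each of its two heads. The function names, class counts and random logits below are hypothetical, and NumPy stands in for whatever framework the model is actually built in.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def category_features(category_logits, parent_logits):
    """Concatenate the two probability distributions into one feature matrix."""
    return np.hstack([softmax(category_logits), softmax(parent_logits)])


# Hypothetical logits for 4 images: 5 categories and 3 parent categories.
rng = np.random.default_rng(0)
category_logits = rng.normal(size=(4, 5))
parent_logits = rng.normal(size=(4, 3))

features = category_features(category_logits, parent_logits)
print(features.shape)
```

Each row then holds one probability distribution per target, side by side, ready to be joined with the other features.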

Timestamp Features

Explore/add timestamp features:

  • day
  • day of the week
  • month (two values only)

Then explore/add group-by features with times.

UPDATE (from @Leoniak713):

  • features are extracted
  • group-by accepts these features
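
The timestamp features listed above could be extracted roughly as follows. This is a sketch with pandas, not the code from the repository: the `activation_date` and `price` column names, the helper name and the toy data are assumptions made for illustration.

```python
import pandas as pd


def add_timestamp_features(df, date_column="activation_date"):
    """Return a copy of df with day, day-of-week and month columns added."""
    dates = pd.to_datetime(df[date_column])
    out = df.copy()
    out["day"] = dates.dt.day
    out["day_of_week"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = dates.dt.month
    return out


# Toy data; the column names and values are made up for illustration.
df = pd.DataFrame({
    "activation_date": ["2017-03-15", "2017-03-28", "2017-04-01"],
    "price": [100.0, 250.0, 80.0],
})
df = add_timestamp_features(df)

# One possible group-by feature: mean price per day of the week.
df["mean_price_by_day_of_week"] = (
    df.groupby("day_of_week")["price"].transform("mean")
)
print(df)
```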

Explore solution_1 results

Explore the best/worst results and the most important features from solution_1 in notebooks/devbook.ipynb.

Repetitive code

Is there any logic in repeating the same code twice? Why did you put is_missing under the train_mode flag? You can declare it outside of train_mode and then return it when you need it...

    if train_mode:
        is_missing = Step(name='is_missing',
                          transformer=fe.IsMissing(**config.is_missing),
                          input_data=['input'],
                          adapter={'X': ([('input', 'X')])},
                          cache_dirpath=config.env.cache_dirpath, **kwargs)

        is_missing_valid = Step(name='is_missing_valid',
                                transformer=is_missing,
                                input_data=['input'],
                                adapter={'X': ([('input', 'X_valid')])},
                                cache_dirpath=config.env.cache_dirpath, **kwargs)

        return is_missing, is_missing_valid

    else:
        is_missing = Step(name='is_missing',
                          transformer=fe.IsMissing(**config.is_missing),
                          input_data=['input'],
                          adapter={'X': ([('input', 'X')])},
                          cache_dirpath=config.env.cache_dirpath, **kwargs)

        return is_missing
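
One way the suggestion could look is sketched below: the shared step is built once, and only the validation step depends on `train_mode`. The `Step` and `IsMissing` classes here are hypothetical stand-ins that only store their arguments, so that the snippet runs on its own; the function name `build_is_missing` and the config values are also made up.

```python
from types import SimpleNamespace


class Step:
    """Hypothetical stand-in for the pipeline Step class; it only stores its arguments."""

    def __init__(self, name, transformer, input_data, adapter, cache_dirpath, **kwargs):
        self.name = name
        self.transformer = transformer
        self.input_data = input_data
        self.adapter = adapter
        self.cache_dirpath = cache_dirpath


class IsMissing:
    """Hypothetical stand-in for fe.IsMissing."""

    def __init__(self, **params):
        self.params = params


def build_is_missing(config, train_mode, **kwargs):
    # Built once, regardless of mode.
    is_missing = Step(name='is_missing',
                      transformer=IsMissing(**config.is_missing),
                      input_data=['input'],
                      adapter={'X': ([('input', 'X')])},
                      cache_dirpath=config.env.cache_dirpath, **kwargs)

    if not train_mode:
        return is_missing

    # Only the validation step is specific to training.
    is_missing_valid = Step(name='is_missing_valid',
                            transformer=is_missing,
                            input_data=['input'],
                            adapter={'X': ([('input', 'X_valid')])},
                            cache_dirpath=config.env.cache_dirpath, **kwargs)
    return is_missing, is_missing_valid


# Made-up configuration values, for illustration only.
config = SimpleNamespace(is_missing={'columns': ['price']},
                         env=SimpleNamespace(cache_dirpath='/tmp/cache'))

train_steps = build_is_missing(config, train_mode=True)
inference_step = build_is_missing(config, train_mode=False)
print(train_steps[1].transformer is train_steps[0])
```

The return shape (a single step or a pair) is kept identical to the quoted code, so callers would not need to change.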

make submit, run master

Initialize your participation in the challenge:

  • make a submission,
  • run the code from the master branch,
  • make a submission generated by this code,
  • review the discussion and add some issues.

A lot of missing values

How should we handle missing values?
Currently, missing values are replaced with 0 and a new binary NaN / not-NaN indicator feature is added.
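
The approach described above (fill with 0 and add a binary indicator) can be sketched with pandas as follows. The helper name, the `_is_missing` suffix, the column names and the toy data are assumptions for illustration and are not taken from the repository.

```python
import numpy as np
import pandas as pd


def fill_missing_with_indicator(df, columns, fill_value=0):
    """Return a copy of df where NaNs are filled and flagged in new indicator columns."""
    out = df.copy()
    for column in columns:
        # 1 where the original value was missing, 0 otherwise.
        out[f"{column}_is_missing"] = out[column].isna().astype(int)
        out[column] = out[column].fillna(fill_value)
    return out


# Toy data with made-up column names and values.
df = pd.DataFrame({
    "price": [100.0, np.nan, 80.0],
    "image_top_1": [np.nan, 12.0, 3.0],
})
filled = fill_missing_with_indicator(df, ["price", "image_top_1"])
print(filled)
```

Keeping the indicator lets a model tell a filled-in 0 apart from a real 0.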
