signate-studentcup2019's Introduction

Student Cup 2019 1st place solution

Team ZoZei

Public / Private: 10817.55866 / 11713.39842

The external data and trained models used by this solution can be downloaded from Yandex Disk.
Download the training and test data yourself from the official competition site.

Data preparation

┣ data/
    ┣ train.csv             - training data
    ┣ test.csv              - test data
    ┣ csv_roseneki_XX.csv   - per-station information (https://opendata-web.site/station/)
      (XX = 11, 12, 13, 14)
    ┣ 13_2018.csv           - location reference information (http://nlftp.mlit.go.jp/isj/)
    ┣ L02-XXP-2K_13.csv     - published land prices (http://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-L01-v2_5.html)
      (XX = 30, 28)
    ┣ station_users.csv     - station passenger counts (https://opendata-web.site/station/rank/)
    ┣ tokyo_population.csv  - population data (https://www.e-stat.go.jp/gis/)
    ┣ google_map_XX.csv     - coordinates obtained with the Google Maps API
      (XX = train, test)
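
Before running the pipeline, it can help to confirm that every file in the listing above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_data_files` and the default `data` directory are assumptions, and the file names are simply the ones listed above with the XX placeholders expanded.

```python
from pathlib import Path

# File names copied from the listing above; the XX placeholders are
# expanded with the values given there. This list is illustrative only.
REQUIRED_FILES = (
    ["train.csv", "test.csv", "13_2018.csv",
     "station_users.csv", "tokyo_population.csv"]
    + [f"csv_roseneki_{xx}.csv" for xx in (11, 12, 13, 14)]
    + [f"L02-{xx}P-2K_13.csv" for xx in (30, 28)]
    + [f"google_map_{xx}.csv" for xx in ("train", "test")]
)


def missing_data_files(data_dir="data"):
    """Return the expected file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_data_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected data files are present.")
```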

Environment setup

Python 3.6.7
Install the packages listed in requirements.txt into a virtual environment of your choice.

IMPORTANT

Use exactly the package versions specified in requirements.txt.
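
Because the versions matter, a quick look at what requirements.txt pins can be useful before installing. The sketch below is only an illustration: it assumes the file uses the common `name==version` form, the helper names `parse_pins` and `python_matches` are hypothetical, and the sample lines are made up rather than taken from the actual file.

```python
import sys


def parse_pins(lines):
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def python_matches(expected=(3, 6)):
    """True if the running interpreter has the expected major.minor version."""
    return sys.version_info[:2] == tuple(expected)


# Made-up sample lines; in practice read them from requirements.txt.
sample = ["somepackage==1.2.3", "# a comment", "Other-Package == 0.4  # pinned"]
print(parse_pins(sample))
print("Python 3.6 interpreter:", python_matches())
```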

Folder structure

┣ data/
    ┣ train.csv
    ...
┣ preprocess/
    ┣ preprocess.py         - data cleansing
    ┣ generate_features.py  - feature generation
┣ train/
    ┣ interpolate.py        - internal regression by building ID
    ┣ train_cat_vXX.py      - model training
      (XX = 56, 58, 59)
┣ predict/
    ┣ predict.py            - stacking and prediction file generation
┣ run.py                    - entry-point script
┣ predictions/
    ┣ ...                   - intermediate files
┣ models/
    ┣ ...                   - trained models

Execution

./run.sh [-d] [-l]

(Options)
-d : Skip data preprocessing. Use this when train/test_complete.csv already exist.
-l : Use the trained models. Do not use this option unless trained models exist in models.
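
The effect of the two options can be sketched in Python as follows. This is a hedged illustration, not the actual contents of run.sh, which is not shown here. The step order is inferred from the folder structure, and treating interpolate.py as part of the training stage skipped by -l is an assumption. `plan_steps` and `parse_args` are hypothetical helper names.

```python
import argparse


def parse_args(argv=None):
    """Parse the -d and -l flags described above."""
    parser = argparse.ArgumentParser(description="Pipeline sketch")
    parser.add_argument("-d", dest="skip_preprocess", action="store_true",
                        help="skip data preprocessing")
    parser.add_argument("-l", dest="use_trained_models", action="store_true",
                        help="use trained models instead of training")
    return parser.parse_args(argv)


def plan_steps(skip_preprocess=False, use_trained_models=False):
    """Return the scripts to run, in order (the order is an assumption)."""
    steps = []
    if not skip_preprocess:
        steps += ["preprocess/preprocess.py",
                  "preprocess/generate_features.py"]
    if not use_trained_models:
        # Assumption: interpolate.py belongs to the training stage.
        steps += ["train/interpolate.py"]
        steps += [f"train/train_cat_v{v}.py" for v in (56, 58, 59)]
    steps.append("predict/predict.py")
    return steps


# Equivalent of "./run.sh -d": preprocessing is skipped, training still runs.
args = parse_args(["-d"])
for step in plan_steps(args.skip_preprocess, args.use_trained_models):
    print(step)
```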

Acknowledgements

Team members

signate-studentcup2019's Issues

TypeError: ufunc 'isnat' is only defined for datetime and timedelta.

First of all, thank you for sharing this great code.
When I ran generate_features.py, I got the error above.
It occurs with pandas==0.25.1 and looks like a pandas issue that has already been reported.
However, it is easy to avoid with the following code.

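# ID_COL, train and test are defined earlier in the script.
# Each ID column is appended to the 'buildingid' string key.
for col in ID_COL:
    if col not in ['age', 'maxfloor']:
        # Other columns: missing -> 0, cast to str, keep the first 5 characters.
        train['buildingid'] += train[col].fillna(0).astype(str).str[:5]
        test['buildingid'] += test[col].fillna(0).astype(str).str[:5]
    elif col == 'age':
        # Prefix 'a'; missing -> -1, then scale by 100 and cast to int.
        train['buildingid'] += col[0] + \
            (train[col].fillna(-1) * 100).astype(int).astype(str)
        test['buildingid'] += col[0] + \
            (test[col].fillna(-1) * 100).astype(int).astype(str)
    elif col == 'maxfloor':
        # Prefix 'f' (col[3]); same handling as 'age'.
        train['buildingid'] += col[3] + \
            (train[col].fillna(-1) * 100).astype(int).astype(str)
        test['buildingid'] += col[3] + \
            (test[col].fillna(-1) * 100).astype(int).astype(str)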

Thank you.
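
For readers who want to see what key the snippet above produces, here is a per-row sketch in plain Python. It is only an approximation of the pandas code: the `building_id` helper and the column name `address` are hypothetical, since the actual contents of ID_COL are not shown, and string formatting of missing values can differ slightly from pandas.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing, like fillna would."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def building_id(row, id_cols):
    """Concatenate ID columns into one string key for a single row."""
    parts = []
    for col in id_cols:
        value = row.get(col)
        if col == "age":
            # Prefix 'a'; missing -> -1, scaled by 100 and truncated to int.
            scaled = int((-1 if _is_missing(value) else value) * 100)
            parts.append(col[0] + str(scaled))
        elif col == "maxfloor":
            # Prefix 'f' (col[3]); same handling as 'age'.
            scaled = int((-1 if _is_missing(value) else value) * 100)
            parts.append(col[3] + str(scaled))
        else:
            # Other columns: missing -> 0, then the first 5 characters.
            parts.append(str(0 if _is_missing(value) else value)[:5])
    return "".join(parts)


# Hypothetical row; 'address' stands in for whatever ID_COL really contains.
row = {"address": "Tokyo-to Minato-ku", "age": 9.5, "maxfloor": 12}
print(building_id(row, ["address", "age", "maxfloor"]))
```

Rows that share the same values in the ID columns get the same key, which is what lets the key act as a building identifier.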
