
kdd_cup_2018's Introduction

KDD_CUP_2018

Awards

  • KDD CUP 2018: third place;
  • Best long-term prediction: second place;
  • Last-ten-days prediction: fifth place;

KDD CUP 2018 Solution


Team name: 头号玩家@ICA@CortexLabs

Team members: 周杰 (Zhou Jie, East China Normal University), 蔡恒兴 (Cai Hengxing, CortexLabs, Sun Yat-sen University)


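KDD CUP 2018 Top 4 link (reposted; please contact us for removal in case of infringement): https://github.com/piupiuup/kdd2018/blob/master/.gitignore/code

KDD CUP preliminary round Top 1 link (reposted; please contact us for removal in case of infringement): https://github.com/ryancheunggit/kddcup2018-of-fresh-air

KDD CUP 2017 Task2 Top 2 link: https://github.com/12190143/Black-Swan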


Requirements

Python 2.7

  • sklearn
  • pandas
  • numpy
  • XGBoost
  • LightGBM

Description

  • baseline/ data preprocessing and the main base models
  • dataset/ datasets and temporary data
  • image/ plots produced for analysis
  • output/ saved prediction results

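Overall approach

  1. Data preprocessing. This is mainly missing-value handling: a run of fewer than three consecutive missing values is filled by linear interpolation; a longer run is filled by a pre-trained model (pre_train.py) that predicts the next value from the preceding 3*24 consecutive values. A one-day sliding window is used to augment the training data.
  2. Main models
    • 1) LightGBM is the main model. It predicts one value at a time and is applied 48 times. The 5 prediction targets of ld (London) and bj (Beijing) each get their own model, 5 models in total, and all stations are trained together.
    • 2) ExtraTreeRegression predicts all 48 values at once, with 5 models for the 5 targets; all stations are trained together.
    • 3) XGBoost follows the same approach as LightGBM.
    • 4) LightGBM trained on log-transformed feature data; otherwise the same as above.
  3. Main features
    • 1) The previous 21 days of data are used to predict the values of the next two days. Features include the raw values and statistics such as max, min, and median, as well as statistics aggregated by day, by week, and so on.
    • 3) Weather features, taken mainly from the grid data, using the data of one nearby station; only temperature, humidity, and pressure are used.
    • 4) Weather forecasts, obtained by our own crawler (see crawl_data.py) and from the officially provided API data.
    • 5) Calendar flags: whether the day is a weekend, a workday, the first or last workday, the first day of a holiday, or the last day of a holiday.
    • 6) Features from pollutants other than the prediction target. For example, when predicting PM25, PM10 features are added; we found that adding only the last 3-4 days of such data works better.
  4. Model ensembling
    • 1) Train the same model with different parameters.
    • 2) Train the same model on different data, obtained by controlling the time range and the amount of missing data allowed.
    • 3) Combine the resulting predictions with a simple mean, a median, or a weighted sum.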
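
The missing-value rule in step 1 of the outline can be sketched as follows. This is a minimal illustration, not the code in the repository: the function name fill_short_gaps and the max_gap parameter are hypothetical, and only the linear-interpolation half of the rule is shown. Longer gaps, and gaps that touch either end of the series, are left as NaN here on the assumption that a separate model (pre_train.py in the repository) fills them afterwards.

```python
import math


def fill_short_gaps(values, max_gap=2):
    """Linearly interpolate interior runs of NaN no longer than max_gap.

    Hypothetical helper: max_gap=2 mirrors the "fewer than three
    consecutive missing values" rule. Longer runs and runs at the
    edges of the series are returned unchanged (still NaN).
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if not math.isnan(filled[i]):
            i += 1
            continue
        start = i
        while i < n and math.isnan(filled[i]):
            i += 1
        end = i  # index of the first observed value after the gap
        gap = end - start
        # Interpolate only when both neighbours exist and the gap is short.
        if start > 0 and end < n and gap <= max_gap:
            left, right = filled[start - 1], filled[end]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                filled[start + k] = left + step * (k + 1)
    return filled


nan = float("nan")
hourly = [1.0, nan, nan, 4.0, nan, nan, nan, 8.0]
print(fill_short_gaps(hourly))
```

In this example the two-hour gap between 1.0 and 4.0 is interpolated, while the three-hour gap before 8.0 stays missing for the model-based fill.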

See the report for the detailed solution.
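
The final blending step of the outline (simple mean, median, or weighted sum of several models' outputs) might look like the sketch below. The function name blend and its arguments are hypothetical. One assumption goes beyond the source: the weighted variant divides by the total weight, so the weights do not need to sum to 1.

```python
from statistics import mean, median


def blend(predictions, method="mean", weights=None):
    """Combine several models' forecasts for the same hours.

    predictions: list of equal-length lists, one list per model.
    method: "mean", "median", or "weighted" (hypothetical names).
    """
    if not predictions:
        raise ValueError("need at least one prediction series")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all prediction series must have the same length")

    # One tuple per forecast hour, holding every model's value for it.
    columns = list(zip(*predictions))

    if method == "mean":
        return [mean(c) for c in columns]
    if method == "median":
        return [median(c) for c in columns]
    if method == "weighted":
        if weights is None or len(weights) != len(predictions):
            raise ValueError("weighted blending needs one weight per model")
        total = float(sum(weights))
        # Assumption: normalise by the total weight.
        return [sum(w * v for w, v in zip(weights, c)) / total for c in columns]
    raise ValueError("unknown method: %s" % method)


model_outputs = [[10.0, 20.0], [20.0, 40.0], [60.0, 30.0]]
print(blend(model_outputs, "mean"))
print(blend(model_outputs, "median"))
print(blend(model_outputs, "weighted", weights=[2, 1, 1]))
```

The median is less sensitive to a single model that is far off for a given hour, which is one reason to try it alongside the mean.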

Contact

kdd_cup_2018's People

Contributors

drjzhou, enjoysport2022


kdd_cup_2018's Issues

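crawl_data.py cannot fetch weather information

Thanks for sharing! When running crawl_data.py, I found that table on line 41 is None, so the weather information cannot be retrieved afterwards. I have little experience with web crawlers; how should this be fixed? Thanks!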
