
daguan-classify-2018's Introduction

Daguan Cup 2018


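The parameters were not well tuned and the competition was rushed. The single model was never scored online; offline it reached 0.784. The final score was 0.791, ranking 18/3462. Since the ranking is not high, I will not write much here and will wait for the top teams to share. The approach is exactly what the code shows, and it is simple.

Download the data from Daguan Data (达观数据) and place it in the data directory.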

1. Environment

Environment/Library Version
Ubuntu 14.04.5 LTS
python 3.6
jupyter notebook 4.2.3
tensorflow-gpu 1.10.1
numpy 1.14.1
pandas 0.23.0
matplotlib 2.2.2
gensim 3.5.0
tqdm 4.24.0

2. Data preprocessing

Everything is in the Jupyter notebook. Run src/preprocess/EDA.ipynb to generate the intermediate files.

3. Baseline model training

Run in src/preprocess/:

python baseline-x-cv.py

4. Deep model training

Then train a model directly. To run on a single GPU with a model of your choice:

python train_predict.py --gpu 4 --option 5 --model convlstm --feature char

Multi-GPU training example:

python train_predict.py --gpu 4,5,6,7 --option 5 --model convlstm --feature char
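
The two commands above differ only in the value passed to --gpu. As a rough sketch of how a comma-separated GPU list could be handled (this is not taken from train_predict.py; the helper names parse_gpu_ids and configure_gpus are hypothetical), the value can be split into ids and exported through the CUDA_VISIBLE_DEVICES environment variable before the framework initializes:

```python
import argparse
import os


def parse_gpu_ids(value):
    """Turn a string such as "4,5,6,7" into a list of integer GPU ids."""
    ids = [int(part) for part in value.split(",") if part.strip()]
    if not ids:
        raise argparse.ArgumentTypeError("expected at least one GPU id")
    return ids


def build_parser():
    # Hypothetical parser: only the --gpu flag from the commands above is modelled.
    parser = argparse.ArgumentParser(description="GPU flag handling sketch")
    parser.add_argument("--gpu", type=parse_gpu_ids, default=[0])
    return parser


def configure_gpus(gpu_ids):
    """Expose only the listed devices to this process and return their count."""
    # This must be set before the deep learning framework initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return len(gpu_ids)


if __name__ == "__main__":
    args = build_parser().parse_args(["--gpu", "4,5,6,7"])
    n_gpus = configure_gpus(args.gpu)
    print(n_gpus, os.environ["CUDA_VISIBLE_DEVICES"])
```

A script written this way could use the returned count to decide between a single-device and a multi-device training path; whether the repository does so is not stated here.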

5. Model ensembling and output

python stacking.py --gpu 1 --tfidf True --option 5

Stacking and pseudo-labeling are done together here; edit the code to choose whether to use pseudo-labels.
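
For readers unfamiliar with pseudo-labeling, the following is a minimal sketch of one common variant, not the logic in stacking.py. It assumes the model outputs a class-probability matrix for the test set and keeps only rows whose top probability reaches a threshold. The function names and the 0.95 threshold are illustrative assumptions.

```python
import numpy as np


def select_pseudo_labels(test_probs, threshold=0.95):
    """Return (row indices, predicted labels) for confidently predicted test rows."""
    test_probs = np.asarray(test_probs, dtype=float)
    confidence = test_probs.max(axis=1)   # top class probability per row
    labels = test_probs.argmax(axis=1)    # predicted class per row
    keep = np.flatnonzero(confidence >= threshold)
    return keep, labels[keep]


def augment_training_set(train_x, train_y, test_x, test_probs, threshold=0.95):
    """Append confidently predicted test rows to the training data."""
    keep, pseudo_y = select_pseudo_labels(test_probs, threshold)
    new_x = np.concatenate([np.asarray(train_x), np.asarray(test_x)[keep]], axis=0)
    new_y = np.concatenate([np.asarray(train_y), pseudo_y], axis=0)
    return new_x, new_y


if __name__ == "__main__":
    # Toy data: three test rows, three classes; only rows 0 and 2 are confident.
    probs = [[0.97, 0.02, 0.01], [0.40, 0.35, 0.25], [0.01, 0.01, 0.98]]
    x, y = augment_training_set(np.zeros((2, 4)), np.array([1, 0]), np.ones((3, 4)), probs)
    print(x.shape, y)
```

The augmented set would then be used to retrain the model. A lower threshold adds more rows but also more label noise, so the value is usually tuned on a validation split.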


daguan-classify-2018's People

Contributors: monkeywithacupcake, nlpjoe


daguan-classify-2018's Issues

key_words_train_features.df

Hello, I have a question about this part of train_predict.py:

def static_data_prepare():
    train_y = pd.read_csv(config.TRAIN_X, usecols=['label_c_numeric']).values
    kw_train_df = pd.read_csv('../data/feature/key_words_train_feature.df')
    kw_test_df = pd.read_csv('../data/feature/key_words_test_feature.df')

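How is the key_words_train_features.df file used here produced? Is it the DataFrame saved directly after filtering out low-frequency words?

When main_feature is all, word_embedding is processed twice and char_embedding is not processed?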

word_embedding = Embedding(self.max_w_features, self.word_embed_size, weights=[self.word_embedding], trainable=False, name='word_embedding')

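The code is great and I learned a lot from it.

One question: shouldn't this be char_embedding? Lines 21-24 above show that word_embedding is already handled when main_feature is all, so char_embedding should be handled here as well, right?

The same question applies to textcnn_model.py, lines 37-40 and 53-55.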
