
atec2018-nlp's Introduction

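ATEC2018 NLP competition, second-round F1 = 0.7327

Because of restrictions on the PAI platform, all the code lives in a single file. pai_train.py is the file that produced the competition score. The experiments used four models: a custom Siamese network, ESIM, Decomposable Attention, and DSSM. Siamese, ESIM, and Decomposable Attention each have a char-level and a word-level version; DSSM has only a combined char-and-word version. The best result came from blending the predictions of several models. Regrettably, I did not try training the models with 10-fold cross-validation, which the top-ranked teams apparently all used, and each model here was trained for only 2 hours.

Model performance comparison: the character-level ESIM model performed best on this task.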
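
The README does not say how the blending was done. As a minimal sketch of one common approach, assuming each model outputs a match probability per sentence pair, the predictions can be combined by a weighted average. The function name `blend`, the weights, and the sample values below are all hypothetical and are not taken from pai_train.py.

```python
def blend(predictions, weights=None):
    """Return the weighted average of several models' probability lists.

    predictions: list of lists, one inner list of probabilities per model.
    weights: optional list of per-model weights; defaults to equal weights.
    """
    n_models = len(predictions)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


# Hypothetical outputs of three models on four sentence pairs.
esim_char = [0.9, 0.2, 0.4, 0.1]
siamese_char = [0.7, 0.3, 0.6, 0.2]
dssm_both = [0.8, 0.1, 0.5, 0.3]

blended = blend([esim_char, siamese_char, dssm_both])
```

The blended scores would then be turned into labels with a decision threshold, in the same way as for a single model.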

model name    correlation r (output vs. label)  best F1             threshold at best F1
siamese char  0.553536380131115                 0.6971525551574581  0.258
siamese word  0.5308273808879237                0.6873517065157875  0.242
esim char     0.5853469280801447                0.7116622491480499  0.233
esim word     0.5783574742744366                0.7100964753080524  0.263
decom char    0.5288425401105513                0.6825720620842572  0.249
decom word    0.4943718720970039                0.6677430929314676  0.212
dssm both     0.5638034287814917                0.6980098067493511  0.263
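
The three reported quantities can be computed from a model's validation scores and the 0/1 labels. The sketch below is an assumption about how such numbers are typically obtained, not the code from pai_train.py: it computes the Pearson correlation and scans thresholds in steps of 0.001 (a guess suggested by the three-decimal thresholds in the table) for the best F1. All function names and the sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def f1_at(scores, labels, threshold):
    """F1 of the positive class when predicting 1 for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_f1_threshold(scores, labels, steps=1000):
    """Scan thresholds 0, 1/steps, ..., 1 and return (best_f1, threshold)."""
    best_f1, best_t = 0.0, 0.0
    for k in range(steps + 1):
        t = k / steps
        f1 = f1_at(scores, labels, t)
        if f1 > best_f1:  # keep the lowest threshold that reaches the best F1
            best_f1, best_t = f1, t
    return best_f1, best_t


# Hypothetical validation scores and labels.
scores = [0.9, 0.4, 0.35, 0.1, 0.3]
labels = [1, 1, 0, 0, 1]
r = pearson_r(scores, labels)
best_f1, best_t = best_f1_threshold(scores, labels)
```

Because positive pairs are the minority class in this kind of data, the best threshold tends to fall well below 0.5, which matches the values in the table.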

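Training notes:

  1. Don't make the batch size too large. Each epoch finishes faster, but there are fewer weight updates per epoch, so convergence is slower.
  2. A cyclic learning rate can converge to a better optimum and escapes local optima more easily, for example by ramping the learning rate up from small to large within one epoch and then gradually back down.
  3. SWA (stochastic weight averaging), a simple model-averaging method, gives better generalization. It improved local results noticeably but did not improve the online score.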
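
Notes 2 and 3 can be illustrated with a small framework-free sketch. It assumes a triangular schedule that rises and falls once per epoch, and a running mean of weight snapshots for SWA; the names `triangular_lr` and `SWA`, the learning-rate bounds, and the flat list-of-floats weight format are hypothetical simplifications, not the implementation in pai_train.py.

```python
def triangular_lr(step, steps_per_epoch, min_lr=1e-4, max_lr=1e-3):
    """Learning rate that rises linearly from min_lr to max_lr over the
    first half of each epoch and falls back to min_lr over the second half."""
    pos = (step % steps_per_epoch) / steps_per_epoch  # position in [0, 1)
    tri = 1.0 - abs(2.0 * pos - 1.0)                  # 0 -> 1 -> 0
    return min_lr + (max_lr - min_lr) * tri


class SWA:
    """Running average of weight snapshots (stochastic weight averaging)."""

    def __init__(self):
        self.avg = None  # averaged weights so far
        self.n = 0       # number of snapshots averaged

    def update(self, weights):
        """Fold one snapshot (a flat list of floats) into the average."""
        self.n += 1
        if self.avg is None:
            self.avg = list(weights)
        else:
            self.avg = [a + (w - a) / self.n
                        for a, w in zip(self.avg, weights)]
        return self.avg


# Learning rates at the start, middle, and end of a 100-step epoch.
lrs = [triangular_lr(s, 100) for s in (0, 50, 100)]

# Average three hypothetical end-of-epoch weight snapshots.
swa = SWA()
for snapshot in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    swa.update(snapshot)
```

In practice the snapshot would be taken at the low point of each learning-rate cycle, and the averaged weights would be loaded into a separate copy of the model for evaluation.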

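pai_transform.py and pai_old.py are two unsuccessful attempts. pai_transform.py tried to follow fastai's ULMFiT approach: train a language model to provide the embedding input, then modify the network structure to suit the classification task at hand. pai_old.py tried to follow approaches shared for the Quora competition, using text feature engineering for classification.

Model sources: the Siamese network follows https://blog.csdn.net/huowa9077/article/details/81082795. ESIM and Decomposable Attention come from a Kaggle kernel: https://www.kaggle.com/lamdang/dl-models. The DSSM network comes from a post shared by bird: https://openclub.alipay.com/read.php?tid=7480&fid=96. Thanks to all of the above!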

atec2018-nlp's People

Contributors

ziweipolaris

atec2018-nlp's Issues

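OOM error when training the Siamese network with character embeddings

I get an OOM error when training the Siamese network with character embeddings. GPU memory is 5 GB, tensorflow version is 1.13.1, keras version is 2.2.4. Is my GPU memory insufficient? If so, how much GPU memory does your machine have? Thanks.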
Train on 99915 samples, validate on 2562 samples
Epoch 1/2
Traceback (most recent call last):
File "pai_train.py", line 888, in
train_all_models(index=0)
File "pai_train.py", line 811, in train_all_models
train_model(model, swa_model, cfg)
File "pai_train.py", line 794, in train_model
fit()
File "pai_train.py", line 786, in fit
epochs=n_epoch,verbose=2)
File "/home/cc/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/home/cc/.local/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/home/cc/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in call
return self._call(inputs)
File "/home/cc/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/home/cc/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in call
run_metadata_ptr)
File "/home/cc/.local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[635974,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node training/Adam/mul_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[{{node metrics/acc/Mean_1}}]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
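
For a sense of scale, the tensor named in the error can be sized by hand. The sketch below assumes 4-byte float32 elements and assumes the [635974,300] matrix is a trainable embedding, so that Adam would hold two moment buffers of the same shape alongside the weights and a dense gradient. The log does not confirm this beyond the `training/Adam/mul_1` node name. `tensor_mib` is a hypothetical helper.

```python
def tensor_mib(rows, cols, bytes_per_element=4):
    """Size in MiB of a dense rows x cols tensor."""
    return rows * cols * bytes_per_element / 2 ** 20


# Shape taken from the error message above.
one_copy = tensor_mib(635974, 300)

# Assumption: weights + gradient + Adam first and second moments.
adam_total = 4 * one_copy

print(f"one copy: {one_copy:.0f} MiB, four copies: {adam_total / 1024:.2f} GiB")
```

Under these assumptions a single copy is roughly 728 MiB and four copies come to almost 3 GiB, before any activations or the rest of the model are counted, which leaves little headroom on a 5 GB card.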

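Questions about using only the Siamese network

Hello,
Your code was very instructive. When running the model myself I used only the Siamese network, with character embeddings that I found online. I have a few questions:
1. I trained on 100,000 examples, but the data is imbalanced. How did you handle that?
2. My validation F1 with the Siamese network is only around 0.4, and one submission scored 0.51. Is that result normal?
3. How many epochs are appropriate?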

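Question about word embeddings

Hello!
I learned a lot from your code. Did you train the word embeddings on this dataset? Did you remove stop words or do any other preprocessing? I trained mine with gensim, removing only punctuation, with window_size = 5, and tested a submission that used only a blend of the decom models. On the first-season B leaderboard (the newcomer practice competition) it scored only 0.53+, whereas according to the README it should be 0.6+, so I would like to ask whether something is wrong in my processing.
Many thanks!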

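Request for data

Hello, I just saw this article in a WeChat official account post, but on the official site I found that I don't have permission to download the data. Could you share it? Thanks!
My email: [email protected]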
