Comments (4)
@SivilTaram
Hi Qian, I used the same settings as the non-BERT model when training the BERT-based model (L-6_H-256_A-8).
I have not trained with the official 12-layer BERT model yet. I guess the 18k examples are too few for the model to converge easily; maybe you can try the following strategies:
- Reduce the number of layers, i.e. use only the first few layers of the pre-trained BERT model.
- Freeze the encoder in the first few epochs and then train the whole model.
- Design special unsupervised pretraining tasks for the copy model, pretraining the encoder and decoder at the same time.
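The freeze-then-unfreeze strategy can be sketched in PyTorch roughly as follows. The model, its layer sizes, and the epoch counts are illustrative stand-ins, not the repo's actual code:

```python
import torch
from torch import nn

# Hypothetical rewriter model; names and sizes are illustrative only.
class Rewriter(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
            num_layers=6,
        )
        self.decoder_proj = nn.Linear(256, 256)  # stand-in for the decoder

def set_encoder_frozen(model: nn.Module, frozen: bool) -> None:
    """Freeze or unfreeze the encoder; the decoder always trains."""
    for p in model.encoder.parameters():
        p.requires_grad = not frozen

model = Rewriter()
warmup_epochs = 2  # freeze the (BERT-initialized) encoder for the first epochs
for epoch in range(5):
    set_encoder_frozen(model, frozen=epoch < warmup_epochs)
    # ... run one training epoch here ...
```

Freezing this way lets the randomly initialised decoder stabilise before gradients start perturbing the pretrained encoder weights.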
from dialogue-utterance-rewriter.
@liu-nlper Thanks for your quick response! I will try again following your kind suggestions. If it is solved, I will get back and report the experimental results.
After struggling for a few days, I finally have to admit that it is difficult to incorporate the official 12-layer Chinese BERT into the rewrite task (for the reproduced T-Ptr-Net, T-Ptr-Lambda, or even L-Ptr-Lambda). I have tried the following approaches, but none of them shows improvements over the non-BERT baseline:
- 12-layer encoder, 12-layer decoder (encoder initialized from BERT, fine-tuned with learning rates from 0.1 to 1.5)
- 12-layer encoder, 6-layer decoder, hidden 768 (encoder initialized from BERT, fine-tuned with learning rates from 0.1 to 1.5)
- 6-layer encoder, 6-layer decoder, hidden 256 (BERT as encoder embedding)
- LSTM encoder, LSTM decoder, hidden 512 (BERT as encoder embedding)
- 6-layer encoder, 6-layer decoder, hidden 768 (encoder initialized from the first 6 layers of BERT)
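As a concrete reading of the two "BERT as encoder embedding" settings above, here is a minimal PyTorch sketch. The `bert` argument is a stand-in module producing 768-dim vectors; a real run would load Google's Chinese checkpoint through the `transformers` library:

```python
import torch
from torch import nn

class BertEmbedLSTM(nn.Module):
    """LSTM encoder on top of a frozen contextual-embedding stage
    ("BERT as encoder embedding"). `bert` maps token ids to 768-dim vectors."""
    def __init__(self, bert: nn.Module, hidden: int = 512):
        super().__init__()
        self.bert = bert
        for p in self.bert.parameters():
            p.requires_grad = False  # BERT is used purely as an embedding layer
        self.lstm = nn.LSTM(768, hidden, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # no gradients flow into the embedding stage
            emb = self.bert(token_ids)
        out, _ = self.lstm(emb)
        return out

# Usage with a toy lookup table standing in for BERT:
toy_bert = nn.Embedding(21128, 768)  # 21128 = Chinese BERT vocab size
model = BertEmbedLSTM(toy_bert)
out = model(torch.randint(0, 21128, (2, 5)))  # (batch=2, seq=5) -> (2, 5, 512)
```

In this arrangement only the LSTM (and downstream decoder) receive gradient updates, so a small 18k-example dataset cannot destabilise the pretrained weights.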
I post the above results for reference. If any reader has successfully employed BERT (Google's 12-layer Chinese model) in this task, please feel free to contact me (qian dot liu at buaa.edu.cn), thanks :)
I also tried these BERT models to initialize the transformer layers, but they didn't show improvements. Models as follows:
- L3H8
- L6H8 (1st, 2nd, 3rd, 4th, 5th, 6th layers from BERT)
- L6H8 (1st, 3rd, 5th, 7th, 9th, 11th layers from BERT)
- L12H8
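The two L6H8 selections differ only in which of the 12 pretrained layers they keep. A PyTorch sketch, with randomly initialised layers standing in for the pretrained BERT stack:

```python
import torch
from torch import nn

def make_layer() -> nn.Module:
    # Stand-in for one pretrained BERT layer (hidden 768, 12 heads);
    # real weights would come from Google's Chinese checkpoint.
    return nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)

# Stand-in for the 12-layer pretrained stack.
pretrained = nn.ModuleList([make_layer() for _ in range(12)])

# L6H8 variant 1: the first six layers (1st..6th -> indices 0..5).
first_six = nn.ModuleList(pretrained[:6])

# L6H8 variant 2: every other layer (1st, 3rd, ..., 11th -> indices 0, 2, ..., 10).
alternate = nn.ModuleList(pretrained[i] for i in range(0, 12, 2))
```

Taking alternate layers keeps the full depth range of the pretrained stack (bottom through top) rather than only its lower half, which is presumably the motivation for the second variant.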
But I find the BERT-based model performs better on another dev dataset.
Could you add me on WeChat? My WeChat ID is CHNyouqh. Thanks :)
Related Issues (10)
- Can it be transferred directly to an English dataset?
- Using translate.py to predict a file is OK, but using translate.py to predict a single text with the --text param has a problem.
- Problem with loss
- Question on Masking (i.e. attention bias) Upon Cross-Attention
- To construct the input
- about residual connection
- How does it apply to English data sets
- inputs_ids
- Generated results are empty or contain repeated Chinese characters