Comments (9)
Both jobs were started at the same time on separate p3.16xlarge instances. According to the log above, using the HybridBlock was about 1.5 hours faster than the fixed embedding matrix approach. I'm not sure why the difference is so large. I'll check the full logs later to see whether this is due to sustained higher throughput or to some flakiness of the p3 instance running the unmodified code.
from gluon-nlp.
I obtained
2019-07-25 18:20:45,923 - root - [Epoch 29] valid Loss=1.5228, valid ppl=4.5849, valid bleu=25.98
2019-07-25 18:26:46,444 - root - [Epoch 29] test Loss=1.3216, test ppl=3.7493, test bleu=26.10
2019-07-25 18:26:46,452 - root - Save best parameters to transformer_en_de_u512/valid_best.params
2019-07-25 18:33:03,171 - root - Best model valid Loss=1.4929, valid ppl=4.4499, valid bleu=26.25
2019-07-25 18:38:53,523 - root - Best model test Loss=1.2879, test ppl=3.6253, test bleu=26.85
when using the above Block, compared to
2019-07-25 19:43:16,816 - root - [Epoch 29] valid Loss=1.5230, valid ppl=4.5857, valid bleu=25.78
2019-07-25 19:49:17,773 - root - [Epoch 29] test Loss=1.3219, test ppl=3.7506, test bleu=26.03
2019-07-25 19:55:01,214 - root - Best model valid Loss=1.4923, valid ppl=4.4471, valid bleu=26.34
2019-07-25 20:00:55,458 - root - Best model test Loss=1.2867, test ppl=3.6208, test bleu=26.79
with the current script in the master branch.
In both cases running: MXNET_GPU_MEM_POOL_TYPE=Round python train_transformer.py --dataset WMT2014BPE --src_lang en --tgt_lang de --batch_size 2700 --optimizer adam --num_accumulated 16 --lr 2.0 --warmup_steps 4000 --save_dir transformer_en_de_u512 --epochs 30 --gpus 0,1,2,3,4,5,6,7 --scaled --average_start 5 --num_buckets 20 --bucket_scheme exp --bleu 13a --log_interval 10
Hi @JulianSlzr. Thanks for the catch. I think it would not make much difference. t2t and sockeye also use a slightly different one from the paper. Do you mind trying to run the script with this modification to see how the performance changes?
I think the relative order should not affect the performance
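To see why the relative order shouldn't matter, here is a small NumPy sketch (hypothetical helper names, not from the repo): the concatenated [sin, cos] layout produced by the Block below is just a fixed column permutation of the interleaved layout from the original paper, and any learned projection applied on top can absorb a fixed permutation of its input dimensions.

```python
import numpy as np

def interleaved_pos_emb(length, embed_size):
    # "Attention Is All You Need" layout: sin and cos alternate per dimension.
    pos = np.arange(length)[:, None]
    inv_freq = 1.0 / np.power(10000, np.arange(0, embed_size, 2) / embed_size)
    angles = pos * inv_freq[None, :]
    emb = np.empty((length, embed_size))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

def concatenated_pos_emb(length, embed_size):
    # Layout used by the HybridBlock: all sin columns first, then all cos columns.
    pos = np.arange(length)[:, None]
    inv_freq = 1.0 / np.power(10000, np.arange(0, embed_size, 2) / embed_size)
    angles = pos * inv_freq[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

a = interleaved_pos_emb(10, 8)
b = concatenated_pos_emb(10, 8)
# Column j of the concatenated layout is column perm[j] of the interleaved one.
perm = np.concatenate([np.arange(0, 8, 2), np.arange(1, 8, 2)])
assert np.allclose(b, a[:, perm])
```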
Any update?
As the relative order doesn't seem to matter, how about replacing the current logic, which generates a fixed-length (max_length) embedding matrix, with
import mxnet as mx

class PositionalEmbedding(mx.gluon.HybridBlock):
    """Positional embedding.

    Parameters
    ----------
    embed_size : int
        Dimensionality of positional embeddings.
    """

    def __init__(self, embed_size, **kwargs):
        super().__init__(**kwargs)
        inv_freq = 1 / mx.nd.power(10000, mx.nd.arange(0.0, embed_size, 2.0) / embed_size)
        with self.name_scope():
            self.inv_freq = self.params.get_constant('inv_freq', inv_freq.reshape((1, -1)))

    def hybrid_forward(self, F, pos_seq, inv_freq):  # pylint: disable=arguments-differ
        """Compute positional embeddings.

        Parameters
        ----------
        pos_seq : Symbol or NDArray
            Positions to compute embedding for. Shape (length, )

        Returns
        -------
        pos_emb : Symbol or NDArray
            Positional embeddings for positions specified in pos_seq.
            Shape (length, embed_size).
        """
        inp = F.dot(pos_seq.reshape((-1, 1)), inv_freq)
        pos_emb = F.concat(F.sin(inp), F.cos(inp), dim=-1)
        return pos_emb
In that case we don't require users to specify max_length a priori when using sinusoidal embeddings.
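As a rough illustration of that point, here is a NumPy mirror of hybrid_forward (a sketch, not code from the repo): the same dot-then-concat computation produces embeddings for any sequence length, with no max_length baked in at construction time.

```python
import numpy as np

def pos_emb_forward(pos_seq, inv_freq):
    # Mirrors hybrid_forward: (length, 1) @ (1, embed_size // 2), then concat(sin, cos).
    inp = pos_seq.reshape(-1, 1) @ inv_freq.reshape(1, -1)
    return np.concatenate([np.sin(inp), np.cos(inp)], axis=-1)

embed_size = 8
inv_freq = 1.0 / np.power(10000, np.arange(0.0, embed_size, 2.0) / embed_size)

# Any sequence length works without re-building the module.
for length in (3, 50, 1000):
    emb = pos_emb_forward(np.arange(length, dtype=float), inv_freq)
    assert emb.shape == (length, embed_size)
```

Position 0 always maps to all-zero sin columns followed by all-one cos columns, which is a quick sanity check on the layout.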
The above Block is already used as part of #846
In the first version, I also computed the embeddings on the fly, but I found it slows down training. So I changed it to a predefined embedding matrix. Have you checked how long training takes to complete compared to using a fixed embedding matrix?
See the two log files attached. Somehow the unmodified run got delayed by 1 hour during the first evaluation:
2019-07-24 21:38:45,328 - root - [Epoch 0 Batch 7520/7679] loss=7.0374, ppl=1138.4511, throughput=163.62K wps, wc=5783.88K
2019-07-24 22:14:10,205 - root - [Epoch 0] valid Loss=6.2416, valid ppl=513.6734, valid bleu=0.21
2019-07-24 22:51:49,854 - root - [Epoch 0] test Loss=6.4132, test ppl=609.8661, test bleu=0.18
2019-07-24 22:51:49,862 - root - Save best parameters to transformer_en_de_u512/valid_best.params
2019-07-24 22:52:29,000 - root - [Epoch 1 Batch 160/7679] loss=6.9615, ppl=1055.1741, throughput=152.81K wps, wc=5780.33K
Compared to the modified run:
2019-07-24 21:39:13,101 - root - [Epoch 0 Batch 7520/7679] loss=7.0567, ppl=1160.6043, throughput=162.76K wps, wc=5783.88K
2019-07-24 21:55:46,334 - root - [Epoch 0] valid Loss=6.2516, valid ppl=518.8240, valid bleu=0.27
2019-07-24 22:09:51,370 - root - [Epoch 0] test Loss=6.4214, test ppl=614.8332, test bleu=0.24
2019-07-24 22:09:51,378 - root - Save best parameters to transformer_en_de_u512/valid_best.params
2019-07-24 22:10:35,695 - root - [Epoch 1 Batch 160/7679] loss=6.9881, ppl=1083.6077, throughput=152.01K wps, wc=5780.33K
train_transformer.log
train_transformer_with_pos_emb_block.log
However, based on the throughput numbers (in the attached files), I think we can conclude that replacing the precomputed embedding matrix with the above block does not impact throughput, or at least not significantly.
It seems that the main difference comes from the validation/testing in the first few epochs, where the modified embedding outputs a shorter sequence so that the beam search ends more quickly. This seems to suggest that the positions of sin and cos have something to do with the generation process.