Comments (7)
@hackiey I found the problem. It is in the evaluation, as in this issue: andy840314/QANet-pytorch-@30213e2
from qanet-pytorch.
Hi @BangLiu , I think it is impossible that the f1 score is lower than em score, maybe you swapped these two numbers somewhere.
I read your code and I think the EMA might be wrong. EMA involves two sets of parameters. The first set is the model's trainable parameters, which are used for training and are not touched by the EMA operation. The second set is the shadow parameters, initialized from the model parameters before training; the EMA operation updates these shadow parameters, and after training we store the shadow parameters as the model parameters.
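To make the scheme above concrete, here is a minimal, hypothetical sketch of EMA with shadow parameters (plain Python, one float per named parameter; a real PyTorch version would iterate over `model.named_parameters()` instead):

```python
class EMA:
    """Minimal sketch of exponential moving average over named parameters.

    Shadow values are initialized from the model parameters before training;
    each update moves a shadow value toward the current trainable value.
    The trainable parameters themselves are never modified here.
    """

    def __init__(self, decay=0.999):
        self.decay = decay
        self.shadow = {}

    def register(self, name, value):
        # Initialize the shadow copy from the current model parameter.
        self.shadow[name] = value

    def update(self, name, value):
        # shadow <- decay * shadow + (1 - decay) * current parameter
        self.shadow[name] = self.decay * self.shadow[name] + (1.0 - self.decay) * value
        return self.shadow[name]


# Usage: with decay 0.5, a shadow starting at 1.0 updated with 0.0 becomes 0.5.
ema = EMA(decay=0.5)
ema.register("w", 1.0)
ema.update("w", 0.0)
```

The class name and method names here are illustrative, not taken from either repository.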
Yes, 8 heads will run into an out-of-memory problem; we may need a more memory-efficient self-attention method.
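One common way to reduce attention memory, sketched here as an assumption rather than anything from either repository, is to compute the output one query at a time so the full n-by-n score matrix is never materialized (pure Python for clarity; a practical version would chunk queries in a tensor library):

```python
import math

def attention_row_by_row(Q, K, V):
    """Scaled dot-product attention computed one query row at a time.

    Only a single row of the score matrix (length n) is alive at once,
    instead of the full n x n matrix, trading speed for memory.
    Q, K, V are lists of equal-length float vectors.
    """
    out = []
    scale = math.sqrt(len(Q[0]))
    for q in Q:
        # One row of scores: q . k / sqrt(d) for each key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        # Numerically stable softmax over this row.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the value vectors.
        out.append([sum(w * v[d] for w, v in zip(weights, V))
                    for d in range(len(V[0]))])
    return out
```

With identical keys the weights are uniform, so the output is the mean of the value vectors, which gives a quick sanity check.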
Hi @hackiey , thanks so much for your advice! Yeah, the EM and F1 I wrote were swapped.
I will look into my EMA and check the problem.
Hi @hackiey , I checked my EMA, but I think I did exactly what you said:
(Compared with https://github.com/hackiey/torcheras/blob/master/torcheras/model.py)
- https://github.com/BangLiu/QANet-PyTorch/blob/master/QANet_main.py
In the main file, I register all the model parameters, so the shadow parameters in EMA are initialized.
- https://github.com/BangLiu/QANet-PyTorch/blob/master/trainer/QANet_trainer.py
In the trainer file, EMA is used in 3 places:
a) line 167: in each training step, we call EMA to update the shadow parameters.
b) lines 209 and 262 in _valid_epoch: I test the model on the dev dataset. Each time we test the model, we assign the shadow parameters as the model parameters; after validation finishes, we restore the model parameters and continue training.
c) lines 268 and 292 in _save_checkpoint: similarly, we assign the shadow parameters to the model and save it; after saving the checkpoint, we restore the model parameters and continue training.
Therefore, I think I did exactly what you did. It seems EMA has no problem, and there may be some other potential problem in my code.
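The swap-in/restore pattern described in (b) and (c) can be sketched as two small helpers; these names are hypothetical and the parameters are plain name-to-float dicts, standing in for a model's state dict:

```python
def swap_in_shadow(params, shadow):
    """Back up the trainable parameters and overwrite them with the EMA
    shadow values, so the model can be evaluated or checkpointed with
    the averaged weights. Returns the backup for later restoration."""
    backup = dict(params)      # snapshot current trainable values
    params.update(shadow)      # evaluate/save with shadow values
    return backup

def restore(params, backup):
    """Put the original trainable values back so training can continue."""
    params.update(backup)


# Usage: evaluate with shadow weights, then resume training with the originals.
params = {"w": 1.0}
backup = swap_in_shadow(params, {"w": 0.9})   # params["w"] is now 0.9
restore(params, backup)                        # params["w"] is 1.0 again
```

A classic bug in this pattern is forgetting the restore step, after which training silently continues from the averaged weights, so asserting the round trip is worthwhile.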
@BangLiu It seems you are right, but these values look like the EMA doesn't work. Could you post all the epochs' results?
@hackiey Yeah, above are my training results over 30 epochs. I also tried to find differences in preprocessing and evaluation, but so far I haven't found any critical difference.
@hackiey As far as I know, my bug is located somewhere in preprocessing (SQuAD.py in my implementation) or evaluation (QANet_trainer.py and metrics.py in my repository). I replaced these two parts with your original versions, and the performance is good (F1 79.18, EM 69.82 at the 22nd epoch). I am trying to find out what critical problem caused this bug.