Comments (7)

BangLiu commented on June 12, 2024

@hackiey I found the problem. It is in the evaluation, as in this issue: andy840314/QANet-pytorch-@30213e2

hackiey commented on June 12, 2024

Hi @BangLiu, I think it is impossible for the F1 score to be lower than the EM score; maybe you swapped these two numbers somewhere.
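
For reference, here is a minimal sketch of SQuAD-style EM and token-level F1 (assuming simple whitespace tokenization and skipping the official script's answer normalization), which shows why per-example F1 can never fall below EM:

```python
from collections import Counter

def exact_match(prediction, ground_truth):
    # EM is 1 only when the predicted string matches the answer exactly.
    return float(prediction == ground_truth)

def f1(prediction, ground_truth):
    # Token-level F1 over the overlap between prediction and answer tokens.
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Whenever EM is 1 the two strings are identical, so F1 is also 1;
# otherwise EM is 0 while F1 >= 0. Averaged over the dataset, F1 >= EM.
```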

I read your code and I think the EMA might be wrong. The EMA involves two sets of parameters: the first are the model's trainable parameters, which are used for training and are not touched by the EMA update; the second are the shadow parameters, which are initialized from the model parameters before training. We apply the EMA update to the shadow parameters, and store the shadow parameters as the model parameters after training.
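
A minimal sketch of this shadow-parameter scheme, assuming plain PyTorch and a decay hyperparameter (the class and method names here are illustrative, not the actual torcheras API):

```python
import torch

class EMA:
    def __init__(self, model, decay=0.9999):
        self.decay = decay
        # Shadow parameters start as copies of the trainable parameters.
        self.shadow = {name: p.detach().clone()
                       for name, p in model.named_parameters() if p.requires_grad}

    def update(self, model):
        # Called after each optimizer step; only the shadow copies change,
        # the trainable parameters themselves are left untouched.
        with torch.no_grad():
            for name, p in model.named_parameters():
                if p.requires_grad:
                    self.shadow[name].mul_(self.decay).add_(p.detach(),
                                                            alpha=1 - self.decay)

    def copy_to(self, model):
        # For evaluation / saving: overwrite model parameters with shadow values.
        with torch.no_grad():
            for name, p in model.named_parameters():
                if p.requires_grad:
                    p.copy_(self.shadow[name])
```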

Yes, using 8 heads will run into out-of-memory problems; we may need a memory-efficient self-attention method.

BangLiu commented on June 12, 2024

Hi @hackiey, thanks so much for your advice! Yes, the EM and F1 I reported were swapped.
I will look into my EMA and check the problem.

BangLiu commented on June 12, 2024

Hi @hackiey, I checked my EMA, but I think I did exactly what you said:
(Compared with https://github.com/hackiey/torcheras/blob/master/torcheras/model.py)

  1. https://github.com/BangLiu/QANet-PyTorch/blob/master/QANet_main.py
    In the main file, I register all of the model's trainable parameters, so the shadow parameters in the EMA are initialized.
  2. https://github.com/BangLiu/QANet-PyTorch/blob/master/trainer/QANet_trainer.py
    In the trainer file, the EMA is used in three places:
    a) Line 167: in each training step, we call the EMA to update the shadow parameters.
    b) Lines 209 and 262 in _valid_epoch: I evaluate the model on the dev set. Each time we evaluate, we assign the shadow parameters to the model; after validation finishes, we restore the original model parameters and continue training.
    c) Lines 268 and 292 in _save_checkpoint: similarly, we assign the shadow parameters to the model and save it; after saving the checkpoint, we restore the model parameters and continue training.

Therefore, I think I did exactly what you did. It seems the EMA has no problem, and I may have some other potential problem in my code (a rough sketch of the assign-and-restore pattern from steps b) and c) is below).
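
A minimal sketch of that assign-and-restore pattern, assuming the illustrative EMA class sketched above (validate_with_ema and valid_fn are hypothetical names, not functions from the repository):

```python
import torch

def validate_with_ema(model, ema, valid_fn):
    # Back up the current trainable parameters, evaluate with the shadow
    # copies, then restore the backup so training continues unaffected.
    backup = {name: p.detach().clone()
              for name, p in model.named_parameters() if p.requires_grad}
    ema.copy_to(model)           # assign shadow parameters to the model
    metrics = valid_fn(model)    # e.g. run the dev set and compute EM / F1
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.requires_grad:
                p.copy_(backup[name])  # resume the original weights
    return metrics
```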

hackiey commented on June 12, 2024

@BangLiu It seems you are right, but these values look like the EMA isn't working. Could you post all of the epochs' results?

BangLiu commented on June 12, 2024

[screenshot: training results over 30 epochs]

@hackiey Yes, above are my training results over 30 epochs. I also tried to find differences in preprocessing and evaluation, but so far I haven't found any critical difference.

BangLiu commented on June 12, 2024

@hackiey As far as I can tell, my bug is located somewhere in preprocessing (SQuAD.py in my implementation) or evaluation (QANet_trainer.py and metrics.py in my repository). I replaced these two parts with your original implementation, and the performance is good (F1 79.18, EM 69.82 at the 22nd epoch). I am trying to find out what critical problem caused this bug.
