Comments (7)

BangLiu commented on June 12, 2024

@hackiey I found the problem. It is in the evaluation, as in this issue: andy840314/QANet-pytorch-@30213e2

hackiey commented on June 12, 2024

Hi @BangLiu, I think it is impossible for the F1 score to be lower than the EM score; maybe you swapped these two numbers somewhere.
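
For reference, here is a minimal sketch of SQuAD-style EM and token-level F1 (assuming simple whitespace tokenization and skipping the official script's answer normalization), which shows why per-example F1 can never fall below EM:

```python
from collections import Counter

def exact_match(prediction, ground_truth):
    # EM is 1 only when the predicted string matches the answer exactly.
    return float(prediction == ground_truth)

def f1(prediction, ground_truth):
    # Token-level F1 over the overlap between prediction and answer tokens.
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Whenever EM is 1 the two strings are identical, so F1 is also 1;
# otherwise EM is 0 while F1 >= 0. Averaged over the dataset, F1 >= EM.
```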

I read your code and I think the EMA might be wrong. The EMA involves two sets of parameters: the first are the model's trainable parameters, which are used for training and are not touched by the EMA update; the second are the shadow parameters, which are initialized from the model parameters before training. We apply the EMA update to the shadow parameters, and store the shadow parameters as the model parameters after training.
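
A minimal sketch of this shadow-parameter scheme, assuming plain PyTorch and a decay hyperparameter (the class and method names here are illustrative, not the actual torcheras API):

```python
import torch

class EMA:
    def __init__(self, model, decay=0.9999):
        self.decay = decay
        # Shadow parameters start as copies of the trainable parameters.
        self.shadow = {name: p.detach().clone()
                       for name, p in model.named_parameters() if p.requires_grad}

    def update(self, model):
        # Called after each optimizer step; only the shadow copies change,
        # the trainable parameters themselves are left untouched.
        with torch.no_grad():
            for name, p in model.named_parameters():
                if p.requires_grad:
                    self.shadow[name].mul_(self.decay).add_(p.detach(),
                                                            alpha=1 - self.decay)

    def copy_to(self, model):
        # For evaluation / saving: overwrite model parameters with shadow values.
        with torch.no_grad():
            for name, p in model.named_parameters():
                if p.requires_grad:
                    p.copy_(self.shadow[name])
```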

Yes, using 8 heads will run into out-of-memory problems; we may need a memory-efficient self-attention method.

BangLiu commented on June 12, 2024

Hi @hackiey, thanks so much for your advice! Yes, the EM and F1 I reported were swapped.
I will look into my EMA and check the problem.

BangLiu commented on June 12, 2024

Hi @hackiey, I checked my EMA, but I think I did exactly what you said:
(Compared with https://github.com/hackiey/torcheras/blob/master/torcheras/model.py)

  1. https://github.com/BangLiu/QANet-PyTorch/blob/master/QANet_main.py
    In the main file, I register all of the model's trainable parameters, so the shadow parameters in the EMA are initialized.
  2. https://github.com/BangLiu/QANet-PyTorch/blob/master/trainer/QANet_trainer.py
    In the trainer file, the EMA is used in three places:
    a) Line 167: in each training step, we call the EMA to update the shadow parameters.
    b) Lines 209 and 262 in _valid_epoch: I evaluate the model on the dev set. Each time we evaluate, we assign the shadow parameters to the model; after validation finishes, we restore the original model parameters and continue training.
    c) Lines 268 and 292 in _save_checkpoint: similarly, we assign the shadow parameters to the model and save it; after saving the checkpoint, we restore the model parameters and continue training.

Therefore, I think I did exactly what you did. It seems the EMA has no problem, and I may have some other potential problem in my code (a rough sketch of the assign-and-restore pattern from steps b) and c) is below).
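
A minimal sketch of that assign-and-restore pattern, assuming the illustrative EMA class sketched above (validate_with_ema and valid_fn are hypothetical names, not functions from the repository):

```python
import torch

def validate_with_ema(model, ema, valid_fn):
    # Back up the current trainable parameters, evaluate with the shadow
    # copies, then restore the backup so training continues unaffected.
    backup = {name: p.detach().clone()
              for name, p in model.named_parameters() if p.requires_grad}
    ema.copy_to(model)           # assign shadow parameters to the model
    metrics = valid_fn(model)    # e.g. run the dev set and compute EM / F1
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.requires_grad:
                p.copy_(backup[name])  # resume the original weights
    return metrics
```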

hackiey commented on June 12, 2024

@BangLiu It seems you are right, but these values look like the EMA isn't working. Could you post all of the epochs' results?

BangLiu commented on June 12, 2024

[screenshot: training results over 30 epochs]

@hackiey Yes, above are my training results over 30 epochs. I also tried to find differences in preprocessing and evaluation, but so far I haven't found any critical difference.

BangLiu commented on June 12, 2024

@hackiey As far as I can tell, my bug is located somewhere in preprocessing (SQuAD.py in my implementation) or evaluation (QANet_trainer.py and metrics.py in my repository). I replaced these two parts with your original implementation, and the performance is good (F1 79.18, EM 69.82 at the 22nd epoch). I am trying to find out what critical problem caused this bug.
