
Comments (4)

JingqingZ commented on August 17, 2024

Hi, thanks for the question! Could you elaborate on what you mean by

The loss is not decreasing after this point and not converging or stuck to local minima.

Does the loss ever decrease in the first 20 epochs? What ROUGE scores have you achieved by fine-tuning PEGASUS and T5?

Any plans to release PEGASUS_base

Sorry, there is currently no plan to release base models due to checkpoint incompatibility.

from pegasus.

rohitsroch commented on August 17, 2024

Does the loss ever decrease in the first 20 epoch? What is the ROUGE score you have achieved by fine-tuning on PEGASUS and T5?

@JingqingZ Apologies for the confusion. Yes, the loss decreases smoothly for the first 15-20 epochs but it doesn't converge. Below is the reference plot during training with learning rate 2e-4.

[training loss plot]

  • If you check after 5k steps (15 epochs), the loss barely changes and stays almost constant (~1.5). I also tried training for a further 5 epochs with the learning rate increased to 2e-3, but it diverged and then settled back to the same value (~1.5). Any thoughts on what I should do?

  • I also tried training the model with a triangular cyclical learning rate policy, but the same behavior occurred.

  • That said, the results/summaries on the evaluation set look good. Below are the ROUGE scores for PEGASUS (large) and T5 (small), using the following decoding params for both:

  beam_size = 1
  top_p = 0.95
  top_k = 50
  temperature = 0.5

NOTE: The scores below are averaged across 78 datapoints in the eval set.

PEGASUS (large)

            ROUGE-1   ROUGE-2   ROUGE-L
precision   0.493     0.237     0.368
recall      0.532     0.263     0.403
f-measure   0.486     0.237     0.365

T5 (small)

            ROUGE-1   ROUGE-2   ROUGE-L
precision   0.507     0.211     0.363
recall      0.443     0.189     0.322
f-measure   0.455     0.192     0.329
  • I didn't use the beam search algorithm for decoding, as it takes a lot of time per input (with a beam size of 5) on an n1-standard VM with 8 vCPUs.
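
As a point of reference for the scores above: the official way to compute these metrics is Google's `rouge_score` package, but since that may not be installed, here is a minimal pure-Python sketch of ROUGE-N precision/recall/F1. It omits stemming and the LCS-based ROUGE-L that the official scorer includes, so it is an illustration of the metric, not a drop-in replacement.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset of n-grams from a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N: n-gram overlap between reference and candidate.

    Returns (precision, recall, f-measure). No stemming or tokenizer
    normalization, unlike the official rouge_score package.
    """
    ref_ngrams = ngrams(reference.lower().split(), n)
    cand_ngrams = ngrams(candidate.lower().split(), n)
    if not ref_ngrams or not cand_ngrams:
        return 0.0, 0.0, 0.0
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in the reference.
    overlap = sum((ref_ngrams & cand_ngrams).values())
    precision = overlap / sum(cand_ngrams.values())
    recall = overlap / sum(ref_ngrams.values())
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Toy example (not from the eval set above).
p, r, f = rouge_n("the cat sat on the mat", "the cat sat", n=1)
```

Averaging these per-example scores over the eval set gives tables like the ones above.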


JingqingZ commented on August 17, 2024

Hi, thanks for the information!

I think the overall performance (given the learning curve and ROUGE scores) of PEGASUS looks reasonable, so I don't think anything is wrong there. But it can apparently be improved by tuning some hyper-parameters, which needs some empirical experiments.

the loss decreases smoothly for the first 15-20 epochs but it doesn't converge. Below is the reference plot during training with learning rate 2e-4.

It seems the loss is still decreasing and the fine-tuning may need more steps. In Appendix C of our paper, we provide a full table of the hyper-parameters used to fine-tune on each dataset, and most of them use more fine-tuning steps (and possibly a larger batch size) than yours. The learning rate can also be made smaller if the fluctuation of the loss persists.
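
One common way to get both a gentler effective learning rate and stability over longer runs is a warmup-then-decay schedule. As a generic illustration only (these constants are not the values from the PEGASUS paper's Appendix C), here is a linear-warmup / inverse-square-root-decay schedule of the kind widely used when fine-tuning Transformer models:

```python
def lr_schedule(step, base_lr=2e-4, warmup_steps=1000):
    """Linear warmup to base_lr, then inverse-square-root decay.

    A common Transformer fine-tuning schedule; the constants here are
    illustrative, not taken from the PEGASUS paper.
    """
    step = max(step, 1)
    if step < warmup_steps:
        return base_lr * step / warmup_steps          # ramp up linearly
    return base_lr * (warmup_steps / step) ** 0.5     # decay as 1/sqrt(step)
```

Ramping up and then decaying smoothly avoids the kind of divergence seen when jumping straight from 2e-4 to 2e-3 mid-training.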

Considering the relatively small eval set (78 examples), some slight fluctuation of the loss on the eval set is possible.

I didn't use Beam search algorithm for decoding

Beam search can actually improve ROUGE quite significantly, by a couple of points.
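
The reason beam search helps over greedy or sampled decoding is that it keeps several partial hypotheses alive and ranks them by total log-probability, rather than committing to one token at a time. The sketch below shows the core loop over a hypothetical `score_fn(seq, tok)` interface (an assumption for illustration; in practice you would use the decoder's built-in beam search rather than reimplement it):

```python
import math

def beam_search(score_fn, vocab, start, beam_size=5, max_len=4):
    """Toy beam search over a scoring function.

    Keeps the beam_size highest total-log-probability partial sequences
    at each step. score_fn(seq, tok) returns the log-probability of
    appending tok to seq (a hypothetical interface for this sketch).
    """
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            for tok in vocab:
                candidates.append((seq + [tok], logp + score_fn(seq, tok)))
        # Prune: keep only the best beam_size hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0]

# Hypothetical bigram scorer that strongly favours repeating the last
# token, purely for demonstration.
def score_fn(seq, tok):
    return math.log(0.9) if tok == seq[-1] else math.log(0.1)

best_seq, best_logp = beam_search(score_fn, vocab=["a", "b"], start="a",
                                  beam_size=2, max_len=3)
```

The cost is roughly `beam_size` forward passes per step, which is why decoding with beam size 5 is noticeably slower on a CPU-only VM.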

Hope this may answer your questions!


rohitsroch commented on August 17, 2024

@JingqingZ Thanks a lot for the quick help, I will check Appendix C in the paper :). Closing this issue!

