Comments (10)

JingqingZ commented on August 17, 2024

Just a rough estimate, as far as I remember: PEGASUS_BASE was pre-trained on a TPUv2 with 4-8 cores for ~5 days (500k steps). PEGASUS_LARGE was pre-trained on a TPUv3 with ~256 cores for 1-3 weeks (~500k steps per week).

from pegasus.

JingqingZ commented on August 17, 2024

Hi, thanks for the question! Sorry, we are not planning to train on German, but I would personally look forward to contributions from the community. Yes, you would need to pre-train using a German dataset.

nicmer commented on August 17, 2024

I am also interested in pre-training for German.
I would like to know how long the pre-training took / how expensive pre-training on the English corpora was (very rough numbers are fine), just to get an idea of how expensive it would be to pre-train for another language. I could not find that information in the paper or in the archive, but I might have missed something.

zouweidong91 commented on August 17, 2024

Hi,

Thank you for the code! Are you planning on releasing a pre-trained version for German? If not, would we need to pre-train the model using a German dataset?

Thank you

Have you pre-trained on a German dataset? And how do you compile the C++ code in the ops folder? Thank you.

OpUs-Nebula commented on August 17, 2024

Hi @JingqingZ, I am looking to do something similar in Swedish. What is meant by batch size and steps in your paper? In Appendix B it says that you fine-tuned on BillSum for 50k steps with a batch size of 256, which seems impossible if each step implies processing 256 documents (the original BillSum paper states the whole dataset is 22,165 bills).

JingqingZ commented on August 17, 2024

"What is meant by batch size and steps in your paper?"

Batch size: https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network

Steps: the number of training steps (parameter updates) performed with that batch size.

"which seems impossible if each step implies processing 256 documents"

Processing a large batch requires a GPU (or TPU) with enough memory to hold it.
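To make the two terms concrete, here is a minimal sketch of a training loop (the names are illustrative, not the pegasus API): each step consumes one batch and performs one parameter update, cycling over the dataset as many times as needed.

```python
def train(dataset, batch_size, num_steps):
    """Run `num_steps` updates, drawing one batch of size `batch_size` per step."""
    step = 0
    while step < num_steps:
        # One pass over the dataset, one batch at a time.
        for i in range(0, len(dataset), batch_size):
            if step >= num_steps:
                break
            batch = dataset[i:i + batch_size]  # one batch per step
            # update_parameters(batch)  # one gradient update per step (omitted)
            step += 1
    return step

# 10 documents, batch size 4, 6 steps: the dataset is cycled as needed.
print(train(list(range(10)), batch_size=4, num_steps=6))  # 6
```

Note that the number of steps is independent of the dataset size; if steps * batch_size exceeds the dataset, documents are simply reused.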

OpUs-Nebula commented on August 17, 2024

I understand that it is computationally intensive, and therefore needs GPU or TPU with larger memory to process each batch. What I don't understand is how you processed 50k * 256 = 12.8M documents when finetuning on BillSum when there are only 24k of them according to Appendix A.

JingqingZ commented on August 17, 2024

Some documents are used multiple times.
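In other words, fine-tuning for 50k steps at batch size 256 on ~24k documents amounts to revisiting each document many times. A quick back-of-the-envelope check, using the numbers quoted in this thread:

```python
# Passes (epochs) over BillSum implied by the fine-tuning settings.
dataset_size = 24_000   # BillSum training documents (Appendix A)
batch_size = 256        # examples processed per training step
train_steps = 50_000    # fine-tuning steps (Appendix B)

examples_seen = batch_size * train_steps   # 12,800,000 examples, with repetition
epochs = examples_seen / dataset_size      # ~533 passes over the dataset
print(examples_seen, round(epochs))
```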

OpUs-Nebula commented on August 17, 2024

I understand, thanks for clearing that up. Did you get a chance to run your model fine-tuned on BillSum against human evaluators at any point after the paper was written?

JingqingZ commented on August 17, 2024

As far as I know, no, I am sorry.
