Comments (10)

JingqingZ commented on August 17, 2024

Just a rough estimate, as far as I remember: PEGASUS_BASE was pre-trained on a TPUv2 with 4-8 cores for ~5 days (500k steps). PEGASUS_LARGE was pre-trained on a TPUv3 with ~256 cores for 1-3 weeks (~500k steps per week).

from pegasus.

JingqingZ commented on August 17, 2024

Hi, thanks for the question! Sorry, we are not planning to train on German, but I would personally look forward to contributions from the community. Yes, you would need to pre-train using a German dataset.

nicmer commented on August 17, 2024

I am also interested in pre-training for German.
I would like to know how long the pre-training took / how expensive pre-training on the English corpora was (very rough numbers are fine), just to get an idea of how expensive it would be to pre-train for another language. I could not find that information in the paper or in the archive, but I might have missed something.

zouweidong91 commented on August 17, 2024

Hi,

Thank you for the code! Are you planning on releasing a pre-trained version for German? If not, would we need to pre-train the model using a German dataset?

Thank you

Have you pre-trained on a German dataset? And how do you compile the C++ code in the ops folder? Thank you.

OpUs-Nebula commented on August 17, 2024

Hi @JingqingZ, I am looking to do something similar in Swedish. What is meant by batch size and steps in your paper? In Appendix B it says that you fine-tuned on BillSum for 50k steps with a batch size of 256, which seems impossible if each step implies processing 256 documents (the original BillSum paper states the whole dataset is 22,165 bills).

JingqingZ commented on August 17, 2024

"What is meant by batch size and steps in your paper?"

Batch size: https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network

Steps: the number of training steps (parameter updates) performed with that batch size.

"which seems impossible if each step implies processing 256 documents"

Processing a large batch requires a GPU (or TPU) with enough memory to hold it.
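To make the two terms concrete, here is a minimal sketch of a training loop (the names are illustrative, not the pegasus API): each step consumes one batch and performs one parameter update, cycling over the dataset as many times as needed.

```python
def train(dataset, batch_size, num_steps):
    """Run `num_steps` updates, drawing one batch of size `batch_size` per step."""
    step = 0
    while step < num_steps:
        # One pass over the dataset, one batch at a time.
        for i in range(0, len(dataset), batch_size):
            if step >= num_steps:
                break
            batch = dataset[i:i + batch_size]  # one batch per step
            # update_parameters(batch)  # one gradient update per step (omitted)
            step += 1
    return step

# 10 documents, batch size 4, 6 steps: the dataset is cycled as needed.
print(train(list(range(10)), batch_size=4, num_steps=6))  # 6
```

Note that the number of steps is independent of the dataset size; if steps * batch_size exceeds the dataset, documents are simply reused.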

OpUs-Nebula commented on August 17, 2024

I understand that it is computationally intensive, and therefore needs GPU or TPU with larger memory to process each batch. What I don't understand is how you processed 50k * 256 = 12.8M documents when finetuning on BillSum when there are only 24k of them according to Appendix A.

JingqingZ commented on August 17, 2024

Some documents are used multiple times.
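In other words, fine-tuning for 50k steps at batch size 256 on ~24k documents amounts to revisiting each document many times. A quick back-of-the-envelope check, using the numbers quoted in this thread:

```python
# Passes (epochs) over BillSum implied by the fine-tuning settings.
dataset_size = 24_000   # BillSum training documents (Appendix A)
batch_size = 256        # examples processed per training step
train_steps = 50_000    # fine-tuning steps (Appendix B)

examples_seen = batch_size * train_steps   # 12,800,000 examples, with repetition
epochs = examples_seen / dataset_size      # ~533 passes over the dataset
print(examples_seen, round(epochs))
```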

OpUs-Nebula commented on August 17, 2024

I understand, thanks for clearing that up. Did you get a chance to run your model fine-tuned on BillSum against human evaluators at any point after the paper was written?

JingqingZ commented on August 17, 2024

As far as I know, no, I am sorry.
