
Comments (2)

TheophileBlard commented on May 24, 2024

Thank you for your response.
You're right, we should not draw conclusions without comparing to other French models. However, I don't think a "complex" test set is the same thing as a completely different dataset, one the model hasn't been fine-tuned on. Was the final perplexity on your train set similar to that on the test set?

Unfortunately, there are not many large French datasets around, and I think you already used most of them in your train set... And there are not (yet) many generative models in French to which we could compare yours. It would be nice to establish a baseline for this specific task in French!

Full disclosure: I also wanted to train a GPT-2 model from scratch and then fine-tune it to generate movie reviews based on the allocine dataset. But I have nowhere near the compute power you used for BelGPT-2, and I only managed to train on Wikipedia. That's why I was thinking about using your model instead, but those inconsistencies scare me.
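For reference, a minimal sketch of that kind of fine-tuning with the transformers library. The Hub identifier antoiloui/belgpt2 and the use of the allocine dataset from the Hugging Face Hub are assumptions here, not something confirmed in this thread:

```python
# Hedged sketch: fine-tune a pretrained French GPT-2 on Allociné reviews
# for review generation. "antoiloui/belgpt2" is an assumed Hub identifier;
# swap in whatever checkpoint or local path you actually use.
from datasets import load_dataset
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

MODEL = "antoiloui/belgpt2"  # assumed identifier
tokenizer = GPT2TokenizerFast.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained(MODEL)

# Allociné: French movie reviews with sentiment labels; only the raw
# review text is used for language modeling here.
dataset = load_dataset("allocine", split="train[:1%]")  # small slice for a demo

def tokenize(batch):
    return tokenizer(batch["review"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="belgpt2-allocine",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    # mlm=False gives causal language modeling labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```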

EDIT: I forgot to say: it IS really nice work.


ant-louis commented on May 24, 2024

Hey, I think the reported perplexity depends strongly on the dataset on which it is calculated, and it is difficult to draw conclusions by comparing perplexities computed on different datasets (what's more, in different languages). For example, if you look at the original GPT-2 paper, you can see that the small version reaches a perplexity of 35.13 on LAMBADA, but also obtains 65.85 on PTB and even 75.2 on 1BW.

Hence, it could simply be that the test set on which I calculated these perplexities is complex for the task of language modeling. To find out, it would be ideal to calculate the perplexity of BelGPT-2 on other French datasets commonly used as benchmarks.
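For what it's worth, here is a minimal sketch of such an evaluation with the transformers library. The Hub identifier antoiloui/belgpt2 and the example sentence are assumptions; any French benchmark corpus could be substituted:

```python
# Minimal perplexity sketch for a French GPT-2 checkpoint.
# Assumption: the model is published on the Hugging Face Hub as
# "antoiloui/belgpt2"; replace with your own identifier or local path.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

MODEL = "antoiloui/belgpt2"  # assumed identifier
tokenizer = GPT2TokenizerFast.from_pretrained(MODEL)
model = GPT2LMHeadModel.from_pretrained(MODEL)
model.eval()

def perplexity(texts):
    """Token-level perplexity averaged over a list of strings."""
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            ids = tokenizer(text, return_tensors="pt").input_ids
            # With labels=ids, the model returns the mean cross-entropy
            # over the (len - 1) predicted tokens.
            loss = model(ids, labels=ids).loss
            n = ids.size(1) - 1
            total_nll += loss.item() * n
            total_tokens += n
    return math.exp(total_nll / total_tokens)

# Replace with sentences from any French benchmark dataset.
print(perplexity(["Ceci est une phrase de test en français."]))
```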

However, I have also noticed these inconsistencies when the model produces several consecutive sentences, and I don't have an explanation for this yet.

