
Comments (2)

TheophileBlard commented on May 24, 2024

Thank you for your response.
You're right, we should not draw conclusions without comparing to other French models. However, I don't think a "complex" test set is the same thing as a completely different dataset, one the model hasn't been fine-tuned on. Was the final perplexity on your train set similar to that on the test set?

Unfortunately, there are not many large French datasets around, and I think you already used most of them in your train set... And there are not (yet) many generative models in French to which we could compare yours. It would be nice to establish a baseline for this specific task in French!

Full disclosure: I also wanted to train a GPT-2 model from scratch and then fine-tune it to generate movie reviews based on the allocine dataset. But I have nowhere near the compute power you used for BelGPT-2, and I only managed to train on Wikipedia. That's why I was thinking about using your model instead, but those inconsistencies scare me.
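For reference, a minimal sketch of that kind of fine-tuning with the transformers library. The Hub identifier antoiloui/belgpt2 and the use of the allocine dataset from the Hugging Face Hub are assumptions here, not something confirmed in this thread:

```python
# Hedged sketch: fine-tune a pretrained French GPT-2 on Allociné reviews
# for review generation. "antoiloui/belgpt2" is an assumed Hub identifier;
# swap in whatever checkpoint or local path you actually use.
from datasets import load_dataset
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

MODEL = "antoiloui/belgpt2"  # assumed identifier
tokenizer = GPT2TokenizerFast.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained(MODEL)

# Allociné: French movie reviews with sentiment labels; only the raw
# review text is used for language modeling here.
dataset = load_dataset("allocine", split="train[:1%]")  # small slice for a demo

def tokenize(batch):
    return tokenizer(batch["review"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="belgpt2-allocine",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    # mlm=False gives causal language modeling labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```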

EDIT: I forgot to say: it IS really nice work.


ant-louis commented on May 24, 2024

Hey, I think the reported perplexity depends strongly on the dataset on which it is calculated, and it is difficult to draw conclusions by comparing perplexities computed on different datasets (what's more, in different languages). For example, if you look at the original GPT-2 paper, you can see that the small version reaches a perplexity of 35.13 on LAMBADA, but also obtains 65.85 on PTB and even 75.2 on 1BW.

Hence, it could simply be that the test set on which I calculated these perplexities is complex for the task of language modeling. To find out, it would be ideal to calculate the perplexity of BelGPT-2 on other French datasets commonly used as benchmarks.
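For what it's worth, here is a minimal sketch of such an evaluation with the transformers library. The Hub identifier antoiloui/belgpt2 and the example sentence are assumptions; any French benchmark corpus could be substituted:

```python
# Minimal perplexity sketch for a French GPT-2 checkpoint.
# Assumption: the model is published on the Hugging Face Hub as
# "antoiloui/belgpt2"; replace with your own identifier or local path.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

MODEL = "antoiloui/belgpt2"  # assumed identifier
tokenizer = GPT2TokenizerFast.from_pretrained(MODEL)
model = GPT2LMHeadModel.from_pretrained(MODEL)
model.eval()

def perplexity(texts):
    """Token-level perplexity averaged over a list of strings."""
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            ids = tokenizer(text, return_tensors="pt").input_ids
            # With labels=ids, the model returns the mean cross-entropy
            # over the (len - 1) predicted tokens.
            loss = model(ids, labels=ids).loss
            n = ids.size(1) - 1
            total_nll += loss.item() * n
            total_tokens += n
    return math.exp(total_nll / total_tokens)

# Replace with sentences from any French benchmark dataset.
print(perplexity(["Ceci est une phrase de test en français."]))
```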

However, I have also noticed these inconsistencies when the model produces several consecutive sentences, and I don't have an explanation for this yet.

