Comments (2)
Thank you for your response.
You're right, we should not draw conclusions without comparing to other French models. However, I don't think a "complex" test set is the same thing as an entirely different dataset the model was never fine-tuned on. Was the final perplexity on your train set similar to that on the test set?
Unfortunately, there are not many large French datasets around, and I think you already used most of them in your train set... And there are not (yet) many generative models in French to which we could compare yours. It would be nice to establish a baseline for this specific task in French!
Full disclosure: I also wanted to train a GPT-2 model from scratch and then fine-tune it to generate movie reviews, based on the allocine dataset. But I have nowhere near the compute power you used for belGPT2, and I only managed to train on Wikipedia. I was therefore thinking about using your model instead, but those inconsistencies scare me.
EDIT: I forgot to say: it IS really nice work.
from belgpt2.
Hey, I think that the reported perplexity depends strongly on the dataset on which it is calculated, and that it is difficult to draw conclusions by comparing perplexities computed on different datasets (what’s more, in different languages). For example, if you look at the original GPT-2 paper, you can see that the small version reaches a perplexity of 35.13 on LAMBADA, but also obtains a perplexity of 65.85 on PTB and even 75.2 on 1BW.
Hence, it could simply be that the test set on which I calculated these perplexities is complex for the task of language modeling. To find out, it would be ideal to calculate the perplexity of BelGPT-2 on other French datasets commonly used as benchmarks.
However, I have also noticed these inconsistencies when the model produces several consecutive sentences, and I don't have an explanation for this yet.
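The dataset-dependence follows directly from the definition: perplexity is the exponentiated average negative log-likelihood per token, so it measures how probable the model finds that particular text, and the same model can score very differently on in-domain versus out-of-domain data. A minimal sketch, with made-up per-token probabilities purely for illustration:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Illustrative per-token log-probs: the same model typically assigns
# higher probability to in-domain text, hence lower perplexity there.
in_domain = [math.log(0.2)] * 10       # avg p(token) = 0.2
out_of_domain = [math.log(0.05)] * 10  # avg p(token) = 0.05

print(perplexity(in_domain))       # ≈ 5.0
print(perplexity(out_of_domain))   # ≈ 20.0
```

So a higher perplexity on one test set than another says as much about the text as about the model, which is why comparing numbers across datasets (or languages) is shaky.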