
Comments (2)

markus-eberts avatar markus-eberts commented on September 2, 2024

Hi,
your understanding of our training process is mostly correct. Some corrections:

If (true), tuning the hyperparameter (epochs in train.conf) is useless or meaningless

We only tuned some hyperparameters on the CoNLL04 development set (the learning rate and especially the relation threshold). We ended up using the same learning rate as in the original BERT paper (5e-5), which also works well in our other projects. So the only parameter that was really tuned on the development set was the relation threshold (and we tuned it only on the CoNLL04 development set, since we found the threshold to work well for the other datasets too). We experienced little to no overfitting on the development set with respect to the number of epochs (note that we also use a learning rate schedule). The model already achieves similar performance after just a few epochs (3-5), and training it for longer only improves performance slightly. We just settled on 20 epochs here, but we also achieve similar results with a higher number (e.g. 40 epochs).
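The threshold tuning described above can be sketched as a simple sweep over candidate values, picking the one with the best F1 on the development set. This is a minimal illustration, not SpERT's actual code; the function and argument names are hypothetical.

```python
# Hedged sketch: sweep candidate relation thresholds on the dev set and
# keep the one with the best F1. dev_scores are predicted relation
# confidences, dev_labels are gold 0/1 labels (hypothetical inputs).
def tune_threshold(dev_scores, dev_labels, candidates):
    best_t, best_f1 = None, -1.0
    for t in candidates:
        preds = [1 if s >= t else 0 for s in dev_scores]
        tp = sum(1 for p, g in zip(preds, dev_labels) if p and g)
        fp = sum(1 for p, g in zip(preds, dev_labels) if p and not g)
        fn = sum(1 for p, g in zip(preds, dev_labels) if not p and g)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        if f1 > best_f1:  # keep the first threshold reaching the best F1
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

As the comment notes, a threshold tuned this way on CoNLL04's dev set was then reused unchanged for the other datasets.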

[...] then each time (one epoch) the model is trained on the train and dev datasets (train_dev.json), the newly trained model is tested once on the test set (test.json).
Finally, across all training epochs, the model with the best performance on the test set is saved as the final model, and the highest metric values on the test dataset are reported in the paper.

Of course we do not apply early stopping on the test dataset. We just train the model on the combined train and dev set and then (after it has been trained for 20 epochs) evaluate it on the test dataset. We repeat this 5 times and report the averaged results. Note that most other papers do not state whether experiments were averaged over multiple runs (or whether just the best of x runs was reported, which can also make a large difference).
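The protocol above (train for a fixed number of epochs, evaluate once on test, repeat, and average) can be sketched as follows. `train_and_eval` is a hypothetical stand-in for the actual training/evaluation code, not part of SpERT's API.

```python
# Hedged sketch of the evaluation protocol described above: run the full
# training + single test evaluation n_runs times (e.g. with different
# seeds) and report the mean and standard deviation of the test scores.
import statistics

def averaged_test_score(train_and_eval, n_runs=5):
    scores = [train_and_eval(seed=s) for s in range(n_runs)]
    return statistics.mean(scores), statistics.stdev(scores)
```

Reporting the mean (rather than the best single run) is what makes results comparable across papers, which is the point the comment makes.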

If all the baseline methods perform the same operation (adding the validation set dev.json to the training set train.json to form a new dataset train_dev.json for training the model), it may be relatively fair.

There are others who also used the combined train+dev set, for example the highly cited work by Bekoulis et al. ("Joint entity recognition and relation extraction as a multi-head selection problem"). For many other papers (also on other datasets), we do not know whether the combined set was used, since many prior papers did not report their training/dev/test split (and preprocessing) and/or did not disclose their code on GitHub. There are also no official dev sets for CoNLL04 and ADE. Moreover, training the model on the combined set only makes a larger difference for CoNLL04 and has only a small effect on SciERC. In all cases, it does not affect any state-of-the-art claims.

[...] it may be relatively fair.

By combining and re-training the model on train+dev, we essentially decided not to use early stopping on the development set (since we experienced no overfitting) and to rather use it as additional training data. I think both approaches (early stopping or combination) have their pros and cons, depending on the circumstances.
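Merging the splits as described above amounts to concatenating the two JSON files into a new training file. A minimal sketch, using the file names mentioned in this thread (train.json, dev.json, train_dev.json) and assuming each file holds a JSON list of documents (that layout is an assumption, not confirmed here):

```python
# Hedged sketch: concatenate the train and dev splits into one training
# file. Assumes each split is a JSON array of documents.
import json

def merge_splits(train_path, dev_path, out_path):
    with open(train_path) as f:
        docs = json.load(f)
    with open(dev_path) as f:
        docs += json.load(f)
    with open(out_path, "w") as f:
        json.dump(docs, f)
    return len(docs)  # number of documents in the merged file
```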

from spert.

markus-eberts avatar markus-eberts commented on September 2, 2024

Please leave a comment if you have additional questions.


