Comments (2)
Hi,
your understanding of our training process is mostly correct. Some corrections:
If true, fine-tuning the hyperparameter (epochs in train.conf) would be useless or meaningless
We only tuned some hyperparameters on the CoNLL04 development set (the learning rate and especially the relation threshold). We ended up using the same learning rate as in the original BERT paper (5e-5), which also works well in our other projects. So the only parameter that was really tuned on the development set was the relation threshold (and we tuned it only on the CoNLL04 development set, since we found the threshold to also work well for the other datasets). We experienced little to no overfitting on the development set regarding the number of epochs (note that we also use a learning rate schedule). The model already achieves similar performance after just a few epochs (3-5), and training it for longer only improves performance slightly. We just settled for 20 epochs here, but we also achieve similar results with a higher number (e.g. 40 epochs).
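Tuning a confidence threshold on a development set, as described above, can be sketched as a simple sweep over candidate values. The function names and toy data below are illustrative stand-ins, not SpERT's actual code or API:

```python
# Hypothetical sketch: predictions carry confidence scores; we sweep
# candidate thresholds and keep the one with the best micro F1 on dev.

def f1_at_threshold(scored_preds, gold, threshold):
    """Keep predictions scoring >= threshold, then compute micro F1 vs. gold."""
    kept = {pred for pred, score in scored_preds if score >= threshold}
    tp = len(kept & gold)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def tune_threshold(scored_preds, gold, candidates):
    """Return the candidate threshold with the highest dev F1."""
    return max(candidates, key=lambda t: f1_at_threshold(scored_preds, gold, t))

# Toy dev data: (relation triple, confidence) pairs and the gold triples.
preds = [(("A", "works_for", "B"), 0.9),
         (("C", "works_for", "D"), 0.6),
         (("E", "lives_in", "F"), 0.2)]   # low-confidence false positive
gold = {("A", "works_for", "B"), ("C", "works_for", "D")}

best = tune_threshold(preds, gold, [0.1, 0.3, 0.5, 0.7])  # filters the false positive
```

The tuned value is then frozen and reused on other datasets, as described above.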
[...] then after each epoch of training on the combined train and dev dataset (train_dev.json), the newly trained model is evaluated once on the test set (test.json),
Finally, across all training epochs, the model with the best performance on the test set is saved as the final model, and the highest metric values on the test dataset are reported in the paper.
Of course we do not apply early stopping on the test dataset. We just train the model on the combined train and dev set and then (after 20 epochs of training) evaluate it on the test dataset. We repeat this 5 times and report the averaged results. Note that most other papers do not state whether experiments were averaged over multiple runs (or whether just the best of x runs was reported, which can also make a large difference).
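The evaluation protocol described above (train for a fixed number of epochs, evaluate once on the test set, repeat with several random seeds, report the mean) can be sketched as follows. `train_and_evaluate` and its scores are toy placeholders, not real SpERT results:

```python
# Minimal sketch of the "average over 5 runs" protocol. In practice each
# run would train for 20 epochs on train_dev.json and return test-set F1;
# here we return invented per-seed scores for illustration only.
from statistics import mean, stdev

def train_and_evaluate(seed):
    """Placeholder: stand-in for one full training + test evaluation run."""
    return {0: 71.2, 1: 70.8, 2: 71.5, 3: 70.9, 4: 71.1}[seed]

scores = [train_and_evaluate(seed) for seed in range(5)]
avg = mean(scores)      # the single number reported
spread = stdev(scores)  # run-to-run variance, worth reporting too
```

Reporting the mean (rather than the best of several runs) avoids exactly the inflation mentioned above.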
If all the baseline methods follow the same procedure (adding the validation set dev.json to the training set train.json to form a new dataset train_dev.json for training), it may be relatively fair.
There are others who also used the combined train+dev set, for example the highly cited work by Bekoulis et al. ("Joint entity recognition and relation extraction as a multi-head selection problem"). For many other papers (also on other datasets), we do not know whether the combined set was used, since many prior papers did not report their train/dev/test split (and preprocessing) and/or did not release their code on GitHub. There are also no official dev sets for CoNLL04 and ADE. Moreover, training the model on the combined set only makes a noticeable difference for CoNLL04 and has little effect on SciERC. In no case does it affect any state-of-the-art claims.
[...] it may be relatively fair.
By combining and re-training the model on train+dev, we essentially decided not to use early stopping on the development set (since we experienced no overfitting) and instead use it as additional training data. I think both approaches (early stopping or combination) have their pros and cons, depending on the circumstances.
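For comparison, here is a minimal sketch of the early-stopping alternative: monitor a dev metric per epoch and stop once it has failed to improve for a few epochs. The dev curve below is invented for illustration and does not come from SpERT:

```python
# Toy early-stopping logic over a per-epoch dev F1 curve.

def early_stopping(dev_f1_per_epoch, patience=3):
    """Return the epoch of the best dev F1, stopping once the metric has
    failed to improve for `patience` consecutive epochs."""
    best_f1, best_epoch, stalled = float("-inf"), 0, 0
    for epoch, f1 in enumerate(dev_f1_per_epoch):
        if f1 > best_f1:
            best_f1, best_epoch, stalled = f1, epoch, 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    return best_epoch

# Invented dev curve: improves early, then plateaus (little overfitting).
dev_curve = [60.0, 68.0, 70.5, 71.0, 70.9, 71.0, 70.8, 70.7, 70.9]
stop_epoch = early_stopping(dev_curve)  # stops shortly after the peak
```

When the dev curve plateaus rather than degrades, as described above, early stopping buys little, which is why folding dev into the training data can be the better trade.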
from spert.
Please leave a comment if you have additional questions.