
Comments (11)

markus-eberts avatar markus-eberts commented on July 28, 2024

Hi,
this should not be the case and I currently do not have any explanation for this. The code snippet you posted is fine. Why do you think it is still the pretrained model? And can you post the library versions and the configuration you used?

from spert.

victorbai2 avatar victorbai2 commented on July 28, 2024

@markus-eberts Thanks for your response. I first checked the model size in the data/save/... directory and found that pytorch_model.bin is the same size as the downloaded pretrained model (413M). I then evaluated both models on the evaluation dataset, and the results are identical.

The configuration and other things are the same.


victorbai2 avatar victorbai2 commented on July 28, 2024

@markus-eberts please find the below example that I tested in google colab.

Spert.ipynb.zip


markus-eberts avatar markus-eberts commented on July 28, 2024

I just updated the repository (some changes due to the upgrade to a new 'transformers' version) and requirements.txt. Model saving works fine on my side. Could you please pull the latest changes, use the library versions in requirements.txt, and try again?


victorbai2 avatar victorbai2 commented on July 28, 2024

@markus-eberts Hi, I applied the changes in Google Colab, but the result is unfortunately the same and the weights are not saved. I set the epoch count to 3 for testing and checked the size of the pytorch_model.bin saved in dir /save/data..../final_model.

Is it the same on your end? I wonder whether model.save_pretrained(dir_path) only saves the pretrained weights, as the name suggests.

BTW, I even ran "python ./spert.py eval --config configs/example_eval.conf" with the purely pretrained weights (pytorch_model.bin), and surprisingly it could be evaluated. Where are all the other layers and weights that come after the CLS layer?


victorbai2 avatar victorbai2 commented on July 28, 2024

@markus-eberts I think I understand now: I was using the pytorch_model.bin that you trained, which I downloaded to data/model/pytorch_model.bin.

But one thing still seems strange to me: why is the trained, saved model (pytorch_model.bin) the same size as the original pretrained model? After training, shouldn't the model become much larger, as it does with TensorFlow?


markus-eberts avatar markus-eberts commented on July 28, 2024

Is it the same on your end? I wonder whether model.save_pretrained(dir_path) only saves the pretrained weights, as the name suggests.

The 'save_pretrained' method of 'transformers' definitely saves the whole model. I use the library a lot, and this is also stated in the documentation.
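Since two checkpoints of the same size can still contain different weights, one quick way to confirm that the saved file really differs from the pretrained one is to hash both files. A minimal stdlib sketch (the paths in the comments are placeholders for your own setup):

```python
import hashlib


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


# Hypothetical paths -- adjust to your directories:
# pretrained = "data/models/bert-base-cased/pytorch_model.bin"
# finetuned = "data/save/.../final_model/pytorch_model.bin"
# Equal sizes but different digests => the weights did change during training.
# print(file_sha256(pretrained) == file_sha256(finetuned))
```

If the digests differ, training updated the weights and saving worked, regardless of the file size.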

But one thing seems strange to me: why is the trained, saved model (pytorch_model.bin) the same size as the original pretrained model?

I'm not sure whether you are comparing against the CoNLL04 model provided by us or the bert-base-cased model downloaded via the 'transformers' library. The CoNLL04 'pytorch_model.bin' trained by us is already fine-tuned on the task of joint entity and relation extraction, so it should roughly match the size of your trained model and give good evaluation results. Regarding the bert-base-cased model (MLM pre-trained, but not fine-tuned on the target task), I also do not expect a large size difference to a fine-tuned model, since we only add shallow (relative to BERT) linear layers.
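Some back-of-the-envelope arithmetic makes this concrete. The parameter count below is approximate for bert-base-cased, and the head dimensions are purely illustrative (not the exact SpERT classifier sizes):

```python
# Rough size check: why task heads barely change the checkpoint size.
bert_params = 108_000_000  # bert-base-cased, approximately
hidden = 768               # BERT-base hidden size

# Hypothetical linear task heads on top of a few concatenated BERT vectors:
entity_head = (hidden * 3) * 8    # e.g. 3 concatenated vectors -> 8 entity types
relation_head = (hidden * 3) * 5  # e.g. -> 5 relation types
added = entity_head + relation_head

print(f"added params: {added:,}")                      # tens of thousands
print(f"fraction of BERT: {added / bert_params:.6f}")  # well under 0.1%
print(f"BERT file size ~ {bert_params * 4 / 2**20:.0f} MiB (float32)")
```

At four bytes per float32 parameter, ~108M parameters is roughly 412 MiB, which matches the ~413M file size, and the extra linear layers are lost in rounding.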


victorbai2 avatar victorbai2 commented on July 28, 2024

@markus-eberts I compared your trained model with bert-base-cased; the two are the same size.

BTW, did you write all the code yourself? It is very high-quality code.


markus-eberts avatar markus-eberts commented on July 28, 2024

I compared your trained model with bert-base-cased; the two are the same size.

This is reasonable. When you use your trained model for evaluation (e.g. 'python ./spert.py eval --config configs/example_eval.conf' with model_path/tokenizer_path set to your model), it should give you similar results to those on the validation dataset (as output after training). In that case, everything works as expected and the model was saved correctly.
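For reference, the relevant lines of the eval config might look like the following sketch (only the model_path/tokenizer_path keys mentioned above are taken from the discussion; the section header and path are placeholders):

```ini
[1]
model_path = data/save/.../final_model
tokenizer_path = data/save/.../final_model
```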

BTW, did you write all the code yourself? It is very high-quality code.

Yes, and thank you. I try my best to make the code 'readable' and easy to follow. However, since this is just the code accompanying a research paper, its main purpose is to reproduce our evaluation results. I often wish I had done some parts better (from a software-architecture point of view) but lacked the time to do so. After all, the next paper deadline is usually right around the corner ;). Of course, I'm glad that the code and the SpERT model itself are useful for the research community and beyond.


victorbai2 avatar victorbai2 commented on July 28, 2024

@markus-eberts You are really productive. I would like to read your next paper once it is published.


markus-eberts avatar markus-eberts commented on July 28, 2024

@markus-eberts You are really productive. I would like to read your next paper once it is published.

Thanks.

