zphang / bert_on_stilts
Fork of huggingface/pytorch-pretrained-BERT for BERT on STILTs
License: Apache License 2.0
Hi,
I was hoping to compare this approach with my own sentence embedding method.
Sorry if this is mentioned somewhere (I couldn't find it), but is the "best" pretrained model from the paper freely available? I mean the one fine-tuned on MNLI (and not additionally on the GLUE tasks).
It would be super helpful for comparison if the model weights were on https://huggingface.co/models! (I don't see anything when searching for "STILTs".)
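For reference, if the MNLI-fine-tuned weights were uploaded to the hub, comparing against them would be a one-liner with the transformers library. A minimal sketch; the STILTs model identifier below is hypothetical, since no such entry exists on the hub:

```python
from transformers import BertModel

# A stock checkpoint that is on the hub today:
model = BertModel.from_pretrained("bert-large-uncased")

# If the BERT->MNLI STILTs weights were published, loading them would look
# the same (this identifier is hypothetical, not a real hub entry):
# stilts = BertModel.from_pretrained("zphang/bert-on-stilts-mnli")
```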
Hi, this is really amazing work. Would you mind explaining the different bert_load_mode options? Thanks a lot!
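Judging by the flags used in this repo's training commands (--bert_load_mode model_only, --bert_save_mode model_all), the mode appears to control how much of a checkpoint is saved or restored. A rough sketch of that pattern; the function and checkpoint key names are my assumptions, not this repo's actual code:

```python
import torch

def load_checkpoint(model, path, load_mode="model_only", optimizer=None):
    # Illustrative only: "model_only" restores just the BERT weights,
    # while "model_all" also restores optimizer state (key names assumed).
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    if load_mode == "model_all" and optimizer is not None:
        optimizer.load_state_dict(state["optimizer"])
    return model
```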
I noticed all the pretrained models use BERT Large.
Why not release pretrained models based on BERT Base too?
[Adding as a place marker / bookmark.]
Awesome work! :-D
I'm interested in biomedical applications of contextual language models; I would appreciate a heads-up if anyone applies this or similar approaches in that domain (I am aware of BioBERT: https://arxiv.org/pdf/1901.08746.pdf).
Hello! I am interested in whether you were able to get results similar to your paper using adapters rather than whole-model tuning.
It seems like adapters might be less effective at retaining the information learned from an intermediate task, compared to tuning the whole BERT model.
There also doesn't seem to be much documentation on adapters in this code base - do you have any pointers to a good example in the code?
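For context, a Houlsby-style adapter (Houlsby et al., 2019) is a small bottleneck module inserted into each transformer layer, so far fewer parameters carry the intermediate-task signal than in full fine-tuning, which is why one might expect it to retain less. A minimal PyTorch sketch; class and dimension names are mine, not this repo's API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottleneckAdapter(nn.Module):
    """Houlsby-style adapter: down-project, nonlinearity, up-project,
    plus a residual connection. Only these small layers are trained;
    the surrounding BERT weights stay frozen."""

    def __init__(self, hidden_size=1024, bottleneck=64):
        super(BottleneckAdapter, self).__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, hidden_states):
        # The residual path preserves the pretrained representation.
        return hidden_states + self.up(F.gelu(self.down(hidden_states)))

# Example: adapt the output of one transformer layer.
adapter = BottleneckAdapter(hidden_size=1024, bottleneck=64)
x = torch.randn(2, 16, 1024)   # (batch, seq_len, hidden)
print(adapter(x).shape)        # torch.Size([2, 16, 1024])
```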
python glue/train.py \
    --task_name $TASK \
    --do_train --do_val --do_test --do_val_history \
    --do_save \
    --do_lower_case \
    --bert_model bert-large-uncased \
    --bert_load_path $PRETRAINED_MODEL_PATH \
    --bert_load_mode model_only \
    --bert_save_mode model_all \
    --train_batch_size 24 \
    --learning_rate 2e-5 \
    --output_dir $OUTPUT_PATH
File "glue/train.py", line 224
args.output_dir, f"all_state___epoch{epoch:04d}___batch{step:06d}.p"
^
SyntaxError: invalid syntax
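For what it's worth, this looks like a Python-version problem rather than a bug: f-string literals like the one at that line require Python 3.6+, so running glue/train.py under Python 2.7 or 3.5 fails with exactly this SyntaxError. A small sketch of a str.format() equivalent that also runs on older interpreters (variable values are illustrative):

```python
import os

output_dir = "outputs"  # stands in for args.output_dir
epoch, step = 3, 1200   # example values

# f-string form used in glue/train.py (requires Python >= 3.6):
#   f"all_state___epoch{epoch:04d}___batch{step:06d}.p"

# str.format() equivalent that also works on Python 2.7 / 3.5:
path = os.path.join(
    output_dir,
    "all_state___epoch{:04d}___batch{:06d}.p".format(epoch, step),
)
print(path)  # outputs/all_state___epoch0003___batch001200.p
```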
I have some questions about interpreting the results in Table 1.
Here's Table 1 for reference: [table image from the paper, attached in the original issue]
My questions apply to all the models, but for simplicity I will just ask about BERT.
My main question is:
How do I identify the intermediate task used for "BERT on STILTs" in the test-set scores? For example, on CoLA the best model on the development set is plain BERT, with a score of 62.1, and this is reflected in "BERT, Best of Each". So which model is used for "BERT on STILTs" on the test set?
I'm guessing it has to be the best model among the ones that had an intermediate task, so it's BERT->MNLI, since its score is 59.8. That would mean that for CoLA, BERT->MNLI scored 59.8 on the development set and 62.1 on the test set. Is that the correct way to read the table?