Comments (13)
Hey!
FAIR has demonstrated that BERT-style (masked LM) pre-training greatly improves BLEU for unsupervised translation.
Paper: https://arxiv.org/abs/1901.07291
Repo: https://github.com/facebookresearch/XLM
An older paper showing that pre-training with a plain LM (not MLM) helps Seq2Seq: https://arxiv.org/abs/1611.02683
Hope this helps!
These links are useful.
Does anyone know if BERT also improves things for supervised translation?
Thanks.
Could be relevant:
Towards Making the Most of BERT in Neural Machine Translation
On the use of BERT for Neural Machine Translation
I have managed to replace the Transformer's encoder with a pretrained BERT encoder, but the experimental results were very poor: it dropped the BLEU score by about 4 points.
The source code is available here: https://github.com/torshie/bert-nmt , implemented as a fairseq user model. It may not work out of the box; some minor tweaks may be needed.
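For readers who want to see the shape of this setup, here is a minimal PyTorch sketch of the same idea (a pretrained BERT encoder feeding a freshly initialized decoder), assuming Hugging Face's BertModel. It is an illustration, not the bert-nmt code:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertEncoderNMT(nn.Module):
    """Pretrained BERT as the NMT encoder; the decoder is trained from scratch."""

    def __init__(self, tgt_vocab_size: int, d_model: int = 768):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-multilingual-cased")
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)  # target positional encodings omitted for brevity
        self.generator = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # BERT's final hidden states act as the encoder "memory".
        memory = self.encoder(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
        tgt_len = tgt_ids.size(1)
        # Additive causal mask: -inf above the diagonal, 0 elsewhere.
        causal = torch.triu(torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)
        out = self.decoder(
            self.tgt_embed(tgt_ids), memory,
            tgt_mask=causal,
            memory_key_padding_mask=(src_mask == 0),  # BERT masks use 1 = keep
        )
        return self.generator(out)  # (batch, tgt_len, tgt_vocab_size)
```

Note that the decoder and its cross-attention start from random weights, which is one reason a direct encoder swap can underperform a jointly trained baseline.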
Regarding the supervised-translation question: https://arxiv.org/pdf/1901.07291.pdf seems to suggest that it does improve results for supervised translation as well. However, the paper is not about using BERT embeddings; it is about pre-training the encoder and decoder on a Masked Language Modelling objective. The biggest benefit comes from initializing the encoder with BERT's weights, and, surprisingly, using them to initialize the decoder also brings a small benefit, even though (if I understand correctly) the decoder's encoder-decoder attention weights still have to be randomly initialized, since that module is not present in the pre-trained network.
EDIT: of course, the pre-trained network needs to have been trained on multilingual data, as stated in the paper.
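For what it's worth, this warm-starting recipe is easy to try with the transformers EncoderDecoderModel API. A minimal sketch of the initialization (not the XLM training pipeline):

```python
from transformers import BertTokenizer, EncoderDecoderModel

# Warm-start encoder AND decoder from multilingual BERT; the
# encoder-decoder (cross-)attention weights do not exist in BERT,
# so transformers initializes them randomly, as described above.
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-cased", "bert-base-multilingual-cased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```

Fine-tuning on parallel data is still required, since the cross-attention starts from random weights.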
Yes. It is possible to use BERT as the encoder and GPT as the decoder and glue them together.
There is a recent paper on this: Multilingual Translation via Grafting Pre-trained Language Models
https://aclanthology.org/2021.findings-emnlp.233.pdf
https://github.com/sunzewei2715/Graformer
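As a quick illustration of the "glue them together" step, here is a generic BERT + GPT-2 pairing in transformers; this is not Graformer's grafting procedure, just the vanilla encoder-decoder composition:

```python
from transformers import EncoderDecoderModel

# BERT encoder + GPT-2 decoder. transformers configures GPT-2 with
# is_decoder=True and add_cross_attention=True, so the new cross-attention
# layers start from random weights and must be fine-tuned.
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "gpt2")
```

Keep in mind that BERT and GPT-2 use different tokenizers, so source and target text have to be tokenized with their respective vocabularies.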
Hi Kerem, I don't think so. Have a look at the fairseq repo maybe.
@thomwolf hi there, I couldn't find the fairseq repo. Could you post a link? Thanks!
Hi, I am talking about this repo: https://github.com/pytorch/fairseq.
Have a look at their Transformer models for machine translation.
I have conducted several MT experiments using fixed BERT embeddings; unfortunately, I found that it makes performance worse. @JasonVann @thomwolf
Also interested in whether BERT improves things for supervised translation.
Because BERT is an encoder, I guess we need a decoder. I looked here: https://jalammar.github.io/ and it seems the OpenAI Transformer is a decoder, but I cannot find a repo for it.
https://www.tensorflow.org/alpha/tutorials/text/transformer
I think BERT outputs vectors of size 768. Can we just reshape them and use the decoder from that Transformer notebook? More generally, can I just reshape BERT's output and try out a bunch of decoders?
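On the reshape idea: you can't reshape 768-dim states into a decoder built for a different width, because reshaping scrambles the features; the usual bridge is a learned linear projection. A minimal sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

# Stand-in for BERT's output: (batch, src_len, 768).
bert_states = torch.randn(2, 16, 768)

# A decoder built with d_model=512 needs 512-dim memory, so project:
bridge = nn.Linear(768, 512)
memory = bridge(bert_states)  # (2, 16, 512), ready for the decoder's cross-attention
```

With a projection like this in place, you can indeed try different decoders on top of the same BERT encoder.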
Also have a look at MASS and XLM.