
Comments (13)

SinghJasdeep commented on May 4, 2024 (21 upvotes)

Hey!

FAIR has demonstrated that BERT-style (masked language model) pretraining greatly improves BLEU for unsupervised translation.

Paper: https://arxiv.org/abs/1901.07291

Repo: https://github.com/facebookresearch/XLM

An older paper showing that pre-training with an LM objective (not MLM) helps Seq2Seq: https://arxiv.org/abs/1611.02683

Hope this helps!


gtesei commented on May 4, 2024 (6 upvotes)

These links are useful.

Does anyone know if BERT improves things also for supervised translation?

Thanks.


sailordiary commented on May 4, 2024 (4 upvotes)

Could be relevant:

- Towards Making the Most of BERT in Neural Machine Translation
- On the use of BERT for Neural Machine Translation


torshie commented on May 4, 2024 (3 upvotes)

I have managed to replace the Transformer's encoder with a pretrained BERT encoder; however, the experimental results were very poor: it dropped the BLEU score by about 4 points.

The source code is available here: https://github.com/torshie/bert-nmt , implemented as a fairseq user model. It may not work out of the box; some minor tweaks may be needed.
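For reference, here is a minimal PyTorch sketch of the same idea (not the bert-nmt code itself): a pretrained multilingual BERT as the NMT encoder, with a randomly initialized Transformer decoder on top. The checkpoint name and hyperparameters are illustrative.

```python
import torch.nn as nn
from transformers import BertModel

class BertEncoderNMT(nn.Module):
    """Pretrained BERT encoder + freshly initialized Transformer decoder."""
    def __init__(self, tgt_vocab_size, d_model=768, num_layers=6):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-multilingual-cased")
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        # Target-side positional encodings are omitted for brevity.
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
        self.out_proj = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # BERT's final hidden states serve as the decoder's "memory".
        memory = self.encoder(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
        causal = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.decoder(self.tgt_embed(tgt_ids), memory, tgt_mask=causal)
        return self.out_proj(hidden)  # logits over the target vocabulary
```

Whether the BERT encoder is frozen or fine-tuned (and at what learning rate) is one of the knobs that reportedly makes or breaks BLEU in this kind of setup.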


tacchinotacchi commented on May 4, 2024 (1 upvote)

> These links are useful.
>
> Does anyone know if BERT improves things also for supervised translation?
>
> Thanks.

https://arxiv.org/pdf/1901.07291.pdf seems to suggest that it improves results for supervised translation as well. However, this paper is not about using BERT embeddings; rather, it is about pre-training the encoder and decoder on a Masked Language Modelling objective. The biggest benefit comes from initializing the encoder with the weights from BERT. Surprisingly, using them to initialize the decoder also brings small benefits, even though, if I understand correctly, you still have to randomly initialize the weights for the encoder-attention (cross-attention) module, since it is not present in the pre-trained network.

EDIT: of course, the pre-trained network needs to have been trained on multilingual data, as stated in the paper.
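For what it's worth, this warm-starting recipe can be sketched with the transformers library's EncoderDecoderModel; the checkpoint choice below is illustrative. Note that the decoder's cross-attention weights are newly (randomly) initialized, which is exactly the caveat above.

```python
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-cased",  # encoder: initialized from BERT
    "bert-base-multilingual-cased",  # decoder: BERT weights + fresh cross-attention
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Fine-tune on parallel data (source -> target):
src = tokenizer("Wie geht es dir?", return_tensors="pt")
tgt = tokenizer("How are you?", return_tensors="pt")
loss = model(input_ids=src.input_ids,
             attention_mask=src.attention_mask,
             labels=tgt.input_ids).loss
```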


lileicc commented on May 4, 2024 (1 upvote)

Yes, it is possible to use BERT as the encoder and GPT as the decoder and glue them together.
There is a recent paper on this: Multilingual Translation via Grafting Pre-trained Language Models
https://aclanthology.org/2021.findings-emnlp.233.pdf
https://github.com/sunzewei2715/Graformer
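A minimal sketch of the glue step (not the Graformer code itself), using the transformers EncoderDecoderModel with illustrative checkpoints:

```python
from transformers import EncoderDecoderModel

# BERT encoder + GPT-2 decoder. The cross-attention connecting them exists
# in neither pretrained model, so it is randomly initialized here.
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "gpt2")
```

Fine-tuning on parallel data is then needed so that the new cross-attention learns to connect the two models.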


thomwolf commented on May 4, 2024

Hi Kerem, I don't think so. Have a look at the fairseq repo maybe.


JasonVann commented on May 4, 2024

@thomwolf hi there, I couldn't find anything about the fairseq repo. Could you post a link? Thanks!


thomwolf commented on May 4, 2024

Hi, I am talking about this repo: https://github.com/pytorch/fairseq.
Have a look at their Transformer models for machine translation.


alphadl commented on May 4, 2024

I have conducted several MT experiments that fix the embeddings by using BERT; unfortunately, I find that it makes performance worse. @JasonVann @thomwolf
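A minimal sketch of the setup being described, i.e. frozen BERT hidden states used as fixed source-side embeddings (checkpoint name illustrative):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = BertModel.from_pretrained("bert-base-multilingual-cased")
bert.eval()
for p in bert.parameters():
    p.requires_grad = False  # the "fixed embeddings": BERT is never updated

batch = tokenizer(["Das ist ein Test."], return_tensors="pt")
with torch.no_grad():
    src_embeddings = bert(**batch).last_hidden_state  # (1, seq_len, 768)
# src_embeddings would then replace the NMT encoder's learned token embeddings.
```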


echan00 commented on May 4, 2024

> Does anyone know if BERT improves things also for supervised translation?

Also interested.


nyck33 commented on May 4, 2024

Because BERT is an encoder, I guess we need a decoder. I looked here: https://jalammar.github.io/ and it seems the OpenAI Transformer is a decoder, but I cannot find a repo for it.
There is also this tutorial: https://www.tensorflow.org/alpha/tutorials/text/transformer
I think BERT outputs vectors of size 768. Can we just do a reshape and use the decoder in that Transformer notebook? In general, can I just reshape and try out a bunch of decoders?
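On the reshape question: BERT outputs one 768-dimensional vector per token, so bridging to a decoder with a different model width is usually a learned linear projection rather than a reshape. A minimal sketch, with illustrative dimensions (e.g. 512 for a Transformer-base decoder):

```python
import torch
import torch.nn as nn

bert_out = torch.randn(1, 16, 768)  # stand-in for BERT's last_hidden_state: (batch, seq_len, 768)
bridge = nn.Linear(768, 512)        # learned projection to the decoder width, not a reshape
memory = bridge(bert_out)           # (1, 16, 512): per-token memory for a d_model=512 decoder
```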


Bachstelze commented on May 4, 2024

Also have a look at MASS and XLM.

