
Comments (6)

glample commented on June 20, 2024

Hello,

In the supervised approach, we generated translations for all words from the source language to the target language, and vice versa (a translation being a pair (x, y) associated with the probability of y being the correct translation of x). We then kept all pairs of words (x, y) such that y has a high probability of being a translation of x, and x also has a high probability of being a translation of y. Finally, we sorted the generated translation pairs by frequency of the source word, taking the first 5000 resulting pairs for training and the following 1500 for testing.
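The selection procedure described above could be sketched roughly as follows. This is a hypothetical illustration, not MUSE's actual code: it assumes the translation probabilities in both directions are already computed as dense matrices, and that `src_freq_rank` maps each source word index to its corpus frequency rank.

```python
import numpy as np

def select_pairs(p_src2tgt, p_tgt2src, src_freq_rank, threshold=0.5):
    """Keep (x, y) only when each word is a high-probability translation
    of the other, then sort by source-word frequency and split the result
    into 5000 training pairs and the following 1500 test pairs.

    p_src2tgt[x, y]: assumed probability that target word y translates source word x.
    p_tgt2src[y, x]: same, in the reverse direction.
    """
    pairs = []
    for x in range(p_src2tgt.shape[0]):
        y = int(np.argmax(p_src2tgt[x]))            # best target candidate for x
        mutual = int(np.argmax(p_tgt2src[y])) == x  # x must also be best for y
        if mutual and p_src2tgt[x, y] > threshold and p_tgt2src[y, x] > threshold:
            pairs.append((x, y))
    # Sort by frequency rank of the source word (0 = most frequent).
    pairs.sort(key=lambda xy: src_freq_rank[xy[0]])
    return pairs[:5000], pairs[5000:6500]
```

The mutual-best-translation check is what filters out ambiguous pairs: a pair survives only if the preference holds in both directions.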

The initial pair selection most likely has an impact on the alignment performance, but we did not study this extensively. We did notice that the results in the supervised setting varied depending on how we selected the pairs. In particular, when we selected pairs with very little ambiguity / no multiple possible translations, the translation accuracy was better; note, however, that the test set was not the same either, and the difference in test pairs alone may be enough to explain the differences.

Previous works have shown that using more than 5000 pairs of words does not improve performance (Artetxe et al., 2017) and can even be detrimental (see Dinu et al., 2015). This is why we decided to consider only 5000 pairs (and also because we wanted to be consistent with previous works).

from muse.

fallingstar621 commented on June 20, 2024

@glample thank you for providing more insights! Also Congratulations on the acceptance of the paper!


fallingstar621 commented on June 20, 2024

@glample Thanks for the reply. Again, great insights!


glample commented on June 20, 2024

Thank you :)


fallingstar621 commented on June 20, 2024

@glample Can I ask another question? Why is the pre-defined dictionary only used in the first iteration of supervised training? Could we use the pre-defined dictionary, rather than the one built from the embeddings, in the following iterations? I tried supervised training for several language pairs, and in some cases I observed that the precision@k metric actually drops over iterations (starting from the second iteration). Does that mean Procrustes can make the alignment worse? Have you experienced this kind of "convergence" problem in your experiments? Any suggestion on changing the parameters (e.g., number of iterations, dico_threshold, dico_max_rank, etc.)? Thanks in advance!


glample commented on June 20, 2024

Regarding "Can we use the pre-defined dictionary rather than the one built from the embeddings in the following iterations?": do you mean using the pre-defined dictionary in addition to the dictionary generated by the alignment, or instead of the generated dictionary? Currently, we use the generated dictionary for the next iteration and completely discard the pre-defined one. But it is true that you could probably use a combination of both and make the supervised + refinement model even stronger.
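One way to implement that combination is a simple de-duplicated union of the two dictionaries before each refinement step. A minimal sketch, assuming dictionaries are lists of (source index, target index) pairs; `merge_dictionaries` is a hypothetical helper, not part of MUSE:

```python
def merge_dictionaries(seed_pairs, generated_pairs):
    """Union of the pre-defined (seed) and generated dictionaries,
    de-duplicated, with seed pairs listed first."""
    seen, merged = set(), []
    for pair in list(seed_pairs) + list(generated_pairs):
        if pair not in seen:
            seen.add(pair)
            merged.append(pair)
    return merged
```

Keeping the seed pairs anchored in every iteration could, in principle, limit the drift that makes later iterations worse than the first.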

We sometimes observed that iterations at step t >= 2 were not as good as the initial one, but only for language pairs whose embeddings are difficult to align, such as en-ru or en-zh. For pairs of European languages we did not observe anything like this.
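For reference, the Procrustes step applied at each iteration has a closed-form solution via SVD. A minimal NumPy sketch (not the MUSE implementation), assuming X and Y hold the source and target embeddings of the current dictionary pairs, one pair per row:

```python
import numpy as np

def procrustes(X, Y):
    """Return the orthogonal matrix W minimizing ||X @ W - Y||_F.

    X, Y: (n_pairs, dim) arrays of source / target word embeddings for the
    current dictionary. The closed-form solution is W = U @ Vt, where
    U, S, Vt is the SVD of X^T Y (orthogonal Procrustes problem).
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```

Because W is constrained to be orthogonal, each refinement iteration can only rotate the source space; if the generated dictionary at step t >= 2 is noisier than the seed dictionary, the rotation it induces can indeed degrade precision@k, which matches the behavior described above.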

from muse.
