Is COMET on a 0-1 scale? Is it normalized to [0,1]? I saw on the paper of uncertainty-

Which scale is COMET? [0,1]? about comet HOT 11 CLOSED

i55code commented on May 9, 2024

Which scale is COMET? [0,1]?

from comet.

Comments (11)

ricardorei commented on May 9, 2024

Hi @i55code, most COMET models (the only exception is our ranking model) are trained with z-scores from WMT shared tasks. Z-scores are unbounded, so COMET predictions can go above 1 and below 0.

Negative scores are actually more common than scores above 1, but a score above 1 can sometimes happen if a translation is really good.
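
To illustrate why z-score targets are unbounded, here is a minimal sketch in plain Python. The raw ratings are made-up numbers, not WMT data, and standardizing a single list of ratings is a simplifying assumption about how the z-scores are produced.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Standardize raw scores to zero mean and unit standard deviation."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    return [(score - mu) / sigma for score in raw_scores]


# Hypothetical raw quality ratings on a 0-100 scale (illustrative only).
raw = [20, 50, 60, 70, 100]
standardized = z_scores(raw)

# The standardized values are centered on 0 and are not confined to [0, 1].
print([round(z, 2) for z in standardized])
```

A model trained to regress onto targets like these has no reason to keep its predictions inside [0, 1].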

i55code commented on May 9, 2024

Thank you so much for explaining! This helps a lot:) @ricardorei

I also hope to ask two more questions:

Is it possible to use data that does not have a "src" field? Does using a different "src" have a big impact on COMET? BLEU does not require "src".
The score format is [seg_scores, sys_score]. When we talk about the COMET score, it is the sys_score, right? What do the seg_scores do? In what cases would knowing the seg_scores be helpful?

Thanks!

ricardorei commented on May 9, 2024

COMET compares the MT with both source and reference in a shared embedding space. If the source is completely different, ideally this will lower your score. COMET metrics are really designed with the intent of using the source, and some metrics (the reference-free metrics) only require the source! We don't have any model that uses only the reference. Are you trying to use COMET in a particular use case where you don't have access to the source?
The sys_score is the average of the seg_scores. The seg_scores are basically the quality assessments for each individual hypothesis (MT). They might be useful for some applications where we are actually interested in comparing two (or more) systems at the segment level.
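
As a small illustration of the relation between the two outputs, the sketch below averages segment scores into a system score and then compares two systems segment by segment. All scores are hypothetical values chosen for the example, and `system_score` is an illustrative helper, not a function from the COMET package.

```python
from statistics import mean

# Hypothetical segment-level scores for two MT systems on the same three segments.
seg_scores_a = [0.62, -0.10, 0.85]
seg_scores_b = [0.58, 0.20, 0.80]


def system_score(seg_scores):
    """System-level score computed as the plain average of segment scores."""
    return mean(seg_scores)


sys_a = system_score(seg_scores_a)
sys_b = system_score(seg_scores_b)

# Segment-level comparison: indices of the segments where system B beats system A.
b_wins = [i for i, (a, b) in enumerate(zip(seg_scores_a, seg_scores_b)) if b > a]

print(f"system A: {sys_a:.3f}, system B: {sys_b:.3f}, B wins on segments {b_wins}")
```

Here system B has the higher average even though it wins on only one segment, which is the kind of detail that the system score alone hides.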

i55code commented on May 9, 2024

Thank you, @ricardorei, this helps a lot. I am trying to evaluate in a case where the source is not available. In my work, I am interested in knowing how good a translation is given some reference text. If there is future development on source-free metrics, feel free to let me know.

One more question I have is about the list of covered languages: do these languages refer to the source language? If, say, a language is not covered, how much can people trust the output? I read the "uncertainty-aware" paper; is there a simple way to print out the bounds from COMET's repo?

ricardorei commented on May 9, 2024

The language list basically applies to both source and target languages.

Let's imagine you are evaluating translations from Maori into English (or vice versa). Maori is not covered in XLM-R, so the results might be unreliable. This is especially important if the target language is the uncovered language and/or if you are using a reference-free metric.

Regarding the question "how much can people trust the output?", this is very hard to answer. In WMT20 we evaluated the model on Inuktitut and the results were good, yet Tom Kocmi from Microsoft tested several COMET models on several languages and reported that it was not stable for languages not covered in XLM-R (#18).

Regarding the question "is there a simple way to print out the bounds from COMET's repo?": the code used in that paper is based on a previous COMET codebase, and you can find everything here: UA_COMET

In this current codebase, you can only replicate the Monte Carlo Dropout results using the --mc_dropout flag. This should give you reasonable bounds without having to run several models.

i55code commented on May 9, 2024

Thank you so much, @ricardorei. I mainly work with low-resource languages, and COMET will sometimes give surprisingly good results when BLEU is low, so I would be interested in getting good bounds. Thank you, I will try UA_COMET.

Thank you for your explanation, I really appreciate it and have a good weekend!

ricardorei commented on May 9, 2024

Thanks, @i55code, have a nice weekend!

i55code commented on May 9, 2024

@ricardorei Hi Ricardo, I have one last question to ask. For --mc_dropout, I saw the output on the command line, but is there a simple way to invoke it through the Python interface? I see that there is model.predict(...) and model.mc_dropout(); what is a simple command in Python to get the bounds? Thanks!

ricardorei commented on May 9, 2024

It's an argument of the predict function:
mean_scores, std_scores, sys_score = model.predict(data, batch_size=8, gpus=1, mc_dropout=30)
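
Once per-segment means and standard deviations are available, approximate bounds can be derived from them. The sketch below is one simple way to do that, not the method from the uncertainty-aware paper: it assumes a Gaussian approximation with a 1.96 multiplier for a roughly 95% interval, uses hypothetical values in place of real model output, and `confidence_bounds` is an illustrative helper rather than part of the COMET package.

```python
def confidence_bounds(mean_scores, std_scores, z=1.96):
    """Return a (lower, upper) pair per segment as mean -/+ z * std.

    Assumes the Monte Carlo Dropout scores are roughly Gaussian, which is
    an approximation made for this example.
    """
    if len(mean_scores) != len(std_scores):
        raise ValueError("mean_scores and std_scores must have the same length")
    return [(m - z * s, m + z * s) for m, s in zip(mean_scores, std_scores)]


# Hypothetical values standing in for the mean_scores and std_scores of a real run.
mean_scores = [0.45, -0.12]
std_scores = [0.05, 0.20]

bounds = confidence_bounds(mean_scores, std_scores)
for m, (lower, upper) in zip(mean_scores, bounds):
    print(f"score={m:.3f}  bounds=[{lower:.3f}, {upper:.3f}]")
```

A wide interval, such as the second segment here, signals a prediction that deserves less trust, which may be useful for languages the encoder does not cover.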

i55code commented on May 9, 2024

Thank you, @ricardorei Ricardo! I really learned a lot today. I hope to use COMET more in my future research. Take good care and stay safe!

ricardorei commented on May 9, 2024

Thanks, @i55code, feel free to reach out with questions!
