Comments (11)
Hi @i55code, most COMET models (the only exception is our ranking model) are trained on z-scores from the WMT shared tasks. Z-scores are unbounded, so COMET predictions can go above 1 and below 0.
Negative scores are actually more common than scores above 1, but the latter can happen when a translation is really good.
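To illustrate why z-scores are unbounded, here is a minimal sketch of z-normalization. The raw scores are made up for the example, and `z_normalize` is a hypothetical helper, not part of COMET:

```python
from statistics import mean, pstdev

def z_normalize(raw_scores):
    """Standardize scores to zero mean and unit (population) standard deviation."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    return [(s - mu) / sigma for s in raw_scores]

# Hypothetical raw quality scores on a 0-100 scale (not real WMT data).
raw = [20, 55, 60, 65, 70, 98]
z = z_normalize(raw)

# Standardized values are centered on 0, so below-average items come out
# negative, and unusually good items can exceed 1.
print([round(v, 2) for v in z])
```

A model trained to regress on targets like these has no reason to keep its predictions inside [0, 1].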
from comet.
Thank you so much for explaining! This helps a lot:) @ricardorei
I'd also like to ask two more questions:
- Is it possible to score data that does not have a "src" field? Does using a different "src" have a big impact on COMET? BLEU does not require "src".
- The score format is [seg_scores, sys_score]. When we talk about the COMET score, we mean the sys_score, right? What are the seg_scores for? In what cases would knowing the seg_scores be helpful?
Thanks!
- COMET compares the MT with both the source and the reference in a shared embedding space. If the source is completely different, this should ideally lower your score. COMET metrics are designed to use the source, and some of them (the reference-free metrics) require only the source. We don't have any model that uses only the reference. Are you trying to use COMET in a use case where you don't have access to the source?
- The sys_score is the average of the seg_scores. The seg_scores are the quality assessments for each individual hypothesis (MT). They can be useful in applications where we want to compare two (or more) systems at the segment level.
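As a small sketch of the relationship between the two outputs, the following uses made-up segment scores for two hypothetical systems (the numbers are illustrative only, not COMET output):

```python
# Hypothetical segment-level scores for the same four source sentences.
seg_scores_a = [0.71, -0.12, 0.45, 0.90]
seg_scores_b = [0.65, 0.30, 0.40, 0.88]

# The system-level score is the plain average of the segment-level scores.
sys_score_a = sum(seg_scores_a) / len(seg_scores_a)
sys_score_b = sum(seg_scores_b) / len(seg_scores_b)

# Segment-level comparison: on how many segments does system A beat system B?
wins_a = sum(a > b for a, b in zip(seg_scores_a, seg_scores_b))

print(f"sys A = {sys_score_a:.4f}, sys B = {sys_score_b:.4f}")
print(f"A is better on {wins_a} of {len(seg_scores_a)} segments")
```

In this toy data, system A wins on most segments while system B has the higher average, which is the kind of detail the system-level score alone would hide.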
Thank you, @ricardorei, this helps a lot. I am trying to evaluate in a case where the source is not available. In my work, I want to know how good a translation is given some reference text. If there is future development on source-free metrics, feel free to let me know.
One more question about the list of covered languages: does it refer to the source language? If, say, a language is not covered, how much can people trust the output? I read the "uncertainty-aware" paper; is there a simple way to print out the bounds from COMET's repo?
The language list applies to both source and target.
Let's imagine you are evaluating translations from Maori into English (or vice versa). Maori is not covered by XLM-R, so the results might be unreliable. This is especially important if the target language is the uncovered language and/or if you are using a reference-free metric.
The question "how much can people trust the output?" is very hard to answer. In WMT20 we evaluated the model on Inuktitut and the results were good, yet Tom Kocmi from Microsoft tested several COMET models on several languages and reported that they were not stable for languages not covered by XLM-R (#18).
I read the "uncertainty-aware" paper, is there a simple way to print out the bounds from COMET's repo?
The code used in that paper is based on a previous COMET codebase, and you can find everything here: UA_COMET
In the current codebase, you can only replicate the Monte Carlo Dropout results, using the --mc_dropout flag. This should give you reasonable bounds without having to run several models.
Thank you so much, @ricardorei. I mainly work with low-resource languages, and COMET sometimes gives surprisingly good results when BLEU is low, so I am interested in getting good bounds. I will try UA_COMET.
Thank you for your explanation, I really appreciate it and have a good weekend!
Thanks, @i55code have a nice weekend!
@ricardorei Hi Ricardo, I have one last question. For --mc_dropout, I saw the output on the command line, but is there a simple way to invoke it through the Python interface? I see that there is model.predict(...) and model.mc_dropout(); what is a simple command in Python to get the bounds? Thanks!
It's an argument of the predict function:
mean_scores, std_scores, sys_score = model.predict(data, batch_size=8, gpus=1, mc_dropout=30)
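One possible way to turn the returned means and standard deviations into per-segment bounds is sketched below. The values are hypothetical stand-ins for what predict returns, `mc_dropout_bounds` is an illustrative helper that is not part of COMET, and the Gaussian 95% interval (mean ± 1.96 · std) is an assumption of this sketch, not something the codebase prescribes:

```python
def mc_dropout_bounds(mean_scores, std_scores, z=1.96):
    """Return (lower, upper) bounds per segment, assuming roughly Gaussian scores."""
    return [(m - z * s, m + z * s) for m, s in zip(mean_scores, std_scores)]

# Hypothetical per-segment outputs standing in for mean_scores and std_scores.
mean_scores = [0.62, -0.10]
std_scores = [0.05, 0.20]

bounds = mc_dropout_bounds(mean_scores, std_scores)
for m, (lo, hi) in zip(mean_scores, bounds):
    print(f"score {m:+.3f}  bounds [{lo:+.3f}, {hi:+.3f}]")
```

A wide interval, like the one for the second segment, suggests the prediction for that segment should be trusted less.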
Thank you, @ricardorei! I really learned a lot today. I hope to use COMET more in my future research. Take good care and stay safe!
Thanks, @i55code feel free to reach out for questions!