unbabel / comet
A Neural Framework for MT Evaluation
Home Page: https://unbabel.github.io/COMET/html/index.html
License: Apache License 2.0
Using comet score with the QE model (wmt-large-qe-estimator-1719) through the command line asks for a reference, even though the reference shouldn't be used in the calculation (if I am not mistaken).
COMET works with python 3.6 but not with python 3.8
OS: MacOS
Packaging: pip
Version 1.0.0rc6
I pip-installed comet in my Python 3.8.12 virtual environment and then tested the "Scoring with Python" example provided in the README:
seg_scores, sys_score = model.predict(data, batch_size=8, gpus=0)
But I get the following error:
/usr/local/Cellar/[email protected]/3.8.12/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py in _launch(self, process_obj)
45 try:
46 reduction.dump(prep_data, fp)
---> 47 reduction.dump(process_obj, fp)
48 finally:
49 set_spawning_popen(None)
/usr/local/Cellar/[email protected]/3.8.12/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #
AttributeError: Can't pickle local object 'CometModel.predict.<locals>.<lambda>'
I tested the same code on python 3.6 and it did work, so thanks a lot :)
I expected comet to work with >=python3.5
Is there any plan to make it work for python 3.8?
Thanks again.
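A minimal workaround sketch for the macOS/Python 3.8 case above, assuming the 1.0-style Python API from the README (download_model/load_from_checkpoint): on macOS with Python 3.8 the default multiprocessing start method is "spawn", so guarding predict() under a __main__ check (or forcing "fork") usually avoids the pickling error.
import multiprocessing

from comet import download_model, load_from_checkpoint  # assumed 1.0-style API

def main():
    model_path = download_model("wmt20-comet-da")  # any available model name
    model = load_from_checkpoint(model_path)
    data = [{"src": "Dem Feuer konnte Einhalt geboten werden",
             "mt": "The fire could be stopped",
             "ref": "They were able to control the fire."}]
    seg_scores, sys_score = model.predict(data, batch_size=8, gpus=0)
    print(seg_scores, sys_score)

if __name__ == "__main__":
    # multiprocessing.set_start_method("fork")  # alternative workaround on macOS
    main()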
Dear authors,
Thanks a lot for COMET and for open-sourcing your code.
Running word-level QE estimation triggers the following error:
Traceback (most recent call last):
File "", line 1, in
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/runpy.py", line 263, in run_path
return _run_module_code(code, init_globals, run_name,
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/Ebenge/Desktop/quality_estimation/examples/word_level/wmt_2018/de_en/microtransquest.py", line 52, in
sources_tags, targets_tags = model.predict(test_sentences[:1], split_on_space=True)
File "/Users/Ebenge/Desktop/quality_estimation/transquest/algo/word_level/microtransquest/run_model.py", line 991, in predict
eval_dataset = self.load_and_cache_examples(None, to_predict=predict_examples)
File "/Users/Ebenge/Desktop/quality_estimation/transquest/algo/word_level/microtransquest/run_model.py", line 1203, in load_and_cache_examples
features = convert_examples_to_features(
File "/Users/Ebenge/Desktop/quality_estimation/transquest/algo/word_level/microtransquest/utils.py", line 345, in convert_examples_to_features
with Pool(process_count) as p:
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
self._repopulate_pool()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
w.start()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
Traceback (most recent call last):
File "", line 1, in
return Popen(process_obj)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
super().__init__(process_obj)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
exitcode = _main(fd, parent_sentinel)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
_check_not_importing_main()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
prepare(preparation_data)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
_fixup_main_from_path(data['init_main_from_path'])
python microtransquest.py (the one in examples/word_level/wmt_2018/de_en)
No error
Wrap everything inside a main.
OS: macOS
certifi==2021.10.8
charset-normalizer==2.0.7
click==8.0.3
configparser==5.1.0
cycler==0.11.0
docker-pycreds==0.4.0
filelock==3.4.0
flatbuffers==2.0
fonttools==4.28.2
gitdb==4.0.9
GitPython==3.1.24
huggingface-hub==0.1.2
idna==3.3
joblib==1.1.0
kiwisolver==1.3.2
matplotlib==3.5.0
numpy==1.21.4
onnxruntime==1.9.0
packaging==21.3
pandas==1.3.4
pathtools==0.1.2
Pillow==8.4.0
promise==2.3
protobuf==3.19.1
psutil==5.8.0
pyparsing==3.0.6
python-dateutil==2.8.2
pytz==2021.3
PyYAML==6.0
regex==2021.11.10
requests==2.26.0
sacremoses==0.0.46
scikit-learn==1.0.1
scipy==1.7.2
sentencepiece==0.1.96
sentry-sdk==1.5.0
seqeval==1.2.2
setuptools-scm==6.3.2
shortuuid==1.0.8
six==1.16.0
smmap==5.0.0
subprocess32==3.5.4
tensorboardX==2.4.1
termcolor==1.1.0
threadpoolctl==3.0.0
tokenizers==0.10.3
tomli==1.2.2
torch==1.10.0
tqdm==4.62.3
transformers==4.12.5
typing-extensions==4.0.0
urllib3==1.26.7
wandb==0.12.7
yaspin==2.1.0
The fix solves the problem.
COMET installation is failing on windows. Could you please take a look?
(base) C:\>conda create --name comet_windows_3_7 python=3.7
(base) C:\>conda activate comet_windows_3_7
(comet_windows_3_7) C:\>pip install unbabel-comet
Using cached test_tube-0.7.4.tar.gz (21 kB)
ERROR: Command errored out with exit status 1:
command: 'C:\Users\test\Anaconda3\envs\comet_windows_3_7\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\test\\AppData\\Local\\Temp\\pip-install-68_b6icx\\test-tube_f70f8dd226d64a01b89decde5fae3cab\\setup.py'"'"'; __file__='"'"'C:\\Users\\test\\AppData\\Local\\Temp\\pip-install-68_b6icx\\test-tube_f70f8dd226d64a01b89decde5fae3cab\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\test\AppData\Local\Temp\pip-pip-egg-info-ko3bque_'
cwd: C:\Users\test\AppData\Local\Temp\pip-install-68_b6icx\test-tube_f70f8dd226d64a01b89decde5fae3cab\
Complete output (7 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\test\AppData\Local\Temp\pip-install-68_b6icx\test-tube_f70f8dd226d64a01b89decde5fae3cab\setup.py", line 28, in <module>
install_requires=load_requirements(PATH_ROOT),
File "C:\Users\test\AppData\Local\Temp\pip-install-68_b6icx\test-tube_f70f8dd226d64a01b89decde5fae3cab\setup.py", line 10, in load_requirements
with open(os.path.join(path_dir, 'requirements.txt'), 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\test\\AppData\\Local\\Temp\\pip-install-68_b6icx\\test-tube_f70f8dd226d64a01b89decde5fae3cab\\requirements.txt'
[Extracting from #30]
It would be nice to add support for sacrebleu-style built-in test sets, e.g.,
# one option
$ cat system.txt | comet -t wmt20 -l de-en [other args]
# another option
$ cat system.txt | comet --sacrebleu-testset wmt20/de-en
$ cat system.txt | comet --sacrebleu-testset mtedx/valid/pt-es
You could accomplish this by just using sacrebleu as a library. It's pretty easy:
from sacrebleu.utils import get_source, get_references, get_files

# trigger sacrebleu test set
# make these optional: nargs='?' for argparse
if args.source is None and args.references is None:
    if args.sacrebleu_dataset is None:
        raise ValueError("either --source/--references or --sacrebleu_dataset is required")  # throw error
    # some test sets are hierarchical, e.g., 'mtedx/valid'
    test_set, langpair = args.sacrebleu_dataset.rsplit('/', maxsplit=1)
    source = get_source(test_set, langpair)
    ref = get_references(test_set, langpair)
    # alternative
    source, ref, _ = get_files(test_set, langpair)
Originally posted by @mjpost in #30 (comment)
I want to train my own metric, which is a ranker model and reference-less. According to https://github.com/Unbabel/COMET/blob/0.1.0/docs/source/training.md, the data format is a CSV file with src, mt, ref and score. The ranker model needs a positive hypothesis and a negative hypothesis to train, and as I understand it doesn't need a score to train, so is the data format the same for training a ranker model?
When a model download is halted before it completes, and a new command referring to the same model is then used (e.g. the default comet score -s src.de -h hyp.en -r ref.en), the script will try to retrieve the cached (incomplete) download, which results in an error:
Exception: [meta_tags.csv|hparams.yaml is missing from the checkpoint folder.
It is resolved if the cache is cleared.
Full error trace:
Traceback (most recent call last):
File "/home/chryssa/anaconda3/bin/comet", line 11, in <module>
load_entry_point('unbabel-comet==0.0.7', 'console_scripts', 'comet')()
File "/home/chryssa/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/chryssa/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/chryssa/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/chryssa/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/chryssa/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/chryssa/anaconda3/lib/python3.7/site-packages/unbabel_comet-0.0.7-py3.7.egg/comet/cli.py", line 121, in score
model = load_checkpoint(model) if os.path.exists(model) else download_model(model)
File "/home/chryssa/anaconda3/lib/python3.7/site-packages/unbabel_comet-0.0.7-py3.7.egg/comet/models/__init__.py", line 98, in download_model
return load_checkpoint(checkpoint_path)
File "/home/chryssa/anaconda3/lib/python3.7/site-packages/unbabel_comet-0.0.7-py3.7.egg/comet/models/__init__.py", line 136, in load_checkpoint
"[meta_tags.csv|hparams.yaml is missing from the checkpoint folder."
Hello!
Whenever I try to run your demo, using:
from comet.models import download_model
I get:
Segmentation fault (core dumped)
I'm using Amazon EC2 instances with ubuntu 18.04 and I've also tried with ubuntu 20.04 and got the same errors.
I log into the instance and do:
sudo su
apt-get update
apt-get install python3-pip
pip3 install unbabel-comet
If I'm on 18.04 I have to upgrade pip, because otherwise sentencepiece breaks.
I've tried building from source and installing from pip; nothing seems to be working. It installs, and I can even do:
import comet
But whenever I run:
from comet.models import download_model
I get a segmentation fault. The same happens if I use the CLI command:
comet download -d apequest --saving_path data/
again the same error!
OS: Linux
Packaging: pip
Version: pip 20.0.2
I've tried multiple instances and virtual environments, but nothing seems to be effective! With Ubuntu 20.04 I used Python 3.8; I've seen that the recommendation is Python 3.6. On Ubuntu 18.04 I used the default python3, which is 3.6.9, and still had the same issues!
Here is the output of my pip freeze
absl-py==0.11.0
cachetools==4.1.1
certifi==2020.11.8
cffi==1.14.3
chardet==3.0.4
click==7.1.2
Cython==0.29.15
fairseq==0.9.0
fastBPE==0.1.0
filelock==3.0.12
fsspec==0.8.4
future==0.18.2
google-auth==1.23.0
google-auth-oauthlib==0.4.2
grpcio==1.33.2
idna==2.10
joblib==0.17.0
Markdown==3.3.3
numpy==1.19.4
oauthlib==3.1.0
pandas==1.0.5
portalocker==2.0.0
protobuf==3.14.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
python-dateutil==2.8.1
pytorch-lightning==1.0.7
pytorch-nlp==0.5.0
pytz==2020.4
PyYAML==5.3.1
regex==2020.11.13
requests==2.25.0
requests-oauthlib==1.3.0
rsa==4.6
sacrebleu==1.4.14
sacremoses==0.0.43
scikit-learn==0.23.1
scipy==1.5.4
sentencepiece==0.1.94
six==1.15.0
sphinx-markdown-tables==0.0.15
tensorboard==2.2.0
tensorboard-plugin-wit==1.7.0
threadpoolctl==2.1.0
tokenizers==0.7.0
torch==1.4.0
tqdm==4.52.0
transformers==2.10.0
-e git+https://github.com/Unbabel/COMET@c9ac4c9cbdb8484aa5ee286c9cbe13002c16a193#egg=unbabel_comet
urllib3==1.26.2
Werkzeug==1.0.1
wget==3.2
If you could help me figure this thing out it would be wonderful!
Thank you for your time and willingness to share your tool! I'm eager to try it :-)
comet-compare outputs wrong info in the JSON file:
I think that the problems are due to this line
Line 142 in 6009f67
It seems that entry 0 is always output, while it should output entry i.
comet-compare -s SRC.txt -r REF.txt -x SysX.txt -y SysY.txt --to_json JSON.txt
SRC.txt, REF.txt, SysX.txt and SysY.txt must have more than one line.
OS: ubuntu
Packaging: pip3
Version: unbabel-comet==1.0.0rc8
Is COMET on a 0-1 scale? Is it normalized to [0, 1]? In the uncertainty-aware COMET paper I saw a COMET score that is bigger than 1; how is that possible? Is that normal/frequent?
COMET should output a list of available models if -m
is used with an invalid model name.
It's a bit of a pain to figure out the available models from the CLI. I had to come to the Github page.
Thanks for the tool! I'm wondering if COMET supports multiple references, or if we can just score each sentence with all the references and take the maximum value? Sorry if this has already been mentioned somewhere.
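Not an official answer, just a hedged sketch of the workaround mentioned in the question (score once per reference and keep the maximum), assuming the predict() signature shown in the other reports here; the helper name is made up:
def multi_ref_score(model, src, mt, refs):
    # Build one input per reference and keep the best segment score.
    data = [{"src": src, "mt": mt, "ref": ref} for ref in refs]
    seg_scores, _ = model.predict(data, batch_size=8, gpus=0)
    return max(seg_scores)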
Hi Unbabel,
I wanted to ask: why is wmt20-comet-da the default model when using COMET? I am a bit worried that people using it off the shelf won't understand the underlying difference and will mistakenly report COMET scores on the QE metric.
Why not set the reference-based COMET as the default, since it seems to be outperforming comet-da (also I fear that comet-da will have more biases and potential problems than reference-based).
Thank you for the answer,
Tom
Instead of bootstrap resampling, we could simply run a paired t-test to check whether a system is significantly better than another.
Simpler to understand.
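A minimal sketch of the suggested paired t-test, assuming two lists of per-segment COMET scores (one per system) obtained from model.predict(); scipy.stats.ttest_rel does the pairing:
from scipy import stats

def paired_t_test(seg_scores_x, seg_scores_y, alpha=0.05):
    # H0: the two systems have the same mean segment-level score.
    t_stat, p_value = stats.ttest_rel(seg_scores_x, seg_scores_y)
    return t_stat, p_value, p_value < alpha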
Using comet 1.0.1 with Python 3.9, I get the following (using testset in /data of mt-telescope):
$ comet-score -s newstest2020-ruen.src.ru.txt -t newstest2020-ruen.OnlineA.txt -r newstest2020-ruen.ref.en.txt
Global seed set to 12
wmt20-comet-da is already in cache.
Traceback (most recent call last):
File ".../telescope-venv/bin/comet-score", line 8, in
sys.exit(score_command())
File ".../telescope-venv/lib/python3.9/site-packages/comet/cli/score.py", line 180, in score_command
model = load_from_checkpoint(model_path)
File ".../telescope-venv/lib/python3.9/site-packages/comet/models/__init__.py", line 57, in load_from_checkpoint
model_class = str2model[hparams["class_identifier"]]
TypeError: 'NoneType' object is not subscriptable
Output comet score.
OS: Linux
Packaging: pip
Version: 1.0.1
While inference (comet-score) using the provided metrics works well, if I try to train a new metric with comet-train and the use it to predict quality scores I get the following error:
comet-score: error: argument --model: invalid choice: 'lightning_logs/version_19/checkpoints/epoch=1-step=129339.ckpt' (choose from 'emnlp20-comet-rank', 'wmt20-comet-da', 'wmt20-comet-qe-da', 'wmt21-cometinho-da')
The error disappears once I comment out line 64: choices=available_metrics.keys(),
in comet/cli/score.py
parser.add_argument(
"--model",
type=Union[str, Path_fr],
required=False,
default="wmt20-comet-da",
choices=available_metrics.keys(),
help="COMET model to be used.",
)
When I tried a clean install of the current COMET version via pip, I got an error that there is a conflict in versions. I managed to resolve it by manually downgrading PyYAML to 3.3.*
Could you check it, please?
COMET takes 30-40 minutes to evaluate a 400-sentence test set on CPU. Therefore a GPU is necessary, but it took me some time before I found out there is a "cuda" flag. Can you add an example with the cuda parameter to the README?
Following the README:
> echo -e "Dem Feuer konnte Einhalt geboten werden\nSchulen und Kindergärten wurden eröffnet." >> src.de
> echo -e "The fire could be stopped\nSchools and kindergartens were open" >> hyp.en
> comet-score -s src.de -t hyp.en --model wmt20-comet-qe-da
results in the following output:
Global seed set to 12
usage: comet-score [-h] [-s SOURCES] [-t TRANSLATIONS] [-r REFERENCES] [--batch_size BATCH_SIZE] [--gpus GPUS]
[--to_json TO_JSON]
[--model {emnlp20-comet-rank,wmt20-comet-da,wmt20-comet-qe-da,wmt21-cometinho-da}]
[--mc_dropout MC_DROPOUT] [--seed_everything SEED_EVERYTHING]
comet-score: error: wmt20-comet-qe-da requires -r/--references.
Looking at the code (lines 79 to 80 in 61caa5a), the check looks for refless in the model name, which none of the available models have:
comet-score: error: argument --model: invalid choice: [...] (choose from 'emnlp20-comet-rank', 'wmt20-comet-da', 'wmt20-comet-qe-da', 'wmt21-cometinho-da')
I'd hope to get a score for my hypotheses given the sources.
n/a
OS: MacOS
Packaging: pip
Version: unbabel-comet==1.0.0rc4
Multi-GPU support would be nice.
Scoring larger test sets takes ages on a single GPU :)
Is there a theoretical range of values for the COMET regressor?
The final estimator layer is an FFN (https://github.com/Unbabel/COMET/blob/master/comet/models/regression/regression_metric.py#L95), which ends in a linear layer (COMET/comet/modules/feedforward.py, line 58 in 85b0c8f). Is the theoretical range therefore (-inf, inf) (https://pytorch.org/docs/1.9.1/generated/torch.nn.Linear.html)?
Is there a table of practical ranges that the COMET owners/contributors/users have found for varying languages and lengths?
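A hedged illustration only (the layer sizes and activation below are made up, not COMET's actual configuration): if the estimator head ends in a plain nn.Linear with no squashing activation, nothing bounds the output, so the theoretical range is indeed (-inf, inf).
import torch
import torch.nn as nn

# Illustrative head only; the real sizes/activations live in comet/modules/feedforward.py.
head = nn.Sequential(nn.Linear(1024, 512), nn.Tanh(), nn.Linear(512, 1))
print(head(torch.randn(2, 1024)))  # arbitrary real values, not restricted to [0, 1]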
I would like to use this library for COMET scores, but I want to host xlm-roberta-large elsewhere, e.g. on HuggingFace. How can we enable this in the library? Would modifying xlmr.py be enough to call an XLM model that is deployed to a remote GPU server?
I would like to place the computationally intensive encoder of the model on a GPU server (for faster batch inference) that is shared by multiple COMET scorers, but possibly also by other applications that benefit from xlm-roberta-large.
Use a dependency manager such as Poetry to put an end to problems with requirements.
Hi,
Recently I have been reimplementing the results in your COMET EMNLP'20 paper. I also carefully referred to the documentation for more details. However, when reimplementing experiments over the wmt-metrics data, I found something unexpected. Here are my preparation steps:
Create and activate a conda environment, then install COMET:
conda create -n comet python=3.8
conda activate comet
pip install unbabel-comet
Download the wmt-metrics data via:
comet download -d wmt-metrics --saving_path data/wmt-metrics/
After these steps I continue the implementation with the released model on en-de, so I first split the test19-relative-ranking.csv file into multiple files, storing source, reference, positive_hypothesis, and negative_hypothesis line by line. Here is the content of the Python source file script_language_filter.py:
from argparse import ArgumentParser
from csv import reader
parser = ArgumentParser()
parser.add_argument('--input_file', type=str, required=True)
parser.add_argument('--language', type=str, required=True)
parser.add_argument('--output_src', type=str, required=True)
parser.add_argument('--output_ref', type=str, required=True)
parser.add_argument('--output_pos', type=str, required=True)
parser.add_argument('--output_neg', type=str, required=True)
args = parser.parse_args()
def main():
    with open(args.input_file, mode='r', encoding='utf-8') as f1, \
            open(args.output_src, mode='w', encoding='utf-8') as f2, \
            open(args.output_ref, mode='w', encoding='utf-8') as f3, \
            open(args.output_pos, mode='w', encoding='utf-8') as f4, \
            open(args.output_neg, mode='w', encoding='utf-8') as f5:
        csv_reader = reader(f1)
        next(csv_reader)  # skip the header line
        for _, row in enumerate(csv_reader):
            # csv_file columns:
            # data, lp, src, ref, pos, neg, pos.model, neg.model, bestmodel
            # indexes of interest: 1, 2, 3, 4, 5
            if row[1] != args.language:
                continue
            f2.write(row[2].strip() + '\n')
            f3.write(row[3].strip() + '\n')
            f4.write(row[4].strip() + '\n')
            f5.write(row[5].strip() + '\n')
    return

if __name__ == '__main__':
    main()
Then, run this command:
python script_language_filter.py \
--input_file test19-relative-ranks.csv \
--language "de-en" \
--output_src test19-relative-ranks.src \
--output_ref test19-relative-ranks.ref \
--output_pos test19-relative-ranks.pos \
--output_neg test19-relative-ranks.neg
After this step, I get 4 more files, test19-relative-ranks.{src,ref,pos,neg}, each containing 17,073 lines.
For positive_hypothesis-reference:
comet score -s test19-relative-ranks.src \
-h test19-relative-ranks.pos \
-r test19-relative-ranks.ref \
--batch_size 16 \
--to_json test19-relative-ranks.pos.json \
--model emnlp-base-da-ranker
For negative_hypothesis-reference:
comet score -s test19-relative-ranks.src \
-h test19-relative-ranks.neg \
-r test19-relative-ranks.ref \
--batch_size 16 \
--to_json test19-relative-ranks.neg.json \
--model emnlp-base-da-ranker
Then I get two files storing the predicted scores, test19-relative-ranks.{pos,neg}.json. Here is the content of script_compute_rr.py, which computes the WMT Kendall tau from them:
from argparse import ArgumentParser
from json import load
parser = ArgumentParser()
parser.add_argument('--pos_json', type=str, required=True)
parser.add_argument('--neg_json', type=str, required=True)
args = parser.parse_args()
def main():
    with open(args.pos_json, mode='r', encoding='utf-8') as f1, \
            open(args.neg_json, mode='r', encoding='utf-8') as f2:
        pos_data = load(f1)
        neg_data = load(f2)
    concor = 0
    discor = 0
    for pos, neg in zip(pos_data, neg_data):
        if pos['predicted_score'] > neg['predicted_score']:
            concor += 1
        else:
            discor += 1
    print('%d items in total. Concor: %d, Discor: %d, WMTKendall: %f'
          % (concor + discor, concor, discor, (concor - discor) / (concor + discor)))
    return

if __name__ == '__main__':
    main()
And I run:
python script_compute_rr.py \
--pos_json test19-relative-ranks.pos.json \
--neg_json test19-relative-ranks.neg.json
The results are:
17073 items in total. Concor: 11244, Discor: 5829, WMTKendall: 0.317167
So here I find my result is much higher than the reported result of 0.202 in your paper (column de-en, row COMET-RANK in Table 2). I'm not sure which step I got wrong. Besides, I also want to know whether the model tagged emnlp-base-da-ranker is exactly the trained model corresponding to the reported results in Table 2 of your paper.
Could you answer these questions for me? Many thanks!
I tried to download the model using:
model = download_model("wmt-large-da-estimator-1719")
But I get the following error:
'''
AttributeError Traceback (most recent call last)
in
----> 4 model = download_model("wmt-large-da-estimator-1719")
7 frames
/proj/tools/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in __setattr__(self, name, value)
817 buffers[name] = value
818 else:
--> 819 object.__setattr__(self, name, value)
820
821 def __delattr__(self, name):
AttributeError: can't set attribute
'''
When installing either via pip install unbabel-comet or directly with pip install -r requirements.txt, I get an error: error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Running setup.py install for fastBPE ... error
ERROR: Command errored out with exit status 1:
command: /home/ubuntu/cometenv/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-eeqbscuj/fastbpe_45baab04cd16456fb32b018392790726/setup.py'"'"'; __file__='"'"'/tmp/pip-install-eeqbscuj/fastbpe_45baab04cd16456fb32b018392790726/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-hfrq684c/install-record.txt --single-version-externally-managed --compile --install-headers /home/ubuntu/cometenv/include/site/python3.7/fastBPE
cwd: /tmp/pip-install-eeqbscuj/fastbpe_45baab04cd16456fb32b018392790726/
Complete output (15 lines):
running install
running build
running build_py
package init file 'fastBPE/__init__.py' not found (or not a regular file)
running build_ext
building 'fastBPE' extension
creating build
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/fastBPE
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -IfastBPE -I/usr/include/python3.7m -I/home/ubuntu/cometenv/include/python3.7m -c fastBPE/fastBPE.cpp -o build/temp.linux-x86_64-3.7/fastBPE/fastBPE.o -std=c++11 -Ofast -pthread
fastBPE/fastBPE.cpp:28:10: fatal error: Python.h: No such file or directory
#include "Python.h"
^~~~~~~~~~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /home/ubuntu/cometenv/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-eeqbscuj/fastbpe_45baab04cd16456fb32b018392790726/setup.py'"'"'; __file__='"'"'/tmp/pip-install-eeqbscuj/fastbpe_45baab04cd16456fb32b018392790726/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-hfrq684c/install-record.txt --single-version-externally-managed --compile --install-headers /home/ubuntu/cometenv/include/site/python3.7/fastBPE Check the logs for full command output.
(Ubuntu 18.04) Version 34.0
Python 3.7
When I run "comet-score -s test.en-zh.en -t decoder-out -r test.en-zh.zh", I get the following warnings. Is that normal, or am I missing something?
/root/.cache/torch/unbabel_comet/wmt20-comet-da//checkpoints/model.ckpt
Some weights of the model checkpoint at xlm-roberta-large were not used when initializing XLMRobertaModel: ['lm_head.bias', 'roberta.pooler.dense.weight', 'roberta.pooler.dense.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight']
This IS expected if you are initializing XLMRobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing XLMRobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Encoder model frozen.
/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/container.py:435: UserWarning: Setting attributes on ParameterList is not supported.
warnings.warn("Setting attributes on ParameterList is not supported.")
GPU available: True, used: True
comet-score should be able to print a shortened score, as an average of all segment scores, when passed a particular flag (maybe something like --quiet). To the best of my knowledge, this does not seem possible currently (though of course, I could be wrong as I am new to this package).
Currently, comet-score prints a line-by-line score for each segment. This can be quite an overkill, especially if one is only interested in the score for the whole test set (which is currently calculated as an average of the segment scores). Displaying only the average would be useful in these cases.
This is the current output when I run comet-score from the CLI:
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)>
error when running comet-score on Python >3.7 on macOS.
Install Python via Homebrew, create a virtual environment with comet, and then run comet-score.
OS: macOS Mojave 10.14.6
It seems that, for some reason, Brew has not run the Install Certificates.command that comes in the Python3 bundle for Mac.
Hi,
I'm using version 1.0.0rc6 and scoring within Python: seg_scores, sys_score = model.predict(data, gpus=1).
However, I cannot disable the progress bar using show_progress=False. It would be nice to have this option.
Thanks!
Sometimes we only have translations with their references and no source. But the default COMET expects something like:
{"src": src, "mt": hyp, "ref": ref}
or QE comet
{"src": src, "mt": hyp}
Is there a way to let comet take
{"mt": hyp, "ref": ref}
Would this be a feasible approach?
{"src": ref, "mt": hyp, "ref": ref}
If I have apex installed, this library throws ImportError: cannot import name 'container_abcs' from 'torch._six'.
When I try to run this example from the readme:
echo -e "Dem Feuer konnte Einhalt geboten werden\nSchulen und Kindergärten wurden eröffnet." >> src.de
echo -e "The fire could be stopped\nSchools and kindergartens were open" >> hyp1.en
echo -e "The fire could have been stopped\nSchools and pre-school were open" >> hyp2.en
echo -e "They were able to control the fire.\nSchools and kindergartens opened" >> ref.en
comet-score -s src.de -t hyp1.en -r ref.en
...I get: ImportError: cannot import name 'container_abcs' from 'torch._six', but if I uninstall apex, comet works again.
Torch versions:
Traceback (most recent call last):
File "/home/scarrion/anaconda3/envs/mltests/bin/comet-score", line 5, in <module>
from comet.cli.score import score_command
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/comet/__init__.py", line 19, in <module>
from .download_utils import download_model
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/comet/download_utils.py", line 26, in <module>
from comet.models import available_metrics
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/comet/models/__init__.py", line 17, in <module>
from .regression.regression_metric import RegressionMetric
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/comet/models/regression/regression_metric.py", line 26, in <module>
from comet.models.base import CometModel
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/comet/models/base.py", line 29, in <module>
import pytorch_lightning as ptl
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
from pytorch_lightning import metrics # noqa: E402
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
from pytorch_lightning.metrics.classification import ( # noqa: F401
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
from pytorch_lightning.metrics.classification.accuracy import Accuracy # noqa: F401
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 18, in <module>
from pytorch_lightning.metrics.utils import deprecated_metrics
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/pytorch_lightning/metrics/utils.py", line 29, in <module>
from pytorch_lightning.utilities import rank_zero_deprecation
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
from pytorch_lightning.utilities.apply_func import move_data_to_device # noqa: F401
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 26, in <module>
from pytorch_lightning.utilities.imports import _compare_version, _TORCHTEXT_AVAILABLE
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/pytorch_lightning/utilities/imports.py", line 73, in <module>
_APEX_AVAILABLE = _module_available("apex.amp")
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/pytorch_lightning/utilities/imports.py", line 36, in _module_available
return find_spec(module_path) is not None
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/importlib/util.py", line 94, in find_spec
parent = __import__(parent_name, fromlist=['__path__'])
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/apex/__init__.py", line 8, in <module>
from . import amp
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/apex/amp/__init__.py", line 1, in <module>
from .amp import init, half_function, float_function, promote_function,\
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/apex/amp/amp.py", line 1, in <module>
from . import compat, rnn_compat, utils, wrap
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/apex/amp/rnn_compat.py", line 1, in <module>
from . import utils, wrap
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/apex/amp/wrap.py", line 3, in <module>
from ._amp_state import _amp_state
File "/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/apex/amp/_amp_state.py", line 14, in <module>
from torch._six import container_abcs
ImportError: cannot import name 'container_abcs' from 'torch._six' (/home/scarrion/anaconda3/envs/mltests/lib/python3.8/site-packages/torch/_six.py)
OS: Ubuntu 20.04
Packaging: pip
Version: 1.0.1
Hi,
I am using your tool in my Python pipelines, but I have a problem: your requirements are too strict and I get conflicts with many tools. Could you please re-investigate whether you must pin all packages to exact versions?
Additionally, are you planning to move to transformers 3?
Thank you,
TK
Hello,
In my ongoing evaluation of metrics, I have found that COMET (especially the source-based variant) is very unpredictable when it evaluates a language that is not supported by the XLM model. This can easily happen because there is no list of supported languages for COMET on your git page, and users would need to trace this back to the XLM paper/git.
Would it be possible to add a check that the language is supported (and thus ask for the language code when evaluating)?
Here I share my findings with you. Here is a graph for source-based COMET (QE): on the Y-axis are human deltas and on the X-axis are COMET deltas. The green dots around COMET delta 0 (it isn't exactly 0) are for language pairs where one of the languages is not supported by XLM. You can see that humans did find a difference for those languages, but COMET was chaotic (other metrics don't have this problem).
Hi Ricardo, thank you so much for your previous answers.
I have a follow-up question about training COMET myself without using any source sentences.
So the input to the system would be translated sentences, reference sentences, and the ratings of the translations.
Is there a way to do this in the current code base? Thank you in advance.
Hi Ricardo, is it possible for me to train from your checkpoint wmt20-comet-da? Currently, I saw that "wmt20-comet-da" contains only the model.
I understand COMET returns a single score for each sentence it evaluates. I was wondering if there is any way to report a corpus level metric, and what would one be? Similar to how BLEU is reported.
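Not an official answer, but as another report above notes, the system-level score printed by comet-score is currently just the average of the segment scores, so a BLEU-like corpus-level number can be obtained by averaging the model.predict() output; a trivial sketch:
def corpus_level_score(seg_scores):
    # Mean of per-segment COMET scores, mirroring how the CLI reports a system score.
    return sum(seg_scores) / len(seg_scores)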
Hello! I've tried to train a COMET model using my own data! I want to train using HTER as a metric; I used the configuration that's present in the repo: https://github.com/Unbabel/COMET/blob/master/configs/xlmr/base/hter-estimator.yaml
Python 3.6.9
python3 -m venv comet
pip install unbabel-comet
comet train -f config.yml
Where config.yml is the configuration I mentioned above with alterations to the training data path.
It does not seem to be an issue with the data as I have the correct column names and the model did train through the 2 epochs that were established in the configuration file.
A trained model that could be loaded via Python.
Here's the output from my logs.
Epoch 2: 100%|██████████| 25000/25000 [1:16:17<00:00, 5.46it/s, loss=0.056, v_num=4-54, pearson=0.924, kendall=0.81, spearman=0.946, avg_loss=0.0621]
Traceback (most recent call last):
File "/home/ubuntu/comet/bin/comet", line 33, in <module>
sys.exit(load_entry_point('unbabel-comet==0.0.6', 'console_scripts', 'comet')())
File "/home/ubuntu/comet/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/ubuntu/comet/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/ubuntu/comet/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ubuntu/comet/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ubuntu/comet/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/ubuntu/comet/lib/python3.6/site-packages/comet/cli.py", line 63, in train
trainer.fit(model)
File "/home/ubuntu/comet/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 453, in fit
self.call_hook('on_fit_end')
File "/home/ubuntu/comet/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 835, in call_hook
trainer_hook(*args, **kwargs)
File "/home/ubuntu/comet/lib/python3.6/site-packages/pytorch_lightning/trainer/callback_hook.py", line 57, in on_fit_end
callback.on_fit_end(self, self.get_model())
File "/home/ubuntu/comet/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py", line 35, in wrapped_fn
return fn(*args, **kwargs)
TypeError: on_fit_end() takes 2 positional arguments but 3 were given
OS: Linux
Packaging: pip
Version: latest
Thank you for your time!
Cumprimentos,
Jose :-)
Is comet download still a supported command? If not, what is the best way to download the data needed to reproduce the results?
$ comet download --help
=> command not found: comet
$ comet-download --help
=> command not found: comet-download
I tried running comet download as specified in data/README.md, and I get the error command not found: comet. However, comet-score and comet-compare work, so I know that I did install it. I also tried comet-download and still get the same error.
I'm using wmt20-comet-qe-da for MT quality estimation. I wanted to use an HTER-based QE model for better interpretability. Is that model supported yet? If not, can you guide me a bit on how to interpret DA system scores? E.g., I have a source and an MT output and I get a DA score; what is the threshold score for it to be considered good or bad?
It would be really nice if COMET could read input from STDIN, e.g.,
# three fields triggers comet-ref
$ paste source.txt hyps.txt ref.txt | comet [args]
# two fields -> comet-src
$ paste source.txt hyps.txt | comet [args]
This is consistent with standard UNIX usage. It is also slightly less cumbersome, and allows comet to be used in settings without writing files to disk.
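A hedged sketch of how the proposed STDIN handling could look (this is only the suggestion above, not an existing COMET feature; the function name is made up):
import sys

def read_paste_input(stream=sys.stdin):
    # Three tab-separated columns -> src/mt/ref; two columns -> src/mt (source-only, QE-style).
    samples = []
    for line in stream:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3:
            samples.append({"src": fields[0], "mt": fields[1], "ref": fields[2]})
        elif len(fields) == 2:
            samples.append({"src": fields[0], "mt": fields[1]})
        else:
            raise ValueError("expected 2 or 3 tab-separated fields, got %d" % len(fields))
    return samples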
Hi, I found that the HTER models are off the download list of the current codes.
https://github.com/Unbabel/COMET/blob/master/comet/models/__init__.py
I wonder whether they are still supported in the current version.
I used version 1.0.0rc9, and it reports this:
"Exception: wmt-large-hter-estimator is not in the availale_metrics
or is a valid checkpoint folder."
Is that normal or should I use the previous version?
Thanks.
Running pip install -r requirements.txt with the proposed Python 3.6 results in RuntimeError: Python version >= 3.7 required when trying to install the fairseq module. Installation runs smoothly with 3.7.
Full error trace below:
Collecting fairseq==0.9.0 (from -r requirements.txt (line 7))
Cache entry deserialization failed, entry ignored
Cache entry deserialization failed, entry ignored
Downloading https://files.pythonhosted.org/packages/67/bf/de299e082e7af010d35162cb9a185dc6c17db71624590f2f379aeb2519ff/fairseq-0.9.0.tar.gz (306kB)
100% |████████████████████████████████| 307kB 2.0MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 154, in save_modules
yield saved
File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 195, in setup_context
yield
File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 250, in run_setup
_execfile(setup_script, ns)
File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 45, in _execfile
exec(code, globals, locals)
File "/tmp/easy_install-s58jkscl/numpy-1.20.1/setup.py", line 30, in <module>
self.__include_dirs = []
RuntimeError: Python version >= 3.7 required.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-5bklbiad/fairseq/setup.py", line 161, in <module>
zip_safe=False,
File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 128, in setup
_install_setup_requires(attrs)
File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 123, in _install_setup_requires
dist.fetch_build_eggs(dist.setup_requires)
File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 513, in fetch_build_eggs
replace_conflicting=True,
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 774, in resolve
replace_conflicting=replace_conflicting
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1057, in best_match
return self.obtain(req, installer)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1069, in obtain
return installer(requirement)
File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 580, in fetch_build_egg
return cmd.easy_install(req)
File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 698, in easy_install
return self.install_item(spec, dist.location, tmpdir, deps)
File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 724, in install_item
dists = self.install_eggs(spec, download, tmpdir)
File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 909, in install_eggs
return self.build_and_install(setup_script, setup_base)
File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1177, in build_and_install
self.run_setup(setup_script, setup_base, args)
File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1163, in run_setup
run_setup(setup_script, args)
File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 253, in run_setup
raise
File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 195, in setup_context
yield
File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 166, in save_modules
saved_exc.resume()
File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 141, in resume
six.reraise(type, exc, self._tb)
File "/usr/lib/python3/dist-packages/setuptools/_vendor/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 154, in save_modules
yield saved
File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 195, in setup_context
yield
File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 250, in run_setup
_execfile(setup_script, ns)
File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 45, in _execfile
exec(code, globals, locals)
File "/tmp/easy_install-s58jkscl/numpy-1.20.1/setup.py", line 30, in <module>
self.__include_dirs = []
RuntimeError: Python version >= 3.7 required.
Hi Ricardo, I read all your code implementations. From my understanding, when I use comet-train to train my own evaluation metric, I will not load any pretrained COMET weights, except for the initialization of the XLM-RoBERTa weights. I just want to double-check that this is correct. Currently, I set "resume_from_checkpoint" in the trainer.yaml in the config to "null".
Thank you in advance! This is a great work.
I'm running
for i in *.output.txt; do
comet-score -s src.txt -r newstest2021.en-de.ref.ref-A.de -t $i >$i.comet
done
and the progress bar is going into $i.comet.
comet-score -s src.txt -r ref.txt -t hyp.txt >output
less output
It would be better to:
less output does this:
^MPredicting: 0it [00:00, ?it/s]^MPredicting: 1%| | 1/126 [00:01<02:30, 1.20s/it]^MPredicting: 2%|▏ | 2/126 [00:01<01:30, 1.37it/s]^MPredicting: 2%|▏ | 3/126 [00:01<01:13, 1.67it/s] ... ^MPredicting: 99%|█████████▉| 125/126 [00:44<00:00, 2.80it/s]^MPredicting: 100%|██████████| 126/126 [00:44<00:00, 2.81it/s]^MPredicting: 100%|██████████| 126/126 [00:45<00:00, 2.79it/s]
OS: Ubuntu 20.04 x86_64
Packaging: pip
Version: unbabel-comet==1.0.0rc2 (installed via pip install)
After installation, whether via pip, poetry, or direct usage (./comet/cli/score.py), I get the following error:
$ PYTHONPATH=. python3 ./comet/cli/score.py
Traceback (most recent call last):
File "./comet/cli/score.py", line 56, in <module>
from comet.download_utils import download_model
File "/home/mattpost/src/COMET/comet/__init__.py", line 19, in <module>
from .download_utils import download_model
File "/home/mattpost/src/COMET/comet/download_utils.py", line 26, in <module>
from comet.models import available_metrics
File "/home/mattpost/src/COMET/comet/models/__init__.py", line 17, in <module>
from .regression.regression_metric import RegressionMetric
File "/home/mattpost/src/COMET/comet/models/regression/regression_metric.py", line 26, in <module>
from comet.models.base import CometModel
File "/home/mattpost/src/COMET/comet/models/base.py", line 41, in <module>
class OrderedSampler(Sampler[int]):
TypeError: 'type' object is not subscriptable
This is with Python 3.8.10, so relatively recent.
OS: Linux (ubuntu 20.04.3)
Packaging: all
Version: 1.0.1 (latest source)
Hello,
Our system is being validated against both "wmt-large-da-estimator-1719" and "wmt-large-hter-estimator" estimators with the same translations dataset, of course (70k+ translations).
The two estimators give completely opposite results.
The "da" estimator is placing our MT system in "...the bottom 25%" while the "HTER" estimator returns a "top 25%" score.
I know this is not a technical issue, but can you please provide some additional information on how we might be able to interpret those types of results?
Thank you very much
Using comet-compare, I noticed that the exact same outputs of two different systems receive different scores, although the difference is almost negligible.
But, although negligibly different, these scores are not considered a tie, and hence there is an impact on the number of wins/losses reported.
"ties (%)": 0.0,
"x_wins (%)": 1.0,
"y_wins (%)": 0.0
{
"src": "NedΓ‘vno prohrΓ‘l s Raonicem v Brisbane Open.",
"system_x": {
"mt": "He recently lost to Raonic at the Brisbane Open.",
"score": 0.8726277947425842
},
"system_y": {
"mt": "He recently lost to Raonic at the Brisbane Open.",
"score": 0.872564971446991
},
"ref": "He recently lost against Raonic in the Brisbane Open."
},
comet-compare -s SRC -r REF -x SysX -y SysY --to_json JJJ
Actually, I am directly using the function compare_command() included in cli/compare.py.
Either an identical score for identical outputs, or a bit more flexible counts of wins/losses/ties
OS: ubuntu
Packaging: pip3
Version: unbabel-comet==1.0.0rc8
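A hedged sketch of the "more flexible ties" idea from this report: treat score differences below a small tolerance as ties when counting wins/losses (the epsilon is arbitrary and only illustrative):
def win_loss_tie_counts(scores_x, scores_y, eps=1e-3):
    wins = losses = ties = 0
    for x, y in zip(scores_x, scores_y):
        if abs(x - y) < eps:
            ties += 1  # negligible difference counted as a tie
        elif x > y:
            wins += 1
        else:
            losses += 1
    return wins, losses, ties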
There are several issues with refless models:
the default refless model "wmt20-comet-qe-da" is not included in _REFLESS_MODELS;
Line 43 in 6009f67
even if added, the tool fails here
Line 118 in 6009f67
the tool outputs the ref even if the model is refless; hence, it raises an error;
Line 145 in 6009f67
Actually, I solved all the bugs in my local code.
OS: ubuntu
Packaging: pip3
Version: unbabel-comet==1.0.0rc8
I've installed COMET on two different machines, (Python 3.8.10, Ubuntu, running on GPU, and 3.9.10, MacOS, running on CPU) and am getting vastly different results on each machine for the same model and the same texts. Clearly, they can't both be correct.
I'm wondering whether there are any benchmark results that I can compare to, for example the simple examples in the installation instructions, so I can try to figure out what's going on and validate the installation. Also, I'm wondering whether the fact that I'm running COMET on Chinese characters might have something to do with the different results. Are there any benchmark results for EN<>CN?
Thanks.