Can you share a bit more detail? For example:
- the exact command you are running
- samples (I don't need the entire test set, just a few samples that produce different outputs)
- the output being produced
Small differences on the same samples are possible (e.g., 0.3456 vs 0.3462), but anything larger means something is wrong.
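The tolerance described above can be checked mechanically when comparing two runs. This is a minimal sketch, assuming the per-segment output format shown later in this thread (`mt.txt Segment 0 score: 0.4283`); the helper names and the 0.01 threshold are illustrative assumptions, not part of COMET.

```python
from __future__ import annotations

import re

# Matches per-segment lines such as "mt.txt Segment 0 score: 0.4283";
# the optional minus sign covers models that may produce negative scores.
SEGMENT_RE = re.compile(r"Segment (\d+) score: (-?\d+(?:\.\d+)?)")


def parse_segment_scores(output: str) -> dict[int, float]:
    """Map segment index to score, ignoring the system-level "mt.txt score:" line."""
    scores = {}
    for line in output.splitlines():
        match = SEGMENT_RE.search(line)
        if match:
            scores[int(match.group(1))] = float(match.group(2))
    return scores


def diverging_segments(run_a: str, run_b: str, tolerance: float = 0.01) -> list[tuple[int, float, float]]:
    """Return (index, score_a, score_b) for segments in both runs that differ by more than `tolerance`."""
    a = parse_segment_scores(run_a)
    b = parse_segment_scores(run_b)
    return [
        (i, a[i], b[i])
        for i in sorted(a.keys() & b.keys())
        if abs(a[i] - b[i]) > tolerance
    ]
```

With this threshold, a pair like 0.3456 vs 0.3462 is not flagged, while 0.4283 vs 0.2642 is.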
Thanks for the quick response. The command I'm running follows the examples in the README, e.g.:
comet-score -s SRCA.txt -t HTA1_MT_EN-US_-ZH-CN.txt --model wmt20-comet-qe-da
SRCA.txt is a plain text file in English. HTA1_MT_EN-US-_ZH-CN.txt is a translation in Chinese.
For the first segment (sentence) I get two different scores, 0.2642 and 0.4263, so they are fairly far apart and this is not a rounding issue.
This segment/sentence in SRCA is "From the factory floor to air-to-air combat, artificial intelligence will soon replace humans not only in jobs that involve basic, repetitive tasks but advanced analytical and decision-making skills."
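The target text is "不久之后,人工智能将在许多工作中取代人类,从工厂车间到空对空作战,不仅基本和重复性的工作将由人工智能进行,需要高级分析和决策技能的工作亦是如此。" [English: "Soon, artificial intelligence will replace humans in many jobs, from the factory floor to air-to-air combat: not only basic, repetitive work but also work requiring advanced analytical and decision-making skills will be done by AI."] (I don't speak or read Chinese, so I don't have a clue what it says.)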
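Score differences between machines can also come from the inputs rather than the environment. Assuming, as the README examples suggest, that line i of the `-s` file is paired with line i of the `-t` file, a quick sanity check for alignment and stray whitespace (including Windows `\r` characters) might look like this. The function names are hypothetical helpers, not part of COMET.

```python
from __future__ import annotations

from pathlib import Path


def split_segments(text: str) -> list[str]:
    """Split on "\n" only, so stray "\r" characters stay visible in each segment."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a final newline ends the last segment; it is not a segment itself
    return lines


def check_parallel(src_text: str, mt_text: str) -> list[str]:
    """Return warnings about line alignment and whitespace in the two inputs."""
    src, mt = split_segments(src_text), split_segments(mt_text)
    warnings = []
    if len(src) != len(mt):
        warnings.append(f"line count mismatch: {len(src)} source vs {len(mt)} translation")
    for name, lines in (("source", src), ("translation", mt)):
        for i, line in enumerate(lines):
            if not line.strip():
                warnings.append(f"{name} line {i}: empty segment")
            elif line != line.strip():
                warnings.append(f"{name} line {i}: leading/trailing whitespace or carriage return")
    return warnings


def check_parallel_files(src_path: str, mt_path: str) -> list[str]:
    # read_bytes + decode avoids text-mode newline translation, which would hide "\r\n"
    return check_parallel(
        Path(src_path).read_bytes().decode("utf-8"),
        Path(mt_path).read_bytes().decode("utf-8"),
    )
```

Running `check_parallel_files("SRCA.txt", "HTA1_MT_EN-US_-ZH-CN.txt")` on each machine and getting the same (ideally empty) list of warnings suggests both machines are scoring the same segments.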
The output on my computer was:
mt.txt Segment 0 score: 0.4283
mt.txt score: 0.4283
On a completely different machine, using a GPU, I get the same output:
mt.txt Segment 0 score: 0.4283
mt.txt score: 0.4283
I don't know where the 0.2642 score comes from; I can't replicate it with your example.
Thanks, yes, that's what I'm getting on my Mac, too. I get the other score (0.2642) on my Linux machine, where I got error messages while installing comet because of package incompatibilities. I had to wipe those packages and reinstall. This makes me think some unresolved incompatibility remains there and is producing inconsistent scores, which is why I was asking for a benchmark to decide which results to trust. Since I'm getting 0.4283 (or 0.4263 with a newline), that settles it. Thanks.
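One way to test the suspicion of a residual package incompatibility is to record the installed versions of COMET and its main dependencies on each machine and diff them. This is a minimal sketch; the package list is an assumption (`unbabel-comet` is the PyPI name of COMET, the others are common dependencies of it), and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
import platform
from typing import Dict, Optional

# Assumed set of packages worth comparing; adjust to match your install.
PACKAGES = ["unbabel-comet", "torch", "transformers", "pytorch-lightning", "numpy"]


def environment_report(packages=PACKAGES) -> Dict[str, Optional[str]]:
    """Collect the Python version, OS string, and installed package versions."""
    report: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


def diff_reports(a: dict, b: dict) -> dict:
    """Return {key: (value_a, value_b)} for every entry that differs between two reports."""
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}
```

A typical workflow is to dump `environment_report()` to JSON on both the Mac and the Linux machine, load both files on one of them, and pass the two dicts to `diff_reports`. Any version mismatch is a candidate explanation for the diverging scores.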