
pycocoevalcap's Introduction

Microsoft COCO Caption Evaluation

Evaluation codes for MS COCO caption generation.

Description

This repository provides Python 3 support for the caption evaluation metrics used for the MS COCO dataset.

The code is derived from the original repository that supports Python 2.7: https://github.com/tylin/coco-caption.
Caption evaluation depends on the COCO API, which natively supports Python 3.

Requirements

  • Java 1.8.0
  • Python 3

Installation

To install pycocoevalcap and the pycocotools dependency (https://github.com/cocodataset/cocoapi), run:

pip install pycocoevalcap

Usage

See the example script: example/coco_eval_example.py
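For reference, a minimal sketch of how an evaluation run typically looks (it mirrors the bundled example; the annotation and result file names below are placeholders for your own files):

from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# Placeholder paths: a COCO-format annotation file and a results file
# containing [{"image_id": ..., "caption": ...}, ...] entries.
annotation_file = 'captions_val2014.json'
results_file = 'captions_val2014_results.json'

coco = COCO(annotation_file)              # ground-truth captions
coco_result = coco.loadRes(results_file)  # generated captions

coco_eval = COCOEvalCap(coco, coco_result)
# Evaluate only on the images present in the results file.
coco_eval.params['image_id'] = coco_result.getImgIds()
coco_eval.evaluate()

# Aggregated scores (Bleu_1..4, METEOR, ROUGE_L, CIDEr, SPICE).
for metric, score in coco_eval.eval.items():
    print(f'{metric}: {score:.3f}')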

Files

./

  • eval.py: Contains the COCOEvalCap class that can be used to evaluate results on COCO.
  • tokenizer: Python wrapper of Stanford CoreNLP PTBTokenizer
  • bleu: Bleu evaluation codes
  • meteor: Meteor evaluation codes
  • rouge: Rouge-L evaluation codes
  • cider: CIDEr evaluation codes
  • spice: SPICE evaluation codes

Setup

  • SPICE requires the download of Stanford CoreNLP 3.6.0 code and models. This will be done automatically the first time the SPICE evaluation is performed.
  • Note: SPICE will try to create a cache of parsed sentences in ./spice/cache/. This dramatically speeds up repeated evaluations. The cache directory can be moved by setting 'CACHE_DIR' in ./spice. In the same file, caching can be turned off by removing the '-cache' argument to 'spice_cmd'.
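A rough illustration of those two knobs, using only the names given in the note above (CACHE_DIR and the '-cache' argument of spice_cmd); the actual command assembled in ./spice/spice.py includes additional arguments and may differ in your installed copy:

# Illustrative sketch only; mirrors the settings described in the note above.
CACHE_DIR = 'cache'   # point this at another directory to move the parse cache

# spice_cmd is the Java invocation built by the SPICE wrapper;
# removing '-cache' (and its value) turns caching off.
spice_cmd = ['java', '-jar', '-Xmx8G', 'spice-1.0.jar', 'input.json',
             '-cache', CACHE_DIR]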

References

Developers

  • Xinlei Chen (CMU)
  • Hao Fang (University of Washington)
  • Tsung-Yi Lin (Cornell)
  • Ramakrishna Vedantam (Virginia Tech)

Acknowledgement

  • David Chiang (University of Notre Dame)
  • Michael Denkowski (CMU)
  • Alexander Rush (Harvard University)

pycocoevalcap's People

Contributors

elliottd, endernewton, hao-fang, iaalm, j-min, nicolas-lair, peteanderson80, ramakrishnavedantam928, salaniz, tylin, vrama91


pycocoevalcap's Issues

CIDEr score is 0 while all other metrics are normal

I'm currently using the pycocoevalcap package to evaluate the performance of my image captioning model. I've noticed that the CIDEr score is consistently 0 for all of my model's generated captions, while all other metrics (BLEU, METEOR, SPICE and ROUGE) are normal.

I have tried to run the evaluation on each image separately, but the situation remains the same. The CIDEr score is always 0.

I'm not sure what could be causing this issue, as the other metrics seem to be working correctly. Can anyone help me figure out why the CIDEr score is not being computed correctly?

Thanks in advance for your help!
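For anyone debugging the same symptom, one way to isolate the problem is to call the CIDEr scorer directly on already-tokenized captions (the ids and sentences below are made up). Note that CIDEr's IDF weights are derived from the reference set passed in, so scoring a single image or a very small batch at a time can drive the scores to zero even when nothing else is wrong:

from pycocoevalcap.cider.cider import Cider

# Hypothetical, pre-tokenized captions keyed by image id.
gts = {
    'img1': ['a cat sits on a mat', 'a cat is sitting on a mat'],
    'img2': ['a dog runs in a park', 'a dog is running through a park'],
}
res = {
    'img1': ['a cat sits on the mat'],
    'img2': ['a dog runs in the park'],
}

score, scores = Cider().compute_score(gts, res)
print('CIDEr (corpus):', score)
print('CIDEr (per image):', scores)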

OSError: [Errno 22] Invalid argument

I use Python 3.6 and TensorFlow 1.4.0 on Windows 10, and I put pycocoevalcap into this project: https://github.com/yunjey/show-attend-and-tell. I got this error:

 File "D:/AIBU/image_caption/showAttendTell/train.py", line 22, in main
    solver.train()
  File "D:\AIBU\image_caption\showAttendTell\core\solver.py", line 172, in train
    scores = evaluate(data_path='./data', split='val', get_scores=True)  
  File "D:\AIBU\image_caption\showAttendTell\core\bleu.py", line 47, in evaluate
    final_scores = score(ref, hypo)
  File "D:\AIBU\image_caption\showAttendTell\core\bleu.py", line 21, in score
    score,scores = scorer.compute_score(ref,hypo)
  File "D:\AIBU\image_caption\showAttendTell\pycocoevalcap\meteor\meteor.py", line 38, in compute_score
    stat = self._stat(res[i][0], gts[i])
  File "D:\AIBU\image_caption\showAttendTell\pycocoevalcap\meteor\meteor.py", line 58, in _stat
    self.meteor_p.stdin.flush()
OSError: [Errno 22] Invalid argument

How can I deal with it?

meteor.py

Excuse me, did you modify spice.py?

computing METEOR fails

Hi! when running the example script, I get the following error:

(venv2) *******% python coco_eval_example.py
loading annotations into memory...
Done (t=0.22s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.19s)
creating index...
index created!
tokenization...
PTBTokenizer tokenized 2492309 tokens at 1957824,35 tokens per second.
PTBTokenizer tokenized 381324 tokens at 1401808,46 tokens per second.
setting up scorers...
Downloading stanford-corenlp-3.6.0 for SPICE ...
Progress: 384.5M / 384.5M (100.0%)
Extracting stanford-corenlp-3.6.0 ...
Done.
computing Bleu score...
{'testlen': 313500, 'reflen': 368039, 'guess': [313500, 272996, 232492, 191988], 'correct': [153357, 45146, 12441, 3457]}
ratio: 0.851811900369252
Bleu_1: 0.411
Bleu_2: 0.239
Bleu_3: 0.137
Bleu_4: 0.079
computing METEOR score...
Traceback (most recent call last):
  File "coco_eval_example.py", line 22, in <module>
    coco_eval.evaluate()
  File "/home/stud/*****/venv2/lib/python3.8/site-packages/pycocoevalcap/eval.py", line 53, in evaluate
    score, scores = scorer.compute_score(gts, res)
  File "/home/stud/*****/venv2/lib/python3.8/site-packages/pycocoevalcap/meteor/meteor.py", line 43, in compute_score
    scores.append(float(self.meteor_p.stdout.readline().strip()))
ValueError: could not convert string to float: ''

As you can see, the error happens in meteor.py. Could you maybe give some advice on this problem?
Best, David

BrokenPipeError: [Errno 32] Broken pipe

File "/home/amax/Documents/YBK/Image_to_text/54_Dynamic_graph/code/DCL-main/generation_api/pycocoevalcap/meteor/meteor.py", line 63, in _stat
self.meteor_p.stdin.write('{}\n'.format(score_line))
BrokenPipeError: [Errno 32] Broken pipe

Excuse me, how can I solve this BrokenPipeError? Could you give me some hints?

[Errno 22] Invalid argument for self.meteor_p.stdin.flush()

Hi. I followed your steps but I get this error. I have Java 1.8 and I'm using Windows. I printed the value of score_line; it is attached in the second image below. It seems that the flush() method isn't supported on Windows? Any solution to this?

(screenshots: capture, capture1)

OSError: [Errno 22] Invalid argument

This is a strange one: running example.py in pycocoevalcap gives results normally, but when I run my model code I get the error OSError: [Errno 22] Invalid argument.
(image)
Java: 1.8
Python: 3.6
OS: Windows
Running java -jar -Xmx2G meteor-1.5.jar - - -stdio -l en -norm gives this output:
(image)
But this file (paraphrase-en.gz) exists at the right path.

the ratio of BLEU

I was wondering what the ratio of the modified n-gram precisions is in BLEU-2, -3, and -4, and, if possible, how to change those proportions. Or does the calculation only use the single specified n-gram, without weighting it against the other n-grams?

I mean, theoretically speaking, BLEU-4 would have proportions 0.25, 0.25, 0.25, 0.25 for 1-, 2-, 3-, and 4-grams, which are then combined into the BLEU-4 score. Does this wrapper do that?
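For what it's worth, the standard BLEU-n combination the question describes is a geometric mean of the 1- to n-gram modified precisions with uniform weights (0.25 each for BLEU-4), scaled by a brevity penalty. A rough sketch of that combination, independent of this wrapper's internals:

import math

def combine_bleu(precisions, brevity_penalty=1.0):
    # Geometric mean of the modified n-gram precisions with uniform
    # weights (1/n each), multiplied by the brevity penalty.
    if any(p == 0 for p in precisions):
        return 0.0
    weight = 1.0 / len(precisions)
    log_avg = sum(weight * math.log(p) for p in precisions)
    return brevity_penalty * math.exp(log_avg)

# Hypothetical 1- to 4-gram precisions from a BLEU run.
print(combine_bleu([0.49, 0.17, 0.065, 0.018]))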

mean and maybe variance?

Let's say I have multiple pairs of sentences. Other than calculating the mean of BLEU, METEOR, and so on, could I also get the variance?
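If it helps, the scorers in this package return both a corpus-level value and per-instance values from compute_score, so a variance can be computed from the second return value. A small sketch with made-up sentences, assuming the usual {id: [sentence]} input format:

import statistics
from pycocoevalcap.bleu.bleu import Bleu

# Made-up references and hypotheses in {id: [sentence]} format.
refs = {0: ['there is a cat on the mat'], 1: ['a dog runs in the park']}
hyps = {0: ['the cat is on the mat'], 1: ['a dog is running in a park']}

corpus_score, per_sentence = Bleu(4).compute_score(refs, hyps)
# Bleu(4) returns one list per n-gram order; take the BLEU-4 entries.
bleu4_scores = per_sentence[3]
print('mean:', statistics.mean(bleu4_scores))
print('variance:', statistics.pvariance(bleu4_scores))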

Logical error in CIDEr ?

According to CIDEr's logic, both candidates and references need to be mapped to their stem or root form. However, I couldn't see such an operation in the code, so I think the implementation is wrong and you may need to add this logic yourself.

why it is so slow to compute the meteor score?

Hi, @salaniz. I compute METEOR and ROUGE scores, but I find it rather slow to wait for the METEOR results. Could you please tell me why? Thanks!

Here is the code for reproduction, if it helps.

from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge


def evaluate_coco(ref_data, hyp_data):
    scorer_meteor = Meteor()
    scorer_rouge = Rouge()
    ref_data = [[ref_datum] for ref_datum in ref_data]
    hyp_data = [[hyp_datum] for hyp_datum in hyp_data]
    ref = dict(zip(range(len(ref_data)), ref_data))
    hyp = dict(zip(range(len(hyp_data)), hyp_data))

    print("coco meteor score ...")
    coco_meteor_score = scorer_meteor.compute_score(ref, hyp)[0]
    print("coco rouge score ...")
    coco_rouge_score = float(scorer_rouge.compute_score(ref, hyp)[0])
    return coco_meteor_score, coco_rouge_score


def main():
    ref_data = ['there is a cat on the mat']
    hyp_data = ['the cat is on the mat']
    evaluate_coco(ref_data, hyp_data)


if __name__ == '__main__':
    main()

Problem in tokenizer

When I run the code in the coco_eval_example.py file, it gives me the following error:

Traceback (most recent call last):
  File "d:/Udacity/CVND_Exercises/P2_Image_Captioning/demo.py", line 21, in <module>
    coco_eval.evaluate()
  File "d:\Udacity\CVND_Exercises\P2_Image_Captioning\pycocoevalcap\eval.py", line 33, in evaluate
    gts = tokenizer.tokenize(gts)
  File "d:\Udacity\CVND_Exercises\P2_Image_Captioning\pycocoevalcap\tokenizer\ptbtokenizer.py", line 55, in tokenize
    p_tokenizer = subprocess.Popen(cmd, cwd=path_to_jar_dirname, stdout=subprocess.PIPE)
  File "C:\Users\user\anaconda3\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\user\anaconda3\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

Can anyone help me, please? Thank you in advance.

[Errno 22] Invalid argument


Hi, I face the same issue that others have faced before, and there's no solution presented. The issues are just closed immediately without explaining how to fix the problem.

Example of using pycocoevalcap WITHOUT coco data

No, this is not an issue. It's an example for anyone trying to use this package with their own data.

from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice

class Evaluator:
    def __init__(self) -> None:
        self.tokenizer = PTBTokenizer()
        self.scorer_list = [
            (Bleu(4), ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
            (Meteor(), "METEOR"),
            (Rouge(), "ROUGE_L"),
            (Cider(), "CIDEr"),
            (Spice(), "SPICE"),
        ]
        self.evaluation_report = {}

    def do_the_thing(self, golden_reference, candidate_reference):
        golden_reference = self.tokenizer.tokenize(golden_reference)
        candidate_reference = self.tokenizer.tokenize(candidate_reference)
        
        # From this point, some variables are named as in the original code
        # I have no idea why they are named like this
        # The original code: https://github.com/salaniz/pycocoevalcap/blob/a24f74c408c918f1f4ec34e9514bc8a76ce41ffd/eval.py#L51-L63
        for scorer, method in self.scorer_list:
            score, scores = scorer.compute_score(golden_reference, candidate_reference)
            if isinstance(method, list):
                for sc, scs, m in zip(score, scores, method):
                    self.evaluation_report[m] = sc
            else:
                self.evaluation_report[method] = score

golden_reference = [
    "The quick brown fox jumps over the lazy dog.",
    "The brown fox quickly jumps over the lazy dog.",
    "A sly brown fox jumps over the lethargic dog.",
    "The speedy brown fox leaps over the sleepy hound.",
    "A fast, brown fox jumps over the lazy dog.",
]
golden_reference = {k: [{'caption': v}] for k, v in enumerate(golden_reference)}

candidate_reference = [
    "A fast brown fox leaps above the tired dog.",
    "A quick brown fox jumps over the sleepy dog.",
    "The fast brown fox jumps over the lazy dog.",
    "The brown fox jumps swiftly over the lazy dog.",
    "A speedy brown fox leaps over the drowsy dog.",
]
candidate_reference = {k: [{'caption': v}] for k, v in enumerate(candidate_reference)}

evaluator = Evaluator()

evaluator.do_the_thing(golden_reference, candidate_reference)

print(evaluator.evaluation_report)

bleu error

I use captions_val2014.json, but the BLEU value is wrong. How can I solve it?

After gts = tokenizer.tokenize(gts), I tried print(gts) in eval.py, but it shows {458755: ['']}.
My setup is Python 3.5.4 and TensorFlow 1.4.0.

Could not cache item for SPICE

When I import the package and calculate SPICE: I have 90-word-long sentences to evaluate. Maybe they are too long, but it shows the error "Could not cache item", which makes the computation extremely slow. Is there any workaround or solution to this? Thank you.

is your code guaranteed?

Hi, and thanks for the code!
I'd like to know if your code performs exactly the same as the original one. I am using your code in my research work on image captioning. So may I know if the code is 100% the same as the original?

Thanks and Regards

What are the default values of the parameters for METEOR?

I found that the result obtained from NLTK.translate.meteor is different from this repo's METEOR scorer. So I'm wondering what the default configurations of the parameters for this metric are, like alpha, beta, gamma, delta, and weights, corresponding to this doc?
