
pycocoevalcap's Introduction

Microsoft COCO Caption Evaluation

Evaluation codes for MS COCO caption generation.

Description

This repository provides Python 3 support for the caption evaluation metrics used for the MS COCO dataset.

The code is derived from the original repository that supports Python 2.7: https://github.com/tylin/coco-caption.
Caption evaluation depends on the COCO API, which natively supports Python 3.

Requirements

  • Java 1.8.0
  • Python 3

Installation

To install pycocoevalcap and the pycocotools dependency (https://github.com/cocodataset/cocoapi), run:

pip install pycocoevalcap

Usage

See the example script: example/coco_eval_example.py
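For reference, a minimal sketch of how an evaluation run typically looks (it mirrors the bundled example; the annotation and result file names below are placeholders for your own files):

from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# Placeholder paths: a COCO-format annotation file and a results file
# containing [{"image_id": ..., "caption": ...}, ...] entries.
annotation_file = 'captions_val2014.json'
results_file = 'captions_val2014_results.json'

coco = COCO(annotation_file)              # ground-truth captions
coco_result = coco.loadRes(results_file)  # generated captions

coco_eval = COCOEvalCap(coco, coco_result)
# Evaluate only on the images present in the results file.
coco_eval.params['image_id'] = coco_result.getImgIds()
coco_eval.evaluate()

# Aggregated scores (Bleu_1..4, METEOR, ROUGE_L, CIDEr, SPICE).
for metric, score in coco_eval.eval.items():
    print(f'{metric}: {score:.3f}')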

Files

./

  • eval.py: Contains the COCOEvalCap class that can be used to evaluate results on COCO.
  • tokenizer: Python wrapper of Stanford CoreNLP PTBTokenizer
  • bleu: Bleu evaluation codes
  • meteor: Meteor evaluation codes
  • rouge: Rouge-L evaluation codes
  • cider: CIDEr evaluation codes
  • spice: SPICE evaluation codes

Setup

  • SPICE requires the download of Stanford CoreNLP 3.6.0 code and models. This will be done automatically the first time the SPICE evaluation is performed.
  • Note: SPICE will try to create a cache of parsed sentences in ./spice/cache/. This dramatically speeds up repeated evaluations. The cache directory can be moved by setting 'CACHE_DIR' in ./spice. In the same file, caching can be turned off by removing the '-cache' argument to 'spice_cmd'.
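A rough illustration of those two knobs, using only the names given in the note above (CACHE_DIR and the '-cache' argument of spice_cmd); the actual command assembled in ./spice/spice.py includes additional arguments and may differ in your installed copy:

# Illustrative sketch only; mirrors the settings described in the note above.
CACHE_DIR = 'cache'   # point this at another directory to move the parse cache

# spice_cmd is the Java invocation built by the SPICE wrapper;
# removing '-cache' (and its value) turns caching off.
spice_cmd = ['java', '-jar', '-Xmx8G', 'spice-1.0.jar', 'input.json',
             '-cache', CACHE_DIR]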

References

Developers

  • Xinlei Chen (CMU)
  • Hao Fang (University of Washington)
  • Tsung-Yi Lin (Cornell)
  • Ramakrishna Vedantam (Virginia Tech)

Acknowledgement

  • David Chiang (University of Notre Dame)
  • Michael Denkowski (CMU)
  • Alexander Rush (Harvard University)

pycocoevalcap's People

Contributors

elliottd, endernewton, hao-fang, iaalm, j-min, nicolas-lair, peteanderson80, ramakrishnavedantam928, salaniz, tylin, vrama91


pycocoevalcap's Issues

CIDEr score is 0 while all other metrics are normal

I'm currently using the pycocoevalcap package to evaluate the performance of my image captioning model. I've noticed that the CIDEr score is consistently 0 for all of my model's generated captions, while all other metrics (BLEU, METEOR, SPICE and ROUGE) are normal.

I have tried to run the evaluation on each image separately, but the situation remains the same. The CIDEr score is always 0.

I'm not sure what could be causing this issue, as the other metrics seem to be working correctly. Can anyone help me figure out why the CIDEr score is not being computed correctly?

Thanks in advance for your help!
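For anyone debugging the same symptom, one way to isolate the problem is to call the CIDEr scorer directly on already-tokenized captions (the ids and sentences below are made up). Note that CIDEr's IDF weights are derived from the reference set passed in, so scoring a single image or a very small batch at a time can drive the scores to zero even when nothing else is wrong:

from pycocoevalcap.cider.cider import Cider

# Hypothetical, pre-tokenized captions keyed by image id.
gts = {
    'img1': ['a cat sits on a mat', 'a cat is sitting on a mat'],
    'img2': ['a dog runs in a park', 'a dog is running through a park'],
}
res = {
    'img1': ['a cat sits on the mat'],
    'img2': ['a dog runs in the park'],
}

score, scores = Cider().compute_score(gts, res)
print('CIDEr (corpus):', score)
print('CIDEr (per image):', scores)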

OSError: [Errno 22] Invalid argument

I use Python 3.6 and TensorFlow 1.4.0 on Windows 10, and I put pycocoevalcap into this project: https://github.com/yunjey/show-attend-and-tell. I got this error:

 File "D:/AIBU/image_caption/showAttendTell/train.py", line 22, in main
    solver.train()
  File "D:\AIBU\image_caption\showAttendTell\core\solver.py", line 172, in train
    scores = evaluate(data_path='./data', split='val', get_scores=True)  
  File "D:\AIBU\image_caption\showAttendTell\core\bleu.py", line 47, in evaluate
    final_scores = score(ref, hypo)
  File "D:\AIBU\image_caption\showAttendTell\core\bleu.py", line 21, in score
    score,scores = scorer.compute_score(ref,hypo)
  File "D:\AIBU\image_caption\showAttendTell\pycocoevalcap\meteor\meteor.py", line 38, in compute_score
    stat = self._stat(res[i][0], gts[i])
  File "D:\AIBU\image_caption\showAttendTell\pycocoevalcap\meteor\meteor.py", line 58, in _stat
    self.meteor_p.stdin.flush()
OSError: [Errno 22] Invalid argument

How can I deal with it?

meteor.py

Excuse me, did you modify spice.py?

computing METEOR fails

Hi! when running the example script, I get the following error:

(venv2) *******% python coco_eval_example.py
loading annotations into memory...
Done (t=0.22s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.19s)
creating index...
index created!
tokenization...
PTBTokenizer tokenized 2492309 tokens at 1957824,35 tokens per second.
PTBTokenizer tokenized 381324 tokens at 1401808,46 tokens per second.
setting up scorers...
Downloading stanford-corenlp-3.6.0 for SPICE ...
Progress: 384.5M / 384.5M (100.0%)
Extracting stanford-corenlp-3.6.0 ...
Done.
computing Bleu score...
{'testlen': 313500, 'reflen': 368039, 'guess': [313500, 272996, 232492, 191988], 'correct': [153357, 45146, 12441, 3457]}
ratio: 0.851811900369252
Bleu_1: 0.411
Bleu_2: 0.239
Bleu_3: 0.137
Bleu_4: 0.079
computing METEOR score...
Traceback (most recent call last):
  File "coco_eval_example.py", line 22, in <module>
    coco_eval.evaluate()
  File "/home/stud/*****/venv2/lib/python3.8/site-packages/pycocoevalcap/eval.py", line 53, in evaluate
    score, scores = scorer.compute_score(gts, res)
  File "/home/stud/*****/venv2/lib/python3.8/site-packages/pycocoevalcap/meteor/meteor.py", line 43, in compute_score
    scores.append(float(self.meteor_p.stdout.readline().strip()))
ValueError: could not convert string to float: ''

As you can see, the error happens in meteor.py. Could you maybe give some advice on this problem?
Best, David

BrokenPipeError: [Errno 32] Broken pipe

File "/home/amax/Documents/YBK/Image_to_text/54_Dynamic_graph/code/DCL-main/generation_api/pycocoevalcap/meteor/meteor.py", line 63, in _stat
self.meteor_p.stdin.write('{}\n'.format(score_line))
BrokenPipeError: [Errno 32] Broken pipe

Excuse me, how can I solve this BrokenPipeError? Could you give me some hints?

[Errno 22] Invalid argument for self.meteor_p.stdin.flush()

Hi. I followed your steps but I get this error. I have Java 1.8 and I'm using Windows. I printed the value of score_line; it is attached in the second image below. It seems that the flush() method isn't supported on Windows? Any solution to this?

(screenshots: capture, capture1)

OSError: [Errno 22] Invalid argument

This is a strange one: running example.py in pycocoevalcap gives results normally, but when I run my model code I get the error OSError: [Errno 22] Invalid argument.
(image)
Java: 1.8
Python: 3.6
OS: Windows
Running java -jar -Xmx2G meteor-1.5.jar - - -stdio -l en -norm gives this output:
(image)
But this file (paraphrase-en.gz) exists at the right path.

the ratio of BLEU

I was wondering what the ratio of the modified n-gram precisions is in BLEU-2, -3, and -4, and, if possible, how to change those proportions. Or does the calculation only use the single specified n-gram, without weighting it against the other n-grams?

I mean, theoretically speaking, BLEU-4 would have proportions 0.25, 0.25, 0.25, 0.25 for 1-, 2-, 3-, and 4-grams, which are then combined into the BLEU-4 score. Does this wrapper do that?
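For what it's worth, the standard BLEU-n combination the question describes is a geometric mean of the 1- to n-gram modified precisions with uniform weights (0.25 each for BLEU-4), scaled by a brevity penalty. A rough sketch of that combination, independent of this wrapper's internals:

import math

def combine_bleu(precisions, brevity_penalty=1.0):
    # Geometric mean of the modified n-gram precisions with uniform
    # weights (1/n each), multiplied by the brevity penalty.
    if any(p == 0 for p in precisions):
        return 0.0
    weight = 1.0 / len(precisions)
    log_avg = sum(weight * math.log(p) for p in precisions)
    return brevity_penalty * math.exp(log_avg)

# Hypothetical 1- to 4-gram precisions from a BLEU run.
print(combine_bleu([0.49, 0.17, 0.065, 0.018]))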

mean and maybe variance?

Let's say I have multiple pairs of sentences. Other than calculating the mean of BLEU, METEOR, and so on, could I also get the variance?
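If it helps, the scorers in this package return both a corpus-level value and per-instance values from compute_score, so a variance can be computed from the second return value. A small sketch with made-up sentences, assuming the usual {id: [sentence]} input format:

import statistics
from pycocoevalcap.bleu.bleu import Bleu

# Made-up references and hypotheses in {id: [sentence]} format.
refs = {0: ['there is a cat on the mat'], 1: ['a dog runs in the park']}
hyps = {0: ['the cat is on the mat'], 1: ['a dog is running in a park']}

corpus_score, per_sentence = Bleu(4).compute_score(refs, hyps)
# Bleu(4) returns one list per n-gram order; take the BLEU-4 entries.
bleu4_scores = per_sentence[3]
print('mean:', statistics.mean(bleu4_scores))
print('variance:', statistics.pvariance(bleu4_scores))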

Logical error in CIDEr ?

According to CIDEr's logic, both candidates and references need to be mapped to their stem or root form. However, I couldn't see such an operation in the code, so I think the implementation is wrong and you may need to add this logic yourself.

why it is so slow to compute the meteor score?

Hi, @salaniz. I compute METEOR and ROUGE scores, but I find it rather slow to wait for the METEOR results. Could you please tell me why? Thanks!

Here is the code for reproduction, if it helps.

from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge


def evaluate_coco(ref_data, hyp_data):
    scorer_meteor = Meteor()
    scorer_rouge = Rouge()
    ref_data = [[ref_datum] for ref_datum in ref_data]
    hyp_data = [[hyp_datum] for hyp_datum in hyp_data]
    ref = dict(zip(range(len(ref_data)), ref_data))
    hyp = dict(zip(range(len(hyp_data)), hyp_data))

    print("coco meteor score ...")
    coco_meteor_score = scorer_meteor.compute_score(ref, hyp)[0]
    print("coco rouge score ...")
    coco_rouge_score = float(scorer_rouge.compute_score(ref, hyp)[0])
    return coco_meteor_score, coco_rouge_score


def main():
    ref_data = ['there is a cat on the mat']
    hyp_data = ['the cat is on the mat']
    evaluate_coco(ref_data, hyp_data)


if __name__ == '__main__':
    main()

Problem in tokenizer

When I run the code in the coco_eval_example.py file, it gives me the following error:

Traceback (most recent call last):
  File "d:/Udacity/CVND_Exercises/P2_Image_Captioning/demo.py", line 21, in <module>
    coco_eval.evaluate()
  File "d:\Udacity\CVND_Exercises\P2_Image_Captioning\pycocoevalcap\eval.py", line 33, in evaluate
    gts = tokenizer.tokenize(gts)
  File "d:\Udacity\CVND_Exercises\P2_Image_Captioning\pycocoevalcap\tokenizer\ptbtokenizer.py", line 55, in tokenize
    p_tokenizer = subprocess.Popen(cmd, cwd=path_to_jar_dirname, stdout=subprocess.PIPE)
  File "C:\Users\user\anaconda3\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\user\anaconda3\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

Can anyone help me, please? Thank you in advance.

[Errno 22] Invalid argument


Hi, I face the same issue that others have faced before, and there's no solution presented. The issues are just closed immediately without explaining how to fix the problem.

Example of using pycocoevalcap WITHOUT coco data

No, this is not an issue. It's an example for anyone trying to use this package with their own data.

from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice

class Evaluator:
    def __init__(self) -> None:
        self.tokenizer = PTBTokenizer()
        self.scorer_list = [
            (Bleu(4), ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
            (Meteor(), "METEOR"),
            (Rouge(), "ROUGE_L"),
            (Cider(), "CIDEr"),
            (Spice(), "SPICE"),
        ]
        self.evaluation_report = {}

    def do_the_thing(self, golden_reference, candidate_reference):
        golden_reference = self.tokenizer.tokenize(golden_reference)
        candidate_reference = self.tokenizer.tokenize(candidate_reference)
        
        # From this point, some variables are named as in the original code
        # I have no idea why they are named like this
        # The original code: https://github.com/salaniz/pycocoevalcap/blob/a24f74c408c918f1f4ec34e9514bc8a76ce41ffd/eval.py#L51-L63
        for scorer, method in self.scorer_list:
            score, scores = scorer.compute_score(golden_reference, candidate_reference)
            if isinstance(method, list):
                for sc, scs, m in zip(score, scores, method):
                    self.evaluation_report[m] = sc
            else:
                self.evaluation_report[method] = score

golden_reference = [
    "The quick brown fox jumps over the lazy dog.",
    "The brown fox quickly jumps over the lazy dog.",
    "A sly brown fox jumps over the lethargic dog.",
    "The speedy brown fox leaps over the sleepy hound.",
    "A fast, brown fox jumps over the lazy dog.",
]
golden_reference = {k: [{'caption': v}] for k, v in enumerate(golden_reference)}

candidate_reference = [
    "A fast brown fox leaps above the tired dog.",
    "A quick brown fox jumps over the sleepy dog.",
    "The fast brown fox jumps over the lazy dog.",
    "The brown fox jumps swiftly over the lazy dog.",
    "A speedy brown fox leaps over the drowsy dog.",
]
candidate_reference = {k: [{'caption': v}] for k, v in enumerate(candidate_reference)}

evaluator = Evaluator()

evaluator.do_the_thing(golden_reference, candidate_reference)

print(evaluator.evaluation_report)

bleu error

I use captions_val2014.json, but the BLEU value is wrong. How can I solve it?

After gts = tokenizer.tokenize(gts), I tried print(gts) in eval.py, but it shows {458755: ['']}.
My setup is Python 3.5.4 and TensorFlow 1.4.0.

Could not cache item for SPICE

When I import the package and calculate SPICE: I have 90-word-long sentences to evaluate. Maybe they are too long, but it shows the error "Could not cache item", which makes the computation extremely slow. Is there any workaround or solution to this? Thank you.

is your code guaranteed?

Hi, and thanks for the code!
I'd like to know if your code performs exactly the same as the original one. I am using your code in my research work on image captioning. So may I know if the code is 100% the same as the original?

Thanks and Regards

What are the default values of the parameters for METEOR?

I found that the result obtained from NLTK.translate.meteor is different from this repo's METEOR scorer. So I'm wondering what the default configurations of the parameters for this metric are, like alpha, beta, gamma, delta, and weights, corresponding to this doc?
