yxuansu / magic

Language Models Can See: Plugging Visual Controls in Text Generation

Home Page: https://arxiv.org/abs/2205.02655

Python 94.26% Shell 5.74%

multimodal plug-and-play-language-models text-generation unsupervised-learning zero-shot clip gpt-2 image-captioning story-generation

magic's Introduction

Hi there 👋

I am a final-year Ph.D. student at the Language Technology Lab, University of Cambridge. I am broadly interested in natural language processing (NLP) and machine learning, and most of my research concerns text generation. Recently, my research has focused on contrastive learning and its potential in language model pre-training, discourse representation learning, knowledge probing, open-ended text generation, and multi-modal text generation. Please refer to [my personal page] for the complete list of my research.

Personally, I really like pandas. The one in my icon is my favourite and her name is Hehua.

magic's Issues

Difference between paper and code

Hi,

In https://github.com/yxuansu/MAGIC/blob/main/story_generation/language_model/utlis.py#L266, you state that only the generated text is used to calculate the similarity. However, in the paper, the MAGIC score is calculated from both the prefix and the generated tokens. Could you explain this difference?

Thx~
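To make the question concrete, the two readings differ only in which text is paired with the image for each top-k candidate token. The sketch below uses a hypothetical helper (not code from this repository) that builds the texts that would be scored: with the prefix included, as the issue describes the paper, or with only the generated continuation, as the comment at utlis.py#L266 suggests. Joining pieces with plain spaces is an assumption; real GPT-2 tokens carry their own spacing.

```python
from typing import List


def candidate_texts(prefix: str, generated: str, candidates: List[str],
                    include_prefix: bool) -> List[str]:
    """Build one text per top-k candidate for image-text scoring (hypothetical helper).

    include_prefix=True:  prefix + generated continuation + candidate
    include_prefix=False: generated continuation + candidate only
    """
    parts = [prefix, generated] if include_prefix else [generated]
    # Drop empty pieces so that no stray leading/trailing spaces are produced.
    base = " ".join(p.strip() for p in parts if p.strip())
    return [f"{base} {tok}".strip() for tok in candidates]


if __name__ == "__main__":
    print(candidate_texts("A day at the beach.", "We built", ["a", "sandcastles"], True))
    print(candidate_texts("A day at the beach.", "We built", ["a", "sandcastles"], False))
```

Early in decoding the continuation is short, so dropping the prefix changes the scored text the most in the first few steps.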

Is it necessary to finetune GPT-2?

Could I directly use an off-the-shelf GPT-2 in MAGIC search? Are there any comparison results between a fine-tuned GPT-2 and a frozen GPT-2?

About the performance of ZeroCap reported in this paper

Thanks for your amazing work on the zero-shot captioning task. As shown in Table 1 of this paper, ZeroCap's performance on COCO is as follows:

However, this seems to differ from the performance reported in the ZeroCap paper, which is as follows:

Did this paper run ZeroCap with different settings, which would explain the difference?
I would greatly appreciate your response.
Thank you.

Has anybody reproduced this method with other LMs/VLMs?

Thanks for sharing this great work :)

I have tried to reproduce this method with my own LM (GPT-3) and VLM (CLIP). However, the quality is significantly worse than in the examples you provided.

Has anybody reproduced this method with their own LM/VLM? Or are there any implementation details to be aware of when using my own LM/VLM?
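When swapping in another LM or VLM, it may help to list what each decoding step consumes. The sketch below follows our reading of the MAGIC score in the paper: for each top-k candidate v, score(v) = (1 − α)·p(v | x<t) − α·max_j cos(h_v, h_{x_j}) + β·f(v | I, x<t), where f is a softmax of the image-text scores over the k candidates. So the LM must expose top-k next-token probabilities and hidden states (for the degeneration penalty), and the VLM must score one image against k texts. All function names and the plain-list inputs are hypothetical; a model reached only through an API that does not return hidden states cannot compute the penalty term as written.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def magic_select(lm_probs: Sequence[float],
                 cand_hidden: Sequence[Sequence[float]],
                 prev_hidden: Sequence[Sequence[float]],
                 clip_scores: Sequence[float],
                 alpha: float, beta: float) -> int:
    """Return the index of the top-k candidate with the highest MAGIC score (sketch).

    lm_probs[i]:    LM probability of candidate i given the context
    cand_hidden[i]: LM hidden state for candidate i
    prev_hidden:    LM hidden states of the tokens already in the context
    clip_scores[i]: VLM image-text score of the context extended with candidate i
    """
    visual = softmax(clip_scores)  # normalised over the k candidates only
    best_idx, best_score = 0, -math.inf
    for i, (p, h) in enumerate(zip(lm_probs, cand_hidden)):
        # Degeneration penalty: similarity to the most similar previous token.
        penalty = max((cosine(h, prev) for prev in prev_hidden), default=0.0)
        score = (1.0 - alpha) * p - alpha * penalty + beta * visual[i]
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```

With beta = 0 the rule reduces to contrastive search, which MAGIC builds on, so comparing beta = 0 against beta > 0 is a simple way to check whether a new VLM's scores are helping at all.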

Regarding the image captioning results on MS-COCO, I could reproduce exactly the scores presented in Table 1 of the paper using the following result file: (https://github.com/yxuansu/MAGIC/blob/main/image_captioning/inference_result/mscoco/magic_result.json)
However, this file only contains results for 4,982 images, while the full validation set has about 40k images.
Why is the image captioning score on MS-COCO not measured on the full validation set, but only on a subset?
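A quick way to check which images a result file covers is to count its entries and distinct images. The sketch assumes the file is a JSON list of per-image records and that each record names its image under a field called image_name; that field name is hypothetical, so inspect one record of magic_result.json first and pass the real key. For reference, the MS-COCO 2014 validation set contains 40,504 images.

```python
import json
from collections import Counter
from typing import Iterable, Mapping, Tuple


def count_images(records: Iterable[Mapping], key: str = "image_name") -> Counter:
    """Count result entries per image. `key` is a hypothetical field name."""
    return Counter(r[key] for r in records)


def summarize(path: str, key: str = "image_name") -> Tuple[int, int]:
    """Return (number of entries, number of distinct images) in a JSON list of results."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return len(records), len(count_images(records, key))


if __name__ == "__main__":
    # Path relative to image_captioning/evaluation, matching the layout in the log below.
    print(summarize("../inference_result/mscoco/magic_result.json"))
```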

Java ClassNotFoundException raised

Hi, when I tried to evaluate the results, a ClassNotFoundException was raised.
How can I add the SemgrexPattern class?

(magic) teang1995@devbox:~/codes/MAGIC/image_captioning/evaluation$ python cocoeval.py --result_file_path ../inference_result/flickr30k/baselines/contrastive_result.json
tokenization...
PTBTokenizer tokenized 72436 tokens at 390823.69 tokens per second.
PTBTokenizer tokenized 14999 tokens at 142902.49 tokens per second.
setting up scorers...
computing Bleu score...
{'testlen': 13000, 'reflen': 12470, 'guess': [13000, 12000, 11000, 10000], 'correct': [6192, 2110, 773, 341]}
ratio: 1.0425020048114642
Bleu_1: 0.476
Bleu_2: 0.289
Bleu_3: 0.181
Bleu_4: 0.119
computing METEOR score...
METEOR: 0.127
computing Rouge score...
ROUGE_L: 0.353
computing CIDEr score...
CIDEr: 0.089
computing SPICE score...
Exception in thread "main" java.lang.NoClassDefFoundError: edu/stanford/nlp/semgraph/semgrex/SemgrexPattern
        at edu.anu.spice.SpiceParser.<clinit>(SpiceParser.java:64)
        at edu.anu.spice.SpiceScorer.scoreBatch(SpiceScorer.java:70)
        at edu.anu.spice.SpiceScorer.main(SpiceScorer.java:60)
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.semgraph.semgrex.SemgrexPattern
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 3 more
Traceback (most recent call last):
  File "cocoeval.py", line 16, in <module>
    cocoEval.evaluate()
  File "/home/teang1995/codes/MAGIC/image_captioning/evaluation/pycocoevalcap/eval.py", line 59, in evaluate
    score, scores = scorer.compute_score(gts, res)
  File "/home/teang1995/codes/MAGIC/image_captioning/evaluation/pycocoevalcap/spice/spice.py", line 69, in compute_score
    subprocess.check_call(spice_cmd, 
  File "/data1/teang1995/anaconda3/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['java', '-jar', '-Xmx8G', 'spice-1.0.jar', '/home/teang1995/codes/MAGIC/image_captioning/evaluation/pycocoevalcap/spice/tmp/tmpc2vzaamg', '-cache', '/home/teang1995/codes/MAGIC/image_captioning/evaluation/pycocoevalcap/spice/cache', '-out', '/home/teang1995/codes/MAGIC/image_captioning/evaluation/pycocoevalcap/spice/tmp/tmp1rq2qr4m', '-subset', '-silent']' returned non-zero exit status 1.
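The missing class, edu.stanford.nlp.semgraph.semgrex.SemgrexPattern, belongs to Stanford CoreNLP, which spice-1.0.jar typically loads from separate jar files in its lib directory rather than bundling. A likely cause is therefore that the CoreNLP jars are absent. In the upstream coco-caption toolkit they are fetched by a get_stanford_models.sh script into pycocoevalcap/spice/lib; whether this repository ships that script, and the exact lib path, are assumptions to verify. The sketch below only checks for the jars before running cocoeval.py.

```python
from pathlib import Path
from typing import List


def find_corenlp_jars(lib_dir: str) -> List[str]:
    """Return the names of Stanford CoreNLP jars in lib_dir, sorted.

    A missing directory yields an empty list.
    """
    d = Path(lib_dir)
    if not d.is_dir():
        return []
    return sorted(p.name for p in d.glob("stanford-corenlp-*.jar"))


def corenlp_missing(lib_dir: str) -> bool:
    """True if no CoreNLP jar is present, i.e. SPICE would likely fail as in the log above."""
    return not find_corenlp_jars(lib_dir)


if __name__ == "__main__":
    # Assumed location, relative to image_captioning/evaluation; adjust if your layout differs.
    lib = "pycocoevalcap/spice/lib"
    if corenlp_missing(lib):
        print(f"No stanford-corenlp-*.jar in {lib}: download the CoreNLP jars before computing SPICE.")
    else:
        print("Found:", ", ".join(find_corenlp_jars(lib)))
```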

magic's Issues

Evaluation code for story generation

Tutorial

Why is the image captioning score on MS-COCO not measured on full validation set, but only on a subset?
