
prophet's Introduction

Prophet


This repository is the official implementation of Prophet, a two-stage framework designed to prompt GPT-3 with answer heuristics for knowledge-based VQA. In stage one, we train a vanilla VQA model on a specific knowledge-based VQA dataset and extract two types of complementary answer heuristics from the model: answer candidates and answer-aware examples. In stage two, the answer heuristics are used to prompt GPT-3 to generate better answers. Prophet significantly outperforms existing state-of-the-art methods on two datasets, achieving 61.1% accuracy on OK-VQA and 55.7% on A-OKVQA. Please refer to our paper for details.


Updates

April 28, 2023

  • Added pretrained and finetuned models on A-OKVQA.

March 10, 2023

  • Training and testing code for the two-stage Prophet framework.
  • Pretrained and finetuned models on OK-VQA.

Table of Contents

Prerequisites

Hardware and Software Requirements

To conduct the following experiments, a machine with at least 1 RTX 3090 GPU, 50GB memory, and 300GB free disk space is recommended. We strongly recommend using an SSD drive to guarantee high-speed I/O.

The following software is needed:

  1. Python >= 3.9
  2. CUDA >= 11.3
  3. PyTorch >= 1.12.0
  4. Other packages listed in environment.yml

We recommend downloading Anaconda first and then creating a new environment with the following command:

$ conda env create -f environment.yml

This command will create a new environment named prophet with all the required packages. To activate the environment, run:

$ conda activate prophet
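
Optionally, you can verify the environment before moving on. The snippet below is a small sanity check (not part of the repository); it only assumes that torch and transformers were installed from environment.yml:

# Hypothetical sanity check: confirm the GPU is visible and the installed
# versions roughly match the requirements listed above.
import torch
import transformers

print("PyTorch:", torch.__version__)            # expected >= 1.12
print("CUDA build:", torch.version.cuda)        # expected >= 11.3
print("GPU available:", torch.cuda.is_available())
print("Transformers:", transformers.__version__)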

Data Preparation

Before running the code, prepare two folders: datasets and assets. The datasets folder contains all the datasets and features used in this project, and the assets folder contains the pre-computed resources and other intermediate files (you can use them to skip some early experiment steps and save time).

First, download the datasets and assets. Then put the datasets and assets folders in the root directory of this project. Download the MSCOCO 2014 and 2017 images from here (you can skip MSCOCO 2017 if you only run experiments on OK-VQA) and put them in the datasets folder. Run the following command to extract image features:

$ bash scripts/extract_img_feats.sh

After that, the datasets and assets folders will have the following structure:

datasets
├── aokvqa
│   ├── aokvqa_v1p0_test.json
│   ├── aokvqa_v1p0_train.json
│   └── aokvqa_v1p0_val.json
├── coco2014
│   ├── train2014
│   └── val2014
├── coco2014_feats
│   ├── train2014
│   └── val2014
├── coco2017
│   ├── test2017
│   ├── train2017
│   └── val2017
├── coco2017_feats
│   ├── test2017
│   ├── train2017
│   └── val2017
├── okvqa
│   ├── mscoco_train2014_annotations.json
│   ├── mscoco_val2014_annotations.json
│   ├── OpenEnded_mscoco_train2014_questions.json
│   └── OpenEnded_mscoco_val2014_questions.json
└── vqav2
    ├── v2_mscoco_train2014_annotations.json
    ├── v2_mscoco_val2014_annotations.json
    ├── v2_OpenEnded_mscoco_train2014_questions.json
    ├── v2_OpenEnded_mscoco_val2014_questions.json
    ├── v2valvg_no_ok_annotations.json
    ├── v2valvg_no_ok_questions.json
    ├── vg_annotations.json
    └── vg_questions.json

We've also provided a tree structure of the entire project in misc/tree.txt.
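
Before launching any script, it can be worth confirming that the layout above is actually in place. The following is a hypothetical helper (not part of the repository); the paths are taken from the tree above, so trim the list if you skip MSCOCO 2017:

# Hypothetical pre-flight check: verify that the expected dataset folders and
# annotation files (from the tree above) exist in the project root.
import os

required = [
    "datasets/coco2014/train2014",
    "datasets/coco2014/val2014",
    "datasets/coco2014_feats/train2014",
    "datasets/okvqa/mscoco_train2014_annotations.json",
    "datasets/okvqa/OpenEnded_mscoco_train2014_questions.json",
    "datasets/vqav2/v2_mscoco_train2014_annotations.json",
    "assets",
]
missing = [p for p in required if not os.path.exists(p)]
print("All paths found." if not missing else f"Missing: {missing}")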

Usage

We provide bash scripts for each stage of the Prophet framework. You can find them in the scripts directory. There are two common arguments you should take care of when running each script:

  • --task: specify the task (i.e., the target dataset) you want to deal with. The available options are ok (training on the train set of OK-VQA and evaluating on the test set of OK-VQA), aok_val (training on the train set of A-OKVQA and evaluating on the val set of A-OKVQA), and aok_test (training on the train and val sets of A-OKVQA and evaluating on the test set of A-OKVQA);

Note that although Prophet uses VQA v2 datasets for pre-training, there are slight differences in how the datasets are used for different tasks (ok, aok_val, and aok_test), as detailed in configs/task_to_split.py. This means that different pre-training commands need to be followed for each task.

  • --version: specify the version name of this run. This name will be used to create a new folder in the outputs directory to store the results of this run.

Note that you can omit any of these arguments when invoking the following scripts; the default arguments written in the script files will then be used.

Before running any script, you can also update the configuration files (*.yml) in the configs directory to change hyperparameters.
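
For instance, if you just want to see which hyperparameters a stage uses before editing them, the configuration files can be loaded as ordinary YAML. A minimal sketch (assuming PyYAML is available in the environment; the file name is taken from the configs/ tree shown in the Issues section below):

# Sketch: print the hyperparameters of the pretraining stage.
# Assumes configs/pretrain.yml is plain YAML and PyYAML is installed.
import yaml

with open("configs/pretrain.yml") as f:
    cfg = yaml.safe_load(f)

for key, value in sorted(cfg.items()):
    print(f"{key}: {value}")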

1. OK-VQA

Taking OK-VQA as an example, Prophet consists of two stages: stage one trains a vanilla VQA model and extracts answer heuristics, and stage two prompts GPT-3 with those answer heuristics.

Stage one

At this stage, we train an improved MCAN model (see the paper for a detailed description) by pretraining on VQA v2 and finetuning on the target dataset. Multiple GPUs are supported by setting, e.g., --gpu 0,1,2,3. Run the pretraining step with:

$ bash scripts/pretrain.sh \
    --task ok --version okvqa_pretrain_1 --gpu 0

We've provided a pretrained model for OK-VQA here. Then, run the finetuning step with:

$ bash scripts/finetune.sh \
    --task ok --version okvqa_finetune_1 --gpu 0 \
    --pretrained_model outputs/okvqa_pretrain_1/ckpts/epoch_13.pkl

All epoch checkpoints are saved in outputs/ckpts/{your_version_name}. We've also provided a finetuned model for OK-VQA here. You may pick one and generate answer heuristics by running the following command:

$ bash scripts/heuristics_gen.sh \
    --task ok --version okvqa_heuristics_1 \
    --gpu 0 --ckpt_path outputs/okvqa_finetune_1/ckpts/epoch_6.pkl \
    --candidate_num 10 --example_num 100

The extracted answer heuristics will be stored as candidates.json and examples.json in outputs/results/{your_version_name} directory.
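
Both files are plain JSON, so they are easy to inspect. The schema assumed below (a mapping from question id to a ranked list of {"answer": ...} entries) is an assumption, consistent with how candidates_okvqa.json is consumed in the accuracy-check snippet quoted in the Issues section further down:

# Sketch for peeking at the stage-one heuristics; the schema
# {question_id: [{"answer": ...}, ...]} is assumed, not documented here.
import json

with open("outputs/results/okvqa_heuristics_1/candidates.json") as f:
    candidates = json.load(f)

qid, cands = next(iter(candidates.items()))
print("question id:", qid)
print("top candidates:", [c["answer"] for c in cands[:5]])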

Stage two

You need the candidates.json and examples.json files generated in the previous stage to enter this stage. Alternatively, you can skip stage one and use the answer-heuristic files we provide in assets: for OK-VQA, the candidates and examples files are candidates_okvqa.json and answer_aware_examples_okvqa.json, respectively. To prompt GPT-3 with answer heuristics and generate better answers, run the following command:

$ bash scripts/prompt.sh \
    --task ok --version okvqa_prompt_1 \
    --examples_path outputs/results/okvqa_heuristics_1/examples.json \
    --candidates_path outputs/results/okvqa_heuristics_1/candidates.json \
    --openai_key sk-xxxxxxxxxxxxxxxxxxxxxx

The result file will be stored as result.json in outputs/results/{your_version_name} directory.
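
Conceptually, stage two builds a prompt that combines an image caption, the question, and the stage-one answer candidates, and asks GPT-3 to complete the answer. The sketch below only illustrates this idea; the actual template, in-context examples, and model settings live in prophet/stage2/prompt.py and configs/prompt.yml, and the engine name and prompt wording here are assumptions:

# Simplified illustration of the stage-two idea (NOT the exact prompt template used
# by prophet/stage2/prompt.py). Uses the legacy openai<1.0 Python API.
import openai

openai.api_key = "sk-..."  # your OpenAI key

def build_prompt(caption, question, candidates):
    # Stage-one answer candidates are listed in the prompt as hints for GPT-3.
    cand_str = ", ".join(c["answer"] for c in candidates)
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        f"Candidates: {cand_str}\n"
        f"Answer:"
    )

prompt = build_prompt(
    caption="a man riding a wave on top of a surfboard",
    question="What sport is this?",
    candidates=[{"answer": "surfing"}, {"answer": "skateboarding"}],
)
response = openai.Completion.create(
    engine="text-davinci-002",  # assumed engine; check configs/prompt.yml for the real one
    prompt=prompt,
    max_tokens=10,
    temperature=0.0,
)
print(response["choices"][0]["text"].strip())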

We also provide example scripts for the aok_val and aok_test modes on A-OKVQA.


2. A-OKVQA (val)

Stage one

Similarly, for the aok_val task, run the pretraining step with:

$ bash scripts/pretrain.sh \
    --task aok_val --version aokvqa_val_pretrain_1 --gpu 0

We've provided a pretrained model for aok_val here. Then, run the finetuning step with:

$ bash scripts/finetune.sh \
    --task aok_val --version aokvqa_val_finetune_1 --gpu 0 \
    --pretrained_model outputs/aokvqa_val_pretrain_1/ckpts/epoch_13.pkl

All epoch checkpoints are saved in outputs/ckpts/{your_version_name}. We've also provided a finetuned model for aok_val here. You may pick one and generate answer heuristics by running the following command:

$ bash scripts/heuristics_gen.sh \
    --task aok_val --version aokvqa_val_heuristics_1 \
    --gpu 0 --ckpt_path outputs/aokvqa_val_finetune_1/ckpts/epoch_6.pkl \
    --candidate_num 10 --example_num 100

The extracted answer heuristics will be stored as candidates.json and examples.json in outputs/results/{your_version_name} directory.

Stage two

You need the candidates.json and examples.json files generated in the previous stage to enter this stage. Alternatively, you can skip stage one and use the answer-heuristic files we provide in assets: for aok_val, the candidates and examples files are candidates_aokvqa_val.json and examples_aokvqa_val.json, respectively. To prompt GPT-3 with answer heuristics and generate better answers, run the following command:

$ bash scripts/prompt.sh \
    --task aok_val --version aokvqa_val_prompt_1 \
    --examples_path outputs/results/aokvqa_val_heuristics_1/examples.json \
    --candidates_path outputs/results/aokvqa_val_heuristics_1/candidates.json \
    --captions_path assets/captions_aokvqa.json \
    --openai_key sk-xxxxxxxxxxxxxxxxxxxxxx

The result file will be stored as result.json in outputs/results/{your_version_name} directory.

3. A-OKVQA (test)

Stage one

For the aok_test task, run the pretraining step with:

$ bash scripts/pretrain.sh \
    --task aok_test --version aokvqa_test_pretrain_1 --gpu 0

We've provided a pretrained model for aok_test here. Then, run the finetuning step with:

$ bash scripts/finetune.sh \
    --task aok_test --version aokvqa_test_finetune_1 --gpu 0 \
    --pretrained_model outputs/aokvqa_test_pretrain_1/ckpts/epoch_13.pkl

All epoch checkpoints are saved in outputs/ckpts/{your_version_name}. We've also provided a finetuned model for aok_test here. You may pick one and generate answer heuristics by running the following command:

$ bash scripts/heuristics_gen.sh \
    --task aok_test --version aokvqa_test_heuristics_1 \
    --gpu 0 --ckpt_path outputs/aokvqa_test_finetune_1/ckpts/epoch_6.pkl \
    --candidate_num 10 --example_num 100

The extracted answer heuristics will be stored as candidates.json and examples.json in outputs/results/{your_version_name} directory.

Stage two

You need the candidates.json and examples.json files generated in the previous stage to enter this stage. Alternatively, you can skip stage one and use the answer-heuristic files we provide in assets: for aok_test, the candidates and examples files are candidates_aokvqa_test.json and examples_aokvqa_test.json, respectively. To prompt GPT-3 with answer heuristics and generate better answers, run the following command:

$ bash scripts/prompt.sh \
    --task aok_test --version aokvqa_test_prompt_1 \
    --examples_path outputs/results/aokvqa_test_heuristics_1/examples.json \
    --candidates_path outputs/results/aokvqa_test_heuristics_1/candidates.json \
    --captions_path assets/captions_aokvqa.json \
    --openai_key sk-xxxxxxxxxxxxxxxxxxxxxx

The result file will be stored as result.json in outputs/results/{your_version_name} directory.

Evaluation

For the ok and aok_val tasks, whose annotations are available, the scores are computed automatically after finetuning and prompting. You can also evaluate the result files output after finetuning or prompting by running:

$ bash scripts/evaluate_file.sh \
    --task ok --result_path outputs/results/okvqa_prompt_1/result.json
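
For reference, OK-VQA and VQA v2 use a soft accuracy metric in which a predicted answer receives min(#matching human answers / 3, 1). The snippet below is only a simplified illustration; the official scripts in evaluation/ additionally normalize answers and average over annotator subsets.

# Simplified illustration of VQA-style soft accuracy (a simplification of what
# the scripts in evaluation/ compute).
def vqa_soft_accuracy(pred, gt_answers):
    matches = sum(1 for a in gt_answers if a == pred)
    return min(matches / 3.0, 1.0)

gt = ["surfing"] * 2 + ["swimming"] * 8   # ten annotator answers
print(vqa_soft_accuracy("surfing", gt))   # ~0.67
print(vqa_soft_accuracy("swimming", gt))  # 1.0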

Using the corresponding result files and the evaluation script above, we obtain the accuracies in the following table.

          OK-VQA   A-OKVQA (val)   A-OKVQA (test)
MCAN      53.0%    52.0%           45.6%
Prophet   61.1%    58.2%           55.7%

For the task of aok_test, you need to submit the result file to the A-OKVQA Leaderboard to evaluate the result.

Citation

If you use this code in your research, please cite our paper:

@inproceedings{shao2023prompting,
  title={Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering},
  author={Shao, Zhenwei and Yu, Zhou and Wang, Meng and Yu, Jun},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  pages={14974--14983},
  year={2023}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

prophet's People

Contributors

bruceisme, mil-vlg, paradoxzw


prophet's Issues

bug

(screenshot attached)
Have you ever encountered this kind of bug?

mcan_530_okvqa.json

Hello, could you tell me which part of the code generates mcan_530_okvqa.json? Thanks.

KeyError: 179520 ??

While running the command

bash scripts/pretrain.sh \
    --task ok --version okvqa_pretrain_1 --gpu 0

I met this problem:

Traceback (most recent call last):
  File "/root/autodl-fs/prophet-main/main.py", line 35, in <module>
    runner.run()
  File "/root/autodl-fs/prophet-main/prophet/stage1/pretrain.py", line 162, in run
    self.train(train_set, valid_set)
  File "/root/autodl-fs/prophet-main/prophet/stage1/pretrain.py", line 93, in train
    for step, input_tuple in enumerate(dataloader):
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
    return self._process_data(data)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
    data.reraise()
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/autodl-fs/prophet-main/prophet/stage1/utils/load_data.py", line 136, in __getitem__
KeyError: 179520

whole project structure:

prophet-main
├── assets
│   ├── answer_aware_examples_okvqa.json
│   ├── answer_dict_aokvqa.json
│   ├── answer_dict_okvqa.json
│   ├── answer_dict_vqav2.json
│   ├── candidates_aokvqa_test.json
│   ├── candidates_aokvqa_val.json
│   ├── candidates_okvqa.json
│   ├── captions_aokvqa.json
│   ├── captions_okvqa.json
│   ├── examples_aokvqa_test.json
│   ├── examples_aokvqa_val.json
│   └── Untitled.ipynb
├── ckpts
│   └── epoch_6.pkl
├── CLIP
│   ├── clip
│   │   ├── bpe_simple_vocab_16e6.txt.gz
│   │   ├── clip.py
│   │   ├── __init__.py
│   │   ├── model.py
│   │   └── simple_tokenizer.py
│   ├── CLIP.png
│   ├── data
│   │   ├── country211.md
│   │   ├── prompts.md
│   │   ├── rendered-sst2.md
│   │   └── yfcc100m.md
│   ├── hubconf.py
│   ├── LICENSE
│   ├── MANIFEST.in
│   ├── model-card.md
│   ├── notebooks
│   │   ├── Interacting_with_CLIP.ipynb
│   │   └── Prompt_Engineering_for_ImageNet.ipynb
│   ├── README.md
│   ├── requirements.txt
│   ├── setup.py
│   └── tests
│       └── test_consistency.py
├── configs
│   ├── finetune.yml
│   ├── path_cfgs.py
│   ├── pretrain.yml
│   ├── prompt.yml
│   ├── __pycache__
│   │   ├── path_cfgs.cpython-39.pyc
│   │   ├── task_cfgs.cpython-39.pyc
│   │   └── task_to_split.cpython-39.pyc
│   ├── task_cfgs.py
│   └── task_to_split.py
├── datasets
│   ├── aokvqa
│   │   ├── aokvqa_v1p0_test.json
│   │   ├── aokvqa_v1p0_train.json
│   │   └── aokvqa_v1p0_val.json
│   ├── coco2014
│   │   ├── train2014
│   │   ├── train2014.zip
│   │   └── val2014
│   ├── coco2014_feats
│   │   ├── train2014
│   │   ├── train2014.zip
│   │   ├── val2014
│   │   └── val2014.zip
│   ├── coco2017
│   ├── coco2017_feats
│   ├── datasets.zip
│   ├── okvqa
│   │   ├── mscoco_train2014_annotations.json
│   │   ├── mscoco_val2014_annotations.json
│   │   ├── OpenEnded_mscoco_train2014_questions.json
│   │   └── OpenEnded_mscoco_val2014_questions.json
│   ├── old_data
│   │   ├── coco2014
│   │   └── coco2014_feats
│   ├── Untitled.ipynb
│   └── vqav2
│       ├── v2_mscoco_train2014_annotations.json
│       ├── v2_mscoco_val2014_annotations.json
│       ├── v2_OpenEnded_mscoco_train2014_questions.json
│       ├── v2_OpenEnded_mscoco_val2014_questions.json
│       ├── v2valvg_no_ok_annotations.json
│       ├── v2valvg_no_ok_questions.json
│       ├── vg_annotations.json
│       └── vg_questions.json
├── environment.yml
├── evaluation
│   ├── ans_punct.py
│   ├── aok_utils
│   │   ├── eval_predictions.py
│   │   ├── load_aokvqa.py
│   │   ├── __pycache__
│   │   └── remap_predictions.py
│   ├── aokvqa_evaluate.py
│   ├── okvqa_evaluate.py
│   ├── __pycache__
│   │   ├── ans_punct.cpython-39.pyc
│   │   ├── aokvqa_evaluate.cpython-39.pyc
│   │   └── okvqa_evaluate.cpython-39.pyc
│   └── vqa_utils
│       ├── __pycache__
│       ├── vqaEval.py
│       └── vqa.py
├── LICENSE
├── main.py
├── misc
│   ├── framework.png
│   └── tree.txt
├── outputs
│   ├── ckpts
│   │   ├── okvqa_finetune_1
│   │   ├── okvqa_heuristics_1
│   │   └── okvqa_pretrain_1
│   ├── logs
│   │   ├── okvqa_finetune_1
│   │   └── okvqa_pretrain_1
│   └── results
│       ├── okvqa_finetune_1
│       └── okvqa_heuristics_1
├── preds
├── prophet
│   ├── __init__.py
│   ├── __pycache__
│   │   └── __init__.cpython-39.pyc
│   ├── stage1
│   │   ├── finetune.py
│   │   ├── heuristics.py
│   │   ├── model
│   │   ├── pretrain.py
│   │   ├── __pycache__
│   │   └── utils
│   └── stage2
│       ├── prompt.py
│       └── utils
├── README.md
├── scripts
│   ├── evaluate_file.sh
│   ├── evaluate_model.sh
│   ├── extract_img_feats.sh
│   ├── finetune.sh
│   ├── heuristics_gen.sh
│   ├── pretrain.sh
│   └── prompt.sh
├── --task
├── tools
│   ├── extract_img_feats.py
│   ├── __pycache__
│   │   └── transforms.cpython-39.pyc
│   └── transforms.py
└── Untitled.ipynb

skip step 1 and go directly to step 2

Step 1 takes a long time. You mentioned in the introduction that we can skip step 1 and go directly to step 2 using the answer_aware_examples_okvqa.json and candidates_okvqa.json files you provided, right?

OpenAI's apikey

I can't call the OpenAI API with my API key on the rented server. Is there any workaround? Thank you.


During stage 1 training (pretraining, finetuning, and candidate answer generation) I get the same error: OSError: We couldn't connect to 'https://huggingface.co' to load this file

When I run the official pretraining, finetuning, and candidate-answer-generation commands for stage 1, I get the same error:
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-large-cased is not the path to a directory containing a file named config.json
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode
After adding the following to main.py:
TRANSFORMERS_OFFLINE=1  # so it can run offline
a new error appears:
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104,'Connection reset by peer'))
How can I solve this?

Accuracy does not increase

I trained on a custom dataset. During training, the model loss decreased but the accuracy remained at zero. (Screenshot attached.)

Are pretraining the MCAN model and finetuning on OK-VQA done together? Shouldn't MCAN be pretrained first and then finetuned?

At this stage, we train an improved MCAN model through pretraning on VQA v2 and finetuning on target dataset. Take OK-VQA for example, run pretraining step with commands:

$ bash scripts/pretrain.sh --task ok --version okvqa_pretrain_1 --gpu 0

Are pretraining the MCAN model and finetuning on OK-VQA done together? Shouldn't MCAN be pretrained first and then finetuned?
In the script above, the task is ok. Does that mean MCAN pretraining has already finished and finetuning is then done on OK-VQA? Or are pretraining and finetuning executed together?
Shouldn't there be a separate script that pretrains MCAN, saves a checkpoint, and provides it for download, before finetuning on OK-VQA?

When I run the stage 2 command, it reports an error connecting to OpenAI. What could be the reason?

Loaded dataset size: 9009, top10 accuracy: 91.81, top1 accuracy: 86.54
Loaded dataset size: 5046, top10 accuracy: 79.83, top1 accuracy: 53.05

Working... 0/5046 0:06:31 <class 'openai.error.APIConnectionError'> Error communicating with OpenAI
retrying...

50GB memory?

"To conduct the following experiments, a machine with at least 1 RTX 3090 GPU, 50GB memory", wherein "50GB memory" refers to Memory for CPU or GPU?

The process of image captioning

Is there code for the image captioning process? I found a "captions_okvqa.json" file about image captions in assets, but I could not find the code that generates this file.

Replacing GPT-3 with other academic LLMs

Thank you so much for your excellent work!

I have a minor question about the LLM selection. Have you tried other academic LLMs, e.g., LLaMA, in place of GPT-3? Would it make a big performance difference? Thanks!

Best regards

Naive question on OK-VQA and A-OKVQA evaluation.

Hi @ParadoxZW @MIL-VLG, thanks for your great project.

I am not very familiar with OK-VQA and A-OKVQA evaluation. Here are some naive questions:

  • OK-VQA and A-OKVQA have an open-ended QA setting. For each question, it has ~10 gt answers (although some answers are the same). Do you use exact match (vqav2-style, match at least 3 gt answers) to compute the accuracy?
  • Is it common to train on A-OKVQA train+val and conduct inference on A-OKVQA test?

Trained model

Can we use the model you have already trained from existing code?

assets

May I ask whether the files in the assets folder need to be created by me? If they were generated by code, could you point me to that code? Thank you.

During stage 1 training (pretraining, finetuning, and candidate answer generation), I get the same error: TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType. How can I solve this?

Loading common data...
== Total image number: 123287
Traceback (most recent call last):
  File "/root/autodl-tmp/prophet/main.py", line 40, in <module>
    runner.run()
  File "/root/autodl-tmp/prophet/prophet/stage1/pretrain.py", line 160, in run
    common_data = CommonData(self.__C)
  File "/root/autodl-tmp/prophet/prophet/stage1/utils/load_data.py", line 55, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(__C.BERT_VERSION)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 676, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
    return cls._from_pretrained(
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1834, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1959, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/models/bert/tokenization_bert.py", line 213, in __init__
    if not os.path.isfile(vocab_file):
  File "/root/miniconda3/envs/prophet/lib/python3.9/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

okvqa-stage1-pretrain

When I pretrain the MCAN model on OK-VQA, it errors with:
raise LocalEntryNotFoundError( huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.
During handling of the above exception, another exception occurred:
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-large-uncased is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

So do I need to download bert-large-uncased online first, and then run the code offline?


mcan_530_okvqa.json

Excuse me, the JSON file generated after I ran finetune.sh contains 10096 entries, while the mcan_530_okvqa.json you provided has only 5048.
Also, when I evaluate, the accuracy is only about 50%. Why is that? Is my workflow wrong? I used the pretrained model you provided.

Could you provide a finetuned model for A-OKVQA dataset?

Hi, I am quite interested in your nice work and happy to see the code has been released!
A finetuned model for OK-VQA has been provided. It works.
I want to try your method on another dataset, A-OKVQA, but I can't find the finetuned checkpoint.
So could you provide the finetuned model for A-OKVQA dataset? Thanks!

OpenAI API Cost

I want to know the cost of the whole process of using the OpenAI API, or the cost of a single test run of the model.
I'm afraid I can't afford the expense of my experiments.
An approximate cost is enough. I hope I can get an answer, thank you very much.

Checkpoints Availability

Hello! I was wondering when/if the checkpoints for Prophet would be made publicly available? Thanks in advance :)

Hello, I have a question about the captions

Can this multimodal model be understood as converting the image content into text, via captions and answer heuristics, so that the two modalities can interact?
The captions are produced by an off-the-shelf captioning model that translates the image into text. Doesn't that make this external image-to-text model the key to the multimodal interaction, and thus the deciding factor for the whole method? If that external model performs poorly, the noise would be large and calling GPT would not help much, right?

The first candidate answer of your provided candidates_okvqa.json in assets.zip

Thank you very much for providing the code. I computed the accuracy of the first candidate answer on the OK-VQA val set using the candidates_okvqa.json from the assets.zip you provided. The code I ran is below. It turns out that the accuracy is 47.06 instead of 53. Did I do something wrong?

import json

#load data
with open('candidates_okvqa.json') as f:
    answer_candidates = json.load(f)
with open('mscoco_val2014_annotations.json') as f:
    val_datasets_annotations = json.load(f)['annotations']

#organize answer list
val_datasets = []
for val_a in val_datasets_annotations:
    multi_answers = []
    for ans in val_a['answers']:
        multi_answers.append(ans['raw_answer'])
    row = {'question_id': val_a['question_id'], 'direct_answers': multi_answers}
    val_datasets.append(row)

#compute score for a predicted answer
def direct_scores(pred_answer, direct_answers):
    acc_num = 0
    cnt = 0
    for _, answer_id in enumerate(direct_answers):
        if pred_answer == answer_id:
            cnt += 1
    if cnt ==1:
        acc_num = 0.3
    elif cnt == 2:
        acc_num = 0.6
    elif cnt > 2:
        acc_num = 1
    return acc_num

#Calculate the accuracy of the first candidate answer for all samples
acc = 0.0
for single_sample in val_datasets:
    single_sample['DA_candidate'] = [each_answer['answer'] for each_answer in answer_candidates[str(single_sample['question_id'])]]
    score = []
    for i in single_sample['DA_candidate']:
        score.append(direct_scores(i, single_sample['direct_answers']))
    acc += score[0]
print(acc/len(val_datasets))

Looking forward to your reply.

Prerequisites Questions

Dear author, when I ran "conda env create -f environment.yml", an error occurred like this: (screenshot attached)
Is it correct to delete the "@v1.0"?
I hope you can help me to answer this question, thank you very much.
