
prophet's Introduction

Prophet


This repository is the official implementation of Prophet, a two-stage framework designed to prompt GPT-3 with answer heuristics for knowledge-based VQA. In stage one, we train a vanilla VQA model on a specific knowledge-based VQA dataset and extract two types of complementary answer heuristics from the model: answer candidates and answer-aware examples. In stage two, the answer heuristics are used to prompt GPT-3 to generate better answers. Prophet significantly outperforms existing state-of-the-art methods on two datasets, achieving 61.1% accuracy on OK-VQA and 55.7% on A-OKVQA. Please refer to our paper for details.


Updates

April 28, 2023

  • Added pretrained and finetuned models on A-OKVQA.

March 10, 2023

  • Training and testing code for the two-stage Prophet framework.
  • Pretrained and finetuned models on OK-VQA.

Table of Contents

Prerequisites

Hardware and Software Requirements

To conduct the following experiments, a machine with at least 1 RTX 3090 GPU, 50GB memory, and 300GB free disk space is recommended. We strongly recommend using an SSD drive to guarantee high-speed I/O.

The following software is needed:

  1. Python >= 3.9
  2. CUDA >= 11.3
  3. PyTorch >= 1.12.0
  4. Other packages listed in environment.yml

We recommend downloading Anaconda first and then creating a new environment with the following command:

$ conda env create -f environment.yml

This command will create a new environment named prophet with all the required packages. To activate the environment, run:

$ conda activate prophet
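
Optionally, you can verify the environment before moving on. The snippet below is a small sanity check (not part of the repository); it only assumes that torch and transformers were installed from environment.yml:

# Hypothetical sanity check: confirm the GPU is visible and the installed
# versions roughly match the requirements listed above.
import torch
import transformers

print("PyTorch:", torch.__version__)            # expected >= 1.12
print("CUDA build:", torch.version.cuda)        # expected >= 11.3
print("GPU available:", torch.cuda.is_available())
print("Transformers:", transformers.__version__)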

Data Preparation

Before running the code, prepare two folders: datasets and assets. The datasets folder contains all the datasets and features used in this project, and the assets folder contains the pre-computed resources and other intermediate files (you can use them to skip some early experiment steps and save time).

First, download the datasets and assets. Then put the datasets and assets folders in the root directory of this project. Download the MSCOCO 2014 and 2017 images from here (you can skip MSCOCO 2017 if you only run experiments on OK-VQA) and put them in the datasets folder. Run the following command to extract image features:

$ bash scripts/extract_img_feats.sh

After that, the datasets and assets folders will have the following structure:

datasets
├── aokvqa
│   ├── aokvqa_v1p0_test.json
│   ├── aokvqa_v1p0_train.json
│   └── aokvqa_v1p0_val.json
├── coco2014
│   ├── train2014
│   └── val2014
├── coco2014_feats
│   ├── train2014
│   └── val2014
├── coco2017
│   ├── test2017
│   ├── train2017
│   └── val2017
├── coco2017_feats
│   ├── test2017
│   ├── train2017
│   └── val2017
├── okvqa
│   ├── mscoco_train2014_annotations.json
│   ├── mscoco_val2014_annotations.json
│   ├── OpenEnded_mscoco_train2014_questions.json
│   └── OpenEnded_mscoco_val2014_questions.json
└── vqav2
    ├── v2_mscoco_train2014_annotations.json
    ├── v2_mscoco_val2014_annotations.json
    ├── v2_OpenEnded_mscoco_train2014_questions.json
    ├── v2_OpenEnded_mscoco_val2014_questions.json
    ├── v2valvg_no_ok_annotations.json
    ├── v2valvg_no_ok_questions.json
    ├── vg_annotations.json
    └── vg_questions.json

We've also provided a tree structure of the entire project in misc/tree.txt.
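
Before launching any script, it can be worth confirming that the layout above is actually in place. The following is a hypothetical helper (not part of the repository); the paths are taken from the tree above, so trim the list if you skip MSCOCO 2017:

# Hypothetical pre-flight check: verify that the expected dataset folders and
# annotation files (from the tree above) exist in the project root.
import os

required = [
    "datasets/coco2014/train2014",
    "datasets/coco2014/val2014",
    "datasets/coco2014_feats/train2014",
    "datasets/okvqa/mscoco_train2014_annotations.json",
    "datasets/okvqa/OpenEnded_mscoco_train2014_questions.json",
    "datasets/vqav2/v2_mscoco_train2014_annotations.json",
    "assets",
]
missing = [p for p in required if not os.path.exists(p)]
print("All paths found." if not missing else f"Missing: {missing}")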

Usage

We provide bash scripts for each stage of the Prophet framework. You can find them in the scripts directory. There are two common arguments you should take care of when running each script:

  • --task: specify the task (i.e., the target dataset) you want to deal with. The available options are ok (training on the train set of OK-VQA and evaluating on the test set of OK-VQA), aok_val (training on the train set of A-OKVQA and evaluating on the val set of A-OKVQA), and aok_test (training on the train and val sets of A-OKVQA and evaluating on the test set of A-OKVQA);

Note that although Prophet uses VQA v2 datasets for pre-training, there are slight differences in how the datasets are used for different tasks (ok, aok_val, and aok_test), as detailed in configs/task_to_split.py. This means that different pre-training commands need to be followed for each task.

  • --version: specify the version name of this run. This name will be used to create a new folder in the outputs directory to store the results of this run.

Note that you can omit any of these arguments when invoking the following scripts; the default arguments written in the script files will then be used.

Before running any script, you can also update the configuration files (*.yml) in the configs directory to change hyperparameters.
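
For instance, if you just want to see which hyperparameters a stage uses before editing them, the configuration files can be loaded as ordinary YAML. A minimal sketch (assuming PyYAML is available in the environment; the file name is taken from the configs/ tree shown in the Issues section below):

# Sketch: print the hyperparameters of the pretraining stage.
# Assumes configs/pretrain.yml is plain YAML and PyYAML is installed.
import yaml

with open("configs/pretrain.yml") as f:
    cfg = yaml.safe_load(f)

for key, value in sorted(cfg.items()):
    print(f"{key}: {value}")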

1. OK-VQA

Taking OK-VQA as an example, Prophet consists of two stages: stage one trains a vanilla VQA model and extracts answer heuristics, and stage two prompts GPT-3 with those answer heuristics.

Stage one

At this stage, we train an improved MCAN model (see the paper for a detailed description) by pretraining on VQA v2 and finetuning on the target dataset. Multiple GPUs are supported by setting, e.g., --gpu 0,1,2,3. Run the pretraining step with:

$ bash scripts/pretrain.sh \
    --task ok --version okvqa_pretrain_1 --gpu 0

We've provided a pretrained model for OK-VQA here. Then, run the finetuning step with:

$ bash scripts/finetune.sh \
    --task ok --version okvqa_finetune_1 --gpu 0 \
    --pretrained_model outputs/okvqa_pretrain_1/ckpts/epoch_13.pkl

All epoch checkpoints are saved in outputs/ckpts/{your_version_name}. We've also provided a finetuned model for OK-VQA here. You may pick one and generate answer heuristics by running the following command:

$ bash scripts/heuristics_gen.sh \
    --task ok --version okvqa_heuristics_1 \
    --gpu 0 --ckpt_path outputs/okvqa_finetune_1/ckpts/epoch_6.pkl \
    --candidate_num 10 --example_num 100

The extracted answer heuristics will be stored as candidates.json and examples.json in outputs/results/{your_version_name} directory.
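
Both files are plain JSON, so they are easy to inspect. The schema assumed below (a mapping from question id to a ranked list of {"answer": ...} entries) is an assumption, consistent with how candidates_okvqa.json is consumed in the accuracy-check snippet quoted in the Issues section further down:

# Sketch for peeking at the stage-one heuristics; the schema
# {question_id: [{"answer": ...}, ...]} is assumed, not documented here.
import json

with open("outputs/results/okvqa_heuristics_1/candidates.json") as f:
    candidates = json.load(f)

qid, cands = next(iter(candidates.items()))
print("question id:", qid)
print("top candidates:", [c["answer"] for c in cands[:5]])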

Stage two

You need the candidates.json and examples.json files generated in the previous stage to enter this stage. Alternatively, you can skip stage one and use the answer-heuristic files we provide in assets: for OK-VQA, the candidates and examples files are candidates_okvqa.json and answer_aware_examples_okvqa.json, respectively. To prompt GPT-3 with answer heuristics and generate better answers, run the following command:

$ bash scripts/prompt.sh \
    --task ok --version okvqa_prompt_1 \
    --examples_path outputs/results/okvqa_heuristics_1/examples.json \
    --candidates_path outputs/results/okvqa_heuristics_1/candidates.json \
    --openai_key sk-xxxxxxxxxxxxxxxxxxxxxx

The result file will be stored as result.json in outputs/results/{your_version_name} directory.
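
Conceptually, stage two builds a prompt that combines an image caption, the question, and the stage-one answer candidates, and asks GPT-3 to complete the answer. The sketch below only illustrates this idea; the actual template, in-context examples, and model settings live in prophet/stage2/prompt.py and configs/prompt.yml, and the engine name and prompt wording here are assumptions:

# Simplified illustration of the stage-two idea (NOT the exact prompt template used
# by prophet/stage2/prompt.py). Uses the legacy openai<1.0 Python API.
import openai

openai.api_key = "sk-..."  # your OpenAI key

def build_prompt(caption, question, candidates):
    # Stage-one answer candidates are listed in the prompt as hints for GPT-3.
    cand_str = ", ".join(c["answer"] for c in candidates)
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        f"Candidates: {cand_str}\n"
        f"Answer:"
    )

prompt = build_prompt(
    caption="a man riding a wave on top of a surfboard",
    question="What sport is this?",
    candidates=[{"answer": "surfing"}, {"answer": "skateboarding"}],
)
response = openai.Completion.create(
    engine="text-davinci-002",  # assumed engine; check configs/prompt.yml for the real one
    prompt=prompt,
    max_tokens=10,
    temperature=0.0,
)
print(response["choices"][0]["text"].strip())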

We also provide example scripts for the aok_val and aok_test modes on A-OKVQA.


2. A-OKVQA (val)

Stage one

Similarly, for the aok_val task, run the pretraining step with:

$ bash scripts/pretrain.sh \
    --task aok_val --version aokvqa_val_pretrain_1 --gpu 0

We've provided a pretrained model for aok_val here. Then, run the finetuning step with:

$ bash scripts/finetune.sh \
    --task aok_val --version aokvqa_val_finetune_1 --gpu 0 \
    --pretrained_model outputs/aokvqa_val_pretrain_1/ckpts/epoch_13.pkl

All epoch checkpoints are saved in outputs/ckpts/{your_version_name}. We've also provided a finetuned model for aok_val here. You may pick one and generate answer heuristics by running the following command:

$ bash scripts/heuristics_gen.sh \
    --task aok_val --version aokvqa_val_heuristics_1 \
    --gpu 0 --ckpt_path outputs/aokvqa_val_finetune_1/ckpts/epoch_6.pkl \
    --candidate_num 10 --example_num 100

The extracted answer heuristics will be stored as candidates.json and examples.json in outputs/results/{your_version_name} directory.

Stage two

You need the candidates.json and examples.json files generated in the previous stage to enter this stage. Alternatively, you can skip stage one and use the answer-heuristic files we provide in assets: for aok_val, the candidates and examples files are candidates_aokvqa_val.json and examples_aokvqa_val.json, respectively. To prompt GPT-3 with answer heuristics and generate better answers, run the following command:

$ bash scripts/prompt.sh \
    --task aok_val --version aokvqa_val_prompt_1 \
    --examples_path outputs/results/aokvqa_val_heuristics_1/examples.json \
    --candidates_path outputs/results/aokvqa_val_heuristics_1/candidates.json \
    --captions_path assets/captions_aokvqa.json \
    --openai_key sk-xxxxxxxxxxxxxxxxxxxxxx

The result file will be stored as result.json in outputs/results/{your_version_name} directory.

3. A-OKVQA (test)

Stage one

For the aok_test task, run the pretraining step with:

$ bash scripts/pretrain.sh \
    --task aok_test --version aokvqa_test_pretrain_1 --gpu 0

We've provided a pretrained model for aok_test here. Then, run the finetuning step with:

$ bash scripts/finetune.sh \
    --task aok_test --version aokvqa_test_finetune_1 --gpu 0 \
    --pretrained_model outputs/aokvqa_test_pretrain_1/ckpts/epoch_13.pkl

All epoch checkpoints are saved in outputs/ckpts/{your_version_name}. We've also provided a finetuned model for aok_test here. You may pick one and generate answer heuristics by running the following command:

$ bash scripts/heuristics_gen.sh \
    --task aok_test --version aokvqa_test_heuristics_1 \
    --gpu 0 --ckpt_path outputs/aokvqa_test_finetune_1/ckpts/epoch_6.pkl \
    --candidate_num 10 --example_num 100

The extracted answer heuristics will be stored as candidates.json and examples.json in outputs/results/{your_version_name} directory.

Stage two

You need the candidates.json and examples.json files generated in the previous stage to enter this stage. Alternatively, you can skip stage one and use the answer-heuristic files we provide in assets: for aok_test, the candidates and examples files are candidates_aokvqa_test.json and examples_aokvqa_test.json, respectively. To prompt GPT-3 with answer heuristics and generate better answers, run the following command:

$ bash scripts/prompt.sh \
    --task aok_test --version aokvqa_test_prompt_1 \
    --examples_path outputs/results/aokvqa_test_heuristics_1/examples.json \
    --candidates_path outputs/results/aokvqa_test_heuristics_1/candidates.json \
    --captions_path assets/captions_aokvqa.json \
    --openai_key sk-xxxxxxxxxxxxxxxxxxxxxx

The result file will be stored as result.json in outputs/results/{your_version_name} directory.

Evaluation

For the ok and aok_val tasks, whose annotations are available, the scores are computed automatically after finetuning and prompting. You can also evaluate the result files output after finetuning or prompting by running:

$ bash scripts/evaluate_file.sh \
    --task ok --result_path outputs/results/okvqa_prompt_1/result.json
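
For reference, OK-VQA and VQA v2 use a soft accuracy metric in which a predicted answer receives min(#matching human answers / 3, 1). The snippet below is only a simplified illustration; the official scripts in evaluation/ additionally normalize answers and average over annotator subsets.

# Simplified illustration of VQA-style soft accuracy (a simplification of what
# the scripts in evaluation/ compute).
def vqa_soft_accuracy(pred, gt_answers):
    matches = sum(1 for a in gt_answers if a == pred)
    return min(matches / 3.0, 1.0)

gt = ["surfing"] * 2 + ["swimming"] * 8   # ten annotator answers
print(vqa_soft_accuracy("surfing", gt))   # ~0.67
print(vqa_soft_accuracy("swimming", gt))  # 1.0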

Using the corresponding result files and the evaluation script above, we obtain the accuracies in the following table.

          OK-VQA   A-OKVQA (val)   A-OKVQA (test)
MCAN      53.0%    52.0%           45.6%
Prophet   61.1%    58.2%           55.7%

For the task of aok_test, you need to submit the result file to the A-OKVQA Leaderboard to evaluate the result.

Citation

If you use this code in your research, please cite our paper:

@inproceedings{shao2023prompting,
  title={Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering},
  author={Shao, Zhenwei and Yu, Zhou and Wang, Meng and Yu, Jun},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  pages={14974--14983},
  year={2023}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

prophet's People

Contributors

bruceisme, mil-vlg, paradoxzw


prophet's Issues

bug

(screenshot attached)
Have you ever encountered this kind of bug?

mcan_530_okvqa.json

Hello, could you tell me which part of the code generates mcan_530_okvqa.json? Thanks.

KeyError: 179520 ??

While running the command

bash scripts/pretrain.sh \
    --task ok --version okvqa_pretrain_1 --gpu 0

I met this problem:

Traceback (most recent call last):
  File "/root/autodl-fs/prophet-main/main.py", line 35, in <module>
    runner.run()
  File "/root/autodl-fs/prophet-main/prophet/stage1/pretrain.py", line 162, in run
    self.train(train_set, valid_set)
  File "/root/autodl-fs/prophet-main/prophet/stage1/pretrain.py", line 93, in train
    for step, input_tuple in enumerate(dataloader):
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
    return self._process_data(data)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
    data.reraise()
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/autodl-fs/prophet-main/prophet/stage1/utils/load_data.py", line 136, in __getitem__
KeyError: 179520

whole project structure:

prophet-main
├── assets
│   ├── answer_aware_examples_okvqa.json
│   ├── answer_dict_aokvqa.json
│   ├── answer_dict_okvqa.json
│   ├── answer_dict_vqav2.json
│   ├── candidates_aokvqa_test.json
│   ├── candidates_aokvqa_val.json
│   ├── candidates_okvqa.json
│   ├── captions_aokvqa.json
│   ├── captions_okvqa.json
│   ├── examples_aokvqa_test.json
│   ├── examples_aokvqa_val.json
│   └── Untitled.ipynb
├── ckpts
│   └── epoch_6.pkl
├── CLIP
│   ├── clip
│   │   ├── bpe_simple_vocab_16e6.txt.gz
│   │   ├── clip.py
│   │   ├── __init__.py
│   │   ├── model.py
│   │   └── simple_tokenizer.py
│   ├── CLIP.png
│   ├── data
│   │   ├── country211.md
│   │   ├── prompts.md
│   │   ├── rendered-sst2.md
│   │   └── yfcc100m.md
│   ├── hubconf.py
│   ├── LICENSE
│   ├── MANIFEST.in
│   ├── model-card.md
│   ├── notebooks
│   │   ├── Interacting_with_CLIP.ipynb
│   │   └── Prompt_Engineering_for_ImageNet.ipynb
│   ├── README.md
│   ├── requirements.txt
│   ├── setup.py
│   └── tests
│       └── test_consistency.py
├── configs
│   ├── finetune.yml
│   ├── path_cfgs.py
│   ├── pretrain.yml
│   ├── prompt.yml
│   ├── __pycache__
│   │   ├── path_cfgs.cpython-39.pyc
│   │   ├── task_cfgs.cpython-39.pyc
│   │   └── task_to_split.cpython-39.pyc
│   ├── task_cfgs.py
│   └── task_to_split.py
├── datasets
│   ├── aokvqa
│   │   ├── aokvqa_v1p0_test.json
│   │   ├── aokvqa_v1p0_train.json
│   │   └── aokvqa_v1p0_val.json
│   ├── coco2014
│   │   ├── train2014
│   │   ├── train2014.zip
│   │   └── val2014
│   ├── coco2014_feats
│   │   ├── train2014
│   │   ├── train2014.zip
│   │   ├── val2014
│   │   └── val2014.zip
│   ├── coco2017
│   ├── coco2017_feats
│   ├── datasets.zip
│   ├── okvqa
│   │   ├── mscoco_train2014_annotations.json
│   │   ├── mscoco_val2014_annotations.json
│   │   ├── OpenEnded_mscoco_train2014_questions.json
│   │   └── OpenEnded_mscoco_val2014_questions.json
│   ├── old_data
│   │   ├── coco2014
│   │   └── coco2014_feats
│   ├── Untitled.ipynb
│   └── vqav2
│       ├── v2_mscoco_train2014_annotations.json
│       ├── v2_mscoco_val2014_annotations.json
│       ├── v2_OpenEnded_mscoco_train2014_questions.json
│       ├── v2_OpenEnded_mscoco_val2014_questions.json
│       ├── v2valvg_no_ok_annotations.json
│       ├── v2valvg_no_ok_questions.json
│       ├── vg_annotations.json
│       └── vg_questions.json
├── environment.yml
├── evaluation
│   ├── ans_punct.py
│   ├── aok_utils
│   │   ├── eval_predictions.py
│   │   ├── load_aokvqa.py
│   │   ├── __pycache__
│   │   └── remap_predictions.py
│   ├── aokvqa_evaluate.py
│   ├── okvqa_evaluate.py
│   ├── __pycache__
│   │   ├── ans_punct.cpython-39.pyc
│   │   ├── aokvqa_evaluate.cpython-39.pyc
│   │   └── okvqa_evaluate.cpython-39.pyc
│   └── vqa_utils
│       ├── __pycache__
│       ├── vqaEval.py
│       └── vqa.py
├── LICENSE
├── main.py
├── misc
│   ├── framework.png
│   └── tree.txt
├── outputs
│   ├── ckpts
│   │   ├── okvqa_finetune_1
│   │   ├── okvqa_heuristics_1
│   │   └── okvqa_pretrain_1
│   ├── logs
│   │   ├── okvqa_finetune_1
│   │   └── okvqa_pretrain_1
│   └── results
│       ├── okvqa_finetune_1
│       └── okvqa_heuristics_1
├── preds
├── prophet
│   ├── __init__.py
│   ├── __pycache__
│   │   └── __init__.cpython-39.pyc
│   ├── stage1
│   │   ├── finetune.py
│   │   ├── heuristics.py
│   │   ├── model
│   │   ├── pretrain.py
│   │   ├── __pycache__
│   │   └── utils
│   └── stage2
│       ├── prompt.py
│       └── utils
├── README.md
├── scripts
│   ├── evaluate_file.sh
│   ├── evaluate_model.sh
│   ├── extract_img_feats.sh
│   ├── finetune.sh
│   ├── heuristics_gen.sh
│   ├── pretrain.sh
│   └── prompt.sh
├── --task
├── tools
│   ├── extract_img_feats.py
│   ├── __pycache__
│   │   └── transforms.cpython-39.pyc
│   └── transforms.py
└── Untitled.ipynb

skip step 1 and go directly to step 2

Step 1 takes a long time. You mentioned in the introduction that we can skip step 1 and go directly to step 2 using the answer_aware_examples_okvqa.json and candidates_okvqa.json files you provided, right?

OpenAI's apikey

I can't call the OpenAI API with my API key on the rented server. Is there any workaround? Thank you.


During stage 1 training (pretraining, finetuning, and candidate answer generation) I get the same error: OSError: We couldn't connect to 'https://huggingface.co' to load this file

When I run the official pretraining, finetuning, and candidate-answer-generation commands for stage 1, I get the same error:
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-large-cased is not the path to a directory containing a file named config.json
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode
After adding the following to main.py:
TRANSFORMERS_OFFLINE=1  # so it can run offline
a new error appears:
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104,'Connection reset by peer'))
How can I solve this?

Accuracy does not increase

I trained on a custom dataset. During training, the model loss decreased but the accuracy remained at zero. (Screenshot attached.)

Are pretraining the MCAN model and finetuning on OK-VQA done together? Shouldn't MCAN be pretrained first and then finetuned?

At this stage, we train an improved MCAN model through pretraning on VQA v2 and finetuning on target dataset. Take OK-VQA for example, run pretraining step with commands:

$ bash scripts/pretrain.sh --task ok --version okvqa_pretrain_1 --gpu 0

Are pretraining the MCAN model and finetuning on OK-VQA done together? Shouldn't MCAN be pretrained first and then finetuned?
In the script above, the task is ok. Does that mean MCAN pretraining has already finished and finetuning is then done on OK-VQA? Or are pretraining and finetuning executed together?
Shouldn't there be a separate script that pretrains MCAN, saves a checkpoint, and provides it for download, before finetuning on OK-VQA?

When I run the stage 2 command, it reports an error connecting to OpenAI. What could be the reason?

Loaded dataset size: 9009, top10 accuracy: 91.81, top1 accuracy: 86.54
Loaded dataset size: 5046, top10 accuracy: 79.83, top1 accuracy: 53.05

Working... 0/5046 0:06:31 <class 'openai.error.APIConnectionError'> Error communicating with OpenAI
retrying...

50GB memory?

"To conduct the following experiments, a machine with at least 1 RTX 3090 GPU, 50GB memory", wherein "50GB memory" refers to Memory for CPU or GPU?

The process of image captioning

Is there code for the image captioning process? I found a "captions_okvqa.json" file about image captions in assets, but I could not find the code that generates this file.

Replacing GPT-3 with other academic LLMs

Thank you so much for your excellent work!

I have a minor question about the LLM selection. Have you tried other academic LLMs, e.g., LLaMA, in place of GPT-3? Would it make a big performance difference? Thanks!

Best regards

Naive question on OK-VQA and A-OKVQA evaluation.

Hi @ParadoxZW @MIL-VLG, thanks for your great project.

I am not very familiar with OK-VQA and A-OKVQA evaluation. Here are some naive questions:

  • OK-VQA and A-OKVQA have an open-ended QA setting. For each question, it has ~10 gt answers (although some answers are the same). Do you use exact match (vqav2-style, match at least 3 gt answers) to compute the accuracy?
  • Is it common to train on A-OKVQA train+val and conduct inference on A-OKVQA test?

Trained model

Can we use the model you have already trained from existing code?

assets

May I ask whether the files in the assets folder need to be created by me? If they were generated by code, could you point me to that code? Thank you.

During stage 1 training (pretraining, finetuning, and candidate answer generation), I get the same error: TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType. How can I solve this?

Loading common data...
== Total image number: 123287
Traceback (most recent call last):
  File "/root/autodl-tmp/prophet/main.py", line 40, in <module>
    runner.run()
  File "/root/autodl-tmp/prophet/prophet/stage1/pretrain.py", line 160, in run
    common_data = CommonData(self.__C)
  File "/root/autodl-tmp/prophet/prophet/stage1/utils/load_data.py", line 55, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(__C.BERT_VERSION)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 676, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
    return cls._from_pretrained(
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1834, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1959, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/models/bert/tokenization_bert.py", line 213, in __init__
    if not os.path.isfile(vocab_file):
  File "/root/miniconda3/envs/prophet/lib/python3.9/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

okvqa-stage1-pretrain

When I pretrain the MCAN model on OK-VQA, it errors with:
raise LocalEntryNotFoundError( huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.
During handling of the above exception, another exception occurred:
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-large-uncased is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

So do I need to download bert-large-uncased online first, and then run the code offline?


mcan_530_okvqa.json

Excuse me, the JSON file generated after I ran finetune.sh contains 10096 entries, while the mcan_530_okvqa.json you provided has only 5048.
Also, when I evaluate, the accuracy is only about 50%. Why is that? Is my workflow wrong? I used the pretrained model you provided.

Could you provide a finetuned model for A-OKVQA dataset?

Hi, I am quite interested in your nice work and happy to see the code has been released!
A finetuned model for OK-VQA has been provided. It works.
I want to try your method on another dataset, A-OKVQA, but I can't find the finetuned checkpoint.
So could you provide the finetuned model for A-OKVQA dataset? Thanks!

OpenAI API Cost

I want to know the cost of the whole process of using the OpenAI API, or the cost of a single test run of the model.
I'm afraid I can't afford the expense of my experiments.
An approximate cost is enough. I hope I can get an answer, thank you very much.

Checkpoints Availability

Hello! I was wondering when/if the checkpoints for Prophet would be made publicly available? Thanks in advance :)

Hello, I have a question about the captions

Can this multimodal model be understood as converting the image content into text, via captions and answer heuristics, so that the two modalities can interact?
The captions are produced by an off-the-shelf captioning model that translates the image into text. Doesn't that make this external image-to-text model the key to the multimodal interaction, and thus the deciding factor for the whole method? If that external model performs poorly, the noise would be large and calling GPT would not help much, right?

The first candidate answer of your provided candidates_okvqa.json in assets.zip

Thank you very much for providing the code. I computed the accuracy of the first candidate answer on the OK-VQA val set using the candidates_okvqa.json from the assets.zip you provided. The code I ran is below. It turns out that the accuracy is 47.06 instead of 53. Did I do something wrong?

import json

#load data
with open('candidates_okvqa.json') as f:
    answer_candidates = json.load(f)
with open('mscoco_val2014_annotations.json') as f:
    val_datasets_annotations = json.load(f)['annotations']

#organize answer list
val_datasets = []
for val_a in val_datasets_annotations:
    multi_answers = []
    for ans in val_a['answers']:
        multi_answers.append(ans['raw_answer'])
    row = {'question_id': val_a['question_id'], 'direct_answers': multi_answers}
    val_datasets.append(row)

#compute score for a predicted answer
def direct_scores(pred_answer, direct_answers):
    acc_num = 0
    cnt = 0
    for _, answer_id in enumerate(direct_answers):
        if pred_answer == answer_id:
            cnt += 1
    if cnt ==1:
        acc_num = 0.3
    elif cnt == 2:
        acc_num = 0.6
    elif cnt > 2:
        acc_num = 1
    return acc_num

#Calculate the accuracy of the first candidate answer for all samples
acc = 0.0
for single_sample in val_datasets:
    single_sample['DA_candidate'] = [each_answer['answer'] for each_answer in answer_candidates[str(single_sample['question_id'])]]
    score = []
    for i in single_sample['DA_candidate']:
        score.append(direct_scores(i, single_sample['direct_answers']))
    acc += score[0]
print(acc/len(val_datasets))

Looking forward to your reply.

Prerequisites Questions

Dear author, when I ran "conda env create -f environment.yml", an error occurred like this: (screenshot attached)
Is it correct to delete the "@v1.0"?
I hope you can help me to answer this question, thank you very much.
