retrieval-augmented-visual-question-answering's People

Contributors

erichen0615, linweizhedragon


retrieval-augmented-visual-question-answering's Issues

DPR problem

Hello, I have a question about DPR. If I want to modify the input captions and object-detection labels, how can I use these new inputs to update the passages that DPR retrieves? I tried to find the answer in the code myself, but this part of the code is complex and hard to follow. Could you give me some hints?

Question: minimum hardware requirements to run experiments

The repository provides no information about the minimum hardware requirements needed to run the project. Even the original paper gives no details about the NVIDIA A100 cluster beyond mentioning that an A100 cluster was used, which is not very helpful.

Please specify the minimum hardware requirements. Also, does this work on a single RTX 3090? Since a single Hugging Face transformer is supported, one could technically run a much smaller model such as LLaMA-2-7B to generate the response, but I am unsure whether the initial object detection, caption generation, and OCR generation can be done on a single RTX GPU, and if not, why not.

RAVQA-v2 code

Hello, thank you for your excellent work! May I ask when the code for "Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering" will be released?

I loaded the final test checkpoint file you shared for RAVQA, but it reports an error. How can I solve this?

RuntimeError: Error(s) in loading state_dict for RagExecutor:
size mismatch for model.generator.shared.weight: copying a param with shape torch.Size([32110, 1024]) from checkpoint, the shape in current model is torch.Size([30532, 1024]).
size mismatch for model.generator.encoder.embed_tokens.weight: copying a param with shape torch.Size([32110, 1024]) from checkpoint, the shape in current model is torch.Size([30532, 1024]).
size mismatch for model.generator.decoder.embed_tokens.weight: copying a param with shape torch.Size([32110, 1024]) from checkpoint, the shape in current model is torch.Size([30532, 1024]).
size mismatch for model.generator.lm_head.weight: copying a param with shape torch.Size([32110, 1024]) from checkpoint, the shape in current model is torch.Size([30532, 1024]).
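
(Not an official answer, just a hedged note: shape mismatches like 32110 vs. 30532 usually mean the checkpoint was saved after the tokenizer had extra special tokens added, so the generator's embedding matrices are larger than those of a freshly initialised model. A minimal sketch, assuming the same extended decoder tokenizer is available; executor, decoder_tokenizer, and the checkpoint path are placeholder names, not the repository's actual variables.)

    import torch

    # Hedged sketch: resize the generator's embeddings to the extended tokenizer's
    # vocabulary size before loading the checkpoint, so the shapes match.
    checkpoint = torch.load("path/to/ravqa_checkpoint.ckpt", map_location="cpu")  # placeholder path
    executor.model.generator.resize_token_embeddings(len(decoder_tokenizer))
    executor.load_state_dict(checkpoint["state_dict"])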

Release code of Fine-grained Late-interaction Multi-modal Retrieval (FLMR)

Hello,
First of all, I want to express my gratitude for your interesting paper: Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering.

After thoroughly reviewing your paper, I am eager to delve deeper into the methodologies and experiments you have presented. To this end, I am particularly interested in the code that underpins your research, as it would greatly aid in understanding and potentially replicating your groundbreaking work.

I understand the complexities and time constraints associated with preparing code for public release, yet I believe that access to the code would not only benefit me but also contribute significantly to the broader academic community. It would enable a more comprehensive understanding of your work and foster further research in the field.

Thank you for your consideration and for the remarkable contributions your research has made to the field. I look forward to the possibility of exploring your work more deeply and the release of the code.

Questions about feature-based vision in FLMR

[Screenshot 2024-01-04 111647 attached]
Hello, I have recently been trying to reproduce the FLMR code from the paper. I noticed that Table 2 of the paper uses feature-based vision. Do these visual features refer to features of the whole image extracted with ViT? And have these features been aligned with the text?

Some bug when processing img_key

When generating VinVL features, in file: prepare_data_for_okvqa.py

The resulting img_key is ''.

img_key = img_p.split('.')[0].split('_')[-1] should be img_key = str(imgId).zfill(12)

The same applies to ocr.py.
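
(A hedged illustration of why the original expression can produce an empty key: if img_p is a relative path that starts with a dot, the first split on '.' yields an empty string. The path below is only an assumed example; building the key from the COCO image id imgId, as suggested above, sidesteps the issue.)

    # Assumed example path; this only illustrates the failure mode described above.
    img_p = "./val2014/COCO_val2014_000000123456.jpg"
    imgId = 123456

    broken_key = img_p.split('.')[0].split('_')[-1]  # '' -- the leading '.' makes the first field empty
    fixed_key = str(imgId).zfill(12)                 # '000000123456'
    print(repr(broken_key), fixed_key)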

Missing file okvqa/train_okvqa.yaml and okvqa/test_okvqa.yaml

Thanks for your paper and the shared code!

When I tried to generate the VinVL object features, I ran the following script:

python tools/test_sg_net.py \
    --config-file sgg_configs/vgattr/vinvl_x152c4_okvqa_testset.yaml  \
    TEST.IMS_PER_BATCH 8  \
    MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth  \
    MODEL.ROI_HEADS.NMS_FILTER 1  \
    MODEL.ROI_HEADS.SCORE_THRESH 0.2  \
    DATA_DIR "./datasets/"  \
    TEST.IGNORE_BOX_REGRESSION True  \
    MODEL.ATTRIBUTE_ON True  \
    TEST.OUTPUT_FEATURE True

But it could not find the file okvqa/train_okvqa.yaml or okvqa/test_okvqa.yaml.

I checked the vinvl_x152c4_okvqa_testset.yaml file, and the TRAIN and TEST dataset yaml files referenced in its DATASETS section could not be found in the pre-processed data folder.

Could you tell me where to find those files, or whether I did something wrong?

Thanks!

No results in the evaluation output when training RAVQA

Below is the output produced during training:
ace <---> ( [SEP] [SEP] [SEP])
vine <---> ( [SEP] [SEP] [SEP])
stuffed animal <---> ( [SEP] [SEP] [SEP])
mouth <---> ( [SEP] [SEP] [SEP])
Below are the evaluation results produced during training:
{'test/accuracy_AnswerType_other': 0.04,
'test/accuracy_QuestionType_eight': 0.0,
'test/accuracy_QuestionType_five': 0.12,
'test/accuracy_QuestionType_four': 0.0,
'test/accuracy_QuestionType_nine': 0.0,
'test/accuracy_QuestionType_one': 0.0,
'test/accuracy_QuestionType_other': 0.19,
'test/accuracy_QuestionType_seven': 0.0,
'test/accuracy_QuestionType_six': 0.0,
'test/accuracy_QuestionType_ten': 0.0,
'test/accuracy_QuestionType_three': 0.0,
'test/accuracy_QuestionType_two': 0.0,
'test/accuracy_overall': 0.04,
'test/epoch': 3,
'test/exact_match_at_1': 0.0,
'test/exact_match_at_2': 0.00019817677368212446,
'test/exact_match_at_3': 0.00019817677368212446,
'test/exact_match_at_4': 0.00019817677368212446,
'test/exact_match_at_5': 0.00019817677368212446,
'test/failed_hit': 0.0014665081252477209,
'test/failed_no_hit': 0.9984542211652794,
'test/gold_precision_at_5': 0.42663495838287757,
'test/gold_recall_at_5': 0.7201743955608403,
'test/n_retrieved_docs': 5,
'test/precision_at_5': 0.5906856916369402,
'test/recall_at_5': 0.8563218390804598,
'test/selected_failed_hit': 0.0011890606420927466,
'test/selected_failed_no_hit': 0.9988109393579072,
'test/selected_successful_hit': 0.0,
'test/selected_successful_no_hit': 0.0,
'test/successful_hit': 3.963535473642489e-05,
'test/successful_no_hit': 3.963535473642489e-05}

I suspect the code below may be the problem: it looks as if the predicted results are being truncated. Do I need to modify the code?
rag_executor.py:
    bos_token_id = self.data_loader.decoder_tokenizer.bos_token_id  # bos_token_id = 30531
    for index, i in enumerate(labels):
        cleaned_i = [label if label != -100 else self.decoder_tokenizer.pad_token_id for label in i]
        cleaned_i = torch.LongTensor(cleaned_i)
        decoded_label = self.decoder_tokenizer.decode(cleaned_i, skip_special_tokens=True)

        output_sequence = outputs[index]
        # print('output_sequence', output_sequence)
        cleaned_i = [label if label != -100 else self.decoder_tokenizer.pad_token_id for label in output_sequence]
        cleaned_i = torch.LongTensor(cleaned_i)

        output_sequence = output_sequence.cpu().numpy().tolist()

        if bos_token_id in output_sequence:
            output_sequence = output_sequence[output_sequence.index(bos_token_id):]

        decoded_output = self.decoder_tokenizer.decode(output_sequence, skip_special_tokens=True)
        actual_output = self.decoder_tokenizer.decode(output_sequence, skip_special_tokens=False)
        # print(self.tokenizer.decode(cleaned_i, skip_special_tokens=True))

        if batch_idx < 10:
            print(decoded_label, '<--->', decoded_output, '   ({})'.format(actual_output))

wandb connection problem

Hello, last week I was able to run this program normally, but this week something went wrong: it keeps throwing an error and cannot reach api.wandb.ai. I want to try setting wandb to offline mode or simply not using wandb, but the information I found online did not solve my problem. Could you tell me which part of the code I should modify? Thank you very much.
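
(Not from the repository, just a hedged pointer: with the standard wandb Python client, logging can be forced offline or disabled entirely via an environment variable set before training starts, which avoids any connection to api.wandb.ai.)

    import os

    # "offline" logs locally (runs can be synced later with `wandb sync`);
    # "disabled" turns wandb into a no-op.
    os.environ["WANDB_MODE"] = "offline"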

Model download problem

Hello, I would like to use Oscar to generate captions. However, I am unable to access the link you provided, possibly due to Azure permission issues. Unfortunately, it is currently not possible to obtain a free Azure account in China. Are there any other links for downloading the pre-trained Oscar model? Alternatively, would it be possible for you to upload the model to a different cloud storage platform?

Missing maskrcnn_benchmark.data module

Thanks for your paper and the shared code!

When I tried to generate the VinVL object features, I ran the following script:

python tools/test_sg_net.py \
    --config-file sgg_configs/vgattr/vinvl_x152c4_okvqa_testset.yaml  \
    TEST.IMS_PER_BATCH 8  \
    MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth  \
    MODEL.ROI_HEADS.NMS_FILTER 1  \
    MODEL.ROI_HEADS.SCORE_THRESH 0.2  \
    DATA_DIR "./datasets/"  \
    TEST.IGNORE_BOX_REGRESSION True  \
    MODEL.ATTRIBUTE_ON True  \
    TEST.OUTPUT_FEATURE True

But it failed with an error:

Traceback (most recent call last):
  File "tools/test_sg_net.py", line 14, in <module>
    from maskrcnn_benchmark.data import make_data_loader
ModuleNotFoundError: No module named 'maskrcnn_benchmark.data'

When I checked the materials/scene_graph_benchmark folder, there was no data module.

Could you tell me how to solve this problem? Thanks!

About image features

Hello!

I am wondering whether you have worked with pre-processed image features for this task before, and whether you know how the model performs with image features.

Thank you very much!

some questions about implementing RAG.

I noticed that there is code related to RAG-sequence marginalization in your codebase, but you didn't provide the execution command for the RAG model in your README.md. Based on my understanding, if we want to implement the RAG model using your code, we may replace seq_logprobs with rag_logprobs to compute the NLL Loss, and use this loss as the model's loss function for backpropagation. I'm curious about how the RAG model is implemented in your code, and if my understanding is incorrect, could you explain the specific implementation approach?
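
(A minimal sketch, not the repository's actual code, of the RAG-sequence marginalisation the question refers to: the per-document sequence log-probabilities are combined with the retriever's document log-probabilities before the NLL is taken, so the gradient reaches the retriever as well as the generator. Tensor names and shapes are assumptions for illustration.)

    import torch

    def rag_sequence_nll(seq_logprobs: torch.Tensor, doc_logprobs: torch.Tensor) -> torch.Tensor:
        # seq_logprobs: (batch, n_docs) -- log p(y | x, z_k), summed over target tokens
        # doc_logprobs: (batch, n_docs) -- log p(z_k | x) from the retriever
        # log p(y | x) = logsumexp_k [ log p(z_k | x) + log p(y | x, z_k) ]
        rag_logprobs = torch.logsumexp(doc_logprobs + seq_logprobs, dim=-1)
        return -rag_logprobs.mean()
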
One last question: I have noticed that RAG, TRiG, and RAVQA are quite similar but differ in some aspects. TRiG does not jointly train the retriever and generator, while RAG and RAVQA do. However, they use different loss functions because they address different types of problems. I am not sure whether my understanding is correct, so I would like to ask for your opinion.

Question about using BLIP2 as the answer generator

Thanks for your valuable work!
I am investigating RA-VQA and studying your previous work, Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering. In that paper, you mention that you use BLIP2 as your answer generator. I wonder whether you feed the whole image into BLIP2 to generate the answer?
Looking forward to your reply.

Training RAVQA

I loaded the DPR checkpoint file you provided and then trained the RAVQA framework.
I found that during training, the decoded generation outputs look like the following. Is this normal?
['[PAD] [unused2] [unused8] [unused25] [unused8] [unused915] black [unused14]',
'[PAD] [unused2] [unused8]imating red white blue [unused377] [unused76]',
'[PAD] [unused2] [unused8] [unused2] [unused110] [unused2] [unused74] [unused2] [unused25] [unused2]',
'[PAD] [unused2] [unused8]imating [unused2] [unused8]imating [unused2] [unused8]imating',
'[PAD] [unused2] [unused5] [unused5] [unused5] [unused5] [unused5] [unused5] [unused5] [unused5]']
Why does the output contain garbled tokens such as [unused2]?

vinvl_large

Hello, may I ask where I can download the "vinvl_large" model mentioned in Step 4: Running models?

Download issue

Hello, your work is excellent and I am very interested in it. However, I keep running into failures when downloading the 46 GB data.zip. Could you provide a Baidu Netdisk link? Thank you very much!

Does the retriever support Chinese?

Dear authors, for the DPR component, can I switch to training on a Chinese dataset and then retrieve Chinese passages?

Question about updating the DPR retrieval results

Hello, I have obtained better image-to-text conversion results using other caption and object detection models. I replaced the original files with the new ones, deleted the cached files, and retrained the model. I originally assumed DPR would retrieve better passages thanks to the better captions and labels, but after reading the code carefully I noticed that LoadPretrainedDPROutputForGoogleSearchPassage (which I believe is the function that loads the question-related passages) reads the file ../Experiments/Knowledge_Retriever_DPR_dim_768_inbatch_negative_caption_FullCorpus_NewRun/test/test_evaluation/train_predictions.json. As long as this file is unchanged, the passages retrieved by DPR will be the same as before I changed the caption and object detection models. Is my understanding correct? Presumably I need to re-run the DPR test so that it regenerates this file, but then I hit another problem: the test needs to load a ckpt file, and I cannot find a matching file among the pretrained DPR files you provided.
Please help me clear up this confusion; it is very important to me. Thank you.

Question about infoseek wikipedia chunking and results split

Hi Lin, your series of works (RA-VQA, FLMR, PreFLMR) has made great contributions to the field of KB-VQA, which is really impressive!

Recently I have taken a special interest in the InfoSeek task. After reading the papers with due care, I still have two questions about the details and wonder if you could kindly help:

  1. What text preprocessing did you use for the Wikipedia articles (accompanying the InfoSeek task), especially for chunking?
    In the released knowledge base of InfoSeek, the Wikipedia articles are rather long. Did you do any chunking to divide a Wikipedia article into multiple candidate passages?
    And what prompt do you use for generating an answer based on the retrieved passage?
  2. In PreFLMR (https://arxiv.org/abs/2402.08327) Table 7, which split of InfoSeek does the result belong to?
    The test/human split of InfoSeek does not seem to have been released; does the result in the paper belong to the val split, or to the M2KR-subsampled InfoSeek split?

Thanks in advance!

[ERROR] - __main__ : Uncaught exception: <class 'lightning_lite.utilities.exceptions.MisconfigurationException'> --> The provided lr scheduler `LambdaLR` doesn't follow PyTorch's LRScheduler API. You should override the `LightningModule.lr_scheduler_step` hook with your own logic if you are using a custom LR scheduler.

[ERROR] - __main__ : Uncaught exception: <class 'lightning_lite.utilities.exceptions.MisconfigurationException'> --> The provided lr scheduler LambdaLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler.
Traceback (most recent call last):
  File "/home/zzu_zxw/zjl_data/Retrieval-Augmented-Visual-Question-Answering/src/main.py", line 341, in <module>
    main(config)
  File "/home/zzu_zxw/zjl_data/Retrieval-Augmented-Visual-Question-Answering/src/main.py", line 162, in main
    trainer.fit(
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 582, in fit
    call._call_and_handle_interrupt(
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 624, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1042, in _run
    self.strategy.setup(self)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 182, in setup
    self.setup_optimizers(trainer)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 142, in setup_optimizers
    self.optimizers, self.lr_scheduler_configs, self.optimizer_frequencies = _init_optimizers_and_lr_schedulers(
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 195, in _init_optimizers_and_lr_schedulers
    _validate_scheduler_api(lr_scheduler_configs, model)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 352, in _validate_scheduler_api
    raise MisconfigurationException(
lightning_lite.utilities.exceptions.MisconfigurationException: The provided lr scheduler LambdaLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler.
wandb: Network error (TransientError), entering retry loop.
wandb: 🚀 View run OKVQA_DPR_FullCorpus at: https://wandb.ai/ravqa/VQA_publication/runs/7bnwi9tl
wandb: Synced 3 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230912_100635-7bnwi9tl/logs
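
(A hedged sketch of the workaround the error message itself suggests, assuming pytorch_lightning 1.8.x as in the traceback; the class below is only illustrative, not the repository's executor.)

    import pytorch_lightning as pl

    class ExecutorWithCustomScheduler(pl.LightningModule):
        def lr_scheduler_step(self, scheduler, optimizer_idx, metric):
            # Step the custom scheduler manually; only schedulers such as
            # ReduceLROnPlateau need the monitored metric.
            if metric is None:
                scheduler.step()
            else:
                scheduler.step(metric)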

Using BLIP2 to generate answers

Hello, roughly how much GPU memory is needed to generate answers with BLIP2? I am running BLIP2 on an A6000 with 48 GB and have set num_knowledge_passages_in_training to 1, but it still fails to run. Could some of my settings be wrong?

The hardware requirements for this model.

My two NVIDIA TITAN X GPUs with 12GB memory each can't handle this model. I'm wondering what hardware configuration others are using to run this model successfully.

Didn't find dpr_training_annotations

Hi, when I tried to train the DPR model, I could not find the dpr_training_annotations files in the repo.

The config in the jsonnet is

local dpr_training_annotations = {
  "train": "../data/ok-vqa/pre-extracted_features/passages/retriever_train.json",
  "valid": "../data/ok-vqa/pre-extracted_features/passages/retriever_testdev.json",
  "test": "../data/ok-vqa/pre-extracted_features/passages/retriever_test.json",
};

I could not find these three JSON files.

Thanks!

performance using GPT-3 pretrain

Thank you for your great work.
I wonder how RA-VQA would perform if GPT-3 were used as the pretrained model for answer generation.

Question about release of infoseek data split

Hi Lin, sorry to trouble you!

Recently I have taken a special interest in the InfoSeek task. I noticed that you used a different train set (downsampled from the InfoSeek train set) and val/test sets (sampled from the InfoSeek val set), and that the InfoSeek results in PreFLMR Table 7 are reported on this train/val/test split (issue #36).

However, I could not find this split of the InfoSeek dataset in the M2KR Hugging Face repo. For a fair comparison on the InfoSeek task, could you please share it? And, as a quick check, are you using infoseek_eval to evaluate on your test split?

Much thanks!

ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds

Traceback (most recent call last):
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 654, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1274, in _run_train
    self._run_sanity_check()
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1342, in _run_sanity_check
    val_loop.run()
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 143, in advance
    output = self._evaluation_step(**kwargs)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 240, in _evaluation_step
    output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1703, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 370, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/home/zzu_zxw/zjl_data/Retrieval-Augmented-Visual-Question-Answering/src/trainers/rag_executor.py", line 168, in validation_step
    return self._generative_step(sample_batched, batch_idx)
  File "/home/zzu_zxw/zjl_data/Retrieval-Augmented-Visual-Question-Answering/src/trainers/rag_executor.py", line 199, in _generative_step
    generation_outputs = self.model.generate(**test_batch)
  File "/home/zzu_zxw/zjl_data/Retrieval-Augmented-Visual-Question-Answering/src/models/rag/rag_model.py", line 406, in generate
    generation_outputs = self.generator.generate(**test_batch)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/transformers/generation_utils.py", line 990, in generate
    return self.greedy_search(
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/transformers/generation_utils.py", line 1292, in greedy_search
    outputs = self(
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1399, in forward
    decoder_outputs = self.decoder(
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 900, in forward
    raise ValueError(f"You have to specify either {err_msg_prefix}input_ids or {err_msg_prefix}inputs_embeds")
ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds

Inquiry Regarding Your Awesome work "Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering"

Hello,
We have recently been trying to reproduce the "Vision-Language Alignment" component explored in your paper "Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering".

Could you kindly tell us which specific fields of the WIT (Wikipedia-based Image Text) dataset were used in your research? We are also curious about the volume of data used for training.

If permissible, we would be immensely grateful for the opportunity to access the relevant portion of your code or model weights (the mapping network F_M of the DPR baseline and FLMR). This would significantly aid us in advancing our research, and we assure you that all due credit and references to your groundbreaking work will be prominently acknowledged in any of our subsequent publications or presentations in this field.

We understand the value of your work and appreciate any assistance you can provide. Thank you very much for considering our request.

Question about the retriever used in TRiG in the code

Hello, I have recently been trying to improve on the DPR retriever used in the TRiG framework provided in your code. I would like to first obtain the top-100 passages retrieved by DPR, rerank them with my own reranking module, and then feed the top 5 passages back into TRiG to generate answers. Could you give me a hint about which part of the code produces the top-k retrieved passages for each question?
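
(A generic sketch of the rerank step described above, not tied to the repository's APIs: take the DPR top-100 passages, re-score them with a custom reranker, and keep the top 5 for the generator. The reranker object and its score(question, passage) method are hypothetical.)

    def rerank_top_k(question, passages, reranker, k=5):
        # passages: list of (passage_text, dpr_score) pairs for the DPR top-100 hits
        rescored = [(text, reranker.score(question, text)) for text, _ in passages]
        rescored.sort(key=lambda pair: pair[1], reverse=True)
        return [text for text, _ in rescored[:k]]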
