
pmc-vqa's Introduction

PMC-VQA

The official code for PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering.


We propose a generative model for medical visual understanding that aligns visual information from a pre-trained vision encoder with a large language model, and we establish a scalable pipeline to construct PMC-VQA, a large-scale medical visual question-answering dataset containing 227k VQA pairs over 149k images that cover a wide range of modalities and diseases.

The dataset is available on Hugging Face.

The model checkpoints are available at MedVInT-TE and MedVInT-TD. The previous MedVInT-TD checkpoint had been uploaded by mistake; we fixed the issue and updated the checkpoint on July 31, so the correct, improved version is now available.

Usage

1. Create Environment

Please refer to https://github.com/chaoyi-wu/PMC-LLaMA to set up the environment.

2. Prepare Dataset

Download the dataset from Hugging Face and save it into `./PMC-VQA`.
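For example, the whole dataset repository can be mirrored locally with `huggingface_hub` (a minimal sketch; it assumes the dataset id `xmcmic/PMC-VQA`, the id that appears in the issues below):

```python
# Sketch: mirror the PMC-VQA dataset repo into the directory this codebase expects.
# Assumes the Hugging Face dataset id "xmcmic/PMC-VQA"; adjust if the id differs.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xmcmic/PMC-VQA",
    repo_type="dataset",
    local_dir="./PMC-VQA",
)
```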

3. Model Checkpoints

Download the pre-trained MedVInT-TE checkpoint and save it into `./src/MedVInT_TE/Results`.

Download the pre-trained MedVInT-TD checkpoint and save it into `./src/MedVInT_TD/Results`.
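If you prefer to script this step, a `snapshot_download` call like the following works as well (a sketch only; the repo ids below are placeholders, so substitute the actual ids from the MedVInT-TE and MedVInT-TD links above):

```python
# Sketch: place the released checkpoints where the training/eval code expects them.
# The repo ids are placeholders, not real ids -- copy the actual ones from the
# MedVInT-TE / MedVInT-TD links above.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="<MedVInT-TE-repo-id>", local_dir="./src/MedVInT_TE/Results")
snapshot_download(repo_id="<MedVInT-TD-repo-id>", local_dir="./src/MedVInT_TD/Results")
```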

See MedVInT_TE and MedVInT_TD for details on training each model.

Acknowledgement

CLIP -- https://github.com/openai/CLIP

PMC-CLIP -- https://github.com/WeixiongLin/PMC-CLIP

PMC-LLaMA -- https://github.com/zphang/minimal-llama

LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971

We thank the authors for their open-sourced code and encourage users to cite their works when applicable.

Contribution

Please raise an issue if you need help; any contributions are welcome.

Citation

If you use this code or our pre-trained weights for your research, please cite our paper:

```
@article{zhang2023pmcvqa,
  title={PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering},
  author={Xiaoman Zhang and Chaoyi Wu and Ziheng Zhao and Weixiong Lin and Ya Zhang and Yanfeng Wang and Weidi Xie},
  journal={arXiv preprint arXiv:2305.10415},
  year={2023},
}
```


pmc-vqa's Issues

The checkpoint of the blank MedVInT model fails to load

Thanks for your excellent work!
I wanted to replicate your paper's results on the VQA-RAD dataset. When I ran train_downstream.py and tried to load VQA_lora_PMC_LLaMA_PMCCLIP/blank/checkpoint-1382/pytorch_model.bin, loading the LLaMA model parameters failed with an error saying the parameter key-value pairs did not match.
I downloaded the PMC-LLaMA model from https://huggingface.co/chaoyi-wu/PMC_LLAMA_7B and set up the corresponding loading paths. What is the solution?

With best wishes

About the pretrained model used in the building process of PMC-VQA

Thank you for your marvelous work! Would you release more information about the process used to build the PMC-VQA dataset in the future, such as the LLaMA-7B model trained on text data and fine-tuned with the 2k manually annotated question-answer pairs? Thank you very much; any reply would be appreciated.

About the pre-trained models of VQA-Rad and Slake

Thanks for your marvelous work! Could you please consider releasing the pre-trained models for VQA-RAD and Slake? I encountered an issue in './src/MedVInT_TD/test_VQA_RAD.py': I couldn't find the released model './Results/QA_no_pretrain_no_aug/VQA_RAD/checkpoint-16128' specified at line 21. Your assistance with this would be greatly appreciated.

About the LLaMA version used in the model

Hi, thank you for your work. May I ask which LLaMA version you used in the model? When I use your weights to evaluate the model, all results are . Could you share the LLaMA weights you used?

How to run as an API for a specific image and task?

Hi, thanks for the great work. When I tried test.py on MedVInT_TD, the dataloader raised a KeyError for `Caption`. I checked the downloaded data and see no column named `Caption`.
I am trying to wrap the model as an API that receives an image and a prompt as input and returns the result, but I am still lost on what `input_ids` is. Could you elaborate on this and on how we could serve the model, perhaps with FastAPI?

The accuracy on the dataset version-2

Hi, thanks for your amazing work.
I see you released an updated dataset with only non-compound images. Could you tell us the accuracy on this new version 2? I assume the accuracy reported in your paper was evaluated on version 1?

Inquiry About Manual Verification Process in PMC-VQA Dataset

Hello,

I would like to express my appreciation for your exceptional work. I am reaching out to inquire about a specific claim made in your paper, which states: “we propose a test set that has undergone manual verification.”

Verification Process Details: Could you please provide a detailed description of the manual verification process? Understanding the steps involved would greatly help us understand the methods used in your study.

Expertise in Diverse Medical Modalities: Considering the PMC-VQA dataset encompasses a variety of medical image modalities, I am curious to know if the verification was conducted by practitioners specialized in the respective medical fields. Does the verification team consist of experts from different medical specialties to ensure the accuracy and reliability of the dataset?

Thank you for your time and assistance.

Open-ended or close-ended?

Hi, it is a great work!
However, I'm confused about whether PMC-VQA is an open-ended or close-ended task. The paper states that it is open-ended, but the dataset is a multiple-choice (classification-style) dataset, and Figure 3 in the paper also depicts classification-based VQA.

Filtering images by type

Hi, thanks for making this public!
I am wondering if there is a way to take subsets of the dataset by image type, like what you show in Figure 1 of the paper.
For example, getting only the question-answer pairs for ultrasonography figures.
Looking at the dataset on Huggingface, I can't see any column that contains figure-type labels; see the rough sketch after this message for the workaround I have in mind.
Thanks!!
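(For concreteness, a rough workaround sketch, not an official split: keyword-match the `Caption` column, assuming the version you downloaded includes that column and loads cleanly:)

```python
# Rough workaround sketch (not an official split): approximate image-type subsets
# by keyword-matching the Caption column. Assumes the downloaded version includes
# a Caption column and that the split loads without schema errors.
from datasets import load_dataset

dataset = load_dataset("xmcmic/PMC-VQA", split="train")
keywords = ("ultrasound", "ultrasonography", "sonogram")  # assumed keywords

ultrasound_subset = dataset.filter(
    lambda row: any(k in str(row["Caption"] or "").lower() for k in keywords)
)
print(len(ultrasound_subset))
```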

More details needed for reproducibility on VQA

Thank you for releasing code and checkpoints to the new state-of-the-art VQA model! Could you please help us reproduce your results by providing some more details and clarifications?

  1. How many epochs are the models trained for on PMC-VQA (and then on the respective benchmarks)?
  2. Which version of VQA-RAD is used? Most available sources cite 2248 QA pairs (including on huggingface), but your paper cites 3515 QA pairs (which perhaps also includes the 1267 framed questions, as described on page 5 of the original paper).
  3. Is the learning rate the same (2e-5) for all models and all parameters, and also for the different datasets (PMC-VQA and then the individual VQA benchmarks)?

Clarification regarding model evaluation

In the test.py file, at line 138:

I understand that `pred = generated_texts[i][-1]` essentially takes the last token generated (which is usually A, B, C, or D) and compares it with the ground truth (which is a few words long, like "MRI", "CT Scan", or "None of the above").

Can the authors please clarify whether that is indeed the case, and whether that would be a fair comparison? Thank you.
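(For illustration, a hypothetical sketch, not the repo's code, of how the ground-truth text could be mapped back to its choice letter so a letter-level comparison becomes apples-to-apples:)

```python
# Hypothetical sketch (not the repo's code): a letter-level comparison seems fair
# only if the ground-truth answer text is first mapped back to its choice letter.
def answer_letter(row):
    """Return the letter whose choice text matches the ground-truth answer."""
    for letter in "ABCD":
        if row[f"Choice {letter}"].strip().lower() == row["Answer"].strip().lower():
            return letter
    return None

row = {
    "Choice A": "MRI", "Choice B": "CT Scan",
    "Choice C": "Ultrasound", "Choice D": "None of the above",
    "Answer": "CT Scan",
}
generated_text = "B"               # model output whose last character is the letter
pred = generated_text[-1]
print(pred == answer_letter(row))  # True
```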

Split generation exception on fetching the `PMC-VQA` data

Dear @xiaoman-zhang, I am attempting to download the dataset using the `datasets` library.

Using Python 3.10 and `datasets==2.15.0`, I launch the download as follows (note that `datasets.config.DOWNLOADED_DATASETS_PATH` expects a `Path`, hence the import):

```python
import datasets
from pathlib import Path

datasets.config.DOWNLOADED_DATASETS_PATH = Path("./data")
dataset = datasets.load_dataset("xmcmic/PMC-VQA", split="train[:10]")
```

I end up with the following split-generation issue:

File "/home/datasets/PMC-VQA/venv/lib/python3.10/site-packages/datasets/table.py", line 2290, in cast_table_to_schema
    raise ValueError(f"Couldn't cast\n{table.schema}\nto\n{features}\nbecause column names don't match")
ValueError: Couldn't cast
index: int64
Figure_path: string
Caption: string
Question: string
Choice A: string
Choice B: string
Choice C: string
Choice D: string
Answer: string
split: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 1408
to
{'Figure_path': Value(dtype='string', id=None), 'Question': Value(dtype='string', id=None), 'Answer': Value(dtype='string', id=None), 'Choice A': Value(dtype='string', id=None), 'Choice B': Value(dtype='string', id=None), 'Choice C': Value(dtype='string', id=None), 'Choice D': Value(dtype='string', id=None), 'Answer_label': Value(dtype='string', id=None)}
because column names don't match

Is this expected behaviour, and what would be the recommended way to access the dataset?

Thank you for your assistance!
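(A workaround that sometimes helps with this kind of casting mismatch is to bypass the declared schema and read the raw CSV directly; the file name below is an assumption, so check the dataset page for the actual layout:)

```python
# Workaround sketch: skip the declared schema by reading the raw CSV directly.
# The file name "train.csv" is an assumption; check the dataset page for the
# actual file layout.
from huggingface_hub import hf_hub_download
import pandas as pd

csv_path = hf_hub_download(
    repo_id="xmcmic/PMC-VQA", repo_type="dataset", filename="train.csv"
)
df = pd.read_csv(csv_path)
print(df.columns.tolist())
print(df.head(10))
```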

Code issues

Thank you very much for sharing the code, but after reading it I found many errors. For example, `from model.QA_model import QA_model` should be `from models.QA_model import QA_model`, and many class names are confusing; for instance, `class Binary_VQA_Model(nn.Module)` does not appear to be a binary VQA model.
Will this code continue to be updated?

The question about task_type=TaskType.LM in MedVInT_TE

Hello, I have a question about MedVInT_TE, in `src/MedVInT_TE/models/llama/vqa_model.py`, line 52:

```python
def get_peft_config(peft_args: PEFTArguments):
    if peft_args.peft_mode == "lora":
        peft_config = LoraConfig(
            task_type=TaskType.LM, inference_mode=False,
            r=peft_args.lora_rank,
            lora_alpha=32, lora_dropout=0.1
        )
```

I noticed that TaskType.LM is not one of the task types included in PEFT. Could you clarify whether TaskType.LM refers to an embedding-type LoRA model that you created? I adjusted the task_type in the code, but I still run into issues when loading the official checkpoint.
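(For reference, released `peft` versions expose `TaskType.CAUSAL_LM` for decoder-only language models; a sketch of the same config under that assumption, though whether it is compatible with the released checkpoint is exactly the open question here:)

```python
# Sketch: the same LoRA config using the standard peft task type.
# TaskType.CAUSAL_LM is the usual choice for decoder-only LMs; checkpoint
# compatibility with the released weights is the open question above.
from peft import LoraConfig, TaskType

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,              # stands in for peft_args.lora_rank
    lora_alpha=32,
    lora_dropout=0.1,
)
```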

Fine-tuning on VQA-RAD & SLAKE datasets

Hi @xiaoman-zhang,

Thank you for sharing the code of your interesting work, PMC-VQA.
I noticed that you report performance on the VQA-RAD and SLAKE datasets.
Could you provide details about the fine-tuning on VQA-RAD and SLAKE?
Thank you in advance.

Best,

Computing accuracy on close- and open-ended questions

Hi, thank you for providing the code for fine-tuning the model.
To be able to reproduce your results from the paper, I would like to know how you computed the accuracy on close- and open-ended questions in VQA-RAD and Slake.

Can you confirm that, for close-ended questions, you take the set of all answers from the "CLOSE"-type questions in both the test and train sets of each dataset and call `find_most_similar_index()` as in the test.py script?

And that, for open-ended questions, you take the set of all answers from the "OPEN"-type questions in both the test and train sets of each dataset and call `find_most_similar_index()` as in the test.py script?

Thank you
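(To make the question concrete, here is a rough stand-in for that matching step; `find_most_similar_index` is the repo's function, while the `difflib`-based scorer below is just an assumed approximation:)

```python
# Rough stand-in (assumed, not the repo's implementation): score the generated
# answer against a candidate answer set and return the index of the closest one.
import difflib

def find_most_similar_index_standin(candidates, generated):
    scores = [
        difflib.SequenceMatcher(None, generated.lower(), c.lower()).ratio()
        for c in candidates
    ]
    return scores.index(max(scores))

close_answers = ["yes", "no"]  # e.g. the set of all CLOSE-type answers
idx = find_most_similar_index_standin(close_answers, "Yes, it is abnormal")
print(close_answers[idx])      # "yes"
```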
