wisdomikezogwo / quilt1m
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
Home Page: https://quilt1m.github.io/
License: MIT License
Hi, thank you very much for this great work on image-text contrastive training for histopathology, and for publishing such a valuable dataset.
I used the provided pre-trained QuiltNet along with the given tokenizer to reproduce the zero-shot classification results on the NCT-CRC-HE-100K dataset, using the following commands:
import open_clip
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:wisdomik/QuiltNet-B-32')
tokenizer = open_clip.get_tokenizer('hf-hub:wisdomik/QuiltNet-B-32')
I also used the class names and templates given in the paper, as follows:
nct_classnames = ["Adipose", "Debris", "Lymphocytes", "Mucus", "Smooth muscle", "Normal colon mucosa", "Cancer-associated stroma", "Colorectal adenocarcinoma epithelium"]
nct_template = [
lambda c: f'a histopathology slide showing {c}.',
lambda c: f'histopathology image of {c}.',
lambda c: f'pathology tissue showing {c}.',
lambda c: f'presence of {c} tissue on image.',
]
But I get a top-1 accuracy lower than the 59.56% reported in the paper:
zero shot metrics {'nct-zeroshot-val-top1': 0.28518236912136324, 'nct-zeroshot-val-top5': 0.7248697363418835}
I also tried training my own QuiltNet using the open_clip codebase, and the results were:
zero shot metrics {'nct-zeroshot-val-top1': 0.30728805599660086, 'nct-zeroshot-val-top5': 0.6808149026097458}
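For reference, my zero-shot classifier follows the usual CLIP template-ensembling recipe. Here is the logic sketched in plain Python with toy embeddings (the vectors below are illustrative, not real model outputs; in practice they would come from model.encode_text and model.encode_image):

```python
import math

def normalize(v):
    # scale a vector to unit length
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def build_class_weight(template_embeddings):
    # normalize each per-template embedding, average them, then
    # renormalize the mean, mirroring the standard CLIP zero-shot
    # classifier construction
    unit = [normalize(e) for e in template_embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(mean)

def classify(image_embedding, class_weights):
    # cosine similarity reduces to a dot product on unit vectors
    img = normalize(image_embedding)
    scores = [sum(a * b for a, b in zip(img, w)) for w in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)

# toy 3-d embeddings for two classes, four templates each
class_a = [[1.0, 0.1, 0.0]] * 4
class_b = [[0.0, 1.0, 0.1]] * 4
weights = [build_class_weight(class_a), build_class_weight(class_b)]
print(classify([0.9, 0.2, 0.0], weights))  # closest to class 0
```

If the per-template embeddings are averaged without the final renormalization, the class logits end up on different scales, which is one possible source of degraded top-1 accuracy.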
Could you kindly help me understand why I am unable to reproduce the reported numbers? I need to understand what I might be doing wrong.
Thank you.
Hello!
After downloading the videos I get an error running
python -m main --base_dir ${BASE_DIR}
The stack trace is as follows:
Traceback (most recent call last):
File "/home/groups/jamesz/fede/miniconda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/groups/jamesz/fede/miniconda/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/oak/stanford/groups/jamesz/pathtweets/quilt/quilt1m/data/main.py", line 166, in <module>
main(args, data_df, recon_df, device, histo_models_dict, video_paths_dict)
File "/oak/stanford/groups/jamesz/pathtweets/quilt/quilt1m/data/main.py", line 68, in main
rep_chunk_im_temp = save_frame_chunks_recon(video_path, stable_times, chunk_id,fps, height, width)
File "/oak/stanford/groups/jamesz/pathtweets/quilt/quilt1m/data/data_utils.py", line 108, in save_frame_chunks_recon
clip_start_time, clip_end_time = start_end_time
TypeError: cannot unpack non-iterable int object
Here are some additional variables that may help in understanding what's happening:
>>> stable_se_times
(2, 17)
>>> start_end_time
2
Basically, start_end_time arrives as a bare int rather than a (start, end) tuple, so the unpacking on the line
clip_start_time, clip_end_time = start_end_time
raises the TypeError.
Any clue on where this might come from?
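As a temporary diagnostic, I wrapped the unpacking in a guard so the failing value is reported explicitly (a sketch; as_time_pair is my own helper, not part of the repo):

```python
def as_time_pair(start_end_time):
    # Hypothetical guard: some chunks seem to yield a bare int (e.g. 2)
    # where a (start, end) tuple like (2, 17) is expected.
    if isinstance(start_end_time, int):
        raise ValueError(
            f"expected (start, end) pair, got bare int {start_end_time}"
        )
    clip_start_time, clip_end_time = start_end_time
    return clip_start_time, clip_end_time

print(as_time_pair((2, 17)))  # (2, 17)
```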
Thanks!
Hi,
The error occurs when running the following command:
from transformers import CLIPModel
model = CLIPModel.from_pretrained("wisdomik/QuiltNet-B-16", use_auth_token=None)
The error message is:
RuntimeError: Error(s) in loading state_dict for CLIPModel:
size mismatch for vision_model.embeddings.patch_embedding.weight: copying a param with shape torch.Size([768, 3, 16, 16]) from checkpoint, the shape in current model is torch.Size([768, 3, 32, 32]).
size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([197, 768]) from checkpoint, the shape in current model is torch.Size([50, 768]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
Is there an issue with this, or is something wrong with my dev environment?
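For what it's worth, the shapes in that error message are exactly what you would get if a ViT-B/16 checkpoint (patch size 16) were loaded into a model configured as ViT-B/32 (patch size 32) at a 224-pixel input resolution. A quick arithmetic check, assuming the standard ViT layout of one position embedding per patch plus a CLS token:

```python
def vit_shapes(image_size, patch_size, embed_dim=768):
    # patch embedding weight: (embed_dim, 3, patch, patch)
    # position embedding rows: (image_size / patch_size)^2 patches + 1 CLS token
    n_patches = (image_size // patch_size) ** 2
    return (embed_dim, 3, patch_size, patch_size), (n_patches + 1, embed_dim)

print(vit_shapes(224, 16))  # ((768, 3, 16, 16), (197, 768)) -> the checkpoint
print(vit_shapes(224, 32))  # ((768, 3, 32, 32), (50, 768))  -> the current model
```

So the mismatch suggests the local config is resolving to a patch-32 architecture while the downloaded weights are patch-16, rather than a corrupted checkpoint.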
Hi,
I am trying to recreate the Quilt-1M dataset. I have a question about some of the columns in the CSV files you have shared in the repo. Can you please explain how you obtained the "stable_times" column in quilt_recon.csv?
Also, were the images in the "image_path" column of quilt_data.csv extracted using the Static Video Chunk Detection Algorithm? Can you please elaborate on how the quilt_data.csv file was generated?
Thank you
Dear Author,
The ARCH dataset is divided into two subsets: the books_set and the pubmed_set.
I have noticed that the pubmed_set appears to overlap with BiomedCLIP's training data, which is sourced from PubMed Central.
In your paper, you combined these two datasets for cross-modality retrieval. However, I decided to separate them and compare their performance individually.
The retrieval performance on the pubmed_set was as follows:
{15.7; 79.8; 94.4; 16.7; 78.9; 93.7}
Meanwhile, the retrieval performance on the books_set was:
{7.3; 49.2; 74.2; 8.2; 49.7; 73.2}
In contrast, the performance of QUILT-GPT/77 showed different results:
The retrieval performance on the pubmed_set was:
{1.8; 23.6; 46.0; 1.6; 23.4; 45.7}
The retrieval performance on the books_set was:
{1.8; 27.7; 52.8; 1.5; 23.4; 46.4}
From these results, QUILT-GPT/77 does not show as significant a gap between the two subsets as BiomedCLIP does.
Can you please guide me on how to use Quilt-1M for image-to-text generation, i.e., I input an image and it generates a text description? Do I need to use LLaVA- or BLIP-like models, loading the QuiltNet weights into them for description generation? The API mentioned on Hugging Face is only for zero-shot classification, and I could not find the text-retrieval code in the GitHub repo. I also tried BLIP, but ran into compatibility issues. Thanks.
Hello!
Thanks for the great resource!
I have been trying to run the data reconstruction, but I stumbled upon a couple of different errors (some missing imports, e.g. nn from torch, and one unclosed parenthesis). There are also a couple of missing requirements (e.g. scikit-image) in the requirements file.
Would you mind taking a look? I have solved some of these and am happy to send a PR, but maybe you have an updated version of the code that runs out of the box.
First, thanks for your impressive work for the medical VLP community!
Your paper evaluates the VLP model on many downstream benchmark tasks. Could you provide the pipeline or scripts to prepare the downstream datasets and run the evaluation?
Best Regards
Hi again :)
do you have any code you can share for downloading the videos?
Thank you so much! I appreciate your help on this!
Hi, thank you for sharing your work.
Can you provide some more details on how to load the ViT-B/32 model?
First of all, thank you for providing good paper and dataset.
I am wondering whether your team plans to release a version of the Quilt-1M dataset that includes the additional data from Twitter, PMC, etc.
Thank you!
Hello, thanks for sharing your work.
I want to fine-tune the QuiltNet-B-32 model for my downstream tasks. Can you provide a fine-tuning script, or an example of using QuiltNet-B-32?
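In case it helps, here is a sketch of a fine-tuning invocation using the open_clip training script (training.main); the dataset path, CSV keys, and hyperparameters below are placeholders, not values recommended by the authors:

```shell
# Placeholder fine-tuning run with the open_clip training script.
# my_downstream_data.csv and the batch size, learning rate, and epoch
# count are illustrative, not the authors' settings.
python -m training.main \
    --train-data ./my_downstream_data.csv \
    --csv-img-key image_path \
    --csv-caption-key caption \
    --model hf-hub:wisdomik/QuiltNet-B-32 \
    --batch-size 128 \
    --lr 1e-5 \
    --epochs 5
```

Alternatively, the model can be loaded with open_clip.create_model_and_transforms('hf-hub:wisdomik/QuiltNet-B-32') and fine-tuned in a custom PyTorch loop.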
Hello, thanks for sharing your work.
I am currently working with your project and I have a question regarding a specific line of code in the data/main.py file. I noticed the following line:
stable_times = list_idle_im_se_t_tup[chunk_id][chunk_id][0]
I wanted to confirm whether this line should actually be:
stable_times = list_idle_im_se_t_tup[chunk_id][chunk_id]
Could you provide some clarification on this?
Additionally, I'm curious about how the list_idle_im_se_t_tup variable is generated and how it ensures that its length matches the length of chunks. Could you please point me to that section of the code or provide some insights on how this synchronization is achieved?
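To illustrate my question, here is a toy stand-in for the structure I assume list_idle_im_se_t_tup has (my own guess, matching the (2, 17) versus 2 values reported above); with this structure, the trailing [0] would hand downstream code a bare int instead of a (start, end) tuple:

```python
# Toy stand-in (my assumption): one entry per chunk, each holding a
# list of (start, end) second tuples for that chunk's stable segments.
list_idle_im_se_t_tup = [
    [(2, 17)],   # chunk 0
    [(20, 35)],  # chunk 1
]

chunk_id = 0
with_extra_index = list_idle_im_se_t_tup[chunk_id][chunk_id][0]  # bare int 2
without_it = list_idle_im_se_t_tup[chunk_id][chunk_id]           # tuple (2, 17)
print(with_extra_index, without_it)
```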
I appreciate your time and assistance. Thank you in advance for your help!
Best regards
Hello, thank you for your great work!
I see that this repository includes comparisons in terms of visualization. Could you provide a tutorial or script for the visualizations? Thank you!
Hello,
Firstly, thank you for this. Amazing work!
Hello, I'm a PhD student and I have applied for access to your dataset. I haven't received any reply yet; could you please grant me access?
Best regards,
Markus Ekvall