wisdomikezogwo / quilt1m
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
Home Page: https://quilt1m.github.io/
License: MIT License
Hi, thank you very much for this great work on image-text contrastive training for histopathology, and for publishing such a valuable dataset.
I used the provided pre-trained QuiltNet along with the given tokenizer to reproduce the zero-shot classification results on the NCT-CRC-HE-100K dataset, using the following commands:
import open_clip
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:wisdomik/QuiltNet-B-32')
tokenizer = open_clip.get_tokenizer('hf-hub:wisdomik/QuiltNet-B-32')
I also used the class names and templates given in the paper, as follows:
nct_classnames = ["Adipose", "Debris", "Lymphocytes", "Mucus", "Smooth muscle", "Normal colon mucosa", "Cancer-associated stroma", "Colorectal adenocarcinoma epithelium"]
nct_template = [
lambda c: f'a histopathology slide showing {c}.',
lambda c: f'histopathology image of {c}.',
lambda c: f'pathology tissue showing {c}.',
lambda c: f'presence of {c} tissue on image.',
]
But I get a top-1 accuracy lower than the 59.56% reported in the paper:
zero shot metrics {'nct-zeroshot-val-top1': 0.28518236912136324, 'nct-zeroshot-val-top5': 0.7248697363418835}
I also tried training my own QuiltNet using the open_clip codebase, and the results were:
zero shot metrics {'nct-zeroshot-val-top1': 0.30728805599660086, 'nct-zeroshot-val-top5': 0.6808149026097458}
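For reference, my zero-shot classifier follows the usual CLIP template-ensembling recipe. Here is the logic sketched in plain Python with toy embeddings (the vectors below are illustrative, not real model outputs; in practice they would come from model.encode_text and model.encode_image):

```python
import math

def normalize(v):
    # scale a vector to unit length
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def build_class_weight(template_embeddings):
    # normalize each per-template embedding, average them, then
    # renormalize the mean, mirroring the standard CLIP zero-shot
    # classifier construction
    unit = [normalize(e) for e in template_embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(mean)

def classify(image_embedding, class_weights):
    # cosine similarity reduces to a dot product on unit vectors
    img = normalize(image_embedding)
    scores = [sum(a * b for a, b in zip(img, w)) for w in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)

# toy 3-d embeddings for two classes, four templates each
class_a = [[1.0, 0.1, 0.0]] * 4
class_b = [[0.0, 1.0, 0.1]] * 4
weights = [build_class_weight(class_a), build_class_weight(class_b)]
print(classify([0.9, 0.2, 0.0], weights))  # closest to class 0
```

If the per-template embeddings are averaged without the final renormalization, the class logits end up on different scales, which is one possible source of degraded top-1 accuracy.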
Could you kindly help me understand why I am unable to reproduce the reported numbers? I need to understand what I might be doing wrong.
Thank you.
Hello!
After downloading the videos I get an error running
python -m main --base_dir ${BASE_DIR}
The stack trace is as follows:
Traceback (most recent call last):
File "/home/groups/jamesz/fede/miniconda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/groups/jamesz/fede/miniconda/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/oak/stanford/groups/jamesz/pathtweets/quilt/quilt1m/data/main.py", line 166, in <module>
main(args, data_df, recon_df, device, histo_models_dict, video_paths_dict)
File "/oak/stanford/groups/jamesz/pathtweets/quilt/quilt1m/data/main.py", line 68, in main
rep_chunk_im_temp = save_frame_chunks_recon(video_path, stable_times, chunk_id,fps, height, width)
File "/oak/stanford/groups/jamesz/pathtweets/quilt/quilt1m/data/data_utils.py", line 108, in save_frame_chunks_recon
clip_start_time, clip_end_time = start_end_time
TypeError: cannot unpack non-iterable int object
Here are some additional variables that may help in understanding what's happening:
>>> stable_se_times
(2, 17)
>>> start_end_time
2
Basically, start_end_time arrives as a bare int rather than a (start, end) tuple, so the unpacking on the line
clip_start_time, clip_end_time = start_end_time
raises the TypeError.
Any clue on where this might come from?
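As a temporary diagnostic, I wrapped the unpacking in a guard so the failing value is reported explicitly (a sketch; as_time_pair is my own helper, not part of the repo):

```python
def as_time_pair(start_end_time):
    # Hypothetical guard: some chunks seem to yield a bare int (e.g. 2)
    # where a (start, end) tuple like (2, 17) is expected.
    if isinstance(start_end_time, int):
        raise ValueError(
            f"expected (start, end) pair, got bare int {start_end_time}"
        )
    clip_start_time, clip_end_time = start_end_time
    return clip_start_time, clip_end_time

print(as_time_pair((2, 17)))  # (2, 17)
```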
Thanks!
Hi,
The error occurs when running the following command:
from transformers import CLIPModel
model = CLIPModel.from_pretrained("wisdomik/QuiltNet-B-16", use_auth_token=None)
The error message is:
RuntimeError: Error(s) in loading state_dict for CLIPModel:
size mismatch for vision_model.embeddings.patch_embedding.weight: copying a param with shape torch.Size([768, 3, 16, 16]) from checkpoint, the shape in current model is torch.Size([768, 3, 32, 32]).
size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([197, 768]) from checkpoint, the shape in current model is torch.Size([50, 768]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
Is there an issue with this, or is something wrong with my dev environment?
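For what it's worth, the shapes in that error message are exactly what you would get if a ViT-B/16 checkpoint (patch size 16) were loaded into a model configured as ViT-B/32 (patch size 32) at a 224-pixel input resolution. A quick arithmetic check, assuming the standard ViT layout of one position embedding per patch plus a CLS token:

```python
def vit_shapes(image_size, patch_size, embed_dim=768):
    # patch embedding weight: (embed_dim, 3, patch, patch)
    # position embedding rows: (image_size / patch_size)^2 patches + 1 CLS token
    n_patches = (image_size // patch_size) ** 2
    return (embed_dim, 3, patch_size, patch_size), (n_patches + 1, embed_dim)

print(vit_shapes(224, 16))  # ((768, 3, 16, 16), (197, 768)) -> the checkpoint
print(vit_shapes(224, 32))  # ((768, 3, 32, 32), (50, 768))  -> the current model
```

So the mismatch suggests the local config is resolving to a patch-32 architecture while the downloaded weights are patch-16, rather than a corrupted checkpoint.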
Hi,
I am trying to recreate the Quilt-1M dataset. I have a question about some of the columns in the CSV files you have shared in the repo. Can you please explain how you obtained the "stable_times" column in quilt_recon.csv?
Also, were the images in the "image_path" column of quilt_data.csv extracted using the Static Video Chunk Detection Algorithm? Can you please elaborate on how the quilt_data.csv file was generated?
Thank you
Dear Author,
The ARCH dataset is divided into two subsets: the books_set and the pubmed_set.
I have noticed that the pubmed_set appears to overlap with BiomedCLIP's training data, which is sourced from PubMed Central.
In your paper, you combined these two datasets for cross-modality retrieval. However, I decided to separate them and compare their performance individually.
The retrieval performance on the pubmed_set was as follows:
{15.7; 79.8; 94.4; 16.7; 78.9; 93.7}
Meanwhile, the retrieval performance on the books_set was:
{7.3; 49.2; 74.2; 8.2; 49.7; 73.2}
In contrast, the performance of QUILT-GPT/77 showed different results:
The retrieval performance on the pubmed_set was:
{1.8; 23.6; 46.0; 1.6; 23.4; 45.7}
The retrieval performance on the books_set was:
{1.8; 27.7; 52.8; 1.5; 23.4; 46.4}
From these results, QUILT-GPT/77 does not show as significant a gap between the two subsets as BiomedCLIP does.
Can you please guide me on how to use Quilt-1M for image-to-text generation, i.e., I input an image and it generates a text description? Do I need to use LLaVA- or BLIP-like models, loading the QuiltNet weights into them for description generation? The API mentioned on Hugging Face is only for zero-shot classification, and I could not find the text-retrieval code in the GitHub repo. I also tried BLIP, but ran into compatibility issues. Thanks.
Hello!
Thanks for the great resource!
I have been trying to run the data reconstruction, but I stumbled upon a couple of different errors (some missing imports, e.g. nn from torch, and one unclosed parenthesis). There are also a couple of missing requirements (e.g. scikit-image) in the requirements file.
Would you mind taking a look? I have solved some of these and am happy to send a PR, but maybe you have an updated version of the code that runs out of the box.
First, thanks for your impressive work for the medical VLP community!
Your paper evaluates the VLP model on many downstream benchmark tasks. Could you provide the pipeline or scripts to prepare the downstream datasets and run the evaluation?
Best Regards
Hi again :)
do you have any code you can share for downloading the videos?
Thank you so much! I appreciate your help on this!
Hi, thank you for sharing your work.
Can you provide some more details on how to load the ViT-B/32 model?
First of all, thank you for providing good paper and dataset.
I am wondering whether your team plans to release a version of the Quilt-1M dataset that includes the additional data from Twitter, PMC, etc.
Thank you!
Hello, thanks for sharing your work.
I want to fine-tune the QuiltNet-B-32 model for my downstream tasks. Can you provide a fine-tuning script, or an example of using QuiltNet-B-32?
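In case it helps, here is a sketch of a fine-tuning invocation using the open_clip training script (training.main); the dataset path, CSV keys, and hyperparameters below are placeholders, not values recommended by the authors:

```shell
# Placeholder fine-tuning run with the open_clip training script.
# my_downstream_data.csv and the batch size, learning rate, and epoch
# count are illustrative, not the authors' settings.
python -m training.main \
    --train-data ./my_downstream_data.csv \
    --csv-img-key image_path \
    --csv-caption-key caption \
    --model hf-hub:wisdomik/QuiltNet-B-32 \
    --batch-size 128 \
    --lr 1e-5 \
    --epochs 5
```

Alternatively, the model can be loaded with open_clip.create_model_and_transforms('hf-hub:wisdomik/QuiltNet-B-32') and fine-tuned in a custom PyTorch loop.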
Hello, thanks for sharing your work.
I am currently working with your project and I have a question regarding a specific line of code in the data/main.py file. I noticed the following line:
stable_times = list_idle_im_se_t_tup[chunk_id][chunk_id][0]
I wanted to confirm whether this line should actually be:
stable_times = list_idle_im_se_t_tup[chunk_id][chunk_id]
Could you provide some clarification on this?
Additionally, I'm curious about how the list_idle_im_se_t_tup variable is generated and how it ensures that its length matches the length of chunks. Could you please point me to that section of the code or provide some insights on how this synchronization is achieved?
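To illustrate my question, here is a toy stand-in for the structure I assume list_idle_im_se_t_tup has (my own guess, matching the (2, 17) versus 2 values reported above); with this structure, the trailing [0] would hand downstream code a bare int instead of a (start, end) tuple:

```python
# Toy stand-in (my assumption): one entry per chunk, each holding a
# list of (start, end) second tuples for that chunk's stable segments.
list_idle_im_se_t_tup = [
    [(2, 17)],   # chunk 0
    [(20, 35)],  # chunk 1
]

chunk_id = 0
with_extra_index = list_idle_im_se_t_tup[chunk_id][chunk_id][0]  # bare int 2
without_it = list_idle_im_se_t_tup[chunk_id][chunk_id]           # tuple (2, 17)
print(with_extra_index, without_it)
```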
I appreciate your time and assistance. Thank you in advance for your help!
Best regards
Hello, thank you for your great work!
I see that this repository includes comparisons in terms of visualization. Could you provide a tutorial or script for the visualizations? Thank you!
Hello,
Firstly, thank you for this. Amazing work!
Hello, I'm a PhD student and I have applied for access to your dataset. I haven't received any reply yet; could you please grant me access?
Best regards,
Markus Ekvall