kakaobrain / cxr-clip

License: Other

Python 100.00%

cxr-clip's People

qqq-tech one-june jcho19 xiweideng mabonmn hongxin001

cxr-clip's Issues

About few-shot evaluation

Thank you for your great work!

I would like to ask about the few-shot setting in your paper: what exactly do 10% and 100% mean in few-shot (10%) and few-shot (100%)?
If possible, could you point me to the few-shot evaluation code?
I deeply appreciate your help.
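The issue does not say how the few-shot splits were built. One common reading of "few-shot (10%)" is training or probing on a fixed 10% subset of the labeled training set, with "100%" meaning the full set; that interpretation is an assumption, not something confirmed here. A minimal sketch of drawing such a subset per class with a fixed seed (subsample_fraction is a hypothetical name; multi-label datasets such as CheXpert would need a different stratification):

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

def subsample_fraction(items: Sequence[T], labels: Sequence[str], fraction: float, seed: int = 0) -> List[T]:
    """Hypothetical helper: keep about `fraction` of the items of each class.

    Every class keeps at least one example, and a fixed seed makes the subset
    reproducible across runs and models.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    by_label: Dict[str, List[T]] = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    rng = random.Random(seed)
    subset: List[T] = []
    for label in sorted(by_label):  # sorted for a deterministic class order
        group = by_label[label]
        k = max(1, round(len(group) * fraction))
        subset.extend(rng.sample(group, k))
    return subset

# "10%" vs "100%" of a toy training set with two balanced classes.
items = list(range(100))
labels = ["normal"] * 50 + ["pneumonia"] * 50
print(len(subsample_fraction(items, labels, 0.1)), len(subsample_fraction(items, labels, 1.0)))  # 10 100
```

Fixing the seed matters when comparing models: every model should see the same 10% subset.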

MIMIC-CXR Pretrain Dataset Pre-Processing

Thank you for releasing the code of this fantastic work! :)

I have a few questions about the preprocessing of the MIMIC dataset. The README lists five preprocessing steps, and I am wondering whether there is any code that performs data cleaning. Some samples are missing the 'findings' or 'impression' section, and after back translation they produce strange augmented text, as shown in my screenshot.

As far as I know, the only preprocessing code is "back_translation.py". Did I miss any preprocessing code? And if there are other preprocessing steps, could you release that code?

Cheers!
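A minimal cleaning step of the kind asked about, applied before back translation, might collapse the wrapped whitespace in MIMIC-CXR report text and drop sections that are empty, so the translator is never fed blank strings. This is a hypothetical helper (clean_sections is not part of the repository), and skipping studies with no remaining sections is a design choice rather than the authors' documented procedure:

```python
import re
from typing import List

def clean_sections(sections: List[str]) -> List[str]:
    """Hypothetical helper: normalize whitespace and drop empty report sections."""
    # Collapse newlines and repeated spaces (reports wrap lines as "\n ") into single spaces.
    cleaned = [re.sub(r"\s+", " ", s).strip() for s in sections]
    # Keep only sections that still contain text.
    return [s for s in cleaned if s]

# Studies whose cleaned list is empty are skipped instead of back-translated.
studies = {"s1": ["Lungs are clear.\n No effusion.", ""], "s2": ["   "]}
to_translate = {sid: clean_sections(t) for sid, t in studies.items() if clean_sections(t)}
print(to_translate)  # {'s1': ['Lungs are clear. No effusion.']}
```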

The provided checkpoints do not give the expected results

I tried to evaluate I2T retrieval with the official weights, but the results are still not comparable to the figures in your paper. I used the ResNet-50 model trained on MIMIC and CheXpert. Here are the results I got:

Is there anything I should pay attention to at inference time to reproduce the figures in your paper?
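When retrieval numbers disagree, it can help to check the metric itself independently of the model. A minimal sketch of a standard image-to-text Recall@K over a similarity matrix, assuming exactly one paired report per image; the paper's exact protocol (candidate pool, embedding normalization, test split) is not described here and may differ:

```python
from typing import Sequence

def recall_at_k(sim: Sequence[Sequence[float]], gt: Sequence[int], k: int) -> float:
    """Image-to-text Recall@K.

    sim[i][j] is the similarity between image i and report j; gt[i] is the index
    of the report paired with image i. Returns the fraction of images whose
    paired report ranks among the k most similar reports.
    """
    hits = 0
    for row, target in zip(sim, gt):
        # Rank report indices by descending similarity for this image.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        hits += target in ranked[:k]
    return hits / len(gt)

# Toy 3x3 similarity matrix; the diagonal pairs are the ground truth.
sim = [[0.9, 0.1, 0.0], [0.2, 0.3, 0.8], [0.1, 0.7, 0.2]]
gt = [0, 1, 2]
print(recall_at_k(sim, gt, 1), recall_at_k(sim, gt, 2))  # 0.3333333333333333 1.0
```

Differences in the candidate pool alone (for example, ranking against all test reports versus a smaller subset) can shift Recall@K noticeably, so that is worth comparing before suspecting the weights.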

pre-trained checkpoints for CXR-CLIP?

Thank you for this great work! Where can I obtain pre-trained checkpoints for CXR-CLIP?

The RSNA Pneumonia dataset only contains binary labels for normal and pneumonia cases. Why is it possible to use this dataset for multi-label classification?

[BOS] and [EOS] tokens missing

Hello,

thanks for sharing the code!

I ran into two problems when using the code:

(1) When I try to initialize the tokenizer, there is a line tokenizer.bos_token_id = tokenizer.cls_token_id, which raises an error:

TypeError: 'int' object is not iterable

My understanding is that, because you are using a BERT tokenizer, which has no [BOS] token, you cannot directly assign an id to that token.

(2) Following on from the first question, I found that the global text feature extraction in the text encoder uses the [EOS] token, but this token does not exist in the BERT tokenizer either.

Could you please let me know how to correctly initialize your tokenizer with both [BOS] and [EOS] tokens and their corresponding ids?

Thank you!
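BERT's vocabulary has [CLS] and [SEP] but no [BOS] or [EOS], so a common workaround (an assumption here, not something the repository confirms) is to reuse [CLS] as the start token and [SEP] as the end token. With a Hugging Face tokenizer, assigning the token strings, e.g. tokenizer.bos_token = tokenizer.cls_token and tokenizer.eos_token = tokenizer.sep_token, may avoid setting the id directly; behavior can vary between transformers versions, so check against the installed one. The plain-Python sketch below shows the second half: taking the global text feature at the [SEP]/[EOS] position. The ids 101/102/0 are those of bert-base-uncased and are used purely for illustration; the helper names are hypothetical.

```python
from typing import List, Sequence

# bert-base-uncased ids, for illustration only.
CLS_ID = 101  # reused as [BOS]
SEP_ID = 102  # reused as [EOS]
PAD_ID = 0

def eos_positions(input_ids: Sequence[Sequence[int]], eos_id: int = SEP_ID) -> List[int]:
    """Index of the first EOS ([SEP]) token in each tokenized sequence."""
    positions = []
    for row in input_ids:
        row = list(row)
        if eos_id not in row:
            raise ValueError(f"no EOS id {eos_id} in sequence {row}")
        positions.append(row.index(eos_id))
    return positions

def pool_at_eos(hidden, input_ids, eos_id: int = SEP_ID):
    """Select the hidden state at the EOS position: one global feature per sequence.

    hidden has shape [batch][seq_len][dim] (nested lists here instead of a tensor).
    """
    return [list(h[p]) for h, p in zip(hidden, eos_positions(input_ids, eos_id))]

# Two padded sequences: "[CLS] w [SEP] [PAD]" and "[CLS] [SEP] [PAD] [PAD]".
ids = [[CLS_ID, 2023, SEP_ID, PAD_ID], [CLS_ID, SEP_ID, PAD_ID, PAD_ID]]
hidden = [[[0.0], [1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0], [7.0]]]
print(eos_positions(ids))        # [2, 1]
print(pool_at_eos(hidden, ids))  # [[2.0], [5.0]]
```

With PyTorch tensors the same selection is typically written as hidden[torch.arange(len(pos)), pos], where pos holds the EOS positions.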

Some questions about data processing

Hi,
This is meaningful work :)
I have some questions.
The "text" in each subject looks like: [[Findings1, Impression1], [Findings2, Impression2], [Findings3, Impression3], ..., [FindingsN, ImpressionN]].
Is this correct?
I have seen in previous issues that, if the findings section is missing, one should just take the impression as a one-element list [impression]. But when I actually try this, "back_translation.py" fails with "ValueError: not enough values to unpack (expected 2, got 1)", which means the list [impression] is not accepted.
Thank you.
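The error comes from Python tuple unpacking: a line of the form a, b = text fails whenever text has only one element. Without seeing back_translation.py, one hedged workaround is to normalize each entry to a (findings, impression) pair before unpacking, as in this sketch (split_sections is a hypothetical name, and treating a lone section as the impression is an assumption):

```python
from typing import List, Tuple

# Reproduces the reported error: unpacking a one-element list into two names.
try:
    findings, impression = ["No acute process."]
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 1)

def split_sections(text: List[str]) -> Tuple[str, str]:
    """Hypothetical helper: return (findings, impression) from a 1- or 2-element list."""
    if len(text) == 2:
        findings, impression = text
    elif len(text) == 1:
        # Assumption: a lone section is the impression; findings become an empty string.
        findings, impression = "", text[0]
    else:
        raise ValueError(f"expected 1 or 2 sections, got {len(text)}")
    return findings, impression

print(split_sections(["No acute process."]))  # ('', 'No acute process.')
```

Any caller that translates findings would then need to skip the empty string rather than send it to the translator.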

Cannot reproduce the results in the paper

I tried pretraining the model on the MIMIC dataset for 30 epochs with batch size 32, lr 5e-6, weight decay 1e-4, and the AdamW optimizer. The other configs were analogous to the default configs in the repo. After completing 30 epochs, this is the loss I got for each epoch:

When I ran the evaluate_clip script, I got this result:

which is inferior to your results.

This is my data after processing:

subject_id,study_id,image,view,PA,LATERAL,AP,LL,nan,LAO,RAO,AP AXIAL,SWIMMERS,PA LLD,AP LLD,XTABLE LATERAL,AP RLD,PA RLD,LPO,text,split,text_augment
10000032,53189527,"['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53189527/2a2277a9-b0ded155-c0de8eb9-c124d10e-82c5caab.jpg', '/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53189527/e084de3b-be89b11e-20fe3f9f-9c8d8dfe-4cfd202c.jpg']","['LATERAL', 'PA']",['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53189527/2a2277a9-b0ded155-c0de8eb9-c124d10e-82c5caab.jpg'],['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53189527/e084de3b-be89b11e-20fe3f9f-9c8d8dfe-4cfd202c.jpg'],,,,,,,,,,,,,,"['The cardiac, mediastinal and hilar contours are normal. Pulmonary vasculature\n is normal.  Lungs are clear. No pleural effusion or pneumothorax is present.\n Multiple clips are again seen projecting over the left breast.  Remote\n left-sided rib fractures are also re- demonstrated.', 'No acute cardiopulmonary abnormality.']",train,"['Heart contours, medistina and hilar are normal. Lung vascularity and usual. Lung. Lung is clear. No pleural effusion or pneumometers. Multiple clips are still being seen projecting to the left part of the breast. They are also remote fractures of the left ribs are re-proved.', 'No acute cardiac pulmonary disease and dysrhythmia.']"
10000032,53911762,"['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53911762/68b5c4b1-227d0485-9cc38c3f-7b84ab51-4b472714.jpg', '/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53911762/fffabebf-74fd3a1f-673b6b41-96ec0ac9-2ab69818.jpg']",['AP'],,,"['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53911762/68b5c4b1-227d0485-9cc38c3f-7b84ab51-4b472714.jpg', '/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53911762/fffabebf-74fd3a1f-673b6b41-96ec0ac9-2ab69818.jpg']",,,,,,,,,,,,,"['Single frontal view of the chest provided.\n \n There is no focal consolidation, effusion, or pneumothorax. The\n cardiomediastinal silhouette is normal.  Again seen are multiple clips\n projecting over the left breast and remote left-sided rib fractures.  No free\n air below the right hemidiaphragm is seen.', 'No acute intrathoracic process.']",train,"['One front view of your chest does not make a focal effect, effusion, or pneumothorax. cardiomediastinal outlines are normally common to one another. Again, it can be seen that you are projecting several clips onto your left chest and off the fractures of your left side coast. No free air under your right hemidiafragma should be observed.', ""Yeah, well, it can't really be discreet.""]"
10001122,53447138,"['/mnt/ssd1/CXR/data/imgs/p10/p10001122/s53447138/8039752c-2ea661b7-16f1eafe-055b7e7b-dbd4cdd1.jpg', '/mnt/ssd1/CXR/data/imgs/p10/p10001122/s53447138/832b57d8-3ae08663-e152699e-51c5db98-b7cb4226.jpg']",[],,,,,,,,,,,,,,,,['The lung volumes are normal.  No evidence of TB or other parenchymal changes. \n Mild elevation of the left hemidiaphragm.  No pleural effusions.  No\n pneumonia.  The lateral radiograph shows evidence of anterior ligament\n calcification at the anterior aspect of the thoracic spine.  Status post\n cholecystectomy.'],train,"['There is no sign that there are other parenchymal changes and TB, no pleural effusion. Lateral X-ray shows evidence of calciification of ligament at the rear of the front of the rib. Post cholecystitomy. No sign are observed of TB and any other parenchymal variations.']"
10000032,56699142,['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s56699142/ea030e7a-2e3b1346-bc518786-7a8fd698-f673b44c.jpg'],['AP'],,,['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s56699142/ea030e7a-2e3b1346-bc518786-7a8fd698-f673b44c.jpg'],,,,,,,,,,,,,"['The lungs are clear of focal consolidation, pleural effusion or pneumothorax. \n The heart size is normal.  The mediastinal contours are normal. Multiple\n surgical clips project over the left breast, and old left rib fractures are\n noted.', 'No acute cardiopulmonary process.']",train,"['They are free from focal bloated moments, pleural precipitations or pneumothorax, normal heart size. The thoracic contours are normal. There are numerous surgical fractures in the left breast and the old fractures of left ribs have been noted.', 'No acute cardio-pilmonerous processes.']"

Could you tell me whether something is wrong with my processed data? I wonder whether keeping views other than AP, PA, and LATERAL leads to the worse figures. Also, could you share your training logs?

I look forward to your support. Thank you in advance.
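To test whether the extra views matter, one option is to rebuild the training CSV keeping only studies that contain at least one AP, PA, or LATERAL image. A minimal sketch, assuming the "view" column stores a Python-literal list as in the rows above; filter_studies is a hypothetical name, and the third sample row is a toy row, not real data:

```python
import ast
import csv
import io
from typing import Dict, Iterable, List, Set

# Assumption: the views of interest are the ones mentioned in the question.
KEEP_VIEWS: Set[str] = {"AP", "PA", "LATERAL"}

def filter_studies(rows: Iterable[Dict[str, str]], keep: Set[str] = KEEP_VIEWS) -> List[Dict[str, str]]:
    """Keep studies whose 'view' list contains at least one wanted view.

    The 'view' column stores a Python-literal list such as "['LATERAL', 'PA']";
    an empty list ("[]") means no view metadata, so such studies are dropped here.
    """
    kept = []
    for row in rows:
        views = ast.literal_eval(row["view"]) if row["view"].strip() else []
        if any(v in keep for v in views):
            kept.append(row)
    return kept

sample = (
    "subject_id,study_id,view\n"
    "10000032,53189527,\"['LATERAL', 'PA']\"\n"
    "10001122,53447138,[]\n"
    "1,2,['SWIMMERS']\n"  # toy row for illustration
)
rows = list(csv.DictReader(io.StringIO(sample)))
print([r["study_id"] for r in filter_studies(rows)])  # ['53189527']
```

This filters whole studies only; images of other views inside a kept study would still need to be removed, for example by rebuilding the image list from the AP, PA, and LATERAL columns.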

How to handle reports missing both the impression and findings sections

I appreciate your work. It helps us very much.

I would like to ask how you handled reports in which both the findings and impression sections are missing. According to the official MIMIC-CXR dataset paper, approximately 10K reports have neither an impression nor a findings section; instead, they include a last_paragraph or comparison section. Did you replace the missing impression and findings with these sections, or simply leave those reports out? From a quick read of your code, it seems that an empty list is not accepted in the text column of the dataset.

Thank you in advance for your support.
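One possible policy (not the authors' confirmed method) is to train on findings and impression when available, fall back to last_paragraph otherwise, and drop the study when nothing usable remains, so the text column is never an empty list. A minimal sketch; the section names are assumed to match the sectioned-report columns, and select_text is a hypothetical name:

```python
from typing import Dict, List, Optional

# Assumed section names, matching the columns produced by the MIMIC-CXR section
# extraction (findings, impression, last_paragraph); rename to fit your own CSV.
PRIMARY_SECTIONS = ("findings", "impression")
FALLBACK_SECTIONS = ("last_paragraph",)

def _nonempty(report: Dict[str, Optional[str]], keys) -> List[str]:
    """Stripped, non-empty section texts for the given keys, in order."""
    return [report[k].strip() for k in keys if report.get(k) and report[k].strip()]

def select_text(report: Dict[str, Optional[str]]) -> Optional[List[str]]:
    """Return the sections to train on, or None to drop the study.

    Comparison sections are ignored as a design choice: they mostly refer to
    prior studies rather than describing the current image.
    """
    return _nonempty(report, PRIMARY_SECTIONS) or _nonempty(report, FALLBACK_SECTIONS) or None

print(select_text({"findings": "", "impression": "No acute process."}))  # ['No acute process.']
print(select_text({"comparison": "None."}))  # None
```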

Some confusion about pre-training datasets

Thanks a lot for releasing the code!

I have some doubts about how the pre-training datasets are used.

In /cxr-clip/cxrclip/data/datasets/imagetext.py

This code doesn't seem quite correct. With so many if statements, the image selection here does not seem to meet the requirement described in the paper to "sample images from two distinct views as possible", and some parts may even be incorrect: every time line 80 is executed, the "image_path_list" variable is reset instead of actually accumulating image paths.

This also raises another question: how are the elements of the list in the view column counted in the MIMIC-CXR training CSV? For example, if a study has two AP images, one PA image, and two lateral images, should the list be [AP, AP, PA, Lateral, Lateral] or just [AP, PA, Lateral]?

Looking forward to your answer!
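A minimal sketch of the sampling rule quoted from the paper, "sample images from two distinct views as possible", assuming the view list is stored per image (one entry per image, i.e. the [AP, AP, PA, Lateral, Lateral] form) so that each path can be paired with its view. This is not the repository's imagetext.py logic; sample_two_views is a hypothetical name, and the fallbacks for single-view and single-image studies are assumptions. Paths are collected into one dict of lists, so nothing is reset between iterations:

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence

def sample_two_views(paths: Sequence[str], views: Sequence[str],
                     rng: Optional[random.Random] = None) -> List[str]:
    """Hypothetical sketch: pick two images, from two distinct views when possible."""
    if not paths or len(paths) != len(views):
        raise ValueError("paths and views must be non-empty and the same length")
    rng = rng or random.Random()
    by_view: Dict[str, List[str]] = defaultdict(list)
    for path, view in zip(paths, views):
        by_view[view].append(path)  # accumulate; never reset inside the loop
    if len(by_view) >= 2:
        # Two distinct views available: one image from each of two random views.
        v1, v2 = rng.sample(sorted(by_view), 2)
        return [rng.choice(by_view[v1]), rng.choice(by_view[v2])]
    if len(paths) >= 2:
        # Only one view, but several images: two different images of that view.
        return rng.sample(list(paths), 2)
    # A single image: reuse it (assumption: augmentation makes the two copies differ).
    return [paths[0], paths[0]]

print(sample_two_views(["a.jpg"], ["PA"]))  # ['a.jpg', 'a.jpg']
```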

How to resume my training?

I keep running into the failure shown below during training: the process is terminated by signal 9. I don't know why this keeps happening, but for now I would like to continue training from the checkpoint I have already trained for 4 epochs. How should I do this?

12172/13725 [4:44:35<35:03,  1.35s/it, lr=['0.00004109'], loss=0.975784, CUDA-Mem=0%, CUDAUtil=0%]
WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'a4000x4313_1543408_0' has failed to send a keep-alive heartbeat to the rendezvous '78d6761b-372b-4b1a-afe6-abb8735493a4' due to an error of type RendezvousTimeoutError.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1543460 closing signal SIGTERM                                            
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 1 (pid: 1543461) of binary: /home/gem/anaconda3/envs/cxr-clip/bin/python                                                                                                                                  
Traceback (most recent call last):                                                                                                              
  File "/home/gem/anaconda3/envs/cxr-clip/bin/torchrun", line 8, in <module>                                                                    
    sys.exit(main())                                                                                                                            
  File "/home/gem/anaconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)                                                                                                                   
  File "/home/gem/anaconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main                              
    run(args)                                                                                                                                   
  File "/home/gem/anaconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run                               
    elastic_launch(                                                                                                                             
  File "/home/gem/anaconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__                 
    return launch_agent(self._config, self._entrypoint, list(args))                                                                             
  File "/home/gem/anaconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent             
    raise ChildFailedError(                                                                                                                     
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:                                                                              
========================================================                                                                                        
train.py FAILED                                                                                                                                 
--------------------------------------------------------                                                                                        
Failures:                                                                                                                                       
  <NO_OTHER_FAILURES>                                                                                                                           
--------------------------------------------------------                                                                                        
Root Cause (first observed failure):                                                                                                            
[0]:                                                                                                                                            
  time      : 2024-04-09_16:19:50                                                                                                               
  host      : a4000x4313                                                                                                                        
  rank      : 1 (local_rank: 1)                                                                                                                 
  exitcode  : -9 (pid: 1543461)                                                                                                                 
  error_file: <N/A>                                                                                                                             
  traceback : Signal 9 (SIGKILL) received by PID 1543461                                                                                        
========================================================
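Exit code -9 means the worker received SIGKILL from outside the process; on Linux this is often, though not always, the kernel's out-of-memory killer reclaiming host RAM. The repository's own checkpoint format and resume options are not described here, so the following is only a generic sketch of the usual pattern: save the model, optimizer, and learning-rate scheduler state together with the epoch number, then on restart load them and continue from the next epoch. In PyTorch the corresponding calls are torch.save, torch.load, and the load_state_dict methods of the model, optimizer, and scheduler. To stay runnable without PyTorch, the sketch uses pickle and plain dicts as stand-ins for state_dict() objects, and writes checkpoints atomically so a kill during saving cannot corrupt the last good file:

```python
import os
import pickle
import tempfile
from typing import Any, Dict

def save_checkpoint(path: str, state: Dict[str, Any]) -> None:
    """Write to a temporary file, then atomically replace the previous checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem

def load_checkpoint(path: str) -> Dict[str, Any]:
    with open(path, "rb") as f:
        return pickle.load(f)

def train(num_epochs: int, ckpt_path: str) -> Dict[str, Any]:
    """Toy loop; the 'model', 'optimizer' and 'scheduler' dicts stand in for state_dict()s."""
    if os.path.exists(ckpt_path):
        state = load_checkpoint(ckpt_path)
        start_epoch = state["epoch"] + 1  # resume after the last completed epoch
    else:
        state = {"epoch": -1, "model": {"w": 0.0}, "optimizer": {"step": 0}, "scheduler": {"lr": 1e-4}}
        start_epoch = 0
    for epoch in range(start_epoch, num_epochs):
        state["model"]["w"] += 1.0          # stand-in for one epoch of parameter updates
        state["optimizer"]["step"] += 1
        state["scheduler"]["lr"] *= 0.5     # stand-in for a learning-rate schedule step
        state["epoch"] = epoch
        save_checkpoint(ckpt_path, state)   # checkpoint at the end of every epoch
    return state

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "last.pkl")
        train(4, ckpt)          # first run: epochs 0-3, then the job is killed
        final = train(6, ckpt)  # restart: loads the checkpoint and runs epochs 4-5
        print(final["epoch"], final["optimizer"]["step"])  # 5 6
```

Restoring the scheduler state as well as the optimizer matters here: otherwise the learning rate restarts from its initial value after resuming.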
