ct-clip's People

Contributors

ibrahimethemhamamci, mfdasdelen, sezginerr

ct-clip's Issues

Issue about checkpoint of CT-CLIP

Dear author,

Thank you for your amazing work! I was recently trying to reproduce your results, but I ran into the following problem when loading the checkpoint of your model.
[error screenshot omitted]
How can I solve this problem? Looking forward to your response!
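Since the error screenshot is missing, a general way to debug checkpoint-loading mismatches is to inspect the keys the file actually contains before calling `load_state_dict`. This is a minimal sketch (the wrapper key names and paths are assumptions, not taken from the CT-CLIP code):

```python
from collections import OrderedDict

# Minimal sketch (wrapper key names are assumptions): checkpoints saved by
# training scripts often wrap the weights under "state_dict" or "model",
# which makes a direct model.load_state_dict(torch.load(path)) fail.
def unwrap_state_dict(ckpt):
    """Return the inner parameter dict if the checkpoint wraps it."""
    if isinstance(ckpt, dict):
        for key in ("state_dict", "model"):
            if key in ckpt and isinstance(ckpt[key], dict):
                return ckpt[key]
    return ckpt

# In practice: ckpt = torch.load("path/to/CT_CLIP.pt", map_location="cpu")
ckpt = {"state_dict": OrderedDict([("visual.proj.weight", [[0.0]])])}
sd = unwrap_state_dict(ckpt)
print(list(sd.keys()))
# Then: result = model.load_state_dict(sd, strict=False)
# and inspect result.missing_keys / result.unexpected_keys to see
# which parameter names do not line up.
```

Comparing the printed keys against the model's `state_dict().keys()` usually pinpoints whether the mismatch is a wrapper, a prefix (e.g. `module.` from DataParallel), or a genuinely different architecture.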

CPU Memory size for loading total training data

Hi, thank you for your excellent work again!
How much CPU memory is required to load all the training data into memory? Did you load all 50k training volumes during training? Are there any training strategies that can reduce the memory requirements?
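One common strategy to keep memory flat, regardless of how the authors trained (the file layout and extension below are illustrative assumptions), is to hold only file paths in the dataset and decode each CT volume on demand in `__getitem__`:

```python
import os

# Minimal lazy-loading sketch (file layout and ".npz" extension are
# assumptions): keep only paths in memory and read each volume on demand,
# so resident memory scales with batch size, not dataset size.
class LazyCTDataset:
    def __init__(self, root):
        # Store lightweight path strings instead of decoded volumes.
        self.paths = sorted(
            os.path.join(root, f)
            for f in os.listdir(root)
            if f.endswith(".npz")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # In practice: volume = np.load(self.paths[idx])["arr_0"]
        # Decoding happens here, per item, inside DataLoader workers.
        return self.paths[idx]
```

Wrapped in a `torch.utils.data.DataLoader` with `num_workers > 0`, the per-item decoding cost is amortized across worker processes, so all 50k volumes never need to be resident at once.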

Long Preprocessing Times

How long should the data preprocessing steps take? It seems to be stalling on my end; it has been running for more than 3 hours.

Error while reproducing the project

Hello! I want to try your zero-shot model on my own data, but I ran into problems running run_zero_shot.py. With the pretrained .pt file in place, I set

data_folder = '/dataset_metadata_validation_metadata.csv',
reports_file= "dataset_radiology_text_reports_validation_reports.csv",
labels = "dataset_multi_abnormality_labels_valid_predicted_labels.csv",

where these three .csv files were downloaded from your HuggingFace dataset. However, the dataloader in zero_shot.py does not seem to read the data correctly and throws the error below:

$ CUDA_VISIBLE_DEVICES=2 python run_zero_shot.py
/.conda/envs/ct_clip/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/.conda/envs/ct_clip/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG16_Weights.IMAGENET1K_V1. You can also use weights=VGG16_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
0it [00:00, ?it/s]
Traceback (most recent call last):
File "run_zero_shot.py", line 43, in
inference = CTClipInference(
File "/CT-CLIP-main/scripts/zero_shot.py", line 179, in init
self.dl = DataLoader(
File "/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 351, in init
sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
File "/.local/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 107, in init
raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0

How can I solve this problem? I'm not sure whether any of my settings are wrong. Also, if I want to use the model directly to diagnose new CT cases, is it just a matter of running run_zero_shot.py as I'm currently doing?
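The `num_samples=0` ValueError means the dataset's `__len__` returned 0 before the DataLoader was even used, which usually traces back to a wrong CSV path or an empty/unmatched file rather than to the sampler itself. A minimal pre-flight check (the helper below is hypothetical, not part of the CT-CLIP scripts):

```python
import csv
import os

# Minimal sketch (helper is hypothetical): "num_samples=0" means the Dataset
# found no rows, so verify each CSV path resolves and is non-empty before
# constructing CTClipInference.
def count_csv_rows(path):
    """Return the number of data rows in a CSV, or 0 if the file is missing."""
    if not os.path.isfile(path):
        return 0
    with open(path, newline="") as f:
        return sum(1 for _ in csv.DictReader(f))

# for p in [data_folder, reports_file, labels]:
#     n = count_csv_rows(p)
#     assert n > 0, f"{p} is missing or empty -> DataLoader raises num_samples=0"
```

Note that `data_folder = '/dataset_metadata_validation_metadata.csv'` is an absolute path starting at the filesystem root; if the file actually sits in the working directory, the leading `/` alone would make the dataset come up empty.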

Saved model is incomplete when using --use_fsdp

Thank you for open-sourcing such meaningful work!
I ran into a problem during training: when training with --use_fsdp, the saved model is incomplete, and the saved state_dict contains only a subset of the visual transformer parameters.
Have you encountered similar problems?
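A partial state_dict under FSDP is usually a sharding artifact: by default each rank holds (and saves) only its own shard of the parameters. A sketch of the standard fix using PyTorch's FSDP API, assuming `model` is the FSDP-wrapped module and the process group is initialized, is to gather a full state dict on rank 0 before saving:

```python
import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    StateDictType,
    FullStateDictConfig,
)

# Sketch (assumes torch.distributed is initialized and `model` is FSDP-wrapped):
# gather the complete, unsharded weights onto rank 0 before torch.save, instead
# of saving each rank's local shard.
def save_full_fsdp_checkpoint(model, path, rank):
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
        state = model.state_dict()  # all-gathered across ranks
    if rank == 0:
        torch.save(state, path)
```

`offload_to_cpu=True` keeps the gathered weights out of GPU memory, and `rank0_only=True` materializes the full dict only on rank 0, which is what then writes the checkpoint.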

Dataset storage size

Very meaningful work. Before I download the data from HuggingFace, can you tell me how much storage space the data takes up?

Issue with loading the dataset with huggingface

I get the following error after running the command:

load_dataset("ibrahimhamamci/CT-RATE")

Generating train split: 47149 examples [00:00, 135849.06 examples/s]
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 1989, in _prepare_split_single
writer.write_table(table)
File "/opt/conda/lib/python3.10/site-packages/datasets/arrow_writer.py", line 584, in write_table
pa_table = table_cast(pa_table, self._schema)
File "/opt/conda/lib/python3.10/site-packages/datasets/table.py", line 2240, in table_cast
return cast_table_to_schema(table, schema)
File "/opt/conda/lib/python3.10/site-packages/datasets/table.py", line 2194, in cast_table_to_schema
raise CastError(
datasets.table.CastError: Couldn't cast
VolumeName: string
Medical material: int64
Arterial wall calcification: int64
Cardiomegaly: int64
Pericardial effusion: int64
Coronary artery wall calcification: int64
Hiatal hernia: int64
Lymphadenopathy: int64
Emphysema: int64
Atelectasis: int64
Lung nodule: int64
Lung opacity: int64
Pulmonary fibrotic sequela: int64
Pleural effusion: int64
Mosaic attenuation pattern: int64
Peribronchial thickening: int64
Consolidation: int64
Bronchiectasis: int64
Interlobular septal thickening: int64
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 2787
to
{'VolumeName': Value(dtype='string', id=None), 'Manufacturer': Value(dtype='string', id=None), 'SeriesDescription': Value(dtype='string', id=None), 'ManufacturerModelName': Value(dtype='string', id=None), 'PatientSex': Value(dtype='string', id=None), 'PatientAge': Value(dtype='string', id=None), 'ReconstructionDiameter': Value(dtype='float64', id=None), 'DistanceSourceToDetector': Value(dtype='float64', id=None), 'DistanceSourceToPatient': Value(dtype='float64', id=None), 'GantryDetectorTilt': Value(dtype='int64', id=None), 'TableHeight': Value(dtype='float64', id=None), 'RotationDirection': Value(dtype='string', id=None), 'ExposureTime': Value(dtype='float64', id=None), 'XRayTubeCurrent': Value(dtype='int64', id=None), 'Exposure': Value(dtype='int64', id=None), 'FilterType': Value(dtype='string', id=None), 'GeneratorPower': Value(dtype='float64', id=None), 'FocalSpots': Value(dtype='string', id=None), 'ConvolutionKernel': Value(dtype='string', id=None), 'PatientPosition': Value(dtype='string', id=None), 'RevolutionTime': Value(dtype='float64', id=None), 'SingleCollimationWidth': Value(dtype='float64', id=None), 'TotalCollimationWidth': Value(dtype='float64', id=None), 'TableSpeed': Value(dtype='float64', id=None), 'TableFeedPerRotation': Value(dtype='float64', id=None), 'SpiralPitchFactor': Value(dtype='float64', id=None), 'DataCollectionCenterPatient': Value(dtype='string', id=None), 'ReconstructionTargetCenterPatient': Value(dtype='string', id=None), 'ExposureModulationType': Value(dtype='string', id=None), 'CTDIvol': Value(dtype='float64', id=None), 'ImagePositionPatient': Value(dtype='string', id=None), 'ImageOrientationPatient': Value(dtype='string', id=None), 'SliceLocation': Value(dtype='float64', id=None), 'SamplesPerPixel': Value(dtype='int64', id=None), 'PhotometricInterpretation': Value(dtype='string', id=None), 'Rows': Value(dtype='int64', id=None), 'Columns': Value(dtype='int64', id=None), 'XYSpacing': Value(dtype='string', id=None), 
'RescaleIntercept': Value(dtype='int64', id=None), 'RescaleSlope': Value(dtype='int64', id=None), 'RescaleType': Value(dtype='string', id=None), 'NumberofSlices': Value(dtype='int64', id=None), 'ZSpacing': Value(dtype='float64', id=None), 'StudyDate': Value(dtype='int64', id=None)}
because column names don't match

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "/opt/conda/lib/python3.10/site-packages/datasets/load.py", line 2582, in load_dataset
builder_instance.download_and_prepare(
File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 1005, in download_and_prepare
self._download_and_prepare(
File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 1100, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 1860, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 1991, in _prepare_split_single
raise DatasetGenerationCastError.from_cast_error(
datasets.exceptions.DatasetGenerationCastError: An error occurred while generating the dataset

All the data files must have the same columns, but at some point there are 18 new columns (Pericardial effusion, Coronary artery wall calcification, Mosaic attenuation pattern, Medical material, Lung nodule, Bronchiectasis, Lung opacity, Hiatal hernia, Pleural effusion, Pulmonary fibrotic sequela, Interlobular septal thickening, Atelectasis, Cardiomegaly, Consolidation, Lymphadenopathy, Peribronchial thickening, Emphysema, Arterial wall calcification) and 43 missing columns (DataCollectionCenterPatient, ConvolutionKernel, Rows, CTDIvol, TableHeight, SeriesDescription, RotationDirection, RescaleType, TotalCollimationWidth, Columns, GantryDetectorTilt, TableSpeed, TableFeedPerRotation, SingleCollimationWidth, RevolutionTime, ImageOrientationPatient, ExposureModulationType, SliceLocation, PatientSex, PhotometricInterpretation, NumberofSlices, ManufacturerModelName, DistanceSourceToDetector, XRayTubeCurrent, ReconstructionTargetCenterPatient, DistanceSourceToPatient, RescaleSlope, ZSpacing, SamplesPerPixel, StudyDate, PatientAge, RescaleIntercept, Manufacturer, Exposure, FocalSpots, SpiralPitchFactor, FilterType, ReconstructionDiameter, ExposureTime, GeneratorPower, XYSpacing, ImagePositionPatient, PatientPosition).

This happened while the csv dataset builder was generating data using

hf://datasets/ibrahimhamamci/CT-RATE/dataset/multi_abnormality_labels/train_predicted_labels.csv (at revision 4d92f6d4f805e36e2891359c04302705c314fe43)
