mlcommons / gandlf

A generalizable application framework for segmentation, regression, and classification using PyTorch

License: Apache License 2.0

Python 79.52% Roff 0.44% Dockerfile 0.08% Shell 0.39% Jupyter Notebook 19.57%

deep-learning regression classification segmentation data-augmentation biomedical-image-processing medical-imaging framework clinical-workflow machine-learning

gandlf's Issues

Add epoch header for output predictions

Is your feature request related to a problem? Please describe.
output_predictions.csv contains headers multiple times but not epoch information.

Describe the solution you'd like
This information would be great to have.
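
One possible shape for this, sketched in Python. The column names (`epoch`, `subject_id`, `prediction`) are illustrative assumptions and are not taken from GaNDLF: the header is written once and every row carries its epoch.

```python
import csv
import io


def write_predictions(handle, rows_by_epoch):
    """Write one header line, then one row per prediction, tagged with its epoch.

    rows_by_epoch maps an epoch number to a list of (subject_id, prediction)
    pairs. The column names below are illustrative only.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["epoch", "subject_id", "prediction"])
    for epoch in sorted(rows_by_epoch):
        for subject_id, prediction in rows_by_epoch[epoch]:
            writer.writerow([epoch, subject_id, prediction])


# Usage: an in-memory buffer stands in for output_predictions.csv.
buffer = io.StringIO()
write_predictions(buffer, {0: [("s1", 0.25)], 1: [("s1", 0.75)]})
csv_text = buffer.getvalue()
```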

Describe alternatives you've considered
N.A.

Additional context
N.A.

Add texture based feature extraction (radiomics)

Is your feature request related to a problem? Please describe.
As texture features are increasingly being used in medical imaging, it would be nice to have support for them here.

Describe the solution you'd like
Integrate popular radiomics package pyradiomics.

Describe alternatives you've considered
The GPU-based radiomics package cuRadiomics, although it might be unstable or not actively developed. Open to any other suggestions.

Additional context
None

IndexError: list index out of range

Hi again,

sorry for opening so many issues >.<
When I try to train on the toy dataset in testing/data.zip I get the error IndexError: list index out of range. This might originate from normalize_nonZero in data_preprocessing. I am using the newest pull from gandalf-refactor and am using Linux.

The train.csv and the toy data:
https://cloud-ext.igd.fraunhofer.de/s/8kCtZzcFRX96Xt8

Full error log:

Using default folds for testing split:  -5
Using default folds for validation split:  -5
Number of channels :  3
Channel Keys :  ['subject_id', '1', '2', '3', 'label', 'path_to_metadata', 'value_0']



Initializing training at :  2021-03-28 10:34:16.335452
Found a pre-existing file for logging, now appending logs to that file!
Found a pre-existing file for logging, now appending logs to that file!
Device requested via CUDA_VISIBLE_DEVICES:  0
Total number of CUDA devices:  1
Device finally used:  0
Sending model to aforementioned device
Memory Total :  15.9 GB, Allocated:  0.1 GB, Cached:  0.1 GB
Device - Current: 0 Count: 1 Name: Tesla P100-PCIE-16GB Availability: True
Using device: cuda
********************
Starting Epoch :  0
Epoch start time :  2021-03-28 10:34:18.923012
Traceback (most recent call last):
  File "gandlf_run", line 75, in <module>
    main()
  File "gandlf_run", line 70, in main
    TrainingManager(dataframe=data_full, headers = headers, outputDir=model_path, parameters=parameters, device=device, reset_prev = reset_prev)
  File "/content/GaNDLF-refactor/GANDLF/training_manager.py", line 146, in TrainingManager
    device=device, params=parameters, testing_data=testingData)
  File "/content/GaNDLF-refactor/GANDLF/training_loop.py", line 477, in training_loop
    model, train_dataloader, optimizer, params
  File "/content/GaNDLF-refactor/GANDLF/training_loop.py", line 133, in train_network
    for batch_idx, (subject) in enumerate(train_dataloader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 557, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torchio/data/queue.py", line 164, in __getitem__
    self.fill()
  File "/usr/local/lib/python3.7/dist-packages/torchio/data/queue.py", line 228, in fill
    subject = self.get_next_subject()
  File "/usr/local/lib/python3.7/dist-packages/torchio/data/queue.py", line 238, in get_next_subject
    subject = next(self.subjects_iterable)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torchio/data/dataset.py", line 85, in __getitem__
    subject = self._transform(subject)
  File "/usr/local/lib/python3.7/dist-packages/torchio/transforms/transform.py", line 121, in __call__
    transformed = self.apply_transform(subject)
  File "/usr/local/lib/python3.7/dist-packages/torchio/transforms/augmentation/composition.py", line 47, in apply_transform
    subject = transform(subject)
  File "/usr/local/lib/python3.7/dist-packages/torchio/transforms/transform.py", line 121, in __call__
    transformed = self.apply_transform(subject)
  File "/content/GaNDLF-refactor/GANDLF/preprocessing.py", line 221, in apply_transform
    images_dict[names_list[idx]]['data'] = torch.tensor(np.expand_dims(array, axis=0))
IndexError: list index out of range

This is the model.yaml:

# affix version
version:
  {
    minimum: 0.0.8,
    maximum: 0.0.8 # this should NOT be made a variable, but should be tested after every tag is created
  }
metrics:
  - mse
# Choose the model parameters here
model:
  {
    dimension: 2, # the dimension of the model and dataset: defines dimensionality of computations
    base_filters: 30, # Set base filters: number of filters present in the initial module of the U-Net convolution; for IncU-Net, keep this divisible by 4
    architecture: vgg16, # options: unet, resunet, fcn, uinc, vgg, densenet
    batch_norm: True, # this is only used for vgg
    final_layer: None, # can be either sigmoid, softmax or none (none == regression)
    amp: False, # Set if you want to use Automatic Mixed Precision for your operations or not - options: True, False
    n_channels: 3, # set the input channels - useful when reading RGB or images that have vectored pixel types
  }
# this is to enable or disable lazy loading - setting to true reads all data once during data loading, resulting in improvements
# in I/O at the expense of memory consumption
in_memory: False
# this will save the generated masks for validation and testing data for qualitative analysis
save_masks: False
# Set the Modality : rad for radiology, path for histopathology
modality: rad
# Patch size during training - 2D patch for breast images since third dimension is not patched 
patch_size: [64,64,64]
# uniform: UniformSampler or label: LabelSampler
patch_sampler: uniform
# Number of epochs
num_epochs: 100
# Set the patience - measured in number of epochs after which, if the performance metric does not improve, exit the training loop - defaults to the number of epochs
patience: 50
# Set the batch size
batch_size: 1
# Set the initial learning rate
learning_rate: 0.001
# Learning rate scheduler - options: triangle, triangle_modified, exp, reduce-on-lr, step, more to come soon - default hyperparameters can be changed thru code
scheduler: triangle
# Set which loss function you want to use - options : 'dc' - for dice only, 'dcce' - for sum of dice and CE and you can guess the next (only lower-case please)
# options: dc (dice only), dc_log (-log of dice), ce (), dcce (sum of dice and ce), mse () ...
# mse is the MSE defined by torch and can define a variable 'reduction'; see https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss
# use mse_torch for regression/classification problems and dice for segmentation
loss_function: mse
# this parameter weights the loss to handle imbalanced losses better
weighted_loss: True 
#loss_function:
#  {
#    'mse':{
#      'reduction': 'mean' # see https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss for all options
#    }
#  }
# Which optimizer do you want to use - adam/sgd
opt: adam
# this parameter controls the nested training process
# performs randomized k-fold cross-validation
# split is performed using sklearn's KFold method
# for single fold run, use '-' before the fold number
nested_training:
  {
    #testing: 5, # this controls the testing data splits for final model evaluation; use '1' if this is to be disabled
    #validation: 5 # this controls the validation data splits for model training
  }
## pre-processing
# this constructs an order of transformations, which is applied to all images in the data loader
# order: resize --> threshold/clip --> resample --> normalize
# 'threshold': performs intensity thresholding; i.e., if x[i] < min: x[i] = 0; and if x[i] > max: x[i] = 0
# 'clip': performs intensity clipping; i.e., if x[i] < min: x[i] = min; and if x[i] > max: x[i] = max
# 'threshold'/'clip': if either min/max is not defined, it is taken as the minimum/maximum of the image, respectively
# 'normalize': performs z-score normalization: https://torchio.readthedocs.io/transforms/preprocessing.html?highlight=ToCanonical#torchio.transforms.ZNormalization
# 'normalize_nonZero': perform z-score normalize but with mean and std-dev calculated on only non-zero pixels
# 'normalize_nonZero_masked': perform z-score normalize but with mean and std-dev calculated on only non-zero pixels with the stats applied on non-zero pixels
# 'crop_external_zero_planes': crops all non-zero planes from input tensor to reduce image search space
# 'resample: resolution: X,Y,Z': resample the voxel resolution: https://torchio.readthedocs.io/transforms/preprocessing.html?highlight=ToCanonical#torchio.transforms.Resample
# 'resample: resolution: X': resample the voxel resolution in an isotropic manner: https://torchio.readthedocs.io/transforms/preprocessing.html?highlight=ToCanonical#torchio.transforms.Resample
# resize the image(s) and mask (this should be greater than or equal to patch_size); resize is done ONLY when resample is not defined
data_preprocessing:
  {
    # 'normalize',
    'normalize_nonZero', # this performs z-score normalization only on non-zero pixels
    'resample':{
      'resolution': [1,2,3]
    },
    #'resize': [128,128], # this is generally not recommended, as it changes image properties in unexpected ways
    'crop_external_zero_planes', # this will crop all zero-valued planes across all axes
  }
# various data augmentation techniques
# options: affine, elastic, downsample, motion, ghosting, bias, blur, gaussianNoise, swap
# keep/edit as needed
# all transforms: https://torchio.readthedocs.io/transforms/transforms.html?highlight=transforms
# 'kspace': one of motion, ghosting or spiking is picked (randomly) for augmentation
# 'probability' subkey adds the probability of the particular augmentation getting added during training (this is always 1 for normalize and resampling)
data_augmentation: 
  {
    default_probability: 0.5,
    'affine',
    'elastic',
    'kspace':{
      'probability': 1
    },
    'bias',
    'blur': {
      'std': [0, 1] # default std-dev range, for details, see https://torchio.readthedocs.io/transforms/augmentation.html?highlight=randomblur#torchio.transforms.RandomBlur
    },
    'noise': { # for details, see https://torchio.readthedocs.io/transforms/augmentation.html?highlight=randomblur#torchio.transforms.RandomNoise
      'mean': 0, # default mean
      'std': [0, 1] # default std-dev range
    },
    'anisotropic':{
      'axis': [0,1],
      'downsampling': [2,2.5]
    },
  }
# parallel training on HPC - here goes the command to prepend to send to a high performance computing
# cluster for parallel computing during multi-fold training
# not used for single fold training
# this gets passed before the training_loop, so ensure enough memory is provided along with other parameters
# that your HPC would expect
# ${outputDir} will be changed to the outputDir you pass in CLI + '/${fold_number}'
# ensure that the correct location of the virtual environment is getting invoked, otherwise it would pick up the system python, which might not have all dependencies
# parallel_compute_command: 'qsub -b y -l gpu -l h_vmem=32G -cwd -o ${outputDir}/\$JOB_ID.stdout -e ${outputDir}/\$JOB_ID.stderr `pwd`/sge_wrapper _correct_location_of_virtual_environment_/venv/bin/python'
## queue configuration - https://torchio.readthedocs.io/data/patch_training.html?#queue
# this determines the maximum number of patches that can be stored in the queue. Using a large number means that the queue needs to be filled less often, but more CPU memory is needed to store the patches
q_max_length: 40
# this determines the number of patches to extract from each volume. A small number of patches ensures a large variability in the queue, but training will be slower
q_samples_per_volume: 5
# this determines the number subprocesses to use for data loading; '0' means main process is used
q_num_workers: 2 # scale this according to available CPU resources
# used for debugging
q_verbose: False

Best
Karol

Add options for other VGG configurations and add "batch_norm" to options

Is your feature request related to a problem? Please describe.
Currently, we only have VGG16 with batch_norm disabled.

Describe the solution you'd like
Adding more options would be good for different applications.

Describe alternatives you've considered
N.A.

Additional context
N.A.

U-Net style archs need a minimum patch size

Is your feature request related to a problem? Please describe.
Currently, if the patch size of U-Net variants is 16, the final output is [...,1], which does not make sense.

Describe the solution you'd like
Add a minimum patch size requirement, based on the number of layers in the network.
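
A minimal sketch of such a check, assuming every encoder level halves each spatial dimension (so, for example, four halving stages shrink a 16-wide patch to 1). The function names and the `min_bottleneck_size` parameter are hypothetical, not GaNDLF's.

```python
def minimum_patch_size(num_downsampling_layers, min_bottleneck_size=2):
    """Smallest patch edge that keeps the bottleneck at least min_bottleneck_size wide.

    Assumes every encoder level halves each spatial dimension (for example,
    2x2 max pooling with stride 2), which may not match every architecture.
    """
    return min_bottleneck_size * 2 ** num_downsampling_layers


def check_patch_size(patch_size, num_downsampling_layers):
    """Raise ValueError if any patch edge is too small or not evenly divisible."""
    required = minimum_patch_size(num_downsampling_layers)
    divisor = 2 ** num_downsampling_layers
    for edge in patch_size:
        if edge < required:
            raise ValueError(f"patch edge {edge} is below the minimum of {required}")
        if edge % divisor != 0:
            raise ValueError(f"patch edge {edge} is not divisible by {divisor}")
```

With four downsampling layers, `check_patch_size([16, 16], 4)` would raise, while `check_patch_size([64, 64, 64], 4)` would pass.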

Describe alternatives you've considered
N.A.

Additional context
Reported by Vinayak.

Alternatives to OpenSlide

Is your feature request related to a problem? Please describe.
OpenSlide is (and probably will always remain) a pain to install and use because it needs the library files present and correctly loaded in the environment prior to any import.

Describe the solution you'd like
Look for alternatives to this for WSI I/O.

Describe alternatives you've considered
N.A.

Additional context
Post alternatives (and reasons why it is better than OpenSlide) as comments.

Improvement in Inference Manager for Classification Tasks

Is your feature request related to a problem? Please describe.
This relates to tasks that use k-fold cross-validation.

Describe the solution you'd like
The inference manager should loop over all folds and, for classification problems, average the results over the folds and save them as a probabilities.csv file.
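
A rough sketch of the averaging step, assuming each fold yields a mapping from subject ID to class probabilities. The data layout and the CSV column names are assumptions for illustration only.

```python
import csv
import io


def average_fold_probabilities(fold_probabilities):
    """Average class probabilities across folds for each subject.

    fold_probabilities is a list with one dict per fold, each mapping a
    subject ID to that fold's list of class probabilities.
    """
    averaged = {}
    for subject_id in fold_probabilities[0]:
        per_fold = [fold[subject_id] for fold in fold_probabilities]
        num_classes = len(per_fold[0])
        averaged[subject_id] = [
            sum(probs[c] for probs in per_fold) / len(per_fold)
            for c in range(num_classes)
        ]
    return averaged


def write_probabilities(handle, averaged):
    """Write the averaged probabilities as CSV (column names are illustrative)."""
    writer = csv.writer(handle, lineterminator="\n")
    num_classes = len(next(iter(averaged.values())))
    writer.writerow(["subject_id"] + [f"probability_{c}" for c in range(num_classes)])
    for subject_id, probs in averaged.items():
        writer.writerow([subject_id] + probs)


# Usage with two folds and two classes.
folds = [
    {"s1": [0.5, 0.5], "s2": [1.0, 0.0]},
    {"s1": [0.25, 0.75], "s2": [0.5, 0.5]},
]
averaged = average_fold_probabilities(folds)
buffer = io.StringIO()
write_probabilities(buffer, averaged)
```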

Additional context
N.A.

Memory usage is exploding after adding subject consistency check

Describe the bug
When torchio.Subject.check_consistent_* checks are called, RAM usage explodes, most likely because of this bug in PyTorch.

To Reproduce
Steps to reproduce the behavior:

Start training with a large number of subjects.
See memory usage linearly increasing and never going down

Expected behavior
Consistency check should happen once and then the image should be unloaded from memory.

Screenshots
N.A.

GaNDLF Version
0.0.10-dev

Desktop (please complete the following information):
N.A.

Additional context
N.A.

Need to switch from BCE

Describe the bug
When running GaNDLF for a 3D U-Net segmentation task with a weighted dcce loss function, the following error occurred.

/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [31,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.

Traceback (most recent call last):
  File "/cbica/home/patis/comp_space/testing/gandlf_mine/gandlf_run", line 157, in <module>
    main()
  File "/cbica/home/patis/comp_space/testing/gandlf_mine/gandlf_run", line 129, in main
    reset_prev=reset_prev,
  File "/gpfs/fs001/cbica/comp_space/patis/testing/gandlf_mine/GANDLF/training_manager.py", line 319, in TrainingManager_split
    testing_data=None,
  File "/gpfs/fs001/cbica/comp_space/patis/testing/gandlf_mine/GANDLF/training_loop.py", line 792, in training_loop
    model, val_dataloader, scheduler, params, epoch, mode="validation"
  File "/gpfs/fs001/cbica/comp_space/patis/testing/gandlf_mine/GANDLF/training_loop.py", line 473, in validate_network
    result = step(model, image, label, params)
  File "/gpfs/fs001/cbica/comp_space/patis/testing/gandlf_mine/GANDLF/training_loop.py", line 94, in step
    loss, metric_output = get_loss_and_metrics(label, output, params)
  File "/gpfs/fs001/cbica/comp_space/patis/testing/gandlf_mine/GANDLF/parameterParsing.py", line 482, in get_loss_and_metrics
    metric_function(predicted, ground_truth, params).cpu().data.item()
RuntimeError: CUDA error: device-side assert triggered

This indicated that there was a problem with the use of the BCELoss() function in line 79 of losses.py:

loss = torch.nn.BCELoss()

After further investigation, it seems that changing line 79 to use the BCEWithLogitsLoss() function solves the issue:

   loss = torch.nn.BCEWithLogitsLoss()
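
For context: torch.nn.BCELoss expects inputs that are already probabilities in [0, 1], whereas torch.nn.BCEWithLogitsLoss takes raw logits and applies the sigmoid internally. The plain-Python sketch below is not GaNDLF or PyTorch code; it only illustrates the difference for a single value. Note that if the model's final layer already applies a sigmoid, swapping the loss would apply it twice, so the change is only appropriate when the loss receives raw outputs.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_from_probability(p, target):
    """Binary cross-entropy for a probability p; only defined for 0 <= p <= 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"expected a probability in [0, 1], got {p}")
    eps = 1e-12  # avoid log(0)
    return -(target * math.log(max(p, eps)) + (1 - target) * math.log(max(1 - p, eps)))


def bce_from_logit(x, target):
    """Numerically stable binary cross-entropy for a raw logit x (any real number)."""
    return max(x, 0.0) - x * target + math.log1p(math.exp(-abs(x)))
```

For any logit `x`, `bce_from_logit(x, t)` agrees with `bce_from_probability(sigmoid(x), t)`, while passing a value outside [0, 1] to `bce_from_probability` is rejected, mirroring the assertion in the traceback above.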

To Reproduce
Steps to reproduce the behavior:
Run this command:

source activate /gpfs/fs001/cbica/home/ahluwalv/dbt_deep_learning/GaNDLF/venv

/gpfs/fs001/cbica/home/ahluwalv/dbt_deep_learning/GaNDLF/venv/bin/python /cbica/home/ahluwalv/dbt_deep_learning/GaNDLF/gandlf_run -config /cbica/home/patis/comp_space/testing/gandlf_mine/exp_vinny/A/config.yaml -data /cbica/home/ahluwalv/dbt_deep_learning/train_dbt.csv,/cbica/home/ahluwalv/dbt_deep_learning/val_dbt.csv -output /cbica/home/ahluwalv/dbt_deep_learning/train_logs_augmentation/ -train 1 -device cuda  -reset_prev True

Expected behavior
The values passed to the loss function are expected to remain within [0, 1].

GaNDLF Version
0.0.10-dev

Desktop (please complete the following information):

OS: Linux

Additional context
This issue occurred during epoch 4, so it may be intermittent. The suggested fix will probably prevent it from happening again.

Dealing with model collapse

Is your feature request related to a problem? Please describe.
Sometimes, because of the way a model gets initialized/trained, model collapse (i.e., loss and weights all become nan) can happen.

Describe the solution you'd like
Instead of having to re-initialize the complete training sequence manually, we should explore ways to make GaNDLF do this automatically.
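
One way this could look, as a sketch only: `build_model` and `train_one_epoch` are hypothetical callables supplied by the caller, and the restart policy (fresh weights, capped number of attempts) is an assumption rather than anything GaNDLF does today.

```python
import math


def train_with_restarts(build_model, train_one_epoch, num_epochs, max_restarts=3):
    """Re-initialize and restart training whenever the loss becomes NaN or infinite.

    build_model and train_one_epoch are hypothetical callables supplied by the
    caller; train_one_epoch(model, epoch) is assumed to return the epoch loss.
    Returns the trained model and the number of restarts that were needed.
    """
    for attempt in range(max_restarts + 1):
        model = build_model()  # fresh weights on every attempt
        collapsed = False
        for epoch in range(num_epochs):
            loss = train_one_epoch(model, epoch)
            if not math.isfinite(loss):
                collapsed = True
                break
        if not collapsed:
            return model, attempt
    raise RuntimeError(f"model collapsed on all {max_restarts + 1} attempts")
```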

Describe alternatives you've considered
Open to other suggestions.

Additional context

Multi-batch training for regression/classification is failing

Describe the bug
The ground truth shape is inconsistent

To Reproduce
Steps to reproduce the behavior:

Train for regression/classification with multiple batches
See error at L159 of GANDLF.training_loop

Expected behavior
This error should not happen.

Screenshots
N.A.

GaNDLF Version
0.0.10

Desktop (please complete the following information):
N.A.

Additional context
N.A.

Add option to pad images/masks in the config for label sampler

Is your feature request related to a problem? Please describe.
Label sampler needs padding to happen for the training to proceed correctly. But padding has currently been disabled to save memory.

Describe the solution you'd like
Enable this via a config option.

Describe alternatives you've considered
N.A.

Additional context
N.A.

Pytest for inference

Is your feature request related to a problem? Please describe.
Current unit tests only contain training tests, and are missing inference tests.

Describe the solution you'd like
Add a simple test for inference.

Describe alternatives you've considered
N.A.

Additional context
N.A.

Missing parameters in sample configs

Hi,

the sample configs don't seem to be up to date. Would probably be a good idea to check them all ;)
I got another error from my problem over in #27.
This time it is:

Using default folds for testing split:  -5
Using default folds for validation split:  -5
Number of channels :  3
Traceback (most recent call last):
  File "gandlf_run", line 75, in <module>
    main()
  File "gandlf_run", line 70, in main
    TrainingManager(dataframe=data_full, headers = headers, outputDir=model_path, parameters=parameters, device=device, reset_prev = reset_prev)
  File "/content/GaNDLF-refactor/GANDLF/training_manager.py", line 146, in TrainingManager
    device=device, params=parameters, testing_data=testingData)
  File "/content/GaNDLF-refactor/GANDLF/training_loop.py", line 357, in training_loop
    train=True,
  File "/content/GaNDLF-refactor/GANDLF/data/ImagesFromDataFrame.py", line 240, in ImagesFromDataFrame
    augmentation_list.append(global_preprocessing_dict['crop_external_zero_planes'](patch_size))
  File "/content/GaNDLF-refactor/GANDLF/data/ImagesFromDataFrame.py", line 71, in crop_external_zero_planes
    return CropExternalZeroplanes(patch_size=patch_size)
TypeError: __init__() missing 1 required positional argument: 'psize'

So the psize parameter is missing.

Best
Karol

Ensure correct augmentation order

Describe the bug
For MR images, when multiple augmentation/preprocessing steps are selected, the correct order needs to be maintained.

To Reproduce
This is not really an error as much as incorrect computation.

Expected behavior
MR images need some augmentations to be handled BEFORE any normalization happens. The order should be:

Affine / Resample
Blur
Gamma (to change the subject intrinsic contrast, before applying another artefact)
Bias
Ghosting / Spike / Motion
Noise
Normalize / intensity scaling

Rest of the augmentations can work on normalized images, so the current pipeline should work.
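
The ordering above could be enforced with a small sorting helper, sketched here. The lowercase keys are illustrative and do not necessarily match GaNDLF's configuration keys; transforms outside the list are placed after normalization, in line with the note above.

```python
# Canonical order taken from the list above; the lowercase keys are illustrative
# and do not necessarily match GaNDLF's configuration keys.
MR_AUGMENTATION_ORDER = [
    "affine", "resample", "blur", "gamma", "bias",
    "ghosting", "spike", "motion", "noise", "normalize",
]


def order_augmentations(requested):
    """Sort requested transforms into the canonical order; unknown ones go last."""
    rank = {name: i for i, name in enumerate(MR_AUGMENTATION_ORDER)}
    # sorted() is stable, so unknown transforms keep their relative order.
    return sorted(requested, key=lambda name: rank.get(name, len(rank)))


ordered = order_augmentations(["normalize", "noise", "flip", "bias", "affine"])
```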

Screenshots
N.A.

GaNDLF Version
0.0.10-dev

Desktop (please complete the following information):
N.A.

Additional context
Please see discussion in fepegar/torchio#600

Error in `gandlf_collectStats` documentation

Describe the bug
I am working in WSL2 and followed all installation instructions.
I was able to run the file "gandlf_run" on CUDA device and some folders are generated in the Output directory.
However I am getting the following error while running the "gandlf_collectStats"
(GaNDLF Version: 0.0.9)

Optimizer should be stepped before scheduler

Describe the bug
Currently, GaNDLF calls scheduler.step() before optimizer.step(), but it should be the other way around: in PyTorch 1.1.0 and later, optimizer.step() should be called before scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-
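
To illustrate why the order matters, here is a toy simulation in plain Python. `ToyScheduler` is a made-up stand-in, not the PyTorch API; it only mimics the fact that the first scheduled value is already active when the scheduler is created and that each `step()` advances to the next value.

```python
class ToyScheduler:
    """Toy stand-in for a learning-rate scheduler; not the PyTorch API."""

    def __init__(self, schedule):
        self.schedule = schedule
        self.index = 0  # the first value is active as soon as the scheduler exists

    @property
    def lr(self):
        # Stay on the last value once the schedule is exhausted.
        return self.schedule[min(self.index, len(self.schedule) - 1)]

    def step(self):
        self.index += 1


def run(schedule, num_epochs, optimizer_first):
    """Return the learning rate that each epoch's optimizer update actually used."""
    scheduler = ToyScheduler(schedule)
    used = []
    for _ in range(num_epochs):
        if optimizer_first:
            used.append(scheduler.lr)  # stands in for optimizer.step()
            scheduler.step()
        else:
            scheduler.step()
            used.append(scheduler.lr)  # stands in for optimizer.step()
    return used


schedule = [0.1, 0.01, 0.001]
correct = run(schedule, 3, optimizer_first=True)
skipped = run(schedule, 3, optimizer_first=False)
```

In this toy run, `correct` uses every scheduled value, while `skipped` never trains with the first one (0.1).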

Additional context
Reported by @Geeks-Sid

`light_unet` is not working via unit tests

Describe the bug
When light_unet is added to the all_modes_segmentation [ref], the tests fail.

To Reproduce
Steps to reproduce the behavior:

Add light_unet to all_modes_segmentation [ref]
Run pytest [ref]
See error

Expected behavior
It should pass.

Screenshots
N.A.

GaNDLF Version
0.0.10-dev

Additional context
N.A.

Issue in the "normalize_by_val" function

Describe the bug
I believe that the normalize_by_val function is broken. The function itself seems to not work:

def normalize_by_val(input_tensor, mean, std):
    """
    This function returns the tensor normalized by these particular values
    """
    return transforms.Normalize(mean, std)

There is no function called transforms.Normalize, and input_tensor is never used.

Expected behavior
The function should look something like this:

def normalize_by_val(input_tensor, mean, std):
    """
    This function returns the tensor normalized by these particular values
    """
    return normalize_function(input_tensor, mean, std)

OR:

def normalize_by_val(input_tensor, mean, std):
    """
    This function returns the tensor normalized by these particular values
    """
    normalizer = NormalizeClass(mean, std) 
    return normalizer(input_tensor)
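
A third option that needs no helper at all, sketched under the assumption that mean and std are scalars: plain arithmetic works for floats, NumPy arrays, and torch tensors alike. The zero check is an addition for illustration, not part of the original function.

```python
def normalize_by_val(input_tensor, mean, std):
    """Return the input shifted by mean and scaled by std.

    Works for any input that supports subtraction and division by a scalar,
    such as a float, a NumPy array, or a torch.Tensor. mean and std are
    assumed to be scalars here.
    """
    if std == 0:
        raise ValueError("std must be non-zero")
    return (input_tensor - mean) / std
```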

GANDLF Version
0.0.8-dev

Validation DICE calculation should not include background

Is your feature request related to a problem? Please describe.
Currently, GaNDLF allows a model to train on background (useful when the non-annotated region of the mask is extremely varied). However, the validation DICE score is calculated over the background as well, which should not be the case.

Describe the solution you'd like
Add an option to discard a particular label from the class list in validation dice consideration.
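
A small sketch of the idea on flat label sequences. The `ignore_labels` parameter name is hypothetical, and real masks would be tensors rather than Python lists.

```python
def dice_per_class(prediction, ground_truth, label):
    """Dice score for one label over two equally long flat label sequences."""
    pred_hits = [p == label for p in prediction]
    true_hits = [t == label for t in ground_truth]
    intersection = sum(p and t for p, t in zip(pred_hits, true_hits))
    total = sum(pred_hits) + sum(true_hits)
    # Treat a label absent from both sequences as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(prediction, ground_truth, class_list, ignore_labels=()):
    """Average Dice over class_list, skipping any label in ignore_labels."""
    kept = [label for label in class_list if label not in ignore_labels]
    if not kept:
        raise ValueError("no classes left after removing ignored labels")
    scores = [dice_per_class(prediction, ground_truth, label) for label in kept]
    return sum(scores) / len(scores)


prediction = [0, 0, 0, 0, 1, 1, 0, 0]
ground_truth = [0, 0, 0, 0, 1, 0, 0, 0]
with_background = mean_dice(prediction, ground_truth, [0, 1])
without_background = mean_dice(prediction, ground_truth, [0, 1], ignore_labels=[0])
```

In this example the large background region inflates the averaged score, so `with_background` comes out higher than `without_background`.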

Describe alternatives you've considered
N.A.

Additional context
N.A.

Add option to track memory usage

Is your feature request related to a problem? Please describe.
Tracking memory usage is a nice option to have for debugging.

Describe the solution you'd like
Add an option track_memory_usage in configuration and if enabled, it can save memory logs per epoch.
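
A minimal sketch using Python's standard tracemalloc module. `train_one_epoch` is a hypothetical callable, and the log layout is an assumption. tracemalloc only sees allocations made through Python's allocator, so GPU memory and memory held by native libraries may not be captured and would need a different tool.

```python
import tracemalloc


def train_with_memory_log(train_one_epoch, num_epochs, track_memory_usage=True):
    """Run the epochs and, if enabled, record Python heap usage after each one.

    train_one_epoch is a hypothetical callable taking the epoch index.
    """
    memory_log = []
    if track_memory_usage:
        tracemalloc.start()
    try:
        for epoch in range(num_epochs):
            train_one_epoch(epoch)
            if track_memory_usage:
                current, peak = tracemalloc.get_traced_memory()
                memory_log.append(
                    {"epoch": epoch, "current_bytes": current, "peak_bytes": peak}
                )
    finally:
        if track_memory_usage:
            tracemalloc.stop()
    return memory_log
```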

Describe alternatives you've considered
N.A.

Additional context
N.A.

Add hausdorff 95

Is your feature request related to a problem? Please describe.
Hausdorff metric would be good to have.

Describe the solution you'd like
MedPy has an implementation, but pulling in the entire library is cumbersome, so perhaps only that implementation can be adapted.
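
For reference, a dependency-free sketch of one common definition of the metric: the 95th percentile of the pooled nearest-neighbour distances in both directions. It works on lists of boundary point coordinates and uses a brute-force search; extracting surface points from a mask and accounting for voxel spacing are left out, and this is not claimed to match any particular library's output.

```python
import math


def directed_distances(points_a, points_b):
    """For each point in points_a, the distance to its nearest point in points_b."""
    return [min(math.dist(a, b) for b in points_b) for a in points_a]


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return ordered[lower]
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def hausdorff_95(points_a, points_b):
    """95th percentile of the pooled nearest-neighbour distances in both directions."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")
    pooled = directed_distances(points_a, points_b) + directed_distances(points_b, points_a)
    return percentile(pooled, 95)
```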

Describe alternatives you've considered
Using this metric from a different package.

Additional context
N.A.

Imagenet weights for GANDLF vgg/resnet etc for 2D images

Is your feature request related to a problem? Please describe.
It would be great to have pretrained ImageNet weights in GaNDLF for 2D images.

Describe the solution you'd like
In the .yaml file, it would be great to have an imagenet option that can be set to true to load pretrained weights when needed.

Describe alternatives you've considered

Additional context
https://pytorch.org/vision/0.8/_modules/torchvision/models/vgg.html

Extend loss weight calculation for classification

Is your feature request related to a problem? Please describe.
Currently, classification problems don't use weight calculations for specific classes.

Describe the solution you'd like
Similar to segmentation, the weights should be dynamic.
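
As an illustration, inverse-frequency weighting is one common way to derive such weights from the training labels. This sketch is an assumption about what "dynamic" could mean here and is not necessarily the scheme GaNDLF uses for segmentation.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights, normalized to sum to 1.

    Rarer classes receive larger weights.
    """
    counts = Counter(labels)
    inverse = {label: len(labels) / count for label, count in counts.items()}
    total = sum(inverse.values())
    return {label: value / total for label, value in inverse.items()}


# Three samples of class 0 and one of class 1.
weights = class_weights([0, 0, 0, 1])
```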

Describe alternatives you've considered
N.A.

Additional context
N.A.

Add attention maps

Is your feature request related to a problem? Please describe.
Interpretability in DL training is critical and would be very nice to have in GaNDLF.

Describe the solution you'd like
Integrate M3d-Cam into the training/inference process.

Describe alternatives you've considered
Open to other suggestions.

Additional context
Also requested by @gastouna

Ensure memory pinning is happening properly

Is your feature request related to a problem? Please describe.
Currently, memory pinning doesn't work for CPU data

Describe the solution you'd like
Implement the solution described in a comment on fepegar/torchio#568.

Describe alternatives you've considered
N.A.

Additional context
N.A.

Source formatting standardization

Is your feature request related to a problem? Please describe.
Automatic source-level standardization is useful when multiple people are working on the same codebase to ensure cohesive code styling.

Describe the solution you'd like
Black is a very nice mechanism to do this. And putting this as a pre-commit hook would save maintainers a lot of headache and time during PR reviews.

Describe alternatives you've considered
Manually force consistent styling, which is painful.

Additional context
N.A.

Allow resizing of differently sized images in a single cohort

Is your feature request related to a problem? Please describe.
Currently, the resize preprocessing function works only when all inputs have the same size, which is restrictive.

Describe the solution you'd like
This should be expanded to be able to tackle any image sizes.

Describe alternatives you've considered
N.A.

Additional context
Reported by @Karol-G

Fix codacy issues

Is your feature request related to a problem? Please describe.
There are some basic issues being reported by codacy, which can be quickly fixed.

Describe the solution you'd like
Fix these.

Describe alternatives you've considered
N.A.

Additional context
N.A.

Add a script called `gandlf_preprocess` for offline processing

Is your feature request related to a problem? Please describe.
Currently, GaNDLF performs pre-processing on-the-fly, which is fine but causes issues when there are hardware constraints.

Describe the solution you'd like
Include a script called gandlf_preprocess (which could be an extension of gandlf_padder) which performs all preprocessing and saves the processed images on disk.

Describe alternatives you've considered
N.A.

Additional context
This should create a mechanism which ensures that the preprocessing isn't repeated when the data is called using gandlf_run.

Moving to torchmetrics for handling metrics

Is your feature request related to a problem? Please describe.
Instead of implementing all the metrics, why not use torchmetrics?

Build a conda recipe for installation

Is your feature request related to a problem? Please describe.
Currently, the installation of GaNDLF is complicated, and it should be simplified.

Describe the solution you'd like
A conda recipe would be very useful.

Describe alternatives you've considered
PyPI would unfortunately not work because of a myriad of dependencies that are only available through conda.

Additional context
N.A.

Using gandlf in windows

I am trying to install GaNDLF on Windows by following the instructions at https://cbica.github.io/GaNDLF/extending
I am getting an error while running "conda install -c conda-forge gandlf -y"

Can you please help me with this?

Add code analysis

Is your feature request related to a problem? Please describe.
Adding automated code analysis will improve the CI/CD process.

Describe the solution you'd like
Multiple solutions available.

Describe alternatives you've considered
N.A.

Additional context
N.A.

Error In Classification due to a bug in get_loss_and_metrics()

Describe the bug
The new version of the "get_loss_and_metrics" function does not cover classification tasks.
Line 494: the "if len(predicted) > 1:" statement is designed for sdnet-style archs, but for classification tasks len(predicted) equals the batch size, which is greater than 1 most of the time.

To Reproduce
Steps to reproduce the behavior:

run GaNDLF for classification tasks
You can use the config file I attached.

Expected behavior
The following error occurs.

This is because line 496, loss_seg = loss_function(predicted[0], ground_truth.squeeze(-1), params), uses only the first prediction, while the ground truth has the shape of the full batch.
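
One possible fix, sketched under the assumption that multi-output architectures return a list or tuple while single-output models return the batch tensor directly. `select_prediction` is a hypothetical helper, not an existing GaNDLF function.

```python
def select_prediction(predicted):
    """Return the object to pass to the loss function.

    Assumes multi-output architectures return a list or tuple whose first
    element is the main prediction, while single-output models return the
    batch directly. Checking the container type avoids confusing a batch of
    size > 1 with multiple outputs, which len(predicted) > 1 cannot tell apart.
    """
    if isinstance(predicted, (list, tuple)):
        return predicted[0]
    return predicted
```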

Screenshots
N.A.

GaNDLF Version
0.11

Desktop (please complete the following information):

Linux WSL Ubuntu 20.04
config_classification.txt

Consolidate `one_hot` definition (currently present in 3 different files)

Bug in regression causing different output size to get broadcasted

Describe the bug
The regression code emits a size-mismatch warning: the target and the input have different sizes and get broadcast.

To Reproduce
Steps to reproduce the behavior:

Run the command:

conda activate ./venv
pytest --device cuda

See this error:

##vso[task.logissue type=warning;]Using a target size (torch.Size([1])) that is different to the input size (torch.Size([2])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

Expected behavior
This error should not be present, because it is causing issues with regression.
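
The warning reports a target of size 1 compared against an input of size 2. A minimal sketch of a guard that rejects such a mismatch instead of silently broadcasting is shown here, using plain Python lists as stand-ins for tensors. `mse_strict` is a hypothetical helper, not part of GaNDLF or PyTorch.

```python
def mse_strict(prediction, target):
    """Mean squared error that refuses mismatched lengths instead of broadcasting."""
    if len(prediction) != len(target):
        raise ValueError(
            f"size mismatch: prediction has {len(prediction)} values, "
            f"target has {len(target)}"
        )
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)


ok = mse_strict([1.0, 3.0], [1.0, 1.0])  # (0 + 4) / 2

try:
    mse_strict([1.0, 3.0], [1.0])  # same sizes as in the warning: 2 vs 1
    mismatch_caught = False
except ValueError:
    mismatch_caught = True
```

Failing loudly at this point makes the offending batch easy to find, whereas a broadcast loss still returns a number and hides the problem.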

Screenshots
N.A.

GaNDLF Version
0.0.10-dev

Desktop (please complete the following information):
N.A.

Additional context
N.A.

Inconsistency between average validation loss presented in console and log

Describe the bug
The "Epoch Average Validation Loss" value shown in the console differs from the value written to the logs.

To Reproduce
Train a network for a single epoch with multiple batches

Expected behavior
The console and the logs should report the same value.
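
The cause of the mismatch is not stated in this report. As a sketch of one way to keep the two outputs in agreement, the average can be computed once and the same formatted message written to both sinks. The function name and the number format are assumptions made for illustration.

```python
import io


def report_epoch_validation_loss(batch_losses, console, log):
    """Compute the epoch average once and write the same value to both sinks."""
    average = sum(batch_losses) / len(batch_losses)
    message = f"Epoch Average Validation Loss : {average:.6f}"
    print(message, file=console)
    print(message, file=log)
    return average


# In-memory buffers stand in for the console and the log file.
console, log = io.StringIO(), io.StringIO()
avg = report_epoch_validation_loss([0.5, 0.25, 0.75], console, log)
```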

Screenshots
N.A.

GaNDLF Version
0.0.9-dev

Desktop (please complete the following information):
N.A.

Additional context
Originally reported by Vinayak

Allow pre-split training and validation CSVs

Is your feature request related to a problem? Please describe.
When comparing against other applications, it would be useful to allow users to provide their pre-split training and validation data.

Describe the solution you'd like
Allow overloading the TrainingManager to take pre-sorted data frames.
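
A rough sketch of what accepting pre-split data could involve: read the two CSV files separately and check that no subject appears in both before handing them to training. This uses the standard `csv` module rather than data frames, and the `load_presplit` helper and the `channel_0` and `label` column names are hypothetical.

```python
import csv
import io


def load_presplit(train_file, val_file, id_column="subject_id"):
    """Read two CSV files and check that no subject appears in both."""
    train_rows = list(csv.DictReader(train_file))
    val_rows = list(csv.DictReader(val_file))
    overlap = {r[id_column] for r in train_rows} & {r[id_column] for r in val_rows}
    if overlap:
        raise ValueError(f"subjects in both splits: {sorted(overlap)}")
    return train_rows, val_rows


# In-memory files stand in for the user's training and validation CSVs.
train_csv = io.StringIO("subject_id,channel_0,label\n001,a.nii.gz,0\n002,b.nii.gz,1\n")
val_csv = io.StringIO("subject_id,channel_0,label\n003,c.nii.gz,1\n")
train_rows, val_rows = load_presplit(train_csv, val_csv)
```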

Describe alternatives you've considered
N.A.

Additional context
Requested by @Geeks-Sid and @jammy270

/bin/sh: 1: qsub: not found

Hi Sarthak,

When I try to train on the toy dataset with samples/config_classification.yaml, I get the error /bin/sh: 1: qsub: not found. I believe this originates from 'parallel_compute_command' in the config. I am using the newest pull from gandalf-refactor on Linux.

The train command:

python gandlf_run -config ./experiments/2d_classification/model.yaml -data ./experiments/2d_classification/train.csv -output ./experiments/2d_classification/output_dir/ -train 1 -device cuda

Full error log:

Submitting job for testing split 0 and validation split 0
/bin/sh: 1: qsub: not found
Submitting job for testing split 0 and validation split 1
/bin/sh: 1: qsub: not found
Submitting job for testing split 0 and validation split 2
/bin/sh: 1: qsub: not found
Submitting job for testing split 0 and validation split 3
/bin/sh: 1: qsub: not found
Submitting job for testing split 0 and validation split 4
/bin/sh: 1: qsub: not found
Submitting job for testing split 1 and validation split 0
/bin/sh: 1: qsub: not found
Submitting job for testing split 1 and validation split 1
/bin/sh: 1: qsub: not found
Submitting job for testing split 1 and validation split 2
/bin/sh: 1: qsub: not found
Submitting job for testing split 1 and validation split 3
/bin/sh: 1: qsub: not found
Submitting job for testing split 1 and validation split 4
/bin/sh: 1: qsub: not found
Submitting job for testing split 2 and validation split 0
/bin/sh: 1: qsub: not found
Submitting job for testing split 2 and validation split 1
/bin/sh: 1: qsub: not found
Submitting job for testing split 2 and validation split 2
/bin/sh: 1: qsub: not found
Submitting job for testing split 2 and validation split 3
/bin/sh: 1: qsub: not found
Submitting job for testing split 2 and validation split 4
/bin/sh: 1: qsub: not found
Submitting job for testing split 3 and validation split 0
/bin/sh: 1: qsub: not found
Submitting job for testing split 3 and validation split 1
/bin/sh: 1: qsub: not found
Submitting job for testing split 3 and validation split 2
/bin/sh: 1: qsub: not found
Submitting job for testing split 3 and validation split 3
/bin/sh: 1: qsub: not found
Submitting job for testing split 3 and validation split 4
/bin/sh: 1: qsub: not found
Submitting job for testing split 4 and validation split 0
/bin/sh: 1: qsub: not found
Submitting job for testing split 4 and validation split 1
/bin/sh: 1: qsub: not found
Submitting job for testing split 4 and validation split 2
/bin/sh: 1: qsub: not found
Submitting job for testing split 4 and validation split 3
/bin/sh: 1: qsub: not found
Submitting job for testing split 4 and validation split 4
/bin/sh: 1: qsub: not found

This is the model.yaml (which is the samples/config_classification.yaml):

# affix version
version:
  {
    minimum: 0.0.8,
    maximum: 0.0.8 # this should NOT be made a variable, but should be tested after every tag is created
  }
# Choose the model parameters here
model:
  {
    dimension: 3, # the dimension of the model and dataset: defines dimensionality of computations
    base_filters: 30, # Set base filters: number of filters present in the initial module of the U-Net convolution; for IncU-Net, keep this divisible by 4
    architecture: vgg16, # options: unet, resunet, fcn, uinc, vgg, densenet
    batch_norm: True, # this is only used for vgg
    final_layer: None, # can be either sigmoid, softmax or none (none == regression)
    amp: False, # Set if you want to use Automatic Mixed Precision for your operations or not - options: True, False
    n_channels: 3, # set the input channels - useful when reading RGB or images that have vectored pixel types
  }
# this is to enable or disable lazy loading - setting to true reads all data once during data loading, resulting in improvements
# in I/O at the expense of memory consumption
in_memory: False
# this will save the generated masks for validation and testing data for qualitative analysis
save_masks: False
# Set the Modality : rad for radiology, path for histopathology
modality: rad
# Patch size during training - 2D patch for breast images since third dimension is not patched 
patch_size: [64,64,64]
# uniform: UniformSampler or label: LabelSampler
patch_sampler: uniform
# Number of epochs
num_epochs: 100
# Set the patience - measured in number of epochs after which, if the performance metric does not improve, exit the training loop - defaults to the number of epochs
patience: 50
# Set the batch size
batch_size: 1
# Set the initial learning rate
learning_rate: 0.001
# Learning rate scheduler - options: triangle, triangle_modified, exp, reduce-on-lr, step, more to come soon - default hyperparameters can be changed thru code
scheduler: triangle
# Set which loss function you want to use - options : 'dc' - for dice only, 'dcce' - for sum of dice and CE and you can guess the next (only lower-case please)
# options: dc (dice only), dc_log (-log of dice), ce (), dcce (sum of dice and ce), mse () ...
# mse is the MSE defined by torch and can define a variable 'reduction'; see https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss
# use mse_torch for regression/classification problems and dice for segmentation
loss_function: mse
# this parameter weights the loss to handle imbalanced losses better
weighted_loss: True 
#loss_function:
#  {
#    'mse':{
#      'reduction': 'mean' # see https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss for all options
#    }
#  }
# Which optimizer do you want to use - adam/sgd
opt: adam
# this parameter controls the nested training process
# performs randomized k-fold cross-validation
# split is performed using sklearn's KFold method
# for single fold run, use '-' before the fold number
nested_training:
  {
    testing: 5, # this controls the testing data splits for final model evaluation; use '1' if this is to be disabled
    validation: 5 # this controls the validation data splits for model training
  }
## pre-processing
# this constructs an order of transformations, which is applied to all images in the data loader
# order: resize --> threshold/clip --> resample --> normalize
# 'threshold': performs intensity thresholding; i.e., if x[i] < min: x[i] = 0; and if x[i] > max: x[i] = 0
# 'clip': performs intensity clipping; i.e., if x[i] < min: x[i] = min; and if x[i] > max: x[i] = max
# 'threshold'/'clip': if either min/max is not defined, it is taken as the minimum/maximum of the image, respectively
# 'normalize': performs z-score normalization: https://torchio.readthedocs.io/transforms/preprocessing.html?highlight=ToCanonical#torchio.transforms.ZNormalization
# 'normalize_nonZero': perform z-score normalize but with mean and std-dev calculated on only non-zero pixels
# 'normalize_nonZero_masked': perform z-score normalize but with mean and std-dev calculated on only non-zero pixels with the stats applied on non-zero pixels
# 'crop_external_zero_planes': crops all non-zero planes from input tensor to reduce image search space
# 'resample: resolution: X,Y,Z': resample the voxel resolution: https://torchio.readthedocs.io/transforms/preprocessing.html?highlight=ToCanonical#torchio.transforms.Resample
# 'resample: resolution: X': resample the voxel resolution in an isotropic manner: https://torchio.readthedocs.io/transforms/preprocessing.html?highlight=ToCanonical#torchio.transforms.Resample
# resize the image(s) and mask (this should be greater than or equal to patch_size); resize is done ONLY when resample is not defined
data_preprocessing:
  {
    'normalize',
    # 'normalize_nonZero', # this performs z-score normalization only on non-zero pixels
    'resample':{
      'resolution': [1,2,3]
    },
    #'resize': [128,128], # this is generally not recommended, as it changes image properties in unexpected ways
    'crop_external_zero_planes', # this will crop all zero-valued planes across all axes
  }
# various data augmentation techniques
# options: affine, elastic, downsample, motion, ghosting, bias, blur, gaussianNoise, swap
# keep/edit as needed
# all transforms: https://torchio.readthedocs.io/transforms/transforms.html?highlight=transforms
# 'kspace': one of motion, ghosting or spiking is picked (randomly) for augmentation
# 'probability' subkey adds the probability of the particular augmentation getting added during training (this is always 1 for normalize and resampling)
data_augmentation: 
  {
    default_probability: 0.5,
    'affine',
    'elastic',
    'kspace':{
      'probability': 1
    },
    'bias',
    'blur': {
      'std': [0, 1] # default std-dev range, for details, see https://torchio.readthedocs.io/transforms/augmentation.html?highlight=randomblur#torchio.transforms.RandomBlur
    },
    'noise': { # for details, see https://torchio.readthedocs.io/transforms/augmentation.html?highlight=randomblur#torchio.transforms.RandomNoise
      'mean': 0, # default mean
      'std': [0, 1] # default std-dev range
    },
    'anisotropic':{
      'axis': [0,1],
      'downsampling': [2,2.5]
    },
  }
# parallel training on HPC - here goes the command to prepend to send to a high performance computing
# cluster for parallel computing during multi-fold training
# not used for single fold training
# this gets passed before the training_loop, so ensure enough memory is provided along with other parameters
# that your HPC would expect
# ${outputDir} will be changed to the outputDir you pass in CLI + '/${fold_number}'
# ensure that the correct location of the virtual environment is getting invoked, otherwise it would pick up the system python, which might not have all dependencies
parallel_compute_command: 'qsub -b y -l gpu -l h_vmem=32G -cwd -o ${outputDir}/\$JOB_ID.stdout -e ${outputDir}/\$JOB_ID.stderr `pwd`/sge_wrapper _correct_location_of_virtual_environment_/venv/bin/python'
## queue configuration - https://torchio.readthedocs.io/data/patch_training.html?#queue
# this determines the maximum number of patches that can be stored in the queue. Using a large number means that the queue needs to be filled less often, but more CPU memory is needed to store the patches
q_max_length: 40
# this determines the number of patches to extract from each volume. A small number of patches ensures a large variability in the queue, but training will be slower
q_samples_per_volume: 5
# this determines the number subprocesses to use for data loading; '0' means main process is used
q_num_workers: 2 # scale this according to available CPU resources
# used for debugging
q_verbose: False

Best
Karol
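
The log suggests that the shell cannot find `qsub`, which is the first word of `parallel_compute_command` in the config. A minimal sketch of a pre-flight check that could run before any job is submitted is given below. `scheduler_available` is a hypothetical helper, and what to do when it returns False (for example, running the folds one after another) is left open.

```python
import shlex
import shutil


def scheduler_available(parallel_compute_command):
    """Return True if the first word of the command is an executable on PATH."""
    tokens = shlex.split(parallel_compute_command)
    if not tokens:
        return False
    return shutil.which(tokens[0]) is not None


# A made-up command name that should not exist on any machine.
missing = scheduler_available("definitely-not-a-real-scheduler-xyz -b y python")
empty = scheduler_available("")
```

Checking once up front would replace 25 identical "not found" lines with a single clear message.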

Wrong Output Size for Classification Problems

Describe the bug
When doing multi-class classification on the current master branch with the class labels stored in a single column of the CSV file, the number of classes is detected as 1.

To Reproduce
Steps to reproduce the behavior:

Check out the master branch
Use the attached config file and prepare the CSV file based on the attached images.
Run GaNDLF

Expected behavior

I think the problem is in the populate_header_in_parameters function. An if statement there sets the number of classes to the number of prediction headers, which is one: the single header containing the image labels.
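
The difference between counting label columns and counting label values can be sketched as follows. This is an illustration with the standard `csv` module, not the actual `populate_header_in_parameters` code, and the `count_classes` helper and the column names are assumptions.

```python
import csv
import io


def count_classes(csv_file, label_column):
    """Count distinct label values instead of counting label columns."""
    labels = {row[label_column] for row in csv.DictReader(csv_file)}
    return len(labels)


# One label column, three distinct classes (0, 1, 2).
data = io.StringIO(
    "subject_id,channel_0,value_0\n"
    "001,a.jpg,0\n"
    "002,b.jpg,2\n"
    "003,c.jpg,1\n"
    "004,d.jpg,2\n"
)
num_classes = count_classes(data, "value_0")
```

Counting headers would give 1 for this file, while counting distinct values gives the 3 classes a multi-class model needs.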

Screenshots
config_class.txt
mini_training_data - Copy.txt

GaNDLF Version
0.0.10

Desktop (please complete the following information):

Ubuntu 20.04 WSL 2

Error In Classification Tasks -- `perform_sanity_check_on_subject` Function

Describe the bug
When running GaNDLF, an error occurs at line 638 of utils.py. This is because subject[str(key)] is the string "NA", which has no "path" key.
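
A type check before the key lookup would avoid this failure. The sketch below assumes that image entries are dict-like with a "path" key and that missing entries are stored as the plain string "NA". `paths_to_check` is a hypothetical helper, not the `perform_sanity_check_on_subject` implementation.

```python
def paths_to_check(subject):
    """Collect file paths from a subject, skipping placeholder entries.

    Assumption: image entries are dict-like with a "path" key, while missing
    entries are stored as the plain string "NA".
    """
    paths = []
    for key, value in subject.items():
        if isinstance(value, dict) and "path" in value:
            paths.append(value["path"])
    return paths


subject = {"1": {"path": "/data/img_001.jpg"}, "label": "NA"}
found = paths_to_check(subject)
```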

To Reproduce
Steps to reproduce the behavior:

Run GaNDLF with the attached configuration.
The error should occur before training starts.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots

GaNDLF Version
11

Desktop (please complete the following information):

Linux WSL 20.04

Additional context
mini_training_data_2.txt
config_classification.txt

Generated Patches have 1 channel when reading 3 channel images from one JPG file

Describe the bug
This bug appears in classification problems. If the dataset consists of JPG files that each store 3 channels in a single image, the patch generator produces patches of shape [128x128x1] instead of [128x128x3].

To Reproduce
Steps to reproduce the behavior:

Go to the master branch
Use the attached config file and prepare the CSV file based on the attached images.
Run GaNDLF

Expected behavior
A runtime error regarding the number of channels:

RuntimeError: Given groups=1, weight of size [64, 3, 3, 3], expected input[64, 1, 128, 128] to have 3 channels, but got 1 channels instead

Screenshots

config_class.txt
mini_training_data - Copy.txt

GaNDLF Version
0.0.10

Desktop (please complete the following information):

Ubuntu 20.04 WSL 2

Hausdorff calculation is problematic

Describe the bug
Calculating the Hausdorff distance fails for 2D images.

To Reproduce
Steps to reproduce the behavior:

Run test_metrics_segmentation_rad_2d from tests
See the error

Expected behavior
It should work.
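
For reference, the Hausdorff distance itself does not depend on the number of dimensions. The following sketch is a brute-force, pure-Python version for small point sets, written for illustration only. It is not the implementation used in the GaNDLF tests, and it assumes the foreground has already been converted to coordinate tuples.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# The same function handles 2D and 3D coordinates.
d2 = hausdorff([(0, 0), (1, 0)], [(0, 0), (4, 0)])
d3 = hausdorff([(0, 0, 0)], [(0, 3, 4)])
```

Here `d2` is 3.0 (the point (4, 0) is 3 away from its nearest neighbour) and `d3` is 5.0. A 2D test case with known values like these could serve as a regression check.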

Screenshots
N.A.

GaNDLF Version
0.0.11-dev

Desktop (please complete the following information):
N.A.

Additional context
N.A.

`utils` cleanup

There are a lot of functions in `utils` that aren't used and should be removed.

Code re-organization

Is your feature request related to a problem? Please describe.
This framework looks great, but it seems the code internally is pretty messy and hard to extend.

Have a look at Lightning Flash: https://github.com/PyTorchLightning/lightning-flash/blob/master/flash_examples/finetuning/object_detection.py#L23.

It might help to re-organize the code and make the tasks simpler to extend.

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

UNet residual connection is not getting activated

Describe the bug
Residual connections are not getting activated for UNet.

To Reproduce
Steps to reproduce the behavior:

Select resunet under ["model"]["architecture"]
See residual flag not getting passed to base modules

Expected behavior
The residual flag should be used.
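
The expected behavior can be illustrated with toy classes in which the flag is forwarded explicitly from the network to every base module. All class names below are hypothetical stand-ins, and the arithmetic only mimics a convolution.

```python
class ConvBlock:
    """Toy stand-in for a base module; adds its input back when residual=True."""

    def __init__(self, residual=False):
        self.residual = residual

    def forward(self, x):
        out = 2 * x  # placeholder for the convolution
        return out + x if self.residual else out


class UNet:
    residual = False

    def __init__(self):
        # The flag is forwarded explicitly to every base module.
        self.blocks = [ConvBlock(residual=self.residual) for _ in range(2)]

    def forward(self, x):
        for block in self.blocks:
            x = block.forward(x)
        return x


class ResUNet(UNet):
    residual = True


plain = UNet().forward(1)   # 1 -> 2 -> 4
res = ResUNet().forward(1)  # 1 -> 3 -> 9
```

If the flag were dropped on the way to `ConvBlock`, both networks would produce the same output, which is a simple property a unit test could check.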

Screenshots
N.A.

GaNDLF Version
0.0.11-dev

Desktop (please complete the following information):
N.A.

Additional context
N.A.

`densenet` requires patches to be divisible by 16, which is not really correct

Describe the bug
When parameters["model"]["architecture"] == densenet121 (or any other variant), the patch-size divisibility check is triggered.

To Reproduce
Steps to reproduce the behavior:

Select a densenet variant for ["model"]["architecture"]
Put patch_size as something not divisible by 16
See error triggered in parameterParsing.py

Expected behavior
DenseNet should not require this check.
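
A sketch of making the check depend on the architecture is shown below. Which architectures actually need the constraint is an assumption here (the encoder-decoder names are taken from the options listed in the sample config), and `check_patch_size` is a hypothetical helper rather than the code in parameterParsing.py.

```python
# Assumption: only encoder-decoder architectures need patch sizes divisible by 16.
ARCHS_NEEDING_DIVISIBILITY = {"unet", "resunet", "fcn", "uinc"}


def check_patch_size(architecture, patch_size, divisor=16):
    """Raise if an architecture that needs it gets a non-divisible patch size."""
    if architecture not in ARCHS_NEEDING_DIVISIBILITY:
        return  # e.g. densenet121, vgg16: no divisibility constraint applied
    bad = [p for p in patch_size if p % divisor != 0]
    if bad:
        raise ValueError(f"patch_size entries {bad} are not divisible by {divisor}")


check_patch_size("densenet121", [100, 100, 100])  # passes silently

try:
    check_patch_size("unet", [100, 128, 128])
    unet_rejected = False
except ValueError:
    unet_rejected = True
```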

Screenshots
N.A.

GaNDLF Version
0.0.11-dev

Desktop (please complete the following information):
N.A.

Additional context
Originally reported by @orhunguley

Final Layer is not used in forward pass in VGG implementation

Describe the bug
The current VGG implementation in GaNDLF initializes the final layer but does not use it in the forward pass.

Expected behavior
There should not be any errors, but when one uses VGG models, the final layer specified in the config file is never used.
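
The missing step can be shown with a toy classifier in plain Python. This is not the GaNDLF VGG code. `ToyClassifier` and its `features` placeholder are hypothetical, and the only point is that the configured final layer is applied at the end of the forward pass.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyClassifier:
    """Toy stand-in for a classifier head; not the GaNDLF VGG implementation."""

    def __init__(self, final_layer=None):
        # None means no final activation (regression-style output).
        self.final_layer = final_layer

    def features(self, x):
        return x - 1.0  # placeholder for the convolutional and linear layers

    def forward(self, x):
        out = self.features(x)
        if self.final_layer is not None:
            out = self.final_layer(out)  # the step the issue reports as missing
        return out


raw = ToyClassifier().forward(1.0)
activated = ToyClassifier(final_layer=sigmoid).forward(1.0)
```

Comparing the output with and without a final layer, as done here, is one way to confirm that the config option has an effect.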

GaNDLF Version
GaNDLF 0.11

Add option to change all data to a single orientation

Is your feature request related to a problem? Please describe.
Since GaNDLF includes preprocessing, it would make sense to ensure that the orientation of all datasets is consistent.

Describe the solution you'd like
It would be useful to ensure TorchIO's to_canonical is added as an option in ImagesFromDataFrame.

Describe alternatives you've considered
Use SimpleITK's DICOMOrientImageFilter, but this cannot be done as long as TorchIO requires SimpleITK<2.

Additional context
N.A.
