dpsda's Issues

Error When `variation_degree_schedule` Value Exceeds 10

Context

  • Dataset: Brain Tumor MRI Dataset with 4 classes, 5713 training samples, and 1312 testing samples. Images are labeled as "label_objectNumber" and converted to RGB. More details can be found here.
  • Environment: PyTorch 1.12.0, CUDA 11.7.0
  • Script Parameters:
    • Feature Extractor: inception_v3
    • FID Model Name: inception_v3
    • Dataset Name for FID: brain
    • Image Size: 64x64
    • Batch Size: 500
    • Variation Degree Schedule: 0 to 42 in steps of 2, with an error occurring for values > 10

Issue Description

The script runs successfully for many iterations while the values in `variation_degree_schedule` stay below 10. Once the schedule reaches values above 10, the image variation phase fails with the following error:
Traceback (most recent call last):
  File "/cluster/home/laidir/DPSDA/main.py", line 468, in <module>
    main()
  File "/cluster/home/laidir/DPSDA/main.py", line 361, in main
    packed_samples = api.image_variation(
  File "/cluster/home/laidir/DPSDA/apis/improved_diffusion_api.py", line 255, in image_variation
    sub_variations = self._image_variation(
  File "/cluster/home/laidir/DPSDA/apis/improved_diffusion_api.py", line 268, in _image_variation
    samples, _ = sample(
  File "/cluster/home/laidir/DPSDA/apis/improved_diffusion_api.py", line 307, in sample
    sample = sampler(
  File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/cluster/apps/eb/software/PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/cluster/home/laidir/DPSDA/apis/improved_diffusion_api.py", line 354, in forward
    sample = sample_fn(
  File "/cluster/home/laidir/DPSDA/apis/improved_diffusion/gaussian_diffusion.py", line 223, in ddim_sample_loop
    for sample in self.ddim_sample_loop_progressive(
  File "/cluster/home/laidir/DPSDA/apis/improved_diffusion/gaussian_diffusion.py", line 269, in ddim_sample_loop_progressive
    t_batch = th.tensor([indices[0]] * img.shape[0], device=device)
IndexError: list index out of range

The IndexError is raised in ddim_sample_loop_progressive (gaussian_diffusion.py, line 269), reached through ddim_sample_loop in the improved diffusion API. The failing expression is indices[0], so the list `indices` appears to be empty at that point.
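A minimal sketch of how such a failure could arise is shown here. It assumes, without confirmation from the repository, that the variation degree acts as an offset into a fixed number of sampling steps, so that an offset at or beyond that number leaves no timesteps to iterate over. The helpers `remaining_timesteps` and `invalid_degrees` and the step count of 10 are hypothetical and exist only for illustration.

```python
from typing import Iterable, List


def remaining_timesteps(num_timesteps: int, start_t: int) -> List[int]:
    """Hypothetical helper: timesteps left after skipping `start_t` steps.

    Returned in descending order, as a sampling loop would visit them.
    If start_t >= num_timesteps the result is an empty list.
    """
    return list(range(num_timesteps - start_t))[::-1]


def invalid_degrees(schedule: Iterable[int], num_timesteps: int) -> List[int]:
    """Return the schedule entries that would leave no timesteps to sample."""
    return [v for v in schedule if not 0 <= v < num_timesteps]


if __name__ == "__main__":
    num_timesteps = 10  # assumed number of sampling steps, for illustration only
    schedule = list(range(0, 44, 2))  # 0 to 42 in steps of 2, as in the report

    print("ok:", remaining_timesteps(num_timesteps, 4))
    empty = remaining_timesteps(num_timesteps, 12)
    try:
        empty[0]  # same kind of access as indices[0] in the traceback
    except IndexError as err:
        print("IndexError:", err)

    # Checking the schedule up front gives a clearer message than the traceback.
    print("entries to fix:", invalid_degrees(schedule, num_timesteps))
```

If this assumption holds, comparing the largest schedule value against the configured number of sampling steps before launching the run would surface the problem early.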

Steps to Reproduce

  1. Run the provided script with the variation_degree_schedule parameter set to include values greater than 10.
  2. Observe the IndexError as described above during the image variation phase.

Additional Information

Run with A6000 GPU

Could I run the experiment on an A6000 GPU? It seems the A6000 is not sufficient for the default settings (CIFAR10). Should I reduce the batch size or use data parallelism?
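As a rough illustration of the batch-size option, the sketch below processes a large batch in smaller chunks so that only one chunk is in flight at a time. The helpers `iter_chunks` and `run_in_chunks` are hypothetical and not part of DPSDA. Whether the repository's scripts expose an equivalent setting is not confirmed here.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_chunks(items: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def run_in_chunks(
    fn: Callable[[Sequence[T]], List[R]],
    items: Sequence[T],
    chunk_size: int,
) -> List[R]:
    """Apply `fn` chunk by chunk and concatenate the results in order."""
    results: List[R] = []
    for chunk in iter_chunks(items, chunk_size):
        results.extend(fn(chunk))
    return results


if __name__ == "__main__":
    # Stand-in for a memory-hungry model call: doubles each input.
    doubled = run_in_chunks(lambda c: [x * 2 for x in c], list(range(7)), 3)
    print(doubled)
```

Chunking trades throughput for a lower peak memory footprint, while data parallelism spreads a batch across several GPUs and needs more than one device.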

Many thanks in advance for your help!

Out of Memory Error on A100 40GB GPUs with main_improved_diffusion_cifar10_conditional.sh and main_improved_diffusion_cifar10_conditional.sh

Environment

  • PyTorch Version: 1.12.1
  • CUDA Version: 11.7
  • GPU Type: NVIDIA A100 40GB

Description

I am experiencing an out-of-memory (OOM) error when attempting to run the main_improved_diffusion_cifar10_conditional.sh and main_improved_diffusion_cifar10_conditional.sh scripts. Despite utilizing an NVIDIA A100 GPU with 40GB of memory, which should be sufficient for these tasks, the scripts consistently fail due to memory issues.

Based on the documentation and typical usage for similar tasks, I expected memory usage to stay within the 40GB limit of the A100 GPU. However, even under normal conditions and with ample memory available, the scripts trigger an OOM error.

Attempts to Resolve

  • Ensured no other significant processes are consuming GPU memory.
  • Monitored memory usage to confirm that the OOM error occurs despite available memory.
  • Reduced the batch size and num_samples_schedule (and related parameters).
  • Searched for similar issues or advice in the repository's issues section and online forums.
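One way to make the batch-size reduction systematic is to retry a step with a halved batch size whenever an out-of-memory error is raised. The sketch below assumes the OOM surfaces as a RuntimeError whose message contains "out of memory", which is how PyTorch reports CUDA OOM. The function `run_with_backoff` and the fake step used in the demo are hypothetical and not part of DPSDA.

```python
from typing import Callable, Tuple, TypeVar

R = TypeVar("R")


def run_with_backoff(
    step: Callable[[int], R],
    batch_size: int,
    min_batch_size: int = 1,
) -> Tuple[R, int]:
    """Call `step(batch_size)`, halving the batch size after each OOM error.

    Returns the step result together with the batch size that succeeded.
    Errors that are not OOM, or an OOM at `min_batch_size`, are re-raised.
    """
    while True:
        try:
            return step(batch_size), batch_size
        except RuntimeError as err:
            is_oom = "out of memory" in str(err).lower()
            if not is_oom or batch_size <= min_batch_size:
                raise
            # With real PyTorch code, calling torch.cuda.empty_cache() here
            # can release cached blocks before the retry.
            batch_size = max(min_batch_size, batch_size // 2)


if __name__ == "__main__":
    def fake_step(bs: int) -> int:
        # Pretend that anything above 100 samples does not fit in memory.
        if bs > 100:
            raise RuntimeError("CUDA out of memory.")
        return bs

    result, used = run_with_backoff(fake_step, 500)
    print(result, used)
```

If the step still fails at the minimum batch size, the remaining memory is likely consumed by something other than the batch, such as the model weights or cached features.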

I appreciate any insights, suggestions, or updates that might help resolve this issue. Thank you for your attention to this matter and for the valuable resources provided.

Best regards,
Roufaida Laidi

FutureWarning: Passing `image` as torch tensor with value range in [-1,1] is deprecated.

Thank you for sharing this great codebase! When I tried the quick example for Cat Cookie with scripts/main_stable_diffusion_cookie.sh, I noticed a warning from the diffusers package as follows.

Found 100 images in the folder /tmp/result_cookie
FID result_cookie : 100%|█████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.73s/it]
07/04/2023 12:40:03 PM [MainThread  ] [INFO ]  fid=86.46057094437475
07/04/2023 12:40:03 PM [MainThread  ] [INFO ]  t=1
07/04/2023 12:40:03 PM [MainThread  ] [INFO ]  Running image variation
  0%|                                                                                     | 0/8 [00:00<?, ?it/s]
/.../python3.8/site-packages/diffusers/image_processor.py:204: FutureWarning: Passing `image` as torch tensor with value range in [-1,1] is deprecated. The expected value range for image tensor is [0,1] when passing as pytorch tensor or numpy Array. You passed `image` with value range [-1.0,1.0]
  warnings.warn(

Is this something that I should be careful about? Was this warning already present in your experiments? If not, it may be due to a newer release of diffusers. In that case, I would greatly appreciate it if you could share the version of diffusers you used.

In case it helps, this issue might be related to huggingface/diffusers#3876. Thank you very much for your time.
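For reference, the warning text says the expected range is [0,1], so one possible way to silence it is to rescale the tensor before passing it to the pipeline. The sketch below shows that rescaling, plus a way to report the installed diffusers version using the standard library. The helper names `to_unit_range` and `installed_version` are hypothetical. Whether rescaling reproduces the authors' results exactly is an assumption that would need checking against the diffusers version they used.

```python
from importlib import metadata
from typing import Optional


def to_unit_range(x):
    """Map values from [-1, 1] to [0, 1].

    Written with plain arithmetic so it works on floats as well as on
    NumPy arrays or torch tensors.
    """
    return (x + 1.0) / 2.0


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(to_unit_range(-1.0), to_unit_range(0.0), to_unit_range(1.0))
    print("diffusers version:", installed_version("diffusers"))
```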
