
ccsr's Issues

Train model: CUDA out of memory

Is there any way to train within 24 GB on an RTX 3090, even with a batch size of 1?

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 3; 23.69 GiB total capacity; 23.03 GiB already allocated; 21.69 MiB free; 23.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Epoch 0: 0%| | 2/35135 [00:29<144:16:07, 14.78s/it, loss=0.389, v_num=0, train/loss_simple_step=0.131, train/loss_vlb_step=0.000475, train/loss_step=0.131, global_step=0.000, train/loss_x0_step=0.335, train/loss_x0_from_tao_step=0.366, train/loss_noise_from_tao_step=0.00291, train/loss_net_step=0.704]
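The error message itself suggests two mitigations: the allocator hint `max_split_size_mb`, and reducing the effective batch footprint. A minimal sketch, assuming a Lightning-style training loop (the value 128 and the accumulation factor 4 are arbitrary examples, not values from this repo):

```python
import os

# The allocator hint from the error message must be set before torch
# allocates any CUDA memory (ideally before `import torch` runs at all).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# With batch size 1, gradient accumulation can emulate a larger batch
# without extra activation memory; Lightning exposes this as the
# Trainer's accumulate_grad_batches flag (4 here is an arbitrary example).
accumulate_grad_batches = 4
```

Gradient checkpointing on the UNet, if the config exposes it, trades compute for further activation-memory savings.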

Issues with `smallkF` in xFormers: CUDA Support and Operator Build Errors

I am encountering multiple issues with smallkF in the xFormers library. The problems seem to arise from a combination of CUDA support, operator build, and embed per head size. Below are the specific error messages and my current setup:

`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
    unsupported embed per head: 512
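These are independent reasons why xFormers skipped the `smallkF` kernel, and any one of them disqualifies it. A pure-Python sketch of the conditions as the error message states them (`smallk_applicable` is a hypothetical helper, not an xFormers API):

```python
def smallk_applicable(q_dim: int, v_dim: int,
                      cuda_built: bool, op_built: bool) -> bool:
    """Mirror the error's conditions: smallkF needs equal query/value
    head dims of at most 32, plus a CUDA-enabled build of the operator."""
    if q_dim != v_dim or max(q_dim, v_dim) > 32:
        return False
    return cuda_built and op_built
```

With an embed-per-head of 512 and a non-CUDA build, every condition fails, so xFormers falls back to (or errors out of) other attention implementations; `python -m xformers.info` shows which operators were actually built.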

Xt_min -> X0

Excellent work!

The paper says the Xt_min -> X0 step uses the truncated output. What does this process look like concretely? I couldn't find a detailed description in the paper, and I'm very curious about this part.

training decoder during Stage 2

Thanks for the wonderful work! I noticed you train the decoder for only 100 iterations, while SD is trained for 25k iterations.
Is 100 iterations enough for the decoder?
BTW, did you use a combination of L1 loss, perceptual loss, and GAN loss to train the decoder?

Thanks in advance!
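For reference, the loss combination asked about above is typically a weighted sum. A minimal sketch, with hypothetical weights (the paper's actual coefficients, if any, are not given here):

```python
def decoder_loss(l1: float, perceptual: float, gan: float,
                 w_perc: float = 1.0, w_gan: float = 0.1) -> float:
    """Weighted sum of the three decoder loss terms.

    The weights are illustrative placeholders, not values from CCSR."""
    return l1 + w_perc * perceptual + w_gan * gan
```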

Use case of CCSR compared to SeeSR

Please can you comment on when one would use CCSR vs SeeSR?

Both appear to have similar objectives. How does CCSR perform compared with SeeSR?

Thanks

Training with your own LR data

Hello, thanks a lot for your work!

I am currently working on super-resolution of grayscale CT images. When I went through both stages of the training process using the pre-trained Stable Diffusion model, the results turned out to be far from desired.
I have a dataset of paired LQ and HQ images that I want to use to train the stage-1 and stage-2 models.
Is it possible to do this in the current design with minor additions to the code?
Would using a SwinIR model pretrained on my dataset improve the quality of generation?

No module named Taming found

How can I fix this error?

File "importlib\__init__.py", line 126, in import_module
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "D:\ComfyUI1\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CCSR\ldm\modules\losses\__init__.py", line 1, in <module>
from ....ldm.modules.losses.contperceptual import LPIPSWithDiscriminator
File "D:\ComfyUI1\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CCSR\ldm\modules\losses\contperceptual.py", line 4, in <module>
from taming.modules.losses.vqperceptual import * # TODO: taming dependency yes/no?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'taming'
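Before reinstalling the whole torch stack, it can help to confirm which module is actually unresolvable. A small stdlib-only check (the observation that the `taming` namespace is commonly provided by a separately installed taming-transformers package is an assumption, not something this repo's docs state here):

```python
import importlib.util

def has_module(name: str) -> bool:
    # find_spec returns None when the module cannot be located on sys.path,
    # without importing it (so no side effects from a broken package).
    return importlib.util.find_spec(name) is not None

# If has_module("taming") is False, the environment is simply missing the
# package that provides the `taming` namespace.
```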

I resolved the issue successfully by executing the following commands:

pip uninstall pytorch-lightning torch torchvision torchmetrics

pip install torch==2.0.1+cu118 -f https://download.pytorch.org/whl/torch_stable.html
pip install torchvision==0.15.2+cu118 -f https://download.pytorch.org/whl/torch_stable.html
pip install torchmetrics==0.6.0
pip install pytorch-lightning==1.4.2

pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu118

Originally posted by @Limbicnation in #10 (comment)

Colab run on T4

Hello, this is great work!
I can't run the inference code on Colab with a T4.
Could you help me, please?

Delete

delete. Wrong repo. Sorry!

Will this work on dual GPU?

I'm having VRAM issues with 24 GB even with heavy tiling. If I buy another GPU, can I use both to fix this, and if so, how do I enable it?

Or can this code be modified to use FP16 instead of FP32, if that would reduce memory usage?
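On the FP16 question: halving the element size roughly halves the raw memory for weights and activations (optimizer state and fragmentation complicate the real picture). A back-of-the-envelope sketch, with an illustrative parameter count that is not CCSR's actual size:

```python
def tensor_gib(numel: int, bytes_per_elem: int) -> float:
    # Raw storage for one tensor: element count x element size, in GiB.
    return numel * bytes_per_elem / 2**30

# Illustrative: the weights of a ~1B-parameter model.
params = 1_000_000_000
fp32 = tensor_gib(params, 4)  # ~3.73 GiB
fp16 = tensor_gib(params, 2)  # exactly half of that
```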

Stage2 training: errors while loading ckpt from Stage1 training

I got the following error when training stage 2:
Missing key(s) in state_dict: "betas_inter", "alphas_cumprod_inter", "alphas_cumprod_prev_inter", "sqrt_alphas_cumprod_inter", "sqrt_one_minus_alphas_cumprod_inter", "log_one_minus_alphas_cumprod_inter", "sqrt_recip_alphas_cumprod_inter", "sqrt_recipm1_alphas_cumprod_inter", "posterior_variance_inter", "posterior_log_variance_clipped_inter", "posterior_mean_coef1_inter", "posterior_mean_coef2_inter", "decoder_loss.logvar", "decoder_loss.perceptual_loss.scaling_layer.shift", "decoder_loss.perceptual_loss.scaling_layer.scale", "decoder_loss.perceptual_loss.net.slice1.0.weight", "decoder_loss.perceptual_loss.net.slice1.0.bias", "decoder_loss.perceptual_loss.net.slice1.2.weight", "decoder_loss.perceptual_loss.net.slice1.2.bias...

After checking the difference between stage-1 and stage-2 training, the error seems valid, because these modules appear in ddpm_ccsr_stage2.py but not in ddpm_ccsr_stage1.py. I'm wondering if I missed anything, or whether load_state_dict() for stage-2 training should simply be called with strict=False.

Great work! I really appreciate you sharing the details!
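A pure-Python sketch of what `strict=False` tolerates (this mirrors PyTorch's missing/unexpected key semantics, but `split_state_dict_keys` is a hypothetical helper, not a torch API):

```python
def split_state_dict_keys(model_keys, ckpt_keys):
    """Keys present in the model but absent from the checkpoint are
    'missing'; the reverse are 'unexpected'. Strict loading raises on
    either; strict=False just reports them and loads the overlap."""
    missing = sorted(set(model_keys) - set(ckpt_keys))
    unexpected = sorted(set(ckpt_keys) - set(model_keys))
    return missing, unexpected

# Stage-2-only buffers (e.g. "betas_inter") would show up as 'missing'
# when a stage-1 checkpoint is loaded into the stage-2 model.
```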

Question About Figure.1

Hi!
The left plot of Figure 1 in the paper, showing the relationship between timestep and PSNR/LPIPS, is very interesting and an insightful observation.

However, the whole iterative process is complicated. Over timesteps from 1000 down to 0, it goes from pure noise to the SR result. Each intermediate result actually consists of two parts: image and noise. Over the process, image quality gradually improves while noise gradually decreases. We want to evaluate the change in image quality, but the noise interferes with that measurement.

When you evaluated the intermediate steps, how did you reduce the influence of noise on the metrics?

When I measured them myself, both curves appeared to be monotonic. Could you provide more details?

questions about metrics

Hi author, thanks for your team's contribution.

I would like to ask a question about computing metrics during training. Training is usually interspersed with a validation step; do you compute the evaluation metrics during that validation step? That seems time-consuming, so I'm wondering how you schedule evaluation during training.
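One common compromise, sketched below, is to run the expensive metric suite only every N optimizer steps rather than every validation pass (the interval of 5000 is an arbitrary example, not this repo's setting):

```python
def should_validate(global_step: int, every_n_steps: int = 5000) -> bool:
    # Hypothetical schedule: run the (expensive) metric computation only
    # every `every_n_steps` optimizer steps, skipping step 0.
    return global_step > 0 and global_step % every_n_steps == 0
```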

why use one-step sampling

Great work, and thanks to the authors for sharing!

A small question: during inference, is x_T pure noise, and what is the purpose of the step from x_T to x_tmax? Why not obtain x_tmax directly by adding the corresponding noise to the LR image, or denoise x_T step by step all the way down to x_tmin?

ModuleNotFoundError: No module named 'utils.devices'

Thanks for your work!
When I deployed the latest script (2024-01-15), I ran into the error in the title. I looked through the repository structure and indeed could not find utils.devices.
I'm not sure whether I missed something.
Thanks!

Line at bottom

Have noticed a line at the bottom of the images.

Unsure why this happens. Using default settings.

pytorch_lightning.utilities.distributed library problem

Issue Description

Hi,

After creating the ccrsr virtual environment and running python3 inference_ccsr.py, I encountered the following issue:

ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'

Environment Information

  • pytorch-lightning Version: 2.1.3
  • torch Version: 2.0.1
  • Python Version: 3.9.18

Resolution

To resolve the issue, I made the following modification in the code:

In CCSR/ldm/models/diffusion/ddpm_ccsr_stage2.py and /home/pierre/CCSR/ldm/models/diffusion/ddpm_ccsr_stage1.py, I changed:

from pytorch_lightning.utilities.distributed import rank_zero_only

to:

from pytorch_lightning.utilities.rank_zero import rank_zero_only

This modification allowed me to make it work.
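An alternative to editing each file is a version-tolerant import shim that tries the new location first and falls back to the old one. A stdlib-only sketch (`import_first` is a hypothetical helper, not a pytorch-lightning API):

```python
import importlib

def import_first(name, *module_paths):
    """Return attribute `name` from the first importable module path.

    Useful when a symbol (like rank_zero_only) moves between library
    releases; ModuleNotFoundError is a subclass of ImportError."""
    for path in module_paths:
        try:
            return getattr(importlib.import_module(path), name)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"{name} not found in any of {module_paths}")

# e.g.:
# rank_zero_only = import_first("rank_zero_only",
#                               "pytorch_lightning.utilities.rank_zero",
#                               "pytorch_lightning.utilities.distributed")
```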

Steps to Reproduce

  1. Create ccrsr virtual environment.
  2. Run python3 inference_ccsr.py.

Does the Colab work? Errors occurred.

When I run the CCSR Colab demo linked in the readme.md, I get the errors below.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
xformers 0.0.22.post7 requires torch==2.1.0, but you have torch 2.2.1 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.2.1+cu121 requires torch==2.2.1, but you have torch 2.1.0 which is incompatible.
torchtext 0.17.1 requires torch==2.2.1, but you have torch 2.1.0 which is incompatible.
torchvision 0.17.1+cu121 requires torch==2.2.1, but you have torch 2.1.0 which is incompatible.

Am I missing something?
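The two log blocks above describe a circular conflict: installing torch 2.1.0 for xformers breaks torchvision/torchaudio/torchtext, which were built against torch 2.2.1. A toy checker (not pip's actual resolver) that reproduces the shape of those messages makes the inconsistency easy to see:

```python
def find_conflicts(installed, requirements):
    """Report pins that disagree with installed versions, in the same
    format as pip's resolver warnings. A toy check, not pip itself."""
    msgs = []
    for pkg, reqs in requirements.items():
        for dep, wanted in reqs.items():
            have = installed.get(dep)
            if have is not None and have != wanted:
                msgs.append(
                    f"{pkg} requires {dep}=={wanted}, "
                    f"but you have {dep} {have}")
    return msgs

# The second log block above corresponds to:
conflicts = find_conflicts(
    {"torch": "2.1.0"},
    {"torchvision": {"torch": "2.2.1"}, "torchaudio": {"torch": "2.2.1"}},
)
```

The fix is to choose one mutually consistent set of pins (all built against the same torch version) rather than letting each install step downgrade torch for the others.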
