chairc / integrated-design-diffusion-model

IDDM (Industrial, landscape, animate...) supports DDPM, DDIM, PLMS, a web UI, and multi-GPU distributed training. PyTorch implementation; generative model, diffusion model, distributed training.

License: Apache License 2.0

Python 100.00%

ddpm defect-detection diffusion-models industrial ddim distributed-computing distributed-training pytorch unet aigc

integrated-design-diffusion-model's Introduction

Hi 👋, I am chairc

An ordinary master's student from China 🇨🇳

🔭 I’m currently working on computer vision
🌱 I’m currently researching industrial defect detection
📫 Reach me at [email protected]
👨‍💻 All of my projects are available at my github
📝 I regularly write articles on my blog

integrated-design-diffusion-model's People

Contributors

Stargazers

Watchers

Forkers

xiaoningpi wf1024966 zhaohuiqiao0517 lzh1998-jansen xuanjiawang yuanzhongqiao luoqichao tangwy98 egoist945402376 bestl1fe edwardtj lizhaoguo123

integrated-design-diffusion-model's Issues

Training on a rented GPU fails with "socket has failed to listen on any local network address"

Do you have any idea what might cause this? Thanks in advance for your help.

Loading Unconditional Model Failure!!!

Version: 1.1.3
Type: BUG

When unconditional training is interrupted and I load the model again to resume, the following error occurs.

Traceback (most recent call last):
  File "D:\Integrated-Design-Diffusion-Model\tools\train.py", line 408, in <module>
    main(args)
  File "D:\Integrated-Design-Diffusion-Model\tools\train.py", line 293, in main
    train(args=args)
  File "D:\Integrated-Design-Diffusion-Model\tools\train.py", line 156, in train
    load_ckpt(ckpt_path=pretrain_path, model=model, device=device, is_pretrain=pretrain,
    load_model_ckpt(model=model, model_ckpt=ckpt_model, is_train=is_train, is_pretrain=is_pretrain,
  File "D:\Integrated-Design-Diffusion-Model\utils\checkpoint.py", line 115, in load_model_ckpt
    model_weights_dict = {k: v for k, v in model_weights_dict.items() if np.shape(model_dict[k]) == np.shape(v)}
  File "D:\Integrated-Design-Diffusion-Model\utils\checkpoint.py", line 115, in <dictcomp>
    model_weights_dict = {k: v for k, v in model_weights_dict.items() if np.shape(model_dict[k]) == np.shape(v)}
KeyError: 'label_emb.weight'

The model incorrectly loaded the weights in conditional mode, so the checkpoint contains a 'label_emb.weight' entry that the unconditional model does not have. I think there is an error here. Should we add a conditional parameter to the load_model_ckpt() function in checkpoint.py?
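The KeyError in the traceback comes from indexing model_dict[k] with a key that exists only in the checkpoint. As a rough sketch of one possible guard (not the project's actual fix), the filter could check that the key exists before comparing shapes. The function name filter_compatible_weights and the SimpleNamespace stand-ins for tensors are hypothetical and used only for illustration; the sketch assumes each value exposes a .shape attribute, as tensors do.

```python
from types import SimpleNamespace


def filter_compatible_weights(model_dict, weights_dict):
    """Keep checkpoint entries whose key exists in the model with the same shape.

    Returns (kept, skipped): the filtered weights and the list of dropped keys.
    """
    kept, skipped = {}, []
    for key, value in weights_dict.items():
        target = model_dict.get(key)  # None when the model has no such parameter
        if target is not None and tuple(target.shape) == tuple(value.shape):
            kept[key] = value
        else:
            skipped.append(key)
    return kept, skipped


# Stand-ins for tensors: only the .shape attribute matters in this sketch.
unconditional_model = {"conv.weight": SimpleNamespace(shape=(64, 3, 3, 3))}
conditional_ckpt = {
    "conv.weight": SimpleNamespace(shape=(64, 3, 3, 3)),
    "label_emb.weight": SimpleNamespace(shape=(10, 256)),  # only in conditional models
}

kept, skipped = filter_compatible_weights(unconditional_model, conditional_ckpt)
print("kept:", sorted(kept))
print("skipped:", skipped)
```

Reporting the skipped keys makes it visible when a conditional checkpoint is loaded into an unconditional model, rather than silently dropping the label embedding.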

A summary of issues (问题总结)

This issue summarizes common problems and provides corresponding solutions. If your problem is not covered here, you can open a new issue and I will answer it.

Thanks for reporting bugs and contributing code (and PRs).

Quick navigation
Q1. What is the purpose of this project? What significance does it have?
Q2. How should I choose appropriate parameters during training?
Q3. How can I accelerate image generation during training?
Q4. Why am I encountering numerous CUDA or cuDNN errors such as THCudaCheck FAIL file=../aten/src/THC/THCCachingHostAllocator.cpp or RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR during training?
Q5. Why do I see noise in the generated images?
Q6. How should the dataset be divided? How do I set up conditional and unconditional training?
Q7. The training was interrupted unexpectedly. How can I resume training?
Q8. The training time for each epoch is too long. How can I use a pretrained model?
Q9. Why does using a 32×32 model to generate 64×64 or 128×128 images result in distortion and extra objects?
Q10. Why do I get a RuntimeError: Address already in use error when starting training?
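For Q10, "Address already in use" generally means another process is already bound to the port that training tries to listen on. The following is a minimal sketch of asking the operating system for a free port before launching. It assumes the launcher reads the MASTER_ADDR and MASTER_PORT environment variables, as torch.distributed's env:// initialization does; whether this project's scripts take the port from the environment or from a command-line argument is not stated here. The helper name find_free_port is hypothetical.

```python
import os
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a TCP port on `host` that is unused right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick any free port
        return sock.getsockname()[1]


port = find_free_port()

# Assumption: the training launcher uses env:// style rendezvous settings.
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = str(port)
print(f"Using rendezvous port {port}")
```

The port is released as soon as the helper returns, so another process could in principle claim it before training starts; in practice this is usually enough to get past a stale process holding the default port.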

CIFAR-10 hyperparameter tuning

Hello,

What hyperparameters did you use when training on CIFAR-10?

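Can the model output grayscale images when the input is grayscale?

I am using this model to generate images. The input images are grayscale; can it also generate grayscale images after training? After 200 epochs, the generated images are in color.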

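The code is superbly written; as a beginner I would still like to read through the source step by step

Is there a recommended order for reading this source code _(:з)∠)_? As a beginner I find it hard to follow.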

Can this be used to generate images of industrial defects?

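Resuming training

My network connection dropped after epoch 76 finished. To resume, I set start_epoch to 77, but when training started it showed epoch 80... It does not affect performance, but I thought I would mention it.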

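Does the dataset need to be split?

Hello author, a beginner's question: for training and generation, does the dataset need to be split the way it is when training other neural networks?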

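Generated images are pure noise

I am using DDPM with the same parameters for training and generation, but the generated images are just noise. Which part might have gone wrong?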

About the training and validation sets

In the training script,
parser.add_argument("--train_dataset_path", type=str,
                    default="G:/diffusion/Integrated-Design-Diffusion-Model-main/datasets/dataset_demo/class_1")
parser.add_argument("--val_dataset_path", type=str,
                    default="G:/diffusion/Integrated-Design-Diffusion-Model-main/datasets/dataset_demo/class_2")
# Enable automatic mixed precision training (needed)
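I put different images in the training set and the validation set, but the images produced by training are always identical to the validation set. What is the reason?

The training set seems to have no effect, and the generated high-resolution (HR) images at different epochs are identical to the original validation images, yet a loss is still reported during training, e.g. MSE=0.00471. Any help would be appreciated.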