ermongroup / ddim

Denoising Diffusion Implicit Models

License: MIT License

Python 100.00%

ddim's Issues

How to train a better model?

Hi, I used this code to train a new model, but I cannot reproduce similar FID results.
You have successfully provided a new pretrained model on CelebA, so could you give me some advice? Thanks a lot!

Questions about noise distributions during training and sampling

Hi, thanks for your exciting work!
I found that, when sampling from a trained diffusion model, replacing the Gaussian noise distribution with a uniform distribution encourages more diverse samples. Based on Eq. (12) of the paper, is it reasonable to infer that, apart from σ, we can also change the noise distribution during sampling without retraining the model?

Loss is not going down

Hello, dear authors! I trained DDIM on another dataset. After 1200 epochs, the loss still does not decrease steadily, and it sometimes becomes large. Is this normal?

Question in results

Hi, I used your default parameters to train the DDIM model with 1000 diffusion steps for 800k iterations.

But my reproduced results are very bad. Can you provide some suggestions?

fidelity --gpu 9 --fid --input1 exp/image_samples/baseline_cifar10/ --input2 cifar10-train --sample_type generalized

FID: 187.58

Does this repo support multi-GPUs?

It seems that it only supports a single GPU for training.

Wrong fid in cifar10

Thanks for sharing your code. I cloned this repository and ran the following command:

python main.py --config cifar10.yml --doc test1 --sample --fid --timesteps 10 --ni --use_pretrained

Then, to calculate the FID with https://github.com/toshas/torch-fidelity as you suggest, I ran:

fidelity --gpu 0 --fid --input1 ~/ddim/exp/image_samples/images --input2 cifar10-train

The resulting FID is 18.77365, which differs from the value in the paper (13.36). Do I need to make any changes or process the images? In addition, I found that the number of generated pictures is 49984. Does this affect the FID calculation?
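One possible explanation for the 49984 count, offered as an assumption rather than a confirmed reading of the repository: if the sampler only generates whole batches and derives the number of rounds by floor division, the remainder is silently dropped. The sketch below uses hypothetical names (`total_n_samples`, `batch_size`) to show which batch sizes would produce that count from a target of 50000.

```python
def images_generated(total_n_samples, batch_size):
    """Number of images produced when only full batches are sampled.

    Assumption: the number of rounds is computed with floor division,
    so a final partial batch is never generated.
    """
    n_rounds = total_n_samples // batch_size
    return n_rounds * batch_size


# Compare a few candidate batch sizes against a 50000-image target.
counts = {bs: images_generated(50000, bs) for bs in (32, 64, 100, 128)}

for bs, n in counts.items():
    print(f"batch_size={bs:4d} -> {n} images")
```

Under this assumption, batch sizes of 32 and 64 both give 49984 images, while a batch size that divides 50000 evenly (such as 100) gives the full set. A shortfall of 16 images out of 50000 is small, but it is easy to remove by picking a divisor of the target count.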

run sampling process with cifar10 dataset

Hi! :) I wonder whether there is a checkpoint of a model trained on CIFAR-10?

I tried to use diffusers from Hugging Face, but there was a problem with my network, so I couldn't download it from the hub.

I also want to observe the layers' activation distributions during sampling (DDIM on CIFAR-10), so the integrated diffusers pipeline probably can't achieve this goal.

thx

Question in the paper

I have a question about Table 1 in the paper.

In Table 1, the η = 1 case and the σ̂ case both correspond to DDPM.
However, the reported performance of these two cases is different.

What is the difference between them?

Thank you :)
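As a hedged illustration of how the two cases can differ: as I read the paper, η = 1 uses the smaller posterior standard deviation, while σ̂ uses the larger value sqrt(1 − ᾱ_t/ᾱ_prev). The sketch below compares the two on a 10-step subsequence. The linear beta schedule (1e-4 to 0.02 over 1000 steps) and all variable names are assumptions chosen for illustration, not values read from the repository.

```python
import math

# Assumed linear beta schedule; alpha_bars holds the cumulative products.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alpha_bars = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)


def sigmas(t, t_prev):
    """Return (sigma for eta = 1, sigma_hat) for a jump from t to t_prev."""
    a_t = alpha_bars[t]
    a_prev = alpha_bars[t_prev] if t_prev >= 0 else 1.0
    ratio_term = math.sqrt(1.0 - a_t / a_prev)
    sigma_eta1 = math.sqrt((1.0 - a_prev) / (1.0 - a_t)) * ratio_term
    sigma_hat = ratio_term
    return sigma_eta1, sigma_hat


# A 10-step subsequence of the 1000 training timesteps.
seq = list(range(0, T, 100))
seq_prev = [-1] + seq[:-1]
pairs = [sigmas(t, p) for t, p in zip(seq, seq_prev)]

for t, (s1, sh) in zip(seq, pairs):
    print(f"t={t:3d}  sigma(eta=1)={s1:.4f}  sigma_hat={sh:.4f}")
```

In this sketch σ̂ is never smaller than the η = 1 value, and the gap is widest at small t, where (1 − ᾱ_prev)/(1 − ᾱ_t) is far from 1. Both choices are DDPM-style samplers, so different FID numbers for them are not contradictory.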

reconstruction from latent code

Hi, thanks for your excellent paper and project!

I want to explore Section 5.4, "Reconstruction from latent code". Can I simply reverse seq and seq_next in the function below, so that each step goes from t to t+1, in order to produce the latent code from an input image?

ddim/functions/denoising.py

Line 10 in 51cb290

def generalized_steps(x, seq, model, b, **kwargs):

I would appreciate any help!
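A minimal scalar sketch of this idea, under stated assumptions: it applies the deterministic (η = 0) DDIM update and encodes by walking the schedule in the opposite order. The schedule `alphas`, the helper names, and `toy_eps_model` (the optimal noise predictor when the data is N(0, 1)) are all hypothetical stand-ins, not code from the repository. With a real network the round trip is only approximate, because the encoder reuses the noise prediction made at the current timestep for the next one.

```python
import math


def ddim_step(x, a_from, a_to, eps):
    """Deterministic (eta = 0) DDIM update between two cumulative alphas."""
    x0_pred = (x - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * x0_pred + math.sqrt(1.0 - a_to) * eps


def run(x, alphas, eps_model):
    """Walk x along consecutive entries of `alphas`, in the order given."""
    for a_from, a_to in zip(alphas[:-1], alphas[1:]):
        x = ddim_step(x, a_from, a_to, eps_model(x, a_from))
    return x


# Toy schedule: alpha_bar = cos(theta)^2, from clean (1.0) to nearly pure noise.
N = 1000
alphas = [math.cos(1.5 * i / N) ** 2 for i in range(N + 1)]


def toy_eps_model(x, a):
    # Optimal noise prediction when the data distribution is N(0, 1).
    return math.sqrt(1.0 - a) * x


x0 = 0.8
latent = run(x0, alphas, toy_eps_model)           # encode: clean -> noisy
recon = run(latent, alphas[::-1], toy_eps_model)  # decode: noisy -> clean

print(f"x0={x0:.4f}  latent={latent:.4f}  recon={recon:.4f}")
```

The reconstruction error here comes purely from discretisation and shrinks as the number of steps grows, which matches the intuition that the encoder and decoder are two directions of the same ODE.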

FID of CIFAR-10

Solved

Transferability to transformers

First of all, thank you for providing the code!
In the accompanying paper I read that the only difference between DDPM and
DDIM is how samples are generated.
Intuitively, I would then assume that the CNN model could also
be replaced by a transformer-type architecture.
If my understanding of your paper is correct,
one could still use the same simple loss and your DDIM sample generation.
I would highly appreciate your opinion on this.
Thanks in advance, Anthony.

FID of DDPMs on CIFAR-10

Hi,

I found with the converted pretrained CIFAR-10 DDPM,

ddim/runners/diffusion.py

Line 220 in 34d640e

 # This used the pretrained DDPM model, see https://github.com/pesser/pytorch_diffusion 

I got an FID of 5.68 with timesteps=100 and eta=0, which is some distance from the 4.16 reported in the paper. May I ask whether the reported number comes from a model you trained yourselves?

For FID calculation, I use https://github.com/mseitzer/pytorch-fid

Regarding the issue of excessive FID

Thank you very much for your guidance in #29! After that, however, I ran into another issue: I cannot calculate the correct FID score.
I used the CelebA 64 x 64 model you provide on GitHub, but the images generated by the model score over 30 when I calculate the FID against images from the CelebA dataset. I believe I may have made a mistake somewhere, and I hope you can point me in the right direction.
Thank you again for your previous generous assistance!

FID of DDPM on CIFAR-10

Hi! :)
I wonder whether, with the given configurations, we can recover the FID of 3.17 if we use this code to train a DDPM from scratch on CIFAR-10. I have run 800k steps so far and can only obtain an FID of about 4.

Question about Lemma 1 in the paper

I have questions about Lemma 1 of the DDIM paper.

How is equation (25) or (7) in the paper derived?
How is Bishop's equation (2.115) used in the proof of the lemma?
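A hedged sketch of the induction step, as I understand the argument: here $\alpha_t$ is the paper's cumulative coefficient, and the equation numbering for Bishop's result is quoted from the question above. Bishop's marginal-Gaussian identity states that if $p(x) = \mathcal{N}(x \mid \mu, \Lambda^{-1})$ and $p(y \mid x) = \mathcal{N}(y \mid Ax + b, L^{-1})$, then $p(y) = \mathcal{N}(y \mid A\mu + b,\; L^{-1} + A\Lambda^{-1}A^{\top})$. Taking $x = x_t$ and $y = x_{t-1}$, both conditioned on $x_0$:

```latex
\begin{align*}
q_\sigma(x_t \mid x_0) &= \mathcal{N}\!\left(\sqrt{\alpha_t}\,x_0,\; (1-\alpha_t) I\right)
  && \text{(induction hypothesis)} \\
q_\sigma(x_{t-1} \mid x_t, x_0) &= \mathcal{N}\!\left(\sqrt{\alpha_{t-1}}\,x_0
  + \sqrt{1-\alpha_{t-1}-\sigma_t^2}\,\frac{x_t-\sqrt{\alpha_t}\,x_0}{\sqrt{1-\alpha_t}},\;
  \sigma_t^2 I\right) \\[4pt]
\text{mean:}\quad
  & \sqrt{\alpha_{t-1}}\,x_0
  + \sqrt{1-\alpha_{t-1}-\sigma_t^2}\,
    \frac{\sqrt{\alpha_t}\,x_0-\sqrt{\alpha_t}\,x_0}{\sqrt{1-\alpha_t}}
  = \sqrt{\alpha_{t-1}}\,x_0 \\
\text{covariance:}\quad
  & \sigma_t^2 I + \frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}\,(1-\alpha_t)\, I
  = (1-\alpha_{t-1})\, I \\[4pt]
\Rightarrow\quad q_\sigma(x_{t-1} \mid x_0)
  &= \mathcal{N}\!\left(\sqrt{\alpha_{t-1}}\,x_0,\; (1-\alpha_{t-1}) I\right)
\end{align*}
```

Here $A = \sqrt{1-\alpha_{t-1}-\sigma_t^2}\,/\sqrt{1-\alpha_t}\; I$, and $b$ collects the remaining terms in $x_0$. The marginal has the same form at $t-1$ as at $t$, which is what the induction needs.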

sampling code of DDPM

Hi, I found that the DDPM code you provide seems to differ from the paper, for example in the coefficient of x_t:

((atm1.sqrt() * beta_t) * (1.0 / at).sqrt() * x + ((1 - beta_t).sqrt() * (1 - atm1)) * x) / (1.0 - at)

Thanks!
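A quick numerical check, sketched under assumptions, suggests that the two forms may agree after simplification. If `at` is the cumulative product of `1 - beta` up to step t and `atm1` is the same product up to step t-1, the quoted coefficient of x reduces algebraically to 1/sqrt(1 − β_t), which is the coefficient in the DDPM sampling rule. The linear beta schedule below is an assumed example, and the variable names simply mirror the quoted expression.

```python
import math

# Assumed linear beta schedule; alpha_bars[t] = prod_{s <= t} (1 - beta_s).
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alpha_bars = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)


def x_coefficient(t):
    """Coefficient of x_t in the quoted expression, written with scalars."""
    beta_t = betas[t]
    at = alpha_bars[t]
    atm1 = alpha_bars[t - 1] if t > 0 else 1.0
    numerator = (
        math.sqrt(atm1) * beta_t * math.sqrt(1.0 / at)
        + math.sqrt(1.0 - beta_t) * (1.0 - atm1)
    )
    return numerator / (1.0 - at)


# Largest difference from the paper-style coefficient 1 / sqrt(alpha_t).
max_gap = max(
    abs(x_coefficient(t) - 1.0 / math.sqrt(1.0 - betas[t])) for t in range(T)
)
print(f"max |difference| over all timesteps: {max_gap:.2e}")
```

The difference stays at floating-point noise level, so for the x_t term at least, the code and the paper appear to be the same expression written in two ways.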

Generation Process not producing quality images.

I have been training on the CIFAR-10 dataset, and no matter how far I train, it continues to generate images which look like this:

Is this a possible issue with the image generation pipeline, or is it a problem with the model not training correctly?

without "train" function!!!!

DDIM inversion

Could you help me invert an image to Gaussian noise?

Why use asymmetric padding in downsample?

I don't understand why it is used.

class Downsample(nn.Module):
    def __init__(self, in_channels, with_conv):
        super().__init__()
        self.with_conv = with_conv
        if self.with_conv:
            # no asymmetric padding in torch conv, must do it ourselves
            self.conv = torch.nn.Conv2d(
                in_channels, in_channels, kernel_size=3, stride=2, padding=0
            )

    def forward(self, x):
        if self.with_conv:
            pad = (0, 1, 0, 1)
            x = torch.nn.functional.pad(x, pad, mode="constant", value=0)
            x = self.conv(x)
        else:
            x = torch.nn.functional.avg_pool2d(x, kernel_size=2, stride=2)
        return x
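One plausible reading, labeled as an assumption: padding `(0, 1, 0, 1)` (left, right, top, bottom) reproduces what TensorFlow-style "SAME" padding would do for a 3x3, stride-2 convolution on an even-sized input, which would keep the layer aligned with weights converted from a TensorFlow model. The helper names below are hypothetical, and the sketch only works out the padding amounts and output sizes in plain Python.

```python
def tf_same_padding(size, kernel, stride):
    """(pad_before, pad_after) under the TensorFlow 'SAME' convention."""
    out = -(-size // stride)  # ceil(size / stride)
    total = max((out - 1) * stride + kernel - size, 0)
    before = total // 2
    return before, total - before


def conv_out(size, kernel, stride, pad_before, pad_after):
    """Output length of a convolution along one spatial dimension."""
    return (size + pad_before + pad_after - kernel) // stride + 1


same_pad = tf_same_padding(32, kernel=3, stride=2)
asym_out = conv_out(32, 3, 2, *same_pad)  # pad only on the right/bottom
sym_out = conv_out(32, 3, 2, 1, 1)        # torch-style padding=1

print(f"SAME padding for 32 -> {same_pad}, output {asym_out}")
print(f"symmetric padding=1 output {sym_out}")
```

Both variants halve a 32-pixel side to 16, so the output shape is the same. What changes is which input pixels each kernel window covers, which matters when reusing pretrained weights.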

lsun.py can't load classes list

Thank you for this code.

I tried to load several LSUN datasets,
but _verify_classes raises an error:

Replacement index 1 out of range for positional args tuple
  File "/home/vscode/ddim/datasets/lsun.py", line 112, in _verify_classes
    verify_str_arg(classes, "classes", dset_opts)

"""
`LSUN <https://www.yf.io/p/lsun>`_ dataset.

Args:
    root (string): Root directory for the database files.
    classes (string or list): One of {'train', 'val', 'test'} or a list of
        categories to load, e.g. ['bedroom_train', 'church_outdoor_train'].
    transform (callable, optional): A function/transform that takes in a PIL image
        and returns a transformed version. E.g., transforms.RandomCrop
    target_transform (callable, optional): A function/transform that takes in the
        target and transforms it.
"""

Is this a bug in the code, or am I using it incorrectly?
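For context on the message itself: Python raises this kind of `IndexError` when `str.format` is given fewer positional arguments than the template has placeholders. That hints, without confirming, that the failure may occur while an error message is being built rather than in the class check itself. The template below is a hypothetical stand-in, not the string used in `lsun.py` or torchvision.

```python
# Hypothetical template with two placeholders.
template = "Unknown value '{}' for argument {}."

try:
    # Only one argument is supplied, so the second placeholder cannot be filled.
    template.format("bedroom_train")
    error_type = None
    error_text = ""
except IndexError as exc:
    error_type = type(exc).__name__
    error_text = str(exc)

print(error_type, "-", error_text)

# Supplying both arguments formats the message as intended.
print(template.format("bedroom_train", "classes"))
```

If the cause is of this kind, the text of the intended message is lost. Checking the value passed as `classes` against the list of valid names is then a reasonable first debugging step.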

Why is the activation function written like this in the code?

The activation function in the code is:

def nonlinearity(x):
    return x*torch.sigmoid(x)

I am wondering what its advantage is compared with the regular ReLU and GELU.
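For reference, `x * sigmoid(x)` is the function usually called Swish or SiLU, and PyTorch exposes the same function as `torch.nn.functional.silu`. The plain-Python sketch below compares it with ReLU at a few points. It only illustrates the shape of the function and makes no claim about why the authors chose it.

```python
import math


def swish(x):
    """x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


points = [-4.0, -1.0, 0.0, 1.0, 4.0]
table = [(x, swish(x), relu(x)) for x in points]

for x, s, r in table:
    print(f"x={x:5.1f}  swish={s:8.4f}  relu={r:6.1f}")
```

Unlike ReLU, this function is smooth everywhere and slightly negative for negative inputs, so its gradient does not vanish exactly there. For large positive inputs it approaches the identity, as ReLU does.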

test data

Hello, may I ask how to use test data as input to generate high-quality images? The examples given in the project are all generated from noise.

got wrong FID using pretrained model

Hi, I sampled 50000 images using the provided pretrained CelebA model, but got an FID of 5.83 with timesteps=1000 and eta=0, which is some distance from the 3.51 reported in the paper (with timesteps set to 100, the FID is 10.13, versus 6.53 reported in the paper). May I ask where the problem is? Is the entire CelebA dataset used for calculating the FID?

What are the training settings of the CelebA model?

I am very interested in the training settings of the CelebA model, since I retrained one but only got an FID of 4.5 with the 1000-step DDIM sampler. Could you please give me some suggestions?

got an unexpected keyword argument `eta`

Thanks for sharing the code!

However, I get the following error when I run the diffusers demo code.

diffusers/src/diffusers/schedulers/scheduling_ddpm.py line 258 got an unexpected keyword argument `eta`

CelebA

Hi, I would like to know how to use the pretrained CelebA model with your code. I was unable to load it.

Why not clamp in generalized_steps?

Hi,

Thanks for your great work. May I ask why you do not clamp in generalized_steps as you do in ddpm_steps?

ddim/functions/denoising.py

Line 24 in 34d640e

x0_preds.append(x0_t.to('cpu'))

using DistributedDataParallel

I found that the code uses DataParallel instead of DistributedDataParallel.
This results in uneven memory usage: GPU 0 is full while the other GPUs are only about half used. I'm using RTX 3090s, and the maximum batch size per GPU during training is 2.
When I modify the code to use DistributedDataParallel, the memory of each GPU is almost full (with the batch size still at 2) and training becomes faster. However, I would like to increase the batch size to 4, which does not seem to work. I think the EMAHelper in the code takes up a lot of memory, since it needs to update the parameters of the EMA model on the GPU. Is there any way to solve this problem?

HOW to use the pretrained model of celeba

Special thanks for your outstanding work. I now want to use the pretrained CelebA model, but I don't know how to write the execution command. Can you give me a concrete example?

python main.py --config {DATASET}.yml --exp {PROJECT_PATH} --doc {MODEL_NAME} --use_pretrained --sample --fid --timesteps {STEPS} --eta {ETA} --ni
@willieneis @yang-song @jiamings @Zymrael @chenlin9

Could you please provide guidance on the calculation of FID?

Hello, I greatly appreciate your selfless sharing of this GitHub repository; it has been immensely beneficial for my learning progress. However, I am currently unclear on how to calculate the Fréchet Inception Distance (FID) metric using your code as a foundation. Would it be possible for you to provide some guidance or recommendations on how to perform FID calculations?

Thank you very much for your contributions.

train loss goes very large

Thank you for your work, but I have a problem.
When I trained with cifar10.yml, the loss went down at first and the sample results looked normal. But the loss became very large after a few steps, and the sample results were full of noise.

I also tried training on my custom data, but did not run into this problem.