
ddim's People

Contributors

chenlin9, jiamings, patrickvonplaten

ddim's Issues

How to train a better model?

Hi, I used this code to train a new model, but I cannot reach similar FID results.
You successfully provided a new pretrained model on CelebA, so could you give me some advice? Thanks a lot!

Questions about noise distributions during training and sampling

Hi, thanks for your exciting work!
I found that when sampling from a trained diffusion model, replacing the Gaussian noise with uniform noise encourages more diverse samples. From Eq. (12) of the paper, is it reasonable to infer that, apart from σ, we can also change the noise distribution during sampling without retraining the model?
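For reference, the sampling update in Eq. (12) can be sketched on scalars as below. The function name `ddim_step`, the `noise_fn` hook, and the numeric values are illustrative assumptions, not part of the repository. The uniform sampler is scaled to unit variance so that only the shape of the distribution changes; whether such a substitution is theoretically justified is exactly the open question here.

```python
import math
import random


def ddim_step(x_t, eps_pred, a_t, a_prev, eta=0.0, noise_fn=None):
    """One generalized (Eq. 12 style) update on a scalar, for illustration only.

    a_t and a_prev are the cumulative alpha products at t and t-1.
    noise_fn is a hypothetical hook returning one zero-mean, unit-variance draw.
    """
    if noise_fn is None:
        noise_fn = lambda: random.gauss(0.0, 1.0)
    # sigma_t as a function of eta (eta = 0 is deterministic DDIM)
    sigma = eta * math.sqrt((1 - a_prev) / (1 - a_t)) * math.sqrt(1 - a_t / a_prev)
    # predicted x_0 from the current sample and the predicted noise
    x0_pred = (x_t - math.sqrt(1 - a_t) * eps_pred) / math.sqrt(a_t)
    # deterministic term pointing back toward x_t
    direction = math.sqrt(1 - a_prev - sigma ** 2) * eps_pred
    return math.sqrt(a_prev) * x0_pred + direction + sigma * noise_fn()


def unit_uniform():
    # Uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1.
    return random.uniform(-math.sqrt(3), math.sqrt(3))


x_gauss = ddim_step(0.5, 0.1, a_t=0.5, a_prev=0.7, eta=1.0)
x_unif = ddim_step(0.5, 0.1, a_t=0.5, a_prev=0.7, eta=1.0, noise_fn=unit_uniform)
x_det = ddim_step(0.5, 0.1, a_t=0.5, a_prev=0.7, eta=0.0)
```

With eta = 0 the noise term vanishes, so the choice of `noise_fn` has no effect on `x_det`.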

Loss is not going down

Hello, dear authors! I trained DDIM on another dataset. After 1200 epochs, the loss still does not decrease steadily, and it occasionally becomes large. Is this normal?

Question in results

Hi, I used your default parameters to train the DDIM model with 1000 diffusion steps for 800k iterations.

But my reproduced results are very bad. Can you offer some suggestions?

fidelity --gpu 9 --fid --input1 exp/image_samples/baseline_cifar10/ --input2 cifar10-train --sample_type generalized

FID: 187.58

Wrong fid in cifar10

Thanks for sharing your code. I cloned this repository and ran the following command:

python main.py --config cifar10.yml --doc test1 --sample --fid --timesteps 10 --ni  --use_pretrained

Then I used https://github.com/toshas/torch-fidelity to calculate FID, as you suggest:

fidelity --gpu 0 --fid --input1 ~/ddim/exp/image_samples/images --input2 cifar10-train

The resulting FID is 18.77365, which is somewhat different from the paper's 13.36. Do I need to make any changes or preprocess the images? In addition, I found that the number of generated images is 49984. Does this affect the FID calculation?
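One possible explanation for the 49984 figure, stated as an assumption rather than a confirmed reading of the sampling loop: if images are generated in whole batches and the number of rounds comes from floor division, the remainder is dropped. The batch size of 64 below is a guess, chosen only because it reproduces the reported count, and `images_generated` is a hypothetical helper.

```python
def images_generated(total_requested, batch_size):
    """Number of images produced when only full batches are sampled."""
    n_rounds = total_requested // batch_size  # floor division drops the remainder
    return n_rounds * batch_size


produced = images_generated(50000, 64)  # 64 is an assumed sampling batch size
missing = 50000 - produced
print(produced, missing)
```

If that is what happens, the shortfall is 16 images out of 50000, which on its own seems unlikely to explain a gap of several FID points.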

run sampling process with cifar10 dataset

Hi! :) Is there a checkpoint of a model trained on CIFAR-10?

I tried to use diffusers from Hugging Face, but a network problem prevented me from downloading the model from the hub.

I also want to observe the layers' activation distributions during sampling (DDIM on CIFAR-10), which the integrated diffusers pipeline probably can't provide.

Thanks.

Question in the paper

I have a question about Table 1 in the paper.

In Table 1, both the eta = 1 case and the \hat{\sigma} case correspond to DDPM.
However, the results for these two cases are different.

What is the difference between them?

Thank you :)

reconstruction from latent code

Hi, thanks for your excellent paper and project!

I want to explore Section 5.4, "Reconstruction from latent code". Can I simply reverse seq and seq_next in the function below, so that each step goes from t to t+1, to produce the latent code from an input image?

def generalized_steps(x, seq, model, b, **kwargs):

I'd appreciate any help!
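The idea in this question can be sketched on scalars: with eta = 0 the update is deterministic, so applying the same formula from t to t+1 gives an encoding, and stepping back decodes it. The names `ddim_move`, `encode`, and `decode`, the toy alpha schedule, and the constant noise predictor are all assumptions for illustration. With a real network the noise prediction changes between steps, so the round trip is only approximate.

```python
import math


def ddim_move(x, eps, a_from, a_to):
    """Deterministic (eta = 0) DDIM update from cumulative alpha a_from to a_to."""
    x0_pred = (x - math.sqrt(1 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * x0_pred + math.sqrt(1 - a_to) * eps


def encode(x0, alphas, eps_model):
    """Walk t -> t+1 (decreasing cumulative alpha) to reach a latent code."""
    x = x0
    for a_from, a_to in zip(alphas[:-1], alphas[1:]):
        x = ddim_move(x, eps_model(x, a_from), a_from, a_to)
    return x


def decode(x_latent, alphas, eps_model):
    """Walk t+1 -> t (increasing cumulative alpha) back toward the data."""
    x = x_latent
    rev = alphas[::-1]
    for a_from, a_to in zip(rev[:-1], rev[1:]):
        x = ddim_move(x, eps_model(x, a_from), a_from, a_to)
    return x


# Toy setup: cumulative alphas from nearly clean to nearly pure noise.
alphas = [0.999, 0.9, 0.6, 0.3, 0.05]
const_eps = lambda x, a: 0.25  # stand-in for a trained noise predictor

latent = encode(1.0, alphas, const_eps)
recon = decode(latent, alphas, const_eps)
print(latent, recon)
```

Because the stand-in predictor returns the same value everywhere, the reconstruction here matches the input up to floating-point error; that exactness should not be expected from a learned model.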

Transferability to transformers

First of all, thank you for providing the code!
In the accompanying paper I read that the only difference between DDPM and
DDIM is how samples are generated.
Intuitively, I would then assume that the CNN model could also
be replaced by a transformer-type architecture.
If my understanding of your paper is correct,
one could still use the same simple loss and your DDIM sample generation.
I would highly appreciate your opinion on this.
Thanks in advance, Anthony.
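As a rough illustration of why the architecture looks interchangeable: the simple noise-prediction loss only needs some callable that maps a noisy input and a timestep to a noise estimate. The sketch below is a scalar toy with hypothetical names (`simple_loss`, `zero_model`, `linear_model`) and a made-up schedule; it is not the repository's training code.

```python
import math
import random


def simple_loss(eps_model, x0, alphas_cumprod, rng):
    """Single-sample noise-prediction loss for any callable eps_model(x_t, t)."""
    t = rng.randrange(len(alphas_cumprod))
    a_t = alphas_cumprod[t]
    eps = rng.gauss(0.0, 1.0)
    # forward diffusion: noisy sample at step t
    x_t = math.sqrt(a_t) * x0 + math.sqrt(1 - a_t) * eps
    return (eps - eps_model(x_t, t)) ** 2


alphas_cumprod = [0.99, 0.9, 0.5, 0.1]  # made-up cumulative alpha schedule

# Two interchangeable stand-ins: any callable with this signature works.
zero_model = lambda x_t, t: 0.0
linear_model = lambda x_t, t: 0.5 * x_t

loss_a = simple_loss(zero_model, 0.3, alphas_cumprod, random.Random(0))
loss_b = simple_loss(linear_model, 0.3, alphas_cumprod, random.Random(0))
print(loss_a, loss_b)
```

Nothing in the loss refers to convolutions, which is consistent with the intuition in the question; whether a particular transformer trains well is a separate, empirical matter.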

Regarding the issue of excessive FID

Thank you very much for your guidance in #29! After that, however, I ran into another issue: I can't obtain the correct FID score.
I used the CelebA 64 x 64 model you provided on GitHub, but the images generated by the model score over 30 in FID against the images from the CelebA dataset. I suspect I have made a mistake somewhere.
Could I trouble you for some guidance once again? Thank you for your earlier generous help!

FID of DDPM on CIFAR-10

Hi! :)
I wonder whether, with the given configurations, we can recover the FID of 3.17 when training a DDPM from scratch on CIFAR-10 with this code. I have run 800k steps so far and can only obtain an FID of about 4.

sampling code of DDPM

Hi, I found that the DDPM sampling code you provide seems to differ from the paper, for example in the coefficient of x_t:

((atm1.sqrt() * beta_t) * (1.0 / at).sqrt() * x + ((1 - beta_t).sqrt() * (1 - atm1)) * x) / (1.0 - at)


Thanks!
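A quick numeric check may help with this comparison. As long as the predicted x_0 is not clamped, a posterior mean written in terms of x_0 and x_t and the epsilon-based mean from the DDPM paper are algebraically the same quantity. The sketch below uses scalars and made-up values; the variable names mirror the expression quoted above, but both functions are hypothetical.

```python
import math


def mean_from_x0(x, e, at, atm1):
    """Posterior mean written with the predicted x_0 and the current x_t."""
    beta_t = 1 - at / atm1
    x0 = math.sqrt(1.0 / at) * x - math.sqrt(1.0 / at - 1) * e
    return (math.sqrt(atm1) * beta_t * x0
            + math.sqrt(1 - beta_t) * (1 - atm1) * x) / (1.0 - at)


def mean_from_eps(x, e, at, atm1):
    """Posterior mean in the epsilon-parameterised form."""
    beta_t = 1 - at / atm1
    return (x - beta_t / math.sqrt(1 - at) * e) / math.sqrt(1 - beta_t)


a = mean_from_x0(0.3, -0.2, at=0.5, atm1=0.6)
b = mean_from_eps(0.3, -0.2, at=0.5, atm1=0.6)
print(abs(a - b) < 1e-9)
```

If the sampler clamps the predicted x_0 to a fixed range before forming the mean, the two forms can differ whenever the clamp is active.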

Generation Process not producing quality images.

I have been training on the CIFAR-10 dataset, and no matter how long I train, the model keeps generating images that look like this:

[image]

Is this a possible issue with the image generation pipeline, or is it a problem with the model not training correctly?

DDIM inversion

Could you help me invert an image to Gaussian noise?

Why use asymmetric padding in downsample?

I don't understand why it is used.

class Downsample(nn.Module):
    def __init__(self, in_channels, with_conv):
        super().__init__()
        self.with_conv = with_conv
        if self.with_conv:
            # no asymmetric padding in torch conv, must do it ourselves
            self.conv = torch.nn.Conv2d(
                in_channels, in_channels, kernel_size=3, stride=2, padding=0
            )

    def forward(self, x):
        if self.with_conv:
            pad = (0, 1, 0, 1)
            x = torch.nn.functional.pad(x, pad, mode="constant", value=0)
            x = self.conv(x)
        else:
            x = torch.nn.functional.avg_pool2d(x, kernel_size=2, stride=2)
        return x
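One plausible reason, offered as an assumption rather than the authors' stated rationale: the (0, 1, 0, 1) padding reproduces what TensorFlow-style "SAME" padding does for a 3x3, stride-2 convolution on an even-sized input, which matters if the weights were originally trained in such a framework. The helper names below are hypothetical, and the sketch only does the size arithmetic in one dimension.

```python
import math


def same_padding_1d(size, kernel=3, stride=2):
    """(before, after) padding under the TensorFlow 'SAME' convention."""
    out = math.ceil(size / stride)
    total = max((out - 1) * stride + kernel - size, 0)
    before = total // 2
    return before, total - before


def conv_out_1d(size, pad_before, pad_after, kernel=3, stride=2):
    """Output length of a strided convolution with explicit padding."""
    return (size + pad_before + pad_after - kernel) // stride + 1


for size in (32, 16, 8):
    before, after = same_padding_1d(size)
    # asymmetric padding vs. symmetric padding=1 passed to the convolution
    print(size, (before, after), conv_out_1d(size, before, after), conv_out_1d(size, 1, 1))
```

Both paddings halve the resolution. They differ in which input pixels each output position sees, so the two are not interchangeable for a checkpoint trained with one of them.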

lsun.py can't load classes list

Thank you for this code.

I tried to load several LSUN datasets,
but _verify_classes raises an error:

Replacement index 1 out of range for positional args tuple
  File "/home/vscode/ddim/datasets/lsun.py", line 112, in _verify_classes
    verify_str_arg(classes, "classes", dset_opts)

"""
`LSUN <https://www.yf.io/p/lsun>`_ dataset.

Args:
    root (string): Root directory for the database files.
    classes (string or list): One of {'train', 'val', 'test'} or a list of
        categories to load, e.g. ['bedroom_train', 'church_outdoor_train'].
    transform (callable, optional): A function/transform that takes in a PIL image
        and returns a transformed version, e.g. transforms.RandomCrop.
    target_transform (callable, optional): A function/transform that takes in the
        target and transforms it.
"""

Is this a bug in the code, or am I using it incorrectly?

test data

Hello, may I ask how to supply test data in order to generate high-quality images? The project as provided only generates images from noise.

got wrong FID using pretrained model

Hi, I sampled 50000 images using the provided pretrained CelebA model, but got an FID of 5.83 with timesteps=1000 and eta=0, which is some way off the 3.51 reported in the paper (with timesteps set to 100, the FID is 10.13 versus the 6.53 reported in the paper). Do you know where the problem might be? Is the entire CelebA dataset used for calculating FID?

How about the training setting of CelebA model

I am very interested in the training settings of the CelebA model, since I retrained one but only got an FID of 4.5 with the 1000-step DDIM sampler. Could you please give me some suggestions?

got an unexpected keyword argument `eta`

Thanks for sharing the code!

However, I get the following error when I run the diffusers demo code:

diffusers/src/diffusers/schedulers/scheduling_ddpm.py line 258 got an unexpected keyword argument `eta`

CelebA

Hi, how do I use the pretrained CelebA model with your code? I was unable to load it.

using DistributedDataParallel

I found that the code uses DataParallel instead of DistributedDataParallel.
This results in uneven memory usage, with GPU 0 full and the other GPUs only about half used. I'm using 3090s, and the batch size per GPU is limited to 2.
When I modify the code to use DistributedDataParallel, the memory of each GPU is almost full (the batch size per GPU is still 2). Training becomes faster, but I would like to increase the batch size per GPU to 4, which does not seem to work. I suspect the EMAHelper in the code takes up a lot of memory, since it has to update the EMA model's parameters on the GPU. Is there any way to solve this problem?

Could you please provide guidance on the calculation of FID?

Hello, I greatly appreciate your selfless sharing of this GitHub repository; it has been immensely beneficial for my learning progress. However, I am currently unclear on how to calculate the Fréchet Inception Distance (FID) metric using your code as a foundation. Would it be possible for you to provide some guidance or recommendations on how to perform FID calculations?

Thank you very much for your contributions.

train loss goes very large

Thank you for your work, but I have a problem.
When I trained with cifar10.yml, the loss went down at first and the samples looked normal. But after a few more steps the loss became very large, and the samples were full of noise.
I also tried training on my custom data, but did not encounter this problem.
