ermongroup / ddim
Denoising Diffusion Implicit Models
License: MIT License
Hi, I used this code to train a new model, but I cannot get similar FID results.
You have provided a new pretrained model on CelebA, so could you give me some advice? Thanks a lot!
Hi, thanks for your exciting work!
I found that when sampling from a trained diffusion model, replacing the Gaussian noise distribution with a uniform distribution encourages more diverse samples. Looking at Eq. (12) of the paper, is it reasonable to infer that, apart from σ, we can also change the noise distribution during sampling without retraining the model?
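To make the question concrete, here is a minimal scalar sketch of the Eq. (12) update with the noise source passed in as a function. The names `ddim_step`, `unit_uniform` and `unit_gaussian` are hypothetical and not taken from the repository, and the uniform noise is rescaled to unit variance as an assumption. The sketch only shows where the noise term enters; it does not establish that non-Gaussian noise keeps the marginals the model was trained on.

```python
import math
import random


def sigma_t(a_t, a_prev, eta):
    # sigma_t(eta) as defined in the DDIM paper; eta=0 gives the deterministic sampler.
    return eta * math.sqrt((1 - a_prev) / (1 - a_t)) * math.sqrt(1 - a_t / a_prev)


def unit_gaussian(rng):
    return rng.gauss(0.0, 1.0)


def unit_uniform(rng):
    # Uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1 (assumed rescaling).
    return rng.uniform(-math.sqrt(3), math.sqrt(3))


def ddim_step(x_t, eps, a_t, a_prev, eta, noise_fn, rng):
    # a_t and a_prev are the cumulative alpha products at steps t and t-1.
    sigma = sigma_t(a_t, a_prev, eta)
    x0_pred = (x_t - math.sqrt(1 - a_t) * eps) / math.sqrt(a_t)
    direction = math.sqrt(1 - a_prev - sigma ** 2) * eps
    return math.sqrt(a_prev) * x0_pred + direction + sigma * noise_fn(rng)


rng = random.Random(0)
x_gauss = ddim_step(0.7, 0.2, 0.5, 0.8, 1.0, unit_gaussian, rng)
x_unif = ddim_step(0.7, 0.2, 0.5, 0.8, 1.0, unit_uniform, rng)
```

With `eta=0` the `sigma` term vanishes, so the choice of `noise_fn` has no effect on the result.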
Hi, I used your default parameters to train the DDIM model with 1000 diffusion steps for 800k iterations.
However, my reproduced results are very bad. Can you provide some suggestions?
fidelity --gpu 9 --fid --input1 exp/image_samples/baseline_cifar10/ --input2 cifar10-train --sample_type generalized
FID: 187.58
It seems that the code only supports single-GPU training.
Thanks for sharing your code. I cloned this repository and ran the following command:
python main.py --config cifar10.yml --doc test1 --sample --fid --timesteps 10 --ni --use_pretrained
Then, as you suggest, I used https://github.com/toshas/torch-fidelity to calculate FID:
fidelity --gpu 0 --fid --input1 ~/ddim/exp/image_samples/images --input2 cifar10-train
The resulting FID is 18.77365, which differs from the value in the paper (13.36). Do I need to make any changes or preprocess the images? In addition, I found that the number of generated images is 49984. Does this affect the FID calculation?
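One possible explanation for the count of 49984, offered as an assumption rather than a confirmed reading of the code: if the sampler targets 50000 images but only writes full batches, and the sampling batch size is 64, floor division drops the last partial batch. The names below are hypothetical.

```python
TARGET_SAMPLES = 50_000  # assumed sampling target
BATCH_SIZE = 64          # assumed sampling batch size


def images_written(target, batch_size):
    # Only full batches are generated; the remainder is discarded.
    full_rounds = target // batch_size
    return full_rounds * batch_size


print(images_written(TARGET_SAMPLES, BATCH_SIZE))  # 49984
```

Under these assumptions 16 images are missing, which is about 0.03% of the target.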
Hi! :) Is there a checkpoint of the model trained on CIFAR-10?
I tried to use diffusers from Hugging Face, but because of a network problem I couldn't download the model from the hub.
I also want to observe the layers' activation distributions during sampling (DDIM on CIFAR-10), which the integrated diffusers pipeline probably does not make easy.
thx
Hi, thanks for your excellent paper and project!
I want to explore Section 5.4, "Reconstruction from latent code". Can I simply swap seq and seq_next in the function below, so that each step goes from t to t+1, to produce the latent code from the input image?
Line 10 in 51cb290
I appreciate any help!
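As a rough illustration of the idea, here is a scalar sketch of the deterministic (eta=0) DDIM update written so that the same function moves between any two noise levels. Running it over the reversed schedule encodes, and running it over the original schedule decodes. The names `ddim_move`, `run` and `toy_eps` are hypothetical, and `toy_eps` is a constant stand-in for the trained network. With a real network the noise prediction depends on x, so the round trip is only approximate.

```python
import math


def ddim_move(x, eps, a_from, a_to):
    # Deterministic DDIM update between cumulative alpha products a_from -> a_to.
    x0_pred = (x - math.sqrt(1 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * x0_pred + math.sqrt(1 - a_to) * eps


def run(x, alphas, eps_fn):
    # Walk the schedule in the order given, one pair of adjacent levels at a time.
    for a_from, a_to in zip(alphas[:-1], alphas[1:]):
        x = ddim_move(x, eps_fn(x, a_from), a_from, a_to)
    return x


def toy_eps(x, a):
    # Stand-in for the noise-prediction network (assumption: constant output).
    return 0.3


alphas_decode = [0.1, 0.4, 0.7, 0.999]  # noisy -> clean
alphas_encode = list(reversed(alphas_decode))  # clean -> noisy

image = 1.25
latent = run(image, alphas_encode, toy_eps)
recon = run(latent, alphas_decode, toy_eps)
```

In this toy setting `recon` matches `image` up to floating-point error, because the constant `toy_eps` makes each step exactly invertible.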
Solved
First of all, thank you for providing the code!
In the accompanying paper I read that the only difference between DDPM and DDIM is how samples are generated.
Intuitively, I would then assume that the CNN model could also be replaced by a transformer-type architecture.
If my understanding of your paper is correct, one could still use the same simple loss and your DDIM sample generation.
I would highly appreciate your opinion on this.
Thanks in advance, Anthony.
Hi,
I found with the converted pretrained CIFAR-10 DDPM,
Line 220 in 34d640e
I got an FID of 5.68 with timesteps=100, eta=0, which is some margin away from the 4.16 reported in the paper. May I ask whether the reported number comes from a model you trained yourselves?
For FID calculation, I use https://github.com/mseitzer/pytorch-fid
Thank you very much for your guidance in #29! After that, however, I ran into another issue: I can't calculate the correct FID score.
I used the CelebA 64 x 64 model you provided on GitHub, but the images generated by the model score over 30 when I compute FID against images from the CelebA dataset. I believe there might be a mistake on my part, and I hope you can provide some guidance.
Could I trouble you for some guidance once again? Thank you for your previous generous assistance!
Hi! :)
I wonder whether, with the given configuration, we can recover the FID of 3.17 by training a DDPM from scratch on CIFAR-10 with this code. I have trained for 800k steps so far and can only obtain an FID of about 4.
I have questions on Lemma 1 of the DDIM paper
Hi, I found that the DDPM code you provided seems to differ from the paper, for example in the coefficient of x_t:
((atm1.sqrt() * beta_t) * (1.0 / at).sqrt() * x + ((1 - beta_t).sqrt() * (1 - atm1)) * x) / (1.0 - at)
Thanks!
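One way to check whether the two forms really differ is to compare the coefficients numerically. The sketch below assumes that the code first forms an x0 estimate, x0 = sqrt(1/at) * x - sqrt(1/at - 1) * e, then takes the posterior mean as a weighted sum of x0 and x, and that beta_t = 1 - at / atm1. It compares the result with the common form (x_t - beta_t / sqrt(1 - at) * e) / sqrt(alpha_t). The function names are hypothetical.

```python
import math


def code_coeffs(at, atm1):
    # at, atm1: cumulative alpha products at steps t and t-1 (assumed meaning).
    beta_t = 1 - at / atm1
    x0_from_x = math.sqrt(1 / at)        # coefficient of x inside the x0 estimate
    x0_from_e = -math.sqrt(1 / at - 1)   # coefficient of e inside the x0 estimate
    w_x0 = math.sqrt(atm1) * beta_t / (1 - at)
    w_xt = math.sqrt(1 - beta_t) * (1 - atm1) / (1 - at)
    return w_x0 * x0_from_x + w_xt, w_x0 * x0_from_e


def paper_coeffs(at, atm1):
    alpha_t = at / atm1
    beta_t = 1 - alpha_t
    return 1 / math.sqrt(alpha_t), -beta_t / (math.sqrt(alpha_t) * math.sqrt(1 - at))


code_x, code_e = code_coeffs(0.5, 0.6)
paper_x, paper_e = paper_coeffs(0.5, 0.6)
```

Under these assumptions the two pairs agree to floating-point precision, which suggests the expressions are algebraically the same once at = alpha_t * atm1 is substituted.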
Could you help me invert an image to Gaussian noise?
I don't understand why this padding is used.
import torch
import torch.nn as nn


class Downsample(nn.Module):
    def __init__(self, in_channels, with_conv):
        super().__init__()
        self.with_conv = with_conv
        if self.with_conv:
            # no asymmetric padding in torch conv, must do it ourselves
            self.conv = torch.nn.Conv2d(
                in_channels, in_channels, kernel_size=3, stride=2, padding=0
            )

    def forward(self, x):
        if self.with_conv:
            # (left, right, top, bottom): one zero column on the right, one zero row at the bottom
            pad = (0, 1, 0, 1)
            x = torch.nn.functional.pad(x, pad, mode="constant", value=0)
            x = self.conv(x)
        else:
            x = torch.nn.functional.avg_pool2d(x, kernel_size=2, stride=2)
        return x
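The effect of the manual padding above can be seen from output-size arithmetic alone. The sketch below compares no padding, the (0, 1) padding used in the code, and symmetric padding of 1 for a 32-pixel axis with kernel 3 and stride 2. The helper names are hypothetical. Matching a framework that pads only at the end of each axis would be one possible motivation, but that is an assumption and not something the code comment states.

```python
def conv_out_size(n, kernel, stride, pad_before, pad_after):
    # Standard output length of a strided convolution along one axis.
    return (n + pad_before + pad_after - kernel) // stride + 1


def window_starts(n, kernel, stride, pad_before, pad_after):
    # Start index of each window in unpadded coordinates (negative means padding).
    count = conv_out_size(n, kernel, stride, pad_before, pad_after)
    return [i * stride - pad_before for i in range(count)]


n, kernel, stride = 32, 3, 2
no_pad = conv_out_size(n, kernel, stride, 0, 0)      # 15
asymmetric = conv_out_size(n, kernel, stride, 0, 1)  # 16
symmetric = conv_out_size(n, kernel, stride, 1, 1)   # 16

asym_starts = window_starts(n, kernel, stride, 0, 1)  # 0, 2, ..., 30
sym_starts = window_starts(n, kernel, stride, 1, 1)   # -1, 1, ..., 29
```

Both padded variants halve the resolution exactly, but their windows are shifted by one pixel relative to each other, so weights trained with one alignment would not see the same inputs under the other.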
Thank you for this code.
I tried to load several LSUN datasets,
but _verify_classes raises an error:
Replacement index 1 out of range for positional args tuple
File "/home/vscode/ddim/datasets/lsun.py", line 112, in _verify_classes
verify_str_arg(classes, "classes", dset_opts)
"""
`LSUN <https://www.yf.io/p/lsun>`_ dataset.

Args:
    root (string): Root directory for the database files.
    classes (string or list): One of {'train', 'val', 'test'} or a list of
        categories to load. E.g. ['bedroom_train', 'church_outdoor_train'].
    transform (callable, optional): A function/transform that takes in a PIL image
        and returns a transformed version. E.g. transforms.RandomCrop
    target_transform (callable, optional): A function/transform that takes in the
        target and transforms it.
"""
Is this a bug in the code, or am I using it incorrectly?
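For context, the message "Replacement index 1 out of range for positional args tuple" is what Python's `str.format` raises when a template refers to more positional fields than it receives arguments. The sketch below reproduces that class of error with a made-up template. It does not claim to show where the mismatch occurs in lsun.py, and `format_error` is a hypothetical helper.

```python
def format_error(template, *args):
    # Return the error text if formatting fails, or None if it succeeds.
    try:
        template.format(*args)
    except IndexError as exc:
        return str(exc)
    return None


template = "Unknown value {0}. Valid values are {1}."  # hypothetical template
too_few = format_error(template, "bedroom_train")
enough = format_error(template, "bedroom_train", "train, val, test")
print(too_few)
```

If the traceback points at a formatting call, comparing the number of placeholders with the number of arguments passed is a reasonable first check.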
The activation function in the code is:
def nonlinearity(x):
    return x*torch.sigmoid(x)
I am wondering what its advantage is compared with the regular ReLU and GELU.
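The function x * sigmoid(x) is commonly known as swish or SiLU. The scalar sketch below only tabulates it next to ReLU and the exact (erf-based) GELU so that the shapes can be compared. It makes no claim about which one trains better, and the function names are illustrative.

```python
import math


def swish(x):
    # x * sigmoid(x), the same expression as nonlinearity() in the question.
    return x / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def gelu(x):
    # Exact GELU: x times the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  swish={swish(x):+.4f}  relu={relu(x):+.4f}  gelu={gelu(x):+.4f}")
```

Unlike ReLU, swish and GELU are smooth and take small negative values for negative inputs, and all three approach the identity for large positive inputs.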
Hello, may I ask how to use test data as input to generate high-quality images? The project only shows generation from noise.
Hi, I sampled 50000 images using the provided pretrained CelebA model, but got an FID of 5.83 with timesteps=1000, eta=0, which is some margin away from the 3.51 reported in the paper (with timesteps set to 100, the FID is 10.13 versus the 6.53 reported in the paper). May I know where the problem is? Is the entire CelebA dataset used for calculating FID?
I am very interested in the training settings of the CelebA model, since I retrained one but only got an FID of 4.5 with the 1000-step DDIM sampler. Could you please give me some suggestions?
Thanks for sharing the code!
However, I get the following error when I run the diffusers demo code:
diffusers/src/diffusers/schedulers/scheduling_ddpm.py line 258 got an unexpected keyword argument `eta`
Hi, I want to know how to use the pretrained CelebA model with your code. I have not been able to load it.
Hi,
Thanks for your great work. May I ask why you do not clamp in generalized_steps as you do in ddpm_steps?
Line 24 in 34d640e
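To illustrate what the clamp in question does, here is a scalar sketch of the x0 estimate with an optional clip to [-1, 1], assuming images are scaled to that range. `predict_x0` and its `clip` flag are hypothetical and not the repository's API.

```python
import math


def predict_x0(x_t, eps, a_t, clip=False):
    # x0 estimate from the noisy sample and the predicted noise.
    x0 = (x_t - math.sqrt(1 - a_t) * eps) / math.sqrt(a_t)
    if clip:
        # Keep the estimate inside the assumed valid pixel range.
        x0 = max(-1.0, min(1.0, x0))
    return x0


raw = predict_x0(1.5, -0.5, 0.5)
clipped = predict_x0(1.5, -0.5, 0.5, clip=True)
```

Here `raw` falls outside the pixel range while `clipped` is pinned to the boundary, so the two samplers can diverge whenever the network's estimate overshoots.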
I found that the code uses DataParallel instead of DistributedDataParallel.
This results in uneven memory usage, with GPU 0 full and the other GPUs only about half used. I'm using RTX 3090s, and the batch size per GPU can be set to at most 2.
When I modify the code to use DistributedDataParallel, the memory of each GPU is almost full (the batch size is still 2). Training becomes faster, but I would like to increase the batch size to 4, which does not seem to work. I think the EMAHelper in the code takes up a lot of memory, since it has to update the parameters of the EMA model on the GPU. Is there any way to solve this problem?
Special thanks for your outstanding work. I now want to use the pretrained CelebA model, but I don't know how to write the command. Can you give me a concrete example?
python main.py --config {DATASET}.yml --exp {PROJECT_PATH} --doc {MODEL_NAME} --use_pretrained --sample --fid --timesteps {STEPS} --eta {ETA} --ni
@willieneis @yang-song @jiamings @Zymrael @chenlin9
Hello, I greatly appreciate your selfless sharing of this GitHub repository; it has been immensely beneficial for my learning progress. However, I am currently unclear on how to calculate the Fréchet Inception Distance (FID) metric using your code as a foundation. Would it be possible for you to provide some guidance or recommendations on how to perform FID calculations?
Thank you very much for your contributions.
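For intuition about what FID measures, the sketch below computes the Fréchet distance between two one-dimensional Gaussians fitted to two sets of numbers. Real FID applies the same formula to high-dimensional Inception features with full covariance matrices, and tools such as torch-fidelity or pytorch-fid (both mentioned in other issues here) handle that part. `frechet_1d` is a hypothetical helper and not a replacement for those tools.

```python
import statistics


def frechet_1d(xs, ys):
    # Squared Frechet distance between N(mu1, s1^2) and N(mu2, s2^2):
    # (mu1 - mu2)^2 + s1^2 + s2^2 - 2*s1*s2 = (mu1 - mu2)^2 + (s1 - s2)^2
    mu1, mu2 = statistics.fmean(xs), statistics.fmean(ys)
    s1, s2 = statistics.pstdev(xs), statistics.pstdev(ys)
    return (mu1 - mu2) ** 2 + (s1 - s2) ** 2


real = [0.0, 2.0]
same = frechet_1d(real, real)           # identical statistics
shifted = frechet_1d(real, [1.0, 3.0])  # same spread, mean moved by 1
```

The distance is zero when both sets have the same mean and spread, and grows as either statistic drifts apart.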
Thank you for your work, but I have a problem.
When I trained with cifar10.yml, the loss went down at first and the samples looked normal. But after a few more steps the loss became very large and the samples were full of noise.
I also tried training on my custom data and did not run into this problem.