rosinality / glow-pytorch
PyTorch implementation of Glow
License: MIT License
The paper indicates a learning rate of 1e-3 in Appendix C. I wonder why the default in the code is set to 1e-4.
Are there any empirical reasons for doing that?
Update: I found the description in the README.
Are there any pretrained models?
Hi, thank you for your amazing work. As you mentioned, using the sigmoid function in the affine coupling layer is beneficial to the training. I was wondering why you shifted the sigmoid: s = F.sigmoid(log_s + 2)
. Thanks a lot.
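One plausible reading, sketched numerically (this is an interpretation of the code, not the author's stated rationale): the coupling network's last convolution is zero-initialized, so log_s is near 0 at the start of training. A plain sigmoid would then scale activations by 0.5 at every coupling layer, while the +2 shift gives a scale of roughly 0.88, keeping each layer close to an identity map early on:

```python
import math

def coupling_scale(log_s, shift=0.0):
    # the affine-coupling scale: a sigmoid of the raw conv output,
    # optionally shifted before the nonlinearity
    return 1.0 / (1.0 + math.exp(-(log_s + shift)))

# the coupling net's last conv is zero-initialized, so log_s ~ 0 at init
unshifted = coupling_scale(0.0)     # 0.5: halves activations at every layer
shifted = coupling_scale(0.0, 2.0)  # ~0.88: close to an identity map
print(unshifted, shifted)
```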
Hello, and thank you for the great code.
A question came up while using it, so I'm leaving it as an issue.
Why do you add torch_rand to the input?
I couldn't find an explanation for this in the reference or in other implementations, which is why I'm asking.
Thank you.
(Either a Korean or an English answer is fine. My English is limited, so I only put the key sentence in the title.)
I think there might be a tiny mistake in the dequantization process at the moment.
I think that https://github.com/rosinality/glow-pytorch/blob/master/train.py#L99 should be
n_bins = 2. ** args.n_bits - 1.
rather than
n_bins = 2. ** args.n_bits
since, as far as I understand, the minimum difference between the input levels/bin values, (a[1:]-a[:-1]).min(),
should be the same as 1/n_bins
(run after image, _ = next(dataset) on line 109 in train.py: https://github.com/rosinality/glow-pytorch/blob/master/train.py#L109)
In[1]: a = torch.unique(image.reshape(-1))
In[2]: (a[1:]-a[:-1]).min().item()
Out[2]: 0.003921568393707275
In[3]: 1/255.
Out[3]: 0.00392156862745098
In[4]: 1/256.
Out[4]: 0.00390625
Also, it's a bit confusing that n_bits defaults to 5, whereas the default n_bits for CelebA is 8; I'd change it to 8.
I have a question about the loss value.
The loss value is the negative log-likelihood (bits per dim) from the paper, isn't it? When I trained the model, the loss value was much smaller than the best value in the paper.
Thank you
I use CelebA 64x64 5-bit for training (4 GPUs). About 2 hours later, the loss is as low as 1.1, yet the sampled images have low visual quality. If I'm not mistaken, the final bpd of Glow on 5-bit CelebA in the paper is 1.02, so how can the loss be so small while the model is not well trained?
Hi, given an input image tensor x and the Glow model, I tried the following:
latent = glow(x)[2]
x_reconstructed = glow.reverse(latent)
Since it is a normalizing flow, one would expect x_reconstructed to be very similar to x, since the only source of error should be rounding. However, I observe very big differences. Does anybody have an explanation for that?
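A note for anyone hitting the same thing: a normalizing flow really is invertible down to float rounding, so large gaps usually mean the latents fed to reverse are not exactly the ones the forward pass produced. In this repo, glow(x) returns the per-block z list, and reverse re-samples the split variables unless it is called with reconstruct=True, which is worth checking first. A self-contained toy coupling (not the repo's code) shows the error scale one should expect from a true round trip:

```python
import torch

class AdditiveCoupling(torch.nn.Module):
    # minimal additive coupling layer: exactly invertible up to float rounding
    def __init__(self, dim):
        super().__init__()
        self.net = torch.nn.Linear(dim // 2, dim // 2)

    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return torch.cat([a, b + self.net(a)], dim=1)

    def reverse(self, y):
        a, b = y.chunk(2, dim=1)
        return torch.cat([a, b - self.net(a)], dim=1)

flow = AdditiveCoupling(8)
x = torch.randn(4, 8)
err = (flow.reverse(flow(x)) - x).abs().max().item()
print(err)  # float32 rounding only, far below visually noticeable error
```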
Wow, those generated samples look very good! Do you have any plans to release the model checkpoints (on Google Drive / Dropbox)?
Hi!
Thanks a lot for this repo, it's really great! I am wondering how to reconstruct images, since the current reverse method seems to take a list of images with different sizes.
I tried writing a method:
def reverse_data(self, z):
    for i, block in enumerate(self.blocks[::-1]):
        if i == 0:
            eps = torch.randn_like(z)
            input = block.reverse(z, eps)
        else:
            eps = torch.zeros_like(input)
            input = block.reverse(input, eps)
    return input
But I am unsure whether this is correct. I would appreciate any help!
Thank you!
Hi, thank you for this simple, beautiful code!
I wanted to know if you've tried this with larger resolutions, as seen in the paper? Does it still work well?
Thank you,
Ryan
python train.py ../train/ --img_size 128 --batch 8
2 x RTX2080 ti
The error occurs whether I use one card or two.
Latest.
# packages in environment at /opt/conda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
async-generator 1.10 pypi_0 pypi
attrs 20.3.0 pypi_0 pypi
backcall 0.2.0 py_0
bash-kernel 0.7.2 pypi_0 pypi
beautifulsoup4 4.9.3 pyhb0f4dca_0
blas 1.0 mkl
bleach 3.2.1 pypi_0 pypi
bzip2 1.0.8 h7b6447c_0
ca-certificates 2020.10.14 0
certifi 2020.6.20 pyhd3eb1b0_3
cffi 1.14.0 py38he30daa8_1
chardet 3.0.4 py38_1003
conda 4.9.2 py38h06a4308_0
conda-build 3.20.5 py38_1
conda-package-handling 1.6.1 py38h7b6447c_0
cryptography 2.9.2 py38h1ba5d50_0
cudatoolkit 11.0.221 h6bb024c_0
dataclasses 0.6 pypi_0 pypi
decorator 4.4.2 py_0
defusedxml 0.6.0 pypi_0 pypi
dnspython 2.0.0 pypi_0 pypi
entrypoints 0.3 pypi_0 pypi
filelock 3.0.12 py_0
freetype 2.10.4 h5ab3b9f_0
future 0.18.2 pypi_0 pypi
gdown 3.12.2 pypi_0 pypi
glob2 0.7 py_0
icu 58.2 he6710b0_3
idna 2.9 py_1
intel-openmp 2020.2 254
ipykernel 5.3.4 pypi_0 pypi
ipython 7.19.0 pypi_0 pypi
ipython_genutils 0.2.0 py38_0
ipywidgets 7.5.1 pypi_0 pypi
jedi 0.17.2 py38_0
jinja2 2.11.2 py_0
jpeg 9b h024ee3a_2
json5 0.9.5 pypi_0 pypi
jsonschema 3.2.0 pypi_0 pypi
jupyter 1.0.0 pypi_0 pypi
jupyter-client 6.1.7 pypi_0 pypi
jupyter-console 6.2.0 pypi_0 pypi
jupyter-core 4.7.0 pypi_0 pypi
jupyterlab 2.2.9 pypi_0 pypi
jupyterlab-pygments 0.1.2 pypi_0 pypi
jupyterlab-server 1.2.0 pypi_0 pypi
lcms2 2.11 h396b838_0
ld_impl_linux-64 2.33.1 h53a641e_7
libarchive 3.4.2 h62408e4_0
libedit 3.1.20181209 hc058e9b_0
libffi 3.3 he6710b0_1
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
liblief 0.10.1 he6710b0_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 h2733197_1
libuv 1.40.0 h7b6447c_0
libxml2 2.9.10 hb55368b_3
lz4-c 1.9.2 heb0550a_3
markupsafe 1.1.1 py38h7b6447c_0
mistune 0.8.4 pypi_0 pypi
mkl 2020.2 256
mkl-service 2.3.0 py38he904b0f_0
mkl_fft 1.2.0 py38h23d657b_0
mkl_random 1.1.1 py38h0573a6f_0
nbclient 0.5.1 pypi_0 pypi
nbconvert 6.0.7 pypi_0 pypi
nbformat 5.0.8 pypi_0 pypi
ncurses 6.2 he6710b0_1
nest-asyncio 1.4.3 pypi_0 pypi
ninja 1.10.1 py38hfd86e86_0
notebook 5.7.5 pypi_0 pypi
numpy 1.19.2 py38h54aff64_0
numpy-base 1.19.2 py38hfa32c7d_0
olefile 0.46 py_0
openssl 1.1.1h h7b6447c_0
packaging 20.4 pypi_0 pypi
pandocfilters 1.4.3 pypi_0 pypi
parso 0.7.0 py_0
patchelf 0.12 he6710b0_0
pexpect 4.8.0 py38_0
pickleshare 0.7.5 py38_1000
pillow 8.0.0 py38h9a89aac_0
pip 20.0.2 py38_3
pkginfo 1.6.0 py38_0
prometheus-client 0.9.0 pypi_0 pypi
prompt-toolkit 3.0.8 py_0
psutil 5.7.2 py38h7b6447c_0
ptyprocess 0.6.0 py38_0
py-lief 0.10.1 py38h403a769_0
pycosat 0.6.3 py38h7b6447c_1
pycparser 2.20 py_0
pygments 2.7.1 py_0
pyopenssl 19.1.0 py38_0
pyparsing 2.4.7 pypi_0 pypi
pyrsistent 0.17.3 pypi_0 pypi
pysocks 1.7.1 py38_0
python 3.8.3 hcff3b4d_0
python-dateutil 2.8.1 pypi_0 pypi
python-etcd 0.4.5 pypi_0 pypi
python-libarchive-c 2.9 py_0
pytorch 1.7.0 py3.8_cuda11.0.221_cudnn8.0.3_0 pytorch
pytz 2020.1 py_0
pyyaml 5.3.1 py38h7b6447c_0
pyzmq 20.0.0 pypi_0 pypi
qtconsole 4.7.7 pypi_0 pypi
qtpy 1.9.0 pypi_0 pypi
readline 8.0 h7b6447c_0
requests 2.23.0 py38_0
ripgrep 12.1.1 0
ruamel_yaml 0.15.87 py38h7b6447c_0
scipy 1.5.2 py38h0b6359f_0
send2trash 1.5.0 pypi_0 pypi
setuptools 46.4.0 py38_0
six 1.14.0 py38_0
soupsieve 2.0.1 py_0
sqlite 3.31.1 h62c20be_1
terminado 0.9.1 pypi_0 pypi
testpath 0.4.4 pypi_0 pypi
tk 8.6.8 hbc83047_0
torchelastic 0.2.1 pypi_0 pypi
torchvision 0.8.0 py38_cu110 pytorch
tornado 5.1.1 pypi_0 pypi
tqdm 4.46.0 py_0
traitlets 5.0.5 py_0
typing_extensions 3.7.4.3 py_0
urllib3 1.25.8 py38_0
wcwidth 0.2.5 py_0
webencodings 0.5.1 pypi_0 pypi
wheel 0.34.2 py38_0
widgetsnbextension 3.5.1 pypi_0 pypi
xz 5.2.5 h7b6447c_0
yaml 0.1.7 had09818_2
zlib 1.2.11 h7b6447c_3
zstd 1.4.5 h9ceee32_0
Namespace(affine=False, batch=8, img_size=128, iter=200000, lr=0.0001, n_bits=5, n_block=4, n_flow=32, n_sample=20, no_lu=False, path='../train/', temp=0.7)
/workspace/glow-pytorch/model.py:102: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/utils/tensor_numpy.cpp:141.)
w_s = torch.from_numpy(w_s)
Loss: 2.15042; logP: -2.13823; logdet: 4.98781; lr: 0.0001000: 0%| | 1/200000
Traceback (most recent call last):
File "train.py", line 177, in <module>
train(args, model, optimizer)
File "train.py", line 148, in train
model_single.reverse(z_sample).cpu().data,
File "/workspace/glow-pytorch/model.py", line 367, in reverse
input = block.reverse(z_list[-1], z_list[-1], reconstruct=reconstruct)
File "/workspace/glow-pytorch/model.py", line 322, in reverse
input = flow.reverse(input)
File "/workspace/glow-pytorch/model.py", line 239, in reverse
input = self.invconv.reverse(input)
File "/workspace/glow-pytorch/model.py", line 136, in reverse
return F.conv2d(output, weight.squeeze().inverse().unsqueeze(2).unsqueeze(3))
RuntimeError: cusolver error: 7, when calling `cusolverDnCreate(handle)`
This bug happens during the reverse calculation when i % 100 == 0. I changed it to i == 1 to reproduce the bug faster.
Also, changing w_s = torch.from_numpy(w_s) to w_s = torch.from_numpy(w_s.copy()) turns off all the warnings above, but the error still occurs.
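A possible workaround until the cusolver issue is sorted out (a sketch, not a tested patch against this repo): compute the small channel-by-channel inverse on the CPU inside InvConv2d.reverse and move the result back to the GPU:

```python
import torch

def cpu_inverse(weight):
    # weight: 1x1 invertible-conv kernel of shape [C, C, 1, 1].
    # Invert the C x C matrix on the CPU to avoid cusolverDnCreate,
    # then restore the original device and conv-kernel shape.
    inv = weight.squeeze().cpu().inverse().to(weight.device)
    return inv.unsqueeze(2).unsqueeze(3)

# well-conditioned test kernel: random noise plus a strong diagonal
w = torch.randn(8, 8, 1, 1) + 8 * torch.eye(8).view(8, 8, 1, 1)
inv = cpu_inverse(w)
err = (inv.squeeze() @ w.squeeze() - torch.eye(8)).abs().max().item()
print(err)  # near zero: the inverse is numerically correct
```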
Hi, I tried running your code on MNIST, but I get a negative loss value. How can I compute the NLL value per dim? Sorry, maybe this is a stupid question. Thanks!
Thanks for your great implementation.
The learned prior is added in every block in your implementation, which produces more promising results; however, the original implementation adds the prior only at the end of the glow.
your prior
original prior and the location of prior usage
Could you please provide an example or tell me how to learn the y condition (image label)?
Thanks again!
Hi,
I want to change image_size to 256. Do I need to change the value of n_bits and n_bins?
Another question I want to ask is the meaning of the code from lines 111 to 116 in https://github.com/rosinality/glow-pytorch/blob/master/train.py#L111
Thank you
Viet Nguyen
Traceback (most recent call last):
File "D:/Productivity/AchievedAlgorithm/Glow-pytorch-master/train.py", line 192, in <module>
train(args, model, optimizer)
File "D:/Productivity/AchievedAlgorithm/Glow-pytorch-master/train.py", line 139, in train
image + torch.rand_like(image) / n_bins
File "D:\Anaconda\envs\torch_cpu\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Productivity\AchievedAlgorithm\Glow-pytorch-master\model.py", line 355, in forward
out, det, log_p, z_new = block(out)
File "D:\Anaconda\envs\torch_cpu\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Productivity\AchievedAlgorithm\Glow-pytorch-master\model.py", line 280, in forward
out, det = flow(out)
File "D:\Anaconda\envs\torch_cpu\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Productivity\AchievedAlgorithm\Glow-pytorch-master\model.py", line 227, in forward
out, logdet = self.actnorm(input)
File "D:\Anaconda\envs\torch_cpu\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Productivity\AchievedAlgorithm\Glow-pytorch-master\model.py", line 46, in forward
self.initialize(input)
File "D:\Productivity\AchievedAlgorithm\Glow-pytorch-master\model.py", line 39, in initialize
self.loc.data.copy_(-mean)
RuntimeError: The size of tensor a (12) must match the size of tensor b (4) at non-singleton dimension 1
Thank you for your code. It looks like you tried to use nn.DataParallel but didn't quite include it. Can you tell me about your experience with it?
For some reason, the loss kept increasing when I used nn.DataParallel with 2 GPUs, regardless of batch size. To make it run with your code, I changed your calc_loss a little by expanding logdet to have the same size as log_p. I also tried logdet.mean(), but it didn't work either. I'm not really sure why the logdet values are different for the 2 GPUs, as logdet seems to depend on shared weights only.
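For reference, the expansion described above might look like this (a sketch with names of my own choosing): each DataParallel replica returns a batch-shaped logdet, so the gather along dim 0 lines up one-to-one with log_p:

```python
import torch

def gatherable_logdet(logdet, batch_size):
    # nn.DataParallel concatenates replica outputs along dim 0, so a
    # 0-dim logdet (as produced with additive coupling) must be expanded
    # to one value per sample before being returned from forward()
    if logdet.dim() == 0:
        logdet = logdet.expand(batch_size)
    return logdet

log_p = torch.randn(8)
logdet = torch.tensor(3.5)  # scalar log-determinant
logdet = gatherable_logdet(logdet, log_p.size(0))
loss = -(log_p + logdet).mean()  # now broadcasts one-to-one
print(logdet.shape)
```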
Hi,
Thank you for your great implementation.
Regarding the "learned" prior, I wanted to ask:
1- Why are you considering the prior to be a Gaussian with trainable parameters rather than, for instance, a unit Gaussian?
2- For the Gaussian prior, what is the motivation behind obtaining the mean and std of the Gaussian by passing out
through the CNN? Is it just because you found it to be more useful? (https://github.com/rosinality/glow-pytorch/blob/master/model.py#L285)
Thanks in advance.
Issue
I've printed the log_p and logdet returned by the model. It seems that log_p_sum has a size equal to the batch size (which is fine), but logdet contains a single value. Should logdet have the same dimensions as log_p?
Code for reproducing
Add the following code at line 120 of train.py:
print(log_p.size(), logdet.size())
Thank you for the repo! Upon reloading the Glow model from a checkpoint after training, cross-entropy performance diminished. This was likely because the ActNorm module was being reinitialized on the first input batch to a value the model didn't expect. I added a constructor flag to specify whether to reinitialize, and this seemed to fix the problem. I would imagine this is a larger issue for datasets with larger variance and smaller batch sizes.
Fix:
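The patch itself isn't quoted above; a stripped-down sketch of such a flag might look like the following (the real ActNorm also computes a log-determinant, omitted here for brevity):

```python
import torch
from torch import nn

class ActNorm(nn.Module):
    # sketch of the constructor flag: pass pretrained=True when a trained
    # checkpoint will be loaded, so the data-dependent init is skipped
    def __init__(self, in_channel, pretrained=False):
        super().__init__()
        self.loc = nn.Parameter(torch.zeros(1, in_channel, 1, 1))
        self.scale = nn.Parameter(torch.ones(1, in_channel, 1, 1))
        self.register_buffer(
            "initialized", torch.tensor(pretrained, dtype=torch.bool)
        )

    def initialize(self, x):
        # data-dependent init: zero mean, unit variance per channel
        with torch.no_grad():
            mean = x.mean(dim=(0, 2, 3), keepdim=True)
            std = x.std(dim=(0, 2, 3), keepdim=True)
            self.loc.data.copy_(-mean)
            self.scale.data.copy_(1 / (std + 1e-6))

    def forward(self, x):
        if not self.initialized:
            self.initialize(x)
            self.initialized.fill_(True)
        return self.scale * (x + self.loc)

x = torch.randn(16, 3, 8, 8) * 5 + 2
norm = ActNorm(3)
y = norm(x)  # data-dependent init runs on this first batch
print(round(y.mean().item(), 3), round(y.std().item(), 3))

# when restoring from a checkpoint: construct with pretrained=True,
# then call load_state_dict as usual
restored = ActNorm(3, pretrained=True)
```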
Do you know how to map your loss values to bits-per-dimension results (see Table 2 in the paper)? I'm having a hard time coming up with a formula for the correspondence. Some Reddit post mentions subtracting math.log(128) to take scaling into account, but it still doesn't seem right.
I looked at the original implementation in TensorFlow but couldn't figure it out. Would you mind letting me know what you think about it? Also, do you know how close your implementation is to the original code in terms of bits per dimension? Thank you.
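As far as I can tell from calc_loss in train.py, the printed loss is already bits per dimension: the dequantization constant log(n_bins) is subtracted per dimension and the total negative log-likelihood is divided by log(2) times the number of dimensions, so no extra math.log(128) correction should be needed. A sketch of the conversion:

```python
from math import log

def bits_per_dim(log_p, logdet, image_size, n_bins, n_channels=3):
    # mirrors this repo's calc_loss: likelihood in nats, plus the
    # log(n_bins) dequantization term per dimension, converted to bits
    n_pixel = image_size * image_size * n_channels
    nll_nats = -(log_p + logdet - log(n_bins) * n_pixel)
    return nll_nats / (log(2) * n_pixel)

# sanity check: with log_p = logdet = 0, the result is exactly the
# dequantization floor of log2(n_bins) = 5 bits/dim at 5-bit precision
print(bits_per_dim(log_p=0.0, logdet=0.0, image_size=64, n_bins=32))
```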
In my experiments, when the dataset is small, training runs normally. When the dataset is large, the loss always becomes NaN after a few thousand epochs.
Hi,
first off, thank you for your implementation!
I found that you've deviated from the original OpenAI implementation and produce the prior parameters (mean, logsd) from the intermediate flow splits via an additional convolution:
Line 285 in 97081ff
Unfortunately, this little change leads to the Gaussian distribution being unnormalized. Think about it: say the convolution that produces the prior parameters (mean, logsd) learns to output mean = z_new, logsd = 0. Then the Gaussian prior likelihood for z_new, N(z_new; mean=z_new, sd=1), is always at its maximum, since the query value z_new equals the distribution's mode. So, if you integrate this term over all z_new in R (keeping it one-dimensional for simplicity), you end up with a value > 1 (in fact infinity), showing that this way of defining the prior leads to an unnormalized distribution.
To fix this, you need to (as in the original OpenAI implementation) remove the conditioning of mean and logsd on the split variables and just learn them from a fixed input. Relevant pieces of the original implementation:
Prior is created unconditionally
https://github.com/openai/glow/blob/master/model.py#L180
Fixed zero input
https://github.com/openai/glow/blob/master/model.py#L109
If parameters are learned, apply convolution on fixed input
https://github.com/openai/glow/blob/master/model.py#L111
Create prior from mean and logsd
https://github.com/openai/glow/blob/master/model.py#L116
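To make the contrast concrete, here is a sketch of the unconditional variant (my own simplification, not the repo's exact ZeroConv2d): the zero-initialized convolution is applied to a fixed all-zeros tensor, so the learned mean and log-sd never depend on z_new and the prior remains a properly normalized density:

```python
import torch
from torch import nn

class ZeroConv2d(nn.Conv2d):
    # zero-initialized conv, as used for priors in both implementations
    def __init__(self, in_channel, out_channel):
        super().__init__(in_channel, out_channel, 3, padding=1)
        nn.init.zeros_(self.weight)
        nn.init.zeros_(self.bias)

def unconditional_prior(conv, batch, channels, height, width):
    # OpenAI-style prior: the conv sees a fixed all-zero input, so mean
    # and log-sd are learned per location but do not depend on z_new
    h = torch.zeros(batch, channels, height, width)
    mean, log_sd = conv(h).chunk(2, dim=1)
    return mean, log_sd

conv = ZeroConv2d(6, 12)
mean, log_sd = unconditional_prior(conv, 2, 6, 4, 4)
print(mean.shape, log_sd.shape)  # both [2, 6, 4, 4]
```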
Hi! I tried to run your code. The network starts out training well and the loss decreases, but after a few iterations the loss just starts to increase. I tried a smaller initial learning rate, but it still isn't working. Could you please help me fix this problem? Thank you!
Hi!
Great code, thanks for sharing.
How can I use checkpoint files in order to continue training sessions after pausing?
Eyal
Hi, thanks for your nice work! I am new to the Glow model, so I have some basic questions that I couldn't resolve even after googling.
The flow model can translate the input
I appreciate your answer and hope you have a good day!
In the paper, when x is continuous, the loss is calculated by feeding
if i == 0:
    # the very first batch only runs the data-dependent initialization
    # (ActNorm), so gradients are not needed and the step is skipped
    with torch.no_grad():
        log_p, logdet, _ = model.module(
            image + torch.rand_like(image) / n_bins
        )

        continue

else:
    log_p, logdet, _ = model(image + torch.rand_like(image) / n_bins)
First, thanks for this work. I generated some samples using this project with the CelebA dataset. However, I got confused when I tried to debug it. As you can see in the first pic, the default parameter for affine coupling is not correct: the affine parameter stays False even though the default value is True (in the Variables and Watch windows and at the cursor).
I think something may be wrong with argparse, which leads to the default False value for the coupling layer.
Hi,
Firstly, thank you for your easily understandable code.
I want to use the z space for manipulation in my studies. I tried to train your code, but it takes me about 8 hours per 1000 iterations.
Could you publish your checkpoint files (model and optimizer) from the last iterations?
Thank you very much!
Hello! Great work following the paper's implementation in the code. It makes the paper very lucid.
I have a couple of questions on the AffineCoupling class under the model.py file.
When I use affine coupling rather than additive coupling (training on CIFAR-10, n_flow = 32, n_block = 3), the weights rapidly become NaN and the samples become all black. The same phenomenon does not appear when I use additive coupling.
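Not a root-cause fix, but a common stabilizer worth trying when affine coupling diverges is clipping the global gradient norm before each optimizer step (this repo does not do this by default; the tiny model below is a stand-in for Glow):

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the Glow model
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(16, 8)
loss = model(x).pow(2).mean()
loss.backward()
# cap the global gradient norm so one bad batch cannot blow up the weights
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=50.0)
opt.step()
print(bool(torch.isfinite(total_norm)))
```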
hi:
I don't know how to prepare the dataset. For example, if I want to train a model on CelebA-HQ to edit the eyes / eyebrows / beard / nose / mouth,
what steps should I take to prepare the dataset?
thanks a lot
When I try to train on one-channel images, there are dimension mismatches in the initialization of the forward function when using a 32x32 image custom dataset. I get the following error in line 39 of the model:
The expanded size of the tensor (4) must match the existing size (12) at non-singleton dimension 1. Target sizes: [1, 4, 1, 1]. Tensor sizes: [1, 12, 1, 1]
When trying to train on MNIST, there is an error when squeezing the input in line 273 of the model. I get the following error message:
RuntimeError: shape '[32, 4, 3, 2, 3, 2]' is invalid for input of size 6272
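Both errors are consistent with the model's assumptions: each block squeezes the spatial dimensions by 2 (so the side length must be divisible by 2**n_block), and the channel count is assumed to be 3 in places. A sketch that makes 28x28 single-channel MNIST fit the default settings (the helper name is mine):

```python
import torch
import torch.nn.functional as F

def prepare_mnist(batch, n_block=4):
    # pad 28x28 digits to 32x32 (divisible by 2**n_block, so every
    # squeeze step halves cleanly) and repeat the channel to an RGB shape
    batch = F.pad(batch, (2, 2, 2, 2))
    batch = batch.repeat(1, 3, 1, 1)
    assert batch.size(2) % (2 ** n_block) == 0
    return batch

x = torch.rand(8, 1, 28, 28)  # a stand-in MNIST batch
print(prepare_mnist(x).shape)
```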
The n_bits setting doesn't just add some noise; it also influences training a lot. What does it actually mean?
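What the flag does, as far as I can tell from train.py's preprocessing: each pixel is re-quantized to n_bits bits, i.e. n_bins = 2**n_bits discrete levels, before uniform noise of width 1/n_bins is added. Fewer bits means less entropy left to model, which is also why 5-bit losses look so much lower than 8-bit ones. A sketch mirroring that preprocessing:

```python
import torch

def quantize(image, n_bits):
    # image in [0, 1]; keep only the top n_bits of each 8-bit pixel,
    # leaving n_bins = 2**n_bits levels, then center around zero
    image = image * 255
    if n_bits < 8:
        image = torch.floor(image / 2 ** (8 - n_bits))
    return image / 2 ** n_bits - 0.5

x = torch.tensor([0.0, 0.5, 1.0])
print(quantize(x, 5))  # coarse 5-bit levels
print(quantize(x, 8))  # full 8-bit precision
```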
Hi rosinality,
Your code is clean and easy to read. Thank you for your effort.
I have one question: during the sampling process, why do we sample z from the standard normal distribution (with temperature)? Shouldn't we sample from the learned p(z)? Is it because p(z) is dependent on the data, so we cannot sample from it? (In the implementation, if I'm understanding it correctly, p(z) has four components: three of them depend on both the data and the model, while the last one depends only on the model.)
Thanks.