chrischute / glow

Implementation of Glow in PyTorch

License: MIT License

Python 100.00%

glow's Issues

about conda and training time

Hello,
I can run the code without conda, but training one epoch takes about 6 hours on two 1080 Ti GPUs. I can't figure out why. Could running outside the conda environment be the cause?

BPD abnormally high

Hi,

I am taking the code from the repo as-is and running it. The only difference is that I am running it with a different PyTorch version (1.6.0).

For some test runs (e.g. 10 or 20 epochs), the BPD is abnormally high:

50000/50000 [16:59<00:00, 49.03it/s, bpd=9.49e+11, lr=0.0003, nll=2.02e+15]

For 2/3 runs I still get such scores even after 50 epochs.
Am I doing something wrong during training, or is the PyTorch version affecting the results this much?

Cublas error

The PyTorch version pinned in environment.yml gives a cuBLAS error:

RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCBlas.cu:511

Do you have any idea as to why it occurs?

Generating noise with low BPD and high NLL

In my implementation the BPD and NLL both go down, but the generated images are indistinguishable from plain white noise.

Epoch 0: bpd=4.84, nll=3.44e+3
Epoch 1: bpd=3.58, nll=2.54e+3
Epoch 20: bpd=2.88, nll=2.04e+3
Epoch 29: bpd=2.83, nll=2.01e+3
These are training-set values; validation stays about the same (bpd ±0.02, nll ±0.02).
So I don't think it is overfitting (N=512, L=3, K=32).
The dataset is FashionMNIST.

Should I let it train for longer, or could it be something else?
I would appreciate your input.
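
A note on the title: bits per dimension is the NLL in nats divided by ln 2 and by the number of input dimensions, so a large NLL next to a small BPD is expected and the two are not in conflict. The sketch below assumes the logged NLL is a per-image total in nats and that the images are padded to 1×32×32. The padding is an assumption, chosen because it is the size at which the logged pairs line up. `bits_per_dim` is an illustrative helper, not necessarily the repo's function.

```python
import math


def bits_per_dim(nll_nats, num_dims):
    """Convert a per-example NLL in nats to bits per dimension."""
    return nll_nats / (math.log(2) * num_dims)


# Assumed input size: 1 x 32 x 32 = 1024 dimensions (hypothetical, see the note above).
num_dims = 1 * 32 * 32

# (epoch, nll) pairs copied from the log in this issue.
for epoch, nll in [(0, 3.44e3), (1, 2.54e3), (20, 2.04e3), (29, 2.01e3)]:
    bpd = bits_per_dim(nll, num_dims)
    print(f"epoch {epoch}: nll={nll:.0f} nats -> bpd={bpd:.2f}")
```

If the printed values agree with the logged BPD to within rounding, the two columns are the same quantity in different units, and the sample quality problem has to be looked for elsewhere (for example in the reverse pass or the sampling code).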

Loss function

I could not understand how you compute the loss function, especially the prior_ll part.
It would be great if you could explain what you are doing there.
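
For context, a normalizing-flow loss usually follows the change-of-variables formula: log p(x) = log p(z) + (sum of log-determinants of the Jacobians). The sketch below assumes a standard normal prior on z and a log(k) per-dimension term for k = 256 discrete intensity levels. The names `flow_nll`, `z`, `sldj` and `k` are illustrative, and this is not confirmed to match the repo's loss line for line.

```python
import math


def flow_nll(z, sldj, k=256):
    """Negative log-likelihood in nats for a single example.

    z    : flat list of latent values produced by the flow
    sldj : sum of log-det-Jacobians accumulated over all layers
    k    : number of discrete intensity levels (assumed 256 for 8-bit images)
    """
    # Log-density of z under a standard normal, summed over dimensions.
    prior_ll = sum(-0.5 * (zi ** 2 + math.log(2 * math.pi)) for zi in z)
    # Assumed discretization term: inputs were rescaled from k levels to [0, 1).
    prior_ll -= math.log(k) * len(z)
    # Change of variables: log p(x) = log p(z) + sldj.
    return -(prior_ll + sldj)


print(flow_nll([0.0, 0.0], sldj=0.0))
```

In this reading, the prior term is the Gaussian log-density of the final latent, and the training loss is the batch mean of the returned value.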

One-line change reduces memory from 11 GB to 2 GB.

The original Glow code uses gradient checkpointing, a very efficient way of reducing peak memory consumption. The following single-line change adds gradient checkpointing in a way that reduces memory consumption from 11 GB to 2 GB. It allowed me to increase the batch size from 64 to 256 with no issue. I think 512 is possible, maybe even 1024 if we use float16 for some of the layers.

glow/models/glow/coupling.py

Line 28 in 59ed99f

st = self.nn(x_id)

def forward(self, x, ldj, reverse=False):
    x_change, x_id = x.chunk(2, dim=1)

    # Original line: st = self.nn(x_id)
    # Replace it with the checkpointed call below
    # (add `import torch.utils.checkpoint` at the top of the file).
    st = torch.utils.checkpoint.checkpoint(self.nn, x_id)
    s, t = st[:, 0::2, ...], st[:, 1::2, ...]
    s = self.scale * torch.tanh(s)
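
To check that checkpointing changes memory use but not the result, one option is to compare parameter gradients with and without it on a toy layer. The sketch below uses a hypothetical `TinyCoupling` module, not the repo's class, and passes `use_reentrant=False`, which assumes a reasonably recent PyTorch release. It checks gradient equality only and does not measure memory.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class TinyCoupling(nn.Module):
    """Hypothetical stand-in for an affine coupling layer."""

    def __init__(self, channels=4, hidden=16, use_checkpoint=False):
        super().__init__()
        half = channels // 2
        self.net = nn.Sequential(
            nn.Conv2d(half, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, 2 * half, kernel_size=3, padding=1),
        )
        self.use_checkpoint = use_checkpoint

    def forward(self, x):
        x_change, x_id = x.chunk(2, dim=1)
        if self.use_checkpoint:
            # Activations inside self.net are dropped and recomputed in backward.
            st = checkpoint(self.net, x_id, use_reentrant=False)
        else:
            st = self.net(x_id)
        s, t = st[:, 0::2, ...], st[:, 1::2, ...]
        s = torch.tanh(s)
        y = torch.cat(((x_change + t) * s.exp(), x_id), dim=1)
        return y, s.flatten(1).sum(-1)


def parameter_grads(use_checkpoint):
    # Same seed gives identical weights and inputs for both runs.
    torch.manual_seed(0)
    layer = TinyCoupling(use_checkpoint=use_checkpoint)
    x = torch.randn(2, 4, 8, 8)
    y, ldj = layer(x)
    (y.sum() + ldj.sum()).backward()
    return [p.grad.clone() for p in layer.parameters()]


plain = parameter_grads(use_checkpoint=False)
ckpt = parameter_grads(use_checkpoint=True)
grads_match = all(torch.allclose(a, b, atol=1e-6) for a, b in zip(plain, ckpt))
print("gradients match:", grads_match)
```

The trade-off is extra compute: the wrapped network runs a second forward pass during backward. If the wrapped network contains layers with running statistics or dropout, the recomputation may need extra care.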

Adding to sldj the log densities of split variables

Hi!

I wonder why log(p(z_i)) (i=1..L-1) is not added to the sum of log-det Jacobians after the split operation in lines 107-111 of glow.py. I also have a question about line 110 of the same file: why are the already split tensors concatenated back? Shouldn't only one part of the split flow further? Or am I misunderstanding something?

Thanks!
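
One way to see why concatenating the split tensors back can be equivalent: if the prior is a fixed, fully factorized standard normal, then the log-density of the concatenated latent equals the sum of the log-densities of its parts. This is an assumption; other implementations may learn a conditional prior at each split, in which case it does not hold. Under the assumption, evaluating the prior once on the full z at the end already accounts for every log p(z_i). A small numerical check with illustrative names:

```python
import math


def std_normal_logpdf(values):
    """Sum of standard normal log-densities over a flat list of values."""
    return sum(-0.5 * (v ** 2 + math.log(2 * math.pi)) for v in values)


z_split = [0.3, -1.2]  # half factored out at a split (left untouched afterwards)
z_rest = [0.7, 0.1]    # half that went through the remaining flow steps

# Option A: add log p(z_i) at the split, then log p of the rest at the end.
separate = std_normal_logpdf(z_split) + std_normal_logpdf(z_rest)
# Option B: concatenate the halves and evaluate the prior once at the end.
together = std_normal_logpdf(z_split + z_rest)

print(math.isclose(separate, together))
```

In both options only `z_rest` is transformed by the deeper layers. The factored-out half is carried along unchanged, so it contributes nothing further to the log-det sum.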

Affine coupling forward

Shouldn't line 38 in coupling.py be

x_change = (x_change * s.exp()) + t

instead of

x_change = (x_change + t) * s.exp()
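
For comparison, both forms are invertible affine maps with the same log-determinant (the sum of s). They differ only in how t is parameterized: (x + t) * exp(s) equals x * exp(s) + t', with t' = t * exp(s). Either can work as long as the reverse pass inverts the same form. A scalar sketch with illustrative function names:

```python
import math


def scale_then_shift(x, s, t):
    return x * math.exp(s) + t


def scale_then_shift_inv(y, s, t):
    return (y - t) * math.exp(-s)


def shift_then_scale(x, s, t):
    return (x + t) * math.exp(s)


def shift_then_scale_inv(y, s, t):
    return y * math.exp(-s) - t


x, s, t = 0.5, 0.3, -1.1

# dy/dx = exp(s) in both cases, so the log-det-Jacobian is s for both.
roundtrip_a = scale_then_shift_inv(scale_then_shift(x, s, t), s, t)
roundtrip_b = shift_then_scale_inv(shift_then_scale(x, s, t), s, t)

# The two forms coincide once t is rescaled by exp(s).
same_map = math.isclose(
    shift_then_scale(x, s, t), scale_then_shift(x, s, t * math.exp(s))
)
print(roundtrip_a, roundtrip_b, same_map)
```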

How could the model be conditioned on image classes?

Hey, thank you very much for this nice repository! I was wondering what the most convenient way would be to condition the Glow model on image classes.

NN in coupling layer

In the paper, the NN in the coupling layer seems to be described as three convolutions with ReLU activations. Here, you also include an actnorm or batchnorm between the convolutions. Other PyTorch implementations and the TensorFlow implementations do not include this. Should the normalization be there?

Pretrained weights

Very nice implementation, and the code is well structured.

Would you be willing to share pretrained weights?
