Comments (12)
I found it in the official implementation. It was beneficial to training stability. I don't think the intuition behind it is spelled out in the papers, but having spatial, content-dependent priors should make the overall problem easier.
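As a sketch of what such a spatial, content-dependent prior can look like in PyTorch (the class and names here are illustrative, not the repo's exact code):

```python
import torch
from torch import nn


class LearnedPrior(nn.Module):
    """Predict per-pixel Gaussian parameters (mean, log_sd) for the
    latent z from the other half of the split, instead of using a
    fixed N(0, I) prior. Names are illustrative."""

    def __init__(self, in_channels):
        super().__init__()
        # Zero-initialized conv: the prior starts as N(0, I) and the
        # network learns to deviate from it during training.
        self.conv = nn.Conv2d(in_channels, in_channels * 2, 3, padding=1)
        nn.init.zeros_(self.conv.weight)
        nn.init.zeros_(self.conv.bias)

    def forward(self, h):
        mean, log_sd = self.conv(h).chunk(2, dim=1)
        return mean, log_sd


prior = LearnedPrior(4)
h = torch.randn(2, 4, 8, 8)
mean, log_sd = prior(h)
# At init the conv outputs zeros, so the prior is exactly N(0, I).
print(mean.abs().max().item(), log_sd.abs().max().item())  # 0.0 0.0
```

Because the conv is zero-initialized, the model behaves exactly like the fixed-prior version at the start of training, which may be part of why it helps stability.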
from glow-pytorch.
Hi,
I have some detailed questions regarding your code. I will be happy if you have the time to answer them.
-
In both the ActNorm and AffineCoupling modules, where scale and shift operations are applied, both your code and the original OpenAI code apply the shift first and then the scale. Although it should not matter much as far as optimization is concerned, did you find any reason to swap the operations? In the paper, the scale operation is done before the shift operation (Code lines: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L54 and https://github.com/rosinality/glow-pytorch/blob/master/model.py#L184).
-
Can you explain why there is an additional learnable (scale * 3) factor in the ZeroConv2d module (as in the OpenAI code)? And why is the scale multiplied by 3? (Code line: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L151)
-
Can you explain why the input is shifted by +2 before applying the Sigmoid in the AffineCoupling module (as in the OpenAI code)? (Code line: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L182)
-
Why are you padding with 1 (and not 0) in the ZeroConv2d module (Code line: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L149), while you are padding with 0 in the other Conv2d layers in the AffineCoupling? OpenAI pads with 0 if I am not mistaken (https://github.com/openai/glow/blob/master/tfops.py#L203).
-
If I am not mistaken, in the official implementation the authors apply an additional ActNorm right after both the Conv2D and ZeroConv2D layers (Code lines: https://github.com/openai/glow/blob/master/tfops.py#L256 and https://github.com/openai/glow/blob/master/tfops.py#L309). Is there any reason you have not included this in your code?
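For reference, the data-dependent initialization that such an ActNorm layer performs can be sketched like this (a simplified standalone version, not the repo's exact code):

```python
import torch
from torch import nn


class ActNorm(nn.Module):
    """Per-channel affine layer whose parameters are initialized from
    the first batch so that its output has zero mean and unit variance
    per channel; a simplified sketch."""

    def __init__(self, num_channels):
        super().__init__()
        self.loc = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.scale = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.initialized = False

    def forward(self, x):
        if not self.initialized:
            # Data-dependent init: measure the first batch's statistics.
            with torch.no_grad():
                mean = x.mean(dim=(0, 2, 3), keepdim=True)
                std = x.std(dim=(0, 2, 3), keepdim=True)
                self.loc.copy_(-mean)
                self.scale.copy_(1 / (std + 1e-6))
            self.initialized = True
        # Shift first, then scale, matching the ordering discussed above.
        return (x + self.loc) * self.scale


norm = ActNorm(4)
x = torch.randn(8, 4, 16, 16) * 3 + 1
y = norm(x)
print(round(y.mean().item(), 4), round(y.std().item(), 4))  # ≈ 0.0  ≈ 1.0
```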
I appreciate your time.
Thank you in advance.
It helps feature maps take different values (a kind of bias) at the edges. I think zero padding can also achieve that to some degree, though.
Thanks for the response!
I will write here in case I gain more knowledge about this technique.
Actually, I don't know the exact reasons for these details. I tried to replicate the paper, and after it didn't work well I took these details from the official implementation. In any case, they are related to training stability:
- It could make the optimization process different, since the location parameter is also scaled.
- I don't know why... Maybe it is to accelerate adaptation during the initial training phase.
- It keeps the scaling factor close to 1, which helps information propagation.
- I think they also used 1 padding. (https://github.com/openai/glow/blob/master/tfops.py#L217)
- Did they use ActNorm after ZeroConv2d? I can't find it in conv2d_zeros.
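A sketch that pulls the scale and sigmoid details together (illustrative, following the repo's general shape rather than its exact code):

```python
import torch
from torch import nn
from torch.nn import functional as F


class ZeroConv2d(nn.Module):
    """3x3 conv with zero-initialized weights whose output is rescaled
    by a learnable per-channel factor exp(scale * 3); a sketch of the
    module under discussion."""

    def __init__(self, in_channel, out_channel):
        super().__init__()
        self.conv = nn.Conv2d(in_channel, out_channel, 3, padding=0)
        nn.init.zeros_(self.conv.weight)
        nn.init.zeros_(self.conv.bias)
        self.scale = nn.Parameter(torch.zeros(1, out_channel, 1, 1))

    def forward(self, x):
        out = F.pad(x, [1, 1, 1, 1], value=1)  # pad with 1, not 0
        out = self.conv(out)
        # Multiplying the log-scale by 3 amplifies its gradient, which
        # may speed up adaptation early in training (a guess, per above).
        return out * torch.exp(self.scale * 3)


zero_conv = ZeroConv2d(4, 8)
out = zero_conv(torch.randn(2, 4, 8, 8))
print(out.abs().max().item())  # 0.0: the block starts as a zero map

# The +2 shift in the coupling layer keeps the scale near 1 at init:
print(torch.sigmoid(torch.tensor(0.0) + 2).item())  # ≈ 0.881, close to identity
```

Without the +2 shift, a zero-initialized coupling network would give sigmoid(0) = 0.5 and halve the signal at every layer; with it, each coupling layer starts close to the identity.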
Hi,
Thanks for your response, you are right.
I have a question about the amount of noise you add to each data point during training. As can be seen in your code here (and OpenAI's code), you add noise to the data based on 1/(2 ** n_bits). Since 8-bit image pixels, which naturally take values in [0, 255], are divided by 255 when being converted to the [0, 1] interval, why shouldn't we choose 1/(2 ** n_bits - 1), that is, 1/255 in the case of 8-bit images, as the right quantity?
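For reference, the two candidate noise widths can be compared directly (a standalone illustration, assuming n_bits = 8 and the /255 normalization described above):

```python
import random

n_bits = 8
n_bins = 2 ** n_bits           # 256 bins, as in the code
step = 1 / (n_bins - 1)        # 1/255: spacing of normalized pixel values

# Uniform dequantization noise as used in the repo: U[0, 1/256).
pixel = 137
x_noisy = pixel / 255 + random.random() / n_bins

# A noise width of 1/256 is strictly smaller than the 1/255 spacing
# between adjacent normalized pixels, so neighboring pixels' noisy
# values can never overlap; with 1/255 the bins would exactly touch.
assert 1 / n_bins < step
assert pixel / 255 <= x_noisy < pixel / 255 + 1 / n_bins
```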
I appreciate your thoughts.
Thanks very much in advance.
Hmm, wouldn't it be better to have noise slightly smaller than the actual pixel value changes in [0, 255]? Though I guess it won't make much difference.
Yes, it might be. I also think it does not make a huge difference overall.
Hi, after investigating the official Glow implementation more thoroughly, I wanted to make some clarifications regarding this topic:
-
First, I think they always pad with one, since they use the add_edge_padding function, which pads with 1, as you referred to in your response. However, in your code, you pad with 0 (the PyTorch default) except for ZeroConv2d, where you pad with 1. Am I right?
-
Secondly, about the ActNorm: the official implementation always applies actnorm after the Conv2d operations they define, except for ZeroInitConv2d, which does not, as far as I understand their code.
Thanks.
First, I think they always pad with one, since they use the add_edge_padding function, which pads with 1, as you referred to in your response. However, in your code, you pad with 0 (the PyTorch default) except for ZeroConv2d, where you pad with 1. Am I right?
Slightly different: they also use zero padding, but concatenate an additional input channel that is zero everywhere except at the borders.
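In code, the idea looks roughly like this (a sketch of the concept behind add_edge_padding, not the exact official function):

```python
import torch
import torch.nn.functional as F


def add_edge_padding(x, kernel_size=3):
    """Zero-pad the input, then concatenate a mask channel that is 1 on
    the padded border and 0 in the interior, so the conv can tell border
    pixels apart from genuinely dark pixels. A sketch of the idea."""
    pad = kernel_size // 2
    x = F.pad(x, [pad, pad, pad, pad], value=0)
    mask = torch.zeros(x.shape[0], 1, x.shape[2], x.shape[3])
    mask[:, :, :pad, :] = 1   # top rows
    mask[:, :, -pad:, :] = 1  # bottom rows
    mask[:, :, :, :pad] = 1   # left columns
    mask[:, :, :, -pad:] = 1  # right columns
    return torch.cat([x, mask], dim=1)


out = add_edge_padding(torch.randn(2, 4, 8, 8))
print(out.shape)  # torch.Size([2, 5, 10, 10])
```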
Absolutely right. Do you understand what the purpose of that additional input (the pad variable) is? In other words, why do they concatenate this additional input after padding with zeros?
Right. As you mentioned, zero padding already allows the border to take different values, since the kernel has a bias by default, so the output can differ there even though the kernel weights are multiplied by zeros (the pads). But the pad variable that is concatenated to the input also lets the kernel weights themselves act at the borders, so in theory it allows for more degrees of freedom. However, I am not sure how helpful this is in practice.