Comments (12)
I found it in the official implementation. It was beneficial to training stability. I don't think the intuition behind it is spelled out in the papers, but having spatial, content-dependent priors should make the overall problem easier.
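As a sketch of what such a spatial, content-dependent prior can look like in PyTorch (the class and names here are illustrative, not the repo's exact code):

```python
import torch
from torch import nn


class LearnedPrior(nn.Module):
    """Predict per-pixel Gaussian parameters (mean, log_sd) for the
    latent z from the other half of the split, instead of using a
    fixed N(0, I) prior. Names are illustrative."""

    def __init__(self, in_channels):
        super().__init__()
        # Zero-initialized conv: the prior starts as N(0, I) and the
        # network learns to deviate from it during training.
        self.conv = nn.Conv2d(in_channels, in_channels * 2, 3, padding=1)
        nn.init.zeros_(self.conv.weight)
        nn.init.zeros_(self.conv.bias)

    def forward(self, h):
        mean, log_sd = self.conv(h).chunk(2, dim=1)
        return mean, log_sd


prior = LearnedPrior(4)
h = torch.randn(2, 4, 8, 8)
mean, log_sd = prior(h)
# At init the conv outputs zeros, so the prior is exactly N(0, I).
print(mean.abs().max().item(), log_sd.abs().max().item())  # 0.0 0.0
```

Because the conv is zero-initialized, the model behaves exactly like the fixed-prior version at the start of training, which may be part of why it helps stability.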
from glow-pytorch.
Hi,
I have some detailed questions regarding your code. I will be happy if you have the time to answer them.
-
In both the ActNorm and AffineCoupling modules, where scale and shift operations are applied, both your code and the original OpenAI code apply the shift first and then the scale. Although it should not matter much as far as optimization is concerned, did you find any reason to swap the operations? In the paper, the scale operation is done before the shift operation (Code lines: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L54 and https://github.com/rosinality/glow-pytorch/blob/master/model.py#L184).
-
Can you explain why there is an additional learnable (scale * 3) factor in the ZeroConv2d module (as in the OpenAI code)? And why is the scale multiplied by 3? (Code line: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L151)
-
Can you explain why the input is shifted by +2 before applying the Sigmoid in the AffineCoupling module (as in the OpenAI code)? (Code line: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L182)
-
Why are you padding with 1 (and not 0) in the ZeroConv2d module (Code line: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L149), while you are padding with 0 in the other Conv2d layers in the AffineCoupling? OpenAI pads with 0 if I am not mistaken (https://github.com/openai/glow/blob/master/tfops.py#L203).
-
If I am not mistaken, in the official implementation the authors apply an additional ActNorm right after both the Conv2D and ZeroConv2D layers (Code lines: https://github.com/openai/glow/blob/master/tfops.py#L256 and https://github.com/openai/glow/blob/master/tfops.py#L309). Is there any reason you have not included this in your code?
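For reference, the data-dependent initialization that such an ActNorm layer performs can be sketched like this (a simplified standalone version, not the repo's exact code):

```python
import torch
from torch import nn


class ActNorm(nn.Module):
    """Per-channel affine layer whose parameters are initialized from
    the first batch so that its output has zero mean and unit variance
    per channel; a simplified sketch."""

    def __init__(self, num_channels):
        super().__init__()
        self.loc = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.scale = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.initialized = False

    def forward(self, x):
        if not self.initialized:
            # Data-dependent init: measure the first batch's statistics.
            with torch.no_grad():
                mean = x.mean(dim=(0, 2, 3), keepdim=True)
                std = x.std(dim=(0, 2, 3), keepdim=True)
                self.loc.copy_(-mean)
                self.scale.copy_(1 / (std + 1e-6))
            self.initialized = True
        # Shift first, then scale, matching the ordering discussed above.
        return (x + self.loc) * self.scale


norm = ActNorm(4)
x = torch.randn(8, 4, 16, 16) * 3 + 1
y = norm(x)
print(round(y.mean().item(), 4), round(y.std().item(), 4))  # ≈ 0.0  ≈ 1.0
```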
I appreciate your time.
Thank you in advance.
It helps feature maps take different values (a kind of bias) at the edges. I think zero padding can also achieve that to some degree, though.
Thanks for the response!
I will write here in case I gain more knowledge about this technique.
Actually, I don't know the exact reasons for these details. I tried to replicate the paper, and after it didn't work well I took these details from the official implementation. In any case, they are related to training stability:
- It could make the optimization process different, since the location parameter is also scaled.
- I don't know why... Maybe it is to accelerate adaptation during the initial training phase.
- It keeps the scaling factor close to 1, which helps information propagation.
- I think they also used 1 padding. (https://github.com/openai/glow/blob/master/tfops.py#L217)
- Did they use ActNorm after ZeroConv2d? I can't find it in conv2d_zeros.
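A sketch that pulls the scale and sigmoid details together (illustrative, following the repo's general shape rather than its exact code):

```python
import torch
from torch import nn
from torch.nn import functional as F


class ZeroConv2d(nn.Module):
    """3x3 conv with zero-initialized weights whose output is rescaled
    by a learnable per-channel factor exp(scale * 3); a sketch of the
    module under discussion."""

    def __init__(self, in_channel, out_channel):
        super().__init__()
        self.conv = nn.Conv2d(in_channel, out_channel, 3, padding=0)
        nn.init.zeros_(self.conv.weight)
        nn.init.zeros_(self.conv.bias)
        self.scale = nn.Parameter(torch.zeros(1, out_channel, 1, 1))

    def forward(self, x):
        out = F.pad(x, [1, 1, 1, 1], value=1)  # pad with 1, not 0
        out = self.conv(out)
        # Multiplying the log-scale by 3 amplifies its gradient, which
        # may speed up adaptation early in training (a guess, per above).
        return out * torch.exp(self.scale * 3)


zero_conv = ZeroConv2d(4, 8)
out = zero_conv(torch.randn(2, 4, 8, 8))
print(out.abs().max().item())  # 0.0: the block starts as a zero map

# The +2 shift in the coupling layer keeps the scale near 1 at init:
print(torch.sigmoid(torch.tensor(0.0) + 2).item())  # ≈ 0.881, close to identity
```

Without the +2 shift, a zero-initialized coupling network would give sigmoid(0) = 0.5 and halve the signal at every layer; with it, each coupling layer starts close to the identity.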
Hi,
Thanks for your response, you are right.
I have a question about the amount of noise you add to each data point during training. As can be seen in your code here (and OpenAI's code), you add noise to the data based on 1/(2 ** n_bits). Since 8-bit image pixels, which naturally take values in [0, 255], are divided by 255 when being converted to the [0, 1] interval, why shouldn't we choose 1/(2 ** n_bits - 1), that is, 1/255 in the case of 8-bit images, as the right quantity?
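For reference, the two candidate noise widths can be compared directly (a standalone illustration, assuming n_bits = 8 and the /255 normalization described above):

```python
import random

n_bits = 8
n_bins = 2 ** n_bits           # 256 bins, as in the code
step = 1 / (n_bins - 1)        # 1/255: spacing of normalized pixel values

# Uniform dequantization noise as used in the repo: U[0, 1/256).
pixel = 137
x_noisy = pixel / 255 + random.random() / n_bins

# A noise width of 1/256 is strictly smaller than the 1/255 spacing
# between adjacent normalized pixels, so neighboring pixels' noisy
# values can never overlap; with 1/255 the bins would exactly touch.
assert 1 / n_bins < step
assert pixel / 255 <= x_noisy < pixel / 255 + 1 / n_bins
```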
I appreciate your thoughts.
Thanks very much in advance.
Hmm, wouldn't it be better to have noise slightly smaller than the actual pixel value changes in [0, 255]? Though I guess it won't make much difference.
Yes, it might be. I also think it does not make a huge difference overall.
Hi, after investigating the official Glow implementation more thoroughly, I wanted to make some clarifications regarding this topic:
-
First, I think they always pad with one, since they use the add_edge_padding function, which pads with 1, as you referred to in your response. However, in your code, you pad with 0 (the PyTorch default) except for ZeroConv2d, where you pad with 1. Am I right?
-
Secondly, about the ActNorm: the official implementation always applies actnorm after the Conv2d operations they define, except for ZeroInitConv2d, which does not, as far as I understand their code.
Thanks.
First, I think they always pad with one, since they use the add_edge_padding function, which pads with 1, as you referred to in your response. However, in your code, you pad with 0 (the PyTorch default) except for ZeroConv2d, where you pad with 1. Am I right?
Slightly different: they also use zero padding, but concatenate an additional input channel that is zero everywhere except at the borders.
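In code, the idea looks roughly like this (a sketch of the concept behind add_edge_padding, not the exact official function):

```python
import torch
import torch.nn.functional as F


def add_edge_padding(x, kernel_size=3):
    """Zero-pad the input, then concatenate a mask channel that is 1 on
    the padded border and 0 in the interior, so the conv can tell border
    pixels apart from genuinely dark pixels. A sketch of the idea."""
    pad = kernel_size // 2
    x = F.pad(x, [pad, pad, pad, pad], value=0)
    mask = torch.zeros(x.shape[0], 1, x.shape[2], x.shape[3])
    mask[:, :, :pad, :] = 1   # top rows
    mask[:, :, -pad:, :] = 1  # bottom rows
    mask[:, :, :, :pad] = 1   # left columns
    mask[:, :, :, -pad:] = 1  # right columns
    return torch.cat([x, mask], dim=1)


out = add_edge_padding(torch.randn(2, 4, 8, 8))
print(out.shape)  # torch.Size([2, 5, 10, 10])
```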
Absolutely right. Do you understand what the purpose of that additional input (the pad variable) is? In other words, why do they concatenate this additional input after padding with zeros?
Right. As you mentioned, zero padding already allows the border to take different values, since the kernel has a bias by default, so the output can differ there even though the kernel weights are multiplied by zeros (the pads). But the pad variable that is concatenated to the input also lets the kernel weights themselves act at the borders, so in theory it allows for more degrees of freedom. However, I am not sure how helpful this is in practice.