Comments (12)

rosinality commented on August 10, 2024

I found it in the official implementation. It was beneficial to training stability. I don't think the intuition behind it is spelled out in the papers, but I believe having a spatial, content-dependent prior makes the overall problem easier.
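
For context, a minimal sketch of what such a learned prior can look like in PyTorch: a zero-initialized convolution predicts a per-pixel mean and log-std for the latent, so the prior becomes spatial and content dependent (names here are illustrative, not the exact glow-pytorch code):

```python
import math

import torch
from torch import nn


class LearnedPrior(nn.Module):
    """Per-pixel Gaussian prior whose mean and log-std are predicted
    from features instead of being fixed at N(0, 1)."""

    def __init__(self, in_channel):
        super().__init__()
        # Zero init: training starts from the standard prior and
        # gradually learns a spatial, content-dependent one
        self.conv = nn.Conv2d(in_channel, in_channel * 2, 3, padding=1)
        self.conv.weight.data.zero_()
        self.conv.bias.data.zero_()

    def log_prob(self, feature, z):
        mean, log_sd = self.conv(feature).chunk(2, 1)
        # Gaussian log-density per pixel, summed over each sample
        log_p = (
            -0.5 * math.log(2 * math.pi)
            - log_sd
            - 0.5 * (z - mean) ** 2 / torch.exp(2 * log_sd)
        )
        return log_p.flatten(1).sum(1)


# usage sketch: feature and z have the same shape
feature = torch.randn(4, 6, 8, 8)
z = torch.randn(4, 6, 8, 8)
print(LearnedPrior(6).log_prob(feature, z).shape)  # torch.Size([4])
```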

MoeinSorkhei commented on August 10, 2024

Hi,

I have some detailed questions regarding your code. I would be happy if you have the time to answer them.

  1. In both the ActNorm and AffineCoupling modules, where scale and shift operations are applied, both your code and the original OpenAI code apply the shift first and then the scale (see the condensed sketch after this list). Although it should not matter much as far as optimization is concerned, did you have a reason to switch the order? In the paper, the scale operation is done before the shift operation. (Code lines: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L54 and https://github.com/rosinality/glow-pytorch/blob/master/model.py#L184)

  2. Can you explain why there is an additional learnable (scale * 3) factor in the ZeroConv2d module (as in the OpenAI code)? And why is the scale multiplied by 3? (Code line: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L151)

  3. Can you explain why the input is shifted by +2 before applying the sigmoid in the AffineCoupling module (as in the OpenAI code)? (Code line: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L182)

  4. Why do you pad with 1 (and not 0) in the ZeroConv2d module (code line: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L149), while you pad with 0 in the other Conv2d layers in the AffineCoupling? OpenAI pads with 0 if I am not mistaken (https://github.com/openai/glow/blob/master/tfops.py#L203).

  5. If I am not mistaken, in the official implementation the authors apply an additional ActNorm right after both the Conv2d and ZeroConv2d layers (code lines: https://github.com/openai/glow/blob/master/tfops.py#L256 and https://github.com/openai/glow/blob/master/tfops.py#L309). Is there a reason you did not include this in your code?
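
For reference, here is a condensed paraphrase of the pieces these questions point at, based on the linked glow-pytorch lines (simplified and lightly renamed, so treat it as an illustration rather than the exact code):

```python
import torch
from torch import nn
from torch.nn import functional as F


class ZeroConv2d(nn.Module):
    def __init__(self, in_channel, out_channel):
        super().__init__()
        # Weights and bias start at zero, so the layer initially outputs 0
        self.conv = nn.Conv2d(in_channel, out_channel, 3, padding=0)
        self.conv.weight.data.zero_()
        self.conv.bias.data.zero_()
        # Question 2: extra learnable per-channel scale, multiplied by 3
        self.scale = nn.Parameter(torch.zeros(1, out_channel, 1, 1))

    def forward(self, input):
        # Question 4: the border is padded with ones, not zeros
        out = F.pad(input, [1, 1, 1, 1], value=1)
        out = self.conv(out)
        return out * torch.exp(self.scale * 3)


def affine_couple(in_b, log_s, t):
    # Question 3: the +2 shifts the sigmoid so the scale starts near 1
    s = torch.sigmoid(log_s + 2)
    # Question 1: shift first, then scale; the paper writes scale, then shift
    return (in_b + t) * s
```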

I appreciate your time.
Thank you in advance.

rosinality commented on August 10, 2024

It helps the feature maps take different values (a kind of bias) at the edges. I think zero padding can also achieve that to some degree, though.

MoeinSorkhei commented on August 10, 2024

Thanks for the response!
I will write here if I gain more knowledge about this technique.

rosinality commented on August 10, 2024

Actually I don't know the exact reasons for these details. I tried to replicate the paper, and after it did not work well I took these details from the official implementation. In any case, they are related to training stability.

  1. It could make the optimization process different, as the location parameter is also scaled (see the quick check after this list).
  2. I don't know why... Maybe it is to accelerate adaptation during the initial training phase.
  3. It keeps the scaling factor close to 1, which is helpful for information propagation.
  4. I think they also used 1 padding. (https://github.com/openai/glow/blob/master/tfops.py#L217)
  5. Did they use ActNorm after ZeroConv2d? I can't find it in conv2d_zeros.
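
A quick numerical check of points 1 and 3, just to make them concrete (plain PyTorch; the numbers are illustrative):

```python
import torch

# Point 1: shift-then-scale also scales the location parameter:
# (x + t) * s == x * s + t * s, versus the paper's x * s + t
x, t, s = torch.tensor(1.5), torch.tensor(0.3), torch.tensor(0.9)
print((x + t) * s, x * s + t * s)  # identical; the shift t is scaled by s

# Point 3: the coupling net ends in a zero-initialized conv, so log_s
# starts at 0 and the scale starts at sigmoid(0 + 2), close to identity
print(torch.sigmoid(torch.tensor(2.0)))  # ~0.8808
```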

MoeinSorkhei commented on August 10, 2024

Hi,

Thanks for your response; you are right.
I have a question about the amount of noise you add to each data point during training. As can be seen in your code here (and in OpenAI's code), you add noise to the data based on 1/(2 ** n_bits). Since 8-bit image pixels, which naturally take values in [0, 255], are divided by 255 when being converted to the [0, 1] interval, why should we not choose 1/(2 ** n_bits - 1), that is, 1/255 in the case of 8-bit images, as the right quantity?
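
For concreteness, the preprocessing in question is roughly the following (a paraphrase of the training code, with illustrative names):

```python
import torch


def preprocess(image, n_bits=8):
    """Quantize an image in [0, 1] to n_bits and dequantize with noise."""
    n_bins = 2 ** n_bits
    image = image * 255
    if n_bits < 8:
        image = torch.floor(image / 2 ** (8 - n_bits))
    # divides by 2 ** n_bits (256), not 2 ** n_bits - 1 (255)
    image = image / n_bins - 0.5
    # uniform noise spanning exactly one quantization bin
    return image + torch.rand_like(image) / n_bins
```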

I appreciate your thoughts.
Thanks very much in advance.

rosinality commented on August 10, 2024

Hmm, wouldn't it be better to have noise slightly smaller than actual pixel value changes in [0, 255]? Though I guess it will not make much difference.

MoeinSorkhei commented on August 10, 2024

Yes, it might be. I also think it does not make a huge difference overall.

MoeinSorkhei commented on August 10, 2024

Hi, after investigating the official Glow implementation more thoroughly, I wanted to make some clarifications regarding this topic:

  • First, I think they always pad with one, since they use the add_edge_padding function, which pads with 1, as you referred to in your response. However, in your code you pad with 0 (the PyTorch default) except for ZeroConv2d, where you pad with 1. Am I right?

  • Secondly, about the ActNorm: the official implementation always applies ActNorm after the Conv2d operations they define, except for ZeroInitConv2d, which does not apply ActNorm, as far as I understand their code (sketched below).
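
In PyTorch terms, the structure described above would look roughly like this (an illustrative sketch, not the actual OpenAI code, which is in TensorFlow and initializes ActNorm from data):

```python
import torch
from torch import nn


class ActNorm(nn.Module):
    """Per-channel affine normalization (data-dependent init in the real code)."""

    def __init__(self, channels):
        super().__init__()
        self.loc = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.scale = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        return self.scale * (x + self.loc)


def coupling_net(in_channel, hidden, out_channel):
    # each regular conv is followed by ActNorm; the final
    # zero-initialized conv is not
    zero_conv = nn.Conv2d(hidden, out_channel, 3, padding=1)
    zero_conv.weight.data.zero_()
    zero_conv.bias.data.zero_()
    return nn.Sequential(
        nn.Conv2d(in_channel, hidden, 3, padding=1, bias=False),
        ActNorm(hidden),
        nn.ReLU(),
        nn.Conv2d(hidden, hidden, 1, bias=False),
        ActNorm(hidden),
        nn.ReLU(),
        zero_conv,
    )


net = coupling_net(6, 32, 12)
print(net(torch.randn(2, 6, 8, 8)).shape)  # torch.Size([2, 12, 8, 8])
```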

Thanks.

rosinality commented on August 10, 2024

Slightly different. They also used zero padding, but concatenated an additional input channel that is all zeros except at the borders.
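
A rough PyTorch equivalent of that trick (a hypothetical translation of add_edge_padding, for illustration only):

```python
import torch
from torch.nn import functional as F


def add_edge_padding(x, pad=1):
    """Zero-pad x, then concatenate an indicator channel that is 1 on
    the padded border and 0 elsewhere, so the conv can see the edges."""
    x = F.pad(x, [pad] * 4)  # ordinary zero padding
    n, _, h, w = x.shape
    border = torch.zeros(n, 1, h, w, device=x.device, dtype=x.dtype)
    border[:, :, :pad, :] = 1
    border[:, :, -pad:, :] = 1
    border[:, :, :, :pad] = 1
    border[:, :, :, -pad:] = 1
    return torch.cat([x, border], dim=1)


# the conv that follows then uses padding=0 and one extra input channel
x = torch.randn(2, 8, 16, 16)
print(add_edge_padding(x).shape)  # torch.Size([2, 9, 18, 18])
```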

MoeinSorkhei commented on August 10, 2024

Absolutely right. Do you understand what the purpose of that additional input (the pad variable) is? In other words, why do they concatenate this additional input after padding with zeros?

MoeinSorkhei commented on August 10, 2024

Right. As you mentioned, zero padding allows for different values at the edges, since the kernel convolved with the input has a bias by default, so the output can differ there even though the kernel weights are multiplied by zeros (the pads). But the pad variable concatenated to the input also lets the kernel weights themselves have an effect, so in theory it might allow for more degrees of freedom. However, I am not sure how helpful it is in practice.
