Comments (10)
@XavierXiao let's go with your guess 😄 will make it happen by week's end
from gigagan-pytorch.
OK, thanks for the explanation; your current implementation is clear to me now. I personally think the multi-scale design may be a bit different, though. The novel part of GigaGAN's multi-scale loss is that the discriminator outputs L(L+1)/2 predictions in total, but your current implementation appears to produce L predictions (i.e., one prediction for each resolution of the pyramid).
How do we get L(L+1)/2 predictions? I guess this is what the paper means by "makes independent predictions for each image scale".
So, my guess: for the collection of RGB images produced by the generator, you first send the highest-resolution image x_64 to the discriminator, which returns 5 predictions, one at each resolution. Then you INDEPENDENTLY send the second-highest-resolution image x_32 to the discriminator, which returns 4 predictions, and so on.
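Just to make the counting above concrete (a trivial sketch, not repo code): with L pyramid levels, the level entering at depth k yields L - k predictions, so the total is L + (L - 1) + ... + 1 = L(L+1)/2.

```python
# count the total number of discriminator predictions for a
# pyramid with `num_levels` resolutions, where the k-th highest
# resolution contributes (num_levels - k) predictions
def num_predictions(num_levels: int) -> int:
    return sum(range(1, num_levels + 1))

# for the 5-level example above (64 -> 32 -> 16 -> 8 -> 4):
# 5 + 4 + 3 + 2 + 1 = 15 = 5 * 6 / 2
assert num_predictions(5) == 15
```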
How do you send images of different resolutions to the discriminator independently? According to what the paper says at the very top of the right column on page 8, there is a fromRGB layer at each resolution that processes RGB images of that size and maps them to a higher number of channels. So a 64x64x3 RGB input first goes through the fromRGB layer at resolution 64, the resulting tensor goes through the FIRST discriminator block, and then proceeds through the later blocks. A 32x32x3 RGB input first goes through the fromRGB layer at resolution 32, and the resulting tensor is sent directly to the SECOND discriminator block, then proceeds through the later blocks. And so on.
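To illustrate the entry-point idea, here is a minimal sketch (all names are my own, not from the repo, and the blocks are placeholder convs): each resolution gets its own fromRGB conv, and a lower-resolution image skips the earlier blocks by entering the stack at the matching depth, emitting one prediction per remaining resolution.

```python
import torch
from torch import nn

class MultiScaleDisc(nn.Module):
    def __init__(self, dims=(64, 128, 256, 512, 512)):
        super().__init__()
        # one fromRGB per pyramid resolution (e.g. 64, 32, 16, 8, 4),
        # mapping 3 channels to the width expected at that depth
        self.from_rgbs = nn.ModuleList(nn.Conv2d(3, d, 1) for d in dims)
        # each block halves the spatial size
        self.blocks = nn.ModuleList(
            nn.Conv2d(a, b, 4, stride=2, padding=1)
            for a, b in zip(dims[:-1], dims[1:])
        )
        # a prediction head at every resolution (1-channel map here)
        self.preds = nn.ModuleList(nn.Conv2d(d, 1, 1) for d in dims)

    def forward(self, rgb, level=0):
        # level = 0 for the full-resolution image, 1 for half, ...
        x = self.from_rgbs[level](rgb)          # skip earlier blocks
        out = [self.preds[level](x)]
        for i in range(level, len(self.blocks)):
            x = self.blocks[i](x)
            out.append(self.preds[i + 1](x))
        return out                               # L - level predictions
```

With 5 levels, a 64x64 input at `level=0` yields 5 predictions and a 32x32 input at `level=1` yields 4, matching the L(L+1)/2 counting above.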
This is the most reasonable guess I can make after reading the paper really carefully. Let me know what you think!
Sorry for the late reply due to the July 4th holidays. I took a careful look at the new discriminator implementation. A couple of questions:
- I tried to go over the computational graph in my mind, but I am still a bit confused about the input to the discriminator. Could you confirm the following: assuming the highest image resolution is 64x64, if I want to train the discriminator (i.e., both real and fake images are sent to it), then `images` should be a 64x64x3 generated image, `rgbs` should be a collection of generated images at the other sizes (without the highest size), and `real_images` should be a 64x64x3 real image? My understanding of the code is based on this input format, so correct me if I am wrong.
- The predictor network outputs HxWx1, but it seems we need a single real/fake prediction from each predictor. Although the paper does not say so explicitly, do you think the predictor should output a 1x1 score?
Everything else looks good to me! BTW, it is really smart to implement the independent processing via batch-dimension concatenation!
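For other readers, the batch-concatenation trick mentioned above can be sketched like this (a toy stand-in for the discriminator, not the repo's code): inputs that go through the same layers are stacked along the batch dimension, processed in one forward pass, and the logits split back out.

```python
import torch

fake = torch.randn(8, 3, 64, 64)   # generated images
real = torch.randn(8, 3, 64, 64)   # real images

# one batch of 16 instead of two independent forward passes
both = torch.cat((fake, real), dim=0)

# stand-in for discriminator(both) -> one logit per batch element
logits = both.mean(dim=(1, 2, 3))

# split the logits back out per source
fake_logits, real_logits = logits.chunk(2)
```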
oh yes, for 1., you caught another bug, thank you for the code review!
so the aim of the `Discriminator` was to support both fake + real images being fed in, as well as only fake (for generator training). only one logit is output per batch element, and that logit is high if fake and low if real (or vice versa, as long as you flip the loss when training the generator)
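(the sign convention above, sketched with a standard hinge loss; here real logits are pushed high and fake logits low, and flipping both signs works equally well as long as the generator loss is flipped to match. this is a generic sketch, not the repo's exact loss)

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(real_logits, fake_logits):
    # discriminator wants real logits above 1 and fake logits below -1
    return (F.relu(1 - real_logits) + F.relu(1 + fake_logits)).mean()

def g_hinge_loss(fake_logits):
    # generator wants the discriminator to score its images high
    return -fake_logits.mean()
```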
for 2., i thought the multi-scale part referred to feeding in the `rgb`s output by the generator at different stages. i could be wrong too
i'll get around to auto-handling the hinge loss within the `Discriminator` instance tomorrow, as well as the gradient penalties
example of the logic i'll likely just copy-paste over: https://github.com/lucidrains/lightweight-gan/blob/main/lightweight_gan/lightweight_gan.py#L1226 , with modifications to support distributed training using huggingface accelerate
@XavierXiao want to see if the latest changes are more aligned with your expectations?
Wow so fast! Will take a look tomorrow.
The code looks about right to me.
One part I'm unsure about is whether the images passed to the discriminator should come from different steps of the generator's pyramid, or should just be resized versions of the final layer.
The former would skip lots of big layers in the middle, which feels like it lines up with the paper's details about it being efficient for scaling up.
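(the "resized versions of the final layer" alternative is trivial to write down, which is part of why the distinction matters; a sketch, assuming a 5-level pyramid down to 4x4)

```python
import torch
import torch.nn.functional as F

final = torch.randn(4, 3, 64, 64)  # highest-resolution generator output

# downsampled copies of the final image: 32, 16, 8, 4
pyramid = [
    F.interpolate(final, scale_factor=0.5 ** k, mode='bilinear')
    for k in range(1, 5)
]
```

By contrast, taking the intermediate `rgb` outputs from the generator's own stages would make the lower-resolution images genuinely different tensors, not deterministic functions of the final one.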
closing as addressed