Comments (10)
@XavierXiao let's go with your guess 😄 will make it happen by week's end
from gigagan-pytorch.
OK, thanks for the explanation; your current implementation is clear to me now. I personally think the multi-scale design may be a bit different, though. The novel part of GigaGAN's multi-scale loss is that the discriminator outputs L(L+1)/2 predictions in total, but your current implementation appears to produce L predictions (i.e., one prediction for each resolution of the pyramid).
How do we get L(L+1)/2 predictions? I guess this is what the paper means by "makes independent predictions for each image scale".
So, my guess: for the collection of RGB images produced by the generator, you first send the highest-resolution image x_64 to the discriminator, which returns 5 predictions, one at each resolution. Then you INDEPENDENTLY send the second-highest-resolution image x_32 to the discriminator, which returns 4 predictions, and so on.
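Just to make the counting above concrete (a trivial sketch, not repo code): with L pyramid levels, the level entering at depth k yields L - k predictions, so the total is L + (L - 1) + ... + 1 = L(L+1)/2.

```python
# count the total number of discriminator predictions for a
# pyramid with `num_levels` resolutions, where the k-th highest
# resolution contributes (num_levels - k) predictions
def num_predictions(num_levels: int) -> int:
    return sum(range(1, num_levels + 1))

# for the 5-level example above (64 -> 32 -> 16 -> 8 -> 4):
# 5 + 4 + 3 + 2 + 1 = 15 = 5 * 6 / 2
assert num_predictions(5) == 15
```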
How do you send images of different resolutions to the discriminator independently? According to what the paper says at the very top of the right column on page 8, there is a fromRGB layer at each resolution that processes RGB images of that size and maps them to a higher number of channels. So a 64x64x3 RGB input first goes through the fromRGB layer at resolution 64, the resulting tensor goes through the FIRST discriminator block, and then proceeds through the later blocks. A 32x32x3 RGB input first goes through the fromRGB layer at resolution 32, and the resulting tensor is sent directly to the SECOND discriminator block, then proceeds through the later blocks. And so on.
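To illustrate the entry-point idea, here is a minimal sketch (all names are my own, not from the repo, and the blocks are placeholder convs): each resolution gets its own fromRGB conv, and a lower-resolution image skips the earlier blocks by entering the stack at the matching depth, emitting one prediction per remaining resolution.

```python
import torch
from torch import nn

class MultiScaleDisc(nn.Module):
    def __init__(self, dims=(64, 128, 256, 512, 512)):
        super().__init__()
        # one fromRGB per pyramid resolution (e.g. 64, 32, 16, 8, 4),
        # mapping 3 channels to the width expected at that depth
        self.from_rgbs = nn.ModuleList(nn.Conv2d(3, d, 1) for d in dims)
        # each block halves the spatial size
        self.blocks = nn.ModuleList(
            nn.Conv2d(a, b, 4, stride=2, padding=1)
            for a, b in zip(dims[:-1], dims[1:])
        )
        # a prediction head at every resolution (1-channel map here)
        self.preds = nn.ModuleList(nn.Conv2d(d, 1, 1) for d in dims)

    def forward(self, rgb, level=0):
        # level = 0 for the full-resolution image, 1 for half, ...
        x = self.from_rgbs[level](rgb)          # skip earlier blocks
        out = [self.preds[level](x)]
        for i in range(level, len(self.blocks)):
            x = self.blocks[i](x)
            out.append(self.preds[i + 1](x))
        return out                               # L - level predictions
```

With 5 levels, a 64x64 input at `level=0` yields 5 predictions and a 32x32 input at `level=1` yields 4, matching the L(L+1)/2 counting above.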
This is the most reasonable guess I can make after reading the paper really carefully. Let me know what you think!
Sorry for the late reply due to the July 4th holidays. I took a careful look at the new discriminator implementation. A couple of questions:
- I tried to go over the computational graph in my mind, but I am still a bit confused about the input to the discriminator. Could you confirm the following: assuming the highest image resolution is 64x64, if I want to train the discriminator (i.e., both real and fake images are sent to it), then `images` should be a 64x64x3 generated image, `rgbs` should be a collection of generated images at the other sizes (without the highest size), and `real_images` should be a 64x64x3 real image? My understanding of the code is based on this input format, so correct me if I am wrong.
- The predictor network outputs HxWx1, but it seems we need a single real/fake prediction from each predictor. Although the paper does not say so explicitly, do you think the predictor should output a 1x1 score?
Everything else looks good to me! BTW, it is really smart to implement the independent processing via batch-dimension concatenation!
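For other readers, the batch-concatenation trick mentioned above can be sketched like this (a toy stand-in for the discriminator, not the repo's code): inputs that go through the same layers are stacked along the batch dimension, processed in one forward pass, and the logits split back out.

```python
import torch

fake = torch.randn(8, 3, 64, 64)   # generated images
real = torch.randn(8, 3, 64, 64)   # real images

# one batch of 16 instead of two independent forward passes
both = torch.cat((fake, real), dim=0)

# stand-in for discriminator(both) -> one logit per batch element
logits = both.mean(dim=(1, 2, 3))

# split the logits back out per source
fake_logits, real_logits = logits.chunk(2)
```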
oh yes, for 1., you caught another bug, thank you for the code review!
so the aim of the `Discriminator` was to support both fake + real images being fed in, as well as only fake (for generator training). only one logit is output per batch element, and that logit is high if fake and low if real (or vice versa, as long as you flip the loss when training the generator)
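(the sign convention above, sketched with a standard hinge loss; here real logits are pushed high and fake logits low, and flipping both signs works equally well as long as the generator loss is flipped to match. this is a generic sketch, not the repo's exact loss)

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(real_logits, fake_logits):
    # discriminator wants real logits above 1 and fake logits below -1
    return (F.relu(1 - real_logits) + F.relu(1 + fake_logits)).mean()

def g_hinge_loss(fake_logits):
    # generator wants the discriminator to score its images high
    return -fake_logits.mean()
```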
for 2., i thought the multi-scale part referred to feeding in the `rgb`s output by the generator at different stages. i could be wrong too
i'll get around to auto-handling the hinge loss within the `Discriminator` instance tomorrow, as well as the gradient penalties
example of the logic i'll likely just copy-paste over: https://github.com/lucidrains/lightweight-gan/blob/main/lightweight_gan/lightweight_gan.py#L1226 , with modifications to support distributed training using huggingface accelerate
@XavierXiao want to see if the latest changes are more aligned with your expectations?
Wow so fast! Will take a look tomorrow.
The code looks about right to me.
One part I'm unsure about is whether the images passed to the discriminator should come from different steps of the generator's pyramid, or should just be resized versions of the final layer.
The former would skip lots of big layers in the middle, which feels like it lines up with the paper's details about it being efficient for scaling up.
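(the "resized versions of the final layer" alternative is trivial to write down, which is part of why the distinction matters; a sketch, assuming a 5-level pyramid down to 4x4)

```python
import torch
import torch.nn.functional as F

final = torch.randn(4, 3, 64, 64)  # highest-resolution generator output

# downsampled copies of the final image: 32, 16, 8, 4
pyramid = [
    F.interpolate(final, scale_factor=0.5 ** k, mode='bilinear')
    for k in range(1, 5)
]
```

By contrast, taking the intermediate `rgb` outputs from the generator's own stages would make the lower-resolution images genuinely different tensors, not deterministic functions of the final one.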
closing as addressed