
giorgiaauroraadorni / gansformer-reproducibility-challenge

Replication of the novel Generative Adversarial Transformer.

License: MIT License

TeX 8.33% Dockerfile 0.04% Python 47.72% Cuda 2.43% HTML 0.37% Jupyter Notebook 41.11%
attention-is-all-you-need deep-learning gan generative-adversarial-network generative-adversarial-transformer ml reproducibility-challenge stylegan stylegan2 tensorflow transformers

gansformer-reproducibility-challenge's People

Contributors

dependabot[bot], felixboelter, giorgiaauroraadorni, steflamb


gansformer-reproducibility-challenge's Issues

contrib module not found with tensorflow 2.5.3

File "/content/gansformer-reproducibility-challenge/src/dnnlib/tflib/tfutil.py", line 16, in
import tensorflow.contrib # requires TensorFlow 1.x!
ModuleNotFoundError: No module named 'tensorflow.contrib'
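For reference, the import fails because tensorflow.contrib was removed in TensorFlow 2.x, while src/dnnlib/tflib/tfutil.py targets TensorFlow 1.x. A minimal guard along these lines (the idea of pinning a 1.x release, e.g. pip install "tensorflow-gpu==1.15", is my own assumption about a workaround, not a fix shipped by the repository) would at least make the failure explicit:

    import tensorflow as tf

    # tfutil.py relies on tensorflow.contrib, which no longer exists in TF 2.x,
    # so fail early with a clear message instead of a bare ModuleNotFoundError.
    if not tf.__version__.startswith("1."):
        raise ImportError(
            "TensorFlow 1.x is required (found %s); tensorflow.contrib was removed in TF 2.x."
            % tf.__version__
        )

    import tensorflow.contrib  # requires TensorFlow 1.x!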

Comment regarding the GANformer reproducibility report

Dear Giorgia, Felix and Stefano,
Thank you very much for your interest in the GANformer model and for working on reproducing it!
I just read the reproducibility report and wanted to comment.

Regarding the attention in the generator vs. the discriminator: in earlier stages of the model's development, I explored using bipartite attention in both the generator and the discriminator. Similarly to your observations, as I kept working on it I noticed that the model indeed performs better when attention is incorporated into the generator only, and the pre-trained models and default command-line settings of the public repository reflect that.
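For readers unfamiliar with the term, below is a toy sketch of what bipartite attention between a small set of latent components and a grid of image features looks like; the shapes, names and NumPy implementation are illustrative assumptions, not the repository's actual TensorFlow code.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def bipartite_attention(latents, features):
        # latents:  (k, d) latent components acting as queries
        # features: (n, d) flattened image features (keys/values), n = h * w
        # returns:  (k, d) latents updated with information gathered from the image
        d = latents.shape[-1]
        scores = latents @ features.T / np.sqrt(d)  # (k, n) latent-to-pixel affinities
        attn = softmax(scores, axis=-1)             # each latent attends over all pixels
        return latents + attn @ features            # residual update of the latents

    # Illustrative shapes only: 16 latent components, an 8x8 feature grid, 32 channels.
    z = np.random.randn(16, 32)
    x = np.random.randn(8 * 8, 32)
    print(bipartite_attention(z, x).shape)  # (16, 32)

In the generator-only configuration described above, updates of this kind sit inside the generator, while the discriminator keeps the plain StyleGAN2 architecture (hence the matching parameter counts in Table 5).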

In the GANformer2 paper we released later last year, we followed the same design (bipartite attention over the generator only). Table 5 in the first GANformer paper, which lists the number of parameters for the generator and the discriminator under the different approaches, also matches this (it indicates that the GANformer and StyleGAN2 discriminators have the same number of parameters). The paragraph about the discriminator using bipartite attention at the bottom left of page 7 should have been removed; it remained there due to a mistake on my side. I have updated the paper to address that, and it should become public through arXiv later today!

Regarding the report's empirical section: we believe that the benefits of bipartite attention come mainly from better support of long-range interactions across the image, so they come into play mostly at high resolutions, on compositional/complex scenes, and over a full course of training (when the model shifts to the fidelity of small details rather than being coarsely right about the overall shape, as happens in earlier phases of training). The empirical evaluation in the reproducibility report potentially misses these factors by lowering the data resolution, focusing on faces only, and comparing results after short training (300 kimgs in the report vs. 10,000 kimgs in the GANformer paper, and 25,000-70,000 kimgs in StyleGAN2).

As discussed in the paper, the gains on FFHQ after completing the full training are indeed the smallest compared to the other datasets, since FFHQ is less compositional and structurally diverse than multi-object scenes (as is the case for CLEVR, LSUN-Bedrooms and Cityscapes). Note also that the learning-curve comparison provided in the paper is for the CLEVR dataset. We chose CLEVR over e.g. FFHQ because we believe it is a good example of a compositional-scenes dataset that can express the benefits of our more compositional model.

Finally, I wanted to mention that, in the part of the report that compares epsilon values, epsilon does not stand for the learning rate; it is a small value added by the optimizer for numerical stability. It was set to 1e-8 in both the StyleGAN2 and GANformer2 repositories. Meanwhile, the learning rate in the GANformer repository is 0.001, in line with the paper. (For the other optimizer settings, beta1 and beta2, where the Supp. says "beta1=0.9, beta1=0.999", I have fixed that to "beta1=0.0, beta2=0.9" to comply with the repository.)
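To make the epsilon vs. learning-rate distinction concrete, here is a short sketch of the Adam settings described above, expressed with tf.keras.optimizers.Adam purely for illustration (the repositories' own optimizer code may differ):

    import tensorflow as tf

    # epsilon is a small constant added by Adam for numerical stability;
    # it is not the learning rate.
    optimizer = tf.keras.optimizers.Adam(
        learning_rate=0.001,  # learning rate used in the GANformer repository / paper
        beta_1=0.0,           # corrected Supp. values (beta1=0.0, beta2=0.9)
        beta_2=0.9,
        epsilon=1e-8,         # stability constant, as in StyleGAN2 and GANformer
    )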

Thanks again for looking into the paper! If you'd like to chat about the paper or have any thoughts, questions or feedback, please don't hesitate to contact me at [email protected]. Wishing you all the best,
Drew
