
Comments (9)

pender commented on July 21, 2024

Terrific -- thank you so much!

from stylegan-encoder.

pbaylies commented on July 21, 2024

Hi @pender - my goal with this encoder is to encode faces well into the latent space of the model; by constraining the total output to [1, 512], it's tougher to get a very realistic face without artifacts. Because the dlatents run from coarse to fine, it's possible to mix them for more variation and finer control over details, which NVIDIA does in the original paper. In my experience, an encoder trained like this does a good job of generating smooth and realistic faces, with fewer artifacts than the original generator.
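The coarse-to-fine mixing mentioned above can be sketched in a few lines (a minimal NumPy stand-in, not the repo's actual code; the split point of 8 is an arbitrary choice):

```python
import numpy as np

# Two dlatents of shape (18, 512), standing in for the mapping-network
# outputs of two different faces.
dlatent_a = np.random.randn(18, 512)
dlatent_b = np.random.randn(18, 512)

# Style mixing: take the coarse layers (pose, face shape) from A and the
# fine layers (color, texture) from B. NVIDIA's paper crosses over at
# various layer indices; 8 is just one possible split.
split = 8
mixed = np.concatenate([dlatent_a[:split], dlatent_b[split:]], axis=0)
```

Feeding `mixed` to the synthesis network would give a face with A's structure and B's fine details.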

I am open to having [1, 512] as an option for building a model, but not as the only option, because I don't believe it will ultimately perform as well for encoding as using the entire latent space -- but it will surely train faster!


pender commented on July 21, 2024

Aha! Thank you for clarifying. I had a suspicion I might have this wrong when the composite face looked closer to the original than the faces generated by the individual layers.


pender commented on July 21, 2024

Hi @pbaylies, would it be a lot of work to add a flag (or to just indicate to me how) to build an effnet to output a [1, 512] dlatent? I've been staring at the effnet code for a while and I'm not sure how to do it. I can handle changing the assembly of training data but would much appreciate a pointer in correctly tweaking the effnet's architecture itself if you have a minute or two.
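For what it's worth, a hypothetical sketch of the kind of head change involved (this is not the actual train_effnet.py architecture, and the input shape and backbone choice here are assumptions): the stock encoder ends in a dense head sized for an (18, 512) output, so predicting a single [1, 512] dlatent mostly means shrinking that head.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import EfficientNetB0  # stand-in backbone

# Hypothetical: swap the (18*512)-unit head for a 512-unit one and
# reshape to (1, 512) instead of (18, 512).
base = EfficientNetB0(include_top=False, weights=None,
                      pooling='avg', input_shape=(256, 256, 3))
x = layers.Dense(512)(base.output)   # was Dense(18 * 512)
out = layers.Reshape((1, 512))(x)
model = Model(base.input, out)
```

The training data would then need single-row dlatents as targets, as you mention.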


pender commented on July 21, 2024

@pbaylies - Right, I totally get that point with respect to the encoder, which optimizes all 18 layers of the dlatent tensor based on the perceptual loss between the generated image and the original image. My question is specifically about the inverse network that you can train via train_effnet.py or train_resnet.py. When you train that network, its training targets are exclusively outputs of the StyleGAN mapping network, so it is training to match dlatent tensors where all 18 rows are the same for each data point. In other words, it never receives a training signal that would make the differences between the 18 layers meaningful, so any failure of the rows to match would just be noise from the training. Am I misunderstanding how the training works?
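To spell out the premise here (a minimal NumPy sketch, not the repo's code): a single mapping-network output w has shape (1, 512), and a dlatent built from it alone is just that row tiled 18 times, so every row is identical.

```python
import numpy as np

# One mapping-network output, shape (1, 512).
w = np.random.randn(1, 512)

# Without style mixing, the (18, 512) dlatent is w repeated for each of
# the 18 synthesis layers -- all rows identical.
dlatent = np.tile(w, (18, 1))
```

If every training target looked like this, the encoder would indeed have no signal distinguishing the 18 rows.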


pbaylies commented on July 21, 2024

@pender -- take a close look at where I generate the dataset in generate_dataset_main(): I purposely get extra values from the mapping network and mix them in to generate more diverse faces. That's what all the mod_l / mod_r stuff relates to -- generating more diverse dlatents for training.
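The idea can be sketched like this (a simplified stand-in; the real mod_l / mod_r logic in generate_dataset_main() differs in its details): splice two tiled mapping outputs at a random layer, so the training targets contain dlatents whose rows genuinely differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two mapping-network outputs, each tiled to (18, 512).
w1 = np.tile(rng.standard_normal((1, 512)), (18, 1))
w2 = np.tile(rng.standard_normal((1, 512)), (18, 1))

# Splice at a random crossover layer: rows 0..cut-1 come from w1,
# rows cut..17 from w2, so the rows are no longer all identical.
cut = int(rng.integers(1, 18))
mixed = np.concatenate([w1[:cut], w2[cut:]], axis=0)
```

Training on targets like `mixed` gives the inverse network a signal that makes per-layer differences meaningful.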


pbaylies commented on July 21, 2024

@pender Cheers; I should probably document that section better / at all... :)


pbaylies commented on July 21, 2024

Hi @pender -- I had tested out a simplified version of this code for that purpose, I'll post something for you tomorrow!


pbaylies commented on July 21, 2024

Here you go @pender -- see if this works for you!

train_eff512.py.zip

