
Comments (9)

pender commented on July 21, 2024

Terrific -- thank you so much!

from stylegan-encoder.

pbaylies commented on July 21, 2024

Hi @pender - my goal with this encoder is to encode faces well into the latent space of the model; by constraining the total output to [1, 512], it's tougher to get a very realistic face without artifacts. Because the dlatents run from coarse to fine, it's possible to mix them for more variation and finer control over details, which NVIDIA does in the original paper. In my experience, an encoder trained like this does a good job of generating smooth and realistic faces, with fewer artifacts than the original generator.
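The coarse-to-fine mixing mentioned above can be sketched in a few lines (a minimal NumPy stand-in, not the repo's actual code; the split point of 8 is an arbitrary choice):

```python
import numpy as np

# Two dlatents of shape (18, 512), standing in for the mapping-network
# outputs of two different faces.
dlatent_a = np.random.randn(18, 512)
dlatent_b = np.random.randn(18, 512)

# Style mixing: take the coarse layers (pose, face shape) from A and the
# fine layers (color, texture) from B. NVIDIA's paper crosses over at
# various layer indices; 8 is just one possible split.
split = 8
mixed = np.concatenate([dlatent_a[:split], dlatent_b[split:]], axis=0)
```

Feeding `mixed` to the synthesis network would give a face with A's structure and B's fine details.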

I am open to having [1, 512] as an option for building a model, but not as the only option, because I don't believe it will ultimately perform as well for encoding as using the entire latent space -- but it will surely train faster!


pender commented on July 21, 2024

Aha! Thank you for clarifying. I had a suspicion I might have this wrong when the composite face looked closer to the original than the faces generated by the individual layers.


pender commented on July 21, 2024

Hi @pbaylies, would it be a lot of work to add a flag (or to just indicate to me how) to build an effnet to output a [1, 512] dlatent? I've been staring at the effnet code for a while and I'm not sure how to do it. I can handle changing the assembly of training data but would much appreciate a pointer in correctly tweaking the effnet's architecture itself if you have a minute or two.
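For what it's worth, a hypothetical sketch of the kind of head change involved (this is not the actual train_effnet.py architecture, and the input shape and backbone choice here are assumptions): the stock encoder ends in a dense head sized for an (18, 512) output, so predicting a single [1, 512] dlatent mostly means shrinking that head.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import EfficientNetB0  # stand-in backbone

# Hypothetical: swap the (18*512)-unit head for a 512-unit one and
# reshape to (1, 512) instead of (18, 512).
base = EfficientNetB0(include_top=False, weights=None,
                      pooling='avg', input_shape=(256, 256, 3))
x = layers.Dense(512)(base.output)   # was Dense(18 * 512)
out = layers.Reshape((1, 512))(x)
model = Model(base.input, out)
```

The training data would then need single-row dlatents as targets, as you mention.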


pender commented on July 21, 2024

@pbaylies - Right, I totally get that point with respect to the encoder, which optimizes all 18 layers of the dlatent tensor based on the perceptual loss between the generated image and the original image. My question is specifically about the inverse network that you can train via train_effnet.py or train_resnet.py. When you train that network, its training targets are exclusively outputs of the StyleGAN mapping network, so it is training to match dlatent tensors where all 18 rows are the same for each data point. In other words, it never receives a training signal that would make the differences between the 18 layers meaningful, so any failure of the rows to match would just be noise from the training. Am I misunderstanding how the training works?
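To spell out the premise here (a minimal NumPy sketch, not the repo's code): a single mapping-network output w has shape (1, 512), and a dlatent built from it alone is just that row tiled 18 times, so every row is identical.

```python
import numpy as np

# One mapping-network output, shape (1, 512).
w = np.random.randn(1, 512)

# Without style mixing, the (18, 512) dlatent is w repeated for each of
# the 18 synthesis layers -- all rows identical.
dlatent = np.tile(w, (18, 1))
```

If every training target looked like this, the encoder would indeed have no signal distinguishing the 18 rows.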


pbaylies commented on July 21, 2024

@pender -- take a close look at where I generate the dataset in generate_dataset_main(): I purposely get extra values from the mapping network and mix them in to generate more diverse faces. That's what all the mod_l / mod_r stuff relates to -- generating more diverse dlatents for training.
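The idea can be sketched like this (a simplified stand-in; the real mod_l / mod_r logic in generate_dataset_main() differs in its details): splice two tiled mapping outputs at a random layer, so the training targets contain dlatents whose rows genuinely differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two mapping-network outputs, each tiled to (18, 512).
w1 = np.tile(rng.standard_normal((1, 512)), (18, 1))
w2 = np.tile(rng.standard_normal((1, 512)), (18, 1))

# Splice at a random crossover layer: rows 0..cut-1 come from w1,
# rows cut..17 from w2, so the rows are no longer all identical.
cut = int(rng.integers(1, 18))
mixed = np.concatenate([w1[:cut], w2[cut:]], axis=0)
```

Training on targets like `mixed` gives the inverse network a signal that makes per-layer differences meaningful.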


pbaylies commented on July 21, 2024

@pender Cheers; I should probably document that section better / at all... :)


pbaylies commented on July 21, 2024

Hi @pender -- I had tested out a simplified version of this code for that purpose, I'll post something for you tomorrow!


pbaylies commented on July 21, 2024

Here you go @pender -- see if this works for you!

train_eff512.py.zip

