Comments (2)
Hi Raymond! We landed on this pose correction loss because we were looking for a way to force all heads to share the same canonical pose. We wanted it so that when you asked for a front-facing image, you got a front-facing image, no matter the identity! This pose correction loss is entirely optional; training with sufficiently low learning rates also seems to keep the head poses stable. However, we noticed that when we tried to speed up convergence, we'd encounter pose drift.
So this pose correction loss works in two parts. In part one, we teach the discriminator to recognize poses. Whenever we feed the discriminator a generated image, we have it predict the pose, and we simply minimize the error between that prediction and the pose the image was generated with. If all goes to plan, the discriminator will be able to tell us the pose the image appears to have.
In part two, we update the generator based both on the quality of the image (as you do in standard GAN training) and on how closely the image appears to align with the correct pose. We have the ground truth pose that's fed to the generator. After part one, we also have a trained discriminator that can tell us the pose the image appears to have. Part two is just minimizing the distance between the two, i.e. making sure the poses our generated images were rendered from match the poses they appear to have.
The nice thing about this strategy is that it doesn't require any actual poses from the real images and yet it still seems to stop (or at least slow) pose drift.
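To make the two parts concrete, here is a minimal sketch in plain Python. It is not the repo's actual code: the function names, the squared-error distance, and the `weight` parameter are all assumptions for illustration, and poses are treated as simple (vertical, horizontal) angle pairs in radians.

```python
import math

def pose_mse(predicted, target):
    """Mean squared error between two (vertical, horizontal) angle pairs."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def discriminator_pose_loss(pred_pose_on_fake, sampled_pose, weight=1.0):
    # Part one (hypothetical form): the discriminator is penalized when the
    # pose it predicts for a generated image differs from the pose that was
    # fed to the generator.
    return weight * pose_mse(pred_pose_on_fake, sampled_pose)

def generator_loss(adv_loss, pred_pose_on_fake, sampled_pose, weight=1.0):
    # Part two (hypothetical form): the usual adversarial term plus a penalty
    # on the gap between the pose the image appears to have (according to the
    # discriminator) and the pose it was generated with.
    return adv_loss + weight * pose_mse(pred_pose_on_fake, sampled_pose)

# The generator was asked for a frontal view, but the image looks slightly turned.
frontal = (math.pi / 2, math.pi / 2)
apparent = (math.pi / 2, math.pi / 2 + 0.2)

print(discriminator_pose_loss(apparent, frontal))
print(generator_loss(0.7, apparent, frontal, weight=10.0))
```

Note that neither function touches a real image's pose; both only compare the pose given to the generator with the discriminator's prediction on the generated image.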
Hope this helps!
Eric
from pi-gan.
Hi, Eric!
Thanks so much for your response! It really clarifies the mechanism once the SIREN has learned a good canonical 3D representation of the head.
However, I am still curious about the mechanism of this strategy at the early stage of training. Specifically, how does a Canonical Head "emerge" in the SIREN?
At the very beginning, the SIREN only generates noise, so it clearly doesn't make sense to teach the discriminator to predict a position for noise. As the iterations go by, the GAN loss pushes the SIREN to gradually generate coarse, blurry, yet promising head images through volume rendering, and the discriminator gradually learns to "assign" a position to these generated head images.
Suppose these early, numerically frontal head images (v=pi/2 and h=pi/2) are semantically tilted at first. In that case, I believe the discriminator would fail to correct them into semantically frontal faces, since it would already have built a connection between the semantically tilted faces and the numerically frontal values (v=pi/2 and h=pi/2). In practice, however, this isn't the case. The semantically frontal faces match the numerically frontal positions with only small errors, which disappear as training goes on.
After checking the whole pipeline, I think it may be related to the position sampling strategy and the dataset: the sampled vertical and horizontal positions are centered at pi/2, and most faces in the dataset (say, "CelebA") are frontal or close to frontal.
Thus, at the early stage of training, when we feed real images into the discriminator, it learns to treat frontal faces as more realistic than tilted ones, and the SIREN is encouraged to generate frontal or near-frontal faces at the sampled positions. Since the sampled positions are centered around pi/2, the SIREN gradually learns to construct frontal faces, i.e. canonical faces, at position pi/2. However, since we are also actively sampling other positions and requiring the SIREN to generate realistic images from different views, a 3D model of the head is formed instead of a 3D plane.
This is just my preliminary thought. Can you tell me whether it is correct? If not, can you explain why a Canonical Head is guaranteed to "emerge" in the SIREN at the early stage instead of a Tilted Head or a 3D plane, i.e. a mirror? Thanks a lot!
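As a rough illustration of the sampling argument, the sketch below draws poses centered at pi/2 and checks how many land close to frontal. The Gaussian shape, the standard deviations, and the 0.3 rad "near-frontal" threshold are assumptions chosen for the example, not values taken from the pi-GAN code.

```python
import math
import random

def sample_pose(rng, v_stddev=0.15, h_stddev=0.3, mean=math.pi / 2):
    """Draw one (vertical, horizontal) pose; the spreads are hypothetical."""
    v = rng.gauss(mean, v_stddev)
    h = rng.gauss(mean, h_stddev)
    return v, h

rng = random.Random(0)  # fixed seed so the run is repeatable
poses = [sample_pose(rng) for _ in range(10000)]

mean_v = sum(v for v, _ in poses) / len(poses)
mean_h = sum(h for _, h in poses) / len(poses)

# Fraction of samples whose horizontal angle is within 0.3 rad of frontal.
near_frontal = sum(abs(h - math.pi / 2) < 0.3 for _, h in poses) / len(poses)

print(mean_v, mean_h, near_frontal)
```

Under these assumptions most sampled views sit near pi/2, where a mostly frontal dataset rewards frontal-looking renders, while the tails of the distribution still require plausible side views.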