GAN for trying different hairstyles
Replicating AttGAN arch: https://arxiv.org/pdf/1711.10678.pdf
CelebA dataset attributes:
We will try different hair colours
Our first approach is to use computer vision to color the hair directly by applying some noise.
Our second approach is to train a hair-color recognizer and build a GAN that takes a face image and converts the hair to the desired color. We verify the result with the recognizer; the loss is simply binary_crossentropy (checking that the colors match).
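The verification idea above can be sketched as a plain binary cross-entropy between the requested color vector and the recognizer's predicted probabilities. The function name and array format here are assumptions for illustration, not the project's actual code:

```python
import numpy as np

def hair_color_bce(requested, predicted, eps=1e-7):
    """Binary cross-entropy between the requested color (0/1 per channel)
    and the recognizer's predicted probabilities for the translated image."""
    predicted = np.clip(predicted, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(requested * np.log(predicted)
                          + (1 - requested) * np.log(1 - predicted)))
```

A translation whose recognized color matches the request gets a low loss; a mismatch gets a high one.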
The previous post I made was unsuccessful, mainly because CycleGAN (and Augmented CycleGAN) limits us to two domains. Instead, we focus on a single conditional GAN.
The implementation is not working... see the second implementation for reference.
So while we wait for the models to train, we will now focus on creating a web app/some medium to interact with model(s).
We want to run this model so that video output (whether a recording or real time) gives the translated form. Since we only trained on frontal face images, we do not have the flexibility to handle side or back views, which limits the robustness of our model.
Suppose we have an image of a person with blond hair. If our requested attributes have a -1 for blond hair, what does that mean for the translated image?
We have been interpreting -1 to say we should get rid of the blond hair, but this is ambiguous: we don't know what hair the translated result should have. Instead, the requested attributes should keep the blond hair (i.e. set it to 1) UNLESS another hair color was requested.
For example:
Bald image with requested attributes that change no hair color (keep Bald) => Bald image
Bald image with requested attributes that change the hair color => image with that hair color
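The rule above can be sketched as a small helper: keep the source's hair color set to 1 unless the request explicitly turns on a different hair color. The dict-based attribute format and `HAIR_COLORS` list are assumptions for illustration, not the actual training format:

```python
# Hair-related CelebA attribute names (Bald included, since it behaves
# like a hair "color" for this rule).
HAIR_COLORS = ["Black_Hair", "Blond_Hair", "Brown_Hair", "Gray_Hair", "Bald"]

def resolve_hair_request(source_attrs, requested_attrs):
    """Return requested attributes where a -1 on the source's hair color
    means 'keep it', unless another hair color was requested (+1)."""
    out = dict(requested_attrs)
    # Did the request turn on a hair color the source doesn't already have?
    wants_new_color = any(requested_attrs.get(c, -1) == 1
                          for c in HAIR_COLORS
                          if source_attrs.get(c, -1) != 1)
    if not wants_new_color:
        # No new color requested: keep whatever the source already has.
        for c in HAIR_COLORS:
            if source_attrs.get(c, -1) == 1:
                out[c] = 1
    return out
```

This makes the -1 case unambiguous: the generator is never asked to produce "not blond" with no replacement color.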
Revelation: the classifier should not be trained on generated attributes, since the generator could be mistaken too.
I really cannot tell what a person looks like without glasses. Make this a feature!
Performs many-to-many mappings between TWO domains. For example, the edges of a shoe image map to multiple differently colored shoes instead of just one (as in CycleGAN).
This will not work when we have multiple domains to consider!
Running the decoder produces low-resolution (128x128) predictions. We need to restore the original resolution.
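One low-cost way to restore resolution is plain nearest-neighbour upsampling of the 128x128 output back toward the original size; a real pipeline would more likely use bilinear resizing (e.g. `cv2.resize`) or blend the generated hair region back onto the full-resolution original. A minimal numpy sketch:

```python
import numpy as np

def upsample_nearest(img, factor):
    """Nearest-neighbour upsampling: repeat each pixel `factor` times
    along height and width. `img` has shape (H, W, C)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)
```

For a non-integer target size, resize with interpolation instead; this sketch only covers integer factors.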
After training the model, it correctly predicts the value of Bald for original images, but always predicts Bald for translated images. That matches our goal, since we set the new attributes to always be Bald. However, the images do not look bald! This might be due to having too small a dataset, so the model identifies Bald by a different feature from what we understand as bald.
Observe in the bald row that there actually isn't much baldness, so this might in fact be what the model learned "bald" to be...
Lesson: verify that the output is what we expect based on the original model.
Problem: given 40 binary features, we have 2^40 (approx. 1 trillion) combinations... far too many types to train on.
Instead, we can train on one feature at a time and encode its label as the index at which the feature appears. For example, if blond hair is the 27th property, then we use the label 27.
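The index encoding above amounts to using an attribute's column position in the CelebA attribute list as its integer label. A minimal sketch (the names shown are the first few of the 40 CelebA attributes; the truncated list is an illustration):

```python
# First few CelebA attribute names, in their standard column order.
ATTRIBUTES = ["5_o_Clock_Shadow", "Arched_Eyebrows", "Attractive"]  # ... 40 total

def attribute_label(name, attributes=ATTRIBUTES):
    """Return the integer label for an attribute: its column index."""
    return attributes.index(name)
```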
For training, since we have multiple features per image, we can train on one feature at a time. For example, suppose we are only interested in turning any person's hair blond; then we can simply ignore the other features and look only at the blond-hair column for all the inputs. Once the generator becomes good at generating blond hair, we can move on to a different feature.
Problem 2: what happens if we have an image that doesn't have blond hair? Then we have a -1 property value, and what would the generator generate? With digits we could get away with this, since every input value is something we can generate, but "non-blond hair" isn't. At the very least, we should define it more clearly to be, for example, black hair as the default.
Since hair colors are mutually exclusive, every image in the dataset is true for one of those values. So we should find the corresponding index for the hair color the image actually has, so the generator trains on a real hair color (instead of potentially generating gibberish "non-blond" hair).
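The fix above can be sketched as a lookup: instead of conditioning on "-1 blond", find which mutually exclusive hair-color column is +1 for this image and use that column's index as the label. The indices below follow the standard CelebA attribute order, but treat them as an assumption and check against your copy of the attribute file:

```python
# Hair-color attribute names -> CelebA column indices (assumed standard order).
HAIR_COLOR_INDICES = {"Black_Hair": 8, "Blond_Hair": 9,
                      "Brown_Hair": 11, "Gray_Hair": 17}

def true_hair_label(attr_row):
    """Given one image's attribute dict of -1/+1 values, return the index
    of the hair color it actually has, or None if none is set."""
    for name, idx in HAIR_COLOR_INDICES.items():
        if attr_row.get(name, -1) == 1:
            return idx
    return None
```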
Problem 3: with digits, our cGAN simply took a latent vector and a label, fed them to the generator, and output (ideally) a synthetic representation of the label. Since we are working with images, however, what do we expect the generator to output exactly? If we feed it the label "blond hair", then I expect the model to output the input image with blond hair! So what is the discriminator outputting? Recall that the discriminator determines whether an output is real or fake (it is not interested in the actual value itself). With digits, we pass an image and its corresponding label to the discriminator and return 0 (fake) or 1 (real). The discriminator learns by being penalized for thinking the image is fake when in fact it is real, or vice versa.
Similarly, with face images, we pass the transformed (or original) face image and the corresponding label (property index) to the discriminator and return 0 (fake) or 1 (real). And similarly, the discriminator learns by being penalized for thinking the image is fake when in fact it is real, or when the image is transformed (say, a black-haired person is not a blond-haired person) and the discriminator thinks it is real.
So, when we pass real images, we expect the discriminator to output 1; when we pass fake images, we expect it to output 0. The generator learns by being penalized when it fails to create well-represented images, i.e. when the discriminator catches them as fake.
Summary of training:
input: face image + label (property index)
output: 0 or 1 for fake or real
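The training summary above can be sketched as batch construction for the conditional discriminator: real image + matching label targets 1, generated image + requested label targets 0, and (per the transformed-image case above) real image + wrong label also targets 0 so the discriminator checks attribute consistency. Function and argument names are illustrative:

```python
def discriminator_batch(real_pairs, fake_pairs, mismatched_pairs):
    """Each argument is a list of (image, label) tuples; returns
    (image, label, target) triples for one discriminator training step."""
    batch = []
    batch += [(img, lab, 1) for img, lab in real_pairs]        # real, matching label
    batch += [(img, lab, 0) for img, lab in fake_pairs]        # generated image
    batch += [(img, lab, 0) for img, lab in mismatched_pairs]  # real, wrong label
    return batch
```

In a real training loop these triples would be stacked into arrays and fed to the discriminator under a binary cross-entropy loss.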
One last problem... with digits we have mutual exclusiveness; now we don't. So the discriminator might initially learn to believe these classes are mutually exclusive (this might not be true, but I'm just considering it).