
faceswap-model's Introduction

Notice: This Repo is now in Read Only mode. Information here is likely to be increasingly outdated. Discussion about Faceswap usage should be redirected to our Forums at https://www.faceswap.dev

faceswap-model

If you are interested in generative networks and autoencoders, this is the place where you can discuss them with others. The "official" code is in the faceswap repo, but feel free to propose alternatives here.


faceswap-model's Issues

Resources for image refinement

Histogram loss function.

tf.histogram_fixed_width is not differentiable, so it cannot be used in a loss function.

I created a differentiable version of the image histogram.

def tf_image_histogram(tf, input):
    # Differentiable approximation of a 256-bin image histogram.
    # `tf` is passed in so the function works with the caller's TensorFlow module.
    x = input
    x += 1 / 255.0

    output = []
    for i in range(256, 0, -1):
        v = i / 255.0
        # Soft step: pixels >= v map to ~1, pixels below map towards 0.
        y = (x - v) * 1000
        y = tf.clip_by_value(y, -1.0, 0.0) + 1

        # Count the pixels that fall into this bin...
        output.append(tf.reduce_sum(y))
        # ...and remove them so they are not counted again in lower bins.
        x -= y * v

    return tf.stack(output[::-1])

The result matches tf.histogram_fixed_width.

Combined with a mean squared difference:

hist_loss = tf.reduce_mean(tf.square((tf_image_histogram(tf, y_true) - tf_image_histogram(tf, y_pred)) / 65536))

With this loss, the network finds, starting from noise, the nearest set of pixels that reproduces the same histogram.

So how can this be used to train towards a target image histogram?
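
Not an official answer, but here is a minimal sketch of one way the histogram term could be combined with an ordinary reconstruction loss in a Keras-style custom loss. The hist_weight factor is a hypothetical tuning knob, not a value from the repo, and the sketch assumes tf_image_histogram as defined above.

import tensorflow as tf

def combined_loss(hist_weight=0.1):
    # hist_weight is a hypothetical weighting factor and would need tuning.
    def loss(y_true, y_pred):
        # Ordinary per-pixel L1 reconstruction term.
        recon = tf.reduce_mean(tf.abs(y_true - y_pred))
        # Differentiable histogram term, as defined above.
        hist_true = tf_image_histogram(tf, y_true)
        hist_pred = tf_image_histogram(tf, y_pred)
        hist = tf.reduce_mean(tf.square((hist_true - hist_pred) / 65536))
        return recon + hist_weight * hist
    return loss

# autoencoder.compile(optimizer='adam', loss=combined_loss())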

Suggestion: Focal length of source images

I have taken a look at some results and think that the focal length of the source images would make a good feature to enhance the results of the face swap.

Extracting the focal length from the training images is not trivial, but it could also be achieved by means of learning, as in deepfocal.

GAN Discriminator loss wildly variant - expected?

I was trying the GAN model, and the loss values for the discriminator (i.e., Loss_DA and Loss_DB) fluctuate a lot. Every iteration they jump randomly to a value somewhere in the range from under 0.01 to 0.25. I know it's essentially a binary decision, so more random variation from run to run might be expected, but this seems somewhat extreme, and it makes me worry that I'm not getting useful derivatives.

Is this normal? Should I adjust some hyperparameters? (Like batch size, for instance? It's currently 64...)

I should note that I've only been running the training for a few hours thus far, so it's possible things are working as they should - the variation just seemed concerning. The generator seems to be more stable, and has loss values which are generally trending downward, which is good.

(BTW, I'm not sure if this is the right place to post general "discussion" stuff like this - please let me know if there's some other forum / reddit / whatever I should go to for similar questions!)

Paper : Boundary-Aware Face Alignment Algorithm

https://wywu.github.io/projects/LAB/LAB.html

Abstract:

We present a novel boundary-aware face alignment algorithm by utilising boundary lines as the geometric structure of a human face to help facial landmark localisation. Unlike the conventional heatmap based method and regression based method, our approach derives face landmarks from boundary lines which remove the ambiguities in the landmark definition

[Featured] Deepfakes uses in the wild

Even though deepfakes has gained a lot of attention for the wrong reasons, it is also known to be used for movie faking. If you find interesting use cases, please post them here.

[Edited for content]

[Google Research Blog] Mobile Real-time Video Segmentation

https://research.googleblog.com/2018/03/mobile-real-time-video-segmentation.html

The current GAN model has a known issue: it sometimes produces video with noticeable flickering. Although we've tried to solve it by applying a moving average of the bounding box coordinates and a better tracking model, there is still room for improvement. The method described in this blog post, which achieves frame-to-frame continuity, sheds light on how we can tackle this problem in future work.

This blog post introduces a segmentation network for mobile applications; one of its requirements/constraints is:

A video model should leverage temporal redundancy (neighboring frames look similar) and exhibit temporal consistency (neighboring results should be similar).

They achieve frame-to-frame temporal continuity by concatenating the previous frame's mask to the input channels:

we first pass the computed mask from the previous frame as a prior by concatenating it as a fourth channel to the current RGB input frame to achieve temporal consistency,

There are also data augmentation techniques for ground truth masks described in Training Procedure.
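
A minimal Keras sketch of the idea (not the blog's actual network): the model takes a four-channel input where the fourth channel is the mask predicted for the previous frame. The layer sizes and input resolution below are placeholder assumptions.

import numpy as np
from keras.layers import Input, Conv2D
from keras.models import Model

# Four-channel input: RGB of the current frame plus the mask predicted for the previous frame.
inp = Input(shape=(256, 256, 4))
x = Conv2D(32, 3, padding='same', activation='relu')(inp)
mask_out = Conv2D(1, 1, padding='same', activation='sigmoid')(x)  # per-pixel mask
model = Model(inp, mask_out)

# At inference time, feed the previous prediction back in as the fourth channel.
prev_mask = np.zeros((1, 256, 256, 1))  # first frame: no prior mask
# for frame in video_frames:                      # frame shape: (1, 256, 256, 3), hypothetical
#     prev_mask = model.predict(np.concatenate([frame, prev_mask], axis=-1))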

Use OpenCL for compatibility with AMD GPUs

I know the project uses TensorFlow, which uses CUDA, so for now it is exclusive to NVIDIA GPUs.

Is there a way to use another framework that would use OpenCL to make it work on AMD GPUs and maybe Intel integrated GPUs?

Or is it completely unimaginable?

Survey of Model Improvement Proposals

I'm creating a list to organize my own thoughts, but I would love to hear everyone else's ideas and suggestions, as well as what they're looking at.

Ways to improve the faceswap model ( accuracy, speed, robustness to outliers, model convergence ):

  1. Improved face detection options ( Current code: dlib/CNN+mmod )
     • MTCNN
     • Mask R-CNN ( would also provide a pixel-by-pixel semantic mask for the face )
     • YOLOv2
  2. Improved face recognition ( Current code: ageitgey/face_recognition )
  3. Improved face alignment ( Current code: 1adrianb/face-alignment )
  4. Add Batch Normalization after Conv and Dense layers in the autoencoder ( Current code: no norm )
  5. Replace the rectifier in the conv layers with more robust alternatives ( Current code: Leaky ReLU )
     • PReLU
     • ELU
     • PELU
  6. Adjust the learning rate when the user changes the batch size ( Current code: no LR scaling with batch size ); see the sketch after this list
     • Lots of papers, but simply: with a larger batch you can afford to explore larger step sizes and train more quickly with the same model stability
     • Apply a linear scaling factor to the learning rate if the batch size is increased ( doubling the batch size doubles the learning rate )
  7. Explore using other optimizers in the autoencoder ( Current code: Adam )
     • SGD with momentum
     • Cyclical Learning Rate
     • L4Adam
     • YellowFin
  8. Effective learning rate schedule ( Current code: no adjustment of LR after the start or at stagnation )
     • Best practice is to lower the learning rate or raise the batch size either at set intervals or after the loss has stagnated ( Note: for a constant mini-batch size of, say, 16, which is limited by GPU memory, you can still increase the effective batch size, e.g. by running on multiple GPUs or sequential GPU runs )
     • https://openreview.net/pdf?id=B1Yy1BxCZ
     • Also related: restarting the model at set intervals with a higher LR to kick it out of a local minimum
     • https://arxiv.org/pdf/1608.03983.pdf
  9. Initial learning rate ( Current code: 5e-5 )
     • Dependent on model architecture ( normalization, batch size, regularization, better rectifiers and optimizers allow you to increase the LR with the same stability/accuracy )
     • Suspect it is too low, but the current model has few of the tweaks which promote training stability
     • Also highly dependent on the default batch size
  10. Use keras.preprocessing.image.ImageDataGenerator ( Current code: random_transform and others )

Threw in a lot, but I can add more if anyone ever looks at this. PS - I was looking at 4/5/7/9 with a re-write of the Original_Model.
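
A minimal sketch of the linear learning-rate scaling rule from item 6, assuming the repo's current default LR of 5e-5 and a hypothetical reference batch size of 64 (the reference batch size is an assumption for illustration, not a value from the code):

from keras.optimizers import Adam

BASE_LR = 5e-5      # current default learning rate
BASE_BATCH = 64     # hypothetical reference batch size

def scaled_adam(batch_size):
    # Linear scaling rule: doubling the batch size doubles the learning rate.
    return Adam(lr=BASE_LR * batch_size / BASE_BATCH)

# optimizer = scaled_adam(batch_size=128)  # 2x batch -> 2x learning rate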

Training on 1 face only

Hi guys, I'm wondering about the possibility of training the models on 1 face only instead of 2.

The current workflow makes it mandatory to train a model for a specific src/target pair, which is not convenient. If training could be done with just one face, then once a model is trained it would be much more shareable, no?

What do you think?

How does the loss minimize if autoencoder_B keeps changing the weights learned by autoencoder_A?

The training of the single encoder and two decoders happens as in the following simplified code.

self.encoder = self.Encoder()
self.decoder_A = self.Decoder()
self.decoder_B = self.Decoder()
...
# Both autoencoders share the same encoder but each has its own decoder.
self.autoencoder_A = KerasModel(x, self.decoder_A(self.encoder(x)))
self.autoencoder_B = KerasModel(x, self.decoder_B(self.encoder(x)))

for i in range(epochs):
    self.autoencoder_A.train_on_batch(...)
    self.autoencoder_B.train_on_batch(...)  # doesn't this reset the encoder weights?

My understanding is that training autoencoder_A does not change the weights of autoencoder_B's decoder, but it does change the weights of the encoder, since the encoder is shared. Please correct me if I am wrong.

How does the loss get minimized if the two autoencoders alternately change the weights of the shared encoder?
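
Not an authoritative answer, but one way to see what actually happens is to build a tiny stand-in model (hypothetical layer sizes, not the repo's architecture) and check which weights move after a single train_on_batch call. Training autoencoder_A updates the shared encoder and decoder_A only; decoder_B is untouched, and since both tasks benefit from a good shared representation, the alternating updates still tend to drive both losses down in practice.

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# Tiny stand-in: one shared encoder layer, two decoder heads (hypothetical sizes).
inp = Input(shape=(8,))
enc = Dense(4, name='shared_encoder')(inp)
out_A = Dense(8, name='decoder_A')(enc)
out_B = Dense(8, name='decoder_B')(enc)

autoencoder_A = Model(inp, out_A)
autoencoder_B = Model(inp, out_B)
autoencoder_A.compile('adam', 'mae')
autoencoder_B.compile('adam', 'mae')

batch = np.random.rand(16, 8)
dec_B_before = autoencoder_B.get_layer('decoder_B').get_weights()[0].copy()
enc_before = autoencoder_A.get_layer('shared_encoder').get_weights()[0].copy()

autoencoder_A.train_on_batch(batch, batch)

# decoder_B is unchanged, while the shared encoder has moved.
print(np.allclose(dec_B_before, autoencoder_B.get_layer('decoder_B').get_weights()[0]))  # True
print(np.allclose(enc_before, autoencoder_A.get_layer('shared_encoder').get_weights()[0]))  # False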

GitHub repos for 3D Face Model

A 3D face model can be incorporated as a post-processing step for the current faceswap model to address sub-optimal performance on profile faces as well as intense facial expressions, for example by matching input/output facial expressions.

1. eos: A lightweight header-only 3D Morphable Face Model fitting library in modern C++11/14.

2. ExpNet: Landmark-Free, Deep, 3D Facial Expressions (Feb. 2018, First Release)

  • Link: ExpNet
  • Description: Pretrained weights are provided. It uses TensorFlow and thus can be easily ported to Keras. It solves the facial-expression limitation of their previous FacePoseNet.
  • Comment: ExpNet can be used as a feature extractor for perceptual loss. But it is "less
    able to handle extreme facial expressions" according to ExpNet paper, fig. 7.

Swapping more than the face

Hi guys, I'm wondering about the possibility of expanding the swapping to more than the face (chin, hair).

Is it possible? Any ideas on that?

Why do two autoencoders need to be trained?

In the training step in faceswap, there are two autoencoders (A and B) trained, one for each of the following tasks:

input face A -> encoder -> base vector A -> decoder A -> output face which resembles A
input face B -> encoder -> base vector B -> decoder B -> output face which resembles B

Then, decoder B is applied to image A, which is what performs the swap:

input face A -> encoder -> base vector A -> decoder B -> output face which resembles B

(If this is wrong, please correct me!)

First question: what does it mean for there to be only a single encoder? Since decoder A is not used in the final conversion, I assume the reason why training autoencoder A is necessary is because it contributes to the encoder. (Is this correct? Does decoder A need to be trained simply because without the decoder, there's no way to define autoencoder A's loss function?) Should I think of the encoder as something that transforms inputs into some kind of shared low-dimensional representation of both A and B...? As you can probably tell from this wording, I'm a bit stuck on this point...

Second question: since there's only a single encoder, does this mean that, during training, I should care about both loss_A and loss_B values (even though I'm only interested in swapping B to A)?

Third question: does it matter which autoencoder is trained first, during each training step? In plugins/Model_Original/Trainer.py:

loss_A = self.model.autoencoder_A.train_on_batch(warped_A, target_A)
loss_B = self.model.autoencoder_B.train_on_batch(warped_B, target_B)

If these two lines were reversed, would the after-the-training-step states of autoencoder_A and autoencoder_B be the same?

Fourth question: I noticed in plugins/Model_Original/Model.py's Decoder function, there's a single Conv2D invoked with activation='sigmoid'. Is there a reason that sigmoid is used here? I see that the encoder uses LeakyReLU instead; why is that?

Sorry for the wall of text; hopefully this is the right place to ask questions like this. If not, could someone point me in the right direction?

Feasibility - use for facial tracking/paint work

Hey all -

I work in the VFX industry, and I was curious - how feasible would code similar to this be for tracking an artist's work (on a single frame) onto a face?

For example, we are often asked to paint elements in/out on an actor's face (add scars, remove acne). From my understanding, the algorithm takes multiple images of each actor and creates a parity model to be applied. Would there be a way to then do a paint job on a single frame, and apply that paint through the tracked model?

Thanks!

-Mike

Face detection algorithms

Following an issue on the main faceswap repo, I tried to benchmark miscellaneous face detection approaches.

Here is the code I used.

The test photo from the issue had a strongly tilted face, which was only detected by dlib's mmod_human_face_detector.

import cv2
from PIL import Image
import face_recognition

# Load the jpg file into a numpy array
image = cv2.imread("src/test.jpg")
# cv2 loads images as BGR, while face_recognition and PIL expect RGB.
rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Find all the faces in the image using the default HOG-based model.
# This method is fairly accurate, but not as accurate as the CNN model and not GPU accelerated.
# See also: find_faces_in_picture_cnn.py
face_locations = face_recognition.face_locations(rgb_image)

print("I found {} face(s) in this photograph.".format(len(face_locations)))

for face_location in face_locations:

    # Print the location of each face in this image
    top, right, bottom, left = face_location
    print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))

    # You can access the actual face itself like this:
    face_image = rgb_image[top:bottom, left:right]
    pil_image = Image.fromarray(face_image)
    pil_image.show()

########################################################################

import cv2
import numpy

# Give right path to the xml file or put it directly in current folder
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

#Add : cv.EqualizeHist(image, image) ?
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect faces in the image
faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(30, 30),
    flags = cv2.CASCADE_SCALE_IMAGE
)
print("I found {} face(s) in this photograph with haarcascade.".format(len(face_locations)))

for (x, y, w, h) in faces:
    face_image = rgb_image[y: y + h, x: x + w]
    pil_image = Image.fromarray(face_image)
    pil_image.show()

########################################################################

import dlib

detector = dlib.get_frontal_face_detector()

dets = detector(image, 1)
print("Number of faces detected: {}".format(len(dets)))
for i, d in enumerate(dets):
    print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
        i, d.left(), d.top(), d.right(), d.bottom()))

########################################################################


detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")

dets = detector(image, 1)
print("Number of faces detected: {}".format(len(dets)))
for i, d in enumerate(dets):
    print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
        i, d.rect.left(), d.rect.top(), d.rect.right(), d.rect.bottom()))
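
Since the goal is a benchmark, here is a hedged sketch of a simple timing wrapper that could be put around each of the detectors above. The commented calls just reuse the variables from the snippets in this post; the number of runs is arbitrary.

import time

def benchmark(name, detect_fn, runs=10):
    # Run each detector several times and report the average wall-clock time per image.
    start = time.time()
    for _ in range(runs):
        results = detect_fn()
    elapsed = (time.time() - start) / runs
    print("{}: {} face(s), {:.3f} s per image".format(name, len(results), elapsed))

# benchmark("HOG (face_recognition)", lambda: face_recognition.face_locations(rgb_image))
# benchmark("Haar cascade", lambda: face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5))
# benchmark("dlib CNN (mmod)", lambda: detector(image, 1))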

Exploring new models

There are many ways to generate faces. If you find some interesting models, discuss them here.

How do I reuse models in other faceswaps?

Let's say I have a well trained faceswap with Person A -> Person B
And now I want another faceswap with Person C -> Person A.

Can I reuse, say, encoder_A.h of model 1 and paste that as encoder_B.h in model 2?

Learning rate, beta1, beta 2 in adam optimizer in models

Does anyone have any background on why the values of the three parameters of the Adam optimizer were chosen? In particular for the learning rate: too low an initial learning rate can produce poor accuracy, while too high an initial learning rate can be unstable and inaccurate too; there is a sweet spot. Higher learning rates also train faster.

Normal ranges suggested in the academic literature are 0.1 to 0.0001; the current setting, 0.00005, is lower than that. PS - the Adam optimizer naturally reduces the effective learning rate anyway over the course of training; with such a low initial starting rate there isn't much room to optimize.

http://openaccess.thecvf.com/content_ICCV_2017/papers/Korshunova_Fast_Face-Swap_Using_ICCV_2017_paper.pdf

A very similar model starts at 0.001 and overrides the Adam optimizer's learning rate at set intervals over training, slowly reducing it to 0.0001.

Is there a potential accuracy gain if we experiment with altering the initial learning rate?
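
A hedged sketch of how the schedule described in that paper could be approximated in faceswap's Keras training loop. The decay interval and floor below are illustrative assumptions, not the paper's exact schedule, and the commented usage assumes the model/variable names from plugins/Model_Original/Trainer.py.

from keras import backend as K

def decay_learning_rate(models, iteration, end_lr=1e-4, decay_every=10000):
    # Every `decay_every` iterations, halve the optimizer's learning rate,
    # never going below `end_lr`. Values are illustrative assumptions.
    if iteration == 0 or iteration % decay_every != 0:
        return
    for model in models:
        current = float(K.get_value(model.optimizer.lr))
        K.set_value(model.optimizer.lr, max(end_lr, current * 0.5))

# Inside the existing training loop (hypothetical wiring):
# decay_learning_rate([self.model.autoencoder_A, self.model.autoencoder_B], iteration)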

[Paper] Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network

Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network
Hai X. Pham, Yuting Wang, Vladimir Pavlovic

https://arxiv.org/pdf/1803.07716.pdf

References:
...
42. Deepfakes. https://github.com/deepfakes/faceswap

Wonder if this is the first time deepfakes got cited in an academic paper.

Edit: Just skimmed through this paper. It proposes a conditional GAN model with an auxiliary classifier "that is capable of synthesizing novel faces from any source portrait given a vector of action unit coefficients". The "vector of facial action unit (AU) intensities" (I don't know exactly what an AU is) is treated as the conditional vector, which is concatenated to the generator's embedding. In addition, a pre-trained AU estimator is introduced for an AU loss (this loss term is basically the same as a perceptual loss).


Problems with GPU training.

I have a 1.5 GB GPU (960M); setting the trainer to LowMem did not help, and combining it with a batch size of 16, 8, 4, or 2 did not improve things much.

I read that you should change ENCODER_DIM. How do I do this?

Also, can I resume training a model after I exit the application?

Improve convert script efficiency

Hello everyone,

I noticed that the training script uses every core I have on my CPU, but the convert script must be using only one, as I see low CPU usage. Because of that, converting a lot of pictures takes a very long time on the CPU.

Is it possible to use multi-threading to speed things up? Or does the way we do the conversion prevent that?
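
Not an authoritative answer, but a minimal sketch of how per-image conversion could be parallelised with Python's multiprocessing. The convert_frame function and the glob pattern are hypothetical placeholders for whatever the convert script does per picture; if the model itself runs on the GPU, each worker would need its own session, so this mainly helps with the CPU-bound parts (alignment, blending, I/O).

from multiprocessing import Pool
import glob

def convert_frame(path):
    # Hypothetical placeholder: load the frame, swap the face, write the result.
    ...

if __name__ == "__main__":
    frames = sorted(glob.glob("input/*.png"))
    with Pool() as pool:  # defaults to one worker process per CPU core
        pool.map(convert_frame, frames)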
