
faceswap-model's Introduction

Notice: This Repo is now in Read Only mode. Information here is likely to be increasingly outdated. Discussion about Faceswap usage should be redirected to our Forums at https://www.faceswap.dev

faceswap-model

If you are interested in generative networks and autoencoders, this is the place where you can discuss them with others. The "official" code is in the faceswap repo, but feel free to propose alternatives here.


faceswap-model's Issues

Resources for image refinement

Histogram loss function.

tf.histogram_fixed_width is not differentiable, so it cannot be used in a loss function.

I created a differentiable version of the image histogram.

def tf_image_histogram(tf, input):
    # Differentiable approximation of a 256-bin image histogram.
    # `tf` is passed in so the function works with the caller's TensorFlow module.
    x = input
    x += 1 / 255.0

    output = []
    for i in range(256, 0, -1):
        v = i / 255.0
        # Soft step: pixels >= v map to ~1, pixels below map towards 0.
        y = (x - v) * 1000
        y = tf.clip_by_value(y, -1.0, 0.0) + 1

        # Count the pixels that fall into this bin...
        output.append(tf.reduce_sum(y))
        # ...and remove them so they are not counted again in lower bins.
        x -= y * v

    return tf.stack(output[::-1])

The result matches tf.histogram_fixed_width.

Combined with a mean squared difference:

hist_loss = tf.reduce_mean(tf.square((tf_image_histogram(tf, y_true) - tf_image_histogram(tf, y_pred)) / 65536))

With this loss, the network finds, starting from noise, the nearest set of pixels that reproduces the same histogram.

So how can this be used to train towards a target image histogram?
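
Not an official answer, but here is a minimal sketch of one way the histogram term could be combined with an ordinary reconstruction loss in a Keras-style custom loss. The hist_weight factor is a hypothetical tuning knob, not a value from the repo, and the sketch assumes tf_image_histogram as defined above.

import tensorflow as tf

def combined_loss(hist_weight=0.1):
    # hist_weight is a hypothetical weighting factor and would need tuning.
    def loss(y_true, y_pred):
        # Ordinary per-pixel L1 reconstruction term.
        recon = tf.reduce_mean(tf.abs(y_true - y_pred))
        # Differentiable histogram term, as defined above.
        hist_true = tf_image_histogram(tf, y_true)
        hist_pred = tf_image_histogram(tf, y_pred)
        hist = tf.reduce_mean(tf.square((hist_true - hist_pred) / 65536))
        return recon + hist_weight * hist
    return loss

# autoencoder.compile(optimizer='adam', loss=combined_loss())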

Suggestion: Focal length of source images

I have taken a look at some results and think that the focal length of the source images would make a good feature to enhance the results of the face swap.

Extracting the focal length from the training images is not trivial, but it could also be achieved by means of learning, as in deepfocal.

GAN Discriminator loss wildly variant - expected?

I was trying the GAN model, and the loss values for the discriminator (i.e., Loss_DA and Loss_DB) fluctuate a lot. Every iteration they jump randomly to a value somewhere in the range from under 0.01 to 0.25. I know it's essentially a binary decision, so more random variation from run to run might be expected, but this seems somewhat extreme, and it makes me worry that I'm not getting useful derivatives.

Is this normal? Should I adjust some hyperparameters? (Like batch size, for instance? It's currently 64...)

I should note that I've only been running the training for a few hours thus far, so it's possible things are working as they should - the variation just seemed concerning. The generator seems to be more stable, and has loss values which are generally trending downward, which is good.

(BTW, I'm not sure if this is the right place to post general "discussion" stuff like this - please let me know if there's some other forum / reddit / whatever I should go to for similar questions!)

Paper : Boundary-Aware Face Alignment Algorithm

https://wywu.github.io/projects/LAB/LAB.html

Abstract:

We present a novel boundary-aware face alignment algorithm by utilising boundary lines as the geometric structure of a human face to help facial landmark localisation. Unlike the conventional heatmap based method and regression based method, our approach derives face landmarks from boundary lines which remove the ambiguities in the landmark definition

[Featured] Deepfakes uses in the wild

Even though deepfakes has gained a lot of attention for the wrong reasons, it is also known to be used for movie faking. If you find interesting use cases, please post them here.

[Edited for content]

[Google Research Blog] Mobile Real-time Video Segmentation

https://research.googleblog.com/2018/03/mobile-real-time-video-segmentation.html

The current GAN model has a known issue: it sometimes produces video with noticeable flickering. Although we've tried to solve it by applying a moving average of the bounding box coordinates and a better tracking model, there is still room for improvement. The method described in this blog post, which achieves frame-to-frame continuity, sheds light on how we can tackle this problem in future work.

This blog post introduces a segmentation network for mobile applications; one of its requirements/constraints is:

A video model should leverage temporal redundancy (neighboring frames look similar) and exhibit temporal consistency (neighboring results should be similar).

They achieve frame-to-frame temporal continuity by concatenating the previous frame's mask to the input channels:

we first pass the computed mask from the previous frame as a prior by concatenating it as a fourth channel to the current RGB input frame to achieve temporal consistency,

There are also data augmentation techniques for ground truth masks described in Training Procedure.
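
A minimal Keras sketch of the idea (not the blog's actual network): the model takes a four-channel input where the fourth channel is the mask predicted for the previous frame. The layer sizes and input resolution below are placeholder assumptions.

import numpy as np
from keras.layers import Input, Conv2D
from keras.models import Model

# Four-channel input: RGB of the current frame plus the mask predicted for the previous frame.
inp = Input(shape=(256, 256, 4))
x = Conv2D(32, 3, padding='same', activation='relu')(inp)
mask_out = Conv2D(1, 1, padding='same', activation='sigmoid')(x)  # per-pixel mask
model = Model(inp, mask_out)

# At inference time, feed the previous prediction back in as the fourth channel.
prev_mask = np.zeros((1, 256, 256, 1))  # first frame: no prior mask
# for frame in video_frames:                      # frame shape: (1, 256, 256, 3), hypothetical
#     prev_mask = model.predict(np.concatenate([frame, prev_mask], axis=-1))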

Use OpenCL for compatibility with AMD GPUs

I know the project uses TensorFlow, which uses CUDA, so for now it is exclusive to NVIDIA GPUs.

Is there a way to use another framework that would use OpenCL to make it work on AMD GPUs and maybe Intel integrated GPUs?

Or is it completely unimaginable?

Survey of Model Improvement Proposals

I'm creating a list to organize my own thoughts, but I would love to hear everyone else's ideas and suggestions, as well as what they're looking at.

Ways to improve the faceswap model ( accuracy, speed, robustness to outliers, model convergence ):

  1. Improved face detection options ( Current code: dlib/CNN+mmod )
     • MTCNN
     • Mask R-CNN ( would also provide a pixel-by-pixel semantic mask for the face )
     • YOLOv2
  2. Improved face recognition ( Current code: ageitgey/face_recognition )
  3. Improved face alignment ( Current code: 1adrianb/face-alignment )
  4. Add Batch Normalization after Conv and Dense layers in the autoencoder ( Current code: no norm )
  5. Replace the rectifier in the conv layers with more robust alternatives ( Current code: Leaky ReLU )
     • PReLU
     • ELU
     • PELU
  6. Adjust the learning rate when the user changes the batch size ( Current code: no LR scaling with batch size ); see the sketch after this list
     • Lots of papers, but simply: with a larger batch you can afford to explore larger step sizes and train more quickly with the same model stability
     • Apply a linear scaling factor to the learning rate if the batch size is increased ( doubling the batch size doubles the learning rate )
  7. Explore using other optimizers in the autoencoder ( Current code: Adam )
     • SGD with momentum
     • Cyclical Learning Rate
     • L4Adam
     • YellowFin
  8. Effective learning rate schedule ( Current code: no adjustment of LR after the start or at stagnation )
     • Best practice is to lower the learning rate or raise the batch size either at set intervals or after the loss has stagnated ( Note: for a constant mini-batch size of, say, 16, which is limited by GPU memory, you can still increase the effective batch size, e.g. by running on multiple GPUs or sequential GPU runs )
     • https://openreview.net/pdf?id=B1Yy1BxCZ
     • Also related: restarting the model at set intervals with a higher LR to kick it out of a local minimum
     • https://arxiv.org/pdf/1608.03983.pdf
  9. Initial learning rate ( Current code: 5e-5 )
     • Dependent on model architecture ( normalization, batch size, regularization, better rectifiers and optimizers allow you to increase the LR with the same stability/accuracy )
     • Suspect it is too low, but the current model has few of the tweaks which promote training stability
     • Also highly dependent on the default batch size
  10. Use keras.preprocessing.image.ImageDataGenerator ( Current code: random_transform and others )

Threw in a lot, but I can add more if anyone ever looks at this. PS - I was looking at 4/5/7/9 with a re-write of the Original_Model.
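
A minimal sketch of the linear learning-rate scaling rule from item 6, assuming the repo's current default LR of 5e-5 and a hypothetical reference batch size of 64 (the reference batch size is an assumption for illustration, not a value from the code):

from keras.optimizers import Adam

BASE_LR = 5e-5      # current default learning rate
BASE_BATCH = 64     # hypothetical reference batch size

def scaled_adam(batch_size):
    # Linear scaling rule: doubling the batch size doubles the learning rate.
    return Adam(lr=BASE_LR * batch_size / BASE_BATCH)

# optimizer = scaled_adam(batch_size=128)  # 2x batch -> 2x learning rate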

Training on 1 face only

Hi guys, I'm wondering about the possibility of training the models on 1 face only instead of 2.

The current workflow makes it mandatory to train a model for a specific src/target pair, which is not convenient. If training could be done with just one face, then once a model is trained it would be much more shareable, no?

What do you think?

How does the loss minimize if autoencoder_B keeps changing the weights learned by autoencoder_A?

The training of the single encoder and two decoders happens as in the following simplified code.

self.encoder = self.Encoder()
self.decoder_A = self.Decoder()
self.decoder_B = self.Decoder()
...
# Both autoencoders share the same encoder but each has its own decoder.
self.autoencoder_A = KerasModel(x, self.decoder_A(self.encoder(x)))
self.autoencoder_B = KerasModel(x, self.decoder_B(self.encoder(x)))

for i in range(epochs):
    self.autoencoder_A.train_on_batch(...)
    self.autoencoder_B.train_on_batch(...)  # doesn't this reset the encoder weights?

My understanding is that training autoencoder_A does not change the weights of autoencoder_B's decoder, but it does change the weights of the encoder, since the encoder is shared. Please correct me if I am wrong.

How does the loss get minimized if the two autoencoders alternately change the weights of the shared encoder?
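
Not an authoritative answer, but one way to see what actually happens is to build a tiny stand-in model (hypothetical layer sizes, not the repo's architecture) and check which weights move after a single train_on_batch call. Training autoencoder_A updates the shared encoder and decoder_A only; decoder_B is untouched, and since both tasks benefit from a good shared representation, the alternating updates still tend to drive both losses down in practice.

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# Tiny stand-in: one shared encoder layer, two decoder heads (hypothetical sizes).
inp = Input(shape=(8,))
enc = Dense(4, name='shared_encoder')(inp)
out_A = Dense(8, name='decoder_A')(enc)
out_B = Dense(8, name='decoder_B')(enc)

autoencoder_A = Model(inp, out_A)
autoencoder_B = Model(inp, out_B)
autoencoder_A.compile('adam', 'mae')
autoencoder_B.compile('adam', 'mae')

batch = np.random.rand(16, 8)
dec_B_before = autoencoder_B.get_layer('decoder_B').get_weights()[0].copy()
enc_before = autoencoder_A.get_layer('shared_encoder').get_weights()[0].copy()

autoencoder_A.train_on_batch(batch, batch)

# decoder_B is unchanged, while the shared encoder has moved.
print(np.allclose(dec_B_before, autoencoder_B.get_layer('decoder_B').get_weights()[0]))  # True
print(np.allclose(enc_before, autoencoder_A.get_layer('shared_encoder').get_weights()[0]))  # False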

GitHub repos for 3D Face Model

A 3D face model can be incorporated as a post-processing step for the current faceswap model to address sub-optimal performance on profile faces as well as intense facial expressions, for example by matching input/output facial expressions.

1. eos: A lightweight header-only 3D Morphable Face Model fitting library in modern C++11/14.

2. ExpNet: Landmark-Free, Deep, 3D Facial Expressions (Feb. 2018, First Release)

  • Link: ExpNet
  • Description: Pretrained weights are provided. It uses TensorFlow and thus can be easily ported to Keras. It solves the facial-expression limitation of their previous FacePoseNet.
  • Comment: ExpNet can be used as a feature extractor for perceptual loss. But it is "less
    able to handle extreme facial expressions" according to ExpNet paper, fig. 7.

Swapping more than the face

Hi guys, I'm wondering about the possibility of expanding the swapping to more than the face (chin, hair).

Is it possible? Any ideas on that?

Why do two autoencoders need to be trained?

In the training step in faceswap, there are two autoencoders (A and B) trained, one for each of the following tasks:

input face A -> encoder -> base vector A -> decoder A -> output face which resembles A
input face B -> encoder -> base vector B -> decoder B -> output face which resembles B

Then, decoder B is applied to image A, which is what performs the swap:

input face A -> encoder -> base vector A -> decoder B -> output face which resembles B

(If this is wrong, please correct me!)

First question: what does it mean for there to be only a single encoder? Since decoder A is not used in the final conversion, I assume the reason why training autoencoder A is necessary is because it contributes to the encoder. (Is this correct? Does decoder A need to be trained simply because without the decoder, there's no way to define autoencoder A's loss function?) Should I think of the encoder as something that transforms inputs into some kind of shared low-dimensional representation of both A and B...? As you can probably tell from this wording, I'm a bit stuck on this point...

Second question: since there's only a single encoder, does this mean that, during training, I should care about both loss_A and loss_B values (even though I'm only interested in swapping B to A)?

Third question: does it matter which autoencoder is trained first, during each training step? In plugins/Model_Original/Trainer.py:

loss_A = self.model.autoencoder_A.train_on_batch(warped_A, target_A)
loss_B = self.model.autoencoder_B.train_on_batch(warped_B, target_B)

If these two lines were reversed, would the after-the-training-step states of autoencoder_A and autoencoder_B be the same?

Fourth question: I noticed in plugins/Model_Original/Model.py's Decoder function, there's a single Conv2D invoked with activation='sigmoid'. Is there a reason that sigmoid is used here? I see that the encoder uses LeakyReLU instead; why is that?

Sorry for the wall of text; hopefully this is the right place to ask questions like this. If not, could someone point me in the right direction?

Feasibility - use for facial tracking/paint work

Hey all -

I work in the VFX industry, and I was curious - how feasible would code similar to this be for tracking an artist's work (on a single frame) onto a face?

For example, we are often asked to paint elements in/out on an actor's face (add scars, remove acne). From my understanding, the algorithm takes multiple images of each actor and creates a parity model to be applied. Would there be a way to then do a paint job on a single frame, and apply that paint through the tracked model?

Thanks!

-Mike

Face detection algorithms

Following an issue on the main faceswap repo, I tried to benchmark miscellaneous face detection approaches.

Here is the code I used.

The test photo from the issue had a strongly tilted face, which was only detected by dlib's mmod_human_face_detector.

import cv2
from PIL import Image
import face_recognition

# Load the jpg file into a numpy array
image = cv2.imread("src/test.jpg")
# cv2 loads images as BGR, while face_recognition and PIL expect RGB.
rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Find all the faces in the image using the default HOG-based model.
# This method is fairly accurate, but not as accurate as the CNN model and not GPU accelerated.
# See also: find_faces_in_picture_cnn.py
face_locations = face_recognition.face_locations(rgb_image)

print("I found {} face(s) in this photograph.".format(len(face_locations)))

for face_location in face_locations:

    # Print the location of each face in this image
    top, right, bottom, left = face_location
    print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))

    # You can access the actual face itself like this:
    face_image = rgb_image[top:bottom, left:right]
    pil_image = Image.fromarray(face_image)
    pil_image.show()

########################################################################

import cv2
import numpy

# Give right path to the xml file or put it directly in current folder
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

#Add : cv.EqualizeHist(image, image) ?
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect faces in the image
faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(30, 30),
    flags = cv2.CASCADE_SCALE_IMAGE
)
print("I found {} face(s) in this photograph with haarcascade.".format(len(face_locations)))

for (x, y, w, h) in faces:
    face_image = rgb_image[y: y + h, x: x + w]
    pil_image = Image.fromarray(face_image)
    pil_image.show()

########################################################################

import dlib

detector = dlib.get_frontal_face_detector()

dets = detector(image, 1)
print("Number of faces detected: {}".format(len(dets)))
for i, d in enumerate(dets):
    print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
        i, d.left(), d.top(), d.right(), d.bottom()))

########################################################################


detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")

dets = detector(image, 1)
print("Number of faces detected: {}".format(len(dets)))
for i, d in enumerate(dets):
    print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
        i, d.rect.left(), d.rect.top(), d.rect.right(), d.rect.bottom()))
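
Since the goal is a benchmark, here is a hedged sketch of a simple timing wrapper that could be put around each of the detectors above. The commented calls just reuse the variables from the snippets in this post; the number of runs is arbitrary.

import time

def benchmark(name, detect_fn, runs=10):
    # Run each detector several times and report the average wall-clock time per image.
    start = time.time()
    for _ in range(runs):
        results = detect_fn()
    elapsed = (time.time() - start) / runs
    print("{}: {} face(s), {:.3f} s per image".format(name, len(results), elapsed))

# benchmark("HOG (face_recognition)", lambda: face_recognition.face_locations(rgb_image))
# benchmark("Haar cascade", lambda: face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5))
# benchmark("dlib CNN (mmod)", lambda: detector(image, 1))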

Exploring new models

There are many ways to generate faces. If you find some interesting models, discuss them here.

How do I reuse models in other faceswaps?

Let's say I have a well trained faceswap with Person A -> Person B
And now I want another faceswap with Person C -> Person A.

Can I reuse, say, encoder_A.h of model 1 and paste that as encoder_B.h in model 2?

Learning rate, beta1, beta 2 in adam optimizer in models

Does anyone have any background on why the values of the three parameters of the Adam optimizer were chosen? In particular for the learning rate: too low an initial learning rate can produce poor accuracy, while too high an initial learning rate can be unstable and inaccurate too; there is a sweet spot. Higher learning rates also train faster.

Normal ranges suggested in the academic literature are 0.1 to 0.0001; the current setting, 0.00005, is lower than that. PS - the Adam optimizer naturally reduces the effective learning rate anyway over the course of training; with such a low initial starting rate there isn't much room to optimize.

http://openaccess.thecvf.com/content_ICCV_2017/papers/Korshunova_Fast_Face-Swap_Using_ICCV_2017_paper.pdf

A very similar model starts at 0.001 and overrides the Adam optimizer's learning rate at set intervals over training, slowly reducing it to 0.0001.

Is there a potential accuracy gain if we experiment with altering the initial learning rate?
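
A hedged sketch of how the schedule described in that paper could be approximated in faceswap's Keras training loop. The decay interval and floor below are illustrative assumptions, not the paper's exact schedule, and the commented usage assumes the model/variable names from plugins/Model_Original/Trainer.py.

from keras import backend as K

def decay_learning_rate(models, iteration, end_lr=1e-4, decay_every=10000):
    # Every `decay_every` iterations, halve the optimizer's learning rate,
    # never going below `end_lr`. Values are illustrative assumptions.
    if iteration == 0 or iteration % decay_every != 0:
        return
    for model in models:
        current = float(K.get_value(model.optimizer.lr))
        K.set_value(model.optimizer.lr, max(end_lr, current * 0.5))

# Inside the existing training loop (hypothetical wiring):
# decay_learning_rate([self.model.autoencoder_A, self.model.autoencoder_B], iteration)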

[Paper] Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network

Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network
Hai X. Pham, Yuting Wang, Vladimir Pavlovic

https://arxiv.org/pdf/1803.07716.pdf

References:
...
42. Deepfakes. https://github.com/deepfakes/faceswap

Wonder if this is the first time deepfakes got cited in an academic paper.

Edit: Just skimmed through this paper. It proposes a conditional GAN model with an auxiliary classifier "that is capable of synthesizing novel faces from any source portrait given a vector of action unit coefficients". The "vector of facial action unit (AU) intensities" (I don't know exactly what an AU is) is treated as the conditional vector, which is concatenated to the generator's embedding. In addition, a pre-trained AU estimator is introduced for an AU loss (this loss term is basically the same as a perceptual loss).


Problems with GPU training.

I have a 1.5 GB GPU (960M); setting the trainer to LowMem did not help, and combining it with a batch size of 16, 8, 4, or 2 did not improve things much.

I read that you should change ENCODER_DIM. How do I do this?

Also, can I resume training a model after I exit the application?

Improve convert script efficiency

Hello everyone,

I noticed that the training script uses every core I have on my CPU, but the convert script must be using only one, as I see low CPU usage. Because of that, converting a lot of pictures takes a very long time on the CPU.

Is it possible to use multi-threading to speed things up? Or does the way we do the conversion prevent that?
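
Not an authoritative answer, but a minimal sketch of how per-image conversion could be parallelised with Python's multiprocessing. The convert_frame function and the glob pattern are hypothetical placeholders for whatever the convert script does per picture; if the model itself runs on the GPU, each worker would need its own session, so this mainly helps with the CPU-bound parts (alignment, blending, I/O).

from multiprocessing import Pool
import glob

def convert_frame(path):
    # Hypothetical placeholder: load the frame, swap the face, write the result.
    ...

if __name__ == "__main__":
    frames = sorted(glob.glob("input/*.png"))
    with Pool() as pool:  # defaults to one worker process per CPU core
        pool.map(convert_frame, frames)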
