shaoanlu / faceswap-gan Goto Github PK

View Code? Open in Web Editor NEW

3.4K 177.0 842.0 2.21 MB

A denoising autoencoder + adversarial losses and attention mechanisms for face swapping.

Jupyter Notebook 72.61% Python 27.39%

face-swap generative-adversarial-network gan gans image-manipulation

faceswap-gan's Introduction

👋Hello, I'm Shao-An!

Working as a Control Software Engineer.
Grew up in Taiwan, residing in the vibrant city of Tokyo, Japan.
Interested in Control Systems, Optimization, and Deep Generative Models.
Currently engaged in the exploration of Software Architecture within Robotic Systems.

Recent Activity

🎉 Merged PR #1 in shaoanlu/CBF_QP_safety_filter
💪 Opened PR #1 in shaoanlu/CBF_QP_safety_filter
💪 Opened PR #234 in qpsolvers/qpsolvers
🎉 Merged PR #6 in shaoanlu/qpsolvers
💪 Opened PR #6 in shaoanlu/qpsolvers

Projects

Control and Optimization
- Diffusion policy for Quadrotor Control: Implements cutting-edge diffusion models for imitation learning in quadrotor control.
- Safety Filter for Collision Avoidance: Employs Control Barrier Functions (CBF) as safety filters for collision avoidance.
- Motion Planning with MPPI: Integrates Model Predictive Control and CBF in a sampling-based approach for motion planning.
Deep Learning and Computer Vision
- Few-Shot Face Translation: Features Generative Adversarial Networks trained on the VGGFace2 dataset for face-translation with few-shot examples.
- Face Swapping GAN: Applies Generative Adversarial Networks for face-swapping in images.
- Facial Analysis Toolbox: Offers a collection of deep learning models for facial analysis in Keras.

faceswap-gan's People

Contributors

Stargazers

Watchers

Forkers

5kejun jjdblast giveup shafiahmed y495965825 canphp finderl chaojie lloves angilis wa-sans-tv magnetowang longjohncoder anazou clorr jchacon4 khamv hhgzljp ganzib4fun awesome-archive khchanaq minstrelsy exgile shubhampachori12110095 xiusdk zhubofei shaoweipng jordipala bigmac447 brandhill hunslater-deeplearning dimdrop prabhjotsl tyolab david30907d datar-ai th4tirishguy leezqcst sybersu huleg forsskieken niotw junhuicao sszllx adairzhao aaronedmistone unaventures hangwudy lavrovd primemover2011 fastrocket rodrmp bigrlab cleoag shoff leerock laal65 johnwall2016 keyky phantoan xxradon jonkoi leerisk tcast necrno scholltan bradfox2 wulingtian clover2008 lebronyxm johndpope sophrinix ruah1984 mrthiago kalvar vish25v qsdj huguanglong wilsonleeee loinapex dineshkumares wq49 jffifa flashus bssrdf bradypuz chickenlove andrewchan2022 gurpreetshanky mousechen mulx10 seifer08ms cmschmtt wangmn93 keyboyi cybort cosecant-csc interesting-opensource-projects duyamin lexfreeman

faceswap-gan's Issues

Make bbox tracking configurable for video processing

Some videos tend to have faster head movements than others, so the default, hardcoded value of 0.65 weight for keeping the previous coords may not suit some of the needs. Consider changing the binary parameter of use_smoothed_bbox = True, to something more like bbox_smoothing_wkeep=0.65 and change the calls to:

def get_smoothed_coord(x0, x1, y0, y1, wkeep=0.65):
    global prev_x0, prev_x1, prev_y0, prev_y1
    x0 = int(wkeep*prev_x0 + (1-wkeep)*x0)
    x1 = int(wkeep*prev_x1 + (1-wkeep)*x1)
    y1 = int(wkeep*prev_y1 + (1-wkeep)*y1)
    y0 = int(wkeep*prev_y0 + (1-wkeep)*y0)
    return x0, x1, y0, y1

if bbox_smoothing_wkeep > 0:
            if frames != 0:
                x0, x1, y0, y1 = get_smoothed_coord(x0, x1, y0, y1, wkeep=bbox_smoothing_wkeep)
                set_global_coord(x0, x1, y0, y1)
            else:
                set_global_coord(x0, x1, y0, y1)
                frames += 1

Eventually it would be great to implement some sort of tracking mechanism with a PID-like behavior.

dlib video face detection takes massive amount of time

I am not a jupyter user although I assume I have the repository setup correctly as it seems to be doing work. I have been working with other code bases for a while using the same modules so a dependency issue is likely not an issue.

When I step through the code for dlib_video_face_detection.ipynb I get to the code block where moviepy does some manipulation of an input video. According to the timestamp on the output I am looking at a very long time before it is complete.

0%| | 9/15887 [11:09<329:34:28, 74.72s/it]

The target video is 1280x720, 00:08:50 long with a data rate of 1413kbps. My hardware consists of a 3.5ghz i5, gtx1080, and 16gb of ram. Training large data sets have not been a problem for me so I am unsure of why processing a video frame by frame would take this long.

What is the purpose of the

output = '_.mp4'
clip1 = VideoFileClip("x-cropped.mp4")
clip = clip1.fl_image(process_video)#.subclip(0,10) #NOTE: this function expects color images!!
%time clip.write_videofile(output, audio=False)

block besides to run the process_video method on each frame? I have used the dlib module as a stand alone script and it processes the video in a handful of minutes pulling many faces as a result.

Would it be beneficial to pre-process the video file in ffmpeg before handing the work to the notebook in any way? Perhaps rip the frames beforehand so that moviepy would not need to step frame by frame?

The Mask geneartion and learn process

I just saw the new update on mask generation and wanted to ask if you could elaborate on how it works and if the code is available to inspect...

By the way, nice idea (and implementation) !

Swapping mouth movement

Does anyone have any idea how one would approach swapping mouth movement? I.e. transforming the target's mouth shape to match the source, rather than swapping the look. This may be a different project entirely...

Tensorboard support?

dataset for training

Hi, @shaoanlu

Thanks for your nice work!

you have shown a result of trained models transforming Hinako Sano (佐野ひなこ, left) to Emi Takei (武井咲, right). Since you have provided the source video of Hinako Sano (佐野ひなこ, left), the training data of Emi Takei (武井咲, right) is not provided. Could please kindly give a link to this data.

Thanks.

Training iteration duration

How long should a single iteration take to train? For reference, I'm using a Tesla P100 and it's taking about 50 seconds.

time is 1875.640280

I am wondering if the train screen changes every 1875 seconds and this speed is correct.
Can not use this program with CUDA?

[2/150][50] Loss_DA: 0.205057 Loss_DB: 0.193327 Loss_GA: 0.415360 Loss_GB: 0.413623 time: 1875.640280
[4/150][100] Loss_DA: 0.200707 Loss_DB: 0.211183 Loss_GA: 0.295002 Loss_GB: 0.341839 time: 3606.393274

Q: What's the reasoning behind using PixelShuffler over Conv2DTranspose?

The upscale block uses pixel shuffler after 4x convolution. I understand that this is a neat way of increasing the number of coefficients and then nicely reshaping everything to bring it one resolution step up, but why this and not Conv2DTranspose?

def upscale_ps(filters, use_norm=True): def block(x): x = Conv2D(filters*4, kernel_size=3, use_bias=False, kernel_initializer=RandomNormal(0, 0.02), padding='same' )(x) x = LeakyReLU(0.1)(x) x = PixelShuffler()(x) return x return block

How do I obtain decoded face result without masking.

Sorry, I'm a python newbie.
I tried with below code but result image looks darker than the preview.
Could you tell me the correct way to get decoding result without masking?

ae_input = cv2.resize(input_face, (128,128))/255. * 2 - 1        
result = np.squeeze(np.array([path_abgr_B([[ae_input]])]))     

# Trying to get ouput witout mask
raw_face = cv2.cvtColor(result[:,:,1:], cv2.COLOR_BGR2RGB)

What is the Repeat Point?

Could you please elaborate a bit about the mixup in the Repeat Point?
Do we really have to manually change and run it or it's optional? and what difference does this snippet make?

For 1 ~ 10000 iteratioons, set:
mixup = lam * concatenate([real, distorted]) + (1 - lam) * concatenate([fake, distorted])
loss_G += K.mean(K.abs(fake_rgb - real))
fake_sz224 = tf.image.resize_images(fake, [224, 224]) # or set use_perceptual_loss = False

For 10000 ~ 13000 iterations, set:
mixup = lam * concatenate([real, distorted]) + (1 - lam) * concatenate([fake_rgb, distorted])
loss_G += K.mean(K.abs(fake - real))
fake_sz224 = tf.image.resize_images(fake, [224, 224]) # Ignore this line if you dont wan to use perceptual loss

For 13000 ~ 16000 or longer iterations, set:
mixup = lam * concatenate([real, distorted]) + (1 - lam) * concatenate([fake_rgb, distorted])
loss_G += K.mean(K.abs(fake - real))
fake_sz224 = tf.image.resize_images(fake_rgb, [224, 224]) # Ignore this line if you dont wan to use perceptual loss

And thank you for your work... I've been learning lots of new things from your take on Keras..

Training stops at a certain iteration

I have a GTX 1080 and have trained successfully the original NN, but when I try to train these, they run for like an hour and then stop training at a given iteration. If I rerun the function, it stops at the same one.

dlib face detection in video

i test the dlib face detection in my own video(about 10s), but it output only one image ,not output the each frame's face image.

Which paper did the code based on?

Please, Which paper did the code based on? I want to understand the pipeline according to the paper.

IOPub data rate exceeded. (FaceSwap_GAN_v2_sz128_train)

When try to use FaceSwap_GAN_v2_sz128_train.ipynb i get this error:

PD: This not happen when i use FaceSwap_GAN_v2_train.ipynb, but i would like to try the 128 too.

我没有mac 没有 GPU 训练很慢！资源文件，能一个吗？

资源文件，和训练好的模型能给我一个吗？感谢 [email protected]

Pre-trained models

First of all, very impressive results. Could you provide some pre-trained models for us to test out your implementation. Google Drive could be a good place to host the models.

ImportError: No module named 'keras_vggface'

Thank you for developing a great program. I installed keras_vggface as "!Pip install keras_vggface" normally. But, the following error occurred. I searched the problem on Google but could not resolve it. Could you tell me how to fix it? Thank you in advance.

IN [1]: from keras_vggface.vggface import VGGFace

ImportError Traceback (most recent call last)
in ()
----> 1 from keras_vggface.vggface import VGGFace

ImportError: No module named 'keras_vggface'

Loss of generator cannot be lowered down below 0.26

I used my GTX970 to train for about 45000 iters, with batchsize=16. However I found that both loss_GA and loss_GB cannot be lowered down below 0.26.
I'm not familiar with GAN, is it because my datasets' lack of diversity? Or dicriminators is over trained?
My both sets contain about 3000 images extracted from video.

Question about preview images

First of all, thank you so much for such a detailed notebook!

Could you explain a little bit more about the training previews?

I am a long time developer, but my specialty is at node.js.
With this new face-swapping hype, I decided to jump on the train for fun, but some stuff is still a bit confusing for me.
I've read some GAN papers and I found this to be the most effective (and most fun!) path to do face swapping.

First of all: Why are the faces on the output sample blue-ish? Is this the correct behavior?
Second: I'm noticing some weird hard pixels around the actors' noses, specifically around Tom Hiddleston's nose. Is this also supposed to happen?
Third: The third column at the sample masks is empty. Is this correct?

Hardware
I am currently running the training script on this setup:

NVIDIA GK210 GPU w/ 12GiB DDR5 VRAM
tensorflow 1.5.0
CUDA 9.0

This is running ~1 iteration/sec.
Current loss information:

Loss_DA: 0.001124 Loss_DB: 0.000390 Loss_GA: 0.008428 Loss_GB: 0.010443

Sample after ~1k iterations

Sample after ~2.5k iterations

Most recent sample after ~3.2k iterations with mask preview

Running "10. Start Training" itself results in out of memory error

Executing the third cell of "10. Start Training" results in following error

2018-01-27 11:21:44.433195: I tensorflow/core/common_runtime/bfc_allocator.cc:683] Sum Total of in-use chunks: 1.41GiB
2018-01-27 11:21:44.433220: I tensorflow/core/common_runtime/bfc_allocator.cc:685] Stats: 
Limit:                  1516306432
InUse:                  1516306432
MaxInUse:               1516306432
NumAllocs:                    1278
MaxAllocSize:            143701248

2018-01-27 11:21:44.433336: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ***************x*******xxx************************************************************************xx
2018-01-27 11:21:44.433369: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[3,3,512,1024]

OS: ubuntu 17.10
CUDA: 8.0
Tensorflow: 1.4

Nvidia GPU model: GM108M [GeForce 920MX]

No preview screen showing while training

Running FaceSwap_GAN_v2_train, but no preview screen is showing.

"Weights file not found." despite them being present

try:
	encoder.load_weights("models/encoder.h5")
	decoder_A.load_weights("models/decoder_A.h5")
	decoder_B.load_weights("models/decoder_B.h5")
	# netDA.load_weights("models/netDA.h5") 
	# netDB.load_weights("models/netDB.h5") 
	print("model loaded.")
except:
	print("weights file not found.")
	pass

At this point in the code, it always fails, saying it couldn't find the weight files. They are located at ./faceswap-GAN-master/models/. Is this incorrect? I should note that the model is the Trump to Cage model from the deepfakes/faceswap project, and that I commented out netDA and netDB because they do not exist.

Any help? Thank you.

is this normal GAN masks in FaceSwap ?

dots with oversharped image
is this normal , or faceswap developers adapted your plugin with errors?

dlib_video_face_detection extract 2 different sizes?

186x186 & 223x222

It's ok or it's a bug?

Idea about matching color before/during training.

I found that one factor for archive good result depends on the variety of lighting/color the data set but sometimes we have a very limit dataset.
Is it possible to match color between two pictures before/during to improve the result of training?
http://answers.opencv.org/question/178127/matching-colors-between-two-pictures-in-opencv/

Is it supposed to look like this after a night of training?

OUTPUT_VIDEO:

https://www.youtube.com/watch?v=wew_q9yJMug

Train preview images:

Also there doesn't seems to be any attempt for try to match colors?

gtx 1060 with 6G ram, out of memory

I am running the script in gtx 1060 with 6G ram laptop, it halts on 56 niters and said out of memory, is there any way I could lower the memory requirement? Thanks.

Feather edge of Smoothed bbox

I want to suggest adding an algorithm to feather the edges of the bounding box such as the opencv feather blender or use a opacity gradient to blend in the bounding box edges where an radius is defined. This would hide the jitter a bit better.

Have you implemented something like this for the alignment and smoothing?

Here's the Link... Do you think this approach does better with the alignments or the current code is better ?

Could you change the input and output to 128, 128 for WGAN model?

Hi, I see a WGAN in temp folder. Could you help change the input and output to 128, 128 for WGAN model? I tried to change it but not succeed, Thanks!!

Error when run FaceSwap_GAN_v2_train.ipynb

Runing on jupyter notebook:

ERROR: https://pastebin.com/egjXVyG3

CONDA LIST:

(gan) C:\Users\ZeroCool22\faceswap-GAN>conda list

packages in environment at C:\ProgramData\Anaconda3\envs\gan:

(gan) C:\Users\ZeroCool22\faceswap-GAN>

mp4 making speed

The mp4 making speed seems to be too slow.

Below (1) is my device type and (2) is the speed image that made the video. I would appreciate it if you could tell me what I need to do.

(1)https://imgur.com/P0mMDlC
(2)https://imgur.com/tqd4lyP

How can I find definition for nc_D_inp?

https://github.com/shaoanlu/faceswap-GAN/blob/master/FaceSwap_GAN_github.ipynb
netDA = Discriminator(nc_D_inp)
netDB = Discriminator(nc_D_inp)

ERROR: FaceSwap_GAN_v2_train.ipynb (TENSORFLOW IS UPDATED)

ERROR: https://pastebin.com/mJesKjZh

(gan) C:\Users\ZeroCool22\faceswap-GAN>conda list
packages in environment at C:\ProgramData\Anaconda3\envs\gan:

Name Version Build Channel
absl-py 0.1.10
backports 1.0 py36h81696a8_1
backports.weakref 1.0rc1 py36_0
bleach 1.5.0 py36_0 conda-forge
boost 1.64.0 py36_vc14_4 [vc14] conda-forge
boost-cpp 1.64.0 vc14_1 [vc14] conda-forge
bzip2 1.0.6 vc14_1 [vc14] conda-forge
ca-certificates 2017.08.26 h94faf87_0
certifi 2018.1.18 py36_0
click 6.7
cudatoolkit 8.0 3 anaconda
cudnn 6.0 0 anaconda
decorator 4.0.11 py36_0 conda-forge
dlib 19.4 np112py36_201 conda-forge
dlib 19.9.0
face-recognition 1.2.1
face-recognition-models 0.3.0
ffmpeg 3.4.1 1 conda-forge
freetype 2.8.1 vc14_0 [vc14] conda-forge
h5py 2.7.1 py36_2 conda-forge
hdf5 1.10.1 vc14_1 [vc14] conda-forge
html5lib 0.9999999 py36_0 conda-forge
icc_rt 2017.0.4 h97af966_0
icu 58.2 vc14_0 [vc14] conda-forge
imageio 2.1.2 py36_0 conda-forge
intel-openmp 2018.0.0 hd92c6cd_8
jpeg 9b vc14_2 [vc14] conda-forge
keras 2.0.9 py36_0 conda-forge
libgpuarray 0.7.5 vc14_0 [vc14] conda-forge
libiconv 1.14 vc14_4 [vc14] conda-forge
libpng 1.6.34 vc14_0 [vc14] conda-forge
libtiff 4.0.9 vc14_0 [vc14] conda-forge
libwebp 0.5.2 vc14_7 [vc14] conda-forge
libxml2 2.9.3 vc14_9 [vc14] conda-forge
mako 1.0.7 py36_0 conda-forge
markdown 2.6.9 py36_0 conda-forge
Markdown 2.6.11
markupsafe 1.0 py36_0 conda-forge
mkl 2018.0.1 h2108138_4
moviepy 0.2.3.2 py36_0 conda-forge
numpy 1.12.1 py36hf30b8aa_1 anaconda
numpy 1.14.0
olefile 0.44 py36_0 conda-forge
opencv 3.3.0 py36_200 conda-forge
openssl 1.0.2n h74b6da3_0
pillow 5.0.0 py36_0 conda-forge
pip 9.0.1 py36_1 conda-forge
protobuf 3.5.1 py36_vc14_3 [vc14] conda-forge
protobuf 3.5.1
pygpu 0.7.5 py36_0 conda-forge
python 3.6.4 0 conda-forge
pyyaml 3.12 py36_1 conda-forge
qt 5.6.2 vc14_1 [vc14] conda-forge
scipy 1.0.0 py36h1260518_0
setuptools 38.4.0 py36_0 conda-forge
setuptools 38.5.1
six 1.11.0 py36_1 conda-forge
six 1.11.0
sqlite 3.20.1 vc14_2 [vc14] conda-forge
tensorboard 0.4.0rc3 py36_2 conda-forge
tensorflow 1.4.0 py36_0 conda-forge
tensorflow-gpu 1.5.0
tensorflow-tensorboard 1.5.1
theano 1.0.1 py36_1 conda-forge
tk 8.6.7 vc14_0 [vc14] conda-forge
tqdm 4.11.2 py36_0 conda-forge
vc 14 0 conda-forge
vs2015_runtime 14.0.25420 0 conda-forge
webencodings 0.5 py36_0 conda-forge
Werkzeug 0.14.1
werkzeug 0.14.1 py_0 conda-forge
wheel 0.30.0
wheel 0.30.0 py36_2 conda-forge
wincertstore 0.2 py36_0 conda-forge
yaml 0.1.7 vc14_0 [vc14] conda-forge
zlib 1.2.11 vc14_0 [vc14] conda-forge

May I ask about the walkthrough?

Hi, it is an amazing project and really better result than deepfakes one. About the walkthrough, do you mean :

I run the dlib_video_face_detection.ipynb for person A (with input video of person A), then unzip the zip file and put it into ./TE/ folder
I run the dlib_video_face_detection.ipynb for person B (with input video of person B), then unzip the zip file and put it into ./SH/ folder
I run FaceSwap_GAN_github.ipynb to get the output_movie.mp4

But in the step 3, where should I set the input video in the file? Thanks.

Best Wishes,
Chi Kiu SO

What is your OS？

"14. Make video clips w/ face alignment" is missing?

I about to make mp4 output but I can't find "14. Make video clips w/ face alignment" in FaceSwap_GAN_v2_sz128_train.ipynb where I can find it?

"Weight files not found" + Type error during "define_loss"

I followed the FaceSwap_GAN_github notebook pretty closely. I was able to install all the packages expect dlib, which I believe we don't need for training.

I have two issues:

During "Load Models" it is throwing an error, saying that the model input has 7 layers but expects 8. I just copied over my fakeapp model. I moved on because I assumed that the program would create a new model later on.
During the loss_DA, loss_GA = define_loss(netDA, real_A, fake_A, vggface_feat) function, I am getting the following error stack. I would love some feedback, I am fairly new to all of this.

`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
----> 1 loss_DA, loss_GA = define_loss(netDA, real_A, fake_A, vggface_feat)
2 loss_DB, loss_GB = define_loss(netDB, real_B, fake_B, vggface_feat)

in define_loss(netD, real, fake, vggface_feat)
3 dist = Beta(mixup_alpha, mixup_alpha)
4 lam = dist.sample()
----> 5 mixup = lam * real + (1 - lam) * fake
6 output_mixup = netD(mixup)
7 loss_D = loss_fn(output_mixup, lam * K.ones_like(output_mixup))

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\ops\math_ops.py in binary_op_wrapper(x, y)
818 with ops.name_scope(None, op_name, [x, y]) as name:
819 if not isinstance(y, sparse_tensor.SparseTensor):
--> 820 y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
821 return func(x, y, name=name)
822

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\framework\ops.py in convert_to_tensor(value, dtype, name, preferred_dtype)
637 name=name,
638 preferred_dtype=preferred_dtype,
--> 639 as_ref=False)
640
641

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\framework\ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype)
702
703 if ret is None:
--> 704 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
705
706 if ret is NotImplemented:

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\framework\constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
111 as_ref=False):
112 _ = as_ref
--> 113 return constant(v, dtype=dtype, name=name)
114
115

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\framework\constant_op.py in constant(value, dtype, shape, name, verify_shape)
100 tensor_value = attr_value_pb2.AttrValue()
101 tensor_value.tensor.CopyFrom(
--> 102 tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
103 dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
104 const_tensor = g.create_op(

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\framework\tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape)
368 nparray = np.empty(shape, dtype=np_dt)
369 else:
--> 370 _AssertCompatible(values, dtype)
371 nparray = np.array(values, dtype=np_dt)
372 # check to them.

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\framework\tensor_util.py in _AssertCompatible(values, dtype)
300 else:
301 raise TypeError("Expected %s, got %s of type '%s' instead." %
--> 302 (dtype.name, repr(mismatch), type(mismatch).name))
303
304

TypeError: Expected float32, got /input_8 of type 'TensorVariable' instead.
`

NameError: name 'bbox_moving_avg_coef' is not defined & Typo errors.

On FaceSwap_GAN_v2_test_video

Also there are Typo Errors, instead VIDEO say VODEO.

But even after correct that, gives the title error.

MoviePy] >>>> Building video OUTPUT_VIDEO.mp4
[MoviePy] Writing video OUTPUT_VIDEO.mp4

  0%|                                                                                                    | 0/341 [00:00<?, ?it/s]

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<timed eval> in <module>()

<decorator-gen-176> in write_videofile(self, filename, fps, codec, bitrate, audio, audio_fps, preset, audio_nbytes, audio_codec, audio_bitrate, audio_bufsize, temp_audiofile, rewrite_audio, remove_temp, write_logfile, verbose, threads, ffmpeg_params, progress_bar)

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\decorators.py in requires_duration(f, clip, *a, **k)
     52         raise ValueError("Attribute 'duration' not set")
     53     else:
---> 54         return f(clip, *a, **k)
     55 
     56 

<decorator-gen-175> in write_videofile(self, filename, fps, codec, bitrate, audio, audio_fps, preset, audio_nbytes, audio_codec, audio_bitrate, audio_bufsize, temp_audiofile, rewrite_audio, remove_temp, write_logfile, verbose, threads, ffmpeg_params, progress_bar)

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\decorators.py in use_clip_fps_by_default(f, clip, *a, **k)
    135              for (k,v) in k.items()}
    136 
--> 137     return f(clip, *new_a, **new_kw)

<decorator-gen-174> in write_videofile(self, filename, fps, codec, bitrate, audio, audio_fps, preset, audio_nbytes, audio_codec, audio_bitrate, audio_bufsize, temp_audiofile, rewrite_audio, remove_temp, write_logfile, verbose, threads, ffmpeg_params, progress_bar)

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\decorators.py in convert_masks_to_RGB(f, clip, *a, **k)
     20     if clip.ismask:
     21         clip = clip.to_RGB()
---> 22     return f(clip, *a, **k)
     23 
     24 @decorator.decorator

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\video\VideoClip.py in write_videofile(self, filename, fps, codec, bitrate, audio, audio_fps, preset, audio_nbytes, audio_codec, audio_bitrate, audio_bufsize, temp_audiofile, rewrite_audio, remove_temp, write_logfile, verbose, threads, ffmpeg_params, progress_bar)
    347                            verbose=verbose, threads=threads,
    348                            ffmpeg_params=ffmpeg_params,
--> 349                            progress_bar=progress_bar)
    350 
    351         if remove_temp and make_audio:

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\video\io\ffmpeg_writer.py in ffmpeg_write_video(clip, filename, fps, codec, bitrate, preset, withmask, write_logfile, audiofile, verbose, threads, ffmpeg_params, progress_bar)
    207 
    208     for t,frame in clip.iter_frames(progress_bar=progress_bar, with_times=True,
--> 209                                     fps=fps, dtype="uint8"):
    210         if withmask:
    211             mask = (255*clip.mask.get_frame(t))

C:\ProgramData\Anaconda3\lib\site-packages\tqdm\_tqdm.py in __iter__(self)
    831 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    832 
--> 833             for obj in iterable:
    834                 yield obj
    835                 # Update and print the progressbar.

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\Clip.py in generator()
    473         def generator():
    474             for t in np.arange(0, self.duration, 1.0/fps):
--> 475                 frame = self.get_frame(t)
    476                 if (dtype is not None) and (frame.dtype != dtype):
    477                     frame = frame.astype(dtype)

<decorator-gen-139> in get_frame(self, t)

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\decorators.py in wrapper(f, *a, **kw)
     87         new_kw = {k: fun(v) if k in varnames else v
     88                  for (k,v) in kw.items()}
---> 89         return f(*new_a, **new_kw)
     90     return decorator.decorator(wrapper)
     91 

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\Clip.py in get_frame(self, t)
     93                 return frame
     94         else:
---> 95             return self.make_frame(t)
     96 
     97     def fl(self, fun, apply_to=[], keep_duration=True):

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\Clip.py in <lambda>(t)
    134 
    135         #mf = copy(self.make_frame)
--> 136         newclip = self.set_make_frame(lambda t: fun(self.get_frame, t))
    137 
    138         if not keep_duration:

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\video\VideoClip.py in <lambda>(gf, t)
    531         `get_frame(t)` by another frame,  `image_func(get_frame(t))`
    532         """
--> 533         return self.fl(lambda gf, t: image_func(gf(t)), apply_to)
    534 
    535     # --------------------------------------------------------------

<ipython-input-18-3ca7ce65bf3e> in process_video(input_img)
    170         if use_smoothed_bbox:
    171             if frames != 0:
--> 172                 x0, x1, y0, y1 = get_smoothed_coord(x0, x1, y0, y1, image.shape, bbox_moving_avg_coef)
    173                 set_global_coord(x0, x1, y0, y1)
    174                 frames += 1

NameError: name 'bbox_moving_avg_coef' is not defined

Universal face encoder, more resilient decoders?

First: thanks for this useful tool! I'm in the process of learning ML, but reading through this project has helped greatly.

Please correct me if I'm wrong, but I see that each encoder model is intended to be specific to a pair of faces. In my experience a given encoder adapts very rapidly to the addition of a new face set. And you may reuse decoders for a given face as these are only trained with already known data.

Is there any effort to make a universal face encoder, a very large, well-trained model that can be shared publicly? This could cut down on training time and produce arbitrary A->B swaps, where the generator for B is already trained. Perhaps this would require more layers or training time, but would it be possible to leverage the features extracted from layers of frozen existing image classification networks, or networks specifically trained on detecting facial orientation or expressions.

One example I'm thinking of is MSG-Net (https://github.com/zhanghang1989/MSG-Net), which extracts VGG-16 based features to train a model for a large set of artistic styles, but also includes a separate 'inspiration' layer for a given style.

As for the decoders: with current techniques I understand that the interpretation of the abstract face vector and the resulting transformations for a particular face set must be baked into the weights of a decoder model. I've seen some quirky decoder behavior when faces A and B are very different, though. In this case, is it possible to tweak the parameters related to the distortion / warping of the training images? While generating the training data for face A, It might also be interesting if there were some way to automatically swap eyes, mouths, etc. (using opencv, face_recognition), drawing from a large training set of these features from other faces, so the decoder for face A isn't just practicing with variations of A's eyes, for instance.

Maybe it's not helpful to throw ideas out without concrete action toward implementation, but please let us know if there are any helpful experiments we can run.

Preview during video generation

As the result of the output generation for the video is somewhat unknown until the full video is processed (the pyvideo does not let you view the file during creation due to lock or muxing) it would be nice to consider a method using opencv (I haven't time test it so maybe it's totally inferior). For my own purpose I made something like this:

import cv2

cap = cv2.VideoCapture("./INPUT.mp4")
width, height, fps, fcount = [int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)), cap.get(cv2.CAP_PROP_FPS), int(cap.get(cv2.CAP_PROP_FRAME_COUNT))]

fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('./OUTPUT.mp4',fourcc, fps, (1920, 360), isColor=True)

for fnum in range(0, fcount):
    ret, frame = cap.read()
    frame = np.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    frame_out = np.array(process_video(frame, wkeep=0.01)) # this method actually lets you pass function parameters nicely
    frame_to_write = cv2.cvtColor(np.clip(frame_out, 0, 255).astype(np.uint8), cv2.COLOR_RGB2BGR)
    
    write_ret = out.write(frame_to_write)
    
    if fnum % 10 == 0:   # should be variable
        clear_output()
        print(frame_to_write.shape)
        print(write_ret)
        print('Frame {} / {}'.format(fnum, fcount))
        plt.figure(figsize=(12,12))
        plt.imshow(frame_out.astype(np.uint8))
        plt.show()
cap.release()
out.release()

The new GAN version doesn't seem to work

I've been training the new version for 10000 + 6000 iterations and the output (from the show_g function) doesn't even start to change... I saw some outputs where the network tried to mimic PersonB but at the end of the day the three columns [test_A, path_A(test_A), path_B(test_A)] all look the same as test_A and vice versa...

Not only that, when I tried turning use_mixup to False I had this error about different number of channels here:

number of input channels does not match corresponding dimension of filter, 3 != 6
`output_real = netD(real) # positive

It seems we have to manually change nc_D_inp to 3 instead of 6.

Keep up the good work...

will you provide pretrained model? thanks~

read me to start for beginner

Can you help to provide the idiot instruction to those new beginner ??

1 & a half day of training...

Why the new face looks so out of place? (i'm very glad with the color correction).

https://www.youtube.com/watch?v=sKhBkUVeFyQ

XGAN and Cycle-GAN with improved w-gan

Have you tried the two methods?
XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings
Improved Training of Wasserstein GANs
I tried to use your code, but it's hard to achieve the effect you showed in the "readme".
I used a video of myself and a video of Daniel Wu, maybe I am too ugly that the network can not transform??? It really hurts me.

GAN training

Hi @shaoanlu,

I am a little confused about the GAN training code.
errDA = netDA_train([warped_A,target_A])
errGA = netGA_train([warped_A,target_A])
here warped_A and target_A seems to be of same person "A". These two are referred to distorted_A and real_A in the code. If so, the learned generated fake_A should also be person "A" since your loss_G defined the L1 loss between fake_A and real_A. Where is the relationship to another person "B" in your DA and GA? So as DB and GB.

I have run the code, and indeed learned the swapping between two persons, but I don't quite understand it. Could please give some advice?

Automated switching to refined mask generation

I know (and appreciate) that you added snippet to switch loss function for L2 mask refinement, but could you maybe come up with some metric (and not necessarily the number of iterations) that would express the "right moment" in training to switch to smoother loss function? Or maybe even keep the dependency on the iteration number but do it somewhat automatically so that the training can be just left alone overnight and "figure it out" by itself?

Video generation runs out of GPU

Using 2x K80s. Training works fine, but video generation always runs out of GPU memory, even with batch size 1.

Any way to fix this?