talking-head-anime-4-demo's Introduction

Demo Code for "Talking Head(?) Anime from a Single Image 4: Improved Model and Its Distillation"

This repository contains demo programs for the "Talking Head(?) Anime from a Single Image 4: Improved Model and Its Distillation" project. Roughly, the project is about a machine learning model that can animate an anime character given only one image. However, the model is too slow to run in real time. So, the project also proposes an algorithm that uses the model to train a small machine learning model, specialized to a particular character image, that can animate that character in real time.

This demo code has two parts.

  • Improved model. This part gives a model similar to Version 3 of the project. It has one demo program:

    • The full_manual_poser allows the user to manipulate a character's facial expression and body rotation through a graphical user interface.

    There are no real-time demos because the new model is too slow for that.

  • Distillation. This part allows the user to train small models (which we will refer to as student models) to mimic the behavior of the full system with respect to a specific character image. It also allows the user to run these models under various interfaces. The demo programs are:

    • distill trains a student model given a configuration file, a 512 x 512 RGBA character image, and a mask of facial organs.
    • distiller_ui provides a user-friendly interface to distill, allowing you to create training configurations and providing useful documentation.
    • character_model_manual_poser allows the user to control trained student models with a graphical user interface.
    • character_model_ifacialmocap_puppeteer allows the user to control trained student models with their facial movement, which is captured by the iFacialMocap software. To run this software, you must have an iOS device and, of course, iFacialMocap.
    • character_model_mediapipe_puppeteer allows the user to control trained student models with their facial movement, which is captured by a web camera and processed by the MediaPipe FaceLandmarker model. To run this software, you need a web camera.

Preemptive FAQs

What is the program to control character images with my facial movement?

There is no such program in this release. If you want one, try the ifacialmocap_puppeteer of Version 3.

OK. I'm confused. Isn't your work about easy VTubing? Are you saying this release cannot do it?

NO. This release does it in a more complicated way. In order to control an image, you need to create a "student model." It is a small (< 2MB) and fast machine learning model that knows how to animate that particular image. Then, the student model can be controlled with facial movement. You can find two student models in the data/character_models directory. The two demos on the project website feature 13 student models.

So, for this release, you can control only these few characters in real time?

No. You can create your own student models.

How do I create this student model then?

  1. You prepare your character image according to the "Constraints on Input Images" section below.
  2. You prepare a black-and-white mask image that covers the eyes and the mouth of the character, like this image. You can see how I made it with GIMP by inspecting this GIMP file.
  3. You use distiller_ui to create a configuration file that specifies how the student model should be trained.
  4. You use distiller_ui or distill to start the training process.
  5. You wait several tens of hours for the student model to finish training. Last time I tried, it was about 30 hours on a computer with an Nvidia RTX A6000 GPU.
  6. After that, you can control the student model with character_model_ifacialmocap_puppeteer and character_model_mediapipe_puppeteer.
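
Before committing to a training run of tens of hours, it can be worth sanity-checking the inputs from steps 1 and 2. The sketch below is not part of the repository; the file names are placeholders, and it only verifies the image size, the channel layout, and that the mask is strictly black-and-white.

    from PIL import Image

    character = Image.open("my_character.png")      # placeholder path
    mask = Image.open("my_character_mask.png")      # placeholder path

    assert character.size == (512, 512), "character image must be 512 x 512"
    assert character.mode == "RGBA", "character image must have an alpha channel"
    assert mask.size == character.size, "mask must match the character image size"

    # Every mask pixel should be either fully black or fully white.
    values = set(mask.convert("L").getdata())
    assert values <= {0, 255}, f"mask has non-black-and-white values: {sorted(values)[:10]}"
    print("Character image and mask pass the basic checks.")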

Why is this release so hard to use?

Version 3 is arguably easier to use because you can give it an image and control it with your facial movement immediately. However, I was not satisfied with its image quality and speed.

In this release, I explore a new way of doing things. I added a new preprocessing stage (i.e., training the student models) that has to be done one time per character image. It allows the image to be animated much faster at a higher image quality level.

In other words, it makes the user's life more difficult but the engineer/researcher happy. Patient users who are willing to go through the steps, though, will be rewarded with faster, higher-quality animation.

Can I use a student model from a web browser?

No. A student model created by distill is a PyTorch model, which cannot run directly in the browser. It needs to be converted to the appropriate format (TensorFlow.js) first, and the web demos use the converted models. However, the conversion code is not included in this repository. I will not release it unless I change my mind.

Hardware Requirements

All programs require a recent and powerful Nvidia GPU to run. I developed the programs on a machine with an Nvidia RTX A6000. However, anything after the GeForce RTX 2080 should be fine.

The character_model_ifacialmocap_puppeteer program requires an iOS device that is capable of computing blend shape parameters from a video feed. This means that the device must be able to run iOS 11.0 or higher and must have a TrueDepth front-facing camera. (See this page for more info.) In other words, if you have the iPhone X or something better, you should be all set. Personally, I have used an iPhone 12 mini.

The character_model_mediapipe_puppeteer program requires a web camera.

Software Requirements

GPU Driver and CUDA Toolkit

Please update your GPU's device driver and install a CUDA Toolkit that is compatible with your GPU and is at least as new as the CUDA version used by the PyTorch build you will install in the next subsection.
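
Once the Python environment described below is set up, a quick sanity check (not part of the demo code) is to confirm that PyTorch was installed with CUDA support and can see your GPU:

    import torch

    print("CUDA available:", torch.cuda.is_available())
    print("CUDA version PyTorch was built with:", torch.version.cuda)
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))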

Python and Python Libraries

All programs are written in the Python programming language. The following libraries are required:

  • python 3.10.11
  • torch 1.13.1 with CUDA support
  • torchvision 0.14.1
  • tensorboard 2.15.1
  • opencv-python 4.8.1.78
  • wxpython 4.2.1
  • numpy-quaternion 2022.4.2
  • pillow 9.4.0
  • matplotlib 3.6.3
  • einops 0.6.0
  • mediapipe 0.10.3
  • numpy 1.26.3
  • scipy 1.12.0
  • omegaconf 2.3.0

Instead of installing these libraries yourself, you should follow the recommended method to set up a Python environment in the next section.
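
If you later want to confirm what is actually installed in your environment, a small check like the following (not part of the repository) prints the installed version of each library listed above:

    from importlib import metadata

    libraries = ["torch", "torchvision", "tensorboard", "opencv-python",
                 "wxpython", "numpy-quaternion", "pillow", "matplotlib",
                 "einops", "mediapipe", "numpy", "scipy", "omegaconf"]
    for name in libraries:
        try:
            print(f"{name}: {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")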

iFacialMocap

If you want to use character_model_ifacialmocap_puppeteer, you will also need an iOS application called iFacialMocap (a 980-yen purchase in the App Store). Your iOS device and your computer must be on the same network. For example, you may connect them to the same wireless router.

Creating Python Environment

Installing Python

Please install Python 3.10.11.

I recommend using pyenv (or pyenv-win for Windows users) to manage multiple Python versions on your system. If you use pyenv, this repository has a .python-version file indicating that it uses Python 3.10.11, so pyenv will select Python 3.10.11 automatically once you cd into the repository's directory.

Make sure that you can run Python from the command line.
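
For example, the following one-off check (not part of the repository) confirms that the interpreter your shell picks up is the expected version:

    import sys

    print(sys.version)                       # should report 3.10.11
    assert sys.version_info[:3] == (3, 10, 11), "unexpected Python version"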

Installing Poetry

Please install Poetry 1.7 or later. We will use it to automatically install the required libraries. Again, make sure that you can run it from the command line.

Cloning the Repository

Please clone the repository to an arbitrary directory on your machine.

Instruction for Linux/OSX Users

  1. Open a shell.
  2. cd to the directory to which you just cloned the repository
    cd SOMEWHERE/talking-head-anime-4-demo
    
  3. Use Python to create a virtual environment under the venv directory.
    python -m venv venv --prompt talking-head-anime-4-demo
    
  4. Activate the newly created virtual environment. You can either use the script I provide:
    source bin/activate-venv.sh
    
    or do it yourself:
    source venv/bin/activate   
    
  5. Use Poetry to install libraries.
    cd poetry
    poetry install
    

Instruction for Windows Users

  1. Open a shell.
  2. cd to the directory to which you just cloned the repository
    cd SOMEWHERE\talking-head-anime-4-demo
    
  3. Use Python to create a virtual environment under the venv directory.
    python -m venv venv --prompt talking-head-anime-4-demo
    
  4. Activate the newly created virtual environment. You can either use the script I provide:
    bin\activate-venv.bat
    
    or do it yourself:
    venv\Scripts\activate   
    
  5. Use Poetry to install libraries.
    cd poetry
    poetry install
    

Download the Models/Dataset Files

THA4 Models

Please download this ZIP file hosted on Dropbox, and unzip it to the data/tha4 directory under the repository's directory. In the end, the directory tree should look like the following diagram:

+ talking-head-anime-4-demo
   + data
      - character_models
      - distill_examples
      + tha4
         - body_morpher.pt
         - eyebrow_decomposer.pt
         - eyebrow_morphing_combiner.pt
         - face_morpher.pt
         - upscaler.pt
      - images
      - third_party
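
After unzipping, you can verify that the five model files are in place with a short check like this (not part of the repository; run it from the repository's root directory):

    from pathlib import Path

    tha4_dir = Path("data/tha4")
    for name in ["body_morpher.pt", "eyebrow_decomposer.pt",
                 "eyebrow_morphing_combiner.pt", "face_morpher.pt", "upscaler.pt"]:
        path = tha4_dir / name
        print(f"{path}: {'found' if path.exists() else 'MISSING'}")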

Pose Dataset

If you want to create your own student models, you also need to download a dataset of poses needed for the training process. Download this pose_dataset.pt file and save it to the data folder. The directory tree should then look like the following diagram:

+ talking-head-anime-4-demo
   + data
      - character_models
      - distill_examples
      - tha4
      - images
      - third_party
      - pose_dataset.pt

Running the Programs

The programs are located in the src/tha4/app directory. You need to run them from a shell with the provided scripts.

Instruction for Linux/OSX Users

  1. Open a shell.

  2. cd to the repository's directory.

    cd SOMEWHERE/talking-head-anime-4-demo
    
  3. Run a program.

    bin/run src/tha4/app/<program-file-name>
    

    where <program-file-name> can be replaced with:

    • character_model_ifacialmocap_puppeteer.py
    • character_model_manual_poser.py
    • character_model_mediapipe_puppeteer.py
    • distill.py
    • distiller_ui.py
    • full_manual_poser.py

Instruction for Windows Users

  1. Open a shell.

  2. cd to the repository's directory.

    cd SOMEWHERE\talking-head-anime-4-demo
    
  3. Run a program.

    bin\run.bat src\tha4\app\<program-file-name>
    

    where <program-file-name> can be replaced with:

    • character_model_ifacialmocap_puppeteer.py
    • character_model_manual_poser.py
    • character_model_mediapipe_puppeteer.py
    • distill.py
    • distiller_ui.py
    • full_manual_poser.py

Constraints on Input Images

In order for the system to work well, the input image must obey the following constraints:

  • It should be of resolution 512 x 512. (If the demo programs receive an input image of any other size, they will resize the image to this resolution and also output at this resolution.)
  • It must have an alpha channel.
  • It must contain only one humanoid character.
  • The character should be standing upright and facing forward.
  • The character's hands should be below and far from the head.
  • The head of the character should roughly be contained in the 128 x 128 box in the middle of the top half of the image.
  • The alpha values of all pixels that do not belong to the character (i.e., background pixels) must be 0.

An example of an image that conforms to the above criteria
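
The following sketch (not part of the repository) shows how most of the above constraints can be checked programmatically. The file name is a placeholder, and the head-box check only confirms the coarse requirement that some opaque pixels fall inside the central 128 x 128 box of the top half.

    import numpy as np
    from PIL import Image

    image = Image.open("my_character.png")      # placeholder path
    assert image.size == (512, 512), "image should be 512 x 512"
    assert image.mode == "RGBA", "image must have an alpha channel"

    alpha = np.asarray(image)[:, :, 3]

    # Background pixels must be fully transparent; report partially transparent ones.
    semi_transparent = int(np.logical_and(alpha > 0, alpha < 255).sum())
    print("partially transparent pixels:", semi_transparent)

    # The head should roughly fit in the 128 x 128 box in the middle of the top half,
    # i.e., rows 64-191 and columns 192-319 of the 512 x 512 image.
    head_box = alpha[64:192, 192:320]
    print("opaque pixels in the head box:", int((head_box > 0).sum()))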

Documentation for the Tools

Disclaimer

The author is an employee of pixiv Inc. This project is a part of his work as a researcher.

However, this project is NOT a pixiv product. The company will NOT provide any support for this project. The author will try to support the project, but there are no Service Level Agreements (SLAs) that he will maintain.

The code is released under the MIT license. The THA4 models and the images under the data/images directory are released under the Creative Commons Attribution-NonCommercial 4.0 International license.

This repository redistributes a version of the Face landmark detection model from the MediaPipe project. The model has been released under the Apache License, Version 2.0.

talking-head-anime-4-demo's Issues

Is it possible to provide a Non-GUI demo?

Thank you for your astonishing project; the results are awesome!

When I tried to run the demo code, I had problems installing wxpython, and I found that this package is only needed for the GUI. I am working on a remote server, so it is also troublesome to use a GUI. Would it be possible to provide a non-GUI version that takes an image and an audio file as inputs and generates the result video directly?

Thank you again and best regards.

Can I animate my own image?

I have been trying to create a real-time talking head to interface with ChatGPT and stumbled across your repo. Thank you for sharing your work. Instead of an anime character face, if I were to train a student network on my own photo, would I be able to control the animation? I'm a noob in this field and any pointers are greatly appreciated.
Thanks,
Thomas.

The model training time is too long

I've been trying to train a student model on my RTX 4070 (12 GB).
The face training was quick, taking only 4 hours (with a target of 1 million steps).
However, the body training has already reached its tenth day and is only at 370,000 steps (with a target of 1.5 million steps), averaging 1500~2000 steps/hr.
Is this normal? Would you recommend ending the training early?
Or how much VRAM would you recommend for training?
Thank you for your reply.

Does this model work well with real-life images?

Thanks for the great work!
Does this model work well with real-life images?
I'm also curious if it works well with photorealistic images generated by models like Stable Diffusion.
Has anyone tested this?

Regarding the possibility of training a GPU-free model

Hi pkhungurn, first of all, I have the highest respect for your work.

In the v4 demo, I trained a student model and tested it. It ran very well in real time. Compared with the v3 demo, it used fewer GPU resources and was also smoother.

But I still failed when I tried to make the student model work on a mobile phone, because mobile devices do not have a GPU powerful enough to support the student model.

So I have always had an idea: is it possible to use a GPU to train a model that can then be run without a GPU? For example, Live2D models can run on mobile devices. This could greatly expand the usage scenarios of talking-head-anime.

I recently used Python to save a series of pictures generated by the talking-head-anime model, used them to implement a frame-by-frame animation, and finally ran the result in Unity, but the transitions between actions were still not smooth enough, and it was hard to display complex movements such as blinking and head shaking. So I was wondering: is it possible, like a Live2D model, for what we ultimately train to be a morpher of the image, so that the animation effect is achieved through image deformation?
