
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

Demo     Paper     Code

  • The weights and code are being organized, and we will make them public as soon as possible.
  • Thank you for your attention. The paper is currently under peer review, and there may still be minor changes. We will update this repository after the official publication.

Environment Installation

conda create -n anitalker python==3.9.0
conda activate anitalker
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install -r requirements.txt

Model Zoo

Please download the checkpoints and place them in the `ckpts` folder.

Run the demo

Face facing forward

Keep `pose_yaw`, `pose_pitch`, and `pose_roll` at zero.

monalisa_facing_forward

Demo script:

python ./code/demo.py \
    --infer_type 'mfcc_pose_only' \
    --stage1_checkpoint_path 'ckpts/stage1.ckpt' \
    --stage2_checkpoint_path 'ckpts/stage2_pose_only.ckpt' \
    --test_image_path 'test_demos/portraits/monalisa.jpg' \
    --test_audio_path 'test_demos/audios/english_female.wav' \
    --result_path 'results/monalisa_case1/' \
    --control_flag True \
    --seed 0 \
    --pose_yaw 0 \
    --pose_pitch 0 \
    --pose_roll 0 
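When sweeping over several portraits, audio clips, or pose settings, the flags above can also be assembled programmatically. A minimal sketch (`build_demo_cmd` is a hypothetical helper; the script path and flag names mirror the demos in this README, so adjust them to your checkout):

```python
import subprocess

def build_demo_cmd(image, audio, out_dir, yaw=0.0, pitch=0.0, roll=0.0, seed=0):
    """Assemble one pose-controlled demo invocation as an argument list."""
    return [
        "python", "./code/demo.py",
        "--infer_type", "mfcc_pose_only",
        "--stage1_checkpoint_path", "ckpts/stage1.ckpt",
        "--stage2_checkpoint_path", "ckpts/stage2_pose_only.ckpt",
        "--test_image_path", image,
        "--test_audio_path", audio,
        "--result_path", out_dir,
        "--control_flag", "True",
        "--seed", str(seed),
        "--pose_yaw", str(yaw),
        "--pose_pitch", str(pitch),
        "--pose_roll", str(roll),
    ]

cmd = build_demo_cmd("test_demos/portraits/monalisa.jpg",
                     "test_demos/audios/english_female.wav",
                     "results/monalisa_case1/")
# subprocess.run(cmd, check=True)  # uncomment inside the AniTalker checkout
```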

Adjust the orientation

Change `pose_yaw` from 0 to 0.25 to turn the head to the right.

monalisa_turn_head_right

Demo script:

python ./code/demo.py \
    --infer_type 'mfcc_pose_only' \
    --stage1_checkpoint_path 'ckpts/stage1.ckpt' \
    --stage2_checkpoint_path 'ckpts/stage2_pose_only.ckpt' \
    --test_image_path 'test_demos/portraits/monalisa.jpg' \
    --test_audio_path 'test_demos/audios/english_female.wav' \
    --result_path 'results/monalisa_case2/' \
    --control_flag True \
    --seed 0 \
    --pose_yaw 0.25 \
    --pose_pitch 0 \
    --pose_roll 0 

Talking in free style

Without `control_flag` and the pose arguments, the model samples head motion freely.

monalisa_free_style

Demo script:

python ./code/demo.py \
    --infer_type 'mfcc_pose_only' \
    --stage1_checkpoint_path 'ckpts/stage1.ckpt' \
    --stage2_checkpoint_path 'ckpts/stage2_pose_only.ckpt' \
    --test_image_path 'test_demos/portraits/monalisa.jpg' \
    --test_audio_path 'test_demos/audios/english_female.wav' \
    --result_path 'results/monalisa_case3/'

More Scripts

See MORE_SCRIPTS

Some Advice and Questions

1. Use poses similar to the portrait (best practice). To avoid deformation, keep the generated face angle close to the portrait's original angle. For instance, if the face in the portrait is already rotated to the left, use a yaw value between -1 and 0 (-90 to 0 degrees). When the angle differs significantly from the portrait, the generated face may appear distorted.
2. Use algorithms to extract poses automatically, or control with another face's angles. If you need automated face control, you can employ pose-extraction algorithms, for example extracting another person's pose to drive the portrait. The extraction algorithms are open-sourced and can be found at this link.
3. What are the differences between MFCC and Hubert features? Both `MFCC` and `Hubert` are speech front-end features used to encode the audio signal. However, `Hubert` requires more environment dependencies and a significant amount of disk space, so to make quick inference easy for everyone we replaced it with the lightweight MFCC; the rest of the code is unchanged. We have observed that MFCC converges more easily but can be less expressive than Hubert. If you need to extract Hubert features, please refer to this link. Given how lifelike the generated results are, we currently do not plan to release the Hubert-based weights.
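Point 1 suggests a linear mapping between the normalized pose values and angles: -1 corresponds to -90 degrees and 0 to 0 degrees. A small helper under that assumption (the exact mapping used internally is not documented here, so treat this as an approximation):

```python
def degrees_to_pose(deg, max_deg=90.0):
    """Map an angle in degrees to a normalized pose value in [-1, 1].

    Assumes the linear -90..90 deg <-> -1..1 mapping implied by point 1
    above; out-of-range angles are clamped to avoid strong deformation.
    """
    return max(-1.0, min(1.0, deg / max_deg))

# e.g. turn the head about 22.5 degrees to the right, as in the second demo:
yaw = degrees_to_pose(22.5)  # -> 0.25
```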

Citation

@misc{liu2024anitalker,
      title={AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding}, 
      author={Tao Liu and Feilong Chen and Shuai Fan and Chenpeng Du and Qi Chen and Xie Chen and Kai Yu},
      year={2024},
      eprint={2405.03121},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgments

We would like to express our sincere gratitude to the numerous prior works that have laid the foundation for the development of AniTalker.

Stage 1, which primarily focuses on training the motion encoder and the rendering module, heavily relies on resources from LIA. The second stage of diffusion training is built upon diffae and espnet. For the computation of mutual information loss, we implement methods from CLUB and utilize AAM-softmax in the training of face recognition. Moreover, we leverage the pretrained Hubert model provided by TencentGameMate.

Additionally, we employ 3DDFA_V2 to extract head pose and torchlm to obtain face landmarks, which are used to compute face location and scale. We have open-sourced these preprocessing steps at talking_face_preprocessing. We acknowledge the importance of building on existing knowledge and are committed to contributing back to the research community by sharing our findings and code.
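For reference, the location-and-scale computation from landmarks can be sketched as follows. This is a simplified stand-in, not the actual `talking_face_preprocessing` code, whose crop logic (padding, squaring, smoothing across frames) may differ:

```python
def face_location_and_scale(landmarks):
    """Estimate face center and scale from 2D landmarks.

    `landmarks` is a list of (x, y) points, e.g. as produced by torchlm.
    The scale here is simply the longer side of the landmark bounding box.
    """
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    cx = (min(xs) + max(xs)) / 2.0
    cy = (min(ys) + max(ys)) / 2.0
    scale = max(max(xs) - min(xs), max(ys) - min(ys))
    return (cx, cy), scale

center, scale = face_location_and_scale([(10, 20), (50, 20), (30, 80)])
# center == (30.0, 50.0), scale == 60
```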

Disclaimer

  1. This library's code is not a formal product, and we have not tested all use cases; it therefore cannot be offered directly to end customers.

  2. The main purpose of making our code public is to facilitate academic demonstrations and communication. Any use of this code to spread harmful information is strictly prohibited.

  3. Please use this library in compliance with the terms specified in the license file and avoid improper use.

  4. When using the code, please follow and abide by local laws and regulations.

  5. During the use of this code, you will bear the corresponding responsibility. Our company (AISpeech Ltd.) is not responsible for the generated results.


