AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations

Huawei Wei, Zejun Yang, Zhisheng Wang

Tencent Games Zhiji, Tencent

Here we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image. You can also provide a video to achieve face reenactment.

Pipeline

(pipeline overview diagram)

TODO List

  • Our paper is now available on arXiv.

  • Update the code to generate pose_temp.npy for head pose control.

  • We will release the audio2pose pre-trained weights for audio2video after further optimization. Until then, you can choose a head pose template in ./configs/inference/head_pose_temp as a substitute.

Various Generated Videos

Self driven

cxk.mp4
solo.mp4

Face reenactment

Aragaki.mp4
num18.mp4

Audio driven

jijin.mp4
kara.mp4
lyl.mp4
zl.mp4

Installation

Build environment

We recommend Python >= 3.10 and CUDA 11.7. Then build the environment as follows:

pip install git+https://github.com/painebenjamin/aniportrait.git
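
If you prefer an isolated environment, here is a minimal sketch using conda (the environment name aniportrait is an arbitrary choice; a plain venv works just as well):

# Create and activate a fresh Python 3.10 environment.
conda create -n aniportrait python=3.10 -y
conda activate aniportrait

# Install AniPortrait and its dependencies from the repository.
pip install git+https://github.com/painebenjamin/aniportrait.git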

Inference

You can now use the aniportrait command-line utility. The examples below use files included in this repository:

Face Reenactment

aniportrait configs/inference/ref_images/solo.png --video configs/inference/video/Aragaki_song.mp4 --num-frames 64 --width 512 --height 512

Note: remove --num-frames 64 to match the length of the video.

Audio Driven

aniportrait configs/inference/ref_images/lyl.png --audio configs/inference/video/lyl.wav --num-frames 96 --width 512 --height 512

Note: remove --num-frames 96 to match the length of the audio.
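
For reproducible runs and a custom output path, you can add the --seed and --output flags documented in the help below; a sketch (the seed value and output file name here are arbitrary):

aniportrait configs/inference/ref_images/lyl.png --audio configs/inference/video/lyl.wav --width 512 --height 512 --seed 42 --output lyl_audio.mp4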

Help

For help, run aniportrait --help.

Usage: aniportrait [OPTIONS] INPUT_IMAGE

  Run AniPortrait on an input image with a video and/or audio file.

  - When only a video file is provided, a video-to-video (face reenactment)
    animation is performed.
  - When only an audio file is provided, an audio-to-video (lip-sync)
    animation is performed.
  - When both a video and an audio file are provided, a video-to-video
    animation is performed with the audio as guidance for the face and mouth
    movements.

Options:
  -v, --video FILE                Video file to drive the animation.
  -a, --audio FILE                Audio file to drive the animation.
  -fps, --frame-rate INTEGER      Video FPS. Also controls the sampling rate
                                  of the audio. Will default to the video FPS
                                  if a video file is provided, or 30 if not.
  -cfg, --guidance-scale FLOAT    Guidance scale for the diffusion process.
                                  [default: 3.5]
  -ns, --num-inference-steps INTEGER
                                  Number of diffusion steps.  [default: 20]
  -cf, --context-frames INTEGER   Number of context frames to use.  [default:
                                  16]
  -co, --context-overlap INTEGER  Number of context frames to overlap.
                                  [default: 4]
  -nf, --num-frames INTEGER       An explicit number of frames to use. When
                                  not passed, use the length of the audio or
                                  video
  -s, --seed INTEGER              Random seed.
  -w, --width INTEGER             Output video width. Defaults to the input
                                  image width.
  -h, --height INTEGER            Output video height. Defaults to the input
                                  image height.
  -m, --model TEXT                HuggingFace model name.
  -nh, --no-half                  Do not use half precision.
  -g, --gpu-id INTEGER            GPU ID to use.
  -sf, --single-file              Download and use a single file instead of a
                                  directory.
  -cf, --config-file TEXT         Config file to use when using the single-
                                  file option. Accepts a path or a filename in
                                  the same directory as the single file. Will
                                  download from the repository passed in the
                                  model option if not provided.  [default:
                                  config.json]
  -mf, --model-filename TEXT      The model file to download when using the
                                  single-file option.  [default:
                                  aniportrait.safetensors]
  -rs, --remote-subfolder TEXT    Remote subfolder to download from when using
                                  the single-file option.
  -c, --cache-dir DIRECTORY       Cache directory to download to. Default uses
                                  the huggingface cache.
  -o, --output FILE               Output file.  [default: output.mp4]
  --help                          Show this message and exit.
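
As the help text describes, a video and an audio file can be combined so the video drives the face while the audio guides the mouth movements. A sketch reusing the sample files from the examples above (the pairing of these particular files is arbitrary):

aniportrait configs/inference/ref_images/solo.png --video configs/inference/video/Aragaki_song.mp4 --audio configs/inference/video/lyl.wav --width 512 --height 512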

Training

Data preparation

Download VFHQ and CelebV-HQ

Extract keypoints from the raw videos and write the training JSON file (here is an example of processing VFHQ):

python -m scripts.preprocess_dataset --input_dir VFHQ_PATH --output_dir SAVE_PATH --training_json JSON_PATH
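
CelebV-HQ can be preprocessed the same way; a sketch with placeholder paths (CELEBVHQ_PATH, SAVE_PATH, and JSON_PATH are placeholders to substitute with your own):

python -m scripts.preprocess_dataset --input_dir CELEBVHQ_PATH --output_dir SAVE_PATH --training_json JSON_PATH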

Update the following lines in the training config file:

data:
  json_path: JSON_PATH

Stage1

Run command:

accelerate launch train_stage_1.py --config ./configs/train/stage1.yaml
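
accelerate picks up your saved accelerate configuration by default; to pin the process count explicitly, its standard --num_processes flag can be used (4 processes here is only an example):

accelerate launch --num_processes 4 train_stage_1.py --config ./configs/train/stage1.yaml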

Stage2

Put the pretrained motion module weights mm_sd_v15_v2.ckpt (download link) under ./pretrained_weights.

Specify the stage1 training weights in the config file stage2.yaml, for example:

stage1_ckpt_dir: './exp_output/stage1'
stage1_ckpt_step: 30000 

Run command:

accelerate launch train_stage_2.py --config ./configs/train/stage2.yaml

Acknowledgements

We first thank the authors of EMO; some of the images and audio clips in our demos come from EMO. Additionally, we would like to thank the contributors to the Moore-AnimateAnyone, magic-animate, animatediff, and Open-AnimateAnyone repositories for their open research and exploration.

Citation

@misc{wei2024aniportrait,
      title={AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations}, 
      author={Huawei Wei and Zejun Yang and Zhisheng Wang},
      year={2024},
      eprint={2403.17694},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
