
CVTHead

Official Implementation of WACV 2024 paper, "CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer"


Efficient and controllable head avatar generation from a single image with point-based neural rendering.

[Teaser: novel shape and novel expression results]

[Framework overview figure]

Introduction

Reconstructing personalized animatable head avatars has significant implications for AR/VR. Existing methods that achieve explicit face control via 3D Morphable Models (3DMM) typically rely on multi-view images or videos of a single subject, which makes the reconstruction process complex. Additionally, the traditional rendering pipeline is time-consuming, limiting real-time animation. In this paper, we introduce CVTHead, a novel approach that generates controllable neural head avatars from a single reference image using point-based neural rendering. CVTHead treats the sparse mesh vertices as a point set and employs the proposed Vertex-feature Transformer to learn a local feature descriptor for each vertex, which enables modeling long-range dependencies among all vertices. Experimental results on the VoxCeleb dataset demonstrate that CVTHead achieves performance comparable to state-of-the-art graphics-based methods while enabling efficient rendering of novel human heads with various expressions, head poses, and camera views. These attributes can be explicitly controlled through 3DMM coefficients, facilitating versatile and realistic animation in real-time scenarios.

Install

Setup environment

conda create -n cvthead python=3.9
conda activate cvthead

pip install -r requirements.txt
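
The requirements assume a CUDA-capable PyTorch build (the provided scripts move the model to the GPU at load time), so it is worth verifying the install first. A minimal check:

# Verify that the installed PyTorch wheel was built with CUDA support.
import torch
print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # should be True before running the scripts below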

Download pre-trained weights

cd data/
bash fetch_data.sh
cd ..

Please go to data/README.md for more details.

Inference

Download our pre-trained model cvthead.pt from Google Drive and put it under the data/ folder.

Here is a demo of using CVTHead for cross-identity face reenactment:

python inference.py --src_pth examples/1.png --drv_pth examples/2.png --out_pth examples/output.png --ckpt_pth data/cvthead.pt
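
To reenact a whole driving sequence instead of a single frame, the same command can be looped over a folder of frames. A minimal sketch; the examples/driving and examples/outputs folders are hypothetical, and the flags are exactly those shown above:

# Hypothetical batch driver: reenact the source image with every driving frame.
import subprocess
from pathlib import Path

src = "examples/1.png"
out_dir = Path("examples/outputs")  # hypothetical output folder
out_dir.mkdir(exist_ok=True)
for drv in sorted(Path("examples/driving").glob("*.png")):  # hypothetical input folder
    subprocess.run([
        "python", "inference.py",
        "--src_pth", src,
        "--drv_pth", str(drv),
        "--out_pth", str(out_dir / drv.name),
        "--ckpt_pth", "data/cvthead.pt",
    ], check=True)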

Here is a demo of face generation under the control of FLAME coefficients:

python inference.py --src_pth examples/1.png --out_pth examples --ckpt_pth data/cvthead.pt --flame
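
For context, FLAME describes a head with separate shape, expression, and pose (global plus jaw rotation) coefficients, and these are the attributes the --flame demo varies. A purely illustrative sketch of the coefficient tensors, assuming the common DECA dimensions (check this repo's configs for the exact values):

# Illustrative FLAME coefficient tensors; dimensions follow common DECA defaults.
import torch

shape = torch.zeros(1, 100)  # identity/shape coefficients
exp = torch.zeros(1, 50)     # expression coefficients
pose = torch.zeros(1, 6)     # global rotation (3) + jaw rotation (3), axis-angle
exp[0, 0] = 2.0              # exaggerate the first expression basis
pose[0, 3] = 0.3             # open the jaw slightly (radians)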

Training

Dataset Preparation (VoxCeleb1)

  • Download videos: please refer to video-preprocessing to download the videos. We follow the bounding boxes originally provided by VoxCeleb1, rather than the union of a third-party detector's bbox and the provided bbox used in that preprocessing step.

Note: some videos may no longer be available on YouTube. Due to copyright and privacy issues, I cannot share these face videos with others.

  • Obtain per-frame landmarks with face_alignment, as in the sketch below.
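
A minimal per-video sketch with the face_alignment package; the enum spelling differs across face_alignment versions, and the pickle layout here is illustrative rather than the repo's exact schema:

# Sketch: extract 68-point 2D landmarks for every frame of a video and pickle them.
import pickle
import face_alignment
import imageio.v3 as iio

# LandmarksType.TWO_D in recent releases; older versions use LandmarksType._2D.
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D, device="cuda")

def extract_landmarks(video_pth, out_pkl):
    lmks = []
    for frame in iio.imiter(video_pth):   # iterate over RGB frames
        preds = fa.get_landmarks(frame)   # list of (68, 2) arrays, or None if no face
        lmks.append(preds[0] if preds else None)
    with open(out_pkl, "wb") as f:
        pickle.dump(lmks, f)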

The data organization should look like:

-- VoxCeleb1
---- vox_video
------ train
-------- xxxx.mp4
-------- ......
------ test
-------- id10280#NXjT3732Ekg#001093#001192.mp4
-------- xxxx.mp4
-------- ......
---- vox_lmks_meta
------ train
-------- xxxx.pkl
-------- ......
------ test
-------- id10280#NXjT3732Ekg#001093#001192.pkl
-------- xxxx.pkl
-------- ......
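
With the layout above in place, a quick consistency check (a sketch assuming exactly that directory structure) can catch videos that are missing their landmark file:

# Verify that every video has a matching landmark pickle, per split.
from pathlib import Path

root = Path("VoxCeleb1")
for split in ("train", "test"):
    vids = {p.stem for p in (root / "vox_video" / split).glob("*.mp4")}
    lmks = {p.stem for p in (root / "vox_lmks_meta" / split).glob("*.pkl")}
    print(split, "videos without landmarks:", sorted(vids - lmks)[:5])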

Training scripts

  • We find that splitting the training into two separate stages yields more stable training curves. In the first stage, we train the model without the adversarial loss. In the second stage, we continue training the model with all losses described in the paper for a few epochs.
torchrun --standalone --nnodes 1 --nproc_per_node 2 main_stage1.py --config configs/vox1.yaml
torchrun --standalone --nnodes 1 --nproc_per_node 2 main_stage2.py --config configs/vox1.yaml
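
The commands above assume a single node with 2 GPUs; adjust --nproc_per_node to match the number of GPUs available on your machine.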

Acknowledgement

ROME
DECA
Spiralnet++
face-parsing.PyTorch
face-alignment

Citation

If you find this code helpful, please consider citing:

@article{ma2023cvthead,
  title={CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer},
  author={Ma, Haoyu and Zhang, Tong and Sun, Shanlin and Yan, Xiangyi and Han, Kun and Xie, Xiaohui},
  journal={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2024}
}


cvthead's Issues

Is this project available on Windows 10?

Hi, I am interested in this project, but I ran into an AssertionError while running it on Windows 10.
The error is as follows:

(cvthead) E:\Workspace\aqua\CVTHead>python inference.py --src_pth examples/1.png --drv_pth examples/2.png --out_pth examples/output.png --ckpt_pth data/cvthead.pt
 ************ Load pre-traiend Face Parsing Model ************
************ Load pre-traiend Hair+Shoulder Deformation Model ************


 ************ Load pre-traiend DECA ************
C:\Users\Administrator\anaconda3\envs\cvthead\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
C:\Users\Administrator\anaconda3\envs\cvthead\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
  warnings.warn(msg)
Traceback (most recent call last):
  File "E:\Workspace\aqua\CVTHead\inference.py", line 95, in <module>
    main(args)
  File "E:\Workspace\aqua\CVTHead\inference.py", line 75, in main
    model = CVTHead()                                        # cpu model
  File "E:\Workspace\aqua\CVTHead\models\cvthead.py", line 150, in __init__
    self.deca = DECA(config=deca_cfg)
  File "E:\Workspace\aqua\CVTHead\decalib\deca.py", line 50, in __init__
    self._create_model(self.cfg.model)
  File "E:\Workspace\aqua\CVTHead\decalib\deca.py", line 81, in _create_model
    self.E_flame = ResnetEncoder(outsize=self.n_param).to(self.device)
  File "C:\Users\Administrator\anaconda3\envs\cvthead\lib\site-packages\torch\nn\modules\module.py", line 1145, in to
    return self._apply(convert)
  File "C:\Users\Administrator\anaconda3\envs\cvthead\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "C:\Users\Administrator\anaconda3\envs\cvthead\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "C:\Users\Administrator\anaconda3\envs\cvthead\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "C:\Users\Administrator\anaconda3\envs\cvthead\lib\site-packages\torch\nn\modules\module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "C:\Users\Administrator\anaconda3\envs\cvthead\lib\site-packages\torch\cuda\__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
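
This error indicates a CPU-only PyTorch build: the trace shows DECA moving its encoder to CUDA at construction time. A minimal check to confirm before patching anything; installing a CUDA-enabled PyTorch wheel that matches your driver should resolve it:

# On a CPU-only wheel, torch.version.cuda is None and any .to("cuda") call
# raises exactly the AssertionError shown above.
import torch
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # False reproduces the failure condition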

Error: _pickle.UnpicklingError: invalid load key, '<'.

Thank you for your great work!
I set up the environment per requirements.txt; however, I hit this error when the model file "data/cvthead.pt" was loaded in inference.py:

magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'

I would appreciate any suggestion on how to fix it.
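
An invalid load key of '<' typically means the file begins with '<html', i.e. the Google Drive download saved a confirmation page instead of the actual checkpoint. A minimal check; re-downloading cvthead.pt manually from the Drive link usually fixes it:

# A real torch checkpoint starts with b'PK' (zip archive) or pickle opcodes,
# never b'<'; b'<' usually means an HTML page was saved in its place.
with open("data/cvthead.pt", "rb") as f:
    print(f.read(16))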

Output quality

Hi, great project, but the instructions are a little unclear on how to achieve the same output quality as your examples.
I even tried face generation under the control of FLAME coefficients, using the exact command line from this repo and your own example source file, and the output quality was dramatically worse than the example output included in the project.

I am referring to the severe pixelation and reduced colour range. Please see the attached output files for reference.
Thank you

[Attached outputs: exp, jaw, pose, shape]
