
MultiTalk (INTERSPEECH 2024)

This repository contains a PyTorch implementation for the INTERSPEECH 2024 paper, MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset. MultiTalk generates 3D talking heads with enhanced multilingual performance.

teaser

Getting started

This code was developed on Ubuntu 18.04 with Python 3.8, CUDA 11.3 and PyTorch 1.12.0. Later versions should work, but have not been tested.

Installation

Create and activate a virtual environment to work in:

conda create --name multitalk python=3.8
conda activate multitalk

Install PyTorch and ffmpeg. For CUDA 11.3, this looks like:

pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
conda install -c conda-forge ffmpeg

Install the remaining requirements with pip:

pip install -r requirements.txt 

Compile and install the psbody-mesh package (MPI-IS/mesh):

BOOST_INCLUDE_DIRS=/usr/lib/x86_64-linux-gnu make all

Download models

To run MultiTalk, you need to download the stage1 and stage2 models and the template file of the mean face in FLAME topology. Download the [stage1 model](https://drive.google.com/file/d/1jI9feFcUuhXst1pM1_xOMvqE8cgUzP_t/view?usp=sharing), the stage2 model, and the template, and download FLAME_sample.ply from voca.

After downloading the models, place them in ./checkpoints.

./checkpoints/stage1.pth.tar
./checkpoints/stage2.pth.tar
./checkpoints/FLAME_sample.ply
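A quick sanity check (a hypothetical helper, not part of the repo) that all three files are in place before running the demo:

```python
import os

REQUIRED = [
    "./checkpoints/stage1.pth.tar",
    "./checkpoints/stage2.pth.tar",
    "./checkpoints/FLAME_sample.ply",
]

def missing_checkpoints(paths=REQUIRED):
    """Return the subset of required files that are not present on disk."""
    return [p for p in paths if not os.path.isfile(p)]
```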

Demo

Run the command below to run the demo. We provide sample audio files in ./demo/input.

sh scripts/demo.sh multi

To use the wav2vec model facebook/wav2vec2-large-xlsr-53, open /path/to/conda_environment/lib/python3.8/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py and change the code as follows.

L105: tokenizer = Wav2Vec2CTCTokenizer.from_pretrained(pretrained_model_name_or_path, **kwargs)

to

L105: tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("facebook/wav2vec2-base-960h", **kwargs)

MultiTalk Dataset

Please follow the instructions in MultiTalk_dataset/README.md.

Training and testing

Training for Discrete Motion Prior

sh scripts/train_multi.sh MultiTalk_s1 config/multi/stage1.yaml multi s1

Training for Speech-Driven Motion Synthesis

Make sure the paths of the pre-trained models are correct, i.e., vqvae_pretrained_path and wav2vec2model_path in config/multi/stage2.yaml.

sh scripts/train_multi.sh MultiTalk_s2 config/multi/stage2.yaml multi s2
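Before launching stage2 training, you can verify that the pre-trained model paths in the config actually exist. A minimal sketch, assuming the config stores these entries as plain `key: path` lines (the helper name is hypothetical):

```python
import os
import re

def check_config_paths(config_path, keys=("vqvae_pretrained_path", "wav2vec2model_path")):
    """Scan a YAML-style config for the given keys and report whether each path exists."""
    found = {}
    with open(config_path) as f:
        for line in f:
            # Match simple "key: value" lines; ignores nested/quoted YAML forms.
            m = re.match(r"\s*(\w+)\s*:\s*(\S+)", line)
            if m and m.group(1) in keys:
                found[m.group(1)] = os.path.exists(m.group(2))
    return found
```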

Testing

Lip Vertex Error (LVE)

To evaluate the lip vertex error, run the command below.

sh scripts/test.sh MultiTalk_s2 config/multi/stage2.yaml vocaset s2
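LVE is commonly computed (e.g., in CodeTalker-style evaluations) as the maximal per-frame squared L2 error over the lip vertices, averaged over all frames. A minimal NumPy sketch under that assumption, where `lip_idx` is a hypothetical list of lip-vertex indices:

```python
import numpy as np

def lip_vertex_error(pred, gt, lip_idx):
    """pred, gt: (T, V, 3) vertex sequences; lip_idx: indices of lip vertices.
    Returns the max squared L2 error over lip vertices, averaged over frames."""
    diff = pred[:, lip_idx, :] - gt[:, lip_idx, :]      # (T, L, 3)
    per_vertex = np.sum(diff ** 2, axis=-1)             # squared L2 error per lip vertex
    return float(np.mean(np.max(per_vertex, axis=-1)))  # max over lip verts, mean over frames
```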

Audio-Visual Lip Reading (AVLR)

To evaluate lip readability with a pre-trained Audio-Visual Speech Recognition (AVSR) model, download the language-specific checkpoint, dictionary, and tokenizer from muavic.
Place them in ./avlr/${language}/checkpoints/${language}_avsr.

# e.g., Arabic
./avlr/ar/checkpoints/ar_avsr/checkpoint_best.pt
./avlr/ar/checkpoints/ar_avsr/dict.ar.txt
./avlr/ar/checkpoints/ar_avsr/tokenizer.model

Then place the rendered videos in ./avlr/${language}/inputs/MultiTalk and the corresponding wav files in ./avlr/${language}/inputs/wav.

# e.g., Arabic
./avlr/ar/inputs/MultiTalk
./avlr/ar/inputs/wav
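The evaluation expects each rendered video to have a matching wav file. A hedged sketch that pairs them by filename stem (the helper is hypothetical; it assumes one wav per video with the same base name):

```python
import os

def pair_inputs(video_dir, wav_dir):
    """Match rendered videos with wav files sharing the same filename stem.
    Returns (pairs, videos_without_audio)."""
    wav_stems = {os.path.splitext(f)[0] for f in os.listdir(wav_dir)}
    pairs, unmatched = [], []
    for f in sorted(os.listdir(video_dir)):
        stem = os.path.splitext(f)[0]
        if stem in wav_stems:
            pairs.append((os.path.join(video_dir, f),
                          os.path.join(wav_dir, stem + ".wav")))
        else:
            unmatched.append(f)
    return pairs, unmatched
```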

Run the command below to evaluate lip readability.

python eval_avlr/eval_avlr.py --avhubert-path ./av_hubert/avhubert --work-dir ./avlr --language ${language} --model-name MultiTalk --exp-name ${exp_name}

Notes

  1. Although our codebase allows multi-GPU training, we have not tested it and hardcode the training batch size to one. You may need to modify the data loader accordingly.

Acknowledgement

We heavily borrow code from CodeTalker. We sincerely appreciate those authors.

Contributors

chae0382, sbkim052
