MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model

The official PyTorch implementation of the paper "MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model".

Please visit our webpage for more details.

News

📢 17/Jun/24 - First release: pretrained models, training and test code.

Todo list

  1. Custom Speech Tutorial
  2. Train autoencoder for FGD

1. Setup environment

conda create -n mmofusion python=3.7
conda activate mmofusion

pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install -r requirements.txt

2. Data preparation

Download the BEAT dataset and choose the English data, v0.2.1.

We preprocess the data following DiffuseStyleGesture; thanks to its authors for their great work!

Download the audio preprocessing model WavLM-Large and the text preprocessing model crawl-300d-2M.

cd ./process/

python process_BEAT_bvh.py /your/BEAT/path/ /path/to/BEAT/processed/ None None "v0" "step1" "cuda:0"

python process_BEAT_bvh.py /your/BEAT/path/ /path/to/BEAT/processed/ "/your/weights/WavLM-Large.pt" "/your/weights/crawl-300d-2M.vec" "v0" "step3" "cuda:0"

The processed data will be saved in /path/to/BEAT/processed/. Before converting it into an H5 file, you can split the data into train/val/test following our setting with the notebook data_split_30.ipynb. After that, generate the H5 file BEAT_v0_train.h5 by running:

python process_BEAT_bvh.py /your/BEAT/path/ /path/to/BEAT/processed/ None None "v0" "step4" "cuda:0"

Then get the mean and std in ./process/ by running:

python calculate_gesture_statistics.py --dataset BEAT --version "v0"
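
As a rough illustration of what such statistics are typically used for, the sketch below computes a per-dimension mean and standard deviation over a set of motion frames and uses them to normalize a frame. It is not taken from calculate_gesture_statistics.py: the function names `gesture_statistics` and `normalize`, the list-of-lists input, and the small `eps` term are all assumptions made for this example.

```python
import math


def gesture_statistics(frames, eps=1e-8):
    """Return per-dimension (mean, std) over a list of equal-length frames.

    Hypothetical helper for illustration; the repository script may differ.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation; eps keeps constant dimensions
    # from causing a division by zero during normalization.
    std = [
        math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n) + eps
        for d in range(dim)
    ]
    return mean, std


def normalize(frame, mean, std):
    """Standardize one frame with the precomputed statistics."""
    return [(x - m) / s for x, m, s in zip(frame, mean, std)]


if __name__ == "__main__":
    toy_frames = [[0.0, 2.0], [2.0, 2.0]]  # two frames, two feature dimensions
    mean, std = gesture_statistics(toy_frames)
    print(mean, std)
    print(normalize(toy_frames[1], mean, std))
```

The same mean and std would be applied in reverse (multiply by std, add mean) to map generated motion back to the original scale.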

3. Test

Download our pretrained models, which cover motion generation for the upper body and for the whole body.

You can also find the pretrained autoencoder model last_600000.bin, which we trained on data from 30 speakers.

Edit model_path and e_path to load the pretrained models for testing, and tst_path to load the processed test data.
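
Before sampling, it can help to confirm that the three paths actually exist. The sketch below is one possible check, assuming the config has already been loaded into a flat Python dict with the keys model_path, e_path and tst_path. The helper name `missing_paths` is hypothetical, and the real YAML files may nest these keys differently.

```python
import os


def missing_paths(config, keys=("model_path", "e_path", "tst_path")):
    """Return the keys that are absent from config or point to a non-existent path.

    Hypothetical helper; assumes a flat dict, which may not match the real config layout.
    """
    missing = []
    for key in keys:
        value = config.get(key)
        if not value or not os.path.exists(value):
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "model_path": "/your/weights/model.pt",
        "e_path": "/your/weights/last_600000.bin",
        "tst_path": "/path/to/BEAT/processed/",
    }
    print("Missing or invalid:", missing_paths(example))
```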

for upper body
python sample_linear.py --config=./configs/mmofusion.yml --gpu 0

for whole body
python sample_linear.py --config=./configs/mmofusion_whole.yml --gpu 0

You can also adjust the guidance weight guidance_param, since the model is trained with classifier-free guidance.
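
For intuition about what the guidance weight does, here is a minimal sketch of the common classifier-free guidance combination, in which the unconditional prediction is pushed toward the conditional one by a scalar weight. This shows the general formulation only. Whether sample_linear.py uses exactly this form is an assumption, and the function name `guided_prediction` and the plain-list inputs are made up for the example.

```python
def guided_prediction(cond, uncond, guidance_param):
    """Blend conditional and unconditional predictions element-wise.

    Common classifier-free guidance form (assumed, not confirmed for this repo):
        out = uncond + guidance_param * (cond - uncond)
    """
    return [u + guidance_param * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond = [1.0, 2.0]    # prediction with speech/text conditions
    uncond = [0.0, 1.0]  # prediction with conditions dropped
    for w in (0.0, 1.0, 2.0):
        print(w, guided_prediction(cond, uncond, w))
```

In this form, a weight of 1 reproduces the conditional prediction, 0 reproduces the unconditional one, and values above 1 push the output further toward the conditioning signal.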

4. Train

Edit h5file in the config to point to the H5 file BEAT_v0_train.h5.

for upper body
python train.py --config=./configs/mmofusion.yml --gpu 0

for whole body 
...

Citation

If you find this repo useful for your research, please consider citing our paper:

@misc{wang2024mmofusion,
      title={MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model}, 
      author={Sen Wang and Jiangning Zhang and Weijian Cao and Xiaobin Hu and Moran Li and Xiaozhong Ji and Xin Tan and Mengtian Li and Zhifeng Xie and Chengjie Wang and Lizhuang Ma},
      year={2024},
      eprint={2403.02905},
      archivePrefix={arXiv},
      primaryClass={cs.MM}
}
