
sinmdm's Introduction

SinMDM: Single Motion Diffusion

ICLR 2024 Spotlight

teaser

Please visit our project page for more details.

Setup

This code has been tested in the following environment:

  • Ubuntu 18.04.5 LTS
  • Python 3.8
  • conda3 or miniconda3
  • CUDA capable GPU (one is enough)

Setup conda env:

conda env create -f environment.yml
conda activate SinMDM

Install ganimator-eval-kernel by following these instructions, OR by running:

pip install git+https://github.com/PeizhuoLi/ganimator-eval-kernel.git

Preparations

Get Data

Data should be under the ./dataset folder.

Mixamo Dataset

Download the motions used for our benchmark:

bash prepare/download_mixamo_dataset.sh

Or download motions directly from Mixamo and use utils/fbx2bvh.py to convert fbx files to bvh files.

HumanML3D Dataset

Clone HumanML3D, then copy the data dir to our repository:

cd ..
git clone https://github.com/EricGuo5513/HumanML3D.git
unzip ./HumanML3D/HumanML3D/texts.zip -d ./HumanML3D/HumanML3D/
cp -r HumanML3D/HumanML3D sin-mdm/dataset/HumanML3D
cd sin-mdm

Then, download the motions used for our benchmark:

bash prepare/download_humanml3d_dataset.sh

Or download the entire dataset by following the instructions in HumanML3D, then copy the result dataset to our repository:

cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D

Truebones Zoo Dataset

Download motions used in our pretrained models:

bash prepare/download_truebones_zoo_dataset.sh

Or download the full dataset here and use utils/fbx2bvh.py to convert fbx files to bvh files.

Synthesis

Preparations

Download pretrained models

Download the model(s) you wish to use via the scripts below. The models will be placed under ./save/.

Mixamo Dataset Models

Download pretrained models used for our benchmark:

bash prepare/download_mixamo_models.sh

HumanML3D Dataset Models

Download pretrained models used for our benchmark:

bash prepare/download_humanml3d_models.sh

Truebones Zoo Dataset Models

Download pretrained models:

bash prepare/download_truebones_models.sh

Pretrained model of "Flying Dragon" will be available soon!

Run synthesis command

To generate motions using a pretrained model, use the following command:

python -m sample.generate --model_path ./save/path_to_pretrained_model --num_samples 5 --motion_length 10

Where --num_samples is the number of motions that will be generated and --motion_length is the length in seconds. Use --seed to specify a seed.
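
For example, using one of the pretrained Mixamo checkpoints (the checkpoint path below follows the layout used in the evaluation section and is only illustrative; substitute any model downloaded under ./save/):

python -m sample.generate --model_path ./save/mixamo/0000/model000019999.pt --num_samples 5 --motion_length 10 --seed 10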

Running this will get you:

  • results.npy file with xyz positions of the generated animation
  • rows_00_to_##.mp4 - stick figure animations of all generated motions.
  • sample##.bvh - bvh file for each generated animation that can be visualized using Blender.
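
The exact contents of results.npy are not specified here; assuming it is a standard NumPy file, a quick (illustrative) way to peek at it from the shell, run from the directory where the file was written, is:

python -c "import numpy as np; d = np.load('results.npy', allow_pickle=True); print(type(d), getattr(d, 'shape', None))"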

It will look something like this:

example

Adding texture in Blender

Instructions for adding texture in Blender

To add texture in Blender for the Mixamo and Truebones Zoo datasets, follow these steps:

  1. In Blender, import an FBX file that contains the mesh and texture.
    1. For Mixamo motions, use: mixamo_ref.fbx
    2. For motions generated by the uploaded Truebones Zoo models, use the relevant FBX file from here.
  2. Select only the skeleton of the imported FBX and delete it. The mesh will then appear in T-pose.
  3. Import the BVH file that was generated by the model.
  4. Select the mesh, go to modifier properties, and press the tooltip icon. Then, select the generated BVH.
blender_skinning

Training

Preparations

Download HumanML3D dependencies:

bash prepare/download_t2m_evaluators.sh

Run training command

python -m train.train_sinmdm --arch unet --dataset mixamo --save_dir <'path_to_save_models'> --sin_path <'path to .bvh file for mixamo/bvh_general dataset or .npy file for humanml dataset'> --lr_method ExponentialLR --lr_gamma 0.99998 --use_scale_shift_norm --use_checkpoint
  • Specify the architecture using --arch. Options: unet, qna

  • Specify the dataset using --dataset. Options: humanml, mixamo, bvh_general

  • Use --device to define GPU id.

  • Use --seed to specify seed.

  • Add --train_platform_type {ClearmlPlatform, TensorboardPlatform} to track results with either ClearML or Tensorboard.

  • Add --eval_during_training to run a short evaluation for each saved checkpoint.

  • Add --gen_during_training to synthesize a motion and save its visualization for each saved checkpoint.

    Evaluation and generation during training will slow it down but will give you better monitoring.

Please refer to file utils/parser_util.py for more arguments.
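
As a concrete sketch (the save directory and the motion path are hypothetical placeholders; point --sin_path at one of the motions downloaded into ./dataset/):

python -m train.train_sinmdm --arch qna --dataset mixamo --save_dir ./save/my_experiment --sin_path ./dataset/mixamo/your_motion.bvh --lr_method ExponentialLR --lr_gamma 0.99998 --use_scale_shift_norm --use_checkpoint --gen_during_training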

Applications

In-betweening

For in-betweening, the prefix and suffix of a motion are given as input and the model generates the rest according to the motion the network was trained on.

python -m sample.edit --model_path <path_to_pretrained_model> --edit_mode in_betweening --num_samples 3
  • To specify the motion to be used for the prefix and suffix, use --ref_motion <path_to_reference_motion>. If --ref_motion is not specified, the original motion the network was trained on will be used.
  • Use --prefix_end and --suffix_start to specify the length of the prefix and suffix.
  • Use --seed to specify seed.
  • Use --num_samples to specify number of motions to generate.
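
For example, a hedged invocation using one of the pretrained Mixamo checkpoints (the path is illustrative) and the motion the model was trained on as the reference:

python -m sample.edit --model_path ./save/mixamo/0000/model000019999.pt --edit_mode in_betweening --num_samples 3 --seed 10

Add --prefix_end and --suffix_start to control where the given prefix ends and the suffix starts; see utils/parser_util.py for the values they expect.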

For example:

example

Generated parts are colored in an orange scheme; the given input is colored in a blue scheme.

Motion expansion

For motion expansion, a motion is given as input and new prefix and suffix are generated for it by the model.

python -m sample.edit --model_path <path_to_pretrained_model> --edit_mode expansion --num_samples 3
  • To specify the input motion, use --ref_motion <path_to_reference_motion>. If --ref_motion is not specified, the original motion the network was trained on will be used.
  • Use --prefix_length and --suffix_length to specify the length of the generated prefix and suffix.
  • Use --seed to specify seed.
  • Use --num_samples to specify number of motions to generate.
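
Similarly, a hedged example with a pretrained HumanML3D checkpoint (the path is illustrative):

python -m sample.edit --model_path ./save/humanml/0000/model000019999.pt --edit_mode expansion --num_samples 3 --seed 10

Add --prefix_length and --suffix_length to control how much new motion is generated on each side; see utils/parser_util.py for the values they expect.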

For example:

example

Generated parts are colored in an orange scheme; the given input is colored in a blue scheme.

Lower body editing

The model is given a reference motion from which to take the upper body, and generates the lower body according to the motion the model was trained on.

python -m sample.edit --model_path <path_to_pretrained_model> --edit_mode lower_body --num_samples 3 --ref_motion <path_to_reference_motion>

This application is supported for the mixamo and humanml datasets.

  • To specify the reference motion to take the upper body from, use --ref_motion <path_to_reference_motion>. If --ref_motion is not specified, the original motion the network was trained on will be used.
  • Use --seed to specify seed.
  • Use --num_samples to specify number of motions to generate.

For example: (reference motion is "chicken dance" and lower body is generated with model train on "salsa dancing")

example

The generated lower body is colored in an orange scheme; the upper body, which is given as input, is colored in a blue scheme.

Upper body editing

Similarly to lower body editing, use --edit_mode upper_body:

python -m sample.edit --model_path <path_to_pretrained_model> --edit_mode upper_body --num_samples 3 --ref_motion <path_to_reference_motion>

This application is supported for the mixamo and humanml datasets.

For example: (reference motion is "salsa dancing" and upper body is generated with model trained on "punching")

example

The generated upper body is colored in an orange scheme; the lower body, which is given as input, is colored in a blue scheme.

Harmonization

You can use harmonization for style transfer. The model is trained on the style motion. The content motion --ref_motion, unseen by the network, is given as input and adjusted such that it matches the style motion's motifs.

python -m sample.edit --model_path <path_to_pretrained_model> --edit_mode harmonization --num_samples 3 --ref_motion <path_to_reference_motion>
  • To specify the reference motion, use --ref_motion <path_to_reference_motion>.
  • Use --seed to specify seed.
  • Use --num_samples to specify number of motions to generate.
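
An illustrative invocation (both paths are placeholders; --ref_motion must point to a content motion in the format expected by the dataset the model was trained on, e.g. an .npy file under ./dataset/humanml3d_single/npy/ for a HumanML3D model):

python -m sample.edit --model_path ./save/humanml/0000/model000019999.pt --edit_mode harmonization --num_samples 3 --ref_motion ./dataset/humanml3d_single/npy/your_content_motion.npy --seed 10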

For example, here the model was trained on "happy" walk, and we transfer the "happy" style to the input motion:

example

Evaluation

Preparations

HumanML3D

bash prepare/download_t2m_evaluators.sh
bash prepare/download_humanml3d_dataset.sh
bash prepare/download_humanml3d_models.sh

Mixamo

bash prepare/download_mixamo_dataset.sh
bash prepare/download_mixamo_models.sh

Run evaluation command

To evaluate a single model (trained on a single sequence), run:

HumanML3D

python -m eval.eval_humanml --model_path ./save/humanml/0000/model000019999.pt

Mixamo

python -m eval.eval_mixamo --model_path ./save/mixamo/0000/model000019999.pt

Run evaluation benchmark

HumanML3D - reproduce benchmark

with the pre-trained checkpoints:

bash ./eval/eval_only_humanml_benchmark.sh

HumanML3D - train + benchmark

bash ./eval/train_eval_humanml_benchmark.sh

Mixamo - reproduce benchmark

with the pre-trained checkpoints:

bash ./eval/eval_only_mixamo_benchmark.sh

Mixamo - train + benchmark

bash ./eval/train_eval_mixamo_benchmark.sh

Acknowledgments

Our code partially uses each of the following works. We thank the authors of these works for their outstanding contributions and for sharing their code.

MDM, QnA, Ganimator, Guided Diffusion, A Deep Learning Framework For Character Motion Synthesis and Editing.

License

This code is distributed under the MIT LICENSE.


sinmdm's Issues

Style transfer

Thanks for the great work!
How's the style transfer model trained? Do you plan to release the trained model?

Postprocessing issue: retargeting the Truebones Zoo data samples

Hi there,

Thanks for your nice work!

I'm currently trying to produce some visualization demos on the Truebones Zoo data samples, but I have some problems retargeting the generated bvh source to the fbx file provided in Truebones Zoo.

As shown in the figure below, I'm testing the retargeting (with Rokoko) using exactly the same motion as the fbx (provided as bvh in the dataset). The first row is the original skeleton and skin mesh in the fbx file; the second row is after correcting the bone orientation when loading the fbx; the third row is the bvh source provided in the dataset.

The retargeting results using Rokoko are shown in the two columns of the last row. The retargeted armature is not correct.
I've also tried converting the fbx with the orientation-corrected skeleton to a bvh file, which results in exactly the same skeleton structure, and then retargeting this bvh to the fbx armature. But the retargeting with Rokoko still fails.

I'm wondering if you have come across this problem before, and if you have any ideas for a solution?
image

Data discrepancy using download_humanml3d_dataset.sh vs HumanML3D

For dataset/humanml3d_single/npy/0000_a_person_picks_something_up_with_their_right_hand,_and_wipes_down_the_item_with_their_left.npy, we can find its original HumanML3D entry 001803.npy.

But when trying with 001803.npy, the result would be totally meaningless (the skeleton would be floating forward).

When digging into these 2 .npy files, we found that 0000_a_person_picks_something_up_with_their_right_hand,_and_wipes_down_the_item_with_their_left.npy contains more fields like "motion_raw" and "motion_xyz".

In data_utils/data_util.py, we can find that it is using "motion_raw". But even if we transpose the data in 001803.npy, it still does not match "motion_raw".

May we ask how you derived this "motion_raw" data?

How to calculate the Harmonic Mean?

I ran train_eval_humanml_benchmark.sh. The results include intra_diversity, gt_intra_diversity, intra_diversity_gt_diff, inter_diversity, and sifid, but the Harmonic Mean is missing. How do I calculate the Harmonic Mean?

The relative PE

Hi, thanks for open-sourcing this excellent work. After reviewing the 'mdm_qnanet.py' code, I couldn't locate the implementation of the relative positional encoding (PE) mentioned in the paper. I'm curious about how this model achieves the generation of diverse results. Do you have any insights to share?

Hand + Face of Human Pose

Hi,
Is it possible to generate a single character from the pose for about 5 seconds?

I have a pose video (OpenPose + hands + face), and I was wondering if it is possible to generate an output video with a length of 5 seconds that has a consistent character/avatar which plays dance, etc., driven by the controlled (pose) input.

I have a video of OpenPose+hands+face and I want to generate a human-like animation (no matter what, just a consistent character/avatar).
Sample Video

P.S. Any model that supports Pose+Hand+Face can be used!

Thanks
Best regards

arch

Thank you very much for your work. I would like to ask why 'unet' and 'qna' stand alone in the architecture parameter (--arch), while in the article a combination of the two is used to achieve the described effect. If I want to combine the two as in the article, how should I choose the parameter values? Looking forward to your reply.

Pretrained weights

@SinMDM thanks for sharing this wonderful work. Is it possible to share the pretrained weights for testing the model, either through Google Drive or OneDrive?
Thanks in advance

questions about the limitation

Hello,
Great work.

Another limitation, also common to all single-instance models, is the inability to set generated sub-motions in a specific order, when such order matters (e.g., certain dance moves). This can be addressed by enlarging the receptive field (at the cost of lower diversity).

Can you provide more examples of this? What does "set generated sub-motions in a specific order" mean? Why can it be addressed by enlarging the receptive field?

thank you.

The model parameter file may be corrupted or incorrectly formatted

Hi, when I was testing the humanml and mixamo datasets, the following problem occurred when loading the model:

Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/SinMDM/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ubuntu/anaconda3/envs/SinMDM/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/ubuntu/build/SinMDM/sample/generate.py", line 295, in
main()
File "/home/ubuntu/build/SinMDM/sample/generate.py", line 91, in main
state_dict = torch.load(args.model_path, map_location='cpu')
File "/home/ubuntu/anaconda3/envs/SinMDM/lib/python3.8/site-packages/torch/serialization.py", line 795, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ubuntu/anaconda3/envs/SinMDM/lib/python3.8/site-packages/torch/serialization.py", line 1002, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '{'.

How should I handle this situation?

About --dataset mixamo

Hello and thank you for your great work!

I want to ask a question about data:
I have observed that, when using the same Mixamo bvh data, specifying --dataset mixamo during training gives better results and faster convergence than --dataset bvh_general, even though it is the same bvh data.

Maybe you know what could be the reason?

TensorBoard

Hello! I would like to use TensorBoard during training, but I get an error that the folder below already exists. Is there a conflict because the folder where the training results are saved is created when the event file is created, and then a folder with the same name is created again during training? Hope you can reply, thank you very much!

(SinMDM) root@autodl-container-1fda11bb52-bbd9300c:~/SinMDM-main# python -m train.train_sinmdm --arch unet --dataset bvh_general --save_dir ./save/ballet/testballet1/ --sin_path ./dataset/ballet/testballet1.bvh --lr_method ExponentialLR --lr_gamma 0.99998 --use_scale_shift_norm --use_checkpoint --train_platform_type 'TensorboardPlatform'
Traceback (most recent call last):
File "/root/miniconda3/envs/SinMDM/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/SinMDM/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/SinMDM-main/train/train_sinmdm.py", line 53, in
main()
File "/root/SinMDM-main/train/train_sinmdm.py", line 30, in main
raise FileExistsError('save_dir [{}] already exists.'.format(args.save_dir))
FileExistsError: save_dir [./save/ballet/testballet1/] already exists.

train

python -m train.train_sinmdm --arch unet --dataset mixamo --save_dir <'path_to_save_models'> --sin_path <'path to .bvh file for mixamo/bvh_general dataset or .npy file for humanml dataset'> --lr_method ExponentialLR --lr_gamma 0.99998 --use_scale_shift_norm --use_checkpoint

Could you explain --sin_path <'path to .bvh file for mixamo/bvh_general dataset or .npy file for humanml dataset'>? I'm a bit confused and not sure which path this refers to.

reproduce motion harmonization

I am interested in the style transfer function of your work and want to reproduce the transfer from the motion "walk" to "walk happily". Could you please tell me which motion and which pretrained model you used to achieve this, since it is not shown in the readme.md?

Couldn't connect bvh file in Blender with Mixamo fbx texture

I am trying to follow #2 to visualize the generated bvh file in Blender.

Since the mixamo_ref.fbx link is not available anymore, I am downloading Michelle character fbx directly from Mixamo.

My Blender version is 3.6.5. When I deleted the skeleton as instructed, the mesh appeared in a T-pose but became super large, and the bvh skeleton was very small compared to this big T-pose.

And when I selected the modifier and assigned the generated bvh to it, nothing happened. When playing the animation, only the generated bvh skeleton was moving.

Several random guesses:
(1) Is it possible that this is due to a Blender version difference? Could you share which Blender version you used?
(2) With the Michelle character fbx downloaded from Mixamo, I noticed that the bone names have the prefix "mixamorig", for example "mixamorig:Hips". But in the generated bvh file, the bone names have no prefix, like "Hips".
(3) With the downloaded Michelle character, I noticed that the armature object has transform Rotation X = 90 and Scale X = Y = Z = 0.01. I think this is why the mesh became so big after removing the skeleton. I tried transforming the bvh-generated skeleton with Rotation X = -90 and Scale X = Y = Z = 100. Now the orientation and scale match, but the modifier still does nothing.

Would appreciate for any guidance and suggestion.

Applications

I tested in-betweening under Applications, but no bvh file is generated. Why is that? Is it possible that the individual Applications features are only shown as video?

What is the path to the npy file of humanml3d

Thank you for the interesting work. What is the path to the npy file of HumanML3D, please?

I run

python -m train.train_sinmdm --arch qna --dataset humanml --save_dir ./checkpoints  --sin_path dataset/HumanML3D/new_joint_vecs/000006.npy  --overwrite

but

File "/home/user/miniconda3/envs/SinMDM/lib/python3.8/bdb.py", line 113, in dispatch_line
    motion = motion.permute(1, 0, 2)  # n_feats x n_joints x n_frames   ==> n_joints x n_feats x n_frames
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 2 is not equal to len(dims) = 3

Questions about prefix_length and suffix_length

Hello author, thank you for your great work!

I have some questions about In-betweening and Motion expansion:
How should --prefix_length and --suffix_length be specified?

For example, I used a 1200-frame action during training. I want to use in-betweening 0-400 frames and 800-1200 frames to generate 400-800 frame actions. How should --prefix_length and --suffix_length be specified?

--prefix_length 400 --suffix_length 800?
Or something else?

Looking forward to your reply

bvh

How should my own bvh file be structured in a uniform way with Mixamo's bvh file?

Diffusion model architecture

Hi, I read your paper and got a good insight from your work. Thank you for sharing your work and code.

I have a question about the model architecture including so many ResBlock layers (sorry about this topic; it may be outside your main contribution).
I'm not familiar with diffusion models, and I'd like to understand the need for deep layers and ResBlocks.
One of the previous motion diffusion models used 4-8 Transformer layers.
However, recent papers like yours include 10-20 ResBlock layers.
I have no idea of the reasoning behind these architectural changes. The model seems heavy, and I'd like to know the recent trends in this domain.
Is it not enough to use just a few Transformer layers for synthesizing motions?
Since I don't have any insight in this domain, I apologize in advance if this is a naive question. :)

Why is conv_1d not working well?

Before I ask the question, thank you for your great work.
When I read the code, I observed that the architecture works with conv2d operations, unlike other motion works that use conv1d as the main operator.
Of course, I understand this for the QnA architecture, since it was designed for image tasks.
But even the U-Net architecture uses conv2d operators with the default settings.
I tried the conv_1d option to train the U-Net architecture, but the loss is higher than with the conv2d U-Net.
I would like to know the theoretical rationale for using conv2d in the motion domain.
I would also be grateful if you could point me to ideas or related works that use conv2d in motion generation.
