
sinmdm's Introduction

SinMDM: Single Motion Diffusion

ICLR 2024 Spotlight

teaser

Please visit our project page for more details.

Setup

This code has been tested in the following environment:

  • Ubuntu 18.04.5 LTS
  • Python 3.8
  • conda3 or miniconda3
  • CUDA capable GPU (one is enough)

Setup conda env:

conda env create -f environment.yml
conda activate SinMDM

Install ganimator-eval-kernel by following these instructions, OR by running:

pip install git+https://github.com/PeizhuoLi/ganimator-eval-kernel.git

Preparations

Get Data

Data should be under the ./dataset folder.

Mixamo Dataset

Download the motions used for our benchmark:

bash prepare/download_mixamo_dataset.sh

Or download motions directly from Mixamo and use utils/fbx2bvh.py to convert fbx files to bvh files.

HumanML3D Dataset

Clone HumanML3D, then copy the data dir to our repository:

cd ..
git clone https://github.com/EricGuo5513/HumanML3D.git
unzip ./HumanML3D/HumanML3D/texts.zip -d ./HumanML3D/HumanML3D/
cp -r HumanML3D/HumanML3D sin-mdm/dataset/HumanML3D
cd sin-mdm

Then, download the motions used for our benchmark:

bash prepare/download_humanml3d_dataset.sh

Or download the entire dataset by following the instructions in HumanML3D, then copy the result dataset to our repository:

cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D

Truebones Zoo Dataset

Download motions used in our pretrained models:

bash prepare/download_truebones_zoo_dataset.sh

Or download the full dataset here and use utils/fbx2bvh.py to convert fbx files to bvh files.

Synthesis

Preparations

Download pretrained models

Download the model(s) you wish to use via the scripts below. The models will be placed under ./save/.

Mixamo Dataset Models

Download pretrained models used for our benchmark:

bash prepare/download_mixamo_models.sh

HumanML3D Dataset Models

Download pretrained models used for our benchmark:

bash prepare/download_humanml3d_models.sh

Truebones Zoo Dataset Models

Download pretrained models:

bash prepare/download_truebones_models.sh

Pretrained model of "Flying Dragon" will be available soon!

Run synthesis command

To generate motions using a pretrained model, use the following command:

python -m sample.generate --model_path ./save/path_to_pretrained_model --num_samples 5 --motion_length 10

Where --num_samples is the number of motions that will be generated and --motion_length is the length in seconds. Use --seed to specify a seed.
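
For example, using one of the pretrained Mixamo checkpoints (the checkpoint path below follows the layout used in the evaluation section and is only illustrative; substitute any model downloaded under ./save/):

python -m sample.generate --model_path ./save/mixamo/0000/model000019999.pt --num_samples 5 --motion_length 10 --seed 10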

Running this will get you:

  • results.npy file with xyz positions of the generated animation
  • rows_00_to_##.mp4 - stick figure animations of all generated motions.
  • sample##.bvh - bvh file for each generated animation that can be visualized using Blender.
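
The exact contents of results.npy are not specified here; assuming it is a standard NumPy file, a quick (illustrative) way to peek at it from the shell, run from the directory where the file was written, is:

python -c "import numpy as np; d = np.load('results.npy', allow_pickle=True); print(type(d), getattr(d, 'shape', None))"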

It will look something like this:

example

Adding texture in Blender

Instructions for adding texture in Blender

To add texture in Blender for the Mixamo and Truebones Zoo datasets, follow these steps:

  1. In Blender, import an FBX file that contains the mesh and texture.
    1. For Mixamo motions, use: mixamo_ref.fbx
    2. For motions generated by the uploaded Truebones Zoo models, use the relevant FBX file from here.
  2. Select only the skeleton of the imported FBX and delete it. The mesh will then appear in T-pose.
  3. Import the BVH file that was generated by the model.
  4. Select the mesh, go to modifier properties, and press the tooltip icon. Then, select the generated BVH.
blender_skinning

Training

Preparations

Download HumanML3D dependencies:

bash prepare/download_t2m_evaluators.sh

Run training command

python -m train.train_sinmdm --arch unet --dataset mixamo --save_dir <'path_to_save_models'> --sin_path <'path to .bvh file for mixamo/bvh_general dataset or .npy file for humanml dataset'> --lr_method ExponentialLR --lr_gamma 0.99998 --use_scale_shift_norm --use_checkpoint
  • Specify the architecture using --arch. Options: unet, qna

  • Specify the dataset using --dataset. Options: humanml, mixamo, bvh_general

  • Use --device to define GPU id.

  • Use --seed to specify seed.

  • Add --train_platform_type {ClearmlPlatform, TensorboardPlatform} to track results with either ClearML or Tensorboard.

  • Add --eval_during_training to run a short evaluation for each saved checkpoint.

  • Add --gen_during_training to synthesize a motion and save its visualization for each saved checkpoint.

    Evaluation and generation during training will slow it down but will give you better monitoring.

Please refer to file utils/parser_util.py for more arguments.
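
As a concrete sketch (the save directory and the motion path are hypothetical placeholders; point --sin_path at one of the motions downloaded into ./dataset/):

python -m train.train_sinmdm --arch qna --dataset mixamo --save_dir ./save/my_experiment --sin_path ./dataset/mixamo/your_motion.bvh --lr_method ExponentialLR --lr_gamma 0.99998 --use_scale_shift_norm --use_checkpoint --gen_during_training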

Applications

In-betweening

For in-betweening, the prefix and suffix of a motion are given as input and the model generates the rest according to the motion the network was trained on.

python -m sample.edit --model_path <path_to_pretrained_model> --edit_mode in_betweening --num_samples 3
  • To specify the motion to be used for the prefix and suffix, use --ref_motion <path_to_reference_motion>. If --ref_motion is not specified, the original motion the network was trained on will be used.
  • Use --prefix_end and --suffix_start to specify the length of the prefix and suffix.
  • Use --seed to specify seed.
  • Use --num_samples to specify number of motions to generate.
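
For example, a hedged invocation using one of the pretrained Mixamo checkpoints (the path is illustrative) and the motion the model was trained on as the reference:

python -m sample.edit --model_path ./save/mixamo/0000/model000019999.pt --edit_mode in_betweening --num_samples 3 --seed 10

Add --prefix_end and --suffix_start to control where the given prefix ends and the suffix starts; see utils/parser_util.py for the values they expect.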

For example:

example

Generated parts are colored in an orange scheme; the given input is colored in a blue scheme.

Motion expansion

For motion expansion, a motion is given as input and new prefix and suffix are generated for it by the model.

python -m sample.edit --model_path <path_to_pretrained_model> --edit_mode expansion --num_samples 3
  • To specify the input motion, use --ref_motion <path_to_reference_motion>. If --ref_motion is not specified, the original motion the network was trained on will be used.
  • Use --prefix_length and --suffix_length to specify the length of the generated prefix and suffix.
  • Use --seed to specify seed.
  • Use --num_samples to specify number of motions to generate.
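
Similarly, a hedged example with a pretrained HumanML3D checkpoint (the path is illustrative):

python -m sample.edit --model_path ./save/humanml/0000/model000019999.pt --edit_mode expansion --num_samples 3 --seed 10

Add --prefix_length and --suffix_length to control how much new motion is generated on each side; see utils/parser_util.py for the values they expect.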

For example:

example

Generated parts are colored in an orange scheme; the given input is colored in a blue scheme.

Lower body editing

The model is given a reference motion from which to take the upper body, and generates the lower body according to the motion the model was trained on.

python -m sample.edit --model_path <path_to_pretrained_model> --edit_mode lower_body --num_samples 3 --ref_motion <path_to_reference_motion>

This application is supported for the mixamo and humanml datasets.

  • To specify the reference motion to take the upper body from, use --ref_motion <path_to_reference_motion>. If --ref_motion is not specified, the original motion the network was trained on will be used.
  • Use --seed to specify seed.
  • Use --num_samples to specify number of motions to generate.

For example: (reference motion is "chicken dance" and lower body is generated with model train on "salsa dancing")

example

The generated lower body is colored in an orange scheme; the upper body, which is given as input, is colored in a blue scheme.

Upper body editing

Similarly to lower body editing, use --edit_mode upper_body:

python -m sample.edit --model_path <path_to_pretrained_model> --edit_mode upper_body --num_samples 3 --ref_motion <path_to_reference_motion>

This application is supported for the mixamo and humanml datasets.

For example: (reference motion is "salsa dancing" and upper body is generated with model trained on "punching")

example

The generated upper body is colored in an orange scheme; the lower body, which is given as input, is colored in a blue scheme.

Harmonization

You can use harmonization for style transfer. The model is trained on the style motion. The content motion --ref_motion, unseen by the network, is given as input and adjusted such that it matches the style motion's motifs.

python -m sample.edit --model_path <path_to_pretrained_model> --edit_mode harmonization --num_samples 3 --ref_motion <path_to_reference_motion>
  • To specify the reference motion, use --ref_motion <path_to_reference_motion>.
  • Use --seed to specify seed.
  • Use --num_samples to specify number of motions to generate.
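
An illustrative invocation (both paths are placeholders; --ref_motion must point to a content motion in the format expected by the dataset the model was trained on, e.g. an .npy file under ./dataset/humanml3d_single/npy/ for a HumanML3D model):

python -m sample.edit --model_path ./save/humanml/0000/model000019999.pt --edit_mode harmonization --num_samples 3 --ref_motion ./dataset/humanml3d_single/npy/your_content_motion.npy --seed 10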

For example, here the model was trained on "happy" walk, and we transfer the "happy" style to the input motion:

example

Evaluation

Preparations

HumanML3D

bash prepare/download_t2m_evaluators.sh
bash prepare/download_humanml3d_dataset.sh
bash prepare/download_humanml3d_models.sh

Mixamo

bash prepare/download_mixamo_dataset.sh
bash prepare/download_mixamo_models.sh

Run evaluation command

To evaluate a single model (trained on a single sequence), run:

HumanML3D

python -m eval.eval_humanml --model_path ./save/humanml/0000/model000019999.pt

Mixamo

python -m eval.eval_mixamo --model_path ./save/mixamo/0000/model000019999.pt

Run evaluation benchmark

HumanML3D - reproduce benchmark

with the pre-trained checkpoints:

bash ./eval/eval_only_humanml_benchmark.sh

HumanML3D - train + benchmark

bash ./eval/train_eval_humanml_benchmark.sh

Mixamo - reproduce benchmark

with the pre-trained checkpoints:

bash ./eval/eval_only_mixamo_benchmark.sh

Mixamo - train + benchmark

bash ./eval/train_eval_mixamo_benchmark.sh

Acknowledgments

Our code partially uses each of the following works. We thank the authors of these works for their outstanding contributions and for sharing their code.

MDM, QnA, Ganimator, Guided Diffusion, A Deep Learning Framework For Character Motion Synthesis and Editing.

License

This code is distributed under the MIT LICENSE.


sinmdm's Issues

Style transfer

Thanks for the great work!
How's the style transfer model trained? Do you plan to release the trained model?

Postprocessing issue: retargeting the Truebones Zoo data samples

Hi there,

Thanks for your nice work!

I'm currently trying to produce some visualization demos on the Truebones Zoo data samples, but I have some problems retargeting the generated bvh source to the fbx file provided in Truebones Zoo.

As shown in the figure below, I'm testing the retargeting (with Rokoko) using exactly the same motion as the fbx (provided as bvh in the dataset). The first row is the original skeleton and skin mesh in the fbx file; the second row is after correcting the bone orientation when loading the fbx; the third row is the bvh source provided in the dataset.

The retargeting results using Rokoko are shown in the two columns of the last row. The retargeted armature is not correct.
I've also tried converting the fbx with the orientation-corrected skeleton to a bvh file, which results in exactly the same skeleton structure, and then retargeting this bvh to the fbx armature. But the retargeting with Rokoko still fails.

I'm wondering if you have come across this problem before, and if you have any ideas for a solution?
image

Data discrepancy using download_humanml3d_dataset.sh vs HumanML3D

For dataset/humanml3d_single/npy/0000_a_person_picks_something_up_with_their_right_hand,_and_wipes_down_the_item_with_their_left.npy, we can find its original HumanML3D entry 001803.npy.

But when trying with 001803.npy, the result would be totally meaningless (the skeleton would be floating forward).

When digging into these 2 .npy files, we found that 0000_a_person_picks_something_up_with_their_right_hand,_and_wipes_down_the_item_with_their_left.npy contains more fields like "motion_raw" and "motion_xyz".

In data_utils/data_util.py, we can find that it is using "motion_raw". But even if we transpose the data in 001803.npy, it still does not match "motion_raw".

May we ask how you derived this "motion_raw" data?

How to calculate the Harmonic Mean?

I ran train_eval_humanml_benchmark.sh. The results include intra_diversity, gt_intra_diversity, intra_diversity_gt_diff, inter_diversity, and sifid, but the Harmonic Mean is missing. How do I calculate the Harmonic Mean?

The relative PE

Hi, thanks for open-sourcing this excellent work. After reviewing the 'mdm_qnanet.py' code, I couldn't locate the implementation of the relative positional encoding (PE) mentioned in the paper. I'm curious about how this model achieves the generation of diverse results. Do you have any insights to share?

Hand + Face of Human Pose

Hi,
Is it possible to generate a single character from the pose for about 5 seconds?

I have a pose video (OpenPose + hands + face), and I was wondering if it is possible to generate an output video with a length of 5 seconds that has a consistent character/avatar which plays dance, etc., driven by the controlled (pose) input.

I have a video of OpenPose+hands+face and I want to generate a human-like animation (no matter what, just a consistent character/avatar).
Sample Video

P.S. Any model that supports Pose+Hand+Face can be used!

Thanks
Best regards

arch

Thank you very much for your work. I would like to ask why 'unet' and 'qna' stand alone in the architecture parameter (--arch), while in the article a combination of the two is used to achieve the described effect. If I want to combine the two as in the article, how should I choose the parameter values? Looking forward to your reply.

Pretrained weights

@SinMDM thanks for sharing this wonderful work. Is it possible to share the pretrained weights for testing the model, either through Google Drive or OneDrive?
Thanks in advance

questions about the limitation

Hello,
Great work.

Another limitation, also common to all single-instance models, is the inability to set generated sub-motions in a specific order, when such order matters (e.g., certain dance moves). This can be addressed by enlarging the receptive field (at the cost of lower diversity).

Can you provide more examples of this? What does "set generated sub-motions in a specific order" mean? Why can it be addressed by enlarging the receptive field?

thank you.

The model parameter file may be corrupted or incorrectly formatted

Hi, when I was testing the humanml and mixamo datasets, the following problem occurred when loading the model:

Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/SinMDM/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ubuntu/anaconda3/envs/SinMDM/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/ubuntu/build/SinMDM/sample/generate.py", line 295, in
main()
File "/home/ubuntu/build/SinMDM/sample/generate.py", line 91, in main
state_dict = torch.load(args.model_path, map_location='cpu')
File "/home/ubuntu/anaconda3/envs/SinMDM/lib/python3.8/site-packages/torch/serialization.py", line 795, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ubuntu/anaconda3/envs/SinMDM/lib/python3.8/site-packages/torch/serialization.py", line 1002, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '{'.

How should I handle this situation?

About --dataset mixamo

Hello and thank you for your great work!

I want to ask a question about data:
I have observed that, when using the same Mixamo bvh data, specifying --dataset mixamo during training gives better results and faster convergence than --dataset bvh_general, even though it is the same bvh data.

Maybe you know what could be the reason?

TensorBoard

Hello! I would like to use TensorBoard during training, but I get an error that the folder below already exists. Is there a conflict because the folder where the training results are saved is created when the event file is created, and then a folder with the same name is created again during training? Hope you can reply, thank you very much!

(SinMDM) root@autodl-container-1fda11bb52-bbd9300c:~/SinMDM-main# python -m train.train_sinmdm --arch unet --dataset bvh_general --save_dir ./save/ballet/testballet1/ --sin_path ./dataset/ballet/testballet1.bvh --lr_method ExponentialLR --lr_gamma 0.99998 --use_scale_shift_norm --use_checkpoint --train_platform_type 'TensorboardPlatform'
Traceback (most recent call last):
File "/root/miniconda3/envs/SinMDM/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/SinMDM/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/SinMDM-main/train/train_sinmdm.py", line 53, in
main()
File "/root/SinMDM-main/train/train_sinmdm.py", line 30, in main
raise FileExistsError('save_dir [{}] already exists.'.format(args.save_dir))
FileExistsError: save_dir [./save/ballet/testballet1/] already exists.

train

python -m train.train_sinmdm --arch unet --dataset mixamo --save_dir <'path_to_save_models'> --sin_path <'path to .bvh file for mixamo/bvh_general dataset or .npy file for humanml dataset'> --lr_method ExponentialLR --lr_gamma 0.99998 --use_scale_shift_norm --use_checkpoint

Could you explain --sin_path <'path to .bvh file for mixamo/bvh_general dataset or .npy file for humanml dataset'>? I'm a bit confused and not sure which path this refers to.

reproduce motion harmonization

I am interested in the style transfer function of your work and want to reproduce the transfer from the motion "walk" to "walk happily". Could you please tell me which motion and which pretrained model you used to achieve this, since it is not shown in the readme.md?

Couldn't connect bvh file in Blender with Mixamo fbx texture

I am trying to follow #2 to visualize the generated bvh file in Blender.

Since the mixamo_ref.fbx link is not available anymore, I am downloading Michelle character fbx directly from Mixamo.

My Blender version is 3.6.5. When I deleted the skeleton as instructed, the mesh appeared in a T-pose but became super large, and the bvh skeleton was very small compared to this big T-pose.

And when I selected the modifier and assigned the generated bvh to it, nothing happened. When playing the animation, only the generated bvh skeleton was moving.

Several random guesses:
(1) Is it possible that this is due to a Blender version difference? Could you share which Blender version you used?
(2) With the Michelle character fbx downloaded from Mixamo, I noticed that the bone names have the prefix "mixamorig", for example "mixamorig:Hips". But in the generated bvh file, the bone names have no prefix, like "Hips".
(3) With the downloaded Michelle character, I noticed that the armature object has transform Rotation X = 90 and Scale X = Y = Z = 0.01. I think this is why the mesh became so big after removing the skeleton. I tried transforming the bvh-generated skeleton with Rotation X = -90 and Scale X = Y = Z = 100. Now the orientation and scale match, but the modifier still does nothing.

Would appreciate for any guidance and suggestion.

Applications

I tested in-betweening under Applications, but no bvh file is generated. Why is that? Is it possible that the individual Applications features are only shown as video?

What is the path to the npy file of humanml3d

Thank you for the interesting work. What is the path to the npy file of HumanML3D, please?

I run

python -m train.train_sinmdm --arch qna --dataset humanml --save_dir ./checkpoints  --sin_path dataset/HumanML3D/new_joint_vecs/000006.npy  --overwrite

but

File "/home/user/miniconda3/envs/SinMDM/lib/python3.8/bdb.py", line 113, in dispatch_line
    motion = motion.permute(1, 0, 2)  # n_feats x n_joints x n_frames   ==> n_joints x n_feats x n_frames
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 2 is not equal to len(dims) = 3

Questions about prefix_length and suffix_length

Hello author, thank you for your great work!

I have some questions about In-betweening and Motion expansion:
How should --prefix_length and --suffix_length be specified?

For example, I used a 1200-frame action during training. I want to use in-betweening 0-400 frames and 800-1200 frames to generate 400-800 frame actions. How should --prefix_length and --suffix_length be specified?

--prefix_length 400 --suffix_length 800?
Or something else?

Looking forward to your reply

bvh

How should my own bvh file be structured in a uniform way with Mixamo's bvh file?

Diffusion model architecture

Hi, I read your paper and got a good insight from your work. Thank you for sharing your work and code.

I have a question about the model architecture including so many ResBlock layers (sorry about this topic; it may be outside your main contribution).
I'm not familiar with diffusion models, and I'd like to understand the need for deep layers and ResBlocks.
One of the previous motion diffusion models used 4-8 Transformer layers.
However, recent papers like yours include 10-20 ResBlock layers.
I have no idea of the reasoning behind these architectural changes. The model seems heavy, and I'd like to know the recent trends in this domain.
Is it not enough to use just a few Transformer layers for synthesizing motions?
Since I don't have any insight in this domain, I apologize in advance if this is a naive question. :)

Why is conv_1d not working well?

Before I ask the question, thank you for your great work.
When I read the code, I observed that the architecture works with conv2d operations, unlike other motion works that use conv1d as the main operator.
Of course, I understand this for the QnA architecture, since it was designed for image tasks.
But even the U-Net architecture uses conv2d operators with the default settings.
I tried the conv_1d option to train the U-Net architecture, but the loss is higher than with the conv2d U-Net.
I would like to know the theoretical rationale for using conv2d in the motion domain.
I would also be grateful if you could point me to ideas or related works that use conv2d in motion generation.
