guided-motion-diffusion's People

Contributors

korrawe

guided-motion-diffusion's Issues

Loss becomes NaN when training GMD's UNet on HumanML3D with relative root representation

Hi,

Thanks for the great work! I tried training the motion model with the relative data, and at some point during training I started getting NaN for the loss. To clarify, I'm interested in training GMD's motion model on the original relative representation of the HumanML3D dataset, so I set train_args to card.motion_rel_unet_adagn_xl in train_gmd.py. Note that I am able to train the motion model with the absolute-root representation with no issues (when train_args=card.motion_abs_unet_adagn_xl).
Have you faced a similar problem before? I'd appreciate any insights into how to address this issue.
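For reference, here is the small guard I added on my side while debugging (my own helper, not part of GMD), called right after the backward pass:

import torch

def check_loss(loss: torch.Tensor, model: torch.nn.Module, step: int) -> None:
    # Fail fast on a non-finite loss so the offending batch can be inspected,
    # and clip gradients in case the NaN comes from exploding gradients.
    if not torch.isfinite(loss):
        raise RuntimeError(f"non-finite loss at step {step}: {loss.item()}")
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)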

Thanks,
Setareh

Query about the procedure of emphasis projection

Hi authors,
I would first like to say that I appreciate your interesting work.
I am writing this issue to check whether my understanding of your emphasis-projection contribution is correct, and I would really appreciate it if you could spend some of your valuable time answering my questions.
Firstly, since the dense guidance part in Section 4.2 of your paper is for densifying the signal, does this mean that, as long as I already have a dense signal (e.g., a full trajectory for every frame), I can use Section 4.1 only?
Secondly, if I understand correctly, it seems that Section 4.1 of your paper applies only at sampling (inference) time rather than requiring re-training of an existing motion diffusion model. Is this understanding correct?
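To make my reading of Section 4.1 concrete, this is roughly what I have in mind (my own sketch with made-up names and defaults, not your implementation):

import torch

def emphasis_project(x, traj_dims, c=10.0, seed=0):
    # My understanding of emphasis projection: scale the root-trajectory
    # dimensions of the feature by a factor c, then mix all dimensions with a
    # fixed random projection matrix. The factor, seed, and matrix scaling
    # are my own assumptions.
    d = x.shape[-1]
    w = torch.ones(d)
    w[traj_dims] = c
    gen = torch.Generator().manual_seed(seed)
    a = torch.randn(d, d, generator=gen) / d ** 0.5
    return (x * w) @ a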
Thanks in advance for your help.

About the foot skating ratio of Real

Thanks for your great work. I used your provided calculate_skating_ratio function to calculate the foot skating ratio of Real, and the result is around 0.05. However, the OmniControl paper reports that the foot skating ratio of Real is 0.00. Can you provide your calculation results for Real?
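For reference, this is the rough definition I had in mind when sanity-checking the numbers (my own simplification, not your provided calculate_skating_ratio):

import numpy as np

def rough_skating_ratio(foot_pos, fps=20.0, height_thresh=0.05, vel_thresh=0.5):
    # foot_pos: (T, F, 3) world-space positions of F foot joints, y-up (assumed).
    # A frame counts as skating if some foot is close to the floor but still
    # moves horizontally faster than vel_thresh; the thresholds are my guesses.
    horiz_vel = np.linalg.norm(np.diff(foot_pos[:, :, [0, 2]], axis=0), axis=-1) * fps
    on_ground = foot_pos[:-1, :, 1] < height_thresh
    skating = on_ground & (horiz_vel > vel_thresh)
    return skating.any(axis=1).mean()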

Max frames in your paper

Hi Karunratanakul,

Thanks for the amazing work! I have some parameter questions about your code.

I found that in the args of the provided checkpoint, num_frames=60. Does this mean that in the experiments of the paper the maximum number of frames is 60? But in the provided code the default seems to be num_frames=224: when I directly run python -m train.train_trajectory, the saved args.json has num_frames=224.

I also noticed that in https://github.com/korrawe/guided-motion-diffusion/blob/d4a38acf4256eac195741533e894a289d7a47c15/utils/parser_util.py#L119C16-L119C16 you set the default num_frames=60, yet changing this default does not change the final args. I tried other parameters such as batch_size in TrainingOptions, and changing them also does not affect the final args. Could you please check this? I tried a lot but could not figure out where the training default values come from. I hope you can help with this.
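My guess at what is happening (purely hypothetical, not your actual code) is that the training script overwrites the parsed args with values taken from the selected model card, so editing the argparse defaults in parser_util.py never shows up in the saved args.json. Something like:

args = parse_training_args()                # hypothetical name for the parser entry point in parser_util.py
card = dict(num_frames=224, batch_size=64)  # values defined by the chosen model card
for key, value in card.items():
    setattr(args, key, value)               # card values override the parser defaults

Is that roughly what happens?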

Thanks in advance
Bests,
Yiqun

AttributeError: 'Namespace' object has no attribute 'train_keypoint_mask'

Hi,

I'm currently facing an issue while attempting to test "Motion Synthesis". When I run the command:
python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --text_prompt "a person is walking while raising both hands" --guidance_mode kps
I encounter the following error:
File "/home/user/guided-motion-diffusion/utils/model_util.py", line 110, in get_model_args
'train_keypoint_mask': args.train_keypoint_mask,
AttributeError: 'Namespace' object has no attribute 'train_keypoint_mask'

Could you please provide guidance on how to resolve this issue?
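As a temporary workaround I patched get_model_args on my side like this (the 'none' default is only my guess at the intended value), but I am not sure it is correct:

# utils/model_util.py, in get_model_args (local patch, not an official fix)
'train_keypoint_mask': getattr(args, 'train_keypoint_mask', 'none'),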

Can I save the outputs with only skeleton nodes?

Hi, thanks for your great work!
As far as I can tell from your README, it seems that I can only output mp4 files or obj files. Can we output skeleton files such as the BVH format?
Looking forward to your early reply.

Conditioned evaluation

Hi again @korrawe !
Can you please help me map the conditioned metrics as reported in the paper (specifically: Traj err, Loc err, Avg err) to those reported in the eval script:

[screenshot of the eval script's reported metrics]

PS - Traj diversity seems to be missing (from ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/eval_humanml_cond_unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224_000500000_gscale2.5_wo_mm.log), how can I calculate it as well?

Question about the evaluation speed

Thanks for open-sourcing your excellent work!

I have a question about the evaluation speed. In the original MDM (https://github.com/GuyTevet/motion-diffusion-model), the evaluation takes about 20 hours, with each replication taking about 30 minutes.
However, in your evaluation log the evaluation is super fast: each replication takes only about half a minute. I tried running the provided evaluation code, but the speed is still slow. Could you please tell me how to speed up the evaluation to match the speed shown in your log?

Thanks a lot for your patience!!

The evaluation log of MDM:

==================== Replication 0 ====================
Time: 2022-09-21 11:18:01.921946
---> [ground truth] Matching Score: 2.9829
---> [ground truth] R_precision: (top 1): 0.5127 (top 2): 0.7028 (top 3): 0.7901 
---> [vald] Matching Score: 5.4813
---> [vald] R_precision: (top 1): 0.3428 (top 2): 0.5137 (top 3): 0.6328 
Time: 2022-09-21 11:18:08.139416
---> [ground truth] FID: 0.0016
---> [vald] FID: 0.5877
Time: 2022-09-21 11:18:12.500365
---> [ground truth] Diversity: 9.7076
---> [vald] Diversity: 9.8887
!!! DONE !!!
==================== Replication 1 ====================
Time: 2022-09-21 11:50:54.935330
---> [ground truth] Matching Score: 2.9990
---> [ground truth] R_precision: (top 1): 0.5213 (top 2): 0.7028 (top 3): 0.7920 
---> [vald] Matching Score: 5.6097
---> [vald] R_precision: (top 1): 0.3203 (top 2): 0.5107 (top 3): 0.6260 
Time: 2022-09-21 11:51:01.926553
---> [ground truth] FID: 0.0017
---> [vald] FID: 0.5371
Time: 2022-09-21 11:51:06.478357
---> [ground truth] Diversity: 9.2401
---> [vald] Diversity: 9.3920
!!! DONE !!!

The evaluation log provided in your repo:

==================== Replication 0 ====================
Time: 2023-03-07 19:09:19.460262
---> [ground truth] Matching Score: 2.9721
---> [ground truth] R_precision: (top 1): 0.5013 (top 2): 0.7039 (top 3): 0.7974 
---> [vald] Skating Ratio: -1.0000
---> [vald] Matching Score: 5.1525
---> [vald] R_precision: (top 1): 0.3887 (top 2): 0.5850 (top 3): 0.6797 
Time: 2023-03-07 19:09:24.482545
---> [ground truth] FID: 0.0016
---> [vald] FID: 0.2199
Time: 2023-03-07 19:09:27.632161
---> [ground truth] Diversity: 9.8112
---> [vald] Diversity: 9.7503
!!! DONE !!!
==================== Replication 1 ====================
Time: 2023-03-07 19:09:39.281943
---> [ground truth] Matching Score: 2.9388
---> [ground truth] R_precision: (top 1): 0.5082 (top 2): 0.7043 (top 3): 0.8019 
---> [vald] Skating Ratio: -1.0000
---> [vald] Matching Score: 5.2868
---> [vald] R_precision: (top 1): 0.3672 (top 2): 0.5498 (top 3): 0.6455 
Time: 2023-03-07 19:09:43.343890
---> [ground truth] FID: 0.0016
---> [vald] FID: 0.2308
Time: 2023-03-07 19:09:46.924741
---> [ground truth] Diversity: 9.5119
---> [vald] Diversity: 10.1362
!!! DONE !!!

missing `train_keypoint_mask` in eval

Hi @korrawe !
Similarly to #2, when running eval, I get:

Traceback (most recent call last):
  File "/disk1/guytevet/guided-motion-diffusion/eval/eval_humanml_condition.py", line 445, in <module>
    if args.train_keypoint_mask != "none":
AttributeError: 'Namespace' object has no attribute 'train_keypoint_mask'

What should be the fix in that case?
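In the meantime I am using this local patch (again only guessing that 'none' is the right default):

# eval/eval_humanml_condition.py, around line 445 (local patch, not an official fix)
if getattr(args, 'train_keypoint_mask', 'none') != "none":  # body unchanged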

Thanks,
Guy

Is denoiser used in the keyframe location conditioning?

In the paper, dense signal propagation uses a denoiser (the existing DPM model) to solve the keyframe-location conditioning task. Does the code also follow what the paper describes? I can only find a reward model in the conditioning task, but in the README I don't see where to download a reward model or instructions for generating samples with one. Thanks for correcting me if I missed something.

Choice of mean and std for denormalizing absolute data in 'eval' mode

Hi,

I'm looking at the evaluation code and I'm having a hard time understanding why the absolute data's mean and std are used to denormalize samples from gen_loader, which (if I understand correctly) have the relative representation.
Specifically, in comp_v6_model_dataset.py line 481, gt_poses (sampled from gen_loader) is denormalized using self.dataset.std and self.dataset.mean, which are equal to Std_abs_3d.npy and Mean_abs_3d.npy. Shouldn't self.dataset.std_rel and self.dataset.mean_rel be used instead? I tested denormalizing with both sets of stats and then visualized the samples (after converting them back to the global-xyz representation). The sample denormalized with the absolute data's mean and std looks better - it walks in a big circle, whereas the other motion appears to walk in place with lots of sliding - but I'm not sure I understand why.
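Concretely, what I compared is (simplified):

# gt_poses comes from gen_loader as in comp_v6_model_dataset.py line 481
denorm_abs = gt_poses * self.dataset.std + self.dataset.mean          # Std_abs_3d.npy / Mean_abs_3d.npy
denorm_rel = gt_poses * self.dataset.std_rel + self.dataset.mean_rel  # relative stats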

Thanks,
Setareh

Using the averaged model

Hi @korrawe, great work!

(1) It seems that when sampling the model, you avoid using the averaged model. Is that true? If so, why?

load_saved_model(model, args.model_path) # , use_avg_model=args.gen_avg_model)

(2) During training, do you update the optimized model (self.model) to be the averaged model (self.model_avg)? If so, where?
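For context, my mental model of what self.model_avg is for is a standard EMA of the weights (this is my assumption, not code from the repo):

import torch

@torch.no_grad()
def update_avg_model(model_avg, model, beta=0.9999):
    # Exponential moving average of the weights; the beta value is a typical
    # choice, not necessarily the one used in GMD.
    for p_avg, p in zip(model_avg.parameters(), model.parameters()):
        p_avg.mul_(beta).add_(p, alpha=1.0 - beta)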

Question about motion pre/postprocessing and format

I am trying to understand the code used in GMD. One experiment I have done (in the default unconditioned generation mode) is to replace the model-generated motion with one of the existing motions in the dataset in order to visualize the dataset motions. Specifically, I am running this code in generate.py:

motion = torch.Tensor(np.load('./dataset/HumanML3D/new_joint_vecs_abs_3d/000036.npy'))  # loads a 263-dim motion from the HumanML3D dataset
sample = [motion.unsqueeze(0).unsqueeze(1).repeat(10, 1, 1, 1).permute(0, 3, 1, 2) * 10]  # reshapes the motion so that it can be processed by sample_to_motion

The script then proceeds to call sample_to_motion on this sample and renders it.

However, when the motion is rendered, the joints are rotating and moving, but the (x, y) position of the motion does not move at all, as in the example below.

So my question is: what pre/post-processing is applied to the motions after they are generated by the model (aside from the std/mean adjustment and the inverse random-matrix projection)? And why does calling sample_to_motion on one of the existing files in the dataset produce a motion with no movement in the (x, y) direction?
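My current guess (possibly wrong) is that this is related to how the root translation is encoded: in the standard relative HumanML3D format the root is stored as per-frame velocities that only yield a global (x, z) trajectory after integration, roughly as below, and I am not sure how the abs_3d variant differs.

import numpy as np

def rough_root_xz(feats):
    # feats: (T, 263) relative HumanML3D features. My assumption about the layout:
    # channel 0 is the root yaw velocity, channels 1:3 are the root linear velocity.
    # This ignores rotating the velocity by the accumulated yaw, so it is only
    # indicative, not the repo's actual recovery code.
    lin_vel = feats[:, 1:3]
    return np.cumsum(lin_vel, axis=0)  # approximate global (x, z) per frame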

[attached video: sample00.mp4]

Thank you!

SMPL translation inaccuracy

Hi there,
I'm interested in your project and wish to use GMD to generate scenes in the Isaac Gym simulator, which requires translation, rotation, and other theta/beta parameters as input. I am now able to get the rotation and all theta parameters correct, but there is a translation gap between the meshes I reconstruct and those you provide.
(gray: .obj from GMD; colorful: .obj reconstructed from parameters; visualizer: open3d)

[Screenshot 2024-03-13 105530]

I've noticed that you do the following in model/rotation2xyz.py:

# the first translation root at the origin
x_translations = x_translations - x_translations[:, :, [0]]

but after following it, there is still a tiny gap :(

[Screenshot 2024-03-13 113052]

Could you please provide some hints/information about the exact translation of the models?
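For what it's worth, my current guess (unverified) is that the remaining gap is the rest-pose pelvis offset, since SMPL's global translation is applied relative to the mesh origin rather than the pelvis joint. Something like:

import torch
import smplx  # assumes the smplx package; GMD may use a different SMPL wrapper

smpl_model = smplx.create('body_models', model_type='smpl')  # model path is a placeholder
betas = torch.zeros(1, 10)                                   # shape params used for reconstruction
pelvis_rest = smpl_model(betas=betas).joints[:, 0]           # (1, 3) rest-pose pelvis location
root_positions = torch.zeros(120, 3)                         # placeholder: per-frame root positions from GMD
trans = root_positions - pelvis_rest                         # candidate translation for the simulator

Is that the right way to think about it?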

Thanks,
Lofen Chen
