
text-to-motion's Introduction

Generating Diverse and Natural 3D Human Motions from Text (CVPR 2022)

(teaser image)

Given a textual description, for example, "the figure rises from a lying position and walks in a counterclockwise circle, and then lays back down the ground", our approach generates a diverse set of 3D human motions that are faithful to the provided text.

Python Virtual Environment

Anaconda is recommended to create this virtual environment.

conda env create -f environment.yaml
conda activate text2motion_pub

If you cannot successfully create the environment, here is a list of required libraries:

Python = 3.7.9   # Other versions may also work but are not tested.
PyTorch = 1.6.0 (conda install pytorch==1.6.0 torchvision==0.7.0 -c pytorch)  # Other versions may also work but are not tested.
scipy
numpy
tensorflow       # For use of tensorboard only
spacy
tqdm
ffmpeg = 4.3.1   # Other versions may also work but are not tested.
matplotlib = 3.3.1

Finally, if you want to generate 3D motions from customized raw texts, you also need to install the language model for spacy:

python -m spacy download en_core_web_sm
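
A quick way to verify the language model is installed and to see the part-of-speech tags the text pipeline relies on (a minimal sketch; the sentence is just an illustration):

    import spacy

    # Load the small English model downloaded above.
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("a person walks forward and then turns around")
    # Each token comes with a universal POS tag (VERB, NOUN, ADV, ...).
    print([(token.text, token.pos_) for token in doc])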

Download Data & Pre-trained Models

If you just want to play with our pre-trained models, you don't need to download the datasets.

Datasets

We use two 3D human motion-language datasets: HumanML3D and KIT-ML. For both datasets, you can find the details as well as the download link [here].
Please note that you don't need to clone that repository, since all related code has already been included in the current project.

Download and unzip the dataset files -> Create a dataset folder -> Place related data files in dataset folder:

mkdir ./dataset/

Taking HumanML3D as an example, the file directory should look like this:

./dataset/
./dataset/HumanML3D/
./dataset/HumanML3D/new_joint_vecs/
./dataset/HumanML3D/texts/
./dataset/HumanML3D/Mean.npy
./dataset/HumanML3D/Std.npy
./dataset/HumanML3D/test.txt
./dataset/HumanML3D/train.txt
./dataset/HumanML3D/train_val.txt
./dataset/HumanML3D/val.txt  
./dataset/HumanML3D/all.txt 
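
Mean.npy and Std.npy hold the per-dimension statistics used to normalize the 263-dimensional motion features. A minimal sketch of how they would typically be applied (the loading code and the example file name are illustrative, not the project's exact data pipeline):

    import numpy as np

    mean = np.load("./dataset/HumanML3D/Mean.npy")   # shape (263,)
    std = np.load("./dataset/HumanML3D/Std.npy")     # shape (263,)

    # Example motion file; files under new_joint_vecs/ have shape (n_frames, 263).
    motion = np.load("./dataset/HumanML3D/new_joint_vecs/000000.npy")
    normalized = (motion - mean) / std               # per-dimension z-score normalization
    recovered = normalized * std + mean              # inverse transform, e.g. for visualization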

Pre-trained Models

Create a checkpoint folder to place the pre-trained models:

mkdir ./checkpoints

Download the models for HumanML3D from [here]. Unzip and place them under the checkpoint directory, which should look like this:

./checkpoints/t2m/
./checkpoints/t2m/Comp_v6_KLD01/           # Text-to-motion generation model
./checkpoints/t2m/Decomp_SP001_SM001_H512/ # Motion autoencoder
./checkpoints/t2m/length_est_bigru/        # Text-to-length sampling model
./checkpoints/t2m/text_mot_match/          # Motion & Text feature extractors for evaluation

Download the models for KIT-ML from [here]. Unzip and place them under the checkpoint directory.

Training Models

All intermediate meta files/animations/models will be saved to the checkpoint directory, under the folder specified by the argument "--name".

Training motion autoencoder

HumanML3D

python train_decomp_v3.py --name Decomp_SP001_SM001_H512 --gpu_id 0 --window_size 24 --dataset_name t2m

KIT-ML

python train_decomp_v3.py --name Decomp_SP001_SM001_H512 --gpu_id 0 --window_size 24 --dataset_name kit

Training text2length model:

HumanML3D

python train_length_est.py --name length_est_bigru --gpu_id 0 --dataset_name t2m

KIT-ML

python train_length_est.py --name length_est_bigru --gpu_id 0 --dataset_name kit

Training text2motion model:

HumanML3D

python train_comp_v6.py --name Comp_v6_KLD01 --gpu_id 0 --lambda_kld 0.01 --dataset_name t2m

KIT-ML

python train_comp_v6.py --name Comp_v6_KLD005 --gpu_id 0 --lambda_kld 0.005 --dataset_name kit

Training motion & text feature extractors:

HumanML3D

python train_tex_mot_match.py --name text_mot_match --gpu_id 1 --batch_size 8 --dataset_name t2m

KIT-ML

python train_tex_mot_match.py --name text_mot_match --gpu_id 1 --batch_size 8 --dataset_name kit

Generating and Animating 3D Motions (HumanML3D)

Sampling results from test sets

python eval_comp_v6.py --name Comp_v6_KLD01 --est_length --repeat_time 3 --num_results 10 --ext default --gpu_id 1

where --est_length asks the model to use sampled motion lengths for generation, and --repeat_time sets how many sampling rounds are carried out for each description. This script will result in 3x10 animations under the directory ./eval_results/t2m/Comp_v6_KLD01/default/.

Sampling results from customized descriptions

python gen_motion_script.py --name Comp_v6_KLD01 --text_file input.txt --repeat_time 3 --ext customized --gpu_id 1

This will generate 3 animated motions for each description given in text_file ./input.txt.
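
For reference, ./input.txt is expected to be a plain text file, presumably with one description per line (the sentences below are made up for illustration; check gen_motion_script.py if your version also expects motion lengths):

    a person walks forward and picks something up from the floor
    the figure jumps twice and then sits down
    someone walks in a circle while waving their right hand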

If you have problems installing ffmpeg, you may not be able to animate 3D results as mp4. Try gif instead.
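
If mp4 export keeps failing, one workaround is to let matplotlib write a gif with Pillow instead of calling ffmpeg. A minimal sketch, assuming an existing FuncAnimation object named ani similar to the one built in utils/plot_script.py:

    from matplotlib.animation import PillowWriter

    # PillowWriter does not depend on ffmpeg, so it sidesteps encoder issues.
    ani.save("sample.gif", writer=PillowWriter(fps=20))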

Quantitative Evaluations

python final_evaluations.py

This will evaluate the model performance on the HumanML3D dataset by default. You can also run it on the KIT-ML dataset by uncommenting certain lines in ./final_evaluations.py. The statistical results will be saved to ./t2m_evaluation.log.

Misc

Contact Chuan Guo at [email protected] for any questions or comments.


text-to-motion's Issues

Evaluation failure

Thanks for your great work!
Running with your conda env (ubuntu18), the script final_evaluations.py fails with:

python final_evaluations.py

Reading ./checkpoints/t2m/Comp_v6_KLD01/opt.txt
Loading dataset t2m ...
100%|██████████| 830/830 [00:00<00:00, 952.81it/s]
Pointer Pointing at 0
Ground Truth Dataset Loading Completed!!!
Reading ./checkpoints/t2m/Comp_v6_KLD01/opt.txt
Loading Evaluation Model Wrapper (Epoch 28) Completed!!
Reading ./checkpoints/t2m/Comp_v6_KLD01/opt.txt
Generating Comp_v6_KLD01 ...
./checkpoints/t2m/Comp_v6_KLD01/model
Loading model: Epoch 344 Schedule_len 049
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "final_evaluations.py", line 318, in <module>
    evaluation(log_file)
  File "final_evaluations.py", line 160, in evaluation
    motion_loader, mm_motion_loader = motion_loader_getter()
  File "final_evaluations.py", line 281, in <lambda>
    batch_size, gt_dataset, mm_num_samples, mm_num_repeats, device
  File "/disk1/guytevet/text-to-motion/motion_loaders/model_motion_loaders.py", line 60, in get_motion_loader
    dataset = CompV6GeneratedDataset(opt, ground_truth_dataset, w_vectorizer, mm_num_samples, mm_num_repeats)
  File "/disk1/guytevet/text-to-motion/motion_loaders/comp_v6_model_dataset.py", line 73, in __init__
    for i, data in tqdm(enumerate(dataloader)):
  File "/disk2/guytevet/anaconda3/envs/text2motion_pub/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/disk2/guytevet/anaconda3/envs/text2motion_pub/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/disk2/guytevet/anaconda3/envs/text2motion_pub/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/disk2/guytevet/anaconda3/envs/text2motion_pub/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/disk2/guytevet/anaconda3/envs/text2motion_pub/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/disk2/guytevet/anaconda3/envs/text2motion_pub/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/disk2/guytevet/anaconda3/envs/text2motion_pub/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/disk2/guytevet/anaconda3/envs/text2motion_pub/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/disk1/guytevet/text-to-motion/data/dataset.py", line 335, in __getitem__
    motion = (motion - self.mean) / self.std
ValueError: operands could not be broadcast together with shapes (104,251) (263,)
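
The shapes in this error suggest a mismatch between the loaded motion features and the normalization statistics: HumanML3D uses 263-dimensional feature vectors while KIT-ML uses 251-dimensional ones. A quick diagnostic sketch, assuming the dataset layout from the README (the example file name is illustrative):

    import numpy as np

    mean = np.load("./dataset/HumanML3D/Mean.npy")
    motion = np.load("./dataset/HumanML3D/new_joint_vecs/000000.npy")

    # Both should report the same feature dimension (263 for HumanML3D, 251 for KIT-ML).
    print(mean.shape, motion.shape)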

Error when training : [libopenh264 @ 0x55fb7278f140] Incorrect library version loaded

Hello,

When trying to train on the KIT-ML dataset, I get the following error message after 2 epochs:
epoch: 002 inner_iter: 2891 1m 30s (- 135m 45s) niter: 0008800 completed: 1%) val_loss: 0.2733 loss: 0.2719 loss_rec: 0.2719 loss_sparsity: 0.9671 loss_smooth: 0.7583
epoch: 002 inner_iter: 2941 1m 31s (- 135m 40s) niter: 0008850 completed: 1%) val_loss: 0.2733 loss: 0.2779 loss_rec: 0.2779 loss_sparsity: 0.9586 loss_smooth: 0.7512
Validation time:
Validation Loss: 0.25845 Reconstruction Loss: 0.25684 Sparsity Loss: 0.89871 Smooth Loss: 0.96879
MovieWriter stderr:
[libopenh264 @ 0x55fb7278f140] Incorrect library version loaded
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height

Traceback (most recent call last):
  File "/s/red/a/nobackup/vision/anju/text2motion/new_cvenv/lib/python3.8/site-packages/matplotlib/animation.py", line 233, in saving
    yield self
  File "/s/red/a/nobackup/vision/anju/text2motion/new_cvenv/lib/python3.8/site-packages/matplotlib/animation.py", line 1090, in save
    anim._init_draw()  # Clear the initial frame
  File "/s/red/a/nobackup/vision/anju/text2motion/new_cvenv/lib/python3.8/site-packages/matplotlib/animation.py", line 1748, in _init_draw
    self._draw_frame(frame_data)
  File "/s/red/a/nobackup/vision/anju/text2motion/new_cvenv/lib/python3.8/site-packages/matplotlib/animation.py", line 1767, in _draw_frame
    self._drawn_artists = self._func(framedata, *self._args)
  File "/s/red/a/nobackup/vision/anju/text2motion/text-to-motion/utils/plot_script.py", line 81, in update
    ax.lines = []
AttributeError: can't set attribute

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_decomp_v3.py", line 100, in <module>
    trainer.train(train_loader, val_loader, plot_t2m)
  File "/s/red/a/nobackup/vision/anju/text2motion/text-to-motion/networks/trainers.py", line 209, in train
    plot_eval(data, save_dir)
  File "train_decomp_v3.py", line 23, in plot_t2m
    plot_3d_motion(save_path, kinematic_chain, joint, title="None", fps=fps, radius=radius)
  File "/s/red/a/nobackup/vision/anju/text2motion/text-to-motion/utils/plot_script.py", line 114, in plot_3d_motion
    ani.save(save_path, fps=fps)
  File "/s/red/a/nobackup/vision/anju/text2motion/new_cvenv/lib/python3.8/site-packages/matplotlib/animation.py", line 1107, in save
    writer.grab_frame(**savefig_kwargs)
  File "/s/red/a/nobackup/vision/anju/text2motion/new_cvenv/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/s/red/a/nobackup/vision/anju/text2motion/new_cvenv/lib/python3.8/site-packages/matplotlib/animation.py", line 235, in saving
    self.finish()
  File "/s/red/a/nobackup/vision/anju/text2motion/new_cvenv/lib/python3.8/site-packages/matplotlib/animation.py", line 349, in finish
    raise subprocess.CalledProcessError(
subprocess.CalledProcessError: Command '['ffmpeg', '-f', 'rawvideo', '-vcodec', 'rawvideo', '-s', '1000x1000', '-pix_fmt', 'rgba', '-r', '12.5', '-loglevel', 'error', '-i', 'pipe:', '-vcodec', 'h264', '-pix_fmt', 'yuv420p', '-y', './checkpoints/kit/Decomp_SP001_SM001_H512/animation/E0003/00.mp4']' returned non-zero exit status 1.
(/s/red/a/nobackup/vision/anju/text2motion/new_cvenv) carnap:/s/red/a/nobackup/vision/anju/text2motion/text-to-motion$ python train_decomp_v3.py --name Decomp_SP001_SM001_H512 --gpu_id 0 --window_size 24 --dataset_name kit

How to implement SMPL skin model?

Hello

First of all thank you very much for posting this repository.

The result of my reproduction is a skeleton model of the human body; how can the SMPL skin model shown in your project be obtained?

Thanks in advance!

Foot Contact for KIT

Thank you for all your amazing work.
My KIT visualization looks weird.

  1. Feet are moving
  2. Hands are behind

Have you seen the same issues? Are these issues from the raw data itself?
(Animation attached)

Text to motion for 3D hand motions

Hello,

Is it possible to use this code to train on a dataset with hand-object interactions ( since my goal is text to motion for 3D hand motions)?
If so, could you point me to the relevant portions of the code that I would need to modify?

Thank You for your time.

Which model does HumanML3D use for POS-tagging?

Referring to the HumanML3D dataset: I want to label my own dataset and fine-tune the text/motion encoders for my own task.
Here I noticed the dataset uses POS-tagging and an extra dictionary:

  POS_enumerator = {
      'VERB': 0,
      'NOUN': 1,
      'DET': 2,
      'ADP': 3,
      'NUM': 4,
      'AUX': 5,
      'PRON': 6,
      'ADJ': 7,
      'ADV': 8,
      'Loc_VIP': 9,
      'Body_VIP': 10,
      'Obj_VIP': 11,
      'Act_VIP': 12,
      'Desc_VIP': 13,
      'OTHER': 14,
  }

  Loc_list = ('left', 'right', 'clockwise', 'counterclockwise', 'anticlockwise', 'forward', 'back', 'backward',
              'up', 'down', 'straight', 'curve')
  
  Body_list = ('arm', 'chin', 'foot', 'feet', 'face', 'hand', 'mouth', 'leg', 'waist', 'eye', 'knee', 'shoulder', 'thigh')
  
  Obj_List = ('stair', 'dumbbell', 'chair', 'window', 'floor', 'car', 'ball', 'handrail', 'baseball', 'basketball')
  
  Act_list = ('walk', 'run', 'swing', 'pick', 'bring', 'kick', 'put', 'squat', 'throw', 'hop', 'dance', 'jump', 'turn',
              'stumble', 'dance', 'stop', 'sit', 'lift', 'lower', 'raise', 'wash', 'stand', 'kneel', 'stroll',
              'rub', 'bend', 'balance', 'flap', 'jog', 'shuffle', 'lean', 'rotate', 'spin', 'spread', 'climb')
  
  Desc_list = ('slowly', 'carefully', 'fast', 'careful', 'slow', 'quickly', 'happy', 'angry', 'sad', 'happily',
               'angrily', 'sadly')

Because I can't find relevant information in the paper and supplementary materials, I want to know:

  1. which model is used for POS-tagging?
  2. how is the extra dictionary constructed?
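
For reference, a minimal sketch of how words could be mapped to the categories above using spacy POS tags plus the VIP word lists (this illustrates the scheme, not the project's exact labeling code; it assumes POS_enumerator and the *_list tuples defined above are in scope):

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def tag_word(token):
        # VIP words override the plain POS tag.
        lemma = token.lemma_
        if lemma in Loc_list:
            return 'Loc_VIP'
        if lemma in Body_list:
            return 'Body_VIP'
        if lemma in Obj_List:
            return 'Obj_VIP'
        if lemma in Act_list:
            return 'Act_VIP'
        if lemma in Desc_list:
            return 'Desc_VIP'
        # Fall back to the universal POS tag if it is listed, otherwise OTHER.
        return token.pos_ if token.pos_ in POS_enumerator else 'OTHER'

    doc = nlp("a person walks quickly to the left")
    print([(t.text, tag_word(t)) for t in doc])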

Licensing questions

Hello @EricGuo5513 ,

  1. There is no mentioning of the license in this repository. Can you please fill this gap in?
  2. Do I assume correctly that you store the data in SMPL format and related SMPL software should be used to convert it to any other format, thus applying the license of SMPL-Body or equivalent derivative?
  3. Are any of derivatives of the SMPL-Model have been used in your code, thus applying the license of the SMPL-Model as well?

Thanks in advance,
pk

Work slides presentation

Hi guys,

Your work is super interesting and exciting!

I am wondering if you maybe have a slides presentation (PowerPoint file) already prepared? I want to present your paper at my work :)

Thanks!

How to get the glove

I want train this model on chinese dataset, but I do not know how to get chinese glove?

ValueError: array must not contain infs or NaNs

Hello, I would like to evaluate the pre trained model on t2m, but it reported an error as follows. How can I resolve this issue?

(motiondiffuse) ubuntu@ubuntu:~/fyy/text-to-motion$ python final_evaluations.py 
Reading ./checkpoints/t2m/Comp_v6_KLD01/opt.txt
Loading dataset t2m ...
100%|██████████| 4384/4384 [00:00<00:00, 6065.30it/s]
4648
4648
Pointer Pointing at 0
Ground Truth Dataset Loading Completed!!!
Reading ./checkpoints/t2m/Comp_v6_KLD01/opt.txt
Loading Evaluation Model Wrapper (Epoch 28) Completed!!
Reading ./checkpoints/t2m/Comp_v6_KLD01/opt.txt
Generating Comp_v6_KLD01 ...
./checkpoints/t2m/Comp_v6_KLD01/model
Loading model: Epoch 344 Schedule_len 049
4648it [01:22, 56.16it/s]
Generated Dataset Loading Completed!!!
==================== Replication 0 ====================
Time: 2023-08-08 22:12:23.306905
========== Evaluating Matching Score ==========
---> [ground truth] Matching Score: nan
---> [ground truth] R_precision: (top 1): 0.5162 (top 2): 0.7088 (top 3): 0.8015 
---> [Comp_v6_KLD01] Matching Score: 3.4038
---> [Comp_v6_KLD01] R_precision: (top 1): 0.4524 (top 2): 0.6278 (top 3): 0.7254 
Time: 2023-08-08 22:12:27.198547
========== Evaluating FID ==========
Traceback (most recent call last):
  File "final_evaluations.py", line 319, in <module>
    evaluation(log_file)
  File "final_evaluations.py", line 172, in evaluation
    fid_score_dict = evaluate_fid(gt_loader, acti_dict, f)
  File "final_evaluations.py", line 101, in evaluate_fid
    fid = calculate_frechet_distance(gt_mu, gt_cov, mu, cov)
  File "/home/ubuntu/fyy/text-to-motion/utils/metrics.py", line 128, in calculate_frechet_distance
    covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)
  File "/home/ubuntu/anaconda3/envs/motiondiffuse/lib/python3.8/site-packages/scipy/linalg/_matfuncs_sqrtm.py", line 161, in sqrtm
    A = _asarray_validated(A, check_finite=True, as_inexact=True)
  File "/home/ubuntu/anaconda3/envs/motiondiffuse/lib/python3.8/site-packages/scipy/_lib/_util.py", line 287, in _asarray_validated
    a = toarray(a)
  File "/home/ubuntu/anaconda3/envs/motiondiffuse/lib/python3.8/site-packages/numpy/lib/function_base.py", line 488, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs

converting 3D motion plot to the full body render

Hi - thanks for this interesting work. Is there code available to generate the blue 3D body renders that appear in the paper? I am using T2M with Comp_v6_KLD01 and have the mp4 of red/blue/black joint lines plotted by plot_script.py, as well as the output gen_motion_XX_XXXX_XX_a.npy joint data; I'm just looking for the final layer of processing to get to the rendered mesh output style that appears in the paper (as mp4 or gif).

humanml3d convert

I want to convert my human motion dataset into a format like HumanML3D. What information should my dataset contain? I have position information.

Loss of feature extractor training

Hi, authors

Thanks for presenting such great work. I followed your instructions to train the feature extractors; the training loss decreases normally, but the validation loss rises after a few iterations. The encoders seem to be overfitting heavily. I used your pretrained VAE checkpoint and the only change was the batch size (128).
Here is my log.

train_text_mot_match_humanml.log

All config bools set to True by default in get_opt.py

I've noticed something in the code when creating my own config file.
On this line, all bools are automatically set to True, as they take a string as input.
It seems to only concern the variable "input_z" in the existing config file, as the variables "is_train" and "is_continue" are set to False here.

Could you describe what these variables are used for exactly?

Thanks for your help.
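
For context, this is the classic Python pitfall the issue refers to: converting a string read from opt.txt with bool() is True for any non-empty string. A hedged sketch of the difference (get_opt.py's actual parsing may differ):

    # Any non-empty string is truthy, so this is always True:
    print(bool("False"))          # True

    # A safer conversion when reading "True"/"False" strings from a config file:
    def parse_bool(value: str) -> bool:
        return value.strip().lower() == "true"

    print(parse_bool("False"))    # False
    print(parse_bool("True"))     # True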

How can the output data be converted to bvh format?

I'm trying to convert the rotations and joint positions into bvh format so that I can do better visualization. I can see that there is an IK method in the motion_process.py file which might help me get the local rotation information, but it turns out that it's not correct. I fed in the joint positions returned by the recover_from_ric method.

Any help or hints would be appreciated.

Question about 263 dimensions

The paper says the pose is defined by root angular velocity (Y), root linear velocity (XZ), and root height, plus LOCAL joint positions (R^3j), velocities (R^3j), and rotations (R^6j) in root space; finally, there are binary features obtained by thresholding the heel and toe joint velocities to emphasize foot-ground contacts (R^4). So I think the dimension of the pose should be 4 (root) + 21*12 + 4 (binary features) = 260? Where are the missing 3 dimensions?

Any response will be appreciated!!!

Potential Bug in TextMotionMatchTrainer

I am training my own text and motion embedding models for evaluation. I noticed in the TextMotionMatchTrainer class, there is a potential bug in the shift applied to create negative examples for the contrastive loss.

def backward(self):
    batch_size = self.text_embedding.shape[0]

    '''Positive pairs'''
    pos_labels = torch.zeros(batch_size).to(self.text_embedding.device)
    self.loss_pos = self.contrastive_loss(self.text_embedding, self.motion_embedding, pos_labels)

    '''Negative pairs, shifting index'''
    neg_labels = torch.ones(batch_size).to(self.text_embedding.device)
    shift = np.random.randint(0, batch_size - 1)  # BUG
    new_idx = np.arange(shift, batch_size + shift) % batch_size
    self.mis_motion_embedding = self.motion_embedding.clone()[new_idx]
    self.loss_neg = self.contrastive_loss(self.text_embedding, self.mis_motion_embedding, neg_labels)
    self.loss = self.loss_pos + self.loss_neg

    loss_logs = OrderedDict({})
    loss_logs['loss'] = self.loss.item()
    loss_logs['loss_pos'] = self.loss_pos.item()
    loss_logs['loss_neg'] = self.loss_neg.item()
    return loss_logs

If shift is 0, then each "negative" example is compared with itself. This is especially problematic when training with low batch sizes, like in the README (batch size 8). The correction is

shift = np.random.randint(1, batch_size-1)

After doing this, I see improved training curves. Below, the grey curve is with the bug fix and the purple curve is with the original code. Batch size is 8.

(Screenshot of training curves attached, 2024-01-18)

Tensorrt version error

When attempting to install from requirements.txt, it flags an error about an unsatisfied tensorrt version.
(error screenshot attached)

Not sure if I'm doing something incorrect or not...

Edit/update: specifying the tensorrt version in requirements.txt seems to fix it.

How to obtain glove?

Thank you for your amazing work! May I ask how we can obtain glove? It seems to be a vocabulary together with 300-dim word vectors. Where can we get the 300-dim vectors? (e.g., if we want to enlarge the vocab or switch to another language)
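
For reference, the 300-dimensional vectors are in the format of the standard pre-trained GloVe embeddings released by Stanford NLP; for another language you would substitute embeddings trained on that language. A minimal sketch of turning a GloVe text file into a vocabulary and an embedding matrix (file names are illustrative, not the exact files shipped with the repo):

    import numpy as np

    words, vectors = [], []
    # Each line of a GloVe text file is: <word> <300 space-separated floats>
    with open("glove.6B.300d.txt", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            words.append(parts[0])
            vectors.append(np.asarray(parts[1:], dtype=np.float32))

    word2idx = {w: i for i, w in enumerate(words)}
    embedding = np.stack(vectors)         # shape (vocab_size, 300)
    np.save("glove_300d.npy", embedding)  # illustrative output name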

Can I export the pose sequences?

Hi

First of all thank you very much for posting this repository.

Is it possible to export the 3D poses that a particular text prompt generates? I would like to use them, for instance, for blending with another set of images.

Thanks in advance

Format of `data` passed to `recover_from_ric`?

What is the layout of the data passed in to recover_from_ric? I'd like to just get the local rotations and translations of each bone and do my own FK on it instead of getting absolute positions.

Can't set attribute in plot_3d_motion function

Hello,

Please help with an error that I tried to solve but couldn't.

In line 81 of your code, in the plot_3d_motion function, an error appears:

--> ax.lines = []
ax.collections = []
ax.view_init(elev=120, azim=-90)
ax.dist = 7.5

AttributeError: can't set attribute

Could you please help me solve it? This function uses matplotlib's FuncAnimation to create a fast animation of the character.

Thank you!
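
A likely cause, for what it's worth: in matplotlib 3.5 and later, Axes.lines and Axes.collections are read-only properties, so assigning an empty list raises AttributeError (the README pins matplotlib 3.3.1, where the assignment still works). A hedged sketch of a version-agnostic way to clear the axes instead of assigning to those attributes:

    # Instead of:
    #     ax.lines = []
    #     ax.collections = []
    # remove the artists explicitly, which works on both old and new matplotlib:
    for line in list(ax.lines):
        line.remove()
    for collection in list(ax.collections):
        collection.remove()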

ValueError: not enough values to unpack (expected 2, got 0)

100%|██████████| 23384/23384 [00:00<00:00, 154772.68it/s]
Traceback (most recent call last):
  File "train_comp_v6.py", line 146, in <module>
    train_dataset = Text2MotionDataset(opt, mean, std, train_split_file, w_vectorizer)
  File "/home/nsazielle/Synthda/text-to-motion/data/dataset.py", line 91, in __init__
    name_list, length_list = zip(*sorted(zip(new_name_list, length_list), key=lambda x: x[1]))
ValueError: not enough values to unpack (expected 2, got 0)

any idea why this happened?

Time to arrival

It seems to me that you use positional encoding, although in the paper you mention using time-to-arrival.

Could you please point to where you implement and use the time-to-arrival positional encoding?

Error on loading val_dataset.

Hi Guo,

Thank you for sharing the official code. I followed the instructions in the README.md and copied the HumanML3D dataset from here directly. However, when I tried to run train_decomp_v3.py, I met the error shown in Fig. 1. I would like to know what causes this error and how to fix it.

Many thanks for your help.
(Fig. 1: error screenshot, 2022-06-14)

Output of model

Hi, I have tried this project and generated skeleton animations!

However, I also want to visualize it with an SMPL mesh, so I am curious whether it is possible to get the 22 joints' rotations and the root joint's translation from the output of the model.

I found the outputs of the model listed below.
If I have misunderstood anything, please correct me:

root rotation velocity (Y), 1
root velocity (X and Z), 2
root position (Y), 1
21 joints' local position, 63
21 joints' local rotation, 126
22 joints' local velocity, 66
foot contact, 4

(#frame, 263)
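
For reference, these components sum to 1 + 2 + 1 + 63 + 126 + 66 + 4 = 263, which is consistent with the (#frame, 263) shape.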

Can not train text2motion model

I used the following command to train text2motion model.

python train_comp_v6.py --name Comp_v6_KLD01 --gpu_id 0 --lambda_kld 0.01 --dataset_name t2m

But an error occurred at about the 500th iteration.

epoch:   0 niter:     50 sub_epoch:  0 inner_iter:   49 1m 3s val_loss: 0.0000  loss_gen: 0.9746  loss_mot_rec: 0.6054  loss_mov_rec: 0.3078  loss_kld: 6.1342  sl_length:10 tf_ratio:0.40
epoch:   0 niter:    100 sub_epoch:  0 inner_iter:   99 1m 52s val_loss: 0.0000  loss_gen: 0.8278  loss_mot_rec: 0.5583  loss_mov_rec: 0.2669  loss_kld: 0.2562  sl_length:10 tf_ratio:0.40
epoch:   0 niter:    150 sub_epoch:  0 inner_iter:  149 2m 39s val_loss: 0.0000  loss_gen: 0.7858  loss_mot_rec: 0.5226  loss_mov_rec: 0.2612  loss_kld: 0.1987  sl_length:10 tf_ratio:0.40
epoch:   0 niter:    200 sub_epoch:  0 inner_iter:  199 3m 26s val_loss: 0.0000  loss_gen: 0.7743  loss_mot_rec: 0.5013  loss_mov_rec: 0.2703  loss_kld: 0.2686  sl_length:10 tf_ratio:0.40
epoch:   0 niter:    250 sub_epoch:  0 inner_iter:  249 4m 15s val_loss: 0.0000  loss_gen: 0.7172  loss_mot_rec: 0.4526  loss_mov_rec: 0.2606  loss_kld: 0.4083  sl_length:10 tf_ratio:0.40
epoch:   0 niter:    300 sub_epoch:  0 inner_iter:  299 5m 5s val_loss: 0.0000  loss_gen: 0.6978  loss_mot_rec: 0.4346  loss_mov_rec: 0.2569  loss_kld: 0.6277  sl_length:10 tf_ratio:0.40
epoch:   0 niter:    350 sub_epoch:  0 inner_iter:  349 5m 56s val_loss: 0.0000  loss_gen: 0.6697  loss_mot_rec: 0.3971  loss_mov_rec: 0.2579  loss_kld: 1.4755  sl_length:10 tf_ratio:0.40
epoch:   0 niter:    400 sub_epoch:  0 inner_iter:  399 6m 44s val_loss: 0.0000  loss_gen: 0.6330  loss_mot_rec: 0.3584  loss_mov_rec: 0.2516  loss_kld: 2.2961  sl_length:10 tf_ratio:0.40
epoch:   0 niter:    450 sub_epoch:  0 inner_iter:  449 7m 13s val_loss: 0.0000  loss_gen: 0.5758  loss_mot_rec: 0.3212  loss_mov_rec: 0.2324  loss_kld: 2.2157  sl_length:10 tf_ratio:0.40
[W python_anomaly_mode.cpp:104] Warning: Error detected in DivBackward0. Traceback of forward call that caused the error:
  File "train_comp_v6.py", line 149, in <module>
    trainer.train(train_dataset, val_dataset, plot_t2m)
  File "/SSD_DISK/users/projects/3Dpose/t2m_test/text-to-motion/networks/trainers.py", line 658, in train
    log_dict = self.update()
  File "/SSD_DISK/users/projects/3Dpose/t2m_test/text-to-motion/networks/trainers.py", line 480, in update
    loss_logs = self.backward_G()
  File "/SSD_DISK/users/projects/3Dpose/t2m_test/text-to-motion/networks/trainers.py", line 456, in backward_G
    self.loss_kld = self.kl_criterion(self.mus_post, self.logvars_post, self.mus_pri, self.logvars_pri)
  File "/SSD_DISK/users/projects/3Dpose/t2m_test/text-to-motion/networks/trainers.py", line 267, in kl_criterion
    2 * torch.exp(logvar2)) - 1 / 2
 (function _print_stack)
Traceback (most recent call last):
  File "train_comp_v6.py", line 149, in <module>
    trainer.train(train_dataset, val_dataset, plot_t2m)
  File "/SSD_DISK/users/projects/3Dpose/t2m_test/text-to-motion/networks/trainers.py", line 658, in train
    log_dict = self.update()
  File "/SSD_DISK/users/projects/3Dpose/t2m_test/text-to-motion/networks/trainers.py", line 484, in update
    self.loss_gen.backward()
  File "/SSD_DISK/users/software/anaconda3/envs/yhy_hm/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/SSD_DISK/users/software/anaconda3/envs/yhy_hm/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: Function 'DivBackward0' returned nan values in its 0th output.
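
The traceback points at the Gaussian-to-Gaussian KL term, where dividing by 2 * exp(logvar2) can explode to NaN when logvar2 becomes very negative. A hedged sketch of the same KL divergence with the log-variances clamped for numerical stability (the clamp range is an assumption, not the repository's setting):

    import torch

    def kl_criterion_stable(mu1, logvar1, mu2, logvar2, clamp=10.0):
        # KL( N(mu1, exp(logvar1)) || N(mu2, exp(logvar2)) ), summed then averaged over the batch.
        logvar1 = logvar1.clamp(-clamp, clamp)
        logvar2 = logvar2.clamp(-clamp, clamp)
        sigma1 = logvar1.mul(0.5).exp()
        sigma2 = logvar2.mul(0.5).exp()
        kld = torch.log(sigma2 / sigma1) + (torch.exp(logvar1) + (mu1 - mu2) ** 2) / (2 * torch.exp(logvar2)) - 0.5
        return kld.sum() / mu1.shape[0]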

Number of motions seems less than the Real one

Hello, I have a question regarding the method used to compute the number of motions. While investigating your MotionDataset and dataloader, it appears that the number of motions during training is counted as the total number of motion snippets across the whole training subset, divided by the batch size. I can understand the reasoning behind this approach, but it leads to a significant reduction in the real data size. Specifically, the number of motions in the training set was initially 23,384. After removing motions shorter than the window_size of 64, it decreased to 20,942, and was further reduced to 14,435 training motions by the aforementioned method.

Your clarification on this behavior would be greatly appreciated. Thank you.

pose representation

I have generated motions from your pretrained model, and these motions are in world coordinates. If I want to convert the 263-dimensional motion representation to relative coordinates, what should I do? Is there any code I can refer to?

About the confidence of generated results

Thanks for such great work!
Is there a numerical value that can represent the confidence of a generated pose motion?
It would also be great if you could explain the return values.
For example, the generate function in trainers.py (line 384) returns fake_motions, mus_pri, att_wgts. Could you please explain them a little bit?

Do we need to load the pretrained checkpoints when training the feature extractor?

Hi, I use a different motion representation from HumanML3D, and thus I need to train a new feature extractor for evaluation. I run the following command:
python train_tex_mot_match.py --name text_mot_match --gpu_id 1 --batch_size 8 --dataset_name t2m
But an error occurs because I need to load the pretrained checkpoint. Do I need to train a new motion autoencoder before training the feature extraction network?

About the comparative experiment

Dear author:
The motion input of your work is 263-dimensional. May I ask whether you changed the input of the other methods to 263 dimensions and retrained them on the HumanML3D dataset when you ran the comparison experiments?
