guytevet / motionclip
Official Pytorch implementation of the paper "MotionCLIP: Exposing Human Motion Generation to CLIP Space"
License: MIT License
What a great work!
I'm working with your code, but I have a question.
In your paper, the input is the orientations of the SMPL body model in 6D representation (24 x 6). But when I check while debugging, the input shape is 25 x 6.
Which one is right? Or what component is added to the 24 x 6?
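For reference, a minimal sketch of one possible explanation, assuming the extra row is the root translation appended as a 25th pseudo-joint when --translation is enabled (an assumption suggested by the training flags discussed further down this page, not a confirmed layout):

import torch

# Hypothetical layout: 24 SMPL joint rotations in 6D form plus one extra row
# assumed to hold the root translation (xyz padded out to 6 values).
pose_frame = torch.randn(25, 6)       # the shape observed while debugging
joint_rot6d = pose_frame[:24]         # the 24 x 6 described in the paper
maybe_translation = pose_frame[24]    # the assumed extra component
print(joint_rot6d.shape, maybe_translation.shape)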
Could you kindly share the file 'amass_30fps_legacy_db.pt' with me? I cannot obtain it correctly.
Was CLIP fine-tuned when training the checkpoint? I am unable to reproduce the results with CLIP frozen (I didn't try with it unfrozen yet).
I did parse the data, but I cannot find this file when running text-to-motion.
Hi, thanks for the great work. Do you have an estimate of when the code will be released?
In visualize.py, the parameter "interval" is 1000/fps. May I know why this is the default, and why the value is 1000?
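For context, a small stand-alone sketch: matplotlib animations take their interval in milliseconds, so 1000/fps is simply the per-frame delay; this assumes the parameter ends up in something like FuncAnimation, which is not verified against the repo here.

from matplotlib.animation import FuncAnimation
import matplotlib.pyplot as plt

fps = 20
interval = 1000 / fps  # milliseconds per frame: 1000 ms per second / frames per second

fig, ax = plt.subplots()
line, = ax.plot([], [])

def update(i):
    # placeholder update; the real code would draw the posed skeleton here
    line.set_data(range(i), range(i))
    return line,

anim = FuncAnimation(fig, update, frames=60, interval=interval)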
If I want to use the num_frames == -2 mode, how should I set the training parameters?
Hi, thanks a lot for sharing the code!!
I would like to confirm how translation is handled in training.
In the training command you provided (paper-model), translation is active (i.e. --translation).
However, in the paper, Sec. 3 (Method), you specify that p_i \in 24 x 6, which indicates the training should not involve translation.
Also, in the demo it seems that the generated motion has no translation.
Could you please clarify this detail?
Like CLIP, where we compute the image and text embeddings and compare their similarities to retrieve the best-matching text, I tried the same using motion and text, but it does not work.
E.g., using the AMASS dataset with bs = 2 and the texts 'jump', 'dancing':
# motion embeddings (enc, batch and device come from the surrounding script)
emb = enc.encode_motions(batch['x']).to(device)
emb /= emb.norm(dim=-1, keepdim=True)
# text embeddings from the CLIP text encoder
text_inputs = torch.cat([clip.tokenize(c) for c in batch["clip_text"]]).to(device)
text_features = clip_model.encode_text(text_inputs).float()
text_features /= text_features.norm(dim=-1, keepdim=True)
# scaled motion-to-text similarities, softmaxed over the texts
logit_scale = clip_model.logit_scale.exp()
similarity = (logit_scale * emb @ text_features.T).softmax(dim=-1)
values, indices = similarity[0].topk(len(batch["clip_text"]))
# Print the result
print("\nTop predictions:\n")
for value, index in zip(values, indices):
    print(f"{batch['clip_text'][index]:>16s}: {100 * value.item():.2f}%")
Expected output for similarity[0]: a high "jump" probability.
But I get a high "dancing" probability instead. I have tested this with multiple batches, and the correct text does not get the highest similarity most of the time. Am I running inference incorrectly?
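For reference, a stand-alone sanity check with random stand-in embeddings (not the repo's encoders): inspecting the raw cosine-similarity matrix before applying the logit scale and softmax can help, since a softmax over only two texts can make small margins look decisive.

import torch
import torch.nn.functional as F

# Stand-in embeddings: 2 motions x 512-d and 2 texts x 512-d, L2-normalized.
motion_emb = F.normalize(torch.randn(2, 512), dim=-1)
text_emb = F.normalize(torch.randn(2, 512), dim=-1)

cosine = motion_emb @ text_emb.T  # rows: motions, columns: texts, values in [-1, 1]
print(cosine)                     # with good retrieval the diagonal should dominate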
Thanks for the interesting work!
I have two questions regarding amass.py and dataset.py respectively.
In lines 93-95 of amass.py, why do you select only those 18 joints (shouldn't it still be 24 or 22?) and switch joint #8 to the root position?
In lines 115-122 of dataset.py, should the translation be the "trans" vector, where "trans" is the key of the AMASS data dictionary? It seems that you did not store this "trans" vector anywhere. Or do I misunderstand the meaning of "self.translation"?
Thanks!
I trained the model with the provided code, but the generated results are almost static. What happened?
I am attracted by your fascinating work; thanks for sharing the code!
Here I have some questions:
2. Could you give some instructions on how to generate motion videos viewed from different angles and perspectives, rather than with "set global orientation to zero"? (For example, the motions in the photo are generated from the side view, at 90 degrees.)
For a certain motion sequence, I encode it with the motion encoder (the encode_motions() function in visualize.py) and get the CLIP features of this sequence. Then I decode the CLIP features with the motion decoder (the generate() function of the MOTIONCLIP class in motionclip.py). However, when I visualize the decoded motion sequence, I find it is very different from the motion before encoding.
Why does this happen? Do you have any good solution?
Thanks for your great work!
I'm curious about the usage of duration_finetunning.py. Is MotionCLIP trained in a two-stage manner, namely training the Transformer autoencoder in the first stage and fine-tuning the CLIP text-image encoder in the second stage? Am I understanding this right?
Hi MotionCLIP team,
Thanks for your great work. I am wondering how to render a single image using Blender and the SMPL-X add-on. In my work, I want to render images from HumanML3D-format motion. Could you give me some instructions?
I really appreciate your work again and hope to hear from you!
Kangning
Hi, thanks for your great work!
I want to use the code to run the text-to-motion script, like:
python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt
But I got the following exception:
Traceback (most recent call last):
  File "/data/miniconda3/envs/motion_clip/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/miniconda3/envs/motion_clip/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/visualize/text2motion.py", line 41, in <module>
    main()
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/visualize/text2motion.py", line 21, in main
    model, datasets = get_model_and_data(parameters, split='vald')
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/utils/get_model_and_data.py", line 26, in get_model_and_data
    datasets = get_datasets(parameters, clip_preprocess, split)
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/datasets/get_dataset.py", line 18, in get_datasets
    dataset = DATA(split=split, clip_preprocess=clip_preprocess, **parameters)
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/datasets/amass.py", line 142, in __init__
    assert os.path.exists(self.datapath)
AssertionError
Where can I get that .pt file? Much appreciated!
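For reference, the assertion that fails is assert os.path.exists(self.datapath), so the parsed AMASS file is not at the path the dataset class expects; a tiny hedged check (the path below is the one quoted in the training command later on this page and may differ in your setup):

import os

# Hypothetical location of the parsed AMASS database.
datapath = "./data/amass_db/amass_30fps_db.pt"
print(datapath, "exists:", os.path.exists(datapath))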
When I run python -m src.datasets.amass_parser --dataset_name amass, I get an error: RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 9932 but got size 52 for tensor number 1 in the list.
The details are as follows:
Loading babel labels
DONE! - Loading babel labels
args.input_dir: ./data/render
Loading Body Models
DONE! - Loading Body Models
Reading Transitions_mocap sequence...
0%| | 0/1 [00:00<?, ?it/s]
/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py:166: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
frame_raw_text_labels = np.full(data['poses'].shape[0], "", dtype=np.object)
/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py:167: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
frame_proc_text_labels = np.full(data['poses'].shape[0], "", dtype=np.object)
/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py:168: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
frame_action_cat = np.full(data['poses'].shape[0], "", dtype=np.object)
rot_mats.shape: torch.Size([191, 52, 3, 3])
joints.shape: torch.Size([1, 52, 3])
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/licg/.conda/envs/motionclip/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/licg/.conda/envs/motionclip/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py", line 318, in
db = read_data(args.input_dir, # ./data/amass
File "/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py", line 115, in read_data
results_dict = read_single_sequence(split_name, dataset_name, seq_folder, seq_name, body_models, target_fps,
File "/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py", line 228, in read_single_sequence
body_motion = body_model(pose_body=pose_body, pose_hand=pose_hand, root_orient=root_orient)
File "/home/licg/.conda/envs/motionclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/data2/lichenguang/projects/MotionCLIP/human_body_prior/body_model/body_model.py", line 247, in forward
verts, joints = lbs(betas=shape_components, pose=full_pose, v_template=v_template,
File "/home/licg/.conda/envs/motionclip/lib/python3.8/site-packages/smplx/lbs.py", line 231, in lbs
J_transformed, A = batch_rigid_transform(rot_mats, J, parents, dtype=dtype)
File "/home/licg/.conda/envs/motionclip/lib/python3.8/site-packages/smplx/lbs.py", line 394, in batch_rigid_transform
transforms_mat = transform_mat(
File "/home/licg/.conda/envs/motionclip/lib/python3.8/site-packages/smplx/lbs.py", line 349, in transform_mat
return torch.cat([F.pad(R, [0, 0, 0, 1]),
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 9932 but got size 52 for tensor number 1 in the list.
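As a hedged reading of this mismatch (an inference from the printed shapes above, not a confirmed diagnosis): the rotations carry 191 frames while the joints tensor carries a batch of 1, and 191 * 52 = 9932, which is exactly the "expected size" in the error. A minimal stand-alone reproduction of the failing concatenation in smplx's transform_mat:

import torch
import torch.nn.functional as F

# Mirrors the shapes reported above after smplx flattens batch and joints.
R = torch.randn(191 * 52, 3, 3)  # rotations for 191 frames x 52 joints -> 9932
t = torch.randn(52, 3, 1)        # joints from a batch of 1 sequence -> 52
try:
    torch.cat([F.pad(R, [0, 0, 0, 1]), F.pad(t, [0, 0, 0, 1], value=1)], dim=2)
except RuntimeError as e:
    print(e)  # Sizes of tensors must match except in dimension 2 ...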
Could you tell me how I can fix this bug? I think the problem is related to the AMASS dataset. The contents of the directory 'data/amass' are as follows:
Hi, creating the conda env from environment.yml fails on Windows. Are there any suggestions?
Running conda env create -f environment.yml on the command line.
conda --version: 4.13.0
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed
ResolvePackageNotFound:
- _openmp_mutex==4.5=1_gnu
- lcms2==2.12=h3be6417_0
- lame==3.100=h7b6447c_0
- ninja==1.10.2=hff7bd54_1
- mkl_fft==1.3.0=py38h42c9631_2
- xz==5.2.5=h7b6447c_0
- torchvision==0.9.1=py38_cu101
- numpy==1.20.3=py38hf144106_0
- libgcc-ng==9.3.0=h5101ec6_17
- openssl==1.1.1k=h27cfd23_0
- mkl-service==2.4.0=py38h7f8727e_0
- readline==8.1=h27cfd23_0
- libuv==1.40.0=h7b6447c_0
- libwebp-base==1.2.0=h27cfd23_0
- intel-openmp==2021.3.0=h06a4308_3350
- libtasn1==4.16.0=h27cfd23_0
- mkl_random==1.2.2=py38h51133e4_0
- libiconv==1.15=h63c8f33_5
- ncurses==6.2=he6710b0_1
- zstd==1.4.9=haebb681_0
- ffmpeg==4.3=hf484d3e_0
- freetype==2.10.4=h5ab3b9f_0
- libgomp==9.3.0=h5101ec6_17
- openjpeg==2.3.0=h05c96fa_1
- pytorch==1.8.1=py3.8_cuda10.1_cudnn7.6.3_0
- certifi==2021.5.30=py38h06a4308_0
- tk==8.6.10=hbc83047_0
- mkl==2021.3.0=h06a4308_520
- lz4-c==1.9.3=h295c915_1
- pip==21.0.1=py38h06a4308_0
- pillow==8.3.1=py38h2c7a002_0
- libtiff==4.2.0=h85742a9_0
- numpy-base==1.20.3=py38h74d4b33_0
- gmp==6.2.1=h2531618_2
- gnutls==3.6.15=he1e5248_0
- openh264==2.1.0=hd408876_0
- nettle==3.7.3=hbbd107a_1
- libunistring==0.9.10=h27cfd23_0
- python==3.8.11=h12debd9_0_cpython
- libffi==3.3=he6710b0_2
- cudatoolkit==10.1.243=h6bb024c_0
- libidn2==2.3.2=h7f8727e_0
- zlib==1.2.11=h7b6447c_3
- ca-certificates==2021.7.5=h06a4308_1
- sqlite==3.36.0=hc218d9a_0
- libpng==1.6.37=hbc83047_0
- setuptools==52.0.0=py38h06a4308_0
- libstdcxx-ng==9.3.0=hd4cf53a_17
- bzip2==1.0.8=h7b6447c_0
- jpeg==9b=h024ee3a_2
- ld_impl_linux-64==2.35.1=h7274673_9
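For what it's worth, a hedged workaround sketch (not from the repo): every pin above carries a Linux build string such as h7b6447c_0, which the Windows solver cannot match. Stripping the build strings sometimes lets conda pick Windows builds, though the CUDA-, ffmpeg- and gcc-related packages will likely still need manual edits or a Linux environment.

import re

# Rewrite environment.yml without build strings, e.g.
#   "  - lcms2==2.12=h3be6417_0"  ->  "  - lcms2==2.12"
# (accepts both "pkg=1.2.3=build" and "pkg==1.2.3=build" pin styles).
with open("environment.yml") as src, open("environment_nobuilds.yml", "w") as dst:
    for line in src:
        dst.write(re.sub(r"^(\s*- [^=\s]+==?[^=\s]+)=\S+$", r"\1", line))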
I am confused about the AMASS dataset.
The amass_parser.py file asks us to download the dataset BioMotionLab_NTroje, but there are three BML datasets on the official AMASS website: BMLhandball, BMLmovi, and BMLrub. I don't know which one to download.
After running python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt, the following issue appears. I don't know how to solve it.
Restore weights..
Visualization of the epoch 100
Generate the videos..
Render the generated samples: 100%|████████████████████████████████████████████████████| 24/24 [00:09<00:00, 2.41it/s]
Load the generated samples: 0%| | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\anaconda\Anaconda3\envs\motionclip\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\anaconda\Anaconda3\envs\motionclip\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\text2motion.py", line 41, in
main()
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\text2motion.py", line 37, in main
viz_clip_text(model, grid, epoch, parameters, folder=folder)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 280, in viz_clip_text
frames = generate_by_video({}, {}, generation,
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 132, in generate_by_video
gener["frames"] = pool_job_with_desc(pool, iterator,
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 114, in pool_job_with_desc
array = np.stack([[load_anim(save_path_format.format(i, j), timesize)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 114, in
array = np.stack([[load_anim(save_path_format.format(i, j), timesize)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 114, in
array = np.stack([[load_anim(save_path_format.format(i, j), timesize)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\anim.py", line 44, in load_anim
data = np.array(imageio.mimread(path, memtest=False))[..., :3]
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\core\functions.py", line 354, in mimread
reader = read(uri, format, "I", **kwargs)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\core\functions.py", line 186, in get_reader
return format.get_reader(request)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\core\format.py", line 170, in get_reader
return self.Reader(self, request)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\core\format.py", line 221, in init
self._open(**self.request.kwargs.copy())
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\plugins\pillowmulti.py", line 60, in _open
return PillowFormat.Reader._open(self)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\plugins\pillow.py", line 138, in _open
as_gray=as_gray, is_gray=_palette_is_grayscale(self._im)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\plugins\pillow.py", line 689, in _palette_is_grayscale
palette = np.asarray(pil_image.getpalette()).reshape((256, 3))
ValueError: cannot reshape array of size 384 into shape (256,3)
In the paper, it is said that for BABEL action recognition the training was performed using the labels rather than the raw text, but in the code I found self.clip_label_text = "text_raw_labels". "text_raw_labels" seems to load the raw text for each frame rather than just the category. Can you help me understand this? Thanks!
Hi, thank you for your amazing work.
I am curious about the training requirements. Could you share them?
Hi Guy Tevet,
I trust this message finds you well. I sincerely hope recent events, including any conflicts, haven't hindered your research efforts. Please accept my best wishes for you and everyone in your vicinity.
Upon checking the example text file you provided in this repo, I observed that the dataset's captions seem relatively simpler than those of HumanML3D, which has been used in your other studies.
With that in mind, I would be grateful if you could answer a couple of inquiries:
I thank you in advance for any insights you might offer.
Best regards.
On executing the code after the first release using the training command, i.e. python -m src.train.train --clip_text_losses cosine --clip_image_losses cosine --pose_rep rot6d
--lambda_vel 100 --lambda_rc 100 --lambda_rcxyz 100
--jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8
--lr 0.0001 --glob --translation --no-vertstrans --latent_dim 512 --num_epochs 100 --snapshot 10
--device
--datapath ./data/amass_db/amass_30fps_db.pt
--folder ./exps/my-paper-model
I face two errors:
(i) train.py: error: the following arguments are required: --dataset
Fix: the training command needs to be edited to include the dataset parameter.
(ii) NameError: name 'batch_losses_dict' is not defined
Fix: lines 45 to 49 in trainer.py need to be commented out, as they seem to be unused.
Please correct me if I am wrong about the above fixes.
Hi. I just wonder if we can ignore the global translation and only take the joint rotations as input to your pretrained MotionCLIP? Does this make sense? Thank you very much!
Hello author, may I ask which version of ffmpeg you are using?
Hi,
thanks for sharing this project.
I saw you have two loss terms (an MMD term and a Hessian term) that were not discussed in your paper (maybe I missed the details; if so, could you point out where I missed them?).
best regards
When I test the text-to-motion part, the visualization step fails with:
list index out of range
File "/home/vatis/anaconda3/envs/zxz_mld_plt/lib/python3.9/multiprocessing/pool.py", line 870, in next
raise value
File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/visualize.py", line 111, in pool_job_with_desc
for _ in pool.imap_unordered(plot_3d_motion_dico, iterator):
File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/visualize.py", line 139, in generate_by_video
gener["frames"] = pool_job_with_desc(pool, iterator,
File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/visualize.py", line 413, in viz_clip_edit
frames = generate_by_video({}, {}, generation,
File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/motion_editing.py", line 33, in main
viz_clip_edit(model, datasets, edit_csv, epoch, parameters, folder=folder)
File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/motion_editing.py", line 37, in <module>
main()
File "/home/vatis/anaconda3/envs/zxz_mld_plt/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/vatis/anaconda3/envs/zxz_mld_plt/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
IndexError: list index out of range
It seems that pool.imap_unordered(plot_3d_motion_dico, iterator) fails.
when running "python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt" i get a that error. i just change.
The problem is in src/datasets/dataset.py", line 371.
I can run the script by assigning values to those variables in lines 371 and 372, so it seems that the issue lies in the parameter passing.
Thanks for your time, im kind of a noob :)
Hi,
Are there any clues about why the reproduced motions are static? I believe I followed the instructions carefully. I also mentioned the issue here: #5 (comment)
Shunlin