guytevet / motionclip
Official Pytorch implementation of the paper "MotionCLIP: Exposing Human Motion Generation to CLIP Space"
License: MIT License
What a great work!
I'm working with your code, but I have a question.
In your paper, the input is the orientations of the SMPL body model in 6D representation (24 x 6). But when I check while debugging, the input shape is 25 x 6.
Which one is right? Or what component is added to the 24 x 6?
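For reference, a minimal sketch of one possible explanation, assuming the extra row is the root translation appended as a 25th pseudo-joint when --translation is enabled (an assumption suggested by the training flags discussed further down this page, not a confirmed layout):

import torch

# Hypothetical layout: 24 SMPL joint rotations in 6D form plus one extra row
# assumed to hold the root translation (xyz padded out to 6 values).
pose_frame = torch.randn(25, 6)       # the shape observed while debugging
joint_rot6d = pose_frame[:24]         # the 24 x 6 described in the paper
maybe_translation = pose_frame[24]    # the assumed extra component
print(joint_rot6d.shape, maybe_translation.shape)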
Could you kindly share the file 'amass_30fps_legacy_db.pt' with me? I cannot obtain it correctly.
Was CLIP fine-tuned when training the checkpoint? I am unable to reproduce the results with CLIP frozen (I didn't try with it unfrozen yet).
I did parse the data, but I cannot find this file when running text-to-motion.
Hi, thanks for the great work. Do you have an estimate of when the code will be released?
In visualize.py, the parameter "interval" is 1000/fps. May I know why this is the default, and why the value is 1000?
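For context, a small stand-alone sketch: matplotlib animations take their interval in milliseconds, so 1000/fps is simply the per-frame delay; this assumes the parameter ends up in something like FuncAnimation, which is not verified against the repo here.

from matplotlib.animation import FuncAnimation
import matplotlib.pyplot as plt

fps = 20
interval = 1000 / fps  # milliseconds per frame: 1000 ms per second / frames per second

fig, ax = plt.subplots()
line, = ax.plot([], [])

def update(i):
    # placeholder update; the real code would draw the posed skeleton here
    line.set_data(range(i), range(i))
    return line,

anim = FuncAnimation(fig, update, frames=60, interval=interval)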
If I want to use the num_frames == -2 mode, how should I set the training parameters?
Hi, thanks a lot for sharing the code!!
I would like to confirm how translation is handled in training.
In the training command you provided (paper-model), translation is active (i.e. --translation).
However, in the paper, Sec. 3 (Method), you specify that p_i \in 24 x 6, which indicates the training should not involve translation.
Also, in the demo it seems that the generated motion has no translation.
Could you please clarify this detail?
Like CLIP, where we compute the image and text embeddings and compare their similarities to retrieve the best-matching text, I tried the same using motion and text, but it does not work.
E.g., using the AMASS dataset with bs = 2 and the texts 'jump', 'dancing':
# motion embeddings (enc, batch and device come from the surrounding script)
emb = enc.encode_motions(batch['x']).to(device)
emb /= emb.norm(dim=-1, keepdim=True)
# text embeddings from the CLIP text encoder
text_inputs = torch.cat([clip.tokenize(c) for c in batch["clip_text"]]).to(device)
text_features = clip_model.encode_text(text_inputs).float()
text_features /= text_features.norm(dim=-1, keepdim=True)
# scaled motion-to-text similarities, softmaxed over the texts
logit_scale = clip_model.logit_scale.exp()
similarity = (logit_scale * emb @ text_features.T).softmax(dim=-1)
values, indices = similarity[0].topk(len(batch["clip_text"]))
# Print the result
print("\nTop predictions:\n")
for value, index in zip(values, indices):
    print(f"{batch['clip_text'][index]:>16s}: {100 * value.item():.2f}%")
Expected output for similarity[0]: a high "jump" probability.
But I get a high "dancing" probability instead. I have tested this with multiple batches, and the correct text does not get the highest similarity most of the time. Am I running inference incorrectly?
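For reference, a stand-alone sanity check with random stand-in embeddings (not the repo's encoders): inspecting the raw cosine-similarity matrix before applying the logit scale and softmax can help, since a softmax over only two texts can make small margins look decisive.

import torch
import torch.nn.functional as F

# Stand-in embeddings: 2 motions x 512-d and 2 texts x 512-d, L2-normalized.
motion_emb = F.normalize(torch.randn(2, 512), dim=-1)
text_emb = F.normalize(torch.randn(2, 512), dim=-1)

cosine = motion_emb @ text_emb.T  # rows: motions, columns: texts, values in [-1, 1]
print(cosine)                     # with good retrieval the diagonal should dominate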
Thanks for the interesting work!
I have two questions regarding amass.py and dataset.py respectively.
In lines 93-95 of amass.py, why do you select only those 18 joints (shouldn't it still be 24 or 22?) and switch joint #8 to the root position?
In lines 115-122 of dataset.py, should the translation be the "trans" vector, where "trans" is the key of the AMASS data dictionary? It seems that you did not store this "trans" vector anywhere. Or do I misunderstand the meaning of "self.translation"?
Thanks!
I trained the model with the provided code, but the generated results are almost static. What happened?
I am attracted by your fascinating work; thanks for sharing the code!
Here I have some questions:
2. Could you give some instructions on how to generate motion videos viewed from different angles and perspectives, rather than with "set global orientation to zero"? (For example, the motions in the photo are generated from the side view, at 90 degrees.)
For a certain motion sequence, I encode it with the motion encoder (the encode_motions() function in visualize.py) and get the CLIP features of this sequence. Then I decode the CLIP features with the motion decoder (the generate() function of the MOTIONCLIP class in motionclip.py). However, when I visualize the decoded motion sequence, I find it is very different from the motion before encoding.
Why does this happen? Do you have any good solution?
Thanks for your great work!
I'm curious about the usage of duration_finetunning.py. Is MotionCLIP trained in a two-stage manner, namely training the Transformer autoencoder in the first stage and fine-tuning the CLIP text-image encoder in the second stage? Am I understanding this right?
Hi MotionCLIP team,
Thanks for your great work. I am wondering how to render a single image using Blender and the SMPL-X add-on. In my work, I want to render images from HumanML3D-format motion. Could you give me some instructions?
I really appreciate your work again and hope to hear from you!
Kangning
Hi, thanks for your great work!
I want to use the code to run the text-to-motion script, like:
python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt
But I got the following exception:
Traceback (most recent call last):
  File "/data/miniconda3/envs/motion_clip/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/miniconda3/envs/motion_clip/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/visualize/text2motion.py", line 41, in <module>
    main()
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/visualize/text2motion.py", line 21, in main
    model, datasets = get_model_and_data(parameters, split='vald')
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/utils/get_model_and_data.py", line 26, in get_model_and_data
    datasets = get_datasets(parameters, clip_preprocess, split)
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/datasets/get_dataset.py", line 18, in get_datasets
    dataset = DATA(split=split, clip_preprocess=clip_preprocess, **parameters)
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/datasets/amass.py", line 142, in __init__
    assert os.path.exists(self.datapath)
AssertionError
Where can I get that .pt file? Much appreciated!
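For reference, the assertion that fails is assert os.path.exists(self.datapath), so the parsed AMASS file is not at the path the dataset class expects; a tiny hedged check (the path below is the one quoted in the training command later on this page and may differ in your setup):

import os

# Hypothetical location of the parsed AMASS database.
datapath = "./data/amass_db/amass_30fps_db.pt"
print(datapath, "exists:", os.path.exists(datapath))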
When I run python -m src.datasets.amass_parser --dataset_name amass, I get an error: RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 9932 but got size 52 for tensor number 1 in the list.
The details are as follows:
Loading babel labels
DONE! - Loading babel labels
args.input_dir: ./data/render
Loading Body Models
DONE! - Loading Body Models
Reading Transitions_mocap sequence...
0%| | 0/1 [00:00<?, ?it/s]
/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py:166: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
frame_raw_text_labels = np.full(data['poses'].shape[0], "", dtype=np.object)
/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py:167: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
frame_proc_text_labels = np.full(data['poses'].shape[0], "", dtype=np.object)
/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py:168: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
frame_action_cat = np.full(data['poses'].shape[0], "", dtype=np.object)
rot_mats.shape: torch.Size([191, 52, 3, 3])
joints.shape: torch.Size([1, 52, 3])
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/licg/.conda/envs/motionclip/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/licg/.conda/envs/motionclip/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py", line 318, in
db = read_data(args.input_dir, # ./data/amass
File "/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py", line 115, in read_data
results_dict = read_single_sequence(split_name, dataset_name, seq_folder, seq_name, body_models, target_fps,
File "/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py", line 228, in read_single_sequence
body_motion = body_model(pose_body=pose_body, pose_hand=pose_hand, root_orient=root_orient)
File "/home/licg/.conda/envs/motionclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/data2/lichenguang/projects/MotionCLIP/human_body_prior/body_model/body_model.py", line 247, in forward
verts, joints = lbs(betas=shape_components, pose=full_pose, v_template=v_template,
File "/home/licg/.conda/envs/motionclip/lib/python3.8/site-packages/smplx/lbs.py", line 231, in lbs
J_transformed, A = batch_rigid_transform(rot_mats, J, parents, dtype=dtype)
File "/home/licg/.conda/envs/motionclip/lib/python3.8/site-packages/smplx/lbs.py", line 394, in batch_rigid_transform
transforms_mat = transform_mat(
File "/home/licg/.conda/envs/motionclip/lib/python3.8/site-packages/smplx/lbs.py", line 349, in transform_mat
return torch.cat([F.pad(R, [0, 0, 0, 1]),
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 9932 but got size 52 for tensor number 1 in the list.
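As a hedged reading of this mismatch (an inference from the printed shapes above, not a confirmed diagnosis): the rotations carry 191 frames while the joints tensor carries a batch of 1, and 191 * 52 = 9932, which is exactly the "expected size" in the error. A minimal stand-alone reproduction of the failing concatenation in smplx's transform_mat:

import torch
import torch.nn.functional as F

# Mirrors the shapes reported above after smplx flattens batch and joints.
R = torch.randn(191 * 52, 3, 3)  # rotations for 191 frames x 52 joints -> 9932
t = torch.randn(52, 3, 1)        # joints from a batch of 1 sequence -> 52
try:
    torch.cat([F.pad(R, [0, 0, 0, 1]), F.pad(t, [0, 0, 0, 1], value=1)], dim=2)
except RuntimeError as e:
    print(e)  # Sizes of tensors must match except in dimension 2 ...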
Could you tell me how I can fix this bug? I think the problem is related to the AMASS dataset. The contents of the directory 'data/amass' are as follows:
Hi, creating the conda env from environment.yml fails on Windows. Are there any suggestions?
Running conda env create -f environment.yml on the command line.
conda --version: 4.13.0
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed
ResolvePackageNotFound:
- _openmp_mutex==4.5=1_gnu
- lcms2==2.12=h3be6417_0
- lame==3.100=h7b6447c_0
- ninja==1.10.2=hff7bd54_1
- mkl_fft==1.3.0=py38h42c9631_2
- xz==5.2.5=h7b6447c_0
- torchvision==0.9.1=py38_cu101
- numpy==1.20.3=py38hf144106_0
- libgcc-ng==9.3.0=h5101ec6_17
- openssl==1.1.1k=h27cfd23_0
- mkl-service==2.4.0=py38h7f8727e_0
- readline==8.1=h27cfd23_0
- libuv==1.40.0=h7b6447c_0
- libwebp-base==1.2.0=h27cfd23_0
- intel-openmp==2021.3.0=h06a4308_3350
- libtasn1==4.16.0=h27cfd23_0
- mkl_random==1.2.2=py38h51133e4_0
- libiconv==1.15=h63c8f33_5
- ncurses==6.2=he6710b0_1
- zstd==1.4.9=haebb681_0
- ffmpeg==4.3=hf484d3e_0
- freetype==2.10.4=h5ab3b9f_0
- libgomp==9.3.0=h5101ec6_17
- openjpeg==2.3.0=h05c96fa_1
- pytorch==1.8.1=py3.8_cuda10.1_cudnn7.6.3_0
- certifi==2021.5.30=py38h06a4308_0
- tk==8.6.10=hbc83047_0
- mkl==2021.3.0=h06a4308_520
- lz4-c==1.9.3=h295c915_1
- pip==21.0.1=py38h06a4308_0
- pillow==8.3.1=py38h2c7a002_0
- libtiff==4.2.0=h85742a9_0
- numpy-base==1.20.3=py38h74d4b33_0
- gmp==6.2.1=h2531618_2
- gnutls==3.6.15=he1e5248_0
- openh264==2.1.0=hd408876_0
- nettle==3.7.3=hbbd107a_1
- libunistring==0.9.10=h27cfd23_0
- python==3.8.11=h12debd9_0_cpython
- libffi==3.3=he6710b0_2
- cudatoolkit==10.1.243=h6bb024c_0
- libidn2==2.3.2=h7f8727e_0
- zlib==1.2.11=h7b6447c_3
- ca-certificates==2021.7.5=h06a4308_1
- sqlite==3.36.0=hc218d9a_0
- libpng==1.6.37=hbc83047_0
- setuptools==52.0.0=py38h06a4308_0
- libstdcxx-ng==9.3.0=hd4cf53a_17
- bzip2==1.0.8=h7b6447c_0
- jpeg==9b=h024ee3a_2
- ld_impl_linux-64==2.35.1=h7274673_9
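For what it's worth, a hedged workaround sketch (not from the repo): every pin above carries a Linux build string such as h7b6447c_0, which the Windows solver cannot match. Stripping the build strings sometimes lets conda pick Windows builds, though the CUDA-, ffmpeg- and gcc-related packages will likely still need manual edits or a Linux environment.

import re

# Rewrite environment.yml without build strings, e.g.
#   "  - lcms2==2.12=h3be6417_0"  ->  "  - lcms2==2.12"
# (accepts both "pkg=1.2.3=build" and "pkg==1.2.3=build" pin styles).
with open("environment.yml") as src, open("environment_nobuilds.yml", "w") as dst:
    for line in src:
        dst.write(re.sub(r"^(\s*- [^=\s]+==?[^=\s]+)=\S+$", r"\1", line))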
I am confused about the AMASS dataset.
The amass_parser.py file asks us to download the dataset BioMotionLab_NTroje, but there are three BML datasets on the official AMASS website: BMLhandball, BMLmovi, and BMLrub. I don't know which one to download.
After running python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt, the following issue appears. I don't know how to solve it.
Restore weights..
Visualization of the epoch 100
Generate the videos..
Render the generated samples: 100%|████████████████████████████████████████████████████| 24/24 [00:09<00:00, 2.41it/s]
Load the generated samples: 0%| | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\anaconda\Anaconda3\envs\motionclip\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\anaconda\Anaconda3\envs\motionclip\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\text2motion.py", line 41, in
main()
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\text2motion.py", line 37, in main
viz_clip_text(model, grid, epoch, parameters, folder=folder)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 280, in viz_clip_text
frames = generate_by_video({}, {}, generation,
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 132, in generate_by_video
gener["frames"] = pool_job_with_desc(pool, iterator,
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 114, in pool_job_with_desc
array = np.stack([[load_anim(save_path_format.format(i, j), timesize)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 114, in
array = np.stack([[load_anim(save_path_format.format(i, j), timesize)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 114, in
array = np.stack([[load_anim(save_path_format.format(i, j), timesize)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\anim.py", line 44, in load_anim
data = np.array(imageio.mimread(path, memtest=False))[..., :3]
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\core\functions.py", line 354, in mimread
reader = read(uri, format, "I", **kwargs)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\core\functions.py", line 186, in get_reader
return format.get_reader(request)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\core\format.py", line 170, in get_reader
return self.Reader(self, request)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\core\format.py", line 221, in init
self._open(**self.request.kwargs.copy())
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\plugins\pillowmulti.py", line 60, in _open
return PillowFormat.Reader._open(self)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\plugins\pillow.py", line 138, in _open
as_gray=as_gray, is_gray=_palette_is_grayscale(self._im)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\plugins\pillow.py", line 689, in _palette_is_grayscale
palette = np.asarray(pil_image.getpalette()).reshape((256, 3))
ValueError: cannot reshape array of size 384 into shape (256,3)
In the paper, it is said that for BABEL action recognition the training was performed using the labels rather than the raw text, but in the code I found self.clip_label_text = "text_raw_labels". "text_raw_labels" seems to load the raw text for each frame rather than just the category. Can you help me understand this? Thanks!
Hi, thank you for your amazing work.
I am curious about the training requirements. Could you share them?
Hi Guy Tevet,
I trust this message finds you well. I sincerely hope recent events, including any conflicts, haven't hindered your research efforts. Please accept my best wishes for you and everyone in your vicinity.
Upon checking the example text file you provided in this repo, I observed that the dataset's captions seem relatively simpler than those of HumanML3D, which has been used in your other studies.
With that in mind, I would be grateful if you could answer a couple of inquiries:
I thank you in advance for any insights you might offer.
Best regards.
On executing the code after the first release using the training command, i.e. python -m src.train.train --clip_text_losses cosine --clip_image_losses cosine --pose_rep rot6d
--lambda_vel 100 --lambda_rc 100 --lambda_rcxyz 100
--jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8
--lr 0.0001 --glob --translation --no-vertstrans --latent_dim 512 --num_epochs 100 --snapshot 10
--device
--datapath ./data/amass_db/amass_30fps_db.pt
--folder ./exps/my-paper-model
I face two errors:
(i) train.py: error: the following arguments are required: --dataset
Fix: the training command needs to be edited to include the dataset parameter.
(ii) NameError: name 'batch_losses_dict' is not defined
Fix: lines 45 to 49 in trainer.py need to be commented out, as they seem to be unused.
Please correct me if I am wrong about the above fixes.
Hi. I just wonder if we can ignore the global translation and only take the joint rotations as input to your pretrained MotionCLIP? Does this make sense? Thank you very much!
Hello author, may I ask which version of ffmpeg you are using?
Hi,
thanks for sharing this project.
I saw you have two loss terms (an MMD term and a Hessian term) that were not discussed in your paper (maybe I missed the details; if so, could you point out where I missed them?).
best regards
When I test the text-to-motion part, the visualization step fails with:
list index out of range
File "/home/vatis/anaconda3/envs/zxz_mld_plt/lib/python3.9/multiprocessing/pool.py", line 870, in next
raise value
File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/visualize.py", line 111, in pool_job_with_desc
for _ in pool.imap_unordered(plot_3d_motion_dico, iterator):
File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/visualize.py", line 139, in generate_by_video
gener["frames"] = pool_job_with_desc(pool, iterator,
File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/visualize.py", line 413, in viz_clip_edit
frames = generate_by_video({}, {}, generation,
File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/motion_editing.py", line 33, in main
viz_clip_edit(model, datasets, edit_csv, epoch, parameters, folder=folder)
File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/motion_editing.py", line 37, in <module>
main()
File "/home/vatis/anaconda3/envs/zxz_mld_plt/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/vatis/anaconda3/envs/zxz_mld_plt/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
IndexError: list index out of range
It seems that pool.imap_unordered(plot_3d_motion_dico, iterator) fails.
when running "python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt" i get a that error. i just change.
The problem is in src/datasets/dataset.py", line 371.
I can run the script by assigning values to those variables in lines 371 and 372, so it seems that the issue lies in the parameter passing.
Thanks for your time, im kind of a noob :)
Hi,
Are there any clues about why the reproduced motions are static? I believe I followed the instructions carefully. I also mentioned the issue here: #5 (comment)
Shunlin