
motionclip's People

Contributors

briang13, guytevet


motionclip's Issues

Why is the input size 25 x 6?

What a great work!

I'm experimenting with your code, but I have a question.

In the paper, the input is the orientations of the SMPL body model in 6-D representation (24 x 6). But when I check in the debugger, the input shape is 25 x 6.

Which one is right? Or, what component is added on top of 24 x 6?
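
A minimal sketch of one possible explanation (my assumption, not confirmed here): when the --translation flag is active, the root translation may be appended as an extra pseudo-joint, zero-padded to the 6-D rotation width, which would turn a 24 x 6 pose into 25 x 6.

import torch

poses_6d = torch.zeros(24, 6)                    # 24 SMPL joint rotations in 6-D form
trans = torch.tensor([0.1, 0.0, 0.9])            # hypothetical root translation (x, y, z)
trans_row = torch.cat([trans, torch.zeros(3)])   # pad (3,) -> (6,) to match the rotation width
x = torch.cat([poses_6d, trans_row[None]], dim=0)
print(x.shape)                                   # torch.Size([25, 6])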

Reproducing paper results

Was CLIP fine-tuned when training the released checkpoint? I am unable to reproduce the results with CLIP frozen (I haven't tried with it unfrozen yet).

Code release?

Hi, thanks for the great work. Do you have an estimate of when the code will be released?

Training translation?

Hi, thanks a lot for sharing the code!

I would like to confirm how translation is handled during training.

In the training command you provided (paper-model), translation is active (i.e. --translation is passed).

While in the paper, Sec. 3 (Method), you state that p_i \in \mathbb{R}^{24 \times 6}, which suggests the training should not involve translation.
Also, in the demo, it seems that the generated motion has no translation.

Could you please clarify this detail?

Similarities computed using motion and text embeddings are incorrect

Like in CLIP, where we compute the image and text embeddings and use their similarities to retrieve the best-matching text, I tried the same with motion and text embeddings, but it does not work.

E.g., using the AMASS dataset and bs = 2; texts: 'jump', 'dancing':

import clip
import torch

# Encode the motions with the MotionCLIP motion encoder and L2-normalize
emb = enc.encode_motions(batch['x']).float().to(device)
emb /= emb.norm(dim=-1, keepdim=True)

# Encode the paired texts with the CLIP text encoder and L2-normalize
text_inputs = torch.cat([clip.tokenize(c) for c in batch["clip_text"]]).to(device)
text_features = clip_model.encode_text(text_inputs).float()
text_features /= text_features.norm(dim=-1, keepdim=True)

# Scaled cosine similarities, softmaxed over the candidate texts
logit_scale = clip_model.logit_scale.exp()
similarity = (logit_scale * emb @ text_features.T).softmax(dim=-1)

values, indices = similarity[0].topk(len(batch["clip_text"]))

# Print the result
print("\nTop predictions:\n")
for value, index in zip(values, indices):
    print(f"{batch['clip_text'][index]:>16s}: {100 * value.item():.2f}%")

Expected output for similarity[0] -> high "jump" probability.
But I get a high "dancing" probability instead. I have tested this with multiple batches, and the correct text does not get the highest similarity the majority of the time. Am I running inference incorrectly?
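
As a follow-up to the snippet above, a hedged way to check retrieval over a whole batch (assuming motion i in the batch is paired with text i, as in the loader used above) is to count how often the argmax lands on the diagonal of the similarity matrix:

pred = similarity.argmax(dim=-1)                           # best-matching text per motion
target = torch.arange(similarity.shape[0], device=device)  # motion i is paired with text i
accuracy = (pred == target).float().mean().item()
print(f"batch retrieval accuracy: {accuracy:.2%}")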

"action2motion_joints" list & AMASS db file does not include translation vector

Thanks for the interesting work!
I have two questions regarding amass.py and dataset.py respectively.

In lines 93-95 of amass.py, why do you select only these 18 joints (shouldn't it still be 24, or 22?) and swap joint #8 for the root position?

In lines 115-122 of dataset.py, shouldn't the translation be the "trans" vector, where "trans" is the key in the AMASS data dictionary? It seems that you did not store this "trans" vector anywhere. Or am I misunderstanding the meaning of self.translation?
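
For reference, a small sketch of what a raw AMASS sequence file contains (the path is a placeholder); the global root translation is stored under the 'trans' key, next to the per-joint axis-angle 'poses':

import numpy as np

data = np.load("path/to/amass_sequence.npz")  # placeholder path
poses = data["poses"]   # (T, 156): axis-angle for the 52 SMPL-H joints
trans = data["trans"]   # (T, 3): global root translation in meters
print(poses.shape, trans.shape)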

Thanks!

Two questions

I am fascinated by your work, and thank you for sharing the code!

Here I have some questions:

  1. I wonder how you got the paper-model to perform so well. I trained my-model with the parameters suggested in README.md, but it didn't perform well, only reaching 27% top-1 accuracy and 41% top-5 accuracy on the action-recognition task.

2. Could you give some instructions on how to generate motion videos viewed from different angles and perspectives, rather than setting the global orientation to zero? (For example, the motions in the screenshot below are rendered from the side, at 90 degrees; a rough rotation sketch follows it.)
(Screenshot: generated motions rendered from the side view.)
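
This is not the repo's API, just a rough sketch of the idea: instead of zeroing the global orientation, rotate the generated joint positions about the vertical axis before plotting to obtain a side view (which axis is "up" depends on the convention in use):

import numpy as np

def rotate_about_up_axis(joints, angle_deg=90.0):
    """joints: (T, J, 3) xyz positions; returns a copy rotated about the y (up) axis."""
    a = np.deg2rad(angle_deg)
    R = np.array([[ np.cos(a), 0.0, np.sin(a)],
                  [ 0.0,       1.0, 0.0      ],
                  [-np.sin(a), 0.0, np.cos(a)]])
    return joints @ R.T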

consistency of the motion encoder and the motion decoder

For a given motion sequence, I encode it with the motion encoder (the function encode_motions() in visualize.py) and get the CLIP features of this sequence. Then I decode those features with the motion decoder (the generate() function of the MOTIONCLIP class in motionclip.py). However, when I visualize the decoded motion, it is very different from the motion before encoding.

I would like to know why this happens. Do you have any suggestions?
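
One hypothetical sanity check (my suggestion, not part of the repo): quantify the difference between the input motion and its reconstruction, e.g. with the mean per-joint position error, instead of judging only from the rendered video.

import numpy as np

def mean_per_joint_error(orig, recon):
    """orig, recon: (T, J, 3) joint positions; returns the mean Euclidean error."""
    return np.linalg.norm(orig - recon, axis=-1).mean()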

What is the script duration_finetunning.py used for?

Thanks for your great work!
I'm curious about the purpose of duration_finetunning.py. Is MotionCLIP trained in a two-stage manner, namely training the Transformer autoencoder in the first stage and fine-tuning the CLIP text-image encoder in the second stage? Am I understanding this correctly?

use smplx add-on

Hi MotionCLIP team,
Thanks for your great work. I am wondering how to render a single image using Blender and the smplx add-on. In my work, I want to render images from motion in the HumanML3D format. Could you give me some instructions?
I really appreciate your work and hope to hear from you!

Kangning

About the AMASS dataset

In your code, one of the entries in the 'amass_train_split' list is 'BioMotionLab_NTroje'. I did not see that name on the AMASS website; the naming there differs from the one used in the code.

AssertionError: data/amass/amass_30fps_legacy_db.pt

Hi, thanks for your great work!
I want to use the code to run the text-to-motion script, like:

python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt

But I got the exception below:

Traceback (most recent call last):
  File "/data/miniconda3/envs/motion_clip/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/miniconda3/envs/motion_clip/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/visualize/text2motion.py", line 41, in <module>
    main()
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/visualize/text2motion.py", line 21, in main
    model, datasets = get_model_and_data(parameters, split='vald')
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/utils/get_model_and_data.py", line 26, in get_model_and_data
    datasets = get_datasets(parameters, clip_preprocess, split)
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/datasets/get_dataset.py", line 18, in get_datasets
    dataset = DATA(split=split, clip_preprocess=clip_preprocess, **parameters)
  File "/group/20000/weidongyang/pose-estimation/motion_clip/MotionCLIP/src/datasets/amass.py", line 142, in __init__
    assert os.path.exists(self.datapath)
AssertionError

Where can I get that .pt file? Much appreciated!

question about amass_parser.py

When I run python -m src.datasets.amass_parser --dataset_name amass, I get an error: RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 9932 but got size 52 for tensor number 1 in the list.
The details are as follows:
Loading babel labels
DONE! - Loading babel labels
args.input_dir: ./data/render
Loading Body Models
DONE! - Loading Body Models
Reading Transitions_mocap sequence...
0%| | 0/1 [00:00<?, ?it/s]/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py:166: DeprecationWarning: np.object is a deprecated alias for the builtin object. To silence this warning, use object by itself. Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
frame_raw_text_labels = np.full(data['poses'].shape[0], "", dtype=np.object)
/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py:167: DeprecationWarning: np.object is a deprecated alias for the builtin object. To silence this warning, use object by itself. Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
frame_proc_text_labels = np.full(data['poses'].shape[0], "", dtype=np.object)
/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py:168: DeprecationWarning: np.object is a deprecated alias for the builtin object. To silence this warning, use object by itself. Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
frame_action_cat = np.full(data['poses'].shape[0], "", dtype=np.object)
rot_mats.shape: torch.Size([191, 52, 3, 3])
joints.shape: torch.Size([1, 52, 3])
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/licg/.conda/envs/motionclip/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/licg/.conda/envs/motionclip/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py", line 318, in
db = read_data(args.input_dir, # ./data/amass
File "/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py", line 115, in read_data
results_dict = read_single_sequence(split_name, dataset_name, seq_folder, seq_name, body_models, target_fps,
File "/data2/lichenguang/projects/MotionCLIP/src/datasets/amass_parser.py", line 228, in read_single_sequence
body_motion = body_model(pose_body=pose_body, pose_hand=pose_hand, root_orient=root_orient)
File "/home/licg/.conda/envs/motionclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/data2/lichenguang/projects/MotionCLIP/human_body_prior/body_model/body_model.py", line 247, in forward
verts, joints = lbs(betas=shape_components, pose=full_pose, v_template=v_template,
File "/home/licg/.conda/envs/motionclip/lib/python3.8/site-packages/smplx/lbs.py", line 231, in lbs
J_transformed, A = batch_rigid_transform(rot_mats, J, parents, dtype=dtype)
File "/home/licg/.conda/envs/motionclip/lib/python3.8/site-packages/smplx/lbs.py", line 394, in batch_rigid_transform
transforms_mat = transform_mat(
File "/home/licg/.conda/envs/motionclip/lib/python3.8/site-packages/smplx/lbs.py", line 349, in transform_mat
return torch.cat([F.pad(R, [0, 0, 0, 1]),
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 9932 but got size 52 for tensor number 1 in the list.

Could you tell me how I can fix this bug? I think the problem is related to the AMASS dataset. The contents of the directory 'data/amass' are as follows:

(Screenshot of the data/amass directory listing omitted.)
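
The sizes in the trace hint that a joint axis was folded into the batch axis somewhere: a quick check (my reading of the numbers, not a confirmed diagnosis) shows that the "expected size" is exactly frames times joints.

frames, joints = 191, 52   # from the shapes printed above: rot_mats is (191, 52, 3, 3)
print(frames * joints)     # 9932, the "expected size" reported by smplx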

Creating the conda environment failed

Running conda env create -f environment.yml fails with the output below (a possible workaround sketch follows the package list).

conda --version: 4.13.0

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed
ResolvePackageNotFound:
  - _openmp_mutex==4.5=1_gnu
  - lcms2==2.12=h3be6417_0
  - lame==3.100=h7b6447c_0
  - ninja==1.10.2=hff7bd54_1
  - mkl_fft==1.3.0=py38h42c9631_2
  - xz==5.2.5=h7b6447c_0
  - torchvision==0.9.1=py38_cu101
  - numpy==1.20.3=py38hf144106_0
  - libgcc-ng==9.3.0=h5101ec6_17
  - openssl==1.1.1k=h27cfd23_0
  - mkl-service==2.4.0=py38h7f8727e_0
  - readline==8.1=h27cfd23_0
  - libuv==1.40.0=h7b6447c_0
  - libwebp-base==1.2.0=h27cfd23_0
  - intel-openmp==2021.3.0=h06a4308_3350
  - libtasn1==4.16.0=h27cfd23_0
  - mkl_random==1.2.2=py38h51133e4_0
  - libiconv==1.15=h63c8f33_5
  - ncurses==6.2=he6710b0_1
  - zstd==1.4.9=haebb681_0
  - ffmpeg==4.3=hf484d3e_0
  - freetype==2.10.4=h5ab3b9f_0
  - libgomp==9.3.0=h5101ec6_17
  - openjpeg==2.3.0=h05c96fa_1
  - pytorch==1.8.1=py3.8_cuda10.1_cudnn7.6.3_0
  - certifi==2021.5.30=py38h06a4308_0
  - tk==8.6.10=hbc83047_0
  - mkl==2021.3.0=h06a4308_520
  - lz4-c==1.9.3=h295c915_1
  - pip==21.0.1=py38h06a4308_0
  - pillow==8.3.1=py38h2c7a002_0
  - libtiff==4.2.0=h85742a9_0
  - numpy-base==1.20.3=py38h74d4b33_0
  - gmp==6.2.1=h2531618_2
  - gnutls==3.6.15=he1e5248_0
  - openh264==2.1.0=hd408876_0
  - nettle==3.7.3=hbbd107a_1
  - libunistring==0.9.10=h27cfd23_0
  - python==3.8.11=h12debd9_0_cpython
  - libffi==3.3=he6710b0_2
  - cudatoolkit==10.1.243=h6bb024c_0
  - libidn2==2.3.2=h7f8727e_0
  - zlib==1.2.11=h7b6447c_3
  - ca-certificates==2021.7.5=h06a4308_1
  - sqlite==3.36.0=hc218d9a_0
  - libpng==1.6.37=hbc83047_0
  - setuptools==52.0.0=py38h06a4308_0
  - libstdcxx-ng==9.3.0=hd4cf53a_17
  - bzip2==1.0.8=h7b6447c_0
  - jpeg==9b=h024ee3a_2
  - ld_impl_linux-64==2.35.1=h7274673_9
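
One possible workaround (an assumption on my part, not an official fix): these entries are pinned to exact Linux build strings, which conda cannot resolve on a different platform or channel state, so stripping the build suffix from environment.yml often lets the solver succeed.

import re

# A hedged sketch: drop the platform-specific build string from each pinned
# dependency (e.g. "lame==3.100=h7b6447c_0" -> "lame==3.100") and save a copy.
with open("environment.yml") as f:
    text = f.read()

text = re.sub(r"^(\s*-\s*[A-Za-z0-9_.\-]+=+[^=\s]+)=\S+", r"\1", text, flags=re.MULTILINE)

with open("environment_nobuilds.yml", "w") as f:
    f.write(text)
# Then try: conda env create -f environment_nobuilds.yml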

AMASS Dataset issue

I am confused about the AMASS dataset.
The amass_parser.py file expects the dataset BioMotionLab_NTroje, but there are three BML datasets on the official AMASS website: BMLhandball, BMLmovi, and BMLrub. I don't know which one to download.

Text-to-Motion issue

After running python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt, the following issue appears. I don't know how to solve it.
Restore weights..
Visualization of the epoch 100
Generate the videos..
Render the generated samples: 100%|████████████████████████████████████████████████████| 24/24 [00:09<00:00, 2.41it/s]
Load the generated samples: 0%| | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\anaconda\Anaconda3\envs\motionclip\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\anaconda\Anaconda3\envs\motionclip\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\text2motion.py", line 41, in
main()
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\text2motion.py", line 37, in main
viz_clip_text(model, grid, epoch, parameters, folder=folder)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 280, in viz_clip_text
frames = generate_by_video({}, {}, generation,
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 132, in generate_by_video
gener["frames"] = pool_job_with_desc(pool, iterator,
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 114, in pool_job_with_desc
array = np.stack([[load_anim(save_path_format.format(i, j), timesize)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 114, in
array = np.stack([[load_anim(save_path_format.format(i, j), timesize)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\visualize.py", line 114, in
array = np.stack([[load_anim(save_path_format.format(i, j), timesize)
File "D:\CodeProject\python\MotionCLIP-main\src\visualize\anim.py", line 44, in load_anim
data = np.array(imageio.mimread(path, memtest=False))[..., :3]
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\core\functions.py", line 354, in mimread
reader = read(uri, format, "I", **kwargs)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\core\functions.py", line 186, in get_reader
return format.get_reader(request)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\core\format.py", line 170, in get_reader
return self.Reader(self, request)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\core\format.py", line 221, in init
self._open(**self.request.kwargs.copy())
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\plugins\pillowmulti.py", line 60, in _open
return PillowFormat.Reader._open(self)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\plugins\pillow.py", line 138, in _open
as_gray=as_gray, is_gray=_palette_is_grayscale(self._im)
File "C:\anaconda\Anaconda3\envs\motionclip\lib\site-packages\imageio\plugins\pillow.py", line 689, in _palette_is_grayscale
palette = np.asarray(pil_image.getpalette()).reshape((256, 3))
ValueError: cannot reshape array of size 384 into shape (256,3)
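
A hedged workaround for the palette reshape error (my own suggestion, not part of the repo): read the GIF frames with Pillow directly and convert each frame to RGB, which sidesteps imageio's palette-grayscale check; the result can be used where load_anim expects the frame array.

import numpy as np
from PIL import Image, ImageSequence

def load_gif_rgb(path):
    """Read all frames of a GIF into an (N, H, W, 3) uint8 array via Pillow."""
    with Image.open(path) as im:
        return np.stack([np.asarray(f.convert("RGB")) for f in ImageSequence.Iterator(im)])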

Training for action recognition

In the paper, it is said that for BABEL action recognition the training was performed using the class labels rather than the raw text, but in the code I found self.clip_label_text = "text_raw_labels", and "text_raw_labels" seems to load the raw text for each frame rather than just the category. Can you help me understand this? Thanks!

results on HumanML3D dataset

Hi Guy Tevet,

I trust this message finds you well. I sincerely hope recent events, including any conflicts, haven't hindered your research efforts. Please accept my best wishes for you and everyone in your vicinity.

Upon checking the example text file you provided in this repo, I observed that the dataset's captions seem relatively simpler than those of HumanML3D, which has been used in your other studies.

With that in mind, I would be grateful if you could answer a couple of inquiries:

  1. Have you experimented with MotionCLIP on the HumanML3D dataset? If so, could you kindly share your experience? Even brief insights would be greatly appreciated, and an approximate description is totally fine.
  2. For the dataset used with MotionCLIP, did you try training a standalone decoder that takes CLIP features as input?

I thank you in advance for any insights you might offer.
Best regards.

Execution issues

On executing the code after the first release using the training command, i.e.

python -m src.train.train --clip_text_losses cosine --clip_image_losses cosine --pose_rep rot6d \
--lambda_vel 100 --lambda_rc 100 --lambda_rcxyz 100 \
--jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 \
--lr 0.0001 --glob --translation --no-vertstrans --latent_dim 512 --num_epochs 100 --snapshot 10 \
--device \
--datapath ./data/amass_db/amass_30fps_db.pt \
--folder ./exps/my-paper-model

I face two errors:
(i) train.py: error: the following arguments are required: --dataset
Fix: the training command needs to be edited to include the dataset parameter.
(ii) NameError: name 'batch_losses_dict' is not defined
Fix: lines 45 to 49 in trainer.py need to be commented out, as they seem to be unused.

Please correct me if I am wrong about the above fixes.

ffmpeg version

Hello, could you tell me which version of ffmpeg you used?

two extra loss terms: mmd and hessian_penalty

Hi,

thanks for sharing this project.

I saw that you have two loss terms (an MMD term and a Hessian-penalty term) that are not discussed in the paper (maybe I missed the detail; if so, could you point me to where it is covered?).

  1. Could you please clarify whether these two terms were used in your training implementation?
  2. If you did use them, could you briefly explain what they do? Thank you very much for your patience!

best regards

Visualize failed

When I test the text-to-motion part, the visualization step fails:

list index out of range
  File "/home/vatis/anaconda3/envs/zxz_mld_plt/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
  File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/visualize.py", line 111, in pool_job_with_desc
    for _ in pool.imap_unordered(plot_3d_motion_dico, iterator): 
  File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/visualize.py", line 139, in generate_by_video
    gener["frames"] = pool_job_with_desc(pool, iterator,
  File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/visualize.py", line 413, in viz_clip_edit
    frames = generate_by_video({}, {}, generation,
  File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/motion_editing.py", line 33, in main
    viz_clip_edit(model, datasets, edit_csv, epoch, parameters, folder=folder)
  File "/home/vatis/DataDisk_1/22_vatis_master/zhouxiangzhong/MotionCLIP-main/src/visualize/motion_editing.py", line 37, in <module>
    main()
  File "/home/vatis/anaconda3/envs/zxz_mld_plt/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/vatis/anaconda3/envs/zxz_mld_plt/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
IndexError: list index out of range

It seems that pool.imap_unordered(plot_3d_motion_dico, iterator) fails.
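
A small debugging suggestion (my own workaround, not from the repo): because multiprocessing re-raises the exception without its original context, running the same jobs serially usually surfaces the underlying IndexError and the offending item directly.

# Temporarily bypass the pool inside pool_job_with_desc (visualize.py) and run
# the plotting jobs in a plain loop so the real traceback is printed:
for args in iterator:
    plot_3d_motion_dico(args)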

AttributeError: 'AMASS' object has no attribute 'nfeats'

when running "python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt" i get a that error. i just change.

The problem is in src/datasets/dataset.py", line 371.

I can run the script by assigning values to those variables in lines 371 and 372, so it seems that the issue lies in the parameter passing.

Thanks for your time, I'm kind of a noob :)

Motions are static

Hi,

Are there any clues about why the reproduced motions are static? I believe I followed the instructions carefully. I also mentioned the issue here: #5 (comment)

Shunlin
