HARP: Personalized Hand Reconstruction from a Monocular RGB Video
I installed PyTorch3D with the command:
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py39_cu113_pyt1110/download.html
After running python optimize_sequence.py, I get the following output:
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
Loading model from: /home/asadali/anaconda3/envs/harp/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth
Training size: 300
Val size: 300
/home/asadali/anaconda3/envs/harp/lib/python3.9/site-packages/pytorch3d/structures/meshes.py:1097: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
self._edges_packed = torch.stack([u // V, u % V], dim=1)
/home/asadali/anaconda3/envs/harp/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 20 worker processes in total. Our suggested max number of worker in current system is 8, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
0%| | 0/301 [00:02<?, ?it/s]
Traceback (most recent call last):
File "/home/asadali/Desktop/Hand Pose Estimation/harp/optimize_sequence.py", line 842, in <module>
main()
File "/home/asadali/Desktop/Hand Pose Estimation/harp/optimize_sequence.py", line 837, in main
optimize_hand_sequence(config_dict, mano_params, images_dataset, val_mano_params, val_images_dataset,
File "/home/asadali/Desktop/Hand Pose Estimation/harp/optimize_sequence.py", line 482, in optimize_hand_sequence
y_pred = render_image_with_RT(meshes, light_T, light_R, cam_T, cam_R,
File "/home/asadali/Desktop/Hand Pose Estimation/harp/utils/visualize.py", line 304, in render_image_with_RT
rendered_img = renderer(mesh,
File "/home/asadali/anaconda3/envs/harp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/asadali/Desktop/Hand Pose Estimation/harp/renderer/renderer_helper.py", line 359, in forward
batch_size = fragments_from_light[0].shape[0]
TypeError: 'Fragments' object is not subscriptable
I looked at the definition of the Fragments class in PyTorch3D, but that did not give me any clue. Any help?
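One possible explanation, offered as a guess rather than a confirmed diagnosis: the traceback shows renderer_helper.py indexing the rasterizer output by position (fragments_from_light[0]), which only works while Fragments behaves like a tuple. As far as I know, older PyTorch3D releases defined Fragments as a NamedTuple, while newer ones define it as a dataclass, and the wheel index used above may have pulled in a newer release than the code expects. The sketch below uses stand-in classes (TupleFragments and DataclassFragments are hypothetical, not PyTorch3D types) to show that attribute access works with both layouts, whereas positional indexing does not.

```python
from dataclasses import dataclass
from typing import Any, NamedTuple


class TupleFragments(NamedTuple):
    """Stand-in for a tuple-like rasterizer output (hypothetical)."""
    pix_to_face: Any
    zbuf: Any


@dataclass(frozen=True)
class DataclassFragments:
    """Stand-in for an attribute-only rasterizer output (hypothetical)."""
    pix_to_face: Any
    zbuf: Any


def get_pix_to_face(fragments: Any) -> Any:
    """Read the first rasterizer field by name, which both layouts support."""
    return fragments.pix_to_face


def supports_indexing(fragments: Any) -> bool:
    """Report whether fragments[0] works for this object."""
    try:
        fragments[0]
        return True
    except TypeError:
        return False


old_style = TupleFragments(pix_to_face=[[0, 1]], zbuf=[[0.5, 0.7]])
new_style = DataclassFragments(pix_to_face=[[0, 1]], zbuf=[[0.5, 0.7]])

for frag in (old_style, new_style):
    print(type(frag).__name__, supports_indexing(frag), get_pix_to_face(frag))
```

If this guess is right, replacing fragments_from_light[0] with fragments_from_light.pix_to_face in the repo, or pinning PyTorch3D to the release the repo was developed against, would be the two options to try.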
Hello,
thank you for making the code public.
It seems that some modules needed for visualization are missing from the renderer folder.
Best regards,
Thank you for open-sourcing your wonderful work.
I configured the setup for my custom videos. But before processing my own videos, I tried to test it on the given preprocessed_data to confirm that everything is configured correctly and working fine.
Data: I chose the "preprocessed_data/dim_light/3" sequence for initial testing, as I have ground truth available for it. I removed all folders except "unscreen", "unscreen_cropped" and "mask" before running inference:
./data/dim_light/3
|-- unscreen
|-- unscreen_cropped
|-- mask
I ran the following command after following all the given instructions in the README:
python ./metro/tools/end2end_inference_handmesh.py --resume_checkpoint ./models/metro_release/metro_hand_state_dict.bin --image_file_or_path data --do_crop
Terminal log looks like:
2023-12-22 13:21:05,927 METRO Inference INFO: Inference: Loading from checkpoint ./models/metro_release/metro_hand_state_dict.bin
2023-12-22 13:21:05,927 METRO Inference INFO: Update config parameter num_hidden_layers: 12 -> 4
2023-12-22 13:21:05,928 METRO Inference INFO: Update config parameter hidden_size: 768 -> 1024
2023-12-22 13:21:05,928 METRO Inference INFO: Update config parameter num_attention_heads: 12 -> 4
2023-12-22 13:21:05,928 METRO Inference INFO: Update config parameter intermediate_size: 3072 -> 4096
2023-12-22 13:21:07,769 METRO Inference INFO: Init model from scratch.
2023-12-22 13:21:07,769 METRO Inference INFO: Update config parameter num_hidden_layers: 12 -> 4
2023-12-22 13:21:07,769 METRO Inference INFO: Update config parameter hidden_size: 768 -> 256
2023-12-22 13:21:07,769 METRO Inference INFO: Update config parameter num_attention_heads: 12 -> 4
2023-12-22 13:21:07,769 METRO Inference INFO: Update config parameter intermediate_size: 3072 -> 1024
2023-12-22 13:21:08,027 METRO Inference INFO: Init model from scratch.
2023-12-22 13:21:08,028 METRO Inference INFO: Update config parameter num_hidden_layers: 12 -> 4
2023-12-22 13:21:08,028 METRO Inference INFO: Update config parameter hidden_size: 768 -> 64
2023-12-22 13:21:08,028 METRO Inference INFO: Update config parameter num_attention_heads: 12 -> 4
2023-12-22 13:21:08,028 METRO Inference INFO: Update config parameter intermediate_size: 3072 -> 256
2023-12-22 13:21:08,086 METRO Inference INFO: Init model from scratch.
=> loading pretrained model models/hrnet/hrnetv2_w64_imagenet_pretrained.pth
2023-12-22 13:21:10,246 METRO Inference INFO: => loading hrnet-v2-w64 model
2023-12-22 13:21:10,247 METRO Inference INFO: Transformers total parameters: 101182022
2023-12-22 13:21:10,252 METRO Inference INFO: Backbone total parameters: 128059944
2023-12-22 13:21:10,254 METRO Inference INFO: Loading state dict from checkpoint ./models/metro_release/metro_hand_state_dict.bin
2023-12-22 13:21:11,231 METRO Inference INFO: Run inference
- Skip data/dim_light/3/unscreen. Images already cropped
After coarse alignment: 284.209717
After fine alignment: 2.175960
save to data/dim_light/3/metro_image/0001_metro_pred.jpg
After coarse alignment: 277.326782
After fine alignment: 2.160460
save to data/dim_light/3/metro_image/0002_metro_pred.jpg
After coarse alignment: 279.830750
After fine alignment: 2.158962
save to data/dim_light/3/metro_image/0003_metro_pred.jpg
After coarse alignment: 271.266846
After fine alignment: 2.142965
This looks fine to me, as the error after fine alignment is low. However, the images in the "metro_image" and "mano_fit_image" folders are completely white.
Further, I checked the "metro_mano" folder: I printed mano["joints"] from "metro_mano/0001_mano.pkl" and compared it with the MANO file already provided in preprocessed_data. The two are almost equal:
# my prediction
new_pred/dim_light/3/metro_mano/0001_mano.pkl : [[[ 8.495154 17.06903 -9.26395 ]
[ -41.61213 -12.395572 8.721587 ]
[ -75.39449 -19.07165 12.3267145]
[ -99.15905 -35.592865 7.109699 ]
[-134.94511 -44.250084 6.7244143]
[ -55.110466 -82.35045 9.510782 ]
[ -67.26341 -114.763336 17.313774 ]
[ -75.669304 -136.09357 27.320322 ]
[ -85.184204 -161.39336 35.048496 ]
[ -33.910183 -98.4381 17.329912 ]
[ -39.628586 -131.31053 24.955 ]
[ -43.71412 -154.00847 37.94211 ]
[ -48.900585 -184.4718 41.905487 ]
[ -6.6895456 -92.741684 27.531963 ]
[ -14.318776 -123.943405 31.026226 ]
[ -15.8594885 -149.02838 39.92393 ]
[ -21.418255 -177.08452 40.96007 ]
[ 13.468429 -82.36368 37.21435 ]
[ 16.056091 -104.14632 42.349155 ]
[ 20.082771 -124.0588 49.40557 ]
[ 22.207354 -149.0515 47.382576 ]]]
# groundtruth already provided
../data/preprocessed_data/dim_light/3/metro_mano/0001_mano.pkl : [[[ 8.087268 16.605965 -9.308287 ]
[ -41.77881 -12.450466 8.813671 ]
[ -75.50443 -18.959173 12.084147 ]
[ -99.14457 -35.56908 7.2946463]
[-134.7546 -44.14023 6.619013 ]
[ -55.1098 -82.33965 9.5261545]
[ -67.22784 -114.70128 17.272465 ]
[ -75.6818 -135.92662 27.256243 ]
[ -85.07761 -161.1897 34.914143 ]
[ -33.834896 -98.423485 17.284163 ]
[ -39.522892 -131.12558 25.191135 ]
[ -43.885292 -153.96214 37.649086 ]
[ -48.765457 -184.25931 41.956997 ]
[ -6.548176 -92.76689 27.53105 ]
[ -14.17833 -123.8556 31.163118 ]
[ -16.051815 -148.91757 39.79642 ]
[ -21.312515 -176.93044 40.97242 ]
[ 13.623191 -82.3927 37.285084 ]
[ 16.057844 -104.11501 42.403454 ]
[ 20.027308 -124.03084 49.31449 ]
[ 22.230108 -148.9662 47.409035 ]]]
Why are the output images blank when the MANO output looks correct?
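The two checks described above can be made quantitative. The following is a minimal sketch, assuming NumPy is available and that each pickle stores a "joints" array as printed above; the helper names are hypothetical and not part of the HARP code. It measures the largest per-coordinate joint difference and the fraction of near-white pixels in an image, which may help separate a fitting problem from a rendering problem.

```python
import numpy as np


def max_joint_difference(pred, ref) -> float:
    """Largest absolute per-coordinate difference between two joint arrays."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.abs(pred - ref).max())


def white_fraction(image, threshold: int = 250) -> float:
    """Fraction of pixels whose channels are all >= threshold (8-bit image)."""
    img = np.asarray(image)
    if img.ndim == 2:          # grayscale: add a channel axis
        img = img[..., None]
    return float((img >= threshold).all(axis=-1).mean())


# First two joints from the listings above (my prediction vs. provided file).
pred = [[8.495154, 17.06903, -9.26395], [-41.61213, -12.395572, 8.721587]]
ref = [[8.087268, 16.605965, -9.308287], [-41.77881, -12.450466, 8.813671]]
print(max_joint_difference(pred, ref))

# A synthetic all-white 4x4 RGB image stands in for a saved *_metro_pred.jpg.
blank = np.full((4, 4, 3), 255, dtype=np.uint8)
print(white_fraction(blank))
```

With real data, the arrays would come from pickle.load on the two 0001_mano.pkl files and from an image reader of your choice. A white fraction close to 1.0 together with a small joint difference would point at the rendering step rather than at the fit.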
First of all, thanks for the great work and code.
Regarding Table 2 (Quantitative evaluation of the appearance reconstruction task on the train split of our captured sequences), I am wondering about the specific setting of the train split. There are 9 sequences in preprocessed_data/subject_1/.
Which sequences exactly are used for the Table 2 experiment? And does the experiment use the same sequences for training and evaluation?
Thanks for your help!
Hi, may I ask how to optimize the NIMBLE parameters to fit NIMBLE meshes to MANO meshes?
Following your advice in a previous issue, I used the method you provided to fit NIMBLE meshes to MANO meshes, but unfortunately it does not converge. I encountered a few problems.
The first problem is that the NIMBLE layer used in your method seems to differ from the one in the original NIMBLE repo. The layer provided in the original NIMBLE repo does not have the cur_rot, cur_trans, global_scale and no_tex params. Could you kindly provide your version of the NIMBLE layer?
The second problem is that there seem to be no regularization terms on the pose and shape parameters during the coarse and fine alignment stages. I ran the method, but the generated hand mesh is distorted, and I suspect the missing regularization is the reason. Did your optimization work well without regularization terms?
I tried adding a regularization term, but the weighting factor is difficult to tune so that the NIMBLE meshes fit the MANO meshes. It would be great if you could provide some suggestions for the optimization.
Thank you!
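The kind of regularized objective discussed above can be sketched as follows. This is plain Python for illustration only and makes several assumptions: the data term is a mean squared vertex distance, the penalties are simple L2 norms, and the weights w_pose and w_shape are hypothetical placeholders, not values from HARP or NIMBLE.

```python
from typing import Sequence


def squared_norm(values: Sequence[float]) -> float:
    """Sum of squared entries."""
    return sum(v * v for v in values)


def fitting_loss(pred_verts, target_verts, pose, shape,
                 w_pose: float = 1e-3, w_shape: float = 1e-2) -> float:
    """Mean squared vertex distance plus L2 penalties on pose and shape."""
    data_term = sum(
        squared_norm([p - t for p, t in zip(pv, tv)])
        for pv, tv in zip(pred_verts, target_verts)
    ) / len(target_verts)
    return data_term + w_pose * squared_norm(pose) + w_shape * squared_norm(shape)


# Toy example: two vertices, one of them off by 1 along z.
pred_verts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
target_verts = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
pose = [1.0, 2.0]
shape = [3.0]

print(fitting_loss(pred_verts, target_verts, pose, shape, w_pose=0.0, w_shape=0.0))
print(fitting_loss(pred_verts, target_verts, pose, shape, w_pose=0.1, w_shape=0.01))
```

Comparing the magnitude of the data term with each weighted penalty at the start of optimization is one way to choose weights on a sensible scale, since a penalty that dwarfs the data term will keep the mesh near the mean hand and one that is negligible will not prevent distortion.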
Hello, thanks for your awesome work!
I've been wondering whether it is possible to render the reconstructed hand back at the location of an SMPL-X mesh, i.e. to replace the hand of an SMPL-X body. If so, can you give some advice on how to implement it? I'm a newbie to computer graphics.
Hello, I tried running the optimization using the MANO hand template instead of the arm template. For this, I mainly set use_arm to False in config_utils.py. I ran the optimization on the provided preprocessed sequences and found some issues, mostly related to visualization.
Here are the evaluation results I got from running on the sequences:
Silhouette IoU: 0.21830
L1: 0.03951
LPIPS: 0.22118
MS_SSIM: 0.74872
Hello, thanks for the work.
I tried running METRO with the modifications from this repo. The coarse and fine alignments worked, but at the smoothing stage I get the following error:
--- Sequence Smoothing ---
--- Sequence Smoothing ---
samples/hand/sample/metro_mano
dict_keys(['joints', 'verts', 'rot', 'pose', 'shape', 'trans', 'cam', 'joints_metro', 'verts_metro'])
Traceback (most recent call last):
File "./metro/tools/end2end_inference_handmesh.py", line 544, in <module>
main(args)
File "./metro/tools/end2end_inference_handmesh.py", line 540, in main
run_inference(args, image_list, _metro_network, mano_model, renderer, mesh_sampler)
File "./metro/tools/end2end_inference_handmesh.py", line 313, in run_inference
smooth_sequence(renderer, mano_arm_layer, mano_fit_output_dir, mano_fit_smooth_dir, use_smplx_arm=True, img_res=RESOLUTION)
File "/test_container/metro/hand_utils/hand_utils.py", line 706, in smooth_sequence
params_out = optimize_smooth_seq(params, mano_layer, use_smplx_arm=use_smplx_arm, img_res=img_res)
File "/test_container/metro/hand_utils/hand_utils.py", line 588, in optimize_smooth_seq
betas=params['shape'], global_orient=params['rot'], transl=params['trans'], right_hand_pose=params['pose'], return_type='mano')
File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/test_container/hand_models/smplx/smplx/body_models.py", line 2367, in forward
lmk_bary_coords)
File "/test_container/hand_models/smplx/smplx/lbs.py", line 152, in vertices2landmarks
landmarks = torch.einsum('blfi,blf->bli', [lmk_vertices, lmk_bary_coords])
File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/functional.py", line 241, in einsum
return torch._C._VariableFunctions.einsum(equation, operands)
When running without the fit_arm option, I get the following error instead:
2023-07-20 21:24:46,013 METRO Inference INFO: Run inference
After coarse alignment: 218.062958
After fine alignment: 2.566075
save to samples/hand/sample/metro_image/0001_metro_pred.jpg
After coarse alignment: 209.102310
After fine alignment: 2.593198
save to samples/hand/sample/metro_image/0002_metro_pred.jpg
--- Sequence Smoothing ---
--- Sequence Smoothing ---
samples/hand/sample/metro_mano
dict_keys(['joints', 'verts', 'rot', 'pose', 'shape', 'trans', 'cam', 'joints_metro', 'verts_metro'])
Traceback (most recent call last):
File "./metro/tools/end2end_inference_handmesh.py", line 544, in <module>
main(args)
File "./metro/tools/end2end_inference_handmesh.py", line 540, in main
run_inference(args, image_list, _metro_network, mano_model, renderer, mesh_sampler)
File "./metro/tools/end2end_inference_handmesh.py", line 313, in run_inference
smooth_sequence(renderer, mano_arm_layer, mano_fit_output_dir, mano_fit_smooth_dir, use_smplx_arm=False, img_res=RESOLUTION)
File "/test_container/metro/hand_utils/hand_utils.py", line 706, in smooth_sequence
params_out = optimize_smooth_seq(params, mano_layer, use_smplx_arm=use_smplx_arm, img_res=img_res)
File "/test_container/metro/hand_utils/hand_utils.py", line 590, in optimize_smooth_seq
hand_verts, hand_joints = mano_layer(torch.cat((params['rot'], params['pose']), 1), params['shape'], params['trans'])
File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/test_container/hand_models/smplx/smplx/body_models.py", line 2304, in forward
full_pose = torch.cat([global_orient.reshape(-1, 1, 3),
RuntimeError: shape '[-1, 1, 3]' is invalid for input of size 50
Do you have any idea what could be causing this?
Thanks
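A note on the second error: it says that a tensor with 50 elements was reshaped to [-1, 1, 3], and 50 is not a multiple of 3, so whatever reached global_orient cannot be an axis-angle rotation. One unverified possibility is that a 10-dimensional shape tensor ended up in that argument, for example through positional argument order. A minimal sketch of a shape check that could be run on the loaded parameters before smoothing is shown below; the function name and the expected widths (3 for rot and trans, 10 for shape, the usual MANO sizes) are assumptions and are not taken from the HARP code.

```python
from typing import Dict, List, Tuple

# Expected size of the last dimension for selected keys (assumed MANO sizes).
EXPECTED_WIDTH = {"rot": 3, "trans": 3, "shape": 10}


def find_shape_problems(shapes: Dict[str, Tuple[int, ...]]) -> List[str]:
    """Return a description for every key whose last dimension looks wrong."""
    problems = []
    for key, width in EXPECTED_WIDTH.items():
        if key not in shapes:
            problems.append(f"{key}: missing")
        elif len(shapes[key]) == 0 or shapes[key][-1] != width:
            problems.append(f"{key}: got {shapes[key]}, expected last dim {width}")
    return problems


# Shapes as they might look for a 5-frame sequence (illustrative values).
good = {"rot": (5, 3), "trans": (5, 3), "shape": (5, 10)}
swapped = {"rot": (5, 10), "trans": (5, 3), "shape": (5, 3)}

print(find_shape_problems(good))
print(find_shape_problems(swapped))
```

With tensors, the input would be built as {k: tuple(v.shape) for k, v in params.items()}. If the stored shapes look right, the next thing to compare is the order of arguments at the call site against the signature of the layer that is actually being called.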
Hi, thank you for sharing such good work! It has given me valuable insights.
May I ask if the hand appearance dataset and synthetic dataset will be made publicly available? Or if it would be possible to make the code for generating the synthetic dataset using Blender and Nimble available to the public?
Thank you very much!
Dear Authors,
Thanks for open-sourcing this hand reconstruction work. I ran into a problem when installing the required smplx package. I could not find the hand_models folder; should I create a new one? I also downloaded smplx but did not find ./hand_models/smplx/smplx/body_models.py and ./hand_models/smplx/smplx/__init__.py in the model folder.
Sorry, I am very new to this topic. If you could provide some guidance, I would appreciate it very much. Thank you!
Hello,
There were some issues when setting up the environment for the repo. First, the requirements.txt file mentions that the pytorch3d version used is 0.5.0 but in renderer_helper there is an import of ShaderBase from pytorch3d.renderer.mesh.shader which was introduced in version 0.6.2 if I'm not mistaken.
Anyway, here are the steps I used to create the environment, they might be useful:
conda create -n harp python=3.9 && conda activate harp
conda install pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d=0.6.2 -c pytorch3d
Then save the following as requirements_reduced.txt and install it with pip install -r requirements_reduced.txt:
tensorboard
lpips
pytorch_msssim
chumpy
scikit-learn
scipy
matplotlib
scikit-image
imageio
plotly
opencv-python
After installing chumpy, comment out the line "from numpy import bool, int, float, complex, object, unicode, str, nan, inf" in /home/xx/miniconda3/envs/harp/lib/python3.9/site-packages/chumpy/__init__.py.
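To confirm that the resulting environment really has a PyTorch3D release new enough for the ShaderBase import, a small check along the following lines may help. It is a sketch only: it relies on the standard-library importlib.metadata module, the helper names are hypothetical, and the 0.6.2 minimum is the version mentioned above rather than a requirement confirmed by the authors.

```python
from importlib import metadata
from itertools import takewhile
from typing import Callable, Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.6.2' or '1.11.0+cu113' into a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_minimums(minimums: Dict[str, str],
                   lookup: Callable[[str], str] = metadata.version) -> List[str]:
    """Return one message per package that is missing or older than required."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if parse_version(installed) < parse_version(minimum):
            problems.append(f"{name}: {installed} < {minimum}")
    return problems


# An empty list means every listed package meets its minimum version.
print(check_minimums({"pytorch3d": "0.6.2"}))
```

The lookup parameter exists only so the check can be exercised without the packages installed; in normal use the default, metadata.version, reads the versions from the active environment.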