zc-alexfan / arctic
[CVPR 2023] Official repository for downloading, processing, visualizing, and training models on the ARCTIC dataset.
Home Page: https://arctic.is.tue.mpg.de
License: Other
Hi! Thanks for releasing the code and dataset. I ran into a problem during data processing when using the command python scripts_data/process_seqs.py --export_verts. The error message is as follows:
Traceback (most recent call last):
  File "scripts_data/process_seqs.py", line 62, in main
    process_seq(task, export_verts=args.export_verts)
  File "./src/arctic/processing.py", line 448, in process_seq
    export_verts,
  File "./src/arctic/processing.py", line 90, in process_batch
    out_world = forward_gt_world(batch, layers, smplx_m)
  File "./src/arctic/processing.py", line 191, in forward_gt_world
    assert mano_l["joints.left"].shape[1] == 21
AssertionError
So how can I fix it? Thanks for your help!!
Thanks for the nice work!
I really enjoyed reading your paper and baseline code.
After going through your code, I have a few questions.
Before the cropped images go through the encoder (ResNet-50), they are resized from 1000x1000 to 224x224.
I found that this happens at this line:
arctic/src/datasets/arctic_dataset.py
Line 168 in 7e3f736
Could you explain how the bbox is determined? From the description below, I understand what the bbox is, but I am curious how you determined the bbox values. If I try to use another dataset, I think I would have to determine the bbox variable myself (see the sketch after this list).
From 'data_doc.md',
data_dict - s05/box_use_01 - bbox: Bounding box (squared) on image for network input; three-dimensional cx, cy, scale where (cx, cy) is the bbox center; scale*200 is the bounding box width and height; frames x views x 3
In scripts_data/crop_images.py, the process_fname function is used for cropping.
arctic/scripts_data/crop_images.py
Line 35 in 7e3f736
4.
[Solved] I misread the code and asked a dumb question.
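For reference, here is a minimal sketch of how a (cx, cy, scale) bbox maps to a pixel crop, assuming the convention quoted from data_doc.md above (scale*200 is the square side length); the keypoint-based heuristic for a new dataset is purely my own assumption, not ARCTIC's procedure, and the official cropping lives in scripts_data/crop_images.py:

```python
import numpy as np

def bbox_to_corners(cx: float, cy: float, scale: float):
    # (cx, cy) is the bbox center in pixels; scale*200 is the square side length.
    half = scale * 200 / 2.0
    return cx - half, cy - half, cx + half, cy + half  # x0, y0, x1, y1 in pixels

# For another dataset, one simple (assumed, not official) way to build such a
# bbox is a square around the 2D hand/object keypoints with a margin:
def bbox_from_keypoints(kp2d: np.ndarray, margin: float = 1.5):
    cx, cy = kp2d.mean(axis=0)
    side = margin * (kp2d.max(axis=0) - kp2d.min(axis=0)).max()
    return cx, cy, side / 200.0  # scale chosen so that scale*200 equals the side length
```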
Hi, I have a question about MANO parameters and SMPLX parameters.
The MANO right-hand parameters at frame 0 (box_use_01.mano.npy) are:
array([-0.03240527, 0.27357996, -0.44066623, -0.15148902, -0.19981687,
-0.6955209 , 6.558325 , -0.38321212, 0.8501508 , 0.45173895,
0.18395485, -0.39749008, 0.06304266, -0.11010664, -0.6733147 ,
5.1536746 , 0.26592368, 2.5213878 , 0.4855672 , -0.2268661 ,
0.24746093, 0.26659805, -0.05636495, -0.563865 , 3.304417 ,
0.31140158, 4.6122 , 0.30845642, -0.16647121, -0.4377692 ,
0.24876864, -0.02071797, -0.6238626 , 5.9167466 , 0.02213376,
1.8504663 , -0.9403614 , 0.6202517 , -0.63148606, -0.0362047 ,
-0.23572953, 0.97068465, -0.40886796, 0.5357393 , -1.2295284 ],
dtype=float32)
While the SMPL-X right-hand pose at frame 0 (box_use_01.smplx.npy) is:
array([ 0.0094533 , 0.16874607, -0.08016084, -0.05645293, -0.02849651,
0.07896784, 0.1604375 , -0.05796457, 0.13873944, 0.1402729 ,
0.07300756, 0.09562486, -0.11038493, -0.06354734, 0.1671049 ,
-0.03375542, 0.00310679, -0.29383537, -0.13252321, -0.0102099 ,
0.6954388 , 0.17532377, 0.09140654, 0.19590265, -0.40491733,
-0.15230231, -0.5940853 , 0.05956102, -0.04324804, 0.25968423,
-0.08441725, -0.03401042, 0.04836456, -0.06257863, 0.06051512,
-0.14253886, -0.13492593, 0.1548856 , -0.38249287, -0.41815382,
0.07842843, 0.58350015, -0.06046534, 0.25589123, -0.72547275],
dtype=float32)
It seems that they are not equal. I thought they described the same hand pose; what are the differences between them?
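One thing worth checking (purely a guess on my part) is whether the two files differ only by the MANO flat-hand-mean convention: MANO layers built with flat_hand_mean=False add a 45-dim mean pose to the stored parameters, while SMPL-X hand poses are often stored relative to that mean. A minimal sketch using the smplx library; the model path and the exact convention ARCTIC uses are assumptions:

```python
import numpy as np
import smplx  # https://github.com/vchoutas/smplx

# Build a right-hand MANO layer just to read out the mean hand pose.
mano = smplx.create(
    model_path="./data/body_models",  # assumed location of the MANO model files
    model_type="mano",
    is_rhand=True,
    use_pca=False,
    flat_hand_mean=False,
)
hand_mean = mano.hand_mean.numpy()    # 45-dim axis-angle mean hand pose

def residual_after_mean(mano_pose, smplx_pose, hand_mean):
    """How far apart the two 45-dim poses are once the MANO mean pose is removed."""
    return np.abs((np.asarray(mano_pose) - hand_mean) - np.asarray(smplx_pose)).max()

# Usage: pass the two 45-dim arrays quoted above; a residual near zero would
# mean the difference is only the flat-hand-mean convention.
```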
Hi Alex,
I just tried to use the differentiable MANO layer here: https://github.com/hassony2/manopth/blob/master/manopth/manolayer.py to reproduce the hand mesh with the provided beta and theta parameters. For some hand samples, the SMPL-X and manopth meshes align with each other, but for others the result doesn't really match, as shown below.
Do you have any guess about the cause, or have you tried their MANO layer before? Since the GT is axis-angle, I don't think it's because of passing the wrong parameters to the MANO layer.
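In case it helps to rule out configuration issues, a minimal manopth setup that avoids the library's PCA defaults looks like the sketch below; whether these flags (especially flat_hand_mean) match ARCTIC's convention is an assumption to verify, not a confirmed fix:

```python
import torch
from manopth.manolayer import ManoLayer

mano_layer = ManoLayer(
    mano_root="mano/models",   # folder containing MANO_RIGHT.pkl
    side="right",
    use_pca=False,             # feed the full 45-dim axis-angle pose directly
    ncomps=45,
    flat_hand_mean=False,      # toggle this if the meshes are consistently offset
)

pose = torch.zeros(1, 48)      # 3 global rotation + 45 hand pose (axis-angle)
betas = torch.zeros(1, 10)
verts, joints = mano_layer(pose, betas)  # manopth returns values in millimeters
```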
Is it possible to know which hand joints are occluded in each image in the ground truth training data?
Hi,
Are the hand pose parameters and camera parameters (extrinsic and intrinsic) available for download?
What is the purpose of toggle_parameters
here?
arctic/src/models/arctic_lstm/wrapper.py
Line 45 in 0847170
If we want to use the ARCTIC dataset like GRAB for full-body hand-object interaction, is there any method to recover the full-body motion for ARCTIC? Thanks.
Hi, great work! I was wondering if there was a script to visualize the contact heatmaps as you do in the videos.
In the path "/unpack/arctic_data/data/raw_seqs", the folder s03 is missing.
Working on the MoCap data, I noticed some labels are not the same for different subjects. An example of this is:
The markers on the base of the table correspond to [M_1, M_2, M_5, M_6]
for subjects s01 through s04 and [M_5, M_6, M_7, M_8]
for subjects s05 and s06.
Is there a part of the data that specifies which labels correspond to which markers?
Of course, this might be a problem with my visualization, in which case I'd appreciate it if you could confirm that the labels correspond to the same markers across the subjects.
Thanks!
Amin
I was trying to run the training, but I got a "file not found" error while it was looking for the file p1_train.npy. I searched the entire downloaded folder, but I don't have this file.
Hi! I've been trying to project the 3D keypoints to 2D on my own.
I've been using mano.j3d.full.r and mano.j2d.norm.r. I'm just trying to understand the full pseudocode involved in https://github.com/zc-alexfan/arctic/blob/7e3f736a95d7c461dbd93068de4367374c68a5ac/src/datasets/arctic_dataset.py.
It seems like the 3D is loaded (EDIT: never mind, I see now that they come from different sources...) and then:
There's also a mano.transl.r hidden in the dataset keys, but I've only gotten things to render roughly correctly by using the wrist position in mano.j3d.full.r.
Any help elaborating on what happens in step 3) would be really appreciated! My main issue is that I want a system that accurately recovers the 3D keypoints, but to check those against the 2D keypoints I need to be able to project the 3D keypoints to 2D (with or, if easier, without accounting for distortion).
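For what it's worth, the standard pinhole projection I've been using looks like the sketch below; which intrinsics matrix ARCTIC provides per view, and whether mano.j3d.full.r is already in the camera frame, are assumptions to verify against the data docs:

```python
import numpy as np

def project_to_2d(j3d_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project Nx3 camera-frame points to Nx2 pixel coordinates (no distortion)."""
    uvw = (K @ j3d_cam.T).T          # N x 3 homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # divide by depth

# Usage (names are assumptions): j2d = project_to_2d(j3d_cam, K)
```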
How can I generate and export the per-step hand-object binary contact information? I see that scripts_method/visualize.py has an option to visualize it from model predictions, but I'm trying to generate GT contact from the raw data.
Thanks in advance!
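In case a rough recipe is useful: a common way to derive binary contact (not necessarily ARCTIC's official definition) is a distance threshold between hand vertices and the object mesh; the 3 mm threshold, meter units, and variable names below are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def binary_contact(hand_verts: np.ndarray, obj_verts: np.ndarray, thresh_m: float = 0.003):
    """Boolean mask over hand vertices that lie within thresh_m of the object surface."""
    dists, _ = cKDTree(obj_verts).query(hand_verts)  # nearest object vertex per hand vertex
    return dists < thresh_m
```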
Thanks for your great work!
I have downloaded the code and data using the same splits as your baselines, following the instructions. Everything works fine until I start training ArcticNet-SF from scratch; my issues are as follows:
In your default configuration, you use a batch size of 64. However, I found that training takes a very long time (e.g., 7-8 hours per epoch) on a single RTX 3090. Is that normal?
So here comes the third question: in your paper, I didn't find any description of the training time or the hardware used for the baselines. Could you please clarify?
Thanks in advance!
Dear team, thanks for your great work! When I visualize the mask images, I find that the mask image size is 1600x900, but the cropped image size is 1000x1000, so the masks can't be matched to the cropped images pixel by pixel. How can they be exactly matched? I'd really appreciate any advice.
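A minimal sketch of one way the masks could be aligned, assuming the same square bbox (cx, cy, scale*200) used for the RGB crops is applied to the masks; if the masks are at a different resolution than the images the bbox was annotated on, the bbox would first need rescaling (an assumption to verify), and the official crop is in scripts_data/crop_images.py:

```python
import cv2
import numpy as np

def crop_mask_like_rgb(mask: np.ndarray, cx: float, cy: float, scale: float, out_size: int = 1000):
    half = scale * 200 / 2.0
    x0, y0 = int(round(cx - half)), int(round(cy - half))
    x1, y1 = int(round(cx + half)), int(round(cy + half))
    # Pad so the square crop never falls outside the mask.
    pad = max(0, -x0, -y0, x1 - mask.shape[1], y1 - mask.shape[0])
    padded = np.pad(mask, ((pad, pad), (pad, pad)), mode="constant")
    crop = padded[y0 + pad:y1 + pad, x0 + pad:x1 + pad]
    return cv2.resize(crop, (out_size, out_size), interpolation=cv2.INTER_NEAREST)
```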
Dear all,
Thanks for this amazing dataset. I have been working with it and need to find the positions of the markers, either in 3D (world coordinates) or 2D (image coordinates). I looked through the data documentation and found the 5 egocentric-camera marker positions, but not the other markers, e.g., the ones on the hands.
Exploring the code in src/utils/eval_modules.py, I found the joints_gt variable being used with dimensions Nx14x3, which I guess holds the 3D marker positions. If my guess is correct, these marker positions can be extracted using scripts_method/evaluate_metrics.py, which uses the models.
My question is: is there another way to extract the marker positions without the need to run the models?
Many thanks in advance!
Amin
Hi Alex,
Thanks for your work. I have a question regarding the setup for training ArcticNet-SF with multiple GPUs. I ran the provided command for training it following the CVPR setup, and it takes about 2 hours per epoch, so I would like to accelerate it.
However, simply changing the code within the trainer function:
arctic/scripts_method/train.py
Line 46 in fc6f7d7
Have you tried multi-GPU training? If so, could you please provide more suggestions for this?
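If it helps the discussion: since the training script builds a pytorch-lightning Trainer (referenced above), a DDP configuration would normally look like the sketch below. Whether the rest of the ARCTIC pipeline (batch-size scaling, logging, checkpointing) supports multi-GPU runs is an assumption the authors would need to confirm, and the flag names differ across pytorch-lightning versions (older releases use gpus=4 and accelerator='ddp').

```python
import pytorch_lightning as pl

# Hypothetical multi-GPU configuration; not the repository's official setup.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,            # number of GPUs on the node
    strategy="ddp",       # distributed data parallel
    max_epochs=100,
)
```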
When will it be released?
Hi,
I noticed that another dataset, GRAB, has a similar configuration to ARCTIC (without images). I wonder if you have tried to merge the two datasets. If I wanted to train a 3D-only grasp/motion model, I think I could fuse the two datasets together? Do you think there is any obstacle preventing this?
Are you planning to set up a pipeline for an object tracking task?
Moreover, do you think ARCTIC would be a good benchmark for object tracking (using a hand motion prior in some way)?
If so, how could I extract 2D object bounding boxes from the poses (see the sketch below)?
Thank you and great work!
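Regarding 2D object boxes from the poses, a simple sketch is to project the posed object vertices with the per-view intrinsics and take the extremes; the variable names (obj_verts_cam, K) are assumptions about what one loads from ARCTIC.

```python
import numpy as np

def bbox_from_verts(obj_verts_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """xyxy pixel bounding box of an object given its camera-frame vertices."""
    uvw = (K @ obj_verts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return np.concatenate([uv.min(axis=0), uv.max(axis=0)])  # [x0, y0, x1, y1]
```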
I found that three images are missing from my downloaded dataset:
I wonder whether they are missing from the dataset or whether some error occurred during my download.
The predicted weak-perspective camera is converted into a translation term, which is then used to compute the vertices and joints in the camera coordinate frame. However, during evaluation, all the 3D locations are shifted to the wrist joint before the metrics are computed. This effectively removes any translation error. Is there a reason why the metrics are not computed in the camera frame?
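For concreteness, the wrist-aligned error described above amounts to the following (the joint index used as the wrist root is an assumption):

```python
import numpy as np

def mpjpe_root_aligned(pred_j3d: np.ndarray, gt_j3d: np.ndarray, root_idx: int = 0) -> float:
    """Mean per-joint error after subtracting the wrist joint from both sets (N x J x 3)."""
    pred = pred_j3d - pred_j3d[:, root_idx:root_idx + 1]
    gt = gt_j3d - gt_j3d[:, root_idx:root_idx + 1]
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```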
@zc-alexfan
Thanks for making such a great dataset.
I am a student from China. I have tried your script for downloading the data, but it failed. The login step passed, but the download got stuck at the beginning. I have tried many VPNs to switch to an overseas network, but it still didn't work. Could you please check the download link?
Best,
Yu
For visualizing meshes by projecting them onto the image, shouldn't the intrinsics in
arctic/scripts_method/visualizer.py
Lines 85 to 88 in f91ca2b
Following #5, I noticed my training times are very different from yours. E.g., each epoch takes ~8 hours on the full data with twice your batch size (128) and the default parameters on an A100 GPU.
Thanks!
Amin
This issue should aggregate all discussion/emails related to leaderboard submission.
Thank you for your great work! I want to ask about the translation of the hand and the object. I want to get the object vertices for a given frame, so I first fetched the vertices from the mesh file and then tried to apply the appropriate transform. The rotation seems correct using obj_rot_cam, but there seems to be some issue with obj_trans, as shown in the following figure of the reprojection result (using the intrinsics). I wonder if I should apply some other transformation to the translation, or multiply it by a scalar? Thanks.
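One pitfall worth ruling out (an assumption on my part, not a confirmed diagnosis) is a unit mismatch: if the raw obj_trans were stored in millimeters while the mesh vertices and camera are in meters, the reprojection would look like a large translation offset. A minimal sketch of the check:

```python
import numpy as np

def posed_obj_verts(verts_canonical: np.ndarray, R_cam: np.ndarray, obj_trans: np.ndarray,
                    trans_in_mm: bool = True) -> np.ndarray:
    """Apply rotation then translation; R_cam is a 3x3 matrix (convert from
    axis-angle first if obj_rot_cam is stored that way)."""
    t = obj_trans / 1000.0 if trans_in_mm else obj_trans
    return verts_canonical @ R_cam.T + t
```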
Can you provide a simple outline of how to run inference with ArcticNet-SF/LSTM on custom data? I didn't see it in the docs, but I might have missed it!
I'm assuming I might want to modify extract_predicts.py.
I want to use the MANO files, but I noticed a mismatch in the number of frames:
For example, within this particular folder there are 733 images.
However, the MANO file (unpack/arctic_data/data/raw_seqs/s01/box_grab_01.mano.npy) contains only 732 entries.
Since the MANO file does not indicate which image each row maps to, instead of discarding the one image whose MANO parameters are missing, I would have to discard all the images in this folder.
I wanted to confirm whether this is the case or whether there was an error when downloading the files.
Thank You!
This happens when the downloading process is killed due to running out of RAM. You can increase the swap space following the instructions here.
Exception thrown when running the following command:
python -m scripts_data.process_seqs --mano_p ./data/arctic_data/data/raw_seqs/s01/capsulemachine_use_01.mano.npy --export_verts
I am on commit 20f5b6b464328621bf167b882f43ec14b127e7b4
More details later...
Hey!
Thanks for this amazing work. I wanted to use the pre-cropped images; however, I can't find the modified intrinsics after cropping. Is there a way to recover them?
Thanks.
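In case it's useful while waiting for an answer: the standard intrinsics update for a crop-and-resize is to shift the principal point by the crop origin and scale everything by the resize factor. The sketch below assumes the (cx, cy, scale*200) bbox convention from data_doc.md and a 1000x1000 output; it is not taken from the repository.

```python
import numpy as np

def crop_intrinsics(K: np.ndarray, cx: float, cy: float, scale: float, out_size: int = 1000):
    half = scale * 200 / 2.0
    x0, y0 = cx - half, cy - half       # top-left corner of the square crop
    s = out_size / (scale * 200)        # resize factor from crop to output
    K_new = K.copy()
    K_new[0, 0] *= s                    # fx
    K_new[1, 1] *= s                    # fy
    K_new[0, 2] = (K[0, 2] - x0) * s    # principal point x
    K_new[1, 2] = (K[1, 2] - y0) * s    # principal point y
    return K_new
```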
The current code uses ground truth intrinsics for egocentric model during evaluation. Is that also allowed for the HANDS challenge?
I want to label joint visibility. In a previous response, it was suggested to use segmentation masks, so I'm currently modifying the 'python scripts_data/visualizer.py --object --smplx --headless' script. During this process, I'm having difficulty obtaining segmentation mask images that match the cropped images. Could I get some advice on this part?
Shouldn't gt_transl and T0 be the same?
If the global orientations of both the object and the hand are with respect to the camera coordinate space, then the object canonical space and the MANO canonical space differ from the camera coordinate space by only a translation component.
Also, shouldn't targets["mano.j3d.full.r"] and targets["mano.j3d.cam.r"] be the same, since both are in the camera coordinate space?
Hello, thank you for your work; it's very interesting.
I would like to clarify whether it works on Windows or only on Linux.
I have Windows, but I satisfy all the general requirements from docs/setup.md.
I wanted to test your code and followed your instructions, but I had many errors with packages. I was able to fix some of them, but one is still unsolved: "ModuleNotFoundError: No module named 'common.***'". Now all remaining errors are related to the common module only.
Thank you for your reply.
Thanks for this great dataset!
I was looking at the data format docs for SMPL-X here and noticed that there is no shape parameter included in the *.smplx.npy files.
To my understanding, the shape is required to reconstruct the joints/vertices.
Am I missing something?
Dear team,
Thank you for your great work! I wonder whether ARCTIC provides background images for all third-person-view cameras. They would be very helpful for segmenting foreground scenes. Thank you!