arctic's Introduction

ARCTIC 🥶: A Dataset for Dexterous Bimanual Hand-Object Manipulation

Image

[ Project Page ] [ Paper ] [ Video ] [ Register ARCTIC Account ] [ ICCV Competition ] [ Leaderboard ]

Image

This repository contains code for preprocessing, splitting, visualizing, and rendering (RGB, depth, and segmentation masks) the ARCTIC dataset. It also provides code to reproduce the baseline models from our CVPR 2023 paper (Vancouver, British Columbia 🇨🇦) and to develop custom models.

Our dataset contains highly dexterous motion:

Image

News

✨CVPR 2024 Highlight: HOLD is the first method that jointly reconstructs articulated hands and objects from monocular videos without assuming a pre-scanned object template or 3D hand-object training data. See our project page for details.

HOLD Reconstruction Example

Reference for HOLD Reconstruction

  • 2023.12.20: MoCap data can now be downloaded! See the download instructions and visualization.
  • 2023.09.11: ARCTIC leaderboard online!
  • 2023.06.16: ICCV ARCTIC challenge starts!
  • 2023.05.04: The ARCTIC dataset, with code for dataloaders, visualizers, and models, is officially announced (version 1.0)!
  • 2023.03.25: ARCTIC ☃️ dataset (version 0.1) is available! 🎉

Invited talks/posters at CVPR 2023:

Why use ARCTIC?

Dataset summary:

  • It contains 2.1M high-resolution images paired with annotated frames, enabling large-scale machine learning.
  • Images are captured from 8 third-person views and 1 egocentric view (for the mixed-reality setting).
  • It includes 3D ground truth for SMPL-X, MANO, and articulated objects.
  • It is captured in a MoCap setup using 54 high-end Vicon cameras.
  • It features highly dexterous bimanual manipulation motion (beyond quasi-static grasping).

Potential tasks with ARCTIC:

Check out our project page for more details.

Projects that use ARCTIC

Reconstruction:

Generation:

Create a pull request for missing projects.

Features

Image

  • Instructions to download the ARCTIC dataset.
  • Scripts to process our dataset and to build data splits.
  • Rendering scripts to render our 3D data into RGB, depth, and segmentation masks.
  • A viewer to interact with our dataset.
  • Instructions to set up the data, code, and environment to train our baselines.
  • A generalized codebase to train, visualize, and evaluate the results of ArcticNet and InterField for the ARCTIC benchmark.
  • A viewer to interact with the predictions.

Getting started

Get a copy of the code:

git clone https://github.com/zc-alexfan/arctic.git

License

See LICENSE.

Citation

@inproceedings{fan2023arctic,
  title = {{ARCTIC}: A Dataset for Dexterous Bimanual Hand-Object Manipulation},
  author = {Fan, Zicong and Taheri, Omid and Tzionas, Dimitrios and Kocabas, Muhammed and Kaufmann, Manuel and Black, Michael J. and Hilliges, Otmar},
  booktitle = {Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2023}
}

Our project benefits greatly from aitviewer. If you find our viewer useful, please consider citing aitviewer to acknowledge their work:

@software{kaufmann_vechev_aitviewer_2022,
  author = {Kaufmann, Manuel and Vechev, Velko and Mylonopoulos, Dario},
  doi = {10.5281/zenodo.1234},
  month = {7},
  title = {{aitviewer}},
  url = {https://github.com/eth-ait/aitviewer},
  year = {2022}
}

Acknowledgments

Constructing the ARCTIC dataset is a huge effort. The authors deeply thank: Tsvetelina Alexiadis (TA) for trial coordination; Markus Höschle (MH), Senya Polikovsky, Matvey Safroshkin, Tobias Bauch (TB) for the capture setup; MH, TA and Galina Henz for data capture; Priyanka Patel for alignment; Giorgio Becherini and Nima Ghorbani for MoSh++; Leyre Sánchez Vinuela, Andres Camilo Mendoza Patino, Mustafa Alperen Ekinci for data cleaning; TB for Vicon support; MH and Jakob Reinhardt for object scanning; Taylor McConnell for Vicon support and data cleaning coordination; Benjamin Pellkofer for IT/web support; Neelay Shah, Jean-Claude Passy, Valkyrie Felso for the evaluation server. We also thank Adrian Spurr and Xu Chen for insightful discussions. OT and DT were supported by the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039B.

Contact

For technical questions, please create an issue. For other questions, please contact [email protected].

For commercial licensing, please contact [email protected].

Star History

Star History Chart

arctic's People

Contributors

ataboukhadra, kabouzeid, zc-alexfan


arctic's Issues

fuse arctic and grab datasets?

Hi,

I noticed that another dataset, GRAB, has a similar configuration to ARCTIC (without images). I wonder whether you have tried to merge the two datasets. If I wanted to train a 3D-only grasp/motion model, could I fuse the two datasets together? Do you see any obstacle preventing this?

SMPLX shape

Thanks for this great dataset!

I was looking at the data format docs for SMPLX here and noticed that there is no shape parameter included in the *.smplx.npy files.
To my understanding, the shape is required to reconstruct the joints/verts.

Am I missing something?

Occluded hand joints

Is it possible to know which hand joints are occluded in each image in the ground truth training data?

Do MANO parameters equal SMPL-X hand parameters?

Hi, I have a question about MANO parameters and SMPLX parameters.

The MANO right-hand parameters at frame 0 (box_use_01.mano.npy) are:
array([-0.03240527, 0.27357996, -0.44066623, -0.15148902, -0.19981687,
-0.6955209 , 6.558325 , -0.38321212, 0.8501508 , 0.45173895,
0.18395485, -0.39749008, 0.06304266, -0.11010664, -0.6733147 ,
5.1536746 , 0.26592368, 2.5213878 , 0.4855672 , -0.2268661 ,
0.24746093, 0.26659805, -0.05636495, -0.563865 , 3.304417 ,
0.31140158, 4.6122 , 0.30845642, -0.16647121, -0.4377692 ,
0.24876864, -0.02071797, -0.6238626 , 5.9167466 , 0.02213376,
1.8504663 , -0.9403614 , 0.6202517 , -0.63148606, -0.0362047 ,
-0.23572953, 0.97068465, -0.40886796, 0.5357393 , -1.2295284 ],
dtype=float32)

While the SMPL-X right-hand pose at frame 0 (box_use_01.smplx.npy) is:
array([ 0.0094533 , 0.16874607, -0.08016084, -0.05645293, -0.02849651,
0.07896784, 0.1604375 , -0.05796457, 0.13873944, 0.1402729 ,
0.07300756, 0.09562486, -0.11038493, -0.06354734, 0.1671049 ,
-0.03375542, 0.00310679, -0.29383537, -0.13252321, -0.0102099 ,
0.6954388 , 0.17532377, 0.09140654, 0.19590265, -0.40491733,
-0.15230231, -0.5940853 , 0.05956102, -0.04324804, 0.25968423,
-0.08441725, -0.03401042, 0.04836456, -0.06257863, 0.06051512,
-0.14253886, -0.13492593, 0.1548856 , -0.38249287, -0.41815382,
0.07842843, 0.58350015, -0.06046534, 0.25589123, -0.72547275],
dtype=float32)

It seems that they are not equal. I thought they described the same hand pose; what is the difference between them?
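
For reference, the two 45-dimensional vectors cannot be meaningfully compared element-wise. As a minimal sketch (assuming both vectors encode 15 per-joint axis-angle rotations, and ignoring any pose-mean offset that one of the parameterizations may apply), one could compare them as rotation matrices instead of raw coefficients:

import numpy as np
from scipy.spatial.transform import Rotation as R

def hand_pose_to_rotmats(pose_45):
    """Interpret a 45-dim hand pose as 15 per-joint axis-angle rotations.

    Assumption: both vectors are axis-angle. Magnitudes above 2*pi (e.g. the
    6.55 above) wrap around, so comparing raw coefficients is misleading.
    """
    return R.from_rotvec(pose_45.reshape(15, 3)).as_matrix()  # (15, 3, 3)

def per_joint_geodesic_deg(pose_a, pose_b):
    """Per-joint rotation difference (degrees) between two 45-dim poses."""
    Ra, Rb = hand_pose_to_rotmats(pose_a), hand_pose_to_rotmats(pose_b)
    rel = np.einsum("nij,nkj->nik", Ra, Rb)  # Ra @ Rb^T per joint
    cos = (np.trace(rel, axis1=1, axis2=2) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

If the per-joint differences remain large, the two parameterizations likely differ by more than numerical noise (for example, a hand-mean offset or a different kinematic convention), which the maintainers would need to confirm.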

Questions for the image resize details

Thanks for the nice work!
I really appreciate reading your paper and baseline code.
And after following your code, I got several questions to ask.

For the cropped images to go through the encoder (ResNet-50), the image size is changed from 1000x1000 to 224x224.
I found that this process happens at this line,

img = data_utils.rgb_processing(

In data_utils, the rgb_processing function does that work, but I couldn't understand what the function really does.
From the generate_patch_image function, I found that an affine transformation is applied, but I couldn't understand its purpose. Would you explain this in more detail?
Also, could you elaborate on why you apply an affine transformation instead of simply resizing? I thought resizing the cropped image to 224x224 would be enough.

Could you explain how the bbox is determined? From the description below, I can understand what the bbox is, but I am curious how you determined the bbox values. If I try to use another dataset, I think I will have to determine the bbox variable myself (a small sketch of the convention follows the quoted description below).
From 'data_doc.md',
data_dict - s05/box_use_01 - bbox: Bounding box (squared) on image for network input; three-dimensional cx, cy, scale where (cx, cy) is the bbox center; scale*200 is the bounding box width and height; frames x views x 3
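
For reference, here is a small sketch of that convention (not code from the repository; it simply restates the quoted description): converting an ARCTIC-style (cx, cy, scale) bbox to pixel corners.

import numpy as np

def bbox_to_corners(bbox):
    """Convert an ARCTIC-style bbox (cx, cy, scale) to (x_min, y_min, x_max, y_max).

    Per data_doc.md: (cx, cy) is the bbox center and scale * 200 is the
    width and height of the square bbox, in pixels.
    """
    cx, cy, scale = bbox
    half = scale * 200.0 / 2.0
    return np.array([cx - half, cy - half, cx + half, cy + half])

# Example: a bbox centered at (420, 300) with scale 1.5 spans a 300 px square.
print(bbox_to_corners([420.0, 300.0, 1.5]))  # [270. 150. 570. 450.]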

In scripts_data/crop_images.py, the process_fname function is used for cropping.

def process_fname(fname, bbox_loose, sid, view_idx, pbar):

Could you explain in detail why ego_image_scale == 0.3?

4. [Solved] I misread the code; this question no longer applies.

Background Images?

Dear team,

Thank you for your great work! I wonder if ARCTIC provides background images for all third-person view cameras. They would be very helpful for segmenting the foreground. Thank you!

Can't download data.

@zc-alexfan
Thanks for making such a great dataset.
I am a student from China. I have tried your script for downloading the data, but it failed. The login step passed, but the download got stuck at the beginning. I have tried many VPNs to switch to an overseas network, but it still didn't work. Could you please check the download link?
Best,
Yu

How to get from hand kp3d to kp2d?

Hi! I've been trying to project the 3D keypoints to 2D on my own.

I've been using the mano.j3d.full.r and mano.j2d.norm.r. I'm just trying to understand the full pseudocode involved in https://github.com/zc-alexfan/arctic/blob/7e3f736a95d7c461dbd93068de4367374c68a5ac/src/datasets/arctic_dataset.py.

It seems like the 3D keypoints are loaded (EDIT: never mind, I see now that they come from different sources...) and then:

  1. 2D drops Z and adds a homogeneous 1.0
  2. 2D is scaled down by 0.3 (args.ego_image_scale)
  3. A transformation matrix is built in j2d_processing that accounts for some center/scale (which I think is [420., 300.]?)
    Yet when looking at the intrinsics matrix, the center point (106.0, 110.0) seems more representative of the (224, 224) resolution?

There's also a mano.transl.r hidden in the dataset keys, but I've only gotten things to render roughly correctly by using the wrist position from mano.j3d.full.r.

Any help explaining what is happening in step 3 would be really appreciated! My main issue is that I want a system that accurately recovers the 3D keypoints, but to check those against the 2D keypoints I need to be able to project the 3D keypoints to 2D (with or without accounting for distortion, whichever is easiest).
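
For reference while this is clarified, the generic pinhole projection is sketched below. It is only a sketch: it ignores lens distortion and assumes the 3D joints are already in the camera coordinate frame and that K corresponds to the image the 2D keypoints refer to.

import numpy as np

def project_points(joints_cam, K):
    """Project (N, 3) camera-space points to (N, 2) pixel coordinates.

    Pinhole model, no distortion. Whether mano.j2d.norm.r is expressed in
    these pixel coordinates or in a normalized/cropped frame is exactly the
    open question above.
    """
    proj = joints_cam @ K.T          # (N, 3)
    return proj[:, :2] / proj[:, 2:3]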

Updated intrinsics for the cropped images?

Hey!
Thanks for this amazing work. I wanted to use the pre-cropped images; however, I can't find the modified intrinsics after the cropping. Is there a way to recover them?

Thanks.
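
In the meantime, the textbook relation between original and cropped/resized intrinsics may help. This is a sketch of standard camera geometry, not necessarily the exact convention used to produce the pre-cropped images: cropping shifts the principal point by the crop offset, and resizing scales fx, fy, cx, cy by the resize factor.

import numpy as np

def adjust_intrinsics(K, crop_x0, crop_y0, scale_x, scale_y):
    """Adjust a 3x3 intrinsics matrix for a crop followed by a resize.

    crop_x0, crop_y0: top-left corner of the crop in the original image.
    scale_x, scale_y: new_size / crop_size per axis.
    The exact bbox/scale convention ARCTIC used for its pre-cropped images
    would still need to be confirmed by the authors.
    """
    K_new = K.astype(float).copy()
    K_new[0, 2] -= crop_x0   # principal point shifts with the crop
    K_new[1, 2] -= crop_y0
    K_new[0, :] *= scale_x   # fx and cx scale with the resize
    K_new[1, :] *= scale_y   # fy and cy scale with the resize
    return K_new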

Joint visibility labeling

I want to label joint visibility. In a previous response, it was suggested to use segmentation masks, so I am currently modifying the 'python scripts_data/visualizer.py --object --smplx --headless' script. In this process, I am having difficulty obtaining segmentation mask images that match the cropped images. Could I get some advice on this part?
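
As a rough sketch of the mask-based approach (assuming you already have, per image, a segmentation mask aligned with that image and the projected 2D joints in the same pixel coordinates; the label values are placeholders):

import numpy as np

def joint_visibility(joints_2d, seg_mask, hand_label, radius=2):
    """Mark a joint visible if the mask near its 2D location contains the hand label.

    joints_2d: (J, 2) pixel coordinates in the same frame as seg_mask.
    seg_mask:  (H, W) integer label image (label values are assumptions).
    """
    H, W = seg_mask.shape
    visible = np.zeros(len(joints_2d), dtype=bool)
    for j, (u, v) in enumerate(np.round(joints_2d).astype(int)):
        u0, u1 = max(u - radius, 0), min(u + radius + 1, W)
        v0, v1 = max(v - radius, 0), min(v + radius + 1, H)
        if u0 < u1 and v0 < v1:
            visible[j] = (seg_mask[v0:v1, u0:u1] == hand_label).any()
    return visible

Note that this only tells you whether a joint falls on the hand silhouette; detecting self-occlusion within the hand would additionally require depth or per-part rendering.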

Some questions about training

Thanks for your great job!

I have downloaded the codes and data using the same splits as your baselines following the instructions. Everything works fine until I start training the ArcticNet-SF from scratch, and my issues are as follows:

  1. During training, I get several warnings saying that some images could not be loaded; I found that those images either do not exist or are corrupted. I have downloaded the dataset twice on different computers, and the problem persists. Here is a snapshot: image. I compared the generated checksum.json with the provided one and found that the only difference is that all paths with the prefix "/data/images_zips" are missing, but I think this is because I downloaded the cropped version instead of the full-resolution one.
  2. In your default configuration, you use a batch size of 64. However, I found that training takes a very long time (e.g., 7-8 hours per epoch) on a single RTX 3090. Is that normal?
  3. In your paper, I didn't find any description of the training time or the hardware you used for the baselines. Could you please clarify?

Thanks in advance!

Inference on Custom Data / Images / Video

Can you provide a simple outline for how you would run inference using ArcticNet-SF/LSTM on custom data? I didn't see it in the DOCS but I might have missed it!

I'm assuming I might want to modify extract_predicts.py

missing images?

I found that three images are missing from my downloaded dataset:

  • cropped_images/s01/laptop_use_04/7/00737.jpg
  • cropped_images/s01/espressomachine_grab_01/4/00443.jpg
  • cropped_images/s08/mixer_use_02/3/00295.jpg

I wonder if these are missing from the dataset, or if some error occurred during my download.

Problem regarding translation

Thank you for your great work! I want to ask about the translation of the hand and the object. I want to get the object vertices for a given frame, so I first fetched the vertices from the mesh file and then tried to apply the appropriate transform. The rotation seems correct using obj_rot_cam, but there seems to be some issue with obj_trans, as shown in the following reprojection result (using the intrinsics). I wonder if I should apply some other transformation to the translation, or multiply it by a scalar? Thanks.

image
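
For reference, the transform-then-project pipeline being attempted here looks roughly like the sketch below. Variable names follow the question; the representation of obj_rot_cam and the units of obj_trans are assumptions, and a unit mismatch (e.g., millimeters vs. meters) would indeed show up as a "missing scalar".

import numpy as np
from scipy.spatial.transform import Rotation as R

def object_verts_to_image(verts, obj_rot_cam, obj_trans, K):
    """Apply a camera-space rigid transform to object vertices and project them.

    verts:       (N, 3) canonical object vertices from the mesh file.
    obj_rot_cam: (3,) axis-angle rotation (assumed representation).
    obj_trans:   (3,) translation; if its units differ from the vertices,
                 a scale factor is needed (the suspected issue above).
    """
    Rm = R.from_rotvec(obj_rot_cam).as_matrix()
    verts_cam = verts @ Rm.T + obj_trans
    proj = verts_cam @ K.T
    return proj[:, :2] / proj[:, 2:3]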

Visualize Contact Areas

Hi, great work! I was wondering if there was a script to visualize the contact heatmaps as you do in the videos.

Hand-Object Tracking

Are you planning to set up a pipeline for an object tracking task?
Moreover, do you think ARCTIC would be a good benchmark for object tracking (using a hand motion prior in some way)?
If so, how could I extract 2D object bounding boxes from the poses?
Thank you, and great work!

Error in raw data processing

Hi! Thanks for releasing the code and dataset. I ran into a problem in data processing when using the command python scripts_data/process_seqs.py --export_verts. The error message is as follows:

Traceback (most recent call last):
  File "scripts_data/process_seqs.py", line 62, in main
    process_seq(task, export_verts=args.export_verts)
  File "./src/arctic/processing.py", line 448, in process_seq
    export_verts,
  File "./src/arctic/processing.py", line 90, in process_batch
    out_world = forward_gt_world(batch, layers, smplx_m)
  File "./src/arctic/processing.py", line 191, in forward_gt_world
    assert mano_l["joints.left"].shape[1] == 21
AssertionError

So how can I fix it? Thanks for your help!!

Hand joint ordering

Hi,

Does ARCTIC follow the same joint ordering as the figure below?
image

Could you tell me where I can find the information on the joint ordering for both the left and right hands, please?

Thank You.

gt_transl in process_data in process_arctic.py

Shouldn't gt_transl and T0 be the same?

If the global orientations of both the object and the hand are expressed with respect to the camera coordinate space, then the object canonical space and the MANO canonical space differ from the camera coordinate space by only a translation component.

Also, shouldn't targets["mano.j3d.full.r"] and targets["mano.j3d.cam.r"] be the same since both are in camera coord space?

Hand mesh mismatch with the manopth

Hi Alex,

I just tried to use the differentiable ManoLayer here: https://github.com/hassony2/manopth/blob/master/manopth/manolayer.py, to reproduce the hand mesh with the provided beta and theta parameters. For some hand samples, the smplx and manopth results align with each other, but for others the results don't really match, as shown below.
Screenshot from 2023-06-15 01-27-47

Do you have any guess about the cause, or have you tried their ManoLayer before? Since the GT is axis-angle, I don't think it's because of passing wrong parameters to the ManoLayer.
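
One setting that is often worth double-checking when comparing smplx and manopth outputs is the pose representation used by manopth's ManoLayer (PCA vs. full axis-angle, and the flat-hand-mean flag). A minimal configuration sketch follows; whether this is actually the cause of the mismatch here is not established, and the model path is a placeholder.

import torch
from manopth.manolayer import ManoLayer

# Take the full 45-dim axis-angle hand pose directly instead of PCA
# components. flat_hand_mean controls whether the stored pose is interpreted
# relative to a flat hand or to MANO's mean pose; a mismatch in this flag
# between the smplx and manopth configurations can produce visibly different
# meshes (whether that applies here is an assumption).
mano_layer = ManoLayer(
    mano_root="mano/models",  # path to MANO_RIGHT.pkl etc. (adjust locally)
    side="right",
    use_pca=False,
    flat_hand_mean=False,
)

global_orient = torch.zeros(1, 3)   # axis-angle root rotation
hand_pose = torch.zeros(1, 45)      # put the 45-dim GT pose here
betas = torch.zeros(1, 10)

pose_coeffs = torch.cat([global_orient, hand_pose], dim=1)  # (1, 48)
verts, joints = mano_layer(pose_coeffs, betas)
# Note: check the output units of the layer you use; some MANO layer
# implementations return millimeters rather than meters.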

ModuleNotFoundError: No module named 'common.***'

Hello, thank you for your work, it's very interesting.
I would like to clarify whether this works on Windows or only on Linux.
I have Windows, and I satisfy all the general requirements from docs/setup.md.
I wanted to test your code and followed your instructions, but I ran into many package errors. I was able to fix some of them, but one is still unsolved: "ModuleNotFoundError: No module named 'common.***'". All remaining errors are related to the common module.
Thank you for your reply.

Marker positions in 3D or 2D

Dear all,

Thanks for this amazing dataset. I have been working on it and need to find out the positions of the markers, either in 3D (world coordinate) or 2D (image coordinate). I looked through the data documentation and found 5 ego-centric camera marker positions, but not the other markers, e.g., the ones on the hands.

Exploring the code in src/utils/eval_modules.py, I found the joints_gt variable being used with dimensions Nx14x3, which I guess holds the 3D marker positions. If my guess is correct, these marker positions can be extracted using scripts_method/evaluate_metrics.py, which uses the models.

My question is: is there another way to extract the marker positions without having to run the models?

Many thanks in advance!
Amin

Training setup with multiple GPU

Hi Alex,

Thanks for your work. I have a question regarding training ArcticNet-SF with multiple GPUs. I ran the provided command for training it following the CVPR setup, and it takes about 2 hours per epoch, so I would like to accelerate it.

However, simply changing the code within the trainer function,

trainer = pl.Trainer(

to set the number of devices and the strategy caused many problems. I think this is because there are many .to(device) operations within the data pre-processing code, so it is hard to run the code directly in a multi-GPU setup.

Have you tried multi-GPU training? If so, could you please share some suggestions?

Evaluating the predicted weak perspective camera

The predicted weak-perspective camera is converted into a translation term, which is then used to compute the vertices and joints in the camera coordinate frame. However, during evaluation, all 3D locations are shifted to the wrist joint before the metrics are computed, which effectively discards any translation error. Is there a reason why the metrics are not computed in the camera frame?
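
For context, the usual HMR-style conversion from a predicted weak-perspective camera (s, tx, ty) to a full camera translation is sketched below; the focal length and crop resolution are placeholder values, and whether they match this codebase's exact constants is an assumption.

import numpy as np

def weak_perspective_to_translation(s, tx, ty, focal=1000.0, img_res=224.0):
    """Convert a weak-perspective camera (s, tx, ty) to a 3D translation.

    Standard HMR-style convention: depth is recovered as 2 * focal / (s * img_res).
    The default focal/img_res values are illustrative assumptions.
    """
    tz = 2.0 * focal / (s * img_res + 1e-9)
    return np.array([tx, ty, tz])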

Possible inconsistency in the MoCap labels?

Working on the MoCap data, I noticed some labels are not the same for different subjects. An example of this is:
The markers on the base of the table correspond to [M_1, M_2, M_5, M_6] for subjects s01 through s04 and [M_5, M_6, M_7, M_8] for subjects s05 and s06.

Is there a part of the data that specifies which labels correspond to which markers?
Of course, this might be a problem with my visualization, in which case I'd appreciate it if you could confirm that the labels correspond to the same markers across the subjects.

Thanks!
Amin

Intrinsics in visualizer

For visualizing meshes by projecting them onto the image, shouldn't the intrinsics in

focal = 1000.0
rows = 224
cols = 224
K = np.array([[focal, 0, rows / 2.0], [0, focal, cols / 2.0], [0, 0, 1]])

be taken from meta_info in the predictions rather than hard-coded here?

Mismatch between number of images and corresponding mano files

I want to use the mano files, but I noticed that there is a mismatch between the number of images and the number of mano frames:

Ex) Within this particular folder, there are 733 images.
image

But the mano file contains only 732 frames (unpack/arctic_data/data/raw_seqs/s01/box_grab_01.mano.npy):
image

Since the mano file does not indicate which image each row maps to, instead of discarding the one image whose mano parameters are missing, I would have to discard all the images in this folder.
I wanted to confirm whether this is the case, or whether there was an error when I downloaded the files.
Thank You!

NameError: name 'to_homo_batch' is not defined

Exception thrown when running the following command:

python -m scripts_data.process_seqs --mano_p ./data/arctic_data/data/raw_seqs/s01/capsulemachine_use_01.mano.npy --export_verts

I am on commit 20f5b6b464328621bf167b882f43ec14b127e7b4

More details later...
