zc-alexfan / arctic

[CVPR 2023] Official repository for downloading, processing, visualizing, and training models on the ARCTIC dataset.

Home Page: https://arctic.is.tue.mpg.de

License: Other

Makefile 0.13% Shell 0.79% Python 99.09%
3d-reconstruction animation artificial-intelligence augmented-reality awesome awesome-list computer-graphics computer-vision hand-object-interaction hand-tracking mano mixed-reality neural-networks pose-estimation pytorch smplx virtual-reality

arctic's Issues

Error in raw data processing

Hi! Thanks for releasing the code and dataset. I get a problem in data processing when using the command python scripts_data/process_seqs.py --export_verts. The error message is shown as follows:

Traceback (most recent call last):
  File "scripts_data/process_seqs.py", line 62, in main
    process_seq(task, export_verts=args.export_verts)
  File "./src/arctic/processing.py", line 448, in process_seq
    export_verts,
  File "./src/arctic/processing.py", line 90, in process_batch
    out_world = forward_gt_world(batch, layers, smplx_m)
  File "./src/arctic/processing.py", line 191, in forward_gt_world
    assert mano_l["joints.left"].shape[1] == 21
AssertionError

How can I fix this? Thanks for your help!

Questions about the image resize details

Thanks for the nice work!
I really enjoyed reading your paper and the baseline code.
After following your code, I have several questions.

Before the cropped images go through the encoder (ResNet-50), they are resized from 1000x1000 to 224x224.
I found that this happens at this line,

img = data_utils.rgb_processing(

In data_utils, the rgb_processing function does that work.
But I couldn't understand what the function actually does.
From the generate_patch_image function, I found that an affine transformation is applied, but I couldn't grasp its meaning. Would you explain this in more detail?
Also, could you elaborate on why you apply an affine transformation instead of resizing? I thought simply resizing the cropped image to 224x224 would be enough.
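For reference, a crop-plus-resize can be expressed as a single affine warp; below is a minimal sketch (not the repository's exact implementation, and the function name is mine) of mapping a square box centered at (cx, cy) to a 224x224 patch with OpenCV:

import cv2
import numpy as np

def crop_resize_affine(img, cx, cy, box_size, out_size=224, rot_deg=0.0):
    # Warp the square box centered at (cx, cy) with side box_size to an
    # out_size x out_size patch in one affine transform. Doing this as a
    # warp (instead of slicing + cv2.resize) lets rotation/translation
    # augmentation be folded into the same operation and handles boxes
    # that fall partially outside the image (padded with zeros).
    rad = np.deg2rad(rot_deg)
    right = np.array([np.cos(rad), np.sin(rad)]) * (box_size / 2.0)
    down = np.array([-np.sin(rad), np.cos(rad)]) * (box_size / 2.0)
    src = np.float32([[cx, cy],
                      [cx + right[0], cy + right[1]],
                      [cx + down[0], cy + down[1]]])
    dst = np.float32([[out_size / 2.0, out_size / 2.0],
                      [out_size, out_size / 2.0],
                      [out_size / 2.0, out_size]])
    M = cv2.getAffineTransform(src, dst)
    return cv2.warpAffine(img, M, (out_size, out_size), flags=cv2.INTER_LINEAR)

That is presumably why an affine transform is used instead of a plain resize: one matrix covers the crop, the rescale, and any augmentation jitter at once.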

Could you explain how the bbox is determined? From the description below I understand what the bbox is, but I'm curious how you determined its values. If I try to use another dataset, I think I will have to set the bbox variable myself.
From 'data_doc.md':
data_dict - s05/box_use_01 - bbox: Bounding box (squared) on image for network input; three-dimensional cx, cy, scale where (cx, cy) is the bbox center; scale*200 is the bounding box width and height; frames x views x 3
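Based on the quoted description (the box is square and its side is scale * 200 pixels), here is a small sketch for turning the three numbers into pixel corners; the helper name is mine:

import numpy as np

def bbox_to_corners(bbox):
    # ARCTIC-style bbox per data_doc.md: (cx, cy) is the box center and
    # scale * 200 is the side length of the square box in pixels.
    cx, cy, scale = bbox
    side = scale * 200.0
    return np.array([cx - side / 2.0, cy - side / 2.0,
                     cx + side / 2.0, cy + side / 2.0])  # x0, y0, x1, y1

For another dataset, one common (but here only assumed) recipe is to take the tight box around the projected 2D keypoints of the hands and object, make it square, pad it by a margin, and convert it back into (cx, cy, scale).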

In scripts_data/crop_images.py, the process_fname function is used for cropping.

def process_fname(fname, bbox_loose, sid, view_idx, pbar):

Could you explain why ego_image_scale == 0.3?

4.
[Solved] I misread the code and asked an unnecessary question.

Do MANO parameters equal SMPLX hand parameters?

Hi, I have a question about MANO parameters and SMPLX parameters.

The MANO right-hand parameters at frame 0 (box_use_01.mano.npy) are:
array([-0.03240527, 0.27357996, -0.44066623, -0.15148902, -0.19981687,
-0.6955209 , 6.558325 , -0.38321212, 0.8501508 , 0.45173895,
0.18395485, -0.39749008, 0.06304266, -0.11010664, -0.6733147 ,
5.1536746 , 0.26592368, 2.5213878 , 0.4855672 , -0.2268661 ,
0.24746093, 0.26659805, -0.05636495, -0.563865 , 3.304417 ,
0.31140158, 4.6122 , 0.30845642, -0.16647121, -0.4377692 ,
0.24876864, -0.02071797, -0.6238626 , 5.9167466 , 0.02213376,
1.8504663 , -0.9403614 , 0.6202517 , -0.63148606, -0.0362047 ,
-0.23572953, 0.97068465, -0.40886796, 0.5357393 , -1.2295284 ],
dtype=float32)

While the SMPLX right-hand pose at frame 0 (box_use_01.smplx.npy) is:
array([ 0.0094533 , 0.16874607, -0.08016084, -0.05645293, -0.02849651,
0.07896784, 0.1604375 , -0.05796457, 0.13873944, 0.1402729 ,
0.07300756, 0.09562486, -0.11038493, -0.06354734, 0.1671049 ,
-0.03375542, 0.00310679, -0.29383537, -0.13252321, -0.0102099 ,
0.6954388 , 0.17532377, 0.09140654, 0.19590265, -0.40491733,
-0.15230231, -0.5940853 , 0.05956102, -0.04324804, 0.25968423,
-0.08441725, -0.03401042, 0.04836456, -0.06257863, 0.06051512,
-0.14253886, -0.13492593, 0.1548856 , -0.38249287, -0.41815382,
0.07842843, 0.58350015, -0.06046534, 0.25589123, -0.72547275],
dtype=float32)

It seems that they are not equal. I thought they described the same hand pose; what are the differences between them?
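One hedged way to check whether the two parameter sets describe the same hand is to run both models and compare the resulting joints, since the raw numbers need not match even for the same pose (e.g. PCA coefficients vs. plain axis-angle, or a mean pose baked into one of the two). The sketch below uses the smplx library; the model path and the use_pca / flat_hand_mean flags are guesses that would have to be matched to how ARCTIC exported its parameters:

import torch
import smplx

# "models" is a hypothetical path to the MANO/SMPL-X model files.
mano = smplx.create("models", model_type="mano", is_rhand=True,
                    use_pca=False, flat_hand_mean=True)
smplx_model = smplx.create("models", model_type="smplx",
                           use_pca=False, flat_hand_mean=True)

# Placeholders: fill with the 45 values from *.mano.npy and the
# right_hand_pose block from *.smplx.npy respectively.
hand_pose_mano = torch.zeros(1, 45)
hand_pose_smplx = torch.zeros(1, 45)

out_mano = mano(hand_pose=hand_pose_mano)
out_smplx = smplx_model(right_hand_pose=hand_pose_smplx)

# If the parameterizations agree, the right-hand joints should line up
# (up to global orientation/translation and the body-vs-hand root offset).
print(out_mano.joints.shape, out_smplx.joints.shape)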

Hand mesh mismatch with the manopth

Hi Alex,

I just tried to use the differentiable MANO layer from https://github.com/hassony2/manopth/blob/master/manopth/manolayer.py to reproduce the hand mesh with the provided beta and theta parameters. For some hand samples, the smplx and manopth results align with each other, but for others the result doesn't really match, as below.
[screenshot: smplx and manopth hand meshes misaligned]

Do you have any guess about the cause, or have you tried their MANO layer before? Since the GT is axis-angle, I don't think it's due to wrong parameters being passed to the layer.
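Not an answer from the authors, but mismatches like this are often caused by the manopth layer configuration (PCA vs. axis-angle input, and whether the flat-hand mean is added) and, if I remember correctly, by manopth reporting its outputs in millimeters while ARCTIC meshes are in meters. A configuration worth trying (flag values are guesses, and the mano_root path is hypothetical):

import torch
from manopth.manolayer import ManoLayer

mano_layer = ManoLayer(
    mano_root="mano/models",  # hypothetical path to the MANO pkl files
    side="right",
    use_pca=False,            # feed raw axis-angle, not PCA coefficients
    flat_hand_mean=True,      # do not add the mean pose on top of the GT
)

# pose: (B, 48) = 3 global-orient + 45 axis-angle values; betas: (B, 10)
pose = torch.zeros(1, 48)
betas = torch.zeros(1, 10)
verts, joints = mano_layer(pose, betas)
# Check the output units: if verts/joints come back in millimeters,
# divide by 1000 before comparing with meshes in meters.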

Occluded hand joints

Is it possible to know which hand joints are occluded in each image in the ground truth training data?

Visualize Contact Areas

Hi, great work! I was wondering if there was a script to visualize the contact heatmaps as you do in the videos.

Possible inconsistency in the MoCap labels?

Working with the MoCap data, I noticed that some labels are not the same for different subjects. For example:
The markers on the base of the table correspond to [M_1, M_2, M_5, M_6] for subjects s01 through s04 and to [M_5, M_6, M_7, M_8] for subjects s05 and s06.

Is there a part of the data that specifies which labels correspond to which markers?
Of course, this might be a problem with my visualization, in which case I'd appreciate it if you could confirm that the labels correspond to the same markers across the subjects.

Thanks!
Amin

How to get from hand kp3d to kp2d?

Hi! I've been trying to project the 3D keypoints to 2D on my own.

I've been using the mano.j3d.full.r and mano.j2d.norm.r. I'm just trying to understand the full pseudocode involved in https://github.com/zc-alexfan/arctic/blob/7e3f736a95d7c461dbd93068de4367374c68a5ac/src/datasets/arctic_dataset.py.

It seems the 3D keypoints are loaded (EDIT: just kidding, I see now that they come from different sources...) and then:

  1. 2D drops Z and adds a homogeneous 1.0
  2. 2D is scaled down by 0.3 (args.ego_image_scale)
  3. A transformation matrix is built in j2d_processing that accounts for some center/scale (which I think is [420., 300.]?)
    Yet when looking at the intrinsics matrix, the center point (106.0, 110.0) seems more representative of the (224, 224) resolution?

There's also a mano.transl.r hidden in the dataset keys, but I've only gotten things to render roughly correctly by using the wrist position from mano.j3d.full.r.

Any help elaborating on what happens in step 3 would be really appreciated! My main issue is that I want a system that accurately recovers the 3D keypoints, but to check those against the 2D keypoints I need to be able to project the 3D keypoints to 2D (with or, even easier, without accounting for distortion).
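For reference while debugging, a plain pinhole projection of camera-space 3D joints (distortion ignored) gives a baseline to compare against; whether the dataset's joints are already in the camera frame and how the intrinsics are stored are assumptions that would need to be confirmed:

import numpy as np

def project_points(j3d_cam, K):
    # Pinhole projection of (N, 3) camera-space points with a 3x3 intrinsics
    # matrix K; lens distortion is ignored.
    uvw = j3d_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Example with a made-up intrinsics matrix and a single point 60 cm away:
K = np.array([[600.0, 0.0, 112.0],
              [0.0, 600.0, 112.0],
              [0.0, 0.0, 1.0]])
print(project_points(np.array([[0.05, -0.02, 0.60]]), K))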

how to export contact info

How can I generate and export the per-step hand-object binary contact information? I see that scripts_method/visualize.py has an option to visualize it from model predictions, but I'm trying to generate GT contact from the raw data.
Thanks in advance!
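Not a script from the repository, but given hand and object vertices exported via python scripts_data/process_seqs.py --export_verts, per-frame binary contact can be approximated with a nearest-neighbor distance threshold; the 5 mm threshold and the array names are my own choices:

import numpy as np
from scipy.spatial import cKDTree

def binary_contact(hand_verts, obj_verts, thresh_m=0.005):
    # Per-vertex binary contact for one frame.
    # hand_verts: (Vh, 3), obj_verts: (Vo, 3), both in meters.
    # A hand vertex counts as "in contact" if an object vertex lies
    # within thresh_m of it.
    tree = cKDTree(obj_verts)
    dists, _ = tree.query(hand_verts, k=1)
    return dists < thresh_m  # (Vh,) boolean mask

# For a whole sequence, stack the per-frame results:
# contact = np.stack([binary_contact(h, o) for h, o in zip(hand_seq, obj_seq)])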

Some questions about training

Thanks for your great work!

I have downloaded the code and data using the same splits as your baselines, following the instructions. Everything works fine until I start training ArcticNet-SF from scratch; my issues are as follows:

  1. During training, I see several warnings saying that some images could not be loaded; I found that they either do not exist or are corrupted. I have downloaded the dataset twice on different computers, and the problem still exists. Here is a snapshot: [screenshot of the warnings] I compared the generated checksum.json with the provided one and found that the only difference is that all paths with the prefix "/data/images_zips" are missing, but I think this is because I downloaded the cropped version instead of the full-resolution one.
  2. In your default configuration, you use a batch size of 64. However, I found that training takes a very long time (e.g. 7-8 hours per epoch) on a single RTX 3090. Is that normal?

  3. In your paper, I didn't find any description of the training time or the hardware used for the baselines. Could you please clarify this?

Thanks in advance!

Cropped image size && mask image size

Dear team, thanks for your great work! When I visualize the mask images, I find that the mask image size is 1600 x 900, but the cropped image size is 1000 x 1000, so the masks can't be matched to the cropped images pixel by pixel. How can they be matched exactly? I'd really appreciate any advice.
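I don't know the exact relationship ARCTIC intends between the mask resolution and the crops, but if the masks cover the same field of view as the full-resolution frames, one way to align them is to scale the crop bbox into the mask's resolution and cut the same square out of the mask (nearest-neighbor so label values survive). Everything here, including the (cx, cy, scale) bbox convention with side = scale * 200, is an assumption to verify:

import cv2
import numpy as np

def crop_mask_like_image(mask, full_size, bbox, out_size=1000):
    # mask: (h, w) label image; full_size: (W, H) of the full-resolution frame;
    # bbox: (cx, cy, scale) in full-frame pixels.
    mh, mw = mask.shape[:2]
    W, H = full_size
    sx, sy = mw / W, mh / H              # per-axis resolution ratio
    cx, cy, scale = bbox
    side = scale * 200.0
    x0, x1 = (cx - side / 2) * sx, (cx + side / 2) * sx
    y0, y1 = (cy - side / 2) * sy, (cy + side / 2) * sy
    src = np.float32([[x0, y0], [x1, y0], [x0, y1]])
    dst = np.float32([[0, 0], [out_size, 0], [0, out_size]])
    M = cv2.getAffineTransform(src, dst)
    # Nearest-neighbor so segmentation labels are not blended.
    return cv2.warpAffine(mask, M, (out_size, out_size), flags=cv2.INTER_NEAREST)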

Marker positions in 3D or 2D

Dear all,

Thanks for this amazing dataset. I have been working with it and need to find the positions of the markers, either in 3D (world coordinates) or 2D (image coordinates). I looked through the data documentation and found the 5 egocentric-camera marker positions, but not the other markers, e.g. the ones on the hands.

Exploring the code in src/utils/eval_modules.py, I found the joints_gt variable being used with dimensions Nx14x3, which I guess holds the 3D marker positions. If my guess is correct, these marker positions can be extracted using scripts_method/evaluate_metrics.py, which uses the models.

My question is: is there another way I can extract the marker positions without the need to run the models?

Many thanks in advance!
Amin

Training setup with multiple GPU

Hi Alex,

Thanks for your work. I have a question regarding training ArcticNet-SF with multiple GPUs. I ran the provided command for training it following the CVPR setup, and it takes about 2 hours per epoch, so I would like to accelerate it.

However, simply changing the code within the trainer function:

trainer = pl.Trainer(
to set the number of devices and the strategy causes many problems. I think it's because there are many .to(device) operations within the data pre-processing code, so it is hard to run the code directly with a multi-GPU setup.

Have you tried multi-GPU training? If so, could you please give some suggestions for this?

fuse arctic and grab datasets?

Hi,

I noticed that another dataset, GRAB, has a configuration similar to ARCTIC's (but without images). I wonder whether you have tried to merge the two datasets. If I wanted to train a 3D-only grasp/motion model, could I fuse the two datasets together? Do you see any obstacle preventing this?

Hand-Object Tracking

Are you planning to set up a pipeline for the object tracking task?
Moreover, do you think ARCTIC would be a good benchmark for object tracking (using a hand motion prior in some way)?
If so, how could I extract 2D object bounding boxes from the poses?
Thank you and great work!
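On the last question: one way (not an official pipeline) to get 2D object boxes is to pose the object template with the GT rotation and translation, project the vertices with the camera intrinsics, and take the min/max pixel coordinates; variable names here are placeholders:

import numpy as np

def object_bbox_2d(verts_canonical, R, t, K, img_w, img_h):
    # verts_canonical: (V, 3) template vertices; R: (3, 3) rotation;
    # t: (3,) translation in the camera frame; K: (3, 3) intrinsics.
    verts_cam = verts_canonical @ R.T + t      # pose into camera coordinates
    uvw = verts_cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]              # pinhole projection
    x0, y0 = np.clip(uv.min(axis=0), 0, [img_w, img_h])
    x1, y1 = np.clip(uv.max(axis=0), 0, [img_w, img_h])
    return np.array([x0, y0, x1, y1])          # axis-aligned box, clamped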

missing images?

I found that three images are missing from my downloaded dataset:

  • cropped_images/s01/laptop_use_04/7/00737.jpg
  • cropped_images/s01/espressomachine_grab_01/4/00443.jpg
  • cropped_images/s08/mixer_use_02/3/00295.jpg

I wonder whether they are missing from the dataset or whether some error occurred during my download.

Evaluating the predicted weak perspective camera

The predicted weak-perspective camera is converted into a translation term, which is then used to compute the vertices and joints in the camera coordinate frame. However, during evaluation, all 3D locations are shifted to the wrist joint before the metrics are computed. This effectively removes any translation error. Is there a reason why the metrics are not computed in the camera frame?
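For context, the root-relative evaluation described above amounts to something like the sketch below (a generic illustration of the metric, not code lifted from the repository), which is indeed insensitive to any error in the predicted translation:

import numpy as np

def mpjpe_root_relative(pred_j3d, gt_j3d, root_idx=0):
    # Mean per-joint position error after aligning both joint sets to the
    # root (e.g. wrist) joint, which removes global translation error.
    # pred_j3d, gt_j3d: (J, 3) arrays in the same units.
    pred = pred_j3d - pred_j3d[root_idx:root_idx + 1]
    gt = gt_j3d - gt_j3d[root_idx:root_idx + 1]
    return np.linalg.norm(pred - gt, axis=-1).mean()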

Hand joint ordering

Hi,

Does ARCTIC follow the same joint ordering as the figure below?
[figure: hand joint ordering]

Could you tell me where I can find the information on the joint ordering for both the left and right hands, please?

Thank You.

Can't download data.

@zc-alexfan
Thanks for making such a great dataset.
I am a student from China. I have tried your script for downloading data but failed. The login part is passed but the downloading process was stucked at the beginning. I have tried many VPN to switch to oversea network but still didn't work. Could you please check the download link?
Best,
Yu

Intrinsics in visualizer

For visualizing meshes by projecting them onto the image, shouldn't the intrinsics in

focal = 1000.0
rows = 224
cols = 224
K = np.array([[focal, 0, rows / 2.0], [0, focal, cols / 2.0], [0, 0, 1]])
be taken from meta_info in the predictions rather than hard-coded here?
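Assuming meta_info carries a per-sample 3x3 intrinsics matrix (the key name "intrinsics" below is a guess, not confirmed from the repository), the hard-coded values could be replaced along these lines:

import numpy as np

def intrinsics_from_meta(meta_info):
    # Key name is hypothetical; adjust to whatever meta_info actually stores.
    return np.asarray(meta_info["intrinsics"], dtype=np.float64).reshape(3, 3)

def default_intrinsics(focal=1000.0, rows=224, cols=224):
    # Fallback identical to the hard-coded matrix quoted above.
    return np.array([[focal, 0, rows / 2.0],
                     [0, focal, cols / 2.0],
                     [0, 0, 1.0]])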

Some more details about the pretrained models

Following #5, I noticed my training times are very different from yours. E.g., each epoch takes ~8 hours on the full data with twice your batch size (128) and the default parameters on an A100 GPU.

  1. Given that, my only guess is that the pretrained models were trained on cropped images. Is that correct?
  2. Also, I see that the training loss stays at ~60 for around 15k steps and then suddenly drops to ~2. Do you possibly remember seeing the same behaviour?
    I tried different learning rates and the behaviour is more or less the same.

Thanks!
Amin

Problem regarding translation

Thank you for your great work! I want to ask about the translation of the hand and the object. I want to get the object vertices for a given frame, so I first fetched the vertices from the mesh file and then tried to apply the appropriate transform. The rotation seems correct using obj_rot_cam, but there seems to be some issue with obj_trans, as shown in the following reprojection result (using the intrinsics). I wonder if I should apply some other transformation to the translation, or multiply it by a scalar? Thanks.

[figure: reprojected object with incorrect translation]
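For debugging, it may help to write the whole chain out explicitly and check units at each step; a unit mismatch between the template vertices and the translation (mm vs. m) is one plausible explanation for a "missing scalar", though that is only a guess. Names below are placeholders, not the repository's variables:

import cv2
import numpy as np

def object_verts_in_camera(verts_template, rot_aa, transl, unit_scale=1.0):
    # rot_aa: (3,) axis-angle rotation (e.g. obj_rot_cam); transl: (3,);
    # unit_scale: e.g. 0.001 if the template is in mm but transl is in m.
    R, _ = cv2.Rodrigues(np.asarray(rot_aa, dtype=np.float64))
    return (verts_template * unit_scale) @ R.T + transl

def project(verts_cam, K):
    # Pinhole projection, no distortion.
    uvw = verts_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]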

Inference on Custom Data / Images / Video

Can you provide a simple outline for how you would run inference using ArcticNet-SF/LSTM on custom data? I didn't see it in the DOCS but I might have missed it!

I'm assuming I might want to modify extract_predicts.py

Mismatch between number of images and corresponding mano files

I want to use the mano files, but I noticed that there is a mismatch in the number of files:

  E.g., within this particular folder, there are 733 images.
  [screenshot of the image folder listing]

But the number of mano frames is only 732 (unpack/arctic_data/data/raw_seqs/s01/box_grab_01.mano.npy):
[screenshot of the mano array shape]

Since the mano file does not record which image each row maps to, instead of discarding the one image whose mano parameters are missing, I would have to discard all the images in this folder.
I wanted to confirm whether this is the case or whether there was an error when downloading the files.
Thank You!

NameError: name 'to_homo_batch' is not defined

Exception thrown when running the following command:

python -m scripts_data.process_seqs --mano_p ./data/arctic_data/data/raw_seqs/s01/capsulemachine_use_01.mano.npy --export_verts

I am on commit 20f5b6b464328621bf167b882f43ec14b127e7b4

More details later...

Updated intrinsics for the cropped images?

Hey!
Thanks for this amazing work. I wanted to use the pre-cropped images, but I can't find the modified intrinsics after cropping. Is there a way to recover them?

Thanks.
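In general (this is standard pinhole geometry rather than something taken from the ARCTIC docs), a crop followed by a resize changes the intrinsics in a predictable way, so the cropped-image intrinsics can be recovered from the full-image intrinsics and the crop bbox; the (cx, cy, scale) convention with side = scale * 200 follows the data_doc.md description:

import numpy as np

def intrinsics_after_crop_resize(K, bbox, out_size=1000):
    # Shift the principal point into crop coordinates, then rescale
    # fx, fy, cx, cy by out_size / box_side.
    cx, cy, scale = bbox
    side = scale * 200.0
    x0, y0 = cx - side / 2.0, cy - side / 2.0
    s = out_size / side
    K_new = K.astype(np.float64).copy()
    K_new[0, 2] -= x0
    K_new[1, 2] -= y0
    K_new[:2] *= s
    return K_new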

Joint visibility labeling

I want to label joint visibility. In a previous response, it was suggested to use the segmentation masks, so I'm currently modifying the 'python scripts_data/visualizer.py --object --smplx --headless' script. In this process, I'm having difficulty obtaining segmentation-mask images that match the cropped images. Could I get some advice on this part?
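Not from the repository, but once the masks are aligned with the cropped images, a rough per-joint visibility label can be obtained by projecting the 3D joints to 2D and reading the segmentation label at each joint's pixel; the label values counted as "hand" are assumptions, and self-occlusion by the hand itself would not be caught this way:

import numpy as np

def joint_visibility(j2d, seg_mask, hand_labels=(1,)):
    # j2d: (J, 2) joint pixels in the mask's coordinate frame;
    # seg_mask: (H, W) integer label image; hand_labels: labels counted as hand.
    # A joint is "visible" if it lands inside the image on a hand pixel.
    H, W = seg_mask.shape[:2]
    u = np.round(j2d[:, 0]).astype(int)
    v = np.round(j2d[:, 1]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    vis = np.zeros(len(j2d), dtype=bool)
    vis[inside] = np.isin(seg_mask[v[inside], u[inside]], hand_labels)
    return vis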

gt_transl in process_data in process_arctic.py

Shouldn't gt_transl and T0 be the same?

If the global orientations of both the object and the hand are with respect to the camera coordinate space, then the object canonical space and the MANO canonical space differ from the camera coordinate space by only a translation component.

Also, shouldn't targets["mano.j3d.full.r"] and targets["mano.j3d.cam.r"] be the same, since both are in camera coordinate space?

ModuleNotFoundError: No module named 'common.***'

Hello, thank you for your work; it's very interesting.
I would like to clarify whether it works on Windows or only on Linux.
I have Windows, and I satisfy all the general requirements from docs/setup.md.
I wanted to test your code and followed your instructions, but I ran into many package errors. I was able to fix some of them, but one is still unsolved: "ModuleNotFoundError: No module named 'common.***'". All remaining errors are now related to the common module only.
Thank you for your reply.

SMPLX shape

Thanks for this great dataset!

I was looking at the data format docs for SMPLX here and noticed that there is no shape parameter included in the *.smplx.npy files.
To my understanding, the shape is required to reconstruct the joints/verts.

Am I missing something?

Background Images?

Dear team,

Thank you for your great work! I wonder whether ARCTIC provides background images for all the 3rd-person-view cameras. They would be very helpful for segmenting foreground scenes. Thank you!
