

Ev2Hands: 3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera [3DV'24]

Official PyTorch implementation

Project page | Paper | 🤗 Demo

Ev2Hands

Abstract

3D hand tracking from a monocular video is a very challenging problem due to hand interactions, occlusions, left-right hand ambiguity, and fast motion. Most existing methods rely on RGB inputs, which have severe limitations under low-light conditions and suffer from motion blur. In contrast, event cameras capture local brightness changes instead of full image frames and do not suffer from the described effects, but, unfortunately, existing image-based techniques cannot be directly applied to events due to significant differences in the data modalities. In response to these challenges, this paper introduces the first framework for 3D tracking of two fast-moving and interacting hands from a single monocular event camera. Our approach tackles the left-right hand ambiguity with a novel semi-supervised feature-wise attention mechanism and integrates an intersection loss to fix hand collisions. To facilitate advances in this research domain, we release a new synthetic large-scale dataset of two interacting hands, Ev2Hands-S, and a new real benchmark with real event streams and ground-truth 3D annotations, Ev2Hands-R. Our approach outperforms existing methods in terms of the 3D reconstruction accuracy and generalizes to real data under severe light conditions.

Advantages of Event-Based Vision

High-Speed Motion | Low-Light Performance

Usage



Installation

Clone the repository

git clone https://github.com/Chris10M/Ev2Hands.git
cd Ev2Hands

Data Prerequisites

Data Folder
  • Please register at the MANO website and download the MANO models.
  • Put the MANO models in a folder with the following structure.
  • Download the data from here and ensure that the files and folders have the following structure.
src
└── data
    ├── models
    │   └── mano
    │       ├── MANO_RIGHT.pkl
    │       └── MANO_LEFT.pkl
    ├── background_images
    ├── mano_textures
    ├── J_regressor_mano_ih26m.npy
    └── TextureBasis
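To catch path mistakes early, the following is a minimal sanity check, assuming the repository root as the working directory and the file layout shown above; it only reports which of the expected files and folders are missing.

import os

# Expected data files/folders, taken from the structure above.
# Run this from the repository root.
required = [
    'src/data/models/mano/MANO_RIGHT.pkl',
    'src/data/models/mano/MANO_LEFT.pkl',
    'src/data/background_images',
    'src/data/mano_textures',
    'src/data/J_regressor_mano_ih26m.npy',
    'src/data/TextureBasis',
]

missing = [path for path in required if not os.path.exists(path)]
if missing:
    print('Missing:', *missing, sep='\n  ')
else:
    print('All expected data files and folders are present.')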
Pretrained Model

The pretrained model best_model_state_dict.pth can be found here. Please place the model in the following folder structure.

src
└── Ev2Hands
    └── savedmodels
        └── best_model_state_dict.pth
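To verify that the checkpoint downloaded correctly and can be deserialised, a quick inspection with PyTorch is sketched below; it assumes the file is a standard state dictionary saved with torch.save and only prints what it contains (the model class itself is defined in the repository).

import torch

# Load the checkpoint on CPU and list its contents without building the model.
checkpoint = torch.load('src/Ev2Hands/savedmodels/best_model_state_dict.pth',
                        map_location='cpu')

print(type(checkpoint))
if isinstance(checkpoint, dict):
    for key in list(checkpoint)[:10]:
        print(key)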

Hand Simulator

Create a conda environment from the file

conda env create -f hand_simulator.yml

Install VID2E inside src/HandSimulator following the instructions from here. This is needed for generating events from the synthetic data.

Ev2Hands

Create a conda environment from the file

conda env create -f ev2hands.yml

Install torch-mesh-isect following the instructions from here. This is needed for the Intersection Aware Loss.

Note: to compile torch-mesh-isect, we found that pytorch==1.4.0 with cudatoolkit=10.1 builds without issues.

conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch
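Before building the extension, it is worth confirming that the installed PyTorch and CUDA toolkit match the versions above; a quick check:

import torch

# Environment check before compiling torch-mesh-isect.
print('PyTorch version      :', torch.__version__)        # expected: 1.4.0
print('CUDA (torch build)   :', torch.version.cuda)       # expected: 10.1
print('CUDA device available:', torch.cuda.is_available())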

Datasets

Ev2Hands-S

Download

Ev2Hands-S can be downloaded from here. Unzip it and set ROOT_TRAIN_DATA_PATH in src/settings.py to the path of the extracted dataset.
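For reference, the corresponding entry in src/settings.py would look like the following; the path itself is illustrative and should point to wherever the dataset was extracted.

# src/settings.py -- illustrative value
ROOT_TRAIN_DATA_PATH = '/path/to/Ev2Hands-S'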

Generation
  • Download InterHand2.6M (5 fps) and set INTERHAND_ROOT_PATH in src/settings.py to the directory of InterHand2.6M_5fps_batch1.

  • Set ROOT_TRAIN_DATA_PATH in src/settings.py to the path where Ev2Hands-S should be generated. Now, run

    cd src/HandSimulator
    GENERATION_MODE={'train', 'val', 'test'} python main.py
    

    to generate the dataset in parts. Note: set GENERATION_MODE to either train, val, or test, e.g., GENERATION_MODE=train python main.py (a sketch of how this environment variable is typically read is given at the end of this list).

  • After the dataset parts are generated, stitch them into an H5 event dataset and an annotation pickle using

    cd src/HandSimulator
    GENERATION_MODE={'train', 'val', 'test'} python stich_mp.py 
    

    Note: Set GENERATION_MODE to either train, val, or test.

  • For generating the dataset with SLURM, please check src/HandSimulator/slurm_main.sh and src/HandSimulator/slurm_stich_mp.sh.
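  • For reference, reading GENERATION_MODE inside a script typically follows the pattern below; this is a sketch of the assumed convention, not a copy of main.py or stich_mp.py.

    import os

    # Assumed convention: GENERATION_MODE is an environment variable that
    # selects which dataset split to generate.
    generation_mode = os.environ.get('GENERATION_MODE', 'train')
    assert generation_mode in ('train', 'val', 'test'), \
        f'Unsupported GENERATION_MODE: {generation_mode}'
    print(f'Generating the {generation_mode} split')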

Ev2Hands-R

Download

Ev2Hands-R can be downloaded from here. Unzip it and set the train and test folders of Ev2Hands-R in src/settings.py:

REAL_TRAIN_DATA_PATH = 'path/to/Ev2Hands-R/train'
REAL_TEST_DATA_PATH = 'path/to/Ev2Hands-R/test'

Training

Pretraining

For pretraining, ensure Ev2Hands-S is present. Please check src/Ev2Hands/arg_parser.py for default batch size and checkpoint path.

cd src/Ev2Hands
python train.py 

Finetuning

For finetuning, ensure Ev2Hands-R is present. Please check src/Ev2Hands/arg_parser.py for default batch size and checkpoint path.

cd src/Ev2Hands
python finetune.py 

Evaluation

Ev2Hands-S

For evaluation, ensure Ev2Hands-S is present. Note: the provided model will not perform well on Ev2Hands-S, as it is fine-tuned on Ev2Hands-R.

cd src/Ev2Hands
python evaluate.py 

Ev2Hands-R

For evaluation, ensure Ev2Hands-R is present.

cd src/Ev2Hands
python evaluate_ev2hands_r.py 

Demo

The demo performs inference on real event streams. A trained model is required for the demo to work; you can also use the pretrained model.

demo

To run the demo,

cd src/Ev2Hands
python3 demo.py

Please check src/Ev2Hands/arg_parser.py for default batch size and checkpoint path.
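The exact flags are defined in src/Ev2Hands/arg_parser.py. As an illustration only, a typical argparse setup for such defaults looks like the sketch below; the flag names and default values here are hypothetical, not the repository's actual options.

import argparse

# Hypothetical flags for illustration; the real defaults live in
# src/Ev2Hands/arg_parser.py.
parser = argparse.ArgumentParser(description='Ev2Hands options (sketch)')
parser.add_argument('--batch_size', type=int, default=16,
                    help='Illustrative batch size.')
parser.add_argument('--checkpoint_path', type=str,
                    default='savedmodels/best_model_state_dict.pth',
                    help='Illustrative checkpoint path.')
args = parser.parse_args()
print(args)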

Citation

If you find this code useful for your research, please cite our paper:

@inproceedings{Millerdurai_3DV2024, 
    title={3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera}, 
    author={Christen Millerdurai and Diogo Luvizon and Viktor Rudnev and André Jonas and Jiayi Wang and Christian Theobalt and Vladislav Golyanik}, 
    booktitle = {International Conference on 3D Vision (3DV)}, 
    year={2024} 
} 

License

Ev2Hands is released under the CC BY-NC 4.0 license. The license also applies to the pretrained models.


ev2hands's Issues

Inaccuracy in Joint Annotations of Ev2Hands-R Dataset

Hi @Chris10M,

Thank you for the great work and for making the code available.

To get an insight into the Ev2Hands-R data, I plotted the ground-truth joint annotations on the event frames (plot_data.py). I observed that for some frames, the joint annotations used to train the Ev2Hands model are not accurate; the projected 2D skeleton does not coincide accurately with the event cloud, as seen in the attached images (column 1: input RGB image, column 2: input event frame, column 3: input event frame + ground-truth 2D joint skeleton). The fingers are straight in the input RGB and event frames, but the annotated skeleton appears to have bent fingers. This would have adversely affected the network training, leading to suboptimal accuracy.

Is this inaccuracy in the ground-truth joint annotations due to the motion tracking system (Captury)? Or could this be due to some misalignment in the synchronization of the event and RGB streams?

[Attached example frames: 0000001, 0000011, 0000061]

Query regarding the correspondence of samples across datasets

Hi @Chris10M,

Thank you for providing the synchronized RGB streams and the MANO parameters. I had a query regarding the number of samples in this data. Below are the sample counts for each subject.

| Subject | Ev2Hands-R (subject_{id}_event.pickle) | MANO parameters (event_mano_params.pkl) | MANO parameters (rgb_mano_params.pkl) | RGB data (subject_{id}_rgb.mp4) |
|---------|----------------------------------------|------------------------------------------|----------------------------------------|---------------------------------|
| 1 | 11384 | 11384 | 11383 | 11385 |
| 2 | 11777 | 11777 | 11777 | 11778 |
| 3 | 11921 | 11921 | 11921 | 11922 |
| 4 | 13275 | 13275 | 13275 | 13276 |
| 5 | 12386 | 12386 | 12385 | 12387 |

Could you tell me the correspondence across these datasets? Specifically, given a sample index in rgb_mano_params.pkl, what is the corresponding frame in subject_{id}_rgb.mp4? Also, for subjects 1 and 5, given a sample index in event_mano_params.pkl what is the corresponding sample index in rgb_mano_params.pkl?
