
Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation

Overview

Installation and Setup

  1. Setup the conda environment

    conda create --name kypt_trans python==3.8.11
    conda activate kypt_trans
    conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
    pip install -r requirements.txt
    
  2. Download the MANO model files from the website (requires login) and set smplx_path in config.py (see the example after this section)

  3. Clone the Current Repo

    git clone <curr_repo>
    cd kypt_trans
    cd main
    

The setup has been tested on an NVIDIA RTX 3090 GPU.
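
Step 2 above asks you to set smplx_path in config.py. For reference, a placeholder example (the actual value depends on where you extracted the MANO model files):

    # config.py -- placeholder path, point it to your extracted MANO model folder
    smplx_path = '/path/to/mano_v1_2/models'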

Depending on the dataset you intend to train or evaluate on, follow the setup instructions below.

InterHand2.6M Setup

  1. Download the dataset from the website
  2. In config.py, set interhand_anno_dir to point to the annotations directory
  3. In config.py, set interhand_images_path to point to the images directory
  4. If you intend to use the RootNet output for the root joint translation, download the RootNet results for InterHand2.6M from here and set root_net_output_path in config.py to point to the RootNet outputs folder (see the example below). If you instead intend to test with the ground-truth relative translation, set root_net_output_path to None.
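
For reference, a placeholder example of the corresponding config.py entries (the paths are assumptions; adapt them to your local copies):

    # config.py -- placeholder paths for the InterHand2.6M setup
    interhand_anno_dir = '/path/to/InterHand2.6M/annotations'
    interhand_images_path = '/path/to/InterHand2.6M/images'
    root_net_output_path = '/path/to/rootnet_output'  # or None to use ground-truth relative translation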

HO-3D Setup

  1. Download the dataset from the website and set ho3d_anno_dir in config.py to point to the dataset folder
  2. Download the YCB object models from here. The original mesh models are large and won't fit in memory for some computations, so the meshes need to be decimated to 2000 faces as described here (see the sketch below). Rename the decimated models to textured_simple_2000.obj.
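
If you prefer to script the decimation, here is a minimal sketch using Open3D (an assumption; the linked instructions may use a different tool, and the input/output paths are placeholders):

    # Decimate one YCB mesh to 2000 faces with Open3D (sketch; paths are placeholders)
    import open3d as o3d

    mesh = o3d.io.read_triangle_mesh('/path/to/ycb_models/021_bleach_cleanser/textured_simple.obj')
    decimated = mesh.simplify_quadric_decimation(target_number_of_triangles=2000)
    o3d.io.write_triangle_mesh('/path/to/ycb_models/021_bleach_cleanser/textured_simple_2000.obj', decimated)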

H2O-3D Setup

  1. Download the dataset from the website and set h2o3d_anno_dir in config.py to point to the dataset folder.
  2. Follow step 2 of the HO-3D setup above to download, decimate, and set up the object models (see the config example below).
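
For reference, the corresponding config.py entries (placeholder paths):

    # config.py -- placeholder paths for the HO-3D / H2O-3D setup
    ho3d_anno_dir = '/path/to/HO3D'
    h2o3d_anno_dir = '/path/to/H2O3D'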

Demo

We provide the demo script for visualizing the outputs.

InterHand2.6M

  1. Download the checkpoint file for the model trained to output MANO joint angles from here
  2. Set dataset = 'InterHand2.6M' and pose_representation = 'angles' in config.py
  3. Run the following script:
    python demo.py --ckpt_path <path_to_ckpt> --use_big_decoder --dec_layers 6
    
  4. The outputs are shown in matplotlib and open3d windows. See the instructions in the command line to navigate
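
For reference, the config.py settings used by this demo (step 2 above):

    # config.py settings for the InterHand2.6M demo
    dataset = 'InterHand2.6M'
    pose_representation = 'angles'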

HO-3D

  1. Download the checkpoint file for the model trained to output 3D pose representation from here
  2. Set dataset = 'ho3d' and pose_representation = '3D' in config.py
  3. Run the following script:
    python demo.py --ckpt_path <path_to_ckpt> --use_big_decoder --dec_layers 6
    
  4. Since the output here is only 3D joint locations, the projections in 2D are shown in the matplotlib window. See the instructions in the command line to navigate.
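
The 2D visualization relies on the standard pinhole projection of the predicted 3D joints with the camera intrinsics. A minimal, self-contained sketch (not the repo's exact code; the intrinsics values and joint coordinates below are made up):

    import numpy as np

    def project_3d_to_2d(joints_3d, K):
        """Project (N, 3) camera-space joints to (N, 2) pixel coordinates."""
        proj = joints_3d @ K.T             # (N, 3)
        return proj[:, :2] / proj[:, 2:3]  # divide by depth

    # Hypothetical intrinsics and 21 hand joints placed in front of the camera
    K = np.array([[614.6, 0.0, 320.0],
                  [0.0, 614.6, 240.0],
                  [0.0, 0.0, 1.0]])
    joints = np.random.rand(21, 3) * 0.1 + np.array([0.0, 0.0, 0.5])
    uv = project_3d_to_2d(joints, K)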

Evaluation

Depending on the dataset you intend to evaluate follow the instructions below.

InterHand2.6M (Table 1 in Paper)

  1. Make the following changes in the config.py

    dataset = 'InterHand2.6M'
    pose_representation = '2p5D' # Table 1 in paper uses 2.5D pose representation
    
  2. Download the checkpoint file from here

  3. Run the following command:

    python test.py --ckpt_path <path_to_interhand2.6m_ckpt> --gpu_ids <gpu_ids>
    

    If running on multiple GPUs, set <gpu_ids> accordingly, e.g., 0,1,2,3 for four GPUs.

  4. The error metrics are dumped into a .txt file in the folder containing the checkpoint

  5. The final numbers are as below:

[Results txt file]

| Single Hand MPJPE (mm) | Interacting Hands MPJPE (mm) | All MPJPE (mm) | MRRPE (mm) |
|------------------------|------------------------------|----------------|------------|
| 10.88                  | 14.16                        | 12.62          | 29.50      |
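
For reference, MPJPE is the mean Euclidean distance between predicted and ground-truth joints, typically computed on root-relative joint locations. A minimal sketch (not the repo's exact evaluation code; the root index is an assumption):

    import numpy as np

    def mpjpe(pred, gt, root_idx=0):
        """Mean per-joint position error (same units as the inputs, e.g. mm),
        computed on root-relative joint locations."""
        pred_rel = pred - pred[root_idx:root_idx + 1]
        gt_rel = gt - gt[root_idx:root_idx + 1]
        return np.linalg.norm(pred_rel - gt_rel, axis=-1).mean()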

HO-3D (v2) (Table 2 in Paper)

  1. Make the following changes in the config.py
    dataset = 'ho3d'
    pose_representation = '3D' # Table 2 in paper uses 3D pose representation
    
  2. Download the checkpoint file from here
  3. Run the following command:
    python test.py --ckpt_path <path_to_ho3d_ckpt> --use_big_decoder --dec_layers 6
    
  4. The object error metric (MSSD) is dumped into a .txt file in the folder containing the checkpoint
  5. A .json file is also dumped, which can be submitted to the HO-3D (v2) challenge after zipping it (see the snippet after this list)
  6. Here is the dumped results file after the run: [Results txt file]
  7. Hand pose estimation accuracy in the HO-3D challenge leaderboard: here, user: bullet
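
The dumped .json can be zipped for submission with, e.g., Python's zipfile (the file name below is a placeholder; use the .json written next to your checkpoint):

    import zipfile

    # 'pred.json' is a placeholder name for the dumped results file
    with zipfile.ZipFile('pred.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
        zf.write('pred.json')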

H2O-3D

  1. Make the following changes in the config.py
    dataset = 'h2o3d'
    pose_representation = '3D' # The H2O-3D results in the paper use the 3D pose representation
    
  2. Download the checkpoint file from here
  3. Run the following command:
    python test.py --ckpt_path <path_to_h2o3d_ckpt>
    
  4. The object error metric (MSSD) is dumped into a .txt file in the folder containing the checkpoint
  5. Also dumped is a .json file which can be submitted to the H2O-3D challenge after zipping the file.
  6. Here is the dumped results file after the run: [Results txt file]
  7. Hand pose estimation accuracy in the H2O-3D challenge leaderboard: here, user: bullet

Training

  1. Depending on the dataset and output pose representation you intend to train on, set the dataset and pose_representation variables in config.py.
  2. Run the following script to start the training:
    CUDA_VISIBLE_DEVICES=0,1 python train.py --run_dir_name <run_name>
    
    To continue training from the last saved checkpoint, use the --continue argument in the above command (see the example after this list).
  3. The checkpoints are dumped after every epoch in the 'output' folder of the base directory
  4. Tensorboard logging is also available in the 'output' folder
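
For example, resuming an interrupted run from its last checkpoint (the run name is a placeholder):

    CUDA_VISIBLE_DEVICES=0,1 python train.py --run_dir_name <run_name> --continue
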
Training with HO-3D and H2O-3D datasets together

The H2O-3D results in the paper are obtained by training the network on the combined dataset of HO-3D and H2O-3D. This training can be achieved by setting dataset to ho3d_h2o3d in config.py.
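
For reference, the corresponding config.py setting:

    # config.py -- train on the combined HO-3D + H2O-3D dataset
    dataset = 'ho3d_h2o3d'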

Reference

    @InProceedings{Hampali_2022_CVPR_Kypt_Trans,  
    author = {Shreyas Hampali and Sayan Deb Sarkar and Mahdi Rad and Vincent Lepetit},  
    title = {Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation},  
    booktitle = {IEEE Computer Vision and Pattern Recognition Conference},  
    year = {2022}  
    }  

Acknowledgements

  • A lot of the code has been reused from the InterHand2.6M repo and the DETR repo. We thank the authors for making their code public.
