harper's Introduction

Exploring 3D Human Pose Estimation and Forecasting from the Robot’s Perspective: The HARPER Dataset

Andrea Avogaro, Andrea Toaiari, Federico Cunico, Xiangmin Xu, Haralambos Dafas, Alessandro Vinciarelli, Emma Li, and Marco Cristani

arXiv

This is the official code page for the Human from an Articulated Robot Perspective (HARPER) dataset!

Abstract:

We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. These make 3D body pose analysis challenging because, being close to the ground, they capture humans only partially. The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users. The Corpus contains not only the recordings of Spot's built-in stereo cameras, but also those of a 6-camera OptiTrack system (all recordings are synchronized). This yields ground-truth skeletal representations with sub-millimeter precision. In addition, the Corpus includes reproducible benchmarks on 3D Human Pose Estimation, Human Pose Forecasting, and Collision Prediction, all based on publicly available baseline approaches. This enables future HARPER users to rigorously compare their results with those we provide in this work.

HARPER Overview

Dataset download

Run the following command to download the dataset:

PYTHONPATH=. python download/harper_downloader.py --dst_folder ./data

Data structure:

-coming soon-

3D data only (no images)

The dataset offers two points of view: a panoptic view and the robot's perspective. The panoptic view is captured with a 6-camera OptiTrack MoCap system, which places the human skeleton pose (21x3) and the Spot skeleton in the same 3D reference system.

For the sake of completeness, we provide the 3D panoptic data here. To download it and create the data structure with train and test splits, run the following command:

PYTHONPATH=. python download/harper_only_3d_downloader.py --dst_folder ./data

This will generate the following tree structure:

data
├── harper_3d_120
│   ├── test
│   │   ├── subj_act_120hz.pkl
│   │   ├── ...
│   │   └── subj_act_120hz.pkl
│   └── train
│       ├── subj_act_120hz.pkl
│       ├── ...
│       └── subj_act_120hz.pkl
└── harper_3d_30
    ├── test
    │   ├── subj_act_30hz.pkl
    │   ├── ...
    │   └── subj_act_30hz.pkl
    └── train
        ├── subj_act_30hz.pkl
        ├── ...
        └── subj_act_30hz.pkl

The harper_3d_120 and harper_3d_30 folders contain the 3D panoptic data at 120 Hz and 30 Hz respectively, each with train and test splits.
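As a minimal sketch of how the tree above could be traversed, the snippet below lists the sequence files of one split and parses subject and action from each file name. The naming pattern `<subject>_<action>_<rate>hz.pkl` is an assumption inferred from the example file name `cun_act1_30hz.pkl` used in the visualization command, and `list_sequences` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred from the example "cun_act1_30hz.pkl":
# <subject>_<action>_<rate>hz.pkl
NAME_RE = re.compile(r"^(?P<subject>[^_]+)_(?P<action>[^_]+)_(?P<rate>\d+)hz\.pkl$")


def list_sequences(root, rate=30, split="train"):
    """Return (path, subject, action) for every sequence file in one split."""
    folder = Path(root) / f"harper_3d_{rate}" / split
    found = []
    for path in sorted(folder.glob("*.pkl")):
        match = NAME_RE.match(path.name)
        if match is None or int(match["rate"]) != rate:
            continue  # skip files that do not follow the assumed pattern
        found.append((path, match["subject"], match["action"]))
    return found


if __name__ == "__main__":
    for path, subject, action in list_sequences("./data", rate=30, split="train"):
        print(subject, action, path)
```

If the real file names deviate from the assumed pattern, the regular expression is the only part that needs adjusting.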

Each .pkl file contains a dictionary with the frame index as key and the following values:

  • frame: the frame index
  • subject: the subject id
  • action: the action id
  • human_joints_3d: the 3D human pose (21x3)
  • spot_joints_3d: the 3D Spot pose (22x3)
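The layout described above can be read with the standard `pickle` module. The following is a minimal sketch, not the official loader: `load_sequence` and `joints_shape` are hypothetical helpers, and the check only covers the five keys listed above. If the joints are stored as NumPy arrays (an assumption), NumPy must be installed for unpickling to succeed. As with any pickle, only load files from a source you trust.

```python
import pickle

# Keys documented for every per-frame record.
EXPECTED_KEYS = ("frame", "subject", "action", "human_joints_3d", "spot_joints_3d")


def load_sequence(pkl_path):
    """Load one sequence file and return its records ordered by frame index."""
    with open(pkl_path, "rb") as handle:
        data = pickle.load(handle)
    records = []
    for frame_idx in sorted(data):
        record = data[frame_idx]
        missing = [key for key in EXPECTED_KEYS if key not in record]
        if missing:
            raise KeyError(f"frame {frame_idx} is missing keys: {missing}")
        records.append(record)
    return records


def joints_shape(joints):
    """Return (num_joints, num_coords) for an array-like of 3D joints."""
    return len(joints), len(joints[0])


if __name__ == "__main__":
    records = load_sequence("./data/harper_3d_30/train/cun_act1_30hz.pkl")
    first = records[0]
    print(len(records), "frames")
    print("human:", joints_shape(first["human_joints_3d"]))
    print("spot:", joints_shape(first["spot_joints_3d"]))
```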

A torch dataloader will be provided soon.
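Until the official dataloader is available, a plain Python class exposing `__len__` and `__getitem__` can serve as a stand-in, since that is the map-style dataset protocol that `torch.utils.data.DataLoader` accepts. This is only a sketch under assumptions: `HarperFrames` is a hypothetical name, it loads every frame of a split into memory, and it returns single frames rather than the sequences used in the paper's benchmarks.

```python
import pickle
from pathlib import Path


class HarperFrames:
    """Map-style dataset over all frames of one split (hypothetical helper)."""

    def __init__(self, split_dir):
        self.samples = []
        for path in sorted(Path(split_dir).glob("*.pkl")):
            with open(path, "rb") as handle:
                sequence = pickle.load(handle)
            # Keep frames in temporal order within each file.
            for frame_idx in sorted(sequence):
                self.samples.append(sequence[frame_idx])

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        record = self.samples[index]
        return {
            "human_joints_3d": record["human_joints_3d"],
            "spot_joints_3d": record["spot_joints_3d"],
        }
```

An instance such as `HarperFrames("./data/harper_3d_30/train")` could then be passed to a `DataLoader`. Whether the default collate function batches the joints as intended depends on how they are stored in the files, which is not specified here.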

Visualize the 3D panoptic data

To visualize the 3D panoptic data you can use the following code:

PYTHONPATH=. python tools/visualization/visualize_3d.py --pkl_file ./data/harper_3d_30/train/cun_act1_30hz.pkl

Important notes regarding the annotations

Due to limitations of the Spot SDK at the time of recording, the 2D keypoint annotations were manually verified and, where necessary, corrected so that they are synchronized with the frame rate obtained from Spot. For the 3D pose estimation pipeline, we lift the 2D keypoints using the depth values (see key 'visibles_3d' in the annotations).
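The exact lifting procedure is not detailed here. As an illustration only, the sketch below uses standard pinhole back-projection, which may differ from what the pipeline actually does. The intrinsics `fx`, `fy`, `cx`, `cy`, the `depth[row][col]` layout, the metric depth units, and the `lift_keypoints` helper are all assumptions made for this example.

```python
def lift_keypoints(keypoints_2d, depth, fx, fy, cx, cy):
    """Back-project pixel keypoints to 3D camera coordinates (pinhole model).

    keypoints_2d: iterable of (u, v) pixel coordinates (u = column, v = row).
    depth: 2D array-like indexed as depth[row][col]; units assumed metric.
    Returns one (x, y, z) tuple per keypoint, or None where depth is invalid.
    """
    height = len(depth)
    width = len(depth[0])
    points = []
    for u, v in keypoints_2d:
        col, row = int(round(u)), int(round(v))
        if not (0 <= row < height and 0 <= col < width):
            points.append(None)  # keypoint falls outside the depth map
            continue
        z = depth[row][col]
        if z <= 0:
            points.append(None)  # no valid depth reading at this pixel
            continue
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points
```

Keypoints without a valid depth reading are returned as `None`, so the caller can decide how to treat joints that could not be lifted.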

harper's Issues

How to calculate "last frame average" in TABLE Ⅲ

Dear author, thanks for your great work!

I want to know how to calculate "last frame average" in TABLE Ⅲ. In the paper, it is described as "the average over the last frame of each action instance (Last frame average).", but I don't really understand. Does it
Could you explain it for me or offer the code about evaluation?

Besides, I want to know how to get the data used for the HPF benchmark.
I failed to obtain the same number of sequences described in the paper: "For 3D-HPF, we sampled 7917 sequences (of 20 frames each) for the training set and 3088 for the test set".

Thanks for your help!

Inquiry regarding access to RGB videos from External RGB Camera

Hi,

Thank you for your outstanding work! I believe your work can greatly benefit HRC studies.

I have a few questions about the structure and names of the directories in your OneDrive link:

  • Do the HARPER - 3D panoptic - high fps and HARPER - 3D only directories contain the same 3D keypoint data?
  • In the HARPER directory, are the single zips and harper.tar.gz the same?
  • It seems that single zips and harper.tar.gz consist of videos from Spot's stereo cameras (5 Greyscale + Depth and 1 RGB-D). Could you please let me know where I can find the RGB videos captured by the External RGB Camera?

I would greatly appreciate any clarification you can provide. Thank you in advance for your help!

Best regards,

Baseline Code

Hello, can you provide the baseline code for 3D pose estimation and motion prediction in the paper?
