
H3WB: Human3.6M 3D WholeBody Dataset and Benchmark

This is the official repository for the paper "H3WB: Human3.6M 3D WholeBody Dataset and Benchmark" (ICCV'23). The repo contains the Human3.6M 3D WholeBody (H3WB) annotations proposed in the paper.

For the 3D whole-body benchmark and results please refer to benchmark.md.

🆕Updates

Table of Contents

What is H3WB

H3WB is a large-scale dataset for 3D whole-body pose estimation. It is an extension of the Human3.6M dataset and contains 133 whole-body keypoint annotations (17 for the body, 6 for the feet, 68 for the face and 42 for the hands) on 100K images. The skeleton layout is the same as in the COCO-WholeBody dataset. Extensions to other popular 3D pose estimation datasets are ongoing, and we already have annotations for Total Capture. If you want your favorite multi-view dataset to get whole-body 3D annotations, let us know!

Example annotations:

Layout from COCO-WholeBody: Image source.

H3WB Dataset

Download

  • Images can be downloaded from the official site of the Human3.6M dataset. We provide a data preparation script to compile the Human3.6M videos into images, which establishes the correct correspondence between images and annotations.

  • The annotations for H3WB can be downloaded from here; by default they are placed under datasets/json/.

  • The annotations for T3WB can be downloaded from here.

  • You can also download the H3WB dataset in a format commonly used for 3D pose estimation tasks here. We provide an accompanying data preparation class for this format, and we highly recommend this format for your experiments. The util files camera.py, mocap_dataset.py and skeleton.py are taken directly from the VideoPose3D repository.
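
As a quick sanity check after downloading, the npz archive can be inspected with NumPy. A minimal sketch; the key names global_3d and camera_3d are assumptions based on user reports about this format, so verify them against the data preparation class:

```python
import numpy as np

# Inspect the npz-format download; key names are assumptions to verify.
data = np.load('h3wb_train.npz', allow_pickle=True)
print(data.files)  # e.g. ['global_3d', 'camera_3d', ...]
for key in data.files:
    arr = data[key]
    print(key, getattr(arr, 'shape', type(arr)))
```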

Annotation format

Every json follows the structure below, but not every json contains all of these values; see the Tasks section.

XXX.json --- sample id --- 'image_path'
                        |
                        |- 'bbox' --- 'x_min'
                        |          |- 'y_min'
                        |          |- 'x_max'
                        |          |- 'y_max'
                        |
                        |- 'keypoint_2d' --- joint id --- 'x'
                        |                              |- 'y'
                        |
                        |- 'keypoint_3d' --- joint id --- 'x'
                                                       |- 'y'
                                                       |- 'z'

We also provide a script to load json files.
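
If you prefer to parse the annotations directly, here is a minimal sketch following the structure above (an illustrative helper, not the official loader; which fields are present varies by task):

```python
import json

def load_h3wb_json(path):
    """Parse an H3WB annotation file following the documented layout.

    Each sample id maps to some subset of 'image_path', 'bbox',
    'keypoint_2d' and 'keypoint_3d' (133 joints each).
    """
    with open(path) as f:
        raw = json.load(f)
    samples = {}
    for sid, ann in raw.items():
        entry = {}
        if 'image_path' in ann:
            entry['image_path'] = ann['image_path']
        if 'bbox' in ann:
            b = ann['bbox']
            entry['bbox'] = (b['x_min'], b['y_min'], b['x_max'], b['y_max'])
        if 'keypoint_2d' in ann:
            # joint id -> {'x': ..., 'y': ...}; joint order follows the file
            entry['keypoint_2d'] = [(j['x'], j['y'])
                                    for j in ann['keypoint_2d'].values()]
        if 'keypoint_3d' in ann:
            entry['keypoint_3d'] = [(j['x'], j['y'], j['z'])
                                    for j in ann['keypoint_3d'].values()]
        samples[sid] = entry
    return samples
```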

Pretrained models

H3WB comes with the pretrained models that were used to create the datasets. Model implementations can be found in the 'models/' folder. Please find the checkpoints in the table below:

| Dataset | Completion | Diffusion Hands | Diffusion Face |
|---------|------------|-----------------|----------------|
| H3WB    | ckpt       | ckpt            | ckpt           |

Pretrained models for the different tasks of the benchmark can be found in benchmark.md.

Tasks

We propose 3 different tasks along with the 3D WholeBody dataset:

2D → 3D: 2D complete whole-body to 3D complete whole-body lifting

  • Use 2Dto3D_train.json for training and validation. It contains 80k 2D and 3D keypoints.

  • Use 2Dto3D_test_2d.json for the leaderboard test. It contains 10k 2D keypoints.

I2D → 3D: 2D incomplete whole-body to 3D complete whole-body lifting

  • Use 2Dto3D_train.json for training and validation. It contains 80k 2D and 3D keypoints.

  • Please apply the masking yourself during training. The official masking strategy is as follows (a minimal sketch implementing it appears after this task's bullet list):

    • With 40% probability, each keypoint has a 25% chance of being masked,
    • with 20% probability, the face is entirely masked,
    • with 20% probability, the left hand is entirely masked,
    • with 20% probability, the right hand is entirely masked.
  • Use I2Dto3D_test_2d.json for the leaderboard test. It contains 10k 2D keypoints. Note that this test set is different from 2Dto3D_test_2d.json.
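
As promised above, a minimal sketch of the official masking strategy. The COCO-WholeBody index ranges (body 0-16, feet 17-22, face 23-90, left hand 91-111, right hand 112-132) and the zero-filling convention are assumptions to adapt to your pipeline:

```python
import numpy as np

def mask_2d(kpts, rng=np.random):
    """Apply the official I2D -> 3D masking strategy to a (133, 2) array."""
    kpts = kpts.copy()
    p = rng.rand()
    if p < 0.4:
        # each keypoint is independently masked with 25% probability
        kpts[rng.rand(133) < 0.25] = 0.0
    elif p < 0.6:
        kpts[23:91] = 0.0     # face entirely masked
    elif p < 0.8:
        kpts[91:112] = 0.0    # left hand entirely masked
    else:
        kpts[112:133] = 0.0   # right hand entirely masked
    return kpts
```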

RGB → 3D: Image to 3D complete whole-body prediction

  • Use RGBto3D_train.json for training and validation. It contains 80k image paths, bounding boxes and 3D keypoints.
  • It has the same samples as 2Dto3D_train.json, so you can also access the 2D keypoints if needed.
  • Use RGBto3D_test_img.json for the leaderboard test. It contains 20k image paths and bounding boxes.
  • Note that the test sample ids are not aligned with the previous two tasks.

Evaluation

Validation

We do not provide a validation set. We encourage researchers to report 5-fold cross-validation results with average and standard deviation values.

Evaluation on test set

We have released the test sets of the H3WB dataset.

Both the 2D → 3D and I2D → 3D test sets contain 10k triplets of {image, 2D coordinates, 3D coordinates}. Note that, to prevent cheating on the I2D → 3D task, the two tasks use different test samples.

Visualization

We provide a function to visualize 3D whole-body poses, as well as the evaluation function used for the leaderboard, in this script.
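
For local validation, here is a minimal sketch of MPJPE (mean per-joint position error, the metric reported in the paper); the official leaderboard evaluation is the function in the script above:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth joints.

    pred, gt: (N, 133, 3) arrays in the same unit (the paper reports mm).
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```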

Benchmark

Please refer to benchmark.md for the benchmark results.

Terms of Use

  1. This project is released under the MIT License.

  2. We do not own the copyright of the images. Use of the images must abide by the Human3.6M license agreement.

How to cite

If you find the H3WB 3D WholeBody dataset useful for your project, please cite our paper as follows.

Yue Zhu, Nermin Samet, David Picard, "H3WB: Human3.6M 3D WholeBody Dataset and Benchmark", ICCV, 2023.

BibTeX entry:

@InProceedings{Zhu_2023_ICCV,
    author    = {Zhu, Yue and Samet, Nermin and Picard, David},
    title     = {H3WB: Human3.6M 3D WholeBody Dataset and Benchmark},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {20166-20177}
}

Please also consider citing the following works.

@article{h36m_pami,
 author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
 title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
 journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
 publisher = {IEEE Computer Society},
 year = {2014}
} 
 
@inproceedings{IonescuSminchisescu11,
 author = {Catalin Ionescu and Fuxin Li and Cristian Sminchisescu},
 title = {Latent Structured Models for Human Pose Estimation},
 booktitle = {International Conference on Computer Vision},
 year = {2011}
}


wholebody3d's Issues

data normalization

Hello, I ran into some questions during data processing.
How can I normalize the positions_3d data?
The position_2d data can be mapped to [-1, 1] using the camera parameters res_w and res_h, but what is the corresponding maximum value for the 3D data?
I'm looking forward to your response, thank you very much.
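
For context, the [-1, 1] conversion mentioned above follows the screen-coordinate normalization from VideoPose3D (the source of the util files used by this repo); a minimal sketch:

```python
import numpy as np

def normalize_screen_coordinates(x, w, h):
    """Map 2D pixel coordinates so x spans [-1, 1], preserving aspect ratio."""
    assert x.shape[-1] == 2
    return x / w * 2 - np.array([1, h / w])
```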

Module import

Hello, your code contains "from common.utils import wrap" and "from common.quaternion import qrot, qinverse". How should these be imported? There is no "common" module in the source code.

Question about the task IMGTo3D

In this task, should CPN (Cascaded Pyramid Network) always supply complete 2D keypoints? When it should but cannot handle some actions (for example, under limb occlusion), can I use the model from the I2D → 3D task as a follow-up?
The second question: when a keypoint is masked, will its value be filled with zero?

NumPy data for test set

Hello,
could you also please provide the NumPy file for the test set (2D data)? I just want to make sure that the final test set uses the exact same normalization (if any) as the training NumPy file.

About the test set

First of all, thank you very much for providing such an important dataset.
I would like to obtain the RGBto3D_test_3d.json file; could you please share it?

data

Can you tell me how to match the 3D coordinates in the json files with the images in Human3.6M?

Inference script

Do you have a script to perform 2D/3D pose estimation on images?

Are the visualized results correct?

Thanks for your great work!

I visualized an image with your annotations; the results are shown below. Is this correct or not?
[screenshot]

This is my visualization code:

from PIL import Image
from utils.utils import json_loader_part, json_loader, draw_skeleton


data_dir = '/data/dataset/whole-body/human3.6m/annotations'
keypoints_2d, keypoints_3d = json_loader_part(data_dir)
input_list, target_list, bbox_list = json_loader(data_dir, task=3, type='train')

i = 100
img = Image.open('/data/dataset/whole-body/human3.6m/' + input_list[i])
save_path = '1.jpg'

draw_skeleton(keypoints_2d[i], save_path=save_path, background=img)

Question about the 3D coordinate system

I have read #5 (comment), but I don't understand the math formula relating camera coordinates and world coordinates shown there. Is there any introduction to this? I'm really new to this area.
Thanks a lot.
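
For reference, the formula in question is the standard rigid transform between world and camera coordinates. A minimal sketch, assuming R is the world-to-camera rotation and T the camera center in world coordinates (conventions vary between camera-parameter sources, so check yours):

```python
import numpy as np

def world_to_camera(X_world, R, T):
    """X_cam = R (X_world - T), for row-vector points of shape (N, 3)."""
    return (X_world - T) @ R.T

def camera_to_world(X_cam, R, T):
    """Inverse transform: X_world = R^T X_cam + T."""
    return X_cam @ R + T
```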

Inference model

Hi! I cannot figure out how to run inference with the model on my own data and visualize the results. Thanks.

Which file, global_3d or camera_3d, should be used to train the model?

I have looked at your dataset. It is very interesting, and I would like to use it.

My question: I checked the contents of the h3wb_train.npz file and found that it contains two types of 3D pose data, global_3d and camera_3d. What is the difference between them, and which one should be used for training a 3D pose estimation model?
We are looking forward to your answer.

Question about 3d keypoint data.

When using keypoints_3d in RGBto3D_train.json, I found that x or y values can be -1400 or less (body part, joints 0-16). Is the unit of measurement millimeters? Where is the origin of the coordinates? I could not find a clear answer in the data description.
If anyone can help me, thank you very much.

Willing to perform motion forecasting

Hi, really nice work; it was a pleasure to see you present it. I would like to work on motion forecasting. Have you tried any model for this task, and which .json file should I use? Thanks.

About the difference between two task

Task "2D → 3D" is based on single frame input & single frame output
While task "I2D → 3D" is based on multi-frames(or frame sequence) input & single frame output.
Is it right?

Facing Error with dimensions

Hi, I have tried to use the following SimpleBaseline model, but it raises an error.

Error:
```
torch.Size([16, 1, 266])
Traceback (most recent call last):
  File ".\wholebody3d\models\SimpleBaseline.py", line 174, in <module>
    avg_loss = train_one_epoch(epoch_number, 0, training_loader, model)
  File ".\wholebody3d\models\SimpleBaseline.py", line 56, in train_one_epoch
    outputs = model(inputs)
  File ".\wholebody3d\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".\wholebody3d\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File ".\wholebody3d\models\SimpleBaseline.py", line 27, in forward
    x1 = nn.Dropout(p=0.5)(nn.ReLU()(self.bn1(self.fc1(x))))
  File ".\wholebody3d\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".\wholebody3d\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File ".\wholebody3d\venv\lib\site-packages\torch\nn\modules\batchnorm.py", line 171, in forward
    return F.batch_norm(
  File ".\wholebody3d\venv\lib\site-packages\torch\nn\functional.py", line 2478, in batch_norm
    return torch.batch_norm(
RuntimeError: running_mean should contain 1 elements not 1024

Process finished with exit code 1
```

Edited Code:
```python
import os
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

from datetime import datetime
from utils import json_loader_part, draw_skeleton


class model_A_simple_yet_effective_baseline_for_3d_human_pose_estimation(nn.Module):
    def __init__(self):
        super(model_A_simple_yet_effective_baseline_for_3d_human_pose_estimation, self).__init__()
        self.upscale = nn.Linear(1332, 1024)
        self.fc1 = nn.Linear(1024, 1024)
        self.bn1 = nn.BatchNorm1d(1024)
        self.fc2 = nn.Linear(1024, 1024)
        self.bn2 = nn.BatchNorm1d(1024)
        self.fc3 = nn.Linear(1024, 1024)
        self.bn3 = nn.BatchNorm1d(1024)
        self.fc4 = nn.Linear(1024, 1024)
        self.bn4 = nn.BatchNorm1d(1024)
        self.outputlayer = nn.Linear(1024, 133 * 3)

    def forward(self, x):
        x = self.upscale(x)
        x1 = nn.Dropout(p=0.5)(nn.ReLU()(self.bn1(self.fc1(x))))
        x1 = nn.Dropout(p=0.5)(nn.ReLU()(self.bn2(self.fc2(x1))))
        x = x + x1
        x1 = nn.Dropout(p=0.5)(nn.ReLU()(self.bn3(self.fc3(x))))
        x1 = nn.Dropout(p=0.5)(nn.ReLU()(self.bn4(self.fc4(x1))))
        x = x + x1
        x = self.outputlayer(x)
        return x


def train_one_epoch(epoch_index, tb_writer, training_loader, model):
    running_loss = 0.
    last_loss = 0.

    for i, data in enumerate(training_loader):
        # Every data instance is an input + label pair
        inputs, labels = data
        # Zero your gradients for every batch!
        optimizer.zero_grad()
        # Make predictions for this batch
        outputs = model(inputs)
        # Compute the loss and its gradients
        loss = loss_fn(outputs, labels)
        loss.backward()
        # Adjust learning weights
        optimizer.step()
        # Gather data and report
        running_loss += loss.item()
        if i % 1000 == 999:
            last_loss = running_loss / 1000  # loss per batch
            print('  batch {} loss: {}'.format(i + 1, last_loss))
            tb_x = epoch_index * len(training_loader) + i + 1
            # tb_writer.add_scalar('Loss/train', last_loss, tb_x)
            running_loss = 0.

    return last_loss


class CustomDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        return self.features[index], self.labels[index]


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model_A_simple_yet_effective_baseline_for_3d_human_pose_estimation().to(device)

    print("====================Simulation================")
    data_path = '.\wholebody3d\datasets\json'
    input_list_train, target_list_train = json_loader_part(data_path, 1)
    input_list_test, target_list_test = json_loader_part(data_path, 2)

    input_list_train = torch.cat(input_list_train, dim=0)
    input_list_train = torch.reshape(input_list_train, (input_list_train.shape[0], 1, -1))

    target_list_train = torch.cat(target_list_train, dim=0)
    target_list_train = torch.reshape(target_list_train, (target_list_train.shape[0], 1, -1))

    input_list_test = torch.cat(input_list_test, dim=0)
    input_list_test = torch.reshape(input_list_test, (input_list_test.shape[0], 1, -1))

    target_list_test = torch.cat(target_list_test, dim=0)
    target_list_test = torch.reshape(target_list_test, (target_list_test.shape[0], 1, -1))

    print("============Creating a DataLoader===========")
    train_dataset = CustomDataset(input_list_train, target_list_train)
    validation_dataset = CustomDataset(input_list_test, target_list_test)

    print("Create data loaders for our datasets; shuffle for training, not for validation")
    batch_size = 16
    training_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=False)
    validation_loader = torch.utils.data.DataLoader(validation_dataset, batch_size=batch_size, shuffle=False)

    print("============ Selecting Loss function ===========")
    loss_fn = torch.nn.CrossEntropyLoss()

    print("============Selecting an Optimizer===========")
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

    print("============Starting Epoch ===========")
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    epoch_number = 0

    EPOCHS = 2

    best_vloss = 1_000_000.

    for epoch in range(EPOCHS):
        print('EPOCH {}:'.format(epoch_number + 1))
        model.train(True)
        avg_loss = train_one_epoch(epoch_number, 0, training_loader, model)
        running_vloss = 0.0

        model.eval()
        with torch.no_grad():
            for i, vdata in enumerate(validation_loader):
                vinputs, vlabels = vdata
                voutputs = model(vinputs)
                vloss = loss_fn(voutputs, vlabels)
                running_vloss += vloss

        avg_vloss = running_vloss / (i + 1)
        print('LOSS train {} valid {}'.format(avg_loss, avg_vloss))

        # Track best performance, and save the model's state
        if avg_vloss < best_vloss:
            best_vloss = avg_vloss
            model_path = 'model_{}_{}'.format(timestamp, epoch_number)
            torch.save(model.state_dict(), model_path)

        epoch_number += 1
```
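
A note on the traceback above: nn.BatchNorm1d interprets a 3-D input of shape [N, C, L] as C channels, so with inputs reshaped to [batch, 1, features] the layer (built with 1024 features) sees only one channel, which produces exactly this running_mean mismatch. One possible (untested) fix is to flatten the singleton dimension before the first linear layer:

```python
# inside forward(), before self.upscale (untested suggestion):
x = x.view(x.size(0), -1)  # [batch, 1, F] -> [batch, F]
```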

Issue about the data preparation script (data path issue)

Hello author, this is great work and it brings many possibilities to 3D pose estimation. We want to use this dataset to train our model, but we encountered a problem while using it. The image path in your annotations is 'S5/Images/Directions 2.60457274/frame_0125.jpg'. Is there any way to convert this path into the 'h36m/images/s_01_act_02_subact_01_ca_01' format? I also use other H36M annotations, so I need to unify the img_path across annotations.

about the hand dataset

Hello, could you please tell me how to get the small hand datasets you mentioned in the paper?

How to train the model

Your dataset is very meaningful for 3D whole-body HPE, and I would like to use it in my research.

My question is how to train the model. The code in test_leaderboard.py is:

cross_validation = 0
for i in range(6):
    if 'cv' + str(i) in data_path:
        cross_validation = i

When 'cv' appears in data_path, the code will train the model, but there is no file whose name contains 'cv' in json.zip.

We are looking forward to your answer.

Could you provide a script for projecting 3D keypoints from RGBto3D_train.json to 2D coordinate points of the image?

First of all, many thanks to the authors for providing such an important dataset.

But I still have questions about how to project the 3D keypoints in RGBto3D_train.json to the 2D coordinates of the image; it has been confusing me all day.

Here is my code below:
[screenshot of projection code]
(the extrinsics and intrinsics are obtained from https://github.com/karfly/human36m-camera-parameters/blob/master/camera-parameters.json)

But I got the unexpected result below:
[screenshot of incorrect projection result]

Looking forward to your reply!
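
For reference, a minimal pinhole-projection sketch under common Human3.6M conventions (R world-to-camera rotation, t translation, K intrinsics from the linked camera-parameters.json; lens distortion omitted). These conventions, and whether the 3D keypoints are in world or camera coordinates, are assumptions to verify:

```python
import numpy as np

def project_points(X_world, R, t, K):
    """Project (N, 3) world-space points to (N, 2) pixel coordinates.

    Assumes X_cam = R @ X_world + t and intrinsics
    K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    X_cam = X_world @ R.T + t     # world -> camera frame
    x = X_cam @ K.T               # camera frame -> homogeneous image coords
    return x[:, :2] / x[:, 2:3]   # perspective divide
```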

Can you provide the code for training `Large SimpleBaseline` for 3D whole body lift from complete 2D whole body keypoints (2D->3D)?

First, many thanks to the authors for providing such a rich dataset for 3D whole-body keypoint detection.
I used the dataset you provided to implement a 2D hand keypoint lifting task, where the input is the 42 x 2 2D coordinates of both hands and the output is 41 x 3 3D relative coordinates (relative to the left hand root).
I used the SimpleBaseline model provided by mmpose for training, with 2Dto3D_train_part1,2,3.json as the training set and 2Dto3D_train_part4.json as the test set, but the MPJPE on the test set was 1.678072452545166 (m), which is very different from the result in your paper.
If it is convenient, could you share your implementation? Here is my email address ([email protected]); I am looking forward to your reply.

About the dataset naming

Hello, thank you for providing such rich resources.

The image_path in this dataset looks like 'S6/Images/Greeting 1.60457274/frame_1610.jpg'.
S6 is the subject index from the original Human3.6M, and Greeting is the action.
However, the original Human3.6M also has subactions, and the image_path carries no subaction annotation.
Also, what does the 1.60457274 in the image_path refer to?

The coordinates of keypoints_3d

Thanks for your great work! I am trying to project the 3D keypoints provided in RGBto3D_train.json using the camera parameters provided in the meta data. Are the 3D keypoints in RGBto3D_train.json in camera coordinates or world coordinates? Thank you.

Data loading

Hello, I am using this dataset and would like to ask: the original H36M dataset has a cameras.pkl for training and testing; does this dataset provide a similar pkl?

Issue about input of task (I)2D->3D

[screenshot from 2024-04-07 16-25-42]
The annotation on the first line indicates that the input of JointFormer should be in meters. When I obtain 2D coordinates from the inference of a 2D HPE model, how do I convert them from pixels to meters?
Besides, when I get the final 3D output of JointFormer, how do I restore it from meters to pixels? Is this the inverse of the above process?

human36m&wholebody3d

I visualized the 3D skeletal points of wholebody3d against the corresponding Human3.6M images, but they seem to be mirror-symmetric. Can you help me?

load testset

Hi, could you help me? Previous work used Human3.6M subjects S9 and S11 as the test set, but the test set you provide is S8. Also, its annotation format is not quite the same as the training set's. Could you provide code for loading the test set data?
