
Comments (17)

MLsmaller commented on June 2, 2024

I have been looking at your project for the past two days, but I have run into some problems. I am using a Kinect v2 camera, and I now want to test the depth images captured by this camera with the model you have trained.
Which files should I execute, and where do I need to change the code?
Thank you for your guidance.

l-j-oneil commented on June 2, 2024

Hi,

I am not the author of this code; I have just been experimenting with training a similar model, so please note I may not be 100% correct.

Firstly, I believe the files ICVL_center.txt and msra_center.txt just hold pre-calculated values for the centre point of each hand in the respective datasets' test images. Two functions, "get_center" and "get_center_fast", which generate such values, can be found in the utils folder: https://github.com/xinghaochen/Pose-REN/blob/master/src/utils/util.py
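
In essence, a hand-centre function of this kind boils down to taking the centroid of the depth pixels that fall within a foreground distance range. Below is a minimal illustrative sketch of that idea in NumPy; the function name and the [lower, upper) thresholds are placeholders, not the exact implementation in util.py.

import numpy as np

def estimate_hand_center(depth, lower=1, upper=650):
    # Illustrative only: centroid (u, v, d) of pixels whose depth lies in
    # [lower, upper) millimetres. The real get_center/get_center_fast may
    # handle invalid pixels and empty masks differently.
    mask = (depth >= lower) & (depth < upper)
    if not np.any(mask):
        return None  # no hand pixels found in the given range
    vs, us = np.nonzero(mask)   # row (v) and column (u) indices
    return np.array([us.mean(), vs.mean(), depth[mask].mean()], dtype=np.float32)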

The src/demo folder holds example code to run this on an Intel RealSense camera: https://github.com/xinghaochen/Pose-REN/blob/master/src/demo/realsense_realtime_demo_librealsense2.py

In order to get this up and running on a Kinect V2, you will need to rewrite the function "read_frame_from_device()" to call the respective Python API for the Kinect camera. Please note that the input images to the network are 96x96 in size and normalised to [-1, 1].
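
As a rough illustration, a Kinect V2 replacement for read_frame_from_device() could look like the sketch below. It assumes the pylibfreenect2 bindings for libfreenect2 are installed; the exact calls are written from memory and should be checked against the pylibfreenect2 examples. Kinect V2 depth frames are 512x424 and already in millimetres, so no depth_scale conversion is needed, and the 96x96 crop plus [-1, 1] normalisation are handled later by the model's own preprocessing.

# Hypothetical Kinect V2 capture via pylibfreenect2 (unverified sketch).
from pylibfreenect2 import Freenect2, SyncMultiFrameListener, FrameType

def init_kinect():
    fn = Freenect2()
    if fn.enumerateDevices() == 0:
        raise RuntimeError('no Kinect V2 device connected')
    device = fn.openDevice(fn.getDeviceSerialNumber(0))
    listener = SyncMultiFrameListener(FrameType.Depth)
    device.setIrAndDepthFrameListener(listener)
    device.start()
    return device, listener

def read_frame_from_device(listener):
    frames = listener.waitForNewFrame()
    depth = frames['depth'].asarray().copy()  # 512x424 float32, millimetres
    listener.release(frames)
    return depth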

I believe the author of DeepPrior++ tested their network using a Kinect and the OpenNI driver: https://github.com/moberweger/deep-prior-pp/blob/master/src/util/cameradevice.py

Hopefully this helps with your efforts.

xinghaochen commented on June 2, 2024

@l-j-oneil Thank you very much! Your comments are really helpful.

@MLsmaller I think the comments from @l-j-oneil should address most of your concerns. Here are some additional comments that you may find helpful:

  1. The centers are simply obtained by calculating the centroid of the pixels that fall within a predefined distance range. You can use https://github.com/guohengkai/region-ensemble-network/blob/master/evaluation/get_centers.py to obtain centers for the ICVL, NYU and MSRA datasets.
  2. To run the hand pose estimator using a Kinect V2, you also have to revise the intrinsic parameters accordingly. (365.456, 365.456, 254.878, 205.395) is probably suitable for most Kinect V2 cameras, but you can also retrieve the parameters from your own Kinect V2 using the official SDK; a short sketch of how these intrinsics are used follows this list.
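
For context, the intrinsics (fx, fy, ux, uy) are the focal lengths and principal point of the usual pinhole camera model, which the code uses to convert between depth-pixel coordinates and 3D points. A minimal sketch of that conversion (illustrative, not the project's own implementation; the sign convention for the y axis varies between implementations):

import numpy as np

# Kinect V2 intrinsics quoted above; replace with values read from your own device.
fx, fy, ux, uy = 365.456, 365.456, 254.878, 205.395

def pixel_to_world(u, v, d):
    # Back-project depth pixel (u, v) with depth d (mm) to a 3D point (mm).
    return np.array([(u - ux) * d / fx, (v - uy) * d / fy, d], dtype=np.float32)

def world_to_pixel(x, y, z):
    # Project a 3D point (mm) back to depth-image coordinates.
    return x * fx / z + ux, y * fy / z + uy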

MLsmaller commented on June 2, 2024

Thank you for your reply. If I only want to run this model to test some images, rather than as a real-time project, which functions should I run?
Thanks again.

xinghaochen commented on June 2, 2024

@MLsmaller You can start with realsense_realtime_demo_librealsense2.py and read the depth images from local files instead of from the camera.
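
For example (an illustrative snippet, not code from the repository, with a made-up path), a 16-bit depth PNG can be loaded and handed to the model roughly like this; cv2.IMREAD_ANYDEPTH keeps the original bit depth instead of converting to 8 bits:

import cv2
import numpy as np

depth_path = '/path/to/depth_image.png'  # hypothetical local depth image
depth = cv2.imread(depth_path, cv2.IMREAD_ANYDEPTH).astype(np.float32)
# then, as in the demo script:
# results, cropped_image = hand_model.detect_image(depth)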

MLsmaller commented on June 2, 2024

Thank you for your reply; I will try this tomorrow. I'm not very experienced in this field, so I may have more questions for you in the future, and I hope you can give me your advice. Thanks again.
By the way, I have your awesome-hand-pose-estimation project in my collection; it has helped me a lot.

MLsmaller commented on June 2, 2024

I know what you mean. I'll try it tomorrow. Thank you

MLsmaller commented on June 2, 2024

I initially thought the src/testing/run_images.py file was for testing images, but I didn't understand that file very well. By the way, is the src/testing/predict.py file used to train on a new dataset, or for something else?

xinghaochen commented on June 2, 2024
  • predict.py is used to generate predicted labels for the pretrained models.
  • run_images.py predicts labels for a given list of images. Actually, I think this script is somewhat redundant and I might remove it in the future.
  • realsense_realtime_demo_librealsense2.py is a good starting point for your own project.

MLsmaller commented on June 2, 2024

Hi, I tried your suggestion; below is the file I changed, based on the real-time demo file. However, the result I got is not accurate. Am I still missing some operation in the data-processing part?

Here is the .py file:

#-*- coding:utf-8 -*-
import logging
logging.basicConfig(level=logging.INFO)
import numpy as np
import cv2
#import pyrealsense2 as rs
import os
import sys
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(BASE_DIR)
sys.path.append(ROOT_DIR) # config
sys.path.append(os.path.join(ROOT_DIR, 'utils')) # utils
sys.path.append(os.path.join(ROOT_DIR, 'libs')) # libs
from model_pose_ren import ModelPoseREN
import util
from util import get_center_fast as get_center

def init_device():
    # Configure depth streams
    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    print('config')
    # Start streaming
    profile = pipeline.start(config)
    depth_sensor = profile.get_device().first_depth_sensor()
    depth_scale = depth_sensor.get_depth_scale()
    print "Depth Scale is: " , depth_scale
    return pipeline, depth_scale

def stop_device(pipeline):
    pipeline.stop()
    
def read_frame_from_device(pipeline, depth_scale):
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    #if not depth_frame:
    #    return None
    # Convert images to numpy arrays
    depth_image = np.asarray(depth_frame.get_data(), dtype=np.float32)
    depth = depth_image * depth_scale * 1000
    return depth

def show_results(img, results, cropped_image, dataset):
    img = np.minimum(img, 1500)
    img = (img - img.min()) / (img.max() - img.min())
    img = np.uint8(img*255)
    # draw cropped image
    img[:96, :96] = (cropped_image+1)*255/2
    img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    cv2.rectangle(img, (0, 0), (96, 96), (255, 0, 0), thickness=2)
    img_show = util.draw_pose(dataset, img, results)
    return img_show

def main():
    # intrinsic paramters of Kinect V2
    #fx, fy, ux, uy = 365.456, 365.456, 254.878, 205.395  #kinect v2
    fx, fy, ux, uy = 463.889, 463.889, 320, 240
    # paramters
    dataset = 'icvl'
    if len(sys.argv) == 2:
        dataset = sys.argv[1]

    lower_ = 1
    upper_ = 650     

    # init realsense
    #pipeline, depth_scale = init_device()
    # init hand pose estimation model
    hand_model = ModelPoseREN(dataset,
        lambda img: get_center(img, lower=lower_, upper=upper_),
        param=(fx, fy, ux, uy), use_gpu=True)
    # for msra dataset, use the weights for first split
    if dataset == 'msra':
        hand_model.reset_model(dataset, test_id = 0)
    # realtime hand pose estimation loop
    #depth = read_frame_from_device(pipeline, depth_scale)
    icvl_path="/home/data/cy/ICVL/test/Depth/test_seq_1/image_0050.png"
    # preprocessing depth
    depth=cv2.imread(icvl_path,2)

    depth=np.asarray(depth,np.float32)
    depth[depth == 0] = depth.max()
    print(depth)
    # training samples are left hands in icvl dataset,
    # right hands in nyu dataset and msra dataset,
    # for this demo you should use your right hand
    if dataset == 'icvl':
        depth = depth[:, ::-1]  # flip
    # get hand pose
    results, cropped_image = hand_model.detect_image(depth)
    img_show = show_results(depth, results, cropped_image, dataset)
    cv2.imwrite('./test.png', img_show)
    #stop_device(pipeline)

if __name__ == '__main__':
    main()

----I read an image from the ICVL dataset (a depth image) instead of reading from the camera; the detection result for this image is shown below.

[image: detection result on the ICVL test image]

By the way, I don't understand what depth = depth_image * depth_scale * 1000 means in the function read_frame_from_device(). What does the parameter depth_scale mean, and could that be the problem?
I sincerely hope to get your reply.

xinghaochen commented on June 2, 2024

Hi,

  1. depth_scale is a parameter related to the RealSense SR300 camera (it converts the raw depth units to metres, so multiplying by 1000 gives millimetres). Since you are not reading depth images from the camera, you don't need to use this parameter.
  2. There are several problems with your code when dealing with the ICVL dataset:
fx, fy, ux, uy = 463.889, 463.889, 320, 240

These parameters are for the Kinect V2. Since you are using depth images from the ICVL dataset, you should use (240.99, 240.96, 160, 120), which are the parameters of the camera used to capture the ICVL dataset.

if dataset == 'icvl':
depth = depth[:, ::-1] # flip

You don't have to flip the depth images that come from the ICVL dataset.
In fact, we flip the depth images from the RealSense camera when using the ICVL pre-trained models only because the images in the ICVL dataset are flipped.
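
Concretely, the two changes suggested above amount to something like the following in your script (a sketch using the parameters quoted in this comment):

# ICVL test images: use the ICVL camera intrinsics, not the Kinect V2 ones
fx, fy, ux, uy = 240.99, 240.96, 160, 120

# ...and skip the horizontal flip for ICVL images; they are already in the
# orientation the pre-trained model expects (the demo only flips live
# RealSense frames to match the ICVL training data).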

MLsmaller commented on June 2, 2024

Thank you. After adopting your suggestion, detection on the ICVL dataset works successfully. Why does it not work for the NYU dataset?
-----This is my code:

def main():
    # intrinsic paramters of Kinect V2
    #fx, fy, ux, uy =  365.456, 365.456, 254.878, 205.395  #kinect v2
    #fx, fy, ux, uy = 240.99, 240.96, 160, 120
    
    # paramters
    dataset = 'icvl'
    fx, fy, ux, uy = util.get_param(dataset)
    if len(sys.argv) == 2:
        dataset = sys.argv[1]
        print("the model of data is {}".format(sys.argv[1]))

    lower_ = 1
    upper_ = 650     # within the 0-650 mm range

    # init realsense
    #pipeline, depth_scale = init_device()
    # init hand pose estimation model
    hand_model = ModelPoseREN(dataset,
        lambda img: get_center(img, lower=lower_, upper=upper_),
        param=(fx, fy, ux, uy), use_gpu=True)
    # for msra dataset, use the weights for first split
    if dataset == 'msra':
        hand_model.reset_model(dataset, test_id = 0)
    # realtime hand pose estimation loop
    #depth = read_frame_from_device(pipeline, depth_scale)
    icvl_path="/home/data/cy/ICVL/test/Depth/test_seq_1/image_0500.png"
    nyu_path="/home/data/cy/NYU/dataset/test/depth_1_0000001.png"
    print(dataset)
    # preprocessing depth
    #depth=cv2.imread(icvl_path,2)

    #depth=np.asarray(depth,np.float32)
    #depth[depth == 0] = depth.max()
    depth=util.load_image(dataset, nyu_path, is_flip=False)
    print(depth,type(depth))
    # training samples are left hands in icvl dataset,
    # right hands in nyu dataset and msra dataset,
    # for this demo you should use your right hand
    #if dataset == 'nyu':
        #depth = depth[:, ::-1]  # flip
    # get hand pose
    results, cropped_image = hand_model.detect_image(depth)
    img_show = show_results(depth, results, cropped_image, dataset)
    cv2.imwrite('./test.png', img_show)
    #stop_device(pipeline)

if __name__ == '__main__':
    main()

This is the warning:
[image: warning message]

And the result does not draw the keypoints of the hand.

[image: detection result without hand keypoints]
Sorry to bother you again.

xinghaochen commented on June 2, 2024

That's because the hand is not properly segmented. Cropping the hand region from the original depth image works a bit differently for different datasets.
If you are dealing with the predefined datasets (ICVL, NYU, MSRA15), I suggest using predict.py to get the predicted results.

MLsmaller commented on June 2, 2024

Now I know what you mean; I was just testing the NYU dataset. My goal is to test depth images obtained from a Kinect v2. Could you please tell me whether I can directly read the depth images I have saved from the Kinect, just as I did when testing the ICVL dataset, and whether I have to choose the NYU model (because that dataset was captured with a Kinect camera)?
What confuses me is that I don't know what changes need to be made to the code when testing a depth image obtained from the Kinect, because the depth map I captured looks very different from the depth images in the dataset: the depth image I saved is very dark and my hand is not clearly visible.

xinghaochen commented on June 2, 2024

First of all, you can use any of the pre-trained models (ICVL, NYU, MSRA or HANDS17); see some examples here.

If you want to predict hand pose for depth images captured from Kinect, all you have to do is capture a depth image from the camera (perhaps save it to local disk) and preprocess it before feeding it into the Pose-REN model.

As for the preprocessing, you can use the code from realsense_realtime_demo_librealsense2.py. Again, you need to change the intrinsic parameters to those of the Kinect, and you may also have to change the depth threshold, since our code simply uses a naive depth-thresholding algorithm to segment the hand.
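
To illustrate what such naive depth thresholding amounts to (a simplified sketch, not the exact code in the demo), the hand is assumed to be the only object within a fixed distance band, and everything outside that band is pushed to the background:

import numpy as np

def segment_hand(depth, lower=1, upper=650):
    # Keep only pixels within [lower, upper) millimetres; everything else,
    # including invalid zero-depth pixels, is treated as far background.
    segmented = depth.astype(np.float32).copy()
    background = (segmented < lower) | (segmented >= upper)
    segmented[background] = upper
    return segmented

The lower_ and upper_ values in the demo script are thresholds of this kind.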

The depth image you saved most likely looks black because it is saved in 16-bit format, and common image viewers will display such an image as black.
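
If you just want to inspect such an image, you can rescale it to 8 bits for display (an illustrative snippet with a made-up path, assuming the PNG stores depth in millimetres):

import cv2
import numpy as np

depth = cv2.imread('/path/to/kinect_depth.png', cv2.IMREAD_ANYDEPTH).astype(np.float32)
near, far = 500.0, 1500.0                      # display range in mm; adjust to your scene
vis = np.clip((depth - near) / (far - near), 0.0, 1.0)
cv2.imwrite('depth_vis.png', np.uint8(vis * 255))  # viewable in a normal image viewer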

MLsmaller commented on June 2, 2024

Thank you very much for your reply. Now I can run detection on the depth images I got from the Kinect, but I don't know how to use a Python Kinect API for real-time detection. The code you provided is for the RealSense SR300 camera; could you please provide some guidance? Is there any open-source project that does real-time detection with a Kinect camera, written in C++ or Python?
By the way, your awesome-hand-pose-estimation project has been a great help to me. Are there any open-source projects in it that do real-time detection with a Kinect? My idea is similar to the open-source project Sphere-Meshes for Real-Time Hand Modeling and Tracking: from real-time detection of two hands, or from a video of two hands, generate a relatively robust hand model. However, that project requires wearing a blue wristband and only handles one hand. Which open-source projects would be most helpful for my project?
I sincerely hope to get your answer again.

xinghaochen commented on June 2, 2024

If you want to use C++, here is a demo for using Pose-REN in C++.

If you want to use Python, as @l-j-oneil mentioned in this issue, you can take a look at DeepPrior++. They also provide a heuristic method for detecting hands, which is better than the naive depth thresholding method we used.

I believe the author of DeepPrior++ tested their network using a Kinect and the OpenNI driver: https://github.com/moberweger/deep-prior-pp/blob/master/src/util/cameradevice.py
