I have been looking at your project for the past two days, but I have run into some problems. I am using a Kinect V2 camera, and I want to test depth images captured by this camera with the model you have trained.
Which files should I execute, and where should I change the code?
Thank you for your guidance.
from pose-ren.
Hi,
I am not the author of this code; I have just been experimenting with training a similar model, so please note that I may not be 100% correct.
Firstly, I believe the files ICVL_center.txt and msra_center.txt just hold pre-calculated centre points for each hand in the respective datasets' test images. Two functions, "get_center" and "get_center_fast", in src/utils/util.py will generate such values: https://github.com/xinghaochen/Pose-REN/blob/master/src/utils/util.py
The src/demo folder holds example code to run this on an Intel RealSense camera: https://github.com/xinghaochen/Pose-REN/blob/master/src/demo/realsense_realtime_demo_librealsense2.py
To get this up and running on a Kinect V2, you will need to rewrite the function "read_frame_from_device()" so that it calls the respective Python API for the Kinect camera. Please note that the input images to the network are 96x96 in size and normalised to the range [-1, 1].
I believe the author of DeepPrior++ tested their network using a Kinect and the OpenNI driver: https://github.com/moberweger/deep-prior-pp/blob/master/src/util/cameradevice.py
That should help with your efforts.
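To make the [-1, 1] normalisation mentioned above concrete, here is a minimal sketch. It is not the repository's own preprocessing: the function name normalize_crop and the cube half-size of 150 mm are assumptions chosen for illustration, and the resize to 96x96 is left out.

```python
import numpy as np

def normalize_crop(crop_mm, center_depth_mm, cube_half_mm=150.0):
    """Map a cropped depth patch (millimetres) to the range [-1, 1].

    cube_half_mm is a hypothetical half-size of the depth cube around the hand.
    """
    crop = np.asarray(crop_mm, dtype=np.float32)
    # Treat missing pixels (value 0) as background at the far side of the cube.
    crop = np.where(crop == 0, center_depth_mm + cube_half_mm, crop)
    # Clamp to the cube, then scale so the hand centre maps to 0.
    crop = np.clip(crop, center_depth_mm - cube_half_mm, center_depth_mm + cube_half_mm)
    return (crop - center_depth_mm) / cube_half_mm

patch = np.array([[0, 500], [650, 800]], dtype=np.float32)
norm = normalize_crop(patch, center_depth_mm=500.0)
```

Pixels at the hand centre depth end up at 0, and everything at or beyond the far side of the cube ends up at 1.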
from pose-ren.
@l-j-oneil Thank you very much! Your comments are really helpful.
@MLsmaller I think the comments from @l-j-oneil should address most of your concerns. Here are some additional comments that you may find helpful:
- The centers are simply obtained by calculating the centroids of the pixels that fall into a predefined range of distance. You can use https://github.com/guohengkai/region-ensemble-network/blob/master/evaluation/get_centers.py to obtain centers for ICVL, NYU and MSRA datasets.
- To run the hand pose estimator with a Kinect V2, you also have to change the intrinsic parameters accordingly. Maybe (365.456, 365.456, 254.878, 205.395) is suitable for most Kinect V2 cameras, but you can also retrieve the parameters from your own Kinect V2 using the official SDK.
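The two points above can be sketched roughly as follows. This is only an illustration, not the code from get_centers.py or util.py: the names depth_centroid and pixel_to_camera are hypothetical, the back-projection is a plain pinhole model (the repository's own sign conventions may differ), and the depth range of 1 to 650 mm is just an example threshold.

```python
import numpy as np

def depth_centroid(depth_mm, lower=1.0, upper=650.0):
    """Return (u, v, d): the centroid of pixels whose depth lies in [lower, upper]."""
    depth = np.asarray(depth_mm, dtype=np.float32)
    mask = (depth >= lower) & (depth <= upper)
    if not mask.any():
        return None  # nothing inside the depth range
    vs, us = np.nonzero(mask)  # row indices (v) and column indices (u)
    return float(us.mean()), float(vs.mean()), float(depth[mask].mean())

def pixel_to_camera(u, v, d, fx, fy, ux, uy):
    """Back-project pixel (u, v) with depth d into camera coordinates (pinhole model)."""
    return (u - ux) * d / fx, (v - uy) * d / fy, d

# Synthetic Kinect V2 sized frame with a small "hand" blob at 500 mm.
depth = np.zeros((424, 512), dtype=np.float32)
depth[200:211, 250:261] = 500.0

center = depth_centroid(depth)
xyz = pixel_to_camera(*center, fx=365.456, fy=365.456, ux=254.878, uy=205.395)
```

If the hand is not the closest object within the depth range, a simple centroid like this will drift toward whatever else is in range, so the thresholds usually need tuning per setup.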
from pose-ren.
Thank you for your reply. If I only want to run this model on some saved images rather than in real time, which functions should I run?
Thanks again.
from pose-ren.
@MLsmaller You can start with realsense_realtime_demo_librealsense2.py and read the depth images from local files instead of camera.
from pose-ren.
Thank you for your reply. I will try this tomorrow. I'm not very experienced in this field, so I may have more questions for you in the future. I hope you can give me your advice. Thanks again.
By the way, I have your project awesome-hand-pose-estimation in my collection, and it has helped me a lot.
from pose-ren.
I know what you mean. I'll try it tomorrow. Thank you
from pose-ren.
I initially thought the src/testing/run_images.py file was for testing images, but I didn't understand this file very well. By the way, is the src/testing/predict.py file used to train on a new dataset, or for something else?
from pose-ren.
- predict.py is used to generate predicted labels for pretrained models.
- run_images.py predicts labels for a given list of images. Actually, I think this script is somewhat redundant, and I might remove it in the future.
- realsense_realtime_demo_librealsense2.py is a good choice for you to start with your own project.
from pose-ren.
Hi, I tried your suggestion. Below is the file I adapted from the realtime demo. However, the result I get is not accurate. Am I still missing some step in the data processing part?
Here is the .py file:
#-*- coding:utf-8 -*-
import logging
logging.basicConfig(level=logging.INFO)
import numpy as np
import cv2
#import pyrealsense2 as rs
import os
import sys
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(BASE_DIR)
sys.path.append(ROOT_DIR) # config
sys.path.append(os.path.join(ROOT_DIR, 'utils')) # utils
sys.path.append(os.path.join(ROOT_DIR, 'libs')) # libs
from model_pose_ren import ModelPoseREN
import util
from util import get_center_fast as get_center

def init_device():
    # Configure depth streams
    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    print 'config'
    # Start streaming
    profile = pipeline.start(config)
    depth_sensor = profile.get_device().first_depth_sensor()
    depth_scale = depth_sensor.get_depth_scale()
    print "Depth Scale is: " , depth_scale
    return pipeline, depth_scale

def stop_device(pipeline):
    pipeline.stop()

def read_frame_from_device(pipeline, depth_scale):
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    #if not depth_frame:
    #    return None
    # Convert images to numpy arrays
    depth_image = np.asarray(depth_frame.get_data(), dtype=np.float32)
    depth = depth_image * depth_scale * 1000
    return depth

def show_results(img, results, cropped_image, dataset):
    img = np.minimum(img, 1500)
    img = (img - img.min()) / (img.max() - img.min())
    img = np.uint8(img*255)
    # draw cropped image
    img[:96, :96] = (cropped_image+1)*255/2
    img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    cv2.rectangle(img, (0, 0), (96, 96), (255, 0, 0), thickness=2)
    img_show = util.draw_pose(dataset, img, results)
    return img_show

def main():
    # intrinsic parameters of Kinect V2
    #fx, fy, ux, uy = 365.456, 365.456, 254.878, 205.395 #kinect v2
    fx, fy, ux, uy = 463.889, 463.889, 320, 240
    # parameters
    dataset = 'icvl'
    if len(sys.argv) == 2:
        dataset = sys.argv[1]
    lower_ = 1
    upper_ = 650
    # init realsense
    #pipeline, depth_scale = init_device()
    # init hand pose estimation model
    hand_model = ModelPoseREN(dataset,
        lambda img: get_center(img, lower=lower_, upper=upper_),
        param=(fx, fy, ux, uy), use_gpu=True)
    # for msra dataset, use the weights for first split
    if dataset == 'msra':
        hand_model.reset_model(dataset, test_id = 0)
    # realtime hand pose estimation loop
    #depth = read_frame_from_device(pipeline, depth_scale)
    icvl_path="/home/data/cy/ICVL/test/Depth/test_seq_1/image_0050.png"
    # preprocessing depth
    depth=cv2.imread(icvl_path,2)
    depth=np.asarray(depth,np.float32)
    depth[depth == 0] = depth.max()
    print(depth)
    # training samples are left hands in icvl dataset,
    # right hands in nyu dataset and msra dataset,
    # for this demo you should use your right hand
    if dataset == 'icvl':
        depth = depth[:, ::-1] # flip
    # get hand pose
    results, cropped_image = hand_model.detect_image(depth)
    img_show = show_results(depth, results, cropped_image, dataset)
    cv2.imwrite('./test.png', img_show)
    #stop_device(pipeline)

if __name__ == '__main__':
    main()
I read a depth image from the ICVL dataset instead of from the camera; this is the detection result for that image.
By the way, I don't understand the line depth = depth_image * depth_scale * 1000 in the function read_frame_from_device(). What does the parameter depth_scale mean, and could that be the problem?
I sincerely hope to get your reply.
from pose-ren.
Hi,
- depth_scale is a parameter specific to the RealSense SR300 camera. Since you are not reading depth images from the camera, you don't need this parameter.
- There are several problems with your code when dealing with the ICVL dataset:
fx, fy, ux, uy = 463.889, 463.889, 320, 240
These parameters are related to Kinect V2. Since you are using depth images from the ICVL dataset, you should use (240.99, 240.96, 160, 120), which are the parameters of the camera used to capture the ICVL dataset.
if dataset == 'icvl':
    depth = depth[:, ::-1] # flip
You don't have to flip depth images that come from the ICVL dataset.
In fact, we flip the depth images from the RealSense camera when using ICVL pre-trained models only because the images in the ICVL dataset are flipped.
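For readers wondering what depth_image * depth_scale * 1000 does in the demo: the depth scale reported by the RealSense SDK is the number of metres per raw sensor unit, so the product converts raw units to millimetres. A minimal sketch, where the function name raw_to_millimetres and the scale value 0.001 are purely illustrative assumptions (query the real value from your device):

```python
import numpy as np

def raw_to_millimetres(raw_units, depth_scale):
    """Convert raw sensor units to millimetres.

    depth_scale is metres per raw unit, so raw * depth_scale gives metres
    and the extra factor of 1000 gives millimetres.
    """
    return np.asarray(raw_units, dtype=np.float32) * depth_scale * 1000.0

raw = np.array([[0, 500], [1000, 4000]], dtype=np.uint16)
depth_mm = raw_to_millimetres(raw, depth_scale=0.001)  # illustrative scale only
```

A depth image loaded from disk that is already stored in millimetres needs no such conversion, which is why the parameter can be dropped when reading dataset images.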
from pose-ren.
Thank you. After adopting your suggestions, hand poses in ICVL dataset images are detected successfully. Why does it not work for the NYU dataset?
This is my code:
def main():
    # intrinsic parameters of Kinect V2
    #fx, fy, ux, uy = 365.456, 365.456, 254.878, 205.395 #kinect v2
    #fx, fy, ux, uy = 240.99, 240.96, 160, 120
    # parameters
    dataset = 'icvl'
    fx, fy, ux, uy = util.get_param(dataset)
    if len(sys.argv) == 2:
        dataset = sys.argv[1]
        print("the model of data is {}".format(sys.argv[1]))
    lower_ = 1
    upper_ = 650 # within the 0-650 mm range
    # init realsense
    #pipeline, depth_scale = init_device()
    # init hand pose estimation model
    hand_model = ModelPoseREN(dataset,
        lambda img: get_center(img, lower=lower_, upper=upper_),
        param=(fx, fy, ux, uy), use_gpu=True)
    # for msra dataset, use the weights for first split
    if dataset == 'msra':
        hand_model.reset_model(dataset, test_id = 0)
    # realtime hand pose estimation loop
    #depth = read_frame_from_device(pipeline, depth_scale)
    icvl_path="/home/data/cy/ICVL/test/Depth/test_seq_1/image_0500.png"
    nyu_path="/home/data/cy/NYU/dataset/test/depth_1_0000001.png"
    print(dataset)
    # preprocessing depth
    #depth=cv2.imread(icvl_path,2)
    #depth=np.asarray(depth,np.float32)
    #depth[depth == 0] = depth.max()
    depth=util.load_image(dataset, nyu_path, is_flip=False)
    print(depth,type(depth))
    # training samples are left hands in icvl dataset,
    # right hands in nyu dataset and msra dataset,
    # for this demo you should use your right hand
    #if dataset == 'nyu':
    #    depth = depth[:, ::-1] # flip
    # get hand pose
    results, cropped_image = hand_model.detect_image(depth)
    img_show = show_results(depth, results, cropped_image, dataset)
    cv2.imwrite('./test.png', img_show)
    #stop_device(pipeline)

if __name__ == '__main__':
    main()
The output image does not show the hand keypoints.
from pose-ren.
That's because the hand is not properly segmented. Cropping the hand region from the original depth image works a bit differently for each dataset.
If you are dealing with one of the predefined datasets (ICVL, NYU, MSRA15), I suggest using predict.py to get the predicted results.
from pose-ren.
Now I understand what you mean; I had only tested the NYU dataset. My goal is to test depth images captured with a Kinect V2. Could you please tell me whether I can directly read the depth images I saved from the Kinect, just as I did when testing the ICVL dataset? And do I have to choose the NYU model (because that dataset was captured with a Kinect camera)?
What confused me is that I didn't know which changes to make to the code when testing depth images obtained from the Kinect, because the depth maps I captured look very different from the depth images in the dataset. The depth image I saved is almost completely black, and my hand is not clearly visible.
from pose-ren.
First of all, you can use any pre-trained models (ICVL, NYU, MSRA or HANDS17), see some examples here.
If you want to predict hand pose for depth images captured from Kinect, all you have to do is capture a depth image from the camera (perhaps save it to local disk) and preprocess it before feeding it into the Pose-REN model.
As for the preprocessing, you can use the code from realsense_realtime_demo_librealsense2.py. Again, you need to change the intrinsic parameters to those of the Kinect, and you may have to change the depth threshold, since our code simply uses a naive depth thresholding algorithm to segment the hand.
The depth image you saved most likely looks black because it is stored in 16-bit format, and common image viewers show such images as black.
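To check that a 16-bit depth image actually contains a hand, it can be rescaled to 8 bits before viewing. The sketch below follows the same idea as show_results in the script posted earlier in this thread (clip at 1500 mm, then stretch to 0..255); the function name depth_to_preview and the clip value are assumptions for illustration.

```python
import numpy as np

def depth_to_preview(depth16, max_mm=1500):
    """Rescale a 16-bit depth map (millimetres) to an 8-bit image for viewing."""
    depth = np.asarray(depth16, dtype=np.float32)
    depth = np.minimum(depth, max_mm)  # ignore everything farther than max_mm
    span = depth.max() - depth.min()
    if span == 0:
        return np.zeros(depth.shape, dtype=np.uint8)  # flat image, nothing to stretch
    return np.uint8((depth - depth.min()) / span * 255)

frame = np.array([[0, 600], [900, 3000]], dtype=np.uint16)
preview = depth_to_preview(frame)
```

A hand at roughly 600 mm occupies less than 1% of the 0..65535 range of a 16-bit image, which is why the unscaled file looks black in an ordinary viewer.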
from pose-ren.
Thank you very much for your reply. I can now run detection on the depth images I captured with the Kinect, but I don't know how to use a Python Kinect API for real-time detection. The code you provided is for the RealSense SR300 camera; could you please give me some guidance? Is there any open-source project, written in C++ or Python, that uses a Kinect camera for real-time detection?
By the way, your awesome-hand-pose-estimation project is a great help to me. Does it list any open-source projects that perform real-time detection with a Kinect? My idea is similar to the open-source project Sphere Meshes for Real-Time Hand Modeling and Tracking: I want to generate a relatively robust hand model from real-time detection of two hands, or from saved video of two hands. However, that project requires wearing a blue wristband and handles only one hand. Which open-source project would be most helpful for my project?
I sincerely hope to get your answer again.
from pose-ren.
If you want to use C++, here is a demo for using Pose-REN in C++.
If you want to use Python, as @l-j-oneil mentioned in this issue, you can take a look at DeepPrior++. They also provide a heuristic method for detecting hands, which is better than the naive depth thresholding method we used.
I believe the author of DeepPrior++ tested their network using a Kinect and the OpenNI driver: https://github.com/moberweger/deep-prior-pp/blob/master/src/util/cameradevice.py
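Whichever Kinect driver is used, the replacement for read_frame_from_device() mostly has to turn the driver's buffer into a float32 depth map in millimetres. A minimal sketch, assuming the driver hands back a flat buffer of uint16 millimetre values for the 512x424 Kinect V2 depth sensor; the function name kinect_frame_to_depth and the mirror option are hypothetical, so check your wrapper's actual frame layout and units.

```python
import numpy as np

KINECT_V2_DEPTH_SHAPE = (424, 512)  # rows, columns of the Kinect V2 depth sensor

def kinect_frame_to_depth(flat_frame, mirror=False):
    """Turn a flat buffer of uint16 millimetre values into a float32 depth map."""
    depth = np.asarray(flat_frame, dtype=np.float32).reshape(KINECT_V2_DEPTH_SHAPE)
    if mirror:
        depth = depth[:, ::-1]  # horizontal flip, e.g. to match a left-hand dataset
    return depth

# Stand-in for a frame returned by the camera driver.
fake_frame = np.full(424 * 512, 800, dtype=np.uint16)
depth = kinect_frame_to_depth(fake_frame)
```

The resulting array can then go through the same thresholding and detection steps as the RealSense frames in the demo, with the Kinect intrinsics substituted.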
from pose-ren.