
deep-prior's People

Contributors

moberweger


deep-prior's Issues

data augmentation

How do we do augmentation correctly?
Do we have to apply the image jitter and the corresponding coordinate transformation at the same time? That is not convenient.
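For what it's worth, a minimal sketch of doing both together, so the labels stay aligned with the jittered image; the helper name, shapes, and shift range are hypothetical, not from this repo:

import numpy as np

def jitter_sample(dpt, joints_px, max_shift=5, rng=np.random):
    # Hypothetical helper: shift a depth crop by a random (dx, dy) and
    # apply the same offset to the (J, 2) pixel joint annotations.
    h, w = dpt.shape
    dx, dy = rng.randint(-max_shift, max_shift + 1, size=2)
    shifted = np.zeros_like(dpt)
    shifted[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)] = \
        dpt[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
    return shifted, joints_px + np.array([dx, dy], dtype=joints_px.dtype)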

Loading depth map of ICVL data

Hi, @moberweger

It seems that in importers.py you use this function to load depth images in the ICVL dataset.

def loadDepthMap(self, filename):
    """
    Read a depth-map
    :param filename: file name to load
    :return: image data of depth image
    """
    img = Image.open(filename)  # open image
    assert len(img.getbands()) == 1  # ensure depth image
    imgdata = np.asarray(img, np.float32)

    return imgdata

However, ICVL depth images are PNG images with 3 channels. I don't understand how the above function can load a depth image with float32 values as output. I suppose the loading method here should be the same as in NYUImporter:

def loadDepthMap(self, filename):
    """
    Read a depth-map
    :param filename: file name to load
    :return: image data of depth image
    """
    img = Image.open(filename)
    # top 8 bits of depth are packed into green channel and lower 8 bits into blue
    assert len(img.getbands()) == 3
    r, g, b = img.split()
    r = np.asarray(r, np.int32)
    g = np.asarray(g, np.int32)
    b = np.asarray(b, np.int32)
    dpt = np.bitwise_or(np.left_shift(g, 8), b)
    imgdata = np.asarray(dpt, np.float32)

    return imgdata

But the code actually works well with ICVL data. Could you help explain?
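For what it's worth, one quick way to settle this is to check what PIL actually reports for an ICVL file (the path below is hypothetical). If the PNGs are stored as single-channel 16-bit grayscale, img.getbands() returns one band and the assert passes:

from PIL import Image

img = Image.open("icvl/depth/sample.png")  # hypothetical path
print(img.mode)        # e.g. 'I;16' for 16-bit grayscale, 'RGB' for 3 channels
print(img.getbands())  # the bands PIL actually sees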

Thank you.

Problem when I run a node

When I run a node with roslaunch, I get this:
Could not locate the DepthSense SDK +++++++++++
Please install the SDK, create a softkinetic overlay
and recompile the softkinetic_camera package, see README.txt.

But I believe I installed the SDK correctly.
Please help me.

Confusion in the use of poseNet in real time

Hi. I am trying to reproduce your model in TensorFlow and Keras, but I am confused about the use of PoseNet. In the first phase you train a PoseNet model with type = 0. At run time you load the trained model but call it with type = 11, which has not previously been trained, and ask it to predict. I do not understand why we predict with a model that has not been trained, nor what the purpose of training the previous one was. Maybe it is a kind of transfer learning, but I am not sure.
Please help me to understand.

Looking forward to your answers

Input and output size of the CNN layers

Hi @moberweger,
I am checking the input and output dimensions of each layer in the pose estimation CNN (PoseNet).
Here is the list of layer inputs and outputs based on my understanding of the code. (I separate the conv layer and the pooling layer here for easier understanding.)
layer 1: convolutional layer, input 128 * 128 * 1, output 124 * 124 * 8, 5 * 5 conv, 8 feature maps (padding is valid)
layer 2: max-pooling layer with ReLU, input 124 * 124 * 8, output 31 * 31 * 8, 4 * 4 pooling (stride = 4)
layer 3: convolutional layer, input 31 * 31 * 8, output 27 * 27 * 8, 5 * 5 conv, 8 feature maps (padding is valid)
layer 4: max-pooling layer with ReLU, input 27 * 27 * 8, output 13 * 13 * 8, 2 * 2 pooling (stride = 2)
...
It seems the input of layer 4 has an odd width and height, which cannot be divided evenly by the pool size 2. So is the last row or column of the input just dropped? Or am I missing something here?
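For reference, with ignore_border=True (which this code passes to Theano's pool_2d, as seen in convpoollayer.py), the leftover row/column is indeed discarded; a quick check of the arithmetic:

def pooled_size(in_size, pool, stride=None):
    # output size of pooling when the border remainder is ignored
    stride = pool if stride is None else stride
    return (in_size - pool) // stride + 1

print(pooled_size(124, 4))  # 31
print(pooled_size(27, 2))   # 13 -- the 27th row/column is dropped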

Many thanks.

TypeError: Cannot convert Type TensorType(float32, 4D) ...

Hi, thanks @moberweger for sharing the code.

I have got the following error while trying to run the code:
TypeError: Cannot convert Type TensorType(float32, 4D) (of Variable Subtensor{int64:int64:}.0) into Type TensorType(float64, 4D). You can try to manually convert Subtensor{int64:int64:}.0 into a TensorType(float64, 4D).

The error occurs on both the ICVL and NYU datasets.

Any idea how to fix it?
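One common cause, though this is an assumption rather than something confirmed for this repo, is data loaded as float64 meeting a graph built for float32. Forcing Theano's default float type to float32 (for example via THEANO_FLAGS=floatX=float32 or in ~/.theanorc) and casting the input arrays usually resolves it; a minimal sketch:

import numpy as np
import theano

theano.config.floatX = 'float32'  # make Theano build float32 graphs

def to_floatX(arr):
    # cast loaded data so its dtype matches the network's expectations
    return np.asarray(arr, dtype=theano.config.floatX)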

Thank you very much!

How to get the xyz coordinates

Hi,
I am new to programming. I use a Kinect v1 to run test_realtimepipeline.py; how can I get the xyz coordinates of each joint?
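In case it helps, the usual pinhole back-projection from a pixel (u, v) with depth d to camera-space xyz looks like the sketch below; fx, fy, ux, uy are the camera intrinsics, and this is the generic formula, not necessarily the exact helper this repo uses:

def pixel_to_xyz(u, v, d, fx, fy, ux, uy):
    # back-project an image point with depth d (in mm) to 3D camera coordinates
    x = (u - ux) * d / fx
    y = (v - uy) * d / fy
    return x, y, d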

Thank you.

test_1 and test_2 in NYU dataset

Hi,

Thank you very much for this code. I found your paper very interesting and I am trying to reproduce the basic training step.

I was trying to retrain the network using python main_nyu_posereg_embedding.py and I ran into the following issue: it tries to load the test_1 and test_2 sequences, but the folders data/test_1 and data/test_2 do not exist.
When I downloaded the NYU dataset I got only a train and a test set. Did you make a custom split, or am I missing something in the initial configuration steps?

Best,

Yana

IOError: [Errno 2] No such file or directory: './NYU_network_prior.pkl'

I made the test_1 and test_2 folders, and when I run the code I get the following. Help me please.
ali@ali-X555LD:~/Desktop/code/deep-prior-master/src$ sudo python test_realtimepipeline.py
Loading cache data from ./cache//NYUImporter_test_1_False_False_cache.pkl
/home/ali/Desktop/code/deep-prior-master/src/net/convpoollayer.py:301: UserWarning: DEPRECATION: the 'ds' parameter is not going to exist anymore as it is going to be replaced by the parameter 'ws'.
  pooled_out = pool_2d(input=conv_out, ds=poolsize, ignore_border=True)
Traceback (most recent call last):
  File "test_realtimepipeline.py", line 63, in <module>
    poseNet.load("./NYU_network_prior.pkl")
  File "/home/ali/Desktop/code/deep-prior-master/src/net/netbase.py", line 414, in load
    handle = opener(filename, 'rb')
IOError: [Errno 2] No such file or directory: './NYU_network_prior.pkl'

Hi, I have a question about the hand detection module

Hi thanks for sharing your code

I have a question about your hand detection module, which is used in most of the recent hand pose estimation papers.

According to line 372 of src/data/importers.py,

dpt, M, com = hd.cropArea3D(gtorig[0],size=config['cube'],docom=True)

you use gtorig[0] as the com argument of the cropArea3D function.

So, I'm curious whether you use the ground-truth palm position (gtorig[0]) to crop the hand from the original depth image even at test time.

Problem with roslaunch

When I run a node with roslaunch, I get this error twice:
ERROR: cannot launch node of type [softkinetic_camera/softkinetic_bringup_node]: softkinetic_camera
ROS path [0]=/opt/ros/indigo/share/ros
ROS path [1]=/opt/ros/indigo/share
ROS path [2]=/opt/ros/indigo/stacks

Please help me to solve it.

Hand detection module is slow

Hi @moberweger,

When I test the code, I've found that most of the computation time (about 50%) is spent in the hand detection module. I guess this is due to the CoM refinement network used during hand detection. Is this refinement important for the final performance?

Thank you.

I can't find a package

File "cameradevice.py", line 30, in <module>
    import lib_dscapture as dsc
ModuleNotFoundError: No module named 'lib_dscapture'

How can I install lib_dscapture?

Use Kinect

Hi, I use a Kinect and I get this error.
Please help me.

self.ctx = openni.Context()
AttributeError: 'module' object has no attribute 'Context'

Thanks for helping.

Problem with running the code

Hi, when I run the code I get:
Traceback (most recent call last):
  File "test_realtimepipeline.py", line 51, in <module>
    Seq2 = di.loadSequence('test_1')
  File "/home/ali/Desktop/deep-prior-master/src/data/importers.py", line 553, in loadSequence
    mat = scipy.io.loadmat(trainlabels)
  File "/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio.py", line 141, in loadmat
    MR, file_opened = mat_reader_factory(file_name, appendmat, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio.py", line 64, in mat_reader_factory
    byte_stream, file_opened = _open_file(file_name, appendmat)
  File "/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio.py", line 39, in _open_file
    return open(file_like, 'rb'), True
IOError: [Errno 2] No such file or directory: '../data/NYU//test_1/joint_data.mat'

Please help me to solve it.

Hi, I have a question when using TensorFlow

Hi, thanks for sharing your code. Your code is written in Theano, and I want to port it to TensorFlow so I can make some changes, but I am not familiar with Theano.
I rewrote only the network in TensorFlow, without changing its architecture, and fed the pre-processed data into it directly. When training began, the loss was very high, reaching six-digit figures. I have read your code many times, and I am confused why your code reports an initial cost of only about 0.19. It is a great difference; can you tell me why this happens? Do you do any other normalization?

I only modified the main_icvl_com_refine.py file, no other changes. Hope for your help!
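In case the cause is the usual one, and this is an assumption about the repo rather than something stated here, deep-prior-style pipelines normalize both the cropped depth and the 3D joint targets to roughly [-1, 1] using the crop cube, which keeps the initial cost small; a sketch:

import numpy as np

def normalize_sample(dpt, joints3D, com3D, cube=(250., 250., 250.)):
    # Hypothetical sketch: scale depth values and 3D joint offsets by half
    # the crop cube so the regression targets lie roughly in [-1, 1].
    dpt_norm = (dpt - com3D[2]) / (cube[2] / 2.)
    joints_norm = (joints3D - com3D) / (np.asarray(cube) / 2.)
    return dpt_norm.astype(np.float32), joints_norm.astype(np.float32)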

Some questions about refinement using ICVL dataset.

Hi @moberweger,
I read your paper carefully and I have some questions about the refinement.
1. In the paper you say: "We will refer to this architecture as ORRef, for Refinement with Overlapping Regions. It uses as input several patches of different sizes but all centered on the joint location predicted by the first stage." But when I read the code in main_icvl_com_refine.py I can't find where you do this. I tried saving the resized images, but I found that the patches are just centered on the original depth map.
2. In the paper you say: "To further improve the accuracy of the location estimates, we iterate this refinement step several times, by centering the network on the location predicted at the previous iteration." And I can't find this in the code either.
Can you give me some advice? Thank you!
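For reference, a hypothetical sketch of what the quoted ORRef input would look like, with several crop sizes all centered on the predicted 2D joint location and resized to one resolution; the helper and the sizes are illustrative, not this repo's code:

import cv2
import numpy as np

def multi_scale_patches(dpt, joint_px, sizes=(24, 48, 96), out=24):
    # Hypothetical sketch: crop overlapping patches of different sizes,
    # all centered on the predicted joint, and resize to a common size.
    # (No out-of-bounds handling, for brevity.)
    u, v = int(joint_px[0]), int(joint_px[1])
    patches = []
    for s in sizes:
        half = s // 2
        patch = dpt[max(0, v - half):v + half, max(0, u - half):u + half]
        patches.append(cv2.resize(patch, (out, out)))
    return np.stack(patches)  # (len(sizes), out, out)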

the result in the test phase (main_nyu_posereg_embedding)

I was using the main_nyu_posereg_embedding code in TensorFlow. The output is:

Training
epoch 100, batch_num 1135, Minibatch Loss= 1.9201
Testing ...
Mean error: 483.2774520085452mm, max error: 682.6990653506009mm
Testing baseline
Mean error: 33.98014831542969mm
Mean error: 615.936767578125mm

I've tried changing all of the parameters, but I can't reduce the error. This problem has been bothering me for a long time, so I hope you can tell me: is this result correct? Thank you very much!

Question about handdetector module

Hi @moberweger,
In the handdetector.py module, line 192, when projecting to 2D:
xstart = int(numpy.floor((com[0] * com[2] / self.fx - size[0] / 2.) / com[2]*self.fx))
Only when I take Zstart = Zcom (= com[2]) can I derive the same equation and cancel out ux.
However, I think it should be Zstart = Zcom - size[2]/2. Or have you made the approximation Zstart ≈ Zcom here?
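For what it's worth, a quick numeric check (the intrinsics and CoM below are made-up, illustrative values) confirms this: the ux terms cancel exactly when Zstart = Zcom, so the code line does rely on that approximation.

import numpy as np

fx, ux = 241.42, 160.0          # hypothetical pinhole intrinsics
u_com, z_com, sx = 180.0, 350.0, 250.0

# exact projection of the crop corner, unprojecting with the principal point
x_com = (u_com - ux) * z_com / fx

def u_start(z_start):
    return fx * (x_com - sx / 2.0) / z_start + ux

# what the code line computes (no ux in either direction)
code_value = (u_com * z_com / fx - sx / 2.0) / z_com * fx

print(np.isclose(u_start(z_com), code_value))             # True: ux cancels
print(np.isclose(u_start(z_com - sx / 2.0), code_value))  # False in general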
Thanks.

Use the camera

Which file must be run to use the camera and run the code on my own hand?
