
Comments (57)

571502680 avatar 571502680 commented on August 14, 2024 1

To be honest, the segmentation accuracy is bad and it limits our performance.
The pose estimation accuracy mainly benefits from the vector representation for object keypoints.

However, why is the vector representation for object keypoints only used during testing rather than during training?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024 1

Sorry for late reply.
Our method is a deep learning method, which is good at exploiting global context for detecting keypoints, while feature-based matching methods typically use only local features.

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

The Mask R-CNN architecture is an alternative.
It is more compact to predict semantic labels and vectors simultaneously.

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

The Mask R-CNN architecture is an alternative.
It is more compact to predict semantic labels and vectors simultaneously.

What is the accuracy of the segmentation? Is it good or bad?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

To be honest, the segmentation accuracy is bad and it limits our performance.
The pose estimation accuracy mainly benefits from the vector representation for object keypoints.

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

Hough voting is a procedure without parameters to be learned, so we only need to train the network to output the vector field.

from pvnet.
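
For concreteness, here is a minimal NumPy sketch of this parameter-free voting step (the function and variable names are illustrative, not from the repo): a foreground pixel votes for a hypothesis h when its predicted unit direction agrees with the direction from the pixel to h above a threshold (0.99 in the paper).

import numpy as np

def voting_score(h, pixels, dirs, theta=0.99):
    # h:      (2,) hypothesized 2D keypoint location
    # pixels: (N, 2) foreground pixel coordinates
    # dirs:   (N, 2) predicted unit vectors from each pixel toward the keypoint
    diff = h[None, :] - pixels                           # pixel -> hypothesis
    diff /= np.linalg.norm(diff, axis=1, keepdims=True)  # normalize to unit length
    cos = np.sum(diff * dirs, axis=1)                    # cosine similarity
    return int(np.sum(cos >= theta))                     # number of inlier votes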

571502680 avatar 571502680 commented on August 14, 2024

I found that your LINEMOD phone dataset (1225 images) is smaller than the LINEMOD_ORIG phone dataset (1243 images). Why?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

I did not notice this problem.
The LINEMOD dataset we use is provided by https://github.com/Microsoft/singleshotpose

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

“val epoch 200 step 200 seg 0.00217727 ver 0.00254667 precision 0.95366901 recall 0.96332896 ”
How do I get the 2D projection and ADD(-S) metrics?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

python tools/train_linemod.py --cfg_file configs/linemod_train.json --linemod_cls cat --test_model

from pvnet.
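
For reference, the 2D projection and ADD metrics that this evaluation reports follow the standard definitions. A sketch in NumPy, assuming the commonly used thresholds (names and signatures below are mine, not the repo's):

import numpy as np

def transform(pose, pts):
    # pose: (3, 4) matrix [R | t]; pts: (N, 3) 3D model points
    return pts @ pose[:, :3].T + pose[:, 3]

def add_metric(pose_pred, pose_gt, model_pts, diameter):
    # ADD: mean 3D distance between model points under the two poses;
    # the pose counts as correct if it is below 10% of the model diameter
    dist = np.linalg.norm(transform(pose_pred, model_pts)
                          - transform(pose_gt, model_pts), axis=1)
    return dist.mean() < 0.1 * diameter

def projection_2d(pose_pred, pose_gt, model_pts, K, px_thresh=5.0):
    # 2D projection: mean reprojection error of the model points below 5 pixels
    def project(pose):
        cam = transform(pose, model_pts) @ K.T
        return cam[:, :2] / cam[:, 2:]
    err = np.linalg.norm(project(pose_pred) - project(pose_gt), axis=1)
    return err.mean() < px_thresh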

571502680 avatar 571502680 commented on August 14, 2024

About draw_utils.py: I found that the inputs to some of its functions require special handling. Could you show how to call them in demo.py? I believe this would make it easier for others to understand the paper.

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

“mask_pred = torch.argmax(seg_pred, 1)
visualize_mask(mask_pred.detach().cpu().numpy(), mask.detach().cpu().numpy(), save=False, save_fn=None)”
So far I have only managed to display the mask.

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

OK, I will add visualize_mask, visualize_vertex, visualize_hypothesis, and visualize_voting_ellipse soon.

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

I uploaded a jupyter notebook to visualize the keypoint detection pipeline: https://github.com/zju3dv/pvnet#visualization-of-the-voting-procedure

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

Is this Hough-voting-based keypoint detection method affected by texture information?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

No, at least when compared with other deep learning methods.

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

About ransac_voting_layer_v3 in ransac_voting_gpu.py: I don't really understand the principle. Why can it produce hypothesis points from the existing coordinates? In particular, do ransac_voting and ransac_voting.generate_hypothesis have Python versions? The C++ version is not easy to debug.

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

We generate hypotheses as described in our paper.
ransac_voting.generate_hypothesis does not have a Python version.

from pvnet.
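
Since the CUDA kernel is hard to step through, here is a rough NumPy re-implementation of the hypothesis generation as described in the paper, useful for debugging intuition only (the function and variable names are mine, not the repo's): repeatedly sample two foreground pixels and intersect the two lines defined by their predicted directions.

import numpy as np

def generate_hypotheses_py(pixels, dirs, num_hyp=128, eps=1e-6, seed=0):
    # pixels: (N, 2) foreground pixel coordinates (x, y)
    # dirs:   (N, 2) predicted unit vectors toward the keypoint
    rng = np.random.default_rng(seed)
    hyps = []
    while len(hyps) < num_hyp:
        i0, i1 = rng.choice(len(pixels), size=2, replace=False)
        # normal of each voting line: the direction rotated by 90 degrees
        n0 = np.array([dirs[i0, 1], -dirs[i0, 0]])
        n1 = np.array([dirs[i1, 1], -dirs[i1, 0]])
        A = np.stack([n0, n1])                         # line: n . x = n . c
        b = np.array([n0 @ pixels[i0], n1 @ pixels[i1]])
        if abs(np.linalg.det(A)) < eps:
            continue                                   # nearly parallel lines, skip
        hyps.append(np.linalg.solve(A, b))             # intersection of the two lines
    return np.asarray(hyps)                            # (num_hyp, 2) hypothesis points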

571502680 avatar 571502680 commented on August 14, 2024

I'm interested in the real-time demo on the project page. How does it make the cat interact with the hat? Is the hat's pose the same as the cat's? And is the hat's model pose the same as the cat's model pose?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

The hat's pose is the same as the cat's.

  1. I imported the cat model and the hat model in Blender.
  2. At first, the hat model was not on the cat's head. To solve this, I manually moved it to the head position and saved it as a new model.
  3. Now the hat is on the cat's head in 3D space, which produces the interaction effect given the same 6D pose.

from pvnet.
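
A tiny NumPy sketch (with illustrative names) of why step 3 works: after the Blender edit, the cat and hat vertices live in one shared object frame, so a single estimated pose [R | t] maps both models into the camera frame consistently.

import numpy as np

def apply_pose(R, t, verts):
    # map (N, 3) vertices from the shared object frame into the camera frame
    return verts @ R.T + t

# cam_cat = apply_pose(R, t, cat_verts)   # the same (R, t) for both models,
# cam_hat = apply_pose(R, t, hat_verts)   # so the hat stays on the cat's head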

571502680 avatar 571502680 commented on August 14, 2024

About the real-time demo on the project page: I used 3DXchange to build the hat and cat models, but I can't move the hat to the cat's head position and save them as a new model. What software did you use? I'm sure I need your help.

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

I used blender.

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

Well, how are the spatial coordinates of the model set? Are they relative to the camera coordinate system or the world coordinate system? How is the world coordinate system determined?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

They are in the world coordinate system of Blender.
I did not manually adjust the world coordinate system.

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024
If the projection of a keypoint on the object surface is occluded, will the projection at the occluded position still be voted out? If not, how do you perform the PnP computation without the keypoint projections?
My understanding of voting is that it searches among all known foreground coordinates for the voting result, rather than finding the projected keypoint coordinates in unknown space.

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

The motivation of our paper is to handle invisible keypoints.

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

“My understanding of voting is that it searches among all known foreground coordinates for the voting result, rather than finding the projected keypoint coordinates in unknown space.” So is my understanding correct?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

We find both known and unknown coordinates.

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

What does the indicator function 𝕀 in Equation (2) specifically mean? I can't find it in this paper.

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

In this paper?

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

Of course! As follows:

where 𝕀 represents the indicator function, θ is a threshold (0.99 in all experiments), and p ∈ O means that the pixel p belongs to the object O. Intuitively, a higher voting score means that a hypothesis is more confident as it coincides with more predicted directions.

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

The indicator function is 1 if the condition is satisfied and 0 otherwise.

from pvnet.
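
For reference, Equation (2) of the paper, reconstructed here from the quoted text (so treat it as a best-effort transcription), scores the i-th hypothesis h_{k,i} of keypoint k by counting pixels whose predicted direction agrees with it:

w_{k,i} = \sum_{p \in O} \mathbb{I}\left( \frac{(h_{k,i} - p)^{\top} v_k(p)}{\lVert h_{k,i} - p \rVert_2} \ge \theta \right)

where v_k(p) is the predicted unit vector at pixel p and θ = 0.99.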

571502680 avatar 571502680 commented on August 14, 2024

In generate_hypothesis_kernel() in ransac_voting_kernel.cu, why is hvi initialized as threadIdx.x + blockIdx.x*blockDim.x? I can't understand it!

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

It's CUDA programming.
I first define the GPU layout with getGPULayout(hn*vn,1,1,&bdim0,&bdim1,&bdim2,&tdim0,&tdim1,&tdim2);.
Then the index of the hvi-th thread is obtained from threadIdx.x + blockIdx.x*blockDim.x.

from pvnet.
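
A small Python simulation of that index arithmetic (toy sizes, purely illustrative): CUDA launches blocks of blockDim.x threads each, and threadIdx.x + blockIdx.x*blockDim.x flattens the (block, thread) pair into one global index, so each of the hn*vn work items gets a unique hvi.

hn, vn = 4, 2                 # hypotheses x keypoints (toy sizes)
block_dim = 3                 # threads per block
num_blocks = (hn * vn + block_dim - 1) // block_dim   # ceiling division

for block_idx in range(num_blocks):
    for thread_idx in range(block_dim):
        hvi = thread_idx + block_idx * block_dim      # global thread index
        if hvi >= hn * vn:
            continue                                  # out-of-range guard, as in the kernel
        # hvi now indexes exactly one (hypothesis, keypoint) pair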

571502680 avatar 571502680 commented on August 14, 2024
if(fabs(nx1*ny0-nx0*ny1)<1e-6) return;
if(fabs(ny1*nx0-ny0*nx1)<1e-6) return;

Why do you return if either of these two conditions is met?

float y=(nx1*(nx0*cx0+ny0*cy0)-nx0*(nx1*cx1+ny1*cy1))/(nx1*ny0-nx0*ny1);
float x=(ny1*(nx0*cx0+ny0*cy0)-ny0*(nx1*cx1+ny1*cy1))/(ny1*nx0-ny0*nx1);

What formula are these derived from?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

We represent each line in the Hessian normal form.
Then we compute the intersection of the two lines.

from pvnet.
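
Spelling that out (my reconstruction from the kernel code above): each sampled pixel c_j with line normal n_j defines the line n_j^T x = n_j^T c_j. Writing b_j = n_{xj} c_{xj} + n_{yj} c_{yj}, the two lines form a 2x2 linear system whose Cramer's-rule solution is exactly the kernel's formula:

n_{x0}\,x + n_{y0}\,y = b_0
n_{x1}\,x + n_{y1}\,y = b_1

x = \frac{n_{y1} b_0 - n_{y0} b_1}{n_{y1} n_{x0} - n_{y0} n_{x1}}, \qquad
y = \frac{n_{x1} b_0 - n_{x0} b_1}{n_{x1} n_{y0} - n_{x0} n_{y1}}

The early returns fire when this shared determinant is near zero, i.e. the two lines are nearly parallel and their intersection would be numerically unstable.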

571502680 avatar 571502680 commented on August 14, 2024

In the paper: “This step is repeated N times to generate a set of hypotheses.”
So I think that if there are n pixels of the target object, N = n!/(2!(n-2)!), which means that the n points are combined pairwise. But I find hvi < hn*vn, which differs from my understanding. Does it mean that you randomly sample hn*vn hypothesis points rather than n!/(2!(n-2)!)?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

Yes

from pvnet.
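
A quick numeric check (Python) of why enumerating every pair is infeasible and random sampling is used instead (the pixel count below is only an example):

from math import comb

n = 5000                  # e.g. foreground pixels of one object
print(comb(n, 2))         # 12497500 possible pixel pairs
hn = 128                  # hypotheses actually sampled per keypoint
# The kernel draws hn random pairs per keypoint (hn*vn work items in
# total over vn keypoints) instead of all C(n, 2) combinations.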

571502680 avatar 571502680 commented on August 14, 2024

nx0 = direct[t0 * vn * 2 + vi * 2 + 1]

ny0 = -direct[t0 * vn * 2 + vi * 2]

n should be the unit vector from pixel p toward the 2D keypoint x_k rather than a normal vector, but in the Hessian normal form n has to be a normal vector. I'm so confused!

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

nx0 = direct[t0 * vn * 2 + vi * 2 + 1]
ny0 = -direct[t0 * vn * 2 + vi * 2]

This gives a normal vector.

from pvnet.
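
A one-line NumPy check of that: rotating the predicted direction (dx, dy) by 90 degrees to (dy, -dx) gives a vector perpendicular to it, which is exactly the line normal needed for the Hessian normal form.

import numpy as np

d = np.array([0.6, 0.8])       # predicted unit direction (dx, dy)
n = np.array([d[1], -d[0]])    # the kernel's (nx0, ny0) = (dy, -dx)
print(np.dot(d, n))            # 0.0 -> n is normal to the voting line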

571502680 avatar 571502680 commented on August 14, 2024

How is dense_pts.txt in your dataset sampled? I compared it to object.xyz in LINEMOD_ORIG and found that the two are not the same.

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

Use CloudCompare.

from pvnet.

hz-ants avatar hz-ants commented on August 14, 2024

What file formats can we get from CloudCompare for our use?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

What do you mean by 'what format files'?

from pvnet.

hz-ants avatar hz-ants commented on August 14, 2024

For example: *.txt, *.ply, or other coordinate files?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

They are all supported by CloudCompare.

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

I have been thinking about this recently: if the object lacks texture, will the keypoint-based method be affected? And what is the difference between it and feature-based matching methods?

from pvnet.

miksoft123 avatar miksoft123 commented on August 14, 2024

Hi, I wonder how I can train on my own object. Do I need a 3D model, its texture, mask images, and so on? If not, is there any other way to reuse the pretrained model for a quick test?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

You need a 3D model and images with ground-truth poses.
I am not sure whether the pretrained model can generalize to a new object.

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

In the ransac_voting_layer_v3 function,

coords = torch.nonzero(cur_mask).float()
coords = coords[:, [1, 0]]

Why do you reverse the x, y coordinates?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

coords is [row, col].
Reversing [row, col] gives [x, y].

from pvnet.
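
A minimal torch example of that swap:

import torch

mask = torch.zeros(4, 6)
mask[1, 3] = 1                           # row 1 (y), column 3 (x)
coords = torch.nonzero(mask).float()     # tensor([[1., 3.]]) -> [row, col]
coords = coords[:, [1, 0]]               # tensor([[3., 1.]]) -> [x, y]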

571502680 avatar 571502680 commented on August 14, 2024

I want to implement the visualization shown in this figure, but without success. Could you upload this part of the code in demo.py?
[figure: voting visualization]

from pvnet.

pengsida avatar pengsida commented on August 14, 2024
import numpy as np
import matplotlib.pyplot as plt

# Assumes: img is an (h, w, 3) uint8 image, mask is an (h, w) object mask,
# and corners is a list of 2D keypoint locations (x, y).
for ci, corner in enumerate(corners):
    dir_img = img.copy()
    # dim the object region and blend in white so the red arrows stand out
    dir_img[mask > 0] //= 2
    dir_img[mask > 0] += np.asarray([255, 255, 255], np.uint8) // 2
    plt.imshow(dir_img)
    h, w = mask.shape
    for hi in range(h):
        for wi in range(w):
            if mask[hi, wi] == 0: continue
            if hi % 5 == 0 and wi % 5 == 0:  # draw arrows on a sparse grid
                # unit vector from the pixel toward the keypoint (cast to float
                # to avoid in-place division on an integer array), 7 px long
                diff = np.asarray(corner, np.float32) - np.asarray([wi, hi], np.float32)
                diff /= np.linalg.norm(diff)
                diff *= 7
                plt.annotate("", xy=(wi + diff[0], hi + diff[1]), xytext=(wi, hi),
                             arrowprops={'arrowstyle': '->,head_length=0.3,head_width=0.3',
                                         'color': 'red'})
    plt.show()

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

This idea is really amazing! Thank you~

from pvnet.

571502680 avatar 571502680 commented on August 14, 2024

When testing on the Occlusion LINEMOD dataset, do you use the same model that was trained on the LINEMOD dataset?

from pvnet.

pengsida avatar pengsida commented on August 14, 2024

Yes.

from pvnet.

YC0315 avatar YC0315 commented on August 14, 2024

In the paper: “This step is repeated N times to generate a set of hypotheses.” So I think that if there are n pixels of the target object, N = n!/(2!(n-2)!), which means that the n points are combined pairwise. But I find hvi < hn*vn, which differs from my understanding. Does it mean that you randomly sample hn*vn hypothesis points rather than n!/(2!(n-2)!)?

Hello,
1. When you pick two vectors here to compute their intersection, are the two vectors chosen arbitrarily?
2. In if(fabs(ny1*nx0-ny0*nx1)<1e-6) return;, does this mean that if the slopes of the two lines are close, that pair of lines is not used to compute an intersection?

from pvnet.
