rel_pose's Issues

Generating Epipolar lines

Hi @crockwell, this is really cool work!

I am trying to generate the epipolar lines given the relative pose predicted by your model. However, I am not able to obtain correct epipolar lines; my hunch is that I have either misunderstood the coordinate-system (X, Y, Z axes) convention used by the model, or the rotation matrix is transposed.

This is what I am getting with my own script (which obviously looks wrong):
[attached image: incorrect epipolar lines produced by my script]
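For reference, this is the construction I am attempting (a minimal sketch; the convention x_cam2 = R·x_cam1 + t and the names `K1`, `K2` for the two intrinsic matrices are my assumptions):

```python
import numpy as np

def epipolar_lines(R, t, K1, K2, pts1):
    """Epipolar lines in image 2 for pixel coordinates pts1 (N x 2) in image 1.

    Assumes the convention x_cam2 = R @ x_cam1 + t; if the model uses the
    inverse convention, swap in R.T and -R.T @ t before calling.
    """
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])                  # skew-symmetric [t]_x
    E = tx @ R                                           # essential matrix E = [t]_x R
    F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)      # fundamental matrix
    pts1_h = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous pixel coords
    return (F @ pts1_h.T).T                              # rows (a, b, c): ax + by + c = 0
```

When the lines are systematically wrong I also try the transposed rotation / inverted pose, which is why I suspect a convention mismatch.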

Could you share the script you used to generate the epipolar-line figures in the paper? That would be immensely helpful!

Thanks,
Yash

What is the input image size?

Skimming through the code, it looks like the input size is 384 × 384, but the supplementary material says it is 256 × 256.

Can you please confirm the input image size?

Several Questions about the paper

Hello! Thanks for open-sourcing this amazing work! I have a few questions about the paper.

  1. How do you obtain the actual $U^\top U$ from the ground-truth rotation and translation, as shown in Figure 9 of the paper? In my understanding, constructing the actual $U^\top U$ requires ground-truth pixel correspondences; did you use some sort of descriptor → matching → critical-correspondence-filtering pipeline to get the actual pixel matches?
  2. I still don't understand how the network can learn to predict translation at actual (metric) scale: in my understanding, it is unlikely that relative translation can be predicted at actual scale from 2D pixel information alone as the network's input (see the sketch after this list). Could you kindly elaborate on that? Thanks!
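To make question 2 concrete, the classical two-view constraint is invariant to rescaling the translation (a standard fact, not specific to this paper), which is why I would expect only the direction of $t$ to be recoverable from pixels alone:

$$ x_2^\top E\, x_1 = 0, \qquad E = [t]_\times R, \qquad x_2^\top [s\,t]_\times R\, x_1 = s \left( x_2^\top [t]_\times R\, x_1 \right) = 0 \quad \forall s > 0, $$

so any rescaled translation $s\,t$ satisfies the same epipolar constraints, and metric scale would presumably have to come from learned priors (e.g., typical scene sizes) rather than from two-view geometry.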

Looking forward to your reply!

Regarding the depth scale

Hi,
thanks for the nice work.

I have one question regarding the depth scale in Matterport. My understanding is that because Matterport has larger translations than the other datasets, the translation term of the loss would be much larger, which might negatively affect training.
Is that why, here:

`rel_pose[:3] /= Matterport.DEPTH_SCALE`

you divide by 5 (the value of `DEPTH_SCALE`) to adjust the scale?

If this is right, then my other question is, don't we have to change the intrinsics similarly?
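For context, my tentative reading of the geometry (I may be missing something) is that uniformly rescaling structure and translation leaves the pixel coordinates, and hence the intrinsics, unchanged:

$$ \lambda\,\tilde{x} = K (R X + t) \;\Rightarrow\; (s\,\lambda)\,\tilde{x} = K (R\,(s X) + s\,t), $$

i.e., scaling $X$ and $t$ by the same factor $s$ only rescales the projective depth $\lambda$, so $K$ itself would not need to change.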

Thanks!

Quick question regarding Equation 2 from paper

Thank you for the interesting work. I just had a quick question regarding Equation 2 of the paper:
[image: Equation 2 from the paper]
Previously, it is explained that $Q_1$ and $K_2$ are both $P \times D$ matrices, where $P$ is the # of patches and $D$ is the feature dimension. Then, $\text{norm}(Q_1K_2^T)$ should be of dimension $P \times P$, with the first $P$ indicating image 1 patches and the latter indicating image 2 patches. However, it seems odd to me that the $V_2$ is also matrix-multiplied on the left side, effectively matching the image 2 patches' features of $V_2$ with the image 1's attention rows of $\text{norm}(Q_1K_2^T)$. Further, if the two images had different # of patches, this operation would not work either. Could I get some insight as to why this is done? I apologize if I have misinterpreted something.
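To illustrate the shape concern, here is a minimal shape check of my reading of Equation 2 as the bilinear form $V_2^\top\,\text{norm}(Q_1 K_2^\top)\,V_2$ (all names and sizes below are hypothetical, and I am assuming $\text{norm}(\cdot)$ behaves like a row-wise softmax):

```python
import torch

P, D = 144, 64                       # hypothetical: patches per image, feature dim
Q1 = torch.randn(P, D)               # queries from image 1
K2 = torch.randn(P, D)               # keys from image 2
V2 = torch.randn(P, D)               # values from image 2

A = torch.softmax(Q1 @ K2.T / D**0.5, dim=-1)  # P x P; row i = image-1 patch i's attention
out = V2.T @ A @ V2                  # (D x P) @ (P x P) @ (P x D) -> D x D
print(out.shape)                     # torch.Size([64, 64])

# With different patch counts P1 != P2, A would be P1 x P2 while V2 is P2 x D,
# so the left product V2.T @ A no longer conforms -- exactly the concern above.
```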

Intrinsic parameters necessary?

Hi, thanks for sharing your work, really appreciate it! I have some questions about intrinsic parameters. Are the camera intrinsics necessary for this algorithm? Can it work without intrinsic and extrinsic camera parameters, and how does the performance compare with and without intrinsics? I have a dataset I would like to test on, but I do not know its intrinsic or extrinsic camera parameters.

Thank you very much!

7scenes dataset

Was there a specific reason not to evaluate on 7-Scenes and the Cambridge dataset, as they are often used in related works?

Is EMM ∼ U⊤U valid only when `self.cross_features` = True?

Enjoyed reading your paper. Nice work!

Nevertheless, a doubt came up as I went through the code: I noticed the hyperparameter `cross_features`, which I think governs Equation 2's $V_i$, i.e., whether the computation is $V_i^\top A V_i$ or $V_j^\top A V_i$ (see the sketch below). It looks to me that when `self.cross_features` is False, the Essential Matrix Module does not really capture the matrix U⊤U. Is that correct? If so, why is `self.cross_features=False` still helpful?
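To make the two alternatives concrete (variable names, and which case the flag selects, are my guesses; the latter is exactly what I am asking):

```python
import torch

P, D = 144, 64            # hypothetical patch count and feature dimension
A  = torch.randn(P, P)    # norm(Q_1 @ K_2.T): attention between the two images
V1 = torch.randn(P, D)    # values from image 1
V2 = torch.randn(P, D)    # values from image 2

same_image  = V2.T @ A @ V2   # V_i^T A V_i: values from one image on both sides
cross_image = V1.T @ A @ V2   # V_j^T A V_i: values from the two different images
```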

Thank you.
