Comments (19)

GrumpyZhou commented on August 22, 2024

LoFTR with the new testing setting found by @georg-bn:

>>Finished, pairs=540 match_failed=0 matches=2599.4 match_time=0.21s
==== Image Matching ====
#Features: mean=2599 min=94 max=5759
#(Old)Matches: a=2599, i=3204, v=2038
#Matches: a=2599, i=3204, v=2038
MMA@[ 1  3  5 10] px:
a=[0.63 0.91 0.93 0.95]
i=[0.68 0.95 0.96 0.98]
v=[0.59 0.86 0.91 0.92]

==== Homography Estimation ====
Hest solver=cv est_failed=0 ransac_thres=3 inlier_rate=0.86
Hest Correct: a=[0.64 0.87 0.92 0.95]
i=[0.81 0.98 0.99 1.  ]
v=[0.49 0.77 0.85 0.91]
Hest AUC: a=[0.36 0.65 0.75 0.84]
i=[0.48 0.79 0.87 0.93]
v=[0.25 0.52 0.64 0.76]

Comments: I'm not sure whether other methods also improve in this setting. I am also not sure that it is an equal comparison to other methods, since in this setting both the ground truth and the data have been changed.
But still!
Thank you all for the discussion and the attempts to reproduce the published numbers.

georg-bn commented on August 22, 2024

I can't check at the moment, but it sounds like it might be due to RANSAC. Perhaps if you increase the number of iterations it becomes more stable?
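
In case it helps, here is a minimal sketch of turning those knobs in OpenCV (the correspondences below are synthetic stand-ins for real matches; maxIters and confidence are actual cv2.findHomography parameters):

import cv2
import numpy as np

# Synthetic correspondences under a known homography plus pixel noise,
# standing in for the matches produced by any matcher.
H_true = np.array([[1.0, 0.02, 5.0], [0.01, 1.0, -3.0], [0.0, 0.0, 1.0]])
pts_src = (np.random.rand(200, 1, 2) * 480).astype(np.float32)
pts_dst = cv2.perspectiveTransform(pts_src, H_true)
pts_dst += 0.5 * np.random.randn(*pts_dst.shape).astype(np.float32)

# Raising maxIters (default 2000) and confidence (default 0.995) makes the
# RANSAC estimate less dependent on the random sampling, at some runtime cost.
H_est, inlier_mask = cv2.findHomography(
    pts_src, pts_dst, cv2.RANSAC,
    ransacReprojThreshold=3.0, maxIters=10000, confidence=0.99999)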

Parskatt commented on August 22, 2024

@TruongKhang

[image: spread of HPatches homography estimation results over repeated runs]

Hi Truong, this is the uncertainty we got when running HPatches homography estimation. I would say ±0.5 is reasonable.

Parskatt commented on August 22, 2024

Actually, I'm now unsure what they mean by AUC... But the OpenCV comment still holds.

Related issue:
zju3dv/LoFTR#136
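
For reference, here is a sketch of the AUC variant I believe SuperGlue's released evaluation code uses (area under the recall-vs-error curve, integrated up to each threshold and normalized by it); whether LoFTR computes it the same way is exactly what is unclear:

import numpy as np

def error_auc(errors, thresholds=(1, 3, 5, 10)):
    # errors: one corner error per image pair (failed estimates -> inf).
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = (np.arange(len(errors)) + 1) / len(errors)
    errors = np.concatenate(([0.0], errors))
    recall = np.concatenate(([0.0], recall))
    aucs = []
    for t in thresholds:
        idx = np.searchsorted(errors, t)
        e = np.concatenate((errors[:idx], [t]))            # clip the curve at t
        r = np.concatenate((recall[:idx], [recall[idx - 1]]))
        aucs.append(np.trapz(r, x=e) / t)                  # normalize by t
    return aucs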

GrumpyZhou commented on August 22, 2024

Hi @Parskatt ,

Thanks for pointing this out. I have updated the HPatches evaluation code to support:

  • Reporting both the corner error in percentages at specific thresholds and the AUC of the corner errors at those thresholds
  • An optional switch between the pydegensac and OpenCV solvers; degensac is the default, since in my experience it normally leads to better performance for almost all methods.

The new example command:
python -m immatch.eval_hpatches --gpu 0 \
    --config 'superpoint' 'superglue' 'caps_superpoint'  \
    --task 'both' --h_solver 'degensac' --ransac_thres 2 \
    --root_dir . --odir 'outputs/hpatches'
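
For context, a minimal sketch of the corner error that both metrics are based on (assuming H_est and H_gt are 3x3 arrays and (w, h) is the reference image size; this is the standard HPatches convention, though the toolbox's exact implementation may differ):

import cv2
import numpy as np

def corner_error(H_est, H_gt, w, h):
    # Warp the four corners of the reference image with the estimated and
    # ground-truth homographies; the error is the mean corner distance.
    # "Hest Correct" thresholds this error at [1, 3, 5, 10] px, and the
    # AUC integrates the recall of the same error over those thresholds.
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    warped_est = cv2.perspectiveTransform(corners, np.asarray(H_est, dtype=np.float64))
    warped_gt = cv2.perspectiveTransform(corners, np.asarray(H_gt, dtype=np.float64))
    return float(np.linalg.norm(warped_est - warped_gt, axis=-1).mean())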

The example output for SuperGlue with degensac is:

>>>> Method=SuperGlue Default config: {'class': 'SuperGlue', 'weights': 'outdoor', 'sinkhorn_iterations': 20, 'nms_radius': 4, 'match_threshold': 0.2, 'max_keypoints': 1024, 'imsize': -1} Thres: [0.2]
Matching thres: 0.2  Save to: None

>>>>Eval hpatches: task=matching+homography method=SuperGlue_r4 rthres=2.0 thres=[1, 3, 5, 10] 
==== Image Matching ====
#Features: mean=921 min=212 max=1024
#(Old)Matches: a=491, i=522, v=462
#Matches: a=491, i=522, v=462
MMA@[ 1  3  5 10] px:
a=[0.44 0.85 0.94 0.98]
i=[0.5  0.86 0.96 0.99]
v=[0.39 0.85 0.93 0.97]

==== Homography Estimation ====
Hest solver=degensac est_failed=2 ransac_thres=2.0 inlier_rate=0.86
Hest Correct: a=[0.5  0.82 0.9  0.96]
i=[0.58 0.93 0.99 1.  ]
v=[0.43 0.72 0.82 0.92]
Hest AUC: a=[0.24 0.56 0.68 0.81]
i=[0.28 0.64 0.77 0.88]
v=[0.21 0.48 0.6  0.74]

The AUC results of several methods that I quickly tested:

# Degensac  with ransac threshold = 2 at error thresholds [3, 5, 10] px
SuperPoint: 0.53 0.65 0.78
SuperPoint+SuperGlue:  0.56 0.68 0.81
D2Net: 0.45 0.58 0.73
R2D2: 0.53 0.64 0.76
Patch2Pix (w.NCNet): 0.53 0.65 0.78
CAPS (w.SuperPoint): 0.53 0.66 0.78
LoFTR (all matches): 0.59 0.69 0.8 

# OpenCV with ransac threshold = 2 at error thresholds [3, 5, 10] px
SuperPoint: 0.37 0.51 0.68
SuperPoint+SuperGlue: 0.39 0.53 0.71
CAPS (w.SuperPoint): 0.33 0.49 0.67
LoFTR (all matches):   0.48 0.6  0.74

Feel free to try out other settings. While I managed to produce similar or better AUC results for SuperPoint+SuperGlue, D2Net, and R2D2 compared to those reported in Tab. 1 of the LoFTR paper, I didn't manage to reproduce the numbers for LoFTR using either all matches or the top-1k matches.
If you get a reply in zju3dv/LoFTR#136, I can then figure out how to update the code for LoFTR.

Parskatt commented on August 22, 2024

Awesome! Thanks.
I am very confused about how AUC was actually implemented in either SuperGlue or LoFTR; hopefully they will respond soon.
I know that they used a RANSAC threshold of 3, but I find it difficult to believe that this would increase the AUC by such a large amount.

I'll give an update if they respond.

Parskatt commented on August 22, 2024

@GrumpyZhou LoFTR additionally rescales the ground-truth homographies to fit images resized so that their shorter side is 480. I'm not sure how much this matters, but I know that some of the images are very large, so it could make a difference.
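
For anyone who wants to try this, a rough sketch of that adjustment: resize each image uniformly so its shorter side becomes 480, and conjugate the ground-truth homography with the corresponding scale matrices (the helper names here are mine, not the toolbox's API):

import cv2
import numpy as np

def resize_short_side(img, short=480):
    # Uniform rescale so the shorter image side becomes `short` pixels.
    h, w = img.shape[:2]
    s = short / min(h, w)
    return cv2.resize(img, (int(round(w * s)), int(round(h * s)))), s

def rescale_homography(H, s_src, s_dst):
    # H maps original src pixels to original dst pixels; after resizing the
    # two images by s_src and s_dst, the adapted homography is
    # S_dst @ H @ S_src^{-1}.
    S_src = np.diag([s_src, s_src, 1.0])
    S_dst = np.diag([s_dst, s_dst, 1.0])
    return S_dst @ H @ np.linalg.inv(S_src)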

GrumpyZhou commented on August 22, 2024

Hi @Parskatt ,
I followed the modifications:

  • Resize the images so that the shorter side is 480
  • Set the RANSAC threshold to 3

AUC@[ 1  3  5 10] px:
LoFTR (all matches, opencv): [0.23 0.49 0.61 0.74]
LoFTR (all matches, degensac): [0.25 0.55 0.67 0.79]
LoFTR (1k matches, opencv): [0.27 0.5  0.61 0.74]
LoFTR (1k matches, degensac): [0.3  0.55 0.67 0.79]

Hm... I don't think this is an issue with the AUC metric implementation. Also, considering that I have used this repo to reproduce the LoFTR results on the Aachen benchmark, it is most likely a settings difference on HPatches. Directly using their reported numbers, or asking the authors to provide the code to reproduce them, is the simplest solution if you want to compare your method to LoFTR.

I am not sure I will have time to dig into this and try to reproduce that table 😅.

Parskatt commented on August 22, 2024

Thanks for all your work :)

TruongKhang commented on August 22, 2024

@Parskatt, I also want to reproduce the results of LoFTR on HPatches. Did you find a solution to this problem?

Parskatt commented on August 22, 2024

@TruongKhang Sorry, no, I was not able to reproduce their results. I think it's better to compare to their method on MegaDepth and ScanNet, where the results are reproducible.

TruongKhang commented on August 22, 2024

@Parskatt I wonder which settings they used to make such a huge difference in performance. I'll try to ask them in more detail. Most methods were trained on MegaDepth and then evaluated on HPatches, so I think an evaluation on HPatches is necessary.

georg-bn commented on August 22, 2024

I have managed to get quite close to the numbers reported in the paper, see #22. @Parskatt & @TruongKhang, you were probably right when you suggested that they rescale the homography and compute the errors in the downscaled images.

TruongKhang commented on August 22, 2024

Wow, awesome @georg-bn @GrumpyZhou! I tried to do something like that but couldn't get good results; I might have done something wrong somewhere. Thank you for fixing the code!

georg-bn commented on August 22, 2024

I think it makes sense to calculate the errors in the downscaled images, since the aim is to make all images approximately equally important in the final score. As long as the other methods are evaluated in the same way, I believe this is fair.

It would make even more sense to make all the images exactly the same size; e.g., in SuperPoint they are resized to 640x480 (or 480x640, depending on which side is larger). I tried this and obtained:

>>Finished, pairs=540 match_failed=0 matches=2547.3 match_time=0.12s
==== Image Matching ====
#Features: mean=2547 min=94 max=4235
#(Old)Matches: a=2547, i=3168, v=1971
#Matches: a=2547, i=3168, v=1971
MMA@[ 1  3  5 10] px:
a=[0.63 0.91 0.93 0.95]
i=[0.68 0.96 0.96 0.98]
v=[0.59 0.86 0.91 0.92]

==== Homography Estimation ====
Hest solver=cv est_failed=0 ransac_thres=3.0 inlier_rate=0.87
Hest Correct: a=[0.64 0.87 0.92 0.95]
i=[0.82 0.98 0.99 1.  ]
v=[0.47 0.77 0.85 0.9 ]
Hest AUC: a=[0.36 0.65 0.75 0.84]
i=[0.5  0.79 0.87 0.93]
v=[0.24 0.52 0.64 0.76]
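
For reference, a sketch of this fixed-size resize (assuming the SuperPoint-style convention described above); since the scaling is now anisotropic, the ground-truth homography has to be adapted with separate x and y factors:

import cv2
import numpy as np

def resize_fixed(img):
    # 640x480 for landscape images, 480x640 for portrait ones (SuperPoint-style).
    h, w = img.shape[:2]
    tw, th = (640, 480) if w >= h else (480, 640)
    return cv2.resize(img, (tw, th)), (tw / w, th / h)

def rescale_homography_xy(H, s_src, s_dst):
    # s_src, s_dst are the (sx, sy) factors of the two images; the adapted
    # homography is S_dst @ H @ S_src^{-1} with anisotropic scale matrices.
    S_src = np.diag([s_src[0], s_src[1], 1.0])
    S_dst = np.diag([s_dst[0], s_dst[1], 1.0])
    return S_dst @ H @ np.linalg.inv(S_src)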

Furthermore, by increasing the matching threshold from 0.2 to 0.5, I got the following with the "short side = 480" setting:

>>Finished, pairs=540 match_failed=0 matches=2238.2 match_time=0.12s
==== Image Matching ====
#Features: mean=2238 min=7 max=5571
#(Old)Matches: a=2238, i=2873, v=1649
#Matches: a=2238, i=2873, v=1649
MMA@[ 1  3  5 10] px:
a=[0.67 0.94 0.96 0.97]
i=[0.71 0.98 0.99 1.  ]
v=[0.63 0.9  0.93 0.94]

==== Homography Estimation ====
Hest solver=cv est_failed=0 ransac_thres=3.0 inlier_rate=0.91
Hest Correct: a=[0.64 0.87 0.92 0.95]
i=[0.83 0.98 0.99 1.  ]
v=[0.47 0.76 0.86 0.91]
Hest AUC: a=[0.37 0.65 0.75 0.84]
i=[0.51 0.79 0.87 0.93]
v=[0.25 0.52 0.64 0.76]

And with matching threshold 0.5 and all images scaled to 640x480 or 480x640:

>>Finished, pairs=540 match_failed=0 matches=2194.5 match_time=0.12s
==== Image Matching ====
#Features: mean=2195 min=7 max=4199
#(Old)Matches: a=2195, i=2845, v=1590
#Matches: a=2195, i=2845, v=1590
MMA@[ 1  3  5 10] px:
a=[0.67 0.94 0.96 0.97]
i=[0.71 0.98 0.99 1.  ]
v=[0.63 0.9  0.93 0.94]

==== Homography Estimation ====
Hest solver=cv est_failed=0 ransac_thres=3.0 inlier_rate=0.90
Hest Correct: a=[0.64 0.87 0.92 0.95]
i=[0.83 0.98 0.99 1.  ]
v=[0.46 0.77 0.86 0.9 ]
Hest AUC: a=[0.38 0.66 0.76 0.85]
i=[0.51 0.8  0.88 0.94]
v=[0.25 0.52 0.64 0.76]

This last run does match the numbers reported in the paper, but I have no idea whether these settings are actually "correct"; probably not, as I don't see why they would have changed the matching threshold. I will try asking the authors again in zju3dv/LoFTR#136.

TruongKhang commented on August 22, 2024

@georg-bn, if we can reproduce results approximately equal to those in the paper, I think we can directly use the numbers reported in the paper for comparison. The matching threshold is a tunable hyper-parameter, and changing it when evaluating on other datasets is normal, I think.

georg-bn commented on August 22, 2024

@TruongKhang Yes, I suppose you are correct.

georg-bn commented on August 22, 2024

Some final(?) observations: Changing to using only the top 1000 matches worsened the performance somewhat, but this can be compensated for by setting the confidence in cv2.findHomography to 0.99999, which is the RANSAC confidence used in the LoFTR pose estimation experiments.
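
In code, that amounts to something like this (a sketch; the match arrays here are hypothetical stand-ins for the top-1000 matches):

import cv2
import numpy as np

# Hypothetical top-1000 matches (in practice, keep the 1000 most confident).
pts_src = (np.random.rand(1000, 1, 2) * 480).astype(np.float32)
pts_dst = pts_src + np.float32([10.0, 5.0])  # a pure translation, for the sketch

# confidence defaults to 0.995 in cv2.findHomography; 0.99999 matches the
# RANSAC confidence used in LoFTR's pose estimation experiments.
H, mask = cv2.findHomography(pts_src, pts_dst, cv2.RANSAC,
                             ransacReprojThreshold=3.0, confidence=0.99999)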

TruongKhang commented on August 22, 2024

@georg-bn, I find that the results change a little every time I re-run the HPatches evaluation. Do you have this problem too?
