Hi there, thanks again for this work and for making it publicly available.

Full adaption mode about real-time-self-adaptive-deep-stereo HOT 4 CLOSED

YJonmo commented on June 3, 2024

Full adaption mode

from real-time-self-adaptive-deep-stereo.

Comments (4)

AlessioTonioni commented on June 3, 2024

Theoretically, if I use the same images for the adaptation that I used for training, then the network should perform the same, if not better.

Not really, because the loss used for the adaptation is far from perfect and is actually very noisy. So what you are experiencing is completely reasonable. When you train with ground truth (gt), you have perfect supervision and no noise in the training process. When you use the self-supervised loss, you inject noise into the training process and the performance starts to decrease. Self-supervision makes sense only if you don't have any other source of ground truth.
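
To make the point about noisy self-supervision concrete, here is a minimal sketch in plain Python. It assumes a simplified L1 reprojection loss on a single scanline, which is not necessarily the exact loss used in the repository; the function names (`warp_right_to_left`, `photometric_loss`, `supervised_loss`) are hypothetical. On a textureless scanline the photometric loss cannot tell a wrong disparity from the correct one, while the supervised loss can.

```python
def warp_right_to_left(right_row, disparity_row):
    """Reconstruct the left scanline by sampling the right one at x - d
    (linear interpolation, coordinates clamped to the image border)."""
    width = len(right_row)
    recon = []
    for x, d in enumerate(disparity_row):
        src = min(max(x - d, 0.0), width - 1.0)
        x0 = int(src)
        x1 = min(x0 + 1, width - 1)
        w = src - x0
        recon.append((1.0 - w) * right_row[x0] + w * right_row[x1])
    return recon


def photometric_loss(left_row, right_row, disparity_row):
    """Self-supervised signal: mean L1 between the left scanline and the
    right scanline warped with the predicted disparity (assumed form)."""
    recon = warp_right_to_left(right_row, disparity_row)
    return sum(abs(l - r) for l, r in zip(left_row, recon)) / len(left_row)


def supervised_loss(disparity_row, gt_row):
    """Supervised signal: mean L1 between predicted and gt disparity."""
    return sum(abs(d - g) for d, g in zip(disparity_row, gt_row)) / len(gt_row)


# Textureless scanline: every pixel has the same intensity in both views.
left = [5.0] * 8
right = [5.0] * 8
gt_disparity = [2.0] * 8
wrong_disparity = [0.0] * 8

# The photometric loss is identical for the wrong and the correct disparity,
# so it gives no useful gradient here; the supervised loss still does.
loss_photo_wrong = photometric_loss(left, right, wrong_disparity)
loss_photo_gt = photometric_loss(left, right, gt_disparity)
loss_sup_wrong = supervised_loss(wrong_disparity, gt_disparity)
```

Real images are not fully textureless, but occlusions, reflections, and repeated patterns cause similar ambiguities, which is one plausible reason the self-supervised signal is noisier than ground truth.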

The interesting point is that if I use my own stereo endoscope images for the Full mode adaptation, then after a few hundred iterations the result starts to look good.

Again, this is expected because you deploy the network on completely different images, so the initial performance is quite bad. Even if the self-supervised loss is very noisy, it will still provide reasonable training signals that should improve the prediction from really bad to something better.

So I am not sure that pretraining the network on the Blender images gives any improvement in depth estimation compared with just using an unsupervised approach such as Godard 2017.

In my experience, the pretraining helps the network converge faster once you start to train in a new environment. Think of it like pretraining a classification network on ImageNet and then fine-tuning it for a different dataset. If you start from scratch on real images without any kind of precise supervision, I think you will get worse performance, but you can definitely try it.

YJonmo commented on June 3, 2024

Thanks for the feedback!

By the way, I am also trying to get the pose of the camera in the environment.
Is there any approach for finding the camera pose that is similar to yours, i.e. pretraining on ground truth and then adapting to real stereo images without pose labels?
Monodepth2 by Godard estimates the pose, but it is unsupervised.

Thanks

AlessioTonioni commented on June 3, 2024

As far as I know, no. The simplest thing would be to take something similar to the pose estimation network from Monodepth2 by Godard and add a supervised loss on the camera pose, assuming that you have ground truth for the camera pose in a training set.
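
A rough sketch of that idea in plain Python, under several assumptions: the pose is a 6-vector `(tx, ty, tz, rx, ry, rz)` with an axis-angle rotation, the supervised term is a simple L1 distance, and it is added to an existing self-supervised loss with a weight. The names `supervised_pose_loss`, `total_loss`, `rotation_weight`, and `pose_weight` are hypothetical and not taken from Monodepth2 or this repository.

```python
def mean_abs_error(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def supervised_pose_loss(pred_pose, gt_pose, rotation_weight=1.0):
    """L1 loss on translation plus a weighted L1 loss on axis-angle rotation.

    Assumes small relative rotations between frames, where comparing
    axis-angle components directly is a reasonable approximation.
    """
    pred_t, pred_r = pred_pose[:3], pred_pose[3:]
    gt_t, gt_r = gt_pose[:3], gt_pose[3:]
    return mean_abs_error(pred_t, gt_t) + rotation_weight * mean_abs_error(pred_r, gt_r)


def total_loss(photometric, pred_pose, gt_pose=None, pose_weight=0.1):
    """Add the supervised pose term only when gt pose is available
    (pretraining); fall back to the self-supervised loss otherwise."""
    if gt_pose is None:
        return photometric
    return photometric + pose_weight * supervised_pose_loss(pred_pose, gt_pose)


pred = (0.1, 0.0, 0.0, 0.0, 0.0, 0.3)
gt = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
pretraining_loss = total_loss(0.5, pred, gt)  # with pose supervision
adaptation_loss = total_loss(0.5, pred)       # no gt pose at deployment
```

With this split, the same network could be pretrained with the pose term on data that has ground-truth poses and then adapted on real stereo images using only the self-supervised term.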

YJonmo commented on June 3, 2024

thank you
