Comments (8)
Interesting ideas! Thanks for sharing these.
I'm surprised that the disp_multi output is so blurry. I would guess that there might be a bug somewhere in how your multiple views are being preprocessed or used. It might be worth carefully checking tensorboard images (e.g. the cost volume minimums) to verify that sensible things are being done when it comes to intrinsics, extrinsics etc.
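For the cost-volume check, a minimal sketch of the kind of sanity image I mean (shapes and variable names here are illustrative, not ManyDepth's actual internals):

```python
import numpy as np

# Illustrative only: a cost volume of shape [num_depth_bins, H, W],
# where a lower value means a better match at that depth bin.
rng = np.random.default_rng(0)
cost_volume = rng.random((96, 192, 640)).astype(np.float32)

# Per-pixel index of the lowest-cost bin -- a coarse depth map.
# If intrinsics or extrinsics are wrong, this image tends to look like
# noise rather than a plausible scene, which makes the bug easy to spot
# when dumped to tensorboard.
lowest_cost_bin = np.argmin(cost_volume, axis=0)  # shape [H, W]
```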
Using multiple views
I would say to start with that the idea of using the backwards camera (in reverse) seems very sensible – a good idea! I would avoid using the side cameras though, at least to start with; as you point out the pose network is going to have a hard job there, and those cameras are going to be seeing quite different things to what the front and back cameras are seeing.
Cropping with intrinsics
I agree you should have to change the intrinsics when you crop – but I'm not sure I quite follow the logic you're using here:
# The center points are the original center points + (0.5 * the number of cropped pixels on the bottom) - (0.5 * the number of pixels cropped on the top)
I'm also not sure of all the conventions used in Berkeley.
Overall – when you crop an image like so:
cropped_image = uncropped_image[crop_top:(crop_top + crop_height), crop_left:(crop_left + crop_width)]
my understanding is that the principal point changes as follows:
cropped_cx = uncropped_cx - crop_left
cropped_cy = uncropped_cy - crop_top
The focal lengths don't change at all. Perhaps you could check that this is happening in your code?
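A quick self-contained check of that claim (all numbers made up): projecting the same 3D point with the original and the shifted intrinsics should give pixel coordinates that differ by exactly the crop offset.

```python
import numpy as np

# Made-up pixel-unit intrinsics for illustration
K = np.array([[700.0,   0.0, 600.0],
              [  0.0, 700.0, 500.0],
              [  0.0,   0.0,   1.0]])

crop_top, crop_left = 300, 100

# Shift only the principal point; the focal lengths stay the same
K_cropped = K.copy()
K_cropped[0, 2] -= crop_left
K_cropped[1, 2] -= crop_top

# Project an arbitrary 3D point (camera coordinates) with both matrices
point = np.array([1.0, 0.5, 10.0])
u, v, _ = (K @ point) / point[2]
u_c, v_c, _ = (K_cropped @ point) / point[2]

# The projections differ by exactly the crop offsets
assert np.isclose(u - crop_left, u_c)
assert np.isclose(v - crop_top, v_c)
```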
Different image sizes
In theory – yes! But in practice this introduces a lot of potential for hard-to-find bugs in intrinsics and extrinsics especially. So make sure a simple version (e.g. where you only use sequences with one single image size) works well first!
from manydepth.
Hi!
I have been able to test some of the ideas I mentioned, with some interesting results.
Using multiple views
I have tested a bit using the backward-facing camera simultaneously with the front-facing camera. I reverted the temporal order for the backward-facing camera and treated the images as a separate "scene," consisting of between 100 and 120 frames. I have provided a GIF displaying the images in a scene:
Here, I have cropped the image so that the bottom part of the vehicle is not showing, removing ~300px from the bottom. I have also cropped the top by the same amount. The total number of samples in my training set is now ~35 000.
When training with this dataset, I now get some interesting-looking disparity images:
These results look slightly like the results mentioned in your paper about moving objects when using the baseline model without the consistency loss. However, from my understanding, the disparity maps generated from a single image in that case looked OK.
Cropping
I noticed that my train of thought might be a bit vague. I'm unsure whether the principal point should represent coordinates in the original image or in the cropped one. E.g. suppose I have a 1200x1000 image with a cx of 600 and a cy of 500, and I crop 300 pixels from both the top and the bottom, resulting in a 1200x400 image. Should my cx and cy still represent coordinates in the 1200x1000 image, leaving the values unchanged, or should they represent coordinates in the new image, keeping cx at 600 but moving cy to 200?
I will try to change my intrinsic calculation to match your suggestions, and retrain the network:
def load_intrinsics(self, folder, frame_index):
    path = pathlib.Path(self.data_path + folder).parent
    cam_name = folder.split('/')[-1]

    # assumes the file holds raw floats written with ndarray.tofile;
    # for files written with np.save, use np.load instead
    K = np.fromfile(f'{path}/{cam_name}_k_matrix.npy').reshape(3, 3)
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # crop_value = [left, top, right, bottom] pixels cropped from each edge
    crop_left, crop_top = self.crop_value[0], self.crop_value[1]
    cropped_width = self.full_res_shape[0] - self.crop_value[2] - crop_left
    cropped_height = self.full_res_shape[1] - self.crop_value[3] - crop_top

    # shift the principal point into the cropped image's coordinates;
    # the focal lengths stay the same in pixel units
    intrinsics = np.array([[fx, 0, cx - crop_left, 0],
                           [0, fy, cy - crop_top, 0],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=np.float32)

    # normalise by the *cropped* size, since the cropped image is what
    # actually gets resized to the network resolution
    intrinsics[0, 0] /= cropped_width
    intrinsics[0, 2] /= cropped_width
    intrinsics[1, 1] /= cropped_height
    intrinsics[1, 2] /= cropped_height
    return intrinsics
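As a sanity check on the cropping maths, here is a standalone sketch (a hypothetical helper, not the actual loader; I'm assuming crop = (left, top, right, bottom)) that shifts the principal point by the top/left crop and normalises by the cropped dimensions, verified with made-up numbers:

```python
import numpy as np

def normalised_intrinsics(K, full_w, full_h, crop):
    """Hypothetical helper: K is a 3x3 pixel-unit intrinsics matrix,
    crop = (left, top, right, bottom) pixels removed from each edge.
    Returns a 4x4 intrinsics matrix normalised by the cropped size."""
    left, top, right, bottom = crop
    w = full_w - left - right
    h = full_h - top - bottom
    out = np.array([[K[0, 0], 0, K[0, 2] - left, 0],
                    [0, K[1, 1], K[1, 2] - top, 0],
                    [0, 0, 1, 0],
                    [0, 0, 0, 1]], dtype=np.float32)
    out[0, :3] /= w  # fx and cx relative to cropped width
    out[1, :3] /= h  # fy and cy relative to cropped height
    return out

# 1200x1000 image, principal point at the centre, 300px cropped from
# top and bottom: the normalised principal point lands at (0.5, 0.5)
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 500.0],
              [0.0, 0.0, 1.0]])
norm = normalised_intrinsics(K, 1200, 1000, (0, 300, 0, 300))
```

The key point is that fy and cy end up normalised by the cropped height (400), not the full height (1000), so that re-scaling by the network resolution later recovers a consistent matrix.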
Btw, these are the training parameters used:
- height 224
- width 608
- freeze_teacher_epoch 12
- batch_size 12
Nice! Yes, the disp_multi results you have here look much sharper than what you posted before. Did you change anything else?
The results are from training with a larger dataset (with the added backward-facing camera images) and changing the intrinsic matrix as described in my first comments. No changes other than those.
Super – thanks for reporting back on this. Very interesting results.
I hope now that disp_multi gives results on a par with (or ideally better than) disp_mono.
Do you have any thoughts on the cars (moving objects) that are detected as far away? I have noticed this behavior both when using the front camera only and when using the backward/forward cameras. I also noticed it on another dataset I am training on (DDAD from Toyota Research Institute) when only using the front-facing camera.
Yes – this 'hole punching' behaviour is pretty common when training on monocular videos with moving objects.
This is discussed in some detail in the monodepth2 paper ('Auto-Masking Stationary Pixels' section), and, to a lesser extent, the ManyDepth paper.
Automasking in monodepth2 helps a little with these, but doesn't solve the problem completely. You might want to look at some more recent works, e.g. [1], if they are causing you significant bother. (Or perhaps consider a more hacky solution, e.g. using semantics with some heuristics.)
[1] Hanhan Li, Ariel Gordon, Hang Zhao, Vincent Casser, and Anelia Angelova. Unsupervised monocular depth learning in dynamic scenes. In CoRL, 2020
It's not really a big problem at the moment, but it surely is something that I'll look into improving if possible! Thank you so much for your help and input! Really appreciate you taking the time to give such detailed answers! :)