
manydepth's People

Contributors

daniyar-niantic, jamiewatson683, mdfirman


manydepth's Issues

inconsistent and incorrect scale in predicted depth

Hi there, thanks for sharing your code. I am trying to train and predict on my own dataset.
The predicted depth JPEG looks fine to the eye. However, when I reconstruct the depth image into a point cloud in camera coordinates, the points are inconsistent within the same object and across the whole scene.

For example, the upper pole on the right is stretched, which is not obvious in the JPEG. Also, the scale of the car is not normal.

I think one reason is that the depth scale is not correct, either locally or globally.

Has your group noticed these issues? I would really appreciate any advice!
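For reference, this is a minimal back-projection sketch (assuming an undistorted pinhole model, a metric depth map, and pixel-unit intrinsics aligned with the RGB image; depth_to_pointcloud is an illustrative helper, not from the repo) that can help rule out viewer or intrinsics problems:

    import numpy as np

    # Minimal back-projection sketch (pinhole model, no distortion assumed).
    # `depth` is an HxW metric depth map, `K` the 3x3 pixel-unit intrinsics
    # of the image the depth map is aligned with.
    def depth_to_pointcloud(depth, K):
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - K[0, 2]) * depth / K[0, 0]
        y = (v - K[1, 2]) * depth / K[1, 1]
        return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

Note that the raw network output is a sigmoid disparity up to an unknown scale, so it has to be converted to depth and scaled before back-projecting.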

RGB image:
[image]

Depth image:
[image]

Point cloud (the poles are twisted):
[image]

Point cloud (the car is stretched):
[image]

Training with Lyft

Hi, I have some questions regarding training with a custom dataset.

(I noticed that my issue became a bit lengthy, so here's a TL;DR):

  1. Can I use images pointing in more than one direction to increase the number of samples in the dataset?
  2. Do I need to modify the intrinsic matrix when cropping the images?
  3. Can I use images from different cameras, with different dimensions, but pointing in the same direction?

More in-depth questions

I'm trying to use the data from the Lyft dataset. It contains images from multiple cameras, all pointing in different directions. I've mainly used the front-facing camera, but I'm not sure how good the result actually is. I've attached some samples of the original data and their corresponding disparity images:

Original image:
[image]

Disp_mono:
[image]

Disp_multi:
[image]

Training stats after 42k batches:
[three TensorBoard charts]

As you can see, the model has clearly learned the most important principles, but I still feel that these disparity images are not as good as those created by training with the Kitti dataset.

The total number of images from the front-facing camera is ~17,000. I guess the model would benefit from more data, but this leads me to my questions:

Do you think it would be possible to use data from cameras pointing in different directions simultaneously with the data from the front-facing camera? I'm a bit concerned about how this will affect the pose network, as the cameras move differently relative to each other. The Lyft vehicles are equipped with cameras in the following setup:

[diagram of the Lyft camera setup]

Another possibility that I might try is to use the backward-facing camera. Using this in reverse temporal order would simulate the car moving forward (although with some other views than the forward-facing ones).

I have also tried to crop the images a bit, as the original images contain the lower part of the vehicle. In doing so, I have also changed the cx and cy parameters in the intrinsic matrix (I used the Berkeley Automation perception library here: https://berkeleyautomation.github.io/perception/api/camera_intrinsics.html), but I'm not quite sure whether I should change the intrinsics at all. I've done it like this:

import pathlib
import numpy as np
# CameraIntrinsics comes from the Berkeley perception library linked above.

# This is defined in __init__()
self.crop_value = (4, 200, 4, 216)

# The intrinsic matrix is different for each vehicle, so each sequence contains the associated vehicle's intrinsic.
path = pathlib.Path(self.data_path + folder).parent
K = np.fromfile(f'{path}/CAM_FRONT_k_matrix.npy')
K = K.reshape(3, 3)

fx = K[0, 0] 
cx = K[0, 2]
fy = K[1, 1]
cy = K[1, 2]

# Initialize the camera intrinsic params.
cam_intrinsics = CameraIntrinsics(
            fx=fx,
            fy=fy,
            cx=cx,
            cy=cy,
            width=self.full_res_shape[0],
            height=self.full_res_shape[1]
        )

# Calculate the new dimensions and center points.
cropped_width = self.full_res_shape[0] - self.crop_value[2] - self.crop_value[0]
cropped_height = self.full_res_shape[1] - self.crop_value[3] - self.crop_value[1]

# The crop centre (in original-image coordinates) is the image centre shifted by
# half the difference between the amounts cropped on the two opposite sides.
crop_cj = (self.full_res_shape[0] - self.crop_value[2] + self.crop_value[0]) // 2
crop_ci = (self.full_res_shape[1] - self.crop_value[3] + self.crop_value[1]) // 2

# Generate the new cropped intrinsics.
cropped_intrinsics = cam_intrinsics.crop(
    height=cropped_height,
    width=cropped_width,
    crop_ci=crop_ci,
    crop_cj=crop_cj,
)

# Create the 4x4 version.
intrinsics = np.array([[cropped_intrinsics.fx, 0, cropped_intrinsics.cx, 0],
                       [0, cropped_intrinsics.fy, cropped_intrinsics.cy, 0],
                       [0, 0, 1, 0],
                       [0, 0, 0, 1]]).astype(np.float32)

# Normalise fx, fy by the original dimensions and cx, cy by the cropped dimensions.
intrinsics[0, 0] /= self.full_res_shape[0]
intrinsics[1, 1] /= self.full_res_shape[1]
intrinsics[0, 2] /= cropped_width
intrinsics[1, 2] /= cropped_height
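
For comparison, here is a minimal sketch of the usual adjustment for a pure crop (crop_intrinsics is an illustrative helper, not from the repo or the perception library): the focal lengths stay unchanged and the principal point simply shifts by the number of pixels removed on the left and top.

    import numpy as np

    # Illustrative sketch: adjust 3x3 pixel-unit intrinsics for a crop that
    # removes `crop_left` columns and `crop_top` rows from the image.
    def crop_intrinsics(K, crop_left, crop_top):
        K = K.copy()
        K[0, 2] -= crop_left   # cx shifts by the cropped columns
        K[1, 2] -= crop_top    # cy shifts by the cropped rows
        return K

If the cropped image is what the loader later resizes, would it be more consistent to normalise all four entries by the cropped dimensions (fx, cx by the cropped width; fy, cy by the cropped height), rather than mixing original and cropped sizes as I do above?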

I have also noticed that some of the sequences in the Lyft dataset contain images with different dimensions: some images are 1224x1024 and some are 1920x1080. As long as I normalize the intrinsic matrix with the corresponding image dimensions, do you think there would be any problem using these images simultaneously? One possibility might be to crop both so that they end up in the same format, if that is feasible (as per my other question).

How can I save predicted depth map?

It's great work! But I have one question.

After evaluating the model, how can I save the predicted depth maps?
I noticed there is an 'eval_split' option in the code and I think it can save the predicted depth maps.
If I set 'eval_split' to 'benchmark', an error occurs:
    Traceback (most recent call last):
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/hzc/manydepth/manydepth/evaluate_depth.py", line 371, in <module>
        evaluate(options.parse())
      File "/home/hzc/manydepth/manydepth/evaluate_depth.py", line 158, in evaluate
        for i, data in tqdm.tqdm(enumerate(dataloader)):
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/tqdm/std.py", line 1178, in __iter__
        for obj in iterable:
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
        data = self._next_data()
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/hzc/manydepth/manydepth/datasets/mono_dataset.py", line 157, in __getitem__
        folder, frame_index + i, side, do_flip)
      File "/home/hzc/manydepth/manydepth/datasets/kitti_dataset.py", line 65, in get_color
        color = self.loader(self.get_image_path(folder, frame_index, side))
      File "/home/hzc/manydepth/manydepth/datasets/kitti_dataset.py", line 82, in get_image_path
        self.data_path, folder, "image_0{}/data".format(self.side_map[side]), f_str)
    KeyError: None

Detach the reprojection loss mask needed?

Hi,

Thanks a lot for this open repo and your interesting paper! I have a question about a detail in your code. In trainer.py, line 607:

reprojection_loss = reprojection_loss * reprojection_loss_mask 
reprojection_loss = reprojection_loss.sum() / (reprojection_loss_mask.sum() + 1e-7) 

I wonder, does this reprojection_loss_mask need to be detached?

Thanks!

Hi, I would like to make two small suggestions about the evaluate code

Thanks for your awesome work!

The first suggestion is to use transforms.InterpolationMode.LANCZOS instead of Image.ANTIALIAS to avoid the UserWarning. In the Image.py of PIL we can find LANCZOS = ANTIALIAS = 1, so the performance will not change after the replacement.

self.interp = Image.ANTIALIAS

UserWarning:
/usr/local/lib/python3.8/dist-packages/torchvision/transforms/transforms.py:280: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.

The second suggestion is about tqdm.

for i, data in tqdm.tqdm(enumerate(dataloader)):

The correct way to use it is for i, data in enumerate(tqdm.tqdm(dataloader)):, and this will allow the progress bar to display correctly. 😄
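
For completeness, here is a quick sketch of the first replacement (illustrative only, not a patch from the repo); both arguments select the same Lanczos filter, so the output is unchanged, and Image.ANTIALIAS has since been removed in Pillow 10 anyway:

    from PIL import Image
    from torchvision import transforms

    # Both resizes use the Lanczos filter; only the first triggers the UserWarning
    # (and it breaks entirely on Pillow >= 10, where Image.ANTIALIAS was removed).
    resize_old = transforms.Resize((192, 640), interpolation=Image.ANTIALIAS)
    resize_new = transforms.Resize((192, 640), interpolation=transforms.InterpolationMode.LANCZOS)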

Input channels for input_features for PoseDecoder

Hi, I got a question regarding the input_features data for PoseDecoder network.

From the line below, the PoseDecoder accepts an input feature with number of channels equal to self.num_ch_enc[-1], which according to the ResnetMultiImageInput encoder, should be 512.
self.convs[("squeeze")] = nn.Conv2d(self.num_ch_enc[-1], 256, 1)

However, the output features of the ResnetEncoder have the following shapes, which suggests that only the last element of the feature list is accepted by the PoseDecoder?
torch.Size([1, 64, 320, 96])
torch.Size([1, 64, 160, 48])
torch.Size([1, 128, 80, 24])
torch.Size([1, 256, 40, 12])
torch.Size([1, 512, 20, 6])

Perhaps I am reading the code wrongly, so I would appreciate it if anyone could explain it to me. Thank you so much!
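
To check my reading, here is a standalone sketch (not the repo's code; shapes copied from above) showing that only the final 512-channel map feeds the squeeze conv. If the decoder follows monodepth2's PoseDecoder, its forward pass takes only f[-1] from each input's feature list.

    import torch
    import torch.nn as nn

    # Standalone sketch: only features[-1] (the 512-channel map) is consumed by
    # the pose decoder's "squeeze" conv; the earlier scales are unused there.
    num_ch_enc = [64, 64, 128, 256, 512]
    squeeze = nn.Conv2d(num_ch_enc[-1], 256, 1)

    features = [torch.randn(1, c, 320 // 2 ** i, 96 // 2 ** i)
                for i, c in enumerate(num_ch_enc)]
    out = squeeze(features[-1])
    print(out.shape)   # torch.Size([1, 256, 20, 6])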

Feature request: Tensorflow lite version

First, congratulations on the project, and thank you for sharing this research and the models.

I am interested in a TensorFlow Lite version of this model; I would appreciate it if you could share one in case you have it. I would like to test it on an Android device with very limited resources.

batch size 8 got abs rel 0.130

Hi, thanks for your great work and for sharing the code.
I have a V100 GPU but I cannot start training with batch size 12; the maximum batch size I can use is 8. I didn't change any other parameters, and I got poor performance:

[evaluation results and TensorBoard screenshots]

What should I do to get better performance?

About monodepth teacher

May I ask how you initialize the teacher monodepth2 network in your training? Do you load pretrained monodepth weights, or train it from scratch (not exactly scratch, but from the pretrained ResNet)?

I ask because I think my monodepth result is good, but when I disable the pose network and use my ground-truth poses, the mono prediction after training with manydepth is not nearly as good, and of course neither is the multi prediction.

Thank you.

Can't get custom dataset to perform as well as monodepth2

I'm running into issues getting manydepth to produce good results on a custom dataset. The same dataset works pretty well with monodepth2 (though dynamic objects are incorrect). I'm using the same camera intrinsics as for monodepth2, so I'm pretty sure they're correct. I've also tried freezing the teacher at 5 epochs, but it produces the same results.

I've got about 26k pairs of images at 20 fps at 640x416 resolution.

python -m manydepth.train --dataset <custom> --data_path ../my-dataset/ --batch_size 9 --log_frequency 5 --height 416 --num_workers 16 --png --freeze_teacher_epoch 5

Original:
[image]

Manydepth:
[image]

Monodepth2:
[image]

Depth map scale for KITTI data

What is the scaling factor needed to get metric depth maps from output disparity maps with the KITTI dataset?

I see that a lot of the code is from monodepth2, including the same disparity-to-depth transformation when predicting for KITTI images; that is, disp_to_depth with default values 0.1 and 100, followed by scaling by the KITTI stereo factor of 5.4. Using these default values, the transformation can be summarised by the following formula:

depth = 5.4 / (0.01+9.99*disparity)
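
As a reference for the numbers below, here is a sketch of that conversion (monodepth2-style disp_to_depth; treat the exact implementation details as an assumption):

    import numpy as np

    # Sketch of the disp_to_depth conversion described above
    # (min_depth=0.1, max_depth=100, KITTI stereo factor 5.4).
    def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
        min_disp = 1.0 / max_depth                      # 0.01
        max_disp = 1.0 / min_depth                      # 10.0
        scaled_disp = min_disp + (max_disp - min_disp) * disp
        return 1.0 / scaled_disp

    disp = np.array([0.027917, 0.187170, 0.651358])     # sample raw outputs
    metric_depth = 5.4 * disp_to_depth(disp)            # == 5.4 / (0.01 + 9.99 * disp)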

However, using this same transformation on the output of manydepth results in depth maps with completely different scales from those of the monodepth2 depth maps. For example, the output for test_sequence_target.jpg on the manydepth KITTI_HR model in multi mode has the following statistics:

output          max value   mean       median     min value
raw disparity   0.651358    0.247255   0.187170   0.027917
depth map       18.6921     3.23547    2.87261    0.828594

Compare this with the output of running the same image on the monodepth2 mono+stereo_1024x320 model:

output          max value   mean       median     min value
raw disparity   0.114764    0.037749   0.026548   0.006090
depth map       76.2298     20.6049    19.6213    4.66927

The same can be seen for any images in the KITTI dataset.

Clearly, because the scale of the raw output disparities is very different, a different scale needs to be applied when transforming to depth, but I can't find anywhere in the code what it should be. Is there a known value to scale the depth maps for KITTI images so that depth is in metric scale, or at least so that they better match the scale used by monodepth2 for KITTI images?

How to debug when training with "python -m manydepth.train"

Thanks for the impressive work! I have a question about the tools you use for debugging manydepth. Since you use "python -m manydepth.train" so that the package paths resolve, I can't find a way to debug with VS Code because there is no runnable .py file to start from. Can you give some insights on how to debug the code?
Thanks a lot!
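
In case it helps, a tiny hypothetical wrapper file gives the debugger a runnable script that mimics python -m (VS Code's launch.json also accepts a "module" entry such as "manydepth.train" instead of "program"):

    # debug_train.py -- hypothetical wrapper so an IDE debugger has a runnable
    # file; behaves like "python -m manydepth.train".
    import runpy

    if __name__ == "__main__":
        runpy.run_module("manydepth.train", run_name="__main__", alter_sys=True)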

Moving object from left to right

Thanks for the excellent work.

I see you use self-supervised training to deal with the cost-volume overfitting, so the network can predict well from multiple frames when there is an object moving in front of the camera in the same direction as the camera, like the cars ahead.

I also tested your model, predicting from multiple frames, when an object moves from left to right in front of the camera. I thought it would give a wrong result, but the result is just fine, which is what I don't understand. I think in this case the cost volume has two image regions (the areas covered by the moving object in the two frames) that cannot find a match at any depth. So why does it still predict well? Can I trust this result?

Just to demonstrate, I post these two images, but this is not what I use to train/predict:
[image]
[image]

parameters of the model

Hi,
I have a question about the number of parameters and the computational cost of the model (MR for KITTI): should the parameters and computation of the teacher network also be included?

Can you provide more ablations about mask?

Hi, thank you for your excellent work!
The mask you propose is very useful, but I still have some questions.
In the body of the paper, your loss is:

[loss equation image from the paper]

but have you tried removing the (1 - M) factor on L_p? Like this:

[equation image: the loss without (1 - M) on L_p]

or removing both M and (1 - M):

[equation image: the loss without M and (1 - M)]

And another question: in your ablation you show that ManyDepth (with motion masking, w/o teacher) performs much worse than ManyDepth (w/o motion masking). What are your thoughts on this? I would expect adding the motion masking to L_p to make the model better, because the model would no longer attempt to reproject moving objects, but the results seem to get much worse.

Stereo + Temporal

Hi:

Thank you for sharing this wonderful work!

In Monodepth2 you tried Monocular, Stereo, and M+S; have you tried stereo in this manydepth setting? Does it give much of a performance boost over monocular?

Thank you!

About test-time-refinement (TTR)

Hi @JamieWatson683, thank you for this very exciting project! May I ask a question: do you provide code for the test-time refinement (TTR) shown in the main table of the Results section? If so, how can I use it on my own sequence?

Depth estimation from underwater monocular video sequences

Hi @mdfirman @daniyar-niantic,
Thanks for your work!
I tested your model on an underwater dataset, but the results are not very good. After debugging, the loss drops normally and the pose network works normally, but the final result is very strange: the depth values are almost all between 0.01 and 0.15 m. Does the model simply not work for this type of dataset? Here are some images from my dataset; do you know what the problem is? Thanks!
[example underwater images and a predicted disparity map]

about confidence_mask

I don't think I understand what confidence_mask is and what is this function doing:

    def compute_confidence_mask(self, cost_volume, num_bins_threshold=None):
        """ Returns a 'confidence' mask based on how many times a depth bin was observed"""

        if num_bins_threshold is None:
            num_bins_threshold = self.num_depth_bins
        confidence_mask = ((cost_volume > 0).sum(1) == num_bins_threshold).float()

        return confidence_mask

Is this just the same as 1 - missing_mask?
Can you please explain it? Or is there an explanation in the paper? Thanks!

Contact person / email

Hi,
thanks for the interesting paper.
I found a small issue in the content of the paper and would like to discuss it with the main contact, but couldn't find any contact details in the paper.

Looking forward to hearing from you
Yevhen

"Normal" Training Loss and Strange Test Result

After fixing the "--png" bug, I also faced difficulties in reproducing good results.

Training Loss

[training loss curve]

with command

CUDA_VISIBLE_DEVICES=0 python3 -m manydepth.train --data_path /home/kitti_raw/ --log_dir workdirs/ --model_name manydepth --png

which looks quite normal (I don't know exactly what to expect, but it seems reasonable at least).

Test Results

   abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 |                                                                                                                              
&   0.454  &   4.961  &  12.336  &   0.607  &   0.288  &   0.541  &   0.754  \\ 

which is clearly wrong.

Tensorboard Validation events

[validation colour images]
[validation disparity images]

I can't detect a bug from these.

Local Code Modifications

I modified the colour augmentation part of datasets/mono_dataset.py for compatibility with the new torchvision (which does not seem to be the main problem).
[git diff of mono_dataset.py]

I also modified the export_gt script (the original script does not work for me because the splits live in the folder above the script).

[git diff of the export_gt script]

Strange training loss and test result

I ran

CUDA_VISIBLE_DEVICES=0 python3 -m manydepth.train --data_path /home/kitti_raw/ --log_dir workdirs/ --model_name manydepth

epoch 0 | batch 0 | examples/s: 2.8 | loss: 0.00810 | time elapsed: 00h00m09s | time left: 00h00m00s
epoch 0 | batch 250 | examples/s: 22.4 | loss: 0.00049 | time elapsed: 00h02m34s | time left: 11h19m40s
epoch 0 | batch 500 | examples/s: 20.6 | loss: 0.00024 | time elapsed: 00h05m00s | time left: 10h59m55s
epoch 0 | batch 750 | examples/s: 22.2 | loss: 0.00013 | time elapsed: 00h07m26s | time left: 10h50m49s
epoch 0 | batch 1000 | examples/s: 21.5 | loss: 0.00008 | time elapsed: 00h09m52s | time left: 10h44m55s
epoch 0 | batch 1250 | examples/s: 21.0 | loss: 0.00018 | time elapsed: 00h12m18s | time left: 10h41m15s
epoch 0 | batch 1500 | examples/s: 21.7 | loss: 0.00019 | time elapsed: 00h14m45s | time left: 10h37m37s
epoch 0 | batch 1750 | examples/s: 21.5 | loss: 0.00011 | time elapsed: 00h17m10s | time left: 10h33m45s

The loss is extremely small.

The result at epoch 12 (it should be reasonable by this point) is not:

abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 |
&   0.443  &   4.757  &  12.083  &   0.588  &   0.303  &   0.561  &   0.766  \\

get_depth never enabled

Hi, and thank you for your contribution!

I have previously trained monodepth2 on the Lyft dataset with success, and I'm now trying to train manydepth with the same dataloader (with some modifications, e.g. for the new load_intrinsics() function). When using GT depth generated from scans from an onboard lidar, I noticed that get_depth() is never called, even though check_depth() returns True. Looking at MonoDataset, I noticed on line 192 that this functionality seems to be disabled. Is this intentional?

From MonoDataset;

        if self.load_depth and False:
            depth_gt = self.get_depth(folder, frame_index, side, do_flip)
            inputs["depth_gt"] = np.expand_dims(depth_gt, 0)
            inputs["depth_gt"] = torch.from_numpy(inputs["depth_gt"].astype(np.float32))

I tried removing the additional False, but it seems the lidar data in the Lyft dataset does not have a value count divisible by 4, as per this ValueError:

  File "/cluster/work/didriksg/depth_detection/manydepth/manydepth/kitti_utils.py", line 70, in generate_depth_map
    velo = load_velodyne_points(velo_filename)
  File "/cluster/work/didriksg/depth_detection/manydepth/manydepth/kitti_utils.py", line 16, in load_velodyne_points
    points = np.fromfile(filename, dtype=np.float32).reshape(-1, 4)
ValueError: cannot reshape array of size 555895 into shape (4)

I suppose a solution here is to drop the last/first few values so that the total is divisible by 4?
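
For reference, here is a sketch of both workarounds (the function name is mine, not from kitti_utils). Note that 555895 is divisible by 5 but not by 4, so each Lyft point may actually carry five values (e.g. x, y, z, intensity, ring index), in which case reshaping to (-1, 5) and keeping the first four columns seems safer than dropping values:

    import numpy as np

    # Illustrative loader: handle scans whose flat length is a multiple of 5
    # (five values per point) and fall back to trimming stray trailing values.
    def load_lyft_velodyne_points(filename):
        points = np.fromfile(filename, dtype=np.float32)
        if points.size % 4 != 0 and points.size % 5 == 0:
            return points.reshape(-1, 5)[:, :4]
        return points[: (points.size // 4) * 4].reshape(-1, 4)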

I also have some questions regarding some suspicious-looking loss, but I will look a bit more into it and possibly post it in a separate issue.

how to choose the best freeze_teacher_epoch parameter?

Hi, thanks for sharing the great work!

But I'm confused: when training on a custom dataset, how should I choose the best freeze_teacher_epoch parameter? Does it depend on the amount of data?

Looking forward to your reply. Thank you!

Question About Input Image Size

Hi there, I am trying to train the model using KITTI raw images. I realize that the KITTI raw images have a resolution of 1242 x 375, while the default image setting for the model is 640 x 192. Do I have to resize all the KITTI raw images to 640 x 192 before using them for training? Thank you for your advice!

Supplementary materials

Hi nianticlabs:

Thank you for sharing this amazing research project with the community!
Just one question: where can we access the supplementary material for this paper?

thank you!

sincerely
Ziyue Feng

How to get the error map between predicted depth and the GT depth map

Hi,

I'm very interested in the error-map visualization in Fig. 4 of your paper. Do you use the projected lidar point cloud as the GT, or the improved KITTI ground truth, for the error computation? I also wonder whether you interpolate the GT depth map.

Can you provide the code to show the error map? Thank you a lot :D

how to do the ablation study without teacher

Hi, I want to reproduce your ablation study without the teacher network, i.e. "ManyDepth (with motion masking, w/o teacher) 0.154" in Table 4 of your paper.
I notice there are options for freezing the teacher network. How can I get the results of the model without the teacher?

Bug in intrinsics re-scaling?

I noticed you are updating intrinsics like so:

K[0, :] *= self.width // (2 ** scale)
K[1, :] *= self.height // (2 ** scale)

Aren't you supposed to multiply the intrinsics by the ratio new_shape / orig_shape? If you are resizing your image to self.width // (2 ** scale), then shouldn't it be K[0, :] *= (self.width // (2 ** scale)) / orig_width, where orig_width is the original width of the image before resizing?

What you are doing here seems to be just multiplying the intrinsics by the size of the new image. That can't be right. Am I misreading the code?
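
For reference, here is my reading of what would make this correct (a sketch; the normalised values are an assumption about the dataloader): if K is stored already divided by the original image dimensions, then multiplying by the target resolution at each scale is exactly new_size / orig_size applied to pixel-unit intrinsics.

    import numpy as np

    # Normalised intrinsics: fx, cx divided by the original width; fy, cy by the
    # original height (KITTI-style values, shown for illustration only).
    K_normalised = np.array([[0.58, 0.0, 0.5, 0.0],
                             [0.0, 1.92, 0.5, 0.0],
                             [0.0, 0.0, 1.0, 0.0],
                             [0.0, 0.0, 0.0, 1.0]], dtype=np.float32)

    width, height, scale = 640, 192, 0
    K = K_normalised.copy()
    K[0, :] *= width // (2 ** scale)    # back to pixel units at this scale
    K[1, :] *= height // (2 ** scale)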

AssertionError: size of input tensor and input format are different.

Traceback (most recent call last):
  File "/home/hzc/anaconda3/envs/cas/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/hzc/anaconda3/envs/cas/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hzc/manydepth/manydepth/train.py", line 16, in <module>
    trainer.train()
  File "/home/hzc/manydepth/manydepth/trainer.py", line 211, in train
    self.run_epoch()
  File "/home/hzc/manydepth/manydepth/trainer.py", line 242, in run_epoch
    self.log("train", inputs, outputs, losses)
  File "/home/hzc/manydepth/manydepth/trainer.py", line 742, in log
    consistency_target, self.step)
  File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/tensorboardX/writer.py", line 608, in add_image
    image(tag, img_tensor, dataformats=dataformats), global_step, walltime)
  File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/tensorboardX/summary.py", line 283, in image
    tensor = convert_to_HWC(tensor, dataformats)
  File "/home/hzc/anaconda3/envs/cas/lib/python3.6/site-packages/tensorboardX/utils.py", line 103, in convert_to_HWC
    tensor shape: {}, input_format: {}".format(tensor.shape, input_format)
AssertionError: size of input tensor and input format are different.         tensor shape: (1, 3, 192, 640), input_format: CHW

I didn't edit any code, but this error occurs when I train. How can I solve it?
Thanks in advance.

A tiny bug during training

Hi, thanks for your great work!

I found a tiny bug that affects the learning-rate decay: in the train() function in train.py, when the epoch reaches freeze_teacher_epoch, the optimizer and lr_scheduler are reset, which makes the epoch count restart from 0 from the lr_scheduler's point of view.

I have verified that the lr never decays in normal training, because step_size=15 and at epoch == 15 the lr_scheduler is reset.

I fixed it and trained a new model under the same conditions, getting the following results:

          abs_rel   sq_rel   rmse    rmse_log   a1      a2      a3
KITTI_MR  0.098     0.770    4.459   0.176      0.900   0.965   0.983
NEW       0.100     0.755    4.423   0.178      0.899   0.964   0.983

It seems better in sq_rel and rmse.
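
For reference, a minimal sketch of the fix (names and values are illustrative, not a verbatim patch): when the optimizer is rebuilt at freeze_teacher_epoch, tell the new StepLR scheduler which epoch training has already reached, so the decay at step_size=15 still happens.

    import torch

    # Illustrative: rebuild optimizer + scheduler at the freeze epoch while
    # preserving the scheduler's notion of the current epoch.
    model = torch.nn.Linear(1, 1)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    current_epoch = 15                          # e.g. the epoch at which the reset happens

    for group in optimizer.param_groups:        # StepLR needs this when last_epoch != -1
        group.setdefault("initial_lr", group["lr"])

    scheduler = torch.optim.lr_scheduler.StepLR(
        optimizer, step_size=15, gamma=0.1, last_epoch=current_epoch - 1)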

Custom data sets (high speed scenes) do not work well

First of all, thank you for your contribution to depth estimation!
When reproducing your code, I used my own custom dataset to train the monocular model; it consists of 6k consecutive frames.
I have also changed the intrinsics matrix K in the data loader, which should be correct after verification.
However, the training results are still unsatisfactory: I cannot even generate a correct depth map of the road, nor get correct depths for the vehicles on it.
So I would like to ask what could be causing this. My personal guess, apart from the relatively small dataset, is that in high-speed scenes where the environment is relatively simple, monocular training may not produce large losses and therefore the network cannot be trained sufficiently.
I would be grateful for a solution!
Thanks!

Multi-GPU

Hi:

The training time seems similar to Monodepth2, which is pretty efficient!
I'm wondering if it's possible to utilize multiple GPUs?

Thank you

About depth bins

Hi,
Thanks for the interesting paper. It is really impressive and inspiring.
I want to ask some questions about the binning strategy.
In options.py there are two options, inverse and linear, and linear is the default chosen for your model.
As far as I know, many deep-learning MVS depth papers construct the cost volume with planes sampled in inverse depth space. In your case, does linear perform better than inverse sampling? Could you also explain any insight behind this choice?
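
For concreteness, here is a generic sketch of the two strategies I mean (not the repo's exact compute_depth_bins): linear spaces the planes uniformly in depth, while inverse spaces them uniformly in inverse depth (disparity), putting more planes near the camera.

    import numpy as np

    # Generic sketch of the two binning strategies (illustrative only).
    def make_depth_bins(min_depth, max_depth, num_bins, mode="linear"):
        if mode == "linear":
            return np.linspace(min_depth, max_depth, num_bins)
        # "inverse": uniform in 1/depth, then mapped back to depth
        return 1.0 / np.linspace(1.0 / min_depth, 1.0 / max_depth, num_bins)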

how to get the RGBD map like the video shows

The demo video shows an RGBD map.
[frame from the demo video showing the reconstructed RGBD map]

I'm curious about how to get this RGBD map.
A possible method is: depth image + intrinsics -> point cloud -> aggregate all point clouds with poses -> voxelization -> RGBD map (sketched below).
Does anybody know how to generate this RGBD map?
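
A sketch of that pipeline using Open3D (assuming Open3D is acceptable; the paths, intrinsics, and frames list are placeholders, not values from the demo):

    import open3d as o3d

    # Sketch: back-project each frame to a coloured point cloud, move it into a
    # common world frame with its pose, accumulate, then voxelise.
    def frame_to_cloud(rgb_path, depth_path, intrinsic, cam_to_world):
        color = o3d.io.read_image(rgb_path)
        depth = o3d.io.read_image(depth_path)        # assumed 16-bit depth in mm
        rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
            color, depth, convert_rgb_to_intensity=False)
        cloud = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)
        cloud.transform(cam_to_world)                # 4x4 camera-to-world pose
        return cloud

    intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 192, 320.0, 320.0, 320.0, 96.0)  # placeholder values
    frames = []   # fill with (rgb_path, depth_path, 4x4 pose) tuples
    world_map = o3d.geometry.PointCloud()
    for rgb_path, depth_path, pose in frames:
        world_map += frame_to_cloud(rgb_path, depth_path, intrinsic, pose)
    world_map = world_map.voxel_down_sample(voxel_size=0.05)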

Train on own dataset with not good result

Hi, thanks for your interesting paper and innovative ideas on depth estimation. I am trying to train your model on our own campus dataset to see if it works well in real time. As a newcomer to deep learning, I followed your experiment setup and code instructions but still get frustrating results. Could you give me some advice on training to get a better result?

My frame order is [0, -1, 1], so I changed the code to match the input.
[screenshot of the modified code]

My result:
[screenshots of input frames and predicted disparity maps]

My settings:
{
"data_path": "/media/xzy/daa84e38-7f66-4aa4-a0ce-4fe978abe706/xzy/Downloads/manydepth/dump_root",
"log_dir": "/media/xzy/daa84e38-7f66-4aa4-a0ce-4fe978abe706/xzy/Downloads/manydepth/log",
"model_name": "Vecan_model",
"split": "vecan",
"num_layers": 18,
"depth_binning": "linear",
"num_depth_bins": 96,
"dataset": "cityscapes_preprocessed",
"png": true,
"height": 192,
"width": 640,
"disparity_smoothness": 0.001,
"scales": [
0,
1,
2,
3
],
"min_depth": 0.1,
"max_depth": 80.0,
"frame_ids": [
0,
-1,
1
],
"batch_size": 8,
"learning_rate": 0.0001,
"num_epochs": 20,
"scheduler_step_size": 15,
"freeze_teacher_and_pose": false,
"freeze_teacher_epoch": 5,
"v1_multiscale": false,
"avg_reprojection": false,
"disable_automasking": false,
"no_ssim": false,
"weights_init": "pretrained",
"use_future_frame": false,
"num_matching_frames": 1,
"disable_motion_masking": false,
"no_matching_augmentation": false,
"no_cuda": false,
"num_workers": 8,
"load_weights_folder": "/media/xzy/daa84e38-7f66-4aa4-a0ce-4fe978abe706/xzy/Downloads/manydepth/manydepth/checkpoint/KITTI_MR",
"mono_weights_folder": null,
"models_to_load": [
"encoder",
"depth",
"pose_encoder",
"pose"
],
"log_frequency": 250,
"save_frequency": 1,
"eval_stereo": false,
"eval_mono": false,
"disable_median_scaling": false,
"pred_depth_scale_factor": 1,
"ext_disp_to_eval": null,
"eval_split": "eigen",
"save_pred_disps": false,
"no_eval": false,
"eval_eigen_to_benchmark": false,
"eval_out_dir": null,
"post_process": false,
"zero_cost_volume": false,
"static_camera": false
}

Training on NYU-V2 Dataset

Hi, is it possible to train the model on the NYU v2 dataset and evaluate it with the existing code? Do we need any preprocessing for that?

depth ground truth error

Thank you for your excellent work; I have learned many things from your source code.

I saw the error that occurs when the ground-truth depth is loaded (issue 9).

Can I just remove "and False" on line 192 of mono_dataset.py?

Or do I need to make further modifications to other parts of the source code?

Wrong depth scale when using ground-truth camera poses.

Hi, I have a question about using ground-truth camera poses instead of predicted camera poses. I tried using camera poses with the correct scale from the KITTI dataset, but I find the output scale is still not correct. Is there anything I missed? I only changed the code as follows:

output, lowest_cost, costvol = encoder(input_color, lookup_frames,
                                                       relative_poses, # change to relative_poses_gt
                                                       K,
                                                       invK,
                                                       min_depth_bin, max_depth_bin)

Thanks a lot!

pose_enc.load_state_dict error

I tested the KITTI_HR model and got this error:

    -> Loading weights from /home/wangshuo/PycharmProjects/test_list/depth/manydepth/models/KITTI_HR
    Traceback (most recent call last):
      File "/home/wangshuo/anaconda3/envs/many/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/wangshuo/anaconda3/envs/many/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/wangshuo/PycharmProjects/test_list/depth/manydepth/manydepth/evaluate_depth.py", line 399, in <module>
        evaluate(options.parse())
      File "/home/wangshuo/PycharmProjects/test_list/depth/manydepth/manydepth/evaluate_depth.py", line 146, in evaluate
        pose_enc.load_state_dict(pose_enc_dict, strict=True)
      File "/home/wangshuo/anaconda3/envs/many/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for ResnetEncoder:
        Unexpected key(s) in state_dict: "encoder.bn1.num_batches_tracked", "encoder.layer1.0.bn1.num_batches_tracked", "encoder.layer1.0.bn2.num_batches_tracked", "encoder.layer1.1.bn1.num_batches_tracked", "encoder.layer1.1.bn2.num_batches_tracked", "encoder.layer2.0.bn1.num_batches_tracked", "encoder.layer2.0.bn2.num_batches_tracked", "encoder.layer2.0.downsample.1.num_batches_tracked", "encoder.layer2.1.bn1.num_batches_tracked", "encoder.layer2.1.bn2.num_batches_tracked", "encoder.layer3.0.bn1.num_batches_tracked", "encoder.layer3.0.bn2.num_batches_tracked", "encoder.layer3.0.downsample.1.num_batches_tracked", "encoder.layer3.1.bn1.num_batches_tracked", "encoder.layer3.1.bn2.num_batches_tracked", "encoder.layer4.0.bn1.num_batches_tracked", "encoder.layer4.0.bn2.num_batches_tracked", "encoder.layer4.0.downsample.1.num_batches_tracked", "encoder.layer4.1.bn1.num_batches_tracked", "encoder.layer4.1.bn2.num_batches_tracked".

Why disable gradients of on lookup images?

In resnet_encoder.py, lines 275-291:

# feature extraction on lookup images - disable gradients to save memory
with torch.no_grad():
    if self.adaptive_bins:
        self.compute_depth_bins(min_depth_bin, max_depth_bin)
    ......

I don't understand why gradients are disabled for the lookup images. If this isn't done, will the result be affected?

Depth evaluation for single frame mode?

Thanks for the wonderful work!

I have a question for the depth evaluation:
When I evaluate the depth performance on a single image that has no previous frame, I set
"--zero_cost_volume" and "--num_matching_frames 0"
in the evaluation options.
However, "evaluate_depth" encounters a failure: because "frames_to_load[1:]" is empty, "lookup_frames" receives an empty tensor list.

What should I set or change for single-frame evaluation, where the test frame has no previous or future frames at all?

About update_adaptive_depth_bins in trainer.py

Thanks for sharing your amazing work and code.

I have a question about the update_adaptive_depth_bins() function (around line 364). The paper mentions that the depth range is dynamically updated using the min and max of the MVS depth (i.e. the student network), but when checking the code, mono_depth is used instead. Do I misunderstand this? Or is the MVS depth learned to mimic the mono depth? Thanks for clarifying.

What should Cityscapes look like?

I followed this repo to preprocess the Cityscapes dataset, but I get 'FileNotFoundError: [Errno 2] No such file or directory: '/home/hzc/cityscape/ulm/ulm_000056_000015.jpg'' when I train the model.

So, what should this dataset look like?

Freeze Teacher network from beginning

Hi,

thanks a lot for sharing this.

I have a fully pre-trained teacher network and tried to freeze its weights directly from the beginning, as I guess it does not make sense to train it further.

However, if I set --freeze_teacher_and_pose as a run option, then self.min_depth_tracker and self.max_depth_tracker are never set in trainer.py, because the following lines are never reached:

self.min_depth_tracker = 0.1
self.max_depth_tracker = 10.0

Thus, I get an error on the following lines:

min_depth_bin = self.min_depth_tracker
max_depth_bin = self.max_depth_tracker

How do you suggest initializing self.min_depth_tracker and self.max_depth_tracker when the teacher weights are frozen from the very beginning? I suppose it makes sense to initialize them to reflect the range of depth values that my pretrained model produces?
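
One possible patch I'm considering (a sketch; the exact values are assumptions, not from the repo) is to seed the trackers whenever the teacher and pose networks are frozen from epoch 0, and later tighten them from a few teacher predictions:

    # Sketch of a possible patch in trainer.py __init__ (values are assumptions):
    # seed the trackers even when the teacher/pose networks are frozen from the
    # start; they could later be refined from a few teacher predictions.
    if self.opt.freeze_teacher_and_pose:
        self.min_depth_tracker = self.opt.min_depth     # e.g. 0.1
        self.max_depth_tracker = self.opt.max_depth     # or ~10.0, matching the unfrozen path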

Thanks in advance and best regards,
Patrick

Pre-trained models for monocular depth networks

Thank you for open-sourcing this very interesting work. Would it be possible to also provide weights for the monocular depth networks (the teacher networks) that go along with the currently available pre-trained models? Thank you!

Can't reproduce the results on cityscapes

Hi, it's great work!

I followed the instructions to train and evaluate on Cityscapes, but got the following result, which is slightly different from the paper:
[evaluation results screenshot]

So, how can I achieve the reported SOTA results?
