
sparse-depth-completion's Introduction

Sparse-Depth-Completion

This repo contains the implementation of our paper Sparse and Noisy LiDAR Completion with RGB Guidance and Uncertainty by Wouter Van Gansbeke, Davy Neven, Bert De Brabandere and Luc Van Gool.

If you find this interesting or relevant to your work, consider citing:

@inproceedings{wvangansbeke_depth_2019,
    author={Van Gansbeke, Wouter and Neven, Davy and De Brabandere, Bert and Van Gool, Luc},
    booktitle={2019 16th International Conference on Machine Vision Applications (MVA)},
    title={Sparse and Noisy LiDAR Completion with RGB Guidance and Uncertainty},
    year={2019},
    pages={1-6},
    organization={IEEE}
}

License

This software is released under a Creative Commons license which allows for personal and research use only. For a commercial license, please contact the authors. You can view a license summary here.

Introduction

Monocular depth prediction methods fail to generate absolute and precise depth maps, and stereoscopic approaches are still significantly outperformed by LiDAR-based approaches. The goal of the depth completion task is to generate dense depth predictions from sparse and irregular point clouds. This project makes use of uncertainty to combine multiple sensor data in order to generate accurate depth predictions. Mapped LiDAR points together with (monocular) RGB images are used in this framework. This method held the 1st place entry on the KITTI depth completion benchmark at the time the paper was submitted.

The contribution of this paper is threefold:

  • Global and local information are combined in order to accurately complete and correct the sparse and noisy LiDAR input. Monocular RGB images are used for the guidance of this depth completion task.
  • Confidence maps are learned for the global branch and the local branch in an unsupervised manner. The predicted depth maps are weighted by their respective confidence map; this is the late fusion technique used in our framework (a minimal sketch follows this list).
  • This method ranks first on the KITTI depth completion benchmark without using additional data or postprocessing.
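
The following is a minimal PyTorch-style sketch of the confidence-weighted late fusion described above (illustrative only; the actual fusion is implemented inside the network in the Models directory and happens at multiple stages):

import torch
import torch.nn.functional as F

# global_depth, local_depth: (B, 1, H, W) depth predictions
# global_conf, local_conf:   (B, 1, H, W) unnormalized confidence maps
def fuse(global_depth, local_depth, global_conf, local_conf):
    # A per-pixel softmax over the two confidence maps gives weights that sum to 1.
    weights = F.softmax(torch.cat([global_conf, local_conf], dim=1), dim=1)
    # Weight each prediction by its confidence and sum.
    return weights[:, 0:1] * global_depth + weights[:, 1:2] * local_depth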

See full demo on YouTube. The predictions of our model for the KITTI test set can be downloaded here.

[demo]

Requirements

Python 3.7. The most important packages are PyTorch, torchvision, NumPy, Pillow and Matplotlib. (Works with PyTorch 1.1.)

Dataset

The KITTI dataset is used. First, download the depth completion dataset. Secondly, download and unzip the camera images from KITTI. I used the script download_raw_files.sh, but use it at your own risk: make sure you understand it, otherwise don't use it. To stay on the safe side, download the files directly from KITTI's website.

The complete dataset consists of 85898 training samples, 6852 validation samples, 1000 selected validation samples and 1000 test samples.

Preprocessing

This step is optional, but it allows you to convert the images to JPEGs and to downsample the original LiDAR frames. This will create a new dataset in $dest. You can find the required preprocessing in Datasets/Kitti_loader.py.

Run:

source Shell/preprocess $datapath $dest $num_samples

(First, the PNG images are converted to JPEGs to save space. Second, two directories are built, one for training and one for validation. See Datasets/Kitti_loader.py.)
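
For illustration, here is a minimal sketch of what downsampling a sparse LiDAR depth map to $num_samples points could look like (an assumption for clarity; the repository's actual preprocessing lives in Datasets/Kitti_loader.py and Shell/preprocess):

import numpy as np

def downsample_sparse_depth(depth, num_samples, seed=0):
    # Randomly keep `num_samples` of the valid (non-zero) LiDAR returns.
    rng = np.random.default_rng(seed)
    valid = np.flatnonzero(depth)              # indices of pixels with a LiDAR point
    if len(valid) <= num_samples:
        return depth
    keep = rng.choice(valid, size=num_samples, replace=False)
    out = np.zeros_like(depth)
    out.flat[keep] = depth.flat[keep]          # copy only the sampled points
    return out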

Dataset structure should look like this:

|-- depth_selection
|-- Depth
|    |-- train
|    |    |-- date
|    |    |    |-- sequence1
|    |    |    |-- ...
|    |-- validation
|-- RGB
|    |-- train
|    |    |-- date
|    |    |    |-- sequence1
|    |    |    |-- ...
|    |-- validation

Run Code

To run the code:

python main.py --data_path /path/to/data/ --lr_policy plateau

Flags:

  • Set flag "input_type" to rgb or depth.
  • Set flag "pretrained" to true or false to use a model pretrained on Cityscapes for the global branch.
  • See python main.py --help for more information.

or

source Shell/train.sh $datapath

Check out more details in the bash file.
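
For example, a hypothetical invocation that combines the flags above (the exact flag names and accepted values are defined in main.py, so check python main.py --help first):

python main.py --data_path /path/to/data/ --input_type rgb --pretrained true --lr_policy plateau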

Trained models

Our network architecture is based on ERFNet.

You can find the model pretrained on Cityscapes here. This model is used for the global network.

You can find a fully trained model and its corresponding predictions for the KITTI test set here. The RMSE is around 802 mm on the selected validation set for this model as reported in the paper.

To test it, save the model in a folder in the Saved directory

and execute the following command:

source Test/test.sh /path/to/directory_with_saved_model/ $num_samples /path/to/dataset/ /path/to/directory_with_ground_truth_for_selected_validation_files/

(You might have to recompile the C evaluation files provided by KITTI if your architecture differs from mine.)

Results

Comparison with the state of the art:

[results]

Discussion

Practical discussion:

  • I recently increased the stability of the training process and made convergence faster by adding some skip connections between the global and local networks. Initially I only used guidance by multiplication with an attention map (i.e. a probability), but found that this is less robust and that the difference between a focal MSE and a vanilla MSE loss function is now negligible. Be aware that this change will alter the appearance of the confidence maps, since fusion now happens at multiple stages.

  • Feel free to experiment with different architectures for the global or local network. It is easy to add new architectures to Models/__init__.py

  • I used a Tesla V100 GPU for evaluation.

Acknowledgement

This work was supported by Toyota, and was carried out at the TRACE Lab at KU Leuven (Toyota Research on Automated Cars in Europe - Leuven)

sparse-depth-completion's People

Contributors

wvangansbeke


sparse-depth-completion's Issues

Testing with other KITTI sequences or other data

Hi!

We have been working with a KITTI sequence that does not seem to have projected Lidar scans available. If I understand the code correctly, you do not use the raw Lidar scans at all, right? So we would have to project the Lidar scans to images if we want to do something like that?

Also, we have a handheld Lidar (Intel Realsense L515) that can capture both RGB images and depth images, and we were thinking of trying to feed your code with this data. Do you know if the images (rgb and depth) have to be of the same size for the code to work?

Best,
Adam

About normalizing the input

In issue #5 (comment), the dataloader had code to normalize depth and RGB to 0-1, but I can't find the normalization code in the current dataloader version. Which way (normalized or not) is better?

Problem with metric calculation

Hi, thank you for sharing your great work.

I've been looking through your implementation of the evaluation metric. In class Metrics of file benchmark_metrics.py, I found this line self.num = valid_mask.sum().item(). I think it might be wrong because this is the number of non-zero pixels. I think this variable should be the batch_size and equal to 1 because you use this variable when computing the average value of metrics in file main.py as follows:
score.update(metric.get_metric(args.metric), metric.num)
score_1.update(metric.get_metric(args.metric_1), metric.num)

Am I understanding this correctly? Please correct me if I have misunderstood something.

Best,
Khang Truong.
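
For context, a minimal sketch (hypothetical, not the repository's code) of the count-weighted running average that score.update(value, metric.num) implies when metric.num is the number of valid pixels:

class RunningAverage:
    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def update(self, value, n=1):
        # Each update is weighted by n, so samples with more valid pixels
        # contribute proportionally more than weighting by batch size would.
        self.sum += value * n
        self.count += n

    @property
    def avg(self):
        return self.sum / max(self.count, 1)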

About data preprocessing

Hi,
I'm trying to figure out the data preprocessing code (Kitti_loader.py); maybe the description of the data preprocessing in the README is lacking some details. For instance: "This will create a new dataset in $dest. You can find the required preprocessing in:"

Maybe I'm just not good enough at reading code, haha.

Looking forward to your reply @wvangansbeke

What is "input_type" in args???

Hi there, thanks for the code...
Can you kindly elaborate on what this "input_type" argument is for?

Sorry for a noob question; I am just learning deep learning and getting my hands dirty with your code.

Error while testing on the pretrained model (trained on KITTI)

Hey!

I find your work really interesting. I am trying to test your approach with the pre-trained model you have provided, but when I run the command from the README it gives me an error.

I have two questions regarding the testing on the pre-trained model.
Q1.
I took a look at test.sh; the arguments passed in the README do not match the positions of the arguments given in test.sh.

According to test.sh:
the first argument is the save path,
the second argument is the number of samples,
the third argument is the data path,
the fourth argument is the ground-truth data path.

When I try to run this, it gives me the error "no such file or directory".
Please help to solve this issue.

Q2. Regarding the number of samples.

  • Can you explain the purpose of this argument? Is it the number of samples to be tested?

Any help would be highly appreciated.

Thank you

The training condition of the best model

Hi,
It's very nice of you to share your code.

However, I cannot reproduce the best model, whose RMSE is 802 on the selected validation set.

Can you tell me the detailed training hyper-parameters of the best model (which loss function did you choose? Which LR policy? What initial learning rate)? And how many epochs does it take to get that model?

Many thanks!!

Can't run the code

Hi, I want to reproduce your results. I have followed the procedure in the README; however, I found several places that are not specific enough. Would you mind sharing the following details?

  1. Do you unpack the downloaded dataset (data_depth_annotated.zip, data_depth_velodyne.zip, data_depth_selection.zip), put it in the Data folder, and then do the preprocessing part as you wrote?

  2. The contents and structure of the Saved folder, e.g. what are the three paths in the following command: source Test/test.sh /path/to/directory_with_saved_model/ $num_samples /path/to/dataset/ /path/to/directory_with_ground_truth_for_selected_validation_files/

I would appreciate it if you could reply.

Incorrect depth estimation when there is low LiDAR density

Hi,
I was working with your neural network and I have faced some small issues when downsampling the LiDAR input to around 500 samples. In order to downsample the input, I used the function that you provide: downsample. Afterwards, I have tested how it predicts depth in the pixels where there is actual LiDAR information in the input and I have got the following results:

LiDAR depth: 2049.609375 groundtruth depth: 2049.609375 NN depth: 2097.265625
LiDAR depth: 1317.578125 groundtruth depth: 1303.125 NN depth: 729.6875
LiDAR depth: 2034.765625 groundtruth depth: 1994.921875 NN depth: 1766.015625
LiDAR depth: 1038.671875 groundtruth depth: 1038.671875 NN depth: 604.6875
LiDAR depth: 1788.28125 groundtruth depth: 1784.375 NN depth: 527.34375
LiDAR depth: 1769.921875 groundtruth depth: 1740.234375 NN depth: 1535.9375
LiDAR depth: 1423.828125 groundtruth depth: 1423.828125 NN depth: 986.71875
LiDAR depth: 1203.125 groundtruth depth: 1190.625 NN depth: 743.359375
LiDAR depth: 1762.890625 groundtruth depth: 1757.421875 NN depth: 1470.3125
LiDAR depth: 2573.4375 groundtruth depth: 2540.234375 NN depth: 2076.953125
LiDAR depth: 655.078125 groundtruth depth: 655.078125 NN depth: 539.453125
LiDAR depth: 1229.296875 groundtruth depth: 1229.296875 NN depth: 436.71875

I have also seen that for the normal LiDAR depth maps, the output is never the depth map's values but very similar ones in the same pixel position:

LiDAR depth: 1219.53125 groundtruth depth: 1219.53125 NN depth: 1220.3125
LiDAR depth: 2633.203125 groundtruth depth: 2633.203125 NN depth: 2554.296875
LiDAR depth: 1113.671875 groundtruth depth: 1113.671875 NN depth: 1115.625
LiDAR depth: 4438.28125 groundtruth depth: 4438.28125 NN depth: 4425.0
LiDAR depth: 627.734375 groundtruth depth: 618.75 NN depth: 632.03125
LiDAR depth: 1235.15625 groundtruth depth: 1233.984375 NN depth: 1235.546875
LiDAR depth: 1150.390625 groundtruth depth: 1150.390625 NN depth: 1148.4375
LiDAR depth: 2821.484375 groundtruth depth: 2821.484375 NN depth: 2819.140625
LiDAR depth: 1296.484375 groundtruth depth: 1296.484375 NN depth: 1284.765625

I believe the issue is due to the lack of information density, but I wanted to know if there is any way to improve these results with such sparse depth information, or if retraining on a sparser dataset (creating my own) would improve the accuracy.
Thanks for your help.
Best,
Sebastián

Size of image from an imported folder

Hi!
I have been tweaking your code in order to gain some insight into how it works and I have run into this question: why, in test.py, do you want the loaded images to have a specific input size (lines 101-104)?

raw_pil = Image.open(raw_path)
gt_path = os.path.join(gt)
gt_pil = Image.open(gt)
assert raw_pil.size == (1216, 352)

But this is not required for training in main.py.
Afterwards I see that you crop the images in both scripts to 1216 x 256 (as stated in your paper), so I do not understand why it would be so important to make sure that the loaded image in test.py has a specific input size (1216, 352), which is not even the size of the KITTI dataset download files (1226 x 370). Is there any insight I am missing?
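
For reference, a minimal sketch of a bottom crop from 1216 x 352 down to 1216 x 256 (an assumption about the cropping convention for illustration, not the repository's exact code):

from PIL import Image

def bottom_crop(img, width=1216, height=256):
    # Keep the full width and the bottom `height` rows, since the top of a
    # KITTI frame contains no LiDAR returns anyway.
    w, h = img.size
    return img.crop((0, h - height, min(width, w), h))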

question about training of confidence map

Hi,

I noticed that your loss calculation has nothing to do with the confidence map of either the local net or the global net. So I'm curious: how do you train those two maps? I'm trying to use your algorithm to train on my dataset, but the confidence map looks weird.

to achieve same results as presented in the paper

@wvangansbeke

Thank you for sharing code.

I read through the code, but found that the "mse" loss is used by default for training in the given Shell/train.sh. At least the loss with uncertainty should be used, I guess. And I have no idea whether other options should be changed too.

Is it possible to share your training scripts which can achieve the same results as presented in the paper?

Training with Confidence for Global and Local Networks

From the paper:
"We first train both parts of the framework individually and use a pretrained ERFNet on Cityscapes [22] for our global network."
I was wondering whether the global and local networks are trained with or without confidence maps, and how the final end-to-end training is done with these pre-trained sub-networks.

Dataset Download

What paths are supposed to be specified when running "source test.sh"? The description says to enter the path to the dataset and the path to the ground truth for the selected validation files. What exactly is the path to the dataset? When I look inside test.py, it seems like it is supposed to be the path to the directory for saving the output.

num_samples should be a positive integer value, but got num_samples=0

Not able to run training. Please help me figure out what went wrong.

Cmd line:- Sparse-Depth-Completion$ python main.py --data_path ../../dataset/train/kitti_lookalike/ --lr_policy plateau --num_samples 128000

Traceback (most recent call last):
File "main.py", line 437, in
main()
File "main.py", line 171, in main
train_loader, valid_loader, valid_selection_loader = get_loader(args, dataset)
File "/home/mdl/azk6085/MDL/Sparse-Depth-Completion/Datasets/dataloader.py", line 54, in get_loader
pin_memory=True, drop_last=True)
File "/home/mdl/azk6085/sw_package/anaconda/envs/tensorflow/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 213, in init
sampler = RandomSampler(dataset)
File "/home/mdl/azk6085/sw_package/anaconda/envs/tensorflow/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 94, in init
"value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

About a pretrained model trained only on sparse data

I want to ask about model_best_epoch.pth.tar: is it trained with RGB and does it require RGB input?
I have my dataset without RGB. Is there a pretrained model without RGB guidance?
And what should I change in the scripts if I use depth only? Just change input_type from rgb to depth?
Thank you.

Pregenerated Depthmaps somewhere?

Hi, do you have these high-resolution depth maps saved somewhere I can download? I could run your network to generate them, but I just need the depth maps.

Code stuck at testing

Latest issue: no change in test.py, except manually hard-coding the best_file_name. No change in test.sh.
[screenshot]

Below are the earlier issues I faced; we can probably ignore them if there is a solution for the above:
I downloaded data_depth_selection, unzipped it, and then ran the following command:
[screenshot]

The output is:
[screenshot]
Ignore the line numbers. I use them inside test.py for debugging.
I changed the default value for --input_type inside test.py from "rgb" to "depth".

If I keep --input_type at its default of "rgb", I get the following output:
[screenshot]

Could it be some issue with the channels? I see that depth images in the KITTI dataset are 16-bit (according to my understanding, that is equal to 2 channels) and RGB images are 4 bit-depth (3 channels), whereas in test.py it is channel_in=1 for depth and 4 for rgb.

For input_type = 'rgb' and channel_in=3, the output is:
[screenshot]

I used the solution, pasted below, from this website: https://discuss.pytorch.org/t/tensor-size-mismatch/31897/2
[screenshot]
In test.py, for input_type = 'depth', channel_in=2.
In test.sh, I removed the highlighted part:
[screenshot]
and changed it to the following:
[screenshot]
Output:
[screenshot]

How to visualize

Hello. How did you visualize the depth map as RGB? I can't find this part in your code. Looking forward to your reply.

how to evaluate?

Hi!
I am confused about how to evaluate on the dataset. I see some code that can do that in main.py, but in "elif args.evaluate ... validate(valid_selection_loader, model, criterion_global, criterion_local)" I get an error that "criterion_global" is not defined. What can I do?
thanks!
@wvangansbeke

Can't load the model

I was trying to load the pretrained model as follows: torch.nn.DataParallel(densifier).cuda().load_state_dict(loaded_dict_enc, strict=False), but I get the following error:

_IncompatibleKeys(missing_keys=['module.depthnet.encoder.initial_block.conv.weight', 'module.depthnet.encoder.initial_block.conv.bias', 'module.depthnet.encoder.initial_block.bn.weight', 'module.depthnet.encoder.initial_block.bn.bias', 'module.depthnet.encoder.initial_block.bn.running_mean', 'module.depthnet.encoder.initial_block.bn.running_var', 'module.depthnet.encoder.layers.0.conv.weight', 'module.depthnet.encoder.layers.0.conv.bias', 'module.depthnet.encoder.layers.0.bn.weight', 'module.depthnet.encoder.layers.0.bn.bias', 'module.depthnet.encoder.layers.0.bn.running_mean', 'module.depthnet.encoder.layers.0.bn.running_var', 'module.depthnet.encoder.layers.1.conv3x1_1.weight', 'module.depthnet.encoder.layers.1.conv3x1_1.bias', 'module.depthnet.encoder.layers.1.conv1x3_1.weight', 'module.depthnet.encoder.layers.1.conv1x3_1.bias', 'module.depthnet.encoder.layers.1.bn1.weight', 'module.depthnet.encoder.layers.1.bn1.bias', 'module.depthnet.encoder.layers.1.bn1.running_mean', 'module.depthnet.encoder.layers.1.bn1.running_var', 'module.depthnet.encoder.layers.1.conv3x1_2.weight', 'module.depthnet.encoder.layers.1.conv3x1_2.bias', 'module.depthnet.encoder.layers.1.conv1x3_2.weight', 'module.depthnet.encoder.layers.1.conv1x3_2.bias', 'module.depthnet.encoder.layers.1.bn2.weight', 'module.depthnet.encoder.layers.1.bn2.bias', 'module.depthnet.encoder.layers.1.bn2.running_mean', 'module.depthnet.encoder.layers.1.bn2.running_var', 'module.depthnet.encoder.layers.2.conv3x1_1.weight', 'module.depthnet.encoder.layers.2.conv3x1_1.bias', 'module.depthnet.encoder.layers.2.conv1x3_1.weight', 'module.depthnet.encoder.layers.2.conv1x3_1.bias', 'module.depthnet.encoder.layers.2.bn1.weight', 'module.depthnet.encoder.layers.2.bn1.bias', 'module.depthnet.encoder.layers.2.bn1.running_mean', 'module.depthnet.encoder.layers.2.bn1.running_var', 'module.depthnet.encoder.layers.2.conv3x1_2.weight', 'module.depthnet.encoder.layers.2.conv3x1_2.bias', 'module.depthnet.encoder.layers.2.conv1x3_2.weight', 'module.depthnet.encoder.layers.2.conv1x3_2.bias', 'module.depthnet.encoder.layers.2.bn2.weight', 'module.depthnet.encoder.layers.2.bn2.bias', 'module.depthnet.encoder.layers.2.bn2.running_mean', 'module.depthnet.encoder.layers.2.bn2.running_var', 'module.depthnet.encoder.layers.3.conv3x1_1.weight', 'module.depthnet.encoder.layers.3.conv3x1_1.bias', 'module.depthnet.encoder.layers.3.conv1x3_1.weight', 'module.depthnet.encoder.layers.3.conv1x3_1.bias', 'module.depthnet.encoder.layers.3.bn1.weight', 'module.depthnet.encoder.layers.3.bn1.bias', 'module.depthnet.encoder.layers.3.bn1.running_mean', 'module.depthnet.encoder.layers.3.bn1.running_var', 'module.depthnet.encoder.layers.3.conv3x1_2.weight', 'module.depthnet.encoder.layers.3.conv3x1_2.bias', 'module.depthnet.encoder.layers.3.conv1x3_2.weight', 'module.depthnet.encoder.layers.3.conv1x3_2.bias', 'module.depthnet.encoder.layers.3.bn2.weight', 'module.depthnet.encoder.layers.3.bn2.bias', 'module.depthnet.encoder.layers.3.bn2.running_mean', 'module.depthnet.encoder.layers.3.bn2.running_var', 'module.depthnet.encoder.layers.4.conv3x1_1.weight', 'module.depthnet.encoder.layers.4.conv3x1_1.bias', 'module.depthnet.encoder.layers.4.conv1x3_1.weight', 'module.depthnet.encoder.layers.4.conv1x3_1.bias', 'module.depthnet.encoder.layers.4.bn1.weight', 'module.depthnet.encoder.layers.4.bn1.bias', 'module.depthnet.encoder.layers.4.bn1.running_mean', 'module.depthnet.encoder.layers.4.bn1.running_var', 'module.depthnet.encoder.layers.4.conv3x1_2.weight', 
'module.depthnet.encoder.layers.4.conv3x1_2.bias', 'module.depthnet.encoder.layers.4.conv1x3_2.weight', 'module.depthnet.encoder.layers.4.conv1x3_2.bias', 'module.depthnet.encoder.layers.4.bn2.weight', 'module.depthnet.encoder.layers.4.bn2.bias', 'module.depthnet.encoder.layers.4.bn2.running_mean', 'module.depthnet.encoder.layers.4.bn2.running_var', 'module.depthnet.encoder.layers.5.conv3x1_1.weight', 'module.depthnet.encoder.layers.5.conv3x1_1.bias', 'module.depthnet.encoder.layers.5.conv1x3_1.weight', 'module.depthnet.encoder.layers.5.conv1x3_1.bias', 'module.depthnet.encoder.layers.5.bn1.weight', 'module.depthnet.encoder.layers.5.bn1.bias', 'module.depthnet.encoder.layers.5.bn1.running_mean', 'module.depthnet.encoder.layers.5.bn1.running_var', 'module.depthnet.encoder.layers.5.conv3x1_2.weight', 'module.depthnet.encoder.layers.5.conv3x1_2.bias', 'module.depthnet.encoder.layers.5.conv1x3_2.weight', 'module.depthnet.encoder.layers.5.conv1x3_2.bias', 'module.depthnet.encoder.layers.5.bn2.weight', 'module.depthnet.encoder.layers.5.bn2.bias', 'module.depthnet.encoder.layers.5.bn2.running_mean', 'module.depthnet.encoder.layers.5.bn2.running_var', 'module.depthnet.encoder.layers.6.conv.weight', 'module.depthnet.encoder.layers.6.conv.bias', 'module.depthnet.encoder.layers.6.bn.weight', 'module.depthnet.encoder.layers.6.bn.bias', 'module.depthnet.encoder.layers.6.bn.running_mean', 'module.depthnet.encoder.layers.6.bn.running_var', 'module.depthnet.encoder.layers.7.conv3x1_1.weight', 'module.depthnet.encoder.layers.7.conv3x1_1.bias', 'module.depthnet.encoder.layers.7.conv1x3_1.weight', 'module.depthnet.encoder.layers.7.conv1x3_1.bias', 'module.depthnet.encoder.layers.7.bn1.weight', 'module.depthnet.encoder.layers.7.bn1.bias', 'module.depthnet.encoder.layers.7.bn1.running_mean', 'module.depthnet.encoder.layers.7.bn1.running_var', 'module.depthnet.encoder.layers.7.conv3x1_2.weight', 'module.depthnet.encoder.layers.7.conv3x1_2.bias', 'module.depthnet.encoder.layers.7.conv1x3_2.weight', 'module.depthnet.encoder.layers.7.conv1x3_2.bias', 'module.depthnet.encoder.layers.7.bn2.weight', 'module.depthnet.encoder.layers.7.bn2.bias', 'module.depthnet.encoder.layers.7.bn2.running_mean', 'module.depthnet.encoder.layers.7.bn2.running_var', 'module.depthnet.encoder.layers.8.conv3x1_1.weight', 'module.depthnet.encoder.layers.8.conv3x1_1.bias', 'module.depthnet.encoder.layers.8.conv1x3_1.weight', 'module.depthnet.encoder.layers.8.conv1x3_1.bias', 'module.depthnet.encoder.layers.8.bn1.weight', 'module.depthnet.encoder.layers.8.bn1.bias', 'module.depthnet.encoder.layers.8.bn1.running_mean', 'module.depthnet.encoder.layers.8.bn1.running_var', 'module.depthnet.encoder.layers.8.conv3x1_2.weight', 'module.depthnet.encoder.layers.8.conv3x1_2.bias', 'module.depthnet.encoder.layers.8.conv1x3_2.weight', 'module.depthnet.encoder.layers.8.conv1x3_2.bias', 'module.depthnet.encoder.layers.8.bn2.weight', 'module.depthnet.encoder.layers.8.bn2.bias', 'module.depthnet.encoder.layers.8.bn2.running_mean', 'module.depthnet.encoder.layers.8.bn2.running_var', 'module.depthnet.encoder.layers.9.conv3x1_1.weight', 'module.depthnet.encoder.layers.9.conv3x1_1.bias', 'module.depthnet.encoder.layers.9.conv1x3_1.weight', 'module.depthnet.encoder.layers.9.conv1x3_1.bias', 'module.depthnet.encoder.layers.9.bn1.weight', 'module.depthnet.encoder.layers.9.bn1.bias', 'module.depthnet.encoder.layers.9.bn1.running_mean', 'module.depthnet.encoder.layers.9.bn1.running_var', 'module.depthnet.encoder.layers.9.conv3x1_2.weight', 
'module.depthnet.encoder.layers.9.conv3x1_2.bias', 'module.depthnet.encoder.layers.9.conv1x3_2.weight', 'module.depthnet.encoder.layers.9.conv1x3_2.bias', 'module.depthnet.encoder.layers.9.bn2.weight', 'module.depthnet.encoder.layers.9.bn2.bias', 'module.depthnet.encoder.layers.9.bn2.running_mean', 'module.depthnet.encoder.layers.9.bn2.running_var', 'module.depthnet.encoder.layers.10.conv3x1_1.weight', 'module.depthnet.encoder.layers.10.conv3x1_1.bias', 'module.depthnet.encoder.layers.10.conv1x3_1.weight', 'module.depthnet.encoder.layers.10.conv1x3_1.bias', 'module.depthnet.encoder.layers.10.bn1.weight', 'module.depthnet.encoder.layers.10.bn1.bias', 'module.depthnet.encoder.layers.10.bn1.running_mean', 'module.depthnet.encoder.layers.10.bn1.running_var', 'module.depthnet.encoder.layers.10.conv3x1_2.weight', 'module.depthnet.encoder.layers.10.conv3x1_2.bias', 'module.depthnet.encoder.layers.10.conv1x3_2.weight', 'module.depthnet.encoder.layers.10.conv1x3_2.bias', 'module.depthnet.encoder.layers.10.bn2.weight', 'module.depthnet.encoder.layers.10.bn2.bias', 'module.depthnet.encoder.layers.10.bn2.running_mean', 'module.depthnet.encoder.layers.10.bn2.running_var', 'module.depthnet.encoder.layers.11.conv3x1_1.weight', 'module.depthnet.encoder.layers.11.conv3x1_1.bias', 'module.depthnet.encoder.layers.11.conv1x3_1.weight', 'module.depthnet.encoder.layers.11.conv1x3_1.bias', 'module.depthnet.encoder.layers.11.bn1.weight', 'module.depthnet.encoder.layers.11.bn1.bias', 'module.depthnet.encoder.layers.11.bn1.running_mean', 'module.depthnet.encoder.layers.11.bn1.running_var', 'module.depthnet.encoder.layers.11.conv3x1_2.weight', 'module.depthnet.encoder.layers.11.conv3x1_2.bias', 'module.depthnet.encoder.layers.11.conv1x3_2.weight', 'module.depthnet.encoder.layers.11.conv1x3_2.bias', 'module.depthnet.encoder.layers.11.bn2.weight', 'module.depthnet.encoder.layers.11.bn2.bias', 'module.depthnet.encoder.layers.11.bn2.running_mean', 'module.depthnet.encoder.layers.11.bn2.running_var', 'module.depthnet.encoder.layers.12.conv3x1_1.weight', 'module.depthnet.encoder.layers.12.conv3x1_1.bias', 'module.depthnet.encoder.layers.12.conv1x3_1.weight', 'module.depthnet.encoder.layers.12.conv1x3_1.bias', 'module.depthnet.encoder.layers.12.bn1.weight', 'module.depthnet.encoder.layers.12.bn1.bias', 'module.depthnet.encoder.layers.12.bn1.running_mean', 'module.depthnet.encoder.layers.12.bn1.running_var', 'module.depthnet.encoder.layers.12.conv3x1_2.weight', 'module.depthnet.encoder.layers.12.conv3x1_2.bias', 'module.depthnet.encoder.layers.12.conv1x3_2.weight', 'module.depthnet.encoder.layers.12.conv1x3_2.bias', 'module.depthnet.encoder.layers.12.bn2.weight', 'module.depthnet.encoder.layers.12.bn2.bias', 'module.depthnet.encoder.layers.12.bn2.running_mean', 'module.depthnet.encoder.layers.12.bn2.running_var', 'module.depthnet.encoder.layers.13.conv3x1_1.weight', 'module.depthnet.encoder.layers.13.conv3x1_1.bias', 'module.depthnet.encoder.layers.13.conv1x3_1.weight', 'module.depthnet.encoder.layers.13.conv1x3_1.bias', 'module.depthnet.encoder.layers.13.bn1.weight', 'module.depthnet.encoder.layers.13.bn1.bias', 'module.depthnet.encoder.layers.13.bn1.running_mean', 'module.depthnet.encoder.layers.13.bn1.running_var', 'module.depthnet.encoder.layers.13.conv3x1_2.weight', 'module.depthnet.encoder.layers.13.conv3x1_2.bias', 'module.depthnet.encoder.layers.13.conv1x3_2.weight', 'module.depthnet.encoder.layers.13.conv1x3_2.bias', 'module.depthnet.encoder.layers.13.bn2.weight', 'module.depthnet.encoder.layers.13.bn2.bias', 
'module.depthnet.encoder.layers.13.bn2.running_mean', 'module.depthnet.encoder.layers.13.bn2.running_var', 'module.depthnet.encoder.layers.14.conv3x1_1.weight', 'module.depthnet.encoder.layers.14.conv3x1_1.bias', 'module.depthnet.encoder.layers.14.conv1x3_1.weight', 'module.depthnet.encoder.layers.14.conv1x3_1.bias', 'module.depthnet.encoder.layers.14.bn1.weight', 'module.depthnet.encoder.layers.14.bn1.bias', 'module.depthnet.encoder.layers.14.bn1.running_mean', 'module.depthnet.encoder.layers.14.bn1.running_var', 'module.depthnet.encoder.layers.14.conv3x1_2.weight', 'module.depthnet.encoder.layers.14.conv3x1_2.bias', 'module.depthnet.encoder.layers.14.conv1x3_2.weight', 'module.depthnet.encoder.layers.14.conv1x3_2.bias', 'module.depthnet.encoder.layers.14.bn2.weight', 'module.depthnet.encoder.layers.14.bn2.bias', 'module.depthnet.encoder.layers.14.bn2.running_mean', 'module.depthnet.encoder.layers.14.bn2.running_var', 'module.depthnet.encoder.output_conv.weight', 'module.depthnet.encoder.output_conv.bias', 'module.depthnet.decoder.layer1.conv.weight', 'module.depthnet.decoder.layer1.conv.bias', 'module.depthnet.decoder.layer1.bn.weight', 'module.depthnet.decoder.layer1.bn.bias', 'module.depthnet.decoder.layer1.bn.running_mean', 'module.depthnet.decoder.layer1.bn.running_var', 'module.depthnet.decoder.layer2.conv3x1_1.weight', 'module.depthnet.decoder.layer2.conv3x1_1.bias', 'module.depthnet.decoder.layer2.conv1x3_1.weight', 'module.depthnet.decoder.layer2.conv1x3_1.bias', 'module.depthnet.decoder.layer2.bn1.weight', 'module.depthnet.decoder.layer2.bn1.bias', 'module.depthnet.decoder.layer2.bn1.running_mean', 'module.depthnet.decoder.layer2.bn1.running_var', 'module.depthnet.decoder.layer2.conv3x1_2.weight', 'module.depthnet.decoder.layer2.conv3x1_2.bias', 'module.depthnet.decoder.layer2.conv1x3_2.weight', 'module.depthnet.decoder.layer2.conv1x3_2.bias', 'module.depthnet.decoder.layer2.bn2.weight', 'module.depthnet.decoder.layer2.bn2.bias', 'module.depthnet.decoder.layer2.bn2.running_mean', 'module.depthnet.decoder.layer2.bn2.running_var', 'module.depthnet.decoder.layer3.conv3x1_1.weight', 'module.depthnet.decoder.layer3.conv3x1_1.bias', 'module.depthnet.decoder.layer3.conv1x3_1.weight', 'module.depthnet.decoder.layer3.conv1x3_1.bias', 'module.depthnet.decoder.layer3.bn1.weight', 'module.depthnet.decoder.layer3.bn1.bias', 'module.depthnet.decoder.layer3.bn1.running_mean', 'module.depthnet.decoder.layer3.bn1.running_var', 'module.depthnet.decoder.layer3.conv3x1_2.weight', 'module.depthnet.decoder.layer3.conv3x1_2.bias', 'module.depthnet.decoder.layer3.conv1x3_2.weight', 'module.depthnet.decoder.layer3.conv1x3_2.bias', 'module.depthnet.decoder.layer3.bn2.weight', 'module.depthnet.decoder.layer3.bn2.bias', 'module.depthnet.decoder.layer3.bn2.running_mean', 'module.depthnet.decoder.layer3.bn2.running_var', 'module.depthnet.decoder.layer4.conv.weight', 'module.depthnet.decoder.layer4.conv.bias', 'module.depthnet.decoder.layer4.bn.weight', 'module.depthnet.decoder.layer4.bn.bias', 'module.depthnet.decoder.layer4.bn.running_mean', 'module.depthnet.decoder.layer4.bn.running_var', 'module.depthnet.decoder.layer5.conv3x1_1.weight', 'module.depthnet.decoder.layer5.conv3x1_1.bias', 'module.depthnet.decoder.layer5.conv1x3_1.weight', 'module.depthnet.decoder.layer5.conv1x3_1.bias', 'module.depthnet.decoder.layer5.bn1.weight', 'module.depthnet.decoder.layer5.bn1.bias', 'module.depthnet.decoder.layer5.bn1.running_mean', 'module.depthnet.decoder.layer5.bn1.running_var', 
'module.depthnet.decoder.layer5.conv3x1_2.weight', 'module.depthnet.decoder.layer5.conv3x1_2.bias', 'module.depthnet.decoder.layer5.conv1x3_2.weight', 'module.depthnet.decoder.layer5.conv1x3_2.bias', 'module.depthnet.decoder.layer5.bn2.weight', 'module.depthnet.decoder.layer5.bn2.bias', 'module.depthnet.decoder.layer5.bn2.running_mean', 'module.depthnet.decoder.layer5.bn2.running_var', 'module.depthnet.decoder.layer6.conv3x1_1.weight', 'module.depthnet.decoder.layer6.conv3x1_1.bias', 'module.depthnet.decoder.layer6.conv1x3_1.weight', 'module.depthnet.decoder.layer6.conv1x3_1.bias', 'module.depthnet.decoder.layer6.bn1.weight', 'module.depthnet.decoder.layer6.bn1.bias', 'module.depthnet.decoder.layer6.bn1.running_mean', 'module.depthnet.decoder.layer6.bn1.running_var', 'module.depthnet.decoder.layer6.conv3x1_2.weight', 'module.depthnet.decoder.layer6.conv3x1_2.bias', 'module.depthnet.decoder.layer6.conv1x3_2.weight', 'module.depthnet.decoder.layer6.conv1x3_2.bias', 'module.depthnet.decoder.layer6.bn2.weight', 'module.depthnet.decoder.layer6.bn2.bias', 'module.depthnet.decoder.layer6.bn2.running_mean', 'module.depthnet.decoder.layer6.bn2.running_var', 'module.depthnet.decoder.output_conv.weight', 'module.depthnet.decoder.output_conv.bias', 'module.convbnrelu.0.0.weight', 'module.hourglass1.conv1.0.0.weight', 'module.hourglass1.conv2.0.weight', 'module.hourglass1.conv3.0.0.weight', 'module.hourglass1.conv4.0.0.weight', 'module.hourglass1.conv5.0.weight', 'module.hourglass1.conv5.1.weight', 'module.hourglass1.conv5.1.bias', 'module.hourglass1.conv5.1.running_mean', 'module.hourglass1.conv5.1.running_var', 'module.hourglass1.conv6.0.weight', 'module.hourglass1.conv6.1.weight', 'module.hourglass1.conv6.1.bias', 'module.hourglass1.conv6.1.running_mean', 'module.hourglass1.conv6.1.running_var', 'module.hourglass2.conv1.0.0.weight', 'module.hourglass2.conv1.1.weight', 'module.hourglass2.conv1.1.bias', 'module.hourglass2.conv1.1.running_mean', 'module.hourglass2.conv1.1.running_var', 'module.hourglass2.conv2.0.weight', 'module.hourglass2.conv3.0.0.weight', 'module.hourglass2.conv3.1.weight', 'module.hourglass2.conv3.1.bias', 'module.hourglass2.conv3.1.running_mean', 'module.hourglass2.conv3.1.running_var', 'module.hourglass2.conv4.0.0.weight', 'module.hourglass2.conv5.0.weight', 'module.hourglass2.conv5.1.weight', 'module.hourglass2.conv5.1.bias', 'module.hourglass2.conv5.1.running_mean', 'module.hourglass2.conv5.1.running_var', 'module.hourglass2.conv6.0.weight', 'module.hourglass2.conv6.1.weight', 'module.hourglass2.conv6.1.bias', 'module.hourglass2.conv6.1.running_mean', 'module.hourglass2.conv6.1.running_var', 'module.fuse.0.0.weight', 'module.fuse.2.weight', 'module.fuse.2.bias'], unexpected_keys=['module.encoder.initial_block.conv.weight', 'module.encoder.initial_block.conv.bias', 'module.encoder.initial_block.bn.weight', 'module.encoder.initial_block.bn.bias', 'module.encoder.initial_block.bn.running_mean', 'module.encoder.initial_block.bn.running_var', 'module.encoder.layers.0.conv.weight', 'module.encoder.layers.0.conv.bias', 'module.encoder.layers.0.bn.weight', 'module.encoder.layers.0.bn.bias', 'module.encoder.layers.0.bn.running_mean', 'module.encoder.layers.0.bn.running_var', 'module.encoder.layers.1.conv3x1_1.weight', 'module.encoder.layers.1.conv3x1_1.bias', 'module.encoder.layers.1.conv1x3_1.weight', 'module.encoder.layers.1.conv1x3_1.bias', 'module.encoder.layers.1.conv3x1_2.weight', 'module.encoder.layers.1.conv3x1_2.bias', 'module.encoder.layers.1.conv1x3_2.weight', 
'module.encoder.layers.1.conv1x3_2.bias', 'module.encoder.layers.1.bn1.weight', 'module.encoder.layers.1.bn1.bias', 'module.encoder.layers.1.bn1.running_mean', 'module.encoder.layers.1.bn1.running_var', 'module.encoder.layers.1.bn2.weight', 'module.encoder.layers.1.bn2.bias', 'module.encoder.layers.1.bn2.running_mean', 'module.encoder.layers.1.bn2.running_var', 'module.encoder.layers.2.conv3x1_1.weight', 'module.encoder.layers.2.conv3x1_1.bias', 'module.encoder.layers.2.conv1x3_1.weight', 'module.encoder.layers.2.conv1x3_1.bias', 'module.encoder.layers.2.conv3x1_2.weight', 'module.encoder.layers.2.conv3x1_2.bias', 'module.encoder.layers.2.conv1x3_2.weight', 'module.encoder.layers.2.conv1x3_2.bias', 'module.encoder.layers.2.bn1.weight', 'module.encoder.layers.2.bn1.bias', 'module.encoder.layers.2.bn1.running_mean', 'module.encoder.layers.2.bn1.running_var', 'module.encoder.layers.2.bn2.weight', 'module.encoder.layers.2.bn2.bias', 'module.encoder.layers.2.bn2.running_mean', 'module.encoder.layers.2.bn2.running_var', 'module.encoder.layers.3.conv3x1_1.weight', 'module.encoder.layers.3.conv3x1_1.bias', 'module.encoder.layers.3.conv1x3_1.weight', 'module.encoder.layers.3.conv1x3_1.bias', 'module.encoder.layers.3.conv3x1_2.weight', 'module.encoder.layers.3.conv3x1_2.bias', 'module.encoder.layers.3.conv1x3_2.weight', 'module.encoder.layers.3.conv1x3_2.bias', 'module.encoder.layers.3.bn1.weight', 'module.encoder.layers.3.bn1.bias', 'module.encoder.layers.3.bn1.running_mean', 'module.encoder.layers.3.bn1.running_var', 'module.encoder.layers.3.bn2.weight', 'module.encoder.layers.3.bn2.bias', 'module.encoder.layers.3.bn2.running_mean', 'module.encoder.layers.3.bn2.running_var', 'module.encoder.layers.4.conv3x1_1.weight', 'module.encoder.layers.4.conv3x1_1.bias', 'module.encoder.layers.4.conv1x3_1.weight', 'module.encoder.layers.4.conv1x3_1.bias', 'module.encoder.layers.4.conv3x1_2.weight', 'module.encoder.layers.4.conv3x1_2.bias', 'module.encoder.layers.4.conv1x3_2.weight', 'module.encoder.layers.4.conv1x3_2.bias', 'module.encoder.layers.4.bn1.weight', 'module.encoder.layers.4.bn1.bias', 'module.encoder.layers.4.bn1.running_mean', 'module.encoder.layers.4.bn1.running_var', 'module.encoder.layers.4.bn2.weight', 'module.encoder.layers.4.bn2.bias', 'module.encoder.layers.4.bn2.running_mean', 'module.encoder.layers.4.bn2.running_var', 'module.encoder.layers.5.conv3x1_1.weight', 'module.encoder.layers.5.conv3x1_1.bias', 'module.encoder.layers.5.conv1x3_1.weight', 'module.encoder.layers.5.conv1x3_1.bias', 'module.encoder.layers.5.conv3x1_2.weight', 'module.encoder.layers.5.conv3x1_2.bias', 'module.encoder.layers.5.conv1x3_2.weight', 'module.encoder.layers.5.conv1x3_2.bias', 'module.encoder.layers.5.bn1.weight', 'module.encoder.layers.5.bn1.bias', 'module.encoder.layers.5.bn1.running_mean', 'module.encoder.layers.5.bn1.running_var', 'module.encoder.layers.5.bn2.weight', 'module.encoder.layers.5.bn2.bias', 'module.encoder.layers.5.bn2.running_mean', 'module.encoder.layers.5.bn2.running_var', 'module.encoder.layers.6.conv.weight', 'module.encoder.layers.6.conv.bias', 'module.encoder.layers.6.bn.weight', 'module.encoder.layers.6.bn.bias', 'module.encoder.layers.6.bn.running_mean', 'module.encoder.layers.6.bn.running_var', 'module.encoder.layers.7.conv3x1_1.weight', 'module.encoder.layers.7.conv3x1_1.bias', 'module.encoder.layers.7.conv1x3_1.weight', 'module.encoder.layers.7.conv1x3_1.bias', 'module.encoder.layers.7.conv3x1_2.weight', 'module.encoder.layers.7.conv3x1_2.bias', 
'module.encoder.layers.7.conv1x3_2.weight', 'module.encoder.layers.7.conv1x3_2.bias', 'module.encoder.layers.7.bn1.weight', 'module.encoder.layers.7.bn1.bias', 'module.encoder.layers.7.bn1.running_mean', 'module.encoder.layers.7.bn1.running_var', 'module.encoder.layers.7.bn2.weight', 'module.encoder.layers.7.bn2.bias', 'module.encoder.layers.7.bn2.running_mean', 'module.encoder.layers.7.bn2.running_var', 'module.encoder.layers.8.conv3x1_1.weight', 'module.encoder.layers.8.conv3x1_1.bias', 'module.encoder.layers.8.conv1x3_1.weight', 'module.encoder.layers.8.conv1x3_1.bias', 'module.encoder.layers.8.conv3x1_2.weight', 'module.encoder.layers.8.conv3x1_2.bias', 'module.encoder.layers.8.conv1x3_2.weight', 'module.encoder.layers.8.conv1x3_2.bias', 'module.encoder.layers.8.bn1.weight', 'module.encoder.layers.8.bn1.bias', 'module.encoder.layers.8.bn1.running_mean', 'module.encoder.layers.8.bn1.running_var', 'module.encoder.layers.8.bn2.weight', 'module.encoder.layers.8.bn2.bias', 'module.encoder.layers.8.bn2.running_mean', 'module.encoder.layers.8.bn2.running_var', 'module.encoder.layers.9.conv3x1_1.weight', 'module.encoder.layers.9.conv3x1_1.bias', 'module.encoder.layers.9.conv1x3_1.weight', 'module.encoder.layers.9.conv1x3_1.bias', 'module.encoder.layers.9.conv3x1_2.weight', 'module.encoder.layers.9.conv3x1_2.bias', 'module.encoder.layers.9.conv1x3_2.weight', 'module.encoder.layers.9.conv1x3_2.bias', 'module.encoder.layers.9.bn1.weight', 'module.encoder.layers.9.bn1.bias', 'module.encoder.layers.9.bn1.running_mean', 'module.encoder.layers.9.bn1.running_var', 'module.encoder.layers.9.bn2.weight', 'module.encoder.layers.9.bn2.bias', 'module.encoder.layers.9.bn2.running_mean', 'module.encoder.layers.9.bn2.running_var', 'module.encoder.layers.10.conv3x1_1.weight', 'module.encoder.layers.10.conv3x1_1.bias', 'module.encoder.layers.10.conv1x3_1.weight', 'module.encoder.layers.10.conv1x3_1.bias', 'module.encoder.layers.10.conv3x1_2.weight', 'module.encoder.layers.10.conv3x1_2.bias', 'module.encoder.layers.10.conv1x3_2.weight', 'module.encoder.layers.10.conv1x3_2.bias', 'module.encoder.layers.10.bn1.weight', 'module.encoder.layers.10.bn1.bias', 'module.encoder.layers.10.bn1.running_mean', 'module.encoder.layers.10.bn1.running_var', 'module.encoder.layers.10.bn2.weight', 'module.encoder.layers.10.bn2.bias', 'module.encoder.layers.10.bn2.running_mean', 'module.encoder.layers.10.bn2.running_var', 'module.encoder.layers.11.conv3x1_1.weight', 'module.encoder.layers.11.conv3x1_1.bias', 'module.encoder.layers.11.conv1x3_1.weight', 'module.encoder.layers.11.conv1x3_1.bias', 'module.encoder.layers.11.conv3x1_2.weight', 'module.encoder.layers.11.conv3x1_2.bias', 'module.encoder.layers.11.conv1x3_2.weight', 'module.encoder.layers.11.conv1x3_2.bias', 'module.encoder.layers.11.bn1.weight', 'module.encoder.layers.11.bn1.bias', 'module.encoder.layers.11.bn1.running_mean', 'module.encoder.layers.11.bn1.running_var', 'module.encoder.layers.11.bn2.weight', 'module.encoder.layers.11.bn2.bias', 'module.encoder.layers.11.bn2.running_mean', 'module.encoder.layers.11.bn2.running_var', 'module.encoder.layers.12.conv3x1_1.weight', 'module.encoder.layers.12.conv3x1_1.bias', 'module.encoder.layers.12.conv1x3_1.weight', 'module.encoder.layers.12.conv1x3_1.bias', 'module.encoder.layers.12.conv3x1_2.weight', 'module.encoder.layers.12.conv3x1_2.bias', 'module.encoder.layers.12.conv1x3_2.weight', 'module.encoder.layers.12.conv1x3_2.bias', 'module.encoder.layers.12.bn1.weight', 'module.encoder.layers.12.bn1.bias', 
'module.encoder.layers.12.bn1.running_mean', 'module.encoder.layers.12.bn1.running_var', 'module.encoder.layers.12.bn2.weight', 'module.encoder.layers.12.bn2.bias', 'module.encoder.layers.12.bn2.running_mean', 'module.encoder.layers.12.bn2.running_var', 'module.encoder.layers.13.conv3x1_1.weight', 'module.encoder.layers.13.conv3x1_1.bias', 'module.encoder.layers.13.conv1x3_1.weight', 'module.encoder.layers.13.conv1x3_1.bias', 'module.encoder.layers.13.conv3x1_2.weight', 'module.encoder.layers.13.conv3x1_2.bias', 'module.encoder.layers.13.conv1x3_2.weight', 'module.encoder.layers.13.conv1x3_2.bias', 'module.encoder.layers.13.bn1.weight', 'module.encoder.layers.13.bn1.bias', 'module.encoder.layers.13.bn1.running_mean', 'module.encoder.layers.13.bn1.running_var', 'module.encoder.layers.13.bn2.weight', 'module.encoder.layers.13.bn2.bias', 'module.encoder.layers.13.bn2.running_mean', 'module.encoder.layers.13.bn2.running_var', 'module.encoder.layers.14.conv3x1_1.weight', 'module.encoder.layers.14.conv3x1_1.bias', 'module.encoder.layers.14.conv1x3_1.weight', 'module.encoder.layers.14.conv1x3_1.bias', 'module.encoder.layers.14.conv3x1_2.weight', 'module.encoder.layers.14.conv3x1_2.bias', 'module.encoder.layers.14.conv1x3_2.weight', 'module.encoder.layers.14.conv1x3_2.bias', 'module.encoder.layers.14.bn1.weight', 'module.encoder.layers.14.bn1.bias', 'module.encoder.layers.14.bn1.running_mean', 'module.encoder.layers.14.bn1.running_var', 'module.encoder.layers.14.bn2.weight', 'module.encoder.layers.14.bn2.bias', 'module.encoder.layers.14.bn2.running_mean', 'module.encoder.layers.14.bn2.running_var', 'module.decoder.layers.0.conv.weight', 'module.decoder.layers.0.conv.bias', 'module.decoder.layers.0.bn.weight', 'module.decoder.layers.0.bn.bias', 'module.decoder.layers.0.bn.running_mean', 'module.decoder.layers.0.bn.running_var', 'module.decoder.layers.1.conv3x1_1.weight', 'module.decoder.layers.1.conv3x1_1.bias', 'module.decoder.layers.1.conv1x3_1.weight', 'module.decoder.layers.1.conv1x3_1.bias', 'module.decoder.layers.1.bn1.weight', 'module.decoder.layers.1.bn1.bias', 'module.decoder.layers.1.bn1.running_mean', 'module.decoder.layers.1.bn1.running_var', 'module.decoder.layers.1.conv3x1_2.weight', 'module.decoder.layers.1.conv3x1_2.bias', 'module.decoder.layers.1.conv1x3_2.weight', 'module.decoder.layers.1.conv1x3_2.bias', 'module.decoder.layers.1.bn2.weight', 'module.decoder.layers.1.bn2.bias', 'module.decoder.layers.1.bn2.running_mean', 'module.decoder.layers.1.bn2.running_var', 'module.decoder.layers.2.conv3x1_1.weight', 'module.decoder.layers.2.conv3x1_1.bias', 'module.decoder.layers.2.conv1x3_1.weight', 'module.decoder.layers.2.conv1x3_1.bias', 'module.decoder.layers.2.bn1.weight', 'module.decoder.layers.2.bn1.bias', 'module.decoder.layers.2.bn1.running_mean', 'module.decoder.layers.2.bn1.running_var', 'module.decoder.layers.2.conv3x1_2.weight', 'module.decoder.layers.2.conv3x1_2.bias', 'module.decoder.layers.2.conv1x3_2.weight', 'module.decoder.layers.2.conv1x3_2.bias', 'module.decoder.layers.2.bn2.weight', 'module.decoder.layers.2.bn2.bias', 'module.decoder.layers.2.bn2.running_mean', 'module.decoder.layers.2.bn2.running_var', 'module.decoder.layers.3.conv.weight', 'module.decoder.layers.3.conv.bias', 'module.decoder.layers.3.bn.weight', 'module.decoder.layers.3.bn.bias', 'module.decoder.layers.3.bn.running_mean', 'module.decoder.layers.3.bn.running_var', 'module.decoder.layers.4.conv3x1_1.weight', 'module.decoder.layers.4.conv3x1_1.bias', 'module.decoder.layers.4.conv1x3_1.weight', 
'module.decoder.layers.4.conv1x3_1.bias', 'module.decoder.layers.4.bn1.weight', 'module.decoder.layers.4.bn1.bias', 'module.decoder.layers.4.bn1.running_mean', 'module.decoder.layers.4.bn1.running_var', 'module.decoder.layers.4.conv3x1_2.weight', 'module.decoder.layers.4.conv3x1_2.bias', 'module.decoder.layers.4.conv1x3_2.weight', 'module.decoder.layers.4.conv1x3_2.bias', 'module.decoder.layers.4.bn2.weight', 'module.decoder.layers.4.bn2.bias', 'module.decoder.layers.4.bn2.running_mean', 'module.decoder.layers.4.bn2.running_var', 'module.decoder.layers.5.conv3x1_1.weight', 'module.decoder.layers.5.conv3x1_1.bias', 'module.decoder.layers.5.conv1x3_1.weight', 'module.decoder.layers.5.conv1x3_1.bias', 'module.decoder.layers.5.bn1.weight', 'module.decoder.layers.5.bn1.bias', 'module.decoder.layers.5.bn1.running_mean', 'module.decoder.layers.5.bn1.running_var', 'module.decoder.layers.5.conv3x1_2.weight', 'module.decoder.layers.5.conv3x1_2.bias', 'module.decoder.layers.5.conv1x3_2.weight', 'module.decoder.layers.5.conv1x3_2.bias', 'module.decoder.layers.5.bn2.weight', 'module.decoder.layers.5.bn2.bias', 'module.decoder.layers.5.bn2.running_mean', 'module.decoder.layers.5.bn2.running_var', 'module.decoder.output_conv.weight', 'module.decoder.output_conv.bias'])

Note that I even used the strict=False option while loading the pretrained weights. I defined the model as densifier = Models.define_model('mod', in_channels=4).
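
For illustration, a minimal sketch of one possible workaround, reusing loaded_dict_enc and densifier from the snippet above and assuming the checkpoint only holds the global-branch (ERFNet) weights keyed as 'module.encoder...'/'module.decoder...' while the full model expects them under 'module.depthnet...':

import torch

remapped = {}
for k, v in loaded_dict_enc.items():
    if k.startswith('module.encoder') or k.startswith('module.decoder'):
        # Re-key the global-branch weights under the depthnet submodule.
        remapped[k.replace('module.', 'module.depthnet.', 1)] = v
    else:
        remapped[k] = v

model = torch.nn.DataParallel(densifier).cuda()
# strict=False tolerates, but still reports, any keys that remain unmatched.
print(model.load_state_dict(remapped, strict=False))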

Problems with trying to run the program in evaluate mode

Hi!

I am trying to load your pretrained model into the main.py script and running with --evaluate but I am having problems with getting everything running. I got the files needed from KITTI, and used your download_raw_files.sh, but modified it to only download validation data into Data/val. The model you have trained is saved in a folder I created in Saved, but I don't know how to send this path to the program. Also, when trying to run I have to supply a --mod which I don't know how to format, and I get an error:

main.py: error: argument --mod: invalid choice: '/home/emdkaai/Desktop/Sparse-Depth-Completion/Saved/pretrained/model_best_epoch.pth' (choose from 'mod')

Could you give me any suggestions on what to try?

Best,
Adam

Trying to create tensor with negative dimension -2: [-2, 34, 3, 3]

Hello,

Upon executing model.py, I am getting the following traceback:

Traceback (most recent call last):
  File "model.py", line 169, in <module>
    model = uncertainty_net(34, in_channels).cuda()
  File "model.py", line 23, in __init__
    self.depthnet = Net(in_channels=in_channels, out_channels=out_channels)
  File "/home/saum/7sensing/Sparse-Depth-Completion/Models/ERFNet.py", line 143, in __init__
    self.encoder = Encoder(in_channels, out_channels)
  File "/home/saum/7sensing/Sparse-Depth-Completion/Models/ERFNet.py", line 66, in __init__
    self.initial_block = DownsamplerBlock(in_channels, chans)
  File "/home/saum/7sensing/Sparse-Depth-Completion/Models/ERFNet.py", line 15, in __init__
    self.conv = nn.Conv2d(ninput, noutput-ninput, (3, 3), stride=2, padding=1, bias=True)
  File "/home/saum/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 342, in __init__
    False, _pair(0), groups, bias, padding_mode)
  File "/home/saum/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 52, in __init__
    out_channels, in_channels // groups, *kernel_size))
RuntimeError: Trying to create tensor with negative dimension -2: [-2, 34, 3, 3]

Could you help in this regard?

colormap for visualizing the result

Hi,

May I ask which colormap you use to visualize your results? I checked all the cmap options in matplotlib and none looks like yours. Thanks!

Evaluating your pretrained model

Hey,

I want to use your pretrained model (KITTI) to output some depth maps. Therefore I created a dataset folder as you described.
Running

$source Test/test.sh /home/username/Desktop/bryan/Sparse-Depth-Completion/mod/model_best_epoch.pth.tar 5 /home/username/Desktop/bryan/data/ /home/username/Desktop/bryan/groundtruth_depth/

Gives me the error:
Save path is: /home/username/Desktop/bryan/Sparse-Depth-Completion/mod/model_best_epoch.pth.tar
Data path is: /home/username/Desktop/bryan/data/
/home/username/Desktop/bryan/Sparse-Depth-Completion
Traceback (most recent call last):
File "Test/test.py", line 164, in
main()
File "Test/test.py", line 54, in main
best_file_name = glob.glob(os.path.join(args.save_path, 'model_best*'))[0]
IndexError: list index out of range
Test/devkit/cpp/evaluate_depth /home/username/Desktop/bryan/groundtruth_depth/ Saved//home/username/Desktop/bryan/Sparse-Depth-Completion/mod/model_best_epoch.pth.tar/results
Starting depth evaluation..
Number of groundtruth (5) and prediction files (-1) mismatch!
Segmentation fault (core dumped)

the structure of the "data"folder is as you mentioned it: Parent folder "data", subfolder "Depth", "depth_selection", "RGB" where "Depth" and "RGB" has the subfolder validation where the images directly (without any subfolders) are.

RGB input type

Hi ,
I wonder if you can supply me with a detailed dataset structure.
I've run the code using input_type as depth, but when trying to run it as RGB I get an index-out-of-range error, I think due to a mistake in my dataset structure.

Regards ,
Omar .

Script for only prediction

I have downloaded the trained model. I would like it to predict dense depth data for my sparse input data. As far as I can see, there is no script for prediction only. The test script also performs validation, which needs ground-truth data as well, which I don't have.
Any help with this?
Upon running test.sh, I am facing errors, which I have posted as a separate issue.

Confusions about dataset

Hi,
I'm reading your paper and trying to reproduce your work, but some things confuse me. I have downloaded data_depth_annotated.zip, data_depth_velodyne.zip and data_depth_selection.zip. I have followed the instructions on the KITTI website and unzipped data_depth_annotated.zip and data_depth_velodyne.zip into the same folder. My questions are as follows.

  1. Files in data_depth_annotated.zip are the depth ground truth and files in data_depth_velodyne.zip are the raw depth; is my understanding right?
  2. Files in data_depth_selection.zip are prepared for us to evaluate the model; is my understanding right?
  3. I only find RGB images in data_depth_selection.zip. If I want to use RGB images for guidance, where are the corresponding RGB images for the two zip files, data_depth_annotated.zip and data_depth_velodyne.zip?
  4. What is the meaning of embedding1, embedding2, embedding3 and embedding4? In other words, why are embeddings from ERFNet used as input to the hourglass network?

I would appreciate it if you could help me with above confusions.

test on vlp-16 dataset

Hi,

Recently I've been trying to train on our own dataset, which is generated from a VLP-16 lidar. Before that, I tried training on a reduced-lines version of the KITTI dataset (I manually removed 3/4 of the lines) and it works fine; the output roughly looks like the one generated on the 64-line version. However, when I trained on our dataset the result is not satisfactory: I can barely recognize objects, especially their edges. Is there something I can do, such as image augmentation, or is it related to our ground truth, since it contains some noise, especially when there are moving objects?

RGB image for training and validation

Hi,

I am wondering: do I need to download all the raw data from the official KITTI website? It seems that there is no way to download just the RGB images (I've downloaded the depth and lidar parts of the depth completion data from the KITTI benchmark). Thanks!

Reproduce the result of localnet

Hi,
Thanks and waiting for your code release.

I am trying to reproduce the result of the local net. The local net takes a 1216x256x1 velodyne_raw map (not normalized, but divided by 256 to obtain the real depth value) as input, then a simple conv layer with stride=2, kernel_size=3, out_channels=32, followed by two hourglasses as in the paper, then a transposed-conv layer with stride=2, kernel_size=3, out_channels=32 and a 1x1 conv layer with out_channels=2; the confidence map, after a softmax, is applied to the local depth prediction.
I trained on the KITTI depth dataset with Adam and lr=0.001 for 40 epochs with batch size 48; the loss is a masked MAE loss. However, my result is much worse: the best RMSE is around 1800, compared with your local-net result of 995.

Is there any insight to improve the localnet? Thanks so much :)

-s7ev3n

Testing the pretrained model

I have tested the pretrained model you provided and this is the result I got:
https://drive.google.com/open?id=154zANBoH7dCJNQIQihpqVDTMM_nsotyd

I wonder if those results match what you got, or if I did something wrong?

In addition to that, I want to know about the fusion technique you used: is it just concatenation of the RGB input images and the LiDAR depth input before feeding them to the CNN, or is there something more I don't understand?
Thanks in advance :)

The problem about RGB dataset

I have downloaded the raw datasets from KITTI, but I don't know how to split them into train and val folders; could you tell me?

Why confidence map and guidance map are able to correct mistakes?

Hi,
Looking forward to the release!

I have some questions about the paper.

  1. I don't understand why the confidence (uncertainty) map and guidance map are able to correct mistakes in the ground truth. In a general setting, unguided or guided, I think a CNN is able to handle a small amount of error in the ground truth.

  2. How many channels does the guidance map (one of the global net outputs) have? In the figure it says 1216x256x1; I'm wondering whether you tried to increase the number of channels and how that performs.

  3. As for the ERFNet pretrained on the Cityscapes dataset, what is the setting for the pretraining? And have you tried depth completion without pretraining?

Thank you!

About metrics update

Thanks for your code!
I am confused about the metric update method in your validation. (Line 402~404 in main.py)

metric.calculate(prediction[:, 0:1], gt)
score.update(metric.get_metric(args.metric), metric.num)
score_1.update(metric.get_metric(args.metric_1), metric.num)

Why do you use the weight metric.num instead of input.size(0)?
(I also found that the RMSE when using metric.num is about 30 mm lower than when using input.size(0).)
Thanks!

Dataset Preparation

Hi, thank you for your great work!
I downloaded data_depth_selection.zip, data_depth_velodyne.zip and data_depth_annotated.zip from the KITTI website and unzipped them. However, where can I download the camera images I need, and what structure should I build before the data preprocessing described in the README.md?
Thank you!

About dataset

Do I need to download all the avg-kitti/raw_data datasets for testing purposes only? There are more than 200, which is a lot.

Training conditions

Hi, I'm very grateful that you opened up your code for this depth completion task. Here I have a few questions:

  1. Did you run this training process on a single GPU or on multiple GPUs?
  2. What learning-rate decay scheduler did you use? And did you call 'scheduler.step' once per epoch or inside 'for i, (inputs, gt) in enumerate(train_loader)'?

Thank you very much!

Where is main.py

It seems that main.py is missing in your branch.

Regards

Waiting for your code.

Hi, I am waiting for your code to be released. Can you tell me how many days it will take to complete? Thanks in advance.

Question about the data preprocessing in dataloader.py

Hello!
Thank you so much for sharing your code publicly for free.
I am confused about the following code in dataloader.py, roughly lines 174 to 180:

input, gt = self.totensor(sparse_depth_np).float(), self.totensor(gt_np).float()

if self.normal:
    # Put in {0-1} range and then normalize
    input = input/self.max_depth
    gt = gt/self.max_depth
    # input = self.depth_norm(input)

We know that ToTensor in PyTorch converts a PIL Image in the range [0, 255] to a torch.FloatTensor in the range [0.0, 1.0] automatically, but you also normalize by dividing by max_depth; I wonder whether this transformation is proper?
Looking forward to your reply, thank you!

Replicate results with pre-trained model

Hi, I am trying to replicate your results in a different codebase, using the pre-trained model provided, but I am probably doing something wrong (the depth maps are meaningless, even though I can load the weights). My guess is that it's a pre-processing issue: what sort of normalization are you doing on the input RGB and depth information? Thanks!

Confusing Terminologies

Hi
I'm a student using the project for educational purposes. I just want to understand the difference between the local branch, the global branch and the confidence maps; I've searched for them on Google and found nothing.
Thanks.

Erased my system data completely

I ran the source download_raw_files.sh script on my system (Ubuntu 18 with Python 3.7.5), and after it ran for some time I saw that my system data got erased completely. Now I am not able to use my system; all my data, including my home folder, has been deleted.

Does it have any link with your download_raw_files.sh script? I see that there are some "rm -f" commands in your script.

Thanks
