shariqfarooq123 / adabins

Official implementation of Adabins: Depth Estimation using adaptive bins

License: GNU General Public License v3.0

Python 100.00%

adaptive-bins deep-learning depth-estimation metric-depth-estimation monocular-depth-estimation neural-networks pretrained-models single-image-depth-prediction transformers


adabins's Issues

could you please kindly specify the URI of the test dataset

The README says: "You can download the predicted depths in 16-bit format for NYU-Depth-v2 official test set and KITTI Eigen split test set here."
You said there that we could run a demo training with your code, but could you please specify the URI where we can download the mini KITTI dataset?

Thanks in advance.

example_rgb_batch

What is example_rgb_batch?

what is focal for in train_files_with_gt.txt

Hi @shariqfarooq123

Very nice work!

I noticed that both the KITTI and NYU train/test_files_with_gt.txt files contain a column of focal values.
In NYU the focal value is a constant 518.8579,
but in KITTI the focal values range from 707.0912 to 721.856.
What is focal? Is it the camera focal length? In what unit (mm or pixels)?
Are the focal values used in training the model?

I ask because I want to train the model on my own datasets: do I need to provide a focal value? If the model doesn't really use it for training/testing, could I just give an arbitrary number?

Thanks a lot for your help.
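For readers preparing their own file lists, here is a minimal sketch of reading such a list. It assumes each line holds three whitespace-separated fields (RGB path, depth path, focal value). That layout, the `Sample` type, the `parse_file_list` helper and the example paths are illustrative assumptions, not taken from the repository. Whether the focal value is actually consumed during training is not confirmed here.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    rgb_path: str
    depth_path: str
    focal: float


def parse_file_list(text: str) -> List[Sample]:
    """Parse lines of the assumed form '<rgb> <depth> <focal>'."""
    samples = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        rgb, depth, focal = fields
        samples.append(Sample(rgb, depth, float(focal)))
    return samples


# Hypothetical paths; the focal values are the ones quoted in the issue.
example = """\
scene_a/rgb_0001.jpg scene_a/depth_0001.png 518.8579
scene_b/rgb_0002.png scene_b/depth_0002.png 707.0912
"""
samples = parse_file_list(example)
print(samples[0].focal, samples[1].focal)
```

A parser like this makes it easy to check a custom list for malformed lines before starting a long training run.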

How to get your 50k subset?

Could you provide some instructions?

Training on slurm

Hey there,

Thanks for your work. I'm trying to train the model on slurm and I think I'm missing something.

When I set to train with 3 GPUS on one node:

#SBATCH --nodes=1
#SBATCH --gres=gpu:rtx_8000:3

I get the exact same speed as using 1 GPU:

#SBATCH --nodes=1
#SBATCH --gres=gpu:rtx_8000:1

either with --distributed or without it. The training itself is also very slow: it performs about 20 steps (not epochs) in 8 hours.

I have around 300k images in train and 200k in validation. All the other parameters are the same.

Is there anything I can do about it?

Convergence on a custom dataset

Hey there,

Thanks for your work.

I'm currently trying to overfit for only one image with the basic setup.
However, after 100 epochs with standard parameters, the losses do not really converge during training. For example, the SILog loss plateaus at 1.5 and the chamfer loss at around 1.1. See the wdb board here.

Moreover, as you can see from the dashboards, the convergence seems really slow for just one image. Are there any hints to improve specifically for AdaBins losses? (Already tried various lr and turning off the scheduler)

Regarding the input. My depth map labels are actually numpy arrays with real depth values ranging from 0 to 80 meters.

Any hints on how to make the network converge to a single image would be appreciated!

RuntimeError: ./pretrained/AdaBins_nyu.pt is a zip archive (did you mean to use torch.jit.load()?)

how to train my own datasets

Hi,
As the title says, if I want to train on my own dataset, what steps should I take? Do I need to convert it into the NYU dataset format for processing?
Thanks!

Nan Loss after a couple of epochs

Hi,

First of all, very nice paper and architecture! I've been playing with the training code and trained a model on my own data. However, after a couple of epochs I get a NaN loss, and after some debugging it seems that the model itself outputs a NaN tensor. I played a little with gradient clipping to see if it would help, but it does not seem to.

Best regards

Generate release?

I'm interested in packaging this for the conda-forge. Would it be possible to generate a release for this repo? Thanks!

how to train?

I tried to run 'python train.py', but the progress always stays at 0%.

Cant reproduce evaluation metrics on NYU

Dear authors,

I'm trying to reproduce your results on NYU Depth V2 dataset, but I'm facing some problems regarding the evaluation results, both retraining the network from scratch and using your pretrained weights.

I'm using your 'eval on a single PIL image' script over all 654 images from NYU test split, and evaluating using your compute_errors script.

I'm obtaining the following results:

Retrained - BEST

a1	a2	a3	abs_rel	rmse	log_10	rmse_log	silog	sq_rel
0.782	0.961	0.991	0.146	0.666	0.065	0.204	19.751	0.111

Retrained - LAST

a1	a2	a3	abs_rel	rmse	log_10	rmse_log	silog	sq_rel
0.791	0.950	0.986	0.148	0.655	0.064	0.210	20.915	0.118

Trained by AUTHORS

a1	a2	a3	abs_rel	rmse	log_10	rmse_log	silog	sq_rel
0.861	0.973	0.994	0.118	0.562	0.053	0.174	16.925	0.082

This is far away from the results shown on paper. Can you please help me reproduce your paper results?
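For reference, the metric names in the tables above usually denote the definitions sketched below. This is a plain-Python sketch of the commonly used formulas, not the repository's `compute_errors`; details such as valid-depth masking, depth clipping and evaluation crops are omitted here, and differences in those details can shift the numbers noticeably.

```python
import math
from typing import Dict, Sequence


def compute_errors(gt: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Common monocular-depth metrics over paired depths (all values > 0)."""
    n = len(gt)
    pairs = list(zip(gt, pred))
    ratio = [max(g / p, p / g) for g, p in pairs]
    log_diff = [math.log(p) - math.log(g) for g, p in pairs]
    mean_log = sum(log_diff) / n
    mean_log_sq = sum(d * d for d in log_diff) / n
    return {
        # Fraction of pixels whose ratio is below 1.25, 1.25^2, 1.25^3.
        "a1": sum(r < 1.25 for r in ratio) / n,
        "a2": sum(r < 1.25 ** 2 for r in ratio) / n,
        "a3": sum(r < 1.25 ** 3 for r in ratio) / n,
        "abs_rel": sum(abs(g - p) / g for g, p in pairs) / n,
        "rmse": math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n),
        "log_10": sum(abs(math.log10(g) - math.log10(p)) for g, p in pairs) / n,
        "rmse_log": math.sqrt(mean_log_sq),
        # Scale-invariant log error, conventionally reported times 100.
        "silog": math.sqrt(max(mean_log_sq - mean_log ** 2, 0.0)) * 100,
        "sq_rel": sum((g - p) ** 2 / g for g, p in pairs) / n,
    }


metrics = compute_errors(gt=[2.0, 2.0], pred=[1.0, 4.0])
print(metrics["abs_rel"], metrics["rmse"])
```

When two evaluation pipelines disagree, running both on the same small set of pixel pairs with a function like this can help isolate whether the gap comes from the metric code or from preprocessing.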

NameError: name 'Image' is not defined

In the Recommended Way, Image has to be imported as from PIL import Image

Loss Function on Paper

Thank you for sharing all of your code, this is an amazing contribution to depth society.

I would like to ask why you add 15% of the squared mean instead of subtracting 85% of it in your loss function below?

AdaBins/loss.py

Line 24 in 2fb686a

Dg = torch.var(g) + 0.15 * torch.pow(torch.mean(g), 2)

I was curious since your paper refers to a subtraction as follows:

Thank you for your time
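One way to see how the two forms relate: since Var(g) = mean(g^2) - mean(g)^2, adding 0.15 * mean(g)^2 to the variance gives mean(g^2) - 0.85 * mean(g)^2. The sketch below checks this numerically with the population variance. The function names and sample values are illustrative; note that `torch.var` uses the unbiased estimator by default, so in the quoted line the identity holds only up to a factor of n/(n-1) on the variance term, which is negligible for image-sized inputs.

```python
import math
from typing import Sequence


def variance_form(g: Sequence[float], lam: float = 0.85) -> float:
    """Var(g) + (1 - lam) * mean(g)^2, using the population variance."""
    n = len(g)
    mean = sum(g) / n
    var = sum((x - mean) ** 2 for x in g) / n
    return var + (1.0 - lam) * mean ** 2


def mean_square_form(g: Sequence[float], lam: float = 0.85) -> float:
    """mean(g^2) - lam * mean(g)^2, the subtraction form."""
    n = len(g)
    mean = sum(g) / n
    mean_sq = sum(x * x for x in g) / n
    return mean_sq - lam * mean ** 2


# Arbitrary sample of log-depth differences, for illustration only.
g = [0.3, -0.1, 0.25, 0.05, -0.2]
print(variance_form(g), mean_square_form(g))
```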

Environment file

Hey there,

Can you please provide an environment file?

How many GPUs or Cards are used when training?

I have noticed that you use syncbn in your code.

is there any requirement of input image size by prediction(inference)

I tried to run prediction on some images from other datasets.
I used an image from the nuScenes dataset, whose size is 900x1600 (h x w), but when I run inference
there is an error:
/content/drive/My Drive/AdaBins/models/layers.py in forward(self, x)
17 embeddings = self.embedding_convPxP(x).flatten(2) # .shape = n,c,s = n, embedding_dim, s
18 # embeddings = nn.functional.pad(embeddings, (1,0)) # extra special token at start ?
---> 19 embeddings = embeddings + self.positional_encodings[:embeddings.shape[2], :].T.unsqueeze(0)
20
21 # change to S,N,E format required by transformer

RuntimeError: The size of tensor a (1400) must match the size of tensor b (500) at non-singleton dimension 2

But when I resize the image to the shape of the KITTI dataset or to 352x704, it works.
Do you have any suggestion as to which size I should choose?

utils.colorize is dropping axes incorrectly

utils.colorize does not work properly.

Line 66 in utils.py is now:
img = value[:, :, :3]

Should probably be:
img = value.squeeze()

pre process of the raw data

Hello, I want to ask about your depth ground truth: did you densify the raw KITTI depth_gt, and if so, which method did you use?

Why the output for range attention map from mVIT has a shape [n, 128, embedding dim]?

While I read your implementation of mVIT, I found that the second output of mVIT has a shape [n, 128, embedding_dim].
If I correctly understand your paper, the shape should be [n, S, embedding_dim] where S=h/p*w/p following the notation of paper.
Please give me some explanation or any reason for this part.

(Edit) I understood that part, and finally I've got what misc is!
Sorry, and Thank you.

Actual distance value from Depth Maps

Hi,
We have depth map values in the range of 10-10000 for each pixel in the image. How do we find the actual distance from the camera that took the picture? For example, how do we know that a value of 5000 in the depth map corresponds to, say, 4.5 m from the camera?
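If the depth maps are 16-bit PNGs written as metric depth multiplied by a fixed factor, the conversion back to metres is a single division. The sketch below assumes a factor of 1000 (millimetres) for NYU-style files and 256 for KITTI-style files. Both factors and the `raw_to_metres` helper are assumptions for illustration and should be checked against the code that wrote the files.

```python
# Assumed scale factors (raw units per metre); verify against the saving code.
ASSUMED_SCALE = {"nyu": 1000.0, "kitti": 256.0}


def raw_to_metres(raw_value: int, dataset: str) -> float:
    """Convert a raw 16-bit depth value to metres under the assumed scale."""
    return raw_value / ASSUMED_SCALE[dataset]


print(raw_to_metres(5000, "nyu"))    # raw 5000 under the millimetre assumption
print(raw_to_metres(2560, "kitti"))  # raw 2560 under the 256-per-metre assumption
```

Under a different saving factor the same raw value maps to a different distance, so a raw value of 5000 only corresponds to 4.5 m if the file was written with a matching scale.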

Using CNN layers replace transformer

Hi @shariqfarooq123,
Thanks for your work!
I have a question. Have you tried to use the CNN to replace the transformer?

Pytorch version

The following situation occurred when I ran the code

Traceback (most recent call last):
  File "infer.py", line 158, in <module>
    inferHelper = InferenceHelper()
  File "infer.py", line 85, in __init__
    model, _, _ = model_io.load_checkpoint(pretrained_path, model)
  File "/home/titan2/AdaBins/model_io.py", line 37, in load_checkpoint
    ckpt = torch.load(fpath, map_location='cpu')
  File "/home/titan2/anaconda3/envs/adabins/lib/python3.6/site-packages/torch/serialization.py", line 426, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/titan2/anaconda3/envs/adabins/lib/python3.6/site-packages/torch/serialization.py", line 599, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: ./pretrained/AdaBins_nyu.pt is a zip archive (did you mean to use torch.jit.load()?)

After googling, I learned that this is a PyTorch version issue.
I currently use PyTorch 1.3.1.
Which PyTorch version did you use when training the pretrained model?
Thank you!
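As background, PyTorch 1.6 is the release in which `torch.save` began writing a zip-based checkpoint format by default, and an installation as old as the 1.3.1 reported above raises this error when it meets such a file. The sketch below is a small version guard that could run before loading. The helper names and the choice of 1.6 as a conservative threshold are assumptions for illustration.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.3.1' or '1.8.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def likely_reads_zip_checkpoints(torch_version: str) -> bool:
    """Conservative check: treat 1.6 and newer as able to read zip checkpoints."""
    return parse_major_minor(torch_version) >= (1, 6)


for version in ("1.3.1", "1.6.0", "1.8.0+cu111"):
    print(version, likely_reads_zip_checkpoints(version))
```

With PyTorch installed, `torch.__version__` can be passed to the check. If upgrading is not possible, someone with a newer installation can re-save the checkpoint with `torch.save(..., _use_new_zipfile_serialization=False)` to produce the older format.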

CPU Version of ADA BINS

Hi,

I am looking for a CPU version of AdaBins. If someone could help, it would be great.

Thanks
-jyothi

About the configuration for Base+R in Table 6

Hi, there. Thank you for your paper and code.
I tried to re-train your model named Base+R in Table 6.
As mentioned in your paper, I set Cd=1 which refers to num_classes=1 in

AdaBins/models/unet_adaptive_bins.py

Line 26 in 2fb686a

def __init__(self, num_features=2048, num_classes=1, bottleneck_features=2048):

. Because the output must be greater than zero (the loss function uses a log), I added an extra nn.Sigmoid to constrain the output to [0, 1] and scaled it to [0, 10] by multiplying by 10. Both changes were made in DecoderBN.
The final model looks like:

class BaseR(nn.Module):  
    def __init__(self):  
        super(BaseR, self).__init__()  
        self.encoder = Encoder()  
        self.decoder = DecoderBN(num_classes=1)  
    def forward(self, x):  
        unet_out = self.decoder(self.encoder(x))  
        return unet_out  
    ....

Besides, the loss contains only the pixel-wise depth loss. The batch size is 10 because of GPU memory limits. All other configurations were the same as yours.
After training for 25 epochs, the loss was around 0.4, with the loss curve as follows:

And the evaluation results are:
Metrics: {'a1': 0.0, 'a2': 0.0, 'a3': 0.0, 'abs_rel': 0.974, 'rmse': 2.795, 'log_10': 1.583, 'rmse_log': 3.649, 'silog': 12.393, 'sq_rel': 2.555}

I don't know what is wrong here. Do you remember your final loss value? Or can you provide your model code for Base+R?

Thank you very much! :)

Real-time?

Hi, I was doing some tests with this project and I am impressed, thank you for sharing it!

I do have a question: I was testing in Colab (on a Tesla K80) and it took around 0.8-1.0 s per image. Do you think that with enough optimization I could get close-to-real-time performance on any Jetson at 640x480, or is it pointless to even try? (I have not gone into the code in depth, so I don't know how much performance headroom is left.)

Please add Colab demo

Hi, please add a Colab demo, cheers.

Thanks
Aiyush

How do you evaluate the kitti eigen split?

Hi, I appreciate your work a lot, thanks for releasing the code.

In your evaluate.py, I see that you apply the Garg crop to depth_gt, but I also see that your test dataloader applies the KITTI benchmark crop when loading depth_gt, i.e., it crops to (1216 x 352). Hence, you first crop depth_gt to (1216 x 352) and then crop it again according to the Garg crop.
Did I miss something? Is that right? Can you explain the rationale?

By the way, do you evaluate the KITTI Eigen split on 652 test images or on the raw 697 test images?

Thanks again!
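To make the two crops concrete, the sketch below computes the KITTI benchmark crop offsets and then the Garg crop bounds on the cropped image. The Garg fractions are the constants commonly found in public monocular-depth evaluation code; whether evaluate.py uses exactly these values and this order is an assumption, and the function names are illustrative.

```python
from typing import Tuple


def kitti_benchmark_crop_offsets(
    height: int, width: int, crop_h: int = 352, crop_w: int = 1216
) -> Tuple[int, int]:
    """Top and left offsets of a bottom-aligned, horizontally centred crop."""
    top = height - crop_h
    left = (width - crop_w) // 2
    return top, left


def garg_crop_bounds(height: int, width: int) -> Tuple[int, int, int, int]:
    """(top, bottom, left, right) of the Garg evaluation region."""
    top = int(0.40810811 * height)
    bottom = int(0.99189189 * height)
    left = int(0.03594771 * width)
    right = int(0.96405229 * width)
    return top, bottom, left, right


# A raw 375 x 1242 frame is first cropped to 352 x 1216 ...
print(kitti_benchmark_crop_offsets(375, 1242))
# ... and the Garg mask is then computed relative to the cropped size.
print(garg_crop_bounds(352, 1216))
```

Because the Garg fractions are applied to whatever height and width they are given, applying them after the benchmark crop selects a slightly different region than applying them to the raw frame.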

pretrained weights.

Hello @shariqfarooq123

Thanks for sharing the code. Would it be possible to also share the pretrained weights?

'./pretrained/AdaBins_nyu.pt'

Skip Connection Locations

What was your methodology for choosing the location of skip connections in the Efficient Net encoder?

Pre-processing on NYU dataset

Dear authors,

I noticed that you use the official splits of the NYU Depth V2 dataset, but with the tools available from the dataset page I cannot reproduce the same kind of filenames that you use on your files.

Could you please make available the code used to preprocess the NYU Depth dataset?

Thanks!

Bug in the scheduler

Hi,

Thank you so much for publishing the code. This work is excellent and has achieved state-of-the-art performance. However, when I was trying to retrain the model with my training code built with pytorch lightning, I was unable to reach the results obtained in the paper.

I think I have found a bug in the code, which may be why I could not achieve the desired result. When the args.same_lr flag is set to false, the encoder and decoder are supposed to have different learning rates. However, the learning rate is still constant.

The OneCycleLR used in this line expects a list of lrs in order to use different learning rates for the encoder and decoder. But because a single scalar is passed, the lr is the same for both the encoder and decoder even when args.same_lr is set to false.

When I print out the scheduler.optimizer, I receive the following.

AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.95, 0.999)
    eps: 1e-08
    initial_lr: 1.428e-05
    lr: 1.4279999999999978e-05
    max_lr: 0.000357
    max_momentum: 0.95
    min_lr: 1.428e-07
    weight_decay: 0.1

Parameter Group 1
    amsgrad: False
    base_momentum: 0.85
    betas: (0.95, 0.999)
    eps: 1e-08
    initial_lr: 1.428e-05
    lr: 1.4279999999999978e-05
    max_lr: 0.000357
    max_momentum: 0.95
    min_lr: 1.428e-07
    weight_decay: 0.1
)

When I replace the section with the following:

    lrs = [l['lr'] for l in optimizer.param_groups]
    ###################################### Scheduler ###############################################
    scheduler = optim.lr_scheduler.OneCycleLR(optimizer, lrs, epochs=epochs, steps_per_epoch=len(train_loader),
                                              cycle_momentum=True,
                                              base_momentum=0.85, max_momentum=0.95, last_epoch=args.last_epoch,
                                              div_factor=args.div_factor,
                                              final_div_factor=args.final_div_factor)

Notice that the output of scheduler.optimizer changes to the following


AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.95, 0.999)
    eps: 1e-08
    initial_lr: 1.428e-06
    lr: 1.4280000000000006e-06
    max_lr: 3.57e-05
    max_momentum: 0.95
    min_lr: 1.428e-08
    weight_decay: 0.1

Parameter Group 1
    amsgrad: False
    base_momentum: 0.85
    betas: (0.95, 0.999)
    eps: 1e-08
    initial_lr: 1.428e-05
    lr: 1.4279999999999978e-05
    max_lr: 0.000357
    max_momentum: 0.95
    min_lr: 1.428e-07
    weight_decay: 0.1
)

Would you please let me know if this behavior is intentional?

Thank you,
Best,
SK

A small question on the test set (654 images) of NYU Depth V2

Sorry, I have a small question about the NYU Depth V2 test set.
I have read "Depth Map Prediction from a Single Image using a Multi-Scale Deep Network", which mentions an official test set (215 of the 464 scenes). But I have looked through many papers and the official website, and I could not find any official statement of which 215 scenes are for testing and which 249 scenes are for training.

If you can, I hope you can answer me. Many thanks.

visualization of nyud predicted depth

Hi, Thanks for your code.

Could you please provide the visualization code of nyud predicted depth or the heat map of nyud predicted depth?

Thank you very much!

TypeError: Cannot handle this data type: (1, 1, 480, 640), <u2

I'm trying to run this in a Jupyter notebook. Following the README, I downloaded the pretrained weights and saved them in a directory named pretrained. When running the code for predicting depth for a single Pillow image (I used the test image provided in test_imgs), predicted_depth is a NumPy array with shape [1, 1, 480, 640]. I don't understand why predicted_depth is a 4-dimensional array.

Next, I tried to run the code for predicting depths of all images in a directory, so I again used the test_imgs directory containing the classroom image and specified a target directory for storing the 16-bit output. I get a type error: Cannot handle this data type: (1, 1, 480, 640), <u2. Shouldn't the predicted depth output be a 2-dimensional array instead of a 4-dimensional one?

from infer import InferenceHelper
infer_helper = InferenceHelper(dataset='nyu')
infer_helper.predict_dir("test_imgs/", "test_imgs_results/")

Output

Loading base model ()...
Using cache found in C:\Users\Hp/.cache\torch\hub\rwightman_gen-efficientnet-pytorch_master
Done.
Removing last two layers (global_pool & classifier).
Building Encoder-Decoder model..Done.
  0%|                                                                                            | 0/1 [00:04<?, ?it/s]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\PIL\Image.py in fromarray(obj, mode)
   2679         try:
-> 2680             mode, rawmode = _fromarray_typemap[typekey]
   2681         except KeyError:

KeyError: ((1, 1, 480, 640), '<u2')

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-8cfe96e3a7b5> in <module>
      6 #bin_centers, predicted_depth = infer_helper.predict_pil(img)
      7 # predict depths of images stored in a directory and store the predictions in 16-bit format in a given separate dir
----> 8 infer_helper.predict_dir("test_imgs/", "test_imgs_results/")

~\anaconda3\lib\site-packages\torch\autograd\grad_mode.py in decorate_context(*args, **kwargs)
     13         def decorate_context(*args, **kwargs):
     14             with self:
---> 15                 return func(*args, **kwargs)
     16         return decorate_context
     17 

<personal_directories>\AdaBins\infer.py in predict_dir(self, test_dir, out_dir)
    146             save_path = os.path.join(out_dir, basename + ".png")
    147 
--> 148             Image.fromarray(final).save(save_path)
    149 
    150 

~\anaconda3\lib\site-packages\PIL\Image.py in fromarray(obj, mode)
   2680             mode, rawmode = _fromarray_typemap[typekey]
   2681         except KeyError:
-> 2682             raise TypeError("Cannot handle this data type: %s, %s" % typekey)
   2683     else:
   2684         rawmode = mode

TypeError: Cannot handle this data type: (1, 1, 480, 640), <u2

RuntimeError

Hey all... just trying to run this on some HD png images... am I missing something?

X:\miniconda3\envs\AdaBins2\python.exe C:/Users/proscans/AdaBins/testItOut.py
Loading base model ()...Using cache found in C:\Users\proscans/.cache\torch\hub\rwightman_gen-efficientnet-pytorch_master
Done.
Removing last two layers (global_pool & classifier).
Building Encoder-Decoder model..Done.
Traceback (most recent call last):
  File "C:/Users/proscans/AdaBins/testItOut.py", line 12, in <module>
    bin_centers, predicted_depth = infer_helper.predict_pil(img)
  File "X:\miniconda3\envs\AdaBins2\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\proscans\AdaBins\infer.py", line 95, in predict_pil
    bin_centers, pred = self.predict(img)
  File "X:\miniconda3\envs\AdaBins2\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\proscans\AdaBins\infer.py", line 106, in predict
    bins, pred = self.model(image)
  File "X:\miniconda3\envs\AdaBins2\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\proscans\AdaBins\models\unet_adaptive_bins.py", line 94, in forward
    bin_widths_normed, range_attention_maps = self.adaptive_bins_layer(unet_out)
  File "X:\miniconda3\envs\AdaBins2\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\proscans\AdaBins\models\miniViT.py", line 25, in forward
    tgt = self.patch_transformer(x.clone()) # .shape = S, N, E
  File "X:\miniconda3\envs\AdaBins2\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\proscans\AdaBins\models\layers.py", line 19, in forward
    embeddings = embeddings + self.positional_encodings[:embeddings.shape[2], :].T.unsqueeze(0)
RuntimeError: The size of tensor a (1980) must match the size of tensor b (500) at non-singleton dimension 2

Process finished with exit code 1

Evaluation on SUN RGB-D dataset

When I test on the SUN RGB-D dataset, my results are different,
so I want to ask about the evaluation settings on SUN RGB-D.
Question 1: When testing on SUN RGB-D, was the maximum depth adjusted?
Question 2: SUN RGB-D consists of several sub-datasets. Do the results in the paper come from testing all 5050 images at once, or from testing the sub-datasets separately and then averaging?
Question 3: Do you run evaluate.py on the SUN RGB-D data with the dataset set to NYU?

Realtime Distance Calculation

Hi, I am trying to run this model together with YOLOv5 in real time, using cameras mounted on a motorbike to estimate the distance to vehicles while riding, but the accuracy is very bad. Is there any way to improve the accuracy? And is it possible to run this on a Jetson Nano?

Training process can't complete an epoch

Using diff LR
Epoch: 1/25. Loop: Train: 0%| | 0/1515 [00:00<?, ?it/s]
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/tuxiang/Documents/chenyihang/AdaBins-main/adabins/lib/python3.7/site-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/usr/lib/python3.7/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/home/tuxiang/Documents/chenyihang/AdaBins-main/adabins/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.7/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.7/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.7/multiprocessing/connection.py", line 492, in Client
c = SocketClient(address)
File "/usr/lib/python3.7/multiprocessing/connection.py", line 619, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused

Waiting for the code to be open-sourced...

A question on weight decay.

Hi,
Thank you for sharing the code.
I would like to ask a question on the weight decay coefficient.

In your paper, you said, "For training, we use the AdamW optimizer [30] with weight-decay 10^-2."
However, in your code (args_train_kitti_eigen.txt and args_train_nyu.txt), you wrote "--wd 0.1".
Can you tell me which is correct?

Thanks.

I want to train my dataset

Hi
As the title says, I have some questions about training on my own dataset.

What does 'focal' mean, and how does it affect learning?
How low does the loss need to be for a good output?

NYU Dataset

Dear authors,

Can you please confirm that the (NYU) Test and Train datasets are the same used in your old paper titled "High-Quality Monocular Depth Estimation via Transfer Learning"?

Thank you.

How do I train on the NYU depth dataset? I can only find 1449 pairs of RGB and depth images; I can't find the 120k images or the 50K subset. Thanks a lot!

issue when processing imgs with a size of (1000,750)

It seems that the model doesn't support an image larger than (640,480), such as (1000,750), because of line 19 in layers.py:

embeddings = embeddings + self.positional_encodings[:embeddings.shape[2], :].T.unsqueeze(0)

embeddings.shape[2]=713 is larger than the 500 configured in line 14 of layers.py:

self.positional_encodings = nn.Parameter(torch.rand(500, embedding_dim), requires_grad=True)

I tried to increase the value from 500 to 1000; however, the pretrained NYU model doesn't support that. What should I do to support an image resolution of (1000,750)? Do I have to adjust the model parameters and then retrain the network?
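The token counts reported in these issues (713 for 1000x750, 1400 for 1600x900, 1980 for 1920x1080) are consistent with a feature map at half the input resolution split into 16x16 patches. The sketch below computes the count under that inferred assumption and shrinks an image size until it fits the 500 positional encodings. The stride, patch size and both helper names are illustrative, and resizing may change the scale of the predicted metric depth.

```python
from typing import Tuple


def num_patch_tokens(
    height: int, width: int, feature_stride: int = 2, patch_size: int = 16
) -> int:
    """Sequence length seen by the patch transformer (assumed geometry)."""
    feat_h = height // feature_stride
    feat_w = width // feature_stride
    return (feat_h // patch_size) * (feat_w // patch_size)


def shrink_to_fit(
    height: int, width: int, max_tokens: int = 500, step: float = 0.95
) -> Tuple[int, int]:
    """Scale the size down in 5% steps until the token count fits."""
    h, w = height, width
    while num_patch_tokens(h, w) > max_tokens:
        h, w = int(h * step), int(w * step)
    return h, w


for h, w in [(480, 640), (750, 1000), (900, 1600), (1080, 1920)]:
    print((h, w), num_patch_tokens(h, w))

print(shrink_to_fit(750, 1000))
```

Under these assumptions a 640x480 input needs 300 tokens and a 704x352 input needs 242, which agrees with the sizes reported as working.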

Colored Depth Visualization

Hi,
Currently, the depth result is a 640x480x1 array of 16-bit values, so the image is obviously greyscale when opened. How can we also obtain the depth map in RGB colorspace, like the ones shown in the paper?

Code changes for Uniform bin widths

How to train this model for Uniform bins sizes as reported in Table. 6 of the main paper?

The mViT outputs both bin widths and Range attention maps. So if we were to use uniform bin widths, should we just discard the mViT's predicted bin widths and recompute the bin centers on a uniform bin widths tensor and finally compute the depth prediction?

AdaBins/models/miniViT.py

Line 29 in 2fb686a

regression_head, queries = tgt[0, ...], tgt[1:self.n_query_channels + 1, ...]

The regression_head is used to feed the MLP to get the bin widths. So I am assuming we just ignore this regression_head variable and not use any MLP and just return the range_attention_maps!?

AdaBins/train.py

Lines 187 to 195 in 2fb686a

    bin_edges, pred = model(img)

    mask = depth > args.min_depth
    l_dense = criterion_ueff(pred, depth, mask=mask.to(torch.bool), interpolate=True)

    if args.w_chamfer > 0:
        l_chamfer = criterion_bins(bin_edges, depth)
    else:
        l_chamfer = torch.Tensor([0]).to(img.device)

Assuming we did that, the bin_edges returned by the model would be a constant tensor across all images (the uniform bins constraint). In that case we shouldn't use the chamfer loss for these bins, right? These uniform bin edges are not something the network predicted, and moreover we can't force them to mimic the bin edges in the ground truth.

It will be great if you can share all the related changes for converting AdaBins to Uniform Bins.

Batch Inference not working as intended

from infer import InferenceHelper
infer_helper = InferenceHelper(dataset='nyu')
infer_helper.predict_dir("test_imgs/", "test_imgs/")

Basically, I tried to work out the sample inference script to predict images in batch.

But while converting the predicted tensor to PIL Image, this occurs.
TypeError: Cannot handle this data type: (1, 1, 480, 640), <u2

About the NYU results in paper

Hi, first of all, thanks for your nice work and open-source code. I ran into a problem when evaluating the results. As listed in the README, I downloaded the predicted depths in 16-bit format for the NYU-Depth-v2 official test set and the KITTI Eigen split test set from the provided link:
https://drive.google.com/drive/folders/1b3nfm8lqrvUjtYGmsqA5gptNQ8vPlzzS?usp=sharing
Then I used the evaluation code in infer.py. In particular, I directly compared the downloaded depth maps with the ground-truth depth maps provided in the NYU official split, with the input size following Eigen's center crop. The ground-truth depth maps were obtained from DenseDepth.
I tried two different input sizes:

input size1: 228x304 (Center cropping)
Results: Metrics: {'a1': 0.895, 'a2': 0.98, 'a3': 0.997, 'abs_rel': 0.106, 'rmse': 0.419, 'log_10': 0.045, 'rmse_log': 0.131, 'silog': 9.761, 'sq_rel': 0.075}

input size2: 45:471, 41:601 as used in your code (line132)
Results: Metrics: {'a1': 0.887, 'a2': 0.978, 'a3': 0.995, 'abs_rel': 0.11, 'rmse': 0.41, 'log_10': 0.047, 'rmse_log': 0.142, 'silog': 11.748, 'sq_rel': 0.07}

However, the absolute relative error is 0.103 in your paper.
I cannot find the problem. Would you please help me figure it out, or provide the script for reproducing the result?

Thank you very much! :)

Where do I get the the test set of the SUN RGB-D dataset?

Hello,

Where do I get the the test set of the SUN RGB-D dataset?
