shariqfarooq123 / adabins
Official implementation of Adabins: Depth Estimation using adaptive bins
License: GNU General Public License v3.0
You can download the predicted depths in 16-bit format for the NYU-Depth-v2 official test set and the KITTI Eigen split test set here
You said here that we could run a demo training with your code. Could you please specify the URI where we can download the mini KITTI dataset?
Thanks in advance.
What is example_rgb_batch?
Very nice work!
I noticed that in both the KITTI and NYU train/test_files_with_gt.txt files there is a column of focal values.
In NYU the focal value is a constant 518.8579.
But in KITTI the focal values range from 707.0912 to 721.856.
What is focal? Is it the camera focal length? In what unit (mm or pixels)?
Are the focal values used in training the model?
The reason I ask is that I want to train the model on my own datasets: do I need to provide a focal value? If the model doesn't really use it for training/testing, could I just give an arbitrary number?
Thanks a lot for your help.
Could you provide some instructions?
Hey there,
Thanks for your work. I'm trying to train the model on Slurm and I think I'm missing something.
When I train with 3 GPUs on one node:
#SBATCH --nodes=1
#SBATCH --gres=gpu:rtx_8000:3
I get the exact same speed as using 1 GPU:
#SBATCH --nodes=1
#SBATCH --gres=gpu:rtx_8000:1
either with --distributed
or without it. Training itself is also very slow: it completes only about 20 steps (not epochs) in 8 hours.
I have around 300k images for training and 200k for validation. All other parameters are the same.
Is there anything I can do about it?
Hey there,
Thanks for your work.
I'm currently trying to overfit on a single image with the basic setup.
However, after 100 epochs with the standard parameters, the losses do not really converge. For example, the SILog loss plateaus at 1.5 and the chamfer loss stays around 1.1. See the W&B dashboard here.
Moreover, as you can see from the dashboards, convergence seems really slow for just one image. Are there any hints for improving this, specifically for the AdaBins losses? (I already tried various learning rates and turning off the scheduler.)
Regarding the input: my depth map labels are NumPy arrays with real depth values ranging from 0 to 80 meters.
Any hints on how to make the network converge on a single image would be appreciated!
Hi,
As the title says: if I want to train on my own dataset, what steps should I take? Do I need to convert it into the NYU dataset format?
Thanks!
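The training lists such as the train/test_files_with_gt.txt files mentioned earlier appear (judging by the question about focal values) to hold one sample per line: an RGB path, a depth path, and a focal length, separated by whitespace. A minimal sketch for writing such a list for a custom dataset, assuming exactly that three-column layout; the helper name make_filenames_lines, the file names, and the reuse of the NYU focal value as a placeholder are all hypothetical.

```python
def make_filenames_lines(pairs, focal):
    """Build one 'rgb_path depth_path focal' line per (rgb, depth) pair.

    Assumes a three-column, whitespace-separated layout, so paths must not
    contain spaces.
    """
    lines = []
    for rgb_path, depth_path in pairs:
        if " " in rgb_path or " " in depth_path:
            raise ValueError(f"paths must not contain spaces: {rgb_path!r}, {depth_path!r}")
        lines.append(f"{rgb_path} {depth_path} {focal}")
    return lines


# Hypothetical file names; the focal value is only a placeholder.
pairs = [("scene_01/rgb_00001.jpg", "scene_01/sync_depth_00001.png")]
lines = make_filenames_lines(pairs, 518.8579)
print("\n".join(lines))
# To write the list: open("my_train_files_with_gt.txt", "w").write("\n".join(lines) + "\n")
```

Whether the focal column actually influences training is the open question above; the sketch only keeps the column so the file parses the same way.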
Hi,
First of all, very nice paper and architecture! I've been playing with the training code and trained a model on my own data. However, after a couple of epochs I get a NaN loss, and after some debugging it seems that the model outputs a NaN tensor. I experimented with gradient clipping to see if it would help, but it does not seem to.
Best regards
I'm interested in packaging this for the conda-forge. Would it be possible to generate a release for this repo? Thanks!
When I run 'python train.py', the progress always stays at 0%.
Dear authors,
I'm trying to reproduce your results on the NYU Depth V2 dataset, but I'm having problems with the evaluation results, both when retraining the network from scratch and when using your pretrained weights.
I'm running your 'eval on a single PIL image' script over all 654 images of the NYU test split and evaluating with your compute_errors script.
I'm obtaining the following results:
a1 | a2 | a3 | abs_rel | rmse | log_10 | rmse_log | silog | sq_rel |
---|---|---|---|---|---|---|---|---|
0.782 | 0.961 | 0.991 | 0.146 | 0.666 | 0.065 | 0.204 | 19.751 | 0.111 |
a1 | a2 | a3 | abs_rel | rmse | log_10 | rmse_log | silog | sq_rel |
---|---|---|---|---|---|---|---|---|
0.791 | 0.950 | 0.986 | 0.148 | 0.655 | 0.064 | 0.210 | 20.915 | 0.118 |
a1 | a2 | a3 | abs_rel | rmse | log_10 | rmse_log | silog | sq_rel |
---|---|---|---|---|---|---|---|---|
0.861 | 0.973 | 0.994 | 0.118 | 0.562 | 0.053 | 0.174 | 16.925 | 0.082 |
This is far from the results reported in the paper. Can you please help me reproduce your paper's results?
In the "Recommended way" snippet, Image has to be imported with from PIL import Image.
Thank you for sharing all of your code; this is an amazing contribution to the depth estimation community.
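When comparing tables like these, it helps to pin down exactly how each metric is computed. The sketch below uses the definitions commonly found in monocular-depth evaluation code (δ thresholds 1.25, 1.25², 1.25³; SILog scaled by 100; log_10 as a mean absolute difference). It is an illustration, not a copy of the repository's compute_errors, so check it against that script; gt and pred are assumed to be matching 1-D arrays of valid pixels in metres, already cropped and masked.

```python
import numpy as np


def compute_errors(gt, pred):
    """Common monocular-depth metrics on matching 1-D arrays of valid depths (metres)."""
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)

    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    log_10 = np.mean(np.abs(np.log10(gt) - np.log10(pred)))

    err = np.log(pred) - np.log(gt)
    # Clip tiny negative round-off before the square root; scaled by 100 by convention.
    silog = np.sqrt(max(np.mean(err ** 2) - np.mean(err) ** 2, 0.0)) * 100

    return dict(a1=a1, a2=a2, a3=a3, abs_rel=abs_rel, rmse=rmse, log_10=log_10,
                rmse_log=rmse_log, silog=silog, sq_rel=sq_rel)


gt = np.linspace(1.0, 5.0, 100)
print(compute_errors(gt, gt * 1.1))  # a prediction that is uniformly 10% too deep
```

Because every metric depends on which pixels survive the mask, two evaluations only agree if they use the same crop, depth range, and prediction resolution.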
I would like to ask why you add 15% of the mse instead of subtracting 85% of the mse in your loss function below.
Line 24 in 2fb686a
I was curious, since your paper describes a subtraction, as follows:
Thank you for your time
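Assuming the paper's pixel-wise loss has the usual scale-invariant form, α·sqrt((1/T) Σ g_i² − (λ/T²)(Σ g_i)²) with g_i = log d̃_i − log d_i, λ = 0.85 and α = 10, the two versions are algebraically the same: since Var(g) = mean(g²) − mean(g)², adding 0.15·mean(g)² to the variance yields mean(g²) − 0.85·mean(g)². A quick numerical check of that identity in NumPy (λ, α, and the random test data are assumptions for illustration):

```python
import numpy as np

LAM = 0.85  # variance-focus weight, assumed from the paper's formula


def silog_subtraction_form(g, lam=LAM):
    # (1/T) * sum(g_i^2) - (lam / T^2) * (sum g_i)^2
    return np.mean(g ** 2) - lam * np.mean(g) ** 2


def silog_variance_form(g, lam=LAM):
    # Population variance: var(g) = mean(g^2) - mean(g)^2,
    # so var(g) + (1 - lam) * mean(g)^2 = mean(g^2) - lam * mean(g)^2.
    return np.var(g) + (1.0 - lam) * np.mean(g) ** 2


rng = np.random.default_rng(0)
pred = rng.uniform(0.5, 10.0, size=1000)
gt = rng.uniform(0.5, 10.0, size=1000)
g = np.log(pred) - np.log(gt)

alpha = 10.0
print(np.isclose(alpha * np.sqrt(silog_subtraction_form(g)),
                 alpha * np.sqrt(silog_variance_form(g))))  # True
```

One caveat: if an implementation uses torch.var with its default settings, it applies the unbiased (T − 1) estimator, whereas np.var above divides by T, so the variance term differs by a factor of T/(T − 1). That is negligible for the number of pixels in a depth map.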
Hey there,
Can you please provide an environment file?
I have noticed that you use syncbn in your code.
I tried using some images from another dataset to test the prediction.
I used an image from the nuScenes dataset whose size is 900×1600 (h×w), but when I run inference,
there is an error:
/content/drive/My Drive/AdaBins/models/layers.py in forward(self, x)
17 embeddings = self.embedding_convPxP(x).flatten(2) # .shape = n,c,s = n, embedding_dim, s
18 # embeddings = nn.functional.pad(embeddings, (1,0)) # extra special token at start ?
---> 19 embeddings = embeddings + self.positional_encodings[:embeddings.shape[2], :].T.unsqueeze(0)
20
21 # change to S,N,E format required by transformer
RuntimeError: The size of tensor a (1400) must match the size of tensor b (500) at non-singleton dimension 2
But when I resize it to the KITTI image shape or to 352×704, it works.
Do you have any suggestions on which size I should choose?
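The error appears to come from the mViT patch embedding producing more tokens than the 500 rows of the learned positional-encoding table. A rough way to predict the token count and pick an input size, assuming (hypothetically, but consistently with the 1400, 1980, and 713 values reported in this thread) that the mViT sees the input at half resolution and cuts it into non-overlapping 16×16 patches. num_patches, fit_size, and the multiple-of-32 rounding are illustrative choices, not part of the repository.

```python
import math


def num_patches(h, w, patch=16, down=2):
    """Token count for an h x w input under the assumptions stated above.

    The mViT input is taken to be the image downsampled by `down` (rounded up,
    as for a padded stride-2 stem); patches come from a conv with
    kernel = stride = patch, hence the floor division.
    """
    fh, fw = math.ceil(h / down), math.ceil(w / down)
    return (fh // patch) * (fw // patch)


def fit_size(h, w, max_patches=500, patch=16, down=2, multiple=32):
    """Downscale (h, w), keeping roughly the aspect ratio, until the token count fits."""
    if num_patches(h, w, patch, down) <= max_patches:
        return h, w
    scale = math.sqrt(max_patches / num_patches(h, w, patch, down))
    new_h = max(multiple, int(h * scale) // multiple * multiple)
    new_w = max(multiple, int(w * scale) // multiple * multiple)
    while num_patches(new_h, new_w, patch, down) > max_patches:
        # Shrink the longer side first, staying on multiples of `multiple`.
        if new_h >= new_w and new_h > multiple:
            new_h -= multiple
        else:
            new_w -= multiple
    return new_h, new_w


print(num_patches(900, 1600))   # 1400, as in the nuScenes error
print(num_patches(1080, 1920))  # 1980
print(num_patches(480, 640))    # 300, within the 500-row table
print(fit_size(900, 1600))      # (512, 928)
```

Resizing the prediction back to the original resolution afterwards is a separate step; the point here is only to keep the token count at or below the table size.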
utils.colorize does not work properly.
Line 66 in utils.py is now:
img = value[:, :, :3]
Should probably be:
img = value.squeeze()
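A sketch of a colorize helper in that spirit: it squeezes singleton axes first, so (1, 1, H, W), (H, W, 1), and (H, W) inputs all work, then normalizes the depth and maps it through a Matplotlib colormap. The function signature, the magma_r default, and min/max normalization are assumptions, not the repository's utils.colorize.

```python
import matplotlib
import numpy as np


def colorize(depth, vmin=None, vmax=None, cmap="magma_r"):
    """Map a depth map to an (H, W, 3) uint8 RGB image."""
    depth = np.squeeze(np.asarray(depth, dtype=np.float64))  # e.g. (1, 1, H, W) -> (H, W)
    if depth.ndim != 2:
        raise ValueError(f"expected a single depth map, got shape {depth.shape}")
    vmin = np.nanmin(depth) if vmin is None else vmin
    vmax = np.nanmax(depth) if vmax is None else vmax
    norm = np.clip((depth - vmin) / max(vmax - vmin, 1e-12), 0.0, 1.0)
    rgba = matplotlib.colormaps[cmap](norm, bytes=True)  # uint8, shape (H, W, 4)
    return rgba[..., :3]


rgb = colorize(np.linspace(0.5, 10.0, 20).reshape(1, 1, 4, 5))
print(rgb.shape, rgb.dtype)
```

Passing fixed vmin/vmax (for example the dataset's depth range) keeps colours comparable across images; per-image min/max makes each image use the full colour range.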
Hello, I want to ask about your depth ground truth: did you densify the raw KITTI depth_gt, and if so, what method did you use?
While I read your implementation of mVIT, I found that the second output of mVIT has a shape [n, 128, embedding_dim].
If I correctly understand your paper, the shape should be [n, S, embedding_dim] where S=h/p*w/p following the notation of paper.
Please give me some explanation or any reason for this part.
(Edit) I understood that part, and I've finally figured out what misc is!
Sorry, and thank you.
Hi,
Our depth map values range from about 10 to 10000 per pixel. How do we find the actual distance from the camera that took the picture, e.g. that a value of 5000 in the depth map corresponds to 4.5 m from the camera?
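How a stored value maps to metres depends on the scale factor used when the 16-bit PNG was written. Common conventions are metres × 1000 for NYU-style maps and metres × 256 for KITTI-style maps, with 0 meaning "no measurement". These factors are assumptions here, so confirm them against whatever code produced your maps. Under the ×1000 convention, a stored 4500 means 4.5 m. A minimal NumPy sketch (png_to_metres and SCALE are hypothetical names):

```python
import numpy as np

# Assumed 16-bit PNG conventions: stored value = metres * factor.
SCALE = {"nyu": 1000.0, "kitti": 256.0}


def png_to_metres(raw, dataset="nyu"):
    """Convert a uint16 depth PNG array (e.g. np.asarray(Image.open(path))) to metres."""
    depth = raw.astype(np.float32) / SCALE[dataset]
    depth[raw == 0] = np.nan  # 0 conventionally marks pixels without a measurement
    return depth


raw = np.array([[4500, 0], [1000, 2560]], dtype=np.uint16)
print(float(png_to_metres(raw, "nyu")[0, 0]))  # 4.5
```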
Hi @shariqfarooq123,
Thanks for your work!
I have a question. Have you tried to use the CNN to replace the transformer?
The following situation occurred when I ran the code
Traceback (most recent call last):
File "infer.py", line 158, in <module>
inferHelper = InferenceHelper()
File "infer.py", line 85, in __init__
model, _, _ = model_io.load_checkpoint(pretrained_path, model)
File "/home/titan2/AdaBins/model_io.py", line 37, in load_checkpoint
ckpt = torch.load(fpath, map_location='cpu')
File "/home/titan2/anaconda3/envs/adabins/lib/python3.6/site-packages/torch/serialization.py", line 426, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/home/titan2/anaconda3/envs/adabins/lib/python3.6/site-packages/torch/serialization.py", line 599, in _load
raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: ./pretrained/AdaBins_nyu.pt is a zip archive (did you mean to use torch.jit.load()?)
After googling, I learned that it is a PyTorch version issue.
I currently use PyTorch 1.3.1.
I want to know which PyTorch version you used when training the pretrained model.
Thank you!
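PyTorch 1.6 switched torch.save to a zip-based format by default, which older releases such as 1.3.1 cannot read, so the simplest fix is usually to upgrade. If you must stay on the old release, one option is to load the checkpoint once with PyTorch 1.6 or later and re-save it with the legacy serializer via torch.save's _use_new_zipfile_serialization=False argument. A minimal sketch; the helper name and the demo checkpoint contents are hypothetical.

```python
import os
import tempfile
import zipfile

import torch


def resave_legacy(src_path, dst_path):
    """Re-save a checkpoint in the pre-1.6 (non-zip) format.

    Run this with PyTorch >= 1.6, which can read the zip-based format; the
    output can then be loaded by older releases.
    """
    ckpt = torch.load(src_path, map_location="cpu")
    torch.save(ckpt, dst_path, _use_new_zipfile_serialization=False)
    return zipfile.is_zipfile(dst_path)  # False for the legacy format


# Demo on a throwaway checkpoint (hypothetical keys).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "new_format.pt")
    dst = os.path.join(tmp, "legacy_format.pt")
    torch.save({"model": {"w": torch.zeros(3)}, "epoch": 0}, src)
    src_is_zip = zipfile.is_zipfile(src)
    dst_is_zip = resave_legacy(src, dst)

print(src_is_zip, dst_is_zip)
```

On PyTorch 2.6 and later, torch.load defaults to weights_only=True; a checkpoint that stores arbitrary Python objects may then need weights_only=False, which should only be used for files you trust.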
Hi,
I am looking for a CPU version of AdaBins. If someone could help, that would be great.
Thanks
-jyothi
Hi, there. Thank you for your paper and code.
I tried to re-train your model named Base+R in Table 6.
As mentioned in your paper, I set Cd=1, which corresponds to num_classes=1 in
AdaBins/models/unet_adaptive_bins.py
(line 26 at commit 2fb686a):
class BaseR(nn.Module):
    def __init__(self):
        super(BaseR, self).__init__()
        self.encoder = Encoder()
        self.decoder = DecoderBN(num_classes=1)

    def forward(self, x):
        unet_out = self.decoder(self.encoder(x))
        return unet_out
....
Besides, the loss only contains the pixel-wise depth loss. The batch size is 10 due to GPU memory limits. All other configurations were the same as yours.
After training for 25 epochs, I got a loss of around 0.4, with the following loss curve:
And the evaluation results are:
Metrics: {'a1': 0.0, 'a2': 0.0, 'a3': 0.0, 'abs_rel': 0.974, 'rmse': 2.795, 'log_10': 1.583, 'rmse_log': 3.649, 'silog': 12.393, 'sq_rel': 2.555}
I don't know what is wrong here. Do you remember your final loss value ? or can you provide your model code for Base+R.
Thank you very much! :)
Hi, I was doing some tests with this project and I am impressed, thank you for sharing it!
I do have a question: I was testing in Colab (on a Tesla K80) and it took around 0.8 to 1.0 s per image. Do you think that with enough optimization I could get close-to-real-time performance on any Jetson at 640x480, or is it pointless to even try? (I have not gone in depth into the code, so I don't know how much performance headroom is left.)
Hi, please add a Colab demo. Cheers.
Thanks
Aiyush
Hi, I appreciate your work a lot, thanks for releasing the code.
In your evaluate.py, I see that you apply the Garg crop to depth_gt, but I also see that in your test dataloader, when you load depth_gt, you apply the KITTI benchmark crop, i.e., crop to 1216 x 352. Hence, you first crop depth_gt to 1216 x 352 and then crop it again according to the Garg crop.
Did I miss something, or is that true? Can you explain the reasoning?
By the way, do you evaluate the KITTI Eigen split on 652 test images or on the raw 697 test images?
Thanks again!
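The effect of cropping in that order can be checked on a synthetic frame. A sketch, assuming the Garg fractions commonly used in BTS-style evaluation code (0.40810811 to 0.99189189 of the height, 0.03594771 to 0.96405229 of the width) and a bottom-aligned, horizontally centred 352×1216 benchmark crop; neither is confirmed here against the repository, and 375×1242 is just one typical raw KITTI frame size.

```python
import numpy as np


def kitti_benchmark_crop(arr, height=352, width=1216):
    """Bottom-aligned, horizontally centred crop to height x width."""
    h, w = arr.shape[:2]
    top = h - height
    left = (w - width) // 2
    return arr[top:top + height, left:left + width]


def garg_mask(shape):
    """Garg crop expressed as fractions of whatever map it is applied to."""
    h, w = shape
    mask = np.zeros((h, w), dtype=bool)
    mask[int(0.40810811 * h):int(0.99189189 * h),
         int(0.03594771 * w):int(0.96405229 * w)] = True
    return mask


raw = np.zeros((375, 1242))
cropped = kitti_benchmark_crop(raw)
print(cropped.shape)  # (352, 1216)

top = raw.shape[0] - cropped.shape[0]  # 23 rows removed by the benchmark crop
raw_rows = np.flatnonzero(garg_mask(raw.shape).any(axis=1))
crop_rows = np.flatnonzero(garg_mask(cropped.shape).any(axis=1))
print(raw_rows[0] - top, raw_rows[-1] - top)  # 130 347 (raw Garg rows, in crop coordinates)
print(crop_rows[0], crop_rows[-1])            # 143 348 (Garg fractions applied to the crop)
```

Under these assumptions the two orders select slightly different rows, so the Garg region computed on the 352-row crop is not identical to the one computed on the raw frame.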
Hello @shariqfarooq123
Thanks for sharing the code. Would it be possible to also share the pretrained weights?
'./pretrained/AdaBins_nyu.pt'
What was your methodology for choosing the locations of the skip connections in the EfficientNet encoder?
Dear authors,
I noticed that you use the official splits of the NYU Depth V2 dataset, but with the tools available from the dataset page I cannot reproduce the same kind of filenames that you use on your files.
Could you please make available the code used to preprocess the NYU Depth dataset?
Thanks!
Hi,
Thank you so much for publishing the code. This work is excellent and has achieved state-of-the-art performance. However, when I was trying to retrain the model with my training code built with pytorch lightning, I was unable to reach the results obtained in the paper.
I think I have found a bug in the code, which may be why I could not achieve the desired result. When the args.same_lr flag is set to false, the encoder and decoder are supposed to have different learning rates. However, both still get the same learning rate.
The OneCycleLR used in this line expects a list of learning rates in order to give the encoder and decoder different rates. But since a single scalar is passed, the learning rate is the same for both the encoder and the decoder even when args.same_lr is set to false.
When I print out scheduler.optimizer, I get the following:
AdamW (
Parameter Group 0
amsgrad: False
base_momentum: 0.85
betas: (0.95, 0.999)
eps: 1e-08
initial_lr: 1.428e-05
lr: 1.4279999999999978e-05
max_lr: 0.000357
max_momentum: 0.95
min_lr: 1.428e-07
weight_decay: 0.1
Parameter Group 1
amsgrad: False
base_momentum: 0.85
betas: (0.95, 0.999)
eps: 1e-08
initial_lr: 1.428e-05
lr: 1.4279999999999978e-05
max_lr: 0.000357
max_momentum: 0.95
min_lr: 1.428e-07
weight_decay: 0.1
)
When I replace the section with the following:
lrs = [l['lr'] for l in optimizer.param_groups]
###################################### Scheduler ###############################################
scheduler = optim.lr_scheduler.OneCycleLR(optimizer, lrs, epochs=epochs, steps_per_epoch=len(train_loader),
                                          cycle_momentum=True,
                                          base_momentum=0.85, max_momentum=0.95, last_epoch=args.last_epoch,
                                          div_factor=args.div_factor,
                                          final_div_factor=args.final_div_factor)
Notice that the output of scheduler.optimizer changes to the following:
AdamW (
Parameter Group 0
amsgrad: False
base_momentum: 0.85
betas: (0.95, 0.999)
eps: 1e-08
initial_lr: 1.428e-06
lr: 1.4280000000000006e-06
max_lr: 3.57e-05
max_momentum: 0.95
min_lr: 1.428e-08
weight_decay: 0.1
Parameter Group 1
amsgrad: False
base_momentum: 0.85
betas: (0.95, 0.999)
eps: 1e-08
initial_lr: 1.428e-05
lr: 1.4279999999999978e-05
max_lr: 0.000357
max_momentum: 0.95
min_lr: 1.428e-07
weight_decay: 0.1
)
Would you please let me know if this behavior is intentional?
Thank you,
Best,
SK
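The behaviour described above matches how torch.optim.lr_scheduler.OneCycleLR treats max_lr: a single float is applied to every parameter group, overwriting any per-group lr, while a list supplies one peak rate per group. A minimal, self-contained check with stand-in modules; the layer sizes, steps_per_epoch, and the div_factor=25 / final_div_factor=100 values (inferred from the ratios in the printouts above) are assumptions.

```python
import torch
from torch import nn, optim

# Stand-ins for the pretrained encoder and the newly initialised decoder.
encoder = nn.Linear(8, 8)
decoder = nn.Linear(8, 1)

lr = 3.57e-4
optimizer = optim.AdamW(
    [
        {"params": encoder.parameters(), "lr": lr / 10},  # smaller rate for the encoder
        {"params": decoder.parameters(), "lr": lr},
    ],
    lr=lr,
    weight_decay=0.1,
)

# One peak learning rate per parameter group instead of a single scalar.
max_lrs = [group["lr"] for group in optimizer.param_groups]
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=max_lrs,
    epochs=25,
    steps_per_epoch=100,
    cycle_momentum=True,
    base_momentum=0.85,
    max_momentum=0.95,
    div_factor=25,
    final_div_factor=100,
)

for i, group in enumerate(optimizer.param_groups):
    print(i, group["max_lr"], group["initial_lr"], group["min_lr"])
```

Passing max_lr=lr (a single float) instead gives both groups the same max_lr, as in the first printout.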
Sorry, I have a small question about the NYU Depth V2 test set...
I read "Depth Map Prediction from a Single Image using a Multi-Scale Deep Network", which mentions the official test set (215 of the 464 scenes). But in the many papers and on the official website I checked, I couldn't find where it is officially stated which 215 scenes are for testing and which 249 scenes are for training.
I'd really appreciate an answer if you can. Thanks a lot.
Hi, Thanks for your code.
Could you please provide the visualization code for the NYUd predicted depth, or for the heat map of the NYUd predicted depth?
Thank you very much!
I'm trying to run it in my Jupyter notebook following the README: I downloaded the pretrained weights and saved them in a directory named pretrained. When I run the code for predicting depth for a single Pillow image (I used the test image given in test_imgs), predicted_depth is a NumPy array with shape [1, 1, 480, 640]. I don't understand why predicted_depth is a 4-dimensional array. Next, I tried the code for predicting the depths of all images in a directory, again using the test_imgs directory containing the classroom image, and specified a target directory for storing the 16-bit output. I get a TypeError: Cannot handle this data type: (1, 1, 480, 640), <u2. Shouldn't the predicted depth be a 2-dimensional array instead of a 4-dimensional one?
from infer import InferenceHelper
infer_helper = InferenceHelper(dataset='nyu')
infer_helper.predict_dir("test_imgs/", "test_imgs_results/")
Output
Loading base model ()...
Using cache found in C:\Users\Hp/.cache\torch\hub\rwightman_gen-efficientnet-pytorch_master
Done.
Removing last two layers (global_pool & classifier).
Building Encoder-Decoder model..Done.
0%| | 0/1 [00:04<?, ?it/s]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\PIL\Image.py in fromarray(obj, mode)
2679 try:
-> 2680 mode, rawmode = _fromarray_typemap[typekey]
2681 except KeyError:
KeyError: ((1, 1, 480, 640), '<u2')
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-1-8cfe96e3a7b5> in <module>
6 #bin_centers, predicted_depth = infer_helper.predict_pil(img)
7 # predict depths of images stored in a directory and store the predictions in 16-bit format in a given separate dir
----> 8 infer_helper.predict_dir("test_imgs/", "test_imgs_results/")
~\anaconda3\lib\site-packages\torch\autograd\grad_mode.py in decorate_context(*args, **kwargs)
13 def decorate_context(*args, **kwargs):
14 with self:
---> 15 return func(*args, **kwargs)
16 return decorate_context
17
<personal_directories>\AdaBins\infer.py in predict_dir(self, test_dir, out_dir)
146 save_path = os.path.join(out_dir, basename + ".png")
147
--> 148 Image.fromarray(final).save(save_path)
149
150
~\anaconda3\lib\site-packages\PIL\Image.py in fromarray(obj, mode)
2680 mode, rawmode = _fromarray_typemap[typekey]
2681 except KeyError:
-> 2682 raise TypeError("Cannot handle this data type: %s, %s" % typekey)
2683 else:
2684 rawmode = mode
TypeError: Cannot handle this data type: (1, 1, 480, 640), <u2
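The TypeError arises because Image.fromarray receives a 4-D (1, 1, 480, 640) array; Pillow has no mode for that shape, whereas a 2-D uint16 array maps to 16-bit grayscale. A minimal sketch of the fix, squeezing the batch and channel axes before scaling and saving. The helper names are hypothetical, and saving_factor=1000 is an assumed NYU-style scale (metres × 1000), so use whatever factor your InferenceHelper applies.

```python
import numpy as np
from PIL import Image


def to_uint16_depth(pred, saving_factor=1000.0):
    """Turn a (1, 1, H, W) or (H, W) depth prediction into an (H, W) uint16 array."""
    depth = np.squeeze(np.asarray(pred))  # drop the batch and channel axes
    if depth.ndim != 2:
        raise ValueError(f"expected a single depth map, got shape {np.shape(pred)}")
    scaled = np.clip(depth * saving_factor, 0, np.iinfo(np.uint16).max)
    return scaled.astype(np.uint16)


def save_depth_png(pred, path, saving_factor=1000.0):
    # A 2-D uint16 array is written as a 16-bit grayscale PNG.
    Image.fromarray(to_uint16_depth(pred, saving_factor)).save(path)


final = to_uint16_depth(np.full((1, 1, 4, 5), 2.0))
print(final.shape, final.dtype)
```

Clipping guards against depths beyond 65.535 m wrapping around when cast to uint16 at a factor of 1000.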
Hey all... just trying to run this on some HD png images... am I missing something?
`X:\miniconda3\envs\AdaBins2\python.exe C:/Users/proscans/AdaBins/testItOut.py
Loading base model ()...Using cache found in C:\Users\proscans/.cache\torch\hub\rwightman_gen-efficientnet-pytorch_master
Done.
Removing last two layers (global_pool & classifier).
Building Encoder-Decoder model..Done.
Traceback (most recent call last):
File "C:/Users/proscans/AdaBins/testItOut.py", line 12, in <module>
bin_centers, predicted_depth = infer_helper.predict_pil(img)
File "X:\miniconda3\envs\AdaBins2\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "C:\Users\proscans\AdaBins\infer.py", line 95, in predict_pil
bin_centers, pred = self.predict(img)
File "X:\miniconda3\envs\AdaBins2\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "C:\Users\proscans\AdaBins\infer.py", line 106, in predict
bins, pred = self.model(image)
File "X:\miniconda3\envs\AdaBins2\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\proscans\AdaBins\models\unet_adaptive_bins.py", line 94, in forward
bin_widths_normed, range_attention_maps = self.adaptive_bins_layer(unet_out)
File "X:\miniconda3\envs\AdaBins2\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\proscans\AdaBins\models\miniViT.py", line 25, in forward
tgt = self.patch_transformer(x.clone()) # .shape = S, N, E
File "X:\miniconda3\envs\AdaBins2\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\proscans\AdaBins\models\layers.py", line 19, in forward
embeddings = embeddings + self.positional_encodings[:embeddings.shape[2], :].T.unsqueeze(0)
RuntimeError: The size of tensor a (1980) must match the size of tensor b (500) at non-singleton dimension 2
Process finished with exit code 1`
When I test on the SUN RGB-D dataset, my results differ from those in the paper.
So I want to ask about the evaluation settings on the SUN RGB-D dataset.
Question 1: When testing on SUN RGB-D, was the maximum depth adjusted?
Question 2: SUN RGB-D consists of several sub-datasets. Do the results in the paper come from testing all 5050 images at once, or from testing each sub-dataset separately and then averaging?
Question 3: Do you use evaluate.py on the SUN RGB-D data with the dataset set to NYU?
Hi, I am trying to run this model together with YOLOv5 in real time, using cameras on a motorbike to estimate the distance to vehicles while riding, but the accuracy is very poor. Is there any way to improve it? And is it possible to run this on a Jetson Nano?
Using diff LR
Epoch: 1/25. Loop: Train: 0%| | 0/1515 [00:00<?, ?it/s]
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/tuxiang/Documents/chenyihang/AdaBins-main/adabins/lib/python3.7/site-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/usr/lib/python3.7/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/home/tuxiang/Documents/chenyihang/AdaBins-main/adabins/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.7/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.7/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.7/multiprocessing/connection.py", line 492, in Client
c = SocketClient(address)
File "/usr/lib/python3.7/multiprocessing/connection.py", line 619, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
Waiting for the code to be open-sourced...
Hi,
Thank you for sharing the code.
I would like to ask a question about the weight decay coefficient.
In your paper, you said, "For training, we use the AdamW optimizer [30] with weight-decay 10−2."
However, in your code(args_train_kitti_eigen.txt or args_train_nyu.txt), you wrote, "--wd 0.1"
Can you tell me which is correct?
Thanks.
Hi
As the title says, I have some questions about training on my own dataset.
Dear authors,
Can you please confirm that the (NYU) Test and Train datasets are the same used in your old paper titled "High-Quality Monocular Depth Estimation via Transfer Learning"?
Thank you.
It seems that the model doesn't support images larger than (640, 480), such as (1000, 750), because of line 19 in layers.py:
embeddings = embeddings + self.positional_encodings[:embeddings.shape[2], :].T.unsqueeze(0)
embeddings.shape[2] = 713 is larger than 500, which is configured in line 14 of layers.py:
self.positional_encodings = nn.Parameter(torch.rand(500, embedding_dim), requires_grad=True)
I tried increasing the value from 500 to 1000, but the pretrained NYU model doesn't support that. What should I do to support an image resolution of (1000, 750)? Do I have to adjust the model parameters and then retrain the network?
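Without retraining, one workaround is to run the model at a smaller input size whose patch count stays within the 500-row table, then resize the predicted depth back to the original resolution. Enlarging the table instead leaves the new rows untrained, so fine-tuning would presumably be needed. A sketch of the upsampling half, assuming the prediction has already been squeezed to an (H, W) float array; upsample_depth is a hypothetical helper, and bilinear interpolation is one reasonable choice rather than the repository's method.

```python
import numpy as np
from PIL import Image


def upsample_depth(depth, out_hw):
    """Bilinearly resize an (H, W) depth map to out_hw = (height, width)."""
    img = Image.fromarray(np.asarray(depth, dtype=np.float32))  # float32 2-D -> mode "F"
    out_h, out_w = out_hw
    resized = img.resize((out_w, out_h), resample=Image.BILINEAR)  # PIL takes (width, height)
    return np.asarray(resized)


pred_small = np.full((608, 832), 2.5, dtype=np.float32)  # stand-in for a reduced-size prediction
pred_full = upsample_depth(pred_small, (750, 1000))
print(pred_full.shape)
```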
Hi,
Currently, the depth result is 640x480x1 with 16-bit values, so when opened it is shown in grayscale. How can we also obtain the depth map in RGB colorspace, like the ones shown in the paper?
How do I train this model with uniform bin sizes, as reported in Table 6 of the main paper?
The mViT outputs both bin widths and Range attention maps. So if we were to use uniform bin widths, should we just discard the mViT's predicted bin widths and recompute the bin centers on a uniform bin widths tensor and finally compute the depth prediction?
Line 29 in 2fb686a
Lines 187 to 195 in 2fb686a
It would be great if you could share all the changes needed to convert AdaBins to uniform bins.
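As described in the paper, bin centres come from the normalized bin widths b_i as c_i = d_min + (d_max − d_min)(b_i/2 + Σ_{j<i} b_j), and the final depth is the per-pixel probability-weighted sum of the centres. A uniform-bins baseline can be sketched by replacing the predicted widths with constant widths 1/N and keeping the rest. This NumPy sketch follows that description under stated assumptions: probs stands for the per-pixel softmax over bins with shape (N, H, W), min_val/max_val are the dataset's depth range, and the helper names are hypothetical. It is not the authors' uniform-bins code.

```python
import numpy as np


def uniform_widths(n_bins):
    """Normalised widths for N equal bins (they sum to 1)."""
    return np.full(n_bins, 1.0 / n_bins)


def bin_centers(widths_normed, min_val, max_val):
    """c_i = min_val + (max_val - min_val) * (b_i / 2 + sum_{j<i} b_j)."""
    widths = (max_val - min_val) * np.asarray(widths_normed, dtype=np.float64)
    edges = min_val + np.concatenate([[0.0], np.cumsum(widths)])
    return 0.5 * (edges[:-1] + edges[1:])


def depth_from_probs(probs, centers):
    """Per-pixel expected depth; probs has shape (N, H, W) and sums to 1 over N."""
    return np.tensordot(centers, probs, axes=(0, 0))  # -> (H, W)


centers = bin_centers(uniform_widths(4), min_val=0.0, max_val=10.0)
print(centers)  # [1.25 3.75 6.25 8.75]

probs = np.zeros((4, 2, 3))
probs[2] = 1.0  # every pixel puts all its mass on the third bin
print(depth_from_probs(probs, centers).shape)
```

In this formulation only the widths change between the adaptive and uniform variants, which is why the mViT's width output can simply be ignored in the uniform case.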
from infer import InferenceHelper
infer_helper = InferenceHelper(dataset='nyu')
infer_helper.predict_dir("test_imgs/", "test_imgs/")
Basically, I tried to use the sample inference script to predict images in a batch.
But while converting the predicted tensor to a PIL Image, this error occurs:
TypeError: Cannot handle this data type: (1, 1, 480, 640), <u2
Hi, first of all, thanks for your nice work and open-source code. I ran into a problem when evaluating the results. As listed in the README, I downloaded the predicted depths in 16-bit format for the NYU-Depth-v2 official test set and the KITTI Eigen split test set from the provided link:
https://drive.google.com/drive/folders/1b3nfm8lqrvUjtYGmsqA5gptNQ8vPlzzS?usp=sharing
Then I used the evaluation code in infer.py. In particular, I directly compared the downloaded depth maps with the ground-truth depth maps of the NYU official split, using Eigen's center crop. The ground-truth depth maps were obtained from DenseDepth.
I tried two different crops:
Crop 1: 228x304 (center crop)
Results: Metrics: {'a1': 0.895, 'a2': 0.98, 'a3': 0.997, 'abs_rel': 0.106, 'rmse': 0.419, 'log_10': 0.045, 'rmse_log': 0.131, 'silog': 9.761, 'sq_rel': 0.075}
Crop 2: rows 45:471, columns 41:601, as used in your code (line 132)
Results: Metrics: {'a1': 0.887, 'a2': 0.978, 'a3': 0.995, 'abs_rel': 0.11, 'rmse': 0.41, 'log_10': 0.047, 'rmse_log': 0.142, 'silog': 11.748, 'sq_rel': 0.07}
However, the absolute relative error is 0.103 in your paper.
I cannot find the problem. Could you please help me figure it out, or provide the script for reproducing the results?
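For reference, a sketch of the evaluation mask these numbers depend on, assuming 480×640 ground truth in metres, the rows 45:471 / columns 41:601 crop quoted above, and a valid range of 1e-3 m to 10 m (common NYU settings, assumed here). Predictions must be at the same 480×640 resolution and in metres (e.g. a 16-bit PNG divided by its saving factor) before indexing; the 228×304 centre crop is a different protocol, so its numbers are not directly comparable.

```python
import numpy as np


def nyu_eval_mask(gt, min_depth=1e-3, max_depth=10.0):
    """Boolean mask of pixels to evaluate: valid depth AND inside the crop."""
    if gt.shape != (480, 640):
        raise ValueError(f"expected 480x640 ground truth, got {gt.shape}")
    valid = (gt > min_depth) & (gt < max_depth)
    crop = np.zeros_like(valid)
    crop[45:471, 41:601] = True  # crop region quoted in the question
    return valid & crop


gt_m = np.full((480, 640), 2.0)  # hypothetical ground truth in metres
mask = nyu_eval_mask(gt_m)
print(mask.sum())  # 238560 = 426 rows x 560 columns
# Errors are then computed on gt_m[mask] and the matching pred_m[mask].
```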
Thank you very much! :)
Hello,
Where can I get the test set of the SUN RGB-D dataset?