hisfog / sfmnext-impl
[AAAI 2024] Official implementation of "SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation", and more.
License: MIT License
How can I train a model on KITTI dataset using E5 as the backbone? Can I train it at a lower resolution like 640x192?
I am very interested in your work! I ran your model in TensorRT, C++. The fps is so low that I think I need to run a model with fp16 precision. Do you have a pretrained model trained with fp16 precision? Can I possibly get it?
Hello and nice work! My question is how to finetune the model on KITTI?
I tried with the script ./finetune/train_ft_SQLdepth.py but cannot get good enough results. Only abs_rel 0.0494 and rmse 2.182.
How can I view the reprojected image and the automasked image with test_simple.py?
May I ask which model or weights are specifically used for the ZoeDepth and MidaS models in your demo site?
Links would be appreciated, thanks.
The KITTI training section of README.md only mentions python train.py ./args_files/args_res50_kitti_192x640_train.txt, and there seems to be no information about the ConvNeXt-L model in the args_files folder. We look forward to your answer. Thank you!
Thanks to the authors for their outstanding contribution. I have successfully reproduced the results of the paper. However, I have encountered new confusion.
At first, I failed to reproduce the results of the paper. However, I discovered from the authors' comments that using the previous version could reproduce the results.
So I obtained the previous version by executing the git checkout 6a1e997 command.
Here are the experimental results of my ResNet50 model at a resolution of 192x640.
Compared to the new version, the difference in the previous version that affects the experimental results is the 'train_files.txt' file in 'eigen_zhou', which contains about 71k images. However, the training set mentioned in the authors' paper contains only 26k images.
In addition, I trained Monodepth2 using the previous train split (71k images) and achieved results similar to SQLdepth.
I am very confused by the experimental results. My question is:
I am impressed by your work!
I want to convert the pth file (ConvNeXt_Large_SQLdepth) to an ONNX file.
I succeeded in converting the encoder to ONNX but, unfortunately, I failed to convert depth.pth to ONNX.
Below is my error output:
/SfMNeXt-Impl/networks/layers.py:16: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert c == ck, "Number of channels in x and Embedding dimension (at dim 2) of K matrix must match"
Traceback (most recent call last):
File "export_to_onnx_depth.py", line 56, in
input_names=input_names_2, output_names=output_names_2, opset_version=11)
File "/usr/local/lib/python3.6/dist-packages/torch/onnx/__init__.py", line 28, in _export
result = utils._export(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 701, in _export
dynamic_axes=dynamic_axes)
File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 459, in _model_to_graph
use_new_jit_passes)
File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 420, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 380, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 1139, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 130, in forward
self._force_outplace,
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 119, in wrapper
out_vars, _ = _flatten(outs)
RuntimeError: Only tuples, lists and Variables are supported as JIT inputs/outputs. Dictionaries and strings are also accepted, but their usage is not recommended. Here, received an input of unsupported type: int
I still could not find where an int appears in the inputs or outputs.
Could I get some help?
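One way to locate the offending int is to run the decoder once in eager mode and walk its return value, since the tracer is complaining about a non-tensor value among the traced inputs or outputs. The helper below is a minimal, framework-free sketch: `find_leaves` and the sample `outputs` structure are hypothetical, and in a real check the test would be "is not a `torch.Tensor`" rather than the Python types listed here.

```python
def find_leaves(obj, bad_types=(int, float, str), path="out"):
    """Return (path, value) pairs for every leaf whose type is in bad_types."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            hits.extend(find_leaves(value, bad_types, f"{path}[{key!r}]"))
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            hits.extend(find_leaves(value, bad_types, f"{path}[{i}]"))
    elif isinstance(obj, bad_types):
        hits.append((path, obj))
    return hits

# Hypothetical forward() result: a plain object stands in for a tensor.
tensor_like = object()
outputs = {"depth": tensor_like, "shape": (192, 640), "feats": [tensor_like, 3]}
for where, value in find_leaves(outputs):
    print(where, "->", type(value).__name__, value)
```

Any path reported this way points at a value that would have to be removed from the traced module's return value, or converted to a tensor, before export.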
Hello, I have a question, why are there no functions in the "layers.py" file, PatchTransformerEncoder, PixelWiseDotProduct.
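Hello author, when I run python train.py ./args_files/args_res50_kitti_192x640_train.txt, I get a Hugging Face Hub connection timeout error. How can I resolve this? Alternatively, can I download the model weight files manually and place them where the script expects them? Could you tell me the exact file names and locations?
The error is as follows: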
(/data/ccy/env/SfMNeXt) ccy@kxB1:/data/ccy/project/SfMNeXt-Impl-main$ python train.py ./args_files/args_res50_kitti_192x640_train.txt
/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 10020). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connection.py", line 203, in _new_conn
sock = connection.create_connection(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
socket.timeout: timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 790, in urlopen
response = self._make_request(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 491, in _make_request
raise new_e
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1096, in _validate_conn
conn.connect()
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connection.py", line 611, in connect
self.sock = sock = self._new_conn()
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connection.py", line 212, in _new_conn
raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7f14bbdeda90>, 'Connection to huggingface.co timed out. (connect timeout=10)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 844, in urlopen
retries = retries.increment(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /timm/convnext_large.fb_in22k_ft_in1k/resolve/main/pytorch_model.bin (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f14bbdeda90>, 'Connection to huggingface.co timed out. (connect timeout=10)'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1247, in hf_hub_download
metadata = get_hf_file_metadata(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1624, in get_hf_file_metadata
r = _request_wrapper(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 402, in _request_wrapper
response = _request_wrapper(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 425, in _request_wrapper
response = get_session().request(method=method, url=url, **params)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/utils/_http.py", line 63, in send
return super().send(request, *args, **kwargs)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/requests/adapters.py", line 507, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /timm/convnext_large.fb_in22k_ft_in1k/resolve/main/pytorch_model.bin (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f14bbdeda90>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: ecf9a333-7b8b-4830-bc46-02241ea337ff)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/ccy/project/SfMNeXt-Impl-main/train.py", line 22, in
trainer = Trainer(opts)
File "/data/ccy/project/SfMNeXt-Impl-main/trainer.py", line 64, in __init__
self.models["encoder"] = networks.Unet(pretrained=(not self.opt.load_pretrained_model), backbone=self.opt.backbone, in_channels=3, num_classes=self.opt.model_dim, decoder_channels=self.opt.dec_channels)
File "/data/ccy/project/SfMNeXt-Impl-main/networks/Unet.py", line 114, in __init__
encoder = create_model(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/timm/models/_factory.py", line 117, in create_model
model = create_fn(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/timm/models/convnext.py", line 986, in convnext_large
model = _create_convnext('convnext_large', pretrained=pretrained, **dict(model_args, **kwargs))
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/timm/models/convnext.py", line 486, in _create_convnext
model = build_model_with_cfg(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/timm/models/_builder.py", line 397, in build_model_with_cfg
load_pretrained(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/timm/models/_builder.py", line 190, in load_pretrained
state_dict = load_state_dict_from_hf(pretrained_loc)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/timm/models/_hub.py", line 188, in load_state_dict_from_hf
cached_file = hf_hub_download(hf_model_id, filename=filename, revision=hf_revision)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1377, in hf_hub_download
raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
Looking forward to your reply. Thank you!
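The file being requested can be read directly from the MaxRetryError message in the traceback above. The sketch below splits that URL path into repository, revision, and file name; `parse_hub_path` is a hypothetical helper, and it assumes the path follows the `<repo_id>/resolve/<revision>/<filename>` shape seen in the error.

```python
def parse_hub_path(url_path):
    """Split '/<repo_id>/resolve/<revision>/<filename>' into its three parts."""
    parts = url_path.strip("/").split("/")
    idx = parts.index("resolve")  # raises ValueError if the path has another shape
    repo_id = "/".join(parts[:idx])
    revision = parts[idx + 1]
    filename = "/".join(parts[idx + 2:])
    return repo_id, revision, filename

# Path copied from the MaxRetryError message in the traceback.
path = "/timm/convnext_large.fb_in22k_ft_in1k/resolve/main/pytorch_model.bin"
repo_id, revision, filename = parse_hub_path(path)
print(f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}")
```

Where a manually downloaded file has to be placed depends on the installed timm and huggingface_hub versions, so that part is left open here. The traceback also shows that trainer.py builds the encoder with pretrained=(not self.opt.load_pretrained_model), so the download is only attempted when that option is not set.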
There is a minor bug. In mono_dataset.py, because frame_idxs is [0, -1, 1], accessing __getitem__ can go out of bounds.
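A minimal sketch of the bounds problem, assuming each sample loads the centre frame plus its neighbours at the offsets in frame_idxs. `valid_center_indices` is a hypothetical helper, not code from mono_dataset.py; it lists the centre positions for which every neighbour exists.

```python
def valid_center_indices(num_frames, frame_idxs=(0, -1, 1)):
    """Indices whose neighbours at every offset in frame_idxs stay inside the sequence."""
    first = max(-min(frame_idxs), 0)          # frames needed before the centre
    last = num_frames - 1 - max(frame_idxs)   # last centre with a forward neighbour
    return list(range(first, last + 1))

# A 5-frame sequence: the first and last frames lack one neighbour each.
print(valid_center_indices(5))
```

Restricting the split file to such indices, or clamping the neighbour index inside the loader, are two possible ways to avoid the out-of-range access.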
Thanks for your work. My question is: did you use the ViT pretrained weights? In your code, it seems that you use the AdaBins ViT pretrained weights.
Dear,
The code provided is excellent.
However, when I try to run the code you provided, the actual model training doesn't seem to work correctly. For example, the performance of the model displayed in TensorBoard is not satisfactory, and the model output during training is not displayed correctly. Additionally, when I train the model, the output is represented as a black visual screen.
I only changed the data path in your code. Why is this happening?
Unlike the previous issue, the code seems to run fine, but the training doesn't appear to be working correctly.
What do you think?
Lastly, if the input is 192x640, is it correct for the output to be 96x320?
Sincerely,
Hi,
I have noticed that in your code, pred_depth is directly set equal to pred_disp, and it is the same during training. This is very confusing to me because depth and disparity are completely different. How can they be equal directly?
I have checked other repositories, and I found that only your code is written this way. Does this mean you are estimating depth directly instead of disparity?
Thank you for your clarification.
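For contrast, repositories that predict disparity usually convert it to depth with a mapping like the sketch below, written in the style of Monodepth2's conversion. This is a scalar version for illustration, and the range values in the loop are made up. A network that outputs depth directly would skip this step, which is one possible reading of pred_depth being set to pred_disp.

```python
def disp_to_depth(disp, min_depth, max_depth):
    """Map a sigmoid output in [0, 1] to a depth in [min_depth, max_depth]."""
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return 1.0 / scaled_disp

# disp = 0 maps to the far limit, disp = 1 to the near limit.
for disp in (0.0, 0.5, 1.0):
    print(disp, disp_to_depth(disp, min_depth=0.1, max_depth=100.0))
```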
Nice work! But I got some problems when reproducing. This is my config:
--data_path ../raw_data/
--log_dir ./logdir/
--model_name res_640x192
--eval_split eigen
--backbone resnet_lite
--height 192
--width 640
--batch_size 16
--num_epochs 25
--scheduler_step_size 15
--num_layers 50
--num_features 256
--model_dim 32
--patch_size 16
--dim_out 64
--query_nums 64
--eval_mono
--post_process
--load_pretrained_model
--load_pt_folder ./pretrained/
--pretrained_pose
--pose_net_path ./pretrained/
I downloaded pretrained models (depth.pth, encoder.pth and pose.pth) in the "pretrained" folder and I used jpeg images. But I got some bad results like these. Could you help me...
Thank you for the great work.
Could you please specify the frames per second (FPS) of the inference code used?
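FPS figures depend heavily on how they are measured, so a reproducible timing loop helps when comparing numbers. The following is a minimal sketch; `measure_fps` and `fake_inference` are hypothetical names, and the sleep only stands in for a real forward pass.

```python
import time

def measure_fps(run_once, warmup=5, iters=50):
    """Average frames per second of run_once over iters timed calls."""
    for _ in range(warmup):          # untimed calls to let caches and kernels settle
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return iters / elapsed

def fake_inference():
    # Placeholder workload. On a GPU, run_once should block until the result
    # is ready (for PyTorch, by calling torch.cuda.synchronize()).
    time.sleep(0.001)

print(f"{measure_fps(fake_inference):.1f} FPS")
```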
Could you please provide the pretrained_pose weights? I used my own trained weights to visualize the warped image, and it should be similar to the source frame, but it is actually more similar to the reference frame. I would like to check whether this is due to a poorly trained PoseNet.
Nice work! But I am confused about some hyperparameters.
Your work is great. I have 2 questions for you.
I use the following command to train and obtain the model:
python train.py ./args_files/args_res50_kitti_192x640_train.txt
The first question is: which corresponding command should I use for evaluation?
Secondly, the encoder of this model occupies 950MB of storage. Are you sure the encoder is ResNet-50?
Hi!
I have a question regarding the evaluation results when using the provided pretrained weights. With the ResNet weights (320x1024 and 192x640), the numbers match those in the paper (for both eigen and eigen_benchmark evaluations). With the Effb5 and ConvNeXt weights (320x1024, both for KITTI), the numbers differ slightly from the paper, and the number of parameters for Effb5 seems to be 45M instead of 37M.
Thanks for your work. I would like to know whether you used '.png' images for training and validation. Thank you.
Thank you for making your code public !
I saw the results in Tables 1 and 2 of your paper. What is the difference between "KITTI eigen benchmark" and "KITTI-with-improved-groundtruth"?
What do you call the depths that are on the official KITTI website, and where can I find the other one?
When I trained with monocular video frames, I found that the training results varied greatly with the same settings. Is this normal? I also found that during training, the smooth loss easily became 0. Has anyone encountered such a problem?
Thank you for your code. What's the version of kornia? I cannot install kornia because of the conflict with pytorch.
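Thank you very much for open-sourcing this!
However, I need your help with some issues. When I evaluate with the Cityscapes pretrained weights you provided, the results differ greatly from those reported in your paper and GitHub repository. Also, is the resolution of the weights 640x192 rather than 512x196?
Evaluation command and parameters: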
python evaluate_res50_depth_cityscapes_config.py --eval_data_path /media/MHD/lj/datasets/cityscapes --dataset cityscapes_preprocessed --split cityscapes_preprocessed --eval_split cityscapes --height 192 --width 512 --model_dim 64 --patch_size 16 --query_nums 120 --min_depth 0.001 --max_depth 80.0 --dim_out 128 --eval_mono --load_weights_folder checkpoints/cityscapes_models
I used the parameters from the file args_res50_kitti_192x640_train to train on the Kitti dataset. The training proceeds normally, but when visualizing the output of the model, I found that the images are completely black when using Resnet_lite as the backbone, and green when using Resnet. Has anyone encountered a similar issue? How was it resolved?
What is the difference between your work and AdaBins?
What are SfMNeXt-indoor(78M), SfMNeXt-outdoor(78M), SfMNeXt-outdoor-cvnxt(242M), ZoeDepth(345M), ZoeDev and MidaS(345M) in the model type list of the online demo?
Nice work! Is the current code missing only the implementation of the FullQueryLayer class? If so, I can go ahead and fill in that part myself.
Hi, I wonder how many epochs are involved in the training process?
Hello, the network architecture diagram in your paper looks great. May I ask which software you used to draw it?
I have fully trained your model, and the gap to the model you provided is close to 33%. Could you please help me check my training file?
Below is the content of my training file:
--data_path /home/ubuntu/ubuntu_jixie/temp/kitti-raw
--dataset kitti
--model_name res_088
--backbone resnet_lite
--height 192
--width 640
--batch_size 16
--num_epochs 25
--scheduler_step_size 15
--num_layers 50
--num_features 256
--model_dim 32
--patch_size 16
--dim_out 64
--query_nums 64
--min_depth 0.001
--max_depth 80.0
--eval_mono
--post_process
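Since several posts on this page share args files and ask what is wrong with them, a small diff of two configurations can make the differences visible. The sketch below assumes the one-option-per-line "--key value" layout shown above; `parse_args_text` and `diff_options` are hypothetical helpers, not part of the repository.

```python
def parse_args_text(text):
    """Parse '--key value' lines into a dict; bare flags map to True."""
    options = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].startswith("--"):
            continue  # skip blank lines and anything that is not an option
        options[tokens[0][2:]] = tokens[1] if len(tokens) > 1 else True
    return options

def diff_options(a, b):
    """Keys whose values differ, or that exist in only one config."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# Shortened excerpts in the style of the configs posted on this page.
mine = parse_args_text("--backbone resnet_lite\n--query_nums 64\n--post_process")
other = parse_args_text("--backbone resnet_lite\n--query_nums 120\n--eval_mono")
print(diff_options(mine, other))
```

Running the same comparison against the args file shipped with the repository would show which options a custom config changes or omits.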
I am using the KITTI (Efficient-b5) model and want to visualize the depth map
with this command:
python test_simple_SQL_config.py ./args_files/args_test_simple_kitti_320x1024.txt
How should I modify the parameters? I tried several parameters but got poor results.
I'm looking forward to your reply.
Hello! My question is about the smoothness loss.
This version of the code shows that the output of the model is depth, and you still use 'outputs["disp", 0] = pred' to store the predicted depth. That is OK. So you changed the behavior in the generate_images_pred function.
But the smoothness loss still seems to be computed on depth. Why didn't you use the inverse depth, i.e. the true disparity, as Monodepth2 did? Will the results be the same?
Thanks!
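A small numeric check suggests the two choices are not interchangeable in general. The sketch below is a 1D, pure-Python version of a mean-normalized, edge-aware smoothness term in the style of Monodepth2; `smoothness_1d` and the sample values are illustrative assumptions, not the repository's loss.

```python
import math

def smoothness_1d(values, image):
    """Mean-normalized, edge-aware first-order smoothness of a 1D signal."""
    mean = sum(values) / len(values)
    norm = [v / (mean + 1e-7) for v in values]
    terms = []
    for i in range(len(values) - 1):
        grad = abs(norm[i] - norm[i + 1])       # gradient of the normalized signal
        edge = abs(image[i] - image[i + 1])     # image gradient down-weights edges
        terms.append(grad * math.exp(-edge))
    return sum(terms) / len(terms)

depth = [1.0, 2.0, 10.0]
inverse_depth = [1.0 / d for d in depth]
image = [0.5, 0.5, 0.5]  # flat image, so the edge weight is 1 everywhere
print(smoothness_1d(depth, image), smoothness_1d(inverse_depth, image))
```

The two values differ for this input, so the penalty and its gradients are distributed differently between near and far regions. Whether that changes the final metrics is an empirical question this sketch does not answer.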
We train the network with
CUDA_VISIBLE_DEVICES=1 python train.py ./args_files/args_res50_kitti_192x640_train.txt
and test with
CUDA_VISIBLE_DEVICES=1 python evaluate_depth_config.py args_files/hisfog/kitti/resnet_320x1024.txt
CUDA_VISIBLE_DEVICES=1 python evaluate_depth_config.py args_files/hisfog/kitti/resnet_192x640.txt
but there is a problem:
File "/home/yx/miniconda3/envs/ekf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResnetEncoderDecoder:
Missing key(s) in state_dict: "encoder.encoder.conv1.weight", "encoder.encoder.bn1.weight", "encoder.encoder.bn1.bias", "encoder.encoder.bn1.running_mean", "encoder.encoder.bn1.running_var", "encoder.encoder.layer1.0.conv1.weight", "encoder.encoder.layer1.0.bn1.weight", "encoder.encoder.layer1.0.bn1.bias", "encoder.encoder.layer1.0.bn1.running_mean", "encoder.encoder.layer1.0.bn1.running_var", "encoder.encoder.layer1.0.conv2.weight", "encoder.encoder.layer1.0.bn2.weight", "encoder.encoder.layer1.0.bn2.bias", "encoder.encoder.layer1.0.bn2.running_mean", "encoder.encoder.layer1.0.bn2.running_var", "encoder.encoder.layer1.0.conv3.weight", "encoder.encoder.layer1.0.bn3.weight", "encoder.encoder.layer1.0.bn3.bias", "encoder.encoder.layer1.0.bn3.running_mean", "encoder.encoder.layer1.0.bn3.running_var", "encoder.encoder.layer1.0.downsample.0.weight", "encoder.encoder.layer1.0.downsample.1.weight", "encoder.encoder.layer1.0.downsample.1.bias", "encoder.encoder.layer1.0.downsample.1.running_mean", "encoder.encoder.layer1.0.downsample.1.running_var", "encoder.encoder.layer1.1.conv1.weight", "encoder.encoder.layer1.1.bn1.weight", "encoder.encoder.layer1.1.bn1.bias", "encoder.encoder.layer1.1.bn1.running_mean", "encoder.encoder.layer1.1.bn1.running_var", "encoder.encoder.layer1.1.conv2.weight", "encoder.encoder.layer1.1.bn2.weight", "encoder.encoder.layer1.1.bn2.bias", "encoder.encoder.layer1.1.bn2.running_mean", "encoder.encoder.layer1.1.bn2.running_var", "encoder.encoder.layer1.1.conv3.weight", "encoder.encoder.layer1.1.bn3.weight", "encoder.encoder.layer1.1.bn3.bias", "encoder.encoder.layer1.1.bn3.running_mean", "encoder.encoder.layer1.1.bn3.running_var", "encoder.encoder.layer1.2.conv1.weight", "encoder.encoder.layer1.2.bn1.weight", "encoder.encoder.layer1.2.bn1.bias", "encoder.encoder.layer1.2.bn1.running_mean", "encoder.encoder.layer1.2.bn1.running_var", "encoder.encoder.layer1.2.conv2.weight", "encoder.encoder.layer1.2.bn2.weight", "encoder.encoder.layer1.2.bn2.bias", 
"encoder.encoder.layer1.2.bn2.running_mean", "encoder.encoder.layer1.2.bn2.running_var", "encoder.encoder.layer1.2.conv3.weight", "encoder.encoder.layer1.2.bn3.weight", "encoder.encoder.layer1.2.bn3.bias", "encoder.encoder.layer1.2.bn3.running_mean", "encoder.encoder.layer1.2.bn3.running_var", "encoder.encoder.layer2.0.conv1.weight", "encoder.encoder.layer2.0.bn1.weight", "encoder.encoder.layer2.0.bn1.bias", "encoder.encoder.layer2.0.bn1.running_mean", "encoder.encoder.layer2.0.bn1.running_var", "encoder.encoder.layer2.0.conv2.weight", "encoder.encoder.layer2.0.bn2.weight", "encoder.encoder.layer2.0.bn2.bias", "encoder.encoder.layer2.0.bn2.running_mean", "encoder.encoder.layer2.0.bn2.running_var", "encoder.encoder.layer2.0.conv3.weight", "encoder.encoder.layer2.0.bn3.weight", "encoder.encoder.layer2.0.bn3.bias", "encoder.encoder.layer2.0.bn3.running_mean", "encoder.encoder.layer2.0.bn3.running_var", "encoder.encoder.layer2.0.downsample.0.weight", "encoder.encoder.layer2.0.downsample.1.weight", "encoder.encoder.layer2.0.downsample.1.bias", "encoder.encoder.layer2.0.downsample.1.running_mean", "encoder.encoder.layer2.0.downsample.1.running_var", "encoder.encoder.layer2.1.conv1.weight", "encoder.encoder.layer2.1.bn1.weight", "encoder.encoder.layer2.1.bn1.bias", "encoder.encoder.layer2.1.bn1.running_mean", "encoder.encoder.layer2.1.bn1.running_var", "encoder.encoder.layer2.1.conv2.weight", "encoder.encoder.layer2.1.bn2.weight", "encoder.encoder.layer2.1.bn2.bias", "encoder.encoder.layer2.1.bn2.running_mean", "encoder.encoder.layer2.1.bn2.running_var", "encoder.encoder.layer2.1.conv3.weight", "encoder.encoder.layer2.1.bn3.weight", "encoder.encoder.layer2.1.bn3.bias", "encoder.encoder.layer2.1.bn3.running_mean", "encoder.encoder.layer2.1.bn3.running_var", "encoder.encoder.layer2.2.conv1.weight", "encoder.encoder.layer2.2.bn1.weight", "encoder.encoder.layer2.2.bn1.bias", "encoder.encoder.layer2.2.bn1.running_mean", "encoder.encoder.layer2.2.bn1.running_var", 
"encoder.encoder.layer2.2.conv2.weight", "encoder.encoder.layer2.2.bn2.weight", "encoder.encoder.layer2.2.bn2.bias", "encoder.encoder.layer2.2.bn2.running_mean", "encoder.encoder.layer2.2.bn2.running_var", "encoder.encoder.layer2.2.conv3.weight", "encoder.encoder.layer2.2.bn3.weight", "encoder.encoder.layer2.2.bn3.bias", "encoder.encoder.layer2.2.bn3.running_mean", "encoder.encoder.layer2.2.bn3.running_var", "encoder.encoder.layer2.3.conv1.weight", "encoder.encoder.layer2.3.bn1.weight", "encoder.encoder.layer2.3.bn1.bias", "encoder.encoder.layer2.3.bn1.running_mean", "encoder.encoder.layer2.3.bn1.running_var", "encoder.encoder.layer2.3.conv2.weight", "encoder.encoder.layer2.3.bn2.weight", "encoder.encoder.layer2.3.bn2.bias", "encoder.encoder.layer2.3.bn2.running_mean", "encoder.encoder.layer2.3.bn2.running_var", "encoder.encoder.layer2.3.conv3.weight", "encoder.encoder.layer2.3.bn3.weight", "encoder.encoder.layer2.3.bn3.bias", "encoder.encoder.layer2.3.bn3.running_mean", "encoder.encoder.layer2.3.bn3.running_var", "encoder.encoder.layer3.0.conv1.weight", "encoder.encoder.layer3.0.bn1.weight", "encoder.encoder.layer3.0.bn1.bias", "encoder.encoder.layer3.0.bn1.running_mean", "encoder.encoder.layer3.0.bn1.running_var", "encoder.encoder.layer3.0.conv2.weight", "encoder.encoder.layer3.0.bn2.weight", "encoder.encoder.layer3.0.bn2.bias", "encoder.encoder.layer3.0.bn2.running_mean", "encoder.encoder.layer3.0.bn2.running_var", "encoder.encoder.layer3.0.conv3.weight", "encoder.encoder.layer3.0.bn3.weight", "encoder.encoder.layer3.0.bn3.bias", "encoder.encoder.layer3.0.bn3.running_mean", "encoder.encoder.layer3.0.bn3.running_var", "encoder.encoder.layer3.0.downsample.0.weight", "encoder.encoder.layer3.0.downsample.1.weight", "encoder.encoder.layer3.0.downsample.1.bias", "encoder.encoder.layer3.0.downsample.1.running_mean", "encoder.encoder.layer3.0.downsample.1.running_var", "encoder.encoder.layer3.1.conv1.weight", "encoder.encoder.layer3.1.bn1.weight", 
"encoder.encoder.layer3.1.bn1.bias", "encoder.encoder.layer3.1.bn1.running_mean", "encoder.encoder.layer3.1.bn1.running_var", "encoder.encoder.layer3.1.conv2.weight", "encoder.encoder.layer3.1.bn2.weight", "encoder.encoder.layer3.1.bn2.bias", "encoder.encoder.layer3.1.bn2.running_mean", "encoder.encoder.layer3.1.bn2.running_var", "encoder.encoder.layer3.1.conv3.weight", "encoder.encoder.layer3.1.bn3.weight", "encoder.encoder.layer3.1.bn3.bias", "encoder.encoder.layer3.1.bn3.running_mean", "encoder.encoder.layer3.1.bn3.running_var", "encoder.encoder.layer3.2.conv1.weight", "encoder.encoder.layer3.2.bn1.weight", "encoder.encoder.layer3.2.bn1.bias", "encoder.encoder.layer3.2.bn1.running_mean", "encoder.encoder.layer3.2.bn1.running_var", "encoder.encoder.layer3.2.conv2.weight", "encoder.encoder.layer3.2.bn2.weight", "encoder.encoder.layer3.2.bn2.bias", "encoder.encoder.layer3.2.bn2.running_mean", "encoder.encoder.layer3.2.bn2.running_var", "encoder.encoder.layer3.2.conv3.weight", "encoder.encoder.layer3.2.bn3.weight", "encoder.encoder.layer3.2.bn3.bias", "encoder.encoder.layer3.2.bn3.running_mean", "encoder.encoder.layer3.2.bn3.running_var", "encoder.encoder.layer3.3.conv1.weight", "encoder.encoder.layer3.3.bn1.weight", "encoder.encoder.layer3.3.bn1.bias", "encoder.encoder.layer3.3.bn1.running_mean", "encoder.encoder.layer3.3.bn1.running_var", "encoder.encoder.layer3.3.conv2.weight", "encoder.encoder.layer3.3.bn2.weight", "encoder.encoder.layer3.3.bn2.bias", "encoder.encoder.layer3.3.bn2.running_mean", "encoder.encoder.layer3.3.bn2.running_var", "encoder.encoder.layer3.3.conv3.weight", "encoder.encoder.layer3.3.bn3.weight", "encoder.encoder.layer3.3.bn3.bias", "encoder.encoder.layer3.3.bn3.running_mean", "encoder.encoder.layer3.3.bn3.running_var", "encoder.encoder.layer3.4.conv1.weight", "encoder.encoder.layer3.4.bn1.weight", "encoder.encoder.layer3.4.bn1.bias", "encoder.encoder.layer3.4.bn1.running_mean", "encoder.encoder.layer3.4.bn1.running_var", 
"encoder.encoder.layer3.4.conv2.weight", "encoder.encoder.layer3.4.bn2.weight", "encoder.encoder.layer3.4.bn2.bias", "encoder.encoder.layer3.4.bn2.running_mean", "encoder.encoder.layer3.4.bn2.running_var", "encoder.encoder.layer3.4.conv3.weight", "encoder.encoder.layer3.4.bn3.weight", "encoder.encoder.layer3.4.bn3.bias", "encoder.encoder.layer3.4.bn3.running_mean", "encoder.encoder.layer3.4.bn3.running_var", "encoder.encoder.layer3.5.conv1.weight", "encoder.encoder.layer3.5.bn1.weight", "encoder.encoder.layer3.5.bn1.bias", "encoder.encoder.layer3.5.bn1.running_mean", "encoder.encoder.layer3.5.bn1.running_var", "encoder.encoder.layer3.5.conv2.weight", "encoder.encoder.layer3.5.bn2.weight", "encoder.encoder.layer3.5.bn2.bias", "encoder.encoder.layer3.5.bn2.running_mean", "encoder.encoder.layer3.5.bn2.running_var", "encoder.encoder.layer3.5.conv3.weight", "encoder.encoder.layer3.5.bn3.weight", "encoder.encoder.layer3.5.bn3.bias", "encoder.encoder.layer3.5.bn3.running_mean", "encoder.encoder.layer3.5.bn3.running_var", "encoder.encoder.layer4.0.conv1.weight", "encoder.encoder.layer4.0.bn1.weight", "encoder.encoder.layer4.0.bn1.bias", "encoder.encoder.layer4.0.bn1.running_mean", "encoder.encoder.layer4.0.bn1.running_var", "encoder.encoder.layer4.0.conv2.weight", "encoder.encoder.layer4.0.bn2.weight", "encoder.encoder.layer4.0.bn2.bias", "encoder.encoder.layer4.0.bn2.running_mean", "encoder.encoder.layer4.0.bn2.running_var", "encoder.encoder.layer4.0.conv3.weight", "encoder.encoder.layer4.0.bn3.weight", "encoder.encoder.layer4.0.bn3.bias", "encoder.encoder.layer4.0.bn3.running_mean", "encoder.encoder.layer4.0.bn3.running_var", "encoder.encoder.layer4.0.downsample.0.weight", "encoder.encoder.layer4.0.downsample.1.weight", "encoder.encoder.layer4.0.downsample.1.bias", "encoder.encoder.layer4.0.downsample.1.running_mean", "encoder.encoder.layer4.0.downsample.1.running_var", "encoder.encoder.layer4.1.conv1.weight", "encoder.encoder.layer4.1.bn1.weight", 
"encoder.encoder.layer4.1.bn1.bias", "encoder.encoder.layer4.1.bn1.running_mean", "encoder.encoder.layer4.1.bn1.running_var", "encoder.encoder.layer4.1.conv2.weight", "encoder.encoder.layer4.1.bn2.weight", "encoder.encoder.layer4.1.bn2.bias", "encoder.encoder.layer4.1.bn2.running_mean", "encoder.encoder.layer4.1.bn2.running_var", "encoder.encoder.layer4.1.conv3.weight", "encoder.encoder.layer4.1.bn3.weight", "encoder.encoder.layer4.1.bn3.bias", "encoder.encoder.layer4.1.bn3.running_mean", "encoder.encoder.layer4.1.bn3.running_var", "encoder.encoder.layer4.2.conv1.weight", "encoder.encoder.layer4.2.bn1.weight", "encoder.encoder.layer4.2.bn1.bias", "encoder.encoder.layer4.2.bn1.running_mean", "encoder.encoder.layer4.2.bn1.running_var", "encoder.encoder.layer4.2.conv2.weight", "encoder.encoder.layer4.2.bn2.weight", "encoder.encoder.layer4.2.bn2.bias", "encoder.encoder.layer4.2.bn2.running_mean", "encoder.encoder.layer4.2.bn2.running_var", "encoder.encoder.layer4.2.conv3.weight", "encoder.encoder.layer4.2.bn3.weight", "encoder.encoder.layer4.2.bn3.bias", "encoder.encoder.layer4.2.bn3.running_mean", "encoder.encoder.layer4.2.bn3.running_var", "encoder.encoder.fc.weight", "encoder.encoder.fc.bias", "decoder.conv2.weight", "decoder.conv2.bias", "decoder.up1._net.0.weight", "decoder.up1._net.0.bias", "decoder.up1._net.1.weight", "decoder.up1._net.1.bias", "decoder.up1._net.1.running_mean", "decoder.up1._net.1.running_var", "decoder.up1._net.3.weight", "decoder.up1._net.3.bias", "decoder.up1._net.4.weight", "decoder.up1._net.4.bias", "decoder.up1._net.4.running_mean", "decoder.up1._net.4.running_var", "decoder.up2._net.0.weight", "decoder.up2._net.0.bias", "decoder.up2._net.1.weight", "decoder.up2._net.1.bias", "decoder.up2._net.1.running_mean", "decoder.up2._net.1.running_var", "decoder.up2._net.3.weight", "decoder.up2._net.3.bias", "decoder.up2._net.4.weight", "decoder.up2._net.4.bias", "decoder.up2._net.4.running_mean", "decoder.up2._net.4.running_var", 
"decoder.up3._net.0.weight", "decoder.up3._net.0.bias", "decoder.up3._net.1.weight", "decoder.up3._net.1.bias", "decoder.up3._net.1.running_mean", "decoder.up3._net.1.running_var", "decoder.up3._net.3.weight", "decoder.up3._net.3.bias", "decoder.up3._net.4.weight", "decoder.up3._net.4.bias", "decoder.up3._net.4.running_mean", "decoder.up3._net.4.running_var", "decoder.up4._net.0.weight", "decoder.up4._net.0.bias", "decoder.up4._net.1.weight", "decoder.up4._net.1.bias", "decoder.up4._net.1.running_mean", "decoder.up4._net.1.running_var", "decoder.up4._net.3.weight", "decoder.up4._net.3.bias", "decoder.up4._net.4.weight", "decoder.up4._net.4.bias", "decoder.up4._net.4.running_mean", "decoder.up4._net.4.running_var", "decoder.conv3.weight", "decoder.conv3.bias".
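When every key is reported missing, comparing the key names in the checkpoint with those the model expects usually shows whether the names differ by a prefix or whether the wrong file was loaded. The following is a minimal sketch on plain lists of names. `compare_keys` is a hypothetical helper and the checkpoint keys are made up; "module." is the prefix added when a model is saved while wrapped in torch.nn.DataParallel.

```python
def compare_keys(model_keys, checkpoint_keys, strip_prefix="module."):
    """Return (missing, unexpected) key lists after stripping an optional prefix."""
    def clean(key):
        return key[len(strip_prefix):] if key.startswith(strip_prefix) else key

    model = set(model_keys)
    ckpt = {clean(k) for k in checkpoint_keys}
    return sorted(model - ckpt), sorted(ckpt - model)

# Two model keys taken from the error above; the checkpoint keys are invented.
model_keys = ["encoder.encoder.conv1.weight", "decoder.conv2.weight"]
checkpoint_keys = ["module.encoder.conv1.weight", "module.decoder.conv2.weight"]
missing, unexpected = compare_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

In practice the two lists would come from model.state_dict().keys() and from the keys of the loaded checkpoint dictionary.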
I am trying to run inference on some images using the KITTI (ConvNeXt-L) model linked in the README.
The model runs without error, but the outputs look like this:
I believe something is going wrong since this doesn't match the supposed results.
I read some of this repo's previous issues that had similar images, but didn't find any full configuration setups that fixed the issue.
I am running inference with this command from the README:
python test_simple_SQL_config.py ./args_files/args_test_simple_kitti_320x1024.txt
Where can I find a combination of:
/SfMNeXt-Impl-main/args_files
Thanks!
Hi!
When I attempt to load the pretrained weights you provided for EfficientNetB5, there appear to be some mismatches between the keys in the state_dict. Loading the weights was quite straightforward for ResNet50 and ConvNeXt, but this was not the case with EfficientNetB5.
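One way to narrow down a mismatch like this is to compare the two key sets directly. The sketch below is a minimal, hypothetical helper (the names `diff_state_dict_keys` and `strip_prefix` are not from the repository) that works on plain dictionaries; with PyTorch, the same calls would apply to `model.state_dict()` and the loaded checkpoint. The `module.` prefix in the example is only an assumed cause (it appears when weights are saved from a `DataParallel` wrapper), not a confirmed diagnosis for the EfficientNetB5 weights.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, both sorted.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not expect
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


def strip_prefix(state_dict, prefix):
    """Return a copy of state_dict with `prefix` removed from keys that start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Hypothetical example: the checkpoint keys carry an extra "module." prefix.
model_keys = ["encoder.conv.weight", "decoder.conv2.bias"]
checkpoint = {"module.encoder.conv.weight": 0, "module.decoder.conv2.bias": 0}

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint)
print("missing:", missing)
print("unexpected:", unexpected)

# After stripping the assumed prefix, the key sets line up.
print(diff_state_dict_keys(model_keys, strip_prefix(checkpoint, "module.")))
```

If the two lists differ by more than a fixed prefix, printing them side by side usually shows whether the checkpoint was built with a different backbone wrapper or layer naming.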
Is there any function to get the predicted metric depth from video or image? For my use case, I want metric depth in real time.
Hi, author, thanks for your remarkable work.
I attempted training with the following settings.
--dataset kitti
--eval_split eigen
--height 192
--width 640
--batch_size 16
--num_epochs 25
--model_dim 32
--patch_size 16
--query_nums 120
--scheduler_step_size 15
--eval_mono
--post_process
--min_depth 0.001
--max_depth 80.0
--backbone resnet18_lite
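For readers unfamiliar with this style of settings file, the following is a minimal sketch of how a file with one `--flag value` pair per line can be read with Python's standard `argparse`. It is an illustration only: the repository's own option parser may work differently, the class name `ArgsFileParser` is hypothetical, and only a subset of the flags listed above is declared.

```python
import argparse


class ArgsFileParser(argparse.ArgumentParser):
    """ArgumentParser that accepts '--flag value' on a single line of an args file."""

    def convert_arg_line_to_args(self, arg_line):
        # By default argparse treats each line as one argument;
        # splitting on whitespace allows "--height 192" on one line.
        return arg_line.split()


def build_parser():
    # "@" marks a command-line argument as the path to an args file.
    parser = ArgsFileParser(fromfile_prefix_chars="@")
    parser.add_argument("--dataset", type=str)
    parser.add_argument("--height", type=int)
    parser.add_argument("--width", type=int)
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--min_depth", type=float)
    parser.add_argument("--max_depth", type=float)
    parser.add_argument("--backbone", type=str)
    parser.add_argument("--eval_mono", action="store_true")
    parser.add_argument("--post_process", action="store_true")
    return parser


# Parsing the same values directly from a list of tokens.
opts = build_parser().parse_args(
    ["--dataset", "kitti", "--height", "192", "--width", "640",
     "--min_depth", "0.001", "--max_depth", "80.0", "--eval_mono"]
)
print(opts.height, opts.width, opts.max_depth, opts.eval_mono)
```

With this sketch, a settings file would be passed as `@path/to/args.txt`; the `@` prefix is a choice made here, not something the repository is known to use.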
As training progresses, the loss gradually decreases, and various metrics show improvement. However, the obtained disp maps look strange. Here are disp maps obtained at the seventh epoch.
It seems to render nearby objects in dark colors and distant objects in bright colors. Is this normal?
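A possible explanation, offered here only as an assumption, is that the saved map holds depth rather than disparity (inverse depth). The small sketch below uses made-up depth values and hypothetical helper names to show how the two encodings flip the brightness ordering once normalized to [0, 1] for display: depth makes near objects dark, inverse depth makes them bright.

```python
def normalize(values):
    """Linearly rescale values to [0, 1] (0 = darkest, 1 = brightest)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def depth_to_inverse_depth(depths):
    """Convert depths (in meters) to inverse depth; requires depths > 0."""
    return [1.0 / d for d in depths]


# Made-up depths for a near, a mid-range, and a far pixel (meters).
depths = [2.0, 10.0, 80.0]

as_depth = normalize(depths)
as_inverse = normalize(depth_to_inverse_depth(depths))

print("depth map brightness:        ", as_depth)    # near pixel is darkest
print("inverse-depth map brightness:", as_inverse)  # near pixel is brightest
```

If the visualization is meant to follow the usual disparity convention (near is bright), inverting the map before applying the color map would restore it, provided the assumption above holds.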
I trained this model with efficient_b5, as you advised (#27).
Of course, FPS has improved significantly.
About 7 FPS was achieved in the TensorRT environment!
I would be very happy with even a little more improvement. Can you give me some advice?
Thanks in advance.
Nice work!
gt_path = os.path.join(splits_dir, opt.eval_split, "gt_depths.npz")
in evaluate_depth_config.py corresponds to the path where the ground truth depth data is expected to be for evaluation. It does not directly correspond to the path of the raw data from the KITTI dataset, such as kitti_data/raw/2011_09_26/2011_09_26_drive_0002_sync/velodyne_points/data/0000000069.bin.
So how can I get gt_depths.npz?
Nice work! I would appreciate your guidance on the following two questions:
1. This is the result of my testing with the latest code using the 'kitti-resnet50-640*192' weights. How do you perceive the errors introduced by shadows?
2. What potential issues do you think might arise when using this depth estimation result for novel view synthesis? It seems that this adaptive binning approach is very friendly for NVS (novel view synthesis).
The layers module doesn't have these functions or classes: PixelWiseDotProduct_for_dense, PixelWiseDotProduct_for_summary, FullQueryLayer