hisfog / sfmnext-impl

[AAAI 2024] Official implementation of "SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation", and more.

License: MIT License

Python 99.96% Shell 0.04%

sfmnext-impl's People

aboywithlighters steven-xiong seoalexer gladcolor rohanhg91 avi9700 chyang0822 xiaoyaocoding jackyfriend aifeixingdelv flyinggh

sfmnext-impl's Issues

Train E5 at low resolution

How can I train a model on the KITTI dataset using E5 as the backbone? Can I train it at a lower resolution, such as 640x192?

I am very interested in your work! I ran your model with TensorRT in C++. The FPS is so low that I think I need to run the model in FP16 precision. Do you have a pretrained model trained in FP16 precision? Could I possibly get it?

Fine-tuning ConvNeXt-L on KITTI

Hello, and nice work! My question is how to fine-tune the model on KITTI.
I tried the script ./finetune/train_ft_SQLdepth.py but could not get good enough results: only abs_rel 0.0494 and RMSE 2.182.
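For reference when comparing numbers like abs_rel and RMSE, here is a minimal pure-Python sketch of the standard KITTI depth metrics, following the Monodepth2-style definitions. It is an illustration, not this repo's evaluation script, and it assumes the ground-truth and predicted depths are already masked to valid pixels and given in metres.

```python
import math


def compute_errors(gt, pred):
    """Standard monocular depth metrics over paired lists of valid depths.

    Assumes gt and pred are positive, already masked to valid pixels,
    and of equal length.
    """
    if len(gt) != len(pred) or not gt:
        raise ValueError("gt and pred must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(gt, pred))
    # Threshold accuracies: fraction of pixels with max(gt/pred, pred/gt) < 1.25^k
    ratios = [max(g / p, p / g) for g, p in pairs]
    a1 = sum(r < 1.25 for r in ratios) / n
    a2 = sum(r < 1.25 ** 2 for r in ratios) / n
    a3 = sum(r < 1.25 ** 3 for r in ratios) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n)
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse,
            "rmse_log": rmse_log, "a1": a1, "a2": a2, "a3": a3}
```

For example, `compute_errors([10.0, 20.0], [11.0, 18.0])` gives abs_rel 0.1 and RMSE sqrt(2.5). Note that abs_rel is a relative error while RMSE is in metres, so the two can move somewhat independently between runs.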

Reprojected image and auto-masked image

How can I view the reprojected image and the auto-masked image with test_simple.py?
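As background, in Monodepth2-style training (which this repo's options appear to follow) the auto-mask is not a separate network output: it is derived from per-pixel photometric losses. A pixel is kept only where the best warped (reprojected) source frame matches the target better than the best un-warped source frame. A pure-Python sketch, assuming the per-pixel losses are already available as H x W lists; the helper name is hypothetical and not part of test_simple.py:

```python
def automask(reproj_losses, identity_losses):
    """Per-pixel auto-mask in the style of Monodepth2.

    reproj_losses / identity_losses: one H x W nested list per source frame,
    holding the photometric error of the warped / un-warped source vs target.
    Returns (mask, min_loss): mask[y][x] is 1 where warping beats every
    un-warped source frame; min_loss is the per-pixel minimum used in training.
    """
    h, w = len(reproj_losses[0]), len(reproj_losses[0][0])
    mask = [[0] * w for _ in range(h)]
    min_loss = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_reproj = min(loss[y][x] for loss in reproj_losses)
            best_identity = min(loss[y][x] for loss in identity_losses)
            # Strict inequality: ties fall back to the identity loss (pixel masked out)
            mask[y][x] = 1 if best_reproj < best_identity else 0
            min_loss[y][x] = min(best_reproj, best_identity)
    return mask, min_loss
```

To visualize it, the mask (scaled to 0/255) and the warped source images would have to be saved from the training loop, since a single-image inference script has no neighbouring frames to warp.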

Question about the demo

May I ask which models or weights are used for ZoeDepth and MiDaS on your demo site?
Links would be appreciated. Thanks!

How to train the ConvNeXt-L model

The "Training KITTI" section of README.md only mentions python train.py ./args_files/args_res50_kitti_192x640_train.txt, and there seems to be no information about the ConvNeXt-L model in the args_files folder. We look forward to your answer. Thank you!

How to train a pre-trained model for my own network

Hello! I want to use your SQL in my own network, but when I train on KITTI the results are not good. So I am wondering whether I should first train my own SSL pre-trained model and then use it to train my model. I am very much looking forward to your reply! Thanks a lot!

After modifying the splits/eigen_zhou/train_files.txt, I reproduced the results of the paper

Thanks to the authors for their outstanding contribution. I have successfully reproduced the results of the paper. However, I have encountered new confusion.

At first, I failed to reproduce the results of the paper. However, I learned from the authors' comments that the previous version could reproduce them.

So I obtained the previous version by executing the git checkout 6a1e997 command.

Here are the experimental results of my ResNet50 model at a resolution of 192x640.

Compared to the new version, the difference in the previous version that affects the experimental results is the 'train_files.txt' file in 'eigen_zhou', which contains about 71k images. However, the training set described in the paper contains only 26k images.

In addition, I trained Monodepth2 on the previous train split (71k images) and achieved results similar to SQLdepth.
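To check which split a given checkout actually trains on, one can count the entries in splits/eigen_zhou/train_files.txt before and after `git checkout`. A small sketch, assuming the Monodepth2-style line format "folder frame_index side"; the helper name is hypothetical:

```python
from collections import Counter


def summarize_split(path):
    """Count samples, unique drive folders and camera sides in a split file.

    Each non-empty line is assumed to look like
    "2011_09_26/2011_09_26_drive_0022_sync 473 r" (folder, frame index, side).
    """
    drives = Counter()
    sides = Counter()
    samples = 0
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            samples += 1
            drives[parts[0]] += 1
            if len(parts) > 2:
                sides[parts[2]] += 1
    return {"samples": samples, "drives": len(drives), "sides": dict(sides)}
```

Running `summarize_split("splits/eigen_zhou/train_files.txt")` on both versions gives a direct comparison of split sizes. Each line is one training sample (a target frame plus its neighbours), so the count is of samples rather than of distinct image files.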

I am very confused by these results. My questions are:

Did the authors train on a set containing 71k images?
How should one interpret Monodepth2 achieving similar results when trained on the 71k-image set?
@hisfog
Looking forward to your reply. Thanks a lot!

Converting to ONNX

I am impressed by your work!
I want to convert the .pth files (ConvNeXt_Large_SQLdepth) to ONNX.
I succeeded in converting the encoder to ONNX, but unfortunately I failed to convert depth.pth.

Below is my error output:
/SfMNeXt-Impl/networks/layers.py:16: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert c == ck, "Number of channels in x and Embedding dimension (at dim 2) of K matrix must match"
Traceback (most recent call last):
File "export_to_onnx_depth.py", line 56, in
input_names=input_names_2, output_names=output_names_2, opset_version=11)
File "/usr/local/lib/python3.6/dist-packages/torch/onnx/__init__.py", line 28, in _export
result = utils._export(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 701, in _export
dynamic_axes=dynamic_axes)
File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 459, in _model_to_graph
use_new_jit_passes)
File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 420, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 380, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 1139, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 130, in forward
self._force_outplace,
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 119, in wrapper
out_vars, _ = _flatten(outs)
RuntimeError: Only tuples, lists and Variables are supported as JIT inputs/outputs. Dictionaries and strings are also accepted, but their usage is not recommended. Here, received an input of unsupported type: int

I still couldn't find where the int type appears in the inputs or outputs.
Could I get some help?

Question when running test_simple_SQL_config.py

Hello, I have a question: why are PatchTransformerEncoder and PixelWiseDotProduct missing from the "layers.py" file?

Question about the slice of the self-cost volume

Thanks for your nice work!
According to the description in Section 4.2 (Self Query Layer), each slice represents the relative distance between pixels and objects. So why does a bright-dark-bright (near-far-near) pattern occur, as in the image in the second row of the first column?
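The assertion quoted from networks/layers.py elsewhere in this issue list ("Number of channels in x and Embedding dimension (at dim 2) of K matrix must match") suggests the self-cost volume is an AdaBins-style pixel-wise dot product between per-pixel features and a set of query vectors. Under that assumption, each slice is a similarity score rather than a metric distance, so it need not vary monotonically with depth. A minimal pure-Python sketch of such a dot product (not the repo's implementation):

```python
def pixelwise_dot_product(x, K):
    """Self-cost volume sketch: out[q][i][j] = sum_c x[c][i][j] * K[q][c].

    x: per-pixel features as a C x H x W nested list.
    K: Q x C query embeddings. Returns a Q x H x W volume, one slice per query.
    """
    channels = len(x)
    if any(len(k) != channels for k in K):
        raise ValueError("Number of channels in x and embedding dimension of K must match")
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(x[ch][i][j] * k[ch] for ch in range(channels)) for j in range(w)]
             for i in range(h)]
            for k in K]
```

Because a slice is large wherever a pixel's feature aligns with that query, two distant regions (for example, near road and near building) can both score highly on one query while a region in between does not.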

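About the Hugging Face Hub connection timeout

Hello, when running python train.py ./args_files/args_res50_kitti_192x640_train.txt, I get a Hugging Face Hub connection timeout error. How can I solve this? Alternatively, can I download the model weight file manually and place it where the script expects it? Could you tell me the exact file name and location?
The error is as follows: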
(/data/ccy/env/SfMNeXt) ccy@kxB1:/data/ccy/project/SfMNeXt-Impl-main$ python train.py ./args_files/args_res50_kitti_192x640_train.txt
/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 10020). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connection.py", line 203, in _new_conn
sock = connection.create_connection(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
socket.timeout: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 790, in urlopen
response = self._make_request(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 491, in _make_request
raise new_e
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1096, in _validate_conn
conn.connect()
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connection.py", line 611, in connect
self.sock = sock = self._new_conn()
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connection.py", line 212, in _new_conn
raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7f14bbdeda90>, 'Connection to huggingface.co timed out. (connect timeout=10)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 844, in urlopen
retries = retries.increment(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /timm/convnext_large.fb_in22k_ft_in1k/resolve/main/pytorch_model.bin (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f14bbdeda90>, 'Connection to huggingface.co timed out. (connect timeout=10)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1247, in hf_hub_download
metadata = get_hf_file_metadata(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1624, in get_hf_file_metadata
r = _request_wrapper(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 402, in _request_wrapper
response = _request_wrapper(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 425, in _request_wrapper
response = get_session().request(method=method, url=url, **params)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/utils/_http.py", line 63, in send
return super().send(request, *args, **kwargs)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/requests/adapters.py", line 507, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /timm/convnext_large.fb_in22k_ft_in1k/resolve/main/pytorch_model.bin (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f14bbdeda90>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: ecf9a333-7b8b-4830-bc46-02241ea337ff)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/ccy/project/SfMNeXt-Impl-main/train.py", line 22, in
trainer = Trainer(opts)
File "/data/ccy/project/SfMNeXt-Impl-main/trainer.py", line 64, in __init__
self.models["encoder"] = networks.Unet(pretrained=(not self.opt.load_pretrained_model), backbone=self.opt.backbone, in_channels=3, num_classes=self.opt.model_dim, decoder_channels=self.opt.dec_channels)
File "/data/ccy/project/SfMNeXt-Impl-main/networks/Unet.py", line 114, in __init__
encoder = create_model(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/timm/models/_factory.py", line 117, in create_model
model = create_fn(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/timm/models/convnext.py", line 986, in convnext_large
model = _create_convnext('convnext_large', pretrained=pretrained, **dict(model_args, **kwargs))
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/timm/models/convnext.py", line 486, in _create_convnext
model = build_model_with_cfg(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/timm/models/_builder.py", line 397, in build_model_with_cfg
load_pretrained(
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/timm/models/_builder.py", line 190, in load_pretrained
state_dict = load_state_dict_from_hf(pretrained_loc)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/timm/models/_hub.py", line 188, in load_state_dict_from_hf
cached_file = hf_hub_download(hf_model_id, filename=filename, revision=hf_revision)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/data/ccy/env/SfMNeXt/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1377, in hf_hub_download
raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
Looking forward to your reply, thank you~
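Judging from the traceback, trainer.py builds the encoder with `pretrained=(not self.opt.load_pretrained_model)`, and timm then tries to fetch `pytorch_model.bin` for `timm/convnext_large.fb_in22k_ft_in1k` from huggingface.co. Passing `--load_pretrained_model` would presumably set `pretrained=False` and skip this download, but the model weights would then have to come from a local folder. If a reachable mirror exists, `huggingface_hub` also reads the `HF_ENDPOINT` environment variable. The hypothetical helpers below only rebuild the URL shown in the traceback (for downloading on another machine) and the folder name `huggingface_hub` uses in its cache; how a manually placed file is picked up depends on the timm and huggingface_hub versions, so check their documentation.

```python
def hf_file_url(repo_id, filename, revision="main", endpoint="https://huggingface.co"):
    """Direct download URL for a file in a Hugging Face model repo."""
    return f"{endpoint.rstrip('/')}/{repo_id}/resolve/{revision}/{filename}"


def hf_cache_folder_name(repo_id):
    """Name of the repo's folder inside the huggingface_hub cache directory."""
    return "models--" + repo_id.replace("/", "--")
```

For this traceback, `hf_file_url("timm/convnext_large.fb_in22k_ft_in1k", "pytorch_model.bin")` reproduces the URL that timed out.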

Minor bug: data going out of bounds

There is a minor bug in mono_dataset.py: because frame_idxs is [0, -1, 1], data can go out of bounds when accessing __getitem__.
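Monodepth2-style split files usually avoid this by not listing frames at the very start or end of a sequence. If a custom split does include them, one option is to keep only indices whose temporal neighbours all exist. A minimal sketch with a hypothetical helper name, not the repo's fix:

```python
def valid_center_indices(num_frames, frame_idxs=(0, -1, 1)):
    """Frame indices i for which every i + offset lies in [0, num_frames).

    Useful when each dataset item also loads temporal neighbours,
    e.g. frame_idxs = [0, -1, 1].
    """
    lo = -min(frame_idxs)               # first index whose earliest neighbour is >= 0
    hi = num_frames - max(frame_idxs)   # exclusive upper bound for the latest neighbour
    return list(range(max(lo, 0), max(hi, 0)))
```

With 5 frames and offsets [0, -1, 1], only indices 1, 2 and 3 are safe; the first and last frames are dropped.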

A question about pretrained weights

Thanks for your work. My question is: did you use ViT pretrained weights? In your code, it seems that you use the AdaBins ViT pretrained weights.

Training problem

Dear,

The code provided is excellent.

However, when I run the code you provided, model training doesn't seem to work correctly. For example, the model's performance shown in TensorBoard is unsatisfactory, and the model output during training is not displayed correctly: it appears as a black image.

I only changed the data path in your code. Why is this happening?

Unlike the previous issue, the code seems to run fine, but the training doesn't appear to be working correctly.

What do you think?

Lastly, if the input is 192x640, is it correct for the output to be 96x320?

Sincerely,

Question about pred_depth and pred_disp in the code

Hi,

I have noticed that in your code pred_depth is set directly equal to pred_disp, and the same holds during training. This is very confusing to me, because depth and disparity are completely different. How can they be equal?

I have checked other repositories and found that only your code is written this way. Does this mean you are estimating depth directly instead of disparity?

Thank you for your clarification.
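For comparison, Monodepth2 treats the network's sigmoid output as a normalized disparity and converts it to depth with `disp_to_depth`, bounded by min_depth and max_depth (defaults 0.1 and 100). Whether SQLdepth outputs depth directly is for the authors to confirm; this is only a pure-Python transcription of the Monodepth2 conversion:

```python
def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    """Monodepth2-style conversion of a sigmoid output in [0, 1] to depth.

    The output is rescaled to a disparity in [1/max_depth, 1/min_depth];
    depth is its reciprocal, so disp=0 -> max_depth and disp=1 -> min_depth.
    """
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return scaled_disp, 1.0 / scaled_disp
```

If a network regresses depth directly, this step disappears and the variable named "disp" simply holds depth. Code that assumes disparity (for example a smoothness term) then regularizes a different quantity.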

I got bad results when reproducing

Nice work! But I ran into some problems when reproducing the results. This is my config:
--data_path ../raw_data/
--log_dir ./logdir/
--model_name res_640x192
--eval_split eigen
--backbone resnet_lite
--height 192
--width 640
--batch_size 16
--num_epochs 25
--scheduler_step_size 15
--num_layers 50
--num_features 256
--model_dim 32
--patch_size 16
--dim_out 64
--query_nums 64
--eval_mono
--post_process
--load_pretrained_model
--load_pt_folder ./pretrained/
--pretrained_pose
--pose_net_path ./pretrained/

I downloaded the pretrained models (depth.pth, encoder.pth and pose.pth) into the "pretrained" folder and used JPEG images, but I got bad results like these. Could you help me?

FPS

Thank you for the great work.

Could you please specify the frames per second (FPS) of the inference code used?
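When comparing FPS numbers, it helps to time inference the same way: discard a few warm-up calls, then average over many runs. A hypothetical stdlib-only helper follows. When timing GPU code, pass a synchronization function such as `torch.cuda.synchronize` as `sync`, because CUDA kernels launch asynchronously and the timer would otherwise stop early.

```python
import time


def measure_fps(infer, inputs, warmup=5, runs=50, sync=None):
    """Average frames per second of `infer(inputs)` over `runs` timed calls.

    `warmup` untimed calls run first; `sync` (optional) is called after
    every call so asynchronous work is finished before the clock is read.
    """
    for _ in range(warmup):
        infer(inputs)
        if sync is not None:
            sync()
    start = time.perf_counter()
    for _ in range(runs):
        infer(inputs)
        if sync is not None:
            sync()
    elapsed = time.perf_counter() - start
    return runs / elapsed
```

The result depends on input resolution, backbone, precision and hardware, so an FPS figure is only meaningful together with those settings.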

Pretrained_pose weights

Could you please provide the pretrained_pose weights? I used my own trained weights to visualize the warped image. It should be similar to the source frame, but it is actually more similar to the reference frame. I would like to check whether this might be due to a poorly trained PoseNet.
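One thing worth checking besides PoseNet quality is the direction of the transform. In Monodepth2, whose option names this repo appears to follow, the pose predicted for a previous frame (negative frame offset) is inverted before warping (`invert=(f_i < 0)` in its trainer). If that inversion is applied the wrong way round, the warp moves pixels in the opposite direction. A pure-Python sketch of the axis-angle-to-matrix conversion (Rodrigues' formula) and its inverse, with hypothetical helper names:

```python
import math


def rotation_from_axisangle(v):
    """Rodrigues' formula: 3x3 rotation for axis-angle vector v (angle = |v|)."""
    angle = math.sqrt(sum(comp * comp for comp in v))
    if angle < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (comp / angle for comp in v)  # unit rotation axis
    c, s, C = math.cos(angle), math.sin(angle), 1.0 - math.cos(angle)
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]


def pose_matrix(axisangle, translation, invert=False):
    """4x4 rigid transform [R | t]; invert=True returns [R^T | -R^T t]."""
    R = rotation_from_axisangle(axisangle)
    t = list(translation)
    if invert:
        R_T = [[R[j][i] for j in range(3)] for i in range(3)]
        t_inv = [-sum(R_T[i][k] * t[k] for k in range(3)) for i in range(3)]
        R, t = R_T, t_inv
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]
```

Multiplying `pose_matrix(v, t)` by `pose_matrix(v, t, invert=True)` should give the identity, which is a quick sanity check before blaming the trained weights.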

Confusion about some hyper-parameters

Nice work! But I am confused about some hyper-parameters.

About the training commands you provided

Your work is great. I have two questions.
I used the following command to train a model:
python train.py ./args_files/args_res50_kitti_192x640_train.txt

First, which corresponding command should I use for evaluation?
Second, the encoder of this model takes up 950 MB of storage. Are you sure it is a ResNet-50 encoder?
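A rough size check can help here: a state_dict stored in FP32 takes about 4 bytes per value. The helpers below are hypothetical estimates only; real files also carry key names, buffers (such as BatchNorm running statistics) and sometimes optimizer state.

```python
def checkpoint_size_mb(num_params, bytes_per_param=4):
    """Approximate size in MiB of a state_dict holding `num_params` values."""
    return num_params * bytes_per_param / (1024 ** 2)


def params_from_size_mb(size_mb, bytes_per_param=4):
    """Inverse estimate: how many values a file of `size_mb` MiB could hold."""
    return size_mb * (1024 ** 2) / bytes_per_param
```

torchvision's ResNet-50 has 25,557,032 parameters, about 97.5 MiB in FP32, while 950 MB corresponds to roughly 249M FP32 values. A file that large may therefore contain a larger backbone, a decoder, or extra state beyond a plain ResNet-50 encoder; inspecting its keys would confirm which.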

Evaluation results with the pretrained weights

Hi!

I have a question about the evaluation results with the provided pretrained weights. With the ResNet weights (320x1024 and 192x640), the numbers match the paper (for both eigen and eigen_benchmark evaluations). With the Effb5 and ConvNeXt weights (320x1024, both for KITTI), however, the numbers differ slightly from the paper, and the parameter count for Effb5 appears to be 45M instead of 37M.

Hello, did you use PNG or JPEG images for training and validation?

Thanks for your work. I would like to know whether you used '.png' images for training and validation. Thank you.

Difference between KITTI and KITTI-with-improved-GT

Thank you for making your code public!
I saw the results in Tables 1 and 2 of your paper. What is the difference between "KITTI eigen benchmark" and "KITTI-with-improved-groundtruth"?
What are the depths on the official KITTI website called, and where can I find the other one?

Training results vary widely

When I trained with monocular video frames, I found that the results varied greatly between runs with the same settings. Is this normal? I also found that the smoothness loss easily dropped to 0 during training. Has anyone encountered this problem?

The version of kornia

Thank you for your code. Which version of kornia do you use? I cannot install kornia because of a conflict with PyTorch.

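Cityscapes dataset problem?

Thank you very much for open-sourcing this!!!
However, I need your help with some problems. When I evaluate with the Cityscapes pretrained weights you provided, the results differ greatly from those in your paper and GitHub repository. Also, the resolution of the weight files is 640x192, not 512x196?
Evaluation code and arguments: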
python evaluate_res50_depth_cityscapes_config.py --eval_data_path /media/MHD/lj/datasets/cityscapes --dataset cityscapes_preprocessed --split cityscapes_preprocessed --eval_split cityscapes --height 192 --width 512 --model_dim 64 --patch_size 16 --query_nums 120 --min_depth 0.001 --max_depth 80.0 --dim_out 128 --eval_mono --load_weights_folder checkpoints/cityscapes_models

Training was unable to achieve the expected results

I used the parameters from args_res50_kitti_192x640_train to train on the KITTI dataset. Training proceeds normally, but when visualizing the model output I found that the images are completely black when using resnet_lite as the backbone, and green when using ResNet. Has anyone encountered a similar issue? How was it resolved?
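One common cause of all-black previews, offered here as an assumption rather than a diagnosis of this repo, is writing raw float depth or disparity directly to an 8-bit image, so that values in [0, 1] all round to 0. Monodepth2's test_simple normalizes disparity using its 95th percentile as the upper bound before colour-mapping. A stdlib-only sketch of that kind of normalization, with a hypothetical helper name:

```python
def to_uint8_image(values, vmin=None, vmax_percentile=95):
    """Map an H x W nested list of depth/disparity values to 0..255 ints.

    Values are min-max normalised with a robust upper bound (the given
    percentile, nearest-rank), so a few large values do not squash the rest.
    """
    flat = sorted(v for row in values for v in row)
    lo = flat[0] if vmin is None else vmin
    k = min(len(flat) - 1, int(round(vmax_percentile / 100 * (len(flat) - 1))))
    hi = flat[k]
    scale = 255.0 / (hi - lo) if hi > lo else 0.0  # constant map -> all zeros
    return [[int(min(255, max(0, round((v - lo) * scale)))) for v in row]
            for row in values]
```

A preview that stays black after normalization means the prediction itself is (nearly) constant, which points to a training problem rather than a display problem.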

About AdaBins

What is the difference between your work and AdaBins?

Online demo question

What are SfMNeXt-indoor (78M), SfMNeXt-outdoor (78M), SfMNeXt-outdoor-cvnxt (242M), ZoeDepth (345M), ZoeDev and MiDaS (345M) in the model type list of the online demo?

The dictionary keys do not match

Nice work!
The keys in encoder_dict do not correspond to the keys in the provided pretrained weights for KITTI (ResNet50, 640x192).
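A quick way to see how the two key sets differ is to compare them and look for a consistent prefix, such as a wrapper's "module." or an extra "encoder.". A hypothetical stdlib-only helper; with PyTorch one would pass `model.state_dict().keys()` and the keys of `torch.load(path, map_location="cpu")`, assuming the file holds a plain state_dict:

```python
from collections import Counter


def diff_state_dict_keys(model_keys, ckpt_keys):
    """Compare model and checkpoint key sets.

    Returns (missing, unexpected, prefix_hints). A hint ("add", p) means
    model key == p + checkpoint key; ("strip", p) means checkpoint key == p + model key.
    """
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    hints = Counter()
    for m in missing:
        for u in unexpected:
            if m.endswith("." + u):
                hints[("add", m[: len(m) - len(u)])] += 1
            elif u.endswith("." + m):
                hints[("strip", u[: len(u) - len(m)])] += 1
    return missing, unexpected, hints.most_common()
```

If no consistent prefix shows up, the checkpoint most likely belongs to a different architecture or configuration than the one being built.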

Is only FullQueryLayer missing?

Nice work! Is the implementation of the FullQueryLayer class the only thing missing from the current code? If so, I can fill in that part myself.

Details about training scheme

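Hi~ I wonder how many epochs are used in the training process.

What software did you use to draw your network architecture diagram?

Hello, the network architecture diagrams in your paper look great. What software did you use to draw them?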

The training results differ from the paper by 33%

I trained your model to completion, and the results differ from those of the model you provided by nearly 33%. Could you please help me check my training file?

Below is the content of my training file:

--data_path /home/ubuntu/ubuntu_jixie/temp/kitti-raw
--dataset kitti
--model_name res_088
--backbone resnet_lite
--height 192
--width 640
--batch_size 16
--num_epochs 25
--scheduler_step_size 15
--num_layers 50
--num_features 256
--model_dim 32
--patch_size 16
--dim_out 64
--query_nums 64
--min_depth 0.001
--max_depth 80.0
--eval_mono
--post_process

Visualize depth map using the trained model

I am using the KITTI (Efficient-b5) model and want to visualize the depth map
with this command:
python test_simple_SQL_config.py ./args_files/args_test_simple_kitti_320x1024.txt
How should I modify the parameters? I tried several settings but got poor results.
I'm looking forward to your reply

The smooth loss

Hello! My question is about the smooth loss.
This version of the code shows that the model outputs depth, and you still use 'outputs["disp", 0] = pred' to store the predicted depth. That is fine, and you changed the behavior of the generate_images_pred function accordingly.

However, it seems that you still use depth to compute the smooth loss. Why didn't you use the inverse depth, i.e. the true disparity, for the smooth loss, as Monodepth2 does? Will the results be the same?

Thanks!
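For reference, Monodepth2's smoothness term is edge-aware: disparity gradients are down-weighted by exp(-|image gradient|), and the disparity is first divided by its mean. The sketch below transcribes that idea for a single-channel H x W map in pure Python (hypothetical helper, not this repo's code). Mean normalization makes the loss invariant to a global scale, but depth and disparity are not related by a scale, so the results are not expected to be identical: since d(1/D) = -dD / D², depth gradients are larger in distant regions, roughly speaking, and a depth-based term weights far regions more heavily.

```python
import math


def edge_aware_smoothness(disp, img, normalize=True):
    """Monodepth2-style edge-aware smoothness on a single H x W map.

    disp: nested list (disparity, or whatever quantity is regularised).
    img:  nested list of grey-level intensities of the same size.
    """
    h, w = len(disp), len(disp[0])
    if normalize:
        mean = sum(map(sum, disp)) / (h * w)
        disp = [[d / (mean + 1e-7) for d in row] for row in disp]
    # Horizontal and vertical gradients, attenuated at image edges
    gx = [abs(disp[i][j] - disp[i][j + 1]) * math.exp(-abs(img[i][j] - img[i][j + 1]))
          for i in range(h) for j in range(w - 1)]
    gy = [abs(disp[i][j] - disp[i + 1][j]) * math.exp(-abs(img[i][j] - img[i + 1][j]))
          for i in range(h - 1) for j in range(w)]
    return (sum(gx) / len(gx) if gx else 0.0) + (sum(gy) / len(gy) if gy else 0.0)
```

A constant prediction gives exactly zero smoothness loss, so a loss that collapses to 0 during training can also be a sign that the network output has become flat.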

RuntimeError: Error(s) in loading state_dict for ResnetEncoderDecoder:

We train the network with
CUDA_VISIBLE_DEVICES=1 python train.py ./args_files/args_res50_kitti_192x640_train.txt

and test with
CUDA_VISIBLE_DEVICES=1 python evaluate_depth_config.py args_files/hisfog/kitti/resnet_320x1024.txt
CUDA_VISIBLE_DEVICES=1 python evaluate_depth_config.py args_files/hisfog/kitti/resnet_192x640.txt

but we get the following error:
File "/home/yx/miniconda3/envs/ekf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResnetEncoderDecoder:
Missing key(s) in state_dict: "encoder.encoder.conv1.weight", "encoder.encoder.bn1.weight", "encoder.encoder.bn1.bias", "encoder.encoder.bn1.running_mean", "encoder.encoder.bn1.running_var", "encoder.encoder.layer1.0.conv1.weight", "encoder.encoder.layer1.0.bn1.weight", "encoder.encoder.layer1.0.bn1.bias", "encoder.encoder.layer1.0.bn1.running_mean", "encoder.encoder.layer1.0.bn1.running_var", "encoder.encoder.layer1.0.conv2.weight", "encoder.encoder.layer1.0.bn2.weight", "encoder.encoder.layer1.0.bn2.bias", "encoder.encoder.layer1.0.bn2.running_mean", "encoder.encoder.layer1.0.bn2.running_var", "encoder.encoder.layer1.0.conv3.weight", "encoder.encoder.layer1.0.bn3.weight", "encoder.encoder.layer1.0.bn3.bias", "encoder.encoder.layer1.0.bn3.running_mean", "encoder.encoder.layer1.0.bn3.running_var", "encoder.encoder.layer1.0.downsample.0.weight", "encoder.encoder.layer1.0.downsample.1.weight", "encoder.encoder.layer1.0.downsample.1.bias", "encoder.encoder.layer1.0.downsample.1.running_mean", "encoder.encoder.layer1.0.downsample.1.running_var", "encoder.encoder.layer1.1.conv1.weight", "encoder.encoder.layer1.1.bn1.weight", "encoder.encoder.layer1.1.bn1.bias", "encoder.encoder.layer1.1.bn1.running_mean", "encoder.encoder.layer1.1.bn1.running_var", "encoder.encoder.layer1.1.conv2.weight", "encoder.encoder.layer1.1.bn2.weight", "encoder.encoder.layer1.1.bn2.bias", "encoder.encoder.layer1.1.bn2.running_mean", "encoder.encoder.layer1.1.bn2.running_var", "encoder.encoder.layer1.1.conv3.weight", "encoder.encoder.layer1.1.bn3.weight", "encoder.encoder.layer1.1.bn3.bias", "encoder.encoder.layer1.1.bn3.running_mean", "encoder.encoder.layer1.1.bn3.running_var", "encoder.encoder.layer1.2.conv1.weight", "encoder.encoder.layer1.2.bn1.weight", "encoder.encoder.layer1.2.bn1.bias", "encoder.encoder.layer1.2.bn1.running_mean", "encoder.encoder.layer1.2.bn1.running_var", "encoder.encoder.layer1.2.conv2.weight", "encoder.encoder.layer1.2.bn2.weight", "encoder.encoder.layer1.2.bn2.bias", 
"encoder.encoder.layer1.2.bn2.running_mean", "encoder.encoder.layer1.2.bn2.running_var", "encoder.encoder.layer1.2.conv3.weight", "encoder.encoder.layer1.2.bn3.weight", "encoder.encoder.layer1.2.bn3.bias", "encoder.encoder.layer1.2.bn3.running_mean", "encoder.encoder.layer1.2.bn3.running_var", "encoder.encoder.layer2.0.conv1.weight", "encoder.encoder.layer2.0.bn1.weight", "encoder.encoder.layer2.0.bn1.bias", "encoder.encoder.layer2.0.bn1.running_mean", "encoder.encoder.layer2.0.bn1.running_var", "encoder.encoder.layer2.0.conv2.weight", "encoder.encoder.layer2.0.bn2.weight", "encoder.encoder.layer2.0.bn2.bias", "encoder.encoder.layer2.0.bn2.running_mean", "encoder.encoder.layer2.0.bn2.running_var", "encoder.encoder.layer2.0.conv3.weight", "encoder.encoder.layer2.0.bn3.weight", "encoder.encoder.layer2.0.bn3.bias", "encoder.encoder.layer2.0.bn3.running_mean", "encoder.encoder.layer2.0.bn3.running_var", "encoder.encoder.layer2.0.downsample.0.weight", "encoder.encoder.layer2.0.downsample.1.weight", "encoder.encoder.layer2.0.downsample.1.bias", "encoder.encoder.layer2.0.downsample.1.running_mean", "encoder.encoder.layer2.0.downsample.1.running_var", "encoder.encoder.layer2.1.conv1.weight", "encoder.encoder.layer2.1.bn1.weight", "encoder.encoder.layer2.1.bn1.bias", "encoder.encoder.layer2.1.bn1.running_mean", "encoder.encoder.layer2.1.bn1.running_var", "encoder.encoder.layer2.1.conv2.weight", "encoder.encoder.layer2.1.bn2.weight", "encoder.encoder.layer2.1.bn2.bias", "encoder.encoder.layer2.1.bn2.running_mean", "encoder.encoder.layer2.1.bn2.running_var", "encoder.encoder.layer2.1.conv3.weight", "encoder.encoder.layer2.1.bn3.weight", "encoder.encoder.layer2.1.bn3.bias", "encoder.encoder.layer2.1.bn3.running_mean", "encoder.encoder.layer2.1.bn3.running_var", "encoder.encoder.layer2.2.conv1.weight", "encoder.encoder.layer2.2.bn1.weight", "encoder.encoder.layer2.2.bn1.bias", "encoder.encoder.layer2.2.bn1.running_mean", "encoder.encoder.layer2.2.bn1.running_var", 
"encoder.encoder.layer2.2.conv2.weight", "encoder.encoder.layer2.2.bn2.weight", "encoder.encoder.layer2.2.bn2.bias", "encoder.encoder.layer2.2.bn2.running_mean", "encoder.encoder.layer2.2.bn2.running_var", "encoder.encoder.layer2.2.conv3.weight", "encoder.encoder.layer2.2.bn3.weight", "encoder.encoder.layer2.2.bn3.bias", "encoder.encoder.layer2.2.bn3.running_mean", "encoder.encoder.layer2.2.bn3.running_var", "encoder.encoder.layer2.3.conv1.weight", "encoder.encoder.layer2.3.bn1.weight", "encoder.encoder.layer2.3.bn1.bias", "encoder.encoder.layer2.3.bn1.running_mean", "encoder.encoder.layer2.3.bn1.running_var", "encoder.encoder.layer2.3.conv2.weight", "encoder.encoder.layer2.3.bn2.weight", "encoder.encoder.layer2.3.bn2.bias", "encoder.encoder.layer2.3.bn2.running_mean", "encoder.encoder.layer2.3.bn2.running_var", "encoder.encoder.layer2.3.conv3.weight", "encoder.encoder.layer2.3.bn3.weight", "encoder.encoder.layer2.3.bn3.bias", "encoder.encoder.layer2.3.bn3.running_mean", "encoder.encoder.layer2.3.bn3.running_var", "encoder.encoder.layer3.0.conv1.weight", "encoder.encoder.layer3.0.bn1.weight", "encoder.encoder.layer3.0.bn1.bias", "encoder.encoder.layer3.0.bn1.running_mean", "encoder.encoder.layer3.0.bn1.running_var", "encoder.encoder.layer3.0.conv2.weight", "encoder.encoder.layer3.0.bn2.weight", "encoder.encoder.layer3.0.bn2.bias", "encoder.encoder.layer3.0.bn2.running_mean", "encoder.encoder.layer3.0.bn2.running_var", "encoder.encoder.layer3.0.conv3.weight", "encoder.encoder.layer3.0.bn3.weight", "encoder.encoder.layer3.0.bn3.bias", "encoder.encoder.layer3.0.bn3.running_mean", "encoder.encoder.layer3.0.bn3.running_var", "encoder.encoder.layer3.0.downsample.0.weight", "encoder.encoder.layer3.0.downsample.1.weight", "encoder.encoder.layer3.0.downsample.1.bias", "encoder.encoder.layer3.0.downsample.1.running_mean", "encoder.encoder.layer3.0.downsample.1.running_var", "encoder.encoder.layer3.1.conv1.weight", "encoder.encoder.layer3.1.bn1.weight", 
"encoder.encoder.layer3.1.bn1.bias", "encoder.encoder.layer3.1.bn1.running_mean", "encoder.encoder.layer3.1.bn1.running_var", "encoder.encoder.layer3.1.conv2.weight", "encoder.encoder.layer3.1.bn2.weight", "encoder.encoder.layer3.1.bn2.bias", "encoder.encoder.layer3.1.bn2.running_mean", "encoder.encoder.layer3.1.bn2.running_var", "encoder.encoder.layer3.1.conv3.weight", "encoder.encoder.layer3.1.bn3.weight", "encoder.encoder.layer3.1.bn3.bias", "encoder.encoder.layer3.1.bn3.running_mean", "encoder.encoder.layer3.1.bn3.running_var", "encoder.encoder.layer3.2.conv1.weight", "encoder.encoder.layer3.2.bn1.weight", "encoder.encoder.layer3.2.bn1.bias", "encoder.encoder.layer3.2.bn1.running_mean", "encoder.encoder.layer3.2.bn1.running_var", "encoder.encoder.layer3.2.conv2.weight", "encoder.encoder.layer3.2.bn2.weight", "encoder.encoder.layer3.2.bn2.bias", "encoder.encoder.layer3.2.bn2.running_mean", "encoder.encoder.layer3.2.bn2.running_var", "encoder.encoder.layer3.2.conv3.weight", "encoder.encoder.layer3.2.bn3.weight", "encoder.encoder.layer3.2.bn3.bias", "encoder.encoder.layer3.2.bn3.running_mean", "encoder.encoder.layer3.2.bn3.running_var", "encoder.encoder.layer3.3.conv1.weight", "encoder.encoder.layer3.3.bn1.weight", "encoder.encoder.layer3.3.bn1.bias", "encoder.encoder.layer3.3.bn1.running_mean", "encoder.encoder.layer3.3.bn1.running_var", "encoder.encoder.layer3.3.conv2.weight", "encoder.encoder.layer3.3.bn2.weight", "encoder.encoder.layer3.3.bn2.bias", "encoder.encoder.layer3.3.bn2.running_mean", "encoder.encoder.layer3.3.bn2.running_var", "encoder.encoder.layer3.3.conv3.weight", "encoder.encoder.layer3.3.bn3.weight", "encoder.encoder.layer3.3.bn3.bias", "encoder.encoder.layer3.3.bn3.running_mean", "encoder.encoder.layer3.3.bn3.running_var", "encoder.encoder.layer3.4.conv1.weight", "encoder.encoder.layer3.4.bn1.weight", "encoder.encoder.layer3.4.bn1.bias", "encoder.encoder.layer3.4.bn1.running_mean", "encoder.encoder.layer3.4.bn1.running_var", 
"encoder.encoder.layer3.4.conv2.weight", "encoder.encoder.layer3.4.bn2.weight", "encoder.encoder.layer3.4.bn2.bias", "encoder.encoder.layer3.4.bn2.running_mean", "encoder.encoder.layer3.4.bn2.running_var", "encoder.encoder.layer3.4.conv3.weight", "encoder.encoder.layer3.4.bn3.weight", "encoder.encoder.layer3.4.bn3.bias", "encoder.encoder.layer3.4.bn3.running_mean", "encoder.encoder.layer3.4.bn3.running_var", "encoder.encoder.layer3.5.conv1.weight", "encoder.encoder.layer3.5.bn1.weight", "encoder.encoder.layer3.5.bn1.bias", "encoder.encoder.layer3.5.bn1.running_mean", "encoder.encoder.layer3.5.bn1.running_var", "encoder.encoder.layer3.5.conv2.weight", "encoder.encoder.layer3.5.bn2.weight", "encoder.encoder.layer3.5.bn2.bias", "encoder.encoder.layer3.5.bn2.running_mean", "encoder.encoder.layer3.5.bn2.running_var", "encoder.encoder.layer3.5.conv3.weight", "encoder.encoder.layer3.5.bn3.weight", "encoder.encoder.layer3.5.bn3.bias", "encoder.encoder.layer3.5.bn3.running_mean", "encoder.encoder.layer3.5.bn3.running_var", "encoder.encoder.layer4.0.conv1.weight", "encoder.encoder.layer4.0.bn1.weight", "encoder.encoder.layer4.0.bn1.bias", "encoder.encoder.layer4.0.bn1.running_mean", "encoder.encoder.layer4.0.bn1.running_var", "encoder.encoder.layer4.0.conv2.weight", "encoder.encoder.layer4.0.bn2.weight", "encoder.encoder.layer4.0.bn2.bias", "encoder.encoder.layer4.0.bn2.running_mean", "encoder.encoder.layer4.0.bn2.running_var", "encoder.encoder.layer4.0.conv3.weight", "encoder.encoder.layer4.0.bn3.weight", "encoder.encoder.layer4.0.bn3.bias", "encoder.encoder.layer4.0.bn3.running_mean", "encoder.encoder.layer4.0.bn3.running_var", "encoder.encoder.layer4.0.downsample.0.weight", "encoder.encoder.layer4.0.downsample.1.weight", "encoder.encoder.layer4.0.downsample.1.bias", "encoder.encoder.layer4.0.downsample.1.running_mean", "encoder.encoder.layer4.0.downsample.1.running_var", "encoder.encoder.layer4.1.conv1.weight", "encoder.encoder.layer4.1.bn1.weight", 
"encoder.encoder.layer4.1.bn1.bias", "encoder.encoder.layer4.1.bn1.running_mean", "encoder.encoder.layer4.1.bn1.running_var", "encoder.encoder.layer4.1.conv2.weight", "encoder.encoder.layer4.1.bn2.weight", "encoder.encoder.layer4.1.bn2.bias", "encoder.encoder.layer4.1.bn2.running_mean", "encoder.encoder.layer4.1.bn2.running_var", "encoder.encoder.layer4.1.conv3.weight", "encoder.encoder.layer4.1.bn3.weight", "encoder.encoder.layer4.1.bn3.bias", "encoder.encoder.layer4.1.bn3.running_mean", "encoder.encoder.layer4.1.bn3.running_var", "encoder.encoder.layer4.2.conv1.weight", "encoder.encoder.layer4.2.bn1.weight", "encoder.encoder.layer4.2.bn1.bias", "encoder.encoder.layer4.2.bn1.running_mean", "encoder.encoder.layer4.2.bn1.running_var", "encoder.encoder.layer4.2.conv2.weight", "encoder.encoder.layer4.2.bn2.weight", "encoder.encoder.layer4.2.bn2.bias", "encoder.encoder.layer4.2.bn2.running_mean", "encoder.encoder.layer4.2.bn2.running_var", "encoder.encoder.layer4.2.conv3.weight", "encoder.encoder.layer4.2.bn3.weight", "encoder.encoder.layer4.2.bn3.bias", "encoder.encoder.layer4.2.bn3.running_mean", "encoder.encoder.layer4.2.bn3.running_var", "encoder.encoder.fc.weight", "encoder.encoder.fc.bias", "decoder.conv2.weight", "decoder.conv2.bias", "decoder.up1._net.0.weight", "decoder.up1._net.0.bias", "decoder.up1._net.1.weight", "decoder.up1._net.1.bias", "decoder.up1._net.1.running_mean", "decoder.up1._net.1.running_var", "decoder.up1._net.3.weight", "decoder.up1._net.3.bias", "decoder.up1._net.4.weight", "decoder.up1._net.4.bias", "decoder.up1._net.4.running_mean", "decoder.up1._net.4.running_var", "decoder.up2._net.0.weight", "decoder.up2._net.0.bias", "decoder.up2._net.1.weight", "decoder.up2._net.1.bias", "decoder.up2._net.1.running_mean", "decoder.up2._net.1.running_var", "decoder.up2._net.3.weight", "decoder.up2._net.3.bias", "decoder.up2._net.4.weight", "decoder.up2._net.4.bias", "decoder.up2._net.4.running_mean", "decoder.up2._net.4.running_var", 
"decoder.up3._net.0.weight", "decoder.up3._net.0.bias", "decoder.up3._net.1.weight", "decoder.up3._net.1.bias", "decoder.up3._net.1.running_mean", "decoder.up3._net.1.running_var", "decoder.up3._net.3.weight", "decoder.up3._net.3.bias", "decoder.up3._net.4.weight", "decoder.up3._net.4.bias", "decoder.up3._net.4.running_mean", "decoder.up3._net.4.running_var", "decoder.up4._net.0.weight", "decoder.up4._net.0.bias", "decoder.up4._net.1.weight", "decoder.up4._net.1.bias", "decoder.up4._net.1.running_mean", "decoder.up4._net.1.running_var", "decoder.up4._net.3.weight", "decoder.up4._net.3.bias", "decoder.up4._net.4.weight", "decoder.up4._net.4.bias", "decoder.up4._net.4.running_mean", "decoder.up4._net.4.running_var", "decoder.conv3.weight", "decoder.conv3.bias".

Trouble with inference

I am trying to run inference on some images using the KITTI (ConvNeXt-L) model linked in the README.

The model runs without error, but the outputs look like this:

I believe something is going wrong, since this doesn't match the expected results.

I read some of this repo's previous issues with similar images, but didn't find a complete configuration that fixed the problem.

I am running inference with this command from the README:
python test_simple_SQL_config.py ./args_files/args_test_simple_kitti_320x1024.txt

Where can I find a matching combination of:

- a model from the Pretrained Weights section
- a config file from /SfMNeXt-Impl-main/args_files
- config file contents that correctly load the model above and set it up without errors

Thanks!
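Config files such as args_test_simple_kitti_320x1024.txt are passed to the script as a single path. We have not confirmed how test_simple_SQL_config.py reads them; the sketch below assumes the common argparse pattern of an "@file" argument plus a convert_arg_line_to_args override, so that each line can hold a flag and its value. The option names here are an illustrative subset (load_weights_folder in particular is hypothetical), not the repository's full option list.

```python
import argparse


class ArgsFileParser(argparse.ArgumentParser):
    """ArgumentParser that lets one args-file line hold a flag and its value."""

    def convert_arg_line_to_args(self, arg_line):
        # The default treats a whole line as one argument, so "--height 192"
        # would not parse; split on whitespace instead. Blank lines yield [].
        return arg_line.split()


def build_parser():
    # Hypothetical subset of options, for illustration only.
    parser = ArgsFileParser(fromfile_prefix_chars="@")
    parser.add_argument("--dataset", type=str, default="kitti")
    parser.add_argument("--height", type=int, default=192)
    parser.add_argument("--width", type=int, default=640)
    parser.add_argument("--backbone", type=str, default="resnet50")
    parser.add_argument("--load_weights_folder", type=str, default=None)
    parser.add_argument("--eval_mono", action="store_true")
    return parser


def parse_args_file(path):
    # "@path" tells argparse to read further arguments from that file.
    return build_parser().parse_args(["@" + path])
```

Called as parse_args_file(sys.argv[1]), this reads every flag from the file. Under this pattern a mismatch between the file and the checkpoint (for example a backbone name or input size that differs from the one the weights were trained with) still parses cleanly, so it is worth checking those values by hand.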

Loading pretrained weights for EfficientNetB5 - Missing key(s) in state_dict

Hi!

When I try to load the pretrained weights you provided for EfficientNetB5, some keys in the state_dict appear not to match. Loading the weights was straightforward for ResNet50 and ConvNeXt, but not for EfficientNetB5.
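A quick way to see what actually differs is to compare the parameter names the model expects with those stored in the checkpoint. PyTorch's load_state_dict(..., strict=False) reports missing_keys and unexpected_keys itself; the sketch below does the same comparison on plain key lists, optionally stripping wrapper prefixes such as "module." (added by DataParallel). Whether the EfficientNetB5 mismatch is a prefix issue or a genuine architecture difference is an assumption this helper only helps to check.

```python
def diff_state_dict_keys(model_keys, ckpt_keys, strip_prefixes=("module.",)):
    """Return (missing, unexpected) parameter names after prefix normalization.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names the checkpoint has but the model does not use
    """

    def normalize(key):
        # Drop each wrapper prefix once if present, e.g. "module." from DataParallel.
        for prefix in strip_prefixes:
            if key.startswith(prefix):
                key = key[len(prefix):]
        return key

    model_set = {normalize(k) for k in model_keys}
    ckpt_set = {normalize(k) for k in ckpt_keys}
    missing = sorted(model_set - ckpt_set)
    unexpected = sorted(ckpt_set - model_set)
    return missing, unexpected
```

With PyTorch it could be called as diff_state_dict_keys(model.state_dict().keys(), torch.load(path, map_location="cpu").keys()). If almost every key appears in both lists with only a consistent prefix difference (for example an extra "encoder." level), the weights likely match and only the names need remapping; if whole blocks are missing, the checkpoint was probably saved from a different backbone variant.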

Can we get metric depth?

Is there a function to get predicted metric depth from a video or an image? For my use case, I need metric depth in real time.
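Models trained self-supervised on monocular video predict depth only up to an unknown scale. In monodepth2-style evaluation, the --eval_mono setting aligns each prediction with ground truth by per-image median scaling, which is not available at run time. The sketch below shows that step, assuming some metric reference depth (for example sparse LiDAR or stereo) is available for the frame; it is illustrative, not a function provided by this repository.

```python
import numpy as np


def median_scale(pred_depth, ref_depth, min_depth=1e-3, max_depth=80.0):
    """Rescale a relative depth map so its median matches a metric reference.

    pred_depth: predicted depth, correct only up to scale
    ref_depth:  metric depth for the same frame (0 or out-of-range = invalid)
    Returns the rescaled depth map and the scale ratio that was applied.
    """
    valid = (ref_depth > min_depth) & (ref_depth < max_depth)
    ratio = np.median(ref_depth[valid]) / np.median(pred_depth[valid])
    scaled = np.clip(pred_depth * ratio, min_depth, max_depth)
    return scaled, ratio
```

For real-time use the ratio could be estimated once and reused, but it only stays valid if the camera setup and scene statistics stay similar; without any metric reference, a monocular self-supervised model cannot recover absolute scale on its own.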

Question about disp map

Hi, author, thanks for your remarkable work.
I trained with the following settings:
--dataset kitti
--eval_split eigen
--height 192
--width 640
--batch_size 16
--num_epochs 25
--model_dim 32
--patch_size 16
--query_nums 120
--scheduler_step_size 15
--eval_mono
--post_process
--min_depth 0.001
--max_depth 80.0
--backbone resnet18_lite
As training progresses, the loss gradually decreases and the metrics improve. However, the resulting disp maps look strange. Here are the disp maps from the seventh epoch:

It seems to render nearby objects in dark colors and distant objects in bright colors. Is this normal?
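Disparity is inversely related to depth, so the same colormap gives opposite brightness orderings depending on which one is plotted. In monodepth2-style code, a sigmoid output in [0, 1] is mapped to depth with a disp_to_depth function, and disparity is shown with the upper bound at the 95th percentile, so near objects look bright. Near objects rendered dark would therefore suggest the plotted array is depth (or inverted). Whether this repository's output tensor is disparity or depth is an assumption to verify in the code; the sketch only illustrates the relationship.

```python
import numpy as np


def disp_to_depth(disp, min_depth, max_depth):
    """Map a network output in [0, 1] to (scaled disparity, depth), monodepth2-style."""
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return scaled_disp, 1.0 / scaled_disp


def normalize_for_display(x, upper_percentile=95):
    """Scale values to [0, 1] for a colormap; larger values render brighter."""
    vmin = x.min()
    vmax = np.percentile(x, upper_percentile)
    return np.clip((x - vmin) / (vmax - vmin + 1e-8), 0.0, 1.0)
```

Displaying normalize_for_display(disp) makes nearby pixels bright, while normalize_for_display(depth) makes them dark, which matches the pattern described above.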

Regarding FPS improvement

I trained this model with the efficient_b5 backbone, as you advised (#27).
Of course, FPS improved significantly:
it reached about 7 FPS in the TensorRT environment!
I would be very happy with even a little more improvement. Can you give me some advice?
Thanks in advance.
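When comparing changes such as a smaller backbone, a lower input resolution, or FP16 precision, it helps to measure FPS the same way each time: run some warm-up iterations first, then average over many runs. The sketch below is a framework-agnostic timing harness; infer is a hypothetical zero-argument callable that runs one forward pass. For asynchronous GPU execution it must block until the work finishes (for example by calling torch.cuda.synchronize() inside it), otherwise the measured FPS will be too high.

```python
import time


def measure_fps(infer, n_warmup=10, n_runs=100):
    """Return frames per second of a blocking, zero-argument inference callable."""
    for _ in range(n_warmup):
        infer()  # warm-up: excludes one-time costs such as engine or kernel setup
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed
```

Timing only the model call, separately from image loading and pre/post-processing, also shows whether the bottleneck is the network itself.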

What is the difference between Lite_depth_decoder_QTR and depth_decoder_QTR?

How to get gt_depths

Nice work!
gt_path = os.path.join(splits_dir, opt.eval_split, "gt_depths.npz") in evaluate_depth_config.py points to where the ground-truth depth data for evaluation is expected to be. It does not point directly to the raw KITTI data, such as kitti_data/raw/2011_09_26/2011_09_26_drive_0002_sync/velodyne_points/data/0000000069.bin.
So how can I get gt_depths.npz?
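gt_depths.npz holds one sparse depth map per test image, made by projecting each Velodyne scan into the camera image. In monodepth2 the file is generated by its export_gt_depth.py script (python export_gt_depth.py --data_path kitti_data --split eigen); whether this repository ships the same script is worth checking. The sketch below shows only the core projection and saving steps. It assumes P_velo2im has already been composed from the KITTI calibration files (rectified camera projection, rectifying rotation, Velodyne-to-camera transform), and that the evaluation code reads the maps from a "data" key, as monodepth2's does. It is a simplified illustration and will not reproduce the official file bit-for-bit.

```python
import numpy as np


def velo_to_sparse_depth(velo, P_velo2im, im_shape):
    """Project LiDAR points into the image to build a sparse depth map.

    velo:      (N, 4) array from a KITTI .bin file (x, y, z, reflectance)
    P_velo2im: (3, 4) projection from Velodyne coordinates to image pixels
    im_shape:  (height, width) of the target image
    """
    # Keep points in front of the sensor (the Velodyne x axis points forward).
    velo = velo[velo[:, 0] >= 0, :].copy()
    velo[:, 3] = 1.0  # replace reflectance with 1 -> homogeneous 3D points

    pts = velo @ P_velo2im.T  # (N, 3) rows of (u*z, v*z, z)
    pts = pts[pts[:, 2] > 0]  # avoid division by zero / points behind the camera
    z = pts[:, 2]
    u = np.round(pts[:, 0] / z).astype(int)
    v = np.round(pts[:, 1] / z).astype(int)

    h, w = im_shape
    inside = (u >= 0) & (v >= 0) & (u < w) & (v < h)
    u, v, z = u[inside], v[inside], z[inside]

    depth = np.full((h, w), np.inf)
    np.minimum.at(depth, (v, u), z)  # several points per pixel: keep the closest
    depth[np.isinf(depth)] = 0.0     # 0 marks pixels without a LiDAR return
    return depth


def save_gt_depths(depth_maps, out_path):
    """Save maps as a 1-D object array under "data" (image sizes differ by date)."""
    data = np.empty(len(depth_maps), dtype=object)
    for i, d in enumerate(depth_maps):
        data[i] = d
    np.savez_compressed(out_path, data=data)
```

Because the maps are stored as an object array, reading the file with recent NumPy versions requires np.load(path, allow_pickle=True)["data"].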

Problem reproducing the results

Nice work! I would appreciate your guidance on the following two questions:
1. This is the result of testing the latest code with the 'kitti-resnet50-640*192' weights. How do you view the errors introduced by shadows?

2. What potential issues do you think might arise when using this depth estimation result for novel view synthesis (NVS)? This adaptive binning approach seems very well suited to NVS.
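Using a depth map for novel view synthesis usually starts by lifting each pixel to 3D with the camera intrinsics and reprojecting it into the new view; depth errors (such as those around shadows) then show up directly as misplaced pixels, and disoccluded regions become holes. The sketch below shows that standard pinhole back-projection and reprojection; K, R and t are assumed inputs, not values taken from this repository.

```python
import numpy as np


def backproject(depth, K):
    """Lift an (H, W) depth map to an (H*W, 3) point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row grids
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)], axis=0)  # (3, H*W)
    rays = np.linalg.inv(K) @ pix  # viewing rays with z = 1
    return (rays * depth.ravel()).T  # scale each ray by its depth


def project(points, K, R, t):
    """Project camera-frame points into a second view with rotation R, translation t.

    Returns (N, 2) pixel coordinates and the (N,) depths in the new view.
    """
    cam2 = points @ R.T + t
    uvw = cam2 @ K.T
    return uvw[:, :2] / uvw[:, 2:3], cam2[:, 2]
```

Forward-warping the source colors to the projected coordinates, keeping the nearest point per target pixel, gives the novel view; pixels that receive no point are the holes an NVS pipeline has to fill.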

ImportError: cannot import name 'PixelWiseDotProduct_for_dense' from 'layers'

The layers module doesn't contain the functions or classes PixelWiseDotProduct_for_dense, PixelWiseDotProduct_for_summary, or FullQueryLayer.
