
voxformer's Introduction

VoxFormer: a Cutting-edge Baseline for 3D Semantic Occupancy Prediction

VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion, CVPR 2023.

Yiming Li, Zhiding Yu, Chris Choy, Chaowei Xiao, Jose M. Alvarez, Sanja Fidler, Chen Feng, Anima Anandkumar

[PDF] [Project] [Intro Video]

News

  • [2023/07]: We release the code of VoxFormer with a 3D deformable attention module, which achieves slightly better performance.
  • [2023/06]: 🔥 We release SSCBench, a large-scale semantic scene completion benchmark derived from KITTI-360, nuScenes, and Waymo.
  • [2023/06]: Welcome to our CVPR poster session on 21 June (WED-AM-082), and check our online video.
  • [2023/03]: 🔥 VoxFormer is accepted by CVPR 2023 as a highlight paper (235/9155, 2.5% acceptance rate).
  • [2023/02]: Our paper is on arxiv.
  • [2022/11]: VoxFormer achieves the SOTA on the SemanticKITTI 3D SSC (Semantic Scene Completion) task with 13.35% mIoU and 44.15% IoU (camera-only)!

Abstract

Humans can easily imagine the complete 3D geometry of occluded objects and scenes. This appealing ability is vital for recognition and understanding. To enable such capability in AI systems, we propose VoxFormer, a Transformer-based semantic scene completion framework that can output complete 3D volumetric semantics from only 2D images. Our framework adopts a two-stage design where we start from a sparse set of visible and occupied voxel queries from depth estimation, followed by a densification stage that generates dense 3D voxels from the sparse ones. A key idea of this design is that the visual features on 2D images correspond only to the visible scene structures rather than the occluded or empty spaces. Therefore, starting with the featurization and prediction of the visible structures is more reliable. Once we obtain the set of sparse queries, we apply a masked autoencoder design to propagate the information to all the voxels by self-attention. Experiments on SemanticKITTI show that VoxFormer outperforms the state of the art with a relative improvement of 20.0% in geometry and 18.1% in semantics and reduces GPU memory during training by ~45% to less than 16GB.

Method

Figure 1. Overall framework of VoxFormer. Given RGB images, 2D features are extracted by ResNet50 and the depth is estimated by an off-the-shelf depth predictor. The estimated depth after correction enables the class-agnostic query proposal stage: the query located at an occupied position will be selected to carry out deformable cross-attention with image features. Afterwards, mask tokens will be added for completing voxel features by deformable self-attention. The refined voxel features will be upsampled and projected to the output space for per-voxel semantic segmentation. Note that our framework supports the input of single or multiple images.
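
To make the two-stage design concrete, here is a minimal, self-contained sketch. It is not the released implementation: the toy 16 x 16 x 4 grid, the standard self-attention, and the random occupancy stand-in are all simplifications, since the real model runs deformable attention on a 128 x 128 x 16 query grid.

    import torch
    import torch.nn as nn

    embed_dim, grid = 128, (16, 16, 4)                       # toy resolution; the paper uses 128 x 128 x 16
    num_voxels = grid[0] * grid[1] * grid[2]

    voxel_queries = nn.Embedding(num_voxels, embed_dim)      # one learnable query per voxel
    mask_token = nn.Parameter(torch.zeros(1, embed_dim))     # shared token for unselected voxels

    # Stage 1 (class-agnostic query proposal): keep only queries at voxels that the
    # corrected depth marks as occupied; here a random mask stands in for that step.
    occupancy = torch.rand(num_voxels) > 0.9
    proposal_idx = occupancy.nonzero(as_tuple=True)[0]
    sparse_queries = voxel_queries.weight[proposal_idx]      # these attend to image features (cross-attention)

    # Stage 2 (densification): scatter the (cross-attended) sparse queries back into the
    # full grid, pad everything else with the mask token, and let self-attention
    # propagate information to all voxels.
    dense = mask_token.expand(num_voxels, embed_dim).clone()
    dense[proposal_idx] = sparse_queries
    self_attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
    refined, _ = self_attn(dense[None], dense[None], dense[None])
    print(refined.shape)                                     # torch.Size([1, 1024, 128])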

Getting Started

Model Zoo

The query proposal network (QPN) for stage-1 is available here. For stage-2, please download the trained models based on the following table.

| Backbone | Method         | Lr Schd | IoU   | mIoU  | Config | Download |
|----------|----------------|---------|-------|-------|--------|----------|
| R50      | VoxFormer-T    | 20ep    | 44.15 | 13.35 | config | model    |
| R50      | VoxFormer-S    | 20ep    | 44.02 | 12.35 | config | model    |
| R50      | VoxFormer-T-3D | 20ep    | 44.35 | 13.69 | config | model    |
| R50      | VoxFormer-S-3D | 20ep    | 44.42 | 12.86 | config | model    |
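
If you only want to inspect a downloaded stage-2 checkpoint before wiring it into the test script, a quick sanity check with PyTorch is enough. The path below reuses the VoxFormer-S checkpoint name mentioned in the issues; mmcv-style checkpoints typically store 'meta' and 'state_dict' entries.

    import torch

    ckpt = torch.load("./ckpts/voxformer-S/miou12.35_iou44.02_epoch_14.pth", map_location="cpu")
    print(list(ckpt.keys()))                 # typically ['meta', 'state_dict', 'optimizer']
    state = ckpt.get("state_dict", ckpt)
    print(len(state), "parameter tensors")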

Dataset

  • SemanticKITTI
  • KITTI-360
  • nuScenes

Bibtex

If this work is helpful for your research, please cite the following BibTeX entry.

@InProceedings{li2023voxformer,
    title     = {VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion},
    author    = {Li, Yiming and Yu, Zhiding and Choy, Christopher and Xiao, Chaowei and Alvarez, Jose M and Fidler, Sanja and Feng, Chen and Anandkumar, Anima},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2023}
}

License

Copyright © 2022-2023, NVIDIA Corporation and Affiliates. All rights reserved.

This work is made available under the Nvidia Source Code License-NC. Click here to view a copy of this license.

The pre-trained models are shared under CC-BY-NC-SA-4.0. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.


Acknowledgement

Many thanks to these excellent open source projects:

voxformer's People

Contributors

chrisding, lty2226262, roboticsyimingli


voxformer's Issues

Colab/Jupyter Notebook

Hello, and congratulations on this amazing work!

Thank you for releasing the code, I really appreciate that.

I was wondering whether you will also release a Colab environment once the model is released? I think this would greatly increase accessibility for different audiences.

Kind regards

Question about voxel query

[attached image: voxformer_query]

In the first stage of VoxFormer, I roughly visualized the output of a (128 x 128 x 16) query and it looked like the image above. The paper mentions using LMSCNet for depth correction, but in practice, can we say that the query is a higher-scale, complete voxel map obtained from the occupancy-map ground truth based on the depth pseudo-LiDAR?
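
For reference, the rough idea behind a depth-based query proposal can be sketched as follows. This is not the released pipeline: the intrinsics, grid extents, and the simplified coordinate handling are all placeholders, and the actual stage-1 network additionally corrects and completes this occupancy.

    import numpy as np

    depth = np.full((370, 1220), 10.0)                     # hypothetical metric depth map
    K = np.array([[707.0, 0.0, 610.0],
                  [0.0, 707.0, 185.0],
                  [0.0, 0.0, 1.0]])                        # placeholder KITTI-like intrinsics

    # Back-project every pixel into a camera-frame pseudo-LiDAR point cloud.
    v, u = np.indices(depth.shape)
    pts = np.stack([(u - K[0, 2]) * depth / K[0, 0],
                    (v - K[1, 2]) * depth / K[1, 1],
                    depth], axis=-1).reshape(-1, 3)

    # Voxelize into a 128 x 128 x 16 grid with 0.4 m voxels: axes (forward, lateral, up).
    origin = np.array([0.0, -25.6, -2.0])                  # placeholder grid origin
    grid = np.array([128, 128, 16])
    pts_fwd = np.stack([pts[:, 2], pts[:, 0], -pts[:, 1]], axis=-1)
    idx = np.floor((pts_fwd - origin) / 0.4).astype(int)
    valid = ((idx >= 0) & (idx < grid)).all(axis=1)

    occupancy = np.zeros(grid, dtype=bool)
    occupancy[tuple(idx[valid].T)] = True                  # occupied voxels -> stage-1 query proposals
    print(occupancy.sum(), "proposed voxel queries")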

for dense voxel visualization

Awesome work! The visualization of the dense voxel map in the documentation is very good, but when I use the model's output for voxel visualization, my grid map is not as dense as in the documentation. For example, a car's voxels do not fill the space from the roof down to the ground, but form only a single-layer grid at roof height. Could you tell me how to visualize the voxels better?
I am using the command
./tools/dist_test.sh ./projects/configs/voxformer/voxformer-S.py ./ckpts/voxformer-S/miou12.35_iou44.02_epoch_14.pth 4
and collecting y_pred (an np.array of shape 256, 256, 32).
Do I need to change my model head?
Thanks a lot!

my grid
https://lingdongfangcheng.feishu.cn/file/VaDOb6kqHoS2ywxg5tXcfVIMnIg?from=from_copylink
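
As a minimal sketch of the kind of rendering asked about above (assuming y_pred is a (256, 256, 32) integer array of SemanticKITTI label IDs with 0 = empty and 255 = unknown/ignore; the official semantic-kitti-api renderer will give much nicer results):

    import numpy as np
    import matplotlib.pyplot as plt

    y_pred = np.load("y_pred.npy")                 # hypothetical dump of one model output
    occupied = (y_pred != 0) & (y_pred != 255)     # drop empty and ignore voxels

    coarse = occupied[::4, ::4, ::2]               # downsample so matplotlib stays responsive
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.voxels(coarse, edgecolor="none")
    ax.set_box_aspect(coarse.shape)
    plt.show()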

How to support single-image input for inference?

In the context of occupancy-related tasks, why do some works support single-image input for inference while others require multiple images? What is the key factor behind this difference?

MVXTwoStageDetector

Hi,

First of all, congrats on the paper! Thank you for sharing this interesting work + amazing results!
I have a question regarding the second stage implementation:
Was there a specific reason as to why the MVXTwoStageDetector was used as the base model for implementing VoxFormer?

Thanks

Usage of mask token

Is the function of the mask token m to map the gray voxels that are not selected as queries in the first stage to the purple voxels in the second stage? Looking forward to your reply. :D

Stage-2 Training Error

Hello @RoboticsYimingLi, I am now reproducing your fantastic VoxFormer.
Stage-1 training finishes successfully, but stage-2 raises the error below.

My compute spec is:
Tesla A100 (80 GB) x 4 GPUs
CUDA 11.3, torch 1.10.1

Please tell me a solution... I can't do anything right now...

Error Message

error in ms_deformable_im2col_cuda: an illegal memory access was encountered
Traceback (most recent call last):
File "./tools/train.py", line 261, in
main()
File "./tools/train.py", line 250, in main
custom_train_model(
File "/home/hamyo/SSC/mmdetection3d/VoxFormer/projects/mmdet3d_plugin/voxformer/apis/train.py", line 27, in custom_train_model
custom_train_detector(
File "/home/hamyo/SSC/mmdetection3d/VoxFormer/projects/mmdet3d_plugin/voxformer/apis/mmdet_train.py", line 200, in custom_train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
outputs = self.model.train_step(data_batch, self.optimizer,
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 237, in train_step
losses = self(**data)
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hamyo/SSC/mmdetection3d/VoxFormer/projects/mmdet3d_plugin/voxformer/detectors/voxformer.py", line 108, in forward
return self.forward_train(**kwargs)
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
return old_func(*args, **kwargs)
File "/home/hamyo/SSC/mmdetection3d/VoxFormer/projects/mmdet3d_plugin/voxformer/detectors/voxformer.py", line 138, in forward_train
losses_pts = self.forward_pts_train(img_feats, img_metas, target)
File "/home/hamyo/SSC/mmdetection3d/VoxFormer/projects/mmdet3d_plugin/voxformer/detectors/voxformer.py", line 93, in forward_pts_train
outs = self.pts_bbox_head(img_feats, img_metas, target)
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hamyo/SSC/mmdetection3d/VoxFormer/projects/mmdet3d_plugin/voxformer/dense_heads/voxformer_head.py", line 95, in forward
seed_feats = self.cross_transformer.get_vox_features(
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
return old_func(*args, **kwargs)
File "/home/hamyo/SSC/mmdetection3d/VoxFormer/projects/mmdet3d_plugin/voxformer/modules/transformer.py", line 136, in get_vox_features
bev_embed = self.encoder(
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
return old_func(*args, **kwargs)
File "/home/hamyo/SSC/mmdetection3d/VoxFormer/projects/mmdet3d_plugin/voxformer/modules/encoder.py", line 205, in forward
output = layer(
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hamyo/SSC/mmdetection3d/VoxFormer/projects/mmdet3d_plugin/voxformer/modules/encoder.py", line 372, in forward
query = self.attentions[attn_index](
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 186, in new_func
return old_func(*args, **kwargs)
File "/home/hamyo/SSC/mmdetection3d/VoxFormer/projects/mmdet3d_plugin/voxformer/modules/deformable_cross_attention.py", line 171, in forward
slots[j, index_query_per_img] += queries[j, i, :len(index_query_per_img)]
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1639180588308/work/c10/cuda/CUDACachingAllocator.cpp:1211 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fdede183d62 in /home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: + 0x1c613 (0x7fdf236a1613 in /home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a2 (0x7fdf236a2022 in /home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0xa4 (0x7fdede16d314 in /home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: + 0x295359 (0x7fdf776a3359 in /home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0xadb231 (0x7fdf77ee9231 in /home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x292 (0x7fdf77ee9532 in /home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4d386f]
frame #8: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4e55cb]
frame #9: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4e55cb]
frame #10: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4e0800]
frame #11: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4f16a8]
frame #12: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4f1691]
frame #13: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4f1691]
frame #14: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4f1691]
frame #15: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4f1691]
frame #16: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4f1691]
frame #17: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4f1691]
frame #18: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4f1691]
frame #19: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x4c9280]
frame #20: PyDict_SetItemString + 0x52 (0x5823a2 in /home/hamyo/anaconda3/envs/voxformer/bin/python)
frame #21: PyImport_Cleanup + 0x93 (0x5a7623 in /home/hamyo/anaconda3/envs/voxformer/bin/python)
frame #22: Py_FinalizeEx + 0x71 (0x5a6751 in /home/hamyo/anaconda3/envs/voxformer/bin/python)
frame #23: Py_RunMain + 0x112 (0x5a21f2 in /home/hamyo/anaconda3/envs/voxformer/bin/python)
frame #24: Py_BytesMain + 0x39 (0x57a799 in /home/hamyo/anaconda3/envs/voxformer/bin/python)
frame #25: __libc_start_main + 0xe7 (0x7fdfb1900c87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #26: /home/hamyo/anaconda3/envs/voxformer/bin/python() [0x57a64d]

WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 53769 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 53771 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 53772 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 1 (pid: 53770) of binary: /home/hamyo/anaconda3/envs/voxformer/bin/python
Traceback (most recent call last):
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
elastic_launch(
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/hamyo/anaconda3/envs/voxformer/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

./tools/train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-04-20_00:00:00
host : server-45
rank : 1 (local_rank: 1)
exitcode : -6 (pid: 53770)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 53770

Visualize GT voxel data

Hi, could you please tell me if there's any way to visualize the GT voxel data after downloading all necessary datasets?

Thank you so much!
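
A minimal way to at least load one ground-truth volume (assuming the standard SemanticKITTI voxel layout of one uint16 label per voxel on a 256 x 256 x 32 grid; the path is a placeholder) is:

    import numpy as np

    labels = np.fromfile("dataset/sequences/00/voxels/000000.label", dtype=np.uint16)
    labels = labels.reshape(256, 256, 32)          # SemanticKITTI SSC grid
    print(np.unique(labels))                       # raw SemanticKITTI label IDs; 0 = empty

The occupied mask (labels != 0) can then be rendered with the same matplotlib approach sketched in the dense-voxel-visualization issue above.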

Inference performance

Hey, do you have any statistics on the throughput of the network? E.g., can it run in real time on low-end devices like a Jetson?

question for qpn train

When I trained stage 1, I found that the code core-dumped because this parameter (save_query_path) was not defined.
How can I fix it?
[screenshot]

evaluate problem

When I run test.py, I get an error: TypeError: string indices must be integers
In the following code:

    for result in results:
        self.metrics.add_batch(result['y_pred'], result['y_true'])   # <- the TypeError is raised here
    metric_prefix = f'{result_name}_SemanticKITTI'

In semantic_kitti_dataset_stage2.py and semantic_kitti_dataset_stage1.py; the function is "evaluate".

Whether I test qpn or voxformer-S, the error occurs at the same location.
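
That TypeError means each result is a string rather than a dict with 'y_pred' / 'y_true' keys; a quick, generic way to confirm what the test loop actually returned (a hypothetical debugging snippet, not part of the repo) is:

    # Print the type and a preview of the first few entries before add_batch is called.
    for result in results[:3]:
        print(type(result), result if isinstance(result, str) else sorted(result.keys()))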

stage2 training error

File "/home/ziliu/anaconda3/envs/semantics/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 186, in new_func
return old_func(*args, **kwargs)
File "/data/acentauri/user/ziliu/data/voxformer/projects/mmdet3d_plugin/voxformer/modules/deformable_cross_attention.py", line 143, in forward
index_query_per_img = mask_per_img[0].sum(-1).nonzero().squeeze(-1)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

odometry_poses file

It seems the odometry_poses file downloaded from the KITTI website is different from the poses file provided in this repo.

Timeline for code/models?

Congrats on the paper, looks very interesting!
When do you plan on publishing the code?
Are you also going to publish any pre-trained models?

Thanks :)

preds are nan

Thanks for your great work. I have an issue: in stage 2, my preds are NaN at the start of training and this results in an error. Have you ever encountered this problem?
I am training with VoxFormer-T.

bug show_results

Traceback (most recent call last):
  File "./tools/test.py", line 263, in <module>
    main()
  File "./tools/test.py", line 228, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.show_dir)
  File "/home/Dataset/weiqian/mmdetection3d/mmdet3d/apis/test.py", line 53, in single_gpu_test
    out_dir=out_dir,
  File "/home/Dataset/weiqian/mmdetection3d/mmdet3d/models/detectors/mvx_two_stage.py", line 466, in show_results
    if isinstance(data['points'][0], DC):
KeyError: 'points'

name 'save_query_path' is not defined

When eval "qpn", it will raise "name 'save_query_path' is not defined". I check line 317 of lmscnet.py, and find that the defination of "save_query_path" in line 314 is masked. Are there some problems here?

Issue with saving and visualizing results.

Hi,

I have managed to run the first stage and save the query proposals.

Now I'm running the second stage with:
./tools/dist_test.sh ./projects/configs/voxformer/voxformer-S.py ./ckpts/voxformer-S/miou12.35_iou44.02_epoch_14.pth 1
And I'm having problems with saving and visualizing the results.

I had to uncomment the following line:
https://github.com/NVlabs/VoxFormer/blob/9d28f0ed857291f2b707e66757118c466e1f158d/tools/test.py#L242C21-L242C21

And I had to remove the ['bbox_results'] from mmcv.dump(outputs['bbox_results'], args.out) because it raised a KeyError.
Now I have managed to save the .pkl but have no clue what the next step should be.
How do I visualize the results? You said you used semantic-kitti-api, but it is not compatible with the .pkl file.
I also tried to add --show-dir and --show flags to the test script, but with no effect.

I would really appreciate help in visualizing the results.
Thank you.

training log

Could you please provide the training log? Thanks!

data_odometry_voxels.zip

Could you please provide data_odometry_voxels.zip? The download speed from the SemanticKITTI website is slow.

dist_test.sh failure

Hi, there!
Thank you for your work!

I'm trying to launch dist_test.sh, but got an error:

./tools/dist_test.sh ./projects/configs/voxformer/qpn.py ./qpn/qpn_iou52.03.pth 1

File "./projects/mmdet3d_plugin/voxformer/detectors/lmscnet.py", line 317, in foward_test
y_pred_bin.tofile(save_query_path)
NameError: name 'save_query_path' is not defined

Also tried adding flags:
"--out" -- not implemented

VoxFormer/tools/test.py

Lines 239 to 241 in 9d28f0e

    if args.out:
        print(f'\nwriting results to {args.out}')
        assert False

"format-only" - dataloader specified in config () has not function format_results
image

VoxFormer/tools/test.py

Lines 246 to 247 in 9d28f0e

    if args.format_only:
        dataset.format_results(outputs, **kwargs)

How do I write test results to disk for visualization with semantic-kitti-api?

Or maybe I'm launching dist_test.sh incorrectly?

Looking forward to your reply!
Thanks

What might be the cause for the inferiority to LiDAR-based methods?

A concise solution for camera-based SSC!
Yet I'm still wondering what might cause the inferiority to LiDAR-based methods like SSCNet & JS3CNet.

Since VoxFormer:

  1. has already employed a depth estimator to obtain geometry information
  2. is able to perform on par with LiDAR-based methods at a range of 12.8 m (I assume LiDAR helps more at close range?)

what else may make VoxFormer and other camera-based methods lag behind the LiDAR-based ones?

No module named 'deform3dattn_custom_cn'

Thanks for your great work.

I have an issue. I was trying to run

./tools/dist_test.sh ./projects/configs/voxformer/qpn.py ./path/to/ckpts.pth 1

on my computer, which only has 1 GPU, and the error came as follows:

NotImplementedError: Use sys.path.append here to modify the path to your .so file
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 20568) of binary: /home/made/dep/anaconda/anaconda3/envs/voxformer/bin/python

It's located in multi_scale_deformable_attn_3D_custom_function.py, where I found the problem:

No module named 'deform3dattn_custom_cn'

I have followed every step in install.md; maybe mmdetection3d has some problems, but I could not figure them out.

How do I visualize the .pkl file from the stage-2 test?

Hello, I would like to ask how you load the .pkl result and convert it into the data format required by semantic-kitti-api. When I run "./tools/my_dist_test.sh ./projects/configs/voxformer/voxformer-T.py /home/ZHX/Data/sy/VoxFormer/ckpts/VoxFormer-T/miou13.35_iou44.15_epoch_12.pth 1", I get a file called part_0.pkl which is 20 GB in size. If you can provide the conversion code, I would appreciate it. Looking forward to your reply!

Where is voxel query?

Thank you for your research and I have a few questions for you.

I don't know exactly where the voxel query resides.
I am curious about the following points.

  1. Is a voxel query a voxelized point cloud? If so, is the voxel query provided in sequences_msnet3d_sweep10?
  2. Is the voxel query different from the binary voxel grid map (M_in) introduced in the paper?
  3. Where is the voxel query implemented in the code?

Problem of label_preprocess.py

Here is the down-sampling code block in label_preprocess.py:

    for scale in downscaling:
        filename = frame_id + "_" + scale + ".npy"
        label_filename = os.path.join(out_dir, filename)
        # If files have not been created...
        if not os.path.exists(label_filename):
            if scale == "1_8":
                LABEL_ds = _downsample_label(
                    LABEL, (256, 256, 32), downscaling[scale]
                )
            else:
                LABEL_ds = LABEL

I find that downscaling = {"1_1": 1, "1_2": 2}, which means scale is either "1_1" or "1_2". The program will therefore always enter the "else" branch, where LABEL_ds = LABEL, so no downsampling is carried out.
So I wonder where the actual downsampling happens. Is there a problem, or did I just miss something?
Thank you for your response!
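
For reference, this is not the repository's _downsample_label, just a sketch of what majority-vote label pooling over k x k x k blocks looks like (assuming an integer label volume whose dimensions are divisible by k):

    import numpy as np

    def downsample_label_majority(label: np.ndarray, k: int) -> np.ndarray:
        """Reduce a label volume by a factor k per axis, keeping the most frequent label per block."""
        x, y, z = label.shape
        blocks = label.reshape(x // k, k, y // k, k, z // k, k).transpose(0, 2, 4, 1, 3, 5)
        blocks = blocks.reshape(x // k, y // k, z // k, -1)
        out = np.zeros(blocks.shape[:3], dtype=label.dtype)
        for idx in np.ndindex(*out.shape):
            vals, counts = np.unique(blocks[idx], return_counts=True)
            out[idx] = vals[np.argmax(counts)]
        return out

    toy = np.random.randint(0, 20, size=(64, 64, 16))
    print(downsample_label_majority(toy, 2).shape)   # (32, 32, 8)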

Use in small sizes

I would like to use this method on a smaller area (10 m × 5 m) to reduce the computational cost. Is there any way to do this?
Thank you for your work!
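
One common way to do this in mmdetection3d-style configs is to shrink the covered range and the voxel grid together. The keys below (voxel_size, point_cloud_range, occ_size) follow common conventions and are not necessarily VoxFormer's exact config names, so treat this as a hypothetical fragment:

    # Hypothetical config fragment: a 10 m x 5 m x 6.4 m volume at 0.2 m resolution.
    voxel_size = 0.2
    point_cloud_range = [0.0, -2.5, -2.0, 10.0, 2.5, 4.4]   # [x_min, y_min, z_min, x_max, y_max, z_max]
    occ_size = [
        round((point_cloud_range[3] - point_cloud_range[0]) / voxel_size),   # 50
        round((point_cloud_range[4] - point_cloud_range[1]) / voxel_size),   # 25
        round((point_cloud_range[5] - point_cloud_range[2]) / voxel_size),   # 32
    ]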

Error while trying to run test

Hello, I followed the installation guide and tried to run the tests, and got this error:

Traceback (most recent call last):
  File "./tools/test.py", line 19, in <module>
    from mmdet3d.apis import single_gpu_test
  File "/home/max/Projects/vox/mmdetection3d/mmdet3d/apis/__init__.py", line 2, in <module>
    from .inference import (convert_SyncBN, inference_detector,
  File "/home/max/Projects/vox/mmdetection3d/mmdet3d/apis/inference.py", line 11, in <module>
    from mmdet3d.core import (Box3DMode, CameraInstance3DBoxes,
  File "/home/max/Projects/vox/mmdetection3d/mmdet3d/core/__init__.py", line 2, in <module>
    from .anchor import *  # noqa: F401, F403
  File "/home/max/Projects/vox/mmdetection3d/mmdet3d/core/anchor/__init__.py", line 2, in <module>
    from mmdet.core.anchor import build_prior_generator
  File "/home/max/miniconda3/envs/om/lib/python3.8/site-packages/mmdet/core/__init__.py", line 2, in <module>
    from .bbox import *  # noqa: F401, F403
  File "/home/max/miniconda3/envs/om/lib/python3.8/site-packages/mmdet/core/bbox/__init__.py", line 7, in <module>
    from .samplers import (BaseSampler, CombinedSampler,
  File "/home/max/miniconda3/envs/om/lib/python3.8/site-packages/mmdet/core/bbox/samplers/__init__.py", line 9, in <module>
    from .score_hlr_sampler import ScoreHLRSampler
  File "/home/max/miniconda3/envs/om/lib/python3.8/site-packages/mmdet/core/bbox/samplers/score_hlr_sampler.py", line 2, in <module>
    from mmcv.ops import nms_match
  File "/home/max/miniconda3/envs/om/lib/python3.8/site-packages/mmcv/ops/__init__.py", line 2, in <module>
    from .assign_score_withk import assign_score_withk
  File "/home/max/miniconda3/envs/om/lib/python3.8/site-packages/mmcv/ops/assign_score_withk.py", line 5, in <module>
    ext_module = ext_loader.load_ext(
  File "/home/max/miniconda3/envs/om/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/home/max/miniconda3/envs/om/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'mmcv._ext'

I can't find a way to fix this.
Thanks

Depth Estimation Problems

If only a monocular camera is available, is it possible to use VoxFormer?

  1. Does depth estimation need images from a binocular (stereo) camera rather than a monocular one?
  2. Why didn't you use Monodepth2? Is its performance not as good as MobileStereoNet's?

Question about attention.

Hello there! The work and code are amazing, but I have a question about the code: in deformable_*_attention.py, the attention weight comes from a linear transform of the query. I do not understand why the attention weight is unrelated to the key.
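
That is indeed how deformable attention works: both the sampling locations and the attention weights are predicted from the query alone, and the keys only enter through the values that get sampled. A minimal single-scale, single-head sketch of that pattern (not the repo's CUDA kernel):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    dim, num_queries, num_points = 64, 100, 4
    query = torch.randn(2, num_queries, dim)            # (batch, queries, dim)
    value = torch.randn(2, 64 * 64, dim)                # flattened image features (keys/values)

    offset_proj = nn.Linear(dim, num_points * 2)        # predicts WHERE to sample, from the query
    weight_proj = nn.Linear(dim, num_points)            # predicts HOW MUCH each sample counts, from the query

    offsets = offset_proj(query)                        # never sees the keys
    weights = F.softmax(weight_proj(query), dim=-1)     # (2, 100, num_points), also query-only

    # In the real module the offsets drive bilinear sampling of `value`; here the first
    # num_points value vectors simply stand in for the sampled features.
    sampled = value[:, :num_points, :].unsqueeze(1).expand(-1, num_queries, -1, -1)
    out = (weights.unsqueeze(-1) * sampled).sum(dim=2)  # weighted sum of sampled values
    print(out.shape)                                    # torch.Size([2, 100, 64])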

About the stage1 training

According to your paper:
[figure from the paper]
You output the query proposal based on the depth prediction; however, I saw in your released code that you use LMSCNet to generate the proposal. Is this a misalignment? It seems that LMSCNet does not generate a depth prediction, but takes LiDAR input and outputs class-specific results. Looking forward to your reply!

split problem

    splits = {
        "train": ["00", "01", "02", "03", "04", "05", "06", "07", "09", "10"],
        "val": ["08"],
        "test": ["11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21"],
    }
I found that the config uses the val split for testing. Is the test split not used?
