cgformer's Introduction

Context and Geometry Aware Voxel Transformer for Semantic Scene Completion

🚀 News

  • 2024.5.27 code released
  • 2024.5.24 arXiv preprint released

Introduction

Vision-based Semantic Scene Completion (SSC) has gained much attention due to its widespread applications in various 3D perception tasks. Existing sparse-to-dense methods typically employ shared, context-independent queries across different input images. Because the focal regions of different inputs vary, such queries fail to capture the distinctions among images and may lead to undirected feature aggregation in cross-attention. Additionally, without depth information, points projected onto the image plane may share the same 2D position or similar sampling points in the feature map, resulting in depth ambiguity.

In this paper, we present a novel context and geometry aware voxel transformer. It uses a context aware query generator to initialize context-dependent queries tailored to each input image, effectively capturing its unique characteristics and aggregating information within the region of interest. Furthermore, it extends deformable cross-attention from 2D to 3D pixel space, so that points with similar image coordinates can be distinguished by their depth coordinates. Building upon this module, we introduce a neural network named CGFormer for semantic scene completion. CGFormer also leverages multiple 3D representations (i.e., voxel and tri-perspective view, TPV) to boost the semantic and geometric representation abilities of the transformed 3D volume from both local and global perspectives. Experimental results demonstrate that CGFormer achieves state-of-the-art performance on the SemanticKITTI and SSCBench-KITTI-360 benchmarks, attaining an mIoU of 16.87 and 20.05 and an IoU of 45.99 and 48.07, respectively.
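The depth ambiguity described above, and how sampling in 3D pixel space resolves it, can be illustrated with a toy example. The sketch below assumes a pinhole camera, a single-channel 2D feature map, and a depth-binned feature volume indexed as volume[d][v][u]. Two voxel centres on the same camera ray project to the same pixel, so bilinear sampling in 2D returns identical features, while trilinear sampling at (u, v, depth) separates them. All names, intrinsics, and the fixed offsets and weights passed to `deformable_sample_3d` are hypothetical: in a deformable cross-attention layer they would be predicted from the query, and this is not CGFormer's implementation.

```python
import math

def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (x right, y down, z forward).
    Returns (u, v, d): pixel coordinates plus the depth d = z."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy, z)

def lerp(a, b, t):
    return a + (b - a) * t

def bilinear(plane, u, v):
    """Sample plane[v][u] at fractional (u, v); assumes an interior point (no padding)."""
    u0, v0 = math.floor(u), math.floor(v)
    du, dv = u - u0, v - v0
    top = lerp(plane[v0][u0], plane[v0][u0 + 1], du)
    bottom = lerp(plane[v0 + 1][u0], plane[v0 + 1][u0 + 1], du)
    return lerp(top, bottom, dv)

def trilinear(volume, u, v, d_idx):
    """Sample volume[d][v][u] at fractional (u, v, d_idx): bilinear in the two
    neighbouring depth slices, then linear along the depth axis."""
    d0 = math.floor(d_idx)
    t = d_idx - d0
    return lerp(bilinear(volume[d0], u, v), bilinear(volume[d0 + 1], u, v), t)

def deformable_sample_3d(volume, ref, offsets, weights):
    """Weighted sum of trilinear samples around a reference point (u, v, d_idx).
    Hypothetical: offsets and weights are fixed here instead of learned."""
    total = 0.0
    for (du, dv, dd), w in zip(offsets, weights):
        total += w * trilinear(volume, ref[0] + du, ref[1] + dv, ref[2] + dd)
    return total

# Toy setup (all values hypothetical): 4x4 feature map, 5 depth bins for depths 1..5 m.
W, H, D = 4, 4, 5
D_MIN, BIN = 1.0, 1.0
plane = [[u + 10 * v for u in range(W)] for v in range(H)]
volume = [[[u + 10 * v + 100 * d for u in range(W)] for v in range(H)] for d in range(D)]

def to_bin(depth):
    """Map a metric depth to a fractional depth-bin index."""
    return (depth - D_MIN) / BIN

# Two voxel centres on the same camera ray, at depths 2 m and 4 m.
near = project((0.5, 0.5, 2.0), fx=2.0, fy=2.0, cx=1.0, cy=1.0)
far = project((1.0, 1.0, 4.0), fx=2.0, fy=2.0, cx=1.0, cy=1.0)

feat2d_near = bilinear(plane, near[0], near[1])
feat2d_far = bilinear(plane, far[0], far[1])
feat3d_near = trilinear(volume, near[0], near[1], to_bin(near[2]))
feat3d_far = trilinear(volume, far[0], far[1], to_bin(far[2]))
print(near[:2] == far[:2], feat2d_near, feat2d_far)  # True 16.5 16.5
print(feat3d_near, feat3d_far)                       # 116.5 316.5

mixed = deformable_sample_3d(volume, (near[0], near[1], to_bin(near[2])),
                             offsets=[(0.0, 0.0, 0.0), (0.5, 0.0, 1.0)],
                             weights=[0.5, 0.5])
print(mixed)  # 166.75
```

In 2D the two points are indistinguishable, whereas the extra depth coordinate lets each one read from its own depth slice.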

Method

Schematics and detailed architectures of CGFormer. (a) The framework of the proposed CGFormer for camera-based semantic scene completion. The pipeline consists of an image encoder that extracts 2D features, the context and geometry aware voxel transformer (CGVT) that lifts the 2D features to 3D volumes, the 3D local and global encoder (LGE) that enhances the 3D volumes, and a decoding head that predicts the semantic occupancy. (b) Detailed structure of the context and geometry aware voxel transformer. (c) Details of the Depth Net.
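The local and global encoder combines a voxel representation with a tri-perspective view (TPV). To illustrate why TPV adds global context, the sketch below builds the three orthogonal planes by mean pooling, broadcasts them back to the voxel grid by summation, and fuses the result with the voxel features by addition. The pooling and fusion choices and all function names are assumptions for illustration, not taken from the CGFormer code; a learned encoder would additionally process each representation.

```python
def tpv_planes(vol):
    """Collapse a voxel grid vol[x][y][z] into three orthogonal planes
    (xy, xz, yz) by mean-pooling along the dropped axis (assumed pooling)."""
    X, Y, Z = len(vol), len(vol[0]), len(vol[0][0])
    xy = [[sum(vol[x][y][z] for z in range(Z)) / Z for y in range(Y)] for x in range(X)]
    xz = [[sum(vol[x][y][z] for y in range(Y)) / Y for z in range(Z)] for x in range(X)]
    yz = [[sum(vol[x][y][z] for x in range(X)) / X for z in range(Z)] for y in range(Y)]
    return xy, xz, yz

def tpv_to_voxel(xy, xz, yz):
    """Broadcast the planes back: each voxel sums the three plane cells it projects onto."""
    X, Y, Z = len(xy), len(xy[0]), len(xz[0])
    return [[[xy[x][y] + xz[x][z] + yz[y][z] for z in range(Z)]
             for y in range(Y)] for x in range(X)]

def fuse(local, global_):
    """Hypothetical fusion: element-wise sum of local voxel and global TPV features."""
    return [[[l + g for l, g in zip(row_l, row_g)]
             for row_l, row_g in zip(plane_l, plane_g)]
            for plane_l, plane_g in zip(local, global_)]

# A 4x4x4 single-channel volume with one non-zero voxel at (0, 0, 0).
N = 4
vol = [[[0.0] * N for _ in range(N)] for _ in range(N)]
vol[0][0][0] = 4.0

glob = tpv_to_voxel(*tpv_planes(vol))
fused = fuse(vol, glob)
print(glob[0][0][0], glob[0][0][3], glob[3][0][0], glob[3][3][3])  # 3.0 1.0 1.0 0.0
print(fused[0][0][0])  # 7.0
```

The voxels at (0, 0, 3) and (3, 0, 0), three cells away, receive information from the occupied voxel in a single step, which a small 3D convolution kernel (the local branch) would need several layers to achieve.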

Quantitative Results

SemanticKITTI

KITTI360

Getting Started

step 1. Refer to install.md to install the environment.

step 2. Refer to dataset.md to prepare the SemanticKITTI and KITTI360 datasets.

step 3. Refer to train_and_eval.md for training and evaluation.

Model Zoo

We provide pretrained weights on the SemanticKITTI and KITTI360 datasets, reproduced with the released codebase.

Dataset         Backbone         IoU                  mIoU                 Model Weights   Training Logs
SemanticKITTI   EfficientNetB7   44.41, 45.99 (val)   16.63, 16.89 (val)   Link            Link
KITTI360        EfficientNetB7   48.07                20.05                Link            Link
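In SSC benchmarks such as SemanticKITTI, IoU usually measures class-agnostic occupancy (occupied vs. empty), and mIoU averages the per-class IoU over the semantic classes; the table reports both as percentages. The sketch below is not the repository's evaluation code: it computes both metrics from flattened voxel labels, assuming label 0 means empty and label 255 marks voxels to ignore (conventions modelled on SemanticKITTI), and that predictions lie in [0, num_classes).

```python
def ssc_metrics(pred, gt, num_classes, empty=0, ignore=255):
    """Return (IoU, mIoU) for flattened voxel labels.

    IoU: class-agnostic occupancy (any non-empty label counts as occupied).
    mIoU: mean per-class IoU over the semantic classes (empty excluded)."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    occ_tp = occ_fp = occ_fn = 0
    for p, g in zip(pred, gt):
        if g == ignore:  # unlabeled / invalid voxels are skipped
            continue
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
        p_occ, g_occ = p != empty, g != empty
        if p_occ and g_occ:
            occ_tp += 1
        elif p_occ:
            occ_fp += 1
        elif g_occ:
            occ_fn += 1
    iou = occ_tp / (occ_tp + occ_fp + occ_fn)
    # Classes absent from both prediction and ground truth are skipped here.
    per_class = [tp[c] / (tp[c] + fp[c] + fn[c])
                 for c in range(num_classes)
                 if c != empty and tp[c] + fp[c] + fn[c] > 0]
    miou = sum(per_class) / len(per_class)
    return iou, miou

gt = [0, 0, 1, 1, 2, 2, 255]
pred = [0, 1, 1, 1, 2, 0, 2]
iou, miou = ssc_metrics(pred, gt, num_classes=3)
print(f"IoU {100 * iou:.2f}  mIoU {100 * miou:.2f}")  # IoU 60.00  mIoU 58.33
```

In a full evaluation the counts would be accumulated over the whole split before dividing, rather than per scene.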

Acknowledgement

Many thanks to these exceptional open source projects:

As it is not possible to list every project referenced in our paper, please contact us if we have left out your repository, and we will update the list.

Bibtex

If you find our work beneficial for your research, please consider citing our paper and giving us a star:

@misc{CGFormer,
      title={Context and Geometry Aware Voxel Transformer for Semantic Scene Completion}, 
      author={Zhu Yu and Runming Zhang and Jiacheng Ying and Junchen Yu and Xiaohai Hu and Lun Luo and Siyuan Cao and Huiliang Shen},
      year={2024},
      eprint={2405.13675},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

If you encounter any issues, please contact [email protected].

To do

  • Visualization scripts

cgformer's Issues

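About the number of training epochs

Thank you very much for your work. The paper says you trained for 25 epochs, but the config file only has the parameter training_steps=25000 (and 54000 for KITTI360). Is the number of epochs derived from these steps? If I want to train on another dataset, how should I work out the number of steps in one epoch?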
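The relationship asked about here is plain arithmetic, assuming `training_steps` counts optimizer steps, with one step per global batch (batch size per GPU times number of GPUs) and no gradient accumulation. The sketch below uses hypothetical helper names and a hypothetical dataset size; it is not the repository's config logic. Note that 25 epochs at training_steps=25000 implies 1000 steps per epoch.

```python
import math

def steps_per_epoch(num_samples, batch_size_per_gpu, num_gpus, drop_last=False):
    """Optimizer steps needed to see every training sample once
    (assumes no gradient accumulation)."""
    global_batch = batch_size_per_gpu * num_gpus
    if drop_last:
        return num_samples // global_batch  # incomplete final batch is dropped
    return math.ceil(num_samples / global_batch)

def epochs_from_steps(training_steps, num_samples, batch_size_per_gpu, num_gpus,
                      drop_last=False):
    return training_steps / steps_per_epoch(num_samples, batch_size_per_gpu,
                                            num_gpus, drop_last)

def steps_for_epochs(epochs, num_samples, batch_size_per_gpu, num_gpus,
                     drop_last=False):
    return epochs * steps_per_epoch(num_samples, batch_size_per_gpu,
                                    num_gpus, drop_last)

# Hypothetical numbers: 4000 training frames, 4 GPUs, 1 sample per GPU.
print(steps_per_epoch(4000, 1, 4))           # 1000
print(epochs_from_steps(25000, 4000, 1, 4))  # 25.0
print(steps_for_epochs(25, 4000, 1, 4))      # 25000
```

For a new dataset, plug in its number of training frames and the per-GPU batch size and GPU count actually used, then multiply by the desired number of epochs.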

About the dataset

Hi,
Thanks for releasing the code; this is great work! After preparing the dataset according to dataset.md, I found that lidarseg was not installed, but the code still runs correctly. May I ask where lidarseg is used?

About training time

Thank you very much for your excellent work. May I ask roughly how long training took on four RTX 4090 GPUs?
