Spherical Transformer for LiDAR-based 3D Recognition (CVPR 2023)

This is the official PyTorch implementation of SphereFormer (CVPR 2023).

Spherical Transformer for LiDAR-based 3D Recognition [Paper]

Xin Lai, Yukang Chen, Fanbin Lu, Jianhui Liu, Jiaya Jia

Highlight

  1. SphereFormer is a plug-and-play transformer module. We develop radial window attention, which significantly boosts the segmentation performance of distant points, e.g., from 13.3% to 30.4% mIoU on nuScenes lidarseg val set.
  2. It achieves superior performance on various outdoor semantic segmentation benchmarks, e.g., nuScenes, SemanticKITTI, and Waymo, and also shows competitive results on the nuScenes detection dataset.
  3. This repository employs a fast and memory-efficient library for sparse transformer with varying token numbers, SparseTransformer.
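
To make the idea of a radial window concrete, here is a minimal sketch that is not taken from the repository. It assumes that windows are formed by binning only the azimuth and polar angles of each point in spherical coordinates, leaving the radius unbinned, so that near and far points along the same direction share a window. The function name and the 2-degree bin sizes are hypothetical.

```python
import math

def radial_window_index(x, y, z, theta_size_deg=2.0, phi_size_deg=2.0):
    """Map a LiDAR point to a hypothetical (azimuth_bin, polar_bin) window id.

    The radius is deliberately not binned, so each window is a long,
    narrow wedge that extends outwards from the sensor.
    """
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.degrees(math.atan2(y, x)) % 360.0           # azimuth in [0, 360)
    phi = math.degrees(math.acos(z / r)) if r > 0 else 0.0   # polar angle in [0, 180]
    return int(theta // theta_size_deg), int(phi // phi_size_deg)

# Two points along the same ray, one close and one 30 times farther away.
near = radial_window_index(2.0, 0.1, -1.0)
far = radial_window_index(60.0, 3.0, -30.0)
print(near, far, near == far)
```

Under this assumption, a sparse distant point can attend to the dense points closer to the sensor in the same wedge, which is one way to read the reported gain on distant points.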

Get Started

For object detection, please go to the detection/ directory.

The guide below covers semantic segmentation.

Environment

Install the dependencies (tested with python==3.7.9, pytorch==1.8.0, cuda==11.1, gcc==7.5.0)

git clone https://github.com/dvlab-research/SphereFormer.git --recursive
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch_scatter==2.0.9
pip install torch_geometric==1.7.2
pip install spconv-cu114==2.1.21
pip install torch_sparse==0.6.12 cumm-cu114==0.2.8 torch_cluster==1.5.9
pip install tensorboard timm termcolor tensorboardX

Install sptr

cd third_party/SparseTransformer && python setup.py install

Note: Make sure that gcc and CUDA are installed and that nvcc works. If CUDA was installed through conda, nvcc is not provided, and CUDA should be installed manually.
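
Before building sptr, the toolchain mentioned in the note can be checked with a few lines of standard-library Python. This is only a sketch: the helper name `find_build_tools` is hypothetical, and it merely reports whether gcc and nvcc are on PATH, not whether their versions match the tested ones.

```python
import shutil

def find_build_tools(tools=("gcc", "nvcc")):
    """Return {tool: absolute path, or None if the tool is not on PATH}."""
    return {tool: shutil.which(tool) for tool in tools}

if __name__ == "__main__":
    for tool, path in find_build_tools().items():
        status = path if path else "NOT FOUND (install it before building sptr)"
        print(f"{tool}: {status}")
```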

Datasets Preparation

nuScenes

Download the nuScenes dataset from here. Unzip and arrange it as follows. Then fill in the data_root entry in the .yaml configuration file.

nuscenes/
|--- v1.0-trainval/
|--- samples/
|------- LIDAR_TOP/
|--- lidarseg/
|------- v1.0-trainval/
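
A quick way to confirm that the unzipped files match the tree above is to check for the expected directories. The following is a minimal sketch; the helper `missing_dirs` and the default root name `nuscenes` are illustrative and not part of the repository.

```python
import os
import sys

# Directories taken from the layout shown above, relative to the data root.
NUSCENES_LAYOUT = [
    "v1.0-trainval",
    os.path.join("samples", "LIDAR_TOP"),
    os.path.join("lidarseg", "v1.0-trainval"),
]

def missing_dirs(data_root, expected=NUSCENES_LAYOUT):
    """Return the expected sub-directories that do not exist under data_root."""
    return [rel for rel in expected
            if not os.path.isdir(os.path.join(data_root, rel))]

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "nuscenes"
    missing = missing_dirs(root)
    print("layout OK" if not missing else f"missing under {root}: {missing}")
```

The same function can be reused for the other datasets by passing a different `expected` list.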

Next, set data_path and save_dir in data/nuscenes_preprocess_infos.py, and generate the infos with

pip install nuscenes-devkit pyquaternion
cd data && python nuscenes_preprocess_infos.py

SemanticKITTI

Download the SemanticKITTI dataset from here. Unzip and arrange it as follows. Then fill in the data_root entry in the .yaml configuration file.

dataset/
|--- sequences/
|------- 00/
|------- 01/
|------- 02/
|------- 03/
|------- .../

Waymo Open Dataset

Download the Waymo Open Dataset from here. Unzip and arrange it as follows. Then fill in the data_root entry in the .yaml configuration file.

waymo/
|--- training/
|--- validation/
|--- testing/

Then convert the raw files to the SemanticKITTI format as follows. (Note: do not use a GPU for this step; the CPU is sufficient.)

cd data/waymo_to_semanticKITTI
CUDA_VISIBLE_DEVICES="" python convert.py --load_dir [YOUR_DATA_ROOT] --save_dir [YOUR_SAVE_ROOT]

Training

nuScenes

python train.py --config config/nuscenes/nuscenes_unet32_spherical_transformer.yaml

SemanticKITTI

python train.py --config config/semantic_kitti/semantic_kitti_unet32_spherical_transformer.yaml

Waymo Open Dataset

python train.py --config config/waymo/waymo_unet32_spherical_transformer.yaml

Validation

For validation, modify the .yaml config file: (1) set weight to the path of the model weights (.pth file); (2) set val to True; (3) for test-time augmentation, set use_tta to True and set vote_num accordingly. Then run the following command.

python train.py --config [YOUR_CONFIG_PATH]
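
The three config changes described above can also be applied programmatically. The sketch below works on a plain dictionary, such as the one PyYAML's `yaml.safe_load` would return, and updates the keys wherever they appear. The function name, the nesting under `TRAIN`, and the example values are assumptions for illustration; check the actual .yaml file for where these keys live.

```python
def set_validation_options(cfg, weight_path, use_tta=False, vote_num=None):
    """Set weight/val/use_tta/vote_num in a (possibly nested) config dict.

    Returns the sorted list of keys that were not found anywhere in cfg.
    """
    updates = {"weight": weight_path, "val": True, "use_tta": use_tta}
    if use_tta and vote_num is not None:
        updates["vote_num"] = vote_num
    found = set()

    def walk(node):
        for key, value in node.items():
            if isinstance(value, dict):
                walk(value)                 # descend into nested sections
            elif key in updates:
                node[key] = updates[key]    # overwrite the existing entry
                found.add(key)

    walk(cfg)
    return sorted(set(updates) - found)

# Hypothetical config fragment; the real file may be organised differently.
cfg = {"TRAIN": {"weight": None, "val": False, "use_tta": False, "vote_num": 1}}
not_found = set_validation_options(cfg, "path/to/model.pth", use_tta=True, vote_num=4)
print(cfg, not_found)
```

Returning the keys that were not found makes it easy to notice when a config uses different names than expected.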

Pre-trained Models

Dataset             Val mIoU (tta)  Val mIoU  mIoU_close  mIoU_medium  mIoU_distant  Download
nuScenes            79.5            78.4      80.8        60.8         30.4          Model Weight
SemanticKITTI       69.0            67.8      68.6        60.4         17.8          Model Weight
Waymo Open Dataset  70.8            69.9      70.3        68.6         61.9          N/A

Note: Pre-trained weights on Waymo Open Dataset are not released due to the regulations.
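
The mIoU_close, mIoU_medium and mIoU_distant columns report mIoU over points grouped by distance from the sensor. The sketch below shows one way such a split could be computed. It is not the repository's evaluation code: the 20 m and 50 m boundaries, the use of horizontal distance, and the function names are assumptions, and a real evaluation would accumulate statistics over the whole validation set.

```python
import math

def miou(preds, labels, num_classes):
    """Mean IoU over the classes that occur in preds or labels."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, l in zip(preds, labels) if p == c and l == c)
        union = sum(1 for p, l in zip(preds, labels) if p == c or l == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")

def miou_by_range(points, preds, labels, num_classes, bounds=(20.0, 50.0)):
    """Split points by horizontal distance (assumed bounds) and score each group."""
    groups = {"close": ([], []), "medium": ([], []), "distant": ([], [])}
    for (x, y, _z), p, l in zip(points, preds, labels):
        d = math.hypot(x, y)
        name = "close" if d <= bounds[0] else "medium" if d <= bounds[1] else "distant"
        groups[name][0].append(p)
        groups[name][1].append(l)
    return {name: miou(ps, ls, num_classes) for name, (ps, ls) in groups.items()}

# Toy example with four points and two classes.
points = [(1, 0, 0), (2, 0, 0), (30, 0, 0), (60, 0, 0)]
print(miou_by_range(points, preds=[0, 1, 1, 0], labels=[0, 0, 1, 1], num_classes=2))
```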

SpTr Library

The SpTr library is highly recommended for sparse transformers, particularly for 3D point cloud attention. It is fast, memory-efficient, and easy to use. The GitHub repository is https://github.com/dvlab-research/SparseTransformer.git.

Citation

If you find this project useful, please consider citing:

@inproceedings{lai2023spherical,
  title={Spherical Transformer for LiDAR-based 3D Recognition},
  author={Lai, Xin and Chen, Yukang and Lu, Fanbin and Liu, Jianhui and Jia, Jiaya},
  booktitle={CVPR},
  year={2023}
}

Our Works on 3D Point Cloud

  • Spherical Transformer for LiDAR-based 3D Recognition (CVPR 2023) [Paper] [Code] : A plug-and-play transformer module that boosts performance in distant regions (for 3D LiDAR point clouds)

  • Stratified Transformer for 3D Point Cloud Segmentation (CVPR 2022) [Paper] [Code] : Point-based window transformer for 3D point cloud segmentation

  • SparseTransformer (SpTr) Library [Code] : A fast, memory-efficient, and easy-to-use library for sparse transformer with varying token numbers.
