cpd's Introduction

Commonsense Prototype for Outdoor Unsupervised 3D Object Detection (CVPR 2024)

This is the codebase of our CVPR 2024 paper. The codebase is still being updated.

Overview

Abstract

CPD (Commonsense Prototype-based Detector) is a high-performance unsupervised 3D object detection framework. CPD first constructs a Commonsense Prototype (CProto), characterized by a high-quality bounding box and dense points, based on commonsense intuition. CPD then refines low-quality pseudo-labels by leveraging the size prior from CProto, and improves detection accuracy on sparsely scanned objects by using the geometric knowledge from CProto. CPD outperforms state-of-the-art unsupervised 3D detectors on the Waymo Open Dataset (WOD) and KITTI by a large margin.

Environment

conda create -n spconv2 python=3.9
conda activate spconv2
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install numpy==1.19.5 protobuf==3.19.4 scikit-image==0.19.2 waymo-open-dataset-tf-2-5-0 nuscenes-devkit==1.0.5 spconv-cu111 numba scipy pyyaml easydict fire tqdm shapely matplotlib opencv-python addict pyquaternion awscli open3d pandas future pybind11 tensorboardX tensorboard Cython prefetch-generator

Environment we tested:

Ubuntu 18.04
Python 3.9.13
PyTorch 1.8.1
Numba 0.53.1
Spconv 2.1.22 # pip install spconv-cu111
NVIDIA CUDA 11.1
4x 3090 GPUs
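
An existing environment can be compared against the versions listed above by querying the installed package metadata. The sketch below is not part of the CPD codebase: `EXPECTED` and `check_versions` are illustrative names, and the distribution names (`torch`, `numba`, `spconv-cu111`) are assumed to match what pip installed.

```python
from importlib import metadata

# Versions from the tested environment listed above (keys are pip distribution names).
EXPECTED = {
    "torch": "1.8.1",
    "numba": "0.53.1",
    "spconv-cu111": "2.1.22",
}


def check_versions(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except metadata.PackageNotFoundError:
            have = None
        # Local build tags such as "+cu111" are ignored when comparing.
        ok = have is not None and have.split("+")[0] == want
        report[name] = (want, have, ok)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:<14} expected {want:<8} installed {have or 'not found':<14} {status}")
```

A mismatch does not necessarily mean the code will fail; it only shows where the environment differs from the one that was tested.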

Prepare Dataset

Waymo Dataset

  • Please download the official Waymo Open Dataset, including the training data training_0000.tar~training_0031.tar and the validation data validation_0000.tar~validation_0007.tar.
  • Extract all of the above .tar files into data/waymo/raw_data as follows (you should end up with 798 training tfrecord files and 202 validation tfrecord files):
CPD
├── data
│   ├── waymo
│   │   ├── ImageSets
│   │   ├── raw_data
│   │   │   ├── segment-xxxxxxxx.tfrecord
│   │   │   ├── ...
│   │   ├── waymo_processed_data_train_val_test
│   │   │   ├── segment-xxxxxxxx/
│   │   │   ├── ...
│   │   ├── pcdet_waymo_track_dbinfos_train_cp.pkl
│   │   ├── waymo_infos_test.pkl
│   │   ├── waymo_infos_train.pkl
│   │   ├── waymo_infos_val.pkl
├── pcdet
├── tools

Then, generate dataset information:

python3 -m pcdet.datasets.waymo_unsupervised.waymo_unsupervised_dataset --cfg_file tools/cfgs/dataset_configs/waymo_unsupervised/waymo_unsupervised_cproto.yaml
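
Before generating the dataset information, it can be worth confirming that extraction produced the expected 798 + 202 = 1000 tfrecord files. This is a minimal sketch, not part of the codebase. It assumes training and validation segments are extracted together into data/waymo/raw_data and that file names match segment-*.tfrecord. `count_tfrecords` is a hypothetical helper.

```python
from pathlib import Path

EXPECTED_TFRECORDS = 798 + 202  # training + validation segments


def count_tfrecords(raw_dir):
    """Count files matching segment-*.tfrecord directly under raw_dir."""
    return sum(1 for p in Path(raw_dir).glob("segment-*.tfrecord") if p.is_file())


if __name__ == "__main__":
    raw_dir = Path("data/waymo/raw_data")  # path from the directory tree above
    found = count_tfrecords(raw_dir) if raw_dir.is_dir() else 0
    print(f"found {found} of {EXPECTED_TFRECORDS} expected tfrecord files")
```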

KITTI Dataset

  • Please download the official KITTI 3D object detection dataset and organize the downloaded files as follows (the road planes can be downloaded from [road plane]; they are optional and are used for data augmentation during training):
CPD
├── data
│   ├── kitti
│   │   ├── ImageSets
│   │   ├── training
│   │   │   ├── calib & velodyne & label_2 & image_2 & (optional: planes)
│   │   ├── testing
│   │   │   ├── calib & velodyne & image_2
├── pcdet
├── tools

Run the following command to create the dataset infos:

python3 -m pcdet.datasets.kitti.kitti2waymo_dataset create_kitti_infos tools/cfgs/dataset_configs/waymo_unsupervised/kitti2waymo_dataset.yaml
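
The KITTI layout can be checked in the same spirit before creating the infos. This is a small sketch, not part of the codebase: `missing_kitti_dirs` is a hypothetical helper, and the required sub-folders are taken directly from the tree above (planes is optional and therefore not checked).

```python
from pathlib import Path

# Required sub-folders per split, taken from the directory tree above.
REQUIRED = {
    "training": ["calib", "velodyne", "label_2", "image_2"],
    "testing": ["calib", "velodyne", "image_2"],
}


def missing_kitti_dirs(kitti_root):
    """Return the required directories that do not exist under kitti_root."""
    root = Path(kitti_root)
    expected = [root / "ImageSets"]
    for split, subdirs in REQUIRED.items():
        expected.extend(root / split / sub for sub in subdirs)
    return [str(p.relative_to(root)) for p in expected if not p.is_dir()]


if __name__ == "__main__":
    missing = missing_kitti_dirs("data/kitti")
    print("layout looks complete" if not missing else f"missing: {missing}")
```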

Training

Train using scripts

cd tools
sh dist_train.sh {cfg_file}

The log is saved to log.txt. You can run cat log.txt to view the training progress.

Or run the training script directly:

cd tools
python train.py 

Evaluation

cd tools
sh dist_test.sh {cfg_file}

The log is saved to log-test.txt. You can run cat log-test.txt to view the test results.

Model Zoo

Model                 Vehicle 3D AP     Pedestrian 3D AP   Cyclist 3D AP    Download
                      L1      L2        L1      L2         L1      L2
DBSCAN-single-train   2.65    2.29      0       0          0.25    0.20     ---
OYSTER-single-train   7.91    6.78      0.03    0.02       4.65    4.05     oyster_pretrained
CPD                   38.74   33.37     16.53   13.72      4.28    4.13     cpd_pretrained

The IoU thresholds used to evaluate Vehicle, Pedestrian, and Cyclist are 0.7, 0.5, and 0.5, respectively.
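
To make the thresholds concrete, the sketch below computes IoU for axis-aligned 3D boxes and applies the per-class threshold. It is a simplification for illustration only: the official Waymo and KITTI evaluations use rotated (heading-aware) boxes, and `iou_3d_axis_aligned` and `is_true_positive` are hypothetical helper names.

```python
# Per-class IoU thresholds stated above.
IOU_THRESHOLDS = {"Vehicle": 0.7, "Pedestrian": 0.5, "Cyclist": 0.5}


def iou_3d_axis_aligned(a, b):
    """IoU of two boxes given as (cx, cy, cz, length, width, height), no rotation."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis] - a[axis + 3] / 2, b[axis] - b[axis + 3] / 2)
        hi = min(a[axis] + a[axis + 3] / 2, b[axis] + b[axis + 3] / 2)
        inter *= max(0.0, hi - lo)  # zero if the boxes do not overlap on this axis
    vol_a = a[3] * a[4] * a[5]
    vol_b = b[3] * b[4] * b[5]
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, category):
    """A prediction matches a ground-truth box if IoU reaches the class threshold."""
    return iou_3d_axis_aligned(pred, gt) >= IOU_THRESHOLDS[category]


gt = (0.0, 0.0, 0.0, 4.0, 2.0, 1.5)
pred = (0.5, 0.0, 0.0, 4.0, 2.0, 1.5)  # same size, shifted 0.5 m along x
print(round(iou_3d_axis_aligned(pred, gt), 3))  # 0.778
print(is_true_positive(pred, gt, "Vehicle"))    # True
```

In this example a 0.5 m shift on a 4 m long box still clears the 0.7 Vehicle threshold. A larger shift would drop the IoU below it, which shows how strict the vehicle criterion is compared with the 0.5 used for pedestrians and cyclists.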

Citation

@inproceedings{CPD,
    title={Commonsense Prototype for Outdoor Unsupervised 3D Object Detection},
    author={Wu, Hai and Zhao, Shijia and Huang, Xun and Wen, Chenglu and Li, Xin and Wang, Cheng},
    booktitle={CVPR},
    year={2024}
}

cpd's Issues

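Question about the paper

image

Hello, and thank you for your contribution! Are x_k^p and b_k^p swapped here?

Code and data processing problems

Hello, and congratulations on your work!
I am trying to reproduce your paper, mainly on the KITTI dataset, but I seem to have run into a few problems while processing the KITTI data:

  1. Your repository does not seem to contain a pcdet folder, so running setup.py fails. I worked around this by copying the pcdet folder from your CasA code.
  2. In the KITTI dataset processing command, voxel_rcnn_cproto_center_kitti.yaml appears to be a model config rather than a data processing config. However, when I replace it with kitti2waymo_dataset.yaml, processing still fails with: load() missing 1 required positional argument: 'Loader'
  3. I am not sure how your kitti2waymo_dataset.py differs from the original kitti_dataset.py. Should line 458 of kitti2waymo_dataset.py be changed to Kitti2WaymoDataset?
  4. Your paper applies Motion Artifact Removal (MAR) across preceding and following frames. I am not sure how to do this for a dataset like KITTI, which has no preceding or following frames.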
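
The error in item 2 matches a known PyYAML behaviour: since PyYAML 6.0, yaml.load() requires an explicit Loader argument, and an unpinned pip install pyyaml will typically pull in such a version. Whether this is the exact cause in kitti2waymo_dataset.py has not been confirmed by the authors. The sketch below only illustrates the general fix, using a made-up config string.

```python
import yaml

# Hypothetical config text, used only to illustrate the call.
CONFIG_TEXT = """
DATASET: Kitti2WaymoDataset
DATA_PATH: ../data/kitti
"""

# Old style, fails on PyYAML >= 6.0 with
# "load() missing 1 required positional argument: 'Loader'":
#     cfg = yaml.load(CONFIG_TEXT)

# Either pass a loader explicitly ...
cfg = yaml.load(CONFIG_TEXT, Loader=yaml.FullLoader)

# ... or use safe_load, which is sufficient for plain config files.
cfg_safe = yaml.safe_load(CONFIG_TEXT)

print(cfg["DATASET"], cfg == cfg_safe)
```

Pinning an older PyYAML release (below 6.0) would be another way to avoid the error without touching the code.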
