
ponderv2's Introduction

PonderV2: Pave the Way for 3D Foundation Model
with A Universal Pre-training Paradigm

Haoyi Zhu1,4*, Honghui Yang1,3*, Xiaoyang Wu1,2*, Di Huang1*, Sha Zhang1,4, Xianglong He1,
Hengshuang Zhao2, Chunhua Shen3, Yu Qiao1, Tong He1, Wanli Ouyang1

1Shanghai AI Lab, 2HKU, 3ZJU, 4USTC


(Figure: performance radar chart)

This is the official implementation of paper "PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm".

PonderV2 is a comprehensive 3D pre-training framework designed to learn efficient 3D representations and thereby pave the way toward 3D foundation models. It is a novel, universal paradigm that learns point cloud representations via differentiable neural rendering, serving as a bridge between the 3D and 2D worlds.

(Figure: overview of the PonderV2 pre-training pipeline)
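To make the paradigm concrete, the following is a minimal, self-contained sketch of the core idea: encode a (toy, dense) voxelized point cloud into a feature volume, sample that volume along camera rays, and supervise the network by rendering RGB and depth. This is only an illustration under simplifying assumptions; the actual PonderV2 code uses a sparse backbone (SparseUNet) and an SDF-based (NeuS-style) renderer, and every module name and shape below is hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyRenderPretrain(nn.Module):
    """Toy density-based volume renderer on top of a dense 3D encoder (illustrative only)."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.backbone = nn.Conv3d(1, feat_dim, 3, padding=1)   # stand-in for the sparse backbone
        self.rgb_head = nn.Linear(feat_dim, 3)
        self.sigma_head = nn.Linear(feat_dim, 1)

    def forward(self, occupancy, ray_points, ray_depths):
        # occupancy:  (B, 1, D, H, W) voxelized point cloud
        # ray_points: (B, R, S, 3) sample coordinates in [-1, 1], R rays with S steps each
        # ray_depths: (B, R, S, 1) depth of each sample along its ray
        feats = self.backbone(occupancy)                             # (B, C, D, H, W)
        grid = ray_points.unsqueeze(1)                               # (B, 1, R, S, 3)
        sampled = F.grid_sample(feats, grid, align_corners=True)     # (B, C, 1, R, S)
        sampled = sampled.squeeze(2).permute(0, 2, 3, 1)             # (B, R, S, C)
        sigma = F.relu(self.sigma_head(sampled))                     # per-sample density
        rgb = torch.sigmoid(self.rgb_head(sampled))                  # per-sample color
        alpha = 1.0 - torch.exp(-sigma)                              # alpha compositing along rays
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[:, :, :1]), 1.0 - alpha + 1e-10], dim=2),
            dim=2)[:, :, :-1]
        weights = alpha * trans                                      # (B, R, S, 1)
        return (weights * rgb).sum(dim=2), (weights * ray_depths).sum(dim=2)

# Dummy forward/backward pass: the rendered RGB-D is compared against ground-truth
# RGB-D images, and the gradient flows back into the 3D encoder being pre-trained.
model = ToyRenderPretrain()
occ = torch.rand(2, 1, 16, 16, 16)
pts = torch.rand(2, 64, 32, 3) * 2 - 1
depths = torch.linspace(0.1, 4.0, 32).view(1, 1, 32, 1).expand(2, 64, 32, 1)
rgb_pred, depth_pred = model(occ, pts, depths)
loss = F.l1_loss(rgb_pred, torch.rand(2, 64, 3)) + F.l1_loss(depth_pred, torch.rand(2, 64, 1))
loss.backward()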

Important Notes:

  • PonderV2 indoor pre-training configs contained bugs before this commit; please make sure to use the fixed ones.
  • Structured3D RGB-D data preprocessing contained bugs before this commit; please re-generate the processed data if you used the code before that.

News:

  • Dec. 2023: Checkpoint weights are available in model zoo!
  • Dec. 2023: Multi-dataset training is supported! More instructions on installation and usage are available; please check them out!
  • Nov. 2023: Model files are released! Usage instructions, complete code and checkpoints are coming soon!
  • Oct. 2023: PonderV2 is released on arXiv.

Installation

Requirements

  • Ubuntu: 18.04 or higher
  • CUDA: 11.3 or higher
  • PyTorch: 1.10.0 or higher

Conda Environment

conda create -n ponderv2 python=3.8 -y
conda activate ponderv2
# Choose version you want here: https://pytorch.org/get-started/previous-versions/
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch -y
conda install h5py pyyaml -c anaconda -y
conda install sharedarray tensorboard tensorboardx addict einops scipy plyfile termcolor timm -c conda-forge -y
conda install pytorch-cluster pytorch-scatter pytorch-sparse -c pyg -y
pip install torch-geometric yapf==0.40.1 opencv-python open3d==0.10.0 imageio
pip install git+https://github.com/openai/CLIP.git
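# (optional sanity check, not part of the original instructions) confirm that the
# installed PyTorch build can see CUDA before compiling the CUDA extensions below
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"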

# spconv (SparseUNet)
# refer https://github.com/traveller59/spconv
pip install spconv-cu113

# precise eval
cd libs/pointops
# usual
python setup.py install
# docker & multi GPU arch
TORCH_CUDA_ARCH_LIST="ARCH LIST" python setup.py install
# e.g. 7.5: RTX 3000; 8.0: A100. More architectures are listed at: https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST="7.5 8.0" python setup.py install
cd ../..

# NeuS renderer
cd libs/smooth-sampler
# usual
python setup.py install
# docker & multi GPU arch
TORCH_CUDA_ARCH_LIST="ARCH LIST" python setup.py install
# e.g. 7.5: RTX 3000; 8.0: A100. More architectures are listed at: https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST="7.5 8.0" python setup.py install
cd ../..
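
# (optional, not part of the original instructions) verify that the compiled
# extensions import; the module names are taken from the upstream libraries
# and are assumptions rather than something documented here
python -c "import pointops, smooth_sampler; print('CUDA extensions OK')"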

If you want to run instance segmentation downstream tasks with PointGroup, you should also run the following:

conda install -c bioconda google-sparsehash 
cd libs/pointgroup_ops
python setup.py install --include_dirs=${CONDA_PREFIX}/include
cd ../..

Then uncomment the line "# from .point_group import *" in ponder/models/__init__.py.
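
For reference, after uncommenting, the import section of ponder/models/__init__.py should contain the line below (shown only to illustrate the change; the rest of the file is untouched):

# in ponder/models/__init__.py -- PointGroup enabled for instance segmentation
from .point_group import *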

Data Preparation

Please check out docs/data_preparation.md.

Model Zoo

Please check out docs/model_zoo.md.

Quick Start:

  • Pretraining: Pretrain PonderV2 on indoor or outdoor datasets.

Pre-train PonderV2 (indoor) on the ScanNet dataset alone with 8 GPUs:

# -g: number of GPUs
# -d: dataset
# -c: config file, the final config is ./config/${-d}/${-c}.py
# -n: experiment name
bash scripts/train.sh -g 8 -d scannet -c pretrain-ponder-spunet-v1m1-0-base -n ponderv2-pretrain-sc

Pre-train PonderV2 (indoor) on ScanNet, S3DIS and Structured3D datasets using Point Prompt Training (PPT) with 8 GPUs:

bash scripts/train.sh -g 8 -d scannet -c pretrain-ponder-ppt-v1m1-0-sc-s3-st-spunet -n ponderv2-pretrain-sc-s3-st

Pre-train PonderV2 (outdoor) on the nuScenes dataset alone with 4 GPUs:

bash scripts/train.sh -g 4 -d nuscenes -c pretrain-ponder-spunet-v1m1-0-base -n ponderv2-pretrain-nu

  • Finetuning: Finetune on downstream tasks with PonderV2 pre-trained checkpoints.

Finetune PonderV2 on ScanNet semantic segmentation downstream task with PPT:

# -w: path to checkpoint
bash scripts/train.sh -g 8 -d scannet -c semseg-ppt-v1m1-0-sc-s3-st-spunet-lovasz-ft -n ponderv2-semseg-ft -w ${PATH/TO/CHECKPOINT}

Finetune PonderV2 on ScanNet instance segmentation downstream task using PointGroup:

bash scripts/train.sh -g 4 -d scannet -c insseg-ppt-v1m1-0-pointgroup-spunet-ft -n insseg-pointgroup-v1m1-0-spunet-ft -w ${PATH/TO/CHECKPOINT}

  • Testing: Test a finetuned model on a downstream task.

# Based on the experiment folder created by the training script
bash scripts/test.sh -g 8 -d scannet -n ponderv2-semseg-ft -w ${CHECKPOINT/NAME}

You can download our trained checkpoint weights from docs/model_zoo.md.

For more detailed options and examples, please refer to docs/getting_started.md.

For more outdoor pre-training and downstream information, you can also refer to UniPAD.

Todo:

  • add instructions on installation and usage
  • add ScanNet w. RGB-D dataloader and data pre-processing scripts
  • add multi-dataset loader and trainer
  • add multi-dataset point prompt training model
  • add more pre-training and finetuning configs
  • add pre-trained checkpoints

Citation

@article{zhu2023ponderv2,
  title={PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm}, 
  author={Haoyi Zhu and Honghui Yang and Xiaoyang Wu and Di Huang and Sha Zhang and Xianglong He and Tong He and Hengshuang Zhao and Chunhua Shen and Yu Qiao and Wanli Ouyang},
  journal={arXiv preprint arXiv:2310.08586},
  year={2023}
}

@inproceedings{huang2023ponder,
  title={Ponder: Point cloud pre-training via neural rendering},
  author={Huang, Di and Peng, Sida and He, Tong and Yang, Honghui and Zhou, Xiaowei and Ouyang, Wanli},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16089--16098},
  year={2023}
}

@article{yang2023unipad,
  title={UniPAD: A Universal Pre-training Paradigm for Autonomous Driving}, 
  author={Honghui Yang and Sha Zhang and Di Huang and Xiaoyang Wu and Haoyi Zhu and Tong He and Shixiang Tang and Hengshuang Zhao and Qibo Qiu and Binbin Lin and Xiaofei He and Wanli Ouyang},
  journal={arXiv preprint arXiv:2310.08370},
  year={2023},
}

Acknowledgement

This project is mainly built upon several excellent open-source codebases. Thanks for their great work!


ponderv2's Issues

Some question about outdoor rendering results

Thanks for your exciting work! I have recently been learning about outdoor reconstruction, and I'm really interested in your rendering results in Fig. 9. Could you please tell me the real size of an outdoor scene in your paper? It is really surprising to achieve such accurate and detailed rendering results in driving scenes with a resolution of only 180×180×5.

Visualize Code

First, great job! However, when I checked the code, I found no visualization code for segmentation. Is there any plan to release the related code?

Readme for Instance Segmentation

Could you add instructions on how to perform instance segmentation to the README? There are only instructions for semantic segmentation; I tried instance segmentation but failed. Thanks!

Query about the preprocessing of ScanNet++

Hi, Haoyi. Thanks for your great work. Recently, I found that PonderV2 achieves better semantic segmentation performance on ScanNet++. I have also tried to use the official toolkit to process the data, but found that training is very slow in Pointcept. Would it be possible to share the preprocessing code for ScanNet++?

How to use the network for semantic segmentation?

Hi, thank you for publishing this great work. I have a question about testing my own collected point clouds. I want to run semantic segmentation on my own data using a pretrained model (checkpoint file) without fully training on the S3DIS dataset. Is this possible, and how do I use it?
Thank you for your interest and reply.

Feature extraction for point clouds

Hi, thanks for your great work. I am trying to use your PonderV2 as a frozen feature extractor for new 3D point clouds, like how people use CLIP/DINOv2 for images.

Could you share some simple scripts or ideas about how we can achieve this with your codebase? That would be very helpful.
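
As a rough, generic sketch of the frozen-extractor pattern in plain PyTorch (the backbone class, checkpoint path, and input format below are placeholders, not the actual PonderV2 API; in practice you would instantiate the sparse 3D encoder from this repo's configs and load weights from docs/model_zoo.md):

import torch
import torch.nn as nn

class PlaceholderBackbone(nn.Module):
    """Stand-in for a PonderV2 point-cloud encoder (hypothetical, for illustration)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 96))

    def forward(self, points):              # points: (N, 6) xyz + rgb
        return self.net(points)             # (N, 96) per-point features

backbone = PlaceholderBackbone()
# state = torch.load("path/to/ponderv2_checkpoint.pth", map_location="cpu")
# backbone.load_state_dict(state.get("state_dict", state), strict=False)
for p in backbone.parameters():
    p.requires_grad_(False)                 # freeze, CLIP/DINOv2-style
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.randn(1024, 6))  # dummy point cloud -> frozen features
print(feats.shape)                          # torch.Size([1024, 96])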

Change method for rendering decoder

Hello!
I'm trying to test different rendering methods for the rendering decoder, but I'm not sure which part of this repo I should look at. Could you please give me some suggestions?

Question about the experiment folder created by the training script

Hello, thank you for your great work! I have a collection of point cloud data that I want to test using your pre-trained model checkpoint. Can I use it without training on the S3DIS dataset?

I'm a bit confused because, from reading the testing script, there is an experiment folder that I couldn't find anywhere except by training on S3DIS myself. How can I test my own data without training on the whole S3DIS dataset first?
(screenshot omitted)

Your guidance is very much needed, thank you so much.

Training w/ default config diverges and ultimately crashes: NaN or Inf found in input tensor

Hi! Thank you for this work and for the well-documented code release!

I am trying to reproduce the pretraining results, and I am seeing training diverge quite quickly, within 5 batches when using the default config. Is this expected, and how have you dealt with it in the past?

In particular, I see some losses become NaN intermittently:

[2024-02-14 04:57:41,686 INFO misc.py line 120 1075918] Train: [2/100][82/150] Data 0.214 (0.115) Batch 28.240 (31.931) Remain 130:59:20 loss: nan depth_loss: nan rgb_loss: nan psnr: nan semantic_loss: 0.8146 free_space_loss: nan sdf_loss: nan eikonal_loss: nan Lr: 0.00009
NaN or Inf found in input tensor.
...
NaN or Inf found in input tensor.
[2024-02-14 04:58:19,912 INFO misc.py line 120 1075918] Train: [2/100][83/150] Data 0.025 (0.114) Batch 38.136 (32.009) Remain 131:17:54 loss: 3.1007 depth_loss: 0.1920 rgb_loss: 0.9712 psnr: 14.9213 semantic_loss: 0.8159 free_space_loss: 0.1176 sdf_loss: 0.7030 eikonal_loss: 0.3010 Lr: 0.00009

For reference, I am running the following command to pretrain on just ScanNet using 8 V100s:

bash scripts/train.sh -g 8 -d scannet -c pretrain-ponder-spunet-v1m1-0-base -n ponderv2-pretrain-sn-base

I was able to reproduce other results outside of pretraining: that is, I have successfully evaluated the pretrained weights ponderv2-ppt-ft-semseg-scannet.pth, and finetuned ponderv2-ppt-pretrain-scannet-s3dis-structured3d.pth to 0.77 mIoU on ScanNet semantic segmentation.

Share ScanNet skip.lst

I believe that ScanNet has a few frames where the pose matrices are all -inf, and others where the depth is all missing. In both cases, the Ponder pretraining code ends up throwing an error, and I'm guessing that is why you added the skip.lst file for the scannet.py dataloader.

Would it be possible to share this skip list, since it is not included in the repo? I want to make sure that I am training on the same data that you did :)

Thank you!
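
For reference, here is a hedged sketch of how such a skip list could be generated from the symptoms described above (non-finite camera poses or fully missing depth). The directory layout, file naming, and skip.lst format are assumptions for illustration, not the repo's actual preprocessing convention:

import os
import numpy as np
import imageio

def find_bad_frames(scene_dir):
    """Return frame ids whose camera pose is non-finite or whose depth map is empty."""
    bad = []
    pose_dir = os.path.join(scene_dir, "pose")
    depth_dir = os.path.join(scene_dir, "depth")
    for name in sorted(os.listdir(pose_dir)):
        frame_id = os.path.splitext(name)[0]
        pose = np.loadtxt(os.path.join(pose_dir, name))                  # 4x4 camera-to-world
        depth = imageio.imread(os.path.join(depth_dir, frame_id + ".png"))
        if not np.isfinite(pose).all() or not (depth > 0).any():
            bad.append(frame_id)
    return bad

# Example usage (paths and output format assumed):
# with open("skip.lst", "w") as f:
#     for fid in find_bad_frames("scans/scene0000_00"):
#         f.write("scene0000_00 " + fid + "\n")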
