
ponderv2's Introduction

PonderV2: Pave the Way for 3D Foundation Model
with A Universal Pre-training Paradigm

Haoyi Zhu1,4*, Honghui Yang1,3*, Xiaoyang Wu1,2*, Di Huang1*, Sha Zhang1,4, Xianglong He1,
Hengshuang Zhao2, Chunhua Shen3, Yu Qiao1, Tong He1, Wanli Ouyang1

1Shanghai AI Lab, 2HKU, 3ZJU, 4USTC


(Figure: performance radar chart)

This is the official implementation of paper "PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm".

PonderV2 is a comprehensive 3D pre-training framework designed to learn efficient 3D representations and thereby pave the way toward 3D foundation models. It is a novel, universal paradigm that learns point cloud representations via differentiable neural rendering, serving as a bridge between the 3D and 2D worlds.

(Figure: overview of the PonderV2 pre-training pipeline)
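To make the paradigm concrete, the following is a minimal, self-contained sketch of the core idea: encode a (toy, dense) voxelized point cloud into a feature volume, sample that volume along camera rays, and supervise the network by rendering RGB and depth. This is only an illustration under simplifying assumptions; the actual PonderV2 code uses a sparse backbone (SparseUNet) and an SDF-based (NeuS-style) renderer, and every module name and shape below is hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyRenderPretrain(nn.Module):
    """Toy density-based volume renderer on top of a dense 3D encoder (illustrative only)."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.backbone = nn.Conv3d(1, feat_dim, 3, padding=1)   # stand-in for the sparse backbone
        self.rgb_head = nn.Linear(feat_dim, 3)
        self.sigma_head = nn.Linear(feat_dim, 1)

    def forward(self, occupancy, ray_points, ray_depths):
        # occupancy:  (B, 1, D, H, W) voxelized point cloud
        # ray_points: (B, R, S, 3) sample coordinates in [-1, 1], R rays with S steps each
        # ray_depths: (B, R, S, 1) depth of each sample along its ray
        feats = self.backbone(occupancy)                             # (B, C, D, H, W)
        grid = ray_points.unsqueeze(1)                               # (B, 1, R, S, 3)
        sampled = F.grid_sample(feats, grid, align_corners=True)     # (B, C, 1, R, S)
        sampled = sampled.squeeze(2).permute(0, 2, 3, 1)             # (B, R, S, C)
        sigma = F.relu(self.sigma_head(sampled))                     # per-sample density
        rgb = torch.sigmoid(self.rgb_head(sampled))                  # per-sample color
        alpha = 1.0 - torch.exp(-sigma)                              # alpha compositing along rays
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[:, :, :1]), 1.0 - alpha + 1e-10], dim=2),
            dim=2)[:, :, :-1]
        weights = alpha * trans                                      # (B, R, S, 1)
        return (weights * rgb).sum(dim=2), (weights * ray_depths).sum(dim=2)

# Dummy forward/backward pass: the rendered RGB-D is compared against ground-truth
# RGB-D images, and the gradient flows back into the 3D encoder being pre-trained.
model = ToyRenderPretrain()
occ = torch.rand(2, 1, 16, 16, 16)
pts = torch.rand(2, 64, 32, 3) * 2 - 1
depths = torch.linspace(0.1, 4.0, 32).view(1, 1, 32, 1).expand(2, 64, 32, 1)
rgb_pred, depth_pred = model(occ, pts, depths)
loss = F.l1_loss(rgb_pred, torch.rand(2, 64, 3)) + F.l1_loss(depth_pred, torch.rand(2, 64, 1))
loss.backward()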

Important Notes:

  • PonderV2 indoor pre-training configs contained bugs before this commit; please make sure to use the fixed ones.
  • Structured3D RGB-D data preprocessing contained bugs before this commit; please re-generate the processed data if you used the code before that.

News:

  • Dec. 2023: Checkpoint weights are available in model zoo!
  • Dec. 2023: Multi-dataset training is supported! More instructions on installation and usage are available; please check them out!
  • Nov. 2023: Model files are released! Usage instructions, complete code and checkpoints are coming soon!
  • Oct. 2023: PonderV2 is released on arXiv.

Installation

Requirements

  • Ubuntu: 18.04 or higher
  • CUDA: 11.3 or higher
  • PyTorch: 1.10.0 or higher

Conda Environment

conda create -n ponderv2 python=3.8 -y
conda activate ponderv2
# Choose version you want here: https://pytorch.org/get-started/previous-versions/
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch -y
conda install h5py pyyaml -c anaconda -y
conda install sharedarray tensorboard tensorboardx addict einops scipy plyfile termcolor timm -c conda-forge -y
conda install pytorch-cluster pytorch-scatter pytorch-sparse -c pyg -y
pip install torch-geometric yapf==0.40.1 opencv-python open3d==0.10.0 imageio
pip install git+https://github.com/openai/CLIP.git
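# (optional sanity check, not part of the original instructions) confirm that the
# installed PyTorch build can see CUDA before compiling the CUDA extensions below
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"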

# spconv (SparseUNet)
# refer https://github.com/traveller59/spconv
pip install spconv-cu113

# precise eval
cd libs/pointops
# usual
python setup.py install
# docker & multi GPU arch
TORCH_CUDA_ARCH_LIST="ARCH LIST" python setup.py install
# e.g. 7.5: RTX 3000; 8.0: A100. More architectures are listed at: https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST="7.5 8.0" python setup.py install
cd ../..

# NeuS renderer
cd libs/smooth-sampler
# usual
python setup.py install
# docker & multi GPU arch
TORCH_CUDA_ARCH_LIST="ARCH LIST" python setup.py install
# e.g. 7.5: RTX 3000; 8.0: A100. More architectures are listed at: https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST="7.5 8.0" python setup.py install
cd ../..
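
# (optional, not part of the original instructions) verify that the compiled
# extensions import; the module names are taken from the upstream libraries
# and are assumptions rather than something documented here
python -c "import pointops, smooth_sampler; print('CUDA extensions OK')"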

If you want to run instance segmentation downstream tasks with PointGroup, you should also run the following:

conda install -c bioconda google-sparsehash 
cd libs/pointgroup_ops
python setup.py install --include_dirs=${CONDA_PREFIX}/include
cd ../..

Then uncomment the line "# from .point_group import *" in ponder/models/__init__.py.
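
For reference, after uncommenting, the import section of ponder/models/__init__.py should contain the line below (shown only to illustrate the change; the rest of the file is untouched):

# in ponder/models/__init__.py -- PointGroup enabled for instance segmentation
from .point_group import *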

Data Preparation

Please check out docs/data_preparation.md.

Model Zoo

Please check out docs/model_zoo.md.

Quick Start:

  • Pretraining: Pretrain PonderV2 on indoor or outdoor datasets.

Pre-train PonderV2 (indoor) on the ScanNet dataset alone with 8 GPUs:

# -g: number of GPUs
# -d: dataset
# -c: config file, the final config is ./config/${-d}/${-c}.py
# -n: experiment name
bash scripts/train.sh -g 8 -d scannet -c pretrain-ponder-spunet-v1m1-0-base -n ponderv2-pretrain-sc

Pre-train PonderV2 (indoor) on ScanNet, S3DIS and Structured3D datasets using Point Prompt Training (PPT) with 8 GPUs:

bash scripts/train.sh -g 8 -d scannet -c pretrain-ponder-ppt-v1m1-0-sc-s3-st-spunet -n ponderv2-pretrain-sc-s3-st

Pre-train PonderV2 (outdoor) on the nuScenes dataset alone with 4 GPUs:

bash scripts/train.sh -g 4 -d nuscenes -c pretrain-ponder-spunet-v1m1-0-base -n ponderv2-pretrain-nu

  • Finetuning: Finetune on downstream tasks with PonderV2 pre-trained checkpoints.

Finetune PonderV2 on ScanNet semantic segmentation downstream task with PPT:

# -w: path to checkpoint
bash scripts/train.sh -g 8 -d scannet -c semseg-ppt-v1m1-0-sc-s3-st-spunet-lovasz-ft -n ponderv2-semseg-ft -w ${PATH/TO/CHECKPOINT}

Finetune PonderV2 on ScanNet instance segmentation downstream task using PointGroup:

bash scripts/train.sh -g 4 -d scannet -c insseg-ppt-v1m1-0-pointgroup-spunet-ft -n insseg-pointgroup-v1m1-0-spunet-ft -w ${PATH/TO/CHECKPOINT}

  • Testing: Test a finetuned model on a downstream task.

# Based on the experiment folder created by the training script
bash scripts/test.sh -g 8 -d scannet -n ponderv2-semseg-ft -w ${CHECKPOINT/NAME}

You can download our trained checkpoint weights from docs/model_zoo.md.

For more detailed options and examples, please refer to docs/getting_started.md.

For more outdoor pre-training and downstream information, you can also refer to UniPAD.

Todo:

  • add instructions on installation and usage
  • add ScanNet w. RGB-D dataloader and data pre-processing scripts
  • add multi-dataset loader and trainer
  • add multi-dataset point prompt training model
  • add more pre-training and finetuning configs
  • add pre-trained checkpoints

Citation

@article{zhu2023ponderv2,
  title={PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm}, 
  author={Haoyi Zhu and Honghui Yang and Xiaoyang Wu and Di Huang and Sha Zhang and Xianglong He and Tong He and Hengshuang Zhao and Chunhua Shen and Yu Qiao and Wanli Ouyang},
  journal={arXiv preprint arXiv:2310.08586},
  year={2023}
}

@inproceedings{huang2023ponder,
  title={Ponder: Point cloud pre-training via neural rendering},
  author={Huang, Di and Peng, Sida and He, Tong and Yang, Honghui and Zhou, Xiaowei and Ouyang, Wanli},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16089--16098},
  year={2023}
}

@article{yang2023unipad,
  title={UniPAD: A Universal Pre-training Paradigm for Autonomous Driving}, 
  author={Honghui Yang and Sha Zhang and Di Huang and Xiaoyang Wu and Haoyi Zhu and Tong He and Shixiang Tang and Hengshuang Zhao and Qibo Qiu and Binbin Lin and Xiaofei He and Wanli Ouyang},
  journal={arXiv preprint arXiv:2310.08370},
  year={2023},
}

Acknowledgement

This project is mainly built upon several excellent open-source codebases. Thanks for their great work!


ponderv2's Issues

Some question about outdoor rendering results

Thanks for your exciting work! I have recently been learning about outdoor reconstruction, and I'm really interested in your rendering results in Fig. 9. Could you please tell me the real size of an outdoor scene in your paper? It is really surprising to achieve such accurate and detailed rendering results in driving scenes with a resolution of only 180×180×5.

Visualize Code

First, great job! However, when I checked the code, I found no visualization code for segmentation. Is there any plan to release the related code?

Readme for Instance Segmentation

Could you add instructions on how to perform instance segmentation to the README? There are only instructions for semantic segmentation; I tried instance segmentation but failed. Thanks!

Query about the preprocessing of ScanNet++

Hi, Haoyi. Thanks for your great work. Recently, I found that PonderV2 achieves better semantic segmentation performance on ScanNet++. I have also tried to use the official toolkit to process the data, but found that training is very slow in Pointcept. Would it be possible to share the preprocessing code for ScanNet++?

How to use the network for semantic segmentation?

Hi, thank you for publishing this great work. I have a question about testing my own collected point clouds. I want to run semantic segmentation on my own data using a pretrained model (checkpoint file) without fully training on the S3DIS dataset. Is this possible, and how do I use it?
Thank you for your interest and reply.

Feature extraction for point clouds

Hi, thanks for your great work. I am trying to use your PonderV2 as a frozen feature extractor for new 3D point clouds, like how people use CLIP/DINOv2 for images.

Could you share some simple scripts or ideas about how we can achieve this with your codebase? That would be very helpful.
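
As a rough, generic sketch of the frozen-extractor pattern in plain PyTorch (the backbone class, checkpoint path, and input format below are placeholders, not the actual PonderV2 API; in practice you would instantiate the sparse 3D encoder from this repo's configs and load weights from docs/model_zoo.md):

import torch
import torch.nn as nn

class PlaceholderBackbone(nn.Module):
    """Stand-in for a PonderV2 point-cloud encoder (hypothetical, for illustration)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 96))

    def forward(self, points):              # points: (N, 6) xyz + rgb
        return self.net(points)             # (N, 96) per-point features

backbone = PlaceholderBackbone()
# state = torch.load("path/to/ponderv2_checkpoint.pth", map_location="cpu")
# backbone.load_state_dict(state.get("state_dict", state), strict=False)
for p in backbone.parameters():
    p.requires_grad_(False)                 # freeze, CLIP/DINOv2-style
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.randn(1024, 6))  # dummy point cloud -> frozen features
print(feats.shape)                          # torch.Size([1024, 96])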

Change method for rendering decoder

Hello!
I'm trying to test different rendering methods for the rendering decoder, but I'm not sure which part of this repo I should look at. Could you please give me some suggestions?

Question about the experiment folder created by the training script

Hello, thank you for your great work! I have a collection of point cloud data that I want to test using your pre-trained model checkpoint. Can I use it without training on the S3DIS dataset?

I'm a bit confused because, from reading the testing script, there is an experiment folder that I couldn't find anywhere except by training on S3DIS myself. How can I test my own data without training on the whole S3DIS dataset first?
(screenshot omitted)

Your guidance is very much needed, thank you so much.

Training w/ default config diverges and ultimately crashes: NaN or Inf found in input tensor

Hi! Thank you for this work and for the well-documented code release!

I am trying to reproduce the pretraining results, and I am seeing training diverge quite quickly, within 5 batches when using the default config. Is this expected, and how have you dealt with it in the past?

In particular, I see some losses become NaN intermittently:

[2024-02-14 04:57:41,686 INFO misc.py line 120 1075918] Train: [2/100][82/150] Data 0.214 (0.115) Batch 28.240 (31.931) Remain 130:59:20 loss: nan depth_loss: nan rgb_loss: nan psnr: nan semantic_loss: 0.8146 free_space_loss: nan sdf_loss: nan eikonal_loss: nan Lr: 0.00009
NaN or Inf found in input tensor.
...
NaN or Inf found in input tensor.
[2024-02-14 04:58:19,912 INFO misc.py line 120 1075918] Train: [2/100][83/150] Data 0.025 (0.114) Batch 38.136 (32.009) Remain 131:17:54 loss: 3.1007 depth_loss: 0.1920 rgb_loss: 0.9712 psnr: 14.9213 semantic_loss: 0.8159 free_space_loss: 0.1176 sdf_loss: 0.7030 eikonal_loss: 0.3010 Lr: 0.00009

For reference, I am running the following command to pretrain on just ScanNet using 8 V100s:

bash scripts/train.sh -g 8 -d scannet -c pretrain-ponder-spunet-v1m1-0-base -n ponderv2-pretrain-sn-base

I was able to reproduce other results outside of pretraining: that is, I have successfully evaluated the pretrained weights ponderv2-ppt-ft-semseg-scannet.pth, and finetuned ponderv2-ppt-pretrain-scannet-s3dis-structured3d.pth to 0.77 mIoU on ScanNet semantic segmentation.

Share ScanNet skip.lst

I believe that ScanNet has a few frames where the pose matrices are all -inf, and others where the depth is all missing. In both cases, the Ponder pretraining code ends up throwing an error, and I'm guessing that is why you added the skip.lst file for the scannet.py dataloader.

Would it be possible to share this skip list, since it is not included in the repo? I want to make sure that I am training on the same data that you did :)

Thank you!
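
For reference, here is a hedged sketch of how such a skip list could be generated from the symptoms described above (non-finite camera poses or fully missing depth). The directory layout, file naming, and skip.lst format are assumptions for illustration, not the repo's actual preprocessing convention:

import os
import numpy as np
import imageio

def find_bad_frames(scene_dir):
    """Return frame ids whose camera pose is non-finite or whose depth map is empty."""
    bad = []
    pose_dir = os.path.join(scene_dir, "pose")
    depth_dir = os.path.join(scene_dir, "depth")
    for name in sorted(os.listdir(pose_dir)):
        frame_id = os.path.splitext(name)[0]
        pose = np.loadtxt(os.path.join(pose_dir, name))                  # 4x4 camera-to-world
        depth = imageio.imread(os.path.join(depth_dir, frame_id + ".png"))
        if not np.isfinite(pose).all() or not (depth > 0).any():
            bad.append(frame_id)
    return bad

# Example usage (paths and output format assumed):
# with open("skip.lst", "w") as f:
#     for fid in find_bad_frames("scans/scene0000_00"):
#         f.write("scene0000_00 " + fid + "\n")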
