mohamedafham / crosspoint

Official implementation of "CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding" (CVPR, 2022)

Home Page: https://mohamedafham.github.io/CrossPoint/

self-supervised-learning 3d-point-clouds cross-modal-learning transfer-learning unsupervised-learning point-cloud few-shot-learning deep-learning object-classification

crosspoint's Introduction

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding (CVPR'22)

Paper Link | Project Page

Citation

If you find our work, this repository, or the pretrained models useful, please consider giving the repository a star ⭐ and citing our paper.

@InProceedings{Afham_2022_CVPR,
    author    = {Afham, Mohamed and Dissanayake, Isuru and Dissanayake, Dinithi and Dharmasiri, Amaya and Thilakarathna, Kanchana and Rodrigo, Ranga},
    title     = {CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {9902-9912}
}

🚀 News

(Mar 25, 2023)
- An implementation supporting PyTorch DistributedDataParallel (DDP) is available here. Thanks to Jerry Sun.
(Mar 2, 2022)
- Paper accepted at CVPR 2022 🎉
(Mar 2, 2022)
- Training and evaluation code for CrossPoint, along with pretrained models, has been released.

Dependencies

Refer to requirements.txt for the required packages.

Pretrained Models

CrossPoint pretrained models with the DGCNN feature extractor are available here.

Download data

Datasets are available here. Run the command below to download all the datasets (ShapeNetRender, ModelNet40, ScanObjectNN, ShapeNetPart) to reproduce the results.

cd data
source download_data.sh

Train CrossPoint

Refer to scripts/script.sh for the commands to train CrossPoint.

Downstream Tasks

1. 3D Object Classification

Run the eval_ssl.ipynb notebook to perform linear SVM object classification on both the ModelNet40 and ScanObjectNN datasets.
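As a rough illustration of the linear SVM protocol, the sketch below fits scikit-learn's SVC with a linear kernel on feature vectors and scores it on a held-out split. The synthetic, well-separated features stand in for embeddings from the frozen encoder. The function names, feature dimension, and C value are assumptions made for illustration and are not taken from eval_ssl.ipynb.

```python
import numpy as np
from sklearn.svm import SVC


def linear_svm_accuracy(train_feats, train_labels, test_feats, test_labels, c=1.0):
    """Fit a linear SVM on fixed features and return the test accuracy."""
    clf = SVC(C=c, kernel="linear")
    clf.fit(train_feats, train_labels)
    return clf.score(test_feats, test_labels)


def make_fake_features(rng, n_per_class, dim=8):
    """Hypothetical stand-in for encoder embeddings: two Gaussian blobs."""
    feats0 = rng.normal(0.0, 1.0, size=(n_per_class, dim))
    feats1 = rng.normal(5.0, 1.0, size=(n_per_class, dim))
    feats = np.concatenate([feats0, feats1])
    labels = np.array([0] * n_per_class + [1] * n_per_class)
    return feats, labels


rng = np.random.default_rng(0)
train_x, train_y = make_fake_features(rng, 50)
test_x, test_y = make_fake_features(rng, 20)
accuracy = linear_svm_accuracy(train_x, train_y, test_x, test_y)
print(f"linear SVM accuracy: {accuracy:.3f}")
```

In an actual evaluation, the two feature matrices would come from running the pretrained point cloud encoder over the train and test splits with gradients disabled.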

2. Few-Shot Object Classification

Refer to scripts/fsl_script.sh to perform few-shot object classification.
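For readers unfamiliar with the N-way K-shot setting, here is a minimal sketch of how a single few-shot episode can be sampled from a labelled dataset. It is a generic illustration and not the logic of scripts/fsl_script.sh; the function name, the n_query parameter, and the toy label list are assumptions.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Pick n_way classes, then k_shot support and n_query query indices per class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    # Only classes with enough samples can take part in an episode.
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query)
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support.extend(picked[:k_shot])
        query.extend(picked[k_shot:])
    return classes, support, query


# Toy dataset: 10 classes with 40 samples each (hypothetical).
labels = [i % 10 for i in range(400)]
classes, support, query = sample_episode(labels, n_way=5, k_shot=10, n_query=20,
                                         rng=random.Random(0))
print(len(classes), len(support), len(query))
```

A classifier is then fitted on the support features and scored on the query features, and the accuracy is averaged over many such episodes.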

3. 3D Object Part Segmentation

Refer to scripts/script.sh for the fine-tuning experiment for part segmentation on the ShapeNetPart dataset.

Acknowledgements

Our code borrows heavily from the DGCNN repository. We thank the authors of DGCNN for releasing their code. If you use our model, please consider citing them as well.

crosspoint's Issues

The distributed implementation of CrossPoint is available here!

Please refer to this project!

Best regards

About the pointcloud visualization software in Fig.2

Hi, Mohamed Afham!
I really appreciate your great work! And I think the figures in your paper are wonderful!
Could you please tell me which point cloud visualization software was used for Figure 2? It looks nice!
Thanks in advance!

Can you provide pointnet as a feature extractor to train your code? Thank you very much.

I followed the PointNet source code that you referenced, but the training results were poor, probably because the parameters are set differently from those for DGCNN.

Please provide the part of the code that trains pointnet as a feature extractor. Thank you very much.
It means a lot to me. Thank you!

distributed training for CrossPoint

@MohamedAfham I have successfully integrated the PyTorch DistributedDataParallel mechanism into your codebase. It accelerates training remarkably and achieves performance similar to that reported in the paper.

I would like to open a pull request to your repository later. Thank you.

What's the GPU device used during your training and fine-tuning?

As the title says, I wonder which GPU you used to support batch_size=20.

I use an RTX 2080 Ti with 11 GB of memory. When running train_crosspoint.py, I have to set batch_size=2 to avoid CUDA out-of-memory errors, since knn and torch.cat in models/dgcnn.py consume a large portion of memory.

However, the small batch_size makes training much slower, so I will probably get the final results only after 4 or 5 days.

By the way, I have multiple GPUs. Is it possible to incorporate DistributedDataParallel to accelerate training?

Anyway, I will try it out!
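To put the memory concern in numbers, the back-of-the-envelope sketch below estimates the size of two float32 tensors that a DGCNN-style layer creates: the pairwise distance matrix used by knn and the concatenated edge feature produced with torch.cat. batch_size=20 and k=15 come from the training command in scripts/script.sh; the 2048 points per cloud and 64 input channels are assumptions, and the helper name is hypothetical.

```python
def tensor_mib(*shape, bytes_per_element=4):
    """Size in MiB of a dense tensor with the given shape (float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element / 2**20


batch_size, num_points, k, channels = 20, 2048, 15, 64  # num_points, channels assumed

# knn builds a (batch, points, points) distance matrix.
pairwise = tensor_mib(batch_size, num_points, num_points)
# The edge feature concatenates two tensors along channels: (batch, 2*C, points, k).
graph_feature = tensor_mib(batch_size, 2 * channels, num_points, k)

print(f"pairwise distances: {pairwise:.0f} MiB")
print(f"edge feature:       {graph_feature:.0f} MiB")
```

These figures cover a single layer's forward pass only. Activations kept for the backward pass, several such layers, and the image branch all add to the total, which is consistent with an 11 GB card running out of memory at batch_size=20.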

what variant do we use in few-shot learning on ScanObjectNN?

Hi, thank you for sharing such excellent results

I would like to ask which variant is used for few-shot learning on ScanObjectNN:

OBJ_ONLY
OBJ_BG
PB_T25
PB_T25_R
PB_T50_R
PB_T50_RS

Looking forward to your response, thank you.

Downstream tasks 3D Object classification

Thanks for your great work!
I'm confused about why you fit a simple linear SVM classifier on the train split of the classification datasets for 3D object classification. Where can I find the corresponding code?

About PointNetRendering Dataset

I think the view metrics of the rendered images are described in your rendering_metadata.txt. Could you specify the meaning of each metric? Thanks.

What's the specific environment of this code?

Hi @MohamedAfham

What's the specific environment of this code? Can this code run with CUDA 11.1 and PyTorch 1.8.0? Thanks.

CrossPoint pretrained models with pointnet feature extractor

Thank you for your excellent work. Can you provide pretrained models with a PointNet feature extractor?

Can't download the dataset using gdown

Running download_data.sh raises the following error:
requests.exceptions.MissingSchema: Invalid URL '': No scheme supplied. Perhaps you meant http://?

How to use gdown to download the dataset?
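The error message shows that an empty string reached the HTTP layer as the URL. One plausible cause, stated here as an assumption, is that the Google Drive link or file ID was not resolved before the download call. The sketch below builds a download URL of the `uc?id=` form that gdown accepts and rejects empty IDs early. The helper names are hypothetical and FILE_ID is a placeholder, not a real dataset ID.

```python
from urllib.parse import urlparse


def drive_download_url(file_id):
    """Build a Google Drive download URL, failing early on an empty file ID."""
    file_id = (file_id or "").strip()
    if not file_id:
        raise ValueError("empty Google Drive file ID; check the download script")
    return f"https://drive.google.com/uc?id={file_id}"


def has_scheme(url):
    """True if the URL carries an http(s) scheme, which requests requires."""
    return urlparse(url).scheme in ("http", "https")


url = drive_download_url("FILE_ID")  # placeholder ID
print(url, has_scheme(url))
```

Checking the URL this way before handing it to the downloader turns the MissingSchema traceback into a clearer message about which ID is missing.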

How did you get the 2D images corresponding to the ModelNet40 and ScanObjectNN point cloud data? Can you provide eval_ssl.ipynb as a .py file?

Hello, dear author! How did you get the 2D images corresponding to the ModelNet40 and ScanObjectNN point cloud data? The content of eval_ssl.ipynb is hard to follow. Could you provide the equivalent .py script?

Availability of checkpoint

Is it possible to get the checkpoints of both models (3D and 2D) so that fine-tuning can start from the pretrained models?

How did you use the t-sne visualization feature and can you provide the source code?

Is batch_size=20 enough?

I'm curious whether batch_size=20 is enough for training. Should I try batch_size=128 or 256?

RuntimeError: CUDA error: invalid device ordinal

Hi @MohamedAfham

Have you ever encountered this error before? Thanks a lot.

Using GPU : 0 from 1 devices
Use Adam
Start training epoch: (0/100)
/export/home/hanxiaobing/anaconda3/envs/crosspoint/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Traceback (most recent call last):
  File "train_crosspoint.py", line 261, in <module>
    train(args, io)
  File "train_crosspoint.py", line 103, in train
    _, point_feats, _ = point_model(data)
  File "/export/home/hanxiaobing/anaconda3/envs/crosspoint/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/export/home/hanxiaobing/Documents/PlaneNet_PlaneRCNN/DGCNN_PointNet2/SensatUrban/MAE/CrossPoint/models/dgcnn.py", line 95, in forward
    x = get_graph_feature(x, k=self.k)
  File "/export/home/hanxiaobing/Documents/PlaneNet_PlaneRCNN/DGCNN_PointNet2/SensatUrban/MAE/CrossPoint/models/dgcnn.py", line 29, in get_graph_feature
    idx_base = torch.arange(0, batch_size, device=device).view(-1, 1, 1)*num_points
RuntimeError: CUDA error: invalid device ordinal

Definition of the dgcnn_seg model

Thanks for your nice work. I am trying to reproduce your fine-tuning results on ShapeNetPart segmentation. I find that the model architectures used in pre-training for classification and for segmentation are different. More specifically, the dgcnn model is adopted in pre-training for classification, while dgcnn_seg is used in pre-training for part segmentation, as shown in the following:

CrossPoint/scripts/script.sh

Line 6 in 364987e

 python train_crosspoint.py --model dgcnn_seg --epochs 100 --lr 0.001 --exp_name crosspoint_dgcnn_seg --batch_size 20 --print_freq 200 --k 15 

However, I cannot find the definition of dgcnn_seg in your model library. I guess dgcnn_seg should be the DGCNN_partseg model with pretrain=True, right?

In addition, other papers, such as OcCo, may adopt the same architecture in pre-training for both classification and part segmentation. Such a difference may lead to an unfair comparison. What's your opinion?

The provided pretrained model seems to show a gap on ModelNet40

Hi, I used your pretrained model to directly test linear accuracy on ModelNet40 and got 90.27%, the same as when I ran train_crosspoint.py myself without any initialization. However, the paper reports 91.2%. Are there any tricks in your code, or should I continue training from your pretrained model? I look forward to your answer.

get_graph_feature adds tensors on two different devices

Thank you for your contributions. Very interesting work! I have two GPUs and I'm trying to run CrossPoint pre-training for classification using:

python train_crosspoint.py --model dgcnn --epochs 100 --lr 0.001 --exp_name crosspoint_dgcnn_cls --batch_size 20 --print_freq 200 --k 15

And I'm getting the following error:

Traceback (most recent call last):
  File "train_crosspoint.py", line 258, in <module>
    train(args, io)
  File "train_crosspoint.py", line 100, in train
    _, point_feats, _ = point_model(data)
  File "/home/nas/anaconda3/envs/crosspoint/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nas/Desktop/CrossPoint/models/dgcnn.py", line 95, in forward
    x = get_graph_feature(x, k=self.k)
  File "/home/nas/Desktop/CrossPoint/models/dgcnn.py", line 31, in get_graph_feature
    idx = idx + idx_base
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Is there a reason for hardcoding cuda device to 1 here: https://github.com/MohamedAfham/CrossPoint/blob/440e3bdf1656014eb4284786a6b2bcdf83e8df30/models/dgcnn.py#L27
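Both this traceback and the "invalid device ordinal" one above point at the line that builds idx_base inside get_graph_feature. A common way to avoid device mismatches of this kind is to derive the device from the input tensor instead of naming a GPU index. The sketch below isolates that index-offset step; the function name is hypothetical, and this is an illustration of the pattern rather than the repository's actual fix.

```python
import torch


def flatten_neighbor_indices(idx, num_points):
    """Turn per-cloud kNN indices of shape (batch, points, k) into flat indices.

    idx_base is created on idx.device, so the addition never mixes devices
    and never refers to a GPU ordinal that does not exist on the machine.
    """
    batch_size = idx.size(0)
    idx_base = torch.arange(batch_size, device=idx.device).view(-1, 1, 1) * num_points
    return (idx + idx_base).view(-1)


# Toy check on CPU: every neighbour index is 0, so the flat indices are just
# the per-cloud offsets 0 and 3.
idx = torch.zeros(2, 3, 2, dtype=torch.long)
flat = flatten_neighbor_indices(idx, num_points=3)
print(flat.tolist())
```

With this pattern, the same code runs unchanged on CPU, on a single GPU, or on whichever GPU the batch was moved to.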

the definition of loss function

Thank you for your excellent work. Where in the code can I find the definition of the loss functions (including IMID and CMID) mentioned in the paper?
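For orientation, the paper states that its contrastive objectives use the NT-Xent loss. The sketch below is a generic, dependency-free NT-Xent over two lists of embeddings, where matching positions are treated as positive pairs. It is not the repository's implementation: the function names and the temperature value are assumptions, and the paper's exact choice of negatives and projection heads may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(z1, z2, tau=0.1):
    """Symmetric NT-Xent: z1[i] and z2[i] are positives, everything else negatives."""
    n = len(z1)
    total = 0.0
    for a, b in ((z1, z2), (z2, z1)):
        for i in range(n):
            pos = math.exp(cosine(a[i], b[i]) / tau)
            # Negatives from the same view (excluding the anchor itself) ...
            denom = sum(math.exp(cosine(a[i], a[k]) / tau) for k in range(n) if k != i)
            # ... plus every embedding of the other view, including the positive.
            denom += sum(math.exp(cosine(a[i], b[k]) / tau) for k in range(n))
            total += -math.log(pos / denom)
    return total / (2 * n)


view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = nt_xent(view_a, [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent(view_a, [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned: {aligned:.4f}  swapped: {swapped:.4f}")
```

In an intra-modal setting, z1 and z2 would be two augmented views of the same point clouds. In a cross-modal setting, one list would hold point cloud embeddings and the other the embeddings of the corresponding rendered images.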

Can train_crosspoint.py train the partseg model based on ShapeNetPart?

@MohamedAfham Thank you for releasing the code. The paper is well written and the code is robust.

I have successfully trained the classification and part segmentation models based on train_crosspoint.py and train_partseg.py, respectively. Everything goes smoothly.

One point I'm confused about is the comments in scripts/script.sh, which state that train_crosspoint.py can be used for training the part segmentation model and train_partseg.py for fine-tuning it. The code in train_crosspoint.py, however, only loads ShapeNetRender for pre-training and ModelNet40 for linear accuracy evaluation. It does not load ShapeNetPart for part segmentation.

Instead, I think both training and fine-tuning take place in train_partseg.py, as the train_loader in this file is designed for ShapeNetPart. Further, I think the self-supervised cross-modal contrastive learning is intended for point cloud classification. Is my understanding correct?

relatively large performance gap on ScanObjectNN

@MohamedAfham Recently, I have run all experiments in the codebase at least 3 times to make sure that no obvious errors occurred on my side.

Some of the results are very encouraging: they are comparable to those reported in the paper and sometimes even higher, e.g. the reproduced results on ModelNet. But some are not.

Specifically, for the downstream task few-shot classification on ScanObjectNN, the performance gap is relatively large, e.g.,

for 5 way, 10 shot, I got 72.5 ± 8.33,
for 5 way, 20 shot, I got 82.5 ± 5.06,
for 10 way, 10 shot, I got 59.4 ± 3.95,
for 10 way, 20 shot, I got 67.8 ± 4.41

For the downstream task linear SVM classification on ScanObjectNN, the reproduced performance is 75.73%. All experiments use the DGCNN backbone and default settings except for the batch size.

In short, all of my results on ScanObjectNN fall behind the performance reported in the paper by a large margin.

At this point, I wonder whether any precautions are needed when experimenting on ScanObjectNN, and what the possible reasons for the gap are. Can you provide some suggestions? Thank you.
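When comparing "mean ± std" figures like the ones in this post, it helps to know how they were aggregated. The sketch below averages per-run accuracies and reports the population standard deviation, which is also what NumPy's np.std computes by default. Whether the paper used the population or the sample standard deviation is not stated here, and the accuracy values are made-up placeholders, not reproduced results.

```python
import statistics


def summarize(accuracies):
    """Return (mean, population standard deviation) of per-run accuracies."""
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


runs = [60.0, 70.0, 80.0, 90.0]  # placeholder accuracies in percent
mean, std = summarize(runs)
print(f"{mean:.1f} ± {std:.2f}")
```

With only a handful of runs the spread is large, so differences of a few points between two sets of runs can fall within the noise.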