mit-han-lab / spvnas Goto Github PK

View Code? Open in Web Editor NEW

572.0 24.0 108.0 8.35 MB

[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution

Home Page: http://spvnas.mit.edu/

License: MIT License

Python 93.14% Jupyter Notebook 6.86%

computer-vision deep-learning pytorch point-cloud lidar architecture-search efficiency 3d-deep-learning semantickitti

spvnas's Introduction

SPVNAS

video | paper | website

Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution

Haotian Tang*, Zhijian Liu*, Shengyu Zhao, Yujun Lin, Ji Lin, Hanrui Wang, Song Han

ECCV 2020

SPVNAS achieves state-of-the-art performance on the SemanticKITTI leaderboard (as of July 2020) and outperforms MinkowskiNet with 3x speedup, 8x MACs reduction.

News

[2020-09] We release the baseline training code for SPVCNN and MinkowskiNet.

[2020-08] Please check out our ECCV 2020 tutorial on AutoML for Efficient 3D Deep Learning, which summarizes the algorithm in this codebase.

[2020-07] Our paper is accepted to ECCV 2020.

Usage

Prerequisites

The code is built with following libraries:

Python >= 3.6, <3.8
PyTorch >= 1.6
tqdm
torchpack
torchsparse
numba
cv2

Recommended Installation

For easy installation, use conda:

conda create -n torch python=3.7
conda activate torch
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
conda install numba opencv
pip install torchpack
pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git

Data Preparation

SemanticKITTI

Please follow the instructions from here to download the SemanticKITTI dataset (both KITTI Odometry dataset and SemanticKITTI labels) and extract all the files in the sequences folder to /dataset/semantic-kitti. You shall see 22 folders 00, 01, …, 21; each with subfolders named velodyne and labels.

Model Zoo

SemanticKITTI

We share the pretrained models for MinkowskiNets, our manually designed SPVCNN models and also SPVNAS models found by our 3D-NAS pipeline. All the pretrained models are available in the model zoo. Currently, we release the models trained on sequences 00-07 and 09-10 and evaluated on sequence 08.

	#Params (M)	MACs (G)	mIoU (paper)	mIoU (reprod.)
`SemanticKITTI_val_MinkUNet@29GMACs`	5.5	28.5	58.9	59.3
`SemanticKITTI_val_SPVCNN@30GMACs`	5.5	30.0	60.7	60.8 ± 0.5
`SemanticKITTI_val_SPVNAS@20GMACs`	3.3	20.0	61.5	-
`SemanticKITTI_val_SPVNAS@25GMACs`	4.5	24.6	62.9	-
`SemanticKITTI_val_MinkUNet@46GMACs`	8.8	45.9	60.3	60.0
`SemanticKITTI_val_SPVCNN@47GMACs`	8.8	47.4	61.4	61.5 ± 0.2
`SemanticKITTI_val_SPVNAS@35GMACs`	7.0	34.7	63.5	-
`SemanticKITTI_val_MinkUNet@114GMACs`	21.7	113.9	61.1	61.9
`SemanticKITTI_val_SPVCNN@119GMACs`	21.8	118.6	63.8	63.7 ± 0.4
`SemanticKITTI_val_SPVNAS@65GMACs`	10.8	64.5	64.7	-

Here, the results are reproduced using 8 NVIDIA RTX 2080Ti GPUs. Result variation for each single model is due to the existence of floating point atomic addition operation in our TorchSparse CUDA backend.

Testing Pretrained Models

You can run the following command to test the performance of SPVNAS / SPVCNN / MinkUNet models.

torchpack dist-run -np [num_of_gpus] python evaluate.py configs/semantic_kitti/default.yaml --name [num_of_net]

For example, to test the model SemanticKITTI_val_SPVNAS@65GMACs on one GPU, you may run

torchpack dist-run -np 1 python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs

Visualizations

You can run the following command (on a headless server) to visualize the predictions of SPVNAS / SPVCNN / MinkUNet models:

xvfb-run --server-args="-screen 0 1024x768x24" python visualize.py

If you are running the code on a computer with monitor, you may also directly run

python visualize.py

The visualizations will be generated in assets/.

Training

SemanticKITTI

We currently release the training code for manually-designed baseline models (SPVCNN and MinkowskiNets). You may run the following code to train the model from scratch:

torchpack dist-run -np [num_of_gpus] python train.py configs/semantic_kitti/[model name]/[config name].yaml

For example, to train the model SemanticKITTI_val_SPVCNN@30GMACs, you may run

torchpack dist-run -np [num_of_gpus] python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml

To train the model in a non-distributed environment without MPI, i.e. on a single GPU, you may directly call train.py with the --distributed False argument:

python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml --distributed False

Searching

The code related to architecture search will be coming soon!

Citation

If you use this code for your research, please cite our paper.

@inproceedings{tang2020searching,
  title = {Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution},
  author = {Tang, Haotian* and Liu, Zhijian* and Zhao, Shengyu and Lin, Yujun and Lin, Ji and Wang, Hanrui and Han, Song},
  booktitle = {European Conference on Computer Vision},
  year = {2020}
}

spvnas's People

Stargazers

Watchers

Forkers

zxczrx123 liuguoyou woodpecker0 dynamiselettronica2020 cv-ip songyongkang narmadabalasooriya xrosliang bruinxiong hamradiodxer youngjoo-kim phuclb1 torchlidar simonsroad stiphyjay devin-coder robot-ai-machinelearning plaovem whuchenlin steph1793 enesozi aimicm sunnyln hadonga collector-m huhaowen0130 sandeepnmenon liuxinren sj-li deepmirrorinc openhero freeverc coolloveboy ltriess luolizhuiyuyun720 cule adamzdw pointcloudyc arghyachatterjee husnejahan liuxinren456852 timingan bohanzhai chulongc v-catlna yaomoren digital-idiot ccinc alsun-oven allianc-rgb jlqzzz leedk3 jie311 big-data-ai apeizou liukang1811 clubzip fangqiaohu climbingdaily woojulee24 zhangzw12319 zzyzzyzz ajinkyakhoche mindtrace liamkboyle nmnghjss yesterday24-cuit zhoupeng0418 carl12138aka hymilex cyqian97 suyunzzz mfkiwl plyfager engomarwasfy garrettchristian smathurintel maxshi007 sammn97 raul0806 tuskaw lfl256 xiuyu-li dazory xiaoaoran lyx-soul lyrics-wangkl robotislove lvdongxu whuhxb kerkilchoi bysowhat yukang2017 yanlingz ziyan-wyq iq-scm chichengzzz jiangzhou224 limhyungtae hailuo0112

spvnas's Issues

ssh: Could not resolve hostname localhost:1: Temporary failure in name resolution

When I run "torchpack dist-run -np 1 python train.py configs/semantic_kitti/spvcnn/default.yaml" on a ubuntu server, I met this error:
"ssh: Could not resolve hostname localhost:1: Temporary failure in name resolution". Additionally, I'm not root user.
Does anybody know how to solve that?

ssh: Could not resolve hostname localhost:1: Temporary failure in name resolution

I've fixed all the prerequisites on a ubuntu server with four gpus. But when I run the training command I got this problem. I have no idea of what happened. Can anybody help me out? Appreciate!

Questions About core/datasets/semantic_kitti.py

Hello, I have a couple of questions about some of the data processing in the dataset file "core/datasets/semantic_kitti.py". I have been using that as a bit of a reference in trying to do something similar with the nuScenes dataset.

The output SparseTensors from getitem are created from NumPy arrays which has caused me some problems in trying to run inference with SPVNAS. It seems the issue potentially stems from a few places (e.g. converting to cuda). I have fixed this by converting to torch tensors before creating the SparseTensors, and this seems like a perfectly fine solution, but mostly I am curious why this is not done in your code? I couldn't seem to find another place where it would have been handled. A similar thing appears to be going on in the example for the torchsparse library also.
It seems to me that the features for a voxel should be the mean coordinate of the points contained in that voxel and the mean lidar intensity of those points. However, it appears as if you are taking a voxel feature to be the feature for only one of the points in that voxel. Is this correct, or am I misunderstanding something?
In terms of point cloud pre-processing it seems you are subtracting the minimum of each coordinate. Is this all that is done, it seems to me that some more normalization would be helpful.
Is the labels variable returned by sparse_quantize the same as the input labels parameter, and if so what is the purpose of this?

Thanks for all the help!

How to train and evaluate spvnas model?

Dear authors,

Thanks for your great work.
I notice that the training code only supports minkunet and spvcnn but not spvnas models (only minkunet and spvcnn in configs/semantic_kitti).
The evaluation code only supports to test your provided pretrained models.
How can we train and spvnas model? Thanks in advance.

Regards,
Chen Yukang

Values to unpack

Hey,

nice work. The results for the SemanticKITTI datasets are looking very promising.
I wanted to play around with your network approach and faced the problem:
"not enough values to unpack (expected 2, got 1)" during the forward pass of the class SPVCNN. I tried the mentioned example within the subrepo and got the same problem.

Thanks.

ROS inference node and visualization in RVIZ

Thanks for your interesting work!. Adding a ROS node to visualize results in RVIZ would be great?

question about using spvnas for object detection head

Hi,
Thanks for your awesome work first.

I'm trying to use spvnas backbone for object detection, and trying to merge torchsparse library to MMdetection3d.
but I got stuck in the last layer, the last layer is to add all the feature back to PointTensor, and use a simple classifier to classify the point cloud.

If I want to use SparseTensor for dense head and output the detection result, the simple idea is to make 3D tensor to 2D BEV,
and use anchor or anchor free head to generate detection result, but it seems like all the tensor.shape is different from the original mmdetection3d.

If it is possible to provide the detection code on kitti, or give some ideas about how to reimplement the result on paper.

thanks!

Cuda out of memory

I'm trying to run the example.py with colab and when I load a pre-trained model as:

model = spvnas_specialized('SemanticKITTI_val_SPVNAS@20GMACs').to(device)

I get the following out of memory which is strange as I think that the network actually does not have so many parameters:

RuntimeError: CUDA out of memory. Tried to allocate 127.56 GiB (GPU 0; 14.76 GiB total capacity; 1.25 GiB already allocated; 12.37 GiB free; 1.36 GiB reserved in total by PyTorch)

Also I want to notify you that I opened an issue in the torch sparse repository as there are some problems running test on CPU with Docker environment configured as you request.

Problem about "torchsparse_cuda"

Hi! Thanks for your wonderful work!
I meet a problem when testing pretrained model.
I installed "torchsparse" correctly but could not import "torchsparse_cuda".
Could you help me with the problem?

Errors on latest torchsparse release

The imports of spvnas need to be updated based on the renaming on the latest torchsparse commits, particularly:


Traceback (most recent call last):
  File "train.py", line 103, in <module>
    main()
  File "train.py", line 68, in main
    model = builder.make_model().cuda()
  File "/media/chris/Elements/data/PC/spvnas/core/builder.py", line 39, in make_model
    from core.models.semantic_kitti import SPVCNN
  File "/media/chris/Elements/data/PC/spvnas/core/models/semantic_kitti/__init__.py", line 2, in <module>
    from .spvcnn import *
  File "/media/chris/Elements/data/PC/spvnas/core/models/semantic_kitti/spvcnn.py", line 12, in <module>
    from torchsparse.utils.kernel_region import *
ModuleNotFoundError: No module named 'torchsparse.utils.kernel_region'

Errors while evaluate the model in training progress

Hi, Thanks for your work!

I meet a problem when evaluate the model in training progress or evaluate the pretrained model.
It exit with an AssertionError: assert coords.shape[0] == len(labels)

pytorch version: 1.6.0;
python version: 3.6;

Could you help me with the problem?

(pytorch1.4.0) tdmc@tdmc:~/work/guozhij/gitwork/dl-ai/3d-task/e3d/spvnas$ torchpack dist-run -v -np 1 python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml
mpirun --allow-run-as-root -np 1 -H localhost:1 -bind-to none -map-by slot -x CUDA_HOME -x HOME -x LANG -x LC_ADDRESS -x LC_IDENTIFICATION -x LC_MEASUREMENT -x LC_MONETARY -x LC_NAME -x LC_NUMERIC -x LC_PAPER -x LC_TELEPHONE -x LC_TIME -x LD_LIBRARY_PATH -x LESSCLOSE -x LESSOPEN -x LOGNAME -x LS_COLORS -x MAIL -x MASTER_HOST -x PATH -x PS1 -x PWD -x QT_QPA_PLATFORMTHEME -x SHELL -x SHLVL -x SSH_CLIENT -x SSH_CONNECTION -x SSH_TTY -x TERM -x USER -x VIRTUAL_ENV -x XDG_DATA_DIRS -x XDG_RUNTIME_DIR -x XDG_SESSION_ID -x _ -x tdip -mca pml ob1 -mca btl ^openib -mca btl_tcp_if_exclude docker0,lo python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml
[2020-10-28 16:43:15.574] /home/tdmc/work/python_env/pytorch1.4.0/bin/python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml
[2020-10-28 16:43:15.575] Experiment started: "runs/run-cd60bee5".
workers_per_gpu: 8
data:
  num_classes: 19
  ignore_label: 255
  training_size: 19132
train:
  seed: 1588147245
  deterministic: False
dataset:
  name: semantic_kitti
  root: /home/tdmc/data/Dataset/kitti_odom/dataset/sequences
  num_points: 80000
  voxel_size: 0.05
num_epochs: 15
batch_size: 4
criterion:
  name: cross_entropy
  ignore_index: 255
optimizer:
  name: sgd
  lr: 0.24
  weight_decay: 0.0001
  momentum: 0.9
  nesterov: True
scheduler:
  name: cosine_warmup
model:
  name: spvcnn
  cr: 0.5
[2020-10-28 16:43:22.733] Epoch 1/15 started.
[loss] = 0.225: 100% 4783/4783 [2:33:00<00:00,  1.92s/it]  
[2020-10-28 19:16:23.439] Training finished in 2 hours 33 minutes.
 34% 350/1018 [01:51<03:33,  3.13it/s]
Traceback (most recent call last):
  File "train.py", line 107, in <module>
    main()
  File "train.py", line 102, in main
    Saver(),
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torchpack/train/trainer.py", line 39, in train_with_defaults
    callbacks=callbacks)
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torchpack/train/trainer.py", line 88, in train
    self.trigger_epoch()
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torchpack/train/trainer.py", line 156, in trigger_epoch
    self.callbacks.trigger_epoch()
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torchpack/callbacks/callback.py", line 90, in trigger_epoch
    self._trigger_epoch()
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torchpack/callbacks/callback.py", line 308, in _trigger_epoch
    callback.trigger_epoch()
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torchpack/callbacks/callback.py", line 90, in trigger_epoch
    self._trigger_epoch()
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torchpack/callbacks/inference.py", line 29, in _trigger_epoch
    self._trigger()
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torchpack/callbacks/inference.py", line 36, in _trigger
    for feed_dict in tqdm.tqdm(self.dataflow, ncols=0):
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/tqdm/std.py", line 1171, in __iter__
    for obj in iterable:
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 971, in _next_data
    return self._process_data(data)
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
AssertionError: Caught AssertionError in DataLoader worker process 6.
Original Traceback (most recent call last):
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/tdmc/work/guozhij/gitwork/dl-ai/3d-task/e3d/spvnas/core/datasets/semantic_kitti.py", line 236, in __getitem__
    return_invs=True)
  File "/home/tdmc/work/python_env/pytorch1.4.0/lib/python3.6/site-packages/torchsparse-1.1.0-py3.6-linux-x86_64.egg/torchsparse/utils/helpers.py", line 48, in sparse_quantize
    assert coords.shape[0] == len(labels)
AssertionError

-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[49202,1],0]
  Exit code:    1
--------------------------------------------------------------------------

Best,
Gorzhij

About SPVNAS Pretrained Model

Dear authors,

Thanks for your great work.
I notice that you only provide 64G SPVNAS Pretrained Model rather than 73.8G SPVNAS Model as you reported in the Tabel 2.

Also, the **8.9G ** (tabel 2)and 15.0G (tabel 3) SPVNAS Pretrained Model are not released, right ?

How can we get these model? Thanks in advance.

Regards,
Iris

Support for NuScenes dataset

Hi!
Would this codebase support the NuScenes dataset?

Provide full list of requirements

The installation process would be a lot faster if the repo had a requirements.txt or setup.py. Especially since the prerequisites listed here https://github.com/mit-han-lab/e3d#prerequisites are not complete. For example spvnas/core/datasets/semantic_kitti.py also requires h5py, numba, opencv-python.

SPVCNN Hyperparameters

Hi,

The parameters in configs/semantic_kitti/default.py are designed for the SPVNAS network, correct? I'm curious about the parameters used for the SPVCNN and MinkUNet models, in particular the learning rate. I couldn't find it in the paper.

Thanks! Congratulations on your great work.

Map label color to class name

First of all sorry for the stupid question but I could not find this anywhere.

In your code you use the following colors:

COLOR_MAP = np.array(['#f59664', '#f5e664', '#963c1e', '#b41e50', '#ff0000', '#1e1eff',
                      '#c828ff', '#5a1e96', '#ff00ff', '#ff96ff', '#4b004b', '#4b00af',
                      '#00c8ff', '#3278ff', '#00af00', '#003c87', '#50f096', '#96f0ff',
                      '#0000ff', '#ffffff'])

that I have mapped to rgb resulting in this color pallet:

My question is, where can I find the correspondence between each color and the actual class it represents.
Thank you and sorry for the noob question

ImportError: libtorch.so: cannot open shared object file: No such file or directory

Is there any other rerequisites or settings before running "torchpack dist-run..." ?

Thanks for your wonderful work! The model works fine even on velodyne vlp16 dataset.

But when I try to test pretrained models with "torchpack dist-run...", some errors come up:

`$ torchpack dist-run -np 1 python train.py configs/semantic_kitti/spvcnn/cr0$p5.yaml

ssh: Could not resolve hostname localhost:1: Name or service not known

ORTE was unable to reliably start one or more daemons.
This usually is caused by:

not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------`

Since I have install all the rerequisites according to README.md, I have no idea what may cause this problem. Is there any other rerequisites or settings ?

Why not use the pre-trained pytorch model (`.pt`)?

Reconstructed labels and "raw" labels differ

Hi!

Thanks for your work! SPVNAS is indeed a great tool!

I was playing around with evaluate.py to test on different metrics and wondered why there is always a ~100-200 point difference between the ground truth when using labels from original tensor all_labels = feed_dict['targets_mapped'] and reconstructing labels from feed_dict['targets'] by using feed_dict['inverse_map']?

For example (first line is your code in evaluate.py):

targets_mapped = all_labels.F[cur_label]
targets_quick = targets[invs.F.long()]
diff = torch.eq(targets_mapped.long(), targets_quick)
print("Different points: ", diff[diff==False].shape)

I would understand if the difference would be because of ignore_class but looks like these are just random classes:

print("Diff", targets_mapped[~diff], targets_quick[~diff])

will output something like:

tensor([ 16.,   8.,  10.,   0.,  16.,  16.,  10.,  10.,   8.,   8.,  14.,   8. ....
tensor([10,  0, 16,  8, 14, 15,  8, 16, 10,  0, 16, 10, 14, 14, 14 ...

I cannot figure out where the difference comes from? All point clouds have around 100-200 points where original and reconstructed ground truth is different. There is filtered_labels[counts > 1] = ignore_label in torchsparse/utils/helpers.py but this seems to be removing duplicates and replacing with 255 (but I observe random classes). Am I missing something or how to explain this difference?

SPVCNN vs MinkowskiNet at same value of `cr`

Hi, thank you for the insightful work! I had a (potentially dumb) question regarding the comparison of MinkowskiNet and SPVCNN (without the NAS): I see that you provide cr parameter to control the channel ratio in order to control the width of the networks. Am I correct in my understanding that, for this Figure, when comparing MinkowskiNet and SPVCNN under same MACs, the cr for both models are different?

3D UNet with TorchSparse

By chance, has anyone implemented 3D UNet with TorchSparse? If so, I would appreciate a link.

The specific issue I am currently pondering about is how to concatenate the features at a level with the up-sampled features from lower level. I looked to see whether there is a built-in function to do that. The functions I saw didn't seem to be great fits.

Things I saw:

The add function in SparseTensor: This assumes that the coordinates in the two tensors are identical, i.e., the ".C" field is the same in the two tensors. But, this need not be the situation in the UNet case.
sparse_collate_tensors function in helpers.py : Seems to be for the purpose of creating a tensor where in multiple tensors are added into a batch (?).

Thanks.

Train the newtwork on one class only

Hi, How can I train the network on certain class like people or cars and ignore all other classes.
Thanks!

MinkUNet vs SpatioTemporal Seg

Hi!

As I'm reading through the code and comparing to the original spatiotemporal segmentation code, I was curious how the architecture of MinkUNet and SPVCNN compares to the original minkowskinet. It looks like for cr=1.0 you have similar feature dimensions as Res16UNet34C (MinkowskiNet42 in the paper), i.e. (32, 64, 128, 256, 256, 128, 96, 96), but you have 2 residual layers per block, whereas the Res16UNet34 architectures have (2, 3, 4, 6, 2, 2, 2, 2) layers. I'm curious if you experimented with other numbers of residual layers in your implementation to match the original MinkowskiNet implementation, or if you found that the two layers per block worked fine.

Also, I'm curious about the first few layers, where your implementation is:

stem = {
	Conv3d(4, 32, kernel_size=3, stride=1),
	Conv3d(32, 32, kernel_size=3, stride=1)
}
		
stage1 = (
	Conv3d(32, 32, ks=2, stride=2, dilation=1),
	ResidualBlock(32, 64, ks=3, stride=1, dilation=1),
	ResidualBlock(64, 64, ks=3, stride=1, dilation=1),
)

while the spatiotemporal segmentation impl is:

stem = {
	Conv3d(4, 32, kernel_size=3, stride=1)
}
		
stage1 = (
	Conv3d(32, 32, ks=2, stride=2, dilation=1),
	ResidualBlock(32, 64, ks=3, stride=1, dilation=1),
	ResidualBlock(64, 64, ks=3, stride=1, dilation=1),
)

i.e. there is one less convolutional layer. Is the extra layer necessary? I could not find details of these in the paper.

Thanks!

Submanifold sparse convolution

Hello, I wonder if torchsparse spnn.Conv3d is an implementation of the submanifold sparse convolution (https://arxiv.org/pdf/1706.01307.pdf) or just a sparse convolution. If so can you explain to me the principal difference? Thank you

Error on eval code: feature size and kernel size mismatch

Hi! An earlier version of evaluate.py on SemanticKITTI_val_SPVNAS@65GMACs worked fine when I tested it a few months back, but the latest code produces a weird error ValueError: Input feature size and kernel size mismatch caused by torchsparse (probably this line).

Using torchsparse 1.2, pytorch 1.7.1, cuda 11.0

Is it a torchsparse`spvnas` compatibility issue (e.g., models are trained on older version) or something else? I noticed that one earlier similar problem was solved by downgrading torchsparse, but the latest version of SPVNAS seems to be updated to use 1.2?

Thanks!

martin@pytorch18-vm:~/spvnas$ torchpack dist-run -np 1 python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs
  File "/code/spvnas/model_zoo.py", line 50, in spvnas_specialized
    model = model.determinize()
  File "/code/spvnas/core/models/semantic_kitti/spvnas.py", line 311, in determinize
    x = self.forward(x)
  File "/code/spvnas/core/models/semantic_kitti/spvnas.py", line 343, in forward
    x1 = self.downsample[0](x1)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/code/spvnas/core/modules/modules.py", line 82, in forward
    x = self.layers[k](x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/code/spvnas/core/modules/layers.py", line 499, in forward
    out = self.relu(self.net(x) + self.downsample(x))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/code/spvnas/core/modules/layers.py", line 339, in forward
    out = self.net(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/code/spvnas/core/modules/dynamic_sparseop.py", line 98, in forward
    return spf.conv3d(inputs, cur_kernel, self.ks, self.s, self.d, self.t)
  File "/opt/conda/lib/python3.7/site-packages/torchsparse/nn/functional/conv.py", line 149, in conv3d
    idx_query[1], sizes, transpose)
  File "/opt/conda/lib/python3.7/site-packages/torchsparse/nn/functional/conv.py", line 40, in forward
    neighbor_offset, transpose)
ValueError: Input feature size and kernel size mismatch
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing                                                                           
the job to be terminated. The first process to do so was:

  Process name: [[37270,1],0]
  Exit code:    1
--------------------------------------------------------------------------

Segmentation fault when I use visualize.py

When I run:
xvfb-run --server-args="-screen 0 1024x768x24" python visualize.py

I got :
`QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
xkbcommon: ERROR: failed to add default include path
Qt: Failed to create XKB context!
Use QT_XKB_CONFIG_ROOT environmental variable to provide an additional search path, add ':' as separator to provide several search paths and/or make sure that XKB configuration data directory contains recent enough contents, to update please see http://cgit.freedesktop.org/xkeyboard-config/ .
/home/anaconda3/lib/python3.7/site-packages/traits/etsconfig/etsconfig.py:415: UserWarning: Environment variable "HOME" not set, setting home directory to /tmp
% (environment_variable, parent_directory)

WARNING: Imported VTK version (9.0) does not match the one used
to build the TVTK classes (8.2). This may cause problems.
Please rebuild TVTK.

Generic Warning: In ../Rendering/Core/vtkGlyph3DMapper.cxx, line 65
Error: no override found for 'vtkGlyph3DMapper'.

Generic Warning: In ../Rendering/Core/vtkPolyDataMapper.cxx, line 28
Error: no override found for 'vtkPolyDataMapper'.

Segmentation fault (core dumped)
`

Performance of pretrained models

Hi, thanks for your great work.

You have released the pretrained model, namely, semantickitti_val_spvnas@65gmacs, which achieves miou=64.7 on validation set. How about its performance on test set?
In fact, I generate the predictions (using evaluate.py with two line output-saving code) on testset based on your code and this pretrained model, however, it achieves miou=60.96. Is it right? Or there are some special techniques. Thanks very much.

There are not enough slots available in the system to satisfy the 8 slots

When I run:
torchpack dist-run -np 8 -v python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml

the system tell me:

--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 8 slots 
that were requested by the application:
  python

Either request fewer slots for your application, or make more slots available
for use.

Which label belongs to which semanticKITTI IOU prediction

I can't figure out which label belongs to which IOU prediction when evaluating on the semanticKITTI dataset.

The file semantic_kitti.py defines the following list: 

kept_labels = [
    'road', 'sidewalk', 'parking', 'other-ground', 'building', 'car', 'truck',
    'bicycle', 'motorcycle', 'other-vehicle', 'vegetation', 'trunk', 'terrain',
    'person', 'bicyclist', 'motorcyclist', 'fence', 'pole', 'traffic-sign'
]

I run the following command:
!torchpack dist-run -np 1 python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs

And get the following IOU prediction output:
[0.9718420082206927, 0.38961800341472447, 0.7145743531211676, 0.6846713350897515, 0.6445271805914676, 0.7246225795083294, 0.8675636397731001, 0.0012132890839667415, 0.9343194385183315, 0.5199240494364405, 0.812111782522395, 0.0001901425259827739, 0.9177672828390993, 0.6610816523276035, 0.8819792946476588, 0.6617228983717633, 0.7466429908103505, 0.6438834155400621, 0.5139405980831536]

Where in the code is the order of the labels defined, used for printing the IOU predictions?

The program stops at line `dist.init() `

@zhijian-liu

After I run CUDA_VISIBLE_DEVICES=1 torchpack dist-run -np 1 python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml 2>&1 | tee ./train.log
The program stops at line dist.init() in train.py and cannot continue to run.

Is there something wrong, could you please help me to solve this problem?

Enviromnet:
cudatookit 10.2
pytorch 1.8.0
python 3.6
openmpi 4.1.1

Question about the implementation details of SECOND(our)

There is only one sentence about the realization of SECOND (ours) on page 12 of the article:
"As for our model, we only replace these 3D Sparse Convolutions in SECOND with our SPVConv while keeping all the other settings the same for fair comparison. "

However, when I think carefully, there are some unclear points. In SPVCNN (and PVCNN), the neighborhood information is collected by the Voxel branch, and then it flows to the Point branch. The final feature representation is on the Point branch. On the contrary, the final feature representation in SECOND is on the structured feature map from the Voxel branch. So, I am interested in how you “only replace these 3D Sparse Convolutions in SECOND with our SPVConv ” ？

Looking forward to your reply.

Transfer the learnings from the pretrained models to training in custom datasets.

Does the repo support training on new dataset after preloading the model with given weights and editing some layers of the architecture?

Thank you

The mpi4py problem

Hi, nice work. I am really interested in your work, and new to semantic segmentation.
When I run the code
torchpack dist-run -np 1 python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml
the system return this:
`Synopsis: mpirun [options]
mpirun [options] []

Description: Start an MPI application in LAM/MPI.

Notes:
[options] Zero or more of the options listed below
LAM/MPI appschema
List of LAM nodes and/or CPUs (examples
below)
Must be a LAM/MPI program that either
invokes MPI_INIT or has exactly one of
its children invoke MPI_INIT
Optional list of command line arguments
to

Options:
-c Run copies of (same as -np)
-client :
Run IMPI job; connect to the IMPI server
at port as IMPI client number
-D Change current working directory of new
processes to the directory where the
executable resides
-f Do not open stdio descriptors
-ger Turn on GER mode
-h Print this help message
-l Force line-buffered output
-lamd Use LAM daemon (LAMD) mode (opposite of -c2c)
-nger Turn off GER mode
-np Run copies of (same as -c)
-nx Don't export LAM_MPI_* environment variables
-O Universe is homogeneous
-pty / -npty Use/don't use pseudo terminals when stdout is
a tty
-s Load from node
-sigs / -nsigs Catch/don't catch signals in MPI application
-ssi Set environment variable LAM_MPI_SSI_=
-toff Enable tracing with generation initially off
-ton, -t Enable tracing with generation initially on
-tv Launch processes under TotalView Debugger
-v Be verbose
-w / -nw Wait/don't wait for application to complete
-wd

Change current working directory of new
processes to
-x Export environment vars in

Nodes: n, e.g., n0-3,5
CPUS: c, e.g., c0-3,5
Extras: h (local node), o (origin node), N (all nodes), C (all CPUs)

Examples: mpirun n0-7 prog1
Executes "prog1" on nodes 0 through 7.

            mpirun -lamd -x FOO=bar,DISPLAY N prog2
            Executes "prog2" on all nodes using the LAMD RPI.  
            In the environment of each process, set FOO to the value
            "bar", and set DISPLAY to the current value.

            mpirun n0 N prog3
            Run "prog3" on node 0, *and* all nodes.  This executes *2*
            copies on n0.

            mpirun C prog4 arg1 arg2
            Run "prog4" on each available CPU with command line
            arguments of "arg1" and "arg2".  If each node has a
            CPU count of 1, the "C" is equivalent to "N".  If at
            least one node has a CPU count greater than 1, LAM
            will run neighboring ranks of MPI_COMM_WORLD on that
            node.  For example, if node 0 has a CPU count of 4 and
            node 1 has a CPU count of 2, "prog4" will have
            MPI_COMM_WORLD ranks 0 through 3 on n0, and ranks 4
            and 5 on n1.

            mpirun c0 C prog5
            Similar to the "prog3" example above, this runs "prog5"
            on CPU 0 *and* on each available CPU.  This executes
            *2* copies on the node where CPU 0 is (i.e., n0).
            This is probably not a useful use of the "C" notation;
            it is only shown here for an example.

Defaults: -c2c -w -pty -nger -nsigs

And when I install mpi4py, and then run the code above, the system return:[mpiexec@eyre-MS-7C35] match_arg (utils/args/args.c:163): unrecognized argument allow-run-as-root
[mpiexec@eyre-MS-7C35] HYDU_parse_array (utils/args/args.c:178): argument matching returned error
[mpiexec@eyre-MS-7C35] parse_args (ui/mpich/utils.c:1642): error parsing input array
[mpiexec@eyre-MS-7C35] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1694): unable to parse user arguments
[mpiexec@eyre-MS-7C35] main (ui/mpich/mpiexec.c:148): error parsing parameters
`
Also, I am wandering where is the 'dataset' folder located?

CUDA out of memory while running visualize.py in 2 x Tesla K80 (12 + 12 GB) VM

I am trying to use the pretrained model 'SemanticKITTI_val_SPVCNN@119GMACs' to segment my point cloud of shape
(566151, 4).
I was getting the following error in my

|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:B3:00.0  On |                  N/A |
| 22%   45C    P8    16W / 250W |    532MiB / 12211MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

RuntimeError: CUDA out of memory. Tried to allocate 162.00 MiB (GPU 0; 11.93 GiB total capacity; 10.29 GiB already allocated; 128.81 MiB free; 10.67 GiB reserved in total by PyTorch)
on running the command
python visualize.py --model SemanticKITTI_val_SPVCNN@119GMACs

Now I am trying in my VM with 2 x Tesla K80 GPUs.

|===============================+======================+======================|
|   0  Tesla K80           On   | 00008FA5:00:00.0 Off |                    0 |
| N/A   34C    P8    30W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000BC37:00:00.0 Off |                    0 |
| N/A   41C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Using the command
torchpack dist-run -np 2 python visualize.py --model SemanticKITTI_val_SPVCNN@119GMACs
to use both the GPUs. Still I am getting the out of memory error
RuntimeError: CUDA out of memory. Tried to allocate 60.00 MiB (GPU 0; 11.17 GiB total capacity; 5.47 GiB already allocated; 46.69 MiB free; 5.62 GiB reserved in total by PyTorch)
And it seems like only one GPU is getting used on watching "nvidia-smi' and also from the error message.

How much single GPU space is required to run inference of the SPVCNN@119GMACs model on a point cloud of size (566151, 4)?
Can we use torchpack to split inference in multiple GPUs? If yes what is wrong with my command mentioned above?

Thank you

Reproducing the results

Hi I have been working with your models (withou NAS) in my own framework and I have been struggling to reproduce your results with the MinkUNet. I would like just to confirm some hyperparameters to see if I'm missing something.

So for the best test so far:

I used your scheduler, the cosine_warmup
learning_rate: 0.24
decay_lr: 1.0e-4
SGD optimizer with momentum 0.9 and nesterov=True
For the criterion I have used mix lovasz and cross-entropy which really increased the performance during my training
And epoch = 20

My main question is, in your paper it's said that after the first 15 epochs with lr = 0.24 a second training with other 15 epochs and lr = 0.096 is done. This second training is only for the NAS version or for the bare models it's also done a second 15 epochs training?

I'm asking because after the 20 epochs of training on my framework I've got only 38% of mIoU in the validation set (seq 08) and ~50% mIoU on the training set.

When will the neural architecture search code for SPVNAS released?

Hi! Thank you very much for a wonderful work.
I can't wait to try some experiments on SPVNAS.
In that matter, when will the neural architecture search code for SPVNAS release?
Thanks

Error when importing SPVNAS model to run inference

I use the e3d.ipynb from the tutorial folder. I get the following error when importing SPVNAS model to run inference:
model = spvnas_specialized('SemanticKITTI_val_SPVNAS@65GMACs').to(device)
RuntimeError: running_mean should contain 1000 elements not 44

Also, when will the training code for SPVNAS be available?

Inference on CPU

@zhijian-liu @songhan thanks for opensourcing this code base i have few queries

Can we run the architecture inference on cpu systems
What is the latency vs num of class of the trained model which you have observed
THanks in advance

Version 1.1 not compatible with SPVNAS Semantic Segmentation

It looks like version 1.1 is not compatible with SPVNAS (Version 1.0 worked). Starting at line "model = model.determinize()" in model_zoo.py I get the following error:
RuntimeError: running_mean should contain 1000 elements not 44

You can use the following Python code to reproduce the error:

# Go into the folder SPVNAS (git clone https://github.com/mit-han-lab/e3d)
from model_zoo import spvnas_specialized

net_id = "SemanticKITTI_val_SPVNAS@65GMACs"

model = spvnas_specialized(net_id, True)

Error during executing train.py

Hi, I have some problem during executing train.py of e3d.

After I executed following commands of train.py

model = torch.nn.parallel.DistributedDataParallel(
    model.cuda(),
    device_ids=[dist.local_rank()],
    find_unused_parameters=True)

I got following error messages related to nvidia-modprobe.

/usr/bin/nvidia-modprobe: unrecognized option: "-f"

ERROR: Invalid commandline, please run /usr/bin/nvidia-modprobe --help
for usage information.

/usr/bin/nvidia-modprobe: unrecognized option: "-f"

ERROR: Invalid commandline, please run `/usr/bin/nvidia-modprobe --help`
for usage information.

Do you have any idea for solving this problem? Please check and let me know.

Thank you in advance for your help.

How to test the trained model like MinUNet5cm

Dear Ken, thanks for your great code! I have trained my MinkUNet on SemanticKitti and got checkpoints.pt file! Now I wanna test the model on 11~21 scenes and hope to get the predicted label, what should I do? I found a trigger-like code in spvnas/core/datasets/semantic_kitti.py:
68: submit_to_server = kwargs.get("submit",False)

Should I change "False" to "True" and I will get the label on test set?

# of MACs in paper

Hi,

congrats to this awesome work and thanks for releasing your code.
I have a question regarding tables 2 and 3 in the paper.

How did you calculate the number of MACs? (also for the other approaches)

Thanks!

/bin/sh: 1: mpirun: not found

How should I deal with this error '/bin/sh: 1: mpirun: not found' when I run ''torchpack dist-run -np 1 python train.py configs/semantic_kitti/spvcnn/default.yaml"

AttributeError: module 'torchsparse_cuda' has no attribute 'hash_forward'

I am trying to run the model inference code using the pretrained model using visualise.py. However at the line below
https://github.com/mit-han-lab/e3d/blob/db65d6c968a12d30a4caa2715ef6766ec04d7505/spvnas/visualize.py#L189
I am getting the following error

outputs = model(inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/docker/e3d/spvnas/core/models/semantic_kitti/spvcnn.py", line 194, in forward
    x0 = initial_voxelize(z, self.pres, self.vres)
  File "/opt/docker/e3d/spvnas/core/models/utils.py", line 19, in initial_voxelize
    pc_hash = spf.sphash(torch.floor(new_float_coord).int())
  File "/usr/local/lib/python3.6/dist-packages/torchsparse/nn/functional/hash.py", line 44, in sphash
    return hash_gpu(idx)
  File "/usr/local/lib/python3.6/dist-packages/torchsparse/nn/functional/hash.py", line 12, in forward
    return torchsparse_cuda.hash_forward(idx.contiguous())
AttributeError: module 'torchsparse_cuda' has no attribute 'hash_forward'

Environment details
Docker image: nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
Installed all prerequisites as mentioned in the repo
Torch 1.7.0
torchsparse 1.1.0
Cuda 10.2
GPU GeForce GTX TITanX 12GB

Errors while Evaluating the Pretrained Model

Thanks for your wonderful work.

I got errors while evaluating using 'torchpack dist-run -np 1 python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs',

/bin/sh: -c: line 0: syntax error near unexpected token (' /bin/sh: -c: line 0: mpirun --allow-run-as-root -np 1 -H localhost:1 -bind-to none -map-by slot -x BASH_FUNC_module() -x BLOCKSIZE -x CC -x CLICOLOR -x CONDA_DEFAULT_ENV -x CONDA_EXE -x CONDA_PREFIX -x CONDA_PROMPT_MODIFIER -x CONDA_PYTHON_EXE -x CONDA_SHLVL -x CPATH -x CPLUS_INCLUDE_PATH -x CPPFLAGS -x CUDA_HOME -x CUDA_VISIBLE_DEVICES -x CXX -x EDITOR -x GPU_DEVICE_ORDINAL -x HISTCONTROL -x HISTSIZE -x HOME -x HOSTNAME -x KDEDIRS -x LANG -x LDFLAGS -x LD_LIBRARY_PATH -x LD_RUN_PATH -x LESSOPEN -x LIBRARY_PATH -x LOADEDMODULES -x LOGNAME -x LSCOLORS -x LS_COLORS -x MAIL -x MANPATH -x MASTER_HOST -x MODULEPATH -x MODULESHOME -x OLDPWD -x PATH -x PKG_CONFIG_PATH -x PPL_PATH -x PROJECT_HOME -x PS1 -x PWD -x PYTHONPATH -x QTDIR -x QTINC -x QTLIB -x QT_GRAPHICSSYSTEM_CHECKED -x QT_PLUGIN_PATH -x SELINUX_LEVEL_REQUESTED -x SELINUX_ROLE_REQUESTED -x SELINUX_USE_CURRENT_RANGE -x SHELL -x SHLVL -x SLURMD_NODENAME -x SLURM_CLUSTER_NAME -x SLURM_CPUS_ON_NODE -x SLURM_DISTRIBUTION -x SLURM_GTIDS -x SLURM_JOBID -x SLURM_JOB_ACCOUNT -x SLURM_JOB_CPUS_PER_NODE -x SLURM_JOB_GID -x SLURM_JOB_ID -x SLURM_JOB_NAME -x SLURM_JOB_NODELIST -x SLURM_JOB_NUM_NODES -x SLURM_JOB_PARTITION -x SLURM_JOB_QOS -x SLURM_JOB_UID -x SLURM_JOB_USER -x SLURM_LAUNCH_NODE_IPADDR -x SLURM_LOCALID -x SLURM_NNODES -x SLURM_NODEID -x SLURM_NODELIST -x SLURM_NPROCS -x SLURM_NTASKS -x SLURM_PRIO_PROCESS -x SLURM_PROCID -x SLURM_PTY_PORT -x SLURM_PTY_WIN_COL -x SLURM_PTY_WIN_ROW -x SLURM_SRUN_COMM_HOST -x SLURM_SRUN_COMM_PORT -x SLURM_STEPID -x SLURM_STEP_GPUS -x SLURM_STEP_ID -x SLURM_STEP_LAUNCHER_PORT -x SLURM_STEP_NODELIST -x SLURM_STEP_NUM_NODES -x SLURM_STEP_NUM_TASKS -x SLURM_STEP_TASKS_PER_NODE -x SLURM_SUBMIT_DIR -x SLURM_SUBMIT_HOST -x SLURM_TASKS_PER_NODE -x SLURM_TASK_PID -x SLURM_TOPOLOGY_ADDR -x SLURM_TOPOLOGY_ADDR_PATTERN -x SLURM_UMASK -x SLURM_WORKING_CLUSTER -x SRUN_DEBUG -x SSH_ASKPASS -x SSH_CLIENT -x SSH_CONNECTION -x SSH_TTY -x TERM -x TMPDIR -x TMUX -x TMUX_PANE -x USER -x WORKON_HOME -x XDG_DATA_DIRS -x XDG_RUNTIME_DIR -x XDG_SESSION_ID -x _ -x _CE_CONDA -x _CE_M -x LMFILES -mca pml ob1 -mca btl ^openib -mca btl_tcp_if_exclude docker0,lo python -m torchpack.launch.assets.silentrun python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs'

I checked I got the torchpack and torchsparse installed successfully. Any idea how to solve the issue? Thank you in advance.

Question about the difference between PVCNN and SPVCNN

Awesome performance!

If the NAS part is excluded, is it just that the sparse 3D convolution is added to SPVCNN, compared to PVCNN? However, I thought you already used the sparse 3D convolution in PVCNN when I read PVCNN some time ago. After all, you have already cited the sparse 3D convolution and SECOND in PVCNN.

If the answer is yes, I feel very regret that I missed this idea.

How to train MinkUNet on SemanticKitti

Thanks for your excellent work!
But how to train MinkUNet on semanticKitti dataset? I found the train and test results of MinkNet in your paper SPVNAS.

Ditributed training problem.

Hi,
I faced a distributed training problem.
I run the training script:
torchpack dist-run -np 4 python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml
I found the following problems:

I used 4 gpus, and change the batchsize from 2 to 4 in this config.
I found that only 1 gpu is used 100%, while others are not used. (see the following Figure)

Besides, when I trained several steps, the code was stuck. There are no errors, just stuck. I have tried many times. Some logs are illustraed as follows.
[loss] = 0.365: 5% 252/5398 [07:05<2:21:57, 1.66s/it] [loss] = 0.365: 5% 253/5398 [07:05<2:17:34, 1.60s/it] [loss] = 0.493: 5% 253/5398 [07:07<2:17:34, 1.60s/it] [loss] = 0.493: 5% 254/5398 [07:07<2:22:29, 1.66s/it] [loss] = 0.582: 5% 254/5398 [07:08<2:22:29, 1.66s/it] [loss] = 0.582: 5% 255/5398 [07:08<2:20:24, 1.64s/it] [loss] = 0.0939: 5% 255/5398 [07:10<2:20:24, 1.64s/it] [loss] = 0.0939: 5% 256/5398 [07:10<2:19:29, 1.63s/it] [loss] = 0.291: 5% 256/5398 [07:11<2:19:29, 1.63s/it] [loss] = 0.291: 5% 257/5398 [07:11<2:14:54, 1.57s/it] [loss] = 0.225: 5% 257/5398 [07:13<2:14:54, 1.57s/it] [loss] = 0.225: 5% 258/5398 [07:13<2:15:47, 1.59s/it]

All my configs are as follows.
workers_per_gpu: 4 data: num_classes: 1 ignore_label: 255 training_size: 19132 train: seed: 1588147245 deterministic: False dataset: name: shift_prediction root: ../../data/Scannet num_points: 80000 voxel_size: 0.01 num_epochs: 200 batch_size: 4 criterion: name: regression optimizer: name: sgd lr: 0.24 weight_decay: 0.0001 momentum: 0.9 nesterov: True scheduler: name: cosine_warmup model: name: spvcnn_classification cr: 1.0 input_channel: 3

Looking forward to your reply.
Thx a lot.

Issues with Voxelization

Here I generate a point cloud having XY coordinates from a regular grid and initialized with random Z values. I removed some points (points having Z value less than 26, which should be roughly ~10% of all points in the chosen grid) to simulate the sparsity. For simplicity, I used the Z values as the features. Consider the following code snippet:

import numpy as np
import torch
from torchsparse import PointTensor
from spvnas.core.models.utils import *

# Generate Grid Coordinates
x, y = np.meshgrid(np.arange(1000), np.arange(1200), indexing='ij')
x, y = x.ravel(), y.ravel()

# Initialize with random Z values
z = np.random.randint(0, 256, x.shape)

# Remove some points for realistic scenario  
xx = x[z>25]
yy = y[z>25]
zz = z[z>25]

# Using Z values as features for testing
f = torch.from_numpy(np.stack((zz,), -1).astype(float))

# Create tensor of coordinayes
pc = torch.from_numpy(np.stack((xx, yy, zz), -1).astype(float))

# Create PointTensor
pt = PointTensor(feat=f, coords=pc)

# Initial Voxelization
st = initial_voxelize(z=pt, init_res=1, after_res=5)

# Voxelization
vt = point_to_voxel(st, pt)

When I check the content of st and vt, the coordinates as well as the features are zero tensors, i. e. each of the following statements returns True

torch.all(st.C==0)  # Returns True
torch.all(st.F==0)  # Returns True
torch.all(vt.C==0)  # Returns True
torch.all(vt.F==0)  # Returns True

Am I missing something?

mit-han-lab / spvnas Goto Github PK

spvnas's Introduction

SPVNAS

video | paper | website

News

Usage

Prerequisites

Recommended Installation

Data Preparation

SemanticKITTI

Model Zoo

SemanticKITTI

Testing Pretrained Models

Visualizations

Training

SemanticKITTI

Searching

Citation

spvnas's People

Stargazers

Watchers

Forkers

spvnas's Issues

ssh: Could not resolve hostname localhost:1: Name or service not known

Defaults: -c2c -w -pty -nger -nsigs

After I executed following commands of train.py

ERROR: Invalid commandline, please run /usr/bin/nvidia-modprobe --help for usage information.

Recommend Projects

Recommend Topics

Recommend Org

ERROR: Invalid commandline, please run `/usr/bin/nvidia-modprobe --help`
for usage information.