huguesthomas / kpconv-pytorch

Kernel Point Convolution implemented in PyTorch

License: MIT License

Shell 0.02% C++ 15.40% Python 84.56% Batchfile 0.01%

kpconv-pytorch's Issues

Applying KPConv to a large ALS data set

Hi Thomas,

First of all, congratulations for your work and thank you in advance if you find time to answer my questions.

I work for an electric utility company. Next month I will join a project whose goal is to classify a large ALS point cloud (~60,000 km, already labeled by humans). I have some experience in computer vision, but this will be my first project on this topic, so forgive me if I ask naive questions.

In your opinion, is KPConv a good choice for this type of data set? I would say yes, given these promising results, but if you know of any reason why it is not, I would be very interested to hear it.
What main steps do you suggest to adapt your scripts to a different data set (in particular to a large ALS data set where one of the most important classes is the powerline)?
What are the main hyper-parameters that need to be tuned when changing the data set? I imagine the input sphere radius and dl_0, for sure.
Does this implementation support multi-GPUs training?

Any further feedback is appreciated. I have read your paper but still need to study your repository; I would first like confirmation that I am looking in the right direction before investing more time in studying your excellent work :)

max_in limit dictionary?

I encounter the error below when I run train_SemanticKitti.py. I did not find the file named "max_in_limits.pkl". Could you give any hints? Thanks!

Starting Calibration of max_in_points value (use verbose=True for more details)

Previous calibration found:
Check max_in limit dictionary
"balanced_4.000_0.060": ?
Traceback (most recent call last):

File "", line 1, in
training_sampler.calib_max_in(config, training_loader, verbose=True)

File "/media/xingyi/Tools4TB/KPConv-PyTorch/datasets/SemanticKitti.py", line 905, in calib_max_in
for batch_i, batch in enumerate(dataloader):

File "/home/xingyi/miniconda3/envs/torch1.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
return _MultiProcessingDataLoaderIter(self)

File "/home/xingyi/miniconda3/envs/torch1.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 746, in __init__
self._try_put_index()

File "/home/xingyi/miniconda3/envs/torch1.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 861, in _try_put_index
index = self._next_index()

File "/home/xingyi/miniconda3/envs/torch1.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 339, in _next_index
return next(self._sampler_iter) # may raise StopIteration

File "/home/xingyi/miniconda3/envs/torch1.4/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 200, in __iter__
for idx in self.sampler:

File "/media/xingyi/Tools4TB/KPConv-PyTorch/datasets/SemanticKitti.py", line 797, in __iter__
self.dataset.epoch_inds += gen_indices

RuntimeError: The size of tensor a (4400) must match the size of tensor b (4084) at non-singleton dimension 0

When training on SemanticKITTI, a ValueError appears: Truth values are stored in a 0D array instead of 1D array

I renamed the project HPC-Resnet.
When I train on SemanticKITTI, it works well at the beginning.
However, a ValueError (Truth values are stored in a 0D array instead of 1D array) appears when the epoch count reaches 160 or more (sometimes around 460 or 680).
I don't know what is happening; I suspect something in the data processing.

File "train_SemanticKitti.py", line 324, in
trainer.train(net, training_loader, test_loader, config)
File "/home/ludy/PythonWorkSpace/HPC-Resnet/utils/trainer.py", line 271, in train
self.validation(net, val_loader, config)
File "/home/ludy/PythonWorkSpace/HPC-Resnet/utils/trainer.py", line 289, in validation
self.slam_segmentation_validation(net, val_loader, config)
File "/home/ludy/PythonWorkSpace/HPC-Resnet/utils/trainer.py", line 801, in slam_segmentation_validation
Confs[i, :, :] = fast_confusion(truth, preds, val_loader.dataset.label_values).astype(np.int32)
File "/home/ludy/PythonWorkSpace/HPC-Resnet/utils/metrics.py", line 48, in fast_confusion
raise ValueError('Truth values are stored in a {:d}D array instead of 1D array'. format(len(true.shape)))
ValueError: Truth values are stored in a 0D array instead of 1D array

How to build a data_loader for custom dataset?

Hello, thank you for your great work.
I noticed that you use complex data loaders, which may be hard to understand but are more efficient in the end.
Now I'd like to train on my own dataset. Could you please give me some tips for building a data loader for it?

SemanticKitti all batches are unit sized

Hi,
thank you very much for making this PyTorch version available!
I wonder why at line 1230 of SemanticKitti.py (here: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/master/datasets/SemanticKitti.py#L1230) you drop the batch dimension, and with it all the input samples in input_list except the first one, making all batches unit-sized.
In fact, because of that, what goes into the network is always a single point cloud, regardless of config.batch_num.
Could you please explain why?
Doesn't that alter the training process, reducing the number of iterations and skipping most of the samples in each epoch?
Does it have to do with the parallelism introduced by config.input_threads, so that each thread picks one? While debugging I could not see this happening. If so, the code would not behave the same with a multi-sample batch and a single thread, right?
Thank you!
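For context on this question: point clouds of different sizes cannot be put into one dense (B, N, 3) tensor without padding. A common alternative, and what the "stacked_*" names used elsewhere in this repository suggest, is to concatenate the clouds along the point axis and keep a vector of per-cloud lengths, so that a single loader item can already hold several clouds. The helpers `stack_clouds` and `unstack` are hypothetical; this is a minimal NumPy sketch of that layout, not the repository's collate code.

```python
import numpy as np


def stack_clouds(clouds):
    """Hypothetical helper (sketch): concatenate variable-size (N_i, 3) clouds
    into one (sum N_i, 3) array plus per-cloud lengths, instead of padding."""
    stacked_points = np.concatenate(clouds, axis=0)
    stacked_lengths = np.array([c.shape[0] for c in clouds], dtype=np.int64)
    return stacked_points, stacked_lengths


def unstack(stacked, lengths):
    """Split a stacked array back into one array per cloud."""
    return np.split(stacked, np.cumsum(lengths)[:-1], axis=0)


clouds = [np.random.rand(1000, 3), np.random.rand(750, 3)]
points, lengths = stack_clouds(clouds)
print(points.shape, lengths.tolist())  # (1750, 3) [1000, 750]
```

With this layout, "batch size" is better measured in total points than in number of clouds, which is one reason a loader may hand the network a single stacked item per step.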

Is voxel grid downsampling necessary?

Hi Thomas,

First, thanks for the PyTorch implementation.

I have one question regarding the motivation for the initial downsampling step. Is it mostly performed to keep the size of the point clouds manageable, or is KPConv very sensitive to changes in point cloud resolution?

Of course, downsampling does not present any problems for classification or even segmentation tasks, where it is easy to interpolate or project the results back onto the original point cloud. However, it could be a problem if, for example, one would like to estimate a vector quantity for each point, which might not be trivial to interpolate.

I have seen that in semanticKITTI for example you use 6 cm voxel size. Do you see a large degradation of the performance in parts where the resolution of the original point cloud is actually lower than that?

Thanks,

Zan
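As a reference point for this question, voxel-grid subsampling with cell size dl (for example the 6 cm mentioned above) can be sketched as replacing all points in each occupied cell by their barycenter. The function below is a hypothetical NumPy illustration, not the repository's C++ `grid_subsampling` wrapper. It also returns, for each original point, the index of its cell, which is one simple way to carry per-point quantities back to full resolution.

```python
import numpy as np


def grid_subsample(points, dl):
    """Hypothetical sketch of voxel-grid subsampling: each occupied cubic cell
    of side dl is replaced by the barycenter of its points.

    Returns the subsampled points and, for every input point, the index of
    the cell (i.e. of the subsampled point) it belongs to."""
    keys = np.floor(points / dl).astype(np.int64)  # integer cell coordinates
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # keep it 1-D across NumPy versions
    sums = np.zeros((counts.shape[0], points.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, points)  # per-cell sum of coordinates
    return (sums / counts[:, None]).astype(points.dtype), inverse
```

A quantity computed on the subsampled points (`values_sub`, one row per cell) can be mapped back to every original point with `values_sub[inverse]`; this is a nearest-cell copy, so interpolating from several neighboring subsampled points would be the smoother alternative for vector quantities.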

question about s3dis preprocessing

hi, @HuguesTHOMAS
Thanks for your amazing work! It is both effective and efficient.
I have some questions about the data preprocessing of S3DIS. Could you help me figure them out?

Why do you train and evaluate on the subsampled S3DIS rather than the original (full-resolution) one? Is that common practice?

KPConv-PyTorch/datasets/S3DIS.py, line 171 in 9bae9a3:

    self.load_subsampled_clouds()

You subsample the clouds (fusing the points inside 0.03 m spheres). Is 0.03 m also common practice?
What is the function of potentials and pot_tree?
My understanding is that the potentials are used as a way to sample the center points of the spheres. Is that correct? What is the difference between use_potentials and random sampling?

KPConv-PyTorch/datasets/S3DIS.py, line 779 in 9bae9a3:

    if self.use_potentials:

KPConv-PyTorch/datasets/S3DIS.py, line 240 in 9bae9a3:

    return self.random_item(batch_i)

Why do you use sphere-sampled S3DIS? Are you the first one to use it? PointNet, PointCNN, DGCNN all used pillar based sampling. However, I do believe sphere-sampling preserves better spatial information and appreciate sphere-sampling.
Why do you stack zeros in the feature input? Why do you only use RGB + Z and not RGB + XYZ?

KPConv-PyTorch/datasets/S3DIS.py, line 409 in 9bae9a3:

    stacked_features = np.hstack((stacked_features, features))
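To illustrate the idea behind potential-based sampling as described in this question, here is a simplified sketch: every point carries a potential, the next sphere is centered on the point with the lowest potential, and potentials inside that sphere are then increased, so successive spheres spread over the whole cloud instead of landing at random. This is an assumption-based illustration, not the repository's exact update rule; `potential_sphere_centers` is hypothetical and brute-forces distances instead of using a KD-tree such as `pot_tree`.

```python
import numpy as np


def potential_sphere_centers(points, radius, n_spheres, rng=None):
    """Hypothetical sketch: choose n_spheres center indices by always taking
    the lowest-potential point, then raising potentials near it."""
    rng = np.random.default_rng() if rng is None else rng
    # Small random initial potentials break ties between points.
    potentials = rng.random(points.shape[0]) * 1e-3
    centers = []
    for _ in range(n_spheres):
        c = int(np.argmin(potentials))
        centers.append(c)
        d2 = np.sum((points - points[c]) ** 2, axis=1)
        inside = d2 < radius ** 2
        # Weight 1 at the center, decreasing to 0 at the sphere boundary.
        potentials[inside] += np.square(1.0 - d2[inside] / radius ** 2)
    return np.array(centers), potentials
```

Compared with drawing centers uniformly at random, this deterministic "least visited first" rule guarantees that every region is eventually covered, which matters most at test time.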

mapping values are not allowed in "semantic-kitti.yaml"

I got an error while loading "semantic-kitti.yaml" (line 122, column 73; the line is shown below). Any suggestions? Thanks!

line 122: "span class="progress-pjax-loader-bar top-0 left-0" style="width: 0%;"></span"

Traceback (most recent call last):

File "", line 52, in
balance_classes=True)

File "/media/xingyi/Tools4TB/KPConv-PyTorch/datasets/SemanticKitti.py", line 103, in __init__
doc = yaml.safe_load(stream)

File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/__init__.py", line 162, in safe_load
return load(stream, SafeLoader)

File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/__init__.py", line 114, in load
return loader.get_single_data()

File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/constructor.py", line 49, in get_single_data
node = self.get_single_node()

File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()

File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/composer.py", line 58, in compose_document
self.get_event()

File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/parser.py", line 118, in get_event
self.current_event = self.state()

File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/parser.py", line 193, in parse_document_end
token = self.peek_token()

File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/scanner.py", line 129, in peek_token
self.fetch_more_tokens()

File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/scanner.py", line 223, in fetch_more_tokens
return self.fetch_value()

File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/scanner.py", line 579, in fetch_value
self.get_mark())

ScannerError: mapping values are not allowed here
in "./Data/SemanticKitti/semantic-kitti.yaml", line 122, column 73
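The quoted line 122 contains HTML markup rather than YAML, which suggests (unconfirmed) that the file on disk may be a saved GitHub web page instead of the raw YAML file. A quick heuristic check, using a hypothetical helper `looks_like_html`:

```python
def looks_like_html(path, n_bytes=4096):
    """Hypothetical heuristic: True if the start of `path` contains HTML tags,
    e.g. because a web page was saved instead of the raw file."""
    with open(path, 'r', encoding='utf-8', errors='replace') as f:
        head = f.read(n_bytes).lower()
    markers = ('<!doctype html', '<html', '<span', '<div')
    return any(m in head for m in markers)


# Example: looks_like_html('./Data/SemanticKitti/semantic-kitti.yaml')
```

If it returns True, re-downloading the raw version of the file would be the first thing to try.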

multi-gpu implementation

Thanks for your great work. I'm wondering whether it would be easy to implement multi-GPU training based on this repository.

Testing parameters for Semantic KITTI

Hello,
When you tested your network on SemanticKITTI and obtained the current leaderboard score, were the parameter values different from the current values in the train and test files (for example, val_radius in train_SemanticKitti.py and the convolution parameters)? I am trying to replicate your results and want to make sure I use the same parameters you used.
Thanks

Error: 'list' object has no attribute 'clone' while executing train_S3DIS

Hi @HuguesTHOMAS ,
I was trying to run S3DIS training on Area_1 and Area_2 only.

I changed the following in datasets/S3DIS.py

        self.cloud_names = ['Area_1', 'Area_2' ] #, 'Area_3', 'Area_4', 'Area_5', 'Area_6']
        self.all_splits = [0, 1] # 2, 3, 4, 5]

But I ran into the following error:
AttributeError: 'list' object has no attribute 'clone'

Start training
**************
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.3.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1434, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.3.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "E:/kpconv/KPConv-PyTorch-master/train_S3DIS.py", line 304, in <module>
    trainer.train(net, training_loader, test_loader, config)
  File "E:\kpconv\KPConv-PyTorch-master\utils\trainer.py", line 188, in train
    outputs = net(batch, config)
  File "P:\Anaconda_installation\envs\aero\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "E:\kpconv\KPConv-PyTorch-master\models\architectures.py", line 325, in forward
    x = batch.points.clone().detach()
AttributeError: 'list' object has no attribute 'clone'

Kindly let me know what I should do to resolve this error.

Thanks,
Arjun.

Training Time

Hello, HuguesTHOMAS, thanks for your great work and for releasing the code.

I notice that, with the TensorFlow version of the code, the first training epoch takes much longer than the subsequent epochs. I guess this is because you cache some batches of data during the first epoch, so the following epochs can be processed very quickly. Is that true?

But when I do the same thing (caching some batches of data) in PyTorch, I do not see such a significant time reduction.

Can I reimplement this behavior with PyTorch?

Thanks for your advice. @HuguesTHOMAS
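A minimal sketch of in-memory caching for a PyTorch map-style dataset, assuming its items are deterministic (`CachedDataset` is a hypothetical name, not part of the repository). One plausible reason a cache shows little benefit in PyTorch: with `num_workers > 0`, each DataLoader worker holds its own copy of the dataset, and workers are re-created at every epoch unless `persistent_workers=True` (available in recent PyTorch versions), so a cache filled inside a worker may simply be thrown away. Also, if samples are drawn randomly (with augmentation) on every call, caching by index would repeat identical samples every epoch.

```python
class CachedDataset:
    """Hypothetical wrapper (sketch): memoize items of a slow map-style dataset.

    The first access to index i calls the wrapped dataset; later accesses
    return the stored result. Only sensible if items are deterministic."""

    def __init__(self, dataset):
        self.dataset = dataset
        self._cache = {}

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        if idx not in self._cache:
            self._cache[idx] = self.dataset[idx]
        return self._cache[idx]
```

Such a wrapper can be passed to `torch.utils.data.DataLoader` like any dataset; with worker processes, pairing it with `persistent_workers=True` keeps each worker's cache alive across epochs, at the cost of one cache copy per worker.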

Segmentation fault (core dumped) when executing "train_ModelNet40.py"

Hi, @HuguesTHOMAS ,

When I execute python train_ModelNet40.py, I got the Segmentation fault error as the following:

Data Preparation
****************

Loading training points subsampled at 0.020
1620.2 MB loaded in 367.4s

Loading test points subsampled at 0.020

411.6 MB loaded in 40.1s

Starting Calibration (use verbose=True for more details)
Segmentation fault (core dumped)
(pytorch1.4) root@milton-ThinkCentre-M93p:/data/code10/KPConv-PyTorch#

Any hints to solve this issue?

THX!

INPUT settings

Hi, I am working with a new dataset, DALES. If I want to train without any features other than the point locations, how do I set this in the configuration?
Thanks,
Arjun.
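In KPConv, the coordinates define where the kernel points are applied rather than being used as input features, and the KPConv paper describes feeding a constant feature of 1 when no other attribute is available. A minimal sketch of building such geometry-only features, with optional colors; `build_input_features` is hypothetical. In this repository the corresponding setting appears to be `in_features_dim` (quoted with the S3DIS parameters in another issue on this page), but check how your dataset class interprets each value.

```python
import numpy as np


def build_input_features(points, colors=None):
    """Hypothetical sketch: per-point input features for KPConv-style models.

    A constant column of ones is always present, so the network receives a
    non-zero input even when no attribute besides position exists; colors
    of shape (N, 3) are appended when given."""
    features = np.ones((points.shape[0], 1), dtype=np.float32)
    if colors is not None:
        features = np.hstack((features, colors.astype(np.float32)))
    return features
```

For DALES without extra attributes, only the first branch is used, so each point gets a single feature equal to 1.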

out of memory during training

Hi Hugues,

I'm currently running some experiments with this repository on NPM3D. However, some of the experiments exit on their own because of an out-of-memory error during training. I have checked the code, but I have no idea about the cause yet. Have you encountered similar cases? Do you have any suggestions?

Here is a snapshot:

...
e007-i0009 => L=3.165 acc= 91% / t(ms): 122.0 120.0 61.5)
Traceback (most recent call last):
File "train_dist_NPM3D.py", line 349, in
trainer.train(net, training_loader, test_loader, config)
File "/home/dongnie/codebooks/KPConv-PyTorch/utils/trainer_dist.py", line 212, in train
loss.backward()
File "/home/dongnie/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/dongnie/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/init.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 3.36 GiB (GPU 2; 10.73 GiB total capacity; 3.70 GiB already allocated; 3.14 GiB free; 6.81 GiB reserved in total by PyTorch) (malloc at /opt/conda/conda-bld/pytorch_1591914742272/work/c10/cuda/CUDACachingAllocator.cpp:289)

Thanks.

Did you ever try fewer epochs or fewer steps per epoch?

Dear Thomas,

Because the original setting of 500 epochs with 500 steps per epoch takes too much time, did you ever try fewer epochs or steps and still get similar performance?

Thank you so much.

Missing inference on some points of the points cloud

Hi, @HuguesTHOMAS
Thanks for your work!
We are trying to use the library for point cloud segmentation on our own dataset.
We have some problems with inference. We get the following output for a particular point cloud:

You can see in blue the ground and in green the vegetation.
As you can notice we have some points that are not classified (gray ones).
For completeness, the parameters chosen (for training and inference) are the same parameters that you use for the S3DIS dataset.
Do you have any suggestions for getting a completely classified point cloud?
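One generic way to get a prediction for every point, whatever the reason some were skipped, is to copy the label of the nearest classified point. The sketch below assumes unclassified points are marked with -1 (an assumption) and uses SciPy's `cKDTree`; `fill_missing_labels` is hypothetical. It only patches the output and does not address the root cause (for instance, if test-time sampling stopped before every area was covered).

```python
import numpy as np
from scipy.spatial import cKDTree


def fill_missing_labels(points, labels, missing=-1):
    """Hypothetical sketch: give every unlabeled point (label == missing) the
    label of its nearest labeled neighbor. points: (N, 3), labels: (N,)."""
    labels = labels.copy()
    known = labels != missing
    if known.all() or not known.any():
        return labels  # nothing to fill, or nothing to copy from
    tree = cKDTree(points[known])
    _, nn = tree.query(points[~known], k=1)  # index into the known subset
    labels[~known] = labels[known][nn]
    return labels
```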

Vote accuracy

I can see the vote accuracy and validation accuracy in the plotted results. What does "vote acc" mean?
Also, why are there votes in the test function? Should we run it, e.g., 100 times to get averaged results? Thanks, and looking forward to your kind reply!
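In point-cloud segmentation code, "voting" usually means running the network several times over overlapping inputs (and possibly augmentations) and combining the per-point predictions; a vote accuracy would then be the accuracy of the combined prediction. A minimal averaging sketch, assuming each pass returns the indices of the points it covered and their softmax probabilities; `VoteAccumulator` is hypothetical, and the repository may combine votes differently (for example with a smoothing factor).

```python
import numpy as np


class VoteAccumulator:
    """Hypothetical sketch: average class probabilities per point over
    several test passes ("votes")."""

    def __init__(self, num_points, num_classes):
        self.prob_sum = np.zeros((num_points, num_classes))
        self.counts = np.zeros(num_points, dtype=np.int64)

    def add(self, point_inds, probs):
        # np.add.at accumulates correctly even if point_inds repeats an index
        np.add.at(self.prob_sum, point_inds, probs)
        np.add.at(self.counts, point_inds, 1)

    def predictions(self):
        """Argmax of the averaged probabilities; -1 for never-covered points."""
        preds = np.full(self.counts.shape, -1, dtype=np.int64)
        seen = self.counts > 0
        avg = self.prob_sum[seen] / self.counts[seen, None]
        preds[seen] = np.argmax(avg, axis=1)
        return preds
```

Points that no pass has covered yet stay at -1, which is also one way unclassified points can appear if testing stops early.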

Can this code run under CUDA 9.0?

I tried to run this code with the GPU version of PyTorch 1.5. However, it says the driver version is too old:
AssertionError:
The NVIDIA driver on your system is too old (found version 9000).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.
I tried downgrading to PyTorch 1.1 to work with CUDA 9.0, but "get_worker_info" could not be found in that version.
I'm running it on a shared server so I don't want to reinstall cuda or Nvidia drivers. Are there any possible workarounds? Thanks a lot!
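If upgrading the driver is not an option, one workaround to try (a sketch, not a tested fix) is to guard the import so the code still loads on PyTorch versions that lack `torch.utils.data.get_worker_info`. The fallback simply reports "main process", which is only correct when the DataLoader runs with `num_workers=0`; other incompatibilities between PyTorch 1.1 and this code may remain.

```python
try:
    from torch.utils.data import get_worker_info
except ImportError:
    def get_worker_info():
        """Fallback (assumption): behave as if running in the main process.
        Only valid with num_workers=0."""
        return None
```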

multi-processing on Windows

Hi Hugues,

I'm running the PyTorch code on Windows. When I set 'num_workers' in the DataLoader to a value larger than 0, training is slower than with 'num_workers' set to 0.
However, when I run the code on Linux, training is faster with a larger 'num_workers'.
I wonder whether you have met the same issue on Windows.

Best wishes,
Yaping

Database Class

I currently have my own code for generating batches of points and labels, as below:


import os

import laspy  # laspy 1.x API (laspy.file.File)
import numpy as np
import pandas as pd
from torch.utils.data import IterableDataset


def load_lasfile(filename):
    # Returns an (N, 4) float32 array: x, y, z, classification
    inFile = laspy.file.File(filename)
    data = np.vstack([inFile.x, inFile.y, inFile.z, inFile.classification]).transpose()
    points = data.astype(np.float32)
    return points


def minMax(x):
    return pd.Series(index=['min', 'max'], data=[x.min(), x.max()])


def window_mask(coord, unq, lo, size):
    # Points whose coordinate lies in [unq[lo], unq[lo + size]); the last
    # window along an axis extends to the edge so no points are dropped
    hi = lo + size
    if hi < len(unq):
        return (coord >= unq[lo]) & (coord < unq[hi])
    return coord >= unq[lo]


class Dataset(IterableDataset):
    """Yields batches of square xy tiles, each resized to npoints points."""

    def __init__(self, file_in, path='../Data/DALES', batch_size=32, npoints=2048,
                 labels_separate=True):
        super().__init__()
        with open(os.path.join(path, file_in)) as f:
            self.file_list = [os.path.join(path, line.strip()) for line in f if line.strip()]
        self.batch_size = batch_size
        self.npoints = npoints
        self.labels_separate = labels_separate  # yield (data, labels) instead of data only
        self.sq_size = 0  # tile side, counted in unique x / y values
        self.label_to_names = {0: 'unknown',
                               1: 'buildings',
                               2: 'cars',
                               3: 'trucks',
                               4: 'poles',
                               5: 'power lines',
                               6: 'fences',
                               7: 'ground',
                               8: 'chair',
                               9: 'vegetation'}
        self.init_labels()
        self.ignored_labels = np.array([0])
        self.dataset_task = 'segmentation'

    def calibrate_sq_size(self, data_, x_unq, y_unq):
        # Grow the tile from the lower-left corner until it holds >= npoints,
        # then step back once so a tile holds no more than npoints
        size = 0
        while size < self.npoints and self.sq_size + 1 < min(len(x_unq), len(y_unq)):
            self.sq_size += 1
            pos = (window_mask(data_[:, 0], x_unq, 0, self.sq_size)
                   & window_mask(data_[:, 1], y_unq, 0, self.sq_size))
            size = int(pos.sum())
        self.sq_size = max(self.sq_size - 1, 1)  # a size of 0 would loop forever

    def tiles(self, data_):
        x_unq = np.unique(data_[:, 0])  # np.unique returns sorted values
        y_unq = np.unique(data_[:, 1])
        if self.sq_size == 0:  # calibrate once, on the first file
            self.calibrate_sq_size(data_, x_unq, y_unq)
        s = self.sq_size
        for y_ll in range(0, len(y_unq), s):
            pos_y = window_mask(data_[:, 1], y_unq, y_ll, s)
            for x_ll in range(0, len(x_unq), s):
                tile = data_[pos_y & window_mask(data_[:, 0], x_unq, x_ll, s)]
                if len(tile) > 0:  # skip empty tiles
                    # np.resize repeats or truncates rows to exactly npoints
                    yield np.resize(tile, (self.npoints, data_.shape[1]))

    def make_batch(self, tile_list):
        batch = np.stack(tile_list)  # (B, npoints, 4)
        if self.labels_separate:
            return batch[:, :, :-1], batch[:, :, -1]  # (B, npoints, 3), (B, npoints)
        return batch

    def next_batch(self):
        buffer = []
        for each_file in self.file_list:
            print("-----------{}-----------".format(each_file))
            for tile in self.tiles(load_lasfile(each_file)):
                buffer.append(tile)
                if len(buffer) == self.batch_size:
                    yield self.make_batch(buffer)
                    buffer = []
        if buffer:  # last, possibly smaller batch
            yield self.make_batch(buffer)

    def init_labels(self):
        # Initialize all label parameters given the label_to_names dict
        self.num_classes = len(self.label_to_names)
        self.label_values = np.sort([k for k, v in self.label_to_names.items()])
        self.label_names = [self.label_to_names[k] for k in self.label_values]
        self.label_to_idx = {l: i for i, l in enumerate(self.label_values)}
        self.name_to_label = {v: k for k, v in self.label_to_names.items()}

    def __iter__(self):
        return self.next_batch()

    def __len__(self):
        return 2000  # placeholder: the real number of batches is not known in advance

The Dataset class was imported as DalesDataset and used in the training script as below:

    training_dataset = DalesDataset(config.train_filelist, path='../Data/DALES/train', batch_size=1, npoints=2048)
    test_dataset = DalesDataset(config.test_filelist, path='../Data/DALES/test', batch_size=1, npoints=2048)

    training_loader = DataLoader(training_dataset,
                                 batch_size=1,
                                 num_workers=0, 
                                 pin_memory=True)
    test_loader = DataLoader(test_dataset,
                             batch_size=None,
                             num_workers=0, 
                             pin_memory=True)

But I run into many errors. What should the data look like to be fed into your model?

I did not understand what length to give the generator, so I set an arbitrary 2000 to see how it runs.
But I find it hard to integrate this into your code. Is it possible to use this code, or should I make changes?
I just need an idea or a link to a resource showing how to do it properly.

Thanks,
Arjun

Data Preprocessing for SemanticKitti

Hi,

Could you provide some references on the data preprocessing for the SemanticKITTI dataset? It seems there are several different steps, and I am new to SLAM segmentation. Thanks!
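As a starting point, the raw SemanticKITTI files can be read with plain NumPy: each `.bin` scan stores float32 x, y, z, remission per point, and each `.label` file stores one uint32 per point whose lower 16 bits are the semantic label and upper 16 bits the instance id. `load_semantickitti_scan` is a hypothetical helper, not the repository's loader; mapping raw semantic ids to training classes is then done with the `learning_map` table in `semantic-kitti.yaml`.

```python
import numpy as np


def load_semantickitti_scan(bin_path, label_path=None):
    """Hypothetical sketch: read one SemanticKITTI scan and, optionally, its labels.

    Returns (scan, semantic, instance); scan is (N, 4) float32 with
    x, y, z, remission. semantic/instance are None without a label file."""
    scan = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
    if label_path is None:
        return scan, None, None
    raw = np.fromfile(label_path, dtype=np.uint32)
    semantic = raw & 0xFFFF  # lower 16 bits: semantic class id
    instance = raw >> 16  # upper 16 bits: instance id
    return scan, semantic, instance
```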

Training with imbalanced S3DIS-like data set

I am attempting to train a model on custom point cloud data in a similar format to the S3DIS data set. One of these classes is only minimally represented relative to the size of the point cloud as a whole, and any training results in a model that simply never predicts that class.

Are there any configuration changes that I can make to prevent this from happening? In particular, I see that SemanticKitti has a balance_classes=True parameter, but S3DIS does not appear to have any equivalent. I tried setting segloss_balance = 'class' in train_S3DIS.py, but that did not appear to make a difference.

Apologies in advance if this is not the appropriate place to ask this. I am essentially curious whether there is any class balancing for S3DIS data sets. Thanks!
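A common way to counter a rare class is to weight the loss per class. The sketch below computes inverse-frequency weights with NumPy (`inverse_frequency_weights` is hypothetical, and inverse frequency is only one of several weighting schemes); the result could then be passed as `weight=torch.from_numpy(w).float()` to `torch.nn.CrossEntropyLoss`. The sketch does not assume where, or whether, the repository's config lets you plug in such weights; check the model's loss definition. The weight order must follow the label indices the network predicts (see `label_to_idx` in the dataset class).

```python
import numpy as np


def inverse_frequency_weights(labels, num_classes, ignored_labels=()):
    """Hypothetical sketch: class weights w_c = N / (C * n_c) from label counts.

    labels are non-negative class indices; classes that never occur or are
    ignored get weight 0, and C counts only the remaining classes."""
    counts = np.bincount(labels, minlength=num_classes).astype(np.float64)
    valid = counts > 0
    for c in ignored_labels:
        valid[c] = False
    weights = np.zeros(num_classes)
    n_total = counts[valid].sum()
    weights[valid] = n_total / (valid.sum() * counts[valid])
    return weights
```

With these weights, each class contributes the same total weight to the loss, so a class covering 1% of the points is weighted about 100 times more than one covering all of them would be.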

Error occurring when trying to train ModelNet40

Hi, thank you for sharing your code in Pytorch!
I was trying to run train_ModelNet40.py after running sh compile_wrappers.sh, which only prints "running build_ext".
Then I get the error message below when I run python3 train_ModelNet40.py:
....../KPConv-PyTorch/datasets/common.py", line 64, in grid_subsampling
return cpp_subsampling.subsample(points,
AttributeError: module 'cpp_wrappers.cpp_subsampling.grid_subsampling' has no attribute 'subsample'

Could you please give me some advice on this problem?
Thank you for your help in advance.
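One way to check what Python actually imports under that name is `importlib.util.find_spec`. A plausible but unconfirmed explanation for this error: if the compiled extension was not produced or not found, and a source directory with the same name exists, `grid_subsampling` resolves to that directory as an empty namespace package with no `subsample` attribute. `find_module_origin` is a hypothetical helper; run it from the repository root.

```python
import importlib.util


def find_module_origin(name):
    """Hypothetical helper: return the file Python would load for `name`, or
    None if it is missing or resolves to a namespace package (a bare
    directory without __init__.py or compiled extension)."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:  # a parent package is missing
        return None
    if spec is None or spec.origin in (None, 'namespace'):
        return None
    return spec.origin


# Example: print(find_module_origin('cpp_wrappers.cpp_subsampling.grid_subsampling'))
```

A path ending in `.so` (or `.pyd` on Windows) means the compiled wrapper is found; `None` points to a build or path problem.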

Training with S3DIS dataset

Hi, Hugues
Thank you very much for your great code. I have two questions about training on the S3DIS dataset.

I got the following error after 25 epochs:
'RuntimeError: CUDA out of memory'
I use 'batch_num = 3' and keep other parameters as the original code.
in_radius = 1.5, num_kernel_points = 15, first_subsampling_dl = 0.03, conv_radius = 2.5,
deform_radius = 6.0, KP_extent = 1.2, KP_influence = 'linear', aggregation_mode = 'sum',
first_features_dim = 128, in_features_dim = 5
I'm using RTX2080ti and PyTorch1.5.0, Cuda10.1, cudnn7.6.3
During the training, it takes about 10.7GB GPU memory. Before the 26th epoch, the error popped up.
I wonder what is the possible reason for this error.
The loss during the training keeps fluctuating around 4 starting from the 3rd epoch. I cannot see obvious decrease in loss.
e003-i0477 => L=5.083 acc= 63% / t(ms): 122.0 116.8 152.1)
e003-i0479 => L=4.044 acc= 78% / t(ms): 144.7 116.9 152.5)
e003-i0483 => L=4.744 acc= 62% / t(ms): 109.6 117.4 154.5)
e003-i0486 => L=4.503 acc= 77% / t(ms): 118.6 118.0 158.0)
e003-i0489 => L=4.099 acc= 71% / t(ms): 114.0 114.9 153.0)
e003-i0492 => L=4.226 acc= 74% / t(ms): 122.7 118.5 156.7)
e003-i0495 => L=4.434 acc= 71% / t(ms): 118.7 119.3 156.4)
e003-i0498 => L=3.956 acc= 87% / t(ms): 115.7 117.3 152.4)
...
e025-i0479 => L=4.096 acc= 76% / t(ms): 119.6 121.1 159.9)
e025-i0482 => L=3.515 acc= 85% / t(ms): 117.8 120.9 159.6)
e025-i0485 => L=3.297 acc= 91% / t(ms): 120.4 122.9 165.7)
e025-i0488 => L=4.218 acc= 74% / t(ms): 135.3 123.5 165.3)
e025-i0492 => L=4.084 acc= 67% / t(ms): 107.2 122.6 162.7)
e025-i0495 => L=4.135 acc= 75% / t(ms): 102.4 120.1 158.9)
e025-i0498 => L=3.695 acc= 81% / t(ms): 104.7 120.0 158.9)
Here are my learning parameters:
epoch_steps = 500
validation_size = 200
learning_rate = 1e-2
momentum = 0.98
lr_decays = {i: 0.1 ** (1 / 150) for i in range(1, max_epoch)}
grad_clip_norm = 100.0
I wonder whether this change in loss is expected or the loss should decrease much faster?
I would appreciate it if you could share your training figure.

Best wishes,
Yaping
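To put the `lr_decays` setting above in numbers: assuming the trainer multiplies the learning rate by `lr_decays[epoch]` once per epoch, the rate after e epochs is `learning_rate * (0.1 ** (1 / 150)) ** e`, i.e. it is divided by 10 every 150 epochs. `lr_at_epoch` is a hypothetical helper for this arithmetic.

```python
def lr_at_epoch(epoch, lr0=1e-2, decay=0.1 ** (1 / 150)):
    """Hypothetical helper: learning rate after `epoch` per-epoch
    multiplicative decays (assumes one decay step per epoch)."""
    return lr0 * decay ** epoch
```

For example, at epoch 25 the rate is still about 6.8e-3, roughly two thirds of the initial 1e-2, so the learning rate has barely decayed at the point where the loss is reported to plateau.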

Training Issue on own dataset

Hi @HuguesTHOMAS,

thanks for making your code open-source.
I am currently trying to train KPConv on my own Dataset which is in "kitti-format".

But in the last step of each epoch, I get stuck in the while loop in SemanticKitti.py, lines 772-774.

765   # Get the indices to generate thanks to potentials
766   used_classes = self.dataset.num_classes - len(self.dataset.ignored_labels)
767   class_n = num_centers // used_classes + 1
768   if class_n < class_potentials.shape[0]:
769       _, class_indices = torch.topk(class_potentials, class_n, largest=False)
770   else:
771       class_indices = torch.zeros((0,), dtype=torch.int32)
772       while class_indices.shape[0] < class_n:
773              new_class_inds = torch.randperm(class_potentials.shape[0])
774              class_indices = torch.cat((class_indices, new_class_inds), dim=0)
775       class_indices = class_indices[:class_n]
776   class_indices = self.dataset.class_frames[i][class_indices]

(I added the [0] in the while condition)

Usually it shouldn't even enter this part at that stage, should it?
Do you maybe know what my problem is and how to solve it?
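A NumPy analogue of the quoted loop shows one way it can hang: if `class_potentials` is empty (no frame of this class is available to the sampler), a permutation of zero elements adds nothing, so `class_indices` never grows and the `while` condition never becomes false. The function below mirrors the structure of lines 768-775 with an explicit guard; its name and the NumPy translation (a stable `argsort` in place of `torch.topk(..., largest=False)`) are assumptions made for illustration.

```python
import numpy as np


def sample_class_indices(class_potentials, class_n, rng=None):
    """Hypothetical sketch: pick class_n frame indices for one class,
    lowest potential first, mirroring the quoted loop plus a guard."""
    rng = np.random.default_rng() if rng is None else rng
    n = class_potentials.shape[0]
    if n == 0:
        # Without this check the while loop below would never terminate.
        raise ValueError('no frames contain this class; check the label mapping')
    if class_n < n:
        return np.argsort(class_potentials, kind='stable')[:class_n]
    class_indices = np.zeros((0,), dtype=np.int64)
    while class_indices.shape[0] < class_n:
        class_indices = np.concatenate((class_indices, rng.permutation(n)))
    return class_indices[:class_n]
```

If the guard fires on a custom dataset, a likely place to look is a class that is neither present in any frame nor listed in `ignored_labels`.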

Error when running 'python3 train_SemanticKitti.py'

Hi! Thanks for implementing the pytorch version!

I got an error when I ran 'python3 train_SemanticKitti.py'.
My environment is ubuntu16.04, python=3.5, pytorch=1.2.0, cuda=10.0.
Could you give me some suggestions for that? Thanks!

Logs are below:

 python train_SemanticKitti.py 

Data Preparation
****************
Preparing seq 00 class frames. (Long but one time only)
Preparing seq 01 class frames. (Long but one time only)
Preparing seq 02 class frames. (Long but one time only)
Preparing seq 03 class frames. (Long but one time only)
Preparing seq 04 class frames. (Long but one time only)
Preparing seq 05 class frames. (Long but one time only)
Preparing seq 06 class frames. (Long but one time only)
Preparing seq 07 class frames. (Long but one time only)
Preparing seq 09 class frames. (Long but one time only)
Preparing seq 10 class frames. (Long but one time only)
Preparing seq 08 class frames. (Long but one time only)

Starting Calibration of max_in_points value (use verbose=True for more details)

Previous calibration found:
Check max_in limit dictionary
"balanced_6.000_0.060": ?
Traceback (most recent call last):
  File "train_SemanticKitti.py", line 290, in <module>
    training_sampler.calib_max_in(config, training_loader, verbose=True)
  File "/media/work/3D/KPConv-PyTorch/datasets/SemanticKitti.py", line 904, in calib_max_in
    for batch_i, batch in enumerate(dataloader):
  File "/home/kx/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/kx/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 709, in __init__
    self._try_put_index()
  File "/home/kx/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 826, in _try_put_index
    index = self._next_index()
  File "/home/kx/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 318, in _next_index
    return next(self.sampler_iter)  # may raise StopIteration
  File "/home/kx/venv/lib/python3.5/site-packages/torch/utils/data/sampler.py", line 200, in __iter__
    for idx in self.sampler:
  File "/media/work/3D/KPConv-PyTorch/datasets/SemanticKitti.py", line 780, in __iter__
    self.dataset.potentials[class_indices] += np.random.rand(class_indices.shape[0]) * 0.1 + 0.1
TypeError: add_(): argument 'other' (position 1) must be Tensor, not numpy.ndarray
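The traceback shows an in-place add on a torch tensor receiving a NumPy array. A minimal sketch of one way around it, assuming `self.dataset.potentials` is a torch tensor (as the error message suggests): convert the NumPy noise with `torch.from_numpy` before adding. The helper name `bump_potentials` is hypothetical and not taken from the repository.

```python
import numpy as np
import torch


def bump_potentials(potentials: torch.Tensor, indices: torch.Tensor) -> None:
    """Add a random increment in [0.1, 0.2) to potentials[indices], in place.

    Hypothetical helper: mirrors the failing line, but converts the NumPy
    noise to a tensor (same dtype and device) before the in-place add.
    """
    noise = np.random.rand(indices.shape[0]) * 0.1 + 0.1
    noise_t = torch.from_numpy(noise).to(dtype=potentials.dtype, device=potentials.device)
    # Cast indices to int64, which every PyTorch version accepts for indexing
    potentials[indices.long()] += noise_t


# Usage: bump two of five potentials
p = torch.zeros(5, dtype=torch.float64)
bump_potentials(p, torch.tensor([0, 2], dtype=torch.int32))
```

Matching the dtype and device of the target tensor avoids a second class of errors when the potentials live on the GPU or use a different float precision.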

ScanNet training code?

Hi Hugues,

Could you provide a training script for ScanNet semantic segmentation task? Thanks a lot!

Code for Part Segmentation in Pytorch

Hi,

Thank you so much for open-sourcing your code. Awesome work! I was wondering if you are planning to release PyTorch code for the ShapeNet Part dataset as well.

Best,
Ankit

Is the torch.cuda.synchronize(self.device) necessary?

Thanks for the great work. After reading the code in trainer.py (utils/trainer.py), I'd like to ask whether torch.cuda.synchronize(self.device) is necessary, since it seems to slow down training.
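For context, CUDA kernels are launched asynchronously, so reading the clock right after a GPU call mostly measures launch time; `torch.cuda.synchronize` blocks until queued work finishes, which makes per-step timings meaningful at the cost of some throughput. A minimal sketch, assuming the synchronize calls in trainer.py serve its timing printout (not verified here): a hypothetical `timed` helper that synchronizes only when asked to.

```python
import time

import torch


def timed(fn, *args, device=None, sync=True):
    """Run fn(*args) and return (result, elapsed_seconds).

    When `sync` is True and `device` is a CUDA device, synchronize before and
    after the call so the elapsed time covers the GPU work itself.
    Passing sync=False skips synchronization (faster, less precise timing).
    """
    use_cuda = sync and device is not None and torch.device(device).type == 'cuda'
    if use_cuda:
        torch.cuda.synchronize(device)
    t0 = time.perf_counter()
    out = fn(*args)
    if use_cuda:
        torch.cuda.synchronize(device)
    return out, time.perf_counter() - t0


# Usage (CPU here, so no synchronization happens)
out, dt = timed(torch.add, torch.ones(3), torch.ones(3), device='cpu')
```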

About weight init

Hi @HuguesTHOMAS ,

Did you add any explicit initialization of the weights or biases of the MLP or BN layers in your code?

Will this influence the performance?

Thanks~
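For reference, PyTorch already initializes BatchNorm layers with unit weight and zero bias, and `nn.Linear` with its own default (uniform) scheme. To try an explicit initialization and compare, a minimal sketch using `Module.apply` could look like this; the choice of Kaiming-normal weights and zero biases is an assumption for illustration, not what the repository does, and `init_weights` is a hypothetical name.

```python
import torch
import torch.nn as nn


def init_weights(module: nn.Module) -> None:
    """Hypothetical explicit init, applied recursively via net.apply()."""
    if isinstance(module, nn.Linear):
        # Kaiming-normal weights suited to ReLU activations, zero biases
        nn.init.kaiming_normal_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d)):
        # Same values as PyTorch's defaults: unit scale, zero shift
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)


net = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU())
net.apply(init_weights)
```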

No test code for classification and evaluation

Thanks for the excellent work! However, I could not find the test code for shape classification on ModelNet40.

Different batch sizes give different results during inference

Hi, @HuguesTHOMAS,
I am trying to use your library on my own dataset.
My dataset is very large: a single point cloud can reach 15 GB, and I run into out-of-memory problems during the inference phase.
Hence, I have tried to change the batch_size parameter from 1 to 200 in order to avoid running out of memory.
Unfortunately, when I change the batch_size parameter I do not obtain good results in terms of IoU a
As a double check, I tried to classify a small point cloud (1 GB) with two values of batch_size and obtained:

batch_size = 1: good performance
batch_size = 200: poor performance

Do you have any ideas on how I can solve my problem?
Thank you,
Giovanni
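One way to check whether predictions depend on how the data is split into batches is to run the same model in eval mode over different chunk sizes and compare. A minimal, generic sketch, assuming a per-point model (a toy MLP here, not KPConv, whose outputs also depend on each point's neighbourhood): in eval mode under `torch.no_grad()`, BatchNorm uses its running statistics, so the chunk size should not change the output. If it does in a real pipeline, the train/eval mode and the way inputs are grouped into batches are worth checking. `predict_in_chunks` is a hypothetical name.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def predict_in_chunks(model: nn.Module, feats: torch.Tensor, chunk: int) -> torch.Tensor:
    """Run a per-point model over feats (N x C) in chunks of `chunk` rows."""
    model.eval()  # BatchNorm uses running statistics, dropout is disabled
    outs = [model(feats[i:i + chunk]) for i in range(0, feats.shape[0], chunk)]
    return torch.cat(outs, dim=0)


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Linear(16, 4))
feats = torch.randn(1000, 3)
small = predict_in_chunks(model, feats, chunk=1)
large = predict_in_chunks(model, feats, chunk=200)
```

Chunking bounds peak memory by the chunk size, while eval mode keeps the results independent of it.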

Suggestion for code simplification

I am comparing different algorithms implemented in PyTorch for 3D point clouds, and I want to make KPConv work well.
I found that adapting KPConv to my own dataset is much more complicated in this repo and takes a substantial amount of time.

I would suggest the code style of https://github.com/valeoai/LightConvPoint. In that repo, the architecture is already KPConv; the only thing needed is the KPConv layer. However, I do not know how to code it.
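Since the question is how to write a KPConv layer on its own, here is a simplified sketch of a rigid KPConv layer in PyTorch. It uses the paper's linear correlation h = max(0, 1 - ||y - x_k|| / sigma), summed over neighbours and kernel points; it is not the repository's implementation. Assumptions: kernel point positions are passed in (the paper's optimization of the kernel point disposition is omitted), and neighbours come as an (N, M) index matrix in which the value S (the number of support points) marks an empty slot. The class name `SimpleKPConv` is hypothetical.

```python
import torch
import torch.nn as nn


class SimpleKPConv(nn.Module):
    """Simplified rigid kernel point convolution (illustrative sketch).

    out(x) = sum_{i in N(x)} sum_k h_ik * (f_i @ W_k),
    h_ik = max(0, 1 - ||(x_i - x) - p_k|| / sigma)
    """

    def __init__(self, kernel_points: torch.Tensor, in_dim: int, out_dim: int, sigma: float):
        super().__init__()
        # Fixed kernel point positions (K, 3), stored as a non-trainable buffer
        self.register_buffer('kernel_points', kernel_points)
        self.sigma = sigma
        num_kp = kernel_points.shape[0]
        # One (in_dim x out_dim) weight matrix per kernel point
        self.weights = nn.Parameter(torch.randn(num_kp, in_dim, out_dim) / in_dim ** 0.5)

    def forward(self, q_pts, s_pts, neighb_inds, feats):
        # q_pts: (N, 3) query points, s_pts: (S, 3) support points,
        # neighb_inds: (N, M) indices into s_pts (value S = empty slot),
        # feats: (S, C_in) features of the support points
        shadow_pt = torch.full_like(s_pts[:1], 1e6)  # far away, so its influence is 0
        s_pts = torch.cat([s_pts, shadow_pt], dim=0)
        feats = torch.cat([feats, torch.zeros_like(feats[:1])], dim=0)

        rel = s_pts[neighb_inds] - q_pts.unsqueeze(1)           # (N, M, 3) neighbour offsets
        diffs = rel.unsqueeze(2) - self.kernel_points           # (N, M, K, 3)
        dists = torch.sqrt(torch.sum(diffs ** 2, dim=-1))       # (N, M, K)
        h = torch.clamp(1.0 - dists / self.sigma, min=0.0)      # linear correlation

        nf = feats[neighb_inds]                                 # (N, M, C_in)
        weighted = torch.einsum('nmk,nmc->nkc', h, nf)          # (N, K, C_in)
        return torch.einsum('nkc,kcd->nd', weighted, self.weights)  # (N, C_out)


# Usage: one kernel point at the origin, one query point,
# two real neighbours and one empty slot (index 2)
conv = SimpleKPConv(torch.zeros(1, 3), in_dim=1, out_dim=1, sigma=1.0)
with torch.no_grad():
    conv.weights.fill_(1.0)
out = conv(torch.zeros(1, 3),
           torch.tensor([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]]),
           torch.tensor([[0, 1, 2]]),
           torch.tensor([[1.0], [2.0]]))
```

In the usage example, the neighbour at distance 0 contributes with weight 1 and the one at distance 0.5 with weight 0.5, so the output is 1 * 1 + 0.5 * 2 = 2.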

Setting the number of models per epoch when using my own data

The setting for ModelNet40 is the following: if self.train: self.num_models = 9843.
Does num_models mean the number of samples?

I have 60 samples of my own data in OBJ file format. So should I set this number to 40?
And what is the equivalent of the loadtxt code? Is there a load.obj? If not, is there any way to convert OBJ to TXT?
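Regarding OBJ input: NumPy has no OBJ loader, but vertex lines are simple to parse. A minimal sketch, assuming only the vertex positions are needed and that a comma-separated text file (one point per line, read with `np.loadtxt(..., delimiter=',')`) is what your loader expects; check the delimiter and the number of columns against the ModelNet40 files you are mimicking. The function names are hypothetical. Mesh vertices can be sparse or unevenly spread, so sampling points on the mesh surface may give input closer to the ModelNet40 point clouds.

```python
import numpy as np


def parse_obj_vertices(lines):
    """Return an (N, 3) float32 array from the 'v x y z' lines of OBJ text.

    Other records ('vn', 'vt', 'f', comments, ...) are ignored.
    """
    verts = []
    for line in lines:
        parts = line.split()
        if parts and parts[0] == 'v':
            verts.append([float(c) for c in parts[1:4]])
    return np.asarray(verts, dtype=np.float32).reshape(-1, 3)


def obj_to_txt(obj_path, txt_path):
    """Write the OBJ vertices as comma-separated 'x,y,z' lines."""
    with open(obj_path) as f:
        verts = parse_obj_vertices(f)
    np.savetxt(txt_path, verts, delimiter=',', fmt='%.6f')


# Reading the result back:
# points = np.loadtxt(txt_path, delimiter=',', dtype=np.float32)
```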

Run bug

Hi, thanks for your great work!
I followed your instructions to run train_S3DIS.py, but it always stops at epoch 16 or 17. I set max_epoch to 500 and changed first_subsampling_dl from 0.03 to 0.08.
Could you help me? Thank you!

Visualization of predictions of semantic-KITTI dataset

How can I visualize the predictions on the SemanticKITTI dataset?
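One common route is to write the points with their predicted labels to a PLY file and open it in a viewer such as CloudCompare or MeshLab. A minimal sketch, assuming you already have an (N, 3) array of points and an (N,) array of integer predictions for a frame; the helper `labels_to_ply`, its random colour palette and the output filename are hypothetical, not part of the repository.

```python
import numpy as np


def labels_to_ply(points: np.ndarray, labels: np.ndarray, seed: int = 0) -> str:
    """Build an ASCII PLY string colouring each point by its predicted label."""
    rng = np.random.default_rng(seed)
    n_labels = int(labels.max()) + 1 if labels.size else 0
    palette = rng.integers(0, 256, size=(n_labels, 3))  # one random RGB per label
    colors = palette[labels]
    header = [
        "ply", "format ascii 1.0",
        f"element vertex {points.shape[0]}",
        "property float x", "property float y", "property float z",
        "property uchar red", "property uchar green", "property uchar blue",
        "property int label",
        "end_header",
    ]
    rows = [f"{x:.4f} {y:.4f} {z:.4f} {r} {g} {b} {l}"
            for (x, y, z), (r, g, b), l in zip(points, colors, labels)]
    return "\n".join(header + rows) + "\n"


pts = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
preds = np.array([0, 1])
ply = labels_to_ply(pts, preds)
# with open('frame_preds.ply', 'w') as f:
#     f.write(ply)
```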

Accuracy with ModelNet40

Hello

I really appreciate your great work.

I previously asked this question about the TensorFlow implementation of KPConv, so I am asking again here.

I have a question about the classification accuracy on ModelNet40 with your default settings.

The only things I changed are the following.

I only added code to save the best accuracy and the best vote accuracy, in order to compare with other works.

In trainer.py:

1) Validation part: save the best accuracy

I changed the self.validation call as below and appended the saving code:

# best_acc, best_vote_acc, conv_epoch and conv_epoch_vote must be initialized
# (e.g. to 0) before the epoch loop for the comparisons below to work
val_acc, vote_acc = self.validation(net, val_loader, config)

if best_acc < val_acc:
    best_acc = val_acc
    conv_epoch = self.epoch
    if config.saving:
        # Get current state dict
        save_dict = {'epoch': self.epoch,
                     'model_state_dict': net.state_dict(),
                     'optimizer_state_dict': self.optimizer.state_dict(),
                     'saving_path': config.saving_path}

        # Save current state of the network (for restoring purposes)
        checkpoint_path = join(checkpoint_directory, 'best_acc_chkp.tar')
        torch.save(save_dict, checkpoint_path)

if best_vote_acc < vote_acc:
    best_vote_acc = vote_acc
    conv_epoch_vote = self.epoch
    if config.saving:
        # Get current state dict
        save_dict = {'epoch': self.epoch,
                     'model_state_dict': net.state_dict(),
                     'optimizer_state_dict': self.optimizer.state_dict(),
                     'saving_path': config.saving_path}

        # Save current state of the network (for restoring purposes)
        checkpoint_path = join(checkpoint_directory, 'best_vote_acc_chkp.tar')
        torch.save(save_dict, checkpoint_path)

# Print the running best results and append them to best_results.txt
message = ('>>> epoch: {:d} \nbest_acc: {:.3f} | conv_epoch: {:d} \n'
           'best_vote_acc: {:.3f} | conv_epoch_vote: {:d} \n')
print(message.format(self.epoch, best_acc, conv_epoch, best_vote_acc, conv_epoch_vote))

with open(join(config.saving_path, 'best_results.txt'), "a") as file:
    file.write(message.format(self.epoch, best_acc, conv_epoch, best_vote_acc, conv_epoch_vote))

2) Revised the validation function to return val_acc and vote_acc:

def validation(self, net, val_loader, config: Config):

    # Defaults, so the return below does not fail for non-classification tasks
    val_acc, vote_acc = None, None
    if config.dataset_task == 'classification':
        _, val_acc, vote_acc = self.object_classification_validation(net, val_loader, config)
    elif config.dataset_task == 'segmentation':
        self.object_segmentation_validation(net, val_loader, config)
    elif config.dataset_task == 'cloud_segmentation':
        self.cloud_segmentation_validation(net, val_loader, config)
    elif config.dataset_task == 'slam_segmentation':
        self.slam_segmentation_validation(net, val_loader, config)
    else:
        raise ValueError('No validation method implemented for this network type')
    return val_acc, vote_acc

3) Revised the object_classification_validation function to return three values (C1, val_ACC, vote_ACC) for use in the validation function:

            return C1, val_ACC, vote_ACC

The code worked and the experiment ran successfully.

During training, the best validation accuracy was 93.75 and the best vote accuracy was 92.385.

However, when I tested the saved model from that training run with test_models.py,
the best test accuracy was at most 92.1.

My question is: which of these is the correct result for your code?

As far as I can see in the data loading part of your code, both scripts use the same test_dataset, which is used as val_dataset during training.
Also, I think the test procedure is almost the same in both scripts.

What makes the accuracy differ between the two scripts?
Did I do something wrong in my experiment?
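To see why single-pass and vote accuracy can differ, here is a toy sketch of vote accuracy as a plain average of class probabilities over several (e.g. augmented) passes before taking the argmax. This is an assumption for illustration; the repository may accumulate votes differently (for instance with a running, smoothed average), so the numbers need not match exactly. The function names are hypothetical.

```python
import numpy as np


def single_pass_accuracy(probs, labels):
    """Accuracy of the argmax of one (N, C) probability array."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))


def vote_accuracy(prob_passes, labels):
    """Average (N, C) probabilities over several passes, then score the argmax."""
    mean_probs = np.mean(np.stack(prob_passes, axis=0), axis=0)
    return float(np.mean(np.argmax(mean_probs, axis=1) == labels))


labels = np.array([0, 1])
pass_1 = np.array([[0.6, 0.4], [0.6, 0.4]])  # second object misclassified
pass_2 = np.array([[0.9, 0.1], [0.1, 0.9]])  # both objects correct
acc_1 = single_pass_accuracy(pass_1, labels)
acc_vote = vote_accuracy([pass_1, pass_2], labels)
```

Here the averaged probabilities for the second object are (0.35, 0.65), so the vote recovers the correct class even though one pass got it wrong.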

batch_num of pytorch version is smaller than that of tf version

Hi Hugues,

I have tried both the TensorFlow and PyTorch versions of KPConv. I'd like to ask why the batch_num of the PyTorch version is smaller than that of the TensorFlow version. In my experience, the PyTorch version uses more memory than the TensorFlow version.

Thanks.

Logic on SemanticKitti testing code

Thanks for your excellent work! The stopping criterion for testing on the SemanticKitti test set (with on_val=False) seems to require the minimum frame potential to be greater than 100? I don't understand what this potential means. I have already produced the ".npy" prediction files in /test/probs/, but the test_model.py program keeps changing these .npy files. Moreover, the mIoU can only be calculated if I set "on_val=True". Why can't I get the mIoU on the test set, and what is the stopping logic of the testing program?

Thanks in advance.
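As a rough illustration of the general idea behind potential-based sampling (a toy sketch under assumptions, not the repository's tester logic): each frame carries a potential that grows every time it is processed, the loop always picks the frame with the lowest potential, and it stops once the minimum potential exceeds a threshold, that is, once every frame has been covered a given number of times. `run_until_covered` is a hypothetical name.

```python
import numpy as np


def run_until_covered(num_frames, min_potential, seed=0):
    """Toy potential-based loop: always process the least-visited frame,
    bump its potential, and stop once every potential exceeds min_potential."""
    rng = np.random.default_rng(seed)
    potentials = rng.random(num_frames) * 0.1  # small random init breaks ties
    visits = np.zeros(num_frames, dtype=int)
    while potentials.min() <= min_potential:
        i = int(np.argmin(potentials))  # least-covered frame
        visits[i] += 1                  # "process" it (e.g. run the network, accumulate probs)
        potentials[i] += 1.0            # mark it as seen once more
    return visits


visits = run_until_covered(num_frames=5, min_potential=2.5)
```

Because the least-visited frame is always chosen, coverage stays even: with a threshold of 2.5, every frame ends up processed exactly three times.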

How to choose the radius size when I need to look for neighbours of each point

Hi,
I'm replicating your code, but using a different dataset. I want to know how to set the search radius used to find the neighbour points. In your paper, you only say: "where σ is the influence distance of the kernel points, and will be chosen according to the input density", whereas in your code (config.py) you set some default values for the radius and density parameters. Is there any reference I can use when setting the search radius?
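One reference point from this very thread: the neighbour-limit keys in the S3DIS calibration log of the "killed when training on S3DIS" issue ("0.030_0.075", "0.060_0.150", "0.120_0.720", "0.240_1.440", "0.480_2.880") are consistent with a subsampling grid size that doubles at each layer and a neighbourhood radius equal to the grid size times a multiplier: 2.5 for the first two layers and 6.0 for the deeper ones. A minimal sketch reproducing those keys, assuming the multipliers correspond to the conv_radius and deform_radius settings in config.py (names and layer split inferred, not verified); `layer_radii` is a hypothetical helper.

```python
def layer_radii(first_dl, num_layers, conv_radius=2.5, deform_radius=6.0, first_deformable=2):
    """Return 'dl_radius' strings for each layer (hypothetical helper).

    The grid size dl doubles at every layer; the neighbourhood radius is
    dl * conv_radius for rigid layers and dl * deform_radius for deformable ones.
    """
    keys = []
    dl = first_dl
    for layer in range(num_layers):
        mult = deform_radius if layer >= first_deformable else conv_radius
        keys.append('{:.3f}_{:.3f}'.format(dl, dl * mult))
        dl *= 2
    return keys


keys = layer_radii(0.03, 5)
```

Under this reading, the radius is expressed in units of the subsampling grid, so it scales automatically with the chosen input density.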

Difference between in_radius and conv_radius

Hi, could you please clarify the meaning of in_radius and conv_radius in the config class? If I understand correctly, conv_radius is the spherical domain of kernel function g. But it should be the same as the neighbourhood radius of each input point.

Thanks in advance!
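To make the distinction concrete, a minimal sketch under the assumption that in_radius is the radius of the input sphere cropped around a sampled centre (the S3DIS key "potentials_1.500_0.030_6" elsewhere in this thread suggests 1.5 for that dataset), while conv_radius scales the per-point neighbourhood used inside each convolution layer. `crop_input_sphere` is hypothetical and uses brute-force distances instead of a KD-tree.

```python
import numpy as np


def crop_input_sphere(points, center, in_radius):
    """Indices of the points strictly inside the sphere of radius in_radius around center."""
    sq_dists = np.sum((points - center) ** 2, axis=1)
    return np.flatnonzero(sq_dists < in_radius ** 2)


pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
inside = crop_input_sphere(pts, np.zeros(3), in_radius=1.5)
```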

Scannet preprocessing and training

Hi @HuguesTHOMAS,

Because the data preprocessing in your work involves many details, adapting the S3DIS training script to ScanNet scenes is really non-trivial.

So could you please give us some tips for adapting the S3DIS script to ScanNet?

I would highly appreciate it!

bug when training S3DIS dataset in random mode instead of potential mode

Hi @HuguesTHOMAS ,
I am training on the S3DIS dataset in random mode instead of potential mode. If input_threads is set to 0, everything is OK, but if input_threads is set to a value greater than 0, it crashes.
My change is:
training_dataset = S3DISDataset(config, set='training', use_potentials=False)

Thanks!

killed when training on S3DIS

@HuguesTHOMAS thanks so much for your amazing work. However, I ran into a problem when training on the S3DIS dataset.
After "Preparing KDTree" and "Preparing ply files", the process is killed.
Could you please help me find what is happening here? Thanks so much in advance!

Starting Calibration (use verbose=True for more details)

Previous calibration found:
Check batch limit dictionary
"potentials_1.500_0.030_6": ?
Check neighbors limit dictionary
"0.030_0.075": ?
"0.060_0.150": ?
"0.120_0.720": ?
"0.240_1.440": ?
"0.480_2.880": ?
Step 1 estim_b = 0.10 batch_limit = 501
Step 27 estim_b = 0.94 batch_limit = 13501
Step 48 estim_b = 1.10 batch_limit = 23801
Step 68 estim_b = 1.72 batch_limit = 32501
Step 80 estim_b = 2.34 batch_limit = 36601
Step 91 estim_b = 2.44 batch_limit = 40501
Step 105 estim_b = 2.88 batch_limit = 44701
Step 115 estim_b = 3.13 batch_limit = 47401
Step 125 estim_b = 3.13 batch_limit = 50401
Step 136 estim_b = 3.59 batch_limit = 52801
Step 147 estim_b = 3.66 batch_limit = 55301
Step 157 estim_b = 3.85 batch_limit = 57301
Step 165 estim_b = 4.06 batch_limit = 58701
Step 169 estim_b = 3.86 batch_limit = 59701
Step 181 estim_b = 4.08 batch_limit = 61901
Step 190 estim_b = 4.40 batch_limit = 63201
Step 191 estim_b = 4.36 batch_limit = 63401
Step 200 estim_b = 4.52 batch_limit = 64601
Step 206 estim_b = 4.43 batch_limit = 65601
Step 215 estim_b = 4.53 batch_limit = 66901
Step 226 estim_b = 4.85 batch_limit = 68101
Step 232 estim_b = 4.60 batch_limit = 69101
Step 241 estim_b = 4.60 batch_limit = 70301
Step 249 estim_b = 4.47 batch_limit = 71601
Step 254 estim_b = 4.45 batch_limit = 72401
Step 256 estim_b = 4.55 batch_limit = 72601
Step 257 estim_b = 4.60 batch_limit = 72701
Step 259 estim_b = 4.86 batch_limit = 72701
Step 260 estim_b = 4.88 batch_limit = 72801
Step 261 estim_b = 4.79 batch_limit = 73001
Step 262 estim_b = 4.81 batch_limit = 73101
Step 264 estim_b = 4.85 batch_limit = 73301
Step 265 estim_b = 4.86 batch_limit = 73401
Step 266 estim_b = 5.08 batch_limit = 73301
Step 269 estim_b = 5.07 batch_limit = 73601
Killed
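For context on what the log shows, the numbers are consistent with a simple proportional controller: a low-pass estimate of the batch size (estim_b) and a point budget (batch_limit) that grows while batches stay smaller than the target. A minimal sketch, assuming a target of 6 spheres per batch (the trailing 6 in "potentials_1.500_0.030_6"), a gain of 100 and a low-pass constant of 10; these constants are inferred because they reproduce the first log line, not read from the code. As batch_limit grows, each batch holds more points, which would also increase memory use during calibration. `calib_step` is a hypothetical name.

```python
def calib_step(estim_b, batch_limit, b, target_b=6, kp=100.0, low_pass_t=10.0):
    """One hypothetical calibration update.

    estim_b: running (low-pass filtered) estimate of the batch size
    batch_limit: current point budget per batch
    b: number of spheres in the batch just produced
    """
    estim_b += (b - estim_b) / low_pass_t
    batch_limit += kp * (target_b - b)
    return estim_b, batch_limit


# First log step, from assumed initial values estim_b=0, batch_limit=1,
# with a first batch containing a single sphere
estim_b, batch_limit = calib_step(0.0, 1, b=1)
```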

Error with "import cpp_wrappers.cpp_neighbors.radius_neighbors as cpp_neighbors" in common.py file

Hi! Thank you for implementing the PyTorch version!

I ran into an error when running 'common.py'.
My environment is macOS. I have run "sh compile_wrappers.sh", but I can't get the function "radius_neighbors". Could you give me some suggestions? Thanks a lot!
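A small diagnostic that may help narrow this down: check whether Python can locate the compiled module at all, and which file suffixes the running interpreter accepts for compiled extensions (a module built for a different Python version would not match). A minimal sketch using only the standard library; run it from the repository root so cpp_wrappers is on the path. The helper names are hypothetical.

```python
import importlib.machinery
import importlib.util


def module_available(name: str) -> bool:
    """True if `name` (a dotted module path) can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError)
        return False


def extension_suffixes():
    """File suffixes this interpreter accepts for compiled extension modules."""
    return list(importlib.machinery.EXTENSION_SUFFIXES)


print(module_available('cpp_wrappers.cpp_neighbors.radius_neighbors'))
print(extension_suffixes())
```

If the module is not found, comparing the name of the built file in cpp_wrappers/cpp_neighbors with the printed suffixes shows whether it was compiled for this interpreter.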

Training code for ShapeNet part segmentation

Will the training code/models for ShapeNet Part segmentation be released?

Performance of KPConv on S3DIS

Hi, @HuguesTHOMAS
Thanks for your amazing work.
I wonder whether this PyTorch code (segmentation on S3DIS) can reproduce the results in the paper.

Thank you.

Data Input format

Thank you for releasing your PyTorch code. It's awesome work.
I have a question about the flow of this code.

To my knowledge, for the S3DIS dataset, the input format of the network would be (Batch x Points x 3+RGB).
I understand why the last dimension of the input list is 3 to 5.
But can I ask why the first dimension of your input list is 103955?
When I look at the input of KPConv, it has the dimensions (103955, 5).

Sorry for bothering you with a basic question.
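Regarding the first dimension: the repository appears to stack the point clouds of a batch along a single point axis rather than padding them into a (Batch x Points x C) tensor, so 103955 would plausibly be the total number of points over all clouds in the batch, with a separate vector of per-cloud lengths. A minimal NumPy sketch of that stacked layout, under that assumption; `stack_batch` and `unstack_batch` are hypothetical names.

```python
import numpy as np


def stack_batch(clouds):
    """Concatenate variable-size clouds (each Ni x C) into one (sum Ni) x C array plus lengths."""
    lengths = np.array([c.shape[0] for c in clouds], dtype=np.int64)
    return np.concatenate(clouds, axis=0), lengths


def unstack_batch(stacked, lengths):
    """Split a stacked array back into per-cloud arrays."""
    return np.split(stacked, np.cumsum(lengths)[:-1], axis=0)


clouds = [np.zeros((3, 5)), np.ones((5, 5))]  # two clouds, 5 features per point
stacked, lengths = stack_batch(clouds)
parts = unstack_batch(stacked, lengths)
```

Stacking avoids padding every cloud to the size of the largest one, which matters when input spheres contain very different numbers of points.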

IndexError: index 65655 is out of bounds for axis 0 with size 50176

Hi! I'm getting the following error from testing slam_segmentation/semanticKitti:

e015-i0183 => L=0.626 acc= 79% / t(ms): 0.6 42.7 49.2)

Traceback (most recent call last):
  File "train_SemanticKitti.py", line 324, in <module>
    trainer.train(net, training_loader, test_loader, config)
  File "/KPConv-PyTorch/utils/trainer.py", line 167, in train
    for batch in training_loader:
  File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 971, in _next_data
    return self._process_data(data)
  File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 6.
Original Traceback (most recent call last):
  File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/KPConv-PyTorch/datasets/SemanticKitti.py", line 311, in __getitem__
    p0 = new_points[wanted_ind, :3]
IndexError: index 65655 is out of bounds for axis 0 with size 50176

Can you please help? Is it an error in the data? I am trying to reproduce the SemanticKITTI results. Thanks!
