huguesthomas / kpconv-pytorch
Kernel Point Convolution implemented in PyTorch
License: MIT License
Hi Thomas,
First of all, congratulations on your work, and thank you in advance if you find time to answer my questions.
I work for an electric utility company. Next month I will join a project whose goal is to classify a large ALS point cloud (~60,000 km, already human labeled). I have some experience in computer vision, but this will be my first project on this topic, so forgive me if I ask naive questions.
Any further feedback is appreciated. I have read your paper but still need to study your repository; I would first like confirmation that I am looking in the right direction before investing time in studying your excellent work further :)
I encounter the error below when I run train_SemanticKitti.py. I did not find the file named "max_in_limits.pkl". Could you give any hints? Thanks!
Starting Calibration of max_in_points value (use verbose=True for more details)
Previous calibration found:
Check max_in limit dictionary
"balanced_4.000_0.060": ?
Traceback (most recent call last):
File "", line 1, in
training_sampler.calib_max_in(config, training_loader, verbose=True)
File "/media/xingyi/Tools4TB/KPConv-PyTorch/datasets/SemanticKitti.py", line 905, in calib_max_in
for batch_i, batch in enumerate(dataloader):
File "/home/xingyi/miniconda3/envs/torch1.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "/home/xingyi/miniconda3/envs/torch1.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 746, in __init__
self._try_put_index()
File "/home/xingyi/miniconda3/envs/torch1.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 861, in _try_put_index
index = self._next_index()
File "/home/xingyi/miniconda3/envs/torch1.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 339, in _next_index
return next(self._sampler_iter) # may raise StopIteration
File "/home/xingyi/miniconda3/envs/torch1.4/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 200, in __iter__
for idx in self.sampler:
File "/media/xingyi/Tools4TB/KPConv-PyTorch/datasets/SemanticKitti.py", line 797, in __iter__
self.dataset.epoch_inds += gen_indices
RuntimeError: The size of tensor a (4400) must match the size of tensor b (4084) at non-singleton dimension 0
I renamed the project HPC-Resnet.
I am training on SemanticKitti, and it works well at the beginning.
However, a ValueError appears ("Truth values are stored in a 0D array instead of 1D array") when the epoch reaches 160 or more (sometimes at about 460 or 680).
I don't know what is happening. I think it may be something in the data processing.
File "train_SemanticKitti.py", line 324, in
trainer.train(net, training_loader, test_loader, config)
File "/home/ludy/PythonWorkSpace/HPC-Resnet/utils/trainer.py", line 271, in train
self.validation(net, val_loader, config)
File "/home/ludy/PythonWorkSpace/HPC-Resnet/utils/trainer.py", line 289, in validation
self.slam_segmentation_validation(net, val_loader, config)
File "/home/ludy/PythonWorkSpace/HPC-Resnet/utils/trainer.py", line 801, in slam_segmentation_validation
Confs[i, :, :] = fast_confusion(truth, preds, val_loader.dataset.label_values).astype(np.int32)
File "/home/ludy/PythonWorkSpace/HPC-Resnet/utils/metrics.py", line 48, in fast_confusion
raise ValueError('Truth values are stored in a {:d}D array instead of 1D array'. format(len(true.shape)))
ValueError: Truth values are stored in a 0D array instead of 1D array
Hello, thank you for your great work.
I noticed that you use complex data loaders, which may be hard to understand but in the end are more efficient.
Now I'd like to train on my own dataset. Could you please give me some tips on building a data loader for it?
Hi,
thank you very much for making this PyTorch version available!
I wonder why at line 1230 of SemanticKitti.py
(here: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/master/datasets/SemanticKitti.py#L1230) you remove the batch size, and with that all the other input samples except for the first one in input_list, making all batches unit sized.
In fact, because of that, what goes into the network is always a single point cloud, no matter the value of config.batch_num.
Could you please explain why?
Doesn't that alter the training process, reducing the number of iterations and missing most of the samples for each epoch?
Does that have to do with the parallelism introduced by config.input_threads, so that each thread picks one? While debugging I probably can't see this happening. If so, the code doesn't work the same with a multi-sample batch size and a single thread, right?
Thank you!
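For readers following this question: one way a loader batch size of one can coexist with a setting like config.batch_num is that each dataset item is already a stacked batch of several clouds, so the collate step only unwraps it. Whether that is what happens at the linked line is an assumption here, not something this thread confirms. The sketch below illustrates the general idea only; `stack_clouds`, `collate_single`, and `max_points` are hypothetical names, not the repository's API.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def stack_clouds(clouds: Sequence[Sequence[Point]],
                 max_points: int) -> Tuple[List[Point], List[int]]:
    """Concatenate whole clouds into one flat list until a point budget is reached.

    Returns the stacked points and the length of each included cloud,
    so the individual clouds can still be separated downstream.
    """
    stacked: List[Point] = []
    lengths: List[int] = []
    for cloud in clouds:
        # Always keep at least one cloud, then stop before exceeding the budget
        if lengths and len(stacked) + len(cloud) > max_points:
            break
        stacked.extend(cloud)
        lengths.append(len(cloud))
    return stacked, lengths


def collate_single(items):
    """Collate for a loader with batch size 1: the single item is already a batch."""
    return items[0]


clouds = [[(0.0, 0.0, 0.0)] * 3, [(1.0, 1.0, 1.0)] * 4, [(2.0, 2.0, 2.0)] * 5]
points, lengths = stack_clouds(clouds, max_points=8)
batch = collate_single([(points, lengths)])
print(lengths)  # two clouds fit in the budget of 8 points
```

Under this reading, dropping the outer list would not discard samples, because the samples are already inside the single item.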
Hi Thomas,
First, thanks for the PyTorch implementation.
I have one question regarding the motivation for the initial downsampling step. Is it mostly performed to keep the size of the point clouds manageable, or is KPConv very sensitive to changes in point cloud resolution?
Of course, the downsampling does not present any problems for classification or even segmentation tasks, where it is easy to interpolate or project the results onto the original point cloud. However, it could be a problem if, for example, one would like to estimate some vector quantity for each point, which might not be trivial to interpolate.
I have seen that for SemanticKITTI, for example, you use a 6 cm voxel size. Do you see a large degradation in performance in parts where the resolution of the original point cloud is actually lower than that?
Thanks,
Zan
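As background for this question, here is a minimal pure-Python sketch of voxel grid subsampling, where every point falling in the same voxel is replaced by the voxel barycenter. It is an illustration of the general technique, not the repository's compiled implementation; `grid_subsample` is a hypothetical name.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def grid_subsample(points: List[Point], voxel_size: float) -> List[Point]:
    """Replace all points that fall in the same voxel by their barycenter."""
    # Per voxel: running sums of x, y, z and the point count
    sums: Dict[Tuple[int, int, int], List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        acc = sums[key]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1
    return [(sx / n, sy / n, sz / n) for sx, sy, sz, n in sums.values()]


# Two points share a 6 cm voxel and are merged; the third is kept as is
dense = [(0.01, 0.01, 0.0), (0.03, 0.05, 0.0), (0.10, 0.0, 0.0)]
reduced = grid_subsample(dense, voxel_size=0.06)

# Points already sparser than the voxel size each occupy their own voxel
sparse = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
unchanged = grid_subsample(sparse, voxel_size=0.06)
```

In this scheme, regions whose original resolution is coarser than the voxel size pass through unmodified, so the subsampling evens out density only where the cloud is denser than the grid.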
hi, @HuguesTHOMAS
Thanks for your amazing work! Effective and efficient.
I have some questions about the data preprocessing of S3DIS. Could you help me figure them out?
Why do you train and evaluate on the subsampled S3DIS, not the original (full-resolution) one? Is that a common practice?
KPConv-PyTorch/datasets/S3DIS.py
Line 171 in 9bae9a3
Is 0.03 m a common practice also?
What is the function of potentials and pot_tree?
My understanding is that potentials are used as a way to sample the centre points of a sphere. Is that correct? What is the difference between use_potentials and random sample?
KPConv-PyTorch/datasets/S3DIS.py
Line 779 in 9bae9a3
KPConv-PyTorch/datasets/S3DIS.py
Line 240 in 9bae9a3
Why do you use sphere-sampled S3DIS? Are you the first one to use it? PointNet, PointCNN, and DGCNN all used pillar-based sampling. However, I do believe sphere sampling preserves spatial information better, and I appreciate it.
Why do you stack zeros in the feature input? Why do you use only RGB + Z, not RGB + XYZ?
KPConv-PyTorch/datasets/S3DIS.py
Line 409 in 9bae9a3
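On the potentials question above, the following is a sketch of how a potential-based center sampler can work in general: pick the point with the lowest potential as the next sphere center, then raise the potentials of nearby points so that the next pick lands elsewhere. This matches the asker's reading but is not taken from the repository; the weighting function, the initial potentials, and the name `pick_center` are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def pick_center(points: List[Point], potentials: List[float], radius: float) -> int:
    """Return the index of the lowest-potential point and raise potentials around it."""
    center = min(range(len(points)), key=potentials.__getitem__)
    for i, p in enumerate(points):
        d = math.dist(p, points[center])
        if d < radius:
            # Closer points get a larger increment, so they are picked later
            potentials[i] += (1.0 - (d / radius) ** 2) ** 2
    return center


# Two clusters of two points each, far apart along x
points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 0.0, 0.0), (5.5, 0.0, 0.0)]
# Small distinct starting values stand in for a random initialisation
potentials = [0.0, 0.001, 0.002, 0.003]
picks = [pick_center(points, potentials, radius=1.0) for _ in range(4)]
print(picks)  # the sampler alternates between the two clusters
```

Compared with drawing centers uniformly at random, which may revisit one region several times before touching another, this kind of scheme spreads successive centers over the whole cloud.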
I got an error while loading "semantic-kitti.yaml", at line 122, column 73, which is shown below. Any suggestions? Thanks!
line 122: "<span class="progress-pjax-loader-bar top-0 left-0" style="width: 0%;"></span>"
Traceback (most recent call last):
File "", line 52, in
balance_classes=True)
File "/media/xingyi/Tools4TB/KPConv-PyTorch/datasets/SemanticKitti.py", line 103, in __init__
doc = yaml.safe_load(stream)
File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/__init__.py", line 162, in safe_load
return load(stream, SafeLoader)
File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/__init__.py", line 114, in load
return loader.get_single_data()
File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/constructor.py", line 49, in get_single_data
node = self.get_single_node()
File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/composer.py", line 58, in compose_document
self.get_event()
File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/parser.py", line 118, in get_event
self.current_event = self.state()
File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/parser.py", line 193, in parse_document_end
token = self.peek_token()
File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/scanner.py", line 129, in peek_token
self.fetch_more_tokens()
File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/scanner.py", line 223, in fetch_more_tokens
return self.fetch_value()
File "/home/xingyi/miniconda3/envs/pointConv/lib/python3.7/site-packages/yaml/scanner.py", line 579, in fetch_value
self.get_mark())
ScannerError: mapping values are not allowed here
in "./Data/SemanticKitti/semantic-kitti.yaml", line 122, column 73
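The text quoted for line 122 is HTML markup rather than YAML, which suggests (as an assumption, since the thread does not confirm it) that the file was saved from a web page instead of being downloaded in raw form. A quick check before parsing could look like the sketch below; `looks_like_html` and its marker list are hypothetical and only a heuristic.

```python
HTML_MARKERS = ("<!doctype html", "<html", "<head", "<body", "<div", "<span")


def looks_like_html(text: str) -> bool:
    """Heuristic: report whether the text contains common HTML tags."""
    lowered = text.lower()
    return any(marker in lowered for marker in HTML_MARKERS)


yaml_text = 'labels:\n  0: "unlabeled"\n  1: "car"\n'
html_text = ('<!DOCTYPE html>\n<html>\n'
             '<span class="progress-pjax-loader-bar top-0 left-0" style="width: 0%;"></span>\n'
             '</html>\n')

print(looks_like_html(yaml_text))
print(looks_like_html(html_text))
```

If the check fires on the local copy of semantic-kitti.yaml, fetching the file again in its raw form would be the first thing to try.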
Thanks for your great work. I'm wondering whether multi-GPU training would be easy to implement on top of this repository?
Hello,
When testing your network on Semantic KITTI and obtaining the current score on the leaderboard, were the parameter values different from the current values in the train and test files, such as val_radius in train_SemanticKitti.py and other convolution parameters? I was trying to replicate your results and wanted to make sure that I use the same parameters that you used.
Thanks
Hi @HuguesTHOMAS ,
I was trying to run S3DIS on just Area 1 and Area 2.
I changed the following in datasets/S3DIS.py:
self.cloud_names = ['Area_1', 'Area_2' ] #, 'Area_3', 'Area_4', 'Area_5', 'Area_6']
self.all_splits = [0, 1] # 2, 3, 4, 5]
But I run into the following error:
AttributeError: 'list' object has no attribute 'clone'
Start training
**************
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.3.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1434, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.3.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "E:/kpconv/KPConv-PyTorch-master/train_S3DIS.py", line 304, in <module>
trainer.train(net, training_loader, test_loader, config)
File "E:\kpconv\KPConv-PyTorch-master\utils\trainer.py", line 188, in train
outputs = net(batch, config)
File "P:\Anaconda_installation\envs\aero\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "E:\kpconv\KPConv-PyTorch-master\models\architectures.py", line 325, in forward
x = batch.points.clone().detach()
AttributeError: 'list' object has no attribute 'clone'
Kindly let me know what I should do to resolve this error.
Thanks,
Arjun.
Hello, HuguesTHOMAS, thanks for your great work and the release of code.
I notice that, with the TensorFlow version of the code, the training time of the first epoch is much longer than that of the subsequent epochs. I guess this is because you cache some batches of data in the first epoch, so that the following epochs can be processed very quickly. Is that true?
But when performing the same operation (caching some batches of data) in PyTorch, I cannot find such a significant time reduction.
Can I reimplement this operation in PyTorch?
Thanks for your advice. @HuguesTHOMAS
Hi, @HuguesTHOMAS ,
When I execute python train_ModelNet40.py, I get a segmentation fault, as follows:
Data Preparation
****************
Loading training points subsampled at 0.020
1620.2 MB loaded in 367.4s
Loading test points subsampled at 0.020
411.6 MB loaded in 40.1s
Starting Calibration (use verbose=True for more details)
Segmentation fault (core dumped)
(pytorch1.4) root@milton-ThinkCentre-M93p:/data/code10/KPConv-PyTorch#
Any hints to solve this issue?
THX!
Hi, I am working with a new dataset, DALES. If I want to train on it without any features other than point location, how do I specify that in the configuration?
Thanks,
Arjun.
Hi Hugues,
I'm currently running some experiments with this repository on NPM3D. However, I found that some of the experiments exit automatically because of an out-of-memory issue during training. I have checked the code, but I don't have any idea what causes it. Have you encountered similar cases? Do you have any suggestions?
Here is a snapshot:
...
e007-i0009 => L=3.165 acc= 91% / t(ms): 122.0 120.0 61.5)
Traceback (most recent call last):
File "train_dist_NPM3D.py", line 349, in
trainer.train(net, training_loader, test_loader, config)
File "/home/dongnie/codebooks/KPConv-PyTorch/utils/trainer_dist.py", line 212, in train
loss.backward()
File "/home/dongnie/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/dongnie/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 3.36 GiB (GPU 2; 10.73 GiB total capacity; 3.70 GiB already allocated; 3.14 GiB free; 6.81 GiB reserved in total by PyTorch) (malloc at /opt/conda/conda-bld/pytorch_1591914742272/work/c10/cuda/CUDACachingAllocator.cpp:289)
Thanks.
Dear Thomas,
Since the 500 epochs and 500 steps/epoch in your original setting take too much time, did you ever try fewer epochs or steps while still reaching similar performance?
Thank you so much.
Hi, @HuguesTHOMAS
Thanks for your work!
We are trying to use the library for point cloud segmentation on our own dataset.
We have some problems with inference. We get the following output for a particular point cloud:
You can see the ground in blue and the vegetation in green.
As you can see, some points are not classified (the gray ones).
For completeness, the parameters chosen (for training and inference) are the same parameters that you use for the S3DIS dataset.
Do you have any suggestions for obtaining a completely classified point cloud?
I can see vote accuracy and validation accuracy in the plotted results. What does "vote acc" mean?
Besides, why are there votes in the test function? Should we run it (e.g.) 100 times to get an averaged result? Thanks, and looking forward to your kind reply!
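One plausible reading of "votes", offered here as an assumption rather than a description of this repository, is that predictions for the same point are accumulated over several passes and the final label is taken from the accumulated scores. The sketch below uses an exponential running average; `update_votes`, `predict`, and the `smooth` value are hypothetical.

```python
from typing import List


def update_votes(running: List[float], new_probs: List[float], smooth: float) -> List[float]:
    """Blend new per-class probabilities into the running average for one point."""
    return [smooth * r + (1.0 - smooth) * n for r, n in zip(running, new_probs)]


def predict(probs: List[float]) -> int:
    """Return the index of the highest score."""
    return max(range(len(probs)), key=probs.__getitem__)


# Three passes over the same point, two classes
passes = [[0.4, 0.6], [0.7, 0.3], [0.8, 0.2]]
running = [0.0, 0.0]
for probs in passes:
    running = update_votes(running, probs, smooth=0.5)

single_pass_preds = [predict(p) for p in passes]
voted_pred = predict(running)
print(single_pass_preds, voted_pred)
```

Under this reading, a single pass can disagree with the accumulated result, which would explain why a plot shows two different accuracy curves.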
I tried to run this code using the GPU version of PyTorch 1.5. However, it says the driver version is too old.
AssertionError:
The NVIDIA driver on your system is too old (found version 9000).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.
I tried downgrading to PyTorch 1.1 to cope with CUDA 9.0, but "get_worker_info" could not be found in that version.
I'm running it on a shared server, so I don't want to reinstall CUDA or the NVIDIA drivers. Are there any possible workarounds? Thanks a lot!
Hi Hugues,
I'm running the PyTorch code on Windows. When I set 'num_workers' in the DataLoader to a value larger than 0, training is slower than when 'num_workers' is set to 0.
However, when I run the code on Linux, training is faster with a larger 'num_workers'.
I wonder whether you have encountered the same issue on Windows.
Best wishes,
Yaping
I currently have my own code for generating batches of points and labels as below:
def load_lasfile(filename):
inFile = laspy.file.File(filename)
data = np.vstack([inFile.x, inFile.y, inFile.z, inFile.classification]).transpose()
points = data.astype(np.float32)
# labels_seg = data.loc[:, 'class'].to_numpy().astype(np.int64)
return points # labels_seg
def minMax(x):
return pd.Series(index=['min', 'max'], data=[x.min(), x.max()])
class Dataset(IterableDataset):
"""
"""
def __init__(self, file_in, path='../Data/DALES', batch_size=32, npoints=2048):
super().__init__()
self.file_list = [os.path.join(path, line.strip()) for line in open(os.path.join(path, file_in))]
self.batch_size = batch_size
self.npoints = npoints
self.index = 0
self.first_batch = True
self.sq_size = 0
self.label_to_names = {0: 'unknown',
1: 'buildings',
2: 'cars',
3: 'trucks',
4: 'poles',
5: 'power lines',
6: 'fences',
7: 'ground',
8: 'chair',
9: 'vegetation'}
self.init_labels()
self.ignored_labels = np.array([0])
self.dataset_task = 'segmentation'
def next_batch(self):
# __file = self.file_list[self.index]
for each_file in self.file_list:
# Find sq_size which gives value of x and y coordinates skip considering uniformity in data files
batch_accumulation = 0
data_ = load_lasfile(each_file)
print("-----------{}-----------".format(each_file))
if self.first_batch:
data_min = data_.min(axis=0)
data_max = data_.max(axis=0)
x_range = data_max[0] - data_min[0]
y_range = data_max[1] - data_min[1]
x_unq = np.sort(np.unique(data_[:, 0]))
y_unq = np.sort(np.unique(data_[:, 1]))
size = 0
while size < self.npoints and self.first_batch:
self.sq_size += 1
pos_x = np.logical_and(x_unq[self.sq_size] > data_[:, 0],
data_[:, 0] >= x_unq[0])
pos_y = np.logical_and(y_unq[self.sq_size] > data_[:, 1],
data_[:, 1] >= y_unq[0])
pos = np.logical_and(pos_x, pos_y)
size = len(data_[pos])
self.sq_size -= 1 # to include not more than self.npoints
self.first_batch = False
self.y_ll = 0
while (self.y_ll + self.sq_size) < len(y_unq):
self.y_ul = self.y_ll + self.sq_size
self.x_ll = 0
while (self.x_ll + self.sq_size) < len(x_unq):
self.x_ul = self.x_ll + self.sq_size
pos_x = np.logical_and(x_unq[self.x_ul] > data_[:, 0],
data_[:, 0] >= x_unq[self.x_ll])
pos_y = np.logical_and(y_unq[self.y_ul] > data_[:, 1],
data_[:, 1] >= y_unq[self.y_ll])
pos = np.logical_and(pos_x, pos_y)
self.x_ll += self.sq_size
data = data_[pos]
data = np.resize(data, (self.npoints, data_.shape[1]))
if labels_seperate:
if batch_accumulation < self.batch_size:
if batch_accumulation == 0:
data_batch = np.reshape(data[:, :-1], (1, data.shape[0], data.shape[1] - 1))
labels_batch = np.reshape(data[:, -1], (1, -1))
else:
data_batch = np.vstack(
(data_batch, data[:, :-1].reshape((1, data.shape[0], data.shape[1] - 1)))
)
labels_batch = np.vstack(
(labels_batch, data[:, -1].reshape((1, -1)))
)
batch_accumulation += 1
else:
batch_accumulation = 0
yield data_batch, labels_batch
else:
if batch_accumulation < self.batch_size:
if batch_accumulation == 0:
data_batch = np.reshape(data, (1, data.shape[0], data.shape[1]))
else:
data_batch = np.vstack(
(data_batch, data.reshape((1, data.shape[0], data.shape[1])))
)
batch_accumulation += 1
else:
batch_accumulation = 0
yield data_batch
self.y_ll += self.sq_size
self.y_ul = -1
self.x_ul = -1
pos_x = np.logical_and(x_unq[self.x_ul] > data_[:, 0],
data_[:, 0] >= x_unq[self.x_ll])
pos_y = np.logical_and(y_unq[self.y_ul] > data_[:, 1],
data_[:, 1] >= y_unq[self.y_ll])
pos = np.logical_and(pos_x, pos_y)
data = data_[pos]
data = np.resize(data, (self.npoints, data_.shape[1]))
if labels_seperate:
if batch_accumulation < self.batch_size:
if batch_accumulation == 0:
data_batch = np.reshape(data[:, :-1], (1, data.shape[0], data.shape[1] - 1))
labels_batch = np.reshape(data[:, -1], (1, -1))
else:
data_batch = np.vstack(
(data_batch, data[:, :-1].reshape((1, data.shape[0], data.shape[1] - 1)))
)
labels_batch = np.vstack(
(labels_batch, data[:, -1].reshape((1, -1)))
)
yield data_batch, labels_batch
else:
if batch_accumulation < self.batch_size:
if batch_accumulation == 0:
data_batch = np.reshape(data, (1, data.shape[0], data.shape[1]))
else:
data_batch = np.vstack(
(data_batch, data.reshape((1, data.shape[0], data.shape[1])))
)
yield data_batch
def init_labels(self):
# Initialize all label parameters given the label_to_names dict
self.num_classes = len(self.label_to_names)
self.label_values = np.sort([k for k, v in self.label_to_names.items()])
self.label_names = [self.label_to_names[k] for k in self.label_values]
self.label_to_idx = {l: i for i, l in enumerate(self.label_values)}
self.name_to_label = {v: k for k, v in self.label_to_names.items()}
def __iter__(self):
return self.next_batch()
def __len__(self):
return 2000
The Dataset class was imported as DalesDataset and used in the training file as below.
training_dataset = DalesDataset(config.train_filelist, path='../Data/DALES/train', batch_size=1, npoints=2048)
test_dataset = DalesDataset(config.test_filelist, path='../Data/DALES/test', batch_size=1, npoints=2048)
training_loader = DataLoader(training_dataset,
batch_size=1,
num_workers=0,
pin_memory=True)
test_loader = DataLoader(test_dataset,
batch_size=None,
num_workers=0,
pin_memory=True)
But I run into many errors. How should the data look in order to feed it into your model?
I did not understand the length for the generator, so I added an arbitrary 2000 to check how it runs.
But I find it hard to integrate this into your code. Is it possible to use this code, or should I make changes?
I just need an idea or a link from which I can make it work properly.
Thanks,
Arjun
Hi,
May I ask whether you could provide some references for the data preprocessing of the SemanticKitti dataset? It seems there are several different steps. I am new to SLAM segmentation. Thanks!
I am attempting to train a model on custom point cloud data in a similar format to the S3DIS data set. One of these classes is only minimally represented relative to the size of the point cloud as a whole, and any training results in a model that simply never predicts that class.
Are there any configuration changes that I can make to prevent this from happening? In particular, I see that SemanticKitti has a balanced_class=True parameter, but S3DIS does not appear to have an equivalent. I tried setting segloss_balance = 'class' in train_S3DIS.py, but that did not appear to make a difference.
Apologies in advance if this is not the appropriate place to ask this. I am essentially curious whether there is any class balancing for S3DIS data sets. Thanks!
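For context on the class imbalance described above, a generic remedy is to weight the loss by inverse class frequency so that rare classes contribute more per point. The sketch below shows only that arithmetic. It does not describe what segloss_balance does in this repository, and `inverse_frequency_weights` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Sequence


def inverse_frequency_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Weight each class by total / (num_classes * count), so rare classes weigh more."""
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for c in range(num_classes):
        n = counts.get(c, 0)
        # Classes absent from the sample get weight 0 to avoid dividing by zero
        weights.append(total / (num_classes * n) if n else 0.0)
    return weights


# 90% of points in class 0, 9% in class 1, 1% in class 2
labels = [0] * 90 + [1] * 9 + [2] * 1
weights = inverse_frequency_weights(labels, num_classes=3)
print(weights)
```

Such weights would typically be passed to a weighted cross-entropy loss. Sampling input regions so that the rare class appears more often is a complementary option.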
Hi, thank you for sharing your code in Pytorch!
I was trying to run train_ModelNet40.py after sh compile_wrappers.sh, which prints running build_ext as its output.
I get the error message below when I run python3 train_ModelNet40.py:
....../KPConv-PyTorch/datasets/common.py", line 64, in grid_subsampling
return cpp_subsampling.subsample(points,
AttributeError: module 'cpp_wrappers.cpp_subsampling.grid_subsampling' has no attribute 'subsample'
Could you please give me some advice on this problem?
Thank you in advance for your help.
Hi, Hugues
Thank you very much for your great code. I have two questions about the implementation with the S3DIS dataset.
I hit the following error after 25 epochs:
'RuntimeError: CUDA out of memory'
I use 'batch_num = 3' and keep the other parameters as in the original code.
in_radius = 1.5, num_kernel_points = 15, first_subsampling_dl = 0.03, conv_radius = 2.5,
deform_radius = 6.0, KP_extent = 1.2, KP_influence = 'linear', aggregation_mode = 'sum',
first_features_dim = 128, in_features_dim = 5
I'm using an RTX 2080 Ti with PyTorch 1.5.0, CUDA 10.1, and cuDNN 7.6.3.
During training, it takes about 10.7 GB of GPU memory. Before the 26th epoch, the error popped up.
I wonder what the possible reason for this error is.
The loss during training keeps fluctuating around 4 from the 3rd epoch onward. I cannot see an obvious decrease in the loss.
e003-i0477 => L=5.083 acc= 63% / t(ms): 122.0 116.8 152.1)
e003-i0479 => L=4.044 acc= 78% / t(ms): 144.7 116.9 152.5)
e003-i0483 => L=4.744 acc= 62% / t(ms): 109.6 117.4 154.5)
e003-i0486 => L=4.503 acc= 77% / t(ms): 118.6 118.0 158.0)
e003-i0489 => L=4.099 acc= 71% / t(ms): 114.0 114.9 153.0)
e003-i0492 => L=4.226 acc= 74% / t(ms): 122.7 118.5 156.7)
e003-i0495 => L=4.434 acc= 71% / t(ms): 118.7 119.3 156.4)
e003-i0498 => L=3.956 acc= 87% / t(ms): 115.7 117.3 152.4)
...
e025-i0479 => L=4.096 acc= 76% / t(ms): 119.6 121.1 159.9)
e025-i0482 => L=3.515 acc= 85% / t(ms): 117.8 120.9 159.6)
e025-i0485 => L=3.297 acc= 91% / t(ms): 120.4 122.9 165.7)
e025-i0488 => L=4.218 acc= 74% / t(ms): 135.3 123.5 165.3)
e025-i0492 => L=4.084 acc= 67% / t(ms): 107.2 122.6 162.7)
e025-i0495 => L=4.135 acc= 75% / t(ms): 102.4 120.1 158.9)
e025-i0498 => L=3.695 acc= 81% / t(ms): 104.7 120.0 158.9)
Here are my learning parameters:
epoch_steps = 500
validation_size = 200
learning_rate = 1e-2
momentum = 0.98
lr_decays = {i: 0.1 ** (1 / 150) for i in range(1, max_epoch)}
grad_clip_norm = 100.0
I wonder whether this behavior of the loss is expected, or whether the loss should decrease much faster?
I would appreciate it if you could share your training figure.
Best wishes,
Yaping
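As a worked check of the schedule quoted above: a per-epoch factor of 0.1 ** (1 / 150) divides the learning rate by 10 every 150 epochs. The sketch below assumes the factor is applied once per epoch starting at epoch 1; the exact indexing in the trainer is not shown in this thread, and `learning_rate_at` is a hypothetical helper.

```python
def learning_rate_at(epoch: int,
                     base_lr: float = 1e-2,
                     decay_per_epoch: float = 0.1 ** (1 / 150)) -> float:
    """Learning rate after `epoch` multiplicative decays."""
    return base_lr * decay_per_epoch ** epoch


lr_25 = learning_rate_at(25)
lr_150 = learning_rate_at(150)
print("epoch 25: {:.5f}, epoch 150: {:.5f}".format(lr_25, lr_150))
```

With these values the learning rate at epoch 25 is still roughly 68% of its initial value, so the schedule by itself would not be expected to produce a sharp change in the loss this early.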
Hi @HuguesTHOMAS,
thanks for making your code open-source.
I am currently trying to train KPConv on my own dataset, which is in "kitti-format".
But in the last step of each epoch I get stuck in the while loop in SemanticKitti.py, lines 772-774.
765 # Get the indices to generate thanks to potentials
766 used_classes = self.dataset.num_classes - len(self.dataset.ignored_labels)
767 class_n = num_centers // used_classes + 1
768 if class_n < class_potentials.shape[0]:
769     _, class_indices = torch.topk(class_potentials, class_n, largest=False)
770 else:
771     class_indices = torch.zeros((0,), dtype=torch.int32)
772     while class_indices.shape[0] < class_n:
773         new_class_inds = torch.randperm(class_potentials.shape[0])
774         class_indices = torch.cat((class_indices, new_class_inds), dim=0)
775     class_indices = class_indices[:class_n]
776 class_indices = self.dataset.class_frames[i][class_indices]
(I added the [0] in the while condition)
Usually it shouldn't even enter this part at that stage, should it?
Do you maybe know what my problem is and how to solve it?
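One condition under which the quoted while loop cannot terminate is a class with no frames at all: a random permutation of zero elements is empty, so concatenating it never makes the index list longer. Whether that is the cause in this case is an assumption. The sketch below is a pure-Python analogue of the loop with a guard added; `pick_class_indices` is a hypothetical name and does not use torch.

```python
import random
from typing import List


def pick_class_indices(num_frames: int, class_n: int, rng: random.Random) -> List[int]:
    """Draw class_n frame indices by repeating random permutations, as in the quoted loop."""
    if num_frames == 0:
        # Without this guard the loop below would spin forever on an empty class
        return []
    indices: List[int] = []
    while len(indices) < class_n:
        indices.extend(rng.sample(range(num_frames), num_frames))
    return indices[:class_n]


rng = random.Random(0)
empty_class = pick_class_indices(0, 5, rng)
small_class = pick_class_indices(3, 7, rng)
print(empty_class, small_class)
```

Checking how many frames each class has in the custom dataset would show quickly whether this case applies.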
Hi! Thanks for implementing the PyTorch version!
I hit an error when I run 'python3 train_SemanticKitti.py'.
My environment is Ubuntu 16.04, python=3.5, pytorch=1.2.0, cuda=10.0.
Could you give me some suggestions? Thanks!
Logs are below:
python train_SemanticKitti.py
Data Preparation
****************
Preparing seq 00 class frames. (Long but one time only)
Preparing seq 01 class frames. (Long but one time only)
Preparing seq 02 class frames. (Long but one time only)
Preparing seq 03 class frames. (Long but one time only)
Preparing seq 04 class frames. (Long but one time only)
Preparing seq 05 class frames. (Long but one time only)
Preparing seq 06 class frames. (Long but one time only)
Preparing seq 07 class frames. (Long but one time only)
Preparing seq 09 class frames. (Long but one time only)
Preparing seq 10 class frames. (Long but one time only)
Preparing seq 08 class frames. (Long but one time only)
Starting Calibration of max_in_points value (use verbose=True for more details)
Previous calibration found:
Check max_in limit dictionary
"balanced_6.000_0.060": ?
Traceback (most recent call last):
File "train_SemanticKitti.py", line 290, in <module>
training_sampler.calib_max_in(config, training_loader, verbose=True)
File "/media/work/3D/KPConv-PyTorch/datasets/SemanticKitti.py", line 904, in calib_max_in
for batch_i, batch in enumerate(dataloader):
File "/home/kx/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 278, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "/home/kx/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 709, in __init__
self._try_put_index()
File "/home/kx/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 826, in _try_put_index
index = self._next_index()
File "/home/kx/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 318, in _next_index
return next(self.sampler_iter) # may raise StopIteration
File "/home/kx/venv/lib/python3.5/site-packages/torch/utils/data/sampler.py", line 200, in __iter__
for idx in self.sampler:
File "/media/work/3D/KPConv-PyTorch/datasets/SemanticKitti.py", line 780, in __iter__
self.dataset.potentials[class_indices] += np.random.rand(class_indices.shape[0]) * 0.1 + 0.1
TypeError: add_(): argument 'other' (position 1) must be Tensor, not numpy.ndarray
Hi Hugues,
Could you provide a training script for ScanNet semantic segmentation task? Thanks a lot!
Hi,
Thank you so much for open-sourcing your code. Awesome work! I was wondering if you are planning to release Pytorch code for the shapenet part dataset as well.
Best,
Ankit
Thanks for the great work. After reading the code in trainer.py (utils/trainer.py), I'd like to ask: is torch.cuda.synchronize(self.device) necessary? It seems to slow down training.
Hi @HuguesTHOMAS ,
Did you add weight initialization for the weights or biases of the MLP or BN layers in your code?
Will this influence the performance?
Thanks~
Thanks for the excellent work! However, I could not find the test code for shape classification on the ModelNet40.
Hi, @HuguesTHOMAS,
I am trying to use your library on my own dataset.
My dataset is very large: a single point cloud can reach 15 GB, and I run out of memory during the inference phase.
Hence, I tried changing the batch_size parameter from 1 to 200 in order to avoid running out of memory.
Unfortunately, after changing the batch_size parameter I do not obtain good results in terms of IoU a
As a double check, I tried to classify a small point cloud (1 GB) with two values of batch_size, and I obtained:
batch_size = 1 good performance
batch_size = 200 poor performance
Do you have any ideas on how I can solve my problem?
Thank you,
Giovanni
I am comparing different algorithms implemented in PyTorch for 3D point clouds, and I want to make KPConv work well.
I found that it is far more complicated to use KPConv with my own dataset in this repo, and it takes a substantial amount of time.
I would suggest the code style of https://github.com/valeoai/LightConvPoint. In that repo, the architecture is already KPConv. The only thing that is needed is a KPConv layer. However, I do not know how to code it.
The setting for ModelNet40 is the following: if self.train: self.num_models = 9843.
Does num_models mean the number of samples?
I have 60 samples of my own data in OBJ file format. So, should I set this number to 40?
And what is the equivalent of loadtxt for OBJ files; is there a load.obj? If not, is there any way to convert OBJ to txt?
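On the OBJ-to-txt part of the question: in the Wavefront OBJ format, vertex positions are the lines that start with "v", so a small converter can be written with the standard library alone. The sketch below extracts only mesh vertices (it does not sample points on the faces), and the space-separated output format is an assumption, since the columns and delimiter expected by the loader are not stated in this thread. `obj_vertices` and `vertices_to_txt` are hypothetical names.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]


def obj_vertices(obj_text: str) -> List[Vertex]:
    """Extract (x, y, z) from the 'v' lines of Wavefront OBJ text."""
    vertices: List[Vertex] = []
    for line in obj_text.splitlines():
        parts = line.split()
        # Vertex lines start with 'v'; 'vn', 'vt', 'f', and comments are skipped
        if len(parts) >= 4 and parts[0] == "v":
            vertices.append((float(parts[1]), float(parts[2]), float(parts[3])))
    return vertices


def vertices_to_txt(vertices: List[Vertex]) -> str:
    """Write one vertex per line as 'x y z' with six decimals."""
    return "\n".join("{:.6f} {:.6f} {:.6f}".format(*v) for v in vertices)


sample = "# tiny example\nv 0 0 0\nv 1.5 0 0\nvn 0 0 1\nf 1 2 3\n"
verts = obj_vertices(sample)
txt = vertices_to_txt(verts)
print(txt)
```

For a whole folder, the same two functions can be applied to the contents of each .obj file and the result written next to it with a .txt extension.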
I wonder how to visualize predictions on the SemanticKITTI dataset?
Hello
I really appreciate your great work.
I previously asked this question on the TensorFlow implementation of KPConv,
so I am asking again here.
I have questions about the classification accuracy on ModelNet40 with your default settings.
The only things I revised are the following.
They just save the best accuracy and the best vote accuracy, for comparison with other works.
In trainer.py:
1) Validation part --> to save the best accuracy
I revised self.validation as below and appended the saving code.
val_acc, vote_acc = self.validation(net, val_loader, config)
if best_acc < val_acc:
    best_acc = val_acc
    conv_epoch = self.epoch
    if config.saving:
        # Get current state dict
        save_dict = {'epoch': self.epoch,
                     'model_state_dict': net.state_dict(),
                     'optimizer_state_dict': self.optimizer.state_dict(),
                     'saving_path': config.saving_path}
        # Save current state of the network (for restoring purposes)
        checkpoint_path = join(checkpoint_directory, 'best_acc_chkp.tar')
        torch.save(save_dict, checkpoint_path)
if best_vote_acc < vote_acc:
    best_vote_acc = vote_acc
    conv_epoch_vote = self.epoch
    if config.saving:
        # Get current state dict
        save_dict = {'epoch': self.epoch,
                     'model_state_dict': net.state_dict(),
                     'optimizer_state_dict': self.optimizer.state_dict(),
                     'saving_path': config.saving_path}
        # Save current state of the network (for restoring purposes)
        checkpoint_path = join(checkpoint_directory, 'best_vote_acc_chkp.tar')
        torch.save(save_dict, checkpoint_path)
print('>>> epoch: {:d} \nbest_acc: {:.3f} | conv_epoch: {:d} \nbest_vote_acc: {:.3f} | conv_epoch_vote: {:d} \n'.format(self.epoch,
                                                                                                                      best_acc, conv_epoch,
                                                                                                                      best_vote_acc, conv_epoch_vote))
with open(join(config.saving_path, 'best_results.txt'), "a") as file:
    message = '>>> epoch: {:d} \nbest_acc: {:.3f} | conv_epoch: {:d} \nbest_vote_acc: {:.3f} | conv_epoch_vote: {:d} \n'
    file.write(message.format(self.epoch,
                              best_acc,
                              conv_epoch,
                              best_vote_acc,
                              conv_epoch_vote))
def validation(self, net, val_loader, config: Config):
    # Note: val_acc and vote_acc are only assigned in the classification branch,
    # so the final return only works for config.dataset_task == 'classification'.
    if config.dataset_task == 'classification':
        _, val_acc, vote_acc = self.object_classification_validation(net, val_loader, config)
    elif config.dataset_task == 'segmentation':
        self.object_segmentation_validation(net, val_loader, config)
    elif config.dataset_task == 'cloud_segmentation':
        self.cloud_segmentation_validation(net, val_loader, config)
    elif config.dataset_task == 'slam_segmentation':
        self.slam_segmentation_validation(net, val_loader, config)
    else:
        raise ValueError('No validation method implemented for this network type')
    return val_acc, vote_acc
and the return statement of object_classification_validation (unpacked above) becomes:
    return C1, val_ACC, vote_ACC
The code worked and the experiment completed successfully.
During training, the best validation accuracy was 93.75 and the best vote accuracy was 92.385.
However, when I test the saved model from that training run with test_models.py,
the test accuracy was 92.1 at most.
My question is: which of these is the correct result for your code?
As far as I can tell from the data loading part of your code, both scripts use the same test_dataset, which is used as val_dataset in training.
The test procedure also looks almost the same in both scripts.
What makes the accuracy differ between the two?
Did I run the experiment incorrectly?
Hi Hugues,
I have tried both the TF version and the PyTorch version of KPConv. Why is the batch_num of the PyTorch version smaller than that of the TF version? In my experience, the PyTorch version uses more memory than the TF version.
Thanks.
Thanks for your excellent work! The stopping criterion for testing on the SemanticKitti test set (with on_val=False) seems to require the minimum frame potential to be greater than 100. I don't understand what this potential means. I have already produced the ".npy" files in /test/probs/, which are the prediction results, but the test_model.py program keeps changing these .npy files. What's more, the mIoU can only be calculated if I set "on_val=True". Why can't I get the mIoU on the test set, and what is the stopping logic of the testing program?
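To make the question concrete, here is a toy illustration of a potential-based stopping rule of the kind described above. It is only a sketch: the update rule (always visit the frame with the lowest potential and add 1 per visit) and the name run_until_covered are assumptions for illustration, not taken from the repository. Under such a rule, any output written on each pass would keep changing until every frame has been visited often enough.

```python
def run_until_covered(num_frames, min_potential_target):
    """Visit frames until every frame's potential exceeds the target.

    Hypothetical rule: each visit adds 1.0 to the visited frame's potential.
    """
    potentials = [0.0] * num_frames
    visits = 0
    while min(potentials) <= min_potential_target:
        # Pick the least-visited frame so that coverage stays even.
        frame = potentials.index(min(potentials))
        # ... here a real tester would run the network on this frame
        # and accumulate (and possibly re-save) its predictions ...
        potentials[frame] += 1.0
        visits += 1
    return potentials, visits


# Small example: 4 frames, stop once the minimum potential is above 3.
potentials, visits = run_until_covered(num_frames=4, min_potential_target=3)
```

In this toy version the loop length grows linearly with both the number of frames and the target, which would explain a long test run with a target of 100 if the real criterion behaves similarly.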
Thanks in advance.
Hi,
I'm replicating your code on a different dataset, and I would like to know how to set the search radius used to find neighboring points. In your paper, you only say "where σ is the influence distance of the kernel points, and will be chosen according to the input density", whereas in your code (config.py) you set default values for the radius and density parameters. Is there any reference I can follow when setting the search radius?
Hi, could you please clarify the meaning of in_radius and conv_radius in the Config class? If I understand correctly, conv_radius is the radius of the spherical domain of the kernel function g. If so, it should be the same as the neighbourhood radius of each input point.
Thanks in advance!
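One reading that is consistent with the S3DIS calibration log quoted further down this page: the neighbor-limit keys there ("0.030_0.075", "0.060_0.150", "0.120_0.720", "0.240_1.440", "0.480_2.880") look like pairs of grid size and neighborhood radius, where the grid size doubles at each layer and the radius is the grid size times a fixed multiplier. The sketch below reproduces those keys under that assumption. The multipliers 2.5 and 6.0, the switch at the third layer, and the function name layer_radius_keys are inferred from the log for illustration and are not confirmed by the source.

```python
def layer_radius_keys(first_dl, num_layers, conv_radius, deform_radius, deform_from):
    """Build 'gridsize_radius' keys for each layer.

    Assumption: the radius is expressed as a multiple of the layer's grid
    size, with a larger multiplier from layer index `deform_from` onward.
    """
    keys = []
    for layer in range(num_layers):
        dl = first_dl * 2 ** layer  # grid size doubles at every layer
        scale = deform_radius if layer >= deform_from else conv_radius
        keys.append("{:.3f}_{:.3f}".format(dl, dl * scale))
    return keys


# Values inferred from the log, not taken from config.py.
keys = layer_radius_keys(first_dl=0.03, num_layers=5,
                         conv_radius=2.5, deform_radius=6.0, deform_from=2)
for key in keys:
    print(key)
```

If this reading is right, conv_radius is a relative quantity that scales with the subsampling grid, while in_radius (1.500 in the key "potentials_1.500_0.030_6") would be the metric radius of the input sphere, so the two need not be equal.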
Hi @HuguesTHOMAS,
Because the data preprocessing in your work involves many details, adapting the S3DIS training script to ScanNet scenes is really non-trivial.
Could you please give us some tips on adapting the S3DIS script to ScanNet?
Much appreciated!
Hi @HuguesTHOMAS ,
I am training on the S3DIS dataset in random mode instead of potential mode. If input_threads is set to 0, everything is OK, but if input_threads is set to a value greater than 0, training crashes.
My change is:
training_dataset = S3DISDataset(config, set='training', use_potentials=False)
Thanks!
@HuguesTHOMAS thanks so much for your amazing work. However, I ran into a problem when training on the S3DIS dataset.
After preparing the KDTree and the ply files, the process is killed.
Could you please help me find out what is happening here? Thanks so much in advance!
Starting Calibration (use verbose=True for more details)
Previous calibration found:
Check batch limit dictionary
"potentials_1.500_0.030_6": ?
Check neighbors limit dictionary
"0.030_0.075": ?
"0.060_0.150": ?
"0.120_0.720": ?
"0.240_1.440": ?
"0.480_2.880": ?
Step 1 estim_b = 0.10 batch_limit = 501
Step 27 estim_b = 0.94 batch_limit = 13501
Step 48 estim_b = 1.10 batch_limit = 23801
Step 68 estim_b = 1.72 batch_limit = 32501
Step 80 estim_b = 2.34 batch_limit = 36601
Step 91 estim_b = 2.44 batch_limit = 40501
Step 105 estim_b = 2.88 batch_limit = 44701
Step 115 estim_b = 3.13 batch_limit = 47401
Step 125 estim_b = 3.13 batch_limit = 50401
Step 136 estim_b = 3.59 batch_limit = 52801
Step 147 estim_b = 3.66 batch_limit = 55301
Step 157 estim_b = 3.85 batch_limit = 57301
Step 165 estim_b = 4.06 batch_limit = 58701
Step 169 estim_b = 3.86 batch_limit = 59701
Step 181 estim_b = 4.08 batch_limit = 61901
Step 190 estim_b = 4.40 batch_limit = 63201
Step 191 estim_b = 4.36 batch_limit = 63401
Step 200 estim_b = 4.52 batch_limit = 64601
Step 206 estim_b = 4.43 batch_limit = 65601
Step 215 estim_b = 4.53 batch_limit = 66901
Step 226 estim_b = 4.85 batch_limit = 68101
Step 232 estim_b = 4.60 batch_limit = 69101
Step 241 estim_b = 4.60 batch_limit = 70301
Step 249 estim_b = 4.47 batch_limit = 71601
Step 254 estim_b = 4.45 batch_limit = 72401
Step 256 estim_b = 4.55 batch_limit = 72601
Step 257 estim_b = 4.60 batch_limit = 72701
Step 259 estim_b = 4.86 batch_limit = 72701
Step 260 estim_b = 4.88 batch_limit = 72801
Step 261 estim_b = 4.79 batch_limit = 73001
Step 262 estim_b = 4.81 batch_limit = 73101
Step 264 estim_b = 4.85 batch_limit = 73301
Step 265 estim_b = 4.86 batch_limit = 73401
Step 266 estim_b = 5.08 batch_limit = 73301
Step 269 estim_b = 5.07 batch_limit = 73601
Killed
Will the training code/models for ShapeNet Part segmentation be released?
Hi, @HuguesTHOMAS
Thanks for your amazing work.
Can this PyTorch code (segmentation on S3DIS) reproduce the results in the paper?
Thank you.
Thank you for releasing your PyTorch code. It's awesome work.
I have a question about the flow of this code.
As far as I know, for the S3DIS dataset, the input format of the network would be (Batch x Points x 3+RGB).
I understand why the last dimension of the input list is 3~5.
However, may I ask why the first dimension of the input list is 103955?
When I look at the input of KPConv, it has shape (103955, 5).
Sorry to bother you with my limited knowledge.
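One possible explanation for a first dimension like 103955, offered here as an assumption rather than something confirmed in this thread, is that the clouds of a batch are stacked along the point axis instead of being padded to (Batch x Points x C). The input is then a single (N_total, C) array plus a list of per-cloud lengths. A minimal pure-Python sketch of that layout, with hypothetical names stack_clouds and split_clouds and made-up sizes:

```python
def stack_clouds(clouds):
    """Concatenate per-cloud feature rows and record each cloud's length."""
    stacked = []
    lengths = []
    for cloud in clouds:
        stacked.extend(cloud)       # rows from all clouds share one axis
        lengths.append(len(cloud))  # needed later to separate the clouds
    return stacked, lengths


def split_clouds(stacked, lengths):
    """Recover the individual clouds from the stacked rows."""
    clouds, start = [], 0
    for n in lengths:
        clouds.append(stacked[start:start + n])
        start += n
    return clouds


# Two clouds with different point counts and 5 features per point
# (sizes and values are made up for illustration).
cloud_a = [[0.0] * 5 for _ in range(3)]
cloud_b = [[1.0] * 5 for _ in range(2)]
stacked, lengths = stack_clouds([cloud_a, cloud_b])
```

With this layout the batch size never appears as an array dimension: the first dimension is the total number of points over all clouds in the batch, so it changes from batch to batch.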
Hi! I'm getting the following error from testing slam_segmentation/semanticKitti:
e015-i0183 => L=0.626 acc= 79% / t(ms): 0.6 42.7 49.2)
Traceback (most recent call last):
File "train_SemanticKitti.py", line 324, in <module>
trainer.train(net, training_loader, test_loader, config)
File "/KPConv-PyTorch/utils/trainer.py", line 167, in train
for batch in training_loader:
File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
data = self._next_data()
File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 971, in _next_data
return self._process_data(data)
File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
data.reraise()
File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 6.
Original Traceback (most recent call last):
File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
data = fetcher.fetch(index)
File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/anaconda3/envs/kpconv-pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/KPConv-PyTorch/datasets/SemanticKitti.py", line 311, in __getitem__
p0 = new_points[wanted_ind, :3]
IndexError: index 65655 is out of bounds for axis 0 with size 50176
Can you please help? Is it an error in the data? I am trying to reproduce the SemanticKitti results. Thanks!