deep_gcns's Introduction

DeepGCNs: Can GCNs Go as Deep as CNNs?

In this work, we present new ways to successfully train very deep GCNs. We borrow concepts from CNNs, mainly residual/dense connections and dilated convolutions, and adapt them to GCN architectures. Through extensive experiments, we show the positive effect of these deep GCN frameworks.
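
As a rough illustration of the residual idea (a minimal numpy sketch with made-up names, not the TensorFlow layers in this repository), a residual graph-convolution block adds the input vertex features back onto the aggregated output, which is what keeps gradients usable as the network gets deeper:

# Minimal sketch of a residual graph-convolution block (illustrative only;
# the real layers live in gcn_lib/ and sem_seg/model.py).
import numpy as np

def gcn_layer(x, adj, weight):
    # plain neighborhood aggregation: average neighbor features, then a linear map + ReLU
    agg = adj @ x / np.maximum(adj.sum(axis=1, keepdims=True), 1)
    return np.maximum(agg @ weight, 0)

def res_gcn_layer(x, adj, weight):
    # residual version: F(x) + x, the ingredient that lets GCNs go deep
    return gcn_layer(x, adj, weight) + x

# toy graph: 5 vertices in a ring, 8 features each
adj = np.eye(5, k=1) + np.eye(5, k=-1)
x = np.random.randn(5, 8).astype(np.float32)
w = (np.random.randn(8, 8) * 0.1).astype(np.float32)
print(res_gcn_layer(x, adj, w).shape)  # (5, 8)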

[Project] [Paper] [Slides] [Tensorflow Code] [Pytorch Code]

Overview

We conduct extensive experiments to show how different components (#Layers, #Filters, #Nearest Neighbors, Dilation, etc.) affect DeepGCNs. We also provide ablation studies on different types of Deep GCNs (MRGCN, EdgeConv, GraphSAGE and GIN).

For further information and details, please contact Guohao Li and Matthias Müller.

Requirements

Conda Environment

To set up a conda environment with all necessary dependencies, run:

conda env create -f environment.yml

Getting Started

Detailed instructions on how to use our code for semantic segmentation of 3D point clouds can be found in the folder sem_seg. Currently, we provide the following:

  • Conda environment
  • Setup of S3DIS Dataset
  • Training code
  • Evaluation code
  • Several pretrained models
  • Visualization code

Citation

Please cite our paper if you find anything helpful,

@InProceedings{li2019deepgcns,
    title={DeepGCNs: Can GCNs Go as Deep as CNNs?},
    author={Guohao Li and Matthias Müller and Ali Thabet and Bernard Ghanem},
    booktitle={The IEEE International Conference on Computer Vision (ICCV)},
    year={2019}
}
@misc{li2019deepgcns_journal,
    title={DeepGCNs: Making GCNs Go as Deep as CNNs},
    author={Guohao Li and Matthias Müller and Guocheng Qian and Itzel C. Delgadillo and Abdulellah Abualshour and Ali Thabet and Bernard Ghanem},
    year={2019},
    eprint={1910.06849},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

License

MIT License

Acknowledgement

This code is heavily borrowed from PointNet and EdgeConv. We would also like to thank 3d-semantic-segmentation for the visualization code.

deep_gcns's Issues

Error occurs when running 'sh train_job.sh' with 1 TitanX (GPU: 12 GB)

......

**** EPOCH 001 ****

Current batch/total batch num: 0/9949
2019-06-08 19:12:55.361276: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-06-08 19:12:55.437713: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-06-08 19:13:01.682376: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-06-08 19:13:01.729082: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-06-08 19:13:04.594699: E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] Loaded runtime CuDNN library: 7.0.5 but source was compiled with: 7.1.4. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2019-06-08 19:13:04.595317: W ./tensorflow/stream_executor/stream.h:2093] attempting to perform DNN operation using StreamExecutor without DNN support
Traceback (most recent call last):
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/runrunrun/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1741, in
main()
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/runrunrun/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/runrunrun/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/media/zgh/winz/3D/deep_gcns/sem_seg/train.py", line 333, in
train()
File "/media/zgh/winz/3D/deep_gcns/sem_seg/train.py", line 275, in train
train_one_epoch(sess, ops, train_writer)
File "/media/zgh/winz/3D/deep_gcns/sem_seg/train.py", line 320, in train_one_epoch
feed_dict=feed_dict)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cudnn PoolBackward launch failed
[[node tower_0/gradients/tower_0/maxpool/maxpool_grad/MaxPoolGrad (defined at /media/zgh/winz/3D/deep_gcns/sem_seg/train.py:223) = MaxPoolGrad[T=DT_FLOAT, data_format="NHWC", ksize=[1, 4096, 1, 1], padding="VALID", strides=[1, 2, 2, 1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/adj_conv_final/Relu, tower_0/maxpool/maxpool, tower_0/gradients/tower_0/Tile_28_grad/Sum)]]
[[{{node tower_0/gradients/AddN_147/_2879}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_19645_tower_0/gradients/AddN_147", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'tower_0/gradients/tower_0/maxpool/maxpool_grad/MaxPoolGrad', defined at:
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/runrunrun/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1741, in
main()
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/runrunrun/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/runrunrun/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/media/zgh/winz/3D/deep_gcns/sem_seg/train.py", line 333, in
train()
File "/media/zgh/winz/3D/deep_gcns/sem_seg/train.py", line 223, in train
grads = trainer.compute_gradients(loss)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 519, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
gate_gradients, aggregation_method, stop_gradients)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 814, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 408, in _MaybeCompile
return grad_fn() # Exit early
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 814, in
lambda: grad_fn(op, *out_grads))
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_grad.py", line 607, in _MaxPoolGrad
data_format=op.get_attr("data_format"))
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 5081, in max_pool_grad
data_format=data_format, name=name)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

...which was originally created as op u'tower_0/maxpool/maxpool', defined at:
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/runrunrun/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1741, in
main()
[elided 2 identical lines from previous traceback]
File "/media/zgh/winz/3D/deep_gcns/sem_seg/train.py", line 333, in
train()
File "/media/zgh/winz/3D/deep_gcns/sem_seg/train.py", line 204, in train
skip_connect=SKIP_CONNECT)
File "/media/zgh/winz/3D/deep_gcns/sem_seg/model.py", line 50, in init
fusion = self.build_fusion_block(graphs, num_vertices)
File "/media/zgh/winz/3D/deep_gcns/sem_seg/model.py", line 115, in build_fusion_block
out_max = tf_util.max_pool2d(out, [num_vertices, 1], padding='VALID', scope='maxpool')
File "/media/zgh/winz/3D/deep_gcns/utils/tf_util.py", line 381, in max_pool2d
name=sc.name)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 2140, in max_pool
name=name)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 4641, in max_pool
data_format=data_format, name=name)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/3D/.virtualenvs/py2te112/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): cudnn PoolBackward launch failed
[[node tower_0/gradients/tower_0/maxpool/maxpool_grad/MaxPoolGrad (defined at /media/zgh/winz/3D/deep_gcns/sem_seg/train.py:223) = MaxPoolGrad[T=DT_FLOAT, data_format="NHWC", ksize=[1, 4096, 1, 1], padding="VALID", strides=[1, 2, 2, 1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/adj_conv_final/Relu, tower_0/maxpool/maxpool, tower_0/gradients/tower_0/Tile_28_grad/Sum)]]
[[{{node tower_0/gradients/AddN_147/_2879}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_19645_tower_0/gradients/AddN_147", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

We've got an error while stopping in post-mortem: <type 'exceptions.KeyboardInterrupt'>

Process finished with exit code 1

Could you help me? Thank you very much!

The obj files cannot be opened

I got the prediction visualization file and the ground-truth file, but I cannot open them with Maya or 3ds Max. My process for obtaining the files is as follows:
python batch_inference.py --model_path log/epoch_30.ckpt --dump_dir log/dump --output_filelist log/output_filelist.txt --room_data_filelist meta/area5_data_label.txt --visu
The epoch_30.ckpt was obtained by training on the S3DIS dataset.
The attached zip package contains the obj files (_gt, _pred). I hope you can help me, thanks a lot!
dump.zip

ImportError: cannot import name 'scale_translate_pointcloud' from 'utils'

In the PyTorch code, when I try to run the training code for part segmentation on PartNet, I get this error. Can you help?

Traceback (most recent call last):
File "main.py", line 13, in
from utils import scale_translate_pointcloud
ImportError: cannot import name 'scale_translate_pointcloud' from 'utils' (/home/jrl/Github/deep_gcns_torch-master/examples/part_sem_seg/../../utils/__init__.py)

Graph convolution

hi, @lightaime
I'm a beginner with GCNs. In many papers I see that a GCN uses a graph Laplacian matrix, but you don't seem to use one. How do you implement graph convolution?
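
For context, the GCNs used here are EdgeConv/MRGCN-style spatial layers: they gather features from k-nearest-neighbor vertices and combine them with a small shared MLP, with no Laplacian involved. A rough numpy sketch of that spatial aggregation (illustrative names only, not the repository's exact layer):

# Laplacian-free (spatial) graph convolution in the EdgeConv style: gather k neighbors,
# build edge features [x_i, x_j - x_i], apply a shared linear map + ReLU, max over neighbors.
import numpy as np

def edge_conv(x, neigh_idx, weight):
    # x: (N, F) vertex features, neigh_idx: (N, k) neighbor indices, weight: (2F, F_out)
    neighbors = x[neigh_idx]                                              # (N, k, F)
    central = np.repeat(x[:, None, :], neigh_idx.shape[1], axis=1)        # (N, k, F)
    edge_feat = np.concatenate([central, neighbors - central], axis=-1)   # (N, k, 2F)
    msg = np.maximum(edge_feat @ weight, 0)                               # shared MLP + ReLU
    return msg.max(axis=1)                                                # max over the k neighbors

N, F, k = 32, 6, 4
x = np.random.randn(N, F).astype(np.float32)
neigh_idx = np.random.randint(0, N, size=(N, k))  # stand-in for real kNN indices
w = (np.random.randn(2 * F, F) * 0.1).astype(np.float32)
print(edge_conv(x, neigh_idx, w).shape)  # (32, 6)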

missing 'meta/anno_paths.txt'

I downloaded the dataset, unzipped it into data/, and ran collect_indoor3d_data.py. It shows:
Traceback (most recent call last):
  File "collect_indoor3d_data.py", line 6, in <module>
    import indoor3d_util
  File "/deep_gcns/sem_seg/indoor3d_util.py", line 14, in <module>
    g_classes = [x.rstrip() for x in open(os.path.join(BASE_DIR, 'meta/class_names.txt'))]
FileNotFoundError: [Errno 2] No such file or directory: '/deep_gcns/sem_seg/meta/class_names.txt'

The error, please help me.

When I run the code, I have not downloaded the dataset Stanford3dDataset_v1.2_Aligned_Version.zip.
I just use S3DIS. The error occurred as follows:

2021-02-21 05:14:43.910830: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 3932160 totalling 3.75MiB
2021-02-21 05:14:43.910862: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 6 Chunks of size 8388608 totalling 48.00MiB
2021-02-21 05:14:43.910895: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 11665408 totalling 11.12MiB
2021-02-21 05:14:43.910927: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 14680064 totalling 42.00MiB
2021-02-21 05:14:43.910960: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 33554432 totalling 64.00MiB
2021-02-21 05:14:43.910992: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 37748736 totalling 36.00MiB
2021-02-21 05:14:43.911024: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 54 Chunks of size 134217728 totalling 6.75GiB
2021-02-21 05:14:43.911057: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 136314880 totalling 130.00MiB
2021-02-21 05:14:43.911090: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 209643264 totalling 199.93MiB
2021-02-21 05:14:43.911122: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 6 Chunks of size 268435456 totalling 1.50GiB
2021-02-21 05:14:43.911158: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 536870912 totalling 2.00GiB
2021-02-21 05:14:43.911191: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 10.79GiB
2021-02-21 05:14:43.911226: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: 
Limit:                 12044415796
InUse:                 11584530176
MaxInUse:              11584530176
NumAllocs:                     590
MaxAllocSize:            856752640

2021-02-21 05:14:43.911371: W tensorflow/core/common_runtime/bfc_allocator.cc:271] *************************************************************************************************___
2021-02-21 05:14:43.911439: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at topk_op.cc:92 : Resource exhausted: OOM when allocating tensor with shape[1073742079] and type int8 on /job:localhost/replica:0/task:0/device:GPU:1 by allocator GPU_1_bfc
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1073742079] and type int8 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node tower_0/TopKV2_7}} = TopKV2[T=DT_FLOAT, _class=["loc:@tower_0/cond_7/strided_slice_2/Switch"], sorted=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/Neg_7, tower_0/TopKV2_7/k)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[{{node tower_0/gradients/AddN_30/_1927}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_8956_tower_0/gradients/AddN_30", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 346, in <module>
    train()
  File "train.py", line 289, in train
    train_one_epoch(sess, ops, train_writer)
  File "train.py", line 333, in train_one_epoch
    feed_dict=feed_dict)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1073742079] and type int8 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[node tower_0/TopKV2_7 (defined at /home/deep_gcns-master/deep_gcns-master/utils/tf_util.py:672)  = TopKV2[T=DT_FLOAT, _class=["loc:@tower_0/cond_7/strided_slice_2/Switch"], sorted=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/Neg_7, tower_0/TopKV2_7/k)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[{{node tower_0/gradients/AddN_30/_1927}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_8956_tower_0/gradients/AddN_30", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


Caused by op 'tower_0/TopKV2_7', defined at:
  File "train.py", line 346, in <module>
    train()
  File "train.py", line 222, in train
    dilations=DILATIONS)
  File "/home/deep_gcns-master/deep_gcns-master/sem_seg/model.py", line 55, in __init__
    dilations)
  File "/home/deep_gcns-master/deep_gcns-master/sem_seg/model.py", line 102, in build_gcn_backbone_block
    is_training=self.is_training)
  File "/home/deep_gcns-master/deep_gcns-master/gcn_lib/gcn_utils.py", line 47, in build
    is_training=is_training)
  File "/home/deep_gcns-master/deep_gcns-master/gcn_lib/tf_edge.py", line 41, in dilated_knn_graph
    neigh_idx = tf_util.knn(dists, k=k*dilation)
  File "/home/deep_gcns-master/deep_gcns-master/utils/tf_util.py", line 672, in knn
    _, nn_idx = tf.nn.top_k(neg_adj, k=k)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 2359, in top_k
    return gen_nn_ops.top_kv2(input, k=k, sorted=sorted, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 7701, in top_kv2
    "TopKV2", input=input, k=k, sorted=sorted, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1073742079] and type int8 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[node tower_0/TopKV2_7 (defined at /home/deep_gcns-master/deep_gcns-master/utils/tf_util.py:672)  = TopKV2[T=DT_FLOAT, _class=["loc:@tower_0/cond_7/strided_slice_2/Switch"], sorted=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/Neg_7, tower_0/TopKV2_7/k)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[{{node tower_0/gradients/AddN_30/_1927}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_8956_tower_0/gradients/AddN_30", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Errors keep popping up; the code doesn't stop, it just keeps reporting errors and looping through EPOCH 001.
Please help me, thank you!
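
For context, the traceback above points at the kNN graph construction in utils/tf_util.py (pairwise_distance followed by tf.nn.top_k inside dilated_knn_graph). A rough numpy sketch of that step (illustrative names, not the repository's exact code) shows where the memory goes: a full 4096 x 4096 distance matrix is built per point cloud for every graph-building layer, and the top_k over it is what runs out of memory here (the failing op, TopKV2_7, is one of several such layers in the tower).

# Rough sketch of the dilated kNN step that runs out of memory above (illustrative only).
import numpy as np

def dilated_knn(points, k=16, dilation=1):
    # points: (N, C). Returns indices of k dilated nearest neighbors per point.
    inner = points @ points.T                        # (N, N) -- this is the big tensor
    sq = np.sum(points ** 2, axis=1, keepdims=True)  # (N, 1)
    dists = sq - 2 * inner + sq.T                    # squared pairwise distances, (N, N)
    order = np.argsort(dists, axis=1)                # ascending distance
    # take the k*dilation nearest, keep every dilation-th one -> dilated neighborhood
    return order[:, :k * dilation:dilation]

pts = np.random.rand(4096, 3).astype(np.float32)
print(dilated_knn(pts, k=16, dilation=2).shape)  # (4096, 16)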

Why does inputs_ph default to shape=(batch_size, num_vertices, 9)?

Hi!
As you write, inputs_ph = tf.placeholder(tf.float32, shape=(batch_size, num_vertices, 9)).

Why does it default to 9? Where does the 9 come from? I think a 3D point cloud should have 3.

If I wish to use the model on my own point cloud dataset with only (x, y, z) coordinates, should I change the original code to shape=(batch_size, num_vertices, 3)?

thanks!
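
For reference, in PointNet-style S3DIS preprocessing (see indoor3d_util.py mentioned in another issue above) each point typically carries 9 values: XYZ, RGB, and the XYZ location normalized by the room extent, which is where the 9 comes from. A minimal sketch of the two placeholder variants, assuming TensorFlow 1.x as in this repo's environment; the data loader and any layers that consume the raw features would need to match the channel count as well:

# TF 1.x placeholders (tf.placeholder exists in the tensorflow-gpu 1.x this repo targets).
import tensorflow as tf

batch_size, num_vertices = 8, 4096

# S3DIS-style input as in the original code: XYZ + RGB + room-normalized XYZ = 9 channels
inputs_ph = tf.placeholder(tf.float32, shape=(batch_size, num_vertices, 9))

# hypothetical variant for a dataset with bare (x, y, z) coordinates only
inputs_xyz_ph = tf.placeholder(tf.float32, shape=(batch_size, num_vertices, 3))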

Environment errors when running the script 'sh +x train_job.sh'

**** EPOCH 001 ****

Current batch/total batch num: 0/1243
2019-09-18 09:37:40.102211: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-09-18 09:37:40.303894: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-09-18 09:37:42.000486: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-09-18 09:37:42.117158: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-09-18 09:37:44.299611: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemmBatched: CUBLAS_STATUS_EXECUTION_FAILED
2019-09-18 09:37:44.299648: E tensorflow/stream_executor/cuda/cuda_blas.cc:2574] Internal: failed BLAS call, see log for details
Traceback (most recent call last):
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[8,4096,3], b.shape=[8,3,4096], m=4096, n=4096, k=3, batch_size=8
[[{{node tower_0/MatMul}} = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/Squeeze, tower_0/transpose)]]
[[{{node tower_1/adj_conv_27/bn/cond/add/_4375}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_25108_tower_1/adj_conv_27/bn/cond/add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 327, in
train()
File "train.py", line 270, in train
train_one_epoch(sess, ops, train_writer)
File "train.py", line 314, in train_one_epoch
feed_dict=feed_dict)
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[8,4096,3], b.shape=[8,3,4096], m=4096, n=4096, k=3, batch_size=8
[[node tower_0/MatMul (defined at /disk/tia/tia/deep_gcns/utils/tf_util.py:655) = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/Squeeze, tower_0/transpose)]]
[[{{node tower_1/adj_conv_27/bn/cond/add/_4375}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_25108_tower_1/adj_conv_27/bn/cond/add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'tower_0/MatMul', defined at:
File "train.py", line 327, in
train()
File "train.py", line 203, in train
skip_connect=SKIP_CONNECT)
File "/disk/tia/tia/deep_gcns/sem_seg/model.py", line 49, in init
skip_connect)
File "/disk/tia/tia/deep_gcns/sem_seg/model.py", line 82, in build_gcn_backbone_block
is_training=self.is_training)
File "/disk/tia/tia/deep_gcns/gcn_lib/gcn_utils.py", line 50, in build
is_training=is_training)
File "/disk/tia/tia/deep_gcns/gcn_lib/tf_edge.py", line 40, in dilated_knn_graph
dists = distance_metric(vertex_features)
File "/disk/tia/tia/deep_gcns/utils/tf_util.py", line 655, in pairwise_distance
point_cloud_inner = tf.matmul(point_cloud, point_cloud_transpose)
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2019, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1245, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/data/anaconda3/envs/lmt36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[8,4096,3], b.shape=[8,3,4096], m=4096, n=4096, k=3, batch_size=8
[[node tower_0/MatMul (defined at /disk/tia/tia/deep_gcns/utils/tf_util.py:655) = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/Squeeze, tower_0/transpose)]]
[[{{node tower_1/adj_conv_27/bn/cond/add/_4375}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_25108_tower_1/adj_conv_27/bn/cond/add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

My environment is CUDA 9.0, cuDNN 7.4, and tensorflow-gpu 1.12.0 with two 2080 Ti GPUs.
Can you help me with this problem?
