
Implementation of "Matrix Capsules with EM Routing"

A TensorFlow implementation of the paper "Matrix Capsules with EM Routing" by Hinton et al., written by Ashley Gritzman from IBM Research AI.

E-mail: [email protected]

This implementation fixes a number of common issues that were found in other open-source implementations, the main ones being:

  1. Parent capsules at different spatial positions compete for child capsules
  2. Numerical instability due to parent capsules with only one child
  3. Normalising the amount of data assigned to parent capsules

If you would like more information about these issues, please refer to the associated paper and blog.
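
To give a flavour of issue 2, here is a minimal numerical sketch (an illustration only, not the code from em_routing.py) of why a parent capsule with only one child is problematic, and how an epsilon guard on the variance keeps the M-step cost finite:

    import numpy as np

    epsilon = 1e-9  # guard value; the repository may use a different constant

    # Routing weights r_i and votes V_i for a parent with only ONE child
    # and a single pose dimension (toy numbers for illustration).
    r = np.array([1.0])
    votes = np.array([[0.3]])

    mu = np.sum(r[:, None] * votes, axis=0) / np.sum(r)
    var = np.sum(r[:, None] * (votes - mu) ** 2, axis=0) / np.sum(r)

    print(np.log(var))            # -inf: the variance collapses to zero
    print(np.log(var + epsilon))  # finite, so the cost term stays well-behaved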

Usage

Step 1. Download this repository with git or click the download ZIP button.

$ git clone https://github.com/IBM/matrix-capsules-with-em-routing.git
$ cd matrix-capsules-with-em-routing

Step 2. Download the smallNORB dataset.

$ chmod +x data/download.sh
$ ./data/download.sh

The download is 251MB, which unzips to about 856MB. The six .mat files are placed in the directory data/smallNORB/mat.

Step 3. Set up the environment. (Anaconda is recommended; see here for instructions on how to install Anaconda.)

With Anaconda (recommended):

$ conda env create -f capsenv.yml
$ conda activate capsenv

Without Anaconda:

$ pip install --requirement requirements.txt

Step 4. Generate TFRecords for the train and test datasets from the .mat files.

$ python ./data/convert_to_tfrecord.py

The resulting TFRecords are about 3.4GB each. The TensorFlow API employs multithreading, so this process should be fast (typically under a minute). If you are planning to commit to GitHub, make sure to ignore these files as they are too large to upload. The .tfrecord files for the train and test datasets are placed in the data/smallNORB/tfrecord directory.
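
As a quick sanity check after conversion, you can count the serialized examples in one of the generated files; below is a generic TensorFlow 1.x snippet (the exact .tfrecord filename is an assumption, adjust it to whatever convert_to_tfrecord.py produced):

    import tensorflow as tf

    # Count the records in the generated train TFRecord (filename assumed).
    path = "./data/smallNORB/tfrecord/train.tfrecord"
    count = sum(1 for _ in tf.python_io.tf_record_iterator(path))
    print("records in {}: {}".format(path, count))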

If you receive an error such as:
Bus error (core dumped) python ./convert_to_tfrecord.py or
Killed python ./convert_to_tfrecord.py
it most likely indicates that you have insufficient memory (8GB should be enough); in that case, try the sharded approach (see the sketch below).
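
For reference, "sharded" here just means writing the converted examples into several smaller .tfrecord files rather than one large file, so the whole dataset never has to sit in memory at once. A generic sketch of the idea (not the repository's actual sharded script; the shard count, file names, and the source of the serialized examples are placeholders):

    import tensorflow as tf

    NUM_SHARDS = 10  # hypothetical number of shards

    def write_shards(serialized_examples, out_pattern="train-{:02d}.tfrecord"):
        """Distribute serialized tf.train.Example strings round-robin over shards."""
        writers = [tf.python_io.TFRecordWriter(out_pattern.format(i))
                   for i in range(NUM_SHARDS)]
        for i, example in enumerate(serialized_examples):
            writers[i % NUM_SHARDS].write(example)
        for writer in writers:
            writer.close()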

Step 5. Start the training and validation on smallNORB.

$ python train_val.py

To monitor the training process, open TensorBoard with this command.

$ tensorboard --logdir=./logs

To get the full list of command-line flags, run:

$ python train_val.py --helpfull

Step 6. Calculate test accuracy. Make sure to specify the actual path to your log directory; the directory below, "./logs/smallNORB/20190731_wip", is just an example.

$ python test.py --load_dir="./logs/smallNORB/20190731_wip"

Results

The graph below shows the test accuracy of our implementation after each training epoch for 1–3 iterations of EM routing. We achieve our best accuracy of 95.4% with 2 routing iterations; with 3 iterations we get 93.7%. The table shows how our results stack up against other open-source implementations available on GitHub: yl-1993, www0wwwjs1, Officium (as recorded on 28 May 2019). Our accuracy of 95.4% is a 3.6 percentage point improvement on the previous best open-source implementation at 91.8%; however, it is still below the accuracy of Hinton et al. at 97.8%. To our knowledge, ours is currently the best open-source implementation available.

Implementation   Framework       Routing iterations   Test accuracy
Hinton           Not available   3                    97.8%
yl-1993          PyTorch         1                    74.8%
yl-1993          PyTorch         2                    89.5%
yl-1993          PyTorch         3                    82.5%
www0wwwjs1       TensorFlow      2                    91.8%
Officium         PyTorch         3                    90.9%
Ours             TensorFlow      1                    86.2%
Ours             TensorFlow      2                    95.4%
Ours             TensorFlow      3                    93.7%

Implementation Details

If you would like more information on the implementation details, please refer to the associated paper and blog.

Acknowledgements

  1. Jonathan Hui's blog, "Understanding Matrix capsules with EM Routing (Based on Hinton's Capsule Networks)"
  2. Questions and answers on OpenReview, "Matrix capsules with EM routing"
  3. Suofei Zhang's implementation on GitHub, "Matrix-Capsules-EM-Tensorflow" 
  4. Guang Yang's implementation on GitHub, "CapsulesEM"

Contributions

Contributions are welcome; please submit a pull request.

How to Cite this Work

If you find this code useful in your academic work, please cite as follows:

A. Gritzman, "Avoiding Implementation Pitfalls of Matrix Capsules with EM Routing by Hinton et al.", in Joint Workshop on Human Brain and Artificial Intelligence (HBAI) at IJCAI'19, Macao, 2019.

Disclaimer: This is not an official IBM product.


Issues

cost_j_h = (beta_v + 0.5*tf.log(var_j)) ?

Hi Ashley,

For 'def m_step()' in em_routing.py, I can see the code 'cost_j_h = (beta_v + 0.5*tf.log(var_j)) * rr_prime_sum * layer_norm_factor' prior to 'cost_j = tf.reduce_sum(cost_j_h, axis=-1, keepdims=True, name="cost_j")'.
My question is whether this leads to beta_v being multiplied h times, because 'beta_v + 0.5*tf.log(var_j)' broadcasts beta_v over all h elements along the last dimension.
According to formula (2) in the "Matrix Capsules with EM Routing" paper, it should be something like (beta_v + sum of cost_j_h) instead of sum of (beta_v + cost_j_h). What do you think? Maybe I am wrong.

Kind regards
Jeff
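
For reference, a hedged transcription of the relevant M-step equations from Hinton et al.'s paper (the paper writes the per-dimension cost with beta_u, which appears to correspond to beta_v in the code; equation numbering may differ):

    \mathrm{cost}_j^h \;\approx\; \left(\beta_u + \log \sigma_j^h\right) \sum_i r_{ij},
    \qquad
    a_j \;=\; \mathrm{logistic}\!\left(\lambda \left(\beta_a - \sum_h \mathrm{cost}_j^h\right)\right)

On this reading, beta_u sits inside each per-dimension cost, so summing over the h pose dimensions does repeat it h times; whether that matches formula (2) as read above is exactly the point under discussion.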

Routing by agreement with a Transformer-based model for NMT

Hello all :)

I'm trying to use routing-by-agreement with a Transformer-based model for an NMT task. The proposed idea is to use each attention head's output as an input capsule to a capsule network, in order to fuse the semantic and spatial information from the different heads and help improve the quality of the output sentences. As below:

[figure: routing diagram]

The implementation code is here, and the PyTorch issue is here.

I have gotten rather poor results. I would kindly appreciate any suggestions on how to proceed.

I look forward to your feedback.

spatial_routing_matrix = utl.create_routing_map(child_space=1, k=1, s=1) ?

Hi Ashley,
In layers.py, 'def fc_caps()' creates the spatial_routing_matrix with 'spatial_routing_matrix = utl.create_routing_map(child_space=1, k=1, s=1)', where child_space is 1. But I think it does not need to be 1 at this point: following the tensor-shape flow just before, (64, 7, 7, 8, *) ---> (64, 5, 5, 16, *), the child_space should be 5 instead of 1.
And with child_space=1, the newly generated spatial_routing_matrix has shape (1, 1), which will make the subsequent 'em_routing()' incorrect.
What do you think? Maybe my reasoning is wrong somewhere.

Kind regards
Jeff
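
For context, here is a minimal re-implementation sketch of what a spatial routing map of this kind typically encodes: a binary child-position by parent-position connectivity matrix for a k x k kernel with stride s. This is an illustration under assumptions, not the repository's actual utl.create_routing_map:

    import numpy as np

    def create_routing_map_sketch(child_space, k, s):
        """Binary matrix of shape (child_space**2, parent_space**2).

        Entry [c, p] is 1 if child spatial position c falls inside the
        k x k receptive field of parent spatial position p (stride s).
        Hypothetical re-implementation for illustration only.
        """
        parent_space = (child_space - k) // s + 1
        routing_map = np.zeros((child_space ** 2, parent_space ** 2))
        for pr in range(parent_space):        # parent row
            for pc in range(parent_space):    # parent column
                p = pr * parent_space + pc
                for dr in range(k):           # offsets within the kernel
                    for dc in range(k):
                        c = (pr * s + dr) * child_space + (pc * s + dc)
                        routing_map[c, p] = 1
        return routing_map

    print(create_routing_map_sketch(child_space=1, k=1, s=1))        # [[1.]]
    print(create_routing_map_sketch(child_space=5, k=3, s=2).shape)  # (25, 4)

With child_space=1, k=1, s=1 the map is a single 1x1 entry, i.e. every (flattened) child routes to the one "spatial" parent position; that is consistent with treating the fully connected capsule layer as a 1x1 spatial grid, which may be the intent behind the line asked about here.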

pip requirements need to be fixed

I tried to use pip install -r requirements.txt to install the dependencies.
However, some requirements cannot be satisfied:

mkl-fft==1.0.12 (only 1.0.6 is available)
mkl-random==1.0.2 (only 1.0.1.1)
mkl-service==2.0.2 (not found)
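
A possible workaround, assuming the exact MKL pins are not strictly required: install the three MKL packages from conda (where those builds are published) and install the remaining requirements with pip, or simply relax the pins in requirements.txt. For example (requirements-nomkl.txt is a hypothetical filename):

$ conda install mkl-fft mkl-random mkl-service
$ grep -v "^mkl-" requirements.txt > requirements-nomkl.txt
$ pip install --requirement requirements-nomkl.txt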

failed to run cuBLAS routine cublasGemmBatchedEx issue

Hi Ashley,
Thanks for your great work.
When I ran the code, it failed with the information below:

2019-10-03 13:25:00.047383: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-10-03 13:25:00.477672: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-03 13:25:00.478676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.695
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 7.53GiB
2019-10-03 13:25:00.478708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-10-03 13:25:08.880050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-03 13:25:08.880113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2019-10-03 13:25:08.880130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2019-10-03 13:25:08.881112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7286 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-10-03 13:25:09.813176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-10-03 13:25:09.813211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-03 13:25:09.813215: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2019-10-03 13:25:09.813218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2019-10-03 13:25:09.813350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7286 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-10-03 13:26:52.896021: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_NOT_SUPPORTED
2019-10-03 13:26:52.897701: E tensorflow/stream_executor/cuda/cuda_blas.cc:2574] Internal: failed BLAS call, see log for details
2019-10-03 13:26:53 CRITICAL: Traceback (most recent call last):
File "/home/jeff/anaconda2/envs/tf_36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
return fn(*args)
File "/home/jeff/anaconda2/envs/tf_36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/jeff/anaconda2/envs/tf_36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[3612672,4,4], b.shape=[3612672,4,4], m=4, n=4, k=4, batch_size=3612672
[[{{node tower_0/lyr.conv_caps1/votes/MatMul}} = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/lyr.conv_caps1/votes/Tile_1, tower_0/lyr.conv_caps1/votes/Tile, ^swap_out_tower_0/gradients/tower_0/lyr.conv_caps1/votes/MatMul_grad/MatMul_1_0, ^swap_out_tower_0/gradients/tower_0/lyr.conv_caps1/votes/MatMul_grad/MatMul_1)]]
[[{{node tower_0/class_caps/activation_out/_23}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1082_tower_0/class_caps/activation_out", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

My computer system information:
Linux Ubuntu 16.04
Nvidia GPU GeForce GTX 1070, 8G
CUDA 9.0/cuDNN 7.3
Python 3.6.8
Tensorflow version: 1.11.0-gpu

I first met this problem with CUDA 9.2 and cuDNN 7.6; I downgraded to CUDA 9.0 and cuDNN 7.3, but the issue persists.
I also tried reducing 'batch_size' from 64 to 2, but the problem remains. Any idea why it fails?
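
One generic mitigation worth trying (a guess, not a confirmed fix for this particular cuBLAS failure) is to let TensorFlow 1.x allocate GPU memory on demand instead of grabbing it all up front, since batched GEMM launches sometimes fail when memory is tight. Wherever the session is created in train_val.py (location assumed), something like:

    import tensorflow as tf

    # Ask TensorFlow 1.x to grow GPU memory on demand rather than pre-allocating.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True

    with tf.Session(config=config) as sess:
        pass  # build the graph and run training as usual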

Trying to continue training from a checkpoint

Hi guys,

When I try to continue training the network from a checkpoint directory, I use the flag "load_dir" to do that:
python3 train_val.py --load_dir=./logs/smallNORB/20200103_/train/checkpoint
But the code returns:
"load_ckpt directory exists but cannot find a valid
checkpoint to resore, consider using the reset flag
"
I have checked the directory and there are some checkpoints from previous training.
Is there some mistake that I made in this process?
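
One thing worth checking (a guess, not a confirmed answer): in the README's test example the flag points at the run directory, e.g. ./logs/smallNORB/20190731_wip, rather than at the train/checkpoint file itself, so passing the checkpoint index file may be the problem. You can verify which directory TensorFlow considers to hold a valid checkpoint with:

    import tensorflow as tf

    # Prints the newest checkpoint path if the directory's "checkpoint" index
    # file is valid, otherwise None. Replace the path with your own run dir.
    print(tf.train.latest_checkpoint("./logs/smallNORB/20200103_/train"))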

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.