
capsule-net-pytorch's Introduction

PyTorch CapsNet: Capsule Network for PyTorch

Note: this project carries a "No Maintenance Intended" badge; it is no longer actively maintained.

A CUDA-enabled PyTorch implementation of CapsNet (Capsule Network), based on the paper: Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton. Dynamic Routing Between Capsules. NIPS 2017.

The current test error is 0.21% and the best test error is 0.20%. The current test accuracy is 99.31% and the best test accuracy is 99.32%.

What is a Capsule

A Capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or object part.
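The length of this activity vector encodes the probability that the entity exists, which the paper enforces with the "squashing" nonlinearity (its Eq. 1). A minimal sketch in PyTorch:

import torch

def squash(s, dim=-1):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|): short vectors shrink toward
    # zero, long vectors approach unit length; direction is preserved.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + 1e-8)

capsule = torch.randn(8)       # an 8-dimensional capsule activity vector
print(squash(capsule).norm())  # always < 1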

You can learn more about Capsule Networks here.

Why another CapsNet implementation?

I wanted a decent PyTorch implementation of CapsNet and I couldn't find one when I started. The goal of this implementation is to help newcomers learn and understand the CapsNet architecture and the idea of capsules. The implementation is NOT focused on rigorous correctness of the results, and the code is not optimized for speed. To make the code easier to read and understand, it comes with ample comments, and the Python classes and functions are documented with docstrings.

I will do my best to check and fix reported issues. Contributions are highly welcome. If you find any bugs or errors in the code, please do not hesitate to open an issue or a pull request. Thank you.

Status and Latest Updates:

See the CHANGELOG

Datasets

The model was trained on the standard MNIST data.

Note: you don't have to manually download, preprocess, and load the MNIST dataset as TorchVision will take care of this step for you.

I have tried using other datasets. See the Other Datasets section below for more details.

Requirements

  • Python 3
    • Tested with version 3.6.4
  • PyTorch
    • Tested with version 0.3.0.post4
    • Migrating the existing code to work with version 0.4.0 is a work in progress.
    • The code will not run with version 0.1.2 because keepdim is not available in that version.
    • The code will not run with version 0.2.0 because its softmax function does not accept a dim argument (see the compatibility sketch after this list).
  • CUDA 8 and above
    • Tested with CUDA 8 and CUDA 9.
  • TorchVision
  • tensorboardX
  • tqdm
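The version notes above boil down to two API calls. A minimal compatibility sketch, assuming PyTorch 0.3.x or later (the tensor shape is illustrative):

import torch
import torch.nn.functional as F

b_ij = torch.randn(1, 1152, 10, 1)  # routing logits, as in this repo

# keepdim (missing in 0.1.2) keeps the reduced axis as size 1,
# which the routing code relies on for broadcasting.
s = b_ij.sum(dim=2, keepdim=True)   # shape [1, 1152, 1, 1]

# An explicit dim argument to softmax (missing in 0.2.0) normalizes
# along the intended axis instead of a flattened default.
c_ij = F.softmax(b_ij, dim=2)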

Usage

Training and Evaluation

Step 1. Clone this repository with git and install project dependencies.

$ git clone https://github.com/cedrickchee/capsule-net-pytorch.git
$ cd capsule-net-pytorch
$ pip install -r requirements.txt

Step 2. Start the CapsNet on MNIST training and evaluation:

  • Training with default settings:
$ python main.py
  • Training on 8 GPUs with 30 epochs and 1 routing iteration:
$ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --epochs 30 --num-routing 1 --threads 16 --batch-size 128 --test-batch-size 128

Step 3. Test a pre-trained model:

If you trained a model in Step 2 above, the weights for the trained model will be saved to results/trained_model/model_epoch_10.pth. [WIP] Now run the following command to get test results.

$ python main.py --is-training 0 --weights results/trained_model/model_epoch_10.pth

Pre-trained Model

You can download the weights for the pre-trained model from my Google Drive. The weights (model state dict) and the optimizer state were saved at the end of every training epoch.

Uncompress and put the weights (.pth files) into ./results/trained_model/.

Note: the model was last trained on 2017-11-26 and the weights last updated on 2017-11-28.
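A minimal sketch of loading one of these checkpoints for inference, assuming model is an instance of this repo's Net class and that the .pth file stores the state dicts under hypothetical keys 'state_dict' and 'optimizer' (check the repo's save code for the actual layout):

import torch

checkpoint = torch.load('results/trained_model/model_epoch_10.pth')
model.load_state_dict(checkpoint['state_dict'])  # 'state_dict' is an assumed key
model.eval()  # switch to evaluation mode before testing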

The Default Hyper Parameters

Parameter | Value | CLI argument
--------- | ----- | ------------
Training epochs | 10 | --epochs 10
Learning rate | 0.01 | --lr 0.01
Training batch size | 128 | --batch-size 128
Testing batch size | 128 | --test-batch-size 128
Log interval | 10 | --log-interval 10
Disable CUDA training | false | --no-cuda
Num. of channels produced by the convolution | 256 | --num-conv-out-channel 256
Num. of input channels to the convolution | 1 | --num-conv-in-channel 1
Num. of primary units | 8 | --num-primary-unit 8
Primary unit size | 1152 | --primary-unit-size 1152
Num. of digit classes | 10 | --num-classes 10
Output unit size | 16 | --output-unit-size 16
Num. of routing iterations | 3 | --num-routing 3
Use reconstruction loss | true | --use-reconstruction-loss
Regularization coefficient for reconstruction loss | 0.0005 | --regularization-scale 0.0005
Dataset name (mnist, cifar10) | mnist | --dataset mnist
Input image width to the convolution | 28 | --input-width 28
Input image height to the convolution | 28 | --input-height 28

Results

Test Error

CapsNet classification test error on MNIST. The MNIST average and standard deviation results are reported from 3 trials.

The results can be reproduced by running the following commands.

 python main.py --epochs 50 --num-routing 1 --use-reconstruction-loss no --regularization-scale 0.0       #CapsNet-v1
 python main.py --epochs 50 --num-routing 1 --use-reconstruction-loss yes --regularization-scale 0.0005   #CapsNet-v2
 python main.py --epochs 50 --num-routing 3 --use-reconstruction-loss no --regularization-scale 0.0       #CapsNet-v3
 python main.py --epochs 50 --num-routing 3 --use-reconstruction-loss yes --regularization-scale 0.0005   #CapsNet-v4

Method | Routing | Reconstruction | MNIST error (%) | Paper error (%)
------ | ------- | -------------- | --------------- | ---------------
Baseline | -- | -- | -- | 0.39
CapsNet-v1 | 1 | no | -- | 0.34 (0.032)
CapsNet-v2 | 1 | yes | -- | 0.29 (0.011)
CapsNet-v3 | 3 | no | -- | 0.35 (0.036)
CapsNet-v4 | 3 | yes | 0.21 | 0.25 (0.005)

Training Loss and Accuracy

The training losses and accuracies for CapsNet-v4 (50 epochs, 3 routing iterations, reconstruction enabled, regularization scale of 0.0005):

Training accuracy. Highest training accuracy: 100%

Training loss. Lowest training error: 0.1938%

Test Loss and Accuracy

The test losses and accuracies for CapsNet-v4 (50 epochs, 3 routing iterations, reconstruction enabled, regularization scale of 0.0005):

Test accuracy. Highest test accuracy: 99.32%

Test loss. Lowest test error: 0.2002%

Training Speed

  • Around 5.97 s/batch or ~8 min/epoch on a single Tesla K80 GPU with a batch size of 704.
  • Around 3.25 s/batch or ~25 min/epoch on a single Tesla K80 GPU with a batch size of 128.
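These per-epoch figures are consistent with MNIST's 60,000 training images; a quick back-of-the-envelope check:

# Rough check of the epoch times above.
train_size = 60000
for batch_size, sec_per_batch in [(704, 5.97), (128, 3.25)]:
    batches_per_epoch = train_size / batch_size
    minutes_per_epoch = batches_per_epoch * sec_per_batch / 60
    print(batch_size, round(minutes_per_epoch, 1))  # ~8.5 and ~25.4 minutes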

In my case, these are the hyperparameters I used for the training setup:

  • batch size: 128
  • Epochs: 50
  • Num. of routing: 3
  • Use reconstruction loss: yes
  • Regularization scale for reconstruction loss: 0.0005

Reconstruction

The results of CapsNet-v4.

Digits at left are reconstructed images.

[WIP] Ground truth image from dataset

Model Design

Model architecture:
------------------

Net (
  (conv1): ConvLayer (
    (conv0): Conv2d(1, 256, kernel_size=(9, 9), stride=(1, 1))
    (relu): ReLU (inplace)
  )
  (primary): CapsuleLayer (
    (conv_units): ModuleList (
      (0): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
      (1): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
      (2): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
      (3): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
      (4): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
      (5): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
      (6): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
      (7): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
    )
  )
  (digits): CapsuleLayer (
  )
  (decoder): Decoder (
    (fc1): Linear (160 -> 512)
    (fc2): Linear (512 -> 1024)
    (fc3): Linear (1024 -> 784)
    (relu): ReLU (inplace)
    (sigmoid): Sigmoid ()
  )
)

Parameters and size:
-------------------

conv1.conv0.weight: [256, 1, 9, 9]
conv1.conv0.bias: [256]
primary.conv_units.0.weight: [32, 256, 9, 9]
primary.conv_units.0.bias: [32]
primary.conv_units.1.weight: [32, 256, 9, 9]
primary.conv_units.1.bias: [32]
primary.conv_units.2.weight: [32, 256, 9, 9]
primary.conv_units.2.bias: [32]
primary.conv_units.3.weight: [32, 256, 9, 9]
primary.conv_units.3.bias: [32]
primary.conv_units.4.weight: [32, 256, 9, 9]
primary.conv_units.4.bias: [32]
primary.conv_units.5.weight: [32, 256, 9, 9]
primary.conv_units.5.bias: [32]
primary.conv_units.6.weight: [32, 256, 9, 9]
primary.conv_units.6.bias: [32]
primary.conv_units.7.weight: [32, 256, 9, 9]
primary.conv_units.7.bias: [32]
digits.weight: [1, 1152, 10, 16, 8]
decoder.fc1.weight: [512, 160]
decoder.fc1.bias: [512]
decoder.fc2.weight: [1024, 512]
decoder.fc2.bias: [1024]
decoder.fc3.weight: [784, 1024]
decoder.fc3.bias: [784]

Total number of parameters (with the reconstruction network): 8,227,088 (~8.2 million)
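A quick way to reproduce a count like this, assuming model is an instance of this repo's Net class:

# Sum the element counts of all trainable parameters.
total = sum(p.numel() for p in model.parameters())
print(total)

# Or list them per layer, matching the table above:
for name, p in model.named_parameters():
    print(name, list(p.size()))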

TensorBoard

We logged the training and test losses and accuracies using tensorboardX. TensorBoard helps us visualize how the model learns over time: we can track statistics such as how the objective function changes, or how the weights and accuracy vary during training.

TensorBoard operates by reading TensorFlow data (events files).
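A minimal logging sketch with tensorboardX (the tag names and values here are illustrative, not necessarily the ones this repo uses):

from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir='runs/example')
for epoch in range(1, 11):
    train_loss = 1.0 / epoch  # placeholder value for illustration
    writer.add_scalar('train/loss', train_loss, epoch)  # one point per epoch
writer.close()  # flush the events file that TensorBoard reads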

How to Use TensorBoard

  1. Download a copy of the events files for the latest run from my Google Drive.
  2. Uncompress the file and put it into ./runs.
  3. Make sure you have installed tensorflow (CPU version); we need it for the TensorBoard server and dashboard.
  4. Start TensorBoard.
$ tensorboard --logdir runs
  5. Open the TensorBoard dashboard in your web browser at this URL: http://localhost:6006

Other Datasets

CIFAR10

In the spirit of experimentation, I have tried other datasets. I have updated the implementation so that it supports and works with CIFAR10. Note that I have not thoroughly tested the capsule model on CIFAR10.

Train and test the model on CIFAR10 by running the following command (the sketch after it shows where --primary-unit-size 2048 comes from):

python main.py --dataset cifar10 --num-conv-in-channel 3 --input-width 32 --input-height 32 --primary-unit-size 2048 --epochs 80 --num-routing 1 --use-reconstruction-loss yes --regularization-scale 0.0005
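The --primary-unit-size value follows from standard convolution arithmetic; a sketch of the computation (assuming, as in the paper, no padding):

# primary-unit-size = channels x H x W after the two conv stages.
def conv_out(size, kernel, stride):
    return (size - kernel) // stride + 1

for name, in_size in [('MNIST', 28), ('CIFAR10', 32)]:
    h = conv_out(in_size, kernel=9, stride=1)  # conv1: 9x9, stride 1
    h = conv_out(h, kernel=9, stride=2)        # primary caps: 9x9, stride 2
    print(name, 32 * h * h)                    # -> 1152 and 2048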
Training Loss and Accuracy

The training losses and accuracies for CapsNet-v4 (80 epochs, 3 routing iterations, reconstruction enabled, regularization scale of 0.0005):

  • Highest training accuracy: 100%
  • Lowest training error: 0.3589%
Test Loss and Accuracy

The test losses and accuracies for CapsNet-v4 (80 epochs, 3 routing iterations, reconstruction enabled, regularization scale of 0.0005):

  • Highest test accuracy: 71%
  • Lowest test error: 0.5735%

TODO

  • Publish results.
  • More testing.
  • Inference mode - command to test a pre-trained model.
  • Jupyter Notebook version.
  • Create a sample to show how we can apply CapsNet to real-world application.
  • Experiment with CapsNet:
    • Try using another dataset.
    • Come up with a more creative model structure.
  • Pre-trained model and weights.
  • Add visualization for training and evaluation metrics.
  • Implement reconstruction loss.
  • Check algorithm for correctness.
  • Update results from TensorBoard after making improvements and bug fixes.
  • Publish updated pre-trained model weights.
  • Log the original and reconstructed images using TensorBoard.
  • Update results with reconstructed image and original image.
  • Resume training by loading model checkpoint.
  • Migrate existing code to work in PyTorch 0.4.0.

WIP is an acronym for Work-In-Progress.

Credits

I referenced these implementations mainly as a sanity check:

  1. TensorFlow implementation by @naturomics

Learning Resources

Here are some resources we think will be helpful if you want to learn more about Capsule Networks:

Other Implementations

Real-world Application of CapsNet

The following are a few samples in the wild that show how CapsNet can be applied to real-world use cases.


capsule-net-pytorch's Issues

Test model on other datasets like CIFAR10

In the spirit of learning by experimenting, we should try CapsNet using other datasets.

Please refer to the paper under section 7 "Other datasets" for things to change to support CIFAR10 dataset.

I am planning to do this by this weekend, unless someone else has already implemented it and would like to submit a PR.

Show training log for reconstruction loss

Currently, the training log only shows the total loss.

For the sake of debugging and monitoring the network's learning progress, we should show the margin loss and the reconstruction loss separately.

Deviations of this implementation from the original paper

I have come across the following main deviation between the paper (https://arxiv.org/abs/1710.09829) and this repo's implementation:

In the paper: there are 32 primary capsule maps of 6x6 capsules each, with each capsule being 8-dimensional. Hence, we need 32 independent convolution layers with 8 output channels each.
In the repo: it is implemented as 8 independent convolution layers with 32 output channels each.

reference:

https://photos.app.goo.gl/FeCg4ejNdF3eVPvh6

https://github.com/cedrickchee/capsule-net-pytorch/blob/master/capsule_layer.py#L52

I have noticed this issue in another repo as well. I'm not sure if there is a misunderstanding in my interpretation of the paper.
gram-ai/capsule-networks#23

Can you please check and comment on this?
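For concreteness, a hedged sketch of the two groupings being discussed (illustrative code, not this repo's exact implementation); both produce a 256-channel 6x6 map on MNIST, and the difference is which axis is treated as the 8-D capsule dimension when reshaping into 1152 capsules:

import torch
import torch.nn as nn

x = torch.randn(1, 256, 20, 20)  # output of conv1 on a 28x28 MNIST image

# Paper's description: 32 conv units with 8 output channels each.
paper_units = nn.ModuleList([nn.Conv2d(256, 8, 9, stride=2) for _ in range(32)])

# This repo: 8 conv units with 32 output channels each.
repo_units = nn.ModuleList([nn.Conv2d(256, 32, 9, stride=2) for _ in range(8)])

out = torch.cat([u(x) for u in repo_units], dim=1)  # [1, 256, 6, 6]
print(out.view(1, 8, 32 * 6 * 6).size())            # [1, 8, 1152]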

Memory required explodes when the CapsLayer instance is called

@cedrickchee The memory required by the network increases from 3 GB to about 25 GB after the primary caps layer finishes and CapsLayer is called. I wonder why. I'm not able to solve this issue. The batch size is (8, 128, 64, 6), which is not too large. Checking with print statements shows that the CapsLayer and AgreementRouting classes are causing the issue. Please help me out.

bug

Thanks for this very good code!
But I ran into a problem when trying to test a pre-trained model. I used the command python main.py --is-training 0 --weights results/trained_model/model_epoch_5.pth as the README said, and got this error:
main.py: error: unrecognized arguments: --is-training 0 --weights results/trained_model/model_epoch_5.pth
I tried adding the arguments in main():
parser.add_argument('--is-training', type=bool, default=1)
parser.add_argument('--weights', default='results/trained_model/model_epoch_5.pth')
but it didn't work.
Could you please help me out?

Issue in Routing and Squash

In the routing method, F.softmax is called on b_ij, which is 1 x 1152 x 10 x 1, but it should be applied along dimension 2. Granted, PyTorch 0.2.0 does not yet have a softmax that accepts a dim argument, but using F.softmax as done here may not behave as required.

Also, the default value of the dim parameter in the squash function is 2, but on line 93, when it is called on s_j to compute v_j, the default dimension will not work, since s_j is 128 x 1 x 10 x 16 x 1, so dim=3 should be passed.
Correct me if I'm wrong. But if I'm correct, can we get the expected results in 10 epochs with these changes?
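A hedged sketch of the dimension issue with the routing softmax; on modern PyTorch the axis is explicit, while the 0.2.0-era workaround was to reshape the target axis into softmax's fixed position:

import torch
import torch.nn.functional as F

b_ij = torch.randn(1, 1152, 10, 1)  # routing logits

# Modern PyTorch: normalize over the 10 class capsules directly.
c_ij = F.softmax(b_ij, dim=2)

# 0.2.0-era workaround: softmax only ran over dim 1 of a 2-D input,
# so move the target axis last, flatten, softmax, and restore.
flat = b_ij.transpose(2, 3).contiguous().view(-1, 10)
c_legacy = F.softmax(flat, dim=1).view(1, 1152, 1, 10).transpose(2, 3)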

Continue training?

Training CapsNet is known to be time-consuming and is not yet optimized. So, I would like an option to load pre-trained weights (e.g. from a provided file path) and continue training.
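A minimal sketch of what resuming could look like, assuming model and optimizer are already constructed and the checkpoint stores state dicts under hypothetical keys 'state_dict', 'optimizer', and 'epoch' (check the repo's save code for the actual layout):

import torch

checkpoint = torch.load('results/trained_model/model_epoch_5.pth')
model.load_state_dict(checkpoint['state_dict'])     # restore weights
optimizer.load_state_dict(checkpoint['optimizer'])  # restore optimizer state
start_epoch = checkpoint['epoch'] + 1               # pick up where we left off
# ...then run the usual training loop starting from start_epoch.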

Activation functions on FC decoder layers

First, thanks for this very good code!

The original paper proposes a decoder with three fully connected layers:

FC+ReLU -> FC+ReLU -> FC+Sigmoid.

But your decoder code seems to do FC -> FC -> FC -> ReLU -> Sigmoid:

self.fc1 = nn.Linear(num_classes * output_unit_size, fc1_output_size) # input dim 10 * 16.
self.fc2 = nn.Linear(fc1_output_size, fc2_output_size)
self.fc3 = nn.Linear(fc2_output_size, self.fc3_output_size)
# Activation functions
self.relu = nn.ReLU(inplace=True)
self.sigmoid = nn.Sigmoid()

Wouldn't it be more correct this way?

self.fc1 = nn.Linear(num_classes * output_unit_size, fc1_output_size) # input dim 10 * 16.
self.relu = nn.ReLU(inplace=True)
self.fc2 = nn.Linear(fc1_output_size, fc2_output_size)
self.relu = nn.ReLU(inplace=True)
self.fc3 = nn.Linear(fc2_output_size, self.fc3_output_size)
self.sigmoid = nn.Sigmoid()
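For what it's worth, sharing a single nn.ReLU module is fine as long as the forward pass interleaves the activations between the layers; a hedged sketch of what such a forward method might look like (this repo's actual forward may differ):

def forward(self, x):
    # FC + ReLU -> FC + ReLU -> FC + Sigmoid, as in the paper's decoder.
    h = self.relu(self.fc1(x))
    h = self.relu(self.fc2(h))
    return self.sigmoid(self.fc3(h))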

Support PyTorch 0.4.0

Migrate the existing code to work with PyTorch version 0.4.0. PyTorch 0.4.0 introduced major core changes that break the previous API.

Status: I have started work on this and making code changes by referring to the official PyTorch migration guide.

Test error results reported are actually loss figures and are not comparable with paper reported accuracies.

Thanks for a very nicely presented implementation of Capsule Networks. I especially appreciate the tensorboard plots.

Unfortunately I believe you have mixed up "test error" with "test loss" when reporting your best results and comparing with the results from the paper.

The paper shows a table of test classification accuracy (Table 1) and reports a best error of 0.25%. This will have been calculated as:

(number of incorrectly classified test images) / (total number of test images) * 100%

Thus, since there are 10,000 test images, this equates to 25 misclassified images for a 0.25% error.

This is equivalent to an accuracy of 99.75%

Unfortunately, you list test accuracy and test error figures that do not sum to 100%, because you are listing the test loss figure, which is not a useful measure of the network's classification accuracy.

Although I have not seen an independent implementation on the net that claims to achieve this 99.75% figure, I have seen several that achieve greater than 99.6% (my own implementation has achieved 99.68% in 50 epochs). Since your best test accuracy is 99.32% it is possible that you have some error in your implementation as this is quite a way from the 99.75% achieved by the authors of the paper.
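To make the distinction concrete, a minimal sketch of computing classification test error, as opposed to test loss (assuming model and a test_loader over MNIST already exist, and that the model's output can be reduced to per-class scores, e.g. digit-capsule vector lengths):

import torch

model.eval()
wrong, total = 0, 0
with torch.no_grad():
    for images, labels in test_loader:
        scores = model(images)        # assumed per-class scores
        preds = scores.argmax(dim=1)  # predicted digit per image
        wrong += (preds != labels).sum().item()
        total += labels.size(0)

# e.g. 25 wrong out of 10,000 -> 0.25% error, i.e. 99.75% accuracy.
print('test error: %.2f%%' % (100.0 * wrong / total))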

Issue of squash dimension

Is the squash operation applied on dimension 8 or 1152?

return utils.squash(unit, dim=2) # dim 2 is the third dim (1152D array) in our tensor

It seems the squash is applied on dim=2, i.e. along the 1152-capsule axis, but I have seen other implementations doing this along the 8-D vector dimension, i.e. dim=1. Is this correct?

RGB 256x256 images

Can you tell us how we can use your code to classify RGB images?

Our dataset is structured like this:

class1/
    0001.jpg
    0002.jpg
class2/
    001.jpg
    002.jpg
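That layout (one subfolder per class) matches what torchvision.datasets.ImageFolder expects, so a hedged starting point might be the sketch below. Note the resize: this CapsNet was built for small inputs, and at 256x256 the primary-capsule count (and routing memory) would explode, so you would also recompute --primary-unit-size as shown in the CIFAR10 section:

from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import ImageFolder

transform = transforms.Compose([
    transforms.Resize((32, 32)),  # shrink to a size the model can handle
    transforms.ToTensor(),
])
dataset = ImageFolder('path/to/dataset', transform=transform)  # class1/, class2/ ...
loader = DataLoader(dataset, batch_size=128, shuffle=True)
# Then train with flags like:
#   --num-conv-in-channel 3 --input-width 32 --input-height 32 --primary-unit-size 2048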
