Deep Networks with Stochastic Depth

This repository hosts the Torch 7 code for the paper Deep Networks with Stochastic Depth, available at http://arxiv.org/abs/1603.09382. For now, the code reproduces the results in Figure 3 for CIFAR-10 and CIFAR-100, and in Figure 4 (left) for SVHN. The code for the 1202-layer network can be easily adapted from the fb.resnet.torch repo using our provided stochastic depth module.

Updates

Please see the latest implementation of stochastic depth and other cool models (DenseNet etc.) in PyTorch, by Felix Wu and Danlu Chen. Their code is much more memory efficient, more user friendly and better maintained. The 1202-layer architecture on CIFAR-10 can be trained on one TITAN X (amazingly!) under our standard settings.

Prerequisites

  • Torch 7 and CUDA with the basic packages (nn, optim, image, cutorch, cunn).
  • cudnn and torch bindings.
  • nninit; luarocks install nninit should do the trick.
  • CIFAR-10 and CIFAR-100 datasets in Torch format; the cifar.torch scripts (https://github.com/soumith/cifar.torch) handle the conversion for you.
  • SVHN dataset in Torch format, available here. Please note that running on SVHN requires roughly 28GB of RAM for dataset loading.

Getting Started on CIFAR-10

git clone https://github.com/yueatsprograms/Stochastic_Depth
cd Stochastic_Depth
git clone https://github.com/soumith/cifar.torch
cd cifar.torch
th Cifar10BinToTensor.lua
cd ..
mkdir results
th main.lua -dataRoot cifar.torch/ -resultFolder results/ -deathRate 0.5

Usage Details

th main.lua -dataRoot path_to_data -resultFolder path_to_save -deathRate 0.5
This command runs the 110-layer ResNet on CIFAR-10 with stochastic depth, using linear decay survival probabilities ending in 0.5. The -device flag allows you to specify which GPU to run on. On our machine with a TITAN X, each epoch takes about 60 seconds, and the program ends with a test error (selected by best validation error) of 5.25%.
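As a rough illustration of the linear decay rule, the sketch below computes per-block death rates and survival probabilities in plain Python. It is not the repository's Lua code. The function name and the count of 54 residual blocks (3 groups of 18, which is how a 110-layer ResNet is usually built) are assumptions for illustration, and the rule follows the paper's p_l = 1 - (l/L)(1 - p_L) with p_L = 0.5.

```python
def linear_decay_death_rates(num_blocks, final_death_rate):
    """Death rate of block l (1-indexed) grows linearly up to final_death_rate."""
    return [final_death_rate * l / num_blocks for l in range(1, num_blocks + 1)]


# Assumed setup: 54 residual blocks, last block dropped with probability 0.5.
death_rates = linear_decay_death_rates(54, 0.5)
survival_probs = [1.0 - d for d in death_rates]

# Expected number of residual blocks that stay active in one training pass.
expected_active_blocks = sum(survival_probs)
print(round(expected_active_blocks, 2))
```

Under these assumptions, early blocks are almost always kept while the last block is kept only half of the time, which is consistent with the shorter epoch time reported for the stochastic depth run.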

The default deathRate is set to 0. This is equivalent to a constant depth network, so to run our baseline, enter:
th main.lua -dataRoot path_to_data -resultFolder path_to_save
On our machine with a TITAN X, each epoch takes about 75 seconds, and this baseline program ends with a test error (selected by best validation error) of 6.41% (see Figure 3 in the paper).
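To see why a deathRate of 0 reduces to a constant depth network, here is a minimal plain-Python sketch of one residual block with stochastic depth as described in the paper. It is not the repository's Lua module: the function name, the list-of-floats representation, and the omission of the ReLU after the addition are simplifications assumed for illustration.

```python
import random


def residual_block_forward(x, residual_fn, death_rate, training, rng=random):
    """Apply one residual block with stochastic depth.

    x           -- list of floats (stand-in for a feature map)
    residual_fn -- function mapping a list of floats to a list of floats
    death_rate  -- probability of dropping the residual branch in training
    """
    if training:
        # rng.random() lies in [0, 1), so death_rate == 0 never drops the block.
        if rng.random() < death_rate:
            return list(x)  # block dropped: only the identity path remains
        return [xi + ri for xi, ri in zip(x, residual_fn(x))]
    # At test time every block is active, scaled by its survival probability.
    survival_prob = 1.0 - death_rate
    return [xi + survival_prob * ri for xi, ri in zip(x, residual_fn(x))]


def double(v):
    """Toy residual branch used only for this example."""
    return [2.0 * a for a in v]


always_active = residual_block_forward([1.0, 2.0], double, 0.0, training=True)
```

With death_rate set to 0 the training and test branches compute the same thing, so every block behaves like an ordinary residual block.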

You can run on CIFAR-100 by adding the flag -dataset cifar100. Our program provides other options, for example network depth (-N), data augmentation (-augmentation), and batch size (-batchSize). You can change the optimization hyperparameters in the sgdState variable, and the learning rate schedule in the main function. The program saves a file every epoch to resultFolder/errors_N_dataset_deathMode_deathRate, which contains a table of tuples with your test and validation errors up to that epoch.

The architecture and number of epochs for SVHN used in our paper are slightly different from the code's defaults. To replicate our result of 1.75% on SVHN, use the following command:
th main.lua -dataRoot path_to_data -resultFolder path_to_save -dataset svhn -N 25 -maxEpochs 50 -deathRate 0.5

Known Problems

  • It is normal to get a +/- 0.2% difference from our reported results on CIFAR-10, and analogously for the other datasets. Networks are initialized differently, and most importantly, the validation set is chosen at random (determined by your seed).
  • If you train on SVHN and the model doesn't converge for the first 1600 or so iterations, that is fine; just wait a little longer.
  • Xavier reported that the model converged for him on CIFAR-10 only after he used the following initialization for Batch Normalization: model:add(cudnn.SpatialBatchNormalization(_dim_):init('weight', nninit.normal, 1.0, 0.002):init('bias', nninit.constant, 0)). We could not replicate the non-convergence and thus won't put this initialization into our code, but we recognize that machines (or the installed versions of Torch) might differ.
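The first item above attributes part of the run-to-run variance to the randomly chosen validation set. As a small sketch of that effect, the plain-Python snippet below splits example indices into training and validation sets from a seed. The function name and the sizes (50000 examples with 5000 held out) are assumptions for illustration and are not read from the repository's code.

```python
import random


def split_train_validation(num_examples, num_validation, seed):
    """Shuffle indices with a seeded generator and hold out a validation set."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    return indices[num_validation:], indices[:num_validation]


# Assumed sizes: 50000 training images, 5000 of them held out for validation.
train_a, val_a = split_train_validation(50000, 5000, seed=1)
train_b, val_b = split_train_validation(50000, 5000, seed=2)
```

The same seed always reproduces the same split, while a different seed holds out different examples, so the epoch picked by best validation error can differ between runs.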

Contact

My email is ys646 at cornell.edu. I'm happy to answer any of your questions, and I'd very much appreciate your suggestions. My academic website is at http://yueatsprograms.github.io.
