
Training-BatchNorm-and-Only-BatchNorm

Experiments with the ideas presented in Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs by Frankle et al. In this paper, Frankle et al. explore the expressiveness of the random features in CNNs by starting with the following experimental setup:

  • They first set all the layers of a CNN to trainable=False.
  • Before kicking off model training, they set only the Batch Norm layers back to trainable.

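The two-step recipe above can be sketched in Keras as follows. Note that the tiny Sequential model here is a stand-in for illustration only; the repo's actual experiments use a ResNet20.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Tiny stand-in network for illustration; the actual experiments use ResNet20.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, 3),
    layers.BatchNormalization(),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10),
])

# Step 1: freeze every layer.
for layer in model.layers:
    layer.trainable = False

# Step 2: re-enable training for the Batch Norm layers only.
for layer in model.layers:
    if isinstance(layer, layers.BatchNormalization):
        layer.trainable = True

# After this, only the BN gamma/beta parameters receive gradient updates.
trainable_layers = [l.name for l in model.layers if l.trainable]
```

Compiling and fitting the model as usual then trains the Batch Norm parameters while leaving every randomly initialized convolution and dense weight untouched.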
This simple experimental setup led to some pretty amazing discoveries about the expressive power of the randomly initialized layers in a CNN. So, the authors explore a further question: what happens if we train only the Batch Norm layers, and how close can this setup get to an optimum? Their findings were pretty intriguing.

Dataset used

CIFAR10

Architecture used

ResNet20 (Thanks to the Keras Idiomatic Programmer repo)

About the files

  • CIFAR10_Subset.ipynb: Runs experiments on a GPU with a subset of the CIFAR10 dataset.
  • CIFAR10_Full.ipynb: Runs experiments on a GPU with the full CIFAR10 dataset.
  • CIFAR10_Full_TPU.ipynb: Runs experiments on a TPU with the full CIFAR10 dataset.
  • CIFAR10_Full_TPU_Different_LR_Schedules.ipynb: Runs experiments on a TPU with the full CIFAR10 dataset but with different learning rate schedules.
  • All_Layers_Frozen.ipynb: As the name suggests, this notebook shows what happens when all the layers of a CNN are made non-trainable.
  • Varying_Batch_Sizes.ipynb: Runs experiments with varying batch sizes (only batch norm layer as trainable).
  • Visualization.ipynb: Visualizes the learned convolution filters of the networks.
  • Visualization_II.ipynb: Almost the same as Visualization.ipynb, but with slightly different visualization plots.

Some interesting findings (credits to the authors, of course)

Figure: output of the first trained convolution layer (all the layers trained from scratch in this case).

Figure: output of the first trained convolution layer (this time only the Batch Norm layers were trained).

More results can be found here: https://app.wandb.ai/sayakpaul/training-bn-only. A more detailed report can be found here.

Important note

I trained both variants of the network for 75 epochs. Naturally, the variant with only the BN layers trainable takes longer to converge because of its much smaller number of trainable parameters. On the other hand, that small trainable footprint can serve as a way to alleviate the problems of huge model sizes.
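The parameter gap mentioned above can be verified directly: with everything but BN frozen, only the per-channel gamma and beta vectors remain trainable. A minimal sketch (using a tiny stand-in model rather than the repo's ResNet20):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Tiny stand-in model; the repo's ResNet20 shows the same effect at scale.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, 3),
    layers.BatchNormalization(),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10),
])

total_params = model.count_params()

# Make only the Batch Norm layers trainable.
for layer in model.layers:
    layer.trainable = isinstance(layer, layers.BatchNormalization)

# Only gamma and beta (2 parameters per channel) remain trainable;
# the moving mean/variance are non-trainable statistics.
bn_params = sum(int(np.prod(w.shape)) for w in model.trainable_weights)
```

Even in this toy network the trainable fraction is tiny; in a ResNet the ratio is far more dramatic, which is what makes the slow convergence unsurprising.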

Acknowledgements

Although the notebooks are Colab-ready, I trained all of them on a pre-configured AI Platform Notebook to make the experiments more reproducible. Thanks to the ML-GDE program for the GCP credits.

