Energy Efficient Deep Learning

Repository of Capstone Work at Data Science Institute, Columbia University in collaboration with GE Research

Table of Contents

  1. About The Project
  2. Contributors
  3. Usage
  4. Methodology
  5. Future Work

About The Project

The project aims to develop techniques for training and running inference with machine learning models at a reduced carbon footprint. Recent estimates suggest that training deep learning models such as BERT, ELMo, and GPT-2 requires multiple GPUs running for several days, and the carbon emissions from training a single deep learning model can be equivalent to five times the lifetime emissions of an average car. GE therefore needs low-latency, lightweight deep learning models that can be deployed on its EDGE devices without compromising accuracy. Our objective is to explore techniques that enable us to store a model at lower precision and to assess the effect on inference.

Contributors

Mentors:

Tapan Shah - Lead Machine Learning Scientist, GE Research
Eleni Drinea - Lecturer, Data Science Institute, Columbia University

Capstone Team:

Mohit Gulla, Kumari Nishu, Neelam Patodia, Prasham Sheth, Pritam Biswas

Usage

Demo / Tutorial

For a detailed walkthrough of the main techniques, i.e. multi-point mixed-precision post-training quantization, pruning, and quantization-aware training, please refer to the notebook Demo_Code.ipynb.

Directory Structure

  • data - contains .py files with the class definitions of the PyTorch datasets and the corresponding .dat files. The datasets explored are ANN-based classification (Churn and Telescope), ANN-based regression (MV Data and California Housing), and CNN-based classification (CIFAR-100 and FMNIST). A subdirectory results contains .csv files that track accuracy and loss at the different precision levels from our experiments.

  • model - contains .py files with model class definition for Dense Neural Networks (DNNs) and Convolutional Neural Networks (CNNs). The various architectures of each model type are defined as separate class objects within its corresponding .py file.

  • model_artifacts - contains .pt files of full precision trained models.

  • utils - contains .py files with the post-training quantization, pruning, and quantization-aware training methods that were explored. For post-training quantization we implemented single-point methods (mid-rise quantization, regular rounding, stochastic rounding) and a multi-point method (mixed-precision multi-point quantization). Each method is designed as standalone functionality. The directory also contains utility code for fetching datasets, plotting graphs, etc.

All *.ipynb and *.py files in the main directory have the comprehensive code for model training, weight quantization, and evaluation. They leverage the code base from the sub-directories.

Methodology

Post Training Quantization

Single-point Quantization approximates a weight value using a single low precision number.

  1. Mid-Rise
  • Delta - controls the granularity of quantization; a larger Delta implies coarser quantization and greater loss of information
  • Uniformly divides the range of weight values into 2^p bins for precision p
  • w_quantized = Delta * (floor(w/Delta) + 0.5)
  2. Regular Rounding
  • Quantization set - a set of landmark values collected using uniform bins, a histogram, or a normal prior on the weight values
  • Each weight is mapped to the nearest landmark value in the quantization set
  3. Stochastic Rounding
  • Quantization set - collected the same way as for regular rounding
  • Each weight is assigned probabilistically to either the closest smaller or the closest larger value in the quantization set
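The two rounding schemes above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's implementation; the function names and the choice of a uniform-bin quantization set are assumptions.

```python
import numpy as np

def mid_rise_quantize(w, precision):
    # Mid-rise: divide the range of w into 2**precision uniform bins of
    # width delta and map each weight to its bin centre.
    delta = (w.max() - w.min()) / (2 ** precision)
    return delta * (np.floor(w / delta) + 0.5)

def stochastic_round(w, levels, rng=None):
    # Stochastic rounding: assign each weight to the closest smaller or
    # closest larger landmark, with probability proportional to proximity.
    rng = rng if rng is not None else np.random.default_rng(0)
    levels = np.sort(levels)
    idx = np.clip(np.searchsorted(levels, w) - 1, 0, len(levels) - 2)
    lo, hi = levels[idx], levels[idx + 1]
    p_hi = (w - lo) / (hi - lo)  # closer to hi -> more likely to round up
    return np.where(rng.random(w.shape) < p_hi, hi, lo)
```

Mid-rise guarantees a per-weight error of at most Delta/2; stochastic rounding is unbiased in expectation, which tends to preserve accuracy better at very low precision.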

Multi-point Quantization approximates a weight value using linear combination of multiple values of low precision.

  1. Multi-point - mixed-precision method
  • Assigns more bits to important layers and fewer bits to unimportant layers, balancing accuracy and cost more efficiently
  • Achieves the same flexibility as mixed-precision hardware while using only a single precision level
  • The quantization set is constructed from a uniform grid on [-1, 1] with increment epsilon, and each weight value w is approximated as a linear combination of low-precision weight vectors
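A greatly simplified NumPy sketch of the idea: greedily approximate a weight vector as a sum of coefficient-scaled points from the [-1, 1] grid, fitting each new term to the remaining residual. The greedy scheme with a single scalar coefficient per term is an illustrative assumption, not the exact method from the notebook.

```python
import numpy as np

def multipoint_quantize(w, epsilon=0.25, n_points=3):
    # Approximate each weight as sum_i a_i * v_i, where every v_i lies on a
    # uniform grid over [-1, 1] with step epsilon (w is a 1-D array).
    grid = np.arange(-1.0, 1.0 + epsilon / 2, epsilon)
    approx = np.zeros_like(w, dtype=float)
    for _ in range(n_points):
        residual = w - approx
        a = np.max(np.abs(residual))  # scalar coefficient for this term
        if a == 0:
            break
        # nearest grid point to each normalized residual entry
        v = grid[np.argmin(np.abs(residual[:, None] / a - grid), axis=1)]
        approx += a * v
    return approx
```

Each added term shrinks the maximum residual by a factor of at least epsilon/2, so a few low-precision points per weight already give a close approximation.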

Pruning

Pruning is a compression method that removes weights that contribute little to a trained model.

  • Selected network parameters are set to zero, removing connections between layers that are estimated to contribute little.
  • The magnitude of a weight is used as a proxy for its importance to the model's performance.
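A minimal NumPy sketch of magnitude-based pruning; the function name and the global-threshold choice are illustrative assumptions (the repository's utils contain the actual implementation).

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    # Zero out the fraction `sparsity` of weights with smallest magnitude.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude acts as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

The surviving nonzero weights can then be stored in a sparse format, which is where the size and latency savings come from.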

Quantization-Aware Training

It is a process of training the model assuming that it will be quantized later during inference.

The steps involved in QAT are:

  1. Initialize a full precision model
  2. Quantize model weights per layer
  3. Forward propagate and compute gradients
  4. Update gradients using straight through estimator
  5. Backprop on full precision model and return quantized model
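The steps above can be sketched on a toy one-parameter linear model in pure NumPy. The mid-rise quantizer, MSE loss, and learning rate here are illustrative assumptions; the essential point is that the forward pass uses quantized weights while the straight-through estimator (STE) applies the gradient to the full-precision copy.

```python
import numpy as np

def quantize(w, delta=0.25):
    # mid-rise style quantizer used inside the forward pass
    return delta * (np.floor(w / delta) + 0.5)

def qat_step(w_full, x, y, lr=0.1, delta=0.25):
    # One QAT step for the model y_hat = w_q * x with MSE loss.
    w_q = quantize(w_full, delta)            # forward with quantized weight
    y_hat = w_q * x
    grad_wq = 2 * np.mean((y_hat - y) * x)   # d(MSE)/d(w_q)
    # STE: treat d(w_q)/d(w_full) as identity, update the full-precision copy
    return w_full - lr * grad_wq
```

Because the full-precision weight keeps accumulating small gradients, it can cross quantization-bin boundaries that the quantized weight alone never would, which is why QAT typically outperforms post-training quantization at the same precision.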

Future Work

Model Size

To get a complete picture of each method’s effectiveness, we need to observe model size at different levels of precision. This relates to our objective of reducing the carbon footprint of deep learning models.

Quantize Activations

Along with quantizing the weights, explore quantizing the activations as well.

Improve Training Algorithm

Most of the carbon emissions are caused by the intensive computation required during training; for example, BERT and GPT-3 require enormous amounts of computation to learn their parameters. We can explore techniques for smarter weight updates that reduce the computation required during training.

Hardware Simulations

Experiment on specialized low-precision hardware to accurately evaluate different quantization techniques.
