
condensa's Introduction

A Programming System for Neural Network Compression

Note: the original version of Condensa (contained in this branch) is no longer actively maintained. Please check out the lite branch for the most up-to-date version.

Condensa is a framework for programmable model compression in Python. It comes with a set of built-in compression operators which may be used to compose complex compression schemes targeting specific combinations of DNN architecture, hardware platform, and optimization objective. To recover any accuracy lost during compression, Condensa uses a constrained optimization formulation of model compression and employs an Augmented Lagrangian-based algorithm as the optimizer.
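To illustrate the operator-composition idea, here is a minimal pure-Python sketch, not Condensa's actual API; `prune_magnitude`, `quantize_step`, and `compose` are hypothetical names operating on plain lists rather than tensors:

```python
def prune_magnitude(weights, density=0.5):
    """Zero out all but the largest-magnitude `density` fraction of weights."""
    k = max(1, int(len(weights) * density))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize_step(weights, step=0.25):
    """Round each weight to the nearest multiple of `step` (toy quantizer)."""
    return [round(w / step) * step for w in weights]

def compose(*operators):
    """Chain compression operators into a single compression scheme."""
    def scheme(weights):
        for op in operators:
            weights = op(weights)
        return weights
    return scheme

# A memory-focused scheme: prune first, then quantize.
mem_scheme = compose(prune_magnitude, quantize_step)
print(mem_scheme([0.9, -0.1, 0.4, -0.8]))  # [1.0, 0.0, 0.0, -0.75]
```

In Condensa itself, operators act on PyTorch modules and the composed scheme is handed to the Augmented Lagrangian optimizer, which recovers accuracy under the compression constraint.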

Status: Condensa is under active development, and bug reports, pull requests, and other feedback are all highly appreciated. See the contributions section below for more details on how to contribute.

Supported Operators and Schemes

Condensa provides a set of pre-built compression schemes. Each scheme is built from one or more compression operators, which may be combined in various ways to define your own custom schemes.

Please refer to the documentation for a detailed description of available operators and schemes.

Prerequisites

Condensa requires:

  • A working Linux installation (we use Ubuntu 18.04)
  • NVIDIA drivers and CUDA 10+ for GPU support
  • Python 3.5 or newer
  • PyTorch 1.0 or newer

Installation

The most straightforward way of installing Condensa is via pip:

pip install condensa

Installation from Source

Retrieve the latest source code from the Condensa repository:

git clone https://github.com/NVlabs/condensa.git

Navigate to the source code directory and run the following:

pip install -e .

Test out the Installation

To check the installation, run the unit test suite:

bash run_all_tests.sh -v

Getting Started

The AlexNet Notebook contains a simple step-by-step walkthrough of compressing a pre-trained model using Condensa. Check out the examples folder for additional, more complex examples of using Condensa (note: some examples require the torchvision package to be installed).

Documentation

Documentation is available here. Please also check out the Condensa paper for a detailed description of Condensa's motivation, features, and performance results.

Contributing

We appreciate all contributions, including bug fixes, new features and documentation, and additional tutorials. You can initiate contributions via GitHub pull requests. When making code contributions, please follow the PEP 8 Python coding standard and provide unit tests for new features. Finally, make sure to sign off your commits using the -s flag or by adding Signed-off-by: Name <Email> to the commit message.

Citing Condensa

If you use Condensa for research, please consider citing the following paper:

@article{condensa2020,
  title={A Programmable Approach to Neural Network Compression}, 
  author={V. {Joseph} and G. L. {Gopalakrishnan} and S. {Muralidharan} and M. {Garland} and A. {Garg}},
  journal={IEEE Micro}, 
  year={2020},
  volume={40},
  number={5},
  pages={17-25},
  doi={10.1109/MM.2020.3012391}
}

Disclaimer

Condensa is a research prototype and not an official NVIDIA product. Many features are still experimental and yet to be properly documented.

condensa's People

Contributors

nitro-tuner, srvm


condensa's Issues

How to Measure Performance?

I tried running the compression from the AlexNet notebook example, which produced AlexNet_MEM.pth and AlexNet_FLOP.pth as output.

I then loaded each model and compared memory usage with the 'nvidia-smi' command. Unfortunately, I don't see any memory improvement for either model.

I also tried comparing throughput by timing inference for each model. Again, I see no improvement: all of the models took almost the same time to run inference.

Could you advise on how you measured performance?
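For latency comparisons, a minimal measurement sketch in pure Python is shown below; `infer` is a hypothetical stand-in for a model's forward pass, and real GPU measurements would additionally need torch.cuda.synchronize() before each clock read:

```python
import time

def benchmark(fn, warmup=5, iters=50):
    """Return the median wall-clock latency of fn() after a warmup phase."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]

# Hypothetical stand-in for model inference.
def infer():
    return sum(i * i for i in range(10_000))

latency = benchmark(infer)
print(f"median latency: {latency * 1e3:.3f} ms")
```

Note also that magnitude pruning alone leaves tensors stored densely, so memory reported by nvidia-smi will not shrink unless the zeroed structures are physically removed or stored in a sparse format.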

Filter and neuron pruning don't work in scheme composer (lite branch)

Filter and neuron pruning do not have add_mask_to_module() support so there is a NotImplementedError raised here or here. I was thinking I could catch and handle this exception since the masks aren't a necessity, but raising the exception breaks out of the for loop iterating over the layers in the scheme composer and leaves the remaining layers dense.

I am pretty sure saving the masks in the layer module isn't a necessity, so raising an exception seems like overkill because it stops the model from being pruned. You could also handle the exception inside the scheme composer's for loop so that it doesn't break out and is able to prune all the layers.

I can submit a PR myself if you want to describe which route you want to take for fixing this. Obviously supporting mask saving for filter and neuron pruning would also work, but I'm not sure what work needs to be done for that.
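A sketch of the second suggestion, catching the exception inside the composer's loop so the remaining layers are still pruned; `apply_op`, the layer names, and `demo_op` are hypothetical:

```python
def apply_scheme(layers, apply_op):
    """Apply a compression operator to each layer, skipping layers whose
    mask-saving step is unimplemented instead of aborting the whole loop."""
    pruned, skipped = [], []
    for layer in layers:
        try:
            apply_op(layer)
            pruned.append(layer)
        except NotImplementedError:
            skipped.append(layer)  # mask not saved, but pruning continues
    return pruned, skipped

# Hypothetical operator that lacks mask support for one layer.
def demo_op(layer):
    if layer == "fc1":
        raise NotImplementedError("add_mask_to_module not supported")

print(apply_scheme(["conv1", "fc1", "conv2"], demo_op))
# (['conv1', 'conv2'], ['fc1'])
```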

Maybe a minor error in the code snippet for "Setting up the Optimizer"?

Hello,
In the "Setting up the Optimizer" part from the LeNet5 tutorial, there is a text description as follows.

In our case, we run the L-C algorithm for 40 iterations using the hyper-parameter values shown above. LC hyper-parameter values for a number of common convolutional neural networks are also included in the /workspace/condensa/examples folder in the container.

However, the 'steps' argument to condensa.opt.LC is set to 2 in the given code snippet. Shouldn't it be set to 40 instead?

Network Thinning Support

Thanks for your work! I checked the library but could not find the network thinning method mentioned in the paper "A Programmable Approach to Model Compression".

Replace view with reshape?

Hi, we were running Condensa on the CIFAR-10 and AlexNet examples. Our versions are:

Python version: 3.8.5
PyTorch version: 1.12.1+cu102

We get the following error at condensa/util.py line 112:

correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one \
dimension spans across two contiguous subspaces). Use .reshape(...) instead.

We tried just replacing view with reshape and things seem to work okay. But I don't know Pytorch enough to know if this is the correct thing to do.
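For context, .view() requires the tensor's memory layout to be compatible with the requested shape, while .reshape() falls back to copying the data when it isn't, so the replacement is safe here. A minimal reproduction, assuming PyTorch is installed:

```python
import torch

x = torch.arange(6).reshape(2, 3).t()  # transpose makes x non-contiguous
try:
    x.view(-1)             # fails: view needs a compatible memory layout
except RuntimeError as e:
    print("view failed:", e)

y = x.reshape(-1)          # reshape copies the data when a view is impossible
print(y.tolist())          # [0, 3, 1, 4, 2, 5]
```

An alternative with identical behavior on contiguous inputs is `.contiguous().view(-1)`, which makes the copy explicit.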

Thanks!
