submission2019 / cnn-quantization
Quantization of Convolutional Neural Networks.
Hi, when I build the CUDA kernels for GEMMLOWP, the "./build_all.sh" step fails with this error:
**************************************************************
Building int quantization kernels
**************************************************************
running install
running bdist_egg
running egg_info
writing int_quantization.egg-info/PKG-INFO
writing dependency_links to int_quantization.egg-info/dependency_links.txt
writing top-level names to int_quantization.egg-info/top_level.txt
reading manifest file 'int_quantization.egg-info/SOURCES.txt'
writing manifest file 'int_quantization.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'int_quantization' extension
creating build
creating build/temp.linux-x86_64-3.6
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include/TH -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include/THC -I/usr/include/python3.6m -I/home/george/work/cnn-quantization/venv3/include/python3.6m -c int_quantization.cpp -o build/temp.linux-x86_64-3.6/int_quantization.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=int_quantization -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
/usr/bin/nvcc -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include/TH -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include/THC -I/usr/include/python3.6m -I/home/george/work/cnn-quantization/venv3/include/python3.6m -c gemmlowp.cu -o build/temp.linux-x86_64-3.6/gemmlowp.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=int_quantization -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -std=c++11
nvcc fatal : Unsupported gpu architecture 'compute_75'
error: command '/usr/bin/nvcc' failed with exit status 1
Done
**************************************************************
My GPU is RTX2080Ti
CUDA Version: 10.1
Ubuntu 18.04
and the command 'nvcc -V' returns:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
Please help.
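A note on the error above: this is a toolkit mismatch. The driver reports CUDA 10.1, but `/usr/bin/nvcc` belongs to CUDA 9.1, which predates Turing; `compute_75` (RTX 2080 Ti) requires an nvcc from CUDA 10.0 or newer, so pointing the build at a CUDA 10.x toolkit (e.g. via `CUDA_HOME`/`PATH`) should fix it. A hedged sketch for checking this before building:

```python
# Hedged sketch: verify the nvcc on PATH is new enough for sm_75
# (Turing, e.g. RTX 2080 Ti). compute_75 support arrived in CUDA 10.0.
import re
import subprocess

def nvcc_release():
    """Parse the release number out of `nvcc -V` output, e.g. (9, 1)."""
    out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout
    m = re.search(r"release (\d+)\.(\d+)", out)
    return (int(m.group(1)), int(m.group(2))) if m else None

def supports_compute_75(release):
    # sm_75 requires CUDA >= 10.0; CUDA 9.1 (as in the log above) does not.
    return release is not None and release >= (10, 0)
```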
Hello,
It is really great work on post-training quantization.
I tried your method on Q2W2 quantization and the accuracy drops to 0.3%. Does your method only work for Q4W4 quantization?
Thanks!
I've seen this as a previous issue but it was closed. I have access to the required HW but I am still encountering the error message shown below:
Traceback (most recent call last):
  File "inference/inference_sim.py", line 25, in <module>
    from pytorch_quantizer.quantization.inference.inference_quantization_manager import QuantizationManagerInference as QM
  File "/dcs/pg19/u1998253/Dissertation/cnn-quantization/inference/../pytorch_quantizer/quantization/inference/inference_quantization_manager.py", line 4, in <module>
    from pytorch_quantizer.quantization import qtypes
  File "/dcs/pg19/u1998253/Dissertation/cnn-quantization/inference/../pytorch_quantizer/quantization/qtypes/__init__.py", line 1, in <module>
    from .int_quantizer import int_quantizer
  File "/dcs/pg19/u1998253/Dissertation/cnn-quantization/inference/../pytorch_quantizer/quantization/qtypes/int_quantizer.py", line 4, in <module>
    import int_quantization
ModuleNotFoundError: No module named 'int_quantization'
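For context: `int_quantization` is the CUDA extension that `./build_all.sh` compiles and installs, so this import fails when that build failed or when it installed into a different interpreter than the one running the simulator. A hedged sketch for checking this from the same interpreter:

```python
# Hedged sketch: confirm which interpreter this is and whether the
# int_quantization extension built by ./build_all.sh is visible to it.
import importlib.util
import sys

def extension_visible(name="int_quantization"):
    """True if `import name` would succeed in this interpreter."""
    return importlib.util.find_spec(name) is not None

if __name__ == "__main__":
    print(sys.executable)        # must match the python that ran build_all.sh
    print(extension_visible())   # False means rebuild/reinstall the extension
```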
The command I run:
python3 inference/inference_sim.py -a resnet18 -b 256 -pcq_w -pcq_a -sh --qtype int4 -qw int4 -c laplace -baa -baw -bcw
And it gives:
Prec@1 64.622 Prec@5 85.802
But the result reported in the paper is 67.0.
Did I do something wrong?
After running this command
python -m inference.inference_sim -a resnet50 -b 512 -sm use --qtype int4 -pcq_w -pcq_a -c laplace
I got the error message as following:
Traceback (most recent call last):
  File "inference/inference_sim.py", line 25, in <module>
    from pytorch_quantizer.quantization.inference.inference_quantization_manager import QuantizationManagerInference as QM
  File "/Users/chingandywu/master-thesis/cnn-quantization/inference/../pytorch_quantizer/quantization/inference/inference_quantization_manager.py", line 3, in <module>
    from pytorch_quantizer.quantization import qtypes
  File "/Users/chingandywu/master-thesis/cnn-quantization/inference/../pytorch_quantizer/quantization/qtypes/__init__.py", line 1, in <module>
    from .int_quantizer import int_quantizer
  File "/Users/chingandywu/master-thesis/cnn-quantization/inference/../pytorch_quantizer/quantization/qtypes/int_quantizer.py", line 4, in <module>
    import int_quantization
ModuleNotFoundError: No module named 'int_quantization'
What can I do to solve this problem? I tried to run all this without CUDA because I don't have a GPU. As a result, I disabled the CUDA part in absorb_bn.py.
Using the example with --device cpu leads to crash:
python inference/inference_sim.py -a resnet50 -b 512 -pcq_w -pcq_a -sh --qtype int4 -qw int4 --device cpu
The bug comes from calling torch.cuda.clear().
The same crash happens when the program finishes.
CUDA functions should not be called in CPU mode.
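A common fix for this class of crash is to guard every `torch.cuda` call behind both the requested device and actual availability. A minimal sketch of such a guard (the names are mine, not the repo's):

```python
# Hedged sketch: decide whether CUDA-specific calls are safe to make.
def should_use_cuda(device: str, cuda_available: bool) -> bool:
    """True only when the user asked for a cuda device AND one exists."""
    return device.startswith("cuda") and cuda_available
```

In the simulator this would wrap calls such as `torch.cuda.empty_cache()` so that `--device cpu` never touches the CUDA runtime.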
Hi @ynahshan , @submission2019,
I checked your paper and code. As I understand it, after the forward pass through each operator (e.g. conv), ACIQ is applied based on the Laplace prior (the input I used for 4-bit quantizing ResNet-50). So I assumed that the clipping would only trim the left and right tails of the tensors, but when I plot the histogram I see an entirely different distribution for the output. I am attaching one of the histograms as a reference: the red line shows the input tensor and the black one shows the output (clipped tensor).
Can you please explain this result, which is not in sync with your paper, or at least with my understanding of your paper?
Regards
Amit
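One likely explanation for the histogram question above: ACIQ clipping is followed by uniform quantization, so the output is not merely the input with trimmed tails; it is collapsed onto a handful of discrete levels, with extra mass piled at the clipping thresholds, which looks entirely different in a histogram. A hedged sketch of the combined operation (a symmetric variant, not the repo's exact code):

```python
# Hedged sketch: ACIQ-style clipping followed by symmetric uniform
# quantization; the output lives on at most 2**num_bits - 1 levels.
import numpy as np

def clip_and_quantize(x, alpha, num_bits=4):
    """Clip to [-alpha, alpha], then round onto a uniform grid."""
    x = np.clip(x, -alpha, alpha)
    scale = alpha / (2**(num_bits - 1) - 1)   # e.g. alpha/7 for 4 bits
    return np.round(x / scale) * scale        # de-quantized simulation
```

Plotting `clip_and_quantize(act, alpha)` against `act` reproduces the spiky histogram: discrete bins plus piled-up mass at ±alpha.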
Hi, thank you for sharing the source code of your work, it's amazing.
I would like to inquire about pure integer inference using bias or variance correction as indicated in the paper.
When using bias and variance correction on INT8-quantized weights, the weights become floating-point values again, so do you round/floor/ceil them in order to do pure-INT computation on the quantized weights? What kind of VLC algorithm did you use, as it is not indicated in the paper? And how do you calculate the average number of bits for the weights?
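For context on the first question above: my reading of the bias-correction step (a hedged sketch, not the repo's exact implementation) is a per-output-channel mean shift, which indeed produces float weights; one common workaround is to fold that per-channel shift into the layer's bias term instead of the weight tensor, keeping the weights integer.

```python
# Hedged sketch: per-channel bias correction after weight quantization.
import numpy as np

def bias_correct(w_float, w_quant):
    """Shift each output channel of the quantized weights so its mean
    matches the float weights' mean. The result is float again, which
    is why purely-integer inference must fold this shift elsewhere
    (e.g. into the layer bias)."""
    corrected = w_quant.astype(np.float64).copy()
    for c in range(w_float.shape[0]):                 # output channels
        corrected[c] += w_float[c].mean() - w_quant[c].mean()
    return corrected
```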
I couldn't reproduce the results with the examples provided in the readme on my 4 GPUs.
So I used batch 256 on only 1 GPU and it works.
Each added GPU lowers Prec@5 by roughly 10%, so with 4 GPUs the results were around 60%!
Please fix this bug as soon as possible.
Thanks,
--mike
Hello,
Thanks for the great codebase. I tried Experiment W4A4 + ACIQ + Bit Alloc(A) + Bit Alloc(W) + Bias correction
by
python inference/inference_sim.py -a resnet50 -b 512 -pcq_w -pcq_a -sh --qtype int4 -qw int4 -c laplace -baa -baw -bcw
but only get an accuracy of 71.6. Is there any particular setting that should be followed? Thanks.
My environment:
Pytorch 1.3
Torchvision 0.4.2
Can you please explain what needs to be changed for the following error? Thank you.
python inference/inference_sim.py -a resnet50 -b 512
/home/user/anaconda3/lib/python3.7/site-packages/yaml/constructor.py:126: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
if not isinstance(key, collections.Hashable):
=> using pre-trained model 'resnet50'
Perform BN folding
Traceback (most recent call last):
  File "inference/inference_sim.py", line 381, in <module>
    im = InferenceModel(ml_logger)
  File "inference/inference_sim.py", line 194, in __init__
    QM().quantize_model(self.model)
  File "/home/user/ML/cnn-quantization/inference/../pytorch_quantizer/quantization/inference/inference_quantization_manager.py", line 365, in quantize_model
    weight_q = QMI().quantize_instant(m.weight, n + '.weight', "weight", override_att=('num_bits', 8), verbose=True)
  File "/home/usr/ML/cnn-quantization/inference/../pytorch_quantizer/quantization/inference/inference_quantization_manager.py", line 343, in quantize_instant
    return self.op_manager.quantize_instant(tensor, id, tag, stat_id, half_range, override_att, verbose)
  File "/home/user/ML/cnn-quantization/inference/../pytorch_quantizer/quantization/inference/inference_quantization_manager.py", line 556, in quantize_instant
    q = self.get_quantizer(qtag)
  File "/home/user/ML/cnn-quantization/inference/../pytorch_quantizer/quantization/inference/inference_quantization_manager.py", line 510, in get_quantizer
    if tag in self.quantizers:
AttributeError: 'TruncationOpManagerInference' object has no attribute 'quantizers'
Dear Yury,
I have run the code successfully following the steps in the README. Thanks for your work! I have some questions about the model after quantization.
I tried to print the parameters after quantization. I chose qtype=int4 and qweight=int8, but the parameters seem to be float rather than int, for example:
'conv1.weight', Parameter containing:
tensor([[[[-2.4899e-03, -1.2449e-03, 0.0000e+00, ..., 1.3694e-02,
3.7348e-03, -2.4899e-03],
[ 2.4899e-03, 2.4899e-03, -2.6144e-02, ..., -6.3492e-02,
-2.9879e-02, 1.2449e-03],
[-1.2449e-03, 1.3694e-02, 6.8472e-02, ..., 1.2076e-01,
5.9758e-02, 1.4939e-02],.....
I tried to save the model with torch.save(self.model.state_dict(), 'resnet18_qm.pkl'), but the size of the file is the same as the original ResNet-18 pretrained model file. I thought it would be much smaller after quantization.
Is there any step I missed, or have I misunderstood the code?
Thanks again and Looking forward to your reply!
Anna
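A note on the size question above: the simulator performs fake quantization, i.e. it stores the de-quantized values, so every parameter is still a 4-byte float and the state_dict does not shrink. Getting a smaller file would require storing the integer codes in packed form; a hedged sketch of such packing (my own helper, not part of the repo):

```python
# Hedged sketch: pack signed int4 codes (range -8..7) two per byte,
# which is what an actually-compressed int4 checkpoint would store.
import numpy as np

def pack_int4(q_codes):
    """Return a uint8 array holding two 4-bit codes per byte."""
    u = (np.asarray(q_codes, dtype=np.int8) & 0x0F).astype(np.uint8)
    if u.size % 2:                       # pad to an even number of codes
        u = np.append(u, 0)
    return (u[0::2] << 4) | u[1::2]      # high nibble, then low nibble
```

Packing 100 int4 codes this way takes 50 bytes instead of the 400 bytes of a float32 tensor.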
What is the formula for per-channel activation quantization?
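I can't speak for the authors, but a common form of per-channel uniform quantization is q_c = round((x_c - min_c) / scale_c) with scale_c = (max_c - min_c) / (2^M - 1), computed independently for each channel c. A hedged sketch (asymmetric GEMMLOWP-style scale and offset, one pair per channel; not necessarily the repo's exact formula):

```python
# Hedged sketch: asymmetric uniform quantization with one
# (scale, offset) pair per channel along `axis`.
import numpy as np

def quantize_per_channel(x, num_bits=4, axis=1):
    levels = 2**num_bits - 1
    red = tuple(i for i in range(x.ndim) if i != axis)   # reduce all other axes
    xmin = x.min(axis=red, keepdims=True)
    xmax = x.max(axis=red, keepdims=True)
    scale = np.maximum(xmax - xmin, 1e-8) / levels       # per-channel step
    q = np.round((x - xmin) / scale)                     # integer codes 0..levels
    return q * scale + xmin                              # de-quantized simulation
```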
How can I quantize my model?
It seems like you work out a new set of scales for the activations on every batch, and use that new set of scales to quantize and dequantize the activations?
Quantize activation_pooling | Id - None | IntQuantizer - [bits: 8, clipping: no, bit_alloc_act: False, bit_alloc_weight: False, pcq_w: False, pcq_a: False, bcorr_act: False, bcorr_weight: False, vcorr_weight: False, kind: mean] | cuda:0
Segmentation fault (core dumped)
I am getting the above-mentioned error even when I try it with the CPU. Can you help me with this?
Hi, I highly appreciate the project. Just wondering: you have mentioned 512 as the default batch size. Will there be any effect if I reduce the batch size to, let's say, 32 or less? I just wanted to know your observations on this point.
I also have one query. I used the sample command given on the landing page, as below:
python inference/inference_sim.py -a resnet50 -b 512 -pcq_w -pcq_a -sh --qtype int4 -qw int4 -c laplace -baa -baw -bcw
The accuracy is as you mentioned.
But when I saved the model after validation, the model size stayed almost the same; it was reduced by at most 80 KB. Is this consistent with your observations, or did I do something wrong?
Hi @submission2019 ,
First of all, I would like to congratulate you on this paper and for opening up the GitHub project for analysis. I have gone through your paper and the project in depth, and I would like to ask about the following.
Maybe there is a bigger picture that I am not able to see; could you please point me in the right direction?
Regards
Amit
After I run the following commands one by one (they are all the default commands):
python inference/inference_sim.py -a resnet50 -b 256 -sm collect -ac --qtype int4
python inference/inference_sim.py -a resnet50 -b 256 -sm collect -ac --qtype int4 -pcq_a
python pytorch_quantizer/quantization/kmeans_quantization.py -a resnet50 -bits 4 -t quantize
python inference/inference_sim.py -a resnet50 -b 512 -sm use --qtype int4 -pcq_w -pcq_a -c laplace -qm 4 -qw f32
Then it gives me these lines and weird classification accuracy:
=> using pre-trained model 'resnet50'
Perform BN folding
Test: [0/98] Time 24.867 (24.867) Loss 9.3266 (9.3266) Prec@1 0.586 (0.586) Prec@5 2.539 (2.539)
Test: [10/98] Time 1.649 (5.473) Loss 9.9686 (10.5482) Prec@1 0.781 (0.337) Prec@5 3.906 (1.935)
Test: [20/98] Time 16.422 (5.568) Loss 8.8218 (10.4333) Prec@1 0.195 (0.456) Prec@5 2.148 (1.990)
Test: [30/98] Time 1.644 (5.145) Loss 10.8783 (10.3225) Prec@1 0.000 (0.372) Prec@5 0.195 (1.928)
Test: [40/98] Time 14.035 (5.262) Loss 10.0452 (10.3040) Prec@1 0.781 (0.438) Prec@5 3.711 (2.158)
Test: [50/98] Time 1.646 (5.140) Loss 9.2239 (10.1178) Prec@1 0.391 (0.724) Prec@5 1.758 (2.742)
Test: [60/98] Time 15.470 (5.170) Loss 10.2627 (10.0315) Prec@1 0.000 (0.832) Prec@5 1.367 (3.099)
Test: [70/98] Time 1.637 (5.067) Loss 9.7789 (9.9481) Prec@1 0.000 (0.935) Prec@5 0.781 (3.381)
Test: [80/98] Time 14.160 (5.121) Loss 9.5262 (9.8638) Prec@1 1.758 (1.059) Prec@5 7.031 (3.771)
Test: [90/98] Time 1.667 (5.066) Loss 9.0029 (9.8053) Prec@1 0.781 (1.011) Prec@5 4.297 (3.728)
* Prec@1 1.066 Prec@5 4.016