submission2019 / cnn-quantization
Quantization of Convolutional Neural Networks.
Hi, when I build the CUDA kernels for GEMMLOWP, the "./build_all.sh" step fails with this error:
**************************************************************
Building int quantization kernels
**************************************************************
running install
running bdist_egg
running egg_info
writing int_quantization.egg-info/PKG-INFO
writing dependency_links to int_quantization.egg-info/dependency_links.txt
writing top-level names to int_quantization.egg-info/top_level.txt
reading manifest file 'int_quantization.egg-info/SOURCES.txt'
writing manifest file 'int_quantization.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'int_quantization' extension
creating build
creating build/temp.linux-x86_64-3.6
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include/TH -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include/THC -I/usr/include/python3.6m -I/home/george/work/cnn-quantization/venv3/include/python3.6m -c int_quantization.cpp -o build/temp.linux-x86_64-3.6/int_quantization.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=int_quantization -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
/usr/bin/nvcc -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include/TH -I/home/george/work/cnn-quantization/venv3/lib/python3.6/site-packages/torch/include/THC -I/usr/include/python3.6m -I/home/george/work/cnn-quantization/venv3/include/python3.6m -c gemmlowp.cu -o build/temp.linux-x86_64-3.6/gemmlowp.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=int_quantization -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -std=c++11
nvcc fatal : Unsupported gpu architecture 'compute_75'
error: command '/usr/bin/nvcc' failed with exit status 1
Done
**************************************************************
My GPU is RTX2080Ti
CUDA Version: 10.1
Ubuntu 18.04
and the command 'nvcc -V' returns:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
Please help.
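A note on the error above: this is a toolkit mismatch. The driver reports CUDA 10.1, but `/usr/bin/nvcc` belongs to CUDA 9.1, which predates Turing; `compute_75` (RTX 2080 Ti) requires an nvcc from CUDA 10.0 or newer, so pointing the build at a CUDA 10.x toolkit (e.g. via `CUDA_HOME`/`PATH`) should fix it. A hedged sketch for checking this before building:

```python
# Hedged sketch: verify the nvcc on PATH is new enough for sm_75
# (Turing, e.g. RTX 2080 Ti). compute_75 support arrived in CUDA 10.0.
import re
import subprocess

def nvcc_release():
    """Parse the release number out of `nvcc -V` output, e.g. (9, 1)."""
    out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout
    m = re.search(r"release (\d+)\.(\d+)", out)
    return (int(m.group(1)), int(m.group(2))) if m else None

def supports_compute_75(release):
    # sm_75 requires CUDA >= 10.0; CUDA 9.1 (as in the log above) does not.
    return release is not None and release >= (10, 0)
```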
Hello,
It is really great work on post-training quantization.
I tried your method on Q2W2 quantization and the accuracy drops to 0.3%. Does your method only work for Q4W4 quantization?
Thanks!
I've seen this as a previous issue but it was closed. I have access to the required HW but I am still encountering the error message shown below:
Traceback (most recent call last):
  File "inference/inference_sim.py", line 25, in <module>
    from pytorch_quantizer.quantization.inference.inference_quantization_manager import QuantizationManagerInference as QM
  File "/dcs/pg19/u1998253/Dissertation/cnn-quantization/inference/../pytorch_quantizer/quantization/inference/inference_quantization_manager.py", line 4, in <module>
    from pytorch_quantizer.quantization import qtypes
  File "/dcs/pg19/u1998253/Dissertation/cnn-quantization/inference/../pytorch_quantizer/quantization/qtypes/__init__.py", line 1, in <module>
    from .int_quantizer import int_quantizer
  File "/dcs/pg19/u1998253/Dissertation/cnn-quantization/inference/../pytorch_quantizer/quantization/qtypes/int_quantizer.py", line 4, in <module>
    import int_quantization
ModuleNotFoundError: No module named 'int_quantization'
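For context: `int_quantization` is the CUDA extension that `./build_all.sh` compiles and installs, so this import fails when that build failed or when it installed into a different interpreter than the one running the simulator. A hedged sketch for checking this from the same interpreter:

```python
# Hedged sketch: confirm which interpreter this is and whether the
# int_quantization extension built by ./build_all.sh is visible to it.
import importlib.util
import sys

def extension_visible(name="int_quantization"):
    """True if `import name` would succeed in this interpreter."""
    return importlib.util.find_spec(name) is not None

if __name__ == "__main__":
    print(sys.executable)        # must match the python that ran build_all.sh
    print(extension_visible())   # False means rebuild/reinstall the extension
```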
The command I run:
python3 inference/inference_sim.py -a resnet18 -b 256 -pcq_w -pcq_a -sh --qtype int4 -qw int4 -c laplace -baa -baw -bcw
And it gives:
Prec@1 64.622 Prec@5 85.802
But the result reported in the paper is 67.0.
Did I do something wrong?
After running this command
python -m inference.inference_sim -a resnet50 -b 512 -sm use --qtype int4 -pcq_w -pcq_a -c laplace
I got the error message as following:
Traceback (most recent call last):
  File "inference/inference_sim.py", line 25, in <module>
    from pytorch_quantizer.quantization.inference.inference_quantization_manager import QuantizationManagerInference as QM
  File "/Users/chingandywu/master-thesis/cnn-quantization/inference/../pytorch_quantizer/quantization/inference/inference_quantization_manager.py", line 3, in <module>
    from pytorch_quantizer.quantization import qtypes
  File "/Users/chingandywu/master-thesis/cnn-quantization/inference/../pytorch_quantizer/quantization/qtypes/__init__.py", line 1, in <module>
    from .int_quantizer import int_quantizer
  File "/Users/chingandywu/master-thesis/cnn-quantization/inference/../pytorch_quantizer/quantization/qtypes/int_quantizer.py", line 4, in <module>
    import int_quantization
ModuleNotFoundError: No module named 'int_quantization'
What can I do to solve this problem? I tried to run all this without CUDA because I don't have a GPU. As a result, I disabled the CUDA part in absorb_bn.py.
Using the example with --device cpu leads to crash:
python inference/inference_sim.py -a resnet50 -b 512 -pcq_w -pcq_a -sh --qtype int4 -qw int4 --device cpu
The bug comes from calling torch.cuda.clear().
The same crash happens when the program finishes.
CUDA functions should not be called in CPU mode.
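A common fix for this class of crash is to guard every `torch.cuda` call behind both the requested device and actual availability. A minimal sketch of such a guard (the names are mine, not the repo's):

```python
# Hedged sketch: decide whether CUDA-specific calls are safe to make.
def should_use_cuda(device: str, cuda_available: bool) -> bool:
    """True only when the user asked for a cuda device AND one exists."""
    return device.startswith("cuda") and cuda_available
```

In the simulator this would wrap calls such as `torch.cuda.empty_cache()` so that `--device cpu` never touches the CUDA runtime.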
Hi @ynahshan , @submission2019,
I checked your paper and code. As I understand it, after the forward pass through each operator (e.g. conv), ACIQ is applied based on the Laplace prior (the input I used for 4-bit quantizing ResNet-50). So I assumed that the clipping would only trim the left and right tails of the tensors, but when I plot the histogram I see an entirely different distribution for the output. I am attaching one of the histograms as a reference: the red line shows the input tensor and the black one shows the output (clipped tensor).
Can you please explain this result, which is not in sync with your paper, or at least with my understanding of your paper?
Regards
Amit
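One likely explanation for the histogram question above: ACIQ clipping is followed by uniform quantization, so the output is not merely the input with trimmed tails; it is collapsed onto a handful of discrete levels, with extra mass piled at the clipping thresholds, which looks entirely different in a histogram. A hedged sketch of the combined operation (a symmetric variant, not the repo's exact code):

```python
# Hedged sketch: ACIQ-style clipping followed by symmetric uniform
# quantization; the output lives on at most 2**num_bits - 1 levels.
import numpy as np

def clip_and_quantize(x, alpha, num_bits=4):
    """Clip to [-alpha, alpha], then round onto a uniform grid."""
    x = np.clip(x, -alpha, alpha)
    scale = alpha / (2**(num_bits - 1) - 1)   # e.g. alpha/7 for 4 bits
    return np.round(x / scale) * scale        # de-quantized simulation
```

Plotting `clip_and_quantize(act, alpha)` against `act` reproduces the spiky histogram: discrete bins plus piled-up mass at ±alpha.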
Hi, thank you for sharing the source code of your work, it's amazing.
I would like to inquire about pure integer inference using bias or variance correction as indicated in the paper.
When using bias and variance correction on INT8-quantized weights, the weights become floating-point values again, so do you round/floor/ceil them in order to do pure-INT computation on the quantized weights? What kind of VLC algorithm did you use, as it is not indicated in the paper? And how do you calculate the average number of bits for the weights?
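For context on the first question above: my reading of the bias-correction step (a hedged sketch, not the repo's exact implementation) is a per-output-channel mean shift, which indeed produces float weights; one common workaround is to fold that per-channel shift into the layer's bias term instead of the weight tensor, keeping the weights integer.

```python
# Hedged sketch: per-channel bias correction after weight quantization.
import numpy as np

def bias_correct(w_float, w_quant):
    """Shift each output channel of the quantized weights so its mean
    matches the float weights' mean. The result is float again, which
    is why purely-integer inference must fold this shift elsewhere
    (e.g. into the layer bias)."""
    corrected = w_quant.astype(np.float64).copy()
    for c in range(w_float.shape[0]):                 # output channels
        corrected[c] += w_float[c].mean() - w_quant[c].mean()
    return corrected
```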
I couldn't reproduce the results with the examples provided in the readme on my 4 GPUs.
So I used batch 256 on only 1 GPU and it works.
Each added GPU lowers Prec@5 by roughly 10%, so with 4 GPUs the results were around 60%!
Please fix this bug as soon as possible.
Thanks,
--mike
Hello,
Thanks for the great codebase. I tried Experiment W4A4 + ACIQ + Bit Alloc(A) + Bit Alloc(W) + Bias correction
by
python inference/inference_sim.py -a resnet50 -b 512 -pcq_w -pcq_a -sh --qtype int4 -qw int4 -c laplace -baa -baw -bcw
but only get an accuracy of 71.6. Is there any particular setting that should be followed? Thanks.
My environment:
Pytorch 1.3
Torchvision 0.4.2
Can you please explain what needs to be changed for the following error? Thank you.
python inference/inference_sim.py -a resnet50 -b 512
/home/user/anaconda3/lib/python3.7/site-packages/yaml/constructor.py:126: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
if not isinstance(key, collections.Hashable):
=> using pre-trained model 'resnet50'
Perform BN folding
Traceback (most recent call last):
  File "inference/inference_sim.py", line 381, in <module>
    im = InferenceModel(ml_logger)
  File "inference/inference_sim.py", line 194, in __init__
    QM().quantize_model(self.model)
  File "/home/user/ML/cnn-quantization/inference/../pytorch_quantizer/quantization/inference/inference_quantization_manager.py", line 365, in quantize_model
    weight_q = QMI().quantize_instant(m.weight, n + '.weight', "weight", override_att=('num_bits', 8), verbose=True)
  File "/home/usr/ML/cnn-quantization/inference/../pytorch_quantizer/quantization/inference/inference_quantization_manager.py", line 343, in quantize_instant
    return self.op_manager.quantize_instant(tensor, id, tag, stat_id, half_range, override_att, verbose)
  File "/home/user/ML/cnn-quantization/inference/../pytorch_quantizer/quantization/inference/inference_quantization_manager.py", line 556, in quantize_instant
    q = self.get_quantizer(qtag)
  File "/home/user/ML/cnn-quantization/inference/../pytorch_quantizer/quantization/inference/inference_quantization_manager.py", line 510, in get_quantizer
    if tag in self.quantizers:
AttributeError: 'TruncationOpManagerInference' object has no attribute 'quantizers'
Dear Yury,
I have run the code successfully following the steps in the README. Thanks for your work! I have some questions about the model after quantization.
I tried to print the parameters after quantization. I chose qtype=int4 and qweight=int8, but the parameters seem to be float rather than int, for example:
'conv1.weight', Parameter containing:
tensor([[[[-2.4899e-03, -1.2449e-03, 0.0000e+00, ..., 1.3694e-02,
3.7348e-03, -2.4899e-03],
[ 2.4899e-03, 2.4899e-03, -2.6144e-02, ..., -6.3492e-02,
-2.9879e-02, 1.2449e-03],
[-1.2449e-03, 1.3694e-02, 6.8472e-02, ..., 1.2076e-01,
5.9758e-02, 1.4939e-02],.....
I tried to save the model with torch.save(self.model.state_dict(), 'resnet18_qm.pkl'), but the size of the file is the same as the original ResNet-18 pretrained model file. I thought it would be much smaller after quantization.
Is there any step I missed, or have I misunderstood the code?
Thanks again and Looking forward to your reply!
Anna
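A note on the size question above: the simulator performs fake quantization, i.e. it stores the de-quantized values, so every parameter is still a 4-byte float and the state_dict does not shrink. Getting a smaller file would require storing the integer codes in packed form; a hedged sketch of such packing (my own helper, not part of the repo):

```python
# Hedged sketch: pack signed int4 codes (range -8..7) two per byte,
# which is what an actually-compressed int4 checkpoint would store.
import numpy as np

def pack_int4(q_codes):
    """Return a uint8 array holding two 4-bit codes per byte."""
    u = (np.asarray(q_codes, dtype=np.int8) & 0x0F).astype(np.uint8)
    if u.size % 2:                       # pad to an even number of codes
        u = np.append(u, 0)
    return (u[0::2] << 4) | u[1::2]      # high nibble, then low nibble
```

Packing 100 int4 codes this way takes 50 bytes instead of the 400 bytes of a float32 tensor.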
What is the formula for per-channel activation quantization?
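I can't speak for the authors, but a common form of per-channel uniform quantization is q_c = round((x_c - min_c) / scale_c) with scale_c = (max_c - min_c) / (2^M - 1), computed independently for each channel c. A hedged sketch (asymmetric GEMMLOWP-style scale and offset, one pair per channel; not necessarily the repo's exact formula):

```python
# Hedged sketch: asymmetric uniform quantization with one
# (scale, offset) pair per channel along `axis`.
import numpy as np

def quantize_per_channel(x, num_bits=4, axis=1):
    levels = 2**num_bits - 1
    red = tuple(i for i in range(x.ndim) if i != axis)   # reduce all other axes
    xmin = x.min(axis=red, keepdims=True)
    xmax = x.max(axis=red, keepdims=True)
    scale = np.maximum(xmax - xmin, 1e-8) / levels       # per-channel step
    q = np.round((x - xmin) / scale)                     # integer codes 0..levels
    return q * scale + xmin                              # de-quantized simulation
```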
How can I quantize my model?
It seems like you work out a new set of scales for the activations on every batch, and use that new set of scales to quantize and dequantize the activations?
Quantize activation_pooling | Id - None | IntQuantizer - [bits: 8, clipping: no, bit_alloc_act: False, bit_alloc_weight: False, pcq_w: False, pcq_a: False, bcorr_act: False, bcorr_weight: False, vcorr_weight: False, kind: mean] | cuda:0
Segmentation fault (core dumped)
I am getting the above-mentioned error even when I try it with the CPU. Can you help me with this?
Hi, I highly appreciate the project. Just wondering: you have mentioned 512 as the default batch size. Will there be any effect if I reduce the batch size to, let's say, 32 or less? I just wanted to know your observations on this point.
I also have one query. I used the sample command given on the landing page, as below:
python inference/inference_sim.py -a resnet50 -b 512 -pcq_w -pcq_a -sh --qtype int4 -qw int4 -c laplace -baa -baw -bcw
The accuracy is as you mentioned.
But when I saved the model after validation, the model size stayed almost the same; it was reduced by at most 80 KB. Is this consistent with your observations, or did I do something wrong?
Hi @submission2019 ,
First of all, I would like to congratulate you on this paper and for opening up the GitHub project for analysis. I have gone through your paper and the project in depth, and I would like to ask about the following.
Maybe there is a bigger picture that I am not able to see; could you please point me in the right direction?
Regards
Amit
After I run the following commands one by one (they are all the default commands):
python inference/inference_sim.py -a resnet50 -b 256 -sm collect -ac --qtype int4
python inference/inference_sim.py -a resnet50 -b 256 -sm collect -ac --qtype int4 -pcq_a
python pytorch_quantizer/quantization/kmeans_quantization.py -a resnet50 -bits 4 -t quantize
python inference/inference_sim.py -a resnet50 -b 512 -sm use --qtype int4 -pcq_w -pcq_a -c laplace -qm 4 -qw f32
Then it gives me these lines and weird classification accuracy:
=> using pre-trained model 'resnet50'
Perform BN folding
Test: [0/98] Time 24.867 (24.867) Loss 9.3266 (9.3266) Prec@1 0.586 (0.586) Prec@5 2.539 (2.539)
Test: [10/98] Time 1.649 (5.473) Loss 9.9686 (10.5482) Prec@1 0.781 (0.337) Prec@5 3.906 (1.935)
Test: [20/98] Time 16.422 (5.568) Loss 8.8218 (10.4333) Prec@1 0.195 (0.456) Prec@5 2.148 (1.990)
Test: [30/98] Time 1.644 (5.145) Loss 10.8783 (10.3225) Prec@1 0.000 (0.372) Prec@5 0.195 (1.928)
Test: [40/98] Time 14.035 (5.262) Loss 10.0452 (10.3040) Prec@1 0.781 (0.438) Prec@5 3.711 (2.158)
Test: [50/98] Time 1.646 (5.140) Loss 9.2239 (10.1178) Prec@1 0.391 (0.724) Prec@5 1.758 (2.742)
Test: [60/98] Time 15.470 (5.170) Loss 10.2627 (10.0315) Prec@1 0.000 (0.832) Prec@5 1.367 (3.099)
Test: [70/98] Time 1.637 (5.067) Loss 9.7789 (9.9481) Prec@1 0.000 (0.935) Prec@5 0.781 (3.381)
Test: [80/98] Time 14.160 (5.121) Loss 9.5262 (9.8638) Prec@1 1.758 (1.059) Prec@5 7.031 (3.771)
Test: [90/98] Time 1.667 (5.066) Loss 9.0029 (9.8053) Prec@1 0.781 (1.011) Prec@5 4.297 (3.728)
* Prec@1 1.066 Prec@5 4.016