yhhhli / BRECQ
PyTorch implementation of BRECQ, ICLR 2021
License: MIT License
After I finished quantization, the scale and offset of UniformAffineQuantizer are tensor-type data. How can I convert them to scalar values in order to generate a quantization encoding such as the following? @yhhhli
Encoding:{
bitwidth: integer
is_symmetric: string
max: float
min: float
offset: integer
scale: float
}
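In case it helps later readers, here is a minimal sketch of one way to flatten the quantizer parameters into such an encoding. It assumes the quantizer stores its step size and zero-point as tensors named delta and zero_point (the names used in this repo's UniformAffineQuantizer, as far as I can tell); adjust the attribute names if they differ, and note that channel-wise quantization would need one encoding per channel:

def export_encoding(quantizer, n_bits: int, symmetric: bool = False):
    # Sketch: convert tensor-valued scale/offset into plain Python scalars.
    scale = quantizer.delta.reshape(-1)        # flatten possible per-channel params
    offset = quantizer.zero_point.reshape(-1)
    if scale.numel() != 1:
        raise ValueError('channel-wise quantizer: export one encoding per channel')
    scale = float(scale.item())                # tensor -> Python float
    offset = int(offset.item())                # tensor -> Python int
    q_max = 2 ** n_bits - 1                    # highest integer level
    return {
        'bitwidth': n_bits,
        'is_symmetric': str(symmetric),
        'min': (0 - offset) * scale,           # real value of the lowest level
        'max': (q_max - offset) * scale,       # real value of the highest level
        'offset': offset,
        'scale': scale,
    }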
The Fisher-diag approach proposed in the paper for estimating the Hessian requires computing the gradient of every layer's pre-activation. But when actually running the code, cur_grad = get_grad(cali_data[i * batch_size:(i + 1) * batch_size]) in save_grad_data raises "Trying to backward through the graph a second time" when it reaches the second batch; the first batch does not raise the error. Has the author encountered a similar situation?
Where can the full-precision model weights for the Faster R-CNN and RetinaNet networks mentioned in the paper be downloaded?
Hi,
Very impressive coding.
https://github.com/yhhhli/BRECQ/releases/download/v1.0/resnet50_imagenet.pth.tar
I tested the pre-trained ResNet-50 model and got only 76.62% accuracy, but the paper reports 77.00%.
Thanks
Hi,
Very impressive coding.
I have a question about the quantization of activation values.
In the code:
Why can it be replaced like this?
Thanks
Greetings,
I was trying to reproduce layer-wise reconstruction and compare it with block-wise reconstruction, so I commented out the first two lines of this loop in quant_model.py.
However, the accuracy drops drastically to 0.62% for ResNet-18 (W2), far from the 65.19% accuracy in your paper. Could you briefly describe how layer-wise reconstruction should be applied?
Thanks.
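For anyone else trying this, here is a rough sketch of what pure layer-wise reconstruction could look like under my reading of the code (assuming the layer_reconstruction, QuantModule and BaseQuantBlock APIs plus the qnn / kwargs variables from main_imagenet.py; this is not the authors' recipe): recurse into every block and reconstruct each QuantModule on its own instead of calling block_reconstruction.

def recon_model_layerwise(model: nn.Module):
    # Layer-wise variant of recon_model: every QuantModule, including those
    # inside blocks, is reconstructed individually.
    for name, module in model.named_children():
        if isinstance(module, QuantModule):
            if module.ignore_reconstruction:
                print('Ignore reconstruction of layer {}'.format(name))
                continue
            print('Reconstruction for layer {}'.format(name))
            layer_reconstruction(qnn, module, **kwargs)
        else:
            # do not stop at BaseQuantBlock: descend into it so its inner
            # QuantModules are handled one by one
            recon_model_layerwise(module)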
Hey,
I'm just wondering: what does BRECQ stand for? BR as in Block Reconstruction, but what about the other letters?
Hi,
Thanks for releasing your code, but I have one question about an implementation detail.
In quant_block.py, take the following code for ResNet-18 and ResNet-34 as an example.
disable_act_quant is set to True for conv2, which disables quantization of conv2's output.
class QuantBasicBlock(BaseQuantBlock):
    """
    Implementation of Quantized BasicBlock used in ResNet-18 and ResNet-34.
    """
    def __init__(self, basic_block: BasicBlock, weight_quant_params: dict = {}, act_quant_params: dict = {}):
        super().__init__(act_quant_params)
        self.conv1 = QuantModule(basic_block.conv1, weight_quant_params, act_quant_params)
        self.conv1.activation_function = basic_block.relu1
        self.conv2 = QuantModule(basic_block.conv2, weight_quant_params, act_quant_params, disable_act_quant=True)
        # modify the activation function to ReLU
        self.activation_function = basic_block.relu2
        if basic_block.downsample is None:
            self.downsample = None
        else:
            self.downsample = QuantModule(basic_block.downsample[0], weight_quant_params, act_quant_params,
                                          disable_act_quant=True)
        # copying all attributes in original block
        self.stride = basic_block.stride
It causes a boost in accuracy; the following are the results I got using your code and the same ImageNet dataset used in the paper.
[1] and [2] denote the modifications I made to the original code.
[1]: quant_block.py → QuantBasicBlock → __init__ → self.conv2 = QuantModule(..., disable_act_quant=True) and self.downsample = QuantModule(basic_block.downsample[0], weight_quant_params, act_quant_params, disable_act_quant=True): changed from True to False;
[2]: quant_block.py → QuantInvertedResidual → __init__ → self.conv = nn.Sequential(..., QuantModule(..., disable_act_quant=True)): changed from True to False.
But I do not think this is applicable to most NPUs, which quantize the output of every conv layer.
So why not quantize the activation of the last conv layer in a block? Is there a particular reason for this?
Also, for the methods you compared against in your paper, have you checked whether they do the same thing as you do?
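For context, my reading of the block design (a sketch from how I understand the code, not an authoritative statement): the activation quantizer attached to the block fires once, after the residual add and the block-level ReLU, so quantizing conv2's output separately would insert an extra rounding step before the residual add.

def forward(self, x):
    # sketch of a QuantBasicBlock-style forward under the assumption above
    residual = x if self.downsample is None else self.downsample(x)
    out = self.conv1(x)        # conv1 output is quantized inside its QuantModule
    out = self.conv2(out)      # conv2 has disable_act_quant=True
    out += residual
    out = self.activation_function(out)
    if self.use_act_quant:
        out = self.act_quantizer(out)   # single quantization of the block output
    return out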
user@machine:/path_to/BRECQ# python main_imagenet.py --data_path /path_to/IMAGENET_2012/ --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration
You are using fake SyncBatchNorm2d who is actually the official BatchNorm2d
==> Using Pytorch Dataset
Downloading: "https://github.com/yhhhli/BRECQ/releases/download/v1.0/resnet18_imagenet.pth.tar" to /root/.cache/torch/hub/checkpoints/resnet18_imagenet.pth.tar
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 44.6M/44.6M [00:27<00:00, 1.70MB/s]
Traceback (most recent call last):
  File "main_imagenet.py", line 178, in <module>
    cnn.cuda()
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 680, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 593, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 680, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Hi Yuhang,
Thank you for open sourcing this project.
As noted in the paper, a diagonal Fisher information matrix is applied to replace the pre-activation Hessian, so we tried to set opt_mode to fisher_diag instead of mse for reconstruction. However, a runtime error is thrown:
File "xxxx/quant/data_utils.py", line 184, in __call__
loss.backward()
File "xxxx/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "xxxx/lib/python3.6/site-packages/torch/autograd/__init__.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.
It seems to occur during the backward pass used to save gradients:
handle = self.layer.register_backward_hook(self.data_saver)
with torch.enable_grad():
    try:
        self.model.zero_grad()
        inputs = model_input.to(self.device)
        self.model.set_quant_state(False, False)
        out_fp = self.model(inputs)
        quantize_model_till(self.model, self.layer, self.act_quant)
        out_q = self.model(inputs)
        loss = F.kl_div(F.log_softmax(out_q, dim=1), F.softmax(out_fp, dim=1), reduction='batchmean')
        # here....
        loss.backward()
    except StopForwardException:
        pass
As indicated by the error, the first backward succeeds but the second fails.
We created a very simple network to reproduce this, and the error still appears:
import torch.nn as nn
import torch.nn.functional as F

class DummyNet(nn.Module):
    def __init__(self):
        super(DummyNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, 3)
        self.conv2 = nn.Conv2d(32, 32, 3, 3)
        self.conv3 = nn.Conv2d(32, 1, 3, 3)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        output = F.log_softmax(x, dim=0)
        return output
The recon_model function is the same as the one in the main_imagenet file:
def recon_model(model: nn.Module):
    """
    Block reconstruction. For the first and last layers, we can only apply layer reconstruction.
    """
    for name, module in model.named_children():
        if isinstance(module, QuantModule):
            if module.ignore_reconstruction is True:
                print('Ignore reconstruction of layer {}'.format(name))
                continue
            else:
                layer_reconstruction(qnn, module, **kwargs)
        elif isinstance(module, BaseQuantBlock):
            if module.ignore_reconstruction is True:
                print('Ignore reconstruction of block {}'.format(name))
                continue
            else:
                print('Reconstruction for block {}'.format(name))
                block_reconstruction(qnn, module, **kwargs)
        else:
            recon_model(module)
We are not quite sure why PyTorch complains here, as backward is only called once per batch. But we also noticed that after calling save_grad_data, the gradient is cached for a later loss calculation:
# in block_reconstruction
err = loss_func(out_quant, cur_out, cur_grad)
Is the intermediate gradient still available at this point, given that backward has already been called? In our case, even if we work around the first error inside save_grad_data, we get the same one here (i.e. backward twice).
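For reference, we understand the generic PyTorch behavior behind the message (independent of this repo): once a graph has been backpropagated through, its saved buffers are freed, so a second backward over the same graph needs either retain_graph=True on the first call or a fresh forward pass. A self-contained illustration:

import torch

x = torch.randn(4, requires_grad=True)
y = (x ** 2).sum()

y.backward(retain_graph=True)   # keep the graph alive for another backward
y.backward()                    # works only because retain_graph was set above

z = (x ** 2).sum()              # alternative: recompute the forward pass
z.backward()                    # each freshly built graph can be backpropagated once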
Environment
Ubuntu 16.04 / Python 3.6.8 / PyTorch 1.7.1 / CUDA 10.1
Any advice would be appreciated.
Hi,
Very impressive coding.
Last layer bit-width setting, especially restoring to 8-bit, seems strange:
module_list[-1].weight_quantizer.bitwidth_refactor(8)
module_list[-2].act_quantizer.bitwidth_refactor(8)
The weights of the last (usually dense) layer are set to 8 bits.
However, the activations of the preceding layer are also set to 8 bits.
Was this your intention, or is it a bug?
Thanks,
Ilan.
Line 91 in 2888b29
What is the purpose of setting retain_graph=True?
According to Appendix B.4.3 of the paper (latency acquisition), the simulator is available in the provided source code. But where can I find it? Thanks.
Hi, I am running the command exactly as written but am receiving very low accuracy:
This is the command:
python3 main_imagenet.py --data_path '/home/ofekglick/BRECQ/tiny-imagenet-200' --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration
I am getting an accuracy of about 0.05%, both before and after quantization.
I am running the code on the tiny-imagenet-200 dataset.
Any idea why this could happen?
Thank you very much for your work.
I adapted your code to YOLOv5; with W4A8 quantization there is a drop of nearly 3 points. Have you experimented with YOLOv5?
Hi, this is nice code for quantization.
However, I have some questions about the sensitivity measurement and the genetic algorithm.
In the paper, you state that layer sensitivity can be measured with the diagonal loss, while the off-diagonal loss expresses cross-layer sensitivity. However, I could not find such loss terms in the code in this repo. If you do not mind, could you point me to where those terms are implemented?
Also, could you tell me where the implementation details of the scale values for weights and activations are? Where do you calculate and refine them?
It was quite interesting to use a genetic algorithm to find the optimal bit-width configuration for each block, but as with the sensitivity question, I could not find the code for this algorithm in the repo. Again, if you do not mind, please let me know where it is.
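For concreteness, this is roughly what I expected to find: a generic sketch (my own illustration, not code from this repository) of an evolutionary search over per-block bit-widths, where each candidate is a list of bit choices, the fitness is the summed per-block sensitivity under a model-size budget, and new candidates come from mutating the best ones.

import random

BITS = [2, 4, 8]   # placeholder set of candidate bit-widths

def fitness(genome, sensitivity, size, budget):
    # sensitivity[b][i] / size[b][i]: pre-measured loss increase and model size
    # when block i runs at bit-width b (assumed to be given)
    total_size = sum(size[b][i] for i, b in enumerate(genome))
    if total_size > budget:                      # infeasible genomes are rejected
        return float('inf')
    return sum(sensitivity[b][i] for i, b in enumerate(genome))

def mutate(genome, p=0.1):
    return [random.choice(BITS) if random.random() < p else b for b in genome]

def search(n_blocks, sensitivity, size, budget, pop=50, iters=200):
    population = [[random.choice(BITS) for _ in range(n_blocks)] for _ in range(pop)]
    for _ in range(iters):
        population.sort(key=lambda g: fitness(g, sensitivity, size, budget))
        parents = population[:pop // 4]          # keep the best quarter
        children = [mutate(random.choice(parents)) for _ in range(pop - len(parents))]
        population = parents + children
    return min(population, key=lambda g: fitness(g, sensitivity, size, budget))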
By the way, I am really impressed with your paper and code for this new quantization method. Hope you have a nice day. Thank you, and sorry for all the questions!
Greetings,
Really appreciate your open source contribution.
However, it seems the accuracy reported in the paper cannot be reproduced on standard ImageNet. For instance, with the full-precision models I measured ResNet-18 at 70.186% and MobileNetV2 at 71.618%, slightly lower than the results in your paper (71.08 and 72.49 respectively).
Have you utilized any preprocessing techniques other than imagenet.build_imagenet_data?
Thanks
Hello, I have read the quantization results for Faster R-CNN reported in your paper. Could you please release your quantized weight file? Thank you.
Hi,
in quant_layer.py, in the forward function of QuantModule, why is the bias not quantized?
def forward(self, input: torch.Tensor):
    if self.use_weight_quant:
        weight = self.weight_quantizer(self.weight)
        bias = self.bias
    else:
        weight = self.org_weight
        bias = self.org_bias
    out = self.fwd_func(input, weight, bias, **self.fwd_kwargs)
    ...
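For reference, the convention I have seen on integer accelerators (a generic sketch, not taken from this repository) is to keep the bias in int32 with a scale equal to the product of the weight and input scales, so the quantized bias can be added directly to the int32 accumulator:

import torch

def quantize_bias(bias: torch.Tensor, w_scale: torch.Tensor, in_scale: float):
    # generic fixed-point convention: bias stored as int32 with scale s_w * s_x
    bias_scale = w_scale * in_scale                  # per-channel if w_scale is per-channel
    q_bias = torch.round(bias / bias_scale).to(torch.int32)
    return q_bias, bias_scale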
Could the code for sensitivity acquisition and bit-width allocation be open-sourced?
Hello, thank you for an interesting paper and nice code.
I have two questions concerning implementation details.
cached_grads = cached_grads.abs() + 1.0
# scaling to make sure its mean is 1
# cached_grads = cached_grads * torch.sqrt(cached_grads.numel() / cached_grads.pow(2).sum())
Thank you for your time and consideration!
Could you provide standard 8/8-bit quantization results for ResNet-18 and MobileNetV2 on ImageNet? This is the most widely used setting.
BRECQ/quant/adaptive_rounding.py
Lines 50 to 51 in 819d440
Would you elaborate on how you derived the hard-rounding scheme and what it is used for?
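My current understanding, from the AdaRound paper (Nagel et al., 2020) that this quantizer follows, is that during reconstruction each weight is rounded by a learned offset h(alpha) in [0, 1] produced by a rectified sigmoid, and at inference time the offset is snapped to 0 or 1, i.e. hard rounding is simply (alpha >= 0). A sketch under that assumption (clamping to the integer range omitted):

import torch

GAMMA, ZETA = -0.1, 1.1   # stretch parameters of the rectified sigmoid

def soft_offset(alpha: torch.Tensor) -> torch.Tensor:
    # differentiable "soft" rounding offset used while alpha is optimized
    return torch.clamp(torch.sigmoid(alpha) * (ZETA - GAMMA) + GAMMA, 0, 1)

def quantize(w: torch.Tensor, alpha: torch.Tensor, delta: float, hard: bool):
    w_floor = torch.floor(w / delta)
    offset = (alpha >= 0).float() if hard else soft_offset(alpha)
    return (w_floor + offset) * delta    # hard mode = final, deterministic rounding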
Hi, this is nice code for quantization.
However, I have some questions about the sensitivity measurement and the genetic algorithm.
In the paper, you state that layer sensitivity can be measured with the diagonal loss, while the off-diagonal loss expresses cross-layer sensitivity. However, I could not find such loss terms in the code in this repo. If you do not mind, could you point me to where those terms are implemented?
It was quite interesting to use a genetic algorithm to find the optimal bit-width configuration for each block, but as with the sensitivity question, I could not find the code for this algorithm in the repo. Again, if you do not mind, please let me know where it is.
By the way, I am really impressed with your paper and code for this new quantization method. Hope you have a nice day. Thank you!
P.S. When you mentioned examining permutations, you said there would be 3^n permutations, where n is the number of layers in a block and 3 is the number of bit candidates (2, 4, 6). However, according to your paper, only the 2-bit permutations are considered because the performance drop at 4 and 8 bits is negligible. But if only 2-bit permutations are considered, shouldn't it be 1^n permutations per block? I am a bit confused by this part.
I tried running your code with pre-trained ResNet-50 and MobileNetV2 models. I got these loss function values for the output and prediction losses:
rec_loss = lp_loss(pred, tgt, p=self.p)
:param pred: output from quantized model
:param tgt: output from FP model
https://github.com/yhhhli/BRECQ/blob/main/quant/block_recon.py#L149
pd_loss = self.pd_loss(F.log_softmax(output / self.T, dim=1), F.softmax(output_fp / self.T, dim=1)) / self.lam
:param pred: output from quantized model
:param tgt: output from FP model
https://github.com/yhhhli/BRECQ/blob/main/quant/block_recon.py#L151
Are there additional settings I missed?
I'd like to ask Yuhang: have you tried the AdaQuant approach, i.e. removing the range restriction on weight updates during reconstruction? In theory, would that further improve the quantization results?
If my target is a fully quantized model, is it necessary to do weight-quantization reconstruction before full-quantization reconstruction, as shown in main_imagenet.py? Can I skip weight-quantization reconstruction and turn on full-quantization reconstruction directly?
Hi,
So I tried running your code on CIFAR-10 with a pre-trained ResNet50 model. I've attached the code below.
However, my accuracy does not come anywhere near the float model, which is around 93%. After quantization I get:
Please help me with this. The code is inside the zip file.
Hi, nice idea for quantization.
But it seems the paper (not including the appendix) does not state that it uses channel-wise quantization, while the code shows that it does.
As we know, channel-wise quantization naturally outperforms layer-wise quantization.
So it may be hard to say that the performance of your method is close to QAT.
Got an error:
Traceback (most recent call last):
  File "main_imagenet.py", line 198, in <module>
    print('Quantized accuracy before brecq: {}'.format(validate_model(test_loader, qnn)))
  File "/home/xxxx/anaconda3/envs/torch/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "main_imagenet.py", line 108, in validate_model
    acc1, acc5 = accuracy(output, target, topk=(1, 5))
  File "main_imagenet.py", line 77, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
So I suggest replacing .view with .reshape in the accuracy() function.
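For reference, a patched version of the helper with that one-line change (this follows the standard PyTorch ImageNet-example implementation; only .view → .reshape differs):

import torch

def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k."""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)
        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))
        res = []
        for k in topk:
            # .reshape handles the non-contiguous slice that .view rejects
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res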
command:
python3 main_imagenet.py --data_path image --arch mobilenetv2 --n_bits_w 2 --n_bits_a 4 --channel_wise --weight 0.1 --act_quant
result:
Full quantization (W2A4) accuracy: 0.1419999897480011
How can I reproduce the MobileNetV2 W2A4 result?
thanks
I used the ResNet-18 model provided by the project and the code's default parameters to run W4A4 quantization. The results are quite different from those in the paper. Are there any other specific settings required when quantizing activations to 4 bits?
CUDA_VISIBLE_DEVICES=3 python main_imagenet.py --data_path /disk2/imagenet/ --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration | tee w2a4_test.log
CUDA_VISIBLE_DEVICES=3 python main_imagenet.py --data_path /disk2/imagenet/ --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant | tee w2a4.log
The W2A4 and W2A32 results after reconstruction are different when I remove the 'test_before_calibration' flag, even though this flag should not modify the seed. I'm wondering why, and I look forward to your reply.
We used the APoT method on the COCO dataset, but the mAP is very low. Why? Thanks! @yhhhli @blackandredplayerinfuture
So I tried running your code on my own dataset with pre-trained ResNet-50 and MobileNetV2 models. I got these results:
Full-precision model accuracy: MobileNetV2 (58.19)
Quantized model (W8A8) accuracy: MobileNetV2 (12.02)
Quantized model (W6A6) accuracy: MobileNetV2 (10.12)
Full-precision model accuracy: ResNet-50 (65.16)
Quantized model (W8A8) accuracy: ResNet-50 (13.22)
Quantized model (W6A6) accuracy: ResNet-50 (11.02)
https://github.com/yhhhli/BRECQ/blob/main/main_imagenet.py#L201C1-L229C87
However, my accuracy after quantization does not come anywhere near the float models, which are around 58.19% and 65.16%.
Are there additional settings I missed?
During my porting work, how should I solve problems such as inconsistent image sizes for RetinaNet? And also the problem of the score being 0?
Hi, impressive work.
The implementation of quant_layer forward is shown below:
Lines 193 to 210 in 2888b29
I found that the layer only quantizes the weight and the output, not the input. The inner convolution layers can still be quantized correctly by the act_quantizer at the end of the quant block, but there are still two problems.
Are there additional settings I missed?
If that is the intended setting, it seems the paper does not mention it.
Are the settings the same for AdaQuant and the other baselines/workloads?
Thanks.
From what I can see, your code only runs on a single GPU, while I need to test it with multiple GPUs for other implementations. I just want to check whether you have run your code with DataParallel or DistributedDataParallel.