
ppq's Introduction


PPL Quantization Tool 0.6.6

PPQ is an extensible, high-performance neural network quantization tool built for industrial applications.

Neural network quantization has been in wide use as a network acceleration technique since around 2016. Compared with network pruning and architecture search, quantization generalizes better and carries higher industrial value, especially for edge chips: where on-chip area and power are both constrained, we always want to turn every floating-point operation into a fixed-point one. The value of quantization comes from the fact that floating-point computation and the memory traffic it entails are expensive, requiring complex floating-point units and high memory bandwidth. If lower-bit-width fixed-point arithmetic can approximate the floating-point results within an acceptable tolerance, we gain significant advantages in circuit design, power consumption, latency, and throughput.

We are riding a wave of the times: neural-network-based AI is developing rapidly, and technologies such as image recognition, image super-resolution, content generation, and model reconstruction are changing our lives. What comes with them is a constant churn of model structures, the first hurdle standing before model quantization and deployment. To cope with complex structures, we designed a complete computational-graph data structure together with graph-scheduling logic; these let PPQ parse and rewrite complex models, automatically decide the quantized and non-quantized regions of a network, and still leave the scheduling under the user's manual control.

Quantizing and optimizing a network is a demanding engineering problem, and we want users to take part in the quantization and deployment of their networks and in optimizing their performance. To that end we provide deployment-related learning material on GitHub and deliberately emphasize interface flexibility in the software design. Out of continued experimentation we abstracted the quantizer, a class that initializes the quantization policy for each hardware platform and lets users customize the quantization bit width, granularity, and calibration algorithm of every operator and every tensor in the network. We reorganized the quantization logic into 27 independent Quantization Optimization Passes; PPQ users can combine them freely to carry out highly flexible quantization tasks, and can add to or modify all of them to explore the frontiers of quantization. A minimal usage sketch follows below.
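A minimal end-to-end sketch of this workflow (an illustration only: the model path, input shape, and target platform are placeholders, and exact keyword names may differ slightly across PPQ versions):

import torch
from ppq import QuantizationSettingFactory, TargetPlatform
from ppq.api import export_ppq_graph, quantize_onnx_model

# Default pipeline of optimization passes; customize it for your platform.
setting = QuantizationSettingFactory.default_setting()
calibration_dataset = [torch.rand(1, 3, 224, 224) for _ in range(32)]

quantized = quantize_onnx_model(
    onnx_import_file='model.onnx', calib_dataloader=calibration_dataset,
    calib_steps=32, input_shape=[1, 3, 224, 224], setting=setting,
    platform=TargetPlatform.TRT_INT8, collate_fn=lambda x: x.cuda(),
    device='cuda')

export_ppq_graph(
    graph=quantized, platform=TargetPlatform.TRT_INT8,
    graph_save_to='model_quantized.onnx',
    config_save_to='model_quantized.json')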

This is a framework born for complex quantization tasks: PPQ's execution engine is designed specifically for quantization. As of version 0.6.6, the software ships with execution logic for 99 common ONNX operators and natively supports quantization simulation during execution, so PPQ can run and quantize ONNX models without depending on ONNX Runtime. As part of the architecture, users may register new operator implementations written in Python + PyTorch or C++/CUDA, and new logic may also replace the built-in implementations. PPQ allows the same operator to carry different execution logic on different platforms, which is what makes simulating different hardware back ends possible. The customized execution engine, together with the high-performance PPQ CUDA kernels, gives PPQ a striking performance edge, often finishing quantization tasks with remarkable efficiency.
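For illustration, registering a Python implementation for a custom operator might look like this (a sketch assuming the register_operation_handler helper in ppq.api and its (op, values, ctx, **kwargs) handler signature; Mish is just a stand-in example):

import torch
from ppq import TargetPlatform
from ppq.api import register_operation_handler

def mish_forward(op, values, ctx=None, **kwargs) -> torch.Tensor:
    [x] = values  # single-input elementwise op
    # Mish(x) = x * tanh(softplus(x)); any custom logic goes here.
    return x * torch.tanh(torch.nn.functional.softplus(x))

register_operation_handler(
    handler=mish_forward, operation_type='Mish',
    platform=TargetPlatform.FP32)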

PPQ is developed in close contact with inference frameworks, which exposes many details of hardware inference and lets us tightly control hardware-simulation error. Thanks to the joint efforts of open-source contributors at home and abroad, PPQ currently works with TensorRT, OpenPPL, OpenVINO, ncnn, MNN, ONNX Runtime, Tengine, SNPE, GraphCore, Metax, and other inference frameworks, with prebuilt quantizers and export logic for each. PPQ is a highly extensible quantization framework: with the functions in ppq.lib, you can extend its quantization capability to other hardware and inference libraries. We look forward to bringing AI into every home together with you.

The 0.6.6 release brings you the following features:

  1. FP8 quantization: PPQ now supports FP8 quantization simulation and training for E4M3, E5M2, and other formats
  2. PFL base library: PPQ now provides a set of lower-level API functions for more flexible quantization
  3. More powerful graph pattern matching and graph fusion
  4. ONNX-based quantization-aware training (QAT)
  5. Brand-new TensorRT quantization and export logic
  6. OnnxQuant, the world's largest quantized-model zoo
  7. Other features waiting to be discovered

Installation

  1. Install CUDA from CUDA Toolkit

  2. Install Compiler

apt-get install ninja-build # for debian/ubuntu user
yum install ninja-build # for redhat/centos user

For Windows User:

(1) Download ninja.exe from https://github.com/ninja-build/ninja/releases, add it to Windows PATH.

(2) Install Visual Studio 2019 from https://visualstudio.microsoft.com.

(3) Add your C++ compiler to Windows PATH Environment, if you are using Visual Studio, it should be like "C:\Program Files\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.16.27023\bin\Hostx86\x86"

(4) Update PyTorch version to 1.10+.

  3. Install PPQ
git clone https://github.com/openppl-public/ppq.git
cd ppq
pip install -r requirements.txt
python setup.py install
  • Install PPQ from our docker image (optional):
docker pull stephen222/ppq:ubuntu18.04_cuda11.4_cudnn8.4_trt8.4.1.5

docker run -it --rm --ipc=host --gpus all --mount type=bind,source=your custom path,target=/workspace stephen222/ppq:ubuntu18.04_cuda11.4_cudnn8.4_trt8.4.1.5 /bin/bash

git clone https://github.com/openppl-public/ppq.git
cd ppq
export PYTHONPATH=${PWD}:${PYTHONPATH}
  • Install PPQ using pip (optional):
python3 -m pip install ppq
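A quick sanity check after any of the installation routes above (output will vary with your environment):

import torch
import ppq

# Confirms that PPQ and a (preferably CUDA-enabled) PyTorch are importable.
print('PPQ imported OK; CUDA available:', torch.cuda.is_available())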

Learning Path

PPQ basics and example scripts

Description Link
01 Model Quantization onnx, caffe, pytorch
02 Executor executor
03 Error Analysis analyser
04 Calibration calibration
05 Network Finetuning finetune
06 Network Dispatching dispatch
07 Best Practice Best Practice
08 Target Platform platform
09 Optimization Passes Optim
10 Graph Fusion Fusion

PPQ optimization pass documentation

Description Link
01 QuantSimplifyPass (general quantization simplification) doc
02 QuantFusionPass (general quantization graph fusion) doc
03 QuantAlignmentPass (general quantization alignment) doc
04 RuntimeCalibrationPass (runtime parameter calibration) doc
05 BiasCorrectionPass (bias correction) doc
06 QuantSimplifyPass (general quantization simplification) doc
07 LayerwiseEqualizationPass (layerwise weight equalization) doc
08 LayerSpilitPass (layer splitting) doc
09 LearnedStepSizePass (network finetuning) doc
10 Other refer to

Video Tutorials

Description Link
01 Computer Architecture Basics link
02 Network Performance Analysis link
03 Principles of Quantized Computation part1, part2
04 Graph Optimization and Quantization Simulation link
05 Graph Dispatching and Pattern Matching link
06 Neural Network Deployment link
07 Choosing Quantization Parameters link
08 Quantization Error Propagation Analysis link

Quantization Deployment Tutorials

Example Platform Input Format Link Video
TensorRT
Accelerate your network with Torch2trt pytorch pytorch link link
TensorRT quantization-aware training TensorRT pytorch link link
TensorRT post-training quantization (PPQ) TensorRT onnx

1. Quant with TensorRT OnnxParser

2. Quant with TensorRT API

TensorRT FP32 deployment TensorRT onnx link link
TensorRT performance comparison TensorRT pytorch link link
TensorRT Profiler TensorRT pytorch link link
onnxruntime
Accelerate your network with onnxruntime onnxruntime onnx link link
ONNX post-training quantization (PPQ) onnxruntime onnx link link
onnxruntime performance comparison onnxruntime pytorch link link
openvino
Accelerate your network with openvino openvino onnx link
openvino quantization-aware training openvino pytorch link
openvino post-training quantization (PPQ) openvino onnx link
openvino performance comparison openvino pytorch link
snpe
snpe post-training quantization (PPQ) snpe caffe link
ncnn
ncnn post-training quantization (PPQ) ncnn onnx link
OpenPPL
ppl cuda post-training quantization (PPQ) ppl cuda onnx link

Dive into PPQ

Description Link
01 PPQ Quantization Execution Flow link
02 PPQ Graph Parsing link
03 PPQ Quantization Graph Dispatching link
04 PPQ Target Platform and TQC link
05 PPQ Quantizer link
06 PPQ Quantization Optimization Passes link
07 PPQ Quantization Functions link

Contact Us

WeChat Official Account: OpenPPL
QQ Group: 627853444

Email: [email protected]

Other Resources

Contributions

We appreciate all contributions. If you are planning to contribute back bug fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might result in a rejected PR, because we might be taking the core in a different direction than you are aware of.

Benchmark

PPQ is tested with models from mmlab-classification, mmlab-detection, mmlab-segmentation, and mmlab-editing; part of our test results are listed here.

  • No quantization optimization procedure is applied to the following models.
Model Type Calibration Dispatcher Metric PPQ(sim) PPLCUDA FP32
Resnet-18 Classification 512 imgs conservative Acc-Top-1 69.50% 69.42% 69.88%
ResNeXt-101 Classification 512 imgs conservative Acc-Top-1 78.46% 78.37% 78.66%
SE-ResNet-50 Classification 512 imgs conservative Acc-Top-1 77.24% 77.26% 77.76%
ShuffleNetV2 Classification 512 imgs conservative Acc-Top-1 69.13% 68.85% 69.55%
MobileNetV2 Classification 512 imgs conservative Acc-Top-1 70.99% 71.1% 71.88%
---- ---- ---- ---- ---- ---- ---- ----
retinanet Detection 32 imgs pplnn bbox_mAP 36.1% 36.1% 36.4%
faster_rcnn Detection 32 imgs pplnn bbox_mAP 36.6% 36.7% 37.0%
fsaf Detection 32 imgs pplnn bbox_mAP 36.5% 36.6% 37.4%
mask_rcnn Detection 32 imgs pplnn bbox_mAP 37.7% 37.6% 37.9%
---- ---- ---- ---- ---- ---- ---- ----
deeplabv3 Segmentation 32 imgs conservative aAcc / mIoU 96.13% / 78.81% 96.14% / 78.89% 96.17% / 79.12%
deeplabv3plus Segmentation 32 imgs conservative aAcc / mIoU 96.27% / 79.39% 96.26% / 79.29% 96.29% / 79.60%
fcn Segmentation 32 imgs conservative aAcc / mIoU 95.75% / 74.56% 95.62% / 73.96% 95.68% / 72.35%
pspnet Segmentation 32 imgs conservative aAcc / mIoU 95.79% / 77.40% 95.79% / 77.41% 95.83% / 77.74%
---- ---- ---- ---- ---- ---- ---- ----
srcnn Editing 32 imgs conservative PSNR / SSIM 27.88% / 79.70% 27.88% / 79.07% 28.41% / 81.06%
esrgan Editing 32 imgs conservative PSNR / SSIM 27.84% / 75.20% 27.49% / 72.90% 27.51% / 72.84%
  • PPQ(sim) stands for the result of PPQ's quantization simulator.
  • Dispatcher stands for the dispatching policy of PPQ.
  • Classification models are evaluated on ImageNet; detection and segmentation models on the COCO dataset; editing models on the DIV2K dataset.
  • All calibration datasets are randomly sampled from the training data.

License


This project is distributed under the Apache License, Version 2.0.

ppq's People

Contributors

a1trl9, bug1989, feigechuanshu, huoshuai-dot, ice-tong, inisis, jzz24, leiwang1999, lenan22, liuxubit, ljoson, marsmiao, openppl-public, ouonline, sanbuphy, tairenpiao, tpoisonooo, triple-mu, wangqiang9, xiguadong, yuksing12, zchrissirhcz, zhangzhipku, zhenglongjiepheonix, zhiqwang


ppq's Issues

module 'numpy' has no attribute 'asscalar'

File "/usr/local/lib/python3.8/dist-packages/ppq-0.6.4-py3.8.egg/ppq/parser/util.py", line 27, in convert_value
value = np.asscalar(value[0])
File "/usr/local/lib/python3.8/dist-packages/numpy/init.py", line 311, in getattr
raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'asscalar'

numpy version 1.23.0
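np.asscalar was removed in NumPy 1.23; ndarray.item() is NumPy's documented replacement. A sketch of the one-line patch for ppq/parser/util.py (pinning numpy<1.23 also works around it):

# value = np.asscalar(value[0])   # fails on numpy >= 1.23
value = value[0].item()           # equivalent, supported replacement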

Is transposing the batch dimension supported?

Hi, I have a segmentation model whose input is [15,3,512,512]. Running ProgramEntrance.py reports "Error happens when dealing with operation Transpose_93(TargetPlatform.UNSPECIFIED)". This Transpose swaps dimension 0 with dimension 1; is such an operation supported?

Resize_forward in the torch executor looks buggy when sizes is unspecified and scales has more than one element

As the code snippet below shows, when a Resize determines its output size via scales rather than sizes, a single-element scales tensor works fine; but once scales.numel() > 1, line 1089 takes only scales[-2] and feeds it to torch.nn.functional.interpolate, so the scale_factor of the other dimension is never used at all. Any subsequent concat-like op then easily falls over because the dimensions no longer match.

Since line 1088 already asserts scales.numel() % 2 == 0, I suspect this is actually a scales[-2:] -> scales[-2] typo?

if sizes is None or len(sizes) == 0:
    sizes = None
    if scales.numel() == 1:
        scales = scales.item()
    else:
        assert scales.numel() % 2 == 0
        scales = scales[-2].cpu().numpy().tolist()
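If that diagnosis holds, the fix is plausibly a slice instead of an index (a sketch, not verified against the repository):

scales = scales[-2:].cpu().numpy().tolist()  # keep both spatial scale factors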

A bug at line 125 of scheduler/dispatcher.py

Great project! But running efficientnet-lite4-11.onnx from the official ONNX model zoo fails at line 125 of scheduler/dispatcher.py. My analysis of the cause:

  1. The model's graph contains a chain ···-->Conv-->BN-->Clip-->···. PPQ fuses Conv+BN by default, but the fused operation is appended to the end of graph.operations.
  2. When binding a platform to the Clip, the statement at line 125 of scheduler/dispatcher.py runs.

Putting 1 and 2 together: at that moment dispatching_table holds no entry for the fused ConvBN operation, which triggers the error. It is an ordering problem; I leave it to you to decide the best fix.

RuntimeError from a Shape op during calibration and finetuning


Configuration:

TARGET_PLATFORM = TargetPlatform.NXP_INT8   # choose your target platform
MODEL_TYPE = NetworkFramework.ONNX          # or NetworkFramework.CAFFE
INPUT_LAYOUT = 'chw'                        # input data layout, chw or hwc
NETWORK_INPUTSHAPE = [16, 1, 40, 61]        # input shape of your network
CALIBRATION_BATCHSIZE = 16                  # batchsize of calibration dataset
EXECUTING_DEVICE = 'cuda'                   # 'cuda' or 'cpu'.
REQUIRE_ANALYSE = True
DUMP_RESULT = False

SETTING = UnbelievableUserFriendlyQuantizationSetting(
    platform = TARGET_PLATFORM, finetune_steps = 2500,
    finetune_lr = 1e-3, calibration = 'percentile',
    equalization = True, non_quantable_op = None)
dataloader = DataLoader(
    dataset=calibration_dataset,
    batch_size=32, shuffle=True)
quantized = quantize(
    working_directory=WORKING_DIRECTORY, setting=SETTING,
    model_type=MODEL_TYPE, executing_device=EXECUTING_DEVICE,
    input_shape=NETWORK_INPUTSHAPE, target_platform=TARGET_PLATFORM,
    dataloader=dataloader, calib_steps=250)

Problem description:

At iteration 213 the Shape operator raised the error above. I worked out that this iteration's batch size was 19, and logging inside the dataloader iterator confirmed that this finetune batch really delivered only 19 samples. It then turned out that the dataset is exhausted exactly at iteration 213.
After I changed both finetune_steps and calib_steps to 100 and resized the calibration set to 32*100 samples, everything ran fine.
Model file below:
model.zip
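A likely workaround (an assumption, not an official fix): make the loader's batch size match CALIBRATION_BATCHSIZE (the config above declares 16 but builds the loader with batch_size=32) and drop the ragged final batch, so every calibration/finetune step sees the fixed batch size baked into the graph:

dataloader = DataLoader(
    dataset=calibration_dataset,
    batch_size=CALIBRATION_BATCHSIZE, shuffle=True,
    drop_last=True)  # discards the 19-sample remainder batch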

Can a standalone BN be quantized?

Emmm, sorry to bother you again. While testing a model I ran into this subgraph:

...--->Add--->BN--->ReLU--->...

After quantization I checked the JSON file, and this BN appears not to have been quantized. Can a BN be quantized on its own, or should the model simply not contain a standalone BN like this?

TensorFlow graph support

Hi, a quick question: PPQ mainly targets torch and the corresponding ONNX models. To quantize a TensorFlow model (pb), should it be converted to ONNX first, or do you have other suggestions?

Quantized model converted to a TensorRT INT8 engine: inference does not align

Hello,
I am using PPQ quantization to produce a TensorRT INT8 model. For larger models, converting the QDQ ONNX model to a TRT INT8 engine seems to have an accuracy problem (the outputs cannot be aligned). Concretely, a small model such as mnist aligns well (errors on the order of 1e-7), while a slightly larger one such as resnet50 shows a substantial error.
I am not sure whether my procedure is at fault; for now I lean toward the TensorRT conversion introducing the error, so I filed an issue in the TensorRT repo, see NVIDIA/TensorRT#2103.
Has anyone run into something similar? Thanks!

RuntimeError: Error building extension 'PPQ_Cuda_Impls': [1/6] :/usr/local/cuda-10.2/bin/nvcc

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

warnings.warn(WRONG_COMPILER_WARNING.format(
Traceback (most recent call last):
File "/root/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
subprocess.run(
File "/root/anaconda3/envs/pytorch/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/anaconda3/envs/pytorch/lib/python3.8/site-packages/ppq-0.6.5-py3.8.egg/ppq/core/ffi.py", line 16, in <module>
CUDA_EXTENTION = load(
File "/root/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1124, in load
return _jit_compile(
File "/root/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1337, in _jit_compile
_write_ninja_file_and_build_library(
File "/root/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1449, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/root/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'PPQ_Cuda_Impls':
[1/6] :/usr/local/cuda-10.2/bin/nvcc -DTORCH_EXTENSION_NAME=PPQ_Cuda_Impls -DTORCH_API_INCLUDE_EXTENSION_H ... --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++14 -c /root/anaconda3/envs/pytorch/lib/python3.8/site-packages/ppq-0.6.5-py3.8.egg/ppq/csrc/cuda/linear.cu -o linear.cuda.o
FAILED: linear.cuda.o
/bin/sh: :/usr/local/cuda-10.2/bin/nvcc: No such file or directory
[2/6] and [3/6] run the same nvcc command for csrc/cuda/sort.cu and csrc/cuda/train.cu and fail identically.
[4/6] c++ -MMD -MF export.o.d -DTORCH_EXTENSION_NAME=PPQ_Cuda_Impls ... -fPIC -std=c++14 -O3 -c /root/anaconda3/envs/pytorch/lib/python3.8/site-packages/ppq-0.6.5-py3.8.egg/ppq/csrc/export.cc -o export.o
FAILED: export.o
c++: error: unrecognized command line option '-std=c++14'
[5/6] csrc/cpu/hist_mse.cc fails with the same compiler error.
ninja: build stopped: subcommand failed.
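The log shows two separate problems: the nvcc path begins with a stray colon (:/usr/local/cuda-10.2/bin/nvcc, typically from a malformed CUDA_HOME or PATH entry), and the system c++ is too old to accept -std=c++14. A sketch of an environment fix, assuming CUDA 10.2 and a g++ >= 5 are installed:

import os
# Point the JIT build at a valid CUDA toolkit and compiler before importing
# ppq; torch.utils.cpp_extension reads CUDA_HOME and CXX from the environment.
os.environ['CUDA_HOME'] = '/usr/local/cuda-10.2'   # no leading ':'
os.environ['CXX'] = 'g++'                          # a compiler that accepts -std=c++14
import ppq                                         # triggers the CUDA extension build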

How should TargetPlatform be understood?

First of all, thank you for open-sourcing your contribution to quantization!
Let me state my understanding: this variable names the platform I am deploying to. On that reading, CUDA / NXP / DSP / SNPE and so on are different vendors' hardware, and the trailing data type FP32 / FP16 / INT8 / INT4 could even be viewed as different sub-platforms. The one thing that confuses me: why distinguish PPL_CUDA_INT8 from TRT_INT8 when both target the GPU? Is the implied difference that one uses PPQ's built-in quantization algorithm and directly produces an engine for GPU deployment, while the other relies on TensorRT's own quantization algorithm?
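For reference, the available targets can be enumerated straight from the enum (the exact member set depends on your PPQ version):

from ppq import TargetPlatform

for platform in TargetPlatform:  # e.g. PPL_CUDA_INT8, TRT_INT8, NXP_INT8, ...
    print(platform.name)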

Bug Report

Checking out commit 08dc0f8b10ecc8f41e52d7a0d4e7b5dc89a92f66 produces an error:

2022-06-05 18:02:30,982 - mmdeploy - ERROR - name 'NCNNRequantizePass' is not defined
2022-06-05 18:02:30,982 - mmdeploy - ERROR - onnx2ncnn_quant_table failed.

Commit 54c0e3f6f7f469a1a184f54c8c565d93777c6e74 is fine. Please add more CI.

slice op error when axis = -1

Quantizing an ONNX model with ppq fails with:

"ppq/executor/op/torch/default.py" , line 959
    new_axes = [ x if x >= 0 else len(data.dim()) + x for x in axes]
TypeError: object of type 'int' has no len()

It seems to be triggered when the Slice axis is -1.
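A plausible fix (a sketch: data.dim() already returns the rank as an int, so wrapping it in len() is the bug, and a bare int axes also needs handling):

if isinstance(axes, int):      # a single axis may arrive as a bare int
    axes = [axes]
new_axes = [x if x >= 0 else data.dim() + x for x in axes]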

A question about SNPE quantization

As long as the conversion to ONNX succeeds, can PPQ quantize the model normally regardless of whether the original model was Caffe or PyTorch?

A bug when exporting files

Exporting with platform=TargetPlatform.ONNXRUNTIME produces the error below, while platform=TargetPlatform.ONNX does not:

Quantization finished, generating target files:
Traceback (most recent call last):
File "/tmp/pycharm_project_280/ppq/programeetrance.py", line 188, in
config_save_to=os.path.join(WORKING_DIRECTORY, 'quant_cfg.json'))
File "/home/wyy/anaconda3/envs/py_37/lib/python3.7/site-packages/ppq/api/interface.py", line 610, in export_ppq_graph
exporter.export(file_path=graph_save_to, config_path=config_save_to, graph=graph, **kwargs)
File "/home/wyy/anaconda3/envs/py_37/lib/python3.7/site-packages/ppq/parser/onnxruntime_exporter.py", line 366, in export
graph = self.prepare_graph(graph)
File "/home/wyy/anaconda3/envs/py_37/lib/python3.7/site-packages/ppq/parser/onnxruntime_exporter.py", line 361, in prepare_graph
quant_param_to_int=quant_parameter_to_int)
File "/home/wyy/anaconda3/envs/py_37/lib/python3.7/site-packages/ppq/parser/onnxruntime_exporter.py", line 315, in convert_operation
graph=graph, var=var, config=config, related_op=op, meta=meta)
File "/home/wyy/anaconda3/envs/py_37/lib/python3.7/site-packages/ppq/parser/onnxruntime_exporter.py", line 74, in insert_quant_on_variable
scale = convert_any_to_torch_tensor(config.scale.clone(), dtype=torch.float32)
AttributeError: 'NoneType' object has no attribute 'clone'

A problem when loading data

ValueError: cannot reshape array of size 256269 into shape (1,3,480,480)
The images are 800*1280 when prepared, and this error appears while loading the data. How can it be solved?
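A preprocessing sketch (an assumption about the intended pipeline): decode and resize each image to the network's 480x480 input before reshaping to NCHW, rather than reshaping the raw buffer directly:

import cv2
import numpy as np

img = cv2.imread('sample.jpg')              # HWC, BGR, e.g. 800x1280
img = cv2.resize(img, (480, 480))           # -> (480, 480, 3)
img = img.transpose(2, 0, 1)[None]          # -> (1, 3, 480, 480)
img = np.ascontiguousarray(img, dtype=np.float32)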

Export problem after PPL_DSP_INT8 quantization

In GPU mode, quantizing RetinaFace (with a ResNet50 backbone) runs to completion, but export fails with TypeError: Cannot convert Resize_133 to caffe op. Debugging shows the check at line 439 of ppq/parser/caffe/caffe_export_utils.py is not satisfied.

ValueError: Input at [1] of Operation [ScatterND_270] deploy with incorrect device cuda:0

Compling CUDA Kernels. Please wait...
Traceback (most recent call last):
File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\ppq\executor\torch.py", line 366, in __forward
outputs = operation_forward_func(operation, inputs, self._executing_context)
File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\ppq\executor\op\torch\default.py", line 1421, in ScatterND_forward
ASSERT_ALL_TENSORS_AT_CPU(op=op, values=[None, values[1], None])
File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\ppq\executor\op\torch\base.py", line 37, in ASSERT_ALL_TENSORS_AT_CPU
f'Input at [{idx}] of Operation [{op.name}] deploy with incorrect device {tensor.device}, '
ValueError: Input at [1] of Operation [ScatterND_270] deploy with incorrect device cuda:0, which is not supposed to happen in PPQ execution system. This is a critical system failure, you can set ppq.core.config.force_convert as True to force convert those values, which might be able to continue executing your graph. YOU ARE RECOMMEND TO REPORT THIS FAILURE TO US.

Seems that the Squeeze node of an ONNX input model must have an axes attribute, is it a bug or just a feature?

I am using an ONNX model that contains a Squeeze operator without the axes attribute as quantize_onnx_model's input. It failed; the error messages are:

Squeeze_126(TargetPlatform.FP32) - inputs:['501'], outputs:['502']
Traceback (most recent call last):
  File "/Users/wusongchao/code/ppq/ppq/executor/torch.py", line 366, in __forward
    outputs = operation_forward_func(operation, inputs, self._executing_context)
  File "/Users/wusongchao/code/ppq/ppq/executor/op/torch/default.py", line 683, in Squeeze_forward
    [squeezing_tensor], axes = values, GET_ATTRIBUTE_FROM_OPERATION(op=op, attribute='axes', compulsive=True)
  File "/Users/wusongchao/code/ppq/ppq/executor/op/torch/base.py", line 78, in GET_ATTRIBUTE_FROM_OPERATION
    'However this value is missing from currecnt operation.')
KeyError: ('Operation Squeeze_126 is supposed to have a value of attribute axes. ', 'However this value is missing from currecnt operation.')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ppq-entrance.py", line 67, in <module>
    device=DEVICE, verbose=0)
  File "/Users/wusongchao/code/ppq/ppq/core/defs.py", line 54, in _wrapper
    return func(*args, **kwargs)
  File "/Users/wusongchao/code/ppq/ppq/api/interface.py", line 274, in quantize_onnx_model
    collate_fn=collate_fn
  File "/Users/wusongchao/code/ppq/ppq/core/defs.py", line 54, in _wrapper
    return func(*args, **kwargs)
  File "/Users/wusongchao/code/ppq/ppq/quantization/quantizer/base.py", line 61, in quantize
    executor.tracing_operation_meta(inputs=inputs)
  File "/Users/wusongchao/.pyenv/versions/3.7.11/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/Users/wusongchao/code/ppq/ppq/core/defs.py", line 54, in _wrapper
    return func(*args, **kwargs)
  File "/Users/wusongchao/code/ppq/ppq/executor/torch.py", line 433, in tracing_operation_meta
    hooks=hooks)
  File "/Users/wusongchao/code/ppq/ppq/executor/torch.py", line 394, in __forward
    raise RuntimeError(f'Error happens when dealing with operation {str(operation)}')
RuntimeError: Error happens when dealing with operation Squeeze_126(TargetPlatform.FP32) - inputs:['501'], outputs:['502']

So I was led to the definition of Squeeze_forward; the documentation there claims that axes is an optional field (the same as the ONNX IR doc).

def Squeeze_forward(op: Operation, values: List[torch.Tensor],
                    ctx: TorchBackendContext = None, **kwargs) -> torch.Tensor:
    """Remove single-dimensional entries from the shape of a tensor.
    Takes an input axes with a list of axes to squeeze. If axes is not
    provided, all the single dimensions will be removed from the shape.
    If an axis is selected with shape entry not equal to one, an error
    is raised.

    Inputs (1 - 2)
        data (differentiable) : T
            Tensors with at least max(dims) dimensions.
        axes (optional, non-differentiable) : tensor(int64)
            List of integers indicating the dimensions to squeeze.
            Negative value means counting dimensions from the back.
            Accepted range is [-r, r-1] where r = rank(data).
    Outputs
        squeezed (differentiable) : T
            Reshaped tensor with same data as input.

    Args:
        op (Operation): [description]
        input_values (List[torch.Tensor]): [description]
    Returns:
        torch.Tensor: [description]
    """

However, the implementation calls GET_ATTRIBUTE_FROM_OPERATION with compulsive=True. Since the Squeeze operator in my model does not contain the axes attribute, it throws the exception pasted at the beginning.

ASSERT_ALL_TENSORS_AT_SAME_DEVICE(op=op, values=values)
ASSERT_NUM_OF_INPUT(op=op, values=values, min_num_of_input=1, max_num_of_input=2)
[squeezing_tensor], axes = values, GET_ATTRIBUTE_FROM_OPERATION(
    op=op, attribute='axes', compulsive=True)
if isinstance(axes, list):
    for squeezing_dim in sorted(axes, reverse=True):
        squeezing_tensor = torch.squeeze(squeezing_tensor, squeezing_dim)
elif isinstance(axes, int):
    squeezing_tensor = torch.squeeze(squeezing_tensor, axes)
else: raise TypeError(f'Parameter axes of operation {op.name} misunderstood, '
                      f'expect int value of list of int, while {type(axes)} was given.')
return squeezing_tensor

So I wonder: is making the axes field of the Squeeze operator mandatory an intended feature, or just a bug?
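If the ONNX spec quoted above is to be honored, a sketch of the non-compulsive variant (assuming GET_ATTRIBUTE_FROM_OPERATION returns a default when compulsive=False):

axes = GET_ATTRIBUTE_FROM_OPERATION(
    op=op, attribute='axes', compulsive=False, default=None)
if axes is None:
    # No axes given: squeeze all singleton dimensions, per the ONNX spec.
    return torch.squeeze(squeezing_tensor)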

How do I use quantization-aware training (QAT)?

Hi, thank you very much for this excellent open-source project. I have been studying it recently and looked through the docs, but could not find anything on quantization-aware training (possibly I just missed it). If I did miss it, could you point me to the right document? And if it genuinely is not there yet, I would love to hear your thoughts. Thanks.

Error when executing on CPU

Me again. I tried running efficientnet-lite4-11.onnx from the ONNX model zoo on CPU and hit an error: with the kl or mse calibration policy, the assert at line 582 of quantization/optim/refine.py fires, saying some operator was not properly quantized. With the minmax policy the problem does not occur. All of this was on CPU (no GPU on my side, hhhh).

Can I fix this error by changing some code, or am I limited to the minmax policy on CPU for now?

The calibration pass easily falls over on batched input when an ONNX model's reshape / flatten (and other dimension-changing) operators have fixed dimension parameters

This is probably a fairly general problem: some ONNX models handed over for deployment, whether through shape inference, simplification, or something else, end up with the shape inputs of operators such as Reshape fixed to constant dimensions. As a result, during the calibration pass the torch executor falls over as soon as the input is batched.

Take this model as an example: the shape of Reshape_71 is fixed (screenshot omitted).

Unsurprisingly, with batch_size = 32 the incoming 1*2*48*34*60*32 = 6266880 elements cannot be reshaped to [1,2,48,34,60].

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/ppq/executor/torch.py", line 359, in __forward
    outputs = operation_forward_func(operation, inputs, self._executing_contenxt)
  File "/usr/local/lib/python3.6/dist-packages/ppq/executor/op/torch/default.py", line 458, in Reshape_forward
    return data.reshape(shape)
RuntimeError: shape '[1, 2, 48, 34, 60]' is invalid for input of size 6266880

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ppq-entrance.py", line 66, in <module>
    device=DEVICE, verbose=0)
  File "/usr/local/lib/python3.6/dist-packages/ppq/core/defs.py", line 65, in _wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ppq/api/interface.py", line 267, in quantize_onnx_model
    collate_fn=collate_fn
  File "/usr/local/lib/python3.6/dist-packages/ppq/core/defs.py", line 65, in _wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ppq/quantization/quantizer/base.py", line 74, in quantize
    **kwargs
  File "/usr/local/lib/python3.6/dist-packages/ppq/quantization/optim/base.py", line 95, in optimize
    optimization_pass.apply(processer=processer, dataloader=dataloader, executor=executor, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ppq/quantization/optim/base.py", line 30, in apply
    self.optimize(processer, dataloader=dataloader, executor=executor, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ppq/core/defs.py", line 65, in _wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ppq/quantization/optim/calibration.py", line 117, in optimize
    executor=executor, hooks=hooks, output_names=None)
  File "/usr/local/lib/python3.6/dist-packages/ppq/quantization/optim/calibration.py", line 59, in calibrate
    output_names=output_names)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ppq/executor/torch.py", line 231, in forward
    hooks=hooks
  File "/usr/local/lib/python3.6/dist-packages/ppq/executor/torch.py", line 387, in __forward
    raise RuntimeError(f'Error happens when dealing with operation {str(operation)}')
RuntimeError: Error happens when dealing with operation Reshape_71(TargetPlatform.FP32) - inputs:['708', '1894'], outputs:['720']
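One workaround (a sketch under the assumption that the shape input is a graph initializer): rewrite the baked-in batch dimension to -1 before handing the model to PPQ, so Reshape accepts any batch size:

import onnx
from onnx import numpy_helper

model = onnx.load('model.onnx')
for init in model.graph.initializer:
    if init.name == '1894':                      # shape input of Reshape_71 above
        shape = numpy_helper.to_array(init).copy()
        shape[0] = -1                            # let the batch dimension float
        init.CopyFrom(numpy_helper.from_array(shape, init.name))
onnx.save(model, 'model_dynamic.onnx')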

The Upsample operator seems not to support quantization; ConvTranspose seems unable to complete BN folding

a. Running one ONNX test model reports that the Upsample operator has no backend on the target platform; PPQ apparently does not support quantizing Upsample yet. Might it be supported later?

b. Running another ONNX test model, which contains this subgraph:
...-->ConvTranspose-->BatchNorm-->ReLU-->...
the error says the ConvTranspose operator cannot be folded with the BN.

c. And one more question: if the graph is
...-->BatchNorm-->Conv-->ReLU-->...
can that be folded?

Graph output of export_ppq_graph is not quantized

Hello, great work!

I try to quantize some ONNX models with ppq. The ONNX model output from export_ppq_graph is not quantized at all, but I found some key quantization parameters, such as scales and zero points for different layers, saved in the JSON file. Is this a bug or a feature of PPQ? How can I get the quantized ONNX model?

'utf-8' codec can't decode byte 0xb4 in position 2833

Hi, I tried this on my own machine with a GPU and hit this error at line 28 of ppq\core\ffi.py: 'utf-8' codec can't decode byte 0xb4 in position 2833. I have debugged for a long time without getting anywhere; what could cause this?

I am using Visual Studio 2019.

A small problem in Resize forward

In Resize_forward in ppq/executor/op/torch/default.py, if len(values) == 2 then scales = None, and the check if scales.numel() == 1 will then crash.
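A guard along these lines would avoid the crash (a sketch, not a verified patch):

# scales can legitimately be None when Resize receives only two inputs.
if scales is not None and scales.numel() == 1:
    scales = scales.item()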

Model inference time increases after INT8 quantization

My device is an i7-8750H.

Start Benchmark with openvino (Batchsize = 1)
Time span (FP32 MODE): 68.0568 sec
Time span (INT8 MODE): 85.6443 sec

I don't know what happened. How can I make inference faster with INT8 quantization in openvino?

Here are the download links for my onnx files:

Link: https://pan.baidu.com/s/1QUhs5wY1fsOVlzsCbsx7aw (extraction code: tna9)

Link: https://pan.baidu.com/s/1DHXTRxBGcPXpOAkPqZo0BQ (extraction code: tc2j)

Thank you!

A discussion of the QuantizationStates.PASSIVE_INIT quantization setting

I would like to ask what this setting is for: base_quant_config.input_quantization_config[-1].state = QuantizationStates.PASSIVE_INIT

When this statement is in effect (PASSIVE_INIT active), the quantization noise introduced at the end is very severe; without it, there is almost no quantization noise. Why is that?

About the inference of quantized operators

I would like to know how the forward pass of a quantized operator is computed, and where I can find that code. For example, a uint8 conv forward? I do not see a corresponding implementation in conv_forward() in ppq/executor/op/torch/default.py.
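As far as the execution engine goes, PPQ simulates quantization rather than running true integer kernels: operators such as Conv execute in floating point, and quantization error is injected by quantize-dequantize ("fake quant") steps applied to their tensors, which is why conv_forward() contains no uint8 arithmetic. A minimal sketch of such a step (an illustration, not PPQ's exact code):

import torch

def fake_quantize(x: torch.Tensor, scale: float, zero_point: int,
                  qmin: int = -128, qmax: int = 127) -> torch.Tensor:
    # Round to the integer grid, clamp to the dynamic range, then map back
    # to floating point; the rounding/clamping error is the quantization noise.
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale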
