
paddleslim's Introduction

PaddleSlim

PaddleSlim is a toolkit focused on deep learning model compression. It provides compression strategies such as low-bit quantization, knowledge distillation, sparsification, and neural architecture search, helping developers quickly shrink their models.

News

  • 🔥 2023.01.18: Released the YOLOv8 auto compression example; quantized inference is 2.5x faster.

  • [Live session] 2022-12-13 20:30, "Auto Compression in Depth with a Hands-on ViT Example"; scan the WeChat QR code to register.

2022.08.16: Auto compression upgrade

Model | Base mAP (val, 0.5:0.95) | ACT INT8 mAP (val, 0.5:0.95) | Size Compression | FP32 Latency | INT8 Latency | Speedup
PPYOLOE-s | 43.1 | 42.6 | 3.9x | 6.51 ms | 2.12 ms | 3.1x
YOLOv5s | 37.4 | 36.9 | 3.8x | 5.95 ms | 1.87 ms | 3.2x
YOLOv6s | 42.4 | 41.3 | 3.9x | 9.06 ms | 1.83 ms | 5.0x
YOLOv7 | 51.1 | 50.9 | 3.9x | 26.84 ms | 4.55 ms | 5.9x
YOLOv7-Tiny | 37.3 | 37.0 | 3.9x | 5.06 ms | 1.68 ms | 3.0x
Release History
  • 2022.07.01: Released v2.3.0

    • Released the auto compression (ACT) feature; a usage sketch follows this list.
      • Code-free compression: developers only need to provide the inference model files and data to run post-training quantization (PTQ), quantization-aware training (QAT), sparse training, and other compression tasks.
      • Automatic strategy selection based on the task and the deployment environment: automatically searches for a suitable post-training quantization method and for the best combination of compression strategies.
      • Released auto compression examples for three areas: natural language processing, image semantic segmentation, and image object detection.
      • Released auto compression recipes for X2Paddle models: YOLOv5, YOLOv6, YOLOv7, HuggingFace, MobileNet.
    • Upgraded quantization
      • Unified the quantized model format; post-training quantization now supports the while op; fixed slow quantization-aware training on large BERT models.
      • Added 7 post-training quantization methods, including HIST, AVG, EMD, Bias Correction, and AdaRound.
    • Added support for semi-structured sparse training
    • Added a latency estimation tool
      • Estimates the performance of sparsified and low-bit quantized models; estimates the inference performance of a given model in a specific deployment environment (ARM CPU + Paddle Lite); provides estimation interfaces for SD625, SD710, and RK3288 chips with Paddle Lite.
      • Provides a deployment-environment auto-extension tool that can add latency estimators for more ARM CPU devices.
  • 2021.11.15: Released v2.2.0

    • Added dygraph post-training quantization.
  • 2021.5.20: Released v2.1.0

    • Extended post-training quantization methods
    • Added unstructured sparsity
    • Enhanced pruning
    • Fixed several OFA bugs

For more information, see the release notes.
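Below is a minimal sketch of driving the auto compression (ACT) feature mentioned above. The entry point paddleslim.auto_compression.AutoCompression and the argument names follow the ACT examples but are not a verified signature; the file names, config path, and data loader are hypothetical, so consult the examples in the repository for the exact interface:

from paddleslim.auto_compression import AutoCompression

# Hedged sketch of code-free auto compression; names below are assumptions.
ac = AutoCompression(
    model_dir="./inference_model",      # directory holding the inference model
    model_filename="model.pdmodel",     # hypothetical model file name
    params_filename="model.pdiparams",  # hypothetical params file name
    save_dir="./act_output",            # where the compressed model is written
    config="./act_config.yaml",         # compression strategy configuration
    train_dataloader=train_loader)      # small calibration/training loader
ac.compress()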

Overview of Basic Compression Features

PaddleSlim supports the features below, as well as user-defined quantization, pruning, and other strategies.

Quantization / Pruning / NAS / Distillation

Notes:

  • * indicates static-graph support only; ** indicates dygraph support only
  • Sensitivity-based pruning determines each convolution layer's pruning ratio through per-layer sensitivity analysis; it must be combined with other pruning methods (a minimal Pruner usage sketch follows).
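The sketch below shows the Pruner API as it is used in the issue reports later on this page; it assumes train_program is a built static-graph Program, and the parameter name is a hypothetical conv weight name from your own model:

import paddle.fluid as fluid
from paddleslim.prune import Pruner

place = fluid.CPUPlace()               # or fluid.CUDAPlace(0)
pruner = Pruner()
pruned_program = pruner.prune(
    train_program,                     # the static-graph Program to prune
    fluid.global_scope(),              # scope holding the parameter tensors
    params=["conv1_weights"],          # hypothetical conv weight name(s)
    ratios=[0.3],                      # prune 30% of the channels
    place=place,
    only_graph=False)[0]               # prune() returns a tuple; program first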

PaddleSlim has applied model compression to typical computer vision and natural language processing tasks and measured the speedups on NVIDIA GPUs, ARM chips, and other devices. Some results are shown below; for detailed recipes, see the CV and NLP model compression sections that follow:


[table image] Table 1: Compression and speedup results for selected scenarios

Notes:
  • YOLOv3: 3.55x speedup on a mobile SD855.
  • PP-OCR: model size reduced from 8.9 MB to 2.9 MB; 1.27x speedup on SD855.
  • BERT: parameters reduced from 110M to 80M with an accuracy gain; 1.47x speedup with FP16 on a Tesla T4 GPU.

Results of Different Compression Methods

Auto Compression Results

[image] Table 3: Auto compression results

Post-Training Quantization Comparison

[image] Table 2: Comparison of post-training quantization methods

Installation

Install the released version:

pip install paddleslim

Install the develop version:

git clone https://github.com/PaddlePaddle/PaddleSlim.git && cd PaddleSlim
python setup.py install
  • Verify the installation: enter a Python interpreter with python or python3 and run import paddleslim; if no error is raised, the installation succeeded (see the snippet after the version table below).
  • Version alignment:

PaddleSlim | PaddlePaddle | PaddleLite
2.0.0 | 2.0 | 2.8
2.1.0 | 2.1.0 | 2.8
2.1.1 | 2.1.1 | >=2.8
2.3.0 | 2.3.0 | >=2.11
2.4.0 | 2.4.0 | >=2.11
develop | develop | >=2.11
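As mentioned in the verification step above, a minimal check that the install works; the __version__ attribute is an assumption (present in most Python packages, but it may vary by release):

import paddleslim           # no error on import means the installation succeeded
print(paddleslim.__version__)  # assumed version attribute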

Documentation and Tutorials

Model Compression Techniques

Quick Start

More Tutorials

The advanced tutorials walk through every step in detail to help you transfer each method to your own models.

Inference Deployment

CV Model Compression

Results Across Scenarios

All tutorials in this series compress models from the official Paddle model suites; if you are not a model-suite user, the quick start and advanced tutorials are recommended instead.

NLP Model Compression

API Documentation

1. Why doesn't the model shrink after quantization-aware training or post-training quantization?

A: The saved weights fall within the int8 range but are stored as float32. Paddle's default training-forward kernels have no INT8 implementations; only Paddle Inference with TensorRT supports accelerated quantized inference. So that quantization accuracy can be validated with the Paddle training forward pass, the model stays loadable there and weights are saved as float32 by default, which is why the file size does not change.
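A small NumPy illustration of the point above, independent of Paddle: values restricted to the int8 range still occupy 4 bytes each when stored as float32:

import numpy as np

w = np.round(np.random.uniform(-127, 127, size=1000))  # int8-range values
as_float32 = w.astype(np.float32)  # how the quantized weights are saved by default
as_int8 = w.astype(np.int8)        # what true INT8 storage would use
print(as_float32.nbytes, as_int8.nbytes)  # 4000 vs. 1000 bytes: a 4x difference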

2. On macOS with Python 3.9, or on Windows, installation fails with "command 'swig' failed"

A: See #1258

License

This project is released under the Apache 2.0 license.

Contributing

Contributions to PaddleSlim are very welcome, and we greatly appreciate your feedback.

Community

  • If you find any problem in PaddleSlim or have a suggestion, feel free to file it via GitHub Issues.

  • You are welcome to join the PaddleSlim WeChat group for technical discussion.

paddleslim's People

Contributors

aurelius84, baiyfbupt, ceci3, faninsm, gushiqiao, heavengate, huangxu96, iamwhtwd, itminner, juncaipeng, ldoublev, leiqing1, lidanqing-intel, lijianshe02, littletomatodonkey, liuchiachi, lizexu123, minghaobd, moneypi, qingqing01, rachelxu7, slf12, wanghaoshuang, xgzhang11, xiaoluomi, xiteng1988, yghstill, yukavio, zhanghandi, zzjjay


paddleslim's Issues

transformer distilling error

I tried distillation on the transformer; after training two batches it raised the following error:
Error: Tensor holds no memory. Call Tensor::mutable_data first.
[Hint: holder_ should not be null.] at (/paddle/paddle/fluid/framework/tensor.cc:23)
[operator < elementwise_div > error]

Is this caused by running out of memory?

quant_post quantization of yolov3 fails with KeyError: 'stage.3.7.0.conv.weights'

CPU: 8 cores
RAM: 32 GB
GPU: V100
GPU memory: 16 GB
Disk: 100 GB
Environment:
Python version: python3.7
Framework version: PaddlePaddle 1.7.0
@slf12
aistudio@jupyter-115786-193843:~/post_training_quantization_withdata$ sh run_post_training_quanzation.sh
-------------------args----------------------
algo: KL
batch_nums: 20
batch_size: 3000
is_full_quantize: False
model_dir: ../work/PaddleDetection_1/yolov3_dark_freeze/mj_yolov3_darknet
model_filename: None
params_filename: None
save_model_path: yolov3_int8_model
use_gpu: True

W0314 12:14:31.721148 637 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W0314 12:14:31.725704 637 device_context.cc:245] device: 0, cuDNN Version: 7.3.
2020-03-14 12:14:34,203-INFO: all run batch: 0
2020-03-14 12:14:34,203-INFO: all run batch: 0
2020-03-14 12:14:34,203-INFO: calculate scale factor ...
2020-03-14 12:14:34,203-INFO: calculate scale factor ...
Traceback (most recent call last):
File "post_training_quantization.py", line 66, in
batch_nums=10)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim/quant/quanter.py", line 306, in quant_post
post_training_quantization.quantize()
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/contrib/slim/quantization/post_training_quantization.py", line 231, in quantize
self._calculate_scale_factor()
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/contrib/slim/quantization/post_training_quantization.py", line 353, in _calculate_scale_factor
data = self._sampling_data[var_name]
KeyError: 'stage.3.7.0.conv.weights'

model_size calculation in analysis

For BatchNorm layers, this calculation counts not only scale and offset but also mean and variance as parameters. The latter two are never touched by the optimizer and are not learned; should they really be counted as parameters?

Error running sa_nas_mobilenetv2 on AI Studio

The runtime output is as follows:
aistudio@jupyter-7623-23204:~/work/PaddleSlim/demo/nas$ python sa_nas_mobilenetv2.py --class_dim 10 --lr 0.01
Namespace(batch_size=256, class_dim=10, data='cifar10', is_server=True, lr=0.01, search_steps=100, use_gpu=True)
2020-01-06 16:57:17,903-INFO: range table: ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 5, 8, 6, 2, 5, 8, 6, 2, 5, 8, 6, 2, 5, 10, 6, 2, 5, 10, 6, 2, 5, 12, 6, 2])
2020-01-06 16:57:17,903-INFO: ControllerServer - listen on: [172.25.33.199:8989]
2020-01-06 16:57:17,904-INFO: Controller Server run...
Traceback (most recent call last):
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py", line 45, in convert_to_list
value_list = list(value)
TypeError: 'numpy.int64' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "sa_nas_mobilenetv2.py", line 318, in
search_mobilenetv2(config, args, image_size, is_server=args.is_server)
File "sa_nas_mobilenetv2.py", line 92, in search_mobilenetv2
train_program, startup_program, image_shape, archs, args)
File "sa_nas_mobilenetv2.py", line 49, in build_program
output = archs(data)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim-0.1-py3.7.egg/paddleslim/nas/search_space/mobilenetv2.py", line 186, in net_arch
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim-0.1-py3.7.egg/paddleslim/nas/search_space/mobilenetv2.py", line 311, in _invresi_blocks
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim-0.1-py3.7.egg/paddleslim/nas/search_space/mobilenetv2.py", line 271, in _inverted_residual_unit
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim-0.1-py3.7.egg/paddleslim/nas/search_space/base_layer.py", line 52, in conv_bn_layer
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 2721, in conv2d
filter_size = utils.convert_to_list(filter_size, 2, 'filter_size')
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py", line 49, in convert_to_list
value))
ValueError: The filter_size's type must be list or tuple. Received: 3

How should I handle this? Thanks.
Also, block_sa_nas_mobilenetv2.py fails with the same error.

distilling error

Very frustrating: after debugging for a long time, it finally throws this:
Error: Tensor holds no memory. Call Tensor::mutable_data first.
[Hint: holder_ should not be null.] at (/paddle/paddle/fluid/framework/tensor.cc:23)
[operator < lookup_table_v2 > error]
I'm out of ideas. Searching the issues, it seems others have reported this too; I hope the maintainers will look into it.

name '_logger' is not defined

Training a model with quantization-aware training fails with:
Traceback (most recent call last):
File "train.py", line 734, in
main()
File "train.py", line 730, in main
train(args)
File "train.py", line 339, in train
test_prog, place, quant_config, scope=None, for_test=True)
File "/home/vis/duyuting/anaconda3/lib/python3.7/site-packages/paddleslim-1.0.0-py3.7.egg/paddleslim/quant/quanter.py", line 132, in quant_aware
NameError: name '_logger' is not defined
The environment is Paddle 1.6 with the PaddleSlim release given in a previous issue. I looked at quanter.py and indeed _logger is never defined there. Is this a bug?

Optimizer error during yolov3 pruning training

The main training code is as follows:

def train():

    logger.info("start train YOLOv3, train params:%s", str(train_parameters))

    logger.info("create place, use gpu:" + str(train_parameters['use_gpu']))

    logger.info("build network and program")

    place = fluid.CUDAPlace(0) if train_parameters['use_gpu'] else fluid.CPUPlace()
    exe = fluid.Executor(place)

    scope = fluid.Scope()
    train_program = fluid.Program()
    start_program = fluid.Program()
    test_program = fluid.Program()
    
    feeder, reader, loss = build_program_with_feeder(train_program, start_program, place)

    pred = build_program_with_feeder(test_program, start_program, istrain=False)
    
    test_program = test_program.clone(for_test=True)
    
    train_fetch_list = [loss.name]
    
    exe.run(start_program, scope=scope)
    
    load_pretrained_params(exe, train_program)
    
    if train_parameters['print_params']:
        param_delimit_str = '-' * 20 + "All parameters in current graph" + '-' * 20
        print(param_delimit_str)
        for block in train_program.blocks:
            for param in block.all_parameters():
                print("parameter name: {}\tshape: {}".format(param.name,
                                                             param.shape))
        print('-' * len(param_delimit_str))
    
    pruned_params = train_parameters['pruned_params'].strip().split(",")
    logger.info("pruned params: {}".format(pruned_params))
    pruned_ratios = [float(n) for n in train_parameters['pruned_ratios'].strip().split(",")]
    logger.info("pruned ratios: {}".format(pruned_ratios))
    
    logger.info("build executor and init params")
    
    pruner = Pruner()
    train_program = pruner.prune(
        train_program,
        scope,
        params=pruned_params,
        ratios=pruned_ratios,
        place=place,
        only_graph=False)[0]
    
    base_flops = flops(test_program)
    test_program = pruner.prune(
        test_program,
        scope,
        params=pruned_params,
        ratios=pruned_ratios,
        place=place,
        only_graph=True)[0]
    pruned_flops = flops(test_program)

    stop_strategy = train_parameters['early_stop']
    rise_limit = stop_strategy['rise_limit']

    min_loss = stop_strategy['min_loss']
    # stop_train = False
    rise_count = 0
    total_batch_count = 0
    current_best_f1 = 0.0
    train_temp_loss = 0
    current_best_pass = 0
    current_best_box_pass = 0
    current_best_recall = 0
    current_best_precision = 0
    current_best_box_recall = 0
    current_best_box_precision = 0
    current_best_box_f1 = 0
    for pass_id in range(train_parameters["num_epochs"]):
        logger.info("current pass: {}, start read image".format(pass_id))
        batch_id = 0
        total_loss = 0.0
        for batch_id, data in enumerate(reader()):
            t1 = time.time()
            loss = exe.run(train_program, feed=feeder.feed(data), fetch_list=train_fetch_list)
            period = time.time() - t1
            loss = np.mean(np.array(loss))
            total_loss += loss
            batch_id += 1
            total_batch_count += 1
            
            if batch_id % 200 == 0:
                logger.info("pass {}, trainbatch {}, loss {} time {}".format(pass_id,
                                                                             batch_id, loss, "%2.2f sec" % period))
        pass_mean_loss = total_loss / batch_id
        logger.info("pass {0} train result, current pass mean loss: {1}".format(pass_id, pass_mean_loss))

    logger.info("end training")`

#########################################################
##################### Error message:

Python Call Stacks (More useful to users):
------------------------------------------
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2594, in _prepend_op
    attrs=kwargs.get("attrs", None))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 5472, in autoincreased_step_counter
    attrs={'step': float(step)})
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/learning_rate_scheduler.py", line 48, in _decay_step_counter
    counter_name='@LR_DECAY_COUNTER@', begin=begin, step=1)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/learning_rate_scheduler.py", line 387, in piecewise_decay
    global_step = _decay_step_counter()
  File "train.py", line 265, in optimizer_momentum_setting
    learning_rate=fluid.layers.piecewise_decay(boundaries=boundaries, values=values),
  File "train.py", line 371, in get_loss
    optimizer = optimizer_momentum_setting()
  File "train.py", line 306, in build_program_with_feeder
    loss = get_loss(model, outputs, gt_box, gt_label, main_prog)
  File "train.py", line 403, in train
    feeder, reader, loss = build_program_with_feeder(train_program, start_program, place)
  File "train.py", line 544, in <module>
    train()

----------------------
Error Message Summary:
----------------------
InvalidArgumentError: The Tensor in the increment Op's Input Variable X(@LR_DECAY_COUNTER@) is not initialized.
  [Hint: Expected t->IsInitialized() == true, but received t->IsInitialized():0 != true:1.] at (/paddle/paddle/fluid/framework/operator.cc:1264)
  [operator < increment > error]

Quantized a yolov3_darknet model with quant_post; loading it for inference fails

Environment:
paddle1.7
Error message:
PaddleCheckError: OP(LoadCombine) fail to open file ..\yolov3_darknet_quant_924_params_, please check whether the mod
el file is complete or damaged. at [D:\1.6.1\paddle\paddle/fluid/operators/load_combine_op.h:46]

Quantization code:

def quantize():
    val_reader = mjreader.custom_reader(images_lists, data_dir, input_size, mode)
    place = fluid.CUDAPlace(0)
    exe = fluid.Executor(place)
    quant_post(
        executor=exe,
        model_dir='../work/PaddleDetection/yolov3_freeze/yolov3_darknet',
        quantize_model_path='./yolov3_darknet_quant_924/',
        sample_generator=val_reader,
        model_filename='__model__',
        params_filename='__params__',
        batch_size=16,
        batch_nums=20)

def main():
    quantize()

How should the model produced by quant_post for this yolov3_darknet model be loaded for inference?
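As general context for the question above, a sketch of loading a saved inference model with the Paddle 1.x API; note that model_filename and params_filename must match the file names quant_post actually wrote to the output directory (the error above suggests a mismatch), and whether the quantized program accelerates on a given device depends on the deployment backend:

import paddle.fluid as fluid

place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
# Load the quantized inference model saved under quantize_model_path.
infer_prog, feed_names, fetch_targets = fluid.io.load_inference_model(
    dirname='./yolov3_darknet_quant_924/',
    executor=exe,
    model_filename='__model__',
    params_filename='__params__')
# outputs = exe.run(infer_prog,
#                   feed={name: data for name, data in zip(feed_names, inputs)},
#                   fetch_list=fetch_targets)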

Installation error

Installing from cloned source fails with:

from paddle.fluid.contrib.slim.quantization import PostTrainingQuantization
ImportError: cannot import name 'PostTrainingQuantization'

Installing with pip succeeds, but running code then raises the error above.

Question about quantization

From the documentation on quantization configuration:
weight_bits (int) - number of bits for weight quantization; default 8, valid range 1-8. 8 is recommended because the quantized data type is int8.
activation_bits (int) - number of bits for activation quantization; default 8, valid range 1-8. 8 is recommended because the quantized data type is int8.
dtype (int8) - data type of the quantized parameters; default int8, and currently only int8 is supported.
If I set weight_bits to 7 and activation_bits to 7 and then train, is the resulting model equivalent to 7-bit quantization?
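For reference, a minimal sketch of a config dict with these fields, in the style passed to quant_aware elsewhere on this page; whether a 7-bit setting yields a true 7-bit deployment still depends on the inference backend:

# Hedged sketch of a quantization config using the documented fields.
quant_config = {
    'weight_bits': 7,      # 1-8; 8 recommended since the stored dtype is int8
    'activation_bits': 7,  # 1-8; 8 recommended
    'dtype': 'int8',       # currently the only supported quantized dtype
}
# quant_program = quant_aware(train_prog, place, quant_config, for_test=False)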

yolov3 post-training quantization error

CPU: 8 cores
RAM: 32 GB
GPU: V100
GPU memory: 16 GB
Disk: 100 GB
Environment:
Python version: python3.7
Framework version: PaddlePaddle 1.7.0
Code:

input_size=(3, 512, 512)
sys.path[0] = os.path.join(
    os.path.dirname("__file__"), os.path.pardir, os.path.pardir)

def quantize():
    val_reader = mjreader.custom_reader(images_lists, data_dir, input_size,mode)
    place = fluid.CUDAPlace(0) 
    exe = fluid.Executor(place)
    quant_post(
        executor=exe,
        model_dir='../work/PaddleDetection/yolov3_dark_freeze/mj_yolov3_darknet',
        quantize_model_path='./yolov3_darknet_quant/',
        sample_generator=val_reader,
        model_filename='__model__',
        params_filename='__params__',
        batch_size=16,
        batch_nums=20) 

Error Message Summary:

InvalidArgumentError: Input(ImgSize) dim[0] and Input(X) dim[0] should be same.
[Hint: Expected dim_imgsize[0] == dim_x[0], but received dim_imgsize[0]:4800 != dim_x[0]:16.] at (/paddle/paddle/fluid/operators/detection/yolo_box_op.cc:50)
[operator < yolo_box > error]

How do I fix an optimizer error when pruning with slim?

pruned_program, _, _ = pruner.prune(
    train_program, fluid.global_scope(),
    params=ratios.keys(), ratios=ratios.values(), place=place)

The error is: Error: Param and Velocity of MomentumOp should have the same dimension.
[Hint: Expected param_dim == ctx->GetInputDim("Velocity"), but received param_dim:12544, 4096 != ctx->GetInputDim("Velocity"):24832, 4096.] at (/paddle/paddle/fluid/operators/optimizers/momentum_op.h:79)
[operator < momentum > error]
It seems to say the parameter dimensions don't match? But pruning will of course reduce the number of parameters.

Pruning with slim: how do I get the names of the convolution layers?

Question 1:
Many of the official tutorials are command-line driven,
but many slim APIs take a program argument. How do I obtain it?
For example, pruning requires the convolution layer names first; how should I get them?
The saved trained model looks like this:
[screenshot]

Question 2:
Many slim APIs are static-graph based (judging from their parameters); is there a linked tutorial for using them with dygraph?
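For question 1, a minimal sketch of listing the parameter names (including conv weights) of a static-graph Program, following the same pattern used in the pruning script earlier on this page; train_program is assumed to be your built fluid.Program:

# Print every parameter name and shape in the Program.
for block in train_program.blocks:
    for param in block.all_parameters():
        print("parameter name: {}\tshape: {}".format(param.name, param.shape))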

NAS: how to export information about the searched model

I ran the example at https://github.com/PaddlePaddle/PaddleSlim/blob/release/1.0.1/docs/zh_cn/tutorials/image_classification_nas_quick_start.ipynb on the PaddleSlim platform and tried to export the network model and parameters, but hit a problem. The code does not define any layers other than the final fully connected layer; as I understand it, the remaining layers are described collectively by archs = sanas.next_archs()[0]. When I write the export statement fluid.io.save_inference_model(dirname=save_path, feeded_var_names=[data.name], target_vars=[predict], executor=exe), its target_vars=[predict] where predict = the other layers + the final fully connected layer, that is:
archs = sanas.next_archs()[0]
output = fluid.layers.fc(input=output, size=10)
predict = archs+output

If my understanding is correct, how should the export statement be written so that the whole network architecture is exported?

How to append float32 operator to quantized graph

from paddleslim.quant import quant_aware, convert
quantized_graph = convert(infer_prog, place, config=config)
quantized_program = quantized_graph.to_program()
for var in quantized_program.list_vars():
    print(var.name)

with fluid.program_guard(quantized_program):
    out = quantized_program.global_block().var("your_var_name")
    out = fluid.layers.some_op(out)

fluid.io.save_inference_model(main_program=quantized_program, ...)

Some API docs:

Does it support training on your own dataset?

Does paddleslim support training on a custom dataset? How are the weight files downloaded for yolov3_mobilenet_v1_voc_prune supposed to be used? Is there a tutorial or video?

mobilenetv2 deeplabv3+ pruning problem

Pruning mobilenetv2 deeplabv3+ with the latest paddleslim and paddleseg libraries, after configuring the pruning parameters I get the error below. What is the cause?
[screenshot]

Parameter[decoder/separable_conv1/pointwise/BatchNorm/beta] loaded sucessfully!
Parameter[decoder/separable_conv1/pointwise/BatchNorm/moving_mean] loaded sucessfully!
Parameter[decoder/separable_conv1/pointwise/BatchNorm/moving_variance] loaded sucessfully!
Parameter[decoder/separable_conv2/depthwise/weights] loaded sucessfully!
Parameter[decoder/separable_conv2/depthwise/BatchNorm/gamma] loaded sucessfully!
Parameter[decoder/separable_conv2/depthwise/BatchNorm/beta] loaded sucessfully!
Parameter[decoder/separable_conv2/depthwise/BatchNorm/moving_mean] loaded sucessfully!
Parameter[decoder/separable_conv2/depthwise/BatchNorm/moving_variance] loaded sucessfully!
Parameter[decoder/separable_conv2/pointwise/weights] loaded sucessfully!
Parameter[decoder/separable_conv2/pointwise/BatchNorm/gamma] loaded sucessfully!
Parameter[decoder/separable_conv2/pointwise/BatchNorm/beta] loaded sucessfully!
Parameter[decoder/separable_conv2/pointwise/BatchNorm/moving_mean] loaded sucessfully!
Parameter[decoder/separable_conv2/pointwise/BatchNorm/moving_variance] loaded sucessfully!
Parameter[logit/weights] loaded sucessfully!
Parameter[logit/biases] loaded sucessfully!
332/332 pretrained parameters loaded successfully!
Traceback (most recent call last):
File "./slim/prune/train_prune.py", line 504, in
main(args)
File "./slim/prune/train_prune.py", line 491, in main
train(cfg)
File "./slim/prune/train_prune.py", line 347, in train
only_graph=False)[0]
File "/home/hpc/ccx/paddle1.7/PaddleSlim/paddleslim/prune/pruner.py", line 82, in prune
param_t = np.array(scope.find_var(param).get_tensor())
AttributeError: 'NoneType' object has no attribute 'get_tensor'

@wanghaoshuang

What exactly is the quantization method used for weights in post-training quantization?

Original text: the goal of post-training quantization is to determine the quantization scale factor, for which there are two main methods: non-saturating quantization (No Saturation) and saturating quantization (Saturation). The non-saturating method computes abs_max, the maximum absolute value in the FP32 tensor, and maps it to 127, so the scale factor equals abs_max/127. The saturating method uses KL divergence to compute a suitable threshold T (0 < T < mab_max) and maps it to 127, so the scale factor equals T/127. In general, the non-saturating method is used for the weight tensors of the ops to be quantized, and the saturating method for their activation tensors (inputs and outputs).
Questions:
1. The algorithm description says weights use abs_max, while the API documentation says channel_abs_max.
2. For activations, the KL-derived threshold is described as 0 < T < mab_max; what does mab_max refer to?
3. In post-training quantization, is the scale factor for inputs also computed with KL divergence?
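A small NumPy illustration of the non-saturating (abs_max) method quoted above; this is a generic sketch of the formula, not Paddle's implementation:

import numpy as np

w = np.random.randn(64, 3, 3, 3).astype(np.float32)  # an FP32 weight tensor
scale = np.abs(w).max() / 127.0                      # non-saturating: map abs_max to 127
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)  # quantize
w_rec = q.astype(np.float32) * scale                 # dequantize for comparison
print("max abs error:", np.abs(w - w_rec).max())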

After quantization, is fluid.ParallelExecutor no longer supported?

Running
quant_program = quant.quant_aware(train_prog, exe.place, for_test=False)
val_program = fluid.default_main_program().clone(for_test=True)
and then training quant_program, I found that fluid.ParallelExecutor cannot be used; it raises
AttributeError: 'CompiledProgram' object has no attribute '_enable_dgc'

Distillation fails when the teacher program contains a BiGRU

Defining the following in the teacher program:

encoder_fwd_cell = fluid.layers.GRUCell(hidden_size=128)
encoder_fwd_output, fwd_state = fluid.layers.rnn(
    cell=encoder_fwd_cell,
    inputs=emb_out,
    sequence_length=None,
    time_major=False,
    is_reverse=False)
# build the backward RNN with GRUCell
encoder_bwd_cell = fluid.layers.GRUCell(hidden_size=128)
encoder_bwd_output, bwd_state = fluid.layers.rnn(
    cell=encoder_bwd_cell,
    inputs=emb_out,
    sequence_length=None,
    time_major=False,
    is_reverse=True)
# concatenate the forward and backward GRU encodings to get h
encoder_output = fluid.layers.concat(
    input=[encoder_fwd_output, encoder_bwd_output], axis=2)
encoder_output = fluid.layers.elementwise_mul(encoder_output, input_mask, axis=-1)

With this structure the run errors out; removing it makes it work. Presumably the source code fails to handle some case here.

How should I understand the difference between greedy_prune and plain prune in sensitivity pruning?

1. As in the title.
2. In the code below, what is the purpose of the part marked with #? Why multiply by 2?

def flops_sensitivity(program,
                      place,
                      param_names,
                      eval_func,
                      sensitivities_file=None,
                      pruned_flops_rate=0.1):

    assert (1.0 / len(param_names) > pruned_flops_rate)

    scope = fluid.global_scope()
    graph = GraphWrapper(program)
    sensitivities = load_sensitivities(sensitivities_file)

    for name in param_names:
        if name not in sensitivities:
            sensitivities[name] = {}
    base_flops = flops(program)
    target_pruned_flops = base_flops * pruned_flops_rate

    pruner = Pruner()
    baseline = None
    for name in sensitivities:

        pruned_program, _, _ = pruner.prune(
            program=graph.program,
            scope=None,
            params=[name],
            ratios=[0.5],
            place=None,
            lazy=False,
            only_graph=True)
       ################################################
        param_flops = (base_flops - flops(pruned_program)) * 2
        channel_size = graph.var(name).shape()[0]
        pruned_ratio = target_pruned_flops / float(param_flops)
       ################################################
        pruned_ratio = round(pruned_ratio, 3)
        pruned_size = round(pruned_ratio * channel_size)
        pruned_ratio = 1 if pruned_size >= channel_size else pruned_ratio

        if len(sensitivities[name].keys()) > 0:
            _logger.debug(
                '{} exist; pruned ratio: {}; excepted ratio: {}'.format(
                    name, sensitivities[name].keys(), pruned_ratio))
            continue
        if baseline is None:
            baseline = eval_func(graph.program)
        param_backup = {}
        pruner = Pruner()
        _logger.info("sensitive - param: {}; ratios: {}".format(name,
                                                                pruned_ratio))
        loss = 1
        if pruned_ratio < 1:
            pruned_program = pruner.prune(
                program=graph.program,
                scope=scope,
                params=[name],
                ratios=[pruned_ratio],
                place=place,
                lazy=True,
                only_graph=False,
                param_backup=param_backup)
            pruned_metric = eval_func(pruned_program)
            loss = (baseline - pruned_metric) / baseline
        _logger.info("pruned param: {}; {}; loss={}".format(name, pruned_ratio,
                                                            loss))
        sensitivities[name][pruned_ratio] = loss
        _save_sensitivities(sensitivities, sensitivities_file)

        # restore pruned parameters
        for param_name in param_backup.keys():
            param_t = scope.find_var(param_name).get_tensor()
            param_t.set(param_backup[param_name], place)
    return sensitivities

3. In the code below, min_loss and max_loss are both initialized to 0, which causes the while loop never to execute.
def get_ratios_by_sensitive(self, sensitivities, pruned_flops,
                                eval_program):
        """
        Search a group of ratios for pruning target flops.
        Args:
          sensitivities(dict): The sensitivities used to generate a group of pruning ratios. The key of dict
                               is name of parameters to be pruned. The value of dict is a list of tuple with
                               format `(pruned_ratio, accuracy_loss)`.
          pruned_flops(float): The percent of FLOPS to be pruned.
          eval_program(Program): The program whose FLOPS is considered.
        Returns:
          dict: A group of ratios. The key of dict is name of parameters while the value is the ratio to be pruned.
        """

        min_loss = 0.
        max_loss = 0.
        # step 2: Find a group of ratios by binary searching.
        base_flops = flops(eval_program)
        ratios = None
        max_times = 20
        while min_loss < max_loss and max_times > 0:
            loss = (max_loss + min_loss) / 2
            _logger.info(
                '-----------Try pruned ratios while acc loss={}-----------'.
                format(loss))
            ratios = self.get_ratios_by_loss(sensitivities, loss)
            _logger.info('Pruned ratios={}'.format(
                [round(ratio, 3) for ratio in ratios.values()]))
            pruned_program = self._pruner.prune(
                eval_program,
                None,  # scope
                ratios.keys(),
                ratios.values(),
                None,  # place
                only_graph=True)
            pruned_ratio = 1 - (float(flops(pruned_program)) / base_flops)
            _logger.info('Pruned flops: {:.4f}'.format(pruned_ratio))

            # Check whether current ratios is enough
            if abs(pruned_ratio - pruned_flops) < 0.015:
                break
            if pruned_ratio > pruned_flops:
                max_loss = loss
            else:
                min_loss = loss
            max_times -= 1
        return ratios

Loading a channel-pruned model and then distilling raises an error

    student_program = fluid.Program()
    s_startup = fluid.Program()
    with fluid.program_guard(student_program, s_startup):
        with fluid.unique_name.guard():
            # model definition
           ...
    place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
    exe = fluid.Executor(place)
    load_model(exe, student_program, args.pretrained_model)  #prune.io
    val_program = student_program.clone(for_test=True)

    teacher_model = models.__dict__[args.teacher_model]()
    teacher_program = fluid.Program()
    t_startup = fluid.Program()
    with fluid.program_guard(teacher_program, t_startup):
        with fluid.unique_name.guard():
            # teacher model definition
           ...

    exe.run(t_startup)
    if args.teacher_pretrained_model:
        def if_exist(var):
            return os.path.exists(
                os.path.join(args.teacher_pretrained_model, var.name))
        fluid.io.load_vars(
            exe,
            args.teacher_pretrained_model,
            main_program=teacher_program,
            predicate=if_exist)

    data_name_map = {'images': 'images'}
    merge(teacher_program, student_program, data_name_map, place)
    with fluid.program_guard(student_program, s_startup):
        dist_loss = soft_label_loss("teacher_fc_0.tmp_0", "fc_0.tmp_0", student_program)
        loss = avg_cost + dist_loss
        lr, opt = create_optimizer(args)
        opt.minimize(loss)
    exe.run(s_startup)

It raises the following error:

Error: Param and Velocity of MomentumOp should have the same dimension.
  [Hint: Expected param_dim == ctx->GetInputDim("Velocity"), but received param_dim:256 != ctx->GetInputDim("Velocity"):128.] at (/paddle/paddle/fluid/operators/optimizers/momentum_op.h:79)
  [operator < momentum > error]

I suspect that exe.run(s_startup) overwrites the weights loaded by load_model, but the optimizer needs to be initialized. How can this be resolved?

Installation problems

Installation fails. After installing with pip, I hit the following problem:
[screenshot]
Installing from source gives:
[screenshot]
Also, I tried several of the provided demos and none of them run properly, mainly:
sensitive: fails to run; merge_sensitive() errors;
auto_prune: fails to run, with the error below
[screenshot]
sensitive_prune: does not run to completion; error below
[screenshot]
Almost none of the provided demos run properly.
