
PaddleClas Introduction

简体中文 | English

PaddleClas

Introduction

PaddleClas is an image recognition and image classification toolkit built by PaddlePaddle for industry and academia, helping users train better vision models and put them into real-world applications.

Application scope of the PP-ShiTu image recognition system

Showcase of PULC practical image classification models

📣 Recent Updates

  • 🔥 2023.3.16: PaddleClas integrates FastDeploy, a high-performance, all-scenario model deployment solution. See the guide to try it out (note: use the develop branch).

  • 💥 Live replay: the PaddleClas team explains the PP-ShiTuV2 optimization strategies and real industrial applications. Scan the QR code below with WeChat, follow the official account, and fill in the questionnaire to join the official group and receive the live replay plus a 20 GB image classification learning package (20+ datasets, 4 vertical-domain models, and a collection of 70+ cutting-edge papers).

🌟 Features

PaddleClas supports a variety of cutting-edge image classification and recognition algorithms and releases industrial-grade backbone networks such as PP-HGNet, PP-LCNetV2, and PP-LCNet, as well as the SSLD semi-supervised knowledge distillation scheme. On this basis it builds the PULC ultra-lightweight image classification solution and the PP-ShiTu image recognition system.

To get started with the above, we recommend beginning with the Quick Start section of the documentation tutorials.

⚡ Quick Start

  • PULC ultra-lightweight image classification quick experience: click here
  • PP-ShiTu image recognition quick experience: click here
  • PP-ShiTuV2 Android demo app: scan the QR code below to download and try it

📖 Technical Exchange and Cooperation

  • PaddleX, PaddlePaddle's low-code development tool — a one-stop development tool for curated PaddlePaddle models targeting mainstream domestic and international AI hardware. Core advantages:

    • 【Industrial-grade high-accuracy model zoo】: 40+ curated models covering 10 mainstream AI tasks, rich and complete.
    • 【Distinctive model pipelines】: pipelines that fuse large and small models for higher accuracy and better results.
    • 【Low-code development mode】: a graphical interface supporting a unified development paradigm, convenient and efficient.
    • 【Private deployment with multi-hardware support】: adapted to mainstream domestic and international AI hardware, supports fully offline local use, and meets enterprise security and confidentiality needs.
  • PaddleX官网地址:https://aistudio.baidu.com/intro/paddlex

  • PaddleX官方交流频道:https://aistudio.baidu.com/community/channel/610

👫 Open Source Community

  • 📑 Project cooperation: if you are an enterprise developer with a concrete image classification need, fill in the questionnaire to start cooperation with the official team at various levels, free of charge.
  • 👫 Join the community: scan the QR code with WeChat and fill in the questionnaire to join the group and receive the 20 GB image classification learning package, which includes
    • 20+ scenario datasets, covering various products, animals and plants, aerial images, and more
    • A collection of scenario application models, including personnel access management, fresh-produce recognition, product recognition, etc.
    • 70+ cutting-edge image classification and recognition papers, videos and slides from past release courses, and high-quality community projects

🛠️ PP-Series Model List

Model | Application scenario | Download links
PULC ultra-lightweight image classification solution | Classification with a fixed set of image categories | 9 models for human, vehicle, and text scenarios: model zoo link
PP-ShiTuV2 lightweight image recognition system | For scenarios with frequently changing categories and large numbers of classes | Mainbody detection model: pretrained model / inference model; recognition model: pretrained model / inference model
PP-LCNet lightweight backbone | Tailored for Intel CPUs and the MKLDNN acceleration library | PPLCNet_x1_0: pretrained model / inference model
PP-LCNetV2 lightweight backbone | For Intel CPUs, adapted to OpenVINO | PPLCNetV2_base: pretrained model / inference model
PP-HGNet high-accuracy backbone | Higher accuracy at the same inference time on GPUs | PPHGNet_small: pretrained model / inference model

Download links for all models can be found in the per-model introductions in the documentation tutorials.

Industry Examples

📖 Documentation Tutorials

PP-ShiTuV2 Image Recognition System

PP-ShiTuV2 is a practical, lightweight, general-purpose image recognition system composed of three modules: mainbody detection, feature learning, and vector retrieval. The system optimizes the models of each module with multiple strategies, covering backbone selection and tuning, loss function choice, data augmentation, learning-rate schedules, regularization parameters, use of pretrained models, and model pruning and quantization. Compared with V1, PP-ShiTuV2 improves Recall@1 by nearly 8 points. See the detailed PP-ShiTuV2 introduction for more.
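To make the module composition concrete, here is a minimal sketch of the recognition flow (illustrative Python pseudocode with hypothetical callables, not the actual PaddleClas API):

def recognize(image, detector, extractor, index):
    """image: HxWx3 array; detector/extractor/index are hypothetical components."""
    results = []
    for (x1, y1, x2, y2) in detector(image):   # 1) mainbody detection proposes boxes
        crop = image[y1:y2, x1:x2]             # crop the detected object
        feat = extractor(crop)                 # 2) feature learning maps the crop to an embedding
        label, score = index.search(feat)      # 3) vector retrieval against the gallery index
        results.append(((x1, y1, x2, y2), label, score))
    return results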

Showcase of the PP-ShiTuV2 image recognition system

  • Bottled drink recognition
  • Product recognition
  • Anime character recognition
  • Logo recognition
  • Vehicle recognition

PULC Ultra-Lightweight Image Classification Solution

PULC combines multiple cutting-edge techniques such as backbone networks, data augmentation, and distillation, and can automatically train lightweight yet high-accuracy image classification models. PaddleClas provides classification models for nine common tasks covering human, vehicle, and OCR scenarios, with 3 ms CPU inference and accuracy on par with SwinTransformer.
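For reference, a minimal usage sketch with the paddleclas pip package (assuming pip install paddleclas and the PULC person_exists model; see the PULC quick-start docs for the exact model names and arguments):

from paddleclas import PaddleClas

model = PaddleClas(model_name="person_exists")          # a PULC task model
result = model.predict(input_data="path/to/image.jpg")  # hypothetical image path
print(next(result))                                     # predict() returns a generator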

Showcase of PULC practical image classification models

License

This project is released under the Apache 2.0 license.

Contributing

Contributions to PaddleClas are very welcome, and we greatly appreciate your feedback. To contribute, please refer to the contribution guide.

  • Many thanks to nblib for fixing the RandErasing data augmentation config file in PaddleClas.
  • Many thanks to chenpy228 for fixing some typos in the PaddleClas documentation.
  • Many thanks to jm12138 for adding the ViT, DeiT, and RepVGG model series to PaddleClas.

PaddleClas's People

Contributors

aurelius84, cuicheng01, dyning, evezerest, flytocc, fredhuang16, huangxu96, hydrogensulfate, hysunflower, intsigstephon, jiaxiao243, jm12138, larastustu, lilith-zy, littletomatodonkey, lvjian0706, lyuwenyu, qingshuchen, rainfrost1, shippingwang, sibo2rr, tingquangao, vslyu, weisy11, wqz960, wuhaobo, yanhuidua, zengshao0622, zhangbo9674, zhiboniu


PaddleClas's Issues

run PaddleClas infer.py ERROR

my infer.sh:
export PYTHONPATH=$PWD:$PYTHONPATH

python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/infer/infer.py -i "dataset/FGVC2020_SSFGRC/test/26.jpg" \
    -m "SENet154_vd" \
    -p "output/expr20_SENet154_vd_train_bestv1_25971.txt_val2000_val2750_78.84"

ERROR:
Traceback (most recent call last):
File "tools/infer/infer.py", line 121, in
main()
File "tools/infer/infer.py", line 113, in main
return_numpy=False)
File "/home/daibing/software/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 790, in run
six.reraise(*sys.exc_info())
File "/home/daibing/software/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 785, in run
use_program_cache=use_program_cache)
File "/home/daibing/software/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 838, in _run_impl
use_program_cache=use_program_cache)
File "/home/daibing/software/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 909, in _run_program
self._feed_data(program, feed, feed_var_name, scope)
File "/home/daibing/software/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 591, in _feed_data
check_feed_shape_type(var, cur_feed)
File "/home/daibing/software/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 230, in check_feed_shape_type
(var.name, len(var.shape), var.shape, feed_shape))
ValueError: The fed Variable u'image' should have dimensions = 4, shape = (-1L, 3L, 224L, 224L), but received fed shape [3L, 224L, 224L] on each device
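The message says the executor expects a 4-D batched tensor of shape (-1, 3, 224, 224) but was fed a single 3-D image. A minimal sketch of the usual fix, assuming a NumPy preprocessing pipeline that currently yields a (3, 224, 224) array (variable names are hypothetical):

import numpy as np

img = np.random.rand(3, 224, 224).astype("float32")  # stand-in for the preprocessed image
batched = np.expand_dims(img, axis=0)                 # add the batch axis -> (1, 3, 224, 224)

Feeding batched instead of img should satisfy the shape check.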

Incorrect setting of `is_test` in EfficientNet

is_test is not correctly set in EfficientNet, which leaves drop_connect active at test time. This is easily reproduced by running inference repeatedly on the same image: the predicted probabilities differ between runs (screenshot omitted).

The likely cause (screenshot omitted): is_test defaults to False in EfficientNet and is never set to True in either infer.py or predict.py.

Moreover, the duplicated definition of is_test in both __init__ and net is confusing (screenshot omitted): in fact, _drop_connect uses self.is_test, and the is_test passed to the methods is never used.

It would be better to fix this.
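For context, a minimal NumPy sketch of how drop_connect is normally gated on is_test (illustrative only, not the PaddleClas code itself):

import numpy as np

def drop_connect(x, prob, is_test):
    # Identity at test time; randomly drop whole samples during training.
    if is_test:
        return x
    keep = 1.0 - prob
    mask = np.floor(keep + np.random.rand(x.shape[0], 1, 1, 1))
    return x / keep * mask

If is_test stays False at inference time, the random mask keeps firing, which explains the run-to-run differences reported above.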

resnet_vd训练出错,没有is_test字段

paddle版本:1.7.1
config: ResNet50_vd.yaml
执行训练后出错:
image
resnet50_vd init中确实没有is_test字段,但是program.create_model中会传入这个字段:
image
请问下这里是我的版本问题吗?

Mixed Precision Training

Mixed precision training is available in PaddleCV/image_classification but not in this repo. According to Release Notes of PaddlePaddle 1.7, AMP interfaces have been added.
Based on these, I think it would be convenient to implement it.

Mixed precision training is critical to fast training on V100. Please consider adding it. Thank you!
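For reference, a minimal sketch of enabling AMP with Paddle 1.7's static-graph API (based on the fluid.contrib interface referenced in the release notes; argument names may differ slightly across versions):

import paddle.fluid as fluid
from paddle.fluid.contrib.mixed_precision import decorate

optimizer = fluid.optimizer.Momentum(learning_rate=0.1, momentum=0.9)
# Wrap the optimizer so the forward/backward passes run in FP16 with loss scaling.
mp_optimizer = decorate(optimizer, init_loss_scaling=128.0,
                        use_dynamic_loss_scaling=True)
# mp_optimizer.minimize(avg_loss)  # call in place of optimizer.minimize(avg_loss)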

Hello, after infer the results for the same image differ

Hello, when running inference with the infer script I hit the following problem:
1st infer: class id: 1, probability: 0.9075
2nd infer: class id: 1, probability: 0.9048
3rd infer: class id: 1, probability: 0.9069

My run script:
export PYTHONPATH=$PWD:$PYTHONPATH
export CUDA_VISIBLE_DEVICES=0
#--model=EfficientNetB0 --pretrained_model=output/EfficientNetB0_val/best_model_in_epoch_124/ppcls --output_path=./convert
python tools/infer/infer.py \
    --image_file=./tools/img.jpg \
    --model=EfficientNetB0 \
    --pretrained_model=output/EfficientNetB0_val/best_model_in_epoch_124/ppcls

My only change is that I removed the resize_short mode during resizing and resize the image directly to 288.

Has anyone run into this and can help? Thanks!

Demo run error

Paddle 1.7.2, CUDA 9.0, cuDNN 7.5.
Running /home/vis/duyuting/app/anaconda3/bin/python -m paddle.distributed.launch --selected_gpus="0" tools/train.py -c ./configs/quick_start/ResNet50_vd.yaml reports:
Error: Failed to find dynamic library: libnccl.so ( /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/vis/duyuting/app/nccl_2.5.6-1+cuda10.0_x86_64/lib/libnccl.so) )
Please specify its path correctly using following ways:
Method. set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac OS.
For instance, issue command: export LD_LIBRARY_PATH=...
Note: After Mac OS 10.11, using the DYLD_LIBRARY_PATH is impossible unless System Integrity Protection (SIP) is disabled. at (/paddle/paddle/fluid/platform/dynload/dynamic_loader.cc:177)
[operator < gen_nccl_id > error]
This looks like an NCCL problem. After downloading the CUDA 9 build of NCCL from the official site, it reports:
Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority.

  • New issue link: https://github.com/PaddlePaddle/Paddle/issues/new
  • Recommended issue content: all error stack information
    [unhandled system error] at (/paddle/paddle/fluid/operators/distributed_ops/gen_nccl_id_op.cc:162)
    [operator < gen_nccl_id > error]

Without the distributed launch command, /home/vis/duyuting/app/anaconda3/bin/python tools/train.py -c ./configs/quick_start/ResNet50_vd.yaml reports:
Traceback (most recent call last):
File "tools/train.py", line 133, in <module>
main(args)
File "tools/train.py", line 59, in main
fleet.init(role)
File "/home/vis/duyuting/app/anaconda3/lib/python3.7/site-packages/paddle/fluid/incubate/fleet/base/fleet_base.py", line 202, in init
self._role_maker.generate_role()
File "/home/vis/duyuting/app/anaconda3/lib/python3.7/site-packages/paddle/fluid/incubate/fleet/base/role_maker.py", line 500, in generate_role
assert self._worker_endpoints is not None, "can't find PADDLE_TRAINER_ENDPOINTS"
Can this library really not run on a single GPU????

Training loss spikes and then becomes NaN

Training a classification model with MobileNetV3_large_x1_0: at the second epoch the loss suddenly increases and then becomes NaN. Why would this happen? Does anyone have relevant experience? (screenshot failed to upload)

Dynamic-graph version support status

As the title says: hello developers, does the dynamic-graph version of this repo currently run correctly? In which areas is its development not yet aligned with the static-graph version?

SE + HRNet

I want to add an attention mechanism to HRNet, so I chose SE + HRNet. In another issue I was told that SE+HRNet needs pretrained weights that include SE, and that directly loading SE-free pretrained weights gives noticeably lower accuracy.
My questions:
1. Is there a pretrained SE+HRNet model?
2. If not, how should I train it to get a good result? Are there any practical suggestions?
3. Compared with SE+HRNet, is there another attention mechanism that is easier to train when no pretrained model is available?
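For context, a minimal NumPy sketch of the SE (squeeze-and-excitation) block being discussed (illustrative only, not the PaddleClas implementation):

import numpy as np

def se_block(x, w1, b1, w2, b2):
    """x: (N, C, H, W); w1/b1 and w2/b2 are the SE module's two FC layers."""
    squeeze = x.mean(axis=(2, 3))                      # global average pool -> (N, C)
    hidden = np.maximum(squeeze @ w1 + b1, 0.0)        # reduction FC + ReLU
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))  # expansion FC + sigmoid
    return x * scale[:, :, None, None]                 # reweight channels

Because the two FC layers are new parameters, they start from random initialization; without SE-aware pretrained weights, the random channel scales disturb the pretrained backbone, which matches the advice the reporter received.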

ValueError: Operator "gen_nccl_id" has not been registered.

E:\projects\PaddleClas-master>python -m paddle.distributed.launch --selected_gpus='0' tools/train.py -c configs/quick_start/ResNet50_vd_finetune_my.yaml
----------- Configuration Arguments -----------
cluster_node_ips: 127.0.0.1
log_dir: None
node_ip: 127.0.0.1
print_config: True
selected_gpus: '0'
started_port: 6170
training_script: tools/train.py
training_script_args: ['-c', 'configs/quick_start/ResNet50_vd_finetune_my.yaml']
use_paddlecloud: False

trainers_endpoints: 127.0.0.1:6170 , node_id: 0 , current_node_ip: 127.0.0.1 , num_nodes: 1 , node_ips: ['127.0.0.1'] , nranks: 1
2020-05-13 23:57:14 INFO:

== PaddleClas is powered by PaddlePaddle ! ==

== ==
== For more info please go to the following website. ==
== ==
== https://github.com/PaddlePaddle/PaddleClas ==

2020-05-13 23:57:14 INFO: ARCHITECTURE :
2020-05-13 23:57:14 INFO: name : ResNet50_vd
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: LEARNING_RATE :
2020-05-13 23:57:14 INFO: function : Cosine
2020-05-13 23:57:14 INFO: params :
2020-05-13 23:57:14 INFO: lr : 0.00375
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: OPTIMIZER :
2020-05-13 23:57:14 INFO: function : Momentum
2020-05-13 23:57:14 INFO: params :
2020-05-13 23:57:14 INFO: momentum : 0.9
2020-05-13 23:57:14 INFO: regularizer :
2020-05-13 23:57:14 INFO: factor : 1e-06
2020-05-13 23:57:14 INFO: function : L2
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: TRAIN :
2020-05-13 23:57:14 INFO: batch_size : 32
2020-05-13 23:57:14 INFO: data_dir : G:/ai_data/paddle/0513/
2020-05-13 23:57:14 INFO: file_list : G:/ai_data/paddle/0513train.list
2020-05-13 23:57:14 INFO: num_workers : 4
2020-05-13 23:57:14 INFO: shuffle_seed : 0
2020-05-13 23:57:14 INFO: transforms :
2020-05-13 23:57:14 INFO: DecodeImage :
2020-05-13 23:57:14 INFO: channel_first : False
2020-05-13 23:57:14 INFO: to_np : False
2020-05-13 23:57:14 INFO: to_rgb : True
2020-05-13 23:57:14 INFO: RandCropImage :
2020-05-13 23:57:14 INFO: size : 224
2020-05-13 23:57:14 INFO: RandFlipImage :
2020-05-13 23:57:14 INFO: flip_code : 1
2020-05-13 23:57:14 INFO: NormalizeImage :
2020-05-13 23:57:14 INFO: mean : [0.485, 0.456, 0.406]
2020-05-13 23:57:14 INFO: order :
2020-05-13 23:57:14 INFO: scale : 1./255.
2020-05-13 23:57:14 INFO: std : [0.229, 0.224, 0.225]
2020-05-13 23:57:14 INFO: ToCHWImage : None
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: VALID :
2020-05-13 23:57:14 INFO: batch_size : 20
2020-05-13 23:57:14 INFO: data_dir : G:/ai_data/paddle/0513/
2020-05-13 23:57:14 INFO: file_list : G:/ai_data/paddle/0513test.list
2020-05-13 23:57:14 INFO: num_workers : 4
2020-05-13 23:57:14 INFO: shuffle_seed : 0
2020-05-13 23:57:14 INFO: transforms :
2020-05-13 23:57:14 INFO: DecodeImage :
2020-05-13 23:57:14 INFO: channel_first : False
2020-05-13 23:57:14 INFO: to_np : False
2020-05-13 23:57:14 INFO: to_rgb : True
2020-05-13 23:57:14 INFO: ResizeImage :
2020-05-13 23:57:14 INFO: resize_short : 256
2020-05-13 23:57:14 INFO: CropImage :
2020-05-13 23:57:14 INFO: size : 224
2020-05-13 23:57:14 INFO: NormalizeImage :
2020-05-13 23:57:14 INFO: mean : [0.485, 0.456, 0.406]
2020-05-13 23:57:14 INFO: order :
2020-05-13 23:57:14 INFO: scale : 1.0/255.0
2020-05-13 23:57:14 INFO: std : [0.229, 0.224, 0.225]
2020-05-13 23:57:14 INFO: ToCHWImage : None
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: classes_num : 3
2020-05-13 23:57:14 INFO: epochs : 20
2020-05-13 23:57:14 INFO: image_shape : [3, 224, 224]
2020-05-13 23:57:14 INFO: mode : train
2020-05-13 23:57:14 INFO: model_save_dir : E:/projects/PaddleClas-master/output/
2020-05-13 23:57:14 INFO: pretrained_model : E:/projects/PaddleClas-master/ResNet50_vd_pretrained
2020-05-13 23:57:14 INFO: save_interval : 1
2020-05-13 23:57:14 INFO: topk : 5
2020-05-13 23:57:14 INFO: total_images : 795
2020-05-13 23:57:14 INFO: valid_interval : 1
2020-05-13 23:57:14 INFO: validate : True

API is deprecated since 2.0.0 Please use FleetAPI instead.
WIKI: https://github.com/PaddlePaddle/Fleet/blob/develop/markdown_doc/transpiler

Traceback (most recent call last):
File "tools/train.py", line 124, in
main(args)
File "tools/train.py", line 69, in main
config, train_prog, startup_prog, is_train=True)
File "E:\projects\PaddleClas-master\tools\program.py", line 341, in build
optimizer.minimize(fetchs['loss'][0])
File "C:\python\tf\lib\site-packages\paddle\fluid\incubate\fleet\collective_init_.py", line 424, in minimize
fleet.main_program = self.try_to_compile(startup_program, main_program)
File "C:\python\tf\lib\site-packages\paddle\fluid\incubate\fleet\collective_init
.py", line 358, in _try_to_compile
self.transpile(startup_program, main_program)
File "C:\python\tf\lib\site-packages\paddle\fluid\incubate\fleet\collective_init
.py", line 285, in _transpile
current_endpoint=current_endpoint)
File "C:\python\tf\lib\site-packages\paddle\fluid\transpiler\distribute_transpiler.py", line 625, in transpile
wait_port=self.config.wait_port)
File "C:\python\tf\lib\site-packages\paddle\fluid\transpiler\distribute_transpiler.py", line 397, in _transpile_nccl2
self.config.hierarchical_allreduce_inter_nranks
File "C:\python\tf\lib\site-packages\paddle\fluid\framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))
File "C:\python\tf\lib\site-packages\paddle\fluid\framework.py", line 1797, in init
proto = OpProtoHolder.instance().get_op_proto(type)
File "C:\python\tf\lib\site-packages\paddle\fluid\framework.py", line 1679, in get_op_proto
raise ValueError("Operator "%s" has not been registered." % type)
ValueError: Operator "gen_nccl_id" has not been registered.
2020-05-13 15:57:16,981-ERROR: ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
ERROR 2020-05-13 15:57:16,981 launch.py:284] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.

What is the problem here?

Inference with the 100,000-class pretrained model

UnavailableError: Load operator fail to open file pretrained/ResNet50_vd_10w_pretrained/fc_0.w_0, please check whether the model file is complete or damaged.
[Hint: Expected static_cast<bool>(fin) == true, but received static_cast<bool>(fin):0 != true:1.] at (/paddle/paddle/fluid/operators/load_op.h:41)
[operator < load > error]

received rank:2 != label_dims.size():3

报错: File "tools/train.py", line 124, in
main(args)


Error Message Summary:

InvalidArgumentError: If Attr(soft_label) == true, Input(X) and Input(Label) shall have the same dimensions. But received: the dimensions of Input(X) is [2], the shape of Input(X) is [-1, 2], the dimensions of Input(Label) is [3], the shape of Input(Label) is [-1, 1, 2]
[Hint: Expected rank == label_dims.size(), but received rank:2 != label_dims.size():3.] at (D:\1.8.1\paddle\paddle\fluid\operators\cross_entropy_op.cc:63)
[operator < cross_entropy > error]
INFO 2020-05-23 18:17:34,812 utils.py:272] terminate all the procs
ERROR 2020-05-23 18:17:34,812 utils.py:416] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
INFO 2020-05-23 18:17:34,813 utils.py:272] terminate all the procs

Images are 512*512 PNG with 8-bit depth, classes 1, 2, 3. What do rank and label_dims.size() in this error mean?
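To illustrate what the message means (rank is the number of tensor dimensions, and label_dims.size() is the rank of the label tensor), here is a minimal sketch of the shapes involved, assuming soft labels must match the logits' rank (an inference from the error text, not a confirmed diagnosis):

import numpy as np

logits = np.zeros((8, 2), dtype="float32")         # rank 2: [batch, classes]
bad_labels = np.zeros((8, 1, 2), dtype="float32")  # rank 3: triggers this error
good_labels = bad_labels.reshape(8, 2)             # squeeze the extra axis to rank 2

Here the label tensor carries one spurious extra dimension, which is exactly the rank:2 != label_dims.size():3 complaint.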

Training from scratch on my own dataset

Hi, I'm training from scratch on my own dataset (only one object class), but during training top1 and top2 are always 1.0000 (the same in eval), as shown (screenshot omitted).
The config file is resnet50_vd.yaml, where I changed classes_num to 2. How should I change the config for this case? A second question: how can I use a classification model trained with PaddleClas in PaddleDetection for object detection? Thanks!

HWC->CHW function redundancy

In operators.py
It seems that to_np, order, and channel_first are not necessary;
we already have a ToCHWImage function

Why Larger Batch Size Slows Training

I am training WRN-28-10 on CIFAR10 using PaddleClas. Beyond a batch size of 128, training gets slower as the batch size grows. A detailed comparison is shown below.

Batch Size Time (Per Epoch)
32 82.2s
64 72.8s
128 68.5s
256 74.1s
512 86.4s
1024 110.5s

The time of the 2nd epoch is reported, so warm-up time is not counted. Experiments showed that the results were consistent.

This behavior is strange and unexpected. Could you help me to find the reason?

Code to reproduce is here.

Thank you very much!

Model inference error

Hi, on AI Studio I converted my trained model into an inference model, but inference then fails:
!export PYTHONPATH=./:$PYTHONPATH && python tools/infer/predict.py \
    -m=./inference/ResNet50_vd/model \
    -p=./inference/ResNet50_vd/params \
    -i=./dataset/flowers102/jpg/image_02275.jpg \
    --use_gpu=1 \
    --use_tensorrt=True

The error message is as follows:

Traceback (most recent call last):
File "tools/infer/predict.py", line 156, in
main()
File "tools/infer/predict.py", line 110, in main
predictor = create_predictor(args)
File "tools/infer/predict.py", line 66, in create_predictor
predictor = create_paddle_predictor(config)
paddle.fluid.core_avx.EnforceNotMet:


C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::framework::ir::PassRegistry::Get(std::string const&) const
3 paddle::inference::analysis::IRPassManager::CreatePasses(paddle::inference::analysis::Argument*, std::vector<std::string, std::allocator<std::string> > const&)
4 paddle::inference::analysis::IRPassManager::IRPassManager(paddle::inference::analysis::Argument*)
5 paddle::inference::analysis::IrAnalysisPass::RunImpl(paddle::inference::analysis::Argument*)
6 paddle::inference::analysis::Analyzer::RunAnalysis(paddle::inference::analysis::Argument*)
7 paddle::AnalysisPredictor::OptimizeInferenceProgram()
8 paddle::AnalysisPredictor::PrepareProgram(std::shared_ptr<paddle::framework::ProgramDesc> const&)
9 paddle::AnalysisPredictor::Init(std::shared_ptr<paddle::framework::Scope> const&, std::shared_ptr<paddle::framework::ProgramDesc> const&)
10 std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
11 std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(paddle::AnalysisConfig const&)


Error Message Summary:

Error: Pass tensorrt_subgraph_pass has not been registered at (/paddle/paddle/fluid/framework/ir/pass.h:201)

How can this be solved?

With --use_tensorrt=True:

Error: Pass tensorrt_subgraph_pass has not been registered at (/paddle/paddle/fluid/framework/ir/pass.h:170)
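The pass-registry error usually indicates that the installed PaddlePaddle wheel was built without TensorRT support (an assumption based on the message, not a confirmed diagnosis). A quick check is to rerun without the flag:

python tools/infer/predict.py -m=./inference/ResNet50_vd/model -p=./inference/ResNet50_vd/params -i=./dataset/flowers102/jpg/image_02275.jpg --use_gpu=1 --use_tensorrt=False

If that succeeds, TensorRT inference requires a PaddlePaddle build compiled with TensorRT.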

export_model model conversion error

export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
    -c ./configs/quick_start/ResNet50_vd.yaml

After training the model with the command above, I converted it with export_model:
python tools/export_model.py --model=ResNet50_vd --pretrained_model=output/ResNet50_vd/19/ --output_path=inference/ResNet50_vd --class_dim=102

The error:
2020-05-09 14:36:17,701-WARNING: output/ResNet50_vd/19/.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-05-09 14:36:17,701-WARNING: output/ResNet50_vd/19/.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-05-09 14:36:17,703-WARNING: variable file [ output/ResNet50_vd/19/ppcls.pdopt output/ResNet50_vd/19/ppcls.pdparams output/ResNet50_vd/19/ppcls.pdmodel ] not used
2020-05-09 14:36:17,703-WARNING: variable file [ output/ResNet50_vd/19/ppcls.pdopt output/ResNet50_vd/19/ppcls.pdparams output/ResNet50_vd/19/ppcls.pdmodel ] not used
/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py:804: UserWarning: There are no operators in the program to be executed. If you pass Program manually, please use fluid.program_guard to ensure the current Program is being used.
warnings.warn(error_info)
/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py:782: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "tools/export_model.py", line 78, in
main()
File "tools/export_model.py", line 74, in main
params_filename='params')
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 1245, in save_inference_model
save_persistables(executor, save_dirname, main_program, params_filename)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 640, in save_persistables
filename=filename)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 295, in save_vars
filename=filename)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 350, in save_vars
executor.run(save_program)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 783, in run
six.reraise(*sys.exc_info())
File "/home/lishi/anaconda3/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 778, in run
use_program_cache=use_program_cache)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 831, in _run_impl
use_program_cache=use_program_cache)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 905, in _run_program
fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:


C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::framework::Tensor::type() const
3 paddle::operators::SaveCombineOpKernel<paddle::platform::CPUDeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
4 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::SaveCombineOpKernel<paddle::platform::CPUDeviceContext, float>, paddle::operators::SaveCombineOpKernel<paddle::platform::CPUDeviceContext, double>, paddle::operators::SaveCombineOpKernel<paddle::platform::CPUDeviceContext, int>, paddle::operators::SaveCombineOpKernel<paddle::platform::CPUDeviceContext, long> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
5 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
6 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
7 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
8 paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
9 paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)


Python Call Stacks (More useful to users):

File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 343, in save_vars
'save_to_memory': save_to_memory
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 295, in save_vars
filename=filename)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 640, in save_persistables
filename=filename)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 1245, in save_inference_model
save_persistables(executor, save_dirname, main_program, params_filename)
File "tools/export_model.py", line 74, in main
params_filename='params')
File "tools/export_model.py", line 78, in
main()


Error Message Summary:

Error: Tensor not initialized yet when Tensor::type() is called.
[Hint: holder_ should not be null.] at (/paddle/paddle/fluid/framework/tensor.h:140)
[operator < save_combine > error]
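Judging from the warnings above ("output/ResNet50_vd/19/.pdparams not found" and "variable file [ output/ResNet50_vd/19/ppcls.* ] not used"), the loader appears to expect a checkpoint prefix rather than a directory, so pointing --pretrained_model at the ppcls prefix may help (an assumption based on the warning text, not a verified fix):

python tools/export_model.py --model=ResNet50_vd --pretrained_model=output/ResNet50_vd/19/ppcls --output_path=inference/ResNet50_vd --class_dim=102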

Data list file delimiter

It seems the config cannot set the delimiter of the data list file. My dataset contains file names with spaces, so being able to use | would be very convenient. A parsing workaround is sketched below.
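As a workaround under the current space-delimited format, splitting each line from the right keeps spaces inside file names intact, assuming the label is the last field (a sketch, not a PaddleClas feature; train_list.txt is the usual list-file name):

with open("train_list.txt") as f:
    for line in f:
        path, label = line.rstrip("\n").rsplit(" ", 1)  # split only at the last space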

Model inference questions

Two questions, please:
1. The command below seems to infer only a single image. How can I infer a whole folder, like specifying an infer_dir in PaddleDetection?
python tools/infer/predict.py \
    -m <model file path> \
    -p <params file path> \
    -i <image path> \
    --use_gpu=1 \
    --use_tensorrt=True

2. On Windows, how do I set the environment variable? The command from AI Studio is not recognized by the Windows terminal:
export PYTHONPATH=$PWD:$PYTHONPATH
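For question 2, the Windows cmd equivalent of the export line is set PYTHONPATH=%cd%;%PYTHONPATH% (in PowerShell: $env:PYTHONPATH = "$PWD;$env:PYTHONPATH"). For question 1, until a built-in infer_dir option exists, a minimal workaround is to loop over the folder in Python (a sketch; predict_image is a hypothetical wrapper around the predictor built in tools/infer/predict.py):

import glob

def predict_dir(infer_dir, predict_image):
    for path in sorted(glob.glob(infer_dir + "/*.jpg")):
        print(path, predict_image(path))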

train from scratch

Hi, I want to use PaddleClas to train from scratch on my own data. In train_list.txt, besides the image path, should the position coordinates use the center point plus width/height, or the top-left and bottom-right corners? (screenshot omitted)

Imbalanced training data in PaddleClas

Hi, if the training data is imbalanced and skewed, does PaddleClas currently provide a corresponding solution? Thanks.

export model fails with "Tensor not initialized yet when Tensor::type() is called"

Exporting the model following the tutorial:
python tools/export_model.py \
    --model=MobileNetV3_large_x1_0 \
    --pretrained_model=./output/MobileNetV3_large_x1_0/best_model_in_epoch_7/ \
    --output_path=./convert/

The error is as follows; could anyone with experience take a look?

Python Call Stacks (More useful to users):

File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))
File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 343, in save_vars
'save_to_memory': save_to_memory
File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 295, in save_vars
filename=filename)
File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 641, in save_persistables
filename=filename)
File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 1246, in save_inference_model
save_persistables(executor, save_dirname, main_program, params_filename)
File "tools/export_model.py", line 74, in main
params_filename='params')
File "tools/export_model.py", line 78, in
main()


Error Message Summary:

Error: Tensor not initialized yet when Tensor::type() is called.
[Hint: holder_ should not be null.] at (/paddle/paddle/fluid/framework/tensor.h:140)
[operator < save_combine > error]

How to deploy on PaddleHub

Hi, I fine-tuned with SSLD based on the ResNet50_vd_ssld pretrained model and generated an inference model. I want to deploy it with PaddleHub. Is there a quick, shell-style deployment approach where I simply replace the inference model under an existing module and then start serving?

Fine-tuning command fails

The command, using Baidu's ResNet50_vd_10w pretrained model:
set CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch --selected_gpus="0" tools/train.py -c ./configs/quick_start/ResNet50_vd_10w_finetune.yaml

Error:

Traceback (most recent call last):
File "tools/train.py", line 150, in <module>
main(args)
File "tools/train.py", line 75, in main
config, train_prog, startup_prog, is_train=True)
File "F:\pythonproject\PaddleClas\PaddleClas\tools\program.py", line 363, in build
optimizer.minimize(fetchs['loss'][0])
File "F:\Anaconda3\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 652, in minimize
fleet._main_program = self._try_to_compile(startup_program, main_program)
File "F:\Anaconda3\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 562, in _try_to_compile
self._transpile(startup_program, main_program)
File "F:\Anaconda3\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 489, in _transpile
current_endpoint=current_endpoint)
File "F:\Anaconda3\lib\site-packages\paddle\fluid\transpiler\distribute_transpiler.py", line 625, in transpile
wait_port=self.config.wait_port)
File "F:\Anaconda3\lib\site-packages\paddle\fluid\transpiler\distribute_transpiler.py", line 397, in _transpile_nccl2
self.config.hierarchical_allreduce_inter_nranks
File "F:\Anaconda3\lib\site-packages\paddle\fluid\framework.py", line 2610, in append_op
attrs=kwargs.get("attrs", None))
File "F:\Anaconda3\lib\site-packages\paddle\fluid\framework.py", line 1870, in __init__
proto = OpProtoHolder.instance().get_op_proto(type)
File "F:\Anaconda3\lib\site-packages\paddle\fluid\framework.py", line 1751, in get_op_proto
raise ValueError('Operator "%s" has not been registered.' % type)
ValueError: Operator "gen_nccl_id" has not been registered.
INFO 2020-06-22 11:29:30,706 utils.py:272] terminate all the procs
ERROR 2020-06-22 11:29:30,706 utils.py:416] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
INFO 2020-06-22 11:29:30,706 utils.py:272] terminate all the procs

ResNet50_vd_10w_finetune.yaml is configured as follows:
mode: 'train'
ARCHITECTURE:
    name: 'ResNet50_vd'
pretrained_model: "F:/pythonproject/PaddleClas/PaddleClas/ResNet50_vd_10w_pretrained/ResNet50_vd_10w_pretrained"
model_save_dir: "./output/"
classes_num: 5
total_images: 11745
save_interval: 1
validate: True
valid_interval: 1
epochs: 20
topk: 2
image_shape: [3, 224, 224]

LEARNING_RATE:
    function: 'Cosine'
    params:
        lr: 0.00375

OPTIMIZER:
    function: 'Momentum'
    params:
        momentum: 0.9
    regularizer:
        function: 'L2'
        factor: 0.000001

TRAIN:
    batch_size: 32
    num_workers: 4
    file_list: "F:/pythonproject\PaddleClas/PaddleClas/dataset/driver/train_list.txt"
    data_dir: "F:/pythonproject\PaddleClas/PaddleClas/dataset/driver/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1./255.
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:

VALID:
    batch_size: 20
    num_workers: 4
    file_list: "F:/pythonproject\PaddleClas/PaddleClas/dataset/driver/val_list.txt"
    data_dir: "F:/pythonproject\PaddleClas/PaddleClas/dataset/driver/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:

When inferring an image: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

aistudio@jupyter-305239-473669:~/work/PaddleClas$ python tools/infer/predict.py -m output_ca/ResNet50_vd/last/model -p output_ca/ResNet50_vd/last/params -i ./test0.jpg --use_gpu=1
Traceback (most recent call last):
File "tools/infer/predict.py", line 160, in
main()
File "tools/infer/predict.py", line 121, in main
inputs = preprocess(args.image_file, operators)
File "tools/infer/predict.py", line 88, in preprocess
data = open(fname).read()
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

What is the problem?
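The traceback shows predict.py opening the image in text mode, so Python tries to decode the JPEG bytes (which begin with 0xFF 0xD8) as UTF-8. A minimal sketch of the fix at the line cited in the traceback (tools/infer/predict.py:88) is to read raw bytes instead:

data = open(fname, 'rb').read()  # binary mode: image files are bytes, not UTF-8 text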

A problem with the multi_process reader

Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/usr/lib/python3.5/bdb.py", line 48, in trace_dispatch
return self.dispatch_line(frame)
File "/usr/lib/python3.5/bdb.py", line 67, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit

/home/pd_source/cla/ppcls/data/reader.py(191)reader()
-> for line in full_lines:
(Pdb)

[the same BdbQuit traceback and pdb prompt are printed again for Process-2, Process-3, and Process-4]

2020-05-27 14:43:10 WARNING: Your reader has raised an exception!
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/reader.py", line 1156, in thread_main
six.reraise(*sys.exc_info())
File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/reader.py", line 1136, in thread_main
for tensors in self._tensor_reader():
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/reader.py", line 1206, in tensor_reader_impl
for slots in paddle_reader():
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/data_feeder.py", line 506, in reader_creator
for item in reader():
File "/home/pd_source/cla/ppcls/data/reader.py", line 267, in wrapper
for idx, sample in enumerate(reader()):
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 572, in queue_reader
raise ValueError("multiprocess reader raises an exception")
ValueError: multiprocess reader raises an exception

Traceback (most recent call last):
File "./jaits_utils/task_tools.py", line 494, in inner
func(jif, *args, **kwargs)
File "cla/jaits_train.py", line 215, in main
epoch_id, 'train')
File "/home/pd_source/cla/program.py", line 413, in run
for idx, batch in enumerate(dataloader()):
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/reader.py", line 1102, in __next__
return self._reader.read_next()
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::reader::BlockingQueue<std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> > >::Receive(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >*)
3 paddle::operators::reader::PyReader::ReadNext(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >*)
4 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<unsigned long>, std::__future_base::_Result_base::_Deleter>, unsigned long> >::_M_invoke(std::_Any_data const&)
5 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

Error Message Summary:

Error: Blocking queue is killed because the data reader raises an exception
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] at (/paddle/paddle/fluid/operators/reader/blocking_queue.h:141)

2020-05-27 14:43:10 INFO: SO:exception- [the same Python traceback and C++ stack are then logged once more]

Where are the log files?

The FAQ says: "after launching, the logs are written in real time to mylog/workerlog.*, where you can follow them."
But why can't I find the mylog folder after running? Also, how can I visualize the training process?
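On the visualization question: a minimal sketch using the VisualDL package (an assumption: VisualDL is the PaddlePaddle-ecosystem visualization tool and is not wired up automatically by this PaddleClas version; the API shown is VisualDL 2.x):

from visualdl import LogWriter

writer = LogWriter(logdir="./vdl_log")
for step, loss in enumerate([0.9, 0.7, 0.5]):  # stand-in for a real training loop
    writer.add_scalar(tag="train/loss", step=step, value=loss)

Then run visualdl --logdir ./vdl_log and open the printed URL in a browser.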

Error when downloading pretrained models, please take a look

!python tools/download.py -a ResNet50_vd -p ./pretrained -d True
!python tools/download.py -a ResNet50_vd_ssld -p ./pretrained -d True
!python tools/download.py -a MobileNetV3_large_x1_0 -p ./pretrained -d True

Traceback (most recent call last):
File "tools/download.py", line 17, in <module>
from ppcls import model_zoo
ModuleNotFoundError: No module named 'ppcls'
(the same traceback is printed for each of the three commands)
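The ModuleNotFoundError usually just means the repository root is not on Python's module search path (an assumption based on the traceback, consistent with the export line used elsewhere on this page). Running the commands from the PaddleClas root after

export PYTHONPATH=$PWD:$PYTHONPATH

should make ppcls importable.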

How to write the training command on Windows 10 x64

My laptop runs Windows 10 x64 with an NVIDIA GeForce GTX 1650 GPU.

Following the example, I wrote the training command as:
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
    -c ./configs/quick_start/ResNet50_vd.yaml

The result is the error "gen_nccl_id" has not been registered. The QQ group says Windows does not support multi-GPU. Given my setup, how should I write the training command?

Python 2 naming error in the Res2Net-200 model

Creating the 200-layer Res2Net model under Python 2 raises:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 4: invalid start byte

Because the number of blocks exceeds the 26 letters of the alphabet, the name generation in the code overflows:
conv_name = "res" + str(block + 2) + chr(97 + i)

The preceding branch in the code should also include res2net200:
if layers in [101, 152, 200] and block == 2:
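A sketch of the fix the reporter suggests: route the deep variants through the numeric naming branch instead of chr(), so the names stay within ASCII (variable names follow the snippet above; this mirrors the ResNet-style naming convention, not a verified patch):

if layers in [101, 152, 200] and block == 2:
    if i == 0:
        conv_name = "res" + str(block + 2) + "a"
    else:
        conv_name = "res" + str(block + 2) + "b" + str(i)
else:
    conv_name = "res" + str(block + 2) + chr(97 + i)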

Add unittest in PaddleClas

As the CI is already built,
the unit tests can be restructured, like:

|—— ppcls
|
|—— test
|————|———— test_reader.py
|————|———— test_imaug.py
|————|———— test_download.py
|————|———— test_compress.py
|————|———— test_model.py
|————|———— test_speed.py
|————|———— test_finetune.py
|————|———— test_eval.py
|————|———— test_train.py
|————|———— test_infer.py
|————|———— test_performance.py (IMPORTANT)
|————|———— test_export.py

I'd like to know the concrete deduplication steps

Thanks a lot for this great project!!! I have a question about how the dataset was deduplicated, because my dataset also needs deduplication. I only know how to find keypoints with SIFT, but different image pairs match different numbers of keypoints, so how do I judge the similarity percentage between two images and then set a threshold for deduplication?
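One common recipe (a sketch with OpenCV, assuming opencv-python >= 4.4 where SIFT is available; thresholds are data-dependent): match descriptors, keep distinctive matches with Lowe's ratio test, and normalize by the smaller keypoint count to get a percentage.

import cv2

def sift_similarity(path_a, path_b, ratio=0.75):
    sift = cv2.SIFT_create()
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0.0
    matches = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
    # Lowe's ratio test keeps only distinctive matches.
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / min(len(kp_a), len(kp_b))

Pairs scoring above a chosen threshold (say 0.3, tuned on a labeled sample) can then be treated as duplicates.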

make the concept `place` clear

The concept `place` is confusing when someone tries to select the available GPUs by setting CUDA_VISIBLE_DEVICES.

When using the Fleet interface, only FLAGS_selected_gpus works,

so we have to obtain the GPU count with

gpu_num = paddle.fluid.core.get_cuda_device_count() if (
        'PADDLE_TRAINERS_NUM') and (
            'PADDLE_TRAINER_ID'
    ) not in env else int(env.get('PADDLE_TRAINERS_NUM', 0))
  • remove this switch

ResNet50_vd latency

Hi, I measured ResNet50_vd at close to 24 ms on a V100. How did you measure it at under 5 ms?
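For comparison, a minimal latency-measurement sketch (assumptions: batch size 1, warmup excluded, and a predict(img) callable wrapping the inference engine; official numbers typically also rely on inference-side optimizations such as TensorRT/FP16 that this sketch does not enable):

import time
import numpy as np

def measure_latency(predict, shape=(1, 3, 224, 224), warmup=50, iters=200):
    img = np.random.rand(*shape).astype("float32")
    for _ in range(warmup):        # exclude one-time costs (graph build, cudnn autotune)
        predict(img)
    start = time.time()
    for _ in range(iters):
        predict(img)
    return (time.time() - start) / iters * 1000.0  # average ms per image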
