no kernel image is available for execution on the device

My environment: Ubuntu 16, CUDA 10, cuDNN 7.6, GPU Tesla P100.

0-05-16 22:34:15:src/examples/onnx.cpp:26]:done.
I[2020-05-16 22:34:15:src/examples/onnx.cpp:28]:load model: models/demo.fp32.trtmodel
I[2020-05-16 22:34:15:src/onnxplugin/plugins/]:init MReLU config: {"kernel_size": 3, "eps": 0.03, "other": "Hello Onnx Plugin"}
I[2020-05-16 22:34:15:src/onnxplugin/plugins/]:MReLU weights = 1[1 x 1 x 1 x 1]
I[2020-05-16 22:34:15:src/onnxplugin/plugins/]:MReLU kernel_size: 3
I[2020-05-16 22:34:15:src/onnxplugin/plugins/]:MReLU eps: 0.03
I[2020-05-16 22:34:15:src/onnxplugin/plugins/]:MReLU other: Hello Onnx Plugin
I[2020-05-16 22:34:15:src/examples/onnx.cpp:35]:forward...
E[2020-05-16 22:34:15:src/infer/trt_infer.cpp:33]:NVInfer ERROR: ../rtSafe/cuda/cudaElementWiseRunner.cpp (149) - Cuda Error in execute: 48 (no kernel image is available for execution on the device)
E[2020-05-16 22:34:15:src/infer/trt_infer.cpp:33]:NVInfer ERROR: FAILED_EXECUTION: std::exception
F[2020-05-16 22:34:15:src/infer/trt_infer.cpp:652]:Assert Failure: execute_result

CUFLAGS := -std=c++11 -m64 -Xcompiler -fPIC -g -O3 -w -gencode=arch=compute_60,code=sm_60
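
CUDA error 48 usually means the binary contains no code for the architecture of the GPU it runs on. The Tesla P100 has compute capability 6.0, so the -gencode value in CUFLAGS looks right; it may still be worth confirming that every .cu object was rebuilt with it after the flag changed. A minimal sketch of how the flag follows from the GPU model (the table and the helper name gencode_flag are illustrative and not part of this repository):

```python
# Compute capabilities for a few GPUs, taken from NVIDIA's published tables.
COMPUTE_CAPABILITY = {
    "Tesla P100": (6, 0),
    "Tesla V100": (7, 0),
    "Tesla T4": (7, 5),
    "GeForce RTX 2080": (7, 5),
}


def gencode_flag(gpu_name):
    """Return the nvcc -gencode flag matching the given GPU model."""
    major, minor = COMPUTE_CAPABILITY[gpu_name]
    sm = f"{major}{minor}"
    return f"-gencode=arch=compute_{sm},code=sm_{sm}"


print(gencode_flag("Tesla P100"))
```

For the P100 this prints -gencode=arch=compute_60,code=sm_60.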


plugin/plugins/ error: more than one conversion function from "const halfloat" to a built-in type applies:
function "__half::operator float() const"
function "__half::operator short() const"
function "__half::operator unsigned short() const"
function "__half::operator int() const"
function "__half::operator unsigned int() const"
function "__half::operator long long() const"
function "__half::operator unsigned long long() const"
function "__half::operator __nv_bool() const"

ubuntu make error

I built with protobuf 3.8.0 and got this error: onnx_ONNX_NAMESPACE-ml.pb.cpp is not a member of ‘google::protobuf::internal::WireFormat’
Then I built with protobuf 3.13.3 and got: onnx_ONNX_NAMESPACE-ml.pb.h:368: undefined reference to `google::protobuf::internal::AssignDescriptors(google::protobuf::internal::DescriptorTable const*)'. That version seems to be too new.
The onnx-tensorrt7 I built locally does not produce all the .h files found under src/onnx/, such as onnx_ONNX_NAMESPACE-ml.pb.h and onnx-operators_ONNX_NAMESPACE-ml.pb
@dlunion @hopef

How to design bottleneck layers such as a DenseBlock layer in TensorRT?

I want to speed up a DenseNet model in TensorRT, but while writing the custom DenseBlock layer I ran into a question. I use a Caffe model as the input. In the Caffe prototxt the DenseBlock is a single layer, but it contains 8 convolution layers plus the BN operations. What should I do when defining the plugin? Should I define separate plugins for each of the 8 convolution layers and the BN layers, or a single plugin that includes all 8 convolutions and BN? Here is an example from the prototxt.
Can you give some advice? Thanks for your reply.

DenseBlock 1

layer {
  name: "DenseBlock1"
  type: "DenseBlock"
  bottom: "conv1"
  top: "DenseBlock1"
  denseblock_param {
    numTransition: 8
    initChannel: 64
    growthRate: 8
    Filter_Filler {
      type: "msra"
    }
    BN_Scaler_Filler {
      type: "constant"
      value: 1
    }
    BN_Bias_Filler {
      type: "constant"
      value: 0
    }
    use_dropout: false
    dropout_amount: 0.2
  }
}
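
One way to reason about the two options is to write out what the single DenseBlock layer expands to. The sketch below assumes the plain DenseNet ordering (BN, ReLU, 3x3 convolution, then concatenation with the input) and ignores dropout; the real Caffe layer may differ, and the helper name expand_dense_block is hypothetical. Under that assumption each step maps onto layers TensorRT already provides, so expanding the block is an alternative to writing one large plugin.

```python
def expand_dense_block(init_channel, growth_rate, num_transition):
    """List the per-transition layers of one DenseBlock.

    Assumes BN -> ReLU -> 3x3 Conv -> Concat for every transition;
    this ordering is an assumption, not read from the Caffe source.
    """
    layers = []
    channels = init_channel
    for index in range(num_transition):
        layers.append({
            "index": index,
            "bn_channels": channels,       # BN/ReLU see all feature maps so far
            "conv_in": channels,
            "conv_out": growth_rate,       # each conv adds growth_rate maps
            "concat_out": channels + growth_rate,
        })
        channels += growth_rate
    return layers, channels


# Parameters taken from the prototxt above.
layers, out_channels = expand_dense_block(64, 8, 8)
print(len(layers), out_channels)
```

With initChannel 64, growthRate 8, and numTransition 8, the block has 8 transitions and ends with 64 + 8 * 8 = 128 output channels.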

VS2017 reports that the imported project BuildCustomizations\CUDA 10.0.props was not found


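Severity Code Description Project File Line Suppression State
Error MSB4019 The imported project "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\BuildCustomizations\CUDA 10.0.props" was not found. Confirm that the path in the <Import> declaration is correct, and that the file exists on disk. TensorRT D:\github_prj\tensorRTIntegrate-master\TensorRT.vcxproj 128

I checked: line 128 of TensorRT.vcxproj is: <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.0.props" /> This file cannot be found in the VS installation directory or in the project directory.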
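
As far as I know, this .props file ships with the CUDA Toolkit's Visual Studio integration, not with Visual Studio itself, so it is missing when the toolkit was installed before VS or without the integration component. The sketch below only computes where the file usually sits and where MSBuild expects it. The source directory is the default toolkit install location, which is an assumption, and the helper name cuda_msbuild_paths is hypothetical.

```python
from pathlib import PureWindowsPath


def cuda_msbuild_paths(cuda_version, vs_targets_dir):
    """Return (source, destination) paths for the CUDA .props file.

    The source is the default CUDA Toolkit location (an assumption);
    adjust it if the toolkit was installed elsewhere.
    """
    src_dir = PureWindowsPath(
        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA",
        f"v{cuda_version}",
        "extras",
        "visual_studio_integration",
        "MSBuildExtensions",
    )
    name = f"CUDA {cuda_version}.props"
    dst_dir = PureWindowsPath(vs_targets_dir, "BuildCustomizations")
    return src_dir / name, dst_dir / name


src, dst = cuda_msbuild_paths(
    "10.0",
    r"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community"
    r"\Common7\IDE\VC\VCTargets",
)
print(src)
print(dst)
```

If the source file exists, copying the contents of that MSBuildExtensions folder into the destination folder is the commonly suggested fix.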




running speed

Thanks for your great contribution. I can run it successfully on a Windows laptop with an RTX 2080, CUDA 10.0, cuDNN 7.6.3, and TensorRT 7.
However, there is not much speed improvement. FP32 and FP16 differ by only about 2 fps.
With FP32 CenterNet COCO object detection:
The forward pass alone runs at 36 fps; the whole pipeline (input resize + forward) runs at 26 fps. The original Python code reaches 25 fps for the forward pass and 15 fps for the whole process (preprocessing takes a lot of time, especially the normalization step, (x - mean) / std).

Did you test the speed?

BTW, I have to say YOLOv3 is really fast. I use OpenCV 4.2 with CUDA enabled. The forward pass reaches 56 fps, while the whole process (resize + forward + NMS) reaches 46 fps.
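
Frame rates are easier to compare per stage once they are converted to milliseconds per frame. A small sketch using the CenterNet numbers quoted above (the helper names are illustrative):

```python
def fps_to_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def stage_overhead_ms(forward_fps, total_fps):
    """Time per frame spent outside the forward pass."""
    return fps_to_ms(total_fps) - fps_to_ms(forward_fps)


trt_overhead = stage_overhead_ms(36, 26)     # TensorRT: resize + copies
python_overhead = stage_overhead_ms(25, 15)  # original Python preprocessing
print(f"{trt_overhead:.1f} ms, {python_overhead:.1f} ms")
```

This prints 10.7 ms, 26.7 ms: with these figures the forward pass drops from 40.0 ms to about 27.8 ms per frame, and roughly 10.7 ms per frame still goes to work outside the network.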

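How can I add support for converting to INT8 or INT16?

Hello! I would like to try converting to INT8 or INT16. Accuracy may drop, but I want to try it if the loss is within an acceptable range. What do I need to modify to support this? (It seems protobuf does not support types such as int8/int16.) Also, once the calibration file int8EntropyCalibratorFile has been generated, what value should Int8Process int8process be given? It is initially nullptr, which causes an error.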
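
To get a feel for the accuracy loss mentioned above, here is a minimal sketch of symmetric INT8 quantization. It uses a simple max-abs range, which is an assumption made for illustration; TensorRT's entropy calibrator chooses the range differently, and as far as I know TensorRT 7 offers FP32, FP16, and INT8 precisions but no INT16 mode.

```python
def quantize_int8(values):
    """Symmetric max-abs quantization to the range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in quantized]


q, scale = quantize_int8([0.5, -1.27, 0.003, 1.0])
restored = dequantize(q, scale)
print(q)
```

The small activation 0.003 quantizes to 0 and cannot be recovered, which is the kind of error calibration tries to keep within an acceptable range.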

Unsure how to apply DCNv2 Plugin to onnx-tensorrt

Hi, thanks very much for creating this repository!

I've exported CenterTrack with DCNv2 to ONNX with your replacements, but am not sure how to add DCNv2 plugin for onnx-tensorrt parser.

I get the expected error:

onnx.onnx_cpp2py_export.checker.ValidationError: No Op registered for Plugin with domain_version of 9

==> Context: Bad node spec: input: "561" input: "562" input: "dla_up.ida_0.proj_1.conv.weight" input: "dla_up.ida_0.proj_1.conv.bias" output: "563" op_type: "Plugin" attribute { name: "info" s: "{"dilation": [1, 1], "padding": [1, 1], "stride": [1, 1], "deformable_groups": 1}" type: STRING } attribute { name: "name" s: "DCNv2" type: STRING }

If you find time, would you mind showing how I should modify my onnx-tensorrt install to add the plugin?
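
For context, the checker error appears because "Plugin" is not a standard ONNX operator; the node only carries the plugin name and a JSON string in its attributes. A sketch of decoding those two attributes, using the values from the node dump above (the helper name parse_plugin_node is hypothetical and not part of onnx-tensorrt):

```python
import json


def parse_plugin_node(name, info):
    """Combine the 'name' and 'info' attributes of a Plugin node."""
    config = json.loads(info)
    return {"plugin": name, **config}


node = parse_plugin_node(
    "DCNv2",
    '{"dilation": [1, 1], "padding": [1, 1], "stride": [1, 1], '
    '"deformable_groups": 1}',
)
print(node["plugin"], node["stride"], node["deformable_groups"])
```

An importer for the Plugin op would need to perform the equivalent lookup on the C++ side before it can create the DCNv2 layer.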




Building with the Makefile on Ubuntu 16.04




Using protobuf 3.8.x as the README instructs causes a problem

#error This file was generated by a newer version of protoc which is
#error incompatible with your Protocol Buffer headers. Please update
#error your headers.
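
This #error comes from a version check in the generated .pb.h file: the protoc that generated the file is newer than the protobuf headers on the include path. The headers encode their version in GOOGLE_PROTOBUF_VERSION as major * 1000000 + minor * 1000 + patch. A sketch that compares the two (the function names are illustrative; requiring an exact match is a conservative rule of thumb, not protobuf's precise rule):

```python
import re


def decode_header_version(value):
    """Decode GOOGLE_PROTOBUF_VERSION, e.g. 3008000 -> (3, 8, 0)."""
    return value // 1_000_000, (value // 1000) % 1000, value % 1000


def parse_protoc_version(text):
    """Parse the output of `protoc --version`, e.g. 'libprotoc 3.11.4'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def versions_match(header_value, protoc_text):
    """True when the headers and protoc report the same version."""
    return decode_header_version(header_value) == parse_protoc_version(protoc_text)


print(versions_match(3008000, "libprotoc 3.8.0"))
print(versions_match(3008000, "libprotoc 3.11.4"))
```

When the two differ, either regenerate the .pb files with the protoc that matches the headers, or point the build at the headers and library that match the protoc in use.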


How do I build on Ubuntu?

The Makefile seems to have many version problems. Also, is the LEAN directory needed on Ubuntu? I don't quite understand it. How did you build it?




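Hi @dlunion @hopef, thanks for this work. I am trying to integrate libtorch into the code, but on Ubuntu linking it into the main function fails. I don't know whether the problem is in what I wrote or whether it is a bug.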



INC_TORCH := $(TORCH_PREFIX)/include $(TORCH_PREFIX)/include/torch/csrc/api/include

LIB_TORCH := $(TORCH_PREFIX)/lib /usr/local/cuda/lib64/stubs

LD_TORCH := torch c10 c10d c10_cuda 

Could you help me find where the problem is? Many thanks!


@hopef Hello, can the coco_tracking.onnx model be converted to a TRT model and used for inference?



shl@zhihui-mint:~/shl_res/project_source/CRF/TensorRT-$ cat 9_CenterTrack_onnx_512_to_trt_fp16 
./trtexec --onnx=/home/shl/CenterTrack/models/coco_tracking.onnx --explicitBatch --saveEngine=/home/shl/CenterTrack/models/coco_tracking.trt --workspace=5120 --fp16

(yolov4) shl@zhihui-mint:~/shl_res/project_source/CRF/TensorRT-$ ./trtexec --onnx=/home/shl/CenterTrack/models/coco_tracking.onnx --explicitBatch --saveEngine=/home/shl/CenterTrack/models/coco_tracking.trt --workspace=5120 --fp16
&&&& RUNNING TensorRT.trtexec # ./trtexec --onnx=/home/shl/CenterTrack/models/coco_tracking.onnx --explicitBatch --saveEngine=/home/shl/CenterTrack/models/coco_tracking.trt --workspace=5120 --fp16
[11/13/2020-19:40:47] [I] === Model Options ===
[11/13/2020-19:40:47] [I] Format: ONNX
[11/13/2020-19:40:47] [I] Model: /home/shl/CenterTrack/models/coco_tracking.onnx
[11/13/2020-19:40:47] [I] Output:
[11/13/2020-19:40:47] [I] === Build Options ===
[11/13/2020-19:40:47] [I] Max batch: explicit
[11/13/2020-19:40:47] [I] Workspace: 5120 MB
[11/13/2020-19:40:47] [I] minTiming: 1
[11/13/2020-19:40:47] [I] avgTiming: 8
[11/13/2020-19:40:47] [I] Precision: FP16
[11/13/2020-19:40:47] [I] Calibration: 
[11/13/2020-19:40:47] [I] Safe mode: Disabled
[11/13/2020-19:40:47] [I] Save engine: /home/shl/CenterTrack/models/coco_tracking.trt
[11/13/2020-19:40:47] [I] Load engine: 
[11/13/2020-19:40:47] [I] Inputs format: fp32:CHW
[11/13/2020-19:40:47] [I] Outputs format: fp32:CHW
[11/13/2020-19:40:47] [I] Input build shapes: model
[11/13/2020-19:40:47] [I] === System Options ===
[11/13/2020-19:40:47] [I] Device: 0
[11/13/2020-19:40:47] [I] DLACore: 
[11/13/2020-19:40:47] [I] Plugins:
[11/13/2020-19:40:47] [I] === Inference Options ===
[11/13/2020-19:40:47] [I] Batch: Explicit
[11/13/2020-19:40:47] [I] Iterations: 10
[11/13/2020-19:40:47] [I] Duration: 3s (+ 200ms warm up)
[11/13/2020-19:40:47] [I] Sleep time: 0ms
[11/13/2020-19:40:47] [I] Streams: 1
[11/13/2020-19:40:47] [I] ExposeDMA: Disabled
[11/13/2020-19:40:47] [I] Spin-wait: Disabled
[11/13/2020-19:40:47] [I] Multithreading: Disabled
[11/13/2020-19:40:47] [I] CUDA Graph: Disabled
[11/13/2020-19:40:47] [I] Skip inference: Disabled
[11/13/2020-19:40:47] [I] Inputs:
[11/13/2020-19:40:47] [I] === Reporting Options ===
[11/13/2020-19:40:47] [I] Verbose: Disabled
[11/13/2020-19:40:47] [I] Averages: 10 inferences
[11/13/2020-19:40:47] [I] Percentile: 99
[11/13/2020-19:40:47] [I] Dump output: Disabled
[11/13/2020-19:40:47] [I] Profile: Disabled
[11/13/2020-19:40:47] [I] Export timing to JSON file: 
[11/13/2020-19:40:47] [I] Export output to JSON file: 
[11/13/2020-19:40:47] [I] Export profile to JSON file: 
[11/13/2020-19:40:47] [I] 
Input filename:   /home/shl/CenterTrack/models/coco_tracking.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    pytorch
Producer version: 1.1
Model version:    0
Doc string:       
While parsing node number 136 [Plugin]:
ERROR: ModelImporter.cpp:134 In function parseGraph:
[8] No importer registered for op: Plugin
[11/13/2020-19:40:48] [E] Failed to parse onnx file
[11/13/2020-19:40:48] [E] Parsing model failed
[11/13/2020-19:40:48] [E] Engine creation failed
[11/13/2020-19:40:48] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # ./trtexec --onnx=/home/shl/CenterTrack/models/coco_tracking.onnx --explicitBatch --saveEngine=/home/shl/CenterTrack/models/coco_tracking.trt --workspace=5120 --fp16



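How can detectron2's ModulatedDeformConv use the DCNv2 plugin?

I train models with the detectron2 framework, whose DCN layers use detectron2.layers.ModulatedDeformConv. If I directly replace ModulatedDeformConv with DCNv2 for inference, the results do not match the original ones (the values are larger than the original results). Does anyone know how the source implementations of detectron2's ModulatedDeformConv and DCNv2 differ?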



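YOLOv5: detecting grayscale images

Thanks for your work!
I want to detect objects in grayscale images, and I trained YOLO on a grayscale training set. Now that the input is grayscale, does the ONNX and TRT conversion code differ from before? Do I only need to change the number of input channels?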


RuntimeError: function _DCNv2Backward returned a gradient different than None at position 5, but the corresponding forward input was not a Variable


src/infer/trt_infer.cpp: In member function 'void TRTInfer::Tensor::toFloat()':
src/infer/trt_infer.cpp:276:11: error: cannot convert 'TRTInfer::halfloat {aka __half}' to 'float' in assignment
*dst++ = *src++;
src/infer/trt_infer.cpp: In member function 'void TRTInfer::Tensor::toHalf()':
src/infer/trt_infer.cpp:300:11: error: no match for 'operator=' (operand types are 'TRTInfer::halfloat {aka __half}' and 'float')
*dst++ = *src++;
In file included from lean/cuda-9.0/include/cuda_fp16.h:1967:0,
from src/infer/trt_infer.cpp:10:
lean/cuda-9.0/include/cuda_fp16.hpp:137:33: note: candidate: __half& __half::operator=(const __half_raw&)
CUDA_HOSTDEVICE __half &operator=(const __half_raw &hr) { __x = hr.x; return *this; }
lean/cuda-9.0/include/cuda_fp16.hpp:137:33: note: no known conversion for argument 1 from 'float' to 'const __half_raw&'
lean/cuda-9.0/include/cuda_fp16.hpp:124:26: note: candidate: __half& __half::operator=(const __half&)
struct CUDA_ALIGN(2) __half {
lean/cuda-9.0/include/cuda_fp16.hpp:124:26: note: no known conversion for argument 1 from 'float' to 'const __half&'
lean/cuda-9.0/include/cuda_fp16.hpp:124:26: note: candidate: __half& __half::operator=(__half&&)
lean/cuda-9.0/include/cuda_fp16.hpp:124:26: note: no known conversion for argument 1 from 'float' to '__half&&'
src/infer/trt_infer.cpp: In member function 'void TRTInfer::Tensor::setRandom(float, float)':
src/infer/trt_infer.cpp:318:12: error: no match for 'operator=' (operand types are 'TRTInfer::halfloat {aka __half}' and 'float')
*ptr++ = ccutil::randrf(low, high);
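
These errors say that, with the cuda_fp16.h from lean/cuda-9.0, host code cannot assign a float to a __half or the reverse implicitly, so the conversion has to be written out. In CUDA C++ the explicit conversions are __float2half and __half2float; whether they are callable from host code depends on the CUDA version, so treat that as something to check. The sketch below only illustrates, in Python, what such a round trip through half precision does to a value:

```python
import struct


def to_half_and_back(x):
    """Round-trip a float through IEEE 754 binary16 (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(to_half_and_back(1.0))
print(to_half_and_back(0.1))
```

1.0 survives unchanged, while 0.1 comes back as 0.0999755859375, which is why the conversion is worth making explicit in toFloat(), toHalf(), and setRandom().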

libprotobuf FATAL

I downloaded an SSD ONNX file and built it with the center_net_coco2x_dcn example, but it always fails with:

W[2020-08-14 11:54:36:d:\nndl_pytorch\tensorrtintegrate-master\src\examples\center_net_coco2x_dcn.cpp:121]:onnx to trtmodel...
W[2020-08-14 11:54:43:d:\nndl_pytorch\tensorrtintegrate-master\src\builder\trt_builder.cpp:96]:Build FP32 trtmodel.
WARNING: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Successfully casted down to INT32.
[libprotobuf FATAL d:\nndl_pytorch\tensorrtintegrate-master\lean\protobuf3.11.4\include\google\protobuf\repeated_field.h:1537] CHECK failed: (index) < (current_size_):


E[2020-08-14 12:17:59:d:\nndl_pytorch\tensorrtintegrate-master\src\builder\trt_builder.cpp:32]:NVInfer ERROR: (Unnamed Layer* 0) [Slice]: slice size must be positive, size = [3,0,640]
E[2020-08-14 12:17:59:d:\nndl_pytorch\tensorrtintegrate-master\src\builder\trt_builder.cpp:32]:NVInfer ERROR: (Unnamed Layer* 0) [Slice]: slice size must be positive, size = [3,0,640]
[libprotobuf FATAL d:\nndl_pytorch\tensorrtintegrate-master\lean\protobuf3.11.4\include\google\protobuf\repeated_field.h:1537] CHECK failed: (index) < (current_size_):





error: a nonstatic member reference must be relative to a specific object

Hi, I'm trying to compile this project on Ubuntu 18.04, and I've written a CMakeLists.txt:

cmake_minimum_required(VERSION 3.15)
set(CMAKE_CUDA_COMPILER /usr/local/cuda-10.0/bin/nvcc)
set(CMAKE_CUDA_FLAGS -arch=sm_75)
#set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -ccbin /usr/local/cuda-10.0/bin/nvcc")


# TensorRT 7
set(TensorRT_7_DIR "/home/xfb/Programs/TensorRT/")
set(TensorRT_7_LIBDIR "${TensorRT_7_DIR}/lib")
file(GLOB TensorRT_7_LIBS "${TensorRT_7_LIBDIR}/*.so")

# CUDA 10.0
set(CUDA_DIR "/usr/local/cuda-10.0")
set(CUDA_LIBDIR "${CUDA_DIR}/lib64")

# OpenCV
set(OpenCV_DIR "/usr/local/lib/cmake/opencv4")
find_package(OpenCV REQUIRED)

# Protobuf
set(Protobuf_DIR "/usr/local/include/google/protobuf")
set(Protobuf_LIB /usr/local/lib/

# common files for sample projects
include_directories(builder caffeplugin common examples infer onnx_parser)

file(GLOB caffeplugin_src "caffeplugin/*.cpp" "caffeplugin/plugins/*.cu")
file(GLOB common_src "common/*.cpp")
file(GLOB infer_src "infer/*.cpp" "infer/*.cu")
file(GLOB examples_src "examples/*.cpp")
file(GLOB onnx_src "onnx/*.cpp")
file(GLOB onnx_parser_src "onnx_parser/*.cpp" "onnx_parser/*.cu")
file(GLOB onnxplugin_src onnxplugin/onnxplugin.cpp "onnxplugin/plugins/*.cu")


target_link_directories(COCO_track PUBLIC builder caffeplugin common examples infer onnx onnx_parser)
target_link_directories(COCO_track PUBLIC "${TensorRT_7_DIR}/include")
target_link_directories(COCO_track PUBLIC "${CUDA_DIR}/include")
target_link_directories(COCO_track PUBLIC "${Protobuf_DIR}")
target_link_libraries(COCO_track ${TensorRT_7_LIBS} ${CUDA_LIBS} ${OpenCV_LIBS} ${Protobuf_LIB})

However, when I built the project, I met the following error a couple of times:

/usr/include/c++/7/debug/safe_unordered_container.h(71): error: a nonstatic member reference must be relative to a specific object

Any idea on how to debug this error? Many thanks!


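Hello, while implementing my own plugin, I stepped through parseFromFile into the DEFINE_BUILTIN_OP_IMPORTER function, and creator->createPlugin(name.c_str(), &pluginFieldCollection); returns null. Why does this happen? Where should I register the plugin? What is the order of serialization? @dlunion @hopef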

src/caffeplugin/plugins/ error: more than one conversion function from "const TRTInfer::halfloat" to a built-in type applies:

Hello. The first time, make succeeded; however, the second time I got the following error:

src/caffeplugin/plugins/ error: more than one conversion function from "const TRTInfer::halfloat" to a built-in type applies:
function "__half::operator float() const"
function "__half::operator short() const"
function "__half::operator unsigned short() const"
function "__half::operator int() const"
function "__half::operator unsigned int() const"
function "__half::operator long long() const"
function "__half::operator unsigned long long() const"
function "__half::operator __nv_bool() const"
detected during instantiation of "void channelMultiplicationKernel(const _T *, const _T *, _T *, int, int) [with _T=TRTInfer::halfloat]"
(36): here
How can I solve it? Thanks.

getPluginCreator could not find plugin DCNv2 version 1

Hi, thanks for sharing!
I am trying to convert an ONNX model to a TRT model, so I deleted all the Caffe-related code and kept only the ONNX-related code.
I got the following error when running the code:

[08/14/2020-22:29:07] [E] [TRT] INVALID_ARGUMENT: getPluginCreator could not find plugin DCNv2 version 1
DCNv2 plugin was not found in the plugin registry!While parsing node number 136 [Plugin]:
ERROR: /d/Project_Dominant/Load_CenterTrack_Onnx/src/onnx_parser/builtin_op_importers.cpp:1693 In function importPlugin:
[8] Assertion failed: false
[08/14/2020-22:29:07] [E] [TRT] Network must have at least one output
[08/14/2020-22:29:07] [E] [TRT] Network validation failed.
Segmentation fault (core dumped)

Any help or suggestions? Thanks.
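
The first error line says that a lookup by plugin name and version found nothing. In TensorRT, creators are usually added to the registry by static objects such as those produced by the REGISTER_TENSORRT_PLUGIN macro, so one possible cause (a guess, not confirmed from this log) is that the source file registering DCNv2 was removed along with the Caffe code or is no longer linked into the binary. A toy model of the lookup, which is not TensorRT's implementation:

```python
class PluginRegistry:
    """Toy stand-in for a plugin registry keyed by (name, version)."""

    def __init__(self):
        self._creators = {}

    def register(self, name, version, creator):
        """Store a creator callable under its name and version."""
        self._creators[(name, version)] = creator

    def get_creator(self, name, version):
        """Return the creator, or None if nothing was registered."""
        return self._creators.get((name, version))


registry = PluginRegistry()
missing = registry.get_creator("DCNv2", "1")   # nothing registered yet
registry.register("DCNv2", "1", lambda fields: {"type": "DCNv2", **fields})
found = registry.get_creator("DCNv2", "1")
print(missing is None, found is not None)
```

Both the name and the version have to match exactly; a creator registered as DCNv2 version 2 would still make the version 1 lookup fail.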

