
tensorrt_pro's Introduction

Read this in other languages: English, 简体中文.

News:

Tutorial Video

An Out-of-the-Box TensorRT-based Framework for High Performance Inference with C++/Python Support

  • C++ Interface: three lines of code are all you need to run YoloX

    // create inference engine on gpu-0
    //auto engine = Yolo::create_infer("yolov5m.fp32.trtmodel", Yolo::Type::V5, 0);
    auto engine = Yolo::create_infer("yolox_m.fp32.trtmodel", Yolo::Type::X, 0);
    
    // load image
    auto image = cv::imread("1.jpg");
    
    // do inference and get the result
    auto box = engine->commit(image).get();  // return vector<Box>
  • Python Interface:

    import torch
    import torchvision.models as models
    import pytrt as tp
    
    device    = "cuda"
    input     = torch.zeros(1, 3, 224, 224, device=device)  # dummy input used for compilation
    model     = models.resnet18(True).eval().to(device)
    trt_model = tp.from_torch(model, input)
    trt_out   = trt_model(input)
    • Simple YOLO example in Python:
    import os
    import cv2
    import numpy as np
    import pytrt as tp
    
    engine_file = "yolov5s.fp32.trtmodel"
    if not os.path.exists(engine_file):
        tp.compile_onnx_to_file(1, tp.onnx_hub("yolov5s"), engine_file)
    
    yolo   = tp.Yolo(engine_file, type=tp.YoloType.V5)
    image  = cv2.imread("car.jpg")
    bboxes = yolo.commit(image).get()
    print(f"{len(bboxes)} objects")
    
    for box in bboxes:
        left, top, right, bottom = map(int, [box.left, box.top, box.right, box.bottom])
        cv2.rectangle(image, (left, top), (right, bottom), tp.random_color(box.class_label), 5)
    
    saveto = "yolov5.car.jpg"
    print(f"Save to {saveto}")
    
    cv2.imwrite(saveto, image)
    cv2.imshow("result", image)
    cv2.waitKey()

INTRO

  1. High-level interface for C++/Python.
  2. Simplifies the implementation of custom plugins; serialization and deserialization are encapsulated for easier usage.
  3. Simplifies FP32, FP16 and INT8 compilation, facilitating deployment with C++/Python on servers or embedded devices.
  4. Ready-to-use models with examples: RetinaFace, Scrfd, YoloV5, YoloX, Arcface, AlphaPose, CenterNet and DeepSORT (C++).

YoloX and YoloV5-series Model Test Report

app_yolo.cpp speed testing
  1. Resolution (YoloV5P5, YoloX) = (640x640), (YoloV5P6) = (1280x1280)
  2. max batch size = 16
  3. preprocessing + inference + postprocessing
  4. cuda10.2, cudnn8.2.2.26, TensorRT-8.0.1.6
  5. RTX2080Ti
  6. Number of runs: the average over 100 runs, excluding the first run (warmup)
  7. Testing log: workspace/perf.result.std.log
  8. Code for testing: src/application/app_yolo.cpp
  9. Images for testing: 6 images in workspace/inference
    • with resolutions 810x1080, 500x806, 1024x684, 550x676, 1280x720 and 800x533 respectively
  10. Testing method: load the 6 images, then run inference on all 6, repeated 100 times; each image is preprocessed and postprocessed on every run (a sketch of this loop follows this list)
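For orientation, a minimal sketch of this measurement loop using the Python interface shown above (the engine file and image names are illustrative assumptions; the actual benchmark is the C++ code in src/application/app_yolo.cpp):

import time
import cv2
import pytrt as tp

# hypothetical engine and image paths
yolo   = tp.Yolo("yolov5s.fp32.trtmodel", type=tp.YoloType.V5)
images = [cv2.imread(f"workspace/inference/{i}.jpg") for i in range(1, 7)]

elapsed = []
for run in range(101):                     # 1 warmup run + 100 timed runs
    start = time.time()
    for image in images:
        bboxes = yolo.commit(image).get()  # preprocess + inference + postprocess
    elapsed.append(time.time() - start)

avg_ms = sum(elapsed[1:]) / 100 / len(images) * 1000  # drop warmup, per-image ms
print(f"{avg_ms:.3f} ms / image, {1000 / avg_ms:.2f} FPS")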

Model Resolution Type Precision Elapsed Time (ms) FPS
yolox_x 640x640 YoloX FP32 21.879 45.71
yolox_l 640x640 YoloX FP32 12.308 81.25
yolox_m 640x640 YoloX FP32 6.862 145.72
yolox_s 640x640 YoloX FP32 3.088 323.81
yolox_x 640x640 YoloX FP16 6.763 147.86
yolox_l 640x640 YoloX FP16 3.933 254.25
yolox_m 640x640 YoloX FP16 2.515 397.55
yolox_s 640x640 YoloX FP16 1.362 734.48
yolox_x 640x640 YoloX INT8 4.070 245.68
yolox_l 640x640 YoloX INT8 2.444 409.21
yolox_m 640x640 YoloX INT8 1.730 577.98
yolox_s 640x640 YoloX INT8 1.060 943.15
yolov5x6 1280x1280 YoloV5_P6 FP32 68.022 14.70
yolov5l6 1280x1280 YoloV5_P6 FP32 37.931 26.36
yolov5m6 1280x1280 YoloV5_P6 FP32 20.127 49.69
yolov5s6 1280x1280 YoloV5_P6 FP32 8.715 114.75
yolov5x 640x640 YoloV5_P5 FP32 18.480 54.11
yolov5l 640x640 YoloV5_P5 FP32 10.110 98.91
yolov5m 640x640 YoloV5_P5 FP32 5.639 177.33
yolov5s 640x640 YoloV5_P5 FP32 2.578 387.92
yolov5x6 1280x1280 YoloV5_P6 FP16 20.877 47.90
yolov5l6 1280x1280 YoloV5_P6 FP16 10.960 91.24
yolov5m6 1280x1280 YoloV5_P6 FP16 7.236 138.20
yolov5s6 1280x1280 YoloV5_P6 FP16 3.851 259.68
yolov5x 640x640 YoloV5_P5 FP16 5.933 168.55
yolov5l 640x640 YoloV5_P5 FP16 3.450 289.86
yolov5m 640x640 YoloV5_P5 FP16 2.184 457.90
yolov5s 640x640 YoloV5_P5 FP16 1.307 765.10
yolov5x6 1280x1280 YoloV5_P6 INT8 12.207 81.92
yolov5l6 1280x1280 YoloV5_P6 INT8 7.221 138.49
yolov5m6 1280x1280 YoloV5_P6 INT8 5.248 190.55
yolov5s6 1280x1280 YoloV5_P6 INT8 3.149 317.54
yolov5x 640x640 YoloV5_P5 INT8 3.704 269.97
yolov5l 640x640 YoloV5_P5 INT8 2.255 443.53
yolov5m 640x640 YoloV5_P5 INT8 1.674 597.40
yolov5s 640x640 YoloV5_P5 INT8 1.143 874.91
app_yolo_fast.cpp speed testing. Never stop striving to be faster
  • Highlight: about 0.5 ms faster than the above without any loss in precision. Specifically, we remove the Focus layer, some transpose nodes, etc. from the ONNX graph and implement them in a CUDA kernel function (see the sketch after this list); the rest remains the same.
  • Test log: workspace/perf.result.std.log
  • Code for testing: src/application/app_yolo_fast.cpp
  • Tips: you can make the modification by referring to the downloaded onnx. Questions are welcome through any channel.
  • Conclusion: the main idea of this work is to optimize pre- and post-processing. If you use the small versions of YoloX or YoloV5, this optimization may help.
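For context, the Focus layer at the head of YOLOv5 is only a slice-and-concat (space-to-depth); a minimal PyTorch sketch of the operation that gets folded into the CUDA preprocessing kernel (illustrative only, not the repo's implementation):

import torch

def focus(x: torch.Tensor) -> torch.Tensor:
    # YOLOv5 Focus: gather every 2x2 pixel block into the channel dim,
    # turning (b, c, h, w) into (b, 4c, h/2, w/2) before the first conv
    return torch.cat([x[..., ::2, ::2],     # top-left pixels
                      x[..., 1::2, ::2],    # bottom-left pixels
                      x[..., ::2, 1::2],    # top-right pixels
                      x[..., 1::2, 1::2]],  # bottom-right pixels
                     dim=1)

x = torch.randn(1, 3, 640, 640)
print(focus(x).shape)  # torch.Size([1, 12, 320, 320])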
Model Resolution Type Precision Elapsed Time (ms) FPS
yolox_x_fast 640x640 YoloX FP32 21.598 46.30
yolox_l_fast 640x640 YoloX FP32 12.199 81.97
yolox_m_fast 640x640 YoloX FP32 6.819 146.65
yolox_s_fast 640x640 YoloX FP32 2.979 335.73
yolox_x_fast 640x640 YoloX FP16 6.764 147.84
yolox_l_fast 640x640 YoloX FP16 3.866 258.64
yolox_m_fast 640x640 YoloX FP16 2.386 419.16
yolox_s_fast 640x640 YoloX FP16 1.259 794.36
yolox_x_fast 640x640 YoloX INT8 3.918 255.26
yolox_l_fast 640x640 YoloX INT8 2.292 436.38
yolox_m_fast 640x640 YoloX INT8 1.589 629.49
yolox_s_fast 640x640 YoloX INT8 0.954 1048.47
yolov5x6_fast 1280x1280 YoloV5_P6 FP32 67.075 14.91
yolov5l6_fast 1280x1280 YoloV5_P6 FP32 37.491 26.67
yolov5m6_fast 1280x1280 YoloV5_P6 FP32 19.422 51.49
yolov5s6_fast 1280x1280 YoloV5_P6 FP32 7.900 126.57
yolov5x_fast 640x640 YoloV5_P5 FP32 18.554 53.90
yolov5l_fast 640x640 YoloV5_P5 FP32 10.060 99.41
yolov5m_fast 640x640 YoloV5_P5 FP32 5.500 181.82
yolov5s_fast 640x640 YoloV5_P5 FP32 2.342 427.07
yolov5x6_fast 1280x1280 YoloV5_P6 FP16 20.538 48.69
yolov5l6_fast 1280x1280 YoloV5_P6 FP16 10.404 96.12
yolov5m6_fast 1280x1280 YoloV5_P6 FP16 6.577 152.06
yolov5s6_fast 1280x1280 YoloV5_P6 FP16 3.087 323.99
yolov5x_fast 640x640 YoloV5_P5 FP16 5.919 168.95
yolov5l_fast 640x640 YoloV5_P5 FP16 3.348 298.69
yolov5m_fast 640x640 YoloV5_P5 FP16 2.015 496.34
yolov5s_fast 640x640 YoloV5_P5 FP16 1.087 919.63
yolov5x6_fast 1280x1280 YoloV5_P6 INT8 11.236 89.00
yolov5l6_fast 1280x1280 YoloV5_P6 INT8 6.235 160.38
yolov5m6_fast 1280x1280 YoloV5_P6 INT8 4.311 231.97
yolov5s6_fast 1280x1280 YoloV5_P6 INT8 2.139 467.45
yolov5x_fast 640x640 YoloV5_P5 INT8 3.456 289.37
yolov5l_fast 640x640 YoloV5_P5 INT8 2.019 495.41
yolov5m_fast 640x640 YoloV5_P5 INT8 1.425 701.71
yolov5s_fast 640x640 YoloV5_P5 INT8 0.844 1185.47

Setup and Configuration

Linux
  1. VSCode (highly recommended!)
  2. Configure your paths for cudnn, cuda, tensorRT8.0 and protobuf.
  3. Configure the compute capability matching your NVIDIA graphics card in Makefile/CMakeLists.txt
  4. Configure your library path in .vscode/c_cpp_properties.json
  5. CUDA version: CUDA10.2
  6. CUDNN version: cudnn8.2.2.26. Note that both the dev (.h files) and runtime (.so files) packages should be downloaded.
  7. tensorRT version: tensorRT-8.0.1.6-cuda10.2
  8. protobuf version (for onnx parser): protobufv3.11.4
  • CMake:
    • mkdir build && cd build
    • cmake ..
    • make yolo -j8
  • Makefile:
    • make yolo -j8
Linux: Compile for Python
  • compile and install
    • Makefile:
      • set use_python := true in Makefile
    • CMakeLists.txt:
      • set(HAS_PYTHON ON) in CMakeLists.txt
    • Run make pyinstall -j8
    • The compiled file is python/pytrt/libpytrtc.so
Windows
  1. Please check the lean/README.md for the detailed dependency

  2. In TensorRT.vcxproj, replace the <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.0.props" /> with your own CUDA path

  3. In TensorRT.vcxproj, replace the <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.0.targets" /> with your own CUDA path

  4. In TensorRT.vcxproj, replace the <CodeGeneration>compute_61,sm_61</CodeGeneration> with your compute capability.

  5. Configure your dependencies or download them to the folder /lean. Configure the VC++ directories (include dir and reference)

  6. Configure your environment: Debug -> Environment

  7. Compile and run the example, where 3 options are available.

Windows: Compile for Python
  1. Compile pytrtc.pyd: choose the python target in Visual Studio and compile
  2. Copy the DLLs by executing 'python/copy_dll_to_pytrt.bat'
  3. Execute the example in the python dir with 'python test_yolov5.py'
  • if installation is needed, switch to the target env (e.g. your conda env) and run 'python setup.py install' after completing steps 1 and 2.
  • the compiled file is python/pytrt/libpytrtc.pyd
Other Protobuf Version
  • In onnx/make_pb.sh, replace the path protoc=/data/sxai/lean/protobuf3.11.4/bin/protoc with the protoc of your own version
# cd to the onnx directory in the terminal
cd onnx

# execute the command to generate the pb files
bash make_pb.sh
  • CMake:
    • replace set(PROTOBUF_DIR "/data/sxai/lean/protobuf3.11.4") in CMakeLists.txt with the path of your own protobuf.
mkdir build && cd build
cmake ..
make yolo -j64
  • Makefile:
    • replace the path lean_protobuf := /data/sxai/lean/protobuf3.11.4 in Makefile with the path of your own protobuf
make yolo -j64
TensorRT 7.x support
  • The default is tensorRT8.x
  1. Replace src/tensorRT/onnx_parser with onnx_parser_for_7.x/onnx_parser:
    • bash onnx_parser/use_tensorrt_7.x.sh
  2. Configure Makefile/CMakeLists.txt path to TensorRT7.x
  3. Execute make yolo -j64
TensorRT 8.x support
  • The default is tensorRT8.x
  1. Replace src/tensorRT/onnx_parser with onnx_parser_for_8.x/onnx_parser:
    • bash onnx_parser/use_tensorrt_8.x.sh
  2. Configure Makefile/CMakeLists.txt path to TensorRT8.x
  3. Execute make yolo -j64

Guide for Different Tasks/Model Support

YoloV5 Support
  • If PyTorch >= 1.7 and the model is YoloV5 5.0+, the model is supported by the framework out of the box
  • If PyTorch < 1.7 or YoloV5 is 2.0, 3.0 or 4.0, minor opset modifications are required
  • If you want inference with an older PyTorch, dynamic batch size and other advanced settings, please check our blog (currently in Chinese) and scan the QR code via WeChat to join us
  1. Download yolov5
git clone [email protected]:ultralytics/yolov5.git
  2. Modify the code for dynamic batchsize
# line 55 forward function in yolov5/models/yolo.py 
# bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
# x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
# modified into:

bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
bs = -1
ny = int(ny)
nx = int(nx)
x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

# line 70 in yolov5/models/yolo.py
#  z.append(y.view(bs, -1, self.no))
# modified into:
z.append(y.view(bs, self.na * ny * nx, self.no))

############# for yolov5-6.0 #####################
# line 65 in yolov5/models/yolo.py
# if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
#    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)
# modified into:
if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)

# disconnect for pytorch trace
anchor_grid = (self.anchors[i].clone() * self.stride[i]).view(1, -1, 1, 1, 2)

# line 70 in yolov5/models/yolo.py
# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
# modified into:
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh

# line 73 in yolov5/models/yolo.py
# wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
# modified into:
wh = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh
############# for yolov5-6.0 #####################


# line 52 in yolov5/export.py
# torch.onnx.export(dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # shape(1,3,640,640)
#                                'output': {0: 'batch', 1: 'anchors'}  # shape(1,25200,85)
# modified into:
torch.onnx.export(dynamic_axes={'images': {0: 'batch'},  # shape(1,3,640,640)
                                'output': {0: 'batch'}  # shape(1,25200,85) 
  3. Export the onnx model
cd yolov5
python export.py --weights=yolov5s.pt --dynamic --include=onnx --opset=11
  4. Copy the model and execute it
cp yolov5/yolov5s.onnx tensorRT_cpp/workspace/
cd tensorRT_cpp
make yolo -j32
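Before compiling, it can help to confirm that the exported model really has a dynamic batch dimension; a small sketch using the onnx Python package (assumed installed; not part of this repo):

import onnx

model = onnx.load("yolov5s.onnx")
for tensor in list(model.graph.input) + list(model.graph.output):
    dims = tensor.type.tensor_type.shape.dim
    # the first dim should be symbolic ('batch'), not a fixed integer
    print(tensor.name, [d.dim_param or d.dim_value for d in dims])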
YoloV7 Support
  1. Download yolov7 and the pth file
# from cdn
# or wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt

wget https://cdn.githubjs.cf/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt
git clone [email protected]:WongKinYiu/yolov7.git
  2. Modify the code for dynamic batchsize
# line 45 forward function in yolov7/models/yolo.py 
# bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
# x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
# modified into:

bs, _, ny, nx = map(int, x[i].shape)  # x(bs,255,20,20) to x(bs,3,20,20,85)
bs = -1
x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

# line 52 in yolov7/models/yolo.py
# y = x[i].sigmoid()
# y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
# z.append(y.view(bs, -1, self.no))
# modified into:
y = x[i].sigmoid()
xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, -1, 1, 1, 2)  # wh
classif = y[..., 4:]
y = torch.cat([xy, wh, classif], dim=-1)
z.append(y.view(bs, self.na * ny * nx, self.no))

# line 57 in yolov7/models/yolo.py
# return x if self.training else (torch.cat(z, 1), x)
# modified into:
return x if self.training else torch.cat(z, 1)


# line 52 in yolov7/models/export.py
# output_names=['classes', 'boxes'] if y is None else ['output'],
# dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # size(1,3,640,640)
#               'output': {0: 'batch', 2: 'y', 3: 'x'}} if opt.dynamic else None)
# modified into:
output_names=['classes', 'boxes'] if y is None else ['output'],
dynamic_axes={'images': {0: 'batch'},  # size(1,3,640,640)
              'output': {0: 'batch'}} if opt.dynamic else None)
  3. Export the onnx model
cd yolov7
python models/export.py --dynamic --grid --weight=yolov7.pt
  4. Copy the model and execute it
cp yolov7/yolov7.onnx tensorRT_cpp/workspace/
cd tensorRT_cpp
make yolo -j32
YoloX Support
  1. Download YoloX
git clone [email protected]:Megvii-BaseDetection/YOLOX.git
cd YOLOX
  2. Modify the code. The modification ensures successful int8 compilation and inference; otherwise "Missing scale and zero-point for tensor (Unnamed Layer* 686)" will be raised.
# line 206 forward function in yolox/models/yolo_head.py. Replace the commented code with the uncommented code
# self.hw = [x.shape[-2:] for x in outputs] 
self.hw = [list(map(int, x.shape[-2:])) for x in outputs]


# line 208 forward function in yolox/models/yolo_head.py. Replace the commented code with the uncommented code
# [batch, n_anchors_all, 85]
# outputs = torch.cat(
#     [x.flatten(start_dim=2) for x in outputs], dim=2
# ).permute(0, 2, 1)
proc_view = lambda x: x.view(-1, int(x.size(1)), int(x.size(2) * x.size(3)))
outputs = torch.cat(
    [proc_view(x) for x in outputs], dim=2
).permute(0, 2, 1)


# line 253 decode_output function in yolox/models/yolo_head.py Replace the commented code with the uncommented code
#outputs[..., :2] = (outputs[..., :2] + grids) * strides
#outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides
#return outputs
xy = (outputs[..., :2] + grids) * strides
wh = torch.exp(outputs[..., 2:4]) * strides
return torch.cat((xy, wh, outputs[..., 4:]), dim=-1)

# line 77 in tools/export_onnx.py
model.head.decode_in_inference = True
  3. Export to onnx
# download model
wget https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_m.pth

# export
export PYTHONPATH=$PYTHONPATH:.
python tools/export_onnx.py -c yolox_m.pth -f exps/default/yolox_m.py --output-name=yolox_m.onnx --dynamic --no-onnxsim
  4. Execute the command
cp YOLOX/yolox_m.onnx tensorRT_cpp/workspace/
cd tensorRT_cpp
make yolo -j32
YoloV3 Support
  • If PyTorch >= 1.7 and the model is from the 5.0+ codebase, the model is supported by the framework out of the box
  • If PyTorch < 1.7 or an older yolov3, minor opset modifications are required
  • If you want inference with an older PyTorch, dynamic batch size and other advanced settings, please check our blog (currently in Chinese) and scan the QR code via WeChat to join us
  1. Download yolov3
git clone [email protected]:ultralytics/yolov3.git
  2. Modify the code for dynamic batchsize
# line 55 forward function in yolov3/models/yolo.py 
# bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
# x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
# modified into:

bs, _, ny, nx = map(int, x[i].shape)  # x(bs,255,20,20) to x(bs,3,20,20,85)
bs = -1
x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()


# line 70 in yolov3/models/yolo.py
#  z.append(y.view(bs, -1, self.no))
# modified into:
z.append(y.view(bs, self.na * ny * nx, self.no))

# line 62 in yolov3/models/yolo.py
# if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
#    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)
# modified into:
if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)
anchor_grid = (self.anchors[i].clone() * self.stride[i]).view(1, -1, 1, 1, 2)

# line 70 in yolov3/models/yolo.py
# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
# modified into:
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh

# line 73 in yolov3/models/yolo.py
# wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
# modified into:
wh = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh


# line 52 in yolov3/export.py
# torch.onnx.export(dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # shape(1,3,640,640)
#                                'output': {0: 'batch', 1: 'anchors'}  # shape(1,25200,85) 
# modified into:
torch.onnx.export(dynamic_axes={'images': {0: 'batch'},  # shape(1,3,640,640)
                                'output': {0: 'batch'}  # shape(1,25200,85) 
  3. Export the onnx model
cd yolov3
python export.py --weights=yolov3.pt --dynamic --include=onnx --opset=11
  4. Copy the model and execute it
cp yolov3/yolov3.onnx tensorRT_cpp/workspace/
cd tensorRT_cpp

# change src/application/app_yolo.cpp: main
# test(Yolo::Type::V3, TRT::Mode::FP32, "yolov3");

make yolo -j32
UNet Support
make dunet -j32
Retinaface Support
  1. Download Pytorch_Retinaface Repo
git clone [email protected]:biubug6/Pytorch_Retinaface.git
cd Pytorch_Retinaface
  2. Download the model from the Training section of README.md at https://github.com/biubug6/Pytorch_Retinaface#training, then unzip it to /weights. Here we use mobilenet0.25_Final.pth

  3. Modify the code

# line 24 in models/retinaface.py
# return out.view(out.shape[0], -1, 2) is modified into 
return out.view(-1, int(out.size(1) * out.size(2) * 2), 2)

# line 35 in models/retinaface.py
# return out.view(out.shape[0], -1, 4) is modified into
return out.view(-1, int(out.size(1) * out.size(2) * 2), 4)

# line 46 in models/retinaface.py
# return out.view(out.shape[0], -1, 10) is modified into
return out.view(-1, int(out.size(1) * out.size(2) * 2), 10)

# The following modification ensures that the output of the resize node is computed from a scale factor rather than a shape, so that dynamic batch can be achieved.
# line 89 in models/net.py
# up3 = F.interpolate(output3, size=[output2.size(2), output2.size(3)], mode="nearest") is modified into
up3 = F.interpolate(output3, scale_factor=2, mode="nearest")

# line 93 in models/net.py
# up2 = F.interpolate(output2, size=[output1.size(2), output1.size(3)], mode="nearest") is modified into
up2 = F.interpolate(output2, scale_factor=2, mode="nearest")

# The following code removes the softmax (which sometimes causes bugs) and concatenates the outputs to simplify decoding.
# line 123 in models/retinaface.py
# if self.phase == 'train':
#     output = (bbox_regressions, classifications, ldm_regressions)
# else:
#     output = (bbox_regressions, F.softmax(classifications, dim=-1), ldm_regressions)
# return output
# the above is modified into:
output = (bbox_regressions, classifications, ldm_regressions)
return torch.cat(output, dim=-1)

# set 'opset_version=11' to ensure a successful export
# torch_out = torch.onnx._export(net, inputs, output_onnx, export_params=True, verbose=False,
#     input_names=input_names, output_names=output_names)
# is modified into:
torch_out = torch.onnx._export(net, inputs, output_onnx, export_params=True, verbose=False, opset_version=11,
    input_names=input_names, output_names=output_names)


  4. Export to onnx
python convert_to_onnx.py
  5. Execute
cp FaceDetector.onnx ../tensorRT_cpp/workspace/mb_retinaface.onnx
cd ../tensorRT_cpp
make retinaface -j64
DBFace Support
make dbface -j64
Scrfd Support
Arcface Support
auto arcface = Arcface::create_infer("arcface_iresnet50.fp32.trtmodel", 0);
auto feature = arcface->commit(make_tuple(face, landmarks)).get();
cout << feature << endl;  // 1x512
  • In the face recognition example, workspace/face/library is the set of registered faces.
  • workspace/face/recognize is the set of faces to be recognized.
  • the results are saved in workspace/face/result and workspace/face/library_draw (a feature-comparison sketch follows below)
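Recognition then reduces to comparing the returned 1x512 features, typically by cosine similarity; a minimal numpy sketch (the 0.5 threshold and the random feature vectors are illustrative assumptions):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # a and b are 1x512 Arcface feature vectors
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical features for a registered face and a query face
registered = np.random.randn(1, 512).astype(np.float32)
query      = np.random.randn(1, 512).astype(np.float32)

score = cosine_similarity(registered, query)
print("same person" if score > 0.5 else "different person", score)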
CenterNet Support

Check out the details in tutorial/2.0

Bert Support (Chinese Classification)

The INTRO to the Interface

Python Interface: Get onnx and trtmodel from a pytorch model more easily
  • Just one line of code to export the onnx and trtmodel files and save them for future use.
import torch
import torchvision.models as models
import pytrt

model = models.resnet18(True).eval()
dummy_input = torch.zeros(1, 3, 224, 224)  # example input used for tracing
pytrt.from_torch(
    model,
    dummy_input,
    max_batch_size=16,
    onnx_save_file="test.onnx",
    engine_save_file="engine.trtmodel"
)
Python Interface: TensorRT Inference
  • YoloX TensorRT Inference
import cv2
import pytrt as tp

engine_file = "yolox_m.fp32.trtmodel"               # the trtmodel engine file
yolo   = tp.Yolo(engine_file, type=tp.YoloType.X)
image  = cv2.imread("inference/car.jpg")
bboxes = yolo.commit(image).get()
  • Seamless Inference from Pytorch to TensorRT
import torch
import torchvision.models as models
import pytrt as tp

device    = "cuda"
input     = torch.zeros(1, 3, 224, 224, device=device)
model     = models.resnet18(True).eval().to(device) # pytorch model
trt_model = tp.from_torch(model, input)
trt_out   = trt_model(input)
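As a sanity check, the TensorRT output can be compared against the PyTorch output on the same input (a sketch; it assumes trt_out can be converted to a torch tensor, which this README does not state):

import torch

with torch.no_grad():
    torch_out = model(input)

# the two outputs should agree closely for an fp32 engine
trt_as_torch = torch.as_tensor(trt_out).to(torch_out.device)
print((torch_out - trt_as_torch).abs().max().item())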
C++ Interface: YoloX Inference
// create infer engine on gpu 0
auto engine = Yolo::create_infer("yolox_m.fp32.trtmodel", Yolo::Type::X, 0);

// load image
auto image = cv::imread("1.jpg");

// do inference and get the result
auto box = engine->commit(image).get();
C++ Interface: Compile Model in FP32/FP16
TRT::compile(
  TRT::Mode::FP32,   // compile model in fp32
  3,                          // max batch size
  "plugin.onnx",              // onnx file
  "plugin.fp32.trtmodel",     // save path
  {}                         //  redefine the shape of input when needed
);
  • For fp32 compilation, all you need to provide is the onnx file; the input shape can be redefined when needed.
C++ Interface: Compile in INT8
  • INT8 inference is slightly less accurate than fp32 (roughly a 5% drop) but substantially faster. The framework offers int8 inference as follows:
// define the int8 calibration callback, which reads data and writes it into the tensor
auto int8process = [](int current, int count, vector<string>& images, shared_ptr<TRT::Tensor>& tensor){
    for(int i = 0; i < images.size(); ++i){
    // int8 compilation requires calibration: read the image data and call set_norm_mat; the data is then transferred into the tensor.
        auto image = cv::imread(images[i]);
        cv::resize(image, image, cv::Size(640, 640));
        float mean[] = {0, 0, 0};
        float std[]  = {1, 1, 1};
        tensor->set_norm_mat(i, image, mean, std);
    }
};


// Specify TRT::Mode as INT8
auto model_file = "yolov5m.int8.trtmodel";
TRT::compile(
  TRT::Mode::INT8,            // INT8
  3,                          // max batch size
  "yolov5m.onnx",             // onnx
  model_file,                 // saved filename
  {},                         // redefine the input shape
  int8process,                // the callback function for calibration
  ".",                        // the dir containing the images used for calibration
  ""                          // the dir where the calibration cache is saved (i.e. where the calibration data is loaded from)
);
  • We consolidate everything into a single int8process function, which avoids many of the issues that can otherwise arise with the official TensorRT implementation.
C++ Interface: Inference
  • We introduce the class Tensor for easier inference and data transfer between host and device, so that as a user you don't have to worry about the details.

  • class Engine is another facilitator.

// load model and get a shared_ptr. get nullptr if fail to load.
auto engine = TRT::load_infer("yolov5m.fp32.trtmodel");

// print model info
engine->print();

// load image
auto image = imread("demo.jpg");

// get the model input and output node, which can be accessed by name or index
auto input = engine->input(0);   // or auto input = engine->input("images");
auto output = engine->output(0); // or auto output = engine->output("output");

// put the image into input tensor by calling set_norm_mat()
float mean[] = {0, 0, 0};
float std[]  = {1, 1, 1};
input->set_norm_mat(0, image, mean, std);  // 0 is the batch index

// do the inference. Here sync(true) or async(false) is optional
engine->forward(); // engine->forward(true or false)

// get the output pointer, which can be used to access the output
float* output_ptr = output->cpu<float>();
C++ Interface: Plugin
  • You only need to define the kernel function and the inference process. The code details (e.g. serialization, deserialization, plugin injection) are handled under the hood.
  • Easy to implement a new plugin in FP32 and FP16. Refer to HSwish.cu for details.
template<>
__global__ void HSwishKernel(float* input, float* output, int edge) {

    KernelPositionBlock;
    float x = input[position];
    float a = x + 3;
    a = a < 0 ? 0 : (a >= 6 ? 6 : a);
    output[position] = x * a / 6;
}

int HSwish::enqueue(const std::vector<GTensor>& inputs, std::vector<GTensor>& outputs, const std::vector<GTensor>& weights, void* workspace, cudaStream_t stream) {

    int count = inputs[0].count();
    auto grid = CUDATools::grid_dims(count);
    auto block = CUDATools::block_dims(count);
    HSwishKernel <<<grid, block, 0, stream >>> (inputs[0].ptr<float>(), outputs[0].ptr<float>(), count);
    return 0;
}


RegisterPlugin(HSwish);

About Us

tensorrt_pro's People

Contributors

baiiiij, brucesonbo, gghlh, guanbin-huang, lacacy, liuanqi-libra7, lvkerui, shouxieai, xiongdafeng, yangmust, zhao7601, zhuipiaochen


tensorrt_pro's Issues

error: #error This file was generated by an older version of protoc which is

This problem occurs when compiling the code; the offending files are in the preprocessing parts of several files in the src/tensorRT/onnx folder. The README says these files were extracted from an ONNX build. I tried building ONNX, finding these files and copying them over, but the problem remains.
Is there any way to work around this? Or did I build the wrong version of ONNX?

The DCN fp16 branch cannot be used

Following the documentation I ran "test FP16 centernet_r18_dcn" and found the fp16 interface could not be called. Today I noticed the related fp16 implementation is commented out; is fp16 support incomplete?

Windows 10 build error

[ 17%] Linking CXX shared library plugin_list.dll
LINK Pass 1: command "D:\PROGRA1\MICROS1\2017\BUILDT1\VC\Tools\MSVC\14161.270\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\plugin_list.dir\objects1.rsp /out:plugin_list.dll /implib:plugin_list.lib /pdb:D:\Projects\CLionProjects\smart_classroom_cpp\build\plugin_list.pdb /dll /version:0.0 /machine:x64 /debug /INCREMENTAL -LIBPATH:D:\Projects\CLionProjects\smart_classroom_cpp\lean\protobuf3.11.4\lib -LIBPATH:D:\Projects\CLionProjects\smart_classroom_cpp\lean\TensorRT8.0.1.6\lib -LIBPATH:D:\Projects\CLionProjects\smart_classroom_cpp\lean\cuda11.1\lib\x64 -LIBPATH:D:\Projects\CLionProjects\smart_classroom_cpp\lean\cudnn8.2.1\lib\x64 -LIBPATH:D:\Projects\CLionProjects\smart_classroom_cpp\lean\opencv3.4.6\lib -LIBPATH:D:\Projects\CLionProjects\smart_classroom_cpp\lean\pthread\lib D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cudart.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:CMakeFiles\plugin_list.dir/intermediate.manifest CMakeFiles\plugin_list.dir/manifest.res" failed (exit code 1120) with the following output:
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_yolo_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_yolov5_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_yolox_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_preprocess_kernel.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_centernet_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_dbface_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_retinaface_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_scrfd_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_yolo_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_yolov5_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_yolox_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_preprocess_kernel.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_centernet_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_dbface_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_retinaface_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_scrfd_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_yolo_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_yolov5_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_yolox_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_preprocess_kernel.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_centernet_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_dbface_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_retinaface_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_scrfd_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol getPluginRegistry referenced in function "public: cdecl nvinfer1::PluginRegistrar::PluginRegistrar(void)" (??0?$PluginRegistrar@VDCNv2PluginCreator@@@nvinfer1@@qeaa@XZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol getPluginRegistry
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol getPluginRegistry
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "private: void __cdecl cv::String::deallocate(void)" (?deallocate@String@cv@@AEAAXXZ) referenced in function "public: __cdecl cv::String::~String(void)" (??1String@cv@@qeaa@XZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "private: void __cdecl cv::String::deallocate(void)" (?deallocate@String@cv@@AEAAXXZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "private: void __cdecl cv::String::deallocate(void)" (?deallocate@String@cv@@AEAAXXZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "int __cdecl TRT::data_type_size(enum TRT::DataType)" (?data_type_size@TRT@@YAHW4DataType@1@@z) referenced in function "public: virtual unsigned __int64 __cdecl DCNv2::getWorkspaceSize(struct nvinfer1::PluginTensorDesc const *,int,struct nvinfer1::PluginTensorDesc const *,int)const " (?getWorkspaceSize@DCNv2@@UEBA_KPEBUPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "public: int __cdecl ONNXPlugin::GTensor::count(int)const " (?count@GTensor@ONNXPlugin@@QEBAHH@Z) referenced in function "void __cdecl enqueue_native(struct cublasContext *,class std::vector<struct ONNXPlugin::GTensor,class std::allocator > const &,class std::vector<struct ONNXPlugin::GTensor,class std::allocator > &,class std::vector<struct ONNXPlugin::GTensor,class std::allocator > const &,void *,struct CUstream_st *)" (??$enqueue_native@M@@YAXPEAUcublasContext@@aebv?$vector@UGTensor@ONNXPlugin@@v?$allocator@UGTensor@ONNXPlugin@@@std@@@std@@AEAV12@1PEAXPEAUCUstream_st@@@z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: int __cdecl ONNXPlugin::GTensor::count(int)const " (?count@GTensor@ONNXPlugin@@QEBAHH@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: int __cdecl ONNXPlugin::GTensor::count(int)const " (?count@GTensor@ONNXPlugin@@QEBAHH@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "public: int __cdecl ONNXPlugin::GTensor::offset_array(unsigned __int64,int const *)const " (?offset_array@GTensor@ONNXPlugin@@QEBAH_KPEBH@Z) referenced in function "public: int __cdecl ONNXPlugin::GTensor::offset<>(int)const " (??$offset@$$V@GTensor@ONNXPlugin@@QEBAHH@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::configurePlugin(struct nvinfer1::DynamicPluginTensorDesc const *,int,struct nvinfer1::DynamicPluginTensorDesc const *,int)" (?configurePlugin@TRTPlugin@ONNXPlugin@@UEAAXPEBUDynamicPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::configurePlugin(struct nvinfer1::DynamicPluginTensorDesc const *,int,struct nvinfer1::DynamicPluginTensorDesc const *,int)" (?configurePlugin@TRTPlugin@ONNXPlugin@@UEAAXPEBUDynamicPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::configurePlugin(struct nvinfer1::DynamicPluginTensorDesc const *,int,struct nvinfer1::DynamicPluginTensorDesc const *,int)" (?configurePlugin@TRTPlugin@ONNXPlugin@@UEAAXPEBUDynamicPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "public: virtual __cdecl ONNXPlugin::TRTPlugin::~TRTPlugin(void)" (??1TRTPlugin@ONNXPlugin@@UEAA@XZ) referenced in function "public: virtual __cdecl DCNv2::~DCNv2(void)" (??1DCNv2@@UEAA@XZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual __cdecl ONNXPlugin::TRTPlugin::~TRTPlugin(void)" (??1TRTPlugin@ONNXPlugin@@UEAA@XZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual __cdecl ONNXPlugin::TRTPlugin::~TRTPlugin(void)" (??1TRTPlugin@ONNXPlugin@@UEAA@XZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "public: void __cdecl ONNXPlugin::TRTPlugin::pluginInit(class std::basic_string<char,struct std::char_traits,class std::allocator > const &,void const *,unsigned __int64)" (?pluginInit@TRTPlugin@ONNXPlugin@@QEAAXAEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@PEBX_K@Z) referenced in function "public: virtual class nvinfer1::IPluginV2DynamicExt * cdecl DCNv2PluginCreator::deserializePlugin(char const *,void const *,unsigned int64)" (?deserializePlugin@DCNv2PluginCreator@@UEAAPEAVIPluginV2DynamicExt@nvinfer1@@PEBDPEBX_K@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: void __cdecl ONNXPlugin::TRTPlugin::pluginInit(class std::basic_string<char,struct std::char_traits,class std::allocator > const &,void const *,unsigned __int64)" (?pluginInit@TRTPlugin@ONNXPlugin@@QEAAXAEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@PEBX_K@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: void __cdecl ONNXPlugin::TRTPlugin::pluginInit(class std::basic_string<char,struct std::char_traits,class std::allocator > const &,void const *,unsigned __int64)" (?pluginInit@TRTPlugin@ONNXPlugin@@QEAAXAEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@PEBX_K@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "public: virtual class std::shared_ptr __cdecl ONNXPlugin::TRTPlugin::new_config(void)" (?new_config@TRTPlugin@ONNXPlugin@@UEAA?AV?$shared_ptr@ULayerConfig@ONNXPlugin@@@std@@xz) referenced in function "public: virtual class std::shared_ptr __cdecl DCNv2::new_config(void)" (?new_config@DCNv2@@UEAA?AV?$shared_ptr@ULayerConfig@ONNXPlugin@@@std@@xz)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual class std::shared_ptr __cdecl ONNXPlugin::TRTPlugin::new_config(void)" (?new_config@TRTPlugin@ONNXPlugin@@UEAA?AV?$shared_ptr@ULayerConfig@ONNXPlugin@@@std@@xz)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual class std::shared_ptr __cdecl ONNXPlugin::TRTPlugin::new_config(void)" (?new_config@TRTPlugin@ONNXPlugin@@UEAA?AV?$shared_ptr@ULayerConfig@ONNXPlugin@@@std@@xz)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual bool __cdecl ONNXPlugin::TRTPlugin::supportsFormatCombination(int,struct nvinfer1::PluginTensorDesc const *,int,int)" (?supportsFormatCombination@TRTPlugin@ONNXPlugin@@UEAA_NHPEBUPluginTensorDesc@nvinfer1@@hh@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual bool __cdecl ONNXPlugin::TRTPlugin::supportsFormatCombination(int,struct nvinfer1::PluginTensorDesc const *,int,int)" (?supportsFormatCombination@TRTPlugin@ONNXPlugin@@UEAA_NHPEBUPluginTensorDesc@nvinfer1@@hh@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual bool __cdecl ONNXPlugin::TRTPlugin::supportsFormatCombination(int,struct nvinfer1::PluginTensorDesc const *,int,int)" (?supportsFormatCombination@TRTPlugin@ONNXPlugin@@UEAA_NHPEBUPluginTensorDesc@nvinfer1@@hh@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::getNbOutputs(void)const " (?getNbOutputs@TRTPlugin@ONNXPlugin@@UEBAHXZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::getNbOutputs(void)const " (?getNbOutputs@TRTPlugin@ONNXPlugin@@UEBAHXZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::getNbOutputs(void)const " (?getNbOutputs@TRTPlugin@ONNXPlugin@@UEBAHXZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::initialize(void)" (?initialize@TRTPlugin@ONNXPlugin@@UEAAHXZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::initialize(void)" (?initialize@TRTPlugin@ONNXPlugin@@UEAAHXZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::initialize(void)" (?initialize@TRTPlugin@ONNXPlugin@@UEAAHXZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::terminate(void)" (?terminate@TRTPlugin@ONNXPlugin@@UEAAXXZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::terminate(void)" (?terminate@TRTPlugin@ONNXPlugin@@UEAAXXZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::terminate(void)" (?terminate@TRTPlugin@ONNXPlugin@@UEAAXXZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual unsigned __int64 __cdecl ONNXPlugin::TRTPlugin::getWorkspaceSize(struct nvinfer1::PluginTensorDesc const *,int,struct nvinfer1::PluginTensorDesc const *,int)const " (?getWorkspaceSize@TRTPlugin@ONNXPlugin@@UEBA_KPEBUPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual unsigned __int64 __cdecl ONNXPlugin::TRTPlugin::getWorkspaceSize(struct nvinfer1::PluginTensorDesc const *,int,struct nvinfer1::PluginTensorDesc const *,int)const " (?getWorkspaceSize@TRTPlugin@ONNXPlugin@@UEBA_KPEBUPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual unsigned __int64 __cdecl ONNXPlugin::TRTPlugin::getWorkspaceSize(struct nvinfer1::PluginTensorDesc const *,int,struct nvinfer1::PluginTensorDesc const *,int)const " (?getWorkspaceSize@TRTPlugin@ONNXPlugin@@UEBA_KPEBUPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::enqueue(struct nvinfer1::PluginTensorDesc const *,struct nvinfer1::PluginTensorDesc const *,void const * const *,void * const *,void *,struct CUstream_st *)" (?enqueue@TRTPlugin@ONNXPlugin@@UEAAHPEBUPluginTensorDesc@nvinfer1@@0PEBQEBXPEBQEAXPEAXPEAUCUstream_st@@@z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::enqueue(struct nvinfer1::PluginTensorDesc const *,struct nvinfer1::PluginTensorDesc const *,void const * const *,void * const *,void *,struct CUstream_st *)" (?enqueue@TRTPlugin@ONNXPlugin@@UEAAHPEBUPluginTensorDesc@nvinfer1@@0PEBQEBXPEBQEAXPEAXPEAUCUstream_st@@@z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::enqueue(struct nvinfer1::PluginTensorDesc const *,struct nvinfer1::PluginTensorDesc const *,void const * const *,void * const *,void *,struct CUstream_st *)" (?enqueue@TRTPlugin@ONNXPlugin@@UEAAHPEBUPluginTensorDesc@nvinfer1@@0PEBQEBXPEBQEAXPEAXPEAUCUstream_st@@@z)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual unsigned __int64 __cdecl ONNXPlugin::TRTPlugin::getSerializationSize(void)const " (?getSerializationSize@TRTPlugin@ONNXPlugin@@UEBA_KXZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual unsigned __int64 __cdecl ONNXPlugin::TRTPlugin::getSerializationSize(void)const " (?getSerializationSize@TRTPlugin@ONNXPlugin@@UEBA_KXZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual unsigned __int64 __cdecl ONNXPlugin::TRTPlugin::getSerializationSize(void)const " (?getSerializationSize@TRTPlugin@ONNXPlugin@@UEBA_KXZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::serialize(void *)const " (?serialize@TRTPlugin@ONNXPlugin@@UEBAXPEAX@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::serialize(void *)const " (?serialize@TRTPlugin@ONNXPlugin@@UEBAXPEAX@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::serialize(void *)const " (?serialize@TRTPlugin@ONNXPlugin@@UEBAXPEAX@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol cublasGetMathMode referenced in function "enum cublasStatus_t __cdecl cublasMigrateComputeType(struct cublasContext *,enum cudaDataType_t,enum cublasComputeType_t *)" (?cublasMigrateComputeType@@ya?AW4cublasStatus_t@@PEAUcublasContext@@W4cudaDataType_t@@PEAW4cublasComputeType_t@@@z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol cublasSgemm_v2 referenced in function "void __cdecl segemm_native(struct cublasContext *,enum cublasOperation_t,enum cublasOperation_t,int,int,int,float,float const *,int,float const *,int,float,float *,int)" (?segemm_native@@YAXPEAUcublasContext@@W4cublasOperation_t@@1HHHMPEBMH2HMPEAMH@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol cublasGemmEx referenced in function "enum cublasStatus_t __cdecl cublasGemmEx(struct cublasContext *,enum cublasOperation_t,enum cublasOperation_t,int,int,int,void const *,void const *,enum cudaDataType_t,int,void const *,enum cudaDataType_t,int,void const *,void *,enum cudaDataType_t,int,enum cudaDataType_t,enum cublasGemmAlgo_t)" (?cublasGemmEx@@ya?AW4cublasStatus_t@@PEAUcublasContext@@W4cublasOperation_t@@1HHHPEBX2W4cudaDataType_t@@H23H2PEAX3H3W4cublasGemmAlgo_t@@@z)
plugin_list.dll : fatal error LNK1120: 23 unresolved externals
NMAKE : fatal error U1077: '"D:\Program Files\JetBrains\CLion 2021.2\bin\cmake\win\bin\cmake.exe"' : return code '0xffffffff'
Stop.
NMAKE : fatal error U1077: '"D:\Program Files\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"D:\Program Files\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"D:\Program Files\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.

retinaface+arcface is much slower on tensorRT7.x than on tensorRT8.x; what could be the reason?

1. retinaface, resnet50 + tensorRT8.x timings:
car.jpg, 2 faces, average time: 24.85 ms
gril.jpg, 1 faces, average time: 24.85 ms
group.jpg, 24 faces, average time: 24.85 ms
yq.jpg, 1 faces, average time: 24.85 ms
zand.jpg, 2 faces, average time: 24.85 ms
zgjr.jpg, 3 faces, average time: 24.85 ms

2. retinaface, resnet50 + tensorRT7.x timings:
car.jpg, 2 faces, average time: 34.78 ms
gril.jpg, 1 faces, average time: 34.78 ms
group.jpg, 24 faces, average time: 34.78 ms
yq.jpg, 1 faces, average time: 34.78 ms
zand.jpg, 2 faces, average time: 34.78 ms
zgjr.jpg, 3 faces, average time: 34.78 ms

3. retinaface+arcface, resnet50 + tensorrt8.x timings:
2ys1.jpg_2face_time:0.060646772384643555s
2ys3.jpg_9face_time:0.12475872039794922s
2ys5.jpg_3face_time:0.09700417518615723s
2ys2.jpg_1face_time:0.053908586502075195s

4. retinaface+arcface, resnet50 + tensorrt7.x timings:
2ys2.jpg_1face_time:0.33365297317504883s
2ys1.jpg_2face_time:0.6880166530609131s
2ys3.jpg_9face_time:2.6245572566986084s
2ys5.jpg_3face_time:0.9589879512786865s

The same model differs greatly across TensorRT versions; could this be a problem between arcface and tensorRT7.x?

Help please: I get the following error when compiling on Linux

I get the following error when compiling on Linux:
in function `CUDATools::check_driver(cudaError_enum, char const*, int, char const*)':
/home/yjy/test_trt_new/common/cuda_tools.cpp:15: undefined reference to `cuGetErrorString'
/usr/bin/ld: /home/yjy/test_trt_new/common/cuda_tools.cpp:16: undefined reference to `cuGetErrorName'

What is going on here?

Problem compiling protobuf

make reports an error

g++: error: google/protobuf/util/internal/.libs/proto_writer.o: No such file or directory
Makefile:2372: recipe for target 'libprotobuf.la' failed
make[2]: *** [libprotobuf.la] Error 1

Do I need to download googletest first? If so, which version?
ubuntu18.04 jetson xavier nx

win10 error

Environment: win10, vs2019, sm-75
Problem: the project compiles, but fails at runtime, as shown in the screenshots
What I tried:
1. Originally the CUDA environment was 11.0 and gave this error; setting CUDA to 10.2 gives the same error
2. The GPU is an RTX Titan; sm was changed to sm75
3. All CUDA paths were updated, including the CUDA path under Debug -> Environment
4. The TensorRT version was swapped and then restored to the author's version; the error below remains


Python

Hi, a question about Python: I see you use an anaconda environment and take the headers and dependencies from it, but with my system Python 3.6 I cannot find the corresponding lib/include. Could you point me to them? Thanks for the reply.

install

Could you give any tips about the installation of this repo?

How to modify the build configuration parameters

Hello author, I am not very proficient in C++. Could you add some documentation on which configuration parameters in which files need to be changed to compile and run inference with a model trained on one's own data?

/usr/bin/ld: cannot find -lcuda error

When I run make yolo -j8 on my Ubuntu system, the above error occurs. I have set my protobuf version to libprotoc 3.15.8 and compiled the project against TensorRT 8.2; before that I also ran use_tensorrt_8.x.sh. I tried to install libcuda via apt-get install libcuda-dev, but it didn't work; the library cannot be found.

After investigating, I found that no libcuda is installed for the newer TensorRT 8.2; I only found libcudart.so instead, which is strange. Any suggestions about this error?

Environment Configuration Issues!

I have never used VS Code before. I want to learn from the blogger's video, but there are always errors. Is there any solution? (I have installed TensorRT on Windows)

Can you help look at a build problem?

Is there a problem with the protobuf linking? How should I deal with it? Thanks!

root@44ee5adbd43e:/home/pangguoliang/projects/tensorRT_Pro/build# make yolo -j8
[ 1%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_dbface/plugin_list_generated_dbface_decode.cu.o
[ 2%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_yolo_fast/plugin_list_generated_yolox_decode.cu.o
[ 4%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_centernet/plugin_list_generated_centernet_decode.cu.o
[ 7%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_retinaface/plugin_list_generated_retinaface_decode.cu.o
[ 7%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_scrfd/plugin_list_generated_scrfd_decode.cu.o
[ 8%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_yolo/plugin_list_generated_yolo_decode.cu.o
[ 10%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/tensorRT/onnxplugin/plugins/plugin_list_generated_HSwish.cu.o
[ 11%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_yolo_fast/plugin_list_generated_yolov5_decode.cu.o
[ 13%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/tensorRT/common/plugin_list_generated_preprocess_kernel.cu.o
[ 14%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/tensorRT/onnxplugin/plugins/plugin_list_generated_DCNv2.cu.o
[ 16%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/tensorRT/onnxplugin/plugins/plugin_list_generated_HSigmoid.cu.o
Scanning dependencies of target plugin_list
[ 17%] Linking CXX shared library libplugin_list.so
[ 17%] Built target plugin_list
Scanning dependencies of target pro
[ 19%] Building CXX object CMakeFiles/pro.dir/src/application/app_alphapose.cpp.o
[ 20%] Building CXX object CMakeFiles/pro.dir/src/application/app_centernet.cpp.o
[ 22%] Building CXX object CMakeFiles/pro.dir/src/application/app_arcface.cpp.o
[ 23%] Building CXX object CMakeFiles/pro.dir/src/application/app_bert.cpp.o
[ 25%] Building CXX object CMakeFiles/pro.dir/src/application/app_arcface/arcface.cpp.o
[ 26%] Building CXX object CMakeFiles/pro.dir/src/application/app_centernet/centernet.cpp.o
[ 28%] Building CXX object CMakeFiles/pro.dir/src/application/app_dbface.cpp.o
[ 29%] Building CXX object CMakeFiles/pro.dir/src/application/app_alphapose/alpha_pose.cpp.o
[ 31%] Building CXX object CMakeFiles/pro.dir/src/application/app_dbface/dbface.cpp.o
[ 32%] Building CXX object CMakeFiles/pro.dir/src/application/app_fall_gcn/fall_gcn.cpp.o
[ 34%] Building CXX object CMakeFiles/pro.dir/src/application/app_fall_recognize.cpp.o
[ 35%] Building CXX object CMakeFiles/pro.dir/src/application/app_high_performance.cpp.o
[ 37%] Building CXX object CMakeFiles/pro.dir/src/application/app_high_performance/alpha_pose_high_perf.cpp.o
[ 38%] Building CXX object CMakeFiles/pro.dir/src/application/app_high_performance/high_performance.cpp.o
[ 40%] Building CXX object CMakeFiles/pro.dir/src/application/app_high_performance/yolo_high_perf.cpp.o
[ 41%] Building CXX object CMakeFiles/pro.dir/src/application/app_lesson.cpp.o
[ 43%] Building CXX object CMakeFiles/pro.dir/src/application/app_plugin.cpp.o
[ 44%] Building CXX object CMakeFiles/pro.dir/src/application/app_python/interface.cpp.o
[ 46%] Building CXX object CMakeFiles/pro.dir/src/application/app_retinaface.cpp.o
[ 47%] Building CXX object CMakeFiles/pro.dir/src/application/app_retinaface/retinaface.cpp.o
[ 49%] Building CXX object CMakeFiles/pro.dir/src/application/app_scrfd.cpp.o
[ 50%] Building CXX object CMakeFiles/pro.dir/src/application/app_scrfd/scrfd.cpp.o
[ 52%] Building CXX object CMakeFiles/pro.dir/src/application/app_yolo.cpp.o
[ 53%] Building CXX object CMakeFiles/pro.dir/src/application/app_yolo/yolo.cpp.o
[ 55%] Building CXX object CMakeFiles/pro.dir/src/application/app_yolo_fast.cpp.o
[ 56%] Building CXX object CMakeFiles/pro.dir/src/application/app_yolo_fast/yolo_fast.cpp.o
[ 58%] Building CXX object CMakeFiles/pro.dir/src/application/test_warpaffine.cpp.o
[ 59%] Building CXX object CMakeFiles/pro.dir/src/application/test_yolo_map.cpp.o
[ 61%] Building CXX object CMakeFiles/pro.dir/src/application/tools/auto_download.cpp.o
[ 62%] Building CXX object CMakeFiles/pro.dir/src/application/tools/deepsort.cpp.o
[ 64%] Building CXX object CMakeFiles/pro.dir/src/application/tools/zmq_remote_show.cpp.o
[ 65%] Building CXX object CMakeFiles/pro.dir/src/application/tools/zmq_u.cpp.o
[ 67%] Building CXX object CMakeFiles/pro.dir/src/main.cpp.o
[ 68%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/builder/trt_builder.cpp.o
[ 70%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/common/cuda_tools.cpp.o
[ 71%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/common/ilogger.cpp.o
[ 73%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/common/json.cpp.o
[ 74%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/common/trt_tensor.cpp.o
[ 76%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/import_lib.cpp.o
[ 77%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/infer/trt_infer.cpp.o
[ 79%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx/onnx-ml.pb.cpp.o
[ 80%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx/onnx-operators-ml.pb.cpp.o
[ 82%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/LoopHelpers.cpp.o
[ 83%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/ModelImporter.cpp.o
[ 85%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/NvOnnxParser.cpp.o
[ 86%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/OnnxAttrs.cpp.o
[ 88%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/RNNHelpers.cpp.o
[ 89%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/ShapeTensor.cpp.o
[ 91%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/ShapedWeights.cpp.o
[ 92%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/builtin_op_importers.cpp.o
[ 94%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/onnx2trt_utils.cpp.o
[ 95%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/onnxErrorRecorder.cpp.o
[ 97%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnxplugin/onnxplugin.cpp.o
[ 98%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnxplugin/plugin_binary_io.cpp.o
[100%] Linking CXX executable ../workspace/pro
CMakeFiles/pro.dir/src/tensorRT/onnx/onnx-ml.pb.cpp.o: In function `InitDefaultsscc_info_AttributeProto_onnx_2dml_2eproto()':
/home/pangguoliang/projects/tensorRT_Pro/src/tensorRT/onnx/onnx-ml.pb.cpp:132: undefined reference to `google::protobuf::internal::VerifyVersion(int, int, char const*)'
CMakeFiles/pro.dir/src/tensorRT/onnx/onnx-ml.pb.cpp.o: In function `InitDefaultsscc_info_FunctionProto_onnx_2dml_2eproto()':

...

/home/pangguoliang/projects/tensorRT_Pro/src/tensorRT/onnx_parser/onnx2trt_utils.cpp:805: undefined reference to
CMakeFiles/pro.dir/src/tensorRT/onnx_parser/onnx2trt_utils.cpp.o: In function `onnx2trt::poolingHelper(onnx2trt::IImporterContext*, onnx::NodeProto const&, std::vector<onnx2trt::TensorOrWeights, std::allocator<onnx2trt::TensorOrWeights> >&, nvinfer1::PoolingType)':
/home/pangguoliang/projects/tensorRT_Pro/src/tensorRT/onnx_parser/onnx2trt_utils.cpp:1644: undefined reference to `google::protobuf::RepeatedField::size() const'
CMakeFiles/pro.dir/src/tensorRT/onnx_parser/onnx2trt_utils.cpp.o: In function `onnx2trt::setAttr(nvinfer1::Dims32*, onnx::AttributeProto const*, int, int)':
/home/pangguoliang/projects/tensorRT_Pro/src/tensorRT/onnx_parser/onnx2trt_utils.cpp:1827: undefined reference to `google::protobuf::RepeatedField::size() const'
collect2: error: ld returned 1 exit status
CMakeFiles/pro.dir/build.make:1515: recipe for target '../workspace/pro' failed
make[3]: *** [../workspace/pro] Error 1
CMakeFiles/Makefile2:424: recipe for target 'CMakeFiles/pro.dir/all' failed
make[2]: *** [CMakeFiles/pro.dir/all] Error 2
CMakeFiles/Makefile2:234: recipe for target 'CMakeFiles/yolo.dir/rule' failed
make[1]: *** [CMakeFiles/yolo.dir/rule] Error 2
Makefile:183: recipe for target 'yolo' failed
make: *** [yolo] Error 2
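These undefined references to google::protobuf symbols usually mean the libprotobuf the linker found is a different version from the one whose headers generated onnx-ml.pb.cpp (this project builds against protobuf 3.11.4). As a sanity check, a tiny program compiled with the same include and library paths as pro can confirm whether the headers and the linked runtime agree; note that GOOGLE_PROTOBUF_VERIFY_VERSION expands to the very VerifyVersion symbol the linker is complaining about:

#include <iostream>
#include <google/protobuf/stubs/common.h>

int main() {
    // Version constant baked into the headers at compile time.
    std::cout << "protobuf headers: " << GOOGLE_PROTOBUF_VERSION << std::endl;
    // Aborts with a descriptive message if the linked libprotobuf
    // is incompatible with those headers.
    GOOGLE_PROTOBUF_VERIFY_VERSION;
    std::cout << "headers and linked runtime match" << std::endl;
    return 0;
}

If this fails to link, or aborts at runtime, point CMake at one consistent protobuf installation and rebuild.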

For the YOLOX deployment, I split out the required code and packaged it as a DLL. Inside the DLL, however, deserializing the engine file via the nvinfer runtime's deserializeCudaEngine (the API provided by TensorRT) always returns a null pointer. The exact same code works in an exe: it deserializes fine and the inference results are correct.
Does anyone know what is going on?
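One common cause of this symptom (a guess, not a confirmed diagnosis) is that plugin registration happens in the exe but never runs inside the DLL, so deserialization cannot resolve custom layers and returns null. A minimal null-checked loading path using only standard TensorRT APIs, for comparison:

#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
#include <NvInfer.h>
#include <NvInferPlugin.h>   // initLibNvInferPlugins

class SimpleLogger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
};

nvinfer1::ICudaEngine* load_engine(const char* file) {
    static SimpleLogger logger;

    // Plugin registration is per-module: doing it in the exe does not
    // register anything for code living in a separate DLL.
    initLibNvInferPlugins(&logger, "");

    std::ifstream in(file, std::ios::binary);
    std::vector<char> data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());

    auto* runtime = nvinfer1::createInferRuntime(logger);
    auto* engine  = runtime->deserializeCudaEngine(data.data(), data.size());
    if (engine == nullptr)
        std::cerr << "deserializeCudaEngine returned null: verify the custom "
                     "plugins compiled into the exe are also linked into the DLL"
                  << std::endl;
    return engine;
}

If the engine was compiled with this repo's custom decode plugins, the code that registers them (e.g. whatever links against the plugin_list target built above) must be part of the DLL as well.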

[trt_builder.cpp:36]:NVInfer: TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.1

The warnings/errors are shown above. Dealing with the compatibility issues between the TensorRT version, CUDA, and the CUDA toolkit versions is annoying and confusing; I cannot figure out the difference among them. Any help will be appreciated!

I do have TensorRT installed:

dpkg -l | grep tensorrt
ii nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.0.2.6-ga-20181009 1-1 amd64 nv-tensorrt repository configuration files
ii nv-tensorrt-repo-ubuntu1804-cuda10.1-trt5.1.5.0-ga-20190427 1-1 amd64 nv-tensorrt repository configuration files
ii nv-tensorrt-repo-ubuntu1804-cuda11.4-trt8.2.1.8-ga-20211117 1-1 amd64 nv-tensorrt repository configuration files
ii tensorrt 8.2.1.8-1+cuda11.4 amd64

As far as I can tell, that should be 8.2.1.8 with cuda11.4.

And after typing nvcc -V, the reported CUDA version is as follows:

NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

So which version of the toolkit should I install? Currently my CUDA toolkit is v11.5.0, I think.
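For what it is worth, these numbers are three different things: nvcc -V reports the toolkit the compiler came from, the display driver advertises the highest CUDA version it supports, and each binary links a specific CUDA runtime; TensorRT additionally pins the cuBLAS/cuDNN builds it was compiled against, which is what the warning above compares. A small sketch using the standard CUDA runtime API prints the first two:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driver = 0, runtime = 0;
    cudaDriverGetVersion(&driver);    // highest CUDA version the installed driver supports
    cudaRuntimeGetVersion(&runtime);  // CUDA runtime this binary actually linked
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driver / 1000, (driver % 1000) / 10,
           runtime / 1000, (runtime % 1000) / 10);
    return 0;
}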

My usual approach: I export the ONNX on machine A, and on machine A I also build the matching serialization tool (it depends on TensorRT, CUDA and cuDNN; the dependencies are packaged together with the tool). When I need to deploy on machine B, I run this serialization tool there to serialize the ONNX.

This works without problems on the 960, 1060 and 1080.

On the 1660 I hit this problem:
[E] [TRT] c:\p4sw\sw\gpgpu\MachineLearning\DIT\release\5.1\engine\cuda\cudaConvolutionLayer.cpp (238) - Cudnn Error in nvinfer1::rt::cuda::CudnnConvolutionLayer::execute: 8 (CUDNN_STATUS_EXECUTION_FAILED)
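A serialized TensorRT engine is tuned for the GPU it was built on. The 960/1060/1080 are compute capability 5.2/6.1, while the 1660 is Turing (compute capability 7.5), so kernels selected on the build machine may fail outright there, which matches the CUDNN_STATUS_EXECUTION_FAILED above; rebuilding the engine from ONNX on the target machine is the usual remedy. A sketch for checking the capability you are actually running on:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, 0);  // query device 0
    // An engine serialized under one compute capability should be
    // rebuilt from ONNX when this value changes (e.g. 6.1 -> 7.5).
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}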

Difficulty learning the plugin system

Hello, I watched the plugin episode of the video course, but there is no plugin-test code under app_yolo. Where can I find it? Thanks.
At the yolov5 part, I followed the video to convert to ONNX, but it fails with an error and cannot parse the ONNX file. Is there a way to check this? I am not sure whether it is related to the output format.
(screenshot: 2021-10-30 16-59-15)

With arcface feature extraction on different faces, the computed similarities are all identical. What could be going on?

Cosine distance code:
import numpy as np

def cosine_distance(matrix1, matrix2):
    # rows of matrix1 / matrix2 are feature vectors
    matrix1_matrix2 = np.dot(matrix1, matrix2.transpose())
    matrix1_norm = np.sqrt(np.multiply(matrix1, matrix1).sum(axis=1))
    matrix1_norm = matrix1_norm[:, np.newaxis]
    matrix2_norm = np.sqrt(np.multiply(matrix2, matrix2).sum(axis=1))
    matrix2_norm = matrix2_norm[:, np.newaxis]
    cosine_distance = np.divide(matrix1_matrix2,
                                np.dot(matrix1_norm, matrix2_norm.transpose()))
    return cosine_distance

Could it be that normalization is missing? Is there a good normalization function you would recommend?
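For reference, cosine similarity is just the dot product of two L2-normalized vectors, and the Python function above already divides by both norms; if every pair scores identically, the embeddings themselves are worth inspecting first (e.g. whether the engine really receives different inputs). An equivalent single-pair computation in C++, with a and b as hypothetical feature vectors:

#include <cmath>
#include <vector>

// Cosine similarity of two embeddings: dot product after L2 normalization.
// The small epsilon guards against zero-length vectors.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    float dot = 0.f, norm_a = 0.f, norm_b = 0.f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot    += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b) + 1e-12f);
}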

Windows 10: Linker Error

Hello,

When compiling the program in Visual Studio 2019, I encounter this error:

(screenshot of the linker error attached)

Do you know how to fix this?

Much thanks.

The yolox master-branch model exports to ONNX normally, but building the engine fails

[2021-10-26 14:29:01][info][app_yolo.cpp:121]:===================== test YoloX FP32 yolox_s ==================================
[2021-10-26 14:29:01][info][trt_builder.cpp:471]:Compile FP32 Onnx Model 'yolox_s.onnx'.
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: INVALID_ARGUMENT: getPluginCreator could not find plugin ScatterND version 1
While parsing node number 565 [ScatterND]:
ERROR: /home/work/tracking/tensorRT_Pro/src/tensorRT/onnx_parser/builtin_op_importers.cpp:4013 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[2021-10-26 14:29:02][error][trt_builder.cpp:517]:Can not parse OnnX file: yolox_s.onnx
[2021-10-26 14:29:02][error][yolo.cpp:138]:Engine yolox_s.FP32.trtmodel load failed
[2021-10-26 14:29:02][error][app_yolo.cpp:42]:Engine is nullptr
[100%] Built target yolo

Visual Studio reports an error: identifier "not" is undefined

Errors reported for app_yolo.cpp:
E0020 identifier "not" is undefined    pro - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    124
E2830 a parenthesized initializer is not allowed in a condition declaration    pro - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    124
E0020 identifier "not" is undefined    pro - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    131
E0551 the function "iLogger::exists" cannot be defined in the current scope    pro - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    131
E0020 identifier "not" is undefined    trtpyc - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    124
E2830 a parenthesized initializer is not allowed in a condition declaration    trtpyc - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    124
E0020 identifier "not" is undefined    trtpyc - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    131
E0551 the function "iLogger::exists" cannot be defined in the current scope    trtpyc - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    131
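The E0020 errors come from the alternative operator keywords (not, and, or), which MSVC rejects in its default dialect. Compiling with /permissive-, including <ciso646>, or spelling the operators as !, &&, || all resolve it; a minimal illustration, where exists() is a hypothetical stand-in for iLogger::exists:

#include <ciso646>  // lets default-dialect MSVC accept 'not', 'and', 'or'

bool exists(const char* path);  // hypothetical stand-in for iLogger::exists

void demo(const char* model_file) {
    if (not exists(model_file)) {   // compiles once <ciso646> is included
        // ... build the engine
    }
    if (!exists(model_file)) {      // portable spelling, needs no header
        // ... equivalent
    }
}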

While compiling according to the installation wiki, I hit: google/protobuf/port_def.inc: No such file or directory for #include <google/protobuf/port_def.inc>

tensorRT_cpp/src/tensorRT/onnx/onnx-ml.pb.h:11:10: fatal error: google/protobuf/port_def.inc: No such file or directory
#include <google/protobuf/port_def.inc>
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
CMakeFiles/pro.dir/build.make:806: recipe for target 'CMakeFiles/pro.dir/src/tensorRT/onnx/onnx-ml.pb.cpp.o' failed
make[3]: *** [CMakeFiles/pro.dir/src/tensorRT/onnx/onnx-ml.pb.cpp.o] Error 1
make[3]: *** Waiting for unfinished jobs....

A small question about the Win10 setup

In the Win10 steps, where it says (modify Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.0.props" / to your configured CUDA path), could you give a concrete example? Thanks. For instance, is this right: $(VCTargetsPath)\C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\extras\visual_studio_integration\MSBuildExtensions\CUDA 10.2.props" /

How do I add my own yolov5 model? Is replacing static const char* cocolabels[] all that is needed to change the classes?

As the title says: the trained model meets the project requirements when tested in Python, but on the C++ side, after replacing the classes, the detection count is 0. Below are the conversion and inference logs; could someone please take a look?
[2021-09-28 09:11:59][info][app_yolo.cpp:125]:===================== test YoloV5 FP16 yolov5x ==================================
[2021-09-28 09:11:59][info][trt_builder.cpp:471]:Compile FP16 Onnx Model 'yolov5x.onnx'.
[2021-09-28 09:12:00][info][trt_builder.cpp:557]:Input shape is -1 x 3 x 640 x 640
[2021-09-28 09:12:00][info][trt_builder.cpp:558]:Set max batch size = 16
[2021-09-28 09:12:00][info][trt_builder.cpp:559]:Set max workspace size = 1024.00 MB
[2021-09-28 09:12:00][info][trt_builder.cpp:562]:Network has 1 inputs:
[2021-09-28 09:12:00][info][trt_builder.cpp:568]: 0.[images] shape is -1 x 3 x 640 x 640
[2021-09-28 09:12:00][info][trt_builder.cpp:574]:Network has 3 outputs:
[2021-09-28 09:12:00][info][trt_builder.cpp:579]: 0.[output] shape is -1 x 3 x 80 x 80 x 12
[2021-09-28 09:12:00][info][trt_builder.cpp:579]: 1.[798] shape is -1 x 3 x 40 x 40 x 12
[2021-09-28 09:12:00][info][trt_builder.cpp:579]: 2.[818] shape is -1 x 3 x 20 x 20 x 12
[2021-09-28 09:12:00][info][trt_builder.cpp:583]:Network has 710 layers:
[2021-09-28 09:12:00][info][trt_builder.cpp:650]:Building engine...
[2021-09-28 09:12:02][warn][trt_builder.cpp:33]:NVInfer: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1
[2021-09-28 09:12:02][warn][trt_builder.cpp:33]:NVInfer: Detected invalid timing cache, setup a local cache instead
[2021-09-28 09:19:45][warn][trt_builder.cpp:33]:NVInfer: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1
[2021-09-28 09:19:45][info][trt_builder.cpp:670]:Build done 464195 ms !
[2021-09-28 09:21:35][warn][trt_builder.cpp:33]:NVInfer: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1
[2021-09-28 09:21:35][warn][trt_builder.cpp:33]:NVInfer: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1
[2021-09-28 09:21:35][info][trt_infer.cpp:167]:Infer 000002202FA4FAD0 detail
[2021-09-28 09:21:35][info][trt_infer.cpp:168]: Max Batch Size: 16
[2021-09-28 09:21:35][info][trt_infer.cpp:169]: Inputs: 1
[2021-09-28 09:21:35][info][trt_infer.cpp:173]: 0.images : shape {16 x 3 x 640 x 640}
[2021-09-28 09:21:35][info][trt_infer.cpp:176]: Outputs: 3
[2021-09-28 09:21:35][info][trt_infer.cpp:180]: 0.output : shape {16 x 3 x 80 x 80 x 12}
[2021-09-28 09:21:35][info][trt_infer.cpp:180]: 1.798 : shape {16 x 3 x 40 x 40 x 12}
[2021-09-28 09:21:35][info][trt_infer.cpp:180]: 2.818 : shape {16 x 3 x 20 x 20 x 12}
[2021-09-28 09:22:08][info][app_yolo.cpp:77]:yolov5x.FP16.trtmodel[YoloV5] average: 9.57 ms / image, FPS: 104.47
[2021-09-28 09:22:08][info][app_yolo.cpp:103]:Save to yolov5x_YoloV5_FP16_result/1.jpg, 0 object, average time 9.57 ms
[2021-09-28 09:22:08][info][app_yolo.cpp:103]:Save to yolov5x_YoloV5_FP16_result/2.jpg, 0 object, average time 9.57 ms
[2021-09-28 09:22:08][info][app_yolo.cpp:103]:Save to yolov5x_YoloV5_FP16_result/small_F10149_I1611_L1654_T551_W99_H195_R1.jpg, 0 object, average time 9.57 ms
[2021-09-28 09:22:08][info][app_yolo.cpp:103]:Save to yolov5x_YoloV5_FP16_result/small_F5834_I879_L2390_T1147_W198_H413_R1.jpg, 0 object, average time 9.57 ms
[2021-09-28 09:22:08][info][app_yolo.cpp:103]:Save to yolov5x_YoloV5_FP16_result/small_F5930_I888_L1803_T905_W291_H356_R1.jpg, 0 object, average time 9.57 ms
[2021-09-28 09:22:08][info][app_yolo.cpp:103]:Save to yolov5x_YoloV5_FP16_result/small_F7010_I930_L1621_T851_W200_H353_R1.jpg, 0 object, average time 9.57 ms
[2021-09-28 09:22:08][info][yolo.cpp:207]:Engine destroy.

How do I load a recognition network?

Can this framework load classic recognition networks? For example, can the mobilenet model that ships with pytorch be imported into the project directly? Could you provide an example?

no result in int8 mode

Hi, thanks for sharing this awesome code!

I tested this repo on Ubuntu 18 with CUDA V11.2.152, TensorRT-8.0.0.3 and cuDNN 8.2.0.

I tested the yolox_s model in FP32, FP16 and INT8 modes, but in INT8 mode the network output has no results.

FP16 test code is below:

static void test_fp16(Yolo::Type type){

    TRT::set_device(0);
    INFO("===================== test %s fp16 ==================================", Yolo::type_name(type));

    const char* name = nullptr;
    if(type == Yolo::Type::V5){
        name = "yolov5m";
    }else if(type == Yolo::Type::X){
        name = "yolox_s";
    }

    if(not requires(name))
        return;

    string onnx_file = iLogger::format("%s.onnx", name);
    string model_file = iLogger::format("%s.fp16.trtmodel", name);
    int test_batch_size = 1;  // if you need batch size > 1, check whether the model was modified accordingly (see the code-modification part of readme.md), otherwise errors will occur
    
    // For dynamic vs. static batch details, open http://www.zifuture.com:8090/
    // and scan the QR code on the right to join the (free) technical discussion group
    if(not iLogger::exists(model_file)){
        TRT::compile(
            TRT::TRTMode_FP16,   // compile mode: FP32, FP16 or INT8
            {},                         // ignored for onnx; output node names for caffe
            test_batch_size,            // batch size to compile for
            onnx_file,                  // onnx file to compile
            model_file,                 // model file to save
            {},                         // input shapes to redefine; the onnx input shapes can be overridden here
            false                       // whether to use a dynamic batch dimension: true = dynamic, false = static fixed batch size
        );
    }

    forward_engine(model_file, type);
}

Below is the output of the log:

[2021-09-01 19:38:45][info][app_yolo.cpp:240]:===================== test YoloX fp32 ==================================
[2021-09-01 19:38:45][info][trt_builder.cpp:473]:Compile FP32 Onnx Model 'yolox_s.onnx'.
[2021-09-01 19:38:45][info][trt_builder.cpp:602]:Input shape is 1 x 3 x 640 x 640
[2021-09-01 19:38:45][info][trt_builder.cpp:603]:Set max batch size = 1
[2021-09-01 19:38:45][info][trt_builder.cpp:604]:Set max workspace size = 1024.00 MB
[2021-09-01 19:38:45][info][trt_builder.cpp:605]:Dynamic batch dimension is false
[2021-09-01 19:38:45][info][trt_builder.cpp:608]:Network has 1 inputs:
[2021-09-01 19:38:45][info][trt_builder.cpp:614]:      0.[images] shape is 1 x 3 x 640 x 640
[2021-09-01 19:38:45][info][trt_builder.cpp:620]:Network has 1 outputs:
[2021-09-01 19:38:45][info][trt_builder.cpp:625]:      0.[output] shape is 1 x 8400 x 85
[2021-09-01 19:38:45][info][trt_builder.cpp:670]:Building engine...
[2021-09-01 19:38:45][warn][trt_builder.cpp:33]:NVInfer WARNING: Convolution + generic activation fusion is disable due to incompatible driver or nvrtc
[2021-09-01 19:38:46][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:38:46][warn][trt_builder.cpp:33]:NVInfer WARNING: Detected invalid timing cache, setup a local cache instead
[2021-09-01 19:39:18][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:18][info][trt_builder.cpp:690]:Build done 32689 ms !
[2021-09-01 19:39:18][warn][trt_builder.cpp:33]:NVInfer WARNING: The logger passed into createInferRuntime differs from one already assigned, 0x557f9ae330b0, logger not updated.

[2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:19][info][trt_infer.cpp:169]:Infer 0x7fe3f8000c40 detail
[2021-09-01 19:39:19][info][trt_infer.cpp:170]: Max Batch Size: 1
[2021-09-01 19:39:19][info][trt_infer.cpp:171]: Dynamic Batch Dimension: false
[2021-09-01 19:39:19][info][trt_infer.cpp:172]: Inputs: 1
[2021-09-01 19:39:19][info][trt_infer.cpp:176]:         0.images : shape {1 x 3 x 640 x 640}
[2021-09-01 19:39:19][info][trt_infer.cpp:179]: Outputs: 1
[2021-09-01 19:39:19][info][trt_infer.cpp:183]:         0.output : shape {1 x 8400 x 85}
[2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/car.jpg, 2 object, 10.10 ms
[2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/group.jpg, 1 object, 6.19 ms
[2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/zand.jpg, 2 object, 8.72 ms
[2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/zgjr.jpg, 3 object, 6.03 ms
[2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/gril.jpg, 1 object, 5.95 ms
[2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/yq.jpg, 1 object, 5.92 ms
[2021-09-01 19:39:19][info][yolo.cpp:214]:Engine destroy.
[2021-09-01 19:39:19][info][app_yolo.cpp:277]:===================== test YoloX fp16 ==================================
[2021-09-01 19:39:19][info][trt_builder.cpp:473]:Compile FP16 Onnx Model 'yolox_s.onnx'.
[2021-09-01 19:39:19][warn][trt_builder.cpp:483]:Platform not have fast fp16 support
[2021-09-01 19:39:19][info][trt_builder.cpp:602]:Input shape is 1 x 3 x 640 x 640
[2021-09-01 19:39:19][info][trt_builder.cpp:603]:Set max batch size = 1
[2021-09-01 19:39:19][info][trt_builder.cpp:604]:Set max workspace size = 1024.00 MB
[2021-09-01 19:39:19][info][trt_builder.cpp:605]:Dynamic batch dimension is false
[2021-09-01 19:39:19][info][trt_builder.cpp:608]:Network has 1 inputs:
[2021-09-01 19:39:19][info][trt_builder.cpp:614]:      0.[images] shape is 1 x 3 x 640 x 640
[2021-09-01 19:39:19][info][trt_builder.cpp:620]:Network has 1 outputs:
[2021-09-01 19:39:19][info][trt_builder.cpp:625]:      0.[output] shape is 1 x 8400 x 85
[2021-09-01 19:39:19][info][trt_builder.cpp:670]:Building engine...
[2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: Detected invalid timing cache, setup a local cache instead
[2021-09-01 19:39:47][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:47][info][trt_builder.cpp:690]:Build done 28282 ms !
[2021-09-01 19:39:47][warn][trt_builder.cpp:33]:NVInfer WARNING: The logger passed into createInferRuntime differs from one already assigned, 0x557f9ae330b0, logger not updated.

[2021-09-01 19:39:48][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:48][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:48][info][trt_infer.cpp:169]:Infer 0x7fe3f8015650 detail
[2021-09-01 19:39:48][info][trt_infer.cpp:170]: Max Batch Size: 1
[2021-09-01 19:39:48][info][trt_infer.cpp:171]: Dynamic Batch Dimension: false
[2021-09-01 19:39:48][info][trt_infer.cpp:172]: Inputs: 1
[2021-09-01 19:39:48][info][trt_infer.cpp:176]:         0.images : shape {1 x 3 x 640 x 640}
[2021-09-01 19:39:48][info][trt_infer.cpp:179]: Outputs: 1
[2021-09-01 19:39:48][info][trt_infer.cpp:183]:         0.output : shape {1 x 8400 x 85}
[2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/car.jpg, 2 object, 10.75 ms
[2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/group.jpg, 1 object, 6.38 ms
[2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/zand.jpg, 2 object, 9.12 ms
[2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/zgjr.jpg, 3 object, 6.10 ms
[2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/gril.jpg, 1 object, 6.07 ms
[2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/yq.jpg, 1 object, 6.01 ms
[2021-09-01 19:39:48][info][yolo.cpp:214]:Engine destroy.
[2021-09-01 19:39:48][info][app_yolo.cpp:190]:===================== test YoloX int8 ==================================
[2021-09-01 19:39:48][info][trt_builder.cpp:473]:Compile INT8 Onnx Model 'yolox_s.onnx'.
[2021-09-01 19:39:48][info][trt_builder.cpp:593]:Using image list[6 files]: inference
[2021-09-01 19:39:48][info][trt_builder.cpp:602]:Input shape is 1 x 3 x 640 x 640
[2021-09-01 19:39:48][info][trt_builder.cpp:603]:Set max batch size = 1
[2021-09-01 19:39:48][info][trt_builder.cpp:604]:Set max workspace size = 1024.00 MB
[2021-09-01 19:39:48][info][trt_builder.cpp:605]:Dynamic batch dimension is false
[2021-09-01 19:39:48][info][trt_builder.cpp:608]:Network has 1 inputs:
[2021-09-01 19:39:48][info][trt_builder.cpp:614]:      0.[images] shape is 1 x 3 x 640 x 640
[2021-09-01 19:39:48][info][trt_builder.cpp:620]:Network has 1 outputs:
[2021-09-01 19:39:48][info][trt_builder.cpp:625]:      0.[output] shape is 1 x 8400 x 85
[2021-09-01 19:39:48][info][trt_builder.cpp:670]:Building engine...
[2021-09-01 19:39:48][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:49][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:49][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 1 / 6
[2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 2 / 6
[2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 3 / 6
[2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 4 / 6
[2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 5 / 6
[2021-09-01 19:39:50][info][app_yolo.cpp:193]:Int8 6 / 6
[2021-09-01 19:40:27][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:40:27][warn][trt_builder.cpp:33]:NVInfer WARNING: Detected invalid timing cache, setup a local cache instead
[2021-09-01 19:40:59][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:40:59][info][trt_builder.cpp:685]:No set entropyCalibratorFile, and entropyCalibrator will not save.
[2021-09-01 19:40:59][info][trt_builder.cpp:690]:Build done 70917 ms !
[2021-09-01 19:40:59][warn][trt_builder.cpp:33]:NVInfer WARNING: The logger passed into createInferRuntime differs from one already assigned, 0x557f9ae330b0, logger not updated.

[2021-09-01 19:40:59][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:40:59][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:40:59][info][trt_infer.cpp:169]:Infer 0x7fe3b4000c40 detail
[2021-09-01 19:40:59][info][trt_infer.cpp:170]: Max Batch Size: 1
[2021-09-01 19:40:59][info][trt_infer.cpp:171]: Dynamic Batch Dimension: false
[2021-09-01 19:40:59][info][trt_infer.cpp:172]: Inputs: 1
[2021-09-01 19:40:59][info][trt_infer.cpp:176]:         0.images : shape {1 x 3 x 640 x 640}
[2021-09-01 19:40:59][info][trt_infer.cpp:179]: Outputs: 1
[2021-09-01 19:40:59][info][trt_infer.cpp:183]:         0.output : shape {1 x 8400 x 85}
[2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/car.jpg, 0 object, 7.75 ms
[2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/group.jpg, 0 object, 4.04 ms
[2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/zand.jpg, 0 object, 6.04 ms
[2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/zgjr.jpg, 0 object, 3.74 ms
[2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/gril.jpg, 0 object, 3.72 ms
[2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/yq.jpg, 0 object, 3.74 ms
[2021-09-01 19:40:59][info][yolo.cpp:214]:Engine destroy.

The elapsed times in FP16 and FP32 mode are nearly the same; is that because of a problem in my code? (The build log above does warn "Platform not have fast fp16 support", which would explain it.)

Yours sincerely!

bugs

fatal error: NvInfer.h: No such file or directory
    8 | #include "NvInfer.h"

even though the TensorRT path is set correctly.

Error converting the arcface ONNX to TensorRT

I used the arcface_iresnet50.onnx model provided by this project. The error is as follows:
While parsing node number 182 [BatchNormalization]:
ERROR: /home/Project/tensorRT_Pro-main/src/tensorRT/onnx_parser/onnx2trt_utils.cpp:1523 In function scaleHelper:
[8] Assertion failed: dims.nbDims == 4 || dims.nbDims == 5
[2021-10-04 16:39:25][error][trt_builder.cpp:517]:Can not parse OnnX file: arcface_iresnet50_iii.onnx

My environment:

tensorrt 7.2
cudnn 8.1
cuda 11.2
protobuf v3.11.4
gpu 3080 (arch 86)

BatchNormalization_182 is the second-to-last layer of the model.
