
tensorrt_pro's Introduction

Read this in other languages: English, 简体中文.

News:

Tutorial Video

An Out-of-the-Box TensorRT-based Framework for High Performance Inference with C++/Python Support

  • C++ Interface: three lines of code are all you need to run YoloX

    // create inference engine on gpu-0
    //auto engine = Yolo::create_infer("yolov5m.fp32.trtmodel", Yolo::Type::V5, 0);
    auto engine = Yolo::create_infer("yolox_m.fp32.trtmodel", Yolo::Type::X, 0);
    
    // load image
    auto image = cv::imread("1.jpg");
    
    // do inference and get the result
    auto box = engine->commit(image).get();  // return vector<Box>
  • Python Interface:

    import torch
    import torchvision.models as models
    import pytrt as tp
    
    device    = "cuda"
    input     = torch.zeros(1, 3, 224, 224, device=device)  # dummy input used for compilation
    model     = models.resnet18(True).eval().to(device)
    trt_model = tp.from_torch(model, input)
    trt_out   = trt_model(input)
    • Simple YOLO example in Python:
    import os
    import cv2
    import numpy as np
    import pytrt as tp
    
    engine_file = "yolov5s.fp32.trtmodel"
    if not os.path.exists(engine_file):
        tp.compile_onnx_to_file(1, tp.onnx_hub("yolov5s"), engine_file)
    
    yolo   = tp.Yolo(engine_file, type=tp.YoloType.V5)
    image  = cv2.imread("car.jpg")
    bboxes = yolo.commit(image).get()
    print(f"{len(bboxes)} objects")
    
    for box in bboxes:
        left, top, right, bottom = map(int, [box.left, box.top, box.right, box.bottom])
        cv2.rectangle(image, (left, top), (right, bottom), tp.random_color(box.class_label), 5)
    
    saveto = "yolov5.car.jpg"
    print(f"Save to {saveto}")
    
    cv2.imwrite(saveto, image)
    cv2.imshow("result", image)
    cv2.waitKey()

INTRO

  1. High-level interface for C++/Python.
  2. Simplifies the implementation of custom plugins; serialization and deserialization are encapsulated for easier usage.
  3. Simplifies FP32, FP16 and INT8 compilation, facilitating deployment with C++/Python on servers or embedded devices.
  4. Ready-to-use models with examples: RetinaFace, Scrfd, YoloV5, YoloX, Arcface, AlphaPose, CenterNet and DeepSORT (C++).

YoloX and YoloV5-series Model Test Report

app_yolo.cpp speed testing
  1. Resolution (YoloV5P5, YoloX) = (640x640), (YoloV5P6) = (1280x1280)
  2. max batch size = 16
  3. preprocessing + inference + postprocessing
  4. cuda10.2, cudnn8.2.2.26, TensorRT-8.0.1.6
  5. RTX2080Ti
  6. Number of runs: the average over 100 runs, excluding the first run (warmup)
  7. Testing log: workspace/perf.result.std.log
  8. Code for testing: src/application/app_yolo.cpp
  9. Images for testing: 6 images in workspace/inference
    • with resolutions 810x1080, 500x806, 1024x684, 550x676, 1280x720 and 800x533 respectively
  10. Testing method: load the 6 images, then run inference on all 6, repeated 100 times; each image is preprocessed and postprocessed on every run (a sketch of this loop follows this list)
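For orientation, a minimal sketch of this measurement loop using the Python interface shown above (the engine file and image names are illustrative assumptions; the actual benchmark is the C++ code in src/application/app_yolo.cpp):

import time
import cv2
import pytrt as tp

# hypothetical engine and image paths
yolo   = tp.Yolo("yolov5s.fp32.trtmodel", type=tp.YoloType.V5)
images = [cv2.imread(f"workspace/inference/{i}.jpg") for i in range(1, 7)]

elapsed = []
for run in range(101):                     # 1 warmup run + 100 timed runs
    start = time.time()
    for image in images:
        bboxes = yolo.commit(image).get()  # preprocess + inference + postprocess
    elapsed.append(time.time() - start)

avg_ms = sum(elapsed[1:]) / 100 / len(images) * 1000  # drop warmup, per-image ms
print(f"{avg_ms:.3f} ms / image, {1000 / avg_ms:.2f} FPS")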

Model Resolution Type Precision Elapsed Time (ms) FPS
yolox_x 640x640 YoloX FP32 21.879 45.71
yolox_l 640x640 YoloX FP32 12.308 81.25
yolox_m 640x640 YoloX FP32 6.862 145.72
yolox_s 640x640 YoloX FP32 3.088 323.81
yolox_x 640x640 YoloX FP16 6.763 147.86
yolox_l 640x640 YoloX FP16 3.933 254.25
yolox_m 640x640 YoloX FP16 2.515 397.55
yolox_s 640x640 YoloX FP16 1.362 734.48
yolox_x 640x640 YoloX INT8 4.070 245.68
yolox_l 640x640 YoloX INT8 2.444 409.21
yolox_m 640x640 YoloX INT8 1.730 577.98
yolox_s 640x640 YoloX INT8 1.060 943.15
yolov5x6 1280x1280 YoloV5_P6 FP32 68.022 14.70
yolov5l6 1280x1280 YoloV5_P6 FP32 37.931 26.36
yolov5m6 1280x1280 YoloV5_P6 FP32 20.127 49.69
yolov5s6 1280x1280 YoloV5_P6 FP32 8.715 114.75
yolov5x 640x640 YoloV5_P5 FP32 18.480 54.11
yolov5l 640x640 YoloV5_P5 FP32 10.110 98.91
yolov5m 640x640 YoloV5_P5 FP32 5.639 177.33
yolov5s 640x640 YoloV5_P5 FP32 2.578 387.92
yolov5x6 1280x1280 YoloV5_P6 FP16 20.877 47.90
yolov5l6 1280x1280 YoloV5_P6 FP16 10.960 91.24
yolov5m6 1280x1280 YoloV5_P6 FP16 7.236 138.20
yolov5s6 1280x1280 YoloV5_P6 FP16 3.851 259.68
yolov5x 640x640 YoloV5_P5 FP16 5.933 168.55
yolov5l 640x640 YoloV5_P5 FP16 3.450 289.86
yolov5m 640x640 YoloV5_P5 FP16 2.184 457.90
yolov5s 640x640 YoloV5_P5 FP16 1.307 765.10
yolov5x6 1280x1280 YoloV5_P6 INT8 12.207 81.92
yolov5l6 1280x1280 YoloV5_P6 INT8 7.221 138.49
yolov5m6 1280x1280 YoloV5_P6 INT8 5.248 190.55
yolov5s6 1280x1280 YoloV5_P6 INT8 3.149 317.54
yolov5x 640x640 YoloV5_P5 INT8 3.704 269.97
yolov5l 640x640 YoloV5_P5 INT8 2.255 443.53
yolov5m 640x640 YoloV5_P5 INT8 1.674 597.40
yolov5s 640x640 YoloV5_P5 INT8 1.143 874.91
app_yolo_fast.cpp speed testing. Never stop striving to be faster
  • Highlight: about 0.5 ms faster than the above without any loss in precision. Specifically, we remove the Focus layer, some transpose nodes, etc. from the ONNX graph and implement them in a CUDA kernel function (see the sketch after this list); the rest remains the same.
  • Test log: workspace/perf.result.std.log
  • Code for testing: src/application/app_yolo_fast.cpp
  • Tips: you can make the modification by referring to the downloaded onnx. Questions are welcome through any channel.
  • Conclusion: the main idea of this work is to optimize pre- and post-processing. If you use the small versions of YoloX or YoloV5, this optimization may help.
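For context, the Focus layer at the head of YOLOv5 is only a slice-and-concat (space-to-depth); a minimal PyTorch sketch of the operation that gets folded into the CUDA preprocessing kernel (illustrative only, not the repo's implementation):

import torch

def focus(x: torch.Tensor) -> torch.Tensor:
    # YOLOv5 Focus: gather every 2x2 pixel block into the channel dim,
    # turning (b, c, h, w) into (b, 4c, h/2, w/2) before the first conv
    return torch.cat([x[..., ::2, ::2],     # top-left pixels
                      x[..., 1::2, ::2],    # bottom-left pixels
                      x[..., ::2, 1::2],    # top-right pixels
                      x[..., 1::2, 1::2]],  # bottom-right pixels
                     dim=1)

x = torch.randn(1, 3, 640, 640)
print(focus(x).shape)  # torch.Size([1, 12, 320, 320])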
Model Resolution Type Precision Elapsed Time (ms) FPS
yolox_x_fast 640x640 YoloX FP32 21.598 46.30
yolox_l_fast 640x640 YoloX FP32 12.199 81.97
yolox_m_fast 640x640 YoloX FP32 6.819 146.65
yolox_s_fast 640x640 YoloX FP32 2.979 335.73
yolox_x_fast 640x640 YoloX FP16 6.764 147.84
yolox_l_fast 640x640 YoloX FP16 3.866 258.64
yolox_m_fast 640x640 YoloX FP16 2.386 419.16
yolox_s_fast 640x640 YoloX FP16 1.259 794.36
yolox_x_fast 640x640 YoloX INT8 3.918 255.26
yolox_l_fast 640x640 YoloX INT8 2.292 436.38
yolox_m_fast 640x640 YoloX INT8 1.589 629.49
yolox_s_fast 640x640 YoloX INT8 0.954 1048.47
yolov5x6_fast 1280x1280 YoloV5_P6 FP32 67.075 14.91
yolov5l6_fast 1280x1280 YoloV5_P6 FP32 37.491 26.67
yolov5m6_fast 1280x1280 YoloV5_P6 FP32 19.422 51.49
yolov5s6_fast 1280x1280 YoloV5_P6 FP32 7.900 126.57
yolov5x_fast 640x640 YoloV5_P5 FP32 18.554 53.90
yolov5l_fast 640x640 YoloV5_P5 FP32 10.060 99.41
yolov5m_fast 640x640 YoloV5_P5 FP32 5.500 181.82
yolov5s_fast 640x640 YoloV5_P5 FP32 2.342 427.07
yolov5x6_fast 1280x1280 YoloV5_P6 FP16 20.538 48.69
yolov5l6_fast 1280x1280 YoloV5_P6 FP16 10.404 96.12
yolov5m6_fast 1280x1280 YoloV5_P6 FP16 6.577 152.06
yolov5s6_fast 1280x1280 YoloV5_P6 FP16 3.087 323.99
yolov5x_fast 640x640 YoloV5_P5 FP16 5.919 168.95
yolov5l_fast 640x640 YoloV5_P5 FP16 3.348 298.69
yolov5m_fast 640x640 YoloV5_P5 FP16 2.015 496.34
yolov5s_fast 640x640 YoloV5_P5 FP16 1.087 919.63
yolov5x6_fast 1280x1280 YoloV5_P6 INT8 11.236 89.00
yolov5l6_fast 1280x1280 YoloV5_P6 INT8 6.235 160.38
yolov5m6_fast 1280x1280 YoloV5_P6 INT8 4.311 231.97
yolov5s6_fast 1280x1280 YoloV5_P6 INT8 2.139 467.45
yolov5x_fast 640x640 YoloV5_P5 INT8 3.456 289.37
yolov5l_fast 640x640 YoloV5_P5 INT8 2.019 495.41
yolov5m_fast 640x640 YoloV5_P5 INT8 1.425 701.71
yolov5s_fast 640x640 YoloV5_P5 INT8 0.844 1185.47

Setup and Configuration

Linux
  1. VSCode (highly recommended!)
  2. Configure your paths for cudnn, cuda, tensorRT8.0 and protobuf.
  3. Configure the compute capability matching your NVIDIA graphics card in Makefile/CMakeLists.txt
  4. Configure your library path in .vscode/c_cpp_properties.json
  5. CUDA version: CUDA10.2
  6. CUDNN version: cudnn8.2.2.26. Note that both the dev (.h files) and runtime (.so files) packages should be downloaded.
  7. tensorRT version: tensorRT-8.0.1.6-cuda10.2
  8. protobuf version (for onnx parser): protobufv3.11.4
  • CMake:
    • mkdir build && cd build
    • cmake ..
    • make yolo -j8
  • Makefile:
    • make yolo -j8
Linux: Compile for Python
  • compile and install
    • Makefile:
      • set use_python := true in Makefile
    • CMakeLists.txt:
      • set(HAS_PYTHON ON) in CMakeLists.txt
    • Run make pyinstall -j8
    • The compiled file is python/pytrt/libpytrtc.so
Windows
  1. Please check the lean/README.md for the detailed dependency

  2. In TensorRT.vcxproj, replace the <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.0.props" /> with your own CUDA path

  3. In TensorRT.vcxproj, replace the <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.0.targets" /> with your own CUDA path

  4. In TensorRT.vcxproj, replace the <CodeGeneration>compute_61,sm_61</CodeGeneration> with your compute capability.

  5. Configure your dependencies or download them to the folder /lean. Configure the VC++ directories (include dir and reference)

  6. Configure your environment: Debug -> Environment

  7. Compile and run the example, where 3 options are available.

Windows: Compile for Python
  1. Compile pytrtc.pyd: choose the python target in Visual Studio and compile
  2. Copy the DLLs by executing 'python/copy_dll_to_pytrt.bat'
  3. Execute the example in the python dir with 'python test_yolov5.py'
  • if installation is needed, switch to the target env (e.g. your conda env) and run 'python setup.py install' after completing steps 1 and 2.
  • the compiled file is python/pytrt/libpytrtc.pyd
Other Protobuf Version
  • In onnx/make_pb.sh, replace the path protoc=/data/sxai/lean/protobuf3.11.4/bin/protoc with the protoc of your own version
# cd to the onnx directory in the terminal
cd onnx

# execute the command to generate the pb files
bash make_pb.sh
  • CMake:
    • replace set(PROTOBUF_DIR "/data/sxai/lean/protobuf3.11.4") in CMakeLists.txt with the path of your own protobuf.
mkdir build && cd build
cmake ..
make yolo -j64
  • Makefile:
    • replace the path lean_protobuf := /data/sxai/lean/protobuf3.11.4 in Makefile with the path of your own protobuf
make yolo -j64
TensorRT 7.x support
  • The default is tensorRT8.x
  1. Replace src/tensorRT/onnx_parser with onnx_parser_for_7.x/onnx_parser:
    • bash onnx_parser/use_tensorrt_7.x.sh
  2. Configure Makefile/CMakeLists.txt path to TensorRT7.x
  3. Execute make yolo -j64
TensorRT 8.x support
  • The default is tensorRT8.x
  1. Replace src/tensorRT/onnx_parser with onnx_parser_for_8.x/onnx_parser:
    • bash onnx_parser/use_tensorrt_8.x.sh
  2. Configure Makefile/CMakeLists.txt path to TensorRT8.x
  3. Execute make yolo -j64

Guide for Different Tasks/Model Support

YoloV5 Support
  • If PyTorch >= 1.7 and the model is YoloV5 5.0+, the model is supported by the framework out of the box
  • If PyTorch < 1.7 or YoloV5 is 2.0, 3.0 or 4.0, minor opset modifications are required
  • If you want inference with an older PyTorch, dynamic batch size and other advanced settings, please check our blog (currently in Chinese) and scan the QR code via WeChat to join us
  1. Download yolov5
git clone [email protected]:ultralytics/yolov5.git
  2. Modify the code for dynamic batchsize
# line 55 forward function in yolov5/models/yolo.py 
# bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
# x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
# modified into:

bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
bs = -1
ny = int(ny)
nx = int(nx)
x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

# line 70 in yolov5/models/yolo.py
#  z.append(y.view(bs, -1, self.no))
# modified into:
z.append(y.view(bs, self.na * ny * nx, self.no))

############# for yolov5-6.0 #####################
# line 65 in yolov5/models/yolo.py
# if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
#    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)
# modified into:
if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)

# disconnect for pytorch trace
anchor_grid = (self.anchors[i].clone() * self.stride[i]).view(1, -1, 1, 1, 2)

# line 70 in yolov5/models/yolo.py
# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
# modified into:
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh

# line 73 in yolov5/models/yolo.py
# wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
# modified into:
wh = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh
############# for yolov5-6.0 #####################


# line 52 in yolov5/export.py
# torch.onnx.export(dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # shape(1,3,640,640)
#                                'output': {0: 'batch', 1: 'anchors'}  # shape(1,25200,85)
# modified into:
torch.onnx.export(dynamic_axes={'images': {0: 'batch'},  # shape(1,3,640,640)
                                'output': {0: 'batch'}  # shape(1,25200,85) 
  3. Export the onnx model
cd yolov5
python export.py --weights=yolov5s.pt --dynamic --include=onnx --opset=11
  4. Copy the model and execute it
cp yolov5/yolov5s.onnx tensorRT_cpp/workspace/
cd tensorRT_cpp
make yolo -j32
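Before compiling, it can help to confirm that the exported model really has a dynamic batch dimension; a small sketch using the onnx Python package (assumed installed; not part of this repo):

import onnx

model = onnx.load("yolov5s.onnx")
for tensor in list(model.graph.input) + list(model.graph.output):
    dims = tensor.type.tensor_type.shape.dim
    # the first dim should be symbolic ('batch'), not a fixed integer
    print(tensor.name, [d.dim_param or d.dim_value for d in dims])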
YoloV7 Support
  1. Download yolov7 and the pth file
# from cdn
# or wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt

wget https://cdn.githubjs.cf/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt
git clone [email protected]:WongKinYiu/yolov7.git
  2. Modify the code for dynamic batchsize
# line 45 forward function in yolov7/models/yolo.py 
# bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
# x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
# modified into:

bs, _, ny, nx = map(int, x[i].shape)  # x(bs,255,20,20) to x(bs,3,20,20,85)
bs = -1
x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

# line 52 in yolov7/models/yolo.py
# y = x[i].sigmoid()
# y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
# z.append(y.view(bs, -1, self.no))
# modified into:
y = x[i].sigmoid()
xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, -1, 1, 1, 2)  # wh
classif = y[..., 4:]
y = torch.cat([xy, wh, classif], dim=-1)
z.append(y.view(bs, self.na * ny * nx, self.no))

# line 57 in yolov7/models/yolo.py
# return x if self.training else (torch.cat(z, 1), x)
# modified into:
return x if self.training else torch.cat(z, 1)


# line 52 in yolov7/models/export.py
# output_names=['classes', 'boxes'] if y is None else ['output'],
# dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # size(1,3,640,640)
#               'output': {0: 'batch', 2: 'y', 3: 'x'}} if opt.dynamic else None)
# modified into:
output_names=['classes', 'boxes'] if y is None else ['output'],
dynamic_axes={'images': {0: 'batch'},  # size(1,3,640,640)
              'output': {0: 'batch'}} if opt.dynamic else None)
  3. Export the onnx model
cd yolov7
python models/export.py --dynamic --grid --weight=yolov7.pt
  4. Copy the model and execute it
cp yolov7/yolov7.onnx tensorRT_cpp/workspace/
cd tensorRT_cpp
make yolo -j32
YoloX Support
  1. Download YoloX
git clone [email protected]:Megvii-BaseDetection/YOLOX.git
cd YOLOX
  2. Modify the code. The modification ensures successful int8 compilation and inference; otherwise "Missing scale and zero-point for tensor (Unnamed Layer* 686)" will be raised.
# line 206 forward function in yolox/models/yolo_head.py. Replace the commented code with the uncommented code
# self.hw = [x.shape[-2:] for x in outputs] 
self.hw = [list(map(int, x.shape[-2:])) for x in outputs]


# line 208 forward function in yolox/models/yolo_head.py. Replace the commented code with the uncommented code
# [batch, n_anchors_all, 85]
# outputs = torch.cat(
#     [x.flatten(start_dim=2) for x in outputs], dim=2
# ).permute(0, 2, 1)
proc_view = lambda x: x.view(-1, int(x.size(1)), int(x.size(2) * x.size(3)))
outputs = torch.cat(
    [proc_view(x) for x in outputs], dim=2
).permute(0, 2, 1)


# line 253 decode_output function in yolox/models/yolo_head.py Replace the commented code with the uncommented code
#outputs[..., :2] = (outputs[..., :2] + grids) * strides
#outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides
#return outputs
xy = (outputs[..., :2] + grids) * strides
wh = torch.exp(outputs[..., 2:4]) * strides
return torch.cat((xy, wh, outputs[..., 4:]), dim=-1)

# line 77 in tools/export_onnx.py
model.head.decode_in_inference = True
  3. Export to onnx
# download model
wget https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_m.pth

# export
export PYTHONPATH=$PYTHONPATH:.
python tools/export_onnx.py -c yolox_m.pth -f exps/default/yolox_m.py --output-name=yolox_m.onnx --dynamic --no-onnxsim
  4. Execute the command
cp YOLOX/yolox_m.onnx tensorRT_cpp/workspace/
cd tensorRT_cpp
make yolo -j32
YoloV3 Support
  • If PyTorch >= 1.7 and the model is from the 5.0+ codebase, the model is supported by the framework out of the box
  • If PyTorch < 1.7 or an older yolov3, minor opset modifications are required
  • If you want inference with an older PyTorch, dynamic batch size and other advanced settings, please check our blog (currently in Chinese) and scan the QR code via WeChat to join us
  1. Download yolov3
git clone [email protected]:ultralytics/yolov3.git
  2. Modify the code for dynamic batchsize
# line 55 forward function in yolov3/models/yolo.py 
# bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
# x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
# modified into:

bs, _, ny, nx = map(int, x[i].shape)  # x(bs,255,20,20) to x(bs,3,20,20,85)
bs = -1
x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()


# line 70 in yolov3/models/yolo.py
#  z.append(y.view(bs, -1, self.no))
# modified into:
z.append(y.view(bs, self.na * ny * nx, self.no))

# line 62 in yolov3/models/yolo.py
# if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
#    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)
# modified into:
if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)
anchor_grid = (self.anchors[i].clone() * self.stride[i]).view(1, -1, 1, 1, 2)

# line 70 in yolov3/models/yolo.py
# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
# modified into:
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh

# line 73 in yolov3/models/yolo.py
# wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
# modified into:
wh = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh


# line 52 in yolov3/export.py
# torch.onnx.export(dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # shape(1,3,640,640)
#                                'output': {0: 'batch', 1: 'anchors'}  # shape(1,25200,85) 
# modified into:
torch.onnx.export(dynamic_axes={'images': {0: 'batch'},  # shape(1,3,640,640)
                                'output': {0: 'batch'}  # shape(1,25200,85) 
  3. Export the onnx model
cd yolov3
python export.py --weights=yolov3.pt --dynamic --include=onnx --opset=11
  4. Copy the model and execute it
cp yolov3/yolov3.onnx tensorRT_cpp/workspace/
cd tensorRT_cpp

# change src/application/app_yolo.cpp: main
# test(Yolo::Type::V3, TRT::Mode::FP32, "yolov3");

make yolo -j32
UNet Support
make dunet -j32
Retinaface Support
  1. Download Pytorch_Retinaface Repo
git clone [email protected]:biubug6/Pytorch_Retinaface.git
cd Pytorch_Retinaface
  2. Download the model from the Training section of README.md at https://github.com/biubug6/Pytorch_Retinaface#training, then unzip it to /weights. Here we use mobilenet0.25_Final.pth

  3. Modify the code

# line 24 in models/retinaface.py
# return out.view(out.shape[0], -1, 2) is modified into 
return out.view(-1, int(out.size(1) * out.size(2) * 2), 2)

# line 35 in models/retinaface.py
# return out.view(out.shape[0], -1, 4) is modified into
return out.view(-1, int(out.size(1) * out.size(2) * 2), 4)

# line 46 in models/retinaface.py
# return out.view(out.shape[0], -1, 10) is modified into
return out.view(-1, int(out.size(1) * out.size(2) * 2), 10)

# The following modification ensures that the output of the resize node is computed from a scale factor rather than a shape, so that dynamic batch can be achieved.
# line 89 in models/net.py
# up3 = F.interpolate(output3, size=[output2.size(2), output2.size(3)], mode="nearest") is modified into
up3 = F.interpolate(output3, scale_factor=2, mode="nearest")

# line 93 in models/net.py
# up2 = F.interpolate(output2, size=[output1.size(2), output1.size(3)], mode="nearest") is modified into
up2 = F.interpolate(output2, scale_factor=2, mode="nearest")

# The following code removes the softmax (which sometimes causes bugs) and concatenates the outputs to simplify decoding.
# line 123 in models/retinaface.py
# if self.phase == 'train':
#     output = (bbox_regressions, classifications, ldm_regressions)
# else:
#     output = (bbox_regressions, F.softmax(classifications, dim=-1), ldm_regressions)
# return output
# the above is modified into:
output = (bbox_regressions, classifications, ldm_regressions)
return torch.cat(output, dim=-1)

# set 'opset_version=11' to ensure a successful export
# torch_out = torch.onnx._export(net, inputs, output_onnx, export_params=True, verbose=False,
#     input_names=input_names, output_names=output_names)
# is modified into:
torch_out = torch.onnx._export(net, inputs, output_onnx, export_params=True, verbose=False, opset_version=11,
    input_names=input_names, output_names=output_names)


  4. Export to onnx
python convert_to_onnx.py
  5. Execute
cp FaceDetector.onnx ../tensorRT_cpp/workspace/mb_retinaface.onnx
cd ../tensorRT_cpp
make retinaface -j64
DBFace Support
make dbface -j64
Scrfd Support
Arcface Support
auto arcface = Arcface::create_infer("arcface_iresnet50.fp32.trtmodel", 0);
auto feature = arcface->commit(make_tuple(face, landmarks)).get();
cout << feature << endl;  // 1x512
  • In the face recognition example, workspace/face/library is the set of registered faces.
  • workspace/face/recognize is the set of faces to be recognized.
  • the results are saved in workspace/face/result and workspace/face/library_draw (a feature-comparison sketch follows below)
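Recognition then reduces to comparing the returned 1x512 features, typically by cosine similarity; a minimal numpy sketch (the 0.5 threshold and the random feature vectors are illustrative assumptions):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # a and b are 1x512 Arcface feature vectors
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical features for a registered face and a query face
registered = np.random.randn(1, 512).astype(np.float32)
query      = np.random.randn(1, 512).astype(np.float32)

score = cosine_similarity(registered, query)
print("same person" if score > 0.5 else "different person", score)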
CenterNet Support

Check out the details in tutorial/2.0

Bert Support (Chinese Classification)

The INTRO to the Interface

Python Interface: Get onnx and trtmodel from a pytorch model more easily
  • Just one line of code to export the onnx and trtmodel files and save them for future use.
import torch
import torchvision.models as models
import pytrt

model = models.resnet18(True).eval()
dummy_input = torch.zeros(1, 3, 224, 224)  # example input used for tracing
pytrt.from_torch(
    model,
    dummy_input,
    max_batch_size=16,
    onnx_save_file="test.onnx",
    engine_save_file="engine.trtmodel"
)
Python Interface: TensorRT Inference
  • YoloX TensorRT Inference
import cv2
import pytrt as tp

engine_file = "yolox_m.fp32.trtmodel"               # the trtmodel engine file
yolo   = tp.Yolo(engine_file, type=tp.YoloType.X)
image  = cv2.imread("inference/car.jpg")
bboxes = yolo.commit(image).get()
  • Seamless Inference from Pytorch to TensorRT
import torch
import torchvision.models as models
import pytrt as tp

device    = "cuda"
input     = torch.zeros(1, 3, 224, 224, device=device)
model     = models.resnet18(True).eval().to(device) # pytorch model
trt_model = tp.from_torch(model, input)
trt_out   = trt_model(input)
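As a sanity check, the TensorRT output can be compared against the PyTorch output on the same input (a sketch; it assumes trt_out can be converted to a torch tensor, which this README does not state):

import torch

with torch.no_grad():
    torch_out = model(input)

# the two outputs should agree closely for an fp32 engine
trt_as_torch = torch.as_tensor(trt_out).to(torch_out.device)
print((torch_out - trt_as_torch).abs().max().item())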
C++ Interface: YoloX Inference
// create infer engine on gpu 0
auto engine = Yolo::create_infer("yolox_m.fp32.trtmodel", Yolo::Type::X, 0);

// load image
auto image = cv::imread("1.jpg");

// do inference and get the result
auto box = engine->commit(image).get();
C++ Interface: Compile Model in FP32/FP16
TRT::compile(
  TRT::Mode::FP32,   // compile model in fp32
  3,                          // max batch size
  "plugin.onnx",              // onnx file
  "plugin.fp32.trtmodel",     // save path
  {}                         //  redefine the shape of input when needed
);
  • For fp32 compilation, all you need to provide is the onnx file; the input shape can be redefined when needed.
C++ Interface: Compile in INT8
  • INT8 inference is slightly less accurate than fp32 (roughly a 5% drop) but substantially faster. The framework offers int8 inference as follows:
// define the int8 calibration callback, which reads data and writes it into the tensor
auto int8process = [](int current, int count, vector<string>& images, shared_ptr<TRT::Tensor>& tensor){
    for(int i = 0; i < images.size(); ++i){
    // int8 compilation requires calibration: read the image data and call set_norm_mat; the data is then transferred into the tensor.
        auto image = cv::imread(images[i]);
        cv::resize(image, image, cv::Size(640, 640));
        float mean[] = {0, 0, 0};
        float std[]  = {1, 1, 1};
        tensor->set_norm_mat(i, image, mean, std);
    }
};


// Specify TRT::Mode as INT8
auto model_file = "yolov5m.int8.trtmodel";
TRT::compile(
  TRT::Mode::INT8,            // INT8
  3,                          // max batch size
  "yolov5m.onnx",             // onnx
  model_file,                 // saved filename
  {},                         // redefine the input shape
  int8process,                // the callback function for calibration
  ".",                        // the dir containing the images used for calibration
  ""                          // the dir where the calibration cache is saved (i.e. where the calibration data is loaded from)
);
  • We consolidate everything into a single int8process function, which avoids many of the issues that can otherwise arise with the official TensorRT implementation.
C++ Interface: Inference
  • We introduce the class Tensor for easier inference and data transfer between host and device, so that as a user you don't have to worry about the details.

  • class Engine is another facilitator.

// load model and get a shared_ptr. get nullptr if fail to load.
auto engine = TRT::load_infer("yolov5m.fp32.trtmodel");

// print model info
engine->print();

// load image
auto image = imread("demo.jpg");

// get the model input and output node, which can be accessed by name or index
auto input = engine->input(0);   // or auto input = engine->input("images");
auto output = engine->output(0); // or auto output = engine->output("output");

// put the image into input tensor by calling set_norm_mat()
float mean[] = {0, 0, 0};
float std[]  = {1, 1, 1};
input->set_norm_mat(0, image, mean, std);  // 0 is the batch index

// do the inference. Here sync(true) or async(false) is optional
engine->forward(); // engine->forward(true or false)

// get the output pointer, which can be used to access the output
float* output_ptr = output->cpu<float>();
C++ Interface: Plugin
  • You only need to define the kernel function and the inference process. The code details (e.g. serialization, deserialization, plugin injection) are handled under the hood.
  • Easy to implement a new plugin in FP32 and FP16. Refer to HSwish.cu for details.
template<>
__global__ void HSwishKernel(float* input, float* output, int edge) {

    KernelPositionBlock;
    float x = input[position];
    float a = x + 3;
    a = a < 0 ? 0 : (a >= 6 ? 6 : a);
    output[position] = x * a / 6;
}

int HSwish::enqueue(const std::vector<GTensor>& inputs, std::vector<GTensor>& outputs, const std::vector<GTensor>& weights, void* workspace, cudaStream_t stream) {

    int count = inputs[0].count();
    auto grid = CUDATools::grid_dims(count);
    auto block = CUDATools::block_dims(count);
    HSwishKernel <<<grid, block, 0, stream >>> (inputs[0].ptr<float>(), outputs[0].ptr<float>(), count);
    return 0;
}


RegisterPlugin(HSwish);

About Us

tensorrt_pro's People

Contributors

baiiiij, brucesonbo, gghlh, guanbin-huang, lacacy, liuanqi-libra7, lvkerui, shouxieai, xiongdafeng, yangmust, zhao7601, zhuipiaochen


tensorrt_pro's Issues

error: #error This file was generated by an older version of protoc which is

This problem occurs when compiling the code; the offending files are in the preprocessing parts of several files in the src/tensorRT/onnx folder. The README says these files were extracted from an ONNX build. I tried building ONNX, finding these files and copying them over, but the problem remains.
Is there any way to work around this? Or did I build the wrong version of ONNX?

The DCN fp16 branch cannot be used

Following the documentation I ran "test FP16 centernet_r18_dcn" and found the fp16 interface could not be called. Today I noticed the related fp16 implementation is commented out; is fp16 support incomplete?

Windows 10 build error

[ 17%] Linking CXX shared library plugin_list.dll
LINK Pass 1: command "D:\PROGRA1\MICROS1\2017\BUILDT1\VC\Tools\MSVC\14161.270\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\plugin_list.dir\objects1.rsp /out:plugin_list.dll /implib:plugin_list.lib /pdb:D:\Projects\CLionProjects\smart_classroom_cpp\build\plugin_list.pdb /dll /version:0.0 /machine:x64 /debug /INCREMENTAL -LIBPATH:D:\Projects\CLionProjects\smart_classroom_cpp\lean\protobuf3.11.4\lib -LIBPATH:D:\Projects\CLionProjects\smart_classroom_cpp\lean\TensorRT8.0.1.6\lib -LIBPATH:D:\Projects\CLionProjects\smart_classroom_cpp\lean\cuda11.1\lib\x64 -LIBPATH:D:\Projects\CLionProjects\smart_classroom_cpp\lean\cudnn8.2.1\lib\x64 -LIBPATH:D:\Projects\CLionProjects\smart_classroom_cpp\lean\opencv3.4.6\lib -LIBPATH:D:\Projects\CLionProjects\smart_classroom_cpp\lean\pthread\lib D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cudart.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:CMakeFiles\plugin_list.dir/intermediate.manifest CMakeFiles\plugin_list.dir/manifest.res" failed (exit code 1120) with the following output:
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_yolo_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_yolov5_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_yolox_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_preprocess_kernel.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_centernet_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_dbface_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_retinaface_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_scrfd_decode.cu.obj : error LNK2001: unresolved external symbol "void __cdecl iLogger::__log_func(char const *,int,enum iLogger::LogLevel,char const *,...)" (?__log_func@iLogger@@YAXPEBDHW4LogLevel@1@0ZZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_yolo_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_yolov5_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_yolox_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_preprocess_kernel.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_centernet_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_dbface_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_retinaface_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_scrfd_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::grid_dims(int)" (?grid_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_yolo_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_yolov5_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_yolox_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_preprocess_kernel.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_centernet_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_dbface_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_retinaface_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_scrfd_decode.cu.obj : error LNK2001: unresolved external symbol "struct dim3 __cdecl CUDATools::block_dims(int)" (?block_dims@CUDATools@@ya?AUdim3@@h@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol getPluginRegistry referenced in function "public: cdecl nvinfer1::PluginRegistrar::PluginRegistrar(void)" (??0?$PluginRegistrar@VDCNv2PluginCreator@@@nvinfer1@@qeaa@XZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol getPluginRegistry
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol getPluginRegistry
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "private: void __cdecl cv::String::deallocate(void)" (?deallocate@String@cv@@AEAAXXZ) referenced in function "public: __cdecl cv::String::~String(void)" (??1String@cv@@qeaa@XZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "private: void __cdecl cv::String::deallocate(void)" (?deallocate@String@cv@@AEAAXXZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "private: void __cdecl cv::String::deallocate(void)" (?deallocate@String@cv@@AEAAXXZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "int __cdecl TRT::data_type_size(enum TRT::DataType)" (?data_type_size@TRT@@YAHW4DataType@1@@z) referenced in function "public: virtual unsigned __int64 __cdecl DCNv2::getWorkspaceSize(struct nvinfer1::PluginTensorDesc const *,int,struct nvinfer1::PluginTensorDesc const *,int)const " (?getWorkspaceSize@DCNv2@@UEBA_KPEBUPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "public: int __cdecl ONNXPlugin::GTensor::count(int)const " (?count@GTensor@ONNXPlugin@@QEBAHH@Z) referenced in function "void __cdecl enqueue_native(struct cublasContext *,class std::vector<struct ONNXPlugin::GTensor,class std::allocator > const &,class std::vector<struct ONNXPlugin::GTensor,class std::allocator > &,class std::vector<struct ONNXPlugin::GTensor,class std::allocator > const &,void *,struct CUstream_st *)" (??$enqueue_native@M@@YAXPEAUcublasContext@@aebv?$vector@UGTensor@ONNXPlugin@@v?$allocator@UGTensor@ONNXPlugin@@@std@@@std@@AEAV12@1PEAXPEAUCUstream_st@@@z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: int __cdecl ONNXPlugin::GTensor::count(int)const " (?count@GTensor@ONNXPlugin@@QEBAHH@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: int __cdecl ONNXPlugin::GTensor::count(int)const " (?count@GTensor@ONNXPlugin@@QEBAHH@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "public: int __cdecl ONNXPlugin::GTensor::offset_array(unsigned __int64,int const *)const " (?offset_array@GTensor@ONNXPlugin@@QEBAH_KPEBH@Z) referenced in function "public: int __cdecl ONNXPlugin::GTensor::offset<>(int)const " (??$offset@$$V@GTensor@ONNXPlugin@@QEBAHH@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::configurePlugin(struct nvinfer1::DynamicPluginTensorDesc const *,int,struct nvinfer1::DynamicPluginTensorDesc const *,int)" (?configurePlugin@TRTPlugin@ONNXPlugin@@UEAAXPEBUDynamicPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::configurePlugin(struct nvinfer1::DynamicPluginTensorDesc const *,int,struct nvinfer1::DynamicPluginTensorDesc const *,int)" (?configurePlugin@TRTPlugin@ONNXPlugin@@UEAAXPEBUDynamicPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::configurePlugin(struct nvinfer1::DynamicPluginTensorDesc const *,int,struct nvinfer1::DynamicPluginTensorDesc const *,int)" (?configurePlugin@TRTPlugin@ONNXPlugin@@UEAAXPEBUDynamicPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "public: virtual __cdecl ONNXPlugin::TRTPlugin::~TRTPlugin(void)" (??1TRTPlugin@ONNXPlugin@@UEAA@XZ) referenced in function "public: virtual __cdecl DCNv2::~DCNv2(void)" (??1DCNv2@@UEAA@XZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual __cdecl ONNXPlugin::TRTPlugin::~TRTPlugin(void)" (??1TRTPlugin@ONNXPlugin@@UEAA@XZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual __cdecl ONNXPlugin::TRTPlugin::~TRTPlugin(void)" (??1TRTPlugin@ONNXPlugin@@UEAA@XZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "public: void __cdecl ONNXPlugin::TRTPlugin::pluginInit(class std::basic_string<char,struct std::char_traits,class std::allocator > const &,void const *,unsigned __int64)" (?pluginInit@TRTPlugin@ONNXPlugin@@QEAAXAEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@PEBX_K@Z) referenced in function "public: virtual class nvinfer1::IPluginV2DynamicExt * cdecl DCNv2PluginCreator::deserializePlugin(char const *,void const *,unsigned int64)" (?deserializePlugin@DCNv2PluginCreator@@UEAAPEAVIPluginV2DynamicExt@nvinfer1@@PEBDPEBX_K@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: void __cdecl ONNXPlugin::TRTPlugin::pluginInit(class std::basic_string<char,struct std::char_traits,class std::allocator > const &,void const *,unsigned __int64)" (?pluginInit@TRTPlugin@ONNXPlugin@@QEAAXAEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@PEBX_K@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: void __cdecl ONNXPlugin::TRTPlugin::pluginInit(class std::basic_string<char,struct std::char_traits,class std::allocator > const &,void const *,unsigned __int64)" (?pluginInit@TRTPlugin@ONNXPlugin@@QEAAXAEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@PEBX_K@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol "public: virtual class std::shared_ptr __cdecl ONNXPlugin::TRTPlugin::new_config(void)" (?new_config@TRTPlugin@ONNXPlugin@@UEAA?AV?$shared_ptr@ULayerConfig@ONNXPlugin@@@std@@xz) referenced in function "public: virtual class std::shared_ptr __cdecl DCNv2::new_config(void)" (?new_config@DCNv2@@UEAA?AV?$shared_ptr@ULayerConfig@ONNXPlugin@@@std@@xz)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual class std::shared_ptr __cdecl ONNXPlugin::TRTPlugin::new_config(void)" (?new_config@TRTPlugin@ONNXPlugin@@UEAA?AV?$shared_ptr@ULayerConfig@ONNXPlugin@@@std@@xz)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual class std::shared_ptr __cdecl ONNXPlugin::TRTPlugin::new_config(void)" (?new_config@TRTPlugin@ONNXPlugin@@UEAA?AV?$shared_ptr@ULayerConfig@ONNXPlugin@@@std@@xz)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual bool __cdecl ONNXPlugin::TRTPlugin::supportsFormatCombination(int,struct nvinfer1::PluginTensorDesc const *,int,int)" (?supportsFormatCombination@TRTPlugin@ONNXPlugin@@UEAA_NHPEBUPluginTensorDesc@nvinfer1@@hh@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual bool __cdecl ONNXPlugin::TRTPlugin::supportsFormatCombination(int,struct nvinfer1::PluginTensorDesc const *,int,int)" (?supportsFormatCombination@TRTPlugin@ONNXPlugin@@UEAA_NHPEBUPluginTensorDesc@nvinfer1@@hh@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual bool __cdecl ONNXPlugin::TRTPlugin::supportsFormatCombination(int,struct nvinfer1::PluginTensorDesc const *,int,int)" (?supportsFormatCombination@TRTPlugin@ONNXPlugin@@UEAA_NHPEBUPluginTensorDesc@nvinfer1@@hh@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::getNbOutputs(void)const " (?getNbOutputs@TRTPlugin@ONNXPlugin@@UEBAHXZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::getNbOutputs(void)const " (?getNbOutputs@TRTPlugin@ONNXPlugin@@UEBAHXZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::getNbOutputs(void)const " (?getNbOutputs@TRTPlugin@ONNXPlugin@@UEBAHXZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::initialize(void)" (?initialize@TRTPlugin@ONNXPlugin@@UEAAHXZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::initialize(void)" (?initialize@TRTPlugin@ONNXPlugin@@UEAAHXZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::initialize(void)" (?initialize@TRTPlugin@ONNXPlugin@@UEAAHXZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::terminate(void)" (?terminate@TRTPlugin@ONNXPlugin@@UEAAXXZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::terminate(void)" (?terminate@TRTPlugin@ONNXPlugin@@UEAAXXZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::terminate(void)" (?terminate@TRTPlugin@ONNXPlugin@@UEAAXXZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual unsigned __int64 __cdecl ONNXPlugin::TRTPlugin::getWorkspaceSize(struct nvinfer1::PluginTensorDesc const *,int,struct nvinfer1::PluginTensorDesc const *,int)const " (?getWorkspaceSize@TRTPlugin@ONNXPlugin@@UEBA_KPEBUPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual unsigned __int64 __cdecl ONNXPlugin::TRTPlugin::getWorkspaceSize(struct nvinfer1::PluginTensorDesc const *,int,struct nvinfer1::PluginTensorDesc const *,int)const " (?getWorkspaceSize@TRTPlugin@ONNXPlugin@@UEBA_KPEBUPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual unsigned __int64 __cdecl ONNXPlugin::TRTPlugin::getWorkspaceSize(struct nvinfer1::PluginTensorDesc const *,int,struct nvinfer1::PluginTensorDesc const *,int)const " (?getWorkspaceSize@TRTPlugin@ONNXPlugin@@UEBA_KPEBUPluginTensorDesc@nvinfer1@@H0H@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::enqueue(struct nvinfer1::PluginTensorDesc const *,struct nvinfer1::PluginTensorDesc const *,void const * const *,void * const *,void *,struct CUstream_st *)" (?enqueue@TRTPlugin@ONNXPlugin@@UEAAHPEBUPluginTensorDesc@nvinfer1@@0PEBQEBXPEBQEAXPEAXPEAUCUstream_st@@@z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::enqueue(struct nvinfer1::PluginTensorDesc const *,struct nvinfer1::PluginTensorDesc const *,void const * const *,void * const *,void *,struct CUstream_st *)" (?enqueue@TRTPlugin@ONNXPlugin@@UEAAHPEBUPluginTensorDesc@nvinfer1@@0PEBQEBXPEBQEAXPEAXPEAUCUstream_st@@@z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual int __cdecl ONNXPlugin::TRTPlugin::enqueue(struct nvinfer1::PluginTensorDesc const *,struct nvinfer1::PluginTensorDesc const *,void const * const *,void * const *,void *,struct CUstream_st *)" (?enqueue@TRTPlugin@ONNXPlugin@@UEAAHPEBUPluginTensorDesc@nvinfer1@@0PEBQEBXPEBQEAXPEAXPEAUCUstream_st@@@z)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual unsigned __int64 __cdecl ONNXPlugin::TRTPlugin::getSerializationSize(void)const " (?getSerializationSize@TRTPlugin@ONNXPlugin@@UEBA_KXZ)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual unsigned __int64 __cdecl ONNXPlugin::TRTPlugin::getSerializationSize(void)const " (?getSerializationSize@TRTPlugin@ONNXPlugin@@UEBA_KXZ)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual unsigned __int64 __cdecl ONNXPlugin::TRTPlugin::getSerializationSize(void)const " (?getSerializationSize@TRTPlugin@ONNXPlugin@@UEBA_KXZ)
plugin_list_generated_DCNv2.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::serialize(void *)const " (?serialize@TRTPlugin@ONNXPlugin@@UEBAXPEAX@Z)
plugin_list_generated_HSigmoid.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::serialize(void *)const " (?serialize@TRTPlugin@ONNXPlugin@@UEBAXPEAX@Z)
plugin_list_generated_HSwish.cu.obj : error LNK2001: unresolved external symbol "public: virtual void __cdecl ONNXPlugin::TRTPlugin::serialize(void *)const " (?serialize@TRTPlugin@ONNXPlugin@@UEBAXPEAX@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol cublasGetMathMode referenced in function "enum cublasStatus_t __cdecl cublasMigrateComputeType(struct cublasContext *,enum cudaDataType_t,enum cublasComputeType_t *)" (?cublasMigrateComputeType@@ya?AW4cublasStatus_t@@PEAUcublasContext@@W4cudaDataType_t@@PEAW4cublasComputeType_t@@@z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol cublasSgemm_v2 referenced in function "void __cdecl segemm_native(struct cublasContext *,enum cublasOperation_t,enum cublasOperation_t,int,int,int,float,float const *,int,float const *,int,float,float *,int)" (?segemm_native@@YAXPEAUcublasContext@@W4cublasOperation_t@@1HHHMPEBMH2HMPEAMH@Z)
plugin_list_generated_DCNv2.cu.obj : error LNK2019: unresolved external symbol cublasGemmEx referenced in function "enum cublasStatus_t __cdecl cublasGemmEx(struct cublasContext *,enum cublasOperation_t,enum cublasOperation_t,int,int,int,void const *,void const *,enum cudaDataType_t,int,void const *,enum cudaDataType_t,int,void const *,void *,enum cudaDataType_t,int,enum cudaDataType_t,enum cublasGemmAlgo_t)" (?cublasGemmEx@@ya?AW4cublasStatus_t@@PEAUcublasContext@@W4cublasOperation_t@@1HHHPEBX2W4cudaDataType_t@@H23H2PEAX3H3W4cublasGemmAlgo_t@@@z)
plugin_list.dll : fatal error LNK1120: 23 unresolved externals
NMAKE : fatal error U1077: '"D:\Program Files\JetBrains\CLion 2021.2\bin\cmake\win\bin\cmake.exe"' : return code '0xffffffff'
Stop.
NMAKE : fatal error U1077: '"D:\Program Files\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"D:\Program Files\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"D:\Program Files\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.

retinaface+arcface is much slower on tensorRT7.x than on tensorRT8.x; what could be the reason?

1. retinaface, resnet50 + tensorRT8.x timings:
car.jpg, 2 faces, average time: 24.85 ms
gril.jpg, 1 faces, average time: 24.85 ms
group.jpg, 24 faces, average time: 24.85 ms
yq.jpg, 1 faces, average time: 24.85 ms
zand.jpg, 2 faces, average time: 24.85 ms
zgjr.jpg, 3 faces, average time: 24.85 ms

2. retinaface, resnet50 + tensorRT7.x timings:
car.jpg, 2 faces, average time: 34.78 ms
gril.jpg, 1 faces, average time: 34.78 ms
group.jpg, 24 faces, average time: 34.78 ms
yq.jpg, 1 faces, average time: 34.78 ms
zand.jpg, 2 faces, average time: 34.78 ms
zgjr.jpg, 3 faces, average time: 34.78 ms

3. retinaface+arcface, resnet50 + tensorrt8.x timings:
2ys1.jpg_2face_time:0.060646772384643555s
2ys3.jpg_9face_time:0.12475872039794922s
2ys5.jpg_3face_time:0.09700417518615723s
2ys2.jpg_1face_time:0.053908586502075195s

4. retinaface+arcface, resnet50 + tensorrt7.x timings:
2ys2.jpg_1face_time:0.33365297317504883s
2ys1.jpg_2face_time:0.6880166530609131s
2ys3.jpg_9face_time:2.6245572566986084s
2ys5.jpg_3face_time:0.9589879512786865s

The same model differs greatly across TensorRT versions; could this be a problem between arcface and tensorRT7.x?

Help please: I get the following error when compiling on Linux

I get the following error when compiling on Linux:
in function `CUDATools::check_driver(cudaError_enum, char const*, int, char const*)':
/home/yjy/test_trt_new/common/cuda_tools.cpp:15: undefined reference to `cuGetErrorString'
/usr/bin/ld: /home/yjy/test_trt_new/common/cuda_tools.cpp:16: undefined reference to `cuGetErrorName'

What is going on here?

Problem compiling protobuf

make reports an error

g++: error: google/protobuf/util/internal/.libs/proto_writer.o: No such file or directory
Makefile:2372: recipe for target 'libprotobuf.la' failed
make[2]: *** [libprotobuf.la] Error 1

Do I need to download googletest first? If so, which version?
ubuntu18.04 jetson xavier nx

win10 error

Environment: win10, vs2019, sm-75
Problem: the project compiles, but fails at runtime, as shown in the screenshots
What I tried:
1. Originally the CUDA environment was 11.0 and gave this error; setting CUDA to 10.2 gives the same error
2. The GPU is an RTX Titan; sm was changed to sm75
3. All CUDA paths were updated, including the CUDA path under Debug -> Environment
4. The TensorRT version was swapped and then restored to the author's version; the error below remains


Python

Hi, a question about Python: I see you use an anaconda environment and take the headers and dependencies from it, but with my system Python 3.6 I cannot find the corresponding lib/include. Could you point me to them? Thanks for the reply.

install

Could you give any tips about the installation of this repo?

How to modify the build configuration parameters

Hello author, I am not very proficient in C++. Could you add some documentation on which configuration parameters in which files need to be changed to compile and run inference with a model trained on one's own data?

/usr/bin/ld: cannot find -lcuda error

When I run make yolo -j8 on my Ubuntu system, the above error occurs. I have set my protobuf version to libprotoc 3.15.8 and compiled the project against TensorRT 8.2; before that I also ran use_tensorrt_8.x.sh. I tried to install libcuda via apt-get install libcuda-dev, but it didn't work; the library cannot be found.

After investigating, I found that no libcuda is installed for the newer TensorRT 8.2; I only found libcudart.so instead, which is strange. Any suggestions about this error?

Environment Configuration Issues!

I have never used VS Code before. I want to learn from the blogger's video, but there are always errors. Is there any solution? (I have installed TensorRT on Windows)

Can you help look at a build problem?

Is there a problem with the protobuf linking? How should I deal with it? Thanks!

root@44ee5adbd43e:/home/pangguoliang/projects/tensorRT_Pro/build# make yolo -j8
[ 1%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_dbface/plugin_list_generated_dbface_decode.cu.o
[ 2%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_yolo_fast/plugin_list_generated_yolox_decode.cu.o
[ 4%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_centernet/plugin_list_generated_centernet_decode.cu.o
[ 7%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_retinaface/plugin_list_generated_retinaface_decode.cu.o
[ 7%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_scrfd/plugin_list_generated_scrfd_decode.cu.o
[ 8%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_yolo/plugin_list_generated_yolo_decode.cu.o
[ 10%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/tensorRT/onnxplugin/plugins/plugin_list_generated_HSwish.cu.o
[ 11%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/application/app_yolo_fast/plugin_list_generated_yolov5_decode.cu.o
[ 13%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/tensorRT/common/plugin_list_generated_preprocess_kernel.cu.o
[ 14%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/tensorRT/onnxplugin/plugins/plugin_list_generated_DCNv2.cu.o
[ 16%] Building NVCC (Device) object CMakeFiles/plugin_list.dir/src/tensorRT/onnxplugin/plugins/plugin_list_generated_HSigmoid.cu.o
Scanning dependencies of target plugin_list
[ 17%] Linking CXX shared library libplugin_list.so
[ 17%] Built target plugin_list
Scanning dependencies of target pro
[ 19%] Building CXX object CMakeFiles/pro.dir/src/application/app_alphapose.cpp.o
[ 20%] Building CXX object CMakeFiles/pro.dir/src/application/app_centernet.cpp.o
[ 22%] Building CXX object CMakeFiles/pro.dir/src/application/app_arcface.cpp.o
[ 23%] Building CXX object CMakeFiles/pro.dir/src/application/app_bert.cpp.o
[ 25%] Building CXX object CMakeFiles/pro.dir/src/application/app_arcface/arcface.cpp.o
[ 26%] Building CXX object CMakeFiles/pro.dir/src/application/app_centernet/centernet.cpp.o
[ 28%] Building CXX object CMakeFiles/pro.dir/src/application/app_dbface.cpp.o
[ 29%] Building CXX object CMakeFiles/pro.dir/src/application/app_alphapose/alpha_pose.cpp.o
[ 31%] Building CXX object CMakeFiles/pro.dir/src/application/app_dbface/dbface.cpp.o
[ 32%] Building CXX object CMakeFiles/pro.dir/src/application/app_fall_gcn/fall_gcn.cpp.o
[ 34%] Building CXX object CMakeFiles/pro.dir/src/application/app_fall_recognize.cpp.o
[ 35%] Building CXX object CMakeFiles/pro.dir/src/application/app_high_performance.cpp.o
[ 37%] Building CXX object CMakeFiles/pro.dir/src/application/app_high_performance/alpha_pose_high_perf.cpp.o
[ 38%] Building CXX object CMakeFiles/pro.dir/src/application/app_high_performance/high_performance.cpp.o
[ 40%] Building CXX object CMakeFiles/pro.dir/src/application/app_high_performance/yolo_high_perf.cpp.o
[ 41%] Building CXX object CMakeFiles/pro.dir/src/application/app_lesson.cpp.o
[ 43%] Building CXX object CMakeFiles/pro.dir/src/application/app_plugin.cpp.o
[ 44%] Building CXX object CMakeFiles/pro.dir/src/application/app_python/interface.cpp.o
[ 46%] Building CXX object CMakeFiles/pro.dir/src/application/app_retinaface.cpp.o
[ 47%] Building CXX object CMakeFiles/pro.dir/src/application/app_retinaface/retinaface.cpp.o
[ 49%] Building CXX object CMakeFiles/pro.dir/src/application/app_scrfd.cpp.o
[ 50%] Building CXX object CMakeFiles/pro.dir/src/application/app_scrfd/scrfd.cpp.o
[ 52%] Building CXX object CMakeFiles/pro.dir/src/application/app_yolo.cpp.o
[ 53%] Building CXX object CMakeFiles/pro.dir/src/application/app_yolo/yolo.cpp.o
[ 55%] Building CXX object CMakeFiles/pro.dir/src/application/app_yolo_fast.cpp.o
[ 56%] Building CXX object CMakeFiles/pro.dir/src/application/app_yolo_fast/yolo_fast.cpp.o
[ 58%] Building CXX object CMakeFiles/pro.dir/src/application/test_warpaffine.cpp.o
[ 59%] Building CXX object CMakeFiles/pro.dir/src/application/test_yolo_map.cpp.o
[ 61%] Building CXX object CMakeFiles/pro.dir/src/application/tools/auto_download.cpp.o
[ 62%] Building CXX object CMakeFiles/pro.dir/src/application/tools/deepsort.cpp.o
[ 64%] Building CXX object CMakeFiles/pro.dir/src/application/tools/zmq_remote_show.cpp.o
[ 65%] Building CXX object CMakeFiles/pro.dir/src/application/tools/zmq_u.cpp.o
[ 67%] Building CXX object CMakeFiles/pro.dir/src/main.cpp.o
[ 68%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/builder/trt_builder.cpp.o
[ 70%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/common/cuda_tools.cpp.o
[ 71%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/common/ilogger.cpp.o
[ 73%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/common/json.cpp.o
[ 74%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/common/trt_tensor.cpp.o
[ 76%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/import_lib.cpp.o
[ 77%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/infer/trt_infer.cpp.o
[ 79%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx/onnx-ml.pb.cpp.o
[ 80%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx/onnx-operators-ml.pb.cpp.o
[ 82%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/LoopHelpers.cpp.o
[ 83%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/ModelImporter.cpp.o
[ 85%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/NvOnnxParser.cpp.o
[ 86%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/OnnxAttrs.cpp.o
[ 88%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/RNNHelpers.cpp.o
[ 89%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/ShapeTensor.cpp.o
[ 91%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/ShapedWeights.cpp.o
[ 92%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/builtin_op_importers.cpp.o
[ 94%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/onnx2trt_utils.cpp.o
[ 95%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnx_parser/onnxErrorRecorder.cpp.o
[ 97%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnxplugin/onnxplugin.cpp.o
[ 98%] Building CXX object CMakeFiles/pro.dir/src/tensorRT/onnxplugin/plugin_binary_io.cpp.o
[100%] Linking CXX executable ../workspace/pro
CMakeFiles/pro.dir/src/tensorRT/onnx/onnx-ml.pb.cpp.o: In function `InitDefaultsscc_info_AttributeProto_onnx_2dml_2eproto()':
/home/pangguoliang/projects/tensorRT_Pro/src/tensorRT/onnx/onnx-ml.pb.cpp:132: undefined reference to `google::protobuf::internal::VerifyVersion(int, int, char const*)'
CMakeFiles/pro.dir/src/tensorRT/onnx/onnx-ml.pb.cpp.o: In function `InitDefaultsscc_info_FunctionProto_onnx_2dml_2eproto()':

...

/home/pangguoliang/projects/tensorRT_Pro/src/tensorRT/onnx_parser/onnx2trt_utils.cpp:805: undefined reference to
CMakeFiles/pro.dir/src/tensorRT/onnx_parser/onnx2trt_utils.cpp.o: In function `onnx2trt::poolingHelper(onnx2trt::IImporterContext*, onnx::NodeProto const&, std::vector<onnx2trt::TensorOrWeights, std::allocator<onnx2trt::TensorOrWeights> >&, nvinfer1::PoolingType)':
/home/pangguoliang/projects/tensorRT_Pro/src/tensorRT/onnx_parser/onnx2trt_utils.cpp:1644: undefined reference to `google::protobuf::RepeatedField::size() const'
CMakeFiles/pro.dir/src/tensorRT/onnx_parser/onnx2trt_utils.cpp.o: In function `onnx2trt::setAttr(nvinfer1::Dims32*, onnx::AttributeProto const*, int, int)':
/home/pangguoliang/projects/tensorRT_Pro/src/tensorRT/onnx_parser/onnx2trt_utils.cpp:1827: undefined reference to `google::protobuf::RepeatedField::size() const'
collect2: error: ld returned 1 exit status
CMakeFiles/pro.dir/build.make:1515: recipe for target '../workspace/pro' failed
make[3]: *** [../workspace/pro] Error 1
CMakeFiles/Makefile2:424: recipe for target 'CMakeFiles/pro.dir/all' failed
make[2]: *** [CMakeFiles/pro.dir/all] Error 2
CMakeFiles/Makefile2:234: recipe for target 'CMakeFiles/yolo.dir/rule' failed
make[1]: *** [CMakeFiles/yolo.dir/rule] Error 2
Makefile:183: recipe for target 'yolo' failed
make: *** [yolo] Error 2
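These undefined references to google::protobuf symbols usually mean the libprotobuf the linker found is a different version from the one whose headers generated onnx-ml.pb.cpp (this project builds against protobuf 3.11.4). As a sanity check, a tiny program compiled with the same include and library paths as pro can confirm whether the headers and the linked runtime agree; note that GOOGLE_PROTOBUF_VERIFY_VERSION expands to the very VerifyVersion symbol the linker is complaining about:

#include <iostream>
#include <google/protobuf/stubs/common.h>

int main() {
    // Version constant baked into the headers at compile time.
    std::cout << "protobuf headers: " << GOOGLE_PROTOBUF_VERSION << std::endl;
    // Aborts with a descriptive message if the linked libprotobuf
    // is incompatible with those headers.
    GOOGLE_PROTOBUF_VERIFY_VERSION;
    std::cout << "headers and linked runtime match" << std::endl;
    return 0;
}

If this fails to link, or aborts at runtime, point CMake at one consistent protobuf installation and rebuild.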

For the YOLOX deployment, I split out the required code and packaged it as a DLL. Inside the DLL, however, deserializing the engine file via the nvinfer runtime's deserializeCudaEngine (the API provided by TensorRT) always returns a null pointer. The exact same code works in an exe: it deserializes fine and the inference results are correct.
Does anyone know what is going on?
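One common cause of this symptom (a guess, not a confirmed diagnosis) is that plugin registration happens in the exe but never runs inside the DLL, so deserialization cannot resolve custom layers and returns null. A minimal null-checked loading path using only standard TensorRT APIs, for comparison:

#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
#include <NvInfer.h>
#include <NvInferPlugin.h>   // initLibNvInferPlugins

class SimpleLogger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
};

nvinfer1::ICudaEngine* load_engine(const char* file) {
    static SimpleLogger logger;

    // Plugin registration is per-module: doing it in the exe does not
    // register anything for code living in a separate DLL.
    initLibNvInferPlugins(&logger, "");

    std::ifstream in(file, std::ios::binary);
    std::vector<char> data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());

    auto* runtime = nvinfer1::createInferRuntime(logger);
    auto* engine  = runtime->deserializeCudaEngine(data.data(), data.size());
    if (engine == nullptr)
        std::cerr << "deserializeCudaEngine returned null: verify the custom "
                     "plugins compiled into the exe are also linked into the DLL"
                  << std::endl;
    return engine;
}

If the engine was compiled with this repo's custom decode plugins, the code that registers them (e.g. whatever links against the plugin_list target built above) must be part of the DLL as well.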

[trt_builder.cpp:36]:NVInfer: TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.1

The warnings/errors are shown above. Dealing with the compatibility issues between the TensorRT version, CUDA, and the CUDA toolkit versions is annoying and confusing; I cannot figure out the difference among them. Any help will be appreciated!

I do have TensorRT installed:

dpkg -l | grep tensorrt
ii nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.0.2.6-ga-20181009 1-1 amd64 nv-tensorrt repository configuration files
ii nv-tensorrt-repo-ubuntu1804-cuda10.1-trt5.1.5.0-ga-20190427 1-1 amd64 nv-tensorrt repository configuration files
ii nv-tensorrt-repo-ubuntu1804-cuda11.4-trt8.2.1.8-ga-20211117 1-1 amd64 nv-tensorrt repository configuration files
ii tensorrt 8.2.1.8-1+cuda11.4 amd64

As far as I can tell, that should be 8.2.1.8 with cuda11.4.

And after typing nvcc -V, the reported CUDA version is as follows:

NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

So which version of the toolkit should I install? Currently my CUDA toolkit is v11.5.0, I think.
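For what it is worth, these numbers are three different things: nvcc -V reports the toolkit the compiler came from, the display driver advertises the highest CUDA version it supports, and each binary links a specific CUDA runtime; TensorRT additionally pins the cuBLAS/cuDNN builds it was compiled against, which is what the warning above compares. A small sketch using the standard CUDA runtime API prints the first two:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driver = 0, runtime = 0;
    cudaDriverGetVersion(&driver);    // highest CUDA version the installed driver supports
    cudaRuntimeGetVersion(&runtime);  // CUDA runtime this binary actually linked
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driver / 1000, (driver % 1000) / 10,
           runtime / 1000, (runtime % 1000) / 10);
    return 0;
}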

My usual approach: I export the ONNX on machine A, and on machine A I also build the matching serialization tool (it depends on TensorRT, CUDA and cuDNN; the dependencies are packaged together with the tool). When I need to deploy on machine B, I run this serialization tool there to serialize the ONNX.

This works without problems on the 960, 1060 and 1080.

On the 1660 I hit this problem:
[E] [TRT] c:\p4sw\sw\gpgpu\MachineLearning\DIT\release\5.1\engine\cuda\cudaConvolutionLayer.cpp (238) - Cudnn Error in nvinfer1::rt::cuda::CudnnConvolutionLayer::execute: 8 (CUDNN_STATUS_EXECUTION_FAILED)
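A serialized TensorRT engine is tuned for the GPU it was built on. The 960/1060/1080 are compute capability 5.2/6.1, while the 1660 is Turing (compute capability 7.5), so kernels selected on the build machine may fail outright there, which matches the CUDNN_STATUS_EXECUTION_FAILED above; rebuilding the engine from ONNX on the target machine is the usual remedy. A sketch for checking the capability you are actually running on:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, 0);  // query device 0
    // An engine serialized under one compute capability should be
    // rebuilt from ONNX when this value changes (e.g. 6.1 -> 7.5).
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}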

Difficulty learning the plugin system

Hello, I watched the plugin episode of the video course, but there is no plugin-test code under app_yolo. Where can I find it? Thanks.
At the yolov5 part, I followed the video to convert to ONNX, but it fails with an error and cannot parse the ONNX file. Is there a way to check this? I am not sure whether it is related to the output format.
(screenshot: 2021-10-30 16-59-15)

With arcface feature extraction on different faces, the computed similarities are all identical. What could be going on?

Cosine distance code:
import numpy as np

def cosine_distance(matrix1, matrix2):
    # rows of matrix1 / matrix2 are feature vectors
    matrix1_matrix2 = np.dot(matrix1, matrix2.transpose())
    matrix1_norm = np.sqrt(np.multiply(matrix1, matrix1).sum(axis=1))
    matrix1_norm = matrix1_norm[:, np.newaxis]
    matrix2_norm = np.sqrt(np.multiply(matrix2, matrix2).sum(axis=1))
    matrix2_norm = matrix2_norm[:, np.newaxis]
    cosine_distance = np.divide(matrix1_matrix2,
                                np.dot(matrix1_norm, matrix2_norm.transpose()))
    return cosine_distance

Could it be that normalization is missing? Is there a good normalization function you would recommend?
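For reference, cosine similarity is just the dot product of two L2-normalized vectors, and the Python function above already divides by both norms; if every pair scores identically, the embeddings themselves are worth inspecting first (e.g. whether the engine really receives different inputs). An equivalent single-pair computation in C++, with a and b as hypothetical feature vectors:

#include <cmath>
#include <vector>

// Cosine similarity of two embeddings: dot product after L2 normalization.
// The small epsilon guards against zero-length vectors.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    float dot = 0.f, norm_a = 0.f, norm_b = 0.f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot    += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b) + 1e-12f);
}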

Windows 10: Linker Error

Hello,

When compiling the program in Visual Studio 2019, I encounter this error:

(screenshot of the linker error attached)

Do you know how to fix this?

Much thanks.

The yolox master-branch model exports to ONNX normally, but building the engine fails

[2021-10-26 14:29:01][info][app_yolo.cpp:121]:===================== test YoloX FP32 yolox_s ==================================
[2021-10-26 14:29:01][info][trt_builder.cpp:471]:Compile FP32 Onnx Model 'yolox_s.onnx'.
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]
[2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: INVALID_ARGUMENT: getPluginCreator could not find plugin ScatterND version 1
While parsing node number 565 [ScatterND]:
ERROR: /home/work/tracking/tensorRT_Pro/src/tensorRT/onnx_parser/builtin_op_importers.cpp:4013 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[2021-10-26 14:29:02][error][trt_builder.cpp:517]:Can not parse OnnX file: yolox_s.onnx
[2021-10-26 14:29:02][error][yolo.cpp:138]:Engine yolox_s.FP32.trtmodel load failed
[2021-10-26 14:29:02][error][app_yolo.cpp:42]:Engine is nullptr
[100%] Built target yolo

Visual Studio reports an error: identifier "not" is undefined

Errors reported for app_yolo.cpp:
E0020 identifier "not" is undefined    pro - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    124
E2830 a parenthesized initializer is not allowed in a condition declaration    pro - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    124
E0020 identifier "not" is undefined    pro - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    131
E0551 the function "iLogger::exists" cannot be defined in the current scope    pro - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    131
E0020 identifier "not" is undefined    trtpyc - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    124
E2830 a parenthesized initializer is not allowed in a condition declaration    trtpyc - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    124
E0020 identifier "not" is undefined    trtpyc - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    131
E0551 the function "iLogger::exists" cannot be defined in the current scope    trtpyc - x64-Debug (default)    D:\Dep_gram\TENSORRT\src\application\app_yolo.cpp    131
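The E0020 errors come from the alternative operator keywords (not, and, or), which MSVC rejects in its default dialect. Compiling with /permissive-, including <ciso646>, or spelling the operators as !, &&, || all resolve it; a minimal illustration, where exists() is a hypothetical stand-in for iLogger::exists:

#include <ciso646>  // lets default-dialect MSVC accept 'not', 'and', 'or'

bool exists(const char* path);  // hypothetical stand-in for iLogger::exists

void demo(const char* model_file) {
    if (not exists(model_file)) {   // compiles once <ciso646> is included
        // ... build the engine
    }
    if (!exists(model_file)) {      // portable spelling, needs no header
        // ... equivalent
    }
}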

While compiling according to the installation wiki, I hit: google/protobuf/port_def.inc: No such file or directory for #include <google/protobuf/port_def.inc>

tensorRT_cpp/src/tensorRT/onnx/onnx-ml.pb.h:11:10: fatal error: google/protobuf/port_def.inc: No such file or directory
#include <google/protobuf/port_def.inc>
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
CMakeFiles/pro.dir/build.make:806: recipe for target 'CMakeFiles/pro.dir/src/tensorRT/onnx/onnx-ml.pb.cpp.o' failed
make[3]: *** [CMakeFiles/pro.dir/src/tensorRT/onnx/onnx-ml.pb.cpp.o] Error 1
make[3]: *** Waiting for unfinished jobs....

A small question about the Win10 setup

In the Win10 steps, where it says (modify Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.0.props" / to your configured CUDA path), could you give a concrete example? Thanks. For instance, is this right: $(VCTargetsPath)\C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\extras\visual_studio_integration\MSBuildExtensions\CUDA 10.2.props" /

How do I add my own yolov5 model? Is replacing static const char* cocolabels[] all that is needed to change the classes?

As the title says: the trained model meets the project requirements when tested in Python, but on the C++ side, after replacing the classes, the detection count is 0. Below are the conversion and inference logs; could someone please take a look?
[2021-09-28 09:11:59][info][app_yolo.cpp:125]:===================== test YoloV5 FP16 yolov5x ==================================
[2021-09-28 09:11:59][info][trt_builder.cpp:471]:Compile FP16 Onnx Model 'yolov5x.onnx'.
[2021-09-28 09:12:00][info][trt_builder.cpp:557]:Input shape is -1 x 3 x 640 x 640
[2021-09-28 09:12:00][info][trt_builder.cpp:558]:Set max batch size = 16
[2021-09-28 09:12:00][info][trt_builder.cpp:559]:Set max workspace size = 1024.00 MB
[2021-09-28 09:12:00][info][trt_builder.cpp:562]:Network has 1 inputs:
[2021-09-28 09:12:00][info][trt_builder.cpp:568]: 0.[images] shape is -1 x 3 x 640 x 640
[2021-09-28 09:12:00][info][trt_builder.cpp:574]:Network has 3 outputs:
[2021-09-28 09:12:00][info][trt_builder.cpp:579]: 0.[output] shape is -1 x 3 x 80 x 80 x 12
[2021-09-28 09:12:00][info][trt_builder.cpp:579]: 1.[798] shape is -1 x 3 x 40 x 40 x 12
[2021-09-28 09:12:00][info][trt_builder.cpp:579]: 2.[818] shape is -1 x 3 x 20 x 20 x 12
[2021-09-28 09:12:00][info][trt_builder.cpp:583]:Network has 710 layers:
[2021-09-28 09:12:00][info][trt_builder.cpp:650]:Building engine...
[2021-09-28 09:12:02][warn][trt_builder.cpp:33]:NVInfer: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1
[2021-09-28 09:12:02][warn][trt_builder.cpp:33]:NVInfer: Detected invalid timing cache, setup a local cache instead
[2021-09-28 09:19:45][warn][trt_builder.cpp:33]:NVInfer: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1
[2021-09-28 09:19:45][info][trt_builder.cpp:670]:Build done 464195 ms !
[2021-09-28 09:21:35][warn][trt_builder.cpp:33]:NVInfer: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1
[2021-09-28 09:21:35][warn][trt_builder.cpp:33]:NVInfer: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1
[2021-09-28 09:21:35][info][trt_infer.cpp:167]:Infer 000002202FA4FAD0 detail
[2021-09-28 09:21:35][info][trt_infer.cpp:168]: Max Batch Size: 16
[2021-09-28 09:21:35][info][trt_infer.cpp:169]: Inputs: 1
[2021-09-28 09:21:35][info][trt_infer.cpp:173]: 0.images : shape {16 x 3 x 640 x 640}
[2021-09-28 09:21:35][info][trt_infer.cpp:176]: Outputs: 3
[2021-09-28 09:21:35][info][trt_infer.cpp:180]: 0.output : shape {16 x 3 x 80 x 80 x 12}
[2021-09-28 09:21:35][info][trt_infer.cpp:180]: 1.798 : shape {16 x 3 x 40 x 40 x 12}
[2021-09-28 09:21:35][info][trt_infer.cpp:180]: 2.818 : shape {16 x 3 x 20 x 20 x 12}
[2021-09-28 09:22:08][info][app_yolo.cpp:77]:yolov5x.FP16.trtmodel[YoloV5] average: 9.57 ms / image, FPS: 104.47
[2021-09-28 09:22:08][info][app_yolo.cpp:103]:Save to yolov5x_YoloV5_FP16_result/1.jpg, 0 object, average time 9.57 ms
[2021-09-28 09:22:08][info][app_yolo.cpp:103]:Save to yolov5x_YoloV5_FP16_result/2.jpg, 0 object, average time 9.57 ms
[2021-09-28 09:22:08][info][app_yolo.cpp:103]:Save to yolov5x_YoloV5_FP16_result/small_F10149_I1611_L1654_T551_W99_H195_R1.jpg, 0 object, average time 9.57 ms
[2021-09-28 09:22:08][info][app_yolo.cpp:103]:Save to yolov5x_YoloV5_FP16_result/small_F5834_I879_L2390_T1147_W198_H413_R1.jpg, 0 object, average time 9.57 ms
[2021-09-28 09:22:08][info][app_yolo.cpp:103]:Save to yolov5x_YoloV5_FP16_result/small_F5930_I888_L1803_T905_W291_H356_R1.jpg, 0 object, average time 9.57 ms
[2021-09-28 09:22:08][info][app_yolo.cpp:103]:Save to yolov5x_YoloV5_FP16_result/small_F7010_I930_L1621_T851_W200_H353_R1.jpg, 0 object, average time 9.57 ms
[2021-09-28 09:22:08][info][yolo.cpp:207]:Engine destroy.

How do I load a recognition network?

Can this framework load classic recognition networks? For example, can the mobilenet model that ships with pytorch be imported into the project directly? Could you provide an example?

no result in int8 mode

Hi, thanks for sharing this awesome code!

I tested this repo on Ubuntu 18 with CUDA V11.2.152, TensorRT-8.0.0.3 and cuDNN 8.2.0.

I tested the yolox_s model in FP32, FP16 and INT8 modes, but in INT8 mode the network output has no results.

FP16 test code is below:

static void test_fp16(Yolo::Type type){

    TRT::set_device(0);
    INFO("===================== test %s fp16 ==================================", Yolo::type_name(type));

    const char* name = nullptr;
    if(type == Yolo::Type::V5){
        name = "yolov5m";
    }else if(type == Yolo::Type::X){
        name = "yolox_s";
    }

    if(not requires(name))
        return;

    string onnx_file = iLogger::format("%s.onnx", name);
    string model_file = iLogger::format("%s.fp16.trtmodel", name);
    int test_batch_size = 1;  // if you need batch size > 1, check whether the model was modified accordingly (see the code-modification part of readme.md), otherwise errors will occur
    
    // For dynamic vs. static batch details, open http://www.zifuture.com:8090/
    // and scan the QR code on the right to join the (free) technical discussion group
    if(not iLogger::exists(model_file)){
        TRT::compile(
            TRT::TRTMode_FP16,   // compile mode: FP32, FP16 or INT8
            {},                         // ignored for onnx; output node names for caffe
            test_batch_size,            // batch size to compile for
            onnx_file,                  // onnx file to compile
            model_file,                 // model file to save
            {},                         // input shapes to redefine; the onnx input shapes can be overridden here
            false                       // whether to use a dynamic batch dimension: true = dynamic, false = static fixed batch size
        );
    }

    forward_engine(model_file, type);
}

Below is the output of the log:

[2021-09-01 19:38:45][info][app_yolo.cpp:240]:===================== test YoloX fp32 ==================================
[2021-09-01 19:38:45][info][trt_builder.cpp:473]:Compile FP32 Onnx Model 'yolox_s.onnx'.
[2021-09-01 19:38:45][info][trt_builder.cpp:602]:Input shape is 1 x 3 x 640 x 640
[2021-09-01 19:38:45][info][trt_builder.cpp:603]:Set max batch size = 1
[2021-09-01 19:38:45][info][trt_builder.cpp:604]:Set max workspace size = 1024.00 MB
[2021-09-01 19:38:45][info][trt_builder.cpp:605]:Dynamic batch dimension is false
[2021-09-01 19:38:45][info][trt_builder.cpp:608]:Network has 1 inputs:
[2021-09-01 19:38:45][info][trt_builder.cpp:614]:      0.[images] shape is 1 x 3 x 640 x 640
[2021-09-01 19:38:45][info][trt_builder.cpp:620]:Network has 1 outputs:
[2021-09-01 19:38:45][info][trt_builder.cpp:625]:      0.[output] shape is 1 x 8400 x 85
[2021-09-01 19:38:45][info][trt_builder.cpp:670]:Building engine...
[2021-09-01 19:38:45][warn][trt_builder.cpp:33]:NVInfer WARNING: Convolution + generic activation fusion is disable due to incompatible driver or nvrtc
[2021-09-01 19:38:46][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:38:46][warn][trt_builder.cpp:33]:NVInfer WARNING: Detected invalid timing cache, setup a local cache instead
[2021-09-01 19:39:18][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:18][info][trt_builder.cpp:690]:Build done 32689 ms !
[2021-09-01 19:39:18][warn][trt_builder.cpp:33]:NVInfer WARNING: The logger passed into createInferRuntime differs from one already assigned, 0x557f9ae330b0, logger not updated.

[2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:19][info][trt_infer.cpp:169]:Infer 0x7fe3f8000c40 detail
[2021-09-01 19:39:19][info][trt_infer.cpp:170]: Max Batch Size: 1
[2021-09-01 19:39:19][info][trt_infer.cpp:171]: Dynamic Batch Dimension: false
[2021-09-01 19:39:19][info][trt_infer.cpp:172]: Inputs: 1
[2021-09-01 19:39:19][info][trt_infer.cpp:176]:         0.images : shape {1 x 3 x 640 x 640}
[2021-09-01 19:39:19][info][trt_infer.cpp:179]: Outputs: 1
[2021-09-01 19:39:19][info][trt_infer.cpp:183]:         0.output : shape {1 x 8400 x 85}
[2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/car.jpg, 2 object, 10.10 ms
[2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/group.jpg, 1 object, 6.19 ms
[2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/zand.jpg, 2 object, 8.72 ms
[2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/zgjr.jpg, 3 object, 6.03 ms
[2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/gril.jpg, 1 object, 5.95 ms
[2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/yq.jpg, 1 object, 5.92 ms
[2021-09-01 19:39:19][info][yolo.cpp:214]:Engine destroy.
[2021-09-01 19:39:19][info][app_yolo.cpp:277]:===================== test YoloX fp16 ==================================
[2021-09-01 19:39:19][info][trt_builder.cpp:473]:Compile FP16 Onnx Model 'yolox_s.onnx'.
[2021-09-01 19:39:19][warn][trt_builder.cpp:483]:Platform not have fast fp16 support
[2021-09-01 19:39:19][info][trt_builder.cpp:602]:Input shape is 1 x 3 x 640 x 640
[2021-09-01 19:39:19][info][trt_builder.cpp:603]:Set max batch size = 1
[2021-09-01 19:39:19][info][trt_builder.cpp:604]:Set max workspace size = 1024.00 MB
[2021-09-01 19:39:19][info][trt_builder.cpp:605]:Dynamic batch dimension is false
[2021-09-01 19:39:19][info][trt_builder.cpp:608]:Network has 1 inputs:
[2021-09-01 19:39:19][info][trt_builder.cpp:614]:      0.[images] shape is 1 x 3 x 640 x 640
[2021-09-01 19:39:19][info][trt_builder.cpp:620]:Network has 1 outputs:
[2021-09-01 19:39:19][info][trt_builder.cpp:625]:      0.[output] shape is 1 x 8400 x 85
[2021-09-01 19:39:19][info][trt_builder.cpp:670]:Building engine...
[2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: Detected invalid timing cache, setup a local cache instead
[2021-09-01 19:39:47][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:47][info][trt_builder.cpp:690]:Build done 28282 ms !
[2021-09-01 19:39:47][warn][trt_builder.cpp:33]:NVInfer WARNING: The logger passed into createInferRuntime differs from one already assigned, 0x557f9ae330b0, logger not updated.

[2021-09-01 19:39:48][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:48][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:48][info][trt_infer.cpp:169]:Infer 0x7fe3f8015650 detail
[2021-09-01 19:39:48][info][trt_infer.cpp:170]: Max Batch Size: 1
[2021-09-01 19:39:48][info][trt_infer.cpp:171]: Dynamic Batch Dimension: false
[2021-09-01 19:39:48][info][trt_infer.cpp:172]: Inputs: 1
[2021-09-01 19:39:48][info][trt_infer.cpp:176]:         0.images : shape {1 x 3 x 640 x 640}
[2021-09-01 19:39:48][info][trt_infer.cpp:179]: Outputs: 1
[2021-09-01 19:39:48][info][trt_infer.cpp:183]:         0.output : shape {1 x 8400 x 85}
[2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/car.jpg, 2 object, 10.75 ms
[2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/group.jpg, 1 object, 6.38 ms
[2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/zand.jpg, 2 object, 9.12 ms
[2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/zgjr.jpg, 3 object, 6.10 ms
[2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/gril.jpg, 1 object, 6.07 ms
[2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/yq.jpg, 1 object, 6.01 ms
[2021-09-01 19:39:48][info][yolo.cpp:214]:Engine destroy.
[2021-09-01 19:39:48][info][app_yolo.cpp:190]:===================== test YoloX int8 ==================================
[2021-09-01 19:39:48][info][trt_builder.cpp:473]:Compile INT8 Onnx Model 'yolox_s.onnx'.
[2021-09-01 19:39:48][info][trt_builder.cpp:593]:Using image list[6 files]: inference
[2021-09-01 19:39:48][info][trt_builder.cpp:602]:Input shape is 1 x 3 x 640 x 640
[2021-09-01 19:39:48][info][trt_builder.cpp:603]:Set max batch size = 1
[2021-09-01 19:39:48][info][trt_builder.cpp:604]:Set max workspace size = 1024.00 MB
[2021-09-01 19:39:48][info][trt_builder.cpp:605]:Dynamic batch dimension is false
[2021-09-01 19:39:48][info][trt_builder.cpp:608]:Network has 1 inputs:
[2021-09-01 19:39:48][info][trt_builder.cpp:614]:      0.[images] shape is 1 x 3 x 640 x 640
[2021-09-01 19:39:48][info][trt_builder.cpp:620]:Network has 1 outputs:
[2021-09-01 19:39:48][info][trt_builder.cpp:625]:      0.[output] shape is 1 x 8400 x 85
[2021-09-01 19:39:48][info][trt_builder.cpp:670]:Building engine...
[2021-09-01 19:39:48][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:49][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:49][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 1 / 6
[2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 2 / 6
[2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 3 / 6
[2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 4 / 6
[2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 5 / 6
[2021-09-01 19:39:50][info][app_yolo.cpp:193]:Int8 6 / 6
[2021-09-01 19:40:27][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:40:27][warn][trt_builder.cpp:33]:NVInfer WARNING: Detected invalid timing cache, setup a local cache instead
[2021-09-01 19:40:59][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:40:59][info][trt_builder.cpp:685]:No set entropyCalibratorFile, and entropyCalibrator will not save.
[2021-09-01 19:40:59][info][trt_builder.cpp:690]:Build done 70917 ms !
[2021-09-01 19:40:59][warn][trt_builder.cpp:33]:NVInfer WARNING: The logger passed into createInferRuntime differs from one already assigned, 0x557f9ae330b0, logger not updated.

[2021-09-01 19:40:59][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:40:59][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[2021-09-01 19:40:59][info][trt_infer.cpp:169]:Infer 0x7fe3b4000c40 detail
[2021-09-01 19:40:59][info][trt_infer.cpp:170]: Max Batch Size: 1
[2021-09-01 19:40:59][info][trt_infer.cpp:171]: Dynamic Batch Dimension: false
[2021-09-01 19:40:59][info][trt_infer.cpp:172]: Inputs: 1
[2021-09-01 19:40:59][info][trt_infer.cpp:176]:         0.images : shape {1 x 3 x 640 x 640}
[2021-09-01 19:40:59][info][trt_infer.cpp:179]: Outputs: 1
[2021-09-01 19:40:59][info][trt_infer.cpp:183]:         0.output : shape {1 x 8400 x 85}
[2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/car.jpg, 0 object, 7.75 ms
[2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/group.jpg, 0 object, 4.04 ms
[2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/zand.jpg, 0 object, 6.04 ms
[2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/zgjr.jpg, 0 object, 3.74 ms
[2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/gril.jpg, 0 object, 3.72 ms
[2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/yq.jpg, 0 object, 3.74 ms
[2021-09-01 19:40:59][info][yolo.cpp:214]:Engine destroy.

The elapsed times in FP16 and FP32 mode are nearly the same; is that because of a problem in my code? (The build log above does warn "Platform not have fast fp16 support", which would explain it.)

Yours sincerely!

bugs

fatal error: NvInfer.h: No such file or directory
    8 | #include "NvInfer.h"

even though the TensorRT path is set correctly.

Error converting the arcface ONNX to TensorRT

I used the arcface_iresnet50.onnx model provided by this project. The error is as follows:
While parsing node number 182 [BatchNormalization]:
ERROR: /home/Project/tensorRT_Pro-main/src/tensorRT/onnx_parser/onnx2trt_utils.cpp:1523 In function scaleHelper:
[8] Assertion failed: dims.nbDims == 4 || dims.nbDims == 5
[2021-10-04 16:39:25][error][trt_builder.cpp:517]:Can not parse OnnX file: arcface_iresnet50_iii.onnx

My environment:

tensorrt 7.2
cudnn 8.1
cuda 11.2
protobuf v3.11.4
gpu 3080 (arch 86)

BatchNormalization_182 is the second-to-last layer of the model.
