
paddlepaddle_backend's Introduction

English | 简体中文

Triton Paddle Backend

Table of Contents

  • Quick Start
  • Examples
  • Performance

Quick Start

Pull Image

docker pull paddlepaddle/triton_paddle:21.10

Note: Only the Triton Inference Server 21.10 image is supported.

Create A Model Repository

The model repository is the directory where you place the models that you want Triton to serve. An example model repository is included in the examples. Before using the repository, you must fetch the models by running the following script.

$ cd examples
$ ./fetch_models.sh
$ cd .. # back to root of paddle_backend
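Once fetched, each model should follow the layout Triton requires: models/<model_name>/config.pbtxt plus at least one numeric version subdirectory containing the model files. Below is a minimal Python sketch to verify that layout; it is an illustration only, and the actual model names depend on what fetch_models.sh downloads.

from pathlib import Path

# Check that each model directory has a config.pbtxt and at least one
# numeric version subdirectory, as Triton expects.
repo = Path("examples/models")
for model_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
    has_config = (model_dir / "config.pbtxt").exists()
    versions = [d.name for d in model_dir.iterdir() if d.is_dir() and d.name.isdigit()]
    print(f"{model_dir.name}: config.pbtxt={has_config}, versions={versions}")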

Launch Triton Inference Server

  1. Launch the image
$ docker run --gpus=all --rm -it --name triton_server --net=host -e CUDA_VISIBLE_DEVICES=0 \
           -v `pwd`/examples/models:/workspace/models \
           paddlepaddle/triton_paddle:21.10 /bin/bash
  2. Launch the Triton Inference Server
/opt/tritonserver/bin/tritonserver --model-repository=/workspace/models

Note: run /opt/tritonserver/bin/tritonserver --help to see all available parameters.

Verify Triton Is Running Correctly

Use Triton's ready endpoint to verify that the server and the models are ready for inference. From the host system, use curl to access the HTTP endpoint that reports server status.

$ curl -v localhost:8000/v2/health/ready
...
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain

The HTTP request returns status 200 if Triton is ready and non-200 if it is not ready.
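The same check can be scripted with Triton's Python client library (pip install tritonclient[http]); a minimal sketch, assuming the server runs on localhost and using a placeholder model name:

import tritonclient.http as httpclient

# Hits the same health endpoints as the curl command above.
client = httpclient.InferenceServerClient(url="localhost:8000")
print("server ready:", client.is_server_ready())
# "ERNIE" is a placeholder; use a model name from your repository.
print("model ready:", client.is_model_ready("ERNIE"))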

Examples

Before running the examples, please make sure the Triton server is running correctly.

Change working directory to examples

$ cd examples

ERNIE Base

ERNIE-2.0 is a pre-training framework for language understanding.

Steps to run the benchmark on ERNIE

$ bash perf_ernie.sh

ResNet50 v1.5

ResNet50-v1.5 is a modified version of the original ResNet50 v1 model.

Steps to run the benchmark on ResNet50-v1.5

$ bash perf_resnet50_v1.5.sh

Steps to run inference on ResNet50-v1.5

  1. Prepare preprocessed images following DeepLearningExamples and place the imagenet folder under the examples directory.

  2. Run the inference

$ bash infer_resnet_v1.5.sh imagenet/<id>
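The script drives the provided client; for a programmatic client, inference can be sketched with tritonclient as below. The model name, tensor names, and input shape are placeholders, not the script's actual values; query localhost:8000/v2/models/<model> for the real metadata.

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder tensor/model names and shape; read the real ones from the
# model metadata endpoint before using this.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)
requested = httpclient.InferRequestedOutput("output")

result = client.infer("resnet50_v1.5", inputs=[infer_input], outputs=[requested])
print(result.as_numpy("output").shape)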

Performance

ERNIE Base (T4)

Precision | Backend Accelerator | Client Batch Size | Sequences/second | P90 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) | Avg Latency (ms)
FP16      | TensorRT            | 1                 | 270.0            | 3.813            | 3.846            | 4.007            | 3.692
FP16      | TensorRT            | 2                 | 500.4            | 4.282            | 4.332            | 4.709            | 3.980
FP16      | TensorRT            | 4                 | 831.2            | 5.141            | 5.242            | 5.569            | 4.797
FP16      | TensorRT            | 8                 | 1128.0           | 7.788            | 7.949            | 8.255            | 7.089
FP16      | TensorRT            | 16                | 1363.2           | 12.702           | 12.993           | 13.507           | 11.738
FP16      | TensorRT            | 32                | 1529.6           | 22.495           | 22.817           | 24.634           | 20.901

ResNet50 v1.5 (V100-SXM2-16G)

Precision | Backend Accelerator | Client Batch Size | Sequences/second | P90 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) | Avg Latency (ms)
FP16      | TensorRT            | 1                 | 288.8            | 3.494            | 3.524            | 3.608            | 3.462
FP16      | TensorRT            | 2                 | 494.0            | 4.083            | 4.110            | 4.208            | 4.047
FP16      | TensorRT            | 4                 | 758.4            | 5.327            | 5.359            | 5.460            | 5.273
FP16      | TensorRT            | 8                 | 1044.8           | 7.728            | 7.770            | 7.949            | 7.658
FP16      | TensorRT            | 16                | 1267.2           | 12.742           | 12.810           | 13.883           | 12.647
FP16      | TensorRT            | 32                | 1113.6           | 28.840           | 29.044           | 30.357           | 28.641
FP16      | TensorRT            | 64                | 1100.8           | 58.512           | 58.642           | 59.967           | 58.251
FP16      | TensorRT            | 128               | 1049.6           | 121.371          | 121.834          | 123.371          | 119.991

ResNet50 v1.5 (T4)

Precision | Backend Accelerator | Client Batch Size | Sequences/second | P90 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) | Avg Latency (ms)
FP16      | TensorRT            | 1                 | 291.8            | 3.471            | 3.489            | 3.531            | 3.427
FP16      | TensorRT            | 2                 | 466.0            | 4.323            | 4.336            | 4.382            | 4.288
FP16      | TensorRT            | 4                 | 665.6            | 6.031            | 6.071            | 6.142            | 6.011
FP16      | TensorRT            | 8                 | 833.6            | 9.662            | 9.684            | 9.767            | 9.609
FP16      | TensorRT            | 16                | 899.2            | 18.061           | 18.208           | 18.899           | 17.748
FP16      | TensorRT            | 32                | 761.6            | 42.333           | 43.456           | 44.167           | 41.740
FP16      | TensorRT            | 64                | 793.6            | 79.860           | 80.410           | 80.807           | 79.680
FP16      | TensorRT            | 128               | 793.6            | 158.207          | 158.278          | 158.643          | 157.543
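As a sanity check on these numbers: with a single client issuing one request at a time, Little's law implies avg latency ≈ client batch size / throughput. A small sketch over a few rows of the ERNIE Base (T4) table above shows the reported figures are self-consistent:

# Little's law: with one in-flight request, avg latency ~= batch / throughput.
# (client batch size, sequences/second, reported avg latency in ms) from ERNIE Base (T4).
rows = [(1, 270.0, 3.692), (8, 1128.0, 7.089), (32, 1529.6, 20.901)]
for batch, throughput, reported_ms in rows:
    implied_ms = batch / throughput * 1000.0
    print(f"batch {batch:>2}: implied {implied_ms:.3f} ms, reported {reported_ms:.3f} ms")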

paddlepaddle_backend's People

Contributors

heliqi, jiweibo, zlsh80826


paddlepaddle_backend's Issues

Error when building backend with paddle 2.3


=============================
== Triton Inference Server ==
=============================

NVIDIA Release 22.08 (build 42766143)
Triton Server Version 2.25.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

Get:1 https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64  InRelease [1581 B]
Get:2 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
Get:4 https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64  Packages [705 kB]
Get:5 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [27.5 kB]
Get:6 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2183 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:8 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [916 kB]
Get:9 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1556 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1214 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2650 kB]
Get:17 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.2 kB]
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1671 kB]
Get:19 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [27.4 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [55.2 kB]
Fetched 24.5 MB in 13s (1862 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  rapidjson-dev
0 upgraded, 1 newly installed, 0 to remove and 43 not upgraded.
Need to get 95.0 kB of archives.
After this operation, 636 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal/universe amd64 rapidjson-dev all 1.1.0+dfsg2-5ubuntu1 [95.0 kB]
Fetched 95.0 kB in 2s (54.9 kB/s)
Selecting previously unselected package rapidjson-dev.
(Reading database ... 24861 files and directories currently installed.)
Preparing to unpack .../rapidjson-dev_1.1.0+dfsg2-5ubuntu1_all.deb ...
Unpacking rapidjson-dev (1.1.0+dfsg2-5ubuntu1) ...
Setting up rapidjson-dev (1.1.0+dfsg2-5ubuntu1) ...
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- RapidJSON found. Headers: /usr/include
-- RapidJSON found. Headers: /usr/include
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.7.99")
-- Found CUDA: /usr/local/cuda (found version "11.7")
-- Using CUDA 11.7
-- Configuring done
-- Generating done
-- Build files have been written to: /workspace/paddle_backend/build
Scanning dependencies of target triton-common-async-work-queue
[  3%] Building NVCC (Device) object _deps/repo-backend-build/CMakeFiles/kernel-library-new.dir/src/kernel-library-new_generated_kernel.cu.o
Scanning dependencies of target triton-common-table-printer
Scanning dependencies of target triton-common-thread-pool
Scanning dependencies of target triton-common-error
Scanning dependencies of target triton-core-serverstub
Scanning dependencies of target triton-common-logging
[ 11%] Building CXX object _deps/repo-common-build/CMakeFiles/triton-common-async-work-queue.dir/src/async_work_queue.cc.o
[ 11%] Building CXX object _deps/repo-common-build/CMakeFiles/triton-common-async-work-queue.dir/src/error.cc.o
[ 15%] Building CXX object _deps/repo-common-build/CMakeFiles/triton-common-thread-pool.dir/src/thread_pool.cc.o
[ 19%] Building CXX object _deps/repo-common-build/CMakeFiles/triton-common-table-printer.dir/src/table_printer.cc.o
[ 23%] Building CXX object _deps/repo-common-build/CMakeFiles/triton-common-error.dir/src/error.cc.o
[ 26%] Building CXX object _deps/repo-core-build/CMakeFiles/triton-core-serverstub.dir/src/tritonserver_stub.cc.o
[ 30%] Building CXX object _deps/repo-common-build/CMakeFiles/triton-common-logging.dir/src/logging.cc.o
[ 34%] Linking CXX shared library libtritonserver.so
[ 34%] Built target triton-core-serverstub
[ 38%] Building CXX object _deps/repo-common-build/CMakeFiles/triton-common-async-work-queue.dir/src/thread_pool.cc.o
[ 42%] Linking CXX static library libtritoncommonerror.a
[ 42%] Built target triton-common-error
[ 46%] Linking CXX static library libtritoncommonlogging.a
[ 50%] Linking CXX static library libtritonthreadpool.a
[ 50%] Built target triton-common-logging
[ 50%] Built target triton-common-thread-pool
[ 53%] Linking CXX static library libtritonasyncworkqueue.a
[ 53%] Built target triton-common-async-work-queue
[ 57%] Linking CXX static library libtritontableprinter.a
[ 57%] Built target triton-common-table-printer
Scanning dependencies of target kernel-library-new
[ 61%] Linking CXX static library libkernel-library-new.a
[ 61%] Built target kernel-library-new
Scanning dependencies of target triton-backend-utils
[ 65%] Building CXX object _deps/repo-backend-build/CMakeFiles/triton-backend-utils.dir/src/backend_input_collector.cc.o
[ 69%] Building CXX object _deps/repo-backend-build/CMakeFiles/triton-backend-utils.dir/src/backend_common.cc.o
[ 73%] Building CXX object _deps/repo-backend-build/CMakeFiles/triton-backend-utils.dir/src/backend_output_responder.cc.o
[ 80%] Building CXX object _deps/repo-backend-build/CMakeFiles/triton-backend-utils.dir/src/backend_memory.cc.o
[ 80%] Building CXX object _deps/repo-backend-build/CMakeFiles/triton-backend-utils.dir/src/backend_model.cc.o
[ 84%] Building CXX object _deps/repo-backend-build/CMakeFiles/triton-backend-utils.dir/src/backend_model_instance.cc.o
[ 88%] Linking CXX static library libtritonbackendutils.a
[ 88%] Built target triton-backend-utils
Scanning dependencies of target triton-paddle-backend
[ 92%] Building CXX object CMakeFiles/triton-paddle-backend.dir/src/paddle_backend_utils.cc.o
[ 96%] Building CXX object CMakeFiles/triton-paddle-backend.dir/src/paddle.cc.o
/workspace/paddle_backend/src/paddle.cc: In member function 'void ModelImpl::CollectShapeRun(paddle_infer::Predictor*, const std::map<std::__cxx11::basic_string<char>, std::vector<int> >&)':
/workspace/paddle_backend/src/paddle.cc:78:32: error: 'class paddle_infer::Predictor' has no member named 'GetInputTypes'; did you mean 'GetInputNames'?
   78 |   auto input_type = predictor->GetInputTypes();
      |                                ^~~~~~~~~~~~~
      |                                GetInputNames
/workspace/paddle_backend/src/paddle.cc: In constructor 'ModelImpl::ModelImpl(const char*, const char*, TRITONPADDLE_Config*, int32_t, cudaStream_t)':
/workspace/paddle_backend/src/paddle.cc:177:23: error: 'struct paddle::AnalysisConfig' has no member named 'SetExecStream'
  177 |     analysis_config_->SetExecStream((void*)stream);
      |                       ^~~~~~~~~~~~~
/workspace/paddle_backend/src/paddle.cc:198:27: error: 'struct paddle::AnalysisConfig' has no member named 'EnableVarseqlen'
  198 |         analysis_config_->EnableVarseqlen();
      |                           ^~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/triton-paddle-backend.dir/build.make:82: CMakeFiles/triton-paddle-backend.dir/src/paddle.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:170: CMakeFiles/triton-paddle-backend.dir/all] Error 2
make: *** [Makefile:149: all] Error 2

It looks like the compiled Paddle inference lib does not have these member functions. I found that build_paddle.sh builds against Paddle release/2.3:

RUN git pull && git checkout release/2.3

which is weird, because GetInputTypes was implemented in release/2.4, not in release/2.3

https://github.com/PaddlePaddle/Paddle/blob/release/2.4/paddle/fluid/inference/api/paddle_api.h
https://github.com/PaddlePaddle/Paddle/blob/release/2.3/paddle/fluid/inference/api/paddle_api.h

Since Paddle 2.4 has not been released, I do not know whether compiling the backend against a dev version is recommended. @heliqi could you help me deal with this?

Structs are declared but never defined

Why do struct TRITONPADDLE_Tensor and struct TRITONPADDLE_Model in paddle.cc only have declarations but no definitions?

Integrate the paddle backend to the original triton server

Is there any documentation or way that I can configure this Paddle backend and use it in the Triton server together with the original backends - PyTorch, TensorFlow, ONNX?

Full description below of how I tried to add the Paddle backend to tritonserver:21.02-py3.

The Dockerfile is as below:
FROM nvcr.io/nvidia/tritonserver:21.02-py3 as full
COPY ./paddlepaddle_backend/paddle-lib/paddle /opt/tritonserver/backends/paddle

docker build -t tritonserver_custom .

After the build is done, I run:
docker run --gpus all --rm -p8000:8000 -p8001:8001 -p8002:8002 -v ./model-repo:/model tritonserver_custom tritonserver --model-repository=/models --model-control-mode=POLL

I get this error:
I0722 06:20:09.185171 1 server.cc:217] No server context available. Exiting immediately.
error: creating server: Internal - failed to stat file /models

Any solution for this?

Support for ARM architecture compilation in paddlepaddle_triton project

Dear project maintainers,

I hope this message finds you well. I wanted to inquire about the possibility of adding support for ARM architecture in the paddlepaddle_triton project. Currently, it seems that the project only provides images compiled for x86 architecture.

Given the increasing popularity and usage of ARM-based systems, it would be highly beneficial to have ARM architecture support in the paddlepaddle_triton project. This would enable users with ARM-based platforms to utilize paddlepaddle_triton's capabilities for their inference requirements.

I would like to kindly request if it would be possible for the project to provide official images specifically compiled for ARM architecture. Having official ARM images would streamline the deployment process and ensure compatibility for users on ARM-based platforms.

I appreciate your attention to this request and would be grateful if you could provide any insights or plans regarding the availability of ARM architecture support and official ARM images in the paddlepaddle_triton project.

Thank you for your contributions to the open-source community.

Best regards,
yifan weng

Can not use disenable_trt_tune option

I want to use TRT and set the disenable_trt_tune option to True, but I get the exception below:

unknown parameter 'disenable_trt_tune` is provided for GPU execution accelerator config. Available choices are [precision, min_graph_size, workspace_size, max_batch_size, enable_tensorrt_oss, is_dynamic]

But I saw that this option is available here.


This is the base Docker image and the config.pbtxt for execution accelerators with TensorRT:

  1. Docker image

https://hub.docker.com/r/paddlepaddle/triton_paddle

  2. config.pbtxt
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision" value: "trt_fp16" }
        parameters { key: "min_graph_size" value: "30" }
        parameters { key: "max_batch_size" value: "32" }
        parameters { key: "workspace_size" value: "1073741824" }
        parameters { key: "is_dynamic" value: "1" }
        parameters { key: "disenable_trt_tune" value: "1" }
      }
    ]
  }
}

Please help me ASAP, thank you!!! 🥺️🥺️🥺️

Any plan to update to latest triton version? (23.07)

So far the latest publicly available Triton Inference Server image with the Paddle backend is paddlepaddle/triton_paddle:21.10, and there have been lots of bug fixes since then. I'm experiencing an increasing number of bugs related to the older version of Triton. Can the Paddle org provide an official update to the Triton image?

Or is there any way we could install the Paddle backend into the latest Triton Inference Server image ourselves?

error when building paddle backend

/workspace/paddle_backend/src/paddle.cc:78:32: error: 'class paddle_infer::Predictor' has no member named 'GetInputTypes'; did you mean 'GetInputNames'?
78 | auto input_type = predictor->GetInputTypes();
| ^~~~~~~~~~~~~
| GetInputNames
/workspace/paddle_backend/src/paddle.cc: In constructor 'ModelImpl::ModelImpl(const char*, const char*, TRITONPADDLE_Config*, int32_t, cudaStream_t)':
/workspace/paddle_backend/src/paddle.cc:177:23: error: 'struct paddle::AnalysisConfig' has no member named 'SetExecStream'
177 | analysis_config_->SetExecStream((void*)stream);
| ^~~~~~~~~~~~~
/workspace/paddle_backend/src/paddle.cc:198:27: error: 'struct paddle::AnalysisConfig' has no member named 'EnableVarseqlen'
198 | analysis_config_->EnableVarseqlen();
| ^~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/triton-paddle-backend.dir/build.make:82: CMakeFiles/triton-paddle-backend.dir/src/paddle.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:166: CMakeFiles/triton-paddle-backend.dir/all] Error 2
make: *** [Makefile:149: all] Error 2

Can anyone give me an idea how to fix this?

What's the best way to deploy PaddleOCR-v3 using triton server?

Hey team,

I got a question on how to deploy PaddleOCR using triton.

PaddleOCR consists of three models (detection, orientation classification, and recognition), and some processing steps exist between those models. I was wondering if we should wrap all three models into one "meta-model" and serve the "meta-model" in Triton. I was thinking about doing something like below:

from paddleocr import PaddleOCR
from paddle.static import InputSpec
from paddle.jit import to_static
import paddle.nn as nn

class PaddleTritonModel(nn.Layer):
    def __init__(self):
        super(PaddleTritonModel, self).__init__()
        ocr = PaddleOCR(use_angle_cls=True, lang="en")
        self.ocr = ocr

    @to_static(input_spec=[InputSpec(shape=[None, None, 3], name="x")])
    def forward(self, img):
        result = self.ocr.ocr(img, cls=True)
        return result

Or should we serve three models separately? But given the complexity of the predict_system.py script provided by the PaddleOCR team, I feel we can reuse their inference script to connect those three models together. Using these three models separately sounds like a rabbit hole. Any suggestions are very welcome!!
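For what it's worth, serving the three models separately doesn't have to mean rewriting predict_system.py from scratch; the glue can live in a thin client that calls each model in turn. A hypothetical sketch (all model and tensor names are made up, and the PaddleOCR pre/post-processing between stages is elided):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def run(model, input_name, array, output_name):
    # Generic single-input/single-output call; names are placeholders.
    inp = httpclient.InferInput(input_name, list(array.shape), "FP32")
    inp.set_data_from_numpy(array)
    out = httpclient.InferRequestedOutput(output_name)
    return client.infer(model, inputs=[inp], outputs=[out]).as_numpy(output_name)

image = np.random.rand(1, 3, 640, 640).astype(np.float32)
dets = run("ocr_det", "x", image, "dets")      # 1. detect text boxes
# crops = ...  PaddleOCR-style crop/rotate of each detected box goes here
# cls  = run("ocr_cls", "x", crops, "cls")     # 2. orientation classification
# text = run("ocr_rec", "x", crops, "rec")     # 3. recognition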

ERROR in bash perf_ernie.sh, SUCCESS in bash perf_resnet50_v1.5.sh

When I run bash perf_ernie.sh, the server outputs the following:

E1003 14:03:23.654152    91 helper.h:114] 3: [executionContext.cpp::setOptimizationProfileInternal::755] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setOptimizationProfileInternal::755, condition: profileIndex >= 0 && profileIndex < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654186    91 helper.h:114] 3: [executionContext.cpp::setBindingDimensions::926] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::926, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654201    91 helper.h:114] 3: [executionContext.cpp::setBindingDimensions::926] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::926, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654212    91 helper.h:114] 3: [executionContext.cpp::setBindingDimensions::926] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::926, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654222    91 helper.h:114] 3: [executionContext.cpp::setBindingDimensions::926] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::926, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654234    91 helper.h:114] 3: [executionContext.cpp::setBindingDimensions::926] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::926, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654249    91 helper.h:114] 3: [executionContext.cpp::getBindingDimensions::978] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::getBindingDimensions::978, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654286    91 helper.h:114] 3: [executionContext.cpp::enqueueInternal::318] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::318, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
Signal (11) received.
 0# 0x0000562C43FD3549 in /opt/tritonserver/bin/tritonserver
 1# 0x00007F052C55D0C0 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x00007F05204E3928 in /opt/tritonserver/backends/paddle/libtriton_paddle.so
 3# 0x00007F0520495579 in /opt/tritonserver/backends/paddle/libtriton_paddle.so
 4# 0x00007F0520496775 in /opt/tritonserver/backends/paddle/libtriton_paddle.so
 5# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/paddle/libtriton_paddle.so
 6# 0x00007F052D10A07A in /opt/tritonserver/bin/../lib/libtritonserver.so
 7# 0x00007F052D10A797 in /opt/tritonserver/bin/../lib/libtritonserver.so
 8# 0x00007F052CF9D221 in /opt/tritonserver/bin/../lib/libtritonserver.so
 9# 0x00007F052D104607 in /opt/tritonserver/bin/../lib/libtritonserver.so
10# 0x00007F052C94EDE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
11# 0x00007F052CDCB609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
12# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

But perf_resnet50_v1.5.sh runs successfully.

Is it possible to fix this issue?

Paddle TensorRT configuration error

Without TensorRT, using the config file below, inference works normally:
name: "test"
backend: "paddle"
input [
{
name: "input"
data_type: TYPE_FP32
dims: [ 3, 896, 896 ]
}
]
output [
{
name: "conv2d_59.tmp_1"
data_type: TYPE_FP32
dims: [ 3, 896, 896 ]
}
]
instance_group [
{
count: 1
kind: KIND_GPU
}
]
dynamic_batching {
preferred_batch_size: [ 2, 4 ]
max_queue_delay_microseconds: 0
}
With TensorRT inference configured, startup fails. The config file and error are below:
name: "test"
backend: "paddle"
input [
{
name: "input"
data_type: TYPE_FP32
dims: [ 3, 896, 896 ]
}
]

output [
{
name: "conv2d_59.tmp_1"
data_type: TYPE_FP32
dims: [ 3, 896, 896 ]
}
]
instance_group [
{
count: 1
kind: KIND_GPU
}
]
dynamic_batching {
preferred_batch_size: [ 2, 4 ]
max_queue_delay_microseconds: 0
}
optimization {
execution_accelerators {
gpu_execution_accelerator : [
{
name : "tensorrt"
parameters { key: "precision" value: "trt_fp16" }
parameters { key: "min_graph_size" value: "4" }
parameters { key: "workspace_size" value: "1073741824" }
parameters { key: "enable_tensorrt_oss" value: "0" }
parameters { key: "is_dynamic" value: "1" }
},
{
name : "min_shape"
parameters { key: "input" value: "1 3 896 896" }
},
{
name : "max_shape"
parameters { key: "input" value: "2 3 896 896" }
},
{
name : "opt_shape"
parameters { key: "input" value: "1 3 896 896" }
}
]
}
}
Error message:
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0306 08:51:06.968530 2126 analysis_config.cc:1336] In CollectShapeInfo mode, we will disable optimizations and collect the shape information of all intermediate tensors in the compute graph and calculate the min_shape, max_shape and opt_shape.
Segmentation fault (core dumped)

Testing the ERNIE model from the examples hits the same error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0306 08:51:06.968530 2126 analysis_config.cc:1336] In CollectShapeInfo mode, we will disable optimizations and collect the shape information of all intermediate tensors in the compute graph and calculate the min_shape, max_shape and opt_shape.
Segmentation fault (core dumped)

What could be the cause? @ZeyuChen @jeng1220

bash build_paddle.sh fails

Hey team,

I'm trying to build on macOS Monterey 12.5, but it fails with this message:

2261.4 Scanning dependencies of target strided_slice_grad_kernel_base
2261.5 [ 13%] Building CXX object paddle/phi/kernels/CMakeFiles/strided_slice_grad_kernel_base.dir/cpu/strided_slice_grad_kernel.cc.o
2271.4 [ 13%] Linking CXX static library libtemporal_shift_grad_kernel.a
2271.5 [ 13%] Built target temporal_shift_grad_kernel
2271.5 [ 13%] Building CUDA object paddle/phi/kernels/CMakeFiles/strided_slice_grad_kernel_base.dir/gpu/strided_slice_grad_kernel.cu.o
2292.2 /tmp/nvcc-lazy-build.KEVGwZ3b/build.sh.pre: line 18: 56852 Killed cicc --c++14 --gnu_version=90300 -w --orig_src_file_name "/opt/tritonserver/Paddle/paddle/phi/kernels/gpu/strided_slice_grad_kernel.cu" --allow_managed --extended-lambda --relaxed_constexpr -arch compute_86 -m64 --no-version-ident -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "strided_slice_grad_kernel.fatbin.c" -tused --gen_module_id_file --module_id_file_name "/tmp/nvcc-lazy-build.KEVGwZ3b/strided_slice_grad_kernel.module_id" --gen_c_file_name "/tmp/nvcc-lazy-build.KEVGwZ3b/strided_slice_grad_kernel.compute_86.cudafe1.c" --stub_file_name "/tmp/nvcc-lazy-build.KEVGwZ3b/strided_slice_grad_kernel.compute_86.cudafe1.stub.c" --gen_device_file_name "/tmp/nvcc-lazy-build.KEVGwZ3b/strided_slice_grad_kernel.compute_86.cudafe1.gpu" "/tmp/nvcc-lazy-build.KEVGwZ3b/strided_slice_grad_kernel.compute_86.cpp1.ii" -o "/tmp/nvcc-lazy-build.KEVGwZ3b/strided_slice_grad_kernel.compute_86.ptx"
2292.2 make[2]: *** [paddle/phi/kernels/CMakeFiles/strided_slice_grad_kernel_base.dir/build.make:76: paddle/phi/kernels/CMakeFiles/strided_slice_grad_kernel_base.dir/gpu/strided_slice_grad_kernel.cu.o] Error 137
2292.2 make[2]: *** Waiting for unfinished jobs....
2293.2 Scanning dependencies of target stack_grad_kernel
2293.2 [ 13%] Building CXX object paddle/phi/kernels/CMakeFiles/stack_grad_kernel.dir/cpu/stack_grad_kernel.cc.o
2294.5 [ 13%] Building CUDA object paddle/phi/kernels/CMakeFiles/strided_slice_kernel_base.dir/gpu/strided_slice_kernel.cu.o
2301.4 make[1]: *** [CMakeFiles/Makefile2:16722: paddle/phi/kernels/CMakeFiles/strided_slice_grad_kernel_base.dir/all] Error 2
2301.4 make[1]: *** Waiting for unfinished jobs....
2301.4 [ 13%] Building CUDA object paddle/phi/kernels/CMakeFiles/stack_grad_kernel.dir/gpu/stack_grad_kernel.cu.o
2319.8 [ 13%] Linking CXX static library libtile_kernel.a
2319.9 [ 13%] Built target tile_kernel
2320.0 [ 13%] Linking CXX static library libtop_k_grad_kernel.a
2320.1 [ 13%] Built target top_k_grad_kernel
2339.7 [ 13%] Linking CXX static library libdropout_grad_kernel.a
2339.8 [ 13%] Built target dropout_grad_kernel
2344.6 [ 13%] Linking CXX static library libstack_grad_kernel.a
2344.6 [ 13%] Built target stack_grad_kernel
2394.0 [ 13%] Linking CXX static library libtrace_kernel.a
2394.1 [ 13%] Built target trace_kernel
2448.2 [ 13%] Linking CXX static library libstrided_slice_kernel_base.a
2448.3 [ 13%] Built target strided_slice_kernel_base
2479.7 [ 13%] Linking CXX static library libtranspose_kernel.a
2479.8 [ 13%] Built target transpose_kernel
2479.9 make: *** [Makefile:130: all] Error 2

executor failed running [/bin/sh -c python3 -m pip install pyyaml && mkdir build-env && cd build-env && cmake .. -DWITH_PYTHON=OFF -DWITH_GPU=ON -DWITH_TESTING=OFF -DWITH_INFERENCE_API_TEST=OFF -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH_NAME=Auto -DON_INFER=ON -DWITH_MKL=ON -DWITH_TENSORRT=ON -DWITH_ONNXRUNTIME=ON -DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` && make -j`nproc`]: exit code: 2

It seems to me the "cmake" command causes the problem; any idea how to fix this?

InvalidArgumentError: The tensor Input (Input) of Slice op is not initialized.

When running inference with the RE model from PaddleOCR, the error below is reported. What is the cause?
Model path: https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_infer.tar
Error log:
InvalidArgumentError: The tensor Input (Input) of Slice op is not initialized.
[Hint: Expected in_tensor.IsInitialized() == true, but received in_tensor.IsInitialized():0 != true:1.] (at /opt/tritonserver/Paddle/paddle/fluid/operators/slice_op.cc:147)

Build Paddle report errors

root@nvidia-B360M-D2V:/opt/tritonserver/backends/paddlepaddle_backend-main/paddle-lib# bash build_paddle.sh

  • docker build -t paddle-build .
    [+] Building 0.7s (3/3) FINISHED
    => [internal] load .dockerignore 0.2s
    => => transferring context: 2B 0.0s
    => [internal] load build definition from Dockerfile 0.3s
    => => transferring dockerfile: 2.72kB 0.0s
    => ERROR [internal] load metadata for nvcr.io/nvidia/tritonserver:21.10-py3 0.4s

[internal] load metadata for nvcr.io/nvidia/tritonserver:21.10-py3:


Dockerfile:27

25 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
26 |
27 | >>> FROM nvcr.io/nvidia/tritonserver:21.10-py3
28 |
29 | ENV DEBIAN_FRONTEND=noninteractive

ERROR: failed to solve: failed to fetch anonymous token: unexpected status: 401 Unauthorized

Errors when compiling with CUDA 11.6

When we build PaddlePaddle v2.2.2 with CUDA 11.6, it always errors when we turn on TensorRT. Could you give us some suggestions to resolve it? Thank you very much!

In file included from /usr/local/cuda/targets/x86_64-linux/include/thrust/system/cuda/detail/util.h:36,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/system/cuda/detail/internal/copy_cross_system.h:41,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/system/cuda/detail/copy.h:100,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/system/detail/adl/copy.h:42,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/copy.inl:22,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/copy.h:90,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/allocator/copy_construct_range.inl:21,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/allocator/copy_construct_range.h:45,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/contiguous_storage.inl:23,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/contiguous_storage.h:234,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/vector_base.h:30,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/device_vector.h:26,
                 from /workspace/Paddle/paddle/fluid/inference/tensorrt/plugin/gather_nd_op_plugin.h:17,
                 from /workspace/Paddle/paddle/fluid/inference/tensorrt/convert/gather_nd_op.cc:16:
/usr/local/cuda/targets/x86_64-linux/include/cub/detail/device_synchronize.cuh:33: error: ignoring #pragma nv_exec_check_disable  [-Werror=unknown-pragmas]
 #pragma nv_exec_check_disable
 
In file included from /usr/local/cuda/targets/x86_64-linux/include/thrust/system/cuda/detail/util.h:36,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/system/cuda/detail/internal/copy_cross_system.h:41,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/system/cuda/detail/copy.h:100,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/system/detail/adl/copy.h:42,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/copy.inl:22,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/copy.h:90,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/allocator/copy_construct_range.inl:21,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/allocator/copy_construct_range.h:45,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/contiguous_storage.inl:23,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/contiguous_storage.h:234,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/vector_base.h:30,
                 from /usr/local/cuda/targets/x86_64-linux/include/thrust/device_vector.h:26,
                 from /workspace/Paddle/paddle/fluid/inference/tensorrt/plugin/split_op_plugin.h:17,
                 from /workspace/Paddle/paddle/fluid/inference/tensorrt/convert/split_op.cc:16:
/usr/local/cuda/targets/x86_64-linux/include/cub/detail/device_synchronize.cuh:33: error: ignoring #pragma nv_exec_check_disable  [-Werror=unknown-pragmas]
 #pragma nv_exec_check_disable

Environment:

CUDA 11.6

This is the compilation command:

cmake .. -DWITH_PYTHON=OFF \
             -DWITH_GPU=ON \
             -DWITH_TESTING=OFF \
             -DWITH_INFERENCE_API_TEST=OFF \
             -DCMAKE_BUILD_TYPE=Release \
             -DCUDA_ARCH_NAME=Auto \
             -DON_INFER=ON \
             -DWITH_MKL=OFF \
             -DWITH_TENSORRT=ON \
             -DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` \ 
&& make TARGET=SKYLAKEX -j`nproc`

@jiweibo

perf_analyzer paddlepaddle model fault

I compiled the Paddle backend using version 21.04, successfully generated the libtriton_paddle.so required by Triton, and successfully loaded the Paddle model, but I got an error when I stress-tested the model using perf_analyzer:

# ./perf_analyzer -a -b 30 -u localhost:8001 -i gRPC -m rec --concurrency-range 1 --shape x:3,32,32
*** Measurement Settings ***
  Batch size: 30
  Measurement window: 5000 msec
  Using asynchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
Failed to retrieve results from inference request.
Thread [0] had error: Failed to create output tensor 'save_infer_model/scale_0.tmp_1 for 'rec_0'

Triton output:

I0629 06:45:58.323692 307 paddle.cc:1207] model cls, instance cls_0, executing 1 requests
I0629 06:45:58.323710 307 paddle.cc:846] TRITONBACKEND_ModelExecute: Running cls_0 with 1 requests
I0629 06:45:58.337416 307 paddle.cc:1026] TRITONBACKEND_ModelExecute: model cls_0 released 1 requests
I0629 06:46:04.318618 307 http_server.cc:1229] HTTP request: 0 /v2/models/cls/stats
I0629 06:46:04.318654 307 model_repository_manager.cc:615] VersionStates() 'cls'
I0629 06:46:04.318665 307 model_repository_manager.cc:659] GetInferenceBackend() 'cls' version 1
I0629 06:47:58.797566 307 grpc_server.cc:270] Process for ModelMetadata, rpc_ok=1, 1 step START
I0629 06:47:58.797597 307 grpc_server.cc:225] Ready for RPC 'ModelMetadata', 2
I0629 06:47:58.797607 307 model_repository_manager.cc:659] GetInferenceBackend() 'rec' version -1
I0629 06:47:58.797617 307 model_repository_manager.cc:615] VersionStates() 'rec'
I0629 06:47:58.797691 307 grpc_server.cc:270] Process for ModelMetadata, rpc_ok=1, 1 step COMPLETE
I0629 06:47:58.797699 307 grpc_server.cc:408] Done for ModelMetadata, 1
I0629 06:47:58.798961 307 grpc_server.cc:270] Process for ModelConfig, rpc_ok=1, 1 step START
I0629 06:47:58.798977 307 grpc_server.cc:225] Ready for RPC 'ModelConfig', 2
I0629 06:47:58.798984 307 model_repository_manager.cc:659] GetInferenceBackend() 'rec' version -1
I0629 06:47:58.799748 307 grpc_server.cc:270] Process for ModelConfig, rpc_ok=1, 1 step COMPLETE
I0629 06:47:58.799761 307 grpc_server.cc:408] Done for ModelConfig, 1
I0629 06:47:58.801194 307 grpc_server.cc:270] Process for ServerMetadata, rpc_ok=1, 1 step START
I0629 06:47:58.801237 307 grpc_server.cc:225] Ready for RPC 'ServerMetadata', 2
I0629 06:47:58.801284 307 grpc_server.cc:270] Process for ServerMetadata, rpc_ok=1, 1 step COMPLETE
I0629 06:47:58.801290 307 grpc_server.cc:408] Done for ServerMetadata, 1
I0629 06:47:58.801546 307 grpc_server.cc:270] Process for ModelStatistics, rpc_ok=1, 2 step START
I0629 06:47:58.801568 307 grpc_server.cc:225] Ready for RPC 'ModelStatistics', 3
I0629 06:47:58.801579 307 model_repository_manager.cc:615] VersionStates() 'rec'
I0629 06:47:58.801590 307 model_repository_manager.cc:659] GetInferenceBackend() 'rec' version 1
I0629 06:47:58.801673 307 grpc_server.cc:270] Process for ModelStatistics, rpc_ok=1, 2 step COMPLETE
I0629 06:47:58.801682 307 grpc_server.cc:408] Done for ModelStatistics, 2
I0629 06:47:58.802736 307 grpc_server.cc:3124] Process for ModelInferHandler, rpc_ok=1, 4 step START
I0629 06:47:58.802760 307 grpc_server.cc:3117] New request handler for ModelInferHandler, 5
I0629 06:47:58.802768 307 model_repository_manager.cc:659] GetInferenceBackend() 'rec' version -1
I0629 06:47:58.802776 307 model_repository_manager.cc:659] GetInferenceBackend() 'rec' version -1
I0629 06:47:58.802795 307 infer_request.cc:497] prepared: [0x0x7fd914009ec0] request id: 0, model: rec, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 30, priority: 0, timeout (us): 0
original inputs:
[0x0x7fd91400a148] input: x, type: FP32, original shape: [30,3,32,32], batch + shape: [30,3,32,32], shape: [3,32,32]
override inputs:
inputs:
[0x0x7fd91400a148] input: x, type: FP32, original shape: [30,3,32,32], batch + shape: [30,3,32,32], shape: [3,32,32]
original requested outputs:
save_infer_model/scale_0.tmp_1
requested outputs:
save_infer_model/scale_0.tmp_1

I0629 06:47:58.802985 307 paddle.cc:1207] model rec, instance rec_0, executing 1 requests
I0629 06:47:58.803000 307 paddle.cc:846] TRITONBACKEND_ModelExecute: Running rec_0 with 1 requests
W0629 06:47:58.826670   317 rnn_op.cu.cc:331] If the memory space of the Input WeightList is not continuous, less efficient calculation will be called. Please call flatten_parameters() to make the input memory continuous.
I0629 06:47:59.041294 307 grpc_server.cc:3275] ModelInferHandler::InferResponseComplete, 4 step ISSUED
I0629 06:47:59.041468 307 grpc_server.cc:2847] ModelInferHandler::InferRequestComplete
I0629 06:47:59.041487 307 paddle.cc:1026] TRITONBACKEND_ModelExecute: model rec_0 released 1 requests
I0629 06:47:59.041547 307 grpc_server.cc:3124] Process for ModelInferHandler, rpc_ok=1, 4 step COMPLETE
I0629 06:47:59.041559 307 grpc_server.cc:2169] Done for ModelInferHandler, 4
I0629 06:48:04.801946 307 grpc_server.cc:270] Process for ModelStatistics, rpc_ok=1, 3 step START
I0629 06:48:04.801980 307 grpc_server.cc:225] Ready for RPC 'ModelStatistics', 4
I0629 06:48:04.801991 307 model_repository_manager.cc:615] VersionStates() 'rec'
I0629 06:48:04.802002 307 model_repository_manager.cc:659] GetInferenceBackend() 'rec' version 1
I0629 06:48:04.802103 307 grpc_server.cc:270] Process for ModelStatistics, rpc_ok=1, 3 step COMPLETE
I0629 06:48:04.802111 307 grpc_server.cc:408] Done for ModelStatistics, 3

How to solve this problem?

Issues with bash scripts/build_paddle_backend.sh

Hi, I got this error when running bash scripts/build_paddle_backend.sh. Is it possible to fix this issue?

/workspace/paddle_backend/src/paddle.cc: In member function 'void ModelImpl::CollectShapeRun(paddle_infer::Predictor*, const std::map<std::__cxx11::basic_string<char>, std::vector<int> >&)':
/workspace/paddle_backend/src/paddle.cc:79:32: error: 'class paddle_infer::Predictor' has no member named 'GetInputType'; did you mean 'GetInputNames'?
   79 |   auto input_type = predictor->GetInputType();
      |                                ^~~~~~~~~~~~
      |                                GetInputNames
make[2]: *** [CMakeFiles/triton-paddle-backend.dir/build.make:82: CMakeFiles/triton-paddle-backend.dir/src/paddle.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:166: CMakeFiles/triton-paddle-backend.dir/all] Error 2
make: *** [Makefile:149: all] Error 2
