Comments (17)

ZJU-lishuang commented on May 29, 2024

Triton server log:

==================================
== Triton Inference Server Base ==
==================================

NVIDIA Release 22.03 (build 33743047)

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

I1003 14:02:03.388966 1 onnxruntime.cc:2319] TRITONBACKEND_Initialize: onnxruntime
I1003 14:02:03.389124 1 onnxruntime.cc:2329] Triton TRITONBACKEND API version: 1.8
I1003 14:02:03.389140 1 onnxruntime.cc:2335] 'onnxruntime' TRITONBACKEND API version: 1.8
I1003 14:02:03.389151 1 onnxruntime.cc:2365] backend configuration:
{}
I1003 14:02:03.537700 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f04ee000000' with size 268435456
I1003 14:02:03.538462 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1003 14:02:03.541414 1 model_repository_manager.cc:997] loading: ERNIE:1
I1003 14:02:03.642048 1 model_repository_manager.cc:997] loading: ResNet50-v1.5:1
I1003 14:02:03.781986 1 paddle.cc:1204] TRITONBACKEND_Initialize: paddle
I1003 14:02:03.782021 1 paddle.cc:1212] Triton TRITONBACKEND API version: 1.8
I1003 14:02:03.782028 1 paddle.cc:1219] 'paddle' TRITONBACKEND API version: 1.8
I1003 14:02:03.782032 1 paddle.cc:1249] backend configuration:
{}
I1003 14:02:03.782059 1 paddle.cc:1266] TRITONBACKEND_ModelInitialize: ERNIE (version 1)
I1003 14:02:03.783862 1 paddle.cc:1266] TRITONBACKEND_ModelInitialize: ResNet50-v1.5 (version 1)
I1003 14:02:03.784426 1 paddle.cc:1309] TRITONBACKEND_ModelInstanceInitialize: ERNIE_0 (GPU device 0)
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1003 14:02:03.819965    88 analysis_config.cc:1164] In CollectShapeInfo mode, we will disable optimizations and collect the shape information of all intermediate tensors in the compute graph and calculate the min_shape, max_shape and opt_shape.
I1003 14:02:03.835196    88 analysis_predictor.cc:1220] ir_optim is turned off, no IR pass will be executed
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I1003 14:02:03.975621    88 analysis_predictor.cc:1274] ======= optimize end =======
I1003 14:02:03.978397    88 naive_executor.cc:110] ---  skip [feed], feed -> token_type_ids
I1003 14:02:03.978420    88 naive_executor.cc:110] ---  skip [feed], feed -> input_ids
I1003 14:02:03.980311    88 naive_executor.cc:110] ---  skip [linear_113.tmp_1], fetch -> fetch
I1003 14:02:12.677105    88 analysis_predictor.cc:1080] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [identity_scale_op_clean_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_fill_constant_op_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [add_support_int8_pass]
I1003 14:02:12.796942    88 fuse_pass_base.cc:59] ---  detected 220 subgraphs
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [trt_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [preln_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [delete_c_identity_op_pass]
--- Running IR pass [trt_multihead_matmul_fuse_pass_v2]
--- Running IR pass [trt_multihead_matmul_fuse_pass_v3]
I1003 14:02:12.937906    88 fuse_pass_base.cc:59] ---  detected 6 subgraphs
--- Running IR pass [vit_attention_fuse_pass]
--- Running IR pass [trt_skip_layernorm_fuse_pass]
I1003 14:02:12.947113    88 fuse_pass_base.cc:59] ---  detected 13 subgraphs
--- Running IR pass [preln_skip_layernorm_fuse_pass]
--- Running IR pass [preln_residual_bias_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [trt_squeeze2_matmul_fuse_pass]
--- Running IR pass [trt_reshape2_matmul_fuse_pass]
--- Running IR pass [trt_flatten2_matmul_fuse_pass]
--- Running IR pass [trt_map_matmul_v2_to_mul_pass]
I1003 14:02:12.951237    88 fuse_pass_base.cc:59] ---  detected 20 subgraphs
--- Running IR pass [trt_map_matmul_v2_to_matmul_pass]
--- Running IR pass [trt_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I1003 14:02:12.956024    88 fuse_pass_base.cc:59] ---  detected 20 subgraphs
--- Running IR pass [conv_elementwise_add_fuse_pass]
--- Running IR pass [remove_padding_recover_padding_pass]
--- Running IR pass [delete_remove_padding_recover_padding_pass]
--- Running IR pass [dense_fc_to_sparse_pass]
--- Running IR pass [dense_multihead_matmul_to_sparse_pass]
--- Running IR pass [tensorrt_subgraph_pass]
I1003 14:02:12.962479    88 tensorrt_subgraph_pass.cc:238] ---  detect a sub-graph with 51 nodes
I1003 14:02:12.976985    88 tensorrt_subgraph_pass.cc:541] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I1003 14:02:13.822504    88 engine.cc:268] Run Paddle-TRT Dynamic Shape mode.
I1003 14:03:10.097728    88 engine.cc:680] ====== engine info ======
I1003 14:03:10.104418    88 engine.cc:685] Layers:
Scale: before_reshape (Output: tmp_312)
PWN(elementwise (Output: tmp_532), elementwise (Output: tmp_634))
Scale: scale (Output: tmp_312)
skip_layernorm (Output: layer_norm_26.tmp_249)
shuffle_before_multihead_mamul(Output: reshape2_3.tmp_0104)
scale (Output: tmp_312) + unsqueeze2 (Output: unsqueeze2_0.tmp_014)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_3.tmp_0104)
multihead_mamul_fc(Output: reshape2_3.tmp_0104)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_3.tmp_0104)
multihead_matmul (Output: reshape2_3.tmp_0104)
fc_op_reshape_before_fc: Shuffle (Output: linear_79.tmp_1111)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_79.tmp_1111)
fc_op_float: FullyConnected (Output: linear_79.tmp_1111)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_79.tmp_1111)
shuffle_after_fc (Output: linear_79.tmp_1111)
skip_layernorm (Output: layer_norm_27.tmp_2122)
fc_op_reshape_before_fc: Shuffle (Output: linear_80.tmp_1128)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_80.tmp_1128)
fc_op_float: FullyConnected (Output: linear_80.tmp_1128)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_80.tmp_1128)
shuffle_after_fc (Output: linear_80.tmp_1128)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 71) [Constant], (Unnamed Layer* 72) [ElementWise]), (Unnamed Layer* 73) [Unary]), PWN((Unnamed Layer* 69) [Constant], (Unnamed Layer* 74) [ElementWise])), PWN((Unnamed Layer* 70) [Constant], (Unnamed Layer* 75) [ElementWise])), gelu (Output: gelu_1.tmp_0130))
fc_op_reshape_before_fc: Shuffle (Output: linear_81.tmp_1136)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_81.tmp_1136)
fc_op_float: FullyConnected (Output: linear_81.tmp_1136)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_81.tmp_1136)
shuffle_after_fc (Output: linear_81.tmp_1136)
skip_layernorm (Output: layer_norm_28.tmp_2147)
shuffle_before_multihead_mamul(Output: reshape2_7.tmp_0199)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_7.tmp_0199)
multihead_mamul_fc(Output: reshape2_7.tmp_0199)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_7.tmp_0199)
multihead_matmul (Output: reshape2_7.tmp_0199)
fc_op_reshape_before_fc: Shuffle (Output: linear_85.tmp_1206)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_85.tmp_1206)
fc_op_float: FullyConnected (Output: linear_85.tmp_1206)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_85.tmp_1206)
shuffle_after_fc (Output: linear_85.tmp_1206)
skip_layernorm (Output: layer_norm_29.tmp_2217)
fc_op_reshape_before_fc: Shuffle (Output: linear_86.tmp_1223)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_86.tmp_1223)
fc_op_float: FullyConnected (Output: linear_86.tmp_1223)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_86.tmp_1223)
shuffle_after_fc (Output: linear_86.tmp_1223)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 157) [Constant], (Unnamed Layer* 158) [ElementWise]), (Unnamed Layer* 159) [Unary]), PWN((Unnamed Layer* 155) [Constant], (Unnamed Layer* 160) [ElementWise])), PWN((Unnamed Layer* 156) [Constant], (Unnamed Layer* 161) [ElementWise])), gelu (Output: gelu_2.tmp_0225))
fc_op_reshape_before_fc: Shuffle (Output: linear_87.tmp_1231)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_87.tmp_1231)
fc_op_float: FullyConnected (Output: linear_87.tmp_1231)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_87.tmp_1231)
shuffle_after_fc (Output: linear_87.tmp_1231)
skip_layernorm (Output: layer_norm_30.tmp_2242)
shuffle_before_multihead_mamul(Output: reshape2_11.tmp_0294)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_11.tmp_0294)
multihead_mamul_fc(Output: reshape2_11.tmp_0294)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_11.tmp_0294)
multihead_matmul (Output: reshape2_11.tmp_0294)
fc_op_reshape_before_fc: Shuffle (Output: linear_91.tmp_1301)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_91.tmp_1301)
fc_op_float: FullyConnected (Output: linear_91.tmp_1301)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_91.tmp_1301)
shuffle_after_fc (Output: linear_91.tmp_1301)
skip_layernorm (Output: layer_norm_31.tmp_2312)
fc_op_reshape_before_fc: Shuffle (Output: linear_92.tmp_1318)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_92.tmp_1318)
fc_op_float: FullyConnected (Output: linear_92.tmp_1318)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_92.tmp_1318)
shuffle_after_fc (Output: linear_92.tmp_1318)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 243) [Constant], (Unnamed Layer* 244) [ElementWise]), (Unnamed Layer* 245) [Unary]), PWN((Unnamed Layer* 241) [Constant], (Unnamed Layer* 246) [ElementWise])), PWN((Unnamed Layer* 242) [Constant], (Unnamed Layer* 247) [ElementWise])), gelu (Output: gelu_3.tmp_0320))
fc_op_reshape_before_fc: Shuffle (Output: linear_93.tmp_1326)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_93.tmp_1326)
fc_op_float: FullyConnected (Output: linear_93.tmp_1326)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_93.tmp_1326)
shuffle_after_fc (Output: linear_93.tmp_1326)
skip_layernorm (Output: layer_norm_32.tmp_2337)
shuffle_before_multihead_mamul(Output: reshape2_15.tmp_0389)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_15.tmp_0389)
multihead_mamul_fc(Output: reshape2_15.tmp_0389)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_15.tmp_0389)
multihead_matmul (Output: reshape2_15.tmp_0389)
fc_op_reshape_before_fc: Shuffle (Output: linear_97.tmp_1396)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_97.tmp_1396)
fc_op_float: FullyConnected (Output: linear_97.tmp_1396)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_97.tmp_1396)
shuffle_after_fc (Output: linear_97.tmp_1396)
skip_layernorm (Output: layer_norm_33.tmp_2407)
fc_op_reshape_before_fc: Shuffle (Output: linear_98.tmp_1413)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_98.tmp_1413)
fc_op_float: FullyConnected (Output: linear_98.tmp_1413)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_98.tmp_1413)
shuffle_after_fc (Output: linear_98.tmp_1413)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 329) [Constant], (Unnamed Layer* 330) [ElementWise]), (Unnamed Layer* 331) [Unary]), PWN((Unnamed Layer* 327) [Constant], (Unnamed Layer* 332) [ElementWise])), PWN((Unnamed Layer* 328) [Constant], (Unnamed Layer* 333) [ElementWise])), gelu (Output: gelu_4.tmp_0415))
fc_op_reshape_before_fc: Shuffle (Output: linear_99.tmp_1421)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_99.tmp_1421)
fc_op_float: FullyConnected (Output: linear_99.tmp_1421)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_99.tmp_1421)
shuffle_after_fc (Output: linear_99.tmp_1421)
skip_layernorm (Output: layer_norm_34.tmp_2432)
shuffle_before_multihead_mamul(Output: reshape2_19.tmp_0484)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_19.tmp_0484)
multihead_mamul_fc(Output: reshape2_19.tmp_0484)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_19.tmp_0484)
multihead_matmul (Output: reshape2_19.tmp_0484)
fc_op_reshape_before_fc: Shuffle (Output: linear_103.tmp_1491)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_103.tmp_1491)
fc_op_float: FullyConnected (Output: linear_103.tmp_1491)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_103.tmp_1491)
shuffle_after_fc (Output: linear_103.tmp_1491)
skip_layernorm (Output: layer_norm_35.tmp_2502)
fc_op_reshape_before_fc: Shuffle (Output: linear_104.tmp_1508)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_104.tmp_1508)
fc_op_float: FullyConnected (Output: linear_104.tmp_1508)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_104.tmp_1508)
shuffle_after_fc (Output: linear_104.tmp_1508)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 415) [Constant], (Unnamed Layer* 416) [ElementWise]), (Unnamed Layer* 417) [Unary]), PWN((Unnamed Layer* 413) [Constant], (Unnamed Layer* 418) [ElementWise])), PWN((Unnamed Layer* 414) [Constant], (Unnamed Layer* 419) [ElementWise])), gelu (Output: gelu_5.tmp_0510))
fc_op_reshape_before_fc: Shuffle (Output: linear_105.tmp_1516)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_105.tmp_1516)
fc_op_float: FullyConnected (Output: linear_105.tmp_1516)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_105.tmp_1516)
shuffle_after_fc (Output: linear_105.tmp_1516)
skip_layernorm (Output: layer_norm_36.tmp_2527)
shuffle_before_multihead_mamul(Output: reshape2_23.tmp_0579)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_23.tmp_0579)
multihead_mamul_fc(Output: reshape2_23.tmp_0579)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_23.tmp_0579)
multihead_matmul (Output: reshape2_23.tmp_0579)
fc_op_reshape_before_fc: Shuffle (Output: linear_109.tmp_1586)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_109.tmp_1586)
fc_op_float: FullyConnected (Output: linear_109.tmp_1586)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_109.tmp_1586)
shuffle_after_fc (Output: linear_109.tmp_1586)
skip_layernorm (Output: layer_norm_37.tmp_2597)
fc_op_reshape_before_fc: Shuffle (Output: linear_110.tmp_1603)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_110.tmp_1603)
fc_op_float: FullyConnected (Output: linear_110.tmp_1603)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_110.tmp_1603)
shuffle_after_fc (Output: linear_110.tmp_1603)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 501) [Constant], (Unnamed Layer* 502) [ElementWise]), (Unnamed Layer* 503) [Unary]), PWN((Unnamed Layer* 499) [Constant], (Unnamed Layer* 504) [ElementWise])), PWN((Unnamed Layer* 500) [Constant], (Unnamed Layer* 505) [ElementWise])), gelu (Output: gelu_6.tmp_0605))
fc_op_reshape_before_fc: Shuffle (Output: linear_111.tmp_1611)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_111.tmp_1611)
fc_op_float: FullyConnected (Output: linear_111.tmp_1611)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_111.tmp_1611)
shuffle_after_fc (Output: linear_111.tmp_1611)
skip_layernorm (Output: layer_norm_38.tmp_2622)
slice (Output: layer_norm_38.tmp_2_slice_0624) + fc_op_reshape_before_fc: Shuffle (Output: linear_112.tmp_1630)
fc_op_float: FullyConnected (Output: linear_112.tmp_1630)
PWN(tanh (Output: tanh_3.tmp_0632))
fc_op_float: FullyConnected (Output: linear_113.tmp_1641)
shuffle_after_fc (Output: linear_113.tmp_1641)

Bindings:
embedding_10.tmp_0
embedding_11.tmp_0
embedding_8.tmp_0
embedding_9.tmp_0
tmp_2
linear_113.tmp_1641
I1003 14:03:10.104538    88 engine.cc:687] ====== engine info end ======
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I1003 14:03:10.112846    88 ir_params_sync_among_devices_pass.cc:88] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I1003 14:03:10.162847    88 memory_optimize_pass.cc:218] Cluster name : full_like_0.tmp_0  size: 8
I1003 14:03:10.162859    88 memory_optimize_pass.cc:218] Cluster name : tmp_4  size: 8
I1003 14:03:10.162863    88 memory_optimize_pass.cc:218] Cluster name : cumsum_0.tmp_0  size: 8
I1003 14:03:10.162864    88 memory_optimize_pass.cc:218] Cluster name : token_type_ids  size: 8
--- Running analysis [ir_graph_to_program_pass]
I1003 14:03:10.183755    88 analysis_predictor.cc:1274] ======= optimize end =======
I1003 14:03:10.188370    88 naive_executor.cc:110] ---  skip [feed], feed -> token_type_ids
I1003 14:03:10.188387    88 naive_executor.cc:110] ---  skip [feed], feed -> input_ids
I1003 14:03:10.188688    88 naive_executor.cc:110] ---  skip [linear_113.tmp_1], fetch -> fetch
W1003 14:03:10.188714    88 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.6
W1003 14:03:10.188877    88 gpu_resources.cc:91] device: 0, cuDNN Version: 8.3.
I1003 14:03:10.188996 1 paddle.cc:1309] TRITONBACKEND_ModelInstanceInitialize: ResNet50-v1.5_0 (GPU device 0)
I1003 14:03:10.196445 1 model_repository_manager.cc:1152] successfully loaded 'ERNIE' version 1
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [trt_skip_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I1003 14:03:10.288832    89 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
I1003 14:03:10.289359    89 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I1003 14:03:10.335124    89 fuse_pass_base.cc:59] ---  detected 16 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I1003 14:03:10.338466    89 ir_params_sync_among_devices_pass.cc:88] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I1003 14:03:10.395090    89 memory_optimize_pass.cc:218] Cluster name : fill_constant_1.tmp_0  size: 8
I1003 14:03:10.395102    89 memory_optimize_pass.cc:218] Cluster name : x0  size: 602112
I1003 14:03:10.395103    89 memory_optimize_pass.cc:218] Cluster name : elementwise_add_4  size: 1605632
I1003 14:03:10.395107    89 memory_optimize_pass.cc:218] Cluster name : conv2d_63.tmp_1  size: 3211264
I1003 14:03:10.395110    89 memory_optimize_pass.cc:218] Cluster name : elementwise_add_2  size: 3211264
I1003 14:03:10.395112    89 memory_optimize_pass.cc:218] Cluster name : conv2d_60.tmp_1  size: 3211264
--- Running analysis [ir_graph_to_program_pass]
I1003 14:03:10.422075    89 analysis_predictor.cc:1274] ======= optimize end =======
I1003 14:03:10.422649    89 naive_executor.cc:110] ---  skip [feed], feed -> x0
I1003 14:03:10.424878    89 naive_executor.cc:110] ---  skip [save_infer_model/scale_0.tmp_1], fetch -> fetch
I1003 14:03:10.425115 1 model_repository_manager.cc:1152] successfully loaded 'ResNet50-v1.5' version 1
I1003 14:03:10.425218 1 server.cc:524] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1003 14:03:10.425272 1 server.cc:551] 
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| paddle      | /opt/tritonserver/backends/paddle/libtriton_paddle.so           | {}     |
+-------------+-----------------------------------------------------------------+--------+

I1003 14:03:10.425306 1 server.cc:594] 
+---------------+---------+--------+
| Model         | Version | Status |
+---------------+---------+--------+
| ERNIE         | 1       | READY  |
| ResNet50-v1.5 | 1       | READY  |
+---------------+---------+--------+

I1003 14:03:10.469245 1 metrics.cc:651] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
I1003 14:03:10.469525 1 tritonserver.cc:1962] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                        |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                       |
| server_version                   | 2.20.0                                                                                                                                                                                       |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0]         | /workspace/models                                                                                                                                                                            |
| model_control_mode               | MODE_NONE                                                                                                                                                                                    |
| strict_model_config              | 1                                                                                                                                                                                            |
| rate_limit                       | OFF                                                                                                                                                                                          |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                    |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                     |
| response_cache_byte_size         | 0                                                                                                                                                                                            |
| min_supported_compute_capability | 6.0                                                                                                                                                                                          |
| strict_readiness                 | 1                                                                                                                                                                                            |
| exit_timeout                     | 30                                                                                                                                                                                           |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1003 14:03:10.474583 1 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001
I1003 14:03:10.475297 1 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000
I1003 14:03:10.516651 1 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
W1003 14:03:11.474144 1 metrics.cc:427] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W1003 14:03:12.475025 1 metrics.cc:427] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W1003 14:03:13.477496 1 metrics.cc:427] Unable to get power limit for GPU 0. Status:Success, value:0.000000

from paddlepaddle_backend.

ZJU-lishuang commented on May 29, 2024

I found that the problem comes from the TensorRT optimization step, based on the failed check `profileIndex >= 0 && profileIndex < mEngine.getNbOptimizationProfiles()`.

How to solve it?
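For context, the assertion quoted above is TensorRT's bounds check when an execution context selects an optimization profile. A minimal sketch of the invariant (pure Python with hypothetical names; the real check lives inside TensorRT when a profile is set on the execution context) looks like:

```python
def select_optimization_profile(profile_index: int, num_profiles: int) -> int:
    # Mirrors the failed TensorRT invariant:
    #   profileIndex >= 0 && profileIndex < mEngine.getNbOptimizationProfiles()
    # If the engine was serialized with no dynamic-shape profiles,
    # num_profiles is 0 and every requested index is rejected.
    if not (0 <= profile_index < num_profiles):
        raise ValueError(
            f"profileIndex {profile_index} out of range [0, {num_profiles})"
        )
    return profile_index
```

In other words, if the Paddle-TRT engine ends up with zero optimization profiles (for example, built without the expected dynamic-shape configuration), selecting any profile trips this check.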

heliqi commented on May 29, 2024

Are you using the paddlepaddle/triton_paddle:21.10 image or another image?

heliqi commented on May 29, 2024

@ZJU-lishuang
The paddlepaddle/triton_paddle:21.10 image works correctly for me.

ZJU-lishuang commented on May 29, 2024

Another image. I built from source, with Triton 22.03.

heliqi commented on May 29, 2024

How did you obtain the Paddle Inference library that triton_paddle depends on: compiled from source, or downloaded?

CUDA and TensorRT versions that are too new can be risky for running Paddle Inference; I suggest you use our verified image first.
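One way to make such a mismatch concrete (a generic helper with hypothetical version strings, not something from this thread) is to compare the major.minor of the TensorRT version Paddle Inference was built against with the one shipped in the serving container; TensorRT engines and plugins are generally not compatible across major.minor boundaries:

```python
def parse_version(v: str) -> tuple:
    # "8.2.3.0" -> (8, 2, 3, 0)
    return tuple(int(p) for p in v.split("."))

def trt_major_minor_match(built_with: str, runtime: str) -> bool:
    # TensorRT only guarantees engine/plugin compatibility within the
    # same major.minor release, so compare just the first two fields.
    return parse_version(built_with)[:2] == parse_version(runtime)[:2]
```

If the two disagree, rebuilding the Paddle Inference library against the container's TensorRT (or switching to the verified image) is the safer path.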

ZJU-lishuang commented on May 29, 2024

https://github.com/triton-inference-server/server/tree/r22.03
https://github.com/PaddlePaddle/Paddle/tree/a8ae87f118ddde049bd5c60c4493a667206f8055

ZJU-lishuang commented on May 29, 2024

I think the CUDA and TensorRT versions in 22.03 should be fine.

heliqi commented on May 29, 2024

You can compile Paddle from the release/2.4 branch: https://github.com/PaddlePaddle/Paddle/tree/release/2.4.

There may be a problem with the code branch you provided.

ZJU-lishuang commented on May 29, 2024

I will try https://github.com/triton-inference-server/server/tree/r22.03 and https://github.com/PaddlePaddle/Paddle/tree/release/2.4 again and report back.
I tried that combination several days ago.

heliqi commented on May 29, 2024

I used 21.10 + Paddle release/2.4 when compiling the paddlepaddle/triton_paddle:21.10 image, so I suspect the TRT version may not match.

ZJU-lishuang commented on May 29, 2024

Condition 1:

ERROR:

Scanning dependencies of target paddle_inference_c
Scanning dependencies of target paddle_inference_c_shared
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_predictor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_tensor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_config.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_utils.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_tensor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_predictor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
[100%] Linking CXX shared library libpaddle_inference.so
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_utils.cc.o
[100%] Linking CXX static library libpaddle_inference_c.a
[100%] Built target paddle_inference_c
[100%] Linking CXX shared library libpaddle_inference_c.so
/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::safe_realloc(void*, unsigned long) [clone .part.42]':
io.cc:(.text+0x11): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
io.cc:(.text+0x18): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
io.cc:(.text+0x29): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::bad_alloc::~bad_alloc()@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::report_at_maximum_capacity(unsigned long)':
io.cc:(.text+0x13ba): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::report_size_overflow(unsigned long, unsigned long) [clone .constprop.592]':
io.cc:(.text+0x146a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::inference::ReadBinaryFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)':
io.cc:(.text+0x178e): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `void paddle::string::tinyformat::detail::FormatArg::formatImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::ostream&, char const*, char const*, int, void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg10formatImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRSoPKcSD_iPKv[_ZN6paddle6string10tinyformat6detail9FormatArg10formatImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRSoPKcSD_iPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x179f): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `int paddle::string::tinyformat::detail::FormatArg::toIntImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEiPKv[_ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEiPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18aa): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `void paddle::string::tinyformat::detail::FormatArg::formatImpl<char [14]>(std::ostream&, char const*, char const*, int, void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg10formatImplIA14_cEEvRSoPKcS8_iPKv[_ZN6paddle6string10tinyformat6detail9FormatArg10formatImplIA14_cEEvRSoPKcS8_iPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18b7): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `int paddle::string::tinyformat::detail::FormatArg::toIntImpl<char [14]>(void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplIA14_cEEiPKv[_ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplIA14_cEEiPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18e8): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/inference/CMakeFiles/paddle_inference_shared.dir/build.make:2244: paddle/fluid/inference/libpaddle_inference.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:163108: paddle/fluid/inference/CMakeFiles/paddle_inference_shared.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::safe_realloc(void*, unsigned long) [clone .part.67]':
pd_config.cc:(.text+0x11): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
pd_config.cc:(.text+0x18): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
pd_config.cc:(.text+0x29): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::bad_alloc::~bad_alloc()@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::report_at_maximum_capacity(unsigned long)':
pd_config.cc:(.text+0x22a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::report_size_overflow(unsigned long, unsigned long) [clone .constprop.300]':
pd_config.cc:(.text+0x2da): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `PD_ConfigDestroy':
pd_config.cc:(.text+0x808e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/libpthread.so
pd_config.cc:(.text+0x80b7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::_Sp_counted_ptr<decltype(nullptr), (__gnu_cxx::_Lock_policy)2>::_M_dispose()' defined in .text._ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv[_ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv] section in CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
pd_config.cc:(.text+0x80e9): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_destroy()' defined in .text._ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv[_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv] section in CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `phi::enforce::EnforceNotMet::what() const':
pd_config.cc:(.text._ZNK3phi7enforce13EnforceNotMet4whatEv[_ZNK3phi7enforce13EnforceNotMet4whatEv]+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `fLI::FLAGS_call_stack_level' defined in .data section in ../libpaddle_inference.a(flags.cc.o)
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `phi::enforce::EnforceNotMet::~EnforceNotMet()':
pd_config.cc:(.text._ZN3phi7enforce13EnforceNotMetD2Ev[_ZN3phi7enforce13EnforceNotMetD5Ev]+0x13): additional relocation overflows omitted from the output
libpaddle_inference_c.so: PC-relative offset overflow in PLT entry for `_ZN3phi5funcs21LaunchBroadcastKernelINS_5dtype7float16ES3_NS_3kps13DivideFunctorIS3_fEELi1ELi1ELi4EEEvRKNS_10GPUContextERKSt6vectorIPKNS_11DenseTensorESaISD_EEPSA_IPSB_SaISI_EET1_RKNS_5ArrayINS4_7details15BroadcastConfigEXT2_EEE'
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/build.make:1204: paddle/fluid/inference/capi_exp/libpaddle_inference_c.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:177249: paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
The command '/bin/sh -c python3 -m pip install pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple && mkdir build-env &&     cd build-env &&     cmake .. -DWITH_PYTHON=OFF              -DWITH_GPU=ON              -DWITH_TESTING=OFF              -DWITH_INFERENCE_API_TEST=OFF              -DCMAKE_BUILD_TYPE=Release              -DCUDA_ARCH_NAME=Auto              -DON_INFER=ON              -DWITH_MKL=ON              -DWITH_TENSORRT=ON              -DWITH_ONNXRUNTIME=ON &&     make -j8' returned a non-zero code: 2

paddlepaddle_backend/paddle-lib/Dockerfile

FROM nvcr.io/nvidia/tritonserver:22.03-py3

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-key del 7fa2af80 \
    && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb \
    && dpkg -i cuda-keyring_1.0-1_all.deb

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
        cmake \
        patchelf \
        python3-dev \
        unzip \
        gcc-8 \
        g++-8 \
        libgl1 \
        libssl-dev

RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 100
RUN update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 100

RUN git clone 'https://github.com/PaddlePaddle/Paddle.git'
WORKDIR /opt/tritonserver/Paddle
RUN git pull && git checkout release/2.4

RUN python3 -m pip install pyyaml && mkdir build-env && \
    cd build-env && \
    cmake .. -DWITH_PYTHON=OFF \
             -DWITH_GPU=ON \
             -DWITH_TESTING=OFF \
             -DWITH_INFERENCE_API_TEST=OFF \
             -DCMAKE_BUILD_TYPE=Release \
             -DCUDA_ARCH_NAME=Auto \
             -DON_INFER=ON \
             -DWITH_MKL=ON \
             -DWITH_TENSORRT=ON \
             -DWITH_ONNXRUNTIME=ON  && \
    make -j`nproc`
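The linker output above ends with `failed to convert GOTPCREL relocation; relink with --no-relax`. One possible workaround (a sketch only, not verified against this Paddle build) is to forward ld's suggested flag through the standard CMake linker-flag cache variables when configuring:

```shell
# Hypothetical workaround: pass ld's suggested --no-relax flag to the
# shared-library and executable link steps. Keep all the original
# configure options from the Dockerfile; only the two linker-flag
# variables below are added. Whether this actually resolves the
# GOTPCREL overflow for libpaddle_inference.so is untested.
cmake .. -DWITH_PYTHON=OFF \
         -DWITH_GPU=ON \
         -DON_INFER=ON \
         -DCMAKE_SHARED_LINKER_FLAGS="-Wl,--no-relax" \
         -DCMAKE_EXE_LINKER_FLAGS="-Wl,--no-relax"
```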

from paddlepaddle_backend.

heliqi commented on May 29, 2024

PaddlePaddle compilation errors should be reported on the official PaddlePaddle issue tracker: https://github.com/PaddlePaddle/Paddle/issues

@ZJU-lishuang
Since release/2.4 hasn't been officially released yet, would you try v2.4.0-rc0?

ZJU-lishuang commented on May 29, 2024

I have tried v2.4.0-rc0 as well, and it fails with the same problem.

ZJU-lishuang commented on May 29, 2024

Condition 2

Another Dockerfile attempt, recorded below.
ERROR:

[ 86%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_helpers.cc.o
[  6%] Building CUDA object paddle/phi/kernels/funcs/eigen/CMakeFiles/eigen_function.dir/broadcast.cu.o
[ 86%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_map_field.cc.o
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9146): error: identifier "__builtin_ia32_rndscaless_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9155): error: identifier "__builtin_ia32_rndscalesd_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14797): error: identifier "__builtin_ia32_rndscaless_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14806): error: identifier "__builtin_ia32_rndscalesd_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1365): error: identifier "__builtin_ia32_fpclassss" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1372): error: identifier "__builtin_ia32_fpclasssd" is undefined

[ 87%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_message.cc.o
[ 87%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_message_field.cc.o
[ 88%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_primitive_field.cc.o
[ 88%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/js/js_generator.cc.o
...
...
...
[ 94%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/objectivec/objectivec_oneof.cc.o
[ 94%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/objectivec/objectivec_primitive_field.cc.o
6 errors detected in the compilation of "/opt/tritonserver/Paddle/paddle/phi/kernels/funcs/eigen/broadcast.cu".
make[2]: *** [paddle/phi/kernels/funcs/eigen/CMakeFiles/eigen_function.dir/build.make:206: paddle/phi/kernels/funcs/eigen/CMakeFiles/eigen_function.dir/broadcast.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 95%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/php/php_generator.cc.o
[ 95%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/plugin.cc.o

paddlepaddle_backend/paddle-lib/Dockerfile

FROM nvcr.io/nvidia/tritonserver:22.03-py3

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-key del 7fa2af80 \
    && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb \
    && dpkg -i cuda-keyring_1.0-1_all.deb

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
        cmake \
        patchelf \
        python3-dev \
        unzip \
        gcc-8 \
        g++-8 \
        libgl1 \
        libssl-dev

RUN git clone 'https://github.com/PaddlePaddle/Paddle.git'
WORKDIR /opt/tritonserver/Paddle
RUN git pull && git checkout release/2.4

RUN python3 -m pip install pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple && mkdir build-env && \
    cd build-env && \
    cmake .. -DWITH_PYTHON=OFF \
             -DWITH_GPU=ON \
             -DWITH_TESTING=OFF \
             -DWITH_INFERENCE_API_TEST=OFF \
             -DCMAKE_BUILD_TYPE=Release \
             -DCUDA_ARCH_NAME=Auto \
             -DON_INFER=ON \
             -DWITH_MKL=ON \
             -DWITH_TENSORRT=ON \
             -DWITH_ONNXRUNTIME=ON \
             -DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` && \
    make -j8
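The `avx512fintrin.h ... __builtin_ia32_*` errors in this log come from nvcc parsing the host gcc-9 AVX-512 intrinsic headers, which this CUDA toolchain cannot digest. The Dockerfile sets `CMAKE_C_COMPILER`/`CMAKE_CXX_COMPILER` to gcc-8, but nvcc may still fall back to the default host compiler. A possible (unverified) fix is to point nvcc at g++-8 explicitly:

```shell
# Sketch: force nvcc to use g++-8 as its host compiler so CUDA sources see
# gcc-8's intrinsic headers instead of gcc-9's. Whether Paddle's build
# honors CMAKE_CUDA_HOST_COMPILER or the legacy CUDA_HOST_COMPILER cache
# variable is an assumption; both are set here, alongside the original
# C/C++ compiler overrides from the Dockerfile.
cmake .. -DCMAKE_C_COMPILER=$(which gcc-8) \
         -DCMAKE_CXX_COMPILER=$(which g++-8) \
         -DCMAKE_CUDA_HOST_COMPILER=$(which g++-8) \
         -DCUDA_HOST_COMPILER=$(which g++-8)
```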

ZJU-lishuang commented on May 29, 2024

Condition 3

ERROR:

[100%] Built target paddle_inference
Scanning dependencies of target paddle_inference_c
Scanning dependencies of target paddle_inference_c_shared
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_config.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_tensor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_utils.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_predictor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_predictor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_tensor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_utils.cc.o
[100%] Linking CXX static library libpaddle_inference_c.a
[100%] Built target paddle_inference_c
[100%] Linking CXX shared library libpaddle_inference_c.so
/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::safe_realloc(void*, unsigned long) [clone .part.42]':
io.cc:(.text+0x11): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
io.cc:(.text+0x18): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
io.cc:(.text+0x29): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::bad_alloc::~bad_alloc()@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::report_at_maximum_capacity(unsigned long)':
io.cc:(.text+0x13ba): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::report_size_overflow(unsigned long, unsigned long) [clone .constprop.592]':
io.cc:(.text+0x146a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::inference::ReadBinaryFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)':
io.cc:(.text+0x178e): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `void paddle::string::tinyformat::detail::FormatArg::formatImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::ostream&, char const*, char const*, int, void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg10formatImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRSoPKcSD_iPKv[_ZN6paddle6string10tinyformat6detail9FormatArg10formatImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRSoPKcSD_iPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x179f): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `int paddle::string::tinyformat::detail::FormatArg::toIntImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEiPKv[_ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEiPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18aa): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `void paddle::string::tinyformat::detail::FormatArg::formatImpl<char [14]>(std::ostream&, char const*, char const*, int, void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg10formatImplIA14_cEEvRSoPKcS8_iPKv[_ZN6paddle6string10tinyformat6detail9FormatArg10formatImplIA14_cEEvRSoPKcS8_iPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18b7): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `int paddle::string::tinyformat::detail::FormatArg::toIntImpl<char [14]>(void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplIA14_cEEiPKv[_ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplIA14_cEEiPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18e8): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/inference/CMakeFiles/paddle_inference_shared.dir/build.make:2244: paddle/fluid/inference/libpaddle_inference.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:163108: paddle/fluid/inference/CMakeFiles/paddle_inference_shared.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::safe_realloc(void*, unsigned long) [clone .part.67]':
pd_config.cc:(.text+0x11): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
pd_config.cc:(.text+0x18): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
pd_config.cc:(.text+0x29): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::bad_alloc::~bad_alloc()@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::report_at_maximum_capacity(unsigned long)':
pd_config.cc:(.text+0x22a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::report_size_overflow(unsigned long, unsigned long) [clone .constprop.300]':
pd_config.cc:(.text+0x2da): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `PD_ConfigDestroy':
pd_config.cc:(.text+0x808e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/libpthread.so
pd_config.cc:(.text+0x80b7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::_Sp_counted_ptr<decltype(nullptr), (__gnu_cxx::_Lock_policy)2>::_M_dispose()' defined in .text._ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv[_ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv] section in CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
pd_config.cc:(.text+0x80e9): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_destroy()' defined in .text._ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv[_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv] section in CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `phi::enforce::EnforceNotMet::what() const':
pd_config.cc:(.text._ZNK3phi7enforce13EnforceNotMet4whatEv[_ZNK3phi7enforce13EnforceNotMet4whatEv]+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `fLI::FLAGS_call_stack_level' defined in .data section in ../libpaddle_inference.a(flags.cc.o)
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `phi::enforce::EnforceNotMet::~EnforceNotMet()':
pd_config.cc:(.text._ZN3phi7enforce13EnforceNotMetD2Ev[_ZN3phi7enforce13EnforceNotMetD5Ev]+0x13): additional relocation overflows omitted from the output
libpaddle_inference_c.so: PC-relative offset overflow in PLT entry for `_ZN3phi5funcs21LaunchBroadcastKernelINS_5dtype7float16ES3_NS_3kps13DivideFunctorIS3_fEELi1ELi1ELi4EEEvRKNS_10GPUContextERKSt6vectorIPKNS_11DenseTensorESaISD_EEPSA_IPSB_SaISI_EET1_RKNS_5ArrayINS4_7details15BroadcastConfigEXT2_EEE'
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/build.make:1204: paddle/fluid/inference/capi_exp/libpaddle_inference_c.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:177249: paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
The command '/bin/sh -c python3 -m pip install pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple && mkdir build-env &&     cd build-env &&     cmake .. -DWITH_PYTHON=OFF              -DWITH_GPU=ON              -DWITH_TESTING=OFF              -DWITH_INFERENCE_API_TEST=OFF              -DCMAKE_BUILD_TYPE=Release              -DCUDA_ARCH_NAME=Auto              -DON_INFER=ON              -DWITH_MKL=ON              -DWITH_TENSORRT=ON              -DWITH_ONNXRUNTIME=ON              -DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` &&     make -j8' returned a non-zero code: 2

paddlepaddle_backend/paddle-lib/Dockerfile

FROM nvcr.io/nvidia/tritonserver:22.03-py3

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-key del 7fa2af80 \
    && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb \
    && dpkg -i cuda-keyring_1.0-1_all.deb

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
        cmake \
        patchelf \
        python3-dev \
        unzip \
        gcc-8 \
        g++-8 \
        libgl1 \
        libssl-dev

RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 100
RUN update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 100

RUN git clone 'https://github.com/PaddlePaddle/Paddle.git'
WORKDIR /opt/tritonserver/Paddle
RUN git pull && git checkout release/2.4

RUN python3 -m pip install pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple && mkdir build-env && \
    cd build-env && \
    cmake .. -DWITH_PYTHON=OFF \
             -DWITH_GPU=ON \
             -DWITH_TESTING=OFF \
             -DWITH_INFERENCE_API_TEST=OFF \
             -DCMAKE_BUILD_TYPE=Release \
             -DCUDA_ARCH_NAME=Auto \
             -DON_INFER=ON \
             -DWITH_MKL=ON \
             -DWITH_TENSORRT=ON \
             -DWITH_ONNXRUNTIME=ON \
             -DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` && \
    make -j8

heliqi commented on May 29, 2024

Please open issues about PaddlePaddle compilation at https://github.com/PaddlePaddle/Paddle/issues.
