Describe the issue import onnxruntime as nxrun so = nxrun.Sess

[Build] Trying to use TensorrtExecutionProvider. Model not loading

HShamimGEHC commented on May 23, 2024

[Build] Trying to use TensorrtExecutionProvider. Model not loading

Comments (8)

jywu-msft commented on May 23, 2024

is it really "stuck"? how long did you wait? there are some initialization tasks that are costly in TensorRT. depending on the model, it can take a long time.
+@chilo-ms

HShamimGEHC commented on May 23, 2024

> is it really "stuck"? how long did you wait? there are some initialization tasks that are costly in TensorRT. depending on the model, it can take a long time.
> +@chilo-ms

I guess I didn't wait long enough. I was exiting after ~3 min. I waited longer this time.

Every 5 minutes, I get a [41] [CRITICAL] WORKER TIMEOUT (pid:145), and immediately after that it boots a new worker, [165] [INFO] Booting worker with pid: 165, which tries to load the model again.

chilo-ms commented on May 23, 2024

Since the whole model is supported by TRT, could you help try whether it can be run by trtexec?

HShamimGEHC commented on May 23, 2024

> Since the whole model is supported by TRT, could you help try whether it can be run by trtexec?

Yes, I am able to run it successfully with trtexec. Over the weekend, I realized that the timeout was caused by a dependency on Flask loading the model (the timeout was set to 5 minutes). I removed that dependency, and the model now loads in ~25-30 minutes.
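A load time of this length is typically dominated by TensorRT building its engine. The sketch below shows one way that cost might be paid only once, by enabling the TensorRT execution provider's engine cache. It assumes the `trt_engine_cache_enable` and `trt_engine_cache_path` provider options are available in the installed ONNX Runtime build; `cache_dir`, `tensorrt_providers`, and `load_session` are hypothetical names used only for illustration.

```python
def tensorrt_providers(cache_dir="./trt_cache"):
    """Return a providers list that enables TensorRT engine caching.

    cache_dir is a hypothetical path; any writable directory should do.
    """
    trt_options = {
        "trt_engine_cache_enable": True,  # reuse serialized engines across runs
        "trt_engine_cache_path": cache_dir,
    }
    # Order matters: ONNX Runtime tries providers from first to last.
    return [
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]


def load_session(model_path, cache_dir="./trt_cache"):
    # Imported lazily so tensorrt_providers() can be inspected
    # on a machine without onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=tensorrt_providers(cache_dir))
```

The first load would still build the engine. Later loads can reuse the cached engine only while the model, the TensorRT version, and the GPU stay the same, so the benefit is not guaranteed.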

I do have a follow-up regarding inference time. Inference on a single image with the converted model is 5-10 ms faster on the CPU (using CPUExecutionProvider) than on the GPU (using TensorrtExecutionProvider and CUDAExecutionProvider): about 10 ms on CPU vs. 15-20 ms on GPU. Any advice on how to optimize it for the GPU?

chilo-ms commented on May 23, 2024

> I do have a follow-up in regard to inference time. The time it takes for the converted model to inference an image on a CPU (using CPUExecutionProvider) vs GPU (using TensorrtExecution and CUDAExecutionProvider) is 5-10ms faster (10ms on CPU vs 15-20ms on GPU) Any advice on how to optimize it for the GPU?

I assume the time you measured is compute time (GPU or CPU), not end-to-end latency, right?
You can try CUDA graph (by setting trt_cuda_graph_enable). Please remember to use IOBinding, as this is one of the constraints of using CUDA graph.
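As a minimal sketch of that suggestion, the option can be passed to the TensorRT execution provider through the `providers` argument. The helper name `cuda_graph_providers` is hypothetical, and `device_id=0` is an assumption about which GPU is used.

```python
def cuda_graph_providers(device_id=0):
    """Return a providers list with CUDA graph capture turned on for TensorRT."""
    trt_options = {
        "device_id": device_id,
        "trt_cuda_graph_enable": True,  # option named in the comment above
    }
    # CUDAExecutionProvider stays as a fallback for anything TensorRT does not take.
    return [("TensorrtExecutionProvider", trt_options), "CUDAExecutionProvider"]


# Intended use (requires onnxruntime with TensorRT support and a GPU):
#   import onnxruntime as ort
#   session = ort.InferenceSession("model.onnx", providers=cuda_graph_providers())
# "model.onnx" is a placeholder path.
```

The session then has to be run through IOBinding rather than a plain `session.run()` call, as noted above.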

HShamimGEHC commented on May 23, 2024

> > I do have a follow-up in regard to inference time. The time it takes for the converted model to inference an image on a CPU (using CPUExecutionProvider) vs GPU (using TensorrtExecution and CUDAExecutionProvider) is 5-10ms faster (10ms on CPU vs 15-20ms on GPU) Any advice on how to optimize it for the GPU?
>
> I assume the time you measured is compute time (GPU or CPU) not end-to-end latency, right? You can try cuda graph ( by using trt_cuda_graph_enable). Please remember to use IOBinding as this is one of the constraints in using cuda graph.

Correct, I believe so. I am doing the following:

import datetime

t0 = datetime.datetime.now()
session.run()  # arguments omitted in the original post
t1 = datetime.datetime.now()
InferenceTime = (t1 - t0).total_seconds()  # wall-clock seconds for one run() call
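A single timed call like this can be skewed by one-time setup work and by timer resolution. The helper below is a rough sketch of a steadier measurement: it discards a few warm-up calls and then reports the median and mean over many repeats using `time.perf_counter`. The name `time_calls` and the default counts are assumptions, not something from this thread.

```python
import statistics
import time


def time_calls(run_once, warmup=10, repeats=100):
    """Return (median_ms, mean_ms) wall-clock latency of run_once()."""
    for _ in range(warmup):
        run_once()  # warm-up calls absorb one-time setup costs
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples_ms), statistics.mean(samples_ms)
```

With a real session it could be called as `time_calls(lambda: session.run(None, feeds))`, where `feeds` is a hypothetical dict mapping input names to arrays. This still measures wall-clock time around the call, so any host/device copies inside `run()` are included.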

chilo-ms commented on May 23, 2024

> > > I do have a follow-up in regard to inference time. The time it takes for the converted model to inference an image on a CPU (using CPUExecutionProvider) vs GPU (using TensorrtExecution and CUDAExecutionProvider) is 5-10ms faster (10ms on CPU vs 15-20ms on GPU) Any advice on how to optimize it for the GPU?
> >
> > I assume the time you measured is compute time (GPU or CPU) not end-to-end latency, right? You can try cuda graph ( by using trt_cuda_graph_enable). Please remember to use IOBinding as this is one of the constraints in using cuda graph.
>
> Correct I believe so. I am doing the following:
>
> t0 = datetime.datetime.now()
> session.run()
> t1 = datetime.datetime.now()
> InferenceTime = (t1-t0).total_seconds()

Are you using IOBinding?
If not, then session.run() might include host-to-device/device-to-host copies for input/output.
Please see Inference::Run and ExecuteGraphImpl.

You can try IOBinding and make sure input/output are on GPU memory and test it again.
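A minimal sketch of what that could look like with the ONNX Runtime Python API is shown here. It assumes `session` is an `onnxruntime.InferenceSession` created with a GPU provider and `feeds` maps input names to NumPy arrays. The function name `run_with_io_binding` is hypothetical, and the `ort` module is passed in as a parameter only so the helper can be exercised without a GPU.

```python
def run_with_io_binding(session, ort, feeds, device="cuda", device_id=0):
    """Run session with inputs and outputs bound to device memory.

    session: an onnxruntime.InferenceSession
    ort:     the imported onnxruntime module
    feeds:   dict mapping input names to NumPy arrays
    """
    binding = session.io_binding()
    for name, array in feeds.items():
        # Copy each input to the device up front, outside any timed region.
        value = ort.OrtValue.ortvalue_from_numpy(array, device, device_id)
        binding.bind_ortvalue_input(name, value)
    for output in session.get_outputs():
        # Let ONNX Runtime allocate each output on the device.
        binding.bind_output(output.name, device_type=device, device_id=device_id)
    session.run_with_iobinding(binding)
    return binding
```

After the call, `binding.copy_outputs_to_cpu()` returns the results as NumPy arrays. For a fair comparison with the CPU numbers, one option is to build the binding once and time only repeated `session.run_with_iobinding(binding)` calls.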

github-actions commented on May 23, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
