Comments (18)
There are various strategies for reducing session initialization time. We're in the process of putting together a doc to provide guidance.
+@chilo-ms
from onnxruntime.
Hi @jywu-msft
Thanks! It would be very helpful to have such a document.
I have read the source code and found this operation costs a lot of time.
Could someone tell me why? Is ONNX Runtime doing some optimization on the model?
Oh, I found the main place where the time is spent.
It's here:
It seems ONNX Runtime is loading the TensorRT EP.
How do they do it?
By reflecting over the DLL, or something similar?
Why does it cost so much time?
There are two areas which cost the most time during TensorRT EP initialization:
1. TensorRT builder instantiation. Here it loads a DLL with the TensorRT kernels.
2. TensorRT engine build. (This can take the most time because it does kernel auto-tuning, where it measures timings for different kernels/tactics.)
For 2), there is an option to serialize a built engine to disk so that you don't need to rebuild it the next time you initialize a session. The option is trt_engine_cache_enable; can you try it?
Avoiding 1) is a little more complicated. If 2) is enough, then you can try that first.
@chilo-ms to add more comments.
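A minimal sketch of enabling the engine cache via provider options; the model name and cache path are placeholders, and onnxruntime-gpu built with the TensorRT EP is assumed:

```python
# Sketch of enabling the TensorRT engine cache via provider options.
# "model.onnx" and "./trt_cache" are placeholder paths.
trt_options = {
    "trt_engine_cache_enable": True,        # serialize built engines to disk
    "trt_engine_cache_path": "./trt_cache", # reused on the next session init
}
providers = [("TensorrtExecutionProvider", trt_options)]

# With onnxruntime-gpu (TensorRT build) installed:
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
```

The first run still builds the engine (with auto-tuning); subsequent session initializations find the serialized engine under the cache path and skip the rebuild.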
Hi @jywu-msft
I see 2).
I found that even when I use trt_engine_cache_enable it still costs time, though indeed less.
That's because it still creates the TRT IBuilder, which takes some time. To my knowledge, if I already have an off-the-shelf TRT engine, I only need an IRuntime.
So why doesn't onnx-trt check whether trt_engine_cache_enable is set and, if so, skip loading the IBuilder?
And about 1):
I agree it is indeed not easy.
Could you roughly describe the process for me? I'm having a bit of trouble understanding the code, so that would be greatly appreciated!
> So why doesn't onnx-trt check whether trt_engine_cache_enable is set and, if so, skip loading the IBuilder?

ORT TRT has a similar feature (starting from 1.17.0) which skips TRT builder instantiation and simply deserializes the engine cache to run inference.
However, we still need an "ONNX" model to start with. So ORT TRT helps the user create an "embed engine" model, which is basically an ONNX model containing only one node that wraps the engine cache.
Running this embed engine model skips those lengthy processes such as TRT builder instantiation.
Please see the highlighted part below for how to use the ORT TRT provider options to generate/run the embed engine model.
BTW, we are working on documenting the usage of the embed engine model.
Also note that there are constraints when using it, such as:
- The whole model should be TRT-eligible.
- It supports dynamic-shape input only when the user explicitly specifies the shape range, meaning the engine won't be rebuilt across inference runs.
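A hedged sketch of generating the embed engine model via provider options (requires ORT >= 1.17.0). The trt_dump_ep_context_model option name appears later in this thread; trt_ep_context_file_path and the file paths are assumptions for illustration:

```python
# Sketch: provider options to dump an "embed engine" (EPContext) model.
# trt_ep_context_file_path and all paths are assumed/illustrative.
trt_options = {
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "./trt_cache",
    "trt_dump_ep_context_model": True,              # write the embed engine model
    "trt_ep_context_file_path": "./model_ctx.onnx", # assumed output location
}
providers = [("TensorrtExecutionProvider", trt_options)]

# import onnxruntime as ort
# ort.InferenceSession("model.onnx", providers=providers)
# After this session, the dumped embed engine model can be loaded directly.
```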
Hi @chilo-ms,
I tried to use trt_dump_ep_context_model like the following:
But I got this error:
[ONNXRuntimeError] : 1 : FAIL : provider_options_utils.h:148 onnxruntime::ProviderOptionsParser::Parse Unknown provider option: "trt_dump_ep_context_model".
And I tried to modify the source code simply: I commented out the fields for IBuilder, INetworkDefinition, and IParser.
I found it could still work.
This is a simplified version, I know.
I will continue to debug whether this approach causes any errors. Also, I want to know: if I have a TensorRT engine in trt_engine_cache_path and trt_engine_cache_enable is set, and I do not initialize the IBuilder, is this approach correct?
> And I tried to modify the source code simply: I commented out the fields for IBuilder, INetworkDefinition, and IParser.
> I found it could still work.
> This is a simplified version, I know.
> I will continue to debug whether this approach causes any errors. Also, if I have a TensorRT engine in trt_engine_cache_path and trt_engine_cache_enable is set, and I do not initialize the IBuilder, is this approach correct?

I think if I comment out those fields for IBuilder, INetworkDefinition, and IParser, so that the outside cannot obtain the associated objects, that also proves the outside does not use those objects, right?
> Hi @chilo-ms,
> I tried to use trt_dump_ep_context_model like the following:
> But I got this error: [ONNXRuntimeError] : 1 : FAIL : provider_options_utils.h:148 onnxruntime::ProviderOptionsParser::Parse Unknown provider option: "trt_dump_ep_context_model".

What ORT version are you using?
Please use 1.17.0 or above, or the main branch.
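As a quick sanity check, the version requirement can be verified programmatically. The helper below is hypothetical, not part of the ORT API:

```python
# Hypothetical helper: trt_dump_ep_context_model requires onnxruntime >= 1.17.0.
def supports_ep_context(version: str) -> bool:
    major, minor, *_ = (int(p) for p in version.split("."))
    return (major, minor) >= (1, 17)

assert supports_ep_context("1.17.1")
assert not supports_ep_context("1.16.3")  # the version used in this thread
```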
> And I tried to modify the source code simply: I commented out the fields for IBuilder, INetworkDefinition, and IParser.
> I found it could still work.
> This is a simplified version, I know.
> I will continue to debug whether this approach causes any errors. Also, if I have a TensorRT engine in trt_engine_cache_path and trt_engine_cache_enable is set, and I do not initialize the IBuilder, is this approach correct?

Your idea is basically right.
Please see the ORT TRT code (here and here) in the main branch.
In addition to the code path (in the EP's Compile) that you found, which involves builder instantiation, there is also builder instantiation in the EP's GetCapability. That's why we need the "embed engine" model to skip builder instantiation.
Hi @chilo-ms
Thanks very much for your reply!
I will try to remove the step that creates the IBuilder when the engine has already been generated.
And about the EP GetCapability, I also have a question; here is the link:
#20029
"So that's why we need the "embed engine" model to skip builder instantiation." I do not understand why the EP GetCapability method needs to create an IBuilder object; to my knowledge, the IBuilder is used to create other TRT objects, such as the INetworkDefinition.
And if I already have a TRT engine built from the ONNX model, could I skip this step in the process?
> Hi @chilo-ms,
> I tried to use trt_dump_ep_context_model like the following:
> But I got this error: [ONNXRuntimeError] : 1 : FAIL : provider_options_utils.h:148 onnxruntime::ProviderOptionsParser::Parse Unknown provider option: "trt_dump_ep_context_model".
> What ORT version are you using? Please use 1.17.0 or above, or the main branch.

Yes, my version is 1.16.3.
At first I downloaded your 1.17.0 or 1.17.3 packages, but there were no DLLs in them.
So I used 1.16.3.
Why don't the newest packages on NuGet have the DLLs?
I will also build the DLLs from the latest code.
Use the 1.17.1 NuGet package.
There are multiple packages, i.e. Microsoft.ML.OnnxRuntime.Gpu depends on Microsoft.ML.OnnxRuntime.Gpu.Windows,
and the onnxruntime DLLs are in that package.
Hi @jywu-msft
I tried the 1.17.1 Microsoft.ML.OnnxRuntime.Gpu package, which depends on Microsoft.ML.OnnxRuntime.Gpu.Windows.
I checked the structure of the 1.17.1 package and found that the directory was "buildTransitive", not "build", which causes VS to fail to load the .props/.targets files.
I'm confused; am I missing something?
"So that's why we need the "Embed Engine" model to skip builder instantization."I do not know why the EP GetCapability method need to genearte IBuilder Object, as my knowledage, the IBuilder is used to generate some trt objects, such as the INetworkDefinition.
And if I already have a trt model from onnx, could I skip this step in process?
Because TRT parser needs TRT networks which depends on TRT builder.
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc#L2082
If you have TRT engine cache, you still need the embed engine model to skip the process for now.
Please see the embed engine model (EPContext node model) to skip the whole GetCapability.
Here are two PRs which introduces embed engine model feature.
#18217
#19154
But, we are working on another PR that can skip GetCapability without using the embed engine model but simply with engine cahce. (This is the exact feature that you want)
Also, I'm working on the document for users to better understand this feature.
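Once an embed engine model has been generated, later sessions can load it in place of the original ONNX model, skipping both GetCapability's builder instantiation and the engine build. A sketch, where "model_ctx.onnx" is an assumed filename from the earlier dump step:

```python
# Sketch: run a previously dumped embed engine (EPContext node) model.
# "model_ctx.onnx" and "./trt_cache" are assumed paths from earlier steps.
trt_options = {
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "./trt_cache",  # must still hold the engine cache
}
providers = [("TensorrtExecutionProvider", trt_options)]

# import onnxruntime as ort
# sess = ort.InferenceSession("model_ctx.onnx", providers=providers)
# The session deserializes the cached engine instead of rebuilding it.
```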