
Comments (21)

doloresgarcia commented on June 25, 2024

Hi @justinchuby, I am wondering if this could be related to the introduction of the new transformations in 124160. Do you think that could be the case? (Sorry to bother you again.)

from onnxruntime.

doloresgarcia commented on June 25, 2024

I have optimized the model and now I can start the inference session and run it. Thank you @yuslepukhin and @justinchuby :)

Awesome! Curious what was done?

The graph had many constants that were created by the model inside functions; I initialized those with the model instead. There were also conversion errors, for example:
x[..., index_list] is not converted well and has to be rewritten with torch.index_select.
However, operations like einsum do not seem to handle dynamic input shapes (this is a GNN-like architecture), so that is problematic.
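For reference, the indexing rewrite can be sketched like this (tensor shapes and names are illustrative, not from the model):

```python
import torch

x = torch.randn(2, 3, 5)
index_list = torch.tensor([4, 0, 2])

# Advanced indexing on the last axis -- the form that exported poorly:
y_fancy = x[..., index_list]

# Equivalent torch.index_select form, which maps to a plain ONNX Gather:
y_select = torch.index_select(x, dim=-1, index=index_list)

assert torch.equal(y_fancy, y_select)
```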


doloresgarcia commented on June 25, 2024

Hello @doloresgarcia, I am trying to convert, save, load, and run a custom PyTorch model via ONNX Runtime. However, as in your case, the run gets stuck and I get no clear error messages besides "UnsqueezeElimination cannot remove node _inlfunc_aten_mean_dim_n1" and "UnsqueezeElimination cannot remove node _inlfunc_aten_mean_dim_token_14647_n1". If I turn off the optimization, I get no error message and the process gets killed after a while. Can you give some guidance on what exactly you did, besides the torch.index_select change, to get the model working with onnxruntime? That would be of great help! Thank you.

I am using torch.onnx.dynamo_export, which seems to support more complex models than torch.onnx.export. I also disabled the graph optimization as you say:
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
but my error messages appeared even after disabling it, so I guess it must be something different.


phierhager commented on June 25, 2024

Hello @doloresgarcia, I am trying to convert, save, load, and run a custom PyTorch model via ONNX Runtime. However, as in your case, the run gets stuck and I get no clear error messages besides "UnsqueezeElimination cannot remove node _inlfunc_aten_mean_dim_n1" and "UnsqueezeElimination cannot remove node _inlfunc_aten_mean_dim_token_14647_n1". If I turn off the optimization, I get no error message and the process gets killed after a while. Can you give some guidance on what exactly you did, besides the torch.index_select change, to get the model working with onnxruntime? That would be of great help! Thank you.

I am using torch.onnx.dynamo_export, which seems to support more complex models than torch.onnx.export. I also disabled the graph optimization as you say (so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL), but my error messages appeared even after disabling it, so I guess it must be something different.

Okay, thank you for the quick reply.


justinchuby commented on June 25, 2024

Hi @justinchuby, I am wondering if this could be related to the introduction of the new transformations in 124160. Do you think that could be the case? (Sorry to bother you again.)

I suspect there may be another cause. Could you test with the latest ONNX Runtime release to see if it is still an issue?


doloresgarcia commented on June 25, 2024

Thanks for checking, @justinchuby! I have now tested with 1.17.3 and it is still the case :/


justinchuby commented on June 25, 2024

Is the model open source? Could you share its source code?


justinchuby commented on June 25, 2024

Please try the following:

Set the env var TORCHLIB_EXPERIMENTAL_PREFER_TRACING=1 before running the PyTorch export script to get the model, then inline the model with:

import onnx
import onnx.inliner

model_proto = onnx.load("model.onnx")
inlined = onnx.inliner.inline_local_functions(model_proto)
onnx.save(inlined, "model_inlined.onnx")

Not guaranteed to succeed, but I am curious whether that would help.
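The environment variable can also be set from the export script itself, as long as it happens early enough (before the export runs, and ideally before the exporter machinery is imported) for torchlib to pick it up:

```python
import os

# Set before torch.onnx.dynamo_export is invoked so that torchlib
# prefers traced function bodies over nested local functions.
os.environ["TORCHLIB_EXPERIMENTAL_PREFER_TRACING"] = "1"
```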


yuslepukhin commented on June 25, 2024

Try different optimization levels and see if this affects the outcome.


justinchuby commented on June 25, 2024

Some observations: the model has ~350k nodes.


doloresgarcia commented on June 25, 2024

Is the model open source? Could you share its source code?

The model is an adaptation of GATr (just removing the torch._VF einsums so that it is ONNX-exportable):
https://github.com/Qualcomm-AI-research/geometric-algebra-transformer/blob/main/gatr/nets/gatr.py
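As a sketch of that substitution (shapes illustrative): the internal torch._VF.einsum entry point, which the exporter does not understand, can usually be replaced one-for-one by the public torch.einsum.

```python
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)

# Public torch.einsum instead of the internal torch._VF.einsum call;
# the equation string and operands stay exactly the same.
out = torch.einsum("ij,jk->ik", a, b)

assert torch.allclose(out, a @ b)
```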


doloresgarcia commented on June 25, 2024

Please try the following:

Set the env var TORCHLIB_EXPERIMENTAL_PREFER_TRACING=1 before running the PyTorch export script to get the model, then inline the model with:

import onnx
import onnx.inliner

model_proto = onnx.load("model.onnx")
inlined = onnx.inliner.inline_local_functions(model_proto)
onnx.save(inlined, "model_inlined.onnx")

Not guaranteed to succeed, but I am curious whether that would help.

This code runs and returns the inlined model. The InferenceSession log now shows an error:

2024-04-18 23:22:23.255135644 [W:onnxruntime:, constant_folding.cc:212 ApplyImpl] Could not find a CPU kernel and hence can't constant fold CastLike node 'n1__11634_2008'
2024-04-18 23:22:23.255240965 [W:onnxruntime:, constant_folding.cc:212 ApplyImpl] Could not find a CPU kernel and hence can't constant fold CastLike node 'n1__11602_1985'
sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (n0__11868) Op (Mul) [ShapeInferenceError] Incompatible dimensions


doloresgarcia commented on June 25, 2024

Some observations: the model has ~350k nodes.

Would this alone make the inference session take too long to start, or not start at all?


doloresgarcia commented on June 25, 2024

Try different optimization levels and see if this affects the outcome.

Thanks for the reply, @yuslepukhin.
With ort.GraphOptimizationLevel.ORT_DISABLE_ALL it initializes the session (after 3 hours).
Then there is also a bug on shapes:
Status Message: updates tensor should have shape equal to indices.shape[:-1] + data.shape[indices.shape[-1]:]. updates shape: {}, indices shape: {3,1}, data shape: {4,4}

What is the correct way to debug this? I have no information about where to look for this operation in the original code. I am assuming this is a conversion error.
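That message is ONNX's ScatterND shape rule failing: with indices of shape {3,1} and data of shape {4,4}, the updates tensor must have shape (3,) + (4,) = (3,4), but the exported graph is feeding a scalar, so some slicing-assignment is being converted with the wrong updates rank. The rule and the reference semantics can be checked numerically (toy data, not from the model):

```python
import numpy as np

data = np.zeros((4, 4))
indices = np.array([[0], [2], [3]])  # shape (3, 1): three row indices
updates = np.ones((3, 4))            # required shape, not the scalar in the error

# ScatterND requires: updates.shape == indices.shape[:-1] + data.shape[k:]
k = indices.shape[-1]
assert updates.shape == indices.shape[:-1] + data.shape[k:]

# Reference semantics: write each updates row at its index.
out = data.copy()
for idx, upd in zip(indices, updates):
    out[tuple(idx)] = upd
```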


yuslepukhin commented on June 25, 2024

Some observations: the model has ~350k nodes.

Would this alone make the inference session take too long to start, or not start at all?

The model inlining takes a lot of time. Stand by.
How exactly was the conversion performed?


doloresgarcia commented on June 25, 2024

I have optimized the model and now I can start the inference session and run it. Thank you @yuslepukhin and @justinchuby :)


justinchuby commented on June 25, 2024

I have optimized the model and now I can start the inference session and run it. Thank you @yuslepukhin and @justinchuby :)

Awesome! Curious what was done?


yuslepukhin commented on June 25, 2024

The initial model fails the ONNX check. This is from the ORT-optimized model (inlining only):

Graph must be in single static assignment (SSA) form, however '_inlfunc_IsScalar_tmp' has been used as output names multiple times.

==> Context: Bad node spec for node. Name: _inlfunc_aten_mean_dim_n1 OpType: If


justinchuby commented on June 25, 2024

initialized those with the model instead

Do you mean turning Constant operators into graph initializers?

einsum do not seem to be dynamic with input shape

Could you share a concrete example?


phierhager commented on June 25, 2024

Hello @doloresgarcia,
I am trying to convert, save, load, and run a custom PyTorch model via ONNX Runtime. However, as in your case, the run gets stuck and I get no clear error messages besides "UnsqueezeElimination cannot remove node _inlfunc_aten_mean_dim_n1" and "UnsqueezeElimination cannot remove node _inlfunc_aten_mean_dim_token_14647_n1". If I turn off the optimization, I get no error message and the process gets killed after a while.
Can you give some guidance on what exactly you did, besides the torch.index_select change, to get the model working with onnxruntime? That would be of great help!
Thank you.


doloresgarcia commented on June 25, 2024

initialized those with the model instead

Do you mean turning Constant operators into graph initializers?

einsum do not seem to be dynamic with input shape

Could you share a concrete example?

I mean just matrices that were created inside functions and used in many layers. The solution was to add those as arguments of the main model class and pass them to the layers. This reduced the time to start inference, and now it works quickly.
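In PyTorch terms, one way to express that restructuring (a hedged sketch, not the actual GATr code): build the shared matrices once, store them on the top-level module, and pass them down, instead of recreating them inside each layer's forward.

```python
import torch
import torch.nn as nn

class Layer(nn.Module):
    def forward(self, x, basis):
        # `basis` is supplied by the caller instead of being rebuilt here,
        # so the exporter sees one shared tensor rather than a Constant
        # recreated in every layer.
        return x @ basis

class Model(nn.Module):
    def __init__(self, dim=4, depth=3):
        super().__init__()
        # Built once and registered on the module; exported as an initializer.
        self.register_buffer("basis", torch.eye(dim))
        self.layers = nn.ModuleList(Layer() for _ in range(depth))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x, self.basis)
        return x

m = Model()
y = m(torch.ones(2, 4))
```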

