Comments (6)
Hello, give the following commands a try:
cd onnxruntime/onnxruntime/python/tools/transformers/
python3 optimizer.py --input /path/to/<filename>.onnx --output /path/to/<filename>.onnx --model_type gpt2 --num_heads <number of attention heads> --hidden_size <attention hidden size> --use_external_data_format --opt_level 0
You can also try the convert_to_onnx tool for Llama, which converts and optimizes the model in a single script.
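For reference, the --num_heads and --hidden_size values usually come from the model's Hugging Face config.json (as num_attention_heads and hidden_size). A minimal sketch for reading them, assuming those standard key names (the helper name is hypothetical):

```python
import json

# Hypothetical helper: pull the optimizer.py arguments out of a
# Hugging Face config.json. The key names below are the common
# transformer conventions and may differ for some architectures.
def read_opt_args(config_path):
    with open(config_path) as f:
        cfg = json.load(f)
    return cfg["num_attention_heads"], cfg["hidden_size"]
```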
Thanks @kunal-vaishnavi for the suggestions :)
from onnxruntime.
Could you provide the stack trace for protobuf serialization error?
Traceback (most recent call last):
  File "/scratch/tuhinp/onnxruntime/onnxruntime/python/tools/transformers/optimizer.py", line 610, in <module>
    main()
  File "/scratch/tuhinp/onnxruntime/onnxruntime/python/tools/transformers/optimizer.py", line 573, in main
    optimizer = optimize_model(
  File "/scratch/tuhinp/onnxruntime/onnxruntime/python/tools/transformers/optimizer.py", line 379, in optimize_model
    temp_model_path = optimize_by_onnxruntime(
  File "/scratch/tuhinp/onnxruntime/onnxruntime/python/tools/transformers/optimizer.py", line 204, in optimize_by_onnxruntime
    onnxruntime.InferenceSession(onnx_model, sess_options, providers=providers, **kwargs)
  File "/scratch/tuhinp/miniconda3/envs/x/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/scratch/tuhinp/miniconda3/envs/x/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 463, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Protobuf serialization failed.
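As a side note, one common cause of an INVALID_PROTOBUF error like the one above is a single .onnx file that exceeds protobuf's 2 GB message limit when external data is not used. A minimal sketch of a size check (the function and path handling are illustrative, not part of optimizer.py):

```python
import os

# protobuf caps a single serialized message at 2 GB; ONNX models larger
# than this must store their weights externally (--use_external_data_format).
TWO_GB = 2 * 1024 ** 3

def exceeds_protobuf_limit(model_path):
    # Returns True when the main .onnx file alone is too large
    # to be parsed as a single protobuf message.
    return os.path.getsize(model_path) >= TWO_GB
```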
Thanks @carzh and @kunal-vaishnavi for the suggestions. This command works for CodeLlama.
I tried to optimize the Qwen/Qwen1.5-7B-Chat ONNX model with the same optimizer.py script, but I am getting a "Segmentation fault".
I used the same command as mentioned above:
python3 optimizer.py --input /path/to/<filename>.onnx --output /path/to/<filename>.onnx --model_type gpt2 --num_heads <number of attention heads> --hidden_size <attention hidden size> --use_external_data_format --opt_level 0
Can you please help me with this?
Can you clone ORT from the main branch and try again? I can run the ORT transformer optimizer successfully with the following steps.
git clone https://github.com/microsoft/onnxruntime
cd onnxruntime/onnxruntime/python/tools/transformers
optimum-cli export onnx --model Qwen/Qwen1.5-7B-Chat ./qwen1.5 --no-post-process
mkdir -p ./qwen1.5-opt
python3 optimizer.py --input ./qwen1.5/model.onnx --output ./qwen1.5-opt/model_opt.onnx --model_type gpt2 --num_heads 32 --hidden_size 4096 --use_external_data_format --opt_level 0
You can also run onnx.checker.check_model to get more information on the nature of the protobuf issue.