Comments (8)
@RYangData I also had the same problem. Try checking the available ONNX Runtime providers:
import onnxruntime
providers = onnxruntime.get_available_providers()
print(providers)
If you see CUDA there, yet still have the above error, try adjusting your CUDA and cuDNN versions to ones compatible with your ONNX Runtime version. Sometimes ONNX Runtime can't access CUDA due to a dependency incompatibility, hence the problem.
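As a quick sanity check (a minimal sketch, assuming PyTorch is also installed so its bundled CUDA version can be read), you can print the versions and compare them against the ONNX Runtime CUDA/cuDNN support matrix:
import onnxruntime
import torch

print(onnxruntime.__version__)  # ONNX Runtime version to look up in the support matrix
print(torch.version.cuda)       # CUDA version PyTorch was built against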
Moreover, if this doesn't work, uninstall onnx and the ONNX Runtime library, and then install onnxruntime-gpu first and onnx second.
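On Colab, that reinstall order would look something like this (a sketch; -y just skips the confirmation prompts):
!pip uninstall -y onnx onnxruntime
!pip install onnxruntime-gpu
!pip install onnx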
Unfortunately this is quite tricky to get right; you might want to check out this issue: microsoft/onnxruntime#7748. I wish you good luck 🍀
Hi @RYangData, thank you. Gemma ONNX support was added in #1714 and is only available from the main branch for now: pip install git+https://github.com/huggingface/optimum.git
Note: I don't know whether Colab will have enough GPU memory to export the model on GPU. You may be better off exporting on CPU first (through optimum-cli export onnx --help) and then running on GPU.
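For reference, a CPU export invocation could look something like this (a sketch; the model ID google/gemma-2b and the text-generation-with-past task are assumptions here, so check optimum-cli export onnx --help for the exact options):
optimum-cli export onnx --model google/gemma-2b --task text-generation-with-past gemma_onnx/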
Thanks - on a Colab GPU instance (T4) this is exactly what happened, so I did the below on a CPU instance:
I removed this and I think it falls back to CPU, as inference takes forever.
Also the below happens:
Thanks for your help so far!!
Are you installing onnxruntime-gpu on Colab? To be fair, I never tested ONNX Runtime on Colab with GPU. Maybe @merveenoyan, is that the same issue you had?
If you are already using onnxruntime-gpu (not onnxruntime), it could be a CUDA version mismatch between the one in Colab and the one ORT is compiled against. You could maybe try 1.16.3: https://pypi.org/project/onnxruntime-gpu/#history
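Pinning the version is a one-liner (a sketch, assuming a plain pip install is enough for your CUDA setup):
!pip install onnxruntime-gpu==1.16.3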
Hey @merveenoyan, thanks for the assistance.
I looked at https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to find the compatible onnxruntime-gpu. I checked Colab and found CUDA 12.2, so I installed onnxruntime-gpu as:
!pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
!pip install onnx
!pip install git+https://github.com/huggingface/optimum.git
I now get "CUDAExecutionProvider" as an option when I run:
import onnxruntime
providers = onnxruntime.get_available_providers()
print(providers)
#['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'AzureExecutionProvider', 'CPUExecutionProvider']
I try to load the model with:
ONNX_MODEL_DIR = "/content/drive/MyDrive/2024_01_Illum_LLM_Jan_v1.0/onnx/model_3_25_02_24_512_1999_10_epochs_torch"
ort_model = ORTModelForCausalLM.from_pretrained(
ONNX_MODEL_DIR,
provider="CUDAExecutionProvider",
)
So instead I run:
ort_model = ORTModelForCausalLM.from_pretrained(
ONNX_MODEL_DIR,
use_cache=False,
use_io_binding=False,
# export=True,
provider="CUDAExecutionProvider",
)
ort_model.device
#device(type='cpu')
ort_model.to("cuda")
# use_io_binding was set to False, setting it to True because it can provide a huge speedup on GPUs. It is possible to disable this feature manually by setting the use_io_binding attribute back to False
# <optimum.onnxruntime.modeling_decoder.ORTModelForCausalLM at 0x7c7de8faf1c0>
ort_model.device
# device(type='cuda', index=0)
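One sanity check here (a minimal sketch, assuming the exported file is named model.onnx inside ONNX_MODEL_DIR): open a raw InferenceSession and inspect which providers actually initialized, since ONNX Runtime silently falls back to CPU when the CUDA provider fails to load:
import onnxruntime

sess = onnxruntime.InferenceSession(
    f"{ONNX_MODEL_DIR}/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # if CUDAExecutionProvider is missing, it failed to initialize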
Finally I try to use the model for inference:
inputs = tokenizer([example_article], return_tensors="pt").to(ort_model.device)
outputs = ort_model.generate(**inputs, max_length=1000)
In which I'm met with this error:
It seems like the model isn't being loaded onto the GPU.
Any additional help would be appreciated!
@RYangData can you try the following snippet (it uses a small model instead):
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer
model = ORTModelForCausalLM.from_pretrained("fxmarty/tiny-random-GemmaForCausalLM", export=True, provider="CUDAExecutionProvider")
tokenizer = AutoTokenizer.from_pretrained("fxmarty/tiny-random-GemmaForCausalLM")
inp = tokenizer("Today I am in Paris and", return_tensors="pt").to("cuda")
res = model.generate(**inp, max_new_tokens=10)
The GPU is rightfully used there and I don't hit any error.
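For completeness, the generated ids can be decoded back to text with the standard transformers batch_decode (a minimal sketch):
print(tokenizer.batch_decode(res, skip_special_tokens=True))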
@fxmarty this is what I get at the moment. Thanks for the help so far though!