Comments (4)
Hi @ingo-m, thank you for the report.
Locally, how did you install onnxruntime-gpu? The wheel hosted on the PyPI index is built for CUDA 11.8. https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html gives instructions on how to install the ORT CUDA EP for CUDA 12.1.
Not sure it will work, but you can also try export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/miniconda3/envs/py-onnx/lib/python3.10/site-packages/nvidia/cublas/lib
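If the CUDA libraries shipped in the PyPI nvidia-* packages are the missing piece, extending the loader path is a quick thing to test. A minimal sketch, keeping the placeholder path from above (substitute your own environment's site-packages location):

```shell
# Placeholder path from the comment above; adjust to your environment.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/path/to/miniconda3/envs/py-onnx/lib/python3.10/site-packages/nvidia/cublas/lib"

# Confirm the directory was appended (prints the last path component).
echo "${LD_LIBRARY_PATH##*:}"
```

Note the lack of spaces around `=`: `export VAR = value` is a syntax error in POSIX shells, so the assignment must be written without them.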
Regarding the
RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]
I'm not sure yet; I will investigate.
@ingo-m I cannot reproduce the issue with:
import torch
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = "bigscience/bloomz-560m"
device_name = "cuda"

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
ort_model = ORTModelForCausalLM.from_pretrained(
    base_model_name,
    use_io_binding=True,
    export=True,
    provider="CUDAExecutionProvider",
)

prompt = "i like pancakes"
inference_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(
    device_name
)

# Try to generate a prediction (fails in the original report).
output_ids = ort_model.generate(
    input_ids=inference_ids["input_ids"],
    attention_mask=inference_ids["attention_mask"],
    max_new_tokens=512,
    temperature=1e-8,
    do_sample=True,
)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
with CUDA 11.8, torch==2.1.2+cu118, optimum==1.16.2, onnxruntime-gpu==1.17.0, onnx==1.15.0.
@fxmarty thanks for looking into it.
Locally, I installed directly from PyPI (with pipenv). In other words, I did not follow the specific instructions for CUDA 12, so that explains the problem. (However, it's strange that I had no problems with CUDA 12 when I was still using the older version optimum[onnxruntime-gpu]==1.9.1 🤔).
On Google Colab, !nvidia-smi reveals that it is using CUDA 12 as well (this is a free-tier Colab instance):
Mon Feb  5 12:59:37 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   61C    P8              10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
As you said, it looks like CUDA 12 is the culprit.
Regarding this error:
RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]
Perhaps the ORTModelForCausalLM model was not placed on the GPU for inference (because the CUDAExecutionProvider didn't work due to the CUDA 12 issue), while the input tokens were placed on the GPU, and the error then occurs because the model and the tokens are not on the same device?
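One way to guard against exactly that mismatch is to check which execution providers actually loaded before moving inputs to the GPU: onnxruntime silently falls back to CPUExecutionProvider when the CUDA EP cannot load its libraries. A minimal sketch under that assumption (the helper name is hypothetical; with a raw onnxruntime session the active list comes from InferenceSession.get_providers()):

```python
def pick_input_device(active_providers: list[str]) -> str:
    """Choose where to place input tensors so they match the session's device.

    If the CUDA execution provider failed to load (e.g. a CUDA 11.8 vs 12.x
    mismatch), onnxruntime silently falls back to CPU; placing the inputs on
    "cuda" then triggers the "no data transfer registered" binding error.
    """
    return "cuda" if "CUDAExecutionProvider" in active_providers else "cpu"


# With a real session: device = pick_input_device(session.get_providers())
print(pick_input_device(["CPUExecutionProvider"]))  # → cpu
```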