Llama-2-7b is failing with bfloat16 export with onnx (optimum, open)

anilmartha commented on July 24, 2024

Exporting Llama-2-7b to ONNX in bfloat16 fails.

Comments (1)

fxmarty commented on July 24, 2024

@anilmartha thank you for the report, this is unexpected. I have not set up a full CI for bf16, but should probably add one covering the most used models.

It appears that PyTorch 2.2.1 does not support scaled_dot_product_attention export in BF16, see https://github.com/pytorch/pytorch/blob/v2.2.1/torch/onnx/symbolic_opset14.py#L191-L199

This is fixed on main: https://github.com/pytorch/pytorch/blob/f4cf25bb24be735b2502ae13f290017992c2fac8/torch/onnx/symbolic_opset14.py#L194 & pytorch/pytorch#117878

So in PyTorch 2.3, this will be possible.

We could add an option in Optimum to run the export with the manual attention implementation instead of torch.nn.functional.scaled_dot_product_attention. Would that help you?

As an alternative, you could downgrade to torch==2.1.0 (for which SDPA is not picked in Transformers).
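
To make the version reasoning above concrete, here is a minimal sketch that picks an attention implementation from a PyTorch version string. The helper names (`parse_torch_version`, `pick_attention_for_bf16_export`) are hypothetical and not part of Optimum or Transformers. The 2.3 threshold is taken from the comment above, and the returned strings `"sdpa"` and `"eager"` are assumed to be passed as Transformers' `attn_implementation` argument when loading the model.

```python
import re
from typing import Tuple


def parse_torch_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a string such as '2.2.1+cu121'.

    Local and pre-release suffixes ('+cu121', '.dev2024...') are ignored,
    and a missing patch component is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def pick_attention_for_bf16_export(torch_version: str) -> str:
    """Hypothetical helper: choose an attention implementation for BF16 ONNX export.

    Assumption (from the thread): exporting scaled_dot_product_attention in
    BF16 only works from PyTorch 2.3 onward. Older versions fall back to the
    manual ("eager") attention implementation.
    """
    if parse_torch_version(torch_version) >= (2, 3, 0):
        return "sdpa"
    return "eager"


if __name__ == "__main__":
    for v in ("2.1.0", "2.2.1", "2.3.0+cu121"):
        print(v, "->", pick_attention_for_bf16_export(v))
```

In practice the version string would come from `torch.__version__`. Falling back to `"eager"` on every version below 2.3 is deliberately conservative: it also covers torch==2.1.0, where SDPA would not be picked anyway.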
