Comments (7)
huggingface/transformers#28802 @hackyon @fxmarty
from optimum.
Yea, it looks like BetterTransformer might be expecting a different shape for the attention mask.
Can you try to use "eager" attention implementation with BetterTransformer to see if it fixes things?
from transformers import AutoModel
model = AutoModel.from_pretrained(model_name, attn_implementation="eager")
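For context on the mask-shape point above, here is a minimal pure-torch sketch (hypothetical shapes, not BetterTransformer's actual internals): fused kernels typically take a boolean padding mask per token, while eager-style attention uses an additive float mask broadcast over the score matrix. Both conventions mask the same key positions and give the same output.

```python
import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 4, 8, 16
q = k = v = torch.randn(batch, heads, seq, dim)

# Boolean padding mask: True = keep token (the shape fused paths tend to expect).
keep = torch.ones(batch, seq, dtype=torch.bool)
keep[0, 6:] = False  # pad out the last two tokens of sequence 0

# Additive float mask: 0 = keep, -inf = masked (the eager-attention convention).
additive = torch.zeros(batch, 1, 1, seq)
additive[0, :, :, 6:] = float("-inf")

out_bool = F.scaled_dot_product_attention(q, k, v, attn_mask=keep[:, None, None, :])
out_add = F.scaled_dot_product_attention(q, k, v, attn_mask=additive)
assert torch.allclose(out_bool, out_add, atol=1e-6)
```

Passing a mask in the wrong one of these two conventions is exactly the kind of mismatch that breaks a converted model.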
Thanks for the fast response.
Eager works, but it's a breaking change if you don't add it!
Do you think there is a way to patch BetterTransformer?
Yea, unfortunately I think there might be cause to put the BetterTransformer optimizations directly into transformers and deprecate BetterTransformer support for BERT. This means adding BERT here:
It might be better for you to just skip that BetterTransformer conversion.
You mentioned that BetterTransformer is still 1.5x faster - where did you get that metric?
@hackyon Might be an unusual setup, but it should give a pretty good end-to-end idea of throughput.
pip install infinity_emb[all]==0.0.42
git clone https://github.com/michaelfeil/infinity
~/infinity/libs/infinity_emb$ make benchmark_embed
BetterTransformer (eager -> torch._transformer_encoder_fwd)
infinity_emb v2 --model-id BAAI/bge-large-en-v1.5
-> Results: 35-37 sentences per second. (over 2 runs)
infinity_emb v2 --model-id BAAI/bge-small-en-v1.5
-> Results: 263-266 sentences per second. (over 2 runs)
SDPA (w/o BetterTransformer)
infinity_emb v2 --model-id BAAI/bge-large-en-v1.5 --no-bettertransformer
-> Results: 32-32 sentences per second (2 runs)
infinity_emb v2 --model-id BAAI/bge-small-en-v1.5 --no-bettertransformer
-> Results: 188-196 sentences per second (2 runs)
Result: BetterTransformer is still faster in these runs - roughly 1.1x for bge-large and 1.4x for bge-small.
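If you want a quick sanity check of the eager-vs-fused gap without the full infinity_emb harness, a self-contained microbenchmark in plain torch looks like this (a sketch; absolute numbers and speedups depend entirely on your hardware, dtype, and shapes):

```python
import time
import torch
import torch.nn.functional as F

def eager_attention(q, k, v):
    # Naive attention: explicit softmax(QK^T / sqrt(d)) @ V, as in "eager" BERT.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

def bench(fn, *args, iters=50):
    fn(*args)  # warmup
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - t0) / iters

q = k = v = torch.randn(8, 12, 128, 64)  # (batch, heads, seq, head_dim)
t_eager = bench(eager_attention, q, k, v)
t_sdpa = bench(F.scaled_dot_product_attention, q, k, v)
print(f"eager: {t_eager * 1e3:.2f} ms, sdpa: {t_sdpa * 1e3:.2f} ms")
```

The two paths agree numerically, so any throughput difference you measure comes from the kernel, not the math.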
Please don't remove the option to use BetterTransformer! I do rely on the BetterTransformer patch with BERT.
But regardless, thanks for your PR in transformers - it might save the world more energy than you will consume in your personal lifetime (GPU hours excluded), no kidding.