Comments (7)
huggingface/transformers#28802 @hackyon @fxmarty
from optimum.
Yea, it looks like BetterTransformer might be expecting a different shape for the attention mask.
Can you try to use "eager" attention implementation with BetterTransformer to see if it fixes things?
from transformers import AutoModel
model = AutoModel.from_pretrained(model_name, attn_implementation="eager")
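For context on the mask-shape point above, here is a minimal pure-torch sketch (hypothetical shapes, not BetterTransformer's actual internals): fused kernels typically take a boolean padding mask per token, while eager-style attention uses an additive float mask broadcast over the score matrix. Both conventions mask the same key positions and give the same output.

```python
import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 4, 8, 16
q = k = v = torch.randn(batch, heads, seq, dim)

# Boolean padding mask: True = keep token (the shape fused paths tend to expect).
keep = torch.ones(batch, seq, dtype=torch.bool)
keep[0, 6:] = False  # pad out the last two tokens of sequence 0

# Additive float mask: 0 = keep, -inf = masked (the eager-attention convention).
additive = torch.zeros(batch, 1, 1, seq)
additive[0, :, :, 6:] = float("-inf")

out_bool = F.scaled_dot_product_attention(q, k, v, attn_mask=keep[:, None, None, :])
out_add = F.scaled_dot_product_attention(q, k, v, attn_mask=additive)
assert torch.allclose(out_bool, out_add, atol=1e-6)
```

Passing a mask in the wrong one of these two conventions is exactly the kind of mismatch that breaks a converted model.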
Thanks for the fast response.
Eager works, but it's a breaking change if you don't add it!
Do you think there is a way to patch BetterTransformer?
Yea, unfortunately I think there might be cause to put the BetterTransformer optimizations directly into transformers and deprecate BetterTransformer support for BERT. This means adding BERT here:
It might be better for you to just skip that BetterTransformer conversion.
You mentioned that BetterTransformer is still 1.5x faster - where did you get that metric?
@hackyon Might be an unusual setup, but it should give a pretty good end-to-end idea of throughput.
pip install infinity_emb[all]==0.0.42
git clone https://github.com/michaelfeil/infinity
~/infinity/libs/infinity_emb$ make benchmark_embed
BetterTransformer (eager -> torch._transformer_encoder_fwd)
infinity_emb v2 --model-id BAAI/bge-large-en-v1.5
-> Results: 35-37 sentences per second. (over 2 runs)
infinity_emb v2 --model-id BAAI/bge-small-en-v1.5
-> Results: 263-266 sentences per second. (over 2 runs)
SDPA (w/o BetterTransformer)
infinity_emb v2 --model-id BAAI/bge-large-en-v1.5 --no-bettertransformer
-> Results: 32-32 sentences per second (2 runs)
infinity_emb v2 --model-id BAAI/bge-small-en-v1.5 --no-bettertransformer
-> Results: 188-196 sentences per second (2 runs)
Result: BetterTransformer is still faster in these runs - roughly 1.1x for bge-large and 1.4x for bge-small.
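If you want a quick sanity check of the eager-vs-fused gap without the full infinity_emb harness, a self-contained microbenchmark in plain torch looks like this (a sketch; absolute numbers and speedups depend entirely on your hardware, dtype, and shapes):

```python
import time
import torch
import torch.nn.functional as F

def eager_attention(q, k, v):
    # Naive attention: explicit softmax(QK^T / sqrt(d)) @ V, as in "eager" BERT.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

def bench(fn, *args, iters=50):
    fn(*args)  # warmup
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - t0) / iters

q = k = v = torch.randn(8, 12, 128, 64)  # (batch, heads, seq, head_dim)
t_eager = bench(eager_attention, q, k, v)
t_sdpa = bench(F.scaled_dot_product_attention, q, k, v)
print(f"eager: {t_eager * 1e3:.2f} ms, sdpa: {t_sdpa * 1e3:.2f} ms")
```

The two paths agree numerically, so any throughput difference you measure comes from the kernel, not the math.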
Please don't remove the option to use BetterTransformer! I do rely on the BetterTransformer patch with BERT.
But regardless, thanks for your PR in transformers - it might save the world more energy than you will consume in your personal lifetime (GPU hours excluded), no kidding.