Comments (5)
@jamesharrisivi When using BetterTransformer, you already get FA2-style attention if your GPU's compute capability is high enough. Excited to see F.sdpa getting merged into the attention layers: finally a hassle-free implementation, potentially compatible with masking in training.
from optimum.
@jamesharrisivi SDPA (that can dispatch to FA2) support for BERT is being merged natively in Transformers: huggingface/transformers#28802
This should make it easy to support SDPA natively for RoBERTa there as well.
Going forward, we are prioritizing direct support of SDPA/torch.compile in Transformers and will not be adding new features to BetterTransformer. You can read more here: https://huggingface.co/docs/transformers/main/en/perf_infer_gpu_one
Ok thank you.
I see the current implementation uses https://github.com/pytorch/pytorch/blob/0f5e24bda9450a89ba56d2fdd471f56d97fe4546/aten/src/ATen/native/transformers/transformer.cpp#L75
But this doesn't use flash attention, or does it?
It does. It uses https://github.com/pytorch/pytorch/blob/0f5e24bda9450a89ba56d2fdd471f56d97fe4546/aten/src/ATen/native/transformers/cuda/attention.cu#L467; see https://github.com/pytorch/pytorch/blob/232f09e0ea7b7d9ac9752bd366aa2f4c372fd9d9/aten/src/ATen/native/native_functions.yaml#L14505
This one can dispatch to flash attention if the attn_mask argument is None (mask in the C++ code).
You can check this with the PyTorch profiler or, for example, with the context manager https://pytorch.org/docs/master/backends.html#torch.backends.cuda.sdp_kernel
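One way to run that check is sketched below, assuming the `torch.backends.cuda.sdp_kernel` API linked above is available (recent PyTorch releases deprecate it in favor of `torch.nn.attention.sdpa_kernel`). The helper `flash_only_sdpa_ok` is a hypothetical name, not part of PyTorch or Optimum. It restricts SDPA to the flash backend and reports whether the call still succeeds with and without a mask. The result depends on your PyTorch version and device, so treat it as a probe rather than a guarantee.

```python
import torch
import torch.nn.functional as F


def flash_only_sdpa_ok(q, k, v, attn_mask=None):
    """Hypothetical helper: True if SDPA runs with only the flash backend enabled."""
    try:
        # Disable the math and memory-efficient backends so that only flash can be chosen.
        with torch.backends.cuda.sdp_kernel(
            enable_flash=True, enable_math=False, enable_mem_efficient=False
        ):
            F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
        return True
    except RuntimeError:
        # Raised when no enabled backend can handle these inputs.
        return False


device = "cuda" if torch.cuda.is_available() else "cpu"
# The CUDA flash kernel expects half precision; float32 is used on CPU.
dtype = torch.float16 if device == "cuda" else torch.float32

# Shape: (batch, heads, sequence length, head dim)
q, k, v = (torch.randn(1, 8, 128, 64, device=device, dtype=dtype) for _ in range(3))
mask = torch.ones(128, 128, dtype=torch.bool, device=device)

no_mask_ok = flash_only_sdpa_ok(q, k, v)
with_mask_ok = flash_only_sdpa_ok(q, k, v, attn_mask=mask)
print(f"flash-only without mask: {no_mask_ok}, with mask: {with_mask_ok}")
```

If the unmasked call succeeds and the masked one fails, that would be consistent with the dispatch rule described above for that build.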
@michaelfeil Yes, masking works for training. We simply drop the mask when the is_causal argument of SDPA can be used: https://github.com/huggingface/transformers/blob/83e366bfd49708796e2c6461d3988d23d008502a/src/transformers/modeling_attn_mask_utils.py#L375-L380
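To illustrate why dropping the mask is safe in that case, here is a small sketch (not the Transformers code itself) comparing an explicit lower-triangular boolean mask against `is_causal=True` in `torch.nn.functional.scaled_dot_product_attention`. The tensor shapes are arbitrary choices for the example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Shape: (batch, heads, sequence length, head dim)
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

# Explicit causal mask: True marks positions that may be attended to.
causal_mask = torch.ones(4, 4, dtype=torch.bool).tril()

out_masked = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask)
out_causal = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Both calls should agree up to floating-point tolerance.
same = torch.allclose(out_masked, out_causal, atol=1e-5)
print(same)
```

Since the two calls compute the same attention, the explicit mask can be replaced by `is_causal=True`, which leaves `attn_mask` as None.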