Comments (5)
No, it is not that the masks are broken; there is simply no such thing as a "mask with all false" entries. Attention with softmax removes masked entries by replacing their scores with a very small value. When everything is masked, every entry gets that same small score, so the softmax yields a uniform distribution and every position is attended to uniformly. The two lines give the same result because a randomly initialized Transformer usually attends roughly uniformly anyway. If you really need that behavior, you can use the components in NeuralAttentionlib to create a new attention operator and call the Transformer constructor with that operator.
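A minimal NumPy sketch (not the NeuralAttentionlib implementation) of why an all-false mask yields uniform attention rather than "no attention":

```python
import numpy as np

def masked_softmax(scores, mask, neg=-1e9):
    # The usual attention masking trick: masked-out entries are
    # replaced with a very small value before the softmax.
    s = np.where(mask, scores, neg)
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.random.randn(4)          # one query attending over 4 keys
all_false = np.zeros(4, dtype=bool)  # the "mask with all false"

weights = masked_softmax(scores, all_false)
print(weights)  # -> [0.25 0.25 0.25 0.25]: uniform, not zero
```

Every score collapses to the same small value, so the softmax has nothing to distinguish and distributes the probability mass evenly.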
from transformers.jl.
The mask with all zeros is just an illustrative example to show the issue with the mask application logic. You can try other kinds of masks and verify that the results are not as expected.
There are tests that make sure the masks work as expected, and many models won't work if the CausalMask isn't applied. Another explanation for your observation is the uniform attention score caused by the random initialization. You can try changing the distribution of the weights and rerunning the model to see whether the values are still the same. If you are sure the mask is broken, please provide an MWE with non-trivial masks (at least not a mask with all zeros, as explained above).
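A hedged sketch of the kind of MWE being asked for, using a padding-style mask that blocks only some keys (NumPy stand-in, not the library's API), so the expected non-uniform result is easy to check:

```python
import numpy as np

def masked_softmax(scores, mask, neg=-1e9):
    # Standard masking trick: push masked scores toward -inf.
    s = np.where(mask, scores, neg)
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([0.5, 1.0, -0.3, 2.0])
pad_mask = np.array([True, True, True, False])  # last key is padding

w = masked_softmax(scores, pad_mask)
# The masked key should receive ~zero weight, and the remaining
# weights should renormalize over the unmasked keys.
assert w[-1] < 1e-6
assert not np.allclose(w, 0.25)  # clearly non-uniform, unlike all-false
```

A mask like this exercises the application logic without collapsing into the degenerate all-masked case.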
There are no tests that CausalMask actually works correctly, neither here nor in NeuralAttentionlib. NeuralAttentionlib does test that CausalMask returns the correct values, but that is completely separate from determining whether CausalMask actually works as intended during the transformer attention update.
Note that it is quite possible for the CausalMask to fail to properly mask decoder tokens and yet for the huggingface examples to still work.
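One way such an end-to-end check could look, sketched in NumPy on a toy single-head attention rather than against Transformers.jl itself: perturb a future token and assert that earlier positions' outputs are unchanged, which is exactly the property a causal mask must enforce.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 5
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def causal_attention(x):
    # Toy single-head attention with a lower-triangular causal mask.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    causal = np.tril(np.ones((n, n), dtype=bool))  # position i sees keys <= i
    scores = np.where(causal, scores, -1e9)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (e / e.sum(axis=-1, keepdims=True)) @ v

x = rng.standard_normal((n, d))
y1 = causal_attention(x)

x2 = x.copy()
x2[-1] += 10.0  # perturb only the last (future) token
y2 = causal_attention(x2)

# If the masking is truly causal, outputs at earlier positions
# must not depend on the perturbed future token.
assert np.allclose(y1[:-1], y2[:-1])
```

The same perturbation test could in principle be run against the library's own attention operator, which is what the value-level tests on CausalMask do not cover.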
Note that it is quite possible for the CausalMask to fail to properly mask decoder tokens and yet for the huggingface examples to still work.
Again, you'll need to provide an example to make this discussion concrete.
The supported models are tested against the huggingface transformers implementation with the validation code in the example folder, and we do observe that missing masks have a huge impact on the numerical outputs.