Comments (2)
Hi @dcaballe, would you be interested in taking a look at this?
(Sorry, catching up after break)
I think the problem could be related to the fact that we are reducing the outer dimension here:
%7 = vector.contract {indexing_maps = [#map, #map, #map1], iterator_types = ["reduction", "parallel"], kind = #vector.kind<add>} , %6, %arg1 : vector<1x32xf32>, vector<1x32xf32> into vector<32xf32>
and the outer-product strategy expects a very specific "matmul-like" contraction. If the contraction doesn't match that form, the lowering converts it to the "matmul-like" form by changing the layout of the inputs, and we may end up vectorizing a dimension that we shouldn't (see the scalar code).
I would need to think a bit more about this, but perhaps we should fall back to a different strategy only when the contraction op is not suitable to be represented as an outer product.
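To make the shape of that contraction concrete, here is a rough scalar sketch in plain Python of what I understand the op to compute. It assumes `#map` is `(d0, d1) -> (d0, d1)` and `#map1` is `(d0, d1) -> (d1)`, which the snippet above does not show, and the function name and input values are made up for illustration. It only models the semantics, not what the lowering does.

```python
def contract_outer_reduction(lhs, rhs, acc):
    """Scalar model: out[j] = acc[j] + sum over i of lhs[i][j] * rhs[i][j].

    Assumed maps (not shown in the original snippet):
      lhs, rhs: (d0, d1) -> (d0, d1); acc/result: (d0, d1) -> (d1).
    d0 (i) is the reduction dim, d1 (j) is the parallel dim.
    """
    rows, cols = len(lhs), len(lhs[0])
    out = list(acc)
    for i in range(rows):        # outer dim: reduction (size 1 for 1x32 inputs)
        for j in range(cols):    # inner dim: parallel, 32 independent lanes
            out[j] += lhs[i][j] * rhs[i][j]
    return out


# Hypothetical inputs with the same shapes as the op: 1x32, 1x32 into 32.
lhs = [[float(j) for j in range(32)]]
rhs = [[2.0] * 32]
acc = [1.0] * 32
result = contract_outer_reduction(lhs, rhs, acc)
```

With a reduction dimension of size 1, this collapses to an elementwise multiply-add over the 32 parallel lanes, so there is no real matmul structure for an outer product to exploit.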