Comments (5)
Yeah, and it's always possible to just form your dispatch regions at that point instead. My current mental model for where util.scope would be useful is "I want this layer of the model, composed of dozens to hundreds of linalg ops, to run on this device/with these assumptions/etc" and then having dispatch region formation run and produce one or more dispatches that are assigned to that device. If it's an individual dispatch/extern/etc, then it's best to just form that dispatch directly.
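A rough sketch of that mental model, hedged heavily: util.scope is only proposed here, and the exact syntax, the attribute name, and the device reference below are all assumptions.

```mlir
// Hypothetical: scope a whole layer's worth of linalg ops to one device.
// Dispatch region formation would later carve this region into one or
// more dispatches, each inheriting the scope's device assignment.
%layer_out = util.scope(%layer_in) attributes {
  // Assumed attribute; per-scope assumptions could ride along here too.
  stream.affinity = #hal.device.affinity<@device_a>
} {
  %0 = linalg.matmul ...
  %1 = linalg.generic ...   // dozens to hundreds of ops like these
  ...
} -> tensor<...>
```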
from iree.
We could do this with an op interface so that we can have dialect-specific ops/syntactic sugar - like a hal.device.scope or something - but with external interfaces we could also make a single util.scope op implement dialect-specific interfaces.
I think it could be useful to have the op interface because we could combine it with other interfaces, like having an IsolatedFromAbove version without needing to clone the op. What I'm thinking of is something like:
func.func @my_kernel {
  %in_binding = ...
  %out_binding = ...
  %0 = codegen.scope.tu(%in_binding) {
    linalg.matmul
    ...
  } -> tensor<...>
  %1 = codegen.scope.tu(%0, %out_binding) {
    linalg.reduce
    ...
  }
}
The idea being, if I'm given a function to generate code for, I might want to use different strategies/pipelines for different portions of the function, but dialect conversions like bufferization will need to convert the whole function all at once (and might need similar operand tying to avoid extra allocations). Having an isolated scope lets us nest the pipeline on the scope without having to outline to a different function (and having to create another temporary module or something).
Not 100% sure this is the direction we'll want to go for codegen, but similar ideas have been thrown around a few times, so just food for thought.
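To make the operand-tying point above concrete, here is a hedged sketch of what the second scope might look like after bufferization if its result were tied to %out_binding; the tied-operand behavior and the memref names are assumptions, not an existing lowering.

```mlir
// Hypothetical post-bufferization form: because the scope's result is
// tied to %out_binding, the region can write directly into that
// binding's buffer instead of allocating a fresh result buffer.
codegen.scope.tu(%0, %out_binding) {
  linalg.reduce ins(%0_memref : memref<...>) outs(%out_memref : memref<...>)
  ...
}
```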
from iree.
Good point! My gut is that we'll want something else for codegen so it can implement interfaces that make sense to it (bufferization/etc) - we don't tend to use tied operands or shape-aware ops in there (just on the host/prior to dispatch region formation). If such a codegen scope op existed earlier in the pipeline (prior to dispatch region formation), it'd probably be best to have it be distinct from whatever lower-level one would support memrefs/tiling/etc (if it even exists at that point) so that fusion/formation can handle it properly on tensors. I'd be interested to know if you can think of cases where we'd want a codegen-specific op early on that the flow/stream level needs to be aware of that isn't just setting the HAL target/etc? (gathering use cases)
from iree.
> good point! my gut is that we'll want something else for codegen so it can implement interfaces that make sense to it (bufferization/etc) - we don't tend to use tied operands or shape-aware ops in there (just on host/prior to dispatch region formation). if such a codegen scope op existed earlier in the pipeline (prior to dispatch region formation) it'd probably be best to have it be distinct from whatever lower-level one would support memrefs/tiling/etc (if it even exists at that point) so that fusion/formation can handle it properly on tensors. I'd be interested to know if you could think of cases where we'd want a codegen-specific op early on that the flow/stream level needs to be aware of that isn't just setting the HAL target/etc? (gathering use cases)
Yeah, something else for codegen is the key here, but I guess I'd need to see the interface for util.scope and/or work out the details to know whether this would be the exact abstraction we want there.
Re: codegen-specific scoping before dispatch formation, I can see a plausible use case for it, but I don't think we're close to supporting it. I'm thinking something along the lines of hand-authored kernels while allowing for certain fusions. For example, I could replace a linalg.matmul with some hand-authored (and pre-workgroup-distributed) code that might want to fuse some elementwise stuff:
func.func {
  %b = codegen.scope.workgroups {
    ... do some work
    codegen.store %some_tensor
  } -> some_tensor
  linalg.add %a, %b
}
There would be lots of trickiness in getting workgroup distribution to align, and potentially more trickiness when it comes to thread distribution on GPU (although that might just need more scoping), but I could see this being useful.
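One hedged way to picture the "might just need more scoping" idea is nesting a thread-level scope inside the workgroup-level one; codegen.scope.threads is entirely hypothetical here, as is the surrounding syntax.

```mlir
// Hypothetical: the outer scope pins the workgroup distribution while an
// inner scope carries the per-thread distribution for a GPU backend.
%b = codegen.scope.workgroups {
  %t = codegen.scope.threads {   // hypothetical inner scope op
    ... per-thread work
  } -> tensor<...>
  codegen.store %t
} -> tensor<...>
```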
The other use case I was thinking of was scoping for custom kernel replacement, i.e. I match a DAG and replace it with a hal.dispatch.extern or something. Initially my thought was we'd always jump straight to hal.dispatch.extern, but it might be useful to do matching even earlier on in, say, torch-mlir before tensor shapes have been resolved, and then use their shape propagation framework to update the region, then do outlining. Probably its own thing, though, if we go that far up.
Edit: Hmmm but thinking more about it, those are probably both different things. I don't see what Flow/Stream would need to know other than setting the target like you said.
from iree.
Hmm, not liking util.scope for user-level things - may add a util.annotate that is lighter-weight (no tied operand support, no dynamic dimension captures) that we can lower into util.scope for use within flow/stream/hal.
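A hedged sketch of what that lighter-weight op might look like: util.annotate does not exist yet, and its syntax and the attribute shown are assumptions; the intent is only that it lowers into util.scope later in the pipeline.

```mlir
// Hypothetical: no tied operands and no dynamic dimension captures -
// just a region tagged with attributes that flow/stream/hal lowering
// can turn into a util.scope.
util.annotate {stream.affinity = #hal.device.affinity<@device_a>} {
  %0 = linalg.matmul ...
  ...
}
```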
from iree.