Comments (7)
Looks like it's a compiler bug caused by the Adams2019 autoscheduler not really understanding what to do on Hexagon and producing some very strange code that then hit a corner-case bug in the simplifier.
Let's use the human Adams autoscheduler instead. A reasonable schedule for this pipeline is:
matmul_out1_fcn.vectorize(d1, 128).parallel(d2, (B1.dim(1).extent() + 3) / 4);
but a more typical matmul schedule (for large matrices) is
void generate() {
    RDom r(0, 100);
    // Note: changed from sum to += so that I can schedule the reduction var
    matmul_out1(d1, d2) += cast<uint16_t>(A1(d1, r)) * cast<uint16_t>(B1(r, d2));
    matmul_out1_fcn(d1, d2) = matmul_out1(d1, d2);

    Var d1i, d2i, d1o, d2o;
    matmul_out1_fcn.tile(d1, d2, d1o, d2o, d1i, d2i, 3 * 128, 4)
        .vectorize(d1i, 128)
        .unroll(d1i)
        .unroll(d2i)
        .parallel(d2o);
    matmul_out1.compute_at(matmul_out1_fcn, d1o)
        .vectorize(d1, 128)
        .unroll(d1)
        .unroll(d2);
    matmul_out1.update()
        .reorder(d1, d2, r)
        .vectorize(d1, 128)
        .unroll(d1)
        .unroll(d2);
}
I usually do my scheduling inside the generate() method. In this case I needed to access the RDom. You could also make the RDom a class member instead of a local.
For a great schedule, you need to start worrying about things like managing DMAs into Hexagon's cache.
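To make the loop order in the tiled schedule above a bit more concrete, here is a rough plain C++ sketch of the loop nest it describes: tiles of the output, a per-tile accumulator (the role of compute_at), and the reduction variable r as the outermost loop inside each tile (the role of reorder(d1, d2, r)). It does not use Halide and does not model vectorize, unroll, or parallel. The function names, the flat std::vector layout, and the edge clamping with std::min are assumptions made for illustration, not what Halide generates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout mirrors the indexing in the thread, first index innermost in memory:
// A1(d1, r) -> A[d1 + r * n1], B1(r, d2) -> B[r + d2 * k], out(d1, d2) -> out[d1 + d2 * n1].

// Reference version: one output element at a time, reduction innermost.
std::vector<uint16_t> matmul_naive(const std::vector<uint8_t> &A,
                                   const std::vector<uint8_t> &B,
                                   int n1, int n2, int k) {
    assert(A.size() == static_cast<size_t>(n1) * k);
    assert(B.size() == static_cast<size_t>(k) * n2);
    std::vector<uint16_t> out(static_cast<size_t>(n1) * n2, 0);
    for (int d2 = 0; d2 < n2; ++d2) {
        for (int d1 = 0; d1 < n1; ++d1) {
            uint16_t acc = 0;
            for (int r = 0; r < k; ++r) {
                // uint16_t arithmetic wraps modulo 2^16, as in the uint16_t pipeline.
                acc = static_cast<uint16_t>(acc + A[d1 + r * n1] * B[r + d2 * k]);
            }
            out[d1 + d2 * n1] = acc;
        }
    }
    return out;
}

// Tiled version: per-tile accumulator, r outermost within the tile.
std::vector<uint16_t> matmul_tiled(const std::vector<uint8_t> &A,
                                   const std::vector<uint8_t> &B,
                                   int n1, int n2, int k,
                                   int tile1, int tile2) {
    assert(A.size() == static_cast<size_t>(n1) * k);
    assert(B.size() == static_cast<size_t>(k) * n2);
    std::vector<uint16_t> out(static_cast<size_t>(n1) * n2, 0);
    for (int d2o = 0; d2o < n2; d2o += tile2) {      // the loop marked parallel in the schedule
        // Scratch tile, allocated per d2o iteration so each task would own its copy.
        std::vector<uint16_t> acc(static_cast<size_t>(tile1) * tile2);
        for (int d1o = 0; d1o < n1; d1o += tile1) {
            const int t1 = std::min(tile1, n1 - d1o);  // clamp at the edges (assumption)
            const int t2 = std::min(tile2, n2 - d2o);
            std::fill(acc.begin(), acc.end(), static_cast<uint16_t>(0));
            for (int r = 0; r < k; ++r) {              // reduction outermost in the tile
                for (int d2i = 0; d2i < t2; ++d2i) {
                    for (int d1i = 0; d1i < t1; ++d1i) {
                        const int a = A[(d1o + d1i) + r * n1];
                        const int b = B[r + (d2o + d2i) * k];
                        uint16_t &c = acc[d1i + d2i * tile1];
                        c = static_cast<uint16_t>(c + a * b);
                    }
                }
            }
            // Copy the finished tile to the output (the wrapper Func's job).
            for (int d2i = 0; d2i < t2; ++d2i) {
                for (int d1i = 0; d1i < t1; ++d1i) {
                    out[(d1o + d1i) + (d2o + d2i) * n1] = acc[d1i + d2i * tile1];
                }
            }
        }
    }
    return out;
}
```

Calling matmul_tiled(A, B, n1, n2, k, 3 * 128, 4) uses the tile shape from the schedule. Both functions should return the same values for any tile size, since only the loop order changes.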
The error means you're trying to compile to HVX, but your pipeline uses vectorized floats. I think our Hexagon backend doesn't support the newer versions of HVX that support float vectors.
I think it isn't triggering without the autoscheduler, because then the schedule uses scalar floats only, which is fine. The autoscheduler isn't aware of that restriction on Hexagon, so it just tries to vectorize everything.
@abadams Thank you so much for your quick reply! Is there any suggestion on how to resolve this error message?
Don't try to do a floating-point matrix multiply on Hexagon (or at least on the versions of HVX that Halide supports). It's not a good processor for running that algorithm, because you can't vectorize it. Do a fixed-point matrix multiply instead.
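As a rough illustration of what "fixed-point" could mean here, the plain C++ sketch below quantizes values to uint8_t with a linear scale, does the multiply-accumulate entirely in integers, and rescales once at the end. The function names and the choice of scales are hypothetical; a real pipeline would pick scales from the data range. The 32-bit accumulator is also an assumption of this sketch: one product of two uint8_t values can reach 255 * 255 = 65025, so a sum over 100 of them can exceed what 16 bits hold.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Map a non-negative float to uint8_t using a linear scale (x is roughly q * scale).
// Values outside the representable range are clamped to [0, 255].
uint8_t quantize_u8(float x, float scale) {
    const float q = std::round(x / scale);
    return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, q)));
}

// Integer-only dot product with a 32-bit accumulator.
uint32_t dot_u8(const std::vector<uint8_t> &a, const std::vector<uint8_t> &b) {
    assert(a.size() == b.size());
    uint32_t acc = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        acc += static_cast<uint32_t>(a[i]) * static_cast<uint32_t>(b[i]);
    }
    return acc;
}

// Fixed-point dot product: integer accumulation, then a single float rescale.
float dot_fixed(const std::vector<uint8_t> &a, const std::vector<uint8_t> &b,
                float scale_a, float scale_b) {
    return static_cast<float>(dot_u8(a, b)) * scale_a * scale_b;
}
```

A matrix multiply is then one such dot product per output element. The inner loop contains only integer operations, which is the part that matters for vectorization.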
@abadams Hi Adams, I'm not sure if I misunderstood your point by 'not try to do a floating point matrix multiply'. I changed my data type to 'uint8_t', but I'm getting a worse situation when I run my generator with Adams2019. There is a segmentation fault but without any error message.
Can you share a repro that crashes (including the build commands you're using)?
@abadams Thank you for your help! Below is the code of my Halide Generator Class:
#include "Halide.h"
#include <stdio.h>
#include
using namespace Halide;
class mMatmul_matmul_out1_fcn_halide_generator : public Halide::Generator<mMatmul_matmul_out1_fcn_halide_generator> {
public:
    Input<Buffer<uint8_t>> B1{"B1", 2};
    Input<Buffer<uint8_t>> A1{"A1", 2};
    Output<Buffer<uint16_t>> matmul_out1_fcn{"matmul_out1_fcn", 2};

    void generate() {
        RDom r(0, 100);
        matmul_out1(d1, d2) = sum(cast<uint16_t>(A1(d1, r)) * cast<uint16_t>(B1(r, d2)));
        matmul_out1_fcn(d1, d2) = matmul_out1(d1, d2);
    }

    void schedule() {
        // Schedule is determined by autoscheduler. Need to set estimate on buffer
        if (using_autoscheduler()) {
            B1.dim(1).set_estimate(0, 100);
            B1.dim(0).set_estimate(0, 100);
            A1.dim(1).set_estimate(0, 100);
            A1.dim(0).set_estimate(0, 100);
            matmul_out1_fcn.set_estimate(d1, 0, 100).set_estimate(d2, 0, 100);
        } else {
            // Default schedule
        }
    }

private:
    Var d1{"d1"};
    Var d2{"d2"};
    Func matmul_out1{"matmul_out1"};
};
HALIDE_REGISTER_GENERATOR(mMatmul_matmul_out1_fcn_halide_generator, mMatmul_matmul_out1_fcn_halide_gen)
I used binary 'Halide-17.0.1-x86-64-linux-52541176253e74467dabc42eeee63d9a62c199f6.tar.gz' downloaded from: https://github.com/halide/Halide/releases
My command for compiling the Halide Generator class is:
$ g++ mMatmul_matmul_out1_fcn_halide.cpp -std=c++17 ....../Halide-17.0.1-x86-64-linux/share/Halide/tools/GenGen.cpp -L ....../Halide-17.0.1-x86-64-linux/lib -lHalide -I ....../Halide-17.0.1-x86-64-linux/include -o mMatmul_matmul_out1_fcn_halide
My command for running generator with Adams2019 is (which gave me segmentation fault):
$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:....../Halide-17.0.1-x86-64-linux/lib
$ ./mMatmul_matmul_out1_fcn_halide -f myPipeline -g mMatmul_matmul_out1_fcn_halide_gen -e h,o target=hexagon-32-noos-hvx-no_runtime autoscheduler.parallelism=2 autoscheduler=Adams2019 -p ....../Halide-17.0.1-x86-64-linux/lib/libautoschedule_adams2019.so -o ./
My command for running generator with no auto-scheduler (which worked for me):
$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:....../Halide-17.0.1-x86-64-linux/lib
$ ./mMatmul_matmul_out1_fcn_halide -f myPipeline -g mMatmul_matmul_out1_fcn_halide_gen -e h,o target=hexagon-32-noos-hvx-no_runtime -o ./