Comments (7)

abadams avatar abadams commented on June 26, 2024 1

Looks like it's a compiler bug caused by the Adams2019 autoscheduler not really understanding what to do on Hexagon and producing some very strange code, which then hit a corner-case bug in the simplifier.

Let's use the human Adams autoscheduler instead. A reasonable schedule for this pipeline is:

matmul_out1_fcn.vectorize(d1, 128).parallel(d2, (B1.dim(1).extent() + 3) / 4);
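As a rough illustration of the task-size argument in that schedule: (extent + 3) / 4 is a rounded-up quarter of the d2 extent, so the parallel loop ends up with at most four chunks. The sketch below is plain C++ rather than Halide, and the helper names task_size and task_count are hypothetical, introduced only for this example.

```cpp
// Rounded-up quarter of the extent, mirroring (extent + 3) / 4 in the schedule.
int task_size(int extent) {
    return (extent + 3) / 4;
}

// Number of chunks of that size needed to cover the whole extent.
int task_count(int extent) {
    const int t = task_size(extent);
    return (extent + t - 1) / t;
}
```

For an extent of 100 this gives chunks of 25 and four parallel tasks. Small extents can give fewer than four chunks (an extent of 5 gives three chunks of size 2).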

but a more typical matmul schedule (for large matrices) is

    void generate() {
        RDom r(0, 100);
        // Note: changed from sum to += so that I can schedule the reduction var
        matmul_out1(d1, d2) += cast<uint16_t>(A1(d1, r)) * cast<uint16_t>(B1(r, d2));
        matmul_out1_fcn(d1, d2) = matmul_out1(d1, d2);

        Var d1i, d2i, d1o, d2o;
        matmul_out1_fcn.tile(d1, d2, d1o, d2o, d1i, d2i, 3 * 128, 4).vectorize(d1i, 128).unroll(d1i).unroll(d2i).parallel(d2o);
        matmul_out1.compute_at(matmul_out1_fcn, d1o).vectorize(d1, 128).unroll(d1).unroll(d2);
        matmul_out1.update().reorder(d1, d2, r).vectorize(d1, 128).unroll(d1).unroll(d2);
    }

I usually do my scheduling inside the generate() method. In this case I needed to access the RDom. You could also make the RDom a class member instead of a local.
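To make the loop order of the tiled schedule easier to picture, here is a minimal plain C++ sketch of roughly the loop nest it describes. It is not what Halide generates: vectorization, unrolling and parallelism only appear as comments, the per-tile accumulator is folded into the output array, and tile edges are simply clamped, which is an assumption of this sketch and not a statement about Halide's tail handling. The function name matmul_tiled and the flat storage layout (A(d1, r) at a[r * W + d1], B(r, d2) at b[d2 * K + r], output at out[d2 * W + d1]) are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// W and H are the output extents in d1 and d2; K is the reduction extent.
std::vector<uint16_t> matmul_tiled(const std::vector<uint8_t>& a,
                                   const std::vector<uint8_t>& b,
                                   int W, int H, int K,
                                   int tile_w = 3 * 128, int tile_h = 4) {
    std::vector<uint16_t> out(static_cast<std::size_t>(W) * H, 0);
    for (int d2o = 0; d2o < H; d2o += tile_h) {      // parallel(d2o) in the schedule
        for (int d1o = 0; d1o < W; d1o += tile_w) {  // one accumulator tile per d1o (compute_at)
            const int d1_end = std::min(d1o + tile_w, W);
            const int d2_end = std::min(d2o + tile_h, H);
            for (int r = 0; r < K; ++r) {                  // reorder(d1, d2, r): r outermost
                for (int d2 = d2o; d2 < d2_end; ++d2) {    // unrolled in the schedule
                    for (int d1 = d1o; d1 < d1_end; ++d1) {  // vectorized by 128 in the schedule
                        const std::size_t o = static_cast<std::size_t>(d2) * W + d1;
                        // The product is computed in int, then wrapped to 16 bits.
                        out[o] = static_cast<uint16_t>(
                            out[o] + a[static_cast<std::size_t>(r) * W + d1] *
                                     b[static_cast<std::size_t>(d2) * K + r]);
                    }
                }
            }
        }
    }
    return out;
}
```

The point of the ordering is that each tile of the output stays live across the whole reduction loop, so every step of r only has to load one vector strip of A and a few scalars of B.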

For a great schedule, you need to start worrying about things like managing DMAs into Hexagon's cache.

from halide.

abadams avatar abadams commented on June 26, 2024

The error means you're trying to compile to HVX, but your pipeline uses vectorized floats. I think our Hexagon backend doesn't support the newer versions of HVX that support float vectors.

I think it isn't triggering without the autoscheduler, because then the schedule uses scalar floats only, which is fine. The autoscheduler isn't aware of that restriction on hexagon so it's trying to just vectorize everything.

jxl1080 avatar jxl1080 commented on June 26, 2024

@abadams Thank you so much for your quick reply! Is there any suggestion on how to resolve this error message?

abadams avatar abadams commented on June 26, 2024

Don't try to do a floating-point matrix multiply on Hexagon (or at least on the versions of HVX that Halide supports). It's not a good processor for running that algorithm, because you can't vectorize it. Do a fixed-point matrix multiply instead.
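For reference, a fixed-point matrix multiply here just means integer inputs and an integer accumulator. The sketch below is a plain C++ reference, not Halide code, that could be used to check results on small inputs. The name matmul_fixed and the flat layout (A(d1, r) at a[r * W + d1], B(r, d2) at b[d2 * K + r], output at out[d2 * W + d1]) are hypothetical. The accumulator type is a template parameter because a 16-bit accumulator wraps around once a sum exceeds 65535, which two products of 255 * 255 already do.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Acc is the accumulator/output type, e.g. uint16_t or uint32_t.
template <typename Acc>
std::vector<Acc> matmul_fixed(const std::vector<uint8_t>& a,
                              const std::vector<uint8_t>& b,
                              int W, int H, int K) {
    std::vector<Acc> out(static_cast<std::size_t>(W) * H, 0);
    for (int d2 = 0; d2 < H; ++d2) {
        for (int d1 = 0; d1 < W; ++d1) {
            const std::size_t o = static_cast<std::size_t>(d2) * W + d1;
            for (int r = 0; r < K; ++r) {
                // Widen each 8-bit input before multiplying; the running sum
                // wraps modulo 2^(bits of Acc) when it no longer fits.
                out[o] = static_cast<Acc>(
                    out[o] +
                    static_cast<Acc>(a[static_cast<std::size_t>(r) * W + d1]) *
                    static_cast<Acc>(b[static_cast<std::size_t>(d2) * K + r]));
            }
        }
    }
    return out;
}
```

Whether wraparound is acceptable depends on the application; if it is not, a wider accumulator or smaller input ranges would be needed.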

jxl1080 avatar jxl1080 commented on June 26, 2024

@abadams Hi Adams, I'm not sure whether I misunderstood what you meant by 'don't try to do a floating point matrix multiply'. I changed my data type to 'uint8_t', but things got worse when I ran my generator with Adams2019: it hits a segmentation fault without any error message.

abadams avatar abadams commented on June 26, 2024

Can you share a repro that crashes (including the build commands you're using)?

jxl1080 avatar jxl1080 commented on June 26, 2024

@abadams Thank you for your help! Below is the code of my Halide Generator Class:

#include "Halide.h"
#include <stdio.h>
using namespace Halide;
class mMatmul_matmul_out1_fcn_halide_generator : public Halide::Generator <mMatmul_matmul_out1_fcn_halide_generator> {

public:
    Input<Buffer<uint8_t>> B1{"B1", 2};
    Input<Buffer<uint8_t>> A1{"A1", 2};
    Output<Buffer<uint16_t>> matmul_out1_fcn{"matmul_out1_fcn", 2};

    void generate() {
        RDom r(0, 100);
        matmul_out1(d1, d2) = sum(cast<uint16_t>(A1(d1, r))*cast<uint16_t>(B1(r, d2)));
        matmul_out1_fcn(d1, d2) = matmul_out1(d1, d2);
    }

    void schedule() {
        // Schedule is determined by the autoscheduler. Need to set estimates on the buffers.
        if (using_autoscheduler()) {
            B1.dim(1).set_estimate(0, 100);
            B1.dim(0).set_estimate(0, 100);
            A1.dim(1).set_estimate(0, 100);
            A1.dim(0).set_estimate(0, 100);
            matmul_out1_fcn.set_estimate(d1, 0, 100).set_estimate(d2, 0, 100);
        } else {
            // Default schedule
        }
    }

private:
    Var d1{"d1"};
    Var d2{"d2"};
    Func matmul_out1{"matmul_out1"};

};
HALIDE_REGISTER_GENERATOR(mMatmul_matmul_out1_fcn_halide_generator, mMatmul_matmul_out1_fcn_halide_gen)

I used binary 'Halide-17.0.1-x86-64-linux-52541176253e74467dabc42eeee63d9a62c199f6.tar.gz' downloaded from: https://github.com/halide/Halide/releases

My command for compiling the Halide Generator class is:
$ g++ mMatmul_matmul_out1_fcn_halide.cpp -std=c++17 ....../Halide-17.0.1-x86-64-linux/share/Halide/tools/GenGen.cpp -L ....../Halide-17.0.1-x86-64-linux/lib -lHalide -I ....../Halide-17.0.1-x86-64-linux/include -o mMatmul_matmul_out1_fcn_halide

My command for running the generator with Adams2019 (which gave me a segmentation fault) is:
$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:....../Halide-17.0.1-x86-64-linux/lib
$ ./mMatmul_matmul_out1_fcn_halide -f myPipeline -g mMatmul_matmul_out1_fcn_halide_gen -e h,o target=hexagon-32-noos-hvx-no_runtime autoscheduler.parallelism=2 autoscheduler=Adams2019 -p ....../Halide-17.0.1-x86-64-linux/lib/libautoschedule_adams2019.so -o ./

My command for running the generator with no autoscheduler (which worked for me) is:
$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:....../Halide-17.0.1-x86-64-linux/lib
$ ./mMatmul_matmul_out1_fcn_halide -f myPipeline -g mMatmul_matmul_out1_fcn_halide_gen -e h,o target=hexagon-32-noos-hvx-no_runtime -o ./
