
Comments (2)

abadams commented on September 13, 2024

Ugh, if some devices require that alignment, I think we need to reconsider using sub buffers at all, because we can't guarantee that sort of crop alignment.

from halide.

zvookin commented on September 13, 2024

This happens when passing cropped device buffers to an OpenCL kernel compiled by Halide. Halide implements halide_device_crop for OpenCL by storing an offset in the Halide buffer structure. To pass a buffer with a non-zero offset to an OpenCL kernel, at least as Halide currently generates code for OpenCL, either a sub-buffer must be created or the crop area must be copied to a new device buffer. halide_opencl_run calls clCreateSubBuffer to create a sub-buffer, as this is more efficient than copying. (And since the buffers are general read/write, copying requires copy-in/copy-out, though it is possible we have enough information to eliminate one side or the other in many cases.)

Unfortunately, sub-buffers must be aligned to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value, which cannot be assumed to be 1. On devices where stricter alignment is required, a cropped device buffer can potentially cause a failure in halide_opencl_run as documented here. The issue does not include a reproducing case, but the easiest way to make a test case is to call halide_device_crop to form a buffer with an offset of 1 and pass that to a Halide-generated computation scheduled using OpenCL.
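To make the failure condition concrete, here is a minimal sketch of the alignment test the runtime would need. It is not Halide's code: crop_offset_is_aligned is a hypothetical helper, and it assumes the caller has already queried CL_DEVICE_MEM_BASE_ADDR_ALIGN via clGetDeviceInfo. Note that OpenCL reports that value in bits, so it has to be converted to bytes before comparing it against a sub-buffer origin.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the Halide runtime).
// offset_bytes: byte offset of the crop within the parent device buffer.
// align_bits:   value of CL_DEVICE_MEM_BASE_ADDR_ALIGN, which OpenCL
//               reports in bits, not bytes.
// Returns true if a sub-buffer starting at offset_bytes would satisfy the
// device's base address alignment requirement.
inline bool crop_offset_is_aligned(uint64_t offset_bytes, uint32_t align_bits) {
    const uint64_t align_bytes = align_bits / 8;
    if (align_bytes <= 1) {
        return true;  // No effective alignment requirement.
    }
    return offset_bytes % align_bytes == 0;
}
```

On a device reporting 1024 bits (128 bytes), the offset-of-1 crop suggested above fails this check, which is the case where clCreateSubBuffer is expected to return CL_MISALIGNED_SUB_BUFFER_OFFSET.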

There are three possible ways to go here:

  1. Declare it an error. Likely improve the reporting in halide_opencl_run and document how it occurs.

  2. Query the OpenCL device for the required alignment and, if the supplied input or output buffer does not meet the requirements, have the runtime copy it to a newly allocated buffer. This involves taking care to copy in and copy out as required. The copies can likely be asynchronous, but the allocation and free probably have to serialize somewhat. And as with many things where we take care of this inside the runtime, there is little opportunity to reuse the temporary buffer or otherwise optimize/amortize the cost. That said, the cost would only be incurred when the current implementation would crash. We should likely think a bit about whether copying preserves semantics in all cases, e.g. when buffers alias.

  3. Compile the OpenCL kernel code to take an offset in addition to the buffer. Doing this for every single buffer is likely enough of a performance issue to merit programmer control in the schedule, i.e. one would have to mark something as unaligned to support this. (Or optionally, introduce a concept akin to host alignment in the schedule for GPU to allow generating more efficient code when proper alignment is asserted. However, this is fairly dubious, as the required alignment is not known at compile time, only at runtime.)

Option 2 is likely the way to go. It's not a one-liner, but it's not that difficult to code. It's trickier to test without a platform handy that exhibits the behavior.
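A rough sketch of how option 2 might be structured, written as a host-side model so it can be exercised without an OpenCL device. Everything here is an assumption for illustration: CropPath, choose_crop_path, and run_with_temporary are hypothetical names, a std::vector stands in for the device buffer, and the crop is treated as a single contiguous byte range the way a sub-buffer would be. A real implementation would use OpenCL allocation and copy calls and would need to handle the aliasing question raised above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical: which path the runtime takes for a given crop.
enum class CropPath { WholeBuffer, SubBuffer, TemporaryCopy };

// align_bits is the CL_DEVICE_MEM_BASE_ADDR_ALIGN value, reported in bits.
inline CropPath choose_crop_path(uint64_t offset_bytes, uint32_t align_bits) {
    if (offset_bytes == 0) {
        return CropPath::WholeBuffer;  // No offset, pass the buffer directly.
    }
    const uint64_t align_bytes = align_bits / 8;
    if (align_bytes <= 1 || offset_bytes % align_bytes == 0) {
        return CropPath::SubBuffer;  // Cheap path: sub-buffer is legal.
    }
    return CropPath::TemporaryCopy;  // Fallback: copy-in / copy-out.
}

// Host-side model of the fallback. The caller must ensure that
// offset + size <= parent.size().
inline void run_with_temporary(std::vector<uint8_t> &parent,
                               size_t offset, size_t size,
                               const std::function<void(uint8_t *, size_t)> &kernel) {
    const auto first = parent.begin() + static_cast<std::ptrdiff_t>(offset);
    const auto last = first + static_cast<std::ptrdiff_t>(size);
    std::vector<uint8_t> tmp(first, last);   // Copy-in to an aligned temporary.
    kernel(tmp.data(), tmp.size());          // Run on the temporary.
    std::copy(tmp.begin(), tmp.end(), first);  // Copy-out to the parent.
}
```

The model always copies in both directions, matching the general read/write case; where the runtime knows a buffer is input-only or output-only, one of the two copies could be skipped.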

