Comments (10)

maleadt commented on May 16, 2024

Thanks for reducing the issue. Once CUDAnative works with Julia master again, I should:

  • see if LLVM 5.0 supports it
  • if not, check how Clang lowers this
  • possibly, use e.g. Cassette to provide a different implementation (a sketch follows below)
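
Regarding the Cassette option, a minimal sketch of what that could look like, assuming Cassette's contextual dispatch and CUDAnative's pow wrapper (the PowCtx context and its wiring into kernel compilation are hypothetical, not existing CUDAnative machinery):

using Cassette, CUDAnative

Cassette.@context PowCtx  # hypothetical context for device compilation

# Redirect Base.:^ to libdevice's pow wrapper instead of the llvm.pow
# intrinsic, which the NVPTX back-end cannot select.
Cassette.overdub(::PowCtx, ::typeof(Base.:^), x::Float32, y::Float32) =
    CUDAnative.pow(x, y)

# A kernel would then be executed through the context, e.g.:
# Cassette.overdub(PowCtx(), kernel, args...)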

khyperia commented on May 16, 2024

For whatever it's worth, I stumbled across this issue while searching Google for this error from an OpenCL program being compiled with Clang using the NVPTX backend.

Basically: No, Clang/LLVM 5.0 does not support this.

$ clang --version
clang version 5.0.1 (tags/RELEASE_501/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ cat test.cl
__kernel void func(__global float* input, __global float* output)
{
    *output = pow(*input, 1.5f);
}
$ clang test.cl -target nvptx -cl-std=CL1.2 -include clc/clc.h -S -o output.ptx
fatal error: error in backend: Cannot select: 0x563b17284b50: f32 = fpow 0x563b17284a80, ConstantFP:f32<1.500000e+00>
  0x563b17284a80: f32,ch = load<LD4[%0(addrspace=1)](tbaa=<0x563b171ec868>)> 0x563b172432a8, 0x563b17284c88, undef:i32
    0x563b17284c88: i32,ch = load<LD4[null(addrspace=101)](dereferenceable)(invariant)> 0x563b172432a8, TargetExternalSymbol:i32'func_param_0', undef:i32
      0x563b17284740: i32 = TargetExternalSymbol'func_param_0'
      0x563b17284810: i32 = undef
    0x563b17284810: i32 = undef
  0x563b17284ae8: f32 = ConstantFP<1.500000e+00>
In function: func
clang-5.0: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 5.0.1 (tags/RELEASE_501/final)
Target: nvptx
Thread model: posix
InstalledDir: /usr/bin
clang-5.0: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
clang-5.0: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-5.0: note: diagnostic msg: /tmp/test-0eb886.cl
clang-5.0: note: diagnostic msg: /tmp/test-0eb886.sh
clang-5.0: note: diagnostic msg:
********************

(using the headers from https://libclc.llvm.org/)

The key line is this one, which is the same error as in the original issue:

fatal error: error in backend: Cannot select: 0x563b17284b50: f32 = fpow 0x563b17284a80, ConstantFP:f32<1.500000e+00>

maleadt commented on May 16, 2024

Minimal repro:

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-09471dc.bc"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

; Function Attrs: nounwind
define fastcc void @julia__26_63877() unnamed_addr #0 {
top:
  %0 = tail call float @llvm.pow.f32(float undef, float undef)
  %1 = fmul float undef, %0
  %2 = fmul float undef, %1
  store float %2, float* undef, align 4
  ret void
}

; Function Attrs: nounwind readnone speculatable
declare float @llvm.pow.f32(float, float) #1

attributes #0 = { nounwind "no-frame-pointer-elim"="true" }
attributes #1 = { nounwind readnone speculatable }

maleadt commented on May 16, 2024

We can possibly make this work by having this expand to a libcall and providing its implementation in our runtime. Ref https://reviews.llvm.org/D34708 and JuliaGPU/CUDAnative.jl#303.
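
A minimal sketch of that idea, assuming the back-end learned to expand llvm.pow.f32 into a call to a runtime symbol (runtime_powf is a hypothetical name), which we then implement on top of libdevice:

# Hypothetical runtime implementation for an expanded llvm.pow.f32 libcall;
# __nv_powf comes from NVIDIA's libdevice, which CUDAnative already links.
@inline runtime_powf(x::Float32, y::Float32) =
    ccall("extern __nv_powf", llvmcall, Float32, (Float32, Float32), x, y)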

maleadt commented on May 16, 2024

I had a patch for this and JuliaGPU/CUDAnative.jl#13:

diff --git a/src/compiler/irgen.jl b/src/compiler/irgen.jl
index 0994165..0a957ce 100644
--- a/src/compiler/irgen.jl
+++ b/src/compiler/irgen.jl
@@ -179,6 +179,7 @@ function irgen(job::CompilerJob, method_instance::Core.MethodInstance, world)
         add!(pm, ModulePass("LowerThrow", lower_throw!))
         add!(pm, FunctionPass("HideUnreachable", hide_unreachable!))
         add!(pm, ModulePass("HideTrap", hide_trap!))
+        add!(pm, ModulePass("LowerUnsupportedIntrinsics", replace_intrinsics!))
         always_inliner!(pm)
         run!(pm, mod)
     end
@@ -444,3 +445,33 @@ function hide_trap!(mod::LLVM.Module)
     end
     return changed
 end
+
+
+# this pass replaces LLVM intrinsics that are not supported by the NVPTX back-end
+# with calls to the NVVM device library.
+function replace_intrinsics!(mod::LLVM.Module)
+    job = current_job::CompilerJob
+    changed = false
+    @timeit to[] "lower unsupported intrinsics" begin
+
+
+    # intrinsics with exact libdevice matches
+
+    for (llvm_fn, nvvm_fn) in ("llvm.pow.f64" => "__nv_pow",
+                               "llvm.pow.f32" => "__nv_powf",
+                               "llvm.powi.f64" => "__nv_powi",
+                               "llvm.powi.f32" => "__nv_powif")
+        if haskey(functions(mod), llvm_fn)
+            llvm_f = functions(mod)[llvm_fn]
+            llvm_ft = eltype(llvmtype(llvm_f))
+
+            nvvm_f = LLVM.Function(mod, nvvm_fn, llvm_ft)
+
+            replace_uses!(llvm_f, nvvm_f)
+        end
+    end
+
+
+    end
+    return changed
+end

But that doesn't work due to the throw_exp_domainerror... We really should just redefine these functions to call libdevice directly instead of messing around at the LLVM level. For reference, these are the current Base definitions we would be replacing:

@inline function ^(x::Float64, y::Float64)
    z = ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, y)
    if isnan(z) & !isnan(x+y)
        throw_exp_domainerror(x)
    end
    z
end
@inline function ^(x::Float32, y::Float32)
    z = ccall("llvm.pow.f32", llvmcall, Float32, (Float32, Float32), x, y)
    if isnan(z) & !isnan(x+y)
        throw_exp_domainerror(x)
    end
    z
end
@inline ^(x::Float64, y::Integer) = ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, Float64(y))
@inline ^(x::Float32, y::Integer) = ccall("llvm.pow.f32", llvmcall, Float32, (Float32, Float32), x, Float32(y))
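
For comparison, a minimal sketch of the libdevice-direct redefinition (assuming the __nv_* functions from NVIDIA's libdevice, and that these methods are only made visible to device code; note that this drops Base's domain-error check, whose string-building throw path does not compile for the GPU anyway):

import Base: ^  # extend the Base operator (device code only, in practice)

@inline ^(x::Float64, y::Float64) =
    ccall("extern __nv_pow", llvmcall, Float64, (Float64, Float64), x, y)
@inline ^(x::Float32, y::Float32) =
    ccall("extern __nv_powf", llvmcall, Float32, (Float32, Float32), x, y)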

GiggleLiu commented on May 16, 2024

I got a strange error; here is an MWE:

using CUDAnative, CuArrays, GPUArrays
function sq_kernel(out!, x)
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    v = 3
    @inbounds out![i] += x[i] ^ v
    return nothing
end

function sqfunc(out!::CuVector, z::CuVector)
    XY = GPUArrays.thread_blocks_heuristic(length(out!))
    @cuda threads=XY[1] blocks=XY[2] sq_kernel(out!, z)
end

# this fails
sqfunc(randn(Float32,128) |> CuArray, randn(Float32,128) |> CuArray)


function sq_kernel(out!, x)
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    @inbounds out![i] += x[i] ^ 3
    return nothing
end

# this works
sqfunc(randn(Float32,128) |> CuArray, randn(Float32,128) |> CuArray)

@maleadt, is this related, or should I post a new issue?

maleadt commented on May 16, 2024

Yes, that is related. You can switch to one of the pow signatures that CUDA supports, e.g., x[i] ^ Float32(v).

GiggleLiu commented on May 16, 2024

Well, it still does not work:

using CUDAnative, CuArrays, GPUArrays
function sq_kernel(out!, x)
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    v = 3
    @inbounds out![i] += x[i] ^ Float32(v)
    return nothing
end

got

ERROR: InvalidIRError: compiling sq_kernel(CuDeviceArray{Float32,1,CUDAnative.AS.Global}, CuDeviceArray{Float32,1,CUDAnative.AS.Global}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to jl_alloc_string)
Stacktrace:
 [1] _string_n at strings/string.jl:60
 [2] string at strings/substring.jl:191
 [3] throw_exp_domainerror at math.jl:36
 [4] ^ at math.jl:856
 [5] sq_kernel at /home/leo/Documents/papers/nilang/codes/bessel_gpu.jl:153
Reason: unsupported call through a literal pointer (call to )
Stacktrace:
 [1] unsafe_copyto! at array.jl:242
 [2] macro expansion at gcutils.jl:91
 [3] __unsafe_string! at strings/substring.jl:178
 [4] string at strings/substring.jl:194
 [5] throw_exp_domainerror at math.jl:36
 [6] ^ at math.jl:856
 [7] sq_kernel at /home/leo/Documents/papers/nilang/codes/bessel_gpu.jl:153
Stacktrace:
...

maleadt commented on May 16, 2024

The available signatures are here: https://github.com/JuliaGPU/CUDAnative.jl/blob/master/src/device/cuda/math.jl#L204-L209
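
Alternatively, a hypothetical rewrite of the kernel above that sidesteps Base.:^ and its throw path entirely by calling CUDAnative.pow with one of those supported signatures directly:

function sq_kernel(out!, x)
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    v = 3
    # CUDAnative.pow(::Float32, ::Float32) maps straight to libdevice's __nv_powf
    @inbounds out![i] += CUDAnative.pow(x[i], Float32(v))
    return nothing
end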

GiggleLiu commented on May 16, 2024

It is very weird: as long as the second argument of ^ comes from a variable rather than a literal, it does not work, regardless of whether the type is supported or not. More likely a compiler issue?
