Comments (10)
Thanks for reducing the issue. Once CUDAnative works with Julia master again, I should:
- see if LLVM 5.0 supports it
- if not, check how Clang lowers this
- possibly use, e.g., Cassette to provide a different implementation
from cuda.jl.
For what it's worth, I stumbled across this issue while searching Google for this error from an OpenCL program compiled with Clang using the NVPTX backend.
Basically: No, Clang/LLVM 5.0 does not support this.
$ clang --version
clang version 5.0.1 (tags/RELEASE_501/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ cat test.cl
__kernel void func(__global float* input, __global float* output)
{
*output = pow(*input, 1.5f);
}
$ clang test.cl -target nvptx -cl-std=CL1.2 -include clc/clc.h -S -o output.ptx
fatal error: error in backend: Cannot select: 0x563b17284b50: f32 = fpow 0x563b17284a80, ConstantFP:f32<1.500000e+00>
0x563b17284a80: f32,ch = load<LD4[%0(addrspace=1)](tbaa=<0x563b171ec868>)> 0x563b172432a8, 0x563b17284c88, undef:i32
0x563b17284c88: i32,ch = load<LD4[null(addrspace=101)](dereferenceable)(invariant)> 0x563b172432a8, TargetExternalSymbol:i32'func_param_0', undef:i32
0x563b17284740: i32 = TargetExternalSymbol'func_param_0'
0x563b17284810: i32 = undef
0x563b17284810: i32 = undef
0x563b17284ae8: f32 = ConstantFP<1.500000e+00>
In function: func
clang-5.0: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 5.0.1 (tags/RELEASE_501/final)
Target: nvptx
Thread model: posix
InstalledDir: /usr/bin
clang-5.0: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
clang-5.0: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-5.0: note: diagnostic msg: /tmp/test-0eb886.cl
clang-5.0: note: diagnostic msg: /tmp/test-0eb886.sh
clang-5.0: note: diagnostic msg:
********************
(using the headers from https://libclc.llvm.org/)
The key line is this one, the same error as in the original issue:
fatal error: error in backend: Cannot select: 0x563b17284b50: f32 = fpow 0x563b17284a80, ConstantFP:f32<1.500000e+00>
Minimal repro:
; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-09471dc.bc"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
; Function Attrs: nounwind
define fastcc void @julia__26_63877() unnamed_addr #0 {
top:
%0 = tail call float @llvm.pow.f32(float undef, float undef)
%1 = fmul float undef, %0
%2 = fmul float undef, %1
store float %2, float* undef, align 4
ret void
}
; Function Attrs: nounwind readnone speculatable
declare float @llvm.pow.f32(float, float) #1
attributes #0 = { nounwind "no-frame-pointer-elim"="true" }
attributes #1 = { nounwind readnone speculatable }
We could possibly make this work by having this expand to a libcall and providing its implementation in our runtime. Ref https://reviews.llvm.org/D34708 and JuliaGPU/CUDAnative.jl#303.
I had a patch for this and JuliaGPU/CUDAnative.jl#13:
diff --git a/src/compiler/irgen.jl b/src/compiler/irgen.jl
index 0994165..0a957ce 100644
--- a/src/compiler/irgen.jl
+++ b/src/compiler/irgen.jl
@@ -179,6 +179,7 @@ function irgen(job::CompilerJob, method_instance::Core.MethodInstance, world)
add!(pm, ModulePass("LowerThrow", lower_throw!))
add!(pm, FunctionPass("HideUnreachable", hide_unreachable!))
add!(pm, ModulePass("HideTrap", hide_trap!))
+ add!(pm, ModulePass("LowerUnsupportedIntrinsics", replace_intrinsics!))
always_inliner!(pm)
run!(pm, mod)
end
@@ -444,3 +445,33 @@ function hide_trap!(mod::LLVM.Module)
end
return changed
end
+
+
+# this pass replaces LLVM intrinsics that are not supported by the NVPTX back-end
+# with calls to the NVVM device library.
+function replace_intrinsics!(mod::LLVM.Module)
+ job = current_job::CompilerJob
+ changed = false
+ @timeit to[] "lower unsupported intrinsics" begin
+
+
+ # intrinsics with exact libdevice matches
+
+ for (llvm_fn, nvvm_fn) in ("llvm.pow.f64" => "__nv_pow",
+ "llvm.pow.f32" => "__nv_powf",
+ "llvm.powi.f64" => "__nv_powi",
+ "llvm.powi.f32" => "__nv_powif")
+ if haskey(functions(mod), llvm_fn)
+ llvm_f = functions(mod)[llvm_fn]
+ llvm_ft = eltype(llvmtype(llvm_f))
+
+ nvvm_f = LLVM.Function(mod, nvvm_fn, llvm_ft)
+
+ replace_uses!(llvm_f, nvvm_f)
+ end
+ end
+
+
+ end
+ return changed
+end
But that doesn't work due to the throw_exp_domainerror ... We really should just redefine these functions to call libdevice directly instead of messing around at the LLVM level:
@inline function ^(x::Float64, y::Float64)
z = ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, y)
if isnan(z) & !isnan(x+y)
throw_exp_domainerror(x)
end
z
end
@inline function ^(x::Float32, y::Float32)
z = ccall("llvm.pow.f32", llvmcall, Float32, (Float32, Float32), x, y)
if isnan(z) & !isnan(x+y)
throw_exp_domainerror(x)
end
z
end
@inline ^(x::Float64, y::Integer) = ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, Float64(y))
@inline ^(x::Float32, y::Integer) = ccall("llvm.pow.f32", llvmcall, Float32, (Float32, Float32), x, Float32(y))
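For what it's worth, the `ccall`/`llvmcall` spelling used in these redefinitions is also valid in host code, so it can be sanity-checked on the CPU without a GPU. A quick sketch (the helper name `pow_intrinsic` is mine, not part of the proposal):

```julia
# Host-side sanity check: the same ccall/llvmcall spelling used in the
# redefinitions above also works on the CPU, where LLVM lowers
# llvm.pow.f64 to a regular libm pow call.
pow_intrinsic(x::Float64, y::Float64) =
    ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, y)

println(pow_intrinsic(2.0, 10.0))  # 1024.0
```

On the device, the missing piece is that NVPTX has no lowering for the resulting `fpow` node, which is exactly what the patch above tried to paper over.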
I got a strange error; here is an MWE:
using CUDAnative, CuArrays, GPUArrays
function sq_kernel(out!, x)
i = (blockIdx().x-1) * blockDim().x + threadIdx().x
v = 3
@inbounds out![i] += x[i] ^ v
return nothing
end
function sqfunc(out!::CuVector, z::CuVector)
XY = GPUArrays.thread_blocks_heuristic(length(out!))
@cuda threads=XY[1] blocks=XY[2] sq_kernel(out!, z)
end
# this fails
sqfunc(randn(Float32,128) |> CuArray, randn(Float32,128) |> CuArray)
function sq_kernel(out!, x)
i = (blockIdx().x-1) * blockDim().x + threadIdx().x
@inbounds out![i] += x[i] ^ 3
return nothing
end
# this works
sqfunc(randn(Float32,128) |> CuArray, randn(Float32,128) |> CuArray)
@maleadt, is this related, or should I post a new issue?
Yes, that is related. You can switch to using one of the pow signatures that CUDA supports, e.g., ^ Float32(v).
Well, it still does not work:
using CUDAnative, CuArrays, GPUArrays
function sq_kernel(out!, x)
i = (blockIdx().x-1) * blockDim().x + threadIdx().x
v = 3
@inbounds out![i] += x[i] ^ Float32(v)
return nothing
end
and got:
ERROR: InvalidIRError: compiling sq_kernel(CuDeviceArray{Float32,1,CUDAnative.AS.Global}, CuDeviceArray{Float32,1,CUDAnative.AS.Global}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to jl_alloc_string)
Stacktrace:
[1] _string_n at strings/string.jl:60
[2] string at strings/substring.jl:191
[3] throw_exp_domainerror at math.jl:36
[4] ^ at math.jl:856
[5] sq_kernel at /home/leo/Documents/papers/nilang/codes/bessel_gpu.jl:153
Reason: unsupported call through a literal pointer (call to )
Stacktrace:
[1] unsafe_copyto! at array.jl:242
[2] macro expansion at gcutils.jl:91
[3] __unsafe_string! at strings/substring.jl:178
[4] string at strings/substring.jl:194
[5] throw_exp_domainerror at math.jl:36
[6] ^ at math.jl:856
[7] sq_kernel at /home/leo/Documents/papers/nilang/codes/bessel_gpu.jl:153
Stacktrace:
...
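The stack traces point at the domain-error branch in Base's `^`: pow returns NaN for a negative base with a non-integer exponent, Base then builds a DomainError message via `string()`, and that string allocation is the `jl_alloc_string` call the GPU compiler rejects. The branch is easy to trigger on the CPU (illustration only, not from the thread):

```julia
# Base's ^(::Float32, ::Float32) checks for a NaN result from pow and, if
# the inputs themselves were not NaN, calls throw_exp_domainerror, whose
# error message is built with string() -- hence jl_alloc_string in the
# device-side stack trace above.
err = try
    (-2.0f0) ^ 1.5f0   # negative base, non-integer exponent
catch e
    e
end
println(err isa DomainError)  # true
```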
The available signatures are here: https://github.com/JuliaGPU/CUDAnative.jl/blob/master/src/device/cuda/math.jl#L204-L209
It is very weird: as long as the second argument of ^ is a variable, it does not work, regardless of whether its type is supported or not. More likely a compiler issue?
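A plausible explanation for the literal-vs-variable difference: Julia lowers a *literal* exponent through `Base.literal_pow`, which expands `x ^ 3` into plain multiplications and never reaches the generic `^` (and its throw_exp_domainerror branch), while a *variable* exponent dispatches to the generic method. This is visible in the lowering on the CPU:

```julia
# A literal exponent is lowered to Base.literal_pow (x^3 becomes x*x*x),
# so the generic ^ method -- and its string-allocating error branch -- is
# never reached. A variable exponent goes through the generic ^ instead.
lit = string(Meta.lower(Main, :(x ^ 3)))
var = string(Meta.lower(Main, :(x ^ v)))
println(occursin("literal_pow", lit))  # true:  literal uses literal_pow
println(occursin("literal_pow", var))  # false: variable calls generic ^
```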