Comments (10)
Thanks for reducing the issue. Once CUDAnative works with Julia master again, I should:
- see if LLVM 5.0 supports it
- if not, check how Clang lowers this
- possibly use, e.g., Cassette to provide a different implementation
from cuda.jl.
For what it's worth, I stumbled across this issue while searching Google for this error from an OpenCL program compiled with Clang using the NVPTX backend.
Basically: No, Clang/LLVM 5.0 does not support this.
$ clang --version
clang version 5.0.1 (tags/RELEASE_501/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ cat test.cl
__kernel void func(__global float* input, __global float* output)
{
*output = pow(*input, 1.5f);
}
$ clang test.cl -target nvptx -cl-std=CL1.2 -include clc/clc.h -S -o output.ptx
fatal error: error in backend: Cannot select: 0x563b17284b50: f32 = fpow 0x563b17284a80, ConstantFP:f32<1.500000e+00>
0x563b17284a80: f32,ch = load<LD4[%0(addrspace=1)](tbaa=<0x563b171ec868>)> 0x563b172432a8, 0x563b17284c88, undef:i32
0x563b17284c88: i32,ch = load<LD4[null(addrspace=101)](dereferenceable)(invariant)> 0x563b172432a8, TargetExternalSymbol:i32'func_param_0', undef:i32
0x563b17284740: i32 = TargetExternalSymbol'func_param_0'
0x563b17284810: i32 = undef
0x563b17284810: i32 = undef
0x563b17284ae8: f32 = ConstantFP<1.500000e+00>
In function: func
clang-5.0: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 5.0.1 (tags/RELEASE_501/final)
Target: nvptx
Thread model: posix
InstalledDir: /usr/bin
clang-5.0: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
clang-5.0: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-5.0: note: diagnostic msg: /tmp/test-0eb886.cl
clang-5.0: note: diagnostic msg: /tmp/test-0eb886.sh
clang-5.0: note: diagnostic msg:
********************
(using the headers from https://libclc.llvm.org/)
The key line is this one, the same error as in the original issue:
fatal error: error in backend: Cannot select: 0x563b17284b50: f32 = fpow 0x563b17284a80, ConstantFP:f32<1.500000e+00>
Minimal repro:
; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-09471dc.bc"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
; Function Attrs: nounwind
define fastcc void @julia__26_63877() unnamed_addr #0 {
top:
%0 = tail call float @llvm.pow.f32(float undef, float undef)
%1 = fmul float undef, %0
%2 = fmul float undef, %1
store float %2, float* undef, align 4
ret void
}
; Function Attrs: nounwind readnone speculatable
declare float @llvm.pow.f32(float, float) #1
attributes #0 = { nounwind "no-frame-pointer-elim"="true" }
attributes #1 = { nounwind readnone speculatable }
We could possibly make this work by having this expand to a libcall and providing its implementation in our runtime. Ref https://reviews.llvm.org/D34708 and JuliaGPU/CUDAnative.jl#303.
I had a patch for this and JuliaGPU/CUDAnative.jl#13:
diff --git a/src/compiler/irgen.jl b/src/compiler/irgen.jl
index 0994165..0a957ce 100644
--- a/src/compiler/irgen.jl
+++ b/src/compiler/irgen.jl
@@ -179,6 +179,7 @@ function irgen(job::CompilerJob, method_instance::Core.MethodInstance, world)
add!(pm, ModulePass("LowerThrow", lower_throw!))
add!(pm, FunctionPass("HideUnreachable", hide_unreachable!))
add!(pm, ModulePass("HideTrap", hide_trap!))
+ add!(pm, ModulePass("LowerUnsupportedIntrinsics", replace_intrinsics!))
always_inliner!(pm)
run!(pm, mod)
end
@@ -444,3 +445,33 @@ function hide_trap!(mod::LLVM.Module)
end
return changed
end
+
+
+# this pass replaces LLVM intrinsics that are not supported by the NVPTX back-end
+# with calls to the NVVM device library.
+function replace_intrinsics!(mod::LLVM.Module)
+ job = current_job::CompilerJob
+ changed = false
+ @timeit to[] "lower unsupported intrinsics" begin
+
+
+ # intrinsics with exact libdevice matches
+
+ for (llvm_fn, nvvm_fn) in ("llvm.pow.f64" => "__nv_pow",
+ "llvm.pow.f32" => "__nv_powf",
+ "llvm.powi.f64" => "__nv_powi",
+ "llvm.powi.f32" => "__nv_powif")
+ if haskey(functions(mod), llvm_fn)
+ llvm_f = functions(mod)[llvm_fn]
+ llvm_ft = eltype(llvmtype(llvm_f))
+
+ nvvm_f = LLVM.Function(mod, nvvm_fn, llvm_ft)
+
+ replace_uses!(llvm_f, nvvm_f)
+ end
+ end
+
+
+ end
+ return changed
+end
But that doesn't work due to the throw_exp_domainerror ... We really should just redefine these functions to call libdevice directly instead of messing around at the LLVM level:
@inline function ^(x::Float64, y::Float64)
z = ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, y)
if isnan(z) & !isnan(x+y)
throw_exp_domainerror(x)
end
z
end
@inline function ^(x::Float32, y::Float32)
z = ccall("llvm.pow.f32", llvmcall, Float32, (Float32, Float32), x, y)
if isnan(z) & !isnan(x+y)
throw_exp_domainerror(x)
end
z
end
@inline ^(x::Float64, y::Integer) = ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, Float64(y))
@inline ^(x::Float32, y::Integer) = ccall("llvm.pow.f32", llvmcall, Float32, (Float32, Float32), x, Float32(y))
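For what it's worth, the `ccall`/`llvmcall` spelling used in these redefinitions is also valid in host code, so it can be sanity-checked on the CPU without a GPU. A quick sketch (the helper name `pow_intrinsic` is mine, not part of the proposal):

```julia
# Host-side sanity check: the same ccall/llvmcall spelling used in the
# redefinitions above also works on the CPU, where LLVM lowers
# llvm.pow.f64 to a regular libm pow call.
pow_intrinsic(x::Float64, y::Float64) =
    ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, y)

println(pow_intrinsic(2.0, 10.0))  # 1024.0
```

On the device, the missing piece is that NVPTX has no lowering for the resulting `fpow` node, which is exactly what the patch above tried to paper over.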
I got a strange error; here is an MWE:
using CUDAnative, CuArrays, GPUArrays
function sq_kernel(out!, x)
i = (blockIdx().x-1) * blockDim().x + threadIdx().x
v = 3
@inbounds out![i] += x[i] ^ v
return nothing
end
function sqfunc(out!::CuVector, z::CuVector)
XY = GPUArrays.thread_blocks_heuristic(length(out!))
@cuda threads=XY[1] blocks=XY[2] sq_kernel(out!, z)
end
# this fails
sqfunc(randn(Float32,128) |> CuArray, randn(Float32,128) |> CuArray)
function sq_kernel(out!, x)
i = (blockIdx().x-1) * blockDim().x + threadIdx().x
@inbounds out![i] += x[i] ^ 3
return nothing
end
# this works
sqfunc(randn(Float32,128) |> CuArray, randn(Float32,128) |> CuArray)
@maleadt, is this related, or should I post a new issue?
Yes, that is related. You can switch to using one of the pow signatures that CUDA supports, e.g., ^ Float32(v).
Well, it still does not work:
using CUDAnative, CuArrays, GPUArrays
function sq_kernel(out!, x)
i = (blockIdx().x-1) * blockDim().x + threadIdx().x
v = 3
@inbounds out![i] += x[i] ^ Float32(v)
return nothing
end
and got:
ERROR: InvalidIRError: compiling sq_kernel(CuDeviceArray{Float32,1,CUDAnative.AS.Global}, CuDeviceArray{Float32,1,CUDAnative.AS.Global}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to jl_alloc_string)
Stacktrace:
[1] _string_n at strings/string.jl:60
[2] string at strings/substring.jl:191
[3] throw_exp_domainerror at math.jl:36
[4] ^ at math.jl:856
[5] sq_kernel at /home/leo/Documents/papers/nilang/codes/bessel_gpu.jl:153
Reason: unsupported call through a literal pointer (call to )
Stacktrace:
[1] unsafe_copyto! at array.jl:242
[2] macro expansion at gcutils.jl:91
[3] __unsafe_string! at strings/substring.jl:178
[4] string at strings/substring.jl:194
[5] throw_exp_domainerror at math.jl:36
[6] ^ at math.jl:856
[7] sq_kernel at /home/leo/Documents/papers/nilang/codes/bessel_gpu.jl:153
Stacktrace:
...
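The stack traces point at the domain-error branch in Base's `^`: pow returns NaN for a negative base with a non-integer exponent, Base then builds a DomainError message via `string()`, and that string allocation is the `jl_alloc_string` call the GPU compiler rejects. The branch is easy to trigger on the CPU (illustration only, not from the thread):

```julia
# Base's ^(::Float32, ::Float32) checks for a NaN result from pow and, if
# the inputs themselves were not NaN, calls throw_exp_domainerror, whose
# error message is built with string() -- hence jl_alloc_string in the
# device-side stack trace above.
err = try
    (-2.0f0) ^ 1.5f0   # negative base, non-integer exponent
catch e
    e
end
println(err isa DomainError)  # true
```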
The available signatures are here: https://github.com/JuliaGPU/CUDAnative.jl/blob/master/src/device/cuda/math.jl#L204-L209
It is very weird: as long as the second argument of ^ is a variable, it does not work, regardless of whether its type is supported or not. More likely a compiler issue?
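A plausible explanation for the literal-vs-variable difference: Julia lowers a *literal* exponent through `Base.literal_pow`, which expands `x ^ 3` into plain multiplications and never reaches the generic `^` (and its throw_exp_domainerror branch), while a *variable* exponent dispatches to the generic method. This is visible in the lowering on the CPU:

```julia
# A literal exponent is lowered to Base.literal_pow (x^3 becomes x*x*x),
# so the generic ^ method -- and its string-allocating error branch -- is
# never reached. A variable exponent goes through the generic ^ instead.
lit = string(Meta.lower(Main, :(x ^ 3)))
var = string(Meta.lower(Main, :(x ^ v)))
println(occursin("literal_pow", lit))  # true:  literal uses literal_pow
println(occursin("literal_pow", var))  # false: variable calls generic ^
```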