ROCm Device Libraries

rocm-device-libs's Introduction

ROCm-Device-Libs has moved!

This project is now located in the AMD Fork of the LLVM Project, under the "amd/device-libs" directory. This repository is now read-only.

All issues and pull requests related to the ROCm device libraries should be filed at https://github.com/ROCm/llvm-project with the device-libs tag.

Users wishing to build the device libraries against upstream LLVM without needing to clone the entire ROCm llvm-project fork can still do so as follows:

cd <upstream-llvm-project>
git remote add rocm-llvm https://github.com/ROCm/llvm-project.git
git fetch rocm-llvm <branch>              # default branch is amd-staging
git checkout rocm-llvm/<branch> -- amd    # default branch is amd-staging

The AMD-specific projects, including comgr, hipcc, and device-libs, will then be present in the <upstream-llvm-project>/amd directory.
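
The steps above can also be scripted. The following is only a sketch: it assembles the git commands from the README for a chosen branch, and the names `device_libs_checkout_commands` and `run_in` are hypothetical helpers, not part of ROCm. Actually running the commands assumes an existing clone of upstream llvm-project and network access.

```python
import subprocess


def device_libs_checkout_commands(branch="amd-staging",
                                  remote="rocm-llvm",
                                  url="https://github.com/ROCm/llvm-project.git"):
    """Return the git commands from the README steps as argv lists."""
    return [
        ["git", "remote", "add", remote, url],
        ["git", "fetch", remote, branch],
        # Check out only the amd/ directory from the fetched branch.
        ["git", "checkout", f"{remote}/{branch}", "--", "amd"],
    ]


def run_in(repo_dir, commands):
    """Run each command inside repo_dir, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=repo_dir, check=True)
```

Note that `git remote add` fails if a remote with that name already exists, so `run_in` would raise on a second run against the same clone.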

rocm-device-libs's Issues

can not build successfully

Create a timer in GPU kernels with LLVM instruction "__llvm_amdgcn_s_memrealtime"?

Hi,

I found the LLVM instruction __llvm_amdgcn_s_memrealtime in "irif/inc/irif.h". It appears to be used in __clock_u64() in hc_amdgcn.ll. What is its clock frequency (i.e., CLOCKS_PER_SEC), so that we can create a timer in GPU kernels?

Thanks,

Jing
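
Turning two counter readings into elapsed time needs the counter frequency, which this issue leaves open. As a minimal host-side sketch, assuming the counter is a free-running unsigned 64-bit value that ticks at a constant rate: `freq_hz` is a placeholder the caller must supply from the platform documentation or a runtime query, not a value taken from the device libraries.

```python
U64_MASK = (1 << 64) - 1


def elapsed_ticks(start, end):
    """Tick difference modulo 2**64, so a single wraparound is handled."""
    return (end - start) & U64_MASK


def ticks_to_seconds(start, end, freq_hz):
    """Convert two counter readings to seconds for an assumed constant freq_hz."""
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    return elapsed_ticks(start, end) / freq_hz
```

For a hypothetical 1 kHz counter, `ticks_to_seconds(0, 1000, 1000)` gives 1.0.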

Fails to build against llvm 16.0.4 with "Option 'debug-counter' registered more than once!"

FAILED: lib/amdgcn/bitcode/oclc_daz_opt_off.bc /var/tmp/portage/dev-libs/rocm-device-libs-5.5.1/work/ROCm-Device-Libs-rocm-5.5.1_build/lib/amdgcn/bitcode/oclc_daz_opt_off.bc 
cd /var/tmp/portage/dev-libs/rocm-device-libs-5.5.1/work/ROCm-Device-Libs-rocm-5.5.1_build/oclc && /usr/lib/llvm/16/bin/llvm-link -o oclc_daz_opt_off.link0.lib.bc @oclc_daz_opt_off_response && /usr/lib/llvm/16/bin/llvm-link -internalize -only-needed oclc_daz_opt_off.link0.lib.bc -o oclc_daz_opt_off.lib.bc && /usr/lib/llvm/16/bin/opt -passes=amdgpu-unify-metadata,strip -o oclc_daz_opt_off.strip.bc oclc_daz_opt_off.lib.bc && /var/tmp/portage/dev-libs/rocm-device-libs-5.5.1/work/ROCm-Device-Libs-rocm-5.5.1_build/utils/prepare-builtins/prepare-builtins -o /var/tmp/portage/dev-libs/rocm-device-libs-5.5.1/work/ROCm-Device-Libs-rocm-5.5.1_build/lib/amdgcn/bitcode/oclc_daz_opt_off.bc oclc_daz_opt_off.strip.bc
: CommandLine Error: Option 'debug-counter' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

More similar errors follow; the whole build log is attached:
build.log

ockl/src/toas.cl does not compile

If a __private pointer is 64 bits wide, the cast through uint truncates it. This is how I fixed it.

grodgers@t0:~/git/ROC/rocm-device-libs/ockl/src$ git diff toas.cl
diff --git a/ockl/src/toas.cl b/ockl/src/toas.cl
index 604b87a..ba9a6c3 100644
--- a/ockl/src/toas.cl
+++ b/ockl/src/toas.cl
@@ -37,6 +37,6 @@ __attribute__((always_inline, const)) __private void *
 OCKL_MANGLE_T(to,private)(void *a)
 {
     uint u = (uint)((ulong)a);
-    return OCKL_MANGLE_T(is_private,addr)(a) ? (__private void *)u : (__private void *)0;
+    return OCKL_MANGLE_T(is_private,addr)(a) ? (__private void *)a : (__private void *)0;
 }
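
The truncation described above can be illustrated with plain integer arithmetic. This sketch models the `(uint)((ulong)a)` cast as keeping only the low 32 bits; the function names are hypothetical and the addresses are made-up values, not real device addresses.

```python
def cast_through_uint(addr):
    """Model (uint)((ulong)a): keep only the low 32 bits of a 64-bit address."""
    return addr & 0xFFFFFFFF


def survives_uint_cast(addr):
    """True if the address is unchanged after the round trip through uint."""
    return cast_through_uint(addr) == addr
```

Any address with a bit set above bit 31 comes back changed, which is why the result breaks as soon as the target pointer is wider than 32 bits.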

"requires cl_khr_fp64 extension to be enabled"

gist

Permission denied (publickey)

Cloning into 'llvm_amd-common'...
Warning: Permanently added the RSA host key for IP address '192.30.253.112' to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Thanks

Test failed for gfx700 (fatal error: error in backend)

Building ROCm-Device-Libs (branch amd-stg-open, commit 8e595dd) on top of llvm-16 (commit dafebd5b5a08dde25f5f52f65cac54bd6ec0ecde), I observed two test failures:

80% tests passed, 2 tests failed out of 10

Total Test time (real) =   0.25 sec

The following tests FAILED:
          4 - compile_atan2__gfx700 (Failed)
          5 - compile_atan2pi__gfx700 (Failed)

The complete build.log:
build.log.gz

Testing log:
LastTest.log.gz

Code object version 5 breaks linking with undefined symbol __oclc_ABI_version

After upgrading from ROCm 5.1 to 5.2, I always get link errors when building, even for examples in HIP-Examples (such as add4). The error is:

lld: error: undefined symbol: __oclc_ABI_version
>>> referenced by /tmp/hip-stream-778a9f/hip-stream-gfx1032.o:(void copy_looper<float, 1>(float const*, float*, int))
>>> referenced by /tmp/hip-stream-778a9f/hip-stream-gfx1032.o:(void copy_looper<float, 1>(float const*, float*, int))
>>> referenced by /tmp/hip-stream-778a9f/hip-stream-gfx1032.o:(void copy_looper<double, 1>(double const*, double*, int))
>>> referenced 29 more times

I've checked that the compiler flags are exactly the same. Using the 5.1.3 stack, I did a git bisect on this repo and found that commit 4b1191a was the culprit, which is a bit confusing because it seems like this commit only calls __oclc_ABI_version a few more times, and didn't modify it in any way.

It would be really helpful if someone can shed some light on this!

ROCDL build issue

Hi,
I am trying to build ROCDL. I am facing this error:

[ 99%] Building C object test/opencl/CMakeFiles/test_math_code.dir/test_math.c.o
fatal error: error in backend: Cannot select: 0xb34d808: i32 = bitcast 0xccc05c8
  0xccc05c8: v2i32 = BUILD_VECTOR 0xb330160, 0xb3136a0
    0xb330160: i32 = select 0xb332ba8, 0x697e268, Constant:i32<0>
      0xb332ba8: i1 = setcc 0x697e268, Constant:i32<-1>, setne:ch
        0x697e268: i32 = shl 0xb3519e8, Constant:i32<3>
          0xb3519e8: i32 = add 0xb2dc198, 0xb324228
            0xb2dc198: i32 = add 0xd113a90, 0x6982238
              0xd113a90: i32 = mul 0xb2e5868, 0xb34b700
                0xb2e5868: i32 = mul 0xcfff1c8, 0xd1135b0
                  0xcfff1c8: i32 = srl 0xb324e58, Constant:i32<16>


                  0xd1135b0: i32 = extract_vector_elt 0xd00e160, Constant:i32<1>


                0xb34b700: i32 = AssertZext 0xcff4c30, ValueType:ch:i11
                  0xcff4c30: i32,ch = CopyFromReg 0x58e7850, Register:i32 %vreg6

              0x6982238: i32 = mul 0xb3464a8, 0xd1135b0
                0xb3464a8: i32 = AssertZext 0xb334340, ValueType:ch:i11
                  0xb334340: i32,ch = CopyFromReg 0x58e7850, Register:i32 %vreg7

                0xd1135b0: i32 = extract_vector_elt 0xd00e160, Constant:i32<1>
                  0xd00e160: v2i32,ch = load<LD8[%7(addrspace=2)](align=4)(dereferenceable)(invariant)> 0xcfff7e0, 0xb3110d0, undef:i64


                  0x69823d8: i32 = Constant<1>
            0xb324228: i32 = AssertZext 0xb346578, ValueType:ch:i11
              0xb346578: i32,ch = CopyFromReg 0x58e7850, Register:i32 %vreg8
                0xb34bd80: i32 = Register %vreg8
          0xb334138: i32 = Constant<3>
        0xb3503f0: i32 = Constant<-1>
      0x697e268: i32 = shl 0xb3519e8, Constant:i32<3>
        0xb3519e8: i32 = add 0xb2dc198, 0xb324228
          0xb2dc198: i32 = add 0xd113a90, 0x6982238
            0xd113a90: i32 = mul 0xb2e5868, 0xb34b700
              0xb2e5868: i32 = mul 0xcfff1c8, 0xd1135b0
                0xcfff1c8: i32 = srl 0xb324e58, Constant:i32<16>
                  0xb324e58: i32 = extract_vector_elt 0xd00e160, Constant:i32<0>


                  0xb324b80: i32 = Constant<16>
                0xd1135b0: i32 = extract_vector_elt 0xd00e160, Constant:i32<1>
                  0xd00e160: v2i32,ch = load<LD8[%7(addrspace=2)](align=4)(dereferenceable)(invariant)> 0xcfff7e0, 0xb3110d0, undef:i64


                  0x69823d8: i32 = Constant<1>
              0xb34b700: i32 = AssertZext 0xcff4c30, ValueType:ch:i11
                0xcff4c30: i32,ch = CopyFromReg 0x58e7850, Register:i32 %vreg6
                  0xb2e5fb8: i32 = Register %vreg6
            0x6982238: i32 = mul 0xb3464a8, 0xd1135b0
              0xb3464a8: i32 = AssertZext 0xb334340, ValueType:ch:i11
                0xb334340: i32,ch = CopyFromReg 0x58e7850, Register:i32 %vreg7
                  0xb3325f8: i32 = Register %vreg7
              0xd1135b0: i32 = extract_vector_elt 0xd00e160, Constant:i32<1>
                0xd00e160: v2i32,ch = load<LD8[%7(addrspace=2)](align=4)(dereferenceable)(invariant)> 0xcfff7e0, 0xb3110d0, undef:i64
                  0xb3110d0: i64 = add 0xd001590, Constant:i64<4>


                  0xb313020: i64 = undef
                0x69823d8: i32 = Constant<1>
          0xb324228: i32 = AssertZext 0xb346578, ValueType:ch:i11
            0xb346578: i32,ch = CopyFromReg 0x58e7850, Register:i32 %vreg8
              0xb34bd80: i32 = Register %vreg8
        0xb334138: i32 = Constant<3>
      0xb330368: i32 = Constant<0>
    0xb3136a0: i32 = select 0xb332ba8, 0xa833ee8, Constant:i32<0>
      0xb332ba8: i1 = setcc 0x697e268, Constant:i32<-1>, setne:ch
        0x697e268: i32 = shl 0xb3519e8, Constant:i32<3>
          0xb3519e8: i32 = add 0xb2dc198, 0xb324228
            0xb2dc198: i32 = add 0xd113a90, 0x6982238
              0xd113a90: i32 = mul 0xb2e5868, 0xb34b700
                0xb2e5868: i32 = mul 0xcfff1c8, 0xd1135b0
                  0xcfff1c8: i32 = srl 0xb324e58, Constant:i32<16>


                  0xd1135b0: i32 = extract_vector_elt 0xd00e160, Constant:i32<1>


                0xb34b700: i32 = AssertZext 0xcff4c30, ValueType:ch:i11
                  0xcff4c30: i32,ch = CopyFromReg 0x58e7850, Register:i32 %vreg6

              0x6982238: i32 = mul 0xb3464a8, 0xd1135b0
                0xb3464a8: i32 = AssertZext 0xb334340, ValueType:ch:i11
                  0xb334340: i32,ch = CopyFromReg 0x58e7850, Register:i32 %vreg7

                0xd1135b0: i32 = extract_vector_elt 0xd00e160, Constant:i32<1>
                  0xd00e160: v2i32,ch = load<LD8[%7(addrspace=2)](align=4)(dereferenceable)(invariant)> 0xcfff7e0, 0xb3110d0, undef:i64


                  0x69823d8: i32 = Constant<1>
            0xb324228: i32 = AssertZext 0xb346578, ValueType:ch:i11
              0xb346578: i32,ch = CopyFromReg 0x58e7850, Register:i32 %vreg8
                0xb34bd80: i32 = Register %vreg8
          0xb334138: i32 = Constant<3>
        0xb3503f0: i32 = Constant<-1>
      0xa833ee8: i32,ch = load<LD4[undef(addrspace=2)+64](align=64)(dereferenceable)(invariant)> 0xb2dbec0:1, 0xb34b630, undef:i64
        0xb34b630: i64 = add 0xb2dbec0, Constant:i64<64>
          0xb2dbec0: i64,ch = CopyFromReg 0x58e7850, Register:i64 %vreg2
            0xcff7950: i64 = Register %vreg2
          0xb2e83e8: i64 = Constant<64>
        0xb313020: i64 = undef
      0xb330368: i32 = Constant<0>
In function: test_fract_float2_float2_float2
clang-4.0: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 4.0.0 (https://github.com/RadeonOpenCompute/clang 794f365321704b645955adfa7dac0851b18382fa) (https://github.com/RadeonOpenCompute/llvm.git 8c11c7bb6a75a44fec687341ba71beef9e5d6e20)
Target: amdgcn--amdhsa
Thread model: posix
InstalledDir: /home/aditya/llvm.src/build/bin
clang-4.0: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
clang-4.0: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-4.0: note: diagnostic msg: /tmp/test_math-9e7cb0.cl
clang-4.0: note: diagnostic msg: /tmp/test_math-9e7cb0.sh
clang-4.0: note: diagnostic msg: 

********************
make[2]: *** [test/opencl/CMakeFiles/test_math_code.dir/test_math.c.o] Error 70
make[1]: *** [test/opencl/CMakeFiles/test_math_code.dir/all] Error 2
make: *** [all] Error 2

5.6.0: test suite is failing in two units

Something looks wrong: the test suite is failing in two units.

+ cd ROCm-Device-Libs-rocm-5.6.0
+ /usr/bin/ctest --test-dir x86_64-redhat-linux-gnu --output-on-failure --force-new-ctest-process -j48
Internal ctest changing into directory: /home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-5.6.0/x86_64-redhat-linux-gnu
Test project /home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-5.6.0/x86_64-redhat-linux-gnu
      Start  1: constant_fold_lgamma_r__gfx900
      Start  2: constant_fold_lgamma_r__gfx1030
      Start  3: compile_asin__gfx700
      Start  4: compile_atan2__gfx700
      Start  5: compile_atan2pi__gfx700
      Start  6: compile_asin__gfx803
      Start  7: compile_atan2__gfx803
      Start  8: compile_atan2pi__gfx803
      Start  9: compile_frexp__gfx600
      Start 10: compile_frexp__gfx700
 1/10 Test  #3: compile_asin__gfx700 ..............   Passed    0.14 sec
 2/10 Test #10: compile_frexp__gfx700 .............   Passed    0.13 sec
 3/10 Test  #8: compile_atan2pi__gfx803 ...........   Passed    0.14 sec
 4/10 Test  #6: compile_asin__gfx803 ..............   Passed    0.15 sec
 5/10 Test  #7: compile_atan2__gfx803 .............   Passed    0.15 sec
 6/10 Test  #9: compile_frexp__gfx600 .............   Passed    0.15 sec
 7/10 Test  #1: constant_fold_lgamma_r__gfx900 ....   Passed    0.17 sec
 8/10 Test  #2: constant_fold_lgamma_r__gfx1030 ...   Passed    0.17 sec
 9/10 Test  #4: compile_atan2__gfx700 .............***Failed    0.27 sec
CMake Error at /home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-5.6.0/test/compile/RunCompileTest.cmake:24 (message):
  Error compiling test: fatal error: error in backend: Cannot select:
  0x5626e75ceee0: f16 = fma # D:1 0x5626e70ab360, 0x5626e75cee70,
  0x5626e70ab360

    0x5626e70ab360: f16 = DIV_FIXUP # D:1 0x5626e70ab2f0, 0x5626e75cea80, 0x5626e70ab440
      0x5626e70ab2f0: f16 = fp_round # D:1 0x5626e70ab210, TargetConstant:i32<0>
        0x5626e70ab210: f32 = fmul # D:1 0x5626e70aab10, 0x5626e70aae90
          0x5626e70aab10: f32 = fp_extend # D:1 0x5626e70ab440
            0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
              0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                  0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
                    0x5626e75ce8c0: f16 = Register %1
              0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                  0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
                    0x5626e75ce7e0: f16 = Register %0
          0x5626e70aae90: f32 = RCP # D:1 0x5626e70aab80
            0x5626e70aab80: f32 = fp_extend # D:1 0x5626e75cea80
              0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
                0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                  0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                    0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1



                0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                  0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                    0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0



        0x5626e70ab280: i32 = TargetConstant<0>
      0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
        0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
          0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
            0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
              0x5626e75ce8c0: f16 = Register %1
        0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
          0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
            0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
              0x5626e75ce7e0: f16 = Register %0
      0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
        0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
          0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
            0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
              0x5626e75ce8c0: f16 = Register %1
        0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
          0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
            0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
              0x5626e75ce7e0: f16 = Register %0
    0x5626e75cee70: f16 = fmul # D:1 0x5626e75cebd0, 0x5626e75cee00
      0x5626e75cebd0: f16 = fmul # D:1 0x5626e70ab360, 0x5626e70ab360
        0x5626e70ab360: f16 = DIV_FIXUP # D:1 0x5626e70ab2f0, 0x5626e75cea80, 0x5626e70ab440
          0x5626e70ab2f0: f16 = fp_round # D:1 0x5626e70ab210, TargetConstant:i32<0>
            0x5626e70ab210: f32 = fmul # D:1 0x5626e70aab10, 0x5626e70aae90
              0x5626e70aab10: f32 = fp_extend # D:1 0x5626e70ab440
                0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
                  0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                    0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930



                  0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                    0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850



              0x5626e70aae90: f32 = RCP # D:1 0x5626e70aab80
                0x5626e70aab80: f32 = fp_extend # D:1 0x5626e75cea80
                  0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
                    0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0



                    0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10



            0x5626e70ab280: i32 = TargetConstant<0>
          0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
            0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
              0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
                  0x5626e75ce8c0: f16 = Register %1
            0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
              0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
                  0x5626e75ce7e0: f16 = Register %0
          0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
            0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
              0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
                  0x5626e75ce8c0: f16 = Register %1
            0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
              0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
                  0x5626e75ce7e0: f16 = Register %0
        0x5626e70ab360: f16 = DIV_FIXUP # D:1 0x5626e70ab2f0, 0x5626e75cea80, 0x5626e70ab440
          0x5626e70ab2f0: f16 = fp_round # D:1 0x5626e70ab210, TargetConstant:i32<0>
            0x5626e70ab210: f32 = fmul # D:1 0x5626e70aab10, 0x5626e70aae90
              0x5626e70aab10: f32 = fp_extend # D:1 0x5626e70ab440
                0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
                  0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                    0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930



                  0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                    0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850



              0x5626e70aae90: f32 = RCP # D:1 0x5626e70aab80
                0x5626e70aab80: f32 = fp_extend # D:1 0x5626e75cea80
                  0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
                    0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0



                    0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10



            0x5626e70ab280: i32 = TargetConstant<0>
          0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
            0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
              0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
                  0x5626e75ce8c0: f16 = Register %1
            0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
              0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
                  0x5626e75ce7e0: f16 = Register %0
          0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
            0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
              0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
                  0x5626e75ce8c0: f16 = Register %1
            0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
              0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
                  0x5626e75ce7e0: f16 = Register %0
      0x5626e75cee00: f16 = fma # D:1 0x5626e75cebd0, 0x5626e75ced20, ConstantFP:f16<APFloat(46397)>
        0x5626e75cebd0: f16 = fmul # D:1 0x5626e70ab360, 0x5626e70ab360
          0x5626e70ab360: f16 = DIV_FIXUP # D:1 0x5626e70ab2f0, 0x5626e75cea80, 0x5626e70ab440
            0x5626e70ab2f0: f16 = fp_round # D:1 0x5626e70ab210, TargetConstant:i32<0>
              0x5626e70ab210: f32 = fmul # D:1 0x5626e70aab10, 0x5626e70aae90
                0x5626e70aab10: f32 = fp_extend # D:1 0x5626e70ab440
                  0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
                    0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0



                    0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10



                0x5626e70aae90: f32 = RCP # D:1 0x5626e70aab80
                  0x5626e70aab80: f32 = fp_extend # D:1 0x5626e75cea80
                    0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0





              0x5626e70ab280: i32 = TargetConstant<0>
            0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
              0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                  0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
                    0x5626e75ce8c0: f16 = Register %1
              0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                  0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
                    0x5626e75ce7e0: f16 = Register %0
            0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
              0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                  0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
                    0x5626e75ce8c0: f16 = Register %1
              0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                  0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
                    0x5626e75ce7e0: f16 = Register %0
          0x5626e70ab360: f16 = DIV_FIXUP # D:1 0x5626e70ab2f0, 0x5626e75cea80, 0x5626e70ab440
            0x5626e70ab2f0: f16 = fp_round # D:1 0x5626e70ab210, TargetConstant:i32<0>
              0x5626e70ab210: f32 = fmul # D:1 0x5626e70aab10, 0x5626e70aae90
                0x5626e70aab10: f32 = fp_extend # D:1 0x5626e70ab440
                  0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
                    0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0



                    0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10



                0x5626e70aae90: f32 = RCP # D:1 0x5626e70aab80
                  0x5626e70aab80: f32 = fp_extend # D:1 0x5626e75cea80
                    0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0





              0x5626e70ab280: i32 = TargetConstant<0>
            0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
              0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                  0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
                    0x5626e75ce8c0: f16 = Register %1
              0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                  0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
                    0x5626e75ce7e0: f16 = Register %0
            0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
              0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                  0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
                    0x5626e75ce8c0: f16 = Register %1
              0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                  0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
                    0x5626e75ce7e0: f16 = Register %0
        0x5626e75ced20: f16 = fma # D:1 0x5626e75cebd0, ConstantFP:f16<APFloat(43490)>, ConstantFP:f16<APFloat(12563)>
          0x5626e75cebd0: f16 = fmul # D:1 0x5626e70ab360, 0x5626e70ab360
            0x5626e70ab360: f16 = DIV_FIXUP # D:1 0x5626e70ab2f0, 0x5626e75cea80, 0x5626e70ab440
              0x5626e70ab2f0: f16 = fp_round # D:1 0x5626e70ab210, TargetConstant:i32<0>
                0x5626e70ab210: f32 = fmul # D:1 0x5626e70aab10, 0x5626e70aae90
                  0x5626e70aab10: f32 = fp_extend # D:1 0x5626e70ab440
                    0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0





                  0x5626e70aae90: f32 = RCP # D:1 0x5626e70aab80
                    0x5626e70aab80: f32 = fp_extend # D:1 0x5626e75cea80



                0x5626e70ab280: i32 = TargetConstant<0>
              0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
                0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                  0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                    0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1



                0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                  0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                    0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0



              0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
                0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                  0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                    0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1



                0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                  0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                    0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0



            0x5626e70ab360: f16 = DIV_FIXUP # D:1 0x5626e70ab2f0, 0x5626e75cea80, 0x5626e70ab440
              0x5626e70ab2f0: f16 = fp_round # D:1 0x5626e70ab210, TargetConstant:i32<0>
                0x5626e70ab210: f32 = fmul # D:1 0x5626e70aab10, 0x5626e70aae90
                  0x5626e70aab10: f32 = fp_extend # D:1 0x5626e70ab440
                    0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0





                  0x5626e70aae90: f32 = RCP # D:1 0x5626e70aab80
                    0x5626e70aab80: f32 = fp_extend # D:1 0x5626e75cea80



                0x5626e70ab280: i32 = TargetConstant<0>
              0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
                0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                  0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                    0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1



                0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                  0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                    0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0



              0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
                0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                  0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                    0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1



                0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                  0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                    0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0



          0x5626e75cecb0: f16 = ConstantFP<APFloat(43490)>
          0x5626e75cec40: f16 = ConstantFP<APFloat(12563)>
        0x5626e75ced90: f16 = ConstantFP<APFloat(46397)>
    0x5626e70ab360: f16 = DIV_FIXUP # D:1 0x5626e70ab2f0, 0x5626e75cea80, 0x5626e70ab440
      0x5626e70ab2f0: f16 = fp_round # D:1 0x5626e70ab210, TargetConstant:i32<0>
        0x5626e70ab210: f32 = fmul # D:1 0x5626e70aab10, 0x5626e70aae90
          0x5626e70aab10: f32 = fp_extend # D:1 0x5626e70ab440
            0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
              0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                  0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
                    0x5626e75ce8c0: f16 = Register %1
              0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                  0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
                    0x5626e75ce7e0: f16 = Register %0
          0x5626e70aae90: f32 = RCP # D:1 0x5626e70aab80
            0x5626e70aab80: f32 = fp_extend # D:1 0x5626e75cea80
              0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
                0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
                  0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
                    0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1



                0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
                  0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
                    0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0



        0x5626e70ab280: i32 = TargetConstant<0>
      0x5626e75cea80: f16 = fmaxnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
        0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
          0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
            0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
              0x5626e75ce8c0: f16 = Register %1
        0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
          0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
            0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
              0x5626e75ce7e0: f16 = Register %0
      0x5626e70ab440: f16 = fminnum_ieee # D:1 0x5626e75ceb60, 0x5626e70ab3d0
        0x5626e75ceb60: f16 = fcanonicalize # D:1 0x5626e75ce9a0
          0x5626e75ce9a0: f16 = fabs # D:1 0x5626e75ce930
            0x5626e75ce930: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %1
              0x5626e75ce8c0: f16 = Register %1
        0x5626e70ab3d0: f16 = fcanonicalize # D:1 0x5626e75cea10
          0x5626e75cea10: f16 = fabs # D:1 0x5626e75ce850
            0x5626e75ce850: f16,ch = CopyFromReg # D:1 0x5626e76b2500, Register:f16 %0
              0x5626e75ce7e0: f16 = Register %0

  In function: __ocml_atan2_f16

  PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/
  and include the crash backtrace, preprocessed source, and associated run
  script.

  Stack dump:

  0.    Program arguments: /usr/bin/clang-16 -O3 -S -cl-std=CL2.0 -target
  amdgcn-amd-amdhsa -mcpu=gfx700 -Xclang -finclude-default-header
  --rocm-path=/home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-5.6.0/x86_64-redhat-linux-gnu
  -mllvm -amdgpu-simplify-libcall=0 -o output.atan2.gfx700.s
  /home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-5.6.0/test/compile/atan2.cl


  1.    <eof> parser at end of file

  2.    Code generation

  3.    Running pass 'CallGraph Pass Manager' on module
  '/home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-5.6.0/test/compile/atan2.cl'.


  4.    Running pass 'AMDGPU DAG->DAG Pattern Instruction Selection' on function
  '@__ocml_atan2_f16'

   #0 0x00007ff6399e3911 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/lib64/libLLVM-16.so.0+0xde3911)
   #1 0x00007ff6399e161a llvm::sys::RunSignalHandlers() (/lib64/libLLVM-16.so.0+0xde161a)
   #2 0x00007ff6398ff72a llvm::CrashRecoveryContext::HandleExit(int) (/lib64/libLLVM-16.so.0+0xcff72a)
   #3 0x00007ff6399db6c4 llvm::sys::Process::Exit(int, bool) (/lib64/libLLVM-16.so.0+0xddb6c4)
   #4 0x00005626e5890fa6 (/usr/bin/clang-16+0x11fa6)
   #5 0x00007ff6399116c5 llvm::report_fatal_error(llvm::Twine const&, bool) (/lib64/libLLVM-16.so.0+0xd116c5)
   #6 0x00007ff63a325a42 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/lib64/libLLVM-16.so.0+0x1725a42)
   #7 0x00007ff63a32a893 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/lib64/libLLVM-16.so.0+0x172a893)
   #8 0x00007ff63a322ddc llvm::SelectionDAGISel::DoInstructionSelection() (/lib64/libLLVM-16.so.0+0x1722ddc)
   #9 0x00007ff63a32d179 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/lib64/libLLVM-16.so.0+0x172d179)

  #10 0x00007ff63a330461 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/lib64/libLLVM-16.so.0+0x1730461)
  #11 0x00007ff63a3321bc (/lib64/libLLVM-16.so.0+0x17321bc)
  #12 0x00007ff639df9897 (/lib64/libLLVM-16.so.0+0x11f9897)
  #13 0x00007ff639b4466b llvm::FPPassManager::runOnFunction(llvm::Function&) (/lib64/libLLVM-16.so.0+0xf4466b)
  #14 0x00007ff63afffd73 (/lib64/libLLVM-16.so.0+0x23ffd73)
  #15 0x00007ff639b44ecc llvm::legacy::PassManagerImpl::run(llvm::Module&) (/lib64/libLLVM-16.so.0+0xf44ecc)
  #16 0x00007ff641ac1ead clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>) (/lib64/libclang-cpp.so.16.0+0x14c1ead)
  #17 0x00007ff641ac47fb (/lib64/libclang-cpp.so.16.0+0x14c47fb)
  #18 0x00007ff642d60ba2 clang::ParseAST(clang::Sema&, bool, bool) (/lib64/libclang-cpp.so.16.0+0x2760ba2)
  #19 0x00007ff6424acf79 clang::FrontendAction::Execute() (/lib64/libclang-cpp.so.16.0+0x1eacf79)
  #20 0x00007ff642d6acfb clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/lib64/libclang-cpp.so.16.0+0x276acfb)
  #21 0x00007ff642d6cbea clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/lib64/libclang-cpp.so.16.0+0x276cbea)
  #22 0x00005626e5897e68 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/bin/clang-16+0x18e68)
  #23 0x00005626e589ddda (/usr/bin/clang-16+0x1edda)
  #24 0x00007ff640de31f1 (/lib64/libclang-cpp.so.16.0+0x7e31f1)
  #25 0x00007ff6398ff60a llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/lib64/libLLVM-16.so.0+0xcff60a)
  #26 0x00007ff640f7440f (/lib64/libclang-cpp.so.16.0+0x97440f)
  #27 0x00007ff641a1b826 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/lib64/libclang-cpp.so.16.0+0x141b826)
  #28 0x00007ff641a1bb6c clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/lib64/libclang-cpp.so.16.0+0x141bb6c)
  #29 0x00007ff641a20aec clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/lib64/libclang-cpp.so.16.0+0x1420aec)
  #30 0x00005626e58a0518 clang_main(int, char**) (/usr/bin/clang-16+0x21518)
  #31 0x00007ff6384280ca __libc_start_call_main (/lib64/libc.so.6+0x280ca)
  #32 0x00007ff63842818b __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2818b)
  #33 0x00005626e588f175 _start (/usr/bin/clang-16+0x10175)

  clang-16: error: clang frontend command failed with exit code 70 (use -v to
  see invocation)

  clang version 16.0.6 (G2V 16.0.6-2.fc35)

  Target: amdgcn-amd-amdhsa

  Thread model: posix

  InstalledDir: /usr/bin

  clang-16: note: diagnostic msg:

  ********************



  PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:

  Preprocessed source(s) and associated run script(s) are located at:

  clang-16: note: diagnostic msg: /tmp/atan2-84026d.cl

  clang-16: note: diagnostic msg: /tmp/atan2-84026d.sh

  clang-16: note: diagnostic msg:



  ********************




10/10 Test  #5: compile_atan2pi__gfx700 ...........***Failed    0.27 sec
CMake Error at /home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-5.6.0/test/compile/RunCompileTest.cmake:24 (message):
  Error compiling test: fatal error: error in backend: Cannot select:
  0x5584c1b533e0: f16 = fma # D:1 0x5584c1b530d0, 0x5584c1b53300,
  ConstantFP:f16<APFloat(13592)>

    0x5584c1b530d0: f16 = fmul # D:1 0x5584c162ef30, 0x5584c162ef30
      0x5584c162ef30: f16 = DIV_FIXUP # D:1 0x5584c162eec0, 0x5584c1b52f80, 0x5584c162f010
        0x5584c162eec0: f16 = fp_round # D:1 0x5584c162ede0, TargetConstant:i32<0>
          0x5584c162ede0: f32 = fmul # D:1 0x5584c162e6e0, 0x5584c162ea60
            0x5584c162e6e0: f32 = fp_extend # D:1 0x5584c162f010
              0x5584c162f010: f16 = fminnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
                0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
                  0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
                    0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1



                0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
                  0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
                    0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0



            0x5584c162ea60: f32 = RCP # D:1 0x5584c162e750
              0x5584c162e750: f32 = fp_extend # D:1 0x5584c1b52f80
                0x5584c1b52f80: f16 = fmaxnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
                  0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
                    0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30



                  0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
                    0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50



          0x5584c162ee50: i32 = TargetConstant<0>
        0x5584c1b52f80: f16 = fmaxnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
          0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
            0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
              0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1
                0x5584c1b52dc0: f16 = Register %1
          0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
            0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
              0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0
                0x5584c1b52ce0: f16 = Register %0
        0x5584c162f010: f16 = fminnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
          0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
            0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
              0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1
                0x5584c1b52dc0: f16 = Register %1
          0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
            0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
              0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0
                0x5584c1b52ce0: f16 = Register %0
      0x5584c162ef30: f16 = DIV_FIXUP # D:1 0x5584c162eec0, 0x5584c1b52f80, 0x5584c162f010
        0x5584c162eec0: f16 = fp_round # D:1 0x5584c162ede0, TargetConstant:i32<0>
          0x5584c162ede0: f32 = fmul # D:1 0x5584c162e6e0, 0x5584c162ea60
            0x5584c162e6e0: f32 = fp_extend # D:1 0x5584c162f010
              0x5584c162f010: f16 = fminnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
                0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
                  0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
                    0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1



                0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
                  0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
                    0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0



            0x5584c162ea60: f32 = RCP # D:1 0x5584c162e750
              0x5584c162e750: f32 = fp_extend # D:1 0x5584c1b52f80
                0x5584c1b52f80: f16 = fmaxnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
                  0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
                    0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30



                  0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
                    0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50



          0x5584c162ee50: i32 = TargetConstant<0>
        0x5584c1b52f80: f16 = fmaxnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
          0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
            0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
              0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1
                0x5584c1b52dc0: f16 = Register %1
          0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
            0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
              0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0
                0x5584c1b52ce0: f16 = Register %0
        0x5584c162f010: f16 = fminnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
          0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
            0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
              0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1
                0x5584c1b52dc0: f16 = Register %1
          0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
            0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
              0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0
                0x5584c1b52ce0: f16 = Register %0
    0x5584c1b53300: f16 = fma # D:1 0x5584c1b530d0, 0x5584c1b53220, ConstantFP:f16<APFloat(44722)>
      0x5584c1b530d0: f16 = fmul # D:1 0x5584c162ef30, 0x5584c162ef30
        0x5584c162ef30: f16 = DIV_FIXUP # D:1 0x5584c162eec0, 0x5584c1b52f80, 0x5584c162f010
          0x5584c162eec0: f16 = fp_round # D:1 0x5584c162ede0, TargetConstant:i32<0>
            0x5584c162ede0: f32 = fmul # D:1 0x5584c162e6e0, 0x5584c162ea60
              0x5584c162e6e0: f32 = fp_extend # D:1 0x5584c162f010
                0x5584c162f010: f16 = fminnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
                  0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
                    0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30



                  0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
                    0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50



              0x5584c162ea60: f32 = RCP # D:1 0x5584c162e750
                0x5584c162e750: f32 = fp_extend # D:1 0x5584c1b52f80
                  0x5584c1b52f80: f16 = fmaxnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
                    0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0



                    0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10



            0x5584c162ee50: i32 = TargetConstant<0>
          0x5584c1b52f80: f16 = fmaxnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
            0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
              0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
                0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1
                  0x5584c1b52dc0: f16 = Register %1
            0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
              0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
                0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0
                  0x5584c1b52ce0: f16 = Register %0
          0x5584c162f010: f16 = fminnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
            0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
              0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
                0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1
                  0x5584c1b52dc0: f16 = Register %1
            0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
              0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
                0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0
                  0x5584c1b52ce0: f16 = Register %0
        0x5584c162ef30: f16 = DIV_FIXUP # D:1 0x5584c162eec0, 0x5584c1b52f80, 0x5584c162f010
          0x5584c162eec0: f16 = fp_round # D:1 0x5584c162ede0, TargetConstant:i32<0>
            0x5584c162ede0: f32 = fmul # D:1 0x5584c162e6e0, 0x5584c162ea60
              0x5584c162e6e0: f32 = fp_extend # D:1 0x5584c162f010
                0x5584c162f010: f16 = fminnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
                  0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
                    0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30



                  0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
                    0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50



              0x5584c162ea60: f32 = RCP # D:1 0x5584c162e750
                0x5584c162e750: f32 = fp_extend # D:1 0x5584c1b52f80
                  0x5584c1b52f80: f16 = fmaxnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
                    0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0



                    0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10



            0x5584c162ee50: i32 = TargetConstant<0>
          0x5584c1b52f80: f16 = fmaxnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
            0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
              0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
                0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1
                  0x5584c1b52dc0: f16 = Register %1
            0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
              0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
                0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0
                  0x5584c1b52ce0: f16 = Register %0
          0x5584c162f010: f16 = fminnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
            0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
              0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
                0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1
                  0x5584c1b52dc0: f16 = Register %1
            0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
              0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
                0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0
                  0x5584c1b52ce0: f16 = Register %0
      0x5584c1b53220: f16 = fma # D:1 0x5584c1b530d0, ConstantFP:f16<APFloat(41917)>, ConstantFP:f16<APFloat(10897)>
        0x5584c1b530d0: f16 = fmul # D:1 0x5584c162ef30, 0x5584c162ef30
          0x5584c162ef30: f16 = DIV_FIXUP # D:1 0x5584c162eec0, 0x5584c1b52f80, 0x5584c162f010
            0x5584c162eec0: f16 = fp_round # D:1 0x5584c162ede0, TargetConstant:i32<0>
              0x5584c162ede0: f32 = fmul # D:1 0x5584c162e6e0, 0x5584c162ea60
                0x5584c162e6e0: f32 = fp_extend # D:1 0x5584c162f010
                  0x5584c162f010: f16 = fminnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
                    0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0



                    0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10



                0x5584c162ea60: f32 = RCP # D:1 0x5584c162e750
                  0x5584c162e750: f32 = fp_extend # D:1 0x5584c1b52f80
                    0x5584c1b52f80: f16 = fmaxnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0





              0x5584c162ee50: i32 = TargetConstant<0>
            0x5584c1b52f80: f16 = fmaxnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
              0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
                0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
                  0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1
                    0x5584c1b52dc0: f16 = Register %1
              0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
                0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
                  0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0
                    0x5584c1b52ce0: f16 = Register %0
            0x5584c162f010: f16 = fminnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
              0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
                0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
                  0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1
                    0x5584c1b52dc0: f16 = Register %1
              0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
                0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
                  0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0
                    0x5584c1b52ce0: f16 = Register %0
          0x5584c162ef30: f16 = DIV_FIXUP # D:1 0x5584c162eec0, 0x5584c1b52f80, 0x5584c162f010
            0x5584c162eec0: f16 = fp_round # D:1 0x5584c162ede0, TargetConstant:i32<0>
              0x5584c162ede0: f32 = fmul # D:1 0x5584c162e6e0, 0x5584c162ea60
                0x5584c162e6e0: f32 = fp_extend # D:1 0x5584c162f010
                  0x5584c162f010: f16 = fminnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
                    0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0



                    0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10



                0x5584c162ea60: f32 = RCP # D:1 0x5584c162e750
                  0x5584c162e750: f32 = fp_extend # D:1 0x5584c1b52f80
                    0x5584c1b52f80: f16 = fmaxnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0





              0x5584c162ee50: i32 = TargetConstant<0>
            0x5584c1b52f80: f16 = fmaxnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
              0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
                0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
                  0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1
                    0x5584c1b52dc0: f16 = Register %1
              0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
                0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
                  0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0
                    0x5584c1b52ce0: f16 = Register %0
            0x5584c162f010: f16 = fminnum_ieee # D:1 0x5584c1b53060, 0x5584c162efa0
              0x5584c1b53060: f16 = fcanonicalize # D:1 0x5584c1b52ea0
                0x5584c1b52ea0: f16 = fabs # D:1 0x5584c1b52e30
                  0x5584c1b52e30: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %1
                    0x5584c1b52dc0: f16 = Register %1
              0x5584c162efa0: f16 = fcanonicalize # D:1 0x5584c1b52f10
                0x5584c1b52f10: f16 = fabs # D:1 0x5584c1b52d50
                  0x5584c1b52d50: f16,ch = CopyFromReg # D:1 0x5584c1615ac0, Register:f16 %0
                    0x5584c1b52ce0: f16 = Register %0
        0x5584c1b531b0: f16 = ConstantFP<APFloat(41917)>
        0x5584c1b53140: f16 = ConstantFP<APFloat(10897)>
      0x5584c1b53290: f16 = ConstantFP<APFloat(44722)>
    0x5584c1b53370: f16 = ConstantFP<APFloat(13592)>

  In function: __ocml_atan2pi_f16

  PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/
  and include the crash backtrace, preprocessed source, and associated run
  script.

  Stack dump:

  0.    Program arguments: /usr/bin/clang-16 -O3 -S -cl-std=CL2.0 -target
  amdgcn-amd-amdhsa -mcpu=gfx700 -Xclang -finclude-default-header
  --rocm-path=/home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-5.6.0/x86_64-redhat-linux-gnu
  -mllvm -amdgpu-simplify-libcall=0 -o output.atan2pi.gfx700.s
  /home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-5.6.0/test/compile/atan2pi.cl


  1.    <eof> parser at end of file

  2.    Code generation

  3.    Running pass 'CallGraph Pass Manager' on module
  '/home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-5.6.0/test/compile/atan2pi.cl'.


  4.    Running pass 'AMDGPU DAG->DAG Pattern Instruction Selection' on function
  '@__ocml_atan2pi_f16'

   #0 0x00007fb1937e3911 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/lib64/libLLVM-16.so.0+0xde3911)
   #1 0x00007fb1937e161a llvm::sys::RunSignalHandlers() (/lib64/libLLVM-16.so.0+0xde161a)
   #2 0x00007fb1936ff72a llvm::CrashRecoveryContext::HandleExit(int) (/lib64/libLLVM-16.so.0+0xcff72a)
   #3 0x00007fb1937db6c4 llvm::sys::Process::Exit(int, bool) (/lib64/libLLVM-16.so.0+0xddb6c4)
   #4 0x00005584bf51bfa6 (/usr/bin/clang-16+0x11fa6)
   #5 0x00007fb1937116c5 llvm::report_fatal_error(llvm::Twine const&, bool) (/lib64/libLLVM-16.so.0+0xd116c5)
   #6 0x00007fb194125a42 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/lib64/libLLVM-16.so.0+0x1725a42)
   #7 0x00007fb19412a893 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/lib64/libLLVM-16.so.0+0x172a893)
   #8 0x00007fb194122ddc llvm::SelectionDAGISel::DoInstructionSelection() (/lib64/libLLVM-16.so.0+0x1722ddc)
   #9 0x00007fb19412d179 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/lib64/libLLVM-16.so.0+0x172d179)

  #10 0x00007fb194130461 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/lib64/libLLVM-16.so.0+0x1730461)
  #11 0x00007fb1941321bc (/lib64/libLLVM-16.so.0+0x17321bc)
  #12 0x00007fb193bf9897 (/lib64/libLLVM-16.so.0+0x11f9897)
  #13 0x00007fb19394466b llvm::FPPassManager::runOnFunction(llvm::Function&) (/lib64/libLLVM-16.so.0+0xf4466b)
  #14 0x00007fb194dffd73 (/lib64/libLLVM-16.so.0+0x23ffd73)
  #15 0x00007fb193944ecc llvm::legacy::PassManagerImpl::run(llvm::Module&) (/lib64/libLLVM-16.so.0+0xf44ecc)
  #16 0x00007fb19b8c1ead clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>) (/lib64/libclang-cpp.so.16.0+0x14c1ead)
  #17 0x00007fb19b8c47fb (/lib64/libclang-cpp.so.16.0+0x14c47fb)
  #18 0x00007fb19cb60ba2 clang::ParseAST(clang::Sema&, bool, bool) (/lib64/libclang-cpp.so.16.0+0x2760ba2)
  #19 0x00007fb19c2acf79 clang::FrontendAction::Execute() (/lib64/libclang-cpp.so.16.0+0x1eacf79)
  #20 0x00007fb19cb6acfb clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/lib64/libclang-cpp.so.16.0+0x276acfb)
  #21 0x00007fb19cb6cbea clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/lib64/libclang-cpp.so.16.0+0x276cbea)
  #22 0x00005584bf522e68 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/bin/clang-16+0x18e68)
  #23 0x00005584bf528dda (/usr/bin/clang-16+0x1edda)
  #24 0x00007fb19abe31f1 (/lib64/libclang-cpp.so.16.0+0x7e31f1)
  #25 0x00007fb1936ff60a llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/lib64/libLLVM-16.so.0+0xcff60a)
  #26 0x00007fb19ad7440f (/lib64/libclang-cpp.so.16.0+0x97440f)
  #27 0x00007fb19b81b826 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/lib64/libclang-cpp.so.16.0+0x141b826)
  #28 0x00007fb19b81bb6c clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/lib64/libclang-cpp.so.16.0+0x141bb6c)
  #29 0x00007fb19b820aec clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/lib64/libclang-cpp.so.16.0+0x1420aec)
  #30 0x00005584bf52b518 clang_main(int, char**) (/usr/bin/clang-16+0x21518)
  #31 0x00007fb1922280ca __libc_start_call_main (/lib64/libc.so.6+0x280ca)
  #32 0x00007fb19222818b __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2818b)
  #33 0x00005584bf51a175 _start (/usr/bin/clang-16+0x10175)

  clang-16: error: clang frontend command failed with exit code 70 (use -v to
  see invocation)

  clang version 16.0.6 (G2V 16.0.6-2.fc35)

  Target: amdgcn-amd-amdhsa

  Thread model: posix

  InstalledDir: /usr/bin

  clang-16: note: diagnostic msg:

  ********************



  PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:

  Preprocessed source(s) and associated run script(s) are located at:

  clang-16: note: diagnostic msg: /tmp/atan2pi-4fa3b0.cl

  clang-16: note: diagnostic msg: /tmp/atan2pi-4fa3b0.sh

  clang-16: note: diagnostic msg:



  ********************





80% tests passed, 2 tests failed out of 10

Total Test time (real) =   0.28 sec

The following tests FAILED:
          4 - compile_atan2__gfx700 (Failed)
          5 - compile_atan2pi__gfx700 (Failed)
Errors while running CTest

I'm using LLVM 16.0.6.
I'm not sure what other details about my build environment I should provide.

conversions_3835: error: unsupported call to function __ockl_clz_i32

cd /srv/git/ROCm-Device-Libs/build/test/opencl_conformance && /srv/git/llvm.git/build/dist/bin/clang -I/srv/git/ROCm-Device-Libs/inc -x cl -Xclang -cl-std=CL2.0 -fblocks -target amdgcn--amdhsa -DCL_VERSION_2_0=200 -D__OPENCL_C_VERSION__=200 -Dcl_khr_fp64 -Dcl_khr_fp16 -include opencl-c.h -target amdgcn--amdhsa -mcpu=fiji -Xclang -mlink-bitcode-file -Xclang /srv/git/ROCm-Device-Libs/build/opencl/opencl.bc -Xclang -mlink-bitcode-file -Xclang /srv/git/ROCm-Device-Libs/build/ocml/ocml.bc -Xclang -mlink-bitcode-file -Xclang /srv/git/ROCm-Device-Libs/build/ockl/ockl.bc -Xclang -mlink-bitcode-file -Xclang /srv/git/ROCm-Device-Libs/build/oclc/oclc.bc -Xclang -mlink-bitcode-file -Xclang /srv/git/ROCm-Device-Libs/build/llvm/llvm.bc -o CMakeFiles/conversions_3835_code.dir/conversions_3835.c.o -c /srv/git/ROCm-Device-Libs/build/test/opencl_conformance/conversions_3835.c
/srv/git/ROCm-Device-Libs/build/test/opencl_conformance/conversions_3835.c:1:15: error: unsupported call to function __ockl_clz_i32
__kernel void test_convert_float2_rtn_int2( __global int2 *src, __global float2 *dest )
^
/srv/git/ROCm-Device-Libs/build/test/opencl_conformance/conversions_3835.c:1:15: error: unsupported call to function __ockl_clz_i32
2 errors generated.

Kernel:
__kernel void test_convert_float2_rtn_int2( __global int2 *src, __global float2 *dest )
{
    size_t i = get_global_id(0);
    dest[i] = convert_float2_rtn( src[i] );
}

Clang can't find device library when using FHS compliant path

I have packaged ROCm for Solus, closely following @Mystro256's packages for Fedora and using GNUInstallDirs instead of /opt/rocm. The paths are /usr/lib64/cmake/AMDDeviceLibs/AMDDeviceLibsConfig.cmake and /usr/lib64/amdgcn/bitcode/, the same as on Fedora. I used this patch without modifications, so I assume this affects Fedora as well.
All packages work so far, up to rocm-opencl and hip, but when building rocBLAS I am getting:

clang-13: error: cannot find ROCm device library. Provide its path via --rocm-path or --rocm-device-lib-path, or pass -nogpulib to build without ROCm device library.

This happens with both LLVM 13 and LLVM 14, whether or not HIP_DEVICE_LIB_PATH or --rocm-device-lib-path is set.
I have tested many different flags and gone through how LLVM is supposed to detect the device libs, and at this point I have no idea what else to try. Does anyone have a suggestion?
Here is my rocm-device-libs build script, and here is the rocBLAS one that fails.
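One way to narrow this down is to confirm that the directory handed to clang actually contains the bitcode files before blaming the detection logic. The following is a minimal sketch in Python, not clang's own detection code. The file names in `EXPECTED_BITCODE` are an assumption taken from the `-mlink-bitcode-file` arguments quoted elsewhere on this page, and the helper names are hypothetical; the set of files clang really requires may differ between versions.

```python
from pathlib import Path
from typing import List, Optional

# Assumption: names borrowed from build commands quoted on this page.
# The set of files clang actually checks for may be different.
EXPECTED_BITCODE = ("ocml.bc", "ockl.bc", "opencl.bc")


def missing_bitcode(lib_dir: Path) -> List[str]:
    """Return the expected bitcode files that are absent from lib_dir."""
    return [name for name in EXPECTED_BITCODE if not (lib_dir / name).is_file()]


def device_lib_flag(lib_dir: Path) -> Optional[str]:
    """Build the --rocm-device-lib-path flag if lib_dir looks complete, else None."""
    if missing_bitcode(lib_dir):
        return None
    return f"--rocm-device-lib-path={lib_dir}"


if __name__ == "__main__":
    # Path taken from the report above.
    candidate = Path("/usr/lib64/amdgcn/bitcode")
    print("missing:", missing_bitcode(candidate))
    print("flag:", device_lib_flag(candidate))
```

If the list of missing files is empty and clang still reports the error, the problem is more likely in how the flag reaches the compiler invocation than in the installed layout.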

Remove unit tests for unsupported architectures

gfx600 and gfx700 are not supported anymore. Remove the corresponding unit tests.

Request to cherry-pick 66a1c5b for LLVM 15

Currently no version of ROCm-Device-Libs can be built with LLVM 15 or later, because LLVM 15 fully switched to opaque pointers. The build fails with an error like the one below, caused by the __llvm_amdgcn_global_atomic_fadd_f32_p1f32_f32 function in ockl/src/gaaf.cl:

FAILED: ockl/gaaf.bc /home/build/YPKG/root/rocm-device-libs/build/ROCm-Device-Libs-rocm-5.2.3/solusBuildDir/ockl/gaaf.bc 
cd /home/build/YPKG/root/rocm-device-libs/build/ROCm-Device-Libs-rocm-5.2.3/solusBuildDir/ockl && /usr/bin/clang-15 -I/home/build/YPKG/root/rocm-device-libs/build/ROCm-Device-Libs-rocm-5.2.3/ockl/../irif/inc -I/home/build/YPKG/root/rocm-device-libs/build/ROCm-Device-Libs-rocm-5.2.3/ockl/../oclc/inc -I/home/build/YPKG/root/rocm-device-libs/build/ROCm-Device-Libs-rocm-5.2.3/ockl/inc -fcolor-diagnostics -Werror -Wno-error=atomic-alignment -x cl -Xclang -cl-std=CL2.0 -target amdgcn-amd-amdhsa -fvisibility=protected -fomit-frame-pointer -Xclang -finclude-default-header -nogpulib -cl-no-stdinc -Xclang -mcode-object-version=none -emit-llvm -Xclang -mlink-builtin-bitcode -Xclang /home/build/YPKG/root/rocm-device-libs/build/ROCm-Device-Libs-rocm-5.2.3/solusBuildDir/irif/irif.bc -c /home/build/YPKG/root/rocm-device-libs/build/ROCm-Device-Libs-rocm-5.2.3/ockl/src/gaaf.cl -o /home/build/YPKG/root/rocm-device-libs/build/ROCm-Device-Libs-rocm-5.2.3/solusBuildDir/ockl/gaaf.bc
'+atomic-fadd-insts' is not a recognized feature for this target (ignoring feature)
'+atomic-fadd-insts' is not a recognized feature for this target (ignoring feature)
Intrinsic name not mangled correctly for type arguments! Should be: llvm.amdgcn.global.atomic.fadd.f32.p1.f32
ptr @llvm.amdgcn.global.atomic.fadd.f32.p1f32.f32
fatal error: error in backend: Broken module found, compilation aborted!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
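
The name mismatch reported in the log can be illustrated with a small host-side sketch. With opaque pointers, the pointer component of an overloaded intrinsic name carries only the address space (`p1`) and no pointee type (`p1f32`). The helper `strip_pointee_suffix` below is a hypothetical name used for illustration only; it is not an LLVM or ROCm-Device-Libs API, and it assumes pointer components have the form `p<digits><pointee>`.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper (not an LLVM API): rewrite typed-pointer components
// such as "p1f32" to the opaque-pointer form "p1" in a dot-separated
// intrinsic name. All other components are copied unchanged.
std::string strip_pointee_suffix(const std::string& name) {
    std::istringstream in(name);
    std::string part;
    std::string out;
    bool first = true;
    while (std::getline(in, part, '.')) {
        // Treat a component as a pointer type if it is 'p' followed by a digit.
        if (part.size() > 1 && part[0] == 'p' &&
            std::isdigit(static_cast<unsigned char>(part[1]))) {
            std::size_t end = 1;
            while (end < part.size() &&
                   std::isdigit(static_cast<unsigned char>(part[end]))) {
                ++end;
            }
            part.resize(end);  // keep only "p<address space>"
        }
        if (!first) {
            out += '.';
        }
        out += part;
        first = false;
    }
    return out;
}
```

Calling it on the name from the error message, `llvm.amdgcn.global.atomic.fadd.f32.p1f32.f32`, yields the name the verifier asks for. This only shows the naming difference; the actual fix is in the source change referenced in this issue.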

This commit fixed the issue almost a year ago but is not yet available in any of the releases. Could you please cherry-pick this commit to the next release?

Thank you!

Please enable two factor authentication in your github account

@alex-t;@guansong;@wkwchau;@nhaustov;@kasaurov;@amdgerritcr

We are going to enforce two-factor authentication in the RadeonOpenCompute organization (https://github.com/RadeonOpenCompute/) on 7 April 2022. Since we identified you as an outside collaborator for this organization, you need to enable two-factor authentication in your GitHub account; otherwise you will be removed from the organization after the enforcement. Please skip this if already done.
To set up two factor authentication, please go through the steps in below link:

https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/configuring-two-factor-authentication

Please reach out to "[email protected] " for queries

y1(-0.0) returns +inf (instead of -inf or NaN)

I noticed that both y1f and y1 return +inf for -0.0 (and NaN for any other negative number).
y0(-0.0) returns -inf.

Linux libm returns -inf for any negative number (NaN is most probably more correct).
+inf is definitely wrong.

I am using ROCm 4.2 out of the box.

Request: Extended hsa signal support

Hi,
as stated here HSA signals are supported at a minimum via "ockl".
Are there plans to extend the signal support? In particular I would be interested in the "hsa_signal_wait" operations.

Missed v6.1.0

j1 (both double and float) returns NaN for "large" negative numbers

I suspect this is due to the use of "x" instead of "ax" in this portion of the code
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/a1f82f37cf105f0091e0f9e850a0a75aabde6eb7/ocml/src/j1F.cl#L81

lgamma becomes inf for tiny negative values

For values of x where x*x ~ 0, lgamma becomes inf for negative x.

__global__ void doit(double x) {
   auto y =log(x);
   auto z1= lgamma(x);
   auto z2= lgamma(-x);
   printf ("double %a %a %a %a\n",x,y,z1,z2);
}

produces
double 0x1p-472 -0x1.472a5c30ead69p+8 0x1.472a5c30ead69p+8 0x1.472a5c30ead69p+8
double 0x1p-522 -0x1.69d2a4df51d11p+8 0x1.69d2a4df51d11p+8 inf

In my opinion this is due to the code at line 273 (actually line 276).
As you quote, for x < 0, lgamma(x) = log(pi/|x*sin(pi*x)|) - lgamma(-x), which for tiny x becomes -2*log(-x) + log(-x) = -log(-x).
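
A host-side sketch of the argument above, in plain C++. It evaluates the reflection formula literally and compares it with the limit -log(-x). Both function names are hypothetical and neither is taken from ocml. On the host, one way the literal form reaches inf is that the quotient pi/|x*sin(pi*x)| behaves like 1/x^2 and exceeds the double range once |x| is below roughly 2^-512; whether the device code fails for the same reason is an assumption.

```cpp
#include <cmath>

// Reflection formula evaluated literally for x < 0:
//   lgamma(x) = log(pi / |x * sin(pi * x)|) - lgamma(-x)
// For tiny |x| the quotient grows like 1 / x^2 and can overflow to inf.
double lgamma_reflection_naive(double x) {
    const double pi = 3.14159265358979323846;
    return std::log(pi / std::fabs(x * std::sin(pi * x))) - std::lgamma(-x);
}

// Assumed alternative for tiny negative x: use the limit -log(-x) directly,
// which never forms the 1 / x^2 intermediate.
double lgamma_tiny_negative(double x) {
    return -std::log(-x);
}
```

With x = -2^-472 the two functions agree on the host, while with x = -2^-522 the literal form returns inf and the limit form stays finite, matching the two rows printed above.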

[Issue]: hashcat HIP backend fails due to missing symbols

Problem Description

When using the HIP backend of hashcat it fails with the following error:

-------------------
* Hash-Mode 0 (MD5)
-------------------

'+gws' is not a recognized feature for this target (ignoring feature)
'+gws' is not a recognized feature for this target (ignoring feature)
hiprtcCompileProgram(): HIPRTC_ERROR_COMPILATION

ld.lld: error: undefined hidden symbol: __ockl_get_group_id
>>> referenced by /home/steve/.local/share/hashcat/comgr-4247ea/input/LLVMBitcode.bc.o:(gpu_decompress)
>>> referenced by /home/steve/.local/share/hashcat/comgr-4247ea/input/LLVMBitcode.bc.o:(gpu_decompress)
>>> referenced by /home/steve/.local/share/hashcat/comgr-4247ea/input/LLVMBitcode.bc.o:(gpu_memset)
>>> referenced 7 more times

ld.lld: error: undefined hidden symbol: __ockl_get_local_size
>>> referenced by /home/steve/.local/share/hashcat/comgr-4247ea/input/LLVMBitcode.bc.o:(gpu_decompress)
>>> referenced by /home/steve/.local/share/hashcat/comgr-4247ea/input/LLVMBitcode.bc.o:(gpu_decompress)
>>> referenced by /home/steve/.local/share/hashcat/comgr-4247ea/input/LLVMBitcode.bc.o:(gpu_memset)
>>> referenced 7 more times

ld.lld: error: undefined hidden symbol: __ockl_get_local_id
>>> referenced by /home/steve/.local/share/hashcat/comgr-4247ea/input/LLVMBitcode.bc.o:(gpu_decompress)
>>> referenced by /home/steve/.local/share/hashcat/comgr-4247ea/input/LLVMBitcode.bc.o:(gpu_decompress)
>>> referenced by /home/steve/.local/share/hashcat/comgr-4247ea/input/LLVMBitcode.bc.o:(gpu_memset)
>>> referenced 7 more times

* Device #1: Kernel /usr/share/hashcat/OpenCL/shared.cl build failed.

* Device #1: Kernel /usr/share/hashcat/OpenCL/shared.cl build failed.

I'm not sure if this is a bug in hashcat, or in ROCm. It did supposedly work with earlier versions of ROCm. The OpenCL backend works fine.

Operating System

Gentoo

CPU

Any

GPU

AMD Radeon Pro VII

ROCm Version

ROCm 6.0.0, ROCm 5.7.1

ROCm Component

ROCm-Device-Libs

Steps to Reproduce

Use hashcat benchmark mode "hashcat -b". "--backend-ignore-opencl" and/or "--backend-ignore-cuda" may be necessary depending upon system and enumeration order.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

tgammaf becomes inf for tiny values of its argument

tgammaf returns inf for |x| <= 0x1.fffffcp-127. This is the largest float for which x*x = 0.
Other libraries return a value equal to 1/x (8.50706e+37).
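
A short host-side check of the expected value, as a sketch. Since Gamma(x) behaves like 1/x near zero, the reciprocal of the largest subnormal float (0x1.fffffcp-127) is still finite in single precision. The helper names below are hypothetical and are not part of ocml.

```cpp
#include <cmath>
#include <limits>

// Largest subnormal float, i.e. the value just below the smallest normal
// float; this is 0x1.fffffcp-127.
float largest_subnormal_float() {
    return std::nextafter(std::numeric_limits<float>::min(), 0.0f);
}

// Leading-order behaviour of Gamma(x) for tiny |x|: Gamma(x) ~ 1 / x.
// This is only the limit, not a full tgammaf implementation.
float reciprocal_limit(float x) {
    return 1.0f / x;
}
```

`reciprocal_limit(largest_subnormal_float())` is about 8.507e+37, which is below the largest finite float, so a finite result is representable for this input.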

`__builtin_amdgcn_s_sendmsg_rtnl` - ockl/src/mtime.cl:20:12: error: use of undeclared identifier

ROCm/ROCm-Device-Libs/ockl/src/mtime.cl:20:12: error: use of undeclared identifier '__builtin_amdgcn_s_sendmsg_rtnl'
    return __builtin_amdgcn_s_sendmsg_rtnl(0x83);
           ^
1 error generated.

I am using ROCm-Device-Libs @ 4d86a31 and LLVM @ a29fe42.

Grepping around, I found BuiltinsAMDGPU.def#L282:

TARGET_BUILTIN(__builtin_amdgcn_s_sendmsg_rtnl, "UWiUIi", "n", "gfx11-insts")

So not sure if an include is missing or a library isn't being linked?

Required amdhsacod is not in ROCR-Runtime

There is something like this in https://github.com/RadeonOpenCompute/ROCR-Runtime, but only a library.

Where can I build this tool?

amdhsacod loader failed: Invalid code object

After building and installing ROCm-Device-libs the corresponding check fails:

make test
Running tests...
Test project ~/sonst/ROCm-Device-Libs/build
    Start 1: test_common:llvm-objdump
1/6 Test #1: test_common:llvm-objdump .........   Passed    0.09 sec
    Start 2: test_common:amdhsacod
2/6 Test #2: test_common:amdhsacod ............***Failed    0.14 sec
    Start 3: test_math:llvm-objdump
3/6 Test #3: test_math:llvm-objdump ...........   Passed    4.09 sec
    Start 4: test_math:amdhsacod
4/6 Test #4: test_math:amdhsacod ..............***Failed    8.12 sec
    Start 5: test_workitem:llvm-objdump
5/6 Test #5: test_workitem:llvm-objdump .......   Passed    0.01 sec
    Start 6: test_workitem:amdhsacod
6/6 Test #6: test_workitem:amdhsacod ..........***Failed    0.04 sec

50% tests passed, 3 tests failed out of 6

Total Test time (real) =  19.72 sec

The following tests FAILED:
	  2 - test_common:amdhsacod (Failed)
	  4 - test_math:amdhsacod (Failed)
	  6 - test_workitem:amdhsacod (Failed)
Errors while running CTest
Makefile:151: recipe for target 'test' failed
make: *** [test] Error 8

Specifically, the three failing tests, e.g.
/opt/rocm/hsa/bin/amdhsacod -test -code ~/sonst/ROCm-Device-Libs/build/test/opencl/test_common_code.co
print a lot of text before stopping with

AMD HSA Code Object End
Error: loader failed (hsa status = The code object is invalid.)

Any idea what the problem is? vector_copy works, and the libhsa-ext-* libraries seem to be in the proper place.

New location of repo

With the deprecation notice it points to the monorepo but there is no amd/ directory there or in upstream LLVM. This is a welcome transition and we would like to use / build from the new location and remove this dependency.

Continue mirroring this repo from the monorepo

Would it be possible to continue mirroring this repo so that end users can build with their own LLVM? See iree-org/iree#14894.

recent improvements in ocml NOT in release 4.5

From my tests I think that the recent improvements in ocml are not included in release 4.5.
Am I missing anything?

Is the improved ocml foreseen to be included soon in a future release?

wgscratch datalayout is different from llvm-link

I'm getting the following warning with llvm-project@a268127 :

warning: Linking two modules of different data layouts: '.../ockl/src/wgscratch.ll' is

'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7'

whereas 'llvm-link' is

'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8'

It appears that the difference is

ni:7

vs

p7:160:256:256:32-p8:128:128-ni:7:8
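
The comparison above can be reproduced mechanically. The sketch below splits each datalayout string on `-` and lists the components that appear in one string but not the other. The function names `split_layout` and `only_in` are made up for this example and are not LLVM APIs.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split a datalayout string into its '-'-separated components.
std::vector<std::string> split_layout(const std::string& layout) {
    std::vector<std::string> parts;
    std::istringstream in(layout);
    std::string part;
    while (std::getline(in, part, '-')) {
        parts.push_back(part);
    }
    return parts;
}

// Components of `a` that do not appear in `b`, in their original order.
std::vector<std::string> only_in(const std::string& a, const std::string& b) {
    const std::vector<std::string> pa = split_layout(a);
    const std::vector<std::string> pb = split_layout(b);
    std::vector<std::string> result;
    for (const std::string& part : pa) {
        if (std::find(pb.begin(), pb.end(), part) == pb.end()) {
            result.push_back(part);
        }
    }
    return result;
}
```

Applied to the two strings in the warning, `only_in(wgscratch, llvm_link)` gives `ni:7`, and `only_in(llvm_link, wgscratch)` gives `p7:160:256:256:32`, `p8:128:128` and `ni:7:8`.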

Build errors

Hi,

I am trying to build this library and link it with clang opencl compiler.

However, I am getting a number of errors like the following.
First, several projects won't build because of an error like this:
D:\Coding\CoolProjects\ROCm-Device-Libs\opencl\src\workgroup\wgscratch.ll : fatal error LNK1107: invalid or corrupt file: cannot read at 0x244

Other projects fail because of this error:
..\utils\prepare-builtins\RelWithDebInfo\prepare-builtins.exe: no such file or directory
I checked, and prepare-builtins.exe is asking for a bunch of .bc files, such as oclc_isa_version_701.lib.bc. Where can I find these files? They don't seem to be in this repo.

Thanks!

Device function malloc Throws Hardware Exception

Hello,
After reading the HIP programming documentation, I was under the impression that calling malloc inside a __global__ function is supported. However, the code below throws the following exception when compiled with hipcc. Compiling with nvcc works as intended.

#include <hip/hip_runtime.h>
#include <iostream>

#if defined(__CUDACC__)
    #define SIZE 32
#elif defined(__HIP__)
    #define SIZE 64
#endif


__device__ int my_global_memory[SIZE];

__global__
void k_print_vector() {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    int* my_ptr;
    my_ptr = (int*) malloc(sizeof(int));
    *my_ptr = i;
    my_global_memory[i] = i;
#if defined(__CUDACC__)
    printf("CUDA printf: element idx [%d]: %d, %d\n", i, my_global_memory[i], *my_ptr);
    free(my_ptr);
#endif
#if defined(__HIP__)
    printf("HIP printf: element idx [%d]: %d, %d\n", i, my_global_memory[i], *my_ptr);
#endif
}


int main() {
    hipLaunchKernelGGL(k_print_vector, 1, SIZE, 0, 0);
    auto err = hipDeviceSynchronize();
    if (err != hipSuccess) {
        std::cerr << "Kernel Launch failed! error code: " << err << std::endl;
    }
    return 0;
}

AMD output:

:0:rocdevice.cpp            :2533: 9757626138808 us: Device::callbackQueue aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016

CUDA output:

CUDA printf: element idx [0]: 0, 0
CUDA printf: element idx [1]: 1, 1
CUDA printf: element idx [2]: 2, 2
CUDA printf: element idx [3]: 3, 3
CUDA printf: element idx [4]: 4, 4
CUDA printf: element idx [5]: 5, 5
CUDA printf: element idx [6]: 6, 6
CUDA printf: element idx [7]: 7, 7
CUDA printf: element idx [8]: 8, 8
CUDA printf: element idx [9]: 9, 9
CUDA printf: element idx [10]: 10, 10
CUDA printf: element idx [11]: 11, 11
CUDA printf: element idx [12]: 12, 12
CUDA printf: element idx [13]: 13, 13
CUDA printf: element idx [14]: 14, 14
CUDA printf: element idx [15]: 15, 15
CUDA printf: element idx [16]: 16, 16
CUDA printf: element idx [17]: 17, 17
CUDA printf: element idx [18]: 18, 18
CUDA printf: element idx [19]: 19, 19
CUDA printf: element idx [20]: 20, 20
CUDA printf: element idx [21]: 21, 21
CUDA printf: element idx [22]: 22, 22
CUDA printf: element idx [23]: 23, 23
CUDA printf: element idx [24]: 24, 24
CUDA printf: element idx [25]: 25, 25
CUDA printf: element idx [26]: 26, 26
CUDA printf: element idx [27]: 27, 27
CUDA printf: element idx [28]: 28, 28
CUDA printf: element idx [29]: 29, 29
CUDA printf: element idx [30]: 30, 30
CUDA printf: element idx [31]: 31, 31

Could you please clarify the development status of this feature? We are teaching a course on HIP so it would help us get the correct information across.

Thanks,
Matin

are "pi" functions other than sinpi and cospi really available?

I see code and doc for aXYZpi and for (a)tanpi(2)
but in the version we installed only sinpi and cospi seem available
grep pif /opt/rocm-5.7.0/llvm/lib/clang/17.0.0/include/__clang_hip_math.h
float cospif(float __x) { return __ocml_cospi_f32(__x); }
void sincospif(float __x, float *__sinptr, float *__cosptr) {
float sinpif(float __x) { return __ocml_sinpi_f32(__x); }

mulF.cl implemented x + y

If you look into mulF.cl carefully, you will notice:

#include "mathF.h"

CONSTATTR float
MATH_MANGLE(mul_rte)(float x, float y)
{
return x + y;
}
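
For comparison, a host-side sketch of what the function is presumably meant to compute, namely the product of its arguments. `mul_rte_sketch` is a hypothetical stand-in written in plain C++; it omits the CONSTATTR and MATH_MANGLE macros and relies on the host's default round-to-nearest-even mode.

```cpp
// Hypothetical host-side stand-in for the intended behaviour:
// return the product x * y rather than the sum x + y shown in the issue.
float mul_rte_sketch(float x, float y) {
    return x * y;
}
```

Inputs such as (3, 4) separate the two behaviours, since the product is 12 and the sum is 7. Inputs such as (2, 2) would not expose the bug.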

Forward progress guarantees of ockl_hsa_signal_cas?

I've implemented a simple form of hostcall in the Julia AMDGPU compute library AMDGPU.jl. In my efforts to make it safe for concurrent wavefront access, I use ockl_hsa_signal_cas to transition a signal between various states in an atomic manner. I'm running into an issue where my CAS loop which does the transition doesn't make forward progress, even when only a single wavefront is trying to use the hostcall.

I don't have access to a working debugger (ROCgdb doesn't seem to work for me, ROCm/ROCgdb#5), so everything I know about this issue I've gleaned from observing behavior indirectly (basically, does the kernel complete or not?). I was wondering if the details of this CAS implementation are documented anywhere, specifically in what situations forward progress is not guaranteed. I can provide more details about how I'm invoking this intrinsic if that would help.
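
As a point of reference for the pattern being discussed, here is a host-side sketch of a bounded CAS state transition using `std::atomic`. It is only an analogue: it says nothing about the semantics or progress guarantees of ockl_hsa_signal_cas, and `try_transition` is a hypothetical name. Bounding the number of attempts turns a stalled loop into an observable failure instead of a hang, which may help when no debugger is available.

```cpp
#include <atomic>
#include <cstdint>

// Try to move `state` from `expected` to `desired`. Gives up after
// `max_attempts` failed compare-exchange attempts and returns false,
// so a lack of progress shows up as a return value rather than a hang.
bool try_transition(std::atomic<std::int64_t>& state, std::int64_t expected,
                    std::int64_t desired, int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        // compare_exchange_strong overwrites its first argument on failure,
        // so reset the expected value on every attempt.
        std::int64_t observed = expected;
        if (state.compare_exchange_strong(observed, desired,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
            return true;
        }
        // `observed` now holds the value that blocked the transition.
    }
    return false;
}
```

A caller would check the return value and report or record the failure when the transition does not happen within the attempt budget.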

README.md is too old

README.md has not been updated for 6 months.
It is too old and gives misleading instructions for building from source.

question about sin/cos function

Is there a builtin function for double-precision sine or cosine?

AMDDeviceLibsConfig.cmake could not be found by HIP

In a CMake project, when calling find_package(HIP REQUIRED), the line find_dependency(AMDDeviceLibs) in /opt/rocm-5.1.0/hip/lib/cmake/hip/hip-config.cmake fails to find AMDDeviceLibsConfig.cmake. ~~The AMDDeviceLibsConfig.cmake.in clearly indicates that the file should be present, but perhaps it was lost during release packaging.~~ The file is present at /opt/rocm-5.1.0/lib/cmake/AMDDeviceLibs/AMDDeviceLibsConfig.cmake, but HIP cannot find it.

Use of unknown builtin

I am trying to build ROCm-Device-Libs master as suggested by the HIP-clang build instructions while relying on LLVM 9.0.1. The rest of ROCm installed under /opt/rocm is 3.0. The OS is Ubuntu 16.04.

PS /home/mate/Source/Repos/ROCm-Device-Libs/build> cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_DIR=/opt/rocm/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_WERROR=1 -DLLVM_ENABLE_ASSERTIONS=1 ..            
-- The C compiler identification is Clang 9.0.1
-- The CXX compiler identification is Clang 9.0.1
-- Check for working C compiler: /opt/rocm/llvm/bin/clang
-- Check for working C compiler: /opt/rocm/llvm/bin/clang -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Linker detection: GNU ld
CMake Deprecation Warning at OCL.cmake:12 (cmake_policy):
  The OLD behavior for policy CMP0053 will be removed from a future version
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.
Call Stack (most recent call first):
  CMakeLists.txt:62 (include)


-- Configuring done
-- Generating done
CMake Warning:
  Manually-specified variables were not used by the project:

    LLVM_ENABLE_WERROR


-- Build files have been written to: /home/mate/Source/Repos/ROCm-Device-Libs/build
PS /home/mate/Source/Repos/ROCm-Device-Libs/build> cmake --build . -- -j8
...
[11%] Generating add_sat.bc
/home/mate/Source/Repos/ROCm-Device-Libs/ocml/src/acosH.cl:30:15: error: use of unknown builtin '__builtin_fabsf16' [-Wimplicit-function-declaration]
    half ax = BUILTIN_ABS_F16(x);
              ^
/home/mate/Source/Repos/ROCm-Device-Libs/ocml/src/builtins.h:41:25: note: expanded from macro 'BUILTIN_ABS_F16'
#define BUILTIN_ABS_F16 __builtin_fabsf16
                        ^

6.0.2 [Issue]: build fails

Problem Description

cmake settings ("cmake -L" output)

-- Cache values
CMAKE_BUILD_TYPE:STRING=RelWithDebInfo
CMAKE_INSTALL_PREFIX:PATH=/usr
CPACK_GENERATOR:STRING=DEB;RPM
Clang_DIR:PATH=/usr/lib64/cmake/clang
LLVM_DIR:PATH=/usr/lib64/cmake/llvm
ROCM_CCACHE_BUILD:BOOL=OFF
ROCM_DEVICE_LIBS_BITCODE_INSTALL_LOC_NEW:STRING=
ROCM_DEVICE_LIBS_BITCODE_INSTALL_LOC_OLD:STRING=
ROCM_DIR:PATH=ROCM_DIR-NOTFOUND

Operating System

Linux x86/64

CPU

Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz

GPU

N/A

ROCm Version

ROCm 6.0.0
6.0.2

ROCm Component

ROCm-Device-Libs

Steps to Reproduce

configure source tree using cmake
make

Build fails with:

[ 77%] Generating cg.bc
cd /home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-6.0.2/x86_64-redhat-linux-gnu/ockl && /usr/bin/clang-18 -I/home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-6.0.2/ockl/../irif/inc -I/home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-6.0.2/ockl/../oclc/inc -I/home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-6.0.2/ockl/inc -fcolor-diagnostics -Werror -Wno-error=atomic-alignment -x cl -Xclang -cl-std=CL2.0 -target amdgcn-amd-amdhsa -fvisibility=protected -fomit-frame-pointer -Xclang -finclude-default-header -Xclang -fexperimental-strict-floating-point -nogpulib -cl-no-stdinc -Xclang -mcode-object-version=none -emit-llvm -Xclang -mlink-builtin-bitcode -Xclang /home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-6.0.2/x86_64-redhat-linux-gnu/irif/irif.bc -c /home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-6.0.2/ockl/src/cg.cl -o /home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-6.0.2/x86_64-redhat-linux-gnu/ockl/cg.bc
/home/tkloczko/rpmbuild/BUILD/ROCm-Device-Libs-rocm-6.0.2/ockl/src/cg.cl:91:5: error: '__builtin_amdgcn_ds_gws_init' needs target feature gws
   91 |     __builtin_amdgcn_ds_gws_init(nwm1, rid);
      |     ^
1 error generated.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

N/A

Additional Information

N/A

amdgcn/bitcodes relocation for distributions

Greetings,

While attempting to package rocm-device-libs for the Debian project, I noticed that amdgcn/bitcodes landed straight in $PREFIX/amdgcn. From an operating system perspective, this causes installation below /usr/amdgcn, which does not conform to the FHS, so I had to shift the directory a bit by applying this patch on Salsa to move it to /usr/share/amdgcn.

Would such a change be of interest on your side? Do you foresee any issue if bitcodes are moved around this way, and if so, do you have a more appropriate approach to suggest? I'm afraid my patching could be rather naive at the moment.

Thank you for making ROCm freely available!

Have a nice day, :)
Étienne.

PS: my rationale for using /usr/share is that the produced bitcodes turn out to be CPU architecture independent data, but I'm also fine with /usr/lib/amdgcn (as long as it is not /usr/lib/$TRIPLET/amdgcn, the $TRIPLET would force us to make a set of architecture dependent packages instead of one all package, so not ideal from my point of view).
