
hipamd's Introduction

hipamd development has been moved to https://github.com/ROCm/clr and this repo is not being used for active development since ROCm 5.6. Previously released branches are still available for reference.

hipamd's People

Contributors

aaronenyeshi, aditya4d1, agunashe, alexvlx, aryansalmanpour, bensander, chriskitching, chrispaquot, dfukalov, emankov, gandryey, gargrahul, iassiour, jaydeeppatel1111, jujiang-del, kjayapra-amd, lmoriche, mangupta, mhbliao, pghafari, saleelk, sarbojitamd, satyanveshd, scchan, shadidashmiz, sunway513, tomsang, vsytch, whchung, yxsamliu


hipamd's Issues

Segmentation fault in hipChildGraphNode::UpdateEventWaitLists

Consider the following example (written in Python, but it translates 1:1 to HIP C and should be easy enough to follow), which uses hipGraphAddChildGraphNode to embed a captured graph inside another graph:

from ctypes import c_int, c_double, c_void_p

import numpy as np

from pyfr.backends.hip.driver import HIP, HIPGraph
from pyfr.backends.hip.compiler import HIPRTC

N = 1024*4096

hip = HIP()
hip.set_device(0)

stream = hip.create_stream()

a_hp = hip.pagelocked_empty((N,), float)
b_hp = hip.pagelocked_empty((N,), float)
c_hp = hip.pagelocked_empty((N,), float)

a_hp[:] = np.random.randn(N)
b_hp[:] = np.random.randn(N)

a_cu = hip.mem_alloc(a_hp.nbytes)
b_cu = hip.mem_alloc(b_hp.nbytes)
c_cu = hip.mem_alloc(c_hp.nbytes)

hiprtc = HIPRTC()

src = '''extern "C" __global__
void add(int n, double x, double *a, double *b, double *c)
{
int i = blockIdx.x*blockDim.x + threadIdx.x;
if (i < n) c[i] = a[i] + x*b[i];
}
'''
mod = hip.load_module(hiprtc.compile('kern', src))

fun = mod.get_function('add', [c_int, c_double] + 3*[c_void_p])
params = fun.make_params((N // 128, 1, 1), (128, 1, 1))
params.set_args(N, 1.1, a_cu, b_cu, c_cu)

g = HIPGraph(hip)
n1 = g.add_memcpy(a_cu, a_hp, a_hp.nbytes)
n2 = g.add_memcpy(b_cu, b_hp, b_hp.nbytes)

# Flip me!
if True:
    stream.begin_capture()
    fun.exec_async(stream, params)
    gg = stream.end_capture()
    n3 = g.add_graph(gg, deps=[n1, n2])
else:
    n3 = g.add_kernel(params, deps=[n1, n2])

g.add_memcpy(c_hp, c_cu, c_hp.nbytes, deps=[n3])

gi = g.instantiate()
gi.launch(stream)
stream.synchronize()

print(np.allclose(c_hp, a_hp + 1.1*b_hp))

Running this under ROCm 5.2 I get a segmentation fault in:

(gdb) bt
#0  0x00007fff4b2e259c in hipChildGraphNode::UpdateEventWaitLists(std::vector<amd::Event*, std::allocator<amd::Event*> >) ()
from /opt/rocm-5.2.0/lib/libamdhip64.so
#1  0x00007fff4b2abbad in FillCommands(std::vector<std::vector<hipGraphNode*, std::allocator<hipGraphNode*> >, std::allocator<std::vector<hipGraphNode*, std::allocator<hipGraphNode*> > > >&, std::unordered_map<hipGraphNode*, std::vector<hipGraphNode*, std::allocator<hipGraphNode*> >, std::hash<hipGraphNode*>, std::equal_to<hipGraphNode*>, std::allocator<std::pair<hipGraphNode* const, std::vector<hipGraphNode*, std::allocator<hipGraphNode*> > > > >&, std::vector<hipGraphNode*, std::allocator<hipGraphNode*> >&, amd::Command*&, amd::Command*&, amd::HostQueue*) () from /opt/rocm-5.2.0/lib/libamdhip64.so
#2  0x00007fff4b2abfdf in hipGraphExec::Run(ihipStream_t*) () from /opt/rocm-5.2.0/lib/libamdhip64.so
#3  0x00007fff4b2ae87d in ihipGraphLaunch(hipGraphExec*, ihipStream_t*) () from /opt/rocm-5.2.0/lib/libamdhip64.so
#4  0x00007fff4b2ba3a7 in hipGraphLaunch () from /opt/rocm-5.2.0/lib/libamdhip64.so

Switching the if block (so that the kernel is added directly to the graph), everything works as expected, as does simply running the captured graph on its own. Given that all API calls execute successfully, this indicates a bug.

Disassembling:

Dump of assembler code for function _ZN17hipChildGraphNode20UpdateEventWaitListsESt6vectorIPN3amd5EventESaIS3_EE:
0x00007fff4b2e2580 <+0>:     push   %r13
0x00007fff4b2e2582 <+2>:     push   %r12
0x00007fff4b2e2584 <+4>:     mov    %rsi,%r12
0x00007fff4b2e2587 <+7>:     push   %rbp
0x00007fff4b2e2588 <+8>:     push   %rbx
0x00007fff4b2e2589 <+9>:     sub    $0x28,%rsp
0x00007fff4b2e258d <+13>:    mov    0xa8(%rdi),%rax
0x00007fff4b2e2594 <+20>:    mov    0x8(%r12),%rdx
0x00007fff4b2e2599 <+25>:    mov    (%rsi),%rsi
=> 0x00007fff4b2e259c <+28>:    mov    (%rax),%rax

with the associated C++ being:

https://github.com/ROCm-Developer-Tools/hipamd/blob/06f64e1a53ddc9b0ca02993ff5bff95bea1f8f7f/src/hip_graph_internal.hpp#L413

Inconsistent CMake module names

The documentation at https://rocmdocs.amd.com/en/latest/Installation_Guide/Using-CMake-with-AMD-ROCm.html says to use find_package(hip), but HIP provides FindHIP.cmake, not Findhip.cmake, as seen here: https://github.com/ROCm-Developer-Tools/HIP/blob/develop/cmake/FindHIP.cmake

This results in the following error:

[cmake] CMake Error at CMakeLists.txt:8 (find_package):
[cmake]   By not providing "Findhip.cmake" in CMAKE_MODULE_PATH this project has
[cmake]   asked CMake to find a package configuration file provided by "hip", but
[cmake]   CMake did not find one.
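For what it's worth, the lowercase and capitalized spellings go through different mechanisms; a hedged sketch (the myapp target and the /opt/rocm prefix are assumptions, adjust to your install):

```cmake
# Config mode: lowercase find_package(hip) looks for hip-config.cmake,
# which ships inside the ROCm install, so the prefix must be discoverable.
list(APPEND CMAKE_PREFIX_PATH "/opt/rocm")
find_package(hip CONFIG REQUIRED)
target_link_libraries(myapp PRIVATE hip::host)

# Module mode: FindHIP.cmake defines the capitalized package name, so this
# spelling is needed when relying on the bundled find module instead.
# list(APPEND CMAKE_MODULE_PATH "/opt/rocm/hip/cmake")
# find_package(HIP MODULE REQUIRED)
```

In other words, the documented lowercase spelling only works in config mode with the ROCm prefix on CMAKE_PREFIX_PATH.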

RPATH is not optional

I noticed that in 5.1.0, setting an RPATH is no longer optional.

The commit in question is 58193bf

While I understand the rationale from a ROCm packaging perspective, this causes issues for distributions trying to package HIP, since the RPATH cannot be disabled.

Would it make sense to wrap this in the ROCM_RPATH condition used later in the file?

RedHat & SLES - missing Clang RT Directory

RedHat & SLES

https://github.com/ROCm-Developer-Tools/hipamd/blob/474e8620099a463ad2ced821ae7400609b29bf7f/hip-config.cmake.in#L294

The Clang RT location is missing in /opt/rocm/hip/lib/cmake/hip/hip-config.cmake:

CMake Error at /opt/rocm/hip/lib/cmake/hip/hip-config.cmake:331 (message):
  clangrt builtins lib not found
Call Stack (most recent call first):
  CMakeLists.txt:181 (find_package)

On RedHat

In /opt/rocm/hip/lib/cmake/hip/hip-config.cmake, the clang_rt.builtins library is expected under lib, but on RedHat it is located in /usr/lib64/clang/; the RedHat directory layout needs to be added to the search paths.

file(GLOB HIP_CLANGRT_LIB_SEARCH_PATHS "${HIP_CLANG_ROOT}/lib/clang/*/lib/*")
  find_library(CLANGRT_BUILTINS
    NAMES
      clang_rt.builtins
      clang_rt.builtins-x86_64
    PATHS
      ${HIP_CLANGRT_LIB_SEARCH_PATHS}
      ${HIP_CLANG_INCLUDE_PATH}/../lib/linux
      ${HIP_CLANG_INCLUDE_PATH}/../lib/windows
      NO_DEFAULT_PATH)
  • RedHat
    HIP_CLANG_ROOT: /usr
    HIP_CLANGRT_LIB_SEARCH_PATHS: (empty)
    HIP_CLANG_INCLUDE_PATH: HIP_CLANG_INCLUDE_PATH-NOTFOUND
  • Ubuntu
    HIP_CLANG_ROOT: /usr/lib/llvm-14
    HIP_CLANGRT_LIB_SEARCH_PATHS: (empty)
    HIP_CLANG_INCLUDE_PATH: /usr/lib/llvm-14/lib/clang/14.0.0/include

Can't compile hipamd in debug?

Debian wants debug symbols by default; AFAIR this was fine in 4.5.2. I am compiling with -DCMAKE_BUILD_TYPE=Debug.

Did something happen with the SONAME bump from 4 to 5?

As far as I understand, neither libamdhip64 nor libhiprtc-builtins holds device-side compute kernels.
https://salsa.debian.org/rocm-team/rocm-hipamd/-/blob/master/debian/rules

debian/libhiprtc-builtins5/usr/lib/x86_64-linux-gnu/libhiprtc-builtins.so.5.0.13601-: .debug_info section not present
debian/libamdhip64-5/usr/lib/x86_64-linux-gnu/libamdhip64.so.5.0.13601-: Unknown debugging section .debug_addr

objdump -W emits a lot of warnings, and dh_dwz errors out during Debian packaging.

Const `Scalar_accessor` conversion operators should return a `const&` and be `constexpr`

I first reported these as ROCm/HIP#2359 and ROCm/HIP#2365, although I realize now that this is probably the appropriate repository instead.

The issue emerged while adding HIP support to GPUSPH, and is illustrated by simple code such as

#include <hip/hip_runtime.h>

typedef ushort4 particleinfo;

static __forceinline__ __host__ __device__ __attribute__((pure))
const ushort& type(const particleinfo &info)
{ return info.x; }

typedef uint4 vertexinfo;
constexpr __host__ __device__ __forceinline__ __attribute__((pure))
bool has_vertex(vertexinfo const& verts, uint id)
{ return verts.x == id || verts.y == id || verts.z == id; }

(these are both actual functions from the GPUSPH code).

A simple fix is given by the following patch (I can provide a PR if necessary).

diff --git a/include/hip/amd_detail/amd_hip_vector_types.h b/include/hip/amd_detail/amd_hip_vector_types.h
index fd7c554e..b387d484 100644
--- a/include/hip/amd_detail/amd_hip_vector_types.h
+++ b/include/hip/amd_detail/amd_hip_vector_types.h
@@ -193,9 +193,15 @@ typedef basic_ostream<char> ostream;
             Vector data;
 
             __HOST_DEVICE__
-            operator T() const noexcept { return data[idx]; }
+            constexpr operator const T&() const noexcept {
+                return reinterpret_cast<
+                    const T (&)[sizeof(Vector) / sizeof(T)]>(data)[idx];
+            }
             __HOST_DEVICE__
-            operator T() const volatile noexcept { return data[idx]; }
+            operator const volatile T&() const volatile noexcept {
+                return reinterpret_cast<
+                    const volatile T (&)[sizeof(Vector) / sizeof(T)]>(data)[idx];
+            }
 
 #ifdef __HIP_ENABLE_VECTOR_SCALAR_ACCESSORY_ENUM_CONVERSION__
             // The conversions to enum are fairly ghastly, but unfortunately used in

wavefrontsize64 not supported on GFX10 while it is on GFX11. Intended or not?

Hi,

I would like to compile my kernels for GFX10 with wavefront64 enabled, but I get the following compilation error:

/opt/rocm-5.4.1/include/hip/hip_runtime.h:41:2: error: HIP is not supported on GFX10 with wavefront size 64

However, it happily compiles for GFX11. I don't have GFX11 hardware to test on, so I can't say whether it actually works correctly on that target.
Is it intended that wavefront64 is enabled for GFX11 but not for GFX10? The architectures seem similar enough that it should either work on both or on neither.

I would really appreciate having wavefront64 mode working on gfx10, to see whether it gives better performance.

Best regards,
Epliz

Implement float/double atomicMin/Max in terms of integer atomics

Currently, HIP implements atomicMin/Max for single- and double-precision floating-point values as CAS loops. However, in fast-math scenarios on architectures with hardware support for signed/unsigned integer atomicMin/Max, a better implementation is possible. As per https://stackoverflow.com/a/72461459, for single precision:

__device__ __forceinline__ float atomicMinFloat(float* addr, float value) {
    float old;
    old = !signbit(value) ? __int_as_float(atomicMin((int*)addr, __float_as_int(value))) :
        __uint_as_float(atomicMax((unsigned int*)addr, __float_as_uint(value)));

    return old;
}

__device__ __forceinline__ float atomicMaxFloat(float* addr, float value) {
    float old;
    old = !signbit(value) ? __int_as_float(atomicMax((int*)addr, __float_as_int(value))) :
        __uint_as_float(atomicMin((unsigned int*)addr, __float_as_uint(value)));

    return old;
}

Better implementations still are possible on NVIDIA using opportunistic warp-level programming, wherein each thread first checks whether any other active threads in the warp have the same addr and, if so, the reduction is first done at the warp level. This greatly cuts down the number of RMW operations that leave the core when there is contention. I suspect a similar idea can carry over to AMD GPUs.

question about boardName

Do you know where info.boardName_ in hipamd/src/hip_device.cpp is read from? The name is empty for some devices (e.g. MI210). Thanks!

Consider creating releases of hipamd

All of the other software needed to build the ROCm suite is versioned as rocm-major.minor.patch, except for hipamd. It would make integration into build systems easier if this repository were versioned in the same manner.

Hyphens in library names

When building from a release, compilation with hip fails because of unnecessary hyphens at the end of the library names,

ls /opt/rocm/hip/lib
cmake           libamdhip64.so.5           libhiprtc-builtins.so    libhiprtc-builtins.so.5.0.13601-
libamdhip64.so  libamdhip64.so.5.0.13601-  libhiprtc-builtins.so.5

Now, clang expects input from stdin.

This is because the downloaded archive does not contain the .git folder used to obtain the githash information.

Importing external semaphore not working under linux

Attempting to import an external semaphore always returns hipErrorInvalidValue under Linux:

#include <hip/hip_runtime.h>
#include <iostream>
#include <cstdlib>

int main(int argc, char* argv[]) {
    hipExternalSemaphoreHandleDesc desc = {};
    desc.type = hipExternalSemaphoreHandleTypeOpaqueFd;
    desc.handle.fd = 0;

    hipExternalSemaphore_t external_sem;
    hipError_t result = hipImportExternalSemaphore(&external_sem, &desc);
    if (result != hipSuccess) {
        std::cerr << "error: " << hipGetErrorString(result) << std::endl;
        return EXIT_FAILURE;
    }

    std::cout << "success" << std::endl;

    hipDestroyExternalSemaphore(external_sem);

    return EXIT_SUCCESS;
}
$ hipcc -oexternal_semaphore external_semaphore.hip
$ ./external_semaphore
error: hipErrorInvalidValue

Note: This is not caused by the external fd being incorrect, as the code path executed by hipImportExternalSemaphore hits NullDevice::importExtSemaphore:

$ gdb ./external_semaphore
$ b NullDevice::importExtSemaphore
$ r
Breakpoint 1 at 0x7ffff70d6da0: file /nix/store/v954yk0ng3hz50ppmf8ax1jfxhcfzgxq-rocclr-5.2.3/cmake/../device/rocm/rocdevice.hpp, line 237.

Here is a more complete example (though I'm not sure if it's correct) that attempts to construct a proper semaphore fd:
external_semaphore.txt

$ hipcc -oexternal_semaphore ./external_semaphore.hip -I /usr/include/drm -ldrm
$ ./external_semaphore  /dev/dri/renderD128 
error: hipErrorInvalidValue

Missing definitions for hipExternalSemaphore and hipExternalMemory when compiling for cuda

When compiling the following program for CUDA, hipcc/nvcc reports errors about hipExternalSemaphore_t and the related types/functions not being defined:

#include <hip/hip_runtime.h>
#include <iostream>
#include <cstdlib>

int main(int argc, char* argv[]) {
    hipExternalSemaphoreHandleDesc desc = {};
    desc.type = hipExternalSemaphoreHandleTypeOpaqueFd;
    desc.handle.fd = 0;

    hipExternalSemaphore_t external_sem;
    hipError_t result = hipImportExternalSemaphore(&external_sem, &desc);
    if (result != hipSuccess) {
        std::cerr << "error: " << hipGetErrorString(result) << std::endl;
        return EXIT_FAILURE;
    }

    std::cout << "success" << std::endl;

    hipDestroyExternalSemaphore(external_sem);

    return EXIT_SUCCESS;
}
$ hipcc -o external_semaphore ./external_semaphore.hip -x cu
./external_semaphore.hip(6): error: identifier "hipExternalSemaphoreHandleDesc" is undefined

./external_semaphore.hip(7): error: identifier "hipExternalSemaphoreHandleTypeOpaqueFd" is undefined

./external_semaphore.hip(10): error: identifier "hipExternalSemaphore_t" is undefined

./external_semaphore.hip(11): error: identifier "hipImportExternalSemaphore" is undefined

./external_semaphore.hip(19): error: identifier "hipDestroyExternalSemaphore" is undefined

./external_semaphore.hip(21): error: identifier "hipExternalMemory_t" is undefined

This is also the case for hipExternalMemory_t and related types/functions.

Judging from the headers, the definitions are simply missing. For AMD, they are defined in hip_runtime_api.h, but guarded by a macro that checks for an AMD platform. If that same check determines that the compilation targets NVIDIA, nvidia_hip_runtime_api.h is included instead, and the relevant definitions are missing there.

hipDeviceGetUuid yields link error when compiling with nvcc

#include <hip/hip_runtime.h>

int main() {
    hipUUID_t uuid;
    (void) hipDeviceGetUuid(&uuid, 0);
}

Compiling the above with nvcc yields the following error:

$ nvcc -o test ./test.hip -x cu
ld: /tmp/tmpxft_0002f12e_00000000-11_test.o: in function `main':
tmpxft_0002f12e_00000000-6_test.cudafe1.cpp:(.text.startup+0x1a): undefined reference to `cuDeviceGetUuid'
collect2: error: ld returned 1 exit status

HIP_CLANG_ROOT: Not Overwritable via ENV{HIP_PATH} Yet

While debugging another issue with a HPE/Cray environment on OLCF Crusher the other week, I realized that the logic for HIP_CLANG_ROOT in
https://github.com/ROCm-Developer-Tools/hipamd/blob/d2d2cacfe210307ec10c77400e1dafdeafefbc0f/hip-config.cmake.in#L122-L153

cannot be overwritten properly with HIP_PATH env hints.

It looks like, further down, HIP_CLANG_ROOT is overwritten even when it is already set. This caused a dead end for me with one of the programming environments that HPE/OLCF provide on Crusher, where I tried to use the amdclang++ compiler instead of CC.

Can this be fixed, e.g., by not overwriting the HIP_CLANG_ROOT variable at the end of the snippet if it is already set properly?

hipGetSymbol* functions always return hipErrorInvalidSymbol.

Calling hipGetSymbolAddress or hipGetSymbolSize always yields hipErrorInvalidSymbol. I did a little bit of investigating: calling either of these functions eventually leads control flow to this function:
https://github.com/ROCm-Developer-Tools/hipamd/blob/928684dfed0fb4a8a14c6d5d79014324f3218d82/src/hip_code_object.cpp#L817-L832
Here, hostVar is the same variable name that was passed down from hipGetSymbolAddress or hipGetSymbolSize. The vars_ map is declared with key type const void*:
https://github.com/ROCm-Developer-Tools/hipamd/blob/928684dfed0fb4a8a14c6d5d79014324f3218d82/src/hip_code_object.hpp#L158
This means that when vars_.find() is called, the keys of the map are compared by pointer comparison rather than by comparing the variable names. This could be resolved by using std::string as the key instead, which has the proper hash/comparison operators.

fatal error: amdocl/cl_icd_amd.h: No such file or directory

The build fails using the rocm-5.2.x branch:

  cmake .. \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DHIP_PLATFORM=amd \
    -DHIP_COMMON_DIR=$HIP_DIR \
    -DAMD_OPENCL_INCLUDE_DIR=/opt/rocm/opencl/include \
    -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX \

[  0%] Building CXX object src/CMakeFiles/rocclr.dir/home/pvelesko/space/HIPAMD/ROCclr/device/blit.cpp.o
cd /home/pvelesko/space/HIPAMD/hipamd/build/src && /usr/bin/c++ -DATI_ARCH_X86 -DATI_BITS_32 -DATI_OS_LINUX -DCL_TARGET_OPENCL_VERSION=220 -DCL_USE_DEPRECATED_OPENCL_1_0_APIS -DCL_USE_DEPRECATED_OPENCL_1_1_APIS -DCL_USE_DEPRECATED_OPENCL_1_2_API
S -DCL_USE_DEPRECATED_OPENCL_2_0_APIS -DCOMGR_DYN_DLL -DHAVE_CL2_HPP -DLITTLEENDIAN_CPU -DOPENCL_C_MAJOR=2 -DOPENCL_C_MINOR=0 -DOPENCL_MAJOR=2 -DOPENCL_MINOR=1 -DROCCLR_SUPPORT_NUMA_POLICY -DUSE_COMGR_LIBRARY -DWITH_HSA_DEVICE -DWITH_LIGHTNING_COMPILER -DWITH_LIQUID_FLASH=0 -I/home/pvelesko/space/HIPAMD/ROCclr/cmake/.. -I/home/pvelesko/space/HIPAMD/ROCclr/cmake/../compiler/lib -I/home/pvelesko/space/HIPAMD/ROCclr/cmake/../compiler/lib/include -I/home/pvelesko/space/HIPAMD/ROCclr/cmake/../compiler/lib/backends/common -I/home/pvelesko/space/HIPAMD/ROCclr/cmake/../device -I/home/pvelesko/space/HIPAMD/ROCclr/cmake/../elf -I/home/pvelesko/space/HIPAMD/ROCclr/cmake/../include -I/opt/rocm/opencl/include -I/opt/rocm/opencl/include/.. -I/opt/rocm/opencl/include/../.. -I/opt/rocm/opencl/include/../../.. -I/opt/rocm/opencl/include/../../../.. -I/opt/rocm/opencl/include/../../../../amdocl -isystem /opt/rocm/include -isystem /opt/rocm/include/hsa -Werror -O2 -g -DNDEBUG -fPIC -pthread -std=c++1z -MD -MT src/CMakeFiles/rocclr.dir/home/pvelesko/space/HIPAMD/ROCclr/device/blit.cpp.o -MF CMakeFiles/rocclr.dir/home/pvelesko/space/HIPAMD/ROCclr/device/blit.cpp.o.d -o CMakeFiles/rocclr.dir/home/pvelesko/space/HIPAMD/ROCclr/device/blit.cpp.o -c /home/pvelesko/space/HIPAMD/ROCclr/device/blit.cpp
In file included from /home/pvelesko/space/HIPAMD/ROCclr/cmake/../platform/agent.hpp:27:0,
                 from /home/pvelesko/space/HIPAMD/ROCclr/cmake/../platform/command.hpp:34,
                 from /home/pvelesko/space/HIPAMD/ROCclr/cmake/../platform/commandqueue.hpp:33,
                 from /home/pvelesko/space/HIPAMD/ROCclr/device/blit.cpp:21:
/home/pvelesko/space/HIPAMD/ROCclr/cmake/../include/vdi_agent_amd.h:25:10: fatal error: amdocl/cl_icd_amd.h: No such file or directory
 #include "amdocl/cl_icd_amd.h"

Failed to embed PCH

Trying to build HIP using a fresh install of Ubuntu 22.04

Following this guide: https://docs.amd.com/en-US/bundle/HIP-Programming-Guide-v5.0/page/Installing_HIP.html

rocm-5.4.x

pvelesko@cupcake:~/HIPAMD/hipamd$ cd "$HIPAMD_DIR"
mkdir -p build; cd build
cmake -DHIP_COMMON_DIR=$HIP_DIR -DAMD_OPENCL_PATH=$OPENCL_DIR -DROCCLR_PATH=$ROCCLR_DIR -DCMAKE_PREFIX_PATH="/opt/rocm/" -DCMAKE_INSTALL_PREFIX=$PWD/install ..
make -j$(nproc)
sudo make install
-- HIPCC_BIN_DIR found at
-- HIP_COMMON_DIR found at /home/pvelesko/HIPAMD/hip
-- Using CPACK_DEBIAN_PACKAGE_RELEASE local
-- CPACK_RPM_PACKAGE_RELEASE: local
-- HIP Platform: amd
-- HIP Runtime: rocclr
-- HIP Compiler: clang
-- ROCM Installation path(ROCM_PATH): /opt/rocm
-- HIP will be installed in: /home/pvelesko/HIPAMD/hipamd/build/install
-- Could NOT find NUMA (missing: NUMA_LIBRARIES NUMA_INCLUDE_DIR)
-- Could NOT find LibXml2 (missing: LIBXML2_LIBRARY LIBXML2_INCLUDE_DIR)
'sh' '-c' '/home/pvelesko/HIPAMD/hipamd/src/hip_embed_pch.sh /home/pvelesko/HIPAMD/hip/include /home/pvelesko/HIPAMD/hipamd/build/include /home/pvelesko/HIPAMD/hipamd/include /opt/rocm/llvm/lib/cmake/llvm/../../..'
+ /opt/rocm/llvm/lib/cmake/llvm/../../../bin/clang -O3 --rocm-path=/home/pvelesko/HIPAMD/hipamd/build/include/.. -std=c++17 -nogpulib -isystem /home/pvelesko/HIPAMD/hipamd/build/include -isystem /home/pvelesko/HIPAMD/hip/include -isystem /home/pvelesko/HIPAMD/hipamd/include --cuda-device-only --cuda-gpu-arch=gfx1030 -x hip /tmp/hip_pch.146539/hip_pch.h -E
In file included from /tmp/hip_pch.146539/hip_pch.h:1:
/home/pvelesko/HIPAMD/hip/include/hip/hip_runtime.h:54:10: fatal error: 'thread' file not found
#include <thread>
         ^~~~~~~~
1 error generated when compiling for gfx1030.
CMake Error at src/CMakeLists.txt:186 (message):
  Failed to embed PCH

main branch is outdated

Dear maintainers, could you please delete the main branch? It is outdated and only confuses developers.

hip-config.cmake contains wrong paths when not using /opt/rocm/

This issue comes up when ROCm is installed in system directories rather than /opt/rocm, for example the Fedora package by @Mystro256 and the Solus package I have done.
When building something that relies on HIP (in my case rocBLAS), hip-config.cmake causes problems. The ones I have found so far are:

  1. _IMPORT_PREFIX: ${_IMPORT_PREFIX}/../include will point to /usr/../include, which doesn't exist. After changing all occurrences to ${_IMPORT_PREFIX}/include, it correctly points to /usr/include.
  2. HIP_CLANG_ROOT: When using the system's LLVM, ${ROCM_PATH}/llvm will not find the correct folders in system directories; for example, it looks for ${ROCM_PATH}/llvm/clang, while the folder it should find is /usr/lib64/clang. So you can't simply set ${ROCM_PATH}; you need to correct this .cmake file too. I had to change ${ROCM_PATH}/llvm to /usr.

Here is the generated hip-config.cmake that is problematic for me (in /usr/lib64/cmake/hip/). To fix it, I did:

sed -i 's|${_IMPORT_PREFIX}/../include|${_IMPORT_PREFIX}/include|g' /usr/lib64/cmake/hip/hip-config.cmake
sed -i 's|"${ROCM_PATH}/llvm"|"/usr"|g' /usr/lib64/cmake/hip/hip-config.cmake

Status of atomicAddNoRet

Hello!

I would like to inquire about the state of the atomicAddNoRet function. It gives our code (GROMACS) a 2x speed-up in one of the kernels when running on MI100 (gfx908), compared to a plain atomicAdd (which gets compiled into a CAS loop). So I would really like to keep using the noret version, since the return value is ignored anyway.

However, atomicAddNoRet is marked as deprecated, and a plain atomicAdd is suggested instead (with no indication of the possible performance degradation, by the way!). Could you please advise on which function should be used? I also considered using the __ockl_atomic_add_noret_f32 intrinsic directly, but it's not documented either.

We are using ROCm 4.5.2 and hipSYCL for our code. However, the problem is easily demonstrated with plain HIP (ROCm 4.5.2 and 5.0.0 tested):

#include "hip/hip_runtime.h"

__global__ void atomicAddKernel(float *__restrict__ a) {
  // Return value is clearly unused.
  atomicAdd(a, 1);
}

__global__ void atomicAddNoRetKernel(float *__restrict__ a) {
  // 'atomicAddNoRet' is deprecated: use atomicAdd instead
  atomicAddNoRet(a, 1);
}

int main() { return 0; }
$ hipcc --version
HIP version: 4.4.21432-f9dccde4
AMD clang version 13.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-4.5.2 21432 9bbd96fd1936641cd47defd8022edafd063019d5)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-4.5.2/llvm/bin
$ hipcc --offload-arch=gfx908 -O3 test.cpp -save-temps -Wno-deprecated-declarations

Examining the test-hip-amdgcn-amd-amdhsa-gfx908.s file, we see that _Z15atomicAddKernelPf contains a loop of global_atomic_cmpswap, while _Z20atomicAddNoRetKernelPf only has one nice little global_atomic_add_f32 call.

Remove HIP_COMPILER STREQUAL "clang"

When using the amdclang compiler through a wrapper script called "clang", HIP's cmake throws an error because it cannot find libclang_rt.builtins.

I think the source of the error is in a combination of:
https://github.com/ROCm-Developer-Tools/hipamd/blob/develop/hip-config.cmake.in
and https://github.com/ROCm-Developer-Tools/HIP/blob/develop/hip-lang-config.cmake.in

The reason I am unsure is that the error comes from this file on my system:
/opt/rocm-5.4.0/lib/cmake/hip/hip-config.cmake

It contains both

if(HIP_COMPILER STREQUAL "clang")
  if(WIN32)

and, later in the same file,

  file(GLOB HIP_CLANGRT_LIB_SEARCH_PATHS "${HIP_CLANG_ROOT}/lib/clang/*/lib/*")
  find_library(CLANGRT_BUILTINS
    NAMES
      clang_rt.builtins
      clang_rt.builtins-x86_64
    PATHS
      ${HIP_CLANGRT_LIB_SEARCH_PATHS}
      ${HIP_CLANG_INCLUDE_PATH}/../lib/linux
      ${HIP_CLANG_INCLUDE_PATH}/../lib/windows
      NO_DEFAULT_PATH)

The actual path to the library it's looking for on my system is
/opt/rocm-5.4.0/llvm/lib/clang/15.0.0/lib/linux/libclang_rt.builtins-x86_64.a

IMHO, resolving this issue properly requires defining and enforcing a consistent convention for ROCm, LLVM, and Clang paths. These paths are a recurring issue in many ROCm components, and a consistent convention would fix a lot of related trouble.

Compile error in hip_cooperative_groups.h with ROCm 5.5

With ROCm 5.5, the inclusion of hip/hip_cooperative_groups.h results in a compile error on debug builds:

In file included from /opt/rocm-5.5.0/include/hip/hip_cooperative_groups.h:38:
/opt/rocm-5.5.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:455:7: error: invalid instruction, did you mean: s_trap?
      __hip_assert(false && "invalid cooperative group type");
      ^
/opt/rocm-5.5.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:48:7: note: expanded from macro '__hip_assert'
      __hip_abort();                                                                               \
      ^
/opt/rocm-5.5.0/include/hip/amd_detail/amd_hip_cooperative_groups.h:41:9: note: expanded from macro '__hip_abort'
  { asm("trap;"); }
        ^
<inline asm>:1:2: note: instantiated into assembly here
        trap;
        ^

This issue does not occur in ROCm 5.4, where this code used assert instead. The problem seems to have been introduced by commit 5ff4b16. Note that it only occurs if NDEBUG is not defined. Steps to reproduce:

git clone https://github.com/amd/rocm-examples.git
cd rocm-examples/HIP-Basic/cooperative_groups
cmake -S . -B build
cmake --build build

HIP build is fundamentally broken in 4.5

It appears the instructions in INSTALL.md are misleading, in such a way that the build expects hipconfig in the wrong place.

DCMAKE_INSTALL_PREFIX=$PWD/install ..
-- HIP_COMMON_DIR found at /root/ROCm-4.5/hipamd/hip
Can't open perl script "/root/ROCm-4.5/hipamd/hip/bin/hipconfig": No such file or directory

root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd/build# find ~/ROCm-4.5/ -name hipconfig
/root/ROCm-4.5/HIP/bin/hipconfig

root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd# export HIPAMD_DIR="$(readlink -f hipamd)"
root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd# export HIP_DIR="$(readlink -f hip)"
root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd# export ROCclr_DIR="$(readlink -f ROCclr)"
root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd# export OPENCL_DIR="$(readlink -f ROCm-OpenCL-Runtime)"
root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd# 
root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd# mkdir -p build; cd build
root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd/build# 
root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd/build# cmake -DHIP_COMMON_DIR=$HIP_DIR -DAMD_OPENCL_PATH=$OPENCL_DIR -DROCCLR_PATH=$ROCCLR_DIR -DCMAKE_PREFIX_PATH="/opt/rocm/" -DCMAKE_INSTALL_PREFIX=$PWD/install ..
-- HIP_COMMON_DIR found at /root/ROCm-4.5/hipamd/hip
Can't open perl script "/root/ROCm-4.5/hipamd/hip/bin/hipconfig": No such file or directory
CMake Error at CMakeLists.txt:82 (string):
  string sub-command REPLACE requires at least four arguments.


-- Using CPACK_DEBIAN_PACKAGE_RELEASE local
-- CPACK_RPM_PACKAGE_RELEASE: local
-- HIP Platform: 
CMake Error at CMakeLists.txt:218 (message):
  Unexpected HIP_PLATFORM:


-- Configuring incomplete, errors occurred!
See also "/root/ROCm-4.5/hipamd/CMakeFiles/CMakeOutput.log".
root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd/build# nano -w ../../build-rocm.sh 
root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd/build#     echo HIPAMD_DIR: $HIPAMD_DIR, HIP_DIR: $HIP_DIR, ROCclr_DIR: $ROCclr_DIR, OPENCL_DIR: $OPENCL_DIR
HIPAMD_DIR: /root/ROCm-4.5/hipamd/hipamd, HIP_DIR: /root/ROCm-4.5/hipamd/hip, ROCclr_DIR: /root/ROCm-4.5/hipamd/ROCclr, OPENCL_DIR: /root/ROCm-4.5/hipamd/ROCm-OpenCL-Runtime
root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd/build# 
root@nonroot-SYS-7049GP-TRT:~/ROCm-4.5/hipamd/build# 

CMake: `SHELL:` Escape Combined Flags Missing on Cray CC?

Migrated from originally:

Flags like these -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false:
https://github.com/ROCm-Developer-Tools/hipamd/blob/d2d2cacfe210307ec10c77400e1dafdeafefbc0f/hip-config.cmake.in#L225

need to be escaped with SHELL:, otherwise CMake will de-duplicate the repeated -mllvm part and the flags are not passed as intended on the compile line (e.g., they become -mllvm -amdgpu-early-inline-all=true -amdgpu-function-calls=false and the second flag is ignored).
Update: oh, maybe they are passed correctly after all (see comment).
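For reference, this is what the SHELL: escape looks like in CMake code (mytarget is a placeholder); CMake treats each SHELL: string as one unit during de-duplication and only splits it into separate flags when generating the command line:

```cmake
# Without SHELL:, CMake may de-duplicate the repeated -mllvm across the
# two entries, breaking the pairing of option and value.
target_compile_options(mytarget PRIVATE
  "SHELL:-mllvm -amdgpu-early-inline-all=true"
  "SHELL:-mllvm -amdgpu-function-calls=false")
```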

It looks like these flags are not present when compiling through the HPE/Cray compiler wrapper (CC) on OLCF Crusher (ROCm 4.5.2).

[ 17%] Building CXX object _deps/localamrex-build/Src/CMakeFiles/amrex.dir/LinearSolvers/MLMG/AMReX_MLTensorOp.cpp.o
/sw/crusher/spack-envs/base/opt/linux-sles15-x86_64/gcc-7.5.0/ccache-4.4.2-p3gb734ehpc3nfdqv3sxf7itxq5uvpmq/bin/ccache /opt/cray/pe/craype/2.7.13/bin/CC -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/ccs/home/huebl/src/amrex/Src/Base -I/ccs/home/huebl/src/amrex/Src/Base/Parser -I/ccs/home/huebl/src/amrex/Src/Boundary -I/ccs/home/huebl/src/amrex/Src/AmrCore -I/ccs/home/huebl/src/amrex/Src/LinearSolvers/MLMG -I/ccs/home/huebl/src/amrex/Src/Particle -I/ccs/home/huebl/src/warpx/build_crusher/_deps/localamrex-build -isystem /opt/rocm-4.5.2/llvm/lib/clang/13.0.0/include/.. -isystem /opt/rocm-4.5.2/hip/include -isystem /opt/rocm-4.5.2/hiprand/include -isystem /opt/rocm-4.5.2/rocrand/include -isystem /opt/rocm-4.5.2/rocprim/include -I/opt/rocm-4.5.2/include -O3 -DNDEBUG -m64 -munsafe-fp-atomics -x hip --offload-arch=gfx90a -std=c++17 -MD -MT _deps/localamrex-build/Src/CMakeFiles/amrex.dir/LinearSolvers/MLMG/AMReX_MLTensorOp.cpp.o -MF CMakeFiles/amrex.dir/LinearSolvers/MLMG/AMReX_MLTensorOp.cpp.o.d -o CMakeFiles/amrex.dir/LinearSolvers/MLMG/AMReX_MLTensorOp.cpp.o -c /ccs/home/huebl/src/amrex/Src/LinearSolvers/MLMG/AMReX_MLTensorOp.cpp

fatal error: error in backend: Error while trying to spill SGPR4_SGPR5 from class SReg_64: Cannot scavenge register without an emergency spill slot!
clang-13: error: clang frontend command failed with exit code 70 (use -v to see invocation)
Cray clang version 13.0.0  (24b043d62639ddb4320c86db0b131600fdbc6ec6)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/cray/pe/cce/13.0.0/cce-clang/x86_64/share/../bin
clang-13: note: diagnostic msg: Error generating preprocessed source(s).
gmake[2]: *** [_deps/localamrex-build/Src/CMakeFiles/amrex.dir/build.make:1630: _deps/localamrex-build/Src/CMakeFiles/amrex.dir/LinearSolvers/MLMG/AMReX_MLTensorOp.cpp.o] Error 70
gmake[1]: *** [CMakeFiles/Makefile2:910: _deps/localamrex-build/Src/CMakeFiles/amrex.dir/all] Error 2
gmake: *** [Makefile:136: all] Error 2

CC @tomsang @yxsamliu

Vector type operators taking an arbitrary second argument are too greedy

This is again an issue that came up while trying to introduce HIP backend support in GPUSPH. We have Point and Vector classes that can be constructed from vector types, equipped with operators that allow adding (etc.) them together.

This means that code such as double3 v; Point pt; Point n = v + pt; works, because v gets converted to a Point and Point + Point is a defined operation, at least in CUDA and on CPU.

This, however, does not work when using HIP vector types, because of definitions such as:

    template<typename T, unsigned int n, typename U>
    __HOST_DEVICE__
    inline
    constexpr
    HIP_vector_type<T, n> operator+(
        const HIP_vector_type<T, n>& x, U y) noexcept
    {
        return HIP_vector_type<T, n>{x} += HIP_vector_type<T, n>{y};
    }

or the one with the flipped arguments. These take precedence over the conversion-enabled Point + Point operator, and result in an error because HIP_vector_type<T, n> cannot be constructed from a Point.

The solution is to enable these operators only for types for which the conversion is defined, for example using appropriate enable_if fencing:

    template<typename T, unsigned int n, typename U>
    __HOST_DEVICE__
    inline
    constexpr
    typename std::enable_if<std::is_convertible<U, HIP_vector_type<T, n>>::value, HIP_vector_type<T, n>>::type
    operator+(
        const HIP_vector_type<T, n>& x, U y) noexcept
    {
        return HIP_vector_type<T, n>{x} += HIP_vector_type<T, n>{y};
    }

This should be done for all these operators. Additionally, the U argument should be a const& to avoid unnecessary invocations of copy constructors.

Slowdown observed on Linux with RDNA2 when blockDim is NOT loaded

The following kernel:

extern "C" __global__ void VkFFT_main(unsigned long long* g, unsigned long long* h) {
  __shared__ unsigned long long c[8192];
  asm volatile(";x: %0" : : "s"((unsigned)blockDim.x));
  unsigned b =
      threadIdx.y * (threadIdx.y + threadIdx.x) * 7 + blockIdx.z * 6384;
  c[0] = g[b];
  h[b] = c[threadIdx.x];
}

becomes ~10% slower when the inline assembly for loading blockDim.x is removed.
This seems to happen only on RDNA2 (tested with V620 and RX6650XT) on Linux.

Observations

  • Does not seem to reproduce on Windows with the RX6900XT
  • The amount of shared memory used by the kernel is relevant: decreasing the size of c makes the slowdown without loading blockDim less pronounced
  • The generated assembly with and without blockDim is almost identical, differing only by the load of the block size from the dispatch packet and the metadata flag that marks the dispatch packet as in use.
    • What's more, if this load instruction is removed from the faster kernel (without changing the metadata), it becomes as slow as the other one.

Motivation

The kernel above was produced using C-Reduce on a kernel extracted from VkFFT.
A possible optimization for VkFFT's HIP backend replaces blockDim with its values ahead of time, since they are known at compile time.
This yields big improvements, especially for small problem sizes, except in select cases on RDNA2, where it leads to a slowdown of as much as ~30%. This issue is the result of investigating the cause.

Environment

hipconfig
hipconfig
HIP version  : 5.3.22061-e8e78f1a

== hipconfig
HIP_PATH     : /opt/rocm-5.3.0
ROCM_PATH    : /opt/rocm-5.3.0
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.3.0/include -I/opt/rocm-5.3.0/llvm/bin/../lib/clang/15.0.0 -I/opt/rocm-5.3.0/hsa/include

== hip-clang
HSA_PATH         : /opt/rocm-5.3.0/hsa
HIP_CLANG_PATH   : /opt/rocm-5.3.0/llvm/bin
AMD clang version 15.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.3.0 22362 3cf23f77f8208174a2ee7c616f4be23674d7b081)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.3.0/llvm/bin
AMD LLVM version 15.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver3

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :  -std=c++11 -isystem "/opt/rocm-5.3.0/llvm/lib/clang/15.0.0/include/.." -isystem /opt/rocm-5.3.0/hsa/include -isystem "/opt/rocm-5.3.0/include" -O3
hip-clang-ldflags  :  -L"/opt/rocm-5.3.0/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt

=== Environment Variables
PATH=/home/gergely/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

== Linux Kernel
Hostname     : nostromo
Linux nostromo 5.4.0-131-generic #147-Ubuntu SMP Fri Oct 14 17:07:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.5 LTS
Release:	20.04
Codename:	focal
clang version
/opt/rocm/llvm/bin/clang++ --version
AMD clang version 15.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.3.0 22362 3cf23f77f8208174a2ee7c616f4be23674d7b081)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/llvm/bin
rocminfo
rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD EPYC 7713P 64-Core Processor
  Uuid:                    CPU-XX
  Marketing Name:          AMD EPYC 7713P 64-Core Processor
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2000
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            128
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    528082872(0x1f79e7b8) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    528082872(0x1f79e7b8) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    528082872(0x1f79e7b8) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1030
  Uuid:                    GPU-abcb45dca7663b11
  Marketing Name:          AMD Radeon PRO V620
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      4096(0x1000) KB
    L3:                      131072(0x20000) KB
  Chip ID:                 29601(0x73a1)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2570
  BDFID:                   33536
  Internal Node ID:        1
  Compute Unit:            72
  SIMDs per CU:            2
  Shader Engines:          8
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    31440896(0x1dfc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1030
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*******
Agent 3
*******
  Name:                    gfx1030
  Uuid:                    GPU-2293a876b6331dff
  Marketing Name:          AMD Radeon PRO V620
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    2
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      4096(0x1000) KB
    L3:                      131072(0x20000) KB
  Chip ID:                 29601(0x73a1)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2570
  BDFID:                   34304
  Internal Node ID:        2
  Compute Unit:            72
  SIMDs per CU:            2
  Shader Engines:          8
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    31440896(0x1dfc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1030
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***

Attachments

Archive file blockdim-faster-linux-rdna2.tar.gz containing the original VkFFT kernel,
the host code used to do the speed tests (based on the launch params done by VkFFT), annotated assembly from the kernel and the script used for the test case reduction.
The script is useful for verifying the slowdown. It compiles and runs the reduced kernel source (test.hip) with and without loading blockDim, up to 3 times each, and shows the difference in time taken.

Failure to build from source with GCC 12 headers

Greetings,

With the recent introduction of GCC 12 in Debian unstable, I noticed that rocm-hipamd 5.0.0 fails to build from source when hipcc refers to the GCC 12 headers; see the log below:

-- Generating /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/CMakeFiles/directed_tests.deviceLib.hipVectorTypesDevice.dir/hip/tests/src/deviceLib/./directed_tests.deviceLib.hipVectorTypesDevice_generated_hipVectorTypesDevice.cpp.o
/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/bin/hipcc -c /<<PKGBUILDDIR>>/hip/tests/src/deviceLib/hipVectorTypesDevice.cpp -o /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/CMakeFiles/directed_tests.deviceLib.hipVectorTypesDevice.dir/hip/tests/src/deviceLib/./directed_tests.deviceLib.hipVectorTypesDevice_generated_hipVectorTypesDevice.cpp.o --offload-arch=gfx803 -I/<<PKGBUILDDIR>>/include -I/<<PKGBUILDDIR>>/hip/tests/src -I/<<PKGBUILDDIR>>/hip/tests/unit -I/<<PKGBUILDDIR>>/hip/tests/performance -I/<<PKGBUILDDIR>>/include -I/<<PKGBUILDDIR>>/hip/tests/src
In file included from /<<PKGBUILDDIR>>/hip/tests/src/deviceLib/hipVectorTypesDevice.cpp:34:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/memory:77:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:53:
/usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:196:22: error: use of undeclared identifier 'noinline'; did you mean 'inline'?
      __attribute__((__noinline__))
                     ^
/<<PKGBUILDDIR>>/include/hip/amd_detail/host_defines.h:50:37: note: expanded from macro '__noinline__'
#define __noinline__ __attribute__((noinline))
                                    ^
In file included from /<<PKGBUILDDIR>>/hip/tests/src/deviceLib/hipVectorTypesDevice.cpp:34:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/memory:77:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:53:
/usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:196:22: error: type name does not allow function specifier to be specified
/<<PKGBUILDDIR>>/include/hip/amd_detail/host_defines.h:50:37: note: expanded from macro '__noinline__'
#define __noinline__ __attribute__((noinline))
                                    ^
In file included from /<<PKGBUILDDIR>>/hip/tests/src/deviceLib/hipVectorTypesDevice.cpp:34:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/memory:77:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:53:
/usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:196:22: error: expected expression
/<<PKGBUILDDIR>>/include/hip/amd_detail/host_defines.h:50:46: note: expanded from macro '__noinline__'
#define __noinline__ __attribute__((noinline))
                                             ^
3 errors generated when compiling for gfx803.

I worked around the build failure by shunting the __attribute__((noinline)) with the patch below, but it feels rather hacky and possibly wrong (though it builds okay):

--- rocm-hipamd.orig/include/hip/amd_detail/host_defines.h
+++ rocm-hipamd/include/hip/amd_detail/host_defines.h
@@ -47,7 +47,7 @@
 #define __constant__ __attribute__((constant))
 #endif // !__CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__
 
-#define __noinline__ __attribute__((noinline))
+#define __noinline__
 #define __forceinline__ inline __attribute__((always_inline))
 
 #if __HIP_NO_IMAGE_SUPPORT

Having a quick look at the hipamd 5.2 source code, the issue may still be present, but I can't tell for sure as we haven't completed the upgrade to 5.2 in unstable yet.

Kind Regards,
Étienne.

Linking error when compiling a file with hipcc

hipamd branch: rocm-5.2.x
working directory: /workspace/
hipamd build command:

cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=ON /workspace/src//hipamd -B /workspace/build/Debug//hipamd -DHIP_COMMON_DIR=/workspace/src//HIP '-DCMAKE_PREFIX_PATH=/workspace/build/Debug//comgr;/workspace/build/Debug//llvm-project' -Dhsa-runtime64_DIR=/workspace/install/Debug/rocm/lib/cmake/hsa-runtime64 -DAMD_OPENCL_PATH=/workspace/src//ROCm-OpenCL-Runtime -DROCCLR_PATH=/workspace/src//ROCclr -DCMAKE_INSTALL_PREFIX=/workspace/install/Debug/rocm -DROCM_PATH=/workspace/install/Debug/rocm

Executing the following build command outside the build-stage directory leads to this issue:

cmake --build /workspace/build/Debug//hipamd --target install

After the hipcc build completes, test hipcc functionality.

hipcc test file: main.c

#include "hip/hip_runtime.h"
int main()
{
	printf("[HIP]  %d\n", HIP_VERSION);
	return 0;
}

hipcc compile command: hipcc main.c
Error output while compiling the test file:
/usr/bin/ld: /workspace/install/Debug/rocm/lib/libamdhip64.so: undefined reference to `__hip_pch'
/usr/bin/ld: /workspace/install/Debug/rocm/lib/libamdhip64.so: undefined reference to `__hip_pch_size'

I found that hip_pch.o is not generated in the correct CMAKE_BINARY_DIR when building the hipcc executable.
The following patch fixes this problem.
Pull Request Here

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index f81296d3..1fbd355c 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -181,7 +181,11 @@ if(__HIP_ENABLE_PCH)
     set(HIP_LLVM_ROOT "${LLVM_DIR}/../../..")
   endif()
 
-  execute_process(COMMAND sh -c "${HIP_COMMON_BIN_DIR}/hip_embed_pch.sh ${HIP_COMMON_INCLUDE_DIR} ${PROJECT_BINARY_DIR}/include ${PROJECT_SOURCE_DIR}/include ${HIP_LLVM_ROOT}" COMMAND_ECHO STDERR RESULT_VARIABLE EMBED_PCH_RC)
+  execute_process(COMMAND sh -c "${HIP_COMMON_BIN_DIR}/hip_embed_pch.sh ${HIP_COMMON_INCLUDE_DIR} ${PROJECT_BINARY_DIR}/include ${PROJECT_SOURCE_DIR}/include ${HIP_LLVM_ROOT}" 
+    COMMAND_ECHO STDERR 
+    RESULT_VARIABLE EMBED_PCH_RC 
+    WORKING_DIRECTORY "${CMAKE_BINARY_DIR}")
+
   if (EMBED_PCH_RC AND NOT EMBED_PCH_RC EQUAL 0)
     message(FATAL_ERROR "Failed to embed PCH")
   endif()

Polaris 10 support

Is it possible to use HIP on RX400 series GPU? I'd like to try it on my RX480

5.0.1/5.1.3: hip_prof_str.h changes after running hip_prof_gen.py which causes build failure

I tried to build hip-5.0.1 with profiling (USE_PROF_API=1), but during the build hip_prof_gen.py threw "Warning: "/ext4-disk/build/portage/dev-util/hip-5.0.1/work/hipamd-rocm-5.0.1/include/hip/amd_detail/hip_prof_str.h" needs to be re-generated and checked-in with the current changes" and exited with return code 1. I found that hip_prof_str.h differs between the source and build directories:

diff -u hip-5.0.1_build/include/hip/amd_detail/hip_prof_str.h hipamd-rocm-5.0.1/include/hip/amd_detail/hip_prof_str.h
--- hip-5.0.1_build/include/hip/amd_detail/hip_prof_str.h	2022-02-21 12:44:30.100019006 +0800
+++ hipamd-rocm-5.0.1/include/hip/amd_detail/hip_prof_str.h	2022-02-01 22:54:45.000000000 +0800
@@ -302,9 +302,7 @@
   HIP_API_ID_hipStreamGetCaptureInfo_v2 = 289,
   HIP_API_ID_hipStreamIsCapturing = 290,
   HIP_API_ID_hipStreamUpdateCaptureDependencies = 291,
-  HIP_API_ID_hipDrvPointerGetAttributes = 292,
-  HIP_API_ID_hipPointerGetAttribute = 293,
-  HIP_API_ID_LAST = 293,
+  HIP_API_ID_LAST = 291,
 
   HIP_API_ID_hipArray3DGetDescriptor = HIP_API_ID_NONE,
   HIP_API_ID_hipArrayGetDescriptor = HIP_API_ID_NONE,
@@ -411,7 +409,6 @@
     case HIP_API_ID_hipDrvMemcpy2DUnaligned: return "hipDrvMemcpy2DUnaligned";
     case HIP_API_ID_hipDrvMemcpy3D: return "hipDrvMemcpy3D";
     case HIP_API_ID_hipDrvMemcpy3DAsync: return "hipDrvMemcpy3DAsync";
-    case HIP_API_ID_hipDrvPointerGetAttributes: return "hipDrvPointerGetAttributes";
     case HIP_API_ID_hipEventCreate: return "hipEventCreate";
     case HIP_API_ID_hipEventCreateWithFlags: return "hipEventCreateWithFlags";
     case HIP_API_ID_hipEventDestroy: return "hipEventDestroy";
@@ -604,7 +601,6 @@
     case HIP_API_ID_hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags: return "hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags";
     case HIP_API_ID_hipOccupancyMaxPotentialBlockSize: return "hipOccupancyMaxPotentialBlockSize";
     case HIP_API_ID_hipPeekAtLastError: return "hipPeekAtLastError";
-    case HIP_API_ID_hipPointerGetAttribute: return "hipPointerGetAttribute";
     case HIP_API_ID_hipPointerGetAttributes: return "hipPointerGetAttributes";
     case HIP_API_ID_hipProfilerStart: return "hipProfilerStart";
     case HIP_API_ID_hipProfilerStop: return "hipProfilerStop";
@@ -711,7 +707,6 @@
   if (strcmp("hipDrvMemcpy2DUnaligned", name) == 0) return HIP_API_ID_hipDrvMemcpy2DUnaligned;
   if (strcmp("hipDrvMemcpy3D", name) == 0) return HIP_API_ID_hipDrvMemcpy3D;
   if (strcmp("hipDrvMemcpy3DAsync", name) == 0) return HIP_API_ID_hipDrvMemcpy3DAsync;
-  if (strcmp("hipDrvPointerGetAttributes", name) == 0) return HIP_API_ID_hipDrvPointerGetAttributes;
   if (strcmp("hipEventCreate", name) == 0) return HIP_API_ID_hipEventCreate;
   if (strcmp("hipEventCreateWithFlags", name) == 0) return HIP_API_ID_hipEventCreateWithFlags;
   if (strcmp("hipEventDestroy", name) == 0) return HIP_API_ID_hipEventDestroy;
@@ -904,7 +899,6 @@
   if (strcmp("hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags", name) == 0) return HIP_API_ID_hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags;
   if (strcmp("hipOccupancyMaxPotentialBlockSize", name) == 0) return HIP_API_ID_hipOccupancyMaxPotentialBlockSize;
   if (strcmp("hipPeekAtLastError", name) == 0) return HIP_API_ID_hipPeekAtLastError;
-  if (strcmp("hipPointerGetAttribute", name) == 0) return HIP_API_ID_hipPointerGetAttribute;
   if (strcmp("hipPointerGetAttributes", name) == 0) return HIP_API_ID_hipPointerGetAttributes;
   if (strcmp("hipProfilerStart", name) == 0) return HIP_API_ID_hipProfilerStart;
   if (strcmp("hipProfilerStop", name) == 0) return HIP_API_ID_hipProfilerStop;
@@ -1199,14 +1193,6 @@
       hipStream_t stream;
     } hipDrvMemcpy3DAsync;
     struct {
-      unsigned int numAttributes;
-      hipPointer_attribute* attributes;
-      hipPointer_attribute attributes__val;
-      void** data;
-      void* data__val;
-      hipDeviceptr_t ptr;
-    } hipDrvPointerGetAttributes;
-    struct {
       hipEvent_t* event;
       hipEvent_t event__val;
     } hipEventCreate;
@@ -2436,11 +2422,6 @@
       int blockSizeLimit;
     } hipOccupancyMaxPotentialBlockSize;
     struct {
-      void* data;
-      hipPointer_attribute attribute;
-      hipDeviceptr_t ptr;
-    } hipPointerGetAttribute;
-    struct {
       hipPointerAttribute_t* attributes;
       hipPointerAttribute_t attributes__val;
       const void* ptr;
@@ -2523,7 +2504,7 @@
       hipGraph_t* graph_out;
       hipGraph_t graph_out__val;
       const hipGraphNode_t** dependencies_out;
-      hipGraphNode_t* dependencies_out__val;
+      const hipGraphNode_t* dependencies_out__val;
       size_t* numDependencies_out;
       size_t numDependencies_out__val;
     } hipStreamGetCaptureInfo_v2;
@@ -2948,9 +2929,6 @@
   cb_data.args.hipDrvMemcpy3DAsync.pCopy = (const HIP_MEMCPY3D*)pCopy; \
   cb_data.args.hipDrvMemcpy3DAsync.stream = (hipStream_t)stream; \
 };
-// hipDrvPointerGetAttributes[('unsigned int', 'numAttributes'), ('hipPointer_attribute*', 'attributes'), ('void**', 'data'), ('hipDeviceptr_t', 'ptr')]
-#define INIT_hipDrvPointerGetAttributes_CB_ARGS_DATA(cb_data) { \
-};
 // hipEventCreate[('hipEvent_t*', 'event')]
 #define INIT_hipEventCreate_CB_ARGS_DATA(cb_data) { \
   cb_data.args.hipEventCreate.event = (hipEvent_t*)event; \
@@ -3143,11 +3121,6 @@
 };
 // hipGraphAddChildGraphNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('hipGraph_t', 'childGraph')]
 #define INIT_hipGraphAddChildGraphNode_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphAddChildGraphNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \
-  cb_data.args.hipGraphAddChildGraphNode.graph = (hipGraph_t)graph; \
-  cb_data.args.hipGraphAddChildGraphNode.pDependencies = (const hipGraphNode_t*)pDependencies; \
-  cb_data.args.hipGraphAddChildGraphNode.numDependencies = (size_t)numDependencies; \
-  cb_data.args.hipGraphAddChildGraphNode.childGraph = (hipGraph_t)childGraph; \
 };
 // hipGraphAddDependencies[('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'from'), ('const hipGraphNode_t*', 'to'), ('size_t', 'numDependencies')]
 #define INIT_hipGraphAddDependencies_CB_ARGS_DATA(cb_data) { \
@@ -3165,27 +3138,12 @@
 };
 // hipGraphAddEventRecordNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('hipEvent_t', 'event')]
 #define INIT_hipGraphAddEventRecordNode_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphAddEventRecordNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \
-  cb_data.args.hipGraphAddEventRecordNode.graph = (hipGraph_t)graph; \
-  cb_data.args.hipGraphAddEventRecordNode.pDependencies = (const hipGraphNode_t*)pDependencies; \
-  cb_data.args.hipGraphAddEventRecordNode.numDependencies = (size_t)numDependencies; \
-  cb_data.args.hipGraphAddEventRecordNode.event = (hipEvent_t)event; \
 };
 // hipGraphAddEventWaitNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('hipEvent_t', 'event')]
 #define INIT_hipGraphAddEventWaitNode_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphAddEventWaitNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \
-  cb_data.args.hipGraphAddEventWaitNode.graph = (hipGraph_t)graph; \
-  cb_data.args.hipGraphAddEventWaitNode.pDependencies = (const hipGraphNode_t*)pDependencies; \
-  cb_data.args.hipGraphAddEventWaitNode.numDependencies = (size_t)numDependencies; \
-  cb_data.args.hipGraphAddEventWaitNode.event = (hipEvent_t)event; \
 };
 // hipGraphAddHostNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const hipHostNodeParams*', 'pNodeParams')]
 #define INIT_hipGraphAddHostNode_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphAddHostNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \
-  cb_data.args.hipGraphAddHostNode.graph = (hipGraph_t)graph; \
-  cb_data.args.hipGraphAddHostNode.pDependencies = (const hipGraphNode_t*)pDependencies; \
-  cb_data.args.hipGraphAddHostNode.numDependencies = (size_t)numDependencies; \
-  cb_data.args.hipGraphAddHostNode.pNodeParams = (const hipHostNodeParams*)pNodeParams; \
 };
 // hipGraphAddKernelNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const hipKernelNodeParams*', 'pNodeParams')]
 #define INIT_hipGraphAddKernelNode_CB_ARGS_DATA(cb_data) { \
@@ -3216,27 +3174,9 @@
 };
 // hipGraphAddMemcpyNodeFromSymbol[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
 #define INIT_hipGraphAddMemcpyNodeFromSymbol_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphAddMemcpyNodeFromSymbol.pGraphNode = (hipGraphNode_t*)pGraphNode; \
-  cb_data.args.hipGraphAddMemcpyNodeFromSymbol.graph = (hipGraph_t)graph; \
-  cb_data.args.hipGraphAddMemcpyNodeFromSymbol.pDependencies = (const hipGraphNode_t*)pDependencies; \
-  cb_data.args.hipGraphAddMemcpyNodeFromSymbol.numDependencies = (size_t)numDependencies; \
-  cb_data.args.hipGraphAddMemcpyNodeFromSymbol.dst = (void*)dst; \
-  cb_data.args.hipGraphAddMemcpyNodeFromSymbol.symbol = (const void*)symbol; \
-  cb_data.args.hipGraphAddMemcpyNodeFromSymbol.count = (size_t)count; \
-  cb_data.args.hipGraphAddMemcpyNodeFromSymbol.offset = (size_t)offset; \
-  cb_data.args.hipGraphAddMemcpyNodeFromSymbol.kind = (hipMemcpyKind)kind; \
 };
 // hipGraphAddMemcpyNodeToSymbol[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
 #define INIT_hipGraphAddMemcpyNodeToSymbol_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphAddMemcpyNodeToSymbol.pGraphNode = (hipGraphNode_t*)pGraphNode; \
-  cb_data.args.hipGraphAddMemcpyNodeToSymbol.graph = (hipGraph_t)graph; \
-  cb_data.args.hipGraphAddMemcpyNodeToSymbol.pDependencies = (const hipGraphNode_t*)pDependencies; \
-  cb_data.args.hipGraphAddMemcpyNodeToSymbol.numDependencies = (size_t)numDependencies; \
-  cb_data.args.hipGraphAddMemcpyNodeToSymbol.symbol = (const void*)symbol; \
-  cb_data.args.hipGraphAddMemcpyNodeToSymbol.src = (const void*)src; \
-  cb_data.args.hipGraphAddMemcpyNodeToSymbol.count = (size_t)count; \
-  cb_data.args.hipGraphAddMemcpyNodeToSymbol.offset = (size_t)offset; \
-  cb_data.args.hipGraphAddMemcpyNodeToSymbol.kind = (hipMemcpyKind)kind; \
 };
 // hipGraphAddMemsetNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const hipMemsetParams*', 'pMemsetParams')]
 #define INIT_hipGraphAddMemsetNode_CB_ARGS_DATA(cb_data) { \
@@ -3248,13 +3188,9 @@
 };
 // hipGraphChildGraphNodeGetGraph[('hipGraphNode_t', 'node'), ('hipGraph_t*', 'pGraph')]
 #define INIT_hipGraphChildGraphNodeGetGraph_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphChildGraphNodeGetGraph.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphChildGraphNodeGetGraph.pGraph = (hipGraph_t*)pGraph; \
 };
 // hipGraphClone[('hipGraph_t*', 'pGraphClone'), ('hipGraph_t', 'originalGraph')]
 #define INIT_hipGraphClone_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphClone.pGraphClone = (hipGraph_t*)pGraphClone; \
-  cb_data.args.hipGraphClone.originalGraph = (hipGraph_t)originalGraph; \
 };
 // hipGraphCreate[('hipGraph_t*', 'pGraph'), ('unsigned int', 'flags')]
 #define INIT_hipGraphCreate_CB_ARGS_DATA(cb_data) { \
@@ -3267,33 +3203,21 @@
 };
 // hipGraphDestroyNode[('hipGraphNode_t', 'node')]
 #define INIT_hipGraphDestroyNode_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphDestroyNode.node = (hipGraphNode_t)node; \
 };
 // hipGraphEventRecordNodeGetEvent[('hipGraphNode_t', 'node'), ('hipEvent_t*', 'event_out')]
 #define INIT_hipGraphEventRecordNodeGetEvent_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphEventRecordNodeGetEvent.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphEventRecordNodeGetEvent.event_out = (hipEvent_t*)event_out; \
 };
 // hipGraphEventRecordNodeSetEvent[('hipGraphNode_t', 'node'), ('hipEvent_t', 'event')]
 #define INIT_hipGraphEventRecordNodeSetEvent_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphEventRecordNodeSetEvent.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphEventRecordNodeSetEvent.event = (hipEvent_t)event; \
 };
 // hipGraphEventWaitNodeGetEvent[('hipGraphNode_t', 'node'), ('hipEvent_t*', 'event_out')]
 #define INIT_hipGraphEventWaitNodeGetEvent_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphEventWaitNodeGetEvent.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphEventWaitNodeGetEvent.event_out = (hipEvent_t*)event_out; \
 };
 // hipGraphEventWaitNodeSetEvent[('hipGraphNode_t', 'node'), ('hipEvent_t', 'event')]
 #define INIT_hipGraphEventWaitNodeSetEvent_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphEventWaitNodeSetEvent.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphEventWaitNodeSetEvent.event = (hipEvent_t)event; \
 };
 // hipGraphExecChildGraphNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('hipGraph_t', 'childGraph')]
 #define INIT_hipGraphExecChildGraphNodeSetParams_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphExecChildGraphNodeSetParams.hGraphExec = (hipGraphExec_t)hGraphExec; \
-  cb_data.args.hipGraphExecChildGraphNodeSetParams.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphExecChildGraphNodeSetParams.childGraph = (hipGraph_t)childGraph; \
 };
 // hipGraphExecDestroy[('hipGraphExec_t', 'graphExec')]
 #define INIT_hipGraphExecDestroy_CB_ARGS_DATA(cb_data) { \
@@ -3301,21 +3225,12 @@
 };
 // hipGraphExecEventRecordNodeSetEvent[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'hNode'), ('hipEvent_t', 'event')]
 #define INIT_hipGraphExecEventRecordNodeSetEvent_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphExecEventRecordNodeSetEvent.hGraphExec = (hipGraphExec_t)hGraphExec; \
-  cb_data.args.hipGraphExecEventRecordNodeSetEvent.hNode = (hipGraphNode_t)hNode; \
-  cb_data.args.hipGraphExecEventRecordNodeSetEvent.event = (hipEvent_t)event; \
 };
 // hipGraphExecEventWaitNodeSetEvent[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'hNode'), ('hipEvent_t', 'event')]
 #define INIT_hipGraphExecEventWaitNodeSetEvent_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphExecEventWaitNodeSetEvent.hGraphExec = (hipGraphExec_t)hGraphExec; \
-  cb_data.args.hipGraphExecEventWaitNodeSetEvent.hNode = (hipGraphNode_t)hNode; \
-  cb_data.args.hipGraphExecEventWaitNodeSetEvent.event = (hipEvent_t)event; \
 };
 // hipGraphExecHostNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('const hipHostNodeParams*', 'pNodeParams')]
 #define INIT_hipGraphExecHostNodeSetParams_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphExecHostNodeSetParams.hGraphExec = (hipGraphExec_t)hGraphExec; \
-  cb_data.args.hipGraphExecHostNodeSetParams.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphExecHostNodeSetParams.pNodeParams = (const hipHostNodeParams*)pNodeParams; \
 };
 // hipGraphExecKernelNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('const hipKernelNodeParams*', 'pNodeParams')]
 #define INIT_hipGraphExecKernelNodeSetParams_CB_ARGS_DATA(cb_data) { \
@@ -3325,58 +3240,24 @@
 };
 // hipGraphExecMemcpyNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('hipMemcpy3DParms*', 'pNodeParams')]
 #define INIT_hipGraphExecMemcpyNodeSetParams_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphExecMemcpyNodeSetParams.hGraphExec = (hipGraphExec_t)hGraphExec; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParams.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParams.pNodeParams = (hipMemcpy3DParms*)pNodeParams; \
 };
 // hipGraphExecMemcpyNodeSetParams1D[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('void*', 'dst'), ('const void*', 'src'), ('size_t', 'count'), ('hipMemcpyKind', 'kind')]
 #define INIT_hipGraphExecMemcpyNodeSetParams1D_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphExecMemcpyNodeSetParams1D.hGraphExec = (hipGraphExec_t)hGraphExec; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParams1D.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParams1D.dst = (void*)dst; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParams1D.src = (const void*)src; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParams1D.count = (size_t)count; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParams1D.kind = (hipMemcpyKind)kind; \
 };
 // hipGraphExecMemcpyNodeSetParamsFromSymbol[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
 #define INIT_hipGraphExecMemcpyNodeSetParamsFromSymbol_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.hGraphExec = (hipGraphExec_t)hGraphExec; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.dst = (void*)dst; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.symbol = (const void*)symbol; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.count = (size_t)count; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.offset = (size_t)offset; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.kind = (hipMemcpyKind)kind; \
 };
 // hipGraphExecMemcpyNodeSetParamsToSymbol[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
 #define INIT_hipGraphExecMemcpyNodeSetParamsToSymbol_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.hGraphExec = (hipGraphExec_t)hGraphExec; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.symbol = (const void*)symbol; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.src = (const void*)src; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.count = (size_t)count; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.offset = (size_t)offset; \
-  cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.kind = (hipMemcpyKind)kind; \
 };
 // hipGraphExecMemsetNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('const hipMemsetParams*', 'pNodeParams')]
 #define INIT_hipGraphExecMemsetNodeSetParams_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphExecMemsetNodeSetParams.hGraphExec = (hipGraphExec_t)hGraphExec; \
-  cb_data.args.hipGraphExecMemsetNodeSetParams.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphExecMemsetNodeSetParams.pNodeParams = (const hipMemsetParams*)pNodeParams; \
 };
 // hipGraphExecUpdate[('hipGraphExec_t', 'hGraphExec'), ('hipGraph_t', 'hGraph'), ('hipGraphNode_t*', 'hErrorNode_out'), ('hipGraphExecUpdateResult*', 'updateResult_out')]
 #define INIT_hipGraphExecUpdate_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphExecUpdate.hGraphExec = (hipGraphExec_t)hGraphExec; \
-  cb_data.args.hipGraphExecUpdate.hGraph = (hipGraph_t)hGraph; \
-  cb_data.args.hipGraphExecUpdate.hErrorNode_out = (hipGraphNode_t*)hErrorNode_out; \
-  cb_data.args.hipGraphExecUpdate.updateResult_out = (hipGraphExecUpdateResult*)updateResult_out; \
 };
 // hipGraphGetEdges[('hipGraph_t', 'graph'), ('hipGraphNode_t*', 'from'), ('hipGraphNode_t*', 'to'), ('size_t*', 'numEdges')]
 #define INIT_hipGraphGetEdges_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphGetEdges.graph = (hipGraph_t)graph; \
-  cb_data.args.hipGraphGetEdges.from = (hipGraphNode_t*)from; \
-  cb_data.args.hipGraphGetEdges.to = (hipGraphNode_t*)to; \
-  cb_data.args.hipGraphGetEdges.numEdges = (size_t*)numEdges; \
 };
 // hipGraphGetNodes[('hipGraph_t', 'graph'), ('hipGraphNode_t*', 'nodes'), ('size_t*', 'numNodes')]
 #define INIT_hipGraphGetNodes_CB_ARGS_DATA(cb_data) { \
@@ -3392,13 +3273,9 @@
 };
 // hipGraphHostNodeGetParams[('hipGraphNode_t', 'node'), ('hipHostNodeParams*', 'pNodeParams')]
 #define INIT_hipGraphHostNodeGetParams_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphHostNodeGetParams.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphHostNodeGetParams.pNodeParams = (hipHostNodeParams*)pNodeParams; \
 };
 // hipGraphHostNodeSetParams[('hipGraphNode_t', 'node'), ('const hipHostNodeParams*', 'pNodeParams')]
 #define INIT_hipGraphHostNodeSetParams_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphHostNodeSetParams.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphHostNodeSetParams.pNodeParams = (const hipHostNodeParams*)pNodeParams; \
 };
 // hipGraphInstantiate[('hipGraphExec_t*', 'pGraphExec'), ('hipGraph_t', 'graph'), ('hipGraphNode_t*', 'pErrorNode'), ('char*', 'pLogBuffer'), ('size_t', 'bufferSize')]
 #define INIT_hipGraphInstantiate_CB_ARGS_DATA(cb_data) { \
@@ -3410,9 +3287,6 @@
 };
 // hipGraphInstantiateWithFlags[('hipGraphExec_t*', 'pGraphExec'), ('hipGraph_t', 'graph'), ('unsigned long long', 'flags')]
 #define INIT_hipGraphInstantiateWithFlags_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphInstantiateWithFlags.pGraphExec = (hipGraphExec_t*)pGraphExec; \
-  cb_data.args.hipGraphInstantiateWithFlags.graph = (hipGraph_t)graph; \
-  cb_data.args.hipGraphInstantiateWithFlags.flags = (unsigned long long)flags; \
 };
 // hipGraphKernelNodeGetParams[('hipGraphNode_t', 'node'), ('hipKernelNodeParams*', 'pNodeParams')]
 #define INIT_hipGraphKernelNodeGetParams_CB_ARGS_DATA(cb_data) { \
@@ -3441,29 +3315,12 @@
 };
 // hipGraphMemcpyNodeSetParams1D[('hipGraphNode_t', 'node'), ('void*', 'dst'), ('const void*', 'src'), ('size_t', 'count'), ('hipMemcpyKind', 'kind')]
 #define INIT_hipGraphMemcpyNodeSetParams1D_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphMemcpyNodeSetParams1D.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphMemcpyNodeSetParams1D.dst = (void*)dst; \
-  cb_data.args.hipGraphMemcpyNodeSetParams1D.src = (const void*)src; \
-  cb_data.args.hipGraphMemcpyNodeSetParams1D.count = (size_t)count; \
-  cb_data.args.hipGraphMemcpyNodeSetParams1D.kind = (hipMemcpyKind)kind; \
 };
 // hipGraphMemcpyNodeSetParamsFromSymbol[('hipGraphNode_t', 'node'), ('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
 #define INIT_hipGraphMemcpyNodeSetParamsFromSymbol_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphMemcpyNodeSetParamsFromSymbol.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphMemcpyNodeSetParamsFromSymbol.dst = (void*)dst; \
-  cb_data.args.hipGraphMemcpyNodeSetParamsFromSymbol.symbol = (const void*)symbol; \
-  cb_data.args.hipGraphMemcpyNodeSetParamsFromSymbol.count = (size_t)count; \
-  cb_data.args.hipGraphMemcpyNodeSetParamsFromSymbol.offset = (size_t)offset; \
-  cb_data.args.hipGraphMemcpyNodeSetParamsFromSymbol.kind = (hipMemcpyKind)kind; \
 };
 // hipGraphMemcpyNodeSetParamsToSymbol[('hipGraphNode_t', 'node'), ('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
 #define INIT_hipGraphMemcpyNodeSetParamsToSymbol_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphMemcpyNodeSetParamsToSymbol.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphMemcpyNodeSetParamsToSymbol.symbol = (const void*)symbol; \
-  cb_data.args.hipGraphMemcpyNodeSetParamsToSymbol.src = (const void*)src; \
-  cb_data.args.hipGraphMemcpyNodeSetParamsToSymbol.count = (size_t)count; \
-  cb_data.args.hipGraphMemcpyNodeSetParamsToSymbol.offset = (size_t)offset; \
-  cb_data.args.hipGraphMemcpyNodeSetParamsToSymbol.kind = (hipMemcpyKind)kind; \
 };
 // hipGraphMemsetNodeGetParams[('hipGraphNode_t', 'node'), ('hipMemsetParams*', 'pNodeParams')]
 #define INIT_hipGraphMemsetNodeGetParams_CB_ARGS_DATA(cb_data) { \
@@ -3477,33 +3334,18 @@
 };
 // hipGraphNodeFindInClone[('hipGraphNode_t*', 'pNode'), ('hipGraphNode_t', 'originalNode'), ('hipGraph_t', 'clonedGraph')]
 #define INIT_hipGraphNodeFindInClone_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphNodeFindInClone.pNode = (hipGraphNode_t*)pNode; \
-  cb_data.args.hipGraphNodeFindInClone.originalNode = (hipGraphNode_t)originalNode; \
-  cb_data.args.hipGraphNodeFindInClone.clonedGraph = (hipGraph_t)clonedGraph; \
 };
 // hipGraphNodeGetDependencies[('hipGraphNode_t', 'node'), ('hipGraphNode_t*', 'pDependencies'), ('size_t*', 'pNumDependencies')]
 #define INIT_hipGraphNodeGetDependencies_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphNodeGetDependencies.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphNodeGetDependencies.pDependencies = (hipGraphNode_t*)pDependencies; \
-  cb_data.args.hipGraphNodeGetDependencies.pNumDependencies = (size_t*)pNumDependencies; \
 };
 // hipGraphNodeGetDependentNodes[('hipGraphNode_t', 'node'), ('hipGraphNode_t*', 'pDependentNodes'), ('size_t*', 'pNumDependentNodes')]
 #define INIT_hipGraphNodeGetDependentNodes_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphNodeGetDependentNodes.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphNodeGetDependentNodes.pDependentNodes = (hipGraphNode_t*)pDependentNodes; \
-  cb_data.args.hipGraphNodeGetDependentNodes.pNumDependentNodes = (size_t*)pNumDependentNodes; \
 };
 // hipGraphNodeGetType[('hipGraphNode_t', 'node'), ('hipGraphNodeType*', 'pType')]
 #define INIT_hipGraphNodeGetType_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphNodeGetType.node = (hipGraphNode_t)node; \
-  cb_data.args.hipGraphNodeGetType.pType = (hipGraphNodeType*)pType; \
 };
 // hipGraphRemoveDependencies[('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'from'), ('const hipGraphNode_t*', 'to'), ('size_t', 'numDependencies')]
 #define INIT_hipGraphRemoveDependencies_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipGraphRemoveDependencies.graph = (hipGraph_t)graph; \
-  cb_data.args.hipGraphRemoveDependencies.from = (const hipGraphNode_t*)from; \
-  cb_data.args.hipGraphRemoveDependencies.to = (const hipGraphNode_t*)to; \
-  cb_data.args.hipGraphRemoveDependencies.numDependencies = (size_t)numDependencies; \
 };
 // hipGraphicsGLRegisterBuffer[('hipGraphicsResource**', 'resource'), ('GLuint', 'buffer'), ('unsigned int', 'flags')]
 #define INIT_hipGraphicsGLRegisterBuffer_CB_ARGS_DATA(cb_data) { \
@@ -4205,9 +4047,6 @@
 // hipPeekAtLastError[]
 #define INIT_hipPeekAtLastError_CB_ARGS_DATA(cb_data) { \
 };
-// hipPointerGetAttribute[('void*', 'data'), ('hipPointer_attribute', 'attribute'), ('hipDeviceptr_t', 'ptr')]
-#define INIT_hipPointerGetAttribute_CB_ARGS_DATA(cb_data) { \
-};
 // hipPointerGetAttributes[('hipPointerAttribute_t*', 'attributes'), ('const void*', 'ptr')]
 #define INIT_hipPointerGetAttributes_CB_ARGS_DATA(cb_data) { \
   cb_data.args.hipPointerGetAttributes.attributes = (hipPointerAttribute_t*)attributes; \
@@ -4289,18 +4128,9 @@
 };
 // hipStreamGetCaptureInfo[('hipStream_t', 'stream'), ('hipStreamCaptureStatus*', 'pCaptureStatus'), ('unsigned long long*', 'pId')]
 #define INIT_hipStreamGetCaptureInfo_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipStreamGetCaptureInfo.stream = (hipStream_t)stream; \
-  cb_data.args.hipStreamGetCaptureInfo.pCaptureStatus = (hipStreamCaptureStatus*)pCaptureStatus; \
-  cb_data.args.hipStreamGetCaptureInfo.pId = (unsigned long long*)pId; \
 };
 // hipStreamGetCaptureInfo_v2[('hipStream_t', 'stream'), ('hipStreamCaptureStatus*', 'captureStatus_out'), ('unsigned long long*', 'id_out'), ('hipGraph_t*', 'graph_out'), ('const hipGraphNode_t**', 'dependencies_out'), ('size_t*', 'numDependencies_out')]
 #define INIT_hipStreamGetCaptureInfo_v2_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipStreamGetCaptureInfo_v2.stream = (hipStream_t)stream; \
-  cb_data.args.hipStreamGetCaptureInfo_v2.captureStatus_out = (hipStreamCaptureStatus*)captureStatus_out; \
-  cb_data.args.hipStreamGetCaptureInfo_v2.id_out = (unsigned long long*)id_out; \
-  cb_data.args.hipStreamGetCaptureInfo_v2.graph_out = (hipGraph_t*)graph_out; \
-  cb_data.args.hipStreamGetCaptureInfo_v2.dependencies_out = (const hipGraphNode_t**)dependencies_out; \
-  cb_data.args.hipStreamGetCaptureInfo_v2.numDependencies_out = (size_t*)numDependencies_out; \
 };
 // hipStreamGetFlags[('hipStream_t', 'stream'), ('unsigned int*', 'flags')]
 #define INIT_hipStreamGetFlags_CB_ARGS_DATA(cb_data) { \
@@ -4314,8 +4144,6 @@
 };
 // hipStreamIsCapturing[('hipStream_t', 'stream'), ('hipStreamCaptureStatus*', 'pCaptureStatus')]
 #define INIT_hipStreamIsCapturing_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipStreamIsCapturing.stream = (hipStream_t)stream; \
-  cb_data.args.hipStreamIsCapturing.pCaptureStatus = (hipStreamCaptureStatus*)pCaptureStatus; \
 };
 // hipStreamQuery[('hipStream_t', 'stream')]
 #define INIT_hipStreamQuery_CB_ARGS_DATA(cb_data) { \
@@ -4327,10 +4155,6 @@
 };
 // hipStreamUpdateCaptureDependencies[('hipStream_t', 'stream'), ('hipGraphNode_t*', 'dependencies'), ('size_t', 'numDependencies'), ('unsigned int', 'flags')]
 #define INIT_hipStreamUpdateCaptureDependencies_CB_ARGS_DATA(cb_data) { \
-  cb_data.args.hipStreamUpdateCaptureDependencies.stream = (hipStream_t)stream; \
-  cb_data.args.hipStreamUpdateCaptureDependencies.dependencies = (hipGraphNode_t*)dependencies; \
-  cb_data.args.hipStreamUpdateCaptureDependencies.numDependencies = (size_t)numDependencies; \
-  cb_data.args.hipStreamUpdateCaptureDependencies.flags = (unsigned int)flags; \
 };
 // hipStreamWaitEvent[('hipStream_t', 'stream'), ('hipEvent_t', 'event'), ('unsigned int', 'flags')]
 #define INIT_hipStreamWaitEvent_CB_ARGS_DATA(cb_data) { \
@@ -4761,11 +4585,6 @@
     case HIP_API_ID_hipDrvMemcpy3DAsync:
       if (data->args.hipDrvMemcpy3DAsync.pCopy) data->args.hipDrvMemcpy3DAsync.pCopy__val = *(data->args.hipDrvMemcpy3DAsync.pCopy);
       break;
-// hipDrvPointerGetAttributes[('unsigned int', 'numAttributes'), ('hipPointer_attribute*', 'attributes'), ('void**', 'data'), ('hipDeviceptr_t', 'ptr')]
-    case HIP_API_ID_hipDrvPointerGetAttributes:
-      if (data->args.hipDrvPointerGetAttributes.attributes) data->args.hipDrvPointerGetAttributes.attributes__val = *(data->args.hipDrvPointerGetAttributes.attributes);
-      if (data->args.hipDrvPointerGetAttributes.data) data->args.hipDrvPointerGetAttributes.data__val = *(data->args.hipDrvPointerGetAttributes.data);
-      break;
 // hipEventCreate[('hipEvent_t*', 'event')]
     case HIP_API_ID_hipEventCreate:
       if (data->args.hipEventCreate.event) data->args.hipEventCreate.event__val = *(data->args.hipEventCreate.event);
@@ -5517,9 +5336,6 @@
 // hipPeekAtLastError[]
     case HIP_API_ID_hipPeekAtLastError:
       break;
-// hipPointerGetAttribute[('void*', 'data'), ('hipPointer_attribute', 'attribute'), ('hipDeviceptr_t', 'ptr')]
-    case HIP_API_ID_hipPointerGetAttribute:
-      break;
 // hipPointerGetAttributes[('hipPointerAttribute_t*', 'attributes'), ('const void*', 'ptr')]
     case HIP_API_ID_hipPointerGetAttributes:
       if (data->args.hipPointerGetAttributes.attributes) data->args.hipPointerGetAttributes.attributes__val = *(data->args.hipPointerGetAttributes.attributes);
@@ -6067,16 +5883,6 @@
       oss << ", stream=" << data->args.hipDrvMemcpy3DAsync.stream;
       oss << ")";
     break;
-    case HIP_API_ID_hipDrvPointerGetAttributes:
-      oss << "hipDrvPointerGetAttributes(";
-      oss << "numAttributes=" << data->args.hipDrvPointerGetAttributes.numAttributes;
-      if (data->args.hipDrvPointerGetAttributes.attributes == NULL) oss << ", attributes=NULL";
-      else oss << ", attributes=" << data->args.hipDrvPointerGetAttributes.attributes__val;
-      if (data->args.hipDrvPointerGetAttributes.data == NULL) oss << ", data=NULL";
-      else oss << ", data=" << data->args.hipDrvPointerGetAttributes.data__val;
-      oss << ", ptr=" << data->args.hipDrvPointerGetAttributes.ptr;
-      oss << ")";
-    break;
     case HIP_API_ID_hipEventCreate:
       oss << "hipEventCreate(";
       if (data->args.hipEventCreate.event == NULL) oss << "event=NULL";
@@ -7698,13 +7504,6 @@
       oss << "hipPeekAtLastError(";
       oss << ")";
     break;
-    case HIP_API_ID_hipPointerGetAttribute:
-      oss << "hipPointerGetAttribute(";
-      oss << "data=" << data->args.hipPointerGetAttribute.data;
-      oss << ", attribute=" << data->args.hipPointerGetAttribute.attribute;
-      oss << ", ptr=" << data->args.hipPointerGetAttribute.ptr;
-      oss << ")";
-    break;
     case HIP_API_ID_hipPointerGetAttributes:
       oss << "hipPointerGetAttributes(";
       if (data->args.hipPointerGetAttributes.attributes == NULL) oss << "attributes=NULL";

hipDeviceGetUuid is not compatible with VkPhysicalDeviceIDProperties::deviceUUID

As the title suggests, hipDeviceGetUuid returns a different UUID than VkPhysicalDeviceIDProperties::deviceUUID. These IDs are needed to relate devices from the two APIs to each other, so that the same device is chosen when doing interop.

It seems that in both the Mesa and AMDVLK drivers this ID is derived from the PCI bus address, so the two are compatible with each other (even if only by accident), while in HIP it is derived via ROCr from unique_id in /sys/devices/virtual/kfd/kfd/topology/nodes/*/properties, which is ultimately created from the device's serial number (at least on some architectures).

Possible bug in occupancy calculation

I've been porting the algorithm in ihipOccupancyMaxActiveBlocksPerMultiprocessor to Julia to use for our own occupancy calculations, and noticed a potential typo in:

https://github.com/ROCm-Developer-Tools/hipamd/blob/05e3016405bdaec0b8acfe8378df968dd02136d5/src/hip_platform.cpp#L351

I assume this should actually be if (wrkGrpInfo->usedVGPRs_ > 0) {, since we want to ensure we don't get a divide-by-zero when no VGPRs are used? Obviously such kernels probably don't actually need their occupancy calculated, but I figured it was worth reporting in case someone has run into this!

FAIL to build hipamd with pal lib. How to do that?

I ran into a problem: I'd like to build hipamd with the PAL library on Linux, but it fails. It seems to be missing libelf and HSAIL. Where can I get them?

CMake Error at /home/lumg/ROCm/ROCclr/cmake/FindAMD_HSA_LOADER.cmake:55 (add_subdirectory):
add_subdirectory given source "AMD_LIBELF_INCLUDE_DIR-NOTFOUND" which is
not an existing directory.
Call Stack (most recent call first):
/home/lumg/ROCm/ROCclr/cmake/ROCclrPAL.cmake:53 (find_package)
/home/lumg/ROCm/ROCclr/cmake/ROCclr.cmake:134 (include)
src/cmake/FindROCclr.cmake:51 (include)
src/CMakeLists.txt:60 (find_package)

CMake Error at /home/lumg/ROCm/ROCclr/cmake/FindAMD_HSA_LOADER.cmake:56 (add_subdirectory):
add_subdirectory given source
"AMD_HSAIL_INCLUDE_DIR-NOTFOUND/../ext/libamdhsacode" which is not an
existing directory.
Call Stack (most recent call first):
/home/lumg/ROCm/ROCclr/cmake/ROCclrPAL.cmake:53 (find_package)
/home/lumg/ROCm/ROCclr/cmake/ROCclr.cmake:134 (include)
src/cmake/FindROCclr.cmake:51 (include)
src/CMakeLists.txt:60 (find_package)

CMake Error at /home/lumg/ROCm/ROCclr/cmake/FindAMD_HSA_LOADER.cmake:57 (add_subdirectory):
add_subdirectory given source
"AMD_HSAIL_INCLUDE_DIR-NOTFOUND/../ext/loader" which is not an existing
directory.
Call Stack (most recent call first):
/home/lumg/ROCm/ROCclr/cmake/ROCclrPAL.cmake:53 (find_package)
/home/lumg/ROCm/ROCclr/cmake/ROCclr.cmake:134 (include)
src/cmake/FindROCclr.cmake:51 (include)
src/CMakeLists.txt:60 (find_package)

CMAKE_INSTALL_<dir> paths assumed to be relative

This causes problems for distributions which may use different paths.
See: NixOS/nixpkgs#197838 (comment)
I have also noticed this to be an issue across most ROCm packages (rocsparse, hipsparse, etc.), where the CMAKE_INSTALL_<dir> variables must be set manually on the package maintainer's side.

Suggested fix: Use CMAKE_INSTALL_FULL_<dir> for absolute paths instead.
I first noticed this in rocsparse, where the current method ends up creating <path>/rocsparse/<path>; in HIP it creates invalid symlinks. (On NixOS, package roots are in /nix/store/<package>, not /.)
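
A minimal sketch of the suggested fix (illustrative only, not the actual hipamd CMake code): with GNUInstallDirs, the `CMAKE_INSTALL_FULL_<dir>` variables are always absolute, regardless of whether the user passed a relative or absolute `CMAKE_INSTALL_<dir>`.

```cmake
include(GNUInstallDirs)

# CMAKE_INSTALL_LIBDIR may be "lib" (relative) or an absolute path such
# as "/nix/store/<package>/lib"; CMAKE_INSTALL_FULL_LIBDIR is always
# absolute.
message(STATUS "relative-or-absolute: ${CMAKE_INSTALL_LIBDIR}")
message(STATUS "always absolute:      ${CMAKE_INSTALL_FULL_LIBDIR}")

# Use the FULL variant when composing paths for symlinks or configured
# files, so distributions with non-/ prefixes get valid paths.
# (libfoo.so is a hypothetical library name.)
# file(CREATE_LINK "${CMAKE_INSTALL_FULL_LIBDIR}/libfoo.so.1"
#                  "${CMAKE_INSTALL_FULL_LIBDIR}/libfoo.so" SYMBOLIC)
```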

Debug symbols are not generated on Fedora

I tried doing packaging tests on Fedora, and debug symbols are no longer generated starting with 5.1.0.

I did a quick bisect and it looks like it's caused by d153574, but I suspect this only exposed an existing problem, not the cause itself.

ROCm 5.3 gfx1030 hang with hipStreamCreate and hipStreamDestroy

The following test hangs with ROCm 5.3 on the gfx1030 architecture (AMD Radeon PRO V620).

#include <hip/hip_runtime.h>

#include <cstdio>

int main()
{
    printf("starting..\n");

    hipStream_t stream;
    hipStreamCreate(&stream);
    hipStreamDestroy(stream);

    hipStream_t stream2;
    hipStreamCreateWithFlags(&stream2, hipStreamNonBlocking);
    hipStreamDestroy(stream2);

    printf("finished!\n");
}

Ran with

hipcc test.cpp
./a.out

Built and executed in Docker image rocm/rocm-terminal. hipconfig reports

HIP version  : 5.3.22061-e8e78f1a

== hipconfig
HIP_PATH     : /opt/rocm-5.3.0
ROCM_PATH    : /opt/rocm-5.3.0
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.3.0/include -I/opt/rocm-5.3.0/llvm/bin/../lib/clang/15.0.0 -I/opt/rocm-5.3.0/hsa/include

== hip-clang
HSA_PATH         : /opt/rocm-5.3.0/hsa
HIP_CLANG_PATH   : /opt/rocm-5.3.0/llvm/bin
AMD clang version 15.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.3.0 22362 3cf23f77f8208174a2ee7c616f4be23674d7b081)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.3.0/llvm/bin
AMD LLVM version 15.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver3

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
 -std=c++11 -isystem "/opt/rocm-5.3.0/llvm/lib/clang/15.0.0/include/.." -isystem /opt/rocm-5.3.0/hsa/include -isystem "/opt/rocm-5.3.0/include" -O3
 -L"/opt/rocm-5.3.0/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt

=== Environment Variables
PATH=/home/rocm-user/.vscode-server/bin/129500ee4c8ab7263461ffe327268ba56b9f210d/bin/remote-cli:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/rocm/bin

== Linux Kernel
Hostname     : fb5ed677a12b
Linux fb5ed677a12b 5.4.0-125-generic #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

The test works with ROCm 5.2 and with ROCm 5.3 on other architectures.

Does libamdhip really depend on libamd_comgr?

I'm maintaining the hip package in Gentoo, and I found that although the CMake file https://github.com/ROCm-Developer-Tools/ROCclr/blob/develop/cmake/ROCclrLC.cmake specifies that librocclr.a calls functions in libamd_comgr.so, when it comes to the final link of libamdhip64.so, the linker's --as-needed flag drops the link to libamd_comgr.so, meaning no functions in libamd_comgr.so are actually called. I checked that -DUSE_COMGR_LIBRARY is used during compilation.

So does libamdhip really need libamd_comgr, or does building HIP just use the comgr headers?

`#define __local` clashes with libcxx templates

There is a global #define __local in device_library_decls.h. This symbol is already used by some libcxx templates. Building (and depending on) libcxx with the (soon to be default) _LIBCPP_REMOVE_TRANSITIVE_INCLUDES macro causes these templates to break, as the define leaks into them:

In file included from hip_example/example.cpp:5:
In file included from external/rules_ll~override~rules_ll_dependencies~hip/include/hip/hip_runtime.h:62:
In file included from external/rules_ll~override~rules_ll_dependencies~hipamd/include/hip/amd_detail/amd_hip_runtime.h:380:
In file included from external/rules_ll~override~rules_ll_dependencies~hipamd/include/hip/amd_detail/amd_math_functions.h:32:
In file included from external/llvm-project-overlay~17-init-bcr.2~llvm_project_overlay~llvm-project/libcxx/include/algorithm:1732:
external/llvm-project-overlay~17-init-bcr.2~llvm_project_overlay~llvm-project/libcxx/include/__algorithm/copy_backward.h:72:58: error: expected unqualified-id
    __result = std::__copy_backward<_AlgPolicy>(_Traits::__local(__first), _Traits::__end(__slast), std::move(__result))
                                                         ^
external/rules_ll~override~rules_ll_dependencies~hipamd/include/hip/amd_detail/device_library_decls.h:121:17: note: expanded from macro '__local'
#define __local __attribute__((address_space(3)))
                ^

https://github.com/ROCm-Developer-Tools/hipamd/blob/4209792929ddf54ba9530813b7879cfdee42df14/include/hip/amd_detail/device_library_decls.h#L120-L125

An ad-hoc fix is to rename the define to something like __device_local or to write out the attribute. However, I'm not sure whether the __local macro is used in the wider ROCm ecosystem.
