Giter Club home page Giter Club logo

Comments (6)

paboyle avatar paboyle commented on August 20, 2024

Feedback from NVIDIA:

They said:

"The documentation for cuPointerGetAttribute is here

https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__UNIFIED.html#group__CUDA__UNIFIED_1g0c28ed0aff848042bc0533110e45820c

When testing for the CU_POINTER_ATTRIBUTE_MEMORY_TYPE, the possible values for data are CU_MEMORYTYPE_HOST and CU_MEMORYTYPE_DEVICE only. The other two options are never returned by this call.

A different call to cuPointerGetAttribute is needed to determine if a pointer points to managed memory. Testing for the CU_POINTER_ATTRIBUTE_IS_MANAGED, sets data to either 0 or 1. "

Thus, I think they say it looks like psm_user.h _psmi_is_cuda_mem is being used to indicate the CUDA IPC can be used, but is false triggering when the memory is managed, according to the CUDA documentation.

However, I disagree with them:

I have read the documentation, and it is actually unclear what the behaviour is defined as under unified memory pointers.

In fact it seems to miss a few words to say what pointer type is being talked about in places.

"Returns in *data the physical memory type of the memory that ptr addresses as a CUmemorytype enumerated value."

Can reasonably be interpreted that the Unified memory type should return CU_MEMORYTYPE_UNIFIED, on account of it being unified memory type.

It seems very very odd to have CU_MEMORYTYPE_UNIFIED and a query function for MEMORY_TYPE that is unable to return some of the values.

Whatever happens: the combination of CUDA and PSM2 has a bug until this is fixed and the usage agreed upon.

from opa-psm2.

paboyle avatar paboyle commented on August 20, 2024

Following appears to fix the issue (psm_user.h).

PSMI_ALWAYS_INLINE(
int
_psmi_is_cuda_mem(void *ptr))
{
        CUresult cres;
        CUmemorytype mt;
        unsigned uvm;

        cres = psmi_cuPointerGetAttribute(&mt, CU_POINTER_ATTRIBUTE_MEMORY_TYPE, (CUdeviceptr) ptr);

        if ((cres == CUDA_SUCCESS) && (mt == CU_MEMORYTYPE_DEVICE)) {

          cres = psmi_cuPointerGetAttribute(&uvm, CU_POINTER_ATTRIBUTE_IS_MANAGED, (CUdeviceptr) ptr);

          if ((cres==CUDA_SUCCESS) && (uvm==0) ) return 1;
          else                                   return 0;

        } else {

          return 0;

        }
}

Personally, I think cuPointerGetAttribute looks to be in the wrong, though

from opa-psm2.

paboyle avatar paboyle commented on August 20, 2024

The cuda 10.1 execution of the code was run on a system with the cuda 9.2 kernel driver. it is possible (Tim Lanfear produced same output with cuda 9.2 and 10.1) that the 10.1 behaviour will match 9.2 once the kernel driver is updated.

This is on a centrally run supercomputer, so I can't update kernel drivers to check.

from opa-psm2.

paboyle avatar paboyle commented on August 20, 2024

Received the following from Nvidia, and they confirm this is their defined behaviour of cuPointerGetAttribute. Perhaps even worse, I think this means the returned class will vary page by page with UVM according to location:

The behaviour of cuPointerGetAttribute() with the CU_POINTER_ATTRIBUTE_MEMORY_TYPE attribute is that it will return the physical memory type of the memory that the pointer addresses. With a pointer to managed memory, at any given time this could be host memory or device memory depending on where the data is mapped at that time. Putting it another way, ptr is a virtual address backed by some physical storage, and CU_POINTER_ATTRIBUTE_MEMORY_TYPE allows you to find where the pointer is at this moment in time.

While the documentation is supposed to make it clear that only these two values can be returned (if cuPointerGetAttribute() is successful), I do think it could be improved, especially since reusing the CUmemorytype enum might suggest otherwise.

I'm not familiar with PSM2, but from the context I gather it is going to use cudaIpcGetMemHandle() (or driver API equivalent) and then pass the returned pointer to another process. Unfortunately, the IPC API is not supported for managed memory allocations. That's because in the source process the CUDA Runtime is able to intercept page faults on the allocation and handle them, but the receiving process does not know that the pointer is managed by the CUDA driver and so tries to use the normal kernel page fault handler, which doesn't know about the GPU (*).

Without knowing the rest of the code, I'm not sure if your proposed fix would work – it depends what PSM2 does if it determines that the pointer is "not a CUDA pointer."

It may also be helpful to clarify the difference between "unified" and "managed" memory. Unified Memory is the ability for a virtual memory allocation to be physically located in either device or host memory, and for page faults from CPU or GPU to be handled to access the correct memory or migrate the page as appropriate (e.g. based on heuristics). Managed Memory is unified memory that is managed by the CUDA driver rather than the Linux kernel.

(*) As you may know, we have been working with the Linux community on the Heterogeneous Memory Manager (HMM) in the Linux kernel. HMM allows the Linux kernel to understand about pointers in non-conventional memory (e.g. GPU memory) which means that we no longer need "managed memory" since the memory can be handled by the kernel instead of by the CUDA driver. With HMM, you would be able to use malloc() instead of cudaMallocManaged() and you should be able to pass the pointer via IPC and have it behave correctly.

from opa-psm2.

mwheinz avatar mwheinz commented on August 20, 2024

Reviewing old issues - it appears that Adam submitted a pair of patches for this back in 2019.

@paboyle - I know it's been a ridiculously long time but do you know if this was fixed in more recent IFS releases or is this still a problem for you?

from opa-psm2.

mwheinz avatar mwheinz commented on August 20, 2024

Pete has agreed that PSM2 was patched to correct the issue.

from opa-psm2.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.