Comments (7)
@tangrc99 this expected as the implementation of ibv_reg_mr in the Linux kernel requires the virtual address range to be backed by CPU memory pages.
More exactly, pin_user_pages does not work on CPU mappings of PCIe resources created via io_remap_pfn_range.
The official way of enabling RDMA on GPU memory is:
- querying a dma-buf file descriptor from a GPU device pointer (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g51e719462c04ee90a6b0f8b2a75fe031) and
- passing that to ibv_reg_dmabuf_mr.
For a full deployment case, see for example https://github.com/openucx/ucx/blob/1308d2055ab0ba948eac213c8cfcd92776c34a53/src/uct/cuda/cuda_copy/cuda_copy_md.c#L410 and https://github.com/openucx/ucx/blob/1308d2055ab0ba948eac213c8cfcd92776c34a53/src/uct/ib/base/ib_md.c#L480.
from gdrcopy.
Thanks, cause A10 don't support dma-buf file descriptor. Can I use GDR on A10 with other methods ?
from gdrcopy.
It should. Are you using the openrm variant of the GPU kernel-mode driver, see https://developer.nvidia.com/blog/nvidia-releases-open-source-gpu-kernel-modules/ ?
from gdrcopy.
function cuMemGetHandleForAddressRange
requires CU_MEM_RANGE_HANDLE_TYPE_DMA_BUF_FD
which is 0 on A10. nv_peer_mem
and nvidia-peermem
is already loaded, is there any other requirements ?
from gdrcopy.
Hi @tangrc99,
Neither nvidia-peermem
nor nv_peer_mem
involves in dmabuf. A10 should support dmabuf. Could you check if your SW stack is new enough to support dmabuf?
- NVIDIA driver with the open variant version 515 or later.
- CUDA 11.7 or later.
- Linux kernel version 5.12 or later. This is for the NIC stack. The GPU stack does not have this requirement.
from gdrcopy.
Thanks, My Linux kernel 4.18.0 is too old.
from gdrcopy.
In that case you can use the legacy RDMA memory registration path, i.e. ibv_reg_mr
, which involves the peer-direct kernel infrastructure (for example provided by MLNX_OFED) and nvidia-peermem
.
from gdrcopy.
Related Issues (20)
- Facing issue when installing HOT 1
- Ubuntu 22 - dpkg: error processing package gdrdrv-dkms:amd64 (--install) during installation of gdrcopy HOT 3
- Why D2H is relatively slower? HOT 2
- Query: Confusion about sudo requirement HOT 3
- thinking about working with CUDA async API
- gdrcopy_sanity failed when GPU Compute Mode is set to EXCLUSIVE HOT 1
- Unable to compile GDRCOPY v2.4 HOT 2
- Minimal steps to install gdrdrv driver only please HOT 6
- Fail to access mapped memory from CPU side(Fail data_validation tests) HOT 14
- tests build failing when check.h is not available HOT 1
- How to understand the file "nv-p2p-dummpy.c" HOT 3
- Driver flavor detection fails for 545 series HOT 2
- bad performance(compare with cuMemcpy) on x86 system HOT 3
- GDRCopy 2.4 on Centos7 failing build of RPM packages HOT 2
- Increasing utilization - gdrcopy_copybw HOT 3
- Improve the error report of gdrcopy_pplat when the CUDA kernel cannot be launched
- Safe Mounting of /dev/gdrdrv in a kubernetes environment - HostPath appears to fail HOT 12
- How to effectively test if gdrcopy is enabled using Real world ML workload ? HOT 2
- Can't make with Intel Compiler HOT 4
- MAINT: gdr_unmap segfault on master branch via NVSHMEM 2.10.1 on Cray Slingshot 11 with cuFFTMp HOT 22
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from gdrcopy.