
gdrcopy's Issues

a question about CPU mappings

Hi,
I use the same APIs to create CPU mappings of two GPU memory buffers, like this:

// ------ dev_b pin buffer -------------------------------------------------
unsigned int flag_b;
cuPointerSetAttribute(&flag_b, CU_POINTER_ATTRIBUTE_SYNC_MEMOPS, dev_b);
gdr_mh_t mh_b;
gdr_t g_b = gdr_open();
ASSERT_NEQ(g_b, (void *)0);
gdr_pin_buffer(g_b, dev_b, sizeof(int), 0, 0, &mh_b);
void *bar_ptr_b = NULL;
ASSERT_EQ(gdr_map(g_b, mh_b, &bar_ptr_b, sizeof(int)), 0);
gdr_info_t info_b;
gdr_get_info(g_b, mh_b, &info_b);
int off_b = dev_b - info_b.va;
cout << "off_b:" << off_b << endl;
uint32_t *buf_ptr_b = (uint32_t *)((char *)bar_ptr_b + off_b);
cout << "buf_ptr_b:" << buf_ptr_b << endl;
// --------------------------------------------------------------------------

// ------ dev_a pin buffer -------------------------------------------------
unsigned int flag;
cuPointerSetAttribute(&flag, CU_POINTER_ATTRIBUTE_SYNC_MEMOPS, dev_a);
gdr_mh_t mh;
gdr_t g = gdr_open();
ASSERT_NEQ(g, (void *)0);
gdr_pin_buffer(g, dev_a, N * sizeof(int), 0, 0, &mh);
void *bar_ptr = NULL;
ASSERT_EQ(gdr_map(g, mh, &bar_ptr, N * sizeof(int)), 0);
gdr_info_t info;
gdr_get_info(g, mh, &info);
int off = dev_a - info.va;
cout << "off_a:" << off << endl;
uint32_t *buf_ptr = (uint32_t *)((char *)bar_ptr + off);
cout << "buf_ptr:" << buf_ptr << endl;

But it failed, and I found that it is related to the order of a and b: whichever buffer is pinned and mapped first succeeds, and the subsequent one fails.
Do you have any idea what could be happening here?

Thanks,

kernel crash in gdrdrv_mmap for small size

[ 2260.994632] gdrdrv:minor=0
[ 2260.994639] gdrdrv:ioctl called (cmd 0xc020da01)
[ 2260.994641] gdrdrv:invoking nvidia_p2p_get_pages(va=0x10916200000 len=4096 p2p_tok=0 va_tok=0)
[ 2260.995112] gdrdrv:page table entries: 1
[ 2260.995113] gdrdrv:page[0]=0x0000383800200000
[ 2260.995116] gdrdrv:ioctl called (cmd 0xc008da04)
[ 2260.995120] gdrdrv:mmap start=0x7f20ae059000 size=4096 off=0x455f5790
[ 2260.995121] gdrdrv:offset=0 len=65536 vaddr+offset=7f20ae059000 paddr+offset=383800200000
[ 2260.995122] gdrdrv:mmaping phys mem addr=0x383800200000 size=65536 at user virt addr=0x7f20ae059000
[ 2260.995123] gdrdrv:pfn=0x383800200
[ 2260.995124] gdrdrv:calling io_remap_pfn_range() vma=ffff883f28c33a90 vaddr=7f20ae059000 pfn=383800200 size=65536
[ 2260.995163] ------------[ cut here ]------------
[ 2260.995182] kernel BUG at /build/linux-lts-xenial-80t3lB/linux-lts-xenial-4.4.0/mm/memory.c:1674!
[ 2260.995204] invalid opcode: 0000 [#1] SMP
...
[ 2260.995861] [] gdrdrv_mmap_phys_mem_wcomb+0x71/0x130 [gdrdrv]
[ 2260.995879] [] gdrdrv_mmap+0x156/0x2e0 [gdrdrv]
[ 2260.995896] [] ? kmem_cache_alloc+0x1e2/0x200
[ 2260.995911] [] mmap_region+0x3f4/0x610
[ 2260.995926] [] do_mmap+0x2fc/0x3d0
[ 2260.995940] [] vm_mmap_pgoff+0x91/0xc0
[ 2260.995954] [] SyS_mmap_pgoff+0x197/0x260
[ 2260.995970] [] SyS_mmap+0x22/0x30

Fails in ioctl call in gdr_pin_buffer. Perhaps the GDRDRV_IOC_PIN_BUFFER flags are incorrect.

-bash-4.2$ ./validate
buffer size: 327680
device ptr: 7fffa0600000
gdr open: 0xc9abf0
before ioctl GDRDRV IOC PIN BUFFER c020da01
After ioctl retcode -1
-bash-4.2$

-bash-4.2$ ./copybw
GPU id:0 name:Tesla V100-SXM2-32GB PCI domain: 0 bus: 26 device: 0
GPU id:1 name:Tesla V100-SXM2-32GB PCI domain: 0 bus: 28 device: 0
GPU id:2 name:Tesla V100-SXM2-32GB PCI domain: 0 bus: 136 device: 0
GPU id:3 name:Tesla V100-SXM2-32GB PCI domain: 0 bus: 138 device: 0
selecting device 0
testing size: 131072
rounded size: 131072
device ptr: 7fffa0600000
before ioctl GDRDRV IOC PIN BUFFER c020da01
After ioctl size -1
closing gdrdrv
-bash-4.2$

Version mismatch between modinfo gdrdrv and dpkg -l gdrdrv-dkms

Code from the master branch (bf4848f).

$ dpkg -l gdrdrv-dkms
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                           Version                      Architecture                 Description
+++-==============================================-============================-============================-==================================================================================================
ii  gdrdrv-dkms:amd64                              2.0                          amd64                        gdrdrv driver in DKMS format.
$ modinfo gdrdrv
filename:       /lib/modules/4.15.0-58-generic/updates/dkms/gdrdrv.ko
version:        1.1
description:    GDRCopy kernel-mode driver
license:        MIT
author:         [email protected]
srcversion:     D5FB5F3108420043522DCAC
depends:        nv-p2p-dummy
retpoline:      Y
name:           gdrdrv
vermagic:       4.15.0-58-generic SMP mod_unload 
parm:           dbg_enabled:enable debug tracing (int)
parm:           info_enabled:enable info tracing (int)

build failure on power PC

devendar@ibm-p9-013 gdrcopy (git::devel)$ make
echo "GDRAPI_ARCH=POWER"
GDRAPI_ARCH=POWER
make: Warning: File `libgdrapi.so.1.2' has modification time 22 s in the future
cc -O2 -fPIC -I /usr/local/cuda/include -I gdrdrv/ -I /usr/local/cuda/include -D GDRAPI_ARCH=POWER  -c -o gdrapi.o gdrapi.c
cc -shared -Wl,-soname,libgdrapi.so.1 -o libgdrapi.so.1.2 gdrapi.o
ldconfig -n /labhome/devendar/gdrcopy
ln -sf libgdrapi.so.1.2 libgdrapi.so.1
ln -sf libgdrapi.so.1 libgdrapi.so
cd gdrdrv; \
make
make[1]: Entering directory `/labhome/devendar/gdrcopy/gdrdrv'
Picking NVIDIA driver sources from NVIDIA_SRC_DIR=/usr/src/nvidia-387.26/nvidia. If that does not meet your expectation, you might have a stale driver still around and that might cause problems.
make[2]: Entering directory `/usr/src/kernels/4.11.0-44.el7a.ppc64le'
make[3]: Warning: File `/labhome/devendar/gdrcopy/gdrdrv/modules.order' has modification time 22 s in the future
make[3]: warning:  Clock skew detected.  Your build may be incomplete.
  Building modules, stage 2.
  MODPOST 2 modules
make[2]: Leaving directory `/usr/src/kernels/4.11.0-44.el7a.ppc64le'
make[1]: Leaving directory `/labhome/devendar/gdrcopy/gdrdrv'
g++ -O2 -I /usr/local/cuda/include -I gdrdrv/ -I /usr/local/cuda/include -D GDRAPI_ARCH=POWER -L /usr/local/cuda/lib64 -L /usr/local/cuda/lib -L /usr/lib64/nvidia -L /usr/lib/nvidia -L /usr/local/cuda/lib64   -o basic basic.o libgdrapi.so.1.2 -lcudart -lcuda -lpthread -ldl
libgdrapi.so.1.2: undefined reference to `_mm_sfence'
collect2: error: ld returned 1 exit status
make: *** [basic] Error 1

cudaMalloc can no longer guarantee to return a 64kB-aligned address

GDRDRV needs 64kB aligned addresses.

gdrdrv_pin_buffer() {
...
    page_virt_start  = params.addr & GPU_PAGE_MASK;
    page_virt_end    = params.addr + params.size - 1;
    rounded_size     = page_virt_end - page_virt_start + 1;
    mr->offset       = params.addr & GPU_PAGE_OFFSET;
...
}

and

gdrdrv_mmap() {
...
    if (mr->offset) {
        gdr_dbg("offset != 0 is not supported\n");
        ret = -EINVAL;
        goto out;
    }
...
}

This is no longer guaranteed by cudaMalloc in recent CUDA drivers (since 410). A temporary workaround (WAR) could be, at the application level, to allocate with cudaMalloc a memory area of size + GPU_PAGE_SIZE and then search for the first 64kB-aligned address. Something like:

alloc_size = buffer_size + GPU_PAGE_SIZE; // pad by one full GPU page; rounding the sum down with GPU_PAGE_MASK could under-allocate
cuMemAlloc(&dev_addr, alloc_size);
if (dev_addr % GPU_PAGE_SIZE) {
    dev_addr += GPU_PAGE_SIZE - (dev_addr % GPU_PAGE_SIZE);
}

gdrcopy configuration for use with UCX

Hello,
I'm trying to build gdrcopy correctly in order to build UCX. Following the website instructions, the installation seems to work fine:

sudo make PREFIX=/usr/local/gdrcopy CUDA=/usr/local/cuda-10.1
echo "GDRAPI_ARCH=X86"
GDRAPI_ARCH=X86
cd gdrdrv;
make
make[1]: Entering directory `/home/centos/gdrcopy/gdrdrv'
Picking NVIDIA driver sources from NVIDIA_SRC_DIR=/usr/src/nvidia-418.67/nvidia. If that does not meet your expectation, you might have a stale driver still around and that might cause problems.
make[2]: Entering directory `/usr/src/kernels/3.10.0-957.5.1.el7.x86_64'
Building modules, stage 2.
MODPOST 2 modules
make[2]: Leaving directory `/usr/src/kernels/3.10.0-957.5.1.el7.x86_64'
make[1]: Leaving directory `/home/centos/gdrcopy/gdrdrv'

sudo ./insmod.sh
INFO: driver major is 240
INFO: creating /dev/gdrdrv inode

The validation codes yield:
./validate
buffer size: 327680
off: 0
check 1: MMIO CPU initialization + read back via cuMemcpy D->H
check 2: gdr_copy_to_bar() + read back via cuMemcpy D->H
check 3: gdr_copy_to_bar() + read back via gdr_copy_from_bar()
check 4: gdr_copy_to_bar() + read back via gdr_copy_from_bar() + 5 dwords offset
check 5: gdr_copy_to_bar() + read back via gdr_copy_from_bar() + 11 bytes offset
warning: buffer size -325939184 is not dword aligned, ignoring trailing bytes
unampping
unpinning

./copybw
GPU id:0 name:Tesla M60 PCI domain: 0 bus: 0 device: 30
selecting device 0
testing size: 131072
rounded size: 131072
device ptr: b04720000
bar_ptr: 0x7f670a353000
info.va: b04720000
info.mapped_size: 131072
info.page_size: 65536
page offset: 0
user-space pointer:0x7f670a353000
writing test, size=131072 offset=0 num_iters=10000
write BW: 9585.88MB/s
reading test, size=131072 offset=0 num_iters=100
read BW: 529.436MB/s
unmapping buffer
unpinning buffer
closing gdrdrv

However, I don't see any file in the subdirectory /usr/local/gdrcopy and, when I try to configure and build UCX (1.5.2), I get the error message: configure: error: gdrcopy support is requested but gdrcopy packages can't found

Thank you.

Issues with gdr driver

Hello,
I'm running into some issues while trying to use gdrcopy in an MPI environment. I have CUDA 10.1 (418.67) and the error reads:
GDRCOPY library "libgdrapi.so" unable to open GDR driver, is gdrdrv.ko loaded?
I'm new to gdrcopy and don't really know what this means. After installing gdrcopy, I performed the suggested validations that read OK to me:

 ./validate
buffer size: 327680
off: 0
check 1: MMIO CPU initialization + read back via cuMemcpy D->H
check 2: gdr_copy_to_bar() + read back via cuMemcpy D->H
check 3: gdr_copy_to_bar() + read back via gdr_copy_from_bar()
check 4: gdr_copy_to_bar() + read back via gdr_copy_from_bar() + 5 dwords offset
check 5: gdr_copy_to_bar() + read back via gdr_copy_from_bar() + 11 bytes offset
warning: buffer size 1763323920 is not dword aligned, ignoring trailing bytes
unampping
unpinning
 ./copybw
GPU id:0 name:Tesla K80 PCI domain: 0 bus: 0 device: 4
selecting device 0
testing size: 131072
rounded size: 131072
device ptr: 403960000
bar_ptr: 0x7f0395d5c000
info.va: 403960000
info.mapped_size: 131072
info.page_size: 65536
page offset: 0
user-space pointer:0x7f0395d5c000
writing test, size=131072 offset=0 num_iters=10000
write BW: 9437.68MB/s
reading test, size=131072 offset=0 num_iters=100
read BW: 356.296MB/s
unmapping buffer
unpinning buffer
closing gdrdrv

Any suggestions on how to proceed, or what I am missing? Thanks.

Failed to make install

I get this exception:

ln -sf libgdrapi.so.1.2 libgdrapi.so.1
ln -sf libgdrapi.so.1 libgdrapi.so
cd gdrdrv;
/usr/bin/make64
make64[1]: Entering directory `/home/users/tangwei12/gdrcopy-master/gdrdrv'
Picking NVIDIA driver sources from NVIDIA_SRC_DIR=/usr/src/nvidia-linux-x86_64-390.12/kernel/nvidia. If that does not meet your expectation, you might have a stale driver still around and that might cause problems.
make64[2]: Entering directory `/home/users/tangwei12/linux-4-14'

WARNING: Symbol version dump ./Module.symvers
is missing; modules will have no dependencies and modversions.

CC [M] /home/users/tangwei12/gdrcopy-master/gdrdrv/nv-p2p-dummy.o
CC [M] /home/users/tangwei12/gdrcopy-master/gdrdrv/gdrdrv.o
Building modules, stage 2.
MODPOST 2 modules
FATAL: /home/users/tangwei12/gdrcopy-master/gdrdrv/gdrdrv.o is truncated. sechdrs[i].sh_offset=7089075323386670592 > sizeof(*hrd)=64
make64[3]: *** [__modpost] Error 1
make64[2]: *** [modules] Error 2
make64[2]: Leaving directory `/home/users/tangwei12/linux-4-14'
make64[1]: *** [module] Error 2
make64[1]: Leaving directory `/home/users/tangwei12/gdrcopy-master/gdrdrv'
make64: *** [driver] Error 2

OS: CentOS 6.3 (4.14.18)
CUDA: 9
Driver Version: 390.12

Error when building gdrcopy deb package

I'm seeing the following error when building the gdrcopy deb package:

> ./build-deb-packages.sh
...
> dpkg-shlibdeps: error: no dependency information found for /usr/lib/x86_64-linux-gnu/libcuda.so.1 (used by debian/gdrcopy/usr/bin/sanity)

Which, I suppose, is due to installing the driver from a downloaded *.run package.

Error can be suppressed by adding rule:

override_dh_shlibdeps:
        dh_shlibdeps --dpkg-shlibdeps-params=--ignore-missing-info

to packages/debian/rules, but I'm not sure whether this is the right way to maintain it.

gdrcopy-devel RPM won't install

I have built the gdrcopy RPMS from the build_packages.sh script in the source tree and I've found that I'm unable to install the gdrcopy-devel package because it is missing a required dependency.

Error: Package: gdrcopy-devel-1.3-2.x86_64 (/gdrcopy-devel-1.3-2.x86_64)
Requires: libgdrapi.so.1()(64bit)
The file listed was installed by the gdrcopy RPM, but that library didn't get listed as being provided by the RPM.
If this works for other users then I may have messed something up in my environment, but if not it is probably a bug in the spec file that should get fixed.
Either way I'm willing to put some work into figuring it out, but I wanted to know which side of the problem to focus on.

slow write BW observed beyond 64KB size

This has been reported by Mark Silberstein [email protected]

We finally pinpointed the setup, and it's easily reproducible.

  1. Get the CPU ptr for the buffer in the mapped BAR
  2. Sequentially pread from file into that buffer in blocks >=64K.

As long as blocks are less than 64K, we get ~1GB/s. For blocks >= 64K we get around 13MB/s

add a producer-consumer benchmark

strawman design:

  • allocate device memory buffer B
  • launch CUDA kernel:
    • polling on B[0]
    • writing a zero-copy flag
  • CPU:
    • wait for the kernel to really be polling
    • read tsc in t_start
    • write B[0]
    • wait for flag
    • read tsc in t_end
    • d_t = t_end - t_start should be lower than 1-2 msecs
  • repeat until result is stable

power9: bus error with 4GB copy

[root@ibm-p9-012 gdrcopy]# ./copybw -s 4294967296 -c 4294967296 -d 0
GPU id:0 name:Tesla V100-SXM2-16GB PCI domain: 4 bus: 4 device: 0
GPU id:1 name:Tesla V100-SXM2-16GB PCI domain: 4 bus: 5 device: 0
GPU id:2 name:Tesla V100-SXM2-16GB PCI domain: 53 bus: 3 device: 0
GPU id:3 name:Tesla V100-SXM2-16GB PCI domain: 53 bus: 4 device: 0
selecting device 0
testing size: 4294967296
rounded size: 4294967296
device ptr: 7ffe40000000
bar_ptr: 0x7ffc3fff0000
info.va: 7ffe40000000
info.mapped_size: 4294967296
info.page_size: 65536
page offset: 0
user-space pointer:0x7ffc3fff0000
BAR writing test, size=4294967296 offset=0 num_iters=10000
Bus error (core dumped)
[root@ibm-p9-012 gdrcopy]#

Need -lrt in Makefile

$ make
...
...
/usr/bin/ld: copybw.o: undefined reference to symbol 'clock_gettime@@GLIBC_2.2.5'
/usr/bin/ld: note: 'clock_gettime@@GLIBC_2.2.5' is defined in DSO /lib64/librt.so.1 so try adding it to the linker command line
/lib64/librt.so.1: could not read symbols: Invalid operation
collect2: error: ld returned 1 exit status
make: *** [copybw] Error 1

Had to add "-lrt" to LIBS in Makefile:15

./insmod.sh fails

Dear,

We have several GPU nodes (Skylake processors with 4x P100 cards per each node), and I would like to test if the RDMA is available on these nodes or not.
When I try to build the gdrcopy, I get the following error message:
mknod: ‘/dev/gdrdrv’: Operation not permitted
Here is the specification of the host:

$ uname -a
Linux r23g34 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

In fact, there is no such file at /dev/gdrdrv on our current system. Do you have an idea what is wrong here?

Thanks
Ehsan

Support for fork

Today, forking could lead to spurious prints, e.g. from:

retcode = nvidia_p2p_put_pages(mr->p2p_token, mr->va_space, mr->va, mr->page_table);

and possibly a crash. Tracking here further investigations (a new unit test for this case) and possible mitigations (e.g. CLOEXEC when opening the driver fd).

1.2 release tag

Appears the tag/GitHub release for 1.2 is missing. Could this please be added?

segfault when copying data buffers of 64-127 bytes

Hi,

I have seen a segfault when copying buffers (gdr_copy_from_bar) with sizes ranging from 64 bytes to 127 bytes. The following are reproducers on our machines.

$ /opt/gdrcopy8.0/copybw -s 64
GPU id:0 name:Tesla K40c PCI domain: 0 bus: 2 device: 0
selecting device 0
testing size: 64
rounded size: 65536
device ptr: b05a40000
bar_ptr: 0x7f43d9223000
info.va: b05a40000
info.mapped_size: 65536
info.page_size: 65536
page offset: 0
user-space pointer:0x7f43d9223000
BAR writing test, size=64 offset=0 num_iters=10000
BAR1 write BW: 457.923MB/s
BAR reading test, size=64 offset=0 num_iters=100
Segmentation fault

$dmesg
...
[2689239.364734] copybw[5308]: segfault at 2846000 ip 00007f43d8ecd06c sp 00007fff23b939e0 error 6 in libgdrapi.so.1.2[7f43d8ecb000+3000]
$ /opt/gdrcopy8.0/copybw -s 64
GPU id:0 name:Tesla K80 PCI domain: 0 bus: 5 device: 0
GPU id:1 name:Tesla K80 PCI domain: 0 bus: 6 device: 0
selecting device 0
testing size: 64
rounded size: 65536
device ptr: 2304fc0000
bar_ptr: 0x2acc78311000
info.va: 2304fc0000
info.mapped_size: 65536
info.page_size: 65536
page offset: 0
user-space pointer:0x2acc78311000
BAR writing test, size=64 offset=0 num_iters=10000
BAR1 write BW: 722.593MB/s
BAR reading test, size=64 offset=0 num_iters=100
Segmentation fault (core dumped)

$dmesg
...
[2614698.728292] copybw[32532]: segfault at 2acc78321000 ip 00002acc78459018 sp 00007ffd51c16b10 error 4 in libgdrapi.so.1.2[2acc78457000+3000]

Do you have any idea what could be happening here?

Thanks,

nvidia_p2p_get_pages() failed

I just installed gdrcopy on my machine (Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-116-generic x86_64)) using CUDA 7.5, V7.5.17 (NVIDIA driver version 367.27), with an NVIDIA Tesla K20m GPU. After trying to run $ ./validate, the following error was printed in dmesg:
gdrdrv:nvidia_p2p_get_pages(va=704fe0000 len=327680 p2p_token=0 va_space=0) failed [ret = -22]

-22 = -EINVAL, and according to the GPUDirect CUDA Toolkit page that function returs -EINVAL if an invalid argument was supplied.
Does anyone have any bright ideas on why I can't do GPUDirect RDMA? Thanks.

ioremap sometimes too slow

  1. use nvidia_p2p_get_page get physical pages
  2. use ioremap to map nvidia physical page to kernel virtual address
  3. use memcpy_toio copy data from kernel to gpu
    Sometimes memcpy_toio too slow, it costs 80ms to transfer 600KB data to gpu.
    normally is 0.18ms.
  4. when machine boots, it remains either be normal or slow until machine reboots.

Please help me! How to direct access nvidia physical memory from kernel module?

copy_to_bar: sfence not issued in the right order in some scenarios

I observed a 1msec latency from when gdr_copy_to_bar is issued to when the update is observed on the GPU.

When the target buffer is not aligned or when the copy size is too small, gdr_copy_to_bar translates to an sfence followed by a memcpy.

Issuing the sfence after the memcpy seems to prevent some buffering and helps reduce the latency significantly.

buffer overrun in validate test

reported by Ching Chu:

$ ./validate
buffer size: 327680
off: 0
check 1: MMIO CPU initialization + read back via cuMemcpy D->H
check 2: gdr_copy_to_bar() + read back via cuMemcpy D->H
check 3: gdr_copy_to_bar() + read back via gdr_copy_from_bar()
check 4: gdr_copy_to_bar() + read back via gdr_copy_from_bar() + 5 dwords offset
check 5: gdr_copy_to_bar() + read back via gdr_copy_from_bar() + 11 bytes offset
[1] 316576 segmentation fault ./validate

add autotuning support

optimized memcpy implementations should be chosen at run-time during a tuning phase, possibly in gdr_open()

gdr_open is returning NULL

Hi,

I have installed gdrcopy, but I am getting NULL from the call to gdr_open and the test cases are failing.

enforce ABI compatibility between user and kernel space components

Currently there is no run-time ABI compatibility check between libgdrapi and gdrdrv.

That can generate obscure errors, say in a container when libgdrapi version A tries to work with baremetal gdrdrv version B.

A possible plan would be:

  • to introduce the concept of ABI version in gdrdrv
  • to add a new IOCTL to return that version to user-space
  • in gdr_open(), check ABI compatibility

cannot load the driver: Invalid parameters when running ./insmod.sh

sudo /sbin/insmod gdrdrv/gdrdrv.ko dbg_enabled=0 info_enabled=0
insmod: ERROR: could not insert module gdrdrv/gdrdrv.ko: Invalid parameters

so I tried:

insmod gdrdrv.ko
insmod: ERROR: could not insert module gdrdrv.ko: Invalid parameters

Could you take a look and fix it? It is not working right now.

rate-limit printk to avoid flooding the kernel log

[61024.799569] gdrdrv:invoking nvidia_p2p_get_pages(va=0x2305ba0000 len=4194304 p2p_tok=0 va_tok=0)
[61024.799746] gdrdrv:nvidia_p2p_get_pages(va=2305ba0000 len=4194304 p2p_token=0 va_space=0) failed [ret = -22]
[61024.799920] mlx5_warn:mlx5_0:mlx5_ib_reg_user_mr:1418:(pid 23265): umem get failed (-14)
[61024.800127] mlx5_warn:mlx5_0:mlx5_ib_reg_user_mr:1418:(pid 23266): umem get failed (-14)
[61024.800151] gdrdrv:invoking nvidia_p2p_get_pages(va=0x2305ba0000 len=4194304 p2p_tok=0 va_tok=0)
[61024.800327] gdrdrv:nvidia_p2p_get_pages(va=2305ba0000 len=4194304 p2p_token=0 va_space=0) failed [ret = -22]
[61024.800502] mlx5_warn:mlx5_0:mlx5_ib_reg_user_mr:1418:(pid 23266): umem get failed (-14)
[61024.800704] mlx5_warn:mlx5_0:mlx5_ib_reg_user_mr:1418:(pid 23265): umem get failed (-14)
[61024.800726] gdrdrv:invoking nvidia_p2p_get_pages(va=0x2305ba0000 len=4194304 p2p_tok=0 va_tok=0)
[61024.800901] gdrdrv:nvidia_p2p_get_pages(va=2305ba0000 len=4194304 p2p_token=0 va_space=0) failed [ret = -22]
[61024.801083] mlx5_warn:mlx5_0:mlx5_ib_reg_user_mr:1418:(pid 23265): umem get failed (-14)
[61024.801285] mlx5_warn:mlx5_0:mlx5_ib_reg_user_mr:1418:(pid 23266): umem get failed (-14)
[61024.801307] gdrdrv:invoking nvidia_p2p_get_pages(va=0x2305ba0000 len=4194304 p2p_tok=0 va_tok=0)
[61024.801484] gdrdrv:nvidia_p2p_get_pages(va=2305ba0000 len=4194304 p2p_token=0 va_space=0) failed [ret = -22]
[61024.801659] mlx5_warn:mlx5_0:mlx5_ib_reg_user_mr:1418:(pid 23266): umem get failed (-14)
[61024.801861] mlx5_warn:mlx5_0:mlx5_ib_reg_user_mr:1418:(pid 23265): umem get failed (-14)
[61024.801883] gdrdrv:invoking nvidia_p2p_get_pages(va=0x2305ba0000 len=4194304 p2p_tok=0 va_tok=0)
[61024.802064] gdrdrv:nvidia_p2p_get_pages(va=2305ba0000 len=4194304 p2p_token=0 va_space=0) failed [ret = -22]​

consolidate API versioning

ATM there are 3 places where the library major and minor versions are specified:

  • gdrapi.h
  • Makefile
  • gdrcopy.spec

There should be a single place where those version numbers are maintained.

gdr_map returns -EAGAIN


 
[1218757.588122] gdrdrv:mmap start=0x7f45e83c3000 size=196608 off=0xc31d2952
[1218757.588123] gdrdrv:range start with p=0 vaddr=7f45e83c3000 page_paddr=3838082a0000
[1218757.588125] gdrdrv:non-contig p=1 prev_page_paddr=3838082a0000 cur_page_paddr=3838084b0000
[1218757.588127] gdrdrv:mapping p=1 entries=1 offset=0 len=65536 vaddr=7f45e83c3000 paddr=3838082a0000
[1218757.588128] gdrdrv:mmaping phys mem addr=0x3838082a0000 size=65536 at user virt addr=0x7f45e83c3000
[1218757.588129] gdrdrv:is_cow_mapping is FALSE
[1218757.588138] gdrdrv:range start with p=1 vaddr=7f45e83d3000 page_paddr=3838084b0000
[1218757.588139] gdrdrv:mapping p=3 entries=2 offset=0 len=131072 vaddr=7f45e83d3000 paddr=3838084b0000
[1218757.588141] gdrdrv:mmaping phys mem addr=0x3838084b0000 size=131072 at user virt addr=0x7f45e83d3000
[1218757.588141] gdrdrv:is_cow_mapping is FALSE
[1218757.588146] gdrdrv:track_pfn_remap failed :-22
[1218757.588150] gdrdrv:error in remap_pfn_range() ret:-22
[1218757.588151] gdrdrv:error -11 in gdrdrv_mmap_phys_mem_wcomb

provide a run-time version query mechanism

We might consider a run-time query mechanism, like gdr_query_version(int *major, int *minor) or the more generic gdr_get_attribute(int attr, int *value), which would complement the dynamic link time mechanism offered by ld.so.

That would be especially useful, say in MPI libraries, when dynamically loading the library with dlopen("libgdrapi.so") and resolving symbols with dlsym(), to enforce a run-time compatibility check.

Errors with libgdrapi.so.1.2 while building gdrcopy

Hello,
I've run into the following error message while building gdrcopy-v1.3 (it doesn't happen with the master branch):

sudo make CUDA=/usr/local/cuda-10.1 all install
make: execvp: ./config_arch: Permission denied
echo "GDRAPI_ARCH="
GDRAPI_ARCH=
cd gdrdrv; \
make
make[1]: Entering directory `/home/ody/gdrcopy-1.3/gdrdrv'
Picking NVIDIA driver sources from NVIDIA_SRC_DIR=/usr/src/nvidia-418.67/nvidia. If that does not meet your expectation, you might have a stale driver still around and that might cause problems.
make[2]: Entering directory `/usr/src/kernels/3.10.0-957.27.2.el7.x86_64'
  Building modules, stage 2.
  MODPOST 2 modules
make[2]: Leaving directory `/usr/src/kernels/3.10.0-957.27.2.el7.x86_64'
make[1]: Leaving directory `/home/ody/gdrcopy-1.3/gdrdrv'
g++ -O2 -I /usr/local/cuda-10.1/include -I gdrdrv/ -I /usr/local/cuda-10.1/include -D GDRAPI_ARCH= -L /usr/local/cuda-10.1/lib64 -L /usr/local/cuda-10.1/lib -L /usr/lib64/nvidia -L /usr/lib/nvidia -L /usr/local/cuda-10.1/lib64   -o basic basic.o libgdrapi.so.1.2 -lcudart -lcuda -lpthread -ldl
libgdrapi.so.1.2: undefined reference to `memcpy_cached_store_sse'
libgdrapi.so.1.2: undefined reference to `memcpy_uncached_store_avx'
libgdrapi.so.1.2: undefined reference to `memcpy_cached_store_avx'
libgdrapi.so.1.2: undefined reference to `memcpy_uncached_store_sse'
libgdrapi.so.1.2: undefined reference to `memcpy_uncached_load_sse41'
collect2: error: ld returned 1 exit status
make: *** [basic] Error 1

The hardware is a virtualized environment with an Intel(R) Xeon(R) CPU @ 2.30GHz and a 00:04.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1) GPU. Thanks.

some questions to use this repo

  1. Are there any documents to help me learn how to use the code files in this repo in my own project?
  2. Is it possible to build all the functions into a DLL or lib file for convenience?
  3. Can I use this to take screenshots or do video streaming for games running on an NVIDIA GeForce GPU?
