Comments (6)
FYI, when I try to follow the general steps, I hit:
make CUDA=/usr/local/cuda
make[1]: Entering directory `/usr/bin/gdrcopy/tests'
g++ -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include -c -o copybw.o copybw.cpp
copybw.cpp:30:10: fatal error: cuda.h: No such file or directory
#include <cuda.h>
^~~~~~~~
compilation terminated.
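The error above suggests that no `cuda.h` exists under the include directory passed to the compiler (`/usr/local/cuda/include`), which usually means the CUDA toolkit headers are not installed at that path. A minimal sketch for checking this before running `make` follows; the helper name `has_cuda_header` is hypothetical and not part of gdrcopy, and the default path is only the one shown in the log.

```shell
# Hypothetical helper (not part of gdrcopy): succeeds if cuda.h exists
# under the include directory of the given CUDA root.
has_cuda_header() {
    [ -f "$1/include/cuda.h" ]
}

# Use the same root that is passed to make (CUDA=...); the default below
# is the path from the build log above.
CUDA_ROOT="${CUDA:-/usr/local/cuda}"
if has_cuda_header "$CUDA_ROOT"; then
    echo "found $CUDA_ROOT/include/cuda.h"
else
    echo "cuda.h not found under $CUDA_ROOT/include" >&2
fi
```

If the header is missing, pointing `CUDA=` at a directory that actually contains `include/cuda.h` is one way to get past this error.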
nvidia-smi works fine on the host. Extract:
Tue Sep 26 01:15:22 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:10:1C.0 Off | 0 |
| N/A 28C P0 62W / 400W | 4MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:10:1D.0 Off | 0 |
| N/A 27C P0 60W / 400W | 4MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
......
from gdrcopy.
I just installed the following on the host:
yum install -y https://developer.download.nvidia.com/compute/redist/gdrcopy/CUDA%2012.2/rhel8/x64/gdrcopy-kmod-2.4-1dkms.el8.noarch.rpm
yum install -y https://developer.download.nvidia.com/compute/redist/gdrcopy/CUDA%2012.2/rhel8/x64/gdrcopy-devel-2.4-1.el8.noarch.rpm
yum install -y https://developer.download.nvidia.com/compute/redist/gdrcopy/CUDA%2012.2/rhel8/x64/gdrcopy-2.4-1.el8.x86_64.rpm
and then ran:
# gdrcopy_sanity
Total: 28, Passed: 28, Failed: 0, Waived: 0
Does that mean it looks good?
I don't see any files in
# ls -ls /dev/gdrdrv
0 crw-rw-rw- 1 root root 242, 0 Sep 26 01:26 /dev/gdrdrv
Hi @hassanbabaie,
If you want to install gdrdrv only, https://developer.download.nvidia.com/compute/redist/gdrcopy/CUDA%2012.2/rhel8/x64/gdrcopy-kmod-2.4-1dkms.el8.noarch.rpm is sufficient.
# ls -ls /dev/gdrdrv
0 crw-rw-rw- 1 root root 242, 0 Sep 26 01:26 /dev/gdrdrv
Doesn't it show /dev/gdrdrv here?
Hi @pakmarkthub, thanks for the reply:
Yes, so before any install I get:
# ls -ls /dev/gdrdrv
ls: cannot access /dev/gdrdrv: No such file or directory
Then if I just install:
# yum install -y https://developer.download.nvidia.com/compute/redist/gdrcopy/CUDA%2012.2/rhel8/x64/gdrcopy-kmod-2.4-1dkms.el8.noarch.rpm
I get:
# ls -ls /dev/gdrdrv
0 crw-rw-rw- 1 root root 242, 0 Sep 26 13:30 /dev/gdrdrv
So that should mean we're good, and then I just need to mount this device in the pod spec: /dev/gdrdrv. The rest can be installed and run from within the container, right?
Thanks in advance
Yes, gdrdrv is now ready on your system. Now you just need to mount /dev/gdrdrv. If you use docker: docker run <other options> --device=/dev/gdrdrv:/dev/gdrdrv
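The check-then-mount flow from this exchange can be sketched as a small script. It only verifies that the device node exists as a character device (as in the `ls -ls` output above) and prints the `docker run` form quoted in the reply; the helper name `is_char_device` is hypothetical, and `<other options>` stays a placeholder for whatever else the container needs.

```shell
# Hypothetical helper: succeeds if the given path is a character device.
is_char_device() {
    [ -c "$1" ]
}

GDRDRV_DEV="/dev/gdrdrv"
if is_char_device "$GDRDRV_DEV"; then
    # Print (not run) the command; <other options> is a placeholder
    # taken from the reply above.
    echo "docker run <other options> --device=$GDRDRV_DEV:$GDRDRV_DEV"
else
    echo "$GDRDRV_DEV is missing; install the gdrcopy-kmod package on the host first" >&2
fi
```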
Thanks!