Comments (13)
Ehsan, insmod.sh requires that the user issuing the command have sudo privileges.
from gdrcopy.
I definitely have root permission. Let me copy-paste what I get when running "make":
gdrcopy$> sudo ./build.sh &> log.txt
And the tail of log.txt reads:
`make PREFIX=/easybuild/work/gdrcopy/install CUDA=/software/CUDA/9.1.85 all install
echo "GDRAPI_ARCH=X86"
GDRAPI_ARCH=X86
cc -O2 -fPIC -I /software/CUDA/9.1.85/include -I gdrdrv/ -I /software/CUDA/9.1.85/include -D GDRAPI_ARCH=X86 -c -o gdrapi.o gdrapi.c
cc -O2 -fPIC -I /software/CUDA/9.1.85/include -I gdrdrv/ -I /software/CUDA/9.1.85/include -D GDRAPI_ARCH=X86 -c -mavx -o memcpy_avx.o memcpy_avx.c
cc -O2 -fPIC -I /software/CUDA/9.1.85/include -I gdrdrv/ -I /software/CUDA/9.1.85/include -D GDRAPI_ARCH=X86 -c -msse -o memcpy_sse.o memcpy_sse.c
cc -O2 -fPIC -I /software/CUDA/9.1.85/include -I gdrdrv/ -I /software/CUDA/9.1.85/include -D GDRAPI_ARCH=X86 -c -msse4.1 -o memcpy_sse41.o memcpy_sse41.c
cc -shared -Wl,-soname,libgdrapi.so.1 -o libgdrapi.so.1.2 gdrapi.o memcpy_avx.o memcpy_sse.o memcpy_sse41.o
ldconfig -n /easybuild/work/gdrcopy/gdrcopy
ln -sf libgdrapi.so.1.2 libgdrapi.so.1
ln -sf libgdrapi.so.1 libgdrapi.so
cd gdrdrv;
make
find: ‘/usr/src/nvidia-’: No such file or directory
dirname: missing operand
Try 'dirname --help' for more information.
make[1]: Entering directory `/easybuild/work/gdrcopy/gdrcopy/gdrdrv'
Picking NVIDIA driver sources from NVIDIA_SRC_DIR=NVIDIA_DRIVER_MISSING. If that does not meet your expectation, you might have a stale driver still around and that might cause problems.
make[2]: Entering directory `/usr/src/kernels/3.10.0-693.17.1.el7.x86_64'
find: ‘/usr/src/nvidia-’: No such file or directory
dirname: missing operand
Try 'dirname --help' for more information.
CC [M] /easybuild/work/gdrcopy/gdrcopy/gdrdrv/nv-p2p-dummy.o
/easybuild/work/gdrcopy/gdrcopy/gdrdrv/nv-p2p-dummy.c:48:20: fatal error: nv-p2p.h: No such file or directory
#include "nv-p2p.h"
^
compilation terminated.
make[3]: *** [/easybuild/work/gdrcopy/gdrcopy/gdrdrv/nv-p2p-dummy.o] Error 1
make[2]: *** [_module_/easybuild/work/gdrcopy/gdrcopy/gdrdrv] Error 2
make[2]: Leaving directory `/usr/src/kernels/3.10.0-693.17.1.el7.x86_64'
make[1]: *** [module] Error 2
make[1]: Leaving directory `/easybuild/work/gdrcopy/gdrcopy/gdrdrv'
make: *** [driver] Error 2
`
I am building against CUDA/9.1.85.
from gdrcopy.
I made some progress with the previous errors, and now, I get a new error:
insmod: ERROR: could not insert module gdrdrv/gdrdrv.ko: Unknown symbol in module
from gdrcopy.
Hard to tell.
Are you building and installing on the same machine?
There should be a detailed error in the kernel log. You could use 'dmesg' to dump that log and copy the relevant lines here.
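As a rough sketch of the dmesg suggestion above, the kernel log can be filtered for the lines that usually accompany a failed module load. The function name `filter_module_errors` and the search patterns are illustrative assumptions, not part of gdrcopy.

```shell
# Hypothetical helper: keep only kernel log lines that mention gdrdrv
# or the usual symbol-resolution failures. Reads from stdin, so it works
# on live dmesg output as well as on a saved log file.
filter_module_errors() {
    grep -iE 'gdrdrv|unknown symbol|disagrees about version'
}

# Typical use right after a failed insmod (dmesg may require root):
#   dmesg | filter_module_errors | tail -n 20
```

The symbol named in the matching lines tells you which module was expected to provide it.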
from gdrcopy.
Alright ... I'm coming back to this ticket, because I need gdrcopy for a CUDA-aware OpenMPI. I am attaching the redirected stderr/stdout from building gdrcopy in here, together with the very simple build script I am using.
In brief, I have two complaints now: one about NVIDIA_SRC_DIR, and the other about CONFIG_RETPOLINE during the "make" step. In fact, I am not sure how to set these so that they propagate properly to make.
Furthermore, what is expected to be inside NVIDIA_SRC_DIR?
What do you see on your platform?
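Judging only from the first log in this thread, the build looks under /usr/src/nvidia-* and then fails on the missing nv-p2p.h header, so a directory containing that header is presumably what NVIDIA_SRC_DIR should point to. A minimal sketch for locating it follows; `find_nv_p2p_dir` is a hypothetical helper and does not reproduce the actual logic of the gdrdrv Makefile.

```shell
# Hypothetical helper: print the directory that contains nv-p2p.h under
# the given root (first match only), or return non-zero if none is found.
# The explicit emptiness check avoids the "dirname: missing operand"
# failure that occurs when find returns nothing.
find_nv_p2p_dir() {
    root="$1"
    hdr=$(find "$root" -name nv-p2p.h -print 2>/dev/null | head -n 1)
    [ -n "$hdr" ] || return 1
    dirname "$hdr"
}

# Example:
#   find_nv_p2p_dir /usr/src || echo "no NVIDIA driver sources found"
```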
from gdrcopy.
I would like to draw your attention to this ticket. My installation of CUDA-aware MPI is blocked on compiling gdrcopy. Could you please take a look at my error logs and the questions I raised above?
Thanks a lot.
E.
from gdrcopy.
Ehsan,
thank you for trying gdrcopy.
The excerpt from your build log, copied below, is clear enough:
- NVIDIA_SRC_DIR is set automatically, based on the local install directory of your GPU driver
- CONFIG_RETPOLINE is apparently not supported by your host compiler. I am not an expert, but I don't believe you are supposed to tweak the compiler command line for a kernel module. Either your Linux kernel automatically detects and enables retpoline or not.
make[1]: Entering directory `/easybuild/work/gdrcopy/gdrcopy/gdrdrv'
Picking NVIDIA driver sources from NVIDIA_SRC_DIR=/usr/src/nvidia-418.40.04/nvidia. If that does not meet your expectation, you might have a stale driver still around and that might cause problems.
make[2]: Entering directory `/usr/src/kernels/3.10.0-957.10.1.el7.x86_64'
arch/x86/Makefile:166: *** CONFIG_RETPOLINE=y, but not supported by the compiler. Compiler update recommended.. Stop.
make[2]: Leaving directory `/usr/src/kernels/3.10.0-957.10.1.el7.x86_64'
make[1]: *** [module] Error 2
make[1]: Leaving directory `/easybuild/work/gdrcopy/gdrcopy/gdrdrv'
make: *** [driver] Error 2
from gdrcopy.
Thanks Davide for your message; it brought some activity back to this ticket.
My problem is that, whether or not I set the two env vars NVIDIA_SRC_DIR and/or CONFIG_RETPOLINE, my build always crashes at the same location and throws the same error message. That makes me wonder whether I am doing it right.
Do you have any idea why my build crashes? And how to resolve this?
from gdrcopy.
That kernel module build error is discussed on the net, e.g. on RH/CentOS forums/bugzilla.
For example see https://bugzilla.redhat.com/show_bug.cgi?id=1566297#c12
I think you might have updated the kernel but not the gcc RPM.
from gdrcopy.
Thanks Davide for the hint. For some reason, when I use the GCC/6.4.0 module on our compute nodes (where the rpm -q gcc command gives gcc-4.8.5-36.el7_6.1.x86_64), the installation keeps failing! However, if I purge the GCC module and stick to the system gcc, it builds flawlessly.
I still cannot comprehend why gdrcopy builds with an older GCC but not with a newer one!
from gdrcopy.
BTW gdrdrv is a kernel module, which takes advantage of the Linux kernel build system, i.e. it does not have its own build system.
It looks like retpoline support is in gcc 7.3 or 8.x, but not in 6.x.
Most probably RH backported retpoline support onto their gcc 4.8.5 branch.
Closing, as this is a local customer server issue.
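To check which of two compilers would pass the kernel's retpoline test, one can probe the flags directly. This is a sketch under assumptions: the two flags below are the ones the upstream x86 kernel Makefile uses for GCC, and the RHEL 3.10 kernel may probe differently; `compiler_accepts` is a hypothetical helper name.

```shell
# Hypothetical helper: return 0 if the given compiler accepts the given
# flags when compiling an empty C translation unit.
compiler_accepts() {
    cc_bin="$1"; shift
    tmp_obj=$(mktemp) || return 2
    if "$cc_bin" "$@" -c -x c /dev/null -o "$tmp_obj" >/dev/null 2>&1; then
        status=0
    else
        status=$?
    fi
    rm -f "$tmp_obj"
    return "$status"
}

# Assumed retpoline flags (taken from the upstream kernel, not from this thread):
#   compiler_accepts gcc -mindirect-branch=thunk-extern -mindirect-branch-register \
#       && echo "retpoline flags accepted" || echo "retpoline flags rejected"
```

Running the check once with the GCC module loaded and once with the system gcc should show which compiler the kernel build system can use.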
from gdrcopy.
Hi, how do you fix the problem "insmod: ERROR: could not insert module gdrdrv/gdrdrv.ko: Unknown symbol in module"?
from gdrcopy.
Hi @zhuanwancaishi ,
There are multiple possibilities:
- Was the NVIDIA driver (nvidia.ko) loaded before you ran insmod.sh?
- When you compiled gdrdrv, a message should have been printed. Did it pick the correct NVIDIA driver and the Linux kernel version you are running?
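Both checks can be sketched in a few lines of shell. `vermagic_matches` is a hypothetical helper; it assumes the first field of a module's vermagic string is the kernel release it was built for.

```shell
# Hypothetical helper: succeed when the first field of a vermagic string
# equals the given kernel release.
vermagic_matches() {
    vermagic="$1"
    release="$2"
    [ "${vermagic%% *}" = "$release" ]
}

# Example, run from the gdrcopy source directory:
#   lsmod | grep -w '^nvidia' || echo "nvidia.ko is not loaded"
#   vermagic_matches "$(modinfo -F vermagic gdrdrv/gdrdrv.ko)" "$(uname -r)" \
#       || echo "gdrdrv.ko was built for a different kernel"
```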
from gdrcopy.