matpool / mpu
A shim driver that lets in-docker nvidia-smi show the correct process list without modifying anything
License: GNU General Public License v2.0
The module doesn't work after the host is restarted. The current workaround is to re-run the make install command.
Hi,
Thanks for this kernel extension.
Is it safe to install with the following specs?
Since I am not familiar with kernel extensions, I am a little hesitant to simply give it a try.
If no weird side effects can occur I would give it a shot and if it works you can update the README
accordingly, adding the above specs as being compatible.
Thanks for your input.
Best regards
Lars
Edit*: Also, if you find the time, could you explain how your solution differs from this repo: https://github.com/gh2o/nvidia-pidns? It was also referenced in the corresponding nvidia-docker GitHub issue NVIDIA/nvidia-docker#179.
Encountering a kernel panic related to the write_syscall function and an unresolved symbol error for kallsyms_lookup_name on CentOS Stream 9 running kernel version 5.14.0-404.el9.x86_64.
sys_call_table
While attempting to modify the sys_call_table, a kernel panic occurs due to write protection, which seems to be related to the pinned sensitive bits in CR0 and CR4 as of kernel version 5.3 (referenced here).
The current method of modification triggers a permissions violation, resulting in a system crash. Below is the kernel log snippet capturing the panic:
[ 4632.359092] BUG: unable to handle page fault for address: ffffffff998017a0
[ 4632.359654] #PF: supervisor write access in kernel mode
[ 4632.360207] #PF: error_code(0x0003) - permissions violation
[ 4632.360756] PGD 2104015067 P4D 2104015067 PUD 2104016063 PMD 80000021034000e1
[ 4632.361323] Oops: 0003 [#1] PREEMPT SMP PTI
[ 4632.361882] CPU: 24 PID: 10286 Comm: insmod Kdump: loaded Tainted: P W OE ------- --- 5.14.0-404.el9.x86_64 #1
[ 4632.362476] Hardware name: Inspur IIMS/IIMS, BIOS 4.0.05 08/22/2018
[ 4632.363068] RIP: 0010:mpu_init_ioctl_hook+0x8b/0xc0 [mpu]
[ 4632.363657] Code: 4c 89 25 80 44 00 00 48 89 1d 81 44 00 00 48 89 05 82 44 00 00 0f 20 c5 48 89 ef 48 81 e7 ff ff fe ff e8 48 b0 7a d7 48 89 ef <48> c7 83 80 00 00 00 b0 80 09 c1 e8 35 b0 7a d7 31 c0 5b 5d 41 5c
[ 4632.364876] RSP: 0018:ffffac0e1a02bda8 EFLAGS: 00010286
[ 4632.365491] RAX: 0000000000000000 RBX: ffffffff99801720 RCX: 0000000000000027
[ 4632.366121] RDX: 0000000000000027 RSI: ffffffff9a467b00 RDI: 0000000080050033
[ 4632.366740] RBP: 0000000080050033 R08: 80000000ffff8c20 R09: ffffac0e1a02bd28
[ 4632.367366] R10: 0000000000000001 R11: 000000000000001b R12: ffff99514b03e268
[ 4632.367998] R13: ffffac0e1a02be68 R14: 0000000000000003 R15: 0000000000000000
[ 4632.368632] FS: 00007f4d7e170740(0000) GS:ffff99d13b500000(0000) knlGS:0000000000000000
[ 4632.369286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4632.369940] CR2: ffffffff998017a0 CR3: 0000008164078003 CR4: 00000000007706e0
[ 4632.370604] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4632.371271] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4632.371937] PKRU: 55555554
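The register values in the log above can be checked with a little bit arithmetic. The following is a minimal user-space sketch, not code from mpu: the helper names are hypothetical, and the bit positions (CR0 bit 16 for write protect, page-fault error code bits 0 to 2 for protection/write/user) are the standard x86 definitions. It shows that error code 0x0003 is a kernel-mode write to a present but read-only page, and that clearing WP from the logged CR0 value 0x80050033 would give 0x80040033. The dump still reports CR0 as 0x80050033, which would be consistent with the kernel keeping the pinned WP bit set.

```c
#include <stdint.h>

/* CR0 write-protect bit (bit 16), same name as the kernel macro. */
#define X86_CR0_WP (1ULL << 16)

/* x86 page-fault error code bits (hypothetical names for this sketch). */
#define PF_PROT  (1u << 0)  /* 1: protection violation, 0: page not present */
#define PF_WRITE (1u << 1)  /* 1: write access, 0: read access */
#define PF_USER  (1u << 2)  /* 1: user mode, 0: supervisor (kernel) mode */

/* Value a module would try to load into CR0 to disable write protection. */
uint64_t cr0_without_wp(uint64_t cr0)
{
    return cr0 & ~X86_CR0_WP;
}

/* Non-zero if write protection is still active in the given CR0 value. */
int cr0_wp_is_set(uint64_t cr0)
{
    return (cr0 & X86_CR0_WP) != 0;
}

/* Non-zero if the error code describes a kernel write to a read-only page. */
int pf_is_kernel_write_protection_fault(unsigned int err)
{
    return (err & PF_PROT) && (err & PF_WRITE) && !(err & PF_USER);
}
```

With the values from the log, cr0_wp_is_set(0x80050033) is true and pf_is_kernel_write_protection_fault(0x0003) is true, so the write to the table was attempted while write protection was still in effect.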
kallsyms_lookup_name
Since kernel version 5.7, the kallsyms_lookup_name symbol is no longer exported, which causes the module build to fail with an undefined symbol error. This issue was previously mentioned in Issue #11 and PR #15. The error log is as follows:
ERROR: modpost: "kallsyms_lookup_name" [/root/mpu/mpu.ko] undefined!
make[2]: *** [scripts/Makefile.modpost:134: /root/mpu/Module.symvers] Error 1
make[2]: *** Deleting file '/root/mpu/Module.symvers'
make[1]: *** [Makefile:1841: modules] Error 2
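An unexported symbol is normally still present in the running kernel; it just cannot be linked against by a module. As a quick diagnostic, one can look the name up in /proc/kallsyms from user space. The sketch below is an assumption-laden illustration, not part of mpu: the function names are hypothetical, and it relies only on the usual "address type name [module]" line layout of /proc/kallsyms. Note that addresses may read as zero for unprivileged users, depending on the kptr_restrict setting.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line in /proc/kallsyms layout: "<hex addr> <type> <name> [module]".
 * Returns 1 and stores the address if the line names `name`, otherwise 0. */
int match_kallsyms_line(const char *line, const char *name,
                        unsigned long long *addr)
{
    unsigned long long a;
    char type;
    char sym[256];

    if (sscanf(line, "%llx %c %255s", &a, &type, sym) != 3)
        return 0;
    if (strcmp(sym, name) != 0)
        return 0;
    *addr = a;
    return 1;
}

/* Scan a kallsyms-style file for `name`.
 * Returns 1 if found, 0 if absent, -1 if the file cannot be opened. */
int find_symbol(const char *path, const char *name, unsigned long long *addr)
{
    char line[512];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (!found && fgets(line, sizeof line, f))
        found = match_kallsyms_line(line, name, addr);
    fclose(f);
    return found;
}
```

Calling find_symbol("/proc/kallsyms", "kallsyms_lookup_name", &addr) as root would indicate whether the symbol exists on the affected kernel, which is a precondition for any workaround that resolves it at run time instead of at link time.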
MOV operation. Finally, in the above environment with Linux Container (Issue #12), nvidia-smi outputs the correct results.
My OS is Ubuntu 20.04 LTS. I also want to use mpu so that nvidia-smi in the container shows the process IDs, but compiling mpu fails because the kernel version shipped with Ubuntu 20.04 is too new.
I looked for information on how to fix this in mpu's code and found a workaround used in another project:
xcellerator/linux_kernel_hacking#3
However, I don't know C and am not able to add this fix to mpu myself.
Could you please check whether this approach works and try to apply it? Thanks.