Comments (13)
@PedroRibeiro95 Have you done any new research on this? I did some digging, and it looks like it should work with the following configuration: k8s-device-plugin has a config option called `DEVICE_LIST_STRATEGY`, which allows the device list to be returned as CDI. Once kubelet receives the Allocate response from the device plugin, it should populate the CDI spec file and start containerd (assuming we are just using containerd). containerd will then parse the CDI devices, convert them into the OCI spec file, and pass the spec to runc or runsc. runsc should then just create the Linux devices, as @ayushr2 described. (I am assuming nvidia-container-runtime is not needed in this case, since we don't need the prestart hook?)
I haven't tested any of this; everything mentioned above is pure guesswork on my part, but let me know whether my reasoning makes sense.
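If that reasoning holds, the first step would be switching the plugin's device list strategy to CDI. As a sketch only (the exact env var name and value come from the k8s-device-plugin docs and may differ by version), the plugin's DaemonSet container would set something like:

```yaml
# Hypothetical env config for the k8s-device-plugin DaemonSet container.
# "cdi-annotations" asks the plugin to report devices via CDI instead of
# the default env-var strategy; verify the value against the plugin docs.
env:
  - name: DEVICE_LIST_STRATEGY
    value: "cdi-annotations"
```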
from gvisor.
Adding the logs for the container:
logs.zip
Thanks for the very detailed report! Apologies for the delay. nvproxy is not supported with k8s-device-plugin yet, and we haven't investigated what needs to be done to add support. We would appreciate OSS contributions!
We are currently focused on establishing support in GKE. GKE uses a different GPU+container stack. It does not use k8s-device-plugin. It instead has its own device plugin: https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu. This configures the container in a different way. nvproxy in GKE is still experimental, but it works! Please let me know if you want to experiment on GKE, and we can provide more detailed instructions.
To summarize, nvproxy works in the following environments:
- Docker: `docker run --gpus= ...`. Needs the `--nvproxy-docker` flag.
- nvidia-container-runtime with legacy mode. Needs the `--nvproxy-docker` flag.
- GKE. Does not need the `--nvproxy-docker` flag.
Thanks for the followup @ayushr2. In the meantime I've made some progress: just using nvproxy, bootstrapping the host node with the NVIDIA driver, and then mounting the driver into the container using a hostPath volume gets `nvidia-smi` to run successfully. However, it seems it can't fully access the GPU:
==============NVSMI LOG==============
Timestamp : Mon Oct 30 15:53:01 2023
Driver Version : 525.60.13
CUDA Version : 12.0
Attached GPUs : 1
GPU 00000000:00:1E.0
Product Name : Tesla T4
Product Brand : NVIDIA
Product Architecture : Turing
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : GPU access blocked by the operating system
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : GPU access blocked by the operating system
GPU UUID : GPU-3ec3e89a-b2ec-68d1-bb38-3becc2cf55cd
Minor Number : 0
VBIOS Version : Unknown Error
MultiGPU Board : No
Board ID : 0x1e
Board Part Number : GPU access blocked by the operating system
GPU Part Number : GPU access blocked by the operating system
Module ID : GPU access blocked by the operating system
Inforom Version
Image Version : GPU access blocked by the operating system
OEM Object : Unknown Error
ECC Object : GPU access blocked by the operating system
Power Management Object : Unknown Error
GPU Operation Mode
Current : GPU access blocked by the operating system
Pending : GPU access blocked by the operating system
GSP Firmware Version : 525.60.13
GPU Virtualization Mode
Virtualization Mode : Pass-Through
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x00
Device : 0x1E
Domain : 0x0000
Device Id : 0x1EB810DE
Bus Id : 00000000:00:1E.0
Sub System Id : 0x12A210DE
GPU Link Info
PCIe Generation
Max : Unknown Error
Current : Unknown Error
Device Current : Unknown Error
Device Max : Unknown Error
Host Max : Unknown Error
Link Width
Max : Unknown Error
Current : Unknown Error
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : GPU access blocked by the operating system
Replay Number Rollovers : GPU access blocked by the operating system
Tx Throughput : GPU access blocked by the operating system
Rx Throughput : GPU access blocked by the operating system
Atomic Caps Inbound : GPU access blocked by the operating system
Atomic Caps Outbound : GPU access blocked by the operating system
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 15360 MiB
Reserved : 399 MiB
Used : 2 MiB
Free : 14957 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : GPU access blocked by the operating system
Average FPS : GPU access blocked by the operating system
Average Latency : GPU access blocked by the operating system
FBC Stats
Active Sessions : GPU access blocked by the operating system
Average FPS : GPU access blocked by the operating system
Average Latency : GPU access blocked by the operating system
Ecc Mode
Current : GPU access blocked by the operating system
Pending : GPU access blocked by the operating system
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : GPU access blocked by the operating system
Double Bit ECC : GPU access blocked by the operating system
Pending Page Blacklist : GPU access blocked by the operating system
Remapped Rows : GPU access blocked by the operating system
Temperature
GPU Current Temp : 22 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : 85 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 13.16 W
Power Limit : 70.00 W
Default Power Limit : 70.00 W
Enforced Power Limit : 70.00 W
Min Power Limit : 60.00 W
Max Power Limit : 70.00 W
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : 1590 MHz
Memory : 5001 MHz
Default Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1590 MHz
SM : 1590 MHz
Memory : 5001 MHz
Video : 1470 MHz
Max Customer Boost Clocks
Graphics : 1590 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Fabric
State : N/A
Status : N/A
Processes : None
I also tried running this under `runtimeClass: nvidia` and the problem didn't occur, so it's definitely a gVisor issue. Unfortunately, GKE is not viable for our use case. I'll try the options you described to see if I can get it working.
> However, it seems it can't fully access the GPU

Yeah, I don't think it will work just yet. In GKE, the container spec defines which GPUs to expose in `spec.Linux.Devices`. However, in the boot logs you attached above, I could not see any such devices defined, so gVisor will not expose any devices.
My best guess is that k8s-device-plugin is creating bind mounts of the `/dev/nvidia*` devices in the container's root filesystem and then expecting the container to be able to access them. That won't work with gVisor under any combination of our `--nvproxy` flags, because even though the devices exist on the host filesystem, they don't exist in our sentry's `/dev` filesystem (which is an in-memory filesystem).
In Docker mode, the GPU devices are explicitly exposed like this. In GKE, the device files are automatically created here because `spec.Linux.Devices` defines them. So you could look into adding similar support for the k8s-device-plugin environment.
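For reference, a GKE-style OCI `config.json` exposes the devices roughly like this (a sketch following the OCI runtime spec; the exact set of devices and their major/minor numbers vary by host and driver):

```json
{
  "linux": {
    "devices": [
      { "path": "/dev/nvidia0", "type": "c", "major": 195, "minor": 0 },
      { "path": "/dev/nvidiactl", "type": "c", "major": 195, "minor": 255 }
    ]
  }
}
```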
Thanks for the detailed reply @ayushr2! Though I'm a bit out of my depth here, your guidance has been very helpful. I'm trying to better understand the differences for GKE; could you please point me to where the container spec/sandbox is defined? I'm not sure if it's possible to try to port that configuration over to Amazon Linux or if I should just try to add the feature directly to the gVisor code you pointed me to.
I've very naively tried adding the following snippet to `runsc/boot/vfs.go:createDeviceFiles`:

```go
mode := os.FileMode(0777)
info.spec.Linux.Devices = append(info.spec.Linux.Devices, []specs.LinuxDevice{
	{
		Path:     "/dev/nvidia0",
		Type:     "c",
		Major:    195,
		Minor:    0,
		FileMode: &mode,
	},
	{
		Path:     "/dev/nvidia-modeset",
		Type:     "c",
		Major:    195,
		Minor:    254,
		FileMode: &mode,
	},
	{
		Path:     "/dev/nvidia-uvm",
		Type:     "c",
		Major:    245,
		Minor:    0,
		FileMode: &mode,
	},
	{
		Path:     "/dev/nvidia-uvm-tools",
		Type:     "c",
		Major:    245,
		Minor:    1,
		FileMode: &mode,
	},
}...)
```

in order to get the devices mounted at runtime, but it seems even this isn't enough.
You probably also want `/dev/nvidiactl`. You basically want to call this; usually that is only called for `--nvproxy-docker`. JUST FOR TESTING, try adding a new flag `--nvproxy-k8s` and changing the condition on line 1221 to `if info.conf.NVProxyDocker || info.conf.NVProxyK8s { ...`
Also note that the minor number of `/dev/nvidia-uvm` is different inside the sandbox, so just copying it from the host won't work.
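To illustrate the point about minor numbers, here is a small self-contained sketch (the names are illustrative, not gVisor APIs): whatever generates the device entries has to treat the sandbox-assigned minor for `/dev/nvidia-uvm` as authoritative instead of copying the host's value.

```go
package main

import "fmt"

// Illustrative sketch only (not gVisor code): shows why a spec generator
// can't blindly copy /dev/nvidia-uvm's host device numbers, since the
// sandbox assigns its own minor number for that device.

type device struct {
	path  string
	major int64
	minor int64
}

// remapUVM returns a copy of devs with /dev/nvidia-uvm's minor number
// replaced by the sandbox-assigned value; all other devices are untouched.
func remapUVM(devs []device, sandboxMinor int64) []device {
	out := make([]device, len(devs))
	copy(out, devs)
	for i := range out {
		if out[i].path == "/dev/nvidia-uvm" {
			out[i].minor = sandboxMinor
		}
	}
	return out
}

func main() {
	host := []device{
		{path: "/dev/nvidia0", major: 195, minor: 0},
		{path: "/dev/nvidia-uvm", major: 245, minor: 0}, // host minor: wrong inside the sandbox
	}
	fmt.Println(remapUVM(host, 123)[1].minor) // prints 123
}
```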
Yeah, from reading the code and looking at the logs, it seems gVisor automatically assigns a minor number to the device. Unfortunately, your suggestion still didn't work. I'll leave the logs for the container here in case you (or anyone who comes across this issue) want to use them for debugging. (Note that I had already added an `nvproxy-automount-dev` flag for the same purpose as your suggested `nvproxy-k8s`.)
runsc.tar.gz
Got it, thanks for working with me on this.
Just to set expectations, adding support for k8s-device-plugin is currently not on our roadmap. We are focused on maturing GPU support in GKE first. OSS contributions for GPU support in additional environments are appreciated in the meantime!
No worries! In the meantime, we don't have a strict requirement for NVIDIA working with gVisor, so we can work around it. I'd love to help bring in this feature, but I'd still need to get more familiar with gVisor first. I'll help in any way I can!
A friendly reminder that this issue had no activity for 120 days.
Hey @sfc-gh-hyu, thanks for the detailed instructions. I haven't revisited this in the meantime as other priorities came up, but I will be testing it again very soon. I will try to follow what you suggested and I will report back with more details.