Comments (13)
do you have a memory limit set for the container?
from gvisor.
do you have a memory limit set for the container?
No, there is no memory limit set for the container.
There are also no oom log entries and no memory pressure on the host.
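One way to double-check both points on the host is to look at the container's configured limit and to scan the kernel log for OOM-killer activity. A minimal sketch follows; the helper name `oom_lines` is made up for illustration, and the patterns are an assumption about typical kernel wording, not an exhaustive list:

```shell
# Hypothetical helper: keep only lines that look like kernel OOM-killer messages.
# The patterns are an assumption about typical wording, not an exhaustive list.
oom_lines() {
  grep -i -E 'out of memory|oom-kill|killed process'
}

# Usage (reading the kernel log needs root on most hosts). The "|| true" keeps
# an empty result from being treated as a failure under "set -e".
# sudo dmesg | oom_lines || true
```

For the limit itself, `docker inspect --format '{{.HostConfig.Memory}}' <container>` should print 0 when no memory limit is set.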
On an m7i with Ubuntu 22.04, docker instead hangs when starting a container using runsc:
$ docker run --runtime=runsc-debug --rm hello-world
<hang>
^C^C^C^Z^Z^Z
<still hung>
$ ll /tmp/runsc-debug/
...
-rw-r--r-- 1 root root 3076 Nov 29 09:43 runsc.log.20231129-094346.714655.kill.txt
-rw-r--r-- 1 root root 3076 Nov 29 09:43 runsc.log.20231129-094346.966621.kill.txt
-rw-r--r-- 1 root root 3076 Nov 29 09:43 runsc.log.20231129-094347.022673.kill.txt
$ cat /tmp/runsc-debug/runsc.log.20231129-094347.022673.kill.txt
I1129 09:43:47.022701 5663 main.go:189] ***************************
I1129 09:43:47.022727 5663 main.go:190] Args: [/usr/local/bin/runsc --debug --debug-log=/tmp/runsc-debug/ --strace --log-packets --root /var/run/docker/runtime-runc/moby --log /run/containerd/io.containerd.runtime.v2.task/moby/f84bc25ff786a53ae655439d1ec56ad6cfb1966b089a5dbd250aada14ed1fee2/log.json --log-format json --systemd-cgroup kill f84bc25ff786a53ae655439d1ec56ad6cfb1966b089a5dbd250aada14ed1fee2 20]
I1129 09:43:47.022737 5663 main.go:191] Version release-20231113.0
I1129 09:43:47.022741 5663 main.go:192] GOOS: linux
I1129 09:43:47.022744 5663 main.go:193] GOARCH: amd64
I1129 09:43:47.022747 5663 main.go:194] PID: 5663
I1129 09:43:47.022751 5663 main.go:195] UID: 0, GID: 0
I1129 09:43:47.022754 5663 main.go:196] Configuration:
I1129 09:43:47.022758 5663 main.go:197] RootDir: /var/run/docker/runtime-runc/moby
I1129 09:43:47.022761 5663 main.go:198] Platform: systrap
I1129 09:43:47.022765 5663 main.go:199] FileAccess: exclusive
I1129 09:43:47.022769 5663 main.go:200] Directfs: true
I1129 09:43:47.022772 5663 main.go:201] Overlay: root:self
I1129 09:43:47.022776 5663 main.go:202] Network: sandbox, logging: true
I1129 09:43:47.022780 5663 main.go:203] Strace: true, max size: 1024, syscalls:
I1129 09:43:47.022784 5663 main.go:204] IOURING: false
I1129 09:43:47.022787 5663 main.go:205] Debug: true
I1129 09:43:47.022793 5663 main.go:206] Systemd: true
I1129 09:43:47.022796 5663 main.go:207] ***************************
D1129 09:43:47.022806 5663 state_file.go:78] Load container, rootDir: "/var/run/docker/runtime-runc/moby", id: {SandboxID: ContainerID:f84bc25ff786a53ae655439d1ec56ad6cfb1966b089a5dbd250aada14ed1fee2}, opts: {Exact:false SkipCheck:false TryLock:false RootContainer:false}
D1129 09:43:47.023613 5663 container.go:673] Signal container, cid: f84bc25ff786a53ae655439d1ec56ad6cfb1966b089a5dbd250aada14ed1fee2, signal: signal 0 (0)
D1129 09:43:47.023622 5663 sandbox.go:1211] Signal sandbox "f84bc25ff786a53ae655439d1ec56ad6cfb1966b089a5dbd250aada14ed1fee2"
D1129 09:43:47.023626 5663 sandbox.go:613] Connecting to sandbox "f84bc25ff786a53ae655439d1ec56ad6cfb1966b089a5dbd250aada14ed1fee2"
D1129 09:43:47.023672 5663 urpc.go:568] urpc: successfully marshalled 144 bytes.
D1129 09:43:47.023918 5663 urpc.go:611] urpc: unmarshal success.
D1129 09:43:47.023928 5663 container.go:673] Signal container, cid: f84bc25ff786a53ae655439d1ec56ad6cfb1966b089a5dbd250aada14ed1fee2, signal: stopped (20)
D1129 09:43:47.023933 5663 sandbox.go:1211] Signal sandbox "f84bc25ff786a53ae655439d1ec56ad6cfb1966b089a5dbd250aada14ed1fee2"
D1129 09:43:47.023936 5663 sandbox.go:613] Connecting to sandbox "f84bc25ff786a53ae655439d1ec56ad6cfb1966b089a5dbd250aada14ed1fee2"
D1129 09:43:47.023951 5663 urpc.go:568] urpc: successfully marshalled 145 bytes.
D1129 09:43:47.024237 5663 urpc.go:611] urpc: unmarshal success.
I1129 09:43:47.024244 5663 main.go:224] Exiting with status: 0
$ ps aux | grep runsc
...
root 3883 5.5 0.0 2342892 32288 ? Ssl 09:38 0:26 runsc-sandbox --log-format=json --debug-log=/tmp/runsc-debug/ --log-packets=true --strace=true --systemd-cgroup=true --root=/var/run/docker/runtime-runc/moby --debug=true --log=/run/containerd/io.containerd.runtime.v2.task/moby/f84bc25ff786a53ae655439d1ec56ad6cfb1966b089a5dbd250aada14ed1fee2/log.json --log-fd=3 --debug-log-fd=4 boot --apply-caps=false --bundle=/run/containerd/io.containerd.runtime.v2.task/moby/f84bc25ff786a53ae655439d1ec56ad6cfb1966b089a5dbd250aada14ed1fee2 --controller-fd=12 --cpu-num=16 --dev-io-fd=-1 --gofer-filestore-fds=9 --gofer-mount-confs=lisafs:self,lisafs:none,lisafs:none,lisafs:none --io-fds=5,6,7,8 --mounts-fd=10 --setup-root=false --spec-fd=13 --start-sync-fd=11 --stdio-fds=14,15,16 --total-host-memory=66326761472 --total-memory=66326761472 --product-name=m7i.4xlarge --proc-mount-sync-fd=23 f84bc25ff786a53ae655439d1ec56ad6cfb1966b089a5dbd250aada14ed1fee2
$ sudo kill 3883
<docker still hung>
$ sudo kill -9 3883
<docker cli exits>
Could you share boot.txt logs (or rather all logs, like start, create, etc.) on m7i with Ubuntu 20.04?
Sure, here are all the runsc debug logs from an m7i with Ubuntu 20.04: https://gist.github.com/danielnorberg/db9460552d393ae15e7d66588792ec3b (same as shared above in the original issue description, too large to include verbatim)
Whoops, missed the original link. Thanks.
Hmm, seems like the boot process just dies suddenly. Smells like a seccomp violation (since the boot process is SIGKILLed). Could you check seccomp logs? (sudo ausearch --start today --end now | grep -i seccomp)
Per what you posted above, AppArmor may be running too:
Security Options:
apparmor
seccomp
Profile: builtin
I'm not familiar with AppArmor, but is it possible that's killing runsc? Maybe try running with it disabled. FWIW if you're already sandboxing jobs inside of gVisor, AppArmor shouldn't be necessary.
Hmm, seems like the boot process just dies suddenly. Smells like a seccomp violation (since the boot process is SIGKILLed). Could you check seccomp logs? (sudo ausearch --start today --end now | grep -i seccomp)
After apt install auditd to have ausearch available:
$ docker run --rm --runtime=runsc-debug hello-world; echo $?
137
$ sudo ausearch --start today --end now | grep -i seccomp; echo $?
1
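For reference, the 137 above follows the usual 128 + signal-number convention, so it corresponds to signal 9 (SIGKILL), and grep's exit status of 1 only means that no line matched. A small sketch of the decoding; `decode_status` is a hypothetical helper, not part of docker or runsc:

```shell
# Hypothetical helper: decode an exit status using the common convention that
# values above 128 mean "terminated by signal (status - 128)".
decode_status() {
  status=$1
  if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    # "kill -l N" prints the name of signal N without the SIG prefix.
    echo "killed by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exited with status $status"
  fi
}

decode_status 137
```

This only says that something delivered SIGKILL; it does not say who sent it, which is why the audit log and AppArmor are checked separately in this thread.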
I'm not familiar with AppArmor, but is it possible that's killing runsc? Maybe try running with it disabled. FWIW if you're already sandboxing jobs inside of gVisor, AppArmor shouldn't be necessary.
Seems AppArmor is loaded by default on the AWS-provided Ubuntu 20.04 image, and fwiw it does not seem to be causing issues on an m6i instance using the same Ubuntu image.
Still, brute-force disabling AppArmor on the m7i instance:
$ sudo apt remove apparmor
...
Removing apparmor (2.13.3-7ubuntu5.2) ...
...
$ sudo apparmor_status
sudo: apparmor_status: command not found
And then after reboot:
$ docker run --rm --runtime=runsc-debug hello-world; echo $?
137
I can confirm that the same issue manifests on GCP C3 Intel Sapphire Rapids instances.
$ curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/zone
projects/<redacted>/zones/europe-west4-a
$ curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/machine-type
projects/<redacted>/machineTypes/c3-highcpu-44
$ docker run --rm --runtime=runsc hello-world; echo $?
137
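Since the failing hosts reported here (m7i on AWS, C3 on GCP) are Sapphire Rapids machines while the working m6i is not, it may help to record the CPU model next to each result. A minimal sketch, assuming a Linux x86 host whose /proc/cpuinfo has a `model name` field; `cpu_model` is a hypothetical helper:

```shell
# Hypothetical helper: print the first "model name" value from a cpuinfo-style
# file. $1 is optional and defaults to /proc/cpuinfo (Linux only).
cpu_model() {
  grep -m1 '^model name' "${1:-/proc/cpuinfo}" | cut -d: -f2- | sed 's/^ *//'
}

cpu_model
```

Running it on both a working and a failing instance gives a quick way to confirm whether the CPU generation is the common factor.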