Comments (45)
Fixed the remaining failing tests. Both jobs are green now:
- https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv1/1770214426134188032
- https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv2/1770214426197102592
We tried increasing the timeout from 4 hours to 6 hours, but the job still times out. Something is not right, and we could use some help figuring out why this is happening.
I don't think there is anything wrong with the long-running tests. There are a lot of slow tests in the job, and if the infra is loaded with other tasks, all of the tests run slower.
Here is a list of test durations for a successful and a failed job run. I don't see anything that can't be explained by the busyness of the infra. Please correct me if I'm wrong here.
- successful run (https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/123386/pull-kubernetes-node-kubelet-serial-crio-cgroupv1/1767678095382286336/build-log.txt)
$ grep '\[[0-9]\+\.[0-9]\+ seconds\]' build-log.123386.passed.txt | sed 's/.*\[\(.*\) seconds\].*/\1/' | sort -un
0.011
2.033
4.033
5.365
6.196
8.067
10.381
14.344
16.362
18.369
28.415
32.516
33.452
34.098
44.494
46.470
48.319
50.448
52.412
54.300
64.436
70.343
73.407
75.377
81.594
84.577
104.448
106.508
120.254
130.551
146.379
148.473
154.882
179.385
194.662
201.310
203.016
209.513
290.059
327.427
384.706
388.334
402.381
409.849
485.483
486.376
619.149
624.753
850.932
893.079
967.962
980.838
984.849
986.872
991.022
1008.993
1042.057
Total seconds:
$ (grep '\[[0-9]\+\.[0-9]\+ seconds\]' build-log.123386.passed.txt | sed 's/.*\[\(.*\) seconds\].*/\1/' | sort -un | tr "\012" "+" ; echo "0") | bc
15984.321
- failed run:
$ grep '\[[0-9]\+\.[0-9]\+ seconds\]' build-log.123386.failed.txt | sed 's/.*\[\(.*\) seconds\].*/\1/' | sort -un
0.010
2.162
4.038
5.333
6.023
8.082
10.371
14.333
18.370
22.381
34.448
44.623
46.310
56.427
62.302
64.531
70.368
76.363
87.083
106.517
120.064
134.263
146.335
154.730
194.619
201.182
202.974
203.502
303.038
327.277
375.703
388.318
408.687
485.539
486.443
491.590
494.590
496.764
509.069
619.159
627.600
764.817
967.865
980.931
981.038
986.906
991.223
1009.039
1030.938
1043.115
1064.974
1157.038
1408.810
1417.362
Total seconds:
$ (grep '\[[0-9]\+\.[0-9]\+ seconds\]' build-log.123386.failed.txt | sed 's/.*\[\(.*\) seconds\].*/\1/' | sort -un | tr "\012" "+" ; echo "0") | bc
21915.577
One strange thing, though: the total time for the failed run (21915.577 sec / 60 = 365.26 min) is less than the job timeout (420 min). Any ideas why the job timed out?
An interesting data point I found comparing the containerd serial and crio serial jobs.
On containerd:
[sig-node] SeccompDefault [Serial] [Feature:SeccompDefault] [LinuxOnly] with SeccompDefault enabled should use unconfined when specified [sig-node, Serial, Feature:SeccompDefault]
k8s.io/kubernetes/test/e2e_node/seccompdefault_test.go:77
STEP: Creating a kubernetes client @ 03/15/24 03:08:28.153
STEP: Building a namespace api object, basename seccompdefault-test @ 03/15/24 03:08:28.154
I0315 03:08:28.157768 1335 framework.go:275] Skipping waiting for service account
STEP: Stopping the kubelet @ 03/15/24 03:08:28.169
I0315 03:08:28.186800 1335 util.go:368] Get running kubelet with systemctl: UNIT LOAD ACTIVE SUB DESCRIPTION
kubelet-20240315T012703.service loaded active running /tmp/node-e2e-20240315T012703/kubelet --kubeconfig /tmp/node-e2e-20240315T012703/kubeconfig --root-dir /var/lib/kubelet --v 4 --config-dir /tmp/node-e2e-20240315T012703/kubelet.conf.d --hostname-override tmp-node-e2e-32a814d3-cos-beta-113-18244-1-14 --container-runtime-endpoint unix:///run/containerd/containerd.sock --config /tmp/node-e2e-20240315T012703/kubelet-config --feature-gates=DisableKubeletCloudCredentialProviders=true --image-credential-provider-config=/tmp/node-e2e-20240315T012703/credential-provider.yaml --image-credential-provider-bin-dir=/tmp/node-e2e-20240315T012703 --kernel-memcg-notification=true --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/containerd.service
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
, kubelet-20240315T012703
W0315 03:08:28.224588 1335 util.go:500] Health check on "http://127.0.0.1:10248/healthz" failed, error=Head "http://127.0.0.1:10248/healthz": read tcp 127.0.0.1:56986->127.0.0.1:10248: read: connection reset by peer
STEP: Starting the kubelet @ 03/15/24 03:08:28.236
W0315 03:08:28.263213 1335 util.go:500] Health check on "http://127.0.0.1:10248/healthz" failed, error=Head "http://127.0.0.1:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused
STEP: Creating a pod to test SeccompDefault-unconfined @ 03/15/24 03:08:33.268
STEP: Saw pod success @ 03/15/24 03:08:37.282
I0315 03:08:37.284826 1335 output.go:196] Trying to get logs from node tmp-node-e2e-32a814d3-cos-beta-113-18244-1-14 pod seccompdefault-test-04e6a259-188d-4ee1-b296-d97a82096652 container seccompdefault-test-04e6a259-188d-4ee1-b296-d97a82096652: <nil>
STEP: delete the pod @ 03/15/24 03:08:37.294
STEP: Stopping the kubelet @ 03/15/24 03:08:37.301
I0315 03:08:37.319465 1335 util.go:368] Get running kubelet with systemctl: UNIT LOAD ACTIVE SUB DESCRIPTION
kubelet-20240315T012703.service loaded active running /tmp/node-e2e-20240315T012703/kubelet --kubeconfig /tmp/node-e2e-20240315T012703/kubeconfig --root-dir /var/lib/kubelet --v 4 --config-dir /tmp/node-e2e-20240315T012703/kubelet.conf.d --hostname-override tmp-node-e2e-32a814d3-cos-beta-113-18244-1-14 --container-runtime-endpoint unix:///run/containerd/containerd.sock --config /tmp/node-e2e-20240315T012703/kubelet-config --feature-gates=DisableKubeletCloudCredentialProviders=true --image-credential-provider-config=/tmp/node-e2e-20240315T012703/credential-provider.yaml --image-credential-provider-bin-dir=/tmp/node-e2e-20240315T012703 --kernel-memcg-notification=true --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/containerd.service
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
, kubelet-20240315T012703
W0315 03:08:37.359305 1335 util.go:500] Health check on "http://127.0.0.1:10248/healthz" failed, error=Head "http://127.0.0.1:10248/healthz": read tcp 127.0.0.1:52932->127.0.0.1:10248: read: connection reset by peer
STEP: Starting the kubelet @ 03/15/24 03:08:37.37
W0315 03:08:37.400901 1335 util.go:500] Health check on "http://127.0.0.1:10248/healthz" failed, error=Head "http://127.0.0.1:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused
I0315 03:08:42.408960 1335 helper.go:121] Waiting up to 7m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "seccompdefault-test-1780" for this suite. @ 03/15/24 03:08:42.411
• [14.261 seconds]
This test takes 14 seconds.
On crio, it takes 974 seconds.
[sig-node] SeccompDefault [Serial] [Feature:SeccompDefault] [LinuxOnly] with SeccompDefault enabled should use unconfined when specified [sig-node, Serial, Feature:SeccompDefault]
k8s.io/kubernetes/test/e2e_node/seccompdefault_test.go:77
STEP: Creating a kubernetes client @ 03/15/24 10:56:41.446
STEP: Building a namespace api object, basename seccompdefault-test @ 03/15/24 10:56:41.446
I0315 10:56:41.450556 2955 framework.go:275] Skipping waiting for service account
STEP: Stopping the kubelet @ 03/15/24 10:56:41.456
I0315 10:58:41.541012 2955 util.go:368] Get running kubelet with systemctl: UNIT LOAD ACTIVE SUB DESCRIPTION
kubelet-20240315T081727.service loaded active running /tmp/node-e2e-20240315T081727/kubelet --kubeconfig /tmp/node-e2e-20240315T081727/kubeconfig --root-dir /var/lib/kubelet --v 4 --config-dir /tmp/node-e2e-20240315T081727/kubelet.conf.d --hostname-override n1-standard-4-fedora-coreos-39-20240225-3-0-gcp-x86-64-82383702 --container-runtime-endpoint unix:///var/run/crio/crio.sock --config /tmp/node-e2e-20240315T081727/kubelet-config --cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/crio.service --kubelet-cgroups=/system.slice/kubelet.service
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
, kubelet-20240315T081727
W0315 11:02:41.775895 2955 util.go:500] Health check on "http://127.0.0.1:10248/healthz" failed, error=Head "http://127.0.0.1:10248/healthz": read tcp 127.0.0.1:60264->127.0.0.1:10248: read: connection reset by peer
STEP: Starting the kubelet @ 03/15/24 11:02:41.784
W0315 11:04:41.838220 2955 util.go:500] Health check on "http://127.0.0.1:10248/healthz" failed, error=Head "http://127.0.0.1:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused
STEP: Creating a pod to test SeccompDefault-unconfined @ 03/15/24 11:04:46.845
STEP: Saw pod success @ 03/15/24 11:04:50.858
I0315 11:04:50.860386 2955 output.go:196] Trying to get logs from node n1-standard-4-fedora-coreos-39-20240225-3-0-gcp-x86-64-82383702 pod seccompdefault-test-fe386ac9-d280-465e-9dd3-4aa471ecf5a7 container seccompdefault-test-fe386ac9-d280-465e-9dd3-4aa471ecf5a7: <nil>
STEP: delete the pod @ 03/15/24 11:04:50.868
STEP: Stopping the kubelet @ 03/15/24 11:04:50.874
I0315 11:06:50.938206 2955 util.go:368] Get running kubelet with systemctl: UNIT LOAD ACTIVE SUB DESCRIPTION
kubelet-20240315T081727.service loaded active running /tmp/node-e2e-20240315T081727/kubelet --kubeconfig /tmp/node-e2e-20240315T081727/kubeconfig --root-dir /var/lib/kubelet --v 4 --config-dir /tmp/node-e2e-20240315T081727/kubelet.conf.d --hostname-override n1-standard-4-fedora-coreos-39-20240225-3-0-gcp-x86-64-82383702 --container-runtime-endpoint unix:///var/run/crio/crio.sock --config /tmp/node-e2e-20240315T081727/kubelet-config --cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/crio.service --kubelet-cgroups=/system.slice/kubelet.service
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
, kubelet-20240315T081727
W0315 11:10:51.109902 2955 util.go:500] Health check on "http://127.0.0.1:10248/healthz" failed, error=Head "http://127.0.0.1:10248/healthz": read tcp 127.0.0.1:41882->127.0.0.1:10248: read: connection reset by peer
STEP: Starting the kubelet @ 03/15/24 11:10:51.119
W0315 11:12:51.216556 2955 util.go:500] Health check on "http://127.0.0.1:10248/healthz" failed, error=Head "http://127.0.0.1:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused
I0315 11:12:56.223758 2955 helper.go:121] Waiting up to 7m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "seccompdefault-test-6757" for this suite. @ 03/15/24 11:12:56.225
• [974.782 seconds]
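Comparing the timestamps above, almost all of the extra time on crio sits in the "Stopping the kubelet" / "Starting the kubelet" steps (10:56:41 -> 10:58:41 -> 11:02:41 -> 11:04:41), not in the pod itself. A quick way to pull those step timestamps out of a build log, in the same spirit as the grep one-liners above (the file name is just a placeholder):
$ grep -E 'STEP: (Stopping|Starting) the kubelet @' build-log.txt
The gaps between consecutive timestamps show how long each kubelet stop/start took.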
Job run with only the Restart and Resource-usage slow tests enabled took more than 5 hours and didn't produce enough artifacts: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv2/1769052751628603392
The Restart test is a current suspect. A test run with resource_usage and system_node_critical enabled takes 1h 46m and looks good: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv1/1769276078355910656
I'll try to enable more slow tests with Restart excluded.
I was referring to the job shared in the description: https://testgrid.k8s.io/sig-node-kubelet#kubelet-gce-e2e-swap-fedora-serial
Job run with all tests except Restart:Dbus takes 2h 9m and looks good: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv1/1769659931427868672
So the Restart:Dbus test case seems to slow everything down.
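For anyone who wants to double-check this locally before we touch the job config, a minimal sketch of running the node e2e suite with that case skipped; it assumes the FOCUS/SKIP variables of the test-e2e-node make target and that "Restart:Dbus" matches the spec text (remote/runtime setup flags omitted):
$ make test-e2e-node FOCUS="\[Serial\]" SKIP="Restart:Dbus"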
Is this applicable/needed also for Fedora?
It looks like it restarts dbus and systemd, which theoretically should work on any Linux with systemd and dbus. However, that comment could mean that it was never tested on anything except Ubuntu.
I opened up #124000.
I talked with a few people at Red Hat and it was suggested that this test is bad. Going to try just skipping it on Fedora, since restarting dbus is not recommended.
I asked @kwilczynski, as he has been working on graceful shutdown, about a few of these problems on RHEL-like OSes. I think skipping should be fine for this.
Nice job @bart0sh!
https://testgrid.k8s.io/sig-node-cri-o#node-kubelet-serial-crio
This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Looks like our recent PR didn't fix this.
Looking into the serial logs, it looks like these tests are hitting OOM.
[ 3276.883455] Out of memory: Killed process 39048 (dd) total-vm:10490032kB, anon-rss:3218048kB, file-rss:80kB, shmem-rss:0kB, UID:0 pgtables:6352kB oom_score_adj:1000
[ 3276.898409] Out of memory: Killed process 39048 (dd) total-vm:10490032kB, anon-rss:3218048kB, file-rss:80kB, shmem-rss:0kB, UID:0 pgtables:6352kB oom_score_adj:1000
/reopen
Memory limits did not help.
@kannon92: Reopened this issue.
In response to this:
/reopen
Memory limits did not help..
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@kannon92: Reopened this issue.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Bumping machine size didn't fix this.
/cc
👀
[ 3276.898409] Out of memory: Killed process 39048 (dd) total-vm:10490032kB, anon-rss:3218048kB, file-rss:80kB, shmem-rss:0kB, UID:0 pgtables:6352kB oom_score_adj:1000
dd is run in a couple of tests as far as I can see:
$ git grep 'dd if' ./test/e2e_node/
test/e2e_node/eviction_test.go: return podWithCommand(volumeSource, resources, diskConsumedMB, name, fmt.Sprintf("dd if=/dev/urandom of=%s${i} bs=1048576 count=1 2>/dev/null; sleep .1;", filepath.Join(path, "file")))
test/e2e_node/oomkiller_linux_test.go: "sleep 5 && dd if=/dev/zero of=/dev/null bs=20M",
test/e2e_node/oomkiller_linux_test.go: "sleep 5 && dd if=/dev/zero of=/dev/null bs=20M & sleep 86400",
test/e2e_node/oomkiller_linux_test.go: "sleep 5 && dd if=/dev/zero of=/dev/null iflag=fullblock count=10 bs=10G",
test/e2e_node/system_node_critical_test.go: "dd if=/dev/urandom of=file${i} bs=10485760 count=1 2>/dev/null; sleep .1;",
eviction_test.go runs dd to simulate disk pressure, and system_node_critical_test.go is similar. So I'd start with oomkiller_linux_test.go. The only Serial test I can find is this one: https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/oomkiller_linux_test.go#L46.
I've removed the Serial label from it to see if that helps; here is a test PR: #120459. Let's see how it goes.
This looks quite dangerous and explains why bigger instances and limits didn't help: https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/oomkiller_linux_test.go#L250
It tries to allocate 100GB of memory.
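For reference, the relevant line from the git grep above, with my reading of the dd flags (the size math is an interpretation, not something stated in the test):
# test/e2e_node/oomkiller_linux_test.go
sleep 5 && dd if=/dev/zero of=/dev/null iflag=fullblock count=10 bs=10G
# bs=10G with iflag=fullblock makes dd hold a single ~10GiB read buffer in memory,
# which lines up with the ~10GB total-vm in the "Out of memory: Killed process ... (dd)"
# messages; count=10 means roughly 100GB would pass through it if it weren't killed first.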
A successful run has 'Out of memory: Killed process' messages as well, e.g. https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/123386/pull-kubernetes-node-kubelet-serial-crio-cgroupv1/1763620904253788160/artifacts/n1-standard-2-fedora-coreos-39-20240210-3-0-gcp-x86-64-4346b1d8/serial-1.log
@kannon92 It looks like the OOMKiller message is expected and doesn't affect this issue. It's produced by the "OOMKiller for pod using more memory than node allocatable [LinuxOnly]" test case mentioned above. Without that test case the job still times out and doesn't produce artifacts. Here is an example of a failed, timed-out job run without the test case: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv2/1768026002467852288
Without slow tests pull-kubernetes-node-kubelet-serial-crio-cgroupv1 finished in around 1h 25m and produced valid artifacts.
Without slow tests and with the OOMKiller test, the job still works OK: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv2/1768343770648023040
With all tests enabled, the issue persisted despite the 10h timeout: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv2/1768400136934789120
I'll start bisecting; hopefully I'll spot the test that causes everything to slow down.
One thing I noticed is that these tests are running two times. You can search for the start tests banner and see that we restart if the test failed.
One of the failures in the failed link was because the kubelet restart took longer than the ImageMaximumGCAge of 1 minute; is this expected? Attempt to fix this: #123949.
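If we want to rule out the GC-age interaction while keeping the test, one option might be a kubelet config drop-in that raises the age well past a slow restart. This is only a sketch: it assumes the imageMaximumGCAge KubeletConfiguration field and reuses the kubelet.conf.d --config-dir directory visible in the kubelet command lines above (path elided):
$ cat > /tmp/node-e2e-<timestamp>/kubelet.conf.d/99-image-gc-age.conf <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageMaximumGCAge: "10m"
EOF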
Looking into what tests take the longest, I notice that the *Manager jobs are all pretty beefy.
@ffromani @swatisehgal WDYT of moving these to a separate serial job?
Running this test on GCP with CRI-O, I don't see why it is taking that long. I think there is something else going on in this test. In the runs that were passing, this test only takes 14 seconds.
With all but the last 3 slow tests excluded, the job still ran for more than 8h: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv1/1768671241448722432
Remaining slow tests: Resource-usage, Restart and SystemNodeCriticalPod.
In contrast, the job with all slow tests excluded took only 1h 25m: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv1/1768343770585108480
I'm not sure if the slowness comes from one of those 3 tests or from something else.
I'll remove them and run the job again.
One thing I noticed is that these tests are running two times. You can search for the start tests banner and see that we restart if the test failed.
I don't think they're running twice. The build log is duplicated if the test run fails, I guess. Look at the duration for the same test case: the values are exactly the same to high precision, which wouldn't be possible for two different test runs.
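A quick way to confirm that from the log itself, reusing the earlier extraction but without -u so duplicates stay visible:
$ grep '\[[0-9]\+\.[0-9]\+ seconds\]' build-log.123386.failed.txt | sed 's/.*\[\(.*\) seconds\].*/\1/' | sort -n | uniq -c | sort -rn | head
If the log is simply duplicated, each duration value should show up exactly twice.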
The successful run is executed on n1-standard-2, the failed one on n1-standard-4. Could it be something related to this data point?
A job run with the resource_usage, system_node_critical, log rotation, density and eviction tests enabled takes 1h 51m and looks good: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv2/1769311222651424768
So far so good.
Side note: I'm not sure I understand why the log_rotation, density and eviction tests are marked slow, as altogether they take around 5 minutes.
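To double-check which specs are actually tagged slow, something like this should work (assuming slow tests are marked either with a [Slow] text tag or the framework.WithSlow() decorator):
$ git grep -nE 'WithSlow\(\)|\[Slow\]' test/e2e_node/ | head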
Job run with all tests except Restart takes 2h and still looks good: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv1/1769356436250300416
I'll try to investigate the Restart test.
Job run with all tests except Restart:Container Runtime and Restart:Dbus takes 2h 4m and still looks good: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-crio-cgroupv1/1769457335215853568
The successful run is executed on n1-standard-2, the failed one on n1-standard-4. Could it be something related to this data point?
Can you point out those runs, please? In my testing both pull jobs use n1-standard-4 and the results don't differ.
Nice! So we can just filter out that job then? Not sure if we have a slow lane for crio/containerd?
@kannon92 Checking the code, it seems it is performing an Ubuntu-specific task:
kubernetes/test/e2e_node/restart_test.go, lines 153 to 154 in aa73f31
kubernetes/test/e2e_node/node_shutdown_linux_test.go, lines 661 to 683 in aa73f31
Is this applicable/needed also for Fedora?
Restart:Dbus is a test case in the Restart test: https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/restart_test.go#L151-L203
We can either remove/comment it out or investigate why it slows everything down.
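Before removing it, a rough manual probe on a Fedora CoreOS node might narrow things down. This is only a sketch of what the case appears to exercise (restarting dbus and then seeing whether systemd and kubelet operations stall); the unit name pattern and the health endpoint are taken from the logs above:
$ sudo systemctl restart dbus
$ time sudo systemctl status 'kubelet-*' --no-pager    # do systemctl calls hang after the dbus restart?
$ time curl -sS http://127.0.0.1:10248/healthz; echo   # kubelet healthz, as polled in the test logs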
The Dbus test case was added by this commit.
@endocrimes Can you help us investigate why the Restart:Dbus test case slows everything down on Fedora CoreOS?
@kannon92 The Dbus test case was added by this PR and tested only with pull-kubernetes-node-kubelet-serial-containerd, as crio pull jobs didn't exist at that time. However, it should be failing/slowing down CI jobs if they exist.
/assign