Comments (17)
FYI #122828 documents that [Feature:GPUDevicePlugin] run Nvidia GPU Device Plugin tests is getting dropped!
from kubernetes.
cc @BenTheElder @dims @ameukam
from kubernetes.
Note that while the job is "green" after kubernetes/test-infra#32635, we are no longer running [Feature:GPUDevicePlugin] run Nvidia GPU Device Plugin tests, and the Windows test is skipped, so ... no real tests are run.
This issue is a sub-variant of kubernetes/test-infra#32242
from kubernetes.
/remove-sig k8s-infra
/sig node
/triage accepted
from kubernetes.
It seems fixed https://testgrid.k8s.io/sig-release-master-blocking#gce-device-plugin-gpu-master
from kubernetes.
> It seems fixed https://testgrid.k8s.io/sig-release-master-blocking#gce-device-plugin-gpu-master

It's not, see above #124950 (comment)
The [Feature:GPUDevicePlugin] run Nvidia GPU Device Plugin tests test is no longer even listed among the stale tests because it hasn't run in so long. It was still showing up as a stale test yesterday. We're running no actual GPU tests currently.
from kubernetes.
I don't even see [Feature:GPUDevicePlugin] run Nvidia GPU Device Plugin tests in the skipped results.
The only matching test is for Windows ... asking in SIG Node Slack whether something changed with the test cases.
https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-device-plugin-gpu/1795477868910743552/artifacts/junit_01.xml
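For anyone who wants to repeat this check against a downloaded junit_01.xml, here is a minimal sketch. It assumes the artifact follows the common JUnit layout (`<testcase>` elements, with a `<skipped>` child on skipped cases). The function name `summarize_junit`, the `SAMPLE` document, and the test names inside it are hypothetical and are not taken from the real artifact.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text, marker):
    """Split testcases whose name contains `marker` into (ran, skipped) name lists.

    Assumes the usual JUnit layout: <testcase name="..."> elements, where a
    skipped case carries a <skipped> child element.
    """
    root = ET.fromstring(xml_text)
    ran, skipped = [], []
    # iter() walks the whole tree, so this works whether the root element
    # is <testsuites> or a single <testsuite>.
    for case in root.iter("testcase"):
        name = case.get("name", "")
        if marker not in name:
            continue
        if case.find("skipped") is not None:
            skipped.append(name)
        else:
            ran.append(name)
    return ran, skipped


# Hypothetical sample: one skipped test carrying the feature tag and one
# unrelated test. These names are illustrative only.
SAMPLE = """<testsuites>
  <testsuite name="Kubernetes e2e suite" tests="2">
    <testcase name="[sig-windows] [Feature:GPUDevicePlugin] hypothetical windows test"
              classname="Kubernetes e2e suite">
      <skipped/>
    </testcase>
    <testcase name="[sig-node] hypothetical unrelated test"
              classname="Kubernetes e2e suite"/>
  </testsuite>
</testsuites>"""

if __name__ == "__main__":
    ran, skipped = summarize_junit(SAMPLE, "[Feature:GPUDevicePlugin]")
    print("ran:", len(ran), "skipped:", len(skipped))
```

An empty `ran` list for the feature tag is the situation described here: the job can report success even though no tagged test actually executed.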
from kubernetes.
The test cases were all deleted ... bf268f0#diff-7629c065680da0396ef2e8d190ce7cdd1dbf2c336f99c22ec543a4be61d74ccd
from kubernetes.
NOTE: This also impacts the EC2 job, which is no longer running any test cases.
The GCE job is running and "passing", same as the EC2 job now ... neither of which runs any tests.
/retitle EC2 + GCE GPU CI Jobs not running any test cases
See an old run: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-ec2-device-plugin-gpu/1781225142752382976
(ran: Kubernetes e2e suite: [It] [sig-scheduling] [Feature:GPUDevicePlugin] run Nvidia GPU Device Plugin tests, 8 tests passed; the other "tests" are just cluster bringup / test runner etc.)
Current:
(7 "tests" passed, none of which are actual e2e tests)
from kubernetes.
https://testgrid.k8s.io/sig-release-1.30-blocking#gce-device-plugin-gpu-1.30
keeps failing
from kubernetes.
Yes, I think the job is coming up but the driver install or device plugin is not. We need to add more log dumping there. I've been discussing a bit with dims what we should do about the test removal in #sig-node: https://kubernetes.slack.com/archives/C0BP8PW9G/p1716914276819719?thread_ts=1716913485.823089&cid=C0BP8PW9G
from kubernetes.
Talked to @elfinhe this morning about the driver install.
from kubernetes.
I'm going to revisit test cases once we figure out the driver install issue on 1.30 with existing test cases on that branch. There are WIP PRs for this and I'm in contact with the team supporting us on the driver problems, providing upstream CI pointers. #125208 / #125206
from kubernetes.
Googlers: b/344684158 bug tracking driver issue at GCP.
[We'll update the PR and comment back here when it's sorted]
from kubernetes.
Notes from sig-node CI meeting:
- This is not release blocking
from kubernetes.
For the GPU tests, we now have a driver install fix for the 1.30 branch at #125208.
We will need to backport it to other branches, forward-port it to master, and then re-introduce the device plugin GPU tests to master.
from kubernetes.