Comments (12)
I was able to quickly reproduce this issue in a fresh cluster using 5 cronjobs on a 1-minute schedule.
Manifest for the cronjobs:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-vol-1
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: test-job-1
  namespace: default
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: test
              image: busybox
              command: ["echo", "Hello"]
              volumeMounts:
                - mountPath: /test
                  name: test
          volumes:
            - name: test
              persistentVolumeClaim:
                claimName: test-vol-1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-vol-2
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: test-job-2
  namespace: default
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: test
              image: busybox
              command: ["echo", "Hello"]
              volumeMounts:
                - mountPath: /test
                  name: test
          volumes:
            - name: test
              persistentVolumeClaim:
                claimName: test-vol-2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-vol-3
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: test-job-3
  namespace: default
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: test
              image: busybox
              command: ["echo", "Hello"]
              volumeMounts:
                - mountPath: /test
                  name: test
          volumes:
            - name: test
              persistentVolumeClaim:
                claimName: test-vol-3
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-vol-4
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: test-job-4
  namespace: default
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: test
              image: busybox
              command: ["echo", "Hello"]
              volumeMounts:
                - mountPath: /test
                  name: test
          volumes:
            - name: test
              persistentVolumeClaim:
                claimName: test-vol-4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-vol-5
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: test-job-5
  namespace: default
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: test
              image: busybox
              command: ["echo", "Hello"]
              volumeMounts:
                - mountPath: /test
                  name: test
          volumes:
            - name: test
              persistentVolumeClaim:
                claimName: test-vol-5
Maybe it would make sense to export some Go runtime metrics via Prometheus to help debug this.
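That would be fairly cheap to add with client_golang. Here is a minimal sketch of what I mean (the :9189/metrics listen address is just a placeholder, not something the driver exposes today):

```go
// Minimal sketch only: exposes the standard Go runtime and process metrics
// via github.com/prometheus/client_golang. The listen address is a placeholder.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	reg := prometheus.NewRegistry()
	// Goroutine count, GC pause times, heap in use, etc.
	reg.MustRegister(collectors.NewGoCollector())
	// CPU seconds, RSS, open file descriptors of the controller process.
	reg.MustRegister(collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}))

	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":9189", nil))
}
```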
@apricote could you try it with version 1.2.0 again?
We have had the same deployment running in one of our test clusters for a few days and can't see CPU usage like in your screenshots.
@LKaemmerling I will do that once I have some free time.
I can still reproduce the issue (CSI Driver v1.2.1, K8s v1.16.3).
I set up dashboards with the data from #67 today, so perhaps I will have some results tomorrow.
I could not detect any noticeable changes in the metrics that correlate with the increased CPU usage.
For reference, I used the Go gRPC and Go Runtime dashboards, as well as some self-built charts to visualize the metrics.
I guess we have found the issue! It was a mix of returning the wrong error codes and the Kubernetes retry mechanism running out of control...
Legend:
- Before our change
- Removal of the "Aborted" error code
- Adding a dedicated error code for "volume already attached"
I will prepare the PR shortly. Thank you for your help!
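To illustrate the kind of change (only a sketch, not the actual PR; the exact status code and the errVolumeAlreadyAttached sentinel are placeholders): the point is to stop returning codes.Aborted for a volume that is already attached, because the CSI sidecars retry Aborted in a tight loop, and to return a dedicated, non-retried code instead.

```go
// Sketch of the idea, not the real csi-driver code. errVolumeAlreadyAttached
// is a hypothetical sentinel error from the cloud API client.
package driver

import (
	"errors"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

var errVolumeAlreadyAttached = errors.New("volume is already attached to another server")

// publishError maps a driver-internal error to a gRPC status for ControllerPublishVolume.
func publishError(err error) error {
	switch {
	case err == nil:
		return nil
	case errors.Is(err, errVolumeAlreadyAttached):
		// Previously this was codes.Aborted, which the external-attacher
		// retries immediately and endlessly; FailedPrecondition signals a
		// condition the caller has to resolve first.
		return status.Error(codes.FailedPrecondition, err.Error())
	default:
		return status.Error(codes.Internal, err.Error())
	}
}
```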
Could you try this specific container: lkdevelopment/csi-driver:1.2.2? It contains the fix (the image will be removed after the release).
cc @apricote
I re-ran the test with the new versions and the problem still persists for me.
Okay, could you give us some more information about your setup?
Which k8s version?
Which server types (and location)?
gRPC-related:
Which codes are returned? (https://grafana.com/grafana/dashboards/9186 --> graph "gRPC request code")
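(For context on where that graph gets its data: assuming the driver uses the standard go-grpc-prometheus interceptors, dashboard 9186 reads the grpc_server_handled_total counter, so the codes shown are whatever the gRPC server actually returned. Roughly wired up like the sketch below; not the driver's exact setup, and the addresses are placeholders.)

```go
// Sketch of how the per-code counts behind that graph are produced, assuming
// the standard go-grpc-prometheus interceptors; not the driver's exact setup.
package main

import (
	"log"
	"net"
	"net/http"

	grpc_prometheus "github.com/grpc-ecosystem/go-grpc-prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"google.golang.org/grpc"
)

func main() {
	srv := grpc.NewServer(
		// Counts every handled RPC in grpc_server_handled_total{grpc_code=...}.
		grpc.UnaryInterceptor(grpc_prometheus.UnaryServerInterceptor),
	)
	// ... register the CSI Identity/Controller/Node services on srv here ...
	grpc_prometheus.Register(srv)

	http.Handle("/metrics", promhttp.Handler())
	go func() { log.Fatal(http.ListenAndServe(":9189", nil)) }()

	lis, err := net.Listen("tcp", ":10000")
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(srv.Serve(lis))
}
```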
We deploy our Kubernetes cluster using Rancher with the docker-machine-driver-hetzner and ui-driver-hetzner.
Rancher: v2.3.2
Kubernetes: v1.16.3
Docker: v18.9.9
Server Type: cx21
Server Location: nbg1-dc3 (today's test), fsn1-dc14 (prod)
gRPC responses are split 50/50 between OK and Unavailable (v1.2.2):
Okay, thank you. We will try to reproduce this. My first idea is that this is normal behavior.
The process of attaching/detaching volumes is quite CPU intensive for the controller. We saw on our test cluster (with CCX31) that the controller uses ~0.5% of CPU. A CCX31 has 8 (dedicated) cores.
As a general thought, we recommend using CCX types for k8s workloads.
@apricote I have now monitored this on some cx21 servers for over a month and cannot see any abnormal behavior. The controller needs some resources because it issues a lot of attach and detach requests (and watches the results). Therefore I will close this issue.