
csi-driver's People

Contributors

3cky, alexanderkjeldaas, apricote, choffmeister, costela, dependabot[bot], drallgood, fhofherr, githubixx, guettli, hcloud-bot, invidian, jooola, laurigates, lkaemmerling, mavimo, morremeyer, mvhirsch, nottheevilone, onpaws, renovate[bot], resmo, s-soroosh, s4ke, samcday, sisheogorath, sui77, testwill, thcyron, tpo


csi-driver's Issues

Attaching fails because volume is wrongfully assumed to be attached elsewhere

The symptoms look like this: a pod doesn't start because of the following event:

alvaro@t470s-[2019-06-20-22:50]:[master]
[...]
  Warning  FailedAttachVolume  3s    attachdetach-controller  AttachVolume.Attach failed for volume "pvc-7ff22820-9333-11e9-b435-96000019d538" : rpc error: code = Internal desc = failed to publish volume: volume is already attached to a server, please detach it first (service_error)

My first assumption was that the controller does something wrong when detaching, but:

alvaro@t470s-[2019-06-20-22:50]:[master]
$ hcloud volume detach pvc-7ff22820-9333-11e9-b435-96000019d538
hcloud: volume not attached to a server (service_error)

I then assumed there might be an inconsistency in the API that prevents attaching, but:

alvaro@t470s-[2019-06-20-22:52]:[master]
$ hcloud volume attach --server [redacted]-5446d9cdc8-rpkfk pvc-7ff22820-9333-11e9-b435-96000019d538 
   1s [====================================================================] 100%
Volume 2758011 attached to server [redacted]-5446d9cdc8-rpkfk

So it seems the controller believes the volume is attached when it actually is not, tries to detach it, and that detach fails because nothing is attached, leaving the whole operation stuck.

context deadline exceeded

Hi there.
I followed the guide and installed all the necessary components. The k8s version is 1.14.0.
I successfully created two PVCs, but after that I cannot create new ones.
Everything stays in the Pending state and I see this error:
failed to provision volume with StorageClass "hcloud-volumes": rpc error: code = DeadlineExceeded desc = context deadline exceeded

I tried k8s 1.15 too but am facing the same problem. Can anyone suggest what's going on?

Cannot create new volumes

Thanks for providing this nice driver!

I am currently running into an issue: my Jenkins X installation needs 5 PVCs. Two of them were created without problems:

NAME                        STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
jenkins                     Bound     pvc-823eb68d-11c2-11e9-8fdf-96000017bcd0   30Gi       RWO            hcloud-volumes   88m
jenkins-x-chartmuseum       Pending                                                                        hcloud-volumes   88m
jenkins-x-docker-registry   Bound     pvc-823cf6f5-11c2-11e9-8fdf-96000017bcd0   100Gi      RWO            hcloud-volumes   88m
jenkins-x-mongodb           Pending                                                                        hcloud-volumes   88m
jenkins-x-nexus             Pending                                                                        hcloud-volumes   88m

The other three are still in the Pending state.

Investigating the csi-provisioner logs I see the following error on each retry for the pending volumes:

W0106 16:28:07.606659       1 controller.go:685] Retrying syncing claim "jx/jenkins-x-chartmuseum" because failures 1 < threshold 15
E0106 16:28:07.606746       1 controller.go:700] error syncing claim "jx/jenkins-x-chartmuseum": failed to provision volume with StorageClass "hcloud-volumes": rpc error: code = Internal desc = failed to create volume: invalid input in field 'size' (invalid_input)
I0106 16:28:07.607110       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"jx", Name:"jenkins-x-chartmuseum", UID:"823a7822-11c2-11e9-8fdf-96000017bcd0", APIVersion:"v1", ResourceVersion:"87043", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "hcloud-volumes": rpc error: code = Internal desc = failed to create volume: invalid input in field 'size' (invalid_input)
I0106 16:28:07.614090       1 controller.go:191] GRPC response: {}
I0106 16:28:07.614549       1 controller.go:192] GRPC error: rpc error: code = Internal desc = failed to create volume: invalid input in field 'size' (invalid_input)

I wonder why this error occurs, since creating the other two volumes worked.
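
For reference, one plausible cause (hedged, since the failing claim specs are not shown here) is a requested size below Hetzner Cloud's 10 GB minimum volume size, which the API rejects as an invalid 'size'. A claim that satisfies the minimum would look like this; the name is only illustrative:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-x-chartmuseum    # illustrative name, mirroring the pending claim above
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi              # Hetzner Cloud volumes must be at least 10 GB
  storageClassName: hcloud-volumes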

Pod failed to mount volume

Hi,

I followed the README instructions, but in the verification step (step 6) the PVC was created while the pod failed to mount the volume.

I get this error in kubectl describe pod my-csi-app:

Type     Reason                  Age                   From                     Message
  ----     ------                  ----                  ----                     -------
  Normal   Scheduled               29m                   default-scheduler        Successfully assigned default/my-csi-app to worker1
  Normal   SuccessfulAttachVolume  29m                   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-eec9e766-5866-11e9-b8bf-96000017da8e"
  Warning  FailedMount             7m23s (x10 over 27m)  kubelet, worker1         Unable to mount volumes for pod "my-csi-app_default(ef206466-5866-11e9-b8bf-96000017da8e)": timeout expired waiting for volumes to attach or mount for pod "default"/"my-csi-app". list of unmounted volumes=[my-csi-volume]. list of unattached volumes=[my-csi-volume default-token-wk72f]
  Warning  FailedMount             3m8s (x21 over 29m)   kubelet, worker1         MountVolume.MountDevice failed for volume "pvc-eec9e766-5866-11e9-b8bf-96000017da8e" : driver name csi.hetzner.cloud not found in the list of registered CSI drivers

I use Rancher v2.1.8, RKE, and Kubernetes v1.13.5.

To enable the feature gates, I added the following config to my cluster.yaml before running rke up:

services:
  kube-apiserver:
    extra_args:
      feature-gates: "CSINodeInfo=true,CSIDriverRegistry=true"
  kubelet:
    extra_args:
      feature-gates: "CSINodeInfo=true,CSIDriverRegistry=true"

I'm not sure if feature-gates are enabled.

P.S. I'm new to the Kubernetes world.

Topology awareness: CSI Controller creates volumes in its own location

This is a follow-up to my comment in #11. I can reproduce the behaviour with hetznercloud/hcloud-csi-driver:1.1.4 on Kubernetes v1.14.3.

The CSI controller is deployed on a node located in fsn1:

$ kubectl -n kube-system get pod hcloud-csi-controller-0 -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP           NODE                          NOMINATED NODE   READINESS GATES
hcloud-csi-controller-0   4/4     Running   0          8h    10.42.4.20   k8s-infrastructure-worker-1   <none>           <none>

$ kubectl get node --selector csi.hetzner.cloud/location=fsn1,role=worker
NAME                          STATUS   ROLES                      AGE   VERSION
k8s-infrastructure-worker-1   Ready    controlplane,etcd,worker   23h   v1.14.3

I used the following config to create 3 deployments of 1 pod each in fsn1, nbg1 and hel1:

---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: ubuntu-fsn1
  namespace: default
spec:
  selector:
    matchLabels:
      app: csi-test
  replicas: 1
  template:
    metadata:
      labels:
        app: csi-test
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: location
                operator: In
                values:
                - fsn1
      containers:
      - image: ubuntu:xenial
        name: ubuntu-fsn1
        stdin: true
        tty: true
        volumeMounts:
        - mountPath: /mnt
          name: ubuntu-fsn1
      volumes:
      - name: ubuntu-fsn1
        persistentVolumeClaim:
          claimName: ubuntu-fsn1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ubuntu-fsn1
  labels:
    app: csi-test
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: ubuntu-nbg1
  namespace: default
spec:
  selector:
    matchLabels:
      app: csi-test
  replicas: 1
  template:
    metadata:
      labels:
        app: csi-test
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: location
                operator: In
                values:
                - nbg1
      containers:
      - image: ubuntu:xenial
        name: ubuntu-nbg1
        stdin: true
        tty: true
        volumeMounts:
        - mountPath: /mnt
          name: ubuntu-nbg1
      volumes:
      - name: ubuntu-nbg1
        persistentVolumeClaim:
          claimName: ubuntu-nbg1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ubuntu-nbg1
  labels:
    app: csi-test
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: ubuntu-hel1
  namespace: default
spec:
  selector:
    matchLabels:
      app: csi-test
  replicas: 1
  template:
    metadata:
      labels:
        app: csi-test
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: location
                operator: In
                values:
                - hel1
      containers:
      - image: ubuntu:xenial
        name: ubuntu-hel1
        stdin: true
        tty: true
        volumeMounts:
        - mountPath: /mnt
          name: ubuntu-hel1
      volumes:
      - name: ubuntu-hel1
        persistentVolumeClaim:
          claimName: ubuntu-hel1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ubuntu-hel1
  labels:
    app: csi-test
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
$ kubectl apply -f deployment.yaml
deployment.apps/ubuntu-fsn1 created
persistentvolumeclaim/ubuntu-fsn1 created
deployment.apps/ubuntu-nbg1 created
persistentvolumeclaim/ubuntu-nbg1 created
deployment.apps/ubuntu-hel1 created
persistentvolumeclaim/ubuntu-hel1 created

Only the fsn1 pod actually starts:

kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP           NODE                       NOMINATED NODE   READINESS GATES
ubuntu-fsn1-6d5c54d48c-p7k7q   1/1     Running   0          86s   10.42.3.42   k8s-infrastructure-web-1   <none>           <none>
ubuntu-hel1-7596c45f45-q7fxx   0/1     Pending   0          86s   <none>       <none>                     <none>           <none>
ubuntu-nbg1-84bb6b947c-wkh7j   0/1     Pending   0          86s   <none>       <none>                     <none>           <none>

Checking, for example, the hel1 pod:

describe pod ubuntu-hel1-7596c45f45-q7fxx
Name:               ubuntu-hel1-7596c45f45-q7fxx
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app=csi-test
                    pod-template-hash=7596c45f45
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      ReplicaSet/ubuntu-hel1-7596c45f45
Containers:
  ubuntu-hel1:
    Image:        ubuntu:xenial
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /mnt from ubuntu-hel1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xyz (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  ubuntu-hel1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  ubuntu-hel1
    ReadOnly:   false
  default-token-dbzkq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-xyz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  2m48s (x2 over 2m48s)  default-scheduler  persistentvolumeclaim "ubuntu-hel1" not found
  Warning  FailedScheduling  2m43s                  default-scheduler  pv "pvc-9e87df90-ad86-11e9-896e-9600002bd1f9" node affinity doesn't match node "k8s-infrastructure-web-3": No matching
NodeSelectorTerms
  Warning  FailedScheduling  5s (x3 over 2m43s)     default-scheduler  0/6 nodes are available: 2 node(s) had volume node affinity conflict, 4 node(s) didn't match node selector.

Checking via the hcloud CLI shows that all volumes were created in fsn1:

$ hcloud volume list
ID        NAME                                       SIZE    SERVER    LOCATION
2956880   pvc-9e16261e-ad86-11e9-896e-9600002bd1f9   10 GB   2998843   fsn1
2956881   pvc-9e87df90-ad86-11e9-896e-9600002bd1f9   10 GB   -         fsn1
2956882   pvc-9e4d1749-ad86-11e9-896e-9600002bd1f9   10 GB   -         fsn1

And it looks like we are already on to something: checking the csi-provisioner log, I found the server could not find the requested resource (get csinodeinfos.csi.storage.k8s.io k8s-infrastructure-web-1):

I0723 20:15:31.485128       1 controller.go:926] provision "default/ubuntu-fsn1" class "hcloud-volumes": started
I0723 20:15:31.557402       1 controller.go:188] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0723 20:15:31.557769       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"ubuntu-fsn1", UID:"9e16261e-ad86-11e9-896e-9600002bd1f9", APIVersion:"v1", ResourceVersion:"221469", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/ubuntu-fsn1"
I0723 20:15:31.557436       1 controller.go:189] GRPC request: {}
I0723 20:15:31.560208       1 controller.go:191] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}}]}
I0723 20:15:31.563301       1 controller.go:192] GRPC error: <nil>
I0723 20:15:31.563533       1 controller.go:188] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0723 20:15:31.563756       1 controller.go:189] GRPC request: {}
I0723 20:15:31.566767       1 controller.go:191] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0723 20:15:31.570955       1 controller.go:192] GRPC error: <nil>
I0723 20:15:31.571159       1 controller.go:188] GRPC call: /csi.v1.Identity/GetPluginInfo
I0723 20:15:31.571184       1 controller.go:189] GRPC request: {}
I0723 20:15:31.574111       1 controller.go:191] GRPC response: {"name":"csi.hetzner.cloud","vendor_version":"1.1.4"}
I0723 20:15:31.575313       1 controller.go:192] GRPC error: <nil>
W0723 20:15:31.579357       1 topology.go:171] error getting CSINodeInfo for selected node "k8s-infrastructure-web-1": the server could not find the requested resource (get csinodeinfos.csi.storage.k8s.io k8s-infrastructure-web-1); proceeding to provision without topology information
I0723 20:15:31.579408       1 controller.go:544] CreateVolumeRequest {Name:pvc-9e16261e-ad86-11e9-896e-9600002bd1f9 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX
_sizecache:0}
I0723 20:15:31.579827       1 controller.go:188] GRPC call: /csi.v1.Controller/CreateVolume
I0723 20:15:31.579961       1 controller.go:189] GRPC request: {"capacity_range":{"required_bytes":10737418240},"name":"pvc-9e16261e-ad86-11e9-896e-9600002bd1f9","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}

Without digging into the code I am not sure what triggers the error. Do you have any hints on how I could debug this further?
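
For reference, once the CSINodeInfo objects are available (the warning in the log above shows they are missing here, which is why provisioning proceeds without topology information), location-aware provisioning is normally driven by a topology-aware storage class that delays volume creation until the pod is scheduled. A sketch of such a class, assuming it can be (re)created under the same name:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hcloud-volumes
provisioner: csi.hetzner.cloud
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # create the volume only after the pod is scheduled,
                                          # so it ends up in that node's location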

Volume deletion after release sometimes fails

Hi,

Sometimes the deletion of volumes with reclaimPolicy: Delete fails:

Name:            pvc-4017a6e3-2af2-11e9-9340-96000019d538
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: csi.hetzner.cloud
Finalizers:      [kubernetes.io/pv-protection external-attacher/csi-hetzner-cloud]
StorageClass:    kubermatic-fast
Status:          Released
Claim:           cluster-prow-e2e-q8lt7fwh/data-etcd-0
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        10Gi
Node Affinity:   <none>
Message:         
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            csi.hetzner.cloud
    VolumeHandle:      1759327
    ReadOnly:          false
    VolumeAttributes:      storage.kubernetes.io/csiProvisionerIdentity=1549140804874-8081-csi.hetzner.cloud
Events:
  Type     Reason              Age                From                                                                            Message
  ----     ------              ----               ----                                                                            -------
  Warning  VolumeFailedDelete  15m (x16 over 3h)  csi.hetzner.cloud_hcloud-csi-controller-0_c502e394-272c-11e9-a078-9a17553d75a7  rpc error: code = Internal desc = volume with ID '1759327' is still attached to a server (service_error)

The associated claim does not exist anymore, and neither does the namespace it was in:

k get pvc -n cluster-prow-e2e-q8lt7fwh data-etcd-0
Error from server (NotFound): namespaces "cluster-prow-e2e-q8lt7fwh" not found

Not working on Kube 1.13.2

kubectl describe pod my-csi-app

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  4m37s              default-scheduler  pod has unbound immediate PersistentVolumeClaims

kubectl describe pvc csi-pvc

Events:
  Type       Reason                Age                   From                         Message
  ----       ------                ----                  ----                         -------
  Normal     ExternalProvisioning  3s (x26 over 6m3s)    persistentvolume-controller  waiting for a volume to be created, either by external provisioner "csi.hetzner.cloud" or manually created by system administrator
Mounted By:  my-csi-app

Pod failed to mount volume, had to manually add node label.

I installed the CSI driver and ran the test YAML.
The volume was created, but busybox could not mount it.

Warning	FailedScheduling 0/5 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 1 node(s) were unschedulable, 3 node(s) had volume node affinity conflict.

I had to manually add the label csi.hetzner.cloud/location=hel1, after which the test busybox was able to mount the volume.

Should these labels be created automatically by the csi-driver?

Can't mount a volume to a newly joined node

Hi,

I created a simple cluster on hetzner.cloud with only 3 nodes (one master and two workers). I deployed several simple tools (Grafana, Prometheus, a simple Go app, PostgreSQL), and all of them use automatically mounted volumes based on rules defined in the corresponding PersistentVolumeClaim files. Each node in the cluster has the attachable-volumes-csi-csi.hetzner.cloud attribute.

$ kubectl describe nodes k8s-n01 | grep attachable-volumes-csi-csi.hetzner.cloud
 attachable-volumes-csi-csi.hetzner.cloud:  16
 attachable-volumes-csi-csi.hetzner.cloud:  16
  attachable-volumes-csi-csi.hetzner.cloud  0          0

After a while, I joined a new node, k8s-n03:

$ kubectl get nodes
NAME      STATUS   ROLES    AGE    VERSION
k8s-m1    Ready    master   7d7h   v1.15.3
k8s-n01   Ready    <none>   7d7h   v1.15.3
k8s-n02   Ready    <none>   7d7h   v1.15.3
k8s-n03   Ready    <none>   13m    v1.16.0

$ kubectl get pods -A -o wide | grep k8s-n03
kube-system             hcloud-csi-node-6lr5l                       2/2     Running   0          14m     116.203.137.200   k8s-n03   <none>           <none>
kube-system             kube-proxy-ffdpt                            1/1     Running   0          14m     116.203.137.200   k8s-n03   <none>           <none>
kube-system             node-exporter-z4g8x                         1/1     Running   0          14m     116.203.137.200   k8s-n03   <none>           <none>
kube-system             weave-net-vqrrm                             2/2     Running   0          14m     116.203.137.200   k8s-n03   <none>           <none>

On that node there are no lines with the attachable-volumes-csi-csi.hetzner.cloud attribute:

$ kubectl describe nodes k8s-n03 | grep attachable-volumes-csi-csi.hetzner.cloud
$ 

Mounting a volume on that node doesn't work. I labeled the node, added a nodeSelector attribute to the pod specification from the README.md example, and got the following error (see the last lines):

$ kubectl label nodes k8s-n03 app=mount-error

$ cat test.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hcloud-volumes
---
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-app
spec:
  nodeSelector:
    app: mount-error
  containers:
    - name: my-frontend
      image: busybox
      volumeMounts:
        - mountPath: "/data"
          name: my-csi-volume
      command: [ "sleep", "1000000" ]
  volumes:
    - name: my-csi-volume
      persistentVolumeClaim:
        claimName: csi-pvc

$ kubectl apply -f test.yaml
persistentvolumeclaim/csi-pvc created
pod/my-csi-app created

$ kubectl describe pods my-csi-app
Name:               my-csi-app
Namespace:          graylog
Priority:           0
PriorityClassName:  <none>
Node:               k8s-n03/116.203.137.200
Start Time:         Fri, 20 Sep 2019 17:08:49 +0300
Labels:             <none>
Annotations:        kubectl.kubernetes.io/last-applied-configuration:
                      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"my-csi-app","namespace":"graylog"},"spec":{"containers":[{"command":[...
Status:             Pending
IP:                 
Containers:
  my-frontend:
    Container ID:  
    Image:         busybox
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
      1000000
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /data from my-csi-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tfnxq (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  my-csi-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  csi-pvc
    ReadOnly:   false
  default-token-tfnxq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-tfnxq
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  app=mount-error
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age               From                     Message
  ----     ------                  ----              ----                     -------
  Normal   Scheduled               52s               default-scheduler        Successfully assigned graylog/my-csi-app to k8s-n03
  Normal   SuccessfulAttachVolume  49s               attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-1874dee9-9e16-45b0-ad8b-82ed5ccb8a23"
  Warning  FailedMount             4s (x7 over 36s)  kubelet, k8s-n03         MountVolume.NewMounter initialization failed for volume "pvc-1874dee9-9e16-45b0-ad8b-82ed5ccb8a23" : volume mode "Persistent" not supported by driver csi.hetzner.cloud (only supports [])

I tried kubeadm reset and kubeadm join ... again, but with no luck. Any help? Thanks.

Cannot get resource "csinodes"

On my freshly set up cluster no PVCs can be created.
Setup:

  • 1 CX21 (master) and 2 CX21 (workers) with Ubuntu 16
  • Kubernetes 1.16
  • Calico networking

I set up the CSI driver as described and tested the API token with
curl -H "Authorization: Bearer MEINAPITOKEN" https://api.hetzner.cloud/v1/locations

Because no volume had been created after 11 minutes, I checked with kubectl describe pvc csi-pvc, see below.

What could be the cause?

Name:          csi-pvc
Namespace:     default
StorageClass:  hcloud-volumes
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: csi.hetzner.cloud
               volume.kubernetes.io/selected-node: kube-02
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Mounted By:    my-csi-app
Events:
  Type     Reason                Age                  From                                                                            Message
  ----     ------                ----                 ----                                                                            -------
  Normal   WaitForFirstConsumer  11m                  persistentvolume-controller                                                     waiting for first consumer to be created before binding
  Normal   Provisioning          3m1s (x10 over 11m)  csi.hetzner.cloud_hcloud-csi-controller-0_da4d99ce-12a8-11ea-a8c1-6e6e0c890cce  External provisioner is provisioning volume for claim "default/csi-pvc"
  Warning  ProvisioningFailed    3m1s (x10 over 11m)  csi.hetzner.cloud_hcloud-csi-controller-0_da4d99ce-12a8-11ea-a8c1-6e6e0c890cce  failed to provision volume with StorageClass "hcloud-volumes": error generating accessibility requirements: error getting CSINode for selected node "kube-02": csinodes.storage.k8s.io "kube-02" is forbidden: User "system:serviceaccount:kube-system:hcloud-csi" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope
  Normal   ExternalProvisioning  91s (x42 over 11m)   persistentvolume-controller                                                     waiting for a volume to be created, either by external provisioner "csi.hetzner.cloud" or manually created by system administrator
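
The event points at missing RBAC permissions for the provisioner's service account. A minimal sketch of the missing rule, assuming the ClusterRole name matches the one bound to the hcloud-csi service account in your deployment files:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: hcloud-csi   # assumption: the ClusterRole bound to the hcloud-csi service account
rules:
  # ... existing rules ...
  - apiGroups: ["storage.k8s.io"]
    resources: ["csinodes"]
    verbs: ["get", "list", "watch"]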

Failover not working

If a node fails, pods with volumes cannot start on another node because of a Multi-Attach error for the volume. Even if there is a manual workaround with the hcloud CLI and kubectl, it defeats the purpose of running a "cluster" when the most important apps (the ones with data) cannot fail over without manual intervention.

kubernetes v1.15.3

How do you back up?

Since Hetzner Cloud volumes cannot be included in snapshots or automatic backups, data on volumes has to be backed up by some other process, e.g. Borg on Storage Boxes.

How do you achieve that?

I see that a newly created volume is mounted at paths like:

/dev/disk/by-id/scsi-0HC_Volume_3701950 ext4        9.8G   37M  9.8G   1% /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4dc85a0b-6856-48c7-8efa-079a5dbe095b/globalmount
/dev/disk/by-id/scsi-0HC_Volume_3701950 ext4        9.8G   37M  9.8G   1% /var/lib/kubelet/pods/6b58575d-433d-4de7-bfae-9c2f432e4e73/volumes/kubernetes.io~csi/pvc-4dc85a0b-6856-48c7-8efa-079a5dbe095b/mount

So do you just include /var/lib/kubelet/plugins/kubernetes.io/csi/pv in your backup?
Or do you have any backup strategy to have the backup data organized by volume names instead of IDs?

Unable to Authenticate

Hi, I have followed the README and used the most recent release, but hcloud-csi-controller and hcloud-csi-node are both in CrashLoopBackOff and I can't figure out what is wrong from the logs. I have the correct API key in the secret. Thanks for any help. I have attached the logs from hcloud-csi-controller and hcloud-csi-node.

The only thing that makes sense is the "unable to authenticate" error. I used this guide to set up the cluster: https://vitux.com/install-and-deploy-kubernetes-on-ubuntu/. Am I maybe missing something that prevents correct authentication?

csi-attacher.txt

csi-cluster-driver-registrar.txt

csi-provisioner.txt

hcloud-csi-driver.txt

Resizing PVC

Changing the size of a PersistentVolumeClaim after it has already been created and applying the manifest currently results in the following error:
forbidden: only dynamically provisioned pvc can be resized and the storageclass that provisions the pvc must support resize

Are there any plans to support resizing persistent volumes created by the Hetzner CSI driver in the future? If not, is there any way to do this manually?
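
For reference, the Kubernetes-side switch for this is allowVolumeExpansion on the storage class; it only helps if and when the driver implements volume expansion, so treat this as a sketch of the generic mechanism rather than a statement about current driver support:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hcloud-volumes
provisioner: csi.hetzner.cloud
allowVolumeExpansion: true   # lets PVCs of this class be resized, provided the driver supports expansion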

Adding multiple volumes to a pod causes "waiting for first consumer to be created before binding"

Hi, I have added 4 volumes to a pod, and only the first two are created. The others show "waiting for first consumer to be created before binding", but this worked fine with previous versions. The only changes I see are volumeBindingMode: WaitForFirstConsumer and --feature-gates=Topology=true.

Here are the pod specs:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hcloud-volumes
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc2
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hcloud-volumes
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc3
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hcloud-volumes
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc4
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hcloud-volumes
---
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-app
spec:
  containers:
    - name: my-frontend
      image: busybox
      volumeMounts:
        - mountPath: "/data"
          name: my-csi-volume
        - mountPath: "/data2"
          name: my-csi-volume2
        - mountPath: "/data3"
          name: my-csi-volume3
        - mountPath: "/data4"
          name: my-csi-volume4
      command: [ "sleep", "1000000" ]
  volumes:
    - name: my-csi-volume
      persistentVolumeClaim:
        claimName: csi-pvc
    - name: my-csi-volume2
      persistentVolumeClaim:
        claimName: csi-pvc2
    - name: my-csi-volume3
      persistentVolumeClaim:
        claimName: csi-pvc3
    - name: my-csi-volume4
      persistentVolumeClaim:
        claimName: csi-pvc4

Server identification via hc-utils

I was setting up a cluster in a virtual network, where a gateway server provides internet access for the cluster nodes. Everything worked fine, but when creating a PV the volume gets attached to the gateway server instead of the cluster node.

So far I didn't have time to find out from the code how the server to attach to is identified, but judging from this behaviour I would guess it's identified by the main IP address.
Since the cluster nodes in my case don't use the eth0 device, this doesn't work anymore.

Would it be possible to use a unique identifier in the config of hc-utils to securely identify the correct server to attach to?

Mount a volume to a directory of the host (not pod)

Hi, to test something I need to copy some data from the host (outside Kubernetes) into a pod's volume. What is the easiest way to mount an existing volume created with the CSI driver to a directory on the host? I was thinking I could scale the deployment to zero so that no pods are using the volume, then attach the volume manually to the host from the cloud console and mount it to a directory. Is there a quicker/easier way? I will need to do this several times. Thanks!

Redeploy can't mount volume again

After pod recreation a volume can't be reattached. The volume is unassigned in the Hetzner Cloud console and Kubernetes shows an error:

MountVolume.NewMounter initialization failed for volume "pvc-828ce17e-d3a0-11e9-a89d-9600002f996e" : volume mode "Persistent" not supported by driver csi.hetzner.cloud (only supports [])

There's no error in any of the hcloud pods. This happens when the pod is recreated on the same node.

When I force a different node (by label) it says that the volume is already attached. But it isn't.

All of this is on a fresh setup.

What can I do?

Should generate error when attempting to provision RWX storage

I've been migrating a cluster to Hetzner from another provider, and I accidentally had a PVC YAML file that requested a ReadWriteMany volume. The cloud API refused to provision it (correctly), but no good error message was given. It wasn't until I diffed the YAML file against one that worked that I caught the mistake. A better message in the PVC description would be welcome (it was left in a "Pending" state rather than a "Failed" state).

Once I changed it to ReadWriteOnce I was immediately able to provision storage again. Other than this one issue I've found the CSI driver to work really well; thanks!
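
For context, Hetzner Cloud block volumes can only be attached to one server at a time, so only ReadWriteOnce can be satisfied. A sketch of the working claim spec (the name is illustrative):

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: migrated-data            # illustrative name
spec:
  accessModes:
    - ReadWriteOnce              # works; ReadWriteMany cannot be backed by a single-attach block volume
  resources:
    requests:
      storage: 10Gi
  storageClassName: hcloud-volumes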

E2E tests

I do manual testing before releasing a new version. It'd be cool to have automated e2e tests. Bonus points for running them in CI too.

[rancher/k3s only] attacher.Attach failed: volumeattachments.storage.k8s.io is forbidden

This was first reported by @costela in #42 (comment).

E0727 14:03:55.220699    2205 csi_attacher.go:93] kubernetes.io/csi: attacher.Attach failed: volumeattachments.storage.k8s.io is forbidden: User "system:node:master1" cannot create resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope: can only get individual resources of this type

I suspect this might be related, because the call to create the VolumeAttachment happens from the master node, instead of the worker node where the pod is scheduled, so the admission is denied by the node authorizer (AFAICT, it only allows create requests for VolumeAttachment to be made by the Node actually running the pod using the volume).

I'm consistently getting this issue regardless of which node the pod that needs a volume attached is scheduled on.

The core of this issue seems to be that the attachdetach-controller (source: https://github.com/kubernetes/kubernetes/tree/release-1.14/pkg/controller/volume/attachdetach), which should be using its own service account called system:controller:attachdetach-controller, is apparently using the system:node:<NODE_NAME> service account instead, for reasons unknown to me.

I'm getting this on a rancher/k3s distribution of Kubernetes (version 1.14.5), and I'm not sure whether the issue is caused by this csi-driver or not, but since @costela already reported this earlier, I'm creating an issue here.

Edit: The issue is tracked here: k3s-io/k3s#732

Feature Request: fstype

Is it possible to implement an fstype parameter, so we can choose the filesystem (xfs/ext4) during provisioning?
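
For reference, the CSI external-provisioner sidecar understands a generic csi.storage.k8s.io/fstype parameter on the storage class (assuming a sufficiently recent sidecar version); whether the driver honours it end to end is exactly the open question here, so this is only a sketch:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hcloud-volumes-xfs
provisioner: csi.hetzner.cloud
parameters:
  csi.storage.k8s.io/fstype: xfs   # generic CSI parameter passed to the driver as the requested fs_type
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer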

[Question] Meet prerequisites?

Hi, I'm new to Kubernetes...

How can I enable the required prerequisites:

--feature-gates=CSINodeInfo=true,CSIDriverRegistry=true
--allow-privileged=true

I have created a test environment with kubeadm: 1 master, 2 workers.
Thanks!
BR Bernd
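
A minimal sketch of how these flags can be passed with kubeadm, assuming the kubeadm config API of that Kubernetes era (v1beta1); adjust versions to your setup and pass the file via kubeadm init --config:

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: "CSINodeInfo=true,CSIDriverRegistry=true"
    allow-privileged: "true"   # only needed on versions where privileged pods are not allowed by default
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CSINodeInfo: true
  CSIDriverRegistry: true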

error getting CSINode for selected node "ubuntu-2gb-nbg1-2"

Hi
I am provisioning a 3-node ZooKeeper ensemble.
The first two instances are up and running with their two volumes.
However, the third fails with this:

default              4s          Warning   ProvisioningFailed             persistentvolumeclaim/datadir-zk-2             

failed to provision volume with StorageClass "hcloud-volumes": 
error generating accessibility requirements: 
error getting CSINode for selected node "ubuntu-2gb-nbg1-2": 
csinodes.storage.k8s.io "ubuntu-2gb-nbg1-2" not found

Indeed, it is not listed in the output of:

➜ ~ kubectl get csinodes
NAME                CREATED AT
ubuntu-2gb-nbg1-4   2019-10-11T21:27:53Z
ubuntu-2gb-nbg1-6   2019-10-15T17:58:55Z

How can I fix that?

Is this production ready?

Hi,

I tried it and it works so far. As this is an official driver, I wonder whether it is production ready.

cheers,

mike

Add encryption support

Currently I am enabling encryption with something as simple as

  "echo -n \"${local.admin_password}\" | cryptsetup luksFormat /dev/disk/by-id/scsi-0HC_Volume_${hcloud_volume.al-data-node.id}",
  "echo -n \"${local.admin_password}\" | cryptsetup luksOpen /dev/disk/by-id/scsi-0HC_Volume_${hcloud_volume.al-data-node.id} encrypted-contract-analyzer-storage",
  "mkfs.ext4 /dev/mapper/encrypted-contract-analyzer-storage",
  "mount -o discard,defaults /dev/mapper/encrypted-contract-analyzer-storage /mnt/al-data",

via Terraform. It would be great if you could provide an option to enable encryption.

Export Prometheus metrics

At least we should export some Go runtime metrics, probably also number of API calls per endpoint and error rates.

Refactor deployment files

  • Add cluster driver registrar
  • Check resource names are consistent
  • Consider consolidating into a single YAML file
  • Consider creating separate namespace for our CSI

CPU Usage slowly increasing

Summary

In our deployment the csi-driver slowly increases its CPU usage over multiple days.

[Screenshot hcloud-csi-cpu-leak: CPU usage of the csi-driver pod slowly increasing over several days]

I killed the pod on 09/17 16:00, causing the usage to "reset".

Access Pattern

  • We have a total of 5 persistent volumes created in the cluster
  • 1 is unbound (no pvc)
  • 2 are associated with prometheus and rarely need to be reassigned
  • 2 are associated with cronjobs that run on a 5 minute schedule, requiring the volumes to be attached+mounted every 5 minutes (I suspect this might leak some watches/internal timers that cause the growing CPU usage)
➜  ~ kubectl get pvc --all-namespaces
NAMESPACE                         NAME                                                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
cattle-prometheus                 prometheus-cluster-monitoring-db-prometheus-cluster-monitoring-0     Bound    pvc-5e4ed3ef-c26d-11e9-9e15-9600002e0c70   10Gi       RWO            hcloud-volumes   35d
platform-monitoring               prometheus-platform-monitoring-db-prometheus-platform-monitoring-0   Bound    pvc-c01b6132-d8b6-11e9-9ef9-9600002e0c6f   10Gi       RWO            hcloud-volumes   6d17h
REDACTED                          REDACTED                                                             Bound    pvc-ff65820a-c1da-11e9-981f-9600002e0ca4   10Gi       RWO            hcloud-volumes   35d
REDACTED                          REDACTED-2                                                           Bound    pvc-d47da832-bda7-11e9-912b-960000172088   10Gi       RWO            hcloud-volumes   41d
➜  ~ kubectl get pv --all-namespaces 
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                                                    STORAGECLASS     REASON   AGE
pvc-49de8eef-adfe-11e9-912b-960000172088   10Gi       RWO            Retain           Released   cattle-prometheus/prometheus-cluster-monitoring-db-prometheus-cluster-monitoring-0       hcloud-volumes            61d
pvc-5e4ed3ef-c26d-11e9-9e15-9600002e0c70   10Gi       RWO            Delete           Bound      cattle-prometheus/prometheus-cluster-monitoring-db-prometheus-cluster-monitoring-0       hcloud-volumes            35d
pvc-c01b6132-d8b6-11e9-9ef9-9600002e0c6f   10Gi       RWO            Delete           Bound      platform-monitoring/prometheus-platform-monitoring-db-prometheus-platform-monitoring-0   hcloud-volumes            6d17h
pvc-d47da832-bda7-11e9-912b-960000172088   10Gi       RWO            Delete           Bound      REDACTED/REDACTED-2                                                                      hcloud-volumes            41d
pvc-ff65820a-c1da-11e9-981f-9600002e0ca4   10Gi       RWO            Delete           Bound      REDACTED/REDACTED                                                                        hcloud-volumes            35d

Details

  • CSI-Driver Version: 1.1.5
  • K8s Version: 1.14.5
➜  ~ kubectl version  
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.5", GitCommit:"0e9fcb426b100a2aea5ed5c25b3d8cfbb01a8acf", GitTreeState:"clean", BuildDate:"2019-08-05T09:13:08Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Logs from Container hcloud-csi-driver directly after restart

level=debug ts=2019-09-23T12:34:59.042539352Z msg="getting instance id from metadata service"
level=debug ts=2019-09-23T12:34:59.048672227Z msg="fetching server"
level=info ts=2019-09-23T12:34:59.386696775Z msg="fetched server" server-name=compute4
level=debug ts=2019-09-23T12:34:59.477231815Z component=grpc-server msg="handling request" req=
level=debug ts=2019-09-23T12:34:59.477353825Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:34:59.480500769Z component=grpc-server msg="handling request" req=
level=debug ts=2019-09-23T12:34:59.480590227Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:34:59.484366757Z component=grpc-server msg="handling request" req=
level=debug ts=2019-09-23T12:34:59.484420039Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:34:59.488958315Z component=grpc-server msg="handling request" req=
level=debug ts=2019-09-23T12:34:59.489026328Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:35:00.301813104Z component=grpc-server msg="handling request" req=
level=debug ts=2019-09-23T12:35:00.301896393Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:35:00.31035102Z component=grpc-server msg="handling request" req=
level=debug ts=2019-09-23T12:35:00.310428105Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:35:00.314012584Z component=grpc-server msg="handling request" req=
level=debug ts=2019-09-23T12:35:00.314080245Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:35:00.318897099Z component=grpc-server msg="handling request" req=
level=debug ts=2019-09-23T12:35:00.318935292Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:35:08.986510143Z component=grpc-server msg="handling request" req="volume_id:\"3098265\" node_id:\"3142363\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1566060975326-8081-csi.hetzner.cloud\" > "
level=info ts=2019-09-23T12:35:08.98664899Z component=api-volume-service msg="attaching volume" volume-id=3098265 server-id=3142363
level=debug ts=2019-09-23T12:35:12.282519058Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:35:35.058390576Z component=grpc-server msg="handling request" req="volume_id:\"3098265\" node_id:\"3142363\" "
level=info ts=2019-09-23T12:35:35.058546437Z component=api-volume-service msg="detaching volume from server" volume-id=3098265 server-id=3142363
level=debug ts=2019-09-23T12:35:37.349104517Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:40:01.540748703Z component=grpc-server msg="handling request" req="volume_id:\"3098265\" node_id:\"3142354\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1566060975326-8081-csi.hetzner.cloud\" > "
level=info ts=2019-09-23T12:40:01.540829349Z component=api-volume-service msg="attaching volume" volume-id=3098265 server-id=3142354
level=debug ts=2019-09-23T12:40:07.529582765Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:40:07.547330301Z component=grpc-server msg="handling request" req="volume_id:\"3098265\" node_id:\"3142354\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1566060975326-8081-csi.hetzner.cloud\" > "
level=info ts=2019-09-23T12:40:07.547552922Z component=api-volume-service msg="attaching volume" volume-id=3098265 server-id=3142354
level=info ts=2019-09-23T12:40:08.1849406Z component=api-volume-service msg="failed to attach volume" volume-id=3098265 server-id=3142354 err="volume is already attached to a server, please detach it first (service_error)"
level=debug ts=2019-09-23T12:40:08.225571172Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:40:27.167239085Z component=grpc-server msg="handling request" req="volume_id:\"3098265\" node_id:\"3142354\" "
level=info ts=2019-09-23T12:40:27.167581839Z component=api-volume-service msg="detaching volume from server" volume-id=3098265 server-id=3142354
level=debug ts=2019-09-23T12:40:29.483828719Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:45:03.489533212Z component=grpc-server msg="handling request" req="volume_id:\"3098265\" node_id:\"3142352\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1566060975326-8081-csi.hetzner.cloud\" > "
level=info ts=2019-09-23T12:45:03.489663779Z component=api-volume-service msg="attaching volume" volume-id=3098265 server-id=3142352
level=debug ts=2019-09-23T12:45:07.818956291Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:45:34.322893995Z component=grpc-server msg="handling request" req="volume_id:\"3098265\" node_id:\"3142352\" "
level=info ts=2019-09-23T12:45:34.322972245Z component=api-volume-service msg="detaching volume from server" volume-id=3098265 server-id=3142352
level=debug ts=2019-09-23T12:45:36.618721636Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:50:05.781757496Z component=grpc-server msg="handling request" req="volume_id:\"3098265\" node_id:\"3142350\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1566060975326-8081-csi.hetzner.cloud\" > "
level=info ts=2019-09-23T12:50:05.781999032Z component=api-volume-service msg="attaching volume" volume-id=3098265 server-id=3142350
level=debug ts=2019-09-23T12:50:08.793330099Z component=grpc-server msg="finished handling request"
level=debug ts=2019-09-23T12:50:32.28802285Z component=grpc-server msg="handling request" req="volume_id:\"3098265\" node_id:\"3142350\" "
level=info ts=2019-09-23T12:50:32.288189703Z component=api-volume-service msg="detaching volume from server" volume-id=3098265 server-id=3142350
level=debug ts=2019-09-23T12:50:35.147710534Z component=grpc-server msg="finished handling request"

CPU Usage over same timeframe

Note: CPU Usage is already slowly increasing

[Screenshot grafana-csi-cpu-usage: Grafana graph of the csi-driver CPU usage over the same timeframe]

Back-off restarting failed container

I installed this driver in my brand-new Rancher cluster

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T19:44:19Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.6", GitCommit:"b1d75deca493a24a2f87eb1efde1a569e52fc8d9", GitTreeState:"clean", BuildDate:"2018-12-16T04:30:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
╰──  kubectl describe po/hcloud-csi-controller-0 -n kube-system
Name:               hcloud-csi-controller-0
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               master1/116.202.31.106
Start Time:         Sat, 12 Jan 2019 16:44:41 +0330
Labels:             app=hcloud-csi-controller
                    controller-revision-hash=hcloud-csi-controller-858d9c6cc4
                    statefulset.kubernetes.io/pod-name=hcloud-csi-controller-0
Annotations:        cni.projectcalico.org/podIP: 10.42.1.5/32
Status:             Running
IP:                 10.42.1.5
Controlled By:      StatefulSet/hcloud-csi-controller
Containers:
  csi-attacher:
    Container ID:  docker://d0993ce882d7ac1838e2241972ce3d0eecc51dd236d7322babffaf28abbf6a74
    Image:         quay.io/k8scsi/csi-attacher:v1.0.1
    Image ID:      docker-pullable://quay.io/k8scsi/csi-attacher@sha256:6425af42299ba211de685a94953a5c4c6fcbfd2494e445437dd9ebd70b28bf8a
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
      --v=5
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 13 Jan 2019 11:47:43 +0330
      Finished:     Sun, 13 Jan 2019 11:49:43 +0330
    Ready:          False
    Restart Count:  164
    Environment:    <none>
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hcloud-csi-token-q54lk (ro)
  csi-provisioner:
    Container ID:  docker://8064efbf3109fa566935320adc384f14b7e18db977c04c4b36f8f723d6d92276
    Image:         quay.io/k8scsi/csi-provisioner:v1.0.1
    Image ID:      docker-pullable://quay.io/k8scsi/csi-provisioner@sha256:7d7d832832b536f32e899669a32d4fb75ab972da20c21a2bd6043eb498cf58e8
    Port:          <none>
    Host Port:     <none>
    Args:
      --provisioner=csi.hetzner.cloud
      --csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
      --v=5
    State:          Running
      Started:      Sat, 12 Jan 2019 16:44:51 +0330
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hcloud-csi-token-q54lk (ro)
  csi-cluster-driver-registrar:
    Container ID:  docker://161c1a2184ac80147b12def2ca1bf35c0aaf818df8283e175b2832ac49fe793b
    Image:         quay.io/k8scsi/csi-cluster-driver-registrar:v1.0.1
    Image ID:      docker-pullable://quay.io/k8scsi/csi-cluster-driver-registrar@sha256:fafd75ae5442f192cfa8c2e792903aee30d5884b62e802e4464b0a895d21e3ef
    Port:          <none>
    Host Port:     <none>
    Args:
      --pod-info-mount-version="v1"
      --csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
      --v=5
    State:          Running
      Started:      Sun, 13 Jan 2019 11:54:41 +0330
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 13 Jan 2019 11:48:35 +0330
      Finished:     Sun, 13 Jan 2019 11:49:35 +0330
    Ready:          True
    Restart Count:  192
    Environment:    <none>
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hcloud-csi-token-q54lk (ro)
  hcloud-csi-driver:
    Container ID:   docker://dd9e0d633c06840113d9a2d7ec58e14a80c579708645fc5386b1c5246e9da408
    Image:          hetznercloud/hcloud-csi-driver:1.0.0
    Image ID:       docker-pullable://hetznercloud/hcloud-csi-driver@sha256:0485a0eac48964e5ecb98204e06c34701e239862633ba0e79dfeb29266b38e17
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 13 Jan 2019 11:51:22 +0330
      Finished:     Sun, 13 Jan 2019 11:51:22 +0330
    Ready:          False
    Restart Count:  228
    Environment:
      CSI_ENDPOINT:  unix:///var/lib/csi/sockets/pluginproxy/csi.sock
      HCLOUD_TOKEN:  <set to the key 'token' in secret 'hcloud-csi'>  Optional: false
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hcloud-csi-token-q54lk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  socket-dir:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  hcloud-csi-token-q54lk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hcloud-csi-token-q54lk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                     From              Message
  ----     ------   ----                    ----              -------
  Warning  BackOff  9m59s (x5629 over 19h)  kubelet, master1  Back-off restarting failed container
  Warning  BackOff  5m7s (x4038 over 19h)   kubelet, master1  Back-off restarting failed container
kubectl get pods -n kube-system
NAME                                      READY   STATUS              RESTARTS   AGE
canal-8l2rj                               3/3     Running             0          5d
canal-djc75                               3/3     Running             0          5d
canal-dpjwn                               3/3     Running             0          5d
canal-qzsk8                               3/3     Running             0          5d
cert-manager-7d4bfc44ff-b2mc9             1/1     Running             0          5d
hcloud-csi-controller-0                   2/4     CrashLoopBackOff    577        18h
hcloud-csi-node-bs6r4                     0/2     ContainerCreating   0          18h
hcloud-csi-node-hm69s                     0/2     ContainerCreating   0          18h
hcloud-csi-node-n2nck                     0/2     ContainerCreating   0          18h
hcloud-csi-node-wfrgv                     0/2     ContainerCreating   0          18h
kube-dns-7588d5b5f5-bdww7                 3/3     Running             0          5d
kube-dns-autoscaler-5db9bbb766-2lns2      1/1     Running             0          5d
metrics-server-97bc649d5-xrb28            1/1     Running             0          5d
rke-ingress-controller-deploy-job-gbr8g   0/1     Completed           0          5d
rke-kubedns-addon-deploy-job-gnfcr        0/1     Completed           0          5d
rke-metrics-addon-deploy-job-dxj4j        0/1     Completed           0          5d
rke-network-plugin-deploy-job-qcxsw       0/1     Completed           0          5d
tiller-deploy-85744d9bfb-22rnx            1/1     Running             0          5d
$ kubectl logs hcloud-csi-controller-0 -n kube-system
Error from server (BadRequest): a container name must be specified for pod hcloud-csi-controller-0, choose one of: [csi-attacher csi-provisioner csi-cluster-driver-registrar hcloud-csi-driver]

Handle unset node_id correctly in ControllerUnpublishVolume

  // The ID of the node. This field is OPTIONAL. The CO SHOULD set this
  // field to match the node ID returned by `NodeGetInfo` or leave it
  // unset. If the value is set, the SP MUST unpublish the volume from
  // the specified node. If the value is unset, the SP MUST unpublish
  // the volume from all nodes it is published to.
  string node_id = 2;

We don’t handle node_id == "" correctly. We return an error in the controller if node_id is empty.

Volume assigning step has failed due to an unknown error

Kubernetes version: 1.13.1
Ubuntu version: 18.04
One master node, two worker nodes (all CX21)
Using the provided example in the README

The volume gets created as I can see it in the Hetzner Cloud dashboard, but it isn't attached to a server.

Name:               my-csi-app
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               worker2
Start Time:         Mon, 14 Jan 2019 13:39:28 +0100
Labels:             <none>
Annotations:        kubectl.kubernetes.io/last-applied-configuration:
                      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"my-csi-app","namespace":"default"},"spec":{"containers":[{"command":[...
Status:             Pending
IP:
Containers:
  my-frontend:
    Container ID:
    Image:         busybox
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
      1000000
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /data from my-csi-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9pq7v (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  my-csi-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  csi-pvc
    ReadOnly:   false
  default-token-9pq7v:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9pq7v
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason              Age                  From                     Message
  ----     ------              ----                 ----                     -------
  Warning  FailedScheduling    104s (x4 over 110s)  default-scheduler        pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
  Normal   Scheduled           104s                 default-scheduler        Successfully assigned default/my-csi-app to worker2
  Warning  FailedAttachVolume  37s (x8 over 101s)   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-6b5bd076-17f9-11e9-a20a-9600001463bf" : rpc error: code = Internal desc = failed to publish volume: Volume assigning step has failed due to an unknown error. (unknown_error)

Reattach already created volume

I want to reattach an already created volume and specified:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
    name: storage
spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
    storageClassName: hcloud-volumes
    volumeMode: Filesystem
    volumeName: pvc-2d1ad3c7-ffba-42ea-9987-xxxx

The volumeName refers to the Volume ID that was already created. However, this volume cannot be reattached.

Any tips? @thcyron :)
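
For what it's worth: volumeName on a PVC must reference a PersistentVolume object in the cluster, not the Hetzner volume itself. To reuse a volume that already exists in the Hetzner Cloud project, the usual pattern is to create a static PV whose csi.volumeHandle points at that volume and then bind the PVC to it. A minimal sketch, assuming the driver accepts the numeric Hetzner volume ID as the volume handle (the PV name and the ID below are hypothetical):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: existing-data                  # hypothetical PV name
spec:
  capacity:
    storage: 10Gi                      # must match the size of the existing volume
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: hcloud-volumes
  csi:
    driver: csi.hetzner.cloud
    volumeHandle: "1234567"            # numeric Hetzner volume ID (hypothetical)
    fsType: ext4
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hcloud-volumes
  volumeName: existing-data            # the PV name above, not the Hetzner volume ID

With Retain as the reclaim policy the underlying Hetzner volume is kept even if the claim is deleted again.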

FailedAttachVolume, volume is already attached to a server, please detach it first

Hi
Thanks for this CSI driver; most of the time it works great.
I use Kubespray with Kubernetes 1.13.5.

But I have some problems:
When I rolling-upgrade my Kubernetes cluster, some of the PVCs fail to be attached to their new server because they are still attached to the old one.

  Warning  FailedAttachVolume  24m (x2 over 24m)   attachdetach-controller      AttachVolume.Attach failed for volume "pvc-9b8b9ad0-50b3-11e9-b9f7-9600001dedca" : rpc error: code = Internal desc = failed to publish volume: cannot perform operation because server is locked (locked)
  Warning  FailedAttachVolume  22s (x18 over 24m)  attachdetach-controller      AttachVolume.Attach failed for volume "pvc-9b8b9ad0-50b3-11e9-b9f7-9600001dedca" : rpc error: code = Internal desc = failed to publish volume: volume is already attached to a server, please detach it first (service_error)
  Warning  FailedMount         11s (x11 over 22m)  kubelet, node1  Unable to mount volumes for pod "prometheus-server-0_monitoring(c070914c-66e0-11e9-a87d-9600001dedca)": timeout expired waiting for volumes to attach or mount for pod "monitoring"/"prometheus-server-0". list of unmounted volumes=[storage-volume]. list of unattached volumes=[storage-volume config-volume prometheus-server-token-lhlf6]

I go to the VM the volume is attached to and see that it is already unmounted, so
I manually detach the related PVC from the cloud console (volume page) and then delete the related pods so they can quickly be rescheduled. After that, the related volume can be attached to the new server automatically (but sometimes it needs to be attached manually too).

Is this expected to happen? I think there should be something that always checks the volume state and detaches it if required.

Increase given volume size to 10Gi if a smaller size is given

I've just set up a new cluster and tried the new Hetzner CSI driver. It runs like a charm and integrates nicely with Rancher, so thanks a lot!

I figured out that a lot of Helm charts do not contain information on the volume size. It would be great if a default of 10Gi or a globally defined value could be used as a fallback.


As an example, launching GitLab from Helm leads to the state shown below.

image

redis, prometheus and postgres are failing because of the missing size definition:

Warning | ProvisioningFailed | failed to provision volume with StorageClass "hcloud-volumes": rpc error: code = Internal desc = failed to create volume: invalid input in field 'size' (invalid_input) | a few seconds ago


Just in case: am I missing an option that allows me to define default values for CSI drivers? In Rancher, at least, there is an option for passing parameters to the storage class.

image
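
As far as I can tell the size always comes from the PVC itself, not from a StorageClass parameter, so there does not seem to be a class-level default. One partial workaround is a namespace LimitRange, which at least rejects undersized claims at admission time instead of letting the provisioner fail against the Hetzner API; note that it does not bump them up to 10Gi. A sketch, with hypothetical name and namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: pvc-min-size                   # hypothetical name
  namespace: gitlab                    # hypothetical namespace
spec:
  limits:
    - type: PersistentVolumeClaim
      min:
        storage: 10Gi                  # Hetzner Cloud volumes start at 10 GB

Claims below the minimum are rejected with a clear admission error, which makes the missing chart values easier to spot than the invalid_input error above.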

PVC remains pending for provided example

I used the provided example from the README on a Kubernetes 1.15.3 cluster with nodes in Falkenstein. The PVC stays pending and no actual volume is created. I've followed the installation guidelines and the hcloud-csi-controller and hcloud-csi-node pods are up and running.

Relevant logs:

kubelet (restarting kubelet didn't make the error go away)

goroutinemap.go:150] Operation for "/var/lib/kubelet/plugins/csi.hetzner.cloud/csi.sock" failed. No retries permitted until 2019-09-15 12:45:19.70614928 +0200 CEST m=+500.438830540 (durationBeforeRetry 2m2s). Error: "RegisterPlugin error -- failed to get plugin info using RPC GetInfo at socket /var/lib/kubelet/plugins/csi.hetzner.cloud/csi.sock, err: rpc error: code = Unimplemented desc = unknown service pluginregistration.Registration"

csi-provisioner

W0915 10:38:33.285927       1 deprecatedflags.go:53] Warning: option provisioner="csi.hetzner.cloud" is deprecated and has no effect
I0915 10:38:33.286242       1 feature_gate.go:226] feature gates: &{map[Topology:true]}
I0915 10:38:33.290270       1 csi-provisioner.go:98] Version: v1.2.1-0-g971feacb
I0915 10:38:33.290337       1 csi-provisioner.go:112] Building kube configs for running in cluster...
I0915 10:38:36.401565       1 connection.go:151] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
I0915 10:38:36.403549       1 connection.go:261] Probing CSI driver for readiness
I0915 10:38:36.403614       1 connection.go:180] GRPC call: /csi.v1.Identity/Probe
I0915 10:38:36.403624       1 connection.go:181] GRPC request: {}
I0915 10:38:36.407791       1 connection.go:183] GRPC response: {"ready":{"value":true}}
I0915 10:38:36.409259       1 connection.go:184] GRPC error: <nil>
I0915 10:38:36.409284       1 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I0915 10:38:36.409294       1 connection.go:181] GRPC request: {}
I0915 10:38:36.412205       1 connection.go:183] GRPC response: {"name":"csi.hetzner.cloud","vendor_version":"1.1.5"}
I0915 10:38:36.413428       1 connection.go:184] GRPC error: <nil>
I0915 10:38:36.413473       1 csi-provisioner.go:152] Detected CSI driver csi.hetzner.cloud
I0915 10:38:36.413535       1 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0915 10:38:36.413727       1 connection.go:181] GRPC request: {}
I0915 10:38:36.417803       1 connection.go:183] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}}]}
I0915 10:38:36.460950       1 connection.go:184] GRPC error: <nil>
I0915 10:38:36.460972       1 connection.go:180] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0915 10:38:36.460981       1 connection.go:181] GRPC request: {}
I0915 10:38:36.464584       1 connection.go:183] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0915 10:38:36.468865       1 connection.go:184] GRPC error: <nil>
I0915 10:38:36.469972       1 controller.go:621] Using saving PVs to API server in background
I0915 10:38:36.470352       1 controller.go:769] Starting provisioner controller csi.hetzner.cloud_hcloud-csi-controller-0_f90ff697-d7a4-11e9-ad08-0620f8431eee!
I0915 10:38:36.470712       1 volume_store.go:90] Starting save volume queue
I0915 10:38:36.470916       1 reflector.go:123] Starting reflector *v1.PersistentVolume (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:803
I0915 10:38:36.470966       1 reflector.go:161] Listing and watching *v1.PersistentVolume from sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:803
I0915 10:38:36.470963       1 reflector.go:123] Starting reflector *v1.StorageClass (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:806
I0915 10:38:36.470993       1 reflector.go:161] Listing and watching *v1.StorageClass from sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:806
I0915 10:38:36.471304       1 reflector.go:123] Starting reflector *v1.PersistentVolumeClaim (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:800
I0915 10:38:36.471338       1 reflector.go:161] Listing and watching *v1.PersistentVolumeClaim from sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:800
I0915 10:38:36.570641       1 shared_informer.go:123] caches populated
# the next 2 lines are repeated with different IDs, removed to keep this small
I0915 10:38:36.571073       1 controller.go:979] Final error received, removing PVC 1d96eb1e-a6e6-493e-be67-aa02769f5594 from claims in progress
I0915 10:38:36.571293       1 controller.go:902] Provisioning succeeded, removing PVC 1d96eb1e-a6e6-493e-be67-aa02769f5594 from claims in progress
I0915 10:45:47.507851       1 reflector.go:370] sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:803: Watch close - *v1.PersistentVolume total 0 items received
I0915 10:46:55.502540       1 reflector.go:370] sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:800: Watch close - *v1.PersistentVolumeClaim total 1 items received
I0915 10:48:18.501216       1 reflector.go:370] sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:806: Watch close - *v1.StorageClass total 0 items received
I0915 10:52:54.515268       1 reflector.go:370] sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:803: Watch close - *v1.PersistentVolume total 0 items received 

csi-node-driver-registrar

I0915 10:38:31.793335       1 main.go:110] Version: v1.1.0-0-g80a94421
I0915 10:38:31.793474       1 main.go:120] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0915 10:38:31.793522       1 connection.go:151] Connecting to unix:///csi/csi.sock
I0915 10:38:35.035721       1 main.go:127] Calling CSI driver to discover driver name
I0915 10:38:35.035807       1 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I0915 10:38:35.035825       1 connection.go:181] GRPC request: {}
I0915 10:38:35.044697       1 connection.go:183] GRPC response: {"name":"csi.hetzner.cloud","vendor_version":"1.1.5"}
I0915 10:38:35.046671       1 connection.go:184] GRPC error: <nil>
I0915 10:38:35.046691       1 main.go:137] CSI driver name: "csi.hetzner.cloud"
I0915 10:38:35.046828       1 node_register.go:54] Starting Registration Server at: /registration/csi.hetzner.cloud-reg.sock
I0915 10:38:35.047106       1 node_register.go:61] Registration Server started at: /registration/csi.hetzner.cloud-reg.sock
I0915 10:38:35.418927       1 main.go:77] Received GetInfo call: &InfoRequest{}
I0915 10:38:36.416927       1 main.go:77] Received GetInfo call: &InfoRequest{}
I0915 10:38:36.562893       1 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,} 

hcloud-csi-driver

level=debug ts=2019-09-15T10:38:33.588757102Z msg="getting instance id from metadata service"
level=debug ts=2019-09-15T10:38:33.590703152Z msg="fetching server"
level=info ts=2019-09-15T10:38:34.070134005Z msg="fetched server" server-name=<my-node-name>
level=debug ts=2019-09-15T10:38:35.043748134Z component=grpc-server msg="handling request" req=
level=debug ts=2019-09-15T10:38:35.043910982Z component=grpc-server msg="finished handling request"

MapVolume.SetUp failed for volume: blockMapper.stageVolumeForBlock failed: "no mount capability"

I'm trying to set up Ceph via Rook, using PVCs with a StorageClass powered by hetznercloud/csi-driver, but volumes get stuck between being attached and being mounted into a pod.

NAMESPACE      NAME                                            READY   STATUS     RESTARTS   AGE
rook-hetzner   rook-ceph-osd-prepare-set1-0-data-wnvcq-mjrwc   0/1     Init:0/2   0          26m
rook-hetzner   rook-ceph-osd-prepare-set1-1-data-nc7cw-96czb   0/1     Init:0/2   0          26m
rook-hetzner   rook-ceph-osd-prepare-set1-2-data-mbzrb-ggftg   0/1     Init:0/2   0          26m

When I describe one of these pods, there are the following events:

Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Normal   Scheduled               <unknown>           default-scheduler        Successfully assigned rook-hetzner/rook-ceph-osd-prepare-set1-2-data-mbzrb-ggftg to kube3
  Warning  FailedAttachVolume      17m (x4 over 17m)   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-5f2b44a7-169c-4958-a61a-4c86e8186cef" : rpc error: code = Aborted desc = failed to publish volume: server is locked
  Normal   SuccessfulAttachVolume  17m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-5f2b44a7-169c-4958-a61a-4c86e8186cef"
  Warning  FailedMount             15m                 kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[rook-ceph-osd-token-8rjhx rook-data rook-ceph-log udev set1-2-data-mbzrb-bridge ceph-conf-emptydir devices rook-binaries set1-2-data-mbzrb]: timed out waiting for the condition
  Warning  FailedMount             12m                 kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[rook-data ceph-conf-emptydir set1-2-data-mbzrb rook-ceph-osd-token-8rjhx set1-2-data-mbzrb-bridge udev rook-binaries rook-ceph-log devices]: timed out waiting for the condition
  Warning  FailedMount             10m                 kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[udev rook-binaries set1-2-data-mbzrb-bridge rook-data rook-ceph-osd-token-8rjhx ceph-conf-emptydir rook-ceph-log devices set1-2-data-mbzrb]: timed out waiting for the condition
  Warning  FailedMount             8m17s               kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[set1-2-data-mbzrb set1-2-data-mbzrb-bridge rook-data ceph-conf-emptydir rook-ceph-log devices udev rook-ceph-osd-token-8rjhx rook-binaries]: timed out waiting for the condition
  Warning  FailedMount             6m2s                kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[rook-ceph-log devices rook-binaries rook-ceph-osd-token-8rjhx set1-2-data-mbzrb rook-data ceph-conf-emptydir udev set1-2-data-mbzrb-bridge]: timed out waiting for the condition
  Warning  FailedMount             3m45s               kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[rook-data ceph-conf-emptydir rook-ceph-log udev rook-ceph-osd-token-8rjhx set1-2-data-mbzrb-bridge devices rook-binaries set1-2-data-mbzrb]: timed out waiting for the condition
  Warning  FailedMount             90s                 kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[devices rook-ceph-osd-token-8rjhx set1-2-data-mbzrb set1-2-data-mbzrb-bridge rook-data ceph-conf-emptydir rook-ceph-log udev rook-binaries]: timed out waiting for the condition
  Warning  FailedMapVolume         31s (x16 over 16m)  kubelet, kube3           MapVolume.SetUp failed for volume "pvc-5f2b44a7-169c-4958-a61a-4c86e8186cef" : kubernetes.io/csi: blockMapper.stageVolumeForBlock failed: rpc error: code = InvalidArgument desc = no mount capability

After a little digging, the no mount capability error turns out to come from https://github.com/hetznercloud/csi-driver/blob/master/driver/node.go

The Hetzner Cloud k8s manifests I have installed are just the ones from the README; my CephCluster manifest is:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-hetzner
  namespace: rook-hetzner
spec:
  dataDirHostPath: /var/lib/rook
  network:
    hostNetwork: false
  placement:
    all:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Equal
        effect: NoSchedule
  cephVersion:
    image: ceph/ceph:v14.2.4-20190917
  mon:
    count: 3
    allowMultiplePerNode: false
  storage:
   storageClassDeviceSets:
    - name: set1
      count: 3
      portable: true
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 30Gi
          # IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, gp2)
          storageClassName: hcloud-volumes
          volumeMode: Block
          accessModes:
            - ReadWriteOnce

Involved PVs

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                            STORAGECLASS     REASON   AGE
pvc-5f2b44a7-169c-4958-a61a-4c86e8186cef   30Gi       RWO            Delete           Bound    rook-hetzner/set1-2-data-mbzrb   hcloud-volumes            35m
pvc-ed978abc-56c7-4cfc-83f5-0201ceda53f3   30Gi       RWO            Delete           Bound    rook-hetzner/set1-0-data-wnvcq   hcloud-volumes            35m
pvc-f9433424-d211-454d-99fb-b90fa9891357   30Gi       RWO            Delete           Bound    rook-hetzner/set1-1-data-nc7cw   hcloud-volumes            35m

Involved PVCs

rook-hetzner   set1-0-data-wnvcq   Bound    pvc-ed978abc-56c7-4cfc-83f5-0201ceda53f3   30Gi       RWO            hcloud-volumes   35m
rook-hetzner   set1-1-data-nc7cw   Bound    pvc-f9433424-d211-454d-99fb-b90fa9891357   30Gi       RWO            hcloud-volumes   35m
rook-hetzner   set1-2-data-mbzrb   Bound    pvc-5f2b44a7-169c-4958-a61a-4c86e8186cef   30Gi       RWO            hcloud-volumes   35m

Any ideas why the volume would have no mount capability?
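
For context: the error is raised because a raw block volume request (volumeMode: Block) arrives without a mount capability, and the driver version in these logs only implements filesystem (mount) volumes. Rook's storageClassDeviceSets generally expect raw block devices, so this combination won't work until the driver gains block volume support; for workloads that can use a filesystem instead, the claim template would look like this hedged sketch:

      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 30Gi
          storageClassName: hcloud-volumes
          volumeMode: Filesystem        # the driver here only supports mount (filesystem) volumes
          accessModes:
            - ReadWriteOnce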

Missing volume stats

I would like to see volume stats in Grafana/Prometheus but it is currently not implemented:

return nil, status.Error(codes.Unimplemented, "volume stats are not supported")

Is it possible to implement this?

I am running Kubernetes 1.15, which also runs into the runtime error below, but that is related to a bug in the kubelet.

Sep 07 12:05:22 k8s-01 kubelet[1587]: E0907 12:05:22.976215    1587 runtime.go:73] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
Sep 07 12:05:22 k8s-01 kubelet[1587]: goroutine 2589 [running]:
Sep 07 12:05:22 k8s-01 kubelet[1587]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x3995be0, 0x7565dd0)
Sep 07 12:05:22 k8s-01 kubelet[1587]:         /workspace/anago-v1.15.3-beta.0.68+2d3c76f9091b6b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:69 +0x7b
Sep 07 12:05:22 k8s-01 kubelet[1587]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
Sep 07 12:05:22 k8s-01 kubelet[1587]:         /workspace/anago-v1.15.3-beta.0.68+2d3c76f9091b6b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51 +0x82
Sep 07 12:05:22 k8s-01 kubelet[1587]: panic(0x3995be0, 0x7565dd0)
Sep 07 12:05:22 k8s-01 kubelet[1587]:         /usr/local/go/src/runtime/panic.go:522 +0x1b5
Sep 07 12:05:22 k8s-01 kubelet[1587]: k8s.io/kubernetes/pkg/kubelet/server/stats.(*volumeStatCalculator).parsePodVolumeStats(0xc000df8000, 0xc0012d37a0, 0x2c, 0xc000aade80, 0x0, 0xc0012d37a0, 0x2c, 0x0, 0x0, 0x0, ...)
Sep 07 12:05:22 k8s-01 kubelet[1587]:         /workspace/anago-v1.15.3-beta.0.68+2d3c76f9091b6b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/server/stats/volume_stat_calculator.go:143 +0xf5
Sep 07 12:05:22 k8s-01 kubelet[1587]: k8s.io/kubernetes/pkg/kubelet/server/stats.(*volumeStatCalculator).calcAndStoreStats(0xc000df8000)
Sep 07 12:05:22 k8s-01 kubelet[1587]:         /workspace/anago-v1.15.3-beta.0.68+2d3c76f9091b6b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/server/stats/volume_stat_calculator.go:125 +0x753
Sep 07 12:05:22 k8s-01 kubelet[1587]: k8s.io/kubernetes/pkg/kubelet/server/stats.(*volumeStatCalculator).StartOnce.func1.1()
Sep 07 12:05:22 k8s-01 kubelet[1587]:         /workspace/anago-v1.15.3-beta.0.68+2d3c76f9091b6b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/server/stats/volume_stat_calculator.go:65 +0x2a
Sep 07 12:05:22 k8s-01 kubelet[1587]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00134b9b0)
Sep 07 12:05:22 k8s-01 kubelet[1587]:         /workspace/anago-v1.15.3-beta.0.68+2d3c76f9091b6b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
Sep 07 12:05:22 k8s-01 kubelet[1587]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00134b9b0, 0xdf8475800, 0x3ff0000000000000, 0xc000a75d01, 0xc000c2ea80)
Sep 07 12:05:22 k8s-01 kubelet[1587]:         /workspace/anago-v1.15.3-beta.0.68+2d3c76f9091b6b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
Sep 07 12:05:22 k8s-01 kubelet[1587]: created by k8s.io/kubernetes/pkg/kubelet/server/stats.(*volumeStatCalculator).StartOnce.func1
Sep 07 12:05:22 k8s-01 kubelet[1587]:         /workspace/anago-v1.15.3-beta.0.68+2d3c76f9091b6b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/server/stats/volume_stat_calculator.go:64 +0x9d

Volume in different location than Pod

Hi, I am having an issue where the volume seems to be created in one region while the pod asking for it is in a different region.
Here is the error:

desc = failed to publish volume: server and volume must be in the same location

I have a cluster with workers in different regions; is this something the driver is not compatible with?
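
Attaching only works when the volume and the server are in the same Hetzner location, so with workers spread across locations the volume has to be provisioned where the pod actually lands. If the installed StorageClass provisions immediately, a class with volumeBindingMode: WaitForFirstConsumer delays volume creation until the pod is scheduled, so the provisioner (which runs with the Topology feature gate, as seen in the csi-provisioner log above) can pick the selected node's location. A sketch, with a hypothetical class name:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hcloud-volumes-wffc            # hypothetical name
provisioner: csi.hetzner.cloud
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

Note that a pod can still only use volumes located in its own node's location; the binding mode just makes sure new volumes end up in the right one.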

A bit of help with RKE

Hi, I'm super glad you guys made this library; until now I have been having to run external provisioners and Ansible scripts XD

I'm running the latest version of RKE with the config below, and this is the error I'm getting, even though RBAC looks fine to me.

ssh_agent_auth: true
ingress:
  provider: none
network:
  plugin: calico
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
  kube-controller:
    extra_args:
      cluster-signing-cert-file: /etc/kubernetes/ssl/kube-ca.pem
      cluster-signing-key-file: /etc/kubernetes/ssl/kube-ca-key.pem

  kube-api:
    extra-args:
      feature-gates: "CSINodeInfo=true,CSIDriverRegistry=true,MountPropagation=true"

  kubelet:
    extra_args:
      volume-plugin-dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
      feature-gates: "CSINodeInfo=true,CSIDriverRegistry=true,MountPropagation=true"
    extra_binds:
      - /usr/libexec/kubernetes/kubelet-plugins/volume/exec:/usr/libexec/kubernetes/kubelet-plugins/volume/exec
      - /mnt:/mnt:rshared
      - "/var/lib/kubelet/plugins:/var/lib/kubelet/plugins"
      - "/var/lib/kubelet/plugins_registry:/var/lib/kubelet/plugins_registry"
      - "/var/lib/kubelet/pods:/var/lib/kubelet/pods:shared,z"
&RegistrationStatus{PluginRegistered:false,Error:plugin registration failed with err: error uninstalling CSI driver from CSINodeInfo object error updating CSINodeInfo: timed out waiting for the condition; caused by: [csinodeinfos.csi.storage.k8s.io "116.203.120.174" is forbidden: User "system:node" cannot get resource "csinodeinfos" in API group "csi.storage.k8s.io" at the cluster scope

CSI Spamming

Hi all, I'm not sure if this problem is related to the csi-driver for Hetzner Cloud, but maybe you can point me in the right direction.
After enabling the feature gates I'm seeing constant spam in the kubelet log:

Mar 11 22:55:40 k8s-dev-master-01 kubelet[26459]: E0311 22:55:40.958690   26459 reflector.go:134] k8s.io/csi-api/pkg/client/informers/externalversions/factory.go:117: Failed to list *v1alpha1.CSIDriver: csidrivers.csi.storage.k8s.io is forbidden: User "system:node:k8s-dev-master-01" cannot list resource "csidrivers" in API group "csi.storage.k8s.io" at the cluster scope
Mar 11 22:55:41 k8s-dev-master-01 kubelet[26459]: E0311 22:55:41.963211   26459 reflector.go:134] k8s.io/csi-api/pkg/client/informers/externalversions/factory.go:117: Failed to list *v1alpha1.CSIDriver: csidrivers.csi.storage.k8s.io is forbidden: User "system:node:k8s-dev-master-01" cannot list resource "csidrivers" in API group "csi.storage.k8s.io" at the cluster scope
Mar 11 22:55:42 k8s-dev-master-01 kubelet[26459]: E0311 22:55:42.966554   26459 reflector.go:134] k8s.io/csi-api/pkg/client/informers/externalversions/factory.go:117: Failed to list *v1alpha1.CSIDriver: csidrivers.csi.storage.k8s.io is forbidden: User "system:node:k8s-dev-master-01" cannot list resource "csidrivers" in API group "csi.storage.k8s.io" at the cluster scope

At the same time the driver works and I'm able to create a PVC and attach a PV to a pod without issues.

Thanks in advance.
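
This looks like the kubelet being denied access to the alpha csi.storage.k8s.io objects (the CSIDriver/CSINodeInfo CRDs used with these feature gates on 1.13), which also matches the forbidden error in the previous RKE issue. One hedged workaround, assuming those CRDs are installed, is to grant the system:nodes group access to them (the names below are hypothetical):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: csi-alpha-node-access          # hypothetical name
rules:
  - apiGroups: ["csi.storage.k8s.io"]
    resources: ["csidrivers", "csinodeinfos"]
    verbs: ["get", "list", "watch", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: csi-alpha-node-access          # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: csi-alpha-node-access
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:nodes

Whether widening node RBAC like this is appropriate depends on the distribution, so treat it as a diagnostic step rather than a permanent fix.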
