
gcs-fuse-csi-driver's Introduction

Google Cloud Storage FUSE CSI Driver

The Google Cloud Storage FUSE Container Storage Interface (CSI) Plugin.

WARNING: Manual deployment of this driver to your GKE cluster is not recommended. Instead, users should let GKE automatically deploy and manage the CSI driver as an add-on feature. See the GKE documentation: Access Cloud Storage buckets with the Cloud Storage FUSE CSI driver.

DISCLAIMER: Manual deployment of the driver to your cluster is not officially supported by Google.

Project Overview

Filesystem in Userspace (FUSE) is an interface used to export a filesystem to the Linux kernel. Cloud Storage FUSE allows you to mount Cloud Storage buckets as a file system so that applications can access the objects in a bucket using common file I/O operations (e.g. open, read, write, close) rather than cloud-specific APIs.

The Google Cloud Storage FUSE CSI Driver lets you use the Kubernetes API to mount pre-existing Cloud Storage buckets as volumes that are consumable from a Pod. Your applications can upload and download objects using Cloud Storage FUSE file system semantics.

The driver natively supports the following ways for you to configure your Cloud Storage bucket-backed volumes (minimal sketches of both follow the list):

  • CSI ephemeral volumes: You specify the Cloud Storage bucket in-line with the Pod specification. To learn more about this volume type, see the CSI ephemeral volumes overview in the open source Kubernetes documentation.

  • Static provisioning: You create a PersistentVolume resource that refers to the Cloud Storage bucket. Your Pod can then reference a PersistentVolumeClaim that is bound to this PersistentVolume. To learn more about this workflow, see Configure a Pod to Use a PersistentVolume for Storage.

Currently, the driver does not support Dynamic Volume Provisioning.
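For illustration, here is a minimal sketch of both styles, condensed from the examples later on this page; the Pod name, bucket name, and storage class are placeholders.

# CSI ephemeral volume: the bucket is declared in-line in the Pod spec
apiVersion: v1
kind: Pod
metadata:
  name: gcs-fuse-ephemeral-example  # placeholder name
  annotations:
    gke-gcsfuse/volumes: "true"     # lets the webhook inject the gcsfuse sidecar
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sleep"]
    args: ["infinity"]
    volumeMounts:
    - name: gcs-fuse-csi-ephemeral
      mountPath: /data
  volumes:
  - name: gcs-fuse-csi-ephemeral
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: my-bucket       # placeholder bucket name
---
# Static provisioning: a PersistentVolume refers to the bucket;
# a PVC with volumeName: gcs-fuse-csi-pv and the same storageClassName binds to it
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 5Gi
  storageClassName: example-storage-class  # placeholder; used only for PV/PVC matching
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: my-bucket                # placeholder bucket name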

Benefits

  • Enabling the Cloud Storage FUSE CSI driver on your cluster turns on automatic deployment and management of the driver. The driver works on both GKE Standard and Autopilot clusters. To leverage this benefit, use GKE to automatically deploy and manage the CSI driver as an add-on feature. See the GKE documentation Access Cloud Storage buckets with the Cloud Storage FUSE CSI driver.

  • The Cloud Storage FUSE CSI driver does not need privileged access that is typically required by FUSE clients. This enables a better security posture.

  • The Cloud Storage FUSE CSI driver allows applications to access data stored in Cloud Storage buckets using file system semantics.

  • The Cloud Storage FUSE CSI driver supports the ReadWriteMany, ReadOnlyMany, and ReadWriteOnce access modes.

  • You can use GKE Workload Identity to easily manage authentication while having granular control over how your Pods access Cloud Storage buckets objects.

  • Many AI/ML/Batch workloads store data in Cloud Storage buckets. The Cloud Storage FUSE CSI driver enables GKE customers running ML training and serving workloads using frameworks like Ray, PyTorch, Spark, and TensorFlow to run their workloads directly on a GKE cluster without requiring any change to the code. This provides portability and simplicity with file semantics.

Project Status

Status: General Availability

GKE Compatibility

Refer to the Google Cloud Storage FUSE CSI Driver Release Notes.

Get Started

Development and Contribution

Refer to the Cloud Storage FUSE CSI Driver Development Guide.

Attribution

This project is inspired by the following open source projects:

References

gcs-fuse-csi-driver's People

Contributors

ahmedwaleedmalik, amacaskill, dependabot[bot], hime, judemars, mattcary, msau42, renovate-bot, saikat-royc, sethiay, songjiaxun, tulsishah, tyuchn, vsoch


gcs-fuse-csi-driver's Issues

Fail to mount the PV when using Anthos Service Mesh

Hello,

I'm encountering an issue when mounting a bucket as a PV with Anthos Service Mesh. Please find the YAML at the end of the issue. It works perfectly fine when istio injection is disabled.

  Type     Reason       Age              From               Message
  ----     ------       ----             ----               -------
  Normal   Scheduled    14s              default-scheduler  Successfully assigned nginx/nginx-d576dc799-6dmvs to xxxxxxxxxxx
  Normal   Pulled       11s              kubelet            Container image "gcr.io/gke-release/asm/proxyv2:1.15.7-asm.8" already present on machine
  Normal   Created      11s              kubelet            Created container istio-init
  Normal   Started      11s              kubelet            Started container istio-init
  Normal   Pulled       10s              kubelet            Container image "gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter:v0.1.3-gke.0@sha256:854e1aa1178dc3f7e3ec5fa03cea5e32f0385ff6230efd836a22e86beb876740" already present on machine
  Normal   Created      10s              kubelet            Created container gke-gcsfuse-sidecar
  Normal   Started      9s               kubelet            Started container gke-gcsfuse-sidecar
  Warning  Failed       2s               kubelet            Error: failed to generate container "77ccfad98f48aa01e248fed7e7a444e14a348b06bc55531a158a14462c4b406e" spec: failed to generate spec: failed to stat "/var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount": stat /var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount: transport endpoint is not connected
  Normal   Pulled       2s               kubelet            Container image "gcr.io/gke-release/asm/proxyv2:1.15.7-asm.8" already present on machine
  Normal   Created      2s               kubelet            Created container istio-proxy
  Normal   Started      2s               kubelet            Started container istio-proxy
  Warning  Unhealthy    1s               kubelet            Readiness probe failed: Get "http://100.64.128.58:15021/healthz/ready": dial tcp 100.64.128.58:15021: connect: connection refused
  Warning  Failed       1s               kubelet            Error: failed to generate container "8c309e092fd45b084460c54349deff6d01e55bfd8b4db97e5041032dc3a10bca" spec: failed to generate spec: failed to stat "/var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount": stat /var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount: transport endpoint is not connected
  Normal   Pulled       0s (x3 over 9s)  kubelet            Container image "nginx:1.14.2" already present on machine
  Warning  FailedMount  0s (x2 over 1s)  kubelet            MountVolume.SetUp failed for volume "gcs-fuse-csi-pv" : rpc error: code = Internal desc = the sidecar container failed with error: mountWithArgs: failed to open connection - getConnWithRetry: get token source: DefaultTokenSource: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
gcsfuse exited with error: exit status 1
  Warning  Failed  0s  kubelet  Error: failed to generate container "5b5229a1b7ccaf54885b2dbbe34b1ec0d41e42d783934c30e49c2b7e816019eb" spec: failed to generate spec: failed to stat "/var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount": stat /var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount: transport endpoint is not connected
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv
spec:
  accessModes:
  - ReadOnlyMany
  capacity:
    storage: 5Gi
  storageClassName: static-files-bucket
  claimRef:
    namespace: nginx
    name: gcs-fuse-csi-static-pvc
  mountOptions:
    - implicit-dirs
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: my-bucket
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-fuse-csi-static-pvc
  namespace: nginx
spec:
  accessModes:
  - ReadOnlyMany
  resources:
    requests:
      storage: 5Gi
  volumeName: gcs-fuse-csi-pv
  storageClassName: static-files-bucket
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nginx
  namespace: nginx
  annotations:
    iam.gke.io/gcp-service-account: nginx-gcs@{PROJECT_ID}.iam.gserviceaccount.com
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 # tells deployment to run 2 pods matching the template
  template:
    metadata:
      annotations:
        gke-gcsfuse/volumes: "true"
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        volumeMounts:
        - name: gcs-fuse-csi-static
          mountPath: /data
          readOnly: true
      serviceAccountName: nginx
      volumes:
      - name: gcs-fuse-csi-static
        persistentVolumeClaim:
          claimName: gcs-fuse-csi-static-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: nginx
spec:
  ports:
  - name: http
    port: 80
  selector:
    app: nginx
---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: googleapi
  namespace: nginx
spec:
  hosts:
  - googleapis.com
  location: MESH_EXTERNAL
  ports:
  - name: https
    number: 443
    protocol: HTTPS
  resolution: DNS

Multiple PVs referring to the same bucket cannot be consumed by one Pod

Symptom

When multiple PVs consume the same bucket via the volumeHandle field, and the PVCs bound to these PVs are consumed by the same Pod, the volume mount will time out.

For example, the following Pod will be stuck in the volume mount stage.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-fuse-csi-static-pvc-1
spec:
  ...
  volumeName: gcs-fuse-csi-pv-1
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv-1
spec:
  ...
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: same-bucket-name
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-fuse-csi-static-pvc-2
spec:
  ...
  volumeName: gcs-fuse-csi-pv-2
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv-2
spec:
  ...
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: same-bucket-name
    readOnly: false
---
apiVersion: v1
kind: Pod
metadata:
  name: gcs-fuse-csi-example
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  serviceAccountName: gcs-csi
  containers:
    ...
  volumes:
  - name: gcs-fuse-csi-static-1
    persistentVolumeClaim:
      claimName: gcs-fuse-csi-static-pvc-1
  - name: gcs-fuse-csi-static-2
    persistentVolumeClaim:
      claimName: gcs-fuse-csi-static-pvc-2

Root Cause

According to the kubelet code: https://github.com/kubernetes/kubernetes/blob/8f15859afc9cfaeb05d4915ffa204d84da512094/pkg/kubelet/volumemanager/cache/desired_state_of_world.go#L296-L298

For non-attachable and non-device-mountable volumes, generate a unique name based on the pod namespace and name and the name of the volume within the pod.

In the case of a CSI driver with a pre-provisioned PV, the volume name is specified via the volumeHandle. Different PVs with the same volumeHandle are treated as the same volume; therefore, after kubelet mounts one of the PVs, the other PVs are treated as already mounted. As a result, the Pod gets stuck in the volume mount stage.

Solution

If, for some reason, your Pod needs to consume multiple volumes pointing to the same bucket, consider one of the following two approaches:

  1. CSI ephemeral inline volume

Use a CSI ephemeral inline volume to configure the Pod if it needs to mount the same bucket at different mount paths.

For example, the following Pod has two volumes referring to the same bucket same-bucket-name, but the volumes are mounted to different mount paths using different mount options.

apiVersion: v1
kind: Pod
metadata:
  name: gcs-fuse-csi-example
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  serviceAccountName: gcs-csi
  containers:
    - image: busybox
      name: busybox
      command: ["sleep"]
      args: ["infinity"]
      volumeMounts:
      - name: gcs-fuse-csi-inline-1
        mountPath: "/upload"
      - name: gcs-fuse-csi-inline-2
        mountPath: "/download"
  volumes:
  - name: gcs-fuse-csi-inline-1
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: same-bucket-name
        mountOptions: "debug_fuse,debug_fs,debug_gcs,implicit-dirs,only-dir=upload"
  - name: gcs-fuse-csi-inline-2
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: same-bucket-name
        mountOptions: "debug_fuse,debug_fs,debug_gcs,implicit-dirs,only-dir=download"
  2. SubPath feature

You can use PV/PVC with the subPath feature. For example:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-fuse-csi-static-pvc
spec:
  ...
  volumeName: gcs-fuse-csi-pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv
spec:
  ...
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: my-bucket-name
    readOnly: false
---
apiVersion: v1
kind: Pod
metadata:
  name: gcs-fuse-csi-example
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  serviceAccountName: gcs-csi
  containers:
  - name: busybox
    image: busybox
    volumeMounts:
    - name: gcs-fuse-csi-static
      mountPath: /log-data
      subPath: "data/log"
    - name: gcs-fuse-csi-static
      mountPath: /config-data
      subPath: "data/config"
  volumes:
  - name: gcs-fuse-csi-static
    persistentVolumeClaim:
      claimName: gcs-fuse-csi-static-pvc

Resource limitation for the sidecar container on Autopilot

Looking at the default PyTorch example in this repository, I see some incompatibilities with the minimum Autopilot resource requests [1]. I think we will have many problems allocating sidecar resources if we have these high minimum limits on Autopilot.

annotations:
  gke-gcsfuse/volumes: "true"
  gke-gcsfuse/cpu-limit: "10"
  gke-gcsfuse/memory-limit: 40Gi
  gke-gcsfuse/ephemeral-storage-limit: 20Gi

[1]https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-resource-requests
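On Autopilot, one way to stay within the allowed resource ranges (a sketch only, not validated against Autopilot's current minimums) is to request smaller sidecar limits through the same annotations; the values below are illustrative.

annotations:
  gke-gcsfuse/volumes: "true"
  gke-gcsfuse/cpu-limit: "500m"             # illustrative value
  gke-gcsfuse/memory-limit: 1Gi             # illustrative value
  gke-gcsfuse/ephemeral-storage-limit: 2Gi  # illustrative value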

improve the documentation

  • Clearly state that users need to start with the GKE doc.
  • Update the example applications.
  • Be more specific on the SA role.
  • Sync the doc with the GKE doc.
  • Talk about internal errors in the common issues section.
  • Be more specific on how to troubleshoot and check the errors.
  • Be more specific when the WI fails to work on TSG.

Remove the requirement of storage.buckets.get permission from the CSI driver

The CSI driver only needs the storage.objects.list permission to make sure the bucket exists.

After the change, users will only need the following permissions:

  • For read-only workloads: roles/storage.objectViewer.
  • For read-write workloads: roles/storage.objectAdmin.

The issue is fixed by commit 8c6164c.

The fix will be included in the next release v0.1.4.

The doc will be updated before the release.

The sidecar container does not work well with istio-proxy sidecar container

Symptom

If the gcsfuse sidecar container starts before the istio-proxy sidecar container, gcsfuse will fail with the following error:

mountWithArgs: failed to open connection - getConnWithRetry: get token source: DefaultTokenSource: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.

Root cause

The same as #46 (comment).

Solution

This issue is tracked on GitHub, and we are waiting for the Kubernetes sidecar container feature to become available on GKE to ultimately solve this issue.
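As an interim mitigation (an assumption on my part, not something confirmed in this issue), Istio's holdApplicationUntilProxyStarts option can be set per Pod so that istio-proxy starts and becomes ready before the other containers, which would also cover the gke-gcsfuse-sidecar:

metadata:
  annotations:
    proxy.istio.io/config: |
      holdApplicationUntilProxyStarts: true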

Long running pods time out accessing cloud storage volume OSError: [Errno 107] Transport endpoint is not connected

I have a use case involving the gcs-fuse-csi-driver storing logs from airflow workers. If I have a pod that runs for 24 hours or so, over time, it starts erroring on mkdir calls to the volume:

OSError: [Errno 107] Transport endpoint is not connected: '/opt/airflow/logbucket/dag_id=xxx/run_id=scheduled__2023-07-31T00:00:00+00:00/task_id=begin_drop_day_driving_table'"

and

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/pathlib.py", line 1116, in mkdir
    os.mkdir(self, mode)
OSError: [Errno 107] Transport endpoint is not connected: '/opt/airflow/logbucket/dag_id=xxx/run_id=scheduled__2023-07-31T00:00:00+00:00/task_id=begin_drop_day_driving_table'"

This continues until my pod eventually fails and restarts. I tried looking for clues to this behavior, but all I could find was a few people complaining about gcsfuse having similar issues after running for a long time. Resource (CPU/RAM) consumption from the pod is minimal. Any ideas what could be causing this? It is happening nightly.

I have the volume mounted using a CSI ephemeral volume, if that helps.

Unable to mount on initContainer

Using a statically provisioned PVC in init containers causes pod initialization to hang and eventually fail with Init:CreateContainerError.

pod init container spec

  initContainers:
  - image: ubuntu:latest
    command:
    - "echo"
    - "12345"
    imagePullPolicy: Always
    name: initer
    resources:
      requests:
        memory: "2Gi"
        cpu: "1000m"
        ephemeral-storage: "10Gi"
    volumeMounts:
    - name: gcs-fuse-csi-pvc
      mountPath: /data
      readOnly: true

k8s events

Events:
  Type     Reason       Age                  From                                   Message
  ----     ------       ----                 ----                                   -------
  Normal   Scheduled    2m31s                gke.io/optimize-utilization-scheduler  Successfully assigned test/my_pod to gk3-test-pool-2-3d6bf18a-xnl2
  Warning  FailedMount  28s                  kubelet                                Unable to attach or mount volumes: unmounted volumes=[gcs-fuse-csi-pvc], unattached volumes=[gcs-fuse-csi-pvc kube-api-access-8r69w gke-gcsfuse-tmp]: timed out waiting for the condition
  Warning  FailedMount  23s (x9 over 2m31s)  kubelet                                MountVolume.MountDevice failed for volume "gcs-fuse-csi-pv" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name gcsfuse.csi.storage.gke.io not found in the list of registered CSI drivers

Driver version
Running Google Cloud Storage FUSE CSI driver sidecar mounter version v0.1.3-gke.0

GKE
1.26.3-gke.1000

As the sidecar is needed for normal containers to access the bucket, I believe this issue arises because the init container does not have a sidecar to support mounting.

Enable Workload Identity using Terraform

We can see the bucket data in the volume mount inside our container. However, if we try to write a file we get this error: Access denied.

The container that uses the mount is run as root user.

Those are the logs from the sidecar container:

D0628 12:12:35.054078 gcs: Req       0x73: <- ListObjects("a.txt/")

D0628 12:12:35.054161 gcs: Req       0x74: <- StatObject("a.txt")

D0628 12:12:35.074384 gcs: Req       0x73: -> ListObjects("a.txt/") (20.31856ms): OK

D0628 12:12:35.096392 gcs: Req       0x74: -> StatObject("a.txt") (42.316629ms): gcs.NotFoundError: storage: object doesn't exist

D0628 12:12:35.096420 debug_fs: LookUpInode(1, "a.txt"): no such file or directory

D0628 12:12:35.096449 fuse_debug: Op 0x0000030a    connection.go:500] -> Error: "no such file or directory"

D0628 12:12:35.096562 fuse_debug: Op 0x0000030c    connection.go:416] <- CreateFile (parent 1, name "a.txt", PID 0)

D0628 12:12:35.096648 gcs: Req       0x75: <- CreateObject("a.txt")

D0628 12:12:35.110979 gcs: Req       0x75: -> CreateObject("a.txt") (14.329342ms): error in closing writer : googleapi: Error 403: Access denied., forbidden

D0628 12:12:35.110999 debug_fs: CreateFile(1, "a.txt"): CreateChildFile: error in closing writer : googleapi: Error 403: Access denied., forbidden

E0628 12:12:35.111007 CreateFile: permission denied, CreateChildFile: error in closing writer : googleapi: Error 403: Access denied., forbidden

D0628 12:12:35.111039 fuse_debug: Op 0x0000030c    connection.go:500] -> Error: "permission denied"

E0628 12:12:35.111044 fuse: *fuseops.CreateFileOp error: permission denied

I0628 12:12:56.048706 Starting a garbage collection run.

We followed the ephemeral volume mount configuration.

@songjiaxun

Synchronization Issue between gcsfuse and Kubernetes Pod: 'No Such File or Directory' Error on File Update

I am encountering a synchronization issue between gcsfuse and a pod in a Kubernetes environment. When I update the files in the Google Cloud Storage (GCS) bucket mounted by gcsfuse, the gcsfuse sidecar fails to access the updated files and throws an error.

Error Logs:

fuse: *fuseops.ReadFileOp error: no such file or directory
ReadFile: no such file or directory, fh.reader.ReadAt: startRead: NewReader: storage: object doesn't exist

Configuration:

GKE Pod: Mounted GCS bucket using gcsfuse with the following configuration:
mountOptions: 'uid=101,gid=82'

Kubernetes Job: Updates the contents of the GCS bucket by replacing the existing files. Names are not changed.

Observations and Troubleshooting Steps Taken:

  • Verified that the files are successfully uploaded to the GCS bucket by the Kubernetes Job.
  • Confirmed that the gcsfuse volume is correctly mounted in the pod.
  • Ensured consistent file naming between the updated files and the files accessed by the GKE Pod.
  • Verified file system permissions and confirmed that the uid and gid specified in the mountOptions match the permissions required by the Pod.
  • Tried setting stat-cache-ttl=0, type-cache-ttl=0, implicit-dirs in the mount options

Anything that I'm missing here?

insufficient ephemeral storage in static mounting mode

When mounting buckets in a static way, one has to be aware of the default ephemeral storage value.

Case: a GKE node with 2G RAM, mounting 10Gi in static mode (PV/PVC), with the annotation gke-gcsfuse/volumes: "true".
With the above setup it will not work: an insufficient ephemeral storage error appears, because the default ephemeral value is 5Gi, which is greater than the k8s ephemeral volume limit of 1/2 of node RAM.

Workaround: add the annotation gke-gcsfuse/ephemeral-storage-limit: 100Mi (or another arbitrary value less than 1/2 of node RAM), as sketched below.
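A minimal sketch of the workaround in the Pod template metadata (100Mi is the value suggested above; size it for your node):

metadata:
  annotations:
    gke-gcsfuse/volumes: "true"
    gke-gcsfuse/ephemeral-storage-limit: 100Mi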

Sidecar not working with workload identity

We have an app that is trying to use the GCSFuse CSI driver to attach a Cloud Storage bucket as an ephemeral volume.

We see this error when the Pod is initializing

FailedMount
MountVolume.SetUp failed for volume "gcsfuse-csi-volume" : rpc error: code = Unauthenticated desc = failed to prepare storage service: storage service manager failed to setup service: context deadline exceeded

On investigating further, we see this in the CSI driver logs:

{
  "insertId": "q8c6r58br9tgttx3",
  "jsonPayload": {
    "message": "/csi.v1.Node/NodePublishVolume called with request: volume_id:\"csi-893e225a821a4110581b7d0e775c6b3275fdc2f2b37bde3bbf897254c2e682cc\" target_path:\"/var/lib/kubelet/pods/e3aeade0-f02a-499b-91e0-737e037ed426/volumes/kubernetes.io~csi/gcsfuse_csi_volume/mount\" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"bucketName\" value:\"gcsfuse\" > volume_context:<key:\"csi.storage.k8s.io/ephemeral\" value:\"true\" > volume_context:<key:\"csi.storage.k8s.io/pod.name\" value:\"gcsfuse_csi-test-7979c4764-z5pbh\" > volume_context:<key:\"csi.storage.k8s.io/pod.namespace\" value:\"gcsfuse-ns\" > volume_context:<key:\"csi.storage.k8s.io/pod.uid\" value:\"e3aeade0-f02a-499b-91e0-737e037ed426\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.name\" value:\"gcsfuse-sa\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.tokens\" value:\"***stripped***\" > volume_context:<key:\"mountOptions\" value:\"implicit-dirs,debug_fuse,debug_fs,debug_gcs\" > ",
    "pid": "1"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "cluster_name": "project_1_cluster_1",
      "pod_name": "gcsfusecsi-node-scx67",
      "location": "us-east1",
      "container_name": "gcs-fuse-csi-driver",
      "project_id": "project_1",
      "namespace_name": "kube-system"
    }
  },
  "timestamp": "2023-08-22T19:41:22.787536755Z",
  "severity": "INFO",
  "labels": {
    "compute.googleapis.com/resource_name": "project_1_cluster_1_nodepool_1",
    "k8s-pod/controller-revision-hash": "76cf56d8c8",
    "k8s-pod/pod-template-generation": "1",
    "k8s-pod/k8s-app": "gcs-fuse-csi-driver"
  },
  "logName": "projects/project_1/logs/stderr",
  "sourceLocation": {
    "file": "utils.go",
    "line": "83"
  },
  "receiveTimestamp": "2023-08-22T19:41:22.872838087Z"
}
{
  "insertId": "063x361d1ld2tskm",
  "jsonPayload": {
    "message": "error fetching initial token: GCP service account token fetch error: fetch GCP service account token error: rpc error: code = PermissionDenied desc = Permission 'iam.serviceAccounts.getAccessToken' denied on resource (or it may not exist).",
    "pid": "1"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "pod_name": "gcsfusecsi-node-scx67",
      "location": "us-east1",
      "container_name": "gcs-fuse-csi-driver",
      "project_id": "project_1",
      "cluster_name": "project_1_cluster_1",
      "namespace_name": "kube-system"
    }
  },
  "timestamp": "2023-08-22T19:31:14.678976822Z",
  "severity": "ERROR",
  "labels": {
    "k8s-pod/pod-template-generation": "1",
    "compute.googleapis.com/resource_name": "project_1_cluster_1_nodepool_1",
    "k8s-pod/k8s-app": "gcs-fuse-csi-driver",
    "k8s-pod/controller-revision-hash": "76cf56d8c8"
  },
  "logName": "projects/project_1/logs/stderr",
  "sourceLocation": {
    "file": "storage.go",
    "line": "71"
  },
  "receiveTimestamp": "2023-08-22T19:31:17.875220538Z"
}

I see that the CSI driver is initializing in the kube-system namespace, but my app is in another namespace. We are using Workload Identity in the app's namespace. I can see from the initialization that it knows which Kubernetes service account and GCP service account to use, but I am not sure why I am getting the PermissionDenied message.
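For context, a PermissionDenied error on iam.serviceAccounts.getAccessToken typically indicates that the Workload Identity link between the Kubernetes ServiceAccount and the GCP service account is incomplete. A minimal sketch of the Kubernetes side is below; the GCP service account name is a placeholder, and the corresponding roles/iam.workloadIdentityUser binding for the member serviceAccount:project_1.svc.id.goog[gcsfuse-ns/gcsfuse-sa] must also exist on that GCP service account.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: gcsfuse-sa       # the KSA referenced in the NodePublishVolume log above
  namespace: gcsfuse-ns  # the namespace from the log above
  annotations:
    iam.gke.io/gcp-service-account: my-gsa@project_1.iam.gserviceaccount.com  # placeholder GSA name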

ephemeral storage for very large tar ~100GB

I have a local folder (backup) of ~100GB of files. If I tar the folder directly onto the bucket, e.g. tar -cf /tmp/bucketmount/backup.tar /backup/, will there be any issues with the CSI driver? I see that gcsfuse CSI depends on tempDir{} or some temp directory for staging files before they are uploaded to the bucket.
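A sketch, under the assumption that gcsfuse stages the file being written in the sidecar's emptyDir and that this counts against the sidecar's ephemeral-storage limit: the gke-gcsfuse/ephemeral-storage-limit annotation would need to be sized above the tar, and the node must have that much local storage available. The 150Gi value below is hypothetical.

metadata:
  annotations:
    gke-gcsfuse/volumes: "true"
    gke-gcsfuse/ephemeral-storage-limit: 150Gi  # hypothetical; must exceed the ~100GB tar being staged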

rpc error: code = Internal desc = the sidecar container terminated due to ContainerStatusUnknown

I'm using this driver to mount a gcs bucket to a local folder in the pod and I'm seeing the error below:

kubelet  MountVolume.SetUp failed for volume "XXX" : rpc error: code = Internal desc = the sidecar container terminated due to ContainerStatusUnknown

I'm running a GKE cluster and this issue happens only on pods running on preemptible nodes that are restarted due to node replacement.

Driver Not Working

Hi again! I am trying to debug why my pods are stuck in Pending. Here are a bunch of logs/configs that might shed some light.

Driver logs seem OK:

$ kubectl logs -n gcs-fuse-csi-driver gcs-fuse-csi-driver-webhook-569899b854-w9sm7 -f
I0407 21:56:37.409773       1 main.go:54] Running Google Cloud Storage FUSE CSI driver admission webhook version v0.1.2-0-gd9e3bdd, sidecar container image jiaxun/gcs-fuse-csi-driver-sidecar-mounter:v0.1.2-0-gd9e3bdd
I0407 21:56:37.409994       1 metrics.go:89] Emit component_version metric with value v999.999.999
I0407 21:56:37.410024       1 main.go:71] Setting up manager.
I0407 21:56:37.410320       1 metrics.go:68] Metric server listening at ":22032"
I0407 21:56:38.216399       1 main.go:90] Setting up webhook server.
I0407 21:56:38.216461       1 main.go:95] Registering webhooks to the webhook server.
I0407 21:56:38.216703       1 main.go:103] Starting manager.
I0407 22:25:54.297692       1 mutatingwebhook.go:109] mutating Pod: Name "", GenerateName "flux-sample-0-", Namespace "flux-operator", CPU limit "250m", memory limit "256Mi", ephemeral storage limit "5Gi"
I0407 22:25:54.404581       1 mutatingwebhook.go:109] mutating Pod: Name "", GenerateName "flux-sample-1-", Namespace "flux-operator", CPU limit "250m", memory limit "256Mi", ephemeral storage limit "5Gi"

I'm not sure if this message about "cannot create temp dir" with the read-only file system is the bug...

$ kubectl logs -n gcs-fuse-csi-driver gcsfusecsi-node-bxxwm -c gcs-fuse-csi-driver 
I0407 21:56:31.496608       1 clientset.go:51] using in-cluster kubeconfig
I0407 21:56:31.499001       1 metadata.go:51] got empty identityPool, constructing the identityPool using projectID
I0407 21:56:31.499026       1 metadata.go:56] got empty identityProvider, constructing the identityProvider using the gke-metadata-server flags
I0407 21:56:31.510894       1 mount_linux.go:275] Cannot create temp dir to detect safe 'not mounted' behavior: mkdir /tmp/kubelet-detect-safe-umount3354328308: read-only file system
I0407 21:56:31.510952       1 gcs_fuse_driver.go:110] Enabling volume access mode: SINGLE_NODE_WRITER
I0407 21:56:31.510985       1 gcs_fuse_driver.go:110] Enabling volume access mode: SINGLE_NODE_READER_ONLY
I0407 21:56:31.510995       1 gcs_fuse_driver.go:110] Enabling volume access mode: MULTI_NODE_READER_ONLY
I0407 21:56:31.511001       1 gcs_fuse_driver.go:110] Enabling volume access mode: MULTI_NODE_SINGLE_WRITER
I0407 21:56:31.511037       1 gcs_fuse_driver.go:110] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0407 21:56:31.511171       1 main.go:112] Running Google Cloud Storage FUSE CSI driver version v0.1.2-0-gd9e3bdd, sidecar container image jiaxun/gcs-fuse-csi-driver-sidecar-mounter:v0.1.2-0-gd9e3bdd
I0407 21:56:31.511187       1 gcs_fuse_driver.go:190] Running driver: gcsfuse.csi.storage.gke.io
I0407 21:56:31.511334       1 server.go:75] Start listening with scheme unix, addr /csi/csi.sock
I0407 21:56:31.511620       1 server.go:97] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0407 21:56:38.483150       1 utils.go:82] /csi.v1.Identity/GetPluginInfo called with request: 
I0407 21:56:38.483173       1 utils.go:87] /csi.v1.Identity/GetPluginInfo succeeded with response: name:"gcsfuse.csi.storage.gke.io" vendor_version:"v0.1.2-0-gd9e3bdd" 
I0407 21:56:39.164273       1 utils.go:82] /csi.v1.Node/NodeGetInfo called with request: 
I0407 21:56:39.164345       1 utils.go:87] /csi.v1.Node/NodeGetInfo succeeded with response: node_id:"gke-flux-cluster-default-pool-a53eb99b-55kb" 

My pod and the sidecar container created for it have no logs (in pending):

$ kubectl logs -n flux-operator flux-sample-0-x265q -c flux-sample -f
$ kubectl logs -n flux-operator flux-sample-0-x265q -c gke-gcsfuse-sidecar -f

Stuck in pending:

kubectl get -n flux-operator pods
NAME                         READY   STATUS      RESTARTS   AGE
flux-sample-0-x265q          0/2     Pending     0          4m47s
flux-sample-1-nwjwx          0/2     Pending     0          4m47s
flux-sample-cert-generator   0/1     Completed   0          4m47s

The PVC seems OK, it's waiting:

$ kubectl get -n flux-operator pvc
NAME   STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS     AGE
data   Pending                                      gcs-fuse-class   18m
$ kubectl describe -n flux-operator pvc
Name:          data
Namespace:     flux-operator
StorageClass:  gcs-fuse-class
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: gcsfuse.csi.storage.gke.io
               volume.kubernetes.io/selected-node: gke-flux-cluster-default-pool-a53eb99b-55kb
               volume.kubernetes.io/storage-provisioner: gcsfuse.csi.storage.gke.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Used By:       flux-sample-0-x265q
               flux-sample-1-nwjwx
Events:
  Type    Reason                Age                    From                         Message
  ----    ------                ----                   ----                         -------
  Normal  WaitForFirstConsumer  8m32s (x42 over 18m)   persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  ExternalProvisioning  3m32s (x21 over 8m2s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "gcsfuse.csi.storage.gke.io" or manually created by system administrator

This is how I created it:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: data
  namespace: flux-operator
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: gcs-fuse-class

Here is a pending pod:

$ kubectl describe -n flux-operator pods flux-sample-0-x265q 
Name:           flux-sample-0-x265q
Namespace:      flux-operator
Priority:       0
Node:           <none>
Labels:         app.kubernetes.io/name=flux-sample
                controller-uid=a4528f66-7a18-45ba-866a-5e9bfecf7a48
                job-name=flux-sample
                namespace=flux-operator
Annotations:    batch.kubernetes.io/job-completion-index: 0
                container.seccomp.security.alpha.kubernetes.io/gke-gcsfuse-sidecar: runtime/default
                gke-gcsfuse/volumes: true
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  Job/flux-sample
Containers:
  gke-gcsfuse-sidecar:
    Image:      jiaxun/gcs-fuse-csi-driver-sidecar-mounter:v0.1.2-0-gd9e3bdd
    Port:       <none>
    Host Port:  <none>
    Args:
      --v=5
    Limits:
      cpu:                250m
      ephemeral-storage:  5Gi
      memory:             256Mi
    Requests:
      cpu:                250m
      ephemeral-storage:  5Gi
      memory:             256Mi
    Environment:          <none>
    Mounts:
      /gcsfuse-tmp from gke-gcsfuse-tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qxmjl (ro)
  flux-sample:
    Image:      ghcr.io/rse-ops/atacseq:app-latest
    Port:       5000/TCP
    Host Port:  0/TCP
    Command:
      /bin/bash
      /flux_operator/wait-0.sh
      
    Environment:
      JOB_COMPLETION_INDEX:   (v1:metadata.annotations['batch.kubernetes.io/job-completion-index'])
    Mounts:
      /etc/flux/config from flux-sample-flux-config (ro)
      /flux_operator/ from flux-sample-entrypoint (ro)
      /mnt/curve/ from flux-sample-curve-mount (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qxmjl (ro)
      /workflow from data (rw)
Volumes:
  gke-gcsfuse-tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  flux-sample-flux-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flux-sample-flux-config
    Optional:  false
  flux-sample-entrypoint:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flux-sample-entrypoint
    Optional:  false
  flux-sample-curve-mount:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flux-sample-curve-mount
    Optional:  false
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data
    ReadOnly:   false
  kube-api-access-qxmjl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

The CSIDriver

$ kubectl describe -n gcs-fuse-csi-driver CSIDriver gcsfuse.csi.storage.gke.io 
Name:         gcsfuse.csi.storage.gke.io
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  storage.k8s.io/v1
Kind:         CSIDriver
Metadata:
  Creation Timestamp:  2023-04-07T21:56:26Z
  Managed Fields:
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        f:attachRequired:
        f:fsGroupPolicy:
        f:podInfoOnMount:
        f:requiresRepublish:
        f:storageCapacity:
        f:tokenRequests:
        f:volumeLifecycleModes:
          .:
          v:"Ephemeral":
          v:"Persistent":
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2023-04-07T21:56:26Z
  Resource Version:  19691
  UID:               5c8396d0-e4bc-41ff-865b-27cd71ad0c02
Spec:
  Attach Required:     false
  Fs Group Policy:     ReadWriteOnceWithFSType
  Pod Info On Mount:   true
  Requires Republish:  true
  Storage Capacity:    false
  Token Requests:
    Audience:  llnl-flux.svc.id.goog
  Volume Lifecycle Modes:
    Persistent
    Ephemeral
Events:  <none>

The deployment

$ kubectl describe -n gcs-fuse-csi-driver Deployment
Name:                   gcs-fuse-csi-driver-webhook
Namespace:              gcs-fuse-csi-driver
CreationTimestamp:      Fri, 07 Apr 2023 15:56:25 -0600
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=gcs-fuse-csi-driver-webhook
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:       app=gcs-fuse-csi-driver-webhook
  Annotations:  seccomp.security.alpha.kubernetes.io/pod: runtime/default
  Containers:
   gcs-fuse-csi-driver-webhook:
    Image:       jiaxun/gcs-fuse-csi-driver-webhook:v0.1.2-0-gd9e3bdd
    Ports:       22030/TCP, 22031/TCP, 22032/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Args:
      --sidecar-cpu-limit=250m
      --sidecar-memory-limit=256Mi
      --sidecar-ephemeral-storage-limit=5Gi
      --sidecar-image=$(SIDECAR_IMAGE)
      --sidecar-image-pull-policy=$(SIDECAR_IMAGE_PULL_POLICY)
      --cert-dir=/etc/tls-certs
      --port=22030
      --health-probe-bind-address=:22031
      --http-endpoint=:22032
    Limits:
      cpu:     200m
      memory:  200Mi
    Requests:
      cpu:     10m
      memory:  10Mi
    Liveness:  http-get http://:22031/readyz delay=30s timeout=15s period=30s #success=1 #failure=3
    Environment:
      SIDECAR_IMAGE_PULL_POLICY:  IfNotPresent
      SIDECAR_IMAGE:              <set to the key 'sidecar-image' of config map 'gcsfusecsi-image-config'>  Optional: false
      GKE_GCSFUSECSI_VERSION:     v999.999.999
    Mounts:
      /etc/tls-certs from gcs-fuse-csi-driver-webhook-certs (ro)
  Volumes:
   gcs-fuse-csi-driver-webhook-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  gcs-fuse-csi-driver-webhook-secret
    Optional:    false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   gcs-fuse-csi-driver-webhook-569899b854 (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  42m   deployment-controller  Scaled up replica set gcs-fuse-csi-driver-webhook-569899b854 to 1

Seems like there might be failures for the Liveness probe? DaemonSet looks OK

$ kubectl describe -n gcs-fuse-csi-driver DaemonSet
Name:           gcsfusecsi-node
Selector:       k8s-app=gcs-fuse-csi-driver
Node-Selector:  kubernetes.io/os=linux
Labels:         <none>
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 4
Current Number of Nodes Scheduled: 4
Number of Nodes Scheduled with Up-to-date Pods: 4
Number of Nodes Scheduled with Available Pods: 4
Number of Nodes Misscheduled: 0
Pods Status:  4 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           k8s-app=gcs-fuse-csi-driver
  Annotations:      seccomp.security.alpha.kubernetes.io/pod: runtime/default
  Service Account:  gcsfusecsi-node-sa
  Containers:
   gcs-fuse-csi-driver:
    Image:      jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd
    Port:       <none>
    Host Port:  <none>
    Args:
      --v=5
      --endpoint=unix:/csi/csi.sock
      --nodeid=$(KUBE_NODE_NAME)
      --node=true
      --sidecar-image=$(SIDECAR_IMAGE)
    Limits:
      cpu:     200m
      memory:  200Mi
    Requests:
      cpu:     5m
      memory:  10Mi
    Environment:
      KUBE_NODE_NAME:   (v1:spec.nodeName)
      SIDECAR_IMAGE:   <set to the key 'sidecar-image' of config map 'gcsfusecsi-image-config'>  Optional: false
    Mounts:
      /csi from socket-dir (rw)
      /var/lib/kubelet/pods from kubelet-dir (rw)
   csi-driver-registrar:
    Image:      registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0
    Port:       <none>
    Host Port:  <none>
    Args:
      --v=5
      --csi-address=/csi/csi.sock
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    Limits:
      cpu:     50m
      memory:  100Mi
    Requests:
      cpu:     10m
      memory:  10Mi
    Environment:
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/gcsfuse.csi.storage.gke.io/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /registration from registration-dir (rw)
  Volumes:
   registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/
    HostPathType:  Directory
   kubelet-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods/
    HostPathType:  Directory
   socket-dir:
    Type:               HostPath (bare host directory volume)
    Path:               /var/lib/kubelet/plugins/gcsfuse.csi.storage.gke.io/
    HostPathType:       DirectoryOrCreate
  Priority Class Name:  csi-gcp-gcs-node
Events:
  Type    Reason            Age   From                  Message
  ----    ------            ----  ----                  -------
  Normal  SuccessfulCreate  43m   daemonset-controller  Created pod: gcsfusecsi-node-bfwd4
  Normal  SuccessfulCreate  43m   daemonset-controller  Created pod: gcsfusecsi-node-bxxwm
  Normal  SuccessfulCreate  43m   daemonset-controller  Created pod: gcsfusecsi-node-m9d6n
  Normal  SuccessfulCreate  43m   daemonset-controller  Created pod: gcsfusecsi-node-tr6wt

and pods seem OK

$ kubectl describe -n gcs-fuse-csi-driver Pods
Name:         gcs-fuse-csi-driver-webhook-569899b854-w9sm7
Namespace:    gcs-fuse-csi-driver
Priority:     0
Node:         gke-flux-cluster-default-pool-a53eb99b-6ns7/10.128.0.26
Start Time:   Fri, 07 Apr 2023 15:56:26 -0600
Labels:       app=gcs-fuse-csi-driver-webhook
              pod-template-hash=569899b854
Annotations:  cni.projectcalico.org/containerID: c9a83bf8d5268b69fe0cccc92ae85980eda576f1d4b0a948afb66a720e3da1ce
              cni.projectcalico.org/podIP: 10.116.0.5/32
              cni.projectcalico.org/podIPs: 10.116.0.5/32
              seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:       Running
IP:           10.116.0.5
IPs:
  IP:           10.116.0.5
Controlled By:  ReplicaSet/gcs-fuse-csi-driver-webhook-569899b854
Containers:
  gcs-fuse-csi-driver-webhook:
    Container ID:  containerd://df9919203c21747503e5c004ea9a1670115838ef5e5a6dfee03860e6aefd6e06
    Image:         jiaxun/gcs-fuse-csi-driver-webhook:v0.1.2-0-gd9e3bdd
    Image ID:      docker.io/jiaxun/gcs-fuse-csi-driver-webhook@sha256:bb1967c15ee8fcebf8c4c020121497e58f43f08c75066792acff0e1841b0ee34
    Ports:         22030/TCP, 22031/TCP, 22032/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      --sidecar-cpu-limit=250m
      --sidecar-memory-limit=256Mi
      --sidecar-ephemeral-storage-limit=5Gi
      --sidecar-image=$(SIDECAR_IMAGE)
      --sidecar-image-pull-policy=$(SIDECAR_IMAGE_PULL_POLICY)
      --cert-dir=/etc/tls-certs
      --port=22030
      --health-probe-bind-address=:22031
      --http-endpoint=:22032
    State:          Running
      Started:      Fri, 07 Apr 2023 15:56:37 -0600
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  200Mi
    Requests:
      cpu:     10m
      memory:  10Mi
    Liveness:  http-get http://:22031/readyz delay=30s timeout=15s period=30s #success=1 #failure=3
    Environment:
      SIDECAR_IMAGE_PULL_POLICY:  IfNotPresent
      SIDECAR_IMAGE:              <set to the key 'sidecar-image' of config map 'gcsfusecsi-image-config'>  Optional: false
      GKE_GCSFUSECSI_VERSION:     v999.999.999
    Mounts:
      /etc/tls-certs from gcs-fuse-csi-driver-webhook-certs (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jnvb4 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  gcs-fuse-csi-driver-webhook-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  gcs-fuse-csi-driver-webhook-secret
    Optional:    false
  kube-api-access-jnvb4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    44m                default-scheduler  Successfully assigned gcs-fuse-csi-driver/gcs-fuse-csi-driver-webhook-569899b854-w9sm7 to gke-flux-cluster-default-pool-a53eb99b-6ns7
  Warning  FailedMount  44m (x3 over 44m)  kubelet            MountVolume.SetUp failed for volume "gcs-fuse-csi-driver-webhook-certs" : secret "gcs-fuse-csi-driver-webhook-secret" not found
  Normal   Pulling      44m                kubelet            Pulling image "jiaxun/gcs-fuse-csi-driver-webhook:v0.1.2-0-gd9e3bdd"
  Normal   Pulled       43m                kubelet            Successfully pulled image "jiaxun/gcs-fuse-csi-driver-webhook:v0.1.2-0-gd9e3bdd" in 2.275865045s
  Normal   Created      43m                kubelet            Created container gcs-fuse-csi-driver-webhook
  Normal   Started      43m                kubelet            Started container gcs-fuse-csi-driver-webhook


Name:                 gcsfusecsi-node-bfwd4
Namespace:            gcs-fuse-csi-driver
Priority:             900001000
Priority Class Name:  csi-gcp-gcs-node
Node:                 gke-flux-cluster-default-pool-a53eb99b-dnp1/10.128.0.27
Start Time:           Fri, 07 Apr 2023 15:56:26 -0600
Labels:               controller-revision-hash=f6d8489cc
                      k8s-app=gcs-fuse-csi-driver
                      pod-template-generation=1
Annotations:          cni.projectcalico.org/containerID: 41ef22ee042ef73882b51a7c35efec8e61e93aaa6b434382d25fe68492bc7369
                      cni.projectcalico.org/podIP: 10.116.1.13/32
                      cni.projectcalico.org/podIPs: 10.116.1.13/32
                      seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:               Running
IP:                   10.116.1.13
IPs:
  IP:           10.116.1.13
Controlled By:  DaemonSet/gcsfusecsi-node
Containers:
  gcs-fuse-csi-driver:
    Container ID:  containerd://8922ab1437e3be246c157d2d641b53ed1bf378b466a6664e7fa180e9c1fcb598
    Image:         jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd
    Image ID:      docker.io/jiaxun/gcs-fuse-csi-driver@sha256:1303895a8e8ab4a68e8d00ff089b86c7a43360ee3e57fc10a8f62c2e5697dac2
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --endpoint=unix:/csi/csi.sock
      --nodeid=$(KUBE_NODE_NAME)
      --node=true
      --sidecar-image=$(SIDECAR_IMAGE)
    State:          Running
      Started:      Fri, 07 Apr 2023 15:56:31 -0600
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  200Mi
    Requests:
      cpu:     5m
      memory:  10Mi
    Environment:
      KUBE_NODE_NAME:   (v1:spec.nodeName)
      SIDECAR_IMAGE:   <set to the key 'sidecar-image' of config map 'gcsfusecsi-image-config'>  Optional: false
    Mounts:
      /csi from socket-dir (rw)
      /var/lib/kubelet/pods from kubelet-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sfpk7 (ro)
  csi-driver-registrar:
    Container ID:  containerd://001e963ccd561e5dfdc97b09d6060b4786a1ad6ef8f64c6f7d0dde4e50c193a2
    Image:         registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0
    Image ID:      registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:4a4cae5118c4404e35d66059346b7fa0835d7e6319ff45ed73f4bba335cf5183
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=/csi/csi.sock
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Running
      Started:      Fri, 07 Apr 2023 15:56:39 -0600
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     50m
      memory:  100Mi
    Requests:
      cpu:     10m
      memory:  10Mi
    Environment:
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/gcsfuse.csi.storage.gke.io/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sfpk7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/
    HostPathType:  Directory
  kubelet-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods/
    HostPathType:  Directory
  socket-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/gcsfuse.csi.storage.gke.io/
    HostPathType:  DirectoryOrCreate
  kube-api-access-sfpk7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  44m   default-scheduler  Successfully assigned gcs-fuse-csi-driver/gcsfusecsi-node-bfwd4 to gke-flux-cluster-default-pool-a53eb99b-dnp1
  Normal  Pulling    44m   kubelet            Pulling image "jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd"
  Normal  Pulled     44m   kubelet            Successfully pulled image "jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd" in 3.417523579s
  Normal  Created    44m   kubelet            Created container gcs-fuse-csi-driver
  Normal  Started    44m   kubelet            Started container gcs-fuse-csi-driver
  Normal  Pulled     44m   kubelet            Container image "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0" already present on machine
  Normal  Created    44m   kubelet            Created container csi-driver-registrar
  Normal  Started    43m   kubelet            Started container csi-driver-registrar


Name:                 gcsfusecsi-node-bxxwm
Namespace:            gcs-fuse-csi-driver
Priority:             900001000
Priority Class Name:  csi-gcp-gcs-node
Node:                 gke-flux-cluster-default-pool-a53eb99b-55kb/10.128.0.29
Start Time:           Fri, 07 Apr 2023 15:56:26 -0600
Labels:               controller-revision-hash=f6d8489cc
                      k8s-app=gcs-fuse-csi-driver
                      pod-template-generation=1
Annotations:          cni.projectcalico.org/containerID: f2b1a4d72a1d5bbe01eb309c156208da4ec3746d9a37a48f993624c34ba43815
                      cni.projectcalico.org/podIP: 10.116.3.5/32
                      cni.projectcalico.org/podIPs: 10.116.3.5/32
                      seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:               Running
IP:                   10.116.3.5
IPs:
  IP:           10.116.3.5
Controlled By:  DaemonSet/gcsfusecsi-node
Containers:
  gcs-fuse-csi-driver:
    Container ID:  containerd://e9bbaf689c41c0db529d34ccfb5eadbf62fa1441dc7abdc801685780789beb77
    Image:         jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd
    Image ID:      docker.io/jiaxun/gcs-fuse-csi-driver@sha256:1303895a8e8ab4a68e8d00ff089b86c7a43360ee3e57fc10a8f62c2e5697dac2
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --endpoint=unix:/csi/csi.sock
      --nodeid=$(KUBE_NODE_NAME)
      --node=true
      --sidecar-image=$(SIDECAR_IMAGE)
    State:          Running
      Started:      Fri, 07 Apr 2023 15:56:31 -0600
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  200Mi
    Requests:
      cpu:     5m
      memory:  10Mi
    Environment:
      KUBE_NODE_NAME:   (v1:spec.nodeName)
      SIDECAR_IMAGE:   <set to the key 'sidecar-image' of config map 'gcsfusecsi-image-config'>  Optional: false
    Mounts:
      /csi from socket-dir (rw)
      /var/lib/kubelet/pods from kubelet-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vxfqm (ro)
  csi-driver-registrar:
    Container ID:  containerd://e10e98b586a2ed95e683faf7dafcf55a166defa62791693a25cb08636636c030
    Image:         registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0
    Image ID:      registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:4a4cae5118c4404e35d66059346b7fa0835d7e6319ff45ed73f4bba335cf5183
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=/csi/csi.sock
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Running
      Started:      Fri, 07 Apr 2023 15:56:38 -0600
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     50m
      memory:  100Mi
    Requests:
      cpu:     10m
      memory:  10Mi
    Environment:
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/gcsfuse.csi.storage.gke.io/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vxfqm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/
    HostPathType:  Directory
  kubelet-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods/
    HostPathType:  Directory
  socket-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/gcsfuse.csi.storage.gke.io/
    HostPathType:  DirectoryOrCreate
  kube-api-access-vxfqm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  44m   default-scheduler  Successfully assigned gcs-fuse-csi-driver/gcsfusecsi-node-bxxwm to gke-flux-cluster-default-pool-a53eb99b-55kb
  Normal  Pulling    44m   kubelet            Pulling image "jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd"
  Normal  Pulled     44m   kubelet            Successfully pulled image "jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd" in 3.249691796s
  Normal  Created    44m   kubelet            Created container gcs-fuse-csi-driver
  Normal  Started    44m   kubelet            Started container gcs-fuse-csi-driver
  Normal  Pulled     44m   kubelet            Container image "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0" already present on machine
  Normal  Created    44m   kubelet            Created container csi-driver-registrar
  Normal  Started    43m   kubelet            Started container csi-driver-registrar


Name:                 gcsfusecsi-node-m9d6n
Namespace:            gcs-fuse-csi-driver
Priority:             900001000
Priority Class Name:  csi-gcp-gcs-node
Node:                 gke-flux-cluster-default-pool-a53eb99b-5zx1/10.128.0.28
Start Time:           Fri, 07 Apr 2023 15:56:27 -0600
Labels:               controller-revision-hash=f6d8489cc
                      k8s-app=gcs-fuse-csi-driver
                      pod-template-generation=1
Annotations:          cni.projectcalico.org/containerID: e309c99a5cdcfd6abb0791447c2b1d7faffca4b7c09f84c2761572ca6d51ce05
                      cni.projectcalico.org/podIP: 10.116.2.6/32
                      cni.projectcalico.org/podIPs: 10.116.2.6/32
                      seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:               Running
IP:                   10.116.2.6
IPs:
  IP:           10.116.2.6
Controlled By:  DaemonSet/gcsfusecsi-node
Containers:
  gcs-fuse-csi-driver:
    Container ID:  containerd://4b12404c5ba83cd2ff9ca604347f9520b0f46160a82358153a1fba54e86ff149
    Image:         jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd
    Image ID:      docker.io/jiaxun/gcs-fuse-csi-driver@sha256:1303895a8e8ab4a68e8d00ff089b86c7a43360ee3e57fc10a8f62c2e5697dac2
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --endpoint=unix:/csi/csi.sock
      --nodeid=$(KUBE_NODE_NAME)
      --node=true
      --sidecar-image=$(SIDECAR_IMAGE)
    State:          Running
      Started:      Fri, 07 Apr 2023 15:56:31 -0600
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  200Mi
    Requests:
      cpu:     5m
      memory:  10Mi
    Environment:
      KUBE_NODE_NAME:   (v1:spec.nodeName)
      SIDECAR_IMAGE:   <set to the key 'sidecar-image' of config map 'gcsfusecsi-image-config'>  Optional: false
    Mounts:
      /csi from socket-dir (rw)
      /var/lib/kubelet/pods from kubelet-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sz997 (ro)
  csi-driver-registrar:
    Container ID:  containerd://d24c8b71f132e71b90c97668ac20959942ed4132adbfce40db0894ef4b471848
    Image:         registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0
    Image ID:      registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:4a4cae5118c4404e35d66059346b7fa0835d7e6319ff45ed73f4bba335cf5183
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=/csi/csi.sock
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Running
      Started:      Fri, 07 Apr 2023 15:56:38 -0600
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     50m
      memory:  100Mi
    Requests:
      cpu:     10m
      memory:  10Mi
    Environment:
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/gcsfuse.csi.storage.gke.io/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sz997 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/
    HostPathType:  Directory
  kubelet-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods/
    HostPathType:  Directory
  socket-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/gcsfuse.csi.storage.gke.io/
    HostPathType:  DirectoryOrCreate
  kube-api-access-sz997:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  44m   default-scheduler  Successfully assigned gcs-fuse-csi-driver/gcsfusecsi-node-m9d6n to gke-flux-cluster-default-pool-a53eb99b-5zx1
  Normal  Pulling    44m   kubelet            Pulling image "jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd"
  Normal  Pulled     44m   kubelet            Successfully pulled image "jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd" in 3.550932087s
  Normal  Created    44m   kubelet            Created container gcs-fuse-csi-driver
  Normal  Started    44m   kubelet            Started container gcs-fuse-csi-driver
  Normal  Pulled     44m   kubelet            Container image "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0" already present on machine
  Normal  Created    44m   kubelet            Created container csi-driver-registrar
  Normal  Started    43m   kubelet            Started container csi-driver-registrar


Name:                 gcsfusecsi-node-tr6wt
Namespace:            gcs-fuse-csi-driver
Priority:             900001000
Priority Class Name:  csi-gcp-gcs-node
Node:                 gke-flux-cluster-default-pool-a53eb99b-6ns7/10.128.0.26
Start Time:           Fri, 07 Apr 2023 15:56:27 -0600
Labels:               controller-revision-hash=f6d8489cc
                      k8s-app=gcs-fuse-csi-driver
                      pod-template-generation=1
Annotations:          cni.projectcalico.org/containerID: be243db3ac4de4f6368b4ea4e3f6d6714274d00a997494038874d10149f6a4ee
                      cni.projectcalico.org/podIP: 10.116.0.4/32
                      cni.projectcalico.org/podIPs: 10.116.0.4/32
                      seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:               Running
IP:                   10.116.0.4
IPs:
  IP:           10.116.0.4
Controlled By:  DaemonSet/gcsfusecsi-node
Containers:
  gcs-fuse-csi-driver:
    Container ID:  containerd://0207bc460269dc017d4e6e71ca65dc7c5538e40aa27659bd0f0bdb30e53feead
    Image:         jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd
    Image ID:      docker.io/jiaxun/gcs-fuse-csi-driver@sha256:1303895a8e8ab4a68e8d00ff089b86c7a43360ee3e57fc10a8f62c2e5697dac2
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --endpoint=unix:/csi/csi.sock
      --nodeid=$(KUBE_NODE_NAME)
      --node=true
      --sidecar-image=$(SIDECAR_IMAGE)
    State:          Running
      Started:      Fri, 07 Apr 2023 15:56:32 -0600
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  200Mi
    Requests:
      cpu:     5m
      memory:  10Mi
    Environment:
      KUBE_NODE_NAME:   (v1:spec.nodeName)
      SIDECAR_IMAGE:   <set to the key 'sidecar-image' of config map 'gcsfusecsi-image-config'>  Optional: false
    Mounts:
      /csi from socket-dir (rw)
      /var/lib/kubelet/pods from kubelet-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fbltq (ro)
  csi-driver-registrar:
    Container ID:  containerd://ff912ed2ba26c778a2ca0b3cb3ef9c05e196206efb466bd11753422518808ebc
    Image:         registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0
    Image ID:      registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:4a4cae5118c4404e35d66059346b7fa0835d7e6319ff45ed73f4bba335cf5183
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=/csi/csi.sock
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Running
      Started:      Fri, 07 Apr 2023 15:56:40 -0600
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     50m
      memory:  100Mi
    Requests:
      cpu:     10m
      memory:  10Mi
    Environment:
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/gcsfuse.csi.storage.gke.io/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fbltq (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/
    HostPathType:  Directory
  kubelet-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods/
    HostPathType:  Directory
  socket-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/gcsfuse.csi.storage.gke.io/
    HostPathType:  DirectoryOrCreate
  kube-api-access-fbltq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  44m   default-scheduler  Successfully assigned gcs-fuse-csi-driver/gcsfusecsi-node-tr6wt to gke-flux-cluster-default-pool-a53eb99b-6ns7
  Normal  Pulling    44m   kubelet            Pulling image "jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd"
  Normal  Pulled     44m   kubelet            Successfully pulled image "jiaxun/gcs-fuse-csi-driver:v0.1.2-0-gd9e3bdd" in 4.025131123s
  Normal  Created    44m   kubelet            Created container gcs-fuse-csi-driver
  Normal  Started    44m   kubelet            Started container gcs-fuse-csi-driver
  Normal  Pulled     44m   kubelet            Container image "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0" already present on machine
  Normal  Created    44m   kubelet            Created container csi-driver-registrar
  Normal  Started    43m   kubelet            Started container csi-driver-registrar

I can't think of anything else to show - I hope you can help! Also, I heard they are deprecating sidecar containers? Is that possibly related, and will this not work in the future?

GKE worker node needs bucket level permissions

Before mounting a bucket, one in fact has to bind two service accounts to the bucket IAM:

  • first - the one related to the workload, bound to a k8s ServiceAccount object
  • second - the service account of the GKE node (with at least roles/storage.objectAdmin)

This implies that the content of the bucket is accessible to other k8s workloads via the GKE node service account. Exploiting this is not straightforward, but highly possible. Is there anything that can be done about this overlap of privileges and principals? From a security point of view this is not acceptable. The best-case scenario would be binding only one service account - the workload-related one.
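For reference, a hedged sketch of the two bucket-level bindings described above (the project, bucket, namespace, and service account names are all placeholders):

# workload-related binding, via Workload Identity (Kubernetes ServiceAccount -> bucket)
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
  --member "serviceAccount:my-project.svc.id.goog[my-namespace/my-ksa]" \
  --role "roles/storage.objectAdmin"

# GKE node service account binding (the one this report objects to)
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
  --member "serviceAccount:my-node-sa@my-project.iam.gserviceaccount.com" \
  --role "roles/storage.objectAdmin"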

installation docs should state that kubectl context needs to be set

in this section. Otherwise, the following error occurs:

make install STAGINGVERSION=v0.1.2 PROJECT=my-project
OVERLAY is stable
STAGINGVERSION is v0.1.2
DRIVER_IMAGE is jiaxun/gcs-fuse-csi-driver
SIDECAR_IMAGE is jiaxun/gcs-fuse-csi-driver-sidecar-mounter
WEBHOOK_IMAGE is jiaxun/gcs-fuse-csi-driver-webhook
./deploy/base/webhook/patch-ca-bundle.sh
error: current-context is not set
jq: error (at <stdin>:9): Cannot iterate over null (null)

The sidecar container violates Restricted Pod Security Standard

Symptom

When the Restricted Pod Security Standard is enforced, the Pod will throw an error:

pods "xxx" is forbidden: violates PodSecurity
...
"gke-gcsfuse-sidecar" must set securityContext.capabilities.drop=["ALL"])

Root Cause

The sidecar container sets the capabilities drop value as "all" instead of "ALL".

Workaround

Similar to #20 (comment), you can manually inject the sidecar container using drop "ALL".
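For reference, a minimal sketch of the securityContext fields the Restricted standard expects on the manually injected sidecar container (only the relevant fragment, not a full sidecar spec):

securityContext:
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop:
    - ALL    # must be "ALL", not "all"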

Fix

This is fixed by 6a0bdba and will be included in the next release.

Sidecar crash

Recently the sidecar started to crash constantly a few minutes after the pod started, with this repeating error:

E0609 LookUpInode: interrupted system call, list objects: Error in iterating through objects: context canceled
E0609 fuse: *fuseops.LookUpInodeOp error: interrupted system call
E0609 LookUpInode: interrupted system call, list objects: Error in iterating through objects: context canceled
E0609  fuse: *fuseops.LookUpInodeOp error: interrupted system call
E0609  ReadFile: interrupted system call, fh.reader.ReadAt: readFull: context canceled
E0609  *fuseops.ReadFileOp error: interrupted system call

Driver version
Running Google Cloud Storage FUSE CSI driver sidecar mounter version v0.1.3-gke.0

GKE
1.26.3-gke.1000

The sidecar container is at the spec.containers[0] position which may cause issues in some workloads

The current sidecar container injection logic will inject the sidecar container at the spec.containers[0] position in a Pod.

The design makes sure that kubelet or the container runtime will start up the sidecar container first to obtain the file descriptor of the mount point. Otherwise, the other containers that consume the mount point will hang in the start up step.

This design may break some applications' assumptions. For example, the Airflow Kubernetes Executor assumes that the base container is always at the spec.containers[0] position in the worker Pod, which conflicts with the sidecar container injection logic.

We are waiting for the sidecar container pattern KEP: kubernetes/enhancements#3761. After the KEP is accepted and implemented, we will migrate to leverage the natively supported sidecar container pattern.

A workaround is to modify the workload Pod to manually inject the sidecar container. The sidecar container can then be at any position in the container array. However, if the sidecar container is at a later position than the workload container that consumes the volume, the workload container startup will hang for two minutes and time out, while the sidecar container startup proceeds. Once the sidecar container has started, the workload container can start as usual.

On Autopilot clusters, you cannot upload files larger than 10Gi

On GKE Autopilot clusters, according to the doc Resource requests in Autopilot:

The ephemeral storage request must be between 10 MiB and 10 GiB for all compute classes and hardware configurations.

Ephemeral storage: Autopilot modifies your ephemeral storage requests to meet the minimum amount required by each container. The cumulative value of storage requests across all containers cannot be more than the maximum allowed value. Autopilot scales the request down if the value exceeds the maximum.

This means that even if you specify a value larger than 10Gi for the annotation gke-gcsfuse/ephemeral-storage-limit, GKE Autopilot will scale the request down to at most 10Gi.

This also means that, on GKE Autopilot clusters, you cannot use the CSI driver to upload files larger than 10Gi.
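For reference, a minimal sketch (following the annotations shown elsewhere in this document) of a workload requesting a larger sidecar staging size; on Autopilot, the ephemeral storage value is capped regardless of what is requested:

metadata:
  annotations:
    gke-gcsfuse/volumes: "true"
    gke-gcsfuse/ephemeral-storage-limit: "50Gi"  # Autopilot scales this down to the 10Gi maximum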

We have a few potential solutions:

  • A standard PD, instead of ephemeral storage, could be used to back the write/upload staging and remove this limitation.
  • The underlying GCSFuse could possibly support write-through, which does not require staging temp storage and would also remove this limitation.

bug: installation script not setting required `mutatingwebhookconfiguration`'s `webhook[0].clientConfig.caBundle`

After finishing the installation, I noticed these logs from the webhook workload.

kubectl -n gcs-fuse-csi-driver logs -f --tail -1 --selector app=gcs-fuse-csi-driver-webhook

I0420 23:01:32.813714       1 main.go:54] Running Google Cloud Storage FUSE CSI driver admission webhook version v0.1.2, sidecar container image jiaxun/gcs-fuse-csi-driver-sidecar-mounter:v0.1.2
I0420 23:01:32.813930       1 metrics.go:89] Emit component_version metric with value v999.999.999
I0420 23:01:32.813957       1 main.go:71] Setting up manager.
I0420 23:01:32.814010       1 metrics.go:68] Metric server listening at ":22032"
I0420 23:01:33.960666       1 request.go:690] Waited for 1.046610071s due to client-side throttling, not priority and fairness, request: GET:https://10.160.216.1:443/apis/apiextensions.k8s.io/v1?timeout=32s
I0420 23:01:34.164403       1 main.go:90] Setting up webhook server.
I0420 23:01:34.164440       1 main.go:95] Registering webhooks to the webhook server.
I0420 23:01:34.164557       1 main.go:103] Starting manager.
2023/04/20 23:01:51 http: TLS handshake error from 10.169.3.195:47728: EOF
2023/04/20 23:01:51 http: TLS handshake error from 10.169.5.145:53302: EOF
2023/04/20 23:02:16 http: TLS handshake error from 10.169.1.133:39250: EOF

I then created this Pod.

cat pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: dxia
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    # Just spin & wait forever
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
    volumeMounts:
    - name: gcs-fuse-csi-ephemeral
      mountPath: /data
  serviceAccountName: default-editor
  volumes:
  - name: gcs-fuse-csi-ephemeral
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: dxia-gcsfuse-test1

kubectl apply -f pod.yaml

But it didn't have the gke-gcsfuse-sidecar sidecar. I changed the mutatingwebhookconfiguration gcsfuse-sidecar-injector.csi.storage.gke.io's failurePolicy from Ignore to Fail. This time, applying the Pod above resulted in the error:

Error from server (InternalError): error when creating "pod.yaml": Internal error occurred: failed calling webhook "gcsfuse-sidecar-injector.csi.storage.gke.io": failed to call webhook: Post "https://gcs-fuse-csi-driver-webhook.gcs-fuse-csi-driver.svc:443/inject?timeout=3s": x509: certificate signed by unknown authority

Copy-pasting the output of kubectl -n gcs-fuse-csi-driver get secrets gcsfusecsi-node-sa-token-57rbn -o jsonpath='{.data.ca\.crt}' to gcsfuse-sidecar-injector.csi.storage.gke.io's

webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    caBundle: CA_BUNDLE

fixed the issue.

So I think the installation script is missing this step.
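For anyone hitting the same problem, a hedged sketch of patching the caBundle in place (the secret name is cluster-specific; adjust it to match your cluster):

CA_BUNDLE=$(kubectl -n gcs-fuse-csi-driver get secrets gcsfusecsi-node-sa-token-57rbn -o jsonpath='{.data.ca\.crt}')
kubectl patch mutatingwebhookconfiguration gcsfuse-sidecar-injector.csi.storage.gke.io \
  --type=json -p "[{\"op\": \"replace\", \"path\": \"/webhooks/0/clientConfig/caBundle\", \"value\": \"${CA_BUNDLE}\"}]"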

subPath does not work when Anthos Service Mesh is enabled

Symptom

The subPath field does not work when Anthos Service Mesh is enabled.

Root Cause

The root cause is described in #46.

When the kubelet checks the subPath, it does not time out after 2 minutes, so the workload container startup hangs forever.

Workaround

Instead of using subPath, you can use the only-dir mount flag, which lets you mount only a sub-folder of the bucket at the mount path.

See the documentation https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#mounting-flags for details.
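As an illustration, a hedged sketch of a PersistentVolume that mounts only a sub-folder via only-dir instead of relying on subPath (bucket, folder, and storage class names are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-pv
spec:
  accessModes: ["ReadWriteMany"]
  capacity:
    storage: 5Gi
  storageClassName: example-storage-class
  mountOptions:
    - implicit-dirs
    - only-dir=my-sub-folder   # mount only this folder of the bucket
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: my-bucket-name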

Solution

Similar to #46, the sidecar feature KEP will be the solution.

The sidecar container issue is tracked on GitHub, and we are waiting for the Kubernetes sidecar container feature to ultimately solve this issue.

Error: Init:CreateContainerError when GCS Fuse CSI driver enabled with GKE Cluster on apache airflow deployment

Source

While deploying the Apache Airflow deployment to a GKE cluster with gcs-fuse-csi-driver enabled for mounting the filesystem, we are getting
Init:CreateContainerError. When describing the pod, we get an error like: Error: failed to reserve container name "check-db_sample-airflow-db-migrations-5d99cc5b86-bjqd2_sample_c8e8766c-b2e0-486c-8e1a-f2db0fe20745_0": name "check-db_sample-airflow-db-migrations-5d99cc5b86-bjqd2_sample_c8e8766c-b2e0-486c-8e1a-f2db0fe20745_0" is reserved for "5738e4d666b4c6872c316b752cac8633b901218f71b3f63c3176f365e5e86e9e".

Here is the source code for the Apache Airflow Helm chart:
https://github.com/airflow-helm/charts/tree/main/charts/airflow

Below is the values.yaml for db-migrations:

values.yaml
########################################
## COMPONENT | db-migrations Deployment
########################################
dbMigrations:
## if the db-migrations Deployment/Job is created
## - [WARNING] if `false`, you have to MANUALLY run `airflow db upgrade` when required
##
enabled: true

## if a post-install helm Job should be used (instead of a Deployment)
## - [WARNING] setting `true` will NOT work with the helm `--wait` flag,
##   this is because post-install helm Jobs run AFTER the main resources become Ready,
##   which will cause a deadlock, as other resources require db-migrations to become Ready
##
runAsJob: false

## resource requests/limits for the db-migrations Pods
## - spec for ResourceRequirements:
##   https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#resourcerequirements-v1-core
##
resources: {}

## the nodeSelector configs for the db-migrations Pods
## - docs for nodeSelector:
##   https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
##
nodeSelector: {}

## the affinity configs for the db-migrations Pods
## - spec for Affinity:
##   https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#affinity-v1-core
##
affinity: {}

## the toleration configs for the db-migrations Pods
## - spec for Toleration:
##   https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#toleration-v1-core
##
tolerations: []

## the security context for the db-migrations Pods
## - spec for PodSecurityContext:
##   https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#podsecuritycontext-v1-core
##
securityContext:
  fsGroup: 3003
  runAsGroup: 2002
  runAsUser: 1001

## Labels for the db-migrations Deployment
##
labels: {}

## Pod labels for the db-migrations Deployment
##
podLabels: {}

## annotations for the db-migrations Deployment/Job
##
annotations: {}

## Pod annotations for the db-migrations Deployment/Job
##
podAnnotations:
  gke-gcsfuse/volumes: "true"

## if we add the annotation: "cluster-autoscaler.kubernetes.io/safe-to-evict" = "true"
##
safeToEvict: false

## the number of seconds between checks for unapplied db migrations
## - only applies if `airflow.dbMigrations.runAsJob` is `false`
##
checkInterval: 300

Persistent Volume and Persistent Volume Claim configuration with Terraform file

 pv_pvc.tf
 resource "kubernetes_persistent_volume" "sample-airflow-logs-pv" {
 metadata {
     name = "sample-airflow-logs-pv"
 }
 spec {
   capacity = {
   storage = "150Gi"
 }
 access_modes = ["ReadWriteMany"]
 storage_class_name = "sample-eks-efs-sc"
 mount_options = [
  "implicit-dirs",
  "uid=1001",
  "gid=3003" ]
 persistent_volume_source {
  csi {
    driver = "gcsfuse.csi.storage.gke.io"
    volume_handle = "sample-airflow-logs-fs"
       } 
     }
   }
 }
resource "kubernetes_persistent_volume_claim" "sample-airflow-logs-pvc" {
metadata {
   name      = "sample-airflow-logs"
   namespace = "sample"
}

 spec {
   access_modes       = ["ReadWriteMany"]
   storage_class_name = "sample-eks-efs-sc"
   volume_name = "sample-airflow-logs-pv"
   resources {
      requests = {
        storage = "150Gi"
     }
   }
 }
 depends_on = [
   kubernetes_persistent_volume.sample-airflow-logs-pv
 ]
}

Error:

Command: kubectl get pods -n dataworkz

sample-airflow-db-migrations-5d99cc5b86-9zsdt   0/2     Init:CreateContainerError   0          71m
sample-airflow-redis-master-0                   2/2     Running                     0          71m
sample-airflow-scheduler-7b4c5bcdff-tt246       0/2     Init:CreateContainerError   0          71m
sample-airflow-sync-users-79d7d4f477-9gzph      0/2     Init:CreateContainerError   0          71m
sample-airflow-web-57755c9659-9hwh4             0/2     Init:CreateContainerError   0          71m
sample-airflow-worker-0                         0/2     Init:CreateContainerError   0          71m

Command: kubectl describe pod sample-airflow-db-migrations-5d99cc5b86-bjqd2 -n dataworkz

Error from Event:

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  2m14s               default-scheduler  Successfully assigned dataworkz/sample-airflow-db-migrations-5d99cc5b86-bjqd2 to gke-sample-all-in-on-sample-default-p-4e593424-qtww
  Warning  Failed     13s                 kubelet            Error: context deadline exceeded
  Normal   Pulled     0s (x3 over 2m13s)  kubelet            Container image "xxxxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/dataworkz/airflow:2.5.3" already present on machine
  Warning  Failed     0s (x2 over 13s)    kubelet            Error: failed to reserve container name "check-db_sample-airflow-db-migrations-5d99cc5b86-bjqd2_sample_c8e8766c-b2e0-486c-8e1a-f2db0fe20745_0": name "check-db_sample-airflow-db-migrations-5d99cc5b86-bjqd2_sample_c8e8766c-b2e0-486c-8e1a-f2db0fe20745_0" is reserved for "5738e4d666b4c6872c316b752cac8633b901218f71b3f63c3176f365e5e86e9e"

Observation:

As per the official documentation for gcs-fuse-csi-driver, we followed the steps below:
https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver

  1. Step 1: Added pod annotations to the deployment values.yaml

     podAnnotations:
       gke-gcsfuse/volumes: "true"

  2. Step 2: Added security context to values.yaml

     securityContext:
       fsGroup: 3003
       runAsGroup: 2002
       runAsUser: 1001

  3. Step 3: Tried more pod annotations in values.yaml

     podAnnotations:
       gke-gcsfuse/volumes: "true"
       gke-gcsfuse/cpu-limit: "2048m"
       gke-gcsfuse/memory-limit: "2048Mi"
       gke-gcsfuse/ephemeral-storage-limit: "50Gi"

We are getting the error while creating the init container at the check-db step; below is the init container section of the Helm deployment template.

deployment.yaml

initContainers:
      {{- if $extraPipPackages }}
      {{- include "airflow.init_container.install_pip_packages" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) | indent 8 }}
      {{- end }}
      {{- if .Values.dags.gitSync.enabled }}
      ## git-sync is included so "airflow plugins" & "python packages" can be stored in the dags repo
      {{- include "airflow.container.git_sync" (dict "Release" .Release "Values" .Values "sync_one_time" "true") | indent 8 }}
      {{- end }}
      {{- include "airflow.init_container.check_db" (dict "Release" .Release "Values" .Values "volumeMounts" $volumeMounts) | indent 8 }}

Please refer to the link below for the full deployment.yaml:
https://github.com/airflow-helm/charts/blob/main/charts/airflow/templates/db-migrations/db-migrations-deployment.yaml

Please advise on any additional configuration needed to get this working properly.

Supporting More Authentication Mechanisms

Hi, Thank you very much for the great project! I'm really surprised that FUSE can run in the sidecar container without any privileges!

From a Kubernetes platform admin's point of view, supporting FUSE used to be difficult (risky) because we had to give privileges to FUSE containers in applications. But this project proves that limitation can be broken (thanks to the "file descriptor passing" between the CSI driver and the FUSE sidecar, which encapsulates privileged operations in the CSI driver).

Context/Scenario

  • I (platform admin) develop an in-house Kubernetes platform for internal application developers
  • I would like to support gcs-fuse-csi-driver in our clusters (multiple clusters)
  • The GCP project of the Kubernetes clusters is managed by us (platform admin)
  • But each GCP project for applications is fully owned by the application developers

The Problem

Currently, the gcs-fuse-csi-driver implementation depends on Workload Identity.

However, if I understood correctly, if an application runs in multiple Kubernetes clusters, the application developer has to create an iam-policy-binding for each k8s cluster (k8s service account). This is because applications running on different clusters have different Workload Identities. That also means the application developer needs to update the iam-policy-binding whenever one of our clusters is added/removed.

As a platform admin, the UX is not so convenient. I would like to reduce this toil on the application developer side.

Proposals

Option 1. Supporting GCP Service Account's Private Key in Kubernetes Secret

This would be handy. Of course, I understand Workload Identity is more secure than a long-lived (never-expiring) secret key file.

Our platform can provide a feature which syncs the secret across our clusters. In this case, application developers need to do nothing when a cluster the application runs on is added/removed. All the application developers need to do is specify the secret name in their manifest.

By the way, gcsfuse also accepts key-file as a CLI argument, but gcs-fuse-csi-driver explicitly prohibits using that argument. Is there any reason for this?

In this option, I imagined below changes:

  • support an extra attribute (say secretName) in volumeAttributes (and in MountConfig)
  • csi-driver
    • reads the kubernetes secret
    • stores it somewhere shared with the sidecar container (/gcsfuse-tmp/.volumes/<volume-name>/service_account.json?)
    • sets the path in MountConfig (we need to add a field for this)
    • and passes it to the sidecar
  • sidecar-mounter runs gcsfuse with key-file=...

Option 2. Supporting Workload Identity Federation

This would be more secure and might be more standard. Recently, application identification mechanisms have emerged that are not tied to a single Kubernetes cluster's authority (e.g. SPIFFE). By using these, an application can have a stable identity even if it runs on multiple Kubernetes clusters.

I think this fits the Workload Identity Federation use case completely.

In this option, I imagined below changes:

  • support the volumeAttributes required for workload federation, say
    • workloadIdentityProvider
    • serviceAccountEmail
  • also support annotations for the application identity info which the Kubernetes platform is assumed to be responsible for providing
    • gke-gcsfuse/credential-source-volume
    • gke-gcsfuse/credential-source-file
  • webhook injects
    • a volumeMount into the sidecar container for gke-gcsfuse/credential-source-volume
    • extra args for the application credential file passed to the sidecar-mounter
  • csi-driver
    • reads the attributes, sets them in MountConfig, and passes them to the sidecar
  • sidecar-mounter
    • bootstraps the credential configuration file from the provided information (/gcsfuse-tmp/.volumes/<volume-name>/credential_configuration.json could be used?)
    • then runs gcsfuse with key-file=...

I would really appreciate any feedback. Thanks in advance.

CSI Driver tokenRequests value incorrect when setting up from Cloud Shell

When you set up the CSI Driver from Cloud Shell, the CSI Driver picks up a wrong value in the TOKENREQUESTS audience field. As a result, pods will not be able to mount any storage volumes and show the following error message in the logs:

Pod event warning: MountVolume.SetUp failed for volume "xxx" : rpc error: code = Internal desc = failed to prepare storage service: rpc error: code = Internal desc = storage service manager failed to setup service: timed out waiting for the condition.

Workaround:

  1. Manually delete the csidriver using kubectl delete csidriver gcsfuse.csi.storage.gke.io
  2. Update the file deploy/base/setup/csi_driver.yaml and edit the audience property value (see the sketch after this list).
  3. Recreate the csidriver using kubectl apply -f deploy/base/setup/csi_driver.yaml
  4. In case there are running deployments, clean up the environment and redeploy.
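For reference, a hedged sketch of the relevant part of deploy/base/setup/csi_driver.yaml; the audience value below is a placeholder and should match your cluster's Workload Identity pool (typically PROJECT_ID.svc.id.goog):

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: gcsfuse.csi.storage.gke.io
spec:
  tokenRequests:
  - audience: PROJECT_ID.svc.id.goog   # placeholder: your cluster's Workload Identity pool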

Simplify mount options for uid/gid

Right now, users have to set the driver's mount options to match the uid/gid settings.

Is uid required, or is gid/fsGroup sufficient? Normally for pod volumes, having the volume ownership match the fsGroup (i.e. the supplemental group of the container) is enough, and the uid of the volume doesn't matter.

If you enable delegating fsGroup to the CSI driver, then k8s will send the fsGroup ID in the NodePublishVolume call, which could eliminate the need to repeat the gid in the mount options.

Install not able to pull containers - need to build from tag

Hi! πŸ‘‹

I'm following the guide here and I've just run make deploy. Everything seemed to look okay; however, when I check on things, it definitely isn't:

$ kubectl get CSIDriver,Deployment,DaemonSet,Pods -n gcs-fuse-csi-driver
NAME                                                  ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES                  AGE
csidriver.storage.k8s.io/gcsfuse.csi.storage.gke.io   false            true             false             <unset>         true                Persistent,Ephemeral   14s
csidriver.storage.k8s.io/pd.csi.storage.gke.io        true             false            false             <unset>         false               Persistent             25m

NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/gcs-fuse-csi-driver-webhook   0/1     1            0           14s

NAME                               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/gcs-fuse-csi-node   4         4         0       4            0           kubernetes.io/os=linux   14s

NAME                                            READY   STATUS             RESTARTS   AGE
pod/gcs-fuse-csi-driver-webhook-db67745-fzb9z   0/1     ErrImagePull       0          14s
pod/gcs-fuse-csi-node-2bbwr                     2/3     ErrImagePull       0          14s
pod/gcs-fuse-csi-node-dn9jp                     2/3     ErrImagePull       0          14s
pod/gcs-fuse-csi-node-dvtgf                     2/3     ImagePullBackOff   0          14s
pod/gcs-fuse-csi-node-kxnw9                     2/3     ErrImagePull       0          14s

Specifically looking at a log for the driver

$ kubectl logs pod/gcs-fuse-csi-driver-webhook-db67745-fzb9z
Error from server (NotFound): pods "gcs-fuse-csi-driver-webhook-db67745-fzb9z" not found
(env) (base) vanessa@vanessa-ThinkPad-T490s:/tmp/gcs-fuse$ kubectl logs pod/gcs-fuse-csi-driver-webhook-db67745-fzb9z -n gcs-fuse-csi-driver
Error from server (BadRequest): container "gcs-fuse-csi-driver-webhook" in pod "gcs-fuse-csi-driver-webhook-db67745-fzb9z" is waiting to start: image can't be pulled

And more detail from the pod:

Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    3m27s                  default-scheduler  Successfully assigned gcs-fuse-csi-driver/gcs-fuse-csi-driver-webhook-db67745-fzb9z to gke-flux-cluster-default-pool-9b2e095d-lnn9
  Warning  FailedMount  3m25s (x3 over 3m27s)  kubelet            MountVolume.SetUp failed for volume "gcs-fuse-csi-driver-webhook-certs" : secret "gcs-fuse-csi-driver-webhook-secret" not found
  Normal   Pulling      117s (x4 over 3m21s)   kubelet            Pulling image "jiaxun/gcs-fuse-csi-driver-webhook:v0.4.1-18-ga3b91c3-dirty"
  Warning  Failed       117s (x4 over 3m21s)   kubelet            Failed to pull image "jiaxun/gcs-fuse-csi-driver-webhook:v0.4.1-18-ga3b91c3-dirty": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/jiaxun/gcs-fuse-csi-driver-webhook:v0.4.1-18-ga3b91c3-dirty": failed to resolve reference "docker.io/jiaxun/gcs-fuse-csi-driver-webhook:v0.4.1-18-ga3b91c3-dirty": docker.io/jiaxun/gcs-fuse-csi-driver-webhook:v0.4.1-18-ga3b91c3-dirty: not found
  Warning  Failed       117s (x4 over 3m21s)   kubelet            Error: ErrImagePull
  Normal   BackOff      105s (x5 over 3m20s)   kubelet            Back-off pulling image "jiaxun/gcs-fuse-csi-driver-webhook:v0.4.1-18-ga3b91c3-dirty"
  Warning  Failed       105s (x5 over 3m20s)   kubelet            Error: ImagePullBackOff

The issue seems to be (for all the containers) that it's deriving a dirty commit tag that does not exist.


See existing tags here.

I probably missed it, but maybe it makes sense to add a note that if the user isn't specifying a custom registry/staging version, they should check out a tag? E.g., I could do:

$ git clone https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver.git /tmp/gcs-fuse
$ cd /tmp/gcs-fuse
# fetch tags so we build a release version
$ git fetch

# View tags with git tag and then choose one
$ git checkout v0.4.1

$ make install

I'll put in a quick PR with a suggestion to fix this up!

Cannot parse gcsfuse bool flags with value

If the gcsfuse bool flags are passed with values, for example "implicit-dirs=true", the sidecar container will throw an error.

The issue is fixed by the commit cbfa3cd

The fix will be included in the next release v0.1.4.
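For context, a hedged sketch of how such a flag is typically passed through a CSI ephemeral volume's mountOptions attribute, following the Pod example earlier in this document (the bucket name is a placeholder):

volumes:
- name: gcs-fuse-csi-ephemeral
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeAttributes:
      bucketName: my-bucket
      mountOptions: "implicit-dirs"   # "implicit-dirs=true" should also parse once this fix is released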

Comparison with CSI GCS - sidecar overhead

Hi,

What would you say are the biggest differences between GCS FUSE & https://github.com/ofek/csi-gcs?

Even though I really appreciate the fact this project is officially supported by Google Cloud, I'm a bit concerned about the overhead with FUSE (one sidecar container with 256Mi memory & 5Gi ephemeral storage).

Any take on that?

Thanks.

Enable -o options for gcsfuse

Allow users to pass kernel flags via -o to gcsfuse.

The issue is fixed by the commit f5bb30d

The fix will be included in the next release v0.1.4.

The doc will be updated before the release.

Sidecar mounter grace period

First let me say, I'm excited that GKE has native support for this now. I'm excited to use this driver in projects.

I have a question regarding the grace period timeout that happens before the mounter sidecar exits. It seems like it waits 30 seconds before exiting. I'm using this inside short-term workloads, and this causes them all to run for an extra 30 seconds.

Looking at the code, it seems like it isn't currently overridable. I see there is a gracePeriod flag that could be overridden here; however, the webhook doesn't allow passing in overrides.

Is there a reason for not allowing an override there? Would this be something a PR could be accepted for?

I have deployed this driver natively in GKE by using the addons approach.

Usage of privileged pod in driver

Hey,

I have a quick implementation-related question:

Most CSI drivers need privileged access as they use mount (optionally with some kernel modules, e.g. Ceph).
This means they need CAP_SYS_ADMIN in order to mount in Linux.
FUSE, on the other hand, is built to not need such privileges and mounts only in user space.
So I'm wondering: why does this CSI driver need to run a privileged container?

The gcsfuse sidecar container does not exit automatically when all the containers have exited and the Pod restartPolicy is Never

In some use cases where the Pod restartPolicy is Never, users expect the sidecar container to exit automatically when all the other containers have exited.

For example, in the Airflow use cases, the Task will never complete and the DAG will be blocked if the sidecar container does not exit automatically.

Note that we do support sidecar container auto-exit in Jobs -- the sidecar container will exit automatically when all the containers have exited in a Job Pod.

Cannot create an Autopilot cluster with the CSI driver enabled using terraform

Symptom

When creating an Autopilot cluster with the CSI driver enabled using Terraform, you will get the following error:

β”‚ Error: Conflicting configuration arguments
β”‚ 
β”‚   with module.cluster.google_container_cluster.cluster,
β”‚   on ../../../DH9514_thdgit/tf-mod-gke-cluster/main.tf line 44, in resource "google_container_cluster" "cluster":
β”‚   44:     gcs_fuse_csi_driver_config {
β”‚ 
β”‚ "addons_config.0.gcs_fuse_csi_driver_config": conflicts with enable_autopilot

See hashicorp/terraform-provider-google#15817 for details.

Workaround

You can create an Autopilot cluster with the CSI driver disabled, and then run the additional step below to enable the CSI driver:

gcloud container clusters update ${CLUSTER} --update-addons GcsFuseCsiDriver=ENABLED --region=${REGION}

Fix

This PR GoogleCloudPlatform/magic-modules#8998 will fix the issue.

Profile/pressure analysis best practices

Can you add a clear list of best practices for testing the sidecar resource allocation? From the current documentation it seems to be a little bit of "black magic".
It is hard to tell, on a real workload, whether the sidecar is under pressure and requires additional memory and CPU limits.

Error: context deadline exceeded when Anthos Service Mesh is enabled

Symptom

With Anthos Service Mesh enabled on GKE, the Pod startup will hang for about two minutes. Meanwhile, you will see the gke-gcsfuse-sidecar stuck at the step gcs: Req 0x0: <- ListObjects("") (with the debug flags enabled to see this log).

Then you will see the following Pod warning message:

  Type     Reason             Age                    From                Message
  ----     ------             ----                   ----                -------
...
  Warning  Failed             48s                    kubelet             Error: context deadline exceeded

Eventually, the Pod start up should succeed.

Root cause

The Anthos Service Mesh (ASM) feature injects a sidecar container called istio-proxy into the workload Pod, while the CSI driver injects a sidecar container gke-gcsfuse-sidecar.

After all the sidecar injections have happened, there are three containers in the container array: [gke-gcsfuse-sidecar, workload, istio-proxy]. Kubelet will spin up the containers in the container array in order.

The workload container startup will hang because, with ASM enabled, the network does not work at the moment when the gke-gcsfuse-sidecar container starts. The container startup has a two-minute timeout. After about two minutes, the istio-proxy sidecar container starts up, the List request in the gke-gcsfuse-sidecar container proceeds, and kubelet retries starting the workload container.

This behavior is expected due to our current sidecar injection design -- we inject the gcsfuse sidecar at position 0 in the container array. Users will experience a two-minute container startup delay when ASM is enabled.

Solution

This issue is tracked on GitHub, and we are waiting for the Kubernetes sidecar container feature available on GKE to ultimately solve this issue.

Instructions with tag no longer work

I'm trying to reproduce what I did a few weeks (months?) ago, and I'm a bit confused because previously there was a version 0.4.x and now it looks like it's reverted back to 0.1.x. I tried the same strategy as before, but it still used a git "dirty" tag that doesn't exist, so I believe something has changed in the repository since then.

Would it be possible to provide this example and deployment either with an exact example of an existing image, or without relying on git? The current setup seems very error-prone, and this is the second time I'm trying and hitting a bug because of this. For reference, here is my previous issue #1, and you can see the tag was a higher version. These instructions that we fixed no longer work.

Thank you and happy Friday!

How does this work without a secret?

Hi again! I'm close to testing this out, and I thought it was curious that no secret is defined. When I tested it out, indeed my operator logs told me that one is required:

1.676949727749151e+09   ERROR   Reconciler error        {"controller": "minicluster", "controllerGroup": "flux-framework.org", "controllerKind": "MiniCluster", "miniCluster": {"name":"flux-sample","namespace":"flux-operator"}, "namespace": "flux-operator", "name": "flux-sample", "reconcileID": "32f2f18f-4468-4a62-a8c3-a95f89a8db55", "error": "PersistentVolume \"data\" is invalid: [spec.csi.controllerPublishSecretRef.name: Required value, spec.csi.nodePublishSecretRef .name: Required value]"}

I don't see any mention of a secret in the docs - is there another way to go about it? I see that there are a few:

$ kubectl get secrets --all-namespaces | grep csi
gcs-fuse-csi-driver   default-token-mqzvz                              kubernetes.io/service-account-token   3      57m
gcs-fuse-csi-driver   gcs-fuse-csi-driver-webhook-secret               Opaque                                2      57m
gcs-fuse-csi-driver   gcs-fuse-csi-node-sa-token-2r6r9                 kubernetes.io/service-account-token   3      57m
kube-system           pdcsi-node-sa-token-nmn5k                        kubernetes.io/service-account-token   3      82m

Another question I have is (ballpark) how many pods should I leave for the storage to work? Thanks!

What is the exact difference between "Provision your volume as a CSI ephemeral volume" vs "Provision your volume using static provisioning" in the context of the GCSFuse CSI driver?

In general, ephemeral volumes are tied to the Pod's lifecycle, while static volumes are not.
That is, in the case of static volumes, data stays even if the Pod gets deleted, while for ephemeral volumes data is lost when the Pod is deleted.

In the case of the GCSFuse CSI driver, I see that in both cases (static as well as ephemeral) the data stays in the bucket when the Pod is deleted.
So I am trying to understand what the difference between the ephemeral and static volume approaches is in the case of the GCSFuse CSI driver.
I have referred to this doc https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver
but could not clearly distinguish between the two.

A clear distinction will help me choose one of the two approaches.

Can either of the two approaches be used as per convenience, or are there cases where one approach is preferred over the other (or will not work)?
