warm-metal / container-image-csi-driver
Kubernetes CSI driver for mounting images
License: MIT License
I am going to refactor the project as well.
docs/developer.md detailing the workflow for a developer. Split from #93.
Currently the maintainers of warm-metal need to manually build and push the csi-image
to the Docker registry.
We want to automatically build and push the csi-image to the Docker registry (which is what we currently use) via a CI pipeline, to avoid any manual work.
We might want to use GitHub workflows for running the CI, since that is what we currently use for running integration tests.
I can submit another PR updating the README, but right now I run into these errors:
> REGISTRY=gcr.io/myregistry make image
docker buildx build -t gcr.io/myregistry/csi-image:v0.5.1 --push .
[+] Building 30.1s (4/4) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 397B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> ERROR [internal] load metadata for docker.io/library/alpine:3.13 30.0s
=> CANCELED [internal] load metadata for docker.io/library/golang:1.16-alpine3.13 30.0s
------
> [internal] load metadata for docker.io/library/alpine:3.13:
------
Dockerfile:13
--------------------
11 | RUN CGO_ENABLED=0 go build -o csi-image-plugin ./cmd/plugin
12 |
13 | >>> FROM alpine:3.13
14 | WORKDIR /
15 | COPY --from=builder /go/src/csi-driver-image/csi-image-plugin /usr/bin/
--------------------
ERROR: failed to solve: alpine:3.13: failed to do request: Head "https://registry-1.docker.io/v2/library/alpine/manifests/3.13": dial tcp: i/o timeout
This also happens with make sanity. I already learned that one needs a Docker Hub account and then has to run docker login first, but it did not help in my case. Any idea?
If I try to build just using docker, I run into this:
> docker build .
Sending build context to Docker daemon 107.1MB
Step 1/11 : FROM docker.io/library/golang:1.16-alpine3.13 as builder
1.16-alpine3.13: Pulling from library/golang
5758d4e389a3: Pull complete
04b7a40ca5d5: Pull complete
452a8c64b8e1: Pull complete
01da5aed4ae6: Pull complete
83967ad3b539: Pull complete
Digest: sha256:c538c29503b9ac4b874ae776a0537fe16fde4581af896d112e53a44a9963b116
Status: Downloaded newer image for golang:1.16-alpine3.13
---> e1b239f8b504
Step 2/11 : WORKDIR /go/src/csi-driver-image
---> Running in f367a24f22b6
Removing intermediate container f367a24f22b6
---> d87a9d13e8fe
Step 3/11 : COPY go.mod go.sum ./
---> 20a74138b43c
Step 4/11 : RUN go mod download
---> Running in 02f1947e5c3a
go: github.com/BurntSushi/[email protected]: Get "https://proxy.golang.org/github.com/%21burnt%21sushi/toml/@v/v0.3.1.mod": dial tcp: lookup proxy.golang.org: Try again
The command '/bin/sh -c go mod download' returned a non-zero code: 1
We recently added a logo for our repository in #80.
After we merged the PR, the logo does not load for me in the README.
The K8s docs for updating to v1.26 mention dropping support for CRI v1alpha2.
We are using CRI v1alpha2 in our code, which won't be supported in the k8s 1.26 version.
This means that containerd minor versions 1.5 and older are not supported from k8s 1.26 onwards.
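For illustration, a minimal sketch of moving an image-service call from the v1alpha2 to the v1 CRI API; the socket path and the call shown are assumptions for the example, not the driver's exact code:

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	criv1 "k8s.io/cri-api/pkg/apis/runtime/v1" // previously .../runtime/v1alpha2
)

func main() {
	// Dial the CRI endpoint exposed by containerd (path is an assumption).
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// The v1 ImageService offers the same RPCs as v1alpha2, so the switch is
	// mostly an import and type change.
	images := criv1.NewImageServiceClient(conn)
	resp, err := images.ListImages(ctx, &criv1.ListImagesRequest{})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("found %d images", len(resp.Images))
}
```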
Currently we try to follow semver versioning when creating GitHub releases, but that's not documented anywhere.
ToDo:
Hi,
Is there a plan to support CI on this repo?
Also, what is the testing approach? I was wondering if there could be a go command that runs unit tests without minikube.
Would you please provide details on how to run the e2e tests? It looks like they use the kubectl-dev tool.
This is the failure I noticed on my fork.
Dependabot PRs as of now will update the actions only. (Maybe in the future we should also consider updating Go dependencies.)
Currently the maintainers or contributors need to manually update the tag in Makefile and Chart.yaml for the tag we want to create.
It would be nice if automation could update Makefile and Chart.yaml with the tag we want to cut (or already have).
The readme says ephemeral mounts can specify a secret namespace. It could be very dangerous to allow that without some kind of permissions involved. Other services don't allow a user-controlled secret reference outside of the same namespace. How is this safe?
This ticket is the outcome of conversation in #71 (comment)
We don't have a mechanism to cancel in-flight requests when the driver pod restarts, i.e., we don't have a graceful way to cancel contexts.
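One possible approach, sketched below: derive every request context from a root context that is cancelled on SIGTERM, so in-flight pulls and mounts can abort and the server can drain. The wiring and names here are illustrative, not the current implementation:

```go
package main

import (
	"context"
	"log"
	"net"
	"os/signal"
	"syscall"

	"google.golang.org/grpc"
)

func main() {
	// Root context cancelled when Kubernetes stops the driver pod.
	rootCtx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	lis, err := net.Listen("unix", "/csi/csi.sock") // socket path is illustrative
	if err != nil {
		log.Fatal(err)
	}

	srv := grpc.NewServer(grpc.ChainUnaryInterceptor(
		func(ctx context.Context, req interface{}, _ *grpc.UnaryServerInfo,
			handler grpc.UnaryHandler) (interface{}, error) {
			// Tie the request context to the root context so long-running
			// operations (e.g. image pulls) observe cancellation on shutdown.
			ctx, cancel := context.WithCancel(ctx)
			defer cancel()
			go func() {
				select {
				case <-rootCtx.Done():
					cancel()
				case <-ctx.Done():
				}
			}()
			return handler(ctx, req)
		},
	))
	// registerCSIServices(srv) // hypothetical: identity/node services go here

	go func() {
		<-rootCtx.Done()
		// Stop accepting new RPCs; in-flight ones now see cancelled contexts
		// and can return promptly before the kubelet force-kills the pod.
		srv.GracefulStop()
	}()

	if err := srv.Serve(lis); err != nil {
		log.Fatal(err)
	}
}
```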
Whenever we create a new tag/release in GitHub, we expect the image to be pushed for that tag to the Docker container registry (that's what we support currently).
Yesterday we merged this PR, and this action ran to build and push the image on the default (master) branch.
But when we cut the tag v0.8.2 for that change, another action should have been triggered to build and push the image with that tag.
Based on #71 (comment)
Other refs around this pattern:
Hi,
Would like to test out this really promising CSI driver, but I receive the following error:
spec: failed to apply OCI options: relabel "/var/lib/kubelet/pods/ee74a41d-f2b5-4ac3-9722-80454387d5c9/volume-subpaths/source/nginx/18" with "system_u:object_r:data_t:s0:c246,c908" failed: lsetxattr /var/lib/kubelet/pods/ee74a41d-f2b5-4ac3-9722-80454387d5c9/volume-subpaths/source/nginx/18/p: read-only file system
I already changed the mount and the pod to be readable, but I still have that error.
I'm using EKS 1.28 with bottlerocket nodes.
Any ideas what I could try?
Edit: I got it working by setting readOnly: true
on the volume directly. Any idea how I can troubleshoot why a writable volume does not work?
Thanks!
We have been using this service for over two years in production. We would like to help support a helm chart, which is a different paradigm than this git repo currently has. The reason is that we have actually built a helm chart using the install script and realized that it is not needed when you know the needed settings for your cluster/kubelet. I have put the beginnings of one together in hopes that you will merge it in. #46
A pattern I have seen in other repos that keep the helm chart in the same repo as the code is to make tags with the prefix helm-chart-VERSION.
We could always create a new github repo that contains just the helm chart.
Also eventually we could publish the helm chart as well.
We are still facing the same issue [https://github.com//issues/49].
In the closing comment, it's mentioned that 'this warning is a misreport'.
Did anyone try with JFrog Xray?
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-test-csi-image-test-simple-fs1
spec:
storageClassName: csi-image.warm-metal.tech
capacity:
storage: 5Gi
accessModes:
- ReadOnlyMany
persistentVolumeReclaimPolicy: Retain
csi:
driver: csi-image.warm-metal.tech
volumeHandle: "docker.io/warmmetal/csi-image-test:simple-fs"
volumeAttributes:
# # set pullAlways if you want to ignore local images
pullAlways: "true"
# # set secret if the image is private
# secret: "name of the ImagePullSecret"
# secretNamespace: "namespace of the secret"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-test-csi-image-test-simple-fs1
spec:
storageClassName: csi-image.warm-metal.tech
accessModes:
- ReadOnlyMany
resources:
requests:
storage: 5Gi
volumeName: pv-test-csi-image-test-simple-fs1
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-test-csi-image-test-simple-fs2
spec:
storageClassName: csi-image.warm-metal.tech
capacity:
storage: 5Gi
accessModes:
- ReadOnlyMany
persistentVolumeReclaimPolicy: Retain
csi:
driver: csi-image.warm-metal.tech
volumeHandle: "docker.io/warmmetal/csi-image-test:simple-fs"
volumeAttributes:
# # set pullAlways if you want to ignore local images
pullAlways: "true"
# # set secret if the image is private
# secret: "name of the ImagePullSecret"
# secretNamespace: "namespace of the secret"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-test-csi-image-test-simple-fs2
spec:
storageClassName: csi-image.warm-metal.tech
accessModes:
- ReadOnlyMany
resources:
requests:
storage: 5Gi
volumeName: pv-test-csi-image-test-simple-fs2
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-test-csi-image-test-simple-fs3
spec:
storageClassName: csi-image.warm-metal.tech
capacity:
storage: 5Gi
accessModes:
- ReadOnlyMany
persistentVolumeReclaimPolicy: Retain
csi:
driver: csi-image.warm-metal.tech
volumeHandle: "docker.io/warmmetal/csi-image-test:simple-fs"
volumeAttributes:
# # set pullAlways if you want to ignore local images
pullAlways: "true"
# # set secret if the image is private
# secret: "name of the ImagePullSecret"
# secretNamespace: "namespace of the secret"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-test-csi-image-test-simple-fs3
spec:
storageClassName: csi-image.warm-metal.tech
accessModes:
- ReadOnlyMany
resources:
requests:
storage: 5Gi
volumeName: pv-test-csi-image-test-simple-fs3
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-test-csi-image-test-simple-fs4
spec:
storageClassName: csi-image.warm-metal.tech
capacity:
storage: 5Gi
accessModes:
- ReadOnlyMany
persistentVolumeReclaimPolicy: Retain
csi:
driver: csi-image.warm-metal.tech
volumeHandle: "docker.io/warmmetal/csi-image-test:simple-fs"
volumeAttributes:
# # set pullAlways if you want to ignore local images
pullAlways: "true"
# # set secret if the image is private
# secret: "name of the ImagePullSecret"
# secretNamespace: "namespace of the secret"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-test-csi-image-test-simple-fs4
spec:
storageClassName: csi-image.warm-metal.tech
accessModes:
- ReadOnlyMany
resources:
requests:
storage: 5Gi
volumeName: pv-test-csi-image-test-simple-fs4
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-test-csi-image-test-simple-fs5
spec:
storageClassName: csi-image.warm-metal.tech
capacity:
storage: 5Gi
accessModes:
- ReadOnlyMany
persistentVolumeReclaimPolicy: Retain
csi:
driver: csi-image.warm-metal.tech
volumeHandle: "docker.io/warmmetal/csi-image-test:simple-fs"
volumeAttributes:
# # set pullAlways if you want to ignore local images
pullAlways: "true"
# # set secret if the image is private
# secret: "name of the ImagePullSecret"
# secretNamespace: "namespace of the secret"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-test-csi-image-test-simple-fs5
spec:
storageClassName: csi-image.warm-metal.tech
accessModes:
- ReadOnlyMany
resources:
requests:
storage: 5Gi
volumeName: pv-test-csi-image-test-simple-fs5
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: core-4472
labels:
app: busybox
spec:
replicas: 10
selector:
matchLabels:
app: busybox
template:
metadata:
labels:
app: busybox
spec:
nodeName: <specify-a-nodename-here>
containers:
- name: ephemeral-volume
image: polinux/stress-ng
command: ["stress-ng"]
args:
- "--vm=2"
- "--cpu=8"
- "--timeout=0"
env:
- name: TARGET1
value: /target1
- name: TARGET2
value: /target2
- name: TARGET3
value: /target3
- name: TARGET4
value: /target4
- name: TARGET5
value: /target5
volumeMounts:
- mountPath: /target1
name: target1
- mountPath: /target2
name: target2
- mountPath: /target3
name: target3
- mountPath: /target4
name: target4
- mountPath: /target5
name: target5
volumes:
- name: target1
persistentVolumeClaim:
claimName: pvc-test-csi-image-test-simple-fs1
- name: target2
persistentVolumeClaim:
claimName: pvc-test-csi-image-test-simple-fs2
- name: target3
persistentVolumeClaim:
claimName: pvc-test-csi-image-test-simple-fs3
- name: target4
persistentVolumeClaim:
claimName: pvc-test-csi-image-test-simple-fs4
- name: target5
persistentVolumeClaim:
claimName: pvc-test-csi-image-test-simple-fs5
Note that I have been able to reproduce this issue on different AWS instance types. Some of them being:
I think it can be reproduced in other machine types as well.
Hello again. I've noticed the following error from the kubelet related to a csi-driver-image ephemeral volume.
This is using AWS EKS 1.17.
I'm not sure if this error triggered by this project, but it seems possible. Sorry I don't have more details handy, but I'm happy to try other things.
"E0604 16:43:03.142973 5169 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/b56d58d9-5c81-4640-b17a-94f5718190eb-drupal-code podName:b56d58d9-5c81-4640-b17a-94f5718190eb nodeName:}" failed.
No retries permitted until 2021-06-04 16:45:05.142951668 +0000 UTC m=+146313.618357736 (durationBeforeRetry 2m2s).
Error: "UnmountVolume.TearDown failed for volume \"drupal-code\" (UniqueName: \"kubernetes.io/csi/b56d58d9-5c81-4640-b17a-94f5718190eb-drupal-code\") pod \"b56d58d9-5c81-4640-b17a-94f5718190eb\" (UID: \"b56d58d9-5c81-4640-b17a-94f5718190eb\") :
kubernetes.io/csi: mounter.TearDownAt failed:
rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil""
Sorry for using an issue to ask questions, but "GitHub Discussions" are not enabled on this repository.
We (as a company) are very interested in this project. We have a use case where some components are packaged as docker images and need to be shared between many pods, and we want to keep the flexibility to update those images regardless of their consuming pods (it is acceptable to require a restart, though).
We have not yet found any satisfactory solution to this use case, and this project is promising.
We have also looked at the https://github.com/kubernetes-csi/csi-driver-image-populator project, but it is clearly stated as experimental.
I wanted to know: what is the status of this project? Is it being used somehow? Will there be any releases?
Most of the context around this can be found in the discussion in #71 (comment)
I had a few successful test pods operating with ephemeral image volumes, but after 24 hours, I hit a consistent panic:
plugin I0408 18:52:05.573451 1 driver.go:93] Enabling volume access mode: MULTI_NODE_SINGLE_WRITER
plugin I0408 18:52:05.577550 1 server.go:108] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
plugin I0408 18:52:23.630273 1 node.go:34] request: volume_id:"csi-4f6ad6041b4625e5e585add44e98861b094b966ccc3df6cfd8e05faac735b572" target_path:"/var/snap/microk8s/common/var/lib/kubelet/pods/f0635891-7412-4d86-a6a2-82006a7244da/volumes/kubernete
s.io~csi/drupal-code/mount" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:"csi.storage.k8s.io/ephemeral" value:"true" > volume_context:<key:"csi.storage.k8s.io/pod.name" value:"drupal-58fdc8f7cd-78xq8" > volume
_context:<key:"csi.storage.k8s.io/pod.namespace" value:"test-env-prod" > volume_context:<key:"csi.storage.k8s.io/pod.uid" value:"f0635891-7412-4d86-a6a2-82006a7244da" > volume_context:<key:"csi.storage.k8s.io/serviceAccount.name" value:"default" > volum
e_context:<key:"image" value:"REDACTED:ef0f6fda48eec00356401ee8d929e7d2fb9d9b98" > volume_context:<key:"secret" value:"regcred" >
plugin I0408 18:52:23.641371 1 mounter.go:99] no local image found. Pull image "REDACTED:ef0f6fda48eec00356401ee8d929e7d2fb9d9b98"
plugin panic: image "REDACTED:ef0f6fda48eec00356401ee8d929e7d2fb9d9b98": not found
plugin goroutine 81 [running]:
plugin github.com/warm-metal/csi-driver-image/pkg/backend/containerd.(*mounter).getImageRootFSChainID(0xc00014c000, 0x1bd90e0, 0xc000422270, 0xc000675340, 0x1b8df80, 0xc000144fa0, 0xc000224000, 0x97, 0x1, 0xc000226480, ...)
plugin /go/src/csi-driver-image/pkg/backend/containerd/mounter.go:108 +0xe90
plugin github.com/warm-metal/csi-driver-image/pkg/backend/containerd.(*mounter).refSnapshot(0xc00014c000, 0x1bd90e0, 0xc000422270, 0xc000675340, 0x1b8df80, 0xc000144fa0, 0xc0001da0a0, 0x44, 0xc000224000, 0x97, ...)
plugin /go/src/csi-driver-image/pkg/backend/containerd/mounter.go:130 +0xe5
plugin github.com/warm-metal/csi-driver-image/pkg/backend/containerd.(*mounter).Mount(0xc00014c000, 0x1bd90e0, 0xc000422270, 0x1b8df80, 0xc000144fa0, 0xc0001da0a0, 0x44, 0xc000224000, 0x97, 0xc000136300, ...)
plugin /go/src/csi-driver-image/pkg/backend/containerd/mounter.go:237 +0x2c5
plugin main.nodeServer.NodePublishVolume(0xc00013dfe0, 0x1b9a3e0, 0xc00014c000, 0x1be0960, 0xc00013dfd0, 0x1bd90e0, 0xc000422270, 0xc0000ae180, 0x18, 0x18, ...)
plugin /go/src/csi-driver-image/cmd/plugin/node.go:88 +0x422
plugin github.com/container-storage-interface/spec/lib/go/csi._Node_NodePublishVolume_Handler.func1(0x1bd90e0, 0xc000422270, 0x191ac80, 0xc0000ae180, 0x18, 0x18, 0x7f14ed854c28, 0xc00041d8c0)
plugin /go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5977 +0x89
plugin github.com/kubernetes-csi/drivers/pkg/csi-common.logGRPC(0x1bd90e0, 0xc000422270, 0x191ac80, 0xc0000ae180, 0xc00041d8a0, 0xc00041d8c0, 0xc000416ba0, 0x508d86, 0x18b85c0, 0xc000422270)
plugin /go/pkg/mod/github.com/kubernetes-csi/[email protected]/pkg/csi-common/utils.go:99 +0x15d
plugin github.com/container-storage-interface/spec/lib/go/csi._Node_NodePublishVolume_Handler(0x18a5280, 0xc000219230, 0x1bd90e0, 0xc000422270, 0xc000652600, 0x1a88310, 0x1bd90e0, 0xc000422270, 0xc00017c840, 0x28e)
plugin /go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5979 +0x150
plugin google.golang.org/grpc.(*Server).processUnaryRPC(0xc0000ca000, 0x1bec140, 0xc000501500, 0xc000580100, 0xc0000c21e0, 0x263c530, 0x0, 0x0, 0x0)
plugin /go/pkg/mod/google.golang.org/[email protected]/server.go:1210 +0x522
plugin google.golang.org/grpc.(*Server).handleStream(0xc0000ca000, 0x1bec140, 0xc000501500, 0xc000580100, 0x0)
plugin /go/pkg/mod/google.golang.org/[email protected]/server.go:1533 +0xd05
plugin google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc000138b40, 0xc0000ca000, 0x1bec140, 0xc000501500, 0xc000580100)
plugin /go/pkg/mod/google.golang.org/[email protected]/server.go:871 +0xa5
plugin created by google.golang.org/grpc.(*Server).serveStreams.func1
plugin /go/pkg/mod/google.golang.org/[email protected]/server.go:869 +0x1fd
plugin stream closed
All suggestions in https://github.com/warm-metal/container-image-csi-driver/pull/83/files/d23fbed7bd143006d4b4b89bb9174c0a11ce8a4a#r1460466259 are addressed
Hi,
It would be nice to have a dynamic volume provisioning feature available, to be able to automatically provision PersistentVolumes for user-provided PersistentVolumeClaims. As PersistentVolumes require cluster-wide access, in multi-tenant clusters users won't be able to create them, and cluster-administrator support will be required to use the CSI driver.
In order to support this, two API methods should be added to the ControllerServer: CreateVolume and DeleteVolume.
Also, the external-provisioner sidecar should be included, and according to the recommended mechanism for deploying CSI drivers, the overall deployment scheme should be split into two parts: a DaemonSet with the Identity and Node services plus the node-driver-registrar sidecar, and a Deployment with the Identity and Controller services plus the external-provisioner sidecar.
As we already implemented this support in our private fork, I would like to contribute it back to upstream to share with the community.
Thanks,
Max
Having a GitHub team like @warm-metal/maintainers
would make it easy for people creating issues or pull requests to tag the maintainers and bring attention to a particular issue/PR.
The integration tests cache the output of some steps, which resulted in a failure to catch errors on a PR.
While working on the PR, the integration tests were passing.
But when the PR was merged to the master branch, the integration tests started failing.
Integration tests should fail if there are any errors in the test/code files.
As far as I know, EKS uses Docker, so I used the docker installation from the readme.
Immediately I needed to make a change to the containerd socket path:
- hostPath:
- path: /run/docker/containerd/containerd.sock
+ path: /run/containerd/containerd.sock
type: Socket
Running the example, I see this event on the pod:
MountVolume.SetUp failed for volume "target" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = Internal desc = no such file or directory
And this from the daemon pod:
plugin I0405 22:46:45.461145 1 node.go:34] request: volume_id:"csi-5cc35c366d3787f082ed8fc570a7c59f2feb712e15c44337c6b82c33aeaad8c0" target_path:"/var/lib/kubelet/pods/0fed9f68-4752-4ae7-a132-9899bede7947/volumes/kubernetes.io~csi/target/mount" vo
lume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:"csi.storage.k8s.io/ephemeral" value:"true" > volume_context:<key:"csi.storage.k8s.io/pod.name" value:"ephemeral-volume-6jh2z" > volume_context:<key:"csi.storage.k8s.
io/pod.namespace" value:"default" > volume_context:<key:"csi.storage.k8s.io/pod.uid" value:"0fed9f68-4752-4ae7-a132-9899bede7947" > volume_context:<key:"csi.storage.k8s.io/serviceAccount.name" value:"default" > volume_context:<key:"image" value:"docker.
io/warmmetal/csi-image-test:simple-fs" >
plugin I0405 22:46:45.486677 1 mounter.go:117] image docker.io/warmmetal/csi-image-test:simple-fs unpacked
plugin I0405 22:46:45.510033 1 mounter.go:136] prepare sha256:07fb94c730eb52c6718a45978209e1249617058da234cd270f4bd587c7ddda87
plugin E0405 22:46:45.545963 1 mounter.go:254] fail to mount image docker.io/warmmetal/csi-image-test:simple-fs to /var/lib/kubelet/pods/0fed9f68-4752-4ae7-a132-9899bede7947/volumes/kubernetes.io~csi/target/mount: no such file or directory
plugin E0405 22:46:45.546766 1 mounter.go:245] found error no such file or directory. Prepare removing the snapshot just created
plugin I0405 22:46:45.550714 1 mounter.go:222] found snapshot csi-image.warm-metal.tech-sha256:07fb94c730eb52c6718a45978209e1249617058da234cd270f4bd587c7ddda87 for volume csi-5cc35c366d3787f082ed8fc570a7c59f2feb712e15c44337c6b82c33aeaad8c0. prepar
e to unref it.
plugin I0405 22:46:45.550730 1 mounter.go:178] unref snapshot csi-image.warm-metal.tech-sha256:07fb94c730eb52c6718a45978209e1249617058da234cd270f4bd587c7ddda87, parent sha256:07fb94c730eb52c6718a45978209e1249617058da234cd270f4bd587c7ddda87
plugin I0405 22:46:45.550740 1 mounter.go:190] no other mount refs snapshot csi-image.warm-metal.tech-sha256:07fb94c730eb52c6718a45978209e1249617058da234cd270f4bd587c7ddda87, remove it
plugin E0405 22:46:45.558154 1 utils.go:101] GRPC error: rpc error: code = Internal desc = no such file or directory
Secret-fetcher reads a namespace from the service account using the following path:
https://github.com/warm-metal/csi-driver-image/blob/d1930b722b4808503b8127400b38759a3587d22d/pkg/secret/cache.go#L183-L186
But according to the Kubernetes documentation, it should be /var/run/secrets/kubernetes.io/serviceaccount.
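For comparison, a minimal sketch of reading the namespace from the standard service-account mount (the documented path); the helper name is made up for the example:

```go
package secret

import (
	"fmt"
	"os"
	"strings"
)

// currentNamespace returns the namespace the plugin pod runs in, read from
// the service-account volume the kubelet mounts into every pod.
func currentNamespace() (string, error) {
	const nsFile = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"
	b, err := os.ReadFile(nsFile)
	if err != nil {
		return "", fmt.Errorf("reading %s: %w", nsFile, err)
	}
	return strings.TrimSpace(string(b)), nil
}
```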
Containerd is leaking snapshots due to what looks like the garbage collection not having a lease to reference
https://github.com/containerd/containerd/blob/main/docs/garbage-collection.md#how-to-use-leases
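A minimal sketch of what taking a lease could look like with the containerd client, following the linked GC document; the socket path and namespace are assumptions:

```go
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "k8s.io")

	// WithLease attaches a lease to the context: images, content and
	// snapshots created with this context stay referenced and are not
	// garbage-collected while the lease exists.
	ctx, done, err := client.WithLease(ctx)
	if err != nil {
		log.Fatal(err)
	}
	// Releasing the lease (e.g. when the volume is unpublished) allows GC
	// to reclaim the snapshot once nothing else references it.
	defer done(ctx)

	// ... pull the image and prepare the snapshot using ctx here ...
}
```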
Because we have limited contributors at the moment, we are unable to fix the issues or merge the PRs in progress.
I think we should increase the values in the stale-bot configuration to something like:
days-before-stale: 90
days-before-close: 30
I've configured the deployment by running:
warm-metal-csi-image-install --pull-image-secret-for-daemonset=<mykeyname> >csi-driver-image.yaml
I then copied the docker.io/warmmetal/csi-image-test:simple-fs image to my private registry (gcr.io/my-private-project) and deployed warm-metal/csi-driver-image/master/sample/ephemeral-volume.yaml with the image patched to my private registry. Initially this works, but if I run this a day later it fails:
MountVolume.SetUp failed for volume "target" : rpc error: code = Aborted desc = unable to pull image "gcr.io/my-private-project/csi-image-test:simple-fs": rpc error: code = Unknown desc = failed to pull and unpack image "gcr.io/giza-workcells/csi-image-test:simple-fs": failed to resolve reference "gcr.io/my-private-project/csi-image-test:simple-fs": pulling from host gcr.io failed with status code [manifests simple-fs]: 401 Unauthorized
If I restart the csi pod (kubectl delete pod -n kube-system csi-image-warm-metal-ndqs5), this causes it to re-read the dockerconfig from the secret and everything works again.
Here is our service that refreshes the dockerconfig in the imagepullsecret: https://github.com/googlecloudrobotics/core/blob/HEAD/src/go/pkg/gcr/update_gcr_credentials.go
I have not found any flag to disable the caching. I can make a PR, but would like to agree first on how to fix this:
Of course I am open to additional ideas. WDYT?
See #16
Users can set ImagePullSecrets via either secret and secretNamespace in the original VolumeContext (for the secret name and namespace respectively), .spec.imagePullSecrets of workload Pods, or .spec.imagePullSecrets of the plugin Pods. If all of them are set for the same workload pod, secrets are sorted in the order of the above list.
I have successfully tested csi-driver-image with EKS, but I had to specify image pull secrets even though EKS nodes should have read access to ECR in the same account.
I tried assigning the kube-system/aws-node ServiceAccount to the csi-driver-image DaemonSet, but that did not help. Not sure why that is not working. Running a pod manually with this service account, I have the expected AWS Role and can pull from ECR.
Reviewing the code, it also seems like this should just magically work with the CredentialProvider, but it doesn't.
https://github.com/kubernetes/kubernetes/tree/master/pkg/credentialprovider/aws
Ideally this would work on EKS nodes with no additional configuration, but alternatively adding a ServiceAccount to the DaemonSet would be a fine solution.
Based on this comment from @kitt1987, the install utility should be removed.
See #91 for more details.
The tests running as part of the CI don't provide any logs right now. It would be good to have the cluster logs as part of the CI run itself.
Currently a maintainer workflow doesn't exist for this project. It'd be good to have the process documented so that we ensure standardization and ease of onboarding.
Currently a well-defined developer/contributor workflow doesn't exist for this project. It'd be good to have a few things
The plugin currently only supports secrets specified in VolumeContext. Secrets embedded in pod manifests and credential providers are more common in the cloud. I am going to support both of them. Progress will be tracked in the following issues respectively.
When a node gets powered off hard and then brought back up, Kubernetes will restart the existing pod rather than recreating it. (This does happen in production. Got the t-shirt. :)
emptydir volumes, which ephemeral image volumes are patterned after, keep their data on restart of the pod in this case.
This driver reverts the data back to the original image, losing the data. This should probably be fixed to follow the emptydir pattern. The volumeHandle was designed to be unique to the pod/volume combination to ensure this state can be recovered.
We might want to add a stale-bot for GitHub issues to keep outdated issues from staying open.
The node plugin pod sometimes crashes after printing the message below.
F0129 15:23:21.872324 1 containerd.go:75] unable to retrieve local image "public.ecr.aws/docker/library/amazonlinux:2022.0.20220308.1": image "public.ecr.aws/docker/library/amazonlinux:2022.0.20220308.1": not found
Kubelet is then unable to reach the CSI driver until the new pod gets into running state, and we see warning events like -
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 2m32s (x2 over 4m47s) kubelet MountVolume.SetUp failed for volume "target" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning FailedMount 2m28s (x2 over 4m44s) kubelet Unable to attach or mount volumes: unmounted volumes=[target], unattached volumes=[target kube-api-access-6k74b]: timed out waiting for the condition
Warning FailedMount 2m9s (x2 over 4m33s) kubelet MountVolume.SetUp failed for volume "target" : rpc error: code = Unavailable desc = error reading from server: EOF
Warning FailedMount 117s (x2 over 2m5s) kubelet MountVolume.SetUp failed for volume "target" : kubernetes.io/csi: mounter.SetUpAt failed to determine if the node service has VOLUME_MOUNT_GROUP capability: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/plugins/csi-image.warm-metal.tech/csi.sock: connect: connection refused"
Warning FailedMount 12s kubelet Unable to attach or mount volumes: unmounted volumes=[target], unattached volumes=[kube-api-access-6k74b target]: timed out waiting for the condition
Node-plugin pod shouldn't crash.
It doesn't always happen, so it might not be straightforward to reproduce the issue.
Generally the restarts are seen when containerd fails to pull an image, so you may try pulling multiple larger images at the same time.
We recently upgraded the CSI driver to v0.6.1 and started seeing pods that pull images from a private registry fail with mount failures, along with the error below:
MountVolume.SetUp failed for volume "target" : rpc error: code = Aborted desc = unable to pull image "74180267703.dkr.ecr.us-east-1.amazonaws.com/kalpana:0ab620f52e3641cb39f5b7e3b4c152e7f1651960-from-scratch": rpc error: code = Unknown desc = failed to pull image "74180267703.dkr.ecr.us-east-1.amazonaws.com/kalpana:0ab620f52e3641cb39f5b7e3b4c152e7f1651960-from-scratch": failed to resolve reference "74180267703.dkr.ecr.us-east-1.amazonaws.com/kalpana:0ab620f52e3641cb39f5b7e3b4c152e7f1651960-from-scratch": pulling from host 74180267703.dkr.ecr.us-east-1.amazonaws.com failed with status code [manifests simple-fs]: 401 Unauthorized
where the pod contains the volume details below:
volumes:
- name: shared-files
persistentVolumeClaim:
claimName: e32de31b-814b-4e0e-a2e3-05c92c59066b-files
- csi:
driver: csi-image.warm-metal.tech
volumeAttributes:
image: 374180267703.dkr.ecr.us-east-1.amazonaws.com/kalpana:0ab620f52e3641cb39f5b7e3b4c152e7f1651960-from-scratch
name: test-code
I tried the flags below:
--enable-daemon-image-credential-cache=false
but I still see the same behaviour. Is there anything I am missing?
Note: the same pod YAML works with the v0.5.1 csi image; however, it does not with v0.6.1.
Create a PV with the hello-world docker image, which is built from scratch. Create a PVC and Pod using this PV.
apiVersion: v1
kind: PersistentVolume
metadata:
name: test
spec:
storageClassName: system-image
capacity:
storage: 10G
accessModes:
- ReadOnlyMany
persistentVolumeReclaimPolicy: Delete
csi:
driver: csi-image.warm-metal.tech
volumeHandle: "hello-world"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: test
spec:
accessModes:
- ReadOnlyMany
resources:
requests:
storage: 10G
storageClassName: system-image
volumeMode: Filesystem
---
apiVersion: v1
kind: Pod
metadata:
name: test
spec:
containers:
- name: busybox
image: docker.io/busybox:latest
command: ['tail', '-f', '/dev/null']
volumeMounts:
- name: test
mountPath: /test
readOnly: true
volumes:
- name: test
persistentVolumeClaim:
claimName: test
readOnly: true
The pod will be created with no errors:
[root@node ~]# kubectl get po | grep test
test 1/1 Running 0 4m37s
But if you try to delete this pod, it will get stuck in Terminating status:
[root@node ~]# kubectl get po | grep test
test 1/1 Terminating 0 6m11s
And the following error will occur in the kubelet logs:
Jun 22 17:16:19 node kubelet[3615]: E0622 17:16:19.593027 3615 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/csi-image.warm-metal.tech^hello-world podName:9cf68544-0fe0-4a30-8ee1-95d845beb776 nodeName:}" failed. No retries permitted until 2023-06-22 17:16:27.592995497 +0000 UTC m=+189462.238591632 (durationBeforeRetry 8s). Error: UnmountVolume.TearDown failed for volume "test" (UniqueName: "kubernetes.io/csi/csi-image.warm-metal.tech^hello-world") pod "9cf68544-0fe0-4a30-8ee1-95d845beb776" (UID: "9cf68544-0fe0-4a30-8ee1-95d845beb776") : kubernetes.io/csi: Unmounter.TearDownAt failed to clean mount dir [/var/lib/kubelet/pods/9cf68544-0fe0-4a30-8ee1-95d845beb776/volumes/kubernetes.io~csi/test/mount]: kubernetes.io/csi: failed to remove dir [/var/lib/kubelet/pods/9cf68544-0fe0-4a30-8ee1-95d845beb776/volumes/kubernetes.io~csi/test/mount]: remove /var/lib/kubelet/pods/9cf68544-0fe0-4a30-8ee1-95d845beb776/volumes/kubernetes.io~csi/test/mount: device or resource busy
That is because the snapshotter used a bind mount for this image:
[root@node ~]#
findmnt | grep csi | grep test
|-/var/lib/kubelet/pods/9cf68544-0fe0-4a30-8ee1-95d845beb776/volumes/kubernetes.io~csi/test/mount /dev/mapper/vg00-root[/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/875/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
And it should be unmounted before deletion.
The node server implements this unmounting feature:
https://github.com/warm-metal/csi-driver-image/blob/dab54201833d97838d6508455c0044ac28c979f3/cmd/plugin/node_server.go#L152-L162
But only if the IsLikelyNotMountPoint function returns false, i.e. the path is a mount point.
According to the comments in this function, it doesn't work with bind mounts, so the unmount operation will be skipped and the kubelet's UnmountVolume.TearDown will always fail.
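One possible direction, sketched below with the heavier helper from k8s.io/mount-utils, which falls back to listing the mount table and therefore also detects bind mounts; whether the driver should adopt this helper is an open question:

```go
package main

import (
	"fmt"
	"os"

	mount "k8s.io/mount-utils"
)

// isMountPoint reports whether path is a mount point, including bind mounts.
// Unlike IsLikelyNotMountPoint, which only compares the device numbers of the
// path and its parent, IsNotMountPoint consults the mount table when needed.
func isMountPoint(path string) (bool, error) {
	mounter := mount.New("")
	notMnt, err := mount.IsNotMountPoint(mounter, path)
	if err != nil {
		return false, err
	}
	return !notMnt, nil
}

func main() {
	mounted, err := isMountPoint(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("mounted:", mounted)
}
```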
We are facing vulnerability issues with older golang and text package versions.
Current environment:
EKS 1.22
csi-image: 0.5.1
Is there any other release with these errors fixed, or when can we expect a fixed version?
I was asked to provide an example re-implementation for the asynchronous pull feature based on what I refer to as the session pattern. In today's community standup, we determined that we'd prefer to have an issue created for any PR that is submitted in order to facilitate a more fluid discussion. This ticket was created for that purpose. The associated PR #137 is a re-implementation of the async pull feature.
Some rationale for this request can be seen in the PR description and conversation. In the future, I'll check the issues and start the conversation here before starting an implementation. Since this was a direct request, it was kind of executed backwards.
Anyway, I'm interested in any discussion that flows either here or in the PR.
The devcontainer fails when I try to run a kind cluster.
vscode ➜ /go/…/github.com/warm-metal/csi-driver-image $ kind create cluster
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.27.3) 🖼
✗ Preparing nodes 📦
Deleted nodes: ["kind-control-plane"]
ERROR: failed to create cluster: command "docker run --name kind-control-plane --hostname kind-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=kind --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=127.0.0.1:38285:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.27.3@sha256:3966ac761ae0136263ffdb6cfd4db23ef8a83cba8a463690e98317add2c9ba72" failed with error: exit status 125
Command Output: 6e439e708e7a15f0e19bcbbe2f3a590d694d5b8f8f883a5cb5d132a9450842c4
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: waiting for init preliminary setup: read init-p: connection reset by peer: unknown.
Note that I am using a MacBook with an Apple M2 chip, and as such there is a difference between the architecture of the container and the host device.
Host device
$ uname -a
Darwin Mriyams-MacBook-Air.local 23.3.0 Darwin Kernel Version 23.3.0: Wed Dec 20 21:33:31 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T8112 arm64 arm Darwin
DevContainer
$ uname -a
Linux 58b19ceaedb6 6.5.11-linuxkit #1 SMP PREEMPT Wed Dec 6 17:08:31 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux