csi-addons / kubernetes-csi-addons
CSI-Addons implementation and APIs for Kubernetes
License: Apache License 2.0
There is a lot of duplication of code in the GitHub Workflows. It can be reduced; take a look at https://github.blog/changelog/2021-08-25-github-actions-reduce-duplication-with-action-composition/
Originally posted by @Madhu-1 in #21 (review)
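For illustration, a composite action collects the shared steps into a single action.yaml that each workflow can call. A minimal sketch, assuming hypothetical file paths and steps (not taken from this repository):

```yaml
# .github/actions/setup-go-env/action.yaml (hypothetical path)
name: setup-go-env
description: Shared setup steps reused by several workflows
runs:
  using: "composite"
  steps:
    - name: Install Go
      uses: actions/setup-go@v4
      with:
        go-version: "1.20"
    - name: Download dependencies
      run: go mod download
      shell: bash
```

A workflow would then replace its duplicated steps with a single `- uses: ./.github/actions/setup-go-env` step.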
This should probably become an artifact when we do a release. Creating a tag in git can then generate the yaml, and post it as part of the release (somehow). It requires adding one or more jobs to the build-push workflow we already have.
Originally posted by @nixpanic in #106 (comment)
Currently, the sidecar listens on the provided IP and port and starts a gRPC server without SSL/TLS support. With no authentication enabled on the server side, anyone can send a request targeting the nodes on known ports and can easily perform node-level or controller-level operations. This is a security problem for production clusters. We need to support SSL/TLS for the gRPC server when it is listening on an IP and port.
@nixpanic @Rakshith-R Thoughts?
I am running v0.8.0 of csi-addons and began getting crash loops with error messages referencing a missing VolumeGroupReplication CRD (and other related ones). Looking at the source these CRDs are present in the development branch but not the v0.8 tag.
Looking at setup-controller.yaml, there is a :latest tag on the manager container. Changing this to v0.8.0 appears to resolve the issue; if the manager is going to use the bundled CRDs, it should probably be version-tagged to ensure the two stay consistent.
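For example, pinning the manager image in setup-controller.yaml to the release tag could look like the following sketch (the image name matches the one used by make deploy later in this page; the exact manifest layout may differ):

```yaml
spec:
  containers:
    - name: manager
      # Pin to the release tag instead of :latest so the manager matches
      # the bundled CRDs for that release.
      image: quay.io/csiaddons/k8s-controller:v0.8.0
```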
Due to the LastSyncTime feature introduced in #232, we now reconcile each VolumeReplication at the scheduling interval (or the default interval). Because of this, an EnableVolumeReplication and a PromoteVolume request is sent to the CSI driver on every reconcile. This needs to be optimized to avoid flooding the logs, to improve the performance of the CSI driver and kubernetes-csi-addons, and to avoid unwanted bugs at the volume replication level.
Add a deployment guide which helps admins/users deploy csi-addons with different CSI drivers in standalone Kubernetes/OCP clusters.
Add the Replication capability to the csi-addons admin/test tool, alongside the currently supported reclaimspace and identity capabilities.
https://github.com/csi-addons/kubernetes-csi-addons/tree/main/cmd/csi-addons
Super-linter has a few linters (listed below) that are currently disabled and can be enabled by fixing the issues they report.
Hello
I've tried deploying the controller with the following instructions:
```
kubectl create -f https://raw.githubusercontent.com/csi-addons/kubernetes-csi-addons/v0.5.0/deploy/controller/crds.yaml
kubectl create -f https://raw.githubusercontent.com/csi-addons/kubernetes-csi-addons/v0.5.0/deploy/controller/rbac.yaml
kubectl create -f https://raw.githubusercontent.com/csi-addons/kubernetes-csi-addons/v0.5.0/deploy/controller/setup-controller.yaml
```
but after the controller starts I get the following error:
```
2022-11-16T08:52:34.494Z INFO All workers finished {"controller": "csiaddonsnode", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "CSIAddonsNode"}
2022-11-16T08:52:34.494Z INFO All workers finished {"controller": "persistentvolumeclaim", "controllerGroup": "", "controllerKind": "PersistentVolumeClaim"}
2022-11-16T08:52:34.494Z INFO All workers finished {"controller": "volumereplication", "controllerGroup": "replication.storage.openshift.io", "controllerKind": "VolumeReplication"}
2022-11-16T08:52:34.494Z INFO All workers finished {"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob"}
2022-11-16T08:52:34.494Z INFO Stopping and waiting for caches
2022-11-16T08:52:34.495Z INFO Stopping and waiting for webhooks
2022-11-16T08:52:34.495Z INFO Wait completed, proceeding to shutdown the manager
E1116 08:52:34.495138 1 leaderelection.go:334] error initially creating leader election record: Post "https://10.233.0.1:443/apis/coordination.k8s.io/v1/namespaces/csi-addons-system/leases": context canceled
2022-11-16T08:52:34.495Z ERROR setup problem running manager {"error": "open /tmp/k8s-webhook-server/serving-certs/tls.crt: no such file or directory"}
main.main
/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/cmd/manager/main.go:179
runtime.main
/usr/local/go/src/runtime/proc.go:250
```
I had it previously installed in Kubernetes 1.23 and it was working fine.
I tested my PR on the Rook side (rook/rook#12286), which creates the NetworkFence CR. The CR was created, but when I checked `ceph osd blocklist ls` the IP was not present in the list, and the csi-addons controller logs say:
2023-06-23T07:37:48.023Z ERROR Failed to get NetworkFenceClient {"controller": "networkfence", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "NetworkFence", "NetworkFence": {"name":"ip-10-0-160-193.ec2.internal"}, "namespace": "", "name": "ip-10-0-160-193.ec2.internal", "reconcileID": "d8da1419-7035-4bc9-87eb-ab4273e47fcb", "DriverName": "rook-ceph.rbd.csi.ceph.com", "CIDRs": ["100.64.0.7:0"], "error": "no connections for driver: rook-ceph.rbd.csi.ceph.com"}
kc get networkfences.csiaddons.openshift.io ip-10-0-160-193.ec2.internal
NAME                           DRIVER                       CIDRS              FENCESTATE   AGE     RESULT
ip-10-0-160-193.ec2.internal   rook-ceph.rbd.csi.ceph.com   ["100.64.0.7:0"]   Fenced       3m12s
$ kc get pv | grep rbd-pvc
pvc-0c65191a-ceb8-4769-bf31-5c0c113c5e1d   1Gi   RWO   Delete   Bound   rook-ceph/rbd-pvc   rook-ceph-block   39m
$ kc get pv pvc-0c65191a-ceb8-4769-bf31-5c0c113c5e1d -oyaml | grep imageName:
imageName: csi-vol-f9ad9b3d-4b7f-40d1-9f47-0d172d5153ba
$ kubectl rook-ceph rbd status replicapool/csi-vol-f9ad9b3d-4b7f-40d1-9f47-0d172d5153ba
Warning: rook version 'rook: v1.11.0-alpha.0.449.ge5bd73104-dirty' is running a pre-release version of Rook.
Watchers:
watcher=100.64.0.7:0/4143960263 client.16345 cookie=18446462598732840961
I'm uploading the complete logs of the csi-addons controller: [csi-addons-logs.txt](https://github.com/csi-addons/kubernetes-csi-addons/files/11847034/csi-addons-logs.txt)
If the DNM label is set on a PR, Mergify should not merge the PR automatically even if it has 2 approvals. This can be used to wait for others to also review the PR, without blocking it by requesting changes.
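A hypothetical Mergify rule expressing this, using Mergify's documented condition syntax (the actual rule set in this repository's .mergify.yml may differ):

```yaml
pull_request_rules:
  - name: automatic merge when approved and not marked DNM
    conditions:
      - "#approved-reviews-by>=2"   # at least two approvals
      - "label!=DNM"                # skip automerge while DNM is set
    actions:
      merge:
        method: merge
```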
Add support to build multi-arch container images for all the Dockerfiles.
Image: quay.io/csiaddons/k8s-bundle:v0.1.1
CSV name: csi-addons.v0.0.1
We have multiple bundles in the odf-operator, and all of them use an image tag matching the CSV version. We would like CSI-Addons to do the same: either change the CSV version or the image tag so that the two match.
As we already know, currently when a CSIAddonsNode object is created, we create the connections and keep them until the object is deleted. There could be advantages and disadvantages to this. CSI-Addons is meant to be a generic component used by multiple CSI drivers. For example, in a 10-node cluster with 2 CSI drivers using csi-addons, there are 20 node-plugin connections plus 2 provisioner connections (when both provisioner and node-plugin sidecars are deployed) open and kept in memory. Thinking about scale, what about 100-node clusters, or even more CSI drivers in a cluster?
I would like to hear thoughts from everyone on this one. cc @nixpanic @humblec @Rakshith-R @pkalever
Besides the current flags in /manager, we need a new flag, -version, which displays the version.
It will be helpful in our case: we are building pipelines for image creation of the csi-addons components, and once the image is built, having the version displayed makes it easy to add a verification step in the pipeline.
The actions seem to be failing since around the time this change was introduced.
https://github.com/csi-addons/kubernetes-csi-addons/pull/383/files
https://github.com/csi-addons/kubernetes-csi-addons/actions/workflows/build-push.yaml
We can further investigate and revert this after the release.
Originally posted by @Madhu-1 in #399 (comment)
This issue is to track the following items:
Please add any missing item from the list in the comments below.
# operator-sdk gets installed from the tools/vendor/ directory.
OPERATOR_SDK = $(shell pwd)/bin/operator-sdk
.PHONY: operator-sdk
operator-sdk:
# FIXME: Remove `go mod tidy && go mod vendor` once we find the reason why the CI workflow fails without it.
	cd ./tools && go mod tidy && go mod vendor && go build -o $(OPERATOR_SDK) ./vendor/$(shell grep operator-sdk tools/tools.go | sed 's/.*_ "//;s/"//')
refer: #382
This task is to add more metrics, i.e. lastSyncDuration and lastSyncBytes, to the VolumeReplication status field.
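A sketch of how the proposed fields could appear alongside the existing lastSyncTime in the status; field names follow the issue text, values are illustrative:

```yaml
status:
  state: Primary
  lastSyncTime: "2023-06-01T10:00:00Z"
  lastSyncDuration: "12s"   # proposed: duration of the last sync
  lastSyncBytes: 1048576    # proposed: bytes transferred in the last sync
```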
Besides the current flags in /usr/bin/csi-addons-sidecar, we need a new flag, -version, which displays the version.
It will be helpful in our case: we are building pipelines for image creation of the csi-addons components, and once the image is built, having the version displayed makes it easy to add a verification step in the pipeline.
The CSIAddonsNode object contains details about the node and CSI-driver that provides a set of CSI-Addons features. It would be useful for debugging and validation to have the capabilities (from CSI-Addons Identity service) listed in the Status field of the CSIAddonsNode CR.
This could be done on the initial connection to the csi-addons-sidecar.
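A hypothetical shape for the Status addition; the capability strings follow the CSI-Addons Identity service naming, but the `capabilities` field itself is the suggested addition (the v1alpha1 version is assumed):

```yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: CSIAddonsNode
metadata:
  name: example-node
status:
  state: Connected
  # Proposed field: capabilities reported by the CSI-Addons Identity service
  # on the initial connection.
  capabilities:
    - service.NODE_SERVICE
    - reclaim_space.ONLINE
```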
We need to handle the NotImplemented error for the getVolumeReplication RPC call.
For more details please check this:
#232 (review)
This tracks the work for k8s-operatorhub/community-operators#585
Some check-boxes in the PR are not set yet, and need some extra verification or other work.
Currently, some settings are being read from cmdline args (kubernetes-csi-addons/cmd/manager/main.go, lines 69 to 76 in b9a147c).
However, when csi-addons is deployed as an operator using OLM, even if the user manages to change these cmdline args in the CSV, the settings are reverted on upgrade.
We need a csi-addons-config ConfigMap for users to configure settings that persist across upgrades.
The CSI-Addons operator should use options from the csi-addons-config ConfigMap if it exists.
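A minimal sketch of such a ConfigMap, assuming the name csi-addons-config (as used by deploy/controller/csi-addons-config.yaml elsewhere on this page) and an illustrative key:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: csi-addons-config
  namespace: csi-addons-system
data:
  # Illustrative key: a persisted replacement for the equivalent cmdline arg,
  # surviving operator upgrades.
  reclaim-space-timeout: "3m"
```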
While working on #66, it became obvious that there is a need for an email address that can be used for contacting the maintainers of the project. I'd like to know which is preferred:
a. a Google group (like CSI)
b. an email list maintained by Red Hat IT (will have @redhat.com as the domain)
c. an email gateway that creates an issue (like Fire)
d. ... something else?
My preference would be either b or c.
This would ease installation.
NB: I'm using ArgoCD to install apps; the make deploy way of installing is not compatible.
@yati1998, can you open an issue to track handling the NotImplemented error?
We can set lastSyncTime to nil if getVolumeReplicationInfo returns this error.
We'll need to fix this before the next release of the csi-addons operator.
Originally posted by @Rakshith-R in #232 (review)
Deployed csi-addons as instructed here, and also enabled the csi-addons sidecar as mentioned here. Images used are the bleeding-edge latest and canary for ceph-csi.
Initially, as the CSIAddonsNode CRD was not created on the API server, I updated the CRD and restarted the rbd CSI provisioner plugin by scaling it down and then scaling it back up. This led to the pod name changing, causing the following logs from the csi-addons deployment:
2022-09-13T15:44:44.946Z ERROR Failed to resolve endpoint {"controller": "csiaddonsnode", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "CSIAddonsNode", "CSIAddonsNode": {"name":"csi-rbdplugin-provisioner-57b556bb77-cq97r","namespace":"rook-ceph"}, "namespace": "rook-ceph", "name": "csi-rbdplugin-provisioner-57b556bb77-cq97r", "reconcileID": "85a93305-dccc-439b-85d9-38999b78af6d", "error": "failed to get pod rook-ceph/csi-rbdplugin-provisioner-57b556bb77-cq97r: Pod \"csi-rbdplugin-provisioner-57b556bb77-cq97r\" not found"}
github.com/csi-addons/kubernetes-csi-addons/controllers/csiaddons.(*CSIAddonsNodeReconciler).Reconcile
/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/controllers/csiaddons/csiaddonsnode_controller.go:98
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234
2022-09-13T15:44:44.949Z ERROR Reconciler error {"controller": "csiaddonsnode", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "CSIAddonsNode", "CSIAddonsNode": {"name":"csi-rbdplugin-provisioner-57b556bb77-cq97r","namespace":"rook-ceph"}, "namespace": "rook-ceph", "name": "csi-rbdplugin-provisioner-57b556bb77-cq97r", "reconcileID": "85a93305-dccc-439b-85d9-38999b78af6d", "error": "Failed to resolve endpoint \"pod://csi-rbdplugin-provisioner-57b556bb77-cq97r.rook-ceph:9070\": failed to get pod rook-ceph/csi-rbdplugin-provisioner-57b556bb77-cq97r: Pod \"csi-rbdplugin-provisioner-57b556bb77-cq97r\" not found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234
The reason seems to be that two CSIAddonsNode resources were created: one by the older rbd provisioner plugin instance and one by the newer.
To overcome this, I removed the finalizer on the older CSIAddonsNode resource and deleted it.
After the workaround, things were working as expected.
Reporting the issue in case additional code or documentation changes are required to address it.
It seems that config/crd/bases/csiaddons.openshift.io_networkfences.yaml is not applied automatically.
Full output when running make deploy to deploy the controller and CRDs:
$ make deploy
/home/ndevos/go/src/github.com/csi-addons/kubernetes-csi-addons/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
cd config/manager && /home/ndevos/go/src/github.com/csi-addons/kubernetes-csi-addons/bin/kustomize edit set image controller=quay.io/csiaddons/k8s-controller:latest
/home/ndevos/go/src/github.com/csi-addons/kubernetes-csi-addons/bin/kustomize build config/default | kubectl apply -f -
namespace/csi-addons-system created
customresourcedefinition.apiextensions.k8s.io/csiaddonsnodes.csiaddons.openshift.io created
customresourcedefinition.apiextensions.k8s.io/reclaimspacejobs.csiaddons.openshift.io created
serviceaccount/csi-addons-controller-manager created
role.rbac.authorization.k8s.io/csi-addons-leader-election-role created
clusterrole.rbac.authorization.k8s.io/csi-addons-manager-role created
clusterrole.rbac.authorization.k8s.io/csi-addons-metrics-reader created
clusterrole.rbac.authorization.k8s.io/csi-addons-proxy-role created
rolebinding.rbac.authorization.k8s.io/csi-addons-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/csi-addons-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/csi-addons-proxy-rolebinding created
configmap/csi-addons-manager-config created
service/csi-addons-controller-manager-metrics-service created
deployment.apps/csi-addons-controller-manager created
Originally posted by @nixpanic in #49 (comment)
Each storage system/workload may have different requirements.
It would be great to have support for choosing only one of, or opting out of, the controller/node reclaimspace operations.
Example:
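A purely hypothetical sketch of what such an option could look like on a ReclaimSpaceJob; the `operations` field is illustrative and not an existing API field (apiVersion assumed to be v1alpha1):

```yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  name: sample-1
spec:
  target:
    persistentVolumeClaim: rbd-pvc
  # Hypothetical field: run only the node-side operation and skip the
  # controller-side one (or vice versa).
  operations: ["node"]
```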
Hi,
We are trying to use a ReclaimSpace job with rook-ceph.
The Rook operator version is 1.12, Ceph is 17.2.5, and the Kubernetes version is 1.26.8.
We followed the steps given here: https://rook.io/docs/rook/v1.12/Storage-Configuration/Ceph-CSI/ceph-csi-drivers/#csi-addons-controller
Now the rbdplugin and provisioner pods have the csi-addons sidecar, and the csi-addons pod in the csi-addons-system namespace is running fine.
But when we create a ReclaimSpaceJob for an rbd volume with ReadWriteOnce mode, it fails and the ReclaimSpaceJob shows the message:
"Failed to make node request node client not found for nodeID"
There are no errors in the provisioner pod or in the plugin pod where the PVC is attached.
In the csi-addons controller pod there are logs: "PANIC no leader found for driver rook-ceph.rbd.csi.ceph.com" Lease.coordination.k8s.io "rook-ceph-rbd-csi-ceph-com-csi-addons"
Please help to fix this.
Timestamps in log messages are in a format that is difficult to read:
1.6593560518327549e+09 INFO Making controller reclaim space request {"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob", "reclaimSpaceJob": {"name":"sample-1","namespace":"default"}, "namespace": "default", "name": "sample-1", "reconcileID": "dc87e88f-318a-40be-a87b-63151495d4d3", "PVCName": "rbd-pvc", "PVCNamespace": "default", "PVName": "pvc-82801af6-dccb-4323-a08c-743766b74028", "NodeID": "ip-10-0-149-143.ec2.internal", "nodeClient": "rook-ceph/csi-rbdplugin-4z2wj", "controllerClient": "rook-ceph/csi-rbdplugin-provisioner-5794db6555-npwr7"}
1.6593560520878572e+09 INFO Successfully completed reclaim space operation {"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob", "reclaimSpaceJob": {"name":"sample-1","namespace":"default"}, "namespace": "default", "name": "sample-1", "reconcileID": "dc87e88f-318a-40be-a87b-63151495d4d3", "PVCName": "rbd-pvc", "PVCNamespace": "default", "PVName": "pvc-82801af6-dccb-4323-a08c-743766b74028", "NodeID": "ip-10-0-149-143.ec2.internal", "nodeClient": "rook-ceph/csi-rbdplugin-4z2wj", "controllerClient": "rook-ceph/csi-rbdplugin-provisioner-5794db6555-npwr7"}
It is not easy to compare these times with other logs; the current situation is very user-unfriendly.
By using a dummy CSI driver that implements CSI-Addons operations for known names of PVCs, it should be possible to test the whole process from user input to updated status in the CRs.
This fake CSI driver does not need to use any actual storage; it can just return success/failure responses without doing anything.
We need to generate the raw YAMLs required to install and deploy the controller, so that those YAMLs can be used directly by others (basically Rook, for now) to deploy the controller and run the operations.
This issue tracks the work required to move the standalone volume replication operator into the kubernetes-csi-addons repo.
Follow the steps in https://book.kubebuilder.io/migration/manually_migration_guide_gov3_to_gov4 and migrate from the deprecated kubebuilder go/v3 layout to go/v4.
Add a development guide which helps people who want to contribute.
The certificates.k8s.io API or some Kubernetes-native certificate manager should be used for the connections between the controller and the sidecar. The sidecar should have the ability to verify that an incoming connection is from a valid controller.
The controller should probably use a client certificate, and the sidecar should verify that its owner has permission to connect.
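For the certificates.k8s.io route, a hedged sketch of a client-certificate request for the controller (names are illustrative; the signer shown is Kubernetes' standard built-in client signer):

```yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: csi-addons-controller-client
spec:
  # Base64-encoded PKCS#10 CSR generated for the controller's identity;
  # placeholder shown here.
  request: <base64-encoded-CSR>
  signerName: kubernetes.io/kube-apiserver-client
  usages:
    - client auth
```

The sidecar could then require client certificates on its listener and check the presented identity before accepting a connection.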
Add GitHub Actions for static-analysis tools, etc.
NetworkFence is missing in the test tool. It would be nice to have that added.
https://github.com/csi-addons/kubernetes-csi-addons/tree/main/cmd/csi-addons
@Yuggupta27 could you have a look at this?
Currently, to install the latest release you need to apply all the resources from the release page; adding or removing resources will break users.
Adding a trivial kustomization.yaml grouping the resources would allow installing via:
kubectl apply -k https://github.com/csi-addons/kubernetes-csi-addons.git/deploy/controller?ref=tag
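A minimal sketch of such a file, assuming the resource file names from the deploy/controller directory used in the install instructions above:

```yaml
# deploy/controller/kustomization.yaml (proposed)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - crds.yaml
  - rbac.yaml
  - setup-controller.yaml
```

With this in place, adding or removing a manifest only changes the resources list, and the kubectl apply -k command keeps working unchanged.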
As of today, kubernetes-csi-addons connects to a random registered controller and makes its RPC calls to that random controller. This can create a problem if the CSI driver has implemented some internal locking mechanism, or keeps a local cache for the lifetime of that instance.
An example: Ceph-CSI runs deployments for Replication/ReclaimSpace etc., and there will be two instances running. Ceph-CSI internally takes a lock and processes one request at a time, based on its internal logic. With the current Kubernetes sidecars this is not a problem, because each sidecar runs with leader election and only one instance processes requests; with kubernetes-csi-addons it becomes a problem, as we have no such mechanism to always reach the same controller/deployment that is processing the requests.
The request is to provide this kind of functionality, which would help CSI drivers with this requirement, and moreover avoid forcing an active/active model, as that can lead to many different problems.
Currently, ReclaimSpace requests to CSI drivers use a default timeout of 3 minutes, which can only be overridden via cmdline args (kubernetes-csi-addons/cmd/manager/main.go, line 74 in b9a147c).
The time taken by the commands executed by CSI drivers may vary significantly based on factors like the size of the PVC, IO pattern, etc.
Therefore, the ability to override this timeout in the ReclaimSpaceJob CR, per PVC, would be very useful.
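A hypothetical ReclaimSpaceJob with a per-CR timeout; the `timeout` field is the proposed addition, and the apiVersion/spec layout is assumed:

```yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  name: sample-1
spec:
  target:
    persistentVolumeClaim: rbd-pvc
  # Proposed: override the controller-wide 3-minute default for this PVC only.
  timeout: 600  # seconds; type and unit illustrative
```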
When the sidecar fails to create the CSIAddonsNode, it exits, and causes the Deployment/DaemonSet of the CSI driver to fail.
It is better to report the error of the missing CRD and the failure, and to backoff-retry until the CRD becomes available (which might be never). This will allow deploying the sidecar by default, even when users do not deploy the controller and the CRDs.
Currently, this is done manually by whoever is pushing the PR. This should be added to the Makefile; not everyone is aware of how to generate the proto files or what version to use.
Some useful tools for promoting and checking code quality:
The current way of annotating the PVC or namespace is not so useful, because the customer needs to annotate the resources, and it doesn't provide any option for an admin who wants to auto-reclaim space. Provide an option at the csi-addons controller that exposes some configuration to auto-reclaim space for the PVs created by a given driver.
Example: add support for comma-separated driver names which can auto-reclaim, in https://github.com/csi-addons/kubernetes-csi-addons/blob/main/deploy/controller/csi-addons-config.yaml or some other way; a sketch follows below.
The above is helpful for PV key rotation as well.
Admins deploying the operator need not worry about security or storage if it is unused and unannotated.
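A hedged sketch of the proposal against csi-addons-config.yaml; the key name is illustrative and the second driver name is a placeholder:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: csi-addons-config
data:
  # Illustrative key: comma-separated drivers whose PVs are reclaimed
  # automatically, without per-PVC or per-namespace annotations.
  auto-reclaim-space-drivers: "rook-ceph.rbd.csi.ceph.com,other.csi.example.com"
```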