
kubernetes-csi-addons's Issues

Support secured GRPC server for sidecar container

Currently, the sidecar starts a gRPC server on the provided IP and port without SSL/TLS support. Since there is no authentication on the server side, anyone who can reach the nodes on the known ports can send requests and easily perform node-level or controller-level operations. This is a security problem for production clusters. We need to support SSL/TLS for the gRPC server when it is exposed on an IP and port.
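
A minimal Go sketch of what enabling TLS on the sidecar's gRPC server could look like; the certificate paths, the port, and the idea of mounting the keypair from a Secret are assumptions for illustration, not the project's actual options:

```go
// Hedged sketch: the certificate paths and port are assumptions; in a
// cluster the keypair would typically be mounted from a Secret.
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

func main() {
	creds, err := credentials.NewServerTLSFromFile("/etc/tls/tls.crt", "/etc/tls/tls.key")
	if err != nil {
		log.Fatalf("failed to load TLS keypair: %v", err)
	}

	lis, err := net.Listen("tcp", "0.0.0.0:9070")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

	// grpc.Creds enables TLS on the server; clients then have to dial with
	// matching transport credentials instead of insecure ones.
	srv := grpc.NewServer(grpc.Creds(creds))
	// ... register the CSI-Addons services here ...
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("server exited: %v", err)
	}
}
```

Authentication (for example client-certificate verification or a token check) would still need to be layered on top of plain TLS.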

@nixpanic @Rakshith-R Thoughts?

manager container uses latest tag resulting in possible CRD mismatch

I am running v0.8.0 of csi-addons and began getting crash loops with error messages referencing a missing VolumeGroupReplication CRD (and other related ones). Looking at the source, these CRDs are present in the development branch but not in the v0.8.0 tag.

Looking at setup-controller.yaml, there is a :latest tag on the manager container. Changing this to v0.8.0 appears to resolve the issue. If the manager is going to use the bundled CRDs, it should probably be version-tagged to ensure the two stay consistent.

Unwanted RPC calls due to LastSyncTime feature

With the LastSyncTime feature introduced in #232, we now reconcile each VolumeReplication at the configured scheduling interval (or at the default interval). Because of this, EnableVolumeReplication and PromoteVolume requests are sent to the CSI driver on every reconcile. This needs to be optimized to avoid flooding the logs, to improve the performance of the CSI driver and kubernetes-csi-addons, and to avoid unwanted bugs at the volume replication level.
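
One possible direction, shown as a hedged sketch with simplified stand-in types (the real VolumeReplication spec/status types live in the project's API package): only send the Enable/Promote RPCs when the observed state differs from the desired state, so that scheduled reconciles merely refresh LastSyncTime.

```go
// Hedged sketch with simplified stand-in types; the real VolumeReplication
// spec/status types live in the project's API package.
package main

import (
	"fmt"
	"time"
)

type replicationState string

const primary replicationState = "primary"

// volumeReplication is a stand-in holding only the fields relevant here.
type volumeReplication struct {
	desired  replicationState // from spec.replicationState
	observed replicationState // already-applied state, tracked in status
	lastSync time.Time
}

// reconcile only sends the Enable/Promote RPCs when the observed state does
// not match the desired state; scheduled reconciles just refresh LastSyncTime.
func reconcile(vr *volumeReplication) {
	if vr.observed != vr.desired {
		fmt.Println("sending EnableVolumeReplication + PromoteVolume RPCs")
		vr.observed = vr.desired
	} else {
		fmt.Println("state already applied, only refreshing LastSyncTime")
	}
	vr.lastSync = time.Now()
}

func main() {
	vr := &volumeReplication{desired: primary}
	reconcile(vr) // first reconcile: RPCs are sent
	reconcile(vr) // scheduled reconcile: only LastSyncTime is refreshed
}
```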

cc @ShyamsundarR @yati1998

Add deployment guide

Add a deployment guide that helps admins/users deploy csi-addons with different CSI drivers in standalone Kubernetes/OCP clusters.

Fix disabled linters in Super Linter

Super Linter has a few linters (listed below) that are currently disabled and can be enabled after fixing the issues they report.

  • hadolint
  • golangci-lint
  • jscpd
  • kubernetes kubeconform
  • markdown
  • protolint

Controller installation fails in Kubernetes 1.24.6

Hello

I've tried deploying the controller with the following instructions:

kubectl create -f https://raw.githubusercontent.com/csi-addons/kubernetes-csi-addons/v0.5.0/deploy/controller/crds.yaml
kubectl create -f https://raw.githubusercontent.com/csi-addons/kubernetes-csi-addons/v0.5.0/deploy/controller/rbac.yaml
kubectl create -f https://raw.githubusercontent.com/csi-addons/kubernetes-csi-addons/v0.5.0/deploy/controller/setup-controller.yaml

but after the controller starts, I get the following error:

2022-11-16T08:52:34.494Z        INFO    All workers finished    {"controller": "csiaddonsnode", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "CSIAddonsNode"}
2022-11-16T08:52:34.494Z        INFO    All workers finished    {"controller": "persistentvolumeclaim", "controllerGroup": "", "controllerKind": "PersistentVolumeClaim"}
2022-11-16T08:52:34.494Z        INFO    All workers finished    {"controller": "volumereplication", "controllerGroup": "replication.storage.openshift.io", "controllerKind": "VolumeReplication"}
2022-11-16T08:52:34.494Z        INFO    All workers finished    {"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob"}
2022-11-16T08:52:34.494Z        INFO    Stopping and waiting for caches
2022-11-16T08:52:34.495Z        INFO    Stopping and waiting for webhooks
2022-11-16T08:52:34.495Z        INFO    Wait completed, proceeding to shutdown the manager
E1116 08:52:34.495138       1 leaderelection.go:334] error initially creating leader election record: Post "https://10.233.0.1:443/apis/coordination.k8s.io/v1/namespaces/csi-addons-system/leases": context canceled
2022-11-16T08:52:34.495Z        ERROR   setup   problem running manager {"error": "open /tmp/k8s-webhook-server/serving-certs/tls.crt: no such file or directory"}
main.main
        /workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/cmd/manager/main.go:179
runtime.main
        /usr/local/go/src/runtime/proc.go:250

I had it previously installed in Kubernetes 1.23 and it was working fine.

"error": "node Client not found"

Hello, I installed kubernetes-csi-addons 0.4.0 and got an error when executing a ReclaimSpaceJob. Where could the problem be?

kubernetes version: 1.22
rook: 1.9.0


no connections for driver: rook-ceph.rbd.csi.ceph.com

I tested my PR on the Rook side (rook/rook#12286), which creates the NetworkFence CR. The CR was created, but when I checked `ceph osd blocklist ls` the IP was not present in the list, and the logs from the csi-addons controller say:

2023-06-23T07:37:48.023Z	ERROR	Failed to get NetworkFenceClient	{"controller": "networkfence", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "NetworkFence", "NetworkFence": {"name":"ip-10-0-160-193.ec2.internal"}, "namespace": "", "name": "ip-10-0-160-193.ec2.internal", "reconcileID": "d8da1419-7035-4bc9-87eb-ab4273e47fcb", "DriverName": "rook-ceph.rbd.csi.ceph.com", "CIDRs": ["100.64.0.7:0"], "error": "no connections for driver: rook-ceph.rbd.csi.ceph.com"}
kc get networkfences.csiaddons.openshift.io ip-10-0-160-193.ec2.internal 
NAME                           DRIVER                       CIDRS              FENCESTATE   AGE     RESULT
ip-10-0-160-193.ec2.internal   rook-ceph.rbd.csi.ceph.com   ["100.64.0.7:0"]   Fenced       3m12s   
~/go/src/github.com/rook/deploy/examples
srai@192 ~ (fix-node-loss-rbd) $ kc get pv | grep rbd-pvc
pvc-0c65191a-ceb8-4769-bf31-5c0c113c5e1d   1Gi        RWO            Delete           Bound    rook-ceph/rbd-pvc                                       rook-ceph-block            39m
~/go/src/github.com/rook/deploy/examples
srai@192 ~ (fix-node-loss-rbd) $ kc get pv pvc-0c65191a-ceb8-4769-bf31-5c0c113c5e1d -oyaml | grep imageName:
      imageName: csi-vol-f9ad9b3d-4b7f-40d1-9f47-0d172d5153ba
~/go/src/github.com/rook/deploy/examples
srai@192 ~ (fix-node-loss-rbd) $ kubectl rook-ceph rbd status replicapool/csi-vol-f9ad9b3d-4b7f-40d1-9f47-0d172d5153ba
Warning: rook version 'rook: v1.11.0-alpha.0.449.ge5bd73104-dirty' is running a pre-release version of Rook.

Watchers:
	watcher=100.64.0.7:0/4143960263 client.16345 cookie=18446462598732840961

I'm uploading the complete logs of the csi-addons controller: [csi-addons-logs.txt](https://github.com/csi-addons/kubernetes-csi-addons/files/11847034/csi-addons-logs.txt)

update mergify rules to consider DNM label

If the DNM label is set on a PR, Mergify should not merge the PR automatically even if it has 2 approvals. This can be used to wait for others to also review the PR without blocking it by requesting changes.

CSV version should match with the tag of the bundle image

Image: quay.io/csiaddons/k8s-bundle:v0.1.1
CSV name: csi-addons.v0.0.1

We have multiple bundles in odf-operator, and all of them use an image tag that matches the CSV version. We would like csi-addons to do the same: either change the CSV version or the image tag so that the two match.

idle GRPC connections in controller

As we already know, currently when a CSIAddonsNode object is created we create a connection and keep it open until the object is deleted. There are advantages and disadvantages to this. csi-addons is meant to be a generic component used by multiple CSI drivers; as an example, in a 10-node cluster with 2 CSI drivers using csi-addons, we keep 20 node connections plus 2 controller connections (when both the provisioner and node-plugin sidecars are deployed) open in memory. Thinking about scale, what about 100-node clusters, or clusters with even more CSI drivers?

Advantages

  • Reuse of connection for faster communication
  • (anything else?)

Disadvantages

  • More idle connections held in memory (if there are no csi-addons operations)
  • More resource utilization in the controller pod
  • More network calls, as connection keep-alives need to be sent at intervals to make sure the connection is not broken (see the sketch after this list)
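
A minimal Go sketch of the keep-alive cost from the last point, using illustrative parameter values rather than the controller's actual settings: every cached connection pings its sidecar at an interval, so even idle connections keep generating network traffic.

```go
// Illustrative keep-alive parameters on a cached connection to a sidecar;
// the values and the endpoint are examples only.
package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func dialSidecar(endpoint string) (*grpc.ClientConn, error) {
	return grpc.Dial(endpoint,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                30 * time.Second, // ping the sidecar after this much idle time
			Timeout:             10 * time.Second, // how long to wait for a ping ack
			PermitWithoutStream: true,             // ping even when no RPCs are in flight
		}),
	)
}

func main() {
	// With N nodes and M drivers, roughly N*M of these connections sit in
	// memory, each generating periodic keep-alive traffic while idle.
	conn, err := dialSidecar("10.0.0.1:9070")
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()
}
```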

I would like to hear thoughts from everyone on this one. cc @nixpanic @humblec @Rakshith-R @pkalever

Add version flag for volume replication operator

In addition to the current flags of /manager, we need a new -version flag that displays the version.

This will be helpful in our case, where we are building pipelines for image creation of csi-addons components: once the image is built, if we can display the version, it is easy to add that as a verification step in the pipeline.
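
A minimal sketch of what such a flag could look like; the version variable name and the -ldflags wiring are assumptions, not the project's existing build setup:

```go
// Hedged sketch of a -version flag; the variable is intended to be set at
// build time, e.g. go build -ldflags "-X main.version=v0.5.0" ./cmd/manager
package main

import (
	"flag"
	"fmt"
	"os"
)

// version is overridden at build time via -ldflags; "unknown" is the fallback.
var version = "unknown"

func main() {
	showVersion := flag.Bool("version", false, "print the version and exit")
	flag.Parse()

	if *showVersion {
		fmt.Println(version)
		os.Exit(0)
	}

	// ... normal manager startup continues here ...
}
```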

Add github actions ci to test builds and push images

This issue is to track the following items:

  • Decide name for controller and sidecar images (Please comment down suggestions)
  • Add build scripts for controller and sidecar images
  • Add ci to run go test and build
  • Add multi-architecture build test
  • Add github action to push images to registry.

Please add any missing item from the list in the comments below.

FIXME: Remove `go mod tidy && go mod vendor` once we find the reason why ci workflow fails without it.

# operator-sdk gets installed from the tools/vendor/ directory.
OPERATOR_SDK = $(shell pwd)/bin/operator-sdk
.PHONY: operator-sdk
operator-sdk:
# FIXME: Remove `go mod tidy && go mod vendor` once we find the reason why ci workflow fails without it.
	cd ./tools && go mod tidy && go mod vendor && go build -o $(OPERATOR_SDK) ./vendor/$(shell grep operator-sdk tools/tools.go | sed 's/.*_ "//;s/"//')

refer: #382

Add version flag for csi-addons-sidecar

In addition to the current flags of /usr/bin/csi-addons-sidecar, we need a new -version flag that displays the version.

This will be helpful in our case, where we are building pipelines for image creation of csi-addons components: once the image is built, if we can display the version, it is easy to add that as a verification step in the pipeline.

Include supported capabilities in CSIAddonsNode status output

The CSIAddonsNode object contains details about the node and CSI-driver that provides a set of CSI-Addons features. It would be useful for debugging and validation to have the capabilities (from CSI-Addons Identity service) listed in the Status field of the CSIAddonsNode CR.

This could be done on the initial connection to the csi-addons-sidecar, at

https://github.com/csi-addons/kubernetes-csi-addons/blob/493222166f4132a930137e03ba51960319cf39d0/controllers/csiaddons/csiaddonsnode_controller.go#L138C44-L144
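
A hedged sketch of the idea, using stand-in types instead of the real spec/API packages (the Identity client interface and the Status.Capabilities field shown here are assumptions): query the capabilities once the connection is up and copy them into the status.

```go
// Hedged sketch, not the project's actual code: the client interface and the
// Capabilities status field are stand-ins; the real types come from the
// csi-addons spec and the CSIAddonsNode API package.
package main

import (
	"context"
	"fmt"
)

// identityClient is a stand-in for the CSI-Addons Identity gRPC client.
type identityClient interface {
	GetCapabilities(ctx context.Context) ([]string, error)
}

// csiAddonsNodeStatus is a stand-in for the CSIAddonsNode status, with a
// proposed Capabilities field.
type csiAddonsNodeStatus struct {
	State        string
	Capabilities []string
}

// fakeClient simulates a sidecar that advertises a few capabilities.
type fakeClient struct{}

func (fakeClient) GetCapabilities(ctx context.Context) ([]string, error) {
	return []string{"ReclaimSpace", "NetworkFence", "VolumeReplication"}, nil
}

// populateCapabilities would be called right after the connection to the
// csi-addons sidecar is established.
func populateCapabilities(ctx context.Context, c identityClient, st *csiAddonsNodeStatus) error {
	caps, err := c.GetCapabilities(ctx)
	if err != nil {
		return fmt.Errorf("failed to query capabilities: %w", err)
	}
	st.Capabilities = caps
	return nil
}

func main() {
	st := &csiAddonsNodeStatus{State: "Connected"}
	if err := populateCapabilities(context.Background(), fakeClient{}, st); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", st)
}
```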

Add support for CSI-Addons-config cm for persistent settings

Currently, some settings can only be set through cmdline args:

flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.BoolVar(&enableLeaderElection, "leader-elect", false,
"Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
flag.DurationVar(&reclaimSpaceTimeout, "reclaim-space-timeout", defaultTimeout, "Timeout for reclaimspace operation")
flag.IntVar(&maxConcurrentReconciles, "max-concurrent-reconciles", 100, "Maximum number of concurrent reconciles")
flag.BoolVar(&enableAdmissionWebhooks, "enable-admission-webhooks", true, "Enable the admission webhooks")

However, when csi-addons is deployed as an operator using OLM, even if the user manages to change these cmdline args in the CSV, the settings are reverted on upgrade.

We need a CSI-Addons-config ConfigMap so users can configure settings that persist across upgrades.

The CSI-Addons operator should use options from the CSI-Addons-config configmap if it exists.
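
A hedged sketch of how the manager could prefer values from such a ConfigMap and fall back to the cmdline defaults; the ConfigMap name, namespace, and key name used here are assumptions.

```go
// Hedged sketch: read an override from a "csi-addons-config" ConfigMap and
// fall back to the built-in default when the ConfigMap or key is absent.
// The name, namespace, and key are assumptions for illustration.
package main

import (
	"context"
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func reclaimSpaceTimeout(ctx context.Context, defaultTimeout time.Duration) (time.Duration, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return 0, err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return 0, err
	}

	cm, err := client.CoreV1().ConfigMaps("csi-addons-system").Get(ctx, "csi-addons-config", metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return defaultTimeout, nil // no ConfigMap: keep the built-in default
	} else if err != nil {
		return 0, err
	}

	if v, ok := cm.Data["reclaim-space-timeout"]; ok {
		if d, perr := time.ParseDuration(v); perr == nil {
			return d, nil
		}
	}
	return defaultTimeout, nil
}

func main() {
	t, err := reclaimSpaceTimeout(context.Background(), 3*time.Minute)
	fmt.Println(t, err)
}
```

Watching the ConfigMap (or restarting the manager on change) would additionally let edits take effect without waiting for an upgrade, but that part is not shown here.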

cc @nixpanic @Madhu-1

Add helm chart

This would ease installation.

NB: I'm using ArgoCD to install apps. The make deploy way of installing is not compatible.

csi-addon is stuck on older `CSIAddonsNode` resource in reconcile

Deployed csi addons as instructed here, and also enabled csi-addon sidecar as mentioned here.

Images used are bleeding edge latest and canary for ceph-csi.

Initially, as the CSIAddonsNode CRD was not created on the API server, I updated the CRD and restarted the rbd CSI provisioner plugin by scaling it down and then back up. This led to the pod name changing, causing the following logs from the csi-addons deployment:

2022-09-13T15:44:44.946Z	ERROR	Failed to resolve endpoint	{"controller": "csiaddonsnode", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "CSIAddonsNode", "CSIAddonsNode": {"name":"csi-rbdplugin-provisioner-57b556bb77-cq97r","namespace":"rook-ceph"}, "namespace": "rook-ceph", "name": "csi-rbdplugin-provisioner-57b556bb77-cq97r", "reconcileID": "85a93305-dccc-439b-85d9-38999b78af6d", "error": "failed to get pod rook-ceph/csi-rbdplugin-provisioner-57b556bb77-cq97r: Pod \"csi-rbdplugin-provisioner-57b556bb77-cq97r\" not found"}
github.com/csi-addons/kubernetes-csi-addons/controllers/csiaddons.(*CSIAddonsNodeReconciler).Reconcile
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/controllers/csiaddons/csiaddonsnode_controller.go:98
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234
2022-09-13T15:44:44.949Z	ERROR	Reconciler error	{"controller": "csiaddonsnode", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "CSIAddonsNode", "CSIAddonsNode": {"name":"csi-rbdplugin-provisioner-57b556bb77-cq97r","namespace":"rook-ceph"}, "namespace": "rook-ceph", "name": "csi-rbdplugin-provisioner-57b556bb77-cq97r", "reconcileID": "85a93305-dccc-439b-85d9-38999b78af6d", "error": "Failed to resolve endpoint \"pod://csi-rbdplugin-provisioner-57b556bb77-cq97r.rook-ceph:9070\": failed to get pod rook-ceph/csi-rbdplugin-provisioner-57b556bb77-cq97r: Pod \"csi-rbdplugin-provisioner-57b556bb77-cq97r\" not found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234

The reason seems to be that 2 CSIAddonsNode resources were created: one for the older rbd provisioner plugin instance and one for the newer.

To overcome this, I removed the finalizer on the older CSIAddonsNode resource and deleted it.

After the workaround, things were working as expected.

Reporting the issue in case additional code or documentation changes are required to address it.

`make deploy` does not install NetworkFence CRD

It seems that config/crd/bases/csiaddons.openshift.io_networkfences.yaml is not applied automatically.

Full output when running make deploy to deploy the controller and CRDs:

$ make deploy
/home/ndevos/go/src/github.com/csi-addons/kubernetes-csi-addons/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
cd config/manager && /home/ndevos/go/src/github.com/csi-addons/kubernetes-csi-addons/bin/kustomize edit set image controller=quay.io/csiaddons/k8s-controller:latest
/home/ndevos/go/src/github.com/csi-addons/kubernetes-csi-addons/bin/kustomize build config/default | kubectl apply -f -
namespace/csi-addons-system created
customresourcedefinition.apiextensions.k8s.io/csiaddonsnodes.csiaddons.openshift.io created
customresourcedefinition.apiextensions.k8s.io/reclaimspacejobs.csiaddons.openshift.io created
serviceaccount/csi-addons-controller-manager created
role.rbac.authorization.k8s.io/csi-addons-leader-election-role created
clusterrole.rbac.authorization.k8s.io/csi-addons-manager-role created
clusterrole.rbac.authorization.k8s.io/csi-addons-metrics-reader created
clusterrole.rbac.authorization.k8s.io/csi-addons-proxy-role created
rolebinding.rbac.authorization.k8s.io/csi-addons-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/csi-addons-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/csi-addons-proxy-rolebinding created
configmap/csi-addons-manager-config created
service/csi-addons-controller-manager-metrics-service created
deployment.apps/csi-addons-controller-manager created

Originally posted by @nixpanic in #49 (comment)

reclaimSpace job not working

Hi,

We are trying to use the ReclaimSpace job with rook-ceph.
The Rook operator version is 1.12, Ceph is 17.2.5, and the Kubernetes version is 1.26.8.
We followed the steps given here:
https://rook.io/docs/rook/v1.12/Storage-Configuration/Ceph-CSI/ceph-csi-drivers/#csi-addons-controller
Now the rbdplugin and provisioner pods have the csi-addons sidecar, and the csi-addons pod in the csi-addons-system namespace is running fine.
But when we create a ReclaimSpaceJob for an RBD volume with ReadWriteOnce access mode, it fails and the ReclaimSpaceJob shows the message
"Failed to make node request node client not found for nodeID"

There are no errors in the provisioner pod or in the plugin pod where the PVC is attached.
In the csi-addons controller pod there are logs: "PANIC no leader found for driver rook-ceph.rbd.csi.ceph.com" Lease.coordination.k8s.io "rook-ceph-rbd-csi-ceph-com-csi-addons"

Please help to fix this.

Provide readable timestamps in the logs

Timestamps of log messages are in a format that is difficult to read:

1.6593560518327549e+09	INFO	Making controller reclaim space request	{"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob", "reclaimSpaceJob": {"name":"sample-1","namespace":"default"}, "namespace": "default", "name": "sample-1", "reconcileID": "dc87e88f-318a-40be-a87b-63151495d4d3", "PVCName": "rbd-pvc", "PVCNamespace": "default", "PVName": "pvc-82801af6-dccb-4323-a08c-743766b74028", "NodeID": "ip-10-0-149-143.ec2.internal", "nodeClient": "rook-ceph/csi-rbdplugin-4z2wj", "controllerClient": "rook-ceph/csi-rbdplugin-provisioner-5794db6555-npwr7"}
1.6593560520878572e+09	INFO	Successfully completed reclaim space operation	{"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob", "reclaimSpaceJob": {"name":"sample-1","namespace":"default"}, "namespace": "default", "name": "sample-1", "reconcileID": "dc87e88f-318a-40be-a87b-63151495d4d3", "PVCName": "rbd-pvc", "PVCNamespace": "default", "PVName": "pvc-82801af6-dccb-4323-a08c-743766b74028", "NodeID": "ip-10-0-149-143.ec2.internal", "nodeClient": "rook-ceph/csi-rbdplugin-4z2wj", "controllerClient": "rook-ceph/csi-rbdplugin-provisioner-5794db6555-npwr7"}

It is not easy to compare these times with other logs; the current situation is very user-unfriendly.
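
For reference, a minimal sketch of how readable timestamps could be produced with the controller-runtime zap logger; the exact wiring in the project's main.go may differ, and the epoch-float timestamps above are likely zap's default time encoding.

```go
// Hedged sketch: configure the controller-runtime zap logger to emit
// RFC3339 timestamps instead of epoch floats.
package main

import (
	"go.uber.org/zap/zapcore"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	opts := zap.Options{
		Development: false,
		// "2022-08-01T10:15:30Z" instead of 1.6593560518327549e+09
		TimeEncoder: zapcore.RFC3339TimeEncoder,
	}
	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

	ctrl.Log.Info("example log line with a readable timestamp")
}
```

The same behaviour should also be reachable through zap's --zap-time-encoding flag if the options are bound to the command line, but that wiring is not shown here.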

Move volume replication operator to kubernetes-csi-addons

This issue tracks the work required to move the standalone volume replication operator into the kubernetes-csi-addons repo.

Provide kustomization for deployment

Currently, to install the latest release you need to apply all of the resources from the release page individually. Adding or removing resources will break users.

Adding a trivial kustomization.yaml grouping the resources will allow installing via:

kubectl apply -k https://github.com/csi-addons/kubernetes-csi-addons.git/deploy/controller?ref=tag

All the Controller Operations should reach the one Controller (active) not multiple Controllers

As of today, kubernetes-csi-addons connects to a random registered controller and makes its RPC calls to that random controller. This can create a problem if the CSI driver has implemented some internal locking mechanism or keeps a local cache for the lifetime of that instance.

Example:

Ceph-CSI runs deployments for Replication/ReclaimSpace etc., and we will have two instances running. Ceph-CSI internally takes a lock and processes one request at a time based on its internal logic. With the current Kubernetes sidecars this is not a problem, because the sidecar runs with leader election and only one instance can process a request. With kubernetes-csi-addons it becomes a problem, as we don't have any such mechanism to always reach the same controller/deployment instance that is processing the requests.

The request is to provide this kind of functionality, which would help CSI drivers with this requirement and avoid forcing them into an active/active model, which can lead to many different problems.
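
One possible approach, shown as a hedged sketch rather than the project's actual design: per-driver leader election on a coordination Lease, so that controller-level RPCs are only sent to the instance currently holding the lease. The lease name, namespace, and environment variable are illustrative.

```go
// Hedged sketch: per-driver leader election using a coordination Lease; the
// lease name/namespace and POD_NAME identity are examples only.
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "example-driver-csi-addons", // one lease per CSI driver
			Namespace: "csi-addons-system",
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: os.Getenv("POD_NAME")},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the leader should receive controller-level RPCs.
				log.Println("this instance is now the active controller target")
				<-ctx.Done()
			},
			OnStoppedLeading: func() { log.Println("lost leadership") },
		},
	})
}
```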

reclaimspace: Add support for custom timeouts

Currently, ReclaimSpace requests to CSI drivers use a default timeout of 3 minutes, which can only be overridden through the cmdline args:

flag.DurationVar(&reclaimSpaceTimeout, "reclaim-space-timeout", defaultTimeout, "Timeout for reclaimspace operation")

The time taken by these commands executed by CSI drivers may vary significantly based on factors like the size of the PVC, the I/O pattern, etc.

Therefore, having the ability to override this timeout per PVC, at the ReclaimSpaceJob CR level, would be very useful.
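
A minimal sketch of the fallback logic, using a simplified stand-in for the ReclaimSpaceJob spec; the per-CR Timeout field is a proposed/assumed addition, not an existing API field.

```go
// Hedged sketch: resolve the effective timeout from a proposed per-CR field,
// falling back to the controller's cmdline default.
package main

import (
	"fmt"
	"time"
)

// reclaimSpaceJobSpec is a simplified stand-in for the real spec type.
type reclaimSpaceJobSpec struct {
	Timeout *time.Duration // nil means "use the controller default"
}

func effectiveTimeout(spec reclaimSpaceJobSpec, defaultTimeout time.Duration) time.Duration {
	if spec.Timeout != nil {
		return *spec.Timeout
	}
	return defaultTimeout
}

func main() {
	defaultTimeout := 3 * time.Minute // the current built-in default
	perJob := 15 * time.Minute        // e.g. for a large PVC

	fmt.Println(effectiveTimeout(reclaimSpaceJobSpec{}, defaultTimeout))                 // 3m0s
	fmt.Println(effectiveTimeout(reclaimSpaceJobSpec{Timeout: &perJob}, defaultTimeout)) // 15m0s
}
```

The resolved value would then bound the context used for the ReclaimSpace RPC to the CSI driver.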

@nixpanic @Madhu-1

In case the CSIAddonsNode CRD is not available, the sidecar should not abort but retry

When the sidecar fails to create the CSIAddonsNode, it exits, causing the deployment/daemonset of the CSI driver to fail.

It is better to report the missing CRD and the failure, and to retry with backoff until the CRD becomes available (which might be never). This allows deploying the sidecar by default, even when users do not deploy the controller and the CRDs.
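
A hedged sketch of the retry behaviour, with createCSIAddonsNode as a placeholder for the sidecar's actual creation logic; the retry interval is illustrative.

```go
// Hedged sketch: keep retrying CSIAddonsNode creation instead of aborting
// while the CRD is not installed.
package main

import (
	"context"
	"log"
	"time"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/util/wait"
)

var attempts int

// createCSIAddonsNode is a placeholder for the sidecar's actual CR creation;
// here it fails twice with a "no matching kind" error to simulate the CRD
// not being installed yet.
func createCSIAddonsNode(ctx context.Context) error {
	attempts++
	if attempts <= 2 {
		return &meta.NoKindMatchError{}
	}
	return nil
}

func main() {
	ctx := context.Background()

	// Retry at a fixed interval instead of aborting; the CRD may never be
	// installed, and that should not crash the CSI driver pod.
	err := wait.PollUntilContextCancel(ctx, 2*time.Second, true, func(ctx context.Context) (bool, error) {
		if err := createCSIAddonsNode(ctx); err != nil {
			if meta.IsNoMatchError(err) {
				log.Println("CSIAddonsNode CRD is not available yet, will retry")
				return false, nil
			}
			return false, err // unexpected error: stop retrying
		}
		return true, nil
	})
	if err != nil {
		log.Fatalf("could not create CSIAddonsNode: %v", err)
	}
	log.Println("CSIAddonsNode created")
}
```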

Auto reclaim storage based on the storageclass/namespace

The current way of annotating the PVC or namespace is not very useful, because the customer needs to annotate the resources themselves and it provides no option for an admin who wants to auto-reclaim space. Provide an option at the csi-addons controller that exposes some configuration to auto-reclaim space for the PVs created by a given driver.

Example: add support for a comma-separated list of driver names that are allowed to auto-reclaim in https://github.com/csi-addons/kubernetes-csi-addons/blob/main/deploy/controller/csi-addons-config.yaml, or some other mechanism.

The above is helpful for the PV key rotation as well.

Admins deploying the operator need not worry about security or storage if it is unused and unannotated.
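
A minimal sketch of how a comma-separated driver list from the ConfigMap could be evaluated; the key value and the helper function are assumptions for illustration, not an existing option.

```go
// Hedged sketch: decide whether a PV's driver qualifies for automatic space
// reclamation based on an assumed comma-separated config value.
package main

import (
	"fmt"
	"strings"
)

func autoReclaimEnabled(configValue, driverName string) bool {
	for _, d := range strings.Split(configValue, ",") {
		if strings.TrimSpace(d) == driverName {
			return true
		}
	}
	return false
}

func main() {
	cfg := "rook-ceph.rbd.csi.ceph.com, openshift-storage.rbd.csi.ceph.com"
	fmt.Println(autoReclaimEnabled(cfg, "rook-ceph.rbd.csi.ceph.com")) // true
	fmt.Println(autoReclaimEnabled(cfg, "example.csi.driver"))         // false
}
```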
