
kubernetes-csi-addons's Issues

Support secured GRPC server for sidecar container

Currently, the sidecar starts a gRPC server on the provided IP and port without SSL/TLS support. Since there is no authentication on the server side, anyone who can reach the nodes on the known ports can send requests and easily perform node-level or controller-level operations. This is a security problem for production clusters. We need to support SSL/TLS for the gRPC server when it is exposed on an IP and port.
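
A minimal Go sketch of what enabling TLS on the sidecar's gRPC server could look like; the certificate paths, the port, and the idea of mounting the keypair from a Secret are assumptions for illustration, not the project's actual options:

```go
// Hedged sketch: the certificate paths and port are assumptions; in a
// cluster the keypair would typically be mounted from a Secret.
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

func main() {
	creds, err := credentials.NewServerTLSFromFile("/etc/tls/tls.crt", "/etc/tls/tls.key")
	if err != nil {
		log.Fatalf("failed to load TLS keypair: %v", err)
	}

	lis, err := net.Listen("tcp", "0.0.0.0:9070")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

	// grpc.Creds enables TLS on the server; clients then have to dial with
	// matching transport credentials instead of insecure ones.
	srv := grpc.NewServer(grpc.Creds(creds))
	// ... register the CSI-Addons services here ...
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("server exited: %v", err)
	}
}
```

Authentication (for example client-certificate verification or a token check) would still need to be layered on top of plain TLS.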

@nixpanic @Rakshith-R Thoughts?

manager container uses latest tag resulting in possible CRD mismatch

I am running v0.8.0 of csi-addons and began getting crash loops with error messages referencing a missing VolumeGroupReplication CRD (and other related ones). Looking at the source, these CRDs are present in the development branch but not in the v0.8.0 tag.

Looking at setup-controller.yaml, there is a :latest tag on the manager container. Changing this to v0.8.0 appears to resolve the issue. If the manager is going to use the bundled CRDs, it should probably be version-tagged to ensure the two stay consistent.

Unwanted RPC calls due to LastSyncTime feature

With the LastSyncTime feature introduced in #232, we now reconcile each VolumeReplication at the configured scheduling interval (or at the default interval). Because of this, EnableVolumeReplication and PromoteVolume requests are sent to the CSI driver on every reconcile. This needs to be optimized to avoid flooding the logs, to improve the performance of the CSI driver and kubernetes-csi-addons, and to avoid unwanted bugs at the volume replication level.
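
One possible direction, shown as a hedged sketch with simplified stand-in types (the real VolumeReplication spec/status types live in the project's API package): only send the Enable/Promote RPCs when the observed state differs from the desired state, so that scheduled reconciles merely refresh LastSyncTime.

```go
// Hedged sketch with simplified stand-in types; the real VolumeReplication
// spec/status types live in the project's API package.
package main

import (
	"fmt"
	"time"
)

type replicationState string

const primary replicationState = "primary"

// volumeReplication is a stand-in holding only the fields relevant here.
type volumeReplication struct {
	desired  replicationState // from spec.replicationState
	observed replicationState // already-applied state, tracked in status
	lastSync time.Time
}

// reconcile only sends the Enable/Promote RPCs when the observed state does
// not match the desired state; scheduled reconciles just refresh LastSyncTime.
func reconcile(vr *volumeReplication) {
	if vr.observed != vr.desired {
		fmt.Println("sending EnableVolumeReplication + PromoteVolume RPCs")
		vr.observed = vr.desired
	} else {
		fmt.Println("state already applied, only refreshing LastSyncTime")
	}
	vr.lastSync = time.Now()
}

func main() {
	vr := &volumeReplication{desired: primary}
	reconcile(vr) // first reconcile: RPCs are sent
	reconcile(vr) // scheduled reconcile: only LastSyncTime is refreshed
}
```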

cc @ShyamsundarR @yati1998

Add deployment guide

Add a deployment guide that helps admins/users deploy csi-addons with different CSI drivers in standalone Kubernetes/OCP clusters.

Fix disabled linters in Super Linter

Super Linter has a few linters (listed below) that are currently disabled and can be enabled after fixing the issues they report.

  • hadolint
  • golangci-lint
  • jscpd
  • kubernetes kubeconform
  • markdown
  • protolint

Controller installation fails in Kubernetes 1.24.6

Hello

I've tried deploying the controller with the following instructions:

kubectl create -f https://raw.githubusercontent.com/csi-addons/kubernetes-csi-addons/v0.5.0/deploy/controller/crds.yaml
kubectl create -f https://raw.githubusercontent.com/csi-addons/kubernetes-csi-addons/v0.5.0/deploy/controller/rbac.yaml
kubectl create -f https://raw.githubusercontent.com/csi-addons/kubernetes-csi-addons/v0.5.0/deploy/controller/setup-controller.yaml

but after the controller starts, I get the following error:

2022-11-16T08:52:34.494Z        INFO    All workers finished    {"controller": "csiaddonsnode", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "CSIAddonsNode"}
2022-11-16T08:52:34.494Z        INFO    All workers finished    {"controller": "persistentvolumeclaim", "controllerGroup": "", "controllerKind": "PersistentVolumeClaim"}
2022-11-16T08:52:34.494Z        INFO    All workers finished    {"controller": "volumereplication", "controllerGroup": "replication.storage.openshift.io", "controllerKind": "VolumeReplication"}
2022-11-16T08:52:34.494Z        INFO    All workers finished    {"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob"}
2022-11-16T08:52:34.494Z        INFO    Stopping and waiting for caches
2022-11-16T08:52:34.495Z        INFO    Stopping and waiting for webhooks
2022-11-16T08:52:34.495Z        INFO    Wait completed, proceeding to shutdown the manager
E1116 08:52:34.495138       1 leaderelection.go:334] error initially creating leader election record: Post "https://10.233.0.1:443/apis/coordination.k8s.io/v1/namespaces/csi-addons-system/leases": context canceled
2022-11-16T08:52:34.495Z        ERROR   setup   problem running manager {"error": "open /tmp/k8s-webhook-server/serving-certs/tls.crt: no such file or directory"}
main.main
        /workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/cmd/manager/main.go:179
runtime.main
        /usr/local/go/src/runtime/proc.go:250

I had it previously installed in Kubernetes 1.23 and it was working fine.

"error": "node Client not found"

Hello, I installed kubernetes-csi-addons 0.4.0 and got an error when executing a ReclaimSpaceJob. Where could the problem be?

kubernetes version: 1.22
rook: 1.9.0


no connections for driver: rook-ceph.rbd.csi.ceph.com

I tested my PR on the Rook side (rook/rook#12286), which creates the NetworkFence CR. The CR was created, but when I checked `ceph osd blocklist ls` the IP was not present in the list, and the logs from the csi-addons controller say:

2023-06-23T07:37:48.023Z	ERROR	Failed to get NetworkFenceClient	{"controller": "networkfence", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "NetworkFence", "NetworkFence": {"name":"ip-10-0-160-193.ec2.internal"}, "namespace": "", "name": "ip-10-0-160-193.ec2.internal", "reconcileID": "d8da1419-7035-4bc9-87eb-ab4273e47fcb", "DriverName": "rook-ceph.rbd.csi.ceph.com", "CIDRs": ["100.64.0.7:0"], "error": "no connections for driver: rook-ceph.rbd.csi.ceph.com"}
kc get networkfences.csiaddons.openshift.io ip-10-0-160-193.ec2.internal 
NAME                           DRIVER                       CIDRS              FENCESTATE   AGE     RESULT
ip-10-0-160-193.ec2.internal   rook-ceph.rbd.csi.ceph.com   ["100.64.0.7:0"]   Fenced       3m12s   
~/go/src/github.com/rook/deploy/examples
srai@192 ~ (fix-node-loss-rbd) $ kc get pv | grep rbd-pvc
pvc-0c65191a-ceb8-4769-bf31-5c0c113c5e1d   1Gi        RWO            Delete           Bound    rook-ceph/rbd-pvc                                       rook-ceph-block            39m
~/go/src/github.com/rook/deploy/examples
srai@192 ~ (fix-node-loss-rbd) $ kc get pv pvc-0c65191a-ceb8-4769-bf31-5c0c113c5e1d -oyaml | grep imageName:
      imageName: csi-vol-f9ad9b3d-4b7f-40d1-9f47-0d172d5153ba
~/go/src/github.com/rook/deploy/examples
srai@192 ~ (fix-node-loss-rbd) $ kubectl rook-ceph rbd status replicapool/csi-vol-f9ad9b3d-4b7f-40d1-9f47-0d172d5153ba
Warning: rook version 'rook: v1.11.0-alpha.0.449.ge5bd73104-dirty' is running a pre-release version of Rook.

Watchers:
	watcher=100.64.0.7:0/4143960263 client.16345 cookie=18446462598732840961

I'm uploading the complete logs of the csi-addons controller: [csi-addons-logs.txt](https://github.com/csi-addons/kubernetes-csi-addons/files/11847034/csi-addons-logs.txt)

update mergify rules to consider DNM label

If the DNM label is set on a PR, Mergify should not merge the PR automatically even if it has 2 approvals. This can be used to wait for others to also review the PR without blocking it by requesting changes.

CSV version should match with the tag of the bundle image

Image: quay.io/csiaddons/k8s-bundle:v0.1.1
CSV name: csi-addons.v0.0.1

We have multiple bundles in odf-operator, and all of them use an image tag that matches the CSV version. We would like csi-addons to do the same: either change the CSV version or the image tag so that the two match.

idle GRPC connections in controller

As we already know, currently when a CSIAddonsNode object is created we create a connection and keep it open until the object is deleted. There are advantages and disadvantages to this. csi-addons is meant to be a generic component used by multiple CSI drivers; as an example, in a 10-node cluster with 2 CSI drivers using csi-addons, we keep 20 node connections plus 2 controller connections (when both the provisioner and node-plugin sidecars are deployed) open in memory. Thinking about scale, what about 100-node clusters, or clusters with even more CSI drivers?

Advantages

  • Reuse of connection for faster communication
  • (anything else?)

Disadvantages

  • More idle connections held in memory (if there are no csi-addons operations)
  • More resource utilization in the controller pod
  • More network calls, as connection keep-alives need to be sent at intervals to make sure the connection is not broken (see the sketch after this list)
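
A minimal Go sketch of the keep-alive cost from the last point, using illustrative parameter values rather than the controller's actual settings: every cached connection pings its sidecar at an interval, so even idle connections keep generating network traffic.

```go
// Illustrative keep-alive parameters on a cached connection to a sidecar;
// the values and the endpoint are examples only.
package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func dialSidecar(endpoint string) (*grpc.ClientConn, error) {
	return grpc.Dial(endpoint,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                30 * time.Second, // ping the sidecar after this much idle time
			Timeout:             10 * time.Second, // how long to wait for a ping ack
			PermitWithoutStream: true,             // ping even when no RPCs are in flight
		}),
	)
}

func main() {
	// With N nodes and M drivers, roughly N*M of these connections sit in
	// memory, each generating periodic keep-alive traffic while idle.
	conn, err := dialSidecar("10.0.0.1:9070")
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()
}
```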

I would like to hear thoughts from everyone on this one. cc @nixpanic @humblec @Rakshith-R @pkalever

Add version flag for volume replication operator

In addition to the current flags of /manager, we need a new -version flag that displays the version.

This will be helpful in our case, where we are building pipelines for image creation of csi-addons components: once the image is built, if we can display the version, it is easy to add that as a verification step in the pipeline.
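
A minimal sketch of what such a flag could look like; the version variable name and the -ldflags wiring are assumptions, not the project's existing build setup:

```go
// Hedged sketch of a -version flag; the variable is intended to be set at
// build time, e.g. go build -ldflags "-X main.version=v0.5.0" ./cmd/manager
package main

import (
	"flag"
	"fmt"
	"os"
)

// version is overridden at build time via -ldflags; "unknown" is the fallback.
var version = "unknown"

func main() {
	showVersion := flag.Bool("version", false, "print the version and exit")
	flag.Parse()

	if *showVersion {
		fmt.Println(version)
		os.Exit(0)
	}

	// ... normal manager startup continues here ...
}
```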

Add github actions ci to test builds and push images

This issue is to track the following items:

  • Decide name for controller and sidecar images (Please comment down suggestions)
  • Add build scripts for controller and sidecar images
  • Add ci to run go test and build
  • Add multi-architecture build test
  • Add github action to push images to registry.

Please add any missing item from the list in the comments below.

FIXME: Remove `go mod tidy && go mod vendor` once we find the reason why ci workflow fails without it.

# operator-sdk gets installed from the tools/vendor/ directory.
OPERATOR_SDK = $(shell pwd)/bin/operator-sdk
.PHONY: operator-sdk
operator-sdk:
# FIXME: Remove `go mod tidy && go mod vendor` once we find the reason why ci workflow fails without it.
	cd ./tools && go mod tidy && go mod vendor && go build -o $(OPERATOR_SDK) ./vendor/$(shell grep operator-sdk tools/tools.go | sed 's/.*_ "//;s/"//')

refer: #382

Add version flag for csi-addons-sidecar

In addition to the current flags of /usr/bin/csi-addons-sidecar, we need a new -version flag that displays the version.

This will be helpful in our case, where we are building pipelines for image creation of csi-addons components: once the image is built, if we can display the version, it is easy to add that as a verification step in the pipeline.

Include supported capabilities in CSIAddonsNode status output

The CSIAddonsNode object contains details about the node and CSI-driver that provides a set of CSI-Addons features. It would be useful for debugging and validation to have the capabilities (from CSI-Addons Identity service) listed in the Status field of the CSIAddonsNode CR.

This could be done on the initial connection to the csi-addons-sidecar, at

https://github.com/csi-addons/kubernetes-csi-addons/blob/493222166f4132a930137e03ba51960319cf39d0/controllers/csiaddons/csiaddonsnode_controller.go#L138C44-L144
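
A hedged sketch of the idea, using stand-in types instead of the real spec/API packages (the Identity client interface and the Status.Capabilities field shown here are assumptions): query the capabilities once the connection is up and copy them into the status.

```go
// Hedged sketch, not the project's actual code: the client interface and the
// Capabilities status field are stand-ins; the real types come from the
// csi-addons spec and the CSIAddonsNode API package.
package main

import (
	"context"
	"fmt"
)

// identityClient is a stand-in for the CSI-Addons Identity gRPC client.
type identityClient interface {
	GetCapabilities(ctx context.Context) ([]string, error)
}

// csiAddonsNodeStatus is a stand-in for the CSIAddonsNode status, with a
// proposed Capabilities field.
type csiAddonsNodeStatus struct {
	State        string
	Capabilities []string
}

// fakeClient simulates a sidecar that advertises a few capabilities.
type fakeClient struct{}

func (fakeClient) GetCapabilities(ctx context.Context) ([]string, error) {
	return []string{"ReclaimSpace", "NetworkFence", "VolumeReplication"}, nil
}

// populateCapabilities would be called right after the connection to the
// csi-addons sidecar is established.
func populateCapabilities(ctx context.Context, c identityClient, st *csiAddonsNodeStatus) error {
	caps, err := c.GetCapabilities(ctx)
	if err != nil {
		return fmt.Errorf("failed to query capabilities: %w", err)
	}
	st.Capabilities = caps
	return nil
}

func main() {
	st := &csiAddonsNodeStatus{State: "Connected"}
	if err := populateCapabilities(context.Background(), fakeClient{}, st); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", st)
}
```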

Add support for CSI-Addons-config cm for persistent settings

Currently, some settings can only be set through cmdline args:

flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.BoolVar(&enableLeaderElection, "leader-elect", false,
"Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
flag.DurationVar(&reclaimSpaceTimeout, "reclaim-space-timeout", defaultTimeout, "Timeout for reclaimspace operation")
flag.IntVar(&maxConcurrentReconciles, "max-concurrent-reconciles", 100, "Maximum number of concurrent reconciles")
flag.BoolVar(&enableAdmissionWebhooks, "enable-admission-webhooks", true, "Enable the admission webhooks")

However, when csi-addons is deployed as an operator using OLM, even if the user manages to change these cmdline args in the CSV, the settings are reverted on upgrade.

We need a CSI-Addons-config ConfigMap so users can configure settings that persist across upgrades.

The CSI-Addons operator should use options from the CSI-Addons-config configmap if it exists.
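
A hedged sketch of how the manager could prefer values from such a ConfigMap and fall back to the cmdline defaults; the ConfigMap name, namespace, and key name used here are assumptions.

```go
// Hedged sketch: read an override from a "csi-addons-config" ConfigMap and
// fall back to the built-in default when the ConfigMap or key is absent.
// The name, namespace, and key are assumptions for illustration.
package main

import (
	"context"
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func reclaimSpaceTimeout(ctx context.Context, defaultTimeout time.Duration) (time.Duration, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return 0, err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return 0, err
	}

	cm, err := client.CoreV1().ConfigMaps("csi-addons-system").Get(ctx, "csi-addons-config", metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return defaultTimeout, nil // no ConfigMap: keep the built-in default
	} else if err != nil {
		return 0, err
	}

	if v, ok := cm.Data["reclaim-space-timeout"]; ok {
		if d, perr := time.ParseDuration(v); perr == nil {
			return d, nil
		}
	}
	return defaultTimeout, nil
}

func main() {
	t, err := reclaimSpaceTimeout(context.Background(), 3*time.Minute)
	fmt.Println(t, err)
}
```

Watching the ConfigMap (or restarting the manager on change) would additionally let edits take effect without waiting for an upgrade, but that part is not shown here.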

cc @nixpanic @Madhu-1

Add helm chart

This would ease installation.

NB: I'm using ArgoCD to install apps. The make deploy way of installing is not compatible.

csi-addon is stuck on older `CSIAddonsNode` resource in reconcile

Deployed csi addons as instructed here, and also enabled csi-addon sidecar as mentioned here.

Images used are bleeding edge latest and canary for ceph-csi.

Initially, as the CSIAddonsNode CRD was not created on the API server, I updated the CRD and restarted the rbd CSI provisioner plugin by scaling it down and then back up. This led to the pod name changing, causing the following logs from the csi-addons deployment:

2022-09-13T15:44:44.946Z	ERROR	Failed to resolve endpoint	{"controller": "csiaddonsnode", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "CSIAddonsNode", "CSIAddonsNode": {"name":"csi-rbdplugin-provisioner-57b556bb77-cq97r","namespace":"rook-ceph"}, "namespace": "rook-ceph", "name": "csi-rbdplugin-provisioner-57b556bb77-cq97r", "reconcileID": "85a93305-dccc-439b-85d9-38999b78af6d", "error": "failed to get pod rook-ceph/csi-rbdplugin-provisioner-57b556bb77-cq97r: Pod \"csi-rbdplugin-provisioner-57b556bb77-cq97r\" not found"}
github.com/csi-addons/kubernetes-csi-addons/controllers/csiaddons.(*CSIAddonsNodeReconciler).Reconcile
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/controllers/csiaddons/csiaddonsnode_controller.go:98
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234
2022-09-13T15:44:44.949Z	ERROR	Reconciler error	{"controller": "csiaddonsnode", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "CSIAddonsNode", "CSIAddonsNode": {"name":"csi-rbdplugin-provisioner-57b556bb77-cq97r","namespace":"rook-ceph"}, "namespace": "rook-ceph", "name": "csi-rbdplugin-provisioner-57b556bb77-cq97r", "reconcileID": "85a93305-dccc-439b-85d9-38999b78af6d", "error": "Failed to resolve endpoint \"pod://csi-rbdplugin-provisioner-57b556bb77-cq97r.rook-ceph:9070\": failed to get pod rook-ceph/csi-rbdplugin-provisioner-57b556bb77-cq97r: Pod \"csi-rbdplugin-provisioner-57b556bb77-cq97r\" not found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/workspace/go/src/github.com/csi-addons/kubernetes-csi-addons/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234

The reason seems to be that 2 CSIAddonsNode resources were created: one for the older rbd provisioner plugin instance and one for the newer.

To overcome this, I removed the finalizer on the older CSIAddonsNode resource and deleted it.

After the workaround, things were working as expected.

Reporting the issue in case additional code or documentation changes are required to address it.

`make deploy` does not install NetworkFence CRD

It seems that config/crd/bases/csiaddons.openshift.io_networkfences.yaml is not applied automatically.

Full output when running make deploy to deploy the controller and CRDs:

$ make deploy
/home/ndevos/go/src/github.com/csi-addons/kubernetes-csi-addons/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
cd config/manager && /home/ndevos/go/src/github.com/csi-addons/kubernetes-csi-addons/bin/kustomize edit set image controller=quay.io/csiaddons/k8s-controller:latest
/home/ndevos/go/src/github.com/csi-addons/kubernetes-csi-addons/bin/kustomize build config/default | kubectl apply -f -
namespace/csi-addons-system created
customresourcedefinition.apiextensions.k8s.io/csiaddonsnodes.csiaddons.openshift.io created
customresourcedefinition.apiextensions.k8s.io/reclaimspacejobs.csiaddons.openshift.io created
serviceaccount/csi-addons-controller-manager created
role.rbac.authorization.k8s.io/csi-addons-leader-election-role created
clusterrole.rbac.authorization.k8s.io/csi-addons-manager-role created
clusterrole.rbac.authorization.k8s.io/csi-addons-metrics-reader created
clusterrole.rbac.authorization.k8s.io/csi-addons-proxy-role created
rolebinding.rbac.authorization.k8s.io/csi-addons-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/csi-addons-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/csi-addons-proxy-rolebinding created
configmap/csi-addons-manager-config created
service/csi-addons-controller-manager-metrics-service created
deployment.apps/csi-addons-controller-manager created

Originally posted by @nixpanic in #49 (comment)

reclaimSpace job not working

Hi,

We are trying to use the ReclaimSpace job with rook-ceph.
The Rook operator version is 1.12, Ceph is 17.2.5, and the Kubernetes version is 1.26.8.
We followed the steps given here:
https://rook.io/docs/rook/v1.12/Storage-Configuration/Ceph-CSI/ceph-csi-drivers/#csi-addons-controller
Now the rbdplugin and provisioner pods have the csi-addons sidecar, and the csi-addons pod in the csi-addons-system namespace is running fine.
But when we create a ReclaimSpaceJob for an RBD volume with ReadWriteOnce access mode, it fails and the ReclaimSpaceJob shows the message
"Failed to make node request node client not found for nodeID"

There are no errors in the provisioner pod or in the plugin pod where the PVC is attached.
In the csi-addons controller pod there are logs: "PANIC no leader found for driver rook-ceph.rbd.csi.ceph.com" Lease.coordination.k8s.io "rook-ceph-rbd-csi-ceph-com-csi-addons"

Please help to fix this.

Provide readable timestamps in the logs

Timestamps of log messages are in a format that is difficult to read:

1.6593560518327549e+09	INFO	Making controller reclaim space request	{"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob", "reclaimSpaceJob": {"name":"sample-1","namespace":"default"}, "namespace": "default", "name": "sample-1", "reconcileID": "dc87e88f-318a-40be-a87b-63151495d4d3", "PVCName": "rbd-pvc", "PVCNamespace": "default", "PVName": "pvc-82801af6-dccb-4323-a08c-743766b74028", "NodeID": "ip-10-0-149-143.ec2.internal", "nodeClient": "rook-ceph/csi-rbdplugin-4z2wj", "controllerClient": "rook-ceph/csi-rbdplugin-provisioner-5794db6555-npwr7"}
1.6593560520878572e+09	INFO	Successfully completed reclaim space operation	{"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob", "reclaimSpaceJob": {"name":"sample-1","namespace":"default"}, "namespace": "default", "name": "sample-1", "reconcileID": "dc87e88f-318a-40be-a87b-63151495d4d3", "PVCName": "rbd-pvc", "PVCNamespace": "default", "PVName": "pvc-82801af6-dccb-4323-a08c-743766b74028", "NodeID": "ip-10-0-149-143.ec2.internal", "nodeClient": "rook-ceph/csi-rbdplugin-4z2wj", "controllerClient": "rook-ceph/csi-rbdplugin-provisioner-5794db6555-npwr7"}

It is not easy to compare these times with other logs; the current situation is very user-unfriendly.
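
For reference, a minimal sketch of how readable timestamps could be produced with the controller-runtime zap logger; the exact wiring in the project's main.go may differ, and the epoch-float timestamps above are likely zap's default time encoding.

```go
// Hedged sketch: configure the controller-runtime zap logger to emit
// RFC3339 timestamps instead of epoch floats.
package main

import (
	"go.uber.org/zap/zapcore"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	opts := zap.Options{
		Development: false,
		// "2022-08-01T10:15:30Z" instead of 1.6593560518327549e+09
		TimeEncoder: zapcore.RFC3339TimeEncoder,
	}
	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

	ctrl.Log.Info("example log line with a readable timestamp")
}
```

The same behaviour should also be reachable through zap's --zap-time-encoding flag if the options are bound to the command line, but that wiring is not shown here.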

Move volume replication operator to kubernetes-csi-addons

This issue tracks the work required to move the standalone volume replication operator into the kubernetes-csi-addons repo.

Provide kustomization for deployment

Currently, to install the latest release you need to apply all of the resources from the release page individually. Adding or removing resources will break users.

Adding a trivial kustomization.yaml grouping the resources will allow installing via:

kubectl apply -k https://github.com/csi-addons/kubernetes-csi-addons.git/deploy/controller?ref=tag

All the Controller Operations should reach the one Controller (active) not multiple Controllers

As of today, kubernetes-csi-addons connects to a random registered controller and makes its RPC calls to that random controller. This can create a problem if the CSI driver has implemented some internal locking mechanism or keeps a local cache for the lifetime of that instance.

Example:

Ceph-CSI runs deployments for Replication/ReclaimSpace etc., and we will have two instances running. Ceph-CSI internally takes a lock and processes one request at a time based on its internal logic. With the current Kubernetes sidecars this is not a problem, because the sidecar runs with leader election and only one instance can process a request. With kubernetes-csi-addons it becomes a problem, as we don't have any such mechanism to always reach the same controller/deployment instance that is processing the requests.

The request is to provide this kind of functionality, which would help CSI drivers with this requirement and avoid forcing them into an active/active model, which can lead to many different problems.
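
One possible approach, shown as a hedged sketch rather than the project's actual design: per-driver leader election on a coordination Lease, so that controller-level RPCs are only sent to the instance currently holding the lease. The lease name, namespace, and environment variable are illustrative.

```go
// Hedged sketch: per-driver leader election using a coordination Lease; the
// lease name/namespace and POD_NAME identity are examples only.
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "example-driver-csi-addons", // one lease per CSI driver
			Namespace: "csi-addons-system",
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: os.Getenv("POD_NAME")},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the leader should receive controller-level RPCs.
				log.Println("this instance is now the active controller target")
				<-ctx.Done()
			},
			OnStoppedLeading: func() { log.Println("lost leadership") },
		},
	})
}
```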

reclaimspace: Add support for custom timeouts

Currently, ReclaimSpace requests to CSI drivers use a default timeout of 3 minutes, which can only be overridden through the cmdline args:

flag.DurationVar(&reclaimSpaceTimeout, "reclaim-space-timeout", defaultTimeout, "Timeout for reclaimspace operation")

The time taken by these commands executed by CSI drivers may vary significantly based on factors like the size of the PVC, the I/O pattern, etc.

Therefore, having the ability to override this timeout per PVC, at the ReclaimSpaceJob CR level, would be very useful.
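
A minimal sketch of the fallback logic, using a simplified stand-in for the ReclaimSpaceJob spec; the per-CR Timeout field is a proposed/assumed addition, not an existing API field.

```go
// Hedged sketch: resolve the effective timeout from a proposed per-CR field,
// falling back to the controller's cmdline default.
package main

import (
	"fmt"
	"time"
)

// reclaimSpaceJobSpec is a simplified stand-in for the real spec type.
type reclaimSpaceJobSpec struct {
	Timeout *time.Duration // nil means "use the controller default"
}

func effectiveTimeout(spec reclaimSpaceJobSpec, defaultTimeout time.Duration) time.Duration {
	if spec.Timeout != nil {
		return *spec.Timeout
	}
	return defaultTimeout
}

func main() {
	defaultTimeout := 3 * time.Minute // the current built-in default
	perJob := 15 * time.Minute        // e.g. for a large PVC

	fmt.Println(effectiveTimeout(reclaimSpaceJobSpec{}, defaultTimeout))                 // 3m0s
	fmt.Println(effectiveTimeout(reclaimSpaceJobSpec{Timeout: &perJob}, defaultTimeout)) // 15m0s
}
```

The resolved value would then bound the context used for the ReclaimSpace RPC to the CSI driver.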

@nixpanic @Madhu-1

In case the CSIAddonsNode CRD is not available, the sidecar should not abort but retry

When the sidecar fails to create the CSIAddonsNode, it exits, causing the deployment/daemonset of the CSI driver to fail.

It is better to report the missing CRD and the failure, and to retry with backoff until the CRD becomes available (which might be never). This allows deploying the sidecar by default, even when users do not deploy the controller and the CRDs.
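
A hedged sketch of the retry behaviour, with createCSIAddonsNode as a placeholder for the sidecar's actual creation logic; the retry interval is illustrative.

```go
// Hedged sketch: keep retrying CSIAddonsNode creation instead of aborting
// while the CRD is not installed.
package main

import (
	"context"
	"log"
	"time"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/util/wait"
)

var attempts int

// createCSIAddonsNode is a placeholder for the sidecar's actual CR creation;
// here it fails twice with a "no matching kind" error to simulate the CRD
// not being installed yet.
func createCSIAddonsNode(ctx context.Context) error {
	attempts++
	if attempts <= 2 {
		return &meta.NoKindMatchError{}
	}
	return nil
}

func main() {
	ctx := context.Background()

	// Retry at a fixed interval instead of aborting; the CRD may never be
	// installed, and that should not crash the CSI driver pod.
	err := wait.PollUntilContextCancel(ctx, 2*time.Second, true, func(ctx context.Context) (bool, error) {
		if err := createCSIAddonsNode(ctx); err != nil {
			if meta.IsNoMatchError(err) {
				log.Println("CSIAddonsNode CRD is not available yet, will retry")
				return false, nil
			}
			return false, err // unexpected error: stop retrying
		}
		return true, nil
	})
	if err != nil {
		log.Fatalf("could not create CSIAddonsNode: %v", err)
	}
	log.Println("CSIAddonsNode created")
}
```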

Auto reclaim storage based on the storageclass/namespace

The current way of annotating the PVC or namespace is not very useful, because the customer needs to annotate the resources themselves and it provides no option for an admin who wants to auto-reclaim space. Provide an option at the csi-addons controller that exposes some configuration to auto-reclaim space for the PVs created by a given driver.

Example: add support for a comma-separated list of driver names that are allowed to auto-reclaim in https://github.com/csi-addons/kubernetes-csi-addons/blob/main/deploy/controller/csi-addons-config.yaml, or some other mechanism.

The above is helpful for the PV key rotation as well.

Admins deploying the operator need not worry about security or storage if it is unused and unannotated.
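
A minimal sketch of how a comma-separated driver list from the ConfigMap could be evaluated; the key value and the helper function are assumptions for illustration, not an existing option.

```go
// Hedged sketch: decide whether a PV's driver qualifies for automatic space
// reclamation based on an assumed comma-separated config value.
package main

import (
	"fmt"
	"strings"
)

func autoReclaimEnabled(configValue, driverName string) bool {
	for _, d := range strings.Split(configValue, ",") {
		if strings.TrimSpace(d) == driverName {
			return true
		}
	}
	return false
}

func main() {
	cfg := "rook-ceph.rbd.csi.ceph.com, openshift-storage.rbd.csi.ceph.com"
	fmt.Println(autoReclaimEnabled(cfg, "rook-ceph.rbd.csi.ceph.com")) // true
	fmt.Println(autoReclaimEnabled(cfg, "example.csi.driver"))         // false
}
```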
