csm's People

Contributors

abhi16394, adarsh-dell, alikdell, atye, bharathsreekanth, boyamurthy, delldubey, gallacher, hoppea2, jackieung-dell, jooseppi-luna, nitesh3108, panigs7, prablr79, santhoshatdell, shanmydell, sharmilarama, shaynafinocchiaro, tdawe, vamsisiddu-7


csm's Issues

[BUG]: Isilon CSI driver reports: Error: Authorization required # DISCONNECTED SITE

Describe the bug
Following an Operator and CSI driver upgrade, we notice the following errors on the Isilon controller:
"Error in response Method DELETE URI:platform/2/protocols/nfs/exports Error: Authorization required JSON Error: Authorization required"
Old operator version 1.2
Old CSI driver version: 1.4
New Operator version 1.5
New CSI driver version 2.0.0

Following this error message, the CSI driver sends an authentication request with the username and password, authentication succeeds, and the delete operation proceeds without issue.
The next request triggers another authentication error, and the process repeats.

The error appears for every type of request, not only DELETE, so the entire log is filled with authentication failures and successes.
The time each operation takes to complete causes pods to time out due to PVC access issues.

Expected behavior
A bearer token should be saved once and sent with each new request, so the driver does not need to re-authenticate for every request.
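For illustration only, here is a minimal sketch (not the driver's actual code) of an HTTP client wrapper that caches a token and only re-authenticates when the array answers 401. The "Authorization: Bearer" header and the login callback are assumptions; OneFS session handling may use a session cookie instead.

package isiclient

import (
	"fmt"
	"net/http"
	"sync"
)

// sessionClient caches an auth token and re-authenticates only when the
// array rejects it, instead of logging in on every request.
type sessionClient struct {
	hc    *http.Client
	mu    sync.Mutex
	token string
	login func() (string, error) // performs user/password auth, returns a token (assumed)
}

func (c *sessionClient) do(req *http.Request) (*http.Response, error) {
	if c.currentToken() == "" {
		if err := c.refresh(); err != nil {
			return nil, err
		}
	}
	req.Header.Set("Authorization", "Bearer "+c.currentToken()) // header name is an assumption

	resp, err := c.hc.Do(req)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode == http.StatusUnauthorized {
		resp.Body.Close()
		if err := c.refresh(); err != nil {
			return nil, err
		}
		req.Header.Set("Authorization", "Bearer "+c.currentToken())
		// Note: a request with a body would need to be rebuilt before retrying.
		return c.hc.Do(req)
	}
	return resp, nil
}

func (c *sessionClient) currentToken() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.token
}

func (c *sessionClient) refresh() error {
	tok, err := c.login()
	if err != nil {
		return fmt.Errorf("re-authentication failed: %w", err)
	}
	c.mu.Lock()
	c.token = tok
	c.mu.Unlock()
	return nil
}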

Screenshots
Disconnected site

Logs
Disconnected site

System Information (please complete the following information):
OpenShift: 4.6.5
K8s: 1.19
Dell Operator 1.5
CSI driver 2.0.0
Isilon version 8.2.2.0
Isilon RUP 01-2021

[FEATURE]: CSM Replication: repctl UX improvements

Redesign repctl's look & feel to improve UX, based on feedback:

  • Use get instead of list.
  • The to-cluster argument naming in execute-action is not easily understandable. Investigate and come up with proper naming/architecture for executeAction.
  • Add a --wait argument to the execute action command so that repctl waits for the action to complete.

Add support for NodeGetVolumeStats RPC for kubelet metrics

Hi Dell folks,

We're using the Unity CSI driver and are missing metrics. Please add support for the NodeGetVolumeStats RPC according to the CSI spec.

This allows the kubelet to query the CSI plugin for a PVC’s status.
The kubelet then exposes that information in kubelet_volume_stats_* metrics.
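For reference, a minimal sketch of what such an implementation could look like, assuming the driver uses the standard CSI Go bindings; the statfs-based collection and the service/receiver stub below are illustrations, not the Unity driver's actual code. The driver would also need to advertise the GET_VOLUME_STATS node capability in NodeGetCapabilities.

package service

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"golang.org/x/sys/unix"
)

// service is a stub receiver for illustration only.
type service struct{}

// NodeGetVolumeStats reports capacity and inode usage for the volume mounted
// at req.VolumePath, which kubelet surfaces as kubelet_volume_stats_* metrics.
func (s *service) NodeGetVolumeStats(ctx context.Context, req *csi.NodeGetVolumeStatsRequest) (*csi.NodeGetVolumeStatsResponse, error) {
	var st unix.Statfs_t
	if err := unix.Statfs(req.GetVolumePath(), &st); err != nil {
		return nil, err
	}
	return &csi.NodeGetVolumeStatsResponse{
		Usage: []*csi.VolumeUsage{
			{
				Unit:      csi.VolumeUsage_BYTES,
				Total:     int64(st.Blocks) * st.Bsize,
				Available: int64(st.Bavail) * st.Bsize,
				Used:      int64(st.Blocks-st.Bfree) * st.Bsize,
			},
			{
				Unit:      csi.VolumeUsage_INODES,
				Total:     int64(st.Files),
				Available: int64(st.Ffree),
				Used:      int64(st.Files - st.Ffree),
			},
		},
	}, nil
}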

best regards

[FEATURE]: CSM reporting metrics for PV/PVC for PowerFlex

As a K8s admin, I want to view the PVC/PV metrics reported back to kubelet via the CSI driver.

In addition, I want to verify that the volume is still provisioned properly on the array (exists and is exported to the node) and that it is still accessible and correctly mounted at the indicated volume path.

[FEATURE]: Update Storage Class for Pods where a vTree storage migration has occurred.

Describe the solution you'd like
Should a pod's volume be migrated via the PowerFlex Manager UI or CLI, the Pod retains the original storage class name. It would be nice to see the storage class change once a successful vTree migration has occurred.

Describe alternatives you've considered

Additional context
Customers are interested in migrating data between various storage classes, or need to migrate a volume off a storage class that will be sunset due to older nodes. Upon migrating to a new system, the Pod should be updated to reflect the storage class change.

[FEATURE]: Review logging throughout the CSM-Replication code

The current logging in csm-replication often does not help in understanding a problem. We need to review messages and logging levels across the whole of csm-replication and the corresponding CSI drivers and fix (re-imagine) the logging approach.
Tips:

  • We want to see function inputs and outputs.
  • We want to use log.WithFields as much as possible (see the sketch after this list).
  • We should always use correct log levels.
  • No "UnknownErrors".
  • Error messages should be properly worded and should include some sort of simplified trace.
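As a quick illustration of the direction (a sketch only, assuming the code keeps using logrus), wrapping a function with structured fields could look like this; the function and its arguments are made up for the example:

package replication

import (
	log "github.com/sirupsen/logrus"
)

// reconcileVolume is a made-up example; the point is logging the inputs and
// the outcome with structured fields and sensible levels.
func reconcileVolume(volumeID, remoteCluster string) error {
	fields := log.Fields{"volumeID": volumeID, "remoteCluster": remoteCluster}
	log.WithFields(fields).Debug("reconcileVolume: called")

	err := doReconcile(volumeID, remoteCluster) // placeholder for the real work
	if err != nil {
		// Error level, with context and the wrapped error instead of an "UnknownError".
		log.WithFields(fields).WithError(err).Error("reconcileVolume: failed")
		return err
	}
	log.WithFields(fields).Info("reconcileVolume: succeeded")
	return nil
}

func doReconcile(volumeID, remoteCluster string) error { return nil }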

[FEATURE]: Security Feature for Mount request

  1. Requested feature
    Karavi Authorization should have an authorization feature covering mount requests.
    Kubernetes Pods should not be able to mount PVs that are claimed by other k8s clusters,
    so Karavi Authorization should deny existing-PV mount requests from Pods that did not issue the PVC for the target PV.

  2. Issue
    In a multi-tenancy CaaS environment, there is no tenant isolation feature for CSI block storage based on Dell EMC storage. Logically, Pods can mount a PV that was provisioned by a PVC issued from another k8s cluster. This is a security hole.
    This will prevent Dell EMC storage from being chosen for MEC in 5G platforms or a CSP's CaaS service platform.

[FEATURE]: Install/Upgrade Support via Helm for Authorization module sidecar

Description
Installation and upgrade are supported via Helm for the Authorization module sidecar. This will be incorporated in the following CSI Driver Helm charts:

  • Dell EMC CSI Driver for PowerFlex
  • Dell EMC CSI Driver for PowerMax
  • Dell EMC CSI Driver for PowerScale

In addition to this, the Authorization module proxy server will support upgrade via RPM.

[FEATURE]: Add version option in KARAVICTL

Describe the solution you'd like
Add a karavictl version command to check the version of the binary as well as the service, similar to what kubectl version does.
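For illustration, a hedged sketch of such a subcommand, assuming karavictl keeps using cobra and that a version string is injected at build time; querying the server-side service version is left as a TODO because its endpoint is not defined here.

package cmd

import (
	"fmt"

	"github.com/spf13/cobra"
)

// version would be injected at build time, e.g.
// go build -ldflags "-X <module>/cmd.version=1.x.y".
var version = "dev"

// NewVersionCmd returns a `karavictl version` subcommand (sketch only).
func NewVersionCmd() *cobra.Command {
	return &cobra.Command{
		Use:   "version",
		Short: "Print the karavictl client version",
		Run: func(cmd *cobra.Command, args []string) {
			fmt.Fprintf(cmd.OutOrStdout(), "karavictl client version: %s\n", version)
			// TODO: also query and print the Authorization service version,
			// similar to `kubectl version` showing client and server versions.
		},
	}
}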

Describe alternatives you've considered
Right now, I get an error from the command line but I cannot tell whether it is due to an incompatible version of the binary versus the karavi service.

[FEATURE]: CSM reporting metrics for PV/PVC for Isilon/PowerScale

As a K8s admin, I want to view the PVC/PV metrics reported back to kubelet via the CSI driver.

In addition, I want to verify that the volume is still provisioned properly on the array (exists and is exported to the node) and that it is still accessible and correctly mounted at the indicated volume path.

[FEATURE]: Pass Volume Name off to Storage System

Describe the solution you'd like
The myvalues (values).yaml controls the naming convention for volumes carved out by the CSI driver; K8S is the default, and it can be changed when the CSI driver is deployed. It would be helpful to allow the alternate name to be passed along to the storage array.

Describe alternatives you've considered
This would otherwise require the storage admin to manually change the name of the volume in the PowerFlex UI. If deploying volumes for a specific app, it would be good to pass this along automatically.

Additional context
N/A

[FEATURE]: Storage Volume Multi-Tenancy Support for Unity

The Unity storage array supports an IP multi-tenancy feature for assigning isolated, file-based storage partitions to the NAS servers on a storage processor.
By implementing this feature in the CSI Unity driver, customers will be able to associate a Tenant with storage volumes.

Additional reference: https://www.dell.com/community/Containers/CSI-dynamic-provisioning-in-a-multitenancy-model/td-p/7476098

Acceptance criteria: storage is provisioned and associated with hosts specific to the Tenant in the Kubernetes cluster.

[FEATURE]: Support SINGLE_NODE_SINGLE_WRITER and SINGLE_NODE_MULTI_WRITER modes for Unity

As a K8s user, I want CSI drivers to support the latest spec so that I can use the new features.

As part of this feature, the following CSI 1.5 spec capabilities need to be supported in the Unity CSI driver:

  • SINGLE_NODE_SINGLE_WRITER and SINGLE_NODE_MULTI_WRITER modes

Acceptance criteria: k8s pods will support ReadWriteOncePod and ReadWriteOnce as part of the CSI 1.5 spec.
For k8s 1.21 and below, the currently supported access modes will remain intact.
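For reference, a minimal sketch (not the Unity driver's actual code) of how the new CSI 1.5 access modes could be accepted alongside the existing ones during capability validation; the exact set a driver accepts is a design decision.

package service

import (
	"github.com/container-storage-interface/spec/lib/go/csi"
)

// accessModeSupported validates the CSI 1.5 access modes next to the pre-1.5
// ones; SINGLE_NODE_SINGLE_WRITER is what backs ReadWriteOncePod in k8s.
func accessModeSupported(mode csi.VolumeCapability_AccessMode_Mode) bool {
	switch mode {
	case csi.VolumeCapability_AccessMode_SINGLE_NODE_WRITER,
		csi.VolumeCapability_AccessMode_SINGLE_NODE_READER_ONLY,
		csi.VolumeCapability_AccessMode_MULTI_NODE_READER_ONLY,
		// New in CSI 1.5:
		csi.VolumeCapability_AccessMode_SINGLE_NODE_SINGLE_WRITER,
		csi.VolumeCapability_AccessMode_SINGLE_NODE_MULTI_WRITER:
		return true
	default:
		return false
	}
}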

[FEATURE]: CSM reporting metrics for PV/PVC for PowerStore

As a K8s admin, I want to view the PVC/PV metrics reported back to kubelet via the CSI driver.

In addition, I want to verify that the volume is still provisioned properly on the array (exists and is exported to the node) and that it is still accessible and correctly mounted at the indicated volume path.

[FEATURE]: Easier deployment of Authorization.

Describe the solution you'd like
Easier deployment of Authorization: introduce an alternative to deploy it as a container (pod), where prompts or a values.yaml can be generated at deployment time to connect to the PowerFlex system and enable access controls for storage consumption.

This would cut down on having another VM to deploy and would decrease the overall time to deploy.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

[FEATURE]: Display available capacity

As a K8s admin, I want to monitor the use of the available storage capacity for my cluster, so that I can plan capacity.

Available capacity should be expressed in storage units.
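As one possible shape for this (a sketch, not the existing CSM metrics code), the collector could publish per-pool available capacity as a gauge in bytes; the metric name and labels below are assumptions for illustration.

package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
)

// availableCapacity exposes per-pool available capacity in bytes; the metric
// and label names are illustrative, not the module's actual names.
var availableCapacity = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "storage_pool_available_capacity_bytes",
		Help: "Available capacity reported by the array, in bytes.",
	},
	[]string{"array", "pool"},
)

func init() {
	prometheus.MustRegister(availableCapacity)
}

// RecordPoolCapacity would be called after each poll of the array.
func RecordPoolCapacity(array, pool string, bytes float64) {
	availableCapacity.WithLabelValues(array, pool).Set(bytes)
}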

[FEATURE]: Trigger alarms when volume access latency exceeds norms

Describe the solution you'd like
I'm submitting this on behalf of Itzik, as it came up in a meeting this morning.
He described the need for some kind of running average of the latency of I/O operations on the array, perhaps broken down by node, by storage pool, for a particular volume, or overall for the array. The metrics would keep a history of past values (perhaps an exponential moving average that weights recent usage more highly than the distant past), and if the latency for the item exceeded that norm (the moving average) by some percentage, an alarm event of some kind would be triggered (perhaps a Grafana alert).
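A minimal sketch of the kind of check being described, using a plain exponential moving average and a percentage threshold; the type and parameter names are illustrative, and the alerting hook (e.g. exporting a metric for Grafana to alert on) is left out.

package latency

// ewmaAlarm keeps an exponential moving average of observed latencies and
// reports when a new sample exceeds the average by more than threshold
// (e.g. 0.5 means 50% above the norm).
type ewmaAlarm struct {
	alpha     float64 // weight of the newest sample, e.g. 0.1
	threshold float64
	avg       float64
	primed    bool
}

// Observe feeds one latency sample (in milliseconds, say) and returns true
// if it should trigger an alarm event.
func (e *ewmaAlarm) Observe(latency float64) bool {
	if !e.primed {
		e.avg = latency
		e.primed = true
		return false
	}
	alarm := latency > e.avg*(1+e.threshold)
	e.avg = e.alpha*latency + (1-e.alpha)*e.avg
	return alarm
}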

Describe alternatives you've considered
Similar facilities are generally already available in CloudIQ and the various array user interfaces; however, they do not report to the Kubernetes admins. Additionally, this might allow some kind of Kubernetes automation to be built around the alarm.

Additional context
This is an enhancement, not to be considered a bug. We can discuss priority and possible implementations.

[QUESTION]: Delay in removing LUNs attached to pods


How can the Team help you today?

There is a delay in removing LUNs attached to pods.


Details: ?
Where this issue occurs:
• Cluster OCP NOPROD of Cuyo
• With Driver CSI Unity 1.6

• PVC Name csivol-05e3f78392
• wwn: 0x60060160acd04e00ebc46d61578d24f2
• OS: Red Hat Enterprise Linux CoreOS release 4.7
• UNITY SV:5.0.1.0.5.011
• CSI Driver Version: 1.6.0

Problem that occurs:
• The problem occurs when a Pod with a Unity iSCSI PV terminates

• and a new Pod is immediately started using the same PV. The new Pod takes about 5 minutes to come up, indicating problems attaching the PV (because Unity still reports the PV as belonging to another Pod).

Node driver log while the Pod is deleted:

time="2021-10-19T14:00:47Z" level=info runid=361 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:838"
time="2021-10-19T14:00:58Z" level=info msg="unmount command" cmd=umount path="/var/lib/kubelet/pods/4cddd421-88f0-415f-a8a4-f7aa8e86a066/volumes/kubernetes.io~csi/csivol-05e3f78392/mount"
time="2021-10-19T14:00:58Z" level=info runid=365 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:838"
time="2021-10-19T14:00:59Z" level=info msg="unmount command" cmd=umount path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/csivol-05e3f78392/globalmount
time="2021-10-19T14:00:59Z" level=info msg="Check for disk path /dev/disk/by-id/dm-uuid-mpath-360060160acd04e00ebc46d61578d24f2 found: /dev/dm-1"
time="2021-10-19T14:00:59Z" level=info arrayid=ckm00195201796 runid=368 msg="resolve wwn for DM: dm-1" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:00:59Z" level=info arrayid=ckm00195201796 runid=368 msg="multipath - get block device included in DM" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:00:59Z" level=info arrayid=ckm00195201796 runid=368 msg="get devices by wwn 360060160acd04e00ebc46d61578d24f2" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:00:59Z" level=info arrayid=ckm00195201796 runid=368 msg="devices for WWN 360060160acd04e00ebc46d61578d24f2: [sdg sdk sdl sdm]" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:00:59Z" level=info arrayid=ckm00195201796 runid=368 msg="multipath - trying to find multipath DM name" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:00:59Z" level=info arrayid=ckm00195201796 runid=368 msg="multipath - start flush dm: /dev/dm-1" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:00:59Z" level=info arrayid=ckm00195201796 runid=368 msg="multipath command: chroot args: /noderoot multipath -f /dev/dm-1" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:01:27Z" level=info runid=369 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:838"

Node driver log during the delayed LUN removal:

time="2021-10-19T14:01:59Z" level=info runid=370 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:838"
time="2021-10-19T14:02:58Z" level=error arrayid=ckm00195201796 runid=368 msg="failed to flush multipath device: signal: killed" func="github.com/dell/csi-unity/service.(*customLogger).Error()" file="/go/src/csi-unity/service/service.go:714"
time="2021-10-19T14:02:59Z" level=info runid=371 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:838"
time="2021-10-19T14:02:59Z" level=info runid=372 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:838"
time="2021-10-19T14:02:59Z" level=info msg="Check for disk path /dev/disk/by-id/wwn-0x60060160acd04e00ebc46d61578d24f2 found: /dev/sdm"
time="2021-10-19T14:02:59Z" level=info arrayid=ckm00195201796 runid=368 msg="get devices by wwn 360060160acd04e00ebc46d61578d24f2" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:02:59Z" level=info arrayid=ckm00195201796 runid=368 msg="devices for WWN 360060160acd04e00ebc46d61578d24f2: [sdg sdk sdl sdm]" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:02:59Z" level=info arrayid=ckm00195201796 runid=368 msg="multipath - trying to find multipath DM name" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:02:59Z" level=info arrayid=ckm00195201796 runid=368 msg="multipath device not found: dm not found" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:02:59Z" level=info arrayid=ckm00195201796 runid=368 msg="device state is: running" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:02:59Z" level=info arrayid=ckm00195201796 runid=368 msg="writing '1' to /sys/class/block/sdg/device/delete" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:02:59Z" level=info arrayid=ckm00195201796 runid=368 msg="device state is: running" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:02:59Z" level=info arrayid=ckm00195201796 runid=368 msg="writing '1' to /sys/class/block/sdk/device/delete" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:02:59Z" level=info arrayid=ckm00195201796 runid=368 msg="device state is: running" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:02:59Z" level=info arrayid=ckm00195201796 runid=368 msg="writing '1' to /sys/class/block/sdl/device/delete" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:03:00Z" level=info arrayid=ckm00195201796 runid=368 msg="device state is: running" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:03:00Z" level=info arrayid=ckm00195201796 runid=368 msg="writing '1' to /sys/class/block/sdm/device/delete" func="github.com/dell/csi-unity/service.(*customLogger).Info()" file="/go/src/csi-unity/service/service.go:706"
time="2021-10-19T14:03:01Z" level=info msg="Check for disk path /dev/disk/by-id/wwn-0x60060160acd04e00ebc46d61578d24f2 not found"
time="2021-10-19T14:03:02Z" level=info msg="Check for disk path /dev/disk/by-id/wwn-0x60060160acd04e00ebc46d61578d24f2 not found"
time="2021-10-19T14:03:02Z" level=info msg="Check for disk path /dev/disk/by-id/wwn-0x60060160acd04e00ebc46d61578d24f2 not found"

Pod events when we recreate the pod with the same PVC and it takes a long time:

Normal Scheduled Successfully assigned movistar-3scale/simple-volume-pod-example-04 to ocpnp-7tcvg-worker-0-mtx7v
Warning FailedMount 3m11s kubelet, ocpnp-7tcvg-worker-0-mtx7v MountVolume.SetUp failed for volume "csivol-05e3f78392" : rpc error: code = Internal desc = runid=182 error publish volume to target path: mount failed: exit status 32
mounting arguments: -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/csivol-05e3f78392/globalmount /var/lib/kubelet/pods/5e8918ab-c55b-431b-ba9f-b1e7a232e6fa/volumes/kubernetes.iocsi/csivol-05e3f78392/mount
output: mount: /var/lib/kubelet/pods/5e8918ab-c55b-431b-ba9f-b1e7a232e6fa/volumes/kubernetes.io
csi/csivol-05e3f78392/mount: special device /var/lib/kubelet/plugins/kubernetes.io/csi/pv/csivol-05e3f78392/globalmount does not exist.
Warning FailedMount 3m10s kubelet, ocpnp-7tcvg-worker-0-mtx7v MountVolume.SetUp failed for volume "csivol-05e3f78392" : rpc error: code = NotFound desc = runid=184 Disk path not found. Error: readlink /dev/disk/by-id/wwn-0x60060160acd04e00ebc46d61578d24f2: no such file or directory
Warning FailedMount 3m9s kubelet, ocpnp-7tcvg-worker-0-mtx7v MountVolume.SetUp failed for volume "csivol-05e3f78392" : rpc error: code = NotFound desc = runid=188 Disk path not found. Error: readlink /dev/disk/by-id/wwn-0x60060160acd04e00ebc46d61578d24f2: no such file or directory
Warning FailedMount 3m7s kubelet, ocpnp-7tcvg-worker-0-mtx7v MountVolume.SetUp failed for volume "csivol-05e3f78392" : rpc error: code = NotFound desc = runid=192 Disk path not found. Error: readlink /dev/disk/by-id/wwn-0x60060160acd04e00ebc46d61578d24f2: no such file or directory
Warning FailedMount 3m2s kubelet, ocpnp-7tcvg-worker-0-mtx7v MountVolume.SetUp failed for volume "csivol-05e3f78392" : rpc error: code = NotFound desc = runid=197 Disk path not found. Error: readlink /dev/disk/by-id/wwn-0x60060160acd04e00ebc46d61578d24f2: no such file or directory
Warning FailedMount 2m54s kubelet, ocpnp-7tcvg-worker-0-mtx7v MountVolume.SetUp failed for volume "csivol-05e3f78392" : rpc error: code = NotFound desc = runid=201 Disk path not found. Error: readlink /dev/disk/by-id/wwn-0x60060160acd04e00ebc46d61578d24f2: no such file or directory
Warning FailedMount 2m37s kubelet, ocpnp-7tcvg-worker-0-mtx7v MountVolume.SetUp failed for volume "csivol-05e3f78392" : rpc error: code = NotFound desc = runid=206 Disk path not found. Error: readlink /dev/disk/by-id/wwn-0x60060160acd04e00ebc46d61578d24f2: no such file or directory
Warning FailedMount 2m5s kubelet, ocpnp-7tcvg-worker-0-mtx7v MountVolume.SetUp failed for volume "csivol-05e3f78392" : rpc error: code = NotFound desc = runid=214 Disk path not found. Error: readlink /dev/disk/by-id/wwn-0x60060160acd04e00ebc46d61578d24f2: no such file or directory
Warning FailedMount (x2 over 2m8s) kubelet, ocpnp-7tcvg-worker-0-mtx7v Unable to attach or mount volumes: unmounted volumes=[example-04], unattached volumes=[example-04 default-token-f2f56]: timed out waiting for the condition
Warning FailedMount (x2 over 60s) kubelet, ocpnp-7tcvg-worker-0-mtx7v (combined from similar events): MountVolume.SetUp failed for volume "csivol-05e3f78392" : rpc error: code = NotFound desc = runid=231 Disk path not found. Error: readlink /dev/disk/by-id/wwn-0x60060160acd04e00ebc46d61578d24f2: no such file or directory

[FEATURE]: Delete PowerFlex SDC ID when Worker is Removed from Cluster

Describe the solution you'd like
For PowerFlex systems where SDCs are deployed on k8s workers, if a worker is permanently removed from the cluster this creates an orphaned SDC ID (GUID):

SDC ID: d5d2119f00000025 Name: N/A IP: N/A State: Disconnected GUID:

While a new worker can be deployed and can take on the IP of the worker that was deleted, the removed worker's GUID remains as an artifact.

When removing a worker or destroying a k8s cluster, there should be the ability to pass the affected SDC IDs to the PowerFlex gateway so they can be removed, decreasing the alerts flagged in PFxM / Presentation Server.

Currently, storage administrators of the PowerFlex system must manually go in and remove these orphaned SDC IDs.
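For illustration only, a sketch of the kind of cleanup call a node-removal hook could make against the PowerFlex gateway. The removeSdc action path and the token-as-basic-auth scheme are assumptions taken from public PowerFlex/ScaleIO REST conventions and should be checked against the actual API reference.

package cleanup

import (
	"fmt"
	"net/http"
)

// removeSdc asks the PowerFlex gateway to drop an orphaned SDC by ID.
// The endpoint shape and auth below are assumptions for this sketch;
// consult the PowerFlex REST API guide for the exact contract.
func removeSdc(gatewayURL, token, sdcID string) error {
	url := fmt.Sprintf("%s/api/instances/Sdc::%s/action/removeSdc", gatewayURL, sdcID)
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return err
	}
	req.SetBasicAuth("", token) // assumption: gateway accepts the session token as the password
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("removeSdc for %s returned %s", sdcID, resp.Status)
	}
	return nil
}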

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

[BUG]: PowerStore CSI Driver report iSCSI target error while using FC only

Describe the bug

The PowerStore CSI driver is configured for FC connectivity; however, it reports "AttachVolume.Attach failed for volume "csi-pstore-167d09d5dd" : rpc error: code = Internal desc = could not get iscsiTargets: can't get iscsi target address" when using CSI to provision a PVC from PowerStore.

The issue is only resolved by configuring the CSI driver with a fake iSCSI target; FC connectivity then works.

Looking at the code of addTargetsInfoToPublishContext(), it seems the code always checks iSCSI targets first and then FC; if no iSCSI target is configured, provisioning exits with an error and never proceeds to the FC target check. This is not ideal: the code should skip the iSCSI target check and proceed with the FC target check if no iSCSI targets are configured (see the sketch after the pasted function).

func (s *SCSIPublisher) addTargetsInfoToPublishContext(
	publishContext map[string]string, client gopowerstore.Client) error {
	iscsiTargetsInfo, err := common.GetISCSITargetsInfoFromStorage(client)
	if err != nil {
		return err
	}
	for i, t := range iscsiTargetsInfo {
		publishContext[fmt.Sprintf("%s%d", common.PublishContextISCSIPortalsPrefix, i)] = t.Portal
		publishContext[fmt.Sprintf("%s%d", common.PublishContextISCSITargetsPrefix, i)] = t.Target
	}
	fcTargetsInfo, err := common.GetFCTargetsInfoFromStorage(client)
	if err != nil {
		return err
	}
	for i, t := range fcTargetsInfo {
		publishContext[fmt.Sprintf("%s%d", common.PublishContextFCWWPNPrefix, i)] = t.WWPN
	}
	return nil
}
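A minimal sketch of the change being requested (not the actual driver code): treat a failure to fetch iSCSI targets as non-fatal, log it, and continue to the FC target lookup so FC-only configurations can publish volumes. Whether the "no iSCSI targets" case surfaces as an error or an empty list is an assumption the real fix would need to confirm, and the log.Warnf call is a placeholder for whatever logger the driver uses.

func (s *SCSIPublisher) addTargetsInfoToPublishContext(
	publishContext map[string]string, client gopowerstore.Client) error {
	// Sketch: do not fail the whole publish if iSCSI is simply not configured.
	iscsiTargetsInfo, err := common.GetISCSITargetsInfoFromStorage(client)
	if err != nil {
		log.Warnf("could not get iSCSI targets, continuing with FC targets only: %v", err)
	} else {
		for i, t := range iscsiTargetsInfo {
			publishContext[fmt.Sprintf("%s%d", common.PublishContextISCSIPortalsPrefix, i)] = t.Portal
			publishContext[fmt.Sprintf("%s%d", common.PublishContextISCSITargetsPrefix, i)] = t.Target
		}
	}
	fcTargetsInfo, err := common.GetFCTargetsInfoFromStorage(client)
	if err != nil {
		return err
	}
	for i, t := range fcTargetsInfo {
		publishContext[fmt.Sprintf("%s%d", common.PublishContextFCWWPNPrefix, i)] = t.WWPN
	}
	return nil
}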

To Reproduce
Steps to reproduce the behavior:

  1. Configure the CSI driver for FC connectivity without configuring iSCSI targets.
  2. Try to provision an FC-based PVC from PowerStore to a pod.
  3. See the error:
    "AttachVolume.Attach failed for volume "csi-pstore-167d09d5dd" : rpc error: code = Internal desc = could not get iscsiTargets: can't get iscsi target address"

Expected behavior
The user should be able to provision an FC-based PVC from PowerStore without configuring iSCSI targets.
The code in addTargetsInfoToPublishContext() should skip the iSCSI target check and proceed with the FC target check if no iSCSI targets are configured.

Screenshots
If applicable, add screenshots to help explain your problem.

Logs
If applicable, submit logs or stack traces from the affected services

System Information (please complete the following information):

  • OS/Version: [e.g. RHEL 7.6]
  • Kubernetes Version [e.g. 1.18]
  • Additional Information...

Additional context
Add any other context about the problem here.

[FEATURE]: Improve karavi-metrics logs to display the number of pulled metrics

Describe the solution you'd like
Current logs give the time taken to collect and push data:

2021/01/12 10:47:04 Looking up system ID 31ba0b0125517f0f Name powerflex01
2021/01/12 10:47:04 gatherMetrics took 2.25µs
2021/01/12 10:47:04 pushMetrics took 812ns
2021/01/12 10:47:04 GetSDCStatistics took 22.456348ms
2021/01/12 10:47:04 gatherPoolStatistics took 3.107µs
2021/01/12 10:47:04 pushPoolStatistics took 1.026µs
2021/01/12 10:47:04 gatherPoolStatistics took 1.742µs
2021/01/12 10:47:04 pushPoolStatistics took 729ns
2021/01/12 10:47:04 GetStoragePoolStatistics took 49.552945ms

While debugging an issue with the otel-collector, I was unsure which metrics were actually pushed.

Is it possible to add the number of pulled/pushed metrics to this log? For example:

2021/01/12 10:47:04 gatherMetrics XXX took 2.25µs

where XXX is the number of metrics.
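For illustration, a hedged sketch of a timing helper that also logs how many metrics a gather step produced; the Metric type and helper name are made up, not the existing karavi-metrics code.

package main

import (
	"log"
	"time"
)

// Metric is a stand-in for whatever record type the collector produces.
type Metric struct {
	Name  string
	Value float64
}

// timedGather times a gather step and logs how many metrics it produced,
// which is the extra piece of information asked for here.
func timedGather(name string, gather func() []Metric) []Metric {
	start := time.Now()
	metrics := gather()
	log.Printf("%s gathered %d metrics, took %s", name, len(metrics), time.Since(start))
	return metrics
}

func main() {
	timedGather("gatherMetrics", func() []Metric {
		return []Metric{{Name: "example", Value: 1}}
	})
}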

Is CSI for PowerStore supported with the versions below?

Dear team,

We have a customer who wants to install the CSI driver for PowerStore. Would you please help check whether the software versions below are supported? Thanks.

PowerStore: 2.0.1.0
OS Version: RedHat 7.6
Kernel Version : 54.145-1.el7.elrepo.x86_64
Kubernetes: 1.20.9
Rancher: 2.5.9
Docker version 20.10.6, build 370c289
calico: 3.17.2
etcd: 3.4.15
nginx-ingress: 0.43.0

[BUG]: Issue creating volume from Isilon snapshot

Describe the bug
An issue is seen when creating a volume from an Isilon snapshot; the snapshot itself got created correctly.
I will forward the logs and other information by email.
To Reproduce
Steps to reproduce the behavior:

  1. Create a snapshot (succeeds).
  2. Create a volume from the snapshot; this fails.
  3. See the error: time="2021-10-26T20:56:24Z" level=error msg="copy snapshot failed, 'Unable to open object

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Logs
Submitted via email.

System Information (please complete the following information):

  • OS/Version: OpenShift
  • Kubernetes Version [e.g. 1.18]
  • Additional Information...

Additional context
Add any other context about the problem here.

[BUG]: When the VolumeGroup Snapshot is ReadyToUse, the VolumeSnapshot doesn't have a Status

When the VolumeGroup Snapshot is Ready, the VolumeSnapshot doesn't have a Status.

Steps to reproduce the behavior:

  • Using dellemc/csi-volumegroup-snapshotter:v0.2.0, I created a VolumeGroup Snapshot of 3 PVCs with the same label "volume-group=volumeGroup1" using controller code.
  • After the DellCsiVolumeGroupSnapshot.Status ReadyToUse became true, the code tried to fetch the VolumeSnapshot and found that it is missing the Status field.

    VG Snapshot's Status:
    {
      "snapshotGroupID": "217b52c106138b0f-c56b7d7100000003",
      "snapshotGroupName": "volumegroup1-102021-184622",
      "snapshots": "volumegroup1-102021-184622-0-pvol0,volumegroup1-102021-184622-1-pvol1,volumegroup1-102021-184622-2-pvol2",
      "creationTime": "2021-10-20T18:45:25Z",
      "readyToUse": true,
      "status": "Complete"
    }
    ...
    Missing Status of volumesnapshot 'volumegroup1-102021-184622-2-pvol2' in ns 'helmtest-vxflexos'.

[BUG]: Topology based scheduling does not work when recreating Statefulsets

Describe the bug
Topology based scheduling is not working when recreating Statefulsets.
To Reproduce
Steps to reproduce the behavior:

  1. Create StorageClass with
allowedTopologies:
  - matchLabelExpressions:
      - key: "csi-vxflexos.dellemc.com/340906774c30210f"
        values:
          - "csi-vxflexos.dellemc.com"

and

volumeBindingMode: WaitForFirstConsumer

NOTE - PowerFlex is deployed/installed only on a subset of nodes in the cluster.
2. Create a StatefulSet without nodeAffinity; pods are scheduled on the right nodes.
3. Delete the StatefulSet created in step 2.
4. Recreate the StatefulSet created in step 2.
5. The pods are scheduled on nodes where PowerFlex is not installed.

Expected behavior
Pods are scheduled on nodes where Powerflex is installed, even after re-creating the statefulset

Logs
Controller logs - https://gist.github.com/skavyas/6e155d1cff3dc0c75ff039d0142b80a4

System Information (please complete the following information):

  • OS/Version: [e.g. RHEL 7.9]
  • Kubernetes Version [e.g. 1.19]

[QUESTION]: Facing an issue during deployment of PV/PVC with CSI 2.0

How can the Team help you today?

Facing an issue during deployment of a PV/PVC with CSI 2.0.
Environment details:

  1. EKS Anywhere deployment with VM OS ubuntu-v1.21.2-kubernetes-1-21-eks-4-amd64-0f34334
  2. CSI 2.0 deployed.

I am getting the error below when I deploy the app with a PV/PVC using CSI 2.0.

MountVolume.SetUp failed for volume "k8s-e331a62c87" : rpc error: code = Internal desc = error performing private mount: mount failed: exit status 32
mounting arguments: -t xfs -o nouuid,fsFormatOption:xfs,defaults /dev/scinia /var/lib/kubelet/plugins/vxflexos.emc.dell.com/disks/12dbccc1024d0a0f-72a87c1b00000006
output: mount: /var/lib/kubelet/plugins/vxflexos.emc.dell.com/disks/12dbccc1024d0a0f-72a87c1b00000006: wrong fs type, bad option, bad superblock on /dev/scinia, missing codepage or helper program, or other error.

I tried testing with the CSI test samples (starttest.sh 2vols); even that throws the same message.
On the backend PowerFlex storage, I do see volumes being created; the error occurs during the mount operation.
The default storage controller is set to vxflexos.

Any suggestion or workaround would help.

[BUG]: CSM Authorization unit test and gosec failures

Describe the bug
CSM Authorization has sporadic unit test failures when validating user-defined port ranges.
The gosec action reports false-positive security alerts in GitHub Actions.

To Reproduce
Steps to reproduce the behavior:

  1. Run go test and gosec ./...
  2. See the errors.

Expected behavior
Both actions should pass

Screenshots
If applicable, add screenshots to help explain your problem.

Logs
If applicable, submit logs or stack traces from the affected services

System Information (please complete the following information):

  • OS/Version: [e.g. RHEL 7.6]
  • Kubernetes Version [e.g. 1.18]
  • Additional Information...

Additional context
Add any other context about the problem here.

[BUG]: CSM Authorization: rpm install error with locating Makefile

Describe the bug
The CSM Authorization 1.0.0 rpm runs into an issue locating the Makefile during install.

To Reproduce
Download and install the .rpm package from the 1.0.0 release of CSM Authorization:
rpm -ivh karavi-authorization-1.0-0.x86_64.rpm
...
error: failed to read Makefile: open /viewsvn/lglbg082/jenkins/workspace/Ecosystems_Novus/Karavi_Authorization/karavi-authorization-release/karavi-authorization/Makefile: no such file or directory

Expected behavior
rpm install should not have any failures

System Information (please complete the following information):

  • OS/Version: CentOS 7
  • Kubernetes Version 1.18

[FEATURE]: Simplify access to karavictl binary for both Kubernetes and storage admins

Description

The karavictl binary is currently packaged within the CSM Authorization rpm that deploys the Authorization server. This does not currently support different host operating systems, and there is additional complexity in getting the binary from the Authorization server onto the Kubernetes access host. To avoid having to build/scp the binary and to make it available for various host operating systems, the karavictl binary will be made available from the GitHub releases page for the following host operating systems:

  • darwin-amd64
  • linux-amd64
  • linux-arm64
  • windows-amd64.exe

Note: Feature created as a result of https://github.com/dell/karavi-authorization/issues/98.

[FEATURE]: CSM reporting metrics for PV/PVC for Unity

As a K8s admin, I want to view the PVC/PV metrics reported back to kubelet via the CSI driver.

In addition, I want to verify that the volume is still provisioned properly on the array (exists and is exported to the node) and that it is still accessible and correctly mounted at the indicated volume path.

[BUG]: Extra VolumeSnapshots were created with a VolumeGroup Snapshot

Extra VolumeSnapshots were created with a VolumeGroup Snapshot.
We don't see this problem on all VolumeGroup Snapshots.

Steps to reproduce the behavior:

  1. Use dellemc/csi-volumegroup-snapshotter:v0.2.0 to create a VolumeGroup Snapshot of 3 PVCs with the same label "volume-group=volumeGroup1".

  2. Run the VolumeGroup Snapshot:

    apiVersion: volumegroup.storage.dell.com/v1alpha2
    kind: DellCsiVolumeGroupSnapshot
    metadata:
      name: "vg1-snap1"
      namespace: "helmtest-vxflexos"
    spec:
      driverName: "csi-vxflexos.dellemc.com"
      memberReclaimPolicy: "Delete"
      volumesnapshotclass: "vxflexos-snapclass"
      pvcLabel: "volumeGroup1"

  3. Get the result and find there are 6 VolumeSnapshots instead of 3:

    $ k get dellcsivolumegroupsnapshots.volumegroup.storage.dell.com -n helmtest-vxflexos vg1-snap1 -o yaml
    apiVersion: volumegroup.storage.dell.com/v1alpha2
    kind: DellCsiVolumeGroupSnapshot
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"volumegroup.storage.dell.com/v1alpha2","kind":"DellCsiVolumeGroupSnapshot","metadata":{"annotations":{},"name":"vg1-snap1","namespace":"helmtest-vxflexos"},"spec":{"driverName":"csi-vxflexos.dellemc.com","memberReclaimPolicy":"Delete","pvcLabel":"volumeGroup1","volumesnapshotclass":"vxflexos-snapclass"}}
      creationTimestamp: "2021-10-07T21:28:43Z"
      finalizers:
      - vgFinalizer
      generation: 1
      name: vg1-snap1
      namespace: helmtest-vxflexos
      resourceVersion: "17321187"
      uid: b540018a-d032-4ed2-b808-27e43007d521
    spec:
      driverName: csi-vxflexos.dellemc.com
      memberReclaimPolicy: Delete
      pvcLabel: volumeGroup1
      volumesnapshotclass: vxflexos-snapclass
    status:
      creationTime: "2021-10-07T21:27:58Z"
      readyToUse: true
      snapshotGroupID: 217b52c106138b0f-c56b7c7b00000003
      snapshotGroupName: vg1-snap1-100721-212844
      snapshots: vg1-snap1-100721-212844-0-pvol1,vg1-snap1-100721-212844-1-pvol0,vg1-snap1-100721-212844-2-pvol2
      status: Complete

    $ k get volumesnapshot -n helmtest-vxflexos
    NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT RESTORESIZE SNAPSHOTCLASS SNAPSHOTCONTENT CREATIONTIME AGE
    vg1-snap1-100721-212843-0-pvol1 true snapcontent-0fa08d6e-62f8-44ef-860d-54d565da49fc 8Gi vxflexos-snapclass snapcontent-0fa08d6e-62f8-44ef-860d-54d565da49fc 56s 9s
    vg1-snap1-100721-212843-1-pvol0 true snapcontent-930aba6b-758b-4e66-9df1-c00a208d3b1e 8Gi vxflexos-snapclass snapcontent-930aba6b-758b-4e66-9df1-c00a208d3b1e 56s 9s
    vg1-snap1-100721-212843-2-pvol2 true snapcontent-57f03380-9715-40f5-a5de-25ec80f74a8e 8Gi vxflexos-snapclass snapcontent-57f03380-9715-40f5-a5de-25ec80f74a8e 56s 9s
    vg1-snap1-100721-212844-0-pvol1 true snapcontent-e5e9a2e0-caa7-4020-8a06-432ba4d27f60 8Gi vxflexos-snapclass snapcontent-e5e9a2e0-caa7-4020-8a06-432ba4d27f60 55s 9s
    vg1-snap1-100721-212844-1-pvol0 true snapcontent-61dc29f7-dddf-4c0a-9b0b-fb957188924d 8Gi vxflexos-snapclass snapcontent-61dc29f7-dddf-4c0a-9b0b-fb957188924d 55s 9s
    vg1-snap1-100721-212844-2-pvol2 true snapcontent-2ecfdd38-fa4a-4bb5-ab9e-0347b3b3fc78 8Gi vxflexos-snapclass snapcontent-2ecfdd38-fa4a-4bb5-ab9e-0347b3b3fc78 55s 9s

Expected behavior
Only 3 VolumeSnapshots should be created.

[FEATURE]: Migrate Role Management Logic to Authorization Server

Describe the solution you'd like
The role create command logic should move from karavictl to the authorization server

Describe alternatives you've considered
Leaving the logic in the karavictl binary, which makes it less portable and more bloated.

Additional context
N/A

[FEATURE]: CSM Resiliency supports evacuation of pods during NoExecute taint on node

Describe the feature

The original Resiliency design refused to force delete pods if they were potentially doing I/O to the array. The customer's request is a valid one, although not quite as safe as the current behavior of Resiliency. I would propose that we make this behavior an option, i.e.:

  • A new configuration variable is introduced, "forceDeleteOnNoExecuteTaint" or something similar, that changes the current behavior so that if podmon receives a notification that the pod is Not Ready and the node has a "NoExecute" taint, we force delete the pod. The default would be false, resulting in the current behavior. (See the taint-check sketch after this list.)
  • If the pod has a grace period toleration for NoExecute (as most do; the default is 5 minutes), we need to act before the grace period expires so that no replacement pods are created on the same node being evacuated because of pod affinity. We could do it immediately upon receiving the NoExecute notification, or perhaps wait half the duration of the toleration (normally 2-3 minutes) in case the node becomes Ready rather quickly.
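For illustration, a hedged sketch of the taint check podmon could make before force-deleting, assuming client-go access to the Node object; the proposed forceDeleteOnNoExecuteTaint option referenced above is not an existing setting.

package monitor

import (
	corev1 "k8s.io/api/core/v1"
)

// hasNoExecuteTaint reports whether the node carries any NoExecute taint,
// which (combined with the pod being Not Ready and the proposed
// forceDeleteOnNoExecuteTaint option) would allow a force delete.
func hasNoExecuteTaint(node *corev1.Node) bool {
	for _, taint := range node.Spec.Taints {
		if taint.Effect == corev1.TaintEffectNoExecute {
			return true
		}
	}
	return false
}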

Additional context
This feature has been converted from a bug. Logs have been attached.

session-logs.txt

[QUESTION]: Long time to bind PVC

Hi.

The latest driver is installed with Helm on OpenShift 4.8 with Isilon storage.
When creating a PVC, the time until it gets bound seems long, about 35 seconds.
Any suggestions to improve this?
Some log output from when a PVC is created is attached.

Regards,
-Ulf

[DEBUG]
-------------------------- GOISILON HTTP RESPONSE -------------------------
HTTP/1.1 404 Not Found
Transfer-Encoding: chunked
Allow: GET, PUT, POST, DELETE, HEAD
Content-Security-Policy: default-src 'none'
Content-Type: application/json
Date: Thu, 21 Oct 2021 06:48:50 GMT
Server: Apache
Strict-Transport-Security: max-age=31536000;
X-Frame-Options: sameorigin
X-Isi-Ifs-Spec-Version: 1.0

[DEBUG] Error in response. Method:GET URI:namespace/ifs/dst002dc50/nfs_dcc/ocp_test Error: Unable to open object 'dst002dc50/nfs_dcc/ocp_test/k8s-de8dfb1546' in store 'ifs' -- not found. JSON Error: Unable to open object 'dst002dc50/nfs_dcc/ocp_test/k8s-de8dfb1546' in store 'ifs' -- not found.
[DEBUG] the query of volume (id '', name 'k8s-de8dfb1546') returned an error, regard the volume as non-existent. error : 'Unable to open object 'dst002dc50/nfs_dcc/ocp_test/k8s-de8dfb1546' in store 'ifs' -- not found.'
time="2021-10-21T06:48:50Z" level=debug clusterName=dc50nfsdcc runid=19 msg="volume (id :'', name 'k8s-de8dfb1546') already exists : 'false'" file="/go/src/service/isiService.go:412"
time="2021-10-21T06:48:50Z" level=debug clusterName=dc50nfsdcc runid=19 msg="begin getting export with target path '/ifs/dst002dc50/nfs_dcc/ocp_test/k8s-de8dfb1546' and access zone 'dc50nfsdcc' for Isilon" file="/go/src/service/isiService.go:633"

[DEBUG] Execution successful on Method: GET, URI: platform/2/protocols/nfs/exports
time="2021-10-21T06:48:50Z" level=error clusterName=dc50nfsdcc runid=19 msg="error retrieving export ID for 'k8s-de8dfb1546', set it to 0. error : ''.
" file="/go/src/service/controller.go:320"
time="2021-10-21T06:48:50Z" level=error clusterName=dc50nfsdcc runid=19 msg="request parameters: the path is '/ifs/dst002dc50/nfs_dcc/ocp_test/k8s-de8dfb1546', and the access zone is 'dc50nfsdcc'." file="/go/src/service/controller.go:321"
time="2021-10-21T06:48:50Z" level=debug clusterName=dc50nfsdcc runid=19 msg="create volume with header metadata 'k8s-de8dfb1546' has been resolved to 'map[x-csi-pv-claimname:dxy x-csi-pv-name:k8s-de8dfb1546 x-csi-pv-namespace:fredrik]'" file="/go/src/service/controller.go:345"
time="2021-10-21T06:48:50Z" level=debug clusterName=dc50nfsdcc runid=19 msg="begin to create volume 'k8s-de8dfb1546'" file="/go/src/service/isiService.go:102"
time="2021-10-21T06:48:50Z" level=debug clusterName=dc50nfsdcc runid=19 msg="header metadata 'map[x-csi-pv-claimname:dxy x-csi-pv-name:k8s-de8dfb1546 x-csi-pv-namespace:fredrik]'" file="/go/src/service/isiService.go:103"
