outscale / osc-bsu-csi-driver
License: Apache License 2.0
As a user, I would like my CSI plugin to be stable when API calls are blocked due to throttling (either on the account or the whole platform).
This issue focuses on checking that all API calls are resilient to throttling errors and can retry using an exponential-backoff algorithm.
/kind bug
What happened?
The plugin can handle 39 volumes per node when the devices are not SCSI.
With SCSI volumes, however, we get the following error after 36 volumes per node:
2023-05-24T04:02:26Z : Warning : FailedMount : MountVolume.MountDevice failed for volume "pvc-5b47b495-a962-47cb-9deb-4962fb6f7a3b" : rpc error: code = Internal desc = Failed to find device path /dev/xvdaa. scsi path "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sda" not found
What you expected to happen?
To be able to handle 39 SCSI volumes per node.
What happened?
DeleteVolume and DeleteSnapshot are not idempotent, and the plugin gets stuck in an infinite deletion loop if the resource no longer exists.
What you expected to happen?
All functions should be idempotent.
How to reproduce it (as minimally and precisely as possible)?
Create a disk/snapshot and destroy the disk manually.
Environment
Is your feature request related to a problem?/Why is this needed
osc-sdk-go v1 is used but has some limitations due to nil handling during API queries.
osc-sdk-go v2 provides new facilities around this issue.
/feature
Describe the solution you'd like in detail
Adapt the CSI code to switch to osc-sdk-go v2.
/kind bug
What happened?
When we request a disk with IOPS that exceed Outscale's limit (https://docs.outscale.com/en/userguide/About-Volumes.html#_volume_types_and_iops), currently 300 IOPS per GiB, the plugin does not reduce the IOPS.
What you expected to happen?
The plugin currently reduces the maximum IOPS if it exceeds 13000; it would be good to also check the ratio and reduce the IOPS in that case.
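The requested behavior can be sketched as clamping against both limits. The constants below come from this issue (13000 absolute maximum, 300 IOPS per GiB ratio); the function name is illustrative, not the driver's actual API.

```go
package main

import "fmt"

const (
	maxIOPS       = 13000 // Outscale's absolute io1 maximum (per this issue)
	maxIOPSPerGiB = 300   // Outscale's IOPS/GiB ratio limit (per this issue)
)

// capIOPS clamps the requested IOPS to the ratio limit first, then to the
// absolute maximum, so both constraints requested in this issue are enforced.
func capIOPS(requested, sizeGiB int) int {
	if ratioMax := sizeGiB * maxIOPSPerGiB; requested > ratioMax {
		requested = ratioMax
	}
	if requested > maxIOPS {
		requested = maxIOPS
	}
	return requested
}

func main() {
	// A 100 GiB volume at iopsPerGB 300 asks for 30000 IOPS; the absolute
	// maximum applies. A small 5 GiB volume is bound by the ratio instead.
	fmt.Println(capIOPS(30000, 100)) // 13000
	fmt.Println(capIOPS(2000, 5))    // 1500
}
```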
/kind bug
What happened?
During CreateSnapshot, CSI will return OK after calling CreateSnapshot (IaaS).
Once CreateSnapshot (CSI) returns OK, the CO considers that the Snapshot is "cut" in the CSI-specification sense (meaning the Snapshot's content cannot be altered by future writes).
Once the "cut" is done, the CO may "thaw" the application, which may continue writing to the Volume.
However, unlike EC2, where "the point-in-time snapshot is created immediately", Outscale's Snapshot is only cut once the "completed" state is reached on the IaaS side:
The data contained in a snapshot is considered cut when the snapshot is in the completed state.
This behavior could lead CO to prematurely resume writes on Volume and alter Snapshot content.
What you expected to happen?
As described in CSI spec:
CreateSnapshot is a synchronous call and it MUST block until the snapshot is cut
In the current Outscale API version, CreateSnapshot (CSI) should block until Snapshot (IaaS) state reached "completed".
How to reproduce it (as minimally and precisely as possible)?
Check the creation_time of the Snapshot.
Anything else we need to know?:
Note that ready_to_use still switches to true once a Snapshot (IaaS) moves to the "completed" state, as Outscale has no post-processing effort (unlike EC2).
🔥IMPLEMENTATION RISK🔥
Waiting for the state to reach "completed" could easily make CSI calls time out, which is fine, as the CO will call CreateSnapshot again and again.
However, if each pending call is not stopped once the timeout is reached, each call may keep performing ReadSnapshots (IaaS) in an infinite loop and cause those issues.
The fix should consider exiting with an error instead of calling ReadSnapshots (IaaS) forever (after a fixed allocated time, after the first read, ...).
Environment
Currently, the plugin is tested against sanity_test v2.2.0, which is meant for CSI drivers that satisfy the v1.1.0 CSI spec.
As of version v0.15.0, the Outscale driver satisfies the v1.5.0 CSI spec, so it would be a good idea to upgrade the sanity test package to v4.3.0.
/kind bug
What happened?
-k8s-tag-cluster-id doesn't exist in ebs-plugin, but it is possible to set it in the Helm values (/osc-bsu-csi-driver/values.yaml):
k8sTagClusterId: ""
kubectl logs -f ebs-csi-controller-85f44d455c-4fbjz -n kube-system -c ebs-plugin
flag provided but not defined: -k8s-tag-cluster-id
Usage of aws-ebs-csi-driver:
-add_dir_header
If true, adds the file directory to the header of the log messages
What you expected to happen?
I don't know if the feature has been removed or changed.
How to reproduce it (as minimally and precisely as possible)?
helm install osc-bsu-csi-driver ./osc-bsu-csi-driver --namespace kube-system --set enableVolumeScheduling=true --set enableVolumeResizing=true --set enableVolumeSnapshot=true --set region=$OSC_REGION --set image.repository=$IMAGE_NAME --set image.tag=$IMAGE_TAG --set k8sTagClusterId="test"
Anything else we need to know?:
Environment
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:32:32Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Tracking issue for:
Is your feature request related to a problem?/Why is this needed
Some deployments need to use a specific client certificate in order to make API calls.
For instance, users may want to configure their API Access Rules with certificates.
/feature
The feature would consist of passing an optional client certificate to the CSI driver through environment variables.
In a Kubernetes deployment, this certificate should be stored inside a Secret.
Two variables can be configured; either both should be set, or neither. When both are set, the CSI driver will use client-certificate-data and client-key-data to establish the connection to the Outscale API.
In order not to miss new arguments, it would be interesting to integrate helm-docs in the release process.
Right now we need to specify topology.bsu.outscale.com/zone to get the right AZ.
Could we support the standard topology.kubernetes.io/zone?
/kind bug
What happened?
The driver's maximum IOPS (20000) is higher than the 13000 allowed by Outscale.
As a result, the IOPS are trimmed down to 20000 and the request fails. The API access logs show the request created with iopsPerGB: 300 and a PVC of 100Gi:
"Logs": [
{
"ResponseStatusCode": 400,
"ResponseSize": 143,
"QueryPayloadRaw": "{\"Iops\":20000,\"Size\":100,\"SubregionName\":\"cloudgouv-eu-west-1a\",\"VolumeType\":\"io1\"}\n",
"AccountId": "XXX",
"QueryUserAgent": "osc-bsu-csi-driver/v1.0.0",
"CallDuration": 34,
"RequestId": "0b5b0926-ad14-4809-8afa-e18350e43de5",
"QueryApiVersion": "1.22",
"QueryIpAddress": "1.2.3.4",
"QueryApiName": "oapi",
"QueryPayloadSize": 84,
"QueryCallName": "CreateVolume",
"QueryAccessKey": "XXX",
"QueryHeaderSize": 351,
"QueryDate": "2022-10-04T10:11:55.587546Z",
"QueryHeaderRaw": "Host: api.cloudgouv-eu-west-1.outscale.com\\nAccept: application/json\\nConnection: close\\nUser-Agent: osc-bsu-csi-driver/v1.0.0\\nX-Amz-Date: 20221004T101155Z\\nX-SSL-CERT: -----BEGIN CERTIFICATE----------END CERTIFICATE-----\\nContent-Type: application/json\\nAuthorization: *****\\nContent-Length: 84\\nAccept-Encoding: gzip\\nX-Forwarded-For: 1.2.3.4"
},
What you expected to happen?
Controller to scale down IOPS to maximum allowed value and create a volume
How to reproduce it (as minimally and precisely as possible)?
allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
- key: topology.bsu.csi.outscale.com/zone
values:
- cloudgouv-eu-west-1a
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: osc-io1-big
parameters:
iopsPerGB: "300"
type: io1
provisioner: bsu.csi.outscale.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: block-claim
spec:
accessModes:
- ReadWriteOnce
volumeMode: Block
storageClassName: osc-io1-big
resources:
requests:
storage: 100Gi
Anything else we need to know?:
Environment
Kubernetes version (use kubectl version):
Server Version: v1.23.10+rke2r1
Driver version:
v1.0.0
/kind bug
What happened?
The CSI specification imposes that all functions must be idempotent.
Issue #130 shows that NodePublishVolume was not idempotent.
What to do?
We need to check the idempotency of all functions.
Is your feature request related to a problem?/Why is this needed
Some labels are set with osc.com; they should be changed to outscale.com, as osc.com does not exist.
/feature
Describe the solution you'd like in detail
Adapt the label-migration branch to reflect this change.
Hello,
We cannot create the io1 type of storage from our OpenShift cluster. We have followed your template to write our YAML file; here it is:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: slow
provisioner: bsu.csi.outscale.com
parameters:
type: io1
iopsPerGB: "10"
fsType: ext4
and here is the error we are receiving:
I0912 12:49:52.488384 1 connection.go:187] GRPC error: rpc error: code = Internal desc = Could not create volume "pvc-36278801-f9a4-436c-a36a-a54f298ccc3a": could not create volume in Outscale: 400 Bad Request
I0912 12:49:52.488419 1 controller.go:767] CreateVolume failed, supports topology = true, node selected false => may reschedule = false => state = Finished: rpc error: code = Internal desc = Could not create volume "pvc-36278801-f9a4-436c-a36a-a54f298ccc3a": could not create volume in Outscale: 400 Bad Request
I0912 12:49:52.488452 1 controller.go:1074] Final error received, removing PVC 36278801-f9a4-436c-a36a-a54f298ccc3a from claims in progress
W0912 12:49:52.488461 1 controller.go:933] Retrying syncing claim "36278801-f9a4-436c-a36a-a54f298ccc3a", failure 3
E0912 12:49:52.488475 1 controller.go:956] error syncing claim "36278801-f9a4-436c-a36a-a54f298ccc3a": failed to provision volume with StorageClass "outscale-bsu-io1": rpc error: code = Internal desc = Could not create volume "pvc-36278801-f9a4-436c-a36a-a54f298ccc3a": could not create volume in Outscale: 400 Bad Request
I0912 12:49:52.488495 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kube-system", Name:"test", UID:"36278801-f9a4-436c-a36a-a54f298ccc3a", APIVersion:"v1", ResourceVersion:"424882", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "outscale-bsu-io1": rpc error: code = Internal desc = Could not create volume "pvc-36278801-f9a4-436c-a36a-a54f298ccc3a": could not create volume in Outscale: 400 Bad Request
I0912 12:49:52.488673 1 request.go:1181] Request Body: {"count":4,"lastTimestamp":"2023-09-12T12:49:52Z","message":"failed to provision volume with StorageClass "outscale-bsu-io1": rpc error: code = Internal desc = Could not create volume "pvc-36278801-f9a4-436c-a36a-a54f298ccc3a": could not create volume in Outscale: 400 Bad Request"}
I0912 12:49:52.488746 1 round_trippers.go:435] curl -v -XPATCH -H "Accept: application/json, */*" -H "Content-Type: application/strategic-merge-patch+json" -H "User-Agent: csi-provisioner/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer " 'https://172.30.0.1:443/api/v1/namespaces/kube-system/events/test.178427ad569f843a'
I0912 12:49:52.496620 1 round_trippers.go:454] PATCH https://172.30.0.1:443/api/v1/namespaces/kube-system/events/test.178427ad569f843a 200 OK in 7 milliseconds
We also get an error in the graphical interface.
Can you please help us find what the problem is? I am ready to take a call whenever you want.
Sincerely,
Jordan
/kind bug
What happened?
It seems that when we use xfs as the fstype, allowVolumeExpansion cannot work, as the Docker image does not contain the xfs_growfs binary.
I encounter this error:
MountVolume.Setup failed while expanding volume for volume "pvc-xxxxxxxxxxxxxxxxx" : Expander.NodeExpand failed to expand the volume : rpc error: code = Internal desc = Could not resize volume "vol-xxxxxxxx" ("/dev/xvdf"): resize of device /var/lib/kubelet/pods/xxxxxxxxxxxxx/volumes/kubernetes.io~csi/pvc-xxxxxxxxxxxxxxxxx/mount failed: executable file not found in $PATH. xfs_growfs output:
Checking the Dockerfile here (https://github.com/outscale/osc-bsu-csi-driver/blob/v1.2.3/Dockerfile#L26) and according to the Alpine packages, you also need xfsprogs-extra to be able to use xfs_growfs:
https://alpine.pkgs.org/3.16/alpine-main-x86_64/xfsprogs-extra-5.16.0-r1.apk.html
Maybe you could also update the Alpine release while you are at it :-) https://github.com/outscale/osc-bsu-csi-driver/blob/v1.2.3/Dockerfile#L22
Environment
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.7", GitCommit:"84e1fc493a47446df2e155e70fca768d2653a398", GitTreeState:"clean", BuildDate:"2023-07-19T12:23:27Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.7+rke2r1", GitCommit:"84e1fc493a47446df2e155e70fca768d2653a398", GitTreeState:"clean", BuildDate:"2023-07-19T20:19:16Z", GoVersion:"go1.20.6 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
Thanks a lot for your help
/kind bug
What happened?
During the installation of the driver, the ebs-csi-controller is in CrashLoopBackOff on Kubernetes v1.23.5.
I checked the pod logs and found this trace:
I0707 15:24:13.882407 1 driver.go:63] Driver: ebs.csi.aws.com Version: v0.0.15
panic: could not get metadata from OSC: EC2 instance metadata is not available
goroutine 1 [running]:
github.com/outscale-dev/osc-bsu-csi-driver/pkg/driver.newControllerService(0xc0001de960)
/build/pkg/driver/controller.go:83 +0x85
github.com/outscale-dev/osc-bsu-csi-driver/pkg/driver.NewDriver({0xc000517f58, 0x3, 0x40cef4})
/build/pkg/driver/driver.go:83 +0x579
main.main()
/build/cmd/main.go:31 +0x18f
I've checked my env vars and all seem to be OK (the osc-csi-bsu secret is available in the namespace):
[...]
env:
- name: CSI_ENDPOINT
value: 'unix:///var/lib/csi/sockets/pluginproxy/csi.sock'
- name: OSC_ACCESS_KEY
valueFrom:
secretKeyRef:
name: osc-csi-bsu
key: access_key
optional: true
- name: OSC_SECRET_KEY
valueFrom:
secretKeyRef:
name: osc-csi-bsu
key: secret_key
optional: true
- name: AWS_REGION
value: eu-west-2
[...]
I do not use any Network Policies in my namespace (I've seen some threads saying that can be related).
My cluster has access to the internet.
What you expected to happen?
EBS-CSI works and allows creating PVs/PVCs.
How to reproduce it (as minimally and precisely as possible)?
On fresh Kubernetes, follow steps of this documentation:
https://github.com/outscale-dev/osc-bsu-csi-driver/blob/OSC-MIGRATION/docs/deploy.md
Anything else we need to know?:
Environment
Tracking issue for:
Why is this needed
The sidecar csi-snapshotter uses deprecated objects, so multiple warnings are thrown:
W0119 10:55:42.256252 1 warnings.go:67] snapshot.storage.k8s.io/v1beta1 VolumeSnapshotClass is deprecated; use snapshot.storage.k8s.io/v1 VolumeSnapshotClass
W0119 11:01:41.256348 1 warnings.go:67] snapshot.storage.k8s.io/v1beta1 VolumeSnapshotContent is deprecated; use snapshot.storage.k8s.io/v1 VolumeSnapshotContent
/feature
Describe the solution you'd like in detail
Update all sidecars to get latest features
Is your feature request related to a problem?/Why is this needed
As a user concerned by data storage security, I would like to be able to encrypt my data (either through block or mount) to prevent some cases of unauthorized reads from BSU volumes.
/feature
Describe the solution you'd like in detail
Be able to provide a secret at StorageClass declaration, similarly to what Portworx has done.
The secret can be passed to the CSI driver thanks to secrets requirements.
This issue focuses on encryption at the StorageClass level; a future enhancement would be to allow a specific secret definition at the PersistentVolumeClaim level.
What would you like to be added:
Publish deployment using Kustomize
Why is this needed:
Native deployment engine
/kind feature
/kind bug
What happened?
We experienced, one time, a pod stuck in the Terminating state; the volume was not unmounted properly.
Idea
Implement a stress test that mounts and unmounts a volume multiple times to see if we can reproduce it.
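The stress test idea could be sketched as a loop over an abstract mount/unmount pair that stops at the first failure. The `mounter` interface is a stand-in for the driver's NodeStageVolume/NodeUnstageVolume path; the fake implementation below is only for demonstrating the loop.

```go
package main

import "fmt"

// mounter abstracts the mount/unmount pair exercised by the stress test
// (a stand-in for the driver's node staging path).
type mounter interface {
	Mount(device, target string) error
	Unmount(target string) error
}

// stress mounts and unmounts the same volume n times, stopping at the first
// failure so the failing iteration can be inspected.
func stress(m mounter, device, target string, n int) error {
	for i := 0; i < n; i++ {
		if err := m.Mount(device, target); err != nil {
			return fmt.Errorf("mount failed at iteration %d: %w", i, err)
		}
		if err := m.Unmount(target); err != nil {
			return fmt.Errorf("unmount failed at iteration %d: %w", i, err)
		}
	}
	return nil
}

// fakeMounter counts calls; a real test would wire the CSI node service here.
type fakeMounter struct{ mounts, unmounts int }

func (f *fakeMounter) Mount(device, target string) error { f.mounts++; return nil }
func (f *fakeMounter) Unmount(target string) error       { f.unmounts++; return nil }

func main() {
	f := &fakeMounter{}
	err := stress(f, "/dev/xvdb", "/mnt/test", 1000)
	fmt.Println(err, f.mounts, f.unmounts) // <nil> 1000 1000
}
```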
Hello,
Using the latest version available (v1.2.0), I'd like to be sure that I can't expand a PVC while I have a Pod running on it.
If that is correct, is it something that could be implemented someday?
Thanks for your help
The Helm chart and the plugin might sometimes need to live separately.
We can take the example of AWS EBS plugin: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/releases
Is your feature request related to a problem?/Why is this needed
As it is not a best practice to use root AK/SK with this tool, we should provide a sample Policy document that could be attached to a less-privileged EIM user.
/feature
Describe the solution you'd like in detail
A sample EIM policy should be present in the README, along with the recommendation to bind this policy to an EIM user dedicated for the BSU driver.
Describe alternatives you've considered
Precisely listing the privileges required for managing BSUs, but that may be over-complicated for newcomers. A sample EIM policy is a good starting point.
Additional context
Sample policy that could be given to users:
{
"Statement": [
{
"Action": [
"api:ReadVms",
"api:ReadVmsState",
"api:ReadSnapshots",
"api:CreateSnapshot",
"api:DeleteSnapshot",
"api:CreateVolume",
"api:ReadVolumes",
"api:LinkVolume",
"api:UnlinkVolume",
"api:UpdateVolume",
"api:DeleteVolume"
],
"Resource": [
"*"
],
"Effect": "Allow"
}
]
}
Why is this needed
CRDs and the snapshot controller are the cluster's responsibility to maintain, as specified in kubernetes-csi.
/feature
Describe the solution you'd like in detail
Remove it from the helm chart
/kind bug
What happened?
The disk is not allocated.
What you expected to happen?
The disk should be allocated.
How to reproduce it (as minimally and precisely as possible)?
try to create a disk on a cluster in cloudgouv-eu-west-1 region
Anything else we need to know?:
The driver gets a "not Authorized" error with a reference to the eu-west-2 region.
Environment
Kubernetes version (use kubectl version): 1.23.14
Is your feature request related to a problem?/Why is this needed
The main branch currently uses "ebs"/"aws" labels in the provider description/definition.
The label-migration branch adapts this to the more accurate "bsu"/"outscale" labels, but introduces a breaking change.
A solution would be to explain how to migrate from the old CSI plugin to the new one.
/documentation
Describe the solution you'd like in detail
A safe migration would be to:
What would you like to be added:
We would probably need to update to go 1.17 as mentioned in:
This would also be a good opportunity to update go dependencies.
Why is this needed:
Keep up-to-date with ecosystem dependencies
/kind feature
Tracking issue for:
/kind bug
What happened?
I encountered a bug following a permission problem. I had a wrongly configured EIM profile for OAPI (following an upgrade from the FCU API, I was missing the actual api: permissions). The BSU CSI driver was then unable to see volumes for a time.
However, it tried to manage instances and volume attachments during this time, and was expectedly unable to do so correctly.
When re-establishing correct credentials, I had a buggy VolumeAttachment resource, which was in an inconsistent state:
status:
attachError:
message: 'rpc error: code = NotFound desc = Instance "i-xxxxxxxx" not found'
time: "2021-02-23T17:37:57Z"
attached: false
detachError:
message: 'rpc error: code = Internal desc = Could not detach volume "vol-xxxxxxxx"
from node "i-xxxxxxxx": could not detach volume "vol-xxxxxxxx" from node "i-xxxxxxxx":
409 Conflict'
time: "2021-02-23T18:19:28Z"
At IaaS level, the BSU was then detached.
From the logs, the BSU CSI driver tried to forcibly detach the device, presumably to make certain that it was not attached in error before re-creating the attachment.
The BSU driver then goes along this code path: https://github.com/outscale-dev/osc-bsu-csi-driver/blob/OSC-MIGRATION/pkg/cloud/cloud.go#L525 and tries to detach an already detached volume. It fails, and the driver then enters a loop of failed Detach.
What you expected to happen?
Even if the initial situation occurred because of a configuration error, the driver should converge successfully after the correct configuration is re-established.
How to reproduce it (as minimally and precisely as possible)?
To be honest, I do not have a clear scenario to make this happen. Perhaps manually detaching a volume at the IaaS level could trigger the behavior.
Anything else we need to know?:
I suggest that when the IaaS returns that the disk is not attached to any instance, we should not try the UnlinkVolume call and should consider the detach already done. I think it could be done by adding a successful return after https://github.com/outscale-dev/osc-bsu-csi-driver/blob/27ea8b5107143776b0cca0479e861a45d5ac8564/pkg/cloud/cloud.go#L526.
Environment
Kubernetes version (use kubectl version): 1.18.16
Hello,
We are creating storage classes (io1, gp2, and standard) from our OpenShift cluster.
Unfortunately, right now we are only able to create them through the public endpoint.
Can you provide us with a private endpoint that allows creating those storage classes without going through the internet?
Sincerely,
Jordan
What would you like to be added:
Arrange the documentation to split it between topics:
docs/README.md: general explanation and pointers to other documentation (internal or external)
docs/deploying.md: how to install, configure, and remove the CCM
docs/testing.md: how to test the CCM
docs/contributing.md: general information around contributions, like how to release
Why is this needed:
Make life easier for new users/developers.
/kind documentation
Why is this needed
In order to distinguish API calls from multiple services, the User-Agent header is used to detect the caller.
/feature
Describe the solution you'd like in detail
Set the User-Agent header in the CSI driver's API calls.
Is your feature request related to a problem?/Why is this needed
Various containers are currently set to verbosity level 10. That may be practical for debugging; however, it is very verbose on the one hand, and on the other the debug output contains potentially sensitive information that should not be enabled except for actual troubleshooting.
/feature
Describe the solution you'd like in detail
Verbosity level should be configurable at deployment time, and not be 10 by default.
Describe alternatives you've considered
Additional context
Info about Kubernetes debug levels:
https://kubernetes.io/docs/reference/kubectl/cheatsheet/#kubectl-output-verbosity-and-debugging
Outscale documentation has moved to docs.outscale.com, and a lot of links still point to wiki.outscale.net.
This needs to be fixed.
/kind bug
Hello,
using version v1.1.0, the symlink from https://github.com/outscale-dev/osc-bsu-csi-driver/blob/v1.1.0/osc-bsu-csi-driver/CHANGELOG.md to https://github.com/outscale-dev/osc-bsu-csi-driver/blob/v1.1.0/CHANGELOG-1.X.md does not seem to be supported by Helm.
Here is the output using simple Helm commands:
gbellongervais@me:~/outscale/osc-k8s-rke-cluster$ helm plugin install https://github.com/aslafy-z/helm-git --version 0.14.0
Installed plugin: helm-git
gbellongervais@me:~/outscale/osc-k8s-rke-cluster$ helm repo add osc git+https://www.github.com/outscale-dev/osc-bsu-csi-driver/@osc-bsu-csi-driver?ref=v1.1.0
Error: error evaluating symlink /tmp/helm-git.M9JJkZ/osc-bsu-csi-driver/CHANGELOG.md: lstat /tmp/helm-git.M9JJkZ/CHANGELOG-1.X.md: no such file or directory
Error: looks like "git+https://www.github.com/outscale-dev/osc-bsu-csi-driver/@osc-bsu-csi-driver?ref=v1.1.0" is not a valid chart repository or cannot be reached: plugin "helm-git" exited with error
I have the same issue using https://github.com/outscale-dev/osc-k8s-rke-cluster with updated addon versions:
gbellongervais@me:~/outscale/osc-k8s-rke-cluster$ ANSIBLE_CONFIG=ansible.cfg ansible-playbook addons/csi/playbook.yaml
PLAY [Setup OSC-CSI] ***********************************************************************************************************************************************************************************************************************
TASK [Gathering Facts] *********************************************************************************************************************************************************************************************************************
ok: [localhost]
TASK [download helm] ***********************************************************************************************************************************************************************************************************************
changed: [localhost]
TASK [Install Helm-git] ********************************************************************************************************************************************************************************************************************
changed: [localhost]
TASK [Add Outscale repository] *************************************************************************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["$HELM_BIN", "repo", "add", "osc", "git+https://www.github.com/outscale-dev/osc-bsu-csi-driver/@osc-bsu-csi-driver?ref=v1.1.0"], "delta": "0:00:01.579029", "end": "2022-12-15 18:15:26.963329", "msg": "non-zero return code", "rc": 1, "start": "2022-12-15 18:15:25.384300", "stderr": "Error: error evaluating symlink /tmp/helm-git.PJGIwI/osc-bsu-csi-driver/CHANGELOG.md: lstat /tmp/helm-git.PJGIwI/CHANGELOG-1.X.md: no such file or directory\nError: looks like \"git+https://www.github.com/outscale-dev/osc-bsu-csi-driver/@osc-bsu-csi-driver?ref=v1.1.0\" is not a valid chart repository or cannot be reached: plugin \"helm-git\" exited with error", "stderr_lines": ["Error: error evaluating symlink /tmp/helm-git.PJGIwI/osc-bsu-csi-driver/CHANGELOG.md: lstat /tmp/helm-git.PJGIwI/CHANGELOG-1.X.md: no such file or directory", "Error: looks like \"git+https://www.github.com/outscale-dev/osc-bsu-csi-driver/@osc-bsu-csi-driver?ref=v1.1.0\" is not a valid chart repository or cannot be reached: plugin \"helm-git\" exited with error"], "stdout": "", "stdout_lines": []}
PLAY RECAP *********************************************************************************************************************************************************************************************************************************
localhost : ok=3 changed=2 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
I'm on Ubuntu 22.04.1 LTS
(WSL2 version on Windows 11 but not sure it is related)
When the disk is already detached, the plugin is stuck like this:
{"Vms":[{"VmType":"tinav4.c32r64p1","VmInitiatedShutdownBehavior":"stop","State":"running","StateReason":"","RootDeviceType":"ebs","RootDeviceName":"/dev/sda1","IsSourceDestChecked":true,"KeypairName":"thanos","ImageId":"ami-1cda4f98","DeletionProtection":false,"VmId":"i-a7d27e54","ReservationId":"r-fade62d8","Hypervisor":"xen","CreationDate":"2022-03-03T13:42:50.624Z","UserData":"I2Nsb3VkLWNvbmZpZwoKeXVtX3JlcG9zOgogIGVwZWwtcmVsZWFzZToKICAgIGJhc2V1cmw6IGh0dHA6Ly9kb3dubG9hZC5mZWRvcmFwcm9qZWN0Lm9yZy9wdWIvZXBlbC83LyRiYXNlYXJjaAogICAgZW5hYmxlZDogdHJ1ZQogICAgZmFpbG92ZXJtZXRob2Q6IHByaW9yaXR5CiAgICBncGdjaGVjazogdHJ1ZQogICAgZ3Bna2V5OiBodHRwOi8vZG93bmxvYWQuZmVkb3JhcHJvamVjdC5vcmcvcHViL2VwZWwvUlBNLUdQRy1LRVktRVBFTC03CiAgICBuYW1lOiBFeHRyYSBQYWNrYWdlcyBmb3IgRW50ZXJwcmlzZSBMaW51eCA3IC0gUmVsZWFzZQogIHNhbHRzdGFjazoKICAgIGJhc2V1cmw6IGh0dHBzOi8vcmVwby5zYWx0cHJvamVjdC5pby9weTMvcmVkaGF0LzcveDg2XzY0LzMwMDIvCiAgICBlbmFibGVkOiB0cnVlCiAgICBmYWlsb3Zlcm1ldGhvZDogcHJpb3JpdHkKICAgIGdwZ2NoZWNrOiB0cnVlCiAgICBncGdrZXk6IGh0dHBzOi8vcmVwby5zYWx0cHJvamVjdC5pby9weTMvcmVkaGF0LzcveDg2XzY0LzMwMDIvU0FMVFNUQUNLLUdQRy1LRVkucHViCiAgICBuYW1lOiBTYWx0U3RhY2sgUmVwbyAzMDAwLjIKCiAgCnBhY2thZ2VzOgogIC0gaHRvcAogIC0gaW90b3AKICAtIGlmdG9wCiAgLSB2aW0KICAtIHNhbHQtbWluaW9uCgp3cml0ZV9maWxlczoKLSBjb250ZW50OiB8CiAgICBoYXNoX3R5cGU6IHNoYTI1NgogICAgaWQ6ICJwYXIxLWNsb3VkLXByb20td29ya2VyLTEiCiAgICBsb2dfbGV2ZWw6IGluZm8KICAgIG1hc3RlcjogMTAuMjQuMS41CiAgcGF0aDogL2V0Yy9zYWx0L21pbmlvbgotIGNvbnRlbnQ6IHwKICAgICJwYXIxLWNsb3VkLXByb20td29ya2VyLTEiCiAgcGF0aDogL2V0Yy9zYWx0L21pbmlvbl9pZAoKcnVuY21kOgogIC0gc3VkbyBob3N0bmFtZWN0bCBzZXQtaG9zdG5hbWUgInBhcjEtY2xvdWQtcHJvbS13b3JrZXItMSIKICAtIFsgc3lzdGVtY3RsLCBkYWVtb24tcmVsb2FkIF0KICAtIFsgc3lzdGVtY3RsLCBlbmFibGUsIHNhbHQtbWluaW9uIF0KICAtIFsgc3lzdGVtY3RsLCBzdGFydCwgLS1uby1ibG9jaywgc2FsdC1taW5pb24gXQo=","SubnetId":"subnet-a5e344cc","PrivateIp":"10.24.0.8","SecurityGroups":[{"SecurityGroupName":"eu-west-2-common","SecurityGroupId":"sg-6c9713c0"}],"BsuOptimized":false,"BlockDeviceMappings":[{"DeviceName":"/d
ev/sda1","Bsu":{"VolumeId":"vol-b5afd0ce","State":"attached","LinkDate":"2022-05-31T07:44:46.279Z","DeleteOnVmDeletion":false}}],"ProductCodes":["0001"],"Placement":{"Tenancy":"default","SubregionName":"eu-west-2a"},"Architecture":"x86_64","NestedVirtualization":false,"LaunchNumber":0,"NetId":"vpc-96a7ffe2","Nics":[{"SubnetId":"subnet-a5e344cc","AccountId":"542438614293","Description":"Primary network interface","IsSourceDestChecked":true,"PrivateDnsName":"ip-10-24-0-8.eu-west-2.compute.internal","State":"in-use","LinkNic":{"State":"attached","LinkNicId":"eni-attach-1648dc3d","DeviceNumber":0,"DeleteOnVmDeletion":true},"SecurityGroups":[{"SecurityGroupName":"eu-west-2-common","SecurityGroupId":"sg-6c9713c0"}],"MacAddress":"aa:e8:ef:ea:79:42","NetId":"vpc-96a7ffe2","NicId":"eni-b15daae5","PrivateIps":[{"PrivateDnsName":"ip-10-24-0-8.eu-west-2.compute.internal","PrivateIp":"10.24.0.8","IsPrimary":true}]}],"Performance":"highest","Tags":[{"Value":"par1-cloud-prom-worker-1","Key":"Name"},{"Value":"10.24.1.5","Key":"saltmaster"}],"PrivateDnsName":"ip-10-24-0-8.eu-west-2.compute.internal"}],"ResponseContext":{"RequestId":"4878a62e-1ecd-44fa-a5af-b3bd8c868057"}}
I1017 13:27:57.093845 1 cloud.go:995] Debug response DescribeInstances: response({ResponseContext:0xc0006a0250 Vms:0xc0004f5ce0}), err(<nil>), httpRes(&{200 OK 200 HTTP/1.1 1 1 map[Access-Control-Allow-Origin:[*] Connection:[keep-alive] Content-Length:[3173] Content-Type:[application/json] Date:[Mon, 17 Oct 2022 13:27:57 GMT] Referrer-Policy:[same-origin] Server:[nginx] Strict-Transport-Security:[max-age=31536000; includeSubdomains;] X-Content-Type-Options:[nosniff] X-Frame-Options:[SAMEORIGIN] X-Xss-Protection:[1; mode=block]] {{"Vms":[{"VmType":"tinav4.c32r64p1","VmInitiatedShutdownBehavior":"stop","State":"running","StateReason":"","RootDeviceType":"ebs","RootDeviceName":"/dev/sda1","IsSourceDestChecked":true,"KeypairName":"thanos","ImageId":"ami-1cda4f98","DeletionProtection":false,"VmId":"i-a7d27e54","ReservationId":"r-fade62d8","Hypervisor":"xen","CreationDate":"2022-03-03T13:42:50.624Z","UserData":"I2Nsb3VkLWNvbmZpZwoKeXVtX3JlcG9zOgogIGVwZWwtcmVsZWFzZToKICAgIGJhc2V1cmw6IGh0dHA6Ly9kb3dubG9hZC5mZWRvcmFwcm9qZWN0Lm9yZy9wdWIvZXBlbC83LyRiYXNlYXJjaAogICAgZW5hYmxlZDogdHJ1ZQogICAgZmFpbG92ZXJtZXRob2Q6IHByaW9yaXR5CiAgICBncGdjaGVjazogdHJ1ZQogICAgZ3Bna2V5OiBodHRwOi8vZG93bmxvYWQuZmVkb3JhcHJvamVjdC5vcmcvcHViL2VwZWwvUlBNLUdQRy1LRVktRVBFTC03CiAgICBuYW1lOiBFeHRyYSBQYWNrYWdlcyBmb3IgRW50ZXJwcmlzZSBMaW51eCA3IC0gUmVsZWFzZQogIHNhbHRzdGFjazoKICAgIGJhc2V1cmw6IGh0dHBzOi8vcmVwby5zYWx0cHJvamVjdC5pby9weTMvcmVkaGF0LzcveDg2XzY0LzMwMDIvCiAgICBlbmFibGVkOiB0cnVlCiAgICBmYWlsb3Zlcm1ldGhvZDogcHJpb3JpdHkKICAgIGdwZ2NoZWNrOiB0cnVlCiAgICBncGdrZXk6IGh0dHBzOi8vcmVwby5zYWx0cHJvamVjdC5pby9weTMvcmVkaGF0LzcveDg2XzY0LzMwMDIvU0FMVFNUQUNLLUdQRy1LRVkucHViCiAgICBuYW1lOiBTYWx0U3RhY2sgUmVwbyAzMDAwLjIKCiAgCnBhY2thZ2VzOgogIC0gaHRvcAogIC0gaW90b3AKICAtIGlmdG9wCiAgLSB2aW0KICAtIHNhbHQtbWluaW9uCgp3cml0ZV9maWxlczoKLSBjb250ZW50OiB8CiAgICBoYXNoX3R5cGU6IHNoYTI1NgogICAgaWQ6ICJwYXIxLWNsb3VkLXByb20td29ya2VyLTEiCiAgICBsb2dfbGV2ZWw6IGluZm8KICAgIG1hc3RlcjogMTAuMjQuMS41CiAgcGF0aDogL2V0Yy9zYWx0L21pbmlvbgotIGNvbnRlbnQ6IHwKICAgICJwY
XIxLWNsb3VkLXByb20td29ya2VyLTEiCiAgcGF0aDogL2V0Yy9zYWx0L21pbmlvbl9pZAoKcnVuY21kOgogIC0gc3VkbyBob3N0bmFtZWN0bCBzZXQtaG9zdG5hbWUgInBhcjEtY2xvdWQtcHJvbS13b3JrZXItMSIKICAtIFsgc3lzdGVtY3RsLCBkYWVtb24tcmVsb2FkIF0KICAtIFsgc3lzdGVtY3RsLCBlbmFibGUsIHNhbHQtbWluaW9uIF0KICAtIFsgc3lzdGVtY3RsLCBzdGFydCwgLS1uby1ibG9jaywgc2FsdC1taW5pb24gXQo=","SubnetId":"subnet-a5e344cc","PrivateIp":"10.24.0.8","SecurityGroups":[{"SecurityGroupName":"eu-west-2-common","SecurityGroupId":"sg-6c9713c0"}],"BsuOptimized":false,"BlockDeviceMappings":[{"DeviceName":"/dev/sda1","Bsu":{"VolumeId":"vol-b5afd0ce","State":"attached","LinkDate":"2022-05-31T07:44:46.279Z","DeleteOnVmDeletion":false}}],"ProductCodes":["0001"],"Placement":{"Tenancy":"default","SubregionName":"eu-west-2a"},"Architecture":"x86_64","NestedVirtualization":false,"LaunchNumber":0,"NetId":"vpc-96a7ffe2","Nics":[{"SubnetId":"subnet-a5e344cc","AccountId":"542438614293","Description":"Primary network interface","IsSourceDestChecked":true,"PrivateDnsName":"ip-10-24-0-8.eu-west-2.compute.internal","State":"in-use","LinkNic":{"State":"attached","LinkNicId":"eni-attach-1648dc3d","DeviceNumber":0,"DeleteOnVmDeletion":true},"SecurityGroups":[{"SecurityGroupName":"eu-west-2-common","SecurityGroupId":"sg-6c9713c0"}],"MacAddress":"aa:e8:ef:ea:79:42","NetId":"vpc-96a7ffe2","NicId":"eni-b15daae5","PrivateIps":[{"PrivateDnsName":"ip-10-24-0-8.eu-west-2.compute.internal","PrivateIp":"10.24.0.8","IsPrimary":true}]}],"Performance":"highest","Tags":[{"Value":"par1-cloud-prom-worker-1","Key":"Name"},{"Value":"10.24.1.5","Key":"saltmaster"}],"PrivateDnsName":"ip-10-24-0-8.eu-west-2.compute.internal"}],"ResponseContext":{"RequestId":"4878a62e-1ecd-44fa-a5af-b3bd8c868057"}}} 3173 [] false false map[] 0xc000838600 0xc00057a630})
W1017 13:27:57.093949 1 cloud.go:570] DetachDisk called on non-attached volume: vol-bd1eea74
2022/10/17 13:27:57
POST /api/v1/UnlinkVolume HTTP/1.1
Host: api.eu-west-2.outscale.com
User-Agent: osc-bsu-csi-driver/
Content-Length: 28
Accept: application/json
Authorization: AWS4-HMAC-SHA256 Credential=/20221017/eu-west-2/oapi/aws4_request, SignedHeaders=accept;content-type;host;x-amz-date, Signature=442d830c639266f4ef2bb83d2d6aaf4ceebcc92e1dbd4e73e6c4df376a80c332
Content-Type: application/json
X-Amz-Date: 20221017T132757Z
Accept-Encoding: gzip
{"VolumeId":"vol-bd1eea74"}
2022/10/17 13:27:57
HTTP/1.1 400 Bad Request
Content-Length: 179
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Type: application/json
Date: Mon, 17 Oct 2022 13:27:57 GMT
Server: nginx
{"Errors":[{"Type":"InvalidResource","Details":"The VolumeId 'vol-bd1eea74' doesn't exist.","Code":"5064"}],"ResponseContext":{"RequestId":"d7b87e6f-a7b3-4c15-9491-8bbbc3d8e7e7"}}
I1017 13:27:57.134696 1 cloud.go:579] Debug response DetachVolume: response({ResponseContext:<nil>}), err(400 Bad Request) httpRes(&{400 Bad Request 400 HTTP/1.1 1 1 map[Access-Control-Allow-Origin:[*] Connection:[keep-alive] Content-Length:[179] Content-Type:[application/json] Date:[Mon, 17 Oct 2022 13:27:57 GMT] Server:[nginx]] {{"Errors":[{"Type":"InvalidResource","Details":"The VolumeId 'vol-bd1eea74' doesn't exist.","Code":"5064"}],"ResponseContext":{"RequestId":"d7b87e6f-a7b3-4c15-9491-8bbbc3d8e7e7"}}} 179 [] false false map[] 0xc00057fc00 0xc00057a630})
400 Bad Request
E1017 13:27:57.134785 1 driver.go:112] GRPC error: rpc error: code = Internal desc = Could not detach volume "vol-bd1eea74" from node "i-a7d27e54": could not detach volume "vol-bd1eea74" from node "i-a7d27e54": 400 Bad Request / (<nil>)
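The 400 response above shows a detach attempt on a volume that no longer exists; an idempotent DetachVolume would treat Outscale error code 5064 (InvalidResource) as "already detached" instead of failing. A minimal sketch of that check (string matching on the response body is used here for illustration only; the real driver would inspect the typed osc-sdk-go error):

```go
package main

import (
	"fmt"
	"strings"
)

// isVolumeNotFound reports whether an UnlinkVolume error means the volume
// no longer exists (Outscale error code 5064, InvalidResource), in which
// case a detach can be treated as already done. Sketch only: the real
// driver would inspect the typed SDK error, not a raw string.
func isVolumeNotFound(apiErrBody string) bool {
	return strings.Contains(apiErrBody, `"Code":"5064"`)
}

func main() {
	body := `{"Errors":[{"Type":"InvalidResource","Details":"The VolumeId 'vol-bd1eea74' doesn't exist.","Code":"5064"}]}`
	if isVolumeNotFound(body) {
		fmt.Println("volume already detached; returning success")
	}
}
```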
Hello,
We would like to know if it is possible to access our created OMIs in Cockpit v1. We have access through Cockpit v2, but it is unpleasant to switch interfaces often just to see our OMIs.
Sincerely,
Jordan
Is your feature request related to a problem? Please describe.
I would like to use osc-bsu-csi-driver behind an HTTP proxy. A possible way to do this is to pass environment variables like https_proxy to the BSU CSI driver containers. However, this is not supported today, which forces me to work around it by patching the chart.
Describe the solution you'd like in detail
Two enhancements could be done:
- an extraEnv value to set static extra environment variables on the driver containers;
- an extraSecretEnv value for variables containing secrets (see suggestion below).

Additional context
Helm charts typically have knobs like extraEnv in the values.yaml to set static extra environment variables in the Deployments, StatefulSets, etc. Some variables could contain secrets, so an extraSecretEnv is also often offered, which manages a dedicated secret and its presentation.
Example overrides in a chart: https://github.com/hashicorp/vault-helm/blob/main/values.yaml#L511
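A possible shape for such knobs in the chart's values.yaml, mirroring common charts like vault-helm (the key names, proxy hosts, and secret handling shown here are suggestions, not the chart's current API):

```yaml
# Hypothetical values.yaml additions -- names are suggestions only.
extraEnv:
  - name: HTTPS_PROXY
    value: "http://proxy.internal:3128"
  - name: NO_PROXY
    value: "169.254.169.254,10.0.0.0/8"
# Values here would be rendered into a chart-managed Secret and
# presented to the containers via envFrom/secretKeyRef.
extraSecretEnv:
  PROXY_PASSWORD: "s3cret"
```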
What happened?
The name allocator only checks device names from a to z, and if something goes wrong with some disks we can run out of names, like this:
failed to attach: rpc error: code = Internal desc = Could not attach volume "vol-X" to node "i-X": could not get a free device name to assign to node i-X
What you expected to happen?
The API and Linux allow names like /dev/xvdYZ, with Y and Z in [a,z]. We need to update the allocator to handle more device names.
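The extended naming scheme above could be generated like this (a minimal illustration, not the driver's actual allocator; the function name and the choice to start single letters at b are assumptions):

```go
package main

import "fmt"

// deviceNames generates candidate device names: first single letters
// b..z (a is typically the root device), then two-letter combinations
// aa..zz, following the /dev/xvdYZ scheme described above.
func deviceNames() []string {
	names := []string{}
	for c := 'b'; c <= 'z'; c++ {
		names = append(names, fmt.Sprintf("/dev/xvd%c", c))
	}
	for c1 := 'a'; c1 <= 'z'; c1++ {
		for c2 := 'a'; c2 <= 'z'; c2++ {
			names = append(names, fmt.Sprintf("/dev/xvd%c%c", c1, c2))
		}
	}
	return names
}

func main() {
	names := deviceNames()
	fmt.Println(len(names))          // 25 single-letter + 676 two-letter = 701
	fmt.Println(names[25])           // /dev/xvdaa, first two-letter name
	fmt.Println(names[len(names)-1]) // /dev/xvdzz
}
```

This raises the per-node limit from 25 candidate names to 701, far beyond the 39-volume target mentioned elsewhere in these issues.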
How to reproduce it (as minimally and precisely as possible)?
Have 25 device names already in use and try to attach another disk
Environment
Is your feature request related to a problem?/Why is this needed
Deploying the CSI driver is easy but still requires a few manual operations.
Newcomers may want an easy way to install it through artifacthub.io
/feature
Describe the solution you'd like in detail
Publish Outscale CSI on operatorhub.io
What would you like to be added:
As a user and developer, I would like to be able to trace CSI calls to the Outscale API.
This would include the Kubernetes version and the CSI driver version.
Why is this needed:
Add more details to the User-Agent header.
It should be set in both branches (osc-migration and label-migration) in newOscCloud.
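The richer User-Agent could be assembled like this (a sketch; the exact format and how the string is then passed to the osc-sdk-go configuration are assumptions, not the driver's current behavior):

```go
package main

import (
	"fmt"
	"runtime"
)

// buildUserAgent assembles a User-Agent string carrying the driver, CSI
// spec, and Kubernetes versions, so Outscale API logs can attribute calls.
// Format is illustrative; the real driver may choose a different layout.
func buildUserAgent(driverVersion, csiVersion, k8sVersion string) string {
	return fmt.Sprintf("osc-bsu-csi-driver/%s csi/%s kubernetes/%s go/%s",
		driverVersion, csiVersion, k8sVersion, runtime.Version())
}

func main() {
	fmt.Println(buildUserAgent("1.2.3", "1.5.0", "v1.27.0"))
}
```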
/kind feature
/kind bug
What happened?
When reaching the maximum number of volumes attached to one node, we experience attachment issues.
This behavior happens from time to time.
Idea
Add a stress test for this.
Tracking issue for:
I'm looking for an archive of the osc-bsu-csi-driver chart to download, but the latest release did not produce any assets. For comparison, cloud-provider-osc assets are made available for each release: https://github.com/outscale/cloud-provider-osc/releases/tag/v0.2.0
It would be very helpful to have such assets.
Hi, ext4 is used as the default filesystem. In the Outscale cloud, we would recommend using xfs for better block usage with snapshotting.
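Until the default changes, users can opt into xfs per StorageClass via the standard CSI fstype parameter. A possible manifest (the provisioner name is assumed from the driver's naming and should be checked against your deployment):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: bsu-xfs
provisioner: bsu.csi.outscale.com   # assumed driver name; verify in your install
parameters:
  csi.storage.k8s.io/fstype: xfs    # standard CSI parameter for the filesystem
```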
Tracking issue for:
Since Kubernetes v1.17, CSI drivers can be topology aware.
By answering the NodeGetInfo RPC with accessible_topology, the external-provisioner sidecar will always emit a CreateVolume with an aggregated topology. Thus, the CSI driver just needs to read the requisite or preferred topology and create the volume in the right zone.
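The node side of that flow boils down to reporting a zone segment in NodeGetInfo. A self-contained sketch of building those segments (the topology key name is an assumption for illustration; the real driver would return them inside a csi.NodeGetInfoResponse from the CSI spec bindings):

```go
package main

import "fmt"

// buildAccessibleTopology returns the topology segments a node would
// report in its NodeGetInfo response, so the external-provisioner can
// route CreateVolume to the right subregion. The key name below is an
// assumption, not necessarily the driver's actual key.
func buildAccessibleTopology(subregion string) map[string]string {
	return map[string]string{
		"topology.bsu.csi.outscale.com/zone": subregion,
	}
}

func main() {
	fmt.Println(buildAccessibleTopology("eu-west-2a"))
}
```

On the controller side, CreateVolume would then read the requisite/preferred topology segments from the request and pick the matching subregion for the new volume.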