
k8s-local-volume-provisioner's Introduction


Local Volume Provisioner

This repository contains experimental code. ScyllaDB doesn't provide any support, nor any guarantees about backward compatibility.

The Local Volume Provisioner implements the Container Storage Interface (CSI), a specification for container orchestrators to manage the lifecycle of volumes.

Features

Local Volume Provisioner supports dynamic provisioning on local disks. It allows storage volumes to be created on-demand by managing directories created on disks attached to instances. On supported filesystems, directories have quota limitations to ensure volume size limits.

Features the driver supports:

  • Dynamic provisioning - Uses a persistent volume claim (PVC) to dynamically provision a persistent volume (PV). See the example manifest after this list.
  • Persistent volume capacity limiting - Uses FS Quotas to enforce volume capacity limits.
  • Storage Capacity Tracking - The Container Orchestration scheduler can fetch information about node capacity and avoid scheduling workloads on nodes that don't satisfy their storage capacity constraints.
  • Topology - Volumes are constrained to land on the same node where they were originally created.
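
As an illustration of dynamic provisioning and topology, a minimal StorageClass plus PVC could look like the sketch below. The provisioner name matches the driver's default --driver-name; the StorageClass and PVC names are assumptions, and volumeBindingMode: WaitForFirstConsumer is what lets the scheduler use the published capacity information.

kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-xfs            # assumed name, pick your own
provisioner: local.csi.scylladb.com
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-xfs
  resources:
    requests:
      storage: 1Gi
EOF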

The following CSI features are implemented:

  • Controller Service
  • Node Service
  • Identity Service

Installation

The provisioner requires an existing directory on the host where dynamic volumes will be managed. Currently, quotas are only supported on XFS filesystems. When the volume directory uses an unsupported filesystem, volume sizes aren't limited, and users won't receive any IO error when they overflow the volume.
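
XFS enforces these limits through project quotas, which require the filesystem to be mounted with the prjquota option. A quick, hedged way to verify both on a node (the mount point here is an assumption matching the driver's default --volumes-dir):

# Confirm the volumes directory sits on XFS mounted with prjquota:
findmnt -T /mnt/persistent-volumes -o TARGET,FSTYPE,OPTIONS
# Report the project quotas the driver manages:
xfs_quota -x -c 'report -p' /mnt/persistent-volumes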

Volume directory

Users can create volume directories themselves, or use the provided example, which creates a 10GB image on every host in the k8s cluster, formats it as XFS, and mounts it in a particular location. To deploy the DaemonSet that sets this up:

kubectl apply -f example/disk-setup
kubectl -n xfs-disk-setup rollout status daemonset.apps/xfs-disk-setup
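
If you'd rather prepare the directory by hand, a rough equivalent of what the DaemonSet does might look like the following sketch (image size and paths are assumptions; prjquota is required for the quota feature to work):

# Create a 10GB sparse image, format it as XFS and mount it with project quotas.
truncate -s 10G /var/lib/xfs-disk.img
mkfs.xfs /var/lib/xfs-disk.img
mkdir -p /mnt/persistent-volumes
mount -o loop,prjquota /var/lib/xfs-disk.img /mnt/persistent-volumes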

Driver deployment:

The host path where the volume directory is created on each k8s node must be provided to the driver's DaemonSet via the volumes-dir volume.
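
To double-check which host path the deployed DaemonSet actually uses, something along these lines should work (names match the deployment commands below):

kubectl -n local-csi-driver get daemonset.apps/local-csi-driver \
  -o jsonpath='{.spec.template.spec.volumes[?(@.name=="volumes-dir")].hostPath.path}'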

To deploy the driver:

kubectl apply -f deploy/kubernetes
kubectl -n local-csi-driver rollout status daemonset.apps/local-csi-driver
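
Once the rollout finishes, the registration can be sanity-checked roughly like this (the CSIDriver object name follows the driver's default --driver-name; whether the deploy manifests create it under exactly this name is an assumption):

# The driver should be registered with the cluster:
kubectl get csidrivers.storage.k8s.io local.csi.scylladb.com
# And per-node capacity objects should appear for the scheduler to use:
kubectl get csistoragecapacities.storage.k8s.io -A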

Development

Please go through the CSI Spec and the Kubernetes CSI Developer Documentation to get a basic understanding of CSI drivers before you start.

Requirements

  • Golang 1.18+
  • Kubernetes 1.24+

Testing

To execute all unit tests and e2e test suites, run: make test
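
If you need to iterate on a single suite instead, the sanity suite can be run directly with ginkgo, mirroring the invocation from the flake report further down:

go run ./vendor/github.com/onsi/ginkgo/v2/ginkgo -race ./test/sanity/...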

License

This library is licensed under the Apache 2.0 License.

k8s-local-volume-provisioner's People

Contributors

dependabot[bot], rzetelskik, scylla-operator-bot[bot], tnozicka, ylebi, zimnx


k8s-local-volume-provisioner's Issues

PV space usage metrics missing for scylla storage class

I expect to see PV stats, like free disk space etc. I'm already scraping kubelet metrics, but I don't see any metrics when using the scylla storage class and CSI driver.

The CSI driver is responsible for providing metrics. Specifically, the endpoint NodeGetVolumeStats is supposed to provide metrics around disk and inode capacity.
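
For context, once NodeGetVolumeStats is implemented, kubelet surfaces those numbers as kubelet_volume_stats_* gauges; a hedged way to check for them (node name is a placeholder):

kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" \
  | grep kubelet_volume_stats_available_bytes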

The version we're using is :0.1.0@sha256:d8c12e96d352077fd4b639c8e4fb050e49e0be11161d78d9339b69f73466f59a, as suggested in the documentation for the setup based on the operator version (v1.10.0) we are currently using.

Build images for arm64 as well

We should build all images for amd64 and arm64 to have a stepping stone for migrating the operator as well.
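
A hedged sketch of one way to do it with docker buildx (builder setup and tag are assumptions, not the project's actual release pipeline):

docker buildx build --platform linux/amd64,linux/arm64 \
  -t docker.io/scylladb/k8s-local-volume-provisioner:dev --push .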

Requires

[Flake] ./hack/.ci/run-e2e-gke.sh: line 67: exit: : numeric argument required

 ++ unblock-must-gather-pod
++ kubectl -n e2e exec pod/must-gather -- bash -euEo pipefail -O inherit_errexit -c 'touch /tmp/exit'
+ [[ -f /tmp/exit ]]
+ sleep 1
+ [[ -f /tmp/exit ]]
++ kubectl -n e2e get pods/must-gather '--output=jsonpath={.status.containerStatuses[0].state.terminated.exitCode}'
+ exit_code=
+ kubectl -n e2e delete pod/must-gather --wait=false
pod "must-gather" deleted
+ [[ '' != \0 ]]
+ echo 'Collecting artifacts using must-gather failed'
Collecting artifacts using must-gather failed
+ exit ''
./hack/.ci/run-e2e-gke.sh: line 67: exit: : numeric argument required 

Flakes on master several times in a row, e.g. https://prow.scylla-operator.scylladb.com/view/gs/scylla-operator-prow/pr-logs/pull/scylladb_local-csi-driver/57/pull-local-csi-driver-master-e2e-gke-parallel/1813527225367007232#1:test-build-log.txt%3A5892
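
The failure comes from exit being handed the empty exit_code captured by the jsonpath query above; a minimal guard would default it before exiting, e.g.:

exit_code="$( kubectl -n e2e get pods/must-gather \
  --output='jsonpath={.status.containerStatuses[0].state.terminated.exitCode}' )"
# Fall back to a failure code when the container status isn't populated yet.
exit "${exit_code:-1}"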

Changes in volumes directory's underlying filesystem are not reflected in driver's state at runtime

If you run the provisioner with volumes-dir pointing to a directory with preexisting volume state files, it will account for them when building the state, which will also be reflected in CSIStorageCapacity.

$ ls -ls --block-size=K /mnt/persistent-volumes
total 4K
0K drwxr-x--- 6 root root 1K Jul  7 15:16 347e0693-f832-4fc1-a2dc-4928843b3238
4K -rw-r--r-- 1 root root 1K Jul  7 15:15 347e0693-f832-4fc1-a2dc-4928843b3238.json
$ df -h -BK /mnt/persistent-volumes/
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/loop5     20961280K 343484K 20617796K   2% /mnt/persistent-volumes
$ kubectl get csistoragecapacities.storage.k8s.io  csisc-6pnmh -ogo-template --template '{{ .capacity }}'
15718392Ki%

Yet if you delete the volume state file from the volumes directory, the change won't be reflected in the CSIStorageCapacity:

$ ls -ls --block-size=K /mnt/persistent-volumes
total 0K
$ df -h -BK /mnt/persistent-volumes/
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/loop5     20961280K 179204K 20782076K   1% /mnt/persistent-volumes
$ kubectl get csistoragecapacities.storage.k8s.io  csisc-6pnmh -ogo-template --template '{{ .capacity }}'
15718392Ki%

This can lead to the driver being unable to provision PVs despite having enough available storage.

The expected behaviour is for the driver to update capacity on changes in the filesystem of its volumes directory.
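
Until that's fixed, restarting the driver so it rebuilds its state from the volumes directory looks like a plausible workaround (assuming state is only read at startup):

kubectl -n local-csi-driver rollout restart daemonset.apps/local-csi-driver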

cc @zimnx

CVE-2024-24788: A malformed DNS message in response to a query can cause the Lookup functions to get stuck in an infinite loop.

https://prow.scylla-operator.scylladb.com/view/gs/scylla-operator-prow/logs/ci-k8s-local-volume-provisioner-master-govulncheck/1788146917583097856

Vulnerability #1: GO-2024-2824
    Malformed DNS message can cause infinite loop in net
  More info: https://pkg.go.dev/vuln/GO-2024-2824
  Standard library
    Found in: [email protected]
    Fixed in: [email protected]
    Example traces found:
      #1: pkg/thirdparty/github.com/onsi/ginkgo/v2/exposedinternal/parallel_support/rpc_client.go:31:39: parallel_support.rpcClient.Connect calls rpc.DialHTTPPath, which calls net.Dial
      #2: test/e2e/set/localdriver/quotas.go:27:19: localdriver.init calls ginkgo.Describe, which eventually calls net.DialTimeout
      #3: pkg/thirdparty/github.com/onsi/ginkgo/v2/exposedinternal/parallel_support/http_client.go:146:24: parallel_support.httpClient.Write calls http.Post, which eventually calls net.Dialer.Dial
      #4: pkg/thirdparty/github.com/onsi/ginkgo/v2/exposedinternal/parallel_support/http_client.go:146:24: parallel_support.httpClient.Write calls http.Post, which eventually calls net.Dialer.DialContext
      #5: pkg/thirdparty/github.com/onsi/ginkgo/v2/exposedinternal/parallel_support/http_server.go:31:29: parallel_support.newHttpServer calls net.Listen
      #6: pkg/cmd/local-csi-driver/driver.go:167:28: local.LocalDriverOptions.run calls net.ListenConfig.Listen
      #7: pkg/signals/signal.go:36:9: signals.StopChannel calls sync.Once.Do, which eventually calls net.LookupIP
      #8: test/e2e/set/localdriver/quotas.go:27:19: localdriver.init calls ginkgo.Describe, which eventually calls net.Resolver.LookupHost
      #9: test/e2e/set/localdriver/quotas.go:27:19: localdriver.init calls ginkgo.Describe, which eventually calls net.Resolver.LookupSRV
      #10: test/e2e/set/localdriver/quotas.go:27:19: localdriver.init calls ginkgo.Describe, which eventually calls net.Resolver.LookupTXT
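
Per the report, the remediation is to rebuild with a patched toolchain (go1.21.10 or later) and re-run the scanner; a hedged sketch (whether this repo pins its toolchain via go.mod is an assumption):

go mod edit -toolchain=go1.21.10
govulncheck ./...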

/kind feature
/priority critical-urgent
/assign

Use DaemonSet MaxSurge for stability and availability

We should use maxSurge in our deploy files to make sure we:

a) spin up a new Pod on the node while the old one keeps working. This is important to reduce downtime, since e.g. an image pull can take long, hit an error, back off, or the new Pod may not start at all.

b) make sure the new Pod gets far enough (and isn't broken) before we tear down the old Pod.
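
A sketch of the change as a strategic-merge patch (for DaemonSets, maxSurge requires maxUnavailable to be 0); in practice this belongs in the deploy manifests rather than a live patch:

kubectl -n local-csi-driver patch daemonset.apps/local-csi-driver \
  -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"maxSurge":1,"maxUnavailable":0}}}}'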

[Flake] Sanity tests

Sanity tests are flaky on current master (ba394ac).

To reproduce:

$  go run ./vendor/github.com/onsi/ginkgo/v2/ginkgo --until-it-fails -race ./test/sanity/...
...
Running Suite: Sanity Suite - /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/test/sanity/set/localdriver
===========================================================================================================================
Random Seed: 1715243119

Will run 83 of 84 specs
••SS
------------------------------
• [FAILED] [60.080 seconds]
Local CSI Driver CSI sanity Controller Service [Controller Server] [BeforeEach] ListVolumes check the presence of new volumes and absence of deleted ones in the volume list
  [BeforeEach] /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/tests.go:46
  [It] /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/controller.go:209

  Timeline >>
  STEP: connecting to CSI driver @ 05/09/24 10:25:19.402
  [FAILED] in [BeforeEach] - /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265 @ 05/09/24 10:26:19.458
  << Timeline

  [FAILED] Unexpected error:
      <*errors.errorString | 0xc0004ded00>:
      Connection timed out
      {
          s: "Connection timed out",
      }
  occurred
  In [BeforeEach] at: /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265 @ 05/09/24 10:26:19.458
------------------------------
P [PENDING]
Local CSI Driver CSI sanity Controller Service [Controller Server] ListVolumes pagination should detect volumes added between pages and accept tokens when the last volume from a page is deleted
/home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/controller.go:268
------------------------------
•••••••SSSS•••••••SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS•••••••••••SSSSSSSSSSSSS••

Summarizing 1 Failure:
  [FAIL] Local CSI Driver CSI sanity Controller Service [Controller Server] [BeforeEach] ListVolumes check the presence of new volumes and absence of deleted ones in the volume list
  /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265

Ran 30 of 84 Specs in 61.337 seconds
FAIL! -- 29 Passed | 1 Failed | 1 Pending | 53 Skipped
--- FAIL: TestSanity (61.37s)
FAIL

Tests failed on attempt #3


Ginkgo ran 1 suite in 1m7.147438575s

Test Suite Failed
exit status 1

From what I've seen, different tests fail on connection timeouts - it's not one specific test.

/kind flake

Support automatic data volume expansion

The driver should be able to expand the volume size. With XFS quotas this should be fairly easy. (A user-side sketch follows the capability list below.)

Driver should:

  • Implement VolumeExpansion plugin capability.
  • Implement EXPAND_VOLUME controller capability or implement EXPAND_VOLUME node capability or both.
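
From the user's side, expansion would then be triggered by growing the PVC, provided its StorageClass sets allowVolumeExpansion: true; a sketch with placeholder names:

kubectl patch pvc example-pvc \
  -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'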

Error: can't listen on "/csi/csi.sock" using unix protocol: listen unix /csi/csi.sock: bind: address already in use

Issue description

One of the CSI driver instances failed to start with the following error:

I0623 17:30:37.071240       1 local-csi-driver/driver.go:119] "Driver started" command="local-csi-driver" version="\"v0.1.0-beta.1-0-ga59b0f8\""
I0623 17:30:37.071291       1 flag/flags.go:64] FLAG: --driver-name="local.csi.scylladb.com"
I0623 17:30:37.071298       1 flag/flags.go:64] FLAG: --help="false"
I0623 17:30:37.071303       1 flag/flags.go:64] FLAG: --listen="/csi/csi.sock"
I0623 17:30:37.071306       1 flag/flags.go:64] FLAG: --loglevel="2"
I0623 17:30:37.071310       1 flag/flags.go:64] FLAG: --node-name="ip-10-12-7-103.ec2.internal"
I0623 17:30:37.071314       1 flag/flags.go:64] FLAG: --v="2"
I0623 17:30:37.071317       1 flag/flags.go:64] FLAG: --volumes-dir="/mnt/persistent-volumes"
Error: can't listen on "/csi/csi.sock" using unix protocol: listen unix /csi/csi.sock: bind: address already in use
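
CSI drivers commonly mitigate this by unlinking a stale socket before binding. Done manually on the affected node it might look like the line below; the host-side path is an assumption (the in-container path is /csi/csi.sock per the error above):

rm -f /var/lib/kubelet/plugins/local.csi.scylladb.com/csi.sock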

Its liveness probe results:

I0623 16:26:34.852622       1 main.go:149] calling CSI driver to discover driver name
I0623 16:26:34.853747       1 main.go:155] CSI driver name: "local.csi.scylladb.com"
I0623 16:26:34.853773       1 main.go:183] ServeMux listening at "0.0.0.0:9809"
W0623 16:46:30.509744       1 connection.go:173] Still connecting to unix:///csi/csi.sock
E0623 16:47:08.559050       1 main.go:64] failed to establish connection to CSI driver: context deadline exceeded
E0623 16:47:15.126560       1 main.go:64] failed to establish connection to CSI driver: context canceled
W0623 16:47:15.345008       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:15.424984       1 connection.go:173] Still connecting to unix:///csi/csi.sock
E0623 16:47:15.430405       1 main.go:64] failed to establish connection to CSI driver: context canceled
E0623 16:47:15.474940       1 main.go:64] failed to establish connection to CSI driver: context canceled
E0623 16:47:15.561968       1 connection.go:132] Lost connection to unix:///csi/csi.sock.
E0623 16:47:15.593518       1 main.go:64] failed to establish connection to CSI driver: context canceled
W0623 16:47:18.054903       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:25.236508       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:25.349541       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:25.500791       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:25.660422       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:30.794906       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:35.503076       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:35.595067       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:35.803256       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:36.158894       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:38.055628       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:50.060415       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:48:25.593549       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:47:53.150225       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:48:07.698577       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:48:22.719607       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:48:24.376925       1 connection.go:173] Still connecting to unix:///csi/csi.sock
E0623 16:48:25.422562       1 main.go:64] failed to establish connection to CSI driver: context canceled
E0623 16:48:25.422706       1 main.go:64] failed to establish connection to CSI driver: context canceled
E0623 16:48:25.423525       1 main.go:64] failed to establish connection to CSI driver: context canceled
E0623 16:48:25.485253       1 main.go:64] failed to establish connection to CSI driver: context canceled
E0623 16:48:25.424495       1 main.go:64] failed to establish connection to CSI driver: context canceled
W0623 16:48:25.714499       1 connection.go:173] Still connecting to unix:///csi/csi.sock
W0623 16:48:25.726677       1 connection.go:173] Still connecting to unix:///csi/csi.sock
...
< ~2600 more 'Still connecting to unix:///csi/csi.sock' messages>
...
W0623 17:32:08.055199       1 connection.go:173] Still connecting to unix:///csi/csi.sock

Impact

Breaks Scylla member creation.

How frequently does it reproduce?

~5-10%

Installation details

Kernel Version: 5.10.179-168.710.amzn2.x86_64
Scylla version (or git commit hash): 2022.2.9-20230618.843304f9f734 with build-id a34753ee38bccbaf461e04ae0e63e17afe45e048

K8S local-volume-provisioner image: docker.io/scylladb/k8s-local-volume-provisioner:0.1.0-rc.0

Operator Image: scylladb/scylla-operator:1.9.0-rc.1
Operator Helm Version: 1.9.0-rc.1
Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest
Cluster size: 3 nodes (i3.2xlarge)

OS / Image: `` (k8s-eks: undefined_region)

Test: perf-regression-throughput-eks
Test id: 657ce2c4-2bbf-4941-8d23-8be7f0a2487d
Test name: scylla-operator/operator-1.9/performance/perf-regression-throughput-eks
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 657ce2c4-2bbf-4941-8d23-8be7f0a2487d
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 657ce2c4-2bbf-4941-8d23-8be7f0a2487d

Logs:

Jenkins job URL
Argus

Unify binary, repo and image naming

We should unify the packages, binary and image name for consistency.

At this point the difference is confusing (it has historical reasons, dating back to when the repo name was requested):

  • repo name: github.com/scylladb/k8s-local-volume-provisioner
  • image name: docker.io/scylladb/k8s-local-volume-provisioner
  • binary name: local-csi-driver

CSIStorageCapacity API objects for Nodes that no longer exist stay indefinitely

We are having issues with leftovers of:

apiVersion: storage.k8s.io/v1beta1
capacity: 857372994Ki
kind: CSIStorageCapacity
metadata:
  creationTimestamp: '2023-05-02T09:40:14Z'
  generateName: csisc-
  labels:
    csi.storage.k8s.io/drivername: local.csi.scylladb.com
    csi.storage.k8s.io/managed-by: external-provisioner-
  name: csisc-52vb8

They should be bound to their node and deleted when the node is deleted.
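
Until that's fixed, the leftovers can at least be listed (and then deleted by hand) via the drivername label visible on the object above:

kubectl get csistoragecapacities.storage.k8s.io -A \
  -l csi.storage.k8s.io/drivername=local.csi.scylladb.com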

`xfs-disk-setup` isn't reentrant

When I was testing CI setups, xfs-disk-setup failed to roll out a second time on the same node because the mount was already set up. We should make it reentrant.
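
A minimal reentrancy guard sketch for the setup script (mount point is an assumption): bail out early when the target is already mounted.

# Skip the image/mkfs/mount steps when a previous run already set up the disk.
if mountpoint -q /mnt/persistent-volumes; then
  echo "xfs-disk-setup: already mounted, nothing to do"
  exit 0
fi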
