gcp-compute-persistent-disk-csi-driver's Introduction

Google Compute Engine Persistent Disk CSI Driver

WARNING: Manual deployment of this driver to your GKE cluster is not recommended. Instead, users should use GKE to automatically deploy and manage the GCE PD CSI Driver (see GKE Docs).

DISCLAIMER: Manual deployment of the driver to your cluster is not officially supported by Google.

The Google Compute Engine Persistent Disk CSI Driver is a CSI Specification compliant driver used by Container Orchestrators to manage the lifecycle of Google Compute Engine Persistent Disks.

Project Status

Status: GA
Latest stable image: registry.k8s.io/cloud-provider-gcp/gcp-compute-persistent-disk-csi-driver:v1.15.0

Test Status

Kubernetes Integration

| Driver Version | Kubernetes Version | Test Status |
|---|---|---|
| HEAD | Latest HEAD | Test Status |
| HEAD | stable-master HEAD (Migration ON) | Test Status |

CSI Compatibility

This plugin is compatible with CSI versions v1.2.0, v1.1.0, and v1.0.0

Kubernetes Version Recommendations

The latest stable release of this driver is recommended for the latest stable Kubernetes version. For previous Kubernetes versions, we recommend the following driver versions.

| Kubernetes Version | PD CSI Driver Version |
|---|---|
| HEAD | v1.13.x |
| 1.29 | v1.12.x |
| 1.28 | v1.12.x |
| 1.27 | v1.10.x |

The manifest bundle that captures all the driver components (the driver pod, which includes the csi-provisioner, csi-resizer, csi-snapshotter, gce-pd-driver, and csi-driver-registrar containers; the CSI driver object; RBAC rules; pod security policies; etc.) for the latest stable release can be picked up from the master branch overlays directory.

Known Issues

See GitHub Issues.

Plugin Features

CreateVolume Parameters

| Parameter | Values | Default | Description |
|---|---|---|---|
| type | Any PD type (see GCP documentation), e.g. pd-ssd, pd-balanced | pd-standard | Type allows you to choose between standard Persistent Disks or Solid State Drive Persistent Disks. |
| replication-type | none OR regional-pd | none | Replication type allows you to choose between Zonal Persistent Disks or Regional Persistent Disks. |
| disk-encryption-kms-key | Fully qualified resource identifier for the key to use to encrypt new disks | Empty string | Encrypt the disk using a Customer Managed Encryption Key (CMEK). See GKE Docs for details. |
| labels | key1=value1,key2=value2 | | Labels allow you to assign custom GCE Disk labels. |
| provisioned-iops-on-create | string (int64 format), typically between 10,000 and 120,000 | | Indicates how many IOPS to provision for the disk. See the Extreme persistent disk documentation for details, including valid ranges for IOPS. |
| provisioned-throughput-on-create | string (int64 format), typically between 1 and 7,124 MB per second | | Indicates how much throughput to provision for the disk. See the hyperdisk documentation for details, including valid ranges for throughput. |
| resource-tags | <parent_id1>/<tag_key1>/<tag_value1>,<parent_id2>/<tag_key2>/<tag_value2> | | Resource tags allow you to attach user-defined tags to each Compute Disk, Image and Snapshot. See Tags overview, Creating and managing tags. |
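For reference, these StorageClass parameters reach the driver in the Parameters map of the CSI CreateVolumeRequest. Below is a minimal sketch using the CSI spec Go bindings; the volume name, size, and parameter values are illustrative only and are not code from this repository.

package sketch

import csi "github.com/container-storage-interface/spec/lib/go/csi"

// exampleCreateVolumeRequest shows how StorageClass parameters surface in the
// Parameters map of the CSI CreateVolumeRequest handled by the driver.
func exampleCreateVolumeRequest() *csi.CreateVolumeRequest {
	return &csi.CreateVolumeRequest{
		Name: "example-vol", // hypothetical volume name
		CapacityRange: &csi.CapacityRange{
			RequiredBytes: 10 * 1024 * 1024 * 1024, // 10 GiB
		},
		Parameters: map[string]string{
			"type":             "pd-ssd",
			"replication-type": "regional-pd",
			"labels":           "key1=value1,key2=value2",
		},
	}
}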

Topology

This driver supports only one topology key, topology.gke.io/zone, which represents availability by zone (e.g. us-central1-c).
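For illustration, a CSI driver advertises this key in its NodeGetInfo response. The sketch below uses the CSI spec Go bindings; the node ID and zone are made up, and this is not the driver's actual implementation.

package sketch

import csi "github.com/container-storage-interface/spec/lib/go/csi"

// exampleNodeGetInfoResponse shows the single topology segment exposed by the driver.
func exampleNodeGetInfoResponse() *csi.NodeGetInfoResponse {
	return &csi.NodeGetInfoResponse{
		NodeId: "projects/my-project/zones/us-central1-c/instances/my-node", // illustrative
		AccessibleTopology: &csi.Topology{
			Segments: map[string]string{"topology.gke.io/zone": "us-central1-c"},
		},
	}
}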

CSI Windows Support

The GCE PD driver supports CSI on Windows via CSI Proxy (https://github.com/kubernetes-csi/csi-proxy). It requires csi-proxy.exe to be installed on every Windows node. For details, see the CSI Windows page (docs/kubernetes/user-guides/windows.md).

Features in Development

| Feature | Stage | Min Kubernetes Master Version | Min Kubernetes Nodes Version | Min Driver Version | Deployment Overlay |
|---|---|---|---|---|---|
| Snapshots | GA | 1.17 | Any | v1.0.0 | stable-1-21, stable-1-22, stable-1-23, stable-master |
| Clones | GA | 1.18 | Any | v1.4.0 | stable-1-21, stable-1-22, stable-1-23, stable-master |
| Resize (Expand) | Beta | 1.16 | 1.16 | v0.7.0 | stable-1-21, stable-1-22, stable-1-23, stable-master |
| Windows* | GA | 1.19 | 1.19 | v1.1.0 | stable-1-21, stable-1-22, stable-1-23, stable-master |

* For Windows, it is recommended to use this driver with CSI Proxy v0.2.2+. The master version of the driver requires the disk v1beta2 API group, which is only available in CSI Proxy v0.2.2+.

Future Features

See GitHub Issues.

Driver Deployment

As part of the deployment process, the driver is deployed in a newly created namespace by default. The namespace will be deleted as part of the cleanup process.

Controller-level and node-level deployments will both have priorityClassName set, and the corresponding priority value is close to the maximum possible for user-created PriorityClasses.

Further Documentation

Local Development

For releasing new versions of this driver, Googlers should consult go/pdcsi-oss-release-process.

Kubernetes

User Guides

Driver Development

gcp-compute-persistent-disk-csi-driver's Issues

Support Topology

// ACCESSIBILITY_CONSTRAINTS indicates that the volumes for this
      // plugin may not be equally accessible by all nodes in the
      // cluster. The CO MUST use the topology information returned by
      // CreateVolumeRequest along with the topology information
      // returned by NodeGetInfo to ensure that a given volume is
      // accessible from a given node when scheduling workloads.
      ACCESSIBILITY_CONSTRAINTS = 2;

/assign

WaitForOp should also error on Status Error

For some operations, the operation poll doesn't actually return an error, but there is an error field on the op itself.

"You can see that a 200 OK was returned and the operation has status=DONE, but there's an "error" field on the operation you should be looking at.

So your code should probably look at pollOp.error in addition to err.

There are additional fields "httpErrorStatusCode" and "httpErrorMessage". Perhaps it should be 403 instead of 400, but that's a separate issue."

Kubernetes recently solved by:

if op.Error != nil && len(op.Error.Errors) > 0 && op.Error.Errors[0] != nil {
	e := op.Error.Errors[0]
	o.err = &GCEOperationError{HTTPStatusCode: op.HTTPStatusCode, Code: e.Code, Message: e.Message}
}
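For context, here is a minimal sketch of a poll loop that applies the same check, assuming the google.golang.org/api/compute/v1 client; names and error wording are illustrative, not this driver's actual WaitForOp.

package sketch

import (
	"context"
	"fmt"
	"time"

	compute "google.golang.org/api/compute/v1"
)

// waitForZoneOp polls a zone operation and treats a populated op.Error as a
// failure even when the poll call itself returned no error.
func waitForZoneOp(ctx context.Context, svc *compute.Service, project, zone, opName string) error {
	for {
		op, err := svc.ZoneOperations.Get(project, zone, opName).Context(ctx).Do()
		if err != nil {
			return err // the poll itself failed
		}
		if op.Status == "DONE" {
			if op.Error != nil && len(op.Error.Errors) > 0 {
				e := op.Error.Errors[0]
				return fmt.Errorf("operation %s failed (HTTP %d): %s: %s",
					opName, op.HttpErrorStatusCode, e.Code, e.Message)
			}
			return nil
		}
		time.Sleep(time.Second)
	}
}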

Use Boskos Client Library to Lease Project in CI

Right now CI just creates instances in the dyzz-test project for testing; it would be better to "loan" a temporary tenant project using Boskos when the tests are being run in CI.
When the loaned project is released there is an automatic job that cleans up so we don't accidentally leave test artifacts around.

/cc @krzyzacy

Add Flag to Driver for Controller or Node Mode

Add a flag --node to turn on the node server, --controller to turn on the controller server. They should both default to false.

Currently, every driver instance runs the node, controller, and identity servers by default.
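A minimal sketch of the proposed flags follows; the server start-up wiring is a hypothetical placeholder, not code from this repository.

package main

import (
	"flag"
	"log"
)

var (
	runController = flag.Bool("controller", false, "run the CSI controller server")
	runNode       = flag.Bool("node", false, "run the CSI node server")
)

func main() {
	flag.Parse()
	if !*runController && !*runNode {
		log.Fatal("at least one of --controller or --node must be set")
	}
	// The identity server would always run; the controller and node servers
	// would start only when their flag is set.
}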

Node DaemonSet in Kubernetes Deployment Should Not Run On Master

Currently the DaemonSet constantly attempts to create the pod on the master and fails because of resource limits. We should stop scheduling on the master so that it does not continually retry pod creation.

This is high priority, as we have seen the DaemonSet pod kick the API server off the master, putting the whole cluster in a bad state.
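One way to express this, sketched with the k8s.io/api/core/v1 Go types, is a required node affinity that excludes nodes carrying the master role label. The label key reflects the common convention at the time and is an assumption, not this repo's manifest.

package sketch

import corev1 "k8s.io/api/core/v1"

// noMasterAffinity returns an affinity that keeps the node DaemonSet pod off
// nodes labeled with the master role (label key is an assumption).
func noMasterAffinity() *corev1.Affinity {
	return &corev1.Affinity{
		NodeAffinity: &corev1.NodeAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
				NodeSelectorTerms: []corev1.NodeSelectorTerm{{
					MatchExpressions: []corev1.NodeSelectorRequirement{{
						Key:      "node-role.kubernetes.io/master",
						Operator: corev1.NodeSelectorOpDoesNotExist,
					}},
				}},
			},
		},
	}
}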

Convert all "TODO" to Github issues

The codebase is currently littered with TODOs, some attributed to nobody and some to dyzz.

We should clean up the TODOs, create a GitHub issue for each, and then link the GitHub issue in the TODO for tracking purposes.

Support RePD

Support RePD (Regional Persistent Disk) for GCE PD.

This will likely depend on the topology work as well.

Kubernetes Deployment Readme Suggests Too Broad Permissions

Originally tried the "compute-admin" and "compute-storage-admin" scopes, but they seemed not to contain enough permissions for attach.

This is a tracking bug to revisit tightening the scopes required to deploy. If "Attach" is not currently supported in the "compute-admin" scope a bug should be opened against GCE Permissions because it definitely should be.

Improve Deployment Scripts

  • Make the GCP service account token file user configurable
  • Figure out actual minimal permissions set required for GCP service account
  • Make namespace configurable

Permission problem using gcp-pd-csi: Required 'compute.disks.get'

Hello,
Trying to deploy gcp-pd-csi to a GKE cluster. The deployment works, but trying to attach a disk I get the following error from the attacher:

GRPC error: rpc error: code = Unknown desc = googleapi: Error 403: Required 'compute.disks.get' permission for 'projects/XPROJECT/zones/europe-west3-a/disks/YDISK

(This is directly from the attacher log, but the same error propagates up to the pod creation error.)

I'm using a service account in the deployment which is set up with the roles: Kubernetes Engine Admin, Service Account User, Storage Admin.

I'm trying to debug whether the problem is in the deployment or whether the service account is missing certain rights.

Improve Documentation

  • Add Known Issues
  • Add Project Status
  • Add Plugin Features (supported parameters)
  • Add Future Features
  • Add User Guide
    • Improve setup scripts to work every time, with reasonable error messages when variables are not defined
  • Add Kubernetes Development
  • Add Dependency Management Information

Set MaxVolumesPerNode on NodeGetInfo call based on Node Type

Currently NodeGetInfoResponse returns the default of 0 for MaxVolumesPerNode so the CO will decide how many volumes can be published on a node.

For GCE we need to return a different number based on node type, as the maximum number of attachable volumes depends on the number of vCPUs the instance has.

For the actual limits, see the "persistent disk limits" section of https://cloud.google.com/compute/docs/disks/.

You should be able to GET the instance from the cloud and pull the number of vCPUs from that.
Bonus: we seem to need information from the node object a lot; caching the relevant information somewhere would be nice, maybe in the GCENodeServer object.
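A rough sketch of that lookup, assuming the google.golang.org/api/compute/v1 client; the 16/128 attach limits are illustrative placeholders, and the authoritative numbers live in the "persistent disk limits" documentation.

package sketch

import (
	"context"
	"path"
	"strings"

	compute "google.golang.org/api/compute/v1"
)

// maxVolumesPerNode derives an attach limit for NodeGetInfo's MaxVolumesPerNode
// from the instance's machine type.
func maxVolumesPerNode(ctx context.Context, svc *compute.Service, project, zone, instanceName string) (int64, error) {
	inst, err := svc.Instances.Get(project, zone, instanceName).Context(ctx).Do()
	if err != nil {
		return 0, err
	}
	machineType := path.Base(inst.MachineType) // MachineType is returned as a URL
	if strings.HasPrefix(machineType, "f1-micro") || strings.HasPrefix(machineType, "g1-small") {
		return 16, nil // shared-core machine types attach fewer disks (illustrative)
	}
	return 128, nil // assumed general limit; see the GCE docs for per-family values
}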

Support Snapshots

 rpc CreateSnapshot (CreateSnapshotRequest)
    returns (CreateSnapshotResponse) {}

  rpc DeleteSnapshot (DeleteSnapshotRequest)
    returns (DeleteSnapshotResponse) {}

  rpc ListSnapshots (ListSnapshotsRequest)
    returns (ListSnapshotsResponse) {}
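For context, a CreateSnapshot implementation would likely map onto the GCE disks.createSnapshot API. Below is a minimal sketch, assuming the google.golang.org/api/compute/v1 client and ignoring idempotency, readiness polling, and CSI response conversion.

package sketch

import (
	"context"

	compute "google.golang.org/api/compute/v1"
)

// createSnapshot kicks off a GCE snapshot of a zonal disk and returns the
// resulting operation, which the caller would still need to wait on.
func createSnapshot(ctx context.Context, svc *compute.Service, project, zone, diskName, snapName string) (*compute.Operation, error) {
	snap := &compute.Snapshot{Name: snapName}
	return svc.Disks.CreateSnapshot(project, zone, diskName, snap).Context(ctx).Do()
}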

Update Production Push Location to `staging-k8s.gcr.io`

There is a new process to push to gcr.io/google-containers: first push to staging-k8s.gcr.io; then, after the image has been reviewed, there are some additional steps to move it from staging to prod.

/assign

E2E test should test with service account made with setup-project script

The E2E test should be as close to the production environment as possible. This would require it to use a service account with the same roles as we provide in the setup-project script.

However, we probably do not want to just run the setup-project script again without thinking, because this could be highly disruptive to production service accounts. It would be good to create a special service account with the same permissions just for the test, and make sure that it is deleted afterwards.

/cc @msau42

Improve Base Image of Container

Currently the container uses the fedora:26 base image. This is not ideal.

We might want to change it to something more lightweight/secure like alpine or scratch.

/cc @msau42 might have some good ideas

Improve cloud provider layering

It may be cleaner to encapsulate the WaitForOp in the cloud provider CreateDisk method, instead of having the driver do it. This is more in line with how the in-tree driver is implemented.

Makefile prod-build-container fails

$ make prod-build-container 
mkdir -p bin
go build -o bin/gce-pd-csi-driver ./cmd/
go test -c sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/test/e2e -o bin/e2e.test
docker build -t gcr.io/****/volume-csi/compute-persistent-disk-csi-driver:v0.2.0.saad-dev
"docker build" requires exactly 1 argument.
See 'docker build --help'.

Usage:  docker build [OPTIONS] PATH | URL | - [flags]

Build an image from a Dockerfile
Makefile:34: recipe for target 'prod-build-container' failed
make: *** [prod-build-container] Error 1

Run E2E Tests With Containerized Driver

Currently the E2E tests just copy the binary onto an instance to run the tests. They should run the tests with the containerized version of the driver, because that is what the end user will likely be consuming; it will also detect errors in the containerization process.

Give more descriptive filenames for sample deployment

These are really generic names:
$ export SA_FILE=~/.../cloud-sa.json
$ export GCEPD_SA_NAME=sample-service-account

It may be better to prefix them with "gce-pd-csi" or something like that.

Also, as a side note: because the SA_FILE filename must be cloud-sa.json, it may be better to make the directory configurable rather than the filename.

Verify udevadm works as intended in driver

There is some manual udevadm triggering in the mount manager code, due to a preexisting issue of devices not showing up properly. We need to verify that this udevadm trigger behavior is still called and works properly in this driver.

Split CI Tests Into Separate Runs

Right now all CI Tests are running under one job pull-gcp-compute-persistent-disk-csi-driver-test.

We should split them up based on test type. Ex.

pull-gcp-compute-persistent-disk-csi-driver-sanity
pull-gcp-compute-persistent-disk-csi-driver-e2e
pull-gcp-compute-persistent-disk-csi-driver-kubernetes

Each of these should invoke a go test or equivalent directly so that test results are streamed through Gubernator properly.

Deployment of driver fails on GKE: assumes that the creator is cluster-admin

From Leonid:

So, apparently, the deploy_driver.sh example under gcp-compute-persistent-disk-csi-driver assumes that the creator is cluster-admin, which doesn't look to be the case for GKE-gcloud-connect.
Manually creating relevant binding:
kubectl create clusterrolebinding cluster-admin-binding   --clusterrole cluster-admin   --user $(gcloud config get-value account)

seems to be enough to make the deployment work.

I verified that his recommendation got me past this issue.

Support partition parameter

The GCE PD in-tree driver supports partitioning:

    gcePersistentDisk:
      pdName: partitioned-disk
      fsType: ext4
      partition: 2

Deployment of driver fails on GKE: Unknown user

I see the following when deploying to GKE:

2018-07-16 07:35:05.000 PDT
Unknown user "system:serviceaccount:default:csi-controller-sa"

But kubectl shows that the account exists:

$ kubectl get sa
NAME                SECRETS   AGE
csi-controller-sa   1         8m
csi-node-sa         1         8m
default             1         1h

Also see this error:

E  github.com/kubernetes-csi/external-provisioner/vendor/github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:496: Failed to list *v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:serviceaccount:default:csi-controller-sa" cannot list persistentvolumeclaims at the cluster scope: [clusterrole.rbac.authorization.k8s.io "system:csi-external-attacher" not found, clusterrole.rbac.authorization.k8s.io "system:csi-external-provisioner" not found] 
  undefined
E  Unknown user "system:serviceaccount:default:csi-controller-sa"
 
  undefined

But kubectl shows the clusterrolebinding exists

$ kubectl describe clusterrolebindings csi-controller-attacher-binding
Name:         csi-controller-attacher-binding
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"name":"csi-controller-attacher-binding","namespa...
Role:
  Kind:  ClusterRole
  Name:  system:csi-external-attacher
Subjects:
  Kind            Name               Namespace
  ----            ----               ---------
  ServiceAccount  csi-controller-sa  default
$ kubectl describe clusterrolebindings csi-controller-provisioner-binding
Name:         csi-controller-provisioner-binding
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"name":"csi-controller-provisioner-binding","name...
Role:
  Kind:  ClusterRole
  Name:  system:csi-external-provisioner
Subjects:
  Kind            Name               Namespace
  ----            ----               ---------
  ServiceAccount  csi-controller-sa  default

E2E Framework Overhaul

The E2E framework should actually run locally; we can create instances (possibly multi-zonal) that run the driver binary and expose its endpoint as an SSL-secured TCP endpoint:
https://grpc.io/docs/guides/auth.html#examples

This way the e2e tests can actually spin these instances up, and spin up a gRPC client that we use to call these instances for the tests.

This makes the multi-zonal tests easier to create, and means we don't have to stream test commands and results back and forth between the instances.
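A minimal sketch of the TLS-secured gRPC client side, assuming grpc-go and the CSI spec Go bindings; the certificate path and address handling are placeholders, not this repo's test framework.

package sketch

import (
	"context"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// dialDriver connects to a remote driver endpoint over TLS and returns a CSI
// identity client the tests could use.
func dialDriver(ctx context.Context, addr, caCertFile string) (csi.IdentityClient, error) {
	creds, err := credentials.NewClientTLSFromFile(caCertFile, "")
	if err != nil {
		return nil, err
	}
	conn, err := grpc.DialContext(ctx, addr, grpc.WithTransportCredentials(creds))
	if err != nil {
		return nil, err
	}
	return csi.NewIdentityClient(conn), nil
}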

ControllerPublishVolume should Verify Disk Attached

After the attach call and waiting for the attach op to complete, sometimes the attach will not error but the disk will still not be attached.

We should add an additional verification to make sure that the disk has been attached. We can see this on the disk object:

// Users: [Output Only] Links to the users of the disk (attached
// instances) in form: project/zones/zone/instances/instance
Users []string `json:"users,omitempty"`

Or, on the instance, there is the AttachedDisks field.
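A rough sketch of that verification, assuming the google.golang.org/api/compute/v1 client; the matching logic and error wording are illustrative, not the driver's actual code.

package sketch

import (
	"context"
	"fmt"
	"strings"

	compute "google.golang.org/api/compute/v1"
)

// verifyDiskAttached re-reads the disk after the attach operation completes and
// confirms the target instance appears in the disk's Users list.
func verifyDiskAttached(ctx context.Context, svc *compute.Service, project, zone, diskName, instanceName string) error {
	disk, err := svc.Disks.Get(project, zone, diskName).Context(ctx).Do()
	if err != nil {
		return err
	}
	for _, user := range disk.Users {
		// Users entries look like ".../projects/<p>/zones/<z>/instances/<instance>".
		if strings.HasSuffix(user, "/instances/"+instanceName) {
			return nil
		}
	}
	return fmt.Errorf("disk %s is not attached to instance %s", diskName, instanceName)
}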
