gcp-compute-persistent-disk-csi-driver's Introduction

Google Compute Engine Persistent Disk CSI Driver

WARNING: Manual deployment of this driver to your GKE cluster is not recommended. Instead, users should use GKE to automatically deploy and manage the GCE PD CSI Driver (see GKE Docs).

DISCLAIMER: Manual deployment of the driver to your cluster is not officially supported by Google.

The Google Compute Engine Persistent Disk CSI Driver is a CSI Specification compliant driver used by Container Orchestrators to manage the lifecycle of Google Compute Engine Persistent Disks.

Project Status

Status: GA
Latest stable image: registry.k8s.io/cloud-provider-gcp/gcp-compute-persistent-disk-csi-driver:v1.15.0

Test Status

Kubernetes Integration

| Driver Version | Kubernetes Version | Test Status |
|---|---|---|
| HEAD | Latest HEAD | Test Status |
| HEAD | stable-master HEAD (Migration ON) | Test Status |

CSI Compatibility

This plugin is compatible with CSI versions v1.2.0, v1.1.0, and v1.0.0

Kubernetes Version Recommendations

The latest stable release of this driver is recommended for the latest stable Kubernetes version. For previous Kubernetes versions, we recommend the following driver versions.

| Kubernetes Version | PD CSI Driver Version |
|---|---|
| HEAD | v1.13.x |
| 1.29 | v1.12.x |
| 1.28 | v1.12.x |
| 1.27 | v1.10.x |

The manifest bundle that captures all the driver components (the driver pod, which includes the csi-provisioner, csi-resizer, csi-snapshotter, gce-pd-driver, and csi-driver-registrar containers; the CSI driver object; RBAC rules; pod security policies; etc.) for the latest stable release can be picked up from the master branch overlays directory.

Known Issues

See GitHub Issues.

Plugin Features

CreateVolume Parameters

| Parameter | Values | Default | Description |
|---|---|---|---|
| type | Any PD type (see GCP documentation), e.g. pd-ssd, pd-balanced | pd-standard | Type allows you to choose between standard Persistent Disks or Solid State Drive Persistent Disks. |
| replication-type | none OR regional-pd | none | Replication type allows you to choose between Zonal Persistent Disks or Regional Persistent Disks. |
| disk-encryption-kms-key | Fully qualified resource identifier for the key to use to encrypt new disks | Empty string | Encrypt the disk using a Customer Managed Encryption Key (CMEK). See GKE Docs for details. |
| labels | key1=value1,key2=value2 | | Labels allow you to assign custom GCE Disk labels. |
| provisioned-iops-on-create | string (int64 format), typically between 10,000 and 120,000 | | Indicates how many IOPS to provision for the disk. See the Extreme persistent disk documentation for details, including valid ranges for IOPS. |
| provisioned-throughput-on-create | string (int64 format), typically between 1 and 7,124 MB per second | | Indicates how much throughput to provision for the disk. See the hyperdisk documentation for details, including valid ranges for throughput. |
| resource-tags | <parent_id1>/<tag_key1>/<tag_value1>,<parent_id2>/<tag_key2>/<tag_value2> | | Resource tags allow you to attach user-defined tags to each Compute Disk, Image and Snapshot. See Tags overview, Creating and managing tags. |
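For reference, these StorageClass parameters reach the driver in the Parameters map of the CSI CreateVolumeRequest. Below is a minimal sketch using the CSI spec Go bindings; the volume name, size, and parameter values are illustrative only and are not code from this repository.

package sketch

import csi "github.com/container-storage-interface/spec/lib/go/csi"

// exampleCreateVolumeRequest shows how StorageClass parameters surface in the
// Parameters map of the CSI CreateVolumeRequest handled by the driver.
func exampleCreateVolumeRequest() *csi.CreateVolumeRequest {
	return &csi.CreateVolumeRequest{
		Name: "example-vol", // hypothetical volume name
		CapacityRange: &csi.CapacityRange{
			RequiredBytes: 10 * 1024 * 1024 * 1024, // 10 GiB
		},
		Parameters: map[string]string{
			"type":             "pd-ssd",
			"replication-type": "regional-pd",
			"labels":           "key1=value1,key2=value2",
		},
	}
}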

Topology

This driver supports only one topology key, topology.gke.io/zone, which represents availability by zone (e.g. us-central1-c).
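For illustration, a CSI driver advertises this key in its NodeGetInfo response. The sketch below uses the CSI spec Go bindings; the node ID and zone are made up, and this is not the driver's actual implementation.

package sketch

import csi "github.com/container-storage-interface/spec/lib/go/csi"

// exampleNodeGetInfoResponse shows the single topology segment exposed by the driver.
func exampleNodeGetInfoResponse() *csi.NodeGetInfoResponse {
	return &csi.NodeGetInfoResponse{
		NodeId: "projects/my-project/zones/us-central1-c/instances/my-node", // illustrative
		AccessibleTopology: &csi.Topology{
			Segments: map[string]string{"topology.gke.io/zone": "us-central1-c"},
		},
	}
}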

CSI Windows Support

The GCE PD driver supports CSI on Windows via CSI Proxy (https://github.com/kubernetes-csi/csi-proxy). It requires csi-proxy.exe to be installed on every Windows node. For details, see the CSI Windows page (docs/kubernetes/user-guides/windows.md).

Features in Development

| Feature | Stage | Min Kubernetes Master Version | Min Kubernetes Nodes Version | Min Driver Version | Deployment Overlay |
|---|---|---|---|---|---|
| Snapshots | GA | 1.17 | Any | v1.0.0 | stable-1-21, stable-1-22, stable-1-23, stable-master |
| Clones | GA | 1.18 | Any | v1.4.0 | stable-1-21, stable-1-22, stable-1-23, stable-master |
| Resize (Expand) | Beta | 1.16 | 1.16 | v0.7.0 | stable-1-21, stable-1-22, stable-1-23, stable-master |
| Windows* | GA | 1.19 | 1.19 | v1.1.0 | stable-1-21, stable-1-22, stable-1-23, stable-master |

* For Windows, it is recommended to use this driver with CSI Proxy v0.2.2+. The master version of the driver requires the disk v1beta2 API group, which is only available in CSI Proxy v0.2.2+.

Future Features

See GitHub Issues.

Driver Deployment

As part of the deployment process, the driver is deployed in a newly created namespace by default. The namespace will be deleted as part of the cleanup process.

Controller-level and node-level deployments will both have priorityClassName set, and the corresponding priority value is close to the maximum possible for user-created PriorityClasses.

Further Documentation

Local Development

For releasing new versions of this driver, Googlers should consult go/pdcsi-oss-release-process.

Kubernetes

User Guides

Driver Development

gcp-compute-persistent-disk-csi-driver's Issues

Support Topology

// ACCESSIBILITY_CONSTRAINTS indicates that the volumes for this
      // plugin may not be equally accessible by all nodes in the
      // cluster. The CO MUST use the topology information returned by
      // CreateVolumeRequest along with the topology information
      // returned by NodeGetInfo to ensure that a given volume is
      // accessible from a given node when scheduling workloads.
      ACCESSIBILITY_CONSTRAINTS = 2;

/assign

WaitForOp should also error on Status Error

For some operations, the operation poll doesn't actually return an error, but there is an error field on the op itself.

"You can see that a 200 OK was returned and the operation has status=DONE, but there's an "error" field on the operation you should be looking at.

So your code should probably look at pollOp.error in addition to err.

There are additional fields "httpErrorStatusCode" and "httpErrorMessage". Perhaps it should be 403 instead of 400, but that's a separate issue."

Kubernetes recently solved by:

if op.Error != nil && len(op.Error.Errors) > 0 && op.Error.Errors[0] != nil {
	e := op.Error.Errors[0]
	o.err = &GCEOperationError{HTTPStatusCode: op.HTTPStatusCode, Code: e.Code, Message: e.Message}
}
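For context, here is a minimal sketch of a poll loop that applies the same check, assuming the google.golang.org/api/compute/v1 client; names and error wording are illustrative, not this driver's actual WaitForOp.

package sketch

import (
	"context"
	"fmt"
	"time"

	compute "google.golang.org/api/compute/v1"
)

// waitForZoneOp polls a zone operation and treats a populated op.Error as a
// failure even when the poll call itself returned no error.
func waitForZoneOp(ctx context.Context, svc *compute.Service, project, zone, opName string) error {
	for {
		op, err := svc.ZoneOperations.Get(project, zone, opName).Context(ctx).Do()
		if err != nil {
			return err // the poll itself failed
		}
		if op.Status == "DONE" {
			if op.Error != nil && len(op.Error.Errors) > 0 {
				e := op.Error.Errors[0]
				return fmt.Errorf("operation %s failed (HTTP %d): %s: %s",
					opName, op.HttpErrorStatusCode, e.Code, e.Message)
			}
			return nil
		}
		time.Sleep(time.Second)
	}
}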

Use Boskos Client Library to Lease Project in CI

Right now CI just creates instances in the dyzz-test project for testing; it would be better to "loan" a temporary tenant project using Boskos when the tests are being run in CI.
When the loaned project is released there is an automatic job that cleans up so we don't accidentally leave test artifacts around.

/cc @krzyzacy

Add Flag to Driver for Controller or Node Mode

Add a flag --node to turn on the node server, --controller to turn on the controller server. They should both default to false.

Currently, every driver instance runs the node, controller, and identity servers by default.
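A minimal sketch of the proposed flags follows; the server start-up wiring is a hypothetical placeholder, not code from this repository.

package main

import (
	"flag"
	"log"
)

var (
	runController = flag.Bool("controller", false, "run the CSI controller server")
	runNode       = flag.Bool("node", false, "run the CSI node server")
)

func main() {
	flag.Parse()
	if !*runController && !*runNode {
		log.Fatal("at least one of --controller or --node must be set")
	}
	// The identity server would always run; the controller and node servers
	// would start only when their flag is set.
}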

Node DaemonSet in Kubernetes Deployment Should Not Run On Master

Currently the DaemonSet constantly attempts to create the pod on the master and fails because of resource limits. We should stop scheduling on the master so that it does not continually retry pod creation.

This is high priority, as we have seen the DaemonSet pod kick the API server off the master, putting the whole cluster in a bad state.
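One way to express this, sketched with the k8s.io/api/core/v1 Go types, is a required node affinity that excludes nodes carrying the master role label. The label key reflects the common convention at the time and is an assumption, not this repo's manifest.

package sketch

import corev1 "k8s.io/api/core/v1"

// noMasterAffinity returns an affinity that keeps the node DaemonSet pod off
// nodes labeled with the master role (label key is an assumption).
func noMasterAffinity() *corev1.Affinity {
	return &corev1.Affinity{
		NodeAffinity: &corev1.NodeAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
				NodeSelectorTerms: []corev1.NodeSelectorTerm{{
					MatchExpressions: []corev1.NodeSelectorRequirement{{
						Key:      "node-role.kubernetes.io/master",
						Operator: corev1.NodeSelectorOpDoesNotExist,
					}},
				}},
			},
		},
	}
}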

Convert all "TODO" to Github issues

The codebase is currently littered with TODOs, some attributed to nobody and some to dyzz.

We should clean up the TODOs, create a GitHub issue for each, and then link the GitHub issue in the TODO for tracking purposes.

Support RePD

Support RePD (Regional Persistent Disk) for GCE PD.

This will likely depend on the topology work as well.

Kubernetes Deployment Readme Suggests Too Broad Permissions

Originally tried the "compute-admin" and "compute-storage-admin" scopes, but they seemed not to contain enough permissions for attach.

This is a tracking bug to revisit tightening the scopes required to deploy. If "Attach" is not currently supported in the "compute-admin" scope a bug should be opened against GCE Permissions because it definitely should be.

Improve Deployment Scripts

  • Make the GCP service account token file user configurable
  • Figure out actual minimal permissions set required for GCP service account
  • Make namespace configurable

Permission problem using gcp-pd-csi: Required 'compute.disks.get'

Hello,
Trying to deploy gcp-pd-csi to a GKE cluster. The deployment works, but trying to attach a disk I get the following error from the attacher:

GRPC error: rpc error: code = Unknown desc = googleapi: Error 403: Required 'compute.disks.get' permission for 'projects/XPROJECT/zones/europe-west3-a/disks/YDISK

(This is directly from the attacher log, but the same error propagates up to the pod creation error.)

I'm using a service account in the deployment which is set up with the roles: Kubernetes Engine Admin, Service Account User, Storage Admin.

I'm trying to debug whether the problem is in the deployment or whether the service account is missing certain rights.

Improve Documentation

  • Add Known Issues
  • Add Project Status
  • Add Plugin Features (supported parameters)
  • Add Future Features
  • Add User Guide
    • Improve setup scripts to work every time, with reasonable error messages when variables are not defined
  • Add Kubernetes Development
  • Add Dependency Management Information

Set MaxVolumesPerNode on NodeGetInfo call based on Node Type

Currently NodeGetInfoResponse returns the default of 0 for MaxVolumesPerNode so the CO will decide how many volumes can be published on a node.

For GCE we need to return a different number based on node type, as the maximum number of attachable volumes depends on the number of vCPUs the instance has.

For the actual limits, see the "persistent disk limits" section of https://cloud.google.com/compute/docs/disks/.

You should be able to GET the instance from the cloud and pull the number of vCPUs from that.
Bonus: we seem to need information from the node object a lot; caching the relevant information somewhere would be nice, maybe in the GCENodeServer object.
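A rough sketch of that lookup, assuming the google.golang.org/api/compute/v1 client; the 16/128 attach limits are illustrative placeholders, and the authoritative numbers live in the "persistent disk limits" documentation.

package sketch

import (
	"context"
	"path"
	"strings"

	compute "google.golang.org/api/compute/v1"
)

// maxVolumesPerNode derives an attach limit for NodeGetInfo's MaxVolumesPerNode
// from the instance's machine type.
func maxVolumesPerNode(ctx context.Context, svc *compute.Service, project, zone, instanceName string) (int64, error) {
	inst, err := svc.Instances.Get(project, zone, instanceName).Context(ctx).Do()
	if err != nil {
		return 0, err
	}
	machineType := path.Base(inst.MachineType) // MachineType is returned as a URL
	if strings.HasPrefix(machineType, "f1-micro") || strings.HasPrefix(machineType, "g1-small") {
		return 16, nil // shared-core machine types attach fewer disks (illustrative)
	}
	return 128, nil // assumed general limit; see the GCE docs for per-family values
}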

Support Snapshots

 rpc CreateSnapshot (CreateSnapshotRequest)
    returns (CreateSnapshotResponse) {}

  rpc DeleteSnapshot (DeleteSnapshotRequest)
    returns (DeleteSnapshotResponse) {}

  rpc ListSnapshots (ListSnapshotsRequest)
    returns (ListSnapshotsResponse) {}
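For context, a CreateSnapshot implementation would likely map onto the GCE disks.createSnapshot API. Below is a minimal sketch, assuming the google.golang.org/api/compute/v1 client and ignoring idempotency, readiness polling, and CSI response conversion.

package sketch

import (
	"context"

	compute "google.golang.org/api/compute/v1"
)

// createSnapshot kicks off a GCE snapshot of a zonal disk and returns the
// resulting operation, which the caller would still need to wait on.
func createSnapshot(ctx context.Context, svc *compute.Service, project, zone, diskName, snapName string) (*compute.Operation, error) {
	snap := &compute.Snapshot{Name: snapName}
	return svc.Disks.CreateSnapshot(project, zone, diskName, snap).Context(ctx).Do()
}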

Update Production Push Location to `staging-k8s.gcr.io`

There is a new process to push to gcr.io/google-containers: first push to staging-k8s.gcr.io; then, after the image has been reviewed, there are some additional steps to move it from staging to prod.

/assign

E2E test should test with service account made with setup-project script

The E2E test should be as close to the production environment as possible. This would require it to use a service account with the same roles as we provide in the setup-project script.

However, we probably do not want to just run the setup-project script again without thinking, because this could be highly disruptive to production service accounts. It would be good to create a special service account with the same permissions just for the test, and make sure that it is deleted afterwards.

/cc @msau42

Improve Base Image of Container

Currently the container uses the fedora:26 base image. This is not ideal.

We might want to change it to something more lightweight/secure like alpine or scratch.

/cc @msau42 might have some good ideas

Improve cloud provider layering

It may be cleaner to encapsulate the WaitForOp in the cloud provider CreateDisk method, instead of having the driver do it. This is more in line with how the in-tree driver is implemented.

Makefile prod-build-container fails

$ make prod-build-container 
mkdir -p bin
go build -o bin/gce-pd-csi-driver ./cmd/
go test -c sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/test/e2e -o bin/e2e.test
docker build -t gcr.io/****/volume-csi/compute-persistent-disk-csi-driver:v0.2.0.saad-dev
"docker build" requires exactly 1 argument.
See 'docker build --help'.

Usage:  docker build [OPTIONS] PATH | URL | - [flags]

Build an image from a Dockerfile
Makefile:34: recipe for target 'prod-build-container' failed
make: *** [prod-build-container] Error 1

Run E2E Tests With Containerized Driver

Currently the E2E tests just copy the binary onto an instance to run the tests. They should run the tests with the containerized version of the driver, because that is what the end user will likely be consuming; it will also detect errors in the containerization process.

Give more descriptive filenames for sample deployment

These are really generic names:
$ export SA_FILE=~/.../cloud-sa.json
$ export GCEPD_SA_NAME=sample-service-account

It may be better to prefix them with "gce-pd-csi" or something like that.

Also, as a side note: because the SA_FILE filename must be cloud-sa.json, it may be better to make the directory configurable rather than the filename.

Verify udevadm works as intended in driver

There is some manual udevadm triggering in the mount manager code, due to a preexisting issue of devices not showing up properly. We need to verify that this udevadm trigger behavior is still called and works properly in this driver.

Split CI Tests Into Separate Runs

Right now all CI Tests are running under one job pull-gcp-compute-persistent-disk-csi-driver-test.

We should split them up based on test type. Ex.

pull-gcp-compute-persistent-disk-csi-driver-sanity
pull-gcp-compute-persistent-disk-csi-driver-e2e
pull-gcp-compute-persistent-disk-csi-driver-kubernetes

Each of these should invoke a go test or equivalent directly so that test results are streamed through Gubernator properly.

Deployment of driver fails on GKE: assumes that the creator is cluster-admin

From Leonid:

So, apparently, the deploy_driver.sh example under gcp-compute-persistent-disk-csi-driver assumes that the creator is cluster-admin, which doesn't look to be the case for GKE-gcloud-connect.
Manually creating relevant binding:
kubectl create clusterrolebinding cluster-admin-binding   --clusterrole cluster-admin   --user $(gcloud config get-value account)

seems to be enough to make the deployment work.

I verified that his recommendation got me past this issue.

Support partition parameter

The GCE PD in-tree driver supports partitioning:

    gcePersistentDisk:
      pdName: partitioned-disk
      fsType: ext4
      partition: 2

Deployment of driver fails on GKE: Unknown user

I see the following when deploying to GKE:

2018-07-16 07:35:05.000 PDT
Unknown user "system:serviceaccount:default:csi-controller-sa"

But kubectl shows that the account exists:

$ kubectl get sa
NAME                SECRETS   AGE
csi-controller-sa   1         8m
csi-node-sa         1         8m
default             1         1h

Also see this error:

E  github.com/kubernetes-csi/external-provisioner/vendor/github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:496: Failed to list *v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:serviceaccount:default:csi-controller-sa" cannot list persistentvolumeclaims at the cluster scope: [clusterrole.rbac.authorization.k8s.io "system:csi-external-attacher" not found, clusterrole.rbac.authorization.k8s.io "system:csi-external-provisioner" not found] 
  undefined
E  Unknown user "system:serviceaccount:default:csi-controller-sa"
 
  undefined

But kubectl shows the clusterrolebinding exists

$ kubectl describe clusterrolebindings csi-controller-attacher-binding
Name:         csi-controller-attacher-binding
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"name":"csi-controller-attacher-binding","namespa...
Role:
  Kind:  ClusterRole
  Name:  system:csi-external-attacher
Subjects:
  Kind            Name               Namespace
  ----            ----               ---------
  ServiceAccount  csi-controller-sa  default
$ kubectl describe clusterrolebindings csi-controller-provisioner-binding
Name:         csi-controller-provisioner-binding
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"name":"csi-controller-provisioner-binding","name...
Role:
  Kind:  ClusterRole
  Name:  system:csi-external-provisioner
Subjects:
  Kind            Name               Namespace
  ----            ----               ---------
  ServiceAccount  csi-controller-sa  default

E2E Framework Overhaul

The E2E framework should actually run locally; we can create instances (possibly multi-zonal) that run the driver binary and expose its endpoint as an SSL-secured TCP endpoint:
https://grpc.io/docs/guides/auth.html#examples

This way the e2e tests can actually spin these instances up, and spin up a gRPC client that we use to call these instances for the tests.

This makes the multi-zonal tests easier to create, and means we don't have to stream test commands and results back and forth between the instances.
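A minimal sketch of the TLS-secured gRPC client side, assuming grpc-go and the CSI spec Go bindings; the certificate path and address handling are placeholders, not this repo's test framework.

package sketch

import (
	"context"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// dialDriver connects to a remote driver endpoint over TLS and returns a CSI
// identity client the tests could use.
func dialDriver(ctx context.Context, addr, caCertFile string) (csi.IdentityClient, error) {
	creds, err := credentials.NewClientTLSFromFile(caCertFile, "")
	if err != nil {
		return nil, err
	}
	conn, err := grpc.DialContext(ctx, addr, grpc.WithTransportCredentials(creds))
	if err != nil {
		return nil, err
	}
	return csi.NewIdentityClient(conn), nil
}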

ControllerPublishVolume should Verify Disk Attached

After the attach call and waiting for the attach op to complete, sometimes the attach will not error but the disk will still not be attached.

We should add an additional verification to make sure that the disk has been attached. We can see this on the disk object:

// Users: [Output Only] Links to the users of the disk (attached
// instances) in form: project/zones/zone/instances/instance
Users []string `json:"users,omitempty"`

Or, on the instance, there is the AttachedDisks field.
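A rough sketch of that verification, assuming the google.golang.org/api/compute/v1 client; the matching logic and error wording are illustrative, not the driver's actual code.

package sketch

import (
	"context"
	"fmt"
	"strings"

	compute "google.golang.org/api/compute/v1"
)

// verifyDiskAttached re-reads the disk after the attach operation completes and
// confirms the target instance appears in the disk's Users list.
func verifyDiskAttached(ctx context.Context, svc *compute.Service, project, zone, diskName, instanceName string) error {
	disk, err := svc.Disks.Get(project, zone, diskName).Context(ctx).Do()
	if err != nil {
		return err
	}
	for _, user := range disk.Users {
		// Users entries look like ".../projects/<p>/zones/<z>/instances/<instance>".
		if strings.HasSuffix(user, "/instances/"+instanceName) {
			return nil
		}
	}
	return fmt.Errorf("disk %s is not attached to instance %s", diskName, instanceName)
}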
