
cloud-readiness's Introduction

[DEPRECATED]

This project is deprecated and replaced by k8ssandra-operator.

Read this blog post to see what differences exist between K8ssandra and k8ssandra-operator, and why we decided to build an operator.
Follow our migration guide to migrate from K8ssandra (and Apache Cassandra®) to k8ssandra-operator.

K8ssandra

K8ssandra is a simple-to-manage, production-ready distribution of Apache Cassandra and Stargate that is ready for Kubernetes. It is built on a foundation of rock-solid open-source projects covering both the transactional and operational aspects of Cassandra deployments. This project is distributed as a collection of Helm charts. Feel free to fork the repo and contribute. If you're looking to install K8ssandra, head over to the Quickstarts.

Components

K8ssandra is composed of a number of sub-charts each representing a component in the K8ssandra stack. The default installation is focused on developer deployments with all of the features enabled and configured for running with a minimal set of resources. Many of these components may be deployed independently in a centralized fashion. Below is a list of the components in the K8ssandra stack with links to the appropriate projects.

Apache Cassandra

K8ssandra packages and deploys Apache Cassandra via the cass-operator project. Each Cassandra container has the Management API for Apache Cassandra (MAAC) and the Metrics Collector for Apache Cassandra (MCAC) pre-installed and configured to come up automatically.

Stargate

Stargate provides a collection of horizontally scalable API endpoints for interacting with Cassandra databases. Developers may leverage REST and GraphQL alongside the traditional CQL interfaces. With Stargate operations teams gain the ability to independently scale coordination (Stargate) and data (Cassandra) layers. In some use-cases, this has resulted in a lower TCO and smaller infrastructure footprint.
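Below is a minimal Go sketch of that interaction, assuming Stargate v1's default ports and endpoints (auth service on 8081 via POST /v1/auth, REST API on 8082 with the X-Cassandra-Token header); the host and credentials are placeholders.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Authenticate: POST /v1/auth returns {"authToken": "..."}.
	creds, _ := json.Marshal(map[string]string{"username": "cassandra", "password": "cassandra"})
	resp, err := http.Post("http://stargate.example.com:8081/v1/auth", "application/json", bytes.NewReader(creds))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var auth struct {
		AuthToken string `json:"authToken"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&auth); err != nil {
		panic(err)
	}

	// Issue a REST call with the token; here, list the keyspace schemas.
	req, _ := http.NewRequest("GET", "http://stargate.example.com:8082/v2/schemas/keyspaces", nil)
	req.Header.Set("X-Cassandra-Token", auth.AuthToken)
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	fmt.Println("status:", res.Status)
}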

Monitoring

Monitoring includes the collection, storage, and visualization of metrics. Along with the previously mentioned MCAC, K8ssandra utilizes Prometheus and Grafana for the storage and visualization of metrics. Installation and management of these pieces is handled by the Kube Prometheus Stack Helm chart.

Repairs

The Last Pickle Reaper is used to schedule and manage repairs in Cassandra. It provides a web interface to visualize repair progress and manage activity.

Backup & Restore

Another project from The Last Pickle, Medusa, manages the backup and restore of K8ssandra clusters.

Next Steps

If you are looking to run K8ssandra in your Kubernetes environment, check out the Getting Started guide, with follow-up details for developers and site reliability engineers.

We are always looking for contributions to the docs, Helm charts, and underlying components. Check out the code contribution guide and the docs contribution guide.

If you are a developer interested in working with the K8ssandra code, here is a guide that will give you an introduction to:

  • Important technologies and learning resources
  • Project components
  • Project processes and resources
  • Getting up and running with a basic IDE environment
  • Deploying to a local docker-based cluster environment (kind)
  • Understanding the K8ssandra project structure
  • Running unit tests
  • Troubleshooting tips

Dependencies

For information on the packaged dependencies of K8ssandra and their licenses, check out our open source report.


cloud-readiness's Issues

GCP service account access activation per cluster

Generate service account access per provisioned cluster during the setup and installation of K8ssandra operators and supporting k8s resources.

Create the auth init/setup. In addition, allow the installation logic to switch between the service accounts owning the resources being provisioned.
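A hedged sketch of what per-cluster service-account switching could look like in Go, using the GCP client libraries: each cluster record carries the path to its own service-account key, and API clients are built from that key rather than the ambient gcloud credentials. The saKeyPath parameter and clusterClientForSA helper are illustrative, not existing framework code.

package provision

import (
	"context"

	container "google.golang.org/api/container/v1"
	"google.golang.org/api/option"
)

// clusterClientForSA builds a GKE API client bound to a specific
// service-account key, so each provisioned cluster can be owned and
// managed by its own service account.
func clusterClientForSA(ctx context.Context, saKeyPath string) (*container.Service, error) {
	return container.NewService(ctx, option.WithCredentialsFile(saKeyPath))
}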

User documentation

Document the current set of functionality in the README:

  • An overview of the initial usage, covering GCP provisioning.

  • How to clean up provisioned resources in a GCP environment using the framework.

  • Beginnings of K8ssandra multi-cluster installation and smoke-test-style validations.

Upgrade go to v1.18

Upgrade to Go 1.18 and explore where its new features (e.g. generics) can reduce the verbosity of existing logic, using them where appropriate.
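As one illustration (not existing cloud-readiness code), a single generic helper introduced with Go 1.18 can replace per-type slice-transform boilerplate:

package util

// Map applies fn to each element of in and returns the transformed slice.
func Map[T, U any](in []T, fn func(T) U) []U {
	out := make([]U, 0, len(in))
	for _, v := range in {
		out = append(out, fn(v))
	}
	return out
}

// Example: derive full GKE context names from short cluster names.
func contextNames(clusters []string) []string {
	return Map(clusters, func(c string) string {
		return "gke_community-ecosystem_us-central1_dev-" + c
	})
}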

Provisioner reduce /tmp artifact footprint

Remove the unnecessary artifacts migrated to the /tmp cloud-readiness folders for use by Terraform.

The default behavior is that a relative path is supplied to the TF API, which copies the modules as well as the tests; the tests aren't used there and cause extra space usage and confusion.

Consider the folder naming within the project itself, where the provision and test folders reside.
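Assuming the provisioner drives Terraform through Terratest (consistent with the relative path handed to the TF API above), one way to shrink the /tmp footprint is to copy only the module folder rather than the repository root; the folder names here are illustrative.

package provision_test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	teststructure "github.com/gruntwork-io/terratest/modules/test-structure"
)

func TestProvision(t *testing.T) {
	// Copy only the provision modules into the temp folder; passing the
	// repository root would drag the unused test folders into /tmp too.
	tmpDir := teststructure.CopyTerraformFolderToTemp(t, "../..", "provision/gcp")

	opts := &terraform.Options{TerraformDir: tmpDir}
	defer terraform.Destroy(t, opts)
	terraform.InitAndApply(t, opts)
}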

GKE invalid capacity error w/ dc2 not deploying

Given the following K8ssandraCluster configuration, the dc1 datacenter is deployed, but dc2 does not deploy on its data-plane cluster as expected.

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: bootz-k8c-cluster
spec:
  auth: false
  cassandra:
    serverVersion: "4.0.1"
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: premium-rwo
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
    config:
      jvmOptions:
        heapSize: 1024Mi
    datacenters:
      - metadata:
          name: dc1
        k8sContext: gke_community-ecosystem_us-central1_dev-bootz501
        size: 3
      - metadata:
          name: dc2
        k8sContext: gke_community-ecosystem_us-central1_dev-bootz502
        size: 3
  stargate:
    size: 1
    heapSize: 1024Mi
    resources:
      limits:
        memory: 1024Mi
  reaper:
    autoScheduling:
      enabled: true

K8ssandra-operator version:

NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
k8ssandra-operator      bootz           1               2022-03-08 19:31:53.8942451 -0600 CST   deployed        k8ssandra-operator-0.37.1       1.0.1

Configuration for test inputs:

ctxConfig1 := ContextConfig{
	Name:          "bootz500",
	Namespace:     "bootz",
	ClusterLabels: []string{"control-plane"},
}

ctxConfig2 := ContextConfig{
	Name:          "bootz501",
	Namespace:     "bootz",
	ClusterLabels: []string{"data-plane"},
}

ctxConfig3 := ContextConfig{
	Name:          "bootz502",
	Namespace:     "bootz",
	ClusterLabels: []string{"data-plane"},
}

Errors discovered in the k8ssandracluster log on the control-plane cluster bootz500 indicate there is invalid capacity on the image filesystem.

35m         Normal    Starting                  node/gke-dev-bootz600-dev-bootz600-node-po-c3c38776-d24g   Starting kubelet.
35m         Warning   InvalidDiskCapacity       node/gke-dev-bootz600-dev-bootz600-node-po-c3c38776-d24g   invalid capacity 0 on image filesystem

The config for this k8ssandracluster is using the following storageclass configuration:

NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
premium-rwo          pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   18h

The resources for this node (node/gke-dev-bootz600-dev-bootz600-node-po-c3c38776-d24g) are set as follows:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests    Limits
  --------                   --------    ------
  cpu                        572m (7%)   144m (1%)
  memory                     583Mi (9%)  1283Mi (21%)
  ephemeral-storage          0 (0%)      0 (0%)
  hugepages-1Gi              0 (0%)      0 (0%)
  hugepages-2Mi              0 (0%)      0 (0%)
  attachable-volumes-gce-pd  0           0

GKE error executing access token command

Issue
The control-plane's k8ssandra-operator reports failure on all datacenter dc1 pods when attempting to reconcile.

Context
gke_community-ecosystem_us-central1_dev-bootz800

Pods state

NAME                                                READY   STATUS       RESTARTS   AGE
k8ssandra-operator-6b48d84656-vgb5t                 1/1     Running      0          114m
k8ssandra-operator-cass-operator-557f99795b-xxfjg   1/1     Running      0          4h25m

Reported error

2022-03-10T23:15:27.092Z        ERROR   controller.k8ssandracluster     Failed to update replication    {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "bootz-k8c-cluster", "namespace": "bootz", "K8ssandraCluster": "bootz/bootz-k8c-cluster", "CassandraDatacenter": "bootz/dc1", "K8SContext": "gke_community-ecosystem_us-central1_dev-bootz801", "keyspace": "system_traces", "error": "CALL list keyspaces system_traces failed on all datacenter dc1 pods"}
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).reconcileDatacenters
        /workspace/controllers/k8ssandra/datacenters.go:168
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).reconcile
        /workspace/controllers/k8ssandra/k8ssandracluster_controller.go:133
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).Reconcile
        /workspace/controllers/k8ssandra/k8ssandracluster_controller.go:87
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227
2022-03-10T23:15:27.112Z        INFO    controller.k8ssandracluster     updated k8ssandracluster status {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "bootz-k8c-cluster", "namespace": "bootz", "K8ssandraCluster": "bootz/bootz-k8c-cluster"}
2022-03-10T23:15:27.112Z        ERROR   controller.k8ssandracluster     Reconciler error        {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "bootz-k8c-cluster", "namespace": "bootz", "error": "CALL list keyspaces system_traces failed on all datacenter dc1 pods"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227
E0310 23:15:28.060647       1 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.CassandraDatacenter: failed to list *v1beta1.CassandraDatacenter: Get "https://34.70.196.223/apis/cassandra.datastax.com/v1beta1/namespaces/bootz/cassandradatacenters?resourceVersion=132976": error executing access token command "/usr/lib/google-cloud-sdk/bin/gcloud config config-helper --format=json": err=fork/exec /usr/lib/google-cloud-sdk/bin/gcloud: no such file or directory output= stderr=

Possible fix / research needed

Ensure, during the install of the control-plane and data-plane k8ssandra operators, that the context is appropriately assigned and uses the kube configurations for those contexts, NOT the gcp auth.
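A minimal client-go sketch of building a client strictly from a named kubeconfig context. This assumes the kubeconfig entry for that context carries static credentials (e.g. a token) rather than delegating to the gcloud exec plugin, which is what fails in the log above.

package install

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// clientForContext builds a clientset from the kube configuration for the
// given context name, without relying on any ambient cloud auth helper.
func clientForContext(contextName string) (*kubernetes.Clientset, error) {
	rules := clientcmd.NewDefaultClientConfigLoadingRules()
	overrides := &clientcmd.ConfigOverrides{CurrentContext: contextName}
	cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(rules, overrides).ClientConfig()
	if err != nil {
		return nil, err
	}
	return kubernetes.NewForConfig(cfg)
}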

Generate K8ssandraCluster spec to match test input model

Automate the test precondition setup to populate the K8ssandraCluster specification used in a cloud-readiness test with references to the unique names defined by the test input model.

Request: The cloud-readiness framework will generate a unique K8ssandra cluster name and full context name(s) based on the number of clusters provisioned.

Example

Cluster name:

  • bootz-k8c-cluster

Contexts:

  • gke_community-ecosystem_us-central1_dev-bootz1
  • gke_community-ecosystem_us-central1_dev-bootz2
  • gke_community-ecosystem_us-central1_dev-bootz3

Load and populate the K8ssandraCluster specification used in the install to include those names; a templating sketch follows the file reference below.

File reference: K8ssandra -> test -> config -> k8ssandra-clusters.yaml

metadata:
  name: bootz-k8c-cluster
...
spec:
  cassandra:
    datacenters:
      - metadata:
          name: dc1
        k8sContext: gke_community-ecosystem_us-central1_dev-bootz1
        size: 3
        stargate:
          size: 1
          heapSize: 256M
      - metadata:
          name: dc2
        k8sContext: gke_community-ecosystem_us-central1_dev-bootz2
        size: 3
        stargate:
          size: 1
          heapSize: 256M
      - metadata:
          name: dc3
        k8sContext: gke_community-ecosystem_us-central1_dev-bootz3
        size: 3
        stargate:
          size: 1
          heapSize: 256M
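A hedged sketch of how the framework might render this spec from the test input model using Go's text/template; the type names and template source are illustrative, not the existing k8ssandra-clusters.yaml handling.

package config

import (
	"os"
	"text/template"
)

// DC captures the per-datacenter values taken from the test input model.
type DC struct {
	Name       string
	K8sContext string
}

// ClusterSpec carries the generated unique names for one test run.
type ClusterSpec struct {
	ClusterName string
	Datacenters []DC
}

const specTmpl = `metadata:
  name: {{ .ClusterName }}
spec:
  cassandra:
    datacenters:
{{- range .Datacenters }}
      - metadata:
          name: {{ .Name }}
        k8sContext: {{ .K8sContext }}
        size: 3
        stargate:
          size: 1
          heapSize: 256M
{{- end }}
`

// WriteSpec renders the K8ssandraCluster spec for the install step.
func WriteSpec(path string, spec ClusterSpec) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return template.Must(template.New("spec").Parse(specTmpl)).Execute(f, spec)
}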

Provisioning cleanup procedures

Include user docs for:

  • How to access the target provisioned plan and related artifacts
  • Cleanup procedures to use if automated provisioning is not requested or becomes unstable

Measure K8ssandra test run costs in GCP

Measure and capture GCP costs associated with the basic smoke test execution for k8ssandra in a multi-cluster k8ssandra-operator deployment.

Since the framework is not limited solely to smoke tests, the effort may be expanded to additional test-type costs after the k8ssandra-operator release.

Re-provision using Terraform artifacts w/ test inputs

Enhancement

We need to allow destroy and re-provision operations that reference the original cloud-readiness model values.

Currently, we can destroy manually using the Terraform artifacts stored in a unique cloud-readiness provisioning folder. However, when the re-install takes place via the CLI, apart from the original cloud-readiness go test, the inputs are not aligned and cause issues during the k8ssandra installation phase.

Workaround

  • Destroy using Terraform CLI for a target cluster.
  • Re-provision using the same model inputs in the go test.
  • Proceed with the k8ssandra test installation steps using the go test target(s).

Future change

Externalize the cloud-readiness inputs used during the infrastructure provisioning steps, so that CLI workflows can reference those original inputs. This is extremely handy when re-provisioning select resources of an existing test infrastructure.
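Assuming Terratest sits between the go test and Terraform, its test_structure helpers already offer one way to persist and reload the original inputs; the working directory below is illustrative.

package provision_test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	teststructure "github.com/gruntwork-io/terratest/modules/test-structure"
)

func TestReprovision(t *testing.T) {
	workDir := "/tmp/cloud-readiness/bootz500"

	// First run: persist the options alongside the Terraform artifacts.
	opts := &terraform.Options{
		TerraformDir: workDir,
		Vars:         map[string]interface{}{"name": "bootz500"},
	}
	teststructure.SaveTerraformOptions(t, workDir, opts)
	terraform.InitAndApply(t, opts)

	// Later destroy/re-provision: reload the exact original inputs rather
	// than reconstructing them by hand at the CLI.
	saved := teststructure.LoadTerraformOptions(t, workDir)
	terraform.Destroy(t, saved)
}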

Google GKE readiness

An epic tracking activities to support Google GKE cloud readiness for k8ssandra.

AWS verifications w/ cloud-readiness

Use of the framework with AWS/EKS-provisioned infrastructure, to include the following:

  • Setup/provision
  • K8ssandra installation
  • Clean/teardown
  • Clear and concise logging

Verification and testing of the items listed above.

GCP multi-cluster failed to get remote client

A 3-cluster deployment with the k8ssandra-operator, the cassandra-operator on all 3 clusters, and the k8ssandra-cluster on the control-plane.

The following error is being reported.

Failed to get remote client {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "bootz-k8c-cluster", "namespace": "bootz", "K8ssandraCluster": "bootz/bootz-k8c-cluster", "K8sContext": "gke_community-ecosystem_us-central1_dev-bootz101", "error": "No known client for context-name gke_community-ecosystem_us-central1_dev-bootz101"}

Generated clientconfig

apiVersion: config.k8ssandra.io/v1beta1
kind: ClientConfig
metadata:
    name: gke-community-ecosystem-us-central1-dev-bootz101
spec:
    contextName: gke_community-ecosystem_us-central1_dev-bootz101
    kubeConfigSecret:
        name: k8s-contexts

Resource

NAME                                               AGE
gke-community-ecosystem-us-central1-dev-bootz101   78m

Describe

Name:         gke-community-ecosystem-us-central1-dev-bootz101
Namespace:    bootz
Labels:       <none>
Annotations:  k8ssandra.io/resource-hash: uadezAdU2wvWDsv/+Biac3NldYJL9qAR8TaQQicaCRE=
              k8ssandra.io/secret-hash: 8zT5+9/UnEbHQLzAGzFxdHGJw8OF4UaCAYFuz6rcQjo=
API Version:  config.k8ssandra.io/v1beta1
Kind:         ClientConfig
Metadata:
  Creation Timestamp:  2022-02-08T23:22:25Z
  Generation:          2
  Managed Fields:
    API Version:  config.k8ssandra.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:contextName:
        f:kubeConfigSecret:
          .:
          f:name:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2022-02-08T23:22:25Z
    API Version:  config.k8ssandra.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:k8ssandra.io/resource-hash:
          f:k8ssandra.io/secret-hash:
    Manager:         manager
    Operation:       Update
    Time:            2022-02-08T23:22:29Z
  Resource Version:  656833
  UID:               75060343-b118-488c-aea3-b37a8e380c71
Spec:
  Context Name:  gke_community-ecosystem_us-central1_dev-bootz101
  Kube Config Secret:
    Name:  k8s-contexts
Events:    <none>

Amazon EKS readiness

An epic tracking activities to support Amazon EKS cloud readiness for k8ssandra.

Model update - EKS as supported cloud environment

Changes are necessary to support a specific cloud environment other than the existing default, GCP/GKE.

This requires changes in the model processing to detect a different cloud environment, as well as some adjustments to the cloud configuration model to indicate the desire to provision and install resources to EKS.

Provisioning and test artifact cleanup

During the provisioning step, artifacts unique and scoped to a cluster are created as snapshots of the current test run. Those artifacts will need to be cleaned up after the test run on a conditional basis.

In some cases, such as diagnosing a previous test run or performing manual cleanup, those artifacts will need to remain beyond the lifecycle of the test execution.
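One possible shape for that conditional cleanup in the go tests, using t.Cleanup; the KEEP_TEST_ARTIFACTS flag is hypothetical.

package provision_test

import (
	"os"
	"testing"
)

// setupArtifacts registers a cleanup that removes the run's artifact folder
// unless retention is requested or the test failed (useful for diagnostics).
func setupArtifacts(t *testing.T, dir string) {
	t.Cleanup(func() {
		if os.Getenv("KEEP_TEST_ARTIFACTS") != "" || t.Failed() {
			t.Logf("retaining artifacts in %s for diagnostics", dir)
			return
		}
		os.RemoveAll(dir)
	})
}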

Add traefik and cert-manager to infra provision step

Currently, traefik and cert-manager resources are created as initialization steps and preconditions of the k8ssandra-operator installation.

It was determined that having these included as part of the infrastructure foundation provisioning will better support the e2e script installations of k8ssandra.

The capability to install these as part of the k8ssandra installation will remain available, but optional, via an EnableConfig setting.
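A sketch of what that toggle might look like in the configuration model; only the EnableConfig name comes from this issue, the fields are assumptions.

package model

// EnableConfig controls optional components during the k8ssandra
// installation phase, now that the infrastructure provisioning step
// installs them by default.
type EnableConfig struct {
	Traefik     bool // install Traefik during k8ssandra installation
	CertManager bool // install cert-manager during k8ssandra installation
}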

Data-plane k8ssandra-operator designation

The data-plane k8ssandra-operator has K8SSANDRA_CONTROL_PLANE set to true in the current version being used, 0.5.0:

k8ssandra-operator      bootz           1               2022-02-11 14:44:28.2038539 -0600 CST   deployed        k8ssandra-operator-0.35.0       0.4.0
Name:         k8ssandra-operator-85b9659c56-dlwc4
Namespace:    bootz
Priority:     0
Node:         gke-dev-bootz102-dev-bootz102-node-po-63bd73fd-c37f/10.0.0.8
Start Time:   Fri, 11 Feb 2022 14:44:34 -0600
Labels:       app.kubernetes.io/instance=k8ssandra-operator
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=k8ssandra-operator
              app.kubernetes.io/part-of=k8ssandra-k8ssandra-operator-bootz
              control-plane=k8ssandra-operator
              helm.sh/chart=k8ssandra-operator-0.35.0
              pod-template-hash=85b9659c56
Annotations:  <none>
Status:       Running
IP:           10.76.1.7
IPs:
  IP:           10.76.1.7
Controlled By:  ReplicaSet/k8ssandra-operator-85b9659c56
Containers:
  k8ssandra-operator:
    Container ID:  containerd://70e933eb545887ada5114fded5f6ec3249d66e3e86d4b5aab46e5f77f48d95e8
    Image:         docker.io/k8ssandra/k8ssandra-operator:v0.4.0
    Image ID:      docker.io/k8ssandra/k8ssandra-operator@sha256:f9868cd7c951f151ae3b4eb8f4de7b7cd26f0fe18e58ed17f2ce8c7c1f431682
    Port:          <none>
    Host Port:     <none>
    Command:
      /manager
    State:          Running
      Started:      Fri, 11 Feb 2022 14:44:42 -0600
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 11 Feb 2022 14:44:35 -0600
      Finished:     Fri, 11 Feb 2022 14:44:41 -0600
    Ready:          True
    Restart Count:  1
    Liveness:       http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:      http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      WATCH_NAMESPACE:          bootz (v1:metadata.namespace)
      K8SSANDRA_CONTROL_PLANE:  true
    Mounts:
      /controller_manager_config.yaml from manager-config (rw,path="controller_manager_config.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-r2ccq (ro)

This is expected to be set to false for deployments on the data-plane.

It is unclear whether this is caused by the k8ssandra-operator version in use or by the cloud-readiness test logic; both Helm and KubeConfig are setting the environment variable.
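A hedged client-go sketch for verifying the flag on a data-plane cluster: read the operator Deployment and inspect its env vars. The namespace and deployment name are inferred from the output above.

package verify

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// checkControlPlaneFlag prints the K8SSANDRA_CONTROL_PLANE value on the
// k8ssandra-operator deployment; on a data-plane cluster it should be false.
func checkControlPlaneFlag(ctx context.Context, cs *kubernetes.Clientset) error {
	dep, err := cs.AppsV1().Deployments("bootz").Get(ctx, "k8ssandra-operator", metav1.GetOptions{})
	if err != nil {
		return err
	}
	for _, c := range dep.Spec.Template.Spec.Containers {
		for _, e := range c.Env {
			if e.Name == "K8SSANDRA_CONTROL_PLANE" {
				fmt.Printf("%s=%s\n", e.Name, e.Value)
			}
		}
	}
	return nil
}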
