k8ssandra / k8ssandra

K8ssandra is an open-source distribution of Apache Cassandra for Kubernetes including API services and operational tooling.

Home Page: https://k8ssandra.io/

License: Apache License 2.0

Topics: kubernetes, cassandra, nosql, helm, stargate, medusa, reaper, prometheus, grafana, hacktoberfest

k8ssandra's Introduction

[DEPRECATED]

This project is deprecated and has been replaced by k8ssandra-operator.

Read this blog post to see what differences exist between K8ssandra and k8ssandra-operator, and why we decided to build an operator.
Follow our migration guide to migrate from K8ssandra (and Apache Cassandra®) to k8ssandra-operator.

K8ssandra

K8ssandra is a simple-to-manage, production-ready distribution of Apache Cassandra and Stargate for Kubernetes. It is built on a foundation of rock-solid open-source projects covering both the transactional and operational aspects of Cassandra deployments. The project is distributed as a collection of Helm charts. Feel free to fork the repo and contribute. If you're looking to install K8ssandra, head over to the Quickstarts.

Components

K8ssandra is composed of a number of sub-charts each representing a component in the K8ssandra stack. The default installation is focused on developer deployments with all of the features enabled and configured for running with a minimal set of resources. Many of these components may be deployed independently in a centralized fashion. Below is a list of the components in the K8ssandra stack with links to the appropriate projects.

Apache Cassandra

K8ssandra packages and deploys Apache Cassandra via the cass-operator project. Each Cassandra container has the Management API for Apache Cassandra (MAAC) and the Metrics Collector for Apache Cassandra (MCAC) pre-installed and configured to come up automatically.

Stargate

Stargate provides a collection of horizontally scalable API endpoints for interacting with Cassandra databases. Developers may leverage REST and GraphQL alongside the traditional CQL interfaces. With Stargate, operations teams gain the ability to independently scale the coordination (Stargate) and data (Cassandra) layers. In some use cases, this has resulted in a lower TCO and a smaller infrastructure footprint.

Monitoring

Monitoring includes the collection, storage, and visualization of metrics. Along with the previously mentioned MCAC, K8ssandra utilizes Prometheus and Grafana for the storage and visualization of metrics. Installation and management of these pieces is handled by the Kube Prometheus Stack Helm chart.

Repairs

Reaper, from The Last Pickle, is used to schedule and manage repairs in Cassandra. It provides a web interface to visualize repair progress and manage repair activity.
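For example, repairs can be enabled through the k8ssandra-cluster chart values; a minimal sketch, using the keys that appear in the issue examples later on this page:

# Minimal values.yaml sketch enabling auto-scheduled repairs (keys as used in the issue examples on this page)
repair:
  reaper:
    autoschedule: true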

Backup & Restore

Another project from The Last Pickle, Medusa, manages the backup and restore of K8ssandra clusters.
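A minimal sketch of enabling Medusa through the chart values, using the keys shown in the issue examples later on this page (the bucket and its credentials secret must be created separately):

# Minimal values.yaml sketch enabling Medusa backups to S3
backupRestore:
  medusa:
    enabled: true
    bucketName: k8ssandra-backups      # bucket created separately
    bucketSecret: k8ssandra-bucket-key # secret created separately
    storage: s3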

Next Steps

If you are looking to run K8ssandra in your Kubernetes environment check out the Getting Started guide, with follow-up details for developers and site reliability engineers.

We are always looking for contributions to the docs, Helm charts, and underlying components. Check out the code contribution guide and the docs contribution guide.

If you are a developer interested in working with the K8ssandra code, here is a guide that will give you an introduction to:

  • Important technologies and learning resources
  • Project components
  • Project processes and resources
  • Getting up and running with a basic IDE environment
  • Deploying to a local docker-based cluster environment (kind)
  • Understanding the K8ssandra project structure
  • Running unit tests
  • Troubleshooting tips

Dependencies

For information on the packaged dependencies of K8ssandra and their licenses, check out our open source report.

k8ssandra's People

Contributors

adejanovski, adutra, bradfordcp, burmanm, cdriscoll66, chienfuchen32, darthferretus, emerkle826, gerbal, jack-fryer, jdonenine, jeffbanks, jeffreyscarpenter, johnsmartco, johnwfrancis, jsanda, michaelsembwever, miles-garnsey, mproch, msmygit, omnifroodle, parham-pythian, pmcfadin, polandll, rachankaur, stanislawwos, stevebream, tlasica, vmarchese, vsoloviov


k8ssandra's Issues

K8SSAND-141 ⁃ Add support for cluster-scoped deployments of reaper-operator

Describe the solution you'd like
Supporting cluster-scoped deployments of the different components is going to be important for a certain segment of users and enterprises.

It would be particularly useful for larger enterprises in which SREs basically provide a PaaS for other engineers.

Additional context
Legacy Reference: Jira K8C-79

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-141
┆priority: Medium

Document authorization configuration

Is your feature request related to a problem? Please describe.
How can authorization be configured and provided?

Describe the solution you'd like
Documentation outlining options and examples related to authorization.
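As a starting point for such documentation, the CassandraDatacenter examples elsewhere on this page already enable password authentication and authorization; a sketch of the relevant config block:

# Sketch of enabling auth in the CassandraDatacenter config (values copied from other examples on this page)
config:
  cassandra-yaml:
    authenticator: org.apache.cassandra.auth.PasswordAuthenticator
    authorizer: org.apache.cassandra.auth.CassandraAuthorizer
    role_manager: org.apache.cassandra.auth.CassandraRoleManager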

Additional context
Legacy reference: Jira

K8SSAND-138 ⁃ Add multi-DC support

Is your feature request related to a problem? Please describe.
As a user, I want to be able to create a multi-DC cluster with the k8ssandra-cluster chart.

Describe the solution you'd like
This would likely come in the form of a helm upgrade. As a user, I may create a cluster with one DC. Later, I decide that I want to add a second DC.
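With cass-operator, a second DC in the same Cassandra cluster is modeled as another CassandraDatacenter that reuses the clusterName; a minimal sketch (storageConfig and other required fields elided, names are illustrative):

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc2                 # illustrative name
spec:
  clusterName: cluster1     # must match the existing DC's clusterName
  serverType: cassandra
  serverVersion: "3.11.7"
  size: 3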

Additional context
Legacy reference: Jira

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-138
┆priority: Medium

Configure Grafana root_url sub-path

I need to run K8ssandra behind a proxy which will not allow me to use the virtual-host approach to access the various services. I can create a path that I can expose through the proxy. Grafana has some settings in the config file for just this situation.

[server]
root_url = %(protocol)s://%(domain)s:%(http_port)s/grafana/
serve_from_sub_path = true

I would like some way to configure these values.
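A sketch of what this configurability could look like in the chart values, mirroring the monitoring values shown in another issue on this page:

monitoring:
  grafana:
    config:
      server:
        rootUrl: http://localhost:3000/grafana
        serveFromSubPath: true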

Make storage configurable

Is your feature request related to a problem? Please describe.
Here is an example CassandraDatacenter that configures storage:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: example
spec:
  clusterName: example
  serverType: cassandra
  serverVersion: 3.11.7
  managementApiAuth:
    insecure: {}
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi  

The storage class name and the requested storage size should be configurable.
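A hypothetical values.yaml sketch of the requested configurability (the key names are illustrative only, not the chart's actual API):

cassandra:
  storage:
    storageClassName: standard
    size: 5Gi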

Additional context
Legacy reference: Jira

Make cass-operator namespace-scoped by default

I want to make this change in conjunction with #59. cass-operator and prometheus-operator are configured with cluster-wide scope. Everything else is namespace-scoped. We want everything (if possible) to be namespace-scoped.

Baseline integration test

Feature Description
Utilizing Testify and Terratest, create an integration test that references a cassdc manifest and verifies that the CassandraDatacenter deploys successfully.

Manifest sample:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: k8ssandra-test
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: "3.11.7"
  managementApiAuth:
    insecure: {}
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  config:
    cassandra-yaml:
      authenticator: org.apache.cassandra.auth.PasswordAuthenticator
      authorizer: org.apache.cassandra.auth.CassandraAuthorizer
      role_manager: org.apache.cassandra.auth.CassandraRoleManager
    jvm-options:
      initial_heap_size: "800M"
      max_heap_size: "800M"
      additional-jvm-opts:
        # As the database comes up for the first time, set system keyspaces to RF=3
        - "-Ddse.system_distributed_replication_dc_names=dc1"
        - "-Ddse.system_distributed_replication_per_dc=3"

Make the chart available on artifacthub.io/helm search hub

Is your feature request related to a problem? Please describe.
Helm and CNCF have introduced an aggregator for charts repositories that lets users easily discover, pull charts or add them as dependencies to other charts.
k8ssandra is not listed in the aggregator or helm search results which makes it less discoverable.


Describe the solution you'd like
I would like for k8ssandra to appear in the search results :)

The third page of the introduction to Helm demonstrates using helm search hub, which makes helm search hub cassandra the obvious command to run for people looking for a Cassandra chart.

Implement a deletion job

Is your feature request related to a problem? Please describe.
As mentioned in #59 deletions in Helm are unordered. If we consolidate the k8ssandra and k8ssandra-cluster charts, we need to ensure that the CassandraDatacenter is deleted before cass-operator.

Describe the solution you'd like
We can declare a pre-delete hook on the chart. The hook will be configured to run a Job that performs ordered deletions of resources. When the user executes helm uninstall, Helm will wait for the deletion Job to complete successfully before removing everything else. The Job really only needs to worry about deleting the CassandraDatacenter, and any other objects with finalizers for that matter; we can let Helm take care of everything else.

In summary we will need the following:

  • A k8s client application with the required deletion logic
    • We will write this in Go
    • We will use the controller-runtime dynamic client
  • A Docker image of the application
  • A k8s Job that runs the image
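A minimal sketch of what such a pre-delete hook Job could look like (the image name and arguments are placeholders, not an existing k8ssandra image):

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-crd-cleaner
  annotations:
    "helm.sh/hook": pre-delete
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      # The pod needs a service account with RBAC permissions to delete CassandraDatacenters.
      containers:
        - name: cleaner
          image: k8ssandra/cleaner:latest                    # placeholder image
          args: ["delete", "cassandradatacenter", "--all"]   # placeholder arguments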

@jeffbanks I am going to assign this to you since you have been doing related work in #8. I can work with you to get things set up and configured.

CI/CD integration for automated unit testing

Is your feature request related to a problem? Please describe.
The project is currently lacking automated execution of unit tests.

Describe the solution you'd like
Provide an automated approach to the execution of the existing unit tests in a publicly visible and available fashion. Preferably via GH actions.

Describe alternatives you've considered
Other CI/CD tools might be possible, such as CircleCI.

Additional context
This issue might need to be clarified or split up into multiple issues to target specific types of tests or integration approaches.

Update favicon

Bug Report

Describe the bug
The favicon used on the k8ssandra.io site should be updated to a real k8ssandra icon.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://k8ssandra.io/
  2. Observe the favicon displayed in the browser is a default type icon

Expected behavior
The icon used as the favicon should be some form of the k8ssandra icon.

Additional context
Legacy reference: Jira

Make racks configurable

Is your feature request related to a problem? Please describe.
Users should be able to configure the racks of a CassandraDatacenter. Here is an example manifest:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: k8ssandra-test
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: "3.11.7"
  managementApiAuth:
    insecure: {}
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  racks:
    - name: rack1
      zone: us-central1-a
    - name: rack2
      zone: us-central1-b
    - name: rack3
      zone: us-central1-c

Describe the solution you'd like
The racks property of CassandraDatacenter should support templated values in the cassdc.yaml template.
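A hypothetical values.yaml sketch of the requested racks support, mirroring the manifest above (key names are illustrative only):

racks:
  - name: rack1
    zone: us-central1-a
  - name: rack2
    zone: us-central1-b
  - name: rack3
    zone: us-central1-c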

Make prometheus-operator namespace-scoped by default

I want to make this change in conjunction with #59. cass-operator and prometheus-operator are configured with cluster-wide scope. Everything else is namespace-scoped. We want everything (if possible) to be namespace-scoped.

Reaper cannot manage cluster after a repair run if Cassandra cluster restarted

Bug Report

Describe the bug
When adding a cluster to Reaper, you have to specify one or more seed hosts, i.e., contact points. I am using the term seed host since that is the terminology that Reaper uses. We specify the name of the CassandraDatacenter headless service which will resolve to the Cassandra pods.

We want Reaper to store the name of the service rather than the pod IPs since the pod IPs are not stable. We set the REAPER_ENABLE_DYNAMIC_SEED_LIST environment variable to false in the Reaper deployment. This causes Reaper to persist the service name instead of the endpoints. The relevant code can be found here.
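For reference, the relevant part of the Reaper container spec in the deployment looks roughly like this (other fields elided):

env:
  - name: REAPER_ENABLE_DYNAMIC_SEED_LIST
    value: "false"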

In the Cassandra backend, we will then have a row like this:

cqlsh> select * from reaper_db.cluster ;

 name        | last_contact                    | partitioner                                 | properties       | seed_hosts                          | state
-------------+---------------------------------+---------------------------------------------+------------------+-------------------------------------+--------
 reaper-test | 2020-12-02 00:00:00.000000+0000 | org.apache.cassandra.dht.Murmur3Partitioner | {"jmxPort":7199} | {'reaper-test-reaper-test-service'} | ACTIVE

In a test environment that I had up and running for a while, I found that Reaper could no longer make JMX connections to any of the Cassandra nodes after I had done a rolling restart of the Cassandra cluster. A rolling restart means Cassandra pods will get new IPs. That should not be a problem provided Reaper is using the k8s service to resolve endpoint addresses.

I took a look at my test cluster and here is what I found:

cqlsh:reaper_db> select * from reaper_db.cluster;

 name | last_contact                    | partitioner                                 | properties       | seed_hosts                                    | state
------+---------------------------------+---------------------------------------------+------------------+-----------------------------------------------+--------
 demo | 2020-12-01 00:00:00.000000+0000 | org.apache.cassandra.dht.Murmur3Partitioner | {"jmxPort":7199} | {'10.244.1.82', '10.244.2.54', '10.244.3.89'} | ACTIVE

IP addresses are stored instead of the CassandraDatacenter service. I think that if a subset of the Cassandra nodes are restarted, Reaper can recover. It can connect to a node that has not restarted and will get the changed addresses. I am not 100% certain that this is the case, but I believe I observed this behavior some time ago.

If however the whole cluster is restarted, Reaper cannot make any JMX connections, which means it cannot manage the cluster.

How did I wind up with IP addresses in the seed_hosts column? When Reaper does a repair run, it updates the row if the nodes being repaired do not match what is stored in seed_hosts. The relevant code can be found here.

To Reproduce
Steps to reproduce the behavior:

  1. Install a k8ssandra-cluster with repair auto scheduling enabled.
$ helm install k8ssandra k8ssandra/k8ssandra

$ helm install repair-test k8ssandra/k8ssandra-cluster -f repair-values.yaml

where repair-values.yaml looks like this:

name: repair-test
clusterName: repair-test
size: 3
repair:
  reaper:
    autoschedule: true
  2. Verify that the cluster is added in Reaper with the k8s service name. Use a command like this:
$ kubectl exec -it <cassandra-pod> -c cassandra -- cqlsh -e "select seed_hosts from reaper_db.cluster where name = 'repair-test'"
  3. Do a rolling restart of the Cassandra cluster. This can be done in multiple ways. One way is to update the CassandraDatacenter spec with kubectl edit cassdc repair-test and add rollingRestartRequested: true to the spec.

  4. After the cluster restart is complete, wait for a repair to run or manually schedule one.

  5. When the repair finishes, rerun the query from step 2. The seed_hosts column will now contain IP addresses instead of the service name.

Expected behavior
Reaper should be able to manage a cluster after the cluster is restarted.

  • Helm charts version info
$ helm ls -A
NAME        	NAMESPACE	REVISION	UPDATED                                	STATUS  	CHART                   	APP VERSION
demo        	default  	1       	2020-12-01 17:25:22.459807945 +0000 UTC	deployed	k8ssandra-cluster-0.13.0	3.11.7
demo-backup 	default  	1       	2020-12-01 16:45:00.818197619 +0000 UTC	deployed	backup-0.13.0           	0.1.0
demo-restore	default  	1       	2020-12-01 16:46:56.367255906 +0000 UTC	deployed	restore-0.13.0          	0.1.0
k8ssandra   	default  	1       	2020-12-01 17:25:07.72605783 +0000 UTC 	deployed	k8ssandra-0.13.0        	3.11.7
traefik     	traefik  	1       	2020-12-01 16:26:57.231930337 +0000 UTC	deployed	traefik-9.11.0          	2.3.3
  • Helm charts user-supplied values
$ helm get values demo
USER-SUPPLIED VALUES:
clusterName: demo
repair:
  reaper:
    autoschedule: true
size: 3
  • Kubernetes version information:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-06-03T04:00:21Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:
    kind

Additional context
This is going to require changes in Reaper. K8ssandra is using Reaper 2.0.5. I believe the latest release is 2.1.2. We will want to upgrade Reaper at some point, so I do not think it makes sense to try to backport any changes.

Make Cassandra JVM heap settings configurable

Is your feature request related to a problem? Please describe.
Users should be able to configure Cassandra's JVM heap settings with the k8ssandra-cluster helm chart.

Here is an example CassandraDatacenter manifest that includes heap settings:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: k8ssandra-test
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: "3.11.7"
  managementApiAuth:
    insecure: {}
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  config:
    jvm-options:
      initial_heap_size: "800M"
      max_heap_size: "800M"

Describe the solution you'd like
There should be properties in the k8ssandra-cluster chart along with reasonable defaults that are consistent with Cassandra best practices.
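A hypothetical values.yaml sketch of what those properties might look like (key names are illustrative only, not the chart's final API):

cassandra:
  heap:
    initial: 800M
    max: 800M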

Additional context
Note that the value of the config property is actually just treated as raw JSON. This means that it is essentially an opaque blob to cass-operator.

Combine the k8ssandra and k8ssandra-cluster charts

The separate, two chart installation is awkward and error-prone.

Initially the k8ssandra chart was supposed to install the stack - Prometheus, Grafana, Reaper, cass-operator, prometheus-operator, etc. The k8ssandra-cluster chart was supposed to be focused on creating and configuring the CassandraDatacenter. We intended for everything, or as much as possible, in the k8ssandra chart to be cluster-scoped by default. This means you would only run a single instance of cass-operator, Prometheus, Grafana, etc.

We ran into various issues with trying to make things cluster-scoped; consequently, more and more things have been moved from the k8ssandra chart to the k8ssandra-cluster chart. To be clear, we intend to support cluster-scoped deployments of the various resources. We simply did not have time to address those issues prior to KubeCon.

As it currently stands, k8ssandra-cluster only installs cass-operator and prometheus-operator.

There was another reason for the separate charts. cass-operator adds a finalizer to every CassandraDatacenter. Kubernetes will not delete an object until all finalizers are cleared. If cass-operator is deleted before the CassandraDatacenter, deletion is essentially blocked. You have to manually remove the finalizer in order for deletion of the CassandraDatacenter to complete. I mention all of this because there is no ordered deletion with helm uninstall. If everything is in the same chart, we could potentially wind up in a bad state where deletion of the CassandraDatacenter is blocked.

We can work around the unordered deletion with a pre-delete hook in the chart. (See this section in the Helm docs for details). I will write up a separate ticket for implementing the pre-delete hook.

Provided we have a way to cleanly handle deletions, I propose the following for this issue:

  • Move all components/templates into the k8ssandra chart
  • Do a namespace-scoped install of cass-operator by default
  • Do a namespace-scoped install of prometheus-operator by default

With these changes everything will be namespace-scoped by default. In future tickets we can look to add support for multi-namespace or cluster-wide scope for the various components.

K8SSAND-78 ⁃ Package k8ssandra for Operator Lifecycle Manager

Is your feature request related to a problem? Please describe.
Manage lifecycle of k8ssandra-tools better.

Describe the solution you'd like
Support Operator Lifecycle Manager
Consider listing on OperatorHub.io

Describe alternatives you've considered
Helm is the alternative.

Additional context
This will aid in managing k8ssandra itself, but will come with all the other goodies of OLM such as UI integration driven by the CRDs.

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-78
┆priority: Medium

K8SSAND-132 ⁃ Error applying medusa/backup.yaml on K8s 1.19

Bug Report

Describe the bug
Running against the branch for PR #28:

$ helm install k8ssandra ../charts/k8ssandra -n k8ssandra --create-namespace
Error: failed to install CRD crds/medusa/backup.yaml: CustomResourceDefinition.apiextensions.k8s.io "cassandrabackups.cassandra.k8ssandra.io" is invalid: [spec.validation.openAPIV3Schema.properties[status].properties[cassdcTemplateSpec].properties[spec].properties[podTemplateSpec].properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property, spec.validation.openAPIV3Schema.properties[status].properties[cassdcTemplateSpec].properties[spec].properties[podTemplateSpec].properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]

That references lines 2368-2370 of medusa/backup.yaml (indents stripped for readability):

protocol:
  description: Protocol for port. Must be UDP, TCP, or SCTP. Defaults to "TCP".
  type: string

It’s complaining that there’s no default specified for protocol. I added one:

protocol:
  description: Protocol for port. Must be UDP, TCP, or SCTP. Defaults to "TCP".
  type: string
  default: "TCP"

then got this error:

Error: failed to install CRD crds/medusa/backup.yaml: CustomResourceDefinition.apiextensions.k8s.io "cassandrabackups.cassandra.k8ssandra.io" is invalid: [spec.validation.openAPIV3Schema.properties[status].properties[cassdcTemplateSpec].properties[spec].properties[podTemplateSpec].properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Forbidden: must not be set (cannot set default values in apiextensions.k8s.io/v1beta1 CRDs, must use apiextensions.k8s.io/v1), spec.validation.openAPIV3Schema.properties[status].properties[cassdcTemplateSpec].properties[spec].properties[podTemplateSpec].properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]

Then changed apiVersion from v1beta1 to v1, and got this error:

Error: failed to install CRD crds/medusa/backup.yaml: CustomResourceDefinition.apiextensions.k8s.io "cassandrabackups.cassandra.k8ssandra.io" is invalid: spec.versions[0].schema.openAPIV3Schema: Required value: schemas are required

To Reproduce
jsanda was unable to reproduce on K8s 1.17

Expected behavior
The k8ssandra chart should install without the CRD validation errors on Kubernetes 1.19.

Environment (please complete the following information):

  • Kubernetes version information:
    1.19

Additional context
Legacy reference: Jira
Originally reported by jakerobb

┆Issue is synchronized with this Jira Bug by Unito
┆friendlyId: K8SSAND-132
┆priority: Medium

Provide additional documentation related to Prometheus setup/access

Describe the solution you'd like
For the first iteration, we may want to include things like:

  • Provide steps to access the PromUI using kubectl port-forward.
  • Show some examples of querying for some different metrics.
  • Show how to view scrape targets in PromUI
  • Show how to access Prometheus logs

Additional context
Legacy reference: Jira K8C-46

Testing

Testing what will happen with the projects when an issue is created.

K8SSAND-139 ⁃ Document client encryption configuration

Is your feature request related to a problem? Please describe.
How can client encryption be configured and provided?

Describe the solution you'd like
In addition to documenting the relevant properties in the CassandraDatacenter spec, we should also explain things like:

  • What type of encryption is used
  • How certs are signed
  • Whether or not mutual TLS is used
  • How keystore and truststore are created and managed
  • Whether or not there is support for hostname verification
  • How to deploy a client application that is configured with encryption

Additional context
Legacy reference: Jira

Initial implementation of capabilities will come in #171

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-139
┆priority: Medium

K8SSAND-133 ⁃ Improve configurability of C* deployment

Is your feature request related to a problem? Please describe.
There are aspects of the C* deployment within K8ssandra which cannot be easily configured.

Describe the solution you'd like
Make aspects of the C* deployment easily configurable like:

  • Cluster size (I believe this is already available)
Users should be able to configure the cluster size of a CassandraDatacenter. Here is an example manifest:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: k8ssandra-test
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: "3.11.7"
  managementApiAuth:
    insecure: {}
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  • Multiple nodes per worker
By default cass-operator uses anti-affinity so that C* pods are not co-located on the same k8s worker node. There are times, particularly during development and testing, where we want to relax those constraints. CassandraDatacenter has the allowMultipleNodesPerWorker property for this. We want to expose this setting.
  • Version, document supported versions
  • CPU/Memory resource allocations
Here is an example CassandraDatacenter that configures CPU and memory:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: example
spec:
  clusterName: example
  serverType: cassandra
  serverVersion: 3.11.7
  managementApiAuth:
    insecure: {}
  size: 3
  allowMultipleNodesPerWorker: true
  resources:
    requests:
      cpu: 1
      memory: 1Gi
    limits:
      cpu: 1
      memory: 1Gi
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi  

Note that the operator requires resources to be specified when allowMultipleNodesPerWorker is true.

Additional context
Legacy reference: Jira

┆Issue is synchronized with this Jira Task by Unito
┆epic: Pod Scheduling Control
┆friendlyId: K8SSAND-133
┆priority: Medium

K8SSAND-137 ⁃ CI/CD integration for automated e2e/integration testing

Is your feature request related to a problem? Please describe.
The project is currently lacking automated execution of e2e/integration tests.

Describe the solution you'd like
Provide an automated approach to the execution of the existing e2e/integration tests in a publicly visible and available fashion. Preferably via GH actions.

Describe alternatives you've considered
Other CI/CD tools might be possible, such as CircleCI.

Additional context
This issue might need to be clarified or split up into multiple issues to target specific types of tests or integration approaches.

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-137
┆priority: Medium

Reaper doesn't populate with Cassandra cluster

Bug Report

When I install K8ssandra and launch reaper, reaper is not aware of the Cassandra cluster.

I am running on a Kind cluster within the Katacoda VM environment. The Kind cluster has 3 worker nodes. The Katacoda environment runs behind a proxy, so I install an nginx ingress to route ports as follows:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: petclinic-ingress
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: petclinic-frontend
          servicePort: 8081
      - path: /petclinic/api
        backend:
          serviceName: petclinic-backend
          servicePort: 9966
      - path: /prometheus
        backend:
          serviceName: k8ssandra-cluster-a-prometheus-k8ssandra
          servicePort: 9090
      - path: /webui
        backend:
          serviceName: k8ssandra-cluster-a-reaper-k8ssandra-reaper-service
          servicePort: 8080
      - path: /grafana
        backend:
          serviceName: grafana-service
          servicePort: 3000

Here's what Reaper looks like (screenshot omitted): no cluster is listed in the UI.

I installed K8ssandra with the following:

helm repo add traefik https://helm.traefik.io/traefik
helm repo update
helm install traefik traefik/traefik --create-namespace -f traefik.values.yaml

helm repo add k8ssandra https://helm.k8ssandra.io/
helm repo update
helm install k8ssandra-tools k8ssandra/k8ssandra
helm install k8ssandra-cluster-a k8ssandra/k8ssandra-cluster -f grafana-config-values.yaml

Here are the config values in grafana-config-values.yaml:

monitoring:
  grafana:
    config:
      server:
        rootUrl: http://localhost:3000/grafana
        serveFromSubPath: true
  prometheus:
    externalUrl: http://localhost:9090/prometheus
    routePrefix: /prometheus

I am also running a petclinic app.
Here's the yaml file for the app (including the ingress settings mentioned above):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: petclinic
  labels:
    app: petclinic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: petclinic-backend
  template:
    metadata:
      labels:
        app: petclinic-backend
    spec:
      containers:
      - name: petclinic-backend
        image: "datastaxdevs/petclinic-backend"
        #command: ["tail"]
        #args: ["-f", "/dev/null"]
        env:
          - name: CASSANDRA_USER
            valueFrom:
              secretKeyRef:
                name: k8ssandra-superuser
                key: username
          - name: CASSANDRA_PASSWORD
            valueFrom:
              secretKeyRef:
                name: k8ssandra-superuser
                key: password
          - name: LISTENING_PORT
            value: "9966"
          - name: MONITORING_ENABLED
            value: "true"
          - name: MONITORING_PROMETHEUS
            value: "true"
          - name: MONITORING_METRICS
            value: "true"
          - name: MONITORING_LISTENING_PORT
            value: "9967"
          - name: DISTRIBUTED_TRACING_ENABLED
            value: "true"
          - name: DISTRIBUTED_TRACING_URL
            value: "http://tracing-server:9411"
          - name: CASSANDRA_CONTACT_POINTS
            value: "k8ssandra-dc1-service:9042"
          - name: CASSANDRA_LOCAL_DC
            value: "dc1"
          - name: CASSANDRA_KEYSPACE_CREATE
            value: "true"
          - name: CASSANDRA_KEYSPACE_NAME
            value: "spring_petclinic"
          - name: CASSANDRA_KEYSPACE_CQL
            value: "CREATE KEYSPACE IF NOT EXISTS spring_petclinic WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };"

---
kind: Service
apiVersion: v1
metadata:
  name: petclinic-backend
spec:
  #type: NodePort
  selector:
    app: petclinic-backend
  ports:
  # Default port used by the image
  - port: 9966
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: petclinic-frontend
  labels:
    app: petclinic-frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: petclinic-frontend
  template:
    metadata:
      labels:
        app: petclinic-frontend
    spec:
      containers:
      - name: petclinic-frontend
        image: "datastaxdevs/petclinic-frontend"
---
kind: Service
apiVersion: v1
metadata:
  name: petclinic-frontend
spec:
  selector:
    app: petclinic-frontend
  ports:
  - port: 8081
    targetPort: 8080
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: petclinic-ingress
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: petclinic-frontend
          servicePort: 8081
      - path: /petclinic/api
        backend:
          serviceName: petclinic-backend
          servicePort: 9966
      - path: /prometheus
        backend:
          serviceName: k8ssandra-cluster-a-prometheus-k8ssandra
          servicePort: 9090
      - path: /webui
        backend:
          serviceName: k8ssandra-cluster-a-reaper-k8ssandra-reaper-service
          servicePort: 8080
      - path: /grafana
        backend:
          serviceName: grafana-service
          servicePort: 3000

---

Do not deploy Reaper schema job until Cassandra is ready

First install the k8ssandra-cluster chart:

$ helm install demo
Error: must either provide a name or specify --generate-name
johns-mbp:charts jsanda$ helm install demo ./k8ssandra-cluster/
NAME: demo
LAST DEPLOYED: Tue Nov 17 20:11:06 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

Then check on the status of the pods:

$ kubectl get pods
NAME                                                 READY   STATUS             RESTARTS   AGE
cass-operator-86d4dc45cd-kt7tp                       1/1     Running            0          10h
demo-grafana-operator-k8ssandra-68d47447d4-qxh8x     1/1     Running            0          114s
demo-reaper-k8ssandra-schema-7h89x                   0/1     CrashLoopBackOff   3          88s
demo-reaper-operator-k8ssandra-6598cb7f74-vzcbv      1/1     Running            0          114s
grafana-deployment-59fb9b985c-tb772                  1/1     Running            0          57s
k8ssandra-dc1-default-sts-0                          2/2     Running            0          113s
k8ssandra-kube-prometheus-operator-755f6c444-kvgmj   1/1     Running            0          10h
prometheus-demo-prometheus-k8ssandra-0               2/2     Running            1          114s

Notice this line:

demo-reaper-k8ssandra-schema-7h89x                   0/1     CrashLoopBackOff   3          88s

Reaper is configured to use the same Cassandra cluster that is being deployed as its backend. While Reaper manages applying schema changes, it does not create the keyspace.

reaper-operator deploys a job to create the keyspace:

$ kubectl get jobs
NAME                           COMPLETIONS   DURATION   AGE
demo-reaper-k8ssandra-schema   1/1           106s       6m29s

The CrashLoopBackOff and restarts happen because Cassandra is not up yet, so the job fails. This is fine because the job will continue to be retried and even recreated if necessary until it succeeds. This can be confusing for users, though. If there are deployment problems and a user sees that error, they might mistakenly think that is the issue.

reaper-operator should wait to deploy the schema init job until the CassandraDatacenter is ready. Look here for an example of how to check if a CassandraDatacenter is ready.
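A sketch of the status fields cass-operator reports that such a readiness check could look at (exact field and condition names may vary by cass-operator version):

status:
  cassandraOperatorProgress: Ready
  conditions:
    - type: Ready
      status: "True"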

These changes won't completely eliminate the possibility of the schema init job failing. The changes will however reduce some noise during initialization given that this is an expected situation.

Note that the PR should be submitted to the reaper-operator repo.

Resolve Grafana namespace issues

Bug Report

Describe the bug

We have multiple bug reports open with grafana-operator:
grafana/grafana-operator#303
grafana/grafana-operator#304
grafana/grafana-operator#305
grafana/grafana-operator#306

These bugs combine to make it so that if you install k8ssandra-cluster in a namespace other than the one in which you installed k8ssandra, Grafana will not have a datasource and so the dashboards will not function. Furthermore, you cannot install multiple clusters into the same namespace.

To Reproduce

helm install k8ssandra k8ssandra
helm install cluster1 k8ssandra-cluster -n cluster1 --create-namespace
helm install cluster2 k8ssandra-cluster -n cluster2 --create-namespace

Expected behavior
There should be Grafana instances created in both namespaces (cluster1 and cluster2). Each should:

  1. have a datasource configured to point to Prometheus
  2. have copies of the three dashboards (overview, condensed, and system)
  3. show data from their respective clusters

Solution
Until the listed issues are resolved, in order to play nice with multiple clusters deployed to the same Kubernetes cluster, we need to have a single Grafana instance with a single DataSource and a single set of Dashboards, and they all need to be in the same namespace as the operator.

Prometheus data source URL is wrong when Prometheus uses a routePrefix

Bug Report

Describe the bug
We added support for setting a route prefix for Prometheus in #90 (see this post for background on route prefix). Setting the routePrefix breaks the Grafana data source URL.

When the routePrefix property is configured, it needs to be included in the data source URL.

To Reproduce
Steps to reproduce the behavior:

  1. Install k8ssandra chart
$ helm install k8ssandra k8ssandra/k8ssandra
  2. Install k8ssandra-cluster with routePrefix configured
$ helm install test k8ssandra/k8ssandra-cluster -f prom-values.yaml

where prom-values.yaml looks like this:

monitoring:
  prometheus:
    routePrefix: /prometheus
  3. Wait for pods to become ready
  4. Log into Grafana UI
  5. Go to Data Sources and select Prometheus
  6. Click Save and Test. It should fail.

Expected behavior
The Data Source URL should work when routePrefix is configured such that we are able to see metrics in Grafana.

The data source url looks like http://test-prometheus-k8ssandra.default/prometheus:9090/ but it should be of the form
http://test-prometheus-k8ssandra.default/prometheus:9090/<routePrefix>.

Environment (please complete the following information):

  • Helm charts version info
$ helm ls -A
NAME        	NAMESPACE	REVISION	UPDATED                             	STATUS  	CHART                   	APP VERSION
k8ssandra   	default  	2       	2020-12-04 16:53:35.215333 -0500 EST	deployed	k8ssandra-0.14.0        	3.11.7
traefik     	default  	1       	2020-11-22 10:08:46.747246 -0500 EST	deployed	traefik-9.11.0          	2.3.3
upgrade-test	default  	2       	2020-12-04 16:54:52.861233 -0500 EST	deployed  	k8ssandra-cluster-0.14.0	3.11.7

Medusa needs to be configured with Cassandra credentials secret

Bug Report

Describe the bug
There are some places where Medusa makes CQL calls. I need to review to find out exactly where/when. The short of it is, if Cassandra auth is enabled, I am not sure if backup/restore will work.

Expected behavior
Backup/restore workflows should work even when auth is enabled on Cassandra.

Additional context
Legacy reference: Jira K8C-99

Upgrade cass-operator to > 1.5.0

Is your feature request related to a problem? Please describe.
k8ssandra is currently using a patched version of cass-operator that is between 1.4.1 and 1.5.0. The patches did not land in 1.5.0.

Describe the solution you'd like
I want to rebase our patched version on top of 1.5.0 so that we can take advantage of scale down. The version we are using does not support scale down.

I will try to get the patches into the next release of cass-operator.

https://github.com/datastax/cass-operator/packages/174279 has the changes that we need.

Backup/restore support for local storage

Is your feature request related to a problem? Please describe.
Currently, backup/restore is only supported for AWS S3 and Google Cloud storage.

Describe the solution you'd like
Add support for local storage as a means for backup/restore so that it can operate in environments where using cloud based blob storage is not an option.

Describe alternatives you've considered
Leveraging MinIO as an AWS S3 compatible locally hosted object storage mechanism.
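A hypothetical values.yaml sketch for pointing Medusa at a MinIO or other S3-compatible backend (key names and values are illustrative; the chart's actual Medusa settings may differ):

backupRestore:
  medusa:
    enabled: true
    storage: s3_compatible                   # hypothetical value
    bucketName: k8ssandra-backups
    bucketSecret: medusa-minio-credentials   # secret with the MinIO access/secret keys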

Additional context
Legacy reference: Jira

Add documentation related to the configuration of StorageClasses

Is your feature request related to a problem? Please describe.
It's unclear how to properly configure the required StorageClass for various types of storage.

Describe the solution you'd like
Provide examples of StorageClass configuration, similar to what is found in the documentation for the cass-operator, for the following types of providers in the documentation (see the sketch after this list):

  • GKE
  • EKS
  • AKS
  • Local/Development
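As an illustration of the kind of example the docs could include, here is a StorageClass for GKE using the GCE PD provisioner (parameters are illustrative and will differ per environment):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: server-storage
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete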

Additional context
Legacy reference: Jira

K8SSAND-136 ⁃ Add unit tests to verify creation of JMX auth secret

Is your feature request related to a problem? Please describe.
Secret creation is handled in secrets.go by the SecretsManager interface. It uses the controller-runtime client to validate the secret. The use of the controller-runtime client, while necessary, made it difficult for me (@jsanda) to figure out how to add unit tests in K8C-7. A reasonable solution is to add an interface for the secret validation which can then be stubbed out for unit tests.

Describe the solution you'd like
Add unit testing.

Additional context
Legacy reference: Jira

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-136
┆priority: Medium

K8SSAND-135 ⁃ The k8ssandra-cluster chart instructions should say how to connect to Cassandra via cqlsh

Is your feature request related to a problem? Please describe.
Similar to #77, after the chart installs, more info should be provided on how to access the C* cluster.

Describe the solution you'd like
The chart instructions should tell the user how to connect to the Cassandra cluster with cqlsh.

Additional context
Legacy reference: Jira

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-135
┆priority: Medium

ingressroutes are lost during scale up / scale down operations

Bug Report

Describe the bug
k8ssandra-cluster scale up/down reverts changes to ingress

To Reproduce
Steps to reproduce the behavior:

  1. Provision a Cassandra cluster according to https://k8ssandra.io/docs/getting-started/
  2. Install traefik according to https://k8ssandra.io/docs/topics/ingress/traefik/kind-deployment/#4-install-traefik-via-helm
  3. Upgrade the cluster by enabling ingress. Example command:
helm upgrade k8ssandra-cluster  ./charts/k8ssandra-cluster --set ingress.traefik.enabled=true --set ingress.traefik.monitoring.grafana.host=grafana.tomer-cass --set ingress.traefik.monitoring.prometheus.host=prometheus.tomer-cass --set ingress.traefik.repair.host=repair.tomer-cass
  4. Verify access to the services' UIs (should work) and that the ingressroutes are in place.

  5. Scale up the cluster to 3 nodes and wait for it to be fully operational.

  6. Scale down the cluster back to 1 node.

  7. See the error: the services can no longer be accessed because the ingressroutes disappeared.

Expected behavior
ingressroutes should not get affected by scale up / scale down operations.

Environment (please complete the following information):

  • Helm charts version info: provided as a screenshot (not reproduced here)

  • Helm charts user-supplied values: provided as a screenshot (not reproduced here)

  • Kubernetes version information: provided as a screenshot (not reproduced here)

  • Kubernetes cluster kind:
    On premises (multiple VM nodes) provisioned using kubeadm


K8SSAND-134 ⁃ The k8ssandra-cluster chart instructions should say how to access the super user credentials

Is your feature request related to a problem? Please describe.
After installing, additional context should be provided on the command line w.r.t. accessing the credentials

Describe the solution you'd like
At the end of helm install or helm upgrade, Helm can print instructions and other info for the user. This is customized using the templates/NOTES.txt file.

Here is example output from installing mysql:

$ helm install stable/mysql --generate-name
NAME: mysql-1602951452
LAST DEPLOYED: Sat Oct 17 12:17:34 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
MySQL can be accessed via port 3306 on the following DNS name from within your cluster:
mysql-1602951452.default.svc.cluster.local

To get your root password run:

    MYSQL_ROOT_PASSWORD=$(kubectl get secret --namespace default mysql-1602951452 -o jsonpath="{.data.mysql-root-password}" | base64 --decode; echo)

To connect to your database:

1. Run an Ubuntu pod that you can use as a client:

    kubectl run -i --tty ubuntu --image=ubuntu:16.04 --restart=Never -- bash -il

2. Install the mysql client:

    $ apt-get update && apt-get install mysql-client -y

3. Connect using the mysql cli, then provide your password:
    $ mysql -h mysql-1602951452 -p

To connect to your database directly from outside the K8s cluster:
    MYSQL_HOST=127.0.0.1
    MYSQL_PORT=3306

    # Execute the following command to route the connection:
    kubectl port-forward svc/mysql-1602951452 3306

    mysql -h ${MYSQL_HOST} -P${MYSQL_PORT} -u root -p${MYSQL_ROOT_PASSWORD}

cass-operator will create a Cassandra super user and generate a password that is stored in a secret. A user-defined secret can be used instead by setting the .spec.superuserSecretName property in a CassandraDatacenter manifest.

Like the mysql chart, we should output instructions on how to access the username and password of the super user. Both of them are stored in the secret. The secret generated by cass-operator has a name of the form <clusterName>-superuser.
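A sketch of what the generated secret looks like (the name and values below are illustrative; data values are base64-encoded):

apiVersion: v1
kind: Secret
metadata:
  name: demo-superuser             # <clusterName>-superuser
type: Opaque
data:
  username: ZGVtby1zdXBlcnVzZXI=   # base64 of "demo-superuser" (illustrative)
  password: UEFTU1dPUkQ=           # illustrative placeholder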

Additional context
Legacy reference: Jira

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-134
┆priority: Medium

K8SSAND-130 ⁃ Update chart repo when chart versions change and do not require versions to be the same across charts

Is your feature request related to a problem? Please describe.
There are a couple of issues I want to address here.

First, we currently update the k8ssandra Helm chart repo when we create a new tag. In order for a user to consume a new version of a chart, we have to create and push a tag. The tagging seems like an extra, unnecessary step.

Secondly, the GH Actions CI requires the versions to be the same across all charts. If I make an update in the k8ssandra chart and bump its version, I have to also bump the versions of all the other charts even though I have not changed them. This makes extra work for the developer making chart changes. It can also make things confusing for the end user consuming the charts because they may see new versions and expect changes when there are none.

Describe the solution you'd like
For the first problem, I would like to treat it like Docker images.

In a project that produces Docker images, the CI will typically push new images to a repo, e.g., Docker Hub, as changes are made in the project. I am free to pull those images at any time. At some point the project will cut a release which includes a new tag and new tagged image.

We should update the chart repo on every version change of a chart. I do not think tags should be part of the process of updating the charts repo.

I prefer to use tags for releases. Let's say we establish a monthly release cadence. In the lead up to the release chart versions will change. Then we do a k8ssandra release which will include several things:

  • Pinned versions of charts
    • The exact version may vary across charts
  • A pinned version of cass-operator
  • A pinned version of reaper-operator
  • A pinned version of medusa-operator

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-130
┆priority: Medium

K8SSAND-140 ⁃ Document inter-node encryption configuration

Is your feature request related to a problem? Please describe.
How can inter-node encryption be configured and provided?

Describe the solution you'd like
In addition to documenting the relevant properties in the CassandraDatacenter spec, we should also explain things like:

  • What type of encryption is used
  • How certs are signed
  • Whether or not mutual TLS is used
  • How keystore and truststore are created and managed
  • Whether or not there is support for hostname verification

Additional context
Legacy reference: Jira

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-140
┆priority: Medium

In-place restore triggers two rolling restarts of Cassandra cluster

Bug Report

Describe the bug
An in-place restore of a backup is done via a rolling restart of the Cassandra cluster. The restore operation actually triggers two rolling restarts when there should only be one.

To Reproduce
Steps to reproduce the behavior:

  1. Install k8ssandra-cluster with backup/restore enabled and configured. Here is an example values.yaml file:
name: demo
clusterName: demo
size: 3
backupRestore:
  medusa:
    enabled: true
    # Note that the bucket has to be configured separately.
    bucketName: k8ssandra-backups
    # Note that the secret should be created separately.
    bucketSecret: k8ssandra-bucket-key
    multiTenant: false
    storage: s3
  2. Install the k8ssandra and k8ssandra-cluster charts
$ helm install k8ssandra k8ssandra/k8ssandra

$ helm install backup-restore-test k8ssandra/k8ssandra-cluster -f values.yaml
  3. Wait for the CassandraDatacenter to become ready

  4. Create a backup

$ helm install backup-1 -f backup-values.yaml

where backup-values.yaml looks like this:

name: backup-1
cassandraDatacenter:
  name: dc1
  5. Wait for the backup to complete. You can monitor the progress of the backup with kubectl:
$ kubectl get cassandrabackup backup-1 -o yaml
...
spec:
  cassandraDatacenter: dc1
  name: backup-1
status:
...
  finishTime: "2020-11-30T20:13:59Z"
  finished:
  - demo-dc1-default-sts-1
  - demo-dc1-default-sts-0
  - demo-dc1-default-sts-2
  startTime: "2020-11-30T20:13:44Z"

The backup is finished when the .status.finishTime property is set.

Expected behavior
There should only be a single rolling restart, but I discovered that two rolling restarts are triggered.

Additional context
The root cause lies in medusa-operator. It is updating the podTemplateSpec property of the CassandraDatacenter as well as setting the rollingRestartRequested property to true; each of those changes triggers its own rolling restart.

Wrong chart template for Traefik IngressRoute

Bug Report

Describe the bug
Enabling the cassandra Traefik TCP Route in the yaml values file for the k8ssandra cluster gives an error in the chart yaml:

Error: YAML parse error on k8ssandra-cluster/templates/traefik.ingressroutes.yaml: error converting YAML to JSON: yaml: line 5: mapping values are not allowed in this context

To Reproduce

Prepare a k8ssandra-cluster yaml values file as in:

ingress:
  traefik:
    enabled: true
    cassandra:
      enabled: true
      entrypoints: 
        - cassandra

Try to install (or dry-run) the chart with:

> helm install k8ssandra-cluster-a ./k8ssandra-cluster   -f kvalues.yaml --dry-run
Error: YAML parse error on k8ssandra-cluster/templates/traefik.ingressroutes.yaml: error converting YAML to JSON: yaml: line 5: mapping values are not allowed in this context

Proposed Resolution
The file k8ssandra-cluster/templates/traefik.ingressroutes.yaml is missing a labels stanza in the IngressRouteTCP object (line 33):

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: {{ $releaseName }}-k8ssandra-cassandra
{{ include "k8ssandra-cluster.labels" . | indent 4 }}
spec:
  entryPoints:
...

It should be:

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: {{ $releaseName }}-k8ssandra-cassandra
  labels:
{{ include "k8ssandra-cluster.labels" . | indent 4 }}
spec:
  entryPoints:
...

Improve developer documentation

Is your feature request related to a problem? Please describe.
I'm a developer who wants to get involved in the project, but I don't know how to get started.

Describe the solution you'd like
Provide documentation related to:

  • Development environment requirements
  • Development environment setup
  • Executing tests
  • Writing tests

Additional context
Legacy reference: Jira

K8SSAND-129 ⁃ Grafana pod stuck in CrashLoopBackOff because 2 datasource entries exist in the grafana-datasources configmap

Bug Report

Describe the bug
I am trying k8ssandra on my own K8s cluster. All looks good except the grafana pod, which is stuck:

automaton@ip-10-101-33-203:~$ k get pod
NAME                                                              READY   STATUS             RESTARTS   AGE
cass-operator-86d4dc45cd-gv997                                    1/1     Running            0          7m56s
grafana-deployment-847954b9fc-lhkbh                               0/1     CrashLoopBackOff   5          3m22s
k8ssandra-cluster-a-grafana-operator-k8ssandra-5b89b64f4f-j2hh6   1/1     Running            0          7m45s
k8ssandra-cluster-a-reaper-k8ssandra-847c99ccd8-gqrtd             1/1     Running            0          2m46s
k8ssandra-cluster-a-reaper-k8ssandra-schema-jjjt7                 0/1     Completed          3          3m32s
k8ssandra-cluster-a-reaper-operator-k8ssandra-5db8b7c5b7-xz6q6    1/1     Running            0          7m45s
k8ssandra-dc1-default-sts-0                                       2/2     Running            0          7m44s
k8ssandra-tools-kube-prome-operator-6bcdf668d4-t8gdl              1/1     Running            0          7m56s
prometheus-k8ssandra-cluster-a-prometheus-k8ssandra-0             2/2     Running            1          7m44s

As per the logs:

t=2020-11-23T01:12:43+0000 lvl=eror msg="Server shutdown" logger=server reason="Service init failed: Datasource provisioning error: datasource.yaml config is invalid. Only one datasource per organization can be marked as default"

Checked the grafana-datasources configmap, did see 2 entries for datasource:

apiVersion: v1                 
data:
  default_prometheus-grafanadatasource.yaml: |
    apiVersion: 1              
    datasources:
    - access: proxy            
      editable: true           
      isDefault: true          
      jsonData:                
        timeInterval: 5s       
      name: Prometheus         
      secureJsonData: {}       
      type: prometheus         
      url: http://k8ssandra-cluster-a-prometheus-k8ssandra.default:9090
      version: 1               
  default_stress-prometheus.yaml: |
    apiVersion: 1
    datasources:               
    - access: proxy            
      isDefault: true          
      jsonData:
        timeInterval: 5s
        tlsSkipVerify: true
      name: stress-prometheus
      secureJsonData: {}
      type: prometheus
      url: http://stress-prometheus:9090
      version: 1
kind: ConfigMap
...

I tried the following and the issue still persisted:

  • Delete the grafana pod and deployment
  • Uninstall and re-install the whole k8ssandra cluster via helm

The only working workaround is to remove the 2nd entry (default_stress-prometheus.yaml) from the grafana-datasources configmap; the pod then becomes running and ready right away:

automaton@ip-10-101-33-203:~$ k get pods
NAME                                                              READY   STATUS      RESTARTS   AGE
cass-operator-86d4dc45cd-gv997                                    1/1     Running     0          62m
grafana-deployment-847954b9fc-xk6z6                               1/1     Running     0          37m
k8ssandra-cluster-a-grafana-operator-k8ssandra-5b89b64f4f-j2hh6   1/1     Running     0          61m
k8ssandra-cluster-a-reaper-k8ssandra-847c99ccd8-gqrtd             1/1     Running     0          56m
k8ssandra-cluster-a-reaper-k8ssandra-schema-jjjt7                 0/1     Completed   3          57m
k8ssandra-cluster-a-reaper-operator-k8ssandra-5db8b7c5b7-xz6q6    1/1     Running     0          61m
k8ssandra-dc1-default-sts-0                                       2/2     Running     0          61m
k8ssandra-tools-kube-prome-operator-6bcdf668d4-t8gdl              1/1     Running     0          62m
prometheus-k8ssandra-cluster-a-prometheus-k8ssandra-0             2/2     Running     1          61m

To Reproduce
Steps to reproduce the behavior:

  1. Start from a cluster that still contains a GrafanaDataSource resource left over from an earlier deployment (here, stress-prometheus from an old thelastpickle-stress test; see Additional context below).
  2. Install k8ssandra and a k8ssandra-cluster release via helm.
  3. The grafana-operator renders both datasources into the grafana-datasources configmap and the grafana pod enters CrashLoopBackOff.

Expected behavior
The grafana pod should not be stuck; the grafana-datasources configmap should contain only the single datasource created by the k8ssandra-tools-kube-prome-operator.

Environment (please complete the following information):

  • Helm charts version info

$ helm ls -A

automaton@ip-10-101-33-203:~$ helm ls -A
NAME                	NAMESPACE          	REVISION	UPDATED                                	STATUS  	CHART                   	APP VERSION
cass-operator       	my-custom-namespace	1       	2020-05-07 04:41:19.536867338 +0000 UTC	deployed	cass-operator-1.0.0     	           
demo-guestbook      	default            	1       	2020-05-06 07:24:34.859635188 +0000 UTC	deployed	guestbook-1.1.0         	2.0        
k8ssandra-cluster-a 	default            	1       	2020-11-23 00:57:32.999960191 +0000 UTC	deployed	k8ssandra-cluster-0.10.0	3.11.7     
k8ssandra-tools     	default            	1       	2020-11-23 00:57:19.158429227 +0000 UTC	deployed	k8ssandra-0.10.0        	3.11.7     
wordpress-1601961064	default            	1       	2020-10-06 05:11:07.14275057 +0000 UTC 	deployed	wordpress-9.0.3         	5.3.2      
  • Helm charts user-supplied values

$ helm get values RELEASE_NAME

automaton@ip-10-101-33-203:~$ helm get values k8ssandra-cluster-a
USER-SUPPLIED VALUES:
null
automaton@ip-10-101-33-203:~$ helm get values k8ssandra-tools
USER-SUPPLIED VALUES:
null
  • Kubernetes version information:

kubectl version

automaton@ip-10-101-33-203:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

Additional context

The 2nd datasource entry, default_stress-prometheus.yaml, in the grafana-datasources configmap might come from a CRD left over from a thelastpickle-stress test I ran a long time ago; I am verifying this now and will provide more details once found:

default_stress-prometheus.yaml: |
    apiVersion: 1
    datasources:               
    - access: proxy            
      isDefault: true          
      jsonData:
        timeInterval: 5s
        tlsSkipVerify: true
      name: stress-prometheus
      secureJsonData: {}
      type: prometheus
      url: http://stress-prometheus:9090
      version: 1

Helm version:

automaton@ip-10-101-33-203:~$ helm version
version.BuildInfo{Version:"v3.4.1", GitCommit:"c4e74854886b2efe3321e185578e6db9be0a6e29", GitTreeState:"clean", GoVersion:"go1.14.11"}

Found the CRD that defines the 2nd grafana-datasource entry; it was part of the thelastpickle-stress test cluster created long ago (the cluster was terminated but the CRD was never removed):

automaton@ip-10-101-33-203:~$ k get crd grafanadatasources.integreatly.org -o yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apiextensions.k8s.io/v1beta1","kind":"CustomResourceDefinition","metadata":{"annotations":{},"name":"grafanadatasources.integreatly.org"},"spec":{"group":"integreatly.org","names":{"kind":"GrafanaDataSource","listKind":"GrafanaDataSourceList","plural":"grafanadatasources","singular":"grafanadatasource"},"scope":"Namespaced","subresources":{"status":{}},"validation":{"openAPIV3Schema":{"properties":{"apiVersion":{"type":"string"},"kind":{"type":"string"},"metadata":{"type":"object"},"spec":{"properties":{"datasources":{"items":{"description":"Grafana Datasource Object","type":"object"},"type":"array"},"name":{"minimum":1,"type":"string"}},"required":["datasources","name"]}}}},"version":"v1alpha1"}}
  creationTimestamp: "2020-03-06T06:38:54Z"
  generation: 1
  name: grafanadatasources.integreatly.org
  resourceVersion: "3293725"
  selfLink: /apis/apiextensions.k8s.io/v1/customresourcedefinitions/grafanadatasources.integreatly.org
  uid: 3e52e79c-92d6-4bd0-854f-8de848ddc2e5
spec:
  conversion:
    strategy: None
  group: integreatly.org
  names:
    kind: GrafanaDataSource
    listKind: GrafanaDataSourceList
    plural: grafanadatasources
    singular: grafanadatasource
  preserveUnknownFields: true
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              datasources:
                items:
                  description: Grafana Datasource Object
                  type: object
                type: array
              name:
                minimum: 1
                type: string
            required:
            - datasources
            - name
    served: true
    storage: true
    subresources:
      status: {}
status:
  acceptedNames:
    kind: GrafanaDataSource
    listKind: GrafanaDataSourceList
    plural: grafanadatasources
    singular: grafanadatasource
  conditions:
  - lastTransitionTime: "2020-03-06T06:38:54Z"
    message: '[spec.validation.openAPIV3Schema.properties[spec].type: Required value:
      must not be empty for specified object fields, spec.validation.openAPIV3Schema.type:
      Required value: must not be empty at the root]'
    reason: Violations
    status: "True"
    type: NonStructuralSchema
  - lastTransitionTime: "2020-03-06T06:38:54Z"
    message: no conflicts found
    reason: NoConflicts
    status: "True"
    type: NamesAccepted
  - lastTransitionTime: "2020-03-06T06:38:54Z"
    message: the initial names have been accepted
    reason: InitialNamesAccepted
    status: "True"
    type: Established
  storedVersions:
  - v1alpha1

  • The GrafanaDataSource resource created:

automaton@ip-10-101-33-203:~$ k get grafanadatasources stress-prometheus -o yaml
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  creationTimestamp: "2020-03-06T06:42:58Z"
  generation: 1
  name: stress-prometheus
  namespace: default
  resourceVersion: "3294557"
  selfLink: /apis/integreatly.org/v1alpha1/namespaces/default/grafanadatasources/stress-prometheus
  uid: ae95210f-ffab-4b8d-9c14-a47cfeb5def0
spec:
  datasources:
  - access: proxy
    isDefault: true
    jsonData:
      timeInterval: 5s
      tlsSkipVerify: true
    name: stress-prometheus
    secureJsonData: {}
    type: prometheus
    url: http://stress-prometheus:9090
    version: 1
  name: middleware.yaml
status:
  message: success
  phase: reconciling

When the issue is happening, there are 2 grafanadatasources:

automaton@ip-10-101-33-203:~$ k get grafanadatasources -o wide
NAME                           AGE
prometheus-grafanadatasource   25m
stress-prometheus              261d

Once the CRD grafanadatasources.integreatly.org was deleted, the grafana pod came up and ran without any problem.
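
If the stress-prometheus datasource were still needed, a less drastic (untested) workaround suggested by the error message itself would be to stop marking it as the default, so that only the k8ssandra-managed datasource has isDefault: true, e.g.:

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: stress-prometheus
  namespace: default
spec:
  name: middleware.yaml
  datasources:
  - access: proxy
    isDefault: false   # only one datasource per organization may be the default
    jsonData:
      timeInterval: 5s
      tlsSkipVerify: true
    name: stress-prometheus
    secureJsonData: {}
    type: prometheus
    url: http://stress-prometheus:9090
    version: 1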

┆Issue is synchronized with this Jira Bug by Unito
┆friendlyId: K8SSAND-129
┆priority: Medium

K8SSAND-131 ⁃ Restoring to a new cluster fails

Bug Report

Describe the bug
Restoring a backup to a new cluster fails. The medusa-restore initContainer fails, which results in the Cassandra pods never being able to initialize. Here is example output from kubectl get pods:

$ kubectl -n dev-1 get pods
NAME                                                              READY   STATUS                  RESTARTS   AGE
backup-restore-test-grafana-operator-k8ssandra-7f5448578c-s4xml   1/1     Running                 0          2d4h
backup-restore-test-medusa-operator-k8ssandra-79dbbc77b6-5wp2t    1/1     Running                 0          41h
backup-restore-test-reaper-k8ssandra-ddbbfc9db-97hjs              1/1     Running                 0          2d4h
backup-restore-test-reaper-k8ssandra-schema-zzjjg                 0/1     Completed               4          2d4h
backup-restore-test-reaper-operator-k8ssandra-74757d5d49-hdp2d    1/1     Running                 0          2d4h
grafana-deployment-58958f6dcf-tfc99                               1/1     Running                 0          6h2m
k8ssandra-dc1-default-sts-0                                       3/3     Running                 0          6h25m
k8ssandra-dc1-default-sts-1                                       3/3     Running                 0          5h43m
k8ssandra-dc1-default-sts-2                                       3/3     Running                 0          5h52m
k8ssandra-dc1-restored-default-sts-0                              0/3     Init:CrashLoopBackOff   20         81m
k8ssandra-dc1-restored-default-sts-1                              0/3     Init:CrashLoopBackOff   20         81m
k8ssandra-dc1-restored-default-sts-2                              0/3     Init:CrashLoopBackOff   20         81m

Notice the three with a status of Init:CrashLoopBackOff. Those are due to the failed restores.

To Reproduce
Steps to reproduce the behavior:

  1. Create a cluster with backup-restore enabled.
$ helm install backup-restore-test k8ssandra/k8ssandra-cluster -f backup-restore-values.yaml -n dev-1

where backup-restore-values.yaml looks like:

name: restore-test
size: 3
backupRestore:
  medusa:
    enabled: true
    bucketName: k8ssandra-medusa-dev
    bucketSecret: medusa-bucket-key
    multiTenant: false
    storage: s3

Note that you need to have the S3 bucket configured beforehand (a sketch of the referenced bucket secret is included after these steps).

  2. Create a backup.
$ helm install backup-1 k8ssandra/backup -f backup-values.yaml -n dev-1

where backup-values.yaml looks like:

name: backup-1
cassandraDatacenter:
  name: dc1

  3. Wait for the backup to finish. The S3 bucket will have 3 top-level folders:
  • k8ssandra-dc1-default-sts-0.k8ssandra-dc1-service.dev-1.svc.cluster.local/
  • k8ssandra-dc1-default-sts-1.k8ssandra-dc1-service.dev-1.svc.cluster.local/
  • k8ssandra-dc1-default-sts-2.k8ssandra-dc1-service.dev-1.svc.cluster.local/
  4. Create a restore that targets a new cluster.
$ helm install restore-1 k8ssandra/restore -f restore-values.yaml -n dev-1

where restore-values.yaml looks like:

name: restore-1
backup:
  name: backup-1
cassandraDatacenter:
  name: dc1-restored
  clusterName: k8ssandra
inPlace: false
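
For reference, the bucketSecret named in step 1 (medusa-bucket-key) must exist in the namespace before the cluster chart is installed. A minimal sketch, assuming Medusa's usual S3 setup where the secret carries an AWS-credentials-style file under a medusa_s3_credentials key (check the Medusa/k8ssandra docs for the exact format your chart version expects):

apiVersion: v1
kind: Secret
metadata:
  name: medusa-bucket-key
  namespace: dev-1
type: Opaque
stringData:
  # assumed key name and file format; placeholder credentials
  medusa_s3_credentials: |-
    [default]
    aws_access_key_id = <your-access-key-id>
    aws_secret_access_key = <your-secret-access-key>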

Expected behavior
A new CassandraDatacenter named dc1-restored should be created and initialized.

Additional context
The problem, or at least part of it, is that medusa-operator specifies the name of the new CassandraDatacenter for the restore operation, so Medusa tries to fetch data using an incorrect URL. Here is an example from one of the failed medusa-restore initContainer logs:

DEBUG:root:[Storage] Getting object k8ssandra-dc1-restored-default-sts-0.k8ssandra-dc1-restored-service.dev-1.svc.cluster.local/backup-1/meta/schema.cql
[2020-11-23 20:46:01,600] DEBUG: [Storage] Getting object k8ssandra-dc1-restored-default-sts-0.k8ssandra-dc1-restored-service.dev-1.svc.cluster.local/backup-1/meta/schema.cql
DEBUG:urllib3.connectionpool:Resetting dropped connection: s3.amazonaws.com
[2020-11-23 20:46:01,602] DEBUG: Resetting dropped connection: s3.amazonaws.com
DEBUG:urllib3.connectionpool:https://s3.amazonaws.com:443 "HEAD /k8ssandra-medusa-dev HTTP/1.1" 200 0
[2020-11-23 20:46:01,692] DEBUG: https://s3.amazonaws.com:443 "HEAD /k8ssandra-medusa-dev HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:Resetting dropped connection: s3.amazonaws.com
[2020-11-23 20:46:01,694] DEBUG: Resetting dropped connection: s3.amazonaws.com
DEBUG:urllib3.connectionpool:https://s3.amazonaws.com:443 "HEAD /k8ssandra-medusa-dev/k8ssandra-dc1-restored-default-sts-0.k8ssandra-dc1-restored-service.dev-1.svc.cluster.local/backup-1/meta/schema.cql HTTP/1.1" 404 0
[2020-11-23 20:46:01,789] DEBUG: https://s3.amazonaws.com:443 "HEAD /k8ssandra-medusa-dev/k8ssandra-dc1-restored-default-sts-0.k8ssandra-dc1-restored-service.dev-1.svc.cluster.local/backup-1/meta/schema.cql HTTP/1.1" 404 0
ERROR:root:No such backup
[2020-11-23 20:46:01,790] ERROR: No such backup

Notice it is querying /k8ssandra-medusa-dev/k8ssandra-dc1-restored-default-sts-0.k8ssandra-dc1-restored-service.dev-1.svc.cluster.local.

It should be /k8ssandra-medusa-dev/k8ssandra-dc1-default-sts-0.k8ssandra-dc1-restored-service.dev-1.svc.cluster.local

┆Issue is synchronized with this Jira Bug by Unito
┆friendlyId: K8SSAND-131
┆priority: Medium

Stargate Integration

Is your feature request related to a problem? Please describe.
N/A

Describe the solution you'd like
Deploy Stargate (https://github.com/stargate/stargate) as part of K8ssandra to enable easy API integration with the underlying Cassandra cluster.
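
No chart support exists for this yet, but the integration could plausibly be exposed through the cluster chart's values, along the lines of the purely hypothetical sketch below; the actual keys would be defined as part of this work:

stargate:
  enabled: true   # hypothetical flag; would deploy Stargate nodes alongside the Cassandra datacenter
  replicas: 1     # hypothetical; the coordination layer could then scale independently of Cassandra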

Describe alternatives you've considered
N/A

Additional context
Legacy reference: Jira

Research feasibility of using GitHub Actions for executing integration testing

We want to add more CI/CD integration, but for several of the things we want to add we'll need more than the 2 CPU / 7 GB limits available to us on GitHub Actions-hosted runners.

@jsanda mentioned that "self-hosted runners" are a thing -- we might be able to host something on our internal infrastructure that can take jobs from GitHub and publish results back afterward.

This issue encompasses the research effort to determine what is possible and whether it helps us.
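
As a concrete reference point for the research, a workflow targets a self-hosted runner simply by changing runs-on; a minimal sketch (the job name, labels, and test command are illustrative):

name: integration-tests
on: [pull_request]
jobs:
  integration:
    # "self-hosted" plus optional labels selects our own runner
    # instead of a GitHub-hosted 2 CPU / 7 GB machine
    runs-on: [self-hosted, linux, x64]
    steps:
      - uses: actions/checkout@v2
      - name: Run integration tests
        run: make integ-test   # illustrative placeholder for the repo's real test entry point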

K8SSAND-76 ⁃ Upgrading from versions 0.12.0 and earlier of the k8ssandra and k8ssandra-cluster charts fails

Bug Report

Describe the bug
Upgrading releases of the k8ssandra and k8ssandra-cluster charts that were installed at version 0.12.0 or earlier fails, because helm upgrade does not apply the Reaper CRD changes introduced in later chart versions.

To Reproduce
Steps to reproduce the behavior:

  1. Make sure you have the latest charts:
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "datastax" chart repository
...Successfully got an update from the "k8ssandra" chart repository
...Successfully got an update from the "incubator" chart repository
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "traefik" chart repository
...Successfully got an update from the "bitnami" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈

$ helm search repo k8ssandra
NAME                       	CHART VERSION	APP VERSION	DESCRIPTION
k8ssandra/k8ssandra        	0.14.0       	3.11.7     	Configures and provisions the full k8ssandra stack
k8ssandra/k8ssandra-cluster	0.14.0       	3.11.7     	Configures and creates a k8ssandra cluster
k8ssandra/backup           	0.14.0       	0.1.0      	Creates a CassandraBackup
k8ssandra/restore          	0.14.0       	0.1.0      	Creates a CassandraRestore
  2. helm install k8ssandra k8ssandra/k8ssandra --version 0.12.0
  3. helm install upgrade-test k8ssandra/k8ssandra-cluster --version 0.12.0
  4. helm upgrade k8ssandra k8ssandra/k8ssandra
  5. Upgrading upgrade-test fails:
$ helm upgrade upgrade-test k8ssandra/k8ssandra-cluster
Error: UPGRADE FAILED: cannot patch "upgrade-test-reaper-k8ssandra" with kind Reaper: Reaper.reaper.cassandra-reaper.io "upgrade-test-reaper-k8ssandra" is invalid: [spec.serverConfig.cassandraBackend.cassandraService: Required value, spec.serverConfig.cassandraBackend.clusterName: Required value]

Expected behavior
helm upgrade should work without error.

  • Kubernetes version information:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.9", GitCommit:"4fb7ed12476d57b8437ada90b4f93b17ffaeed99", GitTreeState:"clean", BuildDate:"2020-07-15T16:18:16Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.13-gke.600", GitCommit:"3415f17fc893455f55dd732b92b7036a59609135", GitTreeState:"clean", BuildDate:"2020-10-20T23:58:27Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}

Additional context
Reaper CRD changes were made for #45 and were introduced in version 0.13.0 or 0.14.0 of the charts. The error above says that the spec.serverConfig.cassandraBackend.clusterName field is required; however, it was removed from the CRD.

The real problem is that Helm does not add/update CRDs with helm upgrade. See this ticket for details.

We need to implement a hook to ensure any CRD changes are applied.
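
One possible shape for such a hook, sketched below as a pre-upgrade Job that re-applies the chart's CRDs with kubectl (the names, image, and CRD location are illustrative, and the RBAC needed by the Job's service account is omitted):

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-crd-upgrade
  annotations:
    # run before the release is upgraded, then clean up the Job
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: apply-crds
          image: bitnami/kubectl:latest   # illustrative image
          command: ["/bin/sh", "-c"]
          # illustrative path; the CRD manifests would be mounted or baked into the image
          args: ["kubectl apply -f /crds/"]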

┆Issue is synchronized with this Jira Bug by Unito
┆fixVersions: k8ssandra-1.2.0
┆friendlyId: K8SSAND-76
┆priority: Medium

Template unit test

Feature Description:
Implement a unit test utilizing Terratest and Testify that verifies the creation of the k8ssandra-cluster chart using its defined cassdc template.

Reference Template
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: {{ .Values.name }}
spec:
  clusterName: {{ .Values.clusterName }}
  serverType: cassandra
  serverVersion: "3.11.7"
  managementApiAuth:
    insecure: {}
  size: 1
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  config:
    jvm-options:
      initial_heap_size: "800M"
      max_heap_size: "800M"
