# Bug Report

**Describe the bug**

I am trying k8ssandra on my own K8s cluster. Everything looks good except for the grafana pod, which is stuck in a crash loop:
```
automaton@ip-10-101-33-203:~$ k get pod
NAME                                                              READY   STATUS             RESTARTS   AGE
cass-operator-86d4dc45cd-gv997                                    1/1     Running            0          7m56s
grafana-deployment-847954b9fc-lhkbh                               0/1     CrashLoopBackOff   5          3m22s
k8ssandra-cluster-a-grafana-operator-k8ssandra-5b89b64f4f-j2hh6   1/1     Running            0          7m45s
k8ssandra-cluster-a-reaper-k8ssandra-847c99ccd8-gqrtd             1/1     Running            0          2m46s
k8ssandra-cluster-a-reaper-k8ssandra-schema-jjjt7                 0/1     Completed          3          3m32s
k8ssandra-cluster-a-reaper-operator-k8ssandra-5db8b7c5b7-xz6q6    1/1     Running            0          7m45s
k8ssandra-dc1-default-sts-0                                       2/2     Running            0          7m44s
k8ssandra-tools-kube-prome-operator-6bcdf668d4-t8gdl              1/1     Running            0          7m56s
prometheus-k8ssandra-cluster-a-prometheus-k8ssandra-0             2/2     Running            1          7m44s
```
The grafana pod logs show:

```
t=2020-11-23T01:12:43+0000 lvl=eror msg="Server shutdown" logger=server reason="Service init failed: Datasource provisioning error: datasource.yaml config is invalid. Only one datasource per organization can be marked as default"
```
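For reference, the check that Grafana is tripping over can be mimicked with a small sketch. The two dict entries below mirror the two files found in the configmap; the helper name is hypothetical, not Grafana's actual code:

```python
# Hypothetical sketch of Grafana's "only one default datasource" provisioning
# check. The entries mirror the two files in the grafana-datasources configmap.
def count_defaults(provisioning_files):
    """Count datasources marked isDefault across all provisioning files."""
    return sum(
        1
        for f in provisioning_files
        for ds in f["datasources"]
        if ds.get("isDefault")
    )

files = [
    {  # default_prometheus-grafanadatasource.yaml
        "datasources": [{"name": "Prometheus", "isDefault": True}],
    },
    {  # default_stress-prometheus.yaml
        "datasources": [{"name": "stress-prometheus", "isDefault": True}],
    },
]

if count_defaults(files) > 1:
    # Grafana aborts startup here with:
    # "Only one datasource per organization can be marked as default"
    print("datasource.yaml config is invalid")
```

With both entries present the count is 2, so startup fails; with either one removed it is 1 and Grafana starts.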
I checked the `grafana-datasources` configmap and saw two datasource entries, both marked as default:
```yaml
apiVersion: v1
data:
  default_prometheus-grafanadatasource.yaml: |
    apiVersion: 1
    datasources:
    - access: proxy
      editable: true
      isDefault: true
      jsonData:
        timeInterval: 5s
      name: Prometheus
      secureJsonData: {}
      type: prometheus
      url: http://k8ssandra-cluster-a-prometheus-k8ssandra.default:9090
      version: 1
  default_stress-prometheus.yaml: |
    apiVersion: 1
    datasources:
    - access: proxy
      isDefault: true
      jsonData:
        timeInterval: 5s
        tlsSkipVerify: true
      name: stress-prometheus
      secureJsonData: {}
      type: prometheus
      url: http://stress-prometheus:9090
      version: 1
kind: ConfigMap
...
```
I tried the following, but the issue persisted:

- Deleting the grafana pod and deployment
- Uninstalling and re-installing the whole k8ssandra cluster via helm

The only working workaround was to remove the second entry (`default_stress-prometheus.yaml`) from the `grafana-datasources` configmap; the pod became Running/Ready right away:
```
automaton@ip-10-101-33-203:~$ k get pods
NAME                                                              READY   STATUS      RESTARTS   AGE
cass-operator-86d4dc45cd-gv997                                    1/1     Running     0          62m
grafana-deployment-847954b9fc-xk6z6                               1/1     Running     0          37m
k8ssandra-cluster-a-grafana-operator-k8ssandra-5b89b64f4f-j2hh6   1/1     Running     0          61m
k8ssandra-cluster-a-reaper-k8ssandra-847c99ccd8-gqrtd             1/1     Running     0          56m
k8ssandra-cluster-a-reaper-k8ssandra-schema-jjjt7                 0/1     Completed   3          57m
k8ssandra-cluster-a-reaper-operator-k8ssandra-5db8b7c5b7-xz6q6    1/1     Running     0          61m
k8ssandra-dc1-default-sts-0                                       2/2     Running     0          61m
k8ssandra-tools-kube-prome-operator-6bcdf668d4-t8gdl              1/1     Running     0          62m
prometheus-k8ssandra-cluster-a-prometheus-k8ssandra-0             2/2     Running     1          61m
```
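The workaround amounts to dropping the `default_stress-prometheus.yaml` key from the configmap's `data` map (e.g. via `kubectl edit configmap grafana-datasources`) so that only one datasource is marked as default. A minimal sketch of that effect (the dict is illustrative; only the key names mirror the real configmap):

```python
# Sketch of the workaround: remove the second provisioning file from the
# configmap's data map so only one datasource remains marked as default.
# The key names mirror the real configmap; the values are simplified.
configmap_data = {
    "default_prometheus-grafanadatasource.yaml": {"isDefault": True},
    "default_stress-prometheus.yaml": {"isDefault": True},
}

# Workaround: delete the leftover stress-test entry.
configmap_data.pop("default_stress-prometheus.yaml", None)

defaults = [k for k, v in configmap_data.items() if v["isDefault"]]
```

After the removal, only the kube-prome-operator datasource remains as default, which is why the pod starts cleanly.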
**To Reproduce**

Steps to reproduce the behavior:

- Start with a cluster that still has a leftover `grafanadatasources.integreatly.org` CRD and a `GrafanaDataSource` resource marked as default (see Additional context)
- Install k8ssandra via helm
- Observe the grafana pod going into CrashLoopBackOff with the "Only one datasource per organization can be marked as default" error
**Expected behavior**

The grafana pod should not be stuck; the `grafana-datasources` configmap should contain only the single datasource created by the k8ssandra-tools-kube-prome-operator.
**Environment (please complete the following information):**

- Helm charts installed (`helm ls -A`):

```
automaton@ip-10-101-33-203:~$ helm ls -A
NAME                  NAMESPACE            REVISION  UPDATED                                  STATUS    CHART                     APP VERSION
cass-operator         my-custom-namespace  1         2020-05-07 04:41:19.536867338 +0000 UTC  deployed  cass-operator-1.0.0
demo-guestbook        default              1         2020-05-06 07:24:34.859635188 +0000 UTC  deployed  guestbook-1.1.0           2.0
k8ssandra-cluster-a   default              1         2020-11-23 00:57:32.999960191 +0000 UTC  deployed  k8ssandra-cluster-0.10.0  3.11.7
k8ssandra-tools       default              1         2020-11-23 00:57:19.158429227 +0000 UTC  deployed  k8ssandra-0.10.0          3.11.7
wordpress-1601961064  default              1         2020-10-06 05:11:07.14275057 +0000 UTC   deployed  wordpress-9.0.3           5.3.2
```
- Helm charts user-supplied values (`helm get values RELEASE_NAME`):

```
automaton@ip-10-101-33-203:~$ helm get values k8ssandra-cluster-a
USER-SUPPLIED VALUES:
null
automaton@ip-10-101-33-203:~$ helm get values k8ssandra-tools
USER-SUPPLIED VALUES:
null
```
- Kubernetes version information (`kubectl version`):

```
automaton@ip-10-101-33-203:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
```
**Additional context**

The second datasource entry, `default_stress-prometheus.yaml`, in the `grafana-datasources` configmap might come from a CRD left over from a thelastpickle-stress test I did a long time ago. I am verifying this now and will provide more details once found:
```yaml
default_stress-prometheus.yaml: |
  apiVersion: 1
  datasources:
  - access: proxy
    isDefault: true
    jsonData:
      timeInterval: 5s
      tlsSkipVerify: true
    name: stress-prometheus
    secureJsonData: {}
    type: prometheus
    url: http://stress-prometheus:9090
    version: 1
```
Helm version:

```
automaton@ip-10-101-33-203:~$ helm version
version.BuildInfo{Version:"v3.4.1", GitCommit:"c4e74854886b2efe3321e185578e6db9be0a6e29", GitTreeState:"clean", GoVersion:"go1.14.11"}
```
I found the CRD below, which defines the second grafana-datasource entry. It was part of the thelastpickle-stress test cluster created long ago (the cluster was terminated but the CRD was not removed):
```yaml
automaton@ip-10-101-33-203:~$ k get crd grafanadatasources.integreatly.org -o yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apiextensions.k8s.io/v1beta1","kind":"CustomResourceDefinition","metadata":{"annotations":{},"name":"grafanadatasources.integreatly.org"},"spec":{"group":"integreatly.org","names":{"kind":"GrafanaDataSource","listKind":"GrafanaDataSourceList","plural":"grafanadatasources","singular":"grafanadatasource"},"scope":"Namespaced","subresources":{"status":{}},"validation":{"openAPIV3Schema":{"properties":{"apiVersion":{"type":"string"},"kind":{"type":"string"},"metadata":{"type":"object"},"spec":{"properties":{"datasources":{"items":{"description":"Grafana Datasource Object","type":"object"},"type":"array"},"name":{"minimum":1,"type":"string"}},"required":["datasources","name"]}}}},"version":"v1alpha1"}}
  creationTimestamp: "2020-03-06T06:38:54Z"
  generation: 1
  name: grafanadatasources.integreatly.org
  resourceVersion: "3293725"
  selfLink: /apis/apiextensions.k8s.io/v1/customresourcedefinitions/grafanadatasources.integreatly.org
  uid: 3e52e79c-92d6-4bd0-854f-8de848ddc2e5
spec:
  conversion:
    strategy: None
  group: integreatly.org
  names:
    kind: GrafanaDataSource
    listKind: GrafanaDataSourceList
    plural: grafanadatasources
    singular: grafanadatasource
  preserveUnknownFields: true
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              datasources:
                items:
                  description: Grafana Datasource Object
                  type: object
                type: array
              name:
                minimum: 1
                type: string
            required:
            - datasources
            - name
    served: true
    storage: true
    subresources:
      status: {}
status:
  acceptedNames:
    kind: GrafanaDataSource
    listKind: GrafanaDataSourceList
    plural: grafanadatasources
    singular: grafanadatasource
  conditions:
  - lastTransitionTime: "2020-03-06T06:38:54Z"
    message: '[spec.validation.openAPIV3Schema.properties[spec].type: Required value:
      must not be empty for specified object fields, spec.validation.openAPIV3Schema.type:
      Required value: must not be empty at the root]'
    reason: Violations
    status: "True"
    type: NonStructuralSchema
  - lastTransitionTime: "2020-03-06T06:38:54Z"
    message: no conflicts found
    reason: NoConflicts
    status: "True"
    type: NamesAccepted
  - lastTransitionTime: "2020-03-06T06:38:54Z"
    message: the initial names have been accepted
    reason: InitialNamesAccepted
    status: "True"
    type: Established
  storedVersions:
  - v1alpha1
```
The `GrafanaDataSource` resource created by that test:
```yaml
automaton@ip-10-101-33-203:~$ k get grafanadatasources stress-prometheus -o yaml
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  creationTimestamp: "2020-03-06T06:42:58Z"
  generation: 1
  name: stress-prometheus
  namespace: default
  resourceVersion: "3294557"
  selfLink: /apis/integreatly.org/v1alpha1/namespaces/default/grafanadatasources/stress-prometheus
  uid: ae95210f-ffab-4b8d-9c14-a47cfeb5def0
spec:
  datasources:
  - access: proxy
    isDefault: true
    jsonData:
      timeInterval: 5s
      tlsSkipVerify: true
    name: stress-prometheus
    secureJsonData: {}
    type: prometheus
    url: http://stress-prometheus:9090
    version: 1
  name: middleware.yaml
status:
  message: success
  phase: reconciling
```
When the issue is happening, there are two grafanadatasources:

```
automaton@ip-10-101-33-203:~$ k get grafanadatasources -o wide
NAME                           AGE
prometheus-grafanadatasource   25m
stress-prometheus              261d
```
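The configmap key names appear to follow the pattern `<namespace>_<GrafanaDataSource name>.yaml` (my inference from the data above, not confirmed against grafana-operator's code), which would explain why each leftover `GrafanaDataSource` contributes its own provisioning file:

```python
# Assumed naming scheme, inferred from the configmap keys seen above
# (<namespace>_<GrafanaDataSource name>.yaml). This is a guess at what
# grafana-operator does, not its actual implementation.
def configmap_key(namespace, cr_name):
    return f"{namespace}_{cr_name}.yaml"

keys = [
    configmap_key("default", "prometheus-grafanadatasource"),
    configmap_key("default", "stress-prometheus"),
]
```

Both generated keys match the two entries found in the `grafana-datasources` configmap.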
Once the `grafanadatasources.integreatly.org` CRD was deleted (which also removed the leftover `GrafanaDataSource` resource), the grafana pod started running without problems.
┆Issue is synchronized with this Jira Bug by Unito
┆friendlyId: K8SSAND-129
┆priority: Medium