
kube-prometheus's Introduction

kube-prometheus


Note that everything is experimental and may change significantly at any time.

This repository collects Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy-to-operate, end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.

The content of this project is written in jsonnet. It can be described both as a package and as a library.
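To illustrate the library side, here is a minimal sketch of a jsonnet entry point in the spirit of example.jsonnet; the import path and component fields follow the older kube-prometheus.libsonnet layout shown later in this document and are assumptions, so the checked-in example.jsonnet remains the authoritative reference:

local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + {
  _config+:: {
    // Render everything into the monitoring namespace.
    namespace: 'monitoring',
  },
};

// One output key per manifest file; typically compiled with jsonnet -J vendor -m manifests <file>.
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) }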

Components included in this package:

This stack is meant for cluster monitoring, so it is pre-configured to collect metrics from all Kubernetes components. In addition to that, it delivers a default set of dashboards and alerting rules. Many of the useful dashboards and alerts come from the kubernetes-mixin project, which, similar to this project, provides composable jsonnet as a library for users to customize to their needs.

Prerequisites

You will need a Kubernetes cluster, that's it! By default it is assumed that the kubelet uses token authentication and authorization; otherwise Prometheus needs a client certificate, which gives it full access to the kubelet rather than just the metrics. Token authentication and authorization allow more fine-grained and easier access control.

This means the kubelet configuration must contain these flags:

  • --authentication-token-webhook=true This flag allows a ServiceAccount token to be used to authenticate against the kubelet(s). This can also be enabled by setting the kubelet configuration value authentication.webhook.enabled to true.
  • --authorization-mode=Webhook With this flag, the kubelet performs an RBAC request against the API server to determine whether the requesting entity (Prometheus in this case) is allowed to access a resource, specifically the /metrics endpoint for this project. This can also be enabled by setting the kubelet configuration value authorization.mode to Webhook.

This stack provides resource metrics by deploying the Prometheus Adapter. This adapter is an Extension API Server, and Kubernetes needs to have this feature enabled; otherwise the adapter has no effect, but is still deployed.

Compatibility

The following Kubernetes versions are supported and work, as we test against these versions in their respective branches. Note that other versions might also work!

kube-prometheus stack Kubernetes 1.22 Kubernetes 1.23 Kubernetes 1.24 Kubernetes 1.25 Kubernetes 1.26 Kubernetes 1.27 Kubernetes 1.28
release-0.10 x x x
release-0.11 x x x
release-0.12 x x x
release-0.13 x
main x x

Quickstart

Note: For versions before Kubernetes v1.21.z refer to the Kubernetes compatibility matrix in order to choose a compatible branch.

This project is intended to be used as a library (i.e. the intent is not for you to create your own modified copy of this repository).

However, for a quickstart, a compiled version of the Kubernetes manifests generated with this library (specifically with example.jsonnet) is checked into this repository so you can try the content out quickly. To try out the stack uncustomized, run:

  • Create the monitoring stack using the config in the manifests directory:
# Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
# Note that due to the size of some CRDs we use the kubectl server-side apply feature, which is generally available since Kubernetes 1.22.
# If you are using a previous Kubernetes version, this feature may not be available and you would need to use kubectl create instead.
kubectl apply --server-side -f manifests/setup
kubectl wait \
	--for condition=Established \
	--all CustomResourceDefinition \
	--namespace=monitoring
kubectl apply -f manifests/

We create the namespace and CustomResourceDefinitions first to avoid race conditions when deploying the monitoring components. Alternatively, the resources in both folders can be applied with a single command kubectl apply --server-side -f manifests/setup -f manifests, but it may be necessary to run the command multiple times for all components to be created successfully.

  • And to teardown the stack:
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup

minikube

To try out this stack, start minikube with the following command:

$ minikube delete && minikube start --kubernetes-version=v1.23.0 --memory=6g --bootstrapper=kubeadm --extra-config=kubelet.authentication-token-webhook=true --extra-config=kubelet.authorization-mode=Webhook --extra-config=scheduler.bind-address=0.0.0.0 --extra-config=controller-manager.bind-address=0.0.0.0

The kube-prometheus stack includes a resource metrics API server, so the metrics-server addon is not necessary. Ensure the metrics-server addon is disabled on minikube:

$ minikube addons disable metrics-server

Getting started

Before deploying kube-prometheus in a production environment, read:

  1. Customizing kube-prometheus
  2. Customization examples
  3. Accessing Graphical User Interfaces
  4. Troubleshooting kube-prometheus

Documentation

  1. Continuous Delivery
  2. Update to new version
  3. For more documentation on the project, refer to the docs/ directory.

Contributing

To contribute to kube-prometheus, refer to Contributing.

Join the discussion

If you have any questions or feedback regarding kube-prometheus, join the kube-prometheus discussion. Alternatively, consider joining the Kubernetes Slack #prometheus-operator channel or the project's bi-weekly Contributor Office Hours.

License

Apache License 2.0, see LICENSE.


kube-prometheus's Issues

I don't understand the readme, did you install all the YAML under the manifests directory?

What did you do?
https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus/manifests

I don't understand the readme, did you install all the YAML under the manifests directory?

What did you expect to see?
How to install kube-prometheus.

What did you see instead? Under which circumstances?

Environment

  • Kubernetes version information:

    insert output of kubectl version here

  • Kubernetes cluster kind:

    insert how you created your cluster: kops, bootkube, tectonic-installer, etc.

  • Manifests:

insert manifests relevant to the issue
  • Prometheus Operator Logs:
insert Prometheus Operator logs relevant to the issue here

Grafana 4.1.1

Is there a reason the version of Grafana used is still 3.1.1? I'm going to try with 4.1.1, as it appears to be needed for some dashboards I'm trying to use, but wanted to check if there are known issues with upgrading from 3.1.1.

Running `deploy` results in error "invalid character '\x00' in string literal"

Environment:
I'm running k8s 1.5.1. Kubectl 1.5.2. Running kubectl from my mac.
k8s running on a cluster of Ubuntu 14.04 instances.

Any config maps I try to load from the coreos team seem to have this same issue. When I try to create the resource it gives the following error:

Error from server (BadRequest): error when creating "manifests/grafana/grafana-cm.yaml": the object provided is unrecognized (must be of type ConfigMap): couldn't get version/kind; json parse error: invalid character '\x00' in string literal ({"kind":"ConfigMap","apiVersio ...)

Prometheus alerts immediately with Helm install

I just installed prometheus-operator using Helm with the following commands:

helm install coreos/prometheus-operator --name prometheus-operator --namespace monitoring
helm install coreos/kube-prometheus --name kube-prometheus --set global.rbacEnable=true --namespace monitoring

Everything came up looking good. It's able to see all targets. However, I am getting the following alerts.

(screenshot of the firing alerts)

Controller manager and schedulers are definitely running.

kube-system     kube-controller-manager-kube-master-01.mydomain.com    1/1       Running   1          23h
kube-system     kube-controller-manager-kube-master-02.mydomain.com    1/1       Running   0          23h
kube-system     kube-controller-manager-kube-master-03.mydomain.com    1/1       Running   0          23h
kube-system     kube-scheduler-kube-master-01.mydomain.com             1/1       Running   1          59m
kube-system     kube-scheduler-kube-master-02.mydomain.com             1/1       Running   0          23h
kube-system     kube-scheduler-kube-master-03.mydomain.com             1/1       Running   0          23h

Issue #61 seems to be similar, but I am unclear on what I need to do to fix these alerts.

Evaluate used memory as "Total - MemFree - Buffers - Cached" and display stacked Used/Buffers/Cached/Free

Currently "Used" memory is evaluated as "Total - MemFree":
https://github.com/coreos/kube-prometheus/blob/615f45e/assets/grafana/node-dashboard.json#L308

During typical Linux server usage the "free memory" value is very small: the kernel caches all disk I/O and usually uses almost all free memory for different types of caches, so Grafana will almost always show that > 90% of memory is "used", which from some point of view is true, but not very useful.

It would be much better to show separately the memory actually used privately by processes, the size of buffers, the size of caches, and the size of free memory:

          "targets": [
            {
              "expr": "node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}",
              "intervalFactor": 2,
              "refId": "A",
              "legendFormat": "memory used",
            },
            {
              "expr": "node_memory_Buffers{instance=\"$server\"}",
              "intervalFactor": 2,
              "refId": "B",
              "legendFormat": "memory buffers",
            },
            {
              "expr": "node_memory_Cached{instance=\"$server\"}",
              "intervalFactor": 2,
              "refId": "C",
              "legendFormat": "memory cache"
            },
            {
              "expr": "node_memory_MemFree{instance=\"$server\"}",
              "intervalFactor": 2,
              "refId": "D",
              "legendFormat": "memory free",
            }
          ],

This is done in prometheus dashboard by default: https://github.com/prometheus/prometheus/blob/bed4635/consoles/node-overview.html#L99

Question: kube-prometheus "as a library"

Hi. I think this is great! But in the docs it says "Although this project is intended to be used as a library". What's wrong with just deploying kubectl create -f manifests/? Is there something wrong with that deployment? It all seemed to work pretty well, and was usable.

Thanks

Adding Grafana dashboards

I'm attempting to import additional grafana dashboards by adding them to the configmap but they are not showing up.

The dashboards work if I import them as files from the grafana UI, the dashboards in question are these two https://github.com/deadtrickster/beam-dashboards

But when I copied their contents as BEAM.json: |- ... and Elli.json: |- ... to the configmap and applied it, they never showed up. I know that the watcher is working because in the same change I also included a new datasource in the configmap, and that did show up as expected in Grafana.

Is there a difference between the expected input and the files one would upload to grafana?

Any help you can give would be much appreciated, thanks!
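For what it's worth, when generating the stack from jsonnet rather than hand-editing the ConfigMap, one way extra dashboards can be wired in is through the library itself. A rough sketch, assuming the library exposes grafanaDashboards and grafana fields as described in the "Developing Prometheus Rules and Grafana Dashboards" document; the field names and file paths here are assumptions:

local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + {
  _config+:: { namespace: 'monitoring' },
  grafanaDashboards+:: {
    // Keys become dashboard file names; the values are the same JSON exports
    // you would otherwise paste into the ConfigMap.
    'beam.json': (import 'dashboards/BEAM.json'),
    'elli.json': (import 'dashboards/Elli.json'),
  },
};

{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }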

feature request: customize existing prometheus alerts

To give an example: I grabbed a copy of build.sh, created try-edit-prom-alert.jsonnet, and then did:

josh@MyMac$ jb init
josh@MyMac$ jb install github.com/coreos/prometheus-operator/contrib/kube-prometheus/jsonnet/kube-prometheus
josh@MyMac$ cat try-edit-prom-alert.jsonnet
local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + {
  _config+:: {
    namespace: 'monitoring',
  },
  prometheusAlerts+:: {
    groups+: [{
      name: 'alertmanager.rules',
      rules: [
        {
          alert: 'AlertmanagerFailedReload',
          annotations: {
            description: "Reloading Alertmanager's configuration has failed for {{ $labels.namespace }}/{{ $labels.pod}}.",
            summary: "Alertmanager's configuration reload failed",
          },
          expr: |||
            a_different_metric{%(alertmanagerSelector)s} == 0
          ||| % $._config,
          'for': '10m',
          labels: {
            severity: 'warning',
          },
        },
      ], // rules
    },], // groups
  }, // prometheusAlerts
};

{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) }
josh@MyMac$ ./build.sh try-edit-prom-alert.jsonnet
josh@MyMac$ cd manifests
josh@MyMac$ diff -w -b ~/prometheus-operator_clone/prometheus-rules.yaml .
861a862,873
>   - name: alertmanager.rules
>     rules:
>     - alert: AlertmanagerFailedReload
>       annotations:
>         description: Reloading Alertmanager's configuration has failed for {{ $labels.namespace
>           }}/{{ $labels.pod}}.
>         summary: Alertmanager's configuration reload failed
>       expr: |
>         a_different_metric{job="alertmanager-main"} == 0
>       for: 10m
>       labels:
>         severity: warning
josh@MyMac$ grep AlertmanagerFailedReload prometheus-rules.yaml
    - alert: AlertmanagerFailedReload
    - alert: AlertmanagerFailedReload

(Note that ~/prometheus-operator_clone/prometheus-rules.yaml is a copy of kube-prometheus/manifests/prometheus-rules.yaml.)

That ^ created a 2nd copy of AlertmanagerFailedReload, but what I would like is to end up with only 1 instance/copy of AlertmanagerFailedReload in manifests/prometheus-rules.yaml - i.e. the copy from try-edit-prom-alert.jsonnet.

Is there already a way to customize prom alerts (that are defined in the mixins used by kube-prometheus) that I'm not aware of? If not, does this sort of functionality seem useful to others too?

Reference info: Developing Prometheus Rules and Grafana Dashboards
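One way to end up with a single, modified copy of the alert is to rewrite the matching rule inside the existing group instead of appending a new group. A rough sketch using plain jsonnet std.map over the groups inherited from the mixins (the selector expression is taken from the diff above; whether this composes cleanly depends on the kube-prometheus version in use):

local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + {
  _config+:: { namespace: 'monitoring' },
  prometheusAlerts+:: {
    // Replace the existing AlertmanagerFailedReload rule in place.
    groups: std.map(
      function(group)
        if group.name == 'alertmanager.rules' then
          group {
            rules: std.map(
              function(rule)
                if rule.alert == 'AlertmanagerFailedReload' then
                  rule { expr: 'a_different_metric{job="alertmanager-main"} == 0' }
                else
                  rule,
              group.rules
            ),
          }
        else
          group,
      super.groups
    ),
  },
};

{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) }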

Node's "free memory" in Grafana actually shows used memory

I installed Prometheus using the deploy script from 3a38e17.

When I access Grafana at http://MY_NODE:30902/dashboard/db/nodes?var-server=MY_NODE:9100, it displays "Free memory". However, the numbers seemed way off (displaying 5GiB free on a machine that runs almost no services and should have 24GiB of RAM), so I compared with the output of free -m on that node. Grafana appears to actually show the "used memory", not "free memory".

Additionally: Maybe adding another graph to the plot, showing "total memory", would help putting the numbers into context.

How to use other scrape_configs than kube_sd_config

Instead of adding a short static configuration, the current setup creates a dummy service just so that all services get discovered via Kubernetes.

- job_name: etcd
  static_configs:
  - targets: [172.17.4.51:2379]

This indirection seems to be just overhead. I assume it's the result of trying to completely generate prometheus.yml at some point. I think there should still be some way to provide non-kubernetes_sd_configs in that case.

node-exporter TargetDown

What did you do?

Installed prometheus-operator using the Helm chart found here: https://github.com/helm/charts/tree/master/stable/prometheus-operator . My GKE test cluster uses preemptible nodes, and after nodes are preempted I start getting alerts from Prometheus that node-exporter targets are down even though all node-exporters are up and running (I see metrics when port-forwarding to them).

Labels | State | Active Since | Value
-- | -- | -- | --
alertname="TargetDown"   job="node-exporter"  severity="warning" | firing | 2018-11-21 10:46:39.941756819 +0000 UTC | 66.66666666666666
❯ kubectl get po -n monitoring
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-monitoring-prometheus-oper-alertmanager-0   2/2     Running   0          3h
monitoring-grafana-679df6bd5-vn2wz                       3/3     Running   4          1d
monitoring-kube-state-metrics-764d6d59df-k7829           1/1     Running   0          3h
monitoring-prometheus-node-exporter-4nnjq                1/1     Running   0          1d
monitoring-prometheus-node-exporter-ggvrs                1/1     Running   0          21h
monitoring-prometheus-node-exporter-hggzv                1/1     Running   0          21h
monitoring-prometheus-node-exporter-jwxcn                1/1     Running   0          1d
monitoring-prometheus-oper-operator-55564b6cbb-rdzqm     1/1     Running   0          2h
prometheus-monitoring-prometheus-oper-prometheus-0       3/3     Running   0          2h

What did you expect to see?

After nodes are removed/preempted and new nodes are added, node-exporter targets are refreshed and always pick up the new nodes.

What did you see instead? Under which circumstances?
At the moment I have 4 nodes in my cluster but metrics in Prometheus are only available for 2 of them

Environment

  • Prometheus Operator version:

    quay.io/coreos/prometheus-operator:v0.25.0

  • Kubernetes version information:

kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-27T17:05:32Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.7-gke.11", GitCommit:"fa90543563c9cfafca69128ce8cd9ecd5941940f", GitTreeState:"clean", BuildDate:"2018-11-08T20:22:21Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

    GKE cluster created with terraform

  • Manifests:

insert manifests relevant to the issue
  • Prometheus Operator Logs:
haven't found any specific errors 

Custom services datasource

When adding a custom service to monitor the metrics of my apps, like running hack/example-service-monitoring/deploy, I then had to manually add the datasource http://prometheus.default.svc:9090 in the Grafana ui before I could see those metrics.

Is this a required step or did I do something else wrong? I could not tell from the readme one way or the other.

Two alerts failing out of the box: K8SControllerManagerDown and K8SSchedulerDown

Installed with KOPS 1.4.1, K8s 1.4.6 on AWS.

It looks to me like the query is set to alert when there is one kube-scheduler (or kube-controller-manager), which I don't understand.

ALERT K8SSchedulerDown
  IF absent(up{job="kube-scheduler"}) or (count(up{job="kube-scheduler"} == 1) BY (cluster) == 0)

I'm pretty new to Prometheus queries and I'm not really sure how the BY (cluster) == 0 part relates.
Any pointers appreciated.
Thanks for the great project!
--Duncan

Alerts failing out of the box: K8SControllerManagerDown and K8SSchedulerDown (on Kargo)

I see similar behavior to #23: after deploying the Prometheus operator on a Kubernetes cluster deployed via Kargo, the K8SControllerManagerDown and K8SSchedulerDown alerts are firing:

(screenshot of the firing alerts)

Here are my targets:

(screenshot of the Prometheus targets page)

Prometheus configuration:

alerting:
  alertmanagers:
  - kubernetes_sd_configs:
    - role: endpoints
    relabel_configs:
    - action: keep
      regex: alertmanager-main
      source_labels:
      - __meta_kubernetes_service_name
    - action: keep
      regex: monitoring
      source_labels:
      - __meta_kubernetes_namespace
    - action: keep
      regex: web
      source_labels:
      - __meta_kubernetes_endpoint_port_name
    scheme: http

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
- /etc/prometheus/rules/*.rules

scrape_configs:
- job_name: kubelets
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # Skip verification until we have resolved why the certificate validation
    # for the kubelet on API server nodes fail.
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - role: node

# Scrapes the endpoint lists for the Kubernetes API server, kube-state-metrics,
# and node-exporter, which we all consider part of a default setup.
- job_name: standard-endpoints
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # As for kubelets, certificate validation fails for the API server (node)
    # and we circumvent it for now.
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - role: endpoints

  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_service_name]
    regex: prometheus|node-exporter|kube-state-metrics
  - action: replace
    source_labels: [__meta_kubernetes_service_name]
    target_label: job

# Scrapes the endpoint lists for the kube-dns server. Which we consider
# part of a default setup.
- job_name: kube-components
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - role: endpoints

  relabel_configs:
  - action: replace
    source_labels: [__meta_kubernetes_service_label_k8s_app]
    target_label: job
  - action: keep
    source_labels: [__meta_kubernetes_service_name]
    regex: ".*-prometheus-discovery"
  - action: keep
    source_labels: [__meta_kubernetes_endpoint_port_name]
    regex: "http-metrics.*|https-metrics.*"
  - action: replace
    source_labels: [__meta_kubernetes_endpoint_port_name]
    regex: "https-metrics.*"
    target_label: __scheme__
    replacement: https

Labels:

$ kubectl -n kube-system get ep --show-labels
NAME                        ENDPOINTS                                                  AGE       LABELS
default-http-backend        10.233.85.8:8080                                           17d       k8s-app=default-http-backend
dnsmasq                     10.233.72.2:53,10.233.77.2:53,10.233.83.2:53 + 5 more...   17d       k8s-app=dnsmasq,kubernetes.io/cluster-service=true
elasticsearch-logging       10.233.77.7:9200,10.233.85.4:9200                          17d       k8s-app=elasticsearch-logging,kubernetes.io/cluster-service=true,kubernetes.io/name=Elasticsearch
heapster                    10.233.77.5:8082                                           17d       kubernetes.io/name=Heapster,task=monitoring
ingress-controller-leader   <none>                                                     17d       <none>
kibana-logging              10.233.85.11:5601                                          17d       k8s-app=kibana-logging,kubernetes.io/cluster-service=true,kubernetes.io/name=Kibana
kube-controller-manager     <none>                                                     17d       <none>
kube-scheduler              <none>                                                     17d       <none>
kubedns                     10.233.85.3:53,10.233.85.3:53                              17d       k8s-app=kubedns,kubernetes.io/cluster-service=true,kubernetes.io/name=kubedns
kubernetes-dashboard        10.233.85.6:9090                                           17d       app=kubernetes-dashboard
monitoring-grafana          10.233.72.6:3000                                           17d       kubernetes.io/name=monitoring-grafana
monitoring-influxdb         10.233.72.4:8086                                           17d       kubernetes.io/name=monitoring-influxdb,task=monitoring
$ kubectl -n kube-system get svc --show-labels
NAME                    CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE       LABELS
default-http-backend    10.233.50.202   <none>        80/TCP          17d       k8s-app=default-http-backend
dnsmasq                 10.233.0.2      <none>        53/TCP,53/UDP   17d       k8s-app=dnsmasq,kubernetes.io/cluster-service=true
elasticsearch-logging   10.233.17.19    <none>        9200/TCP        17d       k8s-app=elasticsearch-logging,kubernetes.io/cluster-service=true,kubernetes.io/name=Elasticsearch
heapster                10.233.20.165   <none>        80/TCP          17d       kubernetes.io/name=Heapster,task=monitoring
kibana-logging          10.233.7.135    <none>        5601/TCP        17d       k8s-app=kibana-logging,kubernetes.io/cluster-service=true,kubernetes.io/name=Kibana
kubedns                 10.233.0.3      <none>        53/UDP,53/TCP   17d       k8s-app=kubedns,kubernetes.io/cluster-service=true,kubernetes.io/name=kubedns
kubernetes-dashboard    10.233.27.199   <nodes>       80:31167/TCP    17d       app=kubernetes-dashboard
monitoring-grafana      10.233.38.226   <none>        80/TCP          17d       kubernetes.io/name=monitoring-grafana
monitoring-influxdb     10.233.27.33    <none>        8086/TCP        17d       kubernetes.io/name=monitoring-influxdb,task=monitoring
kubectl -n kube-system get pods --show-labels
NAME                                    READY     STATUS    RESTARTS   AGE       LABELS
default-http-backend-2657704409-f958m   1/1       Running   0          8d        k8s-app=default-http-backend,pod-template-hash=2657704409
dnsmasq-dfnq5                           1/1       Running   0          17d       k8s-app=dnsmasq
dnsmasq-f6gjj                           1/1       Running   0          17d       k8s-app=dnsmasq
dnsmasq-h0x93                           1/1       Running   0          17d       k8s-app=dnsmasq
dnsmasq-h8f72                           1/1       Running   0          17d       k8s-app=dnsmasq
elasticsearch-logging-v1-1zm9f          1/1       Running   2          8d        k8s-app=elasticsearch-logging,kubernetes.io/cluster-service=true,version=v2.4.1
elasticsearch-logging-v1-62t37          1/1       Running   0          17d       k8s-app=elasticsearch-logging,kubernetes.io/cluster-service=true,version=v2.4.1
flannel-wetta-kuber01                   1/1       Running   1          17d       app=flannel,version=v0.1
flannel-wetta-kuber02                   1/1       Running   1          17d       app=flannel,version=v0.1
flannel-wetta-kuber03                   1/1       Running   1          8d        app=flannel,version=v0.1
flannel-wetta-noaaweather               1/1       Running   1          17d       app=flannel,version=v0.1
fluentd-es-v1.22-5rcp8                  1/1       Running   0          17d       k8s-app=fluentd-es,kubernetes.io/cluster-service=true,version=v1.22
fluentd-es-v1.22-6pt68                  1/1       Running   0          17d       k8s-app=fluentd-es,kubernetes.io/cluster-service=true,version=v1.22
fluentd-es-v1.22-rp33r                  1/1       Running   0          17d       k8s-app=fluentd-es,kubernetes.io/cluster-service=true,version=v1.22
fluentd-es-v1.22-s3v4s                  1/1       Running   0          17d       k8s-app=fluentd-es,kubernetes.io/cluster-service=true,version=v1.22
heapster-564189836-szfbx                1/1       Running   0          15d       k8s-app=heapster,pod-template-hash=564189836,task=monitoring
kibana-logging-3982467769-hwbm0         1/1       Running   0          20h       k8s-app=kibana-logging,pod-template-hash=3982467769
kube-apiserver-wetta-kuber01            1/1       Running   0          17d       k8s-app=kube-apiserver,kargo=v2
kube-apiserver-wetta-kuber02            1/1       Running   0          17d       k8s-app=kube-apiserver,kargo=v2
kube-controller-manager-wetta-kuber01   1/1       Running   1          17d       k8s-app=kube-controller
kube-controller-manager-wetta-kuber02   1/1       Running   0          17d       k8s-app=kube-controller
kube-proxy-wetta-kuber01                1/1       Running   1          17d       k8s-app=kube-proxy
kube-proxy-wetta-kuber02                1/1       Running   1          17d       k8s-app=kube-proxy
kube-proxy-wetta-kuber03                1/1       Running   1          8d        k8s-app=kube-proxy
kube-proxy-wetta-noaaweather            1/1       Running   1          17d       k8s-app=kube-proxy
kube-scheduler-wetta-kuber01            1/1       Running   0          17d       k8s-app=kube-scheduler
kube-scheduler-wetta-kuber02            1/1       Running   1          17d       k8s-app=kube-scheduler
kubedns-m6x4j                           3/3       Running   0          17d       k8s-app=kubedns,kubernetes.io/cluster-service=true,version=v19
kubernetes-dashboard-3203831700-hb0qp   1/1       Running   0          17d       app=kubernetes-dashboard,pod-template-hash=3203831700
monitoring-grafana-1176657932-k2d53     1/1       Running   0          8d        k8s-app=grafana,pod-template-hash=1176657932,task=monitoring
monitoring-influxdb-957705310-6qdkn     1/1       Running   0          15d       k8s-app=influxdb,pod-template-hash=957705310,task=monitoring
nginx-ingress-controller-1jff3          1/1       Running   0          17d       k8s-app=nginx-ingress-controller
nginx-ingress-controller-5dz1x          1/1       Running   0          17d       k8s-app=nginx-ingress-controller
nginx-proxy-wetta-kuber03               1/1       Running   1          8d        k8s-app=kube-nginx
nginx-proxy-wetta-noaaweather           1/1       Running   1          17d       k8s-app=kube-nginx

Need to deploy Operator with restricted rights in a K8S cluster

What did you do?
Hello,

I deployed Prometheus with the Operator following the "getting started" guide from the project site (https://coreos.com/operators/prometheus/docs/latest/user-guides/getting-started.html).
It worked correctly, but after this I needed to deploy it on another cluster which uses RBAC and is more restrictive. The administrator asked me to change the RBAC files to reduce the rights granted to the Operator.

Is it possible to have the Prometheus Operator running with a minimum of granted (cluster-wide) rights, and is it safe enough if I reduce the rights myself in the RBAC declaration (watching for DENY entries in the apiserver logs)?
Thanks for your help.

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: prometheus-operator
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus-operator
rules:
- apiGroups:
  - extensions
  resources:
  - thirdpartyresources
  verbs:
  - "get"
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanagers
  - prometheuses
  - servicemonitors
  verbs:
  - "*"
- apiGroups: [""]
  resources:
  - nodes
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources:
  - namespaces
  verbs: ["list"]
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources:
  - services
  - endpoints
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  namespace: monitoring
  name: prometheus-operator
rules:
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs: ["list", "watch", "create", "update", "delete"]
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  verbs: ["list", "get", "watch", "create", "update", "delete"]
- apiGroups: [""]
  resources:
  - pods
  verbs: ["list", "delete"]
- apiGroups: [""]
  resources:
  - services
  - endpoints
  verbs: ["get", "create", "update"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator
  namespace: monitoring
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus-operator
  name: prometheus-operator
  namespace: monitoring
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: prometheus-operator
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        image: quay.io/coreos/prometheus-operator:v0.15.0
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
        resources:
          limits:
            cpu: 200m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 50Mi
      serviceAccountName: prometheus-operator

What did you expect to see?
I expected the Operator to run with fewer cluster-wide rights.

What did you see instead? Under which circumstances?
I see some errors in the apiserver logs showing the Operator trying to use verbs that require cluster-wide rights.

RBAC DENY: user "system:serviceaccount:monitoring:prometheus-operator" groups ["system:serviceaccounts" "system:serviceaccounts:monitoring" "system:authenticated"] cannot "list" resource "configmaps" cluster-wide
RBAC DENY: user "system:serviceaccount:monitoring:prometheus-operator" groups ["system:serviceaccounts" "system:serviceaccounts:monitoring" "system:authenticated"] cannot "list" resource "secrets" cluster-wide
RBAC DENY: user "system:serviceaccount:monitoring:prometheus-operator" groups ["system:serviceaccounts" "system:serviceaccounts:monitoring" "system:authenticated"] cannot "watch" resource "secrets" cluster-wide
RBAC DENY: user "system:serviceaccount:monitoring:prometheus-operator" groups ["system:serviceaccounts" "system:serviceaccounts:monitoring" "system:authenticated"] cannot "watch" resource "configmaps" cluster-wide

Environment

  • Kubernetes version information:

    insert output of kubectl version here

  • Kubernetes cluster kind:

    insert how you created your cluster: kops, bootkube, tectonic-installer, etc.

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.2+coreos.0", GitCommit:"c6574824e296e68a20d36f00e71fa01a81132b66", GitTreeState:"clean", BuildDate:"2017-07-24T23:28:22Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.2+coreos.0", GitCommit:"c6574824e296e68a20d36f00e71fa01a81132b66", GitTreeState:"clean", BuildDate:"2017-07-24T23:28:22Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

  • Manifests:
insert manifests relevant to the issue
  • Prometheus Operator Logs:
No error visible in Prometheus Operator logs

Can I set the time zone to CST or another time zone? UTC is too unfriendly for us. I've tried changing the container's time zone, but it doesn't help.

What did you do?

What did you expect to see?

What did you see instead? Under which circumstances?

Environment

  • Prometheus Operator version:

    insert image tag or Git SHA here

  • Kubernetes version information:

    insert output of kubectl version here

  • Kubernetes cluster kind:

    insert how you created your cluster: kops, bootkube, tectonic-installer, etc.

  • Manifests:

insert manifests relevant to the issue
  • Prometheus Operator Logs:
insert Prometheus Operator logs relevant to the issue here

Grafana username/password

Hi All

I have tried installing Prometheus operator code onto Kubernetes Cluster managed by Minikube and I was able to get Grafana working. I installed via "kubectl create -f manifests/"

Now I find that I'm not able to log in to Grafana with the username and password "admin". Could you please advise on the same?

Best Regards
Sujith P V

prometheus-k8s-service-coredns-metrics throws an error due to namespace not matching monitoring

What did you do?

./hack/cluster-monitoring/deploy

What did you expect to see?

all manifests deployed properly

What did you see instead? Under which circumstances?

prometheus-k8s-service-coredns-metrics throws an error due to metadata.namespace defining kube-system

namespace "monitoring" created
clusterrolebinding "prometheus-operator" configured
clusterrole "prometheus-operator" configured
serviceaccount "prometheus-operator" created
service "prometheus-operator" created
deployment "prometheus-operator" created
Waiting for Operator to register custom resource definitions...done!
clusterrolebinding "node-exporter" configured
clusterrole "node-exporter" configured
daemonset "node-exporter" created
serviceaccount "node-exporter" created
service "node-exporter" created
clusterrolebinding "kube-state-metrics" configured
clusterrole "kube-state-metrics" configured
deployment "kube-state-metrics" created
rolebinding "kube-state-metrics" created
role "kube-state-metrics-resizer" created
serviceaccount "kube-state-metrics" created
service "kube-state-metrics" created
secret "grafana-credentials" created
secret "grafana-credentials" unchanged
configmap "grafana-dashboard-definitions-0" created
configmap "grafana-dashboards" created
configmap "grafana-datasources" created
deployment "grafana" created
service "grafana" created
servicemonitor "node-exporter" created
servicemonitor "prometheus" created
servicemonitor "kube-apiserver" created
servicemonitor "kube-controller-manager" created
servicemonitor "kube-state-metrics" created
service "prometheus-k8s" created
error: the namespace from the provided object "kube-system" does not match the namespace "monitoring". You must pass '--namespace=kube-system' to perform this operation.
servicemonitor "prometheus-operator" created
servicemonitor "kubelet" created
serviceaccount "prometheus-k8s" created
configmap "prometheus-k8s-rules" created
servicemonitor "alertmanager" created
servicemonitor "coredns" created
servicemonitor "kube-scheduler" created
prometheus "k8s" created
role "prometheus-k8s" created
role "prometheus-k8s" unchanged
role "prometheus-k8s" unchanged
clusterrole "prometheus-k8s" configured
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" unchanged
rolebinding "prometheus-k8s" unchanged
clusterrolebinding "prometheus-k8s" configured
secret "alertmanager-main" created
service "alertmanager-main" created
alertmanager "main" created

Environment

  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.4+coreos.0", GitCommit:"4292f9682595afddbb4f8b1483673449c74f9619", GitTreeState:"clean", BuildDate:"2017-11-21T17:22:25Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

  • Manifests:
prometheus-k8s-service-coredns-metrics

In the Kubernetes Capacity Planning dashboard of Grafana, the number of pods is not shown beyond 1300.

What did you do?
We did a scalability test in a cluster.

What did you expect to see?
The number of pods shown in the cluster utilization and pod utilization sections of the "Kubernetes Capacity Planning" dashboard.

What did you see instead? Under which circumstances?
Even though we have more than 1300 pods in the cluster, the Grafana dashboard is not showing beyond 1300.

Environment

  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.5", GitCommit:"f01a2bf98249a4db383560443a59bed0c13575df", GitTreeState:"clean", BuildDate:"2018-03-19T15:59:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.1", GitCommit:"d4ab47518836c750f9949b9e0d387f20fb92260b", GitTreeState:"clean", BuildDate:"2018-04-12T14:14:26Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
	
  • Kubernetes cluster kind:

Kubernetes Cluster in AWS.

  • Manifests:
insert manifests relevant to the issue
  • Prometheus Operator Logs:
insert Prometheus Operator logs relevant to the issue here

error: the path "manifests/prometheus-operator" does not exist

What did you do?

git clone https://github.com/coreos/prometheus-operator
cd prometheus-operator/contrib/kube-prometheus/
export NAMESPACE='monitoring'
kubectl create namespace "$NAMESPACE"

and then

kubectl --namespace="$NAMESPACE" apply -f manifests/prometheus-operator

What did you expect to see?

install prometheus-operator

What did you see instead? Under which circumstances?

error: the path "manifests/prometheus-operator" does not exist

Environment

  • Prometheus Operator version:

    master branch

  • Kubernetes version information:

    1.11.3

  • Kubernetes cluster kind:

    kubeadm

  • Manifests:

[root@k8s-m1 manifests]# ls
00namespace-namespace.yaml                                         alertmanager-serviceAccount.yaml            kube-state-metrics-role.yaml            prometheus-roleBindingConfig.yaml
0prometheus-operator-0alertmanagerCustomResourceDefinition.yaml    alertmanager-serviceMonitor.yaml            kube-state-metrics-serviceAccount.yaml  prometheus-roleBindingSpecificNamespaces.yaml
0prometheus-operator-0prometheusCustomResourceDefinition.yaml      alertmanager-service.yaml                   kube-state-metrics-serviceMonitor.yaml  prometheus-roleConfig.yaml
0prometheus-operator-0prometheusruleCustomResourceDefinition.yaml  grafana-dashboardDatasources.yaml           kube-state-metrics-service.yaml         prometheus-roleSpecificNamespaces.yaml
0prometheus-operator-0servicemonitorCustomResourceDefinition.yaml  grafana-dashboardDefinitions.yaml           node-exporter-clusterRoleBinding.yaml   prometheus-rules.yaml
0prometheus-operator-clusterRoleBinding.yaml                       grafana-dashboardSources.yaml               node-exporter-clusterRole.yaml          prometheus-serviceAccount.yaml
0prometheus-operator-clusterRole.yaml                              grafana-deployment.yaml                     node-exporter-daemonset.yaml            prometheus-serviceMonitorApiserver.yaml
0prometheus-operator-deployment.yaml                               grafana-serviceAccount.yaml                 node-exporter-serviceAccount.yaml       prometheus-serviceMonitorCoreDNS.yaml
0prometheus-operator-serviceAccount.yaml                           grafana-service.yaml                        node-exporter-serviceMonitor.yaml       prometheus-serviceMonitorKubeControllerManager.yaml
0prometheus-operator-serviceMonitor.yaml                           kube-state-metrics-clusterRoleBinding.yaml  node-exporter-service.yaml              prometheus-serviceMonitorKubelet.yaml
0prometheus-operator-service.yaml                                  kube-state-metrics-clusterRole.yaml         prometheus-clusterRoleBinding.yaml      prometheus-serviceMonitorKubeScheduler.yaml
alertmanager-alertmanager.yaml                                     kube-state-metrics-deployment.yaml          prometheus-clusterRole.yaml             prometheus-serviceMonitor.yaml
alertmanager-secret.yaml                                           kube-state-metrics-roleBinding.yaml         prometheus-prometheus.yaml              prometheus-service.yaml

Monitoring the control plane running on EC2 nodes

What did you do?
Attempted to deploy the Prometheus operator to both minikube and a standard Kubernetes cluster

What did you expect to see?
Metrics come in, targets listed

What did you see instead? Under which circumstances?
Prometheus works fine in the minikube environment. However, there is no data at all present in the version deployed to a regular k8s cluster. No targets and no datapoints.

Environment

  • Kubernetes version information:
    Minikube cluster:
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:57:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.1", GitCommit:"3a1c9449a956b6026f075fa3134ff92f7d55f812", GitTreeState:"clean", BuildDate:"2018-01-04T11:40:06Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

EC2 cluster:

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:57:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-14T06:36:08Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

Self hosted cluster running on AWS EC2. Our HA control plane runs as systemd managed units across 3 EC2 nodes

  • Manifests:

I am applying the manifests using the prometheus-operator/contrib/kube-prometheus/hack/cluster-monitoring/deploy script

  • Prometheus Operator Logs:

The operator deploys as expected and creates all the resources. However, the Prometheus dashboard remains empty, with ALERTS being the only query-able metric with the following output:

ALERTS{alertname="DeadMansSwitch",alertstate="firing",severity="none"} | 1
ALERTS{alertname="K8SApiserverDown",alertstate="pending",severity="critical"} | 1
ALERTS{alertname="K8SControllerManagerDown",alertstate="firing",severity="critical"} | 1
ALERTS{alertname="K8SKubeletDown",alertstate="pending",severity="critical"} | 1
ALERTS{alertname="K8SSchedulerDown",alertstate="firing",severity="critical"} | 1
ALERTS{alertname="NodeExporterDown",alertstate="firing",severity="warning"} | 1

With minikube, the "control plane" endpoints are static and I can always use an endpoint object as follows:

kind: Endpoints
apiVersion: v1
metadata:
  namespace: monitoring
  name: kube-controller-manager-prometheus-discovery
subsets:
  - addresses:
      - ip: 192.168.99.100 # minikube static master node ip address
    ports:
      - name: http-metrics
        port: 10252
        protocol: TCP

To me it seems like at this time the documentation exclusively caters to clusters where the controller-manager and scheduler run as pods and can be targeted with label selectors

My question is this: Is there a concrete example for setting up the Prometheus operator in a cluster that has been provisioned in this way - with the api-server, controller-manager, and scheduler not running as pods but rather as systemd units on the controller nodes themselves? And how would I monitor the control plane when those node IPs are likely to change several times every month (upgrading and rolling the control plane AMIs), meaning hardcoded endpoint IPs are not an option?

Thanks!
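Not an official recipe, but one workaround for control planes running as systemd units is to generate the discovery Endpoints from a single list of controller IPs, so only that list has to be updated (or re-rendered by whatever tooling rolls the AMIs). A sketch in jsonnet, following the minikube Endpoints object above; the names, namespace, port, and IPs are placeholders:

local masterIPs = ['10.0.0.10', '10.0.0.11', '10.0.0.12'];  // placeholder controller node IPs

{
  kubeControllerManagerPrometheusDiscoveryEndpoints: {
    apiVersion: 'v1',
    kind: 'Endpoints',
    metadata: {
      name: 'kube-controller-manager-prometheus-discovery',
      namespace: 'kube-system',
      labels: { 'k8s-app': 'kube-controller-manager' },
    },
    subsets: [{
      // One address entry per controller node; regenerate when the masters roll.
      addresses: [{ ip: ip } for ip in masterIPs],
      ports: [{ name: 'http-metrics', port: 10252, protocol: 'TCP' }],
    }],
  },
}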

Not able to see data for Pods/Deployments/StatefulSets on the pre-configured Grafana dashboards.

What did you do?
I have just installed the kube-prometheus on my cluster.

What did you expect to see?
Pre-configured dashboards for Pods/Deployments/StatefulSets/etc.

What did you see instead? Under which circumstances?
I can see node-related graphs, but when it comes to pod-related workloads I am not able to see anything. It just shows N/A.

Environment
Pre-Prod

  • Prometheus Operator version:

    insert image tag or Git SHA here

  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.1", GitCommit
:"1dc5c66f5dd61da08412a74221ecc79208c2165b", GitTreeState:"clean", BuildDate:"201
7-07-14T02:00:46Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.7", GitComm
it:"0c38c362511b20a098d7cd855f1314dad92c2780", GitTreeState:"clean", BuildDate:"2
018-08-20T09:56:31Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:

    insert how you created your cluster: acs-engine.

  • Manifests:

insert manifests relevant to the issue
  • Prometheus Operator Logs:
insert Prometheus Operator logs relevant to the issue here
  • Node Exporter Logs
time="2018-09-26T11:32:58Z" level=info msg="Listening on :9100" source="node_exporter.go:111"
time="2018-09-26T11:34:37Z" level=error msg="ERROR: infiniband collector failed after 0.000680s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:34:37Z" level=error msg="ERROR: infiniband collector failed after 0.001399s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:34:43Z" level=error msg="ERROR: infiniband collector failed after 0.000661s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:34:43Z" level=error msg="ERROR: infiniband collector failed after 0.000999s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:35:13Z" level=error msg="ERROR: infiniband collector failed after 0.021550s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:35:13Z" level=error msg="ERROR: infiniband collector failed after 0.000814s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:35:43Z" level=error msg="ERROR: infiniband collector failed after 0.020590s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:35:43Z" level=error msg="ERROR: infiniband collector failed after 0.001209s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:36:13Z" level=error msg="ERROR: infiniband collector failed after 0.022243s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:36:13Z" level=error msg="ERROR: infiniband collector failed after 0.041703s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:36:43Z" level=error msg="ERROR: infiniband collector failed after 0.000838s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:36:43Z" level=error msg="ERROR: infiniband collector failed after 0.001064s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:37:13Z" level=error msg="ERROR: infiniband collector failed after 0.001676s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
time="2018-09-26T11:37:13Z" level=error msg="ERROR: infiniband collector failed after 0.001240s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"```


* Kube state Metrics Logs
```I0926 11:33:07.800467       1 main.go:186] Starting metrics server: 0.0.0.0:8080
E0926 11:33:07.902006       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list secrets at the cluster scope
E0926 11:33:07.903804       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list namespaces at the cluster scope
E0926 11:33:07.999175       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list configmaps at the cluster scope
E0926 11:33:07.999687       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v2beta1.HorizontalPodAutoscaler: horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list horizontalpodautoscalers.autoscaling at the cluster scope
E0926 11:33:08.000150       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list endpoints at the cluster scope
E0926 11:33:08.000194       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list persistentvolumes at the cluster scope
E0926 11:33:08.904261       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list secrets at the cluster scope
E0926 11:33:08.905088       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list namespaces at the cluster scope
E0926 11:33:09.101081       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list configmaps at the cluster scope
E0926 11:33:09.101081       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v2beta1.HorizontalPodAutoscaler: horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list horizontalpodautoscalers.autoscaling at the cluster scope
E0926 11:33:09.101286       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list endpoints at the cluster scope
E0926 11:33:09.203038       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list persistentvolumes at the cluster scope
E0926 11:33:09.999401       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list secrets at the cluster scope
E0926 11:33:10.000321       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list namespaces at the cluster scope
E0926 11:33:10.103061       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list configmaps at the cluster scope
E0926 11:33:10.200734       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v2beta1.HorizontalPodAutoscaler: horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list horizontalpodautoscalers.autoscaling at the cluster scope
E0926 11:33:10.201435       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list endpoints at the cluster scope
E0926 11:33:10.300657       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:serviceaccount:monitoring:kube-state-metrics" cannot list persistentvolumes at the cluster scope
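These forbidden errors typically mean the kube-state-metrics ServiceAccount lacks cluster-scope list permissions, i.e. the ClusterRole/ClusterRoleBinding shipped with the manifests were not applied (or the binding points at the wrong namespace). A minimal sketch of the rules the errors above ask for, assuming the stock ServiceAccount name; the real manifest grants more resources than this:

# Sketch only: grants exactly the list permissions named in the errors above.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["configmaps", "secrets", "namespaces", "endpoints", "persistentvolumes"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]

A matching ClusterRoleBinding must bind this role to the kube-state-metrics ServiceAccount in the monitoring namespace.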

kube-prometheus: need to add Toleration to Prometheus

Hi,

I'm trying to add a toleration to Prometheus to make sure it's scheduled on specific nodes in the Kubernetes cluster. Those nodes are specifically provisioned for Prometheus, hence the need for a toleration and node affinity, because those nodes have been tainted.

local k  = import 'ksonnet/ksonnet.beta.3/k.libsonnet';

local pvc = k.core.v1.persistentVolumeClaim;
local toleration = k.core.v1.tolerationsType;

local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + 
           (import 'kube-prometheus/kube-prometheus-kops.libsonnet') + {
  _config+:: {
    namespace: 'monitoring',
    prometheus+:: {
      namespaces+: ['dev', 'int', 'uat'],
      replicas+: 1,
      prometheus+: {
         retention: "30d",
         storage: {
           volumeClaimTemplate:
              pvc.new() +
              pvc.mixin.spec.withAccessModes('ReadWriteOnce') +
              pvc.mixin.spec.resources.withRequests({ storage: '50Gi' }) +
              pvc.mixin.spec.withStorageClassName('gp2'),
         },
         tolerations:
              toleration.new() +
              toleration.withKey('') +
              toleration.withValue('') +
              toleration.withEffect('')

      }
    },
  },
};

I'm not really sure I'm doing the right thing here as I'm new to Jsonnet and all that jazz, but I'm looking forward to learning the Jsonnet syntax and concepts and using it more and more in my projects. How would you go about defining a toleration and affinity in this context?
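For reference, a sketch of the direction that usually works: set tolerations and affinity on the Prometheus custom resource spec rather than under _config. The field names (tolerations, affinity) come from the Prometheus CRD, but the exact override path below is an assumption that has not been verified against this library version, and the dedicated=prometheus key/value is a placeholder for whatever taint and label the nodes actually carry:

local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + {
  _config+:: { namespace: 'monitoring' },
  prometheus+:: {
    prometheus+: {
      spec+: {
        // tolerate the taint placed on the dedicated Prometheus nodes
        tolerations: [
          { key: 'dedicated', operator: 'Equal', value: 'prometheus', effect: 'NoSchedule' },
        ],
        // and require scheduling onto nodes labelled for Prometheus
        affinity: {
          nodeAffinity: {
            requiredDuringSchedulingIgnoredDuringExecution: {
              nodeSelectorTerms: [
                { matchExpressions: [{ key: 'dedicated', operator: 'In', values: ['prometheus'] }] },
              ],
            },
          },
        },
      },
    },
  },
};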

cheers

kube-prometheus installation failed on a K8s cluster initiated with kubeadm, maybe due to a problem in prometheus-adapter-apiService.yaml?

What did you do?
Tried to deploy kube-prometheus on a K8s cluster initiated with kubeadm using "kubectl create -f manifests/ || true", with no luck.

I am not fully aware of how APIService objects work, but is there any relation between prometheus-adapter-apiService.yaml and metrics-apiservice.yaml, which is deployed beforehand with metrics-server? Or is it another problem?
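Worth noting: both manifests shown below register the same APIService name (v1beta1.metrics.k8s.io), so whichever one is applied last replaces the other's backend service. Which service currently owns the registration can be checked with:

kubectl get apiservice v1beta1.metrics.k8s.io -o yaml

(That said, the operator log further down points at an API server connectivity timeout, which is a separate problem from the APIService overlap.)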

What did you expect to see?
kube-Prometheus installed successfully.

What did you see instead? Under which circumstances?
The kube-state-metrics and prometheus-operator pods keep crashing, as shown below:

NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE     IP           NODE      NOMINATED NODE   READINESS GATES
monitoring    pod/grafana-6689854d5-7npcp                 1/1     Running   0          86s     10.46.0.3    worker2   <none>           <none>
monitoring    pod/kube-state-metrics-86bc74fd4c-j7c7h     4/4     Error     2          86s     10.40.0.2    master    <none>           <none>
monitoring    pod/node-exporter-6b6d9                     2/2     Running   0          86s     172.1.1.11   worker2   <none>           <none>
monitoring    pod/node-exporter-lfsw8                     2/2     Running   0          86s     172.1.1.9    master    <none>           <none>
monitoring    pod/node-exporter-zzcth                     2/2     Running   0          86s     172.1.1.10   worker1   <none>           <none>
monitoring    pod/prometheus-adapter-5d7cff4b-2gqg9       1/1     Running   0          85s     10.32.0.6    worker1   <none>           <none>
monitoring    pod/prometheus-operator-5cfb7f4c54-kfl4r    1/1     Error     2          90s     10.40.0.1    master    <none>           <none>

Environment

  • Prometheus Operator version:
0.26.0
  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T20:56:12Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

    kubeadm.

  • Manifests:

### prometheus-adapter-apiService.yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: prometheus-adapter
    namespace: monitoring
  version: v1beta1
  versionPriority: 100
### metrics-apiservice.yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100

  • Prometheus Operator Logs:
root@master:~# kubectl logs pod/prometheus-operator-5cfb7f4c54-kfl4r -n monitoring
ts=2018-12-11T13:27:48.738481288Z caller=main.go:165 msg="Starting Prometheus Operator version '0.26.0'."
ts=2018-12-11T13:28:18.826524292Z caller=main.go:253 msg="Unhandled error received. Exiting..." err="communicating with server failed: Get https://10.96.0.1:443/version?timeout=32s: dial tcp 10.96.0.1:443: i/o timeout"

Any plan to support s390x

Since 1.6.0-alpha.1, s390x has been included in the official Kubernetes build; are there any plans for kube-prometheus to support s390x?

No kind "Prometheus" is registered

I'm trying to deploy Prometheus using the manifests in this repo.

The TPRs are created:

alertmanager.monitoring.coreos.com      Managed Alertmanager cluster          v1alpha1
prometheus.monitoring.coreos.com        Managed Prometheus server             v1alpha1
service-monitor.monitoring.coreos.com   Prometheus monitoring for a service   v1alpha1

but when I try to apply manifests/prometheus/prometheus-k8s.yaml:

 no kind "Prometheus" is registered for version "monitoring.coreos.com/v1alpha1"

The same goes for:

error: unable to decode "manifests/alertmanager/alertmanager.yaml": no kind "Alertmanager" is registered for version "

Can someone help me figure out why this is happening?

Replacing custom Prometheus config

I was thinking about what steps to take to get rid of the "custom" Prometheus config and instead make use of the means provided by the prometheus-operator.

Basically there is a bit of work left in bootkube to make all components self-hosted, but beyond that I was trying to come up with a strategy for using Prometheus and prometheus-operator means to discover the Kubernetes components. We will need to actively make sure that all Kubernetes and Tectonic components are properly labelled. As of right now the components are very well labelled with k8s-app=<component-name>.

If we want to utilize ServiceMonitors, then we will need to create and maintain Service manifests for all Kubernetes components, even those that don't actually need a Service. This is why I think we should get started with the PodMonitor (prometheus-operator/prometheus-operator#38): it would allow us to discover all components without having to maintain those Service manifests, because we can discover them based on Pods having the k8s-app label, and have a section in the PodMonitor for a value to be used as the job label (same as the ServiceMonitor's jobLabel), using the value of k8s-app for it.
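A sketch of what such a PodMonitor could look like; the field names are taken from the PodMonitor CRD that eventually landed in the operator, and the port name is an assumption about how the component pod is defined:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  # use the value of the k8s-app label as the Prometheus job label
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: kube-controller-manager
  podMetricsEndpoints:
  - port: metrics   # assumes the pod declares a named metrics port
    interval: 30s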

Taking this a bit further, we should probably also start thinking about how the Prometheus instance monitoring Kubernetes components can be limited to accessing only the kube-system namespace, building on what will hopefully result from prometheus/prometheus#2280.

@fabxc @alexsomesan

Issues accessing AlertManager and Grafana over Nginx Ingress

I've deployed the prometheus-operator 0.0.26 and kube-prometheus 0.0.83 Helm charts.

However, when I try to browse to them over the NGINX Ingress:

  • I hit the default backend for Alertmanager.
  • Grafana doesn't render properly in the browser.

I don't have this problem with the Prometheus UI.
Also, I can browse to Alertmanager and Grafana using a NodePort, so the problem only occurs via the Ingress. I have a DNS record foo.bar.com pointing at the master IP. The Prometheus, Alertmanager and Grafana Ingresses all use the same host: foo.bar.com

Browsing to or curling Alertmanager from the master node returns a 404:

curl -L -H "HOST: foo.bar.com" localhost:80/alertmanager
404 page not found
# The nginx ingress log
kubectl -n kube-system logs -f nginx-ingress-controller-controller-28vw4
::1 - [::1] - - [10/Jul/2018:12:19:56 +0000] "GET /alertmanager HTTP/1.1" 404 19 "-" "curl/7.29.0" 92 0.001 [monitoring-kube-prometheus-alertmanager-9093] 10.244.3.2:9093 19 0.001 404
# describe alertmanager service
kubectl describe service/kube-prometheus-alertmanager -n monitoring

Name:                     kube-prometheus-alertmanager
Namespace:                monitoring
Labels:                   alertmanager=kube-prometheus
                          app=alertmanager
                          chart=alertmanager-0.1.2
                          heritage=Tiller
                          release=kube-prometheus
Annotations:              <none>
Selector:                 alertmanager=kube-prometheus,app=alertmanager
Type:                     NodePort
IP:                       10.110.173.254
Port:                     http  9093/TCP
TargetPort:               9093/TCP
NodePort:                 http  30903/TCP
Endpoints:                10.244.3.2:9093
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

Browse to grafana via Ingress and get:

image

kubectl -n monitoring describe service/kube-prometheus-grafana
Name:                     kube-prometheus-grafana
Namespace:                monitoring
Labels:                   app=kube-prometheus-grafana
                          chart=grafana-0.0.35
                          heritage=Tiller
                          release=kube-prometheus
Annotations:              <none>
Selector:                 app=kube-prometheus-grafana
Type:                     NodePort
IP:                       10.97.214.244
Port:                     http  80/TCP
TargetPort:               3000/TCP
NodePort:                 http  30902/TCP
Endpoints:                10.244.2.7:3000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
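With path-based routing like /alertmanager, a frequent cause of this 404 (and of Grafana's broken rendering) is that Alertmanager and Grafana serve from / and know nothing about the sub-path. A hedged sketch of an Ingress that strips the prefix before proxying; the rewrite annotation is the NGINX ingress controller's, the service name and port are taken from the describe output above, and whether the chart's Ingress template exposes this as a value is not something I've verified:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: alertmanager
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /alertmanager
        backend:
          serviceName: kube-prometheus-alertmanager
          servicePort: 9093

Grafana additionally needs its root URL to include the sub-path (or to be served on its own host), and Alertmanager can alternatively be started with --web.route-prefix=/alertmanager so no rewrite is needed.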

Dashboard too big for ConfigMap wrapper

What did you do?
Made a custom modified jsonnet Grafana deployment, adding multiple dashboards.
What did you expect to see?
The dashboards in Grafana.
What did you see instead? Under which circumstances?
An error when applying the ConfigMap:
The ConfigMap "grafana-dashboard-node-exporter-full" is invalid: metadata.annotations: Too long: must have at most 262144 characters
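The 262144-character limit comes from client-side kubectl apply storing the entire object in the kubectl.kubernetes.io/last-applied-configuration annotation, which very large dashboard ConfigMaps exceed. A common workaround, assuming the file name below (hypothetical) and a kubectl version that supports server-side apply, is to avoid client-side apply for these objects:

# create/replace instead of apply
kubectl create -f grafana-dashboard-node-exporter-full.yaml
# or, on newer kubectl/Kubernetes, let the server track the applied state
kubectl apply --server-side -f grafana-dashboard-node-exporter-full.yaml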
Environment
DEV

  • Prometheus Operator version:

    insert image tag or Git SHA here

  • Kubernetes version information:
    v1.11.2
    insert output of kubectl version here
    Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-18T11:37:06Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:08:19Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:

    insert how you created your cluster: kops, bootkube, tectonic-installer, etc.

  • Manifests:


alertmanager/alertmanager-main's config-volume is called cm/alertmanager-alertmanager-main

After running deploy:

Volumes:
  config-volume:
    Type:       ConfigMap (a volume populated by a ConfigMap)
    Name:       alertmanager-alertmanager-main
...
...: timeout expired waiting for volumes to attach/mount for pod "monitoring"/"alertmanager-alertmanager-main-0". list of unattached/unmounted volumes=[config-volume]

This happens because alertmanager/alertmanager-main expects its config map to be named cm/alertmanager-alertmanager-main, while it is actually created as cm/alertmanager-main by kube-prometheus/.../deploy.

Might be implicitly fixed by #33, even though it is already broken right now and the behavioural change was not introduced by prometheus-operator-0.5.

KOPS 1.5.0-beta2, Kubernetes 1.5.2, AWS: Only api cert error TODO from deploy, self-hosted setup.

Alerts:

Alerts
K8SNodeDown (1 active)
K8SControllerManagerDown (1 active)
K8SSchedulerDown (1 active)
K8SKubeletDown (1 active)
K8SKubeletNodeExporterDown (1 active)

Targets:

Targets
kube-state-metrics
Endpoint	State	Labels	Last Scrape	Error
http://100.96.1.6:8080/metrics	UP	instance="100.96.1.6:8080"	12.302s ago	
kubelets
Endpoint	State	Labels	Last Scrape	Error
https://10.52.52.101:10250/metrics	UP	instance="ip-10-52-52-101.ec2.internal"	2.076s ago	
https://10.52.79.77:10250/metrics	DOWN	instance="ip-10-52-79-77.ec2.internal"	11.63s ago	context deadline exceeded
node-exporter
Endpoint	State	Labels	Last Scrape	Error
http://10.52.52.101:9100/metrics	UP	instance="10.52.52.101:9100"	14.497s ago	
http://10.52.79.77:9100/metrics	DOWN	instance="10.52.79.77:9100"	11.705s ago	Get http://10.52.79.77:9100/metrics: dial tcp 10.52.79.77:9100: i/o timeout
prometheus
Endpoint	State	Labels	Last Scrape	Error
http://100.96.1.8:9090/metrics	UP	instance="100.96.1.8:9090"	5.053s ago	

K8SApiserverDown alert always triggers due to querying on non-existent job name

What did you do?
Installed version 0.0.17 of kube-prometheus via helm.

What did you expect to see?
Everything installed fine. I do, however, get the K8SApiserverDown alert even though the API is up.
When looking at my targets, I see the apiserver has a job name of "kubernetes". The alert queries for a job name of "apiserver".

The alert in question is in helm/exporter-kubernetes/templates/kubernetes.rules.yaml

  - alert: K8SApiserverDown
    expr: absent(up{job="apiserver"} == 1)
    for: 20m
    labels:
      severity: critical
    annotations:
      description: No API servers are reachable or all have disappeared from service
        discovery
      summary: No API servers are reachable

When looking at the Prometheus configuration file, I see you always use __meta_kubernetes_service_name as the job name. In my cluster, the service name is "kubernetes", not "apiserver". This naming mismatch causes the K8SApiserverDown alert to remain in a triggered state.

  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}

apiserver_scrape_target

Environment

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-27T00:13:02Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.11", GitCommit:"b13f2fd682d56eab7a6a2b5a1cab1a3d2c8bdd55", GitTreeState:"clean", BuildDate:"2017-11-25T17:51:39Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:
    custom install via ansible.

Is this naming unique to my cluster, or should the job name be changed in the exporter-kubernetes alerting rule?
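One possible workaround, offered only as a sketch (whether the chart exposes a clean hook for this hasn't been checked), is to pin the job label for the apiserver scrape via relabeling rather than renaming the Service, placed after the generic service-name rule quoted above:

  - source_labels: [__meta_kubernetes_service_name]
    regex: kubernetes
    target_label: job
    replacement: apiserver

Alternatively, the alert expression could be changed to match the job label that actually exists, e.g. absent(up{job="kubernetes"} == 1).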

kube-prometheus filling 20GB disk with 1d retention

We are running kube-prometheus deployed using your Helm charts with pretty much the default configuration and have it set to a 1-day retention. We are seeing a 20GB persistent volume being filled within a week, and this doesn't feel right. Our understanding was that we wouldn't need anywhere near 20GB; can anyone help us debug what is being written and why?
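A useful first step is usually to look at series cardinality, since TSDB disk usage is often driven more by the number of active series (and their churn) than by the retention window. Standard PromQL, not specific to this chart:

# total number of active series
count({__name__=~".+"})
# top 10 metric names by series count
topk(10, count by (__name__) ({__name__=~".+"}))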

Configuring horizontal pod scaling along with the integration of Prometheus

What did you do?
I am trying to configure horizontal pod scaling along with the integration of Prometheus. In this scenario, how can I add custom metrics for that? Any ideas? Thanks in advance.
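A minimal sketch of the kind of object involved, assuming the Prometheus Adapter has already been configured with a rule that exposes a pod-level metric; the metric name http_requests_per_second and the Deployment name are placeholders, not something the default adapter configuration provides:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: example-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: http_requests_per_second
      targetAverageValue: 500m

Custom pod metrics like this are served through the custom.metrics.k8s.io API that the Prometheus Adapter registers, so an adapter rule mapping a Prometheus query to that metric name has to exist first.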
What did you expect to see?

What did you see instead? Under which circumstances?

Environment

  • Kubernetes version information:

    insert output of kubectl version here

  • Kubernetes cluster kind:

    insert how you created your cluster: kops, bootkube, tectonic-installer, etc.

  • Manifests:

insert manifests relevant to the issue
  • Prometheus Operator Logs:
insert Prometheus Operator logs relevant to the issue here

Migration to Helm charts status

I saw that moving kube-prometheus over to Helm is on the roadmap in the README; is there a discussion on this anywhere, or work in progress? I'd be interested in helping out in whatever capacity. I'm getting close to starting this for my own use.

Rate queries aren't working as expected.

As an example, node_network_receive_bytes{device!~"lo"} shows the following.

screen shot 2017-02-22 at 6 27 00 pm

However, when rate(node_network_receive_bytes{device!~"lo"}[10m]) is queried, no results are returned.

screen shot 2017-02-22 at 6 27 37 pm

Grafana overwrite every dashboard on deploy

I am trying to use the Grafana deployed by kube-prometheus's Helm chart with a DB to back Grafana state. On each deploy, every dashboard is overwritten.

I understand the intention of this Grafana deployment is to have a stateless service with config and dashboards programmatically deployed, but this can cause loss of data, in particular with WIP dashboards or very complex ones.

My suggestion: create a "kube-prometheus" folder in Grafana, keep every kube-prometheus-related dashboard in it, and on each deploy only overwrite this folder (or other defined folders).

Alerts firing: ControllerManager, Scheduler and TargetDown

What did you do?
I installed prometheus-operator and kube-prometheus using helm:

helm install coreos/prometheus-operator --name prometheus-operator
helm install coreos/kube-prometheus --name kube-prometheus --set rbacEnable=true

What did you expect to see?
Everything green in Alert Manager

What did you see instead? Under which circumstances?
Some Alerts are firing:

  • K8s Scheduler
  • K8S Controller
  • NodeDiskRunningFull
  • TargetDown

Environment
GKE

  • Kubernetes version information:

    Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-11T23:27:35Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.5-gke.0", GitCommit:"2c2a807131fa8708abc92f3513fe167126c8cce5", GitTreeState:"clean", BuildDate:"2017-12-19T20:05:45Z", GoVersion:"go1.8.3b4", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:

    I used terraform to create the cluster on GKE

  • Prometheus Operator Logs:
    No Errors nor warnings

I guess these targets somehow don't get scraped. Can you help me figure out how to solve this issue, please? Thanks.

KubeClientCertificateExpiration always alert

What did you do?
Downloaded https://codeload.github.com/coreos/prometheus-operator/tar.gz/v0.23.1 and installed prometheus-operator using kubectl create -f prometheus-operator-0.23.1/contrib/kube-prometheus/manifests/ || true

What did you expect to see?
all components work correctly.

Environment

  • K8s version: v1.11.0

  • Prometheus Operator version: v0.23.1

  • Kubernetes cluster kind:

    Installed the k8s cluster using the binary package https://storage.googleapis.com/kubernetes-release/release/v1.11.0/kubernetes.tar.gz
    
  • Manifests:


[2] Firing
--
Labels
alertname = KubeClientCertificateExpiration
job = apiserver
prometheus = monitoring/k8s
severity = critical
Annotations
message = Kubernetes API certificate is expiring in less than 1 day.
runbook_url = https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration

Labels
alertname = KubeClientCertificateExpiration
job = apiserver
prometheus = monitoring/k8s
severity = warning
Annotations
message = Kubernetes API certificate is expiring in less than 7 days.
runbook_url = https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration


I used cfssl to generate the PEM certificates and keys:

# openssl x509 -in /etc/kubernetes/ssl/ca.pem -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            6f:b9:70:eb:80:73:e6:73:f9:c8:29:98:99:5e:b5:f2:6d:a3:0e:49
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=CN, ST=Shanghai, L=Shanghai, O=k8s, OU=System, CN=kubernetes
        Validity
            Not Before: Aug  8 09:54:00 2018 GMT
            Not After : Aug  7 09:54:00 2023 GMT
        Subject: C=CN, ST=Shanghai, L=Shanghai, O=k8s, OU=System, CN=kubernetes
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:c8:ae:16:d6:0c:5b:30:95:97:a2:5b:16:cf:db:
                    f1:bd:68:8c:c6:0c:84:5b:a4:46:b4:79:0b:2b:c4:
                    b2:c0:5f:ab:e4:4a:33:46:d3:82:a3:33:bf:a7:f7:
                    ec:a3:4e:b3:70:34:e8:15:24:8e:56:b7:4d:68:9b:
                    e0:dc:0a:3a:3c:36:3e:f2:5c:be:d1:5d:fa:fa:e0:
                    7d:5b:2a:5d:e2:fc:94:9f:ea:a9:ce:ca:ad:2f:fd:
                    16:bc:fb:83:f6:45:fd:2f:9a:ac:94:e3:fd:49:90:
                    a1:31:95:cd:f2:30:2b:cd:31:34:69:b1:3a:b8:6a:
                    b8:7a:ef:f1:e9:ee:a2:5d:81:a8:59:80:77:c1:43:
                    85:3c:29:d8:02:fb:24:b9:9a:1f:e4:61:82:ec:8d:
                    49:3d:91:f7:0a:50:25:b1:a4:51:ba:f3:d6:77:07:
                    e2:50:ed:b8:af:30:18:d8:23:d6:e9:17:b1:a0:1c:
                    8c:74:f3:87:56:08:c7:49:86:c0:90:5e:16:a4:1e:
                    07:49:ef:b2:dc:9e:22:4c:b9:9b:7f:38:47:d7:26:
                    17:15:92:79:51:cc:a9:3f:4b:a1:6d:03:94:5b:9c:
                    03:c0:19:7e:d1:4e:c9:77:84:b1:e4:5b:a6:2b:54:
                    95:d0:a3:ef:39:d6:c3:88:77:af:4f:31:cd:ba:f7:
                    cc:3b
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Certificate Sign, CRL Sign
            X509v3 Basic Constraints: critical
                CA:TRUE, pathlen:2
            X509v3 Subject Key Identifier: 
                BC:9F:D1:BD:4C:26:E1:77:C0:7F:CF:04:3E:DF:64:86:BE:23:F3:7F
            X509v3 Authority Key Identifier: 
                keyid:BC:9F:D1:BD:4C:26:E1:77:C0:7F:CF:04:3E:DF:64:86:BE:23:F3:7F

    Signature Algorithm: sha256WithRSAEncryption
         78:b7:65:4d:53:e1:0c:7d:d6:9e:d5:aa:f8:1a:34:e4:1d:c0:
         22:4b:42:72:86:86:e9:73:e2:fd:89:90:e1:10:56:a7:f2:15:
         71:14:79:ce:67:9a:ca:5d:4d:e8:25:3d:70:2a:0a:3b:08:09:
         02:8a:d9:2d:ed:85:cd:10:38:60:75:d7:f5:a7:b2:ee:86:05:
         dd:50:38:04:a4:7a:bc:f5:02:b2:a5:d9:a2:a1:71:7d:e5:ce:
         dd:c8:5a:a7:25:61:de:c3:76:c3:87:3e:5a:4c:eb:36:91:51:
         8b:fc:ef:9d:aa:35:58:3a:ba:fc:2a:3c:4f:b3:54:e8:0d:a5:
         32:25:91:dd:93:75:33:53:2b:94:9e:f1:cb:e9:58:17:a6:dc:
         07:1c:96:5e:93:40:d6:c8:2b:67:49:3b:3f:1f:a8:3a:41:65:
         29:03:f3:18:f9:d3:66:a8:49:14:1e:7f:cb:6b:f6:26:1d:7b:
         6f:46:c6:27:a1:69:fe:62:7f:da:fb:41:7d:fc:ab:12:77:b8:
         b3:4c:92:a5:5c:d2:8c:25:a1:aa:1e:2f:a2:de:38:e5:9a:96:
         2f:b2:bb:3c:32:de:db:7f:80:eb:f0:01:be:2d:ff:00:09:35:
         ea:2b:8d:33:6e:6c:2c:6d:37:a2:c4:b3:c9:eb:ac:3f:ec:e5:
         5d:61:50:66

# openssl x509 -in /etc/kubernetes/ssl/kubernetes.pem -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            76:64:c7:59:95:aa:fb:9b:8c:b2:26:c0:82:24:c5:0a:8d:95:a2:1e
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=CN, ST=Shanghai, L=Shanghai, O=k8s, OU=System, CN=kubernetes
        Validity
            Not Before: Aug  8 09:54:00 2018 GMT
            Not After : Aug  5 09:54:00 2028 GMT
        Subject: C=CN, ST=Shanghai, L=Shanghai, O=k8s, OU=System, CN=kubernetes
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:b9:7e:1b:a9:9a:95:21:42:5a:e8:3e:79:94:e6:
                    c1:35:87:93:22:3d:3c:c9:65:be:b6:99:4b:47:25:
                    1a:22:db:4a:a5:b8:59:0d:2d:a0:0d:e5:c6:35:3b:
                    8e:2c:e3:fe:3a:d9:bc:63:9b:a0:98:c2:26:98:4c:
                    be:8b:71:20:37:a3:19:21:34:03:0b:10:d7:cb:7c:
                    b6:d8:68:90:1b:e1:6b:ee:b8:0e:6f:3d:33:2b:3f:
                    87:9a:4f:6c:59:08:f4:22:a6:2a:b6:d5:d6:00:b8:
                    7e:3c:90:aa:99:5c:6e:7c:93:f2:6b:6a:6f:5b:c6:
                    35:60:e0:14:62:5e:91:cc:20:eb:88:ea:cc:7a:10:
                    d7:f1:5f:b3:fb:aa:c4:a7:f5:95:3e:8a:44:ee:09:
                    12:6b:aa:29:05:40:df:1e:54:25:05:e2:8c:cb:d7:
                    32:e8:c5:ff:0c:48:11:27:c9:52:81:f2:53:b0:82:
                    b0:1b:7f:ad:08:fd:cd:b6:c1:4e:43:da:2d:f0:90:
                    90:cb:97:a2:2a:31:bc:65:2c:9f:a9:72:90:dd:b0:
                    5e:3b:7d:1c:37:d6:ca:22:13:2a:da:27:1d:61:94:
                    8f:36:9f:9d:6a:d1:6c:b9:17:58:5d:9c:0d:b1:d8:
                    2a:98:f1:54:d7:87:c6:da:ff:05:c9:a2:c5:91:5a:
                    77:23
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier: 
                B3:1C:65:F4:DA:61:57:1F:68:06:05:46:36:31:BC:AF:E1:D5:06:7C
            X509v3 Authority Key Identifier: 
                keyid:BC:9F:D1:BD:4C:26:E1:77:C0:7F:CF:04:3E:DF:64:86:BE:23:F3:7F

            X509v3 Subject Alternative Name: 
                DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster, DNS:kubernetes.default.svc.cluster.local, IP Address:127.0.0.1, IP Address:192.168.2.93, IP Address:10.100.0.1, IP Address:192.168.2.86, IP Address:192.168.2.87, IP Address:192.168.2.88
    Signature Algorithm: sha256WithRSAEncryption
         2d:a6:ee:28:71:0f:ea:69:ff:90:25:d6:04:4e:4c:e1:3d:ff:
         34:f1:64:67:4f:ab:80:ee:f5:d9:16:53:48:0c:c4:fd:9a:f0:
         09:13:71:b1:ba:52:b0:36:38:6b:51:be:ac:cc:14:30:2b:e7:
         a9:87:00:76:fe:1a:58:72:45:27:0a:59:51:74:65:6a:30:ea:
         37:f3:c9:79:59:f0:09:87:e9:94:99:00:11:d7:20:9c:90:5c:
         de:ee:09:ff:53:07:41:06:4c:91:8d:8a:d1:d5:ff:30:06:3b:
         53:32:4c:dd:70:f0:22:7f:7d:e6:02:f2:eb:a6:fd:5a:de:d6:
         0d:fa:b5:e9:f0:95:5a:79:bb:f9:b5:a5:47:01:13:3f:b0:12:
         c6:35:11:45:2f:6b:f3:71:26:92:8f:34:90:0f:42:d8:2a:12:
         0f:ad:96:1f:60:54:5c:27:f3:0f:c3:4e:f5:ef:58:75:51:7a:
         df:8c:f3:b2:d4:b8:70:99:ff:e3:5a:ee:a9:00:69:84:a3:c2:
         df:7e:9b:55:e1:ab:92:bb:55:8b:54:6c:aa:05:c4:ea:29:8e:
         56:72:15:11:c2:6e:49:72:b5:d7:30:06:7b:c4:a2:0a:82:87:
         19:83:b7:1e:3a:86:02:35:f5:21:e8:e6:bf:5e:51:c0:ec:f0:
         c1:3d:15:35

# openssl x509 -in /etc/kubernetes/ssl/admin.pem -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            30:7e:a9:d4:1c:0a:04:d7:3b:2a:38:7a:b3:ca:25:fb:65:e3:e6:72
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=CN, ST=Shanghai, L=Shanghai, O=k8s, OU=System, CN=kubernetes
        Validity
            Not Before: Aug  8 09:55:00 2018 GMT
            Not After : Aug  5 09:55:00 2028 GMT
        Subject: C=CN, ST=Shanghai, L=Shanghai, O=system:masters, OU=System, CN=admin
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:cb:10:41:82:61:ec:93:e8:4d:bf:3e:2d:88:45:
                    ce:e8:57:ee:c6:90:8c:a2:e7:7b:16:ae:9e:fc:6e:
                    60:25:5c:f4:26:c2:50:c7:b5:1e:d3:91:d8:54:e9:
                    5b:6f:85:0e:0a:56:2c:e8:4d:69:dc:06:1e:94:92:
                    29:b9:7c:6f:cd:bd:25:13:bf:c9:9b:98:dd:81:f2:
                    0e:df:27:17:75:c9:4f:d8:9a:9c:5c:b0:db:9c:ed:
                    bb:a5:1f:c1:df:85:9a:f9:62:6b:a8:7a:96:69:30:
                    93:2f:e9:e3:16:dc:74:5f:4d:68:5d:e3:05:ae:01:
                    bd:60:72:d0:30:7c:3b:01:7a:13:9f:4c:ef:62:f2:
                    6c:47:6a:25:6f:b4:0c:7a:53:db:78:a4:71:00:c8:
                    6c:a7:c6:39:42:cf:da:e0:20:ce:66:02:36:43:13:
                    5a:56:7d:da:77:ad:01:4f:ab:56:54:6d:b9:27:08:
                    4e:d6:95:8b:cd:90:5f:28:c2:63:de:d8:f9:77:4f:
                    6d:35:02:9b:6c:cf:27:43:8a:47:b0:74:7e:25:c5:
                    6c:2d:7a:4b:e1:49:af:e7:28:d1:e0:3b:2a:21:1d:
                    bd:09:80:f7:4f:ee:a9:23:50:8c:65:55:0b:fd:d8:
                    4b:4b:b3:82:cb:2a:9f:33:c7:d3:88:63:91:ca:f9:
                    e1:a7
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier: 
                DA:B4:8B:36:C7:E9:9C:C0:6E:AC:8D:1F:D6:18:93:76:4D:6E:78:1F
            X509v3 Authority Key Identifier: 
                keyid:BC:9F:D1:BD:4C:26:E1:77:C0:7F:CF:04:3E:DF:64:86:BE:23:F3:7F

    Signature Algorithm: sha256WithRSAEncryption
         2f:69:9c:6f:53:bb:7a:42:e3:4e:8f:b4:17:00:10:90:c3:1c:
         be:68:05:f3:15:6a:aa:0c:53:eb:89:c6:0c:2e:c2:0a:75:14:
         16:09:7e:68:0e:83:5c:c9:79:e0:ab:86:ee:93:d7:de:50:66:
         98:3d:5a:43:e0:7f:dd:dc:8a:b8:83:84:84:d4:0f:a5:c5:a1:
         b2:4a:65:76:15:e7:85:f3:7d:37:ee:e2:50:70:28:85:e8:05:
         05:d1:60:74:40:e2:67:7a:31:32:39:e3:96:e3:5b:fe:5e:eb:
         36:ef:cf:fa:95:37:9c:f1:3a:f5:11:80:e8:80:f9:1c:39:04:
         a0:14:af:e0:e7:ac:ce:6f:ad:4a:f3:e8:24:13:20:72:46:15:
         da:9a:e3:1d:88:c5:3d:93:12:7c:71:d3:77:95:5b:cd:f7:3b:
         b3:33:5d:10:31:7e:d9:ba:0e:ed:c8:61:9a:e7:df:fa:75:f1:
         f4:e5:67:81:be:3b:4a:5d:1e:82:1e:64:f7:16:14:4c:d9:e1:
         09:56:81:f4:64:21:47:79:f2:50:55:bb:e1:28:21:40:22:7d:
         f6:b7:f1:cd:3f:99:e5:96:c9:ee:76:be:03:68:da:7a:94:f5:
         ad:bb:40:66:cc:8c:85:36:91:3d:6a:5e:f6:d8:71:23:9e:f1:
         97:ff:73:ea

My k8s cluster and Prometheus seem fine, but KubeClientCertificateExpiration always triggers an alert. How do I fix it?
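The alert is driven by the apiserver_client_certificate_expiration_seconds histogram that the API server exposes about the client certificates actually used to connect to it, not by the certificate files inspected with openssl above. To see what Prometheus is measuring, a query along these lines can help (it mirrors the style of the kubernetes-mixin expression; the exact form and thresholds differ between versions):

histogram_quantile(0.01,
  sum by (job, le) (
    rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])
  )
)

If this reports a small number of seconds, some client (a kubelet, controller-manager, or kubeconfig user) really is presenting a certificate close to expiry, even though the certificates dumped above look fine.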
