
cluster-monitoring's Introduction

Cluster Monitoring stack for ARM / X86-64 platforms

The Prometheus Operator for Kubernetes provides easy monitoring definitions for Kubernetes services and deployment and management of Prometheus instances.

This has been tested on a hybrid ARM64 / X86-64 Kubernetes cluster deployed as described in this article.

This repository collects Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator. The container images support AMD64, ARM64, ARM and PPC64le architectures.

The content of this project is written in jsonnet and is an extension of the fantastic kube-prometheus project.

If you like this project and others I've been contributing and would like to support me, please check-out my Patreon page!

Components included in this package:

  • The Prometheus Operator
  • Highly available Prometheus
  • Highly available Alertmanager
  • Prometheus node-exporter
  • kube-state-metrics
  • CoreDNS
  • Grafana
  • SMTP relay to Gmail for Grafana notifications (optional)

There are additional modules (disabled by default) to monitor other components of the infrastructure. These can be enabled or disabled in the vars.jsonnet file by setting the module's enabled flag in modules to true or false (see the excerpt after the list below).

The additional modules are:

  • ARM-exporter to generate temperature metrics (works on some ARM boards like RaspberryPi)
  • MetalLB metrics
  • Traefik metrics
  • ElasticSearch metrics
  • APC UPS metrics
  • GMail SMTP relay module
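As a rough illustration, each module entry in vars.jsonnet is toggled through its enabled flag; the ARM-exporter and Traefik entries look like this, and the other modules follow the same shape:

  modules: [
    {
      name: 'armExporter',
      enabled: true,
      file: import 'arm_exporter.jsonnet',
    },
    {
      name: 'traefikExporter',
      enabled: false,
      file: import 'traefik.jsonnet',
    },
    // ...remaining module entries
  ],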

There are also options to set the ingress domain suffix and enable persistence for Grafana and Prometheus.

The ingresses can use TLS with the default self-signed certificate from your Ingress controller by setting TLSingress to true. To use a custom certificate, create the files server.crt and server.key and enable the UseProvidedCerts parameter in vars.jsonnet.
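For reference, the corresponding vars.jsonnet excerpt looks roughly like this (the same keys appear in the example configurations later on this page):

  // If TLSingress is true, a self-signed HTTPS ingress with redirect will be created
  TLSingress: true,
  // If UseProvidedCerts is true, provided files will be used on created HTTPS ingresses.
  // Use a wildcard certificate for the domain like ex. "*.192.168.99.100.nip.io"
  UseProvidedCerts: false,
  TLSCertificate: importstr 'server.crt',
  TLSKey: importstr 'server.key',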

Persistence for Prometheus and Grafana can be enabled in the enablePersistence section. Setting each flag to true creates the volume PVCs. If no PV names are defined in prometheusPV and grafanaPV, the default StorageClass will be used to dynamically create the PVs. The sizes can be adjusted in prometheusSizePV and grafanaSizePV.
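A minimal sketch of the persistence section in vars.jsonnet, assuming the PV name and size parameters mentioned above sit alongside the two booleans (check your copy of the file for their exact placement):

  // Setting these to false, defaults to emptyDirs
  enablePersistence: {
    prometheus: true,
    grafana: true,
    // prometheusPV / grafanaPV: names of pre-created PVs (leave unset to use the default StorageClass)
    // prometheusSizePV / grafanaSizePV: requested volume sizes
  },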

If using pre-created persistent volumes (samples in the samples directory), check the permissions on the directories hosting the files. The UID:GID for Prometheus is 1000:0 and for Grafana is 472:472.

Changing these parameters requires a rebuild of the manifests with make, followed by make deploy. To avoid installing all prerequisites like Golang, Jsonnet and Jsonnet-bundler, use the make docker target to build in a container.

Quickstart (non K3s)

The repository already provides a set of compiled manifests to be applied to the cluster, or the deployment can be customized through the jsonnet files.

If you only need the default features and want to adjust your cluster URL for the ingress, there is no need to rebuild the manifests (and install all tools). Use the change_suffix target with the argument suffix=[suffixURL], where suffixURL is the URL of your cluster ingress controller. If you have a local cluster, use the nip.io domain resolver, passing your_cluster_ip.nip.io to the suffix argument. After this, just run make deploy.

# Update the ingress URLs
make change_suffix suffix=[suffixURL]

# Deploy
make deploy

To customize the manifests, edit vars.jsonnet and rebuild the manifests.

$ make vendor
$ make
$ make deploy

# Or manually:

$ make vendor
$ make
$ kubectl apply -f manifests/setup/
$ kubectl apply -f manifests/

If you get an error from applying the manifests, run make deploy or kubectl apply -f manifests/ again. Sometimes the CRDs are not yet registered when the resources that depend on them are applied.

If you enable the SMTP relay for Gmail in vars.jsonnet, the pod will be in an error state after deployment since it will not find the user and password in the "smtp-account" secret. To generate the secret, run the scripts/create_gmail_auth.sh script.

Quickstart on Minikube

You can also test and develop the monitoring stack on Minikube. First install Minikube by following the instructions here for your platform. Then follow steps similar to the non-K3s deployment:

# Start minikube (if not started)
minikube start

# Enable minikube ingress to allow access to the web interfaces
minikube addons enable ingress

# Get the minikube instance IP
minikube ip

# Run the change_suffix target
make change_suffix suffix=[minikubeIP.nip.io]

# or customize additional params on vars.jsonnet and rebuild
make vendor
make

# and deploy the manifests
make deploy

# Get the URLs for the exposed applications and open in your browser
kubectl get ingress -n monitoring

Quickstart for K3s

To deploy the monitoring stack on your K3s cluster, there are four parameters that need to be configured in the vars.jsonnet file:

  1. Set k3s.enabled to true.
  2. Change the k3s.master_ip parameter to your K3s master node IP (your VM or host IP).
  3. Edit suffixDomain to have your node IP with the .nip.io suffix or your cluster URL. This will be your ingress URL suffix.
  4. Set the traefikExporter module's enabled parameter to true to collect Traefik metrics and deploy its dashboard (see the excerpt after this list).
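Put together, the relevant vars.jsonnet settings look roughly like this excerpt (the IPs are placeholders for your own cluster; the traefikExporter entry lives in the modules list):

  k3s: {
    enabled: true,
    master_ip: ['192.168.1.10'],
  },

  // Domain suffix for the ingresses
  suffixDomain: '192.168.1.10.nip.io',

  // ...and in the modules list:
  {
    name: 'traefikExporter',
    enabled: true,
    file: import 'traefik.jsonnet',
  },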

After changing these values, deploy the stack by running:

$ make vendor
$ make
$ make deploy

# Or manually:

$ make vendor
$ make
$ kubectl apply -f manifests/setup/
$ kubectl apply -f manifests/

If you get an error from applying the manifests, run make deploy or kubectl apply -f manifests/ again. Sometimes the CRDs are not yet registered when the resources that depend on them are applied.

If you enable the SMTP relay for Gmail in vars.jsonnet, the pod will be in an error state after deployment since it will not find the user and password in the "smtp-account" secret. To generate the secret, run the scripts/create_gmail_auth.sh script.

Ingress

Now you can open the applications.

To list the created ingresses, run kubectl get ingress --all-namespaces. If you added your cluster IP or URL suffix in vars.jsonnet before rebuilding the manifests, the applications will be exposed on:

  • Grafana at grafana.[suffix]
  • Prometheus at prometheus.[suffix]
  • Alertmanager at alertmanager.[suffix]

Updating the ingress suffixes

To avoid rebuilding all manifests, there is a make target to update the Ingress URL suffix. Run make change_suffix suffix="[clusterURL]" to change the ingress host suffix for Grafana, Prometheus and Alertmanager and reapply the manifests.

Customizing

The content of this project consists of a set of jsonnet files making up a library to be consumed.

Pre-reqs

The project requires jsonnet-bundler and the jsonnet compiler. The Makefile does the heavy lifting of installing them. You need Go (version 1.18 minimum) already installed:

git clone https://github.com/carlosedp/cluster-monitoring
cd cluster-monitoring
make vendor
# Change the jsonnet files...
make

After this, a new customized set of manifests is built into the manifests dir. To apply to your cluster, run:

make deploy

To uninstall, run:

make teardown

Images

This project depends on the following images (all support ARM, ARM64 and AMD64 through manifests):

  • Alertmanager
  • Blackbox_exporter
  • Node_exporter
  • Snmp_exporter
  • Prometheus
  • ARM_exporter
  • Prometheus-operator
  • Prometheus-adapter
  • Grafana
  • Kube-state-metrics
  • Addon-resizer (Note: this image is a clone of the AMD64, ARM64 and ARM images combined with a manifest; it is cloned and generated by the build_images.sh script)
  • configmap_reload
  • prometheus-config-reloader
  • SMTP-server
  • Kube-rbac-proxy

cluster-monitoring's People

Contributors

bamnet, carlosedp, cian911, colin-mccarthy, damacus, eons44, fvilers, geerlingguy, jessestuart, jontg, kd0nks, kluzny, l0g4n, letme, mimartin12, nashluffy, naude-r, nicklaswallgren, pascal71, phalski, rmsc, silasstokes, tome25, yesid-lopez


cluster-monitoring's Issues

Prometheus Operator Helm chart w/ k3s

Hi, a question: I am already using the Prometheus Operator Helm Chart in my k3s cluster. I would like to get kubeControllerManager and kubeScheduler monitored. What work needs to be done to enable this without using your deployments here? Is it possible?

K3s Prometheus unable to scrape worker nodes.

Hi 👋

This repo is fantastic to see and so far it's been brilliant.

So far I followed the prerequisite steps to get set up:

make vendor
make
kubectl apply -f manifests

Here's a snapshot of my cluster after applying all of the manifests

Everything looks great!

Unfortunately, though, I am unable to access the ingress gateways as suggested (alertmanager..nip.io, prometheus..nip.io, grafana..nip.io). To get around this for now I thought I'd check out prometheus first via port-forward.

To do that I run: kubectl -n monitoring port-forward prometheus-k8s-0 9090:9090

When I check prometheus targets I see lots of similar errors to this one:

The majority of the errors are:

  • context deadline exceeded
  • IO Timeout

Details of my hardware:

NAME               STATUS   ROLES    AGE   VERSION
worker-rpi-3Bplus   Ready    <none>   22h   v1.17.3+k3s1
worker-rpi-2       Ready    <none>   22h   v1.17.3+k3s1
master             Ready    master   22h   v1.17.3+k3s1
worker-rpi-4       Ready    <none>   22h   v1.17.3+k3s1

master = Raspberry Pi 4 w/ 4GB RAM
worker-rpi-4 = Raspberry Pi 4 w/ 4GB RAM
worker-rpi-2 = Raspberry Pi 2 B+
worker-rpi-3Bplus = Raspberry Pi 3 B+

All running docker version: 19.03.8 Client & Server.

If there's any further detail you need me to supply, I'll dig that out for you.

There also may be a gap in my knowledge here / potential further PEBKAC. Any advice / help would be much appreciated!

Permission error on persistent volumes with Prometheus and Grafana

I configured the master-ip in vars. I have also enabled k3s, the persistence volume and the suffix in vars.jsonnet. My k3s master has the IP 192.168.1.2 and the worker 192.168.1.4. Then I did make vendor, make and make deploy; all pods are running, but for some reason I cannot access Grafana, Prometheus and Alertmanager. So I ran kubectl get ingress --all-namespaces and the result was the following. Is there anything wrong with the steps I have performed?

NAMESPACE NAME CLASS HOSTS ADDRESS PORTS AGE
monitoring alertmanager-main alertmanager.192.168.1.2.nip.io 192.168.1.4 80, 443
monitoring grafana grafana.192.168.1.2.nip.io 192.168.1.4 80, 443 12s
monitoring prometheus-k8s prometheus.192.168.1.2.nip.io 192.168.1.4 80, 443

prometheus-k8s-0 stuck pending

Thank you for the awesome project.

I ran the ./deploy script but just have one issue trying to bring everything up. The prometheus-k8s-0 and grafana pods seem to be stuck pending.

...
monitoring       grafana-68c8dcd4dd-bv7ng                0/1       Pending   0          17m
monitoring       prometheus-k8s-0                        0/2       Pending   0          15s
monitoring       prometheus-operator-c65785d89-2xkn4     1/1       Running   0          10m
...

relevant kubectl get events -n monitoring

5m          12m          26        prometheus-k8s-0.1534b4e7ea27ebb6                      Pod                                                            Warning   FailedScheduling        default-scheduler             pod has unbound PersistentVolumeClaims (repeated 4 times)
38s         3m           10        prometheus-k8s-0.1534b562df4c937a                      Pod                                                            Warning   FailedScheduling        default-scheduler             pod has unbound PersistentVolumeClaims (repeated 4 times)
1m          12m          43        prometheus-k8s-db-prometheus-k8s-0.1534b4e7214bad48    PersistentVolumeClaim                                          Normal    FailedBinding           persistentvolume-controller   no persistent volumes available for this claim and no storage class is set
12m         12m          1         prometheus-k8s.1534b4e721f847d6                        StatefulSet                                                    Normal    SuccessfulCreate        statefulset-controller        create Claim prometheus-k8s-db-prometheus-k8s-0 Pod prometheus-k8s-0 in StatefulSet prometheus-k8s success
12m         12m          9         prometheus-k8s.1534b4e72367ee8e                        StatefulSet                                                    Warning   FailedCreate            statefulset-controller        create Pod prometheus-k8s-0 in StatefulSet prometheus-k8s failed error: pods "prometheus-k8s-0" is forbidden: error looking up service account monitoring/prometheus-k8s: serviceaccount "prometheus-k8s" not found
3m          12m          2         prometheus-k8s.1534b4e7ea855e82                        StatefulSet                                                    Normal    SuccessfulCreate        statefulset-controller        create Pod prometheus-k8s-0 in StatefulSet prometheus-k8s successful
3m          3m           1         prometheus-k8s.1534b562e3e10eeb                        StatefulSet                                                    Warning   FailedCreate            statefulset-controller        create Pod prometheus-k8s-0 in StatefulSet prometheus-k8s failed error: The POST operation against Pod could not be completed at this time, please try again.

Any thoughts on what might be going on? I see that it wants a persistent volume, any way around that?

Getting `/bin/jb: not found` when I try running `make vendor`

When I try running make vendor after customizing my vars.jsonnet file, I keep getting:

$ make vendor
Installing jsonnet-bundler
rm -rf vendor
/bin/jb install
/bin/sh: 1: /bin/jb: not found
make: *** [Makefile:26: vendor] Error 127

It seems like the path /bin/jb is hardcoded, but when it installs jsonnet-bundler it is running with my local GOPATH, which is ~/go, so jb is installed in ~/go/bin/jb and not in the global /bin dir.

Can the makefile be updated to work with just calling jb instead? I have added the go bin path to my user's $PATH as well, but since the /bin/jb location is hardcoded, I have to manually add a symlink or install as root, which is a little strange.

Change path of storage

Hello,

How could I change the path where the database is stored? I would like to change the path to my secondary HDD.

Additional module not deployed

Hi,
I modified the vars file and set "true" for arm_exporter and metallbexporter, but in the end they did not deploy.
Also, when I change the IP in "suffixDomain" (for example to 192.168.1.2), after I deploy the IP is not changed and remains 192.168.15.15.
I think it just ignores the changes to the vars.jsonnet file.

Cannot scrape metrics: Unauthorized

I created a cluster of 6 nodes with k3s using the first one as server and 5 others as agent.

I followed your readme, did some change for my own nfs settings and finally built the manifests and applied them.

But my prometheus instance cannot scrape /metrics when it is protected by kube-rbac-proxy.

I tried to curl manually from the prometheus pod using the serviceAccount token to see if it was a prometheus configuration issue, but I found the same problem.

Checking the log from kube-rbac-proxy I found :

E0511 17:08:31.133029       1 webhook.go:106] Failed to make webhook authenticator request: the server could not find the requested resource
E0511 17:08:31.133217       1 proxy.go:67] Unable to authenticate the request due to an error: the server could not find the requested resource

Did I forget to do something? Or is it maybe an issue with k3s itself?

Error loading config

Upgrading an existing Prometheus operator (from v0.17.0 to v0.28.0) and bumping the Prometheus version (v2.3.1 to v2.7.0) results in the following error in the Prometheus logs.

level=error ts=2019-02-01T18:43:54.10168301Z caller=main.go:688 err="error loading config from \"/etc/prometheus/config_out/prometheus.env.yaml\": couldn't load configuration (--config.file=\"/etc/prometheus/config_out/prometheus.env.yaml\"): open /etc/prometheus/config_out/prometheus.env.yaml: no such file or directory"

I haven't dug super deep into it yet, any idea what could be going on?

Which certificate file types?

First of all, thank you for this amazing project!

I have created server certificate files using Let's Encrypt and Certbot. Now I wonder which files I need to copy into the server.key and server.crt files?
The Certbot has created the following files: cert.pem, chain.pem, fullchain.pem and privkey.pem.

error in starting the pods

Failed to pull image "carlosedp/node_exporter:v0.15.2": rpc error: code = Unknown desc = no supported platform found in manifest list

CPU Temperature monitor giving Pod IPs instead of node IPs, so DNS names don't display

I noticed the node-exporter seems to be giving back node IPs instead of Kubernetes Pod IPs, so when the dashboard is displayed in Grafana, I see my node DNS names, like worker-01, worker-02, etc.


The CPU temperature monitor data, though, uses Pod IPs instead of node IPs, so the data is a little harder to discern, since I have to manually map Pod IPs to the nodes those Pods are running on:


The secret `ingress-TLS-secret` is invalid...

servicemonitor.monitoring.coreos.com/traefik created
The Secret "ingress-TLS-secret" is invalid: metadata.name: Invalid value: "ingress-TLS-secret": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
make: *** [deploy] Error 1

'arm-exporter' isn't running on master node, only workers

I noticed while debugging #39 that the arm-exporter DaemonSet was only running on 6 out of 7 Pi nodes. It was not running on the master node.

The master has the following taint:

Taints:             k3s-controlplane=true:NoExecute

But that doesn't seem to cause the node-exporter DaemonSet to not deploy a Pod there:

# kubectl get ds -n monitoring
NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
node-exporter   7         7         7       7            7           kubernetes.io/os=linux        27m
arm-exporter    6         6         6       6            6           beta.kubernetes.io/arch=arm   37m

The arch is arm on all 7 Pis, so I'm not sure why the selector might influence the DS deployment.

Additional scrape configs

This is a great project, and was really easy to get it monitoring my k3s raspberry pi cluster. Thank you.
Is there any way to add additional entries to the scrape_configs in Prometheus' configuration to monitor things outside of the cluster? I found these instructions in the original project:
https://github.com/coreos/prometheus-operator/blob/master/Documentation/additional-scrape-config.md
I see the same files mentioned in this project, but it doesn't look like it's being ingested when running make deploy. Is there something I should add to vars.jsonnet to get that working?

Trouble adding SNMP exporter

Thanks for your work on this. I am trying to get the SNMP exporter working and have used your configurations as a starting place but I am having trouble with the ServiceMonitor and I'm not sure how to troubleshoot it.

I can see the snmp related servicemonitor object in Kubernetes after creating it with,

kubectl get servicemonitor -n monitoring -o yaml | grep snmp

But I don't see any of the configuration anywhere else in Prometheus. Likewise, none of the SNMP metrics are showing up in Grafana. Nothing about the configuration is really showing up in the prometheus-operator logs either.

I have tested that the exporter is working and I can get to it and scrape a target manually.

Any thoughts or insights? Maybe I am missing something simple? I can post any other details if needed.

Unable to access dashboards

I attempted to deploy to my Raspberry Pi cluster running k3s and I am unable to reach the dashboards. I was thinking maybe something with my k3s setup was wrong, so I attempted to re-deploy k3s but am still seeing the same issue.

{
  ...
  modules: [
    {
      // After deployment, run the create_gmail_auth.sh script from scripts dir.
      name: 'smtpRelay',
      enabled: false,
      file: import 'smtp_relay.jsonnet',
    },
    {
      name: 'armExporter',
      enabled: true,
      file: import 'arm_exporter.jsonnet',
    },
    {
      name: 'upsExporter',
      enabled: false,
      file: import 'ups_exporter.jsonnet',
    },
    {
      name: 'metallbExporter',
      enabled: false,
      file: import 'metallb.jsonnet',
    },
    {
      name: 'traefikExporter',
      enabled: true,
      file: import 'traefik.jsonnet',
    },
    {
      name: 'elasticExporter',
      enabled: false,
      file: import 'elasticsearch_exporter.jsonnet',
    },
  ],

  k3s: {
    enabled: true,
    master_ip: ['<my-ip>'],
  },

  // Domain suffix for the ingresses
  suffixDomain: '<my-ip>.nip.io',
  // If TLSingress is true, a self-signed HTTPS ingress with redirect will be created
  TLSingress: true,
  ...
} 

Prometheus operator giving RBAC error with latest commit

After pulling the latest commit and building, I noticed the prometheus-operator is stuck in a CrashLoopBackOff and the container log shows:

level=info ts=2020-05-23T23:29:30.5438805Z caller=operator.go:293 component=thanosoperator msg="connection established" cluster-version=v1.17.5+k3s1
ts=2020-05-23T23:29:30.73898204Z caller=main.go:304 msg="Unhandled error received. Exiting..." err="getting CRD: Alertmanager: customresourcedefinitions.apiextensions.k8s.io \"alertmanagers.monitoring.coreos.com\" is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot get resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the cluster scope"

trying to make it work from a macosx -failed

Hi, the stack looks great, I've been following a tutorial here https://kauri.io/deploy-prometheus-and-grafana-to-monitor-a-kube/186a71b189864b9ebc4ef7c8a9f0a6b5/a

However, I am not able to deploy this using macOS Catalina and brew-installed Go.

After installing Go using brew, I set the PATH:


brew install go
export PATH=$PATH:/usr/local/Cellar/go/1.14.2_1/bin/

make vendor finishes, but make deploy gives me this error

make
rm -rf manifests
./scripts/build.sh main.jsonnet /usr/local/Cellar/go/1.14.2_1//bin/jsonnet
using jsonnet from arg
+ set -o pipefail
+ rm -rf manifests
+ mkdir manifests
+ /usr/local/Cellar/go/1.14.2_1//bin/jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
RUNTIME ERROR: vars.jsonnet:1:1-10 Unknown variable: master_ip
	main.jsonnet:2:14-35	thunk <vars> from <$>
	main.jsonnet:11:60-64

	utils.libsonnet:21:9-13	thunk from <function <anonymous>>
	utils.libsonnet:17:26-29	thunk from <function <aux>>
	utils.libsonnet:17:15-30	function <aux>
	utils.libsonnet:21:5-21	function <anonymous>
	main.jsonnet:11:14-92	thunk <kp> from <$>
	main.jsonnet:21:81-83
	<std>:1278:24-25	thunk from <function <anonymous>>
	<std>:1278:5-33	function <anonymous>
	main.jsonnet:21:64-99	$


	During evaluation

make: *** [manifests] Error 1

as I am a go noob, any help would be appreciated

Cheers!

build error

First of all, I add my thanks for this project. Here are the steps I just performed:

$ git clone https://github.com/carlosedp/cluster-monitoring.git
Cloning into 'cluster-monitoring'...
remote: Enumerating objects: 1256, done.
remote: Total 1256 (delta 0), reused 0 (delta 0), pack-reused 1256
Receiving objects: 100% (1256/1256), 921.68 KiB | 3.48 MiB/s, done.
Resolving deltas: 100% (878/878), done.

Modified vars.jsonnet, which is attached as vars.txt

$ make vendor
rm -rf vendor
/root/go/bin/jb install
GET https://github.com/coreos/kube-prometheus/archive/17989b42aa10b1c6afa07043cb05bcd5ae492284.tar.gz 200
GET https://github.com/brancz/kubernetes-grafana/archive/57b4365eacda291b82e0d55ba7eec573a8198dda.tar.gz 200
GET https://github.com/ksonnet/ksonnet-lib/archive/0d2f82676817bbf9e4acf6495b2090205f323b9f.tar.gz 200
GET https://github.com/kubernetes-monitoring/kubernetes-mixin/archive/b61c5a34051f8f57284a08fe78ad8a45b430252b.tar.gz 200
GET https://github.com/prometheus/prometheus/archive/74207c04655e1fd93eea0e9a5d2f31b1cbc4d3d0.tar.gz 200
GET https://github.com/coreos/etcd/archive/d8c8f903eee10b8391abaef7758c38b2cd393c55.tar.gz 200
GET https://github.com/coreos/prometheus-operator/archive/e31c69f9b5c6555e0f4a5c1f39d0f03182dd6b41.tar.gz 200
GET https://github.com/kubernetes/kube-state-metrics/archive/d667979ed55ad1c4db44d331b51d646f5b903aa7.tar.gz 200
GET https://github.com/kubernetes/kube-state-metrics/archive/d667979ed55ad1c4db44d331b51d646f5b903aa7.tar.gz 200
GET https://github.com/prometheus/node_exporter/archive/08ce3c6dd430deb51798826701a395e460620d60.tar.gz 200
GET https://github.com/grafana/grafonnet-lib/archive/8fb95bd89990e493a8534205ee636bfcb8db67bd.tar.gz 200
GET https://github.com/grafana/jsonnet-libs/archive/881db2241f0c5007c3e831caf34b0c645202b4ab.tar.gz 200

$ make
Installing jsonnet
go: found github.com/google/go-jsonnet/cmd/jsonnet in github.com/google/go-jsonnet v0.16.0
go: github.com/mattn/go-isatty upgrade => v0.0.12
go: github.com/mattn/go-colorable upgrade => v0.1.6
go: golang.org/x/sys upgrade => v0.0.0-20200620081246-981b61492c35
go: found github.com/google/go-jsonnet/cmd/jsonnetfmt in github.com/google/go-jsonnet v0.16.0
go: github.com/mattn/go-colorable upgrade => v0.1.6
go: github.com/mattn/go-isatty upgrade => v0.0.12
go: golang.org/x/sys upgrade => v0.0.0-20200620081246-981b61492c35
go: github.com/brancz/gojsontoyaml upgrade => v0.0.0-20200602132005-3697ded27e8c
rm -rf manifests
./scripts/build.sh main.jsonnet /root/go/bin/jsonnet
using jsonnet from path
+ set -o pipefail
+ rm -rf manifests
+ mkdir -p manifests/setup
+ jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | $(go env GOPATH)/bin/gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
RUNTIME ERROR: Unexpected type null, expected object
        base_operator_stack.jsonnet:(123:7)-(128:119)   object <anonymous>
        main.jsonnet:31:24-40   object <anonymous>
        During manifestation

make: *** [Makefile:19: manifests] Error 1



Missing Service when adding arm-exporter

I'm not a go or jsonnet person, so reporting here.
When building and adding arm-exporter for temps, the resulting yaml is missing a couple things.

arm-exporter-serviceAccount.yaml (file missing, needed, for me at least):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: arm-exporter
  namespace: monitoring

along with, in arm-exporter-daemonset.yaml:
add serviceAccountName: arm-exporter
and the tls-cipher-suites setting

Without the service account and the serviceAccountName added to the daemonset, I was getting 401 unauthorized errors.

Wish there was more I could do to help; this project is awesome.

Error on "make"

Hi,

I followed the installation process and the "make vendor" steps. Everything worked.

Then I started to run plain "make" and got the following error:

$ make
rm -rf manifests
./scripts/build.sh main.jsonnet
+ set -o pipefail
+ rm -rf manifests
+ mkdir manifests
+ jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
./scripts/build.sh: line 15: jsonnet: command not found
Makefile:12: recipe for target 'manifests' failed
make: *** [manifests] Error 127

Here is my OS info:

$ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
HYPRIOT_OS="HypriotOS/armhf"
HYPRIOT_OS_VERSION="v2.0.1"
HYPRIOT_DEVICE="Raspberry Pi"
HYPRIOT_IMAGE_VERSION="v1.9.0"

Here is my k8s info:

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:54:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/arm"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:43:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/arm"}

What am I doing wrong?

Prometheus high on CPU

Hey,
I was wondering if there was a way or any steps one could take to reduce the CPU load of the Prometheus Pod. Maybe I'm also doing something wrong, not too sure.

I've got a K3s cluster with one RPI 4 4GB running as the master and 3 3B+ as workers. When I deploy the monitoring stack (following @geerlingguy 's tutorial) to my cluster most of the pods get scheduled to the master node.

In effect, the master node is constantly spiking from 20% CPU up to 50% (sometimes higher) CPU usage (I assume every time Prometheus is scraping), and in fact, Prometheus is the cause of almost all of this CPU usage.

PersistentVolume pointed at specific location (e.g. NFS server)

Is there a way to "easily" point to a PV/PVC in order to have real persistent storage in my k3s cluster?

Sorry it's not an issue, but a question. Maybe a feature request :)

PS. Is there a way to easily uninstall all of it without going one-by-one deleting all resources? Thank you!

running "VolumeBinding" filter plugin for pod "prometheus-k8s-0": pod has unbound immediate PersistentVolumeClaims`

Hello,

Thanks for getting all of these images together! Its been a lot more challenging to track down all the right images for my cluster than I initially thought.

I tried following the quickstart non-k3s guide to deploy the monitoring stack on my cluster (Pi 4's, running Raspbian Buster Lite, full k8s set up with kubeadm) and receive the following PVC-related errors for both the prometheus-k8s-0 pod and the grafana pod:

running "VolumeBinding" filter plugin for pod "prometheus-k8s-0": pod has unbound immediate PersistentVolumeClaims

running "VolumeBinding" filter plugin for pod "grafana-759f594549-5mrsj": pod has unbound immediate PersistentVolumeClaims

I tried setting up the volumes manually but am not able to get around the PVC issue described. Both pods are in pending until they can attach to their volumes. Is there a way to go around the plugin used to create the PV and do it manually? Is it possible to run the plugin on its own to attempt to create the required volumes? I haven't used any filter plugins as described in the log before so there could be something simple I am missing as well.

I re-made and re-deployed the manifests once, not sure yet what else to try. I don't see the PV volumes for either initialized in the cluster either.

I could be missing something obvious in set-up, let me know if there are some obvious missed items that could lead to this.

Please let me know if there is any other information I can provide that would be helpful.

Thanks!

Can't install on k3s kubernetes cluster ARM

Hello, I have an ARM k3s kubernetes cluster with 4 ARM machines (odroid MC1).

After I type the make deploy, I get these errors (full log):


rm -rf manifests
./scripts/build.sh main.jsonnet /root/go/bin/jsonnet
using jsonnet from arg
+ set -o pipefail
+ rm -rf manifests
+ mkdir manifests
+ /root/go/bin/jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
kubectl apply -f ./manifests/
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
secret/grafana-config created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-coredns-dashboard created
configmap/grafana-dashboard-k8s-cluster-rsrc-use created
configmap/grafana-dashboard-k8s-node-rsrc-use created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-kubernetes-cluster-dashboard created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pods created
configmap/grafana-dashboard-prometheus-dashboard created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
ingress.extensions/alertmanager-main created
ingress.extensions/grafana created
ingress.extensions/prometheus-k8s created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
service/kube-controller-manager-prometheus-discovery created
service/kube-dns-prometheus-discovery created
service/kube-scheduler-prometheus-discovery created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
deployment.apps/smtp-server created
service/smtp-server created
unable to recognize "manifests/0prometheus-operator-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/alertmanager-alertmanager.yaml": no matches for kind "Alertmanager" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/alertmanager-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/grafana-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/kube-state-metrics-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/node-exporter-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-prometheus.yaml": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-rules.yaml": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-serviceMonitorApiserver.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-serviceMonitorCoreDNS.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-serviceMonitorKubeControllerManager.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-serviceMonitorKubeScheduler.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "manifests/prometheus-serviceMonitorKubelet.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
Makefile:25: recipe for target 'deploy' failed
make: *** [deploy] Error 1

How to enable grafana-piechart-panel

Hi,

I need to enable this plugin (and potentially others). I have added it to vars.jsonnet:

// Grafana "from" email
grafana: {
from_address: '[email protected]',
plugins: [
"grafana-piechart-panel",
],
},

but a make vendor/make/deploy doesn't get it enabled.
How can I get this done properly please?

Regards

Tom

Getting 'couldn't open import "arm_exporter.jsonnet"' since file reorg

Since the reorganization, I'm getting the following error when running make:

---
changed: false
cmd: "/usr/bin/make"
msg: "+ set -o pipefail\n+ rm -rf manifests\n+ mkdir -p manifests/setup\n+ xargs '-I{}'
  sh -c 'cat {} | $(go env GOPATH)/bin/gojsontoyaml > {}.yaml; rm -f {}' -- '{}'\n+
  /home/pirate/go/bin/jsonnet -J vendor -m manifests main.jsonnet\nRUNTIME ERROR:
  couldn't open import \"arm_exporter.jsonnet\": no match locally or in the Jsonnet
  library paths\n\tvars.jsonnet:16:13-42\tobject <anonymous>\n\tmain.jsonnet:9:34-45\tthunk
  from <thunk from <thunk <kp> from <$>>>\n\tutils.libsonnet:20:35-41\tthunk from
  <function <aux>>\n\tutils.libsonnet:20:9-42\tfunction <aux>\n\tutils.libsonnet:21:5-21\tfunction
  <anonymous>\n\tmain.jsonnet:9:14-92\tthunk <kp> from <$>\n\tmain.jsonnet:18:86-88\t\n\t<std>:1293:24-25\tthunk
  from <function <anonymous>>\n\t<std>:1293:5-33\tfunction <anonymous>\n\tmain.jsonnet:18:69-104\t$\n\t\t\n\t\t\n\tDuring
  evaluation\t\n\nmake: *** [Makefile:19: manifests] Error 1"
rc: 2
stderr: "+ set -o pipefail\n+ rm -rf manifests\n+ mkdir -p manifests/setup\n+ xargs
  '-I{}' sh -c 'cat {} | $(go env GOPATH)/bin/gojsontoyaml > {}.yaml; rm -f {}' --
  '{}'\n+ /home/pirate/go/bin/jsonnet -J vendor -m manifests main.jsonnet\nRUNTIME
  ERROR: couldn't open import \"arm_exporter.jsonnet\": no match locally or in the
  Jsonnet library paths\n\tvars.jsonnet:16:13-42\tobject <anonymous>\n\tmain.jsonnet:9:34-45\tthunk
  from <thunk from <thunk <kp> from <$>>>\n\tutils.libsonnet:20:35-41\tthunk from
  <function <aux>>\n\tutils.libsonnet:20:9-42\tfunction <aux>\n\tutils.libsonnet:21:5-21\tfunction
  <anonymous>\n\tmain.jsonnet:9:14-92\tthunk <kp> from <$>\n\tmain.jsonnet:18:86-88\t\n\t<std>:1293:24-25\tthunk
  from <function <anonymous>>\n\t<std>:1293:5-33\tfunction <anonymous>\n\tmain.jsonnet:18:69-104\t$\n\t\t\n\t\t\n\tDuring
  evaluation\t\n\nmake: *** [Makefile:19: manifests] Error 1\n"
stderr_lines:
- "+ set -o pipefail"
- "+ rm -rf manifests"
- "+ mkdir -p manifests/setup"
- "+ xargs '-I{}' sh -c 'cat {} | $(go env GOPATH)/bin/gojsontoyaml > {}.yaml; rm
  -f {}' -- '{}'"
- "+ /home/pirate/go/bin/jsonnet -J vendor -m manifests main.jsonnet"
- 'RUNTIME ERROR: couldn''t open import "arm_exporter.jsonnet": no match locally or
  in the Jsonnet library paths'
- "\tvars.jsonnet:16:13-42\tobject <anonymous>"
- "\tmain.jsonnet:9:34-45\tthunk from <thunk from <thunk <kp> from <$>>>"
- "\tutils.libsonnet:20:35-41\tthunk from <function <aux>>"
- "\tutils.libsonnet:20:9-42\tfunction <aux>"
- "\tutils.libsonnet:21:5-21\tfunction <anonymous>"
- "\tmain.jsonnet:9:14-92\tthunk <kp> from <$>"
- "\tmain.jsonnet:18:86-88\t"
- "\t<std>:1293:24-25\tthunk from <function <anonymous>>"
- "\t<std>:1293:5-33\tfunction <anonymous>"
- "\tmain.jsonnet:18:69-104\t$"
- "\t\t"
- "\t\t"
- "\tDuring evaluation\t"
- ''
- 'make: *** [Makefile:19: manifests] Error 1'
stdout: |
  rm -rf manifests
  ./scripts/build.sh main.jsonnet /home/pirate/go/bin/jsonnet
  using jsonnet from arg
stdout_lines:
- rm -rf manifests
- "./scripts/build.sh main.jsonnet /home/pirate/go/bin/jsonnet"
- using jsonnet from arg

CPU Temperature alert giving 'No data' on Raspberry Pi CM3+

I'm testing this out on a Turing Pi cluster, with 7 Pi Compute Module 3+ boards.

On my Grafana dashboard, I'm seeing no data for CPU temperature:


I'm going to dig in and see where the monitor is supposed to be running. I modified the vars.jsonnet file like so, for my cluster:

{
  _config+:: {
    namespace: 'monitoring',
  },
  // Enable or disable additional modules
  modules: [
    {
      // After deployment, run the create_gmail_auth.sh script from scripts dir.
      name: 'smtpRelay',
      enabled: false,
      file: import 'smtp_relay.jsonnet',
    },
    {
      name: 'armExporter',
      enabled: true,
      file: import 'arm_exporter.jsonnet',
    },
    {
      name: 'upsExporter',
      enabled: false,
      file: import 'ups_exporter.jsonnet',
    },
    {
      name: 'metallbExporter',
      enabled: false,
      file: import 'metallb.jsonnet',
    },
    {
      name: 'traefikExporter',
      enabled: false,
      file: import 'traefik.jsonnet',
    },
    {
      name: 'elasticExporter',
      enabled: false,
      file: import 'elasticsearch_exporter.jsonnet',
    },
  ],

  k3s: {
    enabled: true,
    master_ip: ['10.0.100.163'],
  },

  // Domain suffix for the ingresses
  suffixDomain: '10.0.100.74.nip.io',
  // If TLSingress is true, a self-signed HTTPS ingress with redirect will be created
  TLSingress: true,
  // If UseProvidedCerts is true, provided files will be used on created HTTPS ingresses.
  // Use a wildcard certificate for the domain like ex. "*.192.168.99.100.nip.io"
  UseProvidedCerts: false,
  TLSCertificate: importstr 'server.crt',
  TLSKey: importstr 'server.key',

  // Setting these to false, defaults to emptyDirs
  enablePersistence: {
    prometheus: false,
    grafana: false,
  },

  // Grafana "from" email
  grafana: {
    from_address: '[email protected]',
  },
}

Error on "make" - UNTIME ERROR: Unexpected type string, expected array

make
rm -rf manifests
./scripts/build.sh main.jsonnet /root/go/bin/jsonnet
using jsonnet from arg
+ set -o pipefail
+ rm -rf manifests
+ mkdir manifests
+ /root/go/bin/jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
RUNTIME ERROR: Unexpected type string, expected array
		builtin function <flatMap>
	vendor/ksonnet/ksonnet.beta.4/k8s.libsonnet:23134:57-66	thunk from <function <anonymous>>
	vendor/ksonnet/ksonnet.beta.4/k8s.libsonnet:23134:48-67	function <anonymous>
	utils.libsonnet:(80:23)-(83:24)	thunk <subset> from <function <anonymous>>
	utils.libsonnet:89:31-37	thunk from <function <anonymous>>
	vendor/ksonnet/ksonnet.beta.4/k8s.libsonnet:7427:51-58	thunk from <function <anonymous>>
	vendor/ksonnet/ksonnet.beta.4/k8s.libsonnet:7427:42-59	function <anonymous>
	utils.libsonnet:89:9-38	function <anonymous>
	k3s-overrides.jsonnet:8:7-130	object <anonymous>
	main.jsonnet:25:27-46	object <anonymous>
	During manifestation

Grafana pod not starting

I have tried this project on my Raspberry Pi 4 ARM64 K3s cluster running k3s on Ubuntu 19.10.

When I ran without persistent storage everything seemed to work great. I then tried to deploy with persistent storage on my Ceph cluster publishing block storage, and now Grafana fails to launch with the below error.

GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied

Below is my vars.jsonnet file

{
  _config+:: {
    namespace: 'monitoring',
  },
  // Enable or disable additional modules
  modules: [
    {
      // After deployment, run the create_gmail_auth.sh script from scripts dir.
      name: 'smtpRelay',
      enabled: false,
      file: import 'smtp_relay.jsonnet',
    },
    {
      name: 'armExporter',
      enabled: true,
      file: import 'arm_exporter.jsonnet',
    },
    {
      name: 'upsExporter',
      enabled: false,
      file: import 'ups_exporter.jsonnet',
    },
    {
      name: 'metallbExporter',
      enabled: false,
      file: import 'metallb.jsonnet',
    },
    {
      name: 'traefikExporter',
      enabled: true,
      file: import 'traefik.jsonnet',
    },
    {
      name: 'elasticExporter',
      enabled: false,
      file: import 'elasticsearch_exporter.jsonnet',
    },
  ],

  k3s: {
    enabled: true,
    master_ip: ['192.168.5.41'],
  },

  // Domain suffix for the ingresses
  suffixDomain: 'example.com',
  // If TLSingress is true, a self-signed HTTPS ingress with redirect will be created
  TLSingress: true,
  // If UseProvidedCerts is true, provided files will be used on created HTTPS ingresses.
  // Use a wildcard certificate for the domain like ex. "*.192.168.99.100.nip.io"
  UseProvidedCerts: false,
  TLSCertificate: importstr 'server.crt',
  TLSKey: importstr 'server.key',

  // Setting these to false, defaults to emptyDirs
  enablePersistence: {
    prometheus: true,
    grafana: true,
  },

  // Grafana "from" email
  grafana: {
    from_address: '[email protected]',
  },
}

prometheus-k8s fails to start after a while 'CrashLoopBackOff'

Prometheus crashes and fails to start after running for some time. It looks like the TSDB is getting too large and Prometheus can't allocate any memory anymore (mmap: cannot allocate memory).

% kubectl logs prometheus-k8s-0 -p prometheus

level=info ts=2020-05-31T06:15:21.309Z caller=main.go:329 msg="Starting Prometheus" version="(version=2.11.1, branch=HEAD, revision=e5b22494857deca4b806f74f6e3a6ee30c251763)"
level=info ts=2020-05-31T06:15:21.309Z caller=main.go:331 host_details="(Linux 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l prometheus-k8s-0 (none))"
...
level=info ts=2020-05-31T06:15:21.313Z caller=main.go:652 msg="Starting TSDB ..."
...
level=info ts=2020-05-31T06:15:21.320Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1590494400000 maxt=1590501600000 ulid=01E99KKY0EHNP5Y85BWZKD85CX
level=info ts=2020-05-31T06:15:21.320Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1590501600000 maxt=1590508800000 ulid=01E99KMVH5A14VHEX6SGYZSM5R
level=info ts=2020-05-31T06:15:21.320Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1590508800000 maxt=1590516000000 ulid=01E9AS2F3X2Z2WARE081EYBFP4
...
level=info ts=2020-05-31T06:15:21.740Z caller=main.go:521 msg="Stopping scrape discovery manager..."
level=info ts=2020-05-31T06:15:21.740Z caller=main.go:535 msg="Stopping notify discovery manager..."
level=info ts=2020-05-31T06:15:21.740Z caller=main.go:557 msg="Stopping scrape manager..."
level=info ts=2020-05-31T06:15:21.740Z caller=manager.go:776 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-05-31T06:15:21.741Z caller=manager.go:782 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-05-31T06:15:21.741Z caller=notifier.go:602 component=notifier msg="Stopping notification manager..."
level=info ts=2020-05-31T06:15:21.741Z caller=main.go:531 msg="Notify discovery manager stopped"
level=info ts=2020-05-31T06:15:21.741Z caller=main.go:722 msg="Notifier manager stopped"
level=info ts=2020-05-31T06:15:21.741Z caller=main.go:517 msg="Scrape discovery manager stopped"
level=info ts=2020-05-31T06:15:21.741Z caller=main.go:551 msg="Scrape manager stopped"
level=error ts=2020-05-31T06:15:21.741Z caller=main.go:731 err="opening storage failed: unexpected corrupted block:map[01E9AS2F3X2Z2WARE081EYBFP4:mmap files: mmap: cannot allocate memory]"

I have set the persistence settings to false in vars.jsonnet.

  // Setting these to false, defaults to emptyDirs
  enablePersistence: {
    prometheus: false,
    grafana: false,
  },

Is there an easy way to configure the TSDB retention behavior?
https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects

--storage.tsdb.retention.size: [EXPERIMENTAL] This determines the maximum number 
of bytes that storage blocks can use (note that this does not include the WAL size, 
which can be substantial). The oldest data will be removed first. Defaults to 0 or 
disabled. This flag is experimental and can be changed in future releases. Units 
supported: KB, MB, GB, PB. Ex: "512MB"

Redirect to HTTPS version when connecting over Load Balancer Ingress

Hi,
first of all, thanks for your work, I tested your manifest with my local k3s cluster and while it is not a perfect setting it works better than the standard prometheus-operator that is geared towards k8s.
I noticed that all services get exposed via TLS via the ingress, but it does not automatically redirect one to the HTTPS version of the site when you connect to the HTTP version of your service. This should be configurable via a setting in the ingress, I guess.

Thank you

Just wanted to say thank you for this project Carlos.

As an amateur with a bunch of PIs trying to learn and setup a cluster this was invaluable to me.

Thank you

metrics for some namespaces are missing

The metrics for the pods in some namespaces are not there. Actually, only the metrics for the kube-system and metallb namespaces are present (although without the metrics for Network I/O). When I check the CPU and memory usage using kubectl top pod foo I can prove that the pods are using memory and CPU.
How can I debug the problem?

I am using Kubernetes 1.15.5 on a Raspberry Pi cluster

prometheus command line options

Hi again, here's my next issue:

I'd like to access prometheus' admin-api. It is enabled via the command line option '--web.enable-admin-api'
I found: prometheus-operator/prometheus-operator#2300 which should enable this feature.

I have changed base_operator_stack.jsonnet and added "enableAdminAPI: 'true'". I have re-created and re-applied the manifests. Unfortunately the command line option is still not there on the prometheus pod.

During 'make', I get 'gojsontoyaml: not found'

When running make, I get the following output:

$ make
rm -rf manifests
./scripts/build.sh main.jsonnet /home/pirate/go/bin/jsonnet
using jsonnet from arg
+ set -o pipefail
+ rm -rf manifests
+ mkdir manifests
+ /home/pirate/go/bin/jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
--: 1: --: gojsontoyaml: not found
--: 1: --: gojsontoyaml: not found
--: 1: --: gojsontoyaml: not found
(the same "gojsontoyaml: not found" line repeats once for every generated manifest)

All the .yaml manifests are created, but they are empty. It seems like gojsontoyaml may be quite essential to making the manifests actually have content 🤪

compatible with HA k3s

Hello, I run this with HA k3s but the dashboard always shows NONE instead of the graph.

Can this be compatible with a cluster of 2 or more masters?

Error when running 'make'

While running make I eventually get this in the console:

rm -rf manifests
./scripts/build.sh main.jsonnet /root/go/bin/jsonnet
using jsonnet from arg
+ set -o pipefail
+ rm -rf manifests
+ mkdir -p manifests/setup
+ /root/go/bin/jsonnet -J vendor -m manifests main.jsonnet
+ xargs '-I{}' sh -c 'cat {} | $(go env GOPATH)/bin/gojsontoyaml > {}.yaml; rm -f {}' -- '{}'
./scripts/build.sh: line 23:  7595 Killed                  $JSONNET_BIN -J vendor -m manifests "${1-example.jsonnet}"
      7596 Done                    | xargs -I{} sh -c 'cat {} | $(go env GOPATH)/bin/gojsontoyaml > {}.yaml; rm -f {}' -- {}
make: *** [Makefile:19: manifests] Error 137

My vars.jsonnet edits from default are:

  • armExporter : true
  • k3s.enabled : true
  • k3s.master_ip : 192.168.178.82
  • suffixDomain : 192.168.178.84.nip.io

Any ideas what is causing this or how to get this to run?

Script error

Hi,
Can you help me solve an issue that I got when following your guide? I've set up the vars master IP and suffix with my node1 IP. How should I approach this with multiple nodes in the cluster?
I have set k3s enabled and armExporter and Traefik to true.
Should I run all the commands with sudo?

When I do, I get:

kubectl apply -f ./manifests/setup/
The connection to the server localhost:8080 was refused - did you specify the right host or port?
make: *** [Makefile:34: deploy] Error 1

Without sudo I get

kubectl apply -f manifests/setup/
namespace/monitoring unchanged
clusterrole.rbac.authorization.k8s.io/prometheus-operator unchanged
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator unchanged
[unable to recognize "manifests/setup/prometheus-operator-0alertmanagerCustomResourceDefinition.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1", unable to recognize "manifests/setup/prometheus-operator-0podmonitorCustomResourceDefinition.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1", unable to recognize "manifests/setup/prometheus-operator-0prometheusCustomResourceDefinition.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1", unable to recognize "manifests/setup/prometheus-operator-0prometheusruleCustomResourceDefinition.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1", unable to recognize "manifests/setup/prometheus-operator-0servicemonitorCustomResourceDefinition.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1", unable to recognize "manifests/setup/prometheus-operator-0thanosrulerCustomResourceDefinition.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"]

What am I doing wrong?

Thanks :)

K3s - Disappearing metrics

I have noticed that the kube-controller-manager and the kube-scheduler disappear from the list of targets in prometheus after 12-24 hours. The metrics endpoints are still available though.

I have tried to restart the prometheus container, but to no avail. The only solution so far is to re-apply the manifest files.

Thanks for a great repo!


How to change Prometheus' scrapeInterval?

It seems the default (https://prometheus.io/docs/prometheus/latest/configuration/configuration/) is 60s/1m; though I noticed one override for 30s in base_operator_stack.jsonnet for kubeStateMetrics.

It seems the Prometheus pod is a bit overloaded when it gets deployed to some of my older Pi 3 B boards (seems to run okay on the Pi 4 with more RAM and faster CPU though)... and I'm wondering if setting the scrape interval to something a bit more lightweight like 2m or 5m might help the poor older Pis keep up.

I was about to jump over to my prometheus instance but that node just went down due to thrashing as it ran out of memory 🤪

Anyways, just a quick support question, not a big deal and I may work on getting the memory requirements a little more stringent so Prometheus only goes to one of the newer/faster nodes.

How to regenerate grafana.ini after sercret update?

Hi, I have edited the grafana-config secret to enable basic.auth.
I do not understand how to regenerate the grafana.ini with this change. I have deleted the pod, it was recreated, but the config is still the old one.

Please help, and thanks for the great work!
Cheers,
udo.

Issue with Alerts and one question.

Hey,

First of all, thanks for this nice work. Running it on my RPi4 cluster with HypriotOS and K3s without issues; it also worked first try.
Unfortunately the Alerts show me KubeControllerManagerDown and KubeSchedulerDown. Is this expected behavior?

Additionally I have a question about your blogpost and one particular screenshot:
https://miro.medium.com/max/1400/1*zp4bS5omhxoLxbC4xGh5vQ.png
There it shows all the processes and their percentage of CPU usage. For me it shows only 1 graph with Value | 21% | 14%. Is this a limitation due to HypriotOS, the ARM, k3s or did I forget something?
