Comments (3)
Awesome! Thanks for sharing!
from awesome-prometheus-alerts.
@TheKangaroo If the custom KSM config means the rules aren't a good fit here, would you mind sharing them in a Gist (or anywhere else)? I'm about to write alerts for Flux in the next week or so and would love to have a jump start!
Sure, these are basically our alerts (as a Helm template). We'll improve them over time, but this is what we started with.
As I said earlier, they rely on the kube-state-metrics config in the flux monitoring-example repo.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flux.rules
  labels:
    app.kubernetes.io/part-of: flux
    app.kubernetes.io/component: monitoring
spec:
  groups:
  - name: flux.rules
    rules:
    - alert: FluxKustomizationFailing
      annotations:
        description: Flux Kustomization {{`{{`}} $labels.name {{`}}`}} in namespace {{`{{`}} $labels.namespace {{`}}`}} failed.
        runbook_url: {{ .Values.defaultRules.runbookUrl }}/{{ .Template.Name }}
        summary: Errors while reconciling Flux Kustomization(s)
      expr: gotk_resource_info{customresource_kind=~"Kustomization", ready="False"}
      for: 5m
      labels:
        severity: warning
{{- if .Values.defaultRules.additionalRuleLabels }}
{{ toYaml .Values.defaultRules.additionalRuleLabels | indent 8 }}
{{- end }}
    - alert: FluxHelmReleaseFailing
      annotations:
        description: Flux HelmRelease {{`{{`}} $labels.name {{`}}`}} in namespace {{`{{`}} $labels.namespace {{`}}`}} failed.
        runbook_url: {{ .Values.defaultRules.runbookUrl }}/{{ .Template.Name }}
        summary: Errors while reconciling Flux HelmRelease(s)
      expr: gotk_resource_info{customresource_kind=~"HelmRelease", ready="False"}
      for: 5m
      labels:
        severity: warning
{{- if .Values.defaultRules.additionalRuleLabels }}
{{ toYaml .Values.defaultRules.additionalRuleLabels | indent 8 }}
{{- end }}
    - alert: FluxSourceFailing
      annotations:
        description: Flux Source {{`{{`}} $labels.name {{`}}`}} in namespace {{`{{`}} $labels.namespace {{`}}`}} failed.
        runbook_url: {{ .Values.defaultRules.runbookUrl }}/{{ .Template.Name }}
        summary: Errors while reconciling Flux Source(s)
      expr: gotk_resource_info{customresource_kind=~"GitRepository|HelmRepository|Bucket|OCIRepository", ready="False"}
      for: 5m
      labels:
        severity: warning
{{- if .Values.defaultRules.additionalRuleLabels }}
{{ toYaml .Values.defaultRules.additionalRuleLabels | indent 8 }}
{{- end }}
    - alert: FluxResourceSuspended
      annotations:
        description: Flux Resource {{`{{`}} $labels.name {{`}}`}} in namespace {{`{{`}} $labels.namespace {{`}}`}} suspended.
        runbook_url: {{ .Values.defaultRules.runbookUrl }}/{{ .Template.Name }}
        summary: Flux Resource(s) are suspended for an extended period of time.
      expr: gotk_resource_info{suspended="true"}
      for: 2h
      labels:
        severity: none
{{- if .Values.defaultRules.additionalRuleLabels }}
{{ toYaml .Values.defaultRules.additionalRuleLabels | indent 8 }}
{{- end }}
We send alerts for failing resources such as GitRepositories, Kustomizations, HelmCharts and HelmReleases.
We added the "suspended" alert with a delay of 2h to catch the case where someone suspends a Flux resource while troubleshooting and then forgets to resume it.
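For anyone reusing the template above: it reads two values, defaultRules.runbookUrl and defaultRules.additionalRuleLabels. A minimal values sketch might look like the following. The URL and the label names are hypothetical placeholders, not values from our setup.

```yaml
# values.yaml -- illustrative only; the URL and labels below are made-up examples.
defaultRules:
  # Base URL for runbooks. The template appends "/" plus .Template.Name,
  # which Helm sets to the path of the template file being rendered.
  runbookUrl: https://runbooks.example.com/flux
  # Optional extra labels added to every alert. The template renders them
  # with `toYaml | indent 8`, so they land next to `severity` under `labels:`.
  additionalRuleLabels:
    team: platform
    cluster: example
```

If additionalRuleLabels is left unset or empty, the `if` block in each rule is skipped and only the severity label is emitted.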