ankhmorpork's Introduction

Ankhmorpork

📖 Overview

This is a monorepo for @paulfantom's home infrastructure and Kubernetes cluster. The project uses Infrastructure as Code to automate provisioning, operating, and updating self-hosted services.

⛵ Kubernetes

Installation

The cluster runs k3s, provisioned on bare-metal hosts running the latest Ubuntu LTS release, using a modified version of the Ansible role provided by the k3s project.

🔸 Click here to see my Ansible playbooks and roles.

Components

Name Description
Jsonnet Data templating language
GitHub Actions CI system
Ansible Automate bare metal provisioning and configuration
Ubuntu Base OS for Kubernetes nodes
K3s Lightweight distribution of Kubernetes
Kubernetes Container-orchestration system, the backbone of this project
kured Kubernetes Reboot Daemon
TopoLVM Local storage based on LVM
Longhorn Distributed block storage
Minio S3 storage
Flux GitOps tool built to deploy applications to Kubernetes
ExternalSecrets Secrets and encryption management system
MetalLB Bare metal load-balancer for Kubernetes
cert-manager Cloud native certificate management
Cloudflare DNS
Traefik Kubernetes Ingress Controller
oauth2-proxy Authentication proxy
Prometheus Systems monitoring and alerting toolkit
Thanos Metrics datalake
Grafana Operational dashboards
Cloudnative-pg Postgres Controller
Homer Portal Site
HomeAssistant Home Automation System
ESPhome Microcontrollers Management
Tandoor Cookbook
Photoprism Photo Management
Paperless-ngx Document Management
AND MANY OTHERS

GitOps

Flux watches the manifests/ subdirectories inside the base and apps top-level directories and applies changes based on the YAML manifests found there. Where possible, these YAML manifests are generated from Jsonnet code.
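As an illustration of that layout, the Python sketch below lists the directories that would be picked up. It assumes each component keeps its generated YAML in `<top>/<component>/manifests/` directly under `base` or `apps`; that exact nesting is an assumption, not a description of the repository's Flux configuration.

```python
from pathlib import Path


def flux_manifest_dirs(repo_root: str) -> list[str]:
    """List the manifests/ directories under base/ and apps/.

    Assumes a hypothetical layout of <top>/<component>/manifests/, where
    <top> is either "base" or "apps".
    """
    root = Path(repo_root)
    found = []
    for top in ("base", "apps"):
        top_dir = root / top
        if not top_dir.is_dir():
            continue
        # Only direct children of base/ and apps/ are treated as components.
        for component in sorted(p for p in top_dir.iterdir() if p.is_dir()):
            manifests = component / "manifests"
            if manifests.is_dir():
                found.append(manifests.relative_to(root).as_posix())
    return found


if __name__ == "__main__":
    for path in flux_manifest_dirs("."):
        print(path)
```

Components without a manifests/ directory (for example, one that only contains Jsonnet sources) are skipped, which matches the idea that only generated YAML is handed to Flux.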

🌐 DNS

Ingress Controller

On the WAN side, I have forwarded ports 80 and 443 to the load balancer IP of the ingress controller running in my Kubernetes cluster.

Internal DNS

CoreDNS is deployed in the cluster and provides internal resolution of ingress addresses, as well as forwarding to NextDNS, which is used for ad blocking.

Dynamic DNS

My home IP can change at any time. To keep my WAN IP address up to date on Cloudflare, I have configured DDNS on the Unifi Dream Machine Pro.
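The UDM Pro performs this update itself. Purely to illustrate what a DDNS update against Cloudflare involves, here is a minimal Python sketch that builds a request for the Cloudflare v4 API endpoint `PUT /zones/{zone_id}/dns_records/{record_id}`. The zone ID, record ID, hostname, and API token are placeholders, and the record is assumed to be an unproxied A record.

```python
import json
import urllib.request

CF_API = "https://api.cloudflare.com/client/v4"


def build_dns_update(zone_id: str, record_id: str, name: str, ip: str,
                     token: str) -> urllib.request.Request:
    """Build a request that overwrites an existing A record with `ip`."""
    body = {
        "type": "A",
        "name": name,
        "content": ip,
        "ttl": 1,          # 1 means "automatic" in the Cloudflare API
        "proxied": False,  # assumption: the record is DNS-only
    }
    return urllib.request.Request(
        url=f"{CF_API}/zones/{zone_id}/dns_records/{record_id}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def update_record(req: urllib.request.Request) -> bool:
    """Send the request and report whether Cloudflare accepted it."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return bool(json.load(resp).get("success", False))
```

A DDNS client would call `update_record(build_dns_update(...))` only when the detected WAN IP differs from the current record, which keeps API calls to a minimum.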

💽 Network Attached Storage

A QNAP TS-431DeU NAS serves NFS shares and backs them up to Backblaze B2 using HBS (Hybrid Backup Sync).

🔧 Hardware

Device Count RAM Storage Connectivity Purpose
Unifi Dream Machine Pro 1 N/A N/A 8x GbE + 2x SFP+ Router
Unifi US-16-PoE switch 1 N/A N/A 16x GbE + 2x SFP Main switch
QNAP TS-431DeU 1 16GB 2x 240GB NVMe RAID1 + 4x 3TB RAID5 2x 2.5GbE LACP NAS
HP EliteDesk 800 G2 Mini 2 32GB 240GB M.2 SSD + 500GB SSD 1x GbE k3s node
Dell Latitude E5440 laptop 1 12GB 240GB SSD + 2x 120GB SSD 1x GbE k3s node
Custom-built server 1 64GB 240GB NVMe + 1TB SSD 2x GbE LACP + 1x GbE k3s node w/ GPU

✨ Features

Project status: Alpha

  • Common applications: Plex, Nextcloud, HomeAssistant, Ghost...
  • Automated Kubernetes installation and management
  • Monitoring and alerting
  • Modular architecture, easy to add or remove features/components
  • Automated certificate management
  • Installing and managing applications using GitOps
  • CI/CD platform
  • Distributed storage
  • Automatically update DNS records for exposed services 🚧
  • Automated bare metal provisioning with PXE boot 🚧
  • Support multiple environments (dev, stag, prod) 🚧
  • Automated in-cluster offsite backups 🚧
  • Single sign-on 🚧

🤝 Contributing

Any contributions you make, either big or small, are greatly appreciated.

🔐 Security

If you find a security issue, please contact me through one of the following channels:

  • twitter DM (@paulfantom)
  • kubernetes slack (@paulfantom)
  • freenode IRC (@paulfantom)
  • email ([email protected])

🏛️ License

Distributed under the MIT License. See LICENSE for more information.

ankhmorpork's People

Contributors

paulfantom, renovate-bot, renovate[bot]

ankhmorpork's Issues

PAU-45: Cloudflare and letsencrypt certs

Currently, certificates for alchemyof.it cannot be requested when Cloudflare is configured in strict TLS mode. Investigate adding a Cloudflare certificate issuer to cert-manager or reconfiguring the Cloudflare rules.

Alert: TargetDown in monitoring

Alert TargetDown firing in monitoring namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2022-12-22 13:39:48.970753442 +0000 UTC m=+327629.299195271.

Common Labels

alertname TargetDown
cluster ankhmorpork
job kubelet
namespace monitoring
prometheus monitoring/k8s
severity warning

Common Annotations

description 25% of the kubelet/ targets in monitoring namespace are down.
runbook_url https://runbooks.prometheus-operator.dev/runbooks/general/targetdown
summary One or more targets are unreachable.

Alerts

StartsAt Links
2022-12-22 13:24:18.612 +0000 UTC GeneratorURL

(DO NOT MODIFY: c3058ff715a6bb277bc9d4d65713d6bde6d43ea2e64facf2cee7f953c750f083 )
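The percentage in the description is the share of scrape targets in a group whose `up` metric is 0. As I understand the kube-prometheus `TargetDown` rule, it computes roughly `100 * count(up == 0) / count(up)` per job, namespace, and service and fires above 10%. The Python sketch below reproduces that arithmetic, with the grouping simplified to (namespace, job) and the 10% threshold treated as an assumption; for example, one down target out of four gives 25%.

```python
from collections import defaultdict


def target_down_percent(samples):
    """Percentage of down targets per (namespace, job).

    `samples` is an iterable of (namespace, job, up) tuples, where `up` is
    1 for a reachable target and 0 for an unreachable one.
    """
    total = defaultdict(int)
    down = defaultdict(int)
    for namespace, job, up in samples:
        key = (namespace, job)
        total[key] += 1
        if up == 0:
            down[key] += 1
    return {key: 100 * down[key] / total[key] for key in total}


def firing(percentages, threshold=10.0):
    """Groups whose down percentage exceeds the (assumed) alert threshold."""
    return sorted(key for key, pct in percentages.items() if pct > threshold)
```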

Services to add

Non-containerized:

Setup iSCSI provisioner

In addition to the NFS provisioner, it would be beneficial to set up an iSCSI provisioner: https://github.com/kubernetes-incubator/external-storage/tree/master/iscsi/targetd

The majority of applications that use hostPath volumes could be moved to iSCSI PVCs.

Action items:

  • Ansible setup for targetd server (on NAS)
  • Ansible setup for iSCSI initiators (everywhere)
  • Manifests for iSCSI provisioner
  • Two storage classes: one for vg_fast and one for vg_storage. The former could be used for testing. Only vg_fast may be accessible to Kubernetes; vg_storage is full and, for performance reasons, may be accessed only via hostPath.

[ALERT] alertname:Test instance:localhost:9090 job:test

(Updated at 2022-12-18 17:00:29.598843205 +0000 UTC m=+588.227325790)

Common Labels

alertname Test
instance localhost:9090
job test

Common Annotations

description some description
runbook_url https://runbooks.thaum.xyz

Alerts

thing hint StartsAt Links
value 2022-06-12 01:00:00 +0000 UTC GeneratorURL

(DO NOT MODIFY: 933b432e797b2d35572313028cb4a685383488c087c4aea733e81833860129d5 )

Automate copying unifi backups

The Unifi controller creates a backup every week. However, this backup is stored locally on the controller itself. It would be beneficial to create a cronjob that copies the backup files to an NFS-backed PV.
Backup files are stored in /data/autobackup/ on the Unifi controller.

The next step would be to add a backup job that sends data from the PV to Backblaze.
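A minimal sketch of the copy step in Python, assuming the controller's /data/autobackup/ directory has already been made available to the job (for example fetched over SSH by a preceding step, not shown), that backup files use the `.unf` extension, and that the NFS-backed PV is mounted at the destination path. Files already present with the same name and size are skipped, so the job can run on any schedule.

```python
import shutil
from pathlib import Path


def copy_new_backups(src_dir: str, dst_dir: str,
                     pattern: str = "*.unf") -> list[str]:
    """Copy backup files from src_dir to dst_dir, skipping ones already there.

    A file counts as already copied if a file with the same name and size
    exists in dst_dir. Returns the names of files copied in this run.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src.glob(pattern)):
        target = dst / f.name
        if target.exists() and target.stat().st_size == f.stat().st_size:
            continue
        shutil.copy2(f, target)  # copy2 also preserves the modification time
        copied.append(f.name)
    return copied
```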

Alert: KubeDaemonSetRolloutStuck in monitoring

Alert KubeDaemonSetRolloutStuck firing in monitoring namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2024-02-10 11:32:40.199340918 +0000 UTC m=+44472.207561420.

Common Labels

alertname KubeDaemonSetRolloutStuck
cluster ankhmorpork
container kube-rbac-proxy-main
daemonset node-exporter
instance 10.42.2.209:8443
job kube-state-metrics
namespace monitoring
prometheus monitoring/k8s
severity warning

Common Annotations

description DaemonSet monitoring/node-exporter has not finished or progressed for at least 15 minutes.
runbook_url https://runbooks.thaum.xyz/runbooks/kubernetes/kubedaemonsetrolloutstuck
summary DaemonSet rollout is stuck.

Alerts

StartsAt Links
2024-02-10 06:12:09.821 +0000 UTC GeneratorURL

Automate provisioning of tools installed manually

Currently, many services on the NAS were deployed manually. This needs to change, and their configuration should be managed via git.

Discovered services:

  • ddclient
  • iSCSI targetd (not installed yet; tracked in #2)
  • NFS exports

Configuration done on install (may be possible to automate via kickstart):

  • teamd config for LACP teamed network devices done during system installation
  • mdadm config done during installation
  • LVM config done during installation
  • Static DNS configuration (main DNS server for other clients is running in cluster) done during installation

Alert: KubeDaemonSetMisScheduled in monitoring

Alert KubeDaemonSetMisScheduled firing in monitoring namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2022-12-22 13:39:43.58186847 +0000 UTC m=+327623.910310279.

Common Labels

alertname KubeDaemonSetMisScheduled
cluster ankhmorpork
container kube-rbac-proxy-main
instance 10.42.6.45:8443
job kube-state-metrics
namespace monitoring
prometheus monitoring/k8s
severity warning

Common Annotations

runbook_url https://runbooks.thaum.xyz/runbooks/kubernetes/kubedaemonsetmisscheduled
summary DaemonSet pods are misscheduled.

Alerts

daemonset description StartsAt Links
kured 2022-12-22 13:29:43.046 +0000 UTC GeneratorURL
speaker 2022-12-22 13:29:43.046 +0000 UTC GeneratorURL

(DO NOT MODIFY: 7d04718e7ad227b0d6c6089159e34b440211567712f5bb11c4fc412328114d13 )

Alert: KubePodCrashLooping in monitoring

Alert KubePodCrashLooping firing in monitoring namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2024-04-02 20:41:58.294093185 +0000 UTC m=+20.802279370.

Common Labels

alertname KubePodCrashLooping
cluster ankhmorpork
container github-receiver
instance 10.42.6.102:8443
job kube-state-metrics
namespace monitoring
pod github-receiver-668799f6b4-nbspf
prometheus monitoring/k8s
reason CrashLoopBackOff
severity warning
uid 53a4337e-3b09-4ff1-85c3-0e11fcd5d41e

Common Annotations

description Pod monitoring/github-receiver-668799f6b4-nbspf (github-receiver) is in waiting state (reason: "CrashLoopBackOff").
runbook_url https://runbooks.thaum.xyz/runbooks/kubernetes/kubepodcrashlooping
summary Pod is crash looping.

Alerts

StartsAt Links
2024-04-02 20:36:09.821 +0000 UTC GeneratorURL

Alert: PostgreSQLHighConnections in paperless

Alert PostgreSQLHighConnections firing in paperless namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2023-04-25 20:19:26.626114387 +0000 UTC m=+82818.323127310.

Common Labels

alertname PostgreSQLHighConnections
cluster ankhmorpork
instance 10.42.1.244:9187
namespace paperless
prometheus monitoring/k8s
severity warning

Common Annotations

description 10.42.1.244:9187 is exceeding 80% of the currently configured maximum Postgres connection limit (current value: 22s). Please check utilization graphs and confirm if this is normal service growth, abuse or an otherwise temporary condition or if new resources need to be provisioned (or the limits increased, which is mostly likely).
runbook_url https://runbooks.thaum.xyz/runbooks/postgresql/postgresqlhighconnections
summary 10.42.1.244:9187 is over 80% of max Postgres connections.

Alerts

StartsAt Links
2023-04-25 20:13:55.947 +0000 UTC GeneratorURL

(DO NOT MODIFY: d5f17e654e53aba5a290a7b5bd6083f74ceac73c01297e6d6edaf7b22b53a627 )

Alert: TestAlert in monitoring

Alert TestAlert firing in monitoring namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2022-12-18 18:39:46.271423238 +0000 UTC m=+26.599865047.

Common Labels

alertname Test1
instance localhost:9090
job test

Common Annotations

description some description
runbook_url https://runbooks.thaum.xyz

Alerts

thing hint StartsAt Links
value 2022-06-12 01:00:00 +0000 UTC GeneratorURL

(DO NOT MODIFY: 346585aa11eea0e4b4d76f6e02fbcb6fd3e072e0044f467030c7eaa1440195b6 )

nextcloud: migrate to postgres

Main reasons for migrating to PostgreSQL:

  • it is faster (source)
  • MySQL doesn't have good monitoring coverage in the Prometheus ecosystem (there is a lack of meaningful alerts). PostgreSQL, on the other hand, is used by GitLab, and there are already established alerts and runbooks for it (alerts, recording rules)
  • possibly no hacks are required to run postgresql_exporter

Cons:

Alert: TargetDown in monitoring

Alert TargetDown firing in monitoring namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2024-04-14 20:39:22.674050872 +0000 UTC m=+214945.203274758.

Common Labels

alertname TargetDown
cluster ankhmorpork
job probe/monitoring/uptimerobot
namespace monitoring
prometheus monitoring/k8s
severity warning

Common Annotations

description 100% of the probe/monitoring/uptimerobot/ targets in monitoring namespace are down.
runbook_url https://runbooks.prometheus-operator.dev/runbooks/general/targetdown
summary One or more targets are unreachable.

Alerts

StartsAt Links
2024-04-14 20:33:22.417 +0000 UTC GeneratorURL

Alert: PostgreSQLCacheHitRatio in homeassistant

Alert PostgreSQLCacheHitRatio firing in homeassistant namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2023-01-17 13:36:14.31084953 +0000 UTC m=+370286.411825553.

Common Labels

alertname PostgreSQLCacheHitRatio
cluster ankhmorpork
datname homeassistant
namespace homeassistant
prometheus monitoring/k8s
severity warning

Common Annotations

description PostgreSQL low on cache hit rate on for database homeassistant with a value of 0.5173276395749896
runbook_url https://runbooks.thaum.xyz/runbooks/postgresql/postgresqlcachehitratio
summary PostgreSQL low cache hit rate on for database homeassistant

Alerts

StartsAt Links
2023-01-17 13:30:43.508 +0000 UTC GeneratorURL

(DO NOT MODIFY: b0cb3dbb66ae7f9be3e3a19466b1c79090cff7825d7e5292013a2f97e4dc16e4 )
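The ratio in this alert is the fraction of block reads served from PostgreSQL's shared buffers, computed from the `blks_hit` and `blks_read` columns of `pg_stat_database` (in Prometheus, usually their rates over a window). The Python sketch below shows the arithmetic; the 0.98 threshold is an assumption about the alerting rule, not taken from this repository.

```python
def cache_hit_ratio(blks_hit: float, blks_read: float) -> float:
    """Fraction of block reads served from shared buffers.

    blks_hit and blks_read correspond to the pg_stat_database columns of the
    same names (or their per-second rates over some window).
    """
    total = blks_hit + blks_read
    if total == 0:
        return 1.0  # no reads at all, so nothing missed the cache
    return blks_hit / total


def low_cache_hit(ratio: float, threshold: float = 0.98) -> bool:
    """True when the ratio is below the (assumed) alert threshold."""
    return ratio < threshold
```

With that threshold, a value such as 0.517 reported here is far below target and would point at shared_buffers being too small for the working set, or at queries scanning large tables.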

Alert: TargetDown in monitoring

Alert TargetDown firing in monitoring namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2023-03-07 15:36:04.518855456 +0000 UTC m=+5316.354817277.

Common Labels

alertname TargetDown
cluster ankhmorpork
job node-exporter
namespace monitoring
prometheus monitoring/k8s
severity warning

Common Annotations

description 20% of the node-exporter/ targets in monitoring namespace are down.
runbook_url https://runbooks.prometheus-operator.dev/runbooks/general/targetdown
summary One or more targets are unreachable.

Alerts

StartsAt Links
2023-03-07 15:34:48.612 +0000 UTC GeneratorURL

(DO NOT MODIFY: 450f142a6083f4ed63d0d1d7de01ebeb9d33d15312095933d5f927bb4e076c3f )

Run ansible in k3s pod

Due to resource constraints on master01, the Ansible deployment cannot finish and causes the k3s apiserver to crash. To mitigate this, it would be better to run Ansible as a cronjob in k3s.

DoD:

  • find/create a container image with:
    • ansible
    • ssh client
    • git
  • the cronjob should run the ./deploy.sh script
  • repository should be mounted as PV on NFS storage class
  • pushgateway shouldn't be exposed to local network anymore
  • ansible_connection=local cannot be set for any host
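One possible shape for the CronJob, sketched in Python as a function that emits a `batch/v1` manifest (which `kubectl apply -f` accepts as JSON). The image reference, schedule, PVC name, and mount path are hypothetical placeholders; `concurrencyPolicy: Forbid` is a design choice so that a slow run never overlaps the next one.

```python
import json


def ansible_cronjob(image: str, schedule: str, pvc_name: str) -> dict:
    """Build a CronJob that runs ./deploy.sh from a repo mounted via a PVC."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": "ansible-deploy"},  # hypothetical name
        "spec": {
            "schedule": schedule,
            "concurrencyPolicy": "Forbid",  # never run two deployments at once
            "jobTemplate": {"spec": {"template": {"spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "ansible",
                    "image": image,  # assumed to ship ansible, ssh and git
                    "workingDir": "/repo",
                    "command": ["./deploy.sh"],
                    "volumeMounts": [{"name": "repo", "mountPath": "/repo"}],
                }],
                "volumes": [{
                    "name": "repo",
                    # PVC assumed to be backed by the NFS storage class
                    "persistentVolumeClaim": {"claimName": pvc_name},
                }],
            }}}},
        },
    }


if __name__ == "__main__":
    manifest = ansible_cronjob("example.org/ansible:latest", "0 3 * * *", "repo")
    print(json.dumps(manifest, indent=2))
```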

Bring back SiteExternallyDown alert

  • Reduce severity level of SiteDown alert to warning
  • Recreate SiteExternallyDown alert that uses only uptimerobot data
  • Introduce inhibition rule - if SiteExternallyDown is firing, don't send SiteDown
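The intended inhibition behaviour can be sketched in Python, mirroring Alertmanager's inhibit-rule semantics: a target alert is muted while a source alert is firing and both have equal values for every label listed in `equal`. Matching on the `instance` label is an assumption; the real rule would use whatever label identifies the site.

```python
def inhibit(alerts, source="SiteExternallyDown", target="SiteDown",
            equal=("instance",)):
    """Return the firing alerts that would still be notified.

    `alerts` is a list of label dicts for currently firing alerts. A
    `target` alert is muted when a `source` alert shares the same value for
    every label in `equal` (a label missing from both counts as equal).
    """
    def key(alert):
        return tuple(alert.get(label, "") for label in equal)

    firing_sources = {key(a) for a in alerts if a.get("alertname") == source}
    return [
        a for a in alerts
        if not (a.get("alertname") == target and key(a) in firing_sources)
    ]
```

In Alertmanager itself this would be a single `inhibit_rules` entry using `source_matchers`, `target_matchers`, and `equal`.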

Alert: TargetDown in monitoring

Alert TargetDown firing in monitoring namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2023-03-07 14:39:19.103654196 +0000 UTC m=+1910.939616008.

Common Labels

alertname TargetDown
cluster ankhmorpork
job monitoring/smokeping
namespace monitoring
prometheus monitoring/k8s
severity warning

Common Annotations

description 33.33% of the monitoring/smokeping/ targets in monitoring namespace are down.
runbook_url https://runbooks.prometheus-operator.dev/runbooks/general/targetdown
summary One or more targets are unreachable.

Alerts

StartsAt Links
2023-03-07 14:23:48.612 +0000 UTC GeneratorURL

(DO NOT MODIFY: 0f273fe6719899752f139b038340c40a5684ab5eda17b89da23492695f528c41 )

Change backup strategy

Investigate Kubernetes-native backup solutions to replace the current custom backup solution with a generic one.

Requirements:

  • uses restic internally
  • allow sending backups to Backblaze
  • backups need to be encrypted
  • allow backing up any PV

Nice to have:

  • allow backing up /var/lib/rancher/k3s/server on master node to allow master node recovery

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

ansible-galaxy
metal/roles/requirements.yml
  • devsec.hardening 9.0.0
  • prometheus.prometheus 0.6.0
  • oefenweb.locales v1.0.52
  • hifis.unattended_upgrades v3.2.1
github-actions
.github/workflows/kubeconform.yml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-go v5
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-go v5
.github/workflows/kubescape.yml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
.github/workflows/prometheusrule.yml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-go v5
  • prymitive/pint-action v1
.github/workflows/versions.yaml
  • actions/checkout v4@b4ffde65f46336ab88eb53be808477a3936bae11
  • actions/setup-go v5
  • juliangruber/read-file-action v1
  • peter-evans/create-pull-request v6
helm-values
apps/external-dns/values.yaml
  • ghcr.io/muhlba91/external-dns-provider-adguard v5.0.0
jsonnet-bundler
apps/datalake-metrics/jsonnet/jsonnetfile.json
apps/monitoring/jsonnet/jsonnetfile.json
apps/parca/jsonnet/jsonnetfile.json
apps/system-update/jsonnet/jsonnetfile.json
base/flux-system/jsonnet/jsonnetfile.json
lib/jsonnet/apps/jsonnetfile.json
regex
metal/group_vars/k3s.yml
  • k3s-io/k3s v1.28.6+k3s1
.github/workflows/versions.yaml
  • google/jsonnet v0.20.0
.github/workflows/kubeconform.yml
  • golang 1.21.5
.github/workflows/prometheusrule.yml
  • golang 1.21.5
.github/workflows/versions.yaml
  • golang 1.21.5

  • Check this box to trigger a request for Renovate to run again on this repository

Alert: PostgreSQLCacheHitRatio in <no value>

Alert PostgreSQLCacheHitRatio firing in namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2022-12-19 02:13:13.971634309 +0000 UTC m=+27234.300076138.

Common Labels

alertname PostgreSQLCacheHitRatio
cluster ankhmorpork
datname homeassistant
prometheus monitoring/k8s
severity warning

Common Annotations

description PostgreSQL low on cache hit rate on for database homeassistant with a value of 0.9593283923506267
runbook_url https://runbooks.thaum.xyz/runbooks/postgresql/postgresqlcachehitratio
summary PostgreSQL low cache hit rate on for database homeassistant

Alerts

StartsAt Links
2022-12-19 02:07:43.508 +0000 UTC GeneratorURL

(DO NOT MODIFY: 5ef6b1689008f5f35e66c52c08ad18997835c6f7116a07f925d42ce9d44bf43c )

Re-enable redis in nextcloud

Redis cache was disabled after a recent outage and disaster recovery. It needs to be re-enabled and hardened by adding password protection (the REDIS_PASSWORD variable in Nextcloud).

Unfortunately, due to how the Nextcloud Docker container is constructed, this is manual work on the Nextcloud side: the config.php file has to be edited and the correct variables set for the Nextcloud container.

Prometheus Probe CRD doesn't probe targets

I am using the Prometheus Probe CRD and the Blackbox exporter to probe static targets. However, when I check the Blackbox exporter, I don't see the specified targets being probed at all.

I was able to probe targets using the Blackbox exporter and additionalScrapeConfigs in the Prometheus values file, but it doesn't work with the Probe CRD.

Here is my Probe custom resource:

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: probe-crd
  namespace: prometheus
spec:
  jobName: probe-crd
  prober:
    url: prometheus-blackbox-exporter:9115
  targets:
    staticConfig:
      static:
      - https://www.google.com

Blackbox exporter service is running on port 9115. Can someone please let me know what I am missing here?

Alert: ThanosQueryGrpcClientErrorRate in datalake-metrics

Alert ThanosQueryGrpcClientErrorRate firing in datalake-metrics namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2024-03-22 02:21:45.539869555 +0000 UTC m=+24.538879833.

Common Labels

alertname ThanosQueryGrpcClientErrorRate
cluster ankhmorpork
job thanos-query
namespace datalake-metrics
prometheus monitoring/k8s
severity warning

Common Annotations

description Thanos Query thanos-query is failing to send 5.817% of requests.
runbook_url https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosquerygrpcclienterrorrate
summary Thanos Query is failing to send requests.

Alerts

StartsAt Links
2024-03-22 02:15:36.048 +0000 UTC GeneratorURL

Alert: PostgreSQLMaxConnectionsReached in paperless

Alert PostgreSQLMaxConnectionsReached firing in paperless namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2023-04-28 15:13:57.585697289 +0000 UTC m=+22567.696417215.

Common Labels

alertname PostgreSQLMaxConnectionsReached
cluster ankhmorpork
instance 10.42.0.74:9187
namespace paperless
prometheus monitoring/k8s
severity warning

Common Annotations

description 10.42.0.74:9187 is exceeding the currently configured maximum Postgres connection limit (current value: 22s). Services may be degraded - please take immediate action (you probably need to increase max_connections in the Docker image and re-deploy.
runbook_url https://runbooks.thaum.xyz/runbooks/postgresql/postgresqlmaxconnectionsreached
summary 10.42.0.74:9187 has maxed out Postgres connections.

Alerts

StartsAt Links
2023-04-28 15:03:25.947 +0000 UTC GeneratorURL

(DO NOT MODIFY: 9af083c4c04aefaa4b687f93fe36767e0219bc14a37a0deb4214a22c8f6ab837 )

Alert: ThanosStoreObjstoreOperationLatencyHigh in datalake-metrics

Alert ThanosStoreObjstoreOperationLatencyHigh firing in datalake-metrics namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2023-12-04 05:27:59.510352973 +0000 UTC m=+495.481901453.

Common Labels

alertname ThanosStoreObjstoreOperationLatencyHigh
cluster ankhmorpork
job thanos-store
namespace datalake-metrics
prometheus monitoring/k8s
severity warning

Common Annotations

description Thanos Store thanos-store Bucket has a 99th percentile latency of 2.742787470798822 seconds for the bucket operations.
runbook_url https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosstoreobjstoreoperationlatencyhigh
summary Thanos Store is having high latency for bucket operations.

Alerts

StartsAt Links
2023-12-04 05:22:29.197 +0000 UTC GeneratorURL

Alert: ReconciliationFailure in flux-system

Alert ReconciliationFailure firing in flux-system namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2024-04-15 17:13:33.601854197 +0000 UTC m=+288996.131078121.

Common Labels

alertname ReconciliationFailure
cluster ankhmorpork
kind Kustomization
name shlink
namespace flux-system
prometheus monitoring/k8s
severity warning

Common Annotations

description Kustomization flux-system/shlink reconciliation has been failing for more than 10 minutes.
summary Flux objects reconciliation failure

Alerts

StartsAt Links
2024-04-11 16:14:32.63 +0000 UTC GeneratorURL

re-enable adblocker

A recent code update broke DNS, and the adblocker plugin cannot contact external sources to pull blocklists.

Example errors:

[WARNING] plugin/ads: Loading list from url "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts" failed with error: Get "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts": net/http: TLS handshake timeout
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 www.google.de. A: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[WARNING] plugin/ads: Loading list from url "https://mirror1.malwaredomains.com/files/justdomains" failed with error: Get "https://mirror1.malwaredomains.com/files/justdomains": dial tcp 139.146.167.17:443: i/o timeout
[ERROR] plugin/errors: 2 imap.gmail.com. AAAA: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out
[ERROR] plugin/errors: 2 . NS: tls: DialWithDialer timed out

Alert: Test in monitoring

Alert Test firing in monitoring namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2022-12-18 18:18:19.43869783 +0000 UTC m=+24.267089034.

Common Labels

alertname Test1
instance localhost:9090
job test

Common Annotations

description some description
runbook_url https://runbooks.thaum.xyz

Alerts

thing hint StartsAt Links
value 2022-06-12 01:00:00 +0000 UTC GeneratorURL

(DO NOT MODIFY: 21a57638781bc59a1dbd9cc17208d8dd0867816312710c3eaf12f759182010ee )

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Location: renovate.json
Error type: Invalid JSON (parsing failed)
Message: Syntax error: expecting String near ntom"], }
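The parser message suggests a trailing comma before a closing `}`, which strict JSON does not allow. A small Python check like the sketch below, run in CI before Renovate sees the file, would catch this class of error; the script and its default path are illustrative, not part of the repository.

```python
import json
import sys


def check_renovate_config(text: str):
    """Return None if `text` is valid JSON, else a human-readable error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "renovate.json"
    with open(path, encoding="utf-8") as fh:
        problem = check_renovate_config(fh.read())
    if problem:
        print(f"{path}: {problem}")
        sys.exit(1)
```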

Switch from pi-hole to CoreDNS

Since CoreDNS is already used in the k3s cluster, it might be a good idea to replace pi-hole with CoreDNS.

Pros:

  • one server type to learn
  • native prometheus metrics exposition (instead of faulty eko/pihole-exporter) with proper metrics :)
  • DNS-over-HTTPS support
  • full GitOps management
  • stateless and easily scalable

Cons:

  • No webUI to quickly whitelist sites
  • ads plugin is not included natively in CoreDNS

Alert: KubeDeploymentReplicasMismatch in monitoring

Alert KubeDeploymentReplicasMismatch firing in monitoring namespace

This is an automated issue created by the monitoring system. Please do not edit this message.

Alertmanager URL: https://alertmanager.ankhmorpork.thaum.xyz

Issue was last updated at 2023-12-06 18:30:40.223380231 +0000 UTC m=+1278.197167614.

Common Labels

alertname KubeDeploymentReplicasMismatch
cluster ankhmorpork
container kube-rbac-proxy-main
deployment grafana
instance 10.42.6.139:8443
job kube-state-metrics
namespace monitoring
prometheus monitoring/k8s
severity warning

Common Annotations

description Deployment monitoring/grafana has not matched the expected number of replicas for longer than 15 minutes.
runbook_url https://runbooks.thaum.xyz/runbooks/kubernetes/kubedeploymentreplicasmismatch
summary Deployment has not matched the expected number of replicas.

Alerts

StartsAt Links
2023-12-06 18:25:09.821 +0000 UTC GeneratorURL

Fix issues with mysqld-exporter permissions

time="2020-04-16T09:33:53Z" level=error msg="Error scraping for collect.perf_schema.eventsstatements: Error 1142: SELECT command denied to user 'cloud'@'127.0.0.1' for table 'events_statements_summary_by_digest'" source="exporter.go:171"
time="2020-04-16T09:33:53Z" level=error msg="Error scraping for collect.perf_schema.indexiowaits: Error 1142: SELECT command denied to user 'cloud'@'127.0.0.1' for table 'table_io_waits_summary_by_index_usage'" source="exporter.go:171"
time="2020-04-16T09:33:53Z" level=error msg="Error scraping for collect.perf_schema.tableiowaits: Error 1142: SELECT command denied to user 'cloud'@'127.0.0.1' for table 'table_io_waits_summary_by_table'" source="exporter.go:171"
time="2020-04-16T09:33:53Z" level=error msg="Error scraping for collect.info_schema.innodb_metrics: Error 1227: Access denied; you need (at least one of) the PROCESS privilege(s) for this operation" source="exporter.go:171"
time="2020-04-16T09:33:53Z" level=error msg="Error scraping for collect.info_schema.innodb_cmp: Error 1227: Access denied; you need (at least one of) the PROCESS privilege(s) for this operation" source="exporter.go:171"
time="2020-04-16T09:33:53Z" level=error msg="Error scraping for collect.info_schema.innodb_cmpmem: Error 1227: Access denied; you need (at least one of) the PROCESS privilege(s) for this operation" source="exporter.go:171"
time="2020-04-16T09:33:53Z" level=error msg="Error scraping for collect.slave_status: Error 1227: Access denied; you need (at least one of) the SUPER, REPLICATION CLIENT privilege(s) for this operation" source="exporter.go:171"
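Each error names a privilege that the exporter's MySQL user is missing. The Python sketch below collects them from log lines like the ones above into a single `GRANT` statement, choosing `REPLICATION CLIENT` over `SUPER` where either would do. For these logs it yields `GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'cloud'@'127.0.0.1';`, which, as far as I know, matches the grant recommended in the mysqld_exporter README. The fallback user is a hypothetical placeholder.

```python
import re

SELECT_DENIED = re.compile(r"SELECT command denied to user '([^']+)'@'([^']+)'")
NEED_PRIVS = re.compile(r"you need \(at least one of\) the ([A-Z ,]+?) privilege")


def required_grant(log_lines, default_user=("exporter", "localhost")):
    """Build one GRANT statement covering the privileges missing in the log."""
    user, host = default_user
    privileges = set()
    for line in log_lines:
        denied = SELECT_DENIED.search(line)
        if denied:
            user, host = denied.groups()
            privileges.add("SELECT")
        needed = NEED_PRIVS.search(line)
        if needed:
            # "(at least one of)": any listed privilege suffices, so prefer
            # something narrower than SUPER when an alternative is offered.
            options = [p.strip() for p in needed.group(1).split(",")]
            narrower = [p for p in options if p != "SUPER"]
            privileges.add((narrower or options)[0])
    if not privileges:
        return None
    return f"GRANT {', '.join(sorted(privileges))} ON *.* TO '{user}'@'{host}';"
```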
