
kubernetes-nagios's Introduction

kubernetes-nagios

Some checks for Kubernetes clusters, which can be used with Nagios, Zabbix, Icinga, or any other monitoring system that can be configured to run an external shell script. The scripts have been tested on macOS, Ubuntu and Debian with bash. There are relatively few dependencies, but the jq utility for processing JSON is required. If your Kubernetes API is not exposed, the checks can use kubectl, in which case kubectl will need to be installed and configured.

check_kube_pods.sh

Usage

./check_kube_pods.sh [-t <TARGETSERVER> -c <CREDENTIALSFILE>] [-k <KUBE_CONFIG>] [-n <NAMESPACE>] [-w <WARN_THRESHOLD>] [-C <CRIT_THRESHOLD>]

Options

-t <TARGETSERVER> --  Optional, the endpoint for your Kubernetes API (otherwise will use kubectl)
-c <CREDENTIALSFILE> --  Required if a <TARGETSERVER> API is specified, in the format outlined below
-n <NAMESPACE> --  Namespace to check, for example, "kube-system". By default all are checked.
-w <WARN_THRESHOLD> --  Warning threshold for number of container restarts [default: 5]
-C <CRIT_THRESHOLD> --  Critical threshold for number of container restarts [default: 50]
-k <KUBE_CONFIG> --  Path to kube config file if using kubectl
-h --  Show usage / help
-v --  Show verbose output

Example Output

$ ./check_kube_pods.sh -n kube-system
OK - Kubernetes pods are all OK
OK: Pod: nginx-ingress-controller-v1-zg7gw   Container: nginx-ingress-lb    Ready: true   Restarts: 1
OK: Pod: nginx-ingress-controller-v1-txc1w   Container: nginx-ingress-lb    Ready: true   Restarts: 1
OK: Pod: nginx-ingress-controller-v1-dffl3   Container: nginx-ingress-lb    Ready: true   Restarts: 1
$ ./check_kube_pods.sh -n kube-system -w 0 -C 30
WARNING - One or more pods show warning status!
Warning: Pod: nginx-ingress-controller-v1-zg7gw   Container: nginx-ingress-lb    Ready: true   Restarts: 1
Warning: Pod: nginx-ingress-controller-v1-txc1w   Container: nginx-ingress-lb    Ready: true   Restarts: 1
Warning: Pod: nginx-ingress-controller-v1-dffl3   Container: nginx-ingress-lb    Ready: true   Restarts: 1
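
To hook the check into Nagios itself, a command and service definition along these lines can be used. This is only a sketch: the plugin path, kubeconfig path, host name and namespace are placeholders to adapt to your setup.

define command {
    command_name    check_kube_pods
    command_line    /usr/local/nagios/libexec/check_kube_pods.sh -k /etc/nagios/kube_config -n kube-system -w 5 -C 50
}

define service {
    use                     generic-service
    host_name               kube-cluster
    service_description     Kubernetes pods
    check_command           check_kube_pods
}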

check_kube_deployments.sh

Usage

./check_kube_deployments.sh [-t <TARGETSERVER> -c <CREDENTIALSFILE>] [-k <KUBE_CONFIG>]
$ ./check_kube_deployments.sh -t https://api.mykube-cluster.co.uk -c ~/my-credentials
OK - Kubernetes deployments are all OK
OK: kubernetes-dashboard-v1.4.0 has condition Available: True - Deployment has minimum availability.
OK: kubernetes-dashboard has condition Available: True - Deployment has minimum availability.
OK: kube-dns-autoscaler has condition Available: True - Deployment has minimum availability.
OK: kube-dns has condition Available: True - Deployment has minimum availability.
OK: heapster has condition Available: True - Deployment has minimum availability.
OK: dns-controller has condition Available: True - Deployment has minimum availability.

check_kube_nodes.sh

This uses the Kubernetes API to check condition statuses across your nodes.

Usage

./check_kube_nodes.sh [-t <TARGETSERVER> -c <CREDENTIALSFILE>] [-k <KUBE_CONFIG>]
$ ./check_kube_nodes.sh -t https://api.mykube-cluster.co.uk -c ~/my-credentials
WARNING - One or more nodes show warning status!
Warning: ip-10-123-81-96.eu-west-1.compute.internal has condition OutOfDisk - True
Warning: ip-10-123-82-87.eu-west-1.compute.internal has condition OutOfDisk - True

check_kubernetes_api.sh

This check returns the health status of the overall cluster.

Usage

./check_kubernetes_api.sh [-t <TARGETSERVER> -c <CREDENTIALSFILE>] [-k <KUBE_CONFIG>]

Dependencies

These scripts call the Kubernetes API, so the API must be reachable from the machine running the script. If it is not, the scripts will try to use the kubectl utility, which must be installed and configured.

The jq utility for parsing JSON is required.

Credentials file format (when connecting to API)

Credentials for the Kubernetes cluster API must be supplied in a file in the following format. This is required for all of the checks in this project to work correctly when connecting directly to the API.

$ cat my-credentials-file
machine yourEndPointOrTarget login yourUserNameHere password YOURPASSWORDHERE
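
The scripts pass this file to curl as a netrc file. As a quick manual check that the credentials work, something like the following should return JSON (the endpoint is the example placeholder used above):

$ curl -sS --netrc-file my-credentials-file https://api.mykube-cluster.co.uk/api/v1/nodes | jq '.items[].metadata.name'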

kubernetes-nagios's People

Contributors

colebrooke, creyd, ebourg, glemignot, ondrejholecek, stefanlasiewski

kubernetes-nagios's Issues

.status.conditions[] order behavior changed in version 1.20?

Hi,

I've been using this script for a while and it has been working great. But I've had some issues with the deployment script as it seems like .status.conditions[-1] has changed to .status.conditions[0] on all of my deployments except for coredns.

Example deployment 1:

    "availableReplicas": 1,
    "conditions": [
      {
        "lastTransitionTime": "2021-11-01T13:00:41Z",
        "lastUpdateTime": "2021-11-01T13:00:41Z",
        "message": "Deployment has minimum availability.",
        "reason": "MinimumReplicasAvailable",
        "status": "True",
        "type": "Available"
      },
      {
        "lastTransitionTime": "2021-11-01T13:51:56Z",
        "lastUpdateTime": "2021-11-29T15:04:01Z",
        "message": "ReplicaSet \"snapshot-controller-786647474f\" has successfully progressed.",
        "reason": "NewReplicaSetAvailable",
        "status": "True",
        "type": "Progressing"
      }
    ],

Example deployment 2:

    "availableReplicas": 2,
    "conditions": [
      {
        "lastTransitionTime": "2021-11-01T13:51:25Z",
        "lastUpdateTime": "2021-11-01T13:51:32Z",
        "message": "ReplicaSet \"coredns-8474476ff8\" has successfully progressed.",
        "reason": "NewReplicaSetAvailable",
        "status": "True",
        "type": "Progressing"
      },
      {
        "lastTransitionTime": "2021-11-01T13:51:32Z",
        "lastUpdateTime": "2021-11-01T13:51:32Z",
        "message": "Deployment has minimum availability.",
        "reason": "MinimumReplicasAvailable",
        "status": "True",
        "type": "Available"
      }
    ],

This causes my check to get this result:

$ ./check_kube_deployments.sh -n kube-system
Unknown: snapshot-controller has condition Progressing: True - ReplicaSet "snapshot-controller-786647474f" has successfully progressed.
Unknown: dns-autoscaler has condition Progressing: True - ReplicaSet "dns-autoscaler-7df78bfcfb" has successfully progressed.
Unknown: csi-cinder-controllerplugin has condition Progressing: True - ReplicaSet "csi-cinder-controllerplugin-74d8dd876d" has successfully progressed.
OK: coredns has condition Available: True - Deployment has minimum availability.

Anyone else had this issue before?
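
A likely fix is to select the condition by type rather than by array position. A minimal jq sketch (not the project's current code; the deployment name is taken from the example above):

$ kubectl get deployment snapshot-controller -n kube-system -o json | jq -r '.status.conditions[] | select(.type == "Available") | "\(.status) - \(.message)"'
True - Deployment has minimum availability.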

Cater for bearer token based authentication

/kind feature

The curl methods are clean; however, only the netrc authentication method is possible. We have a need to authenticate using native Kubernetes service account tokens.

Feature request: Add an optional input parameter for token based authentication

Usage ./check_kube_nodes.sh [-t <TARGETSERVER> -c <CREDENTIALSFILE> -b <BEARERTOKEN>] [-k <KUBE_CONFIG>]
K8STATUS="$(curl -sS $SSL --header "Authorization: Bearer $BEARERTOKEN" $TARGET/api/v1/nodes)"
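
A minimal sketch of how this could be wired in, assuming the existing TARGET, SSL and CREDENTIALS_FILE variables (the BEARERTOKEN name is the one proposed above):

# pick the authentication arguments depending on whether a bearer token was supplied
if [ -n "$BEARERTOKEN" ]; then
    AUTH=(--header "Authorization: Bearer $BEARERTOKEN")
else
    AUTH=(--netrc-file "$CREDENTIALS_FILE")
fi
K8STATUS="$(curl -sS $SSL "${AUTH[@]}" $TARGET/api/v1/nodes)"

Using a bash array keeps the header value as a single argument even though it contains a space.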

ComponentStatus is deprecated in v1.19+

Hi,

check_kubernetes.sh -m components fails with Kubernetes 1.19 with the following error:

parse error: Invalid numeric literal at line 1, column 8

It seems to be caused by a warning message displayed before the JSON payload after calling api/v1/componentstatuses:

Warning: v1 ComponentStatus is deprecated in v1.19+{"apiVersion": "v1","items":...
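
A possible workaround, assuming the warning text is prepended to the JSON body on the same stream, is to strip everything before the first opening brace before the output reaches jq. A sketch, with variable names modelled on the other curl calls in the scripts rather than taken from the actual code:

# drop any deprecation warning that precedes the JSON body
COMPONENTS_STATUS="$(curl -sS $SSL --netrc-file $CREDENTIALS_FILE $TARGET/api/v1/componentstatuses | sed 's/^[^{]*{/{/')"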

check_kube_deployments output is messed up when there are some deployments with issues

Due to #23, checking all namespaces got 2 Unknown entries, or at least it should have.
The actual output is somewhat messed up:

$ ./check_kube_deployments.sh
Unknown: deployment#1 has condition Progressing: True - ReplicaSet "deployment#1-77db4cb58c" has successfully progressed.
OK: deployment#2 has condition Available: True - Deployment has minimum availability.
OK: deployment#3 has condition Available: True - Deployment has minimum availability.
OK: deployment#4 has condition Available: True - Deployment has minimum availability.
OK: deployment#5 has condition Available: True - Deployment has minimum availability.
Unknown: deployment#6 has condition Progressing: True - ReplicaSet "deployment#6-78994478dc" has successfully progressed.
OK:deployment#7 has condition Available: True - Deployment has minimum availability.
OK: deployment#8 has condition Available: True - Deployment has minimum availability.
OK: deployment#9 has condition Available: True - Deployment has minimum availability.
OK: deployment#10 has condition Available: True - Deployment has minimum availability.
OK: deployment#11 has condition Available: True - Deployment has minimum availability.
OK: deployment#12 has condition Available: True - Deployment has minimum availability.
OK: deployment#13 has condition Available: True - Deployment has minimum availability.
OK: deployment#14 has condition Available: True - Deployment has minimum availability.
OK: deployment#19 has condition Available: True - Deployment has minimum availability.
Unknown: deployment#15 has condition Available
Available: True
True - Deployment has minimum availability.
Deployment has minimum availability.
OK: deployment#16 has condition Available: True - Deployment has minimum availability.
OK: deployment#17 has condition Available: True - Deployment has minimum availability.
Unknown: deployment#18 has condition Available
Available: True
True - Deployment has minimum availability.
Deployment has minimum availability.
Unknown: deployment#15 has condition Available
Available: True
True - Deployment has minimum availability.
Deployment has minimum availability.
Unknown: deployment#18 has condition Available
Available: True
True - Deployment has minimum availability.
Deployment has minimum availability.

Note the duplicated deployment#15; also, there should be only 2 Unknown entries (I checked namespace by namespace and found two affected by the issue from #23).

Parameterize API versions

/kind bug

The Deployment resource API post Kubernetes 1.16 is /apis/apps/v1/; /apis/extensions/v1beta1/ is deprecated.

DEPLOYMENTS_STATUS=$(curl -sS $SSL --netrc-file $CREDENTIALS_FILE $TARGET/apis/extensions/v1beta1/namespaces/$NAMESPACE/deployments/)

Parameterize the API versions as variables within the shell scripts to accommodate the change:

from:
/apis/extensions/v1beta1/
to:
/apis/apps/v1/
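
A sketch of the parameterization using the curl call quoted above (the API_PATH_APPS variable name is an assumption, not the script's actual variable):

# default to the current API group/version, but allow an override for older clusters
API_PATH_APPS="${API_PATH_APPS:-apis/apps/v1}"
DEPLOYMENTS_STATUS=$(curl -sS $SSL --netrc-file $CREDENTIALS_FILE $TARGET/$API_PATH_APPS/namespaces/$NAMESPACE/deployments/)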

Can we use it to send data from an agent to the Icinga master?

Hi,
I want to monitor my cluster from an Icinga master, and not in the other direction =/

Current situation:
My cluster doesn't have a public IP and it is not on the same network as my Icinga master (the master has a public IP, and the agent uses AWS tooling to communicate externally).

Can we run the scripts and commands from my cluster directly, to send the information to my Icinga master?

I know it's complicated to explain, but I want to know if I can send the Kubernetes status from my cluster to my Icinga master. In this case the Icinga agent would send information to the Icinga master, and the master would have nothing to do except display it in Icinga Web 2, because I can't send a check_command to my Icinga agent from the master.

Thanks for your help

Doesn't detect when node is totally down

The script reports "OK" even if a node is down, if docker is down or if the Kubelet has crashed or entered an error condition.

In this case, I stopped the Kubelet on docker02, but the script says the node is OK.

$ kubectl get nodes
NAME       STATUS     ROLES                      AGE    VERSION
docker01   Ready      controlplane,etcd,worker   421d   v1.17.4
docker02   NotReady   controlplane,etcd,worker   421d   v1.17.4
docker03   Ready      controlplane,etcd,worker   421d   v1.17.4

$ ./check_kube_nodes.sh -k kube_config_cluster.yml
OK - Kubernetes nodes all OK
$
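
One way to catch this would be to check the Ready condition explicitly rather than only the error conditions. A sketch with kubectl and jq, not the script's current logic:

# list nodes whose Ready condition is not "True" (covers NotReady and Unknown)
kubectl get nodes -o json | jq -r '.items[] | .metadata.name as $node | .status.conditions[] | select(.type == "Ready" and .status != "True") | "\($node) Ready=\(.status)"'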

Should check that kubectl is in PATH

We should prevent this error:

$ ./check_kube_nodes.sh
./check_kube_nodes.sh: line 48: kubectl: command not found
CRITICAL - unable to connect to Kubernetes via kubectl!
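
A guard near the top of each script would make the failure explicit. A minimal sketch:

# fail early with an UNKNOWN state if kubectl is missing from PATH
if ! command -v kubectl >/dev/null 2>&1; then
    echo "UNKNOWN - kubectl not found in PATH"
    exit 3
fi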

NRPE: Unable to read output

Hello Team,

I'm trying to set up monitoring of a Kubernetes cluster using this project: https://github.com/colebrooke/kubernetes-nagios

I came across a weird issue. I have the Kubernetes cluster running on a remote RHEL server. When I run the scripts directly on that server, they work.
From the remote server, locally:
/usr/local/nagios/libexec/check_pods.sh -k -n -w 500 -C 800
OK - pods are all OK, found 2 in state.

The same command via the NRPE plugin, run on the remote server itself:
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_pod_cjoc
NRPE: Unable to read output

I have defined a command in nrpe.cfg and restarted the NRPE agent on the remote server.

When I invoke this script from the Nagios server, I'm getting the "NRPE: Unable to read output" error.

From the Nagios server:
/usr/local/nagios/libexec/check_nrpe -H -c check_pod_cjoc
NRPE: Unable to read output

I have tested with two versions of the NRPE agent, 3.2.1 and 4.0.3. I didn't try other versions, but I get the same error message.

Note: the Nagios user has admin (sudo) rights to run these scripts on the remote server.

Nagios is running v4.4.5 on a RHEL server.

Let me know if you need more information. Can you please take a look? @ericloyd @sawolf
######Stay Home#######Stay Safe#########
Thanks,
Srikanth
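
For reference, "NRPE: Unable to read output" generally means the plugin produced no output when run by the NRPE daemon, which often comes down to paths or environment differing from an interactive shell. A hypothetical nrpe.cfg command definition using absolute paths (all values below are placeholders, not taken from this report):

command[check_pod_cjoc]=/usr/local/nagios/libexec/check_pods.sh -k /home/nagios/.kube/config -n my-namespace -w 500 -C 800

It can also help to confirm that kubectl and jq are on the PATH seen by the nrpe user.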

Cater for Statefulsets and Daemonsets

/kind feature

Cater for more resource types:

Either:
A. Clone check_kube_deployments.sh to check_kube_statefulsets.sh and check_kube_daemonsets.sh and update the specific differences where needed.

B. Modify check_kube_deployments.sh to gather status for all resource types (deployments, statefulsets and daemonsets) within the same namespace.

In favor of option B (see the sketch after this list) because:

  • Fewer input variables required for Nagios operators and consumers.
  • Kubernetes operators are likely interested in all failures within a namespace, not a partial view.
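
A sketch of option B, looping over the workload types with the same curl pattern the deployments check already uses (variable names are assumptions):

# query each workload type in turn and feed the result to the existing condition parsing
for RESOURCE_TYPE in deployments statefulsets daemonsets; do
    WORKLOAD_STATUS=$(curl -sS $SSL --netrc-file $CREDENTIALS_FILE $TARGET/apis/apps/v1/namespaces/$NAMESPACE/$RESOURCE_TYPE/)
    # ...existing per-resource condition checks would run against $WORKLOAD_STATUS here...
done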
