canonical / nrpe_exporter
A Prometheus exporter for generating metrics from commands executed by a running NRPE daemon.
License: Apache License 2.0
Hi all,
Is there any way to implement SSL support in nrpe_exporter? It would be useful so that I can use it with existing Nagios checks (e.g. custom hardware checks).
Cheers
I don't know, but is it possible to send arguments?
I have defined my check_load (on the NRPE server) as:
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 0.8 -c 1.1,1.0,0.9 -r
Or, in another check, I have to send the VM hostname:
command[check_vm_name]=/usr/check/check_vm_node.sh $ARG1$
Any solution?
Thanks.
I am using the nrpe_exporter (https://github.com/RobustPerception/nrpe_exporter) to make about 4500 nrpe client requests every 70s. A few times a day at irregular intervals (typically hours, but sometimes minutes) the nrpe_exporter crashes with the following error:
panic: runtime error: index out of range
goroutine 3888620 [running]:
math/rand.(*rngSource).Int63(0xc420083500, 0x5d3ea3c976019300)
/home/dprittie/.cache/pants/bin/go/linux/x86_64/1.8.3/go/go/src/math/rand/rng.go:231 +0x8c
math/rand.(*Rand).Int63(0xc420018c30, 0x5d3ea3c976019300)
/home/dprittie/.cache/pants/bin/go/linux/x86_64/1.8.3/go/go/src/math/rand/rand.go:81 +0x33
math/rand.(*Rand).Uint32(0xc420018c30, 0xc420765ae4)
/home/dprittie/.cache/pants/bin/go/linux/x86_64/1.8.3/go/go/src/math/rand/rand.go:84 +0x2b
nrpe.randomizeBuffer(0xc4208e4480, 0x40c, 0x40c)
/opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/src.go.src.nrpe.nrpe/src/nrpe/nrpe.go:110 +0x52
nrpe.buildPacket(0xc400000001, 0xc420765bb0, 0x8, 0x20, 0x8)
/opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/src.go.src.nrpe.nrpe/src/nrpe/nrpe.go:202 +0x208
nrpe.Run(0xcce260, 0xc4202ac0f0, 0xc4202cce54, 0x8, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, ...)
/opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/src.go.src.nrpe.nrpe/src/nrpe/nrpe.go:282 +0x10f
main.collectCommandMetrics(0xc4202cce54, 0x8, 0x0, 0x0, 0x0, 0xcce140, 0xc420b300c8, 0xcc53a0, 0xc420123620, 0xc4208d7670, ...)
/opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/src.go.src.nrpe_exporter.nrpe_exporter/src/nrpe_exporter/nrpe_exporter.go:49 +0x120
main.(*Collector).Collect(0xc420596b90, 0xc4202cd080)
/opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/src.go.src.nrpe_exporter.nrpe_exporter/src/nrpe_exporter/nrpe_exporter.go:72 +0x357
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func2(0xc4201bc5e0, 0xc4202cd080, 0xcc89a0, 0xc420596b90)
/opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/3rdparty.go.github.com.prometheus.client_golang.prometheus/src/github.com/prometheus/client_golang/prometheus/registry.go:433 +0x61
created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather
/opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/3rdparty.go.github.com.prometheus.client_golang.prometheus/src/github.com/prometheus/client_golang/prometheus/registry.go:434 +0x2ec
Note the line numbers may be a little off, because we have applied a couple of patches to suit our local environment.
I believe we see this error because the source returned by rand.NewSource is not safe for concurrent use, as documented at https://pkg.go.dev/math/rand. I have opened a PR to address this upstream: envimate/nrpe#3. Ideally it gets fixed there first, then pulled into https://github.com/aperum/nrpe after the merge, and finally here.
I am opening this issue so that anyone else who runs into this problem while using nrpe_exporter can benefit from what I have found out so far, and also because both nrpe projects have had no activity for over four years. We may find that getting this fix into nrpe_exporter requires a new fork of aperum/nrpe.
[root@jino Downloads]# ./nrpe_exporter
level=info ts=2017-12-15T18:22:45.45790008Z caller=nrpe_exporter.go:129 msg="Starting nrpe_exporter" version="(version=, branch=, revision=)"
level=info ts=2017-12-15T18:22:45.457972373Z caller=nrpe_exporter.go:130 msg="Build context" build_context="(go=go1.9, user=, date=)"
level=info ts=2017-12-15T18:22:45.457995283Z caller=nrpe_exporter.go:131 msg="Listening on address" address=:9275
level=error ts=2017-12-15T18:22:49.152395908Z caller=nrpe_exporter.go:72 msg="Error running command" command=check_load err="read tcp 127.0.0.1:49300->127.0.0.1:5666: read: connection reset by peer"
^C
As an operator when I get an alert I would like to quickly identify the check and the unit that is firing.
For the unit I can use labels.juju_unit but I am missing a check label.
I can extract that from the job, but it would be good to have that out of the box.
Right now I can work around that using a rule with the following expression:
label_replace(avg_over_time(command_status{juju_model="openstack"}[5m]), "check", "$1", "job", ".*.(check_.*)_prometheus_scrape") > 1
The above will produce metrics like the following:
{check="check_prometheus_libvirt_exporter_http", host="192.168.11.105", instance="192.168.11.105:5666", job="juju_openstack_666cbd0_mycontext_mycloud_myhost_check_prometheus_libvirt_exporter_http_prometheus_scrape", juju_application="nrpe", juju_model="openstack", juju_model_uuid="666cbd0e-58a4-4951-8b75-9b5ca9733f2f", juju_unit="mycontext-mycloud-myhost"}
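To make the extraction in that rule easier to follow, the sketch below applies the same pattern to the job label from the sample output. Prometheus anchors label_replace patterns at both ends, so the anchors are written out explicitly here; checkFromJob and jobPattern are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// jobPattern is the regex from the rule, with the implicit anchoring that
// label_replace applies made explicit.
var jobPattern = regexp.MustCompile(`^(?:.*.(check_.*)_prometheus_scrape)$`)

// checkFromJob returns the check name embedded in a job label, or an
// empty string when the label does not match the pattern.
func checkFromJob(job string) string {
	m := jobPattern.FindStringSubmatch(job)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	job := "juju_openstack_666cbd0_mycontext_mycloud_myhost_check_prometheus_libvirt_exporter_http_prometheus_scrape"
	fmt.Println(checkFromJob(job)) // check_prometheus_libvirt_exporter_http
}
```

The pattern relies on the check name starting with check_ and the job ending in _prometheus_scrape, so job names that do not follow that convention yield no check label.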
Hitting too many open files with nrpe-exporter.
nrpe-exporter has PID 1026962 and it is listening on :9275.
Limits are currently set at 1024 on the system
# ulimit -n
1024
lsof shows that most of them are sockets for ESTABLISHED connections.
# ps aux | grep nrpe-exporter
root 1026962 0.0 0.0 5404120 77720 ? Ssl 11:36 0:11 /usr/local/bin/nrpe-exporter
# ll /proc/1026962/fd/ | wc -l
1027
# lsof -p 1026962 | wc -l
1028
# lsof -p 1026962 | grep ESTABLISHED | wc -l
1017
In an environment with 196 nrpe units and 1659 targets:
juju deploy cos-proxy
juju relate cos-proxy:monitors nrpe:monitors
nrpe_exporter is installed by charm cos-proxy at latest/edge rev. 9 installed on a focal lxd.
Lxd at 5.7-c62733b (snap latest/stable)
Log from nrpe-exporter journal:
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.696Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.1.133:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.733Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.2.155:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.750Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.1.131:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.760Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.1.105:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.770Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.2.128:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.786Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.1.187:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.803Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.1.131:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.834Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.3.155:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.897Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.3.181:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.920Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.2.158:5666: socket: too many open files"
Nov 10 19:23:07 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:07.015Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.1.113:5666: socket: too many open files"
This results in targets being shown as down in Prometheus.
I cannot start the container. It fails with: standard_init_linux.go:190: exec user process caused "no such file or directory"
I have tried running dos2unix on all the files and changing the entry point. When I do docker run -it and use /bin/ash, I can enter the container and see the nrpe_exporter binary at /bin/nrpe_exporter, but manually starting it with ./nrpe_exporter leads to a file not found error.
The CI currently in use is CircleCI. It works well, but everything else we have is in GitHub Actions, so this should be migrated at some point. While at it, also delete the Travis config file and any other unused files lying around.
Hello,
Is it possible to specify multiple commands against targets?
I have tried the following and it seems to fail. It would make it easy to group all checks for a target in one job.
- job_name: 'nrpe'
  metrics_path: /export
  params:
    target: ['127.0.0.1:5666']
    command: ['check_load','check_users']
  static_configs:
    - targets:
      - 127.0.0.1:9275
Thanks.
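A possible explanation, offered here as an assumption rather than a confirmed diagnosis: a list under params is sent as a repeated query key, and a handler that reads the parameter with a single-value getter only sees the first entry. The sketch below shows that behaviour with Go's net/url; commandsFromQuery is a hypothetical helper, and how nrpe_exporter actually reads its parameters is not stated in this thread.

```go
package main

import (
	"fmt"
	"net/url"
)

// commandsFromQuery parses a raw query string and returns both the value a
// single-value lookup would see and the full list of command values.
func commandsFromQuery(rawQuery string) (first string, all []string, err error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", nil, err
	}
	return q.Get("command"), q["command"], nil
}

func main() {
	// The scrape config above would produce a query of roughly this shape.
	raw := "target=127.0.0.1:5666&command=check_load&command=check_users"
	first, all, err := commandsFromQuery(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(first) // check_load
	fmt.Println(all)   // [check_load check_users]
}
```

If that is what happens, one job per command (each with a single command value) would be a workaround that stays within what the scrape config already supports.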
When I run check_load against localhost, the web page is blank. I can still get metrics from Prometheus, but they don't relate to NRPE and look like basic default metrics such as up and process_max_fds. Any ideas why? Thanks
Due to the OpenSSL dependency of this program, it turns out that the resulting binary is dynamically linked:
# file nrpe_exporter
nrpe_exporter: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=fa46de95fb47fab8f7534ba7fa624f4158b1d6cc, for GNU/Linux 3.2.0, not stripped
# ldd nrpe_exporter
linux-vdso.so.1 (0x00007fffdd794000)
libssl.so.1.1 => /usr/lib/libssl.so.1.1 (0x00007f6196ef8000)
libcrypto.so.1.1 => /usr/lib/libcrypto.so.1.1 (0x00007f6196c18000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f6196bf0000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f6196a28000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f6196a20000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f6196ff8000)
Running without CGO doesn't seem to be an option here, sadly:
# CGO_ENABLED=0 go build nrpe_exporter.go
# github.com/spacemonkeygo/openssl
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/http.go:38:14: undefined: NewCtxFromFiles
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:24:7: undefined: Ctx
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:32:16: undefined: Server
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:43:43: undefined: Ctx
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:51:41: undefined: Ctx
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:79:38: undefined: Ctx
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:79:62: undefined: Conn
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:96:45: undefined: Ctx
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:97:20: undefined: Conn
and trying to build it statically should be possible, however:
# go build -ldflags="-extldflags '-static'" nrpe_exporter.go
# command-line-arguments
/usr/lib/go/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
/usr/bin/ld: cannot find -lssl
/usr/bin/ld: cannot find -lcrypto
/usr/bin/ld: /tmp/go-link-932140215/000004.o: in function `_cgo_26061493d47f_C2func_getaddrinfo':
/tmp/go-build/cgo-gcc-prolog:58: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
collect2: error: ld returned 1 exit status
I really don't want to wrap my head around linker issues atm, so using another base image should suffice for now.
Since the currently used base image is based on busybox, which only includes the static binaries for busybox itself and no shared libraries, the image fails as seen in #13.
I'll try to do a PR with the needed changes to make it run in an easier way.
I am trying to create a job in prometheus for the below check command in nrpe.cfg.
command[check_test]=/usr/local/bin/check_test.py -i
I am not able to get command_status as 1 for this command. I tried multiple check_commands with
# HELP command_duration Length of time the NRPE command took
# TYPE command_duration gauge
command_duration 0.052347708
# HELP command_ok Indicates whether or not the command was a success
# TYPE command_ok gauge
command_ok 0
# HELP command_status Indicates the status of the command
# TYPE command_status gauge
command_status 3
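For reading these values: under the standard Nagios plugin convention, exit code 0 means OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. Assuming command_status carries the plugin's return code (the HELP text suggests this but does not say so outright), a value of 3 would mean the plugin reported UNKNOWN, which often points at the script itself failing to run properly. The helper below, with the hypothetical name statusName, is only a lookup for that convention.

```go
package main

import "fmt"

// statusName maps a Nagios plugin return code to its conventional name.
func statusName(code int) string {
	switch code {
	case 0:
		return "OK"
	case 1:
		return "WARNING"
	case 2:
		return "CRITICAL"
	case 3:
		return "UNKNOWN"
	default:
		// Codes outside 0-3 are not part of the convention.
		return fmt.Sprintf("INVALID(%d)", code)
	}
}

func main() {
	fmt.Println(statusName(3)) // UNKNOWN
}
```

Running the check script by hand as the NRPE user and inspecting its exit code is usually the quickest way to see why it returns 3.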