nrpe_exporter's People

Contributors

brian-brazil, ca-scribner, candlerb, conr, dependabot[bot], edguy3, igormp, rbarry82, sed-i, simskij

nrpe_exporter's Issues

SSL support?

Hi All,

Is there any way to implement SSL support in nrpe_exporter? It would be useful so that I can use it with existing Nagios checks (e.g. custom hardware checks).

Cheers

NRPE Arguments

I'm not sure about this, but is it possible to send arguments?

I have defined my check_load (on the NRPE server) as:
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 0.8 -c 1.1,1.0,0.9 -r
In another check, I have to send the VM hostname:
command[check_vm_name]=/usr/check/check_vm_node.sh $ARG1$

Any solution?
Thanks.
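
For context, check_nrpe passes arguments by joining the command name and its arguments with "!" into a single query string, and the NRPE daemon only expands $ARG1$ when dont_blame_nrpe=1 is set in nrpe.cfg. The sketch below builds such a query string. The helper name buildQuery and the hostname are hypothetical, and whether nrpe_exporter forwards the string to the daemon unchanged is an assumption that would need to be verified.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// buildQuery joins an NRPE command name and its arguments with "!",
// the separator check_nrpe uses on the wire.
// Hypothetical helper; it is not part of nrpe_exporter.
func buildQuery(command string, args ...string) (string, error) {
	parts := append([]string{command}, args...)
	for _, p := range parts {
		// "!" is the separator, so it cannot appear inside a part.
		if strings.Contains(p, "!") {
			return "", errors.New("command and arguments must not contain '!'")
		}
	}
	return strings.Join(parts, "!"), nil
}

func main() {
	// vm01.example.com is a placeholder hostname.
	q, err := buildQuery("check_vm_name", "vm01.example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(q) // check_vm_name!vm01.example.com
}
```

If the exporter passes the command parameter through as-is, the resulting string could be tried as the command value in the scrape config.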

exporter panics at irregular intervals due to rand.NewSource not being safe for concurrent use

I am using the nrpe_exporter (https://github.com/RobustPerception/nrpe_exporter) to make about 4500 nrpe client requests every 70s. A few times a day at irregular intervals (typically hours, but sometimes minutes) the nrpe_exporter crashes with the following error:

panic: runtime error: index out of range

goroutine 3888620 [running]:
math/rand.(*rngSource).Int63(0xc420083500, 0x5d3ea3c976019300)
        /home/dprittie/.cache/pants/bin/go/linux/x86_64/1.8.3/go/go/src/math/rand/rng.go:231 +0x8c
math/rand.(*Rand).Int63(0xc420018c30, 0x5d3ea3c976019300)
        /home/dprittie/.cache/pants/bin/go/linux/x86_64/1.8.3/go/go/src/math/rand/rand.go:81 +0x33
math/rand.(*Rand).Uint32(0xc420018c30, 0xc420765ae4)
        /home/dprittie/.cache/pants/bin/go/linux/x86_64/1.8.3/go/go/src/math/rand/rand.go:84 +0x2b
nrpe.randomizeBuffer(0xc4208e4480, 0x40c, 0x40c)
        /opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/src.go.src.nrpe.nrpe/src/nrpe/nrpe.go:110 +0x52
nrpe.buildPacket(0xc400000001, 0xc420765bb0, 0x8, 0x20, 0x8)
        /opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/src.go.src.nrpe.nrpe/src/nrpe/nrpe.go:202 +0x208
nrpe.Run(0xcce260, 0xc4202ac0f0, 0xc4202cce54, 0x8, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, ...)
        /opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/src.go.src.nrpe.nrpe/src/nrpe/nrpe.go:282 +0x10f
main.collectCommandMetrics(0xc4202cce54, 0x8, 0x0, 0x0, 0x0, 0xcce140, 0xc420b300c8, 0xcc53a0, 0xc420123620, 0xc4208d7670, ...)
        /opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/src.go.src.nrpe_exporter.nrpe_exporter/src/nrpe_exporter/nrpe_exporter.go:49 +0x120
main.(*Collector).Collect(0xc420596b90, 0xc4202cd080)
        /opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/src.go.src.nrpe_exporter.nrpe_exporter/src/nrpe_exporter/nrpe_exporter.go:72 +0x357
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func2(0xc4201bc5e0, 0xc4202cd080, 0xcc89a0, 0xc420596b90)
        /opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/3rdparty.go.github.com.prometheus.client_golang.prometheus/src/github.com/prometheus/client_golang/prometheus/registry.go:433 +0x61
created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather
        /opt/teamcity-agent-01/work/96ee0984d4e5ff87/.pants.d/compile/go/3rdparty.go.github.com.prometheus.client_golang.prometheus/src/github.com/prometheus/client_golang/prometheus/registry.go:434 +0x2ec

Note the line numbers may be a little off, because we have applied a couple of patches to suit our local environment.

I believe we see this error because the Source returned by rand.NewSource is not safe for concurrent use, as documented at https://pkg.go.dev/math/rand. I have opened a PR to address this upstream at envimate/nrpe#3. Ideally the fix gets merged there, is then pulled into https://github.com/aperum/nrpe, and finally lands here.

I am opening this issue so that anyone else who runs into this problem while using nrpe_exporter can benefit from what I have found so far. Both nrpe projects have had no activity for over four years, so we may find that getting this fix into nrpe_exporter requires a new fork of aperum/nrpe.
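
One common way to avoid this kind of race is to guard the shared generator with a mutex (the package-level functions such as rand.Uint32 are also safe for concurrent use). The following is a minimal sketch of the mutex approach, not the patch from the PR mentioned above. The type lockedRand is hypothetical, and randomizeBuffer here is a stand-in written for illustration, not the library's actual function.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// lockedRand wraps a *rand.Rand with a mutex so that several goroutines
// can share one generator. Hypothetical type, for illustration only.
type lockedRand struct {
	mu  sync.Mutex
	rnd *rand.Rand
}

func newLockedRand(seed int64) *lockedRand {
	return &lockedRand{rnd: rand.New(rand.NewSource(seed))}
}

// Uint32 serializes access to the underlying, non-thread-safe source.
func (l *lockedRand) Uint32() uint32 {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.rnd.Uint32()
}

// randomizeBuffer fills buf with pseudo-random bytes, four at a time.
func randomizeBuffer(l *lockedRand, buf []byte) {
	for i := 0; i < len(buf); i += 4 {
		v := l.Uint32()
		for j := 0; j < 4 && i+j < len(buf); j++ {
			buf[i+j] = byte(v >> (8 * uint(j)))
		}
	}
}

func main() {
	shared := newLockedRand(time.Now().UnixNano())
	var wg sync.WaitGroup
	// Many goroutines share one generator, as concurrent scrapes would.
	for g := 0; g < 64; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			buf := make([]byte, 1036) // 0x40c, the buffer size seen in the trace
			for i := 0; i < 100; i++ {
				randomizeBuffer(shared, buf)
			}
		}()
	}
	wg.Wait()
	fmt.Println("done")
}
```

Running a program like this with go run -race is one way to check that the shared generator is no longer accessed concurrently without synchronization.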

nrpe_exporter issue

[root@jino Downloads]# ./nrpe_exporter
level=info ts=2017-12-15T18:22:45.45790008Z caller=nrpe_exporter.go:129 msg="Starting nrpe_exporter" version="(version=, branch=, revision=)"
level=info ts=2017-12-15T18:22:45.457972373Z caller=nrpe_exporter.go:130 msg="Build context" build_context="(go=go1.9, user=, date=)"
level=info ts=2017-12-15T18:22:45.457995283Z caller=nrpe_exporter.go:131 msg="Listening on address" address=:9275
level=error ts=2017-12-15T18:22:49.152395908Z caller=nrpe_exporter.go:72 msg="Error running command" command=check_load err="read tcp 127.0.0.1:49300->127.0.0.1:5666: read: connection reset by peer"
^C

Add "check" label

Enhancement Proposal

As an operator, when I get an alert I would like to quickly identify the check and the unit that are firing.
For the unit I can use labels.juju_unit, but I am missing a check label.

I can extract that from the job, but it would be good to have it out of the box.

Right now I can work around this using a rule with the following expression:

label_replace(avg_over_time(command_status{juju_model="openstack"}[5m]), "check", "$1", "job", ".*.(check_.*)_prometheus_scrape") > 1

The above will produce metrics like the following:

{check="check_prometheus_libvirt_exporter_http", host="192.168.11.105", instance="192.168.11.105:5666", job="juju_openstack_666cbd0_mycontext_mycloud_myhost_check_prometheus_libvirt_exporter_http_prometheus_scrape", juju_application="nrpe", juju_model="openstack", juju_model_uuid="666cbd0e-58a4-4951-8b75-9b5ca9733f2f", juju_unit="mycontext-mycloud-myhost"}
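
To see what the regular expression in the workaround captures, it can be tried outside Prometheus. The sketch below applies the same pattern to the job label from the sample above. Prometheus anchors label_replace patterns at both ends, which is emulated here with ^(?:...)$. The function name checkFromJob is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// jobRe is the pattern from the rule above, wrapped in ^(?:...)$ to
// emulate the full-string anchoring that Prometheus applies.
var jobRe = regexp.MustCompile(`^(?:.*.(check_.*)_prometheus_scrape)$`)

// checkFromJob returns the check name embedded in a job label, if any.
// Hypothetical helper, for illustration only.
func checkFromJob(job string) (string, bool) {
	m := jobRe.FindStringSubmatch(job)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	job := "juju_openstack_666cbd0_mycontext_mycloud_myhost_check_prometheus_libvirt_exporter_http_prometheus_scrape"
	check, ok := checkFromJob(job)
	fmt.Println(check, ok) // check_prometheus_libvirt_exporter_http true
}
```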

too many open files

Bug Description

Hitting too many open files with nrpe-exporter.

nrpe-exporter has PID 1026962 and it is listening on :9275.

The open file limit is currently set to 1024 on the system:

# ulimit -n
1024

lsof shows that most of them are sockets for ESTABLISHED connections.

# ps aux | grep nrpe-exporter
root     1026962  0.0  0.0 5404120 77720 ?       Ssl  11:36   0:11 /usr/local/bin/nrpe-exporter
# ll /proc/1026962/fd/ | wc -l
1027
# lsof -p 1026962 | wc -l
1028
# lsof -p 1026962 | grep ESTABLISHED | wc -l
1017

To reproduce

In an environment with 196 nrpe units and 1659 targets:

  1. juju deploy cos-proxy
  2. juju relate cos-proxy:monitors nrpe:monitors

Environment

nrpe_exporter is installed by charm cos-proxy at latest/edge rev. 9 installed on a focal lxd.
Lxd at 5.7-c62733b (snap latest/stable)

Relevant log output

Log from nrpe-exporter journal:
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.696Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.1.133:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.733Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.2.155:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.750Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.1.131:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.760Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.1.105:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.770Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.2.128:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.786Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.1.187:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.803Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.1.131:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.834Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.3.155:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.897Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.3.181:5666: socket: too many open files"
Nov 10 19:23:06 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:06.920Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.2.158:5666: socket: too many open files"
Nov 10 19:23:07 juju-733f2f-8-lxd-22 nrpe-exporter[3295]: level=error ts=2022-11-10T19:23:07.015Z caller=nrpe_exporter.go:102 msg="Error dialing NRPE server" err="dial tcp 192.168.1.113:5666: socket: too many open files"

Additional context

This results in targets being reported as down in Prometheus.

Container Fails To Start

I cannot start the container. It fails with: standard_init_linux.go:190: exec user process caused "no such file or directory"
I have tried running dos2unix on all the files and changing the entry point. When I run docker run -it with /bin/ash, I can enter the container and see the nrpe_exporter binary at /bin/nrpe_exporter, but manually starting it with ./nrpe_exporter leads to a file-not-found error.

Migrate CI to GitHub Actions

CI currently runs on CircleCI. It works well, but everything else we have is on GitHub Actions, so CI should be migrated at some point. While at it, also delete the Travis config file and any other unused files lying around.

Allow multiple commands against targets

Hello,

Is it possible to specify multiple commands against targets?

I have tried the following and it seems to fail. Supporting this would make it easy to group all checks for a target in a single job.

- job_name: 'nrpe'
  metrics_path: /export
  params:
    target: ['127.0.0.1:5666']
    command: ['check_load','check_users']
  static_configs:
    - targets:
      - 127.0.0.1:9275

Thanks.
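
As background, Prometheus sends a multi-valued param as a repeated query parameter (command=check_load&command=check_users). A handler that reads the parameter with Get only sees the first value. The sketch below shows the difference using Go's net/url package. It assumes, without having confirmed it in the exporter's source, that the exporter reads a single command value. The function name commandsFromQuery is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// commandsFromQuery returns the first "command" value (what Get sees)
// and every "command" value present in the query string.
// Hypothetical helper, for illustration only.
func commandsFromQuery(rawQuery string) (string, []string, error) {
	v, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", nil, err
	}
	return v.Get("command"), v["command"], nil
}

func main() {
	q := "target=127.0.0.1%3A5666&command=check_load&command=check_users"
	first, all, err := commandsFromQuery(q)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(first) // check_load
	fmt.Println(all)   // [check_load check_users]
}
```

Under that assumption, defining one scrape job per command remains the way to run several checks against the same target.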

check_load against localhost is blank

When I run check_load against localhost, the web page is blank. I can still get metrics from Prometheus, but they don't correlate to NRPE and look like basic default metrics such as up and process_max_fds. Any ideas why? Thanks.

Current docker image lacks needed dynamic libraries

Because this program depends on OpenSSL, the resulting binary turns out to be dynamically linked:

# file nrpe_exporter
nrpe_exporter: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=fa46de95fb47fab8f7534ba7fa624f4158b1d6cc, for GNU/Linux 3.2.0, not stripped
# ldd nrpe_exporter
	linux-vdso.so.1 (0x00007fffdd794000)
	libssl.so.1.1 => /usr/lib/libssl.so.1.1 (0x00007f6196ef8000)
	libcrypto.so.1.1 => /usr/lib/libcrypto.so.1.1 (0x00007f6196c18000)
	libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f6196bf0000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007f6196a28000)
	libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f6196a20000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f6196ff8000)

Building without CGO doesn't seem to be an option here, sadly:

# CGO_ENABLED=0 go build nrpe_exporter.go
# github.com/spacemonkeygo/openssl
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/http.go:38:14: undefined: NewCtxFromFiles
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:24:7: undefined: Ctx
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:32:16: undefined: Server
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:43:43: undefined: Ctx
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:51:41: undefined: Ctx
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:79:38: undefined: Ctx
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:79:62: undefined: Conn
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:96:45: undefined: Ctx
../../../.go/pkg/mod/github.com/spacemonkeygo/[email protected]/net.go:97:20: undefined: Conn

Building it statically should be possible; however, it currently fails:

# go build -ldflags="-extldflags '-static'" nrpe_exporter.go 
# command-line-arguments
/usr/lib/go/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
/usr/bin/ld: cannot find -lssl
/usr/bin/ld: cannot find -lcrypto
/usr/bin/ld: /tmp/go-link-932140215/000004.o: in function `_cgo_26061493d47f_C2func_getaddrinfo':
/tmp/go-build/cgo-gcc-prolog:58: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
collect2: error: ld returned 1 exit status

I really don't want to wrap my head around linker issues at the moment, so using another base image should suffice for now.

The current base image is based on busybox, which ships only the static busybox binaries and no shared libraries, so the image fails as seen in #13.

I'll try to do a PR with the needed changes to make it run in an easier way.

Check_command with ARGS is giving command_status as 0

I am trying to create a job in prometheus for the below check command in nrpe.cfg.

command[check_test]=/usr/local/bin/check_test.py -i $ARG1$

I am not able to get command_status as 1 for this command. I tried multiple check_commands with $ARG1$ and it seems like NRPE exporter is not able to run the check_commands with dynamic ARGS.

# HELP command_duration Length of time the NRPE command took
# TYPE command_duration gauge
command_duration 0.052347708
# HELP command_ok Indicates whether or not the command was a success
# TYPE command_ok gauge
command_ok 0
# HELP command_status Indicates the status of the command
# TYPE command_status gauge
command_status 3
