sensu-plugins-hardware's Introduction

Sensu-Plugins-hardware

Functionality

check-hardware-fail searches the output of dmesg for lines matching a provided query. It accepts the --facility, --level and --kernel options, which control how the dmesg command is run. It returns CRITICAL if any match is found, and UNKNOWN if the provided options are invalid or the command fails to execute.
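
The matching logic described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the helper name `hardware_status` is hypothetical, and treating an invalid pattern as UNKNOWN is an assumption about what "invalid options" covers.

```ruby
# Hypothetical helper (not part of the plugin): classify dmesg text
# against a query and return a status name as a string.
def hardware_status(dmesg_output, query)
  pattern = Regexp.new(query)                 # raises RegexpError on a bad pattern
  matches = dmesg_output.lines.grep(pattern)  # keep only the matching lines
  matches.empty? ? 'OK' : 'CRITICAL'
rescue RegexpError
  'UNKNOWN'                                   # assumption: bad pattern => UNKNOWN
end

sample = <<~LOG
  [    1.000000] usb 1-1: new high-speed USB device
  [    2.500000] mce: [Hardware Error]: Machine check events logged
LOG

puts hardware_status(sample, '\[Hardware Error\]')
puts hardware_status(sample, 'killed as a result of limit')
```

The first call finds the Hardware Error line, the second finds nothing.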

Files

  • bin/check-hardware-fail

Usage

Usage: ./check-hardware-fail.rb (options)
    -f FACILITY[,FACILITY],          Restrict output to defined facilities. Supported log facilities: kern,user,mail,daemon,auth,syslog,lpr,news
        --facility
        --invert                     Invert order
    -k, --kernel                     Include kernel messages
    -L, --level LEVEL[,LEVEL]        Restrict output to defined levels, otherwise all levels are included. Supported log levels: emerg,alert,crit,err,warn,notice,info,debug
    -l, --lines NUMBER               Maximum number of lines to read from dmesg, 0 (default) means all
    -q, --query QUERY                What pattern to look for in the output of dmesg (regex or literal)
    -s, --seconds SECONDS            Amount of seconds to lookbehind from dmesg output. This option is incompatible with --lines

Example of usage:

Check the first 100 lines for 'killed as a result of limit'

check-hardware-fail.rb -l 100 --invert -q 'killed as a result of limit'

Check the last 100 lines for 'killed as a result of limit'

check-hardware-fail.rb -l 100 -q 'killed as a result of limit'
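
The two examples above differ only in which end of the dmesg output is read. A minimal sketch of that selection, assuming the semantics shown in the examples (`select_lines` is a hypothetical name, not a function from the plugin):

```ruby
# Hypothetical sketch of the --lines / --invert behaviour described above.
# limit == 0 means "read everything", matching the documented default.
def select_lines(all_lines, limit, invert: false)
  return all_lines if limit.zero?
  # --invert reads from the start of the output, otherwise from the end.
  invert ? all_lines.first(limit) : all_lines.last(limit)
end

lines = (1..5).map { |i| "line #{i}" }

p select_lines(lines, 2)                # the last two lines
p select_lines(lines, 2, invert: true)  # the first two lines
p select_lines(lines, 0).length         # all five lines
```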

The following options are only available on Linux:

  • --seconds Number of seconds to look back in the dmesg output. This option is incompatible with --lines
  • --facility Restrict output to defined facilities. Supported log facilities: kern,user,mail,daemon,auth,syslog,lpr,news
  • --level Restrict output to defined levels, otherwise all levels are included. Supported log levels: emerg,alert,crit,err,warn,notice,info,debug
  • --kernel Include kernel messages

Check the last 300 seconds for 'killed as a result of limit'

check-hardware-fail.rb -s 300 -q 'killed as a result of limit'

Check the last 300 seconds for 'killed' on auth and syslog facilities

check-hardware-fail.rb -s 300 -f auth,syslog -q 'killed'
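
One way a time window such as `-s 300` could be applied is to compare the seconds-since-boot timestamp at the start of each dmesg line with the current uptime. The sketch below assumes that timestamp format and takes the uptime as a plain argument. It is not necessarily how the plugin implements --seconds, and `recent_lines` is a hypothetical name.

```ruby
# Matches the "[ 4866.464697]" prefix that dmesg prints by default
# (seconds since boot).
TIMESTAMP = /\A\[\s*(\d+\.\d+)\]/

# Hypothetical helper: keep lines whose timestamp falls within the last
# `seconds` seconds, given the current uptime in seconds.
# On Linux the uptime could be read with:
#   File.read('/proc/uptime').split.first.to_f
def recent_lines(lines, uptime, seconds)
  cutoff = uptime - seconds
  lines.select do |line|
    m = TIMESTAMP.match(line)
    m && m[1].to_f >= cutoff   # lines without a timestamp are skipped
  end
end

log = [
  '[ 4866.464697] sd 2:0:0:0: [sda] tag#1 abort',
  '[17361.481198] sd 2:0:0:1: [sdb] tag#1 abort',
  '[17520.956819] INFO: task jbd2/sdb-8:4403 blocked for more than 120 seconds.'
]

# With an uptime of 17600 s and a 300 s window, the cutoff is 17300 s.
puts recent_lines(log, 17_600.0, 300)
```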

Installation

Installation and Setup

Notes

sensu-plugins-hardware's Issues

Plugin should have option to select days of dmesg log

The plugin should have an option to select how many days of the dmesg log to check for Hardware Error.

For example, I may want to check only the last 1 day or 7 days of logs.

Something like the following might cause an alert even when the Hardware Error occurred a few months back.

[Sat May 14 05:34:20 2016] mce: [Hardware Error]: Machine check events logged

Check does not provide a way of specifying time-based occurrences of a given pattern.

The check check-hardware-fail.rb does not provide a way of checking only a given period of time, say the last 5 minutes; it can only count lines in the output of dmesg.
It would be nice to be able to specify a certain amount of time to be inspected for a given pattern.
Something like: check-hardware-fail.rb -t 5min -q 'killed as a result of limit'

check-hardware-fail errors when dmesg contains characters not valid for utf-8

As reported by a Sensu Enterprise user:

bash-4.1$ /opt/sensu/embedded/bin/ruby /opt/sensu/embedded/bin/check-hardware-fail.rb
Check failed to run: invalid byte sequence in UTF-8, 
["/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-hardware-1.1.0/bin/check-hardware-fail.rb:55:in `[]'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-hardware-1.1.0/bin/check-hardware-fail.rb:55:in `block in run'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-hardware-1.1.0/bin/check-hardware-fail.rb:55:in `select'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-hardware-1.1.0/bin/check-hardware-fail.rb:55:in `run'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/cli.rb:58:in `block in '"]

I think we can address this with some input sanitization, as described here https://robots.thoughtbot.com/fight-back-utf-8-invalid-byte-sequences
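
A minimal sketch of such sanitization, assuming the dmesg output is held in a Ruby string. It uses `String#scrub` (available since Ruby 2.1) rather than the `encode`-based approach in the linked article, and `sanitize` is a hypothetical helper name, not code from the plugin.

```ruby
# Hypothetical helper: replace every invalid byte sequence with '?'
# so that later regex matching does not raise on bad input.
def sanitize(text)
  text.scrub('?')
end

# Simulated dmesg output containing a byte (0xFF) that is never valid in UTF-8.
raw = "mce: [Hardware Error]: \xFF Machine check events logged\n"
      .dup.force_encoding(Encoding::UTF_8)

clean = sanitize(raw)

puts raw.valid_encoding?     # false
puts clean.valid_encoding?   # true
puts clean.lines.grep(/Hardware Error/).length
```

Scrubbing before the `select` call keeps the match logic unchanged while removing the input that triggers the exception.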

check-hardware not reporting all dmesg error messages

I am trying to figure out how to configure the check-hardware probe. Currently, I am seeing occasional dmesg entries like the following.

[ 4866.464697] sd 2:0:0:0: [sda] tag#1 abort
[ 4906.432614] sd 2:0:0:0: [sda] tag#1 abort
[ 9445.441825] sd 2:0:0:1: [sdb] tag#4 abort
[13624.437138] sd 2:0:0:1: [sdb] tag#1 abort
[16219.436717] sd 2:0:0:1: [sdb] tag#3 abort
[16255.564600] sd 2:0:0:1: [sdb] tag#0 abort
[16336.460350] sd 2:0:0:1: [sdb] tag#0 abort
[16648.427363] sd 2:0:0:1: [sdb] tag#31 abort
[16722.379150] sd 2:0:0:1: [sdb] tag#3 abort
[16808.378876] sd 2:0:0:1: [sdb] tag#0 abort
[17361.481198] sd 2:0:0:1: [sdb] tag#1 abort
[17412.377007] sd 2:0:0:1: [sdb] tag#49 abort
[17481.988986] sd 2:0:0:1: [sdb] tag#0 abort
[17520.956819] INFO: task jbd2/sdb-8:4403 blocked for more than 120 seconds.
[17520.968703]       Not tainted 4.4.0-141-generic #167-Ubuntu
[17520.973645] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17520.980257] jbd2/sdb-8      D ffff8803f20a7ad8     0  4403      2 0x00000000
[17520.980265]  ffff8803f20a7ad8 ffff8803f20a7ad0 ffff88042d7a2a00 ffff88042a64d400
[17520.980270]  ffff8803f20a8000 ffff88043fcd7300 7fffffffffffffff ffffffff81855090
[17520.980275]  ffff8803f20a7c30 ffff8803f20a7af0 ffffffff81854895 0000000000000000
[17520.980280] Call Trace:
[17520.980290]  [<ffffffff81855090>] ? bit_wait+0x60/0x60
[17520.980295]  [<ffffffff81854895>] schedule+0x35/0x80
[17520.980300]  [<ffffffff81857a26>] schedule_timeout+0x1b6/0x270
[17520.980307]  [<ffffffff810cfd41>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[17520.980314]  [<ffffffff8106641e>] ? kvm_clock_get_cycles+0x1e/0x20
[17520.980335]  [<ffffffff810fb46e>] ? ktime_get+0x3e/0xb0
[17520.980349]  [<ffffffff81855090>] ? bit_wait+0x60/0x60
[17520.980360]  [<ffffffff81854004>] io_schedule_timeout+0xa4/0x110

These messages aren't detected by the check-hardware-fail probe

./check-hardware-fail.rb -s 300
CheckHardwareFail OK: OK

Is this probe not meant for the dmesg events?
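
Note that the invocation above passes no --query, so the check falls back to whatever its default pattern is, which is not stated in this thread. As an illustration only, a pattern along the following lines would match the entries quoted above. The regex is a guess at what the reporter wants to catch, not a pattern taken from the plugin.

```ruby
# Illustrative pattern (an assumption, not the plugin's default query):
# SCSI command aborts and hung-task warnings.
pattern = /tag#\d+ abort|blocked for more than \d+ seconds/

sample = [
  '[17481.988986] sd 2:0:0:1: [sdb] tag#0 abort',
  '[17520.956819] INFO: task jbd2/sdb-8:4403 blocked for more than 120 seconds.',
  '[17520.968703]       Not tainted 4.4.0-141-generic #167-Ubuntu'
]

hits = sample.grep(pattern)
puts hits
```

The same expression could then be passed to the check through -q, for example together with -s 300.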
