Reacter

A tool for generating, consuming, and handling system monitoring events

Overview

  1. Executes Nagios-compatible check scripts
  2. Collects output and prints it as a formatted JSON string to standard output
  3. Publishes output to an AMQP message broker
  4. Consumes output from an AMQP message broker
  5. Conditionally executes handler scripts based on check details and handler criteria

Checks: reacter check

Check scripts are consistent with the Nagios Plugin API. A check can be any shell-executable program that exits with status 0 (OK), 1 (Warning), 2 (Critical), or 3+ (Unknown). Plugin output and performance data are parsed from the check's standard output.
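The Go sketch below illustrates this check contract. It runs a command and treats the exit status as the check state. It then splits the first line of standard output on `|` into plugin output and performance data, following the usual Nagios plugin convention. The names `RunCheck`, `ParseOutput` and `StateName` are hypothetical and are not part of Reacter. Multi-line plugin output and timeouts are deliberately left out.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// CheckResult is a hypothetical container for one check execution.
type CheckResult struct {
	State    int    // exit status: 0 OK, 1 Warning, 2 Critical, 3+ Unknown
	Output   string // plugin output (text before the first '|')
	PerfData string // performance data (text after the first '|'), if any
}

// StateName maps an exit status to the names used for REACTER_STATE.
func StateName(state int) string {
	switch state {
	case 0:
		return "okay"
	case 1:
		return "warning"
	case 2:
		return "critical"
	default: // 3+ (and anything unexpected, e.g. -1 when killed by a signal)
		return "unknown"
	}
}

// ParseOutput splits the first line of stdout into output and perfdata.
func ParseOutput(stdout string) (output, perf string) {
	first := strings.SplitN(stdout, "\n", 2)[0]
	parts := strings.SplitN(first, "|", 2)
	output = strings.TrimSpace(parts[0])
	if len(parts) == 2 {
		perf = strings.TrimSpace(parts[1])
	}
	return output, perf
}

// RunCheck executes argv and converts its exit status and stdout into a CheckResult.
func RunCheck(argv []string) (CheckResult, error) {
	var stdout bytes.Buffer
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stdout = &stdout

	state := 0
	if err := cmd.Run(); err != nil {
		var exitErr *exec.ExitError
		if !errors.As(err, &exitErr) {
			return CheckResult{}, err // the program could not be started at all
		}
		state = exitErr.ExitCode() // a non-zero exit is a result, not a failure
	}
	output, perf := ParseOutput(stdout.String())
	return CheckResult{State: state, Output: output, PerfData: perf}, nil
}

func main() {
	res, err := RunCheck([]string{"sh", "-c", "echo 'DISK WARNING - 12% free | /=88%'; exit 1"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %q perf=%q\n", StateName(res.State), res.Output, res.PerfData)
}
```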

Configuration

Checks are configured via a YAML file placed in a directory that Reacter will load the definitions from (specified via the --config-dir flag). An example check definition looks like the following:

---
checks:
- name:                'my_cool_check'
  command:             ['my_cool_check', '--warning', '10', '--critical', '5']
  directory:           '/usr/local/bin'
  interval:            30
  timeout:             5000
  fall:                3
  rise:                2
  flap_threshold_high: 0.35
  flap_threshold_low:  0.15
  environment:
    HOME:      '/srv/home'
    OTHER_VAR: 6
    IS_COOL:   true

The configuration consists of a top-level checks array populated with one or more check definitions. Check definition fields are:

| Field               | Type             | Required | Default | Description |
|---------------------|------------------|----------|---------|-------------|
| name                | String           | Yes      |         | The name of the check |
| command             | Array(String)    | Yes      |         | The command, expressed as an array of the command and its command-line parameters |
| directory           | String           | No       | $(pwd)  | The working directory to use when executing the command |
| interval            | Integer          | No       | 60      | How often (in seconds) to execute the check |
| timeout             | Integer          | No       | 3000    | How long (in milliseconds) to wait for the check to finish before killing it |
| fall                | Integer          | No       | 1       | How many checks need to fail before the change in status is reported |
| rise                | Integer          | No       | 1       | How many checks need to succeed after failing before the check is reported as okay |
| environment         | Hash(String,Any) | No       |         | Key-value pairs passed to the command as environment variables; these replace the calling shell's environment |
| flap_threshold_high | Float            | No       | 0.5     | The instability level (0.0-1.0) at which a check starts flapping |
| flap_threshold_low  | Float            | No       | 0.25    | The instability level (0.0-1.0) a flapping check must drop to before it stops flapping |
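The README does not say how the flap factor is computed. The sketch below rests on three assumptions:

- The factor is the fraction of okay/non-okay transitions across the most recent observations.
- The window size corresponds to `observations.size`.
- The two thresholds act as hysteresis. Flapping starts at or above `flap_threshold_high` and stops at or below `flap_threshold_low`.

`FlapDetector` is a hypothetical name. Reacter's real formula may weight observations differently; Nagios, for example, weights recent state changes more heavily.

```go
package main

import "fmt"

// FlapDetector is a hypothetical flap detector; Reacter's formula may differ.
type FlapDetector struct {
	Size     int     // number of recent observations kept (cf. observations.size)
	High     float64 // start flapping at or above this factor (flap_threshold_high)
	Low      float64 // stop flapping at or below this factor (flap_threshold_low)
	Flapping bool

	okay []bool // recent observations: true if the state was 0 (OK)
}

// Factor is the fraction of adjacent observations that switch between
// okay and non-okay, in the range 0.0-1.0.
func (f *FlapDetector) Factor() float64 {
	if len(f.okay) < 2 {
		return 0
	}
	transitions := 0
	for i := 1; i < len(f.okay); i++ {
		if f.okay[i] != f.okay[i-1] {
			transitions++
		}
	}
	return float64(transitions) / float64(len(f.okay)-1)
}

// Observe records one check state and updates Flapping with hysteresis.
func (f *FlapDetector) Observe(state int) bool {
	f.okay = append(f.okay, state == 0)
	if len(f.okay) > f.Size {
		f.okay = f.okay[len(f.okay)-f.Size:] // keep only the newest Size entries
	}
	factor := f.Factor()
	if !f.Flapping && factor >= f.High {
		f.Flapping = true
	} else if f.Flapping && factor <= f.Low {
		f.Flapping = false
	}
	return f.Flapping
}

func main() {
	// Thresholds taken from the example check definition above.
	d := &FlapDetector{Size: 21, High: 0.35, Low: 0.15}
	for _, s := range []int{0, 2, 0, 2, 0, 2, 0, 2} {
		d.Observe(s)
	}
	fmt.Printf("factor=%.2f flapping=%v\n", d.Factor(), d.Flapping)
}
```

Using separate start and stop thresholds keeps a check whose factor hovers near a single cutoff from toggling its flapping status on every run.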

Publication

Check results are emitted to standard output for consumption by the reacter handle invocation of this utility, or by another service or program. One intended use case is to emit results and HTTP POST them to a web service, which enqueues the messages to an AMQP message broker for later consumption by handlers.

The output format of a check is as follows:

{
  "check":{
    "node_name":"myhost",
    "name":"my_cool_check",
    "command":["my_cool_check", "--warning", "10", "--critical", "5"],
    "timeout": 5000,
    "enabled":true,
    "state": 0,
    "hard":  true,
    "changed": true,
    "interval": 30,
    "rise": 2,
    "fall": 3,
    "observations": {
      "size": 21,
      "flapping": false,
      "flap_detection": true,
      "flap_threshold_low": 0.15,
      "flap_threshold_high":0.35,
      "flap_factor": 0
    }
  },
  "output": "OK",
  "error": false,
  "timestamp":"1970-01-01T12:59:00.000000000-04:00"
}

The example above is pretty-printed for readability. reacter check actually emits each result as a single line, one line per check execution. Some of the fields in the output are described below.

| Field                          | Type    | Description |
|--------------------------------|---------|-------------|
| check.node_name                | String  | The hostname of the host the check executed on, or the value of --node-name |
| check.enabled                  | Boolean | Whether the check is enabled |
| check.state                    | Integer | The exit status of the check script |
| check.hard                     | Boolean | false while a check is rising or falling (the reported status remains unchanged until then); true otherwise |
| check.changed                  | Boolean | true if the check's previous state differs from its current state |
| error                          | Boolean | true if the check script experienced an error that prevented execution |
| check.observations.flapping    | Boolean | true if the check is oscillating between an okay and a non-okay state |
| check.observations.size        | Integer | How many of the most recent check states are stored in memory for flap detection |
| check.observations.flap_factor | Float   | The current flap factor, which is compared to the high and low thresholds to determine whether the check is flapping |
| output                         | String  | The standard output captured from the check script's execution |
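The interaction between `fall`, `rise`, `check.hard` and `check.changed` can be sketched as a small state tracker. The sketch makes three assumptions:

- The required results must be consecutive.
- `fall` governs any move into a non-okay state, and `rise` governs a return to okay.
- `changed` is reported only when the hard state actually changes.

`StateTracker` is a hypothetical name, and Reacter's own bookkeeping may differ.

```go
package main

import "fmt"

// StateTracker is a hypothetical rise/fall tracker for a single check.
type StateTracker struct {
	Rise, Fall int // results required before reporting a recovery / a failure

	hard    int // last confirmed ("hard") state; starts at 0 (OK)
	pending int // candidate state that is rising or falling
	count   int // consecutive results seen for pending
}

// Report is what would be published for one execution.
type Report struct {
	State   int  // reported state (stays at the hard state while soft)
	Hard    bool // false while rising or falling
	Changed bool // true only when the hard state changes
}

// Observe feeds one raw exit status into the tracker.
func (t *StateTracker) Observe(state int) Report {
	if state == t.hard {
		t.count = 0 // any pending transition is abandoned
		return Report{State: t.hard, Hard: true}
	}
	if t.count > 0 && state == t.pending {
		t.count++
	} else {
		t.pending, t.count = state, 1
	}
	need := t.Fall
	if state == 0 {
		need = t.Rise // returning to OK uses rise
	}
	if t.count >= need {
		t.hard, t.count = state, 0
		return Report{State: state, Hard: true, Changed: true}
	}
	return Report{State: t.hard, Hard: false}
}

func main() {
	t := &StateTracker{Rise: 2, Fall: 3} // values from the example check definition
	for _, s := range []int{2, 2, 2, 0, 0, 0} {
		r := t.Observe(s)
		fmt.Printf("raw=%d reported=%d hard=%v changed=%v\n", s, r.State, r.Hard, r.Changed)
	}
}
```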

Handlers: reacter handle

Handlers are executed in response to check results read from standard input. Handler definitions specify the conditions under which a handler will be executed. These conditions include the node name, the check name, the state, whether the check is flapping, and whether the check has changed state. Using these conditions, a handler can be executed for only a subset of the check results as they stream in. Multiple handlers can respond to the same result, because each result is evaluated against every handler definition as it is processed.

Configuration

Handlers, like checks, are configured via a YAML file placed in a directory that Reacter will load the definitions from (specified via the --config-dir flag). An example handler definition looks like the following:

---
handlers:
- name:                'my_team_slack_chat'
  command:             ['reacter-slack']
  timeout:             6000
  directory:           '/usr/local/bin'
  query:               ['bash', '-c', 'get_my_nodes > /tmp/node-list.txt']
  query_timeout:       3000
  nodefile:            /tmp/node-list.txt

  node_names:
  - my_node1
  - my_node2

  checks:
  - my_cool_check

  flapping: false
  only_changes: true

  parameters:
    token:   abc123def456
    channel: my-channel

  environment:
    HOME:      '/srv/home'

The configuration consists of a top-level handlers array populated with one or more handler definitions. Handler definition fields are:

| Field         | Type             | Required | Default | Description |
|---------------|------------------|----------|---------|-------------|
| checks        | Array(String)    | No       |         | A list of check names to respond to |
| command       | Array(String)    | Yes      |         | The handler command, expressed as an array of the command and its command-line parameters |
| cooldown      | Duration         | No       | 3000    | How long to wait after the handler has fired before firing again |
| directory     | String           | No       | $(pwd)  | The working directory to use when executing the command |
| disable       | Boolean          | No       | false   | Whether to disable the handler |
| environment   | Hash(String,Any) | No       |         | Key-value pairs passed to the handler command as environment variables; these replace the calling shell's environment |
| name          | String           | Yes      |         | The name of the handler |
| node_names    | Array(String)    | No       |         | A list of nodes to respond to (overrides query and nodefile) |
| nodefile      | String           | No       |         | A path to a file containing a list of nodes to respond to |
| only_changes  | Boolean          | No       | false   | Whether to handle only state changes (uses the check result's changed field) |
| parameters    | Hash(String,Any) | No       |         | Key-value pairs passed to the handler command as environment variables prefixed with REACTER_PARAM_ |
| query_timeout | Duration         | No       | 3000    | How long to wait for the query command to finish before killing it |
| query         | Array(String)    | No       |         | A command, executed before the handler, that returns a list of nodes to respond to |
| skip_flapping | Boolean          | No       | true    | Whether to skip results from flapping checks |
| skip_ok       | Boolean          | No       | false   | Whether to handle only checks in a non-okay state |
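Since every result is evaluated against every handler definition, matching amounts to one filter per handler. The sketch below covers `disable`, `checks`, `node_names`, `only_changes`, `skip_flapping` and `skip_ok`. It assumes that an empty `checks` or `node_names` list matches everything. It leaves out `query`/`nodefile` resolution and `cooldown`. All type and function names, and the second handler `pager`, are hypothetical.

```go
package main

import "fmt"

// Handler holds the filtering fields of a handler definition (hypothetical type).
// skip_flapping defaults to true in the table above, but Go's zero value is
// false, so a loader would need to set SkipFlapping explicitly.
type Handler struct {
	Name         string
	Checks       []string // empty: respond to every check (assumption)
	NodeNames    []string // empty: respond to every node (assumption)
	Disable      bool
	OnlyChanges  bool
	SkipFlapping bool
	SkipOK       bool
}

// Result carries the check-result fields the filters look at.
type Result struct {
	NodeName string
	Check    string
	State    int
	Changed  bool
	Flapping bool
}

func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// Matches reports whether h should be executed for r.
func (h Handler) Matches(r Result) bool {
	switch {
	case h.Disable:
		return false
	case len(h.Checks) > 0 && !contains(h.Checks, r.Check):
		return false
	case len(h.NodeNames) > 0 && !contains(h.NodeNames, r.NodeName):
		return false
	case h.OnlyChanges && !r.Changed:
		return false
	case h.SkipFlapping && r.Flapping:
		return false
	case h.SkipOK && r.State == 0:
		return false
	}
	return true
}

// Dispatch evaluates one result against every handler; several may match.
func Dispatch(handlers []Handler, r Result) []string {
	var matched []string
	for _, h := range handlers {
		if h.Matches(r) {
			matched = append(matched, h.Name)
		}
	}
	return matched
}

func main() {
	handlers := []Handler{
		{Name: "my_team_slack_chat", Checks: []string{"my_cool_check"},
			NodeNames: []string{"my_node1", "my_node2"}, OnlyChanges: true, SkipFlapping: true},
		{Name: "pager", SkipOK: true, SkipFlapping: true},
	}
	r := Result{NodeName: "my_node1", Check: "my_cool_check", State: 2, Changed: true}
	fmt.Println(Dispatch(handlers, r))
}
```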

Handler Scripts

Handler scripts are executed only when a handler definition's conditions are met. These scripts can do anything you need to respond to a check result, such as sending a PagerDuty alert, posting a notification to a Slack channel, or forwarding check data to a time series database. Handler scripts are called with several well-known environment variables that the handler can use to provide context-specific details about the check result being handled. These variables include:

| Environment Variable   | Description |
|------------------------|-------------|
| REACTER_CHECK_ID       | The check's node name and check name, joined by a colon (:). This uniquely identifies a check from a specific node, for services that require stateful information to clear events after they are first generated. |
| REACTER_CHECK_NAME     | The name of the check being handled |
| REACTER_CHECK_NODE     | The node name that the check was emitted from (corresponds to --node-name from reacter check) |
| REACTER_EPOCH          | The epoch time of the check event (seconds since Jan 1 1970) |
| REACTER_EPOCH_MS       | The epoch time of the check event (milliseconds since Jan 1 1970) |
| REACTER_HANDLER        | The name of the handler as defined in the handler definition configuration |
| REACTER_STATE          | The state of the check result being handled; one of "okay", "warning", "critical", or "unknown" |
| REACTER_STATE_CHANGED  | 0 if the state is unchanged, 1 if the check's state has changed |
| REACTER_STATE_FLAPPING | 0 if the check is not flapping, 1 if it is |
| REACTER_STATE_HARD     | 0 if the check is rising or falling, 1 if the check is in a hard state |
| REACTER_STATE_ID       | The numeric exit status of the check result emitted by the check script |
| REACTER_PARAM_*        | One variable for each entry in the handler definition's parameters hash; all keys are converted to uppercase |

Node Queries and Caching Features
