Giter Club home page Giter Club logo

redalert's Introduction

Redalert

Circle CI

Launch Stack

For monitoring your infrastructure and sending notifications if stuff is not ok. (e.g. pinging your websites/APIs via HTTP GET at specified intervals, and alerting you if there is downtime).

Features

Checks

  • Website monitoring & latency measurement (check type: web-ping)
  • Server metrics from local machine (check type: scollector)
  • Docker container metrics (check type: docker-stats)
  • Docker container metrics from remote host via SSH (check type: remote-docker)
  • Postgres counts/stats via SQL queries (check type: postgres)
  • TCP connectivity monitoring & latency measurement (check type: tcp)
  • Execute local commands & capture output (check type: command)
  • Execute remote commands via SSH & capture output (check type: remote-command)
  • Run test suite and capture report metrics via JUnit XML format (check type: test-report)

Checks will happen at specified intervals or explicit trigger (i.e. trigger check API endpoint).

Dashboard and Alerts

  • Alert notifications available on several channels:
    • sending email (gmail)
    • sending SMS (twilio)
    • posting a message to Slack (slack)
    • unix stream (stderr)
  • Provides ping status & latency info to stdout.
  • Adjustable back-off after a check fails (constant, linear, exponential - see notes below).
  • Includes a web UI as indicated by the screenshot above. (visit localhost:8888/, configure port via cli flag)
  • Triggers a failure alert (redalert) when a check is failing, and a recovery alert (greenalert) when the check has recovered (e.g. a successful ping, following a failing ping).
  • Triggers an alert when specified metric is above/below threshold.

Assertions

  • Assertions are used to define criteria for checks to pass or fail:
  • Assert on metrics
    • source: metric
    • > or greater than
    • >= or greater than or equal
    • < or less than
    • <= or less than or equal
    • == or = or equals
  • Assert on metadata
    • source: metadata
    • web-ping returns status_code
  • Assert on response
    • source: text
    • source: json

API

Endpoint Description
GET /v1/stats Retrieve stats for all checks
POST /v1/checks/{check_id}/disable Disable check
POST /v1/checks/{check_id}/enable Enable check
POST /v1/checks/{check_id}/trigger Trigger check

Design


         ┌──────────────────────────────┐
         │                              │
   ┌────▶│     Redalert Check Flow      │
   │     │                              │
   │     └──────────────────────────────┘
   │                    │
   │          @interval or ->trigger   ┌──────────────────────┐
   │                    │            ┌▶│  error during check  │
   │                    ▼            │ └──────────────────────┘
   │        ┌──────────────────────┐ │ ┌──────────────────────┐
   │        │  is check failing?   │─┤ │  failing assertions  │
   │        └──────────────────────┘ │ │     * metrics *      │
   │                    │            └▶│     * metadata *     │
   │          ┌───YES───┴───NO────┐    │     * response *     │
   │          │                   │    └──────────────────────┘
   │          ▼                   ▼
   │  ┌───────────────┐   ┌───────────────┐
   │  │send alerts via│   │   is check    │
   │  │   notifiers   │   │  recovering?  │
   │  └───────────────┘   └───────────────┘
   │  ┌───────────────┐          YES
   │  │adjust backoff │           │
   │  └───────────────┘           ▼
   │          │           ┌───────────────┐
   │          │           │send alerts via│
   │          │           │   notifiers   │
   │          │           └───────────────┘
   │          │           ┌───────────────┐
   │          │           │ reset backoff │
   │          │           └───────────────┘
   │          │                   │
   │          ▼                   ▼
   │         ┌──────────────────────┐
   └─────────│    Event Storage     │
             └──────────────────────┘

Screenshots

Getting started

Run via Docker:

docker run -d -P -v /path/to/config.json:/config.json jonog/redalert

Quick bootstrap example:

curl https://gist.githubusercontent.com/jonog/32c953aedf03edf71acaef53d89ce785/raw/e87f7e933165574e1d441781465223bfe6c3f1aa/sample_redalert_config.json > /tmp/sample_redalert_config.json && \
    docker run -d -P -v /tmp/sample_redalert_config.json:/config.json --name test_redalert jonog/redalert && \
    open "http://$(docker port test_redalert 8888)"

Usage

Get started with the redalert command:

Usage:
  redalert [command]

Available Commands:
  checks      List checks
  config-sync Sync file and database configurations
  server      Run checks and server stats
  version     Print the version number of Redalert

Flags:
  -d, --config-db string     config database url
  -f, --config-file string   config file (default "config.json")
  -s, --config-s3 string     config S3
  -u, --config-url string    config url
  -h, --help                 help for redalert
  -p, --port int             port to run web server (default 8888)
  -r, --rpc-port int         port to run RPC server (default 8889)

Use "redalert [command] --help" for more information about a command.

Configuration

Configure servers to monitor & alert settings via a configuration file:

  • a local file (specified by -f or --config-file) - defaults to config.json
  • a file remotely accessible via HTTP (specified by -u or --config-url)
  • a file hosted in an AWS S3 bucket (specified by -s or --config-s3)

TODO: document Postgres configuration option

Example config.json
{
   "checks":[
      {
         "name":"Google",
         "type": "web-ping",
         "config": {
            "address":"http://google.com"
         },
         "send_alerts": ["stderr"],
         "backoff": {
            "type": "constant",
            "interval": 10
         },
         "assertions": [
             {
                 "comparison": "==",
                 "identifier": "status_code",
                 "source": "metadata",
                 "target": "200"
             }
         ]
      }
   ],
   "notifications": []
}
Example Larger config.json
{
    "checks": [
        {
            "name": "Demo HTTP Status Check",
            "type": "web-ping",
            "config": {
                "address": "http://httpstat.us/200",
                "headers": {
                    "X-Api-Key": "ABCD1234"
                }
            },
            "send_alerts": [
                "stderr"
            ],
            "backoff": {
                "interval": 10,
                "type": "constant"
            },
            "assertions": [
                {
                    "comparison": "==",
                    "identifier": "status_code",
                    "source": "metadata",
                    "target": "200"
                }
            ]
        },
        {
            "name": "Demo Response Check",
            "type": "web-ping",
            "config": {
                "address": "http://httpstat.us/400"
            },
            "send_alerts": [
                "stderr",
                "email",
                "chat",
                "sms"
            ],
            "backoff": {
                "interval": 10,
                "type": "linear"
            },
            "assertions": [
                {
                    "comparison": "less than",
                    "identifier": "latency",
                    "source": "metric",
                    "target": "1100"
                },
                {
                    "comparison": "==",
                    "identifier": "status_code",
                    "source": "metadata",
                    "target": "400"
                },
                {
                    "comparison": "==",
                    "source": "text",
                    "target": "400 Bad Request"
                }
            ],
            "verbose_logging": true
        },
        {
            "name": "Demo Exponential Backoff",
            "type": "web-ping",
            "config": {
                "address": "http://httpstat.us/200"
            },
            "send_alerts": [
                "stderr"
            ],
            "backoff": {
                "interval": 10,
                "multiplier": 2,
                "type": "exponential"
            },
            "assertions": [
                {
                    "comparison": "==",
                    "identifier": "status_code",
                    "source": "metadata",
                    "target": "500"
                }
            ]
        },
        {
            "name": "Docker Redis",
            "type": "tcp",
            "config": {
                "host": "192.168.99.100",
                "port": 1001
            },
            "send_alerts": [
                "stderr"
            ],
            "backoff": {
                "interval": 10,
                "type": "constant"
            }
        },
        {
            "name": "Docker stats",
            "type": "docker-stats",
            "config": {},
            "send_alerts": [
                "stderr"
            ],
            "backoff": {
                "interval": 30,
                "type": "linear"
            }
        },
        {
            "name": "production-docker-host",
            "type": "remote-docker",
            "config": {
                "host": "ec2-xx-xxx-xx-xxx.ap-southeast-1.compute.amazonaws.com",
                "user": "ubuntu"
            },
            "send_alerts": [
                "stderr"
            ],
            "backoff": {
                "interval": 5,
                "type": "linear"
            }
        },
        {
            "name": "scollector-metrics",
            "type": "scollector",
            "config": {
                "host": "hostname"
            },
            "send_alerts": [
                "stderr"
            ],
            "backoff": {
                "interval": 15,
                "type": "constant"
            }
        },
        {
            "name": "production-db",
            "type": "postgres",
            "config": {
                "connection_url": "postgres://user:pass@localhost:5432/dbname?sslmode=disable",
                "metric_queries": [
                    {
                        "metric": "client_count",
                        "query": "select count(*) from clients"
                    }
                ]
            },
            "send_alerts": [
                "stderr"
            ],
            "backoff": {
                "interval": 120,
                "type": "linear"
            }
        },
        {
            "name": "README size",
            "type": "command",
            "config": {
                "command": "cat README.md | wc -l",
                "output_type": "number"
            },
            "send_alerts": [
                "stderr"
            ],
            "backoff": {
                "interval": 10,
                "type": "constant"
            }
        },
        {
            "name": "List files",
            "type": "command",
            "config": {
                "command": "ls"
            },
            "send_alerts": [
                "stderr"
            ],
            "backoff": {
                "interval": 10,
                "type": "constant"
            }
        },
        {
            "name": "SHH into docker-alpine-sshd",
            "type": "remote-command",
            "config": {
                "command": "uptime",
                "ssh_auth_options": {
                  "user": "root",
                  "password": "root",
                  "host": "localhost",
                  "port": 2222
                }
            },
            "send_alerts": [
                "stderr"
            ],
            "assertions": [
                {
                    "comparison": "==",
                    "identifier": "exit_status",
                    "source": "metadata",
                    "target": "0"
                }
            ]
        },
        {
            "name": "Run Smoke Tests",
            "type": "test-report",
            "config": {
                "command": "./run-smoke-tests.sh"
            },
            "send_alerts": [
                "stderr"
            ],
            "assertions": [
                {
                    "comparison": "==",
                    "identifier": "status",
                    "source": "metadata",
                    "target": "PASSING"
                }
            ]
        }
    ],
    "notifications": [
        {
            "name": "email",
            "type": "gmail",
            "config": {
                "notification_addresses": "",
                "pass": "",
                "user": ""
            }
        },
        {
            "name": "chat",
            "type": "slack",
            "config": {
                "channel": "#general",
                "icon_emoji": ":rocket:",
                "username": "redalert",
                "webhook_url": ""
            }
        },
        {
            "name": "sms",
            "type": "twilio",
            "config": {
                "account_sid": "",
                "auth_token": "",
                "notification_numbers": "",
                "twilio_number": ""
            }
        }
    ],
    "preferences": {
        "notifications": {
          "fail_count_alert_threshold": 2,
          "repeat_fail_alerts": false
        }
    }
}

Build and run (capture stderr).

go build

./redalert 2> errors.log

Notification Preferences

  • fail_count_alert_threshold controls sending an alert, only after N fails (defaults to 1)
  • repeat_fail_alerts controls whether fail alerts are repeated, on consecutive failing checks (defaults to false)
"preferences": {
  "notifications": {
    "fail_count_alert_threshold": 2,
    "repeat_fail_alerts": false
  }
}

Backoffs

When a server check fails - the next check will be delayed according to the back-off algorithm. By default, there is no delay (i.e. constant back-off), with a default interval of 10 seconds between checks. When a failing server returns to normal, the check frequency returns to its original value.

Constant

Pinging interval will remain constant. i.e. will not provide any back-off after failure.

Linear

The pinging interval upon failure will be extended linearly. i.e. failure count x pinging interval.

Exponential

With each failure, the subsequent check will be delayed by the last delayed amount, times a multiplier, resulting in time between checks exponentially increasing. The multiplier is set to 2 by default.

Note for Gmail

If there are errors sending email via gmail - enable Access for less secure apps under Account permissions @ https://www.google.com/settings/u/2/security

Deployment

CloudFormation Stacks

See redalert-cloudformation

EC2 & ELB

Launch Stack

EC2 & ELB & S3 config

Launch Stack

Development

Setup

Dependencies:

  • Go dependency manager - glide
  • Embedding static assets into binary - go.rice
  • protoc for gRPC code generation - gRPC
  • Docker-machine for tests

Credits

Rocket emoji via https://github.com/twitter/twemoji

Next Features

See Github Issues here

redalert's People

Contributors

jacksgt avatar jonog avatar pavel-main avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

redalert's Issues

Config - Reduce log verbosity

Hi,

the web-ping check is currently quite noisy:

myservice 2018/01/30 09:50:40 GET https://example.com/v1/status/ApiListener
myservice 2018/01/30 09:50:40 Latency  1.73994

every ten seconds.

Redalert should be able to suppress these "info" logs and only output warnings / errors.

Do not write back into configuration file

Hi there,

using the file backend for configuration, redalert writes back into its own configuration file on each start. This is annoying when using configuration management such as Puppet, Ansible, Chef, etc.. as those tools are designed to ensure the system converges in a certain state.

Because redalert uses a slightly different formatting etc. the configuration file is adjusted by (e.g.) Puppet each time, only to be changed by redalert again.

https://github.com/jonog/redalert/blob/master/config/file_store.go#L45

	err = config.write()
	if err != nil {
		return nil, err
	}

An option to store the state internally without rewriting the config file would be great.

Deploy builds to Docker Hub

https://hub.docker.com/r/jonog/redalert/tags/ appears to be 2 versions behind https://github.com/jonog/redalert/releases. Adding a deploy step to the circle.yml config might help keep these tags up to date.

https://circleci.com/docs/1.0/docker/ has info about adding deploy steps to https://github.com/jonog/redalert/blob/master/circle.yml.

deployment:
  hub:
    branch: master
    commands:
      - docker login -e $DOCKER_EMAIL -u $DOCKER_USER -p $DOCKER_PASS
      - docker push jonog/redalert

undefined: grpc.SupportPackageIsVersion3

It looks like, although you're vendoring with godep, you're not vendoring gprc and there's some version conflict resulting in redalert not building cleanly.

1077)jonog/redalert % go build             
# github.com/jonog/redalert/servicepb
servicepb/service.pb.go:157: undefined: grpc.SupportPackageIsVersion3

Is there a step missing in the documentation?

Config - Load via S3

Integrate using AWS SDK to read from S3 bucket, to avoid having publicly exposed configuration in S3 buckets.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.