Giter Club home page Giter Club logo

mehsh's Introduction

Mehsh

Mehsh allows to monitor the network between N servers for failures. It sends UDP packets from server to server and measures how many packets have disappeared.

Mehsh allows more complex configurations. So it is possible to define the complete server landscape in a configuration file and to specify the current server when starting Mehsh (default is the hostname).

Download

You can download the latest binaries from the releases page or use these permalinks for the latest version:

Example 1 (works local)

mehsh.toml:

# you have to group servers into groups, 
# in practive this might be "loadbalancers", "webservers", "storageservers", ... 
# to make this example as simple as possible we've just one group named all
[[group]]
name = "all"

# you've define to define every server, when you've 100 servers you would end up 
# with 100 [[server]] blocks. Every server has a name, an ip and is part of N groups.
[[server]]
name = "local"
ip = "127.0.0.1"
datacenter = "fra.dc11" # optional, datacenter of the server. great to see if a whole datacenter has issues
groups = ["all"]
extra1 = "some extra value you could use in placeholders"
extra2 = "same like extra1"
extra3 = "same like extra1"

[[server]]
name = "fail"
ip = "127.0.222.1"
datacenter = "fra.dc12"
groups = ["all"]

# now comes the real check
# we're checking with the udp_ping from "local" to "all".
# in real applications you would create one check per usecase.
# For example, one check that checks the connection of load balancers to application servers 
# and another check that checks the connection from the application server to the database server.
# And maybe another check that checks the connection between 
# the database servers to detect replication problems before they occur.
[[check]]
from = "local" # could be a servername or a group
to = "all" # could be a servername or a group
check = "udp_ping"

now you could download and run mehsh

mehsh_check example/local.toml --name=local # name is the name defined in the toml as server.name

Output In the output you can see that the server local has a problem to reach fail, but local itself is reached. in addition, you see an aggregated view for each datacenter. If you specify a "." at the datacenter, each of these blocks will be aggregated for itself.

Fields:

  • server: [from] -> [to], name of the server defined in the [[server]] block.
  • ip: Ip of the to server.
  • req: Number of udp packets that were send.
  • resp: Number of the udp packets that received a pont. (successful req)
  • max_lat: Max latency between sending and receiving a package
  • min_lat: Min latency between sending and receiving a package
  • mode: "normal" or "MAINTENANCE" if mehsh runs in Maintenance Mode.
  • loss: Number of packages lost. results from the calculation of req - resp
2022-03-25 09:56:37 server: local -> fail, ip: 127.0.222.1, req: 161, resp: 0, max_lat: None, min_lat: None, mode: normal, loss: 161, withloss
2022-03-25 09:56:37 server: local -> local, ip: 127.0.0.1, req: 161, resp: 161, max_lat: Some(243), min_lat: Some(51), mode: normal, loss: 0, withoutloss
2022-03-25 09:56:37 datacenter: fra.dc11 -> fra, req: 322, resp: 161, max_lat: None, min_lat: None, mode: normal, loss: 161, withloss
2022-03-25 09:56:37 datacenter: fra.dc11 -> fra.dc12, req: 161, resp: 0, max_lat: None, min_lat: None, mode: normal, loss: 161, withloss
2022-03-25 09:56:37 datacenter: fra.dc11 -> fra.dc11, req: 161, resp: 161, max_lat: Some(243), min_lat: Some(51), mode: normal, loss: 0, withoutloss
# repeats every second.

Automatically run network analysis tools in case of losses

Some hosters (e.g. hetzner) want you to send them analysis with certain tools in case of network issues. A common tool is mtr but theoretically it could be any tool.

When you want to execute tools like mtr using Mehsh you might need to run Mehsh as root because mtr needs to run as root.

[[analysis]]
from = "local" # server from
to = "all" # server to
name = "mtr" # just a name
min_loss = 20
# the command you want to run.
# variables:
# {{server.from.ip}}
# {{server.from.extra1}}
# {{server.from.extra2}}
# {{server.from.extra3}}
# {{server.to.ip}}
# {{server.to.extra1}}
# {{server.to.extra2}}
# {{server.to.extra3}}
command = "mtr -s 1000 -r -c 1000 {{server.to.ip}}"

Mehsh will execute the command and create a file with the output in /tmp/mehsh/[NAME]/[SERVER_TO]/[DATE_TIME].txt. The output is also visible in Mehsh stdout.

if you want to run your analysis tool against another ip you can use the extra fields to store the other ip there and access it with {{server.to.extra1}}. This is useful if mehsh is running in a VPN (wireguard etc.) but you need to run the diagnostics against the external ip.

Example 2 (lamp stack)

  graph TD;
      loadbalancers-->applicationservers;
      applicationservers-->databaseservers;
      databaseservers-->databaseservers;
      database_backup-->databaseservers;
Loading
[[group]]
name = "loadbalancers"

[[group]]
name = "applicationservers"

[[group]]
name = "databaseservers"

[[server]]
name = "loadbalancer1"
ip = "v4:127.0.2.2"
groups = ["loadbalancers"]

[[server]]
name = "loadbalancer2"
ip = "v4:127.0.2.3"
groups = ["loadbalancers"]

[[server]]
name = "applicationserver1"
ip = "v4:127.0.3.2"
groups = ["applicationservers"]

[[server]]
name = "applicationserver2"
ip = "v4:127.0.3.3"
groups = ["applicationservers"]

[[server]]
name = "databaseserver1"
ip = "v4:127.0.4.2"
groups = ["databaseservers"]

[[server]]
name = "databaseserver2"
ip = "v4:127.0.4.3"
groups = ["databaseservers"]

[[server]]
name = "databaseserver3"
ip = "v4:127.0.4.4"
groups = ["databaseservers"]

[[server]]
name = "database_backup"
ip = "v4:127.0.5.1"
groups = []

# load balancers needs to access applicationservers
[[check]]
from = "loadbalancers"
to = "applicationservers"
check = "udp_ping"

# applicationservers needs access to the database servers
[[check]]
from = "applicationservers"
to = "databaseservers"
check = "udp_ping"

# database servers must see each other
[[check]]
from = "databaseservers"
to = "databaseservers"
check = "udp_ping"

# backup server needs access to database servers
[[check]]
from = "database_backup"
to = "databaseservers"
check = "udp_ping"

now you could start mehsh on each server with the corrent --name argument. The name defaults to the hostname.

Maintenance Mode

Mehsh checks if the file /tmp/mehsh_maintenance exists. if the file exists mehsh will be put into maintenance mode. In maintenance mode, no more analysis tools are started and metrics no longer send a loss. You can create / remove the file while mehsh is running. The mode is included in the cli output.

# enable Maintenance Mode
touch /tmp/mehsh_maintenance
# disable Maintenance Mode
rm /tmp/mehsh_maintenance

mehsh's People

Contributors

derklaro avatar silas-91 avatar timglabisch avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.