mon_archway

Archway node monitoring tool

Developed for the Torii testnet node. Based on our other node monitors.

To monitor your node, you should have the following installed and configured:

On the node server:

  • Telegraf

On the monitoring server:

  • InfluxDB
  • Grafana

It is possible to install the monitoring software on the node server itself. However, it is better to put it on a standalone instance with open web access, so you can watch the dashboard from a browser at any location.

The following steps will guide you through the setup process:

Monitoring server installation

InfluxDB

Install:

wget -qO- https://repos.influxdata.com/influxdb.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdb.gpg > /dev/null
export DISTRIB_ID=$(lsb_release -si); export DISTRIB_CODENAME=$(lsb_release -sc)
echo "deb [signed-by=/etc/apt/trusted.gpg.d/influxdb.gpg] https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | sudo tee /etc/apt/sources.list.d/influxdb.list > /dev/null

sudo apt update && sudo apt install influxdb

sudo systemctl enable --now influxdb

sudo systemctl start influxdb

sudo systemctl status influxdb

Set up the database (replace the example passwords with more secure ones):

influx
> create database archwaymetricsdb
> create user metrics with password 'password'
> grant WRITE on archwaymetricsdb to metrics
> create user grafana with password 'other_password'
> grant READ on archwaymetricsdb to grafana
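Rather than typing example passwords, you can generate random ones and print the same InfluxQL statements with them filled in. A minimal sketch, assuming the InfluxDB 1.x influx CLI (which accepts a statement via -execute); gen_password and influx_setup_sql are hypothetical helpers, not part of mon_archway.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a random alphanumeric password (default 24 chars).
gen_password() {
  local len="${1:-24}"
  # LC_ALL=C stops tr from rejecting non-UTF-8 bytes; "|| true" ignores the
  # SIGPIPE tr receives once head has read enough characters.
  { LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom || true; } | head -c "$len"
  echo
}

# Hypothetical helper: print the InfluxQL statements from the step above,
# using the given passwords instead of the example values.
influx_setup_sql() {
  local metrics_pw="$1" grafana_pw="$2"
  cat <<EOF
create database archwaymetricsdb
create user metrics with password '${metrics_pw}'
grant WRITE on archwaymetricsdb to metrics
create user grafana with password '${grafana_pw}'
grant READ on archwaymetricsdb to grafana
EOF
}

# Usage on the monitoring server (write the printed passwords down):
#   METRICS_PW=$(gen_password); GRAFANA_PW=$(gen_password)
#   influx_setup_sql "$METRICS_PW" "$GRAFANA_PW" |
#     while read -r stmt; do influx -execute "$stmt"; done
#   echo "metrics: $METRICS_PW  grafana: $GRAFANA_PW"
```

Generating alphanumeric passwords avoids quoting problems inside the single-quoted InfluxQL strings.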

Write down the database user and password; you will need them later for the agent configuration.

If you use a standalone instance for the monitoring stack, you need to know its external IP address (you can find it with the command curl ifconfig.me). If everything is installed on the same instance, just use localhost or 127.0.0.1.

Grafana

Install:

wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"

sudo apt update -y
sudo apt install grafana -y

sudo systemctl daemon-reload

sudo systemctl enable --now grafana-server
sudo systemctl start grafana-server

# verify the status of the Grafana service with the following command:
sudo systemctl status grafana-server

Configuration:

Open http://YOUR_MONITORING_SERVER_IP:3000 in a browser to set up the Grafana dashboard. The following steps are performed in the Grafana web interface.

Change the default password of the Grafana admin user (admin/admin) to a safer one.

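Add an InfluxDB data source with the following settings:

Section          | Param    | Value
HTTP             | URL      | http://localhost:8086
InfluxDB Details | Database | archwaymetricsdb
InfluxDB Details | User     | grafana
InfluxDB Details | Password | the password you set for the grafana user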

Save the data source settings.
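As an alternative to the UI steps, Grafana can load data sources from YAML provisioning files in /etc/grafana/provisioning/datasources/. A sketch, assuming Grafana's provisioning format for an InfluxQL data source; the data source name InfluxDB-archway, the file name archway-influxdb.yaml and the helper write_grafana_datasource are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Grafana data source provisioning file.
# $1 = output file, $2 = grafana DB user's password, $3 = InfluxDB URL
write_grafana_datasource() {
  local out="$1" pw="$2" url="${3:-http://localhost:8086}"
  cat > "$out" <<EOF
apiVersion: 1
datasources:
  - name: InfluxDB-archway
    type: influxdb
    access: proxy
    url: ${url}
    database: archwaymetricsdb
    user: grafana
    secureJsonData:
      password: '${pw}'
EOF
}

# Usage (Grafana reads provisioning files at startup):
#   tmp=$(mktemp)
#   write_grafana_datasource "$tmp" 'other_password'
#   sudo install -m 0640 -o root -g grafana "$tmp" \
#     /etc/grafana/provisioning/datasources/archway-influxdb.yaml
#   sudo systemctl restart grafana-server
```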

Import the dashboard JSON file from this repo and save your dashboard.

Installation on a node

Manual installation

Install telegraf

sudo apt update
sudo apt -y install curl jq bc

# install telegraf
cat <<EOF | sudo tee /etc/apt/sources.list.d/influxdata.list
deb https://repos.influxdata.com/ubuntu bionic stable
EOF
curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -

sudo apt update
sudo apt -y install telegraf

sudo systemctl enable --now telegraf
sudo systemctl is-enabled telegraf

# add the telegraf user to the sudo and adm groups so it can run scripts as the node user
sudo adduser telegraf sudo
sudo adduser telegraf adm
sudo -- bash -c 'echo "telegraf ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers'
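Appending to /etc/sudoers with echo works, but a syntax error in that file can break sudo entirely. A commonly used safer pattern is a separate file in /etc/sudoers.d, syntax-checked with visudo -cf before it is installed. A sketch granting the same rule as the echo line (use it instead of that line, not in addition); the file name 90-telegraf and both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the sudo rule used above (same privileges).
telegraf_sudoers_rule() {
  printf '%s\n' 'telegraf ALL=(ALL) NOPASSWD:ALL'
}

# Hypothetical helper: validate the rule in a temp file, then install it.
install_telegraf_sudoers() {
  local tmp rc=0
  tmp=$(mktemp) || return 1
  telegraf_sudoers_rule > "$tmp"
  # visudo -c -f only checks syntax; nothing is installed if the check fails
  if sudo visudo -cf "$tmp"; then
    # sudoers.d files must not contain a dot and should be mode 0440
    sudo install -m 0440 -o root -g root "$tmp" /etc/sudoers.d/90-telegraf || rc=$?
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}

# Usage:  install_telegraf_sudoers && echo "sudo rule installed"
```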

You can check the Telegraf service status:

sudo systemctl status telegraf

The status may not be OK with Telegraf's default config. The next steps fix this.

Clone this project repo and copy the variables script template:

git clone https://github.com/shurinov/mon_archway.git
cd mon_archway
cp template_mon_var.sh mon_var.sh
nano mon_var.sh

Insert your parameters into mon_var.sh:

  • full path to the node binary in COS_BIN_NAME (check with: which archwayd)
  • node RPC port in COS_PORT_RPC (check in path_to_archway_node_config/config/config.toml)
  • node API port in COS_PORT_API (check in path_to_archway_node_config/config/app.toml)
  • validator operator address in COS_VALOPER (like archwayvaloper********)
  • validator wallet address in COS_WALADDR
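The RPC and API ports can also be read from the node's config files instead of looked up by hand. A sketch, assuming the usual Cosmos SDK layout where config.toml has laddr under [rpc] and app.toml has address under [api]; toml_port is a hypothetical helper and ~/.archway is only an assumed node home path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the port of a `key = "tcp://host:port"` entry
# inside a given [section] of a TOML file.
# $1 = file, $2 = section name without brackets, $3 = key
toml_port() {
  awk -v section="[$2]" -v key="$3" '
    /^\[/ { in_sec = ($1 == section) }   # entering a new [section]
    in_sec && $1 == key {
      line = $0
      gsub(/"/, "", line)               # drop the quotes around the value
      n = split(line, parts, ":")       # the port follows the last colon
      port = parts[n]
      gsub(/[^0-9]/, "", port)          # keep only the digits
      print port
      exit
    }' "$1"
}

# Usage (adjust the node home path to yours):
#   toml_port ~/.archway/config/config.toml rpc laddr   # value for COS_PORT_RPC
#   toml_port ~/.archway/config/app.toml api address    # value for COS_PORT_API
```

Tracking the current section matters because the same key appears elsewhere, for example laddr under [p2p] and address under [grpc].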

Save the changes in mon_var.sh and make the scripts executable:

chmod +x monitor.sh mon_var.sh
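Before wiring the script into Telegraf, you can check that every variable listed above is set. A sketch, assuming mon_var.sh is a plain shell file that assigns these variables (as its name and the template suggest); check_mon_vars is a hypothetical helper, not part of the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source a variables file in a subshell and report
# every required mon_archway variable that is empty or missing.
# Returns 0 if all are set, 1 if any is missing, 2 if the file fails to load.
check_mon_vars() {
  local file="$1"
  (
    # shellcheck disable=SC1090
    . "$file" || exit 2
    missing=0
    for v in COS_BIN_NAME COS_PORT_RPC COS_PORT_API COS_VALOPER COS_WALADDR; do
      if [ -z "${!v:-}" ]; then   # ${!v} reads the variable named by $v
        echo "missing: $v"
        missing=1
      fi
    done
    exit "$missing"
  )
}

# Usage:  check_mon_vars ./mon_var.sh && echo "mon_var.sh looks complete"
```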

Edit telegraf configuration

sudo mv /etc/telegraf/telegraf.conf /etc/telegraf/telegraf.conf.orig
sudo nano /etc/telegraf/telegraf.conf

Paste the following into the config and set your server name in hostname (the node moniker is a convenient choice):

[agent]
  hostname = "YOUR_MONIKER/SERVER_NAME" # set this to a name you want to identify your node in the grafana dashboard
  flush_interval = "15s"
  interval = "15s"
# Input Plugins
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
[[inputs.disk]]
  ignore_fs = ["devtmpfs", "devfs"]
[[inputs.io]]
[[inputs.mem]]
[[inputs.net]]
[[inputs.nstat]]
[[inputs.system]]
[[inputs.swap]]
[[inputs.netstat]]
[[inputs.linux_sysctl_fs]]
[[inputs.processes]]
[[inputs.interrupts]]
[[inputs.kernel]]
[[inputs.diskio]]
# Output Plugin InfluxDB
[[outputs.influxdb]]
  database = "archwaymetricsdb"
  urls = [ "MONITORING_SERV_URL:PORT" ] # example http://yourownmonitoringnode:8086
  username = "DB_USERNAME" # your database username
  password = "DB_PASSWORD" # your database user's password
	
[[inputs.exec]]
  commands = ["sudo su -c /PATH/TO/ARCHWAY_MON/monitor.sh -s /bin/bash <ARCHWAY_USER>"] # set the path to your mon_archway directory and the user account your validator runs under
  interval = "15s"
  timeout = "5s"
  data_format = "influx"
  data_type = "integer"
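With data_format = "influx", Telegraf expects the exec command to print metrics in InfluxDB line protocol: a measurement name, optional comma-separated tags, a space, then comma-separated fields, where integer values carry an i suffix. If the timestamp is omitted, Telegraf uses the collection time. A sketch of a printer in that format; emit_metric, the measurement archway_example and the field names are hypothetical and do not describe monitor.sh's actual output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one line-protocol record with integer fields.
# $1 = measurement, $2 = tag set ("" for none), remaining args = key=value
emit_metric() {
  local measurement="$1" tags="$2"
  shift 2
  local fields="" kv
  for kv in "$@"; do
    fields+="${fields:+,}${kv}i"   # the i suffix marks an integer field
  done
  printf '%s%s %s\n' "$measurement" "${tags:+,$tags}" "$fields"
}

# Example:
#   emit_metric archway_example host=node1 block_height=123 peers=8
#   -> archway_example,host=node1 block_height=123i,peers=8i
```

After saving the config, restart the agent with sudo systemctl restart telegraf. Running telegraf --config /etc/telegraf/telegraf.conf --input-filter exec --test executes the exec input once and prints what it collected, which is a quick way to see whether monitor.sh output parses.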

By fast installation script

Temporarily unavailable. The script will be updated to the current version soon.

Dashboard interface

The dashboard shows the main Cosmos-based node information and common system metrics. It includes descriptions of the metrics.


Mon health

Composite parameter that indicates problems with receiving metrics from the node. The normal value is "OK".

Sync status

The node's catching_up parameter.

Block height

Latest block height of the node.

Time since latest block

Time in seconds between taking the metric and the node's latest block time. A value greater than 15 s may indicate a synchronization problem.
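The age of the latest block can be computed from the latest_block_time field returned by the Tendermint RPC /status endpoint. A sketch, assuming GNU date for parsing RFC 3339 timestamps; seconds_since and block_age_status are hypothetical helpers, and monitor.sh may compute the value differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds between an RFC 3339 block time and "now".
# $1 = block time, e.g. 2022-05-10T12:00:00.123456789Z
# $2 = current time in epoch seconds (optional, defaults to the clock)
seconds_since() {
  local ts now block_epoch
  ts=$(printf '%s' "$1" | sed -E 's/\.[0-9]+//')   # drop fractional seconds
  now="${2:-$(date +%s)}"
  block_epoch=$(date -u -d "$ts" +%s) || return 1  # GNU date syntax
  echo $(( now - block_epoch ))
}

# Hypothetical helper: apply the 15 s rule of thumb described above.
block_age_status() {
  local age="$1" limit="${2:-15}"
  if (( age > limit )); then echo "WARN"; else echo "OK"; fi
}

# Usage on the node (RPC port as in COS_PORT_RPC):
#   t=$(curl -s localhost:26657/status | jq -r .result.sync_info.latest_block_time)
#   block_age_status "$(seconds_since "$t")"
```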

Peers

Number of connected peers

Jailed status

Validator jailed status.

Missed blocks

Number of missed blocks in a running window of 100 blocks. If the validator misses more than 50 blocks in this window, it will be jailed.
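The remaining margin before jailing is simple arithmetic on the missed-blocks value. A sketch using the 50-of-100 limit stated above; on a real chain these numbers come from the slashing parameters (signed_blocks_window, min_signed_per_window), which you can inspect with archwayd query slashing params. misses_left is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many more blocks may be missed in the current
# window before the "more than 50 missed" limit above is exceeded.
# $1 = missed blocks in the window, $2 = max allowed misses (default 50)
misses_left() {
  local missed="$1" max="${2:-50}"
  local left=$(( max - missed ))
  if (( left < 0 )); then
    echo "limit exceeded (jail expected)"
  else
    echo "$left"
  fi
}

# Example: misses_left 10 -> 40
```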

Bonded status

Validator stake bonding status.

Voting power

Validator voting power. If this value is zero, your node is not in the active set of validators.

Delegated tokens

Number of delegated tokens

Version

Version of node binary

Vali Rank

Your validator's rank by stake.

Active validator numbers

Total number of active validators
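A stake rank can be understood as your position in the list of all validators' stakes sorted in descending order. A sketch of that computation, assuming you already have one integer stake per line; stake_rank is hypothetical, and monitor.sh may rank ties or filter validators differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank of a stake among all validators' stakes.
# Reads one integer stake (in base units) per line on stdin.
# Rank = 1 + number of validators with a strictly larger stake.
# Compares by digit count, then as strings, so large token amounts are not
# rounded the way floating-point awk numbers would be.
stake_rank() {
  awk -v mine="$1" '
    function bigger(a, b) {
      if (length(a) != length(b)) return (length(a) > length(b))
      return ((a "") > (b ""))
    }
    bigger($1, mine) { n++ }
    END { print n + 1 }'
}

# Usage (the validators query may be paginated on large sets):
#   archwayd query staking validators --output json |
#     jq -r '.validators[].tokens' | stake_rank "$MY_TOKENS"
```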

Other common system metrics: CPU/RAM/FS load, etc. These need no further comment.
