Comments (3)
Since Prometheus will only gather numeric metrics, there are some things to consider when modeling the metrics.
The Sovrin Network name being monitored
Should be a label attached to all metrics.
Node alias name
Should also be a label for each node we get (the nodes being the top-level objects in the dict we are currently fetching).
Detect when a node is inaccessible and produce standard output for that situation.
This would happen outside of the exporter, either in Prometheus through Alertmanager, or in Grafana.
The number of transactions per Indy ledger, especially the domain ledger.
Should work as a Gauge `transactions_total` with a label per ledger.
The average read and write times for the node.
Here I wonder how the values are measured. Ideally, we could just record the total requests in a Gauge and let Prometheus infer the other metrics. Otherwise, histograms for throughput might be fine; we just have to be careful to avoid statistically incorrect double aggregation (for example, averaging values that are already averages).
The uptime of the node (time since last restart).
Clearly a gauge with a label per node.
The time since last freshness check (should be less than 5 minutes).
Diff against time of the and record as Gauge?
Node IP address information
This could be a label, same as the node name.
Total nodes in pool information
Gauge with pool name as label.
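To make the modeling above more concrete, here is a minimal sketch of how a few of these gauges could look in the Prometheus text exposition format. It uses only the standard library and builds the lines by hand. The input shape (`transaction_count`, `uptime`), the helper names, and the metric names other than `transactions_total` are assumptions for illustration, not the actual exporter code.

```python
# Minimal sketch: render a few gauges as Prometheus text exposition lines.
# The status dict layout and helper names are hypothetical.

def escape_label(value):
    """Escape a label value for the Prometheus text format."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def gauge_line(name, labels, value):
    """Build one sample line such as: name{a="x",b="y"} 1"""
    label_text = ",".join(
        f'{key}="{escape_label(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{label_text}}} {value}"


def render_metrics(network, nodes):
    """nodes maps a node alias to a (hypothetical) per-node status dict."""
    lines = []
    for alias, status in nodes.items():
        # Network name and node alias are attached to every per-node metric.
        base = {"network": network, "node": alias}
        for ledger, count in status["transaction_count"].items():
            lines.append(
                gauge_line("transactions_total", {**base, "ledger": ledger}, count)
            )
        lines.append(gauge_line("node_uptime_seconds", base, status["uptime"]))
    # Stand-in for the pool size: here simply the number of nodes we fetched.
    lines.append(gauge_line("pool_nodes_total", {"network": network}, len(nodes)))
    return "\n".join(lines) + "\n"


example = render_metrics(
    "sandbox",
    {"Node1": {"transaction_count": {"domain": 10, "pool": 4}, "uptime": 42}},
)
print(example)
```

One thing to keep in mind: Prometheus naming conventions usually reserve the `_total` suffix for counters, so a gauge might end up with a different name. The sketch keeps the name from the list above.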
from indy-node-monitor.
One question regarding freshness status:
When I have a test network with 4 nodes, I get 3 freshness values, as you have posted above:
"Freshness_status": {
    "1": {
        "Last_updated_time": "2020-07-06 23:55:07+00:00",
        "Has_write_consensus": true
    },
    "0": {
        "Last_updated_time": "2020-07-06 23:57:33+00:00",
        "Has_write_consensus": true
    },
    "2": {
        "Last_updated_time": "2020-07-06 23:57:33+00:00",
        "Has_write_consensus": true
    }
}
What do these numeric keys (0, 1, 2) represent, and how should we interpret them?
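Whatever the keys turn out to mean, the "time since last freshness check" value could be derived from this structure roughly as follows. This is only a sketch: it treats the keys as opaque labels, the function name and the 5-minute constant are my own, and the reference time is passed in explicitly so the result is reproducible.

```python
from datetime import datetime, timezone

# Threshold taken from the "should be less than 5 minutes" note; name is hypothetical.
MAX_AGE_SECONDS = 5 * 60


def freshness_ages(freshness_status, now=None):
    """Return seconds elapsed since Last_updated_time for each key."""
    now = now or datetime.now(timezone.utc)
    ages = {}
    for key, entry in freshness_status.items():
        # Timestamps look like "2020-07-06 23:55:07+00:00" (timezone-aware).
        updated = datetime.fromisoformat(entry["Last_updated_time"])
        ages[key] = (now - updated).total_seconds()
    return ages


sample = {
    "1": {"Last_updated_time": "2020-07-06 23:55:07+00:00", "Has_write_consensus": True},
    "0": {"Last_updated_time": "2020-07-06 23:57:33+00:00", "Has_write_consensus": True},
    "2": {"Last_updated_time": "2020-07-06 23:57:33+00:00", "Has_write_consensus": True},
}

# Fixed reference time, one minute after the newest update in the sample.
reference = datetime(2020, 7, 6, 23, 58, 33, tzinfo=timezone.utc)
ages = freshness_ages(sample, now=reference)
stale = [key for key, age in ages.items() if age > MAX_AGE_SECONDS]
print(ages, stale)
```

Each age could then be exposed as a Gauge with the key as a label, and the 5-minute check left to an alerting rule rather than done in the exporter.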
These metrics should be available on the auto-provisioned dashboards supplied with the monitoring stack. If anything else is needed or anything is missing, a separate issue can be opened.