Comments (15)
It makes sense to provide only the most basic and important info for the first step.
Let's also check how the output fields of corosync-cmapctl differ between corosync-2.4 and corosync-3, so that we have a better idea of the options.
from ha_cluster_exporter.
@MalloZup @gao-yan Sorry for the late response. Here is the link status provided by corosync-cfgtool in corosync 3+:
sles15sp1vm2:~ # corosync-cfgtool -s
Printing link status.
Local node ID 2
LINK ID 0
addr = 192.168.122.23
status:
nodeid 1: link enabled:1 link connected:1
nodeid 2: link enabled:1 link connected:1
There is no concept of RRP and SRP in corosync 3. The links are all handled by knet, which supports up to 8 links, and different links can be given different priorities. More details from the corosync-cfgtool documentation:
-s Displays the status of the current links on this node for UDP/UDPU, with extended status for KNET. After each link, the nodes on that link are displayed in order with their status, for example there are 3 nodes with KNET transportation:
LINK ID 0:
id = 192.168.100.80
status:
node 0: link enabled: 1 link connected: 1
node 1: link enabled: 1 link connected: 1
node 2: link enabled: 1 link connected: 1
-b Displays the brief status of the current links on this node (KNET only) when used with "-s". If any interfaces are faulty, 1 is returned by the binary. If all interfaces are active 0 is returned to the shell. After each link, the nodes on that link are displayed in order with their status encoded into a single digit. 1=link enabled, 2=link connected, So a 3 in a node position indicates that the link is both enabled and connected. The local link (which will only ever be enabled on link 0) shows as enabled but not connected for internal reasons. The output will be:
LINK ID 0:
id = 192.168.100.80
status = 333
Is that helpful? Is there anything else I can help with?
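To make the two formats quoted above concrete, here is a minimal Go sketch that parses the corosync 3 `corosync-cfgtool -s` link listing and decodes one digit of the brief `-b` status. It is not the exporter's actual parser: the `LinkStatus` type and the function names `parseLinkStatus` and `decodeBriefDigit` are hypothetical, and the regular expressions assume the output looks exactly like the samples in this comment (both the `nodeid 1:` and the `node 0:` spellings are accepted).

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// LinkStatus is a hypothetical type: one node's state on one knet link.
type LinkStatus struct {
	LinkID    int
	NodeID    int
	Enabled   bool
	Connected bool
}

var (
	// Assumed line shapes, taken from the samples quoted in the thread.
	linkRe = regexp.MustCompile(`^LINK ID (\d+)`)
	nodeRe = regexp.MustCompile(`^node(?:id)?\s+(\d+):\s+link enabled:\s*(\d)\s+link connected:\s*(\d)`)
)

// parseLinkStatus walks the output line by line and attaches every node
// line to the most recently seen "LINK ID" header.
func parseLinkStatus(output string) ([]LinkStatus, error) {
	var result []LinkStatus
	currentLink := -1
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if m := linkRe.FindStringSubmatch(line); m != nil {
			id, err := strconv.Atoi(m[1])
			if err != nil {
				return nil, err
			}
			currentLink = id
			continue
		}
		if m := nodeRe.FindStringSubmatch(line); m != nil {
			if currentLink < 0 {
				return nil, fmt.Errorf("node line before any LINK ID line: %q", line)
			}
			nodeID, err := strconv.Atoi(m[1])
			if err != nil {
				return nil, err
			}
			result = append(result, LinkStatus{
				LinkID:    currentLink,
				NodeID:    nodeID,
				Enabled:   m[2] == "1",
				Connected: m[3] == "1",
			})
		}
	}
	return result, scanner.Err()
}

// decodeBriefDigit decodes one digit of the "-b" status string, where
// bit 1 means "link enabled" and bit 2 means "link connected".
func decodeBriefDigit(d byte) (enabled, connected bool, err error) {
	if d < '0' || d > '3' {
		return false, false, fmt.Errorf("unexpected brief status digit %q", d)
	}
	v := d - '0'
	return v&1 != 0, v&2 != 0, nil
}

func main() {
	sample := `Printing link status.
Local node ID 2
LINK ID 0
addr = 192.168.122.23
status:
nodeid 1: link enabled:1 link connected:1
nodeid 2: link enabled:1 link connected:1`

	links, err := parseLinkStatus(sample)
	if err != nil {
		panic(err)
	}
	for _, l := range links {
		fmt.Printf("%+v\n", l)
	}
}
```

Each `LinkStatus` could map to one labelled sample (link ID and node ID as labels), but that mapping is a design choice left open in this thread.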
from ha_cluster_exporter.
@diegoakechi to me we should consider the following.
This is an example output:
Printing ring status.
Local node ID 16777226
RING ID 0
id = 10.0.0.1
status = Marking ringid 0 interface 10.0.0.1 FAULTY
RING ID 1
id = 172.16.0.1
status = ring 1 active with no fault
Now, if we want to have a label for the ring ID, which ID should we take: RING ID 0 or id = 10.0.0.1?
Also, right now we have the total number of failures. To me this could be the minimalist approach.
If we need more precision we can add it, but if we go for the more detailed approach we don't need the total metric anymore, since that can be computed via PromQL.
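The trade-off above can be sketched in Go. The snippet below is only an illustration, not the exporter's code: the `RingStatus` type and the functions `parseRingStatus` and `countFaulty` are hypothetical, and treating any status line containing `FAULTY` as a failure is an assumption based solely on the sample output in this comment. It keeps both candidate identifiers (the `RING ID` number and the `id =` address) per ring, and shows that the total is just an aggregation over the per-ring values.

```go
package main

import (
	"fmt"
	"strings"
)

// RingStatus is a hypothetical type holding both candidate label values.
type RingStatus struct {
	ID      string // from "RING ID 0"
	Address string // from "id = 10.0.0.1"
	Faulty  bool   // assumption: status line contains "FAULTY"
}

// parseRingStatus parses corosync 2.x style "corosync-cfgtool -s" output.
func parseRingStatus(output string) []RingStatus {
	var rings []RingStatus
	for _, raw := range strings.Split(output, "\n") {
		line := strings.TrimSpace(raw)
		switch {
		case strings.HasPrefix(line, "RING ID "):
			rings = append(rings, RingStatus{ID: strings.TrimPrefix(line, "RING ID ")})
		case len(rings) == 0:
			// Header lines before the first ring are ignored.
			continue
		case strings.HasPrefix(line, "id"):
			if parts := strings.SplitN(line, "=", 2); len(parts) == 2 {
				rings[len(rings)-1].Address = strings.TrimSpace(parts[1])
			}
		case strings.HasPrefix(line, "status"):
			rings[len(rings)-1].Faulty = strings.Contains(line, "FAULTY")
		}
	}
	return rings
}

// countFaulty derives the "total" metric from the per-ring data.
func countFaulty(rings []RingStatus) int {
	total := 0
	for _, r := range rings {
		if r.Faulty {
			total++
		}
	}
	return total
}

func main() {
	sample := `Printing ring status.
Local node ID 16777226
RING ID 0
id = 10.0.0.1
status = Marking ringid 0 interface 10.0.0.1 FAULTY
RING ID 1
id = 172.16.0.1
status = ring 1 active with no fault`

	rings := parseRingStatus(sample)
	for _, r := range rings {
		fmt.Printf("ring=%s address=%s faulty=%t\n", r.ID, r.Address, r.Faulty)
	}
	fmt.Println("faulty rings total:", countFaulty(rings))
}
```

With a per-ring metric exposed, the same total would be a sum over that metric on the Prometheus side, which is why the two approaches are presented as alternatives here.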
from ha_cluster_exporter.
Indeed the output of corosync-cfgtool is kind of confusing. How about adding a new field "ring_address" or combining the information like "0 (address: 10.0.0.1)"?
from ha_cluster_exporter.
@gao-yan yes, to me we could add it. I just hope the output stays the same, since we will be using it as an API 😅
from ha_cluster_exporter.
@gao-yan can a ring have multiple addresses? 🤔
from ha_cluster_exporter.
Adding this would require some anti-patterns in Prometheus. I am not sure we need it.
from ha_cluster_exporter.
I don't think the output of corosync-cfgtool changes often. But the output differs between corosync-2.4 and corosync-3:
2.4:
https://github.com/corosync/corosync/blob/needle-2.4/tools/corosync-cfgtool.c#L68
3.0:
https://github.com/corosync/corosync/blob/master/tools/corosync-cfgtool.c#L90
But we will likely take corosync-3 only starting from SLE 16.
In corosync-2.4, corosync-cmapctl outputs the information in a better format. For example, some relevant entries from there:
runtime.totem.pg.mrp.rrp.0.faulty (u8) = 1
runtime.totem.pg.mrp.srp.members.1084783184.ip (str) = r(0) ip(192.168.122.80) r(1) ip(127.0.0.1)
totem.interface.0.bindnetaddr (str) = 192.168.122.0
I don't have corosync-3 at hand, but IIRC, RRP (Redundant Ring Protocol) has even been dropped there. https://jira.suse.com/browse/PM-1203
@yuanren10 should have a better understanding of, and better suggestions on, the topics here :-)
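As an illustration of why the corosync-cmapctl format is easier to consume, here is a small Go sketch that parses `key (type) = value` lines and extracts the per-ring faulty flags. It is a sketch only: the `CmapEntry` type and the functions `parseCmap` and `faultyByRing` are hypothetical names, the line pattern is inferred from the three sample entries above, and the `runtime.totem.pg.mrp.rrp.<n>.faulty` keys are only known from this thread to exist in corosync-2.4.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// CmapEntry is a hypothetical type for one corosync-cmapctl output line.
type CmapEntry struct {
	Key   string
	Type  string
	Value string
}

var (
	// Assumed line shape: "<key> (<type>) = <value>".
	cmapLineRe = regexp.MustCompile(`^(\S+) \((\w+)\) = (.*)$`)
	// Key pattern for the corosync-2.4 RRP faulty flag of ring <n>.
	rrpFaultyRe = regexp.MustCompile(`^runtime\.totem\.pg\.mrp\.rrp\.(\d+)\.faulty$`)
)

// parseCmap returns every line that matches the assumed shape and
// silently skips anything else.
func parseCmap(output string) []CmapEntry {
	var entries []CmapEntry
	for _, line := range strings.Split(output, "\n") {
		m := cmapLineRe.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil {
			continue
		}
		entries = append(entries, CmapEntry{Key: m[1], Type: m[2], Value: m[3]})
	}
	return entries
}

// faultyByRing maps ring number to its faulty flag; any value other
// than "0" is treated as faulty, which is an assumption.
func faultyByRing(entries []CmapEntry) map[string]bool {
	rings := make(map[string]bool)
	for _, e := range entries {
		if m := rrpFaultyRe.FindStringSubmatch(e.Key); m != nil {
			rings[m[1]] = e.Value != "0"
		}
	}
	return rings
}

func main() {
	sample := `runtime.totem.pg.mrp.rrp.0.faulty (u8) = 1
runtime.totem.pg.mrp.srp.members.1084783184.ip (str) = r(0) ip(192.168.122.80) r(1) ip(127.0.0.1)
totem.interface.0.bindnetaddr (str) = 192.168.122.0`

	entries := parseCmap(sample)
	for ring, faulty := range faultyByRing(entries) {
		fmt.Printf("ring %s faulty: %t\n", ring, faulty)
	}
}
```

Because every line has the same shape, a single pattern covers all keys, whereas the corosync-cfgtool output needs stateful, multi-line parsing.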
from ha_cluster_exporter.
@gao-yan can a ring have multiple addresses?
I don't think so. A ring/interface can only be configured with one "bindnetaddr".
from ha_cluster_exporter.
OK, thx @gao-yan. 🌞 So I think for a first corosync 0.5 version of the exporter, I would go with only the total metric, which is the simplest one.
For the other metric, we need to research a bit whether we need it before relying on such output.
IMHO corosync-cmapctl seems a valid tool, and handier to parse. If it is stable we might just use that.
from ha_cluster_exporter.
yes agree. thx @gao-yan 🚀
from ha_cluster_exporter.
As @gao-yan mentioned, in corosync 2.4+ it seems the link status can only be told from the return code of "corosync-cfgtool" and from "runtime.totem.pg.mrp.rrp.0.faulty". However, "runtime.totem.pg.mrp.rrp.0.faulty" was removed in corosync 3+, apparently because "corosync-cfgtool" is considered enough to show the status.
from ha_cluster_exporter.
@ReyRen thx so far. I think for the moment this issue is not super urgent, but thx for all the info, it was helpful 🚀
from ha_cluster_exporter.
Hi, I can confirm the issue with corosync 3+ on Debian 10.
Any plans to support it? How can I help?
from ha_cluster_exporter.
@mbothorel Hi.
If you want to help on this, you need to create a new metric with some labels.
Some doc:
https://github.com/ClusterLabs/ha_cluster_exporter/blob/master/doc/design.md
Also, apart from setting up the development environment, check my first comment on the issue, which gives a hint of what we need to achieve.
If you have any questions, feel free to ping me. There are no stupid questions 😁
If you wanna work on it, I can assign this to you.
Let me know and thank you for proposing it!
from ha_cluster_exporter.