So Sensu is able to use this cluster for its intended purpose. However, I see the following error in the SASL log every now and then:
=CRASH REPORT==== 28-Apr-2017::19:34:47 ===
crasher:
initial call: rabbit_mgmt_metrics_collector:init/1
pid: <0.27508.15>
registered_name: connection_coarse_metrics_metrics_collector
exception exit: {badarith,
[{rabbit_mgmt_metrics_collector,sum_entry,2,
[{file,"src/rabbit_mgmt_metrics_collector.erl"},
{line,550}]},
{rabbit_mgmt_metrics_collector,
'-insert_entry_op/4-fun-0-',2,
[{file,"src/rabbit_mgmt_metrics_collector.erl"},
{line,529}]},
{dict,update_bkt,4,[{file,"dict.erl"},{line,323}]},
{dict,on_bucket,3,[{file,"dict.erl"},{line,415}]},
{dict,update,4,[{file,"dict.erl"},{line,318}]},
{rabbit_mgmt_metrics_collector,insert_entry_op,4,
[{file,"src/rabbit_mgmt_metrics_collector.erl"},
{line,528}]},
{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},
{rabbit_mgmt_metrics_collector,aggregate_entry,4,
[{file,"src/rabbit_mgmt_metrics_collector.erl"},
{line,213}]}]}
in function gen_server:terminate/6 (gen_server.erl, line 744)
ancestors: [rabbit_mgmt_agent_sup,rabbit_mgmt_agent_sup_sup,<0.362.0>]
messages: []
links: [<0.364.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 2586
stack_size: 27
reductions: 1848
neighbours:
=SUPERVISOR REPORT==== 28-Apr-2017::19:34:47 ===
Supervisor: {local,rabbit_mgmt_agent_sup}
Context: child_terminated
Reason: {badarith,
[{rabbit_mgmt_metrics_collector,sum_entry,2,
[{file,"src/rabbit_mgmt_metrics_collector.erl"},
{line,550}]},
{rabbit_mgmt_metrics_collector,
'-insert_entry_op/4-fun-0-',2,
[{file,"src/rabbit_mgmt_metrics_collector.erl"},
{line,529}]},
{dict,update_bkt,4,[{file,"dict.erl"},{line,323}]},
{dict,on_bucket,3,[{file,"dict.erl"},{line,415}]},
{dict,update,4,[{file,"dict.erl"},{line,318}]},
{rabbit_mgmt_metrics_collector,insert_entry_op,4,
[{file,"src/rabbit_mgmt_metrics_collector.erl"},
{line,528}]},
{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},
{rabbit_mgmt_metrics_collector,aggregate_entry,4,
[{file,"src/rabbit_mgmt_metrics_collector.erl"},
{line,213}]}]}
Offender: [{pid,<0.27508.15>},
{name,connection_coarse_metrics_metrics_collector},
{mfargs,
{rabbit_mgmt_metrics_collector,start_link,
[connection_coarse_metrics]}},
{restart_type,permanent},
{shutdown,30000},
{child_type,worker}]
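My guess is that the caller somehow passed a non-numeric value to the sum_entry function, since badarith is what Erlang raises when an arithmetic operation receives a non-numeric operand. This doesn't seem to cause any problems for our RabbitMQ cluster, because the supervisor restarts the collector (it is a permanent worker). Still, it is misleading to see so many errors like this in the log file.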
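To illustrate the guess above: in Erlang, `+` raises `badarith` when either operand is not a number. The module below is a hypothetical sketch. The `badarith_demo` module and its `sum_entry/2` are illustrative stand-ins and are not RabbitMQ's actual implementation. It sums two metric tuples inside `dict:update/3`, roughly mirroring the `dict:update` -> `sum_entry` call chain in the stack trace, and shows that a single field set to `undefined` produces the same error class.

```erlang
-module(badarith_demo).
-export([sum_entry/2, demo/0]).

%% Hypothetical stand-in: element-wise sum of two metric tuples.
%% Not the real rabbit_mgmt_metrics_collector:sum_entry/2.
sum_entry(A, B) when is_tuple(A), is_tuple(B) ->
    list_to_tuple(lists:zipwith(fun(X, Y) -> X + Y end,
                                tuple_to_list(A), tuple_to_list(B))).

demo() ->
    D0 = dict:store(conn, {1, 2}, dict:new()),
    %% Numeric update succeeds: {1,2} + {3,4} = {4,6}.
    D1 = dict:update(conn, fun(Old) -> sum_entry(Old, {3, 4}) end, D0),
    {4, 6} = dict:fetch(conn, D1),
    %% A non-numeric field makes '+' raise error:badarith inside dict:update/3.
    try dict:update(conn, fun(Old) -> sum_entry(Old, {undefined, 4}) end, D1) of
        _ -> no_error
    catch
        error:badarith -> badarith
    end.
```

In an `erl` shell started in the directory containing `badarith_demo.erl`, `c(badarith_demo), badarith_demo:demo().` returns `badarith`.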
Let me know if you need any more information.