Comments (14)
Background
As part of the work in client_java
1.x.x to integrate with OpenMetrics/OpenTelemetry metric names are enforced.
Kafka appears to create metrics with suffixes that violate the OpenMetrics/OpenTelemetry specification.
Stacktrace
An Exception occurred while scraping metrics: java.lang.IllegalArgumentException: 'kafka_server_socketservermetrics_failed_handshake_total': Illegal metric name. The metric name must not include the '_total' suffix. Call PrometheusNaming.sanitizeMetricName(name) to avoid this error.
at io.prometheus.metrics.model.snapshots.MetricMetadata.validate(MetricMetadata.java:106)
at io.prometheus.metrics.model.snapshots.MetricMetadata.<init>(MetricMetadata.java:63)
at io.prometheus.metrics.model.snapshots.MetricSnapshot$Builder.buildMetadata(MetricSnapshot.java:80)
at io.prometheus.metrics.model.snapshots.GaugeSnapshot$Builder.build(GaugeSnapshot.java:129)
at io.prometheus.jmx.JmxCollector.collect(JmxCollector.java:832)
at io.prometheus.metrics.model.registry.MultiCollector.collect(MultiCollector.java:26)
at io.prometheus.metrics.model.registry.PrometheusRegistry.scrape(PrometheusRegistry.java:72)
at io.prometheus.metrics.exporter.common.PrometheusScrapeHandler.scrape(PrometheusScrapeHandler.java:112)
at io.prometheus.metrics.exporter.common.PrometheusScrapeHandler.handleRequest(PrometheusScrapeHandler.java:53)
at io.prometheus.metrics.exporter.httpserver.MetricsHandler.handle(MetricsHandler.java:43)
at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:95)
at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:98)
at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:851)
at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:95)
at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:818)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Sanitizing the name could result in duplicate metrics, violating metric exposition constraints:
sanitized(foo_total) = foo
sanitized (foo) = foo
from jmx_exporter.
Debugging shows two similar MBeans registered with different names.
MBean 1 has a dash (-
) between topics
and partitions
...
- v3.topics-partitions
MBean 2 has a period (.
) between topics
and partitions
- v3.topics.partitions-
These both will get sanitized to the same name with the dashes (-
) and the periods (.
) replaced with underscores (_
), resulting in duplicate metrics.
254201 [FINE] io.prometheus.jmx.JmxScraper kafka.rest{type=jersey-metrics}v3.topics-partitions-reassignment.list.response-rate scrape: 0.0
254202 [FINE] io.prometheus.jmx.JmxCollector add metric sample: kafka_rest_jersey_metrics_v3_topics_partitions_reassignment_list_response_rate [] [] 0.0
254203 [FINE] io.prometheus.jmx.JmxCollector matchedRule.name [kafka_rest_jersey_metrics_v3_topics_partitions_reassignment_list_response_rate] sanitizedName [kafka_rest_jersey_metrics_v3_to pics_partitions_reassignment_list_response_rate]
254573 [FINE] io.prometheus.jmx.JmxScraper kafka.rest{type=jersey-metrics}v3.topics.partitions-reassignment.list.response-rate scrape: 0.0
254574 [FINE] io.prometheus.jmx.JmxCollector add metric sample: kafka_rest_jersey_metrics_v3_topics_partitions_reassignment_list_response_rate [] [] 0.0
254575 [FINE] io.prometheus.jmx.JmxCollector matchedRule.name [kafka_rest_jersey_metrics_v3_topics_partitions_reassignment_list_response_rate] sanitizedName [kafka_rest_jersey_metrics_v3_to pics_partitions_reassignment_list_response_rate]
from jmx_exporter.
Debugging shows that...
... converts .
and -
(unsafe characters per the Prometheus metrics specification) to _
, which results in duplicate metric names for two different MBeans.
In these scenarios, the first MBean metric would be used. Subsequent MBeans metrics (with the same name) would be ignored.
from jmx_exporter.
Thanks a lot @dhoard for analyzing this!!!
I guess adding a label like mbean_name
in case there is a name conflict is the best way to solve this. Something like this:
v3_topics_partitions{mbean_name="v3.topics-partitions"} ...
v3_topics_partitions{mbean_name="v3.topics.partitions"} ...
If we add that label only in case of name collisions it should have minimal impact.
However, this will beak if the metrics have different types, so we should set the type UNKNOWN
for all metrics with name collisions.
I'm wondering what the previous versions of jmx_exporter
did. My guess is metrics were just overwriting each other, so it was basically random which value you got.
from jmx_exporter.
While the example uses UNNOWN
metric types (because of Kafka), the issue can occur for UNTYPES
(treated as UNKNOWN
), COUNTER
and GAUGE
metrics.
I'm wondering what the previous versions of jmx_exporter did. My guess is metrics were just overwriting each other, so it was basically random which value you got.
@fstab Correct. The previous jmx_exporter
versions would use the first MBean, dropping the metrics for the second MBean, etc. if the sanitized name matched, resulting in missed metrics for some MBeans.
When an MBean has tabular data with no keys, tracking the MBean name alone isn't enough to guarantee uniqueness. - JUnit test issue
from jmx_exporter.
@fstab I have code working on my branch (https://github.com/dhoard/jmx_exporter/tree/issue-958) by adding an _x_id_
label to every non-JVM metric.
I have gone down this path because...
- the output wouldn't provide any information on which MBean was used to generate the first metric....
# HELP kafka_rest_jersey_metrics_v3_topics_partitions_reassignment_list_response_rate The average number of HTTP responses per second. kafka.rest:name=null,type=jersey-metrics,attribute=v3.topics-partitions-reassignment.list.response-rate
# TYPE kafka_rest_jersey_metrics_v3_topics_partitions_reassignment_list_response_rate untyped
kafka_rest_jersey_metrics_v3_topics_partitions_reassignment_list_response_rate{} 0.0
kafka_rest_jersey_metrics_v3_topics_partitions_reassignment_list_response_rate{_x_id_="kafka.rest:type=jersey-metrics,attribute=v3.topics.partitions-reassignment.list.response-rate"} 0.0
- it would require keeping a
Map
of previously encountered metric names and performing a lookup when processing every MBean attribute value to check for a duplicate. (XSnapshot.Builder
classes have nocomputeIfAbsent
method to check for a duplicate datapoint.)
from jmx_exporter.
My branch has been updated to include a label _x_id_
on all non-JVM metrics that contains a Murmur3 hash of ObjectName + ,atttribute=
+ attribute name.
Analysis
- The response size for a Kafka server with 86,415 metrics and the following rules...
rules:
- pattern: ".*"
... increases by ~140KB (11.5% increase)
-
The transfer time for 140KB on a 1Gbps network is ~1.1 milliseconds.
-
Calculating a Murmur3 hash is performant and the probability of a hash collision is practically 0.
P ≈ 1− (e-1/2147483648)
(e-1/2147483648) ≈ 1
P ≈ 1 - 1
P ≈ 0
from jmx_exporter.
Please let me know when your branch is ready to review!
from jmx_exporter.
Can we add the extra label only to the metrics that have a name collision, rather than to all metrics? I think name collisions are rare, so most metrics won't need that label.
from jmx_exporter.
Possibly... a couple of options...
A
Change the client_java
...Snaphot.Builder
classes to throw a DuplicateLabelsException
when adding a ...DataPointSnapshot
value with colliding labels. This breaks the builder pattern.
B
The JMX exporter code would need to keep maps of the ...DataPointSnapshot.Builder
objects independently of the ...Snapshot.Builder
objects.
When building the MetricSnapshots
... all of the logic to detect duplicates/adding an additional label for uniqueness would have to be implemented in the JMX Exporter code. (Collect everything in isolation... merge in the end.)
This is further complicated because none of the builder classes implement hashCode()
/ equals(Object other)
so they can't be used as Map keys.
Thoughts?
from jmx_exporter.
I am confused. Is the problem conflicting metric names, or conflicting labels?
from jmx_exporter.
Let me try to clarify...
Test MBean
package io.prometheus.jmx;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
public interface CollidingMBean {
int getXid();
}
class Colliding implements CollidingMBean {
private final int value;
public Colliding(int value) {
this.value = value;
}
@Override
public int getXid() {
return value;
}
static void registerBean(MBeanServer mbs) throws JMException {
ObjectName objectName =
new ObjectName("io.prometheus.jmx.test:type=Colliding-Bean");
Colliding mBean = new Colliding(123);
mbs.registerMBean(mBean, objectName);
objectName = new ObjectName("io.prometheus.jmx.test:type=Colliding.Bean");
mBean = new Colliding(321);
mbs.registerMBean(mBean, objectName);
objectName = new ObjectName("io.prometheus.jmx.test:type=Colliding_Bean");
mBean = new Colliding(246);
mbs.registerMBean(mBean, objectName);
objectName = new ObjectName("io.prometheus.jmx.test:type=Colliding|Bean");
mBean = new Colliding(246);
mbs.registerMBean(mBean, objectName);
}
}
Test Scenario
Register multiple independent instances of the MBean with different ObjectNames...
io.prometheus.jmx.test:type=Colliding-Bean
io.prometheus.jmx.test:type=Colliding.Bean
io.prometheus.jmx.test:type=Colliding_Bean
io.prometheus.jmx.test:type=Colliding|Bean
Exporter code
During scraping, the exporter "sanitizes" the name to match the Prometheus/OpenMetrics metrics name specification and adds the attribute.
sanitized(io.prometheus.jmx.test:type=Colliding-Bean
) = io_prometheus_jmx_test_Colliding_Bean
_attributeName
sanitized(io_prometheus_jmx_test_Colliding_Bean
) = io_prometheus_jmx_test_Colliding_Bean
_attributeName
sanitized(io_prometheus_jmx_test_Colliding_Bean
) = io_prometheus_jmx_test_Colliding_Bean
_attributeName
sanitized(io_prometheus_jmx_test_Colliding_Bean
) = io_prometheus_jmx_test_Colliding_Bean
_attributeName
Because the "sanitized" name is the same, these independent MBean are merged into a single metric with multiple datapoint snapshots, each having the same labels.
switch (matchedRule.type) {
case "COUNTER":
{
CounterSnapshot.Builder counterBuilder =
countersMap.computeIfAbsent(
sanitizedName,
key ->
CounterSnapshot.builder()
.name(sanitizedName)
.help(finalMatchedRule.help));
counterBuilder.dataPoint(
CounterSnapshot.CounterDataPointSnapshot.builder()
.labels(labels)
.value(value.doubleValue())
.build());
break;
}
case "GAUGE":
{
GaugeSnapshot.Builder gaugeBuilder =
gaugeMap.computeIfAbsent(
sanitizedName,
key ->
GaugeSnapshot.builder()
.name(sanitizedName)
.help(finalMatchedRule.help));
gaugeBuilder.dataPoint(
GaugeSnapshot.GaugeDataPointSnapshot.builder()
.labels(labels)
.value(value.doubleValue())
.build());
break;
}
case "UNKNOWN":
case "UNTYPED":
default:
{
UnknownSnapshot.Builder unknownBuilder =
unknownMap.computeIfAbsent(
sanitizedName,
key ->
UnknownSnapshot.builder()
.name(sanitizedName)
.help(finalMatchedRule.help));
unknownBuilder.dataPoint(
UnknownSnapshot.UnknownDataPointSnapshot.builder()
.labels(labels)
.value(value.doubleValue())
.build());
break;
}
}
When the code to build the MetricSnapshots
is executed...
for (CounterSnapshot.Builder counter : receiver.countersMap.values()) {
result.metricSnapshot(counter.build());
}
for (GaugeSnapshot.Builder gauge : receiver.gaugeMap.values()) {
result.metricSnapshot(gauge.build());
}
for (UnknownSnapshot.Builder unknown : receiver.unknownMap.values()) {
result.metricSnapshot(unknown.build());
}
... we experience the DuplicateLabelsException
Initial Approach
My initial approach was to keep a map of "sanitized" name to ...SnapShot.Builder
and a separate map of "sanitized" name to a list of ...DataPointSnapshot.Builder
s.
This doesn't work because the ...DataPointSnapshot.Builder
s have no methods to modify their contents.
Map<String, CounterSnapshot.Builder> snapshotBuilderMap = new HashMap<>();
Map<String, List<CounterSnapshot.CounterDataPointSnapshot.Builder>> snapshotDataPointBuilderMap = new HashMap<>();
for (Map.Entry<String, CounterSnapshot.Builder> entry : snapshotBuilderMap.entrySet()) {
String key = entry.getKey();
CounterSnapshot.Builder builder = entry.getValue();
List<CounterSnapshot.CounterDataPointSnapshot.Builder> list = snapshotDataPointBuilderMap.get(key);
for (CounterSnapshot.CounterDataPointSnapshot.Builder dataPointBuilder : list) {
try {
builder.dataPoint(dataPointBuilder.build());
} catch (DuplicateLabelsException e) {
// CounterSnapshot.CounterDataPointSnapshot.Builder doesn't
// have a method to add a label to the existing labels
// and doesn't have a method to get the current labels
}
}
}
I have been trying to find a solution without requiring a client_java
change/release.
from jmx_exporter.
Thanks a lot for the example, that was helpful.
I implemented a proposal how to solve this in #960. What do you think?
from jmx_exporter.
Resolve via #960. (post 1.0.0 release)
from jmx_exporter.
Related Issues (20)
- Duplicate metrics with NaN HOT 1
- Drop support for Java 8 HOT 8
- Blacklisting Percentile Metrics HOT 3
- Metrics for a specific process HOT 6
- the problem of getting data from multiple replicas under Kubernetes deployment controller and how to distinguish replica labels based on podname HOT 2
- JMX exporter high memory usage. HOT 3
- have any metric about jvm GC Pause? HOT 3
- Proposal: attributes as metrics & extra metrics HOT 7
- Can't add metrics to blacklistObjectNames HOT 6
- Plan for release with Prometheus Client 1.x? HOT 2
- no main manifest attribute, in jmx_prometheus_javaagent-0.19.0.jar HOT 1
- Hazelcast Metrics are not showing HOT 2
- Remote jmx has enabled SSL. How to configure jmx-exporter for verification? HOT 13
- Ability to use lowercaseOutputName in particular rules HOT 4
- jmx exporter inside Confluent Kafka broker container HOT 2
- After applying JMX filtering pattern, it takes 23 secs to scrape the metrics, is it expected behaviour? HOT 6
- feature request: allow to exclude specific attribute patterns inside a composite attribute HOT 3
- Clarify in documentation "jvm_*" metrics HOT 4
- jmxexporter adding incorrect _total suffix to #HELP and #TYPE when using COUNTER HOT 10
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from jmx_exporter.