ahus1 / prometheus-hystrix
This is an implementation of a HystrixMetricsPublisher that publishes metrics using the Prometheus Java client.
License: Apache License 2.0
Hi, I'm having an issue integrating this with a Ratpack application. I'm getting the following error on startup:
INFO: An exception was caught and reported. Message: java.lang.IllegalStateException: Cannot install Hystrix integration because another concurrency strategy (class com.netflix.hystrix.strategy.concurrency.HystrixConcurrencyStrategyDefault) is already installed
java.lang.IllegalStateException: Cannot install Hystrix integration because another concurrency strategy (class com.netflix.hystrix.strategy.concurrency.HystrixConcurrencyStrategyDefault) is already installed
It looks like this bit of prometheus-hystrix code:
https://github.com/ahus1/prometheus-hystrix/blob/master/src/main/java/com/soundcloud/prometheus/hystrix/HystrixPrometheusMetricsPublisher.java#L94
is clashing with this bit of Ratpack code:
https://github.com/ratpack/ratpack/blob/master/ratpack-hystrix/src/main/java/ratpack/hystrix/HystrixModule.java#L90
Is it necessary to register the default HystrixConcurrencyStrategy in prometheus-hystrix, or can it be removed?
Hi, I am trying to use this cool lib in my application for publishing Hystrix metrics. Unfortunately, the application fails to start up. I figure it's because the application is on Java 7 while this lib requires Java 8.
We don't have a plan to upgrade to Java 8, and I found that all versions in Maven Central are 3.x, which requires Java 8.
So my question is: what's the best way to get it working on Java 7? As we just need basic functionality, I am thinking of rebuilding 2.x with Java 7. Is that the way to go? Can you confirm?
Thanks!
This no longer works. It appears that micrometer.io has replaced it. I have:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>de.ahus1.prometheus.hystrix</groupId>
    <artifactId>prometheus-hystrix</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
But I never see a hystrix_command_event_total in the output of /actuator/prometheus, even after many API calls have been made. I see a lot of Micrometer output but nothing from prometheus-hystrix.
Does your library no longer work now that we have micrometer.io?
I wanted to use an existing Grafana dashboard (https://grafana.com/dashboards/2113) that nicely shows Hystrix circuit breaker information. It didn't work at all. It took me hours to find the reason: the Grafana dashboard uses metrics that are not exported anymore as of version 4.0.0 of prometheus-hystrix.
It would be nice to include a transformation guide for the metrics.
An additional question: why did you change the exported metrics in the first place? I do not understand the benefits of the complete metric rewrite.
I know this is on the roadmap. I just wanted to get an issue in place for the PR I'm likely to submit next week to fix some or all of the following metrics:
hystrix_command_error_total
hystrix_command_event_total
hystrix_command_latency_execute_seconds_bucket
hystrix_command_latency_execute_seconds_count
hystrix_command_latency_execute_seconds_sum
hystrix_command_latency_total_seconds_bucket
hystrix_command_latency_total_seconds_count
hystrix_command_latency_total_seconds_sum
hystrix_thread_pool_completed_task_count
hystrix_thread_pool_count_threads_executed
hystrix_thread_pool_largest_pool_size
hystrix_thread_pool_queue_size
hystrix_thread_pool_rolling_count_threads_executed
hystrix_thread_pool_rolling_max_active_threads
hystrix_thread_pool_thread_active_count
hystrix_thread_pool_total_task_count
Is it possible to remove some of metrics from REST endpoint (e.g. remove everything except "hystrix_command_event_total") ?
I'm currently using this to init metrics:
HystrixPrometheusMetricsPublisher.builder()
    .shouldExportDeprecatedMetrics(false)
    .shouldRegisterDefaultPlugins(false)
    .shouldExportProperties(false)
    .buildAndRegister();
but cannot find a way to filter collectors that I don't need.
Couldn't find any version of this project on maven central. Any chance to release the current version soon?
I noticed that sometimes the counters for hystrix_command_latency_execute_seconds_sum and hystrix_command_latency_total_seconds_sum have small glitches like this:
8891.582999999922 @1509010418.551
8891.617999999922 @1509010419.551
8891.633999999922 @1509010420.551
8891.633999999922 @1509010421.551 <-- attention
8891.631999999921 @1509010422.551 <-- attention
8891.631999999921 @1509010423.551
8891.64699999992 @1509010424.551
8891.68199999992 @1509010425.551
Prometheus functions like rate() or increase() assume there was a counter reset between these two values and produce a huuuuge peak on the graph.
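To see why a tiny dip blows up the graph, here is a dependency-free Java sketch of the counter-reset heuristic (a simplification: real rate()/increase() also extrapolate over the window edges, but the reset handling is the point). Whenever a sample is lower than its predecessor, Prometheus assumes the counter restarted from zero, so the entire new value is counted as an increase:

```java
// Simplified model of Prometheus counter-reset detection (assumption: this
// ignores window extrapolation and only illustrates the reset heuristic).
public class CounterResetSpike {
    public static double increase(double[] samples) {
        double total = 0.0;
        for (int i = 1; i < samples.length; i++) {
            double delta = samples[i] - samples[i - 1];
            // on a decrease, Prometheus assumes a reset and counts the
            // whole new value as increase, not just the delta
            total += (delta >= 0) ? delta : samples[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // the 0.002 s dip from the scrape output above
        double[] glitch = {8891.633, 8891.631, 8891.646};
        System.out.println(increase(glitch)); // ~8891.6 instead of ~0.013
    }
}
```

The real increase should be about 0.013 s, but the single backwards step makes the heuristic count the full ~8891 s, which is exactly the spike seen on the graph.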
Deeper investigation showed that this is because of a series of short-circuited executions. The value for latency in case of short-circuit is always -1 ms.
Debug code like this:
HystrixCommandCompletionStream.getInstance(cmdKey)
    .observe()
    .subscribe(hystrixCommandCompletion -> LOG.warn(
        "CMD Completion: executed={} {}",
        hystrixCommandCompletion.didCommandExecute(),
        hystrixCommandCompletion.toString()));
produces:
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=true listItemsByUserId[FAILURE][2 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=true listItemsByUserId[FAILURE][2 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
In purely synthetic tests with a 100% failure rate, the total latency sum can even go below 0:
hystrix_command_latency_execute_seconds_count{command_group="SOME_GROUP",command_name="listItemsByUserId",} 684.0
hystrix_command_latency_execute_seconds_sum{command_group="SOME_GROUP",command_name="listItemsByUserId",} -0.3050000000000003
hystrix_command_latency_total_seconds_count{command_group="SOME_GROUP",command_name="listItemsByUserId",} 684.0
hystrix_command_latency_total_seconds_sum{command_group="SOME_GROUP",command_name="listItemsByUserId",} -0.21800000000000028
(Time machine, isn't it? :) )
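The negative sum is just the -1 ms samples accumulating. A dependency-free Java sketch (the method name is hypothetical; only the `latency / 1000d` conversion mirrors what the publisher does):

```java
// Hypothetical simulation: short-circuited commands report -1 ms latency,
// and observing that value drives the histogram _sum negative whenever
// short-circuits outnumber the latency contributed by real executions.
public class NegativeLatencySum {
    public static double observe(int shortCircuited, int executed,
                                 double executedLatencyMs) {
        double sumSeconds = 0.0;
        for (int i = 0; i < shortCircuited; i++) {
            sumSeconds += -1 / 1000d; // Hystrix's sentinel when the command never ran
        }
        for (int i = 0; i < executed; i++) {
            sumSeconds += executedLatencyMs / 1000d;
        }
        return sumSeconds;
    }

    public static void main(String[] args) {
        // 600 short-circuits at -1 ms outweigh 84 real executions at 2 ms
        System.out.println(observe(600, 84, 2)); // negative, like the scrape above
    }
}
```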
I think this could be safely fixed by an additional check in the HystrixPrometheusMetricsPublisherCommand class:
HystrixCommandCompletionStream.getInstance(commandKey)
    .observe()
    .subscribe(hystrixCommandCompletion -> {
        if (hystrixCommandCompletion.didCommandExecute()) {
            histogramLatencyTotal.observe(hystrixCommandCompletion.getTotalLatency() / 1000d);
            histogramLatencyExecute.observe(hystrixCommandCompletion.getExecutionLatency() / 1000d);
        }
        for (HystrixEventType hystrixEventType : HystrixEventType.values()) {
            // this code is not touched
        }
    });
What do you think?
Is there any reason why these latencies are published as a gauge rather than a histogram? Is there something about them that doesn't meet the Prometheus histogram spec? If not, I would be interested in putting up a PR to allow latency metrics to be reported as a histogram.
The maintained fork is now https://github.com/ahus1/prometheus-hystrix .
Hi,
I'm using Spring Cloud Sleuth to trace the calls, so Sleuth registers the default metrics publisher.
When HystrixPrometheusMetricsPublisher then tries to register with HystrixPlugins, it throws an "Another strategy was already registered." error.
We would like to add a fixed label to each metric, e.g. service_name="MyService". This is needed to be able to distinguish between different service instances. In our case the Hystrix command key/group is the same, as it is physically the same code, just differently named instances.
Using the namespace is not really an option, as this makes finding and using metrics more difficult.
Ideally one should be able to provide a factory of some kind to create e.g. a counter / histogram. We can then provide a subclass of the real prometheus metric and add our logic as required.
Any other thoughts?
My configuration:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: "gatewaymonitor"
scrape_configs:
  - job_name: "gateway"
    metrics_path: "/public/metrics"
    static_configs:
      - targets: ["10.8.110.83:18888"]
I'm using Spring Cloud Zuul/Hystrix/Ribbon, all the Netflix OSS.
I'm able to see the hystrix metrics at http://localhost:18888/public/metrics
I'm not able to see the issue with my prometheus.yml.
The error I'm getting: https://ibb.co/hDXR5w
Please help.
The two metrics hystrix_command_total and hystrix_command_error_total are difficult to use and probably return wrong values.
For example, if a command completes with the event SHORT_CIRCUITED, it will be counted neither as an error nor in the total count. This matches the code in HystrixCommandMetrics.HealthCounts.plus, but is nevertheless counter-intuitive.
Especially if you want to put all the observed events in relation to the total commands received, this leads to strange results, as the number of observed SHORT_CIRCUITED events is high while both the error and total rates stay low.
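The counting rule described above can be sketched in a few lines of dependency-free Java (the enum and helper names are hypothetical; only the exclusion of SHORT_CIRCUITED from both counters mirrors the HealthCounts.plus behaviour):

```java
import java.util.List;

// Sketch of the HealthCounts-style counting described above: short-circuited
// commands never executed, so they increment neither the total nor the error
// count, leaving both rates flat during a short-circuit storm.
public class HealthCountsSketch {
    enum Event { SUCCESS, FAILURE, TIMEOUT, SHORT_CIRCUITED }

    static long total(List<Event> events) {
        // SHORT_CIRCUITED is excluded from the total count
        return events.stream().filter(e -> e != Event.SHORT_CIRCUITED).count();
    }

    static long errors(List<Event> events) {
        // FAILURE and TIMEOUT count as errors; SHORT_CIRCUITED does not
        return events.stream()
                .filter(e -> e == Event.FAILURE || e == Event.TIMEOUT)
                .count();
    }

    public static void main(String[] args) {
        List<Event> observed = List.of(
                Event.FAILURE, Event.SHORT_CIRCUITED, Event.SHORT_CIRCUITED,
                Event.SHORT_CIRCUITED, Event.SHORT_CIRCUITED, Event.FAILURE);
        // 6 events observed, but total is 2 and errors is 2
        System.out.println(total(observed) + " / " + errors(observed));
    }
}
```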
Hi everyone,
I am currently trying to create a Grafana dashboard with some of the Hystrix Prometheus metrics. One of them is a graph that should show the mean latency over time.
Therefore I am using the following query:
avg(hystrix_command_latency_total_seconds_sum{command_group=~"$commandGroup", command_name=~"$commandName"} ) by (command_name, command_group)
Unfortunately it feels like this metric just sums up all latencies over time, because my graph looks like this:
I tried to divide hystrix_command_latency_total_seconds_sum by hystrix_command_latency_total_seconds_count, but this gives me unrealistically low results.
Does anyone know how to properly create a query for a mean latency over time?
Thanks in advance!
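For context: the usual Prometheus recipe for a windowed mean over a histogram is rate(..._sum[5m]) / rate(..._count[5m]). The _sum and _count series only ever grow, so averaging _sum directly just plots the running total, while dividing the lifetime _sum by the lifetime _count gives the mean over the whole process lifetime, not over the graph's window. A dependency-free Java sketch of the arithmetic (class and method names are illustrative):

```java
// Sketch of the arithmetic behind rate(_sum[w]) / rate(_count[w]):
// the ratio of the window deltas is the mean latency of exactly the
// requests that happened inside that window.
public class WindowedMeanLatency {
    public static double meanLatency(double sumStart, double sumEnd,
                                     double countStart, double countEnd) {
        // delta of _sum over the window / delta of _count over the window
        return (sumEnd - sumStart) / (countEnd - countStart);
    }

    public static void main(String[] args) {
        // e.g. _sum grew by 3.0 s while _count grew by 60 requests
        System.out.println(meanLatency(120.0, 123.0, 4000, 4060)); // 0.05
    }
}
```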
If I update the simpleclient dependency to

<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient</artifactId>
    <version>0.4.0</version>
</dependency>

will prometheus-hystrix still work?