ahus1 / prometheus-hystrix
This is an implementation of a HystrixMetricsPublisher that publishes metrics using the Prometheus Java client.
License: Apache License 2.0
Hi, I'm having an issue integrating this with a Ratpack application. I'm getting the following error on startup:
INFO: An exception was caught and reported. Message: java.lang.IllegalStateException: Cannot install Hystrix integration because another concurrency strategy (class com.netflix.hystrix.strategy.concurrency.HystrixConcurrencyStrategyDefault) is already installed
java.lang.IllegalStateException: Cannot install Hystrix integration because another concurrency strategy (class com.netflix.hystrix.strategy.concurrency.HystrixConcurrencyStrategyDefault) is already installed
It looks like this bit of prometheus-hystrix code:
https://github.com/ahus1/prometheus-hystrix/blob/master/src/main/java/com/soundcloud/prometheus/hystrix/HystrixPrometheusMetricsPublisher.java#L94
is clashing with this bit of Ratpack code:
https://github.com/ratpack/ratpack/blob/master/ratpack-hystrix/src/main/java/ratpack/hystrix/HystrixModule.java#L90
Is it necessary to register the default HystrixConcurrencyStrategy in prometheus-hystrix, or can it be removed?
Hi, I am trying to use this cool lib in my application for publishing Hystrix metrics. Unfortunately, the application fails to start up. I figure it's because the application is on Java 7 while this lib requires Java 8.
We don't have a plan to upgrade to Java 8, and I found that all versions in Maven Central are 3.x, which requires Java 8.
So my question is: what's the best way to get it working on Java 7? As we just need basic functionality, I am thinking of rebuilding 2.x with Java 7. Is that the way to go? Can you confirm?
Thanks!
This no longer works. It appears that micrometer.io has replaced it. I have:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>de.ahus1.prometheus.hystrix</groupId>
    <artifactId>prometheus-hystrix</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
But I never see a hystrix_command_event_total in the output of /actuator/prometheus, even after many API calls have been made. I see a lot of Micrometer output but nothing from prometheus-hystrix.
Does your library no longer work now that we have micrometer.io?
I wanted to use an existing Grafana dashboard (https://grafana.com/dashboards/2113) that nicely shows Hystrix circuit breaker information. It didn't work at all. It took me hours to find the reason: the Grafana dashboard uses metrics that are not exported anymore as of version 4.0.0 of prometheus-hystrix.
It would be nice to include a transformation guide for the metrics.
An additional question: why did you change the exported metrics in the first place? I do not understand the benefits of the complete metric rewrite.
I know this is on the roadmap. I just wanted to get an issue in place for the PR I'm likely to submit next week to fix some or all of the following metrics:
hystrix_command_error_total
hystrix_command_event_total
hystrix_command_latency_execute_seconds_bucket
hystrix_command_latency_execute_seconds_count
hystrix_command_latency_execute_seconds_sum
hystrix_command_latency_total_seconds_bucket
hystrix_command_latency_total_seconds_count
hystrix_command_latency_total_seconds_sum
hystrix_thread_pool_completed_task_count
hystrix_thread_pool_count_threads_executed
hystrix_thread_pool_largest_pool_size
hystrix_thread_pool_queue_size
hystrix_thread_pool_rolling_count_threads_executed
hystrix_thread_pool_rolling_max_active_threads
hystrix_thread_pool_thread_active_count
hystrix_thread_pool_total_task_count
Is it possible to remove some of metrics from REST endpoint (e.g. remove everything except "hystrix_command_event_total") ?
I'm currently using this to init metrics:
HystrixPrometheusMetricsPublisher.builder()
    .shouldExportDeprecatedMetrics(false)
    .shouldRegisterDefaultPlugins(false)
    .shouldExportProperties(false)
    .buildAndRegister();
but cannot find a way to filter collectors that I don't need.
Couldn't find any version of this project on maven central. Any chance to release the current version soon?
I noticed that sometimes the counters for hystrix_command_latency_execute_seconds_sum and hystrix_command_latency_total_seconds_sum have small glitches like this:
8891.582999999922 @1509010418.551
8891.617999999922 @1509010419.551
8891.633999999922 @1509010420.551
8891.633999999922 @1509010421.551 <-- attention
8891.631999999921 @1509010422.551 <-- attention
8891.631999999921 @1509010423.551
8891.64699999992 @1509010424.551
8891.68199999992 @1509010425.551
Prometheus functions like rate() or increase() assume there was a counter reset between these two values and produce a huuuuge peak on the graph.
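To see why a tiny dip blows up the graph, here is a dependency-free Java sketch of the counter-reset heuristic (a simplification: real rate()/increase() also extrapolate over the window edges, but the reset handling is the point). Whenever a sample is lower than its predecessor, Prometheus assumes the counter restarted from zero, so the entire new value is counted as an increase:

```java
// Simplified model of Prometheus counter-reset detection (assumption: this
// ignores window extrapolation and only illustrates the reset heuristic).
public class CounterResetSpike {
    public static double increase(double[] samples) {
        double total = 0.0;
        for (int i = 1; i < samples.length; i++) {
            double delta = samples[i] - samples[i - 1];
            // on a decrease, Prometheus assumes a reset and counts the
            // whole new value as increase, not just the delta
            total += (delta >= 0) ? delta : samples[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // the 0.002 s dip from the scrape output above
        double[] glitch = {8891.633, 8891.631, 8891.646};
        System.out.println(increase(glitch)); // ~8891.6 instead of ~0.013
    }
}
```

The real increase should be about 0.013 s, but the single backwards step makes the heuristic count the full ~8891 s, which is exactly the spike seen on the graph.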
Deeper investigation showed that this is because of a series of short-circuited executions. The value for latency in case of short-circuit is always -1 ms.
Debug code like this:
HystrixCommandCompletionStream.getInstance(cmdKey)
    .observe()
    .subscribe(hystrixCommandCompletion -> LOG.warn(
        "CMD Completion: executed={} {}",
        hystrixCommandCompletion.didCommandExecute(),
        hystrixCommandCompletion.toString()));
produces:
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=true listItemsByUserId[FAILURE][2 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=true listItemsByUserId[FAILURE][2 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
CMD Completion: executed=false listItemsByUserId[SHORT_CIRCUITED][-1 ms]
In purely synthetic tests with a 100% failure rate, the total latency sum can even go below 0:
hystrix_command_latency_execute_seconds_count{command_group="SOME_GROUP",command_name="listItemsByUserId",} 684.0
hystrix_command_latency_execute_seconds_sum{command_group="SOME_GROUP",command_name="listItemsByUserId",} -0.3050000000000003
hystrix_command_latency_total_seconds_count{command_group="SOME_GROUP",command_name="listItemsByUserId",} 684.0
hystrix_command_latency_total_seconds_sum{command_group="SOME_GROUP",command_name="listItemsByUserId",} -0.21800000000000028
(Time machine, isn't it? :) )
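The negative sum is just the -1 ms samples accumulating. A dependency-free Java sketch (the method name is hypothetical; only the `latency / 1000d` conversion mirrors what the publisher does):

```java
// Hypothetical simulation: short-circuited commands report -1 ms latency,
// and observing that value drives the histogram _sum negative whenever
// short-circuits outnumber the latency contributed by real executions.
public class NegativeLatencySum {
    public static double observe(int shortCircuited, int executed,
                                 double executedLatencyMs) {
        double sumSeconds = 0.0;
        for (int i = 0; i < shortCircuited; i++) {
            sumSeconds += -1 / 1000d; // Hystrix's sentinel when the command never ran
        }
        for (int i = 0; i < executed; i++) {
            sumSeconds += executedLatencyMs / 1000d;
        }
        return sumSeconds;
    }

    public static void main(String[] args) {
        // 600 short-circuits at -1 ms outweigh 84 real executions at 2 ms
        System.out.println(observe(600, 84, 2)); // negative, like the scrape above
    }
}
```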
I think this could be safely fixed by an additional check in the HystrixPrometheusMetricsPublisherCommand class:
HystrixCommandCompletionStream.getInstance(commandKey)
    .observe()
    .subscribe(hystrixCommandCompletion -> {
        if (hystrixCommandCompletion.didCommandExecute()) {
            histogramLatencyTotal.observe(hystrixCommandCompletion.getTotalLatency() / 1000d);
            histogramLatencyExecute.observe(hystrixCommandCompletion.getExecutionLatency() / 1000d);
        }
        for (HystrixEventType hystrixEventType : HystrixEventType.values()) {
            // this code is not touched
        }
    });
What do you think?
Is there any reason why these latencies are published as a gauge rather than a histogram? Is there something about them that doesn't meet the Prometheus histogram spec? If not, I would be interested in putting up a PR to allow latency metrics to be reported as a histogram.
The maintained fork is now https://github.com/ahus1/prometheus-hystrix .
Hi,
I'm using Spring Cloud Sleuth to trace the calls, so Sleuth registers the default metrics publisher.
When HystrixPrometheusMetricsPublisher then tries to register with HystrixPlugins, it throws an "Another strategy was already registered." error.
We would like to add a fixed label to each metric, e.g. service_name="MyService". This is needed to be able to distinguish between different service instances. In our case the Hystrix command key/group is the same, as it is physically the same code, just differently named instances.
Using the namespace is not really an option, as this makes finding and using metrics more difficult.
Ideally one should be able to provide a factory of some kind to create e.g. a counter / histogram. We can then provide a subclass of the real prometheus metric and add our logic as required.
Any other thoughts?
My configuration:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: "gatewaymonitor"
scrape_configs:
  - job_name: "gateway"
    metrics_path: "/public/metrics"
    static_configs:
      - targets: ["10.8.110.83:18888"]
I'm using Spring Cloud Zuul/Hystrix/Ribbon, all the Netflix OSS.
I'm able to see the hystrix metrics at http://localhost:18888/public/metrics
I'm not able to see the issue with my prometheus.yml.
The error I'm getting: https://ibb.co/hDXR5w
Please help.
The two metrics hystrix_command_total and hystrix_command_error_total are difficult to use and probably return wrong values.
For example, if a command completes with the event SHORT_CIRCUITED, it will be counted neither as an error nor in the total count. This matches the code in HystrixCommandMetrics.HealthCounts.plus, but is nevertheless counter-intuitive.
Especially if you want to put all the observed events in relation to the total commands received, this leads to strange results, as the number of observed SHORT_CIRCUITED events is high while both the error and total rates stay low.
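The counting rule described above can be sketched in a few lines of dependency-free Java (the enum and helper names are hypothetical; only the exclusion of SHORT_CIRCUITED from both counters mirrors the HealthCounts.plus behaviour):

```java
import java.util.List;

// Sketch of the HealthCounts-style counting described above: short-circuited
// commands never executed, so they increment neither the total nor the error
// count, leaving both rates flat during a short-circuit storm.
public class HealthCountsSketch {
    enum Event { SUCCESS, FAILURE, TIMEOUT, SHORT_CIRCUITED }

    static long total(List<Event> events) {
        // SHORT_CIRCUITED is excluded from the total count
        return events.stream().filter(e -> e != Event.SHORT_CIRCUITED).count();
    }

    static long errors(List<Event> events) {
        // FAILURE and TIMEOUT count as errors; SHORT_CIRCUITED does not
        return events.stream()
                .filter(e -> e == Event.FAILURE || e == Event.TIMEOUT)
                .count();
    }

    public static void main(String[] args) {
        List<Event> observed = List.of(
                Event.FAILURE, Event.SHORT_CIRCUITED, Event.SHORT_CIRCUITED,
                Event.SHORT_CIRCUITED, Event.SHORT_CIRCUITED, Event.FAILURE);
        // 6 events observed, but total is 2 and errors is 2
        System.out.println(total(observed) + " / " + errors(observed));
    }
}
```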
Hi everyone,
I am currently trying to create a Grafana dashboard with some of the Hystrix Prometheus metrics. One of them is a graph that should show the mean latency over time.
Therefore I am using the following query:
avg(hystrix_command_latency_total_seconds_sum{command_group=~"$commandGroup", command_name=~"$commandName"} ) by (command_name, command_group)
Unfortunately it feels like this metric just sums up all latencies over time, because my graph looks like this:
I tried to divide hystrix_command_latency_total_seconds_sum by hystrix_command_latency_total_seconds_count, but this gives me unrealistically low results.
Does anyone know how to properly create a query for a mean latency over time?
Thanks in advance!
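For context: the usual Prometheus recipe for a windowed mean over a histogram is rate(..._sum[5m]) / rate(..._count[5m]). The _sum and _count series only ever grow, so averaging _sum directly just plots the running total, while dividing the lifetime _sum by the lifetime _count gives the mean over the whole process lifetime, not over the graph's window. A dependency-free Java sketch of the arithmetic (class and method names are illustrative):

```java
// Sketch of the arithmetic behind rate(_sum[w]) / rate(_count[w]):
// the ratio of the window deltas is the mean latency of exactly the
// requests that happened inside that window.
public class WindowedMeanLatency {
    public static double meanLatency(double sumStart, double sumEnd,
                                     double countStart, double countEnd) {
        // delta of _sum over the window / delta of _count over the window
        return (sumEnd - sumStart) / (countEnd - countStart);
    }

    public static void main(String[] args) {
        // e.g. _sum grew by 3.0 s while _count grew by 60 requests
        System.out.println(meanLatency(120.0, 123.0, 4000, 4060)); // 0.05
    }
}
```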
If I update the simpleclient dependency to

<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient</artifactId>
    <version>0.4.0</version>
</dependency>

will prometheus-hystrix still work?