prom-client's Issues

Un-incremented counters don't have value 0

When I navigate to my metrics endpoint, some counters don't have values because they were never incremented. One such counter is unknown_failure_total, which gets incremented on unhandled exceptions.

I have Prometheus hooked up to Grafana and it keeps alerting me that there's missing data for some counters. I don't want to suppress this alert because missing data could also mean that Grafana is not able to reach my Prometheus server.

The HELP and TYPE statements are printed even for un-incremented counters, but their zero value is not. Should a 0 value be printed for counters which have yet to be incremented?

# HELP postgres_upsert_failure_total postgres_upsert_failure_total
# TYPE postgres_upsert_failure_total counter
postgres_upsert_failure_total{environment="production"} 2

# HELP unknown_failure_total cli_unknown_failure_total
# TYPE unknown_failure_total counter
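
A rough workaround sketch for getting a zero exported up front, assuming a prom-client version where an explicit inc(..., 0) actually records a zero (note the separate counter.inc(0) issue further down) and the older positional constructor:

var client = require('prom-client');

// Hypothetical mirror of the counter above.
var unknownFailures = new client.Counter(
  'unknown_failure_total', 'cli_unknown_failure_total', ['environment']);

// Touch the label set once at startup so a 0 sample is exported immediately.
unknownFailures.inc({ environment: 'production' }, 0);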

Thanks!

_count and +Inf bucket not incremented for every observation

Hi,

The _count series and the +Inf bucket should expose the total count of all observations, but this isn't the case. Here is an example exposing HTTP request latency observations:

http_request_duration_microseconds_bucket{le="10",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 0
http_request_duration_microseconds_bucket{le="100",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 0
http_request_duration_microseconds_bucket{le="1000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 1464
http_request_duration_microseconds_bucket{le="5000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 25681
http_request_duration_microseconds_bucket{le="10000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 36814
http_request_duration_microseconds_bucket{le="25000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 42800
http_request_duration_microseconds_bucket{le="50000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 47149
http_request_duration_microseconds_bucket{le="100000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 52459
http_request_duration_microseconds_bucket{le="500000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55924
http_request_duration_microseconds_bucket{le="1000000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55964
http_request_duration_microseconds_bucket{le="1500000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55975
http_request_duration_microseconds_bucket{le="5000000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55977
http_request_duration_microseconds_bucket{le="10000000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55977
http_request_duration_microseconds_bucket{le="30000000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55977
http_request_duration_microseconds_bucket{le="50000000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55977
http_request_duration_microseconds_bucket{le="+Inf",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 789
http_request_duration_microseconds_sum{code="200",method="POST",route="/api/v2/foo/",role="express-web-server"} 96063322
http_request_duration_microseconds_count{code="200",method="POST",route="/api/v2/foo/",role="express-web-server"} 789

As you can see, the +Inf bucket and http_request_duration_microseconds_count are both 789, way lower than the next-smaller bucket, which is 55977.

The numbers in the buckets also seem to be off. If I build a rate over the le="50000000" bucket, it yields around 10x as many requests as I expect. This might be related to a separate issue.
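
For reference, Prometheus histogram buckets are cumulative: each le bucket counts all observations less than or equal to its bound, so the +Inf bucket should always equal _count and never be smaller than any other bucket. A minimal sketch showing the expected shape (metric name and the older constructor signature are assumptions):

var client = require('prom-client');

var latency = new client.Histogram('demo_duration_microseconds', 'Demo latency histogram', {
  buckets: [10, 100, 1000]
});

latency.observe(250);
latency.observe(2500);

// Expected exposition (cumulative buckets):
//   demo_duration_microseconds_bucket{le="10"}   0
//   demo_duration_microseconds_bucket{le="100"}  0
//   demo_duration_microseconds_bucket{le="1000"} 1
//   demo_duration_microseconds_bucket{le="+Inf"} 2
//   demo_duration_microseconds_sum 2750
//   demo_duration_microseconds_count 2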

nodejs_version_info not working with timestamp

Since the timestamp for nodejs_version_info is stale, it looks like Prometheus stops ingesting the metric after a short while, and therefore the data does not appear when querying for it.

[screenshot 2017-06-16 15:34:27]

Registering a metric twice

Hi! I found myself using this library in perhaps a strange way. Essentially, prom-client was registering a metric twice, which messes up the scraper. In general the solution is to not have multiple metrics with the same name, but I couldn't find a clean way around it.

In a shared lib, I registered a metric. Another dependency used that same shared lib as a dependency. Both are listed in their respective package.json as a git repo url for various reasons. This has the side effect that npm does not dedupe the dependency. Essentially, then, the metric name was registered twice, which caused the prometheus scraper to report text format parsing error in line #: second HELP line for metric name "<metric_name>".

I'm working on a PR that prevents a metric from being registered twice. I am no Prometheus expert, so I would appreciate input on whether this might be a bad idea! My first thought is that users of prom-client might not expect this behavior and could accidentally mix metrics (though I think using labels properly would avoid this, yes?). My second thought is that my use case is pathological and I should not change the lib to work around it (in which case, any suggestions? Use and escape __dirname appended to the metric name, and have Prometheus aggregate across all metrics with my metric_name prefix?)

All the tests pass, but something tells me this might be a bigger change than I think and just should be a larger version hop, but I leave that up to you!
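
For discussion, a rough sketch of the kind of guard I have in mind; the internal structure of register.js shown here is an assumption:

// Inside register.js (hypothetical internals)
var metrics = {};

function registerMetric(metric) {
  if (metrics[metric.name]) {
    // Option A: treat re-registration of the same name as a no-op
    return;
    // Option B: fail fast instead:
    // throw new Error('A metric with the name ' + metric.name + ' has already been registered.');
  }
  metrics[metric.name] = metric;
}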

Self-signed https pushgateway

Is it possible to skip SSL certificate verification for pushgateway connections?

I have a pushgateway that lives on a self-signed https endpoint, but I'm getting certificate errors when the client tries to access the pushgateway.
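
A sketch of what I was hoping for, assuming the Pushgateway constructor forwards a second options argument to Node's https.request() (the URL and job name are just examples):

var https = require('https');
var client = require('prom-client');

var gateway = new client.Pushgateway('https://pushgateway.internal:9091', {
  agent: new https.Agent({ rejectUnauthorized: false }) // skip certificate verification
});

gateway.pushAdd({ jobName: 'my-batch-job' }, function (err) {
  if (err) console.error('push failed', err);
});

The blunt alternative I know of is starting the process with NODE_TLS_REJECT_UNAUTHORIZED=0, but that disables certificate verification for every TLS connection the process makes.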

JSON output of metrics

Hi, I'm trying to write a status page for my web app with the help of this library. I use it via a custom Koa middleware that collects request metrics. Now it would be useful if I could get these metrics as JSON in order to render them nicely in a static page. Would you accept a PR for this?
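
For context, here is roughly what I have in mind, assuming register.getMetricsAsJSON() returns the parsed metrics synchronously (Koa 1-style middleware; the route path is just an example):

var koa = require('koa');
var client = require('prom-client');

var app = koa();

app.use(function* (next) {
  if (this.path === '/status/metrics.json') {
    this.type = 'application/json';
    this.body = client.register.getMetricsAsJSON();
    return;
  }
  yield next;
});

app.listen(3000);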

Add "external" memory usage

Hello. Starting from Node v7.2.0, an external field was added (link) to the output of process.memoryUsage(). For most people it probably won't be interesting, but it can sometimes help track down issues when your app is doing intensive I/O, by tracking the amount of memory used for C++ objects bound to JavaScript objects.

I can make a PR, but I don't know where it should be placed: the field is present on all OSes, but there is a separate module for measuring Linux memory usage which doesn't call process.memoryUsage. Since this metric may not be very popular, it could go in its own module so users can disable it. On the other hand, adding it to an existing module would save an extra process.memoryUsage call. So the question is: do we really need this as a default metric, and if so, where should it live?
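
A minimal sketch of what the collector could look like (metric name, interval, and placement are assumptions, per the question above):

var client = require('prom-client');

var externalMemory = new client.Gauge(
  'nodejs_external_memory_bytes', 'Node.js external memory size in bytes.');

setInterval(function () {
  var usage = process.memoryUsage();
  // The external field only exists on Node >= 7.2.0.
  if (typeof usage.external === 'number') {
    externalMemory.set(usage.external);
  }
}, 10000);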

process_cpu_seconds_total increasing faster than 1/s

I recently upgraded my services to Node 6 in order to start collecting process_cpu_seconds_total. I'm now graphing the results and seeing strange numbers. The stats seem to be claiming that Node is using 125k seconds per second, according to this Prometheus query:

irate(process_cpu_seconds_total{job="bt-actions"}[1m]) * 60

Screenshot of Grafana plot attached. Any idea what's up? Am I querying this wrong or using the wrong units? Do you have reports from people successfully using the process_cpu_seconds_total stat?
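
For reference, irate() already returns a per-second rate, so CPU seconds consumed per second (roughly, cores in use) would normally be queried without the * 60:

irate(process_cpu_seconds_total{job="bt-actions"}[1m])

If the result is still far above the number of cores even without the factor of 60, the exported samples themselves are probably in the wrong unit (e.g. microseconds rather than seconds).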

Thanks,
Jacob

[screenshot from 2016-11-03 09:30:37]

counter.inc(0) shouldn't increase

I have a case here where I'm counting up integers that can also be zero. So it might happen that counter.inc(0) is called, and I assumed that this doesn't increase the counter. As far as I can see, this boils down to `value || 1` evaluating to 1 when `value == 0`, at the bottom of counter.js.

Right there, I would rather just skip calling the counter if the value is zero.
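
A minimal sketch of the change I would expect in the library (the exact counter.js internals here are an assumption):

// Only default to 1 when no value was passed at all; keep an explicit 0.
if (value === null || value === undefined) {
  value = 1;
}
// ...instead of: value = value || 1;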

looking for simple example

I'm relatively new to Node.js and have trouble understanding the example, particularly how to create a simple 'hello world' app.
https://github.com/siimon/prom-client/blob/master/example/server.js#L5
var register = require('../lib/register');

It seems I need to clone the prom-client repo in order to get the example to work. If I consume prom-client as an npm module, how would I use prom-client with Express? Thanks.

var client = require('prom-client');

// express
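
A fuller sketch of that idea, assuming a prom-client version where register.metrics() returns the exposition text synchronously:

var express = require('express');
var client = require('prom-client');

var app = express();

// Expose everything in the default registry on /metrics for Prometheus to scrape.
app.get('/metrics', function (req, res) {
  res.set('Content-Type', 'text/plain');
  res.end(client.register.metrics());
});

app.listen(3000);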

Add Changelog?

There have been two major releases recently, but I can't tell what's changed.

Exposing Metrics on :9090/metrics

The standard way of getting metrics into Prometheus is scraping the /metrics endpoint of your application. I'm not seeing where this client exposes metrics on the /metrics route. Is this possible, or does this client only support the push gateway?

Proposal to add heapTotal and heapUsed to default metrics

I've been using @SimenB's code to collect production stats for a few days now. I also manually added heapTotal and heapUsed stats to my app.

[heap-total screenshot]

As you can see, heapTotal and heapUsed are very different values than process_heap_bytes, and I think they more accurately reflect a typical understanding of heap usage. They also correspond to the value of --max_old_space_size. For instance, if I set --max_old_space_size=400, the process terminates when heapUsed (or heapTotal - haven't looked in detail) reaches 400MB. process_heap_bytes starts at ~1.4GB for my app.

@siimon, any objections to adding heapTotal and heapUsed as default metrics?

Error when pulling from NPM

Hi,

I tried to use your library, which I think is cool, but got the following error:

bufferutil@… install /<user_path>/bufferutil
node-gyp rebuild

CXX(target) Release/obj.target/bufferutil/src/bufferutil.o
In file included from ../src/bufferutil.cc:16:
../../nan/nan.h:261:25: error: redefinition of '_NanEnsureLocal'
NAN_INLINE v8::Local _NanEnsureLocal(v8::Local val) {
^
../../nan/nan.h:256:25: note: previous definition is here
NAN_INLINE v8::Local _NanEnsureLocal(v8::Handle val) {
....

According to a Google search, this seems to be related to some old libraries/dependencies (bufferutil)

I'm using:
node v5.5.0
npm 3.5.3
MacOS 10.11.6

I couldn't see a straightforward way to fix the dependency issues.

Guidance for clustered processes

There is currently no guidance provided for clustered processes (the default mechanism for running an app on multi-core servers). https://nodejs.org/api/cluster.html#cluster_cluster

By default, a clustered process, operating in round-robin fashion, will only serve metrics local to the process which handled a particular scrape request from Prometheus. This makes default metrics like "active requests" meaningless.

Possible solutions include:

  1. Recommend that users "push" metrics rather than the default pull mechanism. If this is the solution, the documentation should do a better job of showing how to set up a regular push of the default metrics via the push gateway
  2. Provide a mechanism for collecting metrics from multiple child processes - possibly by feeding them back to the master process, or e.g. via a socket to a dedicated process which can handle all requests for metrics. A rough sketch of this idea follows below.
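
The message names and the idea of simply concatenating exposition text in this sketch are assumptions; real aggregation would need to merge metric families so HELP/TYPE lines are not duplicated.

var cluster = require('cluster');
var client = require('prom-client');

if (cluster.isMaster) {
  // Ask every worker for its metrics text and hand the combined result to a callback.
  var collectFromWorkers = function (callback) {
    var ids = Object.keys(cluster.workers);
    var pending = ids.length;
    var chunks = [];
    ids.forEach(function (id) {
      var worker = cluster.workers[id];
      worker.once('message', function (msg) {
        if (msg && msg.type === 'metrics') chunks.push(msg.payload);
        if (--pending === 0) callback(chunks.join('\n'));
      });
      worker.send({ type: 'get-metrics' });
    });
  };
  // The master would serve the combined text from its own /metrics endpoint.
} else {
  // Each worker replies with the metrics of its own process.
  process.on('message', function (msg) {
    if (msg && msg.type === 'get-metrics') {
      process.send({ type: 'metrics', payload: client.register.metrics() });
    }
  });
}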

Counter Validation Design Question

I was assigning some values to the Counter prototype, using the metric name 200; this of course is a number, which violates this regex.

My question is: if I was looking to record status codes using the metric name 200 with a label of statusCode, would you see this as still acceptable? Or would you request I flip them and use the status codes for labels? If the latter, what challenges do you see the former presenting from a design perspective?
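
For comparison, the flipped approach would look roughly like this (metric and label names are just examples):

var client = require('prom-client');

// A metric name that satisfies the name regex, with the status code as a label value.
var responses = new client.Counter(
  'http_responses_total', 'HTTP responses by status code', ['statusCode']);

responses.labels('200').inc();
responses.inc({ statusCode: '404' }, 1);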

Thanks for your time!

Theryn

Optimise metric operations when used with the labels function

There is a good opportunity to improve performance when using the labels function: instead of calculating the label hash every time we do a metric operation, it could be done when the labelled child is created. This lets the user of the library cache the metric with the specific labels, so the label object doesn't have to be hashed every time.

This is already done in Counters, but should be done in all other metric types as well.
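
A usage sketch of what this enables in a hot path (names are examples):

var client = require('prom-client');

var requests = new client.Counter(
  'http_requests_total', 'HTTP requests', ['method', 'code']);

// Cache the labelled child up front so the label hash is computed once,
// not on every inc() in the hot path.
var getOk = requests.labels('GET', '200');

function onRequestFinished() {
  getOk.inc();
}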

Way to provide labels for default metrics?

I can't find a way (from reading the code and readme) to add labels to the default metrics. This is important with regard to production deployment so that e.g. process id can be associated without forcing this data to be a grouping.

This is particularly important when needing to override the "instance" label (for pushgateway instances)

Changing function signatures

Hi

I've been digging into TypeScript the last few days and started adding a declaration file for prom-client. While implementing this, I really hit a problem with how the function signatures look when we're creating a new metric. It's a mess of optional parameters that can sometimes be a labels array or a configuration object (see histogram or summary for example). I don't like how it is, and it's confusing for both users and the code within prom-client. Another drawback is that it's really messy to add any new parameter we might want in the future.

So my suggestion is to remove all the parameters when creating a metric, and replace it with a single object that will have all the parameters as properties in that object instead.

I know that it will be a mess to migrate to that new version (which will obviously be a major release). What are your thoughts on this? Is it worth implementing even with the major downside that all of us need to migrate everywhere metrics are instantiated?
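
To illustrate the proposal (the property names in the configuration object are just one possibility):

var client = require('prom-client');

// Today: positional parameters, where the third argument may be a labels
// array or a configuration object.
var current = new client.Histogram('http_request_duration_seconds', 'Request duration',
  ['method'], { buckets: [0.1, 0.5, 1, 5] });

// Proposed: a single configuration object (not yet implemented).
var proposed = new client.Histogram({
  name: 'http_server_duration_seconds',
  help: 'Request duration',
  labelNames: ['method'],
  buckets: [0.1, 0.5, 1, 5]
});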

Counters should be created by `.inc(0)`

It's a common pattern in metrics instrumentation to report counters across some set of named states as labels. It's also common to initialise them all to zeroes initially, so that the first increment is noticed by prometheus and on graphs. Otherwise, prometheus' rate() function fails to see the jump from not-existing to 1, and hence the first occurrence of every event never appears on a graph. This can matter enormously for rare events.

For example, I have some code that performs:

var counter = new client.Counter(..., ["state"]);
STATES.forEach((state) => counter.inc({state: state}, 0));

With the older prometheus-client library I migrated away from, this would initialise every counter to zero in output. Using prom-client this doesn't happen, and the first occurrence goes unnoticed.

Failed to disable the default metrics

Hi,

var client = require('prom-client');
clearInterval(client.defaultMetrics());
client.register.clear();

When disabling the default metrics with this code, I got the following error:

Error: A metric with the name process_cpu_seconds_total has already been registered.

I'm using version 5.0.0.

Optionally serve metrics in protocol buffer format?

Thanks for the great library!

Would you be interested in a pull request related to serving metrics to prometheus in the protocol buffer format?

My thought was to create a new function in lib/register.js named something like getMetricsAsProtobuf() which returns a buffer containing the length-delimited protocol buffer messages (each MetricFamily message prefixed with its varint-encoded length) of type io.prometheus.client.MetricFamily, which Prometheus expects. This buffer could be returned in the body of an HTTP response using something like Express or Koa.

There would be one added dependency, protobufjs.

The only issue is that prom-client internally stores the metrics in a format that is not compliant with Prometheus' metrics.proto messages, so either A) a conversion function is required, or B) we consider updating prom-client to internally store metrics as MetricFamily-compliant objects (e.g. for metric.type, storing 0 instead of counter, etc.).

Here's the conversion function we're using; it would be nice not to have to do this. It returns an array of metric objects from which we create protobuf messages using the MetricFamily.fromObject(metric) function in protobufjs's generated code. (Note: support for Summary and Histogram is unfinished.)

// Convert prom-client metric objects to `io.prometheus.client.MetricFamily` compliant objects
function convertMetrics() {
    let convertedMetrics = []
    for (let metricFamily of prom.getMetricsAsJSON()) {
        let newMetricFamily = {}
        newMetricFamily.name = metricFamily.name
        newMetricFamily.help = metricFamily.help
        let newMetrics = []
        for (let metric of metricFamily.values) {
            let newMetric = {}
            let newLabels = []
            Object.keys(metric.labels).forEach(function (key) {
                // Skip null/undefined label values instead of calling toString() on them
                if (metric.labels[key] != null)
                    newLabels.push({ "name": key, "value": metric.labels[key].toString() });
            });
            newMetric.label = newLabels;
            switch (metricFamily.type) {
                case "counter":
                    newMetricFamily.type = 0;
                    newMetric.counter = { "value": metric.value };
                    break;
                case "gauge":
                    newMetricFamily.type = 1;
                    newMetric.gauge = { "value": metric.value };
                    break;
                case "summary":
                    newMetricFamily.type = 2;
                    // TODO
                    break;
                case "histogram":
                    newMetricFamily.type = 4;
                    // TODO
                    break;
            }
            // TODO timestamp_ms is not supported in prom-client? (confirm or contribute)
            // newMetric.timestamp_ms = metric.timestamp_ms;
            newMetrics.push(newMetric);
        }
        newMetricFamily.metric = newMetrics
        convertedMetrics.push(newMetricFamily);
    }
    return convertedMetrics;
}

I've written code that successfully sends the protobuf messages to Prometheus, and I'd like to share it, but the conversion part feels hacky. Thoughts?

object-hash dependency introduces high CPU overhead

While load-testing, we noticed unusually high CPU usage in one of our services and tracked it down to calling prom-client counters. On further investigation, the culprit appears to be calls to the objectHash() function from the object-hash module prom-client depends on. That function, in turn, does quite a bit, including talking to the native crypto libraries to figure out what algorithms are available, instantiating one of them, dispatching to a type-specific handler (which, in the case of an object, recursively flattens the object so it can feed it to one of the cryptographic hash algorithms). On my older Core i7 machine, I could only manage to call counter.inc() about 3,000 times per second at 85% CPU utilization while doing absolutely no other work. Given this level of overhead, we cannot use prom-client in its current state.

So, I rewrote the hashObject() function in prom-client's lib/util.js to do the bare minimum amount of work necessary to produce a unique (and cross-call consistent) string since it doesn't actually need to be a hash – it's only ever used as a key. Now, I can do 300,000 calls to counter.inc() per second while using 85% CPU. This is with a labels object that has two keys in it. An empty labels object can handle 300,000 calls per second at 35% CPU.

Based on those encouraging results, I went through the rest of the prom-client code this morning and made the rest of the changes to eliminate the dependency on object-hash entirely by having the other modules call the updated hashObject() function in lib/util.js. "npm test" works – all tests pass. Patch attached.
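
Not the attached patch itself, but a rough sketch of the kind of replacement described: build a stable string key from the label object instead of a cryptographic hash.

function hashObject(labels) {
  var keys = Object.keys(labels);
  if (keys.length === 0) return '';
  keys.sort(); // stable ordering across calls
  var parts = [];
  for (var i = 0; i < keys.length; i++) {
    parts.push(keys[i] + ':' + labels[keys[i]]);
  }
  return parts.join(',');
}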

objectHash.diff.txt

The label handling is broken

Label values should be dynamically handled in each metric, and each metric should expose a "labels" function as the guidelines describe.

Add merge function to register

This would merge multiple registries together, creating one single registry containing all metrics. This would be the final piece needed for a good way of handling prom-client within modules.

If there are conflicts on the metric name, an error should be thrown (and documented as well).
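
Hypothetical usage of the proposed merge (the Registry class and the static method shown here are assumptions):

var client = require('prom-client');
var Registry = client.Registry;

var registryA = new Registry();
var registryB = new Registry();

// ...each module registers its metrics on its own registry...

// Merge into one registry whose metrics() output contains everything;
// a metric name present in both should throw.
var merged = Registry.merge([registryA, registryB]);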

Is it possible to break up metrics by sub-properties?

Basically we are trying to make our Node apps match some Prometheus output that's generated by Java apps, so both can be scraped by the same algorithm.

Here is what the Java output looks like for a couple of end points:

# HELP app_request_latency Latencies associated with uris of this application.
# TYPE app_request_latency summary
app_request_latency{uri="/info",quantile="0.5",} 1.0
app_request_latency{uri="/info",quantile="0.95",} 2.0
app_request_latency_count{uri="/info",} 101909.0
app_request_latency_sum{uri="/info",} 130389.0
app_request_latency{uri="/health",quantile="0.5",} NaN
app_request_latency{uri="/health",quantile="0.95",} NaN
app_request_latency_count{uri="/health",} 97.0
app_request_latency_sum{uri="/health",} 253772.0

And here is the best I've been able to do with the node output:

# HELP app_request_latency_info /info
# TYPE app_request_latency_info summary
app_request_latency_info{quantile="0.5"} 4
app_request_latency_info{quantile="0.95"} 4
app_request_latency_info_sum 5
app_request_latency_info_count 2
# HELP app_request_latency_health /health
# TYPE app_request_latency_health summary
app_request_latency_health{quantile="0.5"} 2
app_request_latency_health{quantile="0.95"} 2
app_request_latency_health_sum 3
app_request_latency_health_count 2

As you can see I'm using the metric name to separate them. So it's not really broken down the way we need to match the Java output (by a uri property). I don't know right now if they did some crazy custom work to get that. I'm just wondering if there's some easy way to replicate it with the node client.
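
For reference, the label-based layout would look roughly like this with prom-client (the constructor signature and percentiles config for this older version are assumptions):

var client = require('prom-client');

// One summary with a uri label instead of one metric per endpoint.
var latency = new client.Summary(
  'app_request_latency',
  'Latencies associated with uris of this application.',
  ['uri'],
  { percentiles: [0.5, 0.95] });

latency.labels('/info').observe(1.0);
latency.labels('/health').observe(2.0);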

Uncaught error when tracking heap usage

Even though Node does not document it, this line can throw with:

Error: EMFILE: too many open files, uv_resident_set_memory
    at Error (native)
    at /...../node_modules/prom-client/lib/metrics/heapSizeAndUsed.js:18:26
    at /...../node_modules/prom-client/lib/defaultMetrics.js:49:11
    at Array.forEach (native)
    at Timeout.updateAllMetrics [as _onTimeout] (/...../node_modules/prom-client/lib/defaultMetrics.js:48:16)
    at ontimeout (timers.js:365:14)
    at Timer.unrefdHandle (timers.js:466:3)

It would be polite if prom-client could catch this.
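
A minimal sketch of the suggested guard (where exactly it would live inside the default metrics is up to the maintainers):

// Wrap the probe so an EMFILE (or similar) error is swallowed instead of
// crashing the process; the sample is simply skipped for this tick.
function safeMemoryUsage() {
  try {
    return process.memoryUsage();
  } catch (err) {
    return null;
  }
}

var usage = safeMemoryUsage();
if (usage) {
  // update the heap gauges as usual
}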

Aggregate metrics in multiprocess mode

So we use pm2 to run a Node "cluster" in fork mode to fully utilize the server's CPU cores. This means that every Node.js instance can use prom-client to generate the metrics of its own process, but we need one more step to aggregate metrics from every node into a single result. Is there a solution for this within prom-client, or do I need an external aggregator (and is there one in the wild)? Sorry if this is the wrong place to ask this question.

Drop support for node older than 4

This would allow us to use classes instead of prototypes, and make handling of `this` easier. We only test on Node 4, 6 and 8 on CI now anyway. If you're interested @siimon, I can make a PR for it.

process_cpu_seconds_total gauge breaks push gateway

When trying to use prom-client with a long-lived ETL batch job that needs to push, the push gateway dies because prom-client uses a gauge for process_cpu_seconds_total (see prometheus/pushgateway#94).

process_cpu_seconds_total uses set() on a gauge, but it should ideally be exposed as a counter. Either prom-client should add set() to Counter, or it should add reset() and then inc() the process_cpu_seconds_total counter.

Creating our own metrics

Good afternoon,

I have a project in which we would like to create our own metrics, and we thought your library was interesting. We would like to base our code on yours, meaning we would like to inherit from your metrics or create new metrics with the same function signatures and style.

To do that, we would need access to some of the private content of your library, such as the contents of lib/utils.js and lib/validation.js.

Would it be possible to make these functions public somehow?

Maybe you could export them this way:

  var defaultMetrics = require('./lib/defaultMetrics');
  defaultMetrics();

  module.exports = {
    // Main library
    register: require('./lib/register'),
    Counter: require('./lib/counter'),
    Gauge: require('./lib/gauge'),
    Histogram: require('./lib/histogram'),
    Summary: require('./lib/summary'),
    Pushgateway: require('./lib/pushgateway'),
    linearBuckets: require('./lib/bucketGenerators').linearBuckets,
    exponentialBuckets: require('./lib/bucketGenerators').exponentialBuckets,
    defaultMetrics: defaultMetrics,

    // To create your own metrics
    validateMetricName: require('./lib/validation').validateMetricName,
    validateLabelName: require('./lib/validation').validateLabelName,
    validateLabel: require('./lib/validation').validateLabel,

    getPropertiesFromObj: require('./lib/utils').getPropertiesFromObj,
    setValue: require('./lib/utils').setValue,
    getLabels: require('./lib/utils').getLabels,
    hashObject: require('./lib/utils').hashObject,
    isNumber: require('./lib/utils').isNumber,
  };

What do you think?

Thanks,

Nicolas Pelletier
