siimon / prom-client
Prometheus client for node.js
License: Apache License 2.0
When I navigate to my metrics endpoint, some counters don't have values because they were never incremented. One such counter is unknown_failure_total, which gets incremented on unhandled exceptions.
I have Prometheus hooked up to Grafana and it keeps alerting me that there's missing data for some counters. I don't want to suppress this alert because missing data could also mean that Grafana is not able to reach my Prometheus server.
The HELP and TYPE statements are printed even for un-incremented counters, but their zero value is not. Should a 0 value be printed for counters which have yet to be incremented?
# HELP postgres_upsert_failure_total postgres_upsert_failure_total
# TYPE postgres_upsert_failure_total counter
postgres_upsert_failure_total{environment="production"} 2
# HELP unknown_failure_total cli_unknown_failure_total
# TYPE unknown_failure_total counter
Thanks!
I can't find this anywhere in the documentation.
Hi,
the _count series and the +Inf bucket should expose the total count of all observations, but this isn't the case. Here is an example exposing HTTP request latency observations:
http_request_duration_microseconds_bucket{le="10",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 0
http_request_duration_microseconds_bucket{le="100",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 0
http_request_duration_microseconds_bucket{le="1000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 1464
http_request_duration_microseconds_bucket{le="5000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 25681
http_request_duration_microseconds_bucket{le="10000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 36814
http_request_duration_microseconds_bucket{le="25000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 42800
http_request_duration_microseconds_bucket{le="50000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 47149
http_request_duration_microseconds_bucket{le="100000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 52459
http_request_duration_microseconds_bucket{le="500000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55924
http_request_duration_microseconds_bucket{le="1000000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55964
http_request_duration_microseconds_bucket{le="1500000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55975
http_request_duration_microseconds_bucket{le="5000000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55977
http_request_duration_microseconds_bucket{le="10000000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55977
http_request_duration_microseconds_bucket{le="30000000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55977
http_request_duration_microseconds_bucket{le="50000000",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 55977
http_request_duration_microseconds_bucket{le="+Inf",role="express-web-server",route="/api/v2/foo/",method="POST",code="200"} 789
http_request_duration_microseconds_sum{code="200",method="POST",route="/api/v2/foo/",role="express-web-server"} 96063322
http_request_duration_microseconds_count{code="200",method="POST",route="/api/v2/foo/",role="express-web-server"} 789
As you can see, the +Inf bucket and http_request_duration_microseconds_count are 789, way lower than the next smaller bucket, which is 55977.
The numbers in the buckets also seem to be off: if I build a rate over the le="50000000" bucket, it yields around 10x as many requests as I expect. Might be related to a separate issue.
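For reference, Prometheus histogram buckets are cumulative: each le bucket counts every observation less than or equal to its bound, so the +Inf bucket should always equal _count and never be smaller than any finite bucket. A minimal sketch of that invariant (metric names are made up; this assumes the name/help/config constructor shape):
var client = require('prom-client');

var h = new client.Histogram('demo_duration_seconds', 'Demo latencies', {
  buckets: [0.1, 1, 5]
});

h.observe(0.05); // counts toward le="0.1", le="1", le="5" and le="+Inf"
h.observe(3);    // counts toward le="5" and le="+Inf"

// The exposition should now show:
// demo_duration_seconds_bucket{le="0.1"}  1
// demo_duration_seconds_bucket{le="1"}    1
// demo_duration_seconds_bucket{le="5"}    2
// demo_duration_seconds_bucket{le="+Inf"} 2  <- always equals _count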
It wasn't clear to me how to push the default metrics through the gateway to a separate server. Any explanation on this?
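For anyone else landing here: prom-client ships a Pushgateway wrapper, and pushing everything in the registry (default metrics included) looks roughly like this sketch. The URL and job name are placeholders, and it assumes the callback-style pushAdd:
var client = require('prom-client');

// Assumes the default metrics are being collected (e.g. client.defaultMetrics()).
var gateway = new client.Pushgateway('http://my-pushgateway.example:9091');

// pushAdd pushes all registered metrics for this job to the gateway.
gateway.pushAdd({ jobName: 'my-batch-job' }, function (err, resp, body) {
  if (err) console.error('push failed', err);
});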
Please add getSingleMetric to the TypeScript definitions.
Hi! I found myself using this library in perhaps a strange way. Essentially, prom-client was registering a metric twice, which messes up the scraper. In general the solution is to not have multiple metrics with the same name, but I couldn't find a clean way around it.
In a shared lib, I registered a metric. Another dependency used that same shared lib as a dependency. Both are listed in their respective package.json as a git repo URL for various reasons. This has the side effect that npm does not dedupe the dependency. Essentially, then, the metric name was registered twice, which caused the Prometheus scraper to report text format parsing error in line #: second HELP line for metric name "<metric_name>".
I'm working on a PR that prevents a metric from being registered twice; a sketch of the idea follows below. I am no Prometheus expert, so I would appreciate input on whether this might be a bad idea! My first thought is that users of prom-client might not expect this behavior and could accidentally mix metrics (though I think using labels properly would avoid this, yes?). My second thought is that my use case is pathological and I should not change the lib to work around it (in which case, any suggestions? Use an escaped __dirname appended to the metric name and have Prometheus aggregate across all metrics with my metric_name prefix?)
All the tests pass, but something tells me this might be a bigger change than I think and just should be a larger version hop, but I leave that up to you!
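For concreteness, the guard I have in mind is roughly this (a sketch of the idea, not the actual diff; names are assumed):
// Inside the registry: refuse to register the same metric name twice.
var metrics = {};

function registerMetric(metric) {
  if (metrics[metric.name]) {
    throw new Error('A metric with the name ' + metric.name + ' has already been registered.');
  }
  metrics[metric.name] = metric;
}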
Is it possible to skip SSL certificate verification for Pushgateway connections?
I have a Pushgateway that lives on a self-signed HTTPS endpoint, but I'm getting certificate errors when the client tries to access it.
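What I was hoping for is something like the following - assuming the Pushgateway constructor forwards extra options to Node's https.request (if it doesn't, consider this the feature request):
var client = require('prom-client');

// rejectUnauthorized: false tells Node's TLS layer to accept the
// self-signed certificate. Only safe for trusted internal endpoints.
var gateway = new client.Pushgateway('https://my-pushgateway.example:9091', {
  rejectUnauthorized: false
});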
Node provides built-in ways to get CPU and memory usage information from the process module:
https://nodejs.org/api/process.html#process_process_cpuusage_previousvalue
https://nodejs.org/api/process.html#process_process_memoryusage
It would be useful to provide these as exported variables by default. This would be in keeping with practice of other Prometheus clients, like the Go client. Would you be open to a patch?
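A sketch of what such a default metric could look like (the metric name and sampling interval are assumptions, not a final design):
var client = require('prom-client');

var cpuUser = new client.Counter('process_cpu_user_seconds_total', 'User CPU time spent in seconds');

var lastUsage = process.cpuUsage();
setInterval(function () {
  var delta = process.cpuUsage(lastUsage); // usage since the previous sample
  lastUsage = process.cpuUsage();
  cpuUser.inc(delta.user / 1e6); // cpuUsage() reports microseconds
}, 10000);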
Hi, I'm trying to write a status page for my web app with the help of this library. I use it via a custom Koa middleware that collects request metrics. Now it would be useful if I could get these metrics as JSON in order to render them nicely on a static page. Would you accept a PR for this?
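To make the ask concrete: with something like a register.getMetricsAsJSON() (name assumed for the proposed API), the status page could be backed by a plain endpoint, sketched here with Node's http module:
var http = require('http');
var client = require('prom-client');

http.createServer(function (req, res) {
  if (req.url === '/status.json') {
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify(client.register.getMetricsAsJSON()));
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(3000);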
Hello. Starting from Node v7.2.0, an external field was added (link) to the output of process.memoryUsage(). For most people it probably won't be interesting, but it could sometimes help to find issues when your app is doing intensive I/O, by tracking the amount of memory used for bound C++ objects.
I can make a PR, but I don't know where it should be placed: the field is present on all OSes, but we have a separate module for measuring Linux memory usage which doesn't call process.memoryUsage. Since this metric may not be that popular, it could go in a separate module so users are able to disable it. On the other hand, adding it to the existing module would save an additional process.memoryUsage call. So basically the question is: do we really need this as a default metric, and if so, where should it be placed?
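For illustration, the metric would boil down to this (the metric name is assumed, not final):
var client = require('prom-client');

var externalMemory = new client.Gauge('nodejs_external_memory_bytes', 'Node.js external memory size in bytes');

setInterval(function () {
  var usage = process.memoryUsage();
  if (usage.external !== undefined) { // the field only exists on Node >= 7.2.0
    externalMemory.set(usage.external);
  }
}, 10000);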
Validate label names according to http://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
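For reference, the rules from that page expressed as regexes (taken straight from the data model documentation):
// Metric names: [a-zA-Z_:][a-zA-Z0-9_:]*
var metricNameRe = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;

// Label names: [a-zA-Z_][a-zA-Z0-9_]*; names starting with __ are reserved
var labelNameRe = /^[a-zA-Z_][a-zA-Z0-9_]*$/;

function validateLabelName(name) {
  return labelNameRe.test(name) && name.indexOf('__') !== 0;
}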
I recently upgraded my services to Node 6 in order to start collecting process_cpu_seconds_total. I'm now graphing them and seeing strange results. The stats seem to be claiming that Node is using 125k seconds per second, according to this Prometheus query:
irate(process_cpu_seconds_total{job="bt-actions"}[1m]) * 60
Screenshot of Grafana plot attached. Any idea what's up? Am I querying this wrong or using the wrong units? Do you have reports from people successfully using the process_cpu_seconds_total stat?
Thanks,
Jacob
I have a case here where I'm counting up integers which can also be zero. So it might happen that counter.inc(0) is called, and I assumed that this doesn't increase the counter. As far as I can see, this boils down to value || 1 → 1 for value == 0, at the bottom of counter.js.
Right there, I would rather just skip incrementing if the value is zero.
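The one-character version of what I mean (simplified from the shape of counter.js, not an exact diff):
// current behaviour: 0 falls back to 1, so inc(0) increments by 1
value = value || 1;

// suggested: only default when no value was passed at all
if (value === undefined) { value = 1; }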
I'm relatively new to node.js and have trouble understanding the example, particularly how to create a simple 'hello world' app.
https://github.com/siimon/prom-client/blob/master/example/server.js#L5
var register = require('../lib/register');
It seems I need to clone the prom-client repo in order to get the example to work. If I consume prom-client as an npm module, how would I use prom-client with express? Thanks.
var client = require('prom-client');
// express
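For anyone with the same question: the npm-published module exposes the registry directly, so an Express endpoint is roughly this (a minimal sketch, assuming the synchronous register.metrics() of current versions):
var express = require('express');
var client = require('prom-client');

var app = express();

app.get('/metrics', function (req, res) {
  res.set('Content-Type', 'text/plain; version=0.0.4; charset=utf-8');
  res.send(client.register.metrics());
});

app.listen(3000);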
There have been two major releases recently, but I can't tell what's changed.
Right now we discard all values that are greater than the highest bucket value - this is probably wrong.
The Java client adds such a value to the sum without adding it to a finite bucket, so that's an idea.
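A standalone model of what observe() should do with out-of-range values (a sketch, not the actual internals):
var upperBounds = [0.1, 1, 5];
var bucketValues = { '0.1': 0, '1': 0, '5': 0 };
var sum = 0;
var count = 0;

function observe(value) {
  sum += value;
  count += 1; // count backs the implicit +Inf bucket
  upperBounds.forEach(function (bound) {
    if (value <= bound) { bucketValues[bound] += 1; }
  });
  // a value above the highest bound lands only in sum, count and +Inf
}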
The standard way of getting metrics into Prometheus is scraping the /metrics endpoint of your application. I'm not seeing where this client exposes metrics on the /metrics route. Is this possible or is this just using the push_gateway?
I've been using @SimenB's code to collect production stats for a few days now. I also manually added heapTotal and heapUsed stats to my app.
As you can see, heapTotal and heapUsed are very different values than process_heap_bytes, and I think they more accurately reflect a typical understanding of heap usage. They also correspond to the value of --max_old_space_size. For instance, if I set --max_old_space_size=400, the process terminates when heapUsed (or heapTotal - haven't looked in detail) reaches 400MB. process_heap_bytes starts at ~1.4GB for my app.
@siimon, any objections to adding heapTotal and heapUsed as default metrics?
Hi,
I tried to use your library, which I think is cool, but got the following error:
[email protected] install /<user_path>/bufferutil
node-gyp rebuild
CXX(target) Release/obj.target/bufferutil/src/bufferutil.o
In file included from ../src/bufferutil.cc:16:
../../nan/nan.h:261:25: error: redefinition of '_NanEnsureLocal'
NAN_INLINE v8::Local _NanEnsureLocal(v8::Local val) {
^
../../nan/nan.h:256:25: note: previous definition is here
NAN_INLINE v8::Local _NanEnsureLocal(v8::Handle val) {
....
According to a Google search, this seems to be related to some old libraries/dependencies (bufferutil).
I'm using:
node v5.5.0
npm 3.5.3
MacOS 10.11.6
I couldn't see a straightforward way to fix the dependencies issues.
There is currently no guidance provided for clustered processes (the default mechanism for running an app on multi-core servers): https://nodejs.org/api/cluster.html#cluster_cluster
By default, a clustered process operating in round-robin fashion will only serve metrics local to the process which handled a particular scrape request from Prometheus. This makes default metrics like "active requests" meaningless.
Possible solutions include:
Use http module in node core instead to reduce unnecessary dependencies
In the official Prometheus Python client, they expose a constant for the right content-type header, and it's tied to the library. That would be nice in this package too. From client_python:
>>> from prometheus_client import CONTENT_TYPE_LATEST
>>> CONTENT_TYPE_LATEST
'text/plain; version=0.0.4; charset=utf-8'
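The equivalent here could be as small as this (a sketch; the export name is an assumption):
// lib/register.js (or the package root)
module.exports.contentType = 'text/plain; version=0.0.4; charset=utf-8';

// usage in a handler:
// res.set('Content-Type', client.contentType);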
Is there a function like client_python's push_to_gateway used to export all metrics data to the Pushgateway?
It would be useful if you could define a global namespace => a prefix to be attached to all metric names.
I was assigning some values to the Counter prototype and using the metric name 200; this of course is a number, which violates this regex.
My question is: if I was looking to record status codes under the name 200 with a label of statusCode, would you see this as still acceptable? Or would you request I flip them and use the status codes as label values? If the latter, what challenges do you see the former presenting from a design perspective?
Thanks for your time!
Theryn
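For contrast, the flipped shape would look like this (a sketch using the name/help/labelNames constructor; the metric name is made up):
var client = require('prom-client');

// One metric; the status code becomes a label value instead of a metric name.
var responses = new client.Counter('http_responses_total', 'HTTP responses by status code', ['statusCode']);

responses.inc({ statusCode: '200' });
responses.labels('404').inc();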
There is a good opportunity to improve performance when using the labels function - instead of calculating the labels hash every time we do a metric operation, it could be done when you create the labels function. This enables the user of the library to cache the metric with the specific labels, so the label object doesn't have to be hashed on every operation.
This is already done in Counters, but should be done in all other metric types as well.
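Concretely, the pattern this enables on the caller side (the metric here is an example, not from the codebase):
var client = require('prom-client');

var requests = new client.Counter('http_requests_total', 'Total HTTP requests', ['method']);

// labels() can compute the hash once and hand back a cached child...
var getRequests = requests.labels('GET');

// ...so the hot path is just a numeric increment, with no per-call hashing.
getRequests.inc();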
I can't find a way (from reading the code and readme) to add labels to the default metrics. This is important with regard to production deployment so that e.g. process id can be associated without forcing this data to be a grouping.
This is particularly important when needing to override the "instance" label (for pushgateway instances)
Hi
I've been digging into TypeScript the last few days and started adding a declaration file for prom-client. While starting to implement this, I really hit a problem with how the function signatures look when creating a new metric. It's really a mess with optional objects that can sometimes be a labels array and sometimes a configuration object (see histogram or summary for example). I don't like how it is, and it's confusing both for users and for the code within prom-client. Another drawback is that it's really messy to add a new parameter, which we might want to do in the future.
So my suggestion is to remove all the parameters when creating a metric and replace them with a single object that has all the parameters as properties.
I know that it will be a mess to migrate to that new version (which will obviously be a major release). What are your thoughts on this? Is it worth implementing even with the major downside that all of us need to migrate everywhere metrics are instantiated?
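Roughly what I have in mind (a sketch of the proposed shape, not a final API):
var client = require('prom-client');

// before: positional arguments, where the third one is sometimes a labels
// array and sometimes a configuration object
var before = new client.Histogram('http_request_duration', 'Duration of requests', ['method'], { buckets: [0.1, 1, 5] });

// after: one configuration object, trivially extensible with new properties
var after = new client.Histogram({
  name: 'http_request_duration',
  help: 'Duration of requests',
  labelNames: ['method'],
  buckets: [0.1, 1, 5]
});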
As a client I would like to be able to set the precision/unit of the time measured. For instance microsecond level.
It's a common pattern in metrics instrumentation to report counters across some set of named states as labels. It's also common to initialise them all to zero up front, so that the first increment is noticed by Prometheus and on graphs. Otherwise, Prometheus' rate() function fails to see the jump from not-existing to 1, and hence the first occurrence of every event never appears on a graph. This can matter enormously for rare events.
For example, I have some code that performs:
var counter = new client.Counter(..., ["state"]);
STATES.forEach((state) => counter.inc({state: state}, 0));
With the older prometheus-client library I migrated away from, this would initialise every counter to zero in the output. Using prom-client this doesn't happen, and the first occurrence goes unnoticed.
Is there an example available on how to load the library into an existing Angular2 project? Or is there a special configuration for this?
I'll leave the test activated for now and give the nock contributors some time to fix it. If something doesn't happen soon we'll have to rewrite the test fixtures to something else.
There are a couple of issues for this:
nock/nock#925
nock/nock#922
nock/nock#928
The LICENSE file contains Apache 2.0 license text but the "license" property in package.json contains the value "MIT", which is confusing for those browsing the package on npmjs.org.
Hi,
When disabling the default metrics with this code:
var client = require('prom-client');
clearInterval(client.defaultMetrics());
client.register.clear();
I got the following error:
Error: A metric with the name process_cpu_seconds_total has already been registered.
I'm using version 5.0.0.
Thanks for the great library!
Would you be interested in a pull request related to serving metrics to prometheus in the protocol buffer format?
My thought was to create a new function in lib/register.js named something like getMetricsAsProtobuf(), which returns a buffer containing the 32-bit varint-encoded, record length-delimited protocol buffer messages of type io.prometheus.client.MetricFamily which Prometheus expects. This buffer could be returned in the body of an HTTP response using something like Express or Koa.
There would be one added dependency, protobufjs.
The only issue is, you guys internally store the metrics in a format that is not compliant with Prometheus' metrics.proto messages, so either A) a conversion function is required, or B) we consider updating prom-client to internally store metrics as MetricFamily-compliant objects (e.g. for metric.type, we store 0 instead of counter, etc.).
Here's the conversion function we're using; it would be nice not to have to do this. It returns an array of metric objects from which we create protobuf messages using protobufjs's generated MetricFamily.fromObject(metric) function. (Note: support for Summary and Histogram is unfinished.)
// Convert prom-client metric objects to `io.prometheus.client.MetricFamily` compliant objects
function convertMetrics() {
  let convertedMetrics = [];
  for (let metricFamily of prom.getMetricsAsJSON()) {
    let newMetricFamily = {};
    newMetricFamily.name = metricFamily.name;
    newMetricFamily.help = metricFamily.help;
    let newMetrics = [];
    for (let metric of metricFamily.values) {
      let newMetric = {};
      let newLabels = [];
      Object.keys(metric.labels).forEach(function (key) {
        if (metric.labels[key] != null)
          newLabels.push({ "name": key, "value": metric.labels[key].toString() });
      });
      newMetric.label = newLabels;
      // MetricType enum values from metrics.proto: COUNTER=0, GAUGE=1, SUMMARY=2, UNTYPED=3, HISTOGRAM=4
      switch (metricFamily.type) {
        case "counter":
          newMetricFamily.type = 0;
          newMetric.counter = { "value": metric.value };
          break;
        case "gauge":
          newMetricFamily.type = 1;
          newMetric.gauge = { "value": metric.value };
          break;
        case "summary":
          newMetricFamily.type = 2;
          // TODO
          break;
        case "histogram":
          newMetricFamily.type = 4;
          // TODO
          break;
      }
      // TODO timestamp_ms is not supported in prom-client? (confirm or contribute)
      // newMetric.timestamp_ms = metric.timestamp_ms;
      newMetrics.push(newMetric);
    }
    newMetricFamily.metric = newMetrics;
    convertedMetrics.push(newMetricFamily);
  }
  return convertedMetrics;
}
I've written code that successfully sends the protobuf messages to Prometheus. I'd like to share it, but the conversion part feels hacky. Thoughts?
While load-testing, we noticed unusually high CPU usage in one of our services and tracked it down to calling prom-client counters. On further investigation, the culprit appears to be calls to the objectHash() function from the object-hash module prom-client depends on. That function, in turn, does quite a bit, including talking to the native crypto libraries to figure out what algorithms are available, instantiating one of them, dispatching to a type-specific handler (which, in the case of an object, recursively flattens the object so it can feed it to one of the cryptographic hash algorithms). On my older Core i7 machine, I could only manage to call counter.inc() about 3,000 times per second at 85% CPU utilization while doing absolutely no other work. Given this level of overhead, we cannot use prom-client in its current state.
So, I rewrote the hashObject() function in prom-client's lib/util.js to do the bare minimum amount of work necessary to produce a unique (and cross-call consistent) string since it doesn't actually need to be a hash – it's only ever used as a key. Now, I can do 300,000 calls to counter.inc() per second while using 85% CPU. This is with a labels object that has two keys in it. An empty labels object can handle 300,000 calls per second at 35% CPU.
Based on those encouraging results, I went through the rest of the prom-client code this morning and made the rest of the changes to eliminate the dependency on object-hash entirely by having the other modules call the updated hashObject() function in lib/util.js. "npm test" works – all tests pass. Patch attached.
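The attached patch is the authoritative version; the core of the rewrite is approximately this (a sketch):
// lib/util.js - build a deterministic key by sort-and-join instead of
// cryptographic hashing; label sets are tiny, so this is cheap.
exports.hashObject = function hashObject(labels) {
  var keys = Object.keys(labels);
  if (keys.length === 0) {
    return '';
  }
  keys = keys.sort();
  var parts = [];
  for (var i = 0; i < keys.length; i++) {
    parts.push(keys[i] + ':' + labels[keys[i]]);
  }
  return parts.join(',');
};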
Label values should be dynamically handled in each metric, and each metric should expose a labels function as the guidelines describe.
This would merge multiple registries together, creating one single registry containing all metrics. This would be the final bit for having a good way to handle prom-client within modules.
If there are conflicts on the metric name, an error should be thrown (and documented as well).
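As an API sketch (hypothetical signature, open to bikeshedding):
var client = require('prom-client');

// Hypothetical: each module owns its own registry instance.
var moduleARegistry = new client.Registry();
var moduleBRegistry = new client.Registry();

// Hypothetical merge API: one registry containing all metrics.
var merged = client.Registry.merge([moduleARegistry, moduleBRegistry]);

// merged.metrics() would expose everything; a metric name present in both
// input registries should throw at merge time.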
Basically we are trying to match some Prometheus output that's generated by Java apps with our node apps, so they can be scraped by the same algorithm.
Here is what the Java output looks like for a couple of end points:
# HELP app_request_latency Latencies associated with uris of this application.
# TYPE app_request_latency summary
app_request_latency{uri="/info",quantile="0.5",} 1.0
app_request_latency{uri="/info",quantile="0.95",} 2.0
app_request_latency_count{uri="/info",} 101909.0
app_request_latency_sum{uri="/info",} 130389.0
app_request_latency{uri="/health",quantile="0.5",} NaN
app_request_latency{uri="/health",quantile="0.95",} NaN
app_request_latency_count{uri="/health",} 97.0
app_request_latency_sum{uri="/health",} 253772.0
And here is the best I've been able to do with the node output:
# HELP app_request_latency_info /info
# TYPE app_request_latency_info summary
app_request_latency_info{quantile="0.5"} 4
app_request_latency_info{quantile="0.95"} 4
app_request_latency_info_sum 5
app_request_latency_info_count 2
# HELP app_request_latency_health /health
# TYPE app_request_latency_health summary
app_request_latency_health{quantile="0.5"} 2
app_request_latency_health{quantile="0.95"} 2
app_request_latency_health_sum 3
app_request_latency_health_count 2
As you can see, I'm using the metric name to separate them, so it's not really broken down the way we need to match the Java output (by a uri label). I don't know right now whether they did some crazy custom work to get that. I'm just wondering if there's some easy way to replicate it with the node client.
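For what it's worth, a single summary with a uri label gets very close to the Java shape (a sketch, assuming the name/help/labels/config signature and the percentiles option):
var client = require('prom-client');

var latency = new client.Summary('app_request_latency', 'Latencies associated with uris of this application.', ['uri'], { percentiles: [0.5, 0.95] });

latency.labels('/info').observe(1);
latency.labels('/health').observe(2);

// Exposes one metric family broken down by uri, e.g.:
// app_request_latency{quantile="0.5",uri="/info"} ...
// app_request_latency_sum{uri="/info"} ...
// app_request_latency_count{uri="/info"} ...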
Even though Node does not document it, this line can throw with:
Error: EMFILE: too many open files, uv_resident_set_memory
at Error (native)
at /...../node_modules/prom-client/lib/metrics/heapSizeAndUsed.js:18:26
at /...../node_modules/prom-client/lib/defaultMetrics.js:49:11
at Array.forEach (native)
at Timeout.updateAllMetrics [as _onTimeout] (/...../node_modules/prom-client/lib/defaultMetrics.js:48:16)
at ontimeout (timers.js:365:14)
at Timer.unrefdHandle (timers.js:466:3)
It would be polite if prom-client could catch this.
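Something along these lines inside the update interval would do (a sketch; the gauge names are stand-ins for whatever heapSizeAndUsed.js actually uses):
var client = require('prom-client');

// stand-in gauges for the ones heapSizeAndUsed.js maintains
var heapTotalGauge = new client.Gauge('nodejs_heap_size_total_bytes', 'Process heap size in bytes');
var heapUsedGauge = new client.Gauge('nodejs_heap_size_used_bytes', 'Process heap used in bytes');

function safeUpdate() {
  try {
    var memUsage = process.memoryUsage(); // can throw EMFILE under fd pressure
    heapTotalGauge.set(memUsage.heapTotal);
    heapUsedGauge.set(memUsage.heapUsed);
  } catch (err) {
    // skip this sample instead of crashing the host application
  }
}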
So we use pm2 to run a Node "cluster" in fork mode to fully utilize the server's CPU cores. This means that every Node.js instance can use prom-client to generate the metrics of its own process, but we need one more step to aggregate the metrics from every node into a single metric. Is there some solution for this with prom-client, or do I need an external aggregator (and is there any in the wild)? Sorry if this is the wrong place to ask this question.
Would allow us to use classes instead of prototypes, and easier handling of this. We only test on Node 4, 6 and 8 on CI now anyway. If you're interested @siimon, I can make a PR for it.
When trying to use prom-client with a long-lived ETL batch job that needs to push, the push gateway dies because prom-client uses a gauge for process_cpu_seconds_total (see prometheus/pushgateway#94).
process_cpu_seconds_total uses set() on a gauge, but ideally it should be exposed as a counter. Either prom-client should add set() to Counter, or it should reset() and then inc() the process_cpu_seconds_total counter.
Good afternoon,
I have a project in which we would like to create our own metrics, and we thought your library was interesting. We would like to base our code on yours, meaning we would like to inherit from yours or create new metrics with the same function signatures and styles.
To do that, we would need access to some of the private content of your library, such as the contents of lib/utils.js and lib/validation.js. Would it be possible to make these functions public somehow?
Maybe you could export them this way:
var defaultMetrics = require('./lib/defaultMetrics');
defaultMetrics();
module.exports = {
// Main library
register: require('./lib/register'),
Counter: require('./lib/counter'),
Gauge: require('./lib/gauge'),
Histogram: require('./lib/histogram'),
Summary: require('./lib/summary'),
Pushgateway: require('./lib/pushgateway'),
linearBuckets: require('./lib/bucketGenerators').linearBuckets,
exponentialBuckets: require('./lib/bucketGenerators').exponentialBuckets,
defaultMetrics: defaultMetrics,
// To create your own metrics
validateMetricName: require('./lib/validation').validateMetricName,
validateLabelName: require('./lib/validation').validateLabelName,
validateLabel: require('./lib/validation').validateLabel,
getPropertiesFromObj: require('./lib/utils').getPropertiesFromObj,
setValue: require('./lib/utils').setValue,
getLabels: require('./lib/utils').getLabels,
hashObject: require('./lib/utils').hashObject,
isNumber: require('./lib/utils').isNumber,
};
What do you think?
Thanks,
Nicolas Pelletier
observe(0) throws if not used with a prior .labels() call.
See https://github.com/siimon/prom-client/blob/master/lib/histogram.js#L65, where 0 gets ||'d to {}:
var Histogram = require('prom-client').Histogram;
var histogram = new Histogram(..);
histogram.observe(0);
results in
Error: Value is not a valid number
at ../node_modules/prom-client/lib/summary.js:210:10
at Summary.observe (../node_modules/prom-client/lib/summary.js:57:34)
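A sketch of the disambiguation the entry point presumably needs (simplified, not the actual fix):
// observe(value) and observe(labels, value) share one entry point; the
// check has to be "was a second argument passed", not a truthiness test,
// otherwise observe(0) mistakes the 0 for a labels object.
function observe(labels, value) {
  if (value === undefined) {
    value = labels;
    labels = {};
  }
  // ...validate that value is a number, then record it
}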