prometheus / client_ruby
Prometheus instrumentation library for Ruby applications
License: Apache License 2.0
There is currently no decay of summary observations. This might lead to wrong quantile metrics for low-throughput services, due to stale values skewing the results.
The performance of the Ruby client needs to be tested with benchmarks. There have been a few reports of performance issues during metrics scrapes, especially when using many summaries.
I think it would be nice to have the README.md show instructions for the most recent stable version.
If I go to github.com/prometheus/client_ruby and look at the README.md, those are not the instructions for the most recent stable version.
Seeing histogram support land is great. Can the Rack collector be updated to export it?
The metrics endpoint is open and has no authentication.
I use an authentication block to solve this; in my Rails app it's used like this:
config.middleware.use Prometheus::Middleware::Exporter, authentication: ->(env) do
  ActiveSupport::SecurityUtils.secure_compare(
    Rack::Request.new(env).params['secret'].to_s,
    YOUR_SECRET
  )
end
module Prometheus
  module Middleware
    class Exporter
      attr_reader :app, :registry, :path

      FORMATS  = [Client::Formats::Text].freeze
      FALLBACK = Client::Formats::Text
      DEFAULT_AUTHENTICATION = ->(_) { true }

      def initialize(app, options = {})
        @app = app
        @registry = options[:registry] || Client.registry
        @path = options[:path] || '/metrics'
        @acceptable = build_dictionary(FORMATS, FALLBACK)
        @authentication = options[:authentication] || DEFAULT_AUTHENTICATION
      end

      def call(env)
        if env['PATH_INFO'] == @path
          if @authentication.call(env)
            format = negotiate(env, @acceptable)
            format ? respond_with(format) : not_acceptable(FORMATS)
          else
            authentication_failed!
          end
        else
          @app.call(env)
        end
      end

      private

      def authentication_failed!
        [
          401,
          { 'Content-Type' => 'text/plain' },
          ['Authentication Failed']
        ]
      end
    end
  end
end
I wonder when the right time is to push metrics in a background job system like Sidekiq. It seems like too much to do it after each job completes/fails; is a cron job or scheduled task better? And how often?
My understanding is that Sidekiq uses threads, so the registry will be available for every job as long as the process is running, but I wonder what happens with scheduled jobs. Perhaps this question is more for Sidekiq, but some of you might have already answered it for yourselves.
Thanks!
As part of our work in #95 to introduce multi-process support, we made several breaking changes to the interface of the library.
To ease the transition for existing users, we should provide some documentation on the changes.
We (GoCardless) run our services in containers, which means a clean file system every time we boot the app.
We should look at what the behaviour is like for people whose file systems persist between versions of the app. If stale files cause problems, we should look at what mitigations we can implement to make DirectFileStore work by default.
Any edge cases should be added to DirectFileStore's docs.
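One possible mitigation (an assumption on my part, not shipped behaviour) is for the app to wipe stale metric files at boot, before any worker forks, so a persistent file system behaves like a fresh container. A minimal sketch, using a made-up `*.bin` naming pattern rather than DirectFileStore's actual scheme:

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical boot-time cleanup: delete leftover metric files from a previous
# version of the app. The '*.bin' glob is illustrative, not the store's real
# file-naming convention.
def clear_stale_metric_files(dir)
  Dir.glob(File.join(dir, '*.bin')).each { |f| File.unlink(f) }
end

dir = Dir.mktmpdir
FileUtils.touch(File.join(dir, 'old_metric.bin'))
clear_stale_metric_files(dir)
puts Dir.glob(File.join(dir, '*.bin')).empty? # true: stale files are gone
```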
Hi, first of all, I would like to thank you for your awesome work.
I found
@metrics_prefix = options[:metrics_prefix] || 'http_server'
…
inside lib/prometheus/middleware/collector.rb, which enables users of this gem to customize the prefix of some default metrics.
However, this line is not present in the released version of this gem (latest version 0.7.1). After downloading the gem and viewing the source, I cannot find the equivalent line.
I would like to push the metrics collected by the Rack Collector middleware to a Pushgateway, but I'm not sure how to accomplish this.
Hi!
I've configured the gem in a Rails App and default metrics seem to be working fine. However, If I try to do something like this:
def index
  gauge = Prometheus::Client::Gauge.new(:room_temperature_celsius, '...')
  gauge.set({ room: 'kitchen' }, 21.534)

  result = User.find_by(username: params[:username])
  if result.nil?
    render json: { msg: 'Error user name not found' }, status: :not_found
  else
    render json: result
  end
end
The metric does not show up at the /metrics path.
Is there anything I'm doing wrong?
Thank you!
One of our internal users raised a point about our method of reading PID files when exporting metrics: it can accidentally include more files than it should. Specifically, if one metric name is a subset of another, the export of the metric with the shorter name could include values from the metric with the longer name.
@dmagliola commented that running into this issue would involve putting a triple underscore in your metric name, which is highly unusual and against conventions, but maybe we can choose a character that never appears in metric names when we generate the file names.
At a minimum, if we make no code change, we should document the behaviour.
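To illustrate the over-matching (the file names and `___` separator here are illustrative, not the exact ones the store generates): if files are matched by metric-name prefix alone, a metric whose name is a prefix of another's picks up the other metric's files.

```ruby
# Hypothetical per-process file names: "<metric_name>___<pid>.txt".
files = ['requests___1234.txt', 'requests_failed___1234.txt']

# Matching by bare metric-name prefix over-includes the longer-named metric:
loose  = files.select { |f| f.start_with?('requests') }
# Including the separator in the prefix restricts the match to one metric:
strict = files.select { |f| f.start_with?('requests___') }

puts loose.length  # 2 -- both files matched
puts strict.length # 1 -- only the intended metric's file
```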
Supporting flexible labels in our out-the-box Rack middleware commits us to maintaining what is currently quite a confusing API.
@dmagliola discussed a few ways to make the API less weird, but they always resulted in the middleware accepting multiple lambdas for custom behaviour as arguments and having almost no behaviour provided out the box - sort of defeating the purpose!
We can always come back and add a better implementation of this functionality later, but it will be a pain to take this version of it away.
I’m in favour of only supporting our fixed set of labels in the Rack middleware we provide, and having a README section advising people to do their own thing if they want something more sophisticated.
In contrast to other Prometheus clients (e.g. Go), the histogram does not use disjoint buckets but cumulative values (see https://github.com/prometheus/client_ruby/blob/master/lib/prometheus/client/histogram.rb#L28).
While this is also a nice way to collect metrics, it should be named CumulativeHistogram, and Histogram should behave as in other client libraries, as this can be confusing, especially when using quantile conversion.
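The difference between the two semantics can be shown with a small pure-Ruby sketch (no gem required), assuming a few observations and bucket boundaries:

```ruby
observations = [0.2, 0.4, 0.4, 1.5, 3.0]
buckets = [0.5, 1.0, 2.5]

# Cumulative semantics (as in the Prometheus exposition format): bucket `le=b`
# counts every observation <= b.
cumulative = buckets.map { |b| [b, observations.count { |o| o <= b }] }.to_h
puts cumulative.inspect # {0.5=>3, 1.0=>3, 2.5=>4}

# Disjoint semantics: each observation is counted only in the first bucket it
# fits (3.0 would fall into an implicit +Inf bucket).
disjoint = buckets.each_with_index.map do |b, i|
  lower = i.zero? ? -Float::INFINITY : buckets[i - 1]
  [b, observations.count { |o| o > lower && o <= b }]
end.to_h
puts disjoint.inspect # {0.5=>3, 1.0=>0, 2.5=>1}
```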
The client library should follow our standard collector design and not scrape metrics directly: https://prometheus.io/docs/instrumenting/writing_clientlibs/#overall-structure
Currently tracking this in tenderlove/mmap#6. The issue is that the gemspec is broken for modern ruby.
The prometheus documentation states that:
Counters should not be used to expose current counts of items whose number can also go down, e.g. the number of currently running goroutines.
Currently, counters can be decremented via decrement, or by passing a negative value into increment.
See https://github.com/prometheus/client_golang/blob/master/prometheus/process_collector.go for the go implementation.
The API that this client uses to push metrics to a push gateway was deprecated and then removed in this commit. Pushes from this client to newer versions of the push gateway fail with a 404 Not Found error.
When exposing buckets, the suffix is missing.
I am a newbie with Prometheus. I wanted to test histograms & summaries with pushgateway.
The README was not helpful for me in setting up basic histogram/summary metrics.
I think it would be very helpful to add a basic example to the examples directory.
I'm pretty strongly against supporting anything below Ruby 2.1, because the lack of required keyword arguments is a pain to work around (you can, with sentinel values, but it's a mess). It's been out of support for so long now that I don't think it justifies the ongoing effort and risk of bugs in our workarounds.
This also raises the discussion of what our Ruby version support policy should be overall. I think it's good to document this up front, to set expectations around how we'll act as maintainers - something we've done before on our own open source projects.
Tangentially, some of the memory optimisations people have been playing with here involve methods introduced to Ruby's stdlib in relatively new versions (one of them was added in 2.5). If they're really worth having, we might be looking at some code that conditionally runs in those versions.
Once we decide what we're doing, we should translate that to the CI matrix of Ruby versions that we run our tests against.
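For reference, the sentinel-value workaround in question looks something like this sketch (method and constant names are illustrative, not from the library):

```ruby
# Ruby >= 2.1: a required keyword argument makes the language enforce the call.
def observe(value:, labels: {})
  [value, labels]
end

# Pre-2.1 emulation with a sentinel default -- the mess referred to above.
MISSING = Object.new
def observe_legacy(opts = {})
  value = opts.fetch(:value, MISSING)
  raise ArgumentError, 'missing keyword: value' if value.equal?(MISSING)
  [value, opts.fetch(:labels, {})]
end

puts observe(value: 1.5).inspect        # [1.5, {}]
puts observe_legacy(value: 1.5).inspect # [1.5, {}]
```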
[
  {
    "baseLabels": {
      "__name__": "http_requests_total"
    },
    "docstring": "A counter of the total number of HTTP requests made.",
    "metric": {
      "type": "counter",
      "value": [
        {
          "labels": {
            "method": "get",
            "path": "/metrics",
            "code": "200"
          },
          "value": 6
        },
        {
          "labels": {
            "method": "get",
            "path": "/",
            "code": "200"
          },
          "value": 11
        }
      ]
    }
  },
  {
    "baseLabels": {
      "__name__": "http_request_durations_total_microseconds"
    },
    "docstring": "The total amount of time spent answering HTTP requests (microseconds).",
    "metric": {
      "type": "counter",
      "value": [
        {
          "labels": {
            "method": "get",
            "path": "/metrics",
            "code": "200"
          },
          "value": 4057
        },
        {
          "labels": {
            "method": "get",
            "path": "/",
            "code": "200"
          },
          "value": 27355
        }
      ]
    }
  },
  {
    "baseLabels": {
      "__name__": "http_request_durations_microseconds"
    },
    "docstring": "A histogram of the response latency (microseconds).",
    "metric": {
      "type": "histogram",
      "value": [
        {
          "labels": {
            "method": "get",
            "path": "/metrics",
            "code": "200"
          },
          "value": {
            "0.5": 609,
            "0.9": 652,
            "0.99": 652
          }
        },
        {
          "labels": {
            "method": "get",
            "path": "/",
            "code": "200"
          },
          "value": {
            "0.5": 1492,
            "0.9": 1628,
            "0.99": 1628
          }
        }
      ]
    }
  },
  {
    "baseLabels": {
      "__name__": "http_exceptions_total"
    },
    "docstring": "A counter of the total number of exceptions raised.",
    "metric": {
      "type": "counter",
      "value": []
    }
  }
]
I've included Prometheus rack middleware in a new Rails app and /metrics returns JSON. Is this normal/expected?
There is currently no support for prometheus' protobuf format. The format is described here: http://prometheus.io/docs/instrumenting/exposition_formats/.
The protobuf definition itself has been created already: https://github.com/prometheus/client_model/tree/master/ruby.
Hi,
I was playing with this lib yesterday, mostly to be able to provide metrics about certain aspects of the application, but I've also tried the Collector as well. The Collector provides the same metrics that you can get with Nginx-Lua, so I'll stay with that, since it is a generic solution for every HTTP app and I think it has a little less overhead being collected with Lua on the Nginx side.
However, the question I would like to ask is about how you are dealing with the Collector tracking URIs, since it collects so much data that a scrape takes many seconds to complete. For example, in staging, with ~3 people accessing the application, the scrape time is about 20s; I cannot imagine how much it could be in production. The easy fix is to not track the URI, but then you lose the ability to identify slow endpoints. Of course it is possible to use logs or implement something else to get the slowest requests, but it would be really nice to keep that data in Prometheus.
Can you please share thoughts and experiences about the Collector?
Hello.
I'm using current master and trying to get rid of the path label in the metrics.
If I do this
If I do this
use Prometheus::Middleware::Collector, counter_label_builder: ->(env, code) {
  {
    method: env['REQUEST_METHOD'].downcase,
    code: code
  }
}
then the http_server_request_duration_seconds_bucket metric gets the path label, and the graph which shows percentiles becomes super slow. However, with this setup http_server_requests_total doesn't have path, so the graph which shows requests per second works OK.
If I change to duration_label_builder, then it works vice versa: http_server_requests_total gets path and http_server_request_duration_seconds_bucket does not have it.
I'm confused about how to remove path from both metrics, because having path in either of these metrics leads to super slow graphs. As I understand it, having path in the metrics makes it slow to squash all these metrics with different paths when you group by something like method. For example, for requests-per-second graphs I have the following query:
sum(rate(http_server_request_duration_seconds_count{job="myjob"}[1m])) by(method)
And if there is a separate http_server_request_duration_seconds_count per path, it becomes slow.
Thanks.
I have problems with pushes.
curl to the Pushgateway works, but this does not:
require 'prometheus/client'
require 'prometheus/client/push'
prometheus = Prometheus::Client.registry
counter = Prometheus::Client::Counter.new(:something_here, 'hello world')
counter.increment({ service: 'foo' })
counter.increment({ service: 'foo' })
counter.increment({ service: 'foo' })
counter.increment({ service: 'foo' })
Prometheus::Client::Push.new('job-1', nil, 'http://my-pushgate:9091').add(prometheus)
I don't see the something_here counter in http://my-pushgate:9091/metrics, only push_time_seconds.
What am I doing wrong?
If you use this gem with a multi-process Rack server such as Unicorn, surely each worker will be returning just a percentage of the correct results (eg., number of requests served, total time), thus making the exposed metrics fairly meaningless?
To solve this the easiest solution is to create a block of shared memory in the master process that all workers share, instead of using instance variables.
Would it be possible to release the latest master since it's been sitting for 4 months?
Thanks.
Hi, this is my config.ru file:
require 'rack'
require 'prometheus/middleware/collector'
require 'prometheus/middleware/exporter'
use Rack::Deflater, if: ->(_, _, _, body) { body.any? && body[0].length > 512 }
use Prometheus::Middleware::Collector
use Prometheus::Middleware::Exporter
run Rails.application
But when I throw a :not_found error using return head :not_found unless user, it gives me:
#<NoMethodError: undefined method `any?' for #<ActionDispatch::Response::RackBody:0x00000003c657c0>>
/home/user/.rvm/gems/ruby-2.3.3/gems/rack-2.0.3/lib/rack/body_proxy.rb:41:in `method_missing'
/home/user/.rvm/gems/ruby-2.3.3/gems/rack-2.0.3/lib/rack/body_proxy.rb:41:in `method_missing'
/home/user/.rvm/gems/ruby-2.3.3/gems/rack-2.0.3/lib/rack/body_proxy.rb:41:in `method_missing'
/home/user/.rvm/gems/ruby-2.3.3/gems/rack-2.0.3/lib/rack/body_proxy.rb:41:in `method_missing'
/home/user/api/config.ru:5:in `block (2 levels) in <main>'
/home/user/.rvm/gems/ruby-2.3.3/gems/rack-2.0.3/lib/rack/deflater.rb:114:in `should_deflate?'
/home/user/.rvm/gems/ruby-2.3.3/gems/rack-2.0.3/lib/rack/deflater.rb:37:in `call'
/home/user/.rvm/gems/ruby-2.3.3/gems/puma-3.8.2/lib/puma/configuration.rb:224:in `call'
/home/user/.rvm/gems/ruby-2.3.3/gems/puma-3.8.2/lib/puma/server.rb:600:in `handle_request'
/home/user/.rvm/gems/ruby-2.3.3/gems/puma-3.8.2/lib/puma/server.rb:435:in `process_client'
/home/user/.rvm/gems/ruby-2.3.3/gems/puma-3.8.2/lib/puma/server.rb:299:in `block in run'
/home/user/.rvm/gems/ruby-2.3.3/gems/puma-3.8.2/lib/puma/thread_pool.rb:120:in `block in spawn_thread'
I am using client_ruby in a rails application with the following config.ru file:
# This file is used by Rack-based servers to start the application.
require ::File.expand_path('../config/environment', __FILE__)
# gzip compression
use Rack::Deflater
# metrics
require 'prometheus/client/rack/collector'
require 'prometheus/client/rack/exporter'
use Prometheus::Client::Rack::Collector
use Prometheus::Client::Rack::Exporter
run Storybook::Application
When analyzing counters such as http_requests_total or http_request_duration_total_seconds, I noticed that the values fluctuate back and forth every few seconds. I confirmed this by constantly refreshing my application.com/metrics page and observing the values. My Grafana dashboard caught this instantly.
http_requests_total exhibits similar behaviour.
Are these fluctuations expected behaviour?
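If this is a multi-process server (Unicorn, or Puma in clustered mode), the likely cause is that each worker keeps its own in-memory registry, and successive scrapes land on different workers. A toy sketch of the effect (worker names and counts are made up):

```ruby
# Two workers, each holding its own independent in-memory counter value.
worker_counters = { worker_1: 120, worker_2: 87 }

# Successive scrapes are served by whichever worker accepts the connection:
scrapes = [:worker_1, :worker_2, :worker_1].map { |w| worker_counters[w] }
puts scrapes.inspect # [120, 87, 120] -- the counter appears to move backwards
```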
I am trying to use the Prometheus client in a Rack application using Grape.
Since Grape APIs do not have an initialize block, I am somewhat lost as to the best place to create the Registry and register metrics.
Am I meant to only ever use a single Client::Registry throughout my code, or can I create new ones where I need them?
Will they share the registered metrics?
I suspect not, and if that is correct, I would appreciate some pointers as to the preferred way of handling the Registry. Should I wrap it in a Singleton?
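Not an authoritative answer, but the usual pattern is one shared registry: the gem's Prometheus::Client.registry is itself a memoized process-wide default, so metrics registered anywhere are visible to the exporter. A gem-free sketch of that idea, using Ruby's Singleton module:

```ruby
require 'singleton'

# Stand-in for the gem's default registry: one instance per process.
class SketchRegistry
  include Singleton

  def initialize
    @metrics = {}
  end

  def register(name, metric)
    @metrics[name] = metric
  end

  def get(name)
    @metrics[name]
  end
end

# Registered in one part of the code (e.g. an initializer)...
SketchRegistry.instance.register(:http_requests_total, :a_counter)

# ...and visible from anywhere else, such as a Grape endpoint:
puts SketchRegistry.instance.get(:http_requests_total).inspect # :a_counter
```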
Using Ruby 2.4.1, prometheus-client 0.7.1, and quantile 0.2.0 (for specificity), the Summary class is generating strange percentiles (or what I assume are supposed to be percentiles).
➜ bundle exec pry
[1] pry(main)> require 'prometheus/client/summary'
=> true
[2] pry(main)> summary = Prometheus::Client::Summary.new(:a, "a")
=> #<Prometheus::Client::Summary:0x00000003041908
@base_labels={},
@docstring="a",
@mutex=#<Thread::Mutex:0x00000003041890>,
@name=:a,
@validator=#<Prometheus::Client::LabelSetValidator:0x00000003041840 @validated={}>,
@values={}>
[3] pry(main)> (1..100_000).each { |n| summary.observe({}, n) }; summary.get
=> {0.5=>27253, 0.9=>44736, 0.99=>49532}
This is something the quantile gem is doing, not something Prometheus::Client::Summary is doing, as the values returned are identical to the ones provided by the underlying library:
➜ bundle exec pry
[1] pry(main)> require 'quantile'
=> true
[2] pry(main)> qe = Quantile::Estimator.new
=> #<Quantile::Estimator:0x0000000238aea8
@buffer=[],
@head=nil,
@invariants=
[#<Quantile::Quantile:0x0000000238ae58 @coefficient_i=0.2, @coefficient_ii=0.2, @inaccuracy=0.05, @quantile=0.5>,
#<Quantile::Quantile:0x0000000238ae30 @coefficient_i=0.20000000000000004, @coefficient_ii=0.022222222222222223, @inaccuracy=0.01, @quantile=0.9>,
#<Quantile::Quantile:0x0000000238ae08 @coefficient_i=0.19999999999999982, @coefficient_ii=0.00202020202020202, @inaccuracy=0.001, @quantile=0.99>],
@observations=0,
@sum=0>
[3] pry(main)> (1..100_000).each(&qe.method(:observe)); nil
=> nil
[4] pry(main)> qe.query(0.5)
=> 27253
[5] pry(main)> qe.query(0.9)
=> 44736
[6] pry(main)> qe.query(0.99)
=> 49532
But it looks like the Java client library has this exact (well, almost) setup as an automated test, and they assert the values are pretty much normal percentiles, but with an error margin:
https://github.com/prometheus/client_java/blob/master/simpleclient/src/test/java/io/prometheus/client/SummaryTest.java#L72-L89
@Test
public void testQuantiles() {
  int nSamples = 1000000; // simulate one million samples
  for (int i=1; i<=nSamples; i++) {
    // In this test, we observe the numbers from 1 to nSamples,
    // because that makes it easy to verify if the quantiles are correct.
    labelsAndQuantiles.labels("a").observe(i);
    noLabelsAndQuantiles.observe(i);
  }
  assertEquals(getNoLabelQuantile(0.5), 0.5 * nSamples, 0.05 * nSamples);
  assertEquals(getNoLabelQuantile(0.9), 0.9 * nSamples, 0.01 * nSamples);
  assertEquals(getNoLabelQuantile(0.99), 0.99 * nSamples, 0.001 * nSamples);
  assertEquals(getLabeledQuantile("a", 0.5), 0.5 * nSamples, 0.05 * nSamples);
  assertEquals(getLabeledQuantile("a", 0.9), 0.9 * nSamples, 0.01 * nSamples);
  assertEquals(getLabeledQuantile("a", 0.99), 0.99 * nSamples, 0.001 * nSamples);
}
Those assertions seem to indicate what I thought would happen, which is the percentiles (under uniform distribution) will be linear with the number of observations.
So, which behavior is correct?
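For comparison, the exact quantiles of the uniform 1..100_000 stream are easy to compute, and a quick sketch shows how far outside the Java test's tolerances the Ruby results above fall:

```ruby
n = 100_000
# The values returned by the quantile gem, as shown in the pry session above.
observed = { 0.5 => 27_253, 0.9 => 44_736, 0.99 => 49_532 }

observed.each do |q, value|
  exact = q * n # for uniform 1..n, the q-quantile should be roughly q * n
  rel_error = (value - exact).abs / exact
  puts format('q=%.2f exact=%d observed=%d rel_error=%.3f', q, exact, value, rel_error)
end
# The 0.5-quantile is off by ~45%, versus the 5% margin the Java test allows.
```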
Hi,
I'm using client_ruby to create/set counter and gauge metrics, but I don't see an option in the library to specify a timestamp.
Is this not possible?
We had a couple of comments on this point, and it’s fair. The point about “no obvious right way to aggregate gauges in all cases” is a solid argument.
We need to decide a stance on this both for gauges specifically and for metrics in general (i.e. should gauges be a special case where we don't aggregate by default).
This issue should be renamed and updated once we decide that stance.
So in most cases with time series data, it doesn't make sense to have cumulative summaries. I've implemented sliding windows in a way that works for my uses, but I thought I'd put it here in case you'd like to modify it slightly for the full gem.
TimeWindowEstimator class
# This class is a wrapper around a single Quantile::Estimator, which is used to hold data for summaries.
# It maintains a ring buffer of Estimators to provide quantiles over a sliding window of time.
module Quantile
  class TimeWindowEstimator
    attr_reader :invariants

    def initialize(invariants, max_age_seconds, age_buckets)
      @invariants = invariants
      @ring_buffer = []
      age_buckets.times { @ring_buffer.push(Estimator.new(*invariants)) }
      @current_bucket = 0
      @last_rotated_time = Time.now
      @duration_between_rotations_seconds = max_age_seconds / age_buckets
    end

    def query(quantile)
      current_estimator = rotate
      current_estimator.query(quantile)
    end

    def observe(value)
      rotate
      @ring_buffer.each do |estimator|
        estimator.observe(value)
      end
    end

    def sum
      current_estimator = rotate
      current_estimator.sum
    end

    def observations
      current_estimator = rotate
      current_estimator.observations
    end

    private

    def rotate
      time_since_last_rotate = Time.now - @last_rotated_time
      while time_since_last_rotate > @duration_between_rotations_seconds
        # Clear the current bucket
        @ring_buffer[@current_bucket] = Estimator.new(*@invariants)
        @current_bucket += 1
        @current_bucket = 0 if @current_bucket >= @ring_buffer.length
        time_since_last_rotate -= @duration_between_rotations_seconds
        @last_rotated_time += @duration_between_rotations_seconds
      end
      @ring_buffer[@current_bucket]
    end
  end
end
TimeWindowSummary
require 'quantile'
require 'prometheus/client/summary'
require_relative 'time_window_estimator'

# rubocop:disable Metrics/LineLength
# rubocop:disable Metrics/ParameterLists
module Prometheus
  module Client
    # Summary is an accumulator for samples. It captures Numeric data and
    # provides an efficient quantile calculation mechanism.
    # TimeWindowSummary is a Summary, but with a sliding window of time for metrics.
    class TimeWindowSummary < Summary
      attr_reader :invariants, :max_age_seconds, :num_buckets

      # Time window summaries take:
      #   name: name of the metric
      #   docstring: description of the metric
      #   max_age_seconds: the duration of the time window, i.e. how long observations are kept before they are discarded
      #   invariants: array of quantiles and their given error bounds
      #   num_buckets: the number of buckets used to implement the sliding time window. If your time window is 10 minutes
      #     and num_buckets=5, buckets are switched every 2 minutes. The value is a trade-off between resources
      #     (memory and CPU for maintaining the buckets) and how smoothly the time window moves.
      #   base_labels: optional set of labels
      def initialize(name, docstring, invariants, max_age_seconds, num_buckets, base_labels = {})
        @invariants = invariants
        @max_age_seconds = max_age_seconds
        @num_buckets = num_buckets
        super(name, docstring, base_labels)
      end

      # Type must be :summary so that the Prometheus scraper still treats it as a valid type
      def type
        :summary
      end

      private

      def default
        Quantile::TimeWindowEstimator.new(@invariants, @max_age_seconds, @num_buckets)
      end
    end
  end
end
# rubocop:enable Metrics/LineLength
# rubocop:enable Metrics/ParameterLists
I did not find a delete or remove method in client_ruby. For example, with a gauge:
gauge.set({ room: 'kitchen' }, 21.534)
gauge.get({ room: 'kitchen' })
Is there any gauge.delete or gauge.remove method? I did not find one in the source code either.
Docs say that the metric name should match [a-zA-Z_:][a-zA-Z0-9_:]*: https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
The Ruby library just checks that it is a symbol: https://github.com/prometheus/client_ruby/blob/master/lib/prometheus/client/metric.rb#L49
See also prometheus/client_golang#255.
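A stricter check could be layered on top of the symbol check. This is a sketch of what's being requested, not the library's actual code (the method name is made up; the regex is from the data model docs linked above):

```ruby
# Anchored version of the metric-name regex from the Prometheus data model.
METRIC_NAME_RE = /\A[a-zA-Z_:][a-zA-Z0-9_:]*\z/

def valid_metric_name?(name)
  name.is_a?(Symbol) && name.to_s.match?(METRIC_NAME_RE)
end

puts valid_metric_name?(:http_requests_total) # true
puts valid_metric_name?(:'2xx_responses')     # false -- cannot start with a digit
puts valid_metric_name?(:'bad-name')          # false -- '-' is not allowed
```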
#95 was a significant re-write. I understand that people want to do more work before a "proper" 1.0 release, but it'd be nice to have a RubyGems release of the current master that we can point at instead of pointing at the git repository. Maybe 0.10.0 or 1.0.0-alpha.1 (i.e. a prerelease version) or something?
For context, we have been using the gocardless fork for a while and in alphagov/verify-frontend#697 we switched to the official master branch.
Hey guys, your library is awesome!
Can I ask you for a favour: could you release an rc1 or so of version 0.7.0? Unfortunately, our project can't use the master branch from GitHub, but the current master has super exciting and important changes.
In the Go client, a hash of the labels and the metric name is used to check uniqueness.
https://github.com/prometheus/client_golang/blob/master/prometheus/desc.go#L71
In the Ruby client that's not the case; only the metric name is used to identify unique metrics.
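A sketch of the difference (the key construction here is illustrative; the Go client actually builds an FNV hash over the fully-qualified name and the label names):

```ruby
a = { name: :http_requests_total, labels: [:method] }
b = { name: :http_requests_total, labels: [:method, :code] }

# Ruby client today: identity is the name alone, so these two collide.
ruby_key = ->(m) { m[:name] }
puts ruby_key.(a) == ruby_key.(b) # true -- treated as the same metric

# Go-style: identity covers the label names too, so they are distinct.
go_key = ->(m) { [m[:name], m[:labels].sort] }
puts go_key.(a) == go_key.(b)     # false
```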
As it currently stands, the Prometheus Ruby Client has a few issues that make it hard to adopt in mainstream Ruby projects, particularly in Web applications:
We'd like to contribute to the effort of improving the client in these directions, and we're proposing we could make a number of changes (this issue will be promptly followed by a PR that implements several of these).
We have reached out to @grobie recently and he suggested that releasing a new major version was the way to go in order to work around these issues.
There are several proposals in this RFC for improvements to the existing Prometheus Client for Ruby.
These proposals are largely independent of each other, so we can pick for each one whether we think it’s an improvement or not. They are also ordered from most to least important. Only the first one is an absolute must, since it paves the way for adding multi-process support.
In the current client, each Metric object has an internal @values hash to store the metric values. For Gauges and Counters, the value of this hash is a float, and the key is itself a hash of labels and their values. Thus, for one given metric there are multiple values at any given time, one for each combination of the values of its labels.
For Histograms, @values doesn't hold a float. Instead it holds a Histogram::Value object, which holds one integer for each bucket, plus the total number of observations and the sum of all observations. Summaries do a similar thing.
We're proposing that, instead of each metric holding their own counter internally, we should have a centralized store that holds all the information. Metric objects update this store for every observation, and it gets read in its entirety by a formatter when the metrics are scraped.
Having this central storage allows us to abstract how that data is stored internally. For simpler cases, we can simply use a large hash, similar to the current implementation. But other situations (like pre-fork servers) require more involved implementations, to be able to share memory between different processes and report coherent total numbers. Abstracting the storage away allows the rest of the client to be agnostic about this complexity, and allows for multiple different “backends” that can be swapped based on the needs of each particular user.
Prometheus would have a global config object that allows users to set which Store they want:
module Prometheus
  module Client
    class Config
      attr_accessor :data_store

      def initialize
        @data_store = DataStores::SimpleHash.new
      end
    end
  end
end
As a default, a simple storage system that provides the same functionality as the current client is provided. Other backends may be shipped with the gem, and users can also make their own. Note that we set the data store to an instantiated object, not a class, because that object may need store-specific parameters when being instantiated (file paths, connection strings, etc.).
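Under this proposal, a user swapping stores would do something like the following in an initializer (the store name and its arguments here are hypothetical, per the RFC, not a released API):

```ruby
# Hypothetical configuration: replace the default in-memory store with a
# custom one that takes store-specific parameters at instantiation time.
Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::SomeCustomStore.new(dir: '/tmp/prometheus')
```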
These swappable stores have the following interface:
module Prometheus
  module Client
    module DataStores
      class ExampleCustomStore
        # Returns a MetricStore, which provides a view of the internal data store,
        # catering specifically to that metric.
        #
        # - `metric_settings` specifies configuration parameters for this metric
        #   specifically. These may or may not be necessary, depending on each specific
        #   store and metric. The most obvious example of this is for gauges in
        #   multi-process environments, where the developer needs to choose how those
        #   gauges will get aggregated between all the per-process values.
        #
        #   The settings that the store will accept, and what it will do with them, are
        #   100% store-specific. Each store should document what settings it will accept
        #   and how to use them, so the developer using that store can pass the
        #   appropriate settings when instantiating the Store itself and the Metrics
        #   they declare.
        #
        # - `metric_type` is specified in case a store wants to validate that the
        #   settings are valid for the metric being set up. It may go unused by most
        #   stores.
        #
        # Even if your store doesn't need these two parameters, it must accept them
        # to keep stores swappable.
        def for_metric(metric_name, metric_type:, metric_settings: {})
          # Generally, here a store would validate that the settings passed in are
          # valid, and raise if they aren't.
          validate_metric_settings(metric_type: metric_type,
                                   metric_settings: metric_settings)
          MetricStore.new(store: self,
                          metric_name: metric_name,
                          metric_type: metric_type,
                          metric_settings: metric_settings)
        end

        # MetricStore manages the data for one specific metric. It's generally a view
        # onto the central store shared by all metrics, but it could also hold the data
        # itself if that's better for the specific scenario.
        class MetricStore
          # This constructor is internal to this store, so the signature doesn't need
          # to be this. No one other than the Store should be creating MetricStores.
          def initialize(store:, metric_name:, metric_type:, metric_settings:)
          end

          # Metrics may need to modify multiple values at once (Histograms do this, for
          # example). MetricStore needs to provide a way to synchronize those, in
          # addition to all of the value modifications being thread-safe without a need
          # for simple Metrics to call `synchronize`.
          def synchronize
            raise NotImplementedError
          end

          # Store a value for this metric and a set of labels.
          # Internally, this may add extra "labels" to disambiguate values between,
          # for example, different processes.
          def set(labels:, val:)
            raise NotImplementedError
          end

          def increment(labels:, by: 1)
            raise NotImplementedError
          end

          # Return a value for a set of labels.
          # Will return the same value stored by `set`, as opposed to `all_values`,
          # which may aggregate multiple values.
          #
          # For example, in a multi-process scenario, `set` may add an extra internal
          # label tagging the value with the process ID. `get` will return the value
          # for "this" process ID. `all_values` will return an aggregated value for
          # all process IDs.
          def get(labels:)
            raise NotImplementedError
          end

          # Returns all the sets of labels seen by the store, and the aggregated value
          # for each.
          #
          # In some cases, this is just a matter of returning the stored value.
          #
          # In other cases, the store may need to aggregate multiple values for the
          # same set of labels. For example, in a multi-process scenario it may need to
          # `sum` the values of counters from each process. Or for gauges, it may need
          # to take the `max`. This is generally specified in `metric_settings` when
          # calling `Store#for_metric`.
          def all_values
            raise NotImplementedError
          end
        end
      end
    end
  end
end
For example, the default implementation of this interface would be something like this (like all the code in this doc, it is simplified to explain the general idea; it is not final code):
module Prometheus
module Client
module DataStores
# Stores all the data in a simple, synchronized global Hash
#
# There are ways of making this faster (because of the naive Mutex usage).
class SimpleHash
def initialize
@internal_store = Hash.new { |hash, key| hash[key] = 0.0 }
end
def for_metric(metric_name, metric_type:, metric_settings: {})
# We don't need `metric_type` or `metric_settings` for this particular store
MetricStore.new(store: self, metric_name: metric_name)
end
private
class MetricStore
def initialize(store:, metric_name:)
@store = store
@internal_store = store.internal_store
@metric_name = metric_name
end
def synchronize
@store.synchronize { yield }
end
def set(labels:, val:)
synchronize do
@internal_store[store_key(labels)] = val.to_f
end
end
def increment(labels:, by: 1)
synchronize do
@internal_store[store_key(labels)] += by
end
end
def get(labels:)
synchronize do
@internal_store[store_key(labels)]
end
end
def all_values
store_copy = synchronize { @internal_store.dup }
store_copy.each_with_object({}) do |(labels, v), acc|
if labels["__metric_name"] == @metric_name
label_set = labels.reject { |k,_| k == "__metric_name" }
acc[label_set] = v
end
end
end
private
def store_key(labels)
labels.merge(
{ "__metric_name" => @metric_name }
)
end
end
end
end
end
end
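The `"__metric_name"` internal-label trick that lets many metrics share one Hash can be seen in isolation. This is a hedged sketch: `store`, `store_key` and `all_values` here are standalone illustrations, not the gem's actual API.

```ruby
# A plain Hash stands in for the store; 0.0 is the default value.
store = Hash.new(0.0)

# Keys are the user-visible labels merged with a reserved internal label.
store_key = ->(metric_name, labels) { labels.merge("__metric_name" => metric_name) }

# Two different metrics share the same Hash without colliding:
store[store_key.call("http_requests_total", { "code" => "200" })] += 1
store[store_key.call("http_requests_total", { "code" => "500" })] += 1
store[store_key.call("queue_depth", { "queue" => "mail" })] = 7.0

# Reading one metric back filters by the internal label, then strips it out:
def all_values(store, metric_name)
  store.each_with_object({}) do |(labels, v), acc|
    next unless labels["__metric_name"] == metric_name
    acc[labels.reject { |k, _| k == "__metric_name" }] = v
  end
end

all_values(store, "http_requests_total")
# => { {"code"=>"200"} => 1.0, {"code"=>"500"} => 1.0 }
```

The internal label never escapes the store: `all_values` strips it before the Formatters ever see a label set.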
A more complex store might look like this (note: this is based on a fantasy primitive that magically shares memory between processes; it's just to illustrate how the extra internal labels and aggregators work):
module Prometheus
module Client
module DataStores
# Stores all the data in a magic data structure that keeps cross-process data, in a
# way that all processes can read it, but each can write only to their own set of
# keys.
# It doesn't care how that works, this is not an actual solution to anything,
# just an example of how the interface would work with something like that.
#
# Metric Settings have one possible key, `aggregation`, which must be one of
# `AGGREGATION_MODES`
class SampleMagicMultiprocessStore
AGGREGATION_MODES = [MAX = :max, MIN = :min, SUM = :sum, AVG = :avg]
DEFAULT_METRIC_SETTINGS = { aggregation: SUM }
def initialize
@internal_store = MagicHashSharedBetweenProcesses.new # PStore, for example
end
def for_metric(metric_name, metric_type:, metric_settings: {})
settings = DEFAULT_METRIC_SETTINGS.merge(metric_settings)
validate_metric_settings(metric_settings: settings)
MetricStore.new(store: self,
metric_name: metric_name,
metric_type: metric_type,
metric_settings: settings)
end
private
def validate_metric_settings(metric_settings:)
raise unless metric_settings.has_key?(:aggregation)
raise unless AGGREGATION_MODES.include?(metric_settings[:aggregation])
end
class MetricStore
def initialize(store:, metric_name:, metric_type:, metric_settings:)
@store = store
@internal_store = store.internal_store
@metric_name = metric_name
@aggregation_mode = metric_settings[:aggregation]
end
def set(labels:, val:)
@internal_store[store_key(labels)] = val.to_f
end
def get(labels:)
@internal_store[store_key(labels)]
end
def all_values
non_aggregated_values = all_store_values.each_with_object({}) do |(labels, v), acc|
if labels["__metric_name"] == @metric_name
label_set = labels.reject { |k, _| ["__metric_name", "__pid"].include?(k) }
acc[label_set] ||= []
acc[label_set] << v
end
end
# Aggregate all the different values for each label_set
non_aggregated_values.each_with_object({}) do |(label_set, values), acc|
acc[label_set] = aggregate(values)
end
end
private
def all_store_values
# This assumes there's something shared that all processes can write to, and
# that it's magically synchronized (which is not true of a PStore, for example,
# but would be true of an external data store like Redis, Memcached or SQLite).
# Alternatively, this could do something like:
# file_list = Dir.glob(File.join(path, '*.db')).sort
# which reads all the PStore / mmapped files, etc., and returns a hash
# combining all of them, which `values` and `label_sets` can then use
end
# This method holds most of the key to how this Store works. By adding `__pid`
# as one of the labels, we keep each process's value separate, so we can
# aggregate them later
def store_key(labels)
labels.merge(
{
"__metric_name" => @metric_name,
"__pid" => Process.pid
}
)
end
def aggregate(values)
# This is a horrible way to do this, just illustrating the point
values.send(@aggregation_mode)
end
end
end
end
end
end
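The aggregation step at the end of `all_values` is simple enough to sketch on its own. This illustrative snippet (not gem code) uses `send` the same way as the example above; `:avg` needs its own branch since Ruby's Array has `sum`, `max` and `min` but no built-in average.

```ruby
# Collapse one metric's per-process values according to the configured mode.
def aggregate(values, mode)
  case mode
  when :avg then values.sum.to_f / values.size
  else values.send(mode) # :sum, :max, :min are plain Enumerable methods
  end
end

# e.g. one gauge reading per worker process, keyed by "__pid" in the store
per_process_values = [120.0, 340.0, 95.0]

aggregate(per_process_values, :sum) # => 555.0
aggregate(per_process_values, :max) # => 340.0
aggregate(per_process_values, :avg) # => 185.0
```

This is why `sum` is the right default for counters (each process counts its own events) while gauges often want `max`, `min` or `avg`.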
The way you’d use these stores and aggregators would be something like:
Client.config.data_store = DataStores::SampleMagicMultiprocessStore.new(dir: "/tmp/prom")
Client.registry.counter(
:http_requests_total,
docstring: 'Number of HTTP requests'
)
Client.registry.gauge(
:max_memory_in_a_process,
docstring: 'Maximum memory consumed by one process',
store_settings: { aggregation: DataStores::SampleMagicMultiprocessStore::MAX }
)
For all other metrics, you'd just get sum by default, which is probably what you want.
Stores are ultimately used only by the Metrics and the Formatters; the user never touches them directly.
The way Metrics work is similar to this:
class Metric
def initialize(name,
docstring:,
store_settings: {})
@store = Prometheus::Client.config.data_store.for_metric(name, metric_type: type, metric_settings: store_settings)
end
def get(labels: {})
@store.get(labels: label_set_for(labels))
end
def values
@store.all_values
end
end
class Counter < Metric
def type
:counter
end
def increment(by: 1, labels: {})
@store.increment(labels: label_set_for(labels), by: by)
end
end
Storage for Histograms and Summaries
In the current client, Histograms use a special value object to hold the number of observations for each bucket, plus a total and a sum. Our stores don't allow this, since they're a simple Hash that stores floats and nothing else.
To work around this, Histograms add special, reserved labels when interacting with the store. These are the same labels that'll be exposed when exporting the metrics, so there isn't a huge impedance problem with doing this. The main difference is that Histograms need to override the get and values methods of Metric to recompose these individual bucket values into a coherent Hash.
class Histogram < Metric
def observe(value, labels: {})
base_label_set = label_set_for(labels)
@store.synchronize do
buckets.each do |upper_limit|
next if value > upper_limit
@store.increment(labels: base_label_set.merge(le: upper_limit), by: 1)
end
@store.increment(labels: base_label_set.merge(le: "+Inf"), by: 1)
@store.increment(labels: base_label_set.merge(le: "sum"), by: value)
end
end
# Returns all label sets with their values expressed as hashes with their buckets
def values
all = @store.all_values
all.each_with_object({}) do |(label_set, value), acc|
actual_label_set = label_set.reject { |k, _| k == :le }
acc[actual_label_set] ||= @buckets.map { |b| [b, 0.0] }.to_h
acc[actual_label_set][label_set[:le]] = value
end
end
end
Example usage:
let(:histogram) do
described_class.new(:bar,
docstring: 'bar description',
labels: expected_labels,
buckets: [2.5, 5, 10])
end
it 'returns a hash of all recorded observations' do
histogram.observe(3, labels: { status: 'bar' })
histogram.observe(5, labels: { status: 'bar' })
histogram.observe(6, labels: { status: 'foo' })
expect(histogram.values).to eql(
{ status: 'bar' } => { 2.5 => 0.0, 5 => 2.0, 10 => 2.0, "+Inf" => 2.0, "sum" => 8.0 },
{ status: 'foo' } => { 2.5 => 0.0, 5 => 0.0, 10 => 1.0, "+Inf" => 1.0, "sum" => 6.0 },
)
end
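The bucketing arithmetic behind those expected values can be checked with a plain Hash standing in for the store. This sketch reproduces the `observe` logic above in isolation (the lambda and local names are illustrative):

```ruby
buckets = [2.5, 5, 10]
store = Hash.new(0.0)

# Same logic as Histogram#observe: increment every bucket the value fits in,
# plus the "+Inf" bucket and the running sum, all as reserved `le` labels.
observe = lambda do |value, labels|
  buckets.each do |upper_limit|
    next if value > upper_limit
    store[labels.merge(le: upper_limit)] += 1
  end
  store[labels.merge(le: "+Inf")] += 1
  store[labels.merge(le: "sum")] += value
end

observe.call(3, { status: 'bar' }) # lands in buckets 5 and 10
observe.call(5, { status: 'bar' }) # lands in buckets 5 and 10 (le is inclusive)

store[{ status: 'bar', le: 5 }]     # => 2.0
store[{ status: 'bar', le: 2.5 }]   # => 0.0 (never incremented)
store[{ status: 'bar', le: "sum" }] # => 8.0
```

Note that buckets are cumulative: an observation of 3 increments both the 5 and the 10 buckets, which is why the spec expects 2.0 in both.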
For Summaries, we'd apply a similar solution.
Summary
We would change Summaries to expose only sum and count, with no quantiles / percentiles.
This is a bit of a contentious proposal in that it's not something we're doing because it'll make the client better, but because the way Summaries work is not very compatible with the idea of "Stores", which we'd need for pre-fork servers.
The quantile gem doesn't play well with this, since we'd need to store instances of Quantile::Estimator, which is a complex data structure and tricky to share between Ruby processes.
Moreover, individual Summaries on different processes cannot be aggregated, so all processes would actually have to share one instance of this class, which makes it extremely tricky, particularly to do performantly.
Even though this is a loss of functionality, this puts the Ruby client on par with other client libraries, like the Python one, which also only offers sum and count without quantiles.
Also, this is actually more compliant with the Client Library best practices:
The original client didn't comply with the last two rules, whereas this proposal would, just like the Python client. And quantiles, while seemingly the point of Summaries, are encouraged but not required by the best practices.
We're not ruling out adding quantiles back later: either they'd work only in single-process mode, or we may find a better way of dealing with multiple processes.
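A quantile-less Summary needs only two float values per label set, which is exactly what the store interface can hold. This is a hedged sketch of the idea, with a plain Hash standing in for a store's MetricStore and the `s:` reserved label purely illustrative:

```ruby
class SketchSummary
  def initialize
    @store = Hash.new(0.0) # stands in for a float-only MetricStore
  end

  # Each observation touches exactly two store keys: a running sum and a count.
  def observe(value, labels: {})
    @store[labels.merge(s: "sum")]   += value
    @store[labels.merge(s: "count")] += 1
  end

  def get(labels: {})
    {
      "sum"   => @store[labels.merge(s: "sum")],
      "count" => @store[labels.merge(s: "count")]
    }
  end
end

summary = SketchSummary.new
summary.observe(0.5, labels: { path: "/" })
summary.observe(1.5, labels: { path: "/" })
summary.get(labels: { path: "/" }) # => { "sum" => 2.0, "count" => 2.0 }
```

Because sums and counts from different processes can simply be added, this shape aggregates cleanly across processes, unlike a Quantile::Estimator.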
The current client enforces that labels for a metric don't change once the metric has been observed once. However, there is no way to declare what the labels should be when creating the metric, as the best practices demand. There's also no facility to access a labeled dimension via a labels method as the best practices describe, allowing things like metric.labels(role: 'admin').inc().
We propose changing the signature of a Metric's initialize method to:
def initialize(name,
docstring,
labels: [],
preset_labels: {})
labels is an array listing all the label names that are both allowed and required by this metric.
preset_labels is the same idea as the current base_labels parameter: it specifies "default" values for some labels. The difference is that, in this proposed interface, a label with a pre-set value must be specified in both the labels and preset_labels params.
LabelSetValidator basically changes to compare preset_labels.merge(labels).keys == labels, instead of storing the previously validated label sets and comparing against those. The rest of the validations remain basically the same.
We also add a with_labels method that behaves the way the best practices suggest for labels() (with_labels is a more idiomatic name in Ruby). Given a metric, calling with_labels on it returns a new Metric object with those labels added to preset_labels.
Calling with_labels(role: :admin), caching the result, and then calling increment on it multiple times will be slightly faster than passing labels: { role: :admin } on every call, as we can skip the label validation.
module Prometheus
module Client
class Metric
# When initializing a metric we specify the list of labels that are allowed,
# and we can specify pre-set values for some (or all) of them
#
# `preset_labels` is the same idea as the current `base_labels`, with the
# difference that each pre-set label also needs to appear in `labels`
def initialize(name,
docstring,
labels: [],
preset_labels: {})
@allowed_labels = labels
@validator = LabelSetValidator.new(allowed_labels: labels)
@preset_labels = preset_labels
@validator.valid?(@preset_labels) if @preset_labels
end
# This is the equivalent of the `labels` method specified in the best practices.
# `with_labels` is a more idiomatic name in my opinion, and it's less confusing,
# since `labels` could be something that lists all the allowed labels or all the
# observed label values for the metric.
#
# Like the best practices mention, this can be cached by the client, for
# performance. This will save the time of validating the labels for every
# `increment` or `set`, but it won't save the time to increment the actual
# counter in the store, since the hash lookup still needs to happen.
def with_labels(labels)
all_labels = @preset_labels.merge(labels)
@validator.valid?(all_labels)
return self.class.new(@name,
@docstring,
labels: @allowed_labels,
preset_labels: all_labels)
end
private
def label_set_for(labels)
@validator.validate(@preset_labels.merge(labels))
end
end
end
end
NOTE: For simplicity, this code doesn't actually get faster when caching the result of with_labels, but it's easy to make that change.
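The caching pattern the proposal describes can be sketched with a stripped-down counter. Everything here is illustrative (the class name, the simplified key-equality validation), not the gem's real implementation:

```ruby
class SketchCounter
  def initialize(name, labels: [], preset_labels: {})
    @name = name
    @allowed_labels = labels
    @preset_labels = preset_labels
    @values = Hash.new(0.0) # stands in for a store
  end

  # Validate once, return a child metric with the labels baked in.
  def with_labels(labels)
    all = @preset_labels.merge(labels)
    raise ArgumentError, "unexpected labels" unless all.keys.sort == @allowed_labels.sort
    self.class.new(@name, labels: @allowed_labels, preset_labels: all)
  end

  def increment(by: 1, labels: {})
    @values[@preset_labels.merge(labels)] += by
  end

  def get(labels: {})
    @values[@preset_labels.merge(labels)]
  end
end

counter = SketchCounter.new(:requests_total, labels: [:role])
admin = counter.with_labels(role: :admin) # validated once; cache and reuse
3.times { admin.increment }
admin.get # => 3.0
```

The hot path (`increment`) only does a Hash merge and lookup; the label validation happened once, when `with_labels` built the child metric.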
Questions:
Should we allow nil as a label value, or reject nil outright?
Should label values be converted with to_s inside the Metric?
Currently, the client raises an exception when anything is wrong with labels. While any such problem should be caught in development, we wouldn’t want to 500 on a request because of some unexpected situation with labels.
Ideally, we would raise in dev / test, but notify our exception manager in production.
We propose adding a label_error_strategy config option, which defaults to Raise, but which the user can change to whatever they need.
Something like:
module Prometheus
class Config
attr_accessor :label_error_strategy
def initialize
@label_error_strategy = ErrorStrategies::Raise
end
end
module ErrorStrategies
class Raise
def self.label_error(e)
raise e
end
end
end
end
The Prometheus Client would only provide Raise as a strategy. We might also want to provide strategies for Sentry / Rollbar / Bugsnag / etc., but simply allowing users to swap these in should be enough.
Note that the Client makes no attempt at figuring out whether it's running in production or development, and deciding anything based on that. This is left to the user.
An example of using this would be:
class RavenPrometheusLabelStrategy
def self.label_error(e)
Raven.notify_exception(e)
end
end
Prometheus::Client.config.label_error_strategy = RavenPrometheusLabelStrategy
Use kwargs throughout the code
We believe using keyword arguments will make the Client nicer to use and clearer. It is also more idiomatic in modern Ruby.
Examples:
counter.increment(by: 2)
vs
counter.increment({}, 2)
Registry.counter(:requests_total,
labels: ['code', 'app'],
preset_labels: { app: 'MyApp' })
vs
Registry.counter(:requests_total, ['code', 'app'], { app: 'MyApp' })
The main point against this is that Ruby < 2.0 doesn’t support them fully, but those versions have been EOL for over 3 years now, so we shouldn't need to continue to support them.
Something like:
class Counter
def count_exceptions(type: StandardError)
yield
rescue type
increment
raise
end
end
This should be used like:
def dodgy_code
my_counter.count_exceptions do
# dodgy code
end
rescue
# actually rescue the exception and do something useful with it
end
Not much explanation needed for this one
Something like:
class Gauge
def track_in_progress
increment
yield
ensure
decrement
end
end
Something like:
class Gauge
def time
t = Process.clock_gettime(Process::CLOCK_MONOTONIC)
yield
ensure
set(Process.clock_gettime(Process::CLOCK_MONOTONIC) - t)
end
end
For Histograms and Summaries, apply s/set()/observe()/ in the code above.
Class methods on Histogram that return an array of bucket upper limits, for users to pass to the Histogram constructor:
Registry.histogram(name: "foo", buckets: Histogram.linear_buckets(0, 10, 10))
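One plausible implementation of those helpers, modelled on the Go client's LinearBuckets / ExponentialBuckets (the exact signatures here are assumptions, not the agreed API):

```ruby
class Histogram
  # `count` buckets starting at `start`, spaced `width` apart.
  def self.linear_buckets(start, width, count)
    count.times.map { |i| start + i * width }
  end

  # `count` buckets starting at `start`, each `factor` times the last.
  def self.exponential_buckets(start, factor, count)
    count.times.map { |i| start * factor**i }
  end
end

Histogram.linear_buckets(0, 10, 5)     # => [0, 10, 20, 30, 40]
Histogram.exponential_buckets(1, 2, 5) # => [1, 2, 4, 8, 16]
```

These return only the finite upper limits; the "+Inf" bucket is implicit and added by the Histogram itself.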
https://prometheus.io/docs/instrumenting/writing_clientlibs/#standard-and-runtime-collectors
I would like to push some metrics to my Pushgateway. However, the Pushgateway is secured and requires authentication. How do I do this? I cannot see any examples of it.
Thanks :)
I've set up a custom exporter and Prometheus is scraping it correctly, but it won't store the data. When I query the endpoint manually, everything seems fine.
Any ideas?
root@stats >>> curl odroid:5000/metrics
# TYPE sma_dc_power_kw gauge
# HELP sma_dc_power_kw DC Power
sma_dc_power_kw{phase="1"} 0.0
sma_dc_power_kw{phase="2"} 0.0
# TYPE sma_dc_voltage gauge
# HELP sma_dc_voltage DC Voltage
sma_dc_voltage{phase="1"} 0.0
sma_dc_voltage{phase="2"} 0.0
# TYPE sma_dc_current gauge
# HELP sma_dc_current DC Current
sma_dc_current{phase="1"} 0.0
sma_dc_current{phase="2"} 0.0
# TYPE sma_ac_power_kw gauge
# HELP sma_ac_power_kw AC Power
sma_ac_power_kw{phase="1"} 0.0
sma_ac_power_kw{phase="2"} 0.0
sma_ac_power_kw{phase="3"} 0.0
# TYPE sma_ac_current gauge
# HELP sma_ac_current AC Current
sma_ac_current{phase="1"} 0.0
sma_ac_current{phase="2"} 0.0
sma_ac_current{phase="3"} 0.0
# TYPE sma_ac_voltage gauge
# HELP sma_ac_voltage AC Voltage
sma_ac_voltage{phase="1"} 0.0
sma_ac_voltage{phase="2"} 0.0
sma_ac_voltage{phase="3"} 0.0
# TYPE sma_device_temperature gauge
# HELP sma_device_temperature SMA device temperature
sma_device_temperature 0.0
# TYPE sma_device_state gauge
# HELP sma_device_state SMA device state
sma_device_state
# TYPE sma_device_sn gauge
# HELP sma_device_sn SMA device serialnumber
sma_device_sn
# TYPE sma_grid_state gauge
# HELP sma_grid_state SMA grid state
sma_grid_state
# TYPE sma_grid_freq gauge
# HELP sma_grid_freq SMA grid state
sma_grid_freq
# TYPE http_requests_total counter
# HELP http_requests_total A counter of the total number of HTTP requests made.
http_requests_total{method="get",host="localhost:5000",path="/metrics",code="200"} 393698
http_requests_total{method="get",host="odroid:5000",path="/",code="200"} 1
http_requests_total{method="get",host="odroid:5000",path="/metrics",code="200"} 57735
# TYPE http_request_duration_seconds summary
# HELP http_request_duration_seconds A histogram of the response latency.
http_request_duration_seconds{method="get",host="localhost:5000",path="/metrics",code="200",quantile="0.5"} 0.011047742
http_request_duration_seconds{method="get",host="localhost:5000",path="/metrics",code="200",quantile="0.9"} 0.017268368
http_request_duration_seconds{method="get",host="localhost:5000",path="/metrics",code="200",quantile="0.99"} 0.029610358
http_request_duration_seconds_sum{method="get",host="localhost:5000",path="/metrics",code="200"} 4950.962471179024
http_request_duration_seconds_count{method="get",host="localhost:5000",path="/metrics",code="200"} 393698
http_request_duration_seconds{method="get",host="odroid:5000",path="/",code="200",quantile="0.5"} 6.2291e-05
http_request_duration_seconds{method="get",host="odroid:5000",path="/",code="200",quantile="0.9"} 6.2291e-05
http_request_duration_seconds{method="get",host="odroid:5000",path="/",code="200",quantile="0.99"} 6.2291e-05
http_request_duration_seconds_sum{method="get",host="odroid:5000",path="/",code="200"} 6.2291e-05
http_request_duration_seconds_count{method="get",host="odroid:5000",path="/",code="200"} 1
http_request_duration_seconds{method="get",host="odroid:5000",path="/metrics",code="200",quantile="0.5"} 0.01191175
http_request_duration_seconds{method="get",host="odroid:5000",path="/metrics",code="200",quantile="0.9"} 0.012619823
http_request_duration_seconds{method="get",host="odroid:5000",path="/metrics",code="200",quantile="0.99"} 0.012885253
http_request_duration_seconds_sum{method="get",host="odroid:5000",path="/metrics",code="200"} 719.6265259969975
http_request_duration_seconds_count{method="get",host="odroid:5000",path="/metrics",code="200"} 57735
# TYPE http_exceptions_total counter
# HELP http_exceptions_total A counter of the total number of exceptions raised.
From running it in production, we (GoCardless) have found DirectFileStore to be more memory-hungry than other ways of storing metrics.
This isn't entirely surprising, but we've done a little investigation into whether we could reduce the effect and we may have some improvements we can make.
One mitigation we found after looking round the internet was to switch from libc malloc to jemalloc, which mitigates a lot of the memory bloat issues you can run into with CRuby.
For now it's sufficient for us to document this and move on.
One of our internal users found some potential savings on memory allocations (hence bloat), which we can look to apply later, but which don't block releasing multi-process support.
Currently, labels are handled by being the first argument to all functions. This gives the incorrect impression that all metrics should have labels (in reality most metrics don't have any) and makes label-less use harder.
This client should follow the structure laid out in https://prometheus.io/docs/instrumenting/writing_clientlibs/#labels
In addition, the user should be required to specify all their label names at metric creation time.