
nginx-lua-prometheus's Issues

Metric host not correct

Some metrics have a wrong host:

nginx_http_requests_total{host="(select (case when (5710=5710) then 0x676a2e6e78696e2e636f6d else 5710*(select 5710 from information_schema.character_sets) end))",instance="nginx02.produce.yz:9145",job="openresty",status="400"}
nginx_http_requests_total{host="gj.nxin.com));select (case when (4178=5243) then 4178 else 4178*(select 4178 from mysql.db) end)#",instance="nginx02.produce.yz:9145",job="openresty",status="400"}
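
These values come straight from the client-supplied Host header (SQL-injection probes, in this case). Until the library itself restricts label values, one hedged mitigation is to whitelist hosts in the log phase; a sketch, where `metric_requests` and the host list are examples from this setup, not library API:

```lua
-- Hypothetical log_by_lua snippet: only accept known hosts as label values,
-- so probe traffic with a forged Host header is grouped under "unknown".
local allowed_hosts = { ["gj.nxin.com"] = true, ["localhost"] = true }

local host = ngx.var.http_host or ""
if not allowed_hosts[host] then
  host = "unknown"
end
metric_requests:inc(1, {host, ngx.var.status})
```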

Bytes by host or other labels

Hello!
Amazing lib, congratulations.

But I have a question: can we break down metric_response_sizes by host, the same as nginx_http_requests_total?

Thanks

Invalid UTF-8 can be produced

I got a report of this library producing a label value which included the byte sequence \xbd\xa6, which is not valid UTF-8. Please sanitise/correctly encode label values before exposing them to Prometheus, which expects UTF-8.
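
Until that happens, a possible workaround is to scrub label values before passing them in. A sketch (stricter than real UTF-8 validation, since it keeps only printable ASCII); `metric_requests` is an assumed metric name:

```lua
-- Hypothetical helper: replace any byte outside printable ASCII (32..126)
-- before using the value as a label, guaranteeing a parseable exposition.
local function safe_label(value)
  return (tostring(value or ""):gsub("[^\32-\126]", "?"))
end

metric_requests:inc(1, {safe_label(ngx.var.http_host), ngx.var.status})
```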

exclude hosts and remote addresses

More of a question than an issue here, since I have no experience in Lua.

I am using this library to monitor my nginx server with Prometheus and visualize the data with Grafana.
All is well so far, except that I get a lot of requests from Prometheus itself and many more from uptimerobot.com every few seconds.

I know I can exclude them in Grafana using labels, but I believe they still affect overall metrics (totals, averages, counters, etc.) and they also take up disk space and resources.

I was wondering if there is a way to filter out requests based on a regex applied to nginx variables or a config file with exclusions. An example would be great (I'd even add it to the docs)
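
For what it's worth, a hedged sketch of such a filter in the log phase (the user agents, address, and `metric_requests` name are examples, not library features):

```lua
-- Hypothetical log_by_lua filter: skip metric collection for scrapers
-- and health checkers before any metric is touched.
local ua = ngx.var.http_user_agent or ""
local addr = ngx.var.remote_addr or ""

if ua:find("Prometheus", 1, true)
    or ua:find("UptimeRobot", 1, true)
    or addr == "10.0.0.5" then        -- e.g. the Prometheus server itself
  return                              -- do not record this request
end

metric_requests:inc(1, {ngx.var.server_name, ngx.var.status})
```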

Thanks for a great library!

How to persist data?

I use Docker to run nginx with prometheus.lua. But when the container is restarted or recreated, the old data is lost. How can I persist the data? Thanks

prometheus.lua:290: bad "zone" argument

Running into an issue enabling this on the openresty/openresty:1.13.6.1-2-centos Docker base image:

gateway_1 | 2018/07/30 20:36:26 [error] 7#7: init_by_lua error: /usr/local/openresty/nginx//lualib/prometheus.lua:290: bad "zone" argument
gateway_1 | stack traceback:
gateway_1 | [C]: in function 'set'
gateway_1 | /usr/local/openresty/nginx//lualib/prometheus.lua:290: in function 'init'
gateway_1 | init_by_lua:3: in main chunk
gateway_1 | nginx: [error] init_by_lua error: /usr/local/openresty/nginx//lualib/prometheus.lua:290: bad "zone" argument
gateway_1 | stack traceback:
gateway_1 | [C]: in function 'set'
gateway_1 | /usr/local/openresty/nginx//lualib/prometheus.lua:290: in function 'init'
gateway_1 | init_by_lua:3: in main chunk

Any ideas?
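
For comparison, a sketch of the expected setup: the name passed to init() must exactly match a lua_shared_dict declared in the http block, and the declaration must be in place before init_by_lua runs (names here are the usual examples):

```
http {
    # the zone name here ...
    lua_shared_dict prometheus_metrics 10M;

    init_by_lua '
        -- ... must match the string passed to init()
        prometheus = require("prometheus").init("prometheus_metrics")
    ';
}
```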

Sum / Count returns NaN

I was following this article: https://prometheus.io/docs/practices/histograms, checking metrics suggested there vs the one I've already added.

For the simplest one that should be:

 rate(nginx_http_request_duration_seconds_sum[5m])
/
 rate(nginx_http_request_duration_seconds_count[5m])

I am getting NaN values in Prometheus console, do you know something about that behavior?
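
One common cause of NaN here is 0/0: over a window with no requests, both rates are zero. Filtering the denominator drops those points; a sketch of the same query with a guard:

```
  rate(nginx_http_request_duration_seconds_sum[5m])
/
  (rate(nginx_http_request_duration_seconds_count[5m]) > 0)
```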

Histogram increments every bucket

Hello,

In my case the histogram increments every bucket, like so:

# HELP nginx_http_request_duration_seconds HTTP request latency
# TYPE nginx_http_request_duration_seconds histogram
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="00.005"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="00.010"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="00.020"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="00.030"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="00.050"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="00.075"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="00.100"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="00.200"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="00.300"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="00.400"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="00.500"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="00.750"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="01.000"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="01.500"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="02.000"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="03.000"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="04.000"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="05.000"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="10.000"} 1
nginx_http_request_duration_seconds_bucket{server_name="",host="localhost:9145",le="+Inf"} 1
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="00.005"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="00.010"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="00.020"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="00.030"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="00.050"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="00.075"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="00.100"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="00.200"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="00.300"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="00.400"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="00.500"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="00.750"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="01.000"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="01.500"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="02.000"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="03.000"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="04.000"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="05.000"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="10.000"} 6
nginx_http_request_duration_seconds_bucket{server_name="localhost",host="localhost",le="+Inf"} 6

Could you please advise what I need to check?

My config is almost default:

    # nginx-lua-prometheus
    lua_shared_dict prometheus_metrics 10M;
    lua_package_path "/usr/local/share/lua/prometheus.lua";
    init_by_lua '
    prometheus = require("prometheus").init("prometheus_metrics")
    metric_requests = prometheus:counter(
        "nginx_http_requests_total", "Number of HTTP requests", {"server_name", "status", "host"})
    metric_latency = prometheus:histogram(
        "nginx_http_request_duration_seconds", "HTTP request latency", {"server_name", "host"})
    metric_connections = prometheus:gauge(
        "nginx_http_connections", "Number of HTTP connections", {"state"})
    ';
    log_by_lua '
    metric_requests:inc(1, {ngx.var.server_name, ngx.var.status, ngx.var.http_host})
    metric_latency:observe(tonumber(ngx.var.request_time), {ngx.var.server_name, ngx.var.http_host})
    ';
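
For reference, Prometheus histogram buckets are cumulative: each le="x" bucket counts all observations less than or equal to x, so a single fast request increments every bucket, which matches the output above. Queries are meant to operate on the cumulative form, e.g.:

```
# histogram_quantile expects cumulative buckets as exported here:
histogram_quantile(0.95,
  rate(nginx_http_request_duration_seconds_bucket[5m]))
```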

[Question] Nginx-1.16 docker alpine compatibility

Hi,

Thank you for this awesome plugin.
I was using this with nginx-1.13.12-alpine image with nginx-mod-http-lua

But I need to upgrade to nginx-1.16; the issue is that the Lua module in the Alpine upstream is outdated.

```
2019/06/27 03:15:38 [emerg] 12#12: module "/etc/nginx/modules/ndk_http_module.so" version 1014002 instead of 1016000 in /etc/nginx/nginx.conf:3
nginx: [emerg] module "/etc/nginx/modules/ndk_http_module.so" version 1014002 instead of 1016000 in /etc/nginx/nginx.conf:3
```

snippet of nginx conf
```
user  nginx;
worker_processes  1;
load_module modules/ndk_http_module.so;
load_module modules/ngx_http_lua_module.so;
```

Microservices metrics

Hi Knyar,

Can I get metrics for specific services?

E.g. I have server_name api.mysite.com; and the locations:

/
/api
/v1

where api and v1 are different microservices under the same server_name. In /metrics I only get metrics for api.mysite.com, not for api.mysite.com/api.

Can you help me with this?

counter

I'm using a counter to count requests passed to a Django backend.

nginx.conf:

```
metric_requests_django = prometheus:counter(
  "nginx_django_requests_total", "Number of HTTP requests", {"host", "status"})
```

server.conf:

```
server {
  ...
  location ... {
    uwsgi_pass ...
    log_by_lua '
      local host = ngx.var.host:gsub("^www.", "")
      metric_requests_django:inc(1, {host, ngx.var.status})
    ';
  }
}
```

And the metrics I get with this config:

```
while true; do wget -O - http://nginx_server_ip:9101/metrics -q | grep nginx_django_requests_total | grep lala.my_domain_name | grep 200; sleep 60; done
nginx_django_requests_total{host="lala.my_domain_name",status="200"} 9383
nginx_django_requests_total{host="lala.my_domain_name",status="200"} 9159
nginx_django_requests_total{host="lala.my_domain_name",status="200"} 9185
nginx_django_requests_total{host="lala.my_domain_name",status="200"} 9205
nginx_django_requests_total{host="lala.my_domain_name",status="200"} 9458
nginx_django_requests_total{host="lala.my_domain_name",status="200"} 9245
nginx_django_requests_total{host="lala.my_domain_name",status="200"} 9265
nginx_django_requests_total{host="lala.my_domain_name",status="200"} 9282
nginx_django_requests_total{host="lala.my_domain_name",status="200"} 9301
```

As you can see, the counter is not monotonically increasing.
Maybe the problem is that I'm using this host config in nginx:

```
server_name ~^(www.)?(?[a-z0-9-]+).my_domain_name;
```

But that configuration worked just fine for quite a while... and for the last month or two I've seen this strange problem.
I've updated nginx-lua-prometheus and nginx to the latest versions; that did not help.

Prometheus errors when parsing /metrics generated by nginx-lua-prometheus

Entries like this appear:

nginx_http_request_duration_seconds_bucket{host="'"",le="00.010"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="00.020"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="00.030"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="00.050"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="00.075"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="00.100"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="00.200"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="00.300"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="00.400"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="00.500"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="00.750"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="01.000"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="01.500"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="02.000"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="03.000"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="04.000"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="05.000"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="10.000"} 1
nginx_http_request_duration_seconds_bucket{host="'"",le="+Inf"} 1

I believe it's because someone did a drive-by [host header attack](http://www.skeletonscribe.net/2013/05/practical-http-host-header-attacks.html) on us. We should really do some escaping...

openresty prometheus log_error

Hi,
I used prometheus.lua and added the configuration below to my nginx.conf.
Why does my Prometheus output show a non-zero nginx_metric_errors_total, with matching entries in the error log?
I don't know what's wrong. Thank you.

http {  ...... }
lua_shared_dict prometheus_metrics 10M;
lua_package_path "/usr/local/nginx/conf/prometheus.lua";
init_by_lua '
  prometheus = require("prometheus").init("prometheus_metrics")
  metric_requests = prometheus:counter(
    "nginx_http_requests_total", "Number of HTTP requests", {"host", "status"})
  metric_latency = prometheus:histogram(
    "nginx_http_request_duration_seconds", "HTTP request latency", {"host"})
  metric_connections = prometheus:gauge(
    "nginx_http_connections", "Number of HTTP connections", {"state"})
';
log_by_lua '
  metric_requests:inc(1, {ngx.var.server_name, ngx.var.status})
  metric_latency:observe(tonumber(ngx.var.request_time), {ngx.var.server_name})
';
# /usr/local/nginx/sbin/nginx -V
nginx version: openresty/1.13.6.2
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) 
built with OpenSSL 1.0.2k-fips  26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty-1.13.6.2/nginx --with-cc-opt=-O2 --add-module=../ngx_devel_kit-0.3.0 --add-module=../echo-nginx-module-0.61 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2rc3 --add-module=../set-misc-nginx-module-0.32 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.08 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.13 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.15 --add-module=../rds-csv-nginx-module-0.09 --add-module=../ngx_stream_lua-0.0.5 --with-ld-opt=-Wl,-rpath,/usr/local/openresty-1.13.6.2/luajit/lib --user=nobody --group=nobody --with-stream --with-stream_ssl_module --with-http_ssl_module

----log_error----

[root@localhost conf]# curl -i 127.0.0.1/metrics
HTTP/1.1 200 OK
Server: ORifeng/1.13.6.2
Date: Sat, 29 Dec 2018 10:21:50 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: keep-alive

# HELP nginx_http_request_duration_seconds HTTP request latency
# TYPE nginx_http_request_duration_seconds histogram
nginx_http_request_duration_seconds_bucket{host="localhost",le="00.005"} 22
......
# HELP nginx_http_requests_total Number of HTTP requests
# TYPE nginx_http_requests_total counter
nginx_http_requests_total{host="localhost",status="200"} 15
# HELP nginx_metric_errors_total Number of nginx-lua-prometheus errors
# TYPE nginx_metric_errors_total counter
nginx_metric_errors_total 15                            <<<<<--------------


# tail -f /usr/local/nginx/logs/error.log 
2018/12/29 18:15:37 [error] 115210#0: *19 [lua] prometheus.lua:295: log_error(): No value passed for nginx_http_connections, client: 127.0.0.1, server: localhost, request: "GET /metrics HTTP/1.1", host: "127.0.0.1"
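
For reference, "No value passed" is logged when a metric receives a nil value; the connection gauge is usually set in the /metrics handler with a nil guard. A sketch along the lines of the README example, using the names from the config above:

```
location /metrics {
    content_by_lua '
        -- guard against nil so log_error() is not triggered
        local active = tonumber(ngx.var.connections_active)
        if active then
            metric_connections:set(active, {"active"})
        end
        prometheus:collect()
    ';
}
```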

Why is the metric label "host" empty?

# HELP nginx_http_request_duration_seconds HTTP request latency
# TYPE nginx_http_request_duration_seconds histogram
nginx_http_request_duration_seconds_bucket{host="",le="00.005"} 1597
nginx_http_request_duration_seconds_bucket{host="",le="00.010"} 1597
nginx_http_request_duration_seconds_bucket{host="",le="00.020"} 1597
nginx_http_request_duration_seconds_bucket{host="",le="00.030"} 1597
nginx_http_request_duration_seconds_bucket{host="",le="00.050"} 1597
nginx_http_request_duration_seconds_bucket{host="",le="00.075"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="00.100"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="00.200"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="00.300"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="00.400"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="00.500"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="00.750"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="01.000"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="01.500"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="02.000"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="03.000"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="04.000"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="05.000"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="10.000"} 1600
nginx_http_request_duration_seconds_bucket{host="",le="+Inf"} 1600
nginx_http_request_duration_seconds_count{host=""} 1600
nginx_http_request_duration_seconds_sum{host=""} 0.194

# HELP nginx_http_requests_total Number of HTTP requests
# TYPE nginx_http_requests_total counter
nginx_http_requests_total{host="",status="200"} 1599
nginx_http_requests_total{host="",status="404"} 1

# HELP nginx_metric_errors_total Number of nginx-lua-prometheus errors
# TYPE nginx_metric_errors_total counter
nginx_metric_errors_total 0

Performance improvements around dictionary locking

I was just reading the performance issue with locking and I'd like to suggest a possible solution: double buffering.

In a nutshell, write everything to one shared dict (A). Then, when it's time to collect, we redirect all writes to a second dict (B) so they don't block. collect is done only from A's data. For the next scrape, we treat A as B and vice versa.

After collect finishes we'll have to:

  1. Clear A so there's no leftover data.
  2. Since B was "empty" when we swapped, we also need to update B with the data that was previously in A.

(2) is where it can get a bit hairy but I think it's doable (maybe using CRDTs)?
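
A minimal per-worker sketch of the swap (hypothetical names; coordinating the `active` index across nginx workers, and the merge in step (2), are exactly the parts this glosses over):

```lua
-- Hypothetical double-buffering skeleton; metrics_a/metrics_b would be
-- two lua_shared_dict zones. Worker coordination is not handled here.
local dicts = { ngx.shared.metrics_a, ngx.shared.metrics_b }
local active = 1  -- index of the dict currently receiving writes

local function collect_with_swap()
  local old = dicts[active]
  active = 3 - active          -- redirect subsequent writes to the other dict
  -- ... read and serialize old's contents for the scrape ...
  -- ... then merge old into dicts[active] and clear it (step 2 above) ...
end
```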

lua coroutine: memory allocation error: not enough memory

prometheus_metrics 2048m

ERROR:

2019/07/19 13:15:41 [error] 19173#0: *33456259895 lua coroutine: memory allocation error: not enough memory
stack traceback:
coroutine 0:
[C]: in function ‘get_keys’
…cal/share/lua/5.1/kong/plugins/prometheus/prometheus.lua:503: in function ‘collect’
…local/share/lua/5.1/kong/plugins/prometheus/exporter.lua:162: in function ‘collect’
/usr/local/share/lua/5.1/kong/plugins/prometheus/api.lua:7: in function </usr/local/share/lua/5.1/kong/plugins/prometheus/api.lua:6>
coroutine 1:
[C]: in function ‘resume’
/usr/local/share/lua/5.1/lapis/application.lua:393: in function ‘handler’
/usr/local/share/lua/5.1/lapis/application.lua:130: in function ‘resolve’
/usr/local/share/lua/5.1/lapis/application.lua:161: in function </usr/local/share/lua/5.1/lapis/application.lua:159>
[C]: in function ‘xpcall’
/usr/local/share/lua/5.1/lapis/application.lua:159: in function ‘dispatch’
/usr/local/share/lua/5.1/lapis/nginx.lua:215: in function ‘serve_admin_api’
content_by_lua(nginx-kong.conf:239):2: in function <content_by_lua(nginx-kong.conf:239):1>, client: 10.23.80.11, server: kong_admin, request: “GET /metrics HTTP/1.1”, host: “10.23.11.22:7001”
2019/07/19 13:15:41 [error] 19173#0: *33456259895 [lua] init.lua:133: handle_error(): /usr/local/share/lua/5.1/lapis/application.lua:397: not enough memory
stack traceback:
[C]: in function ‘get_keys’
…cal/share/lua/5.1/kong/plugins/prometheus/prometheus.lua:503: in function ‘collect’
…local/share/lua/5.1/kong/plugins/prometheus/exporter.lua:162: in function ‘collect’
/usr/local/share/lua/5.1/kong/plugins/prometheus/api.lua:7: in function </usr/local/share/lua/5.1/kong/plugins/prometheus/api.lua:6>

stack traceback:
[C]: in function ‘error’
/usr/local/share/lua/5.1/lapis/application.lua:397: in function ‘handler’
/usr/local/share/lua/5.1/lapis/application.lua:130: in function ‘resolve’
/usr/local/share/lua/5.1/lapis/application.lua:161: in function </usr/local/share/lua/5.1/lapis/application.lua:159>
[C]: in function ‘xpcall’
/usr/local/share/lua/5.1/lapis/application.lua:159: in function ‘dispatch’
/usr/local/share/lua/5.1/lapis/nginx.lua:215: in function ‘serve_admin_api’
content_by_lua(nginx-kong.conf:239):2: in function <content_by_lua(nginx-kong.conf:239):1>, client: 10.23.80.11, server: kong_admin, request: “GET /metrics HTTP/1.1”, host: “10.23.11.22:7001”

Multiple proxy pass trouble

Hello. I tried to monitor Nginx with this library using the default configuration. AFAIK, each metric is labeled with the HTTP "Host" header by the library. So I tried to set a different Host header for each proxied location.

Here is my config sample:

    location ~ ^/api/v1/service1/ {
        proxy_pass http://example.com:9001;
        proxy_set_header Host service1.example.com;
    }

    location ~ ^/api/v1/service2/ {
        proxy_pass http://example.com:9002;
        proxy_set_header Host service2.example.com;
    }

    location ~ ^/api/v1/service3/ {
        proxy_pass http://example.com:9003;
        proxy_set_header Host service3.example.com;
    }

    location ~ ^/api/v1/service4/ {
        proxy_pass http://example.com:9004;
        proxy_set_header Host service4.example.com;
    }

The problem is that when I scrape metrics from nginx, I only see one of the hosts, service1.example.com, plus the default Nginx hostname, localhost, but none of the other proxied locations.

So, should it work this way? Is it possible to label metrics by server location in another way?

That's probably my misunderstanding of how this monitoring works, but I'd be happy to have your advice on this issue.
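
One possible workaround (a sketch; the variable and metric names are examples, not library features) is to label requests by a per-location nginx variable instead of the Host header:

```
# Sketch: a per-location variable used as a metric label.
location ~ ^/api/v1/service2/ {
    set $service "service2";
    proxy_pass http://example.com:9002;
    proxy_set_header Host service2.example.com;
}

# $service is defined config-wide once any location sets it (empty elsewhere):
log_by_lua '
    metric_requests:inc(1, {ngx.var.service, ngx.var.status})
';
```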

log_error 'no memory' while logging request

Hello,

I'm having a small (i think) problem with this module. After a short time after starting up nginx, it starts flooding the logs like:

[error] 5#5: *47323 [lua] prometheus.lua:289: log_error(): Error while setting 'nginx_http_request_time_bucket{host="professional-webserver",method="GET",path="/index.php//professionals/350640",le="10.000"}' to '1': 'no memory' while logging request, client: 10.0.4.26, server: , request: "GET /professionals/350640 HTTP/1.1", upstream: "fastcgi://10.0.26.5:9000", host: "professional-webserver"

This same error, 'no memory' while logging request, happens on almost every request, for every registered metric (regardless of type, e.g. counter, gauge or histogram).

My first reaction was to bump up the 10M in

lua_shared_dict prometheus_metrics 10M;

to something higher, like 50M. This didn't solve the problem, but seems that it took longer before it started happening.

I don't know if I should be "cleaning up" this dictionary manually or something. The configs I have don't stray far from the defaults.

Just to be clear, the metrics endpoints are working correctly, and the data is changing as expected when new requests arrive. The problem only has to do with log pollution and, maybe, the problem indicated by this error log.
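
'no memory' from the shared dict means it is full. With the raw URI used as a path label (as in log-prometheus.lua in this setup), the number of distinct label sets grows without bound, so any dict size is eventually exhausted. Dict usage can be watched from a debug endpoint; a sketch, assuming free_space() and capacity(), which are available in reasonably recent OpenResty releases:

```lua
-- Hypothetical debug handler to watch shared-dict usage over time.
content_by_lua_block {
    local dict = ngx.shared.prometheus_metrics
    ngx.say("free bytes: ", dict:free_space())
    ngx.say("capacity:   ", dict:capacity())
}
```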


Environment Information

I'm running openresty with a docker image with the following Dockerfile:
FROM oakman/nginx-prometheus

RUN rm /etc/nginx/conf.d/default.conf \
  && apk update && apk add wget

COPY ./metrics.vhost /usr/local/openresty/nginx/conf/metrics.vhost

COPY ./*.lua /usr/local/openresty/nginx/conf/

WORKDIR "/var/www/service"

The host system is an Ubuntu 16.04 with Docker version 17.12.1~ce-0 installed


Config files

metrics.vhost

lua_shared_dict prometheus_metrics 10M;
lua_package_path '/usr/local/openresty/luajit/lib/?.lua;;';

init_by_lua_file '/usr/local/openresty/nginx/conf/init-prometheus.lua';

log_by_lua_file '/usr/local/openresty/nginx/conf/log-prometheus.lua';

server {
  listen 9527;

  location /prometheus-metrics {
    content_by_lua_block {
      if ngx.var.connections_active ~= nil then
        http_connections:set(ngx.var.connections_active, {"active"})
        http_connections:set(ngx.var.connections_reading, {"reading"})
        http_connections:set(ngx.var.connections_waiting, {"waiting"})
        http_connections:set(ngx.var.connections_writing, {"writing"})
      end
      prometheus:collect()
    }
  }
}

log-prometheus.lua

local function split(str)
  local array = {}
  for mem in string.gmatch(str, '([^, ]+)') do
    table.insert(array, mem)
  end
  return array
end

local function getWithIndex(str, idx)
  if str == nil then
    return nil
  end

  return split(str)[idx]
end

local host = ngx.var.host
local status = ngx.var.status
local method = ngx.var.request_method
local args = ngx.var.uri

http_requests:inc(1, {host, method, args, status})
http_request_time:observe(ngx.now() - ngx.req.start_time(), {host, method, args})

http_request_bytes_sent:inc(tonumber(ngx.var.bytes_sent), {host, method, args})
if ngx.var.bytes_received ~= nil then
  http_request_bytes_received:inc(tonumber(ngx.var.bytes_received), {host, method, args})
end

init-prometheus.lua

prometheus = require("prometheus").init("prometheus_metrics")

http_requests = prometheus:counter(
  "nginx_http_requests",
  "Number of HTTP requests",
  {"host", "method", "path", "status"}
)
http_request_time = prometheus:histogram(
  "nginx_http_request_time",
  "HTTP request time",
  {"host", "method", "path"}
)
http_request_bytes_received = prometheus:counter(
  "nginx_http_request_bytes_received",
  "Number of HTTP request bytes received",
  {"host", "method", "path"}
)
http_request_bytes_sent = prometheus:counter(
  "nginx_http_request_bytes_sent",
  "Number of HTTP request bytes sent",
  {"host", "method", "path"}
)
http_connections = prometheus:gauge(
  "nginx_http_connections",
  "Number of HTTP connections",
  {"state"}
)

Thanks in advance for your help

Unable to install with LuaRocks?

I have a build which installs this in a docker container using the command
RUN luarocks install nginx-lua-prometheus
However since you released the new version 2 days ago, this no longer will complete successfully.

Using https://luarocks.org/nginx-lua-prometheus-0.20171117-2.src.rock... switching to 'build' mode

Error: File not found: nginx-lua-prometheus-0.20171117-2.rockspec

Was there something wrong with the release?

Issues with starting Lua directory

nginx: [error] [lua] prometheus.lua:271: init(): Dictionary 'prometheus_metrics' does not seem to exist. Please define the dictionary using lua_shared_dict.
nginx: [error] [lua] prometheus.lua:316: counter(): Prometheus module has not been initialized
nginx: [error] [lua] prometheus.lua:384: histogram(): Prometheus module has not been initialized
nginx: [error] [lua] prometheus.lua:349: gauge(): Prometheus module has not been initialized

Docker image

Any plans to have an nginx docker image with this builtin?

collect() costs a lot of time

The same collect() call sometimes takes a very long time (12 minutes max), but not always (<10s the other times).
While it is running, one nginx worker's CPU usage goes up to 100%. Any debugging advice?

Please create a tagged release

Hello,

I'm creating a Docker image that will use this Lua code. The build process would be cleaner if I could wget and checksum a zipped release from this repo.

All you'd need to do is create a 0.1 tag or similar.

Thanks

Including path as a label without high cardinality issues

First of all, thank you for this extension. It's been very useful for my use case.

I would like to include path information in our metrics output to gain a more granular understanding of the traffic in our distributed application environment. In Issue #29 it was mentioned that ngx.var.uri could be used for a related use case to collect metrics for a specific endpoint.

Since I would like to offer a centralised solution for the application services in our system, I would ideally add the path as a label, but I am concerned about the cardinality issues that would result. For example, many URIs contain unique identifiers that would create too many time series for a single metric and risk overloading Prometheus instances.

It would be useful if there were a method that would allow including path that imposed limits on possibly high cardinality values. Any suggestions would be greatly appreciated.
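
One possible approach (a sketch, not a library feature): normalize URIs to route templates before using them as labels, and cap the number of distinct values per worker:

```lua
-- Hypothetical path normalizer: collapse numeric segments into a placeholder
-- and fall back to "other" once a fixed number of distinct paths is reached.
local seen, seen_count, max_paths = {}, 0, 500

local function path_label(uri)
  local p = uri:gsub("/%d+", "/:id")   -- e.g. /users/42 -> /users/:id
  if seen[p] then return p end
  if seen_count >= max_paths then return "other" end
  seen[p] = true
  seen_count = seen_count + 1
  return p
end
```

The `seen` table here is per worker and resets on reload, so the cap is approximate, and the placeholder pattern would need tuning to the application's URL scheme.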

bug: 'le' label getting set to "nil" instead of "+Inf"

Not entirely sure why, but I'm getting some metrics lines like:

http_response_size_bytes_bucket{method="GET",status="200",le="nil"} 577

This causes Prometheus to error on parsing with:

text format parsing error in line 9: expected float as value for 'le' label, got "nil"

My nginx config looks like:

...
lua_package_path '/usr/local/openresty/lualib/?.lua;;';
lua_shared_dict prometheus_metrics 10M;
init_by_lua_file /srv/metrics_init.lua;
log_by_lua_file /srv/metrics_log.lua;
...

/srv/metrics_init.lua:

prometheus = require("prometheus").init("prometheus_metrics")

metric_bytes_sent = prometheus:histogram(
  "http_response_size_bytes",
  "HTTP response body sizes",
  {"method", "status"},
  {100, 1000, 10000, 100000, 1000000, 10000000, 100000000}
)

/srv/metrics_log.lua:

local method = ngx.var.request_method
local status = ngx.var.status

local bytes_sent = tonumber(ngx.var.body_bytes_sent) or 0

metric_bytes_sent:observe(bytes_sent, {method, status})

From the code it looks like that label should be +Inf: https://github.com/knyar/nginx-lua-prometheus/blob/master/prometheus.lua#L372... Not really sure what's going on.

Invitation to DevOpsConf Russia on September 30th and October 1st 2019

Hi Anton.

My name is Slava, I’m an engineer at Ecwid. I’m representing DevOpsConf Russia program committee. We are carrying out the most practical conference for those involved in processes of test automation, creating an infrastructure platform, CTO and anyone who wants to know how DevOps works for others. The community has grown a lot and in addition to separate tracks at the Russian Internet technologies festival and HighLoad++ conference, we will also hold an independent two-day conference this autumn.

I found your name among authors of Google’s SRE book. So I decided you have something to share with the Russian engineering community. On behalf of organizing committee I would like to make a special invitation to you as a speaker :) If you’ll come up with several topics you might want to talk about, we can discuss them all in order to find the most efficient way to present your work and get maximum response from our audience. Also we can choose several, if you decide to come to us more than once.

You can apply here.

If any questions, please don’t hesitate to ask me in telegram @smith3v

Thank you and have a nice day!

P.S. Sorry, I didn't find better way to contact you.

Histogram values

Hi,
I am integrating this library into our SMS hub and I have a question about how histograms work.

I have defined a histogram with the following buckets:

{0.1,0.25,0.5,0.75,1,2,3,4,5,10,30,60}

After calling observe the first time (value is 0.275), I get this result in collect:

sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="00.50"} 1
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="00.75"} 1
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="01.00"} 1
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="02.00"} 1
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="03.00"} 1
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="04.00"} 1
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="05.00"} 1
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="10.00"} 1
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="30.00"} 1
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="60.00"} 1
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="+Inf"} 1
sirocco_latency_seconds_count{endpoint="7",transceiver="80",type="push"} 1
sirocco_latency_seconds_sum{endpoint="7",transceiver="80",type="push"} 0.2759997844696

There are 2 missing "le" values (0.1 and 0.25). Both buckets are below the passed value (0.1 < 0.25 < 0.275). Is this the expected behavior?

All buckets have a counter of 1. I was expecting this result:

sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="00.10"} 0
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="00.25"} 0
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="00.50"} 1
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="00.75"} 0
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="01.00"} 0
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="02.00"} 0
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="03.00"} 0
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="04.00"} 0
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="05.00"} 0
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="10.00"} 0
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="30.00"} 0
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="60.00"} 0
sirocco_latency_seconds_bucket{endpoint="7",transceiver="80",type="push",le="+Inf"} 0
sirocco_latency_seconds_count{endpoint="7",transceiver="80",type="push"} 1
sirocco_latency_seconds_sum{endpoint="7",transceiver="80",type="push"} 0.2759997844696

If I send a second value, all buckets will be incremented to 2 as well.

Am I missing something here ?
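For reference, Prometheus histogram buckets are cumulative: each le="X" series counts all observations less than or equal to X, not just those falling between adjacent bounds. So a single observation of 0.275 correctly increments every bucket from le="00.50" upward, and this library appears to omit buckets whose count is still zero rather than printing them. A standalone Python sketch of the assumed semantics (illustrative, not this library's code):

```python
import math

# Cumulative bucket semantics: an observation increments every bucket
# whose upper bound is >= the observed value.
buckets = [0.1, 0.25, 0.5, 0.75, 1, 2, 3, 4, 5, 10, 30, 60, math.inf]

def observe(counts, value):
    for bound in buckets:
        if value <= bound:
            counts[bound] = counts.get(bound, 0) + 1
    return counts

counts = observe({}, 0.275)
# le=0.1 and le=0.25 stay at 0; every bucket from le=0.5 up becomes 1.
```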

I can't reach my metrics

Hello,
I have this issue when I try to reach my address: localhost:9113/metrics

2018/05/15 16:26:38 [info] 2537#2537: Using 32768KiB of shared memory for push module in /etc/nginx/nginx.conf:92
2018/05/15 16:27:02 [error] 2545#2545: *1 lua entry thread aborted: runtime error: /opt/nginx-lua-prometheus/prometheus.lua:496: attempt to index local 'self' (a nil value)
stack traceback:
coroutine 0:
/opt/nginx-lua-prometheus/prometheus.lua: in function 'collect'
content_by_lua(nginx.conf:89):5: in function , client: 192.168.56.1, server: , request: "GET /metrics HTTP/1.1", host: "192.168.56.101:9113"

What should I do?

Thanks !
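For what it's worth, this particular error ("attempt to index local 'self'") usually comes from calling collect() with a dot instead of a colon, so no `self` argument is passed. A hedged sketch of the working form, assuming the standard setup from the README:

```nginx
server {
  listen 9113;
  location /metrics {
    content_by_lua_block {
      -- prometheus.collect() (dot syntax) passes no `self` and fails;
      -- the colon form passes the prometheus object implicitly:
      prometheus:collect()
    }
  }
}
```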

/metrics aren't displayed, downloaded instead

Hi,

not really an issue, but a question. It seems that no browser recognizes the /metrics path as a text file it can display; instead it treats it as a binary (or some other file type it can't handle) and downloads the file. When viewing the downloaded file with a text editor it looks fine.
So I am wondering whether I haven't set up this plugin correctly and Prometheus itself will fail to interpret /metrics as well, or whether everything is fine.
If this is the expected behaviour, maybe you could specify an output MIME type so that the average browser can handle the output?
This would save other newcomers like me, who have very little experience with nginx and Lua working together, from searching for errors when there are none.
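If the endpoint is being served as application/octet-stream (which triggers a download), one hedged fix is to force a text content type on the metrics location; the Prometheus scraper parses the text exposition format regardless:

```nginx
location /metrics {
  # Make browsers display the output instead of downloading it.
  default_type text/plain;
  content_by_lua_block { prometheus:collect() }
}
```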

Nginx metrics based on path and method

Hi,
Thank you for this awesome plugin. Helped me to get rid of logspout, logstash and statsd.
But some small clarifications.

  1. Is it possible to get custom metrics (like per path or per incoming IP/server)?
  2. Is it possible to gather metrics by method (GET, POST, OPTIONS)?

and one question:
while looking at nginx_http_requests_total I saw

nginx_http_requests_total{host="xxx.yyy.com",instance="10.0.0.156:9145",job="nginx_prom",status="200"} 58408
and
nginx_http_requests_total{instance="10.0.0.156:9145",job="nginx_prom",status="200"} 138

one with a host label and one without. What's this?

I am running nginx in docker swarm, with 6 replicas, and prometheus dns_sd to discover nodes.

--

Thank you.
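Both are possible by adding label values in the log_by_lua handler. A hedged sketch (the counter and its label set here are hypothetical; it would need to be declared with matching labels in init_by_lua, and raw request paths make poor label values because user IDs and query strings explode cardinality). As for the series without a host label: that most likely comes from requests hitting a server block with an empty server_name, since Prometheus drops labels whose value is empty.

```nginx
log_by_lua_block {
  -- Hypothetical counter declared with labels {"host", "method", "status"};
  -- keep label values low-cardinality to avoid bloating the shared dict.
  metric_requests:inc(1,
    {ngx.var.server_name, ngx.var.request_method, ngx.var.status})
}
```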

Compute Rate-Per-Second

First of all thanks for the great tool!

Wanted to ask if there is any way to obtain a per-second rate for counter metrics, as it would be useful for continuous monitoring of REST APIs and for properly computing success/failure rates.

Ideally these would be computed directly by this tool, so they could be retrieved at the specified endpoint without further tools or processing steps.

Is there any way to call Prometheus queries from inside the Lua code?
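Rates are deliberately not computed by the exporter: it exposes only monotonic counters, and the Prometheus server derives per-second rates at query time, which also handles counter resets across nginx reloads. There is no supported way to run PromQL from inside the Lua code. Example queries against the metrics this library exports:

```promql
# Per-second request rate over the last 5 minutes, per host and status:
rate(nginx_http_requests_total[5m])

# Overall 5xx failure ratio:
sum(rate(nginx_http_requests_total{status=~"5.."}[5m]))
  / sum(rate(nginx_http_requests_total[5m]))
```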

Collecting metrics from multiple containers

Hi,

How do we collect metrics from a single box when multiple containers are running on it?

For example: if 9 containers are running on 2 machines, those 9 containers will use random service ports selected by the orchestration tool.

In that scenario, how do we collect nginx stats from those 2 boxes (per-machine and aggregate reports)?

Thanks for any help !!

attempt to index field 'dict' error

Getting this error when I run nginx -t after adding this package.

  • lua: Lua 5.1.5 Copyright (C) 1994-2012 Lua.org, PUC-Rio
  • ubuntu 18.04 LTS
  • nginx version: nginx/1.14.0 (Ubuntu)

nginx.conf

user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
	worker_connections 768;
	# multi_accept on;
}

http {

	##
	# Basic Settings
	##

	sendfile on;
	tcp_nopush on;
	tcp_nodelay on;
	keepalive_timeout 65;
	types_hash_max_size 2048;
	# server_tokens off;

	# server_names_hash_bucket_size 64;
	# server_name_in_redirect off;

	include /etc/nginx/mime.types;
	default_type application/octet-stream;

	##
	# SSL Settings
	##

	ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
	ssl_prefer_server_ciphers on;

	##
	# Logging Settings
	##

	access_log /var/log/nginx/access.log;
	error_log /var/log/nginx/error.log;

	##
	# Gzip Settings
	##

	gzip on;

	# gzip_vary on;
	# gzip_proxied any;
	# gzip_comp_level 6;
	# gzip_buffers 16 8k;
	# gzip_http_version 1.1;
	# gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

	##
	# Virtual Host Configs
	##

	include /etc/nginx/conf.d/*.conf;
	include /etc/nginx/sites-enabled/*;


	## nginx-prometheus

	lua_package_path "/var/nginx-lua-prometheus/?.lua";
	init_by_lua '
	  prometheus = require("prometheus").init("prometheus_metrics")
	  metric_requests = prometheus:counter(
	    "nginx_http_requests_total", "Number of HTTP requests", {"host", "status"})
	  metric_latency = prometheus:histogram(
	    "nginx_http_request_duration_seconds", "HTTP request latency", {"host"})
	  metric_connections = prometheus:gauge(
	    "nginx_http_connections", "Number of HTTP connections", {"state"})
	';
	log_by_lua '
	  metric_requests:inc(1, {ngx.var.server_name, ngx.var.status})
	  metric_latency:observe(tonumber(ngx.var.request_time), {ngx.var.server_name})
	';
}

error

ubuntu@host:/var$ sudo nginx -t
nginx: [error] init_by_lua error: /var/nginx-lua-prometheus/prometheus.lua:283: attempt to index field 'dict' (a nil value)
stack traceback:
	/var/nginx-lua-prometheus/prometheus.lua:283: in function 'init'
	init_by_lua:2: in main chunk
nginx: configuration file /etc/nginx/nginx.conf test failed

any ideas?
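The error means the shared dictionary named in init() does not exist: init("prometheus_metrics") looks up ngx.shared.prometheus_metrics, which is only created by a lua_shared_dict directive. The config above never declares one. A hedged fix (the 10M size is an assumption; size it to your label cardinality):

```nginx
http {
  # Must be declared so ngx.shared.prometheus_metrics exists before
  # init_by_lua runs; the name must match the argument to init().
  lua_shared_dict prometheus_metrics 10M;

  lua_package_path "/var/nginx-lua-prometheus/?.lua;;";
  init_by_lua '
    prometheus = require("prometheus").init("prometheus_metrics")
  ';
}
```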

Stream Dictionary?

I'm having trouble getting metrics working with the stream module based on the README, because lua_shared_dict will not work in that block. Is there a different way to store the metrics so they can be read?
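Assuming you are on OpenResty with the stream-lua module (roughly 1.13.6.1 or later), lua_shared_dict can be declared inside the stream {} block itself; note that http-block dictionaries live in a separate namespace and are not visible from stream handlers, and vice versa:

```nginx
stream {
  # Dictionary local to the stream subsystem (name is illustrative).
  lua_shared_dict stream_prometheus_metrics 10M;

  init_by_lua_block {
    prometheus = require("prometheus").init("stream_prometheus_metrics")
  }
}
```

Exposing the collected stream metrics over HTTP then still needs a bridge, since an http-block collect() cannot read the stream dictionary directly.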

The example of prometheus:histogram in README.md is wrong.

The first function should be counter() and the second should be histogram(); as written, the parameter counts are wrong.

metric_latency = prometheus:histogram(
  "nginx_http_request_duration_seconds", "HTTP request latency", {"host"})
metric_response_sizes = prometheus:counter(
  "nginx_http_response_size_bytes", "Size of HTTP responses", nil,
  {10,100,1000,10000,100000,1000000})

How serious Caveats are?

Hi,

I am experimenting with Nginx-Lua and so far so good, but with some issues that might be caused by a misunderstanding on my side. First of all, I am tracking API endpoints for each server/app. I guess that is not good practice, since the cardinality is too high because some URLs are namespaced with user_ids. So I wonder how I can get metrics/alerts for slow endpoints and other information without actually storing per-endpoint hits.

On the other hand, for some Nginx-Lua jobs I am getting net/http: request canceled as the result of a Prometheus scrape. I've increased the timeout in Prometheus to 15s: same result. I don't think I should have to let the scrape take forever, so what could be causing this, and what is the best way to solve it?

Thanks!
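On the cardinality point, a common workaround is to normalize request paths into a small fixed set of templates before using them as label values, so user_id-namespaced URLs collapse into one series; this also helps the scrape timeout, since collect() has to walk every key in the shared dictionary and very high cardinality makes scrapes slow enough to be canceled. A hypothetical sketch of such normalization (shown in Python; the equivalent string.gsub logic applies in Lua):

```python
import re

def normalize_path(path: str) -> str:
    # Collapse purely numeric path segments (user IDs etc.) into a
    # placeholder so label cardinality stays bounded.
    return re.sub(r"/\d+(?=/|$)", "/:id", path)

print(normalize_path("/users/12345/orders/678"))  # /users/:id/orders/:id
```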
