A Statsd-compatible metrics aggregator
License: MIT License
brubeck's config.default.json has expire=5 and the carbon backend's frequency=10. This leads to the brubeck internal metrics being expired twice (and becoming BRUBECK_EXPIRE_DISABLED) before brubeck_internal_sample has a chance to reset the metric to active. As a result, the internal metrics are only reported once or twice and then fall silent forever.
Hey there, Brubeck is pretty awesome! But for a while now we've been under the assumption that it didn't do anything with sample rates due to this line in the readme:
Client-sent sampling rates are ignored.
But after digging into the code to check something else, I noticed sample rates are supported: when I send something like test:1|c|@0.1, I get a value of 10 on flush. Is this line in the README outdated, or does it mean something else?
According to original Statsd:
If the gauge is not updated at the next flush, it will send the previous value. You can opt to send no metric at all for this gauge, by setting config.deleteGauges
https://github.com/etsy/statsd/blob/master/docs/metric_types.md#gauges
As far as I can see, Brubeck by default doesn't send the previous value when there is no new data. Is this a bug or a feature?
Do you have an option for changing that?
Any chance you could do these, just so that deployments could be reproducible?
In brubeck/metric.c, there is this:
static void
counter__record(struct brubeck_metric *metric, value_t value)
{
	pthread_spin_lock(&metric->lock);
	{
		if (metric->as.counter.previous > 0.0) {
			value_t diff = (value >= metric->as.counter.previous) ?
				(value - metric->as.counter.previous) :
				(value);
			metric->as.counter.value += diff;
		}
		metric->as.counter.previous = value;
	}
	pthread_spin_unlock(&metric->lock);
}
Suppose two servers report the same counter http_requests: the first server sends 5 and metric->as.counter.value is 5, then the second server sends 10 and metric->as.counter.value is 10, but we expect the counter to be 15. Am I right?
Delete me please! I've found an alternative solution, thanks.
config.json:
{
...
"http" : "127.0.0.1:8080",
...
}
Expected: brubeck should bind only to the loopback interface / 127.0.0.1.
Actual: brubeck binds its http listener on all interfaces (see *:8080 below).
[localhost]# ss -tulpn | grep brubeck
udp UNCONN 0 0 127.0.0.1:8125 *:* users:(("brubeck",pid=27614,fd=6))
tcp LISTEN 0 32 *:8080 *:* users:(("brubeck",pid=27614,fd=7))
Possible fix is to parse the address out of the "http" bind string and pass it along as MHD_OPTION_SOCK_ADDR in brubeck_http_endpoint_init.
See Line 258 in c3b66aa.
Hi,
It seems brubeck overflows when handling gauge values greater than 2^32 (4294967296).
(I did not find a maximum gauge value in the statsd specification:
https://github.com/etsy/statsd/blob/master/docs/metric_types.md,
and in my production environments we often send values greater than 2^32.)
How to reproduce:
Send a metric to brubeck and watch the network packets (I use ngrep):
echo "complex.delete_me.mem:4294967296|g" | nc -u -q1 127.0.0.1 8126
...
U 127.0.0.1:38487 -> 127.0.0.1:8126
complex.delete_me.mem:4294967296|g.
Watch network traffic from brubeck to storage:
T 10.0.2.9:53546 -> 10.9.192.2:2003 [AP] complex.delete_me.mem 0 1493135074. # brubeck sends 0 instead of 4294967296
My investigation led me to this function:
https://github.com/github/brubeck/blob/master/src/utils.c#L137
Could you provide a fix for this case?
Thanks in advance!
It should sum up the README
Right now sending the following values...
val:-1|g
val:-1|g
val:-1|g
...in a flush cycle will output a final value of -3. But since this is a gauge, it's expected to be -1. This appears to be due to the following code:
https://github.com/github/brubeck/blob/master/src/samplers/statsd.c#L105
https://github.com/github/brubeck/blob/master/src/metric.c#L48
There seem to be some statsd aggregators that support negative gauges and some that do not. What is brubeck's stance, and what would be the recommended way to get this functionality from brubeck? Maybe a runtime configuration switch to remove relative values from gauges altogether?
Trying to set up a local cluster using docker-compose; to link containers together you should use the hostname, but it seems that brubeck cannot connect to the containers using hostnames defined in /etc/hosts. I've attached the error I get from the log, the backends snippet relevant to the issue, and the contents of /etc/hosts.
If I use the IP address from /etc/hosts, brubeck connects fine.
Let me know if you need more info :)
EDIT: I can of course telnet carbon 2004
Log:
instance=brubeck_debug backend=carbon event=failed_to_connect errno=101 msg="Network is unreachable"
Config:
"backends" : [
{
"type" : "carbon",
"address" : "carbon",
"port" : 2004,
"frequency" : 10,
"pickle": true
}
],
/etc/hosts
:
$ cat /etc/hosts
172.17.0.155 93c76038bc47
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.150 carbon 087e1b0d7be2 statsdocker_carbon_1
I'm trying to compile brubeck on Debian Sid with the following library versions:
➜ brubeck git:(master) ✗ dpkg -l libmicrohttpd-dev libjansson-dev libssl-dev gcc
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=======================================-========================-========================-====================================================================================
ii gcc 4:6.1.1-1 amd64 GNU C compiler
ii libjansson-dev:amd64 2.7-5 amd64 C library for encoding, decoding and manipulating JSON data (dev)
ii libmicrohttpd-dev 0.9.51-1 amd64 library embedding HTTP server functionality (development)
ii libssl-dev:amd64 1.1.0b-2 amd64 Secure Sockets Layer toolkit - development files
It fails to compile with this error:
➜ brubeck git:(master) ✗ ./script/bootstrap
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/backend.c -o src/backend.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/backends/carbon.c -o src/backends/carbon.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/bloom.c -o src/bloom.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/city.c -o src/city.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/histogram.c -o src/histogram.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/ht.c -o src/ht.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/http.c -o src/http.o
src/http.c: In function ‘expire_metric’:
src/http.c:67:3: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
return MHD_create_response_from_data(
^~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
MHD_create_response_from_data (size_t size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/http.c: In function ‘send_metric’:
src/http.c:100:3: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
return MHD_create_response_from_data(
^~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
MHD_create_response_from_data (size_t size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/http.c: In function ‘send_stats’:
src/http.c:177:2: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
return MHD_create_response_from_data(
^~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
MHD_create_response_from_data (size_t size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/http.c: In function ‘send_ping’:
src/http.c:210:2: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
return MHD_create_response_from_data(
^~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
MHD_create_response_from_data (size_t size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/http.c: In function ‘handle_request’:
src/http.c:244:3: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
response = MHD_create_response_from_data(
^~~~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
MHD_create_response_from_data (size_t size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/internal_sampler.c -o src/internal_sampler.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/log.c -o src/log.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/metric.c -o src/metric.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/sampler.c -o src/sampler.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/samplers/statsd-secure.c -o src/samplers/statsd-secure.o
src/samplers/statsd-secure.c: In function ‘statsd_secure__thread’:
src/samplers/statsd-secure.c:101:11: error: storage size of ‘ctx’ isn’t known
HMAC_CTX ctx;
^~~
src/samplers/statsd-secure.c:111:2: warning: implicit declaration of function ‘HMAC_CTX_init’ [-Wimplicit-function-declaration]
HMAC_CTX_init(&ctx);
^~~~~~~~~~~~~
src/samplers/statsd-secure.c:152:2: warning: implicit declaration of function ‘HMAC_CTX_cleanup’ [-Wimplicit-function-declaration]
HMAC_CTX_cleanup(&ctx);
^~~~~~~~~~~~~~~~
src/samplers/statsd-secure.c:101:11: warning: unused variable ‘ctx’ [-Wunused-variable]
HMAC_CTX ctx;
^~~
Makefile:44: recipe for target 'src/samplers/statsd-secure.o' failed
make: *** [src/samplers/statsd-secure.o] Error 1
I suspect OpenSSL 1.1: as of 1.1.0, HMAC_CTX is an opaque type and HMAC_CTX_init/HMAC_CTX_cleanup were replaced by HMAC_CTX_new/HMAC_CTX_free, which would explain the unknown storage size. I haven't yet looked into it extensively. The program compiles just fine on a machine running Debian 8 with OpenSSL 1.0.1t.
https://github.com/github/brubeck/blob/master/src/samplers/statsd.c#L56
for (i = 0; i < SIM_PACKETS; ++i) {
Ketama is an alternative to cityhash that is designed to limit the amount of rebalancing when servers are added to or removed from a consistent-hashing ring.
In addition to a previous change (#38) ... the following are required to avoid the stuck metric ...
diff --git a/src/samplers/statsd.c b/src/samplers/statsd.c
index 62cb33e..4d6c515 100644
--- a/src/samplers/statsd.c
+++ b/src/samplers/statsd.c
@@ -51,9 +51,9 @@ static void statsd_run_recvmmsg(struct brubeck_statsd *statsd, int sock)
 		}
 
 		/* store stats */
-		brubeck_atomic_add(&statsd->sampler.inflow, SIM_PACKETS);
+		brubeck_atomic_add(&statsd->sampler.inflow, res);
 
-		for (i = 0; i < SIM_PACKETS; ++i) {
+		for (i = 0; i < res; ++i) {
 			char *buf = msgs[i].msg_hdr.msg_iov->iov_base;
 			char *end = buf + msgs[i].msg_len;
 			brubeck_statsd_packet_parse(server, buf, end);
These were included in the original patch. Without these additional changes, a single metric in an otherwise idle system is still stuck.
The default setting is too small, at least for me; I overflow it. I've built with a larger buffer, 16384, and have had no issues since. The observed symptom is that not all metrics are visible and present in whisper, Graphite, and downstream; overriding this buffer size resolves it.
statsd_run_recvmsg() subtracts 1 from the buffer size passed to recvfrom() to allow room for the null byte that will be appended by brubeck_statsd_msg_parse(). statsd_run_recvmmsg() doesn't do this, so the null byte is written past the end of the buffer if a MAX_PACKET_SIZE-byte packet is received.
Patch attached ... a bit rough but with a lot of road miles.
mac_os_x.657498c47e0cabc6aaecc6e1b0db8b25e5c0be22.patch.txt
Trying to create a Debian package for this, and by default it runs make test, which has failing tests, perhaps because of this (from the compile step):
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"2536347\" -DBRUBECK_HAVE_MICROHTTPD -c src/http.c -o src/http.o
src/http.c: In function ‘expire_metric’:
src/http.c:67:3: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
return MHD_create_response_from_data(
^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
MHD_create_response_from_data (size_t size,
^
src/http.c: In function ‘send_metric’:
src/http.c:100:3: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
return MHD_create_response_from_data(
^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
MHD_create_response_from_data (size_t size,
^
src/http.c: In function ‘send_stats’:
src/http.c:177:2: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
return MHD_create_response_from_data(
^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
MHD_create_response_from_data (size_t size,
^
src/http.c: In function ‘send_ping’:
src/http.c:210:2: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
return MHD_create_response_from_data(
^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
MHD_create_response_from_data (size_t size,
^
src/http.c: In function ‘handle_request’:
src/http.c:244:3: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
response = MHD_create_response_from_data(
^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
MHD_create_response_from_data (size_t size,
^
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendo
Test output:
== Entering suite #1, "histogram: time/data series aggregation" ==
[1:1] test_histogram__single_element:#1 "histogram size" pass
[1:2] test_histogram__single_element:#2 "histogram value count" pass
[1:3] test_histogram__single_element:#3 "sample.min" pass
[1:4] test_histogram__single_element:#4 "sample.max" pass
[1:5] test_histogram__single_element:#5 "sample.percentile[3]" pass
[1:6] test_histogram__single_element:#6 "sample.mean" pass
[1:7] test_histogram__single_element:#7 "sample.count" pass
[1:8] test_histogram__single_element:#8 "sample.sum" pass
[1:9] test_histogram__large_range:#1 "sample.min" pass
[1:10] test_histogram__large_range:#2 "sample.max" pass
[1:11] test_histogram__large_range:#3 "sample.median" pass
[1:12] test_histogram__multisamples:#1 "histogram size" pass
[1:13] test_histogram__multisamples:#2 "histogram value count" pass
[1:14] test_histogram__multisamples:#3 "sample.min" pass
[1:15] test_histogram__multisamples:#4 "sample.max" pass
[1:16] test_histogram__multisamples:#5 "sample.percentile[3]" FAIL
! Type: fail-unless
! Condition: sample.percentile[3] == 127.0
! Line: 105
[1:17] test_histogram__multisamples:#6 "sample.mean" pass
[1:18] test_histogram__multisamples:#7 "sample.count" pass
[1:19] test_histogram__multisamples:#8 "sample.sum" pass
[1:20] test_histogram__multisamples:#9 "histogram size" pass
[1:21] test_histogram__multisamples:#10 "histogram value count" pass
[1:22] test_histogram__multisamples:#11 "sample.min" pass
[1:23] test_histogram__multisamples:#12 "sample.max" pass
[1:24] test_histogram__multisamples:#13 "sample.percentile[3]" FAIL
! Type: fail-unless
! Condition: sample.percentile[3] == 127.0
! Line: 105
[1:25] test_histogram__multisamples:#14 "sample.mean" pass
[1:26] test_histogram__multisamples:#15 "sample.count" pass
[1:27] test_histogram__multisamples:#16 "sample.sum" pass
[1:28] test_histogram__multisamples:#17 "histogram size" pass
[1:29] test_histogram__multisamples:#18 "histogram value count" pass
[1:30] test_histogram__multisamples:#19 "sample.min" pass
[1:31] test_histogram__multisamples:#20 "sample.max" pass
[1:32] test_histogram__multisamples:#21 "sample.percentile[3]" FAIL
! Type: fail-unless
! Condition: sample.percentile[3] == 127.0
! Line: 105
[1:33] test_histogram__multisamples:#22 "sample.mean" pass
[1:34] test_histogram__multisamples:#23 "sample.count" pass
[1:35] test_histogram__multisamples:#24 "sample.sum" pass
[1:36] test_histogram__multisamples:#25 "histogram size" pass
[1:37] test_histogram__multisamples:#26 "histogram value count" pass
[1:38] test_histogram__multisamples:#27 "sample.min" pass
[1:39] test_histogram__multisamples:#28 "sample.max" pass
[1:40] test_histogram__multisamples:#29 "sample.percentile[3]" FAIL
! Type: fail-unless
! Condition: sample.percentile[3] == 127.0
! Line: 105
[1:41] test_histogram__multisamples:#30 "sample.mean" pass
[1:42] test_histogram__multisamples:#31 "sample.count" pass
[1:43] test_histogram__multisamples:#32 "sample.sum" pass
[1:44] test_histogram__multisamples:#33 "histogram size" pass
[1:45] test_histogram__multisamples:#34 "histogram value count" pass
[1:46] test_histogram__multisamples:#35 "sample.min" pass
[1:47] test_histogram__multisamples:#36 "sample.max" pass
[1:48] test_histogram__multisamples:#37 "sample.percentile[3]" FAIL
! Type: fail-unless
! Condition: sample.percentile[3] == 127.0
! Line: 105
[1:49] test_histogram__multisamples:#38 "sample.mean" pass
[1:50] test_histogram__multisamples:#39 "sample.count" pass
[1:51] test_histogram__multisamples:#40 "sample.sum" pass
[1:52] test_histogram__multisamples:#41 "histogram size" pass
[1:53] test_histogram__multisamples:#42 "histogram value count" pass
[1:54] test_histogram__multisamples:#43 "sample.min" pass
[1:55] test_histogram__multisamples:#44 "sample.max" pass
[1:56] test_histogram__multisamples:#45 "sample.percentile[3]" FAIL
! Type: fail-unless
! Condition: sample.percentile[3] == 127.0
! Line: 105
[1:57] test_histogram__multisamples:#46 "sample.mean" pass
[1:58] test_histogram__multisamples:#47 "sample.count" pass
[1:59] test_histogram__multisamples:#48 "sample.sum" pass
[1:60] test_histogram__multisamples:#49 "histogram size" pass
[1:61] test_histogram__multisamples:#50 "histogram value count" pass
[1:62] test_histogram__multisamples:#51 "sample.min" pass
[1:63] test_histogram__multisamples:#52 "sample.max" pass
[1:64] test_histogram__multisamples:#53 "sample.percentile[3]" FAIL
! Type: fail-unless
! Condition: sample.percentile[3] == 127.0
! Line: 105
[1:65] test_histogram__multisamples:#54 "sample.mean" pass
[1:66] test_histogram__multisamples:#55 "sample.count" pass
[1:67] test_histogram__multisamples:#56 "sample.sum" pass
[1:68] test_histogram__multisamples:#57 "histogram size" pass
[1:69] test_histogram__multisamples:#58 "histogram value count" pass
[1:70] test_histogram__multisamples:#59 "sample.min" pass
[1:71] test_histogram__multisamples:#60 "sample.max" pass
[1:72] test_histogram__multisamples:#61 "sample.percentile[3]" FAIL
! Type: fail-unless
! Condition: sample.percentile[3] == 127.0
! Line: 105
[1:73] test_histogram__multisamples:#62 "sample.mean" pass
[1:74] test_histogram__multisamples:#63 "sample.count" pass
[1:75] test_histogram__multisamples:#64 "sample.sum" pass
[1:76] test_histogram__with_sample_rate:#1 "histogram size" pass
[1:77] test_histogram__with_sample_rate:#2 "histogram value count" pass
[1:78] test_histogram__with_sample_rate:#3 "sample.min" pass
[1:79] test_histogram__with_sample_rate:#4 "sample.max" pass
[1:80] test_histogram__with_sample_rate:#5 "sample.percentile[3]" FAIL
! Type: fail-unless
! Condition: sample.percentile[3] == 127.0
! Line: 130
[1:81] test_histogram__with_sample_rate:#6 "sample.mean" pass
[1:82] test_histogram__with_sample_rate:#7 "sample.count" pass
[1:83] test_histogram__with_sample_rate:#8 "sample.sum" pass
[1:84] test_histogram__capacity:#1 "histogram size" pass
[1:85] test_histogram__capacity:#2 "histogram value count" pass
[1:86] test_histogram__capacity:#3 "sample.min" pass
[1:87] test_histogram__capacity:#4 "sample.max" pass
[1:88] test_histogram__capacity:#5 "sample.count" pass
[1:89] test_histogram__capacity:#6 "histogram size" pass
[1:90] test_histogram__capacity:#7 "histogram value count" pass
[1:91] test_histogram__capacity:#8 "sample.min" pass
[1:92] test_histogram__capacity:#9 "sample.max" pass
[1:93] test_histogram__capacity:#10 "sample.count" pass
--> 93 check(s), 84 ok, 9 failed (9.68%)
== Entering suite #2, "mstore: concurrency test for metrics hash table" ==
[2:1] test_mstore__save:#1 "stored 15000 metrics in table" pass
[2:2] test_mstore__save:#2 "lookup all metrics from table" pass
--> 2 check(s), 2 ok, 0 failed (0.00%)
== Entering suite #3, "atomic: atomic primitives" ==
[3:1] test_atomic_spinlocks:#1 "spinlock doesn't race" pass
--> 1 check(s), 1 ok, 0 failed (0.00%)
== Entering suite #4, "ftoa: double-to-string conversion" ==
[4:1] test_ftoa:#1 "0" pass
[4:2] test_ftoa:#2 "15" pass
[4:3] test_ftoa:#3 "15.5" pass
[4:4] test_ftoa:#4 "15.505" pass
[4:5] test_ftoa:#5 "0.125" pass
[4:6] test_ftoa:#6 "1234.567" pass
[4:7] test_ftoa:#7 "100000" pass
[4:8] test_ftoa:#8 "0.999" pass
--> 8 check(s), 8 ok, 0 failed (0.00%)
== Entering suite #5, "statsd: packet parsing" ==
[5:1] test_statsd_msg__parse_strings:#1 "github.auth.fingerprint.sha1:1|c" pass
[5:2] test_statsd_msg__parse_strings:#2 "msg.value == expected" pass
[5:3] test_statsd_msg__parse_strings:#3 "msg.sample_rate == expected" pass
[5:4] test_statsd_msg__parse_strings:#4 "msg.modifiers == expected" pass
[5:5] test_statsd_msg__parse_strings:#5 "github.auth.fingerprint.sha1:1|c|@0.1" pass
[5:6] test_statsd_msg__parse_strings:#6 "msg.value == expected" pass
[5:7] test_statsd_msg__parse_strings:#7 "msg.sample_rate == expected" pass
[5:8] test_statsd_msg__parse_strings:#8 "msg.modifiers == expected" pass
[5:9] test_statsd_msg__parse_strings:#9 "github.auth.fingerprint.sha1:1|g" pass
[5:10] test_statsd_msg__parse_strings:#10 "msg.value == expected" pass
[5:11] test_statsd_msg__parse_strings:#11 "msg.sample_rate == expected" pass
[5:12] test_statsd_msg__parse_strings:#12 "msg.modifiers == expected" pass
[5:13] test_statsd_msg__parse_strings:#13 "lol:1|ms" pass
[5:14] test_statsd_msg__parse_strings:#14 "msg.value == expected" pass
[5:15] test_statsd_msg__parse_strings:#15 "msg.sample_rate == expected" pass
[5:16] test_statsd_msg__parse_strings:#16 "msg.modifiers == expected" pass
[5:17] test_statsd_msg__parse_strings:#17 "this.is.sparta:199812|C" pass
[5:18] test_statsd_msg__parse_strings:#18 "msg.value == expected" pass
[5:19] test_statsd_msg__parse_strings:#19 "msg.sample_rate == expected" pass
[5:20] test_statsd_msg__parse_strings:#20 "msg.modifiers == expected" pass
[5:21] test_statsd_msg__parse_strings:#21 "this.is.sparta:0012|h" pass
[5:22] test_statsd_msg__parse_strings:#22 "msg.value == expected" pass
[5:23] test_statsd_msg__parse_strings:#23 "msg.sample_rate == expected" pass
[5:24] test_statsd_msg__parse_strings:#24 "msg.modifiers == expected" pass
[5:25] test_statsd_msg__parse_strings:#25 "this.is.sparta:23.23|g" pass
[5:26] test_statsd_msg__parse_strings:#26 "msg.value == expected" pass
[5:27] test_statsd_msg__parse_strings:#27 "msg.sample_rate == expected" pass
[5:28] test_statsd_msg__parse_strings:#28 "msg.modifiers == expected" pass
[5:29] test_statsd_msg__parse_strings:#29 "this.is.sparta:0.232030|g" pass
[5:30] test_statsd_msg__parse_strings:#30 "msg.value == expected" pass
[5:31] test_statsd_msg__parse_strings:#31 "msg.sample_rate == expected" pass
[5:32] test_statsd_msg__parse_strings:#32 "msg.modifiers == expected" pass
[5:33] test_statsd_msg__parse_strings:#33 "this.are.some.floats:1234567.89|g" pass
[5:34] test_statsd_msg__parse_strings:#34 "msg.value == expected" pass
[5:35] test_statsd_msg__parse_strings:#35 "msg.sample_rate == expected" pass
[5:36] test_statsd_msg__parse_strings:#36 "msg.modifiers == expected" pass
[5:37] test_statsd_msg__parse_strings:#37 "this.are.some.floats:1234567.89|g|@0.025" pass
[5:38] test_statsd_msg__parse_strings:#38 "msg.value == expected" pass
[5:39] test_statsd_msg__parse_strings:#39 "msg.sample_rate == expected" pass
[5:40] test_statsd_msg__parse_strings:#40 "msg.modifiers == expected" pass
[5:41] test_statsd_msg__parse_strings:#41 "this.are.some.floats:1234567.89|g|@0.25" pass
[5:42] test_statsd_msg__parse_strings:#42 "msg.value == expected" pass
[5:43] test_statsd_msg__parse_strings:#43 "msg.sample_rate == expected" pass
[5:44] test_statsd_msg__parse_strings:#44 "msg.modifiers == expected" pass
[5:45] test_statsd_msg__parse_strings:#45 "this.are.some.floats:1234567.89|g|@0.01" pass
[5:46] test_statsd_msg__parse_strings:#46 "msg.value == expected" pass
[5:47] test_statsd_msg__parse_strings:#47 "msg.sample_rate == expected" pass
[5:48] test_statsd_msg__parse_strings:#48 "msg.modifiers == expected" pass
[5:49] test_statsd_msg__parse_strings:#49 "this.are.some.floats:1234567.89|g|@000.0100" pass
[5:50] test_statsd_msg__parse_strings:#50 "msg.value == expected" pass
[5:51] test_statsd_msg__parse_strings:#51 "msg.sample_rate == expected" pass
[5:52] test_statsd_msg__parse_strings:#52 "msg.modifiers == expected" pass
[5:53] test_statsd_msg__parse_strings:#53 "this.are.some.floats:1234567.89|g|@1.0" pass
[5:54] test_statsd_msg__parse_strings:#54 "msg.value == expected" pass
[5:55] test_statsd_msg__parse_strings:#55 "msg.sample_rate == expected" pass
[5:56] test_statsd_msg__parse_strings:#56 "msg.modifiers == expected" pass
[5:57] test_statsd_msg__parse_strings:#57 "this.are.some.floats:1234567.89|g|@1" pass
[5:58] test_statsd_msg__parse_strings:#58 "msg.value == expected" pass
[5:59] test_statsd_msg__parse_strings:#59 "msg.sample_rate == expected" pass
[5:60] test_statsd_msg__parse_strings:#60 "msg.modifiers == expected" pass
[5:61] test_statsd_msg__parse_strings:#61 "this.are.some.floats:1234567.89|g|@1." pass
[5:62] test_statsd_msg__parse_strings:#62 "msg.value == expected" pass
[5:63] test_statsd_msg__parse_strings:#63 "msg.sample_rate == expected" pass
[5:64] test_statsd_msg__parse_strings:#64 "msg.modifiers == expected" pass
[5:65] test_statsd_msg__parse_strings:#65 "this.are.some.floats:|g" pass
[5:66] test_statsd_msg__parse_strings:#66 "msg.value == expected" pass
[5:67] test_statsd_msg__parse_strings:#67 "msg.sample_rate == expected" pass
[5:68] test_statsd_msg__parse_strings:#68 "msg.modifiers == expected" pass
[5:69] test_statsd_msg__parse_strings:#69 "this.are.some.floats:1234567.89|g" pass
[5:70] test_statsd_msg__parse_strings:#70 "msg.value == expected" pass
[5:71] test_statsd_msg__parse_strings:#71 "msg.sample_rate == expected" pass
[5:72] test_statsd_msg__parse_strings:#72 "msg.modifiers == expected" pass
[5:73] test_statsd_msg__parse_strings:#73 "gauge.increment:+1|g" pass
[5:74] test_statsd_msg__parse_strings:#74 "msg.value == expected" pass
[5:75] test_statsd_msg__parse_strings:#75 "msg.sample_rate == expected" pass
[5:76] test_statsd_msg__parse_strings:#76 "msg.modifiers == expected" pass
[5:77] test_statsd_msg__parse_strings:#77 "gauge.decrement:-1|g" pass
[5:78] test_statsd_msg__parse_strings:#78 "msg.value == expected" pass
[5:79] test_statsd_msg__parse_strings:#79 "msg.sample_rate == expected" pass
[5:80] test_statsd_msg__parse_strings:#80 "msg.modifiers == expected" pass
[5:81] test_statsd_msg__parse_strings:#81 "this.are.some.floats:12.89.23|g" pass
[5:82] test_statsd_msg__parse_strings:#82 "this.are.some.floats:12.89|a" pass
[5:83] test_statsd_msg__parse_strings:#83 "this.are.some.floats:12.89|msdos" pass
[5:84] test_statsd_msg__parse_strings:#84 "this.are.some.floats:12.89g|g" pass
[5:85] test_statsd_msg__parse_strings:#85 "this.are.some.floats:12.89|" pass
[5:86] test_statsd_msg__parse_strings:#86 "this.are.some.floats:12.89" pass
[5:87] test_statsd_msg__parse_strings:#87 "this.are.some.floats:12.89 |g" pass
[5:88] test_statsd_msg__parse_strings:#88 "this.are.some.floats|g" pass
[5:89] test_statsd_msg__parse_strings:#89 "this.are.some.floats:1.0|g|1.0" pass
[5:90] test_statsd_msg__parse_strings:#90 "this.are.some.floats:1.0|g|0.1" pass
[5:91] test_statsd_msg__parse_strings:#91 "this.are.some.floats:1.0|g|@0.1.1" pass
[5:92] test_statsd_msg__parse_strings:#92 "this.are.some.floats:1.0|g|@0.1@" pass
[5:93] test_statsd_msg__parse_strings:#93 "this.are.some.floats:1.0|g|@0.1125.2" pass
[5:94] test_statsd_msg__parse_strings:#94 "this.are.some.floats:1.0|g|@0.1125.2" pass
[5:95] test_statsd_msg__parse_strings:#95 "this.are.some.floats:1.0|g|@1.23" pass
[5:96] test_statsd_msg__parse_strings:#96 "this.are.some.floats:1.0|g|@3.0" pass
[5:97] test_statsd_msg__parse_strings:#97 "this.are.some.floats:1.0|g|@-3.0" pass
[5:98] test_statsd_msg__parse_strings:#98 "this.are.some.floats:1.0|g|@-1.0" pass
[5:99] test_statsd_msg__parse_strings:#99 "this.are.some.floats:1.0|g|@-0.23" pass
[5:100] test_statsd_msg__parse_strings:#100 "this.are.some.floats:1.0|g|@0.0" pass
[5:101] test_statsd_msg__parse_strings:#101 "this.are.some.floats:1.0|g|@0" pass
--> 101 check(s), 101 ok, 0 failed (0.00%)
==> 205 check(s) in 5 suite(s) finished after 0.00 second(s),
196 succeeded, 9 failed (4.39%)
[FAILURE]
Makefile:56: recipe for target 'test' failed
make: *** [test] Error 1
I would like to point out that identifiers like "__BLOOM_FILTER_H" and "__BRUBECK__H_" do not conform to the naming rules of the C language standard: identifiers that begin with an underscore followed by an uppercase letter, or that contain a double underscore, are reserved for the implementation.
Would you like to adjust your selection of names?
Expiry can cause a metric that has not reported in an interval to push a zero value to graphite, when the total time to expiry/DISABLED is greater than the sampling interval. In this circumstance the metric has not reported and consequently has no value, and yet a zero value is pushed to graphite.
For counters and meters, this implies that a metric reported a zero value when in fact it did not report at all, or, for instance, that the aggregated total of an incrementer/decrementer is zero when it has not reported. I haven't looked at whether histograms/timers report zero values, but I'm guessing they might.
There is also the overhead of walking the list of metrics to expire them.
Would it not be more accurate, and more efficient, to mark a metric as ACTIVE when recorded and DISABLED when sampled? That way only metrics that have reported during the sample period would push a value to graphite, and the zeros that represent non-reports would not get pushed.
I don't have a reproduction recipe for this yet; however, the situation is a counter metric that emits just once, hourly. Several times in a 24-hour period, the metric for a given hour is emitted in the following hour.
This yields an hour without a metric, and an hour with a count of 2.0. I expect every hour to have a count of 1.0.
(Not my idea by the way to have an hourly metric, but it is what it is ...)
The Graphite project indicates that it honors the timestamp of the inbound metric. This suggests that the two metrics from brubeck are emitted with timestamps in the same (in this case 10-second) bucket.
I can confirm that the metrics are sent at the proper time and received by brubeck at that time. I've added telemetry to log metric receipt, with timestamp and by regex, which logs the receipt of the metric by the statsd sampler, and I can confirm that the metrics in question were received at the appropriate time.
My reporting interval is 10 seconds, and the expiry 7 seconds. I've varied the expiry up to 20 seconds without effect.
I will post more information as I have it.
The official statsd node client uses \n as the separator for multiple metrics in a single packet.
https://github.com/etsy/statsd/blob/master/docs/server.md
"Multiple metrics can be received in a single packet if separated by the \n character."
Update:
A simple python script to test this:
import socket

addr = (socket.gethostbyname('localhost'), 8126)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
print(sock.sendto(b'foo.baz:1|c\nfoo.def:1|c\n', addr))
This gives me
instance=brubeck_debug sampler=statsd event=bad_key key='foo.baz:1|c
foo.def:1|c
' from=127.0.0.1
If I use "\0" or "\n\0" as the separator I don't get the error, but only the first metric is sent to carbon.
Is it possible to support multiple metrics in one packet?
While testing I found that metrics exactly 11 or 4 characters long are dropped because of this check (line 88+ in statsd.c):
int res = recvfrom(sock, buffer,
sizeof(buffer) - 1, 0,
(struct sockaddr *)&reporter, &reporter_len);
if (res == EAGAIN || res == EINTR || res == 0)
continue;
res is either the message length or -1 on error. EAGAIN = 11 and EINTR = 4, so messages of length 4 or 11 are dropped. The check should be res == -1, or be replaced with the check further down, which tests res < 0 (line 99 in statsd.c) and marks the message dropped.
I guess messages of length 4 or 11 are unlikely, but maybe someone uses them.
In the brubeck source, at /brubeck-master/src/samplers/statsd.c, line 37, the recvmmsg call should have the MSG_WAITFORONE flag set in its 4th argument, which is currently zero. With this flag NOT set and no timeout, a single UDP message is not received: the call blocks indefinitely, waiting for more than one message. This can result in the loss of lone messages, such as sparse calls carrying a single metric.
Most of the time the metrics flow is significant, these single messages are flushed along with additional messages, and recvmmsg works. If, however, a single metric is sent, the call blocks indefinitely, leaving the message stranded.
This can be reproduced by sending a single metric to a running instance of brubeck: the single metric fails to emit in the line or pickle protocol to graphite. When the MSG_WAITFORONE flag is added, the metric flows through as expected.
The workaround is to change the brubeck configuration by setting the value of the "multimsg" setting in /etc/brubeck/config.json to 1, and then restart brubeck. This configuration change deactivates the recvmmsg call detailed above, falling back to recvmsg. It's not clear whether this has any performance implication with multiple worker threads in the receive pool; however, it seems likely that there will be some impact on a busy metrics flow.
I have built and tested with the flag above, and it works fine. (I also have a Mac build for anyone who wants it).
Would you like to add more error handling for return values from functions like the following?
Some of the included files appear to be GPL, cf: https://github.com/github/brubeck/blob/master/src/http/mongoose.c
This is problematic, considering that brubeck doesn't appear to be GPL-licensed.
Could the GPL licensed code be removed (or, less preferably, the entire project licensed GPL) so it can be used without issue?
Hi there, I made a docker container with brubeck, are you in any way interested in this or should I just keep it for myself?
It's located here https://github.com/Dinoshauer/brubeck-docker and here https://registry.hub.docker.com/u/dinoshauer/brubeck-docker/
Also, if I've done something horribly wrong, let me know :)
Hello,
"Brubeck is missing many of the features of the original StatsD. We've only implemented what we felt was necessary for our metrics stack."
Is there a resource listing what is available, or do I have to run tests / dig into the code to find out?
btw thanks for sharing brubeck!
The spec here -> https://github.com/b/statsd_spec <- indicates that counters are indicated by 'c' and meters by 'm'.
The statsd sampler indicates counters as 'C', meters as 'c', and fails to recognize 'm'.
Is there any reason why rate (or count_ps, as statsd calls it) isn't calculated for timers?