
brubeck's Issues

default config disables brubeck internal stats

brubeck's config.default.json sets expire=5 while the carbon backend's frequency=10. As a result, the internal metrics are expired twice (becoming BRUBECK_EXPIRE_DISABLED) before brubeck_internal_sample has a chance to reset them to active, so the internal metrics are reported only once or twice and then fall silent forever.

README is misleading concerning sample rates

Hey there, Brubeck is pretty awesome! But for a while now we've been under the assumption that it didn't do anything with sample rates due to this line in the readme:

Client-sent sampling rates are ignored.

But after digging into the code to check something else, I noticed that sample rates are supported: when I send something like test:1|c|@0.1, I get a value of 10 on flush. Is this line in the README outdated, or does it mean something else?
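
For context, the usual statsd convention is that a counter received with sample rate r is scaled by 1/r at aggregation time, which matches the 1|c|@0.1 -> 10 behaviour described above. A minimal illustration of that scaling (hypothetical helper, not brubeck's actual code):

#include <stdio.h>

/* Illustrative only: scale a sampled counter the way statsd-style
 * aggregators conventionally do (value / sample_rate). */
static double apply_sample_rate(double value, double sample_rate)
{
    if (sample_rate <= 0.0 || sample_rate > 1.0)
        sample_rate = 1.0;  /* treat a bogus rate as unsampled */
    return value / sample_rate;
}

int main(void)
{
    /* test:1|c|@0.1 -> reported as 10 on flush */
    printf("%g\n", apply_sample_rate(1.0, 0.1));
    return 0;
}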

What is the counter's behavior?

In brubeck's src/metric.c, there is this:

static void
counter__record(struct brubeck_metric *metric, value_t value)
{
    pthread_spin_lock(&metric->lock);
    {
        if (metric->as.counter.previous > 0.0) {
            value_t diff = (value >= metric->as.counter.previous) ?
                (value - metric->as.counter.previous) :
                (value);

            metric->as.counter.value += diff;
        }

        metric->as.counter.previous = value;
    }
    pthread_spin_unlock(&metric->lock);
}

Suppose there are two servers reporting the same counter http_requests. The first server sends 5, and metric->as.counter.value is 5; then the second server sends 10, and metric->as.counter.value is 10. But we expect the counter to be 15. Am I right?
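
To make the question easier to trace, here is a minimal standalone model of counter__record (simplified types, no spinlock, hypothetical names, not brubeck's actual code); running it shows what the delta logic produces when two senders report raw totals to the same key:

#include <stdio.h>

/* Simplified stand-in for the counter state used above (no locking). */
struct counter { double value; double previous; };

static void counter_record(struct counter *c, double v)
{
    if (c->previous > 0.0) {
        double diff = (v >= c->previous) ? (v - c->previous) : v;
        c->value += diff;
    }
    c->previous = v;
}

int main(void)
{
    struct counter http_requests = { 0.0, 0.0 };

    counter_record(&http_requests, 5);   /* first server reports a raw total of 5 */
    counter_record(&http_requests, 10);  /* second server reports a raw total of 10 */

    printf("value=%g previous=%g\n", http_requests.value, http_requests.previous);
    return 0;
}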

http daemon bind doesn't respect socket address from config file

config.json:

{
  ...
  "http" : "127.0.0.1:8080",
  ...
}

Expected: brubeck should bind only to the loopback interface / 127.0.0.1.

Actual: brubeck binds its http listener on all interfaces (see *:8080 below).

[localhost]# ss -tulpn | grep brubeck
udp    UNCONN     0      0      127.0.0.1:8125     *:*     users:(("brubeck",pid=27614,fd=6))
tcp    LISTEN     0      32     *:8080             *:*     users:(("brubeck",pid=27614,fd=7))

A possible fix is to parse the address out of the "http" bind string and pass it along as MHD_OPTION_SOCK_ADDR in brubeck_http_endpoint_init.

See

void brubeck_http_endpoint_init(struct brubeck_server *server, const char *listen)
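
A rough sketch of what that fix could look like, assuming the listen string has already been split into a host and a port (IPv6 and error handling are elided, and handle_request stands in for brubeck's existing MHD handler):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <microhttpd.h>

/* Sketch: bind the libmicrohttpd daemon to a specific address instead of
 * the default INADDR_ANY by passing MHD_OPTION_SOCK_ADDR. */
static struct MHD_Daemon *
start_http_daemon(const char *host, uint16_t port,
                  MHD_AccessHandlerCallback handle_request, void *cls)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, host, &addr.sin_addr) != 1)
        return NULL;  /* not an IPv4 literal; a full fix would use getaddrinfo */

    return MHD_start_daemon(
        MHD_USE_SELECT_INTERNALLY, port,
        NULL, NULL,             /* accept policy: allow all */
        handle_request, cls,
        MHD_OPTION_SOCK_ADDR, (struct sockaddr *)&addr,
        MHD_OPTION_END);
}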

Gauge overflow

Hi
It seems that brubeck overflows when handling gauge values larger than 2^32 (4294967296).

(I did not find a maximum gauge value in the statsd specification:
https://github.com/etsy/statsd/blob/master/docs/metric_types.md,
and in my production environments we often send values larger than 2^32.)

How to reproduce:

  1. Send a metric to brubeck and watch the network packets (I use ngrep):

    echo "complex.delete_me.mem:4294967296|g" | nc -u -q1 127.0.0.1 8126

    U 127.0.0.1:38487 -> 127.0.0.1:8126
    complex.delete_me.mem:4294967296|g

  2. Watch the network traffic from brubeck to storage:

    T 10.0.2.9:53546 -> 10.9.192.2:2003 [AP]
    complex.delete_me.mem 0 1493135074    # brubeck sends 0 instead of 4294967296

My investigation led me to this function:
https://github.com/github/brubeck/blob/master/src/utils.c#L137

Could you provide a fix for that case?
Thanks in advance!
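
I cannot confirm exactly what utils.c does there, but the symptom is consistent with the value passing through a 32-bit integer somewhere along the path: 4294967296 is exactly 2^32, so it wraps to 0. The wraparound is easy to demonstrate:

#include <stdint.h>
#include <stdio.h>

/* Illustrative only: accumulate decimal digits into a 32-bit integer the
 * way a naive atoi-style parser might. 4294967296 == 2^32 wraps to 0. */
static uint32_t parse_u32(const char *s)
{
    uint32_t n = 0;
    while (*s >= '0' && *s <= '9')
        n = n * 10 + (uint32_t)(*s++ - '0');
    return n;
}

int main(void)
{
    printf("%u\n", parse_u32("4294967296"));   /* prints 0 */
    printf("%llu\n", 4294967296ULL);           /* the intended value */
    return 0;
}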

Negative gauge values

Right now sending the following values...

val:-1|g
val:-1|g
val:-1|g

...in a flush cycle will output a final value of -3. But since this is a gauge, it's expected to be -1. This appears to be due to the following code:

https://github.com/github/brubeck/blob/master/src/samplers/statsd.c#L105

https://github.com/github/brubeck/blob/master/src/metric.c#L48

There seem to be some statsd aggregators that support negative gauges and some that do not. What is brubeck's stance, and what would be the recommended way to get this behaviour from brubeck? Maybe a runtime configuration switch to remove relative values from gauges altogether?
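
For context, the usual statsd convention is that a gauge value with an explicit '+' or '-' prefix is a relative adjustment, so "-1" is ambiguous between "set to -1" and "subtract 1". A small illustration of that convention (not brubeck's actual parser) shows how three "-1" samples end up as -3:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: treat a signed gauge value as a relative adjustment,
 * as most statsd implementations do, and an unsigned value as absolute. */
struct gauge { double value; };

static void gauge_record(struct gauge *g, const char *raw)
{
    bool relative = (raw[0] == '+' || raw[0] == '-');
    double v = strtod(raw, NULL);

    if (relative)
        g->value += v;   /* "-1" three times accumulates to -3 */
    else
        g->value = v;    /* an unsigned value replaces the gauge */
}

int main(void)
{
    struct gauge g = { 0.0 };
    gauge_record(&g, "-1");
    gauge_record(&g, "-1");
    gauge_record(&g, "-1");
    printf("%g\n", g.value);   /* -3, matching the behaviour described above */
    return 0;
}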

Cannot use hostnames in config

I'm trying to set up a local cluster using docker-compose, where linked containers should be addressed by hostname. However, it seems that brubeck cannot connect to the containers using hostnames defined in /etc/hosts. I've attached the error from the log, the relevant backends snippet, and the contents of /etc/hosts.

If I use the IP address from /etc/hosts, brubeck connects fine.

Let me know if you need more info :)

EDIT: I can, of course, telnet carbon 2004.
Log:

instance=brubeck_debug backend=carbon event=failed_to_connect errno=101 msg="Network is unreachable"

Config:

  "backends" : [
    {
      "type" : "carbon",
      "address" : "carbon",
      "port" : 2004,
      "frequency" : 10,
      "pickle": true
    }
  ],

/etc/hosts:

$ cat /etc/hosts
172.17.0.155    93c76038bc47
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.150    carbon 087e1b0d7be2 statsdocker_carbon_1
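
This looks like the backend "address" being interpreted only as a numeric IP rather than being resolved as a hostname. A hedged sketch of resolving the configured address with getaddrinfo before connecting (illustrative, not brubeck's actual backend code):

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Sketch: resolve a backend address (IP literal or hostname) and port
 * into a sockaddr that the carbon backend could then connect() to. */
static int resolve_backend(const char *host, const char *port,
                           struct sockaddr_storage *out, socklen_t *out_len)
{
    struct addrinfo hints, *res;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* the carbon backend uses TCP */

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "resolve %s:%s failed: %s\n", host, port, gai_strerror(rc));
        return -1;
    }

    memcpy(out, res->ai_addr, res->ai_addrlen);
    *out_len = res->ai_addrlen;
    freeaddrinfo(res);
    return 0;
}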

Fails to compile with newer library versions (libssl?)

I'm trying to compile brubeck on Debian Sid with the following library versions:

➜  brubeck git:(master) ✗ dpkg -l libmicrohttpd-dev libjansson-dev libssl-dev gcc
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                    Version                  Architecture             Description
+++-=======================================-========================-========================-====================================================================================
ii  gcc                                     4:6.1.1-1                amd64                    GNU C compiler
ii  libjansson-dev:amd64                    2.7-5                    amd64                    C library for encoding, decoding and manipulating JSON data (dev)
ii  libmicrohttpd-dev                       0.9.51-1                 amd64                    library embedding HTTP server functionality (development)
ii  libssl-dev:amd64                        1.1.0b-2                 amd64                    Secure Sockets Layer toolkit - development files

It fails to compile with this error:

➜  brubeck git:(master) ✗ ./script/bootstrap
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/backend.c -o src/backend.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/backends/carbon.c -o src/backends/carbon.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/bloom.c -o src/bloom.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/city.c -o src/city.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/histogram.c -o src/histogram.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/ht.c -o src/ht.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/http.c -o src/http.o
src/http.c: In function ‘expire_metric’:
src/http.c:67:3: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
   return MHD_create_response_from_data(
   ^~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/http.c: In function ‘send_metric’:
src/http.c:100:3: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
   return MHD_create_response_from_data(
   ^~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/http.c: In function ‘send_stats’:
src/http.c:177:2: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
  return MHD_create_response_from_data(
  ^~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/http.c: In function ‘send_ping’:
src/http.c:210:2: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
  return MHD_create_response_from_data(
  ^~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/http.c: In function ‘handle_request’:
src/http.c:244:3: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
   response = MHD_create_response_from_data(
   ^~~~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/internal_sampler.c -o src/internal_sampler.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/log.c -o src/log.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/metric.c -o src/metric.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/sampler.c -o src/sampler.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/samplers/statsd-secure.c -o src/samplers/statsd-secure.o
src/samplers/statsd-secure.c: In function ‘statsd_secure__thread’:
src/samplers/statsd-secure.c:101:11: error: storage size of ‘ctx’ isn’t known
  HMAC_CTX ctx;
           ^~~
src/samplers/statsd-secure.c:111:2: warning: implicit declaration of function ‘HMAC_CTX_init’ [-Wimplicit-function-declaration]
  HMAC_CTX_init(&ctx);
  ^~~~~~~~~~~~~
src/samplers/statsd-secure.c:152:2: warning: implicit declaration of function ‘HMAC_CTX_cleanup’ [-Wimplicit-function-declaration]
  HMAC_CTX_cleanup(&ctx);
  ^~~~~~~~~~~~~~~~
src/samplers/statsd-secure.c:101:11: warning: unused variable ‘ctx’ [-Wunused-variable]
  HMAC_CTX ctx;
           ^~~
Makefile:44: recipe for target 'src/samplers/statsd-secure.o' failed
make: *** [src/samplers/statsd-secure.o] Error 1

I suspect OpenSSL 1.1, but haven't yet looked into it extensively. The program compiles just fine on a machine running Debian 8 with OpenSSL 1.0.1t.
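
For reference, in OpenSSL 1.1.0 HMAC_CTX became an opaque type, so it can no longer be declared on the stack, and HMAC_CTX_init/HMAC_CTX_cleanup were replaced by HMAC_CTX_new/HMAC_CTX_free. A hedged sketch of a version guard that builds against both APIs (not a tested patch for statsd-secure.c):

#include <stdlib.h>
#include <openssl/hmac.h>
#include <openssl/opensslv.h>

/* Sketch: allocate and free an HMAC_CTX in a way that works on both
 * OpenSSL 1.0.x (plain struct) and 1.1.x (opaque type). */
#if OPENSSL_VERSION_NUMBER < 0x10100000L
static HMAC_CTX *hmac_ctx_new(void)
{
    HMAC_CTX *ctx = malloc(sizeof(*ctx));
    if (ctx)
        HMAC_CTX_init(ctx);
    return ctx;
}

static void hmac_ctx_free(HMAC_CTX *ctx)
{
    if (ctx) {
        HMAC_CTX_cleanup(ctx);
        free(ctx);
    }
}
#else
static HMAC_CTX *hmac_ctx_new(void)       { return HMAC_CTX_new(); }
static void hmac_ctx_free(HMAC_CTX *ctx)  { HMAC_CTX_free(ctx); }
#endif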

Single metric still stuck

In addition to a previous change (#38) ... the following are required to avoid the stuck metric ...

diff --git a/src/samplers/statsd.c b/src/samplers/statsd.c
index 62cb33e..4d6c515 100644
--- a/src/samplers/statsd.c
+++ b/src/samplers/statsd.c
@@ -51,9 +51,9 @@ static void statsd_run_recvmmsg(struct brubeck_statsd *statsd, int sock)
                }

                /* store stats */
-               brubeck_atomic_add(&statsd->sampler.inflow, SIM_PACKETS);
+               brubeck_atomic_add(&statsd->sampler.inflow, res);

-               for (i = 0; i < SIM_PACKETS; ++i) {
+               for (i = 0; i < res; ++i) {
                        char *buf = msgs[i].msg_hdr.msg_iov->iov_base;
                        char *end = buf + msgs[i].msg_len;
                        brubeck_statsd_packet_parse(server, buf, end);

These were included in the original patch. Without these additional changes, a single metric in an otherwise idle system is still stuck.

Overflowing pickle write buffer as set in src/backends/carbon.h

The default setting,

#define PICKLE_BUFFER_SIZE 4096

is too small, at least for me: I overflow it. I've rebuilt with a larger buffer (16384) and have had no issues since. The observed symptom when I overflow this buffer is that not all metrics are visible and present in whisper, Graphite, and downstream.

1-byte heap overflow when recvmmsg() is used

statsd_run_recvmsg() subtracts 1 from the buffer size passed to recvfrom() to allow room for the null byte that will be appended by brubeck_statsd_msg_parse(). statsd_run_recvmmsg() doesn't do this, so the null byte is written past the end of the buffer if a MAX_PACKET_SIZE-byte packet is received.
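
A hedged sketch of the corresponding fix, based only on the description above (MAX_PACKET_SIZE, SIM_PACKETS, and the buffer layout are assumptions, not copied from statsd.c): reserve one byte in each iovec so the terminating '\0' written by the parser stays inside the buffer.

#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_PACKET_SIZE 512   /* assumed; the real value lives in the brubeck source */
#define SIM_PACKETS 64        /* assumed batch size */

/* Sketch: set up the recvmmsg() buffers so the NUL terminator appended
 * after parsing cannot land one byte past the end of the buffer. */
static void setup_iovecs(struct mmsghdr *msgs, struct iovec *iovecs,
                         char buffers[SIM_PACKETS][MAX_PACKET_SIZE])
{
    int i;

    memset(msgs, 0, SIM_PACKETS * sizeof(*msgs));
    for (i = 0; i < SIM_PACKETS; ++i) {
        iovecs[i].iov_base = buffers[i];
        iovecs[i].iov_len  = MAX_PACKET_SIZE - 1;   /* leave room for the '\0' */
        msgs[i].msg_hdr.msg_iov = &iovecs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
}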

make test fails?

I'm trying to create a Debian package for this, and by default it runs make test,

which has failing tests, perhaps because of this (from the compile step):

gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"2536347\" -DBRUBECK_HAVE_MICROHTTPD -c src/http.c -o src/http.o
src/http.c: In function ‘expire_metric’:
src/http.c:67:3: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
   return MHD_create_response_from_data(
   ^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^
src/http.c: In function ‘send_metric’:
src/http.c:100:3: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
   return MHD_create_response_from_data(
   ^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^
src/http.c: In function ‘send_stats’:
src/http.c:177:2: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
  return MHD_create_response_from_data(
  ^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^
src/http.c: In function ‘send_ping’:
src/http.c:210:2: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
  return MHD_create_response_from_data(
  ^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^
src/http.c: In function ‘handle_request’:
src/http.c:244:3: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
   response = MHD_create_response_from_data(
   ^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendo

Test output:

== Entering suite #1, "histogram: time/data series aggregation" ==

[1:1]  test_histogram__single_element:#1  "histogram size"  pass
[1:2]  test_histogram__single_element:#2  "histogram value count"  pass
[1:3]  test_histogram__single_element:#3  "sample.min"  pass
[1:4]  test_histogram__single_element:#4  "sample.max"  pass
[1:5]  test_histogram__single_element:#5  "sample.percentile[3]"  pass
[1:6]  test_histogram__single_element:#6  "sample.mean"  pass
[1:7]  test_histogram__single_element:#7  "sample.count"  pass
[1:8]  test_histogram__single_element:#8  "sample.sum"  pass
[1:9]  test_histogram__large_range:#1  "sample.min"  pass
[1:10]  test_histogram__large_range:#2  "sample.max"  pass
[1:11]  test_histogram__large_range:#3  "sample.median"  pass
[1:12]  test_histogram__multisamples:#1  "histogram size"  pass
[1:13]  test_histogram__multisamples:#2  "histogram value count"  pass
[1:14]  test_histogram__multisamples:#3  "sample.min"  pass
[1:15]  test_histogram__multisamples:#4  "sample.max"  pass
[1:16]  test_histogram__multisamples:#5  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:17]  test_histogram__multisamples:#6  "sample.mean"  pass
[1:18]  test_histogram__multisamples:#7  "sample.count"  pass
[1:19]  test_histogram__multisamples:#8  "sample.sum"  pass
[1:20]  test_histogram__multisamples:#9  "histogram size"  pass
[1:21]  test_histogram__multisamples:#10  "histogram value count"  pass
[1:22]  test_histogram__multisamples:#11  "sample.min"  pass
[1:23]  test_histogram__multisamples:#12  "sample.max"  pass
[1:24]  test_histogram__multisamples:#13  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:25]  test_histogram__multisamples:#14  "sample.mean"  pass
[1:26]  test_histogram__multisamples:#15  "sample.count"  pass
[1:27]  test_histogram__multisamples:#16  "sample.sum"  pass
[1:28]  test_histogram__multisamples:#17  "histogram size"  pass
[1:29]  test_histogram__multisamples:#18  "histogram value count"  pass
[1:30]  test_histogram__multisamples:#19  "sample.min"  pass
[1:31]  test_histogram__multisamples:#20  "sample.max"  pass
[1:32]  test_histogram__multisamples:#21  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:33]  test_histogram__multisamples:#22  "sample.mean"  pass
[1:34]  test_histogram__multisamples:#23  "sample.count"  pass
[1:35]  test_histogram__multisamples:#24  "sample.sum"  pass
[1:36]  test_histogram__multisamples:#25  "histogram size"  pass
[1:37]  test_histogram__multisamples:#26  "histogram value count"  pass
[1:38]  test_histogram__multisamples:#27  "sample.min"  pass
[1:39]  test_histogram__multisamples:#28  "sample.max"  pass
[1:40]  test_histogram__multisamples:#29  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:41]  test_histogram__multisamples:#30  "sample.mean"  pass
[1:42]  test_histogram__multisamples:#31  "sample.count"  pass
[1:43]  test_histogram__multisamples:#32  "sample.sum"  pass
[1:44]  test_histogram__multisamples:#33  "histogram size"  pass
[1:45]  test_histogram__multisamples:#34  "histogram value count"  pass
[1:46]  test_histogram__multisamples:#35  "sample.min"  pass
[1:47]  test_histogram__multisamples:#36  "sample.max"  pass
[1:48]  test_histogram__multisamples:#37  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:49]  test_histogram__multisamples:#38  "sample.mean"  pass
[1:50]  test_histogram__multisamples:#39  "sample.count"  pass
[1:51]  test_histogram__multisamples:#40  "sample.sum"  pass
[1:52]  test_histogram__multisamples:#41  "histogram size"  pass
[1:53]  test_histogram__multisamples:#42  "histogram value count"  pass
[1:54]  test_histogram__multisamples:#43  "sample.min"  pass
[1:55]  test_histogram__multisamples:#44  "sample.max"  pass
[1:56]  test_histogram__multisamples:#45  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:57]  test_histogram__multisamples:#46  "sample.mean"  pass
[1:58]  test_histogram__multisamples:#47  "sample.count"  pass
[1:59]  test_histogram__multisamples:#48  "sample.sum"  pass
[1:60]  test_histogram__multisamples:#49  "histogram size"  pass
[1:61]  test_histogram__multisamples:#50  "histogram value count"  pass
[1:62]  test_histogram__multisamples:#51  "sample.min"  pass
[1:63]  test_histogram__multisamples:#52  "sample.max"  pass
[1:64]  test_histogram__multisamples:#53  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:65]  test_histogram__multisamples:#54  "sample.mean"  pass
[1:66]  test_histogram__multisamples:#55  "sample.count"  pass
[1:67]  test_histogram__multisamples:#56  "sample.sum"  pass
[1:68]  test_histogram__multisamples:#57  "histogram size"  pass
[1:69]  test_histogram__multisamples:#58  "histogram value count"  pass
[1:70]  test_histogram__multisamples:#59  "sample.min"  pass
[1:71]  test_histogram__multisamples:#60  "sample.max"  pass
[1:72]  test_histogram__multisamples:#61  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:73]  test_histogram__multisamples:#62  "sample.mean"  pass
[1:74]  test_histogram__multisamples:#63  "sample.count"  pass
[1:75]  test_histogram__multisamples:#64  "sample.sum"  pass
[1:76]  test_histogram__with_sample_rate:#1  "histogram size"  pass
[1:77]  test_histogram__with_sample_rate:#2  "histogram value count"  pass
[1:78]  test_histogram__with_sample_rate:#3  "sample.min"  pass
[1:79]  test_histogram__with_sample_rate:#4  "sample.max"  pass
[1:80]  test_histogram__with_sample_rate:#5  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      130
[1:81]  test_histogram__with_sample_rate:#6  "sample.mean"  pass
[1:82]  test_histogram__with_sample_rate:#7  "sample.count"  pass
[1:83]  test_histogram__with_sample_rate:#8  "sample.sum"  pass
[1:84]  test_histogram__capacity:#1  "histogram size"  pass
[1:85]  test_histogram__capacity:#2  "histogram value count"  pass
[1:86]  test_histogram__capacity:#3  "sample.min"  pass
[1:87]  test_histogram__capacity:#4  "sample.max"  pass
[1:88]  test_histogram__capacity:#5  "sample.count"  pass
[1:89]  test_histogram__capacity:#6  "histogram size"  pass
[1:90]  test_histogram__capacity:#7  "histogram value count"  pass
[1:91]  test_histogram__capacity:#8  "sample.min"  pass
[1:92]  test_histogram__capacity:#9  "sample.max"  pass
[1:93]  test_histogram__capacity:#10  "sample.count"  pass

--> 93 check(s), 84 ok, 9 failed (9.68%)

== Entering suite #2, "mstore: concurrency test for metrics hash table" ==

[2:1]  test_mstore__save:#1  "stored 15000 metrics in table"  pass
[2:2]  test_mstore__save:#2  "lookup all metrics from table"  pass

--> 2 check(s), 2 ok, 0 failed (0.00%)

== Entering suite #3, "atomic: atomic primitives" ==

[3:1]  test_atomic_spinlocks:#1  "spinlock doesn't race"  pass

--> 1 check(s), 1 ok, 0 failed (0.00%)

== Entering suite #4, "ftoa: double-to-string conversion" ==

[4:1]  test_ftoa:#1  "0"  pass
[4:2]  test_ftoa:#2  "15"  pass
[4:3]  test_ftoa:#3  "15.5"  pass
[4:4]  test_ftoa:#4  "15.505"  pass
[4:5]  test_ftoa:#5  "0.125"  pass
[4:6]  test_ftoa:#6  "1234.567"  pass
[4:7]  test_ftoa:#7  "100000"  pass
[4:8]  test_ftoa:#8  "0.999"  pass

--> 8 check(s), 8 ok, 0 failed (0.00%)

== Entering suite #5, "statsd: packet parsing" ==

[5:1]  test_statsd_msg__parse_strings:#1  "github.auth.fingerprint.sha1:1|c"  pass
[5:2]  test_statsd_msg__parse_strings:#2  "msg.value == expected"  pass
[5:3]  test_statsd_msg__parse_strings:#3  "msg.sample_rate == expected"  pass
[5:4]  test_statsd_msg__parse_strings:#4  "msg.modifiers == expected"  pass
[5:5]  test_statsd_msg__parse_strings:#5  "github.auth.fingerprint.sha1:1|c|@0.1"  pass
[5:6]  test_statsd_msg__parse_strings:#6  "msg.value == expected"  pass
[5:7]  test_statsd_msg__parse_strings:#7  "msg.sample_rate == expected"  pass
[5:8]  test_statsd_msg__parse_strings:#8  "msg.modifiers == expected"  pass
[5:9]  test_statsd_msg__parse_strings:#9  "github.auth.fingerprint.sha1:1|g"  pass
[5:10]  test_statsd_msg__parse_strings:#10  "msg.value == expected"  pass
[5:11]  test_statsd_msg__parse_strings:#11  "msg.sample_rate == expected"  pass
[5:12]  test_statsd_msg__parse_strings:#12  "msg.modifiers == expected"  pass
[5:13]  test_statsd_msg__parse_strings:#13  "lol:1|ms"  pass
[5:14]  test_statsd_msg__parse_strings:#14  "msg.value == expected"  pass
[5:15]  test_statsd_msg__parse_strings:#15  "msg.sample_rate == expected"  pass
[5:16]  test_statsd_msg__parse_strings:#16  "msg.modifiers == expected"  pass
[5:17]  test_statsd_msg__parse_strings:#17  "this.is.sparta:199812|C"  pass
[5:18]  test_statsd_msg__parse_strings:#18  "msg.value == expected"  pass
[5:19]  test_statsd_msg__parse_strings:#19  "msg.sample_rate == expected"  pass
[5:20]  test_statsd_msg__parse_strings:#20  "msg.modifiers == expected"  pass
[5:21]  test_statsd_msg__parse_strings:#21  "this.is.sparta:0012|h"  pass
[5:22]  test_statsd_msg__parse_strings:#22  "msg.value == expected"  pass
[5:23]  test_statsd_msg__parse_strings:#23  "msg.sample_rate == expected"  pass
[5:24]  test_statsd_msg__parse_strings:#24  "msg.modifiers == expected"  pass
[5:25]  test_statsd_msg__parse_strings:#25  "this.is.sparta:23.23|g"  pass
[5:26]  test_statsd_msg__parse_strings:#26  "msg.value == expected"  pass
[5:27]  test_statsd_msg__parse_strings:#27  "msg.sample_rate == expected"  pass
[5:28]  test_statsd_msg__parse_strings:#28  "msg.modifiers == expected"  pass
[5:29]  test_statsd_msg__parse_strings:#29  "this.is.sparta:0.232030|g"  pass
[5:30]  test_statsd_msg__parse_strings:#30  "msg.value == expected"  pass
[5:31]  test_statsd_msg__parse_strings:#31  "msg.sample_rate == expected"  pass
[5:32]  test_statsd_msg__parse_strings:#32  "msg.modifiers == expected"  pass
[5:33]  test_statsd_msg__parse_strings:#33  "this.are.some.floats:1234567.89|g"  pass
[5:34]  test_statsd_msg__parse_strings:#34  "msg.value == expected"  pass
[5:35]  test_statsd_msg__parse_strings:#35  "msg.sample_rate == expected"  pass
[5:36]  test_statsd_msg__parse_strings:#36  "msg.modifiers == expected"  pass
[5:37]  test_statsd_msg__parse_strings:#37  "this.are.some.floats:1234567.89|g|@0.025"  pass
[5:38]  test_statsd_msg__parse_strings:#38  "msg.value == expected"  pass
[5:39]  test_statsd_msg__parse_strings:#39  "msg.sample_rate == expected"  pass
[5:40]  test_statsd_msg__parse_strings:#40  "msg.modifiers == expected"  pass
[5:41]  test_statsd_msg__parse_strings:#41  "this.are.some.floats:1234567.89|g|@0.25"  pass
[5:42]  test_statsd_msg__parse_strings:#42  "msg.value == expected"  pass
[5:43]  test_statsd_msg__parse_strings:#43  "msg.sample_rate == expected"  pass
[5:44]  test_statsd_msg__parse_strings:#44  "msg.modifiers == expected"  pass
[5:45]  test_statsd_msg__parse_strings:#45  "this.are.some.floats:1234567.89|g|@0.01"  pass
[5:46]  test_statsd_msg__parse_strings:#46  "msg.value == expected"  pass
[5:47]  test_statsd_msg__parse_strings:#47  "msg.sample_rate == expected"  pass
[5:48]  test_statsd_msg__parse_strings:#48  "msg.modifiers == expected"  pass
[5:49]  test_statsd_msg__parse_strings:#49  "this.are.some.floats:1234567.89|g|@000.0100"  pass
[5:50]  test_statsd_msg__parse_strings:#50  "msg.value == expected"  pass
[5:51]  test_statsd_msg__parse_strings:#51  "msg.sample_rate == expected"  pass
[5:52]  test_statsd_msg__parse_strings:#52  "msg.modifiers == expected"  pass
[5:53]  test_statsd_msg__parse_strings:#53  "this.are.some.floats:1234567.89|g|@1.0"  pass
[5:54]  test_statsd_msg__parse_strings:#54  "msg.value == expected"  pass
[5:55]  test_statsd_msg__parse_strings:#55  "msg.sample_rate == expected"  pass
[5:56]  test_statsd_msg__parse_strings:#56  "msg.modifiers == expected"  pass
[5:57]  test_statsd_msg__parse_strings:#57  "this.are.some.floats:1234567.89|g|@1"  pass
[5:58]  test_statsd_msg__parse_strings:#58  "msg.value == expected"  pass
[5:59]  test_statsd_msg__parse_strings:#59  "msg.sample_rate == expected"  pass
[5:60]  test_statsd_msg__parse_strings:#60  "msg.modifiers == expected"  pass
[5:61]  test_statsd_msg__parse_strings:#61  "this.are.some.floats:1234567.89|g|@1."  pass
[5:62]  test_statsd_msg__parse_strings:#62  "msg.value == expected"  pass
[5:63]  test_statsd_msg__parse_strings:#63  "msg.sample_rate == expected"  pass
[5:64]  test_statsd_msg__parse_strings:#64  "msg.modifiers == expected"  pass
[5:65]  test_statsd_msg__parse_strings:#65  "this.are.some.floats:|g"  pass
[5:66]  test_statsd_msg__parse_strings:#66  "msg.value == expected"  pass
[5:67]  test_statsd_msg__parse_strings:#67  "msg.sample_rate == expected"  pass
[5:68]  test_statsd_msg__parse_strings:#68  "msg.modifiers == expected"  pass
[5:69]  test_statsd_msg__parse_strings:#69  "this.are.some.floats:1234567.89|g"  pass
[5:70]  test_statsd_msg__parse_strings:#70  "msg.value == expected"  pass
[5:71]  test_statsd_msg__parse_strings:#71  "msg.sample_rate == expected"  pass
[5:72]  test_statsd_msg__parse_strings:#72  "msg.modifiers == expected"  pass
[5:73]  test_statsd_msg__parse_strings:#73  "gauge.increment:+1|g"  pass
[5:74]  test_statsd_msg__parse_strings:#74  "msg.value == expected"  pass
[5:75]  test_statsd_msg__parse_strings:#75  "msg.sample_rate == expected"  pass
[5:76]  test_statsd_msg__parse_strings:#76  "msg.modifiers == expected"  pass
[5:77]  test_statsd_msg__parse_strings:#77  "gauge.decrement:-1|g"  pass
[5:78]  test_statsd_msg__parse_strings:#78  "msg.value == expected"  pass
[5:79]  test_statsd_msg__parse_strings:#79  "msg.sample_rate == expected"  pass
[5:80]  test_statsd_msg__parse_strings:#80  "msg.modifiers == expected"  pass
[5:81]  test_statsd_msg__parse_strings:#81  "this.are.some.floats:12.89.23|g"  pass
[5:82]  test_statsd_msg__parse_strings:#82  "this.are.some.floats:12.89|a"  pass
[5:83]  test_statsd_msg__parse_strings:#83  "this.are.some.floats:12.89|msdos"  pass
[5:84]  test_statsd_msg__parse_strings:#84  "this.are.some.floats:12.89g|g"  pass
[5:85]  test_statsd_msg__parse_strings:#85  "this.are.some.floats:12.89|"  pass
[5:86]  test_statsd_msg__parse_strings:#86  "this.are.some.floats:12.89"  pass
[5:87]  test_statsd_msg__parse_strings:#87  "this.are.some.floats:12.89 |g"  pass
[5:88]  test_statsd_msg__parse_strings:#88  "this.are.some.floats|g"  pass
[5:89]  test_statsd_msg__parse_strings:#89  "this.are.some.floats:1.0|g|1.0"  pass
[5:90]  test_statsd_msg__parse_strings:#90  "this.are.some.floats:1.0|g|0.1"  pass
[5:91]  test_statsd_msg__parse_strings:#91  "this.are.some.floats:1.0|g|@0.1.1"  pass
[5:92]  test_statsd_msg__parse_strings:#92  "this.are.some.floats:1.0|g|@0.1@"  pass
[5:93]  test_statsd_msg__parse_strings:#93  "this.are.some.floats:1.0|g|@0.1125.2"  pass
[5:94]  test_statsd_msg__parse_strings:#94  "this.are.some.floats:1.0|g|@0.1125.2"  pass
[5:95]  test_statsd_msg__parse_strings:#95  "this.are.some.floats:1.0|g|@1.23"  pass
[5:96]  test_statsd_msg__parse_strings:#96  "this.are.some.floats:1.0|g|@3.0"  pass
[5:97]  test_statsd_msg__parse_strings:#97  "this.are.some.floats:1.0|g|@-3.0"  pass
[5:98]  test_statsd_msg__parse_strings:#98  "this.are.some.floats:1.0|g|@-1.0"  pass
[5:99]  test_statsd_msg__parse_strings:#99  "this.are.some.floats:1.0|g|@-0.23"  pass
[5:100]  test_statsd_msg__parse_strings:#100  "this.are.some.floats:1.0|g|@0.0"  pass
[5:101]  test_statsd_msg__parse_strings:#101  "this.are.some.floats:1.0|g|@0"  pass

--> 101 check(s), 101 ok, 0 failed (0.00%)

==> 205 check(s) in 5 suite(s) finished after 0.00 second(s),
    196 succeeded, 9 failed (4.39%)

[FAILURE]
Makefile:56: recipe for target 'test' failed
make: *** [test] Error 1

Expiry alternative

Expiry can cause a metric that has not reported in an interval to push a zero value to Graphite whenever the total time to expiry/DISABLED is greater than the sampling interval. In this situation the metric has not reported and has no value, yet a zero value is pushed to Graphite.

For counters and meters, this implies that a metric reported a zero value when it did not report at all, or that the aggregated total of an incrementer/decrementer is zero when in fact it did not report. I haven't checked whether histograms/timers also report zero values, but I'm guessing they might.

There is also the overhead of walking the list of metrics to expire them.

Would it not be more accurate, and more efficient, to mark a metric as ACTIVE when recorded and DISABLED when sampled? That way, only metrics that have reported during the sample period would push a value to Graphite, and the zeros that represent non-reports would not get pushed. A sketch of this idea follows.
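
A hedged sketch of the proposed behaviour (hypothetical names and flags, not brubeck's actual state machine): the record path marks the metric active, and the flush path reports only metrics that were marked active, clearing the flag afterwards.

#include <stdbool.h>

/* Hypothetical per-metric flag: set when a sample is recorded,
 * cleared when the metric is flushed to the backend. */
struct metric_state {
    bool active;
    double value;
};

static void on_record(struct metric_state *m, double v)
{
    m->value = v;      /* aggregation elided */
    m->active = true;  /* the metric reported during this interval */
}

static bool on_flush(struct metric_state *m, double *out)
{
    if (!m->active)
        return false;  /* nothing reported: push nothing, rather than a zero */
    *out = m->value;
    m->active = false; /* disabled again until the next record */
    return true;
}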

Any reason why bad key logging was removed?

PR #24 added functionality to log bad keys sent to the StatsD sampler, but commit 1a0b863 removed that code and reverted Brubeck to logging packet_drop messages again. Was this done for performance or other reasons, or is it something I could submit a PR for?

Hourly metric emitted in following period

I don't have a reproduction recipe for this yet; however, the situation is a counter that emits hourly, just once. Several times in a 24-hour period, the metric for a given hour is emitted in the following hour.

This yields an hour without a metric, and an hour with a count of 2.0. I expect every hour to have a count of 1.0.

(Not my idea by the way to have an hourly metric, but it is what it is ...)

The Graphite project indicates that it honors the timestamp of the inbound metric. This suggests that the two metrics from brubeck are emitted with timestamps in the same (in this case 10-second) bucket.

I can confirm that the metrics are sent at the proper time and received by brubeck at that time: I've added telemetry that logs metric receipt by the statsd sampler (with a timestamp, matched by regex), and it confirms that the metrics in question were received at the appropriate time.

My reporting interval is 10 seconds and the expiry 7 seconds; I've varied the expiry up to 20 seconds without effect.

I will post more information as I have it.

Statsd server uses \n as separator for multiple metrics

The official statsd node client uses \n as separator for multiple metrics in a single packet.

https://github.com/etsy/statsd/blob/master/docs/server.md
"Multiple metrics can be received in a single packet if separated by the \n character."

Update:
A simple python script to test this:

import socket
addr = (socket.gethostbyname('localhost'), 8126)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
print sock.sendto('foo.baz:1|c\nfoo.def:1|c\n', addr)

This gives me
instance=brubeck_debug sampler=statsd event=bad_key key='foo.baz:1|c
foo.def:1|c
' from=127.0.0.1

If I use "\0" or "\n\0" as the separator I don't get the error, but only the first metric is sent to carbon.

Is it possible to support multiple metrics in one packet?
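
A hedged sketch of what multi-metric support could look like on the receive path (parse_one_metric is a hypothetical stand-in for brubeck's existing single-metric parser):

#include <string.h>

/* Sketch: split a received datagram on '\n' and hand each line to the
 * single-metric parser. */
void parse_one_metric(char *start, char *end);

static void parse_packet(char *buf, char *end)
{
    while (buf < end) {
        char *nl = memchr(buf, '\n', (size_t)(end - buf));
        char *line_end = nl ? nl : end;

        if (line_end > buf)                 /* skip empty lines */
            parse_one_metric(buf, line_end);
        buf = line_end + 1;
    }
}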

wrong error checks for recvfrom and recvmmsg

While testing, I found that metrics exactly 11 or 4 characters long are dropped because of this check (line 88+ in statsd.c):

int res = recvfrom(sock, buffer,
        sizeof(buffer) - 1, 0,
        (struct sockaddr *)&reporter, &reporter_len);
if (res == EAGAIN || res == EINTR || res == 0)
    continue;

res is either the message length or -1 on error. EAGAIN == 11 and EINTR == 4, so messages of length 4 or 11 are dropped. This check should be against res == -1 (with the error codes read from errno), or replaced with the check further down (line 99 in statsd.c), which checks res < 0 and marks the message dropped.

I guess messages with length 4 or 11 are unlikely but maybe someone uses them.
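
A hedged sketch of the corrected check, consistent with the recvfrom(2) man page (EAGAIN and EINTR are delivered through errno when the call returns -1, never as positive lengths); MAX_PACKET_SIZE and the loop structure are assumptions, not copied from statsd.c:

#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_PACKET_SIZE 512   /* assumed; the real size lives in the brubeck source */

/* Sketch of the receive loop with the corrected error handling. */
static void recv_loop(int sock)
{
    char buffer[MAX_PACKET_SIZE];
    struct sockaddr_storage reporter;
    socklen_t reporter_len;

    for (;;) {
        reporter_len = sizeof(reporter);
        ssize_t res = recvfrom(sock, buffer, sizeof(buffer) - 1, 0,
                               (struct sockaddr *)&reporter, &reporter_len);
        if (res < 0) {
            if (errno == EAGAIN || errno == EINTR)
                continue;       /* transient: try again */
            break;              /* real error: count the packet as dropped */
        }
        if (res == 0)
            continue;           /* empty datagram: nothing to parse */

        buffer[res] = '\0';
        /* ... hand the buffer off to the statsd parser here ... */
    }
}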

Lone metrics stuck in rcvmmsg call

In the brubeck source, at /brubeck-master/src/samplers/statsd.c, line 37, the recvmmsg call should have the MSG_WAITFORONE flag set in the 4th argument, which is currently zero. With this flag NOT set, a single UDP message is not received: without a timeout, the call blocks indefinitely, waiting for more than one message. This can result in the loss of lone messages, such as sparse calls with a single metric.

Most of the time the metrics flow is significant, these single messages are flushed along with additional messages, and recvmmsg works. If, however, only a single metric is sent, the call blocks indefinitely, leaving the message stranded.

This can be reproduced by sending a single metric to a running instance of brubeck: the single metric is never emitted to Graphite over the line or pickle protocol. When the MSG_WAITFORONE flag is added, the metric flows through as expected.

The workaround is to set the "multimsg" setting in /etc/brubeck/config.json to 1 and restart brubeck. This deactivates the recvmmsg call detailed above, falling back to recvmsg. It's not clear whether this has any performance implication with multiple worker threads in the receive pool; however, it seems likely that there will be some impact on a busy metrics flow.

I have built and tested with the flag above, and it works fine. (I also have a Mac build for anyone who wants it).
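
A hedged sketch of the suggested change (msgs and SIM_PACKETS stand in for the existing batch buffers in statsd.c):

#define _GNU_SOURCE
#include <sys/socket.h>

/* Sketch: pass MSG_WAITFORONE so recvmmsg() returns as soon as at least
 * one datagram is available, instead of blocking until the whole batch
 * has arrived. The flags argument was 0 before. */
static int receive_batch(int sock, struct mmsghdr *msgs, unsigned int batch)
{
    return recvmmsg(sock, msgs, batch, MSG_WAITFORONE, NULL);
}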

Features documentation

Hello,

"Brubeck is missing many of the features of the original StatsD. We've only implemented what we felt was necessary for our metrics stack."

Is there a resource listing what is available, or do I have to run tests / dig into the code to find out?

btw thanks for sharing brubeck!
