
brubeck's Introduction

Brubeck (unmaintained)

Brubeck is a statsd-compatible stats aggregator written in C. Brubeck is currently unmaintained.

List of known maintained forks

What is statsd?

Statsd is a metrics aggregator for Graphite (and other data storage backends). This technical documentation assumes working knowledge of what statsd is and how it works; please read the statsd documentation for more details.

Statsd is a good idea, and if you're using Graphite for metrics collection in your infrastructure, you probably want a statsd-compatible aggregator in front of it.

Tradeoffs

  • Brubeck is missing many of the features of the original StatsD. We've only implemented what we felt was necessary for our metrics stack.

  • Brubeck only runs on Linux. It won't even build on Mac OS X.

  • Some of the performance features require a (moderately) recent version of the kernel that you may not have.

Building

Brubeck has the following dependencies:

  • A Turing-complete computing device running a modern version of the Linux kernel (at least 2.6.33, which added support for the recvmmsg syscall)

  • A compiler for the C programming language

  • Jansson (libjansson-dev on Debian) to load the configuration (version 2.5+ is required)

  • OpenSSL (libcrypto) if you're building StatsD-Secure support

  • libmicrohttpd (libmicrohttpd-dev) to have an internal HTTP stats endpoint. Build with BRUBECK_NO_HTTP to disable this.

Build brubeck by typing:

./script/bootstrap

Other operating systems or kernels may be able to build Brubeck as well. More specifically, Brubeck has been seen working under FreeBSD and OpenBSD, but those platforms are not supported.

Supported Metric Types

Brubeck supports most of the metric types from statsd and many other implementations.

  • g - Gauges
  • c - Meters
  • C - Counters
  • h - Histograms
  • ms - Timers (in milliseconds)

Client-sent sampling rates are ignored.

Visit the statsd docs for more information on metric types.
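
For reference, each metric travels on the wire as a single line of the form name:value|type, with an optional |@rate suffix. The metric names below are purely illustrative:

myapp.logins:1|c
myapp.queue.jobs:1024|C
myapp.memory.rss:512.5|g
myapp.request.time:320|ms
myapp.payload.size:4096|h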

Interfacing

There are several ways to interact with a running Brubeck daemon.

Signals

Brubeck answers to the following signals:

  • SIGINT, SIGTERM: shutdown cleanly
  • SIGHUP: reopen the log files (in case you're using logrotate or an equivalent)
  • SIGUSR2: dump a newline-separated list of all the metrics currently aggregated by the daemon and their types.
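
For example, assuming the daemon is running under its default binary name, a metrics dump can be triggered with:

kill -USR2 $(pidof brubeck)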

HTTP Endpoint

If enabled in the config file, Brubeck can provide an HTTP API to poll its status. The following routes are available:

  • GET /ping: return a short JSON payload with the current status of the daemon (just to check it's up)
  • GET /stats: get a large JSON payload with full statistics, including active endpoints and throughputs
  • GET /metric/{{metric_name}}: get the current status of a metric, if it's being aggregated
  • POST /expire/{{metric_name}}: expire a metric that is no longer being reported to stop it from being aggregated to the backend
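
For example, assuming the http setting in the configuration points the API at 127.0.0.1:8080 (see Configuration below), the endpoints can be exercised with curl; the metric name here is just a placeholder:

curl http://127.0.0.1:8080/ping
curl http://127.0.0.1:8080/stats
curl http://127.0.0.1:8080/metric/myapp.logins
curl -X POST http://127.0.0.1:8080/expire/myapp.logins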

Configuration

The configuration for Brubeck is loaded from a JSON file, passed on the command line.

./brubeck --config=my.config.json

If no configuration file is passed to the daemon, it will load config.default.json, which contains useful defaults for local development/testing.

The JSON file can contain the following sections; a complete example configuration is shown at the end of this section:

  • server_name: a string identifying the name for this specific Brubeck instance. This will be used by the daemon when reporting its internal metrics.

  • dumpfile: a path where to store the metrics list when triggering a dump (see the section on Interfacing with the daemon)

  • http: if present, this string sets the listen address and port for the HTTP API

  • backends: an array of the different backends to load. If more than one backend is loaded, brubeck will function in sharding mode, distributing the aggregation load evenly across the backends through consistent hashing.

    • carbon: a backend that aggregates data into a Carbon cache. The backend sends all the aggregated data once every frequency seconds. By default the data is sent to port 2003 of the Carbon cache (plain-text protocol), but the pickle wire protocol can be enabled by setting pickle to true and changing the port accordingly.

      {
        "type" : "carbon",
        "address" : "0.0.0.0",
        "port" : 2003,
        "frequency" : 10,
        "pickle: true
      }
      

      We strongly encourage you to use the pickle wire protocol instead of plaintext, because carbon-relay.py is not very performant and will choke when parsing plaintext under enough load. Pickles are much softer CPU-wise on the Carbon relays, aggregators and caches.

      Hmmmm pickles. Now I'm hungry. Lincoln when's lunch?

  • samplers: an array of the different samplers to load. Samplers run in parallel and gather incoming metrics from the network.

    • statsd: the default statsd-compatible sampler. It listens on a UDP port for metrics packets. You can have more than one statsd sampler on the same daemon, but Brubeck was designed to support a single sampler taking the full metrics load on a single port.

      {
        "type" : "statsd",
        "address" : "0.0.0.0",
        "port" : 8126,
      }
      

      The StatsD sampler has the following options (and default values) for performance tuning:

      • "workers" : 4 number of worker threads that will service the StatsD socket endpoint. More threads means emptying the socket faster, but the context switching and cache smashing will affect performance. In general, you can saturate your NIC as long as you have enough worker threads (one per core) and a fast enough CPU. Set this to 1 if you want to run the daemon in event-loop mode. But that'd be silly. This is not Node.

      • "multisock" : false if set to true, Brubeck will use the SO_REUSEPORT flag available since Linux 3.9 to create one socket per worker thread and bind it to the same address/port. The kernel will then round-robin between the threads without forcing them to race for the socket. This improves performance by up to 30%, try benchmarking this if your Kernel is recent enough.

      • "multimsg" : 1 if set to greater than one, Brubeck will use the recvmmsg syscall (available since Linux 2.6.33) to read several UDP packets (the specified amount) in a single call and reduce the amount of context switches. This doesn't improve performance much with several worker threads, but may have an effect in a limited configuration with only one thread. Make it a power of two for better results. As always, benchmark. YMMV.

    • statsd-secure: like StatsD, but each packet carries an HMAC that verifies its integrity. This is hella useful if you're running infrastructure in The Cloud (TM) (C) and you want to send packets back to your VPN without them being tampered with by third parties.

      {
        "type" : "statsd-secure",
        "address" : "0.0.0.0",
        "port" : 9126,
        "max_drift" : 3,
        "hmac_key" : "750c783e6ab0b503eaa86e310a5db738",
        "replay_len" : 8000
      }
      

      The address and port parts are obviously the same as in statsd.

      • max_drift defines the maximum time (in seconds) that packets can be delayed since they were sent from the origin. All metrics come with a timestamp, so metrics that drift more than this value will be silently discarded.

      • hmac_key is the shared HMAC secret. The client sending the metrics must also know this in order to sign them.

      • replay_len is the size of the bloom filter that will be used to prevent replay attacks. We use a rolling bloom filter (one for every drift second), so replay_len should roughly be the amount of unique metrics you expect to receive in a 1s interval.

      NOTE: StatsD-secure doesn't run with multiple worker threads because verifying signatures is already slow enough. Don't use this in performance critical scenarios.

      NOTE: StatsD-secure uses a bloom filter to prevent replay attacks, so a small percentage of metrics will be dropped because of false positives. Take this into consideration.

      NOTE: An HMAC does not encrypt the packets, it just verifies their integrity. If you need to protect the content of the packets from eavesdropping, get those external machines in your VPN.

      NOTE: StatsD-secure may or may not be a good idea. If you have the chance to send all your metrics inside a VPN, I suggest you do that instead.
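
As referenced above, here is a complete example configuration that puts the documented sections together. The names, addresses, ports and tuning values are illustrative, not defaults:

{
  "server_name" : "brubeck_example",
  "dumpfile" : "./brubeck.dump",
  "http" : "127.0.0.1:8080",
  "backends" : [
    {
      "type" : "carbon",
      "address" : "127.0.0.1",
      "port" : 2003,
      "frequency" : 10
    }
  ],
  "samplers" : [
    {
      "type" : "statsd",
      "address" : "0.0.0.0",
      "port" : 8126,
      "workers" : 4,
      "multisock" : true,
      "multimsg" : 8
    }
  ]
}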

Testing

There are some tests in the test folder for key parts of the system (such as packet parsing and all concurrent data access); besides that, we test the behavior of the daemon live on staging and production systems.

  • Small changes are deployed into production as-is, straight from their feature branch. Deployment happens in 3 seconds for all the Brubeck instances in our infrastructure, so we can roll back into the master branch immediately if something fails.

  • For critical changes, we multiplex a copy of the metrics stream into a Unix domain socket, so we can have two instances of the daemon (old and new) aggregating to the production cluster and a staging cluster, and verify that the metrics flowing into the two clusters are equivalent.

  • Benchmarking is performed on real hardware in our datacenter. The daemon is spammed with fake metrics across the network and we ensure that there are no regressions (particularly in the linear scaling between cores for the statsd sampler).

When in doubt, please refer to the part of the MIT license that says "THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED". We use Brubeck in production and have been doing so for years, but we cannot make any promises regarding availability or performance.

FAQ

  • I cannot hit 4 million UDP metrics per second. I want my money back.

Make sure receive-side scaling (RSS) is properly configured in your kernel, that IRQs are being serviced by different cores, and that the daemon's threads are not pinned to a specific core. Make sure you're running the daemon on a physical machine and not a cheap cloud VPS. Make sure your NIC has the right drivers and isn't bottlenecking. Install a newer kernel and try running with SO_REUSEPORT.

If nothing works, refunds are available upon request. Just get mad at me on Twitter.

brubeck's People

Contributors

alindeman, carlosmn, goir, haneefmubarak, iamfletch, ionicabizau, joeshaw, linkslice, mhr3, mikemcquaid, vmg


brubeck's Issues

README is misleading concerning sample rates

Hey there, Brubeck is pretty awesome! But for a while now we've been under the assumption that it didn't do anything with sample rates due to this line in the readme:

Client-sent sampling rates are ignored.

But after digging in the code to check something else, I noticed sample rates are supported when I send something like test:1|c|@0.1 and get a value of 10 on flush. Is this line in the README outdated or does it mean something else?

default config disables brubeck internal stats

brubeck's config.default.json has expire=5 and the carbon backend's frequency=10. This leads to the brubeck internal metrics being expired twice (and becoming BRUBECK_EXPIRE_DISABLED) before brubeck_internal_sample has a chance to reset the metric to active. As a result, the internal metrics are only reported once or twice and then fall silent forever.

What is the counter's behavior?

In brubeck/metric.c, there is this:

static void
counter__record(struct brubeck_metric *metric, value_t value)
{
    pthread_spin_lock(&metric->lock);
    {
        if (metric->as.counter.previous > 0.0) {
            value_t diff = (value >= metric->as.counter.previous) ?
                (value - metric->as.counter.previous) :
                (value);

            metric->as.counter.value += diff;
        }

        metric->as.counter.previous = value;
    }
    pthread_spin_unlock(&metric->lock);
}

Suppose there are two servers reporting the same counter http_requests: the first server sends 5, and metric->as.counter.value is 5; then the second server sends 10, and metric->as.counter.value is 10, but we expect the counter to be 15. Am I right?
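
For what it's worth, here is a minimal trace of the snippet above with those two samples, assuming the metric struct starts zeroed (previous == 0.0, value == 0.0); it is only a reading of the quoted code, not a statement about intended behavior:

/* sample 5  (server A): previous == 0.0, the guard fails -> value stays 0, previous becomes 5  */
/* sample 10 (server B): diff = 10 - 5 = 5                -> value becomes 5, previous becomes 10 */

So interleaved absolute totals from two servers get diffed against each other; the C counter type appears to expect a single monotonically increasing source per metric name.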

Cannot use hostnames in config

Trying to set up a local cluster using docker-compose. To link containers together you should use the hostname, but it seems that brubeck cannot connect to the containers using hostnames defined in /etc/hosts. I've attached the error I get from the log, the backends snippet that is relevant to the issue, and the contents of /etc/hosts.

If I use the ip address from /etc/hosts brubeck connects fine.

Let me know if you need more info :)

EDIT: I can of course telnet carbon 2004
Log:

instance=brubeck_debug backend=carbon event=failed_to_connect errno=101 msg="Network is unreachable"

Config:

  "backends" : [
    {
      "type" : "carbon",
      "address" : "carbon",
      "port" : 2004,
      "frequency" : 10,
      "pickle": true
    }
  ],

/etc/hosts:

$ cat /etc/hosts
172.17.0.155    93c76038bc47
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.150    carbon 087e1b0d7be2 statsdocker_carbon_1

Single metric still stuck

In addition to a previous change (#38) ... the following are required to avoid the stuck metric ...

diff --git a/src/samplers/statsd.c b/src/samplers/statsd.c
index 62cb33e..4d6c515 100644
--- a/src/samplers/statsd.c
+++ b/src/samplers/statsd.c
@@ -51,9 +51,9 @@ static void statsd_run_recvmmsg(struct brubeck_statsd *statsd, int sock)
                }

                /* store stats */
-               brubeck_atomic_add(&statsd->sampler.inflow, SIM_PACKETS);
+               brubeck_atomic_add(&statsd->sampler.inflow, res);

-               for (i = 0; i < SIM_PACKETS; ++i) {
+               for (i = 0; i < res; ++i) {
                        char *buf = msgs[i].msg_hdr.msg_iov->iov_base;
                        char *end = buf + msgs[i].msg_len;
                        brubeck_statsd_packet_parse(server, buf, end);

These were included in the original patch. Without these additional changes, a single metric in an otherwise idle system is still stuck.

Hourly metric emitted in following period

I don't have a recipe for this yet; however, the situation is a counter metric that emits hourly, just once. Several times in a 24-hour period, the metric for a given hour is emitted in the following hour.

This yields an hour without a metric, and an hour with a count of 2.0. I expect every hour to have a count of 1.0.

(Not my idea by the way to have an hourly metric, but it is what it is ...)

The Graphite project indicates that it honors the timestamp of the inbound metric. This suggests that brubeck emits the two metrics with timestamps in the same (in this case 10-second) bucket.

I can confirm that the metrics are sent at the proper time and received by brubeck at that time. I've added telemetry to log metric receipt (with timestamp, matched by regex) at the statsd sampler, and can confirm that the metrics in question were received at the appropriate time.

My reporting interval is 10 seconds, and the expiry 7 seconds. I've varied this up to 20 seconds expiry without effect.

I will post more information as I have it.

Overflowing pickle write buffer as set in src/backends/carbon.h

The default setting --

#define PICKLE_BUFFER_SIZE 4096

is too small, at least for me. I overflow it. I've built with a larger buffer, 16384, and had no issues after that. The observed symptom when the buffer overflows is that not all metrics are visible and present in whisper, Graphite and downstream.

Any reason why bad key logging was removed?

PR #24 added in functionality to log bad keys sent to the StatsD sampler, but commit 1a0b863 removed that code and reverts Brubeck to logging packet_drop messages again. Was this done for performance or other reasons, or is it something I could submit a PR for?

Negative gauge values

Right now sending the following values...

val:-1|g
val:-1|g
val:-1|g

...in a flush cycle will output a final value of -3. But since this is a gauge, it's expected to be -1. This appears to be due to the following code:

https://github.com/github/brubeck/blob/master/src/samplers/statsd.c#L105

https://github.com/github/brubeck/blob/master/src/metric.c#L48

There seem to be some statsd aggregators that support negative gauges and some that do not. What is brubeck's stance, and what would be the best recommended way to get this functionality from brubeck? Maybe a runtime configuration switch to remove relative values from gauges altogether?

Statsd server uses \n as separator for multiple metrics

The official statsd node client uses \n as separator for multiple metrics in a single packet.

https://github.com/etsy/statsd/blob/master/docs/server.md
"Multiple metrics can be received in a single packet if separated by the \n character."

Update:
A simple python script to test this:

import socket
addr = (socket.gethostbyname('localhost'), 8126)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
print sock.sendto('foo.baz:1|c\nfoo.def:1|c\n', addr)

This gives me
instance=brubeck_debug sampler=statsd event=bad_key key='foo.baz:1|c
foo.def:1|c
' from=127.0.0.1

If I use "\0" or "\n\0" as the separator I don't the the error but just the first metric is send to carbon.

Is it possible to support multiple metrics in one packet?

wrong error checks for recvfrom and recvmmsg

While testing I found that when you send metrics exactly 11 or 4 characters long, they are dropped because of these checks (line 88+ in statsd.c):

int res = recvfrom(sock, buffer,
        sizeof(buffer) - 1, 0,
        (struct sockaddr *)&reporter, &reporter_len);
if (res == EAGAIN || res == EINTR || res == 0)
    continue;

res is either the message length or -1 on error. EAGAIN = 11 and EINTR = 4, so messages with length 4 or 11 are dropped. This check should be res == -1, or replaced with the check further down, which tests res < 0 (line 99 in statsd.c) and marks the message as dropped.

I guess messages with length 4 or 11 are unlikely but maybe someone uses them.
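
A sketch of the corrected check, mirroring the snippet above (sock, buffer and reporter are the same variables as in statsd.c):

#include <errno.h>

ssize_t res = recvfrom(sock, buffer,
        sizeof(buffer) - 1, 0,
        (struct sockaddr *)&reporter, &reporter_len);
if (res < 0) {
    /* recvfrom returns -1 on error and sets errno; compare errno,
       not the returned length, against EAGAIN and EINTR */
    if (errno == EAGAIN || errno == EINTR)
        continue;
    /* otherwise count the packet as dropped */
}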

http daemon bind doesn't respect socket address from config file

config.json:

{
  ...
  "http" : "127.0.0.1:8080",
  ...
}

Expected: brubeck should bind only to the loopback interface / 127.0.0.1.

Actual: brubeck binds its HTTP listener on all interfaces (see *:8080 below)

[localhost]# ss -tulpn | grep brubeck
udp    UNCONN     0      0      127.0.0.1:8125     *:*     users:(("brubeck",pid=27614,fd=6))
tcp    LISTEN     0      32     *:8080             *:*     users:(("brubeck",pid=27614,fd=7))

Possible fix is to parse out the address from the "http" bind string and pass along as MHD_OPTION_SOCK_ADDR in brubeck_http_endpoint_init.

See

void brubeck_http_endpoint_init(struct brubeck_server *server, const char *listen)
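
A hedged sketch of that fix; the helper name and handler arguments are placeholders, not brubeck's actual code:

#include <arpa/inet.h>
#include <microhttpd.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

static struct MHD_Daemon *
start_http_bound(const char *ip, uint16_t port,
                 MHD_AccessHandlerCallback handler, void *cls)
{
    struct sockaddr_in addr;

    /* build the address parsed out of the "http" config string */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);

    /* MHD_OPTION_SOCK_ADDR binds libmicrohttpd to this address only,
       instead of listening on all interfaces */
    return MHD_start_daemon(MHD_USE_SELECT_INTERNALLY, port,
        NULL, NULL, handler, cls,
        MHD_OPTION_SOCK_ADDR, (struct sockaddr *)&addr,
        MHD_OPTION_END);
}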

Expiry alternative

Expiry can cause a metric that has not reported in an interval to push a zero value to graphite, when the total time to expiry/DISABLED is greater than the sampling interval. In this circumstance the metric has not reported and consequently has no value, and yet a zero value is pushed to graphite.

For counters and meters, this implies that a metric has reported a zero value when it has not reported, or that the aggregated total of an incrementer/decrementer is zero when it has not reported. I haven't looked at whether histograms/timers report zero values, but I'm guessing they might.

There is also the overhead of walking the list of metrics to expire them.

Would it not be more accurate, and more efficient, to mark a metric as ACTIVE when recorded and DISABLED when sampled? This way only metrics that have reported during the sample period will push a value to graphite -- and the zeros that represent non-reports/ non-expired will not get pushed?

Fails to compile with newer library versions (libssl?)

I'm trying to compile brubeck on Debian Sid with the following library versions:

➜  brubeck git:(master) ✗ dpkg -l libmicrohttpd-dev libjansson-dev libssl-dev gcc
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                    Version                  Architecture             Description
+++-=======================================-========================-========================-====================================================================================
ii  gcc                                     4:6.1.1-1                amd64                    GNU C compiler
ii  libjansson-dev:amd64                    2.7-5                    amd64                    C library for encoding, decoding and manipulating JSON data (dev)
ii  libmicrohttpd-dev                       0.9.51-1                 amd64                    library embedding HTTP server functionality (development)
ii  libssl-dev:amd64                        1.1.0b-2                 amd64                    Secure Sockets Layer toolkit - development files

It fails to compile with this error:

➜  brubeck git:(master) ✗ ./script/bootstrap
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/backend.c -o src/backend.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/backends/carbon.c -o src/backends/carbon.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/bloom.c -o src/bloom.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/city.c -o src/city.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/histogram.c -o src/histogram.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/ht.c -o src/ht.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/http.c -o src/http.o
src/http.c: In function ‘expire_metric’:
src/http.c:67:3: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
   return MHD_create_response_from_data(
   ^~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/http.c: In function ‘send_metric’:
src/http.c:100:3: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
   return MHD_create_response_from_data(
   ^~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/http.c: In function ‘send_stats’:
src/http.c:177:2: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
  return MHD_create_response_from_data(
  ^~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/http.c: In function ‘send_ping’:
src/http.c:210:2: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
  return MHD_create_response_from_data(
  ^~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/http.c: In function ‘handle_request’:
src/http.c:244:3: warning: ‘MHD_create_response_from_data’ is deprecated: MHD_create_response_from_data() is deprecated, use MHD_create_response_from_buffer() [-Wdeprecated-declarations]
   response = MHD_create_response_from_data(
   ^~~~~~~~
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2079:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/internal_sampler.c -o src/internal_sampler.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/log.c -o src/log.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/metric.c -o src/metric.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/sampler.c -o src/sampler.o
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"dad4b3b\" -DBRUBECK_HAVE_MICROHTTPD -c src/samplers/statsd-secure.c -o src/samplers/statsd-secure.o
src/samplers/statsd-secure.c: In function ‘statsd_secure__thread’:
src/samplers/statsd-secure.c:101:11: error: storage size of ‘ctx’ isn’t known
  HMAC_CTX ctx;
           ^~~
src/samplers/statsd-secure.c:111:2: warning: implicit declaration of function ‘HMAC_CTX_init’ [-Wimplicit-function-declaration]
  HMAC_CTX_init(&ctx);
  ^~~~~~~~~~~~~
src/samplers/statsd-secure.c:152:2: warning: implicit declaration of function ‘HMAC_CTX_cleanup’ [-Wimplicit-function-declaration]
  HMAC_CTX_cleanup(&ctx);
  ^~~~~~~~~~~~~~~~
src/samplers/statsd-secure.c:101:11: warning: unused variable ‘ctx’ [-Wunused-variable]
  HMAC_CTX ctx;
           ^~~
Makefile:44: recipe for target 'src/samplers/statsd-secure.o' failed
make: *** [src/samplers/statsd-secure.o] Error 1

I suspect OpenSSL 1.1, but haven't yet looked into it extensively. The program compiles just fine on a machine running Debian 8 with OpenSSL 1.0.1t.

Features documentation

Hello,

"Brubeck is missing many of the features of the original StatsD. We've only implemented what we felt was necessary for our metrics stack."

Is there a resource listing what is available, or do I have to run tests/dig into the code to find out?

btw thanks for sharing brubeck!

make test fails ?

Trying to create a Debian package for this, and by default it runs make test.

That has failing tests, perhaps because of this (from the compile step):

gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendor/ck/include -DNDEBUG=1 -DGIT_SHA=\"2536347\" -DBRUBECK_HAVE_MICROHTTPD -c src/http.c -o src/http.o
src/http.c: In function ‘expire_metric’:
src/http.c:67:3: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
   return MHD_create_response_from_data(
   ^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^
src/http.c: In function ‘send_metric’:
src/http.c:100:3: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
   return MHD_create_response_from_data(
   ^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^
src/http.c: In function ‘send_stats’:
src/http.c:177:2: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
  return MHD_create_response_from_data(
  ^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^
src/http.c: In function ‘send_ping’:
src/http.c:210:2: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
  return MHD_create_response_from_data(
  ^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^
src/http.c: In function ‘handle_request’:
src/http.c:244:3: warning: ‘MHD_create_response_from_data’ is deprecated [-Wdeprecated-declarations]
   response = MHD_create_response_from_data(
   ^
In file included from src/http.c:4:0:
/usr/include/microhttpd.h:2022:1: note: declared here
 MHD_create_response_from_data (size_t size,
 ^
gcc -g -Wall -O3 -Wno-strict-aliasing -Isrc -Ivendo

Test output:

== Entering suite #1, "histogram: time/data series aggregation" ==

[1:1]  test_histogram__single_element:#1  "histogram size"  pass
[1:2]  test_histogram__single_element:#2  "histogram value count"  pass
[1:3]  test_histogram__single_element:#3  "sample.min"  pass
[1:4]  test_histogram__single_element:#4  "sample.max"  pass
[1:5]  test_histogram__single_element:#5  "sample.percentile[3]"  pass
[1:6]  test_histogram__single_element:#6  "sample.mean"  pass
[1:7]  test_histogram__single_element:#7  "sample.count"  pass
[1:8]  test_histogram__single_element:#8  "sample.sum"  pass
[1:9]  test_histogram__large_range:#1  "sample.min"  pass
[1:10]  test_histogram__large_range:#2  "sample.max"  pass
[1:11]  test_histogram__large_range:#3  "sample.median"  pass
[1:12]  test_histogram__multisamples:#1  "histogram size"  pass
[1:13]  test_histogram__multisamples:#2  "histogram value count"  pass
[1:14]  test_histogram__multisamples:#3  "sample.min"  pass
[1:15]  test_histogram__multisamples:#4  "sample.max"  pass
[1:16]  test_histogram__multisamples:#5  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:17]  test_histogram__multisamples:#6  "sample.mean"  pass
[1:18]  test_histogram__multisamples:#7  "sample.count"  pass
[1:19]  test_histogram__multisamples:#8  "sample.sum"  pass
[1:20]  test_histogram__multisamples:#9  "histogram size"  pass
[1:21]  test_histogram__multisamples:#10  "histogram value count"  pass
[1:22]  test_histogram__multisamples:#11  "sample.min"  pass
[1:23]  test_histogram__multisamples:#12  "sample.max"  pass
[1:24]  test_histogram__multisamples:#13  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:25]  test_histogram__multisamples:#14  "sample.mean"  pass
[1:26]  test_histogram__multisamples:#15  "sample.count"  pass
[1:27]  test_histogram__multisamples:#16  "sample.sum"  pass
[1:28]  test_histogram__multisamples:#17  "histogram size"  pass
[1:29]  test_histogram__multisamples:#18  "histogram value count"  pass
[1:30]  test_histogram__multisamples:#19  "sample.min"  pass
[1:31]  test_histogram__multisamples:#20  "sample.max"  pass
[1:32]  test_histogram__multisamples:#21  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:33]  test_histogram__multisamples:#22  "sample.mean"  pass
[1:34]  test_histogram__multisamples:#23  "sample.count"  pass
[1:35]  test_histogram__multisamples:#24  "sample.sum"  pass
[1:36]  test_histogram__multisamples:#25  "histogram size"  pass
[1:37]  test_histogram__multisamples:#26  "histogram value count"  pass
[1:38]  test_histogram__multisamples:#27  "sample.min"  pass
[1:39]  test_histogram__multisamples:#28  "sample.max"  pass
[1:40]  test_histogram__multisamples:#29  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:41]  test_histogram__multisamples:#30  "sample.mean"  pass
[1:42]  test_histogram__multisamples:#31  "sample.count"  pass
[1:43]  test_histogram__multisamples:#32  "sample.sum"  pass
[1:44]  test_histogram__multisamples:#33  "histogram size"  pass
[1:45]  test_histogram__multisamples:#34  "histogram value count"  pass
[1:46]  test_histogram__multisamples:#35  "sample.min"  pass
[1:47]  test_histogram__multisamples:#36  "sample.max"  pass
[1:48]  test_histogram__multisamples:#37  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:49]  test_histogram__multisamples:#38  "sample.mean"  pass
[1:50]  test_histogram__multisamples:#39  "sample.count"  pass
[1:51]  test_histogram__multisamples:#40  "sample.sum"  pass
[1:52]  test_histogram__multisamples:#41  "histogram size"  pass
[1:53]  test_histogram__multisamples:#42  "histogram value count"  pass
[1:54]  test_histogram__multisamples:#43  "sample.min"  pass
[1:55]  test_histogram__multisamples:#44  "sample.max"  pass
[1:56]  test_histogram__multisamples:#45  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:57]  test_histogram__multisamples:#46  "sample.mean"  pass
[1:58]  test_histogram__multisamples:#47  "sample.count"  pass
[1:59]  test_histogram__multisamples:#48  "sample.sum"  pass
[1:60]  test_histogram__multisamples:#49  "histogram size"  pass
[1:61]  test_histogram__multisamples:#50  "histogram value count"  pass
[1:62]  test_histogram__multisamples:#51  "sample.min"  pass
[1:63]  test_histogram__multisamples:#52  "sample.max"  pass
[1:64]  test_histogram__multisamples:#53  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:65]  test_histogram__multisamples:#54  "sample.mean"  pass
[1:66]  test_histogram__multisamples:#55  "sample.count"  pass
[1:67]  test_histogram__multisamples:#56  "sample.sum"  pass
[1:68]  test_histogram__multisamples:#57  "histogram size"  pass
[1:69]  test_histogram__multisamples:#58  "histogram value count"  pass
[1:70]  test_histogram__multisamples:#59  "sample.min"  pass
[1:71]  test_histogram__multisamples:#60  "sample.max"  pass
[1:72]  test_histogram__multisamples:#61  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      105
[1:73]  test_histogram__multisamples:#62  "sample.mean"  pass
[1:74]  test_histogram__multisamples:#63  "sample.count"  pass
[1:75]  test_histogram__multisamples:#64  "sample.sum"  pass
[1:76]  test_histogram__with_sample_rate:#1  "histogram size"  pass
[1:77]  test_histogram__with_sample_rate:#2  "histogram value count"  pass
[1:78]  test_histogram__with_sample_rate:#3  "sample.min"  pass
[1:79]  test_histogram__with_sample_rate:#4  "sample.max"  pass
[1:80]  test_histogram__with_sample_rate:#5  "sample.percentile[3]"  FAIL
!    Type:      fail-unless
!    Condition: sample.percentile[3] == 127.0
!    Line:      130
[1:81]  test_histogram__with_sample_rate:#6  "sample.mean"  pass
[1:82]  test_histogram__with_sample_rate:#7  "sample.count"  pass
[1:83]  test_histogram__with_sample_rate:#8  "sample.sum"  pass
[1:84]  test_histogram__capacity:#1  "histogram size"  pass
[1:85]  test_histogram__capacity:#2  "histogram value count"  pass
[1:86]  test_histogram__capacity:#3  "sample.min"  pass
[1:87]  test_histogram__capacity:#4  "sample.max"  pass
[1:88]  test_histogram__capacity:#5  "sample.count"  pass
[1:89]  test_histogram__capacity:#6  "histogram size"  pass
[1:90]  test_histogram__capacity:#7  "histogram value count"  pass
[1:91]  test_histogram__capacity:#8  "sample.min"  pass
[1:92]  test_histogram__capacity:#9  "sample.max"  pass
[1:93]  test_histogram__capacity:#10  "sample.count"  pass

--> 93 check(s), 84 ok, 9 failed (9.68%)

== Entering suite #2, "mstore: concurrency test for metrics hash table" ==

[2:1]  test_mstore__save:#1  "stored 15000 metrics in table"  pass
[2:2]  test_mstore__save:#2  "lookup all metrics from table"  pass

--> 2 check(s), 2 ok, 0 failed (0.00%)

== Entering suite #3, "atomic: atomic primitives" ==

[3:1]  test_atomic_spinlocks:#1  "spinlock doesn't race"  pass

--> 1 check(s), 1 ok, 0 failed (0.00%)

== Entering suite #4, "ftoa: double-to-string conversion" ==

[4:1]  test_ftoa:#1  "0"  pass
[4:2]  test_ftoa:#2  "15"  pass
[4:3]  test_ftoa:#3  "15.5"  pass
[4:4]  test_ftoa:#4  "15.505"  pass
[4:5]  test_ftoa:#5  "0.125"  pass
[4:6]  test_ftoa:#6  "1234.567"  pass
[4:7]  test_ftoa:#7  "100000"  pass
[4:8]  test_ftoa:#8  "0.999"  pass

--> 8 check(s), 8 ok, 0 failed (0.00%)

== Entering suite #5, "statsd: packet parsing" ==

[5:1]  test_statsd_msg__parse_strings:#1  "github.auth.fingerprint.sha1:1|c"  pass
[5:2]  test_statsd_msg__parse_strings:#2  "msg.value == expected"  pass
[5:3]  test_statsd_msg__parse_strings:#3  "msg.sample_rate == expected"  pass
[5:4]  test_statsd_msg__parse_strings:#4  "msg.modifiers == expected"  pass
[5:5]  test_statsd_msg__parse_strings:#5  "github.auth.fingerprint.sha1:1|c|@0.1"  pass
[5:6]  test_statsd_msg__parse_strings:#6  "msg.value == expected"  pass
[5:7]  test_statsd_msg__parse_strings:#7  "msg.sample_rate == expected"  pass
[5:8]  test_statsd_msg__parse_strings:#8  "msg.modifiers == expected"  pass
[5:9]  test_statsd_msg__parse_strings:#9  "github.auth.fingerprint.sha1:1|g"  pass
[5:10]  test_statsd_msg__parse_strings:#10  "msg.value == expected"  pass
[5:11]  test_statsd_msg__parse_strings:#11  "msg.sample_rate == expected"  pass
[5:12]  test_statsd_msg__parse_strings:#12  "msg.modifiers == expected"  pass
[5:13]  test_statsd_msg__parse_strings:#13  "lol:1|ms"  pass
[5:14]  test_statsd_msg__parse_strings:#14  "msg.value == expected"  pass
[5:15]  test_statsd_msg__parse_strings:#15  "msg.sample_rate == expected"  pass
[5:16]  test_statsd_msg__parse_strings:#16  "msg.modifiers == expected"  pass
[5:17]  test_statsd_msg__parse_strings:#17  "this.is.sparta:199812|C"  pass
[5:18]  test_statsd_msg__parse_strings:#18  "msg.value == expected"  pass
[5:19]  test_statsd_msg__parse_strings:#19  "msg.sample_rate == expected"  pass
[5:20]  test_statsd_msg__parse_strings:#20  "msg.modifiers == expected"  pass
[5:21]  test_statsd_msg__parse_strings:#21  "this.is.sparta:0012|h"  pass
[5:22]  test_statsd_msg__parse_strings:#22  "msg.value == expected"  pass
[5:23]  test_statsd_msg__parse_strings:#23  "msg.sample_rate == expected"  pass
[5:24]  test_statsd_msg__parse_strings:#24  "msg.modifiers == expected"  pass
[5:25]  test_statsd_msg__parse_strings:#25  "this.is.sparta:23.23|g"  pass
[5:26]  test_statsd_msg__parse_strings:#26  "msg.value == expected"  pass
[5:27]  test_statsd_msg__parse_strings:#27  "msg.sample_rate == expected"  pass
[5:28]  test_statsd_msg__parse_strings:#28  "msg.modifiers == expected"  pass
[5:29]  test_statsd_msg__parse_strings:#29  "this.is.sparta:0.232030|g"  pass
[5:30]  test_statsd_msg__parse_strings:#30  "msg.value == expected"  pass
[5:31]  test_statsd_msg__parse_strings:#31  "msg.sample_rate == expected"  pass
[5:32]  test_statsd_msg__parse_strings:#32  "msg.modifiers == expected"  pass
[5:33]  test_statsd_msg__parse_strings:#33  "this.are.some.floats:1234567.89|g"  pass
[5:34]  test_statsd_msg__parse_strings:#34  "msg.value == expected"  pass
[5:35]  test_statsd_msg__parse_strings:#35  "msg.sample_rate == expected"  pass
[5:36]  test_statsd_msg__parse_strings:#36  "msg.modifiers == expected"  pass
[5:37]  test_statsd_msg__parse_strings:#37  "this.are.some.floats:1234567.89|g|@0.025"  pass
[5:38]  test_statsd_msg__parse_strings:#38  "msg.value == expected"  pass
[5:39]  test_statsd_msg__parse_strings:#39  "msg.sample_rate == expected"  pass
[5:40]  test_statsd_msg__parse_strings:#40  "msg.modifiers == expected"  pass
[5:41]  test_statsd_msg__parse_strings:#41  "this.are.some.floats:1234567.89|g|@0.25"  pass
[5:42]  test_statsd_msg__parse_strings:#42  "msg.value == expected"  pass
[5:43]  test_statsd_msg__parse_strings:#43  "msg.sample_rate == expected"  pass
[5:44]  test_statsd_msg__parse_strings:#44  "msg.modifiers == expected"  pass
[5:45]  test_statsd_msg__parse_strings:#45  "this.are.some.floats:1234567.89|g|@0.01"  pass
[5:46]  test_statsd_msg__parse_strings:#46  "msg.value == expected"  pass
[5:47]  test_statsd_msg__parse_strings:#47  "msg.sample_rate == expected"  pass
[5:48]  test_statsd_msg__parse_strings:#48  "msg.modifiers == expected"  pass
[5:49]  test_statsd_msg__parse_strings:#49  "this.are.some.floats:1234567.89|g|@000.0100"  pass
[5:50]  test_statsd_msg__parse_strings:#50  "msg.value == expected"  pass
[5:51]  test_statsd_msg__parse_strings:#51  "msg.sample_rate == expected"  pass
[5:52]  test_statsd_msg__parse_strings:#52  "msg.modifiers == expected"  pass
[5:53]  test_statsd_msg__parse_strings:#53  "this.are.some.floats:1234567.89|g|@1.0"  pass
[5:54]  test_statsd_msg__parse_strings:#54  "msg.value == expected"  pass
[5:55]  test_statsd_msg__parse_strings:#55  "msg.sample_rate == expected"  pass
[5:56]  test_statsd_msg__parse_strings:#56  "msg.modifiers == expected"  pass
[5:57]  test_statsd_msg__parse_strings:#57  "this.are.some.floats:1234567.89|g|@1"  pass
[5:58]  test_statsd_msg__parse_strings:#58  "msg.value == expected"  pass
[5:59]  test_statsd_msg__parse_strings:#59  "msg.sample_rate == expected"  pass
[5:60]  test_statsd_msg__parse_strings:#60  "msg.modifiers == expected"  pass
[5:61]  test_statsd_msg__parse_strings:#61  "this.are.some.floats:1234567.89|g|@1."  pass
[5:62]  test_statsd_msg__parse_strings:#62  "msg.value == expected"  pass
[5:63]  test_statsd_msg__parse_strings:#63  "msg.sample_rate == expected"  pass
[5:64]  test_statsd_msg__parse_strings:#64  "msg.modifiers == expected"  pass
[5:65]  test_statsd_msg__parse_strings:#65  "this.are.some.floats:|g"  pass
[5:66]  test_statsd_msg__parse_strings:#66  "msg.value == expected"  pass
[5:67]  test_statsd_msg__parse_strings:#67  "msg.sample_rate == expected"  pass
[5:68]  test_statsd_msg__parse_strings:#68  "msg.modifiers == expected"  pass
[5:69]  test_statsd_msg__parse_strings:#69  "this.are.some.floats:1234567.89|g"  pass
[5:70]  test_statsd_msg__parse_strings:#70  "msg.value == expected"  pass
[5:71]  test_statsd_msg__parse_strings:#71  "msg.sample_rate == expected"  pass
[5:72]  test_statsd_msg__parse_strings:#72  "msg.modifiers == expected"  pass
[5:73]  test_statsd_msg__parse_strings:#73  "gauge.increment:+1|g"  pass
[5:74]  test_statsd_msg__parse_strings:#74  "msg.value == expected"  pass
[5:75]  test_statsd_msg__parse_strings:#75  "msg.sample_rate == expected"  pass
[5:76]  test_statsd_msg__parse_strings:#76  "msg.modifiers == expected"  pass
[5:77]  test_statsd_msg__parse_strings:#77  "gauge.decrement:-1|g"  pass
[5:78]  test_statsd_msg__parse_strings:#78  "msg.value == expected"  pass
[5:79]  test_statsd_msg__parse_strings:#79  "msg.sample_rate == expected"  pass
[5:80]  test_statsd_msg__parse_strings:#80  "msg.modifiers == expected"  pass
[5:81]  test_statsd_msg__parse_strings:#81  "this.are.some.floats:12.89.23|g"  pass
[5:82]  test_statsd_msg__parse_strings:#82  "this.are.some.floats:12.89|a"  pass
[5:83]  test_statsd_msg__parse_strings:#83  "this.are.some.floats:12.89|msdos"  pass
[5:84]  test_statsd_msg__parse_strings:#84  "this.are.some.floats:12.89g|g"  pass
[5:85]  test_statsd_msg__parse_strings:#85  "this.are.some.floats:12.89|"  pass
[5:86]  test_statsd_msg__parse_strings:#86  "this.are.some.floats:12.89"  pass
[5:87]  test_statsd_msg__parse_strings:#87  "this.are.some.floats:12.89 |g"  pass
[5:88]  test_statsd_msg__parse_strings:#88  "this.are.some.floats|g"  pass
[5:89]  test_statsd_msg__parse_strings:#89  "this.are.some.floats:1.0|g|1.0"  pass
[5:90]  test_statsd_msg__parse_strings:#90  "this.are.some.floats:1.0|g|0.1"  pass
[5:91]  test_statsd_msg__parse_strings:#91  "this.are.some.floats:1.0|g|@0.1.1"  pass
[5:92]  test_statsd_msg__parse_strings:#92  "this.are.some.floats:1.0|g|@0.1@"  pass
[5:93]  test_statsd_msg__parse_strings:#93  "this.are.some.floats:1.0|g|@0.1125.2"  pass
[5:94]  test_statsd_msg__parse_strings:#94  "this.are.some.floats:1.0|g|@0.1125.2"  pass
[5:95]  test_statsd_msg__parse_strings:#95  "this.are.some.floats:1.0|g|@1.23"  pass
[5:96]  test_statsd_msg__parse_strings:#96  "this.are.some.floats:1.0|g|@3.0"  pass
[5:97]  test_statsd_msg__parse_strings:#97  "this.are.some.floats:1.0|g|@-3.0"  pass
[5:98]  test_statsd_msg__parse_strings:#98  "this.are.some.floats:1.0|g|@-1.0"  pass
[5:99]  test_statsd_msg__parse_strings:#99  "this.are.some.floats:1.0|g|@-0.23"  pass
[5:100]  test_statsd_msg__parse_strings:#100  "this.are.some.floats:1.0|g|@0.0"  pass
[5:101]  test_statsd_msg__parse_strings:#101  "this.are.some.floats:1.0|g|@0"  pass

--> 101 check(s), 101 ok, 0 failed (0.00%)

==> 205 check(s) in 5 suite(s) finished after 0.00 second(s),
    196 succeeded, 9 failed (4.39%)

[FAILURE]
Makefile:56: recipe for target 'test' failed
make: *** [test] Error 1

Lone metrics stuck in rcvmmsg call

In the brubeck source, at /brubeck-master/src/samplers/statsd.c, line 37, the recvmmsg call should have the MSG_WAITFORONE flag set in the 4th argument, which is currently zero. With this flag NOT set, a single UDP message is not received, and without a timeout, the call blocks indefinitely, waiting for more than one message. This can result in the loss of lone messages, such as sparse calls with a single metric.

Most of the time, the metrics flow is significant, these single messages are flushed along with additional messages, and recvmmsg works. If, however, a single metric is sent, the call blocks indefinitely, leaving the message stranded.

This can be reproduced by sending a single metric to a running instance of brubeck. The single metric fails to emit in the line or pickle protocol to graphite. When the MSG_WAITFORONE flag is added, the metric flows through as expected.

The workaround is to change the brubeck configuration by setting the value of the "multimsg" setting in /etc/brubeck/config.json to 1, and then restart brubeck. This configuration change deactivates the recvmmsg call detailed above, falling back to recvmsg. It's not clear whether this has any performance implication with multiple worker threads in the receive pool; however, it seems likely that there will be some impact on a busy metrics flow.

I have built and tested with the flag above, and it works fine. (I also have a Mac build for anyone who wants it).
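
A sketch of the change described above, assuming the socket and msgs batch array from the diff quoted in the "Single metric still stuck" issue; this is not the exact brubeck code:

#define _GNU_SOURCE
#include <sys/socket.h>

static int read_batch(int sock, struct mmsghdr *msgs, unsigned int vlen)
{
    /* MSG_WAITFORONE makes recvmmsg return as soon as at least one
       datagram has arrived, instead of blocking until all vlen slots
       are filled, so a lone metric is delivered immediately */
    return recvmmsg(sock, msgs, vlen, MSG_WAITFORONE, NULL);
}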

1-byte heap overflow when recvmmsg() is used

statsd_run_recvmsg() subtracts 1 from the buffer size passed to recvfrom() to allow room for the null byte that will be appended by brubeck_statsd_msg_parse(). statsd_run_recvmmsg() doesn't do this, so the null byte is written past the end of the buffer if a MAX_PACKET_SIZE-byte packet is received.
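
A sketch of one way to address this: reserve a byte for the terminating NUL when the iovecs for recvmmsg() are set up. Buffer sizes and variable names here are assumptions, not brubeck's actual code:

#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define SIM_PACKETS 64          /* assumed batch size */
#define MAX_PACKET_SIZE 512     /* assumed packet buffer size */

static char buffers[SIM_PACKETS][MAX_PACKET_SIZE];
static struct iovec iovecs[SIM_PACKETS];
static struct mmsghdr msgs[SIM_PACKETS];

static void setup_batch(void)
{
    int i;

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < SIM_PACKETS; ++i) {
        iovecs[i].iov_base = buffers[i];
        /* leave one byte free so the '\0' appended by the parser
           cannot be written past the end of the buffer */
        iovecs[i].iov_len = MAX_PACKET_SIZE - 1;
        msgs[i].msg_hdr.msg_iov = &iovecs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
}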

Gauge overflow

Hi
Seems like brubeck overflows when handling a gauge value greater than 2^32 (4294967296).

(I did not find maximum gauge values in the statsd specification:
https://github.com/etsy/statsd/blob/master/docs/metric_types.md,
and in my production environments we often send values greater than 2^32.)

How to reproduce:

  1. Send a metric to brubeck and watch the network packets (I use ngrep):

     echo "complex.delete_me.mem:4294967296|g" | nc -u -q1 127.0.0.1 8126

     U 127.0.0.1:38487 -> 127.0.0.1:8126
     complex.delete_me.mem:4294967296|g.

  2. Watch the network traffic from brubeck to the storage:

     T 10.0.2.9:53546 -> 10.9.192.2:2003 [AP]
     complex.delete_me.mem 0 1493135074.    # brubeck sends 0 instead of 4294967296

My investigation led me to this function:
https://github.com/github/brubeck/blob/master/src/utils.c#L137
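
A minimal demonstration of the suspected cause: if the parsed value is narrowed to a 32-bit integer at any point, 2^32 wraps to exactly 0. Whether that is what actually happens inside utils.c is not confirmed here:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t sent = 4294967296ULL;     /* 2^32, the gauge value from the report */
    uint32_t stored = (uint32_t)sent;  /* unsigned narrowing wraps modulo 2^32 */

    printf("%llu -> %u\n", (unsigned long long)sent, stored);  /* prints: 4294967296 -> 0 */
    return 0;
}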

Could you provide a fix for this case?
Thanks in advance!
