statsrelay's People

Contributors

denen99, jjneely, szibis

statsrelay's Issues

Multiple StatsRelay daemons behind LVS

Hey Jack,

Thanks for releasing statsrelay, this is a great tool! I'd like to run multiple statsrelay daemons behind a simple UDP load balancer (LVS in round-robin mode) and was wondering: would the statsrelay daemons be able to share the same hash table? Would they each send the same metrics to the same backend server, or would they each maintain their own hash table?

Thanks!
Dave
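
For what it's worth, here is a minimal sketch of why independent relay processes with identical, identically ordered backend lists can agree without sharing any state, assuming the backend is chosen purely from a hash of the metric name (statsrelay's real hash ring may differ in detail):

package main

import (
	"fmt"
	"hash/fnv"
)

// pickBackend maps a metric name to one of the configured backends.
// Any process running the same function over the same backend list
// makes the same choice -- no shared hash table is needed.
func pickBackend(metric string, backends []string) string {
	h := fnv.New32a()
	h.Write([]byte(metric))
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	backends := []string{"statsd-a:8125", "statsd-b:8125", "statsd-c:8125"}
	fmt.Println(pickBackend("app.requests.count", backends))
}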

High number of active flow counts in load balancer

I am running statsrelay as a repeater daemon on my application boxes (EC2 m4.large) and forwarding the UDP packets to a group of boxes listening behind an AWS Network Load Balancer (UDP). This is opening around 50K connection flows in the NLB. The application boxes publish around 10K metrics/second. What is the reason for opening such a high number of flows?
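
Not a statement about statsrelay's actual socket handling, but as a general illustration: an NLB counts one UDP flow per distinct 5-tuple, so sends that come from fresh ephemeral source ports each register as a new flow, while a single long-lived dialed socket stays one flow no matter how many packets it carries. The endpoint name below is hypothetical:

package main

import "net"

func main() {
	target := "nlb.example.internal:8125" // hypothetical NLB endpoint

	// One flow: a single dialed socket (one local port) reused for every send.
	conn, err := net.Dial("udp", target)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	for i := 0; i < 1000; i++ {
		conn.Write([]byte("app.counter:1|c"))
	}

	// Many flows: a fresh socket (fresh local source port) per send looks
	// like a new flow to the load balancer each time.
	for i := 0; i < 1000; i++ {
		c, err := net.Dial("udp", target)
		if err != nil {
			continue
		}
		c.Write([]byte("app.counter:1|c"))
		c.Close()
	}
}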

panic: runtime error: slice bounds out of range

Hey Guys,

I'm facing a bit of a strange problem at the moment: a statsrelay process is crashing at one of our sites for some reason. I've run the daemon in verbose mode, and this is the error message just after it crashes.

Could this be the result of a malformed metric name being received?

2015/10/27 12:15:01 Procssed 11261 metrics. Running total: 4006453. Metrics/sec: 14411

panic: runtime error: slice bounds out of range

goroutine 384 [running]:
runtime.panic(0x53b7c0, 0x624c6f)
        /usr/lib/golang/src/pkg/runtime/panic.c:279 +0xf5
main.getMetricName(0xc208312cbb, 0x30, 0xbb345, 0x0, 0x0)
        /admin/golang/statsrelay/statsrelay.go:79 +0xa6
main.handleBuff(0xc2082ce000, 0xcceb5, 0x100000)
        /admin/golang/statsrelay/statsrelay.go:144 +0x9d8
created by main.runServer
        /admin/golang/statsrelay/statsrelay.go:281 +0x1db

goroutine 16 [select]:
main.runServer(0x7fff8c1a06b7, 0xc, 0x2ee1)
        /admin/golang/statsrelay/statsrelay.go:278 +0x2db
main.main()
        /admin/golang/statsrelay/statsrelay.go:346 +0x694

goroutine 19 [finalizer wait, 4 minutes]:
runtime.park(0x416580, 0x6283d8, 0x626f09)
        /usr/lib/golang/src/pkg/runtime/proc.c:1369 +0x89
runtime.parkunlock(0x6283d8, 0x626f09)
        /usr/lib/golang/src/pkg/runtime/proc.c:1385 +0x3b
runfinq()
        /usr/lib/golang/src/pkg/runtime/mgc0.c:2644 +0xcf
runtime.goexit()
        /usr/lib/golang/src/pkg/runtime/proc.c:1445

goroutine 20 [syscall, 4 minutes]:
os/signal.loop()
        /usr/lib/golang/src/pkg/os/signal/signal_unix.go:21 +0x1e
created by os/signal.init·1
        /usr/lib/golang/src/pkg/os/signal/signal_unix.go:27 +0x32

goroutine 21 [IO wait]:
net.runtime_pollWait(0x7fd668027730, 0x72, 0x0)
        /usr/lib/golang/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc20802c220, 0x72, 0x0, 0x0)
        /usr/lib/golang/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc20802c220, 0x0, 0x0)
        /usr/lib/golang/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).Read(0xc20802c1c0, 0xc20846fd3f, 0xfc2c1, 0xfc2c1, 0x0, 0x7fd6680262b8, 0xb)
        /usr/lib/golang/src/pkg/net/fd_unix.go:242 +0x34c
net.(*conn).Read(0xc20803c028, 0xc20846fd3f, 0xfc2c1, 0xfc2c1, 0x4d, 0x0, 0x0)
        /usr/lib/golang/src/pkg/net/net.go:122 +0xe7
main.readUDP(0x7fff8c1a06b7, 0xc, 0x2ee1, 0xc208070000)
        /admin/golang/statsrelay/statsrelay.go:240 +0x6fa
created by main.runServer
        /admin/golang/statsrelay/statsrelay.go:275 +0x132
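
It does look like handleBuff handed getMetricName a line it could not slice safely. Purely as an illustration (not the project's actual function or signature), a bounds-checked version of the name extraction could look like this, assuming the metric name is everything before the first ':' of a statsd line:

package main

import (
	"bytes"
	"fmt"
)

// getMetricNameSafe returns the metric name, or an error instead of
// panicking when the ':' separator is missing or the line is empty.
func getMetricNameSafe(line []byte) ([]byte, error) {
	i := bytes.IndexByte(line, ':')
	if i <= 0 {
		return nil, fmt.Errorf("malformed metric line: %q", line)
	}
	return line[:i], nil
}

func main() {
	for _, l := range [][]byte{[]byte("a.b.c:1|c"), []byte("garbage-without-separator")} {
		name, err := getMetricNameSafe(l)
		fmt.Println(string(name), err)
	}
}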

Make StatsRelay detect if StatsD Daemons are Alive

The current code base does nothing to detect or react to StatsD daemons that are not alive. The UDP StatsD protocol is designed to be fire-and-forget and offers no way to detect if the other side has received the packet.

StatsD daemons have a TCP administrative interface that's probably very useful for checking if the process is alive. That may be of help with this issue.

Things to think about:

  • How do we configure this? Command line representation? I'd rather not require a config file if possible -- although I'm not opposed to it.
  • What do we do with metrics destined for a down StatsD daemon? Buffer them? Probably redirect them to the next available daemon, since timestamp information is assigned when the packet is received and isn't carried in the packet itself. This may cause inconsistent data in upstream Graphite when/if multiple statsd daemons submit the same metric during hash-ring changes, but it's probably the least bad option.
  • What do we do when all statsd daemons are dead? Log loudly and drop packets?
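
One possible shape for the liveness check, sketched against statsd's TCP management interface (default port 8126) and its health command, which should reply "health: up" -- an assumption about the target statsd build, and not code that exists in statsrelay today:

package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
	"time"
)

// statsdAlive dials the management port, asks for "health", and treats
// any reply containing "up" as alive. Timeouts keep a dead daemon from
// blocking the caller.
func statsdAlive(adminAddr string) bool {
	conn, err := net.DialTimeout("tcp", adminAddr, 2*time.Second)
	if err != nil {
		return false
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(2 * time.Second))
	if _, err := fmt.Fprint(conn, "health\n"); err != nil {
		return false
	}
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return false
	}
	return strings.Contains(line, "up")
}

func main() {
	fmt.Println(statsdAlive("127.0.0.1:8126")) // hypothetical admin address
}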

metric name consistent hashing distribution

hi,

we are having an issue with statsrelay - how it distributes metrics between statsite hosts. We run 4 statsrelay servers and 5 statsite servers at the backend. All are configured identically.

It appears that one statsite node is always getting a higher amount of metrics; look at is-1941b:

[screenshot: per-backend metric counts]

We tried to add INSTANCE values (in 'HOST:PORT:INSTANCE'), but it had no visible effect.

Our current idea is that the consistent hashing is not distributing metrics efficiently.

We sometimes use very long metric names, for example:
env.analyze.document_analyzer.analyzeDocument.entities.InternetDomainName.source.DataProviderNameHERE.DocumentNameHashHere.occurrences

In our case, it looks like all metric names prefixed with env.analyze.document_analyzer.analyzeDocument.entities are forwarded to the same single statsite backend. This is not good; we would expect those metric names to be distributed between different statsite backends.

Any ideas how to solve this?
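
One way to narrow this down is to hash a representative sample of your metric names offline and look at the bucket counts. The sketch below uses a plain fnv1a modulo over the backend count, which is not necessarily the exact ring statsrelay builds, but it quickly shows whether the long common-prefix names really collapse onto one bucket or whether the imbalance comes from somewhere else (feed metric names on stdin, one per line):

package main

import (
	"bufio"
	"fmt"
	"hash/fnv"
	"os"
)

func main() {
	const backends = 5 // number of statsite backends in this example
	counts := make([]int, backends)

	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		h := fnv.New32a()
		h.Write(sc.Bytes())
		counts[h.Sum32()%backends]++
	}

	for i, c := range counts {
		fmt.Printf("backend %d: %d metrics\n", i, c)
	}
}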

Question: statsrelay dropping packets?

I have started testing statsrelay in our environment by using the statsd repeater. One thing I noticed is a difference in the metrics received in Graphite vs. the statsd proxy.

[graph: metrics received in Graphite vs. the statsd proxy]
statsd repeater -> statsrelay -> statsd -> graphite [currently using one statsrelay and one statsd]

And the difference is huge when we add more statsd backends.
The same graphs work fine when I replace statsrelay with statsd. [statsd repeater -> statsd -> graphite]
Any thoughts on this, @jjneely @szibis?
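
If the gap is on the relay host itself, one thing worth checking is whether the kernel is dropping UDP datagrams before statsrelay reads them. A small Linux-only helper like the one below (a sketch, not part of statsrelay) prints the kernel's UDP counters; a growing RcvbufErrors/InErrors count alongside the Graphite gap would point at receive-buffer overflow rather than the relay's hashing:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// /proc/net/snmp contains two "Udp:" lines: a header line with field
	// names and a line with the corresponding counter values.
	f, err := os.Open("/proc/net/snmp")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), "Udp:") {
			fmt.Println(sc.Text())
		}
	}
}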

panic: runtime error: index out of range

Hi Guys,

For some reason, I've noticed that the statsrelay processes for one of our sites are crashing every 10-20 minutes. This is the error:

panic: runtime error: index out of range

goroutine 6 [running]:
main.readUDP(0x7fffc2e446b9, 0xc, 0x2ee1, 0xc2080640c0)
        /admin/downloads/statsrelay/latest/golang/statsrelay/statsrelay.go:276 +0x9a2
created by main.runServer
        /admin/downloads/statsrelay/latest/golang/statsrelay/statsrelay.go:309 +0x174

goroutine 1 [select]:
main.runServer(0x7fffc2e446b9, 0xc, 0x2ee1)
        /admin/downloads/statsrelay/latest/golang/statsrelay/statsrelay.go:312 +0x396
main.main()
        /admin/downloads/statsrelay/latest/golang/statsrelay/statsrelay.go:382 +0x762

goroutine 5 [syscall, 3 minutes]:
os/signal.loop()
        /usr/lib/golang/src/os/signal/signal_unix.go:21 +0x1f
created by os/signal.init·1
        /usr/lib/golang/src/os/signal/signal_unix.go:27 +0x35

Has anyone come across this problem before?

Thanks,
Dave
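
Without knowing exactly which index readUDP trips over, the general guard against this class of panic is to treat only the first n bytes returned by the read as valid and to bail out on empty reads before indexing the buffer. A hedged sketch of that idea (not statsrelay's actual readUDP):

package main

import (
	"log"
	"net"
	"time"
)

// readPacket returns only the bytes actually received; callers never
// index past n, and an empty read is never indexed at all.
func readPacket(conn net.PacketConn, buf []byte) ([]byte, bool) {
	n, _, err := conn.ReadFrom(buf)
	if err != nil {
		log.Printf("read error: %v", err)
		return nil, false
	}
	if n == 0 {
		return nil, false
	}
	return buf[:n], true
}

func main() {
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	defer pc.Close()
	// Short deadline so this demo exits instead of waiting forever.
	pc.SetReadDeadline(time.Now().Add(time.Second))
	buf := make([]byte, 64*1024)
	if pkt, ok := readPacket(pc, buf); ok {
		log.Printf("got %d bytes", len(pkt))
	}
}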

Is statsrelay v0.0.9 compatible with buckytools v0.4.2?

I have the following architecture in-place:

statsd metrics -> statsrelay -> x2 gostatsd -> x2 go-carbon

As far as I understood, statsrelay uses the fnv1a hashing algorithm for its hash ring, but the buckytools inconsistent command reports inconsistencies with all the supported algorithms (carbon, fnv1a, jump_fnv1a).

$ statsrelay -p=8125 10.0.0.1:18125:node-1 10.0.0.2:18125:node-2

$ echo "foo.bar.bob:1|c" | nc -w1 -u 10.0.0.1 8125   <-- landed on node-2
$ echo "foo.bar.john:1|c" | nc -w1 -u 10.0.0.1 8125  <-- landed on node-1

$ buckyd -node 10.0.0.1 -hash fnv1a 10.0.0.1:2003=node-1 10.0.0.2:2003=node-2
$ buckyd -node 10.0.0.2 -hash fnv1a 10.0.0.1:2003=node-1 10.0.0.2:2003=node-2

$ bucky inconsistent -f | grep foo
2020/09/02 17:42:35 10.0.0.1:4242 returned 110656 metrics
2020/09/02 17:42:35 10.0.0.2:4242 returned 109531 metrics
2020/09/02 17:42:35 Hashing...
2020/09/02 17:42:35 Hashing time was: 0s
2020/09/02 17:42:35 36404 inconsistent metrics found on 10.0.0.1:4242
2020/09/02 17:42:35 35111 inconsistent metrics found on 10.0.0.2:4242
10.0.0.1:4242: stats.foo.bar.john
10.0.0.1:4242: stats_counts.foo.bar.john
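
One thing that may explain part of the inconsistency regardless of the ring implementation: statsrelay hashes the raw statsd bucket name (e.g. foo.bar.john), while bucky compares the Graphite paths that statsd later writes, which carry the stats. and stats_counts. prefixes. The same hash function applied to prefixed and unprefixed names will generally land on different nodes. A minimal sketch (plain fnv1a modulo, not the exact ring either tool builds):

package main

import (
	"fmt"
	"hash/fnv"
)

// bucket maps a name onto one of n nodes with fnv1a + modulo.
func bucket(name string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(name))
	return h.Sum32() % n
}

func main() {
	names := []string{
		"foo.bar.john",              // what statsrelay hashes
		"stats.foo.bar.john",        // what statsd writes to carbon
		"stats_counts.foo.bar.john", // ditto, for the raw counter
	}
	for _, name := range names {
		fmt.Printf("%-28s -> node-%d\n", name, bucket(name, 2)+1)
	}
}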

panic: runtime error: slice bounds out of range

I've run into a strange problem recently which I did not have before, and I am slowly losing my mind trying to get to the bottom of it :)

This is my test environment (using RHEL 7):

I'm running statsrelay with the following command options on serverx (2 backend statsd hosts):

statsrelay -bind=10.0.0.1 -port=12001 -prefix=serverx -verbose=true 10.0.0.10:12000 10.0.0.20:12000
2015/05/12 15:11:59 Listening on 10.0.0.1:12001
2015/05/12 15:11:59 Setting socket read buffer size to: 212992
2015/05/12 15:11:59 Rock and Roll!

Then on my test client I'm sending it a statsd metric using netcat like so:

echo "a.b.c:1|c" | nc -w 1 -u 10.0.0.1 12001

As soon as the metric hits serverx on 12001 (UDP), statsrelay crashes and the following appears in the console:

panic: runtime error: slice bounds out of range

goroutine 27 [running]:
runtime.panic(0x53b7c0, 0x624c6f)
        /usr/lib/golang/src/pkg/runtime/panic.c:279 +0xf5
main.getMetricName(0xc208380011, 0x0, 0xfffef, 0x0, 0x0)
        /home/dave/statsrelay/statsrelay.go:79 +0xa6
main.handleBuff(0xc208380000, 0x11, 0x100000)
        /home/dave/statsrelay/statsrelay.go:139 +0x1ea
created by main.runServer
        /home/dave/statsrelay/statsrelay.go:276 +0x1db

goroutine 16 [select]:
main.runServer(0x7fffec2bc801, 0x7, 0x2ee1)
        /home/dave/statsrelay/statsrelay.go:273 +0x2db
main.main()
        /home/dave/statsrelay/statsrelay.go:341 +0x694

goroutine 19 [finalizer wait]:
runtime.park(0x416570, 0x6283d8, 0x626f09)
        /usr/lib/golang/src/pkg/runtime/proc.c:1369 +0x89
runtime.parkunlock(0x6283d8, 0x626f09)
        /usr/lib/golang/src/pkg/runtime/proc.c:1385 +0x3b
runfinq()
        /usr/lib/golang/src/pkg/runtime/mgc0.c:2644 +0xcf
runtime.goexit()
        /usr/lib/golang/src/pkg/runtime/proc.c:1445

goroutine 20 [syscall]:
os/signal.loop()
        /usr/lib/golang/src/pkg/os/signal/signal_unix.go:21 +0x1e
created by os/signal.init·1
        /usr/lib/golang/src/pkg/os/signal/signal_unix.go:27 +0x32

goroutine 21 [IO wait]:
net.runtime_pollWait(0x7fc7fa7be708, 0x72, 0x0)
        /usr/lib/golang/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc20802c060, 0x72, 0x0, 0x0)
        /usr/lib/golang/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc20802c060, 0x0, 0x0)
        /usr/lib/golang/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).Read(0xc20802c000, 0xc208480000, 0x100000, 0x100000, 0x0, 0x7fc7fa7bd2b8, 0xb)
        /usr/lib/golang/src/pkg/net/fd_unix.go:242 +0x34c
net.(*conn).Read(0xc20803a018, 0xc208480000, 0x100000, 0x100000, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/pkg/net/net.go:122 +0xe7
main.readUDP(0x7fffec2bc801, 0x7, 0x2ee1, 0xc208054000)
        /home/dave/statsrelay/statsrelay.go:235 +0x6fa
created by main.runServer
        /home/dave/statsrelay/statsrelay.go:270 +0x132

Any help/pointers would be greatly appreciated! Most likely I've completely missed the boat here and am doing something wrong; I have been stuck on this issue for a couple of hours now!

Thanks!
Dave

Issue with UDP input message cutting

Since the beginning of statsrelay there has been a problem with handling messages on the UDP input.

What I observe is that with some clients, messages are sliced into 1024-byte chunks (or larger, depending on the buffer size set in the system or in the tool). Example using nc:

cat test | nc -w 1 -u 127.0.0.1 8125

One file with multiple metrics inside, each terminated with \n:

2017/04/19 10:55:29 Buffer from server run
2017/04/19 10:55:29 input buffer debug &jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-re
1024 bytes read from 127.0.0.1:56579
2017/04/19 10:55:29 input buffer debug &quests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:4|g
jetty.handler.put-requests.15MinuteRate:0|g
jetty.handler.put-requests.1MinuteRate:0|g
jetty.handler.put-requests.10MinuteRate:

1024 bytes read from 127.0.0.1:56579

This produces very bad results, like metrics ending up in multiple random locations in Graphite. When we add tags, they are sliced in the wrong places and generate multiple garbage metrics.
I found this when I started to use nginx-statsd with statsrelay with a huge number of metrics, and multiple trash metrics started to appear in Graphite.

Setting -bufsize is OK, but some apps do not respect it; setting net.core.rmem_max is not always the right thing and does not seem to solve this problem in all cases.

I think in this scenario it would be better to somehow test whether the last line in the buffer is complete, maybe in this place?

https://github.com/jjneely/statsrelay/blob/master/statsrelay.go#L374

@jjneely can you take a look and tell me what you think about this? You can test it using nc with a bigger file containing multiple newline-separated metrics.
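
For what it's worth, the check I have in mind is roughly the following: split the buffer at the last newline, forward only the complete lines, and carry the trailing fragment over to be prepended to the next read. This only helps when the sender is chunking an ordered stream, as nc does; it is a sketch of the idea, not statsrelay's actual code:

package main

import (
	"bytes"
	"fmt"
)

// splitComplete returns the complete, newline-terminated lines in buf
// and the trailing partial line (if any) to carry over to the next read.
func splitComplete(buf []byte) (complete, remainder []byte) {
	i := bytes.LastIndexByte(buf, '\n')
	if i < 0 {
		return nil, buf // no newline at all: everything is a fragment
	}
	return buf[:i+1], buf[i+1:]
}

func main() {
	chunk := []byte("a.b.c:1|c\nd.e.f:2|g\ng.h.i:3|")
	complete, rest := splitComplete(chunk)
	fmt.Printf("forward: %q\ncarry over: %q\n", complete, rest)
}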
