Comments (24)

fvbock commented on July 3, 2024

hi phil,

thanks for your report!

the waitgroup going below zero should - as you describe - only happen after the server gets hammered (i.e. after hammerTime() has forcefully counted the WaitGroup down). i was not happy with the recover solution there, but it seemed to work... not for you, it seems, so let's look for a better solution.

do you have any test code you could provide to help me reproduce your issue?

cheers
_f

justphil commented on July 3, 2024

Hi @fvbock,

sorry for my belated response. I'm currently quite busy over here.

I analyzed my logs very thoroughly and couldn't find any [STOP - Hammer Time] Forcefully shutting down parent. So this means that the for-loop that counts the WaitGroup down to zero (in the hammerTime() func) hasn't been executed in the crash scenario. Therefore my assumption is that there is no causal link between the hammering and the crash.

I wish I could isolate the problem with a test case or something, but unfortunately it's not that easy. If I could, it probably wouldn't be too hard to find a solution.

Meanwhile, my Close() func looks like this:

func (w endlessConn) Close() error {
    // prevent the server from crashing (this is only a mitigation!)
    // TODO: find the root cause for the problem...
    defer func() {
        if err := recover(); err != nil {
            log.Println("endless WaitGroup went below 0 when closing connection", err)
        }
    }()
    w.server.wg.Done()
    return w.Conn.Close()
}

From the log statement I'm trying to figure out patterns that show up when the problem occurs. But to be honest, no real pattern has emerged so far. There are situations where we get a lot of traffic and everything is fine, even for a longer period of time, and then there are situations where the server only has to handle a couple of requests and the problem shows up. All in all, very weird.

That's all I can tell you for now. I will continue to monitor the problem and let you know if I find out more.

Bye,
Phil

flamedmg commented on July 3, 2024

I have the same problem and can reproduce it easily. It happens every time I stress test my server, and also when I just reload my page quickly in the browser.

P.S.
Forgot to mention that the server fails after the stress test is complete. While requests are still coming in, the server does not fail.

fvbock commented on July 3, 2024

@flamedmg thanks! is this maybe related? johnniedoe@715b6ce - i completely missed that PR :-( sounds like it could have something to do with it.

i will look into it tomorrow.

is the stress test you're running something you could post in a gist so that i could use it?

flamedmg commented on July 3, 2024

I'm using wrk to do the stress testing. The command looks like this: wrk -t100 -c5000 -d120s http://www.mytest.com:8080

I'm not the owner of mytest.com; I overrode it in my /etc/hosts file.

Hope this helps. In the meantime I can revert the change you mentioned and test again.

Thanks & Regards
Dmitry

flamedmg commented on July 3, 2024

I figured out that johnniedoe@715b6ce has not been merged by you, so I added the change myself to my local copy. The server still fails, even after a 3-second stress test.

flamedmg commented on July 3, 2024

What I figured out is that the Close method on the endlessConn object is called multiple times for the same connection object. I discovered this by giving each connection a unique identifier and then seeing some identifiers in the log at least twice. This does not happen all the time: the fewer connections there are, the lower the chance of hitting this behavior. 100 connections is enough to trigger it in 100% of cases on my machine.
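
(For illustration, a minimal sketch of this kind of instrumentation - the wrapper type, id counter and log format here are made up for the example; the actual change was presumably made inside a local copy of endless itself.)

package instrument

import (
    "log"
    "net"
    "sync/atomic"
)

var nextConnID uint64

// idConn is an illustrative wrapper (not part of endless): it tags a net.Conn
// with a unique id and logs every Close call, so a connection that is closed
// twice shows up as the same id appearing twice in the log.
type idConn struct {
    net.Conn
    id     uint64
    closes int32
}

func wrapConn(c net.Conn) *idConn {
    return &idConn{Conn: c, id: atomic.AddUint64(&nextConnID, 1)}
}

func (c *idConn) Close() error {
    n := atomic.AddInt32(&c.closes, 1)
    log.Printf("closing connection #%d (Close call %d)", c.id, n)
    return c.Conn.Close()
}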

fvbock commented on July 3, 2024

hi @justphil @flamedmg

i tried a bunch of things, but i could not reproduce this so far. i used variations of this https://github.com/fvbock/endless/blob/master/examples/testserver.go. i dropped the delay in the handler and used 1k, 10k, 100k and 1000k payloads that i created from /dev/urandom.

i tested with the server being restarted while running the test and without.

i did use ab instead of wrk: ab -c 2000 -n 2000000 http://localhost:4242/foo

i am running
Linux 3.19.0-21-generic #21-Ubuntu SMP
go version go1.4.2 linux/amd64

what are you guys running?

flamedmg commented on July 3, 2024

Your testserver is not failing under wrk either. I tried several timeout values.

I'm running Linux Mint 17.1 and go 1.4.2 linux/amd64.

fvbock commented on July 3, 2024

that's a start. can you post some (or all) of your test server code? what are the differences...? i guess yours is more complex?

fvbock commented on July 3, 2024

Your testserver is not failing under wrk either. I tried several timeout values.

@justphil can you see any general difference between the basic https://github.com/fvbock/endless/blob/master/examples/testserver.go and your server code?

justphil commented on July 3, 2024

Sorry, @fvbock @flamedmg, I'm currently very busy with my job. I will take a look at it on the weekend and post details about my system configuration as well.

BTW, I think I can now see an emerging crash pattern. The first process seems to run "endlessly" without any problems. The problems only start to occur after the first hot redeployment, when the parent process passes the listening socket to the child process.

fvbock commented on July 3, 2024

@justphil @flamedmg sorry for taking a while again... i tried a few more things but with a server based on the testserver i was not able to reproduce the behaviour both of you observed.
any code that produces it would be helpful at this point.

ledzep2 commented on July 3, 2024

I think I found the problem. I added some code in endlessConn.Close to identify the connection being closed and the call stack. It turns out that a connection can be closed twice: once from net/http/server.go:274 and once from net/http/server.go:1071.

So I guess whenever the connection gets interrupted while writing, it gets closed twice. But it doesn't crash the app immediately. The crash happens when the last few connections (depending on how many times it happened) are about to get closed. Here are the stack traces of both closing actions. As you can see, they happened almost at the same time.

2015/08/11 07:22:59 Closing connection #379
/root/repo/go/src/github.com/fvbock/endless/endless.go:514 (0x4c0c0f)
<autogenerated>:16 (0x4c1de2)
/usr/lib/go/src/net/http/server.go:274 (0x493eed)
/usr/lib/go/src/bufio/bufio.go:562 (0x50e375)
/usr/lib/go/src/net/http/server.go:1005 (0x498d13)
/usr/lib/go/src/net/http/server.go:977 (0x498a77)
<autogenerated>:47 (0x4b1c29)
/usr/lib/go/src/io/io.go:364 (0x4c6af8)
/usr/lib/go/src/net/http/server.go:391 (0x494a39)
/usr/lib/go/src/bufio/bufio.go:433 (0x50da94)
/usr/lib/go/src/io/io.go:354 (0x4c6932)
/root/repo/go/src/bs2proxy/proxy.go:221 (0x407179)
/root/repo/go/src/bs2proxy/controller.go:273 (0x404537)
/root/repo/go/src/bs2proxy/main.go:33 (0x40afa5)
/usr/lib/go/src/net/http/server.go:1265 (0x49a2b1)
/usr/lib/go/src/net/http/server.go:1541 (0x49ba1d)
/usr/lib/go/src/net/http/server.go:1703 (0x49c33a)
/usr/lib/go/src/net/http/server.go:1204 (0x499e07)
/usr/lib/go/src/runtime/asm_amd64.s:2232 (0x448f51)

In between I got an error from io.Copy complaining about "broken pipe", which pretty much explains what caused the closing action.

2015/08/11 07:22:59 Closing connection #379
/root/repo/go/src/github.com/fvbock/endless/endless.go:514 (0x4c0c0f)
<autogenerated>:16 (0x4c1de2)
/usr/lib/go/src/net/http/server.go:1071 (0x4990b0)
/usr/lib/go/src/net/http/server.go:1134 (0x4ac0fb)
/usr/lib/go/src/net/http/server.go:1217 (0x499ab9)
/usr/lib/go/src/runtime/asm_amd64.s:2232 (0x448f51)
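
(For reference, a minimal sketch of this kind of call-stack logging - the helper name and id parameter are made up here; the real instrumentation was added directly to endlessConn.Close, and the traces above look like they were formatted from runtime.Callers rather than debug.Stack.)

package instrument

import (
    "log"
    "runtime/debug"
)

// logClose is an illustrative helper: call it at the top of a Close method
// (e.g. in a local copy of endless's endlessConn.Close) to record which code
// path is closing a given connection. A connection that gets closed twice
// produces two stacks for the same id, as in the traces above.
func logClose(connID int) {
    log.Printf("Closing connection #%d\n%s", connID, debug.Stack())
}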

flamedmg commented on July 3, 2024

@ledzep2 That's exactly what I've found, but I can't reproduce it in a small sample app, only in my pretty large code base.

ledzep2 commented on July 3, 2024

@flamedmg Did you try manually interrupting the connection while transferring data (like killall -9 wrk)? Theoretically that should do the trick.

flamedmg commented on July 3, 2024

No, I didn't terminate the process in any way; please check my earlier messages in this thread. I found the issue during load testing. What the tool does is open the specified number of keep-alive connections and make requests; after that it closes them. During the closing phase I found that some connections are closed two or even more times. That makes the open connection counter go negative and the library fails. I'm still waiting on a solution, since our app is not in production yet. When it goes to production, I think I will either do what @justphil did - basically catch and silence all panics in that part of the code (ugly, but it will work) - or work on rewriting that logic.
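
(As an aside, one way to rewrite that logic would be to make the wrapped Close idempotent, roughly like the sketch below - this only illustrates the idea and is not the change that eventually went into endless.)

package sketch

import (
    "net"
    "sync"
)

// onceCloseConn is an illustrative wrapper: however many times net/http ends
// up closing the connection, the underlying Close (and, in endless's case,
// the WaitGroup decrement) only runs once, so the counter cannot go negative.
type onceCloseConn struct {
    net.Conn
    once     sync.Once
    closeErr error
    onClose  func() // e.g. the server's wg.Done in endless
}

func (c *onceCloseConn) Close() error {
    c.once.Do(func() {
        if c.onClose != nil {
            c.onClose()
        }
        c.closeErr = c.Conn.Close()
    })
    return c.closeErr
}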

ledzep2 commented on July 3, 2024

@flamedmg I read your previous posts. I'm trying to locate the problem in the source code and reproduce it. I'll report back later.

ledzep2 commented on July 3, 2024

@justphil @flamedmg I reproduced it, folks. Replace the handler in testserver.go with the following (adding bytes and io to its imports if they are not already there):

func handler(w http.ResponseWriter, r *http.Request) {
    // stream a ~50 MB zero-filled payload so the client can interrupt the
    // transfer while the server is still writing
    buf := make([]byte, 1000*1000*50)
    br := bytes.NewReader(buf)
    io.Copy(w, br)
}

Then wget /foo and interrupt it with Ctrl+C once the transfer begins. It crashes every time.
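
(With the stock example server that would be something like wget http://localhost:4242/foo, assuming it still listens on port 4242 as in the ab command earlier in the thread, interrupted with Ctrl+C while the ~50 MB response is still streaming.)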

ledzep2 commented on July 3, 2024

I created a pull request for this with the commit above: #11.

fvbock commented on July 3, 2024

with the handler @ledzep2 posted i was also able to reproduce the problem, and #11 fixed it. i merged the PR.

@justphil @flamedmg can you confirm that it fixes the problem in your scenarios too?

fvbock commented on July 3, 2024

@justphil @flamedmg closing this for now - please let me know if you still experience problems.

justphil commented on July 3, 2024

Thank you for the further investigation, @fvbock, @ledzep2 and @flamedmg.
I will try the fix in our staging environment and report back if there are still problems.

Bye, Phil

fzerorubigd commented on July 3, 2024

I think this still happens; see #13.
