Comments (24)
hi phil,
thanks for your report!
the waitgroup going below zero should - as you describe - only happen when a server needs to get hammered. i was not happy with the recover solution there but it seemed to work... not for you, it seems, so let's look for a better solution.
do you have any test code you could provide to help me reproduce your issue?
cheers
_f
from endless.
Hi @fvbock,
sorry for my belated response. I'm currently quite busy over here.
I analyzed my logs very thoroughly and couldn't find any "[STOP - Hammer Time] Forcefully shutting down parent" message. This means that the for-loop that counts the WaitGroup down to zero (in the hammerTime() func) wasn't executed in the crash scenario. Therefore my assumption is that there is no causal link between the hammering and the crash.
I wish I could isolate the problem with a test case or something. But unfortunately it's not that easy. If I could do so, it probably wouldn't be too hard to find a solution.
Meanwhile my Close() func looks like this:
func (w endlessConn) Close() error {
	// prevent the server from crashing (this is only a mitigation!)
	// TODO: find the root cause for the problem...
	defer func() {
		if err := recover(); err != nil {
			log.Println("endless WaitGroup went below 0 when closing connection", err)
		}
	}()
	w.server.wg.Done()
	return w.Conn.Close()
}
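As a side note on what the recover guards against: sync.WaitGroup panics as soon as its counter drops below zero. Below is a minimal sketch, independent of endless, that triggers and recovers the same kind of panic. The helper name doneTwice is made up for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// doneTwice is a hypothetical helper (not part of endless). It calls Done
// once more than Add and returns whatever value recover() yields.
func doneTwice() (recovered interface{}) {
	var wg sync.WaitGroup
	defer func() {
		// Same idea as the mitigation above: swallow the panic.
		recovered = recover()
	}()
	wg.Add(1)
	wg.Done() // counter goes from 1 to 0
	wg.Done() // counter would go below 0, so this call panics
	return nil
}

func main() {
	fmt.Println("recovered:", doneTwice())
}
```

The recover only hides the symptom: the counter is still wrong afterwards, which is why it is a mitigation and not a fix.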
And from the log statement I try to figure out patterns that show up when the problem occurs. But to be honest, that hasn't shown a real pattern so far. There are situations where we've got a lot of traffic and everything is fine, even for a longer period of time. And then there are situations where the server only has to handle a couple of requests and the problem shows up. All in all very weird.
That's all I can tell you for now. I will continue to monitor the problem and inform you if I gain more information.
Bye,
Phil
from endless.
I have the same problem and can reproduce it easily. It happens every time I stress test my server, and also when I just reload my page quickly in the browser.
P.S.
Forgot to mention that the server fails after the stress test is complete. While requests are still coming in, the server does not fail.
from endless.
@flamedmg thanks! is this maybe related? johnniedoe@715b6ce i completely missed that PR :-( sounds like it could have something to do with it.
i will look into it tomorrow.
is the stress test you're running something you could post in a gist so that i could use it?
from endless.
I'm using wrk to do the stress testing. The command looks like this: wrk -t100 -c5000 -d120s http://www.mytest.com:8080
I'm not the owner of mytest.com; I overrode it in my /etc/hosts file.
Hope this helps. In the meantime I can revert the changes you mentioned and test again.
Thanks & Regards
Dmitry
from endless.
I figured out that johnniedoe@715b6ce has not been accepted by you, so I added the change myself to my local copy. The server still fails, even after a 3 second stress test.
from endless.
What I figured out is that the Close method on the endlessConn object is called multiple times for the same connection object. I discovered this by giving each connection a unique identifier and seeing some of them in the log at least twice. This does not happen all the time. The fewer connections there are, the lower the chance of getting this behavior. 100 connections is enough to get it in 100% of cases on my machine.
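One way to make a repeated Close harmless is to decrement the WaitGroup only on the first call. The following is a rough sketch of that idea under stated assumptions, not the code endless uses and not necessarily what any fix for this issue does. The type onceConn and the helper closeRepeatedly are hypothetical, and net.Pipe stands in for a real accepted connection.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// onceConn is a hypothetical wrapper (not the endlessConn type). It needs a
// pointer receiver because it embeds a sync.Once.
type onceConn struct {
	net.Conn
	wg   *sync.WaitGroup
	once sync.Once
}

// Close always forwards to the wrapped connection but calls wg.Done only
// the first time, so extra Close calls cannot push the counter below zero.
func (c *onceConn) Close() error {
	err := c.Conn.Close()
	c.once.Do(c.wg.Done)
	return err
}

// closeRepeatedly closes one wrapped connection n times and reports whether
// that worked without a panic.
func closeRepeatedly(n int) (ok bool) {
	defer func() { ok = recover() == nil }()

	var wg sync.WaitGroup
	client, server := net.Pipe() // in-memory stand-in for a TCP connection
	defer client.Close()

	wg.Add(1)
	c := &onceConn{Conn: server, wg: &wg}
	for i := 0; i < n; i++ {
		c.Close()
	}
	wg.Wait() // returns immediately: the counter is exactly zero
	return true
}

func main() {
	fmt.Println("closed 3 times without panic:", closeRepeatedly(3))
}
```

The trade-off is one sync.Once per connection; an alternative would be to decrement only when the underlying Close reports success.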
from endless.
i tried a bunch of things, but i could not reproduce this so far. i used variations of this one: https://github.com/fvbock/endless/blob/master/examples/testserver.go i dropped the delay in the handler and used 1k, 10k, 100k and 1000k payloads that i created from /dev/urandom.
i tested with the server being restarted while running the test and without.
i did use ab instead of wrk: ab -c 2000 -n 2000000 http://localhost:4242/foo
i am running
Linux 3.19.0-21-generic #21-Ubuntu SMP
go version go1.4.2 linux/amd64
what are you guys running?
from endless.
Your testserver is not failing under wrk either. I tried several timeout values.
I'm running Linux Mint 17.1 and go 1.4.2 linux/amd64
from endless.
that's a start. can you post (some or all) code of your test server? what are the differences....? i guess yours is more complex?
from endless.
@justphil can you see any general difference between the basic https://github.com/fvbock/endless/blob/master/examples/testserver.go and your server code?
from endless.
Sorry, @fvbock @flamedmg I'm currently very busy due to my job. Will take a look at it on the weekend. And will post details about my system configuration as well.
BTW.
I think that I can now see an emerging crash pattern. The first process seems to run "endlessly" without any problems. The problems start to occur after the first hot redeployment, when the parent process passes the listening socket to the child process. From this point on the mentioned problem starts to show up.
from endless.
@justphil @flamedmg sorry for taking a while again... i tried a few more things but with a server based on the testserver i was not able to reproduce the behaviour both of you observed.
any code that produces it would be helpful at this point.
from endless.
I think I found the problem. I added some code in endlessConn.Close to identify the connection being closed and the call stack. It turns out that a connection is closed twice. Once from net/http/server.go:274 and once from net/http/server.go:1071.
So I guess whenever the connection gets interrupted while writing, it will be closed twice. But it doesn't crash the app immediately. The crash happens when the last few connections (depending on how many times it happened) are about to get closed. Here are the stack traces of both closing actions. As you can see, they happened at almost the same time.
2015/08/11 07:22:59 Closing connection #379
/root/repo/go/src/github.com/fvbock/endless/endless.go:514 (0x4c0c0f)
<autogenerated>:16 (0x4c1de2)
/usr/lib/go/src/net/http/server.go:274 (0x493eed)
/usr/lib/go/src/bufio/bufio.go:562 (0x50e375)
/usr/lib/go/src/net/http/server.go:1005 (0x498d13)
/usr/lib/go/src/net/http/server.go:977 (0x498a77)
<autogenerated>:47 (0x4b1c29)
/usr/lib/go/src/io/io.go:364 (0x4c6af8)
/usr/lib/go/src/net/http/server.go:391 (0x494a39)
/usr/lib/go/src/bufio/bufio.go:433 (0x50da94)
/usr/lib/go/src/io/io.go:354 (0x4c6932)
/root/repo/go/src/bs2proxy/proxy.go:221 (0x407179)
/root/repo/go/src/bs2proxy/controller.go:273 (0x404537)
/root/repo/go/src/bs2proxy/main.go:33 (0x40afa5)
/usr/lib/go/src/net/http/server.go:1265 (0x49a2b1)
/usr/lib/go/src/net/http/server.go:1541 (0x49ba1d)
/usr/lib/go/src/net/http/server.go:1703 (0x49c33a)
/usr/lib/go/src/net/http/server.go:1204 (0x499e07)
/usr/lib/go/src/runtime/asm_amd64.s:2232 (0x448f51)
In between I got an error from io.Copy complaining about "Broken pipe", which pretty much explains what caused the closing action.
2015/08/11 07:22:59 Closing connection #379
/root/repo/go/src/github.com/fvbock/endless/endless.go:514 (0x4c0c0f)
<autogenerated>:16 (0x4c1de2)
/usr/lib/go/src/net/http/server.go:1071 (0x4990b0)
/usr/lib/go/src/net/http/server.go:1134 (0x4ac0fb)
/usr/lib/go/src/net/http/server.go:1217 (0x499ab9)
/usr/lib/go/src/runtime/asm_amd64.s:2232 (0x448f51)
from endless.
@ledzep2 That's exactly what I found, but I can't reproduce it in a small sample app, only on my pretty large code base.
from endless.
@flamedmg Did you try manually interrupting the connection while transferring data (like killall -9 wrk)? Theoretically that should do the trick.
from endless.
No, I did not terminate the process in any way; please check my earlier messages in this thread. I found the issue during load testing. What the tool does is open the specified number of keep-alive connections and make requests. After that it closes them. During closing I found that some connections are closed two or even more times. That makes the open connection counter negative and the library fails. I'm still waiting for a solution while our app is not yet in production. When it goes to production I think I will either do what @justphil did, basically catching and silencing all exceptions in that part of the code (ugly, but it will work), or work on rewriting that logic.
from endless.
@flamedmg I read your previous posts. I'm trying to locate the problem in the source code and reproduce it. Report back later.
from endless.
@justphil @flamedmg I reproduced it, folks. Replace the handler in testserver.go with the following:
func handler(w http.ResponseWriter, r *http.Request) {
	buf := make([]byte, 1000*1000*50)
	br := bytes.NewReader(buf)
	io.Copy(w, br)
}
Then wget /foo and interrupt it with ctrl+c right after it starts. Crashes every time.
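For anyone who wants to script the interruption instead of using wget and ctrl+c, here is a rough sketch using only the standard library. It assumes a throwaway httptest server rather than endless, so it shows the aborted transfer and the resulting write error but not the crash itself. The names bigHandler and abortedDownload are made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// bigHandler mirrors the 50 MB handler above, but additionally reports the
// result of io.Copy on errc so the caller can inspect it.
func bigHandler(errc chan<- error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		buf := make([]byte, 1000*1000*50)
		_, err := io.Copy(w, bytes.NewReader(buf))
		errc <- err
	}
}

// abortedDownload requests /foo from a temporary test server, reads only the
// first n bytes and then closes the response body early, which makes the
// client drop the connection mid-transfer. It returns the number of bytes
// read, the error io.Copy saw in the handler, and any client-side error.
func abortedDownload(n int) (read int, copyErr error, err error) {
	errc := make(chan error, 1)
	srv := httptest.NewServer(bigHandler(errc))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/foo")
	if err != nil {
		return 0, nil, err
	}
	read, _ = io.ReadFull(resp.Body, make([]byte, n))
	resp.Body.Close() // stop reading long before the 50 MB are through
	return read, <-errc, nil
}

func main() {
	read, copyErr, err := abortedDownload(4096)
	fmt.Println("bytes read:", read)
	fmt.Println("handler io.Copy error:", copyErr)
	fmt.Println("client error:", err)
}
```

The handler-side error is the same class of failure as the "Broken pipe" mentioned earlier in the thread; the exact message depends on the platform.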
from endless.
I created a pull request for this with the commit above. #11
from endless.
with the handler @ledzep2 posted i was also able to reproduce the problem and #11 fixed it. i merged the PR.
@justphil @flamedmg can you confirm that it fixes the problem in your scenarios too?
from endless.
@justphil @flamedmg closing this for now - please let me know if you still experience problems.
from endless.
Thank you for the further investigation @fvbock @ledzep2 and @flamedmg.
Will try the fix in our staging environment and report back if there are still problems.
Bye, Phil
from endless.
I think this still happens. See #13
from endless.