Giter Club home page Giter Club logo

Comments (22)

brianlmoon avatar brianlmoon commented on September 18, 2024 2

Release 2.0.0 with these changes and more. https://github.com/brianlmoon/net_gearman/releases/tag/2.0.0

from net_gearman.

j03k64 avatar j03k64 commented on September 18, 2024 1

@bdeshong fwiw, we've been running that error code handling in prod since Oct 13, 2017 with no issues. I've just been running off my own fork: tave@b4c7eb3

I don't think that change is MacOS specific, it is just more prevalent on MacOS.
image

source

So it should be fine to ignore these error codes.

from net_gearman.

brianlmoon avatar brianlmoon commented on September 18, 2024

You may end up with a lot of CLOSE_WAIT connections on the gearmand server. What is your use case where the connection close time is a performance issue? Our workers run for an hour, staying connected the whole time.

from net_gearman.

caffeinejolt avatar caffeinejolt commented on September 18, 2024

My workers do stay connected most of the time. Its the clients (or whatever they are called in the gearman world - the job submitters) that connect/disconnect frequently. The line causes the disconnection to take an extra 1-2 seconds for the clients (end-users see the latency). I see the comment here: http://php.net/manual/en/function.socket-close.php recommending the approach you take and that does make sense. I may just end up sticking with the older version that did not support non-blocking ops since you are right - I could end up with a lot of CLOSE_WAIT connections.

I wish I had more constructive feedback - I have never dealt with non-blocking sockets. Do you see the disconnection lag with/without that line when you run clients?

from net_gearman.

brianlmoon avatar brianlmoon commented on September 18, 2024

I did not think about the client side. I do not see that lag. What OS and PHP version are you using?

from net_gearman.

caffeinejolt avatar caffeinejolt commented on September 18, 2024

Ok - I dug some further and am not sure the issue is specifically socket_set_block any more. However, there is a speed issue compared to the old version I run. Here is the difference in run times with only swapping out the version of net_gearman:

New:
real 0m1.161s
user 0m0.011s
sys 0m0.005s

Old (uses blocking sockets):
real 0m0.150s
user 0m0.015s
sys 0m0.006s

I am not sure what is causing the slowdown - but it seems to be somewhere in Connection::close - here is some profiling I ran: https://drive.google.com/file/d/0B02yOk4TFZH9RHd5SGQ2bTAxSDg/view?usp=sharing

So when I remove socket_set_block, it spends no time in the Connection::close do/while loop whereas when socket_set_block is used, it spends 88% of total time in the Connection::close do/while loop. So the major slowdown is here in Connection::close:

        do {
            $buf = @socket_read($this->socket, 8192);
        } while (strlen($buf) > 0);

When I remove those three lines - it runs much faster:

real 0m0.255s
user 0m0.014s
sys 0m0.006s

So narrowing things down a bit more!

from net_gearman.

caffeinejolt avatar caffeinejolt commented on September 18, 2024

PHP 5.3.8

from net_gearman.

caffeinejolt avatar caffeinejolt commented on September 18, 2024

Linux 2.6

from net_gearman.

brianlmoon avatar brianlmoon commented on September 18, 2024

PHP 5.3.8 is quite old. It has been end of life for over 3 years. I think I have not used it for over 5 years. This code was definitely not built or tested against it. I should update the composer requirements to indicate a minimum version of 5.6.

If you are stuck on PHP 5.3, you could try removing the @ on $buf = @socket_read($this->socket, 8192); and see if there are errors not getting caught there. You could also put a timer in that loop to see if it's looping and reading data off (which you should understand why that is happening) or if the socket_read is taking 1 second.

Are you passing in a timeout of 1 second perhaps when you create the connection?

from net_gearman.

caffeinejolt avatar caffeinejolt commented on September 18, 2024

Totally valid point on the PHP version - I will update to latest 5.6.x and test this out again and get back to you. Thanks again.

from net_gearman.

bdeshong avatar bdeshong commented on September 18, 2024

Can you explain to use why Net_Gearman_Connection::close() calls socket_set_block()?

We're using a two-second timeout on our end, and we're finding that, upon close() call, we're just blocked waiting to read from the socket for two seconds.

See the code in question here.

If we comment out the call to socket_set_block(), the close() method completes super fast. Even the 250ms or 1000ms blocking time makes a difference to us when we're trying to submit jobs as quickly as possible.

What was the thinking behind setting the socket to blocking? Or might it make sense to, at instantiation time or somehow, letting the caller indicate that they don't want to block when closing?

Thanks in advance!

from net_gearman.

brianlmoon avatar brianlmoon commented on September 18, 2024

I will be honest, all of those changes were in response to an outage we had where workers, not clients were having connection issues that caused gearmand to fill up with CLOSE_WAIT connections. We use a 200ms timeout on the client side, so we likely never feel this problem. Looking at our internal repo where all those changes were originally made, the close function evolved over time and it's possible that not all of those tactics are required to get a clean shtudown. I am going to do some testing.

from net_gearman.

brianlmoon avatar brianlmoon commented on September 18, 2024

@bdeshong Can you give the https://github.com/brianlmoon/net_gearman/tree/timeout_close_refactor branch a try and see if that resolves your issue? Setting the socket to blocking before closing is still the right choice. Otherwise, you don't really know if it closes and you can end up with a gearmand that runs out of file descriptors because of a ton of CLOSE_WAIT connections. The issue appears to be that the read loop was waiting on the timeout which is silly. If we can close the connection clean, then we don't need to try to read the buffer. If we are going to read the buffer, there is no reason to wait for whatever timeout was set for normal operations. We use a much lower timeout now when reading the buffer. But, in my tests, I never hit it. 99.9999% of the time the connection will close fine and the buffer read is never needed.

Further reading:

from net_gearman.

bdeshong avatar bdeshong commented on September 18, 2024

@brianlmoon Ah, this all makes sense, thanks for the explanation. We'll give this other branch a try in dev this morning. Stay tuned...

from net_gearman.

j03k64 avatar j03k64 commented on September 18, 2024

@brianlmoon should we open a pull request so we can discuss that timeout_close_refactor branch? It doesn't work out of the box (with at least 7.1) since you are calling askForWork with no parameters and the parameters on askForWork in Worker.php aren't optional.

This branch definitely makes the connection close much faster on MacOS.

Still doing some testing, but here is what I have:
https://github.com/brianlmoon/net_gearman/compare/timeout_close_refactor...tave:upstream-changes-20180624?expand=1

from net_gearman.

j03k64 avatar j03k64 commented on September 18, 2024

From testing my fork of your branch (with a couple minor updates to fix the method signature) it looks like it performs much, much better.

Also, I've been running that if ($err && $err != 11 && $err != 35) { check in Client.php in production for quite a while so I am confident about that. I can't get through a test suite without having that there on MacOS. It seems like it was already being handled in a different block of code in Connection.php where both error codes 11 and 35 are ignored.

from net_gearman.

bdeshong avatar bdeshong commented on September 18, 2024

@brianlmoon / @jpirkey: Overall, this branch performs much better for us. However, we definitely see the same issue that @jpirkey described here. FWIW, we're running on PHP 7.2.6.

We believe that @jpirkey's change fixes the issue, so we'd love to see his changes accepted.

For the sake of keeping moving on our end and not being directly dependent on a dev branch on your end, we're going to fork from this branch, then apply @jpirkey's change that's not macOS-specific.

But with your change + @jpirkey's, we're a πŸ‘on this.

from net_gearman.

brianlmoon avatar brianlmoon commented on September 18, 2024

Let’s get that merged. Sounds like a winner.

from net_gearman.

bdeshong avatar bdeshong commented on September 18, 2024

Looks like @brianlmoon's changes from February are good. @brianlmoon, do you plan to open a PR for those or just merge this branch of yours to master?

@jpirkey, do you plan to open a PR from your forked changes to get them integrated back into the project, or did you specifically want to rely on your fork? I'd love to not have to rely on a fork on the ShootProof side, but also don't want to rush this for the sake of calling it done. πŸ˜„

I'm not the author of any of this code, but I'm happy to wrangle various commits from forks into a single PR if you all need me to!

from net_gearman.

j03k64 avatar j03k64 commented on September 18, 2024

I created a pull request against his branch (#40).

I don't care to run my own fork either so assuming we are all good with this, I don't really have a preference on how it gets done. And while I have the ability to merge, I'd still prefer to defer to @brianlmoon on that.

from net_gearman.

brianlmoon avatar brianlmoon commented on September 18, 2024

I will get this all merged in.

from net_gearman.

j03k64 avatar j03k64 commented on September 18, 2024

Made one last minor change to fix a variable name in Connection->close:
https://github.com/brianlmoon/net_gearman/pull/40/files#diff-7f309b838cae721a520e158e201fb4d5L424

from net_gearman.

Related Issues (19)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.