Comments (9)

bbkr commented on June 21, 2024

I've noticed that if I add

my $t = AnyEvent->timer(
    'after' => 10,
    'cb' => sub { print $requestor->recv_multipart() }
);

Then I get the next message from the socket after the script has locked up.
So it looks like has_pollin does not detect every pending message: some data stays on the socket, and because not all messages were consumed, no further IO callback is generated.

Or my theory can be completely wrong :)

from perlzmq.

calid commented on June 21, 2024

Use DEALER-DEALER. REQ/REP sockets are basic types that are generally discouraged for real applications. My guess is that the REP socket is getting out of sync, and since it has to follow a strict recv, send, recv, send order, it gets stuck at that point. Using DEALER-DEALER worked without issue.

bbkr commented on June 21, 2024

The REP socket on the server follows the recv, send, recv states correctly. When the lock happens I can see that it produced 4 more messages, which are received by the client. It is the client that never gets an IO callback, despite those messages being available on the socket later.

Besides, DEALER-REP is a valid pattern and should never get "out of sync" on the REP side (and in my case it should not exceed the HWM either, because of throttling). I cannot imagine how such a desync could happen, because REP has its own buffer.

I'll try DEALER-DEALER, but I do not want an event loop in the server. That would mean receiving each task in an AE callback, which greatly complicates the task execution flow, because I cannot use $condvar->recv() to synchronize the async steps required to do the task.

bbkr commented on June 21, 2024

Ah, also, in DEALER-DEALER there can be only one peer. That's another reason why I chose DEALER-REP.

bbkr commented on June 21, 2024

OK, here is where stuff gets interesting...

I disabled the throttling line (return if $req - $rep >= 4) and got all messages.
I set throttling to 1 and also got all messages.
I set throttling to 2 and got random lockups.
I set throttling to 3 and got random lockups.
I set throttling to 500 and got rare random lockups.

So now I'm really confused. Why does the code work when one message or a large batch of messages is published on the socket at the same time, but lock up when there are only a few? I've tested it with 1_000_000 messages.
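
For readers without the original script at hand, the throttling check being varied above can be sketched roughly as follows. This is only an illustration of the quoted condition: the helper name may_send and the $window parameter are hypothetical and not part of the original client.

```perl
use strict;
use warnings;

# Hypothetical helper: true when another request may be sent without
# exceeding $window unanswered requests. This is the inverse of the
# quoted condition "return if $req - $rep >= 4".
sub may_send {
    my ($sent, $received, $window) = @_;
    return ($sent - $received) < $window ? 1 : 0;
}

my ($req, $rep, $window) = (0, 0, 4);
my @log;
for my $tick (1 .. 6) {
    if (may_send($req, $rep, $window)) {
        $req++;                 # a real client would send a request here
        push @log, 'send';
    }
    else {
        push @log, 'wait';      # window is full, wait for a reply first
    }
}
print join(' ', @log), "\n";    # prints "send send send send wait wait"

$rep++;                         # one reply arrives, one slot opens again
print may_send($req, $rep, $window) ? "can send\n" : "must wait\n";
```

With a window of 1 the client behaves like strict request/reply, and with a very large window the check almost never triggers, which matches the two settings that were reported as reliable.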

calid commented on June 21, 2024

@bbkr I'm a bit slammed with work at the moment, but I'll take a deeper look at this as soon as I have a chance. I've certainly run into weird behavior using event loops with zeromq's virtual fd in the past. Usually this comes down to not handling zeromq's edge-triggered semantics in exactly the right way. Is it possible this is the issue?

If you aren't familiar with edge-triggered vs level-triggered behavior, this article seems like a nice overview of the issues:
http://funcptr.net/2012/09/10/zeromq---edge-triggered-notification/

bbkr commented on June 21, 2024

So basically, in the edge-triggered model I must consume all messages to get the next "IO is readable" callback.
That means if something arrives after $requestor->has_pollin() returns false but before the callback exits, it will lock. And to complicate things, the "socket is readable" notification can be a false positive.

So the scenario that leads to the lock:

  • get the IO readable callback
  • enter the callback body
  • ask for pollin
  • get the reply "false"
  • meanwhile, a message arrives on the socket
  • leave the callback body

(it may also happen after consuming a few messages while in the callback)
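
To make the edge-triggered idea concrete, here is a toy model in plain Perl. It is a sketch under stated assumptions, not libzmq's actual implementation: the ToyEdgeQueue class and all of its methods are hypothetical, and a "wakeup" is recorded only when the queue goes from empty to non-empty. It shows the simpler failure of leaving unread messages behind. It does not model the exact timing window listed above.

```perl
use strict;
use warnings;

# Toy edge-triggered queue (hypothetical): a wakeup is recorded only on
# the transition from empty to non-empty, never for later messages.
package ToyEdgeQueue;

sub new {
    my ($class) = @_;
    return bless { queue => [], wakeups => 0 }, $class;
}

sub push_msg {
    my ($self, $msg) = @_;
    $self->{wakeups}++ if !@{ $self->{queue} };   # edge: empty -> non-empty
    push @{ $self->{queue} }, $msg;
    return;
}

sub has_pollin { return @{ $_[0]{queue} } ? 1 : 0 }
sub recv_msg   { return shift @{ $_[0]{queue} } }

# Returns true and clears the flag if a wakeup is pending.
sub take_wakeup {
    my ($self) = @_;
    return 0 if !$self->{wakeups};
    $self->{wakeups} = 0;
    return 1;
}

package main;

# Queue three messages at once, then run a minimal "event loop" until no
# wakeup is pending. $drain selects the callback style.
sub run_consumer {
    my ($drain) = @_;
    my $q = ToyEdgeQueue->new;
    $q->push_msg($_) for qw(a b c);

    my @seen;
    while ($q->take_wakeup) {
        if ($drain) {
            push @seen, $q->recv_msg while $q->has_pollin;
        }
        else {
            push @seen, $q->recv_msg;   # one read per wakeup, then return
        }
    }
    return scalar @seen;
}

my $single  = run_consumer(0);
my $drained = run_consumer(1);
print "one read per wakeup: $single of 3 messages\n";   # 1 of 3
print "drain per wakeup:    $drained of 3 messages\n";  # 3 of 3
```

In this model the callback that reads once per wakeup stalls with two messages still queued, because no new edge is ever generated for them.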

That means many examples linked in the ZMQ guide, and even this library's PUSH/PULL synopsis, are prone to this error.

I have no idea how to fix it in code that needs the AnyEvent loop. The obvious hack is to give up on IO monitors and use timers, but that is very inefficient.
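
One possible middle ground, offered only as a sketch and not tested against the lockups reported here, is to keep the IO watcher and factor the read loop into a helper that any callback can call. The helper name drain_socket and the FakeSocket stand-in are hypothetical. The stand-in exists only so the example runs without ZeroMQ installed, and it mimics the has_pollin and recv_multipart methods used earlier in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: read every message that is currently available and
# pass its parts to $on_message. Returns the number of messages handled.
sub drain_socket {
    my ($socket, $on_message) = @_;
    my $count = 0;
    while ($socket->has_pollin) {
        $on_message->($socket->recv_multipart);
        $count++;
    }
    return $count;
}

# Stand-in for a real socket (assumption: same two method names).
package FakeSocket;

sub new {
    my ($class, @msgs) = @_;
    return bless { queue => [@msgs] }, $class;
}

sub has_pollin     { return @{ $_[0]{queue} } ? 1 : 0 }
sub recv_multipart { return @{ shift @{ $_[0]{queue} } } }

package main;

my $socket = FakeSocket->new(['job', '1'], ['job', '2']);
my @got;
my $handled = drain_socket($socket, sub { push @got, join ':', @_ });
print "handled $handled: @got\n";   # prints "handled 2: job:1 job:2"
```

In a real script the same helper could be called from an AnyEvent->io watcher on the socket's file descriptor (get_fd in ZMQ::FFI, as far as I know) and also from an infrequent AnyEvent->timer as a safety net. The timer then only has to catch a missed notification, so it can fire far less often than a pure polling loop would.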

calid commented on June 21, 2024

@bbkr I haven't forgotten about this, just been extremely busy... I should be able to look at this in the next few weeks though.

calid commented on June 21, 2024

I was unable to reproduce the hang using the example client/server in your initial comment. Either this was an issue with older versions or it is specific to your local system. I let it run for several minutes, sending and receiving tens of thousands of messages without issue.

Perl - v5.16.3 x86_64-linux-thread-multi
ZMQ::FFI - 1.12
AnyEvent - 7.15 
FFI::Platypus - 0.84
