
Comments (5)

paddycarver commented on July 20, 2024

This is "expected" but "unexpected". Let me explain. It is expected in that it is a condition the software is built to expect and self-remedy--that message is just for debug purposes. It is unexpected in that it is an edge case of the algorithm that could cause errors if left uncorrected. And in the course of writing this message, I noticed a bug (which would lead to too many of these messages being generated), so I'm glad you brought it up.

That message is supposed to be triggered when Node A receives state information from Node B, but Node B updated its state tables after Node B sent the message. This is to prevent a race condition in which Node B's state changes after it sends the state to Node A but before Node A properly "joins" the cluster and is, therefore, informed of changes to Node B's state.

What actually happens is that the message is triggered when Node A receives state information from Node B, but Node A updated its state tables after Node B sent the message. This is decidedly chattier and generates more false positives, but it eventually leads to the same result.
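As a rough illustration of the difference between the intended check and the buggy one, here is a minimal Go sketch. Every name in it (`Node`, `StateMessage`, `IntendedCheck`, `ActualCheck`) is hypothetical and not taken from Wendy's code, and it uses plain logical ticks in place of whatever the real implementation compares.

```go
package main

import "fmt"

// Node tracks only the logical tick of its last state-table change.
// Hypothetical type for illustration; not Wendy's API.
type Node struct {
	Name       string
	LastUpdate uint64
}

// StateMessage carries the tick at which the sender sent its state.
// Hypothetical type for illustration.
type StateMessage struct {
	SentAt uint64
}

// IntendedCheck reports a race only when the sender changed its state
// tables after it sent the message.
func IntendedCheck(sender Node, msg StateMessage) bool {
	return sender.LastUpdate > msg.SentAt
}

// ActualCheck mirrors the bug described above: it looks at the receiver's
// own last update instead of the sender's.
func ActualCheck(receiver Node, msg StateMessage) bool {
	return receiver.LastUpdate > msg.SentAt
}

func main() {
	b := Node{Name: "B", LastUpdate: 3}
	msg := StateMessage{SentAt: 5}      // B has not changed since sending
	a := Node{Name: "A", LastUpdate: 6} // A touched its own tables afterwards

	fmt.Println("intended:", IntendedCheck(b, msg)) // prints "intended: false"
	fmt.Println("actual:", ActualCheck(a, msg))     // prints "actual: true"
}
```

In this sketch the buggy check fires whenever the receiver has updated its own tables since the message was sent, which is why it produces the extra debug messages.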

I need to do some serious work on Nodes joining the cluster and communicating Node state as part of #4 and #10, so I'll fix this bug when I do that. Essentially, I need Nodes to "formally" announce their presence to the cluster, and trigger the race condition check at that point, not before. This will bring the implementation in line with the paper.

Thanks for raising the issue. Sorry for the problem. :(

from wendy.

paddycarver commented on July 20, 2024

As I was working on #10, I came across the part in the paper that specifically calls for this function:

Pastry uses an optimistic approach to controlling concurrent node arrivals and departures. Since the arrival/departure of a node affects only a small number of existing nodes in the system, contention is rare and an optimistic approach is appropriate. Briefly, whenever a node A provides state information to a node B, it attaches a timestamp to the message. B adjusts its own state based on this information and eventually sends an update message to A (e.g., notifying A of its arrival). B attaches the original timestamp, which allows A to check if its state has since changed. In the event that its state has changed, it responds with its updated state and B restarts its operation.

Of course, as #4 demonstrates, relying on clock time in a distributed system is a Bad Idea™, so we don't want to use a timestamp for that. The original suggestion I heard was to use a vector clock, which would certainly do the trick. I think, however, based on the usage, it may be overkill. Really, all we're trying to do is determine whether the state that a joining node based its own state tables on changed before that node officially joined the cluster and began receiving state updates.

To that end, I'm starting to think that a simple version number for each state table would suffice. The number is incremented whenever the state table is altered, and included when sending the state table to other nodes. Those nodes then include that number when they announce their presence to the cluster, and if there's a mismatch, the node in the cluster sends the new state to the joining node.

Any thoughts or objections?


alecthomas commented on July 20, 2024

I can't comment with any authority, not knowing the design of Wendy, but there is a vector clock implementation for Go that could be useful if you want that route.


paddycarver commented on July 20, 2024

Yeah, it just hasn't been brought up to speed to work with Go1 yet. So I'd have to update it and submit the patch, which isn't the end of the world. I just think a vector clock may be more information than, strictly speaking, is necessary for this. Versioning the state tables with a uint64 is sufficient, I think.

The biggest change is working on the joining algorithm, but I've got that all whiteboarded out, and I think it should be easy enough to implement. I've got some stuff that needs doing for my job, so I can't focus on this as much as I'd like, but I'll get the change in ASAP.


paddycarver commented on July 20, 2024

This has been resolved as of the beta1 release.

