
caio commented on August 20, 2024

I've shipped v0.14.0 with a rework of the traces. foca now never emits anything higher than DEBUG. At that level, only high-level events are emitted (membership changes, probes, etc.), and the TRACE level exposes the innards (messages being sent, timer events, etc.).

And as I write this I realize I forgot about changing leave_cluster 😅 I suspect you won't care much given that you won't be seeing the noise anymore heh (I'll still change it)


caio commented on August 20, 2024

Yeah, foca does gossip when leaving the cluster, but consuming the instance on leave_cluster wasn't a good decision. It's cute, but one should still be able to poke at the members, flush the updates backlog and even rejoin the cluster (via reuse_down_identity or just changing the identity and announcing) if they like; there's no good reason to prevent that.

I think we should make leave_cluster take &mut self instead of consuming self. Then, if you wanna try harder at disseminating your final update(s), you could call into gossip() as many times as you want (updates_backlog() will help here); roughly like the sketch below.
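A rough sketch, assuming the proposed signature change lands (runtime here is whatever Runtime impl you already drive foca with):

// Sketch only: assumes leave_cluster takes &mut self as proposed.
// Announce the leave, then keep gossiping while the updates backlog
// still has something in it (bounded, so we don't spin forever).
foca.leave_cluster(&mut runtime)?;
for _ in 0..3 {
    if foca.updates_backlog() == 0 {
        break;
    }
    foca.gossip(&mut runtime)?;
}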

As for the traces: ack. I'm quite unhappy with their state atm; I think everything at debug level and lower is ok, but the other ones have a tendency of getting in the way, and going down the route of filtering traces via subscribers and whatnot is annoying af. I'll try to come up with something less unpleasant in the future. For now, I'll lower the level of this one to DEBUG.

I'll try to ship these along with something for #28 early this w/e


jeromegn commented on August 20, 2024

I won't see the noise anymore; however, I do notice double the cluster size for a while when I restart my cluster. I'm sort of attributing that to the unclean leave.


caio commented on August 20, 2024

hmm... The leave is definitely not unclean, and since the higher counts go down after a while it suggests that the knowledge persists in the cluster. I've released v0.15.0 just now so you can see if gossiping a bunch after leave_cluster helps, but I'm skeptical: as soon as someone thinks a node is dead they start ignoring its messages, so chances are the extra messages will be discarded.

I wonder if persisting recent exits would help here: idk how you're doing the store/load of the state during restarts, but if you have global storage you could also save "node X decided to leave at time T" and then feed the recent ones to foca.apply_many, just like you do for the active members.


jeromegn commented on August 20, 2024

The simplest way to explain how we do it is: before leaving the cluster, we iterate all members and store their serialized state per identity in sqlite. On start, we pick only the Alive or Down states and call apply_many with them. It replaces them entirely within a transaction, so there shouldn't be any stray members.
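The load side is roughly this (a sketch; load_member_rows and decode_member are stand-ins for our sqlite query and deserialization, not foca APIs):

// Sketch: re-feed persisted member states to foca on startup. The
// query only selects rows whose state was Alive or Down.
let members: Vec<Member<Id>> = load_member_rows(&db)?
    .iter()
    .map(|bytes| decode_member(bytes))
    .collect::<Result<_, _>>()?;
foca.apply_many(members.into_iter(), &mut runtime)?;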

> I wonder if persisting recent exits would help here: idk how you're doing the store/load of the state during restarts, but if you have global storage you could also save "node X decided to leave at time T" and then feed the recent ones to foca.apply_many, just like you do for the active members.

What do you mean by this? I can add a timestamp to the persisted states for sure. Or are you saying something else?


caio commented on August 20, 2024

Heh sorry it got confusing since I jumped straight to an attempt at solving the problem.

Let's say you're taking members A and B out of the cluster for a restart and doing the state save + load that you describe. By the time they come back, B is now B' and A is now A'; however, B' thinks that A is still alive (and A' thinks that B is). So they feed this stale information back to the cluster.

^ That's where I believe your double counting is coming from. And then, when A' learns that the cluster thinks A is still alive, the noise starts. Thankfully the cluster recovers nicely after a while, since we learned our lesson from previous bugs 😀

So one solution to minimize this would be to record the leave in a table that everyone can access: say, you write {timestamp, identity} to a table when leaving, and any instance can use this information during its restart. It's ok if there's some replication delay, because of said self-correcting behaviour.
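During a restart that could look like (sketch; recent_leaves is a hypothetical query over that shared table):

// Sketch: turn recently recorded leaves into Down knowledge for foca.
// recent_leaves would return the identities written to the shared
// {timestamp, identity} table within the given window.
let downs = recent_leaves(&db, Duration::from_secs(3600))?
    .into_iter()
    .map(|identity| Member::new(identity, Incarnation::default(), State::Down));
foca.apply_many(downs, &mut runtime)?;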


caio commented on August 20, 2024

Since we're talking about persisting the Down state, what I think would be best is:

  • new functionality in foca to expose down members
  • you persist the whole state (or at least Alive and Down members; whatever you choose for Suspect)
  • you use some heuristic to prune the Down members you're persisting (just to prevent the set from growing forever); see the sketch below

Fewer moving pieces on your side, and pretty trivial to expose in foca.

Forgetting the down members during a restart re-enables all the double-counting issues we had before.
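The pruning heuristic can be as simple as an age cutoff; a sketch, assuming you store a timestamp next to each serialized member (foca doesn't track that for you):

use std::time::{Duration, SystemTime};

// Hypothetical persisted row: the serialized foca Member plus the
// bookkeeping needed for pruning.
struct PersistedRow {
    bytes: Vec<u8>,
    is_down: bool,
    seen_at: SystemTime, // when this state was recorded
}

// Keep every non-Down row; drop Down rows older than a day so the
// persisted set can't grow forever.
const MAX_DOWN_AGE: Duration = Duration::from_secs(24 * 60 * 60);

fn prune(rows: &mut Vec<PersistedRow>) {
    rows.retain(|row| {
        !row.is_down
            || SystemTime::now()
                .duration_since(row.seen_at)
                .map(|age| age < MAX_DOWN_AGE)
                .unwrap_or(true)
    });
}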


jeromegn commented on August 20, 2024

> So one solution to minimize this would be to record the leave in a table that everyone can access: say, you write {timestamp, identity} to a table when leaving, and any instance can use this information during its restart. It's ok if there's some replication delay, because of said self-correcting behaviour.

Turns out the thing that uses foca is an eventually consistent data store. Fairly easy to add some data that gets replicated.

What's odd to me is: when I'm restarting the cluster, every currently known identity is going down, assuredly. The order is random though, and which members see a new state for a left member is also random. So I don't really know what the solution is, except that maybe it could be a new type of message: when leaving, send the new down state to every other member, not just a few random ones.

I'm now using QUIC instead of UDP / TCP, so updates are pretty reliable (pretty much as reliable as TCP). I figure all nodes would get the "leave" message.

Ultimately, I wouldn't have to use apply_many if the cluster discovered itself much faster. Perhaps that's what needs to be sent more? New identities?


caio commented on August 20, 2024

> What's odd to me is: when I'm restarting the cluster, every currently known identity is going down, assuredly. The order is random though, and which members see a new state for a left member is also random. So I don't really know what the solution is, except that maybe it could be a new type of message: when leaving, send the new down state to every other member, not just a few random ones.

I'm pretty sure the knowledge is getting disseminated correctly and fast enough. The problem is not that they never learn that the member went down, it's that they forget :)

When you're restarting the cluster, this scenario from a couple of comments ago repeats itself multiple times:

> Let's say you're taking members A and B out of the cluster for a restart and doing the state save + load that you describe. By the time they come back, B is now B' and A is now A'; however, B' thinks that A is still alive (and A' thinks that B is). So they feed this stale information back to the cluster.

(But instead of 2 members, it's whatever your rolling restart batch size is)

It doesn't look like a problem when you think about the first time this happens, but in the second batch of nodes the problem becomes evident:

Let's say you're restarting your cluster in 3 batches. B1, the first one, just completed and now you're doing B2. At this point in time you have:

  • B1 has the knowledge as described in my previous comment: its view of the cluster is good but its view of the members in the same batch is outdated
  • B2 has just restarted, so it has forgotten everything about down nodes

So when B2 starts coming back online, if a node from B1 talks to B2 and still has updates in its backlog, B2 will think that an identity that just declared itself as down is actually alive.

The larger the batch size and the number of batches, the higher the likelihood of reintroducing down nodes as alive.

If you persist the down members the problem mostly disappears; only the nodes within a batch will (possibly) have outdated knowledge.


To be clear: the problem here is asymmetric knowledge of the terminal (Down) state of identities. It's the same problem we had before with forgetting down members too early due to configuration.

So, getting back to the meat of it: you're trying to speed up discovery of cluster members. The reason foca doesn't do this very well is that it's limited to a maximum packet size. You, however, aren't.

I think you should consider the approach that memberlist uses: periodically, a member connects to another member and they do a full sync (i.e.: member A sends its full list of members, including down ones; member B applies these to its own state, THEN sends its full list back to A). It's very similar to what foca does with announce, but having a proper connection between members makes it possible to send the whole state and guarantee a reply.
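One such round could look like this (a sketch: conn, send_members and recv_members are hypothetical transport helpers, not foca APIs, and iter_membership_state stands for the unfiltered member iterator described below):

// Push: share our full membership state, Down members included.
let ours: Vec<_> = foca.iter_membership_state().cloned().collect();
conn.send_members(&ours)?;

// Pull: the peer applies `ours` on its side, then replies with its
// own full state, which we apply here. Both ends end up with the
// exact same view, terminal Down states included.
let theirs = conn.recv_members()?;
foca.apply_many(theirs.into_iter(), &mut runtime)?;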

Whether you stick to the current approach or try a different one, foca can facilitate this by exposing the full state directly (think foca.iter_members(), but without filtering for liveness); I'll get this done


jeromegn commented on August 20, 2024

> I think you should consider the approach that memberlist uses: periodically, a member connects to another member and they do a full sync (i.e.: member A sends its full list of members, including down ones; member B applies these to its own state, THEN sends its full list back to A). It's very similar to what foca does with announce, but having a proper connection between members makes it possible to send the whole state and guarantee a reply.

> Whether you stick to the current approach or try a different one, foca can facilitate this by exposing the full state directly (think foca.iter_members(), but without filtering for liveness); I'll get this done

Would this help with batch restarts? I suppose it could self-correct by merging states in a way that dismisses a lot of "bad" identities?


caio commented on August 20, 2024

Huh I thought I'd released v0.16.0 and replied to this before. Apologies.

Doing this sync between live members pretty much eliminates any knowledge disparity because it guarantees that the nodes will have the exact same state.

I understand the need to converge to the full cluster size as fast as possible so you'll have to address the problem somehow.

Possible approaches, useful for any scenario related to converging state:

  • Persist the full state and reload it after restart (I've shipped v0.16.0, which facilitates this a bit). The batch that's currently being restarted will still have outdated knowledge (of each other), which you may choose to ignore (letting foca self-correct) or inject a correction for (if you know which addresses were restarted, it's not difficult to find out which identities need to be declared down)

  • Never persist data and exchange knowledge only between live nodes (v0.16.0 helps here the same way). This is the push-pull thing that memberlist does: a request-response cycle where members share their full state with each other

  • Ignore the problem and use your identities for self-correction: if there's a timestamp or something else that grows monotonically (per address, at least), you can teach your nodes something like: "whenever there's a new member, check if there's already an identity using its address and declare the oldest one down". You might want to use this as a way to correct the stale knowledge from the first approach

My tiny cluster uses an identity with a timestamp (I golfed it and it actually has a bug if I restart during a year change, but I digress 😅), so I go for the last approach, as I'd rather not introduce tcp or something similar here. The shape of it is roughly the sketch below.
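(A sketch; NodeId is illustrative, and the conflict check lives in your notification handling, not inside foca:)

use std::net::SocketAddr;

// Hypothetical identity: the same address reappears with a newer
// timestamp after a restart.
#[derive(Clone, PartialEq)]
struct NodeId {
    addr: SocketAddr,
    started_at: u64, // grows monotonically per address
}

// On learning about a new member `new`: if an older identity holds
// the same address, declare it down so the cluster forgets it.
if let Some(old) = known.iter().find(|m| m.addr == new.addr && m.started_at < new.started_at) {
    foca.apply_many(core::iter::once(
        Member::new(old.clone(), Incarnation::default(), State::Down)
    ), &mut runtime)?;
}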


jeromegn commented on August 20, 2024

Thanks, that's helpful!

I'm tempted to try the timestamp technique. That should work fine for us.

> you can teach your nodes something like: "whenever there's a new member, check if there's already an identity using its address and declare the oldest one down"

Do you mean declare it down with foca or declare it down internally (like in a Members map)?


caio commented on August 20, 2024

> Do you mean declare it down with foca or declare it down internally (like in a Members map)?

With foca. The idea here is that someone is spreading this knowledge to the cluster, some might have learned it already, and you want it to stop; so you teach foca the correct state and it disseminates it.

(As soon as you do the foca.apply_many(...) dance, the MemberDown notification will fire and your client will update the Members map accordingly, so there's nothing to worry about on that end.)


jeromegn commented on August 20, 2024

I think the timestamp change helped! The cluster seems to eventually coalesce to the same number of members for each node.

I've also started using the new iter_membership_state, which might've helped too.

How can I tell foca that a member is definitely down? I don't think there's a mechanism to do that.


caio commented on August 20, 2024

> How can I tell foca that a member is definitely down? I don't think there's a mechanism to do that.

You've just done it by feeding the output of iter_membership_state to apply_many :) Any member you insert this way becomes part of the distributed state, and Down is a state members can't transition out of, so:

// Inject a single update declaring `identity_to_kill` as Down; foca
// gossips it to the rest of the cluster.
foca.apply_many(core::iter::once(
    Member::new(identity_to_kill, Incarnation::default(), State::Down)
), &mut runtime)

This makes foca declare identity_to_kill as down for the whole cluster. That identity won't be usable again; the only thing that node can do to rejoin is change its own id.


caio commented on August 20, 2024

closing stale issues that seem resolved. feel free to reopen

