Giter Club home page Giter Club logo

Comments (7)

schreter avatar schreter commented on June 20, 2024 1

LogId is also used on a follower to determine if a log entry is identical to the follower's local log entry at the same index, when a follower receives append-entries request.

Sure, but considering that Raft doesn't allow to have two leaders for the same term, the term comparison must be a sufficient condition - no need for node_id there.

May be I can introduce a feature to disable/enable to contain a node-id in LogId. 🤔

I wouldn't complicate the API and core algorithms. Either-or, I'd say.

We have plenty of optimization potential elsewhere (e.g., get rid of Vec<T> and similar on the API, replacing them with impl IntoIterator<T> or similar to prevent unnecessary memory allocation, where not required, etc.).

I did not know you have such a big node-id :( Is it possible using a u32 node-id and store the 16B real id in a Node?

Indeed, I'm thinking about doing exactly that. So far, we didn't use the Node, but maybe it's a better way; simply store Raft node index in the node_id and use Node to store the actual node ID (we use 16B GUIDs; that's a long story). That would actually make some other operations simpler (restoring a consensus domain replica in a catastrophic scenario on a completely unrelated node w/o the need to "renumber" everything).

from openraft.

github-actions avatar github-actions commented on June 20, 2024

👋 Thanks for opening this issue!

Get help or engage by:

  • /help : to print help messages.
  • /assignme : to assign this issue to you.

from openraft.

drmingdrmer avatar drmingdrmer commented on June 20, 2024

I added a node_id to LeaderId because LeaderId(term, node_id) reduces conflict during an election:

Given 3 nodes, if all of them start to elect at the same time, with the same term, without node_id, it is very likely there is no leader established in this term:

  • a voted for itself and sent vote requests to b, c;
  • b voted for itself and sent vote requests to a, c;
  • c voted for itself and sent vote requests to a, b;

Because they all have already voted for themselves, they just reject every received vote request from other nodes. This is the way the standard raft does

But with node_id, finally there will be a leader elected in this term. Because LeaderId(term=1, node_id=2) > LeaderId(term=1, node_id=1), the node with greatest node_id will finally succeed.

In any case, removing the comparison for testing purposes doesn't have any effect on test outcome - all green.

It has to compare node_id too here, because there might have been multiple leaders in the same term, e.g., LeaderId(term=3,node_id=a) then a new LeaderId(term=3,node_id=b) established. In this case, a log of LeaderId(term=3,node_id=a) being replicated by LeaderId(term=3,node_id=b) does not imply the log is committed.

It looks I have to add some tests for such a case:(

The only location in code where LeaderId::node_id is evaluated seems to be in Engine::update_progress(), where it would likely suffice to compare the term (the same term cannot be used by two different leaders).

This is the only location where it is explicitly used. When handling a vote request, a node just compares votes with req.vote >= self.vote, this is the most important usage of it.

from openraft.

schreter avatar schreter commented on June 20, 2024

Hm, I still don't get it. The standard Raft uses only term, but no node_id for its election and it works correctly. Raft guarantees by its election process that there cannot be two leaders elected for the same term. So how come there are two leaders with the same term in openraft?

In the first case you described (all start voting for self at the exact time), there would be no leader elected for this term, which would trigger another election with a new term. The likelihood that even the second election again occurs at exact the same time on all nodes (and fails too) is virtually zero.

When I understand you correctly, you only store node_id to speed up the election? If so, I don't think it makes sense to optimize the "bad" case (new election occurring once in a while) by penalizing the "good" case all the time (e.g., additional info to be logged; occurring all the time).

The vote handling compares log ID (i.e., term + log index). That should be sufficient, since the same term cannot be used by two different nodes. I don't see how the node_id should influence anything there.

from openraft.

drmingdrmer avatar drmingdrmer commented on June 20, 2024

Hm, I still don't get it. The standard Raft uses only term, but no node_id for its election and it works correctly. Raft guarantees by its election process that there cannot be two leaders elected for the same term. So how come there are two leaders with the same term in openraft?

In standard raft, a leader is not just identified by term. A leader is identified by a tuple LeaderId: (term, voted_for), e.g., a node has voted for node-1 in term 3 has its LeaderId stored as (term=3, voted_for=1).

image

The LeaderId in standard raft is a partial-order value, because voted_for=1 !> voted_for=2 and voted_for=1 !< voted_for=2. I had a slide explaining the conflict when candidate y in term 5 tries sending vote request to another node in term 5:

image

And this partial-order LeaderId allows only one leader to be granted in every term in the standard raft, term, without voted_for can be used as a leader identifier.

In openraft LeaderId(term, node_id) is a total-order value, i.e., LeaderId(term=3, node_id=2) > LeaderId(term=3,node_id=1), which makes it possible to have multiple leaders in one term. But still, at any time, there is only one leader is granted by a quorum. E.g., LeaderId(term=3,node_id=1) is granted by node-1 and node-2(3 nodes cluster), then LeaderId(term=3,node_id=2) is granted by node-1 and node-2. Because LeaderId is totally ordered, a node can grant LeaderId(term=3,node_id=2) even when it has already granted LeaderId(term=3,node_id=1).

image

In the first case you described (all start voting for self at the exact time), there would be no leader elected for this term, which would trigger another election with a new term. The likelihood that even the second election again occurs at exact the same time on all nodes (and fails too) is virtually zero.

Right, the chance there is another round of conflict is rare. But still, it takes two rounds of election. The time gap between these two rounds will be considered service downtime(about 2 RTT).

When I understand you correctly, you only store node_id to speed up the election?

Generally speaking, yes:). And it simplifies the logic. Adding another field will save several lines of code that handle term.

If so, I don't think it makes sense to optimize the "bad" case (new election occurring once in a while) by penalizing the "good" case all the time (e.g., additional info to be logged; occurring all the time).

Hm... to me, with or without a node_id in a log entry, the storage or transport efficiency can be well optimized when needed, e.g., extracting term, node_id and storing them separately. But the election conflict can not be solved without node_id in the LeaderId.

from openraft.

schreter avatar schreter commented on June 20, 2024

Wow, what a long explanation :-), thanks.

I was imprecise, sorry. Of course, (one) voted-for node ID needs to be stored and the node_id is sent as part of the Vote message.

So we can conclude that the node_id storage is for speeding up the election. I can live with that, I just have to find a way how to optimize log storage that we don't have to store the additional node_id information for each log entry unnecessarily. Maybe something like not storing the term/node_id per log entry at all and rather generate "switch" log entries when there is a change.

Alternatively, looking at the code again, it seems like Entry could use a simpler LogId only containing term and index, without the node_id. The node_id seems to be only relevant for replication reply, where it can be easily computed based on the last voted_for for Matching. For Conflict, it can be ignored, since it's not used. Of course, the default implementation of RaftLogReader would be more complex/impossible and quite a few changes would be needed elsewhere...

I'll probably go with the above mentioned storage optimization in the log ("switch" entries) or we'll store our 16B node IDs for each log entry anyway.

from openraft.

drmingdrmer avatar drmingdrmer commented on June 20, 2024

Alternatively, looking at the code again, it seems like Entry could use a simpler LogId only containing term and index, without the node_id. The node_id seems to be only relevant for replication reply, where it can be easily computed based on the last voted_for for Matching. For Conflict, it can be ignored, since it's not used. Of course, the default implementation of RaftLogReader would be more complex/impossible and quite a few changes would be needed elsewhere...

LogId is also used on a follower to determine if a log entry is identical to the follower's local log entry at the same index, when a follower receives append-entries request.

I have a LogIdList structure that contains every log id from which a new LeaderId starts:
https://github.com/datafuselabs/openraft/blob/main/openraft/src/engine/log_id_list.rs

In fact, in my opinion, such a structure reflects more precisely the data structure in raft.
Logs should be a 2 dimension list entry = logs[leader_id][index].

May be I can introduce a feature to disable/enable to contain a node-id in LogId. 🤔

I'll probably go with the above mentioned storage optimization in the log ("switch" entries) or we'll store our 16B node IDs for each log entry anyway.

I did not know you have such a big node-id :(
Is it possible using a u32 node-id and store the 16B real id in a Node?
Membership stores every NodeId and Node for applications:

pub struct Membership<NID, N>
where
N: Node,
NID: NodeId,
{
/// Multi configs of members.
///
/// AKA a joint config in original raft paper.
configs: Vec<BTreeSet<NID>>,
/// Additional info of all nodes, e.g., the connecting host and port.
///
/// A node-id key that is in `nodes` but is not in `configs` is a **learner**.
nodes: BTreeMap<NID, N>,
}

from openraft.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.