Currently, LeaderId contains not only <code class="no

👋 Thanks for opening this issue! Get help or engage by: <ul dir

Hm, I still don't get it. The standard Raft uses only <code class="notran

Alternatively, looking at the code again, it seems like <code class="notr

Redundant information in `LeaderId::node_id`,about datafuselabs/openraft

Comments (7)

schreter commented on June 20, 2024 1

LogId is also used on a follower to determine if a log entry is identical to the follower's local log entry at the same index, when a follower receives append-entries request.

Sure, but considering that Raft doesn't allow to have two leaders for the same term, the term comparison must be a sufficient condition - no need for node_id there.

May be I can introduce a feature to disable/enable to contain a node-id in LogId. 🤔

I wouldn't complicate the API and core algorithms. Either-or, I'd say.

We have plenty of optimization potential elsewhere (e.g., get rid of Vec<T> and similar on the API, replacing them with impl IntoIterator<T> or similar to prevent unnecessary memory allocation, where not required, etc.).

I did not know you have such a big node-id :( Is it possible using a u32 node-id and store the 16B real id in a Node?

Indeed, I'm thinking about doing exactly that. So far, we didn't use the Node, but maybe it's a better way; simply store Raft node index in the node_id and use Node to store the actual node ID (we use 16B GUIDs; that's a long story). That would actually make some other operations simpler (restoring a consensus domain replica in a catastrophic scenario on a completely unrelated node w/o the need to "renumber" everything).

from openraft.

github-actions commented on June 20, 2024

👋 Thanks for opening this issue!

Get help or engage by:

/help : to print help messages.
/assignme : to assign this issue to you.

from openraft.

drmingdrmer commented on June 20, 2024

I added a node_id to LeaderId because LeaderId(term, node_id) reduces conflict during an election:

Given 3 nodes, if all of them start to elect at the same time, with the same term, without node_id, it is very likely there is no leader established in this term:

a voted for itself and sent vote requests to b, c;
b voted for itself and sent vote requests to a, c;
c voted for itself and sent vote requests to a, b;

Because they all have already voted for themselves, they just reject every received vote request from other nodes. This is the way the standard raft does

But with node_id, finally there will be a leader elected in this term. Because LeaderId(term=1, node_id=2) > LeaderId(term=1, node_id=1), the node with greatest node_id will finally succeed.

In any case, removing the comparison for testing purposes doesn't have any effect on test outcome - all green.

It has to compare node_id too here, because there might have been multiple leaders in the same term, e.g., LeaderId(term=3,node_id=a) then a new LeaderId(term=3,node_id=b) established. In this case, a log of LeaderId(term=3,node_id=a) being replicated by LeaderId(term=3,node_id=b) does not imply the log is committed.

It looks I have to add some tests for such a case:(

The only location in code where LeaderId::node_id is evaluated seems to be in Engine::update_progress(), where it would likely suffice to compare the term (the same term cannot be used by two different leaders).

This is the only location where it is explicitly used. When handling a vote request, a node just compares votes with req.vote >= self.vote, this is the most important usage of it.

from openraft.

schreter commented on June 20, 2024

Hm, I still don't get it. The standard Raft uses only term, but no node_id for its election and it works correctly. Raft guarantees by its election process that there cannot be two leaders elected for the same term. So how come there are two leaders with the same term in openraft?

In the first case you described (all start voting for self at the exact time), there would be no leader elected for this term, which would trigger another election with a new term. The likelihood that even the second election again occurs at exact the same time on all nodes (and fails too) is virtually zero.

When I understand you correctly, you only store node_id to speed up the election? If so, I don't think it makes sense to optimize the "bad" case (new election occurring once in a while) by penalizing the "good" case all the time (e.g., additional info to be logged; occurring all the time).

The vote handling compares log ID (i.e., term + log index). That should be sufficient, since the same term cannot be used by two different nodes. I don't see how the node_id should influence anything there.

from openraft.

drmingdrmer commented on June 20, 2024

Hm, I still don't get it. The standard Raft uses only term, but no node_id for its election and it works correctly. Raft guarantees by its election process that there cannot be two leaders elected for the same term. So how come there are two leaders with the same term in openraft?

In standard raft, a leader is not just identified by term. A leader is identified by a tuple LeaderId: (term, voted_for), e.g., a node has voted for node-1 in term 3 has its LeaderId stored as (term=3, voted_for=1).

The LeaderId in standard raft is a partial-order value, because voted_for=1 !> voted_for=2 and voted_for=1 !< voted_for=2. I had a slide explaining the conflict when candidate y in term 5 tries sending vote request to another node in term 5:

And this partial-order LeaderId allows only one leader to be granted in every term in the standard raft, term, without voted_for can be used as a leader identifier.

In openraft LeaderId(term, node_id) is a total-order value, i.e., LeaderId(term=3, node_id=2) > LeaderId(term=3,node_id=1), which makes it possible to have multiple leaders in one term. But still, at any time, there is only one leader is granted by a quorum. E.g., LeaderId(term=3,node_id=1) is granted by node-1 and node-2(3 nodes cluster), then LeaderId(term=3,node_id=2) is granted by node-1 and node-2. Because LeaderId is totally ordered, a node can grant LeaderId(term=3,node_id=2) even when it has already granted LeaderId(term=3,node_id=1).

In the first case you described (all start voting for self at the exact time), there would be no leader elected for this term, which would trigger another election with a new term. The likelihood that even the second election again occurs at exact the same time on all nodes (and fails too) is virtually zero.

Right, the chance there is another round of conflict is rare. But still, it takes two rounds of election. The time gap between these two rounds will be considered service downtime(about 2 RTT).

When I understand you correctly, you only store node_id to speed up the election?

Generally speaking, yes:). And it simplifies the logic. Adding another field will save several lines of code that handle term.

If so, I don't think it makes sense to optimize the "bad" case (new election occurring once in a while) by penalizing the "good" case all the time (e.g., additional info to be logged; occurring all the time).

Hm... to me, with or without a node_id in a log entry, the storage or transport efficiency can be well optimized when needed, e.g., extracting term, node_id and storing them separately. But the election conflict can not be solved without node_id in the LeaderId.

from openraft.

schreter commented on June 20, 2024

Wow, what a long explanation :-), thanks.

I was imprecise, sorry. Of course, (one) voted-for node ID needs to be stored and the node_id is sent as part of the Vote message.

So we can conclude that the node_id storage is for speeding up the election. I can live with that, I just have to find a way how to optimize log storage that we don't have to store the additional node_id information for each log entry unnecessarily. Maybe something like not storing the term/node_id per log entry at all and rather generate "switch" log entries when there is a change.

Alternatively, looking at the code again, it seems like Entry could use a simpler LogId only containing term and index, without the node_id. The node_id seems to be only relevant for replication reply, where it can be easily computed based on the last voted_for for Matching. For Conflict, it can be ignored, since it's not used. Of course, the default implementation of RaftLogReader would be more complex/impossible and quite a few changes would be needed elsewhere...

I'll probably go with the above mentioned storage optimization in the log ("switch" entries) or we'll store our 16B node IDs for each log entry anyway.

from openraft.

drmingdrmer commented on June 20, 2024

Alternatively, looking at the code again, it seems like Entry could use a simpler LogId only containing term and index, without the node_id. The node_id seems to be only relevant for replication reply, where it can be easily computed based on the last voted_for for Matching. For Conflict, it can be ignored, since it's not used. Of course, the default implementation of RaftLogReader would be more complex/impossible and quite a few changes would be needed elsewhere...

LogId is also used on a follower to determine if a log entry is identical to the follower's local log entry at the same index, when a follower receives append-entries request.

I have a LogIdList structure that contains every log id from which a new LeaderId starts:
https://github.com/datafuselabs/openraft/blob/main/openraft/src/engine/log_id_list.rs

In fact, in my opinion, such a structure reflects more precisely the data structure in raft.
Logs should be a 2 dimension list entry = logs[leader_id][index].

May be I can introduce a feature to disable/enable to contain a node-id in LogId. 🤔

I'll probably go with the above mentioned storage optimization in the log ("switch" entries) or we'll store our 16B node IDs for each log entry anyway.

I did not know you have such a big node-id :(
Is it possible using a u32 node-id and store the 16B real id in a Node?
Membership stores every NodeId and Node for applications:

openraft/openraft/src/membership/membership.rs

Lines 73 to 87 in e31b0f9

 pub struct Membership<NID, N> 

 where 

 N: Node, 

 NID: NodeId, 

 { 

 /// Multi configs of members. 

 /// 

 /// AKA a joint config in original raft paper. 

 configs: Vec<BTreeSet<NID>>, 

 /// Additional info of all nodes, e.g., the connecting host and port. 

 /// 

 /// A node-id key that is in `nodes` but is not in `configs` is a **learner**. 

 nodes: BTreeMap<NID, N>, 

 }

from openraft.

Redundant information in `LeaderId::node_id` about openraft HOT 7 CLOSED

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	pub struct Membership<NID, N>
	where
	N: Node,
	NID: NodeId,
	{
	/// Multi configs of members.
	///
	/// AKA a joint config in original raft paper.
	configs: Vec<BTreeSet<NID>>,

	/// Additional info of all nodes, e.g., the connecting host and port.
	///
	/// A node-id key that is in `nodes` but is not in `configs` is a learner.
	nodes: BTreeMap<NID, N>,
	}