Giter Club home page Giter Club logo

Comments (4)

github-actions avatar github-actions commented on September 23, 2024

👋 Thanks for opening this issue!

Get help or engage by:

  • /help : to print help messages.
  • /assignme : to assign this issue to you.

from openraft.

drmingdrmer avatar drmingdrmer commented on September 23, 2024

What script were you using to produce this issue?

I think you're using one of the shell script in the example.
You can just push the modify code base to a branch so I can reproduce this issue using your branch.

And please attach the full log in DEBUG level so I can see what was happening.

And I don't understand why you believe that step two/three should fail.
Adding non-voter without join them to a cluster should always work.

from openraft.

pfwang80s avatar pfwang80s commented on September 23, 2024

And I don't understand why you believe that step two/three should fail.

sorry, my bad, they should be step 4/5.

Adding non-voters with different NodeId always works, but after upgrading them to voter, killing the leader, the whole cluster stopped.

I think you can reproduce it with the mem kv example. To express myself better, I redescribe it in Chinese:
如何重现
有三个节点,node0/1/2,对应的 RaftNetwork 地址分别是 addr 0/1/2,启动这三个节点时,传给 openraft::Raft::new 的 NodeId 分别是 0/1/2

  1. 启动 node0,作为单节点 cluster。
  2. 启动 node1,作为 learner。
  3. 启动 node2,作为 learner。
  4. 在 node0 上调用 add_learner,但是用 NodeId = 2 来添加 node1 (addr1)
  5. 在 node0 上调用 add_learner,用 NodeId = 1 来添加 node2 (addr2)
  6. 在 node0 上调用 change_membership,把两个新加进来的 node 升级为 voter
  7. kill node0, the leader

期待结果
下面两种结果都可以接受:

  1. 步骤 4/5 直接报错。因为 change_membership 日志中记录 id 和 raft 启动时的 id 不符,这导致了在 learner 升级成 voter 且 leader 挂掉之后整个集群不再工作。或者
  2. 选举正常工作,哪怕 node1/2 的 id 被改掉了。
    相比之下 1 会简单一些,2 可能需要在 learner 被加入 cluster 时去做个判断,并更新自己的 id,目测这个需要梳理很多逻辑来保证更新 id 的正确性。

实际结果
node 1/2 stop working, "openraft::core::tick: Tick fails to send, receiving end quit: channel closed"

from openraft.

drmingdrmer avatar drmingdrmer commented on September 23, 2024

OK, I see the root cause of this problem is this, node-2 expected itself to be a follower but it is acutally a leader:

thread 'main' panicked at openraft/openraft/src/engine/engine_impl.rs:793:9:
assertion failed: self.internal_server_state.is_following()

Because you mistakenly configured a wrong network addresses for node-3: the endpoint address of node-3 is the address of node-2.
When node-1 is killed and node-2 becomes the leader, node-2 tries to replicate a message to node-3 by replicating it to itself.
And Openraft only allows a follower to receive replication messages.
Because node-2 itself is a leader, the assertion fails.

from openraft.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.