Comments (4)

github-actions commented on September 22, 2024

👋 Thanks for opening this issue!

Get help or engage by:

  • /help : to print help messages.
  • /assignme : to assign this issue to you.

from openraft.

drmingdrmer commented on September 22, 2024

You cannot kill two nodes in a cluster of only three nodes and keep it available: a majority requires at least two nodes.
The whole cluster will inevitably become unresponsive, whether or not an "Unreachable" error and backoff are used.
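
The majority arithmetic behind this can be sketched as follows. This is only an illustration of the rule, not openraft code; `quorum` and `has_quorum` are hypothetical helper names.

```rust
/// Smallest number of voters that forms a majority of `cluster_size` voters.
/// Hypothetical helper for illustration; not part of openraft.
fn quorum(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

/// Returns true if `alive` voters are enough to form a majority.
/// Hypothetical helper for illustration; not part of openraft.
fn has_quorum(cluster_size: usize, alive: usize) -> bool {
    alive >= quorum(cluster_size)
}
```

For a three-node cluster, `quorum(3)` is 2, so `has_quorum(3, 1)` is false: with two nodes killed, the one remaining node cannot commit anything on its own.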

The leader has received two Unreachable errors, covering all of the target nodes it replicates to. By design, the leader then decides not to send anything out. This is expected.

There should be several warning messages in the log, emitted by the two logging statements below.

if let Some(b) = &mut self.backoff {
    let duration = b.next().unwrap_or_else(|| {
        tracing::warn!("backoff exhausted, using default");
        Duration::from_millis(500)
    });
    self.backoff_drain_events(Instant::now() + duration).await?;
}

pub async fn backoff_drain_events(&mut self, until: Instant) -> Result<(), ReplicationClosed> {
    let d = until - Instant::now();
    tracing::warn!(
        interval = debug(d),
        "{} backoff mode: drain events without processing them",
        func_name!()
    );
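
For readers unfamiliar with the pattern in the first excerpt, here is a minimal standalone sketch of "take the next delay from a backoff iterator, or fall back to a default once it is exhausted". It assumes the backoff is an iterator of `Duration` values, as the `b.next()` call above suggests; `next_backoff` and `exponential` are hypothetical names, not openraft APIs, and the exponential schedule is only an example policy.

```rust
use std::time::Duration;

/// Pick the next backoff delay, falling back to 500 ms when the
/// iterator has no more values (mirrors the `unwrap_or_else` above).
/// Hypothetical helper; not part of openraft.
fn next_backoff(backoff: &mut impl Iterator<Item = Duration>) -> Duration {
    backoff.next().unwrap_or_else(|| Duration::from_millis(500))
}

/// A finite exponential schedule: base, 2*base, 4*base, ... for `steps` entries.
/// Hypothetical example policy; an application may supply any iterator.
fn exponential(base: Duration, steps: u32) -> impl Iterator<Item = Duration> {
    (0..steps).map(move |i| base * 2u32.pow(i))
}
```

With `exponential(Duration::from_millis(100), 3)`, successive calls to `next_backoff` yield 100 ms, 200 ms, 400 ms, and then the 500 ms default on every later call.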

kevlu93 commented on September 22, 2024

Thanks for the explanation. Yes, I see those two warning messages, but is any message emitted indicating that the leader recognizes that all nodes are unreachable and that it should no longer send anything out? I looked through the logs and didn't see anything obvious. Also, would the same behavior occur if the leader receives multiple Network or Timeout errors from all of the target nodes?

drmingdrmer commented on September 22, 2024

There is no log message indicating that all nodes are unreachable.

And the critical state is not that all nodes are unreachable, but that a quorum of nodes is unreachable.

There is no such log message because it could be misleading: the leader cannot be sure whether the problem lies with the target node or with the leader node itself.

To get a correct view of the cluster state, you must contact each node directly.
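
As a rough sketch of what an external check might compute: assume some monitoring tool outside the cluster has already probed every node directly and recorded whether each one answered. How the probing is done is left out and is up to the application. The type `ClusterView` and the function `summarize` are hypothetical and not part of openraft.

```rust
use std::collections::BTreeMap;

/// Summary of direct probes, built by an external observer.
/// Hypothetical type for illustration; not part of openraft.
#[derive(Debug, PartialEq)]
struct ClusterView {
    /// Ids of nodes that did not answer the direct probe.
    unreachable: Vec<u64>,
    /// True when fewer than a majority of nodes answered.
    quorum_lost: bool,
}

/// `probes` maps node id -> whether the node answered a direct probe.
fn summarize(probes: &BTreeMap<u64, bool>) -> ClusterView {
    let unreachable: Vec<u64> = probes
        .iter()
        .filter(|(_, ok)| !**ok)
        .map(|(id, _)| *id)
        .collect();
    let reachable = probes.len() - unreachable.len();
    // A majority of n nodes is n / 2 + 1.
    let quorum_lost = reachable < probes.len() / 2 + 1;
    ClusterView { unreachable, quorum_lost }
}
```

The point of the design is that the probe results come from contacting each node directly, rather than from the leader's replication errors, which cannot distinguish a failed follower from a partitioned leader.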
