Giter Club home page Giter Club logo

raft's Introduction

🚣 Raft

This is an instructional implementation of the Raft distributed consensus algorithm in Go. It's accompanied by a series of blog posts:

Each of the partN directories in this repository is the complete source code for Part N of the blog post series (except Part 0, which is introductory and has no code). There is a lot of duplicated code between the different partN directories - this is a conscious design decision. Rather than abstracting and reusing parts of the implementation, I opted for keeping the code as simple as possible. Each directory is completely self contained and can be read and understood in isolation. Using a graphical diff tool to see the deltas between the parts can be instructional.

How to use this repository

You can read the code, but I'd also encourage you to run tests and observe the logs they print out. The repository contains a useful tool for visualizing output. Here's a complete usage example:

$ cd part1
$ go test -v -race -run TestElectionFollowerComesBack |& tee /tmp/raftlog
... logging output
... test should PASS
$ go run ../tools/raft-testlog-viz/main.go < /tmp/raftlog
PASS TestElectionFollowerComesBack map[0:true 1:true 2:true TEST:true] ; entries: 150
... Emitted file:///tmp/TestElectionFollowerComesBack.html

PASS

Now open file:///tmp/TestElectionFollowerComesBack.html in your browser. You should see something like this:

Image of log browser

Scroll and read the logs from the servers, noticing state changes (highlighted with colors). Feel free to add your own cm.dlog(...) calls to the code to experiment and print out more details.

Changing and testing the code

Each partN directory is completely independent of the others, and is its own Go module. The Raft code itself has no external dependencies; the only require in its go.mod is for a package that enables goroutine leak testing - it's only used in tests.

To work on part2, for example:

$ cd part2
... make code changes
$ go test -race ./...

Depending on the part and your machine, the tests can take up to a minute to run. Feel free to enable verbose logging with -v, and/or used the provided dotest.sh script to run specific tests with log visualization.

Contributing

I'm interested in hearing your opinion or suggestions for the code in this repository. Feel free to open an issue if something is unclear, or if you think you found a bug. Code contributions through PRs are welcome as well.

raft's People

Contributors

eliben avatar fatih avatar mike96265 avatar ngocson2vn avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

raft's Issues

Check before updating nextIndex and matchIndex

In Part3 code, function leaderSendAEs.
For example: After a leader sends an empty ae(heartbeat) to followers, it immediately receives a command and sends an ae with a new log to followers. If the response of the 2nd ae arrives earlier than the 1st ae due to some network glitch, the nextIndex and matchIndex will be set to the newer(larger) one and set back to the smaller one. It's better to check in line 633 that ni + len(entries) is larger than nextIndex before updating it.

Why we do not check errors in AE responses?

Hi, Assume that the state of the node is leader and a network partition happened, node send AE (heartbeat) to other nodes and all of the requests failed or timed out, node remain leader.
Is this supposed behaviour ?

Why do need to use lock in RPC calls?

I am a newbie who just learned the go language, and when I read the following code, I was very confused.

func (s *Server) Call(id int, serviceMethod string, args interface{}, reply interface{}) error {
	s.mu.Lock()
	peer := s.peerClients[id]
	s.mu.Unlock()

	// If this is called after shutdown (where client.Close is called), it will
	// return an error.
	if peer == nil {
		return fmt.Errorf("call client %d after it's closed", id)
	} else {
		return peer.Call(serviceMethod, args, reply)
	}
}

My understanding of locks is limited to when multiple goroutines read and write to a shared variable, the lock is needed to prevent one goroutine from reading the shared variable that other goroutines have not finished writing. In my opinion, for two RPC calls to the same node, the two RPCs are not related (in each RPC call, new parameters are generated for the call), so there is no need to lock.

TestCrashAfterSubmit

I don't think this is a bug. Maybe it's a doubt.

image

if sleepMs(1) is commented , the test will be failure.

Because mechanism of goroutine will execute leaderSendAEs before leader crashed.

But environment is very complex , I don't think this is a clearness result.

3ks for your sharing & expecting answer.

Best regards.

becomeFollower may update the term from a higher one back to a low one

when the node try to become a follower, it will update its currentTerm to a specified term passed to becomeFollower

I think current implementation of becomeFollower may update currentTerm to low one, which may not be allowed.

There are 2 entires for a node to call becomeFollower: one is from the RequestVote reply, another is from the AppendEntries reply.

from AppendEntry reply, the condition for becomeFollower(...) is if reply.Term > savedCurrentTerm ( https://github.com/eliben/raft/blob/master/part3/raft.go#L620-L622 ). This could be a misbehavior:

  1. there are 5 nodes in the raft group, current node is the leader with term 1.
  2. another 2 nodes (node_0, node_1) are network partitioned, node_0's vote term has been increase to 4, node_1's vote term has been increased to 9.
  3. current node send AppendEntries to all other 4 nodes, and its savedCurrentTerm is 1 right now.
  4. node_1 with vote term(9) reconnected back, it sending VoteRequest with term(9), current node becom follower by thie VoteRequest since current node's term is 1, which is less than 9 and, its currentNode has been changed to 9 in becomeFollower
  5. node_0 with vote term(4)reconnected back, it received current node's AppendEntries and it reply this rpc with reply.Term = 4, when current node recevied this reply, it will reset its currentTerm from 9 back to 4

I think it should compare the reply.Term with cm.currentTerm, not with savedCurrentTerm:

if err := cm.server.Call(peerId, "ConsensusModule.AppendEntries", args, &reply); err == nil {
  cm.mu.Lock()
  defer cm.mu.Unlock()
  if reply.Term > cm.currentTerm {  // Not compared with savedCurrentTerm
	  cm.dlog("term out of date in heartbeat reply")
	  cm.becomeFollower(reply.Term)
	  return
  }

Retired old leader may send a heartbeat with a new term accepted from the new leader

Thanks for the author's articles, which is very useful for me to understand the procedure details of the raft.

I think there may be a misbehavior that the old leader may send a heartbeat with a new term accepted from the new leader: because the leader check procedure for the heartbeat sending and the real hartbeat sending with term is not atomic.

  1. when the old leader try to send a heartbeat , the check procedure is passed ( https://github.com/eliben/raft/blob/master/part3/raft.go#L576-L581 )
  2. after passing the check , the old leader try to send the heartbeat, however, at this point, another candidate with higher term launch another round of vote ( this could be happened from a network partitioned follower reconnecting back), and this loader leader become a follower with the higher term
  3. The heartbeat sending procedure of the old leader continues, this old leader sends a heartbeat with a higher term accepted from the new leader

I think this may be a misbehavior, the old leader should sending a heartbeat with the old term, not new(higher) term.

if doSend {
    heartbeat_term := 0
    cm.mu.Lock()
    if cm.state != Leader {
        cm.mu.Unlock()
        return
    }
    heartbeat_term = currentTerm
    cm.mu.Unlock()
    cm.leaderSendAEs(heartbeat_term )  // Sending heartbeat with the term of the leader term.
}

Is `intMin(args.LeaderCommit, len(cm.log)-1)` needed?

In part 2, AppendEntries method, this if block

if args.LeaderCommit > cm.commitIndex {
    cm.commitIndex = intMin(args.LeaderCommit, len(cm.log)-1)
    cm.dlog("... setting commitIndex=%d", cm.commitIndex)
    cm.newCommitReadyChan <- struct{}{}
}

I can't think of a scenario where commitIndex will be set to len(cm.log)-1), since the AE call from leader will always set the args.Entries from nextIndex[peerId] to the end of leader's log entries := cm.log[ni:], doesn't this ensure follower's log always be as complete as leader's?
so we can always set follower's commitIndex to args.LeaderCommit?

Am I crazy here?

Does the `startElection()` method has a potential race condition in it?

Since votesReceived is being read and modified in multiple go routines, wouldn't a race condition occur?

Link To Code

func (cm *ConsensusModule) startElection() {
	cm.state = Candidate
	cm.currentTerm += 1
	savedCurrentTerm := cm.currentTerm
	cm.electionResetEvent = time.Now()
	cm.votedFor = cm.id
	cm.dlog("becomes Candidate (currentTerm=%d); log=%v", savedCurrentTerm, cm.log)

	votesReceived := 1

	// Send RequestVote RPCs to all other servers concurrently.
	for _, peerId := range cm.peerIds {
		go func(peerId int) {
			args := RequestVoteArgs{
				Term:        savedCurrentTerm,
				CandidateId: cm.id,
			}
			var reply RequestVoteReply

			cm.dlog("sending RequestVote to %d: %+v", peerId, args)
			if err := cm.server.Call(peerId, "ConsensusModule.RequestVote", args, &reply); err == nil {
				cm.mu.Lock()
				defer cm.mu.Unlock()
				cm.dlog("received RequestVoteReply %+v", reply)

				if cm.state != Candidate {
					cm.dlog("while waiting for reply, state = %v", cm.state)
					return
				}

				if reply.Term > savedCurrentTerm {
					cm.dlog("term out of date in RequestVoteReply")
					cm.becomeFollower(reply.Term)
					return
				} else if reply.Term == savedCurrentTerm {
					if reply.VoteGranted {
						votesReceived += 1
						if votesReceived*2 > len(cm.peerIds)+1 {
							// Won the election!
							cm.dlog("wins election with %d votes", votesReceived)
							cm.startLeader()
							return
						}
					}
				}
			}
		}(peerId)
	}

	// Run another election timer, in case this election is not successful.
	go cm.runElectionTimer()
}

Awesome blog btw!

Eternal leader

it seems like once a leader is elected, its term is continued until it fails.

Shouldn't the term be limited in time to prevent an enduring isolated leader?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.