hotstuff's Issues

Verify TC when handling TC

In handle_tc, the validity of the TC is not checked:

async fn handle_tc(&mut self, tc: TC) -> ConsensusResult<()> {
    if tc.round < self.round {
        return Ok(());
    }
    self.advance_round(tc.round).await;
    if self.name == self.leader_elector.get_leader(self.round) {
        self.generate_proposal(Some(tc)).await;
    }
    Ok(())
}

If I am not mistaken, any malicious node could send an invalid TC and make correct nodes move to the next round.
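
A minimal sketch of a fix, assuming TC exposes a verify method (like the vote and QC types) that checks for 2f+1 valid timeout signatures from distinct committee members:

async fn handle_tc(&mut self, tc: TC) -> ConsensusResult<()> {
    if tc.round < self.round {
        return Ok(());
    }

    // Hypothetical check: ensure the TC carries 2f+1 valid timeout
    // signatures from distinct committee members before acting on it.
    tc.verify(&self.committee)?;

    self.advance_round(tc.round).await;
    if self.name == self.leader_elector.get_leader(self.round) {
        self.generate_proposal(Some(tc)).await;
    }
    Ok(())
}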

Limit number of payloads per node

A bad node may make us store a lot of crap. There is currently no limit on how many payloads a node can send us, and we will store them all.
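
One possible mitigation, sketched below under the assumption that each payload carries its author's PublicKey (from the crypto crate); the struct, names, and bound are all hypothetical:

use std::collections::HashMap;

/// Hypothetical per-node cap on stored (uncommitted) payloads.
const MAX_PENDING_PAYLOADS_PER_NODE: usize = 1_000;

struct PayloadAccounting {
    pending: HashMap<PublicKey, usize>, // payloads stored per author
}

impl PayloadAccounting {
    /// Returns true if we may store one more payload from `author`.
    fn try_reserve(&mut self, author: PublicKey) -> bool {
        let count = self.pending.entry(author).or_insert(0);
        if *count >= MAX_PENDING_PAYLOADS_PER_NODE {
            return false; // drop the payload instead of storing it
        }
        *count += 1;
        true
    }

    /// Called when a payload from `author` is committed or cleaned up.
    fn release(&mut self, author: &PublicKey) {
        if let Some(count) = self.pending.get_mut(author) {
            *count = count.saturating_sub(1);
        }
    }
}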

Proposer proposes more than one block after TC

I noticed that generate_proposal sometimes gets called multiple times for a given round. I think this is caused by the fact that every node can broadcast a TC: if N nodes broadcast a TC, it will be handled N times, and generate_proposal will be called N times as well.
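
A hedged sketch of one fix: remember the last round we proposed in (a hypothetical last_proposed_round field on the core) so that N incoming TCs for the same round trigger at most one proposal:

async fn handle_tc(&mut self, tc: TC) -> ConsensusResult<()> {
    if tc.round < self.round {
        return Ok(());
    }
    self.advance_round(tc.round).await;
    if self.name == self.leader_elector.get_leader(self.round)
        // Hypothetical guard: only propose once per round.
        && self.last_proposed_round < self.round
    {
        self.last_proposed_round = self.round;
        self.generate_proposal(Some(tc)).await;
    }
    Ok(())
}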

Implement epochs

The committee contains an epoch number, but it is never used to reject messages from wrong epochs.
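
A minimal sketch of the missing check, assuming the crate's ensure! macro and a hypothetical WrongEpoch error variant; EpochNumber is also an assumed type alias:

// Hypothetical guard at message ingress: drop anything from another epoch.
fn check_epoch(&self, message_epoch: EpochNumber) -> ConsensusResult<()> {
    ensure!(
        message_epoch == self.committee.epoch,
        ConsensusError::WrongEpoch(message_epoch)
    );
    Ok(())
}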

A message transfer between two honest nodes can be canceled

In two places in the code, we call ReliableSender::broadcast() and wait for 2f+1 answers.

// Wait for the first 2f nodes to send back an Ack. Then we consider the batch
// delivered and we send its digest to the consensus (that will include it into
// the dag). This should reduce the amount of synching.
let mut total_stake = self.stake;
while let Some(stake) = wait_for_quorum.next().await {
    total_stake += stake;
    if total_stake >= self.committee.quorum_threshold() {
        self.tx_batch
            .send(batch)
            .await
            .expect("Failed to deliver batch");
        break;
    }
}

let handles = self
    .network
    .broadcast(addresses, Bytes::from(message))
    .await;
// Send our block to the core for processing.
self.tx_loopback
    .send(block)
    .await
    .expect("Failed to send block");
// Control system: Wait for 2f+1 nodes to acknowledge our block before continuing.
let mut wait_for_quorum: FuturesUnordered<_> = names
    .into_iter()
    .zip(handles.into_iter())
    .map(|(name, handler)| {
        let stake = self.committee.stake(&name);
        Self::waiter(handler, stake)
    })
    .collect();
let mut total_stake = self.committee.stake(&self.name);
while let Some(stake) = wait_for_quorum.next().await {
    total_stake += stake;
    if total_stake >= self.committee.quorum_threshold() {
        break;
    }
}

This function returns handlers that cancel the transfer when they are dropped. Since we stop waiting after 2f+1 answers and drop the remaining handlers, it is possible that up to f honest nodes never receive the message.

I suggest adding the following line at the end of these two code blocks so the handlers do not get dropped:

tokio::spawn(async move { while let Some(_) = wait_for_quorum.next().await {} });

I ran two local benchmarks of the current code base, before and after adding this line, and obtained 3x more "Committed ->" log entries with it (1,500 vs 4,600).

Implement shared randomness

What's the best approach to implement shared randomness? It will be used to elect the leader for async fallback, and chained-VABA.
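
One common approach is a threshold-signature beacon: every node threshold-signs the round number, and the unique aggregated signature is hashed into a seed that no single node can predict or bias. A minimal sketch of the leader-election side, with the beacon itself (threshold signing and aggregation) assumed and not shown:

use std::convert::TryInto;

/// Derive a leader index from beacon output, e.g. the hash of an
/// aggregated threshold signature over the round number. The beacon
/// generation itself is assumed and not shown here.
fn elect_leader(beacon: &[u8; 32], committee_size: usize) -> usize {
    // Interpret the first 8 bytes of the beacon as a little-endian integer.
    let seed = u64::from_le_bytes(beacon[..8].try_into().unwrap());
    (seed % committee_size as u64) as usize
}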

Separate loopback channels from net channel

The consensus core and the mempool core both use the same channel for loopback messages and network messages. This is a problem: a bad node may format its messages as "loopback" to bypass some checks.
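
A hedged sketch of the separation, with ConsensusMessage and Block assumed from the codebase: the core gets two distinct mpsc receivers, and only messages arriving on the network channel go through the full checks.

use tokio::sync::mpsc::Receiver;

// Sketch of a core loop with distinct channels. Network messages get
// the full checks; loopback messages come only from our own components.
async fn run(
    mut rx_network: Receiver<ConsensusMessage>,
    mut rx_loopback: Receiver<Block>,
) {
    loop {
        tokio::select! {
            Some(_message) = rx_network.recv() => {
                // Untrusted: verify signature, epoch, and round before handling.
            }
            Some(_block) = rx_loopback.recv() => {
                // Trusted: produced locally, skip the signature checks.
            }
        }
    }
}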

Questions about environment

What versions of Python and Fabric are you using? When I run fab local in a new tmux session, it crashes and the session is killed. Thanks.

Write a Wiki

Write a wiki to help users get started and to explain how the code is structured.

Protect Votes aggregator and Mempool's synchronizer from DoS

A bad node may make us run out of memory by sending many votes with different round numbers (as long as they are bigger than our current round) or with different digests. We will store all of them in memory and clean them up only upon moving to the next round.

A similar issue appears in the mempool's synchronizer. A bad node may send us many different blocks for the same round (i.e., blocks with different payloads), and we will try to synchronize the block data with other nodes.
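
A hedged sketch of one mitigation for the votes aggregator, assuming Vote exposes round and author fields and that PublicKey is Copy: buffer votes only within a small window above our round, and accept at most one vote per (round, author) pair (the bound is hypothetical):

use std::collections::{HashMap, HashSet};

/// Hypothetical bound: ignore votes too far in the future.
const MAX_ROUND_LOOKAHEAD: u64 = 10;

struct BoundedAggregator {
    current_round: u64,
    // At most one accepted vote per (round, author) pair.
    seen: HashSet<(u64, PublicKey)>,
    votes: HashMap<u64, Vec<Vote>>,
}

impl BoundedAggregator {
    fn add_vote(&mut self, vote: Vote) {
        let round = vote.round;
        // Drop stale votes and votes unreasonably far ahead.
        if round < self.current_round
            || round > self.current_round + MAX_ROUND_LOOKAHEAD
        {
            return;
        }
        // Drop duplicate votes from the same author for the same round.
        if !self.seen.insert((round, vote.author)) {
            return;
        }
        self.votes.entry(round).or_default().push(vote);
    }
}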

Restart and Synchronize Issue

Hi, we have been using this library in a consensus scenario, but there seem to be some issues with restarting a node.

In our scenario, we run 4 nodes for consensus.
Then we stop one of them for approximately 0.5 to 1 hour.
Then we restart the node.

Afterwards, the node spends a long time synchronizing blocks and then gets stuck. Moreover, it eventually drags down the other three nodes, and the whole system hangs.

What could be the problem, and do you have a solution for this case?

Thanks!

Result of benchmark

We tested a range of input rates under the tx_size, max_payload_size, and faults settings below (branch 3-chain), but many of the results were 0, and we don't know why this is happening.

+ CONFIG:
 Committee size: 100 nodes
 Input rate: 2,010 tx/s
 Transaction size: 1,000 B
 Faults: 33 nodes
 Execution time: 0 s

 Consensus timeout delay: 5,000 ms
 Consensus sync retry delay: 100,000 ms
 Consensus max payloads size: 1,000 B
 Consensus min block delay: 100 ms
 Mempool queue capacity: 1,200,000 B
 Mempool sync retry delay: 100,000 ms
 Mempool max payloads size: 256,000 B
 Mempool min block delay: 500 ms

 + RESULTS:
 Consensus TPS: 0 tx/s
 Consensus BPS: 0 B/s
 Consensus latency: 0 ms

 End-to-end TPS: 0 tx/s
 End-to-end BPS: 0 B/s
 End-to-end latency: 0 ms

But once in a while, a non-zero throughput is displayed, as below. The same happens in the 4, 10, 40, and 100 node tests.

+ CONFIG:
 Committee size: 100 nodes
 Input rate: 3,350 tx/s
 Transaction size: 1,000 B
 Faults: 33 nodes
 Execution time: 21 s

 Consensus timeout delay: 5,000 ms
 Consensus sync retry delay: 100,000 ms
 Consensus max payloads size: 1,000 B
 Consensus min block delay: 100 ms
 Mempool queue capacity: 1,200,000 B
 Mempool sync retry delay: 100,000 ms
 Mempool max payloads size: 256,000 B
 Mempool min block delay: 500 ms

 + RESULTS:
 Consensus TPS: 836 tx/s
 Consensus BPS: 836,069 B/s
 Consensus latency: 9,582 ms

 End-to-end TPS: 622 tx/s
 End-to-end BPS: 622,008 B/s
 End-to-end latency: 13,812 ms

Implement mock storage

All unit tests that require the store currently create an instance of RocksDB. We therefore have to be careful to initialise each of these instances with a different storage path, or 'cargo test' cannot run tests in parallel.

It would be better to write a simple mock storage (an in-memory store) with the same interface as the current store that could be used for testing.
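
A minimal sketch of such a mock, assuming the store's interface boils down to async key-value reads and writes (the real Store wraps RocksDB behind a similar API):

use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::Mutex;

/// In-memory stand-in for the RocksDB-backed store, for unit tests.
/// No storage path is needed, so tests can run in parallel.
#[derive(Clone, Default)]
pub struct MockStore {
    data: Arc<Mutex<HashMap<Vec<u8>, Vec<u8>>>>,
}

impl MockStore {
    pub async fn write(&self, key: Vec<u8>, value: Vec<u8>) {
        self.data.lock().await.insert(key, value);
    }

    pub async fn read(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.data.lock().await.get(key).cloned()
    }
}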

How do you implement bcast?

Hey, I just wonder whether you implement it simply by:

// consensus/src/synchronizer.rs
let message = NetMessage(Bytes::from(bytes), addresses);
network_channel.send(message).await;

If so, how can a simple send() achieve a broadcast? I found in the docs that the single parameter is just handled as one message.
If not, then how does the protocol achieve it?

Looking forward to your kind reply 😃

Mention dependency of librocksdb-sys on clang in README.md?

Since README.md provides quick start instructions, perhaps mention the dependency of librocksdb-sys on clang there.

During the build, at

Compiling librocksdb-sys v6.11.4

I got the following errors:

  --- stdout
  cargo:warning=couldn't execute `llvm-config --prefix` (error: No such file or directory (os error 2))
  cargo:warning=set the LLVM_CONFIG_PATH environment variable to the full path to a valid `llvm-config` executable (including the executable itself)

  --- stderr
  thread 'main' panicked at 'Unable to find libclang: "couldn\'t find any valid shared libraries matching: [\'libclang.so\', \'libclang-*.so\', \'libclang.so.*\', \'libclang-*.so.*\'], set the `LIBCLANG_PATH` environment variable to a path where one of these files can be found (invalid: [])"'

On Arch Linux, installing extra/clang (version 11.1.0-1) resolved the issue.

Unify the mempool driver and the synchroniser

When the consensus core receives a new block, it first checks whether its mempool has the associated payload. If it doesn't, the mempool driver keeps the block and re-schedules its processing once the mempool manages to get the payload from another node.

However, the synchroniser has no idea of this, so it can happen that the synchroniser tries to sync blocks that we already hold in the mempool driver.
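
A hedged sketch of one way to unify them: share a set of digests the mempool driver is already waiting on, and have the synchroniser consult it before requesting a block from peers (the names are hypothetical; Digest is assumed from the crypto crate):

use std::collections::HashSet;
use std::sync::Arc;
use tokio::sync::Mutex;

/// Hypothetical shared view of the blocks the mempool driver is already
/// waiting on, so the synchroniser can skip redundant sync requests.
type PendingBlocks = Arc<Mutex<HashSet<Digest>>>;

async fn should_sync(pending: &PendingBlocks, digest: &Digest) -> bool {
    // Only request a block from peers if the mempool driver is not
    // already fetching its payload.
    !pending.lock().await.contains(digest)
}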

Sharing small payloads with other nodes in a timely manner

I am using this implementation as an "SMR module" in a prototype where the effective transaction rate is quite low. In this case, transactions "get stuck" in the mempool (more specifically, in the Runner of the PayloadMaker): since the number of transactions pending at a node is low, https://github.com/asonnino/hotstuff/blob/main/mempool/src/payload.rs#L49 never makes a payload, so the only time a payload is made (and the few pending transactions are shared with other nodes) is when the node becomes leader. With many nodes that might take a while, so the transactions experience quite some latency.

If instead even small payloads were shared with other nodes timely, then other leaders could propose these transactions, leading to better latency (albeit probably more overhead in communicating smaller payloads).

As a workaround, since in my use case throughput is really low, it is quick to hack https://github.com/asonnino/hotstuff/blob/main/mempool/src/payload.rs#L47 so that it makes and shares a new payload for every incoming transaction.

For a more universal solution, one might want a new mempool config parameter controlling a timeout, to get a behavior like "keep adding transactions to the payload in the making; make and share the payload either once it is big enough or once the timeout for the oldest pending transaction has expired". Let me know your thoughts; I'd be happy to modify accordingly and submit a pull request.
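
A hedged sketch of that behaviour with a tokio timer, where max_payload_delay is the hypothetical new config parameter: seal and share the pending payload either once it is big enough or once the oldest pending transaction has waited long enough.

use std::time::Duration;
use tokio::sync::mpsc::Receiver;
use tokio::time::{sleep, Instant};

async fn run_payload_maker(
    mut rx_transaction: Receiver<Vec<u8>>,
    max_payload_size: usize,
    max_payload_delay: Duration, // hypothetical config parameter
) {
    let mut payload: Vec<Vec<u8>> = Vec::new();
    let mut size = 0;
    let timer = sleep(max_payload_delay);
    tokio::pin!(timer);

    loop {
        tokio::select! {
            Some(tx) = rx_transaction.recv() => {
                if payload.is_empty() {
                    // First pending transaction: start its deadline.
                    timer.as_mut().reset(Instant::now() + max_payload_delay);
                }
                size += tx.len();
                payload.push(tx);
                if size >= max_payload_size {
                    // Payload is big enough: seal and broadcast it (not shown).
                    payload.clear();
                    size = 0;
                }
            }
            () = &mut timer, if !payload.is_empty() => {
                // The oldest pending transaction waited long enough:
                // seal and broadcast even though the payload is small.
                payload.clear();
                size = 0;
            }
        }
    }
}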

Btw, great code base, thanks for sharing!

Separate Front and Mempool?

The mempool currently handles both incoming transactions (from clients) and incoming payloads (from other nodes). Should these two functions be separated and run in different tokio tasks?
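
A hedged sketch of what the split could look like, with hypothetical names: one tokio task per role, each fed by its own bounded channel, so a flood of payloads from other nodes cannot starve client transactions.

use tokio::sync::mpsc::{channel, Receiver, Sender};

async fn handle_client_transactions(mut rx: Receiver<Vec<u8>>) {
    while let Some(_transaction) = rx.recv().await {
        // Forward the transaction to the payload maker.
    }
}

async fn handle_node_payloads(mut rx: Receiver<Vec<u8>>) {
    while let Some(_payload) = rx.recv().await {
        // Verify, store, and acknowledge the payload.
    }
}

fn spawn_front_and_mempool() -> (Sender<Vec<u8>>, Sender<Vec<u8>>) {
    // One task (and one bounded channel) per role.
    let (tx_client, rx_client) = channel(1_000);
    let (tx_payload, rx_payload) = channel(1_000);
    tokio::spawn(handle_client_transactions(rx_client));
    tokio::spawn(handle_node_payloads(rx_payload));
    (tx_client, tx_payload)
}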

Generic network

The mempool front is almost a copy-paste of the network receiver; making it use the same network stack as the mempool and consensus should be a small change. Also take this opportunity to transfer the serialized payload to the cores of the mempool and consensus so that they can store it directly and avoid a deserialize-serialize round trip.

Accounting for sync replies

We currently reply to every sync request we receive, which costs us resources (particularly in the worker). We need some accounting to prevent bad nodes from monopolizing our resources.
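
A hedged sketch of such accounting: a token bucket per requesting peer, refilled over time, so any single node can trigger only a bounded number of sync replies (the names and bounds are hypothetical):

use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-peer budget for sync replies.
const MAX_TOKENS: u32 = 100;
const REFILL_EVERY: Duration = Duration::from_secs(1);

struct SyncAccounting {
    budgets: HashMap<PublicKey, (u32, Instant)>, // (tokens, last refill)
}

impl SyncAccounting {
    /// Returns true if we should answer this peer's sync request.
    fn allow(&mut self, peer: PublicKey) -> bool {
        let now = Instant::now();
        let (tokens, last) = self.budgets.entry(peer).or_insert((MAX_TOKENS, now));
        if now.duration_since(*last) >= REFILL_EVERY {
            *tokens = MAX_TOKENS;
            *last = now;
        }
        if *tokens == 0 {
            return false; // the peer exceeded its budget: ignore the request
        }
        *tokens -= 1;
        true
    }
}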
