asonnino / hotstuff
Implementation of the HotStuff consensus protocol.
License: Apache License 2.0
In handle_tc, the validity of the TC is not checked:
hotstuff/consensus/src/core.rs
Lines 400 to 408 in 9e9c286
If I am not wrong, any malicious node could send a TC, causing correct nodes to move to the next round.
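A minimal sketch of the missing check (all type and field names here are illustrative, not the repository's actual types): a TC should only be accepted if it carries timeouts for its own round from a quorum of distinct authorities, each with a valid signature (signature verification is elided below).

```rust
// Hypothetical TC validity check; real code would also verify signatures
// and weight authorities by stake.
#[derive(Clone)]
pub struct Timeout {
    pub author: u64,
    pub round: u64,
    // + signature, verified in the real implementation
}

pub struct TC {
    pub round: u64,
    pub timeouts: Vec<Timeout>,
}

pub fn check_tc(tc: &TC, quorum_threshold: usize) -> bool {
    // Keep only timeouts for the TC's round, then count distinct authors.
    let mut authors: Vec<u64> = tc
        .timeouts
        .iter()
        .filter(|t| t.round == tc.round)
        .map(|t| t.author)
        .collect();
    authors.sort_unstable();
    authors.dedup();
    // Require 2f+1 distinct (and, in the real code, correctly signed) timeouts.
    authors.len() >= quorum_threshold
}
```

Calling check_tc before acting on the TC would prevent a single malicious node from advancing correct nodes to the next round.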
Fix Mathieu's bug on 2-chain HotStuff
A bad node may make us store a lot of junk. There is currently no limit to how many payloads it can send us, and we will store them all.
Hi, clap cannot parse the command client [...] --nodes IP1:PORT1 IP2:PORT2 when multiple IP:PORT pairs are provided.
Line 36 in 40180b4
I think adding multiple=true to this line would solve the issue.
I noticed that generate_proposal sometimes gets called multiple times for a given round. I think this is caused by the fact that every node can broadcast a TC: if N nodes broadcast a TC, it will be handled N times and generate_proposal will be called N times as well.
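One possible fix, sketched below with hypothetical names: remember the last round we proposed for and ignore repeated triggers for the same round, so N TCs for one round still produce a single proposal.

```rust
// Hypothetical deduplication guard for proposal generation.
pub struct ProposalGuard {
    last_proposed_round: Option<u64>,
}

impl ProposalGuard {
    pub fn new() -> Self {
        Self { last_proposed_round: None }
    }

    // Returns true only the first time it is called for a given round,
    // so generate_proposal runs at most once per round.
    pub fn should_propose(&mut self, round: u64) -> bool {
        if self.last_proposed_round == Some(round) {
            return false;
        }
        self.last_proposed_round = Some(round);
        true
    }
}
```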
The committee contains an epoch number, but it is never used to reject messages from wrong epochs.
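A minimal sketch of how the epoch could be used (names are illustrative): check every incoming message's epoch against the committee's before any further processing.

```rust
// Hypothetical epoch guard: reject messages tagged with a different epoch.
pub struct Committee {
    pub epoch: u64,
    // ... authorities, stake, network addresses ...
}

pub fn sanitize_epoch(committee: &Committee, msg_epoch: u64) -> Result<(), String> {
    if msg_epoch == committee.epoch {
        Ok(())
    } else {
        Err(format!(
            "Message from epoch {} rejected (current epoch is {})",
            msg_epoch, committee.epoch
        ))
    }
}
```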
In two places in the code, we call ReliableSender::broadcast() and wait for 2f+1 answers.
hotstuff/mempool/src/quorum_waiter.rs
Lines 70 to 83 in ce476e2
hotstuff/consensus/src/proposer.rs
Lines 94 to 121 in ce476e2
This function returns handlers that cancel the transfer when they are dropped. Since we wait for only 2f+1 answers before dropping the handlers, it is possible that f honest nodes never receive the message.
I suggest adding this line:
tokio::spawn(async move { while let Some(_) = wait_for_quorum.next().await {} });
at the end of these two code blocks so the handlers do not get dropped.
I ran two local benchmarks of the current code base after adding this line and obtained 3x more Committed -> logs (1,500 vs 4,600).
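The hazard can be reproduced with a small stdlib sketch (the type below is hypothetical; ReliableSender's real handles are async futures). Each handle cancels its transfer when dropped, so dropping the collection after 2f+1 acks aborts the remaining f transfers; the fix is to move the remaining handles into a background task that drives them to completion.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical model of a cancel-on-drop transfer handle: dropping it
// "aborts" the in-flight send, modelled here by bumping a counter.
pub struct TransferHandle {
    cancelled: Arc<AtomicUsize>,
}

impl TransferHandle {
    pub fn new(cancelled: Arc<AtomicUsize>) -> Self {
        Self { cancelled }
    }
}

impl Drop for TransferHandle {
    fn drop(&mut self) {
        // In ReliableSender, this is where the transfer gets cancelled.
        self.cancelled.fetch_add(1, Ordering::SeqCst);
    }
}
```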
What's the best approach to implement shared randomness? It will be used to elect the leader for the asynchronous fallback and for chained-VABA.
Safety-critical information, such as the last voted round, is currently not persisted to storage.
The cores of consensus and mempool both use the same channel for loopback messages and network messages. This is a problem: a bad node may format its messages as "loopback" to avoid some checks.
Add doc strings and better comments to the rust code.
What are your versions of Python and Fabric? I run fab local in a new tmux session and it crashes, which kills the session. Thanks.
Adding the highest known QC to timeout votes allows honest nodes to synchronize faster in case of leader failures.
Write a wiki to help users get started and understand how the code is structured.
A bad node may make us run out of memory by sending many votes with different round numbers (as long as they are bigger than our current round) or with different digests. We will store all of them in memory and clean them up only upon moving to the next round.
A similar issue appears in the mempool's synchronizer. A bad node may send us many different blocks for the same round (i.e., blocks with different payloads), and we will try to synchronize the block data with other nodes.
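One way to bound this, sketched with hypothetical names and a hypothetical cap: only accept votes within a bounded window of future rounds, and clean up everything behind us when we advance.

```rust
use std::collections::HashMap;

// Hypothetical bound on how far ahead of our round we accept votes;
// in real code this would be a config parameter. A cap on votes per
// round (distinct digests) would bound the other dimension.
const MAX_ROUND_LOOKAHEAD: u64 = 100;

pub struct VoteStore {
    current_round: u64,
    votes: HashMap<u64, Vec<Vec<u8>>>, // round -> serialized votes
}

impl VoteStore {
    pub fn new() -> Self {
        Self { current_round: 0, votes: HashMap::new() }
    }

    // Accept a vote only for a bounded window of future rounds, so a bad
    // node cannot make this map grow without limit.
    pub fn add_vote(&mut self, round: u64, vote: Vec<u8>) -> bool {
        if round < self.current_round || round > self.current_round + MAX_ROUND_LOOKAHEAD {
            return false;
        }
        self.votes.entry(round).or_default().push(vote);
        true
    }

    pub fn advance_to(&mut self, round: u64) {
        self.current_round = round;
        // Clean up votes for rounds we have passed.
        self.votes.retain(|&r, _| r >= round);
    }
}
```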
Hi, we have been using this library in a consensus scenario, but there seem to be some issues with restarting a node.
In our scenario, we run 4 nodes for consensus.
Then we stop one of them for approximately 0.5~1 hours.
Then we restart the node.
Afterwards, the node synchronizes a lot of blocks and gets stuck. Moreover, it eventually drags down the other three nodes, and the whole system hangs.
What could be the problem, and do you have a solution for this case?
Thanks!
We tested many values of rate under the tx_size, max_payload_size, and faults settings below (branch 3-chain), but many of the results were 0, and we don't know why this is happening.
+ CONFIG:
Committee size: 100 nodes
Input rate: 2,010 tx/s
Transaction size: 1,000 B
Faults: 33 nodes
Execution time: 0 s
Consensus timeout delay: 5,000 ms
Consensus sync retry delay: 100,000 ms
Consensus max payloads size: 1,000 B
Consensus min block delay: 100 ms
Mempool queue capacity: 1,200,000 B
Mempool sync retry delay: 100,000 ms
Mempool max payloads size: 256,000 B
Mempool min block delay: 500 ms
+ RESULTS:
Consensus TPS: 0 tx/s
Consensus BPS: 0 B/s
Consensus latency: 0 ms
End-to-end TPS: 0 tx/s
End-to-end BPS: 0 B/s
End-to-end latency: 0 ms
But once in a while, the throughput is displayed. This is also the case for 4, 10, 40, and 100 node tests.
+ CONFIG:
Committee size: 100 nodes
Input rate: 3,350 tx/s
Transaction size: 1,000 B
Faults: 33 nodes
Execution time: 21 s
Consensus timeout delay: 5,000 ms
Consensus sync retry delay: 100,000 ms
Consensus max payloads size: 1,000 B
Consensus min block delay: 100 ms
Mempool queue capacity: 1,200,000 B
Mempool sync retry delay: 100,000 ms
Mempool max payloads size: 256,000 B
Mempool min block delay: 500 ms
+ RESULTS:
Consensus TPS: 836 tx/s
Consensus BPS: 836,069 B/s
Consensus latency: 9,582 ms
End-to-end TPS: 622 tx/s
End-to-end BPS: 622,008 B/s
End-to-end latency: 13,812 ms
Is it possible to have an upper bound on how much storage is required to run a node for one epoch?
All unit tests that require the store currently create one instance of RocksDB. We therefore have to be careful to initialise each of these instances with a different storage path, or 'cargo test' cannot run tests in parallel.
It would be better to write a simple mock storage (an in-memory store) with the same interface as the current store that could be used for testing.
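A minimal sketch of such a mock (the interface here is illustrative; the real store is async and also supports notify-on-read): backing the store with a HashMap removes the on-disk path entirely, so tests need no unique directories and can run in parallel.

```rust
use std::collections::HashMap;

// Hypothetical in-memory stand-in for the RocksDB-backed store, for tests.
#[derive(Default)]
pub struct MockStore {
    data: HashMap<Vec<u8>, Vec<u8>>,
}

impl MockStore {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn write(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.data.insert(key, value);
    }

    pub fn read(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }
}
```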
Hey, I just wonder whether you did it like this:
// consensus/src/synchronizer.rs
let message = NetMessage(Bytes::from(bytes), addresses);
network_channel.send(message).await;
If so, how can a simple send() achieve it? I found in the docs that the only parameter is just handled as a message.
If not, how does the protocol achieve it?
Looking forward to your kind reply!
We need to add the highQC to timeout messages to enable bounded-time catch-up for slow nodes.
Since README.md provides quick-start instructions, perhaps mention the dependency of librocksdb-sys on clang there.
During the build, at
Compiling librocksdb-sys v6.11.4
I got the following errors:
--- stdout
cargo:warning=couldn't execute `llvm-config --prefix` (error: No such file or directory (os error 2))
cargo:warning=set the LLVM_CONFIG_PATH environment variable to the full path to a valid `llvm-config` executable (including the executable itself)
--- stderr
thread 'main' panicked at 'Unable to find libclang: "couldn\'t find any valid shared libraries matching: [\'libclang.so\', \'libclang-*.so\', \'libclang.so.*\', \'libclang-*.so.*\'], set the `LIBCLANG_PATH` environment variable to a path where one of these files can be found (invalid: [])"'
On Arch Linux, installing extra/clang (version 11.1.0-1) resolved the issue.
Is it possible to make the synchroniser memory bound?
It seems that there is a typo in the following line when reading the node parameters:
hotstuff/benchmark/benchmark/config.py
Line 94 in d771d48
When the consensus core receives a new block, it first checks whether its mempool has the associated payload. If it doesn't, the mempool driver keeps the block and re-schedules execution once the mempool manages to get the payload from another node.
However, the synchronizer has no idea of this. So it can happen that the synchronizer tries to sync blocks that we already have in the mempool driver.
The benchmarking scripts currently support a single AWS region.
I am using this implementation as an "SMR module" in a prototype where the effective transaction rate is quite low. In this case, transactions "get stuck" in the mempool (more specifically, the Runner of the PayloadMaker) as follows: Since the number of transactions pending at a node is low, https://github.com/asonnino/hotstuff/blob/main/mempool/src/payload.rs#L49 never makes a payload; so the only time a payload is made (and the few pending transactions are shared with other nodes) is when the node becomes leader, which with many nodes might take a while and so the transactions experience quite some latency.
If instead even small payloads were shared with other nodes timely, then other leaders could propose these transactions, leading to better latency (albeit probably more overhead in communicating smaller payloads).
As a workaround, since in my use case throughput is really low, it is quick to hack https://github.com/asonnino/hotstuff/blob/main/mempool/src/payload.rs#L47 so that it makes and shares a new payload for every incoming transaction.
For a more universal solution, one might want a new mempool config parameter controlling a timeout to get a behavior like "keep adding transactions to the payload in the making; make and share the payload either once it is big or once the timeout for the oldest pending transaction has expired". Let me know your thoughts, I'd be happy to modify accordingly and submit a pull request.
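The proposed behaviour can be sketched with the standard library (the real mempool is async/tokio, and max_payload_size / max_payload_delay are assumed parameter names for the existing size limit and the new timeout): keep adding transactions to the payload in the making, and seal it either once it is big enough or once the deadline set by the oldest pending transaction expires.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

// Hypothetical payload maker loop: seal on size OR on timeout, so small
// payloads are still shared timely under low load.
pub fn make_payload(
    rx: &Receiver<Vec<u8>>,
    max_payload_size: usize,
    max_payload_delay: Duration,
) -> Vec<Vec<u8>> {
    let mut payload = Vec::new();
    let mut size = 0;
    let deadline = Instant::now() + max_payload_delay;
    while size < max_payload_size {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(tx) => {
                size += tx.len();
                payload.push(tx);
            }
            // Deadline reached (or senders gone): seal whatever we have,
            // even if it is small. The caller would skip empty payloads.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    payload
}
```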
Btw, great code base, thanks for sharing!
The mempool currently handles both incoming transactions (from clients) and incoming payloads (from other nodes). Should these two functions be separated and run in different tokio tasks?
The mempool front is almost a copy-paste of the network receiver; making it use the same network as the mempool and consensus should be a small change. Also take this opportunity to transfer the serialized payload to the cores of the mempool and consensus so that they can store it and avoid a deserialize-serialize round trip.
We currently reply to any sync request we receive, which costs us resources (specifically for the worker). We need to do some accounting to prevent bad nodes from monopolizing our resources.
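One simple form of such accounting, sketched with hypothetical names: a per-peer counter over a fixed time window, dropping requests from any peer that exceeds its budget.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical per-peer rate limiter for sync requests: allow at most
// `max_per_window` replies per peer per window, so a bad node cannot
// monopolize our resources.
pub struct SyncLimiter {
    max_per_window: u32,
    window: Duration,
    counters: HashMap<u64, (Instant, u32)>, // peer -> (window start, count)
}

impl SyncLimiter {
    pub fn new(max_per_window: u32, window: Duration) -> Self {
        Self { max_per_window, window, counters: HashMap::new() }
    }

    pub fn allow(&mut self, peer: u64, now: Instant) -> bool {
        let entry = self.counters.entry(peer).or_insert((now, 0));
        if now.duration_since(entry.0) > self.window {
            *entry = (now, 0); // start a fresh window for this peer
        }
        if entry.1 >= self.max_per_window {
            false
        } else {
            entry.1 += 1;
            true
        }
    }
}
```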