
narwhal's People

Contributors

asonnino


narwhal's Issues

Committee update

What is the best way to:

  • Update the committee (change of authorities)
  • Change authorities network info (not the authorities per se, but for instance their ip addresses)
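One possible shape for both cases, sketched in Rust with hypothetical names (`Committee`, `update_authorities`, `update_address` are illustrative, not the project's actual API): a change of authorities is a consensus-visible event and bumps the epoch, while a pure network-info change edits an address in place.

```rust
use std::collections::HashMap;

/// Hypothetical committee snapshot: authority identity (a stable
/// public-key string here) mapped to its current network address.
struct Committee {
    epoch: u64,
    authorities: HashMap<String, String>, // pubkey -> "ip:port"
}

impl Committee {
    /// Replacing the authority set is a consensus-visible change,
    /// so it bumps the epoch.
    fn update_authorities(&mut self, authorities: HashMap<String, String>) {
        self.epoch += 1;
        self.authorities = authorities;
    }

    /// Changing only a peer's network info keeps the same epoch:
    /// the authority set itself is unchanged.
    fn update_address(&mut self, pubkey: &str, addr: String) -> bool {
        match self.authorities.get_mut(pubkey) {
            Some(slot) => {
                *slot = addr;
                true
            }
            None => false,
        }
    }
}

fn main() {
    let mut committee = Committee {
        epoch: 0,
        authorities: HashMap::from([("alice".to_string(), "10.0.0.1:3000".to_string())]),
    };
    assert!(committee.update_address("alice", "10.0.0.2:3000".to_string()));
    assert_eq!(committee.epoch, 0); // address change: same committee
    committee.update_authorities(HashMap::new());
    assert_eq!(committee.epoch, 1); // authority change: new epoch
}
```

Keying authorities by a stable public key is what makes the second operation possible without touching the committee itself.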

Redundant DAG Traversal

Hello there,

I am new to this project and have one question about the consensus crate.

I feel that `Consensus::order_leaders` is redundant. The DAG is traversed twice (or more, if some leader has no link to the later blocks) across both `order_leaders` and `order_dag`. Is there any purpose for doing that?
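For readers unfamiliar with the two passes the question refers to, here is a toy paraphrase (not the crate's actual code) of why both exist: the first pass only answers a reachability question between leaders, while the second collects and deduplicates each leader's causal history for delivery.

```rust
use std::collections::{HashMap, HashSet};

// Toy DAG: vertex id -> parent ids, and vertex id -> round.
struct Dag {
    parents: HashMap<u64, Vec<u64>>,
    round: HashMap<u64, u64>,
}

// `order_leaders`-style pass: is `to` in the causal history of `from`?
fn linked(dag: &Dag, from: u64, to: u64) -> bool {
    let mut stack = vec![from];
    let mut seen = HashSet::new();
    while let Some(v) = stack.pop() {
        if v == to {
            return true;
        }
        if seen.insert(v) {
            stack.extend(dag.parents.get(&v).into_iter().flatten().copied());
        }
    }
    false
}

// `order_dag`-style pass: flatten a leader's causal history, skipping
// vertices already delivered by an earlier leader.
fn flatten(dag: &Dag, leader: u64, delivered: &mut HashSet<u64>) -> Vec<u64> {
    let mut out = Vec::new();
    let mut stack = vec![leader];
    while let Some(v) = stack.pop() {
        if delivered.insert(v) {
            out.push(v);
            stack.extend(dag.parents.get(&v).into_iter().flatten().copied());
        }
    }
    out.sort_by_key(|v| dag.round[v]); // deliver in round order
    out
}

fn main() {
    let dag = Dag {
        parents: HashMap::from([(1, vec![]), (2, vec![1]), (3, vec![2])]),
        round: HashMap::from([(1, 1), (2, 2), (3, 3)]),
    };
    assert!(linked(&dag, 3, 1)); // pass 1: leader 3 reaches leader 1
    let mut delivered = HashSet::new();
    assert_eq!(flatten(&dag, 3, &mut delivered), vec![1, 2, 3]); // pass 2
}
```

Even in this toy form the two traversals overlap, which is the redundancy the question points at; the `delivered` set is what keeps the second pass from emitting duplicates.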

Authenticate nodes' channels

We need some (lightweight) way to ensure only members of the committee can talk to each other, and that bad nodes cannot impersonate good ones.
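A minimal sketch of such a gate (names hypothetical): membership in the committee plus a challenge response. The `verify` closure stands in for a real signature check over a fresh nonce, which would come from a crypto library in practice.

```rust
use std::collections::HashSet;

// Hypothetical gate: accept a peer only if its claimed key is in the
// committee AND it answers a fresh challenge. `verify` is a stand-in
// for real signature verification.
struct Gate {
    committee: HashSet<String>,
}

impl Gate {
    fn authenticate<F>(&self, claimed_key: &str, nonce: u64, response: u64, verify: F) -> bool
    where
        F: Fn(&str, u64, u64) -> bool,
    {
        self.committee.contains(claimed_key) && verify(claimed_key, nonce, response)
    }
}

fn main() {
    let gate = Gate {
        committee: HashSet::from(["alice".to_string()]),
    };
    // Toy "signature": the correct response is nonce + 1.
    let verify = |_key: &str, nonce: u64, response: u64| response == nonce + 1;
    assert!(gate.authenticate("alice", 7, 8, verify));
    assert!(!gate.authenticate("mallory", 7, 8, verify)); // not in committee
    assert!(!gate.authenticate("alice", 7, 9, verify)); // bad response
}
```

The membership check alone is not enough, since identities can be claimed; the challenge is what prevents a bad node from impersonating a good one.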

Ensure crash recovery

Have we thought already about crash and recovery?
For example, we need to persist the headers that we already signed so as not to lose this information when we crash and recover. Is it possible to read the important information from the DB every time we recover?

  • Our round.
  • All certificates from previous round.
  • All headers we signed in current round.
  • Digests that we need to re-include?
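A sketch of the startup read, with a `HashMap` standing in for the persistent store (keys like `"round"` and `"signed/<round>"` are invented for illustration; in the project this would be the RocksDB-backed store):

```rust
use std::collections::HashMap;

// What we recover after a crash: our round, plus the header digests we
// already signed in that round (so we never sign twice).
struct RecoveredState {
    round: u64,
    signed_this_round: Vec<String>,
}

fn recover(store: &HashMap<String, String>) -> RecoveredState {
    let round: u64 = store
        .get("round")
        .and_then(|r| r.parse().ok())
        .unwrap_or(0);
    let signed_this_round = store
        .get(&format!("signed/{round}"))
        .map(|v| v.split(',').map(|s| s.to_string()).collect())
        .unwrap_or_default();
    RecoveredState { round, signed_this_round }
}

fn main() {
    let mut store = HashMap::new();
    store.insert("round".to_string(), "7".to_string());
    store.insert("signed/7".to_string(), "h1,h2".to_string());
    let state = recover(&store);
    assert_eq!(state.round, 7);
    assert_eq!(state.signed_this_round, vec!["h1", "h2"]);
}
```

Writing these keys durably before replying with a signature is what makes the recovery reads meaningful.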

Reliable sender's connection replies have a potential ordering issue

Hello.

I think there is a bug in reliable sender's connection's pending replies queue ordering, but please let me know if it is my understanding that is lacking.

If we send multiple messages to a peer, there is no reason the acknowledgements will be received in the same order. Yet the code seems to assume they are, because it sends each received ACK to the first handler in the queue (`pop_front`).
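One way to sketch the fix the report suggests (types and names invented): tag each outgoing message with an id, echo it in the ACK, and look the handler up by id instead of pairing by arrival order.

```rust
use std::collections::HashMap;

// Pending reply handlers keyed by message id rather than arrival order.
struct PendingReplies<T> {
    waiting: HashMap<u64, T>,
    next_id: u64,
}

impl<T> PendingReplies<T> {
    fn new() -> Self {
        Self { waiting: HashMap::new(), next_id: 0 }
    }

    /// Register a handler for an outgoing message; the returned id is
    /// attached to the message and echoed back in the ACK.
    fn register(&mut self, handler: T) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.waiting.insert(id, handler);
        id
    }

    /// An ACK arrived: hand back the matching handler, if any.
    fn on_ack(&mut self, id: u64) -> Option<T> {
        self.waiting.remove(&id)
    }
}

fn main() {
    let mut pending = PendingReplies::new();
    let a = pending.register("handler-a");
    let b = pending.register("handler-b");
    // ACKs arrive out of order, yet each reaches the right handler.
    assert_eq!(pending.on_ack(b), Some("handler-b"));
    assert_eq!(pending.on_ack(a), Some("handler-a"));
    assert_eq!(pending.on_ack(99), None); // unknown id
}
```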

How do the replicas respond to a client?

Hello! I am working on a project built on top of this one.

Reading the code, I see that the client (from `benchmark_client.rs`) sends data to the replicas here. A replica accepts it while making and committing a new block. During this call, the client does not wait for a reply from the replica, and thus receives no data from it.

Consider a scenario where the client needs to fetch the data stored on the state machine replication. Could you please give me some hints on how I can modify your code so that replicas could respond to client requests?

Thanks a lot!
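One possible approach to the question above (a sketch, not part of the codebase; `ReplyRouter` and the request-id scheme are invented): give each client request an id, park a reply channel under that id when the request arrives, and complete it once the transaction commits.

```rust
use std::collections::HashMap;
use std::sync::mpsc;

// Parked reply channels, keyed by client request id.
struct ReplyRouter {
    pending: HashMap<u64, mpsc::Sender<String>>,
}

impl ReplyRouter {
    fn new() -> Self {
        Self { pending: HashMap::new() }
    }

    /// A client request arrived: park a reply channel and hand the
    /// receiving end to whoever serves that client connection.
    fn on_request(&mut self, request_id: u64) -> mpsc::Receiver<String> {
        let (tx, rx) = mpsc::channel();
        self.pending.insert(request_id, tx);
        rx
    }

    /// The transaction committed: route the result back to the client.
    fn on_commit(&mut self, request_id: u64, result: String) {
        if let Some(tx) = self.pending.remove(&request_id) {
            let _ = tx.send(result); // client may have gone away
        }
    }
}

fn main() {
    let mut router = ReplyRouter::new();
    let rx = router.on_request(42);
    router.on_commit(42, "committed".to_string());
    assert_eq!(rx.recv().unwrap(), "committed");
}
```

For reads against the replicated state, the reply would be produced by the execution layer at commit time rather than by the mempool itself.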

Panic upon storage failure

There is no point in keeping the system running if the storage fails (this is in fact dangerous). We currently panic upon storage failure, but only at a late stage.

Smarter sync mechanism

There are many ways to improve our current sync strategy.

For instance, if we get the same parent from more than one peer, we can ask these peers first before selecting at random.

Another example could be:

  • Ask all nodes for missing data
  • All nodes start streaming chunks (FC coded) of the data
  • Stop streaming once we can reconstruct the data

All nodes keep the already-FC-coded data in memory in case others need to sync.
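The first idea above can be sketched as a target-selection helper (names invented; the fallback here is deterministic rather than random, purely to keep the sketch self-contained):

```rust
use std::collections::HashMap;

// When a digest is missing, ask the peers that already referenced it
// before falling back to arbitrary ones, up to a fixed fanout.
fn pick_sync_targets(
    advertisers: &HashMap<String, Vec<String>>, // digest -> peers that sent it
    digest: &str,
    all_peers: &[String],
    fanout: usize,
) -> Vec<String> {
    let mut targets: Vec<String> =
        advertisers.get(digest).cloned().unwrap_or_default();
    for peer in all_peers {
        if targets.len() >= fanout {
            break;
        }
        if !targets.contains(peer) {
            targets.push(peer.clone());
        }
    }
    targets.truncate(fanout);
    targets
}

fn main() {
    let advertisers = HashMap::from([(
        "digest-1".to_string(),
        vec!["peer-2".to_string()],
    )]);
    let all = vec![
        "peer-1".to_string(),
        "peer-2".to_string(),
        "peer-3".to_string(),
    ];
    // peer-2 advertised the digest, so it is asked first.
    assert_eq!(
        pick_sync_targets(&advertisers, "digest-1", &all, 2),
        vec!["peer-2", "peer-1"]
    );
}
```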

Configuration and logs of test results

Hello,
We conducted a 100-node test on the WAN; the test configuration and results are as follows:

+ CONFIG:
Faults: 33 node(s)
Committee size: 100 node(s)
Worker(s) per node: 1 worker(s)
Collocate primary and workers: True
Input rate: 234,500 tx/s
Transaction size: 200 B
Execution time: 41 s
Header size: 1,000 B
Max header delay: 200 ms
GC depth: 50 round(s)
Sync retry delay: 10,000 ms
Sync retry nodes: 33 node(s)
Batch size: 200,000 B
Max batch delay: 200 ms
+ RESULTS:
Consensus TPS: 213,986 tx/s
Consensus BPS: 42,797,208 B/s
Consensus latency: 4,771 ms
End-to-end TPS: 207,890 tx/s
End-to-end BPS: 41,577,920 B/s
End-to-end latency: 7,852 ms

We have some questions about the test log and configuration.

  1. First, according to the configuration, the sending rate of each client is 3,500 tx/s, and our test time is 30 s. But according to the client's log, each client sends only about 800 tx in this period of time.
  2. According to the worker's log, every 4 tx make up a batch, but the batch is displayed as containing 140,000 B, which doesn't seem to match the configured tx_size = 200 B.
Batch jOiahFVevxMc4+RQEIlZfEjFHha/oBesYqcEHBKSZiU= contains sample tx 786
Batch jOiahFVevxMc4+RQEIlZfEjFHha/oBesYqcEHBKSZiU= contains sample tx 787
Batch jOiahFVevxMc4+RQEIlZfEjFHha/oBesYqcEHBKSZiU= contains sample tx 788
Batch jOiahFVevxMc4+RQEIlZfEjFHha/oBesYqcEHBKSZiU= contains sample tx 789
Batch jOiahFVevxMc4+RQEIlZfEjFHha/oBesYqcEHBKSZiU= contains 140000 B
  3. I would also like to ask about the meaning of the value after each "Committed B" entry in the primary log.
Committed B97(mZFTSr1a8XJoClO4) -> WK0oFGTH44pm3PYAtejZ05EDysdIuDJ1MZuphZQe3m4=
Committed B97(mZFTSr1a8XJoClO4) -> isS9EtiKzZ2qm3DfL37mt8o02TPS519+/aEDggnzTTE=
Committed B97(mZFTSr1a8XJoClO4) -> vI69CmxG2PlTMDc5GYZreMZVUIFIhS8zSIQCQmBxAlk=
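On the second question, a quick arithmetic check (this only verifies internal consistency; whether the benchmark logs only specially tagged "sample" transactions should be confirmed in the client code): at 200 B per transaction, the 140,000 B batch in the log holds 700 transactions, so a handful of "sample tx" lines need not contradict the batch size.

```rust
// Consistency check: number of 200-byte transactions in a 140,000-byte batch.
fn txs_per_batch(batch_bytes: u64, tx_bytes: u64) -> u64 {
    batch_bytes / tx_bytes
}

fn main() {
    assert_eq!(txs_per_batch(140_000, 200), 700);
    // The configured 200,000 B batch-size cap would hold at most 1,000 txs.
    assert_eq!(txs_per_batch(200_000, 200), 1_000);
}
```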

No store for synchronizer

Do not use the store in the synchronizer; it can be much faster to keep the data in memory (we have a lot of memory).

Accounting for sync replies

We currently reply to any sync request we receive, which costs us resources (specifically for the worker). We need to do some accounting to prevent bad nodes from monopolizing our resources.
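A sketch of what such accounting could look like (a per-peer token bucket; the name `SyncBudget` and the numbers are illustrative):

```rust
use std::collections::HashMap;

// Per-peer budget of sync replies, so no single peer can monopolize
// the worker's resources between refills.
struct SyncBudget {
    tokens: HashMap<String, u32>,
    per_peer: u32,
}

impl SyncBudget {
    fn new(per_peer: u32) -> Self {
        Self { tokens: HashMap::new(), per_peer }
    }

    /// Returns true if we should serve this peer's sync request.
    fn allow(&mut self, peer: &str) -> bool {
        let left = self.tokens.entry(peer.to_string()).or_insert(self.per_peer);
        if *left == 0 {
            return false;
        }
        *left -= 1;
        true
    }

    /// Called periodically to restore everyone's budget.
    fn refill(&mut self) {
        self.tokens.clear();
    }
}

fn main() {
    let mut budget = SyncBudget::new(2);
    assert!(budget.allow("peer-1"));
    assert!(budget.allow("peer-1"));
    assert!(!budget.allow("peer-1")); // budget exhausted
    assert!(budget.allow("peer-2")); // other peers unaffected
    budget.refill();
    assert!(budget.allow("peer-1"));
}
```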

Re-include missed txs into our blocks

Currently, we never do it. But it might be the case that the upper-layer consensus declined our block, and thus we need to re-include its txs. We need to think about the API with the consensus layer: it should tell us which blocks we can move to cold storage and which we need to retry.

We currently re-include digests until they appear in a certified header. However, a certified header might still not get into the DAG, so we need to think of a more accurate condition for when to stop re-including digests.

Protect primary against DoS

A bad node may make us run out of memory by sending many headers with very high round numbers. An easy fix is to add one parent certificate (not just its hash) to the header, and only sign headers whose round equals `certificate.round + 1`.
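The proposed check is small; as a sketch (assuming the header indeed embeds one full parent certificate whose round we can read):

```rust
// Only sign a header whose round is exactly one past the round of a
// parent certificate it carries; headers at arbitrary future rounds
// are rejected before we allocate anything for them.
fn should_sign(header_round: u64, parent_certificate_round: u64) -> bool {
    header_round == parent_certificate_round + 1
}

fn main() {
    assert!(should_sign(6, 5));
    assert!(!should_sign(100, 5)); // inflated round: refuse to sign
    assert!(!should_sign(5, 5)); // stale round: refuse to sign
}
```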

Support read operations

What is the best way to support read operations for clients? Remember that the state is sharded amongst the workers.
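One option (a sketch; the hashing scheme is illustrative and says nothing about how Narwhal actually assigns transactions to workers): route each single-key read to the worker shard that owns the key, e.g. by hash modulo the worker count.

```rust
// FNV-1a hash of the key, reduced modulo the number of workers, picks
// the shard to ask. A single-key read then goes to exactly one worker.
fn owning_worker(key: &str, num_workers: usize) -> usize {
    let h: u64 = key
        .bytes()
        .fold(14695981039346656037, |h, b| {
            (h ^ b as u64).wrapping_mul(1099511628211)
        });
    (h % num_workers as u64) as usize
}

fn main() {
    let w = owning_worker("account-123", 4);
    assert!(w < 4); // always a valid shard index
    assert_eq!(w, owning_worker("account-123", 4)); // deterministic routing
}
```

Reads that span multiple shards would instead need a scatter-gather across all workers.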
