poanetwork / hbbft Goto Github PK

An implementation of the paper "Honey Badger of BFT Protocols" in Rust. This is a modular library of consensus.

License: Other

Rust 99.85% Shell 0.15%

hbbft's Introduction

Honey Badger Byzantine Fault Tolerant (BFT) consensus algorithm

Welcome to a Rust library of the Honey Badger Byzantine Fault Tolerant (BFT) consensus algorithm. The research and protocols for this algorithm are explained in detail in "The Honey Badger of BFT Protocols" by Miller et al., 2016.

An official security audit has been completed on hbbft by Jean-Philippe Aumasson.

Following is an overview of HoneyBadger BFT and basic instructions for getting started.

Note: This library is a work in progress and parts of the algorithm are still in development.

What is Honey Badger?

The Honey Badger consensus algorithm allows nodes in a distributed, potentially asynchronous environment to achieve agreement on transactions. The agreement process does not require a leader node, tolerates corrupted nodes, and makes progress in adverse network conditions. Example use cases are decentralized databases and blockchains.

Honey Badger is Byzantine Fault Tolerant. The protocol can reach consensus with a number of failed nodes f (including complete takeover by an attacker), as long as the total number N of nodes is greater than 3 * f.

Honey Badger is asynchronous. It does not make timing assumptions about message delivery. An adversary can control network scheduling and delay messages without impacting consensus.

How does it work?

Honey Badger is a modular library composed of several independent algorithms. To reach consensus, Honey Badger proceeds in epochs. In each epoch, participating nodes broadcast a set of encrypted data transactions to one another and agree on the contents of those transactions.

In an optimal networking environment, output includes data sent from each node. In an adverse environment, the output is an agreed upon subset of data. Either way, the resulting output contains a batch of transactions which is guaranteed to be consistent across all nodes.

In addition to validators, the algorithms support observers: These don't actively participate, and don't need to be trusted, but they receive the output as well, and are able to verify it under the assumption that more than two thirds of the validators are correct.

Please see the following posts for more details:

Algorithms

Honey Badger: Each node inputs transactions. The protocol outputs a sequence of batches of transactions.
Dynamic Honey Badger: A modified Honey Badger where nodes can dynamically add and remove other nodes to/from the network.
Queueing Honey Badger: Works exactly like Dynamic Honey Badger, but includes a built in transaction queue.
Subset: Each node inputs data. The nodes agree on a subset of suggested data.
Broadcast: A proposer node inputs data and every node receives this output.
Binary Agreement: Each node inputs a binary value. The nodes agree on a value that was input by at least one correct node.
Threshold Sign: Each node inputs the same data to be signed, and outputs the unique valid signature matching the public master key. It is used as a pseudorandom value in the Binary Agreement protocol.
Threshold Decryption: Each node inputs the same ciphertext, encrypted to the public master key, and outputs the decrypted data.
Synchronous Key Generation A dealerless algorithm that generates keys for threshold encryption and signing. Unlike the other algorithms, this one is completely synchronous and should run on top of Honey Badger (or another consensus algorithm)

External crates developed for this library

Threshold Crypto: A threshold cryptosystem for collaborative message decryption and signature creation.

Getting Started

This library requires a distributed network environment to function. Details on network requirements TBD.

Note: Additional examples are currently in progress.

Build

Requires Rust 1.36 or higher and cargo: installation instructions. The library is tested against the stable release channel.

$ cargo build [--release]

Testing

$ cargo test --release

See the tests README for more information on our testing toolkit.

Example Network Simulation

A basic example is included to run a network simulation.

$ cargo run --example simulation --release

Heading	Definition
Epoch	Epoch number. In each epoch, transactions are processed in a batch by simulated nodes (default is 10 nodes) on a network. The batch is always output in one piece, with all transactions at once.
Min Time	Time in simulated milliseconds until the first correct (i.e. not faulty) node outputs the batch.
Max Time	Time in simulated milliseconds until the last correct node outputs the batch.
Txs	Number of transactions processed in the epoch.
Msgs/Node	Average number of messages handled by a node. The counter is cumulative and includes the number of messages handled in the current epoch and all previous epochs.
Size/Node	Average message size (in converted bytes) handled by a node. This is cumulative and includes message size for the current epoch and all previous epochs.

Options

Set different parameters to simulate different transaction and network conditions.

Flag	Description
`-h, --help`	Show help options
`--version`	Show the version of hbbft
`-n <n>, --nodes <n>`	The total number of nodes [default: 10]
`-f <f>, --faulty <f>`	The number of faulty nodes [default: 0]
`-t <txs>, --txs <txs>`	The number of transactions to process [default: 1000]
`-b <b>, --batch <b>`	The batch size, i.e. txs per epoch [default: 100]
`-l <lag>, --lag <lag>`	The network lag between sending and receiving [default: 100]
`--bw <bw>`	The bandwidth, in kbit/s [default: 2000]
`--cpu <cpu>`	The CPU speed, in percent of this machine's [default: 100]
`--tx-size <size>`	The size of a transaction, in bytes [default: 10]

Examples:

# view options
$ cargo run --example simulation --release -- -h

# simulate a network with 12 nodes, 2 of which are faulty
$ cargo run --example simulation --release -- -n 12 -f 2

# increase batch size to 500 transactions per epoch
$ cargo run --example simulation --release -- -b 500

Protocol Modifications

Our implementation modifies the protocols described in "The Honey Badger of BFT Protocols" in several ways:

We use a pairing elliptic curve library to implement pairing-based cryptography using a Barrento-Lynn-Scott (BLS12-381) curve.
We add a Terminate message to the Binary Agreement algorithm. Termination occurs following output, preventing the algorithm from running (or staying in memory) indefinitely. (#53)
We add a Conf message to the Binary Agreement algorithm. An additional message phase prevents an attack if an adversary controls a network scheduler and a node. (#37)
We return additional information from the Subset and Honey Badger algorithms that specifies which node input which data. This allows for identification of potentially malicious nodes.
We include a Distributed Key Generation (DKG) protocol which does not require a trusted dealer; nodes collectively generate a secret key. This addresses the problem of single point of failure. See Distributed Key Generation in the Wild.

Algorithm naming conventions

We have simplified algorithm naming conventions from the original paper.

Algorithm Name	Original Name
Honey Badger	HoneyBadgerBFT
Subset	Asynchronous Common Subset (ACS)
Broadcast	Reliable Broadcast (RBC)
Binary Agreement	Asynchronous Binary Byzantine Agreement (ABA)

References

The Honey Badger of BFT Protocols
Honey Badger Video
Other language implementations
- Python
- Go
- Erlang
- Rust - unfinished implementation

Honey Badger Visualization

Contributing

See the CONTRIBUTING document for contribution, testing and pull request protocol.

License

Licensed under either of:

Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

hbbft's People

Contributors

Stargazers

Watchers

Forkers

yrashk therustmonk nanne007 freechain gitter-badger andogro c0gent mbr amiller chainpunk jonnydubowsky afck demimarie 0zand1z pacoferre lonson zhaohanjin d33a94975ba60d59 logannc khoazany zhaohaijun alyjak robertsdotpm r0qs tanngocdo sgeisler padorax jsulmont phahulin korul crixalis2013 yangwao lab10-coop lexfro sustainabilitychain zerofruit agwab pawanjay176 univ-wyo-education trendsetter37 amanusk genysys kigawas jacky-xbb gwangchina86 superdev810 smilofoundation andypeng2015 vyugov amitgtx dendisuhubdy ujang360 yangl1996 recursive-bloom lycrushamster m00sey canislabs 798283635 kftygy4aquzoi tapalogi mvidigueira dingchaoz dmdcoin shaungt1 icodein andrewpedia alexandremessmer samuelschlesinger isabella232 amazingamy gxy1113 jejdouay justinmoon fedimint mnikolic93 ethanncnm sgladkov grumbach d-tietjen versatus jaleelpersonal wbv vegebun-csj iq-scm shubhankar4862 rocinta976 webberic92 0xgoerlimainnet dforsten agnxsh

hbbft's Issues

Add more detailed module-level documentation for all algorithms.

Every module should have an extensive description of what it does, and maybe even a short explanation of how it does it. (That explanation would be needed in the code anyway, to make it readable at all, so why not make it part of the module docs themselves.)

Once the API is stable (#66), we should add a usage example to every module, too.

Use stable for building on CI

Disregarding Clippy & friends, should hbbft not start targeting stable instead of a specific nightly? I believe the CI can be made to work with both without great effort, I will gladly tackle the issue if someone agrees that this is a good idea.

External transaction queue

In many use cases, the user might want to implement their own transaction queue to e.g. prioritize messages. E.g. Parity already has a transaction queue.

We could make it a Queue trait, with the default implementation the current (internal) one. The user could then write their own implementation. A Queue would basically have to "return a random N-th of the first batch_size transactions" on demand, and allow removing a set of transactions after it has been output in a batch.

HoneyBadger would probably not have an input method at all then: the input would go directly to the queue?

Deduplicate node ID and network size information.

Currently every instance of several algorithms has fields for the node's own ID, for the set of all node IDs, for the total number of nodes and/or the number of faulty nodes. We should find a way to deduplicate them.

Would it makes sense if all the instances had a reference &'a NetworkInfo to some common object like:

NetworkInfo<NodeUid> {
    our_uid: NodeUid,
    all_uids: BTreeSet<NodeUid>,
}

and methods num_nodes and num_faulty, etc.? The downside would be that they'd all carry around that lifetime 'a then.

Make DynamicHoneyBadger parameters dynamic, too.

Using a similar mechanism like the one to negotiate validator set changes, DynamicHoneyBadger could also allow changing parameters like the batch size (~~max_future_epochs~~ — that doesn't have to be the same among different nodes anyway), etc.

Also, if we don't find a good way to allow new nodes to join in any epoch (see #129), we should allow the user to restart HoneyBadger for no reason, so that new observers can join.

Implementation

We'd probably add a new Change variant BatchSize (and maybe RestartHoneyBadger). It would allow validators to vote for batch size changes in the same way they vote for adding or removing validators. Once such a change has a majority, no key generation needs to be initialized. We'd just restart HoneyBadger with the new parameter.

ACS should decide on a common subset if at least N - f nodes propose values

There is currently a bug that requires all nodes to propose to guarantee progress.

Tell the new node it's invited.

Currently a new observer can only join a DynamicHoneyBadger network after an epoch in which a ChangeState::Complete has been output, i.e. a node has just joined or left: That's when the internal Honey Badger restarts, key generation is terminated, the vote counts are reset and the "current change" is set to None. Joining at any other point would mean that its vote count is out of sync, it doesn't know what new validator set key generation is in progress for, it misses key generation messages, and it could even have missed Honey Badger messages (unless max_future_epochs is set to 0).

However, a natural point for a new node n to join as an observer is after the epoch where ChangeState::InProgress(Change::Add(n, _)) was output, i.e. where key generation for the new validator set including n is beginning. After InProgress(c), Honey Badger restarts, and the vote counts are also cleared. However, the current change is set to Some(c). What's still missing is a parameter change, so that the new node can initialize its DynamicHoneyBadger accordingly, and know what the current key generation process is about. In particular, it will know if it is the current candidate for joining as a validator, and that it's supposed to take part in key generation.

So the process of joining the network would be something like this:

The new node with ID n generates a random secret key sk, and the corresponding public key pk, and tells the network something like IWannaJoin(n, pk), and the network decides whether to accept it. That part is application-level logic.
Once accepted, the validators input Input::Change(Add(n, pk)) into DynamicHoneyBadger. Soon after that, an output batch in some epoch e will contain ChangeState::InProgress(Add(n, pk)). Starting with the messages from the method call that produced this output, all future Target::All messages must be sent to the new node. (Or cached and sent later, if not connected yet.)
Now the nodes send some kind of Welcome(e + 1, pub_key_set, node_ids) (application-level) message to the new node to inform it about the epoch in which it is joining as an observer, and about the existing node IDs (excluding n) and public keys (excluding pk) in the network.
The new node now creates a new Dynamic Honey Badger instance and sends its initial key generation messages:

let netinfo = NetworkInfo::new(n, node_ids, sk, pub_key_set);
let mut dhb = DynamicHoneyBadger::builder(netinfo)
    .start_epoch(e + 1)
    .change(Change::Add(n, pk))
    .build();
for msg in dhb.message_iter() {
    send_over_the_network(msg);
}

We need to:

add the new parameter and, if present, initialize key generation on startup,
document this,
properly test it,

and maybe:

rename InProgress to Begin or Start,
think about how we can simplify this for the user (what part of the application-level logic can be included in hbbft?),
split NetworkInfo into a public and private part, so that the PublicNetworkInfo can be sent to the new node, which would basically just be node_ids and pub_key_set,
make it possible somehow to join in any epoch (is that feasible?) or stop counting only in epochs: distinguish between e.g. an "era", and reset the epochs to 0 whenever a new era begins, that is, whenever a ChangeState other than None is output.

Together with #113, that should make it ~~easy~~ ~~feasible~~ theoretically possible to implement a network that can be started by a single node, and where validators can dynamically join and leave, without any interruption to the consensus process.

Reliable broadcast by all of the participating nodes is not supported.

Problem

The research paper describes that each in new epoch of ACS all participating nodes will propose their value in the RBC. If I take a look at the current implementation of the Reliable broadcast, I noticed that there is no support for that.

Is this correct? Have I miss interpreted the paper? Or did I miss something in the codebase?

Test fault logs.

Distributed Key Generation

Implement distributed key generation for the threshold schemes (#38, #40).
See Distributed Key Generation in the Wild and A robust threshold elliptic curve digital signature providing a new verifiable secret sharing scheme.

Delay sending messages for future epochs in Honey Badger?

For DynamicHoneyBadger it's important to determine which epochs a newly joined observer node can fully see (the new node needs to participate in the "on-chain" key generation, and it needs to see all epochs involved in that process). So if epoch0 output transactions that represent the decision to include a new node n, and if the key generation process starts in epoch0 + 1, then we need to make sure that node n, still as an observer, receives all messages belonging to epoch0 + 1.

This is mostly the responsibility of the user: We will document that once a batch contains the information about a new node, all future Target::All messages need to be sent to that node, too.

Currently, however, HoneyBadger can run several CommonSubset instances in parallel, i.e. it will handle messages belonging to future epochs and even send some epoch0 + 1 messages (e.g. broadcast Echos) before epoch0 has output. We could store all later outgoing messages in a cache, and only return them after epoch0's output.

On the other hand, handling them earlier may be a desirable optimization: it allows lagging nodes to still participate in and speed up later epochs before they even arrive there? If that's indeed valuable, we could just introduce a limit and, say, only hold back messages from epoch0 + 5 and later. Then we could start key generation in epoch0 + 5 and still know that node n will be able to participate.

Allow observer nodes to join a running network.

For completely new nodes, joining a running consensus session will involve two steps:

Join as an observer, so that there is a clearly defined first epoch from which they are guaranteed to receive all output.
Then perform Distributed Key Generation and turn into a full peer that takes part in the consensus itself.

With #93, it will be possible to remove peers and turn them into observer nodes, and to turn existing observer nodes into peers, so the second point is basically done.

For the first point, the signal indicating the new observer has to be defined by transactions itself — it already is, in #93, namely a majority of Add votes, which is communicated to the user as part of the Batch. The user is responsible for making sure that the new node receives all messages after that output. Once #91 is done, that guarantees that the new node indeed receives all messages for later epochs.

Finally, we need to modify the HoneyBadger and DynamicHoneyBadger constructors to allow starting in an epoch other than 0, and write more tests for changing peer/validator sets.

Make sure Honey Badger also works with a single validator.

This is especially important for Dynamic Honey Badger: With it, one will usually start a network with a single validator, and then add more using the internal DKG.

Design a more ergonomic API.

The current DistAlgorithm trait makes it easy for the caller to forget checking next_message and next_output. It would probably be safer to go back to a design where handle_message directly returns a (#[must_use]?) result that contains the messages and output.

We should also consider moving DistAlgorithm into the tests. Even if we don't, all the algorithms should at least be usable without it, too.

Fix Dynamic Honey Badger messaging logic.

This is currently a bit fuzzy, vulnerable to replaying votes, and doesn't always guarantee that votes eventually get committed. Also, if the joining node only joins after the epoch where the vote for adding it reaches a majority, it won't know the exact vote count and won't be able to detect if it loses majority again.
A solution could look as follows.

We clear the vote counts, restart Honey Badger and set start_epoch to the current epoch every time one of the following conditions occurs:

A change reaches majority: That change remains the "current" one until another change gets a majority of votes that are committed after this epoch. Votes committed in older epochs are dismissed.
Key generation for the current change completes.

Every key generation message (Accept or Prepare) is annotated with that start_epoch (that's already the case). Also, every change vote message is annotated with the start_epoch and in addition with an incrementing vote_number: That way, nodes can change their votes and old votes can't be replayed. For each validator, the vote that counts (if any) is the vote with the highest vote_num that has been committed in an epoch >= start_epoch.

For a vote, the message flow is:

The validator that wants to cast the votes signs the desired change (Add or Remove) with the start_epoch and incremented vote_num and sends it to all the other validators.
When we receive a vote, we put it into pending_votes: BTreeMap<NodeUid, Vote> if it has the current start_epoch, and has a greater vote_num that the existing entry by the same signatory (or if there's no entry yet).
In each new epoch, we propose all pending_votes that are not in votes yet.
After each epoch's output, we put all committed votes into votes if they have the current start_epoch and they have a grater vote_num than the existing entry by the same signatory (if any). If there is a majority in votes, we set current_change to that vote, restart Honey Badger and clear votes and pending_votes. Key generation starts for the new change.

One drawback still remains: the joining node can't verify that it has won the vote and is about to be added. That can't hurt the network, but it's probably still important for the new node to know whether it is actually expected. Also, it should know the current start_epoch so that it knows that it has received all votes, and that it will realize if another change gets a majority.

We could leave it as is: Then it's the user's responsibility to get that information to the new node and verify it.
We could add the current_change and start_epoch to each proposal. That is a bit wasteful, but then the new node would automatically receive it.
We could add an additional Restart(start_epoch, current_change) message type, probably for new observers in general. Every time Honey Badger is restarted, that is sent to Target::All. The new node needs f + 1 Restart messages to trust them.

Make sure secret keys are always inside a `ClearOnDrop`.

Overwriting the memory containing a secret key (#57) can still easily be forgotten: e.g. SecretKey::new returns a key outside of a ClearOnDrop.

Maybe the current SecretKey should be turned into a private SecretKeyInner, and put a ClearOnDrop<Box<SecretKeyInner>> inside the new SecretKey, so that outside of the crypto module there's no way to circumvent it anymore?

Random adversary.

While it's impossible to cover every conceivable sophisticated attack strategy, we should send at least one fake message of each variant in some test, i.e. for each algorithm have a random adversary that creates all kinds of nonsense messages.

Where possible, we could make it a bit smarter (or have a separate, less random adversary) and e.g. generate fake messages with the correct epoch (or ±1), since that is more likely to expose bugs.

Another way to generate more "realistic" fake messages is to simulate a complete instance of the algorithm in parallel and use it as inspiration, i.e. modify its messages a bit and send them to the honest nodes.

Project building error

Cannot build the project due to following error:

Evgens-MacBook-Pro:hbbft user$ cargo build
   Compiling scopeguard v0.3.3
   Compiling smallvec v0.6.0
   Compiling cfg-if v0.1.2
   Compiling nodrop v0.1.12
   Compiling lazy_static v1.0.0
   Compiling crossbeam v0.3.2
   Compiling remove_dir_all v0.5.0
   Compiling stable_deref_trait v1.0.0
   Compiling either v1.5.0
   Compiling rayon-core v1.4.0
   Compiling lazy_static v0.2.11
   Compiling memoffset v0.1.0
   Compiling untrusted v0.5.1
   Compiling protobuf v1.5.1
   Compiling cc v1.0.10
   Compiling memoffset v0.2.1
   Compiling libc v0.2.40
   Compiling gcc v0.3.54
   Compiling log v0.4.1
   Compiling crossbeam-utils v0.2.2
   Compiling arrayvec v0.4.7
   Compiling owning_ref v0.3.3
   Compiling protoc v1.5.1
   Compiling num_cpus v1.8.0
   Compiling rand v0.4.2
   Compiling time v0.1.39
   Compiling crossbeam-epoch v0.3.1
   Compiling crossbeam-epoch v0.2.0
   Compiling reed-solomon-erasure v3.0.3
   Compiling simple_logger v0.5.0
   Compiling crossbeam-deque v0.2.0
   Compiling tempdir v0.3.7
   Compiling parking_lot_core v0.2.13
   Compiling parking_lot v0.4.8
   Compiling rayon v1.0.1
   Compiling rayon v0.8.2
   Compiling crossbeam-channel v0.1.2
   Compiling ring v0.12.1
   Compiling merkle v1.5.1-pre (https://github.com/vkomenda/merkle.rs?branch=public-proof#f34be0aa)
   Compiling protoc-rust v1.5.1
   Compiling hbbft v0.1.0 (file:///Projects/hbbft)
error: failed to run custom build command for `hbbft v0.1.0 (file:///Projects/hbbft)`
process didn't exit successfully: `/Projects/hbbft/target/debug/build/hbbft-b8fd1374e3e5d302/build-script-build` (exit code: 101)
--- stdout
cargo:rerun-if-changed=proto/message.proto

--- stderr
thread 'main' panicked at 'protoc: Error { repr: Os { code: 2, message: "No such file or directory" } }', src/libcore/result.rs:916:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:68
             at src/libstd/sys_common/backtrace.rs:57
             at src/libstd/panicking.rs:381
   2: std::panicking::default_hook
             at src/libstd/panicking.rs:397
   3: std::panicking::begin_panic
             at src/libstd/panicking.rs:577
   4: std::panicking::begin_panic
             at src/libstd/panicking.rs:538
   5: std::panicking::try::do_call
             at src/libstd/panicking.rs:522
   6: std::panicking::try::do_call
             at src/libstd/panicking.rs:498
   7: <core::ops::range::Range<Idx> as core::fmt::Debug>::fmt
             at src/libcore/panicking.rs:71
   8: core::result::unwrap_failed
             at /Users/travis/build/rust-lang/rust/src/libcore/macros.rs:23
   9: <core::result::Result<T, E>>::expect
             at /Users/travis/build/rust-lang/rust/src/libcore/result.rs:809
  10: build_script_build::main
             at ./build.rs:5
  11: std::rt::lang_start::{{closure}}
             at /Users/travis/build/rust-lang/rust/src/libstd/rt.rs:74
  12: std::panicking::try::do_call
             at src/libstd/rt.rs:59
             at src/libstd/panicking.rs:480
  13: panic_unwind::dwarf::eh::read_encoded_pointer
             at src/libpanic_unwind/lib.rs:101
  14: std::sys_common::bytestring::debug_fmt_bytestring
             at src/libstd/panicking.rs:459
             at src/libstd/panic.rs:365
             at src/libstd/rt.rs:58
  15: std::rt::lang_start
             at /Users/travis/build/rust-lang/rust/src/libstd/rt.rs:74
  16: build_script_build::main

Optimize Common Coin using a schedule

Computations involving pairing are relatively slow. Simulation reveals that with even more messages requiring such computations to be performed by each node, the node run times soar. The main contributor to the run time increase is the common coin algorithm. We should try to reduce the number of encrypted Coin messages as suggested in amiller/HoneyBadgerBFT#63. That will also lead to the reduction in the number of Conf messages.

Consistently name individual algorithms.

We started referring to CommonSubset as Subset in some places, and should make sure the names are the same everywhere. Everything should follow the naming convention described in the Readme.

Create a type of test adversary that can perform Man-in-the-Middle attacks

Currently the test adversary can perform only eavesdropping attacks. To the attack in Issue #37, there has to be a more capable adversary that can also

reorder messages,
delay messages for any amount of time.

While man-in-the-middle attacks can be more powerful than the #37 attack, that attack requires the attacker to control the message flow rather than just listen to it. This level of control should be provided by the man-in-the-middle adversary type.

Use error-chain.

It provides a macro to implement the Error trait and error conversions.

(There are lots of error handling crates, but error-chain is what e.g. ethabi, ethcore and poa-bridge also use.)

Test filtering out decryption shares that fail verification

In #68, a decryption share received by HoneyBadger is verified as soon as the ciphertext becomes available for the proposer of this share's value.

We need a CI test to verify this implementation.

Evaluate whether we can use Secret Store's DKG, and secp256k1 for threshold schemes.

Parity's Secret Store implements ECDKG to generate secpk256k1 key shares: It generates a random polynomial f of degree t in Zn (integers modulo n), where n is the order of the group secpk256k1 (G) and distributes the value f(i) to node i > 0, without any node knowing the polynomial. The values f(i) * g, with g the base point of G, are public.

This is used for threshold Schnorr signatures, where ECDKG generates both the nodes' secret keys (once) and the shares for the random nonce r (once per signing protocol execution).

G doesn't seem to have a pairing, and DDH is probably infeasible due to its large embedding degree. So our threshold schemes (#38, #40) don't work in it.

Questions:

Is the above correct at all?
Does threshold encryption also work with G?
Do Schnorr threshold signatures have the uniqueness property required by the common coin?
Is it still possible to have secure coin flips with only one message round?
The Secret Store implementation seems to be incomplete. Can we verify and complete it?
Can we use Parity's DKG instead of our own (#47)?

Extend the test framework to allow benchmarking.

The test framework should optionally support benchmarking the algorithms, e.g. by tagging each message that an instance outputs with a "timestamp" — not real time, just a counter to simulate network lag —, and instead of delivering random messages, always first deliver the ones with the earliest timestamp, i.e.:

Set time = 0.
Process inputs, tag all resulting messages with timestamp 1. Then repeat:
- Increase time to the lowest value of any queued message.
- Handle all messages with timestamp time, and queue the resulting new messages with timestamp time + 1.

Then the time value at the point where the instances output is the latency of the algorithm, in multiples of the network lag.

This could be extended to take into account:

Bandwidth: Instead of time + 1, tag the message with time + lag + msg_size / bandwidth, where msg_size is the length of the serialized message.
CPU: Measure the real time it takes to handle each message, and add (with some configurable factor) that time, too.

These would also require keeping track of a timestamp per node.

Fix Common Coin use in `Agreement`.

The algorithm needs a third round of messages (Conf), otherwise an adversary that controls the network scheduler and one node can potentially stall agreement: amiller/HoneyBadgerBFT#59

Consider making the cryptography tools `Bls12`-only.

It would simplify the code, other modules wouldn't have to be aware of pairing, and if zkcrypto/pairing#87 is implemented, we will remove our general Engine serde implementations and replace them with the Bls12-specific ones.

Consider clearing private keys after drop.

It's probably safer to zero private keys' memory before freeing it, e.g. with clear_on_drop.

Enforce batch size, add configuration options.

There are different ways one may want to specify the batch size in Honey Badger (see the TODO):

Currently the batch size itself is a parameter; each node proposes batch_size / num_nodes transactions per epoch. This is desirable e.g. in a blockchain with a fixed maximum block size.
To achieve its asymptotic complexity, the paper suggests a batch size of lambda * num_nodes * num_nodes * log(num_nodes). Since this comes at the cost of a quickly increasing latency, it's not always the right choice in practice.
There are other options in between with varying tradeoffs, e.g. lambda * num_nodes (constant proposal size per node) or lambda * num_nodes * log(num_nodes), etc.

We could make the batch size specification an enum that allows picking one of these options. This is particularly important for DynamicHoneyBadger, where the number of nodes can change.

Also, we should enforce the limit: If another node is sending us a proposal that exceeds the per-node allowance, the transfer needs to be interrupted, the proposal ignored and the node reported/logged.

Algorithms should not return errors if <⅓ is faulty.

We should double-check some of our error conditions:

The algorithms should never return an error in any circumstance that can be brought about by <⅓ faulty nodes. In any such case, we need to clearly define the behavior (e.g. terminate without output) in a way that all good nodes will do the same. The problem with errors is that they can cause the user (e.g. a higher-level algorithm) to return early from a function, and fail to follow the correct protocol.
For things that can only happen if ≥⅓ is faulty, returning an error is probably the right thing: there is no correct protocol anymore, and the user needs to be alerted and needs to be able to decide programmatically what to do.
In cases that can only happen due to programming errors, I'd say we should use expect, with a string that argues why this cannot fail. (E.g. parity seems to do this.) Error variants whose instantiation should be unreachable code are confusing to the user.

Ensure that the library API satisfies all items in the guidelines checklist

The library API (see #66) has to satisfy (possibly a reasonable subset of) Rust API guidelines:

https://rust-lang-nursery.github.io/api-guidelines/checklist.html

Complete the basic Honey Badger test.

The test doesn't verify yet that the output batches all agree.

Encrypt proposed transactions in Honey Badger

Using #40, encrypt the proposed batches of transactions in HoneyBadger.

Put protobuf serialization behind a feature flag.

This would make it easier for users who don't want to use protobuf to compile the crate. It would remove the requirement to have protoc installed.

Threshold Encryption

Honey Badger requires threshold encryption to avoid sending the proposed transactions as plaintext. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.119.1717&rep=rep1&type=pdf

Allow "observer" nodes.

Most of the algorithms generalize with almost no change to the code to a situation where other nodes, that are not part of the consensus group itself, want to receive the output, too. If all Target::All messages are sent to nodes even if they are not in all_uids, then those nodes will be able to receive and verify the output, too.

This saves an additional type of messages and round of signatures in all use-cases where nodes other than the consensus nodes need to learn about the outcome.

internal message channels

At the moment the internal messaging in the hbbft crate is built on std::sync::mpsc and spmc. It has been known for a while that performance of those could have been better. There are benchmark results here:
https://github.com/crossbeam-rs/rfcs/blob/master/text/2017-11-09-channel.md

Will it improve anything if we use crossbeam-channel instead, or some other implementation of channels for that matter?

Dynamic Honey Badger should produce proofs for new public keys.

When the set of validators changes, we create a new set of secret key shares, and a new public key. Users will probably need a proof to trust the new key, i.e. a signature by the old key.

Is that easy to do externally, or should this just be done by default as part of DynamicHoneyBadger? After the block in which key generation has finished, should we create signature shares for the new block, commit them via the old HoneyBadger instance and only restart with the new validator set once enough of those have been output (i.e. sign the new public key on-chain)? Or should we add an additional message round for exchanging the signature shares, and make the result part of the dynamic_honey_badger::Batch?

Prioritize DKG transactions in `DynamicHoneyBadger`.

In some cases it is desirable to prioritize transactions, i.e. make sure the node includes it in its batch instead of only choosing it at random. We can just add another high priority buffer that always gets included.

For DynamicHoneyBadger the DKG-related transactions should be prioritized to speed up the network change.

Add a `Terminate` message to ABA?

I think our current Agreement implementation is not guaranteed to terminate: It will eventually output, but after that it could hang.

Let's say the f faulty nodes just never send any messages, one node, Alice, decides and outputs 0️⃣ in round 1, and the coin shows 0️⃣ in round 2 again. Then Alice will terminate in round 2 and honest Bob will output in round 2. But Bob must continue until the coin comes up 0️⃣ again. He will be stuck in round 3, where he won't receive more than N - f - 1 Aux messages.

Am I missing something? But even if I am, it would be nice to be able to terminate immediately when we output.

We should add a new message type AgreementContent::Terminate(bool), which counts as Aux, BVal and Conf for all following rounds. Also, if a node receives f + 1 of them, it can immediately terminate, too.

Mostéfaoui, Moumen and Raynal wrote a follow-up paper where they deal with termination. There are some other modifications to ABA in there which we should also have a closer look at. Also: How do the Python, Erlang and Go implementations handle the above? If this is indeed a problem, we should file bugs there, too.

Measure number and size of messages in benchmarks.

Even without simulating bandwidth, CPU and random delays, the simulation example could output the number and total size of messages, in addition to the (simulated) time.

Implement a threshold signature scheme.

The common coin is based on threshold signatures: https://eprint.iacr.org/2002/118.pdf (section 4.2)

Common Coin

Honey Badger's common coin implementation requires threshold signatures: #38

Limit message caches and future epochs?

We need to ensure that an attacker can't fill up a node's memory by sending it lots of messages that it stores forever, particularly in the algorithms that proceed in epochs (agreement, Honey Badger, …) and that, to be fully asynchronous, would in theory need to store any information for any future epoch until they can process it.

It's tempting to enforce some reasonable limits instead, and just assume that we'll never fall behind by more than 1000 epochs. But then an attacker could break consensus by sending a message that is exactly 1000 epochs ahead and will be handled by only some of the honest nodes.

Either way, we should try to limit the amount of memory per epoch: E.g. in agreement, incoming_queue should be replaced by just the information which of the other nodes sent which of the four possible messages: BVal(true), BVal(false), Aux(true), Aux(false).

Honey Badger
Binary Agreement

Split `honey_badger` into smaller modules.

The decryption part involves many lines of code in honey_badger.rs. I wonder whether this could be split out into a private submodule. We could write a kind of collection in which you can put the ciphertext and decryption shares and which would automatically return the decrypted item as soon as it has enough information. If both steps (Subset and Decryption) are separate modules, honey_badger.rs should be very simple and easy to read.

Binary agreement protocol instance

This is an atomic task that can be done in parallel with other tasks. The Common Subset Agreement protocol outlined in Figure 4 of the HoneyBadgerBFT paper features two primitives: Reliable Broadcast and Binary Agreement. While the work on Reliable Broadcast is still in progress, a Binary Agreement instance can be implemented. A detailed description of Binary Agreement is found in Figures 11 and 12.

Support serde.

The crate should support serde serialization for all its messages, possibly behind a feature flag.

One issue is that the merkle types don't implement serde's serialization traits, and they are not trivial to implement because the Proofs contain the hash algorithm instance. We could extract and serialize only the "ProofData", without the algorithm.

Report faulty nodes.

Keep a log of events that prove that a particular node is faulty, and propagate that information up, e.g. via a DistAlgorithm::get_fault_log method. (See #24.)

Reduce the memory footprint of a decryption share in HoneyBadger

PR #68 introduced a new kind of HoneyBadger message that contains a decryption share of the CommonCoin for the threshold encryption scheme. The decryption share contains a vector of 48 bytes which is the compressed representation of a group element. #68 took a rather relaxed view on optimization of memory usage and simply cloned the same share from local storage to a message. This can be optimised by wrapping the locally stored share into an Rc and cloning that Rc to the message instead of the whole vector. However, since serialization is not derivable for Rc, that required defining custom Serialization and Deserialization instances for HoneyBadger::Message, or possibly using an attribute #[serde(with = "a_new_rc_serde_module")].

SyncKeyGen uses stack allocated SecretKeys

SyncKeyGen::new() and SyncKeyGen::generate() both create secrets on the stack.
SyncKeyGen is instantiated using NetworkInfo.secret_key().clone() which copies the SecretKey from a ClearOnDrop<Box<SecretKey>> onto the stack then moves ownership of the SecretKey into an instance of SyncKeyGen (resulting in two copies).