Handel

Handel is a fast multi-signature aggregation protocol for large Byzantine committees. This is the reference implementation in Go.

The protocol

Handel is a Byzantine fault tolerant aggregation protocol that allows for the quick aggregation of cryptographic signatures over a WAN. Handel has both logarithmic time and network complexity and needs minimal computing resources. For more information about the protocol, we refer you to the following presentations:

We have a paper in submission available here: https://arxiv.org/abs/1906.05132 Please note that the slides are not up-to-date with the latest version of the paper.

The reference implementation

Handel is an open-source Go library implementing the protocol. It includes many extension points to allow plugging in different signature schemes, or even other forms of aggregation besides signature aggregation. We implemented extensions to use Handel with BLS multi-signatures over the BN256 curve. We ran large-scale tests, evaluating Handel on 2000 AWS nano instances located in 10 AWS regions and running two Handel nodes per instance. Our results show that Handel scales logarithmically with the number of nodes in both communication and resource consumption. Handel aggregates 4000 BN256 signatures with an average completion time of 900ms and an average network consumption of 56KBytes.

Installation

This library requires Go version 1.11+.

This library uses Go modules, so make sure you either clone it outside your $GOPATH or set GO111MODULE=on before building it.

If you want to hack around the library, you can find more information about the internal structure of Handel in the HACKING.md file.

License

The library is licensed with an Apache 2.0 license. See LICENSE for more information.

Contributors

bkolad, nikkolasg, nkeywal


Issues

Master node should start slaves

Currently the AWS platform starts both master and slave nodes; we could use the master to fully control the lifetime of the slave nodes.

discussion - how many packets per level

We should discuss the two approaches the two Nicolas have implemented:

  • Each time there is a better signature for a given level, we send this new signature to candidateCount peers. The periodic update only sends the current best signature at each level. This sends a lot of "low quality" signatures but homogenizes all nodes.
  • When there is a better signature, it is sent only during the next iteration of the periodic update, to one single peer. Only when there is a full signature do we send to candidateCount peers. This reduces the load of "low quality" signatures but can produce more heterogeneous signatures between peers.
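The contrast between the two approaches can be sketched as follows; `eagerPolicy`, `lazyPolicy` and the returned peer counts are hypothetical simplifications for discussion, not Handel's actual code.

```go
package main

import "fmt"

// eagerPolicy: any improvement goes to candidateCount peers immediately.
func eagerPolicy(candidateCount int) int {
	return candidateCount
}

// lazyPolicy: a partial improvement waits for the periodic update and goes
// to a single peer; only a full signature fans out to candidateCount peers.
func lazyPolicy(candidateCount int, isComplete bool) int {
	if isComplete {
		return candidateCount
	}
	return 1
}

func main() {
	fmt.Println(eagerPolicy(10))       // 10 peers on every improvement
	fmt.Println(lazyPolicy(10, false)) // 1 peer for a partial improvement
	fmt.Println(lazyPolicy(10, true))  // 10 peers for a full signature
}
```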

remote execution will hang if we print lines too long

The remote execution will hang if we print lines that are too long. It works if they are shorter (e.g. split with \n).
This can be reproduced with the following patch in the simul package:

diff --git a/processing.go b/processing.go
index ea7de83..514059b 100644
--- a/processing.go
+++ b/processing.go
@@ -7,6 +7,7 @@ package handel
 import (
        "errors"
        "fmt"
+       "os"
        "sync"
        "time"
 )
@@ -315,10 +316,11 @@ func (f *evaluatorProcessing) processStep() bool {
 func (f *evaluatorProcessing) verifyAndPublish(sp *incomingSig) {
        startTime := time.Now()
        err := (error)(nil)
-       if f.sigSleepTime <= 0 {
+       if f.sigSleepTime <= 0 && false {
                err = verifySignature(sp, f.msg, f.part, f.cons)
        } else {
-               time.Sleep(time.Duration(f.sigSleepTime * 1000000))
+               //time.Sleep(time.Duration(f.sigSleepTime * 1000000))
+               os.Stdout.WriteString("*********************************************************************************************************************************************************************************************************")
        }
        endTime := time.Now()

bn256.hashedMessage panics for some messages

For some messages M, the hashedMessage function panics.
Example:
message = []byte("I am the byzantine general.")

Reason:
Under the hood, bn256.RandomG1(reader) uses the rand.Int(rand io.Reader, max *big.Int) function.
This function assumes the io.Reader implementation can be called
many times and reads fresh data each time.

We fail to satisfy this condition because we create the reader with
reader := bytes.NewBuffer(hashed); this reader has finite capacity, and
_, err = io.ReadFull(reader, bytes) will return an error once the reader is exhausted. We don't handle this error, hence the bug.
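One possible fix is to give the sampler an inexhaustible deterministic reader instead of a finite buffer. The sketch below assumes re-hashing the internal state is an acceptable way to extend the stream; `hashReader` and `newHashReader` are hypothetical names, not the actual fix in the codebase.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
)

// hashReader derives an endless deterministic byte stream from a seed by
// repeatedly re-hashing its state, so callers like rand.Int can read as
// much as they want without ever hitting EOF.
type hashReader struct {
	state [sha256.Size]byte
	buf   []byte // unread bytes of the current state
}

func newHashReader(seed []byte) *hashReader {
	return &hashReader{state: sha256.Sum256(seed)}
}

func (r *hashReader) Read(p []byte) (int, error) {
	n := 0
	for n < len(p) {
		if len(r.buf) == 0 {
			// Refill: hash the previous state to get 32 fresh bytes.
			r.state = sha256.Sum256(r.state[:])
			r.buf = r.state[:]
		}
		c := copy(p[n:], r.buf)
		r.buf = r.buf[c:]
		n += c
	}
	return n, nil
}

func main() {
	r := newHashReader([]byte("I am the byzantine general."))
	buf := make([]byte, 100) // more than one SHA-256 output
	if _, err := io.ReadFull(r, buf); err != nil {
		panic(err) // cannot happen: the reader is never exhausted
	}
	fmt.Printf("read %d bytes without EOF\n", len(buf))
}
```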

Simulation framework

We need a simulation implementation strategy. There are different ways we could do this; please add to the list if you think of other ways.

Architecture

There are probably many ways to design a simulation framework; we should list the different options here. Here is the one we implemented at my previous job, in Go (we could probably take some pieces here and there to drastically reduce dev time):

  1. One "sink" node receives measurements from all nodes running the experiment. Every node knows how to contact that sink - via a separate TCP connection or UDP datagrams. At the end, the sink computes the average, min, deviation, etc. and outputs the result in a CSV file. You can find the relevant code / packages here

Interfaces

First, in order to collect relevant measurements, we need configurable network, store, and processing interfaces, as well as Handel structs, so we can wrap them with measurement-related functionality. I.e., envision something like:

// MeasurementNetwork wraps an existing Network and counts traffic.
type MeasurementNetwork struct {
    packetsSent     uint32
    packetsReceived uint32
    Network // embedded: all other methods are forwarded
}

func (m *MeasurementNetwork) Send(ids []Identity, p *Packet) {
    m.packetsSent++
    m.Network.Send(ids, p)
}

Multiple solutions possible:

  1. Export Handel's fields of general interfaces (processing, store, etc.) so one
    can wrap them into another interface. The simul/ package can contain the
    wrappers.
    • PRO: Very easy to wrap interfaces around.
    • CON: Public fields of Handel are exposed.
  2. Have a "constructor" function for each interface, put into the
    config struct. We could even make the current implementations public.
    • PRO: Quite modular.
    • CON: Larger config; difficult to know in advance which fields are
      required when creating an interface.
  3. Set up a "SimulationHandel" struct, with its own interfaces, inside the
    handel package.
    • PRO: Every implementation detail could be kept hidden but still usable
      for collecting results; the code should be separable from the main
      logic.
    • CON: Simulation code is separated but still in the same package, so not so
      "production-ready".

CI started to fail

Without any change to the code.
It seems we can reproduce the problem locally:
FAIL github.com/ConsenSys/handel/simul/p2p/libp2p [build failed]
@nikkolasg any insight?

Add threshold flexibility to start a level

Today we start a level only when we have all the signatures for it.
We could do something smarter when the missing signature comes from a node that should have communicated long ago.

Technically, we could have a module to identify suspicious nodes. If the missing sig comes from a suspicious node, we start the level. A node would become suspicious if it hasn't sent its signature after a given delay, or if it hasn't responded when we use TCP or QUIC to communicate.

Add metrics about signatures

We should track, per node:

  • the number of signatures removed from the queue (because a better signature already exists)
  • the time taken to check a signature
  • the length of the queue of signatures to verify

The last one will be useful to check that we don't overload the CPU if we run more than one Handel node per two cores.

Monitoring: in the report we have more messages received than sent

For example:
network,nodes,run,threshold,net_rcvd_min,net_rcvd_max,net_rcvd_avg,net_rcvd_sum,net_rcvd_dev,net_sent_min,net_sent_max,net_sent_avg,net_sent_sum,net_sent_dev,sigen_system_min,sigen_system_max,sigen_system_avg,sigen_system_sum,sigen_system_dev,sigen_user_min,sigen_user_max,sigen_user_avg,sigen_user_sum,sigen_user_dev,sigen_wall_min,sigen_wall_max,sigen_wall_avg,sigen_wall_sum,sigen_wall_dev,sigs_sigCheckedCt_min,sigs_sigCheckedCt_max,sigs_sigCheckedCt_avg,sigs_sigCheckedCt_sum,sigs_sigCheckedCt_dev,sigs_sigCheckingTime_min,sigs_sigCheckingTime_max,sigs_sigCheckingTime_avg,sigs_sigCheckingTime_sum,sigs_sigCheckingTime_dev,sigs_sigQueueSize_min,sigs_sigQueueSize_max,sigs_sigQueueSize_avg,sigs_sigQueueSize_sum,sigs_sigQueueSize_dev,sigs_sigSuppressed_min,sigs_sigSuppressed_max,sigs_sigSuppressed_avg,sigs_sigSuppressed_sum,sigs_sigSuppressed_dev,store_replaceTrial_min,store_replaceTrial_max,store_replaceTrial_avg,store_replaceTrial_sum,store_replaceTrial_dev,store_successReplace_min,store_successReplace_max,store_successReplace_avg,store_successReplace_sum,store_successReplace_dev
udp,202,0,200,148.000000,422.000000,252.185000,50437.000000,62.003978,140.000000,376.000000,245.085000,49017.000000,50.381736,0.432000,1.476000,0.961380,192.276000,0.262533,10.992000,47.940000,28.775040,5755.008000,8.814224,5.350673,19.115064,13.303494,2660.698744,3.087293,21.000000,237.000000,65.805000,13161.000000,37.153586,13.296296,79.000000,38.689162,7737.832348,11.845670,0.307692,82.571429,14.384738,2876.947551,12.854892,15.000000,361.000000,126.305000,25261.000000,53.479268,1.000000,190.000000,38.105000,7621.000000,35.554530,13.000000,34.000000,23.345000,4669.000000,3.400558

Insecure hashing in bn256/sign method

The method used to hash a message to a point, m -> scalar s -> s * G, is insecure; it was chosen because no easy method is provided by the Go or CF packages, and because of time pressure. We should try to implement a correct method, maybe by following the ideas in this paper: https://www.di.ens.fr/~fouque/pub/latincrypt12.pdf. That will probably require forking Go's or CF's package in order to access the lower-level methods.

QUIC network implementation thrashes sessions

I did a quick review of the code for the QUIC network implementation. It seems to be creating a QUIC session and dropping it for every incoming packet. Is this intentional? If yes, why?

Of course thrashing sessions will penalise performance – and a 3x slowdown when compared to UDP is not even bad in that circumstance.

To run fair UDP vs QUIC comparisons, this aspect should be fixed.

(BTW – apologies in advance if I misread the code – I did a very quick pass)

Relevant code: https://github.com/ConsenSys/handel/blob/master/network/quic/net.go#L127

CC @marten-seemann

BinomialPartitioner: optimizations

The binomial partitioner currently performs heavy computations each time it computes the partitioning of a level, etc. These computations could be greatly optimized and maybe even cached. For the former, using simple binary operations on the IDs to compute the common prefix length should be sufficient, for example.
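The binary-operations idea can be sketched as follows: one XOR plus a leading-zeros count gives the common prefix length directly. This is an illustration of the suggested optimization under the assumption of a fixed-width integer ID space, not the partitioner's actual code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// commonPrefixLen returns the length of the common binary prefix of two
// node IDs in a bitLen-bit ID space.
func commonPrefixLen(a, b uint32, bitLen int) int {
	x := a ^ b // differing bits; the first set bit ends the common prefix
	if x == 0 {
		return bitLen // identical IDs share the whole prefix
	}
	return bits.LeadingZeros32(x) - (32 - bitLen)
}

func main() {
	// IDs 0b1010 and 0b1001 share the prefix "10" in a 4-bit ID space.
	fmt.Println(commonPrefixLen(0b1010, 0b1001, 4)) // 2
}
```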

Add flag for platform-specific configuration in simul/main.go

For now we pass platform-specific parameters as flags to the generic
simul/main.go launcher. This can be confusing, as the user has to understand which flags are needed for a given platform. For example, the -regions flag is required for the AWS platform but not for localhost.

Solution:
Add a platform-specific config file.

Libp2p - weird behaviors

We now have a comparative baseline simulation using libp2p where each peer connects to a few other peers (designated by a "Count" parameter in a config file), subscribes to the "handel" topic, broadcasts its signature and waits to receive enough signatures.
Unfortunately, this simulation exhibits weird behaviors (~failures) of the libp2p pubsub library. We can test these failures in two different ways, in the fail_libp2p branch:

  1. Running the test TestGossipMeshy in simul/p2p/libp2p, which is directly inspired by the tests found in the libp2p/pubsub repo.
  2. Running the simulation in simul/ with go run main.go -config config_gossip.toml -platform localhost - it's the generalization of the tests. Even with a large number of connected peers, the simulation fails most often.

Please note that sometimes these tests pass, but most often they don't - repeat the experiment!

For the tests, using a Neighbor connector (which makes each peer connect only to some "neighbors" in the ID space, modulo, so all the peers' connections form a circle and the graph is fully connected) works. On the contrary, using the Random connector that randomly connects peers (as in the libp2p pubsub tests) fails most of the time.

fifoProcessing: deprecated

fifoProcessing is no longer used in the main code, only in two places in the tests. It should be removed.

Processing + Partitioner still uses fmt.Printf

github.com/ConsenSys/handel.logf (NOT the logger interface) is called from these 5 sites:

partitioner.go|231 col 8| static function call from (*github.com/ConsenSys/handel.binomialPartitioner).Combine
partitioner.go|244 col 7| static function call from (*github.com/ConsenSys/handel.binomialPartitioner).Combine
processing.go|364 col 7| static function call from github.com/ConsenSys/handel.verifySignature
processing.go|457 col 7| static function call from (*github.com/ConsenSys/handel.fifoProcessing).verifySignature
processing.go|418 col 8| static function call from (*github.com/ConsenSys/handel.fifoProcessing).processIncoming

Handel constructor simplification

At the moment, the Handel constructor looks like the following:

func NewHandel(n Network, r Registry, id Identity, c Constructor, msg []byte, s Signature, conf ...*Config) *Handel

I see two problems with that:

  1. It is very long: 7 arguments is a lot, even more so for Go. It adds quite a cognitive
    load to understand all these arguments and set them properly.
  2. The Config contains the "Contributions" field, which may need to be changed. Of
    course one can take the default value, but if a user sets it once, they have to set it
    every other time as well, whenever the number of ids changes.
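One way to address both problems is functional options with defaults derived from the number of ids. The sketch below is only an illustration of that pattern; Config here is a stand-in, and Option, WithContributions and NewConfig are hypothetical names, not the library's actual API.

```go
package main

import "fmt"

// Config is a stand-in for the real config struct.
type Config struct {
	Contributions int // how many ids are expected to contribute
}

type Option func(*Config)

func WithContributions(n int) Option {
	return func(c *Config) { c.Contributions = n }
}

// NewConfig derives the default from the number of ids, so a user who
// changes the committee size no longer has to remember to update the field.
func NewConfig(nbIDs int, opts ...Option) *Config {
	c := &Config{Contributions: nbIDs}
	for _, o := range opts {
		o(c)
	}
	return c
}

func main() {
	fmt.Println(NewConfig(16).Contributions)                        // 16
	fmt.Println(NewConfig(16, WithContributions(12)).Contributions) // 12
}
```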

Real / Faked Latency for AWS ?

How do we simulate latency for AWS instances within one region? Do we need to simulate it at all, given the time conditions? We should at least explore the naive solution of adding a time.Sleep(100 * time.Millisecond) at the network level and see how it compares to not doing so.
