one-of-many-proofs's Issues

Missing add and add_assign impls for proof aggregation

Implementing add and add_assign would allow for trivial non-interactive aggregation of proofs. This will require a slight modification of the proof structure to allow a vector of f_j vectors, offsets, and bit commitments, since these cannot be aggregated (as far as I know). Offsets actually can be aggregated; however, this would occlude their use as linking tags in a linkable ring signature scheme. Perhaps aggregating offsets could be made optional, for those who do not require linkable ring signatures?
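
Below is a minimal sketch of what the trait impls might look like, assuming curve25519-dalek types. The `AggregatedProof` struct and its field names are hypothetical, not the crate's actual layout; it simply carries the non-aggregatable parts side by side, one entry per constituent proof.

```rust
use curve25519_dalek::{ristretto::RistrettoPoint, scalar::Scalar};
use std::ops::{Add, AddAssign};

// Hypothetical container for aggregated proofs. The f_j vectors, offsets,
// and bit commitments cannot be merged, so they are collected per proof.
pub struct AggregatedProof {
    pub f_j: Vec<Vec<Scalar>>,                     // one f_j vector per proof
    pub offsets: Vec<RistrettoPoint>,              // kept separate to preserve linking tags
    pub bit_commitments: Vec<Vec<RistrettoPoint>>, // one commitment set per proof
}

impl AddAssign for AggregatedProof {
    fn add_assign(&mut self, rhs: AggregatedProof) {
        // Aggregation here is concatenation; any truly aggregatable parts
        // of the real proof could be summed instead.
        self.f_j.extend(rhs.f_j);
        self.offsets.extend(rhs.offsets);
        self.bit_commitments.extend(rhs.bit_commitments);
    }
}

impl Add for AggregatedProof {
    type Output = AggregatedProof;
    fn add(mut self, rhs: AggregatedProof) -> AggregatedProof {
        self += rhs;
        self
    }
}
```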

Missing serde implementations

This crate should implement the serde traits to serialize/deserialize proofs.

The deserialization methods should include preliminary security checks, such as checking that scalars are canonical, points are on the curve, etc.
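
A minimal sketch of the validating half, assuming curve25519-dalek's `Scalar::from_canonical_bytes` as the canonicity check; the field-level deserializer shown here is an illustration, not the crate's API:

```rust
use curve25519_dalek::scalar::Scalar;
use serde::{Deserialize, Deserializer};

// Reject non-canonical scalar encodings at parse time, so verification
// never sees a malformed scalar.
fn deserialize_scalar<'de, D: Deserializer<'de>>(d: D) -> Result<Scalar, D::Error> {
    let bytes = <[u8; 32]>::deserialize(d)?;
    Option::<Scalar>::from(Scalar::from_canonical_bytes(bytes))
        .ok_or_else(|| serde::de::Error::custom("non-canonical scalar"))
}
```

Points would get the same treatment, e.g. failing deserialization whenever `CompressedRistretto::decompress` returns `None`.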

Use Gray codes to optimize computation of exponents

The current computation of set filter coefficients uses an expensive binary evaluation of the f0_j and f1_j vectors. Significant performance gains may be achieved by calculating coefficients according to a Gray code sequence instead of the binary integer sequence.

Example of current algorithm

To illustrate this, consider this example for a 3-bit set, given the blinded proof bits f1_j and their complements f0_j. Currently each coefficient is calculated as a product of members of f0_j and f1_j, selected according to the bit at each position of the set index:

| i (dec) | i (binary) | coeff_i |
| --- | --- | --- |
| 0 | 000 | f0[2] * f0[1] * f0[0] |
| 1 | 001 | f0[2] * f0[1] * f1[0] |
| 2 | 010 | f0[2] * f1[1] * f0[0] |
| 3 | 011 | f0[2] * f1[1] * f1[0] |
| 4 | 100 | f1[2] * f0[1] * f0[0] |
| 5 | 101 | f1[2] * f0[1] * f1[0] |
| 6 | 110 | f1[2] * f1[1] * f0[0] |
| 7 | 111 | f1[2] * f1[1] * f1[0] |

In this 3-bit example, each coefficient costs 2 scalar multiplications for each member in the set, for each proof to be verified (i.e. no batch optimization). More generally, for an m-bit proof size, this algorithm requires (m-1) * 2^m multiplications per proof to compute every coefficient.
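
A minimal sketch of this strategy, assuming curve25519-dalek 4.x (`Scalar::ONE`); `f0` and `f1` stand in for the f0_j and f1_j vectors above:

```rust
use curve25519_dalek::scalar::Scalar;

// For each set index i, take the product of f0[j] or f1[j] according to
// bit j of i: m-1 multiplications per coefficient, 2^m coefficients.
fn coeffs_binary(f0: &[Scalar], f1: &[Scalar]) -> Vec<Scalar> {
    let m = f0.len();
    (0..(1usize << m))
        .map(|i| {
            (0..m).fold(Scalar::ONE, |acc, j| {
                acc * if (i >> j) & 1 == 1 { f1[j] } else { f0[j] }
            })
        })
        .collect()
}
```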

Refactored to use Gray coded indices

Using a Gray coded sequence instead of a binary coded sequence allows for a significant optimization. Gray code guarantees that only 1 bit changes between consecutive integers in the sequence, meaning each coefficient can be computed with a single multiplication and a single division applied to the previous coefficient:

| i (dec) | i (Gray) | coeff_i (unoptimized) | coeff_i (optimized) |
| --- | --- | --- | --- |
| 0 | 000 | f0[2] * f0[1] * f0[0] | f0[2] * f0[1] * f0[0] |
| 1 | 001 | f0[2] * f0[1] * f1[0] | coeff[i-1] * f1[0] / f0[0] |
| 2 | 011 | f0[2] * f1[1] * f1[0] | coeff[i-1] * f1[1] / f0[1] |
| 3 | 010 | f0[2] * f1[1] * f0[0] | coeff[i-1] * f0[0] / f1[0] |
| 4 | 110 | f1[2] * f1[1] * f0[0] | coeff[i-1] * f1[2] / f0[2] |
| 5 | 111 | f1[2] * f1[1] * f1[0] | coeff[i-1] * f1[0] / f0[0] |
| 6 | 101 | f1[2] * f0[1] * f1[0] | coeff[i-1] * f0[1] / f1[1] |
| 7 | 100 | f1[2] * f0[1] * f0[0] | coeff[i-1] * f0[0] / f1[0] |

Notice that the optimized computation requires only a single multiplication and a single division for each coefficient. Since Gray code guarantees exactly one bit change no matter how large the sequence, this per-coefficient cost remains constant for proofs of any size. The entire algorithm therefore requires only (m-1) + 2*((2^m) - 1) multiplications per proof to compute every coefficient, counting each division as a multiplication by a precomputed inverse.
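
A minimal sketch of the Gray code walk, again assuming curve25519-dalek 4.x. Divisions are realized as multiplications by inverses precomputed once per bit (a real implementation might batch-invert):

```rust
use curve25519_dalek::scalar::Scalar;

// Walk the set indices in Gray code order; each step flips exactly one
// bit j, so the next coefficient is the previous one times the new f[j]
// and the inverse of the old f[j].
fn coeffs_gray(f0: &[Scalar], f1: &[Scalar]) -> Vec<Scalar> {
    let m = f0.len();
    let f0_inv: Vec<Scalar> = f0.iter().map(|s| s.invert()).collect();
    let f1_inv: Vec<Scalar> = f1.iter().map(|s| s.invert()).collect();

    let mut coeffs = vec![Scalar::ONE; 1usize << m];
    // The coefficient for Gray code 0..0 is the product of all f0[j].
    let mut acc = f0.iter().fold(Scalar::ONE, |a, s| a * s);
    coeffs[0] = acc;

    let mut prev_gray = 0usize;
    for i in 1..(1usize << m) {
        let gray = i ^ (i >> 1);
        let j = (gray ^ prev_gray).trailing_zeros() as usize; // the flipped bit
        if (gray >> j) & 1 == 1 {
            acc = acc * f1[j] * f0_inv[j]; // bit j went 0 -> 1
        } else {
            acc = acc * f0[j] * f1_inv[j]; // bit j went 1 -> 0
        }
        coeffs[gray] = acc; // this coefficient belongs to set index `gray`
        prev_gray = gray;
    }
    coeffs
}
```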

Comparison of performance

The table below compares the number of operations required to compute set coefficients (for every proof) between the current binary strategy and the proposed Gray code strategy, for various m-bit proof sizes.

Notice the optimized algorithm is faster by roughly a factor of m/2.

| m | # ops (binary) | # ops (Gray) |
| --- | --- | --- |
| 3 | 16 | 16 |
| 8 | 1792 | 517 |
| 16 | 9.83E5 | 1.31E5 |
| 32 | 1.33E11 | 8.59E9 |

Adding Merkle tree membership proof?

I know it's a bit off-topic, but I thought I'd ask. Your work looks amazing @cargodog, and I'm planning to play around with some of the work you built and learn from it. I've had a hard time wrapping my head around how one would handle a Merkle tree membership proof using marlin and curve25519-dalek. Any advice?

Suboptimal evaluation of set filter coefficients

Issue #19 proposes a technique to aggregate f_j vectors, to reduce the number of coefficients that must be computed for each set member to just one.

There is potentially further room for optimization in how each coefficient is computed. Currently each coefficient is computed by evaluating f1_j & f0_j against the bit values of its set index. For an m-bit proof, this costs m multiplications per coefficient for each member in the set. Without #19, that is m multiplications per proof at each member in the set; e.g. for k proofs, m * k * 2^m scalar multiplications. If #19 is implemented, this reduces to m * 2^m multiplications and m * (k-1) subtractions.

Instead of computing each new coefficient by re-evaluating the f_j vector at each index of the set, it is possible to perform a much cheaper transformation of the previous coefficient, by adding or subtracting the appropriate bits of f_j. This could reduce the computation of coefficients to only m multiplications plus log2(N) additions, once for the entire set.

Add feature gate to disable redundant security checks in verification

Once the deserialize trait is implemented for proofs (#6), many security checks could be moved out of the verification functions and into the deserialize trait implementation. For example, checking curve points and scalars during verification would be redundant with the checks performed in the deserialization trait implementation.

It might be wiser to leave the redundant checks in place and add a feature gate to disable them. This would give the "safest" behavior by default, while allowing more optimized verification for those who know their proofs were imported via the deserialize trait before verification.
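
A minimal sketch of such a gate; the feature name `skip-redundant-checks` is hypothetical, and `parse_scalar` is an illustration rather than the crate's API:

```rust
use curve25519_dalek::scalar::Scalar;

// Default build: re-validate canonicity during verification.
#[cfg(not(feature = "skip-redundant-checks"))]
fn parse_scalar(bytes: [u8; 32]) -> Option<Scalar> {
    Scalar::from_canonical_bytes(bytes).into()
}

// Gated build: trust that deserialization already validated the scalar.
#[cfg(feature = "skip-redundant-checks")]
fn parse_scalar(bytes: [u8; 32]) -> Option<Scalar> {
    Some(Scalar::from_bytes_mod_order(bytes))
}
```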

Proofs may be optimized by using n-ary commitments

Currently the proof scheme uses binary (2-ary) commitments. The proof scheme can be generalized to arbitrary n-ary commitments. This crate should support that ability, as well as suggest appropriate defaults for a given set size.
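
For illustration, the index decomposition an n-ary scheme would rely on; this generalizes the bit decomposition of the current scheme (nothing here is crate API):

```rust
// Decompose a set index into m base-n digits, least significant first.
// With n = 2 this reduces to the bit decomposition used today.
fn digits(mut index: usize, n: usize, m: usize) -> Vec<usize> {
    (0..m)
        .map(|_| {
            let d = index % n;
            index /= n;
            d
        })
        .collect()
}
```

Larger n shortens the decomposition (m = log_n(N) digits) at the cost of committing to more values per digit, which is why a per-set-size default would be useful.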

Performance can be improved with multi-threaded computations

Currently the bulk of proving and verification time comes from a large pre-computation over the filtered set: each set member is tediously multiplied by a coefficient for each proof, and the results are summed together. This can be sped up greatly by breaking the set processing into parallel computations and aggregating the results. I am not sure of the optimal degree of parallelization, and it likely depends on CPU architecture as much as anything. There are potentially huge performance gains to be had here, though, so this should be a high-priority research effort.
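
A minimal sketch of the parallel pre-computation using rayon (not currently a dependency; the function shape is illustrative):

```rust
use curve25519_dalek::{ristretto::RistrettoPoint, scalar::Scalar, traits::Identity};
use rayon::prelude::*;

// Multiply each set member by its coefficient on worker threads, then
// sum the partial results. Chunking is left to rayon's defaults.
fn filtered_set_sum(set: &[RistrettoPoint], coeffs: &[Scalar]) -> RistrettoPoint {
    set.par_iter()
        .zip(coeffs.par_iter())
        .map(|(p, c)| p * c)
        .reduce(RistrettoPoint::identity, |a, b| a + b)
}
```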

Proofs should be made fixed size with const generics, instead of using vectors

Const generics are not yet a stable feature in Rust (rust-lang/rust#44580). Once const generics stabilize, they should be considered for use in this crate.

This would have a few advantages. Primarily, giving proofs a known size at compile time enables their use in more applications. Second, it would allow all proof logic to operate on plain slices instead of dynamically allocated vectors, which opens up compatibility with more systems and reduces the complexity and issues that arise with dynamic memory.
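
A minimal sketch of what a fixed-size proof could look like once const generics stabilize; the field names are illustrative, not the crate's actual layout:

```rust
use curve25519_dalek::{ristretto::RistrettoPoint, scalar::Scalar};

// M is the proof bit size, so the proof covers a set of N = 2^M members
// and its size is known at compile time.
pub struct Proof<const M: usize> {
    pub bit_commitments: [RistrettoPoint; M],
    pub f_j: [Scalar; M],
}

impl<const M: usize> Proof<M> {
    pub const SET_SIZE: usize = 1 << M;
}
```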

Missing benchmarks

Some benchmarks exist already, but their coverage is poor. It would be nice to improve the benchmarks to get good performance estimates.
