arkworks-rs / algebra
Libraries for finite field, elliptic curve, and polynomial arithmetic
Home Page: https://arkworks.rs
License: Apache License 2.0
We should make it easier to input parameters to curves etc., rather than relying on decomposing numbers into hex limb values by hand. This is tolerable for hex values, disastrous for decimals, and mistake-prone overall.
It might be helpful to add some READMEs in algebra (and, in the future, other directories) that list out the basic math of the various curves — security, special characteristics like twists and GLV endomorphisms — and some relevant references, since we seem to be adding more curves. Sort of like a pokedex. My inspiration is OpenAI's Spinning Up, which has nice documentation that explains some of the differences between RL algorithms and throws in some useful formulas.
I am happy to do this as it will help motivate my study of elliptic curves.
This design is primarily based on the original GLV paper and Aztec's approach of pre-dividing by n followed by a second round of approximate division by right shift. We include an analysis showing that the error bound, in terms of the short lattice basis lengths, does not increase significantly.
Given a modulus n and a cheap endomorphism ϕ with associated scalar λ such that ϕ(P) = λP for all P on the curve, the objective is to compute two vectors v1 = (a1, b1), v2 = (a2, b2) in Z × Z such that ||vi|| < sqrt(n) and ai + λ·bi = n·si ≡ 0 (mod n). By the construction from the extended Euclidean algorithm found in the paper, we see that wlog a1·|b2| + a2·|b1| = n. We also know that v1 is linearly independent of v2.
Then, given any scalar k ∈ Z/nZ, one has (k, 0) = β1·v1 + β2·v2, where β1, β2 ∈ Q. Solving the linear system (Cramer's rule, with determinant a1·b2 − a2·b1 = ±n), we find that

β1 = k·b2 / (a1·b2 − a2·b1)
β2 = −k·b1 / (a1·b2 − a2·b1)

If b1 and b2 have different signs, |β1| = |b2|·k/n and |β2| = |b1|·k/n, since then k = (a1·|b2| + a2·|b1|)·k/n and b1·|b2| + b2·|b1| = 0.
Otherwise the same computation applies up to signs; as a matter of fact, we can assume that at most one of b1, b2 is negative.
Choose R > n (in practice a power of two, so that division by R is a right shift). Then one precomputes g1 = round(R·|b2|/n) and g2 = round(R·|b1|/n). Write k·b + 1/2 = q·n + t with 0 ≤ t < n. Note that if R·b + (n−1)/2 = g′·n + t′, then g′ = R·b/n + 1/2 − t′/n + ε. Then, if we take c = g′·k/R = (k/R)·(R·b/n + 1/2 − t′/n + ε) = k·b/n + k/(2R) − t′·k/(n·R) + ε′, rounding c gives a result at distance at most |t/n − 1/2 + (k/R)·(1/2 − t′/n) + ε′| from k·b/n. Notice that this is at most 1.
Let c1 = round(g1·k/R) and c2 = round(g2·k/R). So our total error u = (k, 0) − (c1·v1 + c2·v2) satisfies ||u|| ≤ ||v1|| + ||v2|| ≤ 2·max(||v1||, ||v2||) < 2·sqrt(n) by the triangle inequality, and f(u) ≡ k (mod n), where f((a, b)) = a + λ·b. So we set k2 = −(c1·b1 + c2·b2) and k1 = k − λ·k2 mod n.
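The decomposition above can be sketched in Rust with toy parameters. The constants below (n = 13, λ = 3 with λ² + λ + 1 ≡ 0 mod n, basis v1 = (3, −1), v2 = (1, 4), so det = a1·b2 − a2·b1 = n) are illustrative assumptions, not parameters of any real curve; real code would use precomputed multi-limb constants and the round-by-shift trick with R = 2^s.

```rust
// Toy GLV decomposition following the derivation above.
// ASSUMED toy parameters (not from any real curve): n = 13, lambda = 3,
// short basis v1 = (3, -1), v2 = (1, 4), with a_i + lambda*b_i = 0 (mod n).

const N: i128 = 13;
const LAMBDA: i128 = 3;
const V1: (i128, i128) = (3, -1);
const V2: (i128, i128) = (1, 4);

/// Round num/den to the nearest integer (den > 0, ties round up).
fn round_div(num: i128, den: i128) -> i128 {
    let (q, r) = (num.div_euclid(den), num.rem_euclid(den));
    if 2 * r >= den { q + 1 } else { q }
}

/// Decompose k into (k1, k2) with k = k1 + lambda*k2 (mod n), both small.
fn glv_decompose(k: i128) -> (i128, i128) {
    let ((a1, b1), (a2, b2)) = (V1, V2);
    // Cramer's rule for (k, 0) = beta1*v1 + beta2*v2; the determinant is n here.
    let c1 = round_div(k * b2, N);  // round(beta1)
    let c2 = round_div(-k * b1, N); // round(beta2)
    // u = (k, 0) - c1*v1 - c2*v2; return its two coordinates.
    (k - c1 * a1 - c2 * a2, -(c1 * b1 + c2 * b2))
}
```

For every scalar k this returns (k1, k2) with k ≡ k1 + λ·k2 (mod n) and both components bounded by the short basis lengths.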
Notes:
Z × Z and also (Z/nZ) × (Z/nZ)
Marlin IVC for a predicate of max degree 10^6 would demand a domain of roughly 3,000,000 for MNT6-753.
However, MNT6-753 currently has a smooth subgroup of size 2^15 * 5^2, which is insufficient to support this.
Changing 5^2 to 71^1 is still insufficient.
Thus, this calls for support of a truly multi-radix FFT, handling more than two primes.
With a three-radix FFT over a domain of size (2^15)*(5^2)*7, Marlin IVC would more likely work.
It is now left as a future todo.
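As a quick sanity check of the sizes quoted above (the ~3,000,000 target is taken from the issue), one can simply multiply out the smooth factorisations:

```rust
// Multiply out a smooth domain factorisation p1^e1 * p2^e2 * ...
fn domain_size(factors: &[(u64, u32)]) -> u64 {
    factors.iter().map(|&(p, e)| p.pow(e)).product()
}
```

This confirms that 2^15 * 5^2 and 2^15 * 71 both fall short of 3,000,000, while 2^15 * 5^2 * 7 comfortably exceeds it.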
cc'ed @npwardberkeley
Constant time is not a priority for zexe, so it could be nice to have constant-time ops that are feature-gated for when they are needed.
In order to match gnark timings for G1, we should bring mul_assign timings down to 220us from 270us. add_mixed should be brought down from 750us to 705us.
gnark uses the following algorithm credited to Jonathan Bootle: https://jbootle.github.io/Misc/pippenger.pdf.
Actually, I think the derive approach might not be good, especially because of the discrepancy in the way that G1Affine etc. are instantiated. So I'm leaning towards a plain macro approach.
Field should return the extension degree + characteristic (already implemented) so that one can easily determine its order.
Yes, the crate dev is not responsive, but are there really necessary changes?
It appears that ProjectiveCurve: Hash, which sounds slow. I'd expect only AffineCurve: Hash, so one must explicitly batch-normalize any time one wants hashes of points.
There is some complexity here because one should minimize pairings in a pairing equation by collecting points in buckets and running roughly Gaussian elimination. It's doable optimally for BLS signatures provided you give yourself some pre-hashed fixed size message type like a [u8; 32]
, so you can collect those points before even creating any projective points with hash-to-curve.
In general, you need some excess redundancy in some batch_normalization calls if you start with only a bunch of projective points on both sides, but any given proof system could have its batch verifier written to do this optimally, I think; you just need to look at the specific equation. In the end, all these worries would land under some TIPP/MIPP prover, I think.
Hi all,
My name is Dimitri Koshelev. I am a researcher from Moscow and Paris. My field of science is pairing-based cryptography.
Earlier I posted my new point compression method for elliptic curves of j-invariant 0 on the site ethresear.ch. Maybe you are interested.
Also, I know how to generalize my compression method to simultaneously compress three elliptic F_q-points (resp. one F_{q^3}-point) such that at the decompression stage one needs to carry out only 1 exponentiation in the field.
In your opinion, is it a useful result or not?
Sincerely yours, Dimitri.
Currently, the MNT4/6 pairing is implemented in projective coordinates with a binary Miller loop.
Note that the Hamming weight of ATE_LOOP_COUNT for both MNT298 and MNT753 is lower when decomposed in 2-NAF. I suggest implementing a 2-NAF Miller loop for MNT4 and MNT6.
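For reference, here is a sketch of the 2-NAF (non-adjacent form) recoding that would drive such a Miller loop; the function below is illustrative, not the zexe API. Each digit is in {−1, 0, 1}, no two consecutive digits are nonzero, and the Hamming weight is never worse than that of the binary expansion.

```rust
/// Compute the 2-NAF digits of k, least significant first.
/// Digits are in {-1, 0, 1} with no two adjacent nonzero digits.
fn two_naf(mut k: u128) -> Vec<i8> {
    let mut digits = Vec::new();
    while k > 0 {
        if k & 1 == 1 {
            // Pick the odd digit d = ±1 so that k - d is divisible by 4.
            let d: i8 = if k & 3 == 3 { -1 } else { 1 };
            digits.push(d);
            if d == 1 { k -= 1 } else { k += 1 }
        } else {
            digits.push(0);
        }
        k >>= 1;
    }
    digits
}

/// Number of nonzero digits (i.e. doubling steps that also need an add).
fn hamming(digits: &[i8]) -> usize {
    digits.iter().filter(|&&d| d != 0).count()
}
```

For example, 119 = 0b1110111 has binary weight 6 but 2-NAF weight 3, so a 2-NAF Miller loop would perform half the line additions on such a loop count.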
It is algebra after all, and will reduce the clutter.
I assume that Zexe's variable-base multiexp is inherited from bellman. In the section "New advances in multi-scalar-multiplication over elliptic curves" of this document by the Aztec team, it is shown that there is a 2x improvement available.
I was wondering if lazy reductions are possible to speed up the EC operations.
This is as far as I can tell relevant for affine, Jacobian and projective formulas, where intermediate terms are composed of summands that are multiples of two or more field elements.
I'm pretty sure pairing formulas can benefit as well.
In Montgomery reduction mod N, we only require aR·bR to be in the range [0, R·N − 1] (where we require R > N), rather than in the range [0, (R−1)^2].
We can probably use some appropriate compile time checks to make sure that no errors are introduced when utilising lazy reductions in writing formulas.
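A single-limb sketch of the idea, assuming a toy 61-bit modulus and R = 2^64 (all names are illustrative; real field code is multi-limb): since Montgomery reduction only needs its input in [0, R·N), two full products fit (2·N² < R·N here) and can be accumulated lazily before a single reduction.

```rust
// Toy modulus: 2^61 - 1 (odd, and 2*N^2 < R*N for R = 2^64).
const N: u64 = (1 << 61) - 1;

/// Inverse of an odd n modulo 2^64 via Newton iteration.
fn inv_mod_r(n: u64) -> u64 {
    let mut x: u64 = 1;
    for _ in 0..6 {
        x = x.wrapping_mul(2u64.wrapping_sub(n.wrapping_mul(x)));
    }
    x
}

/// Montgomery reduction: returns t / R mod N for any t in [0, R*N).
/// The wide input range is exactly the slack that lazy reduction exploits.
fn mont_reduce(t: u128) -> u64 {
    let n_inv_neg = inv_mod_r(N).wrapping_neg();
    let m = (t as u64).wrapping_mul(n_inv_neg);
    let folded = ((t + (m as u128) * (N as u128)) >> 64) as u64;
    if folded >= N { folded - N } else { folded }
}

/// Map x into Montgomery form: x * R mod N.
fn to_mont(x: u64) -> u64 {
    (((x as u128) << 64) % (N as u128)) as u64
}
```

The payoff: a sum of products like a·b + c·d costs one reduction instead of two, which is where the intermediate terms in the curve formulas come in.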
Implement all algorithms defined in Figure 1 of https://eprint.iacr.org/2012/685.pdf
We can obtain up to 30% speedups for scalar multiplication by implementing a GLV-type endomorphism. See Section 3.5 (pg. 123) of http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.394.3037&rep=rep1&type=pdf
Currently the Miller loop for SW6 uses affine coordinates, which makes the loop much slower than necessary. Moving to projective coordinates should speed up the loop significantly.
One can use the MNT6 Miller loop code as a reference point to get this working. Indeed, a prior version of the SW6 ML code was basically a port of the MNT6 code, but it had incorrect constants, and so the pairing result was incorrect. This (incorrect) version was ~4x faster than the current (correct) version.
You can preprocess the twiddle factors on the domain for FFTs. This halves the number of multiplications required for an FFT. You just have to align the twiddle factors with how they get used.
Here is the relevant code in libiop that does these:
Creating the cache: https://github.com/scipr-lab/libiop/blob/master/libiop/algebra/field_subset/subgroup.tcc#L118
Using the cache: https://github.com/scipr-lab/libiop/blob/master/libiop/algebra/fft.tcc#L309
Currently the From traits aren't implemented on Field, only on PrimeField. This is because they are not implemented on prime field extensions. I think it makes sense to have a From impl on a field extension Fp[x] that just sets the 'constant' term of the field element, which is consistent with what the base field does.
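A sketch of the suggested behaviour with toy stand-in types (Fp, Fp2, and the modulus here are illustrative, not the actual zexe definitions):

```rust
// Toy prime field, modulus 101.
const P: u64 = 101;

#[derive(Debug, PartialEq, Clone, Copy)]
struct Fp(u64);

impl From<u64> for Fp {
    fn from(x: u64) -> Self {
        Fp(x % P)
    }
}

// Toy quadratic extension Fp[x]/(x^2 - nonresidue).
#[derive(Debug, PartialEq, Clone, Copy)]
struct Fp2 {
    c0: Fp,
    c1: Fp,
}

impl From<u64> for Fp2 {
    fn from(x: u64) -> Self {
        // Lift into the 'constant' coefficient only, matching the base field.
        Fp2 { c0: Fp::from(x), c1: Fp::from(0) }
    }
}
```

With such impls in place for every extension tower level, the From bounds could be hoisted from PrimeField up to Field.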
I hope to either use webgpu or emu to parallelise MSM/FFTs. I am not sure what the current implementation of MSM is, but we need to load-balance the different buckets. My target is a 2-10x speedup on a powerful GPU like GTX1060-6GB.
An idea I have which may or may not make sense is to perform the FFT in parallel on the CPU first, then when the butterflies are sufficiently small, move them into small kernels on the GPU. (can't remember which way around it is). Due to lack of data dependencies, we should be able to get very high streaming throughput.
Other problems to think about include how to package the library so that GPU support can be compiled in if desired/detected. Shouldn't be too hard utilising some WebGPU/#[cfg] functionality.
Add methods on the Field and Curve traits if necessary.
Hi all, I would like to share with you some benchmarks I ran on zexe, gnark/goff and zkcrypto's BLS12-381. I believe this is a good place to start to identify what the key avenues for optimisation are.
From these results, it seems that major areas of optimisation are in
More benchmarking should be done once Fq has been sufficiently optimised.
Yet to benchmark:
zk-swap-libff/libff
barretenberg
I believe the deserialization logic here should be improved: https://github.com/scipr-lab/zexe/blob/e1db7c5fede8ba5a0f5b3958692de44da65c1cb6/algebra-core/src/fields/macros.rs#L175
When Zexe tries to read a byte stream and gets a value that isn't in the field, it returns 0. This is confusing for users trying to check for errors, and makes a returned 0 indistinguishable from a valid 0 value.
A current approach to mitigate it is https://github.com/scipr-lab/zexe/pull/181/files#diff-0d3539aa74111fa4e3ea4d5b08b7278cR289, but should be improved.
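One possible shape for the fix, sketched with hypothetical toy types (the error enum, the modulus, and the 8-byte little-endian layout are all assumptions): out-of-range encodings become an explicit error instead of a silent 0.

```rust
// Toy stand-ins; a real field is multi-limb and the modulus a curve constant.
const MODULUS: u64 = (1 << 61) - 1;

#[derive(Debug, PartialEq)]
struct Fp(u64);

#[derive(Debug, PartialEq)]
enum SerializationError {
    InvalidFieldElement,
}

/// Reject encodings >= the modulus rather than mapping them to zero.
fn read_fp(bytes: [u8; 8]) -> Result<Fp, SerializationError> {
    let v = u64::from_le_bytes(bytes);
    if v < MODULUS {
        Ok(Fp(v))
    } else {
        Err(SerializationError::InvalidFieldElement)
    }
}
```

This way a genuine zero round-trips, and an invalid byte stream is surfaced to the caller.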
Currently the polynomial library does not support an API to multiply more than two polynomials together efficiently.
For n
polynomials the API would ideally:
This can trip up users. Should we change this?
currently it is fixed
The current integration of mixed-radix FFTs is kind of unsatisfactory. There are two issues:
Required size: 10;
GeneralEvaluationDomain: domain_size = 2^4 = 16
DynamicEvaluationDomain: domain_size = 2 * 5 = 10
To resolve this, we might want to implement a "dynamic" domain that automatically chooses a subgroup whose size matches the required input size as closely as possible. This is different from the behaviour of something like GeneralEvaluationDomain, which tries to fit things into radix-2 if possible, and only goes for larger domain sizes if radix-2 is insufficient. This would also unify the existing Radix2Domain, MixedRadixDomain, and GeneralEvaluationDomain impls.
FFTs in the library are all n·log(n). However, they could be n·log(d), which is an efficiency improvement when doing polynomial multiplication.
This can be taken advantage of in two possible ways:
This probably doesn't matter for the current pairing based SNARKs as they are dominated by MSM time.
By taking a bigint instead of a field element, we already lose the invariant on the size of the argument. Further, it's clear this has little bearing on code correctness. Hence, I suggest that either scalar_mul be parameterizable by a bigint param, or that we simply accept slices of &[u64] to allow for a more flexible interface.
Depends on arkworks-rs/snark#104
It'd be nice to have a helper method, as an extension trait over CanonicalSerialize, which calls a cryptographically secure hash function (maybe make it take a Digest?) on the serialized input and returns the fixed-size hash of the provided data.
It'd be even better if it would also be available on structs composing such elements.
Example:
use blake2::Blake2b;
use digest::Digest;
use generic_array::GenericArray;
use zexe::algebra::*;

trait Hasher: CanonicalSerialize {
    fn hash<H: Digest>(&self) -> GenericArray<u8, H::OutputSize> {
        // Serialize into an in-memory buffer, then hash the buffer.
        let mut buffer = Vec::new();
        self.serialize(&mut buffer).unwrap();
        let mut hasher = H::new();
        hasher.input(&buffer);
        hasher.result()
    }
}

// Blanket impl so anything serializable gets `hash` for free.
impl<T: CanonicalSerialize> Hasher for T {}

struct X<E: PairingEngine> {
    a: E::G1Affine,
    b: E::G2Affine,
    c: Vec<E::G1Affine>,
}

let x = X::<E> {
    a: E::G1Affine::zero(),
    b: E::G2Affine::zero(),
    c: vec![],
};
let hash = x.hash::<Blake2b>();
This would serialize all elements (compressed or uncompressed? i don't think it matters) to a buffer, and then take the hash of the buffer.
It becomes a bit trickier I believe if the struct ends up having non-CanonicalSerialize elements.
e.g.
struct X<E: PairingEngine> {
    a: E::G1Affine,
    b: E::G2Affine,
    c: Vec<E::G1Affine>,
    transcript_hash: [u8; 64],
}
Currently prime_field.root_of_unity() returns a group element of order 2^{F::PARAMS::TWO_ADICITY}.
It would be convenient to have a method on PrimeField to get the root of unity for a given power of 2. This operation comes up fairly often, including the evaluation domain API of zexe.
It could also be extended in the future to get non-power of two roots.
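A sketch of the proposed method over a toy field (F_17, TWO_ADICITY = 4, maximal 2-adic root 3; all names are illustrative, not the zexe API): since squaring a root of unity halves its order, the root of order 2^k is the maximal root raised to the power 2^(TWO_ADICITY − k).

```rust
// Toy field F_17: 17 - 1 = 2^4, and 3 generates the 2^4-torsion.
const P: u64 = 17;
const TWO_ADICITY: u32 = 4;
const ROOT_OF_UNITY: u64 = 3; // order 2^4 in F_17

/// Square-and-multiply modular exponentiation.
fn pow_mod(mut b: u64, mut e: u64, m: u64) -> u64 {
    let mut acc = 1;
    while e > 0 {
        if e & 1 == 1 {
            acc = acc * b % m;
        }
        b = b * b % m;
        e >>= 1;
    }
    acc
}

/// Root of unity of order 2^k, or None if k exceeds the field's two-adicity.
fn root_of_unity_of_order(k: u32) -> Option<u64> {
    if k > TWO_ADICITY {
        return None;
    }
    // Each squaring halves the order, so square (TWO_ADICITY - k) times.
    Some(pow_mod(ROOT_OF_UNITY, 1u64 << (TWO_ADICITY - k), P))
}
```

The None case gives the evaluation-domain code a clean signal that the requested domain size is unsupported.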
It seems unacceptable that a primitive that can be accelerated (leading to a ~1.5x speedup) is not.
Hence I suggest looking into how one should add assembly for field ops to make use of vectorised instructions etc. I also suggest looking into modes of multiplication beyond textbook, such as Karatsuba multiplication. This may not lead to huge speedups on x86_64, but might on other platforms. Further, one should look into optimisations for faster extension field arithmetic.
Finally, I propose looking into how much of this code can be generated generically for fields of arbitrary size, using either macros or function calls to wrapped assembly, with target-dependent compilation.
The current pairing timings are about 2-2.5ms for the BLS12 curves on a 3+GHz machine. I believe this can be improved closer to the theoretical limit of roughly 1.1-1.3ms. As a target, I would set 1.5ms.
Could the struct for an evaluation domain contain a field element for the affine shift, which would then enable its use as any coset? This not existing required me to re-implement a subset of this logic. (The current API only supports cosets with shift g, and only for a subset of methods.)
My suggested API:
affine_offset^|H|
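One possible shape for such a coset domain, sketched over a toy field F_17 with hypothetical names: the offset c is stored alongside the cached c^|H|, since the vanishing polynomial of the coset cH is x^|H| − c^|H|.

```rust
// Toy field for illustration.
const P: u64 = 17;

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1;
    while e > 0 {
        if e & 1 == 1 {
            acc = acc * b % P;
        }
        b = b * b % P;
        e >>= 1;
    }
    acc
}

/// A multiplicative subgroup H shifted by an arbitrary affine offset c.
struct CosetDomain {
    size: u64,            // |H|
    group_gen: u64,       // generator of H
    offset: u64,          // affine shift c; c = 1 recovers plain H
    offset_pow_size: u64, // cached c^|H| for vanishing-poly evaluation
}

impl CosetDomain {
    fn new(size: u64, group_gen: u64, offset: u64) -> Self {
        CosetDomain { size, group_gen, offset, offset_pow_size: pow_mod(offset, size) }
    }

    /// Z_{cH}(x) = x^|H| - c^|H|.
    fn eval_vanishing_poly(&self, x: u64) -> u64 {
        (pow_mod(x, self.size) + P - self.offset_pow_size) % P
    }

    /// The coset elements c * g^i.
    fn elements(&self) -> Vec<u64> {
        (0..self.size).map(|i| self.offset * pow_mod(self.group_gen, i) % P).collect()
    }
}
```

Setting offset = 1 recovers the plain subgroup, so the existing methods could delegate to this without special-casing.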
We currently use Extended Twisted Edwards Coordinates for computation on Edwards curves. That paper provides a unified addition formula, and a tailored doubling formula that is faster. Currently the doubling implementation uses the unified addition formula (in a way that isn't constant time).
If speed on these curves matters (e.g. for signatures), it's probably worth experimenting with the tailored doubling formula to see whether the computation time is reduced in comprehensive benchmarks. This is not immediate, due to instruction caching effects.
Further, it is probably the case that using this custom doubling formula for scalar multiplication (as suggested in sections 4.3 and 4.4 of the paper) will yield higher speeds, as there we have much less to worry about regarding code being out of cache.
Why? A 319-bit prime modulus reduces G1 cost by approx 1.44x vis-à-vis BLS12-381/377. Constructing curves via the Cocks–Pinch method (the only known method to generate pairing-friendly curves of prescribed order) doubles the size of the base field, so we want it to be small to begin with.
Since we assume the leaves of an aggregating proof system have lower proving power, we believe this is an adequate strategy for 1 layer or even 2 layer proof composition.
Future work
Future work includes finding suitable Cocks–Pinch curves for proof composition with BLS24-319. Hopefully, we can find good curves with log_2 p <= 639 and, for a 2nd layer, log_2 p <= 1279, with highly efficient pairings. Note that the size of the pairing as an arithmetic circuit is about 2x that of BLS12, so the outermost curve has to be at least 2x more efficient (in terms of machine ops/gas costs etc.).
The BW6-761 curve had a pairing time of about 10ms (albeit for a relatively unoptimised implementation), compared with ~2.5 ms for a curve over a prime with half the number of bits, which is a 4x difference. That being said, a slightly over par 1.5x to 2x improvement (on the basis of a smaller base field) seems within reach.
Nonetheless, as one approaches larger field sizes, the costs saved from using a smaller base field may not make up for larger circuits if optimised algorithms like Karatsuba multiplication is used. So one would be trading proof capacity for G1 e.g. proving performance.
BLS48-286
Further, the BLS48-286 curve should also be considered. According to my estimates, it should be 1.5-2x as slow for verification compared to BLS12, but proving could be much faster (9 words on a 32-bit machine). The real benefits can be seen in the outer-layer curve, which could have a modulus < 572 bits.
Note that this has a much larger arithmetic circuit compared with BLS12 - approximately 2.5-5x. So to reap the benefits, the cost of verifying the outer curve has to be < 2.5-5x that of the BW6-761 curve. Based on the discrepancy in base field size, a speedup of 1.5-2x is more likely.
Overall, due to the possibly high complexity of the pairing for BLS48, BLS24 seems like a more promising route for now.
Things to investigate
The elliptic curve point APIs (AffineCurve, ProjectiveCurve) currently claim that the interface provides quite strong guarantees: all points should be on the curve, and moreover in the prime-order subgroup.
However this guarantee isn't achieved at the moment.
I've only looked at short Weierstrass Jacobian curves so far (which encompass the BLS12 curves), and they don't guarantee that points are on the curve, or that they are in the prime-order subgroup for AffineCurve.
The new() method for AffineGroup (which implements AffineCurve) does not ensure that points are on the curve. Also, the SW curve's AffineCurve implementation of from_random_bytes() uses get_point_from_x(), which, as stated in its comments, provides no guarantees about membership in the prime-order subgroup.
I believe that the trait comments should drop the description that they are in the prime order subgroup, and we should have new traits that enforce that this is the case. (Or alternatively, make the two above methods enforce the claimed properties)
Currently we have a slew of different macros; we should systematize and unify these where it makes sense.
Since debugging with macros can be hard, and Zexe uses a significant amount of macros, we should have a script that generates an entire new directory, e.g. zexe_expand, in which all of the macros have been expanded. We can use cargo-expand to achieve this, expanding each file and writing the result to a file. Then this library can be compiled and run, and will be easier to debug.
We can have the expand script located in the crate root, so just run . expand_all.sh.
Very unreadable code results from two traits having an associated type with the same name. One should prefix with the trait name, e.g. AffineScalarField, AffineBaseField.
Currently polynomial evaluation requires linear memory: https://github.com/scipr-lab/zexe/blob/6bfe574f7adea14b97ff554bbb594988635b1908/ff-fft/src/polynomial/dense.rs#L97
Instead, it should only require a constant amount of memory using Horner's method. An example of this is here: https://github.com/scipr-lab/libiop/blob/master/libiop/algebra/polynomials/polynomial.tcc#L99
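A sketch of the Horner version over a toy u64 field (the modulus and function name are illustrative): the accumulator is the only state, so extra memory is O(1) regardless of degree.

```rust
// Toy prime modulus for illustration.
const P: u64 = 1_000_000_007;

/// Evaluate sum_i coeffs[i] * x^i via Horner's rule:
/// fold from the top coefficient down, keeping a single accumulator.
fn horner_eval(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0u64, |acc, &c| {
        ((acc as u128 * x as u128 + c as u128) % P as u128) as u64
    })
}
```

Each step is one multiply-add, so the operation count matches the current linear-memory version while the powers-of-x buffer disappears.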
https://gitlab.com/elrnv/unroll/-/issues/4
Temporary fix in arkworks-rs/snark#219
Due to improvements in special (towered) number field sieves, a recent paper by A. Guillevic shows that many pairing-friendly curves are below their targeted level of security. For instance, the paper states that the base field size of BLS12 curves needs to be increased to 446 bits (instead of 377 as for BLS12-377, or 381 as for Zcash's BLS12-381) to provide 128-bit security.
Furthermore, by personal communication with the author of the above paper, I have also received similar estimates on MNT curves, which I uploaded to our repository; see HorizenOfficial/ginger-lib#47. The results are as for the BLS12 curves.
Guillevic's note also includes a link to a parameter file of new MNT4/6 cycles (with base field sizes of 996 bits and beyond) which serve quite acceptable domain sizes for mixed-radix FFTs.
As we at ginger do not aim at implementing Guillevic's cycle (Coda's is already very slow in performance), I can help by providing parameters (such as cofactors and non-residues for extension fields) if this is of interest for Zexe.
The GLV patents end 25 Sept., in 2 weeks and 2 days. We should collate resources/scripts for finding lambda and omega for all the SW curves Zexe uses, so that the current GLV script can produce the precomputes necessary for GLV scalar mul. This is not constant time either, but combined with wNAF it produces a speedup of over 2x for BW6 projective; batch affine alone only adds about 1.3-1.4x on top of this.
GLV precomputation script should also be generic and work for all curves based on features. We can use macros to simplify.
It should be fairly straightforward (with some care to details) to use what we have in the big integer library to define signed bigintegers.
The basic design would be to have the top-level limb be an i64 and the rest of the limbs be u64s.
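A minimal two-limb sketch of that layout (the type and method names are illustrative): the lower limbs are plain two's-complement words, and the signed top limb carries the sign of the whole integer and absorbs carries.

```rust
/// Two-limb signed big integer, little-endian: low limb unsigned,
/// top limb signed (it determines the sign of the whole value).
#[derive(Debug, Clone, Copy, PartialEq)]
struct SignedBigInt2 {
    lo: u64,
    hi: i64,
}

impl SignedBigInt2 {
    fn from_i128(v: i128) -> Self {
        SignedBigInt2 { lo: v as u64, hi: (v >> 64) as i64 }
    }

    fn to_i128(self) -> i128 {
        ((self.hi as i128) << 64) | self.lo as i128
    }

    /// Limb-wise two's-complement addition; the signed top limb
    /// absorbs the carry out of the low limb.
    fn add(self, other: Self) -> Self {
        let (lo, carry) = self.lo.overflowing_add(other.lo);
        let hi = self.hi.wrapping_add(other.hi).wrapping_add(carry as i64);
        SignedBigInt2 { lo, hi }
    }
}
```

Because the low limbs are ordinary wrapping words, the existing unsigned carry-chain code generalises directly; only the topmost limb needs signed handling.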