Randomized test failure - ARM - G1 Infinity

Constantine

Constantine: High performance cryptography for proof systems and blockchain protocols

“A cryptographic system should be secure even if everything about the system, except the key, is public knowledge.”
— Auguste Kerckhoffs

This library provides constant-time implementation of cryptographic primitives with a particular focus on cryptography used in blockchains and zero-knowledge proof systems.

Constantine

The library aims to be a fast, compact and hardened library for elliptic curve cryptography needs, in particular for blockchain protocols and zero-knowledge proofs system.

The library focuses on following properties:

constant-time (not leaking secret data via side-channels)
performance
generated code size, datatype size and stack usage

in this order.

Public API: Curves & Protocols

Protocols are a set of routines, designed for specific goals or a combination thereof:

confidentiality: only the intended receiver of a message can read it
authentication: the other party in the communication is the expected part
integrity: the received message has not been tampered with
non-repudiation: the sender of a message cannot repudiated it

Legend

✅: Full support
🏗️: Partial support:
- in C, some APIs not provided.
- in Rust, only low-level constantine-sys API available but no high-level wrapper.
🙈: Missing support

Protocols

Constantine supports the following protocols in its public API.

	Nim	C	Rust	Go
Ethereum BLS signatures	✅	🏗️	🏗️	🙈
Ethereum KZG commitments for EIP-4844	✅	✅	✅	✅
Ethereum Virtual Machine Precompiles (ECADD, ECMUL, ECPAIRING, MODEXP)	✅	🙈	🙈	🙈
Zk Accel layer for Halo2 proof system (experimental)	not applicable	not applicable	✅	not applicable

Elliptic Curves

Constantine supports the following curves in its public API.

	Nim	C	Rust	Go
BN254-Snarks	✅	✅	✅	🙈
BLS12-381	✅	✅	✅	🙈
Pasta curves (Pallas & Vesta)	✅	✅	✅	🙈

For all elliptic curves, the following arithmetic is supported

field arithmetic
- on Fr (i.e. modulo the 255-bit curve order)
- on Fp (i.e. modulo the 381-bit prime modulus)
elliptic curve arithmetic:
- on elliptic curve over Fp (EC G1) with affine, jacobian and homogenous projective coordinates
- on elliptic curve over Fp2 (EC G2) with affine, jacobian and homogenous projective coordinates
- including scalar multiplication, multi-scalar-multiplication (MSM) and parallel MSM

All operations are constant-time unless explicitly mentioned vartime.

For pairing-friendly curves Fp2 arithmetic is also exposed.
🏗️ Pairings and multi-pairings are implemented but not exposed yet.

General cryptography

Constantine supports the following hash functions and CSPRNGs in its public API.

	Nim	C	Rust	Go
SHA256	✅	✅	🏗️
Cryptographically-secure RNG from Operating System (sysrand)	✅	✅	✅	✅

Threadpool

Constantine also exposes a high-performance threadpool for Nim that inherits performance and API from:

Task parallelism API RFC: nim-lang/RFCs#347
- Weave data parallelism API:
  - spawn and sync
  - parallelFor and syncScope
    - parallelFor supports arbitrarily complex reduction. Constantine uses it extensively for parallel elliptic curve sum reductions.
  - isSpawned and isReady
CPU Topology - Query the number of threads available at the OS/VM-level to run computations:
- ctt_cpu_get_num_threads_os in C
- getNumThreadsOS in Nim
- constantine_core::hardware::get_num_threads_os in Rust
https://github.com/mratsim/weave
https://github.com/status-im/nim-taskpools

The threadpool supports nested parallelism to exploit high core counts and does not suffer from OpenMP limitations of nested parallel loops. For batching KZG verification, Constantine issues 3 multi-scalar multiplication in parallel, each using at 3 nested parallel loops.

See the following documents on the threadpool performance details, design and research:

Installation


❗ Constantine can be compiled by Nim v1.6.x or v2.0.2 but not Nim v2.0.0 due to a compile-time integer regression

From Rust

Install clang compiler, for example:

Debian/Ubuntu sudo apt update && sudo apt install build-essential clang
Archlinux pacman -S base-devel clang


📝	We require Clang as it's significantly more performant than GCC for cryptographic code, especially for ARM where Constantine has no assembly optimizations. And Rust, like Clang both rely on LLVM. This can be changed to any C compiler by deleting this line.

Install nim, it is available in most distros package manager for Linux and Homebrew for MacOS Windows binaries are on the official website: https://nim-lang.org/install_unix.html
- Debian/Ubuntu sudo apt install nim
- Archlinux pacman -S nim
Test both:
- the experimental ZK Accel API (ZAL) for Halo2-KZG
- Ethereum EIP4844 KZG polynomial commitments
```
git clone https://github.com/mratsim/constantine
cd constantine
cargo test
cargo bench
```

Add Constantine as a dependency in Cargo.toml

for Halo2-KZG Zk Accel Layer

[dependencies]
constantine-halo2-zal = { git = 'https://github.com/mratsim/constantine' }

for Ethereum EIP-4844 KZG polynomial commitments

[dependencies]
constantine-ethereum-kzg = { git = 'https://github.com/mratsim/constantine' }

Optionally, cross-language LTO between Nim and Rust can be used, see https://doc.rust-lang.org/rustc/linker-plugin-lto.html:

Add a .cargo/config.toml to your project with the following:

# .cargo/config.toml

[build]
rustflags="-Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=lld"

and modify Constantine's build.rs to pass CTT_LTO=1

    Command::new("nimble")
        .env("CC", "clang")
        .env("CTT_LTO", "1") // <--
        .arg("make_lib_rust")
        .current_dir(root_dir)
        .stdout(Stdio::inherit())
        .stderr(Stdio::inherit())
        .status()
        .expect("failed to execute process");

From Go

Install any C compiler, clang is recommended, for example:
- Debian/Ubuntu sudo apt update && sudo apt install build-essential clang
- Archlinux pacman -S base-devel clang
Install nim, it is available in most distros package manager for Linux and Homebrew for MacOS Windows binaries are on the official website: https://nim-lang.org/install_unix.html
- Debian/Ubuntu sudo apt install nim
- Archlinux pacman -S nim
Compile Constantine as a static (and shared) library in ./include
```
cd constantine
CC=clang nimble make_lib
```
Test the go API.
```
cd constantine-go
go test -modfile=../go_test.mod
```
📝 Constantine uses a separate modfile for tests.
It has no dependencies (key to avoid supply chain attacks) except for testing.

From C

Install a C compiler, clang is recommended, for example:
- Debian/Ubuntu sudo apt update && sudo apt install build-essential clang
- Archlinux pacman -S base-devel clang
Install nim, it is available in most distros package manager for Linux and Homebrew for MacOS Windows binaries are on the official website: https://nim-lang.org/install_unix.html
- Debian/Ubuntu sudo apt install nim
- Archlinux pacman -S nim
Compile the dynamic and static library.
- Recommended:
  CC=clang nimble make_lib
- or CTT_ASM=0 nimble make_lib
  to compile without assembly (otherwise it autodetects support)
- or with default compiler
  nimble make_lib
Ensure the libraries work
- nimble test_lib
Libraries location
- The librariess are put in ./lib/ folder
- The headers are in ./include/ for example Ethereum BLS signatures
Read the examples in examples-c:
- Using the Ethereum BLS signatures bindings from C
- Testing Constantine BLS12-381 vs GMP ./examples-c/t_libctt_bls12_381.c

From Nim

You can install the developement version of the library through nimble with the following command

nimble install https://github.com/mratsim/constantine@#master

Dependencies & Requirements

For speed it is recommended to use Clang (see Compiler-caveats). In particular GCC generates inefficient add-with-carry code.

Constantine requires at least:

GCC 7
Previous versions generated incorrect add-with-carry code.
Clang 14
On x86-64, inline assembly is used to workaround compilers having issues optimizing large integer arithmetic, and also ensure constant-time code.
Constantine uses the intel assembly syntax to address issues with the default AT&T syntax and constants propagated in Clang.
Clang 14 added support for -masm=intel.

On MacOS, Apple Clang does not support Intel assembly syntax, use Homebrew Clang instead or compile without assembly.
Note that Apple is discontinuing Intel CPU throughough their product line so this will impact only older model and Mac Pro

On Windows, Constantine is tested with MinGW. The Microsoft Visual C++ Compiler is not configured.

Constantine has no C, Nim, Rust, Go dependencies, besides compilers, even on Nim standard library except:

for testing and benchmarking
- the tested language json and yaml parsers for test vectors
- the tested language standard library for tests, timing and message formatting.
- GMP for testing against GMP
for Nvidia GPU backend:
- the LLVM runtime ("dev" version with headers is not needed)
- the CUDA runtime ("dev" version with headers is not needed)
at compile-time
- we need the std/macros library to generate Nim code.

Performance

This section got way too long and has its own file.
See ./README-PERFORMANCE.md

Assembly & Hardware acceleration

Assembly is used on x86 and x86-64, unless CTT_ASM=0 is passed.
Assembly is planned for ARM.
GPU acceleration is planned.

Assembly solves both:

Security: fighting the compiler for constant time code
Performance: compiler caveats

Security

Hardening an implementation against all existing and upcoming attack vectors is an extremely complex task. The library is provided as is, without any guarantees at least until:

it gets audited
formal proofs of correctness are produced
formal verification of constant-time implementation is possible

Defense against common attack vectors are provided on a best effort basis. Do note that Constantine has no external package dependencies hence it is not vulnerable to supply chain attacks (unless they affect a compiler or the OS).

Attackers may go to great lengths to retrieve secret data including:

Timing the time taken to multiply on an elliptic curve
Analysing the power usage of embedded devices
Detecting cache misses when using lookup tables
Memory attacks like page-faults, allocators, memory retention attacks

This is would be incomplete without mentioning that the hardware, OS and compiler actively hinder you by:

Hardware: sometimes not implementing multiplication in constant-time.
OS: not providing a way to prevent memory paging to disk, core dumps, a debugger attaching to your process or a context switch (coroutines) leaking register data.
Compiler: optimizing away your carefully crafted branchless code and leaking server secrets or optimizing away your secure erasure routine which is deemed "useless" because at the end of the function the data is not used anymore.

A growing number of attack vectors is being collected for your viewing pleasure at https://github.com/mratsim/constantine/wiki/Constant-time-arithmetics

Disclaimer

Constantine's authors do their utmost to implement a secure cryptographic library in particular against remote attack vectors like timing attacks.

Please note that Constantine is provided as-is without guarantees. Use at your own risks.

Thorough evaluation of your threat model, the security of any cryptographic library you are considering, and the secrets you put in jeopardy is strongly advised before putting data at risk. The author would like to remind users that the best code can only mitigate but not protect against human failures which are the weakest link and largest backdoors to secrets exploited today.

Security disclosure

You can privately report a security vulnerability through the Security tab.

Security > Report a vulnerability

Why Nim

The Nim language offers the following benefits for cryptography:

Compilation to machine code via C or C++ or alternatively compilation to Javascript. Easy FFI to those languages.
- Obscure embedded devices with proprietary C compilers can be targeted.
- WASM can be targeted.
Performance reachable in C is reachable in Nim, easily.
Rich type system: generics, dependent types, mutability-tracking and side-effect analysis, borrow-checking, compiler enforced distinct types (Miles != Meters, SecretBool != bool and SecretWord != uint64).
Compile-time evaluation, including parsing hex string, converting them to BigInt or Finite Field elements and doing bigint operations.
Assembly support either inline or a simple {.compile: "myasm.S".} away
No GC if no GC-ed types are used (automatic memory management is set at the type level and optimized for latency/soft-realtime by default and can be totally deactivated).
Procedural macros working directly on AST to
- create generic curve configuration,
- derive constants
- write a size-independent inline assembly code generator
Upcoming proof system for formal verification via Z3 (DrNim, Correct-by-Construction RFC)

License

Licensed and distributed under either of

MIT license: LICENSE-MIT or http://opensource.org/licenses/MIT

or

Apache License, Version 2.0, (LICENSE-APACHEv2 or http://www.apache.org/licenses/LICENSE-2.0)

at your option. This file may not be copied, modified, or distributed except according to those terms.

This library has no external dependencies. In particular GMP is used only for testing and differential fuzzing and is not linked in the library.

	run_EC_mul_sanity_tests(
	ec = ECP_SWei_Proj[Fp[BLS12_381]],
	ItersMul = ItersMul,
	moduleName = "test_ec_weierstrass_projective_g1_mul_sanity_" & $BLS12_381
	)


	test "EC mul [Order]P == Inf":
	var rng: RngState
	let seed = uint32(getTime().toUnix() and (1'i64 shl 32 - 1)) # unixTime mod 2^32
	rng.seed(seed)
	echo "test_ec_weierstrass_projective_g1_mul_sanity_extra_curve_order_mul_sanity xoshiro512** seed: ", seed

	proc test(EC: typedesc, bits: static int, randZ: static bool) =
	for _ in 0 ..< ItersMul:
	when randZ:
	let a = rng.random_unsafe_with_randZ(EC)
	else:
	let a = rng.random_unsafe(EC)

	let exponent = EC.F.C.getCurveOrder()
	var exponentCanonical{.noInit.}: array[(bits+7) div 8, byte]
	exponentCanonical.exportRawUint(exponent, bigEndian)

	var
	impl = a
	reference = a
	scratchSpace{.noInit.}: array[1 shl 4, EC]

	impl.scalarMulGeneric(exponentCanonical, scratchSpace)
	reference.unsafe_ECmul_double_add(exponentCanonical)

	check:
	bool(impl.isInf())
	bool(reference.isInf())

	test(ECP_SWei_Proj[Fp[BN254_Snarks]], bits = BN254_Snarks.getCurveOrderBitwidth(), randZ = false)
	test(ECP_SWei_Proj[Fp[BN254_Snarks]], bits = BN254_Snarks.getCurveOrderBitwidth(), randZ = true)
	# TODO: BLS12 is using a subgroup of order "r" such as r*h = CurveOrder
	# with h the curve cofactor
	# instead of the full group
	# test(Fp[BLS12_381], bits = BLS12_381.getCurveOrderBitwidth(), randZ = false)
	# test(Fp[BLS12_381], bits = BLS12_381.getCurveOrderBitwidth(), randZ = true)

	proc randomSqrtCheck_p3mod4(C: static Curve) =
	template testImpl(a: untyped): untyped {.dirty.} =
	var na{.noInit.}: Fp[C]
	na.neg(a)

	var a2 = a
	var na2 = na
	a2.square()
	na2.square()
	check:
	bool a2 == na2
	bool a2.isSquare()

	var r, s = a2
	r.sqrt()
	let ok = s.sqrt_if_square()
	check:
	bool ok
	bool(r == s)
	bool(r == a or r == na)

	type Word* = Ct[uint32]
	## Logical BigInt word
	## A logical BigInt word is of size physical MachineWord-1
	type DoubleWord* = Ct[uint64]

	type BaseType* = uint32
	## Physical BigInt for conversion in "normal integers"

	const
	WordPhysBitSize* = sizeof(Word) * 8
	WordBitSize* = WordPhysBitSize - 1

	macro addCarryGen_u64(a, b: untyped, bits: static int): untyped =
	var asmStmt = (block:
	" movq %[b], %[tmp]\n" &
	" addq %[tmp], %[a]\n"
	)

	let maxByteOffset = bits div 8
	const wsize = sizeof(uint64)

	when defined(gcc):
	for byteOffset in countup(wsize, maxByteOffset-1, wsize):
	asmStmt.add (block:
	"\n" &
	# movq 8+%[b], %[tmp]
	" movq " & $byteOffset & "+%[b], %[tmp]\n" &
	# adcq %[tmp], 8+%[a]
	" adcq %[tmp], " & $byteOffset & "+%[a]\n"
	)
	elif defined(clang):
	# https://lists.llvm.org/pipermail/llvm-dev/2017-August/116202.html
	for byteOffset in countup(wsize, maxByteOffset-1, wsize):
	asmStmt.add (block:
	"\n" &
	# movq 8+%[b], %[tmp]
	" movq " & $byteOffset & "%[b], %[tmp]\n" &
	# adcq %[tmp], 8+%[a]
	" adcq %[tmp], " & $byteOffset & "%[a]\n"
	)

	let tmp = ident("tmp")
	asmStmt.add (block:
	": [tmp] \"+r\" (`" & $tmp & "`), [a] \"+m\" (`" & $a & "->limbs[0]`)\n" &
	": [b] \"m\"(`" & $b & "->limbs[0]`)\n" &
	": \"cc\""
	)

	result = newStmtList()
	result.add quote do:
	var `tmp`{.noinit.}: uint64

	result.add nnkAsmStmt.newTree(
	newEmptyNode(),
	newLit asmStmt
	)

	func montgomeryResidue*(a: BigIntViewMut, N: BigIntViewConst) =
	## Transform a bigint ``a`` from it's natural representation (mod N)
	## to a the Montgomery n-residue representation
	## i.e. Does "a * (2^LimbSize)^W (mod N), where W is the number
	## of words needed to represent n in base 2^LimbSize
	##
	## `a`: The source BigInt in the natural representation. `a` in [0, N) range
	## `N`: The field modulus. N must be odd.
	##
	## Important: `a` is overwritten
	# Reference: https://eprint.iacr.org/2017/1057.pdf
	checkValidModulus(N)
	checkOddModulus(N)
	checkMatchingBitlengths(a, N)

	let nLen = N.numLimbs()
	for i in countdown(nLen, 1):
	a.shlAddMod(Zero, N)

	func shlAddMod(a: BigIntViewMut, c: Word, M: BigIntViewConst) =
	## Fused modular left-shift + add
	## Shift input `a` by a word and add `c` modulo `M`
	##
	## With a word W = 2^WordBitSize and a modulus M
	## Does a <- a * W + c (mod M)
	##
	## The modulus `M` MUST announced most-significant bit must be set.
	checkValidModulus(M)

	let aLen = a.numLimbs()
	let mBits = bitSizeof(M)

	if mBits <= WordBitSize:
	# If M fits in a single limb
	var q: Word

	# (hi, lo) = a * 2^63 + c
	let hi = a[0] shr 1 # 64 - 63 = 1
	let lo = (a[0] shl WordBitSize) or c # Assumes most-significant bit in c is not set
	unsafeDiv2n1n(q, a[0], hi, lo, M[0]) # (hi, lo) mod M
	return

	else:
	## Multiple limbs
	let hi = a[^1] # Save the high word to detect carries
	let R = mBits and WordBitSize # R = mBits mod 64

	var a0, a1, m0: Word
	if R == 0: # If the number of mBits is a multiple of 64
	a0 = a[^1] #
	moveMem(a[1].addr, a[0].addr, (aLen-1) * Word.sizeof) # we can just shift words
	a[0] = c # and replace the first one by c
	a1 = a[^1]
	m0 = M[^1]
	else: # Else: need to deal with partial word shifts at the edge.
	a0 = ((a[^1] shl (WordBitSize-R)) or (a[^2] shr R)) and MaxWord
	moveMem(a[1].addr, a[0].addr, (aLen-1) * Word.sizeof)
	a[0] = c
	a1 = ((a[^1] shl (WordBitSize-R)) or (a[^2] shr R)) and MaxWord
	m0 = ((M[^1] shl (WordBitSize-R)) or (M[^2] shr R)) and MaxWord

	# m0 has its high bit set. (a0, a1)/p0 fits in a limb.
	# Get a quotient q, at most we will be 2 iterations off
	# from the true quotient

	let
	a_hi = a0 shr 1 # 64 - 63 = 1
	a_lo = (a0 shl WordBitSize) or a1
	var q, r: Word
	unsafeDiv2n1n(q, r, a_hi, a_lo, m0) # Estimate quotient
	q = mux( # If n_hi == divisor
	a0 == m0, MaxWord, # Quotient == MaxWord (0b0111...1111)
	mux(
	q.isZero, Zero, # elif q == 0, true quotient = 0
	q - One # else instead of being of by 0, 1 or 2
	) # we returning q-1 to be off by -1, 0 or 1
	)

	# Now substract a2^63 - qp
	var carry = Zero
	var over_p = CtTrue # Track if quotient greater than the modulus

	for i in 0 ..< M.numLimbs():
	var qp_lo: Word

	block: # q*p
	# q * p + carry (doubleword) carry from previous limb
	let qp = unsafeExtPrecMul(q, M[i]) + carry.DoubleWord
	carry = Word(qp shr WordBitSize) # New carry: high digit besides LSB
	qp_lo = qp.Word and MaxWord # Normalize to u63

	block: # a2^63 - qp
	a[i] -= qp_lo
	carry += Word(a[i].isMsbSet) # Adjust if borrow
	a[i] = a[i] and MaxWord # Normalize to u63

	over_p = mux(
	a[i] == M[i], over_p,
	a[i] > M[i]
	)

	# Fix quotient, the true quotient is either q-1, q or q+1
	#
	# if carry < q or carry == q and over_p we must do "a -= p"
	# if carry > hi (negative result) we must do "a += p"

	let neg = carry > hi
	let tooBig = not neg and (over_p or (carry < hi))

	discard a.add(M, ctl = neg)
	discard a.sub(M, ctl = tooBig)
	return

	func montyMul*(
	r: BigIntViewMut, a, b: distinct BigIntViewAny,
	M: BigIntViewConst, montyMagic: Word) =
	## Compute r <- a*b (mod M) in the Montgomery domain
	## `montyMagic` = -1/M (mod Word). Our words are 2^31 or 2^63
	##
	## This resets r to zero before processing. Use {.noInit.}
	## to avoid duplicating with Nim zero-init policy
	# i.e. c'R <- a'R b'R * R^-1 (mod M) in the natural domain
	# as in the Montgomery domain all numbers are scaled by R

	checkValidModulus(M)
	checkOddModulus(M)
	checkMatchingBitlengths(r, M)
	checkMatchingBitlengths(a, M)
	checkMatchingBitlengths(b, M)

	let nLen = M.numLimbs()
	setZero(r)

	var r_hi = Zero # represents the high word that is used in intermediate computation before reduction mod M
	for i in 0 ..< nLen:

	let zi = (r[0] + wordMul(a[i], b[0])).wordMul(montyMagic)
	var carry = Zero

	for j in 0 ..< nLen:
	let z = DoubleWord(r[j]) + unsafeExtPrecMul(a[i], b[j]) +
	unsafeExtPrecMul(zi, M[j]) + DoubleWord(carry)
	carry = Word(z shr WordBitSize)
	if j != 0:
	r[j-1] = Word(z) and MaxWord

	r_hi += carry
	r[^1] = r_hi and MaxWord
	r_hi = r_hi shr WordBitSize

	# If the extra word is not zero or if r-M does not borrow (i.e. r > M)
	# Then substract M
	discard r.sub(M, r_hi.isNonZero() or not r.sub(M, CtFalse))

	test "EC add is associative":
	proc test(F: typedesc, randZ: static bool) =
	for _ in 0 ..< Iters:
	when randZ:
	let a = rng.random_unsafe_with_randZ(ECP_SWei_Proj[F])
	let b = rng.random_unsafe_with_randZ(ECP_SWei_Proj[F])
	let c = rng.random_unsafe_with_randZ(ECP_SWei_Proj[F])
	else:
	let a = rng.random_unsafe(ECP_SWei_Proj[F])
	let b = rng.random_unsafe(ECP_SWei_Proj[F])
	let c = rng.random_unsafe(ECP_SWei_Proj[F])

	var tmp1{.noInit.}, tmp2{.noInit.}: ECP_SWei_Proj[F]

	# r0 = (a + b) + c
	tmp1.sum(a, b)
	tmp2.sum(tmp1, c)
	let r0 = tmp2

	# r1 = a + (b + c)
	tmp1.sum(b, c)
	tmp2.sum(a, tmp1)
	let r1 = tmp2

	# r2 = (a + c) + b
	tmp1.sum(a, c)
	tmp2.sum(tmp1, b)
	let r2 = tmp2

	# r3 = a + (c + b)
	tmp1.sum(c, b)
	tmp2.sum(a, tmp1)
	let r3 = tmp2

	# r4 = (c + a) + b
	tmp1.sum(c, a)
	tmp2.sum(tmp1, b)
	let r4 = tmp2

	# ...

	check:
	bool(r0 == r1)
	bool(r0 == r2)
	bool(r0 == r3)
	bool(r0 == r4)

	test(Fp[BN254_Snarks], randZ = false)
	test(Fp[BN254_Snarks], randZ = true)
	test(Fp[BLS12_381], randZ = false)
	test(Fp[BLS12_381], randZ = true)

	var carry = DoubleWord(0)

	for j in 0 ..< nLen:
	let z = DoubleWord(a[i]) + unsafeExtPrecMul(z0, N[i]) + carry
	carry = z shr WordBitSize

	test "EC double and EC add are consistent":
	proc test(F: typedesc, randZ: static bool) =
	for _ in 0 ..< Iters:
	when randZ:
	let a = rng.random_unsafe_with_randZ(ECP_SWei_Proj[F])
	else:
	let a = rng.random_unsafe(ECP_SWei_Proj[F])

	var r0{.noInit.}, r1{.noInit.}: ECP_SWei_Proj[F]

	r0.double(a)
	r1.sum(a, a)

	check: bool(r0 == r1)

	test(Fp[BN254_Snarks], randZ = false)
	test(Fp[BN254_Snarks], randZ = true)
	test(Fp[BLS12_381], randZ = false)
	test(Fp[BLS12_381], randZ = true)

	Xi = object
	## ξ (Xi) the cubic non-residue

	func `*`(_: typedesc[Xi], a: Fp2): Fp2 {.inline.}=
	## Multiply an element of 𝔽p2 by 𝔽p6 cubic non-residue 1 + 𝑖
	## (c0 + c1 𝑖) (1 + 𝑖) => c0 + (c0 + c1)𝑖 + c1 𝑖²
	## => c0 - c1 + (c0 + c1) 𝑖
	result.c0.diff(a.c0, a.c1)
	result.c1.sum(a.c0, a.c1)

	template `*`(a: Fp2, _: typedesc[Xi]): Fp2 =
	Xi * a

	func `*=`(a: var Fp2, _: typedesc[Xi]) {.inline.}=
	## Inplace multiply an element of 𝔽p2 by 𝔽p6 cubic non-residue 1 + 𝑖
	let t = a.c0
	a.c0 -= a.c1
	a.c1 += t

	v1.square(a.c2)
	v1 *= Xi
	v2.prod(a.c0, a.c1)
	v1 -= v2

	# C in v2
	# C <- a1² - a0 a2
	v2.square(a.c1)
	v3.prod(a.c0, a.c2)
	v2 -= v3

	# F in v3
	# F <- ξ a1 C + a0 A + ξ a2 B
	r.c1.prod(v1, Xi * a.c2)
	r.c2.prod(v2, Xi * a.c1)
	v3.prod(r.c0, a.c0)

	type
	CubicExtAddGroup* = concept x
	## Cubic extension fields - Abelian Additive Group concept
	type BaseField = auto
	x.c0 is BaseField
	x.c1 is BaseField
	x.c2 is BaseField

	type
	QuadExtAddGroup* = concept x
	## Quadratic extension fields - Abelian Additive Group concept
	not(x is CubicExtAddGroup)
	type BaseField = auto
	x.c0 is BaseField
	x.c1 is BaseField

	const
	HalfWidth = WordBitWidth shr 1
	HalfBase = (BaseType(1) shl HalfWidth)
	HalfMask = HalfBase - 1

	func hi(n: BaseType): BaseType =
	result = n shr HalfWidth

	func lo(n: BaseType): BaseType =
	result = n and HalfMask

	func split(n: BaseType): tuple[hi, lo: BaseType] =
	result.hi = n.hi
	result.lo = n.lo

	func merge(hi, lo: BaseType): BaseType =
	(hi shl HalfWidth) or lo

	func addC(cOut, sum: var BaseType, a, b, cIn: BaseType) =
	# Add with carry, fallback for the Compile-Time VM
	# (CarryOut, Sum) <- a + b + CarryIn
	let (aHi, aLo) = split(a)
	let (bHi, bLo) = split(b)
	let tLo = aLo + bLo + cIn
	let (cLo, rLo) = split(tLo)
	let tHi = aHi + bHi + cLo
	let (cHi, rHi) = split(tHi)
	cOut = cHi
	sum = merge(rHi, rLo)

	func subB(bOut, diff: var BaseType, a, b, bIn: BaseType) =
	# Substract with borrow, fallback for the Compile-Time VM
	# (BorrowOut, Sum) <- a - b - BorrowIn
	let (aHi, aLo) = split(a)
	let (bHi, bLo) = split(b)
	let tLo = HalfBase + aLo - bLo - bIn
	let (noBorrowLo, rLo) = split(tLo)
	let tHi = HalfBase + aHi - bHi - BaseType(noBorrowLo == 0)
	let (noBorrowHi, rHi) = split(tHi)
	bOut = BaseType(noBorrowHi == 0)
	diff = merge(rHi, rLo)

	func add(a: var BigInt, w: BaseType): bool =
	## Limbs addition, add a number that fits in a word
	## Returns the carry
	var carry, sum: BaseType
	addC(carry, sum, BaseType(a.limbs[0]), w, carry)
	a.limbs[0] = SecretWord(sum)
	for i in 1 ..< a.limbs.len:
	let ai = BaseType(a.limbs[i])
	addC(carry, sum, ai, 0, carry)
	a.limbs[i] = SecretWord(sum)

	result = bool(carry)

	func dbl(a: var BigInt): bool =
	## In-place multiprecision double
	## a -> 2a
	var carry, sum: BaseType
	for i in 0 ..< a.limbs.len:
	let ai = BaseType(a.limbs[i])
	addC(carry, sum, ai, ai, carry)
	a.limbs[i] = SecretWord(sum)

	result = bool(carry)

	func sub(a: var BigInt, w: BaseType): bool =
	## Limbs substraction, sub a number that fits in a word
	## Returns the carry
	var borrow, diff: BaseType
	subB(borrow, diff, BaseType(a.limbs[0]), w, borrow)
	a.limbs[0] = SecretWord(diff)
	for i in 1 ..< a.limbs.len:
	let ai = BaseType(a.limbs[i])
	subB(borrow, diff, ai, 0, borrow)
	a.limbs[i] = SecretWord(diff)

	result = bool(borrow)

	func mul(hi, lo: var BaseType, u, v: BaseType) =
	## Extended precision multiplication
	## (hi, lo) <- u * v
	var x0, x1, x2, x3: BaseType

	let
	(uh, ul) = u.split()
	(vh, vl) = v.split()

	x0 = ul * vl
	x1 = ul * vh
	x2 = uh * vl
	x3 = uh * vh

	x1 += hi(x0) # This can't carry
	x1 += x2 # but this can
	if x1 < x2: # if carry, add it to x3
	x3 += HalfBase

	hi = x3 + hi(x1)
	lo = merge(x1, lo(x0))

	func muladd1(hi, lo: var BaseType, a, b, c: BaseType) {.inline.} =
	## Extended precision multiplication + addition
	## (hi, lo) <- a*b + c
	##
	## Note: 0xFFFFFFFF_FFFFFFFF² -> (hi: 0xFFFFFFFFFFFFFFFE, lo: 0x0000000000000001)
	## so adding any c cannot overflow
	var carry: BaseType
	mul(hi, lo, a, b)
	addC(carry, lo, lo, c, 0)
	addC(carry, hi, hi, 0, carry)

	func muladd2(hi, lo: var BaseType, a, b, c1, c2: BaseType) {.inline.}=
	## Extended precision multiplication + addition + addition
	## (hi, lo) <- a*b + c1 + c2
	##
	## Note: 0xFFFFFFFF_FFFFFFFF² -> (hi: 0xFFFFFFFFFFFFFFFE, lo: 0x0000000000000001)
	## so adding 0xFFFFFFFFFFFFFFFF leads to (hi: 0xFFFFFFFFFFFFFFFF, lo: 0x0000000000000000)
	## and we have enough space to add again 0xFFFFFFFFFFFFFFFF without overflowing
	var carry1, carry2: BaseType

	mul(hi, lo, a, b)
	# Carry chain 1
	addC(carry1, lo, lo, c1, 0)
	addC(carry1, hi, hi, 0, carry1)
	# Carry chain 2
	addC(carry2, lo, lo, c2, 0)
	addC(carry2, hi, hi, 0, carry2)

	const Lattice_BN254_Snarks_G1: array[2, array[2, tuple[b: BigInt[127], isNeg: bool]]] = [
	# Curve of order 254 -> mini scalars of size 127
	# u = 0x44E992B44A6909F1
	[(BigInt[127].fromHex"0x89d3256894d213e3", false), # 2u + 1
	(BigInt[127].fromHex"0x6f4d8248eeb859fd0be4e1541221250b", false)], # 6u² + 4u + 1
	[(BigInt[127].fromHex"0x6f4d8248eeb859fc8211bbeb7d4f1128", false), # 6u² + 2u
	(BigInt[127].fromHex"0x89d3256894d213e3", true)] # -2u - 1
	]

	const Babai_BN254_Snarks_G1 = [
	# Vector for Babai rounding
	BigInt[127].fromHex"0x89d3256894d213e3", # 2u + 1
	BigInt[127].fromHex"0x6f4d8248eeb859fd0be4e1541221250b" # 6u² + 4u + 1
	]

	const Lattice_BLS12_381_G1: array[2, array[2, tuple[b: BigInt[128], isNeg: bool]]] = [
	# Curve of order 254 -> mini scalars of size 127
	# u = 0x44E992B44A6909F1
	[(BigInt[128].fromHex"0xac45a4010001a40200000000ffffffff", false), # u² - 1
	(BigInt[128].fromHex"0x1", true)], # -1
	[(BigInt[128].fromHex"0x1", false), # 1
	(BigInt[128].fromHex"0xac45a4010001a4020000000100000000", false)] # u²
	]

	const Babai_BLS12_381_G1 = [
	# Vector for Babai rounding
	BigInt[128].fromHex"0xac45a4010001a4020000000100000000",
	BigInt[128].fromHex"0x1"
	]

	def scalarMulGLV(scalar, P0):
	m = 2
	L = ((int(r).bit_length() + m-1) // m) + 1 # l = ⌈log2 r/m⌉ + 1

	print('L: ' + str(L))

	print('scalar: ' + Integer(scalar).hex())

	k0, k1 = getGLV2_decomp(scalar)
	print('k0: ' + k0.hex())
	print('k1: ' + k1.hex())

	P1 = (lambda1_r % r) * P0
	(Px, Py, Pz) = P0
	P1_endo = G1([Px*phi1 % p, Py, Pz])
	assert P1 == P1_endo

	expected = scalar * P0
	decomp = k0P0 + k1P1
	assert expected == decomp

	print('------ recode scalar -----------')
	even = k0 & 1 == 1
	if even:
	k0 -= 1

	b = recodeScalars([k0, k1])
	print('b0: ' + str(list(reversed(b[0]))))
	print('b1: ' + str(list(reversed(b[1]))))

	print('------------ lut ---------------')

	lut = buildLut(P0, P1)

	print('------------ mul ---------------')
	print('b0 L-1: ' + str(b[0][L-1]))
	Q = b[0][L-1] * lut[b[1][L-1] & 1]
	for i in range(L-2, -1, -1):
	Q *= 2
	Q += b[0][i] * lut[b[1][i] & 1]

	suite "Modular squaring - bugs highlighted by property-based testing":
	test "a² == (-a)² on for Fp[2^127 - 1] - #61":
	var a{.noInit.}: Fp[Mersenne127]
	a.fromHex"0x75bfffefbfffffff7fd9dfd800000000"

	var na{.noInit.}: Fp[Mersenne127]

	na.neg(a)

	a.square()
	na.square()

	check:
	bool(a == na)

	var a2{.noInit.}, na2{.noInit.}: Fp[Mersenne127]
	a2.fromHex"0x75bfffefbfffffff7fd9dfd800000000"
	na2.neg(a2)

	a2 *= a2
	na2 *= na2

	check:
	bool(a2 == na2)
	bool(a2 == a)
	bool(a2 == na)

	test "a² == (-a)² on for Fp[2^127 - 1] - #62":
	var a{.noInit.}: Fp[Mersenne127]
	a.fromHex"0x7ff7ffffffffffff1dfb7fafc0000000"

	var na{.noInit.}: Fp[Mersenne127]

	na.neg(a)

	a.square()
	na.square()

	check:
	bool(a == na)

	func cadd(a: var BigInt, b: BigInt, ctl: bool): bool =
	## In-place optional addition
	##
	## It is NOT constant-time and is intended
	## only for compile-time precomputation
	## of non-secret data.
	var carry, sum: BaseType
	for i in 0 ..< a.limbs.len:
	let ai = BaseType(a.limbs[i])
	let bi = BaseType(b.limbs[i])
	addC(carry, sum, ai, bi, carry)
	if ctl:
	a.limbs[i] = SecretWord(sum)

	result = bool(carry)

	func csub(a: var BigInt, b: BigInt, ctl: bool): bool =
	## In-place optional substraction
	##
	## It is NOT constant-time and is intended
	## only for compile-time precomputation
	## of non-secret data.
	var borrow, diff: BaseType
	for i in 0 ..< a.limbs.len:
	let ai = BaseType(a.limbs[i])
	let bi = BaseType(b.limbs[i])
	subB(borrow, diff, ai, bi, borrow)
	if ctl:
	a.limbs[i] = SecretWord(diff)

	result = bool(borrow)

	func doubleMod(a: var BigInt, M: BigInt) =
	## In-place modular double
	## a -> 2a (mod M)
	##
	## It is NOT constant-time and is intended
	## only for compile-time precomputation
	## of non-secret data.
	var ctl = dbl(a)
	ctl = ctl or not a.csub(M, false)
	discard csub(a, M, ctl)

	func `<`(a, b: BigInt): bool =
	var diff, borrow: BaseType
	for i in 0 ..< a.limbs.len:
	subB(borrow, diff, BaseType(a.limbs[i]), BaseType(b.limbs[i]), borrow)

	result = bool borrow

	func shiftRight*(a: var BigInt, k: int) =
	## Shift right by k.
	##
	## k MUST be less than the base word size (2^32 or 2^64)

	for i in 0 ..< a.limbs.len-1:
	a.limbs[i] = (a.limbs[i] shr k) or (a.limbs[i+1] shl (WordBitWidth - k))
	a.limbs[a.limbs.len-1] = a.limbs[a.limbs.len-1] shr k

	func montySquare_CIOS(r: var Limbs, a, M: Limbs, m0ninv: BaseType) =
	## Montgomery Multiplication using Coarse Grained Operand Scanning (CIOS)
	##
	## Architectural Support for Long Integer Modulo Arithmetic on Risc-Based Smart Cards
	## Johann Großschädl, 2003
	## https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=95950BAC26A728114431C0C7B425E022?doi=10.1.1.115.3276&rep=rep1&type=pdf
	##
	## Analyzing and Comparing Montgomery Multiplication Algorithms
	## Koc, Acar, Kaliski, 1996
	## https://www.semanticscholar.org/paper/Analyzing-and-comparing-Montgomery-multiplication-Ko%C3%A7-Acar/5e3941ff482ec3ee41dc53c3298f0be085c69483

	# TODO: Deactivated
	# Off-by one on 32-bit for Fp[2^127 - 1] with inputs
	# - -0x75bfffefbfffffff7fd9dfd800000000
	# - -0x7ff7ffffffffffff1dfb7fafc0000000
	# Squaring the number and its opposite
	# should give the same result, but those are off-by-one

	# We want all the computation to be kept in registers
	# hence we use a temporary `t`, hoping that the compiler does it.
	var z: typeof(r) # zero-init
	const L = z.len
	# Extra words to handle up to 2 carries t[N] and t[N+1]
	var zLp1: SecretWord
	var zL: SecretWord

	staticFor i, 0, L:
	# Squaring
	var t: Carry
	var u, v: SecretWord
	# (u, v) <- a[i] * a[i] + z[i]
	muladd1(u, v, a[i], a[i], z[i])
	z[i] = v
	staticFor j, i+1, L:
	# (t, u, v) <- 2a[j]a[i] + z[j] + (t, u)
	# 2a[j]a[i] can spill 1-bit on a 3rd word
	mulDoubleAdd2(t, u, v, a[j], a[i], z[j], t, u)
	z[j] = v

	block:
	# (u, v) <- zs + (t, u)
	# zL <- v
	# zL+1 <- u
	var C: Carry
	addC(C, v, zL, u, Carry(0))
	addC(C, u, zLp1, SecretWord(t), C)
	zL = v
	zLp1 = u

	# Reduction
	# m <- (z[0] * m0ninv) mod 2^w
	# (u, v) <- m * M[0] + z[0]
	let m = z[0] * SecretWord(m0ninv)
	muladd1(u, v, m, M[0], z[0])
	staticFor j, 1, L:
	# (u, v) <- m*M[j] + z[j] + u
	# z[j-1] <- v
	muladd2(u, v, m, M[j], z[j], u)
	z[j-1] = v

	block:
	# (u, v) <- zL + u
	# z[L-1] <- v
	# z[L] <- zL+1 + u
	var C: Carry
	addC(C, v, zL, u, Carry(0))
	z[L-1] = v
	addC(C, zL, zLp1, Zero, C)

	discard z.csub(M, zL.isNonZero() or not(z < M)) # TODO: (z >= M) is unnecessary for prime in the form (2^64)^w - 1
	r = z

mratsim / constantine Goto Github PK

constantine's Introduction

Constantine

Public API: Curves & Protocols

Protocols

Elliptic Curves

General cryptography

Threadpool

Installation

From Rust

From Go

From C

From Nim

Dependencies & Requirements

Performance

Assembly & Hardware acceleration

Security

Disclaimer

Security disclosure

Why Nim

License

constantine's People

Contributors

Stargazers

Watchers

Forkers

constantine's Issues

Context

Choosing a default: Maximizing the usage of the word overhead

Side-note on SIMD

Side-note on "final substraction"-less Montgomery Multiplication/Exponentiation

Implementation strategy

References

Security

Verification of Assembly

Performance

Add with carry, sub with borrow

Conditional move

MULX

ADCX/ADOX

Context

Current implementation

montgomeryResidue

shlAddMod

Montgomery Multiplication

Implementation strategy

Computing R² (mod p) at compile-time

Current situation

Analysis

Note on conversion from Montgomery domain to the natural domain

Tradeoffs

Ergonomics

Performance

Parameter-passing of arrays

Register vs memory

{.noInit.}

32-bit

64-bit

Research

Implementation

Research

Implementations

Side-note

Montgomery simultenous inversion

Parallelizable alternatives

References

Checking or finding non-residues

Abstracting over the non-residues

Recommend Projects

Recommend Topics

Recommend Org

`shlAddMod`