benwr / bromberg_sl2 Goto Github PK

Cayley hashing as in "Navigating in the Cayley Graph of SL₂(𝔽ₚ)"

License: Creative Commons Zero v1.0 Universal

Rust 100.00%

homomorphic hashing hash monoidal matrix-multiplication sl2 tillich-zemor monoid

bromberg_sl2's Introduction

Bromberg-Shpilrain-Vdovina SL₂ Homomorphic Hashing

This is an implementation of the Tillich-Zémor-style hash function presented in the paper "Navigating in the Cayley Graph of SL₂(𝔽ₚ)" by Bromberg, Shpilrain, and Vdovina.

Warning

This module is not produced by cryptography experts, but by some random guy. Furthermore, the algorithm was published in 2017, and is itself not at all battle-tested. Only use this library if you either (a) know what you're doing and have read and understood our code, and/or (b) are building something that does not rely heavily on the cryptographic properties of the hash function.

If you are a cryptography expert, we welcome any bug reports or pull requests! We also welcome them if you're not a cryptography expert; this library is quite simple, and should be easy to grok over a coffee with a copy of the paper linked above in hand.

What is this library for?

This library implements a putatively-strong hash function H with the useful property that it gives a monoid homomorphism. This means there is a cheap operation * such that given strings s1 and s2, H(s1 ++ s2) = H(s1) * H(s2).

This property is especially useful for applications where some very long string may be constructed via many different routes, but you'd nonetheless like to be able to quickly rule out unequal strings.

It also allows you to hash parts of your data as you acquire them, and then merge them later in whatever order is convenient. This allows for very flexible hashing schemes.

H has some other cool properties, and is in some limited but potentially-useful sense "provably secure". See Bromberg et al. for details.

How to use this library

This library provides the means to construct HashMatrixes, using hash(), which takes a slice of bytes. These hashes can be compared, or serialized to hex strings using to_hex.

use bromberg_sl2::*;
assert_eq!(
  hash("hello, world! It's fun to hash stuff!".as_ref()).to_hex(),
  "01c5cf590d32654c87228c0d66441b200aec1439e54e724f05cd3c6c260634e565594b61988933e826e9705de22884ce007df0f733a371516ddd4ac9237f7a46");

Hashes may also be composed, using the * operator:

use bromberg_sl2::*;
assert_eq!(
  hash("hello, ".as_ref()) * hash("world!".as_ref()),
  hash("hello, world!".as_ref())
);

Technical Details

We use the A(2) and B(2) matrices as generators, and p = 2^127 - 1 as our prime order, for fast modular arithmetic.

We have not yet attempted to seriously optimize this library, and performance is a secondary goal. As of right now our procedure is about 1/3 as fast as SHA3-512.

We needed an architecture-agnostic cryptographic hash procedure with a monoid homomorphism respecting string concatenation, written in a low-level language. While there are a few implementations of related algorithms, e.g. the venerable but broken Tillich-Zémor hash, from "Hashing with SL₂" , none of them fulfill these desiderata.

bromberg_sl2's People

Contributors

Stargazers

Watchers

Forkers

isgasho basile-henry zed-industries iq-scm

bromberg_sl2's Issues

Is this alive?

I really need such a hashing function in my project.
Is this safe to use? Can I help somehow in order to make it safe?

Provide FFI bindings?

Not sure how difficult this is to do as an afterthought, though my guess is that it isn't too hard.

Incorporate various "best practices" documents

e.g. ensure we're following all possible security best practices and rust ecosystem best practices.

Property tests

Compare my results here with results acquired via bignum libraries.

Implement `digest` traits

https://docs.rs/digest/0.9.0/digest/trait.Digest.html

Larger matrix lookup table?

Right now we have a lookup table where we store the hash matrix for each byte. We could also store a hash matrix for each byte pair, which might ~double throughput (at best; in practice you'll have a lot more cache misses). The downside is that you'd need to store 2^16 * 2^6 bytes = 2^20 * 2^2 bytes = 4 MiB. This is just small enough to fit in L3, so the cache miss rate might not be so so bad. My guess is that it's worth attempting and benchmarking.

Another downside is that we want to use this library in a web context, and 4MiB is a lot of weight to add to your binary. You could have two different compilation modes, one with the table calculated on startup (or using a smaller table) and one with the table built in, for cases like this.

no_std support

I don't think we use anything that should preclude no_std support, though I could be wrong. My impression is that it's good practice to provide no_std whenever possible.

Probably everything in here should be constant-time

Right now there are some branches in the modular arithmetic; these give us ~2x speedup I think, but IIUC in general it's bad to have variable running times in crypto libraries. Probably should just kill that stuff.

If I wanted to be super careful about this I would probably also kill the 2-byte storage table? I don't really know. There are two potential problems with it:

Two different tables (2-byte and 1-byte) might make the running time more dependent on parity than it should be. I think this is basically a non-problem, though, because the running time is unavoidably going to vary with the input length.
Bigger tables have more cache misses, and that means more inferable state. I don't know how much this matters in practice but I'd guess it's not totally trivial for legit crypto libraries.

Sadly if I do both of these things I'll wipe out lots and lots of performance gains, which I actually do care about pretty significantly. My current state is something like "just, like, don't use this to hash data that's supposed to be secret, I guess?"

merge const and non-const fns where possible

Right now there's a whole tree of const and non-const fns, because at one point I was trying to use some non-const stuff inside add() or something. But it turns out that was slower than just using the const version. So now there's a bunch of duplicated stuff where it's not necessary. I think the only things that do need to be split are mod_p and matmul.