gxhash's People

Contributors

actions-user, dherse, notsatvrn, ogxd, virtualritz

gxhash's Issues

Possible seed and value recovery

For small inputs (<16 bytes), the input is zero-padded to a 16-byte vector and the input length is added to each byte.

Value recovery
During compression, the byte array { 1, 2, 3, 4 } becomes the 16-byte vector <5, 6, 7, 8, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4>.
It is then run through Finalize, which performs 4 rounds of AES encryption with static keys:

output = Encrypt(input, seed);
output = Encrypt(output, key1);
output = Encrypt(output, key2);
output = Encrypt(output, key3);

It is possible for an attacker who knows the seed to recover the original value by inverting the function.

output = Decrypt(input, key3);
output = Decrypt(output, key2);
output = Decrypt(output, key1);
output = Decrypt(output, seed);

We get a masked input, but we can simply unmask it by taking the length at the end of the vector and subtracting it from each of the values in the vector. <5, 6, 7, 8, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4> then becomes the original input <1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>

Invertibility in a non-cryptographic hash function is not an issue, but it makes certain attacks much easier.

Seed recovery
In the case where an attacker does not know the seed, it is possible to recover the seed through brute-force, given only a single full-length hash output.

Let's say we are given a 128-bit hash value of <213, 184, 163, 143, 142, 216, 110, 155, 68, 66, 183, 166, 216, 225, 120, 52>
We run it back through the three known key rounds of the inverted function above (key3, key2, key1, leaving only the unknown seed round) and get <148, 13, 13, 13, 13, 13, 13, 207, 13, 13, 58, 13, 13, 144, 13, 13>

At this point we don't know that the original input was <5, 6, 7, 8, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4>, but we can brute-force the seed by running a single decrypt round: output = Decrypt(input, seed);

We know we have the right seed if the result of the decryption ends with a repeating byte (the length). We can filter false positives by checking that the repeated bytes start exactly length bytes into the vector. Once we have the seed, we can recover the original value as described above.
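
A minimal sketch of this brute-force check, for illustration only (decrypt_round is a hypothetical stand-in for one AES decryption round, not gxhash's API, and the exhaustive search over the 64-bit seed space is written naively):

fn recover_seed<F>(partially_decrypted: [u8; 16], decrypt_round: F) -> Option<u64>
where
    F: Fn([u8; 16], u64) -> [u8; 16],
{
    (0..=u64::MAX).find(|&candidate| {
        let plain = decrypt_round(partially_decrypted, candidate);
        let len = plain[15] as usize;
        // Right seed: the vector ends with the input length repeated,
        // starting exactly `len` bytes in (this filters false positives).
        len < 16 && plain[len..].iter().all(|&b| b as usize == len)
    })
}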

Missing `std` compatibility traits on `GxHasher` and `GxBuildHasher`

I often use AHashMap as a drop-in replacement for HashMap in my code.
Switching to GxHashMap in one of my projects, I get this:

error[E0277]: the trait bound `GxBuildHasher: Clone` is not satisfied
  --> src/app.rs:60:26
   |
59 | #[derive(Clone, Debug, Default)]
   |          ----- in this derive macro expansion
60 | struct UnicodeCollection(HashSet<char>);
   |                          ^^^^^^^^^^^^^ the trait `Clone` is not implemented for `GxBuildHasher`
   |
   = note: required for `HashSet<char, GxBuildHasher>` to implement `Clone`

Here is a PR that, among other things, fixes this.

Don't finalize in Hasher::write, but in finish instead

Context

Currently, each Hasher::write implies a finalization. This is a problem because hashing a string, for instance, requires 2 writes (see the write_str implementation) and thus 2 finalization steps, even though finalization is the most expensive part of the hashing process.

Todo

Only finalize in finish
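
For illustration, a minimal sketch of the intended structure (the state type and mixing steps below are placeholders, not gxhash's algorithm): write only folds bytes into the running state, and the expensive finalization happens exactly once, in finish.

use std::hash::Hasher;

struct SketchHasher {
    state: u64,
}

impl Hasher for SketchHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Compression only: fold the bytes into the running state.
        for &b in bytes {
            self.state = self.state.rotate_left(5) ^ b as u64;
        }
    }

    fn finish(&self) -> u64 {
        // Finalization happens once per hash, not once per write.
        self.state.wrapping_mul(0x9E37_79B9_7F4A_7C15)
    }
}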

Hybrid state gxhash

Context

Currently, gxhash with a 128-bit state is much faster for small input sizes, while gxhash with a 256-bit state is faster for large input sizes. The idea is to study the possibility of making a hybrid-state version of gxhash, combining the advantages of the 128-bit state for small inputs with those of 256-bit state processing for large inputs, for maximum throughput across all input sizes.

The break-even point is visible in the throughput-vs-input-size plot (image not reproduced here).

Challenges

There are two ways to achieve this:

  • Either modify the 256-bit state version to use the 128-bit state logic for small input sizes
  • Or try to make the 256-bit state processing produce the same hashes as the 128-bit state. This is more complex, but if achieved, it could greatly simplify the usage of gxhash as a whole (hashes stable regardless of the intrinsics width)

Some challenges are:

  • It must all be resolved at compile time, because we don't want to grow the generated code of the algorithm: there can be no dynamic switch between two state-width paths inside the algorithm.
  • Making the 256-bit and 128-bit variants produce identical hashes requires rethinking the high-ILP construction

Todo

  • Study possibilities of using generics to avoid dynamic state-width evaluation (see the sketch after this list)
  • Figure out where the break-even point is
  • Study the possibility of having a stable high ILP construction for both 128 and 256 bit states
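
As a rough illustration of the generics idea (the names and types below are invented for the example, not gxhash's code), the state width can be encoded in a zero-sized marker type so that the compiler monomorphizes a separate path per width, with no runtime branch:

trait Width {
    type State: Copy + Default;
    const BYTES: usize;
}

struct W128;
struct W256;

impl Width for W128 {
    type State = [u8; 16];
    const BYTES: usize = 16;
}

impl Width for W256 {
    type State = [u8; 32];
    const BYTES: usize = 32;
}

fn compress<W: Width>(data: &[u8]) -> W::State {
    // A real implementation would fold `data` into the state in
    // W::BYTES-sized chunks using SIMD; this placeholder just returns
    // the zeroed state.
    let _ = data;
    W::State::default()
}

A call like compress::<W128>(data) or compress::<W256>(data) is then resolved entirely at compile time.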

Tracking: missing compatibility with `std::HashMap` & `std::HashSet`

I just looked into replacing ahash with gxhash elsewhere and ran into HashMap::new() missing.

I reckon there may be more. I'll have a look later and may add some PRs to this issue. I'd keep it open until I'm certain the gxhash containers are full drop-in replacements for the std versions.

C# version

I saw your post on dotnet/runtime#85206 about a C# port. Are you able to share it? I'd like to benchmark it in terms of both speed and correctness.

Primitives specialization for Hasher

Context

The Hasher trait exposes methods to hash primitives. Currently, we hash primitives by treating them all as slices of bytes. Hashing can be much more performant if the type is known in advance (e.g. loading the primitive directly into a SIMD vector).

Triggered by the following discussion: rust-lang/hashbrown#487

Goals

  • Hashing a primitive type is faster.
  • Hashing a primitive still follows the algorithm principles (and thus remains stable and passes SMHasher)

Todo

  • Add a benchmark to test hashing of a primitive, like u32
  • Implement all methods (write_u32, write_u64, ...); see the sketch after this list
    • Make it work for ARM
    • Make it work for x86
  • Publish benchmark results before/after on ARM/x86
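
As a sketch of the primitive specialization described in the context above (the state and mixing below are placeholders, not gxhash's algorithm), overriding the primitive methods of Hasher lets a known-width value skip the generic byte-slice path:

use std::hash::Hasher;

struct PrimitiveHasher {
    state: u64,
}

impl Hasher for PrimitiveHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Generic fallback for arbitrary byte slices.
        for &b in bytes {
            self.state = self.state.rotate_left(7) ^ b as u64;
        }
    }

    fn write_u32(&mut self, i: u32) {
        // Specialized path: the whole value is mixed in one step instead of
        // going through the slice machinery (in gxhash this could be a
        // direct load into a SIMD register).
        self.state = self.state.rotate_left(31) ^ i as u64;
    }

    fn finish(&self) -> u64 {
        self.state
    }
}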

Latency benchmarks

Hi! It would be interesting to see a comparison of the time to hash a short string (like in the aHash README). Throughput is great, but since in many cases hashmap keys are short strings, the time to hash 40 KB of data is less relevant than the time to hash, e.g., 16-byte strings. A plot of hashing time per key size would be really helpful IMO.
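
For illustration, a minimal Criterion benchmark of the kind requested, assuming the gxhash64(bytes, seed) entry point (the key and size below are arbitrary):

use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn short_key_latency(c: &mut Criterion) {
    // A fixed 16-byte key, representative of short hashmap keys.
    let key = black_box(*b"0123456789abcdef");
    c.bench_function("gxhash64/16B", |b| {
        b.iter(|| gxhash::gxhash64(black_box(&key), 0))
    });
}

criterion_group!(benches, short_key_latency);
criterion_main!(benches);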

Use aligned loads in get_partial_safe

In get_partial_safe, it's possible to use aligned loads by declaring the buffer as a MaybeUninit<State>, casting the pointer for std::ptr::copy, and zeroing the rest of the buffer, instead of declaring the buffer as a u8 array as the current code does:

let mut buffer = [0i8; VECTOR_SIZE];
// Copy data into the buffer
std::ptr::copy(data as *const i8, buffer.as_mut_ptr(), len);
// Load the buffer into a __m128i vector
let partial_vector = _mm_loadu_epi8(buffer.as_ptr());
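
For illustration, a rough sketch of the proposed aligned variant (the function name and shape are invented for this example; the real change would live inside gxhash's get_partial_safe):

#[cfg(target_arch = "x86_64")]
unsafe fn get_partial_aligned(data: *const u8, len: usize) -> std::arch::x86_64::__m128i {
    use std::arch::x86_64::{__m128i, _mm_load_si128};
    use std::mem::MaybeUninit;

    debug_assert!(len <= std::mem::size_of::<__m128i>());
    // The buffer has the vector type's own alignment, and zeroed() takes
    // care of the bytes past `len`.
    let mut buffer = MaybeUninit::<__m128i>::zeroed();
    std::ptr::copy(data, buffer.as_mut_ptr() as *mut u8, len);
    // An aligned load is now safe.
    _mm_load_si128(buffer.as_ptr())
}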

Non-hardcoded AES keys

The code seems to hardcode AES keys.

This makes no sense, and it should instead use a mechanism similar to what the standard library uses to provide keys for SipHash.

WASM support

Hello, I find this project quite interesting. Do you think there could be a WASM-compatible version (perhaps just not manually vectorized) for applications that compile to both WASM and native targets?

Congratulations on the awesome performance and good quality metrics! 🎉

Thanks in advance,

Dherse

How many bytes should be passed to write() each time for best efficiency?

I see blake3's documentation:
https://docs.rs/blake3/latest/blake3/struct.Hasher.html

Hasher implements std::io::Write, so it’s possible to use std::io::copy to update a Hasher from any reader. Unfortunately, this standard approach can limit performance, because copy currently uses an internal 8 KiB buffer that isn’t big enough to take advantage of all SIMD instruction sets. (In particular, AVX-512 needs a 16 KiB buffer.) update_reader avoids this performance problem and is slightly more convenient.

I want to ask: how many bytes should be passed to write() each call for best efficiency when using gxhash?
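
For illustration, a sketch of feeding a reader into a GxHasher in fixed-size chunks. The 16 KiB chunk size here is an arbitrary choice for the example, not a recommendation; note also the separate issue above about per-write finalization, which makes fewer, larger writes preferable.

use std::hash::Hasher;
use std::io::Read;

fn hash_reader<R: Read>(mut reader: R) -> std::io::Result<u64> {
    let mut hasher = gxhash::GxHasher::default();
    // Arbitrary chunk size chosen for the example.
    let mut buf = [0u8; 16 * 1024];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.write(&buf[..n]);
    }
    Ok(hasher.finish())
}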

Couple issues in the paper

"Combiniting" is not a word.

"Ideally, half of the bit should change on average." should be "Ideally, half of the hash should change on average"

Functionality on stable

It seems like this is nightly only, but it is probably pretty easy to get it working on stable. Here are some workarounds for the nightly features it uses:

  • pointer_byte_offsets: just cast back and forth, ptr.cast::<u8>().add(...).cast() (see the sketch after this list)
  • stmt_expr_attributes: it looks like you only use this to make #[allow(unused_assignments)] work. You can probably eliminate this attribute and work around it with something like let _tmp_ptr; before the if block, then load_unaligned(_tmp_ptr, ...)
  • core_intrinsics for likely: you may be able to reverse this and get the same effect with stable #[cold] instead. There is also likely_stable, which may work. You also might not even need it: I'm not sure if you've benchmarked, but IIRC likely doesn't gain as much in recent LLVM versions
  • stdsimd for _mm_loadu_epi8: I'm unfortunately not really sure what to do here; maybe you could use inline assembly?
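
For illustration, the pointer_byte_offsets workaround as a small helper (the helper name is just for this example):

// Stable replacement for the nightly byte-offset pointer methods:
// round-trip through a byte pointer.
unsafe fn byte_offset_add<T>(ptr: *const T, offset_in_bytes: usize) -> *const T {
    ptr.cast::<u8>().add(offset_in_bytes).cast()
}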

Missing generic implementation or helper

The crate seems to be missing a non-arch-specific implementation, which forces anyone using gxhash to select another hash and use cfg directives to choose between them (obviously, releasing non-specialized crates that only work on x86-64 and arm64 is not acceptable).

If requiring the user to select another hash is the intended design, then all GxHash types should take that other hasher as a generic parameter, for user convenience.
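
For illustration, the cfg-directive workaround users currently need looks roughly like this (the type aliases are invented for the example, and in practice the required target features such as AES must also be taken into account):

use std::collections::HashMap;

// On the supported architectures, use gxhash; elsewhere, fall back to std's
// default SipHash-based hasher.
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
type FastBuildHasher = gxhash::GxBuildHasher;

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
type FastBuildHasher = std::collections::hash_map::RandomState;

type FastMap<K, V> = HashMap<K, V, FastBuildHasher>;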

cargo bench --no-fail-fast -F=avx2 failed


     Running benches/hashset.rs (target/release/deps/hashset-e0bfa844d705dd83)
Gnuplot not found, using plotters backend
HashSet/u32/Default Hasher
                        time:   [16.380 ns 16.711 ns 17.067 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking HashSet/u32/GxHash: Warming up for 3.0000 serror: bench failed, to rerun pass `--bench hashset`

Caused by:
  process didn't exit successfully: `/root/git/gxhash/target/release/deps/hashset-e0bfa844d705dd83 --bench` (signal: 4, SIGILL: illegal instruction)
     Running benches/ilp.rs (target/release/deps/ilp-18f7b97131886606)
Gnuplot not found, using plotters backend
baseline                time:   [134.34 µs 135.20 µs 136.28 µs]
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

unrolled                time:   [129.67 µs 132.18 µs 136.40 µs]
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe

temp                    time:   [54.313 µs 55.820 µs 57.345 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

laned                   time:   [46.377 µs 47.916 µs 49.764 µs]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

     Running benches/throughput/main.rs (target/release/deps/throughput-ce001a906bb8460b)
gxhash-avx2
error: bench failed, to rerun pass `--bench throughput`

Caused by:
  process didn't exit successfully: `/root/git/gxhash/target/release/deps/throughput-ce001a906bb8460b --bench` (signal: 4, SIGILL: illegal instruction)
     Running benches/throughput_criterion.rs (target/release/deps/throughput_criterion-c233dcb596180928)
Gnuplot not found, using plotters backend
Benchmarking all/gxhash-avx2/4: Warming up for 3.0000 serror: bench failed, to rerun pass `--bench throughput_criterion`

Caused by:
  process didn't exit successfully: `/root/git/gxhash/target/release/deps/throughput_criterion-c233dcb596180928 --bench` (signal: 4, SIGILL: illegal instruction)
error: 3 targets failed:
    `--bench hashset`
    `--bench throughput`
    `--bench throughput_criterion`

Quality issue: permutations produce the same hashes?

I have noticed a strange behavior, and I am not sure whether it is intended or not:

The hashes of a tuple of two u8 values a and b are the same regardless of order:

  • (a, b).hash(&mut state)
  • (b, a).hash(&mut state)
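
For reference, a minimal reproduction sketch of the report (assuming GxHasher::default() and the std Hash/Hasher traits):

use std::hash::{Hash, Hasher};

fn hash_pair(pair: (u8, u8)) -> u64 {
    let mut state = gxhash::GxHasher::default();
    pair.hash(&mut state);
    state.finish()
}

fn main() {
    let (a, b) = (1u8, 2u8);
    // The report is that these two prints are identical.
    println!("{}", hash_pair((a, b)));
    println!("{}", hash_pair((b, a)));
}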

This is an issue in my particular use case because it prevents me from distinguishing between two distinct cases only by the hash.

I don't know whether this is actually a bug or not, as I am a total noob when it comes to cryptography and hash functions.

Thanks for your help/advice,

Dherse

Make GxHasher DOS resistant

Context

Currently, GxHasher::default() is not DOS resistant. That is because it uses a default state/seed of 0, which means an attacker can generate in advance a set of keys that collide for this seed and dramatically degrade the efficiency of a given HashMap / HashSet on a targeted service.
It is already possible to use GxHash with a seed (a full state or a short, fixed-size i64 value). Thus, by making GxHasher::default() create a Hasher with a random state (using a cryptographically secure RNG), we can make it DOS resistant, as attackers will no longer be able to know in advance which keys collide.
An i64 seed is more than enough for this purpose.
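
For illustration only, one way a randomized process-wide default seed could be obtained without adding a dependency. This is a sketch, not gxhash's implementation: it leans on std's RandomState, which is itself seeded from OS randomness, while a real change might prefer a dedicated CSPRNG.

use std::sync::OnceLock;

fn default_seed() -> i64 {
    static SEED: OnceLock<i64> = OnceLock::new();
    *SEED.get_or_init(|| {
        use std::collections::hash_map::RandomState;
        use std::hash::{BuildHasher, Hasher};
        // RandomState's keys come from OS randomness; finishing an empty
        // hasher turns them into a single random i64 for the process.
        RandomState::new().build_hasher().finish() as i64
    })
}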

Todo

  • Change GxHasher::default() to use a random seed.
  • Mention DOS resistance in README
  • Release new version

Major bug: out of bounds read for some input sizes

Context

For input sizes >= 80 bytes that are a multiple of 16 (the vector size in bytes), the construction reads one vector beyond the end of the input, making such hashes invalid.
This is a major bug for two reasons:

  • It means, for instance, that hashing the same 80-byte input twice can lead to different hashes
  • It can lead to segfaults from reading beyond the input's allocation

Todo

  • Fix the issue
  • Add a unit test to prevent out-of-bounds reads from happening again (see the sketch after this list)
  • Publish update (2.2.5)
  • Cargo yank the affected versions (2.x.x up to and including 2.2.4)
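
For illustration, a regression-test sketch along those lines, assuming the gxhash128(bytes, seed) entry point. To reliably catch the out-of-bounds read itself it should be run under Miri or AddressSanitizer, rather than relying on differing hashes.

// Hashing the same content from two different allocations must agree for
// every affected length (>= 80 bytes and a multiple of 16).
#[test]
fn no_out_of_bounds_read_for_vector_multiple_lengths() {
    for len in (80usize..=256).step_by(16) {
        let a = vec![42u8; len];
        let b = a.clone(); // same bytes, different allocation
        assert_eq!(gxhash::gxhash128(&a, 0), gxhash::gxhash128(&a, 0));
        assert_eq!(gxhash::gxhash128(&a, 0), gxhash::gxhash128(&b, 0));
    }
}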
