Comments (9)

Tessil commented on August 24, 2024

Thank you for the idea.

How do you calculate the SHIFT from the bucket_count? Wouldn't you end up just taking the x most significant bits of the hash?

I know there is the "fast range" from Lemire that uses a multiplication+shift to map a value x into the range [0, n).

Instead of:

uint32_t reduce(uint32_t x, uint32_t N) {
  return x % N;
}

He proposes to use:

uint32_t reduce(uint32_t x, uint32_t N) {
  return ((uint64_t) x * (uint64_t) N) >> 32;
}

But the problem is that hashes that are close together will end up in the same bucket. That's not too much of a problem with a good hash function, but libstdc++ uses an identity hash function for std::hash<int>. So if there are a lot of small or close integers (which is often the case), you will end up with a lot of collisions. At first glance, your method will suffer from the same problems.

Bulat-Ziganshin commented on August 24, 2024

Yeah, there are many variations of this idea. Lemire's method saves one shift, but requires a "wide multiplication" that may be a tiny bit more expensive than a "narrow multiplication". Although on a 64-bit CPU, it may be implemented in 64-bit registers with the same MUL+SHIFT sequence as my code.

Now about distributions. Indeed, Lemire's method is essentially a division (for N = 2^K it just keeps the top K bits of x), which suffers with an identity hash function. But my code

  1. allows using an arbitrary multiplication constant;
  2. extracts bits 32-N .. 31 from the result rather than bits 32 .. 31+N.

It's easy to see that with a full-width 32-bit multiplication constant, every bit of the output will depend on every bit of the input. Well, almost. Actually, the problem is exactly opposite to Lemire's hash: with my formula, the lower bits of the output don't depend on the higher bits of the input. Which may be "good enough" for most, but not all, practical purposes.
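
To make this concrete, here is a minimal sketch of the mapping I'm describing, assuming 32-bit hashes, bucket_count == 2^K with K >= 1, and an illustrative odd constant (any full-width odd constant with a roughly 50/50 bit split should do):

uint32_t bucket(uint32_t x, uint32_t K) {
  const uint32_t CONST = 2654435761u;  // example odd 32-bit constant, ~2^32/phi
  // keep the top K bits of the LOW 32-bit word of the product,
  // i.e. bits 32-K .. 31 as described above (assumes 1 <= K <= 31)
  return (x * CONST) >> (32 - K);
}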

If we need a fairer hash, we can use a wide multiplication and then add the higher and lower words of the result:

uint32_t fair_hash(uint32_t x, uint32_t N) {
  uint64_t res = (uint64_t) x * (uint64_t) N;
  return (uint32_t) (res >> 32) + (uint32_t) res;
}

For an even fairer hash, we can repeat this again and/or add some more rol/shift/add operations. Just to let you know: for an arbitrary constant N, the cost of computing the exact x % N value is 3 wide multiplications plus a few simpler operations, so even my "fair hash" is still several times cheaper.


Alternatively, we can go with CRC hashing. It's a very interesting mathematical construction: CRC32(n) = (n*x^32) % p(x), where p(x) is a primitive binary polynomial of degree 32 and we compute the remainder of the polynomial division with GF(2) arithmetic. Yes, CRC is just another remainder, and it shares some desirable properties with the arithmetic remainder: each bit of the output depends on each bit of the input, and the distribution of results is fair (moreover, AFAIR, CRC32(u32) is a 1:1 mapping, so it's ideally fair, guaranteed). Moreover, each CRC(n) is very different from CRC(n+1), which is also great for Robin Hood hashing.

Now, how to compute it: SSE 4.2 CPUs (i.e. ~90% of current end-user computers) have a u32 CRC32(u32, prev_crc=0) instruction which is as cheap as a single narrow MUL. In 64-bit mode, there is also a u32 CRC32_64(u64, prev_crc=0) instruction that can process 64 input bits in a single step and is equivalent to CRC32(u32(x), CRC32(x>>32)).
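
Here is a hedged sketch of how that instruction can be used through compiler intrinsics (compile with -msse4.2; the function name and the power-of-two masking at the end are my illustrative choices, not part of any library):

#include <nmmintrin.h>  // SSE 4.2 intrinsics
#include <stdint.h>

uint32_t crc_bucket(uint32_t hash, uint32_t bucket_mask) {
  uint32_t h = _mm_crc32_u32(0, hash);  // prev_crc == 0, a single cheap instruction
  return h & bucket_mask;               // assumes bucket_count is a power of two
}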

For more portable code, CRC can be computed using tables. I.e., since CRC(N) is just the remainder of some binary polynomial corresponding to the binary digits of N, we can split N into shorter polynomials, get the remainder of each sub-polynomial from a table, and then sum them up. Usually four 256-entry tables (4 * 8 bits) are used, although we can use three tables of 1024 + 2048 + 2048 entries (10+11+11 bits) instead:

// precomputed 256-entry CRC remainder tables, one per input byte
extern const uint32_t table0[256], table1[256], table2[256], table3[256];

uint32_t crc32_by_tables(uint32_t x) {
  uint32_t b0 = x >> 24;
  uint32_t b1 = (x >> 16) & 0xFF;
  uint32_t b2 = (x >> 8) & 0xFF;
  uint32_t b3 = x & 0xFF;
  return table0[b0] ^ table1[b1] ^ table2[b2] ^ table3[b3];
}

This requires 4 load+MOVZX operations, 3 LOAD+XOR, and 1 LOAD. It still looks faster than the exact x % N computation while providing even slightly better bit mixing, but it requires 4 KB of memory to store the tables. Plus, it allows you to directly support 64-bit (and longer) input hashes with a guaranteed fair distribution.

Tessil commented on August 24, 2024

The idea seems interesting.

I will see if I can add a new growth policy based on it and do some tests when I have more time (a bit busy with the end of the year). If you have time, don't hesitate to make a pull request; the interface to implement a growth policy is quite simple (see https://github.com/Tessil/robin-map#growth-policy).

Tessil commented on August 24, 2024

What constant should be used for CONST (in 32 and 64 bits)? Do you have any good values in mind? You said in your first message that "CONST should be even and preferably prime number", but I suppose you just meant prime, as 2 is the only even prime.

Bulat-Ziganshin commented on August 24, 2024

It's a typo; of course it should be odd. It shouldn't be even because in that case x*CONST will always have a zero in at least the lowest bit. Primality is not important per se, but it's a simple way to ensure that CONST doesn't have [small] factors that can interfere with the following calculations.

I usually use 1234567891 as a 31-bit prime number. Even better, you can borrow some constants from xxHash32 or xxHash64 - I know that Cyan is super-careful about details, so his numbers should be better than mine (i.e. have a ~50/50 split between 1 and 0 bits). Use prime.cpp to generate more primes - e.g. you can start with some randomly generated number having a fair split between 0 and 1 bits and then find the closest prime.
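
As an illustration of that last step, a tiny trial-division sketch (not the actual prime.cpp - just the idea, assuming a one-off search where speed doesn't matter):

#include <stdint.h>

int is_prime(uint64_t n) {
  if (n < 2) return 0;
  for (uint64_t d = 2; d <= n / d; ++d)  // d <= n/d avoids d*d overflow
    if (n % d == 0) return 0;
  return 1;
}

// scan upward from a randomly generated start with a fair 0/1 bit split
uint64_t closest_prime(uint64_t start) {
  while (!is_prime(start)) start++;
  return start;
}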

Bulat-Ziganshin commented on August 24, 2024

If you have time don't hesitate to make a pull request, the interface to implement a growth policy is quite simple

You are absolutely right that it's easy to implement other growth policies; it can be done in a few minutes. The real work is to benchmark the outcome and to try various modifications of the basic idea. So, if you prefer, I can write several policies implementing variations of these ideas, but benchmarking them will be on you.

BTW, do you have test(s) measuring the fairness of a policy (rather than raw speed)? SMHasher tests should be (almost) directly applicable to this task.

Tessil commented on August 24, 2024

BTW, do you have test(s) measuring the fairness of a policy (rather than raw speed)?

No, the benchmarks I did were mainly geared toward comparing the speed of collision resolution in hash tables (I didn't formally check probe lengths and cache misses). I didn't worry too much about the hash reductions and their distribution, as the ones I use are fairly common.

Bulat-Ziganshin commented on August 24, 2024

Overall, I think that the library should provide the following policies:

  1. No hashing and growth factor = 2^K. This policy just extracts the N lower bits from the provided hash value - ideal for situations where the client code already implements a sophisticated hashing algorithm and the library just needs to use the hash value as-is.
  2. No hashing and arbitrary growth factor. In this case, the policy should just multiply the hash value by the current hash table size using Lemire's formula. Again, we rely on the client code to supply an already well-mixed hash value.
  3. Fair hashing and growth factor = 2^K. Fair hashing may use CRC or multiplication with my formula (a sketch follows at the end of this comment). It should be the default policy, since it guarantees a fair distribution even when user-provided hash values are unfair.
  4. Fair hashing and arbitrary growth factor. Combines the computations of the two previous policies.
  5. Hardware-assisted CRC fair hashing. Very fast and 100% fair, but requires SSE 4.2.
  6. Combination of policies 5+2.

Since all policies come in pairs, it would be great to combine them, i.e. a template CrcPolicy<float Factor = 2> provides the default Factor = 2 for users who don't care about the exact growth factor, and its code chooses at compile time between employing the lower bits of the hash when Factor == 2^K and using Lemire's formula otherwise.
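
If it helps, here is a rough sketch of what policy 3 might look like, written against the growth-policy interface outlined in the robin-map README (a constructor taking min_bucket_count_in_out, plus bucket_for_hash, next_bucket_count, max_bucket_count, and clear); the class name, the constant, and the assumption of a 64-bit size_t are mine:

#include <cstddef>
#include <cstdint>

class fair_pow2_growth_policy {
public:
    // round the requested bucket count up to 2^K (K >= 1)
    explicit fair_pow2_growth_policy(std::size_t& min_bucket_count_in_out) : m_k(1) {
        while ((std::size_t(1) << m_k) < min_bucket_count_in_out) m_k++;
        min_bucket_count_in_out = std::size_t(1) << m_k;
    }

    // "fair hashing" via multiply-shift: keep the top K bits of the low word
    std::size_t bucket_for_hash(std::size_t hash) const noexcept {
        return (hash * CONST) >> (64 - m_k);
    }

    std::size_t next_bucket_count() const { return std::size_t(1) << (m_k + 1); }
    std::size_t max_bucket_count() const { return std::size_t(1) << 62; }
    void clear() noexcept { m_k = 1; }

private:
    static constexpr std::size_t CONST = 0x9E3779B97F4A7C15ull;  // odd 64-bit constant
    unsigned m_k;  // bucket_count == 2^K; size_t is assumed to be 64-bit
};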

I believe that the default hashing policy should be the most reliable (fair) one rather than the fastest one. This follows the principle of least surprise.

Tessil commented on August 24, 2024

I'll close the issue and keep the power-of-two policy as the default for now. In the end, I didn't really have the time to test the different alternatives extensively. If someone makes a comparison of the different growth policies, feel free to post the link here; I'd be curious to read it.
