Giter Club home page Giter Club logo

Comments (4)

stgatilov avatar stgatilov commented on August 24, 2024

I think a good solution would be to define custom tsl::hash<T>.
Depending on __GLIBCXX__ (perhaps also check what's going on in libc++), it should either redirect to std::hash<T>, or call std::hash<T> and run its output through some hash finalizer.
Then replace std::hash with tsl::hash as default template argument in all class declarations.

GCC users will get proper hash function by default. No changes for non-GCC users. And hash function will remain fully tweakable.

from robin-map.

Tessil avatar Tessil commented on August 24, 2024

Yes, at the time of creating the map, I used the default std::hash as I didn't wanted to maintain my own hash function or have the repository depends on an external dependency, and adapted the hash in my projects depending on the usage.

Your solution would be a good compromise, I'll check to work on it when I have more time but don't hesitate to create a PR in the meantime if you have the time.

from robin-map.

stgatilov avatar stgatilov commented on August 24, 2024

I started working on PR.
Thus far I have added a unit test checking that default hash function on uint64_t depends on upper half of the integer.

I decided to check GCC / libstd-c++ to see which hash finalizer is used there.
It turns out that no hash finalizer is used at all!
They use prime size for hash table and simply hope that "modulo size" reduction is good enough.

For example, this code takes 8 seconds for 50K elements on my machine:

    std::unordered_set<uint64_t> set;
    set.reserve(NumElements);
    size_t p = set.bucket_count();
    for (uint64_t i = 0; i < NumElements; i++)
        set.insert(i * p);

The same happens on Clang / libc++.

Clearly, this issue is very unlikely to happen unintentionally if prime size is used.

from robin-map.

stgatilov avatar stgatilov commented on August 24, 2024

In the PR, I decided to use hash finalizers from MurmurHash2.
They are rather well-known and do very few operations.
Although 64-bit case looks weird: it seems that output bits in [17..47) range don't depend on the higher bits in the same range.

The newer MurmurHash3 uses one more shift & multiply round in finalizer.
Maybe it should be used in 64-bit case?
More specifically, the fmix64 function.

from robin-map.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.