Giter Club home page Giter Club logo

Comments (5)

martinus avatar martinus commented on September 10, 2024

Hi! It's an interesting use case. I usually try to be standard conform to std::unordered_map when possible, but I think this should be a simple extension that might be worth it. Can I have a look at the code?

from robin-hood-hashing.

mikemccarty-vertex avatar mikemccarty-vertex commented on September 10, 2024

I'm having a little trouble with github authentication at the moment so I'll just paste the code here. I think it is pretty self-explanatory:

    template<typename PRNG>
    size_t sampleIdx(PRNG& prng) const {
        size_t idx;
        InfoType info;

        // Pre-condition:  !empty()
        do {
            idx = prng();
            info = mInfoInc + static_cast<InfoType>(idx >> mInfoHashShift);
            idx &= mMask;

            while (mInfo[idx]==0) {
                next(&info, &idx);
            }

            if (mInfo[idx] != 0) {
                return idx;
            }
        } while ( true );
    }

    template<typename PRNG>
    const_iterator sample(PRNG& prng) const {
        ROBIN_HOOD_TRACE(this);
        if (empty()) {
            return cend();
        }
        const size_t idx = sampleIdx(prng);
        return const_iterator{mKeyVals + idx, mInfo + idx};
    }

The do ... while(true) is leftover from the initial (broken) implementation. It can probably be removed...

from robin-hood-hashing.

martinus avatar martinus commented on September 10, 2024

I'm afraid your sampling is not uniform. E.g. a hashmap's content could look like this (x means no element there)

x x x x x x 1 2

if you sample that way the number 1 would be sampled with probability 7/8=0.875, and number 2 only 1/8=0,125.

I think a correct sampling could would be this:

    template <typename R>
    iterator uniform_sample(R& rng) {
        if (empty()) {
            return end();
        }

        // to guarantee uniform random sampling, we have to randomly try spots until we find
        // an element that is not empty. It's slow though for maps with low fill factor.
        size_t idx;
        do {
            idx = (std::uniform_int_distribution<size_t>{}(rng)) & mMask;
        } while (0 == mInfo[idx]);
        return iterator{mKeyVals + idx, mInfo + idx};
    }

from robin-hood-hashing.

mikemccarty-vertex avatar mikemccarty-vertex commented on September 10, 2024

That's a good point -- there's a tradeoff between speed (minimal calls to the RNG) and distribution. One of my implementations had similar behavior to yours in that it sometimes made a lot of RNG calls in order to score a hit. I see now I may have over-compensated in the other direction.

I wonder if there's a reasonable compromise to be made here...

from robin-hood-hashing.

martinus avatar martinus commented on September 10, 2024

I think a better solution would be to have a storage vector separately from the unordered map, like this:

    std::vector<std::pair<size_t, std::string>> keyAndData;
    robin_hood::unordered_map<size_t, size_t> keyToDataidx;

So the keyToDataidx map points from a key to an index of the keyAndData vector. That way accessing data is O(1), e.g. value = keyAndData[keyToDataidx[key]].

Removing an element can also be done in O(1): find the element's index with idx = keyToDataidx[key], then replace keyAndData[idx] with keyAndData.back(), then update keyToDataIdx[keyAndData[idx].first] = idx.

Random sampling can then be easily done with the elements in keyToDataidx.

from robin-hood-hashing.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.