Comments (5)
Hi! It's an interesting use case. I usually try to be standard conform to std::unordered_map
when possible, but I think this should be a simple extension that might be worth it. Can I have a look at the code?
from robin-hood-hashing.
I'm having a little trouble with github authentication at the moment so I'll just paste the code here. I think it is pretty self-explanatory:
template<typename PRNG>
size_t sampleIdx(PRNG& prng) const {
size_t idx;
InfoType info;
// Pre-condition: !empty()
do {
idx = prng();
info = mInfoInc + static_cast<InfoType>(idx >> mInfoHashShift);
idx &= mMask;
while (mInfo[idx]==0) {
next(&info, &idx);
}
if (mInfo[idx] != 0) {
return idx;
}
} while ( true );
}
template<typename PRNG>
const_iterator sample(PRNG& prng) const {
ROBIN_HOOD_TRACE(this);
if (empty()) {
return cend();
}
const size_t idx = sampleIdx(prng);
return const_iterator{mKeyVals + idx, mInfo + idx};
}
The do ... while(true)
is leftover from the initial (broken) implementation. It can probably be removed...
from robin-hood-hashing.
I'm afraid your sampling is not uniform. E.g. a hashmap's content could look like this (x
means no element there)
x x x x x x 1 2
if you sample that way the number 1 would be sampled with probability 7/8=0.875, and number 2 only 1/8=0,125.
I think a correct sampling could would be this:
template <typename R>
iterator uniform_sample(R& rng) {
if (empty()) {
return end();
}
// to guarantee uniform random sampling, we have to randomly try spots until we find
// an element that is not empty. It's slow though for maps with low fill factor.
size_t idx;
do {
idx = (std::uniform_int_distribution<size_t>{}(rng)) & mMask;
} while (0 == mInfo[idx]);
return iterator{mKeyVals + idx, mInfo + idx};
}
from robin-hood-hashing.
That's a good point -- there's a tradeoff between speed (minimal calls to the RNG) and distribution. One of my implementations had similar behavior to yours in that it sometimes made a lot of RNG calls in order to score a hit. I see now I may have over-compensated in the other direction.
I wonder if there's a reasonable compromise to be made here...
from robin-hood-hashing.
I think a better solution would be to have a storage vector separately from the unordered map, like this:
std::vector<std::pair<size_t, std::string>> keyAndData;
robin_hood::unordered_map<size_t, size_t> keyToDataidx;
So the keyToDataidx
map points from a key to an index of the keyAndData
vector. That way accessing data is O(1), e.g. value = keyAndData[keyToDataidx[key]]
.
Removing an element can also be done in O(1): find the element's index with idx = keyToDataidx[key]
, then replace keyAndData[idx]
with keyAndData.back()
, then update keyToDataIdx[keyAndData[idx].first] = idx
.
Random sampling can then be easily done with the elements in keyToDataidx
.
from robin-hood-hashing.
Related Issues (20)
- Segmentation fault: 11 with std::this_thread::get_id(); as key in C++ HOT 5
- Idea for robin-hood-hashing V2 HOT 1
- Keys are not marked as const when used structured binding during iteration over container HOT 2
- My problem with robin_hood::erase HOT 4
- robin_hoodConfig.cmake missing HOT 4
- terminate called after throwing an instance of 'std::overflow_error' what(): robin_hood::map overflow Aborted (core dumped) HOT 5
- hope to support bazel HOT 1
- How can I further improve performance in a read-only load? HOT 2
- func unaligned_load can be optimized HOT 2
- Please only build tests when BUILD_TESTING=ON HOT 1
- undefined symbol: std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::operator<<<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) HOT 1
- Project doesn't install anything HOT 1
- Possible Data Corruption Bug? HOT 2
- Potentially critical bug in Robin Hood invariant checking HOT 4
- does insert support std::pair? HOT 1
- Consider using `INTERFACE_SYSTEM_INCLUDE_DIRECTORIES`
- Explicit template instantiation
- There is a coredump if I defined <std::string, std::shared_ptr<void>> when a pointer delete
- intel compiler warnings (not a bug)
- crash because alignment of key is not respected HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from robin-hood-hashing.