Comments (6)
Hi,
Thanks for the kind words.
Sparsepp uses open addressing (with this method a hash collision is resolved by probing, or searching through alternate locations in the array (the probe sequence) until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table).
Unlike the chaining method used by std::unordered_map, the collisions are not resolved with a linked list, so the notion of bucket size does not really apply, and that's why the return values are 0 or 1.
There is no simple way currently to get the number of collisions without instrumenting the code. If you just insert items into the sparse_hash_set or map, you could use the statement at line 4315 (++num_probes;
) which indicates a collision every time it is executed
Hope this helps,
greg
from sparsepp.
If I use only insert() and count() operations in my sparse_hash_set (I do not use map), does num_probes
will report the total number of collisions ?
How can I get this value without producing bad code inside yours ?
PS:
Does your sparse_hash_set thread safe ?
Does your sparse_hash_set can be memory bounded ? For instance, if I want to use a hashtable with 1024 MBytes max, is it possible to bound your hashtable with this size ?
Thanks
from sparsepp.
Hi @zelda007,
Thanks for the questions. I have added a paragraph about thread safety at the end of the README.md.
Why are you concerned with the number of collisions. Is there any specific reason? An easy way to diminish the number of collisions if to call reserve() to preallocate the correct table size, if you know in advance how many unique entries will be inserted.
Currently, the sparse_hash_set cannot be memory bounded. It is not clear to me how this would happen. Would you expect an exception to be raised when a specific memory usage is reached, and all subsequent insert fail?
from sparsepp.
Yes, I work on an IA project and I need a very powerful hash table, i.e fast insert and reading and low memory storage. That is why I am currently tested your sparse_hash_set container.
I don't know how many entries will be stored inside the hash table, so I just call reserve() with an approximate number of entries.
Basically, the algorithm searchs in a tree of nodes the best path and I need a hash table for duplicate nodes.
Why are you concerned with the number of collisions. Is there any specific reason?
Yes, it is to check if my hash function is good... With std::unordered_set, I created a function to compute the total number of collisions and the max collision per bucket. It was very useful.
Would you expect an exception to be raised when a specific memory usage is reached, and all subsequent insert fail?
Yes. I could abord my search when the limit will be reached. It is a good starting point...
Thanks for your prompt response
from sparsepp.
Hi greg7mdp,
Do you plan to implement a memory bound hash table like described above ?
My IA project is almost finished and I would like to test your sparse hashtable in real time.
Thanks
from sparsepp.
Hi @zelda007, I don't plan on adding something like that. It is a very specific use, that would add an overhead in the hash table itself, but would be useful for very few people, Also, I am not 100% convinced that it is really needed, because I still don't understand why the hash table itself need to implement this, when you could easily check the size yourself?
from sparsepp.
Related Issues (20)
- Support operator[] with rvalue HOT 1
- When hashing on vector HOT 13
- Help me to serialize/deserialize sparse_hash_map<int , vector< std::string > > HOT 1
- missing return value in function spp_sptr& operator=(spp_sptr &&o) ? HOT 1
- SIGSEGV on using move-assigned-from hash set HOT 3
- Unable to create map for value type with deleted copy constructor HOT 2
- Can't insert std::tuple: spp uses deleted constructor/destructor HOT 1
- Lookup non-present keys becomes very slow. HOT 1
- Do you support C++11 stateful allocators? HOT 1
- Default allocator doesn't throw and caller doesn't check for NULL HOT 1
- SEGV after moving hash_map HOT 2
- conan package HOT 2
- sparse_hash_map<uint32_t, uint64_t> takes more memory than sparse_hash_map<uint64_t, uint64_t> HOT 1
- memcpy can happen with null "source" parameter HOT 6
- class-memaccess warning when compiling with GCC 8 or later HOT 1
- Problem when compiling on macOS catalina HOT 6
- Can sparse_hash_map support unique_ptr?
- Compiler warning on realloc call within sparsepp. HOT 1
- Error when compile: is not a special member function which can be defaulted HOT 2
- Missing return value. HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sparsepp.