Hi, This project looks very nice :) But I would like to know how

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Is there statistics for collisions ? about sparsepp HOT 6 CLOSED

greg7mdp commented on June 14, 2024

Is there statistics for collisions ?

from sparsepp.

Comments (6)

greg7mdp commented on June 14, 2024

Hi,

Thanks for the kind words.

Sparsepp uses open addressing (with this method a hash collision is resolved by probing, or searching through alternate locations in the array (the probe sequence) until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table).

Unlike the chaining method used by std::unordered_map, the collisions are not resolved with a linked list, so the notion of bucket size does not really apply, and that's why the return values are 0 or 1.

There is no simple way currently to get the number of collisions without instrumenting the code. If you just insert items into the sparse_hash_set or map, you could use the statement at line 4315 (++num_probes; ) which indicates a collision every time it is executed

Hope this helps,

greg

from sparsepp.

zelda007 commented on June 14, 2024

If I use only insert() and count() operations in my sparse_hash_set (I do not use map), does num_probes will report the total number of collisions ?
How can I get this value without producing bad code inside yours ?

PS:
Does your sparse_hash_set thread safe ?
Does your sparse_hash_set can be memory bounded ? For instance, if I want to use a hashtable with 1024 MBytes max, is it possible to bound your hashtable with this size ?

Thanks

from sparsepp.

greg7mdp commented on June 14, 2024

Hi @zelda007,

Thanks for the questions. I have added a paragraph about thread safety at the end of the README.md.

Why are you concerned with the number of collisions. Is there any specific reason? An easy way to diminish the number of collisions if to call reserve() to preallocate the correct table size, if you know in advance how many unique entries will be inserted.

Currently, the sparse_hash_set cannot be memory bounded. It is not clear to me how this would happen. Would you expect an exception to be raised when a specific memory usage is reached, and all subsequent insert fail?

from sparsepp.

zelda007 commented on June 14, 2024

Yes, I work on an IA project and I need a very powerful hash table, i.e fast insert and reading and low memory storage. That is why I am currently tested your sparse_hash_set container.
I don't know how many entries will be stored inside the hash table, so I just call reserve() with an approximate number of entries.

Basically, the algorithm searchs in a tree of nodes the best path and I need a hash table for duplicate nodes.

Why are you concerned with the number of collisions. Is there any specific reason?

Yes, it is to check if my hash function is good... With std::unordered_set, I created a function to compute the total number of collisions and the max collision per bucket. It was very useful.

Would you expect an exception to be raised when a specific memory usage is reached, and all subsequent insert fail?

Yes. I could abord my search when the limit will be reached. It is a good starting point...

Thanks for your prompt response

from sparsepp.

zelda007 commented on June 14, 2024

Hi greg7mdp,
Do you plan to implement a memory bound hash table like described above ?
My IA project is almost finished and I would like to test your sparse hashtable in real time.
Thanks

from sparsepp.

greg7mdp commented on June 14, 2024

Hi @zelda007, I don't plan on adding something like that. It is a very specific use, that would add an overhead in the hash table itself, but would be useful for very few people, Also, I am not 100% convinced that it is really needed, because I still don't understand why the hash table itself need to implement this, when you could easily check the size yourself?

from sparsepp.

Is there statistics for collisions ? about sparsepp HOT 6 CLOSED

Comments (6)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent