Giter Club home page Giter Club logo

Comments (6)

greg7mdp avatar greg7mdp commented on June 14, 2024

Hi,

Thanks for the kind words.

Sparsepp uses open addressing (with this method a hash collision is resolved by probing, or searching through alternate locations in the array (the probe sequence) until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table).

Unlike the chaining method used by std::unordered_map, the collisions are not resolved with a linked list, so the notion of bucket size does not really apply, and that's why the return values are 0 or 1.

There is no simple way currently to get the number of collisions without instrumenting the code. If you just insert items into the sparse_hash_set or map, you could use the statement at line 4315 (++num_probes; ) which indicates a collision every time it is executed

Hope this helps,

greg

from sparsepp.

zelda007 avatar zelda007 commented on June 14, 2024

If I use only insert() and count() operations in my sparse_hash_set (I do not use map), does num_probes will report the total number of collisions ?
How can I get this value without producing bad code inside yours ?

PS:
Does your sparse_hash_set thread safe ?
Does your sparse_hash_set can be memory bounded ? For instance, if I want to use a hashtable with 1024 MBytes max, is it possible to bound your hashtable with this size ?

Thanks

from sparsepp.

greg7mdp avatar greg7mdp commented on June 14, 2024

Hi @zelda007,

Thanks for the questions. I have added a paragraph about thread safety at the end of the README.md.

Why are you concerned with the number of collisions. Is there any specific reason? An easy way to diminish the number of collisions if to call reserve() to preallocate the correct table size, if you know in advance how many unique entries will be inserted.

Currently, the sparse_hash_set cannot be memory bounded. It is not clear to me how this would happen. Would you expect an exception to be raised when a specific memory usage is reached, and all subsequent insert fail?

from sparsepp.

zelda007 avatar zelda007 commented on June 14, 2024

Yes, I work on an IA project and I need a very powerful hash table, i.e fast insert and reading and low memory storage. That is why I am currently tested your sparse_hash_set container.
I don't know how many entries will be stored inside the hash table, so I just call reserve() with an approximate number of entries.

Basically, the algorithm searchs in a tree of nodes the best path and I need a hash table for duplicate nodes.

Why are you concerned with the number of collisions. Is there any specific reason?

Yes, it is to check if my hash function is good... With std::unordered_set, I created a function to compute the total number of collisions and the max collision per bucket. It was very useful.

Would you expect an exception to be raised when a specific memory usage is reached, and all subsequent insert fail?

Yes. I could abord my search when the limit will be reached. It is a good starting point...

Thanks for your prompt response

from sparsepp.

zelda007 avatar zelda007 commented on June 14, 2024

Hi greg7mdp,
Do you plan to implement a memory bound hash table like described above ?
My IA project is almost finished and I would like to test your sparse hashtable in real time.
Thanks

from sparsepp.

greg7mdp avatar greg7mdp commented on June 14, 2024

Hi @zelda007, I don't plan on adding something like that. It is a very specific use, that would add an overhead in the hash table itself, but would be useful for very few people, Also, I am not 100% convinced that it is really needed, because I still don't understand why the hash table itself need to implement this, when you could easily check the size yourself?

from sparsepp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.