Giter Club home page Giter Club logo

sparsepp's Introduction

I now recommend using the parallel hashmap instead of sparsepp, unless if you are stuck with a non C++11 compatible compiler, or if using a little bit more memory is not acceptable. I will personally switch to using the parallel hashmap, which I believe offers a significantly better tradeoff (slightly higher memory usage, much faster).

Sparsepp: A fast, memory efficient hash map for C++ Build Status

Sparsepp is derived from Google's excellent sparsehash implementation. It aims to achieve the following objectives:

  • A drop-in alternative for unordered_map and unordered_set.
  • Extremely low memory usage (typically about one byte overhead per entry), and most importantly very small memory overhead when resizing.
  • Very efficient, typically faster than your compiler's unordered map/set or Boost's.
  • C++11 support (if supported by compiler).
  • Single header not anymore
  • Tested on Windows (vs2010-2015, g++), linux (g++, clang++) and MacOS (clang++).

We believe Sparsepp provides an unparalleled combination of performance and memory usage, and will outperform your compiler's unordered_map on both counts. Only Google's dense_hash_map is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance, and insertions cause a resizing).

This hash map. like Google's dense_hash_map, uses open addressing (meaning that the hash table entries are conceptually stored in a big array, and collisions are not resolved using a linked list, but by probing). A big drawback of open addressing is that when the array needs to be resized larger, the high mark for memory usage is typically 3 times the current map size (as we allocate a new array twice as big as the current one, and copy from the old array to the new one). Sparsepp's implementation resolves this memory usage issue, and the memory usage high mark shows only a small bump when resizing.

For a detailed comparison of various hash implementations, including Sparsepp, please see our write-up.

Example

#include <iostream>
#include <string>
#include <sparsepp/spp.h>

using spp::sparse_hash_map;
 
int main()
{
    // Create an unordered_map of three strings (that map to strings)
    sparse_hash_map<std::string, std::string> email = 
    {
        { "tom",  "[email protected]"},
        { "jeff", "[email protected]"},
        { "jim",  "[email protected]"}
    };
 
    // Iterate and print keys and values 
    for (const auto& n : email) 
        std::cout << n.first << "'s email is: " << n.second << "\n";
 
    // Add a new entry
    email["bill"] = "[email protected]";
 
    // and print it
    std::cout << "bill's email is: " << email["bill"] << "\n";
 
    return 0;
}

Installation

No compilation is needed, as this is a header-only library. The installation consist in copying the sparsepp directory wherever it will be convenient to include in your project(s).

Warning - iterator and reference invalidation on erase/insert

  1. erasing elements is likely to invalidate iterators (for example when calling erase())

  2. inserting new elements is likely to invalidate iterators (iterator invalidation can also happen with std::unordered_map if rehashing occurs due to the insertion)

  3. references to values stored in a sparse_hash_map or set are likely to be invalidated on insert()/erase(). This is not the case for std::unordered_map or set.

Usage

As shown in the example above, you need to include the header file: #include <sparsepp/spp.h>

This provides the implementation for the following classes:

namespace spp
{
    template <class Key, 
              class T,
              class HashFcn  = spp_hash<Key>,  
              class EqualKey = std::equal_to<Key>,
              class Alloc    = libc_allocator_with_realloc<std::pair<const Key, T>>>
    class sparse_hash_map;

    template <class Value,
              class HashFcn  = spp_hash<Value>,
              class EqualKey = std::equal_to<Value>,
              class Alloc    = libc_allocator_with_realloc<Value>>
    class sparse_hash_set;
};

These classes provide the same interface as std::unordered_map and std::unordered_set, with the following differences:

  • Calls to erase() may invalidate iterators. However, conformant to the C++11 standard, the position and range erase functions return an iterator pointing to the position immediately following the last of the elements erased. This makes it easy to traverse a sparse hash table and delete elements matching a condition. For example to delete odd values:

    for (auto it = c.begin(); it != c.end(); )
        if (it->first % 2 == 1)
           it = c.erase(it);
        else
           ++it;

    As for std::unordered_map, the order of the elements that are not erased is preserved.

  • Since items are not grouped into buckets, Bucket APIs have been adapted: max_bucket_count is equivalent to max_size, and bucket_count returns the sparsetable size, which is normally at least twice the number of items inserted into the hash_map.

  • Values inserted into sparsepp have to either be copyable and movable, or just movable. See example movable.cc.

Memory allocator on Windows (when building with Visual Studio)

When building with the Microsoft compiler, we provide a custom allocator because the default one (from the Visual C++ runtime) fragments memory when reallocating.

This is desirable only when creating large sparsepp hash maps. If you create lots of small hash_maps, memory usage may increase instead of decreasing as expected. The reason is that, for each instance of a hash_map, the custom memory allocator creates a new memory space to allocate from, which is typically 4K, so it may be a big waste if just a few items are allocated.

In order to use the custom spp allocator, define the following preprocessor variable before including <spp/spp.h>:

#define SPP_USE_SPP_ALLOC 1

Integer keys, and other hash function considerations.

  1. For basic integer types, sparsepp provides a default hash function which does some mixing of the bits of the keys (see Integer Hashing). This prevents a pathological case where inserted keys are sequential (1, 2, 3, 4, ...), and the lookup on non-present keys becomes very slow.

    Of course, the user of sparsepp may provide its own hash function, as shown below:

    #include <sparsepp/spp.h>
    
    struct Hash64 {
        size_t operator()(uint64_t k) const { return (k ^ 14695981039346656037ULL) * 1099511628211ULL; }
    };
    
    struct Hash32 {
        size_t operator()(uint32_t k) const { return (k ^ 2166136261U)  * 16777619UL; }
    };
    
    int main() 
    {
        spp::sparse_hash_map<uint64_t, double, Hash64> map;
        ...
    }
    
  2. When the user provides its own hash function, for example when inserting custom classes into a hash map, sometimes the resulting hash keys have similar low order bits and cause many collisions, decreasing the efficiency of the hash map. To address this use case, sparsepp provides an optional 'mixing' of the hash key (see Integer Hash Function which can be enabled by defining the proprocessor macro: SPP_MIX_HASH.

Example 2 - providing a hash function for a user-defined class

In order to use a sparse_hash_set or sparse_hash_map, a hash function should be provided. Even though a the hash function can be provided via the HashFcn template parameter, we recommend injecting a specialization of std::hash for the class into the "std" namespace. For example:

#include <iostream>
#include <functional>
#include <string>
#include <sparsepp/spp.h>

using std::string;

struct Person 
{
    bool operator==(const Person &o) const 
    { return _first == o._first && _last == o._last; }

    string _first;
    string _last;
};

namespace std
{
    // inject specialization of std::hash for Person into namespace std
    // ----------------------------------------------------------------
    template<> 
    struct hash<Person>
    {
        std::size_t operator()(Person const &p) const
        {
            std::size_t seed = 0;
            spp::hash_combine(seed, p._first);
            spp::hash_combine(seed, p._last);
            return seed;
        }
    };
}
 
int main()
{
    // As we have defined a specialization of std::hash() for Person, 
    // we can now create sparse_hash_set or sparse_hash_map of Persons
    // ----------------------------------------------------------------
    spp::sparse_hash_set<Person> persons = { { "John", "Galt" }, 
                                             { "Jane", "Doe" } };
    for (auto& p: persons)
        std::cout << p._first << ' ' << p._last << '\n';
}

The std::hash specialization for Person combines the hash values for both first and last name using the convenient spp::hash_combine function, and returns the combined hash value.

spp::hash_combine is provided by the header sparsepp/spp.h. However, class definitions often appear in header files, and it is desirable to limit the size of headers included in such header files, so we provide the very small header sparsepp/spp_utils.h for that purpose:

#include <string>
#include <sparsepp/spp_utils.h>

using std::string;
 
struct Person 
{
    bool operator==(const Person &o) const 
    { 
        return _first == o._first && _last == o._last && _age == o._age; 
    }

    string _first;
    string _last;
    int    _age;
};

namespace std
{
    // inject specialization of std::hash for Person into namespace std
    // ----------------------------------------------------------------
    template<> 
    struct hash<Person>
    {
        std::size_t operator()(Person const &p) const
        {
            std::size_t seed = 0;
            spp::hash_combine(seed, p._first);
            spp::hash_combine(seed, p._last);
            spp::hash_combine(seed, p._age);
            return seed;
        }
    };
}

Example 3 - serialization

sparse_hash_set and sparse_hash_map can easily be serialized/unserialized to a file or network connection. This support is implemented in the following APIs:

    template <typename Serializer, typename OUTPUT>
    bool serialize(Serializer serializer, OUTPUT *stream);

    template <typename Serializer, typename INPUT>
    bool unserialize(Serializer serializer, INPUT *stream);

The following example demonstrates how a simple sparse_hash_map can be written to a file, and then read back. The serializer we use read and writes to a file using the stdio APIs, but it would be equally simple to write a serialized using the stream APIS:

#include <cstdio>

#include <sparsepp/spp.h>

using spp::sparse_hash_map;
using namespace std;

class FileSerializer 
{
public:
    // serialize basic types to FILE
    // -----------------------------
    template <class T>
    bool operator()(FILE *fp, const T& value) 
    {
        return fwrite((const void *)&value, sizeof(value), 1, fp) == 1;
    }

    template <class T>
    bool operator()(FILE *fp, T* value) 
    {
        return fread((void *)value, sizeof(*value), 1, fp) == 1;
    }

    // serialize std::string to FILE
    // -----------------------------
    bool operator()(FILE *fp, const string& value) 
    {
        const size_t size = value.size();
        return (*this)(fp, size) && fwrite(value.c_str(), size, 1, fp) == 1;
    }

    bool operator()(FILE *fp, string* value) 
    {
        size_t size;
        if (!(*this)(fp, &size)) 
            return false;
        char* buf = new char[size];
        if (fread(buf, size, 1, fp) != 1) 
        {
            delete [] buf;
            return false;
        }
        new (value) string(buf, (size_t)size);
        delete[] buf;
        return true;
    }

    // serialize std::pair<const A, B> to FILE - needed for maps
    // ---------------------------------------------------------
    template <class A, class B>
    bool operator()(FILE *fp, const std::pair<const A, B>& value)
    {
        return (*this)(fp, value.first) && (*this)(fp, value.second);
    }

    template <class A, class B>
    bool operator()(FILE *fp, std::pair<const A, B> *value) 
    {
        return (*this)(fp, (A *)&value->first) && (*this)(fp, &value->second);
    }
};

int main(int argc, char* argv[]) 
{
    sparse_hash_map<string, int> age{ { "John", 12 }, {"Jane", 13 }, { "Fred", 8 } };

    // serialize age hash_map to "ages.dmp" file
    FILE *out = fopen("ages.dmp", "wb");
    age.serialize(FileSerializer(), out);
    fclose(out);

    sparse_hash_map<string, int> age_read;

    // read from "ages.dmp" file into age_read hash_map 
    FILE *input = fopen("ages.dmp", "rb");
    age_read.unserialize(FileSerializer(), input);
    fclose(input);

    // print out contents of age_read to verify correct serialization
    for (auto& v : age_read)
        printf("age_read: %s -> %d\n", v.first.c_str(), v.second);
}

Thread safety

Sparsepp follows the thread safety rules of the Standard C++ library. In Particular:

  • A single sparsepp hash table is thread safe for reading from multiple threads. For example, given a hash table A, it is safe to read A from thread 1 and from thread 2 simultaneously.

  • If a single hash table is being written to by one thread, then all reads and writes to that hash table on the same or other threads must be protected. For example, given a hash table A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A.

  • It is safe to read and write to one instance of a type even if another thread is reading or writing to a different instance of the same type. For example, given hash tables A and B of the same type, it is safe if A is being written in thread 1 and B is being read in thread 2.

sparsepp's People

Contributors

brenoguim avatar greg7mdp avatar jhetherly avatar khng300 avatar legendre6891 avatar maskray avatar psialt avatar rob-p avatar speps avatar tessil avatar toddlipcon avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sparsepp's Issues

Assertion Failure in erase()

I tried replacing a usage of gnu_cxx::hash_map in our code with sparsepp::sparse_hash_map, but I get an assertion failure in erase():
sparsepp/sparsepp.h:2824: bool spp::sparsegroup<T, Alloc>::erase_ne(Alloc&, twod_iter&) [with twod_iter = spp::Two_d_iterator<std::pair<croix::hashed_squish_bits_t, croix::subwind_map_value_t>, spp::sparsegroup<std::pair<croix::hashed_squish_bits_t, croix::subwind_map_value_t>, spp::libc_allocator_with_realloc<std::pair<croix::hashed_squish_bits_t, croix::subwind_map_value_t> > >, std::pair<const croix::hashed_squish_bits_t, croix::subwind_map_value_t>, std::bidirectional_iterator_tag>, T = std::pair<croix::hashed_squish_bits_t, croix::subwind_map_value_t>, Alloc = spp::libc_allocator_with_realloc<std::pair<croix::hashed_squish_bits_t, croix::subwind_map_value_t> >]: Assertion `_group && it.col_current != ne_end()' failed.

Most of my test cases work, but a few of them fail in this way. Here I'm inserting thousands of large objects (several hundred bytes each) into the hash_map, then creating a vector of iterators sorted by a custom sort function, then erasing them one-by-one to incrementally free the memory. This works correctly for all of my test cases when using gnu_cxx::hash_map, unordered_map, google::sparse_hash_map, and google::dense_hash_map. It only fails with sparsepp. Also, I'm using the latest version of sparsepp the sparsepp unit tests pass on my machine.

What is this assertion meant to test? It's not clear whether this is a sparsepp bug or if I'm doing something illegal in my code.

_group_alloc should be constructed from passed-in allocator

In sparsetable's constructor, it default-constructs the _group_alloc member. However, in some cases the passed-in allocator may have some state which needs to be captured into the group allocator. Thus _group_alloc should be constructed with 'alloc' as the argument.

how to do std-style contains?

I need to check if an spp::sparse_hash_map contains a particular key. Following the usual logic I would do with std::unordered_map<T, T>, it would be something like this:

spp::sparse_hash_map<T1, T2> map;
// ...
T1 some_key;
auto got = map.find(some_key);
if(got == map::end()) {
  // do something
}

However I am getting a super long error saying that the types for got and map::end() are incompatible. I tried a number of variations of this and always get the same error. I suspect this has something to do with the double definition of find() with const and non const iterators, but I have no idea how to get around it. Just need to do a simple contains check...

build problems

Just FYI:

  1. on Cygwin on Windows using g++ version 5.4, I had to change the option '-std=c++0x' to '-std=gnu++ 0x' because it complained strdup was undefined.
  2. compile failed on SuSE/SLES 12 and openSuSE Leap 42.1 on ARM64 with g++ version 4.8.5 because cpuid.h doesn't exist. I suspect this is an ARM64-specific problem, but I don't have an x86 SuSE system to check against.

Is there statistics for collisions ?

Hi,

This project looks very nice :)
But I would like to know how can I show the total number of collisions of the sparse_hash_set/map (and maybe the number of collisions per buckets) ?

I saw the function bucket_size(size_t i) does not return the number of elements in the bucket i but 0 or 1... So I cannot iterate through all buckets and call bucket_size() function to get the number of collisions...

Thanks

Question about un-expectedly *good* speed

Hi Greg,

First, thanks for this excellent hash table! I've been wanting something like this for some time, as I find the standard unordered_map in C++ to be reasonably fast, but with very poor memory use. The balance struck by sparsepp is near-perfect, and it's a near drop-in replacement for unordered_map which makes it very easy to use. I'm seeing something that, to me, is completely unexpected. I have a project, RapMap, where I want to replace a Google dense_hash_map with the sparsepp sparse_hash_map to reduce memory usage. I've been doing some benchmarking to see what type of speed hit I would have to take in my particular scenario (the memory usage with sparsepp is already a factor of 2 lower). I got some very unexpected results. Specifically, sparsepp seems to be working faster than the Google dense_hash_map. I'm attaching a timing plot and some /usr/bin/time results for your perusal.

timing

This image shows the total elapsed time (wall clock time) for each dataset using the google dense_hash_map (dense) and the sparsepp sparse_hash_map (sparse). Interestingly, you can see that sparsepp almost always outperforms the dense_hash_map. Now, my workload isn't entirely looking up hash keys, but no other code here changed apart from which hash was being used for the lookup, so that is the only thing to which I can attribute the timing difference. Also very interesting to me is the actual timing logs:

timing.zip

the thing that is interesting to me here is the difference in user, system and elapsed time, and the total CPU usage. For each run, I told the process to use 16 threads. The hashing workload here is completely lookups (i.e. there are no insertions or deletions during these runs, apart from the initial de-serialization of the hash from disk). It looks like sparsepp is consistently coming closer to the pre-specified cpu usage percentage (often 1,300-1,500% compared to the 1,000-1,200% with dense_hash_map). It also appears that sparsepp is consistently consuming more user time and less system time than google's dense_hash_map.

Do you have any idea why I might be seeing these un-expected performance characteristics? Any light you could shed on why I might see this would be greatly appreciated!

Thanks,
Rob

c++11 range for and non copyable.

Hello !

I've noticed some problems when adopting Your sparsepp. Mostly they occur when dealing with objects that are non copyable (they copy constructor and assign operator are deleted).

Lets say we have:

for( auto& it : m_values )
{
	it.second.RestoreDefault();
}

There is simple loop through references to all items in hash map. sparsepp wants to do this:

// sparsepp (2615)
    void _set_val(mutable_value_type *p, const_reference val)
    {
        *p = spp_const_mutable_ref(val);
    }

So it unrolls to simple assign operator *p = (*(reinterpret_cast<const_mutable_pointer>(&(x))))
It would be cool to support such objects, especially that Your hasmap should be easy plug&play replacement for std one.

Any way, lib is great and looking to hearing from You ;)

error during unserializing a sprasehashmap<string,vector<int>>

hi i tried to insert values for the map which has string as key and vector of integers as values. after that i dumped the map into the file using,
FILE *out = fopen("ages.dmp", "wb");
ages.serialize(FileSerializer(), out);
upto this point everything seems to be fine.
But when i try unserializing it, im getting some error(i think memory issues).
for unserializing i used,
FILE *input = fopen("ages.dmp", "rb");
readages.unserialize(FileSerializer(), input);
fclose(input);
Could you please help me with this ?

Compile error in debug mode

Hi there,

sparsepp works perfectly fine when compiled in release mode (with an amazingly small memory footprint, thanks!), but doesn't compile with my debug setup.

The error message comes from the debug version of my STL implementation:

/usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/debug/array:86:52: 
error: too many arguments to function call, expected single argument '__other', have 2 arguments
/usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/debug/array:86:52: error: too many arguments to function call, expected single argument '__other', have 2 arguments
      noexcept(noexcept(swap(std::declval<_Tp&>(), std::declval<_Tp&>())))
                        ~~~~                       ^~~~~~~~~~~~~~~~~~~~

/usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/debug/array:264:23: note: in instantiation of exception specification for 'swap' requested here
    noexcept(noexcept(__one.swap(__two)))
                      ^

/usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_pair.h:195:25: note: in instantiation of exception specification for 'swap<unsigned int, 5>' requested here
      noexcept(noexcept(swap(first, __p.first))
                        ^

/usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_pair.h:255:23: note: in instantiation of exception specification for 'swap' requested here
    noexcept(noexcept(__x.swap(__y)))
                      ^

/usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_algobase.h:148:7: note: in instantiation of exception specification for 'swap<std::__debug::array<unsigned int, 5>, unsigned long>'
      requested here
      swap(*__a, *__b);
      ^

/usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_algo.h:1351:10: note: in instantiation of function template specialization 'std::iter_swap<std::pair<std::__debug::array<unsigned
      int, 5>, unsigned long> *, std::pair<std::__debug::array<unsigned int, 5>, unsigned long> *>' requested here
                  std::iter_swap(__p, __q);
                       ^

/usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_algo.h:1419:12: note: (skipping 7 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
      std::__rotate(__first, __middle, __last,
           ^

[...]/sparsepp/sparsepp/spp.h:2744:26: note: in instantiation of member function 'spp::sparse_hashtable<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long>,
      std::__debug::array<unsigned int, 5>, spp::spp_hash<std::__debug::array<unsigned int, 5> >, spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long,
      spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> >
      >::SelectKey, spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long, spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >,
      spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> > >::SetKey, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const
      std::__debug::array<unsigned int, 5>, unsigned long> > >::sparse_hashtable' requested here
        sparse_hashtable tmp(MoveDontCopy, *this, resize_to);
                         ^

[...]/sparsepp/sparsepp/spp.h:3276:21: note: in instantiation of member function 'spp::sparse_hashtable<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long>,
      std::__debug::array<unsigned int, 5>, spp::spp_hash<std::__debug::array<unsigned int, 5> >, spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long,
      spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> >
      >::SelectKey, spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long, spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >,
      spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> > >::SetKey, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const
      std::__debug::array<unsigned int, 5>, unsigned long> > >::_resize_delta' requested here
                if (_resize_delta(1))
                    ^

[...]/sparsepp/sparsepp/spp.h:3797:29: note: in instantiation of function template specialization 'spp::sparse_hashtable<std::pair<const std::__debug::array<unsigned int, 5>, unsigned
      long>, std::__debug::array<unsigned int, 5>, spp::spp_hash<std::__debug::array<unsigned int, 5> >, spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long,
      spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> >
      >::SelectKey, spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long, spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >,
      spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> > >::SetKey, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const
      std::__debug::array<unsigned int, 5>, unsigned long> > >::find_or_insert<spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long, spp::spp_hash<std::__debug::array<unsigned int, 5> >,
      std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> > >::DefaultValue>' requested here
        return rep.template find_or_insert<DefaultValue>(key).second;
                            ^

[...]/foo.cpp:326:21: note: in instantiation of member function 'spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long,
      spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> >
      >::operator[]' requested here
                    hashmap[ key ] = value;

My setup: Clang 3.9.1
Debug flags:

-O2 -DDEBUG -g -ggdb3 -D_GLIBCXX_DEBUG

that is, with the checked STL implementation.

Any idea what is going on here?

Thanks a lot!
Lucas

Plan for porting into Rust ?

Hi, First of all thanks for the awesome tool!
I was wondering is there any efforts being made in porting the tool into Rust ?

unexpected memory usage after unserialization

Hi,

sparsepp is really great!

I create a <uint64_t, string> map. In my test code, about 178M <uint64_t,string> pairs are inserted into the map. And then I look-up them again. While looking up them, the memory usage is ~4.995G.

And then I add serialization and unserialization code (copied from your example 3) between insertion and look-up part. During the serialization, the memory usage is still ~4.995G. However, after unserialization, the memory usage is ~12.024G. The serialized file is ~4.4G.

Do you have any idea about this? My computing scenario is building map once and looking-up many times. Thank you!

Assertion

I am seeing this assertion:

sparsepp.h:3979: bool spp::sparse_hashtable<Value, Key, HashFcn, ExtractKey, SetKey, EqualKey, Alloc>::_maybe_shrink() [......]: Assertion `(bucket_count() & (bucket_count()-1)) == 0' failed.

Any suggestion?

Any recommendations on implementing disk paging on top of this?

Greetings, I'm in a position where I will potentially need a structure to store a very large number of hash entries (into the GB). I am investigating sparsepp as a potential candidate with the caveat being I'm interested in adding functionality that will support swapping out pages to disk based on an LRU algorithm. A cursory look at sparsepp docs (I have not thoroughly studied the source) indicates that while sparsepp does support serialization, it appears to support serializing the entire set to a file or stream; it does not appear designed for paging out to disk.

My questions are, first of all, is it possible? Secondly, is it recommended? I am open to other alternatives if you have suggestions. I've been a little reluctant to use a more database-oriented solution because I want something that's optimized to use as much memory as possible before resorting to disk.
TIA

Crashing when used on VS 2015

Hi, First of all thanks for this wonderful library. I needed an alternative for std::unordered_map for my application running on windows (its extremely low latency one). So by using sparsepp along with the default allocator it ran fine but didn't get any improvement in performance (of course it took a lot less memory). Then after setting the macro for using sparsepp allocator my application crashed without leaving a core dump (and so unable to point out where). Am using the library as
#define SPP_USE_SPP_ALLOC 1
#include <sparsepp/spp.h>

Please do let me know if i need to set anything more in order to improve the performance and avoid the crash.

Thanks once again,
Siril.

Bug in macro SPP_HASH_MIX

Hi again,

There is a minor bug in the description page. In order to enable hash mix, the macro is not SPP_HASH_MIX but SPP_MIX_HASH (according to the code in sparsepp.h file).

Otherwise, great job !

Can't store mutex inside object

The following code snippet produces compile-time errors which are way too big to paste:

#include <mutex>
#include <unordered_map>
#include <sparsepp/spp.h>
struct z
{
  int val;
  std::mutex mt;
  z(int v) : val(v) {};
};

//std::unordered_map<int, z> miz;
std::sparse_hash_map<int, z> miz;
int main() {
  miz.emplace(
    std::piecewise_construct,
    std::forward_as_tuple(1),
    std::forward_as_tuple(2));
  return 0;
}

Works fine with unordered_map, is this possible with this implementation too?

(also try_emplace from C++17 would be cool)

realocation of <long, double> map takes very long after about 60mil updates

On sparsehash issue tracker I have said that the update is getting stuck. It's actually not true, I just never waited long enough to see it going.

The program below processes about 3mil keys/sec till about size of 60mil where it gets stuck for about 2minutes twice. Then updates continue at a comparable speed as before then gets stuck on the next doubling.

I don't observe this with randomly generated keys, so it must have something to do with the distribution of my keys. The 3.5GB file is here.

#include <fstream>
#include <iostream>
#include <ctime>
#include "sparsepp.hpp"


int main() {
  std::ifstream f("./keys.tmp", std::ios::binary);
  spp::sparse_hash_map<uint64_t, double> map;

  long i = 0;
  while(!f.eof()) {
    time_t ctt = time(0);
    if (i % (int) 1e7 == 0)
      std::cout << "els:" << i << " map_size:" << map.size() << " at " << asctime(localtime(&ctt));
    uint64_t key;
    f.read((char*)&key, sizeof(key));
    map[key] += 1.0;
    i++;
  }
  return 0;
}

My output on Ubuntu 16.04 64bit machine:

g++ -std=c++11 -O2 tmp.cpp -lm -o tmp && ./tmp
els:0 map_size:0 at Sat Dec 31 12:55:50 2016
els:10000000 map_size:6849514 at Sat Dec 31 12:55:53 2016
els:20000000 map_size:10353403 at Sat Dec 31 12:55:56 2016
els:30000000 map_size:15231352 at Sat Dec 31 12:55:59 2016
els:40000000 map_size:22352845 at Sat Dec 31 12:56:04 2016
els:50000000 map_size:28546365 at Sat Dec 31 12:56:07 2016
els:60000000 map_size:33967341 at Sat Dec 31 12:56:14 2016
els:70000000 map_size:40369410 at Sat Dec 31 12:56:17 2016
els:80000000 map_size:46740765 at Sat Dec 31 12:56:21 2016
els:90000000 map_size:53220303 at Sat Dec 31 12:56:25 2016
els:100000000 map_size:60228564 at Sat Dec 31 12:56:29 2016
els:110000000 map_size:64684927 at Sat Dec 31 12:58:32 2016  <-- 2min
els:120000000 map_size:71014663 at Sat Dec 31 13:00:44 2016  <-- 2min
els:130000000 map_size:77742400 at Sat Dec 31 13:00:48 2016
els:140000000 map_size:83478135 at Sat Dec 31 13:00:51 2016
els:150000000 map_size:90135208 at Sat Dec 31 13:00:55 2016
els:160000000 map_size:96354061 at Sat Dec 31 13:00:59 2016
els:170000000 map_size:102911284 at Sat Dec 31 13:01:03 2016
els:180000000 map_size:109286371 at Sat Dec 31 13:01:08 2016
els:190000000 map_size:115816683 at Sat Dec 31 13:02:40 2016  <-- 1.5min
els:200000000 map_size:121958054 at Sat Dec 31 13:07:38 2016  <-- 5min
els:210000000 map_size:128693156 at Sat Dec 31 13:12:19 2016  <-- 5min
els:220000000 map_size:135951840 at Sat Dec 31 13:17:07 2016  <-- 5min
els:230000000 map_size:141173034 at Sat Dec 31 13:17:10 2016
els:240000000 map_size:147978238 at Sat Dec 31 13:17:14 2016
els:250000000 map_size:155220198 at Sat Dec 31 13:17:18 2016
els:260000000 map_size:162405186 at Sat Dec 31 13:17:22 2016
els:270000000 map_size:169145754 at Sat Dec 31 13:17:26 2016
els:280000000 map_size:173815023 at Sat Dec 31 13:17:29 2016
els:290000000 map_size:178707615 at Sat Dec 31 13:17:33 2016
els:300000000 map_size:185257094 at Sat Dec 31 13:17:38 2016
els:310000000 map_size:191926969 at Sat Dec 31 13:17:44 2016
els:320000000 map_size:198405326 at Sat Dec 31 13:19:10 2016   <--1.5min
els:330000000 map_size:204954471 at Sat Dec 31 13:22:38 2016   <--3.5min
els:340000000 map_size:211374035 at Sat Dec 31 13:26:36 2016   <--4min
els:350000000 map_size:217958953 at Sat Dec 31 13:31:15 2016   <--5min
els:360000000 map_size:224566793 at Sat Dec 31 13:35:48 2016     ...
els:370000000 map_size:231175570 at Sat Dec 31 13:41:07 2016     ...
els:380000000 map_size:237764603 at Sat Dec 31 13:47:00 2016     ...
els:390000000 map_size:244260100 at Sat Dec 31 13:53:40 2016     ...

Performance issue

Hi,

Seeing performance issue with finds for hash map <int, int> when inserting keys in order, either ascending or descending:

Testing sparse map
2000000 random inserts in 1 seconds
2000000 random finds in 0 seconds
2000000 random not-finds in 0 seconds
2000000 sequential inserts in 0 seconds
2000000 sequential finds in 0 seconds
2000000 random not-finds in 19 seconds
2000000 negative sequential inserts in 1 seconds
2000000 negative sequential finds in 0 seconds
2000000 random not-finds in 17 seconds

Testing STL map
2000000 random inserts in 2 seconds
2000000 random finds in 1 seconds
2000000 random not-finds in 1 seconds
2000000 sequential inserts in 1 seconds
2000000 sequential finds in 0 seconds
2000000 random not-finds in 0 seconds
2000000 negative sequential inserts in 1 seconds
2000000 negative sequential finds in 0 seconds
2000000 random not-finds in 0 seconds

Code is as follows:
void testSparseMap(int count) {
spp::sparse_hash_map<int, int> s;
time_t t;

printf ("Testing sparse map\n");
t = time (NULL);
srand (0);
for (int i = 0; i < count; ++i) { s.insert (make_pair(rand (), i)); }
printf ("%d random inserts in %ld seconds\n", count, time (NULL) - t);

t = time (NULL);
srand (0);
for (int i = 0; i < count; ++i) { s.find (rand ()); }
printf ("%d random finds in %ld seconds\n", count, time (NULL) - t);

t = time (NULL);
srand (1);
for (int i = 0; i < count; ++i) { s.find (rand ()); }
printf ("%d random not-finds in %ld seconds\n", count, time (NULL) - t);

s.clear ();
s.reserve(0);
t = time (NULL);
srand (0);
for (int i = 0; i < count; ++i) { s.insert (make_pair(i, i)); }
printf ("%d sequential inserts in %ld seconds\n", count, time (NULL) - t);

t = time (NULL);
srand (0);
for (int i = 0; i < count; ++i) { s.find (i); }
printf ("%d sequential finds in %ld seconds\n", count, time (NULL) - t);

t = time (NULL);
srand (1);
for (int i = 0; i < count; ++i) { s.find (rand ()); }
printf ("%d random not-finds in %ld seconds\n", count, time (NULL) - t);

s.clear ();
t = time (NULL);
srand (0);
s.reserve(0);
for (int i = 0; i < count; ++i) { s.insert (make_pair(-i, -i)); }
printf ("%d negative sequential inserts in %ld seconds\n", count, time (NULL) - t);

t = time (NULL);
srand (0);
for (int i = 0; i < count; ++i) { s.find (-i); }
printf ("%d negative sequential finds in %ld seconds\n", count, time (NULL) - t);

t = time (NULL);
srand (1);
for (int i = 0; i < count; ++i) { s.find (rand ()); }
printf ("%d random not-finds in %ld seconds\n", count, time (NULL) - t);
s.clear ();
s.reserve(0);
}

This is not an issue on Mac OS X.

Multiply defined symbol

Hi Greg,

When I compile (g++ 4.9.1) with the current sparsepp.h I get a multiply-defined symbol void cpuid(int*, int). It's not clear to me why this happens, as sparsepp seems to be properly include guarded, and the pre-processor directives should prevent the function from being defined more than once. However, no re-arrangement of the headers seems to rectify this problem, and I don't have any other includes that seem to be defining this symbol (i.e. all definitions are coming from sparsepp). I'm able to rectify the issue by simply changing the call to cpuid to be a call to the underlying __cpu_count function (my downstream software explicitly doesn't support windows, so I don't need to worry about that codepath) — but this solution seems like a hack. Any idea why I might be seeing this behavior or how I might rectify it cleanly?

Move semantics

If you take o pointer to a value inside the hash table and you do an insert, could that insert invalidate that pointer(not the iterator)? Considering every bucket has a vector and every vector is move aware it probably doesn't invalidate the pointer ?

Data race with aggressive and moderate configuration for SPP_ALLOC_SZ

So, I started using this hash map, and a few days after I pushed the code, random crashes started happening on our tests.

Apparently, there is a data race inside the function:

static uint32_t _sizing(uint32_t n)
{
#if !defined(SPP_ALLOC_SZ) || (SPP_ALLOC_SZ == 0)                                                                       
        // aggressive allocation first, then decreasing as sparsegroups fill up                                         
        // --------------------------------------------------------------------                                         
        static uint8_t s_alloc_batch_sz[SPP_GROUP_SIZE] = { 0 };                                           
        if (!s_alloc_batch_sz[0])                                               

s_alloc_batch_sz can be accessed from multiple threads.
Helgrind immediately complains about using different maps in different threads, so it's not hard to reproduce.

For my code, I changed the flag to SPP_ALLOC_SZ 1, since I was in this for memory anyway.

I also tested with adding thread_local to the static, and it fixes the issue. But I wouldn't go with thread local, since I don't want these few extra bytes in the thread stacks of my application.

Memory error when calling unserialize on file-scope sparse_hash_map.

Hi Greg,

Sorry to bother you with this, but I'm trying to bisect a bug that coincides with my conversion to sparsepp. Strangely, I only see a bug when running on my Mac (i.e. never on a linux system), but the bug also only presents when using sparsepp (config details at the bottom). Is the sparse_hash_map safe to read simultaneously from many threads? It's not the case that iterators are e.g. mutating state as they are searching, right? I'd guess not, and I'll continue looking for the bug elsewhere, but I just wanted to make sure.

Thanks!
Rob

details of compiler (also built using -stdlib=libc++).

Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

shrink to fit

Is there somenthing like shrink_to_fit like we have for std::vector?
I tried to use clear and reserve(0) but the memory is not released.

Thanks

Reference invalidation on insert

I'm trying to debug a problem of my own making. In attempting to use sparsepp as a drop-in replacement for std::unordered_set in a model checker, I ran into some unexpected memory corruption. The code can be distilled to something like the following self-contained reproducer:

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>
#include <queue>
#include <unistd.h>

#ifdef USE_SPARSEPP
  #include <sparsepp/spp.h>
  template<typename... Args>
  using set = spp::sparse_hash_set<Args...>;
#else
  #include <unordered_set>
  template<typename... Args>
  using set = std::unordered_set<Args...>;
#endif

struct State {
  uint8_t value = 0;

  // XXX: When using std::unordered_set, this can be
  //  const State *const previous = nullptr;
  const State *previous = nullptr;

  State() {}

  State(const State *s): value(s->value), previous(s) {}

  // XXX: When using std::unordered_set, assignment does not need to be implemented
  State &operator=(const State &other) {
    value = other.value;
    previous = other.previous;
    return *this;
  }

  bool operator==(const State &other) const noexcept {
    return value == other.value;
  }
};

namespace std {
  template<>
  struct hash<State> {
    std::size_t operator()(const State &s) const noexcept {
      return static_cast<std::size_t>(s.value);
    }
  };
}

static void print_trace(const State &s) {
  if (s.previous != nullptr)
    print_trace(*s.previous);
  std::cout << "value == " << unsigned(s.value) << "\n";
}

int main(void) {

  set<State> seen;
  std::queue<const State*> pending;

  State init;
  auto it = seen.insert(init);
  assert(it.second);
  pending.push(&*it.first);

  while (!pending.empty()) {

    const State *s = pending.front();
    pending.pop();

    State next(s);
    next.value++;

    if (next.value == 42) {
      std::cout << "Failed. Trace:\n";
      print_trace(next);
      return EXIT_FAILURE;
    }

    auto it = seen.insert(next);
    if (it.second) {
      pending.push(&*it.first);
    }
  }

  return EXIT_SUCCESS;
}

which can be compiled with:

# Build with std::unordered_set
g++ -std=c++11 -W -Wall -Wextra -Werror main.cc

# Build with sparsepp
g++ -DUSE_SPARSEPP -std=c++11 -W -Wall -Wextra -Werror main.cc

This runs correctly (produces the expected failing trace) when using std::unordered_set but triggers segfaults with spp::sparse_hash_set. It seems fairly likely to me that the problem stems from retaining a pointer into the set (pending.push(&*it.first);), and my following comments are assuming this is the case.

The sparsepp readme has this to say:

Warning - iterator invalidation on erase/insert

...
2. inserting new elements is likely to invalidate iterators (iterator invalidation can also happen with std::unordered_map if rehashing occurs due to the insertion)

The last part of this is surprising to me. I was under the impression std::unordered_set iterators and references remained valid after insertion. Not authoritative, but cppreference claims this is true before C++17. Is this not correct? Also the sparsepp readme doesn't explicitly mention that references are invalidated (though thinking about the implementation it is straightforward to see that this is probably the case and the motivation for this).

All in all, I don't think this is a sparsepp bug, but I wanted to understand whether I was doing something incorrect here or misreading the sparsepp docs.

Trivial compiler warnings

Compiling with -Wall and -Wdouble-promotion on gcc 6.1 yields the following trivial warnings:

sparsepp.h: In member function ‘void spp::sparsehash_internal::sh_hashtable_settings<Key, HashFunc, SizeType, HT_MIN_BUCKETS>::set_resizing_parameters(float, float)’:
sparsepp.h:1522:27: warning: implicit conversion from ‘float’ to ‘double’ to match other operand of binary expression [-Wdouble-promotion]
             assert(shrink >= 0.0);
                    ~~~~~~~^~~~
sparsepp.h:1523:25: warning: implicit conversion from ‘float’ to ‘double’ to match other operand of binary expression [-Wdouble-promotion]
             assert(grow <= 1.0);
                    ~~~~~^~~~

It is obviously inconsequential, but maybe worth fixing just to get clean compile time messages.

Doesn't compile on Mac using clang and C++03

First of all, thank you for writing an outstanding implementation of a hash table. It really saved my bacon ;)

I have one small problem that I was able to easily work around, but wanted to recommend a fix in case anyone else sees the same issue.

I originally tried compiling this using clang on my Mac without C++11. This failed because clang defines similar macros to gcc and this confuses the preprocessor logic found here
https://github.com/greg7mdp/sparsepp/blob/master/sparsepp.h#L892
and here
https://github.com/greg7mdp/sparsepp/blob/master/sparsepp.h#L905.
It boils down to not being able to locate the tr1 header. I fixed this by switching the order of the preprocessor if statements (i.e. checking for clang before gcc).

Performance degradation with latest changes

Hi @greg7mdp. Yesterday I noticed that my R package which uses sparsepp became super-slow. First I thought it is due to some my changes. But after more investigation I found that it is related to sparsepp, because it works fast when I use old version (1.5x faster that std::unordered_map as I reported in feedback) - https://github.com/greg7mdp/sparsepp/tree/0fb17374eb761e31c9256bce3a63e513e846d0bc.
In January I updated sparsepp to https://github.com/greg7mdp/sparsepp/tree/48fc88160011dff87030f992d7c0afc874c7350f and things became super slow (I think 10-100x).

I use following function to hash pair<uint32_t, uint32_t> to uint64_t:

namespace std {
  template <>
  struct hash<std::pair<uint32_t, uint32_t>>
  {
    inline uint64_t operator()(const std::pair<uint32_t, uint32_t>& k) const
    {
      return (uint64_t) k.first << 32 | k.second;
    }
  };
}

In my case it should not produce any collisions.

Ordering with hash collisions

Hello,

Just curious, if I insert multiple elements into a sparse_hash_set that hash to the same value, is there some specific ordering the elements will take if I iterate over all of the items using

for( auto it = set.begin(); it != set.end(); ++it )

Is there any way I can specify that ordering? (with a custom less-than function or something?)

Poor performance for inserting 100000 entries in sparse hash map compared to std::unordered_map

Hi,

Just a simple benchmark program (C++) comparing your sparse hash map with unordered_map got me results were unordered_map performed much better. Would like to check with you on this. Is this expected with the latest g++ libraries?

Am running on ubuntu 18.04

$ g++ --version
g++ (Ubuntu 7.3.0-16ubuntu3) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

And my program is attached along with this.

The last output is as follows:
$ ./a.out
sparse hash map time 1211270
unordered map time 53430

testmap.cpp.txt

Thanks,
Siril.

MSVC 2013 64-bit compiler error

Following error:

spp_test.cc
d:\prj\sparsepp\sparsepp.h(1467) : error C2664: 'size_t spp::spp_hash<Key>::operator ()(T *) const' : cannot convert argument 1 from 'const std::basic_string<char,std::char_traits<char>,std::allocator<char>>' to 'std::string *'
        with
        [
            Key=std::string
,            T=std::string
        ]
        No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
        d:\prj\sparsepp\sparsepp.h(1466) : while compiling class template member function 'size_t spp::sparsehash_internal::sh_hashtable_settings<std::basic_string<char,std::char_traits<char>,std::allocator<char>>,spp::spp_hash<Key>,unsigned __int64,4>::hash(const std::basic_string<char,std::char_traits<char>,std::allocator<char>> &) const'
        with
        [
            Key=std::string
        ]
        d:\prj\sparsepp\sparsepp.h(4832) : see reference to function template instantiation 'size_t spp::sparsehash_internal::sh_hashtable_settings<std::basic_string<char,std::char_traits<char>,std::allocator<char>>,spp::spp_hash<Key>,unsigned __int64,4>::hash(const std::basic_string<char,std::char_traits<char>,std::allocator<char>> &) const' being compiled
        with
        [
            Key=std::string
        ]
        d:\prj\sparsepp\sparsepp.h(4797) : see reference to class template instantiation 'spp::sparsehash_internal::sh_hashtable_settings<std::basic_string<char,std::char_traits<char>,std::allocator<char>>,spp::spp_hash<Key>,unsigned __int64,4>' being compiled
        with
        [
            Key=std::string
        ]
        d:\prj\sparsepp\sparsepp.h(4848) : see reference to class template instantiation 'spp::sparse_hashtable<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char>>,T>,Key,HashFcn,spp::sparse_hash_map<Key,T,HashFcn,std::equal_to<Key>,spp::libc_allocator_with_realloc<std::pair<const Key,T>>>::SelectKey,spp::sparse_hash_map<Key,T,HashFcn,std::equal_to<Key>,spp::libc_allocator_with_realloc<std::pair<const Key,T>>>::SetKey,EqualKey,Alloc>::Settings' being compiled
        with
        [
            T=std::string
,            Key=std::string
,            HashFcn=spp::spp_hash<std::string>
,            EqualKey=std::equal_to<std::string>
,            Alloc=spp::libc_allocator_with_realloc<std::pair<const std::string,std::string>>
        ]
        d:\prj\sparsepp\sparsepp.h(4931) : see reference to class template instantiation 'spp::sparse_hashtable<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char>>,T>,Key,HashFcn,spp::sparse_hash_map<Key,T,HashFcn,std::equal_to<Key>,spp::libc_allocator_with_realloc<std::pair<const Key,T>>>::SelectKey,spp::sparse_hash_map<Key,T,HashFcn,std::equal_to<Key>,spp::libc_allocator_with_realloc<std::pair<const Key,T>>>::SetKey,EqualKey,Alloc>' being compiled
        with
        [
            T=std::string
,            Key=std::string
,            HashFcn=spp::spp_hash<std::string>
,            EqualKey=std::equal_to<std::string>
,            Alloc=spp::libc_allocator_with_realloc<std::pair<const std::string,std::string>>
        ]
        spp_test.cc(55) : see reference to class template instantiation 'spp::sparse_hash_map<std::string,std::string,spp::spp_hash<Key>,std::equal_to<Key>,spp::libc_allocator_with_realloc<std::pair<const Key,T>>>' being compiled
        with
        [
            Key=std::string
,            T=std::string
        ]

while compiling following source:

#include <cstdio>
#include <iostream>
#include <string>

#include "sparsepp.h"

using spp::sparse_hash_map;
using namespace std;

class FileSerializer 
{
public:
    // serialize basic types to FILE
    // -----------------------------
    template <class T>
    bool operator()(FILE *fp, const T& value) 
    {
        return fwrite((const void *)&value, sizeof(value), 1, fp) == 1;
    }

    template <class T>
    bool operator()(FILE *fp, T* value) 
    {
        return fread((void *)value, sizeof(*value), 1, fp) == 1;
    }

    // serialize std::string to FILE
    // -----------------------------
    bool operator()(FILE *fp, const string& value) 
    {
        const size_t size = value.size();
        return (*this)(fp, size) && fwrite(value.c_str(), size, 1, fp) == 1;
    }

    bool operator()(FILE *fp, string* value) 
    {
        size_t size;
        if (!(*this)(fp, &size)) 
            return false;
        char* buf = new char[size];
        if (fread(buf, size, 1, fp) != 1) 
        {
            delete [] buf;
            return false;
        }
        new (value) string(buf, (size_t)size);
        delete[] buf;
        return true;
    }
};

int main()
{
    // Create an unordered_map of three strings (that map to strings)
    sparse_hash_map<std::string, std::string> email, copy;

    email["tom" ] = "[email protected]";
    email["jeff"] = "[email protected]";
    email["jim" ] = "[email protected]";

    // serialize age hash_map to "mails.dmp" file
    FILE *out = fopen("mails.dmp", "wb");
    email.serialize(FileSerializer(), out);
    fclose(out);

    // read from "mails.dmp" file into age_read hash_map 
    FILE *input = fopen("mails.dmp", "rb");
    copy.unserialize(FileSerializer(), input);
    fclose(input);

    // print out contents to verify correct serialization
    for (auto& v : copy)
        printf("email: %s -> %s\n", v.first.c_str(), v.second.c_str());

    return 0;
}

didn't measure difference on stock exchange HFT data stream

Liked the drop-in replacement feature, and the header only format. Indeed all was needed to copy and define the map.
Used it to hash 8byte stock symbols to incremented integers, where the symbol set was around 6000 with 6e7 - 8e7 even count per day x 5 years. Profiled the code with gprof and didn't find difference between hash_map, unordered_map, and your implementation.
Either

  1. I was doing something not right
  2. this was a corner case
  3. should find a good hash function, then compare again?
    wishing you best: steven

Move assignment operator not implemented

Hi,

I noticed when profiling my code I had undesired copies of hash maps. After looking into it, I noticed two things:

1 - Move assignment operator was not implemented. In my code, I did this:

    1.19 @@ -3732,6 +3725,8 @@ public:
    1.20                      const allocator_type& alloc) :
    1.21          rep(std::move(o.rep), alloc)
    1.22      {}
    1.23 +
    1.24 +    sparse_hash_map& operator=(sparse_hash_map &&o) = default;
    1.25  #endif
    1.26  
    1.27  #if !defined(SPP_NO_CXX11_HDR_INITIALIZER_LIST)

2 - When implementing 1, my code started to go into stack overflow. Fixed with this:

     1.3 @@ -2928,14 +2928,7 @@ public:
     1.4      {
     1.5      }
     1.6  
     1.7 -    sparse_hashtable& operator=(sparse_hashtable&& o)
     1.8 -    {
     1.9 -        using std::swap;
    1.10 -
    1.11 -        sparse_hashtable tmp(std::move(o));
    1.12 -        swap(tmp, *this);
    1.13 -        return *this;
    1.14 -    }
    1.15 +    sparse_hashtable& operator=(sparse_hashtable&& o) = default;
    1.16  #endif
    1.17  
    1.18      sparse_hashtable(MoveDontCopyT mover,

So, operator= calls swap, and apparently swap calls operator=. That is what caused the stack overflow.

I just thought to report it, because I'm not sure if I'll have the time to open the PR.

A way to configure default capacity

This is one thing I would like to set, maybe in the constructor. I assume it already defaults to 0 capacity until there are elements, but I also want the ability to configure the initial size once there are elements.

feedback please!

This is not a real issue, but just a space to provide feedback on sparsepp, good or bad. I'd love to hear what you think. Thank you in advance.

Clarify license

You have a bunch of different licenses in the header of sparsepp.h and it's not really clear under which license sparsepp can be used. Could you please clarify that a bit? The ones in the file appear to be:

  • SparseHash's BSD license
  • Boost License for the Boost bits - very similar to BSD
  • You added another instance of the BSD license, but it's not entirely clear to which parts of the code it apples. Is it only for your contributions or the entire code?

Am I correct in assuming that this means that the whole project is under the BSD license? If so, it'd be nice to mention this in the readme or maybe add a LICENSE file to the repo. The current situation is a bit confusing. Thanks a lot!

Support for search with compatible key types

First of all, congrats on this awesome hash map :) It's far better than others that I've tried.

One thing that I implemented locally is the ability to search on the sparse_hash_set, using a different key. Of course, you have to also pass in a compatible hash functor and comparison hash functor.

This is the original code in the use case:

spp::sparse_hash_map<unsigned, Car*> my_cars = produce();

// I can query cars by their Ids
unsigned cid = some_car_id();
bool got_it = my_cars.find(cid) != my_cars.end();

for (auto& p : my_cars) {
    Car* c = p.second;
    print(c); // I can iterate by car pointers as well!
}

So, instead of having a map of Id to Car, I could just have a set of cars:

struct Functor {
    std::size_t operator()(const Car* c) const { return c->id(); }
    bool operator()(const Car* c1, const Car* c2) const { return c1 == c2; } // its ok, car address is valid identity too
};

spp::sparse_hash_set<Car*, Functor> my_cars = produce();

// I can query cars by their Ids
struct CompatibleFunctor {
    std::size_t operator()(unsigned id) const { return id; }
    bool operator()(const Car* c1, unsigned c2) const { return c1->id() == c2; }
} cfunctor;

unsigned cid = some_car_id();
bool got_it = my_cars.find(cid, cfunctor, cfunctor) != my_cars.end();

for (auto& c : my_cars) {
    print(c); // I can iterate by car pointers as well!
}

So I get the same functionality but my hashtable items are half the size (which is my real goal with all this really).

Would you be willing to accept some form of implementation of this functionality as a pull request? Just thought I should check before writing the code in a shareable way.

sparse_hash_set seems slower and consumes more memory than Google sparsehash

My test code is as follows.

#include <iostream>
//#include "sparsepp/sparsepp.h"
#include <sparsehash/sparse_hash_set>

int main() {
  int n = 1000000*200;

  //spp::sparse_hash_set<int> s;
  google::sparse_hash_set<int> s;

  for (int i = 0; i < n; i++) {
    s.insert(i);
  }
  std::cout << "done\n";

  //while (true) {}
  return 0;
}
$ uname -srvmpio
Linux 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
<snip>
$ g++ -O3 -o sparse_hash_set sparse_hash_set.cc
$ time ./sparse_hash_set

Sparsepp results:

real	0m30.069s
user	0m29.592s
sys	0m0.372s

Google Sparsehash results:

real	0m12.180s
user	0m11.884s
sys	0m0.180s

From the System Monitor and with the while loop uncommented, I was also able to find out the memory usage to be 1.6GB for Sparsepp and 911.8MB for Google Sparsehash.

Compiler warning

sparsepp-master\sparsepp/spp_utils.h(356): warning C4146: unary minus operator applied to unsigned type, result still unsigned

Working with Classes

Hello,

Great work. How do I make this work with a user-defined class?

Thanks,
Nathan

Compute memory usage

Hello,

is there a way (it doesn't matter how fast) to compute the exact memory usage of one hashmap?
I did some experiments with the internal allocator but could not get a realistic result.

Thank you

Conan package

Hello,
Do you know about Conan?
Conan is modern dependency manager for C++. And will be great if your library will be available via package manager for other developers.

Here you can find example, how you can create package for the library.

If you have any questions, just ask :-)

Possible iterator bug?

I'm replacing some usage of std::unordered_map with sparse_hash_map and I've encountered an occasional issue with iterators.

map_t::const_iterator iter = map.find(keyA); // after this line iter contains a good value
map[keyB] = (*iter).second; // after this line iter contains a garbage value

With this code, keyB will occasionally map to some garbage values (0xdddddddd, indicating deleted memory), and not the same value as keyA, which is what I would expect. The same code works fine with std::unordered_map.

If I store keyA's value in a temp variable everything works as expected:

map_t::const_iterator iter = map.find(keyA);
tempVal = (*iter).second;
map[keyB] = tempVal;

(Unfortunately, I can't reproduce this issue with a simple test case, only our internal code.)

Using the temp variable is fine for my usage, just curious if what I'm seeing is a bug, or expected behaviour. I saw in the readme that calls to erase() can invalidate iterators. Is the same true with insertions as well?

C++ 98 support

Does sparsecpp work with C++98?

Based upon example file hash_std.cc, I created a file called simple.cc in C++98 format and ran it with -std=c++98 option. The code compiles with the following warnings. Is it safe to ignore these warnings?

In file included from simple.cc:3:0:
../sparsepp/spp.h:2315:21: warning: use of C++0x long long integer constant [-Wlong-long]
if (value < 0xFFFFFFFFULL) // fits in 4 bytes
^
../sparsepp/spp.h:2338:22: warning: use of C++0x long long integer constant [-Wlong-long]
if (first4 < 0xFFFFFFFFULL)
^
In file included from ../sparsepp/spp.h:64:0,
from simple.cc:3:
../sparsepp/spp_traits.h:55:40: warning: ISO C++ 1998 does not support ‘long long’ [-Wlong-long]
template<> struct is_integral : true_type { };
^
../sparsepp/spp_traits.h:56:49: warning: ISO C++ 1998 does not support ‘long long’ [-Wlong-long]
template<> struct is_integral : true_type { };

insert and move semantics

std::string s = "abacaba";
spp::sparse_hash_set<std::string> q;
q.insert(s);
assert(s == "abacaba");

The assertions fails with spp::sparse_hash_set, but works with std::unordered_set.

Using pointer as a key

It seems that spp_hash depends on knowing the struct/class behind a pointer. I am using a 3rd party code that hides the object internals, so all I have are pointer to forward declaration of a class. As a result, taking sizeof(T) fails to compile.

I hacked my copy to temporarily take sizeof(T*). Is this OK?

-dave

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.