Comments (6)
I'll add a fix in sparsepp to address the issue, so that the problem does not occur even when using std::hash.
from sparsepp.
@vspinu, Thanks for sharing the test data, I can reproduce the issue.
As you suspected, the problem is that the keys are not well distributed, and the std::hash() function from clang++ or g++ does not help (problem does not occur with Visual Studio 2015).
However, with a small change in your test program (overriding the default hash function), I was able to fix the issue:
vagrant@vagrant-ubuntu-trusty-64:/vagrant/tmp$ g++ -O2 -I.. keys.cc -o keys
vagrant@vagrant-ubuntu-trusty-64:/vagrant/tmp$ ./keys
els: 0, map_size: 0, num_present: 0 at Sun Jan 1 00:01:33 2017
els: 10000000, map_size: 6849514, num_present: 3150486 at Sun Jan 1 00:01:37 2017
els: 20000000, map_size: 10353403, num_present: 6496111 at Sun Jan 1 00:01:42 2017
els: 30000000, map_size: 15231352, num_present: 5122051 at Sun Jan 1 00:01:46 2017
els: 40000000, map_size: 22352845, num_present: 2878507 at Sun Jan 1 00:01:53 2017
els: 50000000, map_size: 28546365, num_present: 3806480 at Sun Jan 1 00:01:58 2017
els: 60000000, map_size: 33967341, num_present: 4579024 at Sun Jan 1 00:02:06 2017
els: 70000000, map_size: 40369410, num_present: 3597931 at Sun Jan 1 00:02:10 2017
els: 80000000, map_size: 46740765, num_present: 3628645 at Sun Jan 1 00:02:15 2017
els: 90000000, map_size: 53220303, num_present: 3520462 at Sun Jan 1 00:02:21 2017
els: 100000000, map_size: 60228564, num_present: 2991739 at Sun Jan 1 00:02:26 2017
els: 110000000, map_size: 64684927, num_present: 5543637 at Sun Jan 1 00:02:31 2017
els: 120000000, map_size: 71014663, num_present: 3670264 at Sun Jan 1 00:02:43 2017
els: 130000000, map_size: 77742400, num_present: 3272263 at Sun Jan 1 00:02:48 2017
els: 140000000, map_size: 83478135, num_present: 4264265 at Sun Jan 1 00:02:53 2017
els: 150000000, map_size: 90135208, num_present: 3342927 at Sun Jan 1 00:02:59 2017
els: 160000000, map_size: 96354061, num_present: 3781147 at Sun Jan 1 00:03:04 2017
els: 170000000, map_size: 102911284, num_present: 3442777 at Sun Jan 1 00:03:10 2017
els: 180000000, map_size: 109286371, num_present: 3624913 at Sun Jan 1 00:03:15 2017
els: 190000000, map_size: 115816683, num_present: 3469688 at Sun Jan 1 00:03:21 2017
els: 200000000, map_size: 121958054, num_present: 3858629 at Sun Jan 1 00:03:27 2017
els: 210000000, map_size: 128693156, num_present: 3264898 at Sun Jan 1 00:03:33 2017
els: 220000000, map_size: 135951840, num_present: 2741316 at Sun Jan 1 00:03:53 2017
els: 230000000, map_size: 141173034, num_present: 4778806 at Sun Jan 1 00:03:58 2017
els: 240000000, map_size: 147978238, num_present: 3194796 at Sun Jan 1 00:04:03 2017
Here is the modified code:
#include <fstream>
#include <iostream>
#include <ctime>
#include <cstdio>
#include <sparsepp.h>
struct Hasher {
size_t operator()(uint64_t a) const { return a * 1099511628211ULL; }
};
int main()
{
std::ifstream f("./keys.tmp", std::ios::binary);
spp::sparse_hash_map<uint64_t, double, Hasher> map;
long i = 0, num_present=0;
while(!f.eof())
{
time_t ctt = time(0);
if (i % 10000000ll == 0)
{
printf("els: %10lu, map_size: %10lu, num_present: %10lu at %s",
i, map.size(), num_present, asctime(localtime(&ctt)));
num_present=0;
}
uint64_t key;
f.read((char*)&key, sizeof(key));
if (map.find(key) != map.end())
num_present++;
map[key] += 1.0;
i++;
}
return 0;
}
from sparsepp.
Actually, I did not change the default in sparsepp to use integer keys unchanged, as this is used in some benchmarks and doing the right thing (scrambling integer keys by multiplying with a prime number) penalizes us in benchmarks. However I have added a section on the README showing how to override the hash function for integers.
from sparsepp.
Great! AFAIC the issue is fixed. Thank you very much for the solution and for the great package!
from sparsepp.
Thanks!
from sparsepp.
@vspinu, I have updated the sparsepp default hash to add some key mixing. This solves the issue you exposed, where the keys are not well distributed. If you get the latest version, it is not necessary anymore to provide a custom hashing function, the default sparsepp hash works fine.
from sparsepp.
Related Issues (20)
- Support operator[] with rvalue HOT 1
- When hashing on vector HOT 13
- Help me to serialize/deserialize sparse_hash_map<int , vector< std::string > > HOT 1
- missing return value in function spp_sptr& operator=(spp_sptr &&o) ? HOT 1
- SIGSEGV on using move-assigned-from hash set HOT 3
- Unable to create map for value type with deleted copy constructor HOT 2
- Can't insert std::tuple: spp uses deleted constructor/destructor HOT 1
- Lookup non-present keys becomes very slow. HOT 1
- Do you support C++11 stateful allocators? HOT 1
- Default allocator doesn't throw and caller doesn't check for NULL HOT 1
- SEGV after moving hash_map HOT 2
- conan package HOT 2
- sparse_hash_map<uint32_t, uint64_t> takes more memory than sparse_hash_map<uint64_t, uint64_t> HOT 1
- memcpy can happen with null "source" parameter HOT 6
- class-memaccess warning when compiling with GCC 8 or later HOT 1
- Problem when compiling on macOS catalina HOT 6
- Can sparse_hash_map support unique_ptr?
- Compiler warning on realloc call within sparsepp. HOT 1
- Error when compile: is not a special member function which can be defaulted HOT 2
- Missing return value. HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sparsepp.