Giter Club home page Giter Club logo

Comments (6)

greg7mdp avatar greg7mdp commented on June 15, 2024 2

I'll add a fix in sparsepp to address the issue, so that the problem does not occur even when using std::hash.

from sparsepp.

greg7mdp avatar greg7mdp commented on June 15, 2024

@vspinu, Thanks for sharing the test data, I can reproduce the issue.

As you suspected, the problem is that the keys are not well distributed, and the std::hash() function from clang++ or g++ does not help (problem does not occur with Visual Studio 2015).

However, with a small change in your test program (overriding the default hash function), I was able to fix the issue:

vagrant@vagrant-ubuntu-trusty-64:/vagrant/tmp$ g++ -O2 -I.. keys.cc -o keys
vagrant@vagrant-ubuntu-trusty-64:/vagrant/tmp$ ./keys
els:          0, map_size:          0, num_present:          0 at Sun Jan  1 00:01:33 2017
els:   10000000, map_size:    6849514, num_present:    3150486 at Sun Jan  1 00:01:37 2017
els:   20000000, map_size:   10353403, num_present:    6496111 at Sun Jan  1 00:01:42 2017
els:   30000000, map_size:   15231352, num_present:    5122051 at Sun Jan  1 00:01:46 2017
els:   40000000, map_size:   22352845, num_present:    2878507 at Sun Jan  1 00:01:53 2017
els:   50000000, map_size:   28546365, num_present:    3806480 at Sun Jan  1 00:01:58 2017
els:   60000000, map_size:   33967341, num_present:    4579024 at Sun Jan  1 00:02:06 2017
els:   70000000, map_size:   40369410, num_present:    3597931 at Sun Jan  1 00:02:10 2017
els:   80000000, map_size:   46740765, num_present:    3628645 at Sun Jan  1 00:02:15 2017
els:   90000000, map_size:   53220303, num_present:    3520462 at Sun Jan  1 00:02:21 2017
els:  100000000, map_size:   60228564, num_present:    2991739 at Sun Jan  1 00:02:26 2017
els:  110000000, map_size:   64684927, num_present:    5543637 at Sun Jan  1 00:02:31 2017
els:  120000000, map_size:   71014663, num_present:    3670264 at Sun Jan  1 00:02:43 2017
els:  130000000, map_size:   77742400, num_present:    3272263 at Sun Jan  1 00:02:48 2017
els:  140000000, map_size:   83478135, num_present:    4264265 at Sun Jan  1 00:02:53 2017
els:  150000000, map_size:   90135208, num_present:    3342927 at Sun Jan  1 00:02:59 2017
els:  160000000, map_size:   96354061, num_present:    3781147 at Sun Jan  1 00:03:04 2017
els:  170000000, map_size:  102911284, num_present:    3442777 at Sun Jan  1 00:03:10 2017
els:  180000000, map_size:  109286371, num_present:    3624913 at Sun Jan  1 00:03:15 2017
els:  190000000, map_size:  115816683, num_present:    3469688 at Sun Jan  1 00:03:21 2017
els:  200000000, map_size:  121958054, num_present:    3858629 at Sun Jan  1 00:03:27 2017
els:  210000000, map_size:  128693156, num_present:    3264898 at Sun Jan  1 00:03:33 2017
els:  220000000, map_size:  135951840, num_present:    2741316 at Sun Jan  1 00:03:53 2017
els:  230000000, map_size:  141173034, num_present:    4778806 at Sun Jan  1 00:03:58 2017
els:  240000000, map_size:  147978238, num_present:    3194796 at Sun Jan  1 00:04:03 2017

Here is the modified code:

#include <fstream>
#include <iostream>
#include <ctime>
#include <cstdio>
#include <sparsepp.h>

struct Hasher {
    size_t operator()(uint64_t a) const { return a * 1099511628211ULL; }
};

int main() 
{
    std::ifstream f("./keys.tmp", std::ios::binary);
    spp::sparse_hash_map<uint64_t, double, Hasher> map;

    long i = 0, num_present=0;

    while(!f.eof()) 
    {
        time_t ctt = time(0);
        if (i % 10000000ll == 0)
        {
            printf("els: %10lu, map_size: %10lu, num_present: %10lu at %s", 
                   i, map.size(), num_present, asctime(localtime(&ctt)));
            num_present=0;
        }
        uint64_t key;
        f.read((char*)&key, sizeof(key));
        if (map.find(key) != map.end())
            num_present++;

        map[key] += 1.0;
        i++;
    }
    return 0;
}

from sparsepp.

greg7mdp avatar greg7mdp commented on June 15, 2024

Actually, I did not change the default in sparsepp to use integer keys unchanged, as this is used in some benchmarks and doing the right thing (scrambling integer keys by multiplying with a prime number) penalizes us in benchmarks. However I have added a section on the README showing how to override the hash function for integers.

from sparsepp.

vspinu avatar vspinu commented on June 15, 2024

Great! AFAIC the issue is fixed. Thank you very much for the solution and for the great package!

from sparsepp.

greg7mdp avatar greg7mdp commented on June 15, 2024

Thanks!

from sparsepp.

greg7mdp avatar greg7mdp commented on June 15, 2024

@vspinu, I have updated the sparsepp default hash to add some key mixing. This solves the issue you exposed, where the keys are not well distributed. If you get the latest version, it is not necessary anymore to provide a custom hashing function, the default sparsepp hash works fine.

from sparsepp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.