Optimize HashContainer until the performance is on par with ES6 about js-sdsl HOT 9 CLOSED

js-sdsl commented on June 8, 2024

Optimize HashContainer until the performance is on par with ES6

from js-sdsl.

Comments (9)

ZLY201 commented on June 8, 2024

Draft

I have to saw the source code of chorme v8 engine.

To really optimize hash container we need to know the principle of Js-array.

Consider the following code:

console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = new Array(num);
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 11.60986328125 ms

console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = new Array(num);
arr[arr.length + num] = 1;
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 619.56884765625 ms

console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = {};
arr[arr.length + num] = 1;
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 81.559814453125 ms

The length only increased one but the running time became ten times over.

The reason is the array in js has two mode: fast and slow.

The fast mode will create a memory contiguous array in C++.

But the slow mode will create a HashTable to reduce the memory.

You know all the members in js is Object including Array.

When you change the length of Array directly it will be changed to a HashTable like { 0: xxx, 1: xxx, ... }.

But in Java and C++, the array is always contiguous.

So we don't need to make the reAllocate function to expand or save the memory.

Just change the hashTable's length to (1 << 30).

Test

Do a simple verification of the above theory.

// After change.
import { HashSet } from 'js-sdsl';
const testNum = 1000000;
function generateRandomString() {
    const length = Math.max(Math.floor(Math.random() * 6), 1);
    const base = 'a'.charCodeAt(0);
    let str = '';
    for (let i = 0; i < length; ++i) {
        const random = Math.floor(Math.random() * 26);
        str += String.fromCharCode(base + random);
    }
    return str;
}
const arr = [];
for (let i = 0; i < testNum; ++i) {
    arr.push(generateRandomString());
    arr.push(Math.random() * testNum);
    arr.push(i);
}
const st = new Set();
console.time('ES6');
for (let i = 0; i < testNum; ++i) {
    st.add(arr[i]);
}
console.timeEnd('ES6');
const hst = new HashSet<number>();
console.time('js-sdsl');
for (let i = 0; i < testNum; ++i) {
    hst.insert(arr[i]);
}
console.timeEnd('js-sdsl');
// ES6: 552.02ms
// js-sdsl: 999.697ms

Much faster than before.

I'll do memory test later.

from js-sdsl.

ZLY201 commented on June 8, 2024

About hash function

I find the source hash function for Integer in v8.

inline uint32_t ComputeIntegerHash(uint32_t key, uint32_t seed) {
  uint32_t hash = key;
  hash = hash ^ seed;
  hash = ~hash + (hash << 15);  // hash = (hash << 15) - hash - 1;
  hash = hash ^ (hash >> 12);
  hash = hash + (hash << 2);
  hash = hash ^ (hash >> 4);
  hash = hash * 2057;  // hash = (hash + (hash << 3)) + (hash << 11);
  hash = hash ^ (hash >> 16);
  return hash & 0x3fffffff;
}

That's easy to implement.

But the hash function of object it use the address of the pointer.

int DescriptorLookupCache::Hash(Object* source, Name* name) {
  // Uses only lower 32 bits if pointers are larger.
  uint32_t source_hash =
      static_cast<uint32_t>(reinterpret_cast<uintptr_t>(source)) >>
      kPointerSizeLog2;
  uint32_t name_hash = name->hash_field();
  return (source_hash ^ name_hash) % kLength;
}

I have no idea how to implement it.

It seems like we have no way to reduce the time to get the hash code.

from js-sdsl.

ZLY201 commented on June 8, 2024

before

=================================== HashSet ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │      testFunc       │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│    0    │    'constructor'    │    1    │    1000000    │  1537   │
│    1    │      'insert'       │ 1000000 │    2000000    │  5486   │
│    2    │       'find'        │ 2000000 │    2000000    │  2380   │
│    3    │ 'eraseElementByKey' │ 2000000 │    2000000    │  1703   │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== HashMap ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │      testFunc       │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│    0    │    'constructor'    │    1    │    1000000    │  2308   │
│    1    │    'setElement'     │ 1000000 │    1000000    │  4395   │
│    2    │  'getElementByKey'  │ 1000000 │    1000000    │  3872   │
│    3    │ 'eraseElementByKey' │ 1000000 │    1000000    │  4138   │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== Report ===================================

after

=================================== HashSet ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │      testFunc       │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│    0    │    'constructor'    │    1    │    1000000    │  1064   │
│    1    │      'insert'       │ 1000000 │    2000000    │  1140   │
│    2    │       'find'        │ 2000000 │    2000000    │  4839   │
│    3    │ 'eraseElementByKey' │ 2000000 │    2000000    │  2911   │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== HashMap ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │      testFunc       │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│    0    │    'constructor'    │    1    │    1000000    │   988   │
│    1    │    'setElement'     │ 1000000 │    1000000    │   271   │
│    2    │  'getElementByKey'  │ 1000000 │    1000000    │  1205   │
│    3    │ 'eraseElementByKey' │ 1000000 │    1000000    │   863   │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== Report ===================================

from js-sdsl.

ZLY201 commented on June 8, 2024

I find that slow mode for Array is really slow.

I think we should find a new way to solve this problem.

The best idea I consider is that change hashTable type to Node<K, V>[] like java.

from js-sdsl.

ZLY201 commented on June 8, 2024

The source code of ES6 Set: https://codereview.chromium.org/220293002/diff/20001/src/objects.cc.

from js-sdsl.

noname0310 commented on June 8, 2024

About hash function

I find the source hash function for Integer in v8.

inline uint32_t ComputeIntegerHash(uint32_t key, uint32_t seed) {
  uint32_t hash = key;
  hash = hash ^ seed;
  hash = ~hash + (hash << 15);  // hash = (hash << 15) - hash - 1;
  hash = hash ^ (hash >> 12);
  hash = hash + (hash << 2);
  hash = hash ^ (hash >> 4);
  hash = hash * 2057;  // hash = (hash + (hash << 3)) + (hash << 11);
  hash = hash ^ (hash >> 16);
  return hash & 0x3fffffff;
}

That's easy to implement.

But the hash function of object it use the address of the pointer.

int DescriptorLookupCache::Hash(Object* source, Name* name) {
  // Uses only lower 32 bits if pointers are larger.
  uint32_t source_hash =
      static_cast<uint32_t>(reinterpret_cast<uintptr_t>(source)) >>
      kPointerSizeLog2;
  uint32_t name_hash = name->hash_field();
  return (source_hash ^ name_hash) % kLength;
}

I have no idea how to implement it.

It seems like we have no way to reduce the time to get the hash code.

Wouldn't it be possible to create a hash value for a reference or something like that by putting the references in the JS set and giving them an ID?

from js-sdsl.

ZLY201 commented on June 8, 2024

Wouldn't it be possible to create a hash value for a reference or something like that by putting the references in the JS set and giving them an ID?

Do you mean { value: xxx, id: xxx }?

It will use object {} and it's a hash container too.

from js-sdsl.

noname0310 commented on June 8, 2024

Wouldn't it be possible to create a hash value for a reference or something like that by putting the references in the JS set and giving them an ID?

Do you mean { value: xxx, id: xxx }?

It will use object {} and it's a hash container too.

const ObjectIdAllocator = new class ObjectIdAllocator {
    private _nextId = 0;
    private _map = new WeakMap<object, number>();

    public generateOrGetId(obj: object): number {
        let id = this._map.get(obj);
        if (id === undefined) {
            id = this._nextId;
            this._nextId += 1;
            this._map.set(obj, id);
        }
        return id;
    }
}

const obj1 = {
    a: 1,
    b: 2
};

const obj2 = {
    a: 1,
    b: 2
};

console.log(ObjectIdAllocator.generateOrGetId(obj1)); // 0
console.log(ObjectIdAllocator.generateOrGetId(obj2)); // 1

It seems that the hashing cost can be reduced by replacing the hashing function with an object that gives a unique ID according to the reference.

from js-sdsl.

ZLY201 commented on June 8, 2024

I carefully studied the internal implementation of Js v8, I think it is impossible to make the hash table perform better than the native object or set.

The reasons are as follows:

object is super fast, 70 times faster than the hash table I optimized.
The source code of js is written by c++, array in Js are not consistent with any other language. I can't use it as well as other static languages.
I can't do some low-level operations like getting the variable address. It make my hash function so slow.
If I want to optimize further, I may need to use some functions that some browsers do not support, but object is fully compatible for all the browsers.

I decided to hold off on this problem until I find a good solution.

It's worth mentioning that while I can't make it as fast as native set, it's not impossible to get performance that is orders of magnitude close.

If the object do something by 20ms I think I can make my hash table do it wthin 100ms.

Because the current hash table does not fully refer to the Java implementation due to some initial design flaws. I'll try to solve this problem as much as possible but it may take long long time to re-design js-sdsl. The version should be 5.x.

I think the only way to optimize the performance of the hash table as soon as possible is to encapsulate the native object or set. It turns out that popular hash libraries do this. But that should be discussed.

from js-sdsl.

Optimize HashContainer until the performance is on par with ES6 about js-sdsl HOT 9 CLOSED

Comments (9)

Draft

Test

About hash function

About hash function

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent