Giter Club home page Giter Club logo

Comments (9)

ZLY201 avatar ZLY201 commented on June 8, 2024

Draft

I have to saw the source code of chorme v8 engine.

To really optimize hash container we need to know the principle of Js-array.

Consider the following code:

console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = new Array(num);
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 11.60986328125 ms
console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = new Array(num);
arr[arr.length + num] = 1;
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 619.56884765625 ms
console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = {};
arr[arr.length + num] = 1;
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 81.559814453125 ms

The length only increased one but the running time became ten times over.

The reason is the array in js has two mode: fast and slow.

The fast mode will create a memory contiguous array in C++.

But the slow mode will create a HashTable to reduce the memory.

You know all the members in js is Object including Array.

When you change the length of Array directly it will be changed to a HashTable like { 0: xxx, 1: xxx, ... }.

But in Java and C++, the array is always contiguous.

So we don't need to make the reAllocate function to expand or save the memory.

Just change the hashTable's length to (1 << 30).

Test

Do a simple verification of the above theory.

// After change.
import { HashSet } from 'js-sdsl';
const testNum = 1000000;
function generateRandomString() {
    const length = Math.max(Math.floor(Math.random() * 6), 1);
    const base = 'a'.charCodeAt(0);
    let str = '';
    for (let i = 0; i < length; ++i) {
        const random = Math.floor(Math.random() * 26);
        str += String.fromCharCode(base + random);
    }
    return str;
}
const arr = [];
for (let i = 0; i < testNum; ++i) {
    arr.push(generateRandomString());
    arr.push(Math.random() * testNum);
    arr.push(i);
}
const st = new Set();
console.time('ES6');
for (let i = 0; i < testNum; ++i) {
    st.add(arr[i]);
}
console.timeEnd('ES6');
const hst = new HashSet<number>();
console.time('js-sdsl');
for (let i = 0; i < testNum; ++i) {
    hst.insert(arr[i]);
}
console.timeEnd('js-sdsl');
// ES6: 552.02ms
// js-sdsl: 999.697ms

Much faster than before.

I'll do memory test later.

from js-sdsl.

ZLY201 avatar ZLY201 commented on June 8, 2024

About hash function

I find the source hash function for Integer in v8.

inline uint32_t ComputeIntegerHash(uint32_t key, uint32_t seed) {
  uint32_t hash = key;
  hash = hash ^ seed;
  hash = ~hash + (hash << 15);  // hash = (hash << 15) - hash - 1;
  hash = hash ^ (hash >> 12);
  hash = hash + (hash << 2);
  hash = hash ^ (hash >> 4);
  hash = hash * 2057;  // hash = (hash + (hash << 3)) + (hash << 11);
  hash = hash ^ (hash >> 16);
  return hash & 0x3fffffff;
}

That's easy to implement.

But the hash function of object it use the address of the pointer.

int DescriptorLookupCache::Hash(Object* source, Name* name) {
  // Uses only lower 32 bits if pointers are larger.
  uint32_t source_hash =
      static_cast<uint32_t>(reinterpret_cast<uintptr_t>(source)) >>
      kPointerSizeLog2;
  uint32_t name_hash = name->hash_field();
  return (source_hash ^ name_hash) % kLength;
}

I have no idea how to implement it.

It seems like we have no way to reduce the time to get the hash code.

from js-sdsl.

ZLY201 avatar ZLY201 commented on June 8, 2024
  • before
=================================== HashSet ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │      testFunc       │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│    0    │    'constructor'    │    1    │    1000000    │  1537   │
│    1    │      'insert'       │ 1000000 │    2000000    │  5486   │
│    2    │       'find'        │ 2000000 │    2000000    │  2380   │
│    3    │ 'eraseElementByKey' │ 2000000 │    2000000    │  1703   │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== HashMap ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │      testFunc       │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│    0    │    'constructor'    │    1    │    1000000    │  2308   │
│    1    │    'setElement'     │ 1000000 │    1000000    │  4395   │
│    2    │  'getElementByKey'  │ 1000000 │    1000000    │  3872   │
│    3    │ 'eraseElementByKey' │ 1000000 │    1000000    │  4138   │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== Report ===================================
  • after
=================================== HashSet ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │      testFunc       │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│    0    │    'constructor'    │    1    │    1000000    │  1064   │
│    1    │      'insert'       │ 1000000 │    2000000    │  1140   │
│    2    │       'find'        │ 2000000 │    2000000    │  4839   │
│    3    │ 'eraseElementByKey' │ 2000000 │    2000000    │  2911   │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== HashMap ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │      testFunc       │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│    0    │    'constructor'    │    1    │    1000000    │   988   │
│    1    │    'setElement'     │ 1000000 │    1000000    │   271   │
│    2    │  'getElementByKey'  │ 1000000 │    1000000    │  1205   │
│    3    │ 'eraseElementByKey' │ 1000000 │    1000000    │   863   │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== Report ===================================

from js-sdsl.

ZLY201 avatar ZLY201 commented on June 8, 2024

I find that slow mode for Array is really slow.

I think we should find a new way to solve this problem.

The best idea I consider is that change hashTable type to Node<K, V>[] like java.

from js-sdsl.

ZLY201 avatar ZLY201 commented on June 8, 2024

The source code of ES6 Set: https://codereview.chromium.org/220293002/diff/20001/src/objects.cc.

from js-sdsl.

noname0310 avatar noname0310 commented on June 8, 2024

About hash function

I find the source hash function for Integer in v8.

inline uint32_t ComputeIntegerHash(uint32_t key, uint32_t seed) {
  uint32_t hash = key;
  hash = hash ^ seed;
  hash = ~hash + (hash << 15);  // hash = (hash << 15) - hash - 1;
  hash = hash ^ (hash >> 12);
  hash = hash + (hash << 2);
  hash = hash ^ (hash >> 4);
  hash = hash * 2057;  // hash = (hash + (hash << 3)) + (hash << 11);
  hash = hash ^ (hash >> 16);
  return hash & 0x3fffffff;
}

That's easy to implement.

But the hash function of object it use the address of the pointer.

int DescriptorLookupCache::Hash(Object* source, Name* name) {
  // Uses only lower 32 bits if pointers are larger.
  uint32_t source_hash =
      static_cast<uint32_t>(reinterpret_cast<uintptr_t>(source)) >>
      kPointerSizeLog2;
  uint32_t name_hash = name->hash_field();
  return (source_hash ^ name_hash) % kLength;
}

I have no idea how to implement it.

It seems like we have no way to reduce the time to get the hash code.

Wouldn't it be possible to create a hash value for a reference or something like that by putting the references in the JS set and giving them an ID?

from js-sdsl.

ZLY201 avatar ZLY201 commented on June 8, 2024

Wouldn't it be possible to create a hash value for a reference or something like that by putting the references in the JS set and giving them an ID?

Do you mean { value: xxx, id: xxx }?

It will use object {} and it's a hash container too.

from js-sdsl.

noname0310 avatar noname0310 commented on June 8, 2024

Wouldn't it be possible to create a hash value for a reference or something like that by putting the references in the JS set and giving them an ID?

Do you mean { value: xxx, id: xxx }?

It will use object {} and it's a hash container too.

const ObjectIdAllocator = new class ObjectIdAllocator {
    private _nextId = 0;
    private _map = new WeakMap<object, number>();

    public generateOrGetId(obj: object): number {
        let id = this._map.get(obj);
        if (id === undefined) {
            id = this._nextId;
            this._nextId += 1;
            this._map.set(obj, id);
        }
        return id;
    }
}

const obj1 = {
    a: 1,
    b: 2
};

const obj2 = {
    a: 1,
    b: 2
};

console.log(ObjectIdAllocator.generateOrGetId(obj1)); // 0
console.log(ObjectIdAllocator.generateOrGetId(obj2)); // 1

It seems that the hashing cost can be reduced by replacing the hashing function with an object that gives a unique ID according to the reference.

from js-sdsl.

ZLY201 avatar ZLY201 commented on June 8, 2024

I carefully studied the internal implementation of Js v8, I think it is impossible to make the hash table perform better than the native object or set.

The reasons are as follows:

  1. object is super fast, 70 times faster than the hash table I optimized.
  2. The source code of js is written by c++, array in Js are not consistent with any other language. I can't use it as well as other static languages.
  3. I can't do some low-level operations like getting the variable address. It make my hash function so slow.
  4. If I want to optimize further, I may need to use some functions that some browsers do not support, but object is fully compatible for all the browsers.

I decided to hold off on this problem until I find a good solution.

It's worth mentioning that while I can't make it as fast as native set, it's not impossible to get performance that is orders of magnitude close.

If the object do something by 20ms I think I can make my hash table do it wthin 100ms.

Because the current hash table does not fully refer to the Java implementation due to some initial design flaws. I'll try to solve this problem as much as possible but it may take long long time to re-design js-sdsl. The version should be 5.x.

I think the only way to optimize the performance of the hash table as soon as possible is to encapsulate the native object or set. It turns out that popular hash libraries do this. But that should be discussed.

from js-sdsl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.