Comments (9)
Draft
I have to saw the source code of chorme v8 engine.
To really optimize hash container we need to know the principle of Js-array.
Consider the following code:
console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = new Array(num);
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 11.60986328125 ms
console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = new Array(num);
arr[arr.length + num] = 1;
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 619.56884765625 ms
console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = {};
arr[arr.length + num] = 1;
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 81.559814453125 ms
The length only increased one but the running time became ten times over.
The reason is the array in js has two mode: fast and slow.
The fast mode will create a memory contiguous array in C++.
But the slow mode will create a HashTable
to reduce the memory.
You know all the members in js is Object
including Array
.
When you change the length of Array
directly it will be changed to a HashTable
like { 0: xxx, 1: xxx, ... }
.
But in Java
and C++
, the array is always contiguous.
So we don't need to make the reAllocate
function to expand or save the memory.
Just change the hashTable
's length to (1 << 30)
.
Test
Do a simple verification of the above theory.
// After change.
import { HashSet } from 'js-sdsl';
const testNum = 1000000;
function generateRandomString() {
const length = Math.max(Math.floor(Math.random() * 6), 1);
const base = 'a'.charCodeAt(0);
let str = '';
for (let i = 0; i < length; ++i) {
const random = Math.floor(Math.random() * 26);
str += String.fromCharCode(base + random);
}
return str;
}
const arr = [];
for (let i = 0; i < testNum; ++i) {
arr.push(generateRandomString());
arr.push(Math.random() * testNum);
arr.push(i);
}
const st = new Set();
console.time('ES6');
for (let i = 0; i < testNum; ++i) {
st.add(arr[i]);
}
console.timeEnd('ES6');
const hst = new HashSet<number>();
console.time('js-sdsl');
for (let i = 0; i < testNum; ++i) {
hst.insert(arr[i]);
}
console.timeEnd('js-sdsl');
// ES6: 552.02ms
// js-sdsl: 999.697ms
Much faster than before.
I'll do memory test later.
from js-sdsl.
About hash function
I find the source hash function for Integer
in v8.
inline uint32_t ComputeIntegerHash(uint32_t key, uint32_t seed) {
uint32_t hash = key;
hash = hash ^ seed;
hash = ~hash + (hash << 15); // hash = (hash << 15) - hash - 1;
hash = hash ^ (hash >> 12);
hash = hash + (hash << 2);
hash = hash ^ (hash >> 4);
hash = hash * 2057; // hash = (hash + (hash << 3)) + (hash << 11);
hash = hash ^ (hash >> 16);
return hash & 0x3fffffff;
}
That's easy to implement.
But the hash function of object
it use the address of the pointer.
int DescriptorLookupCache::Hash(Object* source, Name* name) {
// Uses only lower 32 bits if pointers are larger.
uint32_t source_hash =
static_cast<uint32_t>(reinterpret_cast<uintptr_t>(source)) >>
kPointerSizeLog2;
uint32_t name_hash = name->hash_field();
return (source_hash ^ name_hash) % kLength;
}
I have no idea how to implement it.
It seems like we have no way to reduce the time to get the hash code.
from js-sdsl.
- before
=================================== HashSet ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │ testFunc │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│ 0 │ 'constructor' │ 1 │ 1000000 │ 1537 │
│ 1 │ 'insert' │ 1000000 │ 2000000 │ 5486 │
│ 2 │ 'find' │ 2000000 │ 2000000 │ 2380 │
│ 3 │ 'eraseElementByKey' │ 2000000 │ 2000000 │ 1703 │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== HashMap ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │ testFunc │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│ 0 │ 'constructor' │ 1 │ 1000000 │ 2308 │
│ 1 │ 'setElement' │ 1000000 │ 1000000 │ 4395 │
│ 2 │ 'getElementByKey' │ 1000000 │ 1000000 │ 3872 │
│ 3 │ 'eraseElementByKey' │ 1000000 │ 1000000 │ 4138 │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== Report ===================================
- after
=================================== HashSet ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │ testFunc │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│ 0 │ 'constructor' │ 1 │ 1000000 │ 1064 │
│ 1 │ 'insert' │ 1000000 │ 2000000 │ 1140 │
│ 2 │ 'find' │ 2000000 │ 2000000 │ 4839 │
│ 3 │ 'eraseElementByKey' │ 2000000 │ 2000000 │ 2911 │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== HashMap ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │ testFunc │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│ 0 │ 'constructor' │ 1 │ 1000000 │ 988 │
│ 1 │ 'setElement' │ 1000000 │ 1000000 │ 271 │
│ 2 │ 'getElementByKey' │ 1000000 │ 1000000 │ 1205 │
│ 3 │ 'eraseElementByKey' │ 1000000 │ 1000000 │ 863 │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== Report ===================================
from js-sdsl.
I find that slow mode for Array
is really slow.
I think we should find a new way to solve this problem.
The best idea I consider is that change hashTable
type to Node<K, V>[]
like java.
from js-sdsl.
The source code of ES6 Set: https://codereview.chromium.org/220293002/diff/20001/src/objects.cc.
from js-sdsl.
About hash function
I find the source hash function for
Integer
in v8.inline uint32_t ComputeIntegerHash(uint32_t key, uint32_t seed) { uint32_t hash = key; hash = hash ^ seed; hash = ~hash + (hash << 15); // hash = (hash << 15) - hash - 1; hash = hash ^ (hash >> 12); hash = hash + (hash << 2); hash = hash ^ (hash >> 4); hash = hash * 2057; // hash = (hash + (hash << 3)) + (hash << 11); hash = hash ^ (hash >> 16); return hash & 0x3fffffff; }That's easy to implement.
But the hash function of
object
it use the address of the pointer.int DescriptorLookupCache::Hash(Object* source, Name* name) { // Uses only lower 32 bits if pointers are larger. uint32_t source_hash = static_cast<uint32_t>(reinterpret_cast<uintptr_t>(source)) >> kPointerSizeLog2; uint32_t name_hash = name->hash_field(); return (source_hash ^ name_hash) % kLength; }I have no idea how to implement it.
It seems like we have no way to reduce the time to get the hash code.
Wouldn't it be possible to create a hash value for a reference or something like that by putting the references in the JS set and giving them an ID?
from js-sdsl.
Wouldn't it be possible to create a hash value for a reference or something like that by putting the references in the JS set and giving them an ID?
Do you mean { value: xxx, id: xxx }
?
It will use object {}
and it's a hash container too.
from js-sdsl.
Wouldn't it be possible to create a hash value for a reference or something like that by putting the references in the JS set and giving them an ID?
Do you mean
{ value: xxx, id: xxx }
?It will use object
{}
and it's a hash container too.
const ObjectIdAllocator = new class ObjectIdAllocator {
private _nextId = 0;
private _map = new WeakMap<object, number>();
public generateOrGetId(obj: object): number {
let id = this._map.get(obj);
if (id === undefined) {
id = this._nextId;
this._nextId += 1;
this._map.set(obj, id);
}
return id;
}
}
const obj1 = {
a: 1,
b: 2
};
const obj2 = {
a: 1,
b: 2
};
console.log(ObjectIdAllocator.generateOrGetId(obj1)); // 0
console.log(ObjectIdAllocator.generateOrGetId(obj2)); // 1
It seems that the hashing cost can be reduced by replacing the hashing function with an object that gives a unique ID according to the reference.
from js-sdsl.
I carefully studied the internal implementation of Js v8, I think it is impossible to make the hash table perform better than the native object
or set
.
The reasons are as follows:
object
is super fast, 70 times faster than the hash table I optimized.- The source code of
js
is written byc++
,array
in Js are not consistent with any other language. I can't use it as well as other static languages. - I can't do some low-level operations like getting the variable address. It make my hash function so slow.
- If I want to optimize further, I may need to use some functions that some browsers do not support, but
object
is fully compatible for all the browsers.
I decided to hold off on this problem until I find a good solution.
It's worth mentioning that while I can't make it as fast as native set
, it's not impossible to get performance that is orders of magnitude close.
If the object
do something by 20ms
I think I can make my hash table do it wthin 100ms
.
Because the current hash table does not fully refer to the Java implementation due to some initial design flaws. I'll try to solve this problem as much as possible but it may take long long time to re-design js-sdsl
. The version should be 5.x
.
I think the only way to optimize the performance of the hash table as soon as possible is to encapsulate the native object
or set
. It turns out that popular hash libraries do this. But that should be discussed.
from js-sdsl.
Related Issues (20)
- Find a way to do bench for database connection
- Memory complexity? HOT 12
- Export default for isolatation package HOT 9
- iterator access denied?why? HOT 3
- [RFC] About the version 5.x
- ESM build has classes downleveled to functions, but CJS build does not HOT 3
- Find a new way to do publish HOT 4
- Question: OrderedSet and comparison function HOT 4
- OrderedSet with less than 3 items reverse iteration infinite loop HOT 4
- OrderedMapIterator.pointer does not support array destructuring assignment (TypeError: it.pointer is not iterable) HOT 3
- `isAccessible` property for iterators? HOT 2
- Queue is empty, but the last value still exists HOT 2
- Standardize HashContainer HOT 1
- Reduce packaging size HOT 15
- bug: get wrong tree index when tree size is 1 HOT 1
- Make heap stronger HOT 1
- Write examples for all apis
- Error: Cannot find module './Base' HOT 2
- Display version selection on the home page of gh-pages HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from js-sdsl.