object88 / immutable Goto Github PK
View Code? Open in Web Editor NEWExperimental: Golang library for immutable collections
Experimental: Golang library for immutable collections
Given a hashmap, is it acceptable to have a key-value pair with a nil value?
If so, what needs to change to support the difference between a missed key and a valid key with a nil value (say, during a Get
)?
In order to ensure quality code, connect an automated test service to github. Block PR merges if tests fail, similar to merge conflicts.
The current code starts with a given content collection, and the only mutations allowed are with Filter
, Map
, and Reduce
. Should provide methods such as Set
, Replace
, etc.
Prioritized collection types:
Hashmap
List
Set
Ordered Set
Ordered Hashmap
Tree?
Given a hashmap with two elements, and a call to Remove
with a key that does not match the collection, should the method return a reference to a new hashmap with identical contents, or a pointer to the same hashmap?
Same behavior for Add, when the key and value are the same as what already exists.
There is an ancillary question of consistency -- what happens if we cannot tell for certain that a collection has or has not changed? In the case of an add, if the value is a complex object, how to we determine its equivalence? Or do we assume that if the value is a pointer, than we simply compare the pointers?
What do other immutable libraries do?
In the core iteration code, HashMap.iterate
, we use defer close(ch)
, which has been shown to add some small execution time. We should perform some benchmarks to determine whether it is sensible to be explicit about the channel closing, rather than using defer
.
Because wouldn't it be fun to use a hashmap as a key to a hashmap?
Code currently assumes that all memory allocation for hash map packed hob arrays should be done with 32-bit blocks (i.e, uint32
). Introduce implementation for 16-bit (i.e., uint16
) blocks.
Either both should be internal, or both exposed.
In a given bucket, the current code base stores all collection entries as entry
, essentially a pair of interface{}
objects. If the key or value is a simple data type, (int
, bool
, etc), storing a reference off to another memory block is a waste.
This can be optimized (for space) by detecting whether the key or value is a simple data type, and reading into and out of the memory block directly.
Ultimately, this could be implemented with a strategy for speed vs. space.
Since the options cannot change internally, we can share the options object between hash maps; the options should never change.
For small hash maps (under 1k? under 10k?), slide down to a 32bit hash value, instead of 64 bit. This might reduce memory footprint.
Shared buckets?
Given a hashmap, and a addition or deletion which does not change the number of buckets, a bucket may be shared between two hashmaps.
Note that this will impact assignment functions; if the value is changed for a key, that will also enforce changing a bucket.
Make sure that, for a collection where nil
is an acceptable value, invoking hashmap.Insert(key, nil)
will properly replace the existing value with nil
.
Given a wrapper hashmap returning the list of keys, it receives a list of unsafe.Pointer from the internal hashmap, which it then needs to translate. This can be improved by constructing one array, and passing a callback into the internal hashmap's GetKeys
to return one key at a time.
Looking forward, it is important to evaluate different strategies for memory implementation, reallocation, etc., even if for purely academic reasons The collection objects should accept a pointer to an Options
object which allows the user to select the strategy they want on a per-instance basis.
Add varients of Map
, Reduce
, etc. such that predicates are called on Goroutines.
Note: must document to not use cross-dependant predicates; cannnot use a predicate which is dependant on the completion of a prior running of the same predicate.
The given IntKey
is complete, but does not cover all usage scenarios. We should provide keys for more data types.
Examples:
StringKey
(partially done)
DateKey
FloatKey
Check for possible problematic behavior. The problem is that Key
s are interfaces, which are pointers.
Current implementation reads more like GoString
. String
should be more human-readable.
Current hashmap uses a simple struct to store key-value-pairs:
type bucket struct {
// ...
entries []entry
}
type entry struct {
key Key
value Value
}
Key and Value are both interfaces, which means that every entry
contains a pointer to the key, rather than the value of the key itself. In the case of simple values like integers, that value could be stored directly in that memory space itself (or less).
If we replace the []entry
property in bucket
with a byte array, we can read & write directly into it, and even only as much as required (perhaps reducing memory usage).
To review:
Example code; causes an issue with hashmaps with large data sets.
Width is set to 53, because large hashmaps contain significant low-order bytes, and we are attempting to use 64-bit hashes.
src := rand.NewSource(time.Now().UnixNano())
random := rand.New(src)
width := uint32(64 - 11)
max := int64(math.Pow(2.0, float64(width)))
count := uint64(8)
contents := make([]uint64, count)
for i := uint64(0); i < count; i++ {
contents[i] = uint64(random.Int63n(max))
}
m := AllocateMemories(LargeBlock, width, 8)
for k, v := range contents {
m.Assign(uint64(k), v)
}
for k, v := range contents {
result := m.Read(uint64(k))
if result != v {
t.Fatalf("At %d\nexpected %064b\nreceived %064b\n", k, v, result)
}
}
HashMaps are only one type of immutable collection object. We want to allow for an array-like list as well.
In order to prevent repeatedly creating Hashmaps with no contents, when a new hashmap is requested, if there are no contents, return a static instance.
This is what immutable.js does.
This may effect some unit tests.
Given a Hashmap
, should have function GetKeys
which returns []Key
. A nil hash map would return nil, and an empty hash map would return a 0-length array.
All tests use keys which have hits in the hashmap. It is not understood what happens when the key does not have an entry; write test cases to cover this.
If we can streamline setting options, then we can make experimentation easier and potentially enforce required settings.
References:
Design questions:
Allow for 64-bit blocks in Memory.
Code currently assumes that all memory allocation for hash map packed hob arrays should be done with 32-bit blocks (i.e, uint32
). Complete (and fix) implementation for 8-bit (i.e., byte
) blocks.
Current performance for Get
is between 2 and 7 times slower than native implementation for maps (depending on map size). There are many possible reasons, including the performance of the hashing mechanism.
The current code uses the built-in FNV32a hash methods, as supplied in the hash
package. Using the benchmark tools, we can see that...
for i := 0; i < max; i++ {
hasher := fnv.New32a()
binary.Write(hasher, binary.LittleEndian, uint32(i))
hash32 = hasher.Sum32()
}
...takes about 0.02ns / op, where max is 500,000.
Although it seems that there is little to gain here, we might find some slightly better performance by using a custom hashing mechanism.
Certain places in code refer to HashMap
, while other use Hashmap
. Pick one and clean up the codebase.
The Get
function of the hash map does not respect overflow buckets. If a hash results in more than 8 items in a bucket, then the Get
method seems to just return the 8th entry.
Given a hashmap, if we know that the number of buckets isn't changing when adding or deleting a entry, then any bucket which does not have an entry added or removed can be directly duplicated, instead of needing to recalculate hashes, etc.
Aside from LargeBlock
, SmallBlock
, etc, which pack bits together, there should be a NoPacking
option. Using this option, the memory array would be a full (uint32
?) block for every item. This may be space-wasteful, but maybe speedy.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.