konsumlamm / rrb-vector Goto Github PK
View Code? Open in Web Editor NEWAn implementation of a Relaxed Radix Balanced Vector in Haskell.
License: BSD 3-Clause "New" or "Revised" License
An implementation of a Relaxed Radix Balanced Vector in Haskell.
License: BSD 3-Clause "New" or "Revised" License
It would be useful to have sorting utilities that function directly on rrb vectors so conversions between types to get sorting aren't necessary; sortBy at the very least would be useful.
Currently, replicate n x
can be implemented as fromList (replicate n x)
(using replicate
for lists).
However, this can be optimized by sharing as many nodes as possible, avoiding unnecessary allocations. This would not only be faster than using fromList
, but also make the resulting vector consume less memory.
RRB trees are appealing because like vectors, they offer an even more efficient transient (as opposed to persistent) API based on unsafe mutable primitives.
Can we have such a module in this library? Plus maybe another module that wraps the unsafe API with linear types.
See also Ed Kmett's transients lib.
Why are the arrays stored in the RRB-tree nodes O(1)-sliceable?
It adds a memory cost of 2 words per node, but the benefit is not clear to me from looking through the code.
The original paper describing the tree and its Scala implementation use regular fixed-length arrays, so reading that didn't shed light on it either.
Hi, @konsumlamm, thanks for publishing rrb-vector
. I'd like to try it out as a replacement for Seq
s in a few projects I'm working on. When working with Seq
s, I find using the bundled pattern synonyms quite convenient. Would you be interested in adding similar pattern synonyms for Vector
(I didn't see any in the haddocks)? If you think it might be a good choice, I could try to implement them and open a PR. Thanks!
It would be nice to define unzip
simply as unzipWith id
, but we need to make sure that this doesn't decrease its performance.
I would like to add some tests that check that structure invariants of the tree are not violated, if that sounds ok.
Functions <|
, ><
, insertAt
, deleteAt
seem to be violating invariants.
Originally proposed by u/sansboarders on reddit.
Using unboxed arrays in the tree structure would probably improve performance (note that I haven't done any benchmarks yet, so I'm not sure). There are a few different "kinds of unboxed values" to consider:
I don't know what the differences (in performance and ergonomics) between these three are or if it's possible to use these for the inner nodes (instead of only the leafs), so input on this is welcome.
One thing I'd like to avoid is duplicating too much code for this. One solution would be to implement a common typeclass, although this would probably make things slower (unless everything is specialized?) and type inference worse, so I'm not sure it's worth the trouble.
Supporting the appropriate type-classes from lens
like Seq
does could make transitioning from Seq
to Vector
as simple as just changing the used type constructor for code using lens
.
I would find a value-strict interface to this data structure helpful.
The proposal is to add a new module called Data.RRBVector.Strict
which exports a newtype wrapper around the existing type, and functions and typeclass instances that are all value-strict.
If this sounds OK I'd be happy to prepare a PR.
Currently, the Arbitrary1
instance in the test suite only generates balanced and a specific kind of unbalanced trees. It would be nice to have more kinds of unbalanced trees or trees resulting from applying several operations to an initially balanced tree.
rrb-vector/src/Data/RRBVector/Internal/Buffer.hs
Lines 19 to 20 in 204a643
Unpacking the MutableArray
would likely give a small performance benefit. Or better, Buffer
could use a SmallMutableArray
directly since it does not need to support slicing.
When you cons or snoc a vector that previously had its head removed, the resultant Vector has the head in it.
This is due to the copying in the cons and snoc functions ignoring the offset in the original Arrays. See here: https://hackage.haskell.org/package/rrb-vector-0.2.0.1/docs/src/Data.RRBVector.Internal.Array.html#cons
Reproduction below.
> import Data.RRBVector
> x = fromList "ab"
> viewl x
Just ('a',fromList "b")
> let Just (_, y) = viewl x
> y
fromList "b"
> 'a' <| y
fromList "aa"
For other arrays, containers, strings, etc., including Sequence that users will often be converting from, index takes the container before the Int, and adjust and adjust' take the (a -> a) function as the first argument. In addition to being consistent with these cases, I think one can argue that the standard order is more often useful for currying.
I realize this is a breaking change, but I think it'd be much better to make it now, before there are too many users, rather than wait. Users that are surprised by the change should at least get a compiler error, plus you'll obviously bump the library's version number.
Thanks very much for this library!! Sorry to nit-pick.
Your folks' great work has inspired me to think about a sparse vector/array type, that is usually (much) more space efficient than IntMap. I'm thinking of a Word64 of bits for present/missing, together with a SmallArray of the non-missing elements (for dimension <= 64) or sparse subvectors. Hopefully with bit-counting instructions for countLeadingZeros or countTrailingZeros, indexing would be pretty fast.
My application is sparse linear algebra, where vectors vary gradually in sparseness. For example, one very large case from integer factoring I believe starts with an integer (or integer mod p) matrix with around 100,000 rows, each of which averages 20-30 nonzero entries. As one does (clever) Gaussian elimination, the rows steadily get more dense. Thus I'm thinking that for most of the algorithm, this data structure would be much better than either extreme of IntMap (good for very sparse vectors), or fully dense arrays.
For linear algebra at least, the basic operations are singleton, zipWith/merge (e.g. adding sparse vectors), map, and fold. For high-performance (exact) work, it'd be nice eventually to also have these types over unboxed integers mod p where p < 2^8, 2^16, 2^32, or 2^64 respectively.
What do you think? Thanks in advance for any advice or interest, and sorry if this is somewhat off-topic for rrb-vector.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.