Giter Club home page Giter Club logo

bifurcan's Introduction

<dependency>
  <groupId>io.lacuna</groupId>
  <artifactId>bifurcan</artifactId>
  <version>0.2.0-alpha4</version>
</dependency>

This library provides high-quality Java implementations of mutable and immutable data structures, each sharing a common API and these design principles:

  • efficient random access
  • efficient inverted indices (all collections other than lists provide an indexOf method)
  • efficient splitting and merging of collections
  • customizable equality semantics
  • contiguous memory used wherever possible
  • performance equivalent to, or better than, existing alternatives
  • changes to a collection can be tracked in a diff data structure, which can be subsequently rebased onto a different collection
  • [ALPHA] durable (disk-backed) representations which share the API and asymptotic performance of their in-memory counterparts

Rather than using the existing collection interfaces in java.util such as List or Map, it provides its own interfaces (IList, IMap, ISet) that provide functional semantics - each update to a collection returns a reference to a new collection. Each interface provides a method (toList, toMap, toSet) for coercing the collection to a read-only version of the standard Java interfaces.

what makes this better?

Some aspects of this library, like the inverted indices, diffs, and durable collections, are unique.

There are, however, many existing implementations of "functional" (aka persistent, immutable) data structures on the JVM. As shown in these in-depth comparisons, Bifurcan's performance is equivalent to the best existing implementations for basic operations, and significantly better for batch operations such as union, intersection, and difference.

These optimized batch operations require a high degree of complexity and are difficult to test, so it's understandable other library authors haven't bothered. Bifurcan relies on extensive generative tests to validate its implementation, which makes this additional complexity easier to manage.

collections

  • LinearMap is a mutable hash-map, which allows for custom hash and equality semantics. It stores entries contiguously in memory, which means that iteration over the entries can be 20x faster than java.util.HashMap for larger collections.
  • Map is an immutable hash-map, which also allows for custom hash and equality semantics. It ensures that all equivalent collections have an equivalent layout in memory, which makes checking for equality and performing set operations (merge, union, difference, intersection) significantly faster.
  • LinearSet and Set are built atop their respective map implementations, and have similar properties.
  • LinearList is a mutable list, which allows for elements to be added or removed from both ends of the collection, and allows random reads and writes within the list.
  • List is an immutable list, which also allows for modification at both ends, as well as random reads and writes. Due to its relaxed radix structure, it also allows for near constant-time slices and concatenation.
  • SortedMap is an immutable sorted map, built with a red-black tree.
  • IntMap is an immutable sorted map of integers onto arbitrary values, and can be used as an efficient sparse vector. FloatMap provides similar functionality for floating-point keys.
  • Rope is an immutable tree-based sequence of Unicode characters. Unlike Java's String, it uses UTF-8 encoding and can efficiently index via both full code points and Java's preferred UTF-16 code units.
  • Graph, DirectedGraph, and DirectedAcyclicGraph implementations, which provide immutable graph data structures.

Full documentation can be found here.

'linear' and 'forked' collections

If we pass a mutable data structure into a function, we have to know what that function intends to do with it. If it updates the data structure, we cannot safely read from it afterwards. If it stores the data structure, we cannot safely write to it afterwards. In other words, to use a mutable data structure safely we must ensure it has a single owner. Enforcing this may require us to hold a large amount of code in our head at once.

Immutable data structures free us from having to care. Functions can update or hold onto data without any risks. We can reason locally about the flow of data, without any care for what the rest of the code is doing. This can be enormously freeing.

This freedom, however, comes at a cost. Updates to immutable data structures require a subset of the structure to be copied, which is much more expensive than simply overwriting existing memory.

If a data structure is referenced in multiple places, this is usually a cost worth paying. In this case, however, it's just wasteful:

Set<Long> set = new Set<>();
for (int i = 0; i < 1000; i++) {
  set = set.add(i);
}

This will create 999 intermediate copies of the set, none of which we care about. There is only a single reference to these copies, and each is discarded as soon as add() is called. The dataflow of these calls form a simple, linear chain. To have more than one reference, the dataflow must diverge.

Where the dataflow is linear, we can safely use mutable collections. Where it is not, we prefer to use immutable collections. Since this linear flow is a local property, we would also like mutability to be a local property:

Set<Long> set = new Set<>().linear();
for (int i = 0; i < 1000; i++) {
  set.add(i);
}
set = set.forked();

The call to linear() indicates that the collection has a single owner, and may be updated in-place. The call to forked() indicates that this is no longer true. By allowing temporary mutability, we gain huge performance benefits. There is still a cost, however, relative to purely mutable data structures. For this reason, Bifurcan provides permanently linear variants of each collection:

LinearSet<Long> set = new LinearSet<>();
for (int i = 0; i < 1000; i++) {
  set.add(i);
}

If we call forked() on this collection, it will be wrapped in a diff facade, which is described below.

virtual collections

Bifurcan offers a variety of collection implementations, but you can also create your own by implementing a handful of methods.

A list, at its base, is just a size and a function that, given an index, returns the corresponding element. This can be constructed using the Lists.from method:

IList<Long> list = Lists.from(1_000_000, i -> i);

This creates a list of the numbers within [0, 1e6) without any of the elements being stored in memory. All of the other operations associated with lists (adding and removing elements, updating elements, concatenating other lists, and so on) have efficient default implementations, which will be discussed in the next section.

An unsorted set is just a list of elements, plus a function that, given an value, returns an OptionalLong describing the index of that element:

Function<Long, OptionalLong> indexOf = n -> (0 <= n && n < list.size()) ? OptionalLong.of(i) : OptionalLong.empty();

ISet<Long> set = Sets.from(list, indexOf)

A sorted set, conversely, is a list of elements, a comparator, and a function that, given a value, returns an OptionalLong describing the index of the closest element which equal to or less than that value (referred to as the "floor index"):

Function<Double, OptionalLong> floorIndexOf = n -> indexOf.apply((long) n);

ISet<Double> sortedSet = Sets.from(list, Comparator.naturalOrder(), floorIndexOf);

Sorted and unsorted maps are just their corresponding sets, plus a function from key to value. These can be constructed using Maps.from, or by calling zip on a set:

IMap<Long, Double> squareRoots = set.zip(n -> Math.sqrt(n))

diffs

These virtual collections can be modified just like any other Bifurcan collection:

Lists.from(1, x -> 1).addLast(42)
// [1, 42]

This is made possible by diffs, which track changes on an immutable underlying collection. Diff implementations exists for all variants of Bifurcan collections, and share the asymptotic performance of their normal counterparts. By calling diff() on any collection, we create a diff wrapper whose changes can then be rebased onto a new underlying collection:

IList<Integer> numDiff = List.of(1, 2, 3).diff().removeFirst().addLast(42)
// [2, 3, 42]

IList<Integer> rebased = numDiffs.rebase(List.of(4, 5, 6))
// [5, 6, 42]

durable collections

All in-memory structures can be also saved to disk, while retaining the same API and asymptotic performance. These durable collections are optimized for reads and batched writes, which means they are not a replacement for general-purpose databases, but they are still useful in a variety of applications.

no lazy collections

Most libraries for "functional programming" provide a lazy list or stream construct, but this is less a data structure than a mechanism for control flow. While useful, such a mechanism is out of scope for this library.

splitting and merging

Many modern data structure libraries also provide "parallel" collections, which make it easy to use multiple cores to process a single data structure. These collections, however, are simply normal data structures with an execution model bolted on, without any obvious way to disentangle the two.

Rather than provide its own execution model, Bifurcan allows any collection to split into sub-collections using split(k), which will return approximately k pieces, depending on its size. The sub-collections can be processed and then merged using methods such as concat, merge, union, intersection, or difference.

This separation of concerns provides greater flexibility, but also requires more work to get up and running.

license

Copyright ยฉ 2016-2020 Zachary Tellman

Distributed under the MIT License

bifurcan's People

Contributors

aphyr avatar fehrenbach avatar gknauth avatar jafingerhut avatar mburyakov avatar mikeb-btw avatar pangloss avatar rgkirch avatar zajac avatar zolotov avatar ztellman avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

bifurcan's Issues

Add ability to create subclasses

I'm implementing a toy lisp based on kanaka/mal using bifurcan, and I've come across two cases where it would be useful to be able to create subclasses of the bifurcan structures. These are difficult to create since the methods create new instances of the structures and there's no way to create a subclass when that happens.

  1. MAL requires me to support Clojure-style lists and vectors. I'd like to use bifurcan's List for both since it supports the efficient operations of both, but I need to be able to record which was read (i.e. list or vector) so that I can later print it correctly. There's no really good way to do this at the moment. I ended up creating a Vector class implementing IList which delegates to an internal List, and for the methods which return a new List I then wrap them in a new Vector, but that's pretty ugly.
  2. I'd like to implement something like Clojure's IObj protocol to allow metadata to be stored on the collections too, but that's equally tricky. I could probably implement some equally ugly workaround, but a nicer way to do this would be great.

I've only looked at List so far, but it seems like the operations create new instances via the private List(boolean linear, Node root, int prefixLen, Object[] prefix, int suffixLen, Object[] suffix) constructor. Could the new instances instead be created by calling a protected method which in List just calls that constructor, but in a subclass could create a Vector, IObjList or whatever?

I'm not sure what the implications are for other interfaces such as Concat though.

SortedMap should have the possibility to supply a custom `Comparator`

SortedMap already has a field comparator, but is is always set to Comparator.naturalOrder().

There is also the private constructor SortedMap(Node<K, V> root, boolean linear, Comparator<K> comparator) that is used by SortedMap internally - although it can also be invoked with reflection magic. Tests show that it seems to work fine when supplying a custom Comparator.

But there should be an "official" possibility.

SortedMap should make use of covariant return types.

The following methods should have their return type adapted.

  • SortedMap#remove(K) should return SortedMap<K, V>
  • SortedMap#put(K, V) should return SortedMap<K, V>
  • SortedMap#put(K, V, BinaryOperator) should return SortedMap<K, V>

Currently they always need to be re-casted to (SortedMap(K, V)) which makes the code a bit ugly.

DAG bottoms seem to include extra vertices

I get the impression that DAG bottom sets should be just those vertices with no out edges. However, it looks like they also include nodes in the middle:

(let [g (-> (g/directed-acyclic-graph)
              (g/link :a1 :b)
              (g/link :a2 :b)
              (g/link :b  :c1)
              (g/link :b  :c2)
              (g/link :a3 :c3))]
    (testing "top"
      (is (= #{:a1 :a2 :a3} (datafy (g/top g)))))
    (testing "bottom"
      ; Is the presence of b here a bug, or expected behavior?
      (is (= #{:b :c1 :c2 :c3} (datafy (g/bottom g)))))))

Forked set is modified after baked linear set modification

Hi, thank you for the great library!
I've found out that the forked set can be modified in case the initial linear set gets any modifications.
This behaviour doesn't reproduce for lists.
Here is the code example:

import io.lacuna.bifurcan.List;
import io.lacuna.bifurcan.Set;

public class Main {
    public static void main(String[] args) {
        Set<Integer> linearSet = new Set<Integer>().linear();
        linearSet.add(1);
        Set<Integer> forkedSet = linearSet.forked();
        linearSet.add(2);

        System.out.println("Linear set: " + linearSet);
        System.out.println("Forked set: " + forkedSet);

        List<Integer> linearList = new List<Integer>().linear();
        linearList.addLast(1);
        List<Integer> forkedList = linearList.forked();
        linearList.addLast(2);

        System.out.println("Linear list: " + linearList);
        System.out.println("Forked list: " + forkedList);
    }
}

And here is the produced output:

Linear set: {1, 2}
Forked set: {1, 2}
Linear list: [1, 2]
Forked list: [1]

I'd expect that the "Forked set" remains as {1}.

Here is the full project code: https://github.com/AlexPl292/distraught-chimera

Upscaling/Downscaling between the Bifurcan List and the Clojure Vector

I've noticed that there's quite a lot of similarities between the two structures. Is there a way to have a hybrid structure that can upscale the simpler clojure vector if need be to an RRB Tree?

This will get rid of a problem that's always bothered me:

(conj (cons [...] 1) 2) => (2, 1, ...)  ;; which is a Linked List, not a vector

will then be:

(conj (cons [...] 1) 2) => [1 ... 2]  ;; which is an upscaled Bifurcan List

Possible NPE in MapNodes#put

There appears to be a null pointer exception that occurs at this line

long prevSize = child.size();

The rest of the NPE stacktrace

! java.lang.NullPointerException: null
! at io.lacuna.bifurcan.nodes.MapNodes$Node.put(MapNodes.java:193) ~[bifurcan-0.1.0.jar:na]
! at io.lacuna.bifurcan.Map.put(Map.java:139) ~[bifurcan-0.1.0.jar:na]
! at io.lacuna.bifurcan.Map.put(Map.java:135) ~[bifurcan-0.1.0.jar:na]
! at io.lacuna.bifurcan.Map.put(Map.java:130) ~[bifurcan-0.1.0.jar:na]
! at io.lacuna.bifurcan.Map.put(Map.java:24) ~[bifurcan-0.1.0.jar:na]

I see that my project is using an older version of Bifurcan, but judging from my cursory read of the MapNodes implementation, it seems the NPE is still possible?

I suspect the NPE behavior is also non-deterministic. In my testing environment, I had four separate processes populating bifurcan.Map with data from the same source, but only one resulted in the NPE mentioned above.

Unfortunately, I don't have more precise reproduction steps for you. But will update here if I end up finding a test that results in the NPE.

Some List RRB trees constructed deeper than necessary

So far I do not know that this is a correctness issue, only a minor performance issue from evidence I have now. Starting from an empty list and appending elements, some trees are constructed with 1 extra level of depth/height than necessary.

$ clj "-J-Dclojure.server.repl={:port 50505 :accept clojure.core.server/repl}" -Sdeps '{:deps {io.lacuna/bifurcan {:mvn/version "0.2.0-alpha1"}}}'
(import '(io.lacuna.bifurcan List))

;; (capac s) is the maximum number of vector elements that can be
;; descendants beneath a tree node with shift s.
(defn capac [shift]
  (bit-shift-left 1 (+ shift 5)))

;; make private root field accessible
(def list-class (class (List.)))
(def list-root-field (.getDeclaredField list-class "root"))
(.setAccessible list-root-field true)

(defn get-root [v]
  (.get list-root-field v))

(defn listnode? [x]
  (instance? io.lacuna.bifurcan.nodes.ListNodes$Node x))

(defn descendant-depths [n]
  (loop [remaining-nodes [[0 n]]
         depth->count {}]
    (if (zero? (count remaining-nodes))
      depth->count
      (let [[depth node] (peek remaining-nodes)
            depth->count (update depth->count depth (fnil inc 0))
            children (if (listnode? node)
                       (take (.numNodes node) (seq (.nodes node))))]
        (recur (into (pop remaining-nodes)
                     (map (fn [x] [(inc depth) x]) children))
               depth->count)))))

(defn list-n [n]
  (reduce (fn [v e] (.addLast v e))
          (List.)
          (range n)))

(def n1 (+ (capac 15) (- (capac 10)) (capac 5) (capac 0)))
(def pv (list-n n1))
(descendant-depths (get-root pv))
;; {0 1, 1 1, 2 32, 3 994, 4 31777}

For a List with fewer than (capac 15) elements, I would expect a tree
with shift 15 at the root, but this is the smallest value of n I found
that causes the tree to go to shift 20 at the root, a bit too early.

As far as I can tell so far this is not a correctness issue -- nth
returns the correct value for all indices, for example. It is a
performance issue.

I have not tried changes to the code yet to modify this behavior, but
I believe it is because the condition to add a new level of depth to
the tree in method pushLast only checks that the bottom-most node has
the maximum branching factor, and the root node does, but not that all
of the other nodes on the "right tree edge" have the maximum branching
factor. Thus the code can add a new level of depth to the tree when
an intermediate depth node on the "right tree edge" has only 1 node.

(def wrong-count (atom 0))
@wrong-count
(dotimes [i n1]
  (when (not= (.nth pv i) i)
    (swap! wrong-count inc)))
@wrong-count
;; 0 - good

Build errors

I tried to build a jar with lein and I get this error:

sputnik% lein jar
Compiling 26 source files to /home/james/code/forks/java/bifurcan/target/classes
/home/james/code/forks/java/bifurcan/src/io/lacuna/bifurcan/utils/Iterators.java:3: error: package clojure.lang does not exist
import clojure.lang.Obj;
                   ^
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 error
Compilation of Java sources(lein javac) failed.

So I tried your benchmark alias to see if something else worked and I get this:

sputnik% lein benchmark
Retrieving criterium/criterium/0.4.3/criterium-0.4.3.pom from clojars
Retrieving proteus/proteus/0.1.6/proteus-0.1.6.pom from clojars
Retrieving eftest/eftest/0.1.4/eftest-0.1.4.pom from clojars
Retrieving progrock/progrock/0.1.2/progrock-0.1.2.pom from clojars
Retrieving io/aviso/pretty/0.1.33/pretty-0.1.33.pom from clojars
Retrieving criterium/criterium/0.4.3/criterium-0.4.3.jar from clojars
Retrieving progrock/progrock/0.1.2/progrock-0.1.2.jar from clojars
Retrieving eftest/eftest/0.1.4/eftest-0.1.4.jar from clojars
Retrieving io/aviso/pretty/0.1.33/pretty-0.1.33.jar from clojars
Retrieving proteus/proteus/0.1.6/proteus-0.1.6.jar from clojars
Compiling 26 source files to /home/james/code/forks/java/bifurcan/target/classes
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
benchmarking LinearMap
Exception in thread "main" java.lang.NullPointerException, compiling:(/tmp/form-init9040329292974367555.clj:1:73)
	at clojure.lang.Compiler.load(Compiler.java:7391)
	at clojure.lang.Compiler.loadFile(Compiler.java:7317)
	at clojure.main$load_script.invokeStatic(main.clj:275)
	at clojure.main$init_opt.invokeStatic(main.clj:277)
	at clojure.main$init_opt.invoke(main.clj:277)
	at clojure.main$initialize.invokeStatic(main.clj:308)
	at clojure.main$null_opt.invokeStatic(main.clj:342)
	at clojure.main$null_opt.invoke(main.clj:339)
	at clojure.main$main.invokeStatic(main.clj:421)
	at clojure.main$main.doInvoke(main.clj:384)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.lang.Var.invoke(Var.java:383)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.main.main(main.java:37)
Caused by: java.lang.NullPointerException
	at java.io.StringReader.<init>(StringReader.java:50)
	at clojure.lang.RT.readString(RT.java:1834)
	at clojure.lang.RT.readString(RT.java:1830)
	at clojure.core$read_string.invokeStatic(core.clj:3687)
	at clojure.core$read_string.invoke(core.clj:3677)
	at bifurcan.benchmark_test$benchmark_collection.invokeStatic(benchmark_test.clj:602)
	at bifurcan.benchmark_test$benchmark_collection.invoke(benchmark_test.clj:601)
	at bifurcan.benchmark_test$_main$fn__6660.invoke(benchmark_test.clj:625)
	at clojure.core$map$fn__4785.invoke(core.clj:2644)
	at clojure.lang.LazySeq.sval(LazySeq.java:40)
	at clojure.lang.LazySeq.seq(LazySeq.java:49)
	at clojure.lang.RT.seq(RT.java:521)
	at clojure.core$seq__4357.invokeStatic(core.clj:137)
	at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:24)
	at clojure.core.protocols$fn__6738.invokeStatic(protocols.clj:75)
	at clojure.core.protocols$fn__6738.invoke(protocols.clj:75)
	at clojure.core.protocols$fn__6684$G__6679__6697.invoke(protocols.clj:13)
	at clojure.core$reduce.invokeStatic(core.clj:6545)
	at clojure.core$into.invokeStatic(core.clj:6610)
	at clojure.core$into.invoke(core.clj:6604)
	at bifurcan.benchmark_test$_main.invokeStatic(benchmark_test.clj:626)
	at bifurcan.benchmark_test$_main.doInvoke(benchmark_test.clj:609)
	at clojure.lang.RestFn.invoke(RestFn.java:410)
	at clojure.lang.Var.invoke(Var.java:379)
	at user$eval5.invokeStatic(form-init9040329292974367555.clj:1)
	at user$eval5.invoke(form-init9040329292974367555.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:6927)
	at clojure.lang.Compiler.eval(Compiler.java:6917)
	at clojure.lang.Compiler.load(Compiler.java:7379)
	... 14 more
sputnik% java -version 
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (Zulu 8.20.0.5-linux64) (build 1.8.0_121-b15)
OpenJDK 64-Bit Server VM (Zulu 8.20.0.5-linux64) (build 25.121-b15, mixed mode)

bug in List removeLast method

The value of n shown below is the smallest one I know of that exhibits this problem.

One issue with generative testing and data structures like List with wide branching factors is that it can take a large number of operations to reach "interesting" tree structures that exercise new code paths. I have found that making a local change with SHIFT_INCREMENT=2 and thus MAX_BRANCHES=(1 << 2) = 4 causes the existing generative tests to find issues like this in a fairly short period of time.

That requires a few replacements of literal constants 5 and 32 in the List.java and ListNodes.java source files to be replaced with appropriate names, to avoid replacing them manually in multiple places. I can submit a PR with such changes if that would be of interest.

$ clj -Sdeps '{:deps {io.lacuna/bifurcan {:mvn/version "0.2.0-alpha1"} org.clojure/clojure {:mvn/version "1.10.1"}}}'
(import '(io.lacuna.bifurcan List))
(def n 33824)
(def l1 (.concat (List.) (List/from (range n))))
(= (seq (range n)) (iterator-seq (.iterator l1)))
;; true.  Good
(.size l1)
;; 33824 - the expected size

(def l2 (.removeLast l1))
(= (seq (range (dec n))) (iterator-seq (.iterator l2)))
;; false.  bug!
(.size l2)
;; 1055 - not even close to the correct value

concat then slice throws exception with some List instances

A smaller example was found via generative testing with List code locally modified with SHIFT_INCREMENT=2 MAX_BRANCH=(1 << 2)=4. Then I tried the scaled up version with a similar tree structure and got the same error.

Everything goes well up until the final slice method call.

$ clj -Sdeps '{:deps {io.lacuna/bifurcan {:mvn/version "0.2.0-alpha1"} org.clojure/clojure {:mvn/version "1.10.1"}}}'
Clojure 1.10.1
user=> (import '(io.lacuna.bifurcan List))
io.lacuna.bifurcan.List
user=> (def b 32)
#'user/b
user=> (def n (+ (* b b b) (- (* b b)) b))
#'user/n
user=> n
31776
(defn same-seq [a b]
  (= (seq a) (seq b)))
#'user/same-seq
user=> (def r1 (range 2))
#'user/r1
user=> (def r2 (range 2 (+ 2 n)))
#'user/r2
user=> (def l1 (.concat (List.) (List/from r1)))
#'user/l1
user=> (def l2 (.concat (List.) (List/from r2)))
#'user/l2
user=> (same-seq r1 l1)
true
user=> (same-seq r2 l2)
true
user=> (def l3 (.concat l1 l2))
#'user/l3
user=> (def r3 (concat r1 r2))
#'user/r3
user=> (same-seq r3 l3)
true
user=> (def l4 (.slice l3 1 3))
Execution error (ArrayIndexOutOfBoundsException) at io.lacuna.bifurcan.nodes.ListNodes$Node/pushLast (ListNodes.java:364).
-1
user=> (pst)
ArrayIndexOutOfBoundsException -1
	io.lacuna.bifurcan.nodes.ListNodes$Node.pushLast (ListNodes.java:364)
	io.lacuna.bifurcan.nodes.ListNodes.pushLast (ListNodes.java:34)
	io.lacuna.bifurcan.nodes.ListNodes$Node.slice (ListNodes.java:240)
	io.lacuna.bifurcan.List.slice (List.java:228)
	sun.reflect.NativeMethodAccessorImpl.invoke0 (NativeMethodAccessorImpl.java:-2)
	sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
	sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
	java.lang.reflect.Method.invoke (Method.java:498)
	clojure.lang.Reflector.invokeMatchingMethod (Reflector.java:167)
	clojure.lang.Reflector.invokeInstanceMethod (Reflector.java:102)
	clojure.lang.Compiler$InstanceMethodExpr.eval (Compiler.java:1555)
	clojure.lang.Compiler$DefExpr.eval (Compiler.java:457)
nil

AIOOBE on List slice/concat usage

Hi!

I have encountered a bug in List implementation when tried to implement Sorted Array data structure. I've been using concat and slice operations heavily to insert new elements at proper places, and it turned out my data structure failed after several iterations.

I've logged requests to List and recreated a pure test which fails with internal AIOOBE. Just in case, I've re-checked, the indices queried are correct and exist.

java.lang.ArrayIndexOutOfBoundsException: -1

	at io.lacuna.bifurcan.nodes.ListNodes$Node.pushLast(ListNodes.java:364)
	at io.lacuna.bifurcan.nodes.ListNodes.pushLast(ListNodes.java:34)
	at io.lacuna.bifurcan.nodes.ListNodes$Node.slice(ListNodes.java:240)
	at io.lacuna.bifurcan.List.slice(List.java:228)
	at io.lacuna.bifurcan.List.slice(List.java:19)

Here's the link to the gist: https://gist.github.com/valich/0958ed848ff5999532de59f360a2f899
I understand this is not a minimal sample, but I hope this will be enough to locate the problem. Please tell if you need more info.

The bug is present in 0.1.0 and 0.2.0-alpha.

Oh, yes, and removing linear/forked calls does not help either if that's important.

timing in clojure gives slower results than clojure's persistent datastructures on inserts

I'm doing just a simple benchmark test and am finding that the standard implementation in clojure (vec) is faster than List and LinearList on inserts

[(h/bench-ns {}
     (dotimes [i 4]
       (let [^LinearList v (LinearList.)]
         (dotimes [i 1000000]
           (.addFirst ^LinearList v (Object.)))
         (.size v))))
   
   (h/bench-ns {}
     (dotimes [i 4])
     (reduce (fn [v i]
               (.addFirst ^List v (Object.)))
               (List.)
               (range 1000000)))
   
   (h/bench-ns {}
     (dotimes [i 4])
       (reduce (fn [v i]
                 (conj v (Object.)))
               []
               (range 1000000)))]

  [85886785
   43148048
   36295950]

  [86910871
   48210347
   37343646]

this is quite rough but I'm running it a couple of times and it consistently shows the same results. Is there a way to speed it up?

Bug - remove fails on VirtualMap

When starting from a LinearMap and converting to a Map, the subsequent remove(key) call on the returned VirtualMap does nothing. I think this is probably a bug!

// TEST 1 (FAILS)
final IMap<String, String> linearMap = LinearMap.from(Arrays.asList(new Maps.Entry<>("a", "b")));
assert(linearMap.contains("a"));

final IMap<String, String> map = linearMap.forked();
assert(map.contains("a"));

assert(!map.remove("a").contains("a"));  // this FAILS

Interestingly when working the other way around, i.e. from Map to LinearMap everything works fine, i.e.

// TEST 2 (OK)
final IMap<String, String> map = Map.from(Arrays.asList(new Maps.Entry<>("a", "b")));
assert(map.contains("a"));

final IMap<String, String> linearMap = map.linear();
assert(linearMap.contains("a"));

assert(!linearMap.remove("a").contains("a"));  // this is OK

Even stranger, this one also seems to work, which makes me think the problem is definitely caused by LinearMap#forked() returning a VirtualCollection (in our first example):

// TEST 3 (OK)
final IMap<String, String> map = Map.from(Arrays.asList(new Maps.Entry<>("a", "b")));
assert(map.contains("a"));

final IMap<String, String> linearMap = map.linear();
assert(linearMap.contains("a"));

final IMap<String, String> map2 = linearMap.forked();

assert(!map2.remove("a").contains("a"));  // this is OK

Do several methods mutate the internals of immutable List instances?

It appears that several methods like addLast, addFirst can cause method grow to be called, even if the List is documented as immutable, e.g. isLinear() returns false.

I agree that the code is mutating the node in such a way that smaller arrays are replaced with larger arrays that have the same contents in the indices that are common to both the old and new arrays, which is good.

The red flag it raises in my mind is that if thread T1 is executing grow on a forked List, and thread T2 is doing read-only operations on it concurrently, then modifications made by T1 might not be safely published to T2, and T2 could start getting inconsistent results for different array elements.

Does that sound possible?

No documentation about the maximum number of elements that can be stored in a List

I recently tried to test the limits of the number of elements that can be stored in an instance of List<Long> and I learned (as I suspected) that indexing will not work properly with values that are greater than Integer.MAX_VALUE. This is because there is a downcast from long to int at line 79 of List.java. I think most users will assume that because long is used to index this collection that it will store more than Integer.MAX_VALUE number of elements. Some documentation to clarify this or to update the implementation to allow more values would be helpful. ๐Ÿ˜ƒ

Graphs with custom eq/hash don't seem to use them

This might be my bug, but it kinda looks like creating graphs with custom equality/hash semantics can result in duplicates. For instance, from the bifurcan-clj test suite:

  (testing "custom equality/hash"
    ; Very weird: I'm not sure why we get duplicate vertices here. Bug in
    ; bifurcan maybe?
    (is (= {:x #{:y "y"}
            :y #{:x "x"}}
           (-> (g/graph #(hash (name %)) (fn [a b] (= (name a) (name b))))
               (g/link :x :y)
               (g/link "y" "x")
               datafy)))))

Confusion over when to use linear/forked

I am a little perplexed over when it is appropriate to call fork or linear.

I have called map2.forked before I then call map2.linear to create map3, however I see changes made to my map3 reflected in my map2. I had thought that forked would mean that changes to map3 don't affect map2.

It may be that I have misunderstood the docs and this isn't a bug after all. Some guidance would be appreciated please...

    @Test
    public void bifurcanImmutableRemoveThenRemove() {

        /*
          1. Create the initial map1: `map { 1: true(), 2: true() }`
         */
        IMap<Integer, Boolean> map1 = new LinearMap<>();
        map1.put(1, true);
        map1.put(2, true);
        map1 = map1.forked();  // make the map immutable
        checkMapIsForked(map1);

        /*
          2. Create map2 and remove the entry with key `2`
         */
        IMap<Integer, Boolean> map2 = map1.linear();  // create a transient map for modifications
        map2 = map2.remove(2);
        map2 = map2.forked();  // make the map immutable
        checkMapIsForked(map2);

        /*
          3. Check the entries in map2 with key `1` and `2`
         */
        final Boolean expected1 = map2.get(1, null);
        assertTrue(expected1);
        final Boolean expected2 = map2.get(2, null);
        assertNull(expected2);

        /*
         4. Create map3 and remove the entry in the with key `1`
         */
        IMap<Integer, Boolean> map3 = map2.linear();  // create a transient map for modifications
        map3 = map3.remove(1);
        map3 = map3.forked();  // make the map immutable
        checkMapIsForked(map3);

        /*
         I expect map2.get(1, null) to return true
        */
        assertEquals(expected1, map2.get(1, null));

        /*
         I expect map3.get(1, null) to return null
         */
        assertNotEquals(expected1, map3.get(1, null));
    }

interpretor/compiler for bifurcan structures

It'll be really cool if there was a way for existing clojure code to be interpreted/compiled using bifurcan datastructures:

ie.

(assoc {} :a 1 :b 2) => io.lacuna.bifurcan.Map

(conj [] 1 2 3) => io.lacuna.bifurcan.List

I'm not too sure of the difficulty or the feasibility.

Iterator is incorrect after remove on Linear Map

Sorry to report another one... I am not sure if this bug existed before #23 was fixed, or if it is a regression. I suspect it is related though.

Unfortunately after removing from a LinearMap the iterator over the keys of that map produces the wrong keys.

The test below shows that:

  1. starting from new Map<>().linear() works as expected
  2. but that starting from new LinearMap<>() causes the wrong keys to be returned by iteration after a remove(key).
import io.lacuna.bifurcan.*;
import org.junit.Test;

import java.util.Iterator;

import static org.junit.Assert.*;

public class BifurcanTest {
    @Test
    public void removeFromLinearMap() {
        final IMap<String, String> map = new LinearMap<>();
        removeRemove(map);
    }

    @Test
    public void removeFromMapConvertedToLinear() {
        final IMap<String, String> map = new Map<String, String>().linear();
        removeRemove(map);
    }

    private void removeRemove(final IMap<String, String> map) {
        storeDays(map);

        final IMap<String, String> allDaysMap = map.forked();  // creates DiffMap from LinearMap (when starting with LinearMap)

        final IMap<String, String> withoutWeds = remove(allDaysMap, "We");
        assertSetEquals(Set.of("Su", "Mo", "Tu", "Th", "Fr", "Sa"), withoutWeds.keys());
        assertSetEqualsByIteration(Set.of("Su", "Mo", "Tu", "Th", "Fr", "Sa"), withoutWeds.keys());

        final IMap<String, String> withoutWedsAndTues = remove(withoutWeds, "Tu");
        assertSetEquals(Set.of("Su", "Mo", "Th", "Fr", "Sa"), withoutWedsAndTues.keys());
        assertSetEqualsByIteration(Set.of("Su", "Mo", "Th", "Fr", "Sa"), withoutWedsAndTues.keys());
    }

    private IMap<String, String> remove(final IMap<String, String> immutableMap, final String... keys) {
        assertTrue( !immutableMap.isLinear());

        // create a mutable map
        IMap<String, String> mutableMap = immutableMap.linear();
        assertTrue(mutableMap.isLinear());

        for (final String key: keys) {
            mutableMap = mutableMap.remove(key);

            assertFalse(mutableMap.contains(key));
            assertFalse(mutableMap.keys().contains(key));

            //TODO(AR) the blow assertion fails where it should not - this is because toString also makes use of Iterator! Should be uncommented when the Iterator is fixed...
//            assertEquals(-1, mutableMap.keys().toString().indexOf(key));
        }

        // return an immutable map
        return mutableMap.forked();

    }

    private void storeDays(final IMap<String, String> map) {
        assertTrue(map.isLinear());

        map.put("Su", "Sunday");
        map.put("Mo", "Monday");
        map.put("Tu", "Tuesday");
        map.put("We", "Wednesday");
        map.put("Th", "Thursday");
        map.put("Fr", "Friday");
        map.put("Sa", "Saturday");

        assertSetEquals(Set.of("Su", "Mo", "Tu", "We", "Th", "Fr", "Sa"), map.keys());
    }

    private static <T> void assertSetEquals(final ISet<T> expected, final ISet<T> actual) {
        assertEquals("Expected equal size", expected.size(), actual.size());
        assertTrue(actual.containsAll(expected));
    }

    private static <T> void assertSetEqualsByIteration(final ISet<T> expected, final ISet<T> actual) {
        assertEquals("Expected equal size", expected.size(), actual.size());

        for (final Iterator<T> itActual = actual.iterator(); itActual.hasNext(); ) {
            final T act = itActual.next();
            assertTrue("`actual` has key '" + act + "', but this is not in `expected`!", expected.contains(act));
        }
    }
}

Possible accidental mutation

Hi, first off, very cool library & benchmarks!

I'm not sure if I'm doing something wrong or making an incorrect assumption, but I think I've found a bug causing accidental mutation in List in 0.2.0-alpha6, see my minifed reproducer:

import io.lacuna.bifurcan.List;

public class Test {
        public static void main(String args[]) {
                int size = 32;
                List<Integer> a = new List<Integer>();
                List<Integer> b = new List<Integer>();
                for (int i = 0; i < size; i++) {
                        a = a.addLast(1);
                        b = b.addLast(1);
                }
                List<Integer> ap = a.set(15, 2);
                List<Integer> bp = b.set(15, 2);
                List<Integer> z = null;
                // Comment out, and the equals is true
                z = ap.set(15, 3);
                System.out.println("a: " + a);
                System.out.println("b: " + b);
                System.out.println("ap:" + ap);
                System.out.println("bp:" + bp);
                System.out.println("z: " + z);
                System.out.println("Should be true: " + ap.equals(bp));
        }
}

Will print:

~/test โžœ javac -cp "./bifurcan-0.2.0-alpha6.jar" Test.java && java -cp ".:./bifurcan-0.2.0-alpha6.jar" Test
a: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
b: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
ap:[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
bp:[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
z: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Should be true: false

Commenting out the z = ap.set(15,3); line, or changing the size to anything less than 32 will make the result true.

Allow null values for Graphs.Edge

As far as I can tell the io.lacuna.bifurcan.Graphs.Edge<V, E> class serves as a standard implementation for IEdge<V, E>.

As in many graph instances the edges won't have a special value, it should be allowed to set the value to null. But now, the #hashCode method will throw a NullPoinrterException.

It should either be changed to:

@Override
public int hashCode() {
  if (hash == -1) {
    hash = from.hashCode() ^ to.hashCode() ^ (vale == null ? 0 : value.hashCode());
  }
  return hash;
}

or to:

@Override
public int hashCode() {
  if (hash == -1) {
    hash = Objects.hash(from, to, value);
  }
  return hash;
}

If either from or to is null a NullPointerException should be thrown by the constructor.

Sliced list cannot be made linear

I am using version 0.2.0-alpha6.

In the following, a list of size 1 is sliced to make another list of size 1 that is then correctly made linear.

Then, a list of size 2 is sliced into a list of size 1. This new list is attempted to be made linear but that fails.

import io.lacuna.bifurcan.IList;
import io.lacuna.bifurcan.LinearList;

public class Main {
  public static void main(String[] args) {
    LinearList<Integer> list1 = new LinearList<>();
    list1.addLast(1);
    IList<Integer> slice1 = list1.slice(0, 1).linear();
    System.out.println("Slice 1 is linear: " + slice1.isLinear());

    LinearList<Integer> list2 = new LinearList<>();
    list2.addLast(1);
    list2.addLast(2);
    IList<Integer> slice2 = list2.slice(0, 1).linear();
    System.out.println("Slice 2 is linear: " + slice2.isLinear());
  }
}

The output is:

Slice 1 is linear: true
Slice 2 is linear: false

The output should be:

Slice 1 is linear: true
Slice 2 is linear: true

`DirectedGraph#edge(V, V)` can be improved by removing the `Optional` allocation

Currently, DirectedGraph#edge relies on Map#get(K)which returns an optional.

This could be improved by using Map#get(K, V)with a default value - especially in highly frequently used graph traversals this could save man allocations.

So instead of:

  @Override
  public E edge(V from, V to) {

    Map m = out.get(from).orElseThrow(() -> new IllegalArgumentException("no such edge"));
    Object e = m.get(to, DEFAULT);

    if (e == DEFAULT) {
      throw new IllegalArgumentException("no such edge");
    } else {
      return (E) e;
    }
  }

It could be something like:

  @Override
  public E edge(V from, V to) {

    Map m = out.get(from, DEFAULT_MAP)
    if (m == DEFAULT) {
      throw new IllegalArgumentException("no such edge");
   }

    Object e = m.get(to, DEFAULT);

    if (e == DEFAULT) {
      throw new IllegalArgumentException("no such edge");
    } else {
      return (E) e;
    }
  }

IntMap.mapValues crashes with a NPE

Not totally sure what's up here, but map-value seems to work OK on normal maps, but not on int-maps. It looks like it passes a null value to the function?

(deftest ^:buggy map-values-test
  (let [m (i/from {1 10, 3 30})]
    (testing "IMap"
      ; This fails due to a bug I think
      (is (= {1 11 3 31} (datafy (m/map-values m (fn [k v]
                                                   (prn :k k :v v)
                                                   v))))))))
:k 0 :v nil
...
                          bifurcan-clj.int-map-test/fn                                                           int_map_test.clj:   77
                           bifurcan-clj.map/map-values                                                                    map.clj:  202
                   io.lacuna.bifurcan.IntMap.mapValues                                                                IntMap.java:   23
                   io.lacuna.bifurcan.IntMap.mapValues                                                                IntMap.java:  286
     io.lacuna.bifurcan.nodes.IntMapNodes$Node.mapVals                                                           IntMapNodes.java:  197
java.lang.NullPointerException: Cannot invoke "io.lacuna.bifurcan.nodes.IntMapNodes$Node.mapVals(Object, java.util.function.BiFunction)" because "<local3>.content[<local4>]" is null

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.