lacuna / bifurcan Goto Github PK

functional, durable data structures

License: MIT License

Clojure 15.40% Java 83.48% Gnuplot 1.10% Shell 0.01%

bifurcan's Issues

Graphs.merge doesn't actually merge graphs!

Graphs.merge creates a result variable, but then mutates variable a instead of result, and returns result. Merge winds up being a noop for immutable graphs. :-O

bifurcan/src/io/lacuna/bifurcan/Graphs.java

Line 191 in 32b3fbf

a = a.link(src, dst, b.edge(src, dst), merge);

Sliced list cannot be made linear

I am using version 0.2.0-alpha6.

In the following, a list of size 1 is sliced to make another list of size 1 that is then correctly made linear.

Then, a list of size 2 is sliced into a list of size 1. This new list is attempted to be made linear but that fails.

import io.lacuna.bifurcan.IList;
import io.lacuna.bifurcan.LinearList;

public class Main {
  public static void main(String[] args) {
    LinearList<Integer> list1 = new LinearList<>();
    list1.addLast(1);
    IList<Integer> slice1 = list1.slice(0, 1).linear();
    System.out.println("Slice 1 is linear: " + slice1.isLinear());

    LinearList<Integer> list2 = new LinearList<>();
    list2.addLast(1);
    list2.addLast(2);
    IList<Integer> slice2 = list2.slice(0, 1).linear();
    System.out.println("Slice 2 is linear: " + slice2.isLinear());
  }
}

The output is:

Slice 1 is linear: true
Slice 2 is linear: false

The output should be:

Slice 1 is linear: true
Slice 2 is linear: true

Removing nodes from DAGS leaves those nodes in top/bottom

It looks like DirectedAcyclicGraph.remove only ever expands the top/bottom sets, which means that removing a vertex from a DAG leaves it in the top and bottom sets indefinitely.

bifurcan/src/io/lacuna/bifurcan/DirectedAcyclicGraph.java

Line 206 in 32b3fbf

 top.union(graph.out(vertex).stream().filter(v -> graph.in(v).size() == 1).collect(Sets.collector())); 

Graphs with custom eq/hash don't seem to use them

This might be my bug, but it kinda looks like creating graphs with custom equality/hash semantics can result in duplicates. For instance, from the bifurcan-clj test suite:

  (testing "custom equality/hash"
    ; Very weird: I'm not sure why we get duplicate vertices here. Bug in
    ; bifurcan maybe?
    (is (= {:x #{:y "y"}
            :y #{:x "x"}}
           (-> (g/graph #(hash (name %)) (fn [a b] (= (name a) (name b))))
               (g/link :x :y)
               (g/link "y" "x")
               datafy)))))

SortedMap should make use of covariant return types.

The following methods should have their return type adapted.

SortedMap#remove(K) should return SortedMap<K, V>
SortedMap#put(K, V) should return SortedMap<K, V>
SortedMap#put(K, V, BinaryOperator) should return SortedMap<K, V>

Currently they always need to be re-casted to (SortedMap(K, V)) which makes the code a bit ugly.

Build errors

I tried to build a jar with lein and I get this error:

sputnik% lein jar
Compiling 26 source files to /home/james/code/forks/java/bifurcan/target/classes
/home/james/code/forks/java/bifurcan/src/io/lacuna/bifurcan/utils/Iterators.java:3: error: package clojure.lang does not exist
import clojure.lang.Obj;
                   ^
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 error
Compilation of Java sources(lein javac) failed.

So I tried your benchmark alias to see if something else worked and I get this:

sputnik% lein benchmark
Retrieving criterium/criterium/0.4.3/criterium-0.4.3.pom from clojars
Retrieving proteus/proteus/0.1.6/proteus-0.1.6.pom from clojars
Retrieving eftest/eftest/0.1.4/eftest-0.1.4.pom from clojars
Retrieving progrock/progrock/0.1.2/progrock-0.1.2.pom from clojars
Retrieving io/aviso/pretty/0.1.33/pretty-0.1.33.pom from clojars
Retrieving criterium/criterium/0.4.3/criterium-0.4.3.jar from clojars
Retrieving progrock/progrock/0.1.2/progrock-0.1.2.jar from clojars
Retrieving eftest/eftest/0.1.4/eftest-0.1.4.jar from clojars
Retrieving io/aviso/pretty/0.1.33/pretty-0.1.33.jar from clojars
Retrieving proteus/proteus/0.1.6/proteus-0.1.6.jar from clojars
Compiling 26 source files to /home/james/code/forks/java/bifurcan/target/classes
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
benchmarking LinearMap
Exception in thread "main" java.lang.NullPointerException, compiling:(/tmp/form-init9040329292974367555.clj:1:73)
	at clojure.lang.Compiler.load(Compiler.java:7391)
	at clojure.lang.Compiler.loadFile(Compiler.java:7317)
	at clojure.main$load_script.invokeStatic(main.clj:275)
	at clojure.main$init_opt.invokeStatic(main.clj:277)
	at clojure.main$init_opt.invoke(main.clj:277)
	at clojure.main$initialize.invokeStatic(main.clj:308)
	at clojure.main$null_opt.invokeStatic(main.clj:342)
	at clojure.main$null_opt.invoke(main.clj:339)
	at clojure.main$main.invokeStatic(main.clj:421)
	at clojure.main$main.doInvoke(main.clj:384)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.lang.Var.invoke(Var.java:383)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.main.main(main.java:37)
Caused by: java.lang.NullPointerException
	at java.io.StringReader.<init>(StringReader.java:50)
	at clojure.lang.RT.readString(RT.java:1834)
	at clojure.lang.RT.readString(RT.java:1830)
	at clojure.core$read_string.invokeStatic(core.clj:3687)
	at clojure.core$read_string.invoke(core.clj:3677)
	at bifurcan.benchmark_test$benchmark_collection.invokeStatic(benchmark_test.clj:602)
	at bifurcan.benchmark_test$benchmark_collection.invoke(benchmark_test.clj:601)
	at bifurcan.benchmark_test$_main$fn__6660.invoke(benchmark_test.clj:625)
	at clojure.core$map$fn__4785.invoke(core.clj:2644)
	at clojure.lang.LazySeq.sval(LazySeq.java:40)
	at clojure.lang.LazySeq.seq(LazySeq.java:49)
	at clojure.lang.RT.seq(RT.java:521)
	at clojure.core$seq__4357.invokeStatic(core.clj:137)
	at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:24)
	at clojure.core.protocols$fn__6738.invokeStatic(protocols.clj:75)
	at clojure.core.protocols$fn__6738.invoke(protocols.clj:75)
	at clojure.core.protocols$fn__6684$G__6679__6697.invoke(protocols.clj:13)
	at clojure.core$reduce.invokeStatic(core.clj:6545)
	at clojure.core$into.invokeStatic(core.clj:6610)
	at clojure.core$into.invoke(core.clj:6604)
	at bifurcan.benchmark_test$_main.invokeStatic(benchmark_test.clj:626)
	at bifurcan.benchmark_test$_main.doInvoke(benchmark_test.clj:609)
	at clojure.lang.RestFn.invoke(RestFn.java:410)
	at clojure.lang.Var.invoke(Var.java:379)
	at user$eval5.invokeStatic(form-init9040329292974367555.clj:1)
	at user$eval5.invoke(form-init9040329292974367555.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:6927)
	at clojure.lang.Compiler.eval(Compiler.java:6917)
	at clojure.lang.Compiler.load(Compiler.java:7379)
	... 14 more

sputnik% java -version 
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (Zulu 8.20.0.5-linux64) (build 1.8.0_121-b15)
OpenJDK 64-Bit Server VM (Zulu 8.20.0.5-linux64) (build 25.121-b15, mixed mode)

interpretor/compiler for bifurcan structures

It'll be really cool if there was a way for existing clojure code to be interpreted/compiled using bifurcan datastructures:

ie.

(assoc {} :a 1 :b 2) => io.lacuna.bifurcan.Map

(conj [] 1 2 3) => io.lacuna.bifurcan.List

I'm not too sure of the difficulty or the feasibility.

What's in the 0.2.0-alpha2 release?

Hi @ztellman I see that 0.2.0-alpha2 appeared on Maven Central.

Unfortunately I don't see any Git tag or release on GitHub. I was wondering if you might be able to tell me what has changed since 0.2.0-alpha1 and if there was any resolution to our open issue #23 . Thanks :-)

Was there expected to be API changes? https://travis-ci.com/github/eXist-db/exist/jobs/321004943#L304

DAG bottoms seem to include extra vertices

I get the impression that DAG bottom sets should be just those vertices with no out edges. However, it looks like they also include nodes in the middle:

(let [g (-> (g/directed-acyclic-graph)
              (g/link :a1 :b)
              (g/link :a2 :b)
              (g/link :b  :c1)
              (g/link :b  :c2)
              (g/link :a3 :c3))]
    (testing "top"
      (is (= #{:a1 :a2 :a3} (datafy (g/top g)))))
    (testing "bottom"
      ; Is the presence of b here a bug, or expected behavior?
      (is (= #{:b :c1 :c2 :c3} (datafy (g/bottom g)))))))

Benchmark comparison with Capsule

There's a project with an implementation of CHAMP Trie based data structures - Capsule.

Do you think it'd make sense to add it to the benchmarks?

Possible NPE in MapNodes#put

There appears to be a null pointer exception that occurs at this line

bifurcan/src/io/lacuna/bifurcan/nodes/MapNodes.java

Line 195 in 7898040

long prevSize = child.size();

The rest of the NPE stacktrace

! java.lang.NullPointerException: null
! at io.lacuna.bifurcan.nodes.MapNodes$Node.put(MapNodes.java:193) ~[bifurcan-0.1.0.jar:na]
! at io.lacuna.bifurcan.Map.put(Map.java:139) ~[bifurcan-0.1.0.jar:na]
! at io.lacuna.bifurcan.Map.put(Map.java:135) ~[bifurcan-0.1.0.jar:na]
! at io.lacuna.bifurcan.Map.put(Map.java:130) ~[bifurcan-0.1.0.jar:na]
! at io.lacuna.bifurcan.Map.put(Map.java:24) ~[bifurcan-0.1.0.jar:na]

I see that my project is using an older version of Bifurcan, but judging from my cursory read of the MapNodes implementation, it seems the NPE is still possible?

I suspect the NPE behavior is also non-deterministic. In my testing environment, I had four separate processes populating bifurcan.Map with data from the same source, but only one resulted in the NPE mentioned above.

Unfortunately, I don't have more precise reproduction steps for you. But will update here if I end up finding a test that results in the NPE.

timing in clojure gives slower results than clojure's persistent datastructures on inserts

I'm doing just a simple benchmark test and am finding that the standard implementation in clojure (vec) is faster than List and LinearList on inserts

[(h/bench-ns {}
     (dotimes [i 4]
       (let [^LinearList v (LinearList.)]
         (dotimes [i 1000000]
           (.addFirst ^LinearList v (Object.)))
         (.size v))))
   
   (h/bench-ns {}
     (dotimes [i 4])
     (reduce (fn [v i]
               (.addFirst ^List v (Object.)))
               (List.)
               (range 1000000)))
   
   (h/bench-ns {}
     (dotimes [i 4])
       (reduce (fn [v i]
                 (conj v (Object.)))
               []
               (range 1000000)))]

  [85886785
   43148048
   36295950]

  [86910871
   48210347
   37343646]

this is quite rough but I'm running it a couple of times and it consistently shows the same results. Is there a way to speed it up?

IntMap.mapValues crashes with a NPE

Not totally sure what's up here, but map-value seems to work OK on normal maps, but not on int-maps. It looks like it passes a null value to the function?

(deftest ^:buggy map-values-test
  (let [m (i/from {1 10, 3 30})]
    (testing "IMap"
      ; This fails due to a bug I think
      (is (= {1 11 3 31} (datafy (m/map-values m (fn [k v]
                                                   (prn :k k :v v)
                                                   v))))))))

:k 0 :v nil
...
                          bifurcan-clj.int-map-test/fn                                                           int_map_test.clj:   77
                           bifurcan-clj.map/map-values                                                                    map.clj:  202
                   io.lacuna.bifurcan.IntMap.mapValues                                                                IntMap.java:   23
                   io.lacuna.bifurcan.IntMap.mapValues                                                                IntMap.java:  286
     io.lacuna.bifurcan.nodes.IntMapNodes$Node.mapVals                                                           IntMapNodes.java:  197
java.lang.NullPointerException: Cannot invoke "io.lacuna.bifurcan.nodes.IntMapNodes$Node.mapVals(Object, java.util.function.BiFunction)" because "<local3>.content[<local4>]" is null

add methods for type-safe creation of empty instances

The current EMPTY instances have no type parameters, similar to j.u.collection's empty instances.

Add an empty() method that performs the unchecked cast in the library, akin to Collections.emptyMap()

Iterator is incorrect after remove on Linear Map

Sorry to report another one... I am not sure if this bug existed before #23 was fixed, or if it is a regression. I suspect it is related though.

Unfortunately after removing from a LinearMap the iterator over the keys of that map produces the wrong keys.

The test below shows that:

starting from new Map<>().linear() works as expected
but that starting from new LinearMap<>() causes the wrong keys to be returned by iteration after a remove(key).

import io.lacuna.bifurcan.*;
import org.junit.Test;

import java.util.Iterator;

import static org.junit.Assert.*;

public class BifurcanTest {
    @Test
    public void removeFromLinearMap() {
        final IMap<String, String> map = new LinearMap<>();
        removeRemove(map);
    }

    @Test
    public void removeFromMapConvertedToLinear() {
        final IMap<String, String> map = new Map<String, String>().linear();
        removeRemove(map);
    }

    private void removeRemove(final IMap<String, String> map) {
        storeDays(map);

        final IMap<String, String> allDaysMap = map.forked();  // creates DiffMap from LinearMap (when starting with LinearMap)

        final IMap<String, String> withoutWeds = remove(allDaysMap, "We");
        assertSetEquals(Set.of("Su", "Mo", "Tu", "Th", "Fr", "Sa"), withoutWeds.keys());
        assertSetEqualsByIteration(Set.of("Su", "Mo", "Tu", "Th", "Fr", "Sa"), withoutWeds.keys());

        final IMap<String, String> withoutWedsAndTues = remove(withoutWeds, "Tu");
        assertSetEquals(Set.of("Su", "Mo", "Th", "Fr", "Sa"), withoutWedsAndTues.keys());
        assertSetEqualsByIteration(Set.of("Su", "Mo", "Th", "Fr", "Sa"), withoutWedsAndTues.keys());
    }

    private IMap<String, String> remove(final IMap<String, String> immutableMap, final String... keys) {
        assertTrue( !immutableMap.isLinear());

        // create a mutable map
        IMap<String, String> mutableMap = immutableMap.linear();
        assertTrue(mutableMap.isLinear());

        for (final String key: keys) {
            mutableMap = mutableMap.remove(key);

            assertFalse(mutableMap.contains(key));
            assertFalse(mutableMap.keys().contains(key));

            //TODO(AR) the blow assertion fails where it should not - this is because toString also makes use of Iterator! Should be uncommented when the Iterator is fixed...
//            assertEquals(-1, mutableMap.keys().toString().indexOf(key));
        }

        // return an immutable map
        return mutableMap.forked();

    }

    private void storeDays(final IMap<String, String> map) {
        assertTrue(map.isLinear());

        map.put("Su", "Sunday");
        map.put("Mo", "Monday");
        map.put("Tu", "Tuesday");
        map.put("We", "Wednesday");
        map.put("Th", "Thursday");
        map.put("Fr", "Friday");
        map.put("Sa", "Saturday");

        assertSetEquals(Set.of("Su", "Mo", "Tu", "We", "Th", "Fr", "Sa"), map.keys());
    }

    private static <T> void assertSetEquals(final ISet<T> expected, final ISet<T> actual) {
        assertEquals("Expected equal size", expected.size(), actual.size());
        assertTrue(actual.containsAll(expected));
    }

    private static <T> void assertSetEqualsByIteration(final ISet<T> expected, final ISet<T> actual) {
        assertEquals("Expected equal size", expected.size(), actual.size());

        for (final Iterator<T> itActual = actual.iterator(); itActual.hasNext(); ) {
            final T act = itActual.next();
            assertTrue("`actual` has key '" + act + "', but this is not in `expected`!", expected.contains(act));
        }
    }
}

Allow null values for Graphs.Edge

As far as I can tell the io.lacuna.bifurcan.Graphs.Edge<V, E> class serves as a standard implementation for IEdge<V, E>.

As in many graph instances the edges won't have a special value, it should be allowed to set the value to null. But now, the #hashCode method will throw a NullPoinrterException.

It should either be changed to:

@Override
public int hashCode() {
  if (hash == -1) {
    hash = from.hashCode() ^ to.hashCode() ^ (vale == null ? 0 : value.hashCode());
  }
  return hash;
}

or to:

@Override
public int hashCode() {
  if (hash == -1) {
    hash = Objects.hash(from, to, value);
  }
  return hash;
}

If either from or to is null a NullPointerException should be thrown by the constructor.

No documentation about the maximum number of elements that can be stored in a List

I recently tried to test the limits of the number of elements that can be stored in an instance of List<Long> and I learned (as I suspected) that indexing will not work properly with values that are greater than Integer.MAX_VALUE. This is because there is a downcast from long to int at line 79 of List.java. I think most users will assume that because long is used to index this collection that it will store more than Integer.MAX_VALUE number of elements. Some documentation to clarify this or to update the implementation to allow more values would be helpful. 😃

Confusion over when to use linear/forked

I am a little perplexed over when it is appropriate to call fork or linear.

I have called map2.forked before I then call map2.linear to create map3, however I see changes made to my map3 reflected in my map2. I had thought that forked would mean that changes to map3 don't affect map2.

It may be that I have misunderstood the docs and this isn't a bug after all. Some guidance would be appreciated please...

    @Test
    public void bifurcanImmutableRemoveThenRemove() {

        /*
          1. Create the initial map1: `map { 1: true(), 2: true() }`
         */
        IMap<Integer, Boolean> map1 = new LinearMap<>();
        map1.put(1, true);
        map1.put(2, true);
        map1 = map1.forked();  // make the map immutable
        checkMapIsForked(map1);

        /*
          2. Create map2 and remove the entry with key `2`
         */
        IMap<Integer, Boolean> map2 = map1.linear();  // create a transient map for modifications
        map2 = map2.remove(2);
        map2 = map2.forked();  // make the map immutable
        checkMapIsForked(map2);

        /*
          3. Check the entries in map2 with key `1` and `2`
         */
        final Boolean expected1 = map2.get(1, null);
        assertTrue(expected1);
        final Boolean expected2 = map2.get(2, null);
        assertNull(expected2);

        /*
         4. Create map3 and remove the entry in the with key `1`
         */
        IMap<Integer, Boolean> map3 = map2.linear();  // create a transient map for modifications
        map3 = map3.remove(1);
        map3 = map3.forked();  // make the map immutable
        checkMapIsForked(map3);

        /*
         I expect map2.get(1, null) to return true
        */
        assertEquals(expected1, map2.get(1, null));

        /*
         I expect map3.get(1, null) to return null
         */
        assertNotEquals(expected1, map3.get(1, null));
    }

Some List RRB trees constructed deeper than necessary

So far I do not know that this is a correctness issue, only a minor performance issue from evidence I have now. Starting from an empty list and appending elements, some trees are constructed with 1 extra level of depth/height than necessary.

$ clj "-J-Dclojure.server.repl={:port 50505 :accept clojure.core.server/repl}" -Sdeps '{:deps {io.lacuna/bifurcan {:mvn/version "0.2.0-alpha1"}}}'
(import '(io.lacuna.bifurcan List))

;; (capac s) is the maximum number of vector elements that can be
;; descendants beneath a tree node with shift s.
(defn capac [shift]
  (bit-shift-left 1 (+ shift 5)))

;; make private root field accessible
(def list-class (class (List.)))
(def list-root-field (.getDeclaredField list-class "root"))
(.setAccessible list-root-field true)

(defn get-root [v]
  (.get list-root-field v))

(defn listnode? [x]
  (instance? io.lacuna.bifurcan.nodes.ListNodes$Node x))

(defn descendant-depths [n]
  (loop [remaining-nodes [[0 n]]
         depth->count {}]
    (if (zero? (count remaining-nodes))
      depth->count
      (let [[depth node] (peek remaining-nodes)
            depth->count (update depth->count depth (fnil inc 0))
            children (if (listnode? node)
                       (take (.numNodes node) (seq (.nodes node))))]
        (recur (into (pop remaining-nodes)
                     (map (fn [x] [(inc depth) x]) children))
               depth->count)))))

(defn list-n [n]
  (reduce (fn [v e] (.addLast v e))
          (List.)
          (range n)))

(def n1 (+ (capac 15) (- (capac 10)) (capac 5) (capac 0)))
(def pv (list-n n1))
(descendant-depths (get-root pv))
;; {0 1, 1 1, 2 32, 3 994, 4 31777}

For a List with fewer than (capac 15) elements, I would expect a tree
with shift 15 at the root, but this is the smallest value of n I found
that causes the tree to go to shift 20 at the root, a bit too early.

As far as I can tell so far this is not a correctness issue -- nth
returns the correct value for all indices, for example. It is a
performance issue.

I have not tried changes to the code yet to modify this behavior, but
I believe it is because the condition to add a new level of depth to
the tree in method pushLast only checks that the bottom-most node has
the maximum branching factor, and the root node does, but not that all
of the other nodes on the "right tree edge" have the maximum branching
factor. Thus the code can add a new level of depth to the tree when
an intermediate depth node on the "right tree edge" has only 1 node.

(def wrong-count (atom 0))
@wrong-count
(dotimes [i n1]
  (when (not= (.nth pv i) i)
    (swap! wrong-count inc)))
@wrong-count
;; 0 - good

bug in List removeLast method

The value of n shown below is the smallest one I know of that exhibits this problem.

One issue with generative testing and data structures like List with wide branching factors is that it can take a large number of operations to reach "interesting" tree structures that exercise new code paths. I have found that making a local change with SHIFT_INCREMENT=2 and thus MAX_BRANCHES=(1 << 2) = 4 causes the existing generative tests to find issues like this in a fairly short period of time.

That requires a few replacements of literal constants 5 and 32 in the List.java and ListNodes.java source files to be replaced with appropriate names, to avoid replacing them manually in multiple places. I can submit a PR with such changes if that would be of interest.

$ clj -Sdeps '{:deps {io.lacuna/bifurcan {:mvn/version "0.2.0-alpha1"} org.clojure/clojure {:mvn/version "1.10.1"}}}'
(import '(io.lacuna.bifurcan List))
(def n 33824)
(def l1 (.concat (List.) (List/from (range n))))
(= (seq (range n)) (iterator-seq (.iterator l1)))
;; true.  Good
(.size l1)
;; 33824 - the expected size

(def l2 (.removeLast l1))
(= (seq (range (dec n))) (iterator-seq (.iterator l2)))
;; false.  bug!
(.size l2)
;; 1055 - not even close to the correct value

Upscaling/Downscaling between the Bifurcan List and the Clojure Vector

I've noticed that there's quite a lot of similarities between the two structures. Is there a way to have a hybrid structure that can upscale the simpler clojure vector if need be to an RRB Tree?

This will get rid of a problem that's always bothered me:

(conj (cons [...] 1) 2) => (2, 1, ...)  ;; which is a Linked List, not a vector

will then be:

(conj (cons [...] 1) 2) => [1 ... 2]  ;; which is an upscaled Bifurcan List

Forked set is modified after baked linear set modification

Hi, thank you for the great library!
I've found out that the forked set can be modified in case the initial linear set gets any modifications.
This behaviour doesn't reproduce for lists.
Here is the code example:

import io.lacuna.bifurcan.List;
import io.lacuna.bifurcan.Set;

public class Main {
    public static void main(String[] args) {
        Set<Integer> linearSet = new Set<Integer>().linear();
        linearSet.add(1);
        Set<Integer> forkedSet = linearSet.forked();
        linearSet.add(2);

        System.out.println("Linear set: " + linearSet);
        System.out.println("Forked set: " + forkedSet);

        List<Integer> linearList = new List<Integer>().linear();
        linearList.addLast(1);
        List<Integer> forkedList = linearList.forked();
        linearList.addLast(2);

        System.out.println("Linear list: " + linearList);
        System.out.println("Forked list: " + forkedList);
    }
}

And here is the produced output:

Linear set: {1, 2}
Forked set: {1, 2}
Linear list: [1, 2]
Forked list: [1]

I'd expect that the "Forked set" remains as {1}.

Here is the full project code: https://github.com/AlexPl292/distraught-chimera

`DirectedGraph#edge(V, V)` can be improved by removing the `Optional` allocation

Currently, DirectedGraph#edge relies on Map#get(K)which returns an optional.

This could be improved by using Map#get(K, V)with a default value - especially in highly frequently used graph traversals this could save man allocations.

So instead of:

  @Override
  public E edge(V from, V to) {

    Map m = out.get(from).orElseThrow(() -> new IllegalArgumentException("no such edge"));
    Object e = m.get(to, DEFAULT);

    if (e == DEFAULT) {
      throw new IllegalArgumentException("no such edge");
    } else {
      return (E) e;
    }
  }

It could be something like:

  @Override
  public E edge(V from, V to) {

    Map m = out.get(from, DEFAULT_MAP)
    if (m == DEFAULT) {
      throw new IllegalArgumentException("no such edge");
   }

    Object e = m.get(to, DEFAULT);

    if (e == DEFAULT) {
      throw new IllegalArgumentException("no such edge");
    } else {
      return (E) e;
    }
  }

SortedMap should have the possibility to supply a custom `Comparator`

SortedMap already has a field comparator, but is is always set to Comparator.naturalOrder().

There is also the private constructor SortedMap(Node<K, V> root, boolean linear, Comparator<K> comparator) that is used by SortedMap internally - although it can also be invoked with reflection magic. Tests show that it seems to work fine when supplying a custom Comparator.

But there should be an "official" possibility.

Do several methods mutate the internals of immutable List instances?

It appears that several methods like addLast, addFirst can cause method grow to be called, even if the List is documented as immutable, e.g. isLinear() returns false.

I agree that the code is mutating the node in such a way that smaller arrays are replaced with larger arrays that have the same contents in the indices that are common to both the old and new arrays, which is good.

The red flag it raises in my mind is that if thread T1 is executing grow on a forked List, and thread T2 is doing read-only operations on it concurrently, then modifications made by T1 might not be safely published to T2, and T2 could start getting inconsistent results for different array elements.

Does that sound possible?

Possible accidental mutation

Hi, first off, very cool library & benchmarks!

I'm not sure if I'm doing something wrong or making an incorrect assumption, but I think I've found a bug causing accidental mutation in List in 0.2.0-alpha6, see my minifed reproducer:

import io.lacuna.bifurcan.List;

public class Test {
        public static void main(String args[]) {
                int size = 32;
                List<Integer> a = new List<Integer>();
                List<Integer> b = new List<Integer>();
                for (int i = 0; i < size; i++) {
                        a = a.addLast(1);
                        b = b.addLast(1);
                }
                List<Integer> ap = a.set(15, 2);
                List<Integer> bp = b.set(15, 2);
                List<Integer> z = null;
                // Comment out, and the equals is true
                z = ap.set(15, 3);
                System.out.println("a: " + a);
                System.out.println("b: " + b);
                System.out.println("ap:" + ap);
                System.out.println("bp:" + bp);
                System.out.println("z: " + z);
                System.out.println("Should be true: " + ap.equals(bp));
        }
}

Will print:

~/test ➜ javac -cp "./bifurcan-0.2.0-alpha6.jar" Test.java && java -cp ".:./bifurcan-0.2.0-alpha6.jar" Test
a: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
b: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
ap:[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
bp:[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
z: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Should be true: false

Commenting out the z = ap.set(15,3); line, or changing the size to anything less than 32 will make the result true.

AIOOBE on List slice/concat usage

Hi!

I have encountered a bug in List implementation when tried to implement Sorted Array data structure. I've been using concat and slice operations heavily to insert new elements at proper places, and it turned out my data structure failed after several iterations.

I've logged requests to List and recreated a pure test which fails with internal AIOOBE. Just in case, I've re-checked, the indices queried are correct and exist.

java.lang.ArrayIndexOutOfBoundsException: -1

	at io.lacuna.bifurcan.nodes.ListNodes$Node.pushLast(ListNodes.java:364)
	at io.lacuna.bifurcan.nodes.ListNodes.pushLast(ListNodes.java:34)
	at io.lacuna.bifurcan.nodes.ListNodes$Node.slice(ListNodes.java:240)
	at io.lacuna.bifurcan.List.slice(List.java:228)
	at io.lacuna.bifurcan.List.slice(List.java:19)

Here's the link to the gist: https://gist.github.com/valich/0958ed848ff5999532de59f360a2f899
I understand this is not a minimal sample, but I hope this will be enough to locate the problem. Please tell if you need more info.

The bug is present in 0.1.0 and 0.2.0-alpha.

Oh, yes, and removing linear/forked calls does not help either if that's important.

Use clojure.PersistentList instead of clojure.PersistentVector for list comparisons

On the following page, you compare a number of persistent data structures:
https://github.com/lacuna/bifurcan/blob/master/doc/comparison.md

In the lists section, I think it would be more appropriate to use clojure.PersistentList than clojure.PersistentVector, considering you switch to the equivalent list types for other languages.

Minor: redundant defn of benchmark-iteration in benchmark-test ns

benchmark-iteration is defined twice in the benchmark-test namespace. The first definition is incorrect, but that doesn't matter much since the second one is correct, and replaces the first. No bug here, just a couple lines of unnecessary and unused code.

concat then slice throws exception with some List instances

A smaller example was found via generative testing with List code locally modified with SHIFT_INCREMENT=2 MAX_BRANCH=(1 << 2)=4. Then I tried the scaled up version with a similar tree structure and got the same error.

Everything goes well up until the final slice method call.

$ clj -Sdeps '{:deps {io.lacuna/bifurcan {:mvn/version "0.2.0-alpha1"} org.clojure/clojure {:mvn/version "1.10.1"}}}'
Clojure 1.10.1
user=> (import '(io.lacuna.bifurcan List))
io.lacuna.bifurcan.List
user=> (def b 32)
#'user/b
user=> (def n (+ (* b b b) (- (* b b)) b))
#'user/n
user=> n
31776
(defn same-seq [a b]
  (= (seq a) (seq b)))
#'user/same-seq
user=> (def r1 (range 2))
#'user/r1
user=> (def r2 (range 2 (+ 2 n)))
#'user/r2
user=> (def l1 (.concat (List.) (List/from r1)))
#'user/l1
user=> (def l2 (.concat (List.) (List/from r2)))
#'user/l2
user=> (same-seq r1 l1)
true
user=> (same-seq r2 l2)
true
user=> (def l3 (.concat l1 l2))
#'user/l3
user=> (def r3 (concat r1 r2))
#'user/r3
user=> (same-seq r3 l3)
true
user=> (def l4 (.slice l3 1 3))
Execution error (ArrayIndexOutOfBoundsException) at io.lacuna.bifurcan.nodes.ListNodes$Node/pushLast (ListNodes.java:364).
-1
user=> (pst)
ArrayIndexOutOfBoundsException -1
	io.lacuna.bifurcan.nodes.ListNodes$Node.pushLast (ListNodes.java:364)
	io.lacuna.bifurcan.nodes.ListNodes.pushLast (ListNodes.java:34)
	io.lacuna.bifurcan.nodes.ListNodes$Node.slice (ListNodes.java:240)
	io.lacuna.bifurcan.List.slice (List.java:228)
	sun.reflect.NativeMethodAccessorImpl.invoke0 (NativeMethodAccessorImpl.java:-2)
	sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
	sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
	java.lang.reflect.Method.invoke (Method.java:498)
	clojure.lang.Reflector.invokeMatchingMethod (Reflector.java:167)
	clojure.lang.Reflector.invokeInstanceMethod (Reflector.java:102)
	clojure.lang.Compiler$InstanceMethodExpr.eval (Compiler.java:1555)
	clojure.lang.Compiler$DefExpr.eval (Compiler.java:457)
nil

Bug - remove fails on VirtualMap

When starting from a LinearMap and converting to a Map, the subsequent remove(key) call on the returned VirtualMap does nothing. I think this is probably a bug!

// TEST 1 (FAILS)
final IMap<String, String> linearMap = LinearMap.from(Arrays.asList(new Maps.Entry<>("a", "b")));
assert(linearMap.contains("a"));

final IMap<String, String> map = linearMap.forked();
assert(map.contains("a"));

assert(!map.remove("a").contains("a"));  // this FAILS

Interestingly when working the other way around, i.e. from Map to LinearMap everything works fine, i.e.

// TEST 2 (OK)
final IMap<String, String> map = Map.from(Arrays.asList(new Maps.Entry<>("a", "b")));
assert(map.contains("a"));

final IMap<String, String> linearMap = map.linear();
assert(linearMap.contains("a"));

assert(!linearMap.remove("a").contains("a"));  // this is OK

Even stranger, this one also seems to work, which makes me think the problem is definitely caused by LinearMap#forked() returning a VirtualCollection (in our first example):

// TEST 3 (OK)
final IMap<String, String> map = Map.from(Arrays.asList(new Maps.Entry<>("a", "b")));
assert(map.contains("a"));

final IMap<String, String> linearMap = map.linear();
assert(linearMap.contains("a"));

final IMap<String, String> map2 = linearMap.forked();

assert(!map2.remove("a").contains("a"));  // this is OK

Add ability to create subclasses

I'm implementing a toy lisp based on kanaka/mal using bifurcan, and I've come across two cases where it would be useful to be able to create subclasses of the bifurcan structures. These are difficult to create since the methods create new instances of the structures and there's no way to create a subclass when that happens.

MAL requires me to support Clojure-style lists and vectors. I'd like to use bifurcan's List for both since it supports the efficient operations of both, but I need to be able to record which was read (i.e. list or vector) so that I can later print it correctly. There's no really good way to do this at the moment. I ended up creating a Vector class implementing IList which delegates to an internal List, and for the methods which return a new List I then wrap them in a new Vector, but that's pretty ugly.
I'd like to implement something like Clojure's IObj protocol to allow metadata to be stored on the collections too, but that's equally tricky. I could probably implement some equally ugly workaround, but a nicer way to do this would be great.

I've only looked at List so far, but it seems like the operations create new instances via the private List(boolean linear, Node root, int prefixLen, Object[] prefix, int suffixLen, Object[] suffix) constructor. Could the new instances instead be created by calling a protected method which in List just calls that constructor, but in a subclass could create a Vector, IObjList or whatever?

I'm not sure what the implications are for other interfaces such as Concat though.

lacuna / bifurcan Goto Github PK

bifurcan's Issues

Recommend Projects

Recommend Topics

Recommend Org