bodil / im-rs Goto Github PK

View Code? Open in Web Editor NEW

1.5K 1.5K 111.0 1.49 MB

Assorted immutable collection datatypes for Rust

Home Page: http://immutable.rs/

License: Mozilla Public License 2.0

Rust 100.00%

im-rs's Introduction

im-rs

Blazing fast immutable collection datatypes for Rust.

Comes in two versions: im (thread safe) and im-rc (fast but not thread safe).

Documentation

API docs

Licence

This software is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

im-rs's People

Contributors

Stargazers

Watchers

Forkers

siarheimelnik aoprisan kodraus srijs yamafaktory vmosone d-plaindoux crazymerlyn dicej brendanzab tov jmikkola christophejunke bbc2 dwightguth mitsuhiko joliss tempbottle seako matthewhammer lambdaxymox silensangelusnex kipari krobelus baitcenter alexcrichton rustcamp jessegrosjean arazabishov andymac-2 jamesray1 raycmorgan sourcegraph-testing fj rtfeldman nomad010 octobus-net jrhy ninglin-p code-affinity cmyr dendisuhubdy olegnn abhisheknishantpuresoftware zainab-ali paytonrules rjmac dsarlis dfinity-lab isgasho itamarst theli-ua jdnavarro justinmoon zeta1999 steveklabnik justanotherdot danielhenrymantilla arthurprs ivanpierre vorner ammaraskar salewski dusdanig vlmutolo shuoli84 xu-cheng placrosse taylordotfish ddlog-dev transparencies qrilka yoshikitakashima ygg01 icodein ckaran brightly-salty ijijn rogach samuelmarks yatima-inc archive-tsao-chi-forks emisonlu grantjamespowell jplatte frozen frankfanslc dotnwat behzadnouri michel-slm jneem prateekvishnu mayhemheroes ryoqun qqwy logistic-bot iamwilhelm msifd ckampfe ekleog

im-rs's Issues

immutable push* and append

Hi!

I'm pretty new to rust, but playing around with it the first thing I got annoyed with it was the lack of immutable lists. Recursion over mutable vectors is just no fun.

im::Vector seemed like the thing I wanted, but then I had really expected it to have

pub fn append(&self, other: Self) -> Self
pub fn push_front(&self, value: A) -> Self
etc.

Is it a conscious design decision to make these methods mutating instead? Or is there a better way to immutably build Vectors that I missed?

Thank you for your work on open source!

cargo could use a persistent hash map in it's resolver

Hi,

Cargo's resolver has a long standing TODO:
"We should switch to persistent hash maps if we can..."
So when I watched your talk at RustFest Paris I thought I'd make the introductions. So if/when this crate is ready, perhaps a pr thare?

The comment is somewhat out of date. Thanks to a lot of profiling most things are wrapped in RC/ARC and are bespoke custom persistent data structures. So I suspect it will improve ergonomics more then speed, but I'd be happy to be wrong.

RRB Improvements - Direct Append/Direct Tail

I see RRB Vectors might appear in near future, and from what I've see they will include transience. I'm wondering about other improvements to RRB Vectors noted in Improving RRB Vectors with transience, namely Direct Append and Direct Tail.

I'd be willing for PR regarding this.

docs.rs failed to build im-12.0.0

Hi, I recently searched for the documentation of im. I reached to:
Google took me to this page: https://docs.rs/crate/im/12.0.0
which returns the following error:

docs.rs failed to build im-12.0.0
Please check build logs and if you believe this is docs.rs' fault, report into this issue report.

I don't know if it's a problem with docs.rs or with this repository.

I also wanted to say that im-rs really saved my life, it's one of my favourite crates. Thank you for your work!

Crash on Vector::remove()

fn main() {
    let mut x = im::Vector::new();
    for _ in 0..4100 {
        x.push_back(0);
    }
    x.remove(11);
    x.remove(11);
}

After the first call to remove(), inner_f is full (64 elements), and it crashes on the second remove(), see below.
I'm not yet sure what exactly needs to happen I'll try to figure out.

thread 'main' panicked at 'Chunk::push_front: can't push to full chunk', /home/johannes/.cargo/registry/src/github.com-1ecc6299db9ec823/sized-chunks-0.1.1/src/sized_chunk.rs:315:13
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
   1: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:70
   2: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:58
             at src/libstd/panicking.rs:200
   3: std::panicking::default_hook
             at src/libstd/panicking.rs:215
   4: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:478
   5: std::panicking::begin_panic
   6: im::nodes::rrb::Size::push
   7: <im::nodes::rrb::Node<A>>::push_chunk
   8: <im::vector::RRB<A>>::push_middle
   9: <im::vector::Vector<A>>::append
  10: <im::vector::Vector<A>>::remove
  11: r2::main
  12: std::rt::lang_start::{{closure}}
  13: std::panicking::try::do_call
             at src/libstd/rt.rs:49
             at src/libstd/panicking.rs:297
  14: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:87
  15: std::rt::lang_start_internal
             at src/libstd/panicking.rs:276
             at src/libstd/panic.rs:388
             at src/libstd/rt.rs:48
  16: main
  17: __libc_start_main
  18: _start

Weaken some Fn constraints to FnMut or FnOnce

Many of the methods that take Fn, such as HashMap::update, only call the given function once or in a single-threaded way. In these cases, the constraints can be weakened to FnOnce or FnMut respectively.

In Vector, get_unwrapped works but not the index operator.

Adding the following code to the src/vector.rs file and running cargo test, the first assert works but the second fails.

#[test]
fn index_operator() {

    let vec1 = vector![0, 1, 2, 3, 4, 5];
    let vec2 = vec1.push_front(0);
    assert_eq!(0, *vec2.get_unwrapped(0));
    assert_eq!(0, vec2[0]);

Getting an `Iter<T>` instead of `Iter<Arc<T>>` from collections

Hi,

Right now I'm using a ConsList in a single-threaded scenario to implement a builder pattern kind of like this. (I'm aware it might be a bit odd, but it seemed like a natural fit to cons into some type of list here).

    fn with(self, e: ElementType) -> Self {
        Self {
            elements: im::conslist::cons(e, self.elements),
        }
    }

However, later on I need to read the ConsList and I wanted to pass around an iterator over it's elements. As it's implemented now (and this could well be a side-effect of how this needs to be implemented - I didn't investigate yet) this turns into an Iterator<Arc> while it would be more natural & closer to std collections to have it return an Iterator; it's not a huge deal and transforming between the two is relatively straight forward

self.elements.iter().map(|x| (&*x).clone())

However, I was wondering if there's a way to do this without having to call clone.

Jasper

Empty collections allocate a surprising amount

In the standard library, the only collection that allocates when you call new() is VecDeque, which will allocate space for 7 elements.

By comparison, the collections in this library allocate a lot more. Using an element type of u64 for testing:

HashSet (and HashMap) will allocate a small amount for the hasher (which is typically a 0-size or Copy type) and 536 bytes of space
OrdSet allocates 1080 bytes
Vector allocates 544 bytes

I'm not sure if this is easily fixable without a performance penalty but it would be nice to not allocate at all.

HashMap failes to write closing brace in Debug impl for an empty map

extern crate im;

use im::HashMap;

fn main() {
    let hm: HashMap<(), ()> = HashMap::new();

    println!("hm: {:?}", hm);
}

This will write to stdout:

hm: {

While it is expected to write:

hm: { }

Add a comparison of data structures a la std::collections

This and this is what I mean.

Maybe also performance comparisons of data structures with std::collections equivalents.

If you're interested I'll write something up in the next couple of days :)

Improving performance for `Copy` keys and values

I wrote a few simple benchmarks to determine what, if anything, might be gained by storing keys and values directly in OrdMap rather than wrapping them in Arcs:

https://github.com/dicej/ordmap_performance/blob/master/src/lib.rs

(note that I had to hack the im-rs lib.rs to make the nodes module public in order to make the above work)

Here are the results on my machine:

test add_and_remove         ... bench:   2,394,205 ns/iter (+/- 78,129)
test add_and_remove_mut     ... bench:     449,903 ns/iter (+/- 54,218)
test add_and_remove_raw     ... bench:   1,259,173 ns/iter (+/- 64,433)
test add_and_remove_raw_mut ... bench:     257,425 ns/iter (+/- 16,412)

Considering that performance is roughly doubled by removing the Arcs, would it be appropriate to modify the APIs so that the Arc wrappers are optional?

String/rope support

Hi,
I'm using this library for a toy language I'm building.
I want my data structures to be all persistent/immutable.

Right now I'm using Vector and HashMap.
But I also want to do the same for strings.
Are there any plans to add some String implementation?
I see there is a rope branch with an initial implementation.

If you would add a String today would you still use a normal rope?
Doesn't it make more sense to use a RRB vector as underlying type?
I'm really interested in how you would implement it.
I would love to help implement it.

Kind regards,
Alexander

HashMap<String> remove requires a &String

This is not a new issue but I did not find one here: a &str cannot be passed to remove when the hash map key is a String. Only &String works which is a bit unfortunate.

RRB size table reports incorrect length

I found a trace where the size tables seem to behave strangely.

Reproduce it with this program, cargo run --release < trace should work where input file trace can be found here: https://gist.github.com/krobelus/b921d2f2cdde4b4b5101aba34fda87fe.
It crashes in the last line because the computed index is out of bounds.

use std::io::BufRead;
use std::str::FromStr;

fn main() {
    let mut x = im::Vector::new();
    let mut lines = std::io::BufReader::new(std::io::stdin()).lines();
    while let Some(line) = lines.next().map(|l| l.unwrap()) {
        let words: Vec<&str> = line.split(' ').collect();
        let arg = usize::from_str(words[1]).unwrap();
        match words[0] {
            "push" => {
                for _ in 0..arg {
                    x.push_back(0u32);
                }
            }
            "remove" => {
                x.remove(arg);
            }
            _ => unreachable!(),
        };
    }
    assert!(x.len() != 0);
    eprintln!("{}", x[x.len() - 1])
}

Below is a representation of the Vector x just before executing the last line of above program.
I observe that the length of the middle part is reported to be 8215 (the last offset in the size table, see Node::len()).
However, summing the individual lengths of the 3 nodes in the middle part gives a different result:
4093 + 4096 + 45 = 8234. This is the actual length of the middle part.
This diff is a workaround that would prevent the example at hand from crashing, but I think that we need to make the size table report the correct size instead - or are they fine like this? Thank you!

     pub fn len(&self) -> usize {
         match self.children {
             Entry::Nodes(Size::Size(size), _) => size,
-            Entry::Nodes(Size::Table(ref size_table), _) => *(size_table.last().unwrap_or(&0)),
+            Entry::Nodes(Size::Table(_), ref children) => {
+                children.iter().map(|child| child.len()).sum()
+            }
             Entry::Values(ref values) => values.len(),
             Entry::Empty => 0,
         }

length       8299
middle_level 2
outer_f      [length = 0]
inner_f      [length = 0]
middle:
Nodes [length = 3], Table(Chunk[4093, 8170, 8215])
    Nodes [length = 64], Size(4093)
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 61]
    Nodes [length = 64], Size(4096)
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
        Values [length = 64]
    Nodes [length = 1], Size(45)
        Values [length = 45]
inner_b      [length = 64]
outer_b      [length = 1]

Error: unexpected `?` when compiling with rustc 0.2.2

When compiling im 10.2.0 with cargo build, the compiler outputs the following error:

$> cargo build
Compiling im v10.2.0
error: unexpected `?`
   --> /Users/jivan/.cargo/registry/src/github.com-1ecc6299db9ec823/im-10.2.0/src/nodes/btree.rs:203:19
    |
203 |         BK: Ord + ?Sized,
    |                   ^

error: aborting due to previous error

error: Could not compile `im`.

Content of Cargo.toml:

(...)

[dependencies]
im = "10.2.0"

Using rustc version 0.2.2.

Is there any specific compiler flag needed?

Incorrect results from HashSet

See my kinda-long test case here

https://gist.github.com/winstonewert/3eb537310c70fc0631e64c5cee26a062

In case you aren't familiar with quickcheck, it runs random tests against your code. In this case, I apply random combinations of sets of with both im's code and the std library and check they agree. If you don't want to deal with quickcheck you can delete the references to it and focus on the more conventional test at the end. (Derived from the case that breaks the quickcheck test)

I'm using an alternative hasher. This isn't critical the problem but I wanted to eliminate the randomness in the default hasher to get consistent failures.

Split types into Rc and Arc backed ones

I was trying to upgrade to latest im but I'm a utility library that needs sync im types, but I do not want to be the fault for forcing sync im on all my users.

Would you be open to a patch to have Vector<T> being Rc based and SyncVector<T> being Arc based (and so forth)?

Time complexity of iter

What's the time complexity of the iter methods?
I think they should be added to the docs too

Suggestion: Interval/Segment trees?

I haven't seen too much prior art in the persistent sphere, but finger trees seem to be the way they go for it in haskell.

Loving your work here so far btw, thanks ever so much :)

Document ordmap::Entry

im::ordmap::Entry et al are missing documentation. Please fix.

Clone vs Rc for expensive items

Trying a bit more with switching older versions of im to the current one I'm trying to figure out if there are some general guidelines about how to deal with items that are more expensive to clone. In particular it's not entirely clear form the API surface currently when a clone actually happens on the item since internally everything is wrapped by a reference counter.

So for instance we have quite a few things that hold Strings and I wonder if it makes sense to wrap this in another layer of refcounting.

Code of Conduct link on project front page broken

If you go to immutable.rs and click 'Code of Conduct' you get a 404. If you click 'Source' and then click on the file, then of course you see it.

If this is too trivial to bother with, just close the issue.

Restrict clippy config to stable features?

I'm fairly new to the Rust ecosystem, so apologies if I'm missing something obvious. I just came across the clippy tool and thought it'd help me write more idiomatic Rust. Unfortunately, the stable version won't work because im-rc uses a scoped lint called clippy::type_complexity that apparently is experimental.

It seems odd to me that clippy would lint my dependencies anyway, so maybe the real issue lies there. Unfortunately, as it stands I can't run it in my project because clippy also aborts after encountering the errors from im-rc.

For completeness, the log with im-rc 12.2.0 looks like:

error[E0658]: scoped lint `clippy::type_complexity` is experimental (see issue #44690)                                                                                                                                                                                                    
   --> /home/nirvdrum/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.2.0/./src/lib.rs:337:45                                                                                                                                                                                    
    |                                                                                                                                                                                                                                                                                     
337 | #![cfg_attr(feature = "cargo-clippy", allow(clippy::type_complexity))]                                                                                                                                                                                                              
    |                                             ^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                          
error[E0658]: scoped lint `clippy::should_implement_trait` is experimental (see issue #44690)                                                                                                                                                                                             
   --> /home/nirvdrum/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.2.0/./src/vector/focus.rs:522:15                                                                                                                                                                           
    |                                                                                                                                                                                                                                                                                     
522 |         allow(clippy::should_implement_trait)                                                                                                                                                                                                                                       
    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                          
error: aborting due to 2 previous errors                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                          
For more information about this error, try `rustc --explain E0658`.                                                                                                                                                                                                                       
error: Could not compile `im-rc`.

My solution for now is to run clippy with a nightly toolchain.

Why does Debug require Clone ?

I'm looking at impl of Debug for HashMap.

https://github.com/bodil/im-rs/blob/master/src/hash/map.rs#L1558-L1559

Why do K & V need to have Clone in order to derive Debug for HashMap<K, V> ?

Error using OrdSet::diff

I've got the following code:

fn main()  {
    let a = im::OrdSet::new();
    let b = im::OrdSet::new();
    for change in a.diff(&b) {
        match change {
            im::ordset::DiffItem::Update => (),
            _ => ()
        }
    }
}

It produces the following error:

error[E0599]: no variant named `Update` found for type `im::nodes::btree::DiffItem<'_, _>` in the current scope
 --> src/main.rs:6:13                                                                         
  |                                                                                           
6 |             im::ordset::DiffItem::Update => (),                                           
  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ variant not found in `im::nodes::btree::DiffItem<'_, _>`
  |                                                                                           
  = note: did you mean `im::nodes::btree::DiffItem<'_, _>::Update`?                           
                                                                                              
error: aborting due to previous error

It looks like it see that Update is there, but somehow can't find it?

Index out of bounds on Vector::iter()

Below is a program that causes a crash.
The problem seems to have been introduced in 6d0b068.

fn main() {
    let mut x = im::Vector::new();
    for _ in 0..262 {
        x.push_back(0);
    }
    for _ in 0..97 {
        x.pop_front();
    }
    for &offset in &[160, 163, 160] {
        x.remove(offset);
    }
    for _ in 0..5 {
        x.pop_front();
    }
    for _ in 0..128 {
        x.push_back(0);
    }
    x.push_back(0);
    for _ in x.iter() {
    }
}

Panic in List of length > 4035

Consider the following program:

extern crate im_rc;

fn main() {
    let mut l = im_rc::Vector::singleton(0);
    for _ in 0..4036 {
      let mut tmp = im_rc::Vector::singleton(0);
      tmp.append(l);
      l = tmp;
    }
    let len = l.len();
    l.slice(1..len);
  }

I would expect this program to terminate without panicking, but instead I get the following panic:

thread 'main' panicked at 'attempt to subtract with overflow', /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.2.0/./src/nodes/rrb.rs:754:44
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::print
             at libstd/sys_common/backtrace.rs:71
             at libstd/sys_common/backtrace.rs:59
   2: std::panicking::default_hook::{{closure}}
             at libstd/panicking.rs:211
   3: std::panicking::default_hook
             at libstd/panicking.rs:227
   4: std::panicking::rust_panic_with_hook
             at libstd/panicking.rs:511
   5: std::panicking::continue_panic_fmt
             at libstd/panicking.rs:426
   6: rust_begin_unwind
             at libstd/panicking.rs:337
   7: core::panicking::panic_fmt
             at libcore/panicking.rs:92
   8: core::panicking::panic
             at libcore/panicking.rs:53
   9: <im_rc::nodes::rrb::Node<A>>::split
             at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.2.0/./src/nodes/rrb.rs:754
  10: <im_rc::vector::Vector<A>>::split_off
             at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.2.0/./src/vector/mod.rs:968
  11: <im_rc::vector::Vector<A>>::slice
             at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.2.0/./src/vector/mod.rs:1082
  12: rust::main
             at src/main.rs:11
  13: std::rt::lang_start::{{closure}}
             at /checkout/src/libstd/rt.rs:74
  14: std::panicking::try::do_call
             at libstd/rt.rs:59
             at libstd/panicking.rs:310
  15: __rust_maybe_catch_panic
             at libpanic_unwind/lib.rs:105
  16: std::rt::lang_start_internal
             at libstd/panicking.rs:289
             at libstd/panic.rs:392
             at libstd/rt.rs:58
  17: std::rt::lang_start
             at /checkout/src/libstd/rt.rs:74
  18: main
  19: __libc_start_main
  20: _start

Note that the program works if I reduce the length of the loop to 4035.

Surprising performance on `HashMap::update_with`

Hello!

Using this library, I ran in to HashMap::update_with being slower than I'd expected. I figured I'd post the demonstration case here. Perhaps this performance is expected, and I've just miss-understood the time complexity of update_with! Perhaps this will be unexpected to someone else as well.

I'm counting the frequency of integers in a long vector, building a HashMap<i32, i32>. My initial plan was to use an im::HashMap for this, because the API of update_with is so much nicer than doing this with a std::collections::HashMap. However, using an im::HashMap is slower than I was expecting. To demonstrate, I did a simple comparison, logging out the SystemTime::now().elapsed() after calling each of these functions:

use im::HashMap as ImHashMap;
use std::collections::HashMap as StdHashMap;

fn immutable_hash_map(v: &Vec<i32>) -> ImHashMap<i32, i32> {
    v.iter()
        .fold(ImHashMap::new(), |f, i| f.update_with(*i, 1, |o, n| o + n))
}

fn mutable_hash_map(v: &Vec<i32>) -> StdHashMap<i32, i32> {
    v.iter().fold(StdHashMap::new(), |mut f, i| {
        if f.contains_key(&i) {
            let v = f.get(&i).unwrap();
            f.insert(*i, v + 1);
            f
        } else {
            f.insert(*i, 1);
            f
        }
    })
}

Providing a Vec of a million i32s, the std::collections::HashMap version completes in about 185 miliseconds. The im::HashMap version, on the other hand, takes a touch over 5 seconds. Slow enough to catch me off guard.

Reading over the documentation, I do see that update_with has a time complexity of O(log n). Does this sound like the expected performance for this operation?

`Vector::chunks` is misleading

I expected to see Vec::chunks and Vec::chunks_exact, but for Vector; but for im's vector chunks mean something different.

Maybe Vector::chunks and Vector::chunks_mut should be renamed somehow?

Panic in Vector of length > 4099

See issue #55, but I am opening a new issue because I do not have the permissions to reopen the previous issue.

extern crate im_rc;

fn main() {
    let mut l = im_rc::Vector::singleton(0);
    for _ in 0..4099 {
      let mut tmp = im_rc::Vector::singleton(0);
      tmp.append(l);
      l = tmp;
    }
    let len = l.len();
    l.slice(1..len);
  }

Note this is the exact same program, but with a slightly higher number of iterations in the loop. The actual program I'm debugging iterates up to 7621 iterations, so you should probably continue to test up to substantially higher N this time.

The new backtrace:

thread 'hook_list::tests::test_long_list' panicked at 'called `Option::unwrap()` on a `None` value', libcore/option.rs:345:21
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::print
             at libstd/sys_common/backtrace.rs:71
             at libstd/sys_common/backtrace.rs:59
   2: std::panicking::default_hook::{{closure}}
             at libstd/panicking.rs:211
   3: std::panicking::default_hook
             at libstd/panicking.rs:221
   4: std::panicking::rust_panic_with_hook
             at libstd/panicking.rs:511
   5: std::panicking::continue_panic_fmt
             at libstd/panicking.rs:426
   6: rust_begin_unwind
             at libstd/panicking.rs:337
   7: core::panicking::panic_fmt
             at libcore/panicking.rs:92
   8: core::panicking::panic
             at libcore/panicking.rs:53
   9: <core::option::Option<T>>::unwrap
             at /checkout/src/libcore/macros.rs:20
  10: <im_rc::nodes::rrb::Node<A>>::len
             at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.3.0/./src/nodes/rrb.rs:343
  11: <im_rc::nodes::rrb::Node<A>>::split
             at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.3.0/./src/nodes/rrb.rs:722
  12: <im_rc::vector::Vector<A>>::split_off
             at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.3.0/./src/vector/mod.rs:1043
  13: <im_rc::vector::Vector<A>>::slice
             at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.3.0/./src/vector/mod.rs:1157
  14: datastructures::hook_list::tests::test_long_list
             at src/hook_list.rs:521
  15: datastructures::__test::TESTS::{{closure}}
             at src/hook_list.rs:513
  16: core::ops::function::FnOnce::call_once
             at /checkout/src/libcore/ops/function.rs:223
  17: <F as alloc::boxed::FnBox<A>>::call_box
             at libtest/lib.rs:1451
             at /checkout/src/libcore/ops/function.rs:223
             at /checkout/src/liballoc/boxed.rs:640
  18: __rust_maybe_catch_panic
             at libpanic_unwind/lib.rs:105

I also got another backtrace with the same error message that is probably related, but I haven't had a chance to construct a minimal example yet; I include it here in the hopes that it might give you some insight or that it might lead your attention to another possible location where a bug of this type might be occurring:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff599d801 in __GI_abort () at abort.c:79
#2  0x00000000009acaf6 in panic_abort::__rust_start_panic::abort () at libpanic_abort/lib.rs:61
#3  0x00000000009acae6 in __rust_start_panic () at libpanic_abort/lib.rs:56
#4  0x0000000000966a98 in rust_panic () at libstd/panicking.rs:559
#5  0x00000000009669b4 in std::panicking::rust_panic_with_hook () at libstd/panicking.rs:531
#6  0x00000000009666ba in std::panicking::continue_panic_fmt () at libstd/panicking.rs:426
#7  0x00000000009663d6 in rust_begin_unwind () at libstd/panicking.rs:337
#8  0x000000000095360c in core::panicking::panic_fmt () at libcore/panicking.rs:92
#9  0x000000000095353b in core::panicking::panic () at libcore/panicking.rs:53
#10 0x000000000090a4c9 in <core::option::Option<T>>::unwrap (self=...) at /checkout/src/libcore/macros.rs:20
#11 0x00000000008fb8af in <im_rc::nodes::rrb::Node<A>>::lookup_chunk (self=0x7fff0e604658, level=1, base=0, index=3200)
    at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.3.0/./src/nodes/rrb.rs:476
#12 0x00000000008fb9e8 in <im_rc::nodes::rrb::Node<A>>::lookup_chunk (self=0x7fff0e603b28, level=2, base=0, index=3200)
    at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.3.0/./src/nodes/rrb.rs:481
#13 0x0000000000928ab3 in <im_rc::vector::focus::TreeFocus<A>>::set_focus (self=0x7fffffff70b8, index=3218)
    at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.3.0/./src/vector/focus.rs:365
#14 0x0000000000928446 in <im_rc::vector::focus::TreeFocus<A>>::get (self=0x7fffffff70b8, index=3218)
    at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.3.0/./src/vector/focus.rs:386
#15 0x0000000000928c0d in <im_rc::vector::focus::Focus<'a, A>>::get (self=0x7fffffff70b0, index=3218)
    at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.3.0/./src/vector/focus.rs:129
#16 0x00000000008f1f78 in <im_rc::vector::Iter<'a, A> as core::iter::iterator::Iterator>::next (self=0x7fffffff70b0)
    at /home/dwightguth/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.3.0/./src/vector/mod.rs:1858

Provide `.clear` method to match API of `std::collections`

It would be nice to have https://doc.rust-lang.org/std/collections/struct.HashMap.html#method.clear in im as well!

Cache locality (comparison with Immer)

Hi!

First of all: this is amazing work. I was long waiting for someone to go ahead and do this. The code looks very good and you also seem to be very fast in making progress! 😄

I myself have been doing some work on bringing Clojure-like data-structures to system languages, but in the ugly land of C++, which resulted in the Immer library and this paper. One of the things that took most effort there (and brought most complexity) is to try achieve the most cache locality possible. In particular, vector elements are not boxed individually but live in the same array block. Studying your code, it seems to me that you are boxing the elements individually, and it seems to me that it would be impossible to do otherwise without adding ridiculous amounts of unsafe {} code. Do you have some thoughts on that?

Also, do you plan to implement catenable vectors (Relaxed Radix Balanced trees)?

Once again, congrats! I would be interested also in trying to do some benchmarking between the libraries. Let me know eventually if you would like to work together on that to try to find useful and fair benchmarks!

Cheers!

List::uncons_back missing?

List says, in its module docs, that it supports append/remove operations on both ends. It certainly supports bidirectional appends/conses, but I went looking for pop_front and pop_back equivalents; the first is roughly uncons, and the second is... nowhere to be found? I didn't see a head_back/tail_back or last/init pairing either. Is this intentional?

Add methods to Vector that set/add an element, then return a mutable reference to it.

I'm trying to add an entry-like api to PGraph (for edges). I need to use Vector::set, then mutably borrow the new value.

vector.set(index, new_value);
let result = vector.get_mut(index).unwrap();

If I understand correctly, this is messing with Option even though we'd have already panicked if the set was out-of-bounds and it's traversing the RRB Tree an extra time. It would be nice for Vector to have some functions like these (not necessarily with these names), so that I modify the vector and borrow the modified element at the same time.

fn set_mut(&mut self, index: usize, value: A) -> (A, &mut A)
fn push_*_mut(&mut self, value: A) -> &mut A
fn insert_mut(&mut self, index: usize, value: A) -> &mut A

Incorrect and inconsistent behaviour using HashSet::into_iter

The following code occasionally fails:

    for i in 0..10000 {
        let mut lhs = vec![604, 986, 436, 379, 59, 423, 391, 434, 375, 486, 64, 851, 631, 89];
        lhs.sort();

        let mut iset = im::HashSet::new();
        for &i in &lhs {
            iset.insert(i);
        }

        let mut rhs: Vec<_> = iset.clone().into_iter().collect();
        rhs.sort();

        if lhs != rhs {
            println!("iteration: {}", i);
            println!("lhs: {:?}", &lhs);
            println!("rhs: {:?}", &rhs);
            panic!();
        }
    }

It seems values are sometimes skipped in the IntoInterator implementation of HashSet.

The behaviour is not consistent but 10000 iterations is usually enough to reproduce it, outputting something like

iteration: 3370
lhs: [59, 64, 89, 375, 379, 391, 423, 434, 436, 486, 604, 631, 851, 986]
rhs: [64, 375]

Clippy broken on stable

Running cargo clippy on stable is now generates this error:

error[E0554]: #![feature] may not be used on the stable release channel
   --> ~/.cargo/registry/src/github.com-1ecc6299db9ec823/im-12.1.0/./src/lib.rs:260:39
    |
260 | #![cfg_attr(feature = "cargo-clippy", feature(tool_lints))]
    |                                       ^^^^^^^^^^^^^^^^^^^^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0554`.
error: Could not compile `im`.

Compilation Fails

I'm unable to install cargo-vendor apparently due to problems with im-rd. Something to do with "clippy::should_implement_trait" seemingly referencing: rust-lang/rust#44690

cargo install cargo-vendor
...
error[E0658]: scoped lint `clippy::should_implement_trait` is experimental (see issue #44690)
   --> /home/pass/.cargo/registry/src/github.com-1ecc6299db9ec823/im-rc-12.3.0/./src/vector/focus.rs:524:13
    |
524 |     #[allow(clippy::should_implement_trait)] // would if I could
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0658`.
error: Could not compile `im-rc`.
warning: build failed, waiting for other jobs to finish...
error: failed to compile `cargo-vendor v0.1.23`, intermediate artifacts can be found at `/tmp/cargo-installTbG18U`

Add an entry api

Hi,

So I know that there is a new release in the works, so this may be out of date. But can we add an Entry API like std has? Specifically I have casch of hard to compute values in the form cache.entry(id).or_insert_with(|| slow_calculation()), and I don't see a way to do the same in this lib. I'd like to avoid hashing Id more than once if possible, and need to avoid pre computing slow_calculation making insert_* unhelpful. Or did I miss something that will allow this functionality with the existing methods?

Consider adding a BinaryHeap

Would you be interested in adding a BinaryHeap to im-rs?

I don't know much about persistent data structure research so there may be more relevant algorithms out there but https://arxiv.org/pdf/1312.4666.pdf may be a helpful read.

A more flexible way to iterable over an OrdMap

Hi,

Is there a way to iterate backwards on OrdMap efficiently?
Is there a way to set a starting point for iteration over OrdMap (perhaps even non-existing key) efficiently?

Great project, BTW!

Question: License

This is a very exciting library you've made here. Looking at bit at the MPL license seems like it would be nearly impossible for many people to use this for work. I can't speak for everyone, but at least at my job we can basically only use the more permissive BSD/MIT/Apache licenses. Most of the Rust ecosystem uses a combination of MIT and Apache which makes it a lot easier to use in a work environment.

I'm not trying to pressure you to change it or anything, but I'm just curious if there's a particular reason you chose MPL and if you had any attachment to it.

Either way, great work on this library!

Bump sized-chunks dependency

cargo test shows one failure (a proptest I believe) but when I use latest sized-chunks it works fine

Assertion failure in rrb::Size::pop

This program causes an assertion failure

fn main() {
    let mut x = im::Vector::new();
    for _ in 0..44 { x.push_back(0); }
    for _ in 0..20 { x.insert(0, 0); }
    x.insert(1, 0);
    for _ in 0..441 { x.push_back(0); }
    for _ in 0..58 { x.insert(0, 0); }
    x.insert(514, 0);
    for _ in 0..73 { x.push_back(0); }
    for _ in 0..10 { x.insert(0, 10); }
    x.insert(514, 0);
}

Here's what the tree looks like before the last insertion. It seems that the split is performed at the wrong chunk.
When we do x.insert(514, 0), we need to split at index 514 which corresponds to 514 - 4 - 64 = 446 within middle. Hence the split should be done in the last but one node, but it seems like the one before is used.

Should the middle be using a size table here?

length       648
middle_level 1
outer_f      [length = 4]
inner_f      [length = 64]
middle:
Nodes [length = 9], Size(507)
    Values [length = 2]
    Values [length = 64]
    Values [length = 64]
    Values [length = 64]
    Values [length = 64]
    Values [length = 64]
    Values [length = 64]
    Values [length = 64]
    Values [length = 57]
inner_b      [length = 64]
outer_b      [length = 9]

Clone is specified where it not required

When trying to derive PartialEq on a generic type which used an im-rs data structure I got an error about a missing Clone bound as it appears that Clone is specified pervasively on im-rs's impl's.

However, from reading the source and pulling out the minimally necessary methods to remove the bound for PartialEq for Vector<T> this should only be necessary on mutating methods, any queries should be able to work without the Clone bound.

// ERROR Clone required
#[derive(PartialEq)]
struct Test<T>(im::Vector<T>);

Docs links is broken

https://docs.rs/im/12.0.0/im/ is a 404.

Apparently docs.rs failed to build it? https://docs.rs/crate/im/12.0.0

publicise btree?

The btree nodes would be quite useful for another project i've got on. Is there any chance they could be made public?

Congrats on your recent inclusion in cargo, btw.

Immutable Vector

Are there any plans to add a clojure-esque immutable vector datatype? If so, what needs to be done. If not, are there any major blockers?

Arc and Iter.map

As explained in the documentation, the fact that Vector<T> wraps elements in Arcs makes it a little harder than with standard Vec<T> to iterate over in a functional way.

Therefore, I was wondering what an idiomatic use of Iter.map could look like:

extern crate im;

use std::iter::FromIterator;
use std::sync::Arc;

use im::*;

fn double(i: u64) -> u64 {
    return 2 * i;
}

fn vec() {
    let vec = vec![1, 2, 3];
    let vec_double = Vec::from_iter(vec.into_iter().map(double));
    println!("{:?}", vec_double);
}

fn vector() {
    let vec = vector![1, 2, 3];
    let vec_double = Vector::from_iter(vec.iter().map(|arc| {
        let i = Arc::try_unwrap(arc).unwrap_err();
        return double(*i);
    }));
    println!("{:?}", vec_double);
}

fn main() {
    vec();
    vector();
}

Can the Vector<T> example be improved? If this pattern is useful, perhaps it could be made easier by the library?

Accessing head and tail requires an Arc clone

So, I've been investigating ways to change the API of ConsList so that selector methods methods don't have to clone any Arcs. You can see the results of my experiment thus far here. The main changes are:

ConsList::head returns &Arc<A> instead of Arc<A>.
ConsList::tail return &ConsList<A> instead of ConsList<A>.
ConsList::uncons and ConsList::uncons2 are similarly updated.
A method ConsList::into_cons that takes the cdr by value rather than reference, to avoid cloning, is added.
into_cons is used where possible, e.g., in append, reverse, from_iter.

The overall result was a significant speedup to most benchmarks and a tiny slowdown in the append benchmark. The reason for the slowdown, I believe, is that to make this work, I had to revert the representation change that takes the size out of each node. (However, I kept the Option-instead-of-bespoke-enum aspect of the change.)

In order to see the changes, I added three new benchmarks first:

conslist_sum sums up a ConsList<u32> using an iterator.
conslist_sum_no_iter does the same thing, but with manual iteration instead of an iterator.
conslist_reverse reverses a list.

Running these benchmarks on the unchanged code, here's the result:

test conslist_append      ... bench:  96,154,736 ns/iter (+/- 9,114,543)
test conslist_cons        ... bench:     216,257 ns/iter (+/- 42,042)
test conslist_reverse     ... bench:     218,570 ns/iter (+/- 10,463)
test conslist_sum         ... bench:      94,835 ns/iter (+/- 6,432)
test conslist_sum_no_iter ... bench:      49,140 ns/iter (+/- 3,601)
test conslist_uncons      ... bench:      20,860 ns/iter (+/- 1,138)

Here's the final result, after all the changes:

test conslist_append      ... bench: 107,298,926 ns/iter (+/- 6,940,051)
test conslist_cons        ... bench:     212,312 ns/iter (+/- 14,665)
test conslist_reverse     ... bench:     153,976 ns/iter (+/- 9,101)
test conslist_sum         ... bench:      49,745 ns/iter (+/- 4,994)
test conslist_sum_no_iter ... bench:       4,535 ns/iter (+/- 722)
test conslist_uncons      ... bench:       2,974 ns/iter (+/- 502)

As you can see, most benchmarks have improved, especially sum_no_iter (10x) and uncons (7x)!

I haven't submitted a pull request, because this is a major API change and I assume you don't want it now/as-is. Note that it would be possible to get the speed benefits in a backward compatible way by adding separate head_ref/tail_ref/uncons_ref methods and leaving the old methods alone. If you want me to work that up, just say so.

Sorry that I'm putting all this effort into what I assume you consider your least interesting data structure. It's the only one I understand yet, but I'm wondering whether similar optimizations could apply to CatList.