So, I've been investigating ways to change the API of ConsList
so that selector methods methods don't have to clone any Arc
s. You can see the results of my experiment thus far here. The main changes are:
ConsList::head
returns &Arc<A>
instead of Arc<A>
.
ConsList::tail
return &ConsList<A>
instead of ConsList<A>
.
ConsList::uncons
and ConsList::uncons2
are similarly updated.
- A method
ConsList::into_cons
that takes the cdr by value rather than reference, to avoid cloning, is added.
into_cons
is used where possible, e.g., in append
, reverse
, from_iter
.
The overall result was a significant speedup to most benchmarks and a tiny slowdown in the append benchmark. The reason for the slowdown, I believe, is that to make this work, I had to revert the representation change that takes the size out of each node. (However, I kept the Option
-instead-of-bespoke-enum
aspect of the change.)
In order to see the changes, I added three new benchmarks first:
Running these benchmarks on the unchanged code, here's the result:
test conslist_append ... bench: 96,154,736 ns/iter (+/- 9,114,543)
test conslist_cons ... bench: 216,257 ns/iter (+/- 42,042)
test conslist_reverse ... bench: 218,570 ns/iter (+/- 10,463)
test conslist_sum ... bench: 94,835 ns/iter (+/- 6,432)
test conslist_sum_no_iter ... bench: 49,140 ns/iter (+/- 3,601)
test conslist_uncons ... bench: 20,860 ns/iter (+/- 1,138)
Here's the final result, after all the changes:
test conslist_append ... bench: 107,298,926 ns/iter (+/- 6,940,051)
test conslist_cons ... bench: 212,312 ns/iter (+/- 14,665)
test conslist_reverse ... bench: 153,976 ns/iter (+/- 9,101)
test conslist_sum ... bench: 49,745 ns/iter (+/- 4,994)
test conslist_sum_no_iter ... bench: 4,535 ns/iter (+/- 722)
test conslist_uncons ... bench: 2,974 ns/iter (+/- 502)
As you can see, most benchmarks have improved, especially sum_no_iter
(10x) and uncons
(7x)!
I haven't submitted a pull request, because this is a major API change and I assume you don't want it now/as-is. Note that it would be possible to get the speed benefits in a backward compatible way by adding separate head_ref
/tail_ref
/uncons_ref
methods and leaving the old methods alone. If you want me to work that up, just say so.
Sorry that I'm putting all this effort into what I assume you consider your least interesting data structure. It's the only one I understand yet, but I'm wondering whether similar optimizations could apply to CatList
.