scala / scala-collection-contrib
community-contributed additions to the Scala 2.13 collections
License: Apache License 2.0
ImmutableBuilder is currently being used as the default builder implementation for:
- immutable.MultiDict
- immutable.SortedMultiDict
- immutable.MultiSet
- immutable.SortedMultiSet

However, for the sake of performance, it should be possible to build mutating versions of these, which would mostly delegate to the already-implemented mutating builders of Maps in the standard library (i.e. scala.collection.immutable.{MapBuilderImpl, HashMapBuilder}).
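As a rough illustration of the idea, a mutating builder can accumulate into a mutable map and convert to the immutable representation once at the end. This is a minimal sketch with assumed names (MutatingMultiDictBuilder is hypothetical), not the contrib module's actual builder API:

```scala
import scala.collection.mutable

// Hypothetical sketch: accumulate into a mutable Map, convert once in result().
// This avoids rebuilding an immutable structure on every addOne call.
class MutatingMultiDictBuilder[K, V] {
  private val elems = mutable.Map.empty[K, Set[V]]

  def addOne(kv: (K, V)): this.type = {
    val (k, v) = kv
    elems.updateWith(k) {
      case None     => Some(Set(v))
      case Some(vs) => Some(vs + v)
    }
    this
  }

  // Convert to the immutable representation once, at the end
  def result(): Map[K, Set[V]] = elems.toMap
}
```

The single toMap conversion at the end is where the savings come from, compared with producing a fresh immutable structure per element.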
I would love to have permutations on Sets.
E.g. Set(1,2,3).permutations == Iterator(Set(1,2,3), Set(2,3), Set(1,3), Set(1,2), Set(1), Set(2), Set(3), Set.empty)
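For reference, the standard library already exposes a powerset iterator via subsets(), which produces the same family of sets as the example above (iteration order may differ):

```scala
// Demonstrating the stdlib's existing powerset iterator on immutable Sets
object SubsetsDemo {
  val all: List[Set[Int]] = Set(1, 2, 3).subsets().toList

  def main(args: Array[String]): Unit = {
    assert(all.size == 8)                 // 2^3 subsets
    assert(all.contains(Set(1, 3)))       // includes each 2-element subset
    assert(all.contains(Set.empty[Int]))  // and the empty set
    println("ok")
  }
}
```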
as per scala-lang.org/blog/2021/02/16/preventing-version-conflicts-with-versionscheme.html and discussion at scala/sbt-scala-module#111
Backwards-only is fine, and I think we could give ourselves permission to be liberal about bumping the version number as needed, in order not to be too saddled by this.
@eed3si9n can I interest you in waving the same wand over this repo that you waved over the other modules?
Bag is a lot less verbose than MultiSet, and MultiDict is also pretty awkward to try and say aloud.
The add method is defined as follows:

def add(key: K, value: V): MultiDict[K, V] =
  new MultiDict(elems.updatedWith(key) {
    case None => Some(Set(value))
    case Some(vs) => Some(vs + value)
  })
but elems.updatedWith could easily return the same map if the value is already in the Set[V].
It would help if the signature made sure that the same object would be returned if nothing changed, as that would allow one to reduce unnecessary object creation (both in the call, and also in the creation of objects in the calling object). Otherwise one ends up creating a lot of garbage.
Something like this seems better:

def add(key: K, value: V): MultiDict[K, V] =
  val newElems = elems.updatedWith(key) {
    case None => Some(Set(value))
    case Some(vs) => Some(vs + value) // if vs + value == vs then this will return the original MD I believe
  }
  if newElems eq this.elems then this
  else new MultiDict(newElems)
That is how incl works for HashSet in the Scala standard library.
@julienrf can I interest you in doing some publicity on this?
I think it may make the most sense to rename MultiDict to SetMultiMap, and potentially make MultiMap into a generic trait.
- MultiMap is the common name elsewhere (Guava has ArrayListMultimap, ListMultimap, SetMultimap, etc.; the C++ STL is std::multimap; Rust is MultiMap; Apache Commons is MultiMap; etc.)
- The current implementation is tied to Set, but there is no way to extend it to use an Array, Vector or List instead.
- The implementation refers to itself as MultiMap in various locations (like in the hashcode!), and the CC[_] is MapFactory (e.g. not DictFactory).
- Nothing is called a "dict" rather than a Map in any Scala code; only the Java converters mention a dictionary. MultiDict isn't Scala-idiomatic.

Considering the distinction of a SortedMultiDict vs a MultiDict is the sorting, it seems reasonable to me that the returned values should also retain the relative ordering in which they were added to the collection.
As a reference, this is how the type signature is in Guava: https://guava.dev/releases/19.0/api/docs/index.html?com/google/common/collect/ListMultimap.html
Sometimes I work with homogeneous tuples, which I use more like a collection with a fixed size. When working in this style, I often define a map extension for tuples. Would something like this be worth adding here?
Example:
implicit class AnyTuple2Ops[T](v: (T, T)) {
def map[X](f: T => X): (X, X) = (f(v._1), f(v._2))
}
implicit class AnyTuple3Ops[T](v: (T, T, T)) {
def map[X](f: T => X): (X, X, X) = (f(v._1), f(v._2), f(v._3))
}
implicit class AnyTuple4Ops[T](v: (T, T, T, T)) {
def map[X](f: T => X): (X, X, X, X) = (f(v._1), f(v._2), f(v._3), f(v._4))
}
Other operations in similar style can be added, like:
implicit class AnyTuple2Ops[T](v: (T, T)) {
def combine(that: (T, T))(by: (T, T) => T): (T, T) = (by(v._1, that._1), by(v._2, that._2))
}
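A self-contained sketch combining the proposed map and combine extensions, with usage (the object name TupleOpsDemo is just for the demo):

```scala
object TupleOpsDemo {
  // The proposed extensions from above, for the 2-tuple case
  implicit class AnyTuple2Ops[T](v: (T, T)) {
    def map[X](f: T => X): (X, X) = (f(v._1), f(v._2))
    def combine(that: (T, T))(by: (T, T) => T): (T, T) =
      (by(v._1, that._1), by(v._2, that._2))
  }

  def demoMap: (Int, Int)     = (1, 2).map(_ * 10)           // element-wise map
  def demoCombine: (Int, Int) = (1, 2).combine((3, 4))(_ + _) // element-wise zip-with

  def main(args: Array[String]): Unit = {
    assert(demoMap == (10, 20))
    assert(demoCombine == (4, 6))
    println("ok")
  }
}
```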
All tests pass when using Scala 2.13.0. Does this mean that this project is ready to be released?
0.3.0
This is a Scala worksheet, so assertions are verified by eye.
Using mergeByKey
for convenience, knowing it delegates to mergeByKeyWith
.
import scala.collection.decorators.mapDecorator
val arthur = "arthur.txt"
val tyson = "tyson.txt"
val sandra = "sandra.txt"
val allKeys = Set(arthur, tyson, sandra)
val sharedValue = 1
val ourChanges = Map(
(
arthur,
sharedValue
),
(
tyson,
2
)
)
val theirChanges = Map(
(
arthur,
sharedValue
),
(
sandra,
3
)
)
ourChanges -> theirChanges
ourChanges.mergeByKey(theirChanges)
// Expect all the keys to appear in an outer join, and they do, good...
ourChanges.mergeByKey(theirChanges).keys == allKeys
theirChanges.mergeByKey(ourChanges)
// Expect all the keys to appear in an outer join, and they do, good...
theirChanges.mergeByKey(ourChanges).keys == allKeys
// Expect the same associated values to appear in the join taken either way around, albeit swapped around and not necessarily in the same key order. They are, good...
ourChanges
.mergeByKey(theirChanges)
.values
.map(_.swap)
.toList
.sorted
.sameElements(theirChanges.mergeByKey(ourChanges).values.toList.sorted)
// Expect these to be equal, and they are, good...
ourChanges.mergeByKey(theirChanges).keySet == theirChanges
.mergeByKey(ourChanges)
.keys
val theirChangesRedux = Map(
(
arthur,
sharedValue
),
(
sandra,
sharedValue // <<<<------- Ahem!
)
)
ourChanges -> theirChangesRedux
ourChanges.mergeByKey(theirChangesRedux)
// Expect all the keys to appear in an outer join, but they don't...
ourChanges.mergeByKey(theirChangesRedux).keys == allKeys
theirChangesRedux.mergeByKey(ourChanges)
// Expect all the keys to appear in an outer join, and they do, good...
theirChangesRedux.mergeByKey(ourChanges).keys == allKeys
// Expect the same associated values to appear in the join taken either way around, albeit swapped around and not necessarily in the same key order. They aren't...
ourChanges
.mergeByKey(theirChangesRedux)
.values
.map(_.swap)
.toList
.sorted
.sameElements(theirChangesRedux.mergeByKey(ourChanges).values.toList.sorted)
// Expect these to be equal, but they aren't...
ourChanges.mergeByKey(theirChangesRedux).keySet == theirChangesRedux
.mergeByKey(ourChanges)
.keys
The Scaladoc for mergeByKey
states:
Perform a full outer join of this and that.
Equivalent to mergeByKeyWith(that) { case any => any }.
So all the keys from ourChanges
and theirChanges
should appear in the resulting map, albeit not necessarily in the same order if the specialised map implementations are created (which they are in the example above). The corresponding values from one or both of ourChanges
and theirChanges
should appear in the associated pairs, wrapped in Some
.
ourChanges.mergeByKey(theirChangesRedux)
drops key sandra
, whereas theirChangesRedux.mergeByKey(ourChanges)
preserves all keys.
There is a set, traversed, that is used to prevent processing duplicate keys when switching from iterating over coll to iterating over other. It is, however, typed as a set over the value type W of other, and is populated with values, not keys.
In the example above, the use of a shared value triggers the problem, causing the entry for key sandra
to be dropped.
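If the diagnosis is right, the fix is to track keys rather than values. The following is a standalone sketch of the intended full-outer-join behaviour (names and shapes are assumptions for illustration, not the decorator's actual internals):

```scala
import scala.collection.mutable

// Sketch of a full outer join that tracks *keys* already traversed.
// Tracking values instead (as the current code appears to) drops keys
// whose values collide with ones already seen.
object MergeByKeyFix {
  def fullOuterJoin[K, V, W](coll: Map[K, V], other: Map[K, W]): Map[K, (Option[V], Option[W])] = {
    val traversed = mutable.Set.empty[K] // keys, not values of type W
    val builder   = Map.newBuilder[K, (Option[V], Option[W])]
    for ((k, v) <- coll) {
      traversed += k
      builder += k -> (Some(v), other.get(k))
    }
    for ((k, w) <- other if !traversed(k))
      builder += k -> (None, Some(w))
    builder.result()
  }
}
```

With the worksheet's data, the shared value no longer causes the sandra entry to disappear, because key identity, not value identity, gates the second pass.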
I am assuming this was a typo and not the desired behaviour.
If my diagnosis is correct, I can submit a PR with a proper test and the change in implementation to make it pass.
I'll wait until others chime in in case I've misinterpreted the original intent...
It would be nice to have a function similar to SQL's BETWEEN operator that can filter elements of a Seq that fall within some range. This is not an uncommon task when working with statistics, time series, location data, etc.
We can implement it as an extension for Seq, roughly like this:
extension [T: Ordering](seq: Seq[T])
def between(a: T, b: T): Seq[T] = seq.filter(e => Ordering[T].gt(e, a) && Ordering[T].lt(e, b))
and then use it like so:
Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).between(4, 7) // will result in Seq(5, 6)
We could also make an extension that would allow filtering sequences of arbitrary product types that don't have an ordering, but have some fields that we can use for comparison, similar to sortBy
function, for example:
extension [T](seq: Seq[T])
def betweenBy[U: Ordering](a: U, b: U)(f: T => U): Seq[T] = seq.filter { e =>
Ordering[U].gt(f(e), a) && Ordering[U].lt(f(e), b)
}
locationDataList.betweenBy(hourAgoInstant, Instant.now)(_.timestamp)
IMHO this is much more intuitive, less prone to errors, and nicer to read than using a simple filter function.
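For concreteness, the same logic can be sketched as a plain method (Scala 2-compatible, no extension syntax); note the bounds here are exclusive, unlike SQL's inclusive BETWEEN. This is a sketch of the idea, not a final API:

```scala
object BetweenDemo {
  // Sketch of the proposed between, written as a plain method for portability.
  // Bounds are exclusive, matching the extension shown above.
  def between[T](seq: Seq[T], a: T, b: T)(implicit ord: Ordering[T]): Seq[T] =
    seq.filter(e => ord.gt(e, a) && ord.lt(e, b))

  def main(args: Array[String]): Unit = {
    assert(between(Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 4, 7) == Seq(5, 6))
    println("ok")
  }
}
```

Whether the final operation should be inclusive (to match SQL) or exclusive is one of the design points worth settling before adding it.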
P.S. naming is debatable here because I also can suggest using between
on numeric types to check if they are in some range, e.g. 1.between(-1, 5) // returns true
that would be useful addition too, but might cause confusion if paired with between
on Sequences.
Hi, I've recently started using mutable.MultiDict
, but I often found myself using this pattern.
iterable.foreach(v => dict.addOne(k -> v))
Could we add a function to add many values all at once for a single key instead of one at a time?
I was thinking of something like the following:
// mutable
def addMany(key: K, values: Iterable[V]): this.type = {
  elems.updateWith(key) {
    case None => Some(mutable.Set.from(values))
    case Some(vs) => Some(vs ++= values)
  }
  this
}
// immutable
def addMany(key: K, values: Iterable[V]): MultiDict[K, V] =
  new MultiDict(elems.updatedWith(key) {
    case None => Some(Set.from(values))
    case Some(vs) => Some(vs ++ values)
  })
We might also want to skip updates if the iterables are empty.
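A standalone sketch of the immutable variant over a plain Map[K, Set[V]], including the empty-input short-circuit suggested above (names are illustrative, not the contrib API):

```scala
object AddManyDemo {
  // Sketch of the proposed immutable addMany over a plain Map[K, Set[V]]
  def addMany[K, V](elems: Map[K, Set[V]], key: K, values: Iterable[V]): Map[K, Set[V]] =
    if (values.isEmpty) elems // skip the update entirely for empty input
    else elems.updatedWith(key) {
      case None     => Some(Set.from(values))
      case Some(vs) => Some(vs ++ values)
    }

  def main(args: Array[String]): Unit = {
    val m0 = Map("a" -> Set(1))
    val m1 = addMany(m0, "a", List(2, 3))
    assert(m1("a") == Set(1, 2, 3))
    assert(addMany(m1, "b", Nil) eq m1) // empty input returns the same map
    println("ok")
  }
}
```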
We should reuse the property-based tests written in the scalacheck
sub-project. We will have to adapt the build a bit because it is currently not cross-compatible with dotty and Scala.js.
(Scala Native is out of scope until they have Scala 2.13 support)
In order to increase compatibility with JDK versions that use modules, consider adding an Automatic-Module-Name entry in the MANIFEST.MF along the lines of akka/akka#23960, or perhaps even properly implement the module-info descriptor.
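In sbt this is typically a one-line manifest addition; a hedged build.sbt sketch, where the module name "scala.collection.contrib" is an assumption, not a decided value:

```scala
// build.sbt sketch — the chosen module name is an assumption for illustration
Compile / packageBin / packageOptions +=
  Package.ManifestAttributes("Automatic-Module-Name" -> "scala.collection.contrib")
```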
Hi!
I believe it would be nice to have a -- method to remove multiple elements from the collection. It looks like it was missed.
Would it be possible to get a new release pushed out for the Scala Native support? Thanks.
This library supports scala-js
via cross compilation, is there any chance it can support scala-native-0.4.x
as well, or is there some blocker to it?
scala-collection-contrib_2.13-0.2.0.jar is missing .class files (it only has a .properties file and the manifest).
If scala/scala#6674 gets merged, we should remove the HasXxxOps
types and use the equivalent types from the stdlib.
volunteer?
it probably works with withDottyCompat
, but it's better to have a "real" Scala 3 artifact
It would be good to make these classes available in scala.js.
The iteration order when iterating over a MultiDict (e.g., with map or values) is not specified anywhere in the documentation that I can find.
I suspect that the order is implementation-dependent, which is fine. If that is the case, it should be documented.
In my current use case, I am hoping that the iteration order is the insertion order. Since I don't know the order, I don't know if I can use a MultiDict[A, B]
or if I need to work with a Map[A, List[B]]
which would be more cumbersome.
Is it possible to link to the library's scaladoc at the top of the README? Would also be great if specific new types and operations in the README could link to their specific scaladoc.
the problem is split packages. if this code wasn't under scala.collection, we'd be fine. see #51 (and linked tickets) for the gory details
we (the core Scala team) don't plan to tackle this ourselves, but we're open to receiving a pull request on it
comment removed from build.sbt:
// TODO: osgi settings; not trivial because of split packages.
// See https://github.com/scala/scala-collection-compat/pull/226
// and https://github.com/scala/scala-collection-contrib/issues/51
// (and issues linked from 51)
// OsgiKeys.exportPackage := Seq(s"scala.collection.*;version=${version.value}"),
it's tagged but not published; the staging repos wouldn't close. I'll come back to it
Add bitwise shift (<< and >>) operations to the BitSet collection.
Motivation: C++ has shift operators on std::bitset: https://en.cppreference.com/w/cpp/utility/bitset/operator_ltltgtgt
Note: << can be emulated with .map(_ + shiftBy) and >> with .collect { case b if b >= shiftBy => b - shiftBy }, but the performance difference is two orders of magnitude.
I have some existing code for it, will be able to submit a PR soon.
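The emulations from the note above, made concrete as standalone helpers (these are the slow workarounds being replaced, not the proposed implementation):

```scala
import scala.collection.immutable.BitSet

object BitSetShiftDemo {
  // Emulation of << : shift every set bit up by shiftBy
  def shiftLeft(bs: BitSet, shiftBy: Int): BitSet =
    bs.map(_ + shiftBy)

  // Emulation of >> : drop bits below shiftBy, shift the rest down
  def shiftRight(bs: BitSet, shiftBy: Int): BitSet =
    bs.collect { case b if b >= shiftBy => b - shiftBy }

  def main(args: Array[String]): Unit = {
    val bs = BitSet(1, 4, 10)
    assert(shiftLeft(bs, 2) == BitSet(3, 6, 12))
    assert(shiftRight(bs, 4) == BitSet(0, 6)) // bit 1 falls off the low end
    println("ok")
  }
}
```

A native implementation could instead shift the underlying word array directly, which is where the two-orders-of-magnitude difference comes from.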
Currently the patterns between MultiDicts and MultiSets are different:
- MultiDict is a concrete collection.
- MultiSet is an abstract data type (trait) with a public concrete implementation, MultiSetImpl.

I lean more towards the abstract data type approach of MultiSet, since it is consistent with the rest of the collections. Though the name MultiSetImpl/BagImpl isn't very good. Perhaps CountedMultiSet/CountedBag extends MultiSet/Bag?
https://www.scala-lang.org/api/current/scala/collection/mutable/MultiMap.html
says:
"(Since version 2.13.0) Use a scala.collection.mutable.MultiDict in the scala-collection-contrib module"
I'd like to use MultiDict
in a project that I want to support both Scala 2.12 and 2.13.
Can scala-collection-contrib also support Scala 2.12?