scala / scala-collection-contrib
community-contributed additions to the Scala 2.13 collections
License: Apache License 2.0
ImmutableBuilder is currently being used as the default builder implementation for:
- immutable.MultiDict
- immutable.SortedMultiDict
- immutable.MultiSet
- immutable.SortedMultiSet

However, for the sake of performance, it should be possible to build mutating versions of these, which would mostly delegate to the already-implemented mutating builders of Maps in the standard library (i.e. scala.collection.immutable.{MapBuilderImpl, HashMapBuilder}).
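As a rough illustration of the idea, a mutating builder can accumulate into a mutable map and convert to the immutable representation once at the end. This is a minimal sketch with assumed names (MutatingMultiDictBuilder is hypothetical), not the contrib module's actual builder API:

```scala
import scala.collection.mutable

// Hypothetical sketch: accumulate into a mutable Map, convert once in result().
// This avoids rebuilding an immutable structure on every addOne call.
class MutatingMultiDictBuilder[K, V] {
  private val elems = mutable.Map.empty[K, Set[V]]

  def addOne(kv: (K, V)): this.type = {
    val (k, v) = kv
    elems.updateWith(k) {
      case None     => Some(Set(v))
      case Some(vs) => Some(vs + v)
    }
    this
  }

  // Convert to the immutable representation once, at the end
  def result(): Map[K, Set[V]] = elems.toMap
}
```

The single toMap conversion at the end is where the savings come from, compared with producing a fresh immutable structure per element.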
I would love to have permutations on Sets.
E.g. Set(1,2,3).permutations == Iterator(Set(1,2,3), Set(2,3), Set(1,3), Set(1,2), Set(1), Set(2), Set(3), Set.empty)
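For reference, the standard library already exposes a powerset iterator via subsets(), which produces the same family of sets as the example above (iteration order may differ):

```scala
// Demonstrating the stdlib's existing powerset iterator on immutable Sets
object SubsetsDemo {
  val all: List[Set[Int]] = Set(1, 2, 3).subsets().toList

  def main(args: Array[String]): Unit = {
    assert(all.size == 8)                 // 2^3 subsets
    assert(all.contains(Set(1, 3)))       // includes each 2-element subset
    assert(all.contains(Set.empty[Int]))  // and the empty set
    println("ok")
  }
}
```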
as per scala-lang.org/blog/2021/02/16/preventing-version-conflicts-with-versionscheme.html and discussion at scala/sbt-scala-module#111
Backwards-only is fine, and I think we could give ourselves permission to be liberal about bumping the version number as needed, in order not to be too saddled by this.
@eed3si9n can I interest you in waving the same wand over this repo that you waved over the other modules?
Bag is a lot less verbose than MultiSet, and MultiDict is also pretty awkward to try and say aloud.
The add method is defined as follows:

def add(key: K, value: V): MultiDict[K, V] =
  new MultiDict(elems.updatedWith(key) {
    case None => Some(Set(value))
    case Some(vs) => Some(vs + value)
  })
but elems.updatedWith could easily return the same map if the value is already in the Set[V].
It would help if the signature made sure that the same object would be returned if nothing changed, as that would allow one to reduce unnecessary object creation (both in the call, and also in the creation of objects in the calling object). Otherwise one ends up creating a lot of garbage.
Something like this seems better:

def add(key: K, value: V): MultiDict[K, V] =
  val newElems = elems.updatedWith(key) {
    case None => Some(Set(value))
    case Some(vs) => Some(vs + value) // if vs + value == vs then this will return the original MD I believe
  }
  if newElems eq this.elems then this
  else new MultiDict(newElems)
That is how incl works for HashSet in the Scala standard library.
@julienrf can I interest you in doing some publicity on this?
I think it may make the most sense to rename MultiDict to SetMultiMap, and potentially make MultiMap into a generic trait.
- MultiMap is the common name elsewhere (Guava has ArrayListMultimap, ListMultimap, SetMultimap, etc.; the C++ STL is std::multimap; Rust is MultiMap; Apache Commons is MultiMap; etc.)
- The current implementation is tied to Set, but there is no way to extend it to use an Array, Vector or List instead.
- The implementation refers to itself as MultiMap in various locations (like in the hashcode!), and the CC[_] is MapFactory (e.g. not DictFactory).
- Nothing is called a "dict" rather than a Map in any Scala code; only the Java converters mention a dictionary. MultiDict isn't Scala-idiomatic.

Considering the distinction of a SortedMultiDict vs a MultiDict is the sorting, it seems reasonable to me that the returned values should also retain the relative ordering in which they were added to the collection.
As a reference, this is how the type signature is in Guava: https://guava.dev/releases/19.0/api/docs/index.html?com/google/common/collect/ListMultimap.html
Sometimes I work with homogeneous tuples, which I use more like a collection with a fixed size. When working in this style, I often define a map extension for tuples. Would something like this be worth adding here?
Example:
implicit class AnyTuple2Ops[T](v: (T, T)) {
def map[X](f: T => X): (X, X) = (f(v._1), f(v._2))
}
implicit class AnyTuple3Ops[T](v: (T, T, T)) {
def map[X](f: T => X): (X, X, X) = (f(v._1), f(v._2), f(v._3))
}
implicit class AnyTuple4Ops[T](v: (T, T, T, T)) {
def map[X](f: T => X): (X, X, X, X) = (f(v._1), f(v._2), f(v._3), f(v._4))
}
Other operations in similar style can be added, like:
implicit class AnyTuple2Ops[T](v: (T, T)) {
def combine(that: (T, T))(by: (T, T) => T): (T, T) = (by(v._1, that._1), by(v._2, that._2))
}
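A self-contained sketch combining the proposed map and combine extensions, with usage (the object name TupleOpsDemo is just for the demo):

```scala
object TupleOpsDemo {
  // The proposed extensions from above, for the 2-tuple case
  implicit class AnyTuple2Ops[T](v: (T, T)) {
    def map[X](f: T => X): (X, X) = (f(v._1), f(v._2))
    def combine(that: (T, T))(by: (T, T) => T): (T, T) =
      (by(v._1, that._1), by(v._2, that._2))
  }

  def demoMap: (Int, Int)     = (1, 2).map(_ * 10)           // element-wise map
  def demoCombine: (Int, Int) = (1, 2).combine((3, 4))(_ + _) // element-wise zip-with

  def main(args: Array[String]): Unit = {
    assert(demoMap == (10, 20))
    assert(demoCombine == (4, 6))
    println("ok")
  }
}
```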
All tests pass when using Scala 2.13.0. Does this mean that this project is ready to be released?
0.3.0
This is a Scala worksheet, so assertions are verified by eye.
Using mergeByKey
for convenience, knowing it delegates to mergeByKeyWith
.
import scala.collection.decorators.mapDecorator
val arthur = "arthur.txt"
val tyson = "tyson.txt"
val sandra = "sandra.txt"
val allKeys = Set(arthur, tyson, sandra)
val sharedValue = 1
val ourChanges = Map(
(
arthur,
sharedValue
),
(
tyson,
2
)
)
val theirChanges = Map(
(
arthur,
sharedValue
),
(
sandra,
3
)
)
ourChanges -> theirChanges
ourChanges.mergeByKey(theirChanges)
// Expect all the keys to appear in an outer join, and they do, good...
ourChanges.mergeByKey(theirChanges).keys == allKeys
theirChanges.mergeByKey(ourChanges)
// Expect all the keys to appear in an outer join, and they do, good...
theirChanges.mergeByKey(ourChanges).keys == allKeys
// Expect the same associated values to appear in the join taken either way around, albeit swapped around and not necessarily in the same key order. They are, good...
ourChanges
.mergeByKey(theirChanges)
.values
.map(_.swap)
.toList
.sorted
.sameElements(theirChanges.mergeByKey(ourChanges).values.toList.sorted)
// Expect these to be equal, and they are, good...
ourChanges.mergeByKey(theirChanges).keySet == theirChanges
.mergeByKey(ourChanges)
.keys
val theirChangesRedux = Map(
(
arthur,
sharedValue
),
(
sandra,
sharedValue // <<<<------- Ahem!
)
)
ourChanges -> theirChangesRedux
ourChanges.mergeByKey(theirChangesRedux)
// Expect all the keys to appear in an outer join, but they don't...
ourChanges.mergeByKey(theirChangesRedux).keys == allKeys
theirChangesRedux.mergeByKey(ourChanges)
// Expect all the keys to appear in an outer join, and they do, good...
theirChangesRedux.mergeByKey(ourChanges).keys == allKeys
// Expect the same associated values to appear in the join taken either way around, albeit swapped around and not necessarily in the same key order. They aren't...
ourChanges
.mergeByKey(theirChangesRedux)
.values
.map(_.swap)
.toList
.sorted
.sameElements(theirChangesRedux.mergeByKey(ourChanges).values.toList.sorted)
// Expect these to be equal, but they aren't...
ourChanges.mergeByKey(theirChangesRedux).keySet == theirChangesRedux
.mergeByKey(ourChanges)
.keys
The Scaladoc for mergeByKey
states:
Perform a full outer join of this and that.
Equivalent to mergeByKeyWith(that) { case any => any }.
So all the keys from ourChanges
and theirChanges
should appear in the resulting map, albeit not necessarily in the same order if the specialised map implementations are created (which they are in the example above). The corresponding values from one or both of ourChanges
and theirChanges
should appear in the associated pairs, wrapped in Some
.
ourChanges.mergeByKey(theirChangesRedux)
drops key sandra
, whereas theirChangesRedux.mergeByKey(ourChanges)
preserves all keys.
There is a set, traversed, that is used to prevent processing duplicate keys when switching from iterating over coll to iterating over other. It is, however, typed as a set over the value type W of other, and is populated with values, not keys.
In the example above, the use of a shared value triggers the problem, causing the entry for key sandra
to be dropped.
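If the diagnosis is right, the fix is to track keys rather than values. The following is a standalone sketch of the intended full-outer-join behaviour (names and shapes are assumptions for illustration, not the decorator's actual internals):

```scala
import scala.collection.mutable

// Sketch of a full outer join that tracks *keys* already traversed.
// Tracking values instead (as the current code appears to) drops keys
// whose values collide with ones already seen.
object MergeByKeyFix {
  def fullOuterJoin[K, V, W](coll: Map[K, V], other: Map[K, W]): Map[K, (Option[V], Option[W])] = {
    val traversed = mutable.Set.empty[K] // keys, not values of type W
    val builder   = Map.newBuilder[K, (Option[V], Option[W])]
    for ((k, v) <- coll) {
      traversed += k
      builder += k -> (Some(v), other.get(k))
    }
    for ((k, w) <- other if !traversed(k))
      builder += k -> (None, Some(w))
    builder.result()
  }
}
```

With the worksheet's data, the shared value no longer causes the sandra entry to disappear, because key identity, not value identity, gates the second pass.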
I am assuming this was a typo and not the desired behaviour.
If my diagnosis is correct, I can submit a PR with a proper test and the change in implementation to make it pass.
I'll wait until others chime in in case I've misinterpreted the original intent...
It would be nice to have a function similar to SQL's BETWEEN operator that can filter elements of a Seq that fall within some range. This is not an uncommon task when working with statistics, time series, location data, etc.
We can implement it as an extension for Seq, roughly like this:
extension [T: Ordering](seq: Seq[T])
def between(a: T, b: T): Seq[T] = seq.filter(e => Ordering[T].gt(e, a) && Ordering[T].lt(e, b))
and then use it like so:
Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).between(4, 7) // will result in Seq(5, 6)
We could also make an extension that would allow filtering sequences of arbitrary product types that don't have an ordering, but have some fields that we can use for comparison, similar to sortBy
function, for example:
extension [T](seq: Seq[T])
def betweenBy[U: Ordering](a: U, b: U)(f: T => U): Seq[T] = seq.filter { e =>
Ordering[U].gt(f(e), a) && Ordering[U].lt(f(e), b)
}
locationDataList.betweenBy(hourAgoInstant, Instant.now)(_.timestamp)
IMHO this is much more intuitive, less prone to errors, and nicer to read than using a simple filter function.
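For concreteness, the same logic can be sketched as a plain method (Scala 2-compatible, no extension syntax); note the bounds here are exclusive, unlike SQL's inclusive BETWEEN. This is a sketch of the idea, not a final API:

```scala
object BetweenDemo {
  // Sketch of the proposed between, written as a plain method for portability.
  // Bounds are exclusive, matching the extension shown above.
  def between[T](seq: Seq[T], a: T, b: T)(implicit ord: Ordering[T]): Seq[T] =
    seq.filter(e => ord.gt(e, a) && ord.lt(e, b))

  def main(args: Array[String]): Unit = {
    assert(between(Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 4, 7) == Seq(5, 6))
    println("ok")
  }
}
```

Whether the final operation should be inclusive (to match SQL) or exclusive is one of the design points worth settling before adding it.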
P.S. naming is debatable here because I also can suggest using between
on numeric types to check if they are in some range, e.g. 1.between(-1, 5) // returns true
that would be useful addition too, but might cause confusion if paired with between
on Sequences.
Hi, I've recently started using mutable.MultiDict
, but I often found myself using this pattern.
iterable.foreach(v => dict.addOne(k -> v))
Could we add a function to add many values all at once for a single key instead of one at a time?
I was thinking of something like the following:
// mutable
def addMany(key: K, values: Iterable[V]): this.type = {
  elems.updateWith(key) {
    case None => Some(mutable.Set.from(values))
    case Some(vs) => Some(vs ++= values)
  }
  this
}
// immutable
def addMany(key: K, values: Iterable[V]): MultiDict[K, V] =
  new MultiDict(elems.updatedWith(key) {
    case None => Some(Set.from(values))
    case Some(vs) => Some(vs ++ values)
  })
We might also want to skip updates if the iterables are empty.
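A standalone sketch of the immutable variant over a plain Map[K, Set[V]], including the empty-input short-circuit suggested above (names are illustrative, not the contrib API):

```scala
object AddManyDemo {
  // Sketch of the proposed immutable addMany over a plain Map[K, Set[V]]
  def addMany[K, V](elems: Map[K, Set[V]], key: K, values: Iterable[V]): Map[K, Set[V]] =
    if (values.isEmpty) elems // skip the update entirely for empty input
    else elems.updatedWith(key) {
      case None     => Some(Set.from(values))
      case Some(vs) => Some(vs ++ values)
    }

  def main(args: Array[String]): Unit = {
    val m0 = Map("a" -> Set(1))
    val m1 = addMany(m0, "a", List(2, 3))
    assert(m1("a") == Set(1, 2, 3))
    assert(addMany(m1, "b", Nil) eq m1) // empty input returns the same map
    println("ok")
  }
}
```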
We should reuse the property-based tests written in the scalacheck
sub-project. We will have to adapt the build a bit because it is currently not cross-compatible with dotty and Scala.js.
(Scala Native is out of scope until they have Scala 2.13 support)
In order to increase compatibility with JDK versions that use modules, consider adding an Automatic-Module-Name entry in the MANIFEST.MF along the lines of akka/akka#23960, or perhaps even properly implement the module-info descriptor.
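In sbt this is typically a one-line manifest addition; a hedged build.sbt sketch, where the module name "scala.collection.contrib" is an assumption, not a decided value:

```scala
// build.sbt sketch — the chosen module name is an assumption for illustration
Compile / packageBin / packageOptions +=
  Package.ManifestAttributes("Automatic-Module-Name" -> "scala.collection.contrib")
```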
Hi!
I believe it would be nice to have a -- method to remove multiple elements from the collection. It looks like it was missed.
Would it be possible to get a new release pushed out for the Scala Native support? Thanks.
This library supports scala-js
via cross compilation, is there any chance it can support scala-native-0.4.x
as well, or is there some blocker to it?
scala-collection-contrib_2.13-0.2.0.jar is missing .class files (it only has a .properties file and the manifest).
If scala/scala#6674 gets merged, we should remove the HasXxxOps
types and use the equivalent types from the stdlib.
volunteer?
it probably works with withDottyCompat
, but it's better to have a "real" Scala 3 artifact
It would be good to make these classes available in scala.js.
The iteration order when iterating over a MultiDict (e.g., with map or values) is not specified anywhere in the documentation that I can find.
I suspect that the order is implementation-dependent, which is fine. If that is the case, it should be documented.
In my current use case, I am hoping that the iteration order is the insertion order. Since I don't know the order, I don't know if I can use a MultiDict[A, B]
or if I need to work with a Map[A, List[B]]
which would be more cumbersome.
Is it possible to link to the library's scaladoc at the top of the README? Would also be great if specific new types and operations in the README could link to their specific scaladoc.
the problem is split packages. if this code wasn't under scala.collection, we'd be fine. see #51 (and linked tickets) for the gory details
we (the core Scala team) don't plan to tackle this ourselves, but we're open to receiving a pull request on it
comment removed from build.sbt:
// TODO: osgi settings; not trivial because of split packages.
// See https://github.com/scala/scala-collection-compat/pull/226
// and https://github.com/scala/scala-collection-contrib/issues/51
// (and issues linked from 51)
// OsgiKeys.exportPackage := Seq(s"scala.collection.*;version=${version.value}"),
it's tagged but not published; the staging repos wouldn't close. I'll come back to it
Add bitwise shift (<< and >>) operations to the BitSet collection.
Motivation: C++ has shift operators on std::bitset: https://en.cppreference.com/w/cpp/utility/bitset/operator_ltltgtgt
Note: << can be emulated with .map(_ + shiftBy) and >> with .collect { case b if b >= shiftBy => b - shiftBy }, but the performance difference is two orders of magnitude.
I have some existing code for it, will be able to submit a PR soon.
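The emulations from the note above, made concrete as standalone helpers (these are the slow workarounds being replaced, not the proposed implementation):

```scala
import scala.collection.immutable.BitSet

object BitSetShiftDemo {
  // Emulation of << : shift every set bit up by shiftBy
  def shiftLeft(bs: BitSet, shiftBy: Int): BitSet =
    bs.map(_ + shiftBy)

  // Emulation of >> : drop bits below shiftBy, shift the rest down
  def shiftRight(bs: BitSet, shiftBy: Int): BitSet =
    bs.collect { case b if b >= shiftBy => b - shiftBy }

  def main(args: Array[String]): Unit = {
    val bs = BitSet(1, 4, 10)
    assert(shiftLeft(bs, 2) == BitSet(3, 6, 12))
    assert(shiftRight(bs, 4) == BitSet(0, 6)) // bit 1 falls off the low end
    println("ok")
  }
}
```

A native implementation could instead shift the underlying word array directly, which is where the two-orders-of-magnitude difference comes from.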
Currently the patterns between MultiDicts and MultiSets are different:
- MultiDict is a concrete collection.
- MultiSet is an abstract data type (trait) with a public concrete implementation, MultiSetImpl.

I lean more towards the abstract data type approach of MultiSet, since it is consistent with the rest of the collections. Though the name MultiSetImpl/BagImpl isn't very good. Perhaps CountedMultiSet/CountedBag extends MultiSet/Bag?
https://www.scala-lang.org/api/current/scala/collection/mutable/MultiMap.html
says:
"(Since version 2.13.0) Use a scala.collection.mutable.MultiDict in the scala-collection-contrib module"
I'd like to use MultiDict
in a project that I want to support both Scala 2.12 and 2.13.
Can scala-collection-contrib also support Scala 2.12?