
scala-collection-contrib's Issues

Eliminate usages of `ImmutableBuilder`

ImmutableBuilder is currently being used as the default builder implementation for:

  • immutable.MultiDict
  • immutable.SortedMultiDict
  • immutable.MultiSet
  • immutable.SortedMultiSet

However, for the sake of performance, it should be possible to build mutating versions of these builders, which would mostly delegate to the already-implemented mutating builders for Maps in the standard library (i.e. scala.collection.immutable.{MapBuilderImpl, HashMapBuilder}).
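To make the idea concrete, here is a minimal sketch of the shape such a builder could take — hypothetical names, not the library's actual code, with Map[A, Int] standing in for the multiset's internal representation:

```scala
import scala.collection.mutable

// Hypothetical sketch: accumulate element counts in a mutable.Map and convert
// once in result(), instead of rebuilding an immutable collection on every
// addOne the way ImmutableBuilder does.
final class CountingMultiSetBuilder[A] extends mutable.Builder[A, Map[A, Int]] {
  private val counts = mutable.Map.empty[A, Int]

  def addOne(elem: A): this.type = {
    counts.updateWith(elem)(c => Some(c.getOrElse(0) + 1))
    this
  }

  def clear(): Unit = counts.clear()

  def result(): Map[A, Int] = counts.toMap
}
```

The real implementation would delegate to the standard library's map builders instead of a plain mutable.Map, but the single-conversion-at-the-end structure is the point.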

Set.permutations

I would love to have permutations on Sets.

E.g. Set(1,2,3).permutations == Iterator(Set(1,2,3), Set(1,2), Set(1,3), Set(2,3), Set(1), Set(2), Set(3), Set.empty)
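Worth noting: result type aside, the behaviour sketched here matches the subsets() iterator the standard library already provides on Set:

```scala
// The standard library's subsets() already enumerates every subset,
// from the full set down to the empty set.
val all = Set(1, 2, 3).subsets().toSet
```

So the request may reduce to a naming/discoverability question rather than new functionality.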

MultiDict.add should return this if no changes are made

The add method is defined as follows:

  def add(key: K, value: V): MultiDict[K, V] =
    new MultiDict(elems.updatedWith(key) {
      case None     => Some(Set(value))
      case Some(vs) => Some(vs + value)
    })

but elems.updatedWith can easily return the same underlying map if the value is already in the Set[V].
It would help if the implementation guaranteed that the same object is returned when nothing changed, as that would avoid unnecessary object creation (both in the call itself, and in the objects created by the caller).
Otherwise one ends up creating a lot of garbage.

Something like this seems better:

def add(key: K, value: V): MultiDict[K, V] =
  val newElems = elems.updatedWith(key) {
    case None     => Some(Set(value))
    case Some(vs) => Some(vs + value) // if value is already in vs, vs + value returns vs, so updatedWith returns the original map (I believe)
  }
  if newElems eq this.elems then this
  else new MultiDict(newElems)

That is how incl works for HashSet in the standard Scala library.

Rename MultiDict back to original MultiMap or SetMultiMap

I think it may make the most sense to rename MultiDict to SetMultiMap, and potentially make MultiMap into a generic trait.

  1. It would remove the @deprecated warning if renamed back to MultiMap
  2. "MultiMap" is the more universally used terminology for this type of collection (e.g. Guava has ArrayListMultimap, ListMultimap, SetMultimap, etc.; the C++ STL has std::multimap; Rust has MultiMap; Apache Commons has MultiMap)
  3. The existing name also does not convey the collection type used for a key's values (e.g. it currently uses Set, but there is no way to extend it to use an Array, Vector or List instead)
  4. The code still uses MultiMap in various locations (like in the hashCode!), and the CC[_] factory is MapFactory (not DictFactory)
  5. There are no other "Dictionary" terms anywhere in the Scala collections: it's Map throughout Scala code, and only the Java converters mention a dictionary. MultiDict isn't Scala-idiomatic.

Change values returned from SortedMultiDict to be a SortedSet[V]

Considering that the distinction between a SortedMultiDict and a MultiDict is the sorting, it seems reasonable to me that the returned values should also retain their ordering, rather than coming back as a plain unordered Set.

Scaladoc: https://www.javadoc.io/doc/org.scala-lang.modules/scala-collection-contrib_2.13/latest/scala/collection/SortedMultiDict.html

As a reference, this is how the type signature is in Guava: https://guava.dev/releases/19.0/api/docs/index.html?com/google/common/collect/ListMultimap.html
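A minimal sketch of the proposal (all names here are hypothetical stand-ins for the real SortedMultiDict internals): keys stay sorted and each key's values come back as a SortedSet[V] instead of a Set[V].

```scala
import scala.collection.immutable.{SortedMap, SortedSet}

// Hypothetical stand-in for a SortedMultiDict with SortedSet values.
final case class SortedValuesMultiDict[K, V: Ordering](sets: SortedMap[K, SortedSet[V]]) {
  // get would return a SortedSet[V] rather than a Set[V]
  def get(key: K): SortedSet[V] = sets.getOrElse(key, SortedSet.empty[V])
}
```

The V: Ordering requirement is the design cost: the values' element type would need an Ordering, which the current Set-based signature does not demand.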

Collection-like operations on tuples

Sometimes I work with homogeneous tuples, which I use more like a fixed-size collection. When working in this style, I often define a map extension for tuples. Would something like this be worth adding here?

Example:

  implicit class AnyTuple2Ops[T](v: (T, T)) {
    def map[X](f: T => X): (X, X) = (f(v._1), f(v._2))
  }

  implicit class AnyTuple3Ops[T](v: (T, T, T)) {
    def map[X](f: T => X): (X, X, X) = (f(v._1), f(v._2), f(v._3))
  }

  implicit class AnyTuple4Ops[T](v: (T, T, T, T)) {
    def map[X](f: T => X): (X, X, X, X) = (f(v._1), f(v._2), f(v._3), f(v._4))
  }

Other operations in similar style can be added, like:

  implicit class AnyTuple2Ops[T](v: (T, T)) {
    def combine(that: (T, T))(by: (T, T) => T): (T, T) = (by(v._1, that._1), by(v._2, that._2))
  }
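For instance, combine zips two pairs pointwise (the implicit class is restated so the example is self-contained):

```scala
// Restating the proposed extension from above.
implicit class AnyTuple2Ops[T](v: (T, T)) {
  def combine(that: (T, T))(by: (T, T) => T): (T, T) = (by(v._1, that._1), by(v._2, that._2))
}

val sums = (1, 2).combine((3, 4))(_ + _) // (4, 6)
```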

Release for Scala 2.13.0

All tests pass when using Scala 2.13.0. Does this mean that this project is ready to be released?

Possible bug in implementation of `MapDecorator.mergeByKeyWith`.

Seen in

0.3.0

Reproduction

Scala worksheet, so assertions are verified by eye.
Using mergeByKey for convenience, knowing it delegates to mergeByKeyWith.

import scala.collection.decorators.mapDecorator

val arthur = "arthur.txt"

val tyson = "tyson.txt"

val sandra = "sandra.txt"

val allKeys = Set(arthur, tyson, sandra)

val sharedValue = 1

val ourChanges = Map(
  (
    arthur,
    sharedValue
  ),
  (
    tyson,
    2
  )
)

val theirChanges = Map(
  (
    arthur,
    sharedValue
  ),
  (
    sandra,
    3
  )
)

ourChanges -> theirChanges

ourChanges.mergeByKey(theirChanges)

// Expect all the keys to appear in an outer join, and they do, good...
ourChanges.mergeByKey(theirChanges).keys == allKeys

theirChanges.mergeByKey(ourChanges)

// Expect all the keys to appear in an outer join, and they do, good...
theirChanges.mergeByKey(ourChanges).keys == allKeys

// Expect the same associated values to appear in the join taken either way around, albeit swapped around and not necessarily in the same key order. They are, good...
ourChanges
  .mergeByKey(theirChanges)
  .values
  .map(_.swap)
  .toList
  .sorted
  .sameElements(theirChanges.mergeByKey(ourChanges).values.toList.sorted)

// Expect these to be equal, and they are, good...
ourChanges.mergeByKey(theirChanges).keySet == theirChanges
  .mergeByKey(ourChanges)
  .keys

val theirChangesRedux = Map(
  (
    arthur,
    sharedValue
  ),
  (
    sandra,
    sharedValue // <<<<------- Ahem!
  )
)

ourChanges -> theirChangesRedux

ourChanges.mergeByKey(theirChangesRedux)

// Expect all the keys to appear in an outer join, but they don't...
ourChanges.mergeByKey(theirChangesRedux).keys == allKeys

theirChangesRedux.mergeByKey(ourChanges)

// Expect all the keys to appear in an outer join, and they do, good...
theirChangesRedux.mergeByKey(ourChanges).keys == allKeys

// Expect the same associated values to appear in the join taken either way around, albeit swapped around and not necessarily in the same key order. They aren't...
ourChanges
  .mergeByKey(theirChangesRedux)
  .values
  .map(_.swap)
  .toList
  .sorted
  .sameElements(theirChangesRedux.mergeByKey(ourChanges).values.toList.sorted)

// Expect these to be equal, but they aren't...
ourChanges.mergeByKey(theirChangesRedux).keySet == theirChangesRedux
  .mergeByKey(ourChanges)
  .keys

Expectation

The Scaladoc for mergeByKey states:

Perform a full outer join of this and that.
Equivalent to mergeByKeyWith(that) { case any => any }.

So all the keys from ourChanges and theirChanges should appear in the resulting map, albeit not necessarily in the same order if the specialised map implementations are created (which they are in the example above). The corresponding values from one or both of ourChanges and theirChanges should appear in the associated pairs, wrapped in Some.

Observed

ourChanges.mergeByKey(theirChangesRedux) drops key sandra, whereas theirChangesRedux.mergeByKey(ourChanges) preserves all keys.

Discussion

There is a set, traversed, that is used to prevent processing a key twice when switching from iterating over coll to iterating over other. It is, however, typed as a set over the value type W of other, and is populated with values, not keys.

In the example above, the use of a shared value triggers the problem, causing the entry for key sandra to be dropped.

I am assuming this was a typo and not the desired behaviour.

If my diagnosis is correct, I can submit a PR with a proper test and the change in implementation to make it pass.

I'll wait until others chime in, in case I've misinterpreted the original intent...
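For reference, here is a hedged standalone sketch of the intended behaviour — not the library's actual implementation — where the diagnosis above corresponds to traversed being a set over the key type K:

```scala
// Standalone full outer join over two maps. The bug described above amounts
// to `traversed` being a Set over W (populated with values); typing it over
// the key type K gives the expected behaviour.
def fullOuterJoin[K, V, W](coll: Map[K, V], other: Map[K, W]): Map[K, (Option[V], Option[W])] = {
  var traversed = Set.empty[K] // keys, not values
  val builder = Map.newBuilder[K, (Option[V], Option[W])]
  for ((k, v) <- coll) {
    traversed += k
    builder += k -> ((Some(v), other.get(k)))
  }
  for ((k, w) <- other if !traversed.contains(k))
    builder += k -> ((None, Some(w)))
  builder.result()
}
```

With this version, the shared-value reproduction above keeps all three keys in both join directions.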

Add `between`, `betweenBy` functions to Seq

It would be nice to have a function similar to the SQL BETWEEN operator that can filter elements of a Seq that fall in some range. This is not an uncommon task when working with statistics, time series, location data, etc.
We can implement it as an extension for Seq, roughly like this:

extension [T: Ordering](seq: Seq[T])
  def between(a: T, b: T): Seq[T] = seq.filter(e => Ordering[T].gt(e, a) && Ordering[T].lt(e, b))

and then use it like so:

Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).between(4, 7) // will result in Seq(5, 6)

We could also make an extension that would allow filtering sequences of arbitrary product types that don't have an ordering, but have some fields that we can use for comparison, similar to sortBy function, for example:

extension [T](seq: Seq[T])
  def betweenBy[U: Ordering](a: U, b: U)(f: T => U): Seq[T] =
    seq.filter { e =>
      Ordering[U].gt(f(e), a) && Ordering[U].lt(f(e), b)
    }

locationDataList.betweenBy(hourAgoInstant, Instant.now)(_.timestamp)

IMHO this is much more intuitive, less prone to errors, and nicer to read than using a simple filter function.
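As a self-contained check of the idea (restated here as a Scala 2-style implicit class, with a made-up Reading type in place of the location data above):

```scala
// Same exclusive-bounds semantics as the extension sketched above.
implicit class SeqBetweenOps[T](seq: Seq[T]) {
  def betweenBy[U](a: U, b: U)(f: T => U)(implicit ord: Ordering[U]): Seq[T] =
    seq.filter(e => ord.gt(f(e), a) && ord.lt(f(e), b))
}

case class Reading(id: Int, value: Double)
val readings = List(Reading(1, 0.5), Reading(2, 2.5), Reading(3, 4.0))
val inRange = readings.betweenBy(1.0, 3.0)(_.value) // keeps only Reading(2, 2.5)
```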

P.S. The naming is debatable here: I could also suggest a between on numeric types to check whether a value is in some range, e.g. 1.between(-1, 5) // returns true. That would be a useful addition too, but it might cause confusion if paired with between on sequences.

MultiDict add many values at once for a single key

Hi, I've recently started using mutable.MultiDict, but I often find myself writing this pattern:

iterable.foreach(v => dict.addOne(k -> v))

Could we add a function to add many values all at once for a single key instead of one at a time?

I was thinking of something like the following:

// mutable
  def addMany(key: K, values: Iterable[V]): this.type = {
    elems.updateWith(key) {
      case None     => Some(Set.from(values))
      case Some(vs) => Some(vs ++= values)
    }
    this
  }
// immutable
  def addMany(key: K, values: Iterable[V]): MultiDict[K, V] =
    new MultiDict(elems.updatedWith(key) {
      case None     => Some(Set.from(values))
      case Some(vs) => Some(vs ++ values)
    })

We might also want to skip the update when values is empty.
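The empty-iterable guard could look like this — a standalone sketch over a plain Map[K, Set[V]] standing in for the real MultiDict internals:

```scala
// Sketch of addMany over Map[K, Set[V]] that returns the original map
// unchanged (same reference) when `values` is empty.
def addMany[K, V](elems: Map[K, Set[V]], key: K, values: Iterable[V]): Map[K, Set[V]] =
  if (values.isEmpty) elems
  else elems.updatedWith(key) {
    case None     => Some(Set.from(values))
    case Some(vs) => Some(vs ++ values)
  }
```

Returning the same reference on an empty update avoids the garbage-creation concern raised in the MultiDict.add issue above.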

Project does not correctly import into IntelliJ

The project loads into IntelliJ, but then for some reason IntelliJ cannot understand that the tests require JUnit as a dependency, so everything JUnit-related is marked in red. This is confirmed with multiple fresh imports of the project, across multiple users and OSes (at least on Mac and Linux).

[Screenshot: "Screen Shot 2019-11-01 at 1 24 00 PM"]

Law-test collections implemented in contrib

We should reuse the property-based tests written in the scalacheck sub-project. We will have to adapt the build a bit because it is currently not cross-compatible with dotty and Scala.js.

New Release

Would it be possible to get a new release pushed out for the Scala Native support? Thanks.

Scala Native Support

This library supports scala-js via cross compilation, is there any chance it can support scala-native-0.4.x as well, or is there some blocker to it?

Add Scala 3 crossbuild

volunteer?

it probably works with withDottyCompat, but it's better to have a "real" Scala 3 artifact

Support scala.js

It would be good to make these classes available in scala.js.

Document iteration order of multi-dict

The iteration order when iterating over a MultiDict (e.g., with map or values) is not specified anywhere in the documentation that I can find.

I suspect that the order is implementation dependent, which is fine. If that is the case, it should be documented.

In my current use case, I am hoping that the iteration order is the insertion order. Since I don't know the order, I don't know if I can use a MultiDict[A, B] or if I need to work with a Map[A, List[B]] which would be more cumbersome.

Link to scaladoc in README

Is it possible to link to the library's scaladoc at the top of the README? Would also be great if specific new types and operations in the README could link to their specific scaladoc.

OSGi support is missing

the problem is split packages. if this code wasn't under scala.collection, we'd be fine. see #51 (and linked tickets) for gory details

we (the core Scala team) don't plan to tackle this ourselves, but we're open to receiving a pull request on it

comment removed from build.sbt:

    // TODO: osgi settings; not trivial because of split packages.
    // See https://github.com/scala/scala-collection-compat/pull/226
    // and https://github.com/scala/scala-collection-contrib/issues/51
    // (and issues linked from 51)
    // OsgiKeys.exportPackage := Seq(s"scala.collection.*;version=${version.value}"),

publish 0.2.0

it's tagged but not published; the staging repos wouldn't close. I'll come back to it

Add shift operations on BitSet

Add bitwise shift (<< and >>) operations to BitSet collection.
Motivation:

Note: << can be emulated with .map(_ + shiftBy) and >> with .collect { case b if b >= shiftBy => b - shiftBy }, but the performance difference is two orders of magnitude.
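The emulation mentioned above, made concrete (the proposed << and >> operators themselves are not sketched here):

```scala
import scala.collection.immutable.BitSet

val bits = BitSet(1, 2, 5)
val shiftBy = 2

// Emulated left shift: every set bit moves up by shiftBy.
val shiftedLeft = bits.map(_ + shiftBy) // BitSet(3, 4, 7)

// Emulated right shift: bits below shiftBy fall off the low end.
val shiftedRight = bits.collect { case b if b >= shiftBy => b - shiftBy } // BitSet(0, 3)
```

A native implementation could instead shift the underlying Long words directly, which is where the claimed performance gap comes from.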

I have some existing code for it, will be able to submit a PR soon.

MultiDict and MultiSet should be either abstract traits or concrete implementations

Currently the patterns between MultiDict and MultiSet differ:

  • MultiDict is a concrete collection
  • MultiSet is an abstract data type (trait) with a public concrete implementation MultiSetImpl.

I lean more towards the abstract-data-type approach of MultiSet, since it is consistent with the rest of the collections. Though the name MultiSetImpl/BagImpl isn't very good. Perhaps CountedMultiSet/CountedBag extends MultiSet/Bag?
