Giter Club home page Giter Club logo

Comments (16)

joshlemer avatar joshlemer commented on June 1, 2024 3

Bag does appear to be an established name:

https://en.wikipedia.org/wiki/Set_(abstract_data_type)#Multiset
https://en.wikipedia.org/wiki/Multiset
https://commons.apache.org/proper/commons-collections/userguide.html#Bags

but you're likely right, that MultiSet is more commonly used, I'm not sure.

That's too bad about the MultiMap conflict.

from scala-collection-contrib.

joshlemer avatar joshlemer commented on June 1, 2024 2

Thanks @Ichoran ! Some points:

However, I don't think the counted set is very intuitively a Bag. That's not how physical bags work: you might not care that apples are different, but you get the same one out that you put in. So šŸ‘Ž for MultiSet to Bag naming

So if I interpret correctly, you're saying that Bag is unintuitive, because when you insert an element into it, you expect that exact object (up to reference identity) should be found in the resulting Bag. This critique is fair enough in isolation but in Scala-land the ethos is generally that we don't care about reference equality and only care about value equality. For instance, you might have made the same critique of Sets -- that you expect that when you put an apple in a Set and get it back out, you get the same one you put in.

Now for some quick points for the name change:

  • Bag has 3 letters vs MultiSet's 8, so it's much more convenient to type, which makes a pretty big difference when we treat collection apply as "literal" syntax (Bag(...)). The length savings is the same difference as Seq vs Sequence for instance. This may seem like a silly point, but anecdotally, I've even heard of people using Seq over List just to save one char.

  • While MultiSet is one accepted term and possible the most popular, Bag is not too much less popular, and avoids potentially confusing people into thinking that a MultiSet is some kind of special case of Set (a la HashSet/TreeSet/etc), when in fact MultiSets/Bags have nothing to do with Sets in the Iterable hierarchy, and do not behave like Sets.

  • I think that the Bag analogy very nicely visualizes what it is we're describing. A bag is an unordered collection of data, and a bag is equal to an other bag if and only if the two contain the same elements (including arity), ignoring order. The term Bag really drives home that order is ignored, IMO, because bags are all jumbled up/unstructured inside.

from scala-collection-contrib.

julienrf avatar julienrf commented on June 1, 2024

The name MultiDict was chosen to not conflict with the existing MultiMap.

Is Bag an established name for sets containing possibly multiple occurrences of their elements?

from scala-collection-contrib.

joshlemer avatar joshlemer commented on June 1, 2024

This does raise the question of why we need a mutable.MultiMap in stdlib as well as a mutable.MultiDict in collections-contrib. Is there an end game of deprecating the stdlib version?

from scala-collection-contrib.

julienrf avatar julienrf commented on June 1, 2024

Iā€™m in favor of deprecating mutable.MultiMap but anyways we need to make both solutions cohabit together during the transition period.

from scala-collection-contrib.

Ichoran avatar Ichoran commented on June 1, 2024

Do we need both a counted set and a set which retains original copies? Although both can occasionally be useful, I would tend to say that most uses of the latter would just as well be served by a MultiMap. It's only the former that is really awkward and inefficient.

However, I don't think the counted set is very intuitively a Bag. That's not how physical bags work: you might not care that apples are different, but you get the same one out that you put in. So šŸ‘Ž for MultiSet to Bag naming, assuming that MultiSet is a counted set. If it is not a counted set, šŸ‘Ž for having it at all.

Also šŸ‘Ž to introducing a name collision between the old MultiMap trait and a new MultiMap that may have both immutable and mutable variants. MultiDict is awkward, though. In principle, šŸ‘ for finding something better. ManyMap or KeyBag perhaps. If KeyBag, then I guess MultiSet as a plain Bag is okay. Or maybe have all three: CountedBag, Bag, and KeyBag. But they will all be distinct APIs, I suspect, so maybe Bag is again bad.

from scala-collection-contrib.

julienrf avatar julienrf commented on June 1, 2024

The MultiSet here is a counted multiset: https://github.com/scala/scala-collection-contrib/blob/master/src/main/scala/scala/collection/MultiSet.scala#L50-L53

I prefer MultiDict over KeyBag or ManyMap. https://www.google.com/search?q=multidict

from scala-collection-contrib.

Ichoran avatar Ichoran commented on June 1, 2024

You can't get elements out of sets, and if you manage to, you do get one that you put in (if you put in duplicates, you obviously only get one of them out).

The bag is acting more like a bank: you deposit an item, and you get out an equivalent item, but not necessarily the one you put in. (Though it's weirder if counted--it will give you back the same item multiple times, rather than drawing separate instances from a vault of available instances.)

The point about the length of the name is a good one, though. That may be important enough to override all the other concerns.

from scala-collection-contrib.

Ichoran avatar Ichoran commented on June 1, 2024

Oh, also, next time my groceries end up packed all jumbled up/unstructured, I will remind myself that it is because they were put in a bag šŸ˜‚

from scala-collection-contrib.

julienrf avatar julienrf commented on June 1, 2024

Iā€™m convinced that Bag is a good name for counted sets. But the problem is more on the side of MultiDict. Iā€™m not sure that KeyBag makes sense here. A MultiDict (or MultiMap) associates a set of values to a key.

from scala-collection-contrib.

joshlemer avatar joshlemer commented on June 1, 2024

@julienrf I also don't think KeyBag is an improvement over MultiDict. MultiMap is probably best, save for the name collision. If I had to think of a different name, and I do prefer to go with practical/"blue collar" names over academic ones, I might consider something like OneToMany but I'm not sure that's necessarily any better than MultiDict (though it has exact same length). šŸ¤·ā€ā™‚

from scala-collection-contrib.

er1c avatar er1c commented on June 1, 2024

Copied from #98

TLDR; I think it should be named SetMultiMap

I think it may make the most sense to rename MultiDict to SetMultiMap, and potentially make MultiMap into a generic trait.

  1. It helps remove the @Deprecation warning and name conflict with the 2.12/2.13 MultiMap
  2. "MultiMap" is a more universally used terminology for this type of collection. (e.g. Guava has ArrayListMultimap, ListMultimap, SetMultimap, and etc; C++ STL is std::multimap; Rust is MultiMap, Apache Commons is MultiMap, etc)
  3. The existing name also does support or convey the collection type of the key's values. (e.g. it currently implements Set, but there is no way to extend it to use an Array, Vector or List instead)
  4. The code still uses MultiMap in various locations (like in the hashcode!), and the CC[_] is MapFactory (e.g. not DictFactory)
  5. There are no other Dictionary terms in any of the Scala collections. It's Map for any scala code, only the java converters have the mention of a dictionary. MultiDict isn't scala idiomatic.

from scala-collection-contrib.

joshlemer avatar joshlemer commented on June 1, 2024

@er1c I agree that MultiMap and MultiSet (I would prefer Bag) should be abstract data types (akin to Seq, Map, Set). I also tried to point this out in #25. I think that everyone agrees that 'MultiDict' is much worse a name than MultiMap, but it was named that way to avoid collisions with the stdlib implementation.

I agree that there would be valid usecases for MultiMaps where the underlying value collection is a Set or a Seq or even a Bag. Perhaps the best way to achieve this would be to utilize the existing Iterable Factories so that users can pass in whatever kind of collection they like? Maybe this is possible:

trait MultiMap[K, V, C[_]] { ... }

object MultiMap {
  def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = ???
}

// user code
MultiMap.withValuesIn(List)(1 -> 1, 1 -> 2, 1 -> 3)

Perhaps some reasonable default value collection should be chosen (probably Set?) so that users can still initialize a MultiMap without specifying the value:

object MultiMap extends Factory[(K, V), MultiMap[K, V, Set]] {
  /// implement factory...

  // additional method
  def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = ???
}


// user code:

MultiMap(1 -> 1, 1 -> 2) // MultiMap[Int, Int, Set]
MultiMap.withValuesIn(List)(1 -> 1, 1 -> 2) // MultiMap[Int, Int, List]
MultiMap.withValuesIn(Bag)(1 -> 1, 1 -> 2) // MultiMap[Int, Int, Bag]

If that route is taken then I would suggest making a static factory for each of the most common collection types so that we avoid allocating on each MultiMap creation, so:

object MultiMap extends Factory[(K, V), MultiMap[K, V, Set]] {
  /// implement factory...

  // additional method
  private final val ListMultiMapFactory = ???
  private final vall SetMultiMapFactory = ???
  private final val VectorMultiMapFactory = ???
  def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = factory match {
    case List => ListMultiMapFactory
    case Vector => VectorMultiMapFactory
    case Set => SetMultiMapFactory
    case other => new MultiMapFactory(other)
  }
}

Care should also ideally be taken to design the api so that users that just want to use the default collection don't have to specify the higher kinded type parameter everywhere. This is the same principle that was applied to the new views which make them much more ergonomic (so, View[Int] rather than View[Int, List[Int]]). Maybe MultiMap[K, V] extends a more general trait like MultiMapWithValues[K, V, C[_]].

Or maybe I'm just over complicating things, just giving ideas.

from scala-collection-contrib.

er1c avatar er1c commented on June 1, 2024

trait MultiMap[K, V, C[_]] { ... }
šŸ‘

This is exactly what I was thinking. I think the only other option is MultiMap[K, V, C <: Iterable[V]]

My brain starts going into CanBuildFrom implicits when thinking about it, so I probably need to brush up more on the 2.13 collections to see a better builder strategy :)

from scala-collection-contrib.

SethTisue avatar SethTisue commented on June 1, 2024

abandoned draft PR: #24

from scala-collection-contrib.

som-snytt avatar som-snytt commented on June 1, 2024

Bagwell.

from scala-collection-contrib.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    šŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. šŸ“ŠšŸ“ˆšŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ā¤ļø Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.