Comments (16)
Bag does appear to be an established name:
https://en.wikipedia.org/wiki/Set_(abstract_data_type)#Multiset
https://en.wikipedia.org/wiki/Multiset
https://commons.apache.org/proper/commons-collections/userguide.html#Bags
but you're likely right, that MultiSet is more commonly used, I'm not sure.
That's too bad about the MultiMap
conflict.
from scala-collection-contrib.
Thanks @Ichoran ! Some points:
However, I don't think the counted set is very intuitively a Bag. That's not how physical bags work: you might not care that apples are different, but you get the same one out that you put in. So š for MultiSet to Bag naming
So if I interpret correctly, you're saying that Bag
is unintuitive, because when you insert an element into it, you expect that exact object (up to reference identity) should be found in the resulting Bag
. This critique is fair enough in isolation but in Scala-land the ethos is generally that we don't care about reference equality and only care about value equality. For instance, you might have made the same critique of Sets
-- that you expect that when you put an apple in a Set
and get it back out, you get the same one you put in.
Now for some quick points for the name change:
-
Bag
has 3 letters vsMultiSet
's 8, so it's much more convenient to type, which makes a pretty big difference when we treat collectionapply
as "literal" syntax (Bag(...)
). The length savings is the same difference asSeq
vsSequence
for instance. This may seem like a silly point, but anecdotally, I've even heard of people usingSeq
overList
just to save one char. -
While
MultiSet
is one accepted term and possible the most popular,Bag
is not too much less popular, and avoids potentially confusing people into thinking that aMultiSet
is some kind of special case ofSet
(a la HashSet/TreeSet/etc), when in fact MultiSets/Bags have nothing to do with Sets in the Iterable hierarchy, and do not behave like Sets. -
I think that the
Bag
analogy very nicely visualizes what it is we're describing. A bag is an unordered collection of data, and a bag is equal to an other bag if and only if the two contain the same elements (including arity), ignoring order. The termBag
really drives home that order is ignored, IMO, because bags are all jumbled up/unstructured inside.
from scala-collection-contrib.
The name MultiDict
was chosen to not conflict with the existing MultiMap
.
Is Bag
an established name for sets containing possibly multiple occurrences of their elements?
from scala-collection-contrib.
This does raise the question of why we need a mutable.MultiMap
in stdlib as well as a mutable.MultiDict
in collections-contrib. Is there an end game of deprecating the stdlib version?
from scala-collection-contrib.
Iām in favor of deprecating mutable.MultiMap
but anyways we need to make both solutions cohabit together during the transition period.
from scala-collection-contrib.
Do we need both a counted set and a set which retains original copies? Although both can occasionally be useful, I would tend to say that most uses of the latter would just as well be served by a MultiMap. It's only the former that is really awkward and inefficient.
However, I don't think the counted set is very intuitively a Bag
. That's not how physical bags work: you might not care that apples are different, but you get the same one out that you put in. So š for MultiSet
to Bag
naming, assuming that MultiSet
is a counted set. If it is not a counted set, š for having it at all.
Also š to introducing a name collision between the old MultiMap
trait and a new MultiMap
that may have both immutable and mutable variants. MultiDict
is awkward, though. In principle, š for finding something better. ManyMap
or KeyBag
perhaps. If KeyBag
, then I guess MultiSet
as a plain Bag
is okay. Or maybe have all three: CountedBag
, Bag
, and KeyBag
. But they will all be distinct APIs, I suspect, so maybe Bag
is again bad.
from scala-collection-contrib.
The MultiSet
here is a counted multiset: https://github.com/scala/scala-collection-contrib/blob/master/src/main/scala/scala/collection/MultiSet.scala#L50-L53
I prefer MultiDict
over KeyBag
or ManyMap
. https://www.google.com/search?q=multidict
from scala-collection-contrib.
You can't get elements out of sets, and if you manage to, you do get one that you put in (if you put in duplicates, you obviously only get one of them out).
The bag is acting more like a bank: you deposit an item, and you get out an equivalent item, but not necessarily the one you put in. (Though it's weirder if counted--it will give you back the same item multiple times, rather than drawing separate instances from a vault of available instances.)
The point about the length of the name is a good one, though. That may be important enough to override all the other concerns.
from scala-collection-contrib.
Oh, also, next time my groceries end up packed all jumbled up/unstructured, I will remind myself that it is because they were put in a bag š
from scala-collection-contrib.
Iām convinced that Bag
is a good name for counted sets. But the problem is more on the side of MultiDict
. Iām not sure that KeyBag
makes sense here. A MultiDict
(or MultiMap
) associates a set of values to a key.
from scala-collection-contrib.
@julienrf I also don't think KeyBag is an improvement over MultiDict. MultiMap
is probably best, save for the name collision. If I had to think of a different name, and I do prefer to go with practical/"blue collar" names over academic ones, I might consider something like OneToMany
but I'm not sure that's necessarily any better than MultiDict
(though it has exact same length). š¤·āā
from scala-collection-contrib.
Copied from #98
TLDR; I think it should be named SetMultiMap
I think it may make the most sense to rename MultiDict
to SetMultiMap
, and potentially make MultiMap
into a generic trait.
- It helps remove the @Deprecation warning and name conflict with the 2.12/2.13
MultiMap
- "MultiMap" is a more universally used terminology for this type of collection. (e.g. Guava has
ArrayListMultimap
,ListMultimap
,SetMultimap
, and etc; C++ STL isstd::multimap
; Rust isMultiMap
, Apache Commons isMultiMap
, etc) - The existing name also does support or convey the collection type of the key's values. (e.g. it currently implements
Set
, but there is no way to extend it to use anArray
,Vector
orList
instead) - The code still uses
MultiMap
in various locations (like in the hashcode!), and theCC[_]
isMapFactory
(e.g. notDictFactory
) - There are no other Dictionary terms in any of the Scala collections. It's
Map
for any scala code, only the java converters have the mention of a dictionary.MultiDict
isn't scala idiomatic.
from scala-collection-contrib.
@er1c I agree that MultiMap and MultiSet (I would prefer Bag) should be abstract data types (akin to Seq, Map, Set). I also tried to point this out in #25. I think that everyone agrees that 'MultiDict' is much worse a name than MultiMap, but it was named that way to avoid collisions with the stdlib implementation.
I agree that there would be valid usecases for MultiMaps where the underlying value collection is a Set or a Seq or even a Bag. Perhaps the best way to achieve this would be to utilize the existing Iterable Factories so that users can pass in whatever kind of collection they like? Maybe this is possible:
trait MultiMap[K, V, C[_]] { ... }
object MultiMap {
def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = ???
}
// user code
MultiMap.withValuesIn(List)(1 -> 1, 1 -> 2, 1 -> 3)
Perhaps some reasonable default value collection should be chosen (probably Set?) so that users can still initialize a MultiMap without specifying the value:
object MultiMap extends Factory[(K, V), MultiMap[K, V, Set]] {
/// implement factory...
// additional method
def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = ???
}
// user code:
MultiMap(1 -> 1, 1 -> 2) // MultiMap[Int, Int, Set]
MultiMap.withValuesIn(List)(1 -> 1, 1 -> 2) // MultiMap[Int, Int, List]
MultiMap.withValuesIn(Bag)(1 -> 1, 1 -> 2) // MultiMap[Int, Int, Bag]
If that route is taken then I would suggest making a static factory for each of the most common collection types so that we avoid allocating on each MultiMap creation, so:
object MultiMap extends Factory[(K, V), MultiMap[K, V, Set]] {
/// implement factory...
// additional method
private final val ListMultiMapFactory = ???
private final vall SetMultiMapFactory = ???
private final val VectorMultiMapFactory = ???
def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = factory match {
case List => ListMultiMapFactory
case Vector => VectorMultiMapFactory
case Set => SetMultiMapFactory
case other => new MultiMapFactory(other)
}
}
Care should also ideally be taken to design the api so that users that just want to use the default collection don't have to specify the higher kinded type parameter everywhere. This is the same principle that was applied to the new views which make them much more ergonomic (so, View[Int]
rather than View[Int, List[Int]]
). Maybe MultiMap[K, V]
extends a more general trait like MultiMapWithValues[K, V, C[_]]
.
Or maybe I'm just over complicating things, just giving ideas.
from scala-collection-contrib.
trait MultiMap[K, V, C[_]] { ... }
š
This is exactly what I was thinking. I think the only other option is MultiMap[K, V, C <: Iterable[V]]
My brain starts going into CanBuildFrom implicits when thinking about it, so I probably need to brush up more on the 2.13 collections to see a better builder strategy :)
from scala-collection-contrib.
abandoned draft PR: #24
from scala-collection-contrib.
Bagwell
.
from scala-collection-contrib.
Related Issues (20)
- Support scala.js HOT 3
- Add Scala 3 crossbuild HOT 3
- Build should declare a `versionScheme` HOT 1
- Immutable MultiSet ( and MultiDict ) lacks -- method
- Add `between`, `betweenBy` functions to Seq HOT 2
- MultiDict add many values at once for a single key
- Scala Native Support HOT 2
- New Release HOT 3
- Collection-like operations on tuples HOT 3
- MultiDict.add should return this if no changes are made HOT 3
- Document iteration order of multi-dict
- publish 0.2.0 HOT 6
- Broken 0.2.0 artifact (no .class files in .jar) HOT 9
- Project does not correctly import into IntelliJ HOT 9
- Add Automatic-Module-Name to MANIFEST.INF HOT 1
- Can Scala 2.12 be supported? HOT 1
- OSGi support is missing
- Set.permutations HOT 1
- Change values returned from SortedMultiDict to be a SortedSet[V] HOT 2
- Rename MultiDict back to original MultiMap or SetMultiMap HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
š Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. ššš
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ā¤ļø Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from scala-collection-contrib.