
Comments (134)

LiberalArtist commented on May 25, 2024

As another point in the design space, Pyret has several notions of equality. In summary:

Name          Operator  Partial Predicate  Total Predicate  Similar To
Equal Now     =~        equal-now          equal-now3       equal? (Racket), == (Python, Ruby)
Always Equal  ==        equal-always       equal-always3    = (OCaml)
Identical     <=>       identical          identical3       eq? (Scheme), == (OCaml), === (JavaScript), is (Python), == (Java)

IIUC, Racket lacks a general "Always Equal" for mutable values: a test to identify that mutations to one value will also mutate the other. Currently, eq? can only say "yes" or "idk": it is foiled by chaperones and impersonators. The default equal? for opaque structs does this, but sacrifices "Equal Now".
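For readers more familiar with Python than Pyret, here is a rough analogue of two of these notions (a sketch only; Pyret's "Always Equal" has no built-in Python counterpart, since for mutable values it sits between the two checks shown):

```python
# "Equal Now" (=~) compares current structural contents, like Python's ==.
# "Identical" (<=>) is reference identity, like Python's `is`.
a = [1, 2, 3]
b = [1, 2, 3]

print(a == b)   # True: equal now (same contents)
print(a is b)   # False: not identical (separate objects)

b.append(4)
print(a == b)   # False: mutation changed what "equal now" sees
```

The last line is the crux of the thread: structural equality of mutable values is a statement about *now*, not forever.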

from rhombus-prototype.

mflatt commented on May 25, 2024

My current(*) preference:

  • == is Racket's equal? (because that's usually what you want, and it's a good choice for match)
  • === is Racket's eq? (because it really is a common operation, however much we might prefer a different universe)
  • .= is Racket's = (because I think traditional number comparison is common enough to merit a compact name)

Meanwhile:

  • := is assignment (because Pascal was my second language)
  • = is not a predefined operator, although it is used in places like providing the default for an optional argument

I still worry that the behavior of == as equal? on numbers will end up surprising some people, but it's the least-bad effect I see in various compromises. Also, ending up with Racket's set of equality operators seems like a big benefit.

(*) Ask me again after you ask @Metaxal.

mflatt commented on May 25, 2024

From @AlexKnauth:

Match patterns aren't values... they don't have a "reference" identity to compare against, so of course they can match mutable values structurally.

Ok, I buy that. I'm probably too used to a world with quote, where quote patterns are explicitly defined in terms of equal?, and where there is the possibility of 3-D syntax (or historically worse), and that has muddied my thinking here. I also appreciate @rocketnia's point that we can have different kinds of patterns if something about them is meant to connect to values and equal?.

From @jackfirth:

I'd rather just implement gen:equal+hash with equal-always semantics for Rhombus data types than try and introduce a parallel equal-always? procedure. I don't think it's worth having two operators for this.

I don't yet buy this, because I'm not convinced that equal_now is rare or that equal_always is definitely useful. There also seems to be an idea here of sealing off Rhombus from Racket datatypes, which is not obviously practical in terms of interoperability and performance; having separate equal_now and equal_always, in contrast, seems to have a good story on those fronts.

We should put this on the agenda of the bi-weekly Zoom meeting soon, maybe just after data structures.

jeapostrophe commented on May 25, 2024

There's a syntax part of this, a semantics part, and a library part.

Regarding semantics of eq?, Robby has long advocated the idea that programs should have the same functional correctness if all (eq? x y) is replaced with #f. The idea is that eq? is purely a kind of optimistic optimization where the slow path is required to be present.
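Robby's principle can be sketched as a fast path that must be semantically deletable. A hypothetical Python illustration (not Racket's implementation):

```python
def lists_equal(a, b):
    # Identity check as a pure optimization: if this line were deleted
    # (i.e., treated as always false), the result must not change,
    # only the speed.
    if a is b:
        return True
    # The slow path must be complete on its own.
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))

xs = list(range(1000))
print(lists_equal(xs, xs))        # True via the fast path
print(lists_equal(xs, list(xs)))  # True via the slow path
print(lists_equal([1], [2]))      # False
```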

gus-massa commented on May 25, 2024

Something that I think is weird is that an immutable string can be equal? to a mutable string, but an immutable hash can't be equal? to a mutable hash. (I understand the reason at the C-implementation level, but I think that the difference should not leak to the Racket level.)
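As a point of comparison (a Python aside, not a claim about Racket internals), Python's built-in mutable/immutable pairs let equality cross the mutability divide consistently, which is the consistency being asked for here:

```python
# Equality crosses the mutability divide for both built-in pairs:
print(b"abc" == bytearray(b"abc"))   # True: immutable bytes vs mutable bytearray
print(frozenset({1, 2}) == {1, 2})   # True: immutable frozenset vs mutable set
```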

sorawee commented on May 25, 2024

In the current Rhombus:

  • = is for variable/object mutation (set!, vector-set!, etc.).
  • == is for number comparison (=)
  • === is for equal?

I think this is not ideal.

  • No other languages use == strictly for number comparison, so it would be very confusing to people who come from other languages.
  • === is not ergonomic. It's used everywhere, and three keystrokes cost too much.
  • = for mutation seems like a good target to be removed, since it's much less frequently used compared to other operations.

My preference is:

  • := for mutation (from Discord, @plane suggests that <- is also another good candidate)
  • = for number equality (from Discord, @soegaard suggests that this could be generalized to eqv?, though I personally think eqv? should not exist)
  • == for equal?
  • === for eq?

Metaxal commented on May 25, 2024

Hmm, I'm not too sure that's a good thing, as quite likely a large portion of the users (in particular students) will expect this result to be true. This can lead to serious silent bugs too.

I would be okay with 1 == exact_to_inexact(1) raising an exception instead.

jackfirth commented on May 25, 2024

=, ==, and equal? all having slightly different semantics is not a good state of affairs. I think there should only be one symbolic equality operator, and it should have the same meaning as whatever is_equal means in Rhombus (which may or may not be the same as equal? in Racket). Any other forms of equality should have descriptive names like is_numerically_equal or is_same_object and should not have symbolic operators associated with them. We want equality to have a single clear default meaning, with other notions of equality given names that say how they're different from the default meaning.

mflatt commented on May 25, 2024

What if instead of making == mean numeric equality on numbers, we just made numbers read as exact decimals by default?

That would definitely create surprises, given how much more expensive exact arithmetic is.

Exactness-insensitive numerical equality tests make up a very tiny fraction of all equality tests.

I'm not at all sure this is true, and I think this must depend on what kind of program you're writing.

While there are pitfalls to the convention that a decimal point means a floating-point number, and to the conventions for implicit coercions between exact and inexact numbers, it's for the most part a pretty good compromise between performance and real arithmetic. I haven't seen the combination be a big source of bugs in our libraries. (It is a source of confusion for beginners in practically all languages, and this is one of those places where a different choice may be appropriate in a pedagogical language like BSL.)
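The cost tradeoff can be seen in a small Python sketch, using `fractions.Fraction` as a stand-in for Racket's exact rationals:

```python
from fractions import Fraction

# The familiar inexact surprise:
print(0.1 + 0.2 == 0.3)  # False

# Exact arithmetic avoids it:
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True

# ...but exact results can grow without bound, which is the cost of
# reading decimals as exact by default. Floats stay fixed-size; this
# exact denominator keeps growing with every term added:
total = Fraction(0)
for p in [3, 7, 11, 13, 17, 19, 23, 29]:
    total += Fraction(1, p)
print(total.denominator)
```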

More generally, I'm surprised at how often opinions in this thread dismiss the status quo, as if the only reason Racket has =, eq?, and equal? the way they are and in widespread use is because we've been too lazy to try anything else. Equality is tricky. Picking a small number of equality operators and allocating names to encourage the best ones — that sounds tractable and a good idea from our past experience. Trying to boil it down to just one equality operator seems at odds with experience across languages.

mflatt commented on May 25, 2024

For now, I've updated the prototype to use == as equal?, === as eq?, .= as numeric =, := as set!, and = as a delimiter for use by things like fun. This isn't set in stone, obviously, but it seems like an improvement over the previous state.

sorawee commented on May 25, 2024

@countvajhula I don't think we can do ->float.

  1. Consider (expt 10 1000) and (add1 (expt 10 1000)). They are different numbers, so adding them to an empty set should count two elements. But if you convert them to float, they will be both +inf.0, counting only one element.
  2. Consider (/ 22 7) and (exact->inexact (/ 22 7)). They are not equal, not even under =. But if you convert them to float, then they become the same number.
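The same collapses can be reproduced in Python (with one caveat: Racket's exact->inexact maps 10^1000 to +inf.0, while Python's float() raises OverflowError instead, so this sketch demonstrates the collapse via rounding at 2^53 rather than overflow):

```python
from fractions import Fraction

# Distinct exact integers can collapse to the same float:
big = 2 ** 53
print(big == big + 1)                # False: different integers
print(float(big) == float(big + 1))  # True: both round to the same double

# Exact 22/7 and its float approximation are different numbers...
x = Fraction(22, 7)
y = float(x)
print(x == y)  # False: Python compares the exact values
# ...but converting both to float erases the difference:
print(float(x) == float(y))  # True
```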

AlexKnauth commented on May 25, 2024

A key method like that is convenient for containers, but it shouldn't be the way to define equality for all datatypes.

The example that comes to mind first is for rational numbers. The easiest way to define equality for them is that an/ad = bn/bd when an*bd = bn*ad. Easily done with just 3 simple integer operations. But to define "key" would require reducing it to simplest form by factoring, which requires a bit more.
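The contrast can be sketched in Python with two hypothetical helpers (not from any library): cross-multiplication equality needs no normalization, while producing a canonical "key" requires a gcd reduction.

```python
from math import gcd

def rat_equal(an, ad, bn, bd):
    # an/ad = bn/bd  iff  an*bd = bn*ad: two multiplications and a comparison.
    return an * bd == bn * ad

def rat_key(n, d):
    # Canonical form (lowest terms, positive denominator) usable as a key.
    g = gcd(n, d)
    if d < 0:
        g = -g
    return (n // g, d // g)

print(rat_equal(2, 4, 3, 6))           # True: both are 1/2
print(rat_key(2, 4) == rat_key(3, 6))  # True: both normalize to (1, 2)
print(rat_equal(1, 2, 2, 3))           # False
```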

LiberalArtist commented on May 25, 2024

My primary use for eq? is with symbols, especially as keys in eq?-based hash-tables. My impression is that there is a notable performance benefit.

jackfirth commented on May 25, 2024

I'd be fine with that being false, and telling people if they want to mix comparisons of exact and inexact values, they could use this:

is_numerically_equal(1, exact_to_inexact(1))

Exactness-insensitive numerical equality tests make up a very tiny fraction of all equality tests. I'm not too worried about them requiring some explicitness.

mflatt commented on May 25, 2024

Like many, I find the idea of egal?/equal_always appealing. But adding it to Racket (everything done for equal? in parallel for a new equal-always?) would be significant work. I'd go for it if it seemed clearly worthwhile... but when I try to think of situations where my code would be better with egal? instead of equal?, I'm having trouble finding good examples.

The closest I can think of is the same one others have noticed: problems with keys in an equal_always map getting mutated would not happen. Then again, if I correctly remember the cases where that happened to me or someone I was helping, the code would have been more complicated to achieve the goal with equal_always hashing, and so the programmer would almost certainly have just opted for an equal_now map anyway.

Along similar lines, it seems like the interaction with match and classes with mutable fields is more subtle. If the fields of a Posn are mutable, does Posn(0, 0) match Posn(0, 0) on the grounds that the pattern Posn(0, 0) means any Posn whose fields match 0 and 0? Or does it fail to match because separate Posn(0, 0) and Posn(0, 0) are distinct? The current implementation would say the former, probably, but I'm not sure it's consistent. Another possibility is to disallow mutable-class constructors in patterns, but that seems unhelpfully strict.

I've re-read the discussion here and on Discourse, and I'm missing the anecdotes for where equal_always could have been better. Does anyone (perhaps @AlexKnauth, the main advocate here) have concrete examples you can point at where equal_always instead of equal_now would have solved/avoided problems in practice?

AlexKnauth commented on May 25, 2024

I've made a branch and a draft PR to implement equal-always? for Racket:
racket/racket#4076

countvajhula commented on May 25, 2024

To provide another perspective here, some time ago I wrote a library providing an equality relation, =. It encodes my belief that a single equality relation is all we need in order to express any notion of equivalence. In the docs for the library I drew a distinction between the notion of "equality" and the notion of "identity," corresponding to the traditional distinction in Lisp (and in philosophy). But I think only the former is really needed.

The Main Idea

I'm not a professional mathematician, but I'll do my best to articulate what is essentially a mathematical point of view. The basis for it is the following idea, which intentionally has some handwaving, to be made precise later:

Any notion of equivalence can be expressed as a function. Objects that are mapped to the same target value by the function are equivalent under the notion of equivalence that the function represents.

For instance, two strings "apple" and "APPLE" are equivalent under the function string-upcase because they map to the same value.

One way to think about this is that the defining function f maps input values considered "equal" to a common representative, as "APPLE" was in the string-upcase example.

More precisely,
For any equivalence relation "==" on a set E and a primitive notion of equivalence "=", there exists a set M and a function f : E → M such that x == y is equivalent to f(x) = f(y).
Proof. By the results "I don't wanna prove it" and "I really feel like this is true," Q.E.D. Also, this closely corresponds to, but isn't exactly, a standard result on equivalence relations. If we feel a proper proof is necessary, we could tackle this later.
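(For what it's worth, the standard construction behind this takes M to be the set of equivalence classes E/== and f to be the quotient map sending each x to its class.) As executable intuition, here is a hypothetical `equiv` helper in Python, which is not any existing library's API:

```python
def equiv(f, x, y):
    # x and y are equivalent exactly when the defining function f
    # maps them to the same value.
    return f(x) == f(y)

print(equiv(str.upper, "apple", "APPLE"))   # True: same image under upcasing
print(equiv(str.upper, "apple", "apples"))  # False
print(equiv(float, 1, 1.0))                 # True: a numeric "key"
```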

Primitive Equivalence and the Defining Function

Parsing the above carefully, we glean that the implementation motivated by this idea requires two things: (1) a primitive notion of equivalence "=", and (2) that the high-level equivalence relation "==" being defined should accept a function argument ("the defining function"). We could make this argument optional, so that if none is provided it is essentially the same as using the identity mapping, i.e. values, and then "==" and "=" are the same relation.

Re: (1) the primitive notion of equivalence, I think this could be any "sufficiently expressive" idea of equivalence. This notion of equivalence should, for instance, accept inputs of any type, and be able to distinguish between values of the same type (returning false if the inputs are of different types). For this primitive equality relation, we could potentially use one of the existing relations (such as equal?) or a combination of them, based on the following considerations:

What does "sufficiently expressive" mean? How should numbers be treated? How should immutable vs mutable values be handled?

Sufficiently expressive means that the primitive equivalence relation must be more discriminating than any notion of equivalence that we are likely to want in practice. So for instance, if it were to treat "abc" as equal to "ABC" then that's not good, because there are cases where we want to be case-sensitive. Another example: if it were to treat 1.0 as different from 1.00 -- that would be OK, since it is more discriminating than we typically need. Likewise, it should probably always treat immutable values as different from mutable values, if this distinction is one we are ever likely to care about in practice. Of course if all values were immutable, as Sorawee foreshadowed above, this would be a non-issue, but resolving it isn't a precondition to being able to define the equivalence relation here.

As an example, if we were to use equal? for the primitive relation "=", and considering the sign "==" to represent the new high-level equality relation, then in order to compare numbers the way the built-in numeric equality relation "=" does, in this scheme, we would need to use something like relation/type's ->float as the defining function, e.g. (== #:key ->float a b), to use Racket syntax. I don't know a lot about how floating point numbers are modeled in Racket and whether conversion to float is a reliable way to do the comparison or not, but the point is that we would need to map the input numbers to the "appropriate" representation that is able to treat 0.125 as the same as 1/8, if we happen to be interested in decimal equivalence.

If Primitive Equivalence Should be "More Discriminating," Shouldn't We Use eq?

For eq?, every value is equivalent to itself but nothing else at all. It would seem that this is "sufficiently expressive" and the "most discriminating," and therefore seemingly a good choice. Yet, it doesn't feel right. What defining function could we use to treat two distinct strings "hello" and "hello" as equal? It would need to somehow map both of these values to the same object in memory, while also implementing the criterion we wish to use to assert the equivalence of these two values. I think the issue is that while eq? is the most discriminating, it is this way by virtue of arbitrary criteria rather than by actual analysis of the objects being compared. That is, objects are the same or distinct based on where they are, not what they are. So it may be that the real criterion is not just "most discriminating" but more specifically, "most discriminating based on inherent properties of the objects," and we cannot use eq? here.

How to handle identity vs equality?

In many cases in practice, we prefer to use eq? rather than equal?. For instance, when comparing structs, the latter does a recursive comparison of fields (as I understand it), and likewise for lists, making it expensive, while the former is a constant-time check. We may still like to have a dedicated notion of "identity" for such cases. For this, we could simply use (== #:key eq-hash-code a b), i.e. use a function that maps an object to its memory address. So the same? proposed above becomes a thin facade on == and replaces eq? while retaining its performance profile.
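In Python terms this is identity-as-a-key (a sketch: `id` plays the role the comment assigns to eq-hash-code, and is only meaningful while both objects are alive):

```python
def equiv(f, x, y):
    # Equivalence under a defining (key) function f.
    return f(x) == f(y)

a = [1, 2]
b = [1, 2]

print(equiv(id, a, a))  # True: same object
print(equiv(id, a, b))  # False: structurally equal but distinct objects
print(a is b)           # False: the answer identity gives directly
```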

Likewise, the equal-always? proposed above is the default behavior of =, but we could get less discriminating behavior with another thin facade, equal-now?, via the defining function, ->immutable. For objects that we know are mutable, it falls to us as developers to use same? rather than ==.

Equality-based interfaces and structures

There are many implicit equality-based interfaces (like remove, member, ...) and structures (like hashes). For these, at the moment in Racket there are parallel sets of interfaces and structures, each employing a particular notion of equality (e.g. eq? or equal? or eqv? etc.). With the new defining-function-based equality relation, we would only need to have a single set of interfaces and a single set of structures that implicitly employ ==. All of these interfaces and structures must also accept an optional function argument, just as == itself does, to be able to specialize the definition of equality as we have seen.
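This is essentially the convention Python already uses for sorted/min/max: one interface, specialized by an optional key. A hypothetical `member` in that style (illustrative only):

```python
def member(x, xs, key=lambda v: v):
    # One membership test; the notion of equality is chosen by `key`
    # instead of having member/memq/memv variants.
    return any(key(x) == key(v) for v in xs)

print(member("APPLE", ["apple", "banana"]))                 # False: default equality
print(member("APPLE", ["apple", "banana"], key=str.upper))  # True: case-insensitive
```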

User-supplied equality functions

Racket interfaces often accept an optional equal-proc argument (e.g. assoc), allowing the user to supply a function to use in place of equal? or eq? etc. The claim with == is that all notions of equivalence can be modeled by it. It would be good to try and find counter-examples to this claim: cases where we need to supply a (binary) equality function, where a (unary) defining function is inadequate.

Counterexamples

One possible counter-example is the relation of being "approximately equal." We could define this as "a = b means a and b are within 0.5 of each other." We cannot express this as a defining function with ==, since the notion of equality here relies on a difference between two values, and therefore is a function of both values and cannot be determined by projection of a single value alone. It would seem that == is done for!

But on closer inspection, we see that 4.7 ~= 5.0 and 5.0 ~= 5.3. Yet, 4.7 ~= 5.3 is not true. Mathematically, notions of equivalence must be transitive, meaning a = b and b = c implies a = c. The relation of "approximate equality" violates this, so we could say, no bueno, and in that case, == is OK.

Do we want to consider this a legitimate definition of equivalence, even though mathematically it isn't valid? Philosophically (rather than mathematically), I am not unsympathetic to this view, for certain reasons that we needn't get into, but personally, I think the mathematical criteria should prevail here, and == is still good, since otherwise, for instance, it would make things weird with hashtables, where hashing would be ambiguous! E.g. if our hashtable represents buckets of 100m sprint times with a tolerance of 0.5s, then 5.0 could either be hashed under 4.7 or 5.3 (or both).
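The non-transitivity is easy to check concretely (a Python sketch of the tolerance relation described above):

```python
def approx(a, b, tol=0.5):
    # "Within 0.5 of each other": a relation of *both* arguments,
    # so it cannot be expressed as a single-value key function.
    return abs(a - b) <= tol

print(approx(4.7, 5.0))  # True
print(approx(5.0, 5.3))  # True
print(approx(4.7, 5.3))  # False: transitivity fails, so this is
                         # not an equivalence relation
```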

Caveat Emptor

In the tradition of Knuth, I'll qualify all this by saying, "beware of bugs in the above -- I've only proved it true, not tried it." If we can come up with other counterexamples for consideration, that would certainly help reveal whether == is viable.

By the way, the = relation from relation/equivalence is roughly a proof of concept for some of the behavior above, but it differs from == in some ways -- e.g. handling of numbers, and I'm not sure if the implementation of the hash-code is sound for the purposes of using it in a hash/dictionary. The above discussion should be treated as taking precedence over any of the decisions in relation/equivalence. But you're welcome to try it to get a sense of how == might work.

AlexKnauth commented on May 25, 2024

I think you might be misunderstanding my purpose. I don't think there are that many problems in existing Racket programs using equal? that would be fixed by switching to equal-always?. (Mostly because mutation is rare enough in Racket that it doesn't matter much, and people who do use mutation in Racket usually know enough to know when to use eq? instead of equal? for mutable data.)

However, there are more problems in existing Racket programs using eq? that would be fixed by switching those to equal-always?. Both from beginners who might think eq? is just a convenient shorthand, and from more knowledgeable people who use eq? because of its properties wrt mutable data, but get burned by chaperones and the like not counting as eq?.

An equal-always? would also be a useful middle-ground that would suffice to replace most uses of both eq? and equal?-used-in-functional-programs, making it a good choice to condense the many various equality operators in Racket currently, into just one or two to simplify it.

rocketnia commented on May 25, 2024

Just to continue the discussion from today and the point @jackfirth brought up, I think we agree that we want the following relation to hold:

	a <= b <=> a < b or a = b

In the current proposal, we define a <= b to be a < b or a = b, so that this does hold by definition. In the specific case:

	2 <= 2.0 <=> 2 = 2.0 or 2 < 2.0

Comparing values of different types seems quirky even at a conceptual level, and it probably shouldn't dictate the design.

If we do contemplate it anyway, we should probably reach whatever conclusion we would reach if they were members of the same type. And perhaps that's already the case here; perhaps we can think of 2 and 2.0 as being members of the real? type.

In that case, comparing numbers with potentially overlapping error distributions seems quirky even at a conceptual level... but I really want to say 2 <= 2.0 by comparison with 0.0 <= -0.0. Giving non-NaN floating-point numbers a total preorder based on some canonical exact counterpart makes -0.0 = 0.0 make sense in at least some way, and then 2 = 2.0 makes sense in the same way, = is an equivalence relation, and a <= b is equivalent to (a < b) or (a = b). I think that's a nice set of properties.

One downside: I don't know if the author of an interval arithmetic system would want a <= b and (a < b) or (a = b) to be equivalent all the time. Still, we might have to draw the line somewhere.

I'm perpetually tempted to seek a notion of equality that works for everything, but I believe one of Rhombus's most defining advantages will be its usability as a calculator by people familiar with other low-code computing experiences (spreadsheets, actual calculators, scripting languages). (I haven't always liked this aspect of Rhombus, but I've come around.) This makes numeric equality far more important than other kinds. By the time someone knows enough about compound data structures to know what kinds of comparison they want to perform on them, they've probably learned enough to navigate some documentation and pick out a variation that has the more explicit name they need.

So this is what I'm thinking:

  • Rhombus should have a single equality check operation that's preferred for virtually everything, =. It should be what it needs to be so that arithmetic works in a familiar way, even if that skews other aspects of Rhombus's design.
    • Not all types should actually have full support for =, and out of the types that do, some of their values (NaN in particular) might not participate in it. The function type would be a good example of a type that doesn't support an = check. Types like that might still have a conceptual notion of when values are = even though it's not programmatically checkable.
      • Actually, functions are an awkward example which brings out some more subtlety. In Racket, and presumably in Rhombus, many structs with their own notions of equality can all implement prop:procedure. So in general, a type's notion of = might have multiple disjoint subsets of values that can be compared. It just so happens that for the real? type, NaN is not part of any set.
    • I'd like to consider a function to be pure if it not only has no externally visible side effects but also obeys function extensionality with respect to its input and output types' respective notions of =. Embracing = even at this conceptual level ought to reduce the overall number of notions of equality people need to juggle. And this mindset leads to a few interesting implications:
      • This makes + an impure operation on floating-point inputs, since the expressions 0.1 + 0.2 and 1/10 + 2/10 pass it = inputs but compute non-= results. Calling + impure seems like a good choice; people dealing with floating point values need to be aware of + as a source of surprises (e.g., catastrophic cancellation), so it's already impure in a practical way.
      • It would be odd to say a function that constructed a new mutable hash was pure, but if = on mutable hashes behaved like equal?, we might end up with it being pure on a technicality. That's a clue that = on mutable hashes should behave like eq? instead. (Or, in general, like equal-always?.)
      • For an ecosystem which enables pure FP as much as Racket's does, it might be weird for user-defined structs to have impure constructors by default. That's a clue that user-defined structs shouldn't default to a notion of = that matches eq?. Structs in Rhombus should probably have some other default behavior for = instead, like an implementation which treats every value of the type as a non-participant in its = check, similar to NaN.
    • Types that support <= should support = as well, and a = b should be consistent with (a <= b) and (b <= a).
    • Functions and data structures that are polymorphic over all equality-equipped types should rely on their caller to specify what that equality is, since the caller is also specifying what the type is. If the caller expresses no preference, = should be the default. When the caller does specify their preference, they might use a key function (as @countvajhula's been saying) or a comparator.
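The mutable-hash bullet above can be made concrete in Python (an analogy only: dict == plays the role of equal?-style structural equality, and `is` the role of eq?):

```python
def make_counter():
    # A constructor of fresh mutable values.
    return {"count": 0}

a = make_counter()
b = make_counter()

# Under structural equality the constructor looks "pure": every call's
# result is equal to every other call's result...
print(a == b)  # True

# ...but the results are distinct mutable objects, and mutating one
# does not affect the other -- the clue that an identity-like equality
# (eq? / equal-always?) is the fitting notion for mutable values here.
a["count"] += 1
print(a == b)  # False now
print(a is b)  # False all along
```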

While I'm thinking about it, here's various functionality I imagine each equality-like comparator might support. Not all comparators would necessarily support all the features. Some of these features are kind of pie-in-the-sky ideas that might never need to exist, certainly not in the first several versions of Rhombus.

  • Checking that a value is a participant in the comparator, and figuring out which disjoint subset of comparable values it belongs to.
    • (More generally, a comparator could have all the same features a contract does, including the ability to check whether another comparator is a restriction of it (contract-stronger?) or to generate random examples of comparable values. I think most contract features would be superfluous on comparators and vice versa, but there may be potential for some features to influence each other or to be factored out.)
  • Hashing a value.
  • Comparing values.
    • (I'm also interested in a generalization where a comparison is actually performed on a graph of values with comparators on the edges between them. Not because I actually ever imagine it being useful in an application! I suppose it just seems related to taking the limit of a diagram in category theory, so it seems like the kind of generality that might be useful by accident.)
  • Comparing values according to some total preorder that's consistent for any program using the current implementation of the comparator, even if it's not part of the stable API. This can be handy for using the values as keys in an AVL tree.
    • Sorting a whole collection of values according to that total preorder, producing a list of partitions. (I've built something like this before in the form of Interconfection's getfx-cline-sort.)
  • Looking up a value's entry in a trie, possibly a trie where some subkeys are wildcard patterns that match any key. (I want to use something like this for pattern-matching, but I haven't figured out all the details.)
  • Diffing values to create unit test failure reports.
    • (In general, an unsuccessful comparison could generate a diff that's a lazy stream of steps that traverse into the value tree and find a series of equality witnesses until they find an apartness witness and abort. A successful comparison could have the same result, but would encounter no apartness witness and would end up with a fully zipped structure, similar to the original values but with equality witnesses at the leaves.)
      • (I kind of want to figure out if the lazy stream of apartness witnesses and the sequence of trie lookups can use the same API somehow.)
  • Inspired by the kind of nominal-type-based optimization that @jackfirth has talked about in the context of stream and collection APIs, specific kinds of comparators could have various extra operations that made them more cooperative as subcomponents of certain other comparators or data structures. For instance, a comparator between strings might first transform the strings and then compare the results according to another string comparator, but if string comparators exposed certain extra functionality, this process might not need to construct intermediate strings nor take two passes over the data.

soegaard commented on May 25, 2024

This is a long thread so forgive me if this has been mentioned already.

The IEEE standard for floating point numbers explicitly says that:

Comparisons shall ignore the sign of zero (so +0 = −0).

My understanding is that both +0.0 and -0.0 mean the result was rounded to zero.
The +0.0 means "rounded down to zero" and -0.0 means "rounded up to zero" - but both
representations are just zero.

That is: -0.0, 0.0, and +0.0 all represent the value zero.
And none of them is larger or smaller than the others.

That view is consistent with the current behavior:
> (< -0.0 0.0)
#f
> (< 0.0 -0.0)
#f
> (= 0.0 -0.0)
#t
> (list (<= -0.0 +0.0) (<= +0.0 -0.0))
'(#t #t)

Section 5.11 has more to say on the subject of comparison predicates.

https://irem.univ-reunion.fr/IMG/pdf/ieee-754-2008.pdf

[Note: There is a newer standard, but I don't know where to find it.]

rocketnia commented on May 25, 2024

@countvajhula

It wouldn't be "key functions all the way down" because we are taking egal? over the built-in Racket values as a primitive.

I wanted to encourage digging in further and figuring out which built-in Racket values are in use here. How do we know they're enough? If they're not enough someday, do we make more built-in Racket values or take another approach? But if you're fine with binary-style gen:equal+hash being an option, then I think that makes it possible to express anything binary comparators can do.

That is, assuming we allow specifying a key function via gen:equal+hash for the type and not just on an ad hoc basis.

Incidentally, I believe an interface like that can currently be built in a way that abstracts over the existing gen:equal+hash or prop:equal+hash. I do think it'll be a lot more convenient than writing binary comparison functions and hash functions by hand, though, and having the ability to do that with just what's in the base package would be nice.

I see what you're saying re: key types though. Since the method of checking equality on the ground set breaks down into islands that correspond to types, these types could be seen as "key types."

In Racket, most types are assumed to be disjoint, so it might be odd to come up with an example where the islands are nontrivial. Still, there's a pretty well-defined distinctness between symbol? and hash? values (making them potentially part of the same island), and there's a pretty well-defined distinctness between symbol? and bound-id-table? values, but maybe hash? and bound-id-table? values might converge someday. If there's any chance they could, then hash? and bound-id-table? are different islands for now, and they might converge to a single island once comparisons between them are defined. And if those two are different islands, that leaves symbol? in a position where it's unclear whether it shares an island with hash? or with bound-id-table?... and until we can decide that, it's probably easiest to think of symbol? as an island of its own that just happens to have some well-defined disjointness relationships with the hash? and bound-id-table? islands.

(Hmm, perhaps if islands are sometimes related by known disjointness, "islands" aren't quite the whole story.)

Realistically, the most plausible types to converge someday are types in separate libraries that haven't coordinated with each other yet. There probably aren't many meaningful examples in base Racket.

Another example is that some types may become cross-phase persistent that weren't cross-phase persistent from the beginning. Some values that were based on phase-specific versions of a type before, and hence couldn't be equal? to each other, suddenly can be.

Examples of islands are a lot clearer when it comes to ordered comparison. We might expect rational? values to have the ordering they're mathematically defined with, and we might expect list? values to have some kind of lexicographic order, but when we get to comparing a rational? to a list?, the ordering is a lot less clear. Most people who try to compare a rational? to a list? probably have a type error in their program, and if they really mean to compare these things, they should probably be using some custom comparison behavior.

I'm not sure I see a difference with key types being extended by comparators vs key functions, though -- in either case, once a new type has defined equality for itself, new types after that could treat it as a key type, no?

I mean, that's part of why I was confused. :) I see comparators as a building block that's useful in making more key types, which in turn makes key functions more useful, which in turn is one of the easiest ways to build a comparator.... I think that unless there's a reason to exclude some techniques (and indeed Cene's dexes exclude some techniques to keep abstraction details from leaking), they all tend to be better together.

But if gen:equal+hash supports specifying a key function, then the equality check here would delegate to the method on the type, and it would produce a representative "from within" the instance.

Oh yeah, I guess it would! Then the encapsulated information would only be seen by the Racket/Rhombus internals and not in the comparison results. Interesting. :D

All right, I might have to think about whether I have any other encapsulation needs that aren't solved by a combination of key functions and a key-function-based generic interface. Actually, this is very analogous to what I've had in mind for user-definable dexes in Cene, so I suppose this might already cover any complication I've been able to think up in the past few years.

So... out of the reasons that I as a user might be afraid I need to use binary comparison techniques someday, I think performance might be the last holdout.

And as for performance, I think this might actually be solvable by imagining a value that can have its comparison-relevant parts iterated like a stream (rather than constructing a whole vector or other representation all at once). Then there can be a new key type that wraps another value in a way that applies a transducer to the other one's stream -- the transducer essentially being a streaming key function!

Wow, all right, maybe key functions can have it all. :)

[...] what's the reason for using make-rectangular here?

The number->maybe-equal?-key function takes a complex number? apart into two real? values, normalizes them, and then uses make-rectangular to put them back together.

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024 1

It would be better to find ways to incorporate such special cases using special handling rather than design the core equality extension interface around them.

What special handling do you propose for the case outlined by Alex? I believe Alex's point was that there is no special handling that can compute the equality of two rational numbers using key types without factoring.

As for collections, I don't want a comparison of two collections to require copying each of their contents into new vectors just for comparison purposes. That completely negates my ability to make collections of different sizes short-circuit their comparison logic. Comparing an empty list to a list of size N should always be an O(1) operation.

from rhombus-prototype.

rocketnia avatar rocketnia commented on May 25, 2024 1

@jackfirth: Comparing an empty list to a list of size N should always be an O(1) operation.

@jackfirth: Here's my use case: comparing two collections of different sizes for equality should always be O(1) (assuming that computing their size properties is O(1)) and should never require traversing their elements. How can we make that happen?

With memoization, the amortized asymptotic complexity of using comparison representatives should be the same as without them. The size of the representative is usually proportional to that of the value it represents, so it might as well have been constructed at the same time.

That isn't to say there's no cost to it -- I think it does mean doubling the memory footprint of every data structure that ever participates in a comparison -- but it's not a cost that I think would show up in an asymptotic complexity analysis.

@jackfirth: Do we need the recur argument in the Rhombus interface? I'd prefer to simplify things by avoiding it.

Besides the examples Alex gave, I believe it's important not to forget that the recur of equal-proc is also used for chaperone-of? and impersonator-of?.

Notably, in those cases, equal-proc and recur do a partial order comparison -- not an equivalence comparison, and not even a total order comparison. Namely, they check what information ordering holds between values that may have some details partly cordoned off by contract failure reports.

As a practical matter in Racket, I believe this is part of what lets chaperone contracts be as strict as they are without being unusable. If authors of user-defined types in Rhombus didn't cooperate with the chaperone system this way, I think more contracts would have to be impersonator contracts.

(All right, I'm not sure I would actually notice the difference. I've always kind of wondered why Racket bothers with chaperones.)

It's also worth observing that this cooperation mechanism isn't specific to chaperone-of? and impersonator-of?. If someone else had an all-new chaperone-like idea or some other kind of information ordering in mind, they could use equal-always?/recur to define their own chaperone-of?-esque partial order.

Similarly, there's some flexibility here for the Racket language to improve the meaning of chaperone-of? itself without needing everyone to migrate their code.

(Hey, that might be it! Maybe Racket bothers with the strictness of chaperones because when people try to respect the intended use of the contract system, the contract system itself can be updated without breaking their code.)

@AlexKnauth: Since Rhombus == follows the equal-always? mode, would it be okay to assume that the EqualPlusHash methods are always in line with that mode too?

Hmm. Hiding the mode from Rhombus programmers by default seems like a good way to keep the Rhombus experience free of distractions. There's little reason for a type to customize equal-always? and equal? in different ways except possibly to maintain consistency with existing Racket data structures.

Like @countvajhula, I'll also point out that the key method is even freer of distractions. No need to think about recur, mode, hash-proc, or hash2-proc. No need to repeat operations for the left side and the right side of the comparison. That convenience is the main reason I like it.

A secondary reason is that it's decoupled from details like those, perhaps even some details that might arise in the future. If Rhombus were a standalone language undergoing a lot of experimentation, then the choice to prevent Rhombus code from depending on these details could be a good way to save people from having to rewrite their code later on.

However, Rhombus isn't entirely a new language; I hope it will coexist with Racket code that doesn't use key at all. So I wouldn't go quite so far as to say that everything that can go through the key interface must do so (much less to conjecture that every possible comparison operation can usefully do so).

@countvajhula: As I recall, part of the work that would be needed to support it out of the box is to have a unique "type tag" for custom user-defined types, which Racket currently doesn't provide. The POC code allows us to link the gen:equatable (called gen:comparable in the code) to a struct type property that includes such a type tag, but that only works for types that explicitly declare gen:comparable and not ones (i.e. hopefully the majority) that use the default equality extension implementation (based on vectors).

I don't see this as being an issue for the purposes I think you're going for here.

Using struct->vector breaks the encapsulation of the abstract data type, so I think it's probably a mistake for anyone but the author of the type to invoke it. Specifically, in that code, I don't think the interface should have a default that invokes it. Now the default implementation has two missing parts: The type tag and all the fields. 😛

But the default you're going for can be accomplished a totally different way. Rhombus's analogue of struct could inject an implementation of gen:comparable into the type it defined. The author of the type is right there saying what the type tag and all the fields are, and they have the authority to poke holes in the encapsulation of their own type if they want to, so the information is all there with no strings attached.

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

It might be worth considering a notion of equality that recurs through immutable data structures like equal? does, but uses reference equality on mutable data structures.

Same idea as egal? from Rackjure and Equal Rights for Functional Objects

from rhombus-prototype.

spdegabrielle avatar spdegabrielle commented on May 25, 2024

Regarding eq?: would same? be better?
The Racket reference uses "same" to describe eq?:

Return #t if v1 and v2 refer to the same object, #f otherwise.

https://docs.racket-lang.org/reference/booleans.html?q=eq%3F#%28def._%28%28quote._~23~25kernel%29._eq~3f%29%29

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024

Regarding eq?: would same? be better?
The Racket reference uses "same" to describe eq?:

Return #t if v1 and v2 refer to the same object, #f otherwise.

https://docs.racket-lang.org/reference/booleans.html?q=eq%3F#%28def._%28%28quote._~23~25kernel%29._eq~3f%29%29

The name we pick for eq? ought to be longer / more complex than equal?, to nudge people towards using equal? by default. So while I like using the word "same" for this, I think same-object? (or something similar) would be much clearer than same?.

from rhombus-prototype.

spdegabrielle avatar spdegabrielle commented on May 25, 2024

+1 for same-object?. Much better than same?.

from rhombus-prototype.

sorawee avatar sorawee commented on May 25, 2024
  1. Why don't we make it consistent: use =? rather than plain =? We do have things like free-identifier=? already anyway (yes, I'm also up for <?, <=?, etc.).
  2. Is there a case where (equal? a b) and (= a b) don't behave the same when a and b are numbers? If not, why don't we also merge them into one? I.e., in general use =? (which is currently equal?). If you want a contracted version, just prefix it: number=?, string=?, etc.
  3. object-equal? or reference-equal? also works for eq?'s new name. No need to invent a new name (same).

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

To answer (2), yes, for (equal? 1 1.0) => #f vs (= 1 1.0) => #t. Exact integers and floats-that-happen-to-be-integers are different Racket values, but are equivalent numbers under =
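Concretely, at a Racket REPL:

```racket
(equal? 1 1.0) ; => #f  (an exact integer and a flonum are different values)
(eqv? 1 1.0)   ; => #f
(= 1 1.0)      ; => #t  (numerically equal)
```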

from rhombus-prototype.

dedbox avatar dedbox commented on May 25, 2024

My primary use for eq? is with symbols, especially as keys in eq?-based hash-tables. My impression is that there is a notable performance benefit.

Hear, hear. eq? (and intern) are in the original Scheme paper, and I'm having a hard time understanding (Jay's account of) Robby's idea.

Is he saying eq? is misused so often that it is effectively useless, or that it is actually useless for some technical reason?

from rhombus-prototype.

jeapostrophe avatar jeapostrophe commented on May 25, 2024

eq? interferes with optimizations because it exposes pointer identities, so you cannot, for example, create copies of immutable objects. Similarly, it interferes with providing safety because you can tell the difference between an object and its chaperone.

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024

Mutable objects are a strange case. If I have two vectors v and w, it's very important to know whether or not they refer to the same underlying storage: meaning whether or not vector-set! on v changes the result of vector-ref on w. This is different from knowing if v and w are eq?, because a vector may not be eq? to its own chaperone but they both refer to the same storage. Today, equal? on mutable objects just considers whether they have the same contents - not the same identity (which, due to chaperones, is not the same as just being eq?). I think this is very confusing, and that we should change equal? on mutable objects to only return true if they refer to the same mutable storage. This would communicate that the identity of a mutable object is an important part of its semantics.
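A small Racket sketch of the situation described above: a chaperone is not eq? to the vector it wraps, yet both refer to the same mutable storage, and today's equal? answers #t because it only compares contents:

```racket
(define v (vector 1 2 3))
;; w interposes on references and assignments but shares v's storage.
(define w (chaperone-vector v
                            (lambda (vec i val) val)    ; on vector-ref
                            (lambda (vec i val) val)))  ; on vector-set!
(eq? v w)             ; => #f  (the chaperone is a distinct object)
(equal? v w)          ; => #t  (same contents)
(chaperone-of? w v)   ; => #t
(vector-set! w 0 42)  ; writes through to the shared storage
(vector-ref v 0)      ; => 42
```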

from rhombus-prototype.

LiberalArtist avatar LiberalArtist commented on May 25, 2024

Re @jeapostrophe's comment on eq?: I agree that we should avoid exposing pointer identities (at least in the safe/high-level language) and we should allow the implementation to be as aggressive as it wants at optimizing storage for immutable objects. I hope we can find a way to do that without degrading performance for things that are now idiomatic and correct uses of eq?, particularly with symbols.

I have a vague notion that the cost of equal? vs. eq? is especially notable in hash tables. With a statically-visible symbol, equal? can just be optimized to eq?, but AIUI it triggers more expensive double hashing in hash tables.

Another use-case is weak eq?-based hash tables to cache expensive computations that might be repeated: if the hash-table lookup becomes more expensive, that might negate the benefit of using the cache in the first place.

But maybe we will come up with an equality test that avoids the problems with eq? but still has good performance.

from rhombus-prototype.

rocketnia avatar rocketnia commented on May 25, 2024

eq? interferes with optimizations because it exposes pointer identities, so you cannot, for example, create copies of immutable objects. Similarly, it interferes with providing safety because you can tell the difference between an object and its chaperone.

For this reason exactly, I think it'd be good to introduce a prop:has-hashable-object-identity property that new user-defined data structures don't have to implement (even if every value up to now has done so).

And, uh, I think this would eventually motivate migrating to yet another list representation, this time using cons cells that are not mutable and do not have object identity. But maybe one step at a time...

With that in mind, this is how I figure the equality functions can be renamed and recontextualized:

  • eq? -> equal-cheap, an equality check that only works on values that can be compared without traversing them. All values that have object identity support this since the object identity is the only part that needs to be checked.
  • (did not exist) -> equal, a new deep equality check that not all values support, but that is actually guaranteed to tell the truth about indistinguishability for values that do. Again, any value that has object identity can be compared without traversing into it, so this operation is equivalent to eq? in Racket 1, and that's why it doesn't exist yet.
  • equal? -> custom-hashable-equal, which no longer accepts all values, but just those values which have object identity, have support for the new equal, or have prop:equal+hash. The name might tip people off to the gotchas of using this kind of equality, since programmatic behavior in an untyped language is unlikely to enforce algebraic laws.
  • = -> numbers-are-actual-numbers-and-have-been-rounded-to-be-equal. This rather elaborate name clarifies that this operation is only for numbers, that it returns #f if there are any NaNs, and that it isn't comparing real numbers, but only roundings thereof, which may have become arbitrarily imprecise.
  • eqv? -> (retired, possibly renamed to number-content-or-character-content-or-other-value-object-identity-is-equal)

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024

I think in nearly all cases where eq?-based hash tables are used, they're used with quoted symbol keys. In these cases, instead of exposing pointer equality we can just point users to a special purpose data structure that assumes symbol-like keys such as Rebellion's records. This also makes it easier to optimize further than what is possible with a generic hasheq data structure that assumes nothing about its keys.

I really think we should avoid introducing additional equality operators and concentrate on changing or removing the existing ones. I'd much rather change equal? to treat mutable objects with different identity as unequal than introduce a new equality operator with this behavior. If I wanted to compare two mutable vectors on their contents alone, the first approach I'd reach for would be (equal? (vector->immutable-vector vec1) (vector->immutable-vector vec2)) rather than read through the docs on the various equality operators until I'm sure I have the one that does what I want.

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

For an equality function that uses structural equality on immutable values, but reference equality on mutable values, is chaperone-of? the function that currently does that?

I would propose making equal? more like Rackjure's egal? or Pyret's equal-always, while renaming the existing structural-on-mutable equal? to equal-now?.

However if chaperone-of? already behaves like an "equal-always" predicate, then that might be as simple as renaming chaperone-of? to equal? while renaming the old equal? to equal-now?. Is this right?

If chaperone-of? behaves like an "equal-always", then the only difference I can think of between chaperone-of? and egal? is on streams, where egal? says streams are egal if they have egal elements (potentially diverging on infinite streams), where chaperone-of? would return false.

(Edit: okay it wouldn't quite be that simple, there would also need to be chaperone-of-based hash-tables, renamed to equal-based hash-tables along with it)
(Edit 2: apparently chaperone-of? isn't symmetric, so I suppose the new "equal-always" version of equal? should be equivalent to (or (chaperone-of? a b) (chaperone-of? b a)), except it shouldn't have to traverse the data twice)
(Edit 3: so just or-flipping on the outside won't be enough, the or-flip needs to happen on every nested place because it could need different directions in different parts of the data)
(Edit 4: so just or-flipping-and-traversing won't be enough, because if x2 and x3 are both chaperones of x1, they're unrelated by chaperone-of? in both directions)
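The asymmetry and the Edit 4 problem can be seen directly with a trivial chaperone:

```racket
(define (chap v)  ; interposes on ref and set! without changing anything
  (chaperone-vector v
                    (lambda (vec i val) val)
                    (lambda (vec i val) val)))
(define x1 (vector 1))
(define x2 (chap x1))
(define x3 (chap x1))
(chaperone-of? x2 x1) ; => #t  (a chaperone is chaperone-of? its target)
(chaperone-of? x1 x2) ; => #f  (but not vice versa)
(chaperone-of? x2 x3) ; => #f  (two sibling chaperones are unrelated in
(chaperone-of? x3 x2) ; => #f   both directions, though intuitively they
                      ;         should count as equal, via the shared x1)
```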

from rhombus-prototype.

mflatt avatar mflatt commented on May 25, 2024

One potential confusion with == as equal? is that 1 is not equal? to 1.0. (That's one reason why == isn't currently equal?.)

Maybe == could be numeric equality when both arguments are numbers and equal? otherwise.

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024

What if instead of making == mean numeric equality on numbers, we just made numbers read as exact decimals by default? Then 1 == 1.0 would be true even if == means equal?.

from rhombus-prototype.

sorawee avatar sorawee commented on May 25, 2024

@jackfirth What should be the answer of

1 == exact_to_inexact(1)

?

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024

I think of it as a type mismatch. Two things of different disjoint types can't be equal, similar to how some_number == some_string is allowed, but pointless. It's important that it's allowed so that things like x == some_gensym work correctly. But we could lean on static analysis to detect cases where equality tests are guaranteed to be false.

from rhombus-prototype.

Metaxal avatar Metaxal commented on May 25, 2024

My current(*) personal preference:

  • set, := or <- for assignment
  • = for numeric equality (1 = 1. is true).
  • == for value equality within the same type, and exception raised if types are different
  • equal? for type equality, then value equality
  • object-equal? or ref=? for eq? (these names aren't great, but at least it's intuitive)
  • eqv? does not exist

Roughly based on the principle of least surprise: names should be intuitive enough and not do what you would not expect them to do (which is less stringent than "they should do what you expect them to do").

(*) Ask me next month for different answers.

from rhombus-prototype.

sorawee avatar sorawee commented on May 25, 2024

In Discord (and above comments), @AlexKnauth has been advocating for == for egal? over equal?. egal? is probably not ergonomic unless most things are immutable (strings in particular), but it looks like we are heading in that direction anyway?

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

I've been calling it "equal always" to distinguish it from "equal now", and I was thinking of having 2 separate operators for them. So my current preference:

  • =~ for "equal now", which is what Racket's equal? does.
  • == for "equal always", somewhat similar to Racket's chaperone-of? but symmetric.
  • === for "identical", which is what Racket's eq? does.

With == equal always being blessed as the preferred notion of equality, the default for list operations such as member and remove-duplicates, the one used to enforce chaperone compliance, and the one used for key comparison in the most convenient hash-tables / maps / dictionaries / sets.

Meanwhile:

  • := for assignment.
  • = for numeric equality would be nice, but I suppose I'd also like =. or .=. Slightly prefer =. over .= for number comparison.

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

More concretely, I would like == equal always to be defined such that:

a == b iff there exists a value v such that (chaperone-of? a v) and (chaperone-of? b v)

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

Match patterns aren't values... they don't have a "reference" identity to compare against, of course they can match mutable values structurally.

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024

I'd rather just implement gen:equal+hash with equal-always semantics for Rhombus data types than try to introduce a parallel equal-always? procedure. I don't think it's worth having two operators for this.

from rhombus-prototype.

rocketnia avatar rocketnia commented on May 25, 2024

I do think the matching of pure-constructor-only patterns against the results of pure-constructor-only expressions should coincide with the default equality check, so I think @mflatt makes a good point there.

Constructors of mutable objects, like Posn here, are generative rather than pure. However, I'd say their generativity can be thought of in terms of a pure function that takes an unused object identity as an argument. It's the creation of an unused object identity that's impure. In some languages, writing out new Posn(0, 0) with a new keyword would make the creation and passing-in of that object identity explicit.

In that case, the corresponding pattern would also be written out as new Posn(0, 0). The notion of new in expression position is to create a fresh identity, so the corresponding notion in pattern position would be to consume one. In order to be sure this is a fresh identity being consumed, the pattern has to verify (somehow) that there are no aliases of this identity that are accessible to the program. This may sound kind of impractical short of using linear types or global weak maps to track aliasing, but if it's impractical, that means new Posn(0, 0) probably shouldn't be a supported pattern in the first place.

Serendipitously, aliasing also helps explain what's going on with mutable state here. If a mutable value has aliases and we try to match it in a pattern like Posn(0, 0) or new Posn(0, 0), there can be awkward situations where concurrent modification happens in between accessing the x and the y of the Posn, or in between accessing these two values in the pattern and using them later on in the branch. If we only allow pattern-matches on mutable values to proceed if they're not aliased, then these awkward cases are eliminated. And since we know at construction time that a value isn't aliased, requiring it not to be aliased at pattern-matching time makes the pattern more neatly analogous to the corresponding expression, too. So it seems like the right semantics for the pattern, even if it's impractical again.

If the pattern were something other than the expression, such as now Posn(0, 0), then the semantics of this pattern wouldn't have to coincide with any expression semantics, and they could be allowed to be something more practical. These semantics could make it explicit that the match succeeds for any given object identity (even non-fresh ones), that the object identity is not bound to anything by the match, and that concurrent modification might throw a wrench in the x and y submatches' up-to-date-ness and mutual consistency.

I don't think this now Posn(0, 0) would make much sense in expression position. The pattern form discards information, and it's unclear where the expression form should procure it from. Because of this, I don't think a design where Posn(0, 0) means now Posn(0, 0) would make sense either.

However, I think in a world where object identities and concurrent modification are always considered accidents, the act of discarding this information is moot because well-behaved programs always discard it anyway. In that case, the expression now Posn(0, 0) can make sense and can be implemented to procure the information from anywhere we like, such as by using the same implementation as new Posn(0, 0). Collapsing these into the common notation Posn(0, 0) seems justifiable in that world.

Anyhow, getting to the point:

  • In an "equal now" world, Posn(x, y) makes complete sense as a pattern and as an expression.
  • There is the alternative to have a notational distinction between new Posn(x, y) expressions and now Posn(x, y) patterns (or whatever syntax they need to have in Rhombus). I think these would be a better choice in an "equal always"-by-default world to make the "equal now"-style structural traversal explicit.

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

The kind of "key" function you can imagine that can pick a single representative for its equivalence class, used to map to a more primitive operation eq?, sounds like what interning does, with datum-intern-literal.

However, there are good reasons why something like (== x y) wouldn't be implemented as just (eq? (datum-intern-literal x) (datum-intern-literal y)). For one thing, just checking equality between two objects shouldn't require finding-or-allocating a 3rd object to be the "representative" for their equivalence class. The equality operation also shouldn't require using a table to look up that representative if it exists already, since that table itself would need the equality operation anyway.

I've been thinking about this because it would also be a bad idea to implement my (equal-always? x y) as just (chaperone-of? x (remove-all-chaperones y)), for similar reasons. Rather than trying that, in my implementation referenced above I'm mostly using the existing implementations of chaperone-of? and equal?, just adding a new mode in the same traversals.

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

The kind of "key" function you can imagine that can pick a single representative for its equivalence class, used to map to a more primitive operation eq?, sounds like what interning does, with datum-intern-literal.

That could be useful, except I don't think eq? is the primitive equivalence relation we need. It would rather be more like equal?, except treating mutable vs. immutable values more strictly -- similar to your equal-always?, I think (but I am not too concerned with the details of this, aside from it being more discriminating than what we need in practice). "The defining function" also doesn't refer to one specific function. It's just that given any notion of equivalence that we seek to model, there always exists some function (or several) that we could use to express it.

For one thing, just checking equality between two objects shouldn't require finding-or-allocating a 3rd object to be the "representative" for their equivalence class.

For performance reasons, or something else?

The equality operation also shouldn't require using a table to look up that representative if it exists already, since that table itself would need the equality operation anyway.

I don't think I suggested a lookup table for this purpose. Did I implicitly assume one? I believe the proposed primitive equivalence operator = addresses possible circularity here.

from rhombus-prototype.

LiberalArtist avatar LiberalArtist commented on May 25, 2024

Re #16 (comment):

What if instead of making == mean numeric equality on numbers, we just made numbers read as exact decimals by default? Then 1 == 1.0 would be true even if == means equal?.

It's a tangent, but this comment prompted me to think that, regardless of equality, an advantage of having numbers read as exact by default is that a macro—perhaps even #%datum—could transform them to inexact numbers with no loss of precision, whereas, if read produces inexact numbers, the reverse transformation wouldn't work.

Personally, most of the time, exact arithmetic is worth the price for me: either what I'm doing isn't very numerically intensive, or it's important enough that I want to avoid floating-point headaches. I also think the numeric tower is a fairly distinctive inheritance from the Scheme tradition worth spreading to the world. On the other hand, as Matthew said, having good floating-point performance when you do want it is also a feature, and being surprisingly slow could be a poor first impression.
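As a side note, Racket can already read a decimal literal as exact with the #e prefix, which is one way to try out the exact-by-default behavior today:

```racket
#e1.5            ; => 3/2  (exact)
(+ #e0.1 #e0.2)  ; => 3/10 (no floating-point rounding)
(+ 0.1 0.2)      ; => 0.30000000000000004
1.5              ; => 1.5  (inexact by default)
```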

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

The places eq? is seemingly encouraged in educational material for beginners, such as a very memorable page in Realm of Racket, you know the one where they shave the guy and the other argument is also shaved... would be better off replaced by equal-always? in their new versions in Rhombus, at the very least.

My preference would be that equal-always? semantics would be the go-to equality operation in the new language, and another function with equal-now? semantics would be off to the side in another operator such as =~... but,

Even if people don't like having my preference:

  • =~ for equal-now
  • == for equal-always

I wonder if those other people would like something like:

  • == for equal-now
  • === for equal-always

instead?

To be clear I'd rather have == as equal-always... but a version where equal-always at least exists under some name would be better than being left with just eq
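For reference, Racket 8.5 and later already ship equal-always?, so the difference between the two candidate behaviors can be sketched directly (a quick illustration in Racket, not Rhombus syntax):

```racket
#lang racket
(define v1 (vector 1 2 3))   ; mutable vectors
(define v2 (vector 1 2 3))
(equal? v1 v2)               ; #t -- equal "now", by current contents
(equal-always? v1 v2)        ; #f -- a mutation to v1 would not show up in v2

(define u1 (vector-immutable 1 2 3))
(define u2 (vector-immutable 1 2 3))
(equal-always? u1 u2)        ; #t -- immutable contents can never diverge
```

Whatever surface operators Rhombus picks, these are the two behaviors being named.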

from rhombus-prototype.

mflatt avatar mflatt commented on May 25, 2024

A reminder: So far, there have been 0 concrete examples reported where equal_always instead of equal_now would have solved/avoided problems in practice. We could use some examples.

from rhombus-prototype.

samth avatar samth commented on May 25, 2024

For those interested, Henry Baker's original paper proposing egal? (the same as equal_always) is here: https://dl.acm.org/doi/10.1145/165593.165596.

However, it doesn't describe any concrete examples where equal_now causes problems; the arguments made are primarily that it's bad to have a non-referentially-transparent equality predicate, and that it is bad to have a bunch of them that programmers have to choose between. (There's also discussion of why providing traditional eq? semantics on immutable data is problematic especially in the presence of distributed computing, but I don't think that has a clear bearing on this issue.)

from rhombus-prototype.

mflatt avatar mflatt commented on May 25, 2024

However, there are more problems in existing Racket programs using eq? that would be fixed by switching those to equal-always?.

I had indeed not understood that motivation from the preceding discussion. Thanks for the clarification. So, we're looking for examples where eq? has gone wrong and would be fixed by equal_always but not equal_now?

I can remember a place where eq? was used to recognize procedures and it interacted badly with chaperones, as you say. It was the notify-on-change method of style-list%. But using equal? there was a fine start at a correction, because it wasn't really trying to check the mutability of a procedure. Also, there was more to the change. (See the documentation for that method.)

The problem that people sometimes reach for eq? because it's shorter than equal? seems less relevant. As we pick new operators, we're aiming for the reverse, at a minimum.

But a place where eq? was used to recognize mutable things and interacted badly with chaperones sounds relevant, and I'm curious to hear more.

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

Just a friendly check-in -- has there been any consideration of my proposal above? Please let me know if I can provide additional examples or clarification to illustrate it further.

In my humble opinion, it is the simplest possible equality interface a language could provide as it is essentially just one equality relation (==, which reduces to = in the naive case). It is intuitive for users due to its simplicity, and also minimizes the maintenance work required for core developers, since it provides a simple standard for all equality-based interfaces in the entire language ecosystem - that is, they expose an optional key argument to be passed through to ==. Contrast that with the present state of affairs, where there is sometimes a key argument expected, sometimes a binary comparison predicate, sometimes it isn't exposed at all for customization, and sometimes there are parallel sets of interfaces. The "key" argument can be used in all cases, and even for order relations such as < and > (relation/order is a proof of concept here, if you'd like to try it).

I'd like to highlight that this 2-level scheme is more along the lines of the original issue description re: exposing a smaller interface. It is agnostic wrt the specific choice of =, modulo the considerations I mentioned, and for instance, something resembling equal-always? would be fine here - just ideally one that is efficient for all types of comparisons, including numeric comparisons.
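For concreteness, here is a minimal sketch of the proposed two-level interface. The names == and #:key follow the proposal; equal? stands in for the base relation =, which the proposal deliberately leaves open:

```racket
#lang racket
;; == reduces to the base relation when no key is given; otherwise it
;; compares the images of the arguments under the key function.
(define (== a b #:key [key values])
  (equal? (key a) (key b)))

(== "apple" "APPLE")                      ; #f
(== "apple" "APPLE" #:key string-upcase)  ; #t
(== 1 1.0 #:key exact->inexact)           ; #t
```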

from rhombus-prototype.

mflatt avatar mflatt commented on May 25, 2024

From @countvajhula:

Just a friendly check-in -- has there been any consideration of my proposal above?

I haven't seen any more discussion than above, so I'll offer some thoughts that might prod further discussion.

The good part that I see is that parameterizing over one function to find a representative value is conceptually cleaner than parametering over separate equality functions, hashing functions, and perhaps other functions. With your strategy, it's not possible to, say, have an equality and hashing function that are out-of-sync. Also, whether an operation or data structure uses hashing is completely hidden.

The big drawback, as I think you've alluded to, seems to be the cost. Converting a large tree to make sure the leaves are immutable would be expensive, for example, as a prerequisite to discovering that the big tree is not equal to #f. Separate conversions over pieces of an otherwise shared data structure could also lead to asymptotically worse space consumption or time, depending on whether a representative is retained as a key. Even for data that's consistently small so that there's no asymptotic-performance question, the difference between a direct equality comparison and going through a synthesized intermediate value can involve a significant constant factor, and a constant factor on a common operation adds up.

On a somewhat related note, eq-hash-code doesn't work as a way to find a representative for eq? comparison, because eq-hash-code is just a hash function (i.e., it loses information). Pinning down an address for each object creates all sorts of fragmentation issues, whether the address is the actual one (so fragmentation at the main memory allocator) or just allocated from a conceptual space (i.e., a new kind of allocator that must keep track of allocated addresses, even if there's no payload, making eq? more expensive).

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

This is great @mflatt , I appreciate your taking the time. On the face of it, I feel at least some of these concerns can be favorably addressed. I'll give it more thought and share my findings soon.

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

Here are some thoughts on your critique @mflatt -- thank you again for the time you put into it.

But first, I want to point out that the suggestion to include an optional key argument into the equivalence interface of the language has two aspects to it: (1) a theoretical aspect, of whether it can model equality well, some of which we have discussed above, and (2) a language design aspect, of whether it provides a good and effective interface for the developer while also keeping the language coherent and free of ad hoc patterns.

These are closely related but still can be considered independently. The former forms the basis for why it may provide a consistent and simple user interface, and also forms the basis for why it can simplify equality-related interfaces in the language to create a smaller and simpler language overall. Aside from the subtle considerations that inform the former aspect, we can consider the second aspect purely in empirical terms -- is using a key function a common, if ad hoc, pattern? And do we gain something for users and/or developers by formally incorporating it into the equality interface itself?

Regarding the concerns you brought up, there are four resolutions I can see. Referring to the low-level relation (e.g. egal/equal-always) as = and the high level one with the optional key argument as ==, these are:

  • Resolution A: always (at compile time), if a = b (e.g. via egal), then return true without doing the mapping.
  • Resolution B: in some cases, a mapping is necessary anyway, so this approach does not add something formerly absent.
  • Resolution C: the most commonly used key functions (such as string-upcase) could have custom compile-time optimizations. This option is not available in the ad hoc case, and it should mean that these cases are more performant than if we didn't adopt the key function into the equality interface.
  • Resolution D: in the case of egal, eq? implies =, so the check for identity is done at the elementary equality level and does not need a mapping at all.

In more detail:

Converting a large tree to make sure the leaves are immutable would be expensive, for example, as a prerequisite to discovering that the big tree is not equal to #f.

Yes, as you pointed out, using an ->immutable mapping function would be an expensive way of discovering a big tree is not #f. If the relation = distinguishes mutable and immutable versions of a value (as egal does), then there are two considerations here: (a) if we are content with this distinction, then typically we wouldn't provide a key argument here so there would be no mapping performed, and (b) If we are interested in not making this mutable/immutable distinction and only care about the contents, then this would indeed remain expensive. The question here is: do we think this is a common case? Is this a pattern we want to encourage or one that we should discourage? If the answer to the former is no and/or the answer to the latter is that the pattern should be discouraged, then we should be OK with this case being inefficient. Finally, if we do feel this is either common or a pattern to be encouraged, then it could still potentially be amenable to Resolution C, i.e. special compile-time handling of the ->immutable mapping function.

Separate conversions over pieces of an otherwise shared data structure could also lead to asymptotically worse space consumption or time, depending on whether a representative is retained as a key.

Case 1 - (== #:key f a b) where a and b are eq?

This is addressed by Resolution A.

Case 2 - a and b are not eq?

Do you have examples here of cases where this approach would introduce a mapping where one formerly would not have been needed? I can imagine something like (== #:key ->float 1.0 1), and I'm wondering whether such cases would always be numerical.

In most cases I can think of, the mapping would be more like string-upcase or floor or something domain-specific such as surname. In such cases, we need to do the mapping anyway (i.e. Resolution B). And in fact, since we are formally recognizing this common practice in the interface, we have an opportunity to optimize the most common cases (such as string-upcase) here (Resolution C), making them faster than they usually are.

Even for data that's consistently small so that there's no asymptotic-performance question, the difference between a direct equality comparison and going through a synthesized intermediate value can involve a significant constant factor, and a constant factor on a common operation adds up.

As we only use a mapping function in a subset of cases, the issue here boils down to these questions:

  1. how big is that subset compared to all cases?
  2. for that subset, what proportion of cases would have required a mapping even without the 2-level scheme in place?

And depending on the answers here, Resolutions B and C may be applicable for the same reasons as the previous point.

On a somewhat related note, eq-hash-code doesn't work as a way to find a representative for eq? comparison, because eq-hash-code is just a hash function (i.e., it loses information). Pinning down an address for each object creates all sorts of fragmentation issues, whether the address is the actual one (so fragmentation at the main memory allocator) or just allocated from a conceptual space (i.e., a new kind of allocator that must keep track of allocated addresses, even if there's no payload, making eq? more expensive).

That makes sense. Assuming we use something like egal where eq? implies =, then we do not need a mapping here and can just use == as is: without a mapping, == reduces to = (Resolution D).
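Resolutions A and D amount to checking the base relation before paying for any mapping. A sketch (equal? again standing in for the base relation; the short-circuit is sound for pure key functions, since base-equal inputs map to base-equal outputs):

```racket
#lang racket
(define (==/fast a b #:key [key #f])
  (or (equal? a b)                           ; Resolutions A/D: no mapping needed
      (and key (equal? (key a) (key b)))))   ; map only when the fast path fails
```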

These considerations don't touch upon the language design aspect of the proposal, about which there's a lot to say as well. My comment on the other issue touches upon some of these considerations. But also, for instance, adopting this interface may make partial application of functions more prevalent, making currying or the need for something like fancy-app more prominent, and these are things that would need to be considered as well.

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

Hope everyone's doing great. As discussed (and at the suggestion of @samth ), I've started an RFC-style doc here as a Gist:

RRFI [Draft]: Equality and Order Relations Interface

This includes the proposal regarding standardizing on the key argument, and making all of the elementary relations (=, <, >, <=, >=) generic, and also the various standard APIs that would be affected. Feel free to make any comments on it here, and we could also discuss it in person at a future meeting.

I've referred to it as an "RRFI" in analogy with SRFIs (and also the similar practice of PEPs in the python community) -- the suggestion being that we can discuss and work on the specifics of the proposal together, fleshing out any missing pieces, improving it until it is good (and I can edit it to reflect these changes -- or even post it somewhere where it is publicly editable), and then decide on whether to implement it or not. Of course, we can go about it however we like :)

Enjoy ☕ 😄 👍

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

Just to continue the discussion from today and the point @jackfirth brought up, I think we agree that we want the following relation to hold:

	a <= b <=> a < b or a = b

In the current proposal, we define a <= b to be a < b or a = b, so that this does hold by definition. In the specific case:

	2 <= 2.0 <=> 2 = 2.0 or 2 < 2.0

With Racket, this holds, and additionally, both the LHS and RHS evaluate to #t.

With = as equal-always? in Rhombus, this would become:

	2 <= 2.0 => #f

And:

	2.0 = 2 => #f
	2 < 2.0 => #f

This does not violate the original relation -- it just has a different view of the relationship between 2 and 2.0.

Given a sorted set {1, 2, 3}, and if we attempt to insert 2.0,

In Racket, this would return {1, 2, 3}, i.e. 2.0 would not be added because it is treated as equal to 2.

In Rhombus, this would return {1, 2, 2.0, 3}, i.e. it would be added because, according to equal-always?, it is not equal to any existing member, and it would be placed within the set either before or after 2, depending on the sorting algorithm.

Now, if we did want to treat 2 and 2.0 as equal, then one way we could do it is this:

(define s (sorted-set #:key ->float))

Elements inserted into the set are then effectively compared using (< a b #:key ->float), with equality checked using (= a b #:key ->float). In the case of 2 and 2.0, since (= 2 2.0 #:key ->float) => #t, adding 2.0 would not add it to the sorted set.

Of course, whether this is the actual strategy employed here or not is an implementation issue we'd need to work out.
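One possible shape for that membership check, shown as a plain linear scan for illustration (the sorted-set #:key API above is hypothetical, and a real implementation would exploit the sorted structure rather than scan):

```racket
#lang racket
;; Would inserting x into elems be a no-op, with elements compared
;; through the key function?
(define (already-member? elems x #:key key)
  (for/or ([e (in-list elems)])
    (equal? (key e) (key x))))

(already-member? '(1 2 3) 2.0 #:key exact->inexact)  ; #t, so 2.0 is not added
```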

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

Thank you for furnishing the concrete counterexamples. Any other ideas here?

One thing that came up in discussion with @spdegabrielle is that for numbers we could potentially consider (and (not (< a b)) (not (> a b))) to be the definition of equality if a and b are of different precisions (e.g. 22/7 and 3.1428), which could be related to the "order based equivalence" notion @jackfirth brought up.

If this could be accomplished by augmenting equal-always? with special behavior for numbers, that would still be preferable to having two separate user-facing equality relations. I think I'd rely on folks with more expertise in this area to propose options here as I'm very much out of my depth.

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

Comparing values of different types seems quirky even at a conceptual level, and it probably shouldn't dictate the design.

If we do contemplate it anyway, we should probably reach whatever conclusion we would reach if they were members of the same type. And perhaps that's already the case here; perhaps we can think of 2 and 2.0 as being members of the real? type.

Very true. Funny that a pathological case like numbers happens to also be a prominent one in the design space here. I think designing around equality checks being type-native comparisons makes a lot of sense.

->float was in retrospect an attempt to convert the numbers to the same type, but it so happens that information is lost or even misrepresented in this conversion. Maybe real? or another type like it is the one we need - a rich type containing more information rather than less. Could be something like this:

(struct real (num)
  (define (equal? a b)
    (let ([a-num (real-num a)]
          [b-num (real-num b)])
      (cond [(equal-always? (type-of a-num) (type-of b-num))
             (equal-always? a-num b-num)]
            [else (and (not (< a-num b-num))
                       (not (> a-num b-num)))]))))

And then (= a b #:key real) would work.

Rhombus should have a single equality check operation that's preferred for virtually everything, =. It should be what it needs to be so that arithmetic works in a familiar way, even if that skews other aspects of Rhombus's design.

Not all types should actually have full support for =, and out of the types that do, some of their values (NaN in particular) might not participate in it. The function type would be a good example of a type that doesn't support an = check. Types like that might still have a conceptual notion of when values are = even though it's not programmatically checkable.

+1

Leaving out exceptional cases for special handling in the support of a simple unity in the majority of cases is a worthy compromise.

Actually, functions are an awkward example which brings out some more subtlety. In Racket, and presumably in Rhombus, many structs with their own notions of equality can all implement prop:procedure.

So in general, a type's notion of = might have multiple disjoint subsets of values that can be compared. It just so happens that for the real? type, NaN is not part of any set.

This is a great way of putting it. I don't know much about type theory so I may be abusing the terminology here, but it seems like what we're saying here is that subtypes could have a notion of equality even if the parent types do not. And in the case of NaN, we treat it as excluded from the real? type. Further, we could either treat it as a distinct type so that NaN = NaN, or not at all (as you suggest) so that NaN = anything is an error.

I'd like to consider a function to be pure if it not only has no externally visible side effects but also obeys function extensionality with respect to its input and output types' respective notions of =.

+1

Embracing = even at this conceptual level ought to reduce the overall number of notions of equality people need to juggle. And this mindset leads to a few interesting implications:

This makes + an impure operation on floating-point inputs, since the expressions 0.1 + 0.2 and 1/10 + 2/10 pass it = inputs but compute non-= results. Calling + impure seems like a good choice; people dealing with floating point values need to be aware of + as a source of surprises (e.g., catastrophic cancellation), so it's already impure in a practical way.

👍
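The impurity described above is visible with ordinary Racket arithmetic (exact->inexact is used only to bring the exact result into the float representation for comparison):

```racket
#lang racket
(+ 0.1 0.2)    ; 0.30000000000000004
(+ 1/10 2/10)  ; 3/10
;; Even after converting to the same representation, the sums differ:
(= (+ 0.1 0.2) (exact->inexact (+ 1/10 2/10)))  ; #f
```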

It would be odd to say a function that constructed a new mutable hash was pure, but if = on mutable hashes behaved like equal?, we might end up with it being pure on a technicality. That's a clue that = on mutable hashes should behave like eq? instead. (Or, in general, like equal-always?.)

For an ecosystem which enables pure FP as much as Racket's does, it might be weird for user-defined structs to have impure constructors by default. That's a clue that user-defined structs shouldn't default to a notion of = that matches eq?. Structs in Rhombus should probably have some other default behavior for = instead, like an implementation which treats every value of the type as a non-participant in its = check, similar to NaN.

I'm probably missing some context here (e.g. is this a resolution to issues around "generativity" and prefab structs?), but some questions: favoring NaN-like exclusion over eq? -- is this specifically for opaque structs? Couldn't we use equal-always? for transparent ones?

Types that support <= should support = as well, and a = b should be consistent with (a <= b) and (b <= a).

Yes, agreed.

Functions and data structures that are polymorphic over all equality-equipped types should rely on their caller to specify what that equality is, since the caller is also specifying what the type is. If the caller expresses no preference, = should be the default. When the caller does specify their preference, they might use a key function (as @countvajhula's been saying) or a comparator.

Yes, and as mentioned in the RRFI, I would favor a key function exclusively, since mathematically it always works, and simplifies the API convention for users and developers alike.

While I'm thinking about it, here's various functionality I imagine each equality-like comparator might support. Not all comparators would necessarily support all the features. Some of these features are kind of pie-in-the-sky ideas that might never need to exist, certainly not in the first several versions of Rhombus.

Checking that a value is a participant in the comparator, and figuring out which disjoint subset of comparable values it belongs to.

If the "disjoint subset of comparable values" could be considered a distinct (sub)type, would the delegation via standard mechanisms (e.g. generic interface) suffice here?

This may be a sidetrack, but for user-defined types, specifying a definition of equality currently only dispatches on the first argument (i.e. single dispatch). It seems that, like you're saying here, I think, the specification here would always need to check if the second value is comparable or not. Solving it via specialized optimizations here would be great, but it would be even better if we could have efficient multi-dispatch in general, so that this just becomes a special case of an existing optimized dispatching scheme.

I'd need to think more about your suggestions re: comparators:

(More generally, a comparator could have all the same features a contract does, including the ability to check whether another comparator is a restriction of it (contract-stronger?) or to generate random examples of comparable values. I think most contract features would be superfluous on comparators and vice versa, but there may be potential for some features to influence each other or to be factored out.)
Hashing a value.
Comparing values.

How do you see these comparators being defined -- is it in a similar way to gen:equal+hash, i.e. a type-specific method definition somewhere, or something else?

(I'm also interested in a generalization where a comparison is actually performed on a graph of values with comparators on the edges between them. Not because I actually ever imagine it being useful in an application! I suppose it just seems related to taking the limit of a diagram in category theory, so it seems like the kind of generality that might be useful by accident.)

A graph struct type could define = in such a way that the nodes of two instances t1 and t2 are walked and compared, but I'm sure I haven't understood the "category" you're imagining here.

Sorting a whole collection of values according to that total preorder, producing a list of partitions. (I've built something like this before in the form of Interconfection's getfx-cline-sort.)

Looking up a value's entry in a trie, possibly a trie where some subkeys are wildcard patterns that match any key. (I want to use something like this for pattern-matching, but I haven't figured out all the details.)
Diffing values to create unit test failure reports.
(In general, an unsuccessful comparison could generate a diff that's a lazy stream of steps that traverse into the value tree and find a series of equality witnesses until they find an apartness witness and abort. A successful comparison could have the same result, but would encounter no apartness witness and would end up with a fully zipped structure, similar to the original values but with equality witnesses at the leaves.)
(I kind of want to figure out if the lazy stream of apartness witnesses and the sequence of trie lookups can use the same API somehow.)

These are all very interesting -- sounds like the interface here would need to be elaborated on, e.g. how does it relate to the boolean return value of the equality comparison? Is this a separate interface, independent of (and consistent with) =, e.g. (diff a b), to generate the lazy witness structure? Or is this computed/returned "monadically" in some way, so that it is a side effect of = that can be accessed somehow?

Inspired by the kind of nominal-type-based optimization that @jackfirth has talked about in the context of stream and collection APIs, specific kinds of comparators could have various extra operations that made them more cooperative as subcomponents of certain other comparators or data structures. For instance, a comparator between strings might first transform the strings and then compare the results according to another string comparator, but if string comparators exposed certain extra functionality, this process might not need to construct intermediate strings nor take two passes over the data.

The notion of these comparators as distinct objects with structure between them is something I'll need to spend more time to understand, but sounds like it could enable some interesting possibilities including optimizations.

I wanted to get these initial thoughts down for now, but hopefully others will give more thought to your ideas here and discuss them further from other points of view.

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024

I think it's worth moving discussion about a generic ordering interface to a new issue, separate from this issue. This one is about equality. The two topics are related but distinct.

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

True, I created a separate issue for order relations @jackfirth . Much of the recent discussion has been more about equality and also the relationship between order and equality, so I think that can continue here (and such hybrid discussions could arise on either ticket), while discussions pertaining specifically to one or the other should fall under the dedicated issues.

from rhombus-prototype.

rocketnia avatar rocketnia commented on May 25, 2024

@countvajhula There are a lot of points I'd love to talk about more, but it might get to be a very long reply and distract from the thread. I'll focus on some main points. I'm even skipping some clarification questions, just in case the rest of what I'm saying here is clarifying enough already, so please feel free to ask more than once.

On the topic of key functions, I think it's a viable approach. Key functions seem to be like compilers, where if there's a composable language of enough key types to compile to, people can get a lot done that way. As someone who likes extensible languages, I'm inclined to want to be able to define key types in terms of functional code in case there's a type the language designer forgot, and a lot of my thoughts on this are in terms of comparators. But it looks like what you're proposing doesn't get rid of the gen:equal+hash-style way of defining new innately comparable types. A lot of what I've said about comparator features could be translated into gen:equal+hash features instead.

In general, it seems pretty easy to shift concepts around. A comparator library can fit into a key function world by having wrapper types that compare themselves by invoking comparators. A key type can fit into a comparator world by having a comparator that works by invoking a key function. On the one hand, I'm reluctant to limit things to one approach if there are so many unfinished ideas that might need another, but I don't see a long-term difference in expressiveness.

would the delegation via standard mechanisms (e.g. generic interface) suffice here?

Virtually always, I think. And that's a good point because each island of mutually comparable values should probably have its own identifying predicate, contract, and/or static type anyway, since code that relies on = probably knows what island it's working within and would benefit from double-checking that assumption.

I think I came at it from the other way around. The way I'm thinking about it, a generic interface type has instance values that also implement a patchwork quilt of other combinations of interfaces. An interface type that specifies some conceptual things about = has a patchwork of subtypes, some of which actually implement a programmatic = check and some of which don't. This patchwork, which I attribute not to the peculiarities of Racket structs but to the essential complexities of the Expression Problem, is the reason I'm thinking about disjoint islands of comparable values in the first place. And normal use of the object system would be the way most such islands arise.

But then, some situations might have disjoint islands due to the essential complexities of the Expression Problem without actually coinciding with object system mechanisms. Defining islands just using struct would give you one island at a time. What if someone wants to define an infinite number of islands all at once? For instance, what if the type they're implementing is something of an interpreter, where one of the fields carries a symbolic specification of the desired subtype, and values pretend to be of their respective subtypes? In that case, rather than writing an infinite number of type definitions, this person might want to define one type whose = implementation explicitly modeled its disjoint islands.

This kind of thought experiment works just as well with = swapped out for any other generic operation. And that suggests it could be a topic better handled by generalizing the object system... but I dunno, redesigning the foundations can easily lead down a rabbit hole of endless generalization in endless variations. I think it's very reasonable to implement a first take on the comparison interface that just completely ignores this kind of generality (and uses the standard mechanisms like you say), and then implement another interface which allows for more expressive support for multiple islands.

This brings me to another part of what you said:

is this specifically for opaque structs? Couldn't we could use equal-always? for transparent ones?

I think of Racket's struct system as one point in a big design space of ways to let programmers define abstract data types. So I think of transparent structs as being a bit unnecessarily coupled with the struct system's details, and I wasn't thinking about them at all just then.

When someone's defining an abstract type, whether using struct or anything else, I think the least surprising default would be that it doesn't define any behaviors except what's specified. I think this is generally good for all kinds of operations; I'd love for my types not to even support eq? or object-name unless I've thought through what implementation details I expose that way. Right now, thinking about =, I think this no-behavior default would be a good one for =.

There's also another way of approaching good defaults where they don't have to be the least surprising defaults possible, just defaults that seem like as good a starting point as any. This more opinionated kind of default may be especially good for more casual one-off creations and prototypes that come with no compatibility guarantees, where getting into a creative flow fast is more important than minimizing future regret. (I think this is where things like transparent structs come into play, too. If you want to make your type transparent, they're a certain default way to do so.) If Rhombus's user-defined types turn out to have a sensible default implementation of =, that seems understandable considering Rhombus's approachability goals overall, so I suppose I'd just be glad to have an easy way to define a type that's not =-able.

Anyway, as you're saying, equal-always? comparison indeed seems like an ideal notion of = for transparent structs.

More broadly, I think equal-always? and = should coincide basically wherever = is defined, except for types like real? which have non-participants like NaN and/or participants with multiple representations like 2 and 2.0 (where programmers would have practical reasons to care about the difference between one and the other but still might want to consider them =). This is where I think of this = approach I'm describing as being skewed in quirky ways toward Rhombus's arithmetic needs. I'm not sure whether to suggest that equal-always? be skewed the same way, since these quirks may or may not be quite as at home in Racket.

I think the right name for this in Racket would be =. It should exactly coincide with Racket's existing = on numbers, including returning false for (= +nan.0 +nan.0). The new part would be extending it to work on things other than numbers.
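For concreteness, Racket's existing predicates already draw exactly this distinction (equal-always? ships with Racket 8.5 and later):

```racket
;; = identifies numbers across representations; equal-always?
;; distinguishes them. NaN is = to nothing, not even itself.
(= 2 2.0)                     ; => #t
(equal-always? 2 2.0)         ; => #f
(= +nan.0 +nan.0)             ; => #f
(equal-always? +nan.0 +nan.0) ; => #t
```

So the proposal here amounts to keeping =-on-numbers as the one deliberate divergence between = and equal-always?.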

from rhombus-prototype.

countvajhula commented on May 25, 2024

As someone who likes extensible languages, I'm inclined to want to be able to define key types in terms of functional code in case there's a type the language designer forgot [...]

There are no constraints on what a key function could be, except that they are unary and single-valued. As such, the language need not provide a library of key functions, as any (e.g. user-written) function would work. But you may be right that the language probably should provide some standard and commonly used ones, something like ->integer, both so that the user doesn't need to implement them, and also so that the language could treat these as canonical and leverage them in optimizations, as I think Matthew hinted at, re: storing optimized binary comparators "on" these canonical key functions at compile time.

But it looks like what you're proposing doesn't get rid of the gen:equal+hash-style way of defining new innately comparable types.

That's right. I think it's fine to leave something resembling gen:equal+hash as a way to associate a comparator with a type, even though in principle we could use a unary function here. For instance, the default equal-always?-style comparison for transparent structs is essentially an implicit implementation of a key function, viz. T -> Vector, where the struct is mapped to a vector whose elements are the field values -- vectors which are then compared using equal-always? (on the Vector type). But I think the fact that floating point comparisons are an exception here points to non-idealities in computer representations (e.g. the inability to represent irrationals exactly) where a "hack" is necessary to ensure equality in ways that are practically desirable, such as treating 22/7 as equal to 3.142857 at sufficient precision. That's another reason why type definitions of equality using binary comparators make sense -- to be able to handle any such pathological cases.
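As a sketch of that "implicit key function" reading of transparent structs (the name point->key is invented here for illustration):

```racket
;; A transparent struct compares as if keyed by a vector of its fields.
(struct point (x y) #:transparent)

;; Hypothetical explicit version of the implicit key:
(define (point->key p)
  (vector (point-x p) (point-y p)))

;; The struct comparison and the key comparison agree:
(equal? (point 1 2) (point 1 2))                            ; => #t
(equal? (point->key (point 1 2)) (point->key (point 1 2)))  ; => #t
```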

Although, for completeness, it could be acceptable to make the generic interface allow either a comparator or a key function (but not both) to be specified as the equality definition for a particular type, since the comparator could be inferred from the key.

A lot of what I've said about comparator features could be translated into gen:equal+hash features instead.

👌

In general, it seems pretty easy to shift concepts around. A comparator library can fit into a key function world by having wrapper types that compare themselves by invoking comparators. A key type can fit into a comparator world by having a comparator that works by invoking a key function. On the one hand, I'm reluctant to limit things to one approach if there are so many unfinished ideas that might need another, but I don't see a long-term difference in expressiveness.

Could you clarify what you see as unfinished here? Would be good to address these so we tie down any loose ends.

But then, some situations might have disjoint islands due to the essential complexities of the Expression Problem without actually coinciding with object system mechanisms. Defining islands just using struct would give you one island at a time. What if someone wants to define an infinite number of islands all at once? For instance, what if the type they're implementing is something of an interpreter, where one of the fields carries a symbolic specification of the desired subtype, and values pretend to be of their respective subtypes? In that case, rather than writing an infinite number of type definitions, this person might want to define one type whose = implementation explicitly modeled its disjoint islands.

Would you say that the real? type we discussed earlier is an example (at least in my reified version of it) of one type defining equality across multiple types? Since inexact? and exact? were subtypes of real? in this case, I'm wondering whether such cases would always arise with a parent type defining equality for all subtypes which could otherwise be islands. Just noting down my reactions, but this probably doesn't matter since as you said, there could be many different systems of defining types and we needn't constrain them.

This kind of thought experiment works just as well with = swapped out for any other generic operation. And that suggests it could be a topic better handled by generalizing the object system... but I dunno, redesigning the foundations can easily lead down a rabbit hole of endless generalization in endless variations. I think it's very reasonable to implement a first take on the comparison interface that just completely ignores this kind of generality (and uses the standard mechanisms like you say), and then implement another interface which allows for more expressive support for multiple islands.

Yeah, designing around the current type system seems expedient at least at this stage, to have just one moving part at a time. Once we have a design here, we could revisit it if there are any changes to the type system (which, it sounds like you have a lot of ideas about!).

The considerations around approachability / sane defaults vs surprising behavior also make sense.

More broadly, I think equal-always? and = should coincide basically wherever = is defined, except for types like real? which have non-participants like NaN and/or participants with multiple representations like 2 and 2.0 (where programmers would have practical reasons to care about the difference between one and the other but still might want to consider them =). This is where I think of this = approach I'm describing as being skewed in quirky ways toward Rhombus's arithmetic needs. I'm not sure whether to suggest that equal-always? be skewed the same way, since these quirks may or may not be quite as at home in Racket.

I think the right name for this in Racket would be =. It should exactly coincide with Racket's existing = on numbers, including returning false for (= +nan.0 +nan.0). The new part would be extending it to work on things other than numbers.

I tried numeric comparisons in the python and haskell REPLs. From what I can see, these languages do the comparisons by treating the inputs as floats. E.g. in the Haskell shell (ghci):

3.14285714285714[26..30] == 22/7 => True

But:

3.1428571428571426 == 3.1428571428571425 => False

That is, by virtue of the floating point comparison used, these numbers violate the transitive property of equality(!). I'd guess that this means that for sufficiently large rational components, two rational numbers that differ by a small enough margin would be considered equal here. And indeed:

22000000000000000/7 == 22000000000000001/7

... is considered true.

I gather that Racket inherits a "numeric tower" from Scheme which does this in a better way, where exact numbers such as rationals are modeled as exact numbers, while numbers indicated with decimals are modeled as inexact ones. And in fact:

(= 22000000000000000/7 22000000000000001/7)

is False.

This is great. As someone mentioned earlier (@LiberalArtist , in fact), this is something we should absolutely preserve.

How about this: = means equal-always?. But numbers are compared not as exact or inexact but as reals, i.e. a rich type that doesn't lose information about the numeric representation. So:

1 = 1.0 would be true (per the definition of real equality in my earlier comment), and

3.14285714285714[26..30] = 22/7

would be considered equal just like in Haskell, because these would be compared in an inexact way since the LHS is inexact. Yet, two rational numbers could differ in arbitrarily small margins and always be considered distinct, since they would leverage the definition of rational (exact) equality.

So, it would be an even better calculator than other languages!

By doing the comparison on the rich numeric type (Real), we retain the flexibility at that level to handle NaN and -inf and -0.0 etc. in whatever way we agree on.

Thoughts?

Btw, as a procedural note, I think we should consider the "RRFI" as the deliverable outcome of these discussions, so that any options or conclusions we come to here should be incorporated into the document in a tangible way, and when all is said and done, we will have an actual document containing specific and concrete recommendations. Just something to keep things directed as these discussions continue.

Also on this procedural side, it doesn't look like we can submit PRs on Gists, but it sounds like we could use this approach to approximate it, for anyone interested in updating the RRFI. Though, as the de facto editor of the document I am happy to keep it up to date with discussions here (e.g. I may bring it up to date and solicit confirmation from contributors once per week).


rocketnia commented on May 25, 2024

As someone who likes extensible languages, I'm inclined to want to be able to define key types in terms of functional code in case there's a type the language designer forgot [...]

There are no constraints on what a key function could be, except that they are unary and single-valued. As such, the language need not provide a library of key functions, as any (e.g. user-written) function would work.

I said "key types," not "key functions." For the key function string-length from string? to natural?, natural? is what I mean by the key type.

If there were absolutely nothing like gen:equal+hash left, and all there were were key functions, then what would the key functions return? Whatever value they returned, that value might also implement its comparison in terms of a key function, and if all values were like that, it would never end.

One way to look at this is that the key functions are like macros, and they eventually translate to some set of built-in special forms. In this analogy, gen:equal+hash is a way for the programmer to define new special forms by extending the key-interpreter. But it's also possible to get to a pretty expressive language with only a small set of special forms.

Along the way, I can think of a few stages where people might feel like the key types available to them are insufficient:

  • If the language has no innately comparable types to use as keys, then key functions don't have anything useful to return at all, and they can't be used to compare anything.
  • If the language's only key types have small finite size like boolean? and fixnum?, then no key function will be able to compare bignums, strings, vectors, or other data of arbitrary size.
  • If its key types have arbitrary size but are always serializable, then no key function will be able to compare opaque values according to information those values keep encapsulated, unless the key function is created by some code which has local access to that information. Certain code like that might be nice enough to expose the key function for use by other people's code, but that kind of exposure means the information is no longer encapsulated.

I think in the context of Racket, it's easy to take for granted that we have keys that can compare data of arbitrary size. We have comparable hashes, lists, strings, and numbers already, and these can be composed to express a variety of comparisons in a fairly efficient way.

Opaque data is harder. Perhaps undermining my own point a bit, the task of keeping implementation details from leaking is the reason I've made Cene and Interconfection's dexes work a little like correct-by-construction key functions -- because I don't trust programmer-defined code not to give different results based on the order it receives objects to compare, and making sure it's not possible for a module to introduce indeterminism that way is one of my requirements for the Era module system. Cene and Interconfection have a single underlying representation of keys, called name, which is itself very opaque so that it doesn't breach the opacity of the values being compared (other than what their comparison results are). For determinism's sake, dexes compare all possible parts of the value that a computation could observe, so users can't specify them with key-returning functions that might discard information; instead, they can write dex-returning functions.

(By the way, I didn't make such a strong connection between dexes and key functions until just now.... Somehow I've been regarding them more like order-invariant n-ary comparators.)

If I'm programming in a language, and I don't think its module system is as side-channel-free as I'd want Era's to be, I appreciate having the ability to write arbitrary binary comparison code in case I hit some other kind of barrier. And with so many things I want custom comparison logic to be able to do, I figure there's bound to be something that's going to be hard to express in terms of a sufficient set of key types.

Technically, even if I'm writing my own Turing-complete or side-effectful comparison code, I don't have to write it in Racket or Rhombus. The suite of key types could grow expressive enough to be Turing-complete or side-effectful in its own right, and then it could mature even further to the point that it has its own module system, package manager, and IDE. Realistically though, I think it'll be hard to resist having at least some way to call out to Rhombus code.

So I'd say it's not just "fine to leave something resembling gen:equal+hash," it's actually something that plays an important role in understanding what you're going for by saying "mathematically it always works." I think there is a potential for a very pure approach based on a closed set of key types and letting users write key-returning functions, and I think I've even found myself building something like that by necessity with Cene's dexes. But I think emphasis on the key functions might underemphasize the role the key types are playing in the design.

But you may be right that the language probably should provide some standard and commonly used [key functions], something like ->integer[...]

Now that you mention it, although I was talking about key types rather than key functions, I do think there are some functions that would be particularly useful as key functions. I think most will be field accessors (for when you just want to ORDER BY some column, SQL-style), and some will be curried mapping functions for various container types (e.g., making a key function for lists that works by applying another key function to all the elements of the list).
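A couple of small sketches along those lines (list-key is a name invented here; sort's #:key argument is existing Racket):

```racket
;; ORDER BY a field: the accessor itself is the key function.
(struct employee (name salary) #:transparent)
(sort (list (employee "a" 30) (employee "b" 10))
      < #:key employee-salary)  ; sorts by salary

;; A curried combinator lifting an element key over lists,
;; so lists are keyed elementwise.
(define ((list-key elem-key) lst)
  (map elem-key lst))

((list-key string-length) '("foo" "quux")) ; => '(3 4)
```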

A key function declares what parts of a value's API surface area are being chosen to be relevant to its use in the comparison, and this makes it like a type declaration with proofreading corrections on the leaves. Specifically, it's a lot like a Racket flat contract, because it has to compute a key value right away and can only interact with the parts of the API surface it can see right away.

The same is true if the user specifies a binary comparator. Binary comparators benefit from the same suite of flat-contract-like combinators too. (And so do Cene's and Interconfection's dexes and clines.)

[...] as I think Matthew hinted at, re: storing optimized binary comparators "on" these canonical key functions at compile time.

That was me, but I think I had trouble getting the idea across until Matthew arrived at the same place. 😄

But I think the fact that floating point comparisons are an exception here [...]

Oh, I don't think they'd be an exception. I think the key function you're looking for is just inexact->exact with some special-casing of NaNs and infinities:

;; (-> number? (or/c #f number?))
;; nan? comes from racket/math.
(define (number->maybe-equal?-key x)
  (define (real->maybe-equal?-key r)
    (cond
      [(nan? r) #f]
      [(= +inf.0 r) +inf.0]
      [(= -inf.0 r) -inf.0]
      [else (inexact->exact r)]))
  (for/first ([r (in-value (real->maybe-equal?-key (real-part x)))]
              #:when r
              [i (in-value (real->maybe-equal?-key (imag-part x)))]
              #:when i)
    (make-rectangular r i)))

In general, it seems pretty easy to shift concepts around. A comparator library can fit into a key function world by having wrapper types that compare themselves by invoking comparators. A key type can fit into a comparator world by having a comparator that works by invoking a key function. On the one hand, I'm reluctant to limit things to one approach if there are so many unfinished ideas that might need another, but I don't see a long-term difference in expressiveness.

Could you clarify what you see as unfinished here? Would be good to address these so we tie down any loose ends.

I haven't ever built a comparator interface with equality witnesses, apartness witnesses, and a trie lookup function. I'm still working out the details for myself, and I hardly even want to propose them for Rhombus given how much feature creep they could represent. I certainly don't want Rhombus to be blocked on every single one of them, and I have no illusion that I won't just come up with more ideas by the time these are half-done. XD

I've been trying to work out the details of these features for myself for months, and I have some barely begun repos where I aim to explore them. For instance, I'm thinking of implementing the trie lookup in JavaScript first, as part of a multiple dispatch system, so that I can get my run-time dispatch approach figured out before getting lost in any compile-time dispatch ideas. (Unfortunately, I think the main thing JavaScript programmers might want in a multiple dispatch framework is something that makes it easier to tree-shake out the specializations the program doesn't use... and that's a compile-time concern. So I guess I don't escape distractions that way.)

While I consider these ideas unfinished for my own purposes, I'm sure many people have made type classes called "decidable equality" (with some spelling) in something like Coq, Idris, or Agda, where it returns not just a boolean but either a proof of equality or a proof of apartness. ...And it's possible MathScheme even has a decidable equality interface of some sort expressed in a more Scheme-related way.

Somewhere out there might even be an interface with the kind of trie lookup method I'm trying to figure out. I mean, I figure the fact that tries have a name means someone's probably done this kind of inversion of control at least once before. XD

Would you say that the real? type we discussed earlier is an example (at least in my reified version of it) of one type defining equality across multiple types? Since inexact? and exact? were subtypes of real? in this case, I'm wondering whether such cases would always arise with a parent type defining equality for all subtypes which could otherwise be islands.

Yeah, that sounds right. Having a comparison operation that's defined on every combination of values in a set makes that set a good candidate for a "type." Any code that invokes the operation probably has an island in mind and would ascribe that island's type to some of its variables, at least in spirit.

22000000000000000/7 == 22000000000000001/7

... is considered true.

Haskell has exact rationals. They're just apparently not the default Fractional type, which might be some floating-point representation.

Prelude> import Data.Ratio
Prelude Data.Ratio> 22000000000000000/7 == 22000000000000001/(7 :: Ratio Integer)
False

I don't think a lot of people using Haskell are going to be surprised if a language has its own conventions for things. Still, it seems like Haskell is designed in a way that makes arithmetic look pretty conventional, especially if the user's willing to specify the type they expect. It's probably the only user-extensible approach to arithmetic I know where users can not only define new number types (as they can in Python with operator overloading) but new operations as well.

Anyhow, arithmetic could probably use its own discussion thread or issue. :) I wonder if there already is one.

How about this: = means equal-always?. But numbers are compared not as exact or inexact but as reals, i.e. a rich type that doesn't lose information about the numeric representation. So:

1 = 1.0 would be true (per the definition of real equality in my earlier comment),

Do you want 1 = 1.0 to be equal-always? as well?

and

3.14285714285714[26..30] = 22/7

would be considered equal just like in Haskell, because these would be compared in an inexact way since the LHS is inexact.

I think Racket compares as though the inexact number is converted to an exact one.
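That conversion means no precision is lost in mixed-exactness comparisons, which you can see directly:

```racket
(= 1/2 0.5)                  ; => #t, 0.5 is exactly 1/2 in binary
(= 1/3 (exact->inexact 1/3)) ; => #f, no flonum equals 1/3 exactly
```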

At any rate, in my opinion, an inexact number that's even slightly less than 22/7 should be sorted prior to it and considered < to it, and I have no idea what to do with the idea that some numbers could be < and = at the same time. I'd much rather keep < and = mutually exclusive.

Btw, as a procedural note, I think we should consider the "RRFI" as the deliverable outcome of these discussions, so that any options or conclusions we come to here should be incorporated into the document in a tangible way, and when all is said and done, we will have an actual document containing specific and concrete recommendations. Just something to keep things directed as these discussions continue.

Also on this procedural side, it doesn't look like we can submit PRs on Gists, but it sounds like we could use this approach to approximate it, for anyone interested in updating the RRFI. Though, as the de facto editor of the document I am happy to keep it up to date with discussions here (e.g. I may bring it up to date and solicit confirmation from contributors once per week).

I never know how much time I have available to focus on this, so a lot of my participation commitment in Rhombus overall is in drive-by comments, but I was thinking the same thing about your RRFI and wondering how people could participate in your Gist. I'm glad you said something. :)


countvajhula commented on May 25, 2024

Opaque data is harder. Perhaps undermining my own point a bit, the task of keeping implementation details from leaking is the reason I've made Cene and Interconfection's dexes work a little like correct-by-construction key functions -- because I don't trust programmer-defined code not to give different results based on the order it receives objects to compare, and making sure it's not possible for a module to introduce indeterminism that way is one of my requirements for the Era module system. Cene and Interconfection have a single underlying representation of keys, called name, which is itself very opaque so that it doesn't breach the opacity of the values being compared (other than what their comparison results are). For determinism's sake, dexes compare all possible parts of the value that a computation could observe, so users can't specify them with key-returning functions that might discard information; instead, they can write dex-returning functions.

Along these lines, it occurs to me that one benefit of using key functions even for the type definition side is that they are safer and provide stronger guarantees than the comparator. That is, in a comparator, there is nothing preventing a user from violating reflexivity (via a != a), symmetry (via a = b but b != a), or transitivity (via a = b and b = c but a != c). But an equality key guarantees that all three properties of equality are preserved in the new definition.
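The guarantee follows directly from how a comparator is derived from a key (key->comparator is a name invented here for illustration):

```racket
;; Any comparator built this way inherits reflexivity, symmetry,
;; and transitivity from equal-always? on the key type -- there is
;; no way for user code to break them.
(define ((key->comparator key) a b)
  (equal-always? (key a) (key b)))

(define same-length? (key->comparator string-length))
(same-length? "abc" "xyz") ; => #t
(same-length? "ab" "xyz")  ; => #f
```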

This would be one reason to favor key-based definitions for gen:equal+hash instead of a comparator. Not sure if it would be more difficult to optimize an inferred comparator like (= (vector a-attr1 a-attr2) (vector b-attr1 b-attr2)) than a user-specified one like (and (= a-attr1 b-attr1) (= a-attr2 b-attr2)). If constructing a vector out of the components of the type is the most common way to implement the equality key (i.e. the same as the default definition for transparent structs), then maybe this could have a standard compiler optimization.

I said "key types," not "key functions." For the key function string-length from string? to natural?, natural? is what I mean by the key type.

If there were absolutely nothing like gen:equal+hash left, and all there were were key functions, then what would the key functions return? Whatever value they returned, that value might also implement its comparison in terms of a key function, and if all values were like that, it would never end.

It wouldn't be "key functions all the way down" because we are taking egal? over the built-in Racket values as a primitive.

I see what you're saying re: key types though. Since the method of checking equality on the ground set breaks down into islands that correspond to types, these types could be seen as "key types." This type-based way of looking at it sounds more structured, and is possibly a more useful way of thinking about this for that reason.

I'm not sure I see a difference with key types being extended by comparators vs key functions, though -- in either case, once a new type has defined equality for itself, new types after that could treat it as a key type, no? That is, assuming we allow specifying a key function via gen:equal+hash for the type and not just on an ad hoc basis.

If I'm programming in a language, and if I don't think its module system is as side-channel-free as I'd want Era's to be, I appreciate having the ability to write arbitrary binary comparison code in case I hit some other kind of barrier. And with so many things I want custom comparison logic to be able to do. I figure there's bound to be something that's going to be hard to express in terms of a sufficient set of key types.

Yeah, I think the assumption with "mathematical completeness" of key functions is that the built-in Racket values are diverse enough, and egal? is fine enough, for any conceivable custom type to find natural representatives. While the fact that it is empirically true on all types that we see (including numbers, as you pointed out) is enough to make it useful in APIs, we of course do not need to rely on this assumption being true if we go with the comparator on the type definition side.

Just to rehash the options here FTR: for defining equality for a type via gen:equal+hash, we could either use a comparator (as it is in Racket currently and as we have been mostly saying so far), or a key (by virtue of the safety considerations above), or support users specifying either (but not both).

Along the way, I can think of a few stages where people might feel like the key types available to them are insufficient: [...]

The issues at the various stages of paucity / extravagance of key types makes sense, and it's great to have this concrete breakdown of the cases.

Encapsulation is a great example here, and one where you'd think that a key function wouldn't work. But if gen:equal+hash supports specifying a key function, then the equality check here would delegate to the method on the type, and it would produce a representative "from within" the instance. Although, of course, this "internally" produced representative would (for instance, if it were to use the vector-of-attributes approach for a key) produce a value containing all of the relevant internal attributes -- a kind of serialization. All this is not to say that we should go with key functions on the type definition side, just that (as far as I can see) we can.

That was me, but I think I had trouble getting the idea across until Matthew arrived at the same place. 😄

Ah that's right I remember now.

Oh, I don't think they'd be an exception. I think the key function you're looking for is just inexact->exact with some special-casing of NaNs and infinities:

I'm sure you're right, but what's the reason for using make-rectangular here?

Haskell has exact rationals. They're just apparently not the default Fractional type, which might be some floating-point representation.

Nice, thanks for checking that.

I think Racket compares as though the inexact number is converted to an exact one.

Ah yeah, this does appear to be the case.

At any rate, in my opinion, an inexact number that's even slightly less than 22/7 should be sorted prior to it and considered < to it, and I have no idea what to do with the idea that some numbers could be < and = at the same time. I'd much rather keep < and = mutually exclusive.

Yes, I agree this is better 👍

I never know how much time I have available to focus on this, so a lot of my participation commitment in Rhombus overall is in drive-by comments, but I was thinking the same thing about your RRFI and wondering how people could participate in your Gist. I'm glad you said something. :)

I'm glad I did too :) As a start, I'll aim to work these discussions into the document, but realistically, there's a lot in what you said that I'm sure I couldn't do justice to. I'll probably do an initial pass and add placeholders to indicate places that may need input or details from you.


countvajhula commented on May 25, 2024

Ok, I've updated and extracted the equality part into a brand new RRFI document. Thank you all for your input so far:

RRFI [Draft]: Two-level Universal Scheme for Equality

This is in a new document to keep things decoupled, as @jackfirth suggested.

I've tried to capture what we discussed and what people have done, but it's entirely possible that I've missed things. There are also spots in the document marked by TODOs where I know it needs more work. I'd love input from anyone on this thread on those, as some of them are fairly general and well-bounded. Most of these are in the compiler optimizations section. There's also a TODO for specific examples where egal? replaces eq? (and also equal?). That is, what's a concrete case where a Racket programmer would reach for eq? but would in fact run into problems that way, where she should use egal? instead and cannot use equal-now/equal? either? Or perhaps there aren't such cases, and the point is that this scheme does away with the recourse to separate eq? and equal? predicates. It would be great to substantiate the move to egal? further as the best choice of primitive equality predicate.

Oh also, the document takes a stronger stance in support of key functions for "extending key types" (i.e. via a generic interface definition of equality) than we have been taking so far, partly just because I had more time to reflect on these discussions, but also because I recalled something Matthew had pointed out earlier regarding hashing being completely abstracted from the user by this scheme. I realized that that simplifies gen:equal+hash to a significant extent. That combined with the "safety" consideration that came up made me feel that we ought to recommend key functions exclusively as the right thing here -- of course, this is still a draft and subject to our further consideration and review and input. Enjoy ☕ 😸 👓

from rhombus-prototype.

rocketnia avatar rocketnia commented on May 25, 2024

@countvajhula Could you make your Gist into a GitHub repo, please? That'll probably be lower friction than having people create their own repos to make pull requests against. I've been meaning to spin up a GitHub repo of my own just so I could write a pull request against it, and it'd probably be easier to find the discussion around it if it's all in the same place.

(Or maybe for the community's collaboration, it could make more sense to have pull requests center around this repo, or at least something under the Racket organization? At one point this repo was starting to be set up for proposals like this in the form of https://github.com/racket/rhombus-prototype/blob/8f9f8719cfb56c64000410c9c8eb43139dc39af7/resources/template.md , and I wonder if that's still a good use of it or if it's more dedicated to a prototype now.)

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

Having it be part of this repo seems to make the most sense @rocketnia . I see the template that you linked to, but I didn't find any examples of proposals that have gone through this process -- mind sending a link if there have been any? Otherwise, here's one process we could follow:

  • I fork this repo and add the RRFI to it and maintain the current version at my fork
  • anyone interested in working on this submits patches to my fork
  • when we're ready, we submit a PR to this upstream repo

Does that work?

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

Alright I've created the fork and added the RRFI @rocketnia
https://github.com/countvajhula/rhombus-prototype

The doc is now at:
https://github.com/countvajhula/rhombus-prototype/blob/master/rrfi/equality.rst

from rhombus-prototype.

rocketnia avatar rocketnia commented on May 25, 2024

Thanks for setting that up. Once I actually got started reading the RRFI, it provoked some deeper thoughts about the equality of opaque values. To keep this comment self-contained, I'll focus on that, but I hope to get to more of the RRFI soon.

Since the RRFI mentions egal? in a few places, I took a closer look at Equal Rights for Equal Objects, which introduces egal?. I think I've overlooked this part before:

As a result of these problems, we recommend that EGAL remain a simple, reliable, decidable predicate which compares types and representations, as in the following code. While EGAL will sometimes provide a finer equivalence relation than the abstract one desired, it cannot cause embarrassment due to violation of the basic rules of object identity, because it tests for identity as preserved by the basic value transmission operations of the programming language. Furthermore, this EGAL is consistent with the earlier definition of EGAL on closures, which are often used to implement abstract data types (this consistency can be seen by equating the notion of an abstract type object with the notion of the code for a closure). Systems defining abstract datatypes might consider providing a new "generic" predicate that defaults to EGAL for primitive datatypes and can be overloaded for user-defined abstract datatypes; this generic predicate would then be used for table lookup routines such as assoc, member, etc., instead of passing an equality predicate as an argument.

(defun egal (x y)
  (and (egal (type-of x) (type-of y))
       (cond ((abstract-type-p (type-of x))
              (egal (representation x) (representation y)))
             << the other clauses for egal, as before. >>
             )))

In short, egal? works on user-defined types by traversing into their type tags and representations. It thus guarantees reflexivity, symmetry, and transitivity by not allowing programmers to mess them up.

Meanwhile, egal? traverses into function closures as well. This is described elsewhere in the paper and is alluded to here by "this EGAL is consistent with the earlier definition of EGAL on closures." These policies do look consistent with each other to me.

That paragraph suggests having a generic alternative alongside egal?. I think the direction equal-always? has been taking is close to what that would be. Perhaps equal-always? won't traverse into procedures' implementations as deeply as egal?, but it probably will check that they're eq?.

The kind of equality predicate I'd like to see in programming languages--and I'm sorry if I'm retracing some of the points I described in comments above--is one that respects the encapsulation of abstract types. It shouldn't traverse into their representations unless given the go-ahead by the types' authors, and then it should traverse only into the parts it's allowed to. If type authors can't be trusted to specify their own traversals, it shouldn't traverse into the types at all.

Ideally, the equality predicate itself also obeys function extensionality with respect to types' conceptual equality. (In comments above, I called that "purity.")

For instance, conceptually, the following functions are equal, even if they have different representations:

(define (add3 n) (+ 2 (+ 1 n)))
(define (3plus n) (+ 1 (+ 2 n)))

Thus, the following calls have conceptually equal arguments, and by the function extensionality of abstraction-respecting-equal? itself, they should have equal results:

(abstraction-respecting-equal? add3 add3)
(abstraction-respecting-equal? add3 3plus)

Since extensional equality of functions is in general undecidable, we probably can't expect the (abstraction-respecting-equal? add3 3plus) expression to return #t, so neither should (abstraction-respecting-equal? add3 add3).

That gives the appearance that abstraction-respecting-equal? isn't reflexive. But some values don't have any total decision procedure for their equality, so abstraction-respecting-equal? can't be a total decision procedure for their equality either. Instead, it's a partial decision procedure for equality, one which may still be total when restricted to certain particularly useful subsets of input values.
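To make the "partial decision procedure" idea concrete, here's a minimal sketch (in Python rather than Racket, purely for illustration; partial_equal is a hypothetical name, not a proposed API): it decides equality where it can, answers false for NaN the way IEEE 754 = does, and reserves judgment on functions entirely.

```python
import math

def partial_equal(a, b):
    """A hypothetical partial decision procedure for equality: it
    decides where it can and reserves judgment elsewhere."""
    if callable(a) or callable(b):
        # Extensional equality of functions is undecidable, so reserve
        # judgment even on (f, f) to respect function extensionality.
        raise ValueError("equality of functions is undecidable")
    if (isinstance(a, float) and math.isnan(a)) or \
       (isinstance(b, float) and math.isnan(b)):
        # IEEE 754 = reserves judgment on NaN by answering false.
        return False
    return a == b

assert partial_equal(1, 1) is True
assert partial_equal(float("nan"), float("nan")) is False
```

Restricted to the "particularly useful subset" of non-function, non-NaN values, this behaves as a total decision procedure.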

A few comments ago I started off saying that numeric =, including its quirky NaN behavior, would be a good starting point for equality in Rhombus because it's consistent with <= and likely to be familiar to new programmers who've worked with numbers in low-code tools. But at this point I think this NaN behavior isn't even a quirk; the ability of = to reserve judgment on certain values makes it a better starting point for a generic equality predicate even outside the familiarity-driven domain of Rhombus.

from rhombus-prototype.

rocketnia avatar rocketnia commented on May 25, 2024

Sigh, I'm going in circles with myself. I don't think Rhombus's = necessarily needs to abide by function extensionality on inputs that aren't designed to be compared.

For one thing, passing in those inputs in the first place is arguably a type error.

For another thing, the Racket culture doesn't yet include a habit of explicitly declaring when types are disjoint from each other, and it's not clear that introducing one would actually make the language any more comfortable to use. If types can be implicitly assumed to be disjoint, then disjoint union types can be taken. If values that belong to no explicit type are assumed to be of their own types (e.g., each lambda expression basically defining its own prop:procedure-implementing structure type), then a few of those values can be singled out and unioned up with some other types. A disjoint union of posets is a poset, so types are in some sense more compositional in this culture. And if all types are disjoint, then data structures don't have to go to elaborate lengths to make sure all their keys are comparable to each other, which seems it could get inefficient.

So if Rhombus's = receives values of types that weren't explicitly designed to be compared, it may actually be a reasonable assumption that the programmer wasn't mistaken and that the types were implicitly designed to be compared.

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

First of all, I'm really glad you are subjecting egal? and equal-always? to scrutiny @rocketnia . Thus far I have been mostly content to abstract the specific choice of primitive equality relation, but I think this kind of scrutiny is necessary both to convince ourselves we are making the right choice here, as well as reveal possible ways (as you have started to do) in which we might do even better.

In short, egal? works on user-defined types by traversing into their type tags and representations. It thus guarantees reflexivity, symmetry, and transitivity by not allowing programmers to mess them up.

Meanwhile, egal? traverses into function closures as well. This is described elsewhere in the paper and is alluded to here by "this EGAL is consistent with the earlier definition of EGAL on closures." These policies do look consistent with each other to me.

That paragraph suggests having a generic alternative alongside egal?. I think the direction equal-always? has been taking is close to what that would be. Perhaps equal-always? won't traverse into procedures' implementations as deeply as egal?, but it probably will check that they're eq?.

The kind of equality predicate I'd like to see in programming languages--and I'm sorry if I'm retracing some of the points I described in comments above--is one that respects the encapsulation of abstract types. It shouldn't traverse into their representations unless given the go-ahead by the types' authors, and then it should traverse only into the parts it's allowed to. If type authors can't be trusted to specify their own traversals, it shouldn't traverse into the types at all.

I agree that traversing into representations is "invasive" of the abstraction barrier. The current key proposal avoids this, btw. In essence, the kind of comparison that egal? would do on the attributes of the representation is instead done on those attributes represented as a vector by the default key function (in cases where a default definition is used - which could be always unless otherwise specified, or even never unless otherwise specified).

Additionally, by limiting user-defined equality purely to specifying a representative, the user cannot violate reflexivity, symmetry, or transitivity -- so it simultaneously trusts the user and also implicitly places guardrails against the possibility of user error.
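To illustrate why the key approach is safe by construction, here's a hedged sketch (Python for illustration only; CIString, key_equal, and key_hash are hypothetical names, not part of the proposal's actual API): both equality and hashing are derived entirely from the key, so reflexivity, symmetry, transitivity, and hash/equality consistency are inherited from the key type and can't be broken by the type author.

```python
class CIString:
    """A string compared case-insensitively, extended via a key function."""
    def __init__(self, s):
        self.s = s
    def key(self):
        # The user's only job: supply a representative value.
        return self.s.lower()

def key_equal(a, b):
    # Equality delegated entirely to the keys: reflexivity, symmetry,
    # and transitivity are inherited from the key type's equality, so
    # the author of key() cannot violate them.
    return type(a) is type(b) and a.key() == b.key()

def key_hash(a):
    # Hashing is fully abstracted: it is always the hash of the key,
    # so equal values hash equally by construction.
    return hash(a.key())

assert key_equal(CIString("Rhombus"), CIString("rhombus"))
assert key_hash(CIString("Rhombus")) == key_hash(CIString("rhombus"))
```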

Ideally, the equality predicate itself also obeys function extensionality with respect to types' conceptual equality. (In comments above, I called that "purity.")

For instance, conceptually, the following functions are equal, even if they have different representations:

Since extensional equality of functions is in general undecidable, we probably can't expect the (abstraction-respecting-equal? add3 3plus) expression to return #t, so neither should (abstraction-respecting-equal? add3 add3).

That sounds reasonable -- perhaps functions should be incomparable.

That gives the appearance that abstraction-respecting-equal? isn't reflexive. But some values don't have any total decision procedure for their equality, so abstraction-respecting-equal? can't be a total decision procedure for their equality either. Instead, it's a partial decision procedure for equality, one which may still be total when restricted to certain particularly useful subsets of input values.

In order to preserve reflexivity, instead of returning false we could raise an error saying that functions are not a comparable type. Currently in Racket, any values are comparable, i.e. the domain of e.g. equal? is, I believe, all Racket values. But preserving function extensionality may be a good reason to exclude at least functions from the domain here.

A few comments ago I started off saying that numeric =, including its quirky NaN behavior, would be a good starting point for equality in Rhombus because it's consistent with <= and likely to be familiar to new programmers who've worked with numbers in low-code tools. But at this point I think this NaN behavior isn't even a quirk; the ability of = to reserve judgment on certain values makes it a better starting point for a generic equality predicate even outside the familiarity-driven domain of Rhombus.

I agree that restricting the domain of = to a smaller set than all Racket values should be a possibility we consider, and although I am not familiar with the IEEE(?) standard that defines NaN and what the rationale for its existence is, it sounds like another example of a value that (as you suggested earlier) should be excluded from equality comparison. I think my preference would be for such comparisons of incomparable values to raise an error (or return an optional value that does not have a boolean interpretation -- although this seems less Rackety) rather than return false.

Sigh, I'm going in circles with myself. I don't think Rhombus's = necessarily needs to abide by function extensionality on inputs that aren't designed to be compared.

For one thing, passing in those inputs in the first place is arguably a type error.

I think this is equivalent to saying that we exclude certain values from the domain of =, is that right?

For another thing, the Racket culture doesn't yet include a habit of explicitly declaring when types are disjoint from each other, and it's not clear that introducing one would actually make the language any more comfortable to use. If types can be implicitly assumed to be disjoint, then disjoint union types can be taken. If values that belong to no explicit type are assumed to be of their own types (e.g., each lambda expression basically defining its own prop:procedure-implementing structure type), then a few of those values can be singled out and unioned up with some other types. A disjoint union of posets is a poset, so types are in some sense more compositional in this culture. And if all types are disjoint, then data structures don't have to go to elaborate lengths to make sure all their keys are comparable to each other, which seems it could get inefficient.

Not sure I understand the point about types here, but it seems like excluding values from the domain of = should mean that these values are also not usable in data structures like hashes (just like mutable values would be excluded here -- but in the case of mutable values they are still in the domain of = and are excluded for other reasons than their incomparability).

Another speculative possibility is to introduce a new truth value into the language to represent "undecidable" and, er, decide on the semantics of that.

So if Rhombus's = receives values of types that weren't explicitly designed to be compared, it may actually be a reasonable assumption that the programmer wasn't mistaken and that the types were implicitly designed to be compared.

Are you saying that egal?'s compromise is OK here? I'm actually thinking through the implications of excluding functions from equality comparison, and it feels like it would discourage users from basing logic on the equality of functions -- which is undecidable after all. Maybe this is not a bad thing.

from rhombus-prototype.

rocketnia avatar rocketnia commented on May 25, 2024

Sigh, I'm going in circles with myself. I don't think Rhombus's = necessarily needs to abide by function extensionality on inputs that aren't designed to be compared.

For one thing, passing in those inputs in the first place is arguably a type error.

I think this is equivalent to saying that we exclude certain values from the domain of =, is that right?

Hmm, that could mean some things I do intend and some I don't.

Keep in mind that = takes multiple inputs. It wouldn't always be "values" that would be excluded, it would be pairs of values. A lot of the time, even if the comparisons a = a and b = b were allowed, a = b might not be, typically because a and b were values of different types without an explicit API guarantee of disjointness between them.

Anyway, this support for not-necessarily-disjoint types is something I was saying I wanted for abstraction-respecting-equal?/"extensible =," and it might be what I would want for a dream language, but that's before I just went in circles with myself.

What I'm saying now is that perhaps something that treats disjoint types as the default, as equal-always? does, is a more practical route in the Racket and Rhombus ecosystem due to the common use of union types in Racket API design.

Not sure I understand the point about types here, but it seems like excluding values from the domain of = should mean that these values are also not usable in data structures like hashes (just like mutable values would be excluded here -- but in the case of mutable values they are still in the domain of = and are excluded for other reasons than their incomparability).

I was talking about Racket's assumptions of disjointness there, not anything to do with hashes. I'll try rephrasing.

Racket core APIs seem keen on using lots of union types. Many functions take arguments of the form (either a boolean or a procedure) or (either a list or a symbol).

In some cases, this may be a gamble that prioritizes short-term convenience over the potential for long-term convergence of language extensions. In a scenario where person A makes an apricot library, person B makes a plum library, and person C writes a function that takes (either an apricot or a plum) and branches on which type of value it receives, person C has likely assumed the apricot? and plum? predicates are disjoint. Person C may have also assumed no apricots would ever be equal to any plums. This closes off a path A and B could have taken with their libraries, where they could have merged their similar types and made their downstream ecosystems automatically interoperate more seamlessly.

Mostly this just means that if there's a sense in which some apricots should be equal to some plums, then there will need to be a sum type that wraps apricots and plums and has its own upgraded comparison behavior. The innate comparison behaviors of apricots and of plums will have to continue to treat them as disjoint, but this new type can treat them as related.

(To converge in ways other than equality, the libraries can just introduce a new prunus? generic interface that's a supertype of apricot? and plum? and has support for all the operations of each. Equality of prunus values will sadly distinguish apricots from their plum counterparts, but for most other purposes, libraries can migrate to the prunus type to achieve better interoperation.)

Heh, when I describe it this way, I suppose taking on a sum type wrapper today isn't worth it if it's just to prevent a possible sum type wrapper in a future that might never come to pass. I find myself less concerned about and more endeared to the disjoint-by-default world.

Another speculative possibility is to introduce a new truth value into the language to represent "undecidable" and, er, decide on the semantics of that.

Hm, I think the straightforward semantics is that it's an error. An undefined result may someday turn into a defined result, so whatever some code does in response to it is almost inherently unstable.

It would be handy to have a way to compare things that computes a Maybe Boolean so that certain code can use that information for its own more sophisticated reporting of errors (a use case where the instability wouldn't matter), but I think the everyday meaning of = in Rhombus should just return a boolean.

So if Rhombus's = receives values of types that weren't explicitly designed to be compared, it may actually be a reasonable assumption that the programmer wasn't mistaken and that the types were implicitly designed to be compared.

Are you saying that egal?'s compromise is OK here?

Well, not egal?, equal-always?.

I'm actually thinking through the implications of excluding functions from equality comparison, and it feels like it would discourage users from basing logic on the equality of functions -- which is undecidable after all. Maybe this is not a bad thing.

Yeah, I think it could be good not just for lambdas but for some other anonymous codata values as well. The only way to be sure that two codata values are equal is by their object identities or by some underlying data representation they have (like what their struct fields are, if they're a prop:procedure struct). However, for some codata values, even their object identities may not be predictable enough to rely upon. Excluding them from comparison altogether could be annoying because chaperone contracts wouldn't be able to wrap them, but at least they might be excluded from more coarse-grained comparisons.


Here's a more self-contained description of what I'm thinking at the moment. Just for identification purposes, maybe I'll call this bundle of design choices "business-logic = alongside equal-always? ==, with disjoint-by-default types," or business=? for short. Apologies in advance for mixing ordered comparison stuff (#214) in with equality stuff again:

Rhombus's == is equal-always?, which is the symmetric closure of chaperone-of?. Types can implement custom behavior for equal-always?/chaperone-of?, but eq? values (and certain values that are chaperones of eq? values, I think?) will be equal-always? regardless of custom behavior. This means equal-always? will be no finer-grained than eq?.

Rhombus's = represents a coarser equivalence relation that's consistent with a <= partial order. Whereas == represents a value's equality in terms of its API surface, = is just an arbitrary equivalence relation that the programmer is modeling on a set of comparable values. Since = itself is part of the API surface, values that are == should also be =.

Rather than being total decision procedures, ==, =, and <= can be undefined on some inputs. They must be defined on any inputs for which there exist other inputs that could be used to deduce what their behavior should be there. Specifying this precisely might get wordy, but here's an example: If b <= c is #t but a <= c is #f, then by the contrapositive of transitivity, a <= b has to be #f, not undefined.

This complex condition allows for a certain good default for transparent algebraic data types: They can be noncommittal about what the result of a comparison like Foo(1, 2) < Foo(2, 1) or None() < Some(2) should be at first, even if they commit to Foo(1, 1) < Foo(2, 2) and Some(1) < Some(2) being true. In more detail, certain obvious ways to order values follow from the composition of partial orders by categorical sums and products, but some types might further elaborate on the partial order obtained this way until it's closer to a total order, and this design reserves their right to do so. It also leaves enough room to encompass the cases I've been considering above with NaN-like values and not-necessarily-disjoint types.
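The "noncommittal product order" default described above can be sketched like this (Python, illustrative only; leq_pair is a hypothetical name, with None standing in for an undefined result): the componentwise comparison commits where both components agree and reserves judgment on mixed cases like Foo(1, 2) vs Foo(2, 1).

```python
def leq_pair(a, b):
    """Product order on pairs: a <= b is decided only when both
    components agree. Returns True/False when decided, None when the
    order is noncommittal."""
    if a[0] <= b[0] and a[1] <= b[1]:
        return True
    if b[0] <= a[0] and b[1] <= a[1]:
        # b is below a componentwise (and they aren't equal, or the
        # first branch would have fired), so a <= b is decidedly false.
        return False
    return None  # e.g. (1, 2) vs (2, 1): incomparable, judgment reserved

assert leq_pair((1, 1), (2, 2)) is True
assert leq_pair((1, 2), (2, 1)) is None
```

A type could later refine the None cases into True or False (e.g. by moving to a lexicographic order) without contradicting any result it has already committed to.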

When defining a more opaque type, one good default for = and <= is to be undefined except where the truth of == obligates that they're true. This reserves the right for a type to define an equivalence relation or partial order over its values in the future, even if it doesn't yet have that as part of its design.

In both of those cases, undefined results aren't the only good default. False results are another; that way the algebraic laws for equivalence relations and partial orders are still respected, but the partial decision procedure is closer to being a total decision procedure. The power of a total decision procedure can be useful in the short term even if documentation cautions people about the instability of certain false results.

Meanwhile, the behavior for == (if I understand chaperone-of? right) defaults to just unwrapping chaperones and checking for eq?.

Per what I was saying a moment ago, I'm convinced now that it's a good thing for types to be disjoint by default. That means comparisons between them should return false, and type authors should only get an undefined result if they opt out. Similarly, it's good for the default behavior of == to return false for distinct values of the same type. That way, subsets of the type's values are effectively subtypes that are disjoint from each other by default too. Altogether, this means == can be totally consistent with equal-always? on existing types, even though equal-always? might not yet allow undefined results like I'm describing for ==.

I believe this set of design choices will mean that in situations where ==, =, and <= have undefined results, those results will be both algebraically justified and intentional on the part of the type author. So while there might be a role in some cases for treating an undefined result as false or as some other distinct value, I think treating it as an error will be the most helpful for the majority of code.

The way people customize = and <= can be by supplying a key function that converts the values into some other type that already supports = and <=. Depending on the quality of error messages desired, the key function approach could get a little wacky in order to report error messages in terms of the original data structure, and I think there might need to be several key types dedicated to expressing different order combinators, but I think it can still work.

The way people customize == can be by supplying a key function that converts the values into some other type that already supports =. (Yes, this would define one value's finer-grained relation in terms of another value's coarse-grained relation. The key value is being used for nothing but its comparison logic, so an =-able value is appropriate.)

In summary, Rhombus could have two levels of equality predicate: an equal-always? level for comparing values by their observable behavior, and a business-logic business=? level for comparing according to whatever equivalence relation that specific type is designed to support on its values. These could be called == and =, and the = one could be consistent with the same type's <= comparison. I believe all these procedures could be customized using key functions. These procedures could also be allowed to be partial, thereby allowing some type authors to throw exceptions to bring attention to type errors or to keep certain behavior unspecified as they see fit. However, if type authors want their types' disjointness to be unspecified, they have to opt out; disjointness is the default to allow union types to work.

(Maybe I should separate this out into its own top-level thread/issue in order to keep it from being buried so deep? And perhaps == and = should become === and == like some of the notations people suggested earlier in the thread.)

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

@rocketnia -- If I may ask you a favor Nia, if there is something in your comment above that you feel ought to be included in the RRFI and isn't already, would you be kind enough to create a PR for it? The general orientation of the RRFI is to present many options and, where possible, make a recommendation from among them. If you think you can add the content of your comment as options in the appropriate sections (ideal -- e.g. the section on "Primitive Equality Predicate") or even introduce new sections (if needed), I think that would greatly help by putting it in perspective. If you would prefer to wait until tomorrow so that I've had time to do an edit pass (which I plan to do today) that you can review first, that would be fine too -- either way. At the same time, if we had to choose between working on new things and refining what we already have, I'd err on the side of the latter in the interest of time and making a cogent case. On that subject...

@everyone -- If anybody else wants to contribute anything for consideration in the upcoming meeting, please get it in now. I think we are focused on refining the RRFI in time for people to read it (i.e. probably have it ready before the weekend), and, opportunistically, on setting up initial benchmarks (which Nia and I discussed offline) for the two-level scheme. I will aim to have the benchmark suite ready hopefully soon, and I think it will give us all an opportunity to play without needing too much context :)

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

FYI: the document has been updated. Have a good weekend, and enjoy 🤓

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024

@countvajhula I'm getting to the point in #259 where it makes sense to start implementing equality in the collection data structures. To do that, I need equality interfaces based on the new class and interface system in Rhombus. Would you be interested in integrating the design work you've already done with the class system?

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

@jackfirth Could you explain a bit more what's needed? I don't expect I'll be able to work directly on the implementation, but I'm available to consult on the design.

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024

@countvajhula I've got a bunch of collection classes in #259 and I want a == b in Rhombus to work correctly on them. This requires getting the existing racket equal? and equal-always? predicates to work correctly with custom datatypes defined in Rhombus. I'm not yet sure what's needed to make that happen.

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

One basic solution following gen:equal+hash / prop:equal+hash would be an EqualPlusHash interface (or whatever follows a better naming convention) with 2 methods: equal_proc and hash_proc.

Since Rhombus == follows the equal-always? mode, would it be okay to assume that the EqualPlusHash methods are always in line with that mode too? If that's the case, then equal_proc would take 3 arguments: self, other, and recur, and the hash_proc would take 2 arguments: self and recur.

This would implement prop:equal+hash with a 2-element list of equal-mode-proc and hash-mode-proc. Those implementations can ignore the mode argument and just pass the others through.

If that assumption of the EqualPlusHash methods always being in line with equal-always? mode wouldn't be true... perhaps there could be an EqualNowPlusHash interface or an EqualModePlusHash interface for the cases where users want to implement an equal-now behavior instead?
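Assuming the equal-always? mode throughout, the interface shape described here might look roughly like this (a Python sketch, not the actual Rhombus or Racket API; all names are hypothetical):

```python
class EqualPlusHash:
    """Hypothetical interface: subclasses implement equal_proc and
    hash_proc; the mode argument of prop:equal+hash is ignored because
    all comparisons are assumed to be in equal-always? mode."""
    def equal_proc(self, other, recur):
        raise NotImplementedError
    def hash_proc(self, recur):
        raise NotImplementedError

class Pair(EqualPlusHash):
    def __init__(self, a, b):
        self.a, self.b = a, b
    def equal_proc(self, other, recur):
        # recur handles sub-values (where cycle detection and mode
        # preservation would live in the real thing).
        return recur(self.a, other.a) and recur(self.b, other.b)
    def hash_proc(self, recur):
        return recur(self.a) ^ recur(self.b)

def equal_always(x, y):
    # Minimal driver standing in for Racket's equal-always?.
    if isinstance(x, EqualPlusHash) and type(x) is type(y):
        return x.equal_proc(y, equal_always)
    return x == y

def hash_always(x):
    # Minimal driver for the hashing side.
    if isinstance(x, EqualPlusHash):
        return x.hash_proc(hash_always)
    return hash(x)

assert equal_always(Pair(1, Pair(2, 3)), Pair(1, Pair(2, 3)))
```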

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024

Do we need the recur argument in the Rhombus interface? I'd prefer to simplify things by avoiding it.

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

The equality procedures use the recur argument for cycle detection as well as preserving mode stuff for things deeper inside containers.

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024

I'm not convinced cycles are worthwhile to detect. As for preserving modes, why would we need to do that if we're only using one mode in Rhombus?

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

We only need a key function, see this example. We don't need equal-proc, hash-proc or recur.
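A minimal sketch of the key-function idea (Python stand-in; the Fraction type and derived_* helpers are hypothetical): a type defines only a key mapping it to an already-comparable value, and equality and hashing are derived from that.

```python
# Hypothetical sketch of the key-function approach: a type supplies only
# `key`, and equality and hashing are derived from it.
from math import gcd

class Fraction:
    def __init__(self, num, den):
        self.num = num
        self.den = den

    def key(self):
        # Reduce to a canonical representative; equality and hashing on
        # Fractions then reduce to equality/hashing on the key.
        g = gcd(self.num, self.den)
        return (self.num // g, self.den // g)

def derived_eq(x, y):
    return x.key() == y.key()

def derived_hash(x):
    return hash(x.key())

print(derived_eq(Fraction(1, 2), Fraction(2, 4)))  # True
```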

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

I'd also argue for the usefulness of the recur argument for equal?/recur and equal-always?/recur.

One example is check-within, which can use equal?/recur to implement a looser equality that allows numbers to deviate within epsilon of each other for testing purposes.
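That check-within idea can be sketched as follows (a Python stand-in, with `within` as a hypothetical name): a structural recursion whose leaf comparison tolerates an epsilon.

```python
# Hypothetical sketch of equal?/recur used for check-within-style testing:
# recur structurally, but compare numbers up to an epsilon at the leaves.

def within(x, y, eps):
    if isinstance(x, (int, float)) and isinstance(y, (int, float)):
        return abs(x - y) <= eps
    if isinstance(x, (list, tuple)) and type(x) is type(y) and len(x) == len(y):
        # This is the "recur" step: the same looser equality is applied
        # to every nested element.
        return all(within(a, b, eps) for a, b in zip(x, y))
    return x == y

print(within([1.0, [2.0, 3.0]], [1.001, [1.999, 3.0]], 0.01))  # True
```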

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

As I recall, part of the work needed to support it out of the box is having a unique "type tag" for custom user-defined types, which Racket currently doesn't provide. The POC code lets us link the gen:equatable interface (called gen:comparable in the code) to a struct type property that includes such a type tag, but that only works for types that explicitly declare gen:comparable, not for ones (i.e., hopefully the majority) that use the default equality extension implementation (based on vectors).

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

It would be better to find ways to incorporate such special cases using special handling rather than design the core equality extension interface around them. There are many reasons to use the key function interface as the core means of extension, covered in the design document and discussed extensively a little while back, so I don't think it's worth rehashing here.

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

What special handling do you propose for the case outlined by Alex? I believe Alex's point was that there is no special handling that can compute the equality of two rational numbers using key types without factoring.

As I'm reloading the cache on this, I'm remembering that the design specifically recommends against supporting special cases of binary comparators, as even one of these would come at the cost of breaking invariants that can be assumed in the language, viz. reflexivity, symmetry, and transitivity of equality.

For Alex's rational number example, since it is a familiar and general type, this should be something provided by the language itself, and that can have an optimized implementation without exposing the interface to the user (and thus without breaking invariants). In the more general case for arbitrary data defined by users, we rely on optimizations like memoization of keys and ground representatives (which outperformed the binary equal? in the early benchmarking that I did for the POC).

For collections, I'd say assume you only need key until you encounter a case where you feel you need a binary comparator, and we could take a look at that point. Wdyt?

from rhombus-prototype.

jackfirth avatar jackfirth commented on May 25, 2024

Here's my use case: comparing two collections of different sizes for equality should always be O(1) (assuming that computing their size properties is O(1)) and should never require traversing their elements. How can we make that happen?
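For concreteness, the short-circuit being asked for looks like this (a minimal Python sketch using lists as a stand-in for the collection types in question; not the proposed Rhombus API):

```python
# Sketch of the O(1) size short-circuit for collection equality
# (Python lists as a stand-in; len() is O(1) for them).

def collections_equal(a, b):
    # Different sizes: answer known in O(1), no elements traversed.
    if len(a) != len(b):
        return False
    # Same size: fall back to element-by-element comparison.
    return all(x == y for x, y in zip(a, b))

print(collections_equal([1, 2, 3], [1, 2]))     # False, without touching elements
print(collections_equal([1, 2, 3], [1, 2, 3]))  # True
```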

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

Here's my use case: comparing two collections of different sizes for equality should always be O(1) (assuming that computing their size properties is O(1)) and should never require traversing their elements. How can we make that happen?

It will be O(1) once the members of the relation have previously been compared for equality (with anything, not necessarily the other members of the present comparison), since at that point the ground representatives will have been memoized. Even before that point, the hash codes are compared first, and a negative result immediately lets us conclude the values are unequal.

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

But computing the hash code for an N-size collection is not O(1)

from rhombus-prototype.

countvajhula avatar countvajhula commented on May 25, 2024

Wouldn't that affect size computation anyway?

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

Oh, also if the interface includes a hash-proc there should be standard functions to combine hash codes in different ways, for example ordered vs unordered.

The distinctions that might be made there are another reason why a key method like the one proposed above can't be the one core way to define equality/hashing. A hashing function based on that key method would have to make assumptions, for example always ordered. That would make it harder to define hashing behavior in other ways.
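The ordered/unordered distinction can be sketched like this (a Python stand-in for Racket's hash-code-combine and hash-code-combine-unordered; the mixing constants are illustrative):

```python
# Sketch of ordered vs unordered hash combination (Python stand-in for
# Racket's hash-code-combine and hash-code-combine-unordered).

MASK = (1 << 64) - 1  # emulate fixnum wraparound

def combine_ordered(codes):
    # Order-sensitive: mixing depends on position, so [a, b] != [b, a].
    h = 0
    for c in codes:
        h = ((h * 31) + c) & MASK
    return h

def combine_unordered(codes):
    # Order-insensitive: plain wrapping addition, so any permutation of
    # the same codes yields the same result.
    h = 0
    for c in codes:
        h = (h + c) & MASK
    return h

print(combine_ordered([1, 2]) == combine_ordered([2, 1]))      # False
print(combine_unordered([1, 2]) == combine_unordered([2, 1]))  # True
```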

from rhombus-prototype.

rocketnia avatar rocketnia commented on May 25, 2024

@AlexKnauth: A hashing function based on that key method would have to make assumptions, for example always ordered.

Why? If you want ordered hashing, return a list or a vector from the key method. If you want unordered hashing, return a set or a map.

If you can imagine any type that should be hashed the way you want, then all you need to keep using the key method approach is for at least one type like that to be built into the language as a special case. (Since Rhombus will coexist with Racket, one of those special cases might just be the type "a value that performs its comparisons using gen:equal-mode+hash," as much as that kind of undercuts the theoretical purity @countvajhula has in mind.)


...By the way, @AlexKnauth, you mentioned unreduced rational numbers as a counterexample earlier. You'd like to be able to compare them without reducing them, which you say you can do using the gen:equal-mode+hash style. All right, now how would you compute hashes for those values without reducing them?

To make a certain point, I'm gonna assume there's no good way to compute those hashes. (Maybe there is a way, and this will turn out to be the wrong example to use to make this point. Actually, reducing a rational number is probably cheap enough that there's little reason to want to avoid that step anyway, but let's please keep imagining this premise for now, lol.)

Even if we don't have a good way to hash, there may be an alternative. Unreduced rational numbers have pretty cheap ordered comparisons: If their denominators are positive, an/ad <= bn/bd iff an * bd <= bn * ad, so their ordered comparisons can use the same cheap technique you were saying their equality comparisons could use. Certain map data structures like AVL trees use ordered comparisons to get benefits similar to hashing, so we can actually get somewhere with this.
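That cross-multiplication trick can be written out directly (Python, hypothetical helper names; denominators assumed positive, as stated above):

```python
# Sketch of comparing unreduced rationals without reducing them, assuming
# positive denominators: an/ad <= bn/bd iff an*bd <= bn*ad.

def frac_le(an, ad, bn, bd):
    return an * bd <= bn * ad

def frac_eq(an, ad, bn, bd):
    # Equality without any gcd/factoring: two cross-multiplications.
    return an * bd == bn * ad

print(frac_eq(2, 4, 3, 6))  # True  (both equal 1/2)
print(frac_le(1, 3, 1, 2))  # True  (1/3 <= 1/2)
```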

Maybe if a map data structure in Racket has keys that prefer to be hashed, it should be a hash map, but if it has keys that prefer to be ordered, it should be an AVL tree. A compound value like a struct or a collection may contain a mix of values that prefer to be hashed and values that prefer to be ordered, so maybe maps keyed by those compound values need to be tries, with each layer of the trie indexed in the way its values need. To support all three kinds of value at once, a single Racket map could use multiple backing implementations, carrying a hash map for the hashable keys, an AVL tree for each family of totally orderable keys, and a trie for compound keys.
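As a toy of that "multiple backing implementations" idea (a Python sketch; a real design would dispatch on a declared preference rather than try/except, and the trie layer for compound keys is omitted):

```python
# Sketch of a map that routes keys to different backings depending on
# whether the key prefers hashing or ordered comparison (names hypothetical).
import bisect

class HybridMap:
    def __init__(self):
        self._hashed = {}   # backing for hashable keys
        self._keys = []     # sorted order-only keys (e.g. Python lists)
        self._vals = []

    def _hashable(self, key):
        try:
            hash(key)
            return True
        except TypeError:
            return False

    def set(self, key, value):
        if self._hashable(key):
            self._hashed[key] = value
            return
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def get(self, key, default=None):
        if self._hashable(key):
            return self._hashed.get(key, default)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return default

m = HybridMap()
m.set("x", 1)       # hashable key -> dict backing
m.set([1, 2], 2)    # unhashable but orderable key -> sorted backing
print(m.get([1, 2]))  # 2
```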

Now, while our starting point may have been gen:equal-mode+hash, we've found the need for some substantial changes. We'll have new methods for checking what form of indexing to use, for performing ordered comparison, and for iterating through the keys to use at every layer of a trie. We'll need to ask everyone to revise all their gen:equal-mode+hash implementations to support these methods, paying special attention to migrating all their compound types to use trie indexing.

But if we start from a world where every type is built in or uses a key method, then the only types we have to migrate are the built-in ones. The details end up being internal to the language implementation. In this way, the key methods can lead to more efficient support for this new never-to-be-reduced rational number type by giving us the leeway to upend the rest of the comparison infrastructure to accommodate it.

In attempting to give an example of why not everyone can use the key function approach, you may have given an example of the kind of type that would actually be better supported if everyone did.

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

If you want unordered hashing, return a set

I guess that would work, but if it relies on allocating a complex data structure like that... it requires a lot more than just adding fixnums together and letting them wrap around like hash-code-combine-unordered does.

unreduced rational numbers

Yes, computing the hash code for those would require factoring, but the equality procedure doesn't need to use any factoring... this is why equal? shouldn't call hash-proc first. equal-proc in this case is much cheaper than hash-proc.

from rhombus-prototype.

rocketnia avatar rocketnia commented on May 25, 2024

@rocketnia: If you want unordered hashing, return a set

@AlexKnauth: I guess that would work, but if it relies on allocating a complex data structure like that... it requires a lot more than just adding fixnums together and letting them wrap around like hash-code-combine-unordered does.

You're already allocating one data structure, so why not another? If a certain built-in set type has too much extra stuff in its representation, maybe look for a more efficient built-in set type to use in its place.

The internal implementation of hashing for built-in set and map types could add fixnums together just as you describe (if they use hashing at all).

@rocketnia: unreduced rational numbers

@AlexKnauth: Yes, computing the hash code for those would require factoring, but the equality procedure doesn't need to use any factoring... this is why equal? shouldn't call hash-proc first. equal-proc in this case is much cheaper than hash-proc.

I'm making a different point than that. You may have introduced that example originally to show that it wasn't always good to compare by hash first when doing comparisons, but what I'm saying is that if we dig deeper into thinking about that kind of hash-unfriendly type, then all the implementations of gen:equal+hash and gen:equal-mode+hash people have written are prohibitive and need to be migrated to accommodate a case that key methods would already support.

from rhombus-prototype.

AlexKnauth avatar AlexKnauth commented on May 25, 2024

Re: unreduced rational numbers

@rocketnia: to accommodate a case that key methods would already support

No, this is a case where key is worse than an interface with equal-proc and hash-proc, because for the cases where it only needs equal-proc, it doesn't need to factor anything at all. The key method doesn't support that.

However, your comment about an/ad <= bn/bd iff an * bd <= bn * ad making order-based maps possibly more convenient than hash-based maps for these is interesting. But that advantage is completely negated if the comparison function is forced to go through a key method, forced to go through unnecessary factoring.

from rhombus-prototype.

rocketnia avatar rocketnia commented on May 25, 2024

@AlexKnauth: But that advantage is completely negated if the comparison function is forced to go through a key method, forced to go through unnecessary factoring.

With the key method approach, a type with unusual comparison needs like this one would be built into the language core. It wouldn't need to have its own key method. (As @countvajhula phrased it, "It would be better to find ways to incorporate such special cases using special handling rather than design the core equality extension interface around them.")

Once one unreduced fraction type is built in, users can define new unreduced fraction types which use the existing one as their key result, thereby having all the same comparison behavior. (And since it's returning an unreduced fraction, the key method doesn't need to reduce it.) This makes the system general enough to accommodate related ideas, like unreduced fractional complex numbers or intervals with unreduced fractional endpoints, without all these things having to be built in too. Whenever there's a newly discovered unmet comparison need, it becomes a new built-in type that can be composed with all the others, rather than a new generic method that changes the extension interface for everyone.
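That composition story can be sketched as follows (Python stand-in; UnreducedFraction simulates the hypothetical built-in type, and Probability is an arbitrary user-defined type keying to it):

```python
# Sketch: once an unreduced-fraction type is built in (simulated here with
# cross-multiplication equality), a user type can return it from `key` and
# inherit its comparison behavior without reducing anything.
from math import gcd

class UnreducedFraction:
    # Stand-in for the hypothetical built-in type.
    def __init__(self, num, den):
        self.num, self.den = num, den
    def __eq__(self, other):
        # No factoring: an/ad == bn/bd iff an*bd == bn*ad.
        return self.num * other.den == other.num * self.den
    def __hash__(self):
        # A real built-in could hash however it likes internally.
        g = gcd(self.num, self.den)
        return hash((self.num // g, self.den // g))

class Probability:
    # User-defined type whose key is the built-in unreduced fraction,
    # so its key method never needs to reduce.
    def __init__(self, hits, trials):
        self.hits, self.trials = hits, trials
    def key(self):
        return UnreducedFraction(self.hits, self.trials)

def derived_eq(x, y):
    return x.key() == y.key()

print(derived_eq(Probability(1, 2), Probability(3, 6)))  # True
```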

from rhombus-prototype.
