Comments (8)
How does whatever we do work with blocks that return a literal, eg if?
from ponyc.
New plan after long discussion:
Basically we do top down typing for integer and float literals.
We type check expressions bottom up as now. Integer literals are assigned the type "integer literal" and float literals the type "float literal". These are internal types that only exist within the type checker, they cannot be used in Pony code.
When literals are combined by operators the result is still a literal. For example, an integer literal plus another integer literal gives an integer literal. An integer literal multiplied by a float literal gives a float literal.
When something with literal type is combined with something with an actual type, we infer what type the literal should and (if legal) we push that type back down the literal typed tree. For example, if we add an integer literal to something with type U32, then we infer that the literal must also be of type U32.
This works perfectly well for control structures such as if as well as for operators.
In some cases we won't be able to tell the type. For example:
var x = 3
Such cases will be errors. Later on we may add more sophistiocated inference based on use later in the code, which would then make some of those cases legal. However such changes can only make existing error cases legal, they cannot make existing legal code illegal or change the effect of existing legal code.
When the type needs to be specified for a literal, constructor call syntax for the required type is used, eg U8(4) will generate a U8 with value 4.
from ponyc.
(Edit: to make future conversations easier, everything described in this comment wil be referred to as Plan C.)
The current state of my thinking.
Terminology:
UIF type - any unsigned, signed or float type. Equivalent to a "number" type, but less ambiguous in general use.
The basics
When type checking, every literal value is marked as having literal type (with a TK_LITERAL AST node). Literal sub-expressions, ie sub-expressions containing only things with literal type, are also marked as literal type. Each literal type has flags to indicate whether it must be a float, must be an integer or can be either (referred to as a number).
When something with literal type is used we coerce that literal to an actual type. "Using" includes assigning to a local or field, passing as a method argument or returning as a method return value. In all of these cases we know the set of types that are legal (referred to as the "target type") and pick the "best" one or generate an error if none is legal.
Examples:
var x:U64 = 3 // Target type is U64, 3 becomes U64
var x:None = 3 // Target type is None, error
var x:Number = 3 // Target type is Number, 3 becomes F64
foo(3) // 3 becomes type of argument of foo() or error, depending on definition of foo()
The definition of "best" type is based on a simple ordering: F64, F32, I128, U128, I64, U64, I32, U32, I16, U16, I8, U8 with F64 being the best.
When something with literal type is used is such a way that we cannot determine the type it is an error. This includes simply discarding the value and using it as a receiver.
Examples:
var x:U64 = 3; 4 // Error. 4 can become U64, but type cannot be deduced for 3
var x = 3 // Error, cannot do simultaneous literal and local inference
3.foo() // Error, no idea what foo() to call
An exception to this last point is the special functions equivalent to the infix operators. For these if either the receiver or the (single) argument has a known type then other is coerced to that type. If both have literal type then the whole sub-expression has literal type. Note that this requires combining the float / integer / number-ness of the 2 types appropriately, eg "number + float" gives a float and "integer + float" gives an error.
Examples:
3 + 4 // Has type literal number
3.add(4) // Has type literal number
3.0 + 4 // Has type literal float
3.add(U32(4)) // 3 becomes U32, whole expression has type U32
var x:U64 = 3 + 4 // 3 and 4 both become U64
The unary operators are handled similarly.
Control structures
For control structures (if, while, etc) we allow each branch to have a separately inferred type. Note that break statements count as "branches" for these purposes, so there may be an arbitrary number of branches. If the value of the whole control structure is assigned to something with a simple type then all the branches will end up with the same type anyway. But if it is assigned to a union then they may be different.
Examples:
var x:U64 = if b then 3 else 4 end // 3 and 4 both become U64
var x:U64 = if b then 3 else 4.0 end // Error, 4.0 cannot be coerced to U64
var x:Number = if b then 3 else 4.0 end // 3 becomes I128 and 4.0 becomes F64. Type of if block is (I128 | F64)
This means that control structures which have some branches with literal type and some with non-literal type just work.
Example:
var x:(U64 | None) = if b then 3 else None end // 3 becomes U64
If the value given by a control structure is used as a receiver then we have to have special consideration. We use the fact that we can't call methods on unions. This means that all the branches from such a control structure must give the same type. This gives us 3 possible cases:
A. At least 1 branch gives a type that is not a UIF type. This must be an error. Example:
(if b then 3 else None end).foo() // Error
Similarly 2 or more branches with different UIF types is an error. Example:
(while(c) do
if b1 then break U64(3) end
if b2 then break U32(4) end
3
end).foo() // Error
Note that I'm ignoring while else clauses in these examples for the sake of brevity, but they obviously need to be handled.
B. The non-literal branches give exactly 1 UIF type and no non-UIF types. In this case we coerce the literals to the other branch type. Example:
(if b then 3 else U32(4) end).foo() // 3 becomes U32
C. All the branches give literal types. In this case we treat the expression exactly as for a simple literal receiver above, except that combining the arguments is more complex since each might be a union of literals. This is why the properties float, integer and number need to be flags. Example:
(if b then 3 else 4 end).add(5) // Has type literal number
(if b then 3.0 else 4 end).add(5) // Has type literal float
(if b then 3.0 else 4 end).add(1 << 2) // Error, float + integer
(if b then 3.0 else 4 end).add(if b2 then 5 else 6.0 end) // Has type literal float
(if b then 3.0 else 4 end).add(if b2 then 5 else 1 << 6 end) // Error, float + integer
(if b then 3 else 4 end).foo() // Error, don't know what foo() to call
Break non-locality
One important case to watch out for is that break statements may have their value inferred, but may also appear within some other expression that is inferred completely separately. Example:
var x:U32 = (while c do
var y:F64 = (if b then break 3 else 4 end)
5
end)
The 4 becomes F64 because it is part of the y inference. However, the 3 becomes a U32 (as does the 5) because it is part of the x inference. This has 3 key implications:
A. Break statements do not affect the type of their immediate surroundings. The if block is only a literal at all because of the 4, not because of the 3.
B. When we coerce a literal sub-expression to a particular value we can only set the type of the literals that are actually part of that value, not all that happen to appear within the AST. In this example when we coerce the if block to be F64 we must take care NOT to coerce the 3 to F64, but to leave it as literal.
C. The if block only actually has 1 branch that gives back a type. When handling control branches we need to check for ones which do not contribute a type and act accordingly.
Since control structures have to be able to coerce their break values, independent of where they are within the control structure, we have to be able to find the breaks. To enable this literal types for control structures need to contain a list of pointers to the ASTs (not just the types) of all their branches, including breaks. An additional TK_ value is used to build a linked list of nodes whose data members point to the relevant ASTs.
Tuples
For tuple literals we handle each tuple element separately. Any tuple expression with at least one element with literal type is itself considered a literal type. A literal tuple type contains a list of element types, each of which is either a non-literal type, a non-tuple literal type or a nested literal tuple type. Example:
(1, None, (2, 3.0)) // Has type literal tuple (literal number, None, literal tuple (literal number, literal float))
When determining the "best" tuple type we pick the best type for the first literal element and if that is a tie we pick the best type for the second literal element and so on. Example:
var x:((U32, U32) | (F64, U8)) = (1, 2) // 1 becomes F64 and 2 becomes U8
Methods cannot be called on tuples (and hence infix operators cannot be used on them). This means that we do not need to worry about special cases above for handling literal receivers, this is always an error. Example:
var x:(U32, U32) = (1, 2).add((3, 4)) // Error
When tuples are combined with control structures we may end up with types that are unions of different tuple cardinalities (including 1, ie not a tuple). This is actually fine and just works out. Examples:
var x:(None | (U16, U32)) = if b then None else (3, 4) end // 3 becomes U16 and 4 becomes U32
var x:(U8 | (U16, U32)) = if b then 3 else (4, 5) end // 3 becomes U8, 4 becomes U16 and 5 becomes U32
Complex tuple target types with control structures
When we consider control structures and complex tuple target types our definition of "best" type falls down. For any literal sub-expression we may be comparing a pair of types, one for floats and one for integers. Example:
var x:((U64, (F64 | U8)) | (U64, U32)) =
(2, if b then 3 else 1 << 4 end)
Clearly 2 must become U64. But we have a choice for the other literals:
A. 3 becomes F64 and 1 and 4 become U8, or
B. 1, 3 and 4 all become U32.
The rule we use is that first we pick the best float type (if any) and then the best integer type. So in this example we use option A.
Using formal parameters
When a target type includes a formal parameter we cannot necessarily determine the correct type to use for a literal. Instead we split the task into 2 parts:
A. We check at compile time that we are always guaranteed to have at least one legal type for the literal once reification occurs.
B. When reification occurs we then pick the actual types for the literals, which must always be possible because we already performed A.
Examples:
fun box f[A: Number]() => var x:A = 3 // OK, can always pick a legal type for 3
fun box f[A: Signed]() => var x:A = 3 // OK, can always pick a legal type for 3
fun box f[A: Number]() => var x:A = 3.0 // Error, f[U32] does not work
fun box f[A: (Number | None)]() => var x:A = 3 // Error, f[None] does not work
Note that we can always pick the best type at reification even if we don't know if it's going to be the type parameter before reification. Example:
fun box f[A: Number] => var x:(A | U64) = 3
f[F64]() // 3 becomes F64
f[U16]() // 3 becomes U64 (even though A is U16)
For more complex examples involving formal parameters exactly the same principle is followed. Example:
fun box f[A: Float, B: Number]() =>
var x: ((U32, A) | (I32, B)) = if b then (1, 2.0) else (3, 4) end
// OK, but don't know types until reification
f[F64, U32]() // 1 becomes U32, 2.0 becomes F64, 3 becomes I32, 4 becomes U32
f[F32, F64]() // 1 becomes I32, 2.0 becomes F64, 3 becomes I32, 4 becomes F64
Note that the 2 branches of the if block are inferred separately.
Canonical form
The combination of tuples, intersections and formal parameters make the implementation of determining whether there is a legal type (and what it is after reification) a nightmare. To get round this we convert the target type to a canonical form before we start coercion. To do this we push intersections inside tuples and unions, leaving us with intersections only of simple types and formal parameters (which cannot be tuples). This will be a bit of a pain to implement, but makes everything else much easier.
from ponyc.
This is the outline of the new simplified plan D.
Whereas in plan C we aimed to always acheive the correct result, here we aim to acheive a correct result where sensible, but be a lot more willing to say we can't cope with nasty cases. The aim is to handle all "reasonable" cases and require the programmer to explicitly specify the type of literals in "unreasonable" cases. Obviously the definition of "reasonable" is highly subjective.
There are a few key changes from plan C. These are all interdependent to various degrees, so we cannot treat or apply them independently.
- We no longer pick the "best" type, all ambiguity is treated as an error. Whilst picking the best type does simplify some simple cases it makes handling very complex cases much harder, partly because it makes it very difficult to determine when we're in a very complex case. Treating all ambiguity as an inference failure makes spotting and bailing out of complex cases very easy.
- We only infer literal types assigned to single UIF type and formal parameters with constraints that are subtypes of [A: (Real[A] & Number)]. (Assigning here includes passing as an argument etc, as in plan C.)
- Rather than trying to infer the types of tuples as single unit we handle each tuple element separately. This means we will now sometimes pick types for literals that actually work out to be illegal, but these errors will be caught by the type check for the coercing assignment.
- Each literal is inferred to be exactly one of a single UIF type, a single formal parameter or an error. No complex compound types are allowed. This means there is no post-reification step.
- As stated above, we do not try to deduce overly complicated types, if it gets too hard give up. This is only possible because of the other changes.
As part of the general simplification we will be giving up on complex type expressions much more quickly. For example, given 2 random types T1 and T2, consider this subtype expression:
((T1 | T2) & T1)
This is actually equivalent to type T1 and T2 is irrelevant. We used to try to work through this, now we don't bother. If this gets too hard (which depends on what T1 and T2 are) we give up, even though it always can be reduced to T1.
There are many things we can do to aid in cases like this. For plan D we will start by implementing exactly none of these. Since only very complex types will benefit from such tricks, it is probable that they will never actually be used. If we do end up running into such complex types then various heuristics can be implemented on an ad hoc basis.
The basics
The basic examples are unaffected by this change.
var x:U64 = 3 // Target type is U64, 3 becomes U64
var x:None = 3 // Target type is None, error
foo(3) // 3 becomes type of argument of foo() or error, depending on definition of foo()
var x:U64 = 3; 4 // Error. 4 can become U64, but type cannot be deduced for 3
var x = 3 // Error, cannot do simultaneous literal and local inference
3.foo() // Error, no idea what foo() to call
Unions
One case that has changed is assigning to a union. Since this is ambiguous it is now an error:
var x:Number = 3 // Error, ambiguity
Assigning to a union is OK if there is only one appropriate UIF type in the union. Examples:
var x:(U64 | None) = 3 // Target type is (U64 | None), 3 becomes U64
var x:(U64 | F64) = 3.0 // Target type is (U64 | F64), 3.0 becomes F64
Control cases can still have different types for different branches, even if both are literals. Example:
var x:(U32 | None) = if b then 3 else None end // 3 becomes U32
var x:(U32 | F64) = if b then 1 << 2 else 3.0 end // 1 becomes U32, 2 becomes U32, 3.0 becomes F64
But each branch must be unambiguous:
var x:(U32 | F64) = if b then 2 else 3.0 end // Error, 2 could be U32 or F64
Tuples
For tuples, in plan C we checked the types of all elements. Example:
var x:(Foo, U32) = (x, 1)
In plan C: if x is a Foo then the 1 becomes U32. If x is not a Foo then we could not infer a type for 1, error.
In plan D: we always infer that 1 is U32 and we ignore the Foo element. If x is not a Foo the assignment type check will spot this, so it's not our problem.
Given a more complex example:
var x:((Foo, U32) | (Bar, U64)) = (x, 1)
In plan C: we would check if x was a Foo and if x was a Bar pick a type appropriately. If x was neither a Foo or a Bar an error would occur.
In plan D: we ignore the first element and take the target type for the second to be (U32 | U64). This is ambiguous, so the type is not inferrable.
Another tuple example:
var x:((Foo, None) | (Bar, U32)) = (x, 1)
In plan C: again we would compare the type of x to Foo and Bar before deciding whether to coerce the 1 to U32 or generate an error.
In plan D: the target type for the second element is (None | U32) so the 1 becomes U32. If x is not a Bar this is an error, but it will be picked up by the assignment.
Formal parameters
Since we now limit literal assignments to constraints that are subtypes of [A: (Real[A] & Number)] the example in Range still works fine:
fun box create[A: (Real[A] & Number)]() => var inc:A = 1 // 1 becomes type A
Since [A: (Real[A] & Number)] can only ever be bound to a single type (not a union) this is unambiguous.
Since our rule specifies the constraint is a subtype this still works for restricting to smaller sets of UIF types. Examples:
fun box create[A: (Real[A] & (U8 | U16))]() => var inc:A = 1 // 1 becomes A, either U8 or U16
fun box create[A: (Real[A] & Float)]() => var inc:A = 1.0 // 1.0 becomes A, either F32 or F64
What we have outlawed is assigning literals to formal parameters with the looser constraint of just Number. Example:
fun box create[A: Number]() => var inc:A = 1 // Error, ambiguous since A may be a union
I have not come up with an example where this would actually be a problem, requiring A to possibly be a union and still needing to assign a literal to it. Since they may be unions you can't do anything with the values anyway, other than store them, hand them out or pass them through a match. If it does ever actually come up an additional formal parameter can be used to specify the initialiser type. Example:
fun box create[A: Number, B: (Real[B] & A)]() => var inc:A = B(1)
Assigning to unions of formal parameters and other formal parameters or UIF types has also been outlawed, since it is ambiguous. Examples:
fun box create[A: (Real[A] & Number), B: (Real[B] & Number)]() => var inc:(A | B) = 1
fun box create[A: (Real[A] & Number)]() => var inc:(A | U32) = 1
Both the above are errors due to ambuguity.
Canonical form
Since we no longer have to deal with intersections of unions of complete tuples we no longer need to convert types to canonical form.
After thought
Reading back this comment having written it, it doesn't really give the impression that plan D is particularly different to plan C, other than for formal parameters. However, the difference in complexity for the implementation is huge.
from ponyc.
This is now done, but there are a few edge cases to tidy up, so I'm leaving this open for a little while.
from ponyc.
I forgot about inference in match patterns. This won't be hard to do, but I want to check I have the right plan before I start.
Given:
match expression
| pattern =>...
- We don't infer a type for the expression, because why would you want to do "match(3)" and what type could that possibly be?
- The type for the pattern is inferred to be the type of the expression.
- All the patterns in a given match infer to the same expression type. There's no inferring to previous patterns or anything like that.
from ponyc.
- Yes. But we should make sure that if the match expression is a literal, we don't assert :)
- Yes. Of course, it could be some subtype of the expression, but that's exactly what the literal inference does anyway.
- Yes. Patterns are independent of each other, they are only dependent on the match expression.
from ponyc.
All done (hooray)
from ponyc.
Related Issues (20)
- Release 0.56.0 HOT 3
- Release 0.56.1 HOT 2
- ponyc segfaults when trying to match using a single element tuple form HOT 6
- Windows TCP tests are very flakey HOT 13
- Compilation error when using memalign HOT 1
- GDB script for running tests appears broken
- Get cross compilation tests working in gdb
- Windows CI sometimes hangs HOT 1
- Backend null in epoll during musl CI run
- pony_os_peername crash
- Experiment: Change Pony errors to use return flags (internally) rather than exceptions HOT 2
- Release 0.57.0 HOT 4
- Incorrect dtrace probe information HOT 8
- Possible infinite loop in ponyint_formattime
- dtrace support is missing from the Makefile HOT 3
- Trait default method segfault HOT 4
- Dependent default types HOT 4
- x86 Mac OS Sonoma Compile Error with XCode 15 and HOT 21
- Backend null in epoll during CI run HOT 3
- Non-sendable data seen as sendable
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from ponyc.