webassembly / tail-call Goto Github PK

Proposal to add tail calls to WebAssembly

Home Page: https://webassembly.github.io/tail-call/

License: Other

Makefile 0.18% Python 4.99% Batchfile 0.18% CSS 0.01% Shell 0.06% OCaml 3.43% Standard ML 0.01% JavaScript 2.23% HTML 0.03% WebAssembly 88.07% Perl 0.13% TeX 0.01% Bikeshed 0.68%

proposal

tail-call's Introduction

Tail Call Proposal for WebAssembly

This repository is a clone of github.com/WebAssembly/spec/. It is meant for discussion, prototype specification and implementation of a proposal to add tail call support to WebAssembly.

See the overview for a summary of the proposal.
See the modified spec for details.

Original README from upstream repository follows...

spec

This repository holds a prototypical reference implementation for WebAssembly, which is currently serving as the official specification. Eventually, we expect to produce a specification either written in human-readable prose or in a formal specification language.

It also holds the WebAssembly testsuite, which tests numerous aspects of conformance to the spec.

View the work-in-progress spec at webassembly.github.io/spec.

At this time, the contents of this repository are under development and known to be "incomplet and inkorrect".

Participation is welcome. Discussions about new features, significant semantic changes, or any specification change likely to generate substantial discussion should take place in the WebAssembly design repository first, so that this spec repository can remain focused. And please follow the guidelines for contributing.

citing

For citing WebAssembly in LaTeX, use this bibtex file.

tail-call's People

Contributors

Stargazers

Watchers

Forkers

webghc 020782 dalavancloud wingo iamthatiam777 noverby mnordine isabella232 seanpm2001

tail-call's Issues

Add more administrative instructions or modify the existing ones?

We have part of an implementation of this proposal over here. It adds two new call instructions and an optional annotation to function types to describe whether they are tail recursive or not. I believe the change is backwards compatible for existing code.

It needs more tests and documentation for the execution part, but other than that it seems to be coming together. I also need to remove some commented out code that shows me the stack depth as the interpreter runs.

I've made the execution part work with the addition of a couple of new administrative instructions. I'm about to try to tidy those up and document them (and then figure out what I need to do able host functions).

Before I do that, I thought I should ask in here: is there a preference between adding new administrative instructions and modifying the existing ones to carry more information?

I figure adding new instructions means that the proposal is self contained, whereas modifying existing administrative instructions risk having an effect elsewhere. I don't know what people's thoughts are with regard to whether backwards compatibility is more desirable / less desirable than keeping the number of administrative instructions low.

I haven't actually thought about how to modify the existing administrative instructions to do tail calls, but I can cross that bridge once I get to it.

[test] Test tail-call to imported function

There aren't any tests yet that use imports or attempt to tail-call an imported function -- there probably should be one (this would have caught a bug in wasm2c, WebAssembly/wabt#2420).

Rebase on spec repo?

There have been some spec interpreter improvements since this repo was last merged with the mainline spec repo. Any chance those could be merged in here as well?

Progress toward phase 4

The phase 4 requirements are:

Two or more Web VMs implement the feature.
- Partially done. Chrome supports tail calls behind the #enable-experimental-webassembly-features flag. No other engines have implementations according to https://webassembly.org/roadmap/.
At least one toolchain implements the feature.
- Done. LLVM and Binaryen have complete support.
The formalization and the reference interpreter are usually updated (though these two can be done as part of step 3 at the Working Group chair's discretion).
- Done (AFAICT).
Community Group has reached consensus in support of the feature.
- Judging from the lack of open issues on this repo, this shouldn't be a problem.

Is there anything left to do besides get another web engine implementation?

Tail calls in Schism

I have a branch of Schism (a self-hosting Scheme-to-Wasm compiler) that makes use of the new return_call and return_call_indirect instructions: https://github.com/eholk/schism/tree/tail-call-instr

I thought I'd give some feedback on the proposal from the standpoint of a language implementation.

Adding tail call support to Schism

Adding the new opcodes was a pretty straightforward affair. I chose to do it by implementing a new pass, replace-tail-calls, that finds all calls in tail position with tail call instructions instead. The rest of the work was just forwarding those changes through the compiler until the encoding step where it outputs the right opcodes.

The result validates in the spec interpreter, but I don't have a VM to run the output in yet. As time allows, I hope to modify the spec interpreter to implement Schism's runtime so that will be an option for running the output.

Schism's requirements from the tail call proposal

Scheme requires that an unbounded sequence of calls in tail position use a constant amount of stack space. All of Scheme's iteration facilities are expressed in terms of tail recursion (although some compilers recognize special cases and generate more traditional loops instead). Whereas in C and C++, tail recursion is an optimization the compiler may do, in Scheme tail recursion is not an optimization and is required for correctness.

So what does Scheme need from Wasm? Technically nothing. It's possible to generate a giant function with a big loop and a switch. It sounds like this is more or less what Go is doing. While this technically works, it's less than ideal. For example, it makes linking several Schism-compiled modules together impossible.

If Wasm has tail call instructions, Schism will happily use them. The most important thing for these instructions is that they are an honest-to-goodness tail call. For example, LLVM has the tail and musttail annotation, where the tail annotation can be ignored. Schism needs the semantics of musttail or else the instructions will be useless. Fortunately, this is the direction we're going with the proposal.

The ideal case for Schism is that Wasm supports tail calls from any Wasm function to any Wasm function, including to imported functions that were exported from Wasm. This will make for the easiest implementation and give the most flexibility. As far as Wasm-to-JS calls, I think it is fine if these cannot be done as tail calls.

As far as whether may-tail-call or may-be-tail-called should be tracked in the type system, this doesn't particularly matter for Schism. All functions in Scheme need to be able to make tail calls and be tail called, so if extra type annotations are required, Schism will just use them unconditionally.

There was a suggestion of making the tail call annotation a per-module setting. In my mind, I would think of this as a section that provides requirements on the calling convention, similar to how LLVM supports different calling conventions. Basically, the new calling convention section would have the option to say "this module requires a calling convention that supports proper tail call handling." If we do this route, I think it's important to be a specced section, rather than a custom section like most of the hint section proposals have been. The reason is that custom sections are allowed to be completely ignored or even dropped (for example, you could imagine a CDN that drops custom sections on the fly to save bandwidth), whereas Schism needs to be able to rely on tail calls being tail calls.

Another suggestion was to only support tail calls with the caller and callee signature are the same (so-called natural tail calls, although from my perspective this is an unnatural restriction on tail calls). This is actually more workable for Schism than I thought at first. While I haven't settled on a design yet, there's a reasonable chance that all functions in Schism will have the same signature, due to the way Scheme does varargs. Still, even if all functions have the same signature, these would probably be done as a wrapper function with the standard signature that calls the real function with a more precise signature. Schism would then be able to directly call the precise function in a lot of cases. This optimization would not be possible if tail calls can only go from the same signature to the same signature. Basically, Schism could work with this restriction, but it would significantly limit what it's able to do.

Anyway, I'm excited to see Wasm get support for tail calls. Hopefully this perspective from at least one language implementation is helpful.

Calling tail callable functions in non-tail position.

With most platforms' C ABI, the caller is responsible for popping the arguments to the callee after it returns. This helps to support getting the varargs ABI for free, among other things. With most tail calling conventions, when calling a function that may perform a tail call, the callee is usually the one responsible for popping its own arguments, so that it may reuse the stack frame for its successor. The tail called function will return by popping its own arguments and jumping to the return address. Thus, the calling convention for calling a function that may perform a tail call is somewhat different than the typical C calling convention. In fact, I believe that the typical tail calling convention is such that any function which may perform a tail call is also tail callable itself. i.e. If a function either performs a tail call, or is tail callable, it must pop its own arguments and jump to the return address. This is somewhat nice in that we only need one signature modifier; there is no distinction between "performs a tail call" and "is tail callable."

So a caller cannot perform a normal call on a callee that is tail callable, even though that caller is not doing a tail call. If it is the case that a function which performs a tail call uses the same calling convention as a tail call, then we can't simply create a normally callable wrapper that performs the tail call. Essentially, we must support tail call instructions in non tail positions. This isn't problematic, overall. The caller can simply push a custom return address before pushing args and jumping to the tail-callable function. This is similar to the typical C calling convention, except that it omits the step where it pops the arguments, as the callee will have done that itself.

But this raises the question: Should this functionality be a separate instruction? i.e. should there be a push_call instruction for calling tail callable instructions in non-tail positions? Or should the call instruction simply infer that it should use this behavior from the function signature, which indicates the calling convention?

I lean towards the former, that there should be a new instruction. It seems easier on hosts if they don't have to look up extra information while compiling call instructions. But I can imagine how bloating the number of instructions we have could be seen as problematic.

Alternatives to tail call

I am a strong supporter of the desired functionality of TCO.
However, there are alternative implementation strategies.
In particular, instead of having a special tail_call instruction, we could support garbage collection of the call stack.
This requires two elements: a reliable 'stack overflow' exception (with enough guaranteed room to execute a gc handler) and a way of walking the stack.
The merit of this approach is that it is (a) cheaper on an amortized basis and (b) should allow more flexibility on implementation of the ABI.

Question on design

I think this is the best place to post this question, and if not, please direct me to where I should go. My question is this: why add an instruction to enable properly optimized tail calls limited? Why not just make all function calls able to be TCO'ed? The only reason for this that I can think of is social in nature, not technical. As in, most languages don't "need" TCO, and so don't have it. This is like saying a language doesn't "need" structs/records to use adjacent pieces of memory, so your struct is a linked list. In some ways easier to implement, but vastly slower in performance...

So really, is there a good, technical reason for separating it out into a separate instruction? Or are we beholden to our Corporate OO Overlords? It almost seems the latter, as TCO doesn't even have a base implementation in any browser as of yet... 😢 https://webassembly.org/roadmap/

Stack trace associated with failing return_call_indirect

Currently there are no strict requirements for produced stack in the ~~WebAssembly.Exception~~ Error. I wonder what will be expected stack trace for the return calls.

SpiderMonkey implementation of call_indirect checks the function signature at the callee side. Information about originating call site will be lost. The trap handler has no association of the trap code and its source/origin, so we are planning to start stack trace from the previous frame, or keep it empty if there are no other frames.

The https://github.com/WebAssembly/tail-call/blob/main/proposals/tail-call/Overview.md#execution-1 has no instructions on the subject.

Multiple tail call instructions, or a prefix instruction?

Opening this issue to discuss this question from the proposal overview.

Use case: indirect jumps in x86 virtualization

We are currently developing an x86 virtualization solution based on WebAssembly, which is in a fairly advanced stage. It does not seem appropriate to get into details here, but if you want to know more we have presented some details of the JIT architecture at the Wasm SF Meetup back in February: https://www.youtube.com/watch?v=7JUs4c99-mo&t=4s (first 30secs or so of audio are missing. Apologies.)

In our latest post (https://medium.com/leaningtech/extreme-webassembly-1-pushing-browsers-to-their-absolute-limits-56a393435323) we discuss how arbitrary control flow is virtualized and in particular how the efficient implementation of indirect jumps requires tail calls. (And a first public demo of our technology is included)

I wanted to share this information with the community to provide a real use case for Wasm tail calls and possibly spur a discussion about the current state of the implementations and about the timeline for standardization of the feature. I apologize if a repo issue is not the right place for this, suggestions for a more suitable location are welcome.

We plan to release a followup post in a few weeks to detail the limitations we have found in current implementations of tail calls on how these limitations impact the project.

Calling Convention Features Section

I've been thinking more about @flagxor's suggestion to add a module-level bit that specifies functions in a given module should be compiled to support tail calls.

At first I was somewhat skeptical of this idea, because I interpreted it as a custom section that is not part of the core spec that includes tail calling just one of many optimization hints. For example, we've also tossed around the idea of having tools provide register allocation hints that could be freely ignored but could be used to do better register allocation at codegen time based on analysis the tools do at their leisure. To me, tail calls are a matter of correctness and not merely optimization, so I'm uncomfortable making this a section that can be ignored.

However, thinking of it as a section that lists calling convention requirements makes more sense to me. It's common for compilers to offer several different calling conventions. For example, LLVM lists support for 12 different conventions currently. Some of these conventions support certain features, while others don't. If we imagine Wasm impementations also supporting a number of conventions, we could the module specify which features it needs from the calling convention. Wasm engines would then choose their best calling convention that meets the requirements.

I'm not convinced yet that this is the way to go (I'd still prefer tail calling always be supported), but I think this is an idea that's worth exploring.

Some questions:

How would this impact validation? If I don't list tail calling as a requirement in the calling convention section, do tail calls refuse to validate? Do they silently become regular calls?
Related to validation, how should this be reflected in the type system? Does it actually simplify anything compared to having tail-callability in the function signature?
Are there any other features besides tail calling that we could imagine including in the signature? Would GC or exception handling benefit from placing some constraints on the calling convention?
How much benefit is there actually to choosing different conventions based on different feature requirements?

Clarification request: tail calls in unreachable code vs. multi-return

In reachable code the result types of any tail-called function must obviously match those of its tail-caller, i.e. there must be the same number of them, and they must be subtypes of them.

In unreachable code, I'm confused.

Interpretation 1

When starting with the understanding that "(return_call $x) is like (call $x) (return)" (which gives an accurate intuitive understanding in the common case, and has the nice property that it's a Wasm-to-Wasm transformation), it follows that the number of results of $x doesn't matter in unreachable code, since the polymorphic stack takes care of mismatches; but to the extent that they are present and overlap with the tail-calling function's result types, they must still be subtypes of those.

In other words:

(func $produce-i32 (result i32) (i32.const 0))
(func $valid
  unreachable
  return_call $produce-i32)
(func $valid-too (result f64) (result i32)
  unreachable
  return_call $produce-i32)
(func $invalid (result f64)
  unreachable
  return_call $produce-i32)
(func $invalid-too (result i32) (result f64)
  unreachable
  return_call $produce-i32)

This was my initial assumption, but after thinking about it for a while, I'm beginning to think that was a misunderstanding on my part.

Interpretation 2

When starting with the understanding that "return_call $x is like stashing away the call parameters (in a 'magical' way that isn't expressible in terms of other Wasm instructions), returning from the tail-calling function, restoring the call parameters on the value stack, and then calling $x", it follows that reachability inside the tail-calling function is irrelevant, and the result types of $x must always have the same arity as and be subtypes of the result types of the tail-calling function.
Is that correct?

I would appreciate it if this was explicitly clarified in the spec text and/or tests. The current test suite in this repository passes with either of these two implementations of the validation algorithm.

I stumbled upon this because V8's implementation is slightly inconsistent. We implement return_call and return_call_indirect per Interpretation 2, but (for historical and possibly accidental reasons) have a single test case that requires Interpretation 1 for return_call_ref. Obviously that should be unified, and in order to do so, I'd like to get clarity on what the right behavior is.

(Side note: this is one more chapter in the story of unreachable code validation consuming annoying amounts of time and energy to get right, while having next to zero practical relevance.)

Use case: supporting co_await in emscripten

emscripten-core/emscripten#10991 (comment)

see comment from @GorNishanov

Question about tail-invocation stack value length assertion (item 3)

I'm reading through the spec, and https://github.com/WebAssembly/tail-call/blame/master/document/core/exec/instructions.rst#L1134 says that there must be at least m values on the stack, but then proceeds to pop n items from it

Should that m be n? Or am I misunderstanding something?

Including tail-callability in signatures

An open question in the design space is whether to include tail-callability in function signatures. This would provide greater consumer flexibility for functions that are not tail-called. For example, non-tail-callable functions wouldn't be restricted to callee-pop calling conventions (on platforms where that applies).

This would also provide a tidy answer to the open question about tail calls to host functions: each host function would indicate its tail-callability in its signature.

Performing a tail call to a non-tail-callable function, or importing a non-tail-callable export as tail-callable, would have the same behavior as a type signature mismatch.

Performing a non-tail call to a tail-callable function, or importing a tail-callable export as non-tail-callable, could either:

behave as a type signature mismatch (simpler for spec and consumers, less flexible for producers), or
succeed (modest subtyping, more flexible for producers).

Tail-invoking host functions?

Does this proposal include tail-calling of host functions? The spec seems to only mention regular non-return invoking of host functions, but I didn't see anywhere that it explicitly says it's not possible.

Thanks!

Expand motivation section of overview

Some feedback I've heard about this proposal is that the motivation is unclear for why this has to be an operation in core wasm.

Specifically I've had it asked, "why can't this be done by compilers targeting wasm"? I don't think the motivation section goes into this at all, and I think that would be useful.

The best answer I've been able to give is that this is not possible for return_call_indirect, which is basically an indirect jump and has no close analogue in core wasm. I'm not as knowledgeable though about the general case of a functional language wanting to compile to wasm and the feasibility of performing tail call elimination in a compiler. I think for posterity it would be nice to have something to point to on this.