webassembly / spec

WebAssembly specification, reference interpreter, and test suite.

Home Page: https://webassembly.github.io/spec/

License: Other



spec's Issues

Segments always being char*, should we not allow data as well?

Right now, this is fine for testing but the spec defines this as being:

segment: ( segment <int> "<char>*" )

I am supposing this is just temporary and at some point, in the binary format, we will actually have something more like:

segment: ( segment <int:start> <int:len> <data of len size>)

or:

segment: ( segment <int:start> <int:len> <offset to data of len size>)

Perhaps char* was chosen for the text format precisely because this was meant to be temporary. Now, however, I would love to have a means to initialize some integer/floating-point arrays to do some more testing.

However, I don't want to move away from the ml-proto support. Could we perhaps extend this to something like:

init_data: byte_hex init_data |
segment: ( segment <int> "<char>*" ) | ( segment <int> <int> (init_data) )

This would allow us/me to have segment define integer arrays or whatever we want.

Opinions/Criticisms/Did I miss something?
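
As a point of comparison, the existing string form can already carry arbitrary bytes if every byte is written as a two-digit hex escape, the \hh form that segment strings elsewhere in these issues use. Below is a minimal Python sketch of generating such strings for integer and float arrays. It assumes little-endian layout and that the parser accepts \hh for every byte value; segment_string, i32_array, f32_array and segment are hypothetical helper names, not part of ml-proto.

```python
import struct

# Hypothetical helpers for illustration; none of these exist in ml-proto.

def segment_string(data: bytes) -> str:
    """Render raw bytes as a run of two-digit hex escapes (\\hh)."""
    return "".join("\\%02x" % b for b in data)

def i32_array(values):
    """Pack integers as little-endian 32-bit words."""
    return b"".join(struct.pack("<i", v) for v in values)

def f32_array(values):
    """Pack floats as little-endian IEEE 754 single-precision words."""
    return b"".join(struct.pack("<f", v) for v in values)

def segment(offset: int, data: bytes) -> str:
    """Build a segment in the current ( segment <int> "<char>*" ) shape."""
    return '(segment %d "%s")' % (offset, segment_string(data))

print(segment(0, i32_array([1, -1])))
print(segment(8, f32_array([1.0])))
```

This keeps the grammar unchanged at the cost of readability, which is the trade-off the proposed init_data form would remove.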

Significant performance regression

I updated to the latest revision of the spec and ml-proto is much slower than it used to be. The NBody test (https://github.com/WebAssembly/ilwasm/blob/master/third_party/tests/NBody.cs) now utilizes ~100% cpu for multiple seconds while running; it used to run at a reasonable speed.

Do we care about the performance of ml-proto? Overly slow execution will hinder our ability to iterate and bias us away from real-world tests, I think.

Document behavior of label name reuse.

Currently, the spec does not say if label name reuse is allowed or not:

f3015ed#diff-d372e456aea857e846293c5f4add7c81R156

I'm not sure what the current implemented behavior is, but I suppose there are only two options:

Not allowed.
Allowed, but scoped (not allowed if ambiguous).

Split tests out into a separate repository?

We want to make it as easy as possible for people developing WebAssembly implementations to run the official testsuite, because that's one of our main mechanisms for ensuring portable behavior. @Teemperor suggested on irc that we split the tests out of the spec repo into their own repo, so that they can be included as submodules in other projects.

I think this sounds like a good idea. Any objections?

Mystery syntax error on file suspiciously near 65536 bytes in size

My branch here:

https://github.com/WebAssembly/spec/tree/large-file-syntax-error

has a (work-in-progress) test in test/float64.wasm. It currently fails mysteriously with

test/float64.wasm:524.130-524.130: syntax error

Line 524 is syntactically correct. Deleting seemingly any single assert line from the file makes the syntax error go away. Suspiciously, the file size is just over 65536 bytes.

Anyone have any clues as to what could be going on here?

Multiple return values mismatch with AstSemantics

AstSemantics suggests that multiple return value calls are possibly pushed to post-MVP but they are present in the spec implementation (complete with test case). Is spec supposed to represent MVP or something more than that?

Separation into classes?

I am having a hard time deciding how to represent WebAssembly in a C++ class hierarchy in binaryen, and would appreciate advice. Right now I have a separate class for call and call_import, for example, although I have just one if whose else might be null, instead of two classes, and that feels inconsistent, but I'm not sure which way to refactor it.

I also see that we have br and br_if which have an optional value argument. That seems to be more consistent with having one if class with an optional argument. Is that intended?

Load/store default alignment?

I can't find anything in the spec interpreter that defines the default alignment of loads/stores. If an align= attribute is omitted, then the memop's align member is None, and the evaluator just ignores the alignment. Can it be made more explicit what the default alignment is?

br_if question: order of evaluation

As always, it seems I come back to ensure that I'm reading things right. With the new testsuite, I have run into an issue with the labels.wast test.

Before I go into the problem I am facing, let me first mention that it seems that the ml-proto is supporting:
(br_if (i32.const 1) $outer (set_local $i (i32.or (get_local $i) (i32.const 0x10))))

But the https://github.com/WebAssembly/spec/tree/master/ml-proto file says only:
( br_if <expr> <var> ) ;; = (if_else <expr> (br <var>) (nop))

When it should really be:
( br_if <expr> <var> <expr>?) ;; = (if_else <expr> (br <var> <expr>?) (nop))

no?

Second, this last part seems not to be entirely true because the order of evaluation seems then off. When I consider the test below:

  (func $br_if (result i32)
    (local $i i32)
    (set_local $i (i32.const 0))
    (block $outer
      (block $inner
        (br_if (i32.const 0) $inner)
        (set_local $i (i32.or (get_local $i) (i32.const 0x1)))
        (br_if (i32.const 1) $inner)
        (set_local $i (i32.or (get_local $i) (i32.const 0x2)))
      )
      (br_if (i32.const 0) $outer (set_local $i (i32.or (get_local $i) (i32.const 0x4))))
      (set_local $i (i32.or (get_local $i) (i32.const 0x8)))
      (br_if (i32.const 1) $outer (set_local $i (i32.or (get_local $i) (i32.const 0x10))))
      (set_local $i (i32.or (get_local $i) (i32.const 0x20)))
    )
  )

The assertion requires this method to return 0x1d. This means, we evaluate the sets with the 0x1, 0x4, 0x8, and 0x10.

It is the 0x4 that I have problems with:
(br_if (i32.const 0) $outer (set_local $i (i32.or (get_local $i) (i32.const 0x4))))

In this case, we thus evaluate the child first, before branching. Therefore, this is not equivalent to putting an if around the (i32.const 0) and doing a br with the label and the set_local expression.

So, which is it?

Is br_if a short-hand for an if with a branch, ie:
( br_if <expr> <var> <expr>?) ;; = (if_else <expr> (br <var> <expr>?) (nop))
Is br_if a short-hand to:
( br_if <cond_expr> <var> <expr>?) ;; = evaluate the expr if present and remember the result, then evaluate <cond_expr> and, if it is true, jump to <var> while returning the result

Not sure I was entirely clear here but bringing up little issues I see.

Note about what the design repo says:

"All nodes other than control flow constructs", so this does not work here, right?
The Branches and nesting section does not express when the br_if should evaluate the expression if it is there

Thanks for your input and I hope you all don't mind me spamming about these questions :)
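
The two readings of br_if can be compared with a small simulation of the $br_if function from the test. This is a sketch in Python, not the interpreter's code: blocks are modelled with exceptions, the helper names (Branch, block, run) are hypothetical, and the function returns the final value of $i, which in both readings equals the value carried by the taken branch.

```python
# Hypothetical model of the labels.wast $br_if function; not ml-proto code.

class Branch(Exception):
    """Raised to unwind to the enclosing block with the given label."""
    def __init__(self, label):
        super().__init__(label)
        self.label = label

def block(label, body):
    """Run body; a Branch to this label simply ends the block."""
    try:
        body()
    except Branch as b:
        if b.label != label:
            raise

def run(eager: bool) -> int:
    state = {"i": 0}

    def or_in(bits):
        state["i"] |= bits

    def br_if(cond, label, value=None):
        if eager:
            # Reading 2: evaluate the value operand first, then test the condition.
            if value is not None:
                value()
            if cond:
                raise Branch(label)
        elif cond:
            # Reading 1: (if_else cond (br label value) (nop)).
            if value is not None:
                value()
            raise Branch(label)

    def inner():
        br_if(0, "inner")
        or_in(0x1)
        br_if(1, "inner")
        or_in(0x2)

    def outer():
        block("inner", inner)
        br_if(0, "outer", lambda: or_in(0x4))
        or_in(0x8)
        br_if(1, "outer", lambda: or_in(0x10))
        or_in(0x20)

    block("outer", outer)
    return state["i"]

print(hex(run(eager=True)))   # 0x1d, the value the assertion expects
print(hex(run(eager=False)))  # 0x19, the value under the if_else desugaring
```

Only the eager reading reproduces 0x1d, so the test and the documented desugaring disagree.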

define <char> in the README

<char> is currently referenced in the definition of export and data:

export: ( export "<char>*" <var> )
data:   ( data "<char>*" )

We should fully define which characters and escape sequences are allowed here.

Parser rejects i32.reinterpret/f32

For this code:

(assert_eq
  (i32.reinterpret/f32 (f32.const 1.0))
  (i32.reinterpret/f32 (f32.const 1.0))
)

wasm says:

test/test.wasm:2.4-2.23: syntax error

As far as I can tell, i32.reinterpret/f32 should be a valid opcode recognized by the lexer. Is this code actually invalid, or is there a parsing bug here?

Filename extension for S-expression test files

The test files in ml-proto/test were originally named with a '.wasm' extension. However, we might expect .wasm to eventually be the filename extension for the official binary format. I propose we use '.wase' for the S-expression text format.

I accidentally committed a patch implementing this to master in be2b6ea which renames the files to '.wase', rather than submitting a pull request. I'm happy to revert it, or I'm happy to submit a new patch using some other extension, following whatever consensus we get here.

[interpreter] fp tests fail on Windows 32 due to bit pattern mismatches

test/f64.wast:2326.1-2326.103: assert_return f64 operands have different bit patterns
test/float_misc.wast:92.1-92.141: assert_return f64 operands have different bit patterns

======================================================================
FAIL: test/f64.wast (__main__.RunTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./runtests.py", line 37, in <lambda>
    return lambda self : self._runTestFile(*rec)
  File "./runtests.py", line 20, in _runTestFile
    self.assertEqual(0, exitCode, "test runner failed with exit code %i" % exitCode)
AssertionError: test runner failed with exit code 1

======================================================================
FAIL: test/float_misc.wast (__main__.RunTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./runtests.py", line 37, in <lambda>
    return lambda self : self._runTestFile(*rec)
  File "./runtests.py", line 20, in _runTestFile
    self.assertEqual(0, exitCode, "test runner failed with exit code %i" % exitCode)
AssertionError: test runner failed with exit code 1

OCaml's Bigarray module can't handle large linear memory sizes

memory.ml is currently using OCaml's Bigarray module, specifically Bigarray.Array1, to represent linear memory.

Bigarray's interfaces all use OCaml's int type for array extents and index values. While int has a host-dependent size, it is 31 bits on some common systems. It is also signed, so it can only hold values less than 1<<30.

Consequently, on 32-bit hosts, it seems the WebAssembly reference interpreter is limited to linear memory sizes less than 1 GiB, even when the underlying host is capable of allocating that much memory.

Am I understanding everything here correctly? And if so, are there any alternatives to Bigarray which allow for bigger sizes?
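
The arithmetic behind the limit can be checked with a few lines of Python. This is only a sketch of the reasoning above: it assumes OCaml's usual representation, where one bit of each machine word is reserved as a tag, and ocaml_max_int is a hypothetical helper name.

```python
# Hypothetical helper; models OCaml's max_int for a given host word size.

def ocaml_max_int(word_bits: int) -> int:
    """Largest OCaml int on a host with the given word size.

    One bit of each word is a tag, leaving a signed (word_bits - 1)-bit
    integer, so the largest value is 2**(word_bits - 2) - 1.
    """
    return 2 ** (word_bits - 2) - 1

GIB = 1 << 30

for bits in (32, 64):
    limit = ocaml_max_int(bits)
    # On a 32-bit host the largest index is one below 1 GiB.
    print(bits, limit, limit < GIB)
```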

Change memory sizes to be in units of the page size.

Could we change the memory sizes, in the memory section for the min and max and in grow-memory, to be in units of the page size, which is 64k?

Other sizes for the min and max can not even be encoded in the binary format, and it's just a burden for consumers and producers to have to be rounding these off and scaling them.

This will avoid the need for page size error checking in grow-memory.
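
A minimal Python sketch of the difference, assuming the 64k page size mentioned above; bytes_to_pages and pages_to_bytes are hypothetical names used only for illustration. With byte units every consumer needs the multiple-of-page-size check; with page units that check disappears.

```python
# Hypothetical helpers; PAGE_SIZE is the 64k figure from the proposal.
PAGE_SIZE = 64 * 1024

def bytes_to_pages(size_in_bytes: int) -> int:
    """Byte units: sizes that are not page multiples must be rejected."""
    if size_in_bytes < 0 or size_in_bytes % PAGE_SIZE != 0:
        raise ValueError("size must be a non-negative multiple of the page size")
    return size_in_bytes // PAGE_SIZE

def pages_to_bytes(pages: int) -> int:
    """Page units: nothing to validate beyond non-negativity."""
    if pages < 0:
        raise ValueError("page count must be non-negative")
    return pages * PAGE_SIZE

print(bytes_to_pages(131072), pages_to_bytes(3))
```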

Memory bug on Windows x64

On my x64 Windows 10 box memory.wasm fails mysteriously:

-- Assert invoking...
Result: 42. : f64
Expect: 42. : f64
-- Error:
test/memory.wasm:151.1-151.45: assertion failed

Not sure what's going on here, don't know ocaml well enough to debug.

AssertEq seems to reject any operand other than invoke or const

I can't figure out why it does this, but asserteq produces a syntax error if its operands are anything other than invoke or const. loads/loadu definitely don't work.

Bring names in line with AST semantics sooner rather than later

Before we get too far along I think we need to have a referendum on the naming going on in the spec. In particular, I think the spec should follow naming conventions that have already been agreed upon in AstSemantics.md. Also, we should solve some of the outstanding issues with naming in AstSemantics.md.

In particular, the operation.type in the spec bothers me. This might come from a viewpoint where operations are parameterized over types in some way. That kind of doesn't make sense, since there are many floating point operations available that don't work on integers and vice versa. Therefore I think it's more logical to do what we were doing before and consider operations to be members of types.

E.g.

type.operation
type.operation[type]

Break expression handling

Dear all,

As I was starting to handle more complex labeling, I was looking at the labels.wast case:

  (func $loop2 (result i32)
    (local $i i32)
    (set_local $i (i32.const 0))
    (loop $exit $cont
      (set_local $i (i32.add (get_local $i) (i32.const 1)))
      (if (i32.eq (get_local $i) (i32.const 5))
        (br $cont (i32.const -1))
      )
      (if (i32.eq (get_local $i) (i32.const 8))
        (br $exit (get_local $i))
      )
      (set_local $i (i32.add (get_local $i) (i32.const 1)))
      (br $cont)
    )
  )

I am especially interested in this break:
(br $cont (i32.const -1))

Is the idea behind this to say:

If I do branch back to the loop, execute the code of the expression before jumping (well here it would get DCE'd anyway but still)
- So for example, if it was (unreachable) we would trap there instead of DCE-ing it
Even though I know no one will consume it afterwards (compared to a branch to an exit for example)

I read the design and it wasn't clear whether or not this was the behavior that we were going for, so I'd prefer to check.

Thanks!
Jc

load_global after store_global doesn't return the value?

hi, the following code returns 0 for the 3 expressions in comments at the bottom.

shouldn't the last one return 32?

(module
   (global $rx0 i32)

   (func $gx0 (load_global $rx0))
   (func $sx0 (param i32) (store_global $rx0 (get_local 0)))

   (export "gx0" $gx0)
   (export "sx0" $gx0)

   (;
    (invoke "gx0") ;; returns 0
    (invoke "sx0" (i32.const 32)) ;; returns 0
    (invoke "gx0") ;; returns 0, should return 32?
   ;)
)

_start export

I have seen some talk about a _start export and see it mentioned here in the doc. Would it be something we could add to the spec? Could it replace invoke?

Support for trapping

AstSemantics.md has the concept of a trap, which means

execution [...] is terminated and abnormal termination is reported to the outside environment.

Several operators are defined to trap, such as integer division when the denominator is zero. What's the best way to represent this in OCaml? Would an exception be appropriate?

In the future, trapping behavior may be made customizable, though I don't think it's necessary to try to anticipate this in the current code.
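
To illustrate the exception-based approach, here is a sketch in Python rather than OCaml. It assumes a dedicated Trap exception that operators raise and the embedder catches; Trap, i32_div_s, invoke and the message strings are all hypothetical, not taken from the interpreter.

```python
# Hypothetical names throughout; a model of "trap as exception", not ml-proto code.

class Trap(Exception):
    """Abnormal termination reported to the outside environment."""

INT32_MIN = -(1 << 31)

def i32_div_s(lhs: int, rhs: int) -> int:
    """Signed 32-bit division that traps instead of returning a value."""
    if rhs == 0:
        raise Trap("integer divide by zero")
    if lhs == INT32_MIN and rhs == -1:
        raise Trap("integer overflow")
    # Python's // floors, so truncate toward zero by hand.
    quotient = abs(lhs) // abs(rhs)
    return quotient if (lhs < 0) == (rhs < 0) else -quotient

def invoke(func, *args):
    """Embedder-side wrapper: turn a trap into a reported outcome."""
    try:
        return ("ok", func(*args))
    except Trap as trap:
        return ("trap", str(trap))

print(invoke(i32_div_s, 7, -2))
print(invoke(i32_div_s, 1, 0))
```

Keeping the trap as a distinct exception type means ordinary host errors are not mistaken for traps, and the single catch site is where customizable behavior could later be added.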

Make assert/invoke require constant parameters?

I just noticed that invoke accepts arbitrary expressions as parameters. WAVM currently generates code for each assertion, but I was hoping to get rid of that and make each assertion just call some exported function with constant parameters.

It could also go the other way and allow expressions other than invoke as the left hand side of the assertion, but what's there right now seems like an awkward trade off between power and simplicity that achieves neither.

handling dead code

@AndrewScheidecker and I were chatting yesterday about the expression value of the return statement, specifically should something like this be allowed:

...
(f32.neg (return (f32.const 1.0)))

This looks like an error to me, and sexpr-wasm-prototype and v8-native-prototype treat it as an error. But it is currently allowed in ml-proto. @sunfishcode also suggested that it might be useful to allow dead code for non-optimizing wasm.

Thoughts?

runtests.py errors on Linux

Running runtests.py produces the following output:

// building main.native
SANITIZE: a total of 54 files that should probably not be in your source tree
  has been found. A script shell file
  "/moz/wasm/spec/ml-proto/src/_build/sanitize.sh" is being created. Check
  this script and run it to remove unwanted files or use other options (such
  as defining hygiene exceptions or using the -no-hygiene option).
IMPORTANT: I cannot work with leftover compiled files.
ERROR: Leftover ocamllex-generated files:
  Files lexer.mll and lexer.ml should not be together in .
ERROR: Leftover ocamlyacc-generated files:
  Files parser.mly and parser.ml should not be together in .
  Files parser.mly and parser.mli should not be together in .
ERROR: Leftover object files:
  File memory.o in . has suffix .o
  File arithmetic.o in . has suffix .o
  File parser.o in . has suffix .o
  File script.o in . has suffix .o
  File source.o in . has suffix .o
  File main.o in . has suffix .o
  File print.o in . has suffix .o
  File eval.o in . has suffix .o
  File flags.o in . has suffix .o
  File lib.o in . has suffix .o
  File values.o in . has suffix .o
  File check.o in . has suffix .o
  File error.o in . has suffix .o
  File ast.o in . has suffix .o
  File lexer.o in . has suffix .o
  File types.o in . has suffix .o
ERROR: Leftover OCaml compilation files:
  File types.cmo in . has suffix .cmo
  File values.cmo in . has suffix .cmo
  File ast.cmo in . has suffix .cmo
  File memory.cmi in . has suffix .cmi
  File main.cmi in . has suffix .cmi
  File lexer.cmi in . has suffix .cmi
  File values.cmi in . has suffix .cmi
  File check.cmi in . has suffix .cmi
  File error.cmi in . has suffix .cmi
  File flags.cmi in . has suffix .cmi
  File script.cmi in . has suffix .cmi
  File arithmetic.cmi in . has suffix .cmi
  File eval.cmi in . has suffix .cmi
  File ast.cmi in . has suffix .cmi
  File source.cmi in . has suffix .cmi
  File parser.cmi in . has suffix .cmi
  File lib.cmi in . has suffix .cmi
  File print.cmi in . has suffix .cmi
  File types.cmi in . has suffix .cmi
  File parser.cmx in . has suffix .cmx
  File lexer.cmx in . has suffix .cmx
  File values.cmx in . has suffix .cmx
  File arithmetic.cmx in . has suffix .cmx
  File source.cmx in . has suffix .cmx
  File eval.cmx in . has suffix .cmx
  File flags.cmx in . has suffix .cmx
  File check.cmx in . has suffix .cmx
  File error.cmx in . has suffix .cmx
  File script.cmx in . has suffix .cmx
  File memory.cmx in . has suffix .cmx
  File ast.cmx in . has suffix .cmx
  File print.cmx in . has suffix .cmx
  File lib.cmx in . has suffix .cmx
  File main.cmx in . has suffix .cmx
  File types.cmx in . has suffix .cmx
Exiting due to hygiene violations.
Compilation unsuccessful after building 0 targets (0 cached) in 00:00:00.
Traceback (most recent call last):
  File "runtests.py", line 67, in <module>
    interpreterPath = rebuild_interpreter()
  File "runtests.py", line 57, in rebuild_interpreter
    raise Exception("ocamlbuild failed with exit code %i" % exitCode)
Exception: ocamlbuild failed with exit code 1

Performing make clean and then running produces a different set of errors:

// building main.native
SANITIZE: a total of 3 files that should probably not be in your source tree
  has been found. A script shell file
  "/moz/wasm/spec/ml-proto/src/_build/sanitize.sh" is being created. Check
  this script and run it to remove unwanted files or use other options (such
  as defining hygiene exceptions or using the -no-hygiene option).
IMPORTANT: I cannot work with leftover compiled files.
ERROR: Leftover ocamllex-generated files:
  Files lexer.mll and lexer.ml should not be together in .
ERROR: Leftover ocamlyacc-generated files:
  Files parser.mly and parser.ml should not be together in .
  Files parser.mly and parser.mli should not be together in .
Exiting due to hygiene violations.
Compilation unsuccessful after building 0 targets (0 cached) in 00:00:00.
Traceback (most recent call last):
  File "runtests.py", line 67, in <module>
    interpreterPath = rebuild_interpreter()
  File "runtests.py", line 57, in rebuild_interpreter
    raise Exception("ocamlbuild failed with exit code %i" % exitCode)
Exception: ocamlbuild failed with exit code 1

Tests to write

Here is my current rough list of "tests to write". I believe everything here is either specified in AstSemantics.md, has a link to an open issue/PR, or is obvious. Comments/corrections/additions welcome.

Misc semantics:

test that linear memory is little-endian for all integers and floats
test that unaligned and misaligned accesses work, even if slow
test that runaway recursion traps
test that too-big linear memory resize fails appropriately
test that too-big linear memory initial allocation fails
test that function addresses are monotonic indices, and not actual addresses.
test that one can clobber the entire contents of the linear memory without corrupting: call stack, global variables, local variables, program execution.

Operator semantics:

test that promote/demote, sext/trunc, zext/trunc is bit-preserving if not NaN
test that clz/ctz handle zero
test that numbers slightly outside of the int32 range round into the int32 range in floating-to-int32 conversion
test that neg, abs, copysign, reinterpretcast, store+load, set+get, preserve the sign bit and significand bits of NaN and don't canonicalize
test that shifts don't mask their shift count. 32 is particularly nice to test.
test that page_size returns something sane (power of 2?)
test that arithmetic operands are evaluated left-to-right
test that add/sub/mul/wrap/wrapping-store silently wrap on overflow
test that sdiv/udiv/srem/urem trap on divide-by-zero
test that sdiv traps on overflow
test that srem doesn't trap when the corresponding sdiv would overflow
test that float-to-integer conversion traps on overflow and invalid

Floating point semantics:

test for round-to-nearest rounding
test for ties-to-even rounding
test that all operations with floating point inputs correctly handle all their NaN, -0, 0, Infinity, and -Infinity special cases
test that all operations that can overflow produce Infinity and with the correct sign
test that all operations that can divide by zero produce Infinity with the correct sign
test that all operations that can have an invalid produce NaN
test that all operations that can have underflow behave correctly
test that nearestint doesn't do JS-style Math.round or C-style round(3) rounding
test that signalling NaN doesn't cause weirdness
test that signalling/quiet NaNs can have sign bits and payloads in literals

Expression optimizer bait:

test that a+1<b+1 isn't folded to a<b
test that demote-promote, wrap+sext, wrap+zext, shl+ashr, shl+lshr, div+mul, mul+div aren't folded away
test that converting int32 to float and back isn't folded away
test that converting int64 to double and back isn't folded away
test that float(double(float(x))+double(y)) is not float(x)+float(y) (and so on for other operators)
test that x*0.0 is not folded to 0.0
test that 0.0/x is not folded to 0.0
test that signed integer div by negative constant is not ashr
test that signed integer div rounds toward zero
test that signed integer mod has the sign of the dividend
test unsigned and signed division by 3, 5, 7
test that floating-point division by immediate 0 and -0 is defined
test that ueq/one/etc aren't folded to oeq/une/etc.
test that floating point add/mul aren't reassociated even when tempting
test that floating point mul+add isn't folded to fma even when tempting
test that 1/x isn't translated into reciprocal-approximate
test that 1/sqrt(x) isn't approximated either
test that fp division by non-power-2 constant gets full precision (isn't a multiply-by-reciprocal deal)?

Misc optimizer bait:

test that the impl doesn't constant-fold away or DCE away or speculate operations that should trap, such as 1/0u, 1/0, 1%0u, 1%0, convertToInt(NaN), INT_MIN/-1 and so on.
test that likely constant folding uses the correct rounding mode
test that the scheduler doesn't move a trapping div past a call which may not return

Misc x86 optimizer bait:

test that oeq handles NaN right in if, if-else, and setcc cases

memory:

test that loading from null works
test that loading from constant OOB traps and is not DCE'd or folded (pending discussion)
test that loading from "beyond the STACKPTR" succeeds
test that "stackptr + (linearmemptr - stackptr)" loads from linearmemptr.
test loading "uninitialized" things from aliased stack frames return what's there
test that loadwithoffset traps in overflow cases

Misc x87-isms:

test for invalid Precision-Control-style x87 math
test for invalid -ffloat-store-style x87 math
test for evaluating intermediate results at greater precision
test for loading and storing NaNs

Control flow:

test that continue goes to the right place in do_while and forever
test that break goes to the right place in all cases where it can appear
test devious switch case patterns

validation errors:

load/store or variables with type void/bool/funcptr/etc.
sign-extend load from int64 to int32 etc.
fp-promote load and fp-demote store
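
A few of the "optimizer bait" items above can be made concrete with counterexamples. The Python sketch below models 32-bit wrapping and truncating division by hand; the helper names (to_signed32, add32, div_trunc, rem_trunc) are hypothetical and the inputs are just one choice that exposes each invalid fold.

```python
import math

# Hypothetical helpers modelling int32 semantics; not part of any test suite.
MASK32 = 0xFFFFFFFF

def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement int32."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def add32(a: int, b: int) -> int:
    """Wrapping 32-bit addition."""
    return to_signed32(a + b)

def div_trunc(a: int, b: int) -> int:
    """Signed division rounding toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def rem_trunc(a: int, b: int) -> int:
    """Signed remainder; the result takes the sign of the dividend."""
    return a - b * div_trunc(a, b)

# a+1 < b+1 must not be folded to a < b: the add can wrap.
a, b = 0x7FFFFFFF, 0
print(add32(a, 1) < add32(b, 1), a < b)

# x*0.0 must not be folded to 0.0: the sign and NaN cases differ.
print(math.copysign(1.0, -1.0 * 0.0), math.isnan(math.inf * 0.0))

# Signed division by two is not a plain arithmetic shift for negative dividends.
print(div_trunc(-7, 2), -7 >> 1, rem_trunc(-7, 2))
```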

multiple build processes, processes disagree

When using src/Makefile, one gets an executable at src/wasm. However, runtests.py expects an executable at src/main.native. It'd be useful for these to agree; which one is preferable?

Asserts for floating point tests

assert_eq uses OCaml's <> which returns false for -0 <> 0 and true for NaN <> x ∀x. This is the usual behavior, but for our floating point unit tests we have unusual needs: we do actually need to distinguish between -0 and 0, test for specific NaNs sometimes, and test for any NaN sometimes. What do people think of adding the following?

assert_eq_bits - like assert_eq, but reinterpret-casts both operands as same-size integers first
assert_nan - has one operand and asserts that it is a NaN (of any kind)
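
The intended behavior of the two proposed assertions can be sketched in Python, using the struct module to reinterpret a double as a 64-bit integer. The function names f64_bits, eq_bits and is_nan are hypothetical stand-ins for the checks that assert_eq_bits and assert_nan would perform.

```python
import math
import struct

# Hypothetical stand-ins for the proposed assert_eq_bits / assert_nan checks.

def f64_bits(x: float) -> int:
    """Reinterpret a double as its 64-bit integer bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def eq_bits(actual: float, expected: float) -> bool:
    """Compare bit patterns rather than numeric values."""
    return f64_bits(actual) == f64_bits(expected)

def is_nan(actual: float) -> bool:
    """True for a NaN of any sign or payload."""
    return math.isnan(actual)

# Numeric equality cannot tell the zeros apart; bit equality can.
print(0.0 == -0.0, eq_bits(0.0, -0.0))

# Numeric equality rejects every NaN; is_nan accepts any of them.
nan = float("nan")
print(nan == nan, is_nan(nan))
```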

Calls to the embedder

The design repo is pretty vague about how calls to the embedder work, mostly because we were waiting on an implementation to really get a good understanding. I'm hoping that the spec repo can try things out, and help inform the design.

Could the spec repo implement something so that I can e.g. compile "Hello World!" and print that string to the screen?

I think the current agreement is that embedder calls look like declared-but-not-defined functions in the module. It's pretty similar to what dynamic linking would do.

Parser reports constant out of range at incorrect source location

For this testcase:

(module
  (func $add (param $x i32) (param $y i32) (result i32)
    (i32.add (get_local $x) (get_local $y))
  )
  (export "add" $add)
)

(assert_eq
  (invoke "add" (i32.reinterpret/f32 (f32.const 1.0)) (i32.const 0.0))
  (i32.const 1065353216)
)

wasm says:

test/test.wasm:8.2-8.11: constant out of range

The actual error is that I had copypastad a 0.0 into an i32 constant, but it took me a while to figure that out because the source line of the error message points at the assert_eq.

Inconsistencies with regard to cases

When reading the design document, I saw two things:

A) The if, br, br_if, case, and return constructs do not yield values.

B) A case node consists of an expression

Now, when looking at the testsuite, I saw this:

labels.wast contains:
(case $0 (i32.const 5))

Which seems to go against (A).

And:
(case $3 (set_local $j (i32.sub (i32.const 0) (get_local $i))) (br 0))

which seems to go against (B) if we are pedantic that an expression is different to a list of expressions.

My questions are the following:
A) should we fix this in the design or the spec?
B) should we fix this in the design or the spec?

I can do both if you wish (though again my OCaml is a bit rusty but I can try :)); I just would need to know what we want for each. My thoughts are:

A) I like that case can return a value; I think there is little to gain from disallowing it and forcing a br there
B) I think that a case should respect the design document and only have one expression, if you want to have more complex things, add a block node

Opinions?

invoke outputs garbage

When invoking functions with a void return type, ml-proto acts like they have a return type and outputs garbage (the last expression in the function body, seems like?)

(module 
  (import $write "stdio" "write" (param i32 i32))

  (memory 4096 4096 (segment 0 "\89\50\4e\47\0d\0a\1a\0a\00"))

  (func $write_png_header
    (call_import $write (i32.const 0) (i32.const 9))
    (i32.const 7)
  )

  (export "write_png_header" $write_png_header)
)

(invoke "write_png_header")

prints 7 : i32 at the end.

'make' generates 4 executables

After the recent Makefile changes (which makes things much nicer), the first run of 'make' generates 'wasm' and 'unopt'. The second run adds 'main.native' and 'main.d.byte'. Ideally, 'make' would only generate 1 executable. For ease of use, maybe this should be the unopt build and you have to type 'make opt' to get the opt build?

Significant mismatches with AstSemantics

While working on #35 I ran into many significant mismatches with AstSemantics from the design repo. I think most of them are wrong:

There's no continue node. This can be awkwardly emulated with More Labels(tm) but we shouldn't do that. We should implement it.

switch uses break to suppress fallthrough at the end of a case, which is problematic, because it's overloading the normal break $label form in a way that is ambiguous. Fallthrough appears to be automatic, instead of opt-in, also. I think it should be explicit, and the default should be non-fallthrough - that way break isn't needed. That or we define a dedicated opcode like case-break.

AstSemantics specifies do-while and forever loop types; the prototype only has loop. This further complicates emulating continue since you need to make sure you jump to the right place.

comma isn't implemented. That will be a problem when trying to emulate particular logical/arithmetic constructs common in C++ and C#.

Base + immediate offset addressing

Base + immediate offset addressing seems missing from the spec, but is in the design:
https://github.com/WebAssembly/design/blob/master/AstSemantics.md#addressing

address.wast testcase constants

That testcase has

    (func $bad1 (param $i i32) (i32.load offset=4294967296 (get_local $i)))

and it tests that that traps on loads of 0 and 1.

The offset seems odd - that isn't a 32-bit unsigned. Is it internally represented as a larger number?
If it's a 32-bit unsigned, shouldn't that first turn into a 0, and so there is no fault?
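
The two interpretations in question can be compared numerically. The Python sketch below is an illustration only: it assumes a 4-byte load and a hypothetical memory size of 65536 bytes, and the function names are made up for this example. If the offset is kept as a wider integer the access is out of bounds; if it were first truncated to 32 bits it would become 0 and the load would succeed.

```python
# Hypothetical model of the two readings; not how the interpreter is written.
U32_MODULUS = 1 << 32

def effective_address_wide(base: int, offset: int) -> int:
    """Add base and offset without wrapping, as if in a wider integer."""
    return base + offset

def effective_address_wrapped(base: int, offset: int) -> int:
    """Truncate the offset to 32 bits first, then add with 32-bit wraparound."""
    return (base + (offset % U32_MODULUS)) % U32_MODULUS

def in_bounds(address: int, access_size: int, memory_size: int) -> bool:
    """An access is valid only if it lies entirely inside linear memory."""
    return 0 <= address and address + access_size <= memory_size

OFFSET = 4294967296  # the constant from the testcase, equal to 2**32
MEMORY_SIZE = 65536  # assumed memory size, for illustration only

for base in (0, 1):
    wide = effective_address_wide(base, OFFSET)
    wrapped = effective_address_wrapped(base, OFFSET)
    print(base, in_bounds(wide, 4, MEMORY_SIZE), in_bounds(wrapped, 4, MEMORY_SIZE))
```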

OCaml download instructions are insecure

README currently says to get OCaml from source at http://caml.inria.fr/pub/distrib/ocaml-4.02/ocaml-4.02.2.tar.gz which is http, and doesn't have an https equivalent.

The newer instructions from ocaml.org to get the source are here: http://ocaml.org/releases/svn.html
The only https solution is through github... Which is automatically sync'd with the insecure svn repo.

Security people everywhere are sad.

WASM and SSA form?

This may be a question for here: can the WebAssembly format hold a Static Single Assignment (SSA) based representation? Does it have an opcode for the phi function?

Tail recursion optimization not trapping

In the fac.wast there is an assert_trap:

(assert_trap (invoke "fac-rec" (i64.const 1073741824)) "runtime: callstack exhausted")

However, in my wasm-to-llvm-prototype code, LLVM removes the tail recursion and therefore this no longer traps.

Now I've seen here and there some comments about testing that we are not optimizing things away. So my first question relates directly to this test:

Are we not going to allow that the Wasm compiler performs optimizations such as tail recursion if it wants to?
- Is that true only for Web browser case or also the AOT case?
What about the other tests then? Should we not allow the compiler to optimize things away if possible?
- If for testing purposes, we could perhaps flag the module in a way to say: don't optimize this, we are testing something and, without that flag, be able to optimize away...

Ps: now that we have the awesome mirror: https://github.com/WebAssembly/testsuite; should these questions go there or is it simpler to keep them here?

S-expression significand syntax

I am new to wast syntax and S-expressions. I see several tests that use "signalling NaN" in the form nan(0x01234567). My understanding is that ( and ) are special characters in S-expressions. Depending on the S-expression parser, nan(...) can be treated as an extra level of the tree. Would it be better to avoid using parentheses in the nan scenario?
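
The concern can be reproduced with a generic S-expression reader. The Python sketch below is a deliberately naive parser that knows nothing about wast, and the nan:0x01234567 spelling in the second call is a hypothetical paren-free alternative used only for comparison.

```python
import re

# Naive, hypothetical reader: any "(" or ")" is structural, everything else is an atom.
TOKEN = re.compile(r"\(|\)|[^\s()]+")

def parse(text: str):
    """Parse one S-expression into nested Python lists of atom strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume ")"
            return items
        return token

    return read()

# The payload becomes a nested list instead of staying part of the literal.
print(parse("(f32.const nan(0x01234567))"))

# A paren-free spelling stays a single atom.
print(parse("(f32.const nan:0x01234567)"))
```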

module loading

It would be nice if we could test importing other modules written in wasm

expected binary form for the testsuite

@sunfishcode thank you for maintaining the WebAssembly/testsuite *.wast files. How about releasing expected binary *.wasm files to test conversion to and from the binary format?

Consider offsets and streaming compilation in binary format

With the wasm binary format we have the goals of:

1. enabling maximally efficient decoding, and
2. allowing streaming compilation (by the engine) and transformation (by polyfills, the specific layer, Service Workers, etc.).

Considering only goal 1, you want lots of nice offset tables up front so that a decoder can eagerly fork off parallel decode/compilation tasks, removing the usual sequential bottleneck of scanning to find boundaries. However, this is hazardous to goal 2: if I want to generate a stream of wasm and the exact target contents are not known a priori (and thus cannot be encoded in the prologue of the stream source; this would be the case for a feature-testing polyfill), then I won't know the offsets up front.

BinaryEncoding.md currently has 3 instances of tables-of-offsets in the global structure section: sections, functions and operator tables. Operator tables don't seem like an issue since they are tiny prologue stuff.

As discussed in #182, there are various alternatives to the up-front table of offsets. I like @JSStats's suggestion that, for function definitions, putting the length inline in each function definition achieves most of the win, since streaming compilation probably needs to happen at function granularity anyway to allow, e.g., backpatching of the locals declaration.
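
A minimal Python sketch of the inline-length idea follows. It is not the format in BinaryEncoding.md: the layout (an unsigned LEB128 byte length followed by the body) and the helper names are assumptions chosen for illustration. The point is that the producer only needs each function's length once that function is finished, and the consumer can hand off or skip one function at a time without a table of offsets.

```python
import io


def write_uleb128(out, value):
    """Write a non-negative integer as unsigned LEB128."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.write(bytes([byte | 0x80]))  # more bytes follow
        else:
            out.write(bytes([byte]))
            return


def read_uleb128(stream):
    """Read an unsigned LEB128 integer; return None at a clean end of stream."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a length prefix")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result
        shift += 7


def emit_functions(out, bodies):
    """Producer: each body is prefixed with its own length, nothing up front."""
    for body in bodies:
        write_uleb128(out, len(body))
        out.write(body)


def iter_functions(stream):
    """Consumer: yield one function body at a time, reading forward only."""
    while True:
        size = read_uleb128(stream)
        if size is None:
            return
        yield stream.read(size)


buffer = io.BytesIO()
emit_functions(buffer, [b"\x01\x02", b"", bytes(200)])
buffer.seek(0)
decoded = list(iter_functions(buffer))
print([len(body) for body in decoded])
```

A consumer that wants to skip a function can read the length and discard that many bytes, which is the property an unknown section would also need.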

Sections are harder: for the primary code section we definitely don't want to require knowing the byte length up front, but for some completely unknown section, we definitely want to be able to skip it. There are a couple of compromises that come to mind that seem viable that I won't enumerate here.

A more general point is that we should definitely have working examples of user-space streaming (polyfill and specific-layer decoding) before finishing v.1 to flush out all these issues. This was already roughly the plan, but it's important that these libraries don't take the liberty of performing full random access on the source/destination ArrayBuffers; they should behave as if they were operating on streams.

return?

The exports.wast testcase has a "return" opcode. Is that part of the spec?

String.create error in interpreter

I tried to parse the Poppler PDF library compiled to .wast in the interpreter. It's 39MB, and the interpreter errors on

[..]/spec/ml-proto/wasm: uncaught exception Invalid_argument("String.create")
Raised by primitive operation at file "host/main.ml", line 12, characters 2-206
Called from file "host/main.ml", line 53, characters 15-24
Called from file "list.ml", line 73, characters 12-15
Called from file "host/main.ml", line 86, characters 4-33

I'm not that familiar with ocaml, so I'm not sure what to do to investigate this further.

I can attach the file if that's useful (too big for github though, I think).

call_indirect

It's unclear how to use call_indirect and it doesn't seem to align well with the design repo.

As far as I can tell, call_indirect has roughly this syntax in spec right now:
(call_indirect <table_number> <function_index> <args>), where:

- table_number is an integer literal - symbols don't work for some non-obvious reason
- function_index is an expression that evaluates to an integer
- args is obvious

The addressof primitive expected for getting function indices does not appear to exist. My assumption is that the code generator is expected to hard-code an index as it exists in the tables defined by the module.

The syntax for defining a function table appears to be (table <table_number> <function> <function>) where:

- table_number is an integer literal - symbols fail to parse, as do bare words and quoted strings
- function is just a symbol or function index, like in export.

Presumably this generates a function table with id table_number containing the provided functions in order, starting at 0 (1?)
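
One way to picture the semantics guessed at above is the following Python model. It is only a sketch of my reading, not the interpreter's behavior: 0-based indexing and a trap on an out-of-range index are assumptions, and `Trap`, `tables` and `call_indirect` are hypothetical names.

```python
class Trap(Exception):
    """Stands in for a runtime trap."""


def call_indirect(tables, table_number, function_index, *args):
    """Look up a function in a numbered table and call it with args."""
    table = tables[table_number]  # table_number is a literal in the module text
    if not 0 <= function_index < len(table):
        raise Trap("undefined table index")  # assumed behavior
    return table[function_index](*args)


def add(a, b):
    return a + b


def sub(a, b):
    return a - b


# Models (table 0 $add $sub): functions stored in order, starting at index 0.
tables = {0: [add, sub]}

print(call_indirect(tables, 0, 0, 7, 5))
print(call_indirect(tables, 0, 1, 7, 5))
```

In this model the code generator has to know that `sub` sits at index 1 of table 0, which matches the assumption that indices are hard-coded in the absence of an addressof primitive.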

Various questions remain that I can't easily figure out from reading the source:

- What happens if the signature of call_indirect is a mismatch from the signature of the function the pointer is aimed at?
- Are tables allowed to be heterogeneous?
- Why do we have multiple tables in a single module to begin with?

I am probably wrong about some of this since I still haven't managed to write a functioning module that uses call_indirect, but I'll update once I do.

Support embedded newline characters in names?

In #141 I created a test which attempted to test all the ASCII control characters in exported symbol names. All of them worked except 0x0a, the ASCII newline character. The spec interpreter gave this error when I tried it:

test/names.wast:50.11-50.14: unclosed text literal

What is the intended behavior here? I don't presently have an opinion; I could see arguments for restricting the character set in some ways, but I could also see arguments that it should be entirely unrestricted.

separate index space for parameters/locals?

A recent commit separated out parameters from locals, giving the two overlapping index spaces. This is fairly different than what's in the design repo (only locals) but not a huge fundamental design difference (both work), so I thought we should discuss it here.

Finish TestingTodo.md

We should work through the remaining items in TestingTodo.md. Once finished, we can probably move to just filing new issues for new test ideas, at least until we start collecting ideas in bulk again (e.g. when we introduce major features like threads).

Some of the items may already have tests (particularly the ones related to memory semantics), some may be out of date with respect to recent changes, some may need to be clarified, and some just need tests to be written. Anyone is welcome to join in; please post plans here so that we can coordinate our efforts.

Call return types and multiple passes?

I noticed when implementing binaryen's s-parser that I need to do 2 passes: one for almost everything, then a later one, once I know call return types, to apply those. This is because, while we have i32.add etc., our call syntax doesn't mention the return type.

Maybe I'm missing something here?
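
For concreteness, the two passes can be sketched in Python roughly like this. This is not binaryen's code; the module representation (a list of dicts with hypothetical `name`, `result` and `body` keys) is an assumption made for illustration.

```python
def collect_signatures(module):
    """Pass 1: map each function name to its declared result type."""
    return {func["name"]: func["result"] for func in module}


def type_of(expr, signatures):
    """Pass 2: compute the result type of an expression node."""
    op = expr[0]
    if op == "call":
        # The call node names only the callee, so its type must be looked up.
        return signatures[expr[1]]
    # Operators such as i32.add carry their type as a prefix.
    return op.split(".")[0]


# $main calls $helper before $helper has been seen, hence the first pass.
module = [
    {"name": "$main", "result": "f64", "body": ["call", "$helper"]},
    {"name": "$helper", "result": "f64", "body": ["f64.const", "1.5"]},
]

signatures = collect_signatures(module)
body_types = {func["name"]: type_of(func["body"], signatures) for func in module}
print(body_types)
```

A single pass would work only if every callee were declared before its first use, or if the call syntax carried the return type itself.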
