tc39 / proposal-binary-ast
Binary AST proposal for ECMAScript
In practice, pretty much all integer literals we encounter are within [-2^31, 2^31) (I believe that I have never seen an integer literal outside of this range). Experience shows that introducing the following is a pretty good file-size win:
typedef (LiteralF64Expression or LiteralI32Expression) LiteralNumericExpression;
interface LiteralF64Expression {
attribute double value;
}
interface LiteralI32Expression {
attribute long value;
}
Any objection to this? The assumption being that the long will be a variable length encoded integer.
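To make the "variable length encoded integer" assumption concrete, here is a minimal sketch using signed LEB128. The proposal text does not name an encoding, so both the choice of LEB128 and the function names are assumptions made for illustration only.

```javascript
// Sketch only: signed LEB128 for 32-bit integers (an assumed encoding,
// not something the proposal specifies).
function encodeSignedLeb128(value) {
  const bytes = [];
  let more = true;
  while (more) {
    let byte = value & 0x7f;
    value >>= 7; // arithmetic shift, so the sign is preserved
    const signBit = byte & 0x40;
    if ((value === 0 && !signBit) || (value === -1 && signBit)) {
      more = false; // remaining bits are pure sign extension
    } else {
      byte |= 0x80; // continuation bit
    }
    bytes.push(byte);
  }
  return bytes;
}

function decodeSignedLeb128(bytes) {
  let result = 0;
  let shift = 0;
  let byte;
  let i = 0;
  do {
    byte = bytes[i++];
    result |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  if (shift < 32 && (byte & 0x40)) {
    result |= -1 << shift; // sign-extend the last group
  }
  return result;
}
```

Under this scheme small values such as 0 or -1 take a single byte, while the extremes of the 32-bit range take five, which is where the file-size win over an 8-byte double would come from.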
I don't understand what this assertion is meant to accomplish. Naively it looks like it would forbid
"directive";
("directive");
which I don't think it should?
It's possible for valid JavaScript strings to be invalid Unicode strings - arising out of the fact that JS strings are specced as arbitrary sequences of 16-bit words. This means that invalid UCS-2 sequences, for example \udc11 (which is a lone surrogate pair component), can show up in our string literals.
The binast encoding needs to handle this - we cannot assume that there is always a valid translation of a JS string to a UTF-8 string. This all relates to situations where 16-bit chars fall into the surrogate pair range.
My suggestion is the following: we translate the 16-bit word sequence as if it was a UTF-16 string. This means that when we see valid surrogate pair sequences, we translate those into unicode codepoints and re-encode as a UTF-8 sequence.
When we see surrogate pair values that occur in invalid circumstances, we encode those directly as codepoints. These 16-bit chars are not valid unicode codepoints, so there is no valid UTF-8 sequence that corresponds to them. Those sequences are thus "free" for us to use to encode invalid 16-bit codepoints.
I'm not 100% sure this needs to be addressed in the spec, but @Yoric suggested I make the issue here because it may need to be addressed here.
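The suggestion above can be sketched as follows. This is only an illustration of the idea (it resembles what is commonly called WTF-8); the helper names are hypothetical and nothing here is taken from the spec.

```javascript
// Append the generalized UTF-8 bytes for one code point (or lone surrogate).
function encodeCodePoint(cp, out) {
  if (cp < 0x80) {
    out.push(cp);
  } else if (cp < 0x800) {
    out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
  } else if (cp < 0x10000) {
    out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
  } else {
    out.push(
      0xf0 | (cp >> 18),
      0x80 | ((cp >> 12) & 0x3f),
      0x80 | ((cp >> 6) & 0x3f),
      0x80 | (cp & 0x3f)
    );
  }
}

// Hypothetical encoder: treat the 16-bit units as UTF-16 where possible,
// and encode lone surrogates directly as if they were code points.
function encodeJsString(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const nextIsLow = next >= 0xdc00 && next <= 0xdfff;
    if (isHigh && nextIsLow) {
      // Valid pair: combine into one supplementary code point.
      encodeCodePoint(0x10000 + ((unit - 0xd800) << 10) + (next - 0xdc00), out);
      i++;
    } else {
      // BMP character or lone surrogate: encode the unit as-is.
      encodeCodePoint(unit, out);
    }
  }
  return Uint8Array.from(out);
}
```

A lone "\udc11" comes out as the three bytes ED B0 91, a sequence that strict UTF-8 never produces, so a decoder can map it back to the original 16-bit unit without ambiguity.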
Semantically, the labels are meaningless after parsing, so why include them as part of the AST?
In case you're wondering what precedent there is for just ignoring the extra case labels:
The following is a proposal to make clearer in the specifications which parts may be skipped and which may not.
In the current state of things, all lists may be skipped. This was decided because it was clear that hardcoding arbitrary interface names in the detokenizer was a bad idea, and because the only alternatives were allowing all interfaces to be skipped or allowing all lists to be skipped.
This is generally a waste of bytes, as we don't care about skipping most lists.
We add an extended attribute [Skippable] to interfaces. A [Skippable] interface is one in which
I'm not entirely clear about clause 2. at this stage, but I imagine that there are (or will be in the future) subsets of Asserted*Scope that are only useful when skipping. If so, this means that we only need them in a [Skippable] interface.
Now, we redefine FunctionDeclaration as follows:
typedef (EagerFunctionDeclaration or SkippableFunctionDeclaration) FunctionDeclaration;
interface EagerFunctionDeclaration {
// ...
// Pretty much what we currently have in `FunctionDeclaration`, unless we realize
// that some of the data is needed only for skipping.
}
[Skippable] interface SkippableFunctionDeclaration {
// Here, insert any attribute that may be useful for skipping.
// For instance, we could imagine an instruction for the parser to enqueue
// the node for background processing, with a given priority.
attribute EagerFunctionDeclaration content;
}
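As a rough illustration of what a [Skippable] wrapper buys a decoder, the sketch below assumes a byte layout in which each skippable node is prefixed with its byte length. That layout, and the function names, are assumptions for this example only; the proposal does not fix them here.

```javascript
// Assumed layout (illustration only): each [Skippable] node is serialized as
// a 4-byte little-endian byte length followed by its payload.
function writeSkippable(payloads) {
  const total = payloads.reduce((n, p) => n + 4 + p.length, 0);
  const bytes = new Uint8Array(total);
  const view = new DataView(bytes.buffer);
  let offset = 0;
  for (const payload of payloads) {
    view.setUint32(offset, payload.length, true);
    bytes.set(payload, offset + 4);
    offset += 4 + payload.length;
  }
  return bytes;
}

// Record where each payload lives without decoding any of it.
function indexSkippable(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const ranges = [];
  let offset = 0;
  while (offset < bytes.length) {
    const byteLength = view.getUint32(offset, true);
    const start = offset + 4;
    ranges.push({ start, end: start + byteLength });
    offset = start + byteLength; // jump over the body
  }
  return ranges;
}

const stream = writeSkippable([Uint8Array.of(1, 2, 3), Uint8Array.of(9)]);
const ranges = indexSkippable(stream);
// Decode a body only when it is actually needed, e.g. on first call.
const secondBody = stream.subarray(ranges[1].start, ranges[1].end);
```

An eager node would carry no such prefix, which is why marking only selected interfaces as skippable avoids paying the length cost everywhere.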
We apply the same treatment to other functions/method/getter/setter/...
EagerFunctionDeclaration and SkippableFunctionDeclaration.
TL;DR - this is cool because it unifies the JS ecosystem AST, and can lead to perf when compiling JS in the current toolchain.
We expect the format to be output by existing compilers such as Babel and TypeScript, and by bundlers such as WebPack.
If I or someone else doesn't bring it up at the meeting, I'll just note here that another reason this proposal is a 👍 from me is that it would help to unify the AST of the JavaScript ecosystem of tools (especially regarding babel, webpack, uglify, other parsers like acorn, esprima, etc). https://github.com/estree/estree was started, as most know, to unify this effort, but it has diverged in certain ways (babel 5 to babel 6 https://github.com/babel/babylon/#output although we have a compat plugin).
I believe some people wanted to do an ESTree 2 (I believe the reason Sebastian changed the AST in Babel 6 was just that the old AST was difficult to use when trying to create Babel plugins). If this becomes part of JS then it could be a nice forcing function for the tools that need to output the binary AST. Everyone else that opts in could also give perf benefits to developers, given we can pass the same AST between all tools (ex: babel -> webpack -> uglify) and not have to re-parse in that toolchain as well as in the browser itself.
Discussion on the actual AST is a bit different but ya just saying the idea is enticing!
Also interesting is that future proposals could propose the AST node as part of the staging process (and Babel can help with that given we can start implementation at Stage 0). So we'll already have progress in that area.
This can be another issue, but I had a question about whether comments would be encoded in the AST or whether that can be an extension. Webpack/uglify use comments to do chunking/dead code elim (it doesn't have to be in the output to the browser of course, but if the AST is extensible enough to encode that info in a node property/comment that would be good).
I'm revisiting this proposal a few months later, and I'm wondering: could this proposal be better specified in terms of raw bytes? Currently, it seems largely spec'd in terms of a JSON-like format, but IMHO that doesn't really seem like it's as small as it could be. For one, it could leverage LEB128 much like WASM does and in a similar fashion. It also doesn't need to keep type names or even operator names as strings, so I feel being a bit more binary could realize the proposal's intent a little better.
The following snippet demonstrates how the name of a FunctionExpression can be captured:
let a = function f() {
return function g() {
return f;
}
};
let b = a()(); // This is `f`
The same technique would probably work for Method, Getter, Setter.
For the moment, we have nowhere to store this information.
Currently AssertedVarScope has the following structure:
interface AssertedVarScope {
// checked eagerly during transformation
attribute FrozenArray<IdentifierName> lexicallyDeclaredNames;
attribute FrozenArray<IdentifierName> varDeclaredNames;
// checked lazily as inner functions are invoked
attribute FrozenArray<IdentifierName> capturedNames;
attribute boolean hasDirectEval;
};
So at least it doesn't distinguish between let and const.
This is troublesome for streaming compilation because the type of each name is unknown until we hit the VariableDeclaration, which can appear later in the statement list (there can be many other statements before it), and we cannot create an efficient representation of the scope until we have hit all of them.
Also, it would be nice if it also distinguished between var and function, which might be too SpiderMonkey-specific, though.
I'm not convinced it's required to marry the binary format too tightly to the JS source, and there are things you could do to make it simpler. It's not like the binary format is meant to be human-readable, just machine-readable. There are also idioms in compiled-to-JS code (e.g. from Elm, CoffeeScript, and Babel) that could also stand to benefit from a binary format that's at least aware of some of their needs.
Here's a few of the ideas I have to throw at the wall. Apologies if this comes across as a bit rambly.
* Adding a constant for undefined
  * void 0 (what UglifyJS replaces strict-mode undefined with) and similar.
  * undefined variable over void 0 for code size reasons.
* Allowing synthetic, anonymous locals that aren't viewable by direct eval or with
* Adding built-in source map support.
* Adding a built-in description/metadata field.
* Adding combined type-checking operators for typeof x === "number", etc.
  * typeof use cases, and engines already desugar these 100% of the time. (It's an obvious optimization.)
* Adding combined coercion operators for x | 0 / ~~x, !!x, x + "", etc.
  * x | 0 or at least act as a synonym.
* Separating the pragmas from the source/function body.
* Separating strings from the code, instead storing debug string/name references as {int offset; int length} pairs and putting their values consecutively in a string table before any real code references.
* Reducing logical "and"/"or" to corresponding if/else variants: test && other → (tmp = test) ? other : tmp, test || other → (tmp = test) ? tmp : other.
* Using breakable blocks with an expression-based AST instead of a statement-based one.
  * let x = foo(); try { x = foo(); } catch (e) { if (!(e instanceof SafeError)) throw e }, where it could be simplified to something roughly like let x = try { foo() } catch (e) { e instanceof SafeError ? undefined : throw e }. Edit: Missed a backtick.
* Requiring fallthrough to be explicit in switch statements.
  * switch would become much smaller when encoded.
* Declaring locals before script/module/function body.
* Declaring imports/exports before module data table.
* Encoding the bytecode as a hybrid register/stack machine.
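The string-table idea from the list above (references stored as offset/length pairs into one consecutive block) could look roughly like this. It is a sketch under assumptions: UTF-8 storage, deduplication, and the names buildStringTable and readString are all hypothetical.

```javascript
// Build one consecutive byte block plus { offset, length } references.
// Assumption: strings are stored as UTF-8 and deduplicated.
function buildStringTable(strings) {
  const encoder = new TextEncoder();
  const chunks = [];
  const refs = new Map(); // string -> { offset, length } in bytes
  let offset = 0;
  for (const s of strings) {
    if (refs.has(s)) continue; // each distinct string is stored once
    const bytes = encoder.encode(s);
    refs.set(s, { offset, length: bytes.length });
    chunks.push(bytes);
    offset += bytes.length;
  }
  const table = new Uint8Array(offset);
  let pos = 0;
  for (const chunk of chunks) {
    table.set(chunk, pos);
    pos += chunk.length;
  }
  return { table, refs };
}

// Resolve a reference back to its string.
function readString(table, ref) {
  const slice = table.subarray(ref.offset, ref.offset + ref.length);
  return new TextDecoder().decode(slice);
}

const { table, refs } = buildStringTable(["foo", "bar", "foo", "value"]);
```

Code nodes would then carry only the small integer pair, and repeated identifiers cost their bytes once.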
How would a binary AST encoded resource be delivered when SRI is involved? How should the hash be calculated on both the encoder and the decoder sides?
The current text of the proposal does not mention anything about mapping binary AST offsets back to source positions. If the original input is JavaScript source code, then I understand that debugging tools could show the pretty-printed version of the AST. However, if the input JS or the binary AST itself was produced by a compiler from another language, we need some way to map back to the original source file.
Directly leveraging source maps as they currently exist does not seem possible. They rely on very precise positions in the compiled .js file. If the .js file used for source mapping is the product of pretty-printing the AST, the source maps are very sensitively dependent on the pretty-printing algorithm. This would require to specify that algorithm along with the binary AST specification. IMO that is not a desirable situation.
Instead, I would suggest two possible paths to address this.
The first would be to store original positions of nodes directly in the AST. This has the advantage that it would produce extremely accurate source positions through the compilation pipeline of the VM, eventually resulting in better source mapping for the binary AST than what we enjoy with source maps at the moment. The disadvantage is that the binary AST itself is encumbered by positions, which would probably amount to a significant portion of the file size. Although useful for development "builds", it would unnecessarily bloat production files. That could be mitigated by a global flag at the beginning of the file telling whether positions are stored or not.
An alternative would be to define an equivalent to source maps, specifically designed for binary AST. Such source maps would map binary AST offsets to source positions.
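A minimal sketch of the second path, assuming a side table sorted by byte offset into the binary AST file. The entry shape, file names, and lookupPosition are hypothetical; no such format is defined by the proposal.

```javascript
// Hypothetical mapping entries, sorted by byte offset in the binary AST file.
const mappings = [
  { offset: 0, source: "app.ts", line: 1, column: 0 },
  { offset: 12, source: "app.ts", line: 3, column: 4 },
  { offset: 40, source: "util.ts", line: 10, column: 2 },
];

// Binary search for the last entry whose offset is <= the queried offset.
function lookupPosition(entries, offset) {
  let lo = 0;
  let hi = entries.length - 1;
  let found = null;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].offset <= offset) {
      found = entries[mid];
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

Because the keys are byte offsets rather than line/column positions in generated text, such a table would not depend on any pretty-printing algorithm.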
It is almost certainly too early to worry about this, but a couple notes while I'm thinking of them:
The tree grammar specified does not allow for (var a = b in c);, which is a legal program (in sloppy mode, assuming Annex B) as of tc39/ecma262#614.
There's a variety of ways that well-typed trees can fail to correspond to real programs, which should all be captured in this project (except that said project hasn't been updated for async/await yet). For example, you can't have an if with an else as the body of an if without an else, even though the tree types can represent that. You also have to make sure that Identifiers are actually identifiers and that sort of thing. These aren't captured by the early error rules because they don't match the lexical grammar, and so presumably will need to be checked explicitly.
The type for TemplateExpression can be made more strict by having something like
interface Interpolation {
attribute Expression value;
attribute TemplateElement after;
}
interface TemplateExpression : Node {
attribute Expression? tag;
attribute TemplateElement start;
attribute FrozenArray<Interpolation> elements;
};
instead of the current TemplateExpression definition which just has a list of elements which mixes Expressions and TemplateElements. Shift doesn't currently do this because it's kinda awkward to use (or, well, I think that was the justification, but have now forgotten), but this project might find it to be worth it.
All stage 1+ proposals need to live in the tc39 org on github. Please follow the transfer instructions to move this repo to the appropriate place.
The time required to create a full AST (without verifying annotations) was reduced by ~70-90%, which is a considerable reduction since parsing time in SpiderMonkey for the plain JavaScript was 500-800 ms for the benchmark.
Is "the time required to create a full AST" 500-800 ms? Or is it a subset of that? Maybe stating the actual reduction in milliseconds would be helpful.
Currently the FormalParameters params field is inside the FunctionExpressionContents interface, which is a [Lazy] attribute of the LazyFunctionExpression interface.
The function object's .length property needs that information, even before executing the function.
interface LazyFunctionExpression : Node {
attribute boolean isAsync;
attribute boolean isGenerator;
attribute BindingIdentifier? name;
attribute FrozenArray<Directive> directives;
[Lazy] attribute FunctionExpressionContents contents;
};
interface FunctionExpressionContents : Node {
attribute boolean isFunctionNameCaptured;
attribute boolean isThisCaptured;
attribute AssertedParameterScope parameterScope;
attribute FormalParameters params;
attribute AssertedVarScope bodyScope;
attribute FunctionBody body;
};
If we're going to entirely skip parsing the FunctionExpressionContents contents field for lazy functions until executing the function, the length of the formal parameters should be put into the LazyFunctionExpression interface.
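For readers less familiar with why the parameter list matters before any call, this small example shows that .length is observable on a function that has never been invoked, and that it depends on the shape of the parameters. The function names are made up for the illustration.

```javascript
// None of these functions is ever called, yet .length must already be known.
function simple(a, b, c) {}
function withDefault(a, b = 10, c) {}
function withRest(a, ...rest) {}

// .length counts parameters before the first default value or rest element.
const lengths = [simple.length, withDefault.length, withRest.length];
console.log(lengths);
```

A decoder that defers the whole contents node therefore needs this number stored outside the lazy part.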
Currently FunctionBody is FrozenArray.
typedef FrozenArray<Statement> FunctionBody;
but ArrowExpressionContents contains an attribute with (FunctionBody or Expression) type, which, I think, means FunctionBody and Expression should have a common super interface.
interface ArrowExpressionContents : Node {
...
attribute (FunctionBody or Expression) body;
};
So FunctionBody should have the following definition.
interface FunctionBody : Node {
attribute FrozenArray<Statement> statements;
};
Curious as to why even allow early errors to become an AST?
For the moment, literal numbers and booleans are represented by
interface LiteralInfinityExpression : Node { };
interface LiteralNumericExpression {
attribute double value;
};
interface LiteralBooleanExpression : Node {
attribute boolean value;
};
This typically means that a boolean will be stored as 2 bytes and a number other than infinity as 9 bytes. The latter is particularly odd, since the vast majority of numbers are 0 and 1. So, in AST v2, I had success decreasing the size of files by introducing special literal values for 0, true and false.
One way of doing this would be to introduce in the grammar
interface LiteralInfinityExpression : Node { };
typedef (LiteralZeroExpression or LiteralOneExpression or LiteralDoubleExpression) LiteralNumericExpression;
interface LiteralDoubleExpression {
attribute double value;
};
interface LiteralZeroExpression {};
interface LiteralOneExpression {};
typedef (LiteralTrueExpression or LiteralFalseExpression) LiteralBooleanExpression;
interface LiteralTrueExpression : Node { }
interface LiteralFalseExpression : Node { }
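A back-of-the-envelope size comparison of the two grammars, as a sketch. It assumes every node costs a 1-byte tag and a double costs 8 more bytes (consistent with the "9 bytes" figure above); the function names and sample values are invented for the example, and inputs are assumed finite.

```javascript
// Assumption for this sketch: 1-byte tag per node, 8-byte double payload.
const TAG_BYTES = 1;
const DOUBLE_BYTES = 8;

// Current grammar: every finite number is a tag plus a double.
function sizeWithSingleNumericNode(values) {
  return values.length * (TAG_BYTES + DOUBLE_BYTES);
}

// Proposed grammar: 0 and 1 get payload-free nodes of their own.
function sizeWithZeroAndOneNodes(values) {
  let total = 0;
  for (const v of values) {
    const isPlainZero = Object.is(v, 0); // -0 must stay a double
    total += TAG_BYTES + (isPlainZero || v === 1 ? 0 : DOUBLE_BYTES);
  }
  return total;
}

const literals = [0, 1, 0, 2.5, -0];
const before = sizeWithSingleNumericNode(literals);
const after = sizeWithZeroAndOneNodes(literals);
```

Note that -0 cannot share the zero node without changing program semantics, so it still pays for a full double in this sketch.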
Similarly, it seems odd to reserve one byte for UpdateExpression to determine whether it's a prefix or a postfix expression, so we could rewrite
typedef (PrefixUpdateExpression or PostfixUpdateExpression) UpdateExpression;
interface PrefixUpdateExpression : Node {
attribute UpdateOperator operator;
attribute SimpleAssignmentTarget operand;
};
interface PostfixUpdateExpression : Node {
attribute UpdateOperator operator;
attribute SimpleAssignmentTarget operand;
};
Admittedly, introducing a change in the AST solely for the sake of compression sounds a bit odd. An alternative would be to either trust compression (but this didn't seem to work that well in AST v2) or somehow make the (de)tokenizers smart enough to introduce the above changes transparently (note to self: the deanonymizer would certainly be smart enough to do this).
Starting the conversation here.
It is unclear to me from the documentation provided whether the size savings specified are the result of the format essentially requiring compression as part of its specification, vs actually being meaningfully smaller; esp. compared to minified+gzipped code, which is currently the de facto deployment.
I ask because this is a significant body of new codegen, and new parsers with new attack surface -- I think it is reasonable to know how the size of the binary AST compares to the size of a standard conforming JS in a normal tool assisted environment (which to be clear, the binary AST would require).
I haven't attempted to benchmark that, but in the current text source world, when parsing a function, the parser must:
"use strict", restart from 1 with different options.
That seems awkward. Maybe there is a better way to handle this in the binary world. Or maybe "use strict" doesn't really impact parsing all that much in the binary world, to be checked.
They're listed as strings in the spec, but it would seem highly inefficient to encode them that way. Are they in fact encoded as strings? (If not, you could encode them as LEB128 integers.)
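One way the question above could be answered is a fixed table shared by encoder and decoder, so that each enum value is written as its index. The sketch below uses a hypothetical subset of operator names; the actual table and its ordering would have to come from the spec.

```javascript
// Hypothetical subset of operator names, in an order that both the encoder
// and the decoder are assumed to agree on.
const OPERATORS = ["++", "--", "+", "-", "*", "/", "===", "!=="];
const OPERATOR_INDEX = new Map(OPERATORS.map((op, i) => [op, i]));

function encodeOperator(op) {
  const index = OPERATOR_INDEX.get(op);
  if (index === undefined) throw new RangeError(`unknown operator: ${op}`);
  return index; // a single byte while the table has fewer than 128 entries
}

function decodeOperator(index) {
  const op = OPERATORS[index];
  if (op === undefined) throw new RangeError(`unknown operator index: ${index}`);
  return op;
}
```

With a table this small, each operator costs one byte instead of a length-prefixed string.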
Binary AST could hold additional type information for variables, parameters and return types. This can help VMs do some optimizations ahead of time. Of course types should be statically validated, but potentially this static analysis could be done by the same toolchain as is used for binary AST creation. Was this idea considered?
Can we have a commitment to include a magic number in this new format? A lot of the pain that Node is experiencing around supporting ES Modules is the inability to determine whether a file is ESM or CJS based on the ambiguities that exist in the syntax and the lack of some kind of pragma. The controversy around .mjs vs .js exists almost entirely due to this ambiguity. Since Node can't use mime-types like the browser, a leading pragma or magic number in these new file formats that VMs can load would provide much more optionality than having to simply rely on file extension.
As @sebmarkbage and others have pointed out, the deferred-until-invocation-of-function error model is not amenable to streaming interpreters. That model is amenable to streaming compilers, but to be able to interpret a binary AST stream incrementally, errors need to be even more lazy.
It is not clear to me how to reconcile the two use cases. Maximum laziness on errors enables streaming interpreters, but hurts compiled code performance by requiring runtime checks. Per-function laziness means the entire function needs to be inspected before any code can execute.
In the browser world, streaming interpreters are not a compelling use case IMO. However, streaming interpreters are more compelling for an engine specialized only for run-once code (e.g., no JITs).
We should either come up with a technical solution that enables both use cases, or explicitly decide to not support one.
I believe that we should consider offering a subset of the syntax specialized for JSON-style literal expressions.
JSON.parse() to TypeArray and create a new JSON.binify();
typedef (... // Previous stuff
LiteralExpression)
Expression;
typedef (LiteralObjectExpression or
LiteralBooleanExpression or
LiteralStringExpression or
LiteralNullExpression or
LiteralNumericExpression or
LiteralArrayExpression)
LiteralExpression;
typedef (EagerLiteralObjectExpression or SkippableLiteralObjectExpression) LiteralObjectExpression;
typedef (EagerLiteralArrayExpression or SkippableLiteralArrayExpression) LiteralArrayExpression;
interface EagerLiteralObjectExpression {
attribute FrozenArray<LiteralObjectProperty> properties;
}
[Skippable] interface SkippableLiteralObjectExpression {
attribute EagerLiteralObjectExpression value;
}
interface LiteralObjectProperty {
attribute LiteralPropertyName name;
attribute LiteralExpression value;
}
interface EagerLiteralArrayExpression {
attribute FrozenArray<LiteralExpression> elements;
}
[Skippable] interface SkippableLiteralArrayExpression {
attribute EagerLiteralArrayExpression value;
}
I really think this proposal is great, and I wanted to suggest the addition of one relatively small high-level goal that I didn't see in the proposal.
It can be very useful in some cases to inline critical JavaScript into an HTML page, especially if the browser doesn't support HTTP/2 server push. I would love to see this project have as a high-level goal the ability for ASTs to be embedded into an HTML <script> tag. This could either be as the contents of the tag, or, more likely, as a data URI for the src attribute.
from https://bugzilla.mozilla.org/show_bug.cgi?id=1497788
If there are duplicate parameters, what should be in AssertedParameterScope?
The current spec requires AssertedPositionalParameterName for them.
For example, function f(a, a) {} will have the following data for AssertedParameterScope:
AssertedParameterScope {
paramNames: [
AssertedPositionalParameterName {
index: 0,
name: "a",
...
},
AssertedPositionalParameterName {
index: 1,
name: "a",
...
},
],
...
}
Given that the purpose of Asserted*Scope is to provide information about bindings, having a duplicate entry without any info about the duplication isn't nice.
The situation is the following:
What I can think of is the following 2 solutions:
Asserted*Name with index for non-last duplicate parameters
for example, AssertedPositionalDuplicateParameterName
AssertedParameterScope {
paramNames: [
AssertedPositionalDuplicateParameterName {
index: 0,
name: "a",
...
},
AssertedPositionalParameterName {
index: 1,
name: "a",
...
},
],
...
}
This way, CheckParameterNames and CheckPositionalParameterIndices can be done in almost the same way as the current ones.
AssertedParameterScope {
paramNames: [
AssertedPositionalParameterName {
index: 1,
name: "a",
...
},
],
...
}
This is the smallest, but CheckParameterNames and CheckPositionalParameterIndices would have to be modified in order to check for duplication.
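As background for both options, this snippet shows how sloppy-mode JavaScript treats a duplicated parameter name: the name binds to the last occurrence, while earlier positions remain reachable only through arguments. It uses the Function constructor so that it also runs when the surrounding code is strict; that choice is only a convenience for the example.

```javascript
// Functions created with the Function constructor are sloppy-mode unless
// their body says otherwise, so duplicate parameter names are allowed here.
const f = new Function("a", "a", "return [a, arguments[0], arguments[1]];");

// The binding `a` refers to the second (last) parameter.
const result = f(1, 2);
console.log(result);
```

This is why only the last duplicate needs a real binding entry, while the earlier positions matter mainly for index checks.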
Is this a typo or is there a semantics associated to this?
There seems to be very little mention of modules in this spec and issue queue - I assume an ECMAScript Module target will be supported? Ideally this should likely be the primary use case here.
I recently asked on the chat about a planned way to request Binary AST from the server and got the following answer:
@Yoric: We plan to have a mechanism, but we haven't attempted to design it yet. The vague consensus for the moment was to use something like <script src="..." binsrc="...">, which seems like the cheap way to keep it backwards-compatible.
While this is a relatively simple solution, I have a concern about limitations it imposes.
In particular, in an ideal world I think it would be reasonable to support a use case where e.g. a shared CDN with lots of JavaScript libraries could simply create Binary AST variants of all the assets, and return them instead of regular JavaScript when it knows that 1) the browser supports it and 2) such a change would be mostly invisible to the consumer (that is, JS was indeed requested via <script> or import(...) or other means purely for execution, and not with XMLHttpRequest or fetch).
To support use cases like that, the signal for Binary AST support should come not from the HTML level (as it's much harder to get HTML updated on all the websites where the script is inserted), but rather from the network level.
One way to do this would be adding binast or a similar marker to the Accept-Encoding list for script requests in supporting browsers, which would tell the server that the Binary AST version can be safely returned with Content-Encoding: binast in the response.
Using encoding headers for this goal feels quite natural, as it's mostly an encoding format for JavaScript, although one might argue that because it's not lossless in terms of debugging information, it doesn't belong in the Accept-Encoding/Content-Encoding headers - in that case, I'm open to any proposals.
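On the server side, the negotiation described above could be sketched like this. The binast token is the one suggested in this issue and is not a registered content coding; the function names and the asset shape are hypothetical, and quality values such as q=0 are deliberately ignored to keep the sketch short.

```javascript
// Does the Accept-Encoding header list the (proposed) binast token?
// Simplification: quality values like ";q=0" are ignored.
function acceptsBinast(acceptEncoding) {
  return (acceptEncoding || "")
    .split(",")
    .map((token) => token.split(";")[0].trim().toLowerCase())
    .includes("binast");
}

// Pick which variant of an asset to send. `asset` is a hypothetical record
// holding the plain JS body and, optionally, a Binary AST variant.
function chooseResponse(requestHeaders, asset) {
  if (asset.binast && acceptsBinast(requestHeaders["accept-encoding"])) {
    return {
      body: asset.binast,
      headers: { "Content-Encoding": "binast", Vary: "Accept-Encoding" },
    };
  }
  return { body: asset.js, headers: { Vary: "Accept-Encoding" } };
}
```

Sending Vary: Accept-Encoding in both branches keeps shared caches from handing the binary variant to a client that never asked for it.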
I got an impromptu drive-by review from the Firefox DOM team. One of the suggestions was to turn all our Interfaces into Dictionaries.
This question might not yet be answerable.
I'm curious about the impact of this proposal on resource-constrained devices.
Am I correct that a spec-compliant VM would need to implement a second, completely separate parser for the binary AST format? A resource-constrained device may find the added weight of this implementation burdensome.
However, if such a VM could be deployed with only the binary AST parser implementation, is it likely or unlikely to actually reduce the footprint of the VM compared to the way things currently stand (in terms of both memory and persistent storage usage)?
This proposal would certainly improve parsing speed on a puny device, but I'm wondering if there are other costs.
Thanks!
UPDATE: By "persistent storage usage" I mean size of the firmware blob.
Are you aware of JSZap from Microsoft?
https://www.microsoft.com/en-us/research/publication/jszap-compressing-javascript-code/
Seems very close to the purpose of this proposal
Consider the following snippet:
try {
...
} catch (e) {
return function() {
throw e; // Or, really, do anything with `e`.
}
}
In this snippet, the anonymous function captures a binding e which was introduced implicitly by a CatchClause. For the moment, we do not have a way of representing this in the AST.
As discussed on Gitter, we might want to do this to simplify handling and provide optimised representations (like binast/binjs-ref#239) more easily for both kinds of numbers.
Some points from the discussion:
* double can store infinity values just fine in any IEEE 754 compatible representation, so this shouldn't be an issue, although one voiced concern is that WebIDL defines double as finite floating 64-bit numbers, but this likely shouldn't be a problem due to how we use it.
* LiteralNumericExpression that would produce something like 1e111111111... (see https://bugzilla.mozilla.org/show_bug.cgi?id=1524302#c2), but implementors would need to do the same for LiteralInfinityExpression as well, so it's only a matter of where to put the branching.
In the meeting notes of the July TC39 meeting, it was said that the space savings of the AST are minimal (compared to a minified version) - about 5%, so AST-ing the source code doesn't give a lot of benefit in terms of space-savings.
OTOH, the main (and considerable!) parse-time savings are accomplished from:
* Knowing which variables are hoisted in a scope
* Knowing which variables are "closed over" in a scope
* Marking the presence of an eval or with in a scope
* Early errors can de-optimize lazy parsing
* Efficiencies when using binary instead of text characters.
Then again, defining an AST is a big endeavor, and may be fraught with politics, not to mention having to maintain it forever every time the language changes syntactically.
So given that an AST is problematic, and yet the parse-time savings from the proposal are considerable, I was wondering whether we could get the parse-time savings by using various markers in the source code to add the information that is missing. In other words, the above points that are saved can be encoded into the source code using special comments or string directives. Something like:
//@efficient-parse
var x;
function f() {
//@efficient-parse: hoisted[x]
function use_x() {
use(x);
}
var x;
}
(I wouldn't even call the above a strawman proposal, but just an incredibly rough draft to determine where I'm going here...)
This will probably enable parsers to deal with the first four of the points listed above, but will not let them deal with the last point (efficiencies from the use of binary instead of text characters), given that it's still keeping the source code as source code with text characters. Unfortunately, I don't know how much of the performance improvement came from each of the points above. It may be that 90% of the optimization was because text was discarded, which makes my proposal of keeping the source code as text dead on arrival. :-)
So I can't really know whether my proposal is efficient, but if it is, I believe the benefits of keeping the source code as source code and not going into the endeavor of defining an AST for JS may outweigh the decrease in performance that keeping source code as text will make.
So I fully understand the differences, but some visitors may not. It might be nice to have a section about the differences between this and WebASM, and maybe (for the rest of us) why this is seen as more suitable or different than WebASM philosophically/practically.
Another extension for https://github.com/binast/ecmascript-binary-ast/issues/50
The requirement on the implementation is the following:
https://bugzilla.mozilla.org/show_bug.cgi?id=1475458
To satisfy it without changing the implementation (this is just for reference. I'm also thinking about changing implementation side), the following structure is necessary:
enum ParameterKind {
"simple",
"default",
"destructuring",
"destructuring default",
"rest",
"destructuring rest"
};
interface AssertedParameterName {
attribute unsigned short index;
attribute ParameterKind kind;
attribute IdentifierName name;
attribute boolean isCaptured;
};
interface AssertedParameterScope {
attribute FrozenArray<AssertedParameterName> paramNames;
attribute boolean hasDirectEval;
};
index is where the binding is defined in the parameters list.
For example, function f(a, b=10, {c}, [d, e] = [], f, ...g) {} has the following scope data:
AssertedParameterScope {
paramNames: [
AssertedParameterName {
index: 0, kind: "simple", name: "a", isCaptured: false
},
AssertedParameterName {
index: 1, kind: "default", name: "b", isCaptured: false
},
AssertedParameterName {
index: 2, kind: "destructuring", name: "c", isCaptured: false
},
AssertedParameterName {
index: 3, kind: "destructuring default", name: "d", isCaptured: false
},
AssertedParameterName {
index: 3, kind: "destructuring default", name: "e", isCaptured: false
},
AssertedParameterName {
index: 4, kind: "simple", name: "f", isCaptured: false
},
AssertedParameterName {
index: 5, kind: "rest", name: "g", isCaptured: false
},
],
hasDirectEval: false
}
Also, we need bodyScope before param.
Thanks for such a cool proposal!
Continuing on from this thread, I wanted to suggest that there should be some sort of structured way for browsers to tell servers whether or not the browser understands Binary AST files, and, if supported, which version(s) are understood.
I think the most likely candidate for implementation is the HTTP Accept header, although it gets a bit complicated when combined with features like server push or inlined scripts.
If browsers don't send an Accept header or something like it, servers will have to use user-agent sniffing to figure out whether to send Binary AST or JavaScript. In my experience with sending different versions of JS to browsers, this is a pretty cumbersome and error-prone solution.
Thanks again for this!
Right now the reference encoder feels free to sort parameter bindings lexicographically, which doesn't play nicely with engines.
This might be beyond the scope of the current proposal, but I expect that we'll want to apply binary parsing also for JSON data.
For the specific case of JSON, I suspect that we want the data to be checked eagerly. Does this mean that we want to guarantee that some subsets of ECMAScript will be checked eagerly? Or that we want to be able to specify that some files need to be parsed eagerly?
When experimenting with streaming compilation from a multipart .binjs file to SpiderMonkey bytecode,
the order of the fields often becomes a problem [1][2][3].
The issue is that, if we neither seek/look ahead nor keep an in-memory AST converted from the .binjs file, we have to compile in the same order as the .binjs file.
But SpiderMonkey often emits bytecode in a different order than the original JS syntax (or interleaves subtrees),
and sometimes applies optimizations that depend on nodes which appear later.
The root cause is that we cannot look ahead without extra overhead in the current format
(thus the current experimental implementation doesn't support seeking),
because non-Skippable nodes don't have a length property at the beginning of their serialized data.
Of course we could emit different bytecode than for the original JS, in the same order as the .binjs file, but that would be a source of extra .binjs-specific bugs, and I'd like to avoid it as much as possible.
So it would be nice if we could support (maybe partial) tree traversal without reading the same field twice, at the file-format level.
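To make the length-prefix idea concrete, here is a minimal sketch of a reader that can jump over a subtree without decoding it. The byte layout (one tag byte, one length byte, then the payload) is invented for this example and is not the real .binjs encoding; encodeNode and Reader are hypothetical names.

```javascript
// Hypothetical layout, NOT the real .binjs format:
//   node := tag (1 byte) | payloadLength (1 byte) | payload bytes
// A payload consists of the node's children in the same layout.
function encodeNode(tag, children = []) {
  const payload = children.flat();
  if (payload.length > 255) {
    throw new RangeError("payload too large for this one-byte-length sketch");
  }
  return [tag, payload.length, ...payload];
}

class Reader {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }

  // Read the header of the node at the cursor and move into its payload.
  enter() {
    const tag = this.bytes[this.pos];
    const length = this.bytes[this.pos + 1];
    this.pos += 2;
    return { tag, end: this.pos + length };
  }

  // Jump over the node at the cursor without decoding its payload.
  skip() {
    const { end } = this.enter();
    this.pos = end;
  }
}

// Usage: reach the second child of the root without decoding the first.
const inner = encodeNode(0x0b, [encodeNode(0x0c)]);
const leaf = encodeNode(0x0a);
const root = encodeNode(0x01, [inner, leaf]);

const reader = new Reader(root);
reader.enter(); // step into the root node
reader.skip(); // jump over `inner`; its child 0x0c is never decoded
const second = reader.enter(); // header of `leaf`
```

The cost is one extra length field per node, which is the file-size overhead that the current format avoids for non-Skippable nodes.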
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1456006#c1
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1456404#c2
[3] https://github.com/binast/ecmascript-binary-ast/issues/33
from https://bugzilla.mozilla.org/show_bug.cgi?id=1472103#c20
attribute unsigned long length;
would need a comment stating that it is the function object's length property.
At first I thought this is the same issue as https://github.com/binast/ecmascript-binary-ast/issues/30, but I guess it's a bit different.
While creating binding data for parameters, we want a list of positional formal parameters and their indices that maps directly to elements of arguments, so that each binding name can be mapped to its arguments slot at the point of reading the scope data.
With the current spec, this is unknown before parsing FormalParameters.
What I propose is the following:
interface AssertedPositionalParameterName {
attribute unsigned short index;
attribute IdentifierName name;
attribute boolean isCaptured;
};
interface AssertedParameterName {
attribute IdentifierName name;
attribute boolean isCaptured;
};
typedef (AssertedPositionalParameterName or
AssertedParameterName)
AssertedMaybePositionalParameterName;
interface AssertedParameterScope {
attribute FrozenArray<AssertedMaybePositionalParameterName> paramNames;
attribute boolean hasDirectEval;
};
AssertedPositionalParameterName contains the index, which is the index in the parameter list and also the index in arguments. (To be clear, it is not the index in the paramNames array.)
AssertedParameterName is basically the same thing as the current AssertedBoundName.
For example, function f(a, b=10, {c}, [d, e] = [], f, ...g) {} has the following scope data:
AssertedParameterScope {
paramNames: [
AssertedPositionalParameterName {
index: 0, name: "a", isCaptured: false
},
AssertedParameterName {
name: "b", isCaptured: false
},
AssertedParameterName {
name: "c", isCaptured: false
},
AssertedParameterName {
name: "d", isCaptured: false
},
AssertedParameterName {
name: "e", isCaptured: false
},
AssertedPositionalParameterName {
index: 4, name: "f", isCaptured: false
},
AssertedParameterName {
name: "g", isCaptured: false
},
],
hasDirectEval: false
}
Here, a and f are positional parameters, and their indices are 0 and 4.
The isSimpleParameterList field is removed, because its value is obvious from paramNames (by checking whether there is at least one AssertedParameterName).
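The two claims above (binding names map to arguments slots straight from the scope data, and isSimpleParameterList can be derived) can be sketched as follows. The type field used to tell the two interfaces apart is an assumption of this example, as are the helper names argumentSlots and isSimpleParameterList; a real decoder would know the interface from the encoded union tag.

```javascript
// Scope data for: function f(a, b=10, {c}, [d, e] = [], f, ...g) {}
// Assumption: each entry carries a `type` string naming its interface.
const scope = {
  paramNames: [
    { type: "AssertedPositionalParameterName", index: 0, name: "a", isCaptured: false },
    { type: "AssertedParameterName", name: "b", isCaptured: false },
    { type: "AssertedParameterName", name: "c", isCaptured: false },
    { type: "AssertedParameterName", name: "d", isCaptured: false },
    { type: "AssertedParameterName", name: "e", isCaptured: false },
    { type: "AssertedPositionalParameterName", index: 4, name: "f", isCaptured: false },
    { type: "AssertedParameterName", name: "g", isCaptured: false },
  ],
  hasDirectEval: false,
};

// Map each positional binding name to its slot in `arguments`,
// using only the scope data (FormalParameters has not been parsed yet).
function argumentSlots(scope) {
  const slots = new Map();
  for (const param of scope.paramNames) {
    if (param.type === "AssertedPositionalParameterName") {
      slots.set(param.name, param.index);
    }
  }
  return slots;
}

// The list is simple exactly when no plain AssertedParameterName occurs.
function isSimpleParameterList(scope) {
  return scope.paramNames.every(
    (param) => param.type === "AssertedPositionalParameterName"
  );
}

const slots = argumentSlots(scope);
```

For the example function, slots contains only a and f, and isSimpleParameterList(scope) is false because b, c, d, e and g are non-positional.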
Even if it's well-typed, Babylon ASTs permit things which are not valid JavaScript syntax (for example, module declarations in scripts). What's the idea for how to validate these ASTs? I suppose that, with the "delayed early error" idea, you validate at function granularity, when the function is actually called. This would include not just what would be JavaScript early errors, but also anything that is malformed about the Babylon AST. Is that what you had in mind?
README states:
Why not use WebAssembly?
There are massive existing untyped codebases, and there is no easy way to convert an untyped, garbage collected language to WebAssembly. And even if there were, there is no guarantee that it would be any faster to transmit/parse/start than what we currently have.
whereas WebAssembly FAQ states:
The kind of binary format being considered for WebAssembly can be natively decoded much faster than JavaScript can be parsed (experiments show more than 20× faster). On mobile, large compiled codes can easily take 20–40 seconds just to parse, so native decoding (especially when combined with other techniques like streaming for better-than-gzip compression) is critical to providing a good cold-load user experience.
For the moment, we have discussed the following annotations:
(moved from binast/binjs-ref#131)
IfStatement has an optional alternate statement, and the existence of the alternate is unknown until we start parsing it. That means it is unknown when generating the branch opcode or the bytecode for consequent.
interface IfStatement : Node {
attribute Expression test;
// The first `Statement`.
attribute Statement consequent;
// The second `Statement`, if present.
attribute Statement? alternate;
};
So, with the current IfStatement interface, we have to modify the branch's kind (source note) once it turns out that there is an alternate.
It would be better if that kind of information were known at the beginning.
Separate IfStatement into IfStatement without alternate, and IfElseStatement with alternate:
interface IfStatement : Node {
attribute Expression test;
attribute Statement consequent;
};
interface IfElseStatement : Node {
attribute Expression test;
attribute Statement consequent;
attribute Statement alternate;
};
alternate becomes smaller in the .binjs file.
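A small sketch of why the split helps a single-pass emitter: with separate IfStatement and IfElseStatement nodes, the branch kind is fixed as soon as the node tag is read, before any child is decoded. The token stream and the opcode strings below are invented for illustration and do not correspond to SpiderMonkey bytecode or to the .binjs layout.

```javascript
// Hypothetical pre-order token stream: a node tag followed by its fields.
// Leaf fields are plain strings standing in for already-compiled code.
function emitIf(stream) {
  const tag = stream.shift();
  const out = [];
  if (tag === "IfStatement") {
    out.push(stream.shift()); // test
    out.push("JumpIfFalse end"); // known up front: there is no else part
    out.push(stream.shift()); // consequent
  } else if (tag === "IfElseStatement") {
    out.push(stream.shift()); // test
    out.push("JumpIfFalse else"); // known up front: an else part follows
    out.push(stream.shift()); // consequent
    out.push("Jump end", "else:");
    out.push(stream.shift()); // alternate
  } else {
    throw new TypeError("unexpected node tag: " + tag);
  }
  out.push("end:");
  return out;
}

const withoutElse = emitIf(["IfStatement", "test", "consequent"]);
const withElse = emitIf(["IfElseStatement", "test", "consequent", "alternate"]);
```

Neither path goes back to patch the branch after the fact, which is the revision step the combined interface forces when alternate turns out to be present.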