rbuckton / grammarkdown

Markdown-like DSL for defining grammatical syntax for programming languages.

Home Page: https://rbuckton.github.io/grammarkdown/

License: MIT License

TypeScript 43.30% JavaScript 1.99% HTML 54.57% CSS 0.13%

grammarkdown's Issues

No check for guard which references undefined parameter

Running this project on the following grammar does not produce any check failures:

A[One] ::
  [+One] `x`
  [+Two] `x`

I would expect a check to fail here: the [+Two] guard on the second right-hand side is meaningless because A does not have a Two parameter.
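A minimal sketch of the requested check, assuming a simplified and hypothetical data model (this is not grammarkdown's actual AST): each production lists its declared parameters, and each alternative lists the guards in front of it.

```javascript
// Hypothetical, simplified model of the grammar above; not grammarkdown's real AST.
// A[One] ::
//   [+One] `x`
//   [+Two] `x`
const production = {
  name: "A",
  parameters: ["One"],
  alternatives: [
    { guards: [{ operator: "+", name: "One" }], symbols: ["`x`"] },
    { guards: [{ operator: "+", name: "Two" }], symbols: ["`x`"] },
  ],
};

// Returns one message per guard that names a parameter the production does not declare.
function findUndeclaredGuardParameters(prod) {
  const declared = new Set(prod.parameters);
  const messages = [];
  for (const alt of prod.alternatives) {
    for (const guard of alt.guards) {
      if (!declared.has(guard.name)) {
        messages.push(
          `Production '${prod.name}' has no parameter '${guard.name}' ` +
          `(used in guard [${guard.operator}${guard.name}]).`
        );
      }
    }
  }
  return messages;
}

console.log(findUndeclaredGuardParameters(production));
```

With the grammar from this issue, only the [+Two] guard would be reported.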

Support parsing grammar fragments

It would be great to support parsing of fragments along the lines of NonTerminal, Initializer?, and Pattern[U]. This would allow <emu-grammar> to be used universally for full productions and references (and Emd could get out of the business of parsing the contents inside of pipes).

All of the above cases should emit an <emu-nt> element with the appropriate attributes (none, optional, and params="U" respectively).

Parse '[&gt;' as '[>'

See tc39/ecma262#570 (comment)

Related: tc39/ecmarkup#266

[Suggestion] Add support for Unicode character ranges

Originally part of #9:

@jmdyck wrote:

There are a couple of constructs used in ES6 productions that don't appear to be supported by grammarkdown:

24.3.1's production for DoubleStringCharacter has the RHS:
    SourceCharacter but not one of " or \ or U+0000 through U+001F
...

Add emu-grammar lint rules from ecmarkup to Grammarkdown's grammar checker

ecmarkup has a few lint rules that grammarkdown doesn't enforce on its own, but that make sense:

[empty] assertions should have no other content: https://github.com/tc39/ecmarkup/blob/52df46d6177a14f6d101a34aecbebf5834c77571/src/lint/utils.ts#L70
but not operators cannot be empty: https://github.com/tc39/ecmarkup/blob/52df46d6177a14f6d101a34aecbebf5834c77571/src/lint/utils.ts#L132
one of operators cannot be empty: https://github.com/tc39/ecmarkup/blob/52df46d6177a14f6d101a34aecbebf5834c77571/src/lint/utils.ts#L144
Parsing constraints on a RHS should report an error if no symbols follow: https://github.com/tc39/ecmarkup/blob/52df46d6177a14f6d101a34aecbebf5834c77571/src/lint/utils.ts#L66

There are also a few cases that indicate some property types include | undefined that shouldn't, including:

Production.body (unconditionally set during parse)
Argument.name (unconditionally set during parse)

In other cases, grammarkdown already reports errors during parse that are also being reported by lint:

Arguments should have operators:
- ecmarkup check
- grammarkdown check

In general, I'd like to move a lot of these checks to grammarkdown since they aren't precisely "lint" checks (i.e., cases where you're enforcing stylistic consistency with otherwise acceptable source text), but rather are syntactic or semantic checks that should result in grammarkdown reporting errors regardless of consumer (be it ecmarkup, or the grammarkdown-vscode extension, etc.).

cc: @bakkot

bin/grammarkdown points to nonexistent file

grammarkdown/bin/grammarkdown

Line 2 in e225ef1

require('../out/lib/cli.js')

refers to ../out/lib/cli.js, but the file it actually wants ends up in ../dist/cli.js.

emitting non-strings

For ecmarkup, I need to associate the generated HTML productions with their original Production objects. The obvious way to do that is to make an emitter which creates JSDom nodes directly and sticks them in a WeakMap pointing back to the original object. But the Emitter API is string-based. Would it be possible to allow more general emitters which support emitting other types?

My current workaround is to assume that the production and emu-rhs elements are in the same order as in the original and index each list, which seems to work. So there's no urgency in supporting this.

Sidebar: as a question of API design, why is the emitter a property of the grammar? I would expect it to be passed in to the emit method. (That would be helpful here because then the emit method could have a type parameter for the type produced by the emitter, which doesn't work when the emitter is state instead of being an argument.)
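To illustrate the sidebar, here is a rough sketch of an emit function that takes the emitter as an argument and returns whatever that emitter produces. The names emit, emitProduction, and objectEmitter are hypothetical and are not grammarkdown's API; the plain object stands in for a JSDom node.

```javascript
// Hypothetical API shape, not grammarkdown's: because the emitter is an
// argument, `emit` returns whatever type that emitter produces.
function emit(productions, emitter) {
  return productions.map((production) => emitter.emitProduction(production));
}

// Maps each emitted node back to the production it came from.
const origin = new WeakMap();

const objectEmitter = {
  emitProduction(production) {
    // Stand-in for a JSDom element; a real emitter would create DOM nodes here.
    const node = { tagName: "emu-production", name: production.name };
    origin.set(node, production);
    return node;
  },
};

const fooProduction = { name: "Foo" };
const nodes = emit([fooProduction], objectEmitter);
console.log(origin.get(nodes[0]) === fooProduction);
```

In a typed setting, the same shape would let emit carry a type parameter for the emitter's result, which is the point raised above.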

The condition `but not one of ...` does not emit properly for ecmarkup.

Currently emits:

<emu-nt>SourceCharacter</emu-nt> one of <emu-t>/</emu-t> or <emu-t>*</emu-t>

But it should be:

<emu-nt>SourceCharacter</emu-nt> <emu-gmod>but not one of <emu-t>/</emu-t> or <emu-t>*</emu-t></emu-gmod>
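A small sketch of an emit helper that produces the expected markup. The helper names are hypothetical; only the element names (emu-nt, emu-gmod, emu-t) come from the expected output above.

```javascript
// Escapes the characters that are significant in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: emits `X but not one of a or b` in the expected form,
// keeping the "but not" wording inside <emu-gmod>.
function emitButNotOneOf(nonterminal, terminals) {
  const items = terminals.map((t) => `<emu-t>${escapeHtml(t)}</emu-t>`).join(" or ");
  return `<emu-nt>${escapeHtml(nonterminal)}</emu-nt> <emu-gmod>but not one of ${items}</emu-gmod>`;
}

console.log(emitButNotOneOf("SourceCharacter", ["/", "*"]));
```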

Add support for ins and del tags in one of grammars

This is a continuation to #26 for adding more support for <ins> and <del> tags.

Ideally the following would work:

<emu-grammar type="definition">
  Keyword :: one of
    `await` `break` `case` `catch` `class` `const` `continue` `debugger` `default` `delete` `do` `else` <ins>`enum`</ins> `export` `extends` `finally` `for` `function` `if` `import` `in` `instanceof` `new` `return` `super` `switch` `this` `throw` `try` `typeof` `var` `void` `while` `with` `yield`
</emu-grammar>

Current output:

<emu-production name="Keyword" type="lexical" oneof="" id="prod-Keyword">
  <emu-nt><a href="#prod-Keyword">Keyword</a></emu-nt>
  <emu-geq>::</emu-geq>
  <emu-oneof>one of</emu-oneof>
  <emu-rhs>
    <emu-t>await</emu-t>
    <emu-t>break</emu-t>
    <emu-t>case</emu-t>
    <emu-t>catch</emu-t>
    <emu-t>class</emu-t>
    <emu-t>const</emu-t>
    <emu-t>continue</emu-t>
    <emu-t>debugger</emu-t>
    <emu-t>default</emu-t>
    <emu-t>delete</emu-t>
    <emu-t>do</emu-t>
    <emu-t>else</emu-t>
    <emu-t>enum</emu-t>
    <emu-t>export</emu-t>
    <emu-t>extends</emu-t>
    <emu-t>finally</emu-t>
    <emu-t>for</emu-t>
    <emu-t>function</emu-t>
    <emu-t>if</emu-t>
    <emu-t>import</emu-t>
    <emu-t>in</emu-t>
    <emu-t>instanceof</emu-t>
    <emu-t>new</emu-t>
    <emu-t>return</emu-t>
    <emu-t>super</emu-t>
    <emu-t>switch</emu-t>
    <emu-t>this</emu-t>
    <emu-t>throw</emu-t>
    <emu-t>try</emu-t>
    <emu-t>typeof</emu-t>
    <emu-t>var</emu-t>
    <emu-t>void</emu-t>
    <emu-t>while</emu-t>
    <emu-t>with</emu-t>
    <emu-t>yield</emu-t>
  </emu-rhs>
</emu-production>

The <ins> is gone. What it should look like:

<emu-production name="Keyword" type="lexical" oneof="" id="prod-Keyword">
  <emu-nt><a href="#prod-Keyword">Keyword</a></emu-nt>
  <emu-geq>::</emu-geq>
  <emu-oneof>one of</emu-oneof>
  <emu-rhs>
    <emu-t>await</emu-t>
    <emu-t>break</emu-t>
    <emu-t>case</emu-t>
    <emu-t>catch</emu-t>
    <emu-t>class</emu-t>
    <emu-t>const</emu-t>
    <emu-t>continue</emu-t>
    <emu-t>debugger</emu-t>
    <emu-t>default</emu-t>
    <emu-t>delete</emu-t>
    <emu-t>do</emu-t>
    <emu-t>else</emu-t>
    <emu-t><ins>enum</ins></emu-t>
    <emu-t>export</emu-t>
    <emu-t>extends</emu-t>
    <emu-t>finally</emu-t>
    <emu-t>for</emu-t>
    <emu-t>function</emu-t>
    <emu-t>if</emu-t>
    <emu-t>import</emu-t>
    <emu-t>in</emu-t>
    <emu-t>instanceof</emu-t>
    <emu-t>new</emu-t>
    <emu-t>return</emu-t>
    <emu-t>super</emu-t>
    <emu-t>switch</emu-t>
    <emu-t>this</emu-t>
    <emu-t>throw</emu-t>
    <emu-t>try</emu-t>
    <emu-t>typeof</emu-t>
    <emu-t>var</emu-t>
    <emu-t>void</emu-t>
    <emu-t>while</emu-t>
    <emu-t>with</emu-t>
    <emu-t>yield</emu-t>
  </emu-rhs>
</emu-production>

That renders correctly in ecmarkup. If you put the <ins> on the outside of the <emu-t> tag it looks odd because they apply a margin-right to the <emu-t> tag.

Feature request: Extract grammarkdown source from ecmarkup

I see there's a grammar for ECMAScript 2020. It'd be nice if grammarkdown provided an option for emitting Grammarkdown source from an Ecmarkup file. Something to the effect of:

$ wget https://raw.githubusercontent.com/tc39/ecma262/es2022/spec.html
$ grammarkdown --extract spec.html
@ line 5012 file:///Users/Alhadis/Desktop/spec.html

  StringNumericLiteral :::
    StrWhiteSpace?
    StrWhiteSpace? StrNumericLiteral StrWhiteSpace?

   StrWhiteSpace :::
     StrWhiteSpaceChar StrWhiteSpace?

   StrWhiteSpaceChar :::
     WhiteSpace
     LineTerminator
⋮

grammarkdown emitter does not include blank lines between productions wrapped in `<ins>`

A simple reproduction:

'use strict';
let { CoreAsyncHost, Grammar, GrammarkdownEmitter } = require('grammarkdown');

let source = `
X ::
  Y

<ins>
X ::
  Y
</ins>

X ::
  Y
`;

(async()=>{
  const grammarHost = CoreAsyncHost.forFile(source);
  const grammar = new Grammar([grammarHost.file], {}, grammarHost);
  await grammar.bind();
  (new GrammarkdownEmitter({})).emit(grammar.rootFiles[0], grammar.resolver, grammar.diagnostics, (file, result) => {
    console.log(result);
  });
})().catch(e => {
  console.error(e);
  process.exit(1);
});

This prints

X ::
    Y

<ins>X ::
    Y
</ins>
X ::
    Y

which seems suboptimal.

Build fails on node 12

gulp 3 uses an ancient version of natives, which does not work on node 12. (See gulpjs/gulp#2324.)

Checker test.grammar diagnostic is absolute-path sensitive

After cloning and running npm install, I got an immediate test failure.

Root cause appears to be the absolute path names in the baseline files named test.grammar.diagnostics. The contents of this file used in a test depends on the root directory where the project is cloned. It seems like this should be processed to remove the root directory prefix before comparing the local baseline against the reference baseline.

The verbatim test output:


[12:44:30] Starting 'test:lib'...
  ................................

  31 passing (766ms)
  1 failing

  1) Checker test.grammar diagnostics:
     Error: The baseline file 'test.grammar.diagnostics' has changed.
      at Object.compareBaseline (C:\code\grammarkdown\src\tests\diff.ts:140:15)
      at Context.<anonymous> (C:\code\grammarkdown\src\tests\checker-tests.ts:32:13)
      at callFn (C:\code\grammarkdown\node_modules\mocha\lib\runnable.js:326:21)
      at Test.Runnable.run (C:\code\grammarkdown\node_modules\mocha\lib\runnable.js:319:7)
      at Runner.runTest (C:\code\grammarkdown\node_modules\mocha\lib\runner.js:422:10)
      at C:\code\grammarkdown\node_modules\mocha\lib\runner.js:528:12
      at next (C:\code\grammarkdown\node_modules\mocha\lib\runner.js:342:14)
      at C:\code\grammarkdown\node_modules\mocha\lib\runner.js:352:7
      at next (C:\code\grammarkdown\node_modules\mocha\lib\runner.js:284:14)
      at Immediate.<anonymous> (C:\code\grammarkdown\node_modules\mocha\lib\runner.js:320:5)

[12:44:30] 'test:lib' errored after 865 ms
[12:44:30] Error in plugin 'gulp-mocha'
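One way the suggested normalization could look, as a sketch: replace the clone directory with a fixed placeholder before comparing baselines. The function name and the <root> placeholder are assumptions for illustration, not part of the project's test harness.

```javascript
const PLACEHOLDER = "<root>";

// Replaces every occurrence of the clone directory (in either slash style)
// with a fixed placeholder, so baselines do not depend on where the repo lives.
// Assumes `rootDir` is a non-empty absolute path.
function normalizeBaseline(text, rootDir) {
  const variants = new Set([
    rootDir,
    rootDir.replace(/\\/g, "/"),
    rootDir.replace(/\//g, "\\"),
  ]);
  let result = text;
  for (const variant of variants) {
    result = result.split(variant).join(PLACEHOLDER);
  }
  // Use forward slashes after the placeholder so Windows and POSIX output agree.
  return result.replace(/<root>[^\s:()]*/g, (path) => path.replace(/\\/g, "/"));
}

console.log(
  normalizeBaseline(
    "at Object.compareBaseline (C:\\code\\grammarkdown\\src\\tests\\diff.ts:140:15)",
    "C:\\code\\grammarkdown"
  )
);
```

Applying the same function to both the local and the reference baseline would make the comparison independent of the checkout location.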

Request: location information without leading whitespace

Consider the following program:

'use strict';

let gmd = require('.');

let host = gmd.SyncHost.forFile(`
  Foo :
    Bar
`);

let compilerOptions = {
  noChecks: false,
  noUnusedParameters: true,
};

let grammar = new gmd.Grammar([host.file], compilerOptions, host);
grammar.parseSync();

let file = grammar.sourceFiles[0];
let rhs = file.elements[0].body.elements[0];
let bar = rhs.head.symbol;
console.log(file.lineMap.positionAt(bar.pos));

This prints { line: 1, character: 7 }: that is, it reports that the location for the Bar nonterminal begins following the colon, rather than at the B.

This is unexpected, at least to me: every parser on astexplorer which reports location information (except, interestingly, TypeScript) gives locations which begin at the first non-whitespace character rather than including whitespace.

It is possible to work around this by instead using console.log(file.lineMap.positionAt(bar.end - bar.name.text.length)), but this is a bit awkward. Possibly there is some other API I should be using?

I've used a nonterminal for illustration, but the case I most care about currently is RightHandSide, because I'm trying to use this information to report errors and the location for a given RightHandSide typically points to the very end of the preceding one, which is confusing.
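A sketch of the alternative workaround, independent of grammarkdown: skip whitespace forward from the reported start offset before converting it to a line and character. Both helpers are hypothetical stand-ins (positionAt here is a local function, not file.lineMap.positionAt), and the offset is computed by hand to match the example above.

```javascript
// Advances past whitespace (including line breaks) starting at `pos`.
function skipLeadingWhitespace(text, pos) {
  while (pos < text.length && /\s/.test(text[pos])) pos++;
  return pos;
}

// Zero-based line/character for an offset, in the same shape as printed above.
function positionAt(text, offset) {
  let line = 0;
  let lineStart = 0;
  for (let i = 0; i < offset; i++) {
    if (text[i] === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, character: offset - lineStart };
}

const text = "\n  Foo :\n    Bar\n";
const pos = text.indexOf(":") + 1; // where the node is reported to start today

console.log(positionAt(text, pos));                              // { line: 1, character: 7 }
console.log(positionAt(text, skipLeadingWhitespace(text, pos))); // { line: 2, character: 4 }
```

Unlike subtracting the name length from the end offset, this approach also works for nodes such as RightHandSide whose text length is not known up front.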

Copying and pasting rendered grammar leads to missing spaces

Low-priority feature request: Include the spaces.

Do not do file I/O to read the grammars

The grammar.ts file does require.resolve and fs.readFileSync to read some grammars.

This is unfortunate as it prevents grammarkdown, and thus ecmarkup, from being run inside the browser.

This is unfortunate because I am trying to turn compilation of the ECMAScript specification into a benchmark.

Ideally some build process would convert these grammars into template strings that could be require()ed directly, I think.
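A sketch of what such a build step might generate. It uses JSON.stringify to produce an ordinary string literal rather than a template string, which avoids having to escape backticks in the grammar text; the function name is hypothetical.

```javascript
// Turns grammar text into the source of a CommonJS module that exports it
// as a string, so nothing has to be read from disk at run time.
function toModuleSource(grammarText) {
  // JSON.stringify output is a valid JavaScript string literal (ES2019+).
  return `module.exports = ${JSON.stringify(grammarText)};\n`;
}

const generated = toModuleSource("A ::\n  `x`\n");
console.log(generated);
```

A build script would write the returned text to a .js file next to each grammar, and the library would require() that file instead of calling fs.readFileSync.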

Import syntax to import an existing grammar

Add a mechanism to import an existing grammar file to define a superset grammar, such as the TypeScript grammar overlay for the ECMA-262-6 grammar:

@import "es6.grammar"
// superset grammar follows

"no argument given for parameter" diagnostic incorrectly includes nonterminals in prose assertions

Consider the following grammar:

HexDigits[Sep] ::
  HexDigit
  HexDigits[?Sep] HexDigit
  [+Sep] HexDigits[?Sep] `_` HexDigit

HexDigit :: one of
  `0` `1` `2` `3` `4` `5` `6` `7` `8` `9` `a` `b` `c` `d` `e` `f` `A` `B` `C` `D` `E` `F`

NotCodePoint ::
  HexDigits[~Sep] [> but only if MV of |HexDigits| &gt; 0x10FFFF]

Grammarkdown has a diagnostic for the last line:

error GM2007: There is no argument given for parameter 'Sep'.

but I wouldn't really expect to write the parameter in the MV of |HexDigits| part. Also the parser doesn't allow a parameter to occur in that position: if you change it to MV of |HexDigits[~Sep]|, then you get

error GM2000: Cannot find name: 'HexDigits[~Sep]'.

documentation missing for #link

README.md should document the #link syntax.

Incorrect list parsing recovery for NoSymbolHereAssertion

When parsing the following production:

A : [no Foo]

The parser fails to recover due to the missing here keyword.

Feature request: multiple grammar parameter constraints on the RHS

In the RegExp named groups proposal, I'm updating the "Annex B" grammar for this new feature. I wanted to make the right hand side of a grammar production depend on two parameters, not one, but this seems to be not accounted for in Grammarkdown. Specifically, in the code, it looks like parseParameterValueAssertionTail expects a single token, and then a closing square bracket, but here, I'd have multiple comma-separated tokens. cc @bterlson

Anyway, this is not urgent; it can be worked around by separating into another nonterminal, as I did here.

Remove CLA requirement and change license to MIT

I plan to remove the CLA requirement and instead switch the package's license to the MIT License and depend on the GitHub Terms of Service and GitHub Community Guidelines.

Is there a sanctioned way to reference a code point with its official name?

The README mentions two ways to reference a Unicode code point, but fails to adequately specify them:

An abbreviation for a Unicode Code point, of the form <NBSP>

A Unicode code point, of the form U+00A0

grammarkdown.grammar doesn't mention the latter at all, and implicitly defines the former as one or more non-< non-> non-|LineTerminator| code points in between < and >. As for the implementation, scanner.ts uses scanString(CharacterCodes.GreaterThan, …), which pays special attention only to line terminators and >, and in particular allows < when represented as a character reference like &lt; in e.g.

Nonterminal :::
  &lt;foo&lt;bar&gt;

scanner.ts also handles the second form upon encountering "U+" or "u+" followed by four hexadecimal digits (and notably not working for supplementary-plane characters such as U+1D306 TETRAGRAM FOR CENTRE "𝌆").

This is relevant because I want to express a nonterminal like <U+2212 MINUS SIGN>, which is not clearly valid or invalid according to documentation here and accepted by ecma262 build:spec while being rejected by esmeta (cf. tc39/ecma262@cc5e203 and https://github.com/tc39/ecma262/actions/runs/5270397258/jobs/9529840136?pr=3098 ).

Ideally, we'd end up with alignment between documentation and implementation on a form that represents a single code point in any Unicode plane by its hexadecimal value plus descriptive explanatory text (generally its name in the Unicode Character Database), e.g.

A single Unicode code point may be specified using one of the following forms:

U+ followed by four to six non-lowercase hexadecimal digits with no leading zeroes other than those necessary for padding to a minimum of four digits, in accordance with The Unicode Standard, Version 15.0.0, Appendix A, Notational Conventions (i.e., matching Unicode extended BNF pattern "U+" ( [1-9 A-F] | "10" )? H H H H or regular expression pattern ^U[+]([1-9A-F]|10)?[0-9A-F]{4}$ as in U+00A0 or U+1D306)

The preceding representation followed by a space and a printable ASCII prose explanation (such as a character name) free of < and > and line terminators, all wrapped in < and > (i.e., matching Unicode extended BNF pattern "<" "U+" ( [1-9 A-F] | "10" )? H H H H " " [\u0020-\u007E -- [<>]]+ ">" or regular expression pattern ^<U[+]([1-9A-F]|10)?[0-9A-F]{4} [\x20-\x3b\x3d\x3f-\x7e]+>$ as in <U+2212 MINUS SIGN>)

An abbreviation defined somewhere outside the grammar as an ASCII identifier name (i.e., matching Unicode extended BNF pattern [A-Z a-z _] [A-Z a-z _ 0-9]* or regular expression pattern ^[A-Za-z_][A-Za-z_0-9]*$ as in <NBSP>)
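The three proposed forms can be checked mechanically. This sketch uses the first two regular expressions exactly as quoted above; wrapping the abbreviation pattern in < and > is an assumption based on the <NBSP> example, and the function and label names are hypothetical.

```javascript
// The first two patterns are the regular expressions quoted in the proposal.
const CODE_POINT = /^U[+]([1-9A-F]|10)?[0-9A-F]{4}$/;
const DESCRIBED_CODE_POINT =
  /^<U[+]([1-9A-F]|10)?[0-9A-F]{4} [\x20-\x3b\x3d\x3f-\x7e]+>$/;
// Assumption: the identifier pattern from the proposal, wrapped in < and >.
const ABBREVIATION = /^<[A-Za-z_][A-Za-z_0-9]*>$/;

// Returns which proposed form the text matches, or null if it matches none.
function classifyCodePointReference(text) {
  if (CODE_POINT.test(text)) return "code-point";
  if (DESCRIBED_CODE_POINT.test(text)) return "described-code-point";
  if (ABBREVIATION.test(text)) return "abbreviation";
  return null;
}

for (const sample of ["U+00A0", "U+1D306", "<U+2212 MINUS SIGN>", "<NBSP>", "u+00a0"]) {
  console.log(sample, "->", classifyCodePointReference(sample));
}
```

Under these patterns, lowercase digits and redundant leading zeroes (for example U+01D306) are rejected.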

Pass through html tags

It would be nice to be able to use <ins> and <del> tags, especially. See also tc39/ecmarkup#95.

es6.grammar: parens for brackets

Your es6.grammar has the production:

FunctionStatementList[Yield] :
    StatementList(?Yield, Return)?

The parens should be square brackets.

Crash when missing `here` in `[no Symbol here]` assertion.

Stack trace:

Error: Recovery failed to advance.
    at Parser.parseList (C:\path\to\project\node_modules\grammarkdown\dist\parser.js:387:27)
    at Parser.parseNoSymbolHereAssertionTail (C:\path\to\project\node_modules\grammarkdown\dist\parser.js:478:30)
    at Parser.parseAssertion (C:\path\to\project\node_modules\grammarkdown\dist\parser.js:508:25)
    at Parser.parseSymbol (C:\path\to\project\node_modules\grammarkdown\dist\parser.js:687:25)
    at Parser.parseSymbolSpanRest (C:\path\to\project\node_modules\grammarkdown\dist\parser.js:709:29)
    at Parser.tryParseSymbolSpan (C:\path\to\project\node_modules\grammarkdown\dist\parser.js:703:25)
    at Parser.parseSymbolSpanRest (C:\path\to\project\node_modules\grammarkdown\dist\parser.js:710:27)
    at Parser.tryParseSymbolSpan (C:\path\to\project\node_modules\grammarkdown\dist\parser.js:703:25)
    at Parser.parseRightHandSide (C:\path\to\project\node_modules\grammarkdown\dist\parser.js:773:41)
    at Parser.parseElement (C:\path\to\project\node_modules\grammarkdown\dist\parser.js:242:29)

Add HTML report output for cli

Add support for an HTML report output in the grammarkdown CLI:

grammarkdown source.grammar --format html

This would have the following characteristics:

Grammar emit to a <dl> (HTML definition list).
- LHS of each production emit as a <dt> (HTML definition term).
  - id attribute for hyperlink navigation using url fragments (e.g. #Identifier).
  - Identifier as a <var> (HTML variable).
  - Identifier linked via <a> (HTML anchor/link) to an index at the end of the report.
- Production Parameters linked via <a> (HTML anchor/link) to an index at the end of the report.
- RHS of each production emit as a <dd> (HTML definition)
  - id attribute for hyperlink navigation using url fragments (e.g. #Identifier-AlternateId).
- Each Nonterminal is enclosed in a <var> (HTML variable) element.
- Each Nonterminal is linked via <a> (HTML anchor/link) to the first instance of the declaring production.
- Nonterminal arguments linked via <a> to the first instance of the declaring parameter.
- Each Terminal is enclosed in a <kbd> (HTML keyboard input) element.
Index following the grammar:
- Entries for each Nonterminal used in the grammar.
- Unordered list of each production in which the Nonterminal is used.

Example:

<section id="grammar">
<h1>Grammar</h1>
<dl class="grammar">
  ...
  <dt id="ImportDeclaration">
    <var><a href="#index-ImportDeclaration">ImportDeclaration</a></var>
  </dt>
  <dd id="ImportDeclaration-ade9f438">
    <span>
      <kbd>import</kbd>
      <var><a href="#ImportClause">ImportClause</a></var>
      <var><a href="#FromClause">FromClause</a></var>
      <kbd>;</kbd>
    </span>
  </dd>
  <dd>
    <span>
      <kbd>import</kbd>
      <var><a href="#ModuleSpecifier">ModuleSpecifier</a></var>
      <kbd>;</kbd>
    </span>
  </dd>

  <dt id="ImportClause">
    <var><a href="#index-ImportClause">ImportClause</a></var>
  </dt>
  ...
</dl>
</section>

<section id="index">
<h1>Index</h1>
<dl class="index">
  <dt id="index-ImportClause">
    <var><a href="#ImportClause">ImportClause</a></var>
  </dt>
  <dd>
    <var><a href="#ImportDeclaration">ImportDeclaration</a></var> : 
    <span>
      <kbd>import</kbd>
      <var>ImportClause</var>
      <var><a href="#FromClause">FromClause</a></var>
      <kbd>;</kbd>
    </span>
  </dd>
  ...
</dl>
</section>
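The index section could be derived from the grammar with a single pass over every right-hand side. This is a sketch over a hypothetical, simplified model (arrays of symbol strings, with terminals written in backticks), not grammarkdown's AST.

```javascript
// Hypothetical, simplified grammar model using the ImportDeclaration example.
const grammar = [
  {
    name: "ImportDeclaration",
    alternatives: [
      ["`import`", "ImportClause", "FromClause", "`;`"],
      ["`import`", "ModuleSpecifier", "`;`"],
    ],
  },
];

// Maps each nonterminal to the set of production names that reference it.
function buildIndex(productions) {
  const index = new Map();
  for (const prod of productions) {
    for (const alt of prod.alternatives) {
      for (const symbol of alt) {
        if (symbol.startsWith("`")) continue; // terminals are not indexed
        if (!index.has(symbol)) index.set(symbol, new Set());
        index.get(symbol).add(prod.name);
      }
    }
  }
  return index;
}

for (const [nonterminal, users] of buildIndex(grammar)) {
  console.log(`${nonterminal}: used in ${[...users].join(", ")}`);
}
```

Each map entry corresponds to one <dt> in the index, and each member of its set to one <dd>.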

Investigate dropping synchronous API

To simplify the internals of grammarkdown, I'm considering dropping the synchronous APIs such as SyncHost, parseSync, bindSync, checkSync, etc. However, doing so would have a significant impact on tools such as ecmarkup.

@bakkot: If I were to make this change, ecmarkup would need to make the walk and lint functions asynchronous. If necessary, I can create a PR against ecmarkup that does this in advance of this change. The walk function shouldn't be too much trouble because it's only called by itself and Spec.prototype.build (which is already async), though I haven't investigated the impact it would have on lint.

If there are scenarios that you believe would be a blocker for me removing the synchronous APIs, please let me know. If there are no blockers, I would plan to ship this change with a semver-major bump to 3.0.0.

Request: warn for unused parameters

Currently there is no warning given for

Foo[A] ::
  Bar[+A]

Bar[A] ::
  `x`

I think it would be useful to have warnings for both of these, since neither nonterminal ever actually uses its parameter in any of its productions.

This only really makes sense as a warning if you know you're processing the entire grammar, rather than just some subset of it (because if you are only processing part of it, there could be another production elsewhere which does use the parameter). Perhaps there could be an argument to the checker telling it to make this assumption?
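A sketch of how such a check might work when the whole grammar is known. The data model is hypothetical and simplified (not grammarkdown's AST), and the definition of "used" is an assumption: a parameter counts as used if it appears in a guard or is forwarded with ?.

```javascript
// Hypothetical, simplified model of the grammar above.
// Foo[A] ::
//   Bar[+A]
// Bar[A] ::
//   `x`
const productions = [
  {
    name: "Foo",
    parameters: ["A"],
    alternatives: [
      { guards: [], symbols: [{ name: "Bar", args: [{ operator: "+", name: "A" }] }] },
    ],
  },
  {
    name: "Bar",
    parameters: ["A"],
    alternatives: [{ guards: [], symbols: [{ terminal: "x" }] }],
  },
];

// Assumption: a parameter is "used" when it appears in a guard ([+A] / [~A])
// or is forwarded with `?`. Passing `+A` or `~A` to another nonterminal sets
// that nonterminal's parameter and does not read the enclosing production's own.
function findUnusedParameters(prods) {
  const warnings = [];
  for (const prod of prods) {
    const used = new Set();
    for (const alt of prod.alternatives) {
      for (const guard of alt.guards) used.add(guard.name);
      for (const symbol of alt.symbols) {
        for (const arg of symbol.args ?? []) {
          if (arg.operator === "?") used.add(arg.name);
        }
      }
    }
    for (const param of prod.parameters) {
      if (!used.has(param)) {
        warnings.push(`${prod.name}: parameter '${param}' is never used`);
      }
    }
  }
  return warnings;
}

console.log(findUnusedParameters(productions));
```

The sketch treats each entry as the complete definition of its nonterminal, which is the whole-grammar assumption discussed above.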

[Suggestion] Add unsupported ES6 construct

There are a couple of constructs used in ES6 productions that don't appear to be supported by grammarkdown:

~~24.3.1's production for DoubleStringCharacter has the RHS:~~ Edit: Moved to #14 - @rbuckton

B.1.4's productions for AtomEscape and ClassAtomNoDashInRange have the RHSs:

    [~U] DecimalEscape but only if the integer value of DecimalEscape is <= NCapturingParens

    \ ClassEscape but only if ClassEscape evaluates to a CharSet with exactly one character

RHS IDs

There needs to be a way for consumers to reference a particular RHS alternative of a production. This is done all over ECMAScript specifications where all the syntax for a feature is defined at the start of the clause and subsequent subclauses define semantics for specific alternatives of the productions.

Emu supports this via an a attribute on emu-rhs allowing authors to specify an alternative id that can remain stable across spec versions and won't get hosed by refactoring or modifying the grammar. Something similar can be considered for grammarkdown, eg.

BindingIdentifier ::
    Identifier       #a
    [~Yield] `yield` #b
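For illustration only, a sketch of how a trailing #id in the proposed notation might be separated from the rest of a right-hand-side line. The function name and the exact shape of an id are assumptions, not grammarkdown's scanner rules.

```javascript
// Splits a single right-hand-side line into its symbol text and an optional
// trailing `#id`. Assumes the id is preceded by whitespace and ends the line.
function splitRhsId(line) {
  const match = /^(.*?)(?:\s+#([A-Za-z_][\w-]*))?\s*$/.exec(line);
  return { body: match[1].trim(), id: match[2] ?? null };
}

console.log(splitRhsId("    Identifier       #a"));
console.log(splitRhsId("    [~Yield] `yield` #b"));
console.log(splitRhsId("    Identifier"));
```

A consumer could then combine the production name with the id to form a reference that stays stable when alternatives are reordered.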

Add link to your GitHub pages to preview the results

Every time I come here I have to futz with the URL and remember how it works. It'd be easier if you just supplied a link to the ES6 and TypeScript grammars. 😄

HTML trivia tagName includes closing `>`

Repro:

'use strict';
let { CoreAsyncHost, Grammar, GrammarkdownEmitter } = require('grammarkdown');

let source = `
    Foo : <ins>Bar</ins>
`;

(async () => {
  const grammarHost = CoreAsyncHost.forFile(source);
  const grammar = new Grammar([grammarHost.file], {}, grammarHost);
  await grammar.bind();
  console.log(grammar.rootFiles[0].elements[0].body.leadingHtmlTrivia[0].tagName);
})().catch(e => {
  console.error(e);
  process.exit(1);
});

prints ins>. Note trailing >.
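The expected behavior can be sketched as a scan that stops before the closing >. This is an illustrative stand-alone function, not the code in grammarkdown's scanner.

```javascript
// Reads the tag name of an HTML open or close tag. `pos` must point at `<`.
function scanTagName(text, pos) {
  let i = pos + 1;
  if (text[i] === "/") i++; // closing tag such as </ins>
  const start = i;
  // Stop at the first character that cannot be part of a tag name,
  // which excludes the closing `>`.
  while (i < text.length && /[A-Za-z0-9-]/.test(text[i])) i++;
  return text.slice(start, i);
}

const line = "Foo : <ins>Bar</ins>";
console.log(scanTagName(line, line.indexOf("<ins>")));  // ins
console.log(scanTagName(line, line.indexOf("</ins>"))); // ins
```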

Report when productions are collapsed

Can the emu emitter emit a collapsed attribute on emu-prodref when the production is collapsed, i.e., where the production name and its only RHS are on the same line?

Add GitHub Actions for automated builds

Add GitHub Actions for the following scenarios:

PR Builds
CI Builds and ~/docs regeneration.

Representing comments in the parse tree

I'm working on a formatter for ecmarkup, which implies formatting grammarkdown as well. Right now I can almost just use the grammarkdown emitter (with a little tweaking), but because the AST doesn't represent comments, those get stripped from the output.

Right now I'm working around it with a custom emitter which scans every node for comments before emitting it, but that's a little painful.

Would it be possible to put comments in the parse tree directly, or have some other representation of them available without rescanning?

While I'm here: would it also be possible to accept HTML-style multiline comments? That's the style used in ecmarkup, being an HTML dialect, so it would be nice to be able to use them within grammarkdown snippets as well.

Mixed content warning in TypeScript example

In the README, ¶ Examples, the link to the HTML for the TypeScript grammar is broken because the CSS is loaded via http instead of https.

Viewed in modern Chrome, I get the below in the dev console.

Mixed Content: The page at 'https://rbuckton.github.io/grammarkdown/typescript.html' was loaded over HTTPS, but requested an insecure stylesheet 'http://bterlson.github.io/ecmarkup/elements.css'. This request has been blocked; the content must be served over HTTPS.

Support <ins>/<del> tags in more places in grammars

In ECMAScript spec proposals, we use <ins> and <del> HTML tags when marking changes vs current spec text. In a grammarkdown grammar, I noticed that things like the following trip this up:

Foo ::
  [~U] <ins>foo</ins>

Somehow <ins> is escaped and rendered inside the [~U]. The tags are well-supported in many other places in grammars, however. cc @bterlson

Ecmarkup no longer runs after updating grammarkdown to 2.1.0

Ecmarkup no longer runs after updating grammarkdown to 2.1.0 (or 2.1.1):

$ ./bin/ecmarkup.js ../ecma262/spec.html out.html
Cannot add property _signal, object is not extensible
    at CancellationToken.[@esfx/cancelable:Cancelable.cancelSignal] (/Code/ecmarkup/node_modules/prex/out/lib/cancellation.js:302:26)
    at Function.from (/Code/ecmarkup/node_modules/@esfx/async-canceltoken/dist/index.js:207:72)
    at Object.toCancelToken (/Code/ecmarkup/node_modules/grammarkdown/dist/core.js:296:48)
    at Function.convert (/Code/ecmarkup/node_modules/grammarkdown/dist/grammar.js:81:36)
    at Function.enter (/Code/ecmarkup/lib/Grammar.js:60:49)
    at walk (/Code/ecmarkup/lib/Spec.js:634:17)
    at walk (/Code/ecmarkup/lib/Spec.js:638:13)
    at walk (/Code/ecmarkup/lib/Spec.js:638:13)
    at walk (/Code/ecmarkup/lib/Spec.js:638:13)
    at walk (/Code/ecmarkup/lib/Spec.js:638:13)

I am guessing this has to do with #44.

production 'A : A @ B' not handled correctly

Section 12.11.3 of the ES6 spec begins (in ecmarkup):

<p>The production <emu-grammar>A : A @ B</emu-grammar>, where @ is ...

grammarkdown does not handle this production correctly, translating it to:

    A:A

    B:[empty]

Support for new RHS parameter forms

There is a proposed ECMAScript change to RHS parameter forms. When an RHS nonterminal is parameterized, the RHS must be explicit about whether it is passing the parameter (currently an unadorned parameter name), forwarding the value of the parameter passed into the production (currently prefixed with ?), or not passing the parameter. Specifically:

Form	Current	New
Set parameter	Foo	+Foo
Forward parameter	?Foo	?Foo
Don't set parameter	Parameter is omitted	~Foo

Example

Foo[Param] :
  [+Param] `Present`
  [~Param] `Not present`

Bar[Param] :
  Foo[?Param]

Baz :
  Foo[+Param]

Qux :
  Foo[~Param]
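The semantics of the three forms can be sketched as follows. The model and function names are hypothetical; the sketch resolves the arguments at a reference site and then selects the alternatives of Foo whose guards hold.

```javascript
// Hypothetical model of Foo[Param] from the example above.
const foo = {
  name: "Foo",
  alternatives: [
    { guards: [{ operator: "+", name: "Param" }], text: "`Present`" },
    { guards: [{ operator: "~", name: "Param" }], text: "`Not present`" },
  ],
};

// Resolves the arguments at a reference site against the parameters that are
// set in the enclosing production: `+` sets, `~` leaves unset, `?` forwards.
function resolveArguments(args, enclosing) {
  const result = new Set();
  for (const { operator, name } of args) {
    if (operator === "+" || (operator === "?" && enclosing.has(name))) {
      result.add(name);
    }
  }
  return result;
}

// Keeps the alternatives whose guards all hold for the given set of parameters.
function activeAlternatives(production, params) {
  return production.alternatives
    .filter((alt) =>
      alt.guards.every((g) =>
        g.operator === "+" ? params.has(g.name) : !params.has(g.name)))
    .map((alt) => alt.text);
}

// Baz : Foo[+Param]
console.log(activeAlternatives(foo, resolveArguments([{ operator: "+", name: "Param" }], new Set())));
// Qux : Foo[~Param]
console.log(activeAlternatives(foo, resolveArguments([{ operator: "~", name: "Param" }], new Set())));
// Bar[Param] : Foo[?Param], with Param set in Bar
console.log(activeAlternatives(foo, resolveArguments([{ operator: "?", name: "Param" }], new Set(["Param"]))));
```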

parser.ts(439,62): An index expression argument must...

After fixing #22, I still see a compiler error during the gulp build phase.

Reviewing the source code, since ParsingContext is an enum, I gather the intent is to translate the numeric value into a string. I'm surprised by the error.

Could this be related to the recent upgrade to TypeScript?!? I'm running version 2.0.3.

== Full error message

C:\code\grammarkdown>gulp build
[12:33:51] Using gulpfile C:\code\grammarkdown\gulpfile.js
[12:33:51] Starting 'build:lib'...
"C:/code/grammarkdown/src/lib/parser.ts(439,62): An index expression argument must be of type 'string', 'number', 'symbol', or 'any'."
[12:33:53] Finished 'build:lib' after 2.65 s
[12:33:53] Starting 'build:tests'...
[12:33:54] Finished 'build:tests' after 735 ms
[12:33:54] Starting 'build'...
[12:33:54] Finished 'build' after 30 μs

documentation missing 3 constructs

The README doesn't mention the following constructs that appear in es6.grammar:

L : X but not Y

L : X but not one of Y or Z ...

L : one of
    X  Y  Z

HTML entities emitted incorrectly

var grammarkdown = require('grammarkdown'),
  Grammar = grammarkdown.Grammar,
  EmitFormat = grammarkdown.EmitFormat;

var source = 'LineTerminatorSequence :: &lt;CR&gt;&lt;LF&gt;';
var output;

const options = {
  format: EmitFormat.ecmarkup,
  noChecks: true
};

console.log(Grammar.convert(source, options, /*hostFallback*/ undefined));

Actual output:

<emu-production name="LineTerminatorSequence" type="lexical" collapsed>
    <emu-rhs a="238863bb">
        <emu-gprose>&lt;CR&gt;&amp;</emu-gprose>
        <emu-nt>lt</emu-nt>
    </emu-rhs>
</emu-production>

Expected output:

<emu-production name="LineTerminatorSequence" type="lexical" collapsed>
    <emu-rhs a="238863bb">
        <emu-gprose>&lt;CR&gt;&lt;LF&gt;</emu-gprose>
        <emu-nt>lt</emu-nt>
    </emu-rhs>
</emu-production>

Add support for ins and del tags for new grammar rules

This is another continuation of #26 to add more support for <ins> and <del> tags in the grammar.

This output looks bad and for a new rule probably isn't how someone would write it:

<emu-grammar>
  <ins>MemberDefinition :</ins>
    <ins>PropertyName ColonType? Initializer?</ins>
</emu-grammar>

<ins>
  <emu-production name="MemberDefinition" id="prod-MemberDefinition">
    <emu-nt><a href="#prod-MemberDefinition">MemberDefinition</a></emu-nt>
    <emu-geq>:</emu-geq>
    <ins>
      <emu-rhs a="7149f9b7">
        <emu-nt id="_ref_11734"><a href="#prod-PropertyName">PropertyName</a></emu-nt>
        <emu-nt optional="" id="_ref_11735">
          <a href="#prod-ColonType">ColonType</a>
          <emu-mods>
            <emu-opt>opt</emu-opt>
          </emu-mods>
        </emu-nt>
        <emu-nt optional="" id="_ref_11736">
          <a href="#prod-Initializer">Initializer</a>
          <emu-mods>
            <emu-opt>opt</emu-opt>
          </emu-mods>
        </emu-nt>
      </emu-rhs>
    </ins>
  </emu-production>
</ins>

The following look close (both generate the same output). The first is probably how I'd write new rules, wrapping everything in an <ins>:

<emu-grammar>
  <ins>MemberDefinition :
    PropertyName ColonType? Initializer?</ins>
</emu-grammar>

<emu-grammar>
  <ins>MemberDefinition :</ins>
    PropertyName ColonType? Initializer?
</emu-grammar>

<ins>
  <emu-production name="MemberDefinition" id="prod-MemberDefinition">
    <emu-nt><a href="#prod-MemberDefinition">MemberDefinition</a></emu-nt>
    <emu-geq>:</emu-geq>
    <emu-rhs a="7149f9b7">
      <emu-nt id="_ref_11734"><a href="#prod-PropertyName">PropertyName</a></emu-nt>
      <emu-nt optional="" id="_ref_11735">
        <a href="#prod-ColonType">ColonType</a>
        <emu-mods>
          <emu-opt>opt</emu-opt>
        </emu-mods>
      </emu-nt>
      <emu-nt optional="" id="_ref_11736">
        <a href="#prod-Initializer">Initializer</a>
        <emu-mods>
          <emu-opt>opt</emu-opt>
        </emu-mods>
      </emu-nt>
    </emu-rhs>
  </emu-production>
</ins>

This is bad because of how <emu-prodref> works for the appendix. It copies the emu-production, but the <ins> is outside of it and doesn't get copied, so when scrolling through the appendix you don't see any green tint over the new rule.

This is how it should(?) look and it renders correctly in ecmarkup:

<emu-production name="MemberDefinition" id="prod-MemberDefinition">
  <ins>
    <emu-nt><a href="#prod-MemberDefinition">MemberDefinition</a></emu-nt>
    <emu-geq>:</emu-geq>
    <emu-rhs a="7149f9b7">
      <emu-nt id="_ref_11734"><a href="#prod-PropertyName">PropertyName</a></emu-nt>
      <emu-nt optional="" id="_ref_11735">
        <a href="#prod-ColonType">ColonType</a>
        <emu-mods>
          <emu-opt>opt</emu-opt>
        </emu-mods>
      </emu-nt>
      <emu-nt optional="" id="_ref_11736">
        <a href="#prod-Initializer">Initializer</a>
        <emu-mods>
          <emu-opt>opt</emu-opt>
        </emu-mods>
      </emu-nt>
    </emu-rhs>
  </ins>
</emu-production>
