domemtech.trash's Introduction

Trash

Status: The toolset is undergoing a large rewrite due to the way parse trees are represented. Some tools have not been rewritten yet.

The repo g4-scripts contains a collection of Bash scripts that use Trash to check or find properties of Antlr grammars and parse trees. It is the best place to see Trash in action. You can also read about Trash in detail in my blog.

Trash is a collection of ~40 command-line tools to analyze and transform Antlr parse trees and grammars. The toolkit can: generate a parser application for an Antlr4 grammar for any target and any OS; analyze the grammar for common problems; automate changes applied to a grammar scraped from a specification; transform parse trees for transpiling and preprocessing source code. With the Antlr toolkit and the collection of Antlr grammars, one can write programming language tools quickly and easily.

The toolkit is designed around a JSON representation of parse trees and command-line tools that read, modify, and write those trees via standard input and output. Complex refactorings can be achieved by chaining different commands together.

Each app in Trash is implemented as a Dotnet Tool console application, and can be used on Windows, Linux, or Mac. The only prerequisites are the .NET SDK and the toolchains for any other targets you want to use.

The toolkit uses Antlr and XPath2. The code is implemented in C#.

The toolkit was used to scrape and refactor the Dart2 grammar from its spec. See this script.

Installation

Requirements

Install Dotnet 8.0.x

Install

Copy this script and execute it in a command-line prompt.

dotnet tool install -g trcaret
dotnet tool install -g trcombine
dotnet tool install -g trconvert
dotnet tool install -g trcover
dotnet tool install -g trdelete
dotnet tool install -g trdeltree
dotnet tool install -g trfoldlit
dotnet tool install -g trgen
dotnet tool install -g trglob
dotnet tool install -g triconv
dotnet tool install -g trinsert
dotnet tool install -g trjson
dotnet tool install -g trparse
dotnet tool install -g trperf
dotnet tool install -g trrename
dotnet tool install -g trreplace
dotnet tool install -g trsplit
dotnet tool install -g trsponge
dotnet tool install -g trstrip
dotnet tool install -g trtext
dotnet tool install -g trtokens
dotnet tool install -g trtree
dotnet tool install -g trunfold
dotnet tool install -g trwdog
dotnet tool install -g trxgrep
dotnet tool install -g trxml
dotnet tool install -g trxml2
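
The same list can be driven from a loop, which keeps the install and uninstall commands in sync. The following is a minimal sketch and not part of Trash: the function names trash_tools and trash_tools_apply and the DRY_RUN variable are made up for illustration. Only the dotnet tool install -g and dotnet tool uninstall -g commands and the tool names come from this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Trash.

# Tool names copied from the install script above.
trash_tools() {
  printf '%s\n' trcaret trcombine trconvert trcover trdelete trdeltree \
    trfoldlit trgen trglob triconv trinsert trjson trparse trperf \
    trrename trreplace trsplit trsponge trstrip trtext trtokens trtree \
    trunfold trwdog trxgrep trxml trxml2
}

# Usage: trash_tools_apply install|uninstall
# With DRY_RUN=1 the commands are printed instead of executed.
trash_tools_apply() {
  local action="$1" tool
  case "$action" in
    install|uninstall) ;;
    *) echo "usage: trash_tools_apply install|uninstall" >&2; return 2 ;;
  esac
  for tool in $(trash_tools); do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "dotnet tool $action -g $tool"
    else
      dotnet tool "$action" -g "$tool"
    fi
  done
}
```

Running DRY_RUN=1 trash_tools_apply install prints the commands without touching the system; dropping DRY_RUN executes them.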

Uninstall

dotnet tool uninstall -g trcaret
dotnet tool uninstall -g trcombine
dotnet tool uninstall -g trconvert
dotnet tool uninstall -g trcover
dotnet tool uninstall -g trdelete
dotnet tool uninstall -g trdeltree
dotnet tool uninstall -g trfoldlit
dotnet tool uninstall -g trgen
dotnet tool uninstall -g trglob
dotnet tool uninstall -g triconv
dotnet tool uninstall -g trinsert
dotnet tool uninstall -g trjson
dotnet tool uninstall -g trparse
dotnet tool uninstall -g trperf
dotnet tool uninstall -g trrename
dotnet tool uninstall -g trreplace
dotnet tool uninstall -g trsplit
dotnet tool uninstall -g trsponge
dotnet tool uninstall -g trstrip
dotnet tool uninstall -g trtext
dotnet tool uninstall -g trtokens
dotnet tool uninstall -g trtree
dotnet tool uninstall -g trunfold
dotnet tool uninstall -g trwdog
dotnet tool uninstall -g trxgrep
dotnet tool uninstall -g trxml
dotnet tool uninstall -g trxml2

List of commands

NB: Out of date

  1. tranalyze -- Analyze a grammar
  2. trcombine -- Combine a split Antlr4 grammar
  3. trconvert -- Convert a grammar from one form to another
  4. trdelabel -- Remove labels from an Antlr4 grammar
  5. trdelete -- Delete nodes in a parse tree
  6. trdot -- Print a parse tree in Graphviz Dot format
  7. trenum -- Not functional, to enumerate strings from grammar.
  8. trfirst -- Outputs first sets of a grammar
  9. trfold -- Perform fold transform on a grammar
  10. trfoldlit -- Perform fold transform on grammar with literals
  11. trformat -- Format a grammar
  12. trgen -- Generate an Antlr4 parser for a given target language
  13. trgen2 -- Generate files from template and XML doc list.
  14. trgroup -- Perform a group transform on a grammar
  15. trinsert -- Insert string into points in a parse tree
  16. tritext -- Get strings from a PDF file
  17. trjson -- Print a parse tree in JSON structured format
  18. trkleene -- Perform a Kleene transform of a grammar
  19. trmove -- Move nodes in a parse tree
  20. trparse -- Parse a grammar or use a generated parser to parse input
  21. trperf -- Perform performance analysis of an Antlr grammar parse
  22. trpiggy -- Perform a parse tree rewrite
  23. trprint -- Print a parse tree, including off-token characters
  24. trrename -- Rename symbols in a grammar
  25. trreplace -- Replace nodes in a parse tree with text
  26. trrr -- (No description.)
  27. trrup -- Remove useless parentheses in a grammar
  28. trsem -- Read static semantics and generate code
  29. trsort -- Sort rules in a grammar
  30. trsplit -- Split a combined Antlr4 grammar
  31. trsponge -- Extract parsing results output of Trash command into files
  32. trst -- Print a parse tree in Antlr4 ToStringTree()
  33. trstrip -- Strip a grammar of all actions, labels, etc.
  34. trtext -- Print a parse tree with a specific interval
  35. trthompson -- (No description.)
  36. trtokens -- Print tokens in a parse tree
  37. trtree -- Print a parse tree in a human-readable format
  38. trull -- Transform a grammar with upper- and lowercase string literals
  39. trunfold -- Perform an unfold transform on a grammar
  40. trungroup -- Perform an ungroup transform on a grammar
  41. trwdog -- Kill a program that runs too long
  42. trxgrep -- "Grep" for nodes in a parse tree using XPath
  43. trxml -- Print a parse tree in XML structured format
  44. trxml2 -- Print an enumeration of all paths in a parse tree to leaves

Examples

Parse a grammar, create a parser for the grammar, build, and test

git clone https://github.com/antlr/grammars-v4
cd grammars-v4/python/python
trparse *.g4 | trxgrep ' //grammarDecl' | trtext
# Output:
# PythonLexer.g4:lexer grammar PythonLexer;
# PythonParser.g4:parser grammar PythonParser;
trgen
cd Generated
dotnet build
cat - <<EOF | trparse | trxgrep ' //test' | trtext
x == y
x == y if z == b else a == u
lambda: a
lambda x, y: a
EOF
# Output:
# a
# lambda x, y: a
# a
# lambda: a
# a == u
# x == y if z == b else a == u
# x == y

Display parse tree

trparse -i "a == b" | trtree

trtree is only one of several ways to view parse tree data. Other programs for different output are trjson for JSON output, trxml for XML output, trst for Antlr runtime ToStringTree output, trdot, trprint for input text for the parse, and tragl.

Convert grammars to Antlr4

trparse ada.g2 | trconvert | trprint | less

This command parses an old Antlr2 grammar using trparse, converts the parse tree data to Antlr4 syntax using trconvert, and finally prints out the converted parse tree data, ada.g4, using trprint. Other grammars that can be converted are Antlr3, Bison, and ISO EBNF. In order to use the grammar to parse data, you will need to convert it to an Antlr4 grammar.

Generate an Arithmetic parser application

mkdir foobar; cd foobar; trgen

This command creates a parser application for the C# target. If executed in an empty directory, which is done in the example shown above, trgen creates an application using the Arithmetic grammar. If executed in a directory containing an Antlr Maven plugin (pom.xml), trgen will create a program according to the information specified in the pom.xml file. Either way, it creates a directory Generated/, and places the source code there.

trgen has many options to generate a parser from any Antlr4 grammar, for any target. However, if a parser is generated for the C# target and built using the .NET SDK, then trparse can execute the generated parser, and it can be used with all the other tools in Trash. NB: In order to use the generated parser application, you must first build it:

dotnet restore Generated/Test.csproj
dotnet build Generated/Test.csproj

Run the generated parser application

trparse -i "1+2+3" | trtree

After using trgen to generate a parser program in C#, shown previously, and after building the program, you can run the parser using trparse. This program looks for the generated parser in directory Generated/. If it exists, it will run the parser application in the directory. You can pass as command-line arguments an input string or input file. If no command-line arguments are supplied, the program will read stdin. The output of trparse, as with most tools of Trash, is parse tree data.
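
Because trparse depends on the contents of Generated/, a script can check for the generated project before calling it. This is a small sketch under the assumption that the project file is Generated/Test.csproj, as in the build commands above. The function name has_generated_parser is hypothetical, and the check only confirms that the project was generated, not that it was built.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of Trash.
# Succeeds if <dir>/Generated/Test.csproj exists (dir defaults to ".").
has_generated_parser() {
  local dir="${1:-.}"
  if [ -f "$dir/Generated/Test.csproj" ]; then
    return 0
  fi
  echo "no generated parser in $dir/Generated; run trgen and dotnet build first" >&2
  return 1
}
```

A typical use would be has_generated_parser . && trparse -i "1+2+3" | trtree.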

Find nodes in the parse tree using XPath

mkdir empty; cd empty; trgen; dotnet build Generated/Test.csproj; \
    trparse -i "1+2+3" | trxgrep " //SCIENTIFIC_NUMBER" | trst

With this command, a directory is created, and a parser for the Arithmetic grammar is generated, built, and then run using trparse. The trparse tool unifies all parsing, whether it's parsing a grammar or parsing input using a generated parser application. The output from the trparse tool is a parse tree which you can search. Trxgrep is the generalized search program for parse trees. Trxgrep uses XPath expressions to precisely identify nodes in the parse tree.

XPath was added to Antlr4, but Trash takes the idea further with the addition of an XPath2 engine ported from the Eclipse Web toolkit. XPath is a well-defined language that should be used more often in compiler construction.

Rename a symbol in a grammar, generate a parser for new grammar

trparse Arithmetic.g4 | trrename "//parserRuleSpec//labeledAlt//RULE_REF[text() = 'expression']" "xxx" | trtext > new-source.g4
trparse Arithmetic.g4 | trrename -r "expression,expression_;atom,atom_;scientific,scientific_" | trprint

In these two examples, the Arithmetic grammar is parsed. trrename reads the parse tree data and modifies it by renaming the expression symbol two ways: first by XPath expression identifying the LHS terminal symbol of the expression symbol, and the second by assumption that the tree is an Antlr4 parse tree, then renaming a semi-colon-separated list of paired renames. The resulting code is reconstructed and saved. trrename does not rename symbols in actions, nor does it rename identifiers corresponding to the grammar symbols in any support source code (but it could if the tool is extended).

Count method declarations in a Java source file

git clone https://github.com/antlr/grammars-v4.git; \
    cd grammars-v4/java/java9; \
    trgen; dotnet build Generated/Test.csproj;\
    trparse examples/AllInOne8.java | trxgrep " //methodDeclaration" | trst | wc

This command clones the Antlr4 grammars-v4 repo, generates a parser for the Java9 grammar, then runs the parser on examples/AllInOne8.java. The parse tree is then piped to trxgrep to find all parse tree nodes of type methodDeclaration; trst converts each match to a simple string, and wc counts the result.

Strip a grammar of all non-essential CFG

trparse Java9.g4 | trstrip | trtext > Essential-Java9.g4

Split a grammar

Since Antlr2, one can write a combined parser/lexer in one file, or a split parser/lexer in two files. While it's not hard to split or combine a grammar, it's tedious. For automating transformations, splitting is necessary because Antlr4 requires the grammars to be split when super classes are needed for different targets.

trcombine ArithmeticLexer.g4 ArithmeticParser.g4 | trprint > Arithmetic.g4

This command calls trcombine which parses two split grammar files ArithmeticLexer.g4 and ArithmeticParser.g4, and creates a combined grammar for the two.

trparse Arithmetic.g4 | trsplit | trsponge -o true

This command calls trsplit, which splits the grammar into two parse tree results, one that defines ArithmeticLexer.g4 and the other that defines ArithmeticParser.g4. The tool trsponge is similar to tee in Linux: the parse tree data is split and placed in files.

Parsing Result Sets -- the data passed between commands

A parsing result set is a JSON serialization of an array of:

  • A set of parse tree nodes.
  • Parser information related to the parse tree nodes.
  • Lexer information related to the parse tree nodes.
  • The name of the input corresponding to the parse tree nodes.
  • The input text corresponding to the parse tree nodes.

Most commands in Trash read and/or write parsing result sets.

Supported grammars

Grammar      File suffix
Antlr4       .g4
Antlr3       .g3
Antlr2       .g2
Bison        .y
LBNF         .cf
W3C EBNF     .ebnf
ISO 14977    .iso14977, .iso
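
A wrapper script that handles several grammar formats might dispatch on the suffix in the same way. The sketch below only mirrors the table above. The function name grammar_kind is hypothetical, and nothing here claims to describe how Trash itself detects the grammar type.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a grammar file by its suffix,
# following the "Supported grammars" table.
grammar_kind() {
  case "$1" in
    *.g4) echo "Antlr4" ;;
    *.g3) echo "Antlr3" ;;
    *.g2) echo "Antlr2" ;;
    *.y) echo "Bison" ;;
    *.cf) echo "LBNF" ;;
    *.ebnf) echo "W3C EBNF" ;;
    *.iso14977|*.iso) echo "ISO 14977" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

For example, grammar_kind gram.y prints Bison, and an unrecognized suffix prints unknown and returns a non-zero status so that a script can stop early.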

Analysis

Recursion

Refactoring

Trash provides a number of transformations that can help to make grammars cleaner (reformatting), more readable (reducing the length of the RHS of a rule), and more efficient (reducing the number of non-terminals) for Antlr.

Some of these refactorings are very specific for Antlr due to the way the parser works, e.g., converting a prioritized chain of productions recognizing an arithmetic expression to a recursive alternate form. The refactorings implemented are:

Raw tree editing

Reordering

Changing rules

Splitting and combining

Conversion


The source code for the extension is open source, free of charge, and free of ads. For the latest developments on the extension, check out my blog.

Building

git clone https://github.com/kaby76/Domemtech.Trash
cd Domemtech.Trash
make clean; make; make install

You must have the .NET SDK installed to build and run.

Current release

0.22.0

Update to .NET 8. Added trdot.

Prior Releases

0.18.1 -- Nov 12, 2022

Re-adding CI tests and stabilizing the tools.

0.18.0 -- Nov 7, 2022

  • Adding Xalan code.
  • Fix #180.
  • Fix crash in trgen antlr/grammars-v4#2818.
  • Fix #134.
  • Add -e option to trrename.
  • Update Antlr4BuildTasks version.
  • Fix #197, #198.
  • Fix trparse exit code.
  • Add --quiet option to trparse to just get exit code.
  • Change trgen templates to remove -file option, make file name parsing the default.

If you have any questions, email me at ken.domino gmail.com

domemtech.trash's People

Contributors

dependabot[bot], kaby76, skalt

domemtech.trash's Issues

converting inline %prec in bison grammars leaves extra symbols

Using the latest version of trash (8.5),

// bison
SelectStmt: select_no_parens			%prec UMINUS
			| select_with_parens		%prec UMINUS
		;

becomes

selectStmt  : select_no_parens UMINUS
  | select_with_parens UMINUS
  ;

The desired outcome is

selectStmt  : select_no_parens
  | select_with_parens
  ;

Policy and/or document request: contribution guidelines?

Hi @kaby76, would you be willing to accept contributions to this repo? If so, would you be willing to document your desired process for someone to contribute to this repo?

I'm grateful for your work. I'd like to contribute patches in a way that makes your life easier and respects your creative ownership of the code. However, I understand if you'd prefer to limit contributions: reviewing PRs is work that you don't owe anyone. In either case, I think it might be worth writing out a CONTRIBUTING.md to establish either a process or some ground rules for contributions.

Trash commands should be able to distinguish between clients that read output

I don't know if this is possible, or even a good design: when I do trparse foo.g4, most of the time I'm just interested in whether the parse actually succeeded. I don't want all the parsing result. But, clearly, for trparse foo.g4 | trtree, trparse should output a parsing result. And for trparse 2>&1 | less, I just want the list of error messages. Perhaps I should offer an optional arg for parsing result?
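
One common way to make this distinction, at least in a shell wrapper, is to test whether stdout is a terminal with [ -t 1 ]. The sketch below illustrates the idea only. The function emit_result and its two arguments (a file holding the parsing result and the parser's exit status) are hypothetical, and nothing here says that Trash works this way.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print a short summary when a person is watching,
# and the full parsing result when another program reads the output.
emit_result() {
  local result_file="$1" status="$2"
  if [ -t 1 ]; then
    # stdout is a terminal: only report success or failure.
    if [ "$status" -eq 0 ]; then
      echo "parse succeeded"
    else
      echo "parse failed"
    fi
  else
    # stdout is a pipe or a file: pass the full result downstream.
    cat "$result_file"
  fi
  return "$status"
}
```

Error messages would still go to stderr in either case, so trparse 2>&1 | less would need a separate decision about whether the parsing result belongs in that stream.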

feature request: make optional rules mandatory

I've got a grammar where some rules match the empty string. I'd like to transform those rules such that they no longer match the empty string and have all references to the rules made optional. Example:

// before
my_rule: foo opt_bar baz;
opt_bar: bar | ;
// after
my_rule: foo opt_bar? baz;
opt_bar: bar;

More difficult example:

// before
foo: bar | opt_baz | quux; // <- foo matches the empty string too
opt_baz: baz | ;
// after
foo: bar | opt_baz? | quux;
opt_baz: baz; 

If you point me to where this feature should go, I'd be happy to take a stab at it.

Error converting bison grammar

Hi! I was testing out the script to convert grammars on a .y grammar, and I got an unexpected error:

System.Runtime.CompilerServices.SwitchExpressionException: Non-exhaustive switch expression failed to match its input.
Unmatched value was BisonParser.g4.
   at Docs.Class1.CreateDoc(ParsingResultSet parse_info) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\Docs\Class1.cs:line 63
   at Trash.CConvert.Execute(Config config) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trconvert\CConvert.cs:line 44
   at Trash.Program.MainInternal(String[] args) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trconvert\Program.cs:line 65
   at Trash.Program.Main(String[] args) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trconvert\Program.cs:line 14

Looking at line 63 of Docs/Class1.cs, it seems like trconvert doesn't support conversion from Bison to Antlr4 without additional configuration. Is that correct?

Reproduction:

#!/usr/bin/env bash
# with docker and git accessible

# clone trash
git clone https://github.com/kaby76/Domemtech.Trash.git /tmp/trash
cd /tmp/trash

# build a docker image with trash's tools installed, since I don't have .NET installed locally
echo '
# Dockerfile, built in the repo root cloned to 5cfd838
FROM mcr.microsoft.com/dotnet/sdk
ENV PATH="$PATH:/root/.dotnet/tools"
WORKDIR /trash
COPY . /trash
RUN dotnet tool install -g trparse
RUN dotnet tool install -g trconvert
CMD bash
' | docker build -t trash -f - /tmp/trash;

git clone --depth 1 https://github.com/postgres/postgres.git /tmp/postgres

docker run \
  -v /tmp/postgres:/postgres \
  --workdir /postgres \
   trash sh -c '
     trparse /postgres/src/backend/parser/gram.y | trconvert
  '

trrup removes required parentheses

In the XText Antlr4 grammar, this line occurs:

assignment : ( ( ( ( '=>' ) ) | ( ( '->' ) ) ) ? ( ( validID ) ) ( ( ( '+=' | '=' | '?=' ) ) ) ( ( assignableTerminal ) ) ) ;

trrup removes too many parentheses around ( '+=' | '=' | '?=' ).

trparse fails to distinguish bison production rules with different casings

In a bison grammar there were two production rules distinguished by case: character and Character (1). trparse gram.y | trconvert | trsponge normalized the distinct symbols in the input grammar to character in the output grammar. That's a reasonable way to deal with a pathological grammar, but I think it's a bug in how trparse parses yacc-compatible grammars: the yacc spec says that

[Rule] Names are of arbitrary length, made up of letters, periods ( '.' ), underscores ( '_' ), and non-initial digits. Uppercase and lowercase letters are distinct.

I'd be happy to take a shot at addressing this bug if you point me to the right area of the code.


1: there was also a CHARACTER token to spice things up, but trparse handled that perfectly.

documentation request: explicitly state `trrename` can rename tokens

When I read the current trrename readme and help-text

trrename renames rule symbols in a grammar.

I didn't expect renaming tokens to work. I tested it out, however, and I was pleasantly surprised!

test:

parser grammar temp;
a : MY_TOKEN
;

trparse ./temp.g4 | trrename -r 'MY_TOKEN,OTHER_TOKEN' | trprint | sed 's/^/# /g'
# parser grammar temp;
# a : OTHER_TOKEN ;

xpath engine doesn't seem to produce right result

For this grammar:

grammar temp;
my_rule: foo opt_bar baz;
opt_bar: bar | ;

the following command outputs both rules in the grammar.

trparse temp.g4 | trxgrep ' //parserRuleSpec[//alternative[@ChildCount=0]]' | trtext

This doesn't make sense because trparse temp.g4 | trxgrep ' //alternative[@ChildCount=0]' | trtree returns a result of one match, which is correct. And, trparse temp.g4 | trxgrep ' //parserRuleSpec[//al]' | trtext returns a result of zero matches because there are no parserRuleSpec nodes with an al descendant anywhere.

This is in reference to #21

trparse -input isn't working

For this grammar:

grammar A;
all: e* EOF;
e: 
  e D e
  | e S  e
  | e M e
  | e P e
  | OP e CP
  | e (LT | LE | GT | GE | EQ | NE) e
  | e A e
  | e O e
  | NUMBER
  | STRING
  | IDENTIFIER
  ;    
OP: '(';
CP: ')';
D : '/';
S : '*';
M : '-';
P : '+';
LT: '<';
LE: '<=';
GT: '>';
GE: '>=';
EQ: '==';
NE: '!=';
A: '&&';
O: '||';
NUMBER: [0-9]+;
STRING: '"' ~'"'*? '"';
IDENTIFIER: [a-zA-Z]+;
WS: [ \t\n\r]+ -> channel(HIDDEN);

Generate a CSharp target parser (trgen -t CSharp -s all) and test it on the input "8*4/2". Beyond the fact that the grammar doesn't define the multiplicative operators correctly (they are in different alts!), "trparse -input '8*4/2' | trtree" produces a truncated tree. If the input is in a file, "trparse file | trtree" works fine.

0.13.8.

trxgrep //foobar/text() should return text

I know that it's possible to have text results instead of parsing tree results. A trxgrep //foobar/text() should return the text attribute for the node, not a .NET type name.

Can't do "trparse ... | trxgrep ... | trdelete ... | trtext"

Parsing results, I think, can pass multiple trees between programs. However, trdelete wants to do var pr = LanguageServer.ParsingResultsFactory.Create(document); and that returns null because the results are stdin and there are multiple tree nodes. I want this to work so I can collect all class declarations in a C# file--actually, all of antlr/antlr4/runtime/CSharp/src/...

trgen assumes that Maven project only contains one scenario when determining grammarName

Found here:

var pom_grammar_name = navigator
.Select("//plugins/plugin[artifactId=\"antlr4test-maven-plugin\"]/configuration/grammarName", nsmgr)
.Cast<XPathNavigator>()
.Select(t => t.Value)
.ToList();

This code is making an assumption that <grammarName> is directly inside <configuration>, but the actual docs for antlr4test-maven-plugin say that it doesn't have to be. You can have multiple scenarios, each testing a different grammar.

Discovered on my PR over at antlr/grammars-v4, which has this:

                <configuration>
                    <scenarios>
                        <scenario>
                            <scenarioName>FBX</scenarioName>
                            <verbose>false</verbose>
                            <showTree>false</showTree>
                            <entryPoint>start</entryPoint>
                            <grammarName>FBX</grammarName>
                            <packageName></packageName>
                            <testFileExtension>.fbx</testFileExtension>
                            <exampleFiles>examples/</exampleFiles>
                        </scenario>
                        <scenario>
                            <scenarioName>FBXSemantic</scenarioName>
                            <verbose>false</verbose>
                            <showTree>false</showTree>
                            <entryPoint>start</entryPoint>
                            <grammarName>FBXSemantic</grammarName>
                            <packageName></packageName>
                            <testFileExtension>.fbx</testFileExtension>
                            <exampleFiles>examples/</exampleFiles>
                        </scenario>
                    </scenarios>
                </configuration>

Kleene should output better rewrites

For these rules:

xx : 'a' xx | 'a';
yy : yy 'b' | 'b' ;
zz : | 'a' | 'a' zz;
z2 : | 'b' | z2 'b';

trkleene outputs this:

xx : ( 'a' ) * ( 'a' ) ;
yy : ( 'b' ) ( 'b' ) * ;
zz : ( 'a' ) * ( | 'a' ) ;
z2 : ( | 'b' ) ( 'b' ) * ;

While I think it is okay, it could be improved by using the +-operator and not require all the parentheses.

xx: 'a'+;
yy: 'b'+;
zz: 'a'*;
z2: 'b'*;

Add extension capability to xpath library

I would like to create tuples of information from xpath expressions. For example, if I want to find nodes in a Java parse tree and tag specific nodes with an integer, the resulting sets cannot be combined because there's no operator or function to union value sets:

value-union(//classDeclaration/IDENTIFIER/string-join(' 1', text()) , //fieldDeclaration/variableDeclarators/variableDeclarator/variableDeclaratorId/IDENTIFIER/string-join(' 2',text()) )

I would like to bind a function value-union, written in C#, to trxgrep.

Kleene still messed up

After a bit of refactoring, here are the transforms that need to be fixed:

        // Left recursion:
        // A -> A b1 | A b2 | ... | a1 | a2 | ... ;
        // => A ->  (a1 | a2 | ... ) (b1 | b2 | ...)*;
        // Note, A on RHS cannot have any postfix operators.
        //
        // A -> A? b1 | A? b2 | ... | a1 | a2 | ...;
        // A -> A b1 | b1 | A b2 | b2 | ... | a1 | a2 | ...;
        // A -> b1 | b2 | ... | a1 | a2 | ... | A b1 | A b2 | ...;
        // A -> ( a1 | a2 | ... | b1 | b2 | ... ) (b1 | b2 | ...)* ;
        // A on RHS must only be "A?".
        //
        // Note, the rule cannot have any alts without A?.
        //
        // Right recursion:
        // Convert A -> b1 A | b2 A | ... | a1 | a2 | ... ;
        // into A ->   (b1 | b2 | ...)* (a1 | a2 | ... )
        //
        // A -> a1 | a2 | ... | b1 A? | b2 A? | ...;
        // A -> (b1 | b2 | ...)* (a1 | a2 | b1 | b2 | ...)
        // A on RHS must only be "A?".

unable to reproduce trgroup command-line examples

Using trgroup version v0.11.0, I'm unable to reproduce the examples from the trgroup README:

echo "grammar temp;
a : 'X' 'B' 'Z' | 'X' 'C' 'Z' | 'X' 'D' 'Z' ;
" > ./temp.g4;
trparse ./temp.g4 | trgroup "//parserRuleSpec[RULE_REF/text() = 'a']//altList" | trprint
# prints 'no changes'

Add grouping to tranalyze

As noted by Ivan Kochurkin, Antlr parsers perform better--and just look more reasonable--with grouping of common alt prefixes. See this note.

So, tranalyze, which I just entered in the toolchain, should have a check for prefixes, and flag potential groupings.

Remove and replace trmvsr

The trmvsr is a terrible program. It's useless except for parser grammars, and it is just a specialized case for the much more powerful and useful trmove. To move a rule to the top, run something like this:

trparse foo.g4 | trmove "//ruleSpec[parserRuleSpec/RULE_REF/text()='start_rule_name']" "(//ruleSpec)[1]" | trsponge -c true

trmvsr must be removed.

Likely bug in trxgrep

  • Create the expression grammar (mkdir foo; cd foo; trgen; cd Generated; make).
  • Create input file "in.txt" with 1+2+3+4+5+6+7+8+9
  • Run: trparse in.txt | trxgrep ' count(//SCIENTIFIC_NUMBER/ancestor::*)'

Note, the xpath implementation doesn't have a way to create a tuple, so I can't do a count, and also return the node. That's a failure in xpath2 and I don't think it's available in xpath3.

This crashes in trxgrep. If I remove the function call "count()", it works.

bug: unexplained error when transforming large grammar

I pointed the sharp end of trgroup v0.11.2 at the trconverted postgres grammar, and I got

System.Exception: Exception of type 'System.Exception' was thrown.
   at NWayDiff.Classical`1.classical_lcs(List`1 a, List`1 b, Int32 i, Int32 j, Dictionary`2 memo)
   at NWayDiff.Difdef_impl`1.add_vec_to_diff_classical(Diff`1& a, Int32 fieldid, List`1 b)
   at NWayDiff.Difdef_impl`1.add_vec_to_diff(Diff`1& a, Int32 fileid, List`1 b)
   at NWayDiff.Difdef_impl`1.add_vec_to_diff(Diff`1& a, Int32 fileid, List`1 b)
   at NWayDiff.Difdef_impl`1.merge(Int32 fmask)
   at NWayDiff.Difdef`1.merge()
   at LanguageServer.Transform.Group(List`1 nodes, Document document)
   at Trash.CGroup.Execute(Config config) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trgroup\CGroup.cs:line 84
   at Trash.Program.MainInternal(String[] args) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trgroup\Program.cs:line 68
   at Trash.Program.Main(String[] args) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trgroup\Program.cs:line 14

I'm unable to parse what's going wrong. Also, it's worth noting the System.Exception did not halt the program in a way that exited my grammar-transform pipeline despite having set -eo pipefail in my shell. trgroup ran for 10+ minutes with low/no CPU consumption after the error occurred.

Do you know what's happening here?
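
Until the hang itself is understood, a pipeline can protect itself by putting a time limit on the stage that stalls. The sketch below uses the GNU coreutils timeout command, which is an assumption about the environment. Trash lists its own trwdog tool for killing long-running programs, but its options are not documented here, so it is not used. The function name run_with_limit is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard for one pipeline stage. Requires GNU coreutils `timeout`.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
# stdin and stdout pass through unchanged, so it can sit inside a pipeline.
run_with_limit() {
  local seconds="$1" rc=0
  shift
  timeout "$seconds" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    # 124 is the status `timeout` returns when the limit was reached.
    echo "timed out after ${seconds}s: $*" >&2
  fi
  return "$rc"
}
```

With set -o pipefail in effect, trparse g.g4 | run_with_limit 600 trgroup | trprint would then fail with a non-zero status after ten minutes instead of waiting indefinitely.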

W3C EBNF is wrong

Please refer to https://www.w3.org/TR/2010/REC-xquery-20101214/#EBNFNotation

There is a working version here: https://www.bottlecaps.de/rr/ui

But, for all the bluster in w3c.org in saying that it provides "high quality" specifications, W3C EBNF itself is not formally defined!

BNF Notation != W3C EBNF

The EBNF used here is defined in XML 1.0 [EBNF-NOTATION].

=> XML 1.0 EBNF NOTATION

  • Start symbol begins with lowercase letter. Everything else is uppercase start? Really? Then "element" is a start symbol?

XQuery 3.1, more detail here

  • Notice differences statement!

Trash Build

Hello,

I'm trying to build this project, but it cannot find the TrashBase NuGet package. Can you please help me with this?

I'll use it to convert antlr2 grammar to antlr4 if I can by the way.

Edit: I have tried to use VS2019 extension, but it throws an error for Antlr4.Runtime.Standard.dll and I could not work it out.

Thanks,

Best Regards

Multiple file xgrep should work analogously to grep

When I do a grep "hello" *.g4, I see a list of lines preceded by a file name. When I do a trparse *.g4 | trxgrep ' //TOKEN_REF[text()="EOF"]', I see a parsing result set, but if I pipe to trtext, I get the text, but have no idea which file this came from. There seems to be a missing flag here.

Add test case generator

  • Grammarinator
    • Hodován, R., Kiss, Á. and Gyimóthy, T., 2018, November. Grammarinator: a grammar-based open source fuzzer. In Proceedings of the 9th ACM SIGSOFT international workshop on automating TEST case design, selection, and evaluation (pp. 45-48).
    • https://github.com/renatahodovan/grammarinator
    • Python3
    • "Unparser()" creates a random parse tree that fits within some specified parameter limits. "Unlexer()" creates a random token of some type. Note: there is an option --keep-trees on grammarinator-generate, but I cannot get it to work. I think it's critical to know what the intended parse is supposed to be in trying to figure out what is going wrong when the test case is actually parsed.
    • Section 2, paragraph 2, "AST"--I don't know why it's called an AST, when it is actually a CST. Why do people keep confusing the 50+ y.o. term.
    • Does not perform randomized inter-token spacing. It requires a "serializer", one provided, but it is not randomized, and there is no analysis as to when no intertoken spacing is required.
  • Gramtest

I have yet to find a good paper on enumeration method, defined in clear formal manner, defining and using derivations. Derivation

  • Purdom, P., 1972. A sentence generator for testing parsers. BIT Numerical Mathematics, 12(3), pp.366-375.

trgroup fails with exception

Versions
trparse 0.11.5
trgroup 0.11.5
trprint 0.11.5

command line:
trparse g.g4 |trgroup|trprint

Grammar:
Note: commenting any of alts make trgroup happy

grammar trgroupfail;
alter_table_cmd
: ADD_P COLUMN IF_P NOT EXISTS columnDef
| ALTER opt_column colid alter_column_default
| ALTER opt_column colid SET NOT NULL_P
| NOT OF
;

Error:
System.Exception: Exception of type 'System.Exception' was thrown.
at NWayDiff.Classical`1.classical_lcs(List`1 a, List`1 b, Int32 i, Int32 j, Dictionary`2 memo)
at NWayDiff.Classical`1.classical_lcs(List`1 a, List`1 b, Int32 i, Int32 j, Dictionary`2 memo)
at NWayDiff.Classical`1.classical_lcs(List`1 a, List`1 b, Int32 i, Int32 j, Dictionary`2 memo)
at NWayDiff.Difdef_impl`1.add_vec_to_diff_classical(Diff`1& a, Int32 fieldid, List`1 b)
at NWayDiff.Difdef_impl`1.add_vec_to_diff(Diff`1& a, Int32 fileid, List`1 b)
at NWayDiff.Difdef_impl`1.merge(Int32 fmask)
at NWayDiff.Difdef`1.merge()
at LanguageServer.Transform.Group(List`1 nodes, Document document)
at Trash.CGroup.Execute(Config config) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\trgroup\CGroup.cs:line 84
at Trash.Program.MainInternal(String[] args) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\trgroup\Program.cs:line 68
at Trash.Program.Main(String[] args) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\trgroup\Program.cs:line 14

trgen fails to generate driver code

The trgen program doesn't work with a number of scenarios:

  • Multiple grammars scenario, as reported here.
  • Missing test scenario, as reported here.
  • trgen -t Go parser doesn't work because it puts the parser in the wrong directory.
  • trgen has the capability to just directly gather the information from the grammar(s) in the current directory.

These need to be fixed (after I get the optimized ISO C++ grammars finished).

Add in parsing of grammars into trgen

When a pom.xml file is not supplied, trgen assumes information passed via command line. Instead, the tool should just read the grammars in the directory, and look for the first rule with EOF at the end. In addition, when the tool is able to read the grammar, I can then focus on generating a target-specific grammar from a "target agnostic" grammar. Whether we choose Java as the accepted format of actions, or we have an "options { language=TargetAgnostic; }" is a good question.
