kaby76 / domemtech.trash
Toolkit for grammars
License: MIT License
After a bit of refactoring, here are the transforms that need to be fixed:
// Left recursion:
// A -> A b1 | A b2 | ... | a1 | a2 | ... ;
// => A -> (a1 | a2 | ... ) (b1 | b2 | ...)*;
// Note, A on RHS cannot have any postfix operators.
//
// A -> A? b1 | A? b2 | ... | a1 | a2 | ...;
// A -> A b1 | b1 | A b2 | b2 | ... | a1 | a2 | ...;
// A -> b1 | b2 | ... | a1 | a2 | ... | A b1 | A b2 | ...;
// A -> ( a1 | a2 | ... | b1 | b2 | ... ) (b1 | b2 | ...)* ;
// A on RHS must only be "A?".
//
// Note, the rule cannot have any alts without A?.
//
// Right recursion:
// Convert A -> b1 A | b2 A | ... | a1 | a2 | ... ;
// into A -> (b1 | b2 | ...)* (a1 | a2 | ... )
//
// A -> a1 | a2 | ... | b1 A? | b2 A? | ...;
// A -> (b1 | b2 | ...)* (a1 | a2 | b1 | b2 | ...)
// A on RHS must only be "A?".
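A minimal Python sketch of the first left-recursion rewrite above, including the A? variant. It assumes a rule is given as a list of alternatives, each a list of plain symbol strings, with "A?" spelled as the rule name plus "?". The function name eliminate_left_recursion is hypothetical, not Trash's API, and it only enforces the restriction noted in the comments (A may appear only as the first symbol).

```python
def eliminate_left_recursion(name, alts):
    """Rewrite  A -> A b1 | A? b2 | ... | a1 | a2 | ...
    into        A -> (a1 | a2 | ... | b2 | ...) (b1 | b2 | ...)*
    Each alternative is a list of symbols; [] is the empty alternative."""
    opt = name + "?"
    plain_heads, opt_heads, tails = [], [], []
    for alt in alts:
        first, rest = (alt[0], alt[1:]) if alt else (None, [])
        if name in rest or opt in rest:
            raise ValueError(f"{name} may only appear as the first symbol: {alt}")
        if first == name:
            tails.append(rest)          # A b   contributes b to the loop
        elif first == opt:
            tails.append(rest)          # A? b  ==  A b | b
            opt_heads.append(rest)
        else:
            plain_heads.append(alt)     # non-recursive alternative
    if not tails:
        return None                     # rule is not left-recursive
    heads = plain_heads + opt_heads
    if not heads:
        raise ValueError(f"{name} has no non-recursive alternative")

    def block(group):
        return "(" + " | ".join(" ".join(a) for a in group) + ")"

    return f"{name} : {block(heads)} {block(tails)}* ;"


print(eliminate_left_recursion("A", [["A", "b1"], ["A", "b2"], ["a1"], ["a2"]]))
# A : (a1 | a2) (b1 | b2)* ;
print(eliminate_left_recursion("A", [["A?", "b1"], ["A?", "b2"], ["a1"]]))
# A : (a1 | b1 | b2) (b1 | b2)* ;
```

The right-recursion rewrite is the mirror image: collect the b's from alternatives that end in A and emit the loop before the non-recursive group.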
Versions
trparse 0.11.5
trgroup 0.11.5
trprint 0.11.5
command line:
trparse g.g4 |trgroup|trprint
Grammar:
Note: commenting out any of the alts makes trgroup happy.
grammar trgroupfail;
alter_table_cmd
: ADD_P COLUMN IF_P NOT EXISTS columnDef
| ALTER opt_column colid alter_column_default
| ALTER opt_column colid SET NOT NULL_P
| NOT OF
;
Error:
System.Exception: Exception of type 'System.Exception' was thrown.
at NWayDiff.Classical`1.classical_lcs(List`1 a, List`1 b, Int32 i, Int32 j, Dictionary`2 memo)
at NWayDiff.Classical`1.classical_lcs(List`1 a, List`1 b, Int32 i, Int32 j, Dictionary`2 memo)
at NWayDiff.Classical`1.classical_lcs(List`1 a, List`1 b, Int32 i, Int32 j, Dictionary`2 memo)
at NWayDiff.Difdef_impl`1.add_vec_to_diff_classical(Diff`1& a, Int32 fieldid, List`1 b)
at NWayDiff.Difdef_impl`1.add_vec_to_diff(Diff`1& a, Int32 fileid, List`1 b)
at NWayDiff.Difdef_impl`1.merge(Int32 fmask)
at NWayDiff.Difdef`1.merge()
at LanguageServer.Transform.Group(List`1 nodes, Document document)
at Trash.CGroup.Execute(Config config) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\trgroup\CGroup.cs:line 84
at Trash.Program.MainInternal(String[] args) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\trgroup\Program.cs:line 68
at Trash.Program.Main(String[] args) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\trgroup\Program.cs:line 14
For this grammar:
grammar A;
all: e* EOF;
e:
e D e
| e S e
| e M e
| e P e
| OP e CP
| e (LT | LE | GT | GE | EQ | NE) e
| e A e
| e O e
| NUMBER
| STRING
| IDENTIFIER
;
OP: '(';
CP: ')';
D : '/';
S : '*';
M : '-';
P : '+';
LT: '<';
LE: '<=';
GT: '>';
GE: '>=';
EQ: '==';
NE: '!=';
A: '&&';
O: '||';
NUMBER: [0-9]+;
STRING: '"' ~'"'*? '"';
IDENTIFIER: [a-zA-Z]+;
WS: [ \t\n\r]+ -> channel(HIDDEN);
Generate a CSharp target parser (trgen -t CSharp -s all) and test it with the input "8*4/2". Apart from the fact that the grammar defines the multiplicative operators incorrectly (they are in different alts, so they get different precedence!), "trparse -input '8*4/2' | trtree" produces a truncated tree. If the input is in a file, "trparse file | trtree" works fine.
Version: 0.13.8.
When I do a grep "hello" *.g4, I see a list of lines, each preceded by a file name. When I do a trparse *.g4 | trxgrep ' //TOKEN_REF[text()="EOF"]', I see a parsing result set, but if I pipe it to trtext, I get the text with no idea which file it came from. There seems to be a missing flag here.
The trgen program doesn't work in a number of scenarios:
trgen -t Go
The generated parser doesn't work because trgen puts the parser in the wrong directory. These need to be fixed (after I get the optimized ISO C++ grammars finished).
As noted by Ivan Kochurkin, Antlr parsers perform better--and just look more reasonable--when common alt prefixes are grouped. See this note.
So tranalyze, which I just added to the toolchain, should check for common prefixes and flag potential groupings.
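As a rough illustration of such a check, here is a Python sketch that buckets alternatives by their first symbol and reports the longest prefix shared by every alternative in a bucket. The representation (alternatives as lists of symbol strings) and the function names are assumptions. A real check would work on the parse tree and might also look for longer prefixes shared by only a subset of a bucket.

```python
from collections import defaultdict


def common_prefix(a, b):
    """Longest common leading run of symbols of two alternatives."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def find_prefix_groups(alts):
    """Return (shared_prefix, alternative_indices) for each first symbol used by 2+ alts."""
    buckets = defaultdict(list)
    for i, alt in enumerate(alts):
        if alt:
            buckets[alt[0]].append(i)
    groups = []
    for indices in buckets.values():
        if len(indices) < 2:
            continue
        prefix = alts[indices[0]]
        for i in indices[1:]:
            prefix = common_prefix(prefix, alts[i])
        groups.append((tuple(prefix), indices))
    return groups


# The alter_table_cmd rule from the trgroup report, one list per alternative.
alter_table_cmd = [
    ["ADD_P", "COLUMN", "IF_P", "NOT", "EXISTS", "columnDef"],
    ["ALTER", "opt_column", "colid", "alter_column_default"],
    ["ALTER", "opt_column", "colid", "SET", "NOT", "NULL_P"],
    ["NOT", "OF"],
]
print(find_prefix_groups(alter_table_cmd))
# [(('ALTER', 'opt_column', 'colid'), [1, 2])]
```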
Using the latest version of trash (8.5),
// bison
SelectStmt: select_no_parens %prec UMINUS
| select_with_parens %prec UMINUS
;
becomes
selectStmt : select_no_parens UMINUS
| select_with_parens UMINUS
;
The desired outcome is
selectStmt : select_no_parens
| select_with_parens
;
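In bison, %prec only assigns the precedence bison uses to resolve conflicts; it does not match any input, so dropping it entirely (rather than turning UMINUS into a token reference) keeps the language the same. Below is a text-level Python sketch of the desired outcome. trconvert works on parse trees, so this regex is only an illustration, and it assumes the %prec operand is a plain identifier.

```python
import re

# "%prec NAME" plus the spaces/tabs before it; assumes NAME is a plain identifier.
PREC = re.compile(r"[ \t]*%prec[ \t]+\w+")


def strip_prec(rule_text):
    """Remove bison %prec annotations from the text of a rule."""
    return PREC.sub("", rule_text)


bison = """SelectStmt: select_no_parens %prec UMINUS
| select_with_parens %prec UMINUS
;"""
print(strip_prec(bison))
```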
demo:
grammar temp;
foo: (bar)?; // <- can be safely replaced with `bar?`
bar: 'baz';
; trparse ./temp.g4 | trrup | trprint
# no changes
# grammar temp;
# foo: (bar)?;
# bar: 'baz';
; trrup --version
# trrup 0.11.3
Hi! I was testing out the script to convert grammars on a .y grammar, and I got an unexpected error:
System.Runtime.CompilerServices.SwitchExpressionException: Non-exhaustive switch expression failed to match its input.
Unmatched value was BisonParser.g4.
at Docs.Class1.CreateDoc(ParsingResultSet parse_info) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\Docs\Class1.cs:line 63
at Trash.CConvert.Execute(Config config) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trconvert\CConvert.cs:line 44
at Trash.Program.MainInternal(String[] args) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trconvert\Program.cs:line 65
at Trash.Program.Main(String[] args) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trconvert\Program.cs:line 14
Looking at line 63 of Docs/Class1.cs, it seems like trconvert doesn't support conversion from Bison to Antlr4 without additional configuration. Is that correct?
Reproduction:
#!/usr/bin/env bash
# with docker and git accessible
# clone trash
git clone https://github.com/kaby76/Domemtech.Trash.git /tmp/trash
cd /tmp/trash
# build a docker image with trash's tools installed, since I don't have .NET installed locally
echo '
# Dockerfile, built in the repo root cloned to 5cfd838
FROM mcr.microsoft.com/dotnet/sdk
ENV PATH="$PATH:/root/.dotnet/tools"
WORKDIR /trash
COPY . /trash
RUN dotnet tool install -g trparse
RUN dotnet tool install -g trconvert
CMD bash
' | docker build -t trash -f - /tmp/trash;
git clone --depth 1 https://github.com/postgres/postgres.git /tmp/postgres
docker run \
-v /tmp/postgres:/postgres \
--workdir /postgres \
trash sh -c '
trparse /postgres/src/backend/parser/gram.y | trconvert
'
I've got a token named SKIP, which is an ANTLR keyword. Do you think it would be appropriate for trconvert to append a '_' to tokens that collide with ANTLR keywords?
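One way trconvert could do this, sketched in Python under two assumptions: the caller supplies the set of reserved names (only SKIP, from this report, is used below; ANTLR's full list is not reproduced here), and the underscore is repeated if the first choice would itself collide. The function name is hypothetical. The resulting mapping could then be applied with something like trrename -r 'SKIP,SKIP_'.

```python
def escape_reserved(symbols, reserved):
    """Map each symbol that collides with a reserved word to a '_'-suffixed name.

    Extra underscores are added if the new name is already taken or reserved.
    """
    taken = set(symbols)
    mapping = {}
    for sym in symbols:
        if sym not in reserved:
            continue
        new = sym + "_"
        while new in taken or new in reserved:
            new += "_"
        taken.add(new)
        mapping[sym] = new
    return mapping


print(escape_reserved(["SKIP", "ID", "NUMBER"], reserved={"SKIP"}))
# {'SKIP': 'SKIP_'}
```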
I know that it's possible to have text results instead of parse tree results. A trxgrep //foobar/text() should return the text attribute of the node, not a .NET type name.
I pointed the sharp end of trgroup v0.11.2 at the trconvert-ed postgres grammar, and I got
System.Exception: Exception of type 'System.Exception' was thrown.
at NWayDiff.Classical`1.classical_lcs(List`1 a, List`1 b, Int32 i, Int32 j, Dictionary`2 memo)
at NWayDiff.Difdef_impl`1.add_vec_to_diff_classical(Diff`1& a, Int32 fieldid, List`1 b)
at NWayDiff.Difdef_impl`1.add_vec_to_diff(Diff`1& a, Int32 fileid, List`1 b)
at NWayDiff.Difdef_impl`1.add_vec_to_diff(Diff`1& a, Int32 fileid, List`1 b)
at NWayDiff.Difdef_impl`1.merge(Int32 fmask)
at NWayDiff.Difdef`1.merge()
at LanguageServer.Transform.Group(List`1 nodes, Document document)
at Trash.CGroup.Execute(Config config) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trgroup\CGroup.cs:line 84
at Trash.Program.MainInternal(String[] args) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trgroup\Program.cs:line 68
at Trash.Program.Main(String[] args) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trgroup\Program.cs:line 14
I'm unable to parse what's going wrong. Also, it's worth noting that the System.Exception did not halt the program in a way that exited my grammar-transform pipeline, despite my having set -eo pipefail in my shell. trgroup ran for 10+ minutes with low/no CPU consumption after the error occurred.
Do you know what's happening here?
This program needs to be fixed.
For this grammar:
grammar temp;
my_rule: foo opt_bar baz;
opt_bar: bar | ;
the following command outputs both rules in the grammar.
trparse temp.g4 | trxgrep ' //parserRuleSpec[//alternative[@ChildCount=0]]' | trtext
This doesn't make sense because trparse temp.g4 | trxgrep ' //alternative[@ChildCount=0]' | trtree returns one match, which is correct. And trparse temp.g4 | trxgrep ' //parserRuleSpec[//al]' | trtext returns zero matches because there are no parserRuleSpec nodes with an al descendant anywhere.
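In standard XPath, a path that starts with // inside a predicate is absolute: it is evaluated from the document root, not from the parserRuleSpec being tested. Read that way, //parserRuleSpec[//alternative[@ChildCount=0]] selects every rule as soon as any empty alternative exists anywhere in the tree, and //parserRuleSpec[//al] selects nothing because no al node exists at all; the relative form is //parserRuleSpec[.//alternative[@ChildCount=0]]. Assuming trxgrep's XPath engine follows the standard here, this Python sketch reproduces the difference with xml.etree.ElementTree on a hand-made, simplified XML stand-in for the parse tree (the element layout is an assumption, not trparse's real output).

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the parse tree of:
#   my_rule: foo opt_bar baz;
#   opt_bar: bar | ;
XML = """
<grammarSpec>
  <parserRuleSpec name="my_rule">
    <alternative><element/><element/><element/></alternative>
  </parserRuleSpec>
  <parserRuleSpec name="opt_bar">
    <alternative><element/></alternative>
    <alternative/>
  </parserRuleSpec>
</grammarSpec>
"""
TREE = ET.fromstring(XML.strip())


def rules_with_empty_alt(root, relative):
    hits = []
    for rule in root.iter("parserRuleSpec"):
        # relative=True  ~ .//alternative : search below this rule only
        # relative=False ~ //alternative  : search the whole document
        scope = rule if relative else root
        if any(len(alt) == 0 for alt in scope.iter("alternative")):
            hits.append(rule.get("name"))
    return hits


print(rules_with_empty_alt(TREE, relative=False))  # ['my_rule', 'opt_bar']
print(rules_with_empty_alt(TREE, relative=True))   # ['opt_bar']
```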
This is in reference to #21
1+2+3+4+5+6+7+8+9
trparse in.txt | trxgrep ' count(//SCIENTIFIC_NUMBER/ancestor::*)'
Note, the xpath implementation doesn't have a way to create a tuple, so I can't do a count and also return the node. That's a failure in xpath2, and I don't think it's available in xpath3.
This crashes in trxgrep. If I remove the function call "count()", it works.
R. Lämmel, V. Zaytsev, An Introduction to Grammar Convergence, in: M. Leuschel, H. Wehrheim (Eds.), Proceedings of the Seventh International Conference on Integrated Formal Methods (iFM 2009), Vol. 5423 of LNCS, Springer, 2009, pp. 246–260. doi: 10.1007/978-3-642-00255-7_17. https://link.springer.com/chapter/10.1007/978-3-642-00255-7_17
I've got a grammar where some rules match the empty string. I'd like to transform those rules such that they no longer match the empty string and have all references to the rules made optional. Example:
// before
my_rule: foo opt_bar baz;
opt_bar: bar | ;
// after
my_rule: foo opt_bar? baz;
opt_bar: bar;
More difficult example:
// before
foo: bar | opt_baz | quux; // <- foo matches the empty string too
opt_baz: baz | ;
// after
foo: bar | opt_baz? | quux;
opt_baz: baz;
If you point me to where this feature should go, I'd be happy to take a stab at it.
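A minimal Python sketch of the requested transform, assuming the grammar is a dict from rule name to a list of alternatives (each a list of symbol strings, with [] for the empty alternative). It handles only rules with an explicit empty alternative, as in both examples; a rule like foo that is nullable only indirectly is left as is, which matches the desired output above. The function name is hypothetical.

```python
def make_empty_rules_optional(grammar):
    """Drop explicit empty alternatives and mark every reference to those rules with '?'."""
    # Rules with an empty alternative and at least one non-empty one.
    targets = {
        name for name, alts in grammar.items()
        if [] in alts and any(alt for alt in alts)
    }
    result = {}
    for name, alts in grammar.items():
        new_alts = []
        for alt in alts:
            if name in targets and not alt:
                continue  # the empty match is now expressed by 'name?' at each use
            new_alts.append([sym + "?" if sym in targets else sym for sym in alt])
        result[name] = new_alts
    return result


print(make_empty_rules_optional({"my_rule": [["foo", "opt_bar", "baz"]], "opt_bar": [["bar"], []]}))
# {'my_rule': [['foo', 'opt_bar?', 'baz']], 'opt_bar': [['bar']]}
```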
In a bison grammar there were two production rules distinguished only by case: character and Character (1). trparse gram.y | trconvert | trsponge normalized the distinct symbols in the input grammar to character in the output grammar. That's a reasonable way to deal with a pathological grammar, but I think it's a bug in how trparse parses yacc-compatible grammars: the yacc spec says that
[Rule] Names are of arbitrary length, made up of letters, periods ( '.' ), underscores ( '_' ), and non-initial digits. Uppercase and lowercase letters are distinct.
I'd be happy to take a shot at addressing this bug if you point me to the right area of the code.
1: there was also a CHARACTER token to spice things up, but trparse handled that perfectly.
For https://stackoverflow.com/a/71641054/4779853, one should make a custom error handler for cases where the error is a NoViableAltException. That code should go through the possible transitions in the ATN to compute the lookahead.
If I perform a trparse abbLexer.g4 abbParser.g4 (the abb grammar from the grammars-v4 repo), the parsing result set is correct. However, the code that reads the serialized data creates a bad deserialized object--the token streams are messed up. Creating the objects requires some juggling of data, but it has to be done after reading everything into temporary variables.
The web tool, still in its infancy, needs to provide trgen for an input grammar. The download should be a zip, tar, or tgz file.
There's an issue with the grammars-v4/verilog/systemverilog grammar: it's not split correctly. Trash/trgen should detect improper grammars that appear to be split (two separate grammars, one for the lexer and the other for the parser) but don't declare the grammar type correctly.
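A possible detection heuristic, sketched in Python: read each file's grammar declaration and flag files named like half of a split grammar (…Lexer.g4 / …Parser.g4) that are not declared as "lexer grammar" / "parser grammar". The file-name convention, the regex, and the function names are assumptions for illustration, not trgen's logic.

```python
import re

# Optional "lexer"/"parser" keyword, then "grammar Name;" at the start of a line.
HEADER = re.compile(r"^\s*(?:(lexer|parser)\s+)?grammar\s+(\w+)\s*;", re.MULTILINE)


def grammar_kind(text):
    """Return 'lexer', 'parser', 'combined', or None if no declaration is found."""
    m = HEADER.search(text)
    if m is None:
        return None
    return m.group(1) or "combined"


def check_split(files):
    """files maps file name -> grammar text; returns a list of problem messages."""
    problems = []
    for fname, text in files.items():
        if fname.endswith("Lexer.g4"):
            expected = "lexer"
        elif fname.endswith("Parser.g4"):
            expected = "parser"
        else:
            continue
        kind = grammar_kind(text)
        if kind != expected:
            problems.append(f"{fname}: header says {kind!r}, expected {expected!r}")
    return problems


print(check_split({
    "SVLexer.g4": "lexer grammar SVLexer;\nA: 'a';",
    "SVParser.g4": "grammar SVParser;\nr: A;",
}))
# ["SVParser.g4: header says 'combined', expected 'parser'"]
```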
The trmvsr is a terrible program. It's useless except for parser grammars, and it is just a specialized case of the much more powerful and useful trmove. To move a rule to the top, run something like this:
trparse foo.g4 | trmove "//ruleSpec[parserRuleSpec/RULE_REF/text()='start_rule_name']" "(//ruleSpec)[1]" | trsponge -c true
trmvsr must be removed.
This grammar parses fine in Antlr3 but doesn't parse after conversion to Antlr4. This is because the grammar, which is generated by XText, contains double-parenthesized expressions, e.g., RULE_UNRESTRICTED_NAME : '\'' ('\\' ('b'|'t'|'n'|'f'|'r'|'"'|'\''|'\\')|~(('\\'|'\'')))* '\'';
InteralAlf.txt
For these rules:
xx : 'a' xx | 'a';
yy : yy 'b' | 'b' ;
zz : | 'a' | 'a' zz;
z2 : | 'b' | z2 'b';
trkleene outputs this:
xx : ( 'a' ) * ( 'a' ) ;
yy : ( 'b' ) ( 'b' ) * ;
zz : ( 'a' ) * ( | 'a' ) ;
z2 : ( | 'b' ) ( 'b' ) * ;
While I think it is okay, it could be improved by using the + operator and dropping the unnecessary parentheses:
xx: 'a'+;
yy: 'b'+;
zz: 'a'*;
z2: 'b'*;
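The four rewrites follow from X* X = X X* = X+ and X* ( | X) = ( | X) X* = X*. Because both orders describe the same language, one check covers both the left- and right-recursive output. A Python sketch, assuming trkleene's result can be viewed as a repeated group plus a non-repeated group of alternatives (tuples of symbols, with () as the empty alternative); the function names are hypothetical.

```python
def fmt(alt):
    """Render one alternative, parenthesizing multi-symbol sequences."""
    return alt[0] if len(alt) == 1 else "( " + " ".join(alt) + " )"


def simplify_closure(repeated, other):
    """Collapse a (...)* group and its neighboring group when both use the same X.

    repeated: alternatives inside the (...)* group.
    other:    alternatives of the non-repeated group.
    Returns the simplified text, or None if no rule applies.
    """
    reps = set(repeated)
    if len(reps) != 1:
        return None
    (x,) = reps
    if not x:
        return None
    if set(other) == {x}:
        return fmt(x) + "+"           # X* X  ==  X X*  ==  X+
    if set(other) == {x, ()}:
        return fmt(x) + "*"           # X* ( | X)  ==  ( | X) X*  ==  X*
    return None


# xx : ( 'a' ) * ( 'a' ) ;   and   zz : ( 'a' ) * ( | 'a' ) ;
print(simplify_closure([("'a'",)], [("'a'",)]))      # 'a'+
print(simplify_closure([("'a'",)], [(), ("'a'",)]))  # 'a'*
```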
Reproduction case (with both trparse and tranalyze at 0.11.5):
grammar temp;
foo: ('bar')*;
trparse ./temp.g4 | tranalyze | grep 'Rule foo is' | sed 's/^/# /g'
# Rule foo is NonEmpty
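foo: ('bar')*; can match the empty string, so it should not be reported as NonEmpty. The usual way to decide this is a fixed-point computation of the nullable rules. Here is a Python sketch over a small, hypothetical EBNF node representation (not Trash's parse tree).

```python
def nullable_rules(grammar):
    """Return the set of rules that can match the empty string.

    grammar maps rule name -> node, where a node is one of
    ('tok', text), ('ref', rule), ('seq', [nodes]), ('alt', [nodes]),
    ('star', node), ('plus', node), ('opt', node). ('seq', []) is the empty alt.
    """
    nullable = set()

    def can_be_empty(node):
        kind, arg = node
        if kind == "tok":
            return False
        if kind == "ref":
            return arg in nullable          # not (yet) known nullable counts as non-empty
        if kind == "seq":
            return all(can_be_empty(n) for n in arg)
        if kind == "alt":
            return any(can_be_empty(n) for n in arg)
        if kind in ("star", "opt"):
            return True
        if kind == "plus":
            return can_be_empty(arg)
        raise ValueError(f"unknown node kind {kind!r}")

    changed = True
    while changed:                          # iterate until no new rule becomes nullable
        changed = False
        for name, body in grammar.items():
            if name not in nullable and can_be_empty(body):
                nullable.add(name)
                changed = True
    return nullable


# foo: ('bar')*;
print(nullable_rules({"foo": ("star", ("tok", "'bar'"))}))  # {'foo'}
```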
Parse errors are just ignored. They should not be.
Hi @kaby76, would you be willing to accept contributions to this repo? If so, would you be willing to document your desired process for someone to contribute to this repo?
I'm grateful for your work. I'd like to contribute patches in a way that makes your life easier and respects your creative ownership of the code. However, I understand if you'd prefer to limit contributions: reviewing PRs is work that you don't owe anyone. In either case, I think it might be worth writing out a CONTRIBUTING.md to establish either a process or some ground rules for contributions.
During trparse /tmp/gram.g4 | trkleene, I get a nonzero exit with
ERROR(S):
A sequence value not bound to option name is defined with few items than
required.
--help Display this help screen.
--version Display version information.
value pos. 0
This appears to be a problem with the definition of the command-line interface.
I'm trying to get the "start rule" for a grammar. That should be the parser rule that contains a TOKEN_REF = "EOF". But, when I do that, I get a set containing basically every parser rule.
When a pom.xml file is not supplied, trgen relies on information passed on the command line. Instead, the tool should just read the grammars in the directory and look for the first rule with EOF at the end. In addition, once the tool is able to read the grammar, I can focus on generating a target-specific grammar from a "target agnostic" grammar. Whether we choose Java as the accepted format for actions, or have an "options { language=TargetAgnostic; }", is a good question.
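A Python sketch of the heuristic described above: take the parser rules with an alternative that ends in EOF, and list first those that no other rule references. The dict-of-alternatives representation and the function name are assumptions for illustration.

```python
def start_rule_candidates(grammar):
    """Parser rules with an alternative ending in EOF; unreferenced ones first.

    grammar maps rule name -> list of alternatives (lists of symbol strings).
    """
    ends_with_eof = [
        name for name, alts in grammar.items()
        if any(alt and alt[-1] == "EOF" for alt in alts)
    ]
    referenced = {sym for alts in grammar.values() for alt in alts for sym in alt}
    # sorted() is stable, and False (not referenced) sorts before True.
    return sorted(ends_with_eof, key=lambda name: name in referenced)


# Parser rules of grammar A above (e* written as a single symbol).
grammar_a = {
    "all": [["e*", "EOF"]],
    "e": [["e", "D", "e"], ["OP", "e", "CP"], ["NUMBER"]],
}
print(start_rule_candidates(grammar_a))  # ['all']
```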
Found here:
Lines 315 to 319 in c3113e8
This code assumes that <grammarName> is directly inside <configuration>, but the docs for antlr4test-maven-plugin say that it doesn't have to be: you can have multiple scenarios, each testing a different grammar.
Discovered on my PR over at antlr/grammars-v4, which has this:
<configuration>
<scenarios>
<scenario>
<scenarioName>FBX</scenarioName>
<verbose>false</verbose>
<showTree>false</showTree>
<entryPoint>start</entryPoint>
<grammarName>FBX</grammarName>
<packageName></packageName>
<testFileExtension>.fbx</testFileExtension>
<exampleFiles>examples/</exampleFiles>
</scenario>
<scenario>
<scenarioName>FBXSemantic</scenarioName>
<verbose>false</verbose>
<showTree>false</showTree>
<entryPoint>start</entryPoint>
<grammarName>FBXSemantic</grammarName>
<packageName></packageName>
<testFileExtension>.fbx</testFileExtension>
<exampleFiles>examples/</exampleFiles>
</scenario>
</scenarios>
</configuration>
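A Python sketch of a lookup that accepts both layouts: it finds every <grammarName> anywhere below a <configuration> element and ignores XML namespaces (pom.xml files normally use the Maven POM namespace). It uses only the standard library's xml.etree.ElementTree; trgen itself is C#, so this only illustrates the search, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, e.g. '{http://maven.apache.org/POM/4.0.0}scenario'."""
    return tag.rsplit("}", 1)[-1]


def grammar_names(pom_xml):
    """All <grammarName> values found anywhere below a <configuration> element.

    Assumes <configuration> elements are not nested inside each other.
    """
    root = ET.fromstring(pom_xml)
    names = []
    for config in root.iter():
        if local_name(config.tag) != "configuration":
            continue
        for elem in config.iter():
            if local_name(elem.tag) == "grammarName" and elem.text:
                names.append(elem.text.strip())
    return names
```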
Instead of trrename "foo_,foo", allow trrename "(?<name>.*)_,$name".
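A Python sketch of how a pattern-based rename could be resolved into concrete old/new pairs before rewriting the grammar. The proposal above uses .NET-style (?<name>...) and $name; Python spells these (?P<name>...) and \g<name>. The function name and the collision check are assumptions, not trrename behavior.

```python
import re


def regex_renames(symbols, pattern, replacement):
    """Map each symbol that fully matches `pattern` to its rewritten name.

    Python syntax: (?P<name>...) captures, \\g<name> substitutes.
    Raises ValueError if a new name would collide with an existing or new symbol.
    """
    rx = re.compile(pattern)
    existing = set(symbols)
    renames = {}
    for sym in symbols:
        m = rx.fullmatch(sym)
        if m is None:
            continue
        new = m.expand(replacement)
        if new == sym:
            continue
        if new in existing or new in renames.values():
            raise ValueError(f"renaming {sym} to {new} would collide with another symbol")
        renames[sym] = new
    return renames


print(regex_renames(["foo_", "bar_", "baz"], r"(?P<name>.*)_", r"\g<name>"))
# {'foo_': 'foo', 'bar_': 'bar'}
```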
Hello,
I'm trying to build this project, but the build cannot find the TrashBase NuGet package. Can you please help me with this?
By the way, I'd like to use it to convert an antlr2 grammar to antlr4, if I can.
Edit: I have tried to use the VS2019 extension, but it throws an error for Antlr4.Runtime.Standard.dll and I could not work it out.
Thanks,
Best Regards
Parsing results, I think, can pass multiple trees between programs. However, trdelete wants to do var pr = LanguageServer.ParsingResultsFactory.Create(document); and that returns null because the results come from stdin and there are multiple tree nodes. I want this to work so I can collect all class declarations in a C# file--actually, all of antlr/antlr4/runtime/CSharp/src/...
In the XText Antlr4 grammar, this line occurs:
assignment : ( ( ( ( '=>' ) ) | ( ( '->' ) ) ) ? ( ( validID ) ) ( ( ( '+=' | '=' | '?=' ) ) ) ( ( assignableTerminal ) ) ) ;
trrup removes too many parentheses around ( '+=' | '=' | '?=' ).
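A sketch, in Python with a hypothetical representation (alternatives as lists of element strings), of the kind of check trrup could apply before dropping a pair of parentheses. Parentheses around a multi-alternative block can only go when the block is alone in its alternative; ( '+=' | '=' | '?=' ) ends up between validID and assignableTerminal, so unwrapping it would change the grammar. The check must be re-run after each removal, because removing an outer wrapper can leave the inner block next to new siblings. This is an assumed rule set, not trrup's implementation.

```python
SUFFIXES = ("?", "*", "+")


def can_remove_parens(block_alts, siblings, suffix=None):
    """Decide whether the parentheses around a block can be dropped.

    block_alts: the alternatives inside the parentheses, each a list of elements.
    siblings:   number of other elements in the alternative containing the block.
    suffix:     '?', '*', '+' or None, the operator applied to the block.
    """
    if suffix is not None:
        # (x)? -> x? only for one alternative holding one unsuffixed element;
        # (x?)? -> x?? would become a non-greedy operator in ANTLR.
        return (len(block_alts) == 1 and len(block_alts[0]) == 1
                and not block_alts[0][0].endswith(SUFFIXES))
    if len(block_alts) == 1:
        return True                 # ( a b ) c  ==  a b c
    # x ( a | b ) is NOT x a | b; flatten only when the block is alone.
    return siblings == 0


print(can_remove_parens([["bar"]], siblings=0, suffix="?"))              # True
print(can_remove_parens([["'+='"], ["'='"], ["'?='"]], siblings=3))       # False
```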
Please refer to https://www.w3.org/TR/2010/REC-xquery-20101214/#EBNFNotation
There is a working version here: https://www.bottlecaps.de/rr/ui
But, for all the bluster on w3c.org about providing "high quality" specifications, W3C EBNF itself is not formally defined!
BNF Notation != W3C EBNF
The EBNF used here is defined in XML 1.0 [EBNF-NOTATION].
XQuery 3.1, more detail here
This thread has a great table describing the transformations involved in conversion.
Using trgroup version v0.11.0, I'm unable to reproduce the examples from the trgroup README:
echo "grammar temp;
a : 'X' 'B' 'Z' | 'X' 'C' 'Z' | 'X' 'D' 'Z' ;
" > ./temp.g4;
trparse ./temp.g4 | trgroup "//parserRuleSpec[RULE_REF/text() = 'a']//altList" | trprint
# prints 'no changes'
When I read the current trrename README and help text ("trrename renames rule symbols in a grammar."), I didn't expect renaming tokens to work. I tested it out, however, and I was pleasantly surprised!
test:
parser grammar temp;
a : MY_TOKEN
; trparse ./temp.g4 | trrename -r 'MY_TOKEN,OTHER_TOKEN' | trprint | sed 's/^/# /g' 130
# parser grammar temp;
# a : OTHER_TOKEN ;
This line, closure--I had to add the parens manually here.
When a Trash command gets an XPath expression that fails to parse, it outputs:
org.eclipse.wst.xml.xpath2.processor.XPathParserException: The parse operation was cancelled.
That's not really helpful--at all.
I don't know if this is possible, or even a good design: when I do trparse foo.g4, most of the time I'm just interested in whether the parse actually succeeded. I don't want the whole parsing result. But, clearly, for trparse foo.g4 | trtree, trparse should output a parsing result. And for trparse 2>&1 | less, I just want the list of error messages. Perhaps I should offer an optional arg for the parsing result?
As I pointed out in this conversation, there's a problem with trkleene. This must be fixed for optimizing the C++14/17/20 ISO grammars automatically.
I would like to create tuples of information from xpath expressions. For example, if I want to find nodes in a Java parse tree and tag specific nodes with an integer, the resulting sets cannot be combined because there's no operator or function to union value sets:
value-union(//classDeclaration/IDENTIFIER/string-join(' 1', text()) , //fieldDeclaration/variableDeclarators/variableDeclarator/variableDeclaratorId/IDENTIFIER/string-join(' 2',text()) )
I would like to bind a function value-union, written in C#, to trxgrep.
I have yet to find a good paper on the enumeration method, defined in a clear, formal manner, that defines and uses derivations. Derivation
This should be easy: look for public classes and public methods in classes. But #88 needs to be fixed first, since I need to grep for these classes and methods, and delete some nodes out of the found trees to get just the API declarations.