domemtech.trash's Issues

Kleene still messed up

After a bit of refactoring, here are the transforms that need to be fixed:

        // Left recursion:
        // A -> A b1 | A b2 | ... | a1 | a2 | ... ;
        // => A ->  (a1 | a2 | ... ) (b1 | b2 | ...)*;
        // Note, A on RHS cannot have any postfix operators.
        //
        // A -> A? b1 | A? b2 | ... | a1 | a2 | ...;
        // A -> A b1 | b1 | A b2 | b2 | ... | a1 | a2 | ...;
        // A -> b1 | b2 | ... | a1 | a2 | ... | A b1 | A b2 | ...;
        // A -> ( a1 | a2 | ... | b1 | b2 | ... ) (b1 | b2 | ...)* ;
        // A on RHS must only be "A?".
        //
        // Note, the rule cannot have any alts without A?.
        //
        // Right recursion:
        // Convert A -> b1 A | b2 A | ... | a1 | a2 | ... ;
        // into A ->   (b1 | b2 | ...)* (a1 | a2 | ... )
        //
        // A -> a1 | a2 | ... | b1 A? | b2 A? | ...;
        // A -> (b1 | b2 | ...)* (a1 | a2 | b1 | b2 | ...)
        // A on RHS must only be "A?".
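The two plain cases in the comment (no "A?" variants) can be sketched roughly as follows. This is only an illustration in Python, not the trkleene code: the function name `rewrite_recursion` and the list-of-lists rule representation are made up for the example, and empty alternatives are simply rejected.

```python
def rewrite_recursion(name, alts, side="left"):
    """Rewrite a directly recursive rule using the Kleene star.

    `alts` is a list of alternatives, each a list of symbol strings
    (a hypothetical representation chosen for this sketch).
    Returns the new right-hand side as text, or None when the rule does
    not have the plain shape, in which case it should be left untouched.
    """
    if side == "left":
        recursive = [alt for alt in alts if alt and alt[0] == name]
        tails = [alt[1:] for alt in recursive]
    else:
        recursive = [alt for alt in alts if alt and alt[-1] == name]
        tails = [alt[:-1] for alt in recursive]
    bases = [alt for alt in alts if alt not in recursive]

    def mentions(alt):
        # True if the alt refers to A, with or without a postfix operator.
        return any(sym.rstrip("?*+") == name for sym in alt)

    # Need at least one recursive alt and only non-empty base alts.
    if not recursive or not bases or any(not b for b in bases):
        return None
    # A may appear only once per alt, in the recursive position.
    if any(not t or mentions(t) for t in tails) or any(mentions(b) for b in bases):
        return None

    def group(xs):
        return "( " + " | ".join(" ".join(x) for x in xs) + " )"

    star = group(tails) + "*"
    return f"{group(bases)} {star}" if side == "left" else f"{star} {group(bases)}"


print(rewrite_recursion("A", [["A", "b1"], ["A", "b2"], ["a1"], ["a2"]]))
print(rewrite_recursion("A", [["b1", "A"], ["b2", "A"], ["a1"], ["a2"]], side="right"))
```

Returning None for anything outside the plain shape keeps the sketch conservative: a rule that uses "A?" or mentions A in the middle of an alt is not rewritten at all.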

trgroup fails with exception

Versions
trparse 0.11.5
trgroup 0.11.5
trprint 0.11.5

command line:
trparse g.g4 |trgroup|trprint

Grammar:
Note: commenting out any one of the alts makes trgroup happy.

grammar trgroupfail;
alter_table_cmd
: ADD_P COLUMN IF_P NOT EXISTS columnDef
| ALTER opt_column colid alter_column_default
| ALTER opt_column colid SET NOT NULL_P
| NOT OF
;

Error:
System.Exception: Exception of type 'System.Exception' was thrown.
at NWayDiff.Classical`1.classical_lcs(List`1 a, List`1 b, Int32 i, Int32 j, Dictionary`2 memo)
at NWayDiff.Classical`1.classical_lcs(List`1 a, List`1 b, Int32 i, Int32 j, Dictionary`2 memo)
at NWayDiff.Classical`1.classical_lcs(List`1 a, List`1 b, Int32 i, Int32 j, Dictionary`2 memo)
at NWayDiff.Difdef_impl`1.add_vec_to_diff_classical(Diff`1& a, Int32 fieldid, List`1 b)
at NWayDiff.Difdef_impl`1.add_vec_to_diff(Diff`1& a, Int32 fileid, List`1 b)
at NWayDiff.Difdef_impl`1.merge(Int32 fmask)
at NWayDiff.Difdef`1.merge()
at LanguageServer.Transform.Group(List`1 nodes, Document document)
at Trash.CGroup.Execute(Config config) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\trgroup\CGroup.cs:line 84
at Trash.Program.MainInternal(String[] args) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\trgroup\Program.cs:line 68
at Trash.Program.Main(String[] args) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\trgroup\Program.cs:line 14

trparse -input isn't working

For this grammar:

grammar A;
all: e* EOF;
e: 
  e D e
  | e S  e
  | e M e
  | e P e
  | OP e CP
  | e (LT | LE | GT | GE | EQ | NE) e
  | e A e
  | e O e
  | NUMBER
  | STRING
  | IDENTIFIER
  ;    
OP: '(';
CP: ')';
D : '/';
S : '*';
M : '-';
P : '+';
LT: '<';
LE: '<=';
GT: '>';
GE: '>=';
EQ: '==';
NE: '!=';
A: '&&';
O: '||';
NUMBER: [0-9]+;
STRING: '"' ~'"'*? '"';
IDENTIFIER: [a-zA-Z]+;
WS: [ \t\n\r]+ -> channel(HIDDEN);

Generate a CSharp target parser (trgen -t CSharp -s all) and test it with the input "8*4/2". Setting aside the fact that the grammar doesn't define the multiplicative operators correctly (they sit in separate alts, which gives them different precedences), "trparse -input '8*4/2' | trtree" outputs a truncated tree. If the input is in a file, "trparse file | trtree" works fine.

Version 0.13.8.

Multiple file xgrep should work analogously to grep

When I do a grep "hello" *.g4, I see a list of matching lines, each preceded by its file name. When I do a trparse *.g4 | trxgrep ' //TOKEN_REF[text()="EOF"]', I see a parsing result set, and if I pipe that to trtext I get the text, but I have no idea which file each result came from. There seems to be a missing flag here.

trgen fails to generate driver code

The trgen program doesn't work with a number of scenarios:

  • Multiple grammars scenario, as reported here.
  • Missing test scenario, as reported here.
  • trgen -t Go parser doesn't work because it puts the parser in the wrong directory.
  • trgen should be able to gather the information directly from the grammar(s) in the current directory.

These need to be fixed (after I get the optimized ISO C++ grammars finished).

Add grouping to tranalyze

As noted by Ivan Kochurkin, Antlr parsers perform better--and just look more reasonable--with grouping of common alt prefixes. See this note.

So, tranalyze, which I just entered in the toolchain, should have a check for prefixes, and flag potential groupings.
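A prefix check of this kind could look roughly like the following Python sketch. It is not tranalyze code: the function names and the list-of-symbols representation of alts are assumptions made for the example, and it only looks at prefixes, not at common suffixes or infixes.

```python
from collections import defaultdict


def longest_common_prefix(seqs):
    """Return the longest list of leading symbols shared by every sequence."""
    prefix = []
    for column in zip(*seqs):
        if any(sym != column[0] for sym in column):
            break
        prefix.append(column[0])
    return prefix


def common_prefix_groups(alts):
    """Map each shared prefix (as a tuple) to the indices of the alts that start with it.

    Only alts that share at least their first symbol with another alt are reported.
    """
    by_first = defaultdict(list)
    for index, alt in enumerate(alts):
        if alt:
            by_first[alt[0]].append(index)
    groups = {}
    for indices in by_first.values():
        if len(indices) > 1:
            prefix = longest_common_prefix([alts[i] for i in indices])
            groups[tuple(prefix)] = indices
    return groups


# Hypothetical rule with two alts that could be grouped on a three-symbol prefix.
alts = [
    ["ALTER", "opt_column", "colid", "alter_column_default"],
    ["ALTER", "opt_column", "colid", "SET", "NOT", "NULL_P"],
    ["NOT", "OF"],
]
for prefix, indices in common_prefix_groups(alts).items():
    print("alts", indices, "share prefix:", " ".join(prefix))
```

Each reported group is only a candidate: whether grouping it actually improves the grammar is left to whoever reads the report.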

converting inline %prec in bison grammars leaves extra symbols

Using the latest version of trash (8.5),

// bison
SelectStmt: select_no_parens			%prec UMINUS
			| select_with_parens		%prec UMINUS
		;

becomes

selectStmt  : select_no_parens UMINUS
  | select_with_parens UMINUS
  ;

The desired outcome is

selectStmt  : select_no_parens
  | select_with_parens
  ;

Error converting bison grammar

Hi! I was testing out the script to convert grammars on a .y grammar, and I got an unexpected error:

System.Runtime.CompilerServices.SwitchExpressionException: Non-exhaustive switch expression failed to match its input.
Unmatched value was BisonParser.g4.
   at Docs.Class1.CreateDoc(ParsingResultSet parse_info) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\Docs\Class1.cs:line 63
   at Trash.CConvert.Execute(Config config) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trconvert\CConvert.cs:line 44
   at Trash.Program.MainInternal(String[] args) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trconvert\Program.cs:line 65
   at Trash.Program.Main(String[] args) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trconvert\Program.cs:line 14

Looking at line 63 of Docs/Class1.cs, it seems like trconvert doesn't support conversion from Bison to Antlr4 without additional configuration. Is that correct?

Reproduction:

#!/usr/bin/env bash
# with docker and git accessible

# clone trash
git clone https://github.com/kaby76/Domemtech.Trash.git /tmp/trash
cd /tmp/trash

# build a docker image with trash's tools installed, since I don't have .NET installed locally
echo '
# Dockerfile, built in the repo root cloned to 5cfd838
FROM mcr.microsoft.com/dotnet/sdk
ENV PATH="$PATH:/root/.dotnet/tools"
WORKDIR /trash
COPY . /trash
RUN dotnet tool install -g trparse
RUN dotnet tool install -g trconvert
CMD bash
' | docker build -t trash -f - /tmp/trash;

git clone --depth 1 https://github.com/postgres/postgres.git /tmp/postgres

docker run \
  -v /tmp/postgres:/postgres \
  --workdir /postgres \
   trash sh -c '
     trparse /postgres/src/backend/parser/gram.y | trconvert
  '

trxgrep //foobar/text() should return text

I know that it's possible to have text results instead of parsing tree results. A trxgrep //foobar/text() should return the text attribute for the node, not a .NET type name.

bug: unexplained error when transforming large grammar

I pointed the sharp end of trgroup v0.11.2 at the trconverted postgres grammar, and I got

System.Exception: Exception of type 'System.Exception' was thrown.
   at NWayDiff.Classical`1.classical_lcs(List`1 a, List`1 b, Int32 i, Int32 j, Dictionary`2 memo)
   at NWayDiff.Difdef_impl`1.add_vec_to_diff_classical(Diff`1& a, Int32 fieldid, List`1 b)
   at NWayDiff.Difdef_impl`1.add_vec_to_diff(Diff`1& a, Int32 fileid, List`1 b)
   at NWayDiff.Difdef_impl`1.add_vec_to_diff(Diff`1& a, Int32 fileid, List`1 b)
   at NWayDiff.Difdef_impl`1.merge(Int32 fmask)
   at NWayDiff.Difdef`1.merge()
   at LanguageServer.Transform.Group(List`1 nodes, Document document)
   at Trash.CGroup.Execute(Config config) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trgroup\CGroup.cs:line 84
   at Trash.Program.MainInternal(String[] args) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trgroup\Program.cs:line 68
   at Trash.Program.Main(String[] args) in C:\Users\kenne\Documents\GitHub\Domemtech.Trash\trgroup\Program.cs:line 14

I'm unable to parse what's going wrong. Also, it's worth noting the System.Exception did not halt the program in a way that exited my grammar-transform pipeline, despite having set -eo pipefail in my shell. trgroup ran for 10+ minutes with low/no CPU consumption after the error occurred.

Do you know what's happening here?

xpath engine doesn't seem to produce right result

For this grammar:

grammar temp;
my_rule: foo opt_bar baz;
opt_bar: bar | ;

the following command outputs both rules in the grammar.

trparse temp.g4 | trxgrep ' //parserRuleSpec[//alternative[@ChildCount=0]]' | trtext

This doesn't make sense because trparse temp.g4 | trxgrep ' //alternative[@ChildCount=0]' | trtree returns a result of one match, which is correct. And, trparse temp.g4 | trxgrep ' //parserRuleSpec[//al]' | trtext returns a result of zero matches because there are no parserRuleSpec nodes with an al descendant anywhere.

This is in reference to #21

Likely bug in trxgrep

  • Create the expression grammar (mkdir foo; cd foo; trgen; cd Generated; make).
  • Create input file "in.txt" with 1+2+3+4+5+6+7+8+9
  • Run: trparse in.txt | trxgrep ' count(//SCIENTIFIC_NUMBER/ancestor::*)'

Note, the xpath implementation doesn't have a way to create a tuple, so I can't do a count, and also return the node. That's a failure in xpath2 and I don't think it's available in xpath3.

This crashes in trxgrep. If I remove the function call "count()", it works.

feature request: make optional rules mandatory

I've got a grammar where some rules match the empty string. I'd like to transform those rules such that they no longer match the empty string and have all references to the rules made optional. Example:

// before
my_rule: foo opt_bar baz;
opt_bar: bar | ;
// after
my_rule: foo opt_bar? baz;
opt_bar: bar;

More difficult example:

// before
foo: bar | opt_baz | quux; // <- foo matches the empty string too
opt_baz: baz | ;
// after
foo: bar | opt_baz? | quux;
opt_baz: baz; 

If you point me to where this feature should go, I'd be happy to take a stab at it.
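To make the requested transform concrete, here is a rough Python sketch of it. It is not based on any existing Trash code: the dict-of-alts grammar representation and the function name are invented for the example, and it only handles rules that have a literally empty alt (not rules that are nullable through other rules).

```python
def make_optional_rules_mandatory(grammar):
    """Drop empty alts from rules and mark every reference to those rules with '?'.

    `grammar` maps a rule name to a list of alts; each alt is a list of symbols
    and the empty list stands for the empty alt (hypothetical representation).
    """
    # Rules with an empty alt and at least one non-empty alt to keep.
    emptyable = {
        name for name, alts in grammar.items()
        if any(not alt for alt in alts) and any(alt for alt in alts)
    }
    result = {}
    for name, alts in grammar.items():
        new_alts = []
        for alt in alts:
            if name in emptyable and not alt:
                continue  # remove the empty alt from the rule itself
            new_alts.append([sym + "?" if sym in emptyable else sym for sym in alt])
        result[name] = new_alts
    return result


before = {
    "foo": [["bar"], ["opt_baz"], ["quux"]],
    "opt_baz": [["baz"], []],
}
print(make_optional_rules_mandatory(before))
```

References that already carry a postfix operator (for example `opt_baz*`) are not matched by this sketch and would need their own handling.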

trparse fails to distinguish bison production rules with different casings

In a bison grammar there were two production rules distinguished by case: character and Character (1). trparse gram.y | trconvert | trsponge normalized the distinct symbols in the input grammar to character in the output grammar. That's a reasonable way to deal with a pathological grammar, but I think it's a bug in how trparse parses yacc-compatible grammars: the yacc spec says that

[Rule] Names are of arbitrary length, made up of letters, periods ( '.' ), underscores ( '_' ), and non-initial digits. Uppercase and lowercase letters are distinct.

I'd be happy to take a shot at addressing this bug if you point me to the right area of the code.


1: there was also a CHARACTER token to spice things up, but trparse handled that perfectly.
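Since ANTLR parser rule names must start with a lowercase letter, a conversion has to change the first letter of names like Character, and that is where distinct bison names can collide. One possible way to keep them distinct is sketched here in Python; the function name and the numeric-suffix scheme are assumptions for illustration, not what trconvert does.

```python
def antlr_rule_names(bison_names):
    """Map bison nonterminal names to distinct ANTLR parser rule names.

    The first letter is lower-cased; when that collides with a name already
    handed out, a numeric suffix is appended (a made-up scheme for this sketch).
    """
    mapping = {}
    used = set()
    for name in bison_names:
        candidate = name[0].lower() + name[1:]
        unique = candidate
        suffix = 1
        while unique in used:
            suffix += 1
            unique = f"{candidate}_{suffix}"
        used.add(unique)
        mapping[name] = unique
    return mapping


print(antlr_rule_names(["character", "Character"]))
```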

Remove and replace trmvsr

trmvsr is a terrible program. It's useless except for parser grammars, and it is just a special case of the much more powerful and useful trmove. To move a rule to the top, run something like this:

trparse foo.g4 | trmove "//ruleSpec[parserRuleSpec/RULE_REF/text()='start_rule_name']" "(//ruleSpec)[1]" | trsponge -c true

trmvsr must be removed.

Kleene should output better rewrites

For these rules:

xx : 'a' xx | 'a';
yy : yy 'b' | 'b' ;
zz : | 'a' | 'a' zz;
z2 : | 'b' | z2 'b';

trkleene outputs this:

xx : ( 'a' ) * ( 'a' ) ;
yy : ( 'b' ) ( 'b' ) * ;
zz : ( 'a' ) * ( | 'a' ) ;
z2 : ( | 'b' ) ( 'b' ) * ;

While I think this output is okay, it could be improved by using the +-operator and dropping the unnecessary parentheses.

xx: 'a'+;
yy: 'b'+;
zz: 'a'*;
z2: 'b'*;
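The simplification could be expressed along these lines. This is a Python sketch under assumptions, not trkleene's implementation: `bases` and `tails` are hypothetical lists of alts (each a list of symbols, with the empty list standing for an empty alt), as they would come out of the recursion rewrite. It relies on two identities: x x* and x* x are x+, and ( | x ) x* and x* ( | x ) are x*.

```python
def atom(alts):
    """Render a list of alts, adding parentheses only when they are needed."""
    text = " | ".join(" ".join(alt) for alt in alts)
    needs_parens = len(alts) > 1 or len(alts[0]) > 1
    return f"({text})" if needs_parens else text


def simplify_closure(bases, tails, left=True):
    """Render `bases tails*` (left recursion) or `tails* bases` (right recursion).

    When the non-empty base alts are the same as the repeated alts, the result
    collapses to a single `+` or `*` closure.
    """
    nonempty = [b for b in bases if b]
    has_empty = len(nonempty) != len(bases)
    if sorted(nonempty) == sorted(tails):
        return atom(tails) + ("*" if has_empty else "+")
    # General case: keep both parts, still without redundant parentheses.
    if has_empty:
        base = "(" + " | ".join(" ".join(b) for b in bases) + ")"
    else:
        base = atom(bases)
    star = atom(tails) + "*"
    return f"{base} {star}" if left else f"{star} {base}"


print("xx:", simplify_closure([["'a'"]], [["'a'"]], left=False))
print("yy:", simplify_closure([["'b'"]], [["'b'"]]))
print("zz:", simplify_closure([[], ["'a'"]], [["'a'"]], left=False))
print("z2:", simplify_closure([[], ["'b'"]], [["'b'"]]))
```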

Policy and/or document request: contribution guidelines?

Hi @kaby76, would you be willing to accept contributions to this repo? If so, would you be willing to document your desired process for someone to contribute to this repo?

I'm grateful for your work. I'd like to contribute patches in a way that makes your life easier and respects your creative ownership of the code. However, I understand if you'd prefer to limit contributions: reviewing PRs is work that you don't owe anyone. In either case, I think it might be worth writing out a CONTRIBUTING.md to establish either a process or some ground rules for contributions.

Add in parsing of grammars into trgen

When a pom.xml file is not supplied, trgen assumes information passed via command line. Instead, the tool should just read the grammars in the directory, and look for the first rule with EOF at the end. In addition, when the tool is able to read the grammar, I can then focus on generating a target-specific grammar from a "target agnostic" grammar. Whether we choose Java as the accepted format of actions, or we have an "options { language=TargetAgnostic; }" is a good question.

trgen assumes that Maven project only contains one scenario when determining grammarName

Found here:

var pom_grammar_name = navigator
.Select("//plugins/plugin[artifactId=\"antlr4test-maven-plugin\"]/configuration/grammarName", nsmgr)
.Cast<XPathNavigator>()
.Select(t => t.Value)
.ToList();

This code is making an assumption that <grammarName> is directly inside <configuration>, but the actual docs for antlr4test-maven-plugin say that it doesn't have to be. You can have multiple scenarios, each testing a different grammar.

Discovered on my PR over at antlr/grammars-v4, which has this:

                <configuration>
                    <scenarios>
                        <scenario>
                            <scenarioName>FBX</scenarioName>
                            <verbose>false</verbose>
                            <showTree>false</showTree>
                            <entryPoint>start</entryPoint>
                            <grammarName>FBX</grammarName>
                            <packageName></packageName>
                            <testFileExtension>.fbx</testFileExtension>
                            <exampleFiles>examples/</exampleFiles>
                        </scenario>
                        <scenario>
                            <scenarioName>FBXSemantic</scenarioName>
                            <verbose>false</verbose>
                            <showTree>false</showTree>
                            <entryPoint>start</entryPoint>
                            <grammarName>FBXSemantic</grammarName>
                            <packageName></packageName>
                            <testFileExtension>.fbx</testFileExtension>
                            <exampleFiles>examples/</exampleFiles>
                        </scenario>
                    </scenarios>
                </configuration>
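For illustration, a query that accepts both layouts (grammarName directly under configuration, or nested inside scenarios) can be sketched in Python with the standard library. This is not the trgen code: the sample POM is a trimmed, hypothetical one, and it leaves out the XML namespace that real pom.xml files declare (which the C# code handles through nsmgr).

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free stand-in for a pom.xml with two scenarios (hypothetical).
SAMPLE_POM = """
<project>
  <build><plugins><plugin>
    <artifactId>antlr4test-maven-plugin</artifactId>
    <configuration>
      <scenarios>
        <scenario><grammarName>FBX</grammarName></scenario>
        <scenario><grammarName>FBXSemantic</grammarName></scenario>
      </scenarios>
    </configuration>
  </plugin></plugins></build>
</project>
"""


def grammar_names(pom_text):
    """Return every grammarName found anywhere below the plugin's configuration."""
    root = ET.fromstring(pom_text)
    # 'configuration//grammarName' matches direct children as well as
    # grammarName elements nested inside <scenarios>/<scenario>.
    path = ".//plugin[artifactId='antlr4test-maven-plugin']/configuration//grammarName"
    return [node.text for node in root.findall(path)]


print(grammar_names(SAMPLE_POM))
```

The only change relative to the query quoted above is the descendant step (`//`) after configuration; the same step is valid in the XPath used by the C# code, though whether trgen should then generate one driver per scenario is a separate question.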

Trash Build

Hello,

I'm trying to build this project, but it could not find the TrashBase NuGet package. Can you please help me with this?

By the way, I plan to use it to convert an antlr2 grammar to antlr4, if I can.

Edit: I have tried to use VS2019 extension, but it throws an error for Antlr4.Runtime.Standard.dll and I could not work it out.

Thanks,

Best Regards

Can't do "trparse ... | trxgrep ... | trdelete ... | trtext"

Parsing results, I think, can pass multiple trees between programs. However, trdelete wants to do var pr = LanguageServer.ParsingResultsFactory.Create(document); and that returns null because the results are stdin and there are multiple tree nodes. I want this to work so I can collect all class declarations in a C# file--actually, all of antlr/antlr4/runtime/CSharp/src/...

trrup removes required parentheses

In the XText Antlr4 grammar, this line occurs:

assignment : ( ( ( ( '=>' ) ) | ( ( '->' ) ) ) ? ( ( validID ) ) ( ( ( '+=' | '=' | '?=' ) ) ) ( ( assignableTerminal ) ) ) ;

trrup removes too many parentheses around ( '+=' | '=' | '?=' ).

W3C EBNF is wrong

Please refer to https://www.w3.org/TR/2010/REC-xquery-20101214/#EBNFNotation

There is a working version here: https://www.bottlecaps.de/rr/ui

But, for all the bluster in w3c.org in saying that it provides "high quality" specifications, W3C EBNF itself is not formally defined!

BNF Notation != W3C EBNF

The EBNF used here is defined in XML 1.0 [EBNF-NOTATION].

=> XML 1.0 EBNF NOTATION

  • Start symbol begins with lowercase letter. Everything else is uppercase start? Really? Then "element" is a start symbol?

XQuery 3.1, more detail here

  • Notice differences statement!

unable to reproduce trgroup command-line examples

Using trgroup version v0.11.0, I'm unable to reproduce the examples from the trgroup README:

echo "grammar temp;
a : 'X' 'B' 'Z' | 'X' 'C' 'Z' | 'X' 'D' 'Z' ;
" > ./temp.g4;
trparse ./temp.g4 | trgroup "//parserRuleSpec[RULE_REF/text() = 'a']//altList" | trprint
# prints 'no changes'

documentation request: explicitly state `trrename` can rename tokens

When I read the current trrename readme and help-text

trrename renames rule symbols in a grammar.

I didn't expect renaming tokens to work. I tested it out, however, and I was pleasantly surprised!

test:

parser grammar temp;
a : MY_TOKEN
;

trparse ./temp.g4 | trrename -r 'MY_TOKEN,OTHER_TOKEN' | trprint | sed 's/^/# /g'
# parser grammar temp;
# a : OTHER_TOKEN ;

Trash commands should be able to distinguish between clients that read output

I don't know if this is possible, or even a good design: when I do trparse foo.g4, most of the time I'm just interested in whether the parse actually succeeded. I don't want all the parsing result. But, clearly, for trparse foo.g4 | trtree, trparse should output a parsing result. And for trparse 2>&1 | less, I just want the list of error messages. Perhaps I should offer an optional arg for parsing result?

Add extension capability to xpath library

I would like to create tuples of information from xpath expressions. For example, if I want to find nodes in a Java parse tree and tag specific nodes with an integer, the resulting sets cannot be combined because there's no operator or function to union value sets:

value-union(//classDeclaration/IDENTIFIER/string-join(' 1', text()) , //fieldDeclaration/variableDeclarators/variableDeclarator/variableDeclaratorId/IDENTIFIER/string-join(' 2',text()) )

I would like to bind a function value-union, written in C#, to trxgrep.
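The intended behavior of value-union is not spelled out here, so the following Python sketch is only one guess at it: concatenate the value sequences, drop exact duplicates, and keep first-seen order. The names and the tagged tuples are invented for the example; the real function would be written in C# and registered with the XPath engine.

```python
def value_union(*value_sets):
    """Merge value sequences, dropping duplicates and keeping first-seen order.

    These semantics are an assumption made for this sketch.
    """
    seen = set()
    merged = []
    for values in value_sets:
        for value in values:
            if value not in seen:
                seen.add(value)
                merged.append(value)
    return merged


# Hypothetical results of two queries, each tagged with an integer.
classes = [(name, 1) for name in ["Foo", "Bar"]]
fields = [(name, 2) for name in ["count", "Foo"]]
print(value_union(classes, fields))
```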

Add test case generator

  • Grammarinator
    • Hodován, R., Kiss, Á. and Gyimóthy, T., 2018, November. Grammarinator: a grammar-based open source fuzzer. In Proceedings of the 9th ACM SIGSOFT international workshop on automating TEST case design, selection, and evaluation (pp. 45-48).
    • https://github.com/renatahodovan/grammarinator
    • Python3
    • "Unparser()" creates a random parse tree that fits within some specified parameter limits. "Unlexer()" creates a random token of some type. Note: there is an option --keep-trees on grammarinator-generate, but I cannot get it to work. I think it's critical to know what the intended parse is supposed to be in trying to figure out what is going wrong when the test case is actually parsed.
    • Section 2, paragraph 2, "AST": I don't know why it's called an AST when it is actually a CST. Why do people keep confusing the 50+ y.o. term?
    • Does not perform randomized inter-token spacing. It requires a "serializer", one provided, but it is not randomized, and there is no analysis as to when no intertoken spacing is required.
  • Gramtest

I have yet to find a good paper on the enumeration method, defined in a clear formal manner, that defines and uses derivations. Derivation

  • Purdom, P., 1972. A sentence generator for testing parsers. BIT Numerical Mathematics, 12(3), pp.366-375.
