Giter Club home page Giter Club logo

Comments (3)

OriRoth avatar OriRoth commented on July 16, 2024

I am positively impressed by your work.

I'm happy to hear that.

The limitation comes from the restriction on the class of accepted grammars, which must be LL.

Take a look at the accompanied paper, where we presented a general solution for all deterministic context-free languages or, equivalently, all LR grammars.
A few years have passed and there's now much more to be said about our algorithm. Especially, we completely missed the connection between our "tree embedding" and real-time pushdown tree automata a la Guessarian 1984.
We chose to implement LL grammars because it's much easier. Yamazaki et al. 2019's embedding also supports LR grammars but implemented for LALR grammars.

I know that CFG to LL translation is possible, so point 1 could be achieved.

That's incorrect. LL grammars can express a strict subset of deterministic CFLs which is in turn a strict subset of CFLs.

Have the generated AST resemble the input grammar as closely as possible.

You get $$ variables because we use a naive namer (NaiveNamer) which doesn't make any smart attempts at guessing intermediate variable names. You could specify them explicitly though:

A := x y | w z

=>

A := B | C

B := x y
C := w z

That should work I think.

A better namer (Namer) could possibly infer better names automatically.

from fling.

homedirectory avatar homedirectory commented on July 16, 2024

@OriRoth, I am familiar with the paper - it is through the paper that I found out about your work. However, I must say that I am yet to grasp the tree encoding you described.

Regarding my statement about the translation of CFG to LL. While it is technically incorrect, I rather meant that it is possible to translate a subset of CFLs that are expressible in an LL grammar, such as the one exemplified (predicates and combinations), not all CFLs of course.

You mention the issue with picking appropriate names for intermediate variables. However, that is not the central issue I was trying to raise, so let me be more clear about it. I consider the existence of AST classes corresponding to intermediate variables itself problematic. In the above example, I could explicitly name intermediate variables in my grammar or, as you suggest, provide a smarter Namer, but that will still lead to an obscure AST being generated. Therefore, the focus should not be on names but on the structure of the AST. That is what I tried to convey by providing an example of the desired AST, which I shall repeat below:

interface Condition {}
record AndCondition(Condition left, Condition right) implements Condition {}
record OrCondition(Condition left, Condition right) implements Condition {}
record CompoundCondition(Condition cond) implements Condition {}
record Predicate(...) implements Condition {}

This AST does not contain any intermediate variables. It directly corresponds to the original CFG (also exemplified in the issue description but not repeated in this comment) which is not LL, but the underlying language can be expressed in LL (as it was shown in the issue description), hence the need for grammar transformation.
This poses a challenge for parsing and, ultimately, AST generation. The goal is for the AST to have a structure that closely resembles the original grammar; the LL grammar (a result of grammar transformation) would then remain an implementation detail.

from fling.

OriRoth avatar OriRoth commented on July 16, 2024

Ok, I think I get it now. Quite interesting. Wouldn't supporting a different class of grammars, such as LR or LALR, alleviate the problem? It's possible in theory. It might also be interesting to see how this issue is approached in classical parsers that support LL grammars, such as ANTLR. If they came up with creative solutions, these may be applicable here as well.

from fling.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.