I've spent some time familiarising myself with Fling and I am positively impressed by your work. Naturally, I am starting to hit some limitations as I explore further. So the goal of this issue is to discuss one particular limitation. I'd like to know what you think and whether it is possible to overcome this limitation in Fling.
The limitation comes from the restriction on the class of accepted grammars, which must be LL. Here are some challenges that stem from it:
- Any sophisticated grammar needs to be rewritten to fit the class of LL grammars. This is mostly a matter of programming ergonomics, and I can see how the translation could be implemented.
- Generated ASTs directly correspond to their LL grammars. This challenge is the most significant one as it makes the generated AST difficult to work with: it is hard to reason about in the same way as LL grammars are hard to read. Moreover, there is an issue of evolvability. Certain changes that induce little semantic variation, such as operator precedence, might require significant changes to the LL grammar resulting in proportionate changes to the resulting AST.
For example, let's consider a context-free grammar for a language that has predicates and means of combining them:
Condition = Predicate | Condition 'and' Condition | Condition 'or' Condition | '(' Condition ')'
Predicate = ...
Notably, this grammar is not LL as it involves left recursion.
To provide this grammar to Fling, it has first to be transformed:
Condition = Predicate Condition$$ | CompoundCondition
CompoundCondition = '(' Condition ')' Condition$$
Condition$ = 'or' Condition | 'and' Condition
Condition$$ = Condition$ Condition$$ | ε
Now in Fling:
derive(Condition).to(Predicate, Condition$$).or(CompoundCondition).
derive(CompoundCondition).to(begin, Condition, end, Condition$$).
derive(Condition$).to(or, Condition).or(and, Condition).
derive(Condition$$).to(Condition$, Condition$$).orNone().
And the resulting AST:
interface Condition {}
record Condition1(Predicate predicate, Condition$$ condition$$) implements Condition {}
record CompoundCondition(Condition condition, Condition$$ condition$$) implements Condition {}
interface Condition$ {}
record Condition$1(Condition condition) implements Condition$ {}
record Condition$2(Condition condition) implements Condition$ {}
interface Condition$$ {}
record Condition$$1(Condition$ condition$, Condition$$ condition$$) implements Condition$$ {}
record Condition$$2 implements Condition$$ {}
Clearly, such an AST is difficult to reason about because this is the level of LL grammars.
Ideally, an AST corresponding to the original grammar would have the following form:
interface Condition {}
record AndCondition(Condition left, Condition right) implements Condition {}
record OrCondition(Condition left, Condition right) implements Condition {}
record CompoundCondition(Condition cond) implements Condition {}
record Predicate(...) implements Condition {}
And the original grammar would be specified as follows:
specialize(Condition).into(AndCondition, OrCondition, CompoundCondition, Predicate)
derive(AndCondition).to(Condition, and, Condition).
derive(OrCondition).to(Condition, or, Condition).
derive(CompoundCondition).to(begin, Condition, end).
derive(Predicate).to(...).
Overall, it would be great to be able to:
- Specify a context-free grammar as input to Fling (given that it can be transformed into LL).
- Have the generated AST resemble the input grammar as closely as possible.
As I already mentioned, I know that CFG to LL translation is possible, so point 1 could be achieved.
The requirement outlined in point 2, however, seems vastly more difficult to achieve. It would
probably also require changes to the parsing algorithm. A mapping between the original and
transformed grammars would need to be maintained and used to construct the desirable AST. I can only
guess at possible approaches, so I strongly wish to hear what the authors of Fling think.