dtpreda / predictive

A general-purpose C++ parser generator

License: MIT License

CMake 3.01% C++ 96.99%

predictive's Issues

Parser annotation support

Currently, the parser does not support annotations. Support should be added so that richer parsing trees can be produced. Changing the Parser and NonTerminal classes should suffice.

The program does not currently support skip expressions. The file passed to the program should be opened by a dedicated function that takes into account the skip expressions declared in the new grammar.

Symbol Existence Verification

All terminals should have been properly declared (with the exception of EOF) and all non-terminals on the right-hand side should have been declared, at some point, on the left-hand side.
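
The check described above can be sketched as a single pass over the rules. This is a minimal illustration under stated assumptions: `Symbol`, `Rule` and `find_undeclared` are hypothetical names for a simplified grammar model, not predictive's actual classes.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified grammar model (assumption, not the project's types).
struct Symbol {
    std::string name;
    bool is_terminal;
};

struct Rule {
    std::string lhs;          // non-terminal being defined
    std::vector<Symbol> rhs;  // symbols of one production
};

// Returns the sorted names of symbols that are used but never declared.
std::vector<std::string> find_undeclared(const std::set<std::string>& declared_terminals,
                                         const std::vector<Rule>& rules) {
    // Every non-terminal that appears on some left-hand side counts as declared.
    std::set<std::string> defined;
    for (const Rule& r : rules) defined.insert(r.lhs);

    std::set<std::string> missing;
    for (const Rule& r : rules) {
        for (const Symbol& s : r.rhs) {
            if (s.is_terminal) {
                // EOF is the one terminal that needs no declaration.
                if (s.name != "EOF" && declared_terminals.count(s.name) == 0) {
                    missing.insert(s.name);
                }
            } else if (defined.count(s.name) == 0) {
                missing.insert(s.name);
            }
        }
    }
    return std::vector<std::string>(missing.begin(), missing.end());
}
```

Returning all offending names, rather than stopping at the first, lets the semantic check report every problem in one run.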

Token Ambiguity

The MATCH_ALL and REGEX_EXPR tokens can easily be mistaken for the ID token. This ambiguity should be resolved to ease parser development.

End of file for new grammars

A rule and a symbol should be created to detect and process the end of file: Start' -> Start <EOF>.

This should be done before computing the grammar sets.
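
A minimal sketch of that augmentation step follows. The `Grammar` and `Rule` structs and the `augment_with_eof` function are hypothetical stand-ins; only the rule shape `Start' -> Start <EOF>` comes from the issue.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical grammar model (assumption): symbols are plain strings.
struct Rule {
    std::string lhs;
    std::vector<std::string> rhs;
};

struct Grammar {
    std::string start;
    std::vector<Rule> rules;
};

// Adds Start' -> Start <EOF> and makes Start' the new start symbol.
// Meant to run exactly once, before Nullable/First/Follow are computed.
void augment_with_eof(Grammar& g) {
    const std::string new_start = g.start + "'";
    g.rules.insert(g.rules.begin(), Rule{new_start, {g.start, "<EOF>"}});
    g.start = new_start;
}
```

Because the new rule is in place before the sets are computed, `<EOF>` ends up in the Follow set of the original start symbol without any special casing.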

Nullable, First and Follow

To generate the parser, we need a parsing table. For that, we need to determine the Nullable, First and Follow sets for each of the symbols.

The Nullable set requires support for empty rules. While the closure * creates empty rules, it only does so in a specific context. This can be solved by making the Expansion symbol of the prediCtive grammar nullable.
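
The three sets are usually computed together with a fixed-point iteration: keep applying the rules until nothing changes. The sketch below shows that textbook approach under assumptions; `Rule`, `GrammarSets` and `compute_sets` are illustrative names, symbols are plain strings, and an empty right-hand side stands for an empty rule.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Rule {
    std::string lhs;
    std::vector<std::string> rhs;  // an empty vector is an empty rule
};

struct GrammarSets {
    std::set<std::string> nullable;
    std::map<std::string, std::set<std::string>> first;
    std::map<std::string, std::set<std::string>> follow;
};

// Inserts every element of src into dst; returns true if dst grew.
static bool merge(std::set<std::string>& dst, const std::set<std::string>& src) {
    const std::size_t before = dst.size();
    dst.insert(src.begin(), src.end());
    return dst.size() != before;
}

GrammarSets compute_sets(const std::vector<Rule>& rules,
                         const std::set<std::string>& terminals) {
    GrammarSets s;
    for (const std::string& t : terminals) s.first[t] = {t};

    bool changed = true;
    while (changed) {  // repeat until a full pass changes nothing
        changed = false;
        for (const Rule& r : rules) {
            // Nullable and First: scan the right-hand side left to right.
            bool all_nullable = true;
            for (const std::string& sym : r.rhs) {
                const std::set<std::string> f = s.first[sym];  // copy: sym may equal lhs
                if (merge(s.first[r.lhs], f)) changed = true;
                if (s.nullable.count(sym) == 0) { all_nullable = false; break; }
            }
            if (all_nullable && s.nullable.insert(r.lhs).second) changed = true;

            // Follow: scan right to left; `trailer` is what may follow the current symbol.
            std::set<std::string> trailer = s.follow[r.lhs];
            for (auto it = r.rhs.rbegin(); it != r.rhs.rend(); ++it) {
                const std::string& sym = *it;
                if (terminals.count(sym) != 0) {
                    trailer = {sym};
                    continue;
                }
                if (merge(s.follow[sym], trailer)) changed = true;
                if (s.nullable.count(sym) == 0) trailer.clear();
                merge(trailer, s.first[sym]);
            }
        }
    }
    return s;
}
```

For the grammar `S -> A b`, `A -> a A`, `A -> (empty)`, this marks only `A` as nullable, gives `First(S) = {a, b}` and `Follow(A) = {b}`.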

Regex Validation

To prevent errors on the parser generation phase, each terminal's regex should be validated at a semantic check level.
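
One way to run this check is to try compiling each pattern and report the failure. The sketch assumes terminals are matched with the standard library's `std::regex` (ECMAScript syntax by default), which the issue does not state; `is_valid_regex` is a hypothetical helper name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if `pattern` compiles as a std::regex.
// On failure, the library's message is stored in `error` when it is provided.
bool is_valid_regex(const std::string& pattern, std::string* error = nullptr) {
    try {
        std::regex compiled(pattern);  // throws std::regex_error on a malformed pattern
        (void)compiled;
        return true;
    } catch (const std::regex_error& e) {
        if (error != nullptr) *error = e.what();
        return false;
    }
}
```

Running it for every terminal during the semantic check turns a late, hard-to-trace generation failure into an early error tied to a specific declaration.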

Node and node-related instances refactor

The node class should always hold pointers as references to other nodes, be it children or parents. Therefore, it should only work with Nodes in that appropriate format. A refactor is required to force this requirement upon the class.
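
As a rough illustration of a pointer-only node, the sketch below lets each node own its children through `std::unique_ptr` and refer to its parent through a non-owning raw pointer. This ownership split is an assumption made for the example; the issue only requires that nodes refer to each other through pointers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Node {
public:
    explicit Node(std::string name) : name_(std::move(name)) {}

    // Copying (and therefore moving) is disabled so parent pointers never dangle.
    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;

    // Takes ownership of `child`, links it to this node, and returns a handle to it.
    Node* add_child(std::unique_ptr<Node> child) {
        child->parent_ = this;
        children_.push_back(std::move(child));
        return children_.back().get();
    }

    const std::string& name() const { return name_; }
    Node* parent() const { return parent_; }
    const std::vector<std::unique_ptr<Node>>& children() const { return children_; }

private:
    std::string name_;
    Node* parent_ = nullptr;                       // non-owning back reference
    std::vector<std::unique_ptr<Node>> children_;  // owning references
};
```

Since `add_child` only accepts a `std::unique_ptr<Node>`, a node can never be attached by value.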

Parsing table

After determining the Nullable, First and Follow sets, the parsing table should be built to generate the parser. While building it, it must be verified that the grammar is, indeed, LL(1).
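
The LL(1) check falls out of table construction: if two rules compete for the same (non-terminal, lookahead) cell, the grammar is not LL(1). The sketch assumes each rule already carries its lookahead set, that is, the First set of its right-hand side plus the Follow set of its left-hand side when the right-hand side is nullable. `PredictedRule`, `Table` and `build_ll1_table` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A rule reduced to what the table needs (assumption for this sketch).
struct PredictedRule {
    std::string lhs;
    std::set<std::string> lookahead;  // terminals that select this rule
};

// Maps (non-terminal, lookahead terminal) to the index of the rule to apply.
using Table = std::map<std::pair<std::string, std::string>, std::size_t>;

// Fills `table` and returns true, or returns false on the first conflict.
bool build_ll1_table(const std::vector<PredictedRule>& rules, Table& table) {
    table.clear();
    for (std::size_t i = 0; i < rules.size(); ++i) {
        for (const std::string& terminal : rules[i].lookahead) {
            const bool inserted =
                table.emplace(std::make_pair(rules[i].lhs, terminal), i).second;
            if (!inserted) return false;  // two rules share one cell: not LL(1)
        }
    }
    return true;
}
```

A fuller version would probably record which two rules collided so the error message can name them.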

Wrap-up

All that suffices now is to create a main function responsible for handling all phases of the program.

It should take two arguments: the grammar file and the file to be parsed.

It should output the parsing tree obtained.

Symbol Table Creation

New visitors should be created for the purpose of establishing a symbol table, required for semantic verification of the program. The actual symbol table class should also be created.

Project Vision

A project vision should be written to clearly state the end goals of this project. It should be added to the root README, with a second section for the main features.

Parser Grammar

The first step is to properly document the grammar of the parser. A good documentation should provide:

The CFG for the parser grammar;
The First and Follow sets for each terminal and non-terminal;
The Nullable value for each terminal and non-terminal;
The LL(1) parsing table.

Nested Closures

The `ClosureSimplifierVisitor` does not currently support nested closures. Fixing this should be straightforward.

AST printing

For debugging purposes (and other unforeseen ones), the AST could be printed as text to standard output for easier visualization.

Every node should take up a line, with each depth level increase getting an extra tab at the beginning of its line. Annotations should be included, next to the node's name.
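
The layout described above maps onto a short recursive function. In this sketch `AstNode` and `print_ast` are hypothetical, and showing the annotation in square brackets after the name is an assumed format; the issue only asks for it to appear next to the node's name.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node (assumption): an empty annotation means "none".
struct AstNode {
    std::string name;
    std::string annotation;
    std::vector<std::unique_ptr<AstNode>> children;
};

// Prints one node per line, indented with one tab per depth level.
void print_ast(const AstNode& node, std::ostream& out, std::size_t depth = 0) {
    out << std::string(depth, '\t') << node.name;
    if (!node.annotation.empty()) out << " [" << node.annotation << "]";
    out << '\n';
    for (const auto& child : node.children) print_ast(*child, out, depth + 1);
}
```

Calling `print_ast(root, std::cout)` writes to standard output; taking a `std::ostream&` also lets the same function write to a file or a string.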

Bundle visiting for AST conversion

The visitors are now capable of converting the parsing tree into an AST. Their behaviour should be bundled together in a class which provides the conversion, while hiding away specific details.

Token recognition

With the grammar now properly defined, the first step should be to allow the program to properly recognize the terminal symbols. This recognition requires the establishment of priority levels for the terminals list for disambiguation (e.g., "TOKENS" could be recognized as the TOKENS terminal symbol or as an ID terminal symbol).
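
A minimal sketch of priority-based recognition follows. It assumes terminals are listed from highest to lowest priority and matched with `std::regex`, and it prefers the longest match before falling back on priority; that combination is one common convention, not necessarily the one predictive uses. `Terminal`, `Token` and `next_token` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct Terminal {
    std::string name;
    std::regex pattern;
};

struct Token {
    std::string name;
    std::string text;
};

// Tries every terminal at `pos` and keeps the longest match.
// On equal length, the terminal listed first (higher priority) wins.
bool next_token(const std::vector<Terminal>& terminals, const std::string& input,
                std::size_t pos, Token& token) {
    std::size_t best_len = 0;
    bool found = false;
    for (const Terminal& t : terminals) {
        std::smatch m;
        // match_continuous forces the match to start exactly at `pos`.
        if (std::regex_search(input.begin() + pos, input.end(), m, t.pattern,
                              std::regex_constants::match_continuous)) {
            const std::size_t len = static_cast<std::size_t>(m.length(0));
            if (len > best_len) {  // strict '>' keeps the earlier terminal on ties
                best_len = len;
                token.name = t.name;
                token.text = m.str(0);
                found = true;
            }
        }
    }
    return found;
}
```

With `TOKENS` listed before `ID`, the input `TOKENS` is recognized as the TOKENS terminal, while `TOKENSX` is still recognized as an ID because that match is longer.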

Recursive Descent Parsing

Now with the ability to properly parse Tokens, the next step is to parse the grammar through recursive descent. The basic algorithm for LL(1) must be implemented. This will allow for input correctness verification.

Furthermore, the algorithm should already be responsible for generating a basic parsing tree with the overall structure of the file. Node annotating and flattening of certain branches should be left for later.
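
To make the idea concrete, here is a small recursive descent parser for a toy LL(1) grammar, `List -> ID ListTail` and `ListTail -> COMMA ID ListTail | (empty)`. The grammar, `ParseNode` and `ToyParser` are all invented for illustration and are not the prediCtive grammar; tokens are represented by their terminal names only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct ParseNode {
    std::string name;
    std::vector<std::unique_ptr<ParseNode>> children;
};

class ToyParser {
public:
    explicit ToyParser(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

    // List -> ID ListTail, followed by the end of input.
    std::unique_ptr<ParseNode> parse_list() {
        auto node = make_node("List");
        node->children.push_back(expect("ID"));
        node->children.push_back(parse_tail());
        if (pos_ != tokens_.size()) {
            throw std::runtime_error("unexpected token " + tokens_[pos_]);
        }
        return node;
    }

private:
    // ListTail -> COMMA ID ListTail | (empty)
    std::unique_ptr<ParseNode> parse_tail() {
        auto node = make_node("ListTail");
        if (peek() == "COMMA") {  // one token of lookahead selects the rule
            node->children.push_back(expect("COMMA"));
            node->children.push_back(expect("ID"));
            node->children.push_back(parse_tail());
        }
        return node;  // the empty rule leaves the node without children
    }

    std::string peek() const { return pos_ < tokens_.size() ? tokens_[pos_] : "<EOF>"; }

    // Consumes the next token if it has the expected kind, otherwise reports an error.
    std::unique_ptr<ParseNode> expect(const std::string& kind) {
        if (peek() != kind) {
            throw std::runtime_error("expected " + kind + ", found " + peek());
        }
        ++pos_;
        return make_node(kind);
    }

    static std::unique_ptr<ParseNode> make_node(const std::string& name) {
        auto node = std::make_unique<ParseNode>();
        node->name = name;
        return node;
    }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};
```

Each non-terminal becomes one function, and the resulting tree mirrors the grammar exactly, intermediate nodes included, which is why flattening and annotation can be deferred to later passes.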

Parsing Tree Simplification

To return the parsing tree the user asked for, there should be a visitor that correctly removes all the Intermediate_NonTerminal_X nodes while keeping the annotations carried by those nodes.
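
One possible shape for this simplification is sketched below as a plain recursive function rather than a visitor, to keep it short. It splices the children of every intermediate node into that node's parent. How annotations should be preserved is not specified in the issue; this sketch assumes, purely for illustration, that a removed node's annotation is handed to any of its children that has none. `TreeNode`, `is_intermediate` and `flatten` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct TreeNode {
    std::string name;
    std::string annotation;  // empty when absent
    std::vector<std::unique_ptr<TreeNode>> children;
};

// True for generated helper nodes such as Intermediate_NonTerminal_3.
bool is_intermediate(const TreeNode& n) {
    return n.name.rfind("Intermediate_NonTerminal_", 0) == 0;
}

// Replaces every intermediate descendant of `node` with that descendant's children.
void flatten(TreeNode& node) {
    std::vector<std::unique_ptr<TreeNode>> kept;
    for (auto& child : node.children) {
        flatten(*child);  // simplify bottom-up, so nested intermediates are gone already
        if (!is_intermediate(*child)) {
            kept.push_back(std::move(child));
            continue;
        }
        for (auto& grandchild : child->children) {
            // Assumed policy: unannotated children inherit the removed node's annotation.
            if (grandchild->annotation.empty()) grandchild->annotation = child->annotation;
            kept.push_back(std::move(grandchild));
        }
    }
    node.children = std::move(kept);
}
```

The root itself is never removed, so `flatten` can be called directly on the tree returned by the parser.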

Keywords

Keywords must be defined. The list should include:

~~A mandatory Start Non-Terminal, responsible for indicating the grammar's start point. Therefore, there should be no START token allowed~~;
Symbol name representation on the AST should have a keyword associated;
Consumed tokens should also have a keyword associated with them;
~~All prediCtive keywords, such as TOKENS, SKIP or RULES, should also not be allowed~~.

More entries may be added to this list.

Visitor(s)

With parsing now finished, the parsing tree needs to be converted into a syntax tree. For this, a generic visitor should be created to allow for contextual approaches to each node type.

Context-specific visitors should also be created, in order to convert the parsing tree into a syntax tree.
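
The split between a generic visitor and context-specific ones can be sketched with classic double dispatch. All class names here (`Visitor`, `Node`, `TerminalNode`, `NonTerminalNode`, `TerminalCounter`) are hypothetical and only illustrate the pattern, not the project's actual hierarchy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct TerminalNode;
struct NonTerminalNode;

// Generic visitor: one overload per node type.
class Visitor {
public:
    virtual ~Visitor() = default;
    virtual void visit(TerminalNode& node) = 0;
    virtual void visit(NonTerminalNode& node) = 0;
};

struct Node {
    virtual ~Node() = default;
    virtual void accept(Visitor& visitor) = 0;
};

struct TerminalNode : Node {
    explicit TerminalNode(std::string t) : text(std::move(t)) {}
    void accept(Visitor& visitor) override { visitor.visit(*this); }
    std::string text;
};

struct NonTerminalNode : Node {
    explicit NonTerminalNode(std::string n) : name(std::move(n)) {}
    void accept(Visitor& visitor) override { visitor.visit(*this); }
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
};

// Context-specific visitor: counts the terminals below a node.
class TerminalCounter : public Visitor {
public:
    void visit(TerminalNode&) override { ++count_; }
    void visit(NonTerminalNode& node) override {
        for (auto& child : node.children) child->accept(*this);
    }
    std::size_t count() const { return count_; }

private:
    std::size_t count_ = 0;
};
```

A visitor that builds the syntax tree would follow the same shape, with each `visit` overload deciding how its node type contributes to the result.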
