Comments (10)
My first intention was to write some code that produces a single unit's AST as XML. That gives me the opportunity to use standard XML tools (e.g. XPath) to do some simple static analysis. For me, XPath is the best way to query the tree. With XML and XPath I can easily ask questions like "give me a list of variables which start with 'abc'".
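As a rough illustration of that kind of query, here is a minimal Python sketch. The XML fragment and its element and attribute names (`VARIABLES`, `VARIABLE`, `NAME`, `value`) are assumptions made for the example and may not match what DelphiAST actually emits. Python's built-in ElementTree supports only a subset of XPath, so the prefix test is done in Python; a full XPath 1.0 engine could use `starts-with()` instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical AST export; real DelphiAST element and attribute names may differ.
SAMPLE_AST = """
<UNIT name="Demo">
  <VARIABLES>
    <VARIABLE><NAME value="abcCount"/><TYPE name="Integer"/></VARIABLE>
    <VARIABLE><NAME value="total"/><TYPE name="Integer"/></VARIABLE>
    <VARIABLE><NAME value="abcName"/><TYPE name="string"/></VARIABLE>
  </VARIABLES>
</UNIT>
"""


def variables_with_prefix(xml_text, prefix):
    """Return the names of all declared variables that start with prefix."""
    root = ET.fromstring(xml_text.strip())
    # ".//VARIABLE/NAME" selects every NAME element directly under a VARIABLE.
    names = (node.get("value", "") for node in root.findall(".//VARIABLE/NAME"))
    # Delphi identifiers are case-insensitive, so compare in lower case.
    return [name for name in names if name.lower().startswith(prefix.lower())]


print(variables_with_prefix(SAMPLE_AST, "abc"))
```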
This project was not intended to be open source, but now it is.
That's why the class structure is rather primitive (only the TSyntaxNode class). A more complex AST with typed nodes, visitor support and so forth is a good direction for future development, though.
Error recovery is a pain point for me too, but I still have no time to deal with it. If somebody wishes to enhance the parser so that it can recover from errors, that would be great.
At the moment the main weakness is the processing of a unit's interface section (data type declarations and so forth) and of project/package files (I paid the most attention to units). The parser is in good shape, but the syntax tree builder still ignores many details, so in the near future I plan to concentrate on filling this gap. Any contributions are appreciated. We need a complete AST before we can start thinking about a symbol table.
Your idea about a common parser for all IDE experts sounds promising.
Maybe one day it will become a reality.
from delphiast.
Hi Vincent and Roman,
I was also thinking along the lines of a more strongly typed representation. I would be interested in helping out with it if needed. I have this dream of being able to for..in across all functions in a class, querying for matching names, parameter types, etc.
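To make the idea concrete, here is a small sketch of what a typed representation with iteration and querying could look like. It is written in Python purely for illustration; every class and field name (`Parameter`, `Method`, `ClassDecl`, `find_methods`) is hypothetical and not part of DelphiAST.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


# All class and field names below are hypothetical, not DelphiAST types.
@dataclass
class Parameter:
    name: str
    type_name: str


@dataclass
class Method:
    name: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class ClassDecl:
    name: str
    methods: List[Method] = field(default_factory=list)

    def __iter__(self) -> Iterator[Method]:
        # Lets callers write "for method in class_decl" (the for..in idea).
        return iter(self.methods)

    def find_methods(self, prefix: str = "", param_type: str = "") -> List[Method]:
        """Methods whose name starts with prefix and, if param_type is
        given, that take at least one parameter of that type."""
        result = []
        for method in self:
            if not method.name.lower().startswith(prefix.lower()):
                continue
            if param_type and not any(
                p.type_name.lower() == param_type.lower() for p in method.parameters
            ):
                continue
            result.append(method)
        return result


form = ClassDecl("TMainForm", [
    Method("LoadFile", [Parameter("AFileName", "string")]),
    Method("LoadStream", [Parameter("AStream", "TStream")]),
    Method("Close"),
])
print([m.name for m in form.find_methods(prefix="Load", param_type="TStream")])
```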
I also ran it across some of my code that failed to parse. I'll try and narrow down a minimal failing case later this week and see what I can find.
Cheers
Malcolm
from delphiast.
I don't have much time to work on it at the moment (trying to finish FB8), but I'll try to put some ideas into words and post them here for comment. I'd hate to see multiple people all working on the same thing, which then becomes a merge nightmare. I guess it would be good to define some goals for the project: what are we trying to achieve, what types of usage are envisaged, etc.
from delphiast.
It is hard to define specific goals... How about "to produce a fully qualified abstract syntax tree"? Maybe you have more specific ideas?
I think we could start with making the tree representation typed.
from delphiast.
Before Christmas I finished a proposal called LDEF, a language definition intermediate format, better known as an AST.
The purpose of LDEF was to provide first and foremost a binary, portable format suitable for source-to-source compilation and analysis: being able to extract not just information about classes and their members, but also about per-line operations (read: the actual code).
LDEF is more or less a bog-standard in-memory class hierarchy capable of representing most languages and their features, such as Object Pascal, C# and to some extent C++ (no support for multiple ancestors).
My personal interest is source-to-source compilation, so naturally I am biased towards that. But the information gathering is exactly the same as that required for analysis of code and/or "IDE plugins".
LDEF was planned for the coming spring, but since DelphiAST already does most of what I wanted, I think I may save a lot of time by building on that instead.
One thing that would be extremely cool would be to use lex/yacc with this. That way you would not just be able to parse Object Pascal, but also C# and all the other languages, and produce the same compatible AST. With a codegen capable of reading the XML you would in fact have a 1-to-many compiler chain.
Although it would be useless without an RTL.
from delphiast.
My vision about this project is a two (or three) phase parsing:
The first phase is tokenizing: separate keywords, operators, literals, identifiers, comments, white space etc. - and syntax errors (such as unterminated string literals). This produces a linear list of tokens, with line and col position, token kind and user-defined data. This list is useful (and should contain enough information) for a syntax highlighter, for example. The program should be able to export this list separately. Concatenating the tokens gives back the original source code.
The second phase works on the list above and builds a syntax tree from the tokens, revealing their structure. This phase should keep the difference between `Beep;` and `begin Beep; end;`, for example. Similarly, all compiler directives and semantic errors (such as undeclared identifiers) are kept at this level. This tree is useful for a source-code reformatter. This level should be (and more or less is) the AST. Traversing this tree should give back a syntactically equivalent version of the original source code.
The third phase would be a semantic tree. In this level interface and implementation can be merged, current compiler directives are applied and structures are expressed in language (Delphi)-independent concepts as much as possible. This level is similar to the LDEF proposed by @quartexNOR. This tree is useful for an interpreter or compiler (source-to-source, or source-to-machine code). Traversing this tree should give back a semantically equivalent version of the original source code.
Put it together, and you have Delphi...
I think all these levels (or a mix of them) are present in the Delphi environment; it would be great (and more accurate) to extract it somehow from there.
For an efficient compiler and IDE these levels are mixed and a lot of language-specific assumptions are built in, but for our project these three should be kept separate, I think.
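The first phase described above can be sketched roughly as follows. This is a Python illustration, not DelphiAST's lexer: the token kinds, the partial keyword set and the simplified rules (no `(* *)` comments, compiler directives treated as plain brace comments, no hex or character literals) are all assumptions. What it does show is the lossless property: every character ends up in some token, so concatenating the token texts reproduces the source, and an unterminated string literal is reported as its own token kind.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    line: int  # 1-based
    col: int   # 1-based


# Deliberately partial keyword list, for illustration only.
KEYWORDS = {"begin", "end", "var", "if", "then", "else", "for", "to", "do"}

# Every alternative consumes at least one character and "unknown" matches
# anything, so the matches cover the whole source without gaps.
TOKEN_RE = re.compile(r"""
    (?P<comment>\{[^}]*\}|//[^\n]*)
  | (?P<string>'(?:[^'\n]|'')*')
  | (?P<bad_string>'(?:[^'\n]|'')*)
  | (?P<space>\s+)
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<operator>:=|<=|>=|<>|[-+*/=<>;:.,()\[\]^@])
  | (?P<unknown>.)
""", re.VERBOSE | re.DOTALL)


def tokenize(source: str) -> List[Token]:
    tokens = []
    line, col = 1, 1
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "ident" and text.lower() in KEYWORDS:
            kind = "keyword"
        tokens.append(Token(kind, text, line, col))
        # Advance the position past this token.
        newlines = text.count("\n")
        if newlines:
            line += newlines
            col = len(text) - text.rfind("\n")
        else:
            col += len(text)
    return tokens


source = "begin\n  Beep; { ring }\nend;"
tokens = tokenize(source)
# Lossless: joining the token texts gives back the original source.
print("".join(t.text for t in tokens) == source)
```

A syntax highlighter could consume this list directly, while the tree builder of the second phase would skip `space` and `comment` tokens or attach them to neighbouring nodes.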
from delphiast.
My task is to migrate our company's projects (hundreds of units) from Delphi to C#... :X
It seems to be mission impossible (https://www.linkedin.com/groups/How-migrate-Delphi-application-C-40949.S.252441991), but I think that having an AST (or LDEF) is half the way to an automated conversion - I hope it can do most of the typing of something that looks like C#.
To start with a simpler task (see my fork at https://github.com/bkisdi/DelphiAST), I wanted to restore the original Delphi unit from the AST as far as possible. I'm not done with this yet, but I already see a number of issues with missing information. What do you think about fixing these?
The first issue I faced is keeping comments. (This coincides with issue #39 by @LaKraven.) I know, comments are "around" the syntax tree, not in it. More precisely, the syntax tree contains comments, but the semantic tree doesn't. (See my comment above.) The current AST is a strange mix of these, with a lot of anomalies. (Well, it is the closest one to a good solution, anyway.)
The ASSIGN node, for example, contains LHS and RHS subnodes, which are semantic, and those in turn contain IDENTIFIER and EXPRESSION subnodes, which are the actual syntactic items. But the FOR, WHILE etc. nodes lack these levels.
I'd like to see a complete list of the XML nodes produced by DelphiAST.
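Until such a list is documented, one pragmatic approximation is to collect the node names that actually occur in exported XML. The Python sketch below does that for a single document. The sample fragment is hypothetical: only the element names ASSIGN, LHS, RHS, IDENTIFIER and EXPRESSION come from this discussion, while the `STATEMENTS` wrapper and the `name` attribute are assumptions. Note that this yields the nodes seen in the given input, not everything the tree builder is able to produce.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment; the wrapper element and attributes are assumed.
SAMPLE = """
<STATEMENTS>
  <ASSIGN>
    <LHS><IDENTIFIER name="Total"/></LHS>
    <RHS><EXPRESSION><IDENTIFIER name="Count"/></EXPRESSION></RHS>
  </ASSIGN>
</STATEMENTS>
"""


def node_inventory(xml_text):
    """Count how often each element name occurs in the document."""
    root = ET.fromstring(xml_text.strip())
    # iter() walks the root and all of its descendants in document order.
    return Counter(elem.tag for elem in root.iter())


for tag, count in sorted(node_inventory(SAMPLE).items()):
    print(tag, count)
```

Running this over the XML of many units and merging the counters would give an empirical inventory of node names.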
Some actual coding problems:
- `to`, `downto` or `in` is missing from the FOR node. Unfortunately `TmwSimplePasPar.ForStatement` can't access `FStack.Peek` in `TPasSyntaxTreeBuilder.ForStatement`.
- The `kind` attribute is missing from the METHOD node in the interface section.
- `private`, `protected`, `public`, `published` etc. should be added to the METHOD node as an attribute.
- Please use unambiguous node names - it helps with navigating the AST. For example, UNIT was used in two different senses - I had to change one of them to USEDUNIT. TYPE and CALL are other examples of ambiguous nodes. I think the constants in `DelphiAST.Consts` should be converted to an enumeration.
- `TmwBasePasLex.BorProc;` and part of `TmwBasePasLex.BraceOpenProc;` seem to do the same work; similarly `TmwBasePasLex.AnsiProc;` and `TmwBasePasLex.RoundOpenProc;`, but the former of each pair is never executed (in my examples).
- I don't see the role of the `TmwPasCodeInfo` enumeration. It seems to be unused.
from delphiast.
@bkisdi once the Symbol Table is ready (to resolve an identifier back to its original definition between units), DelphiAST would be suitable for use in making a language-to-language conversion tool. It would, of course, be up to you to handle the casing for generating your output syntax, but all of the necessary information should be there to do just that.
I've been busy with other work the last few weeks, which is the reason why the Symbol Table isn't already completed... but (fortunately) my assignment for this weekend REQUIRES a working Symbol Table in DelphiAST, so as you can imagine I'll have to complete it in order to do the work the client is paying me for!
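For readers wondering what resolving an identifier between units involves, here is a very small Python sketch. It is not the DelphiAST Symbol Table: the class, its methods and the sample units are all hypothetical. It models only unit-level declarations and one Delphi scoping rule: a unit's own declarations take precedence, and among used units the one listed last in the `uses` clause wins.

```python
class SymbolTable:
    """Hypothetical sketch: unit-level declarations only, no nested scopes."""

    def __init__(self):
        # lower-cased unit name -> (original name, used units, declarations)
        self._units = {}

    def add_unit(self, name, uses, declarations):
        self._units[name.lower()] = (
            name,
            [u.lower() for u in uses],
            # Delphi identifiers are case-insensitive, so store lower-cased keys.
            {ident.lower(): kind for ident, kind in declarations.items()},
        )

    def resolve(self, unit, identifier):
        """Return (declaring unit, kind) or None if the identifier is unknown."""
        own_name, uses, decls = self._units[unit.lower()]
        key = identifier.lower()
        if key in decls:
            return own_name, decls[key]
        # Search used units from last to first: the later unit wins.
        for used in reversed(uses):
            entry = self._units.get(used)
            if entry and key in entry[2]:
                return entry[0], entry[2][key]
        return None


table = SymbolTable()
table.add_unit("Shapes", [], {"TPoint": "record", "Draw": "procedure"})
table.add_unit("Geometry", [], {"TPoint": "class"})
table.add_unit("Main", ["Shapes", "Geometry"], {"Run": "procedure"})

print(table.resolve("Main", "TPoint"))
print(table.resolve("Main", "draw"))
```

A conversion tool would need far more than this (nested scopes, overloads, qualified names such as `Shapes.TPoint`), but the lookup order is the part that cannot be recovered from a single unit's AST alone.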
from delphiast.
@bkisdi you are right, unfortunately some information is missing from AST. It will be better to open separate issues for them.
I'll take a look at ForStatement implementation. I think I can fix this soon.
from delphiast.
bkisdi: The biggest problem with converting to a particular language and framework is that most Delphi applications rely heavily on the VCL. Some tidbits of the VCL can be easily ported/recognized, such as ShowMessage() etc., which have their equivalents in the .NET Framework, but take something like database access and custom DB objects (ORM) and it will be harder.
But Object Pascal as a language is more than capable of being ported to C# or something else.
from delphiast.