Giter Club home page Giter Club logo

Comments (10)

RomanYankovsky avatar RomanYankovsky commented on June 12, 2024

My first intention was to write some code that will produce a single unit's AST as an XML. It gives me an opportunity to use standard XML tools (e.g. XPath) to do some simple static analysis. For me XPath is the best way to query the tree. With XML and XPath I can easily ask questions like "give me a list of variables which start with 'abc'".

This project was not intended to be in open source, but now it is.

That's why class structure rather primitive (only TSyntaxNode class). Though more complex AST with typed AST nodes, visitor support and so forth is a good direction for future development.

Error recovery is my pain too. But I still have no time to deal with it. If somebody wish to enhance the parser in order to make it able to recover from errors, it would be great.

At this moment the main weakness is the processing of unit's interface section (data types declarations and so forth) and project/package files (I was paying the most attention to unit). Parser is in good shape but syntax tree builder still ignores many details. So in near future I plan to concentrate on filling this gap. Any contributions are appreciated. We need to have complete AST tree before we could start thinking about a symbol table.

Your idea about a common parser for all IDE experts sounds promising.
Maybe one day it will become a reality.

from delphiast.

malcolmgroves avatar malcolmgroves commented on June 12, 2024

HI Vincent and Roman,

I was also thinking along the lines of a more strongly typed representation. I would be interested in helping out with it if needed. I have this dream of being able to for..in across all functions in a class, querying for matching names, parameter types, etc.

I also ran it across some of my code that failed to parse. I'll try and narrow down a minimal failing case later this week and see what I can find.

Cheers
Malcolm

from delphiast.

vincentparrett avatar vincentparrett commented on June 12, 2024

I don't have much time to work on it at the moment (trying to finish FB8) but I'll try and put some ideas to words and post here for comment. I'd hate to see multiple people all working on the same thing which then becomes a merge nightmare. I guess it would be good to define some goals for the project, what are we trying to achieve, what types of usage are envisaged etc.

from delphiast.

RomanYankovsky avatar RomanYankovsky commented on June 12, 2024

It is hard to define specific goals... How about "to produce a fully qualified abstract syntax tree"? Maybe you have more specific ideas?

I think we could start with making the tree representation typed.

from delphiast.

quartexNOR avatar quartexNOR commented on June 12, 2024

Before XMAS i finished the proposal called LDEF, a language definition intermediate format, better known as an AST.
The purpose of LDEF was to provide first and foremost a binary, portable format suitable for source to source compilation, analysis; being able to extract not just information about classes and their members, but also about per-line operations (read: the actual code).

LDEF is more or less a bog standard in-memory class hiearchy capable of representing most languages and their features, such as object pascal, C# and to some extent C++ (no support for multiple ancestors).

My personal interest is source-to-source compilation, so naturally I am biased towards that. But the information gathering is exactly the same as that required for analysis of code and/or "IDE plugins".

LDEF was planned for the coming spring, but since DelphiAST already does most of what i wanted, I think I may save a lot of time by building on that instead.

One thing that would be extremely cool would be to use lex/yacc with this. That way you would not just be able to parse object pascal, but also C# and all the other languages and produce the same compatible AST. With a codegen capable of reading the XML you would in fact have a 1-to-many compiler chain.

Although it would be useless without an RTL

from delphiast.

bkisdi avatar bkisdi commented on June 12, 2024

My vision about this project is a two (or three) phase parsing:

The first phase is tokenizing: separate keywords, operators, literals, identifyers, comments, white space etc. - and syntax errors (such as unterminated string literals). This produces a linear list of tokens, with line and col position, token kind and user-defined data. This list is useful (and should contain enough information) for a syntax-highlighter, for example. The program should be able to export this list separately. Concatenating the tokens gives back the original source code.

The second phase works on the list above, and builds a syntax tree from the tokens, revealing their structure. This phase should keep the difference between Beep; and begin Beep; end;, for example. Similarly, all compiler directives and semantic errors (such as undeclared identifyers) are kept in this level. This tree is useful for a source-code reformatter. This level should be (and more or less is) the AST. Traversing this tree should give back a syntactically equivalent version of the original source code.

The third phase would be a semantic tree. In this level interface and implementation can be merged, current compiler directives are applied and structures are expressed in language (Delphi)-independent concepts as much as possible. This level is similar to the LDEF proposed by @quartexNOR. This tree is useful for an interpreter or compiler (source-to-source, or source-to-machine code). Traversing this tree should give back a semantically equivalent version of the original source code.

Put it together, and you have Delphi...
I think all these levels (or a mix of them) are present in the Delphi environment; it would be great (and more accurate) to extract it somehow from there.
For an efficient compiler and IDE these levels are mixed and a lot of language-specific assumptions are built into, but for our project these three should be separated, I think.

from delphiast.

bkisdi avatar bkisdi commented on June 12, 2024

My task is to migrate our company's projects (hundreds of units) from Delphi to C#... :X

It seems to be a mission impossible (https://www.linkedin.com/groups/How-migrate-Delphi-application-C-40949.S.252441991), but I think that having an AST (or LDEF) is half of the way to an automated conversion - I hope it can do most of the typing of something that looks C#.

To start with a simpler task (see my fork at https://github.com/bkisdi/DelphiAST), I wanted to restore the original Delphi unit from the AST as much as possible. I'm not ready with this, but now I see a number of issues with missing information. What do you think about fixing these?

The first issue I faced with is: keeping comments. (Coincidence with issue #39 of @LaKraven.) I know, comments are "around" the syntax tree, not in it. More precisely, the syntax tree contains comments, but the semantic tree doesn't. (See my comment above.) The current AST is a strange mix of these, with a lot of anomalies. (Well, it is the closest one to a good solution, anyway.)

The ASSIGN node, for example, contains LHS and RHS subnodes, which are semantic, and those in turn contain IDENTIFIER and EXPRESSION subnodes, which are the actual syntactic items. But the FOR, WHILE etc. nodes lack these levels.

I'd like to see a complete list of the XML nodes produced by DelphiAST.

Some actual coding problems:

to, downto or in is missing from the FOR node. Unfortunately TmwSimplePasPar.ForStatement can't access FStack.Peek in TPasSyntaxTreeBuilder.ForStatement.

The kind attribute is missing from the METHOD node in the interface section.

private, protected, public, published etc. should be added to the METHOD node as attribute.

Please use unambiguous node names - it helps navigating in the AST. For example, UNIT was used in two different senses - I had to change one of them to USEDUNIT. TYPE and CALL are another examples of ambiguous nodes. I think the constants in DelphiAST.Consts should be converted to an enumeration.

TmwBasePasLex.BorProc; and part of TmwBasePasLex.BraceOpenProc; seems to do the same work; similarly TmwBasePasLex.AnsiProc; and TmwBasePasLex.RoundOpenProc;, but the formers are never executed (in my examples).

I don't see the role of the TmwPasCodeInfo enumeration. It seems to be unused.

from delphiast.

 avatar commented on June 12, 2024

@bkisdi once the Symbol Table is ready (to resolve an identifier back to its original definition between units), DelphiAST would be suitable for use in making a language-to-language conversion tool. It would, of course, be up to you to handle the casing for generating your output syntax, but all of the necessary information should be there to do just that.

I've been busy with other work the last few weeks, which is the reason why the Symbol Table isn't already completed... but (fortunately) my assignment for this weekend REQUIRES a working Symbol Table in DelphiAST, so as you can imagine I'll have to complete it in order to do the work the client is paying me for!

from delphiast.

RomanYankovsky avatar RomanYankovsky commented on June 12, 2024

@bkisdi you are right, unfortunately some information is missing from AST. It will be better to open separate issues for them.

I'll take a look at ForStatement implementation. I think I can fix this soon.

from delphiast.

quartexNOR avatar quartexNOR commented on June 12, 2024

bkisdi: The biggest problem with converting to a particular language and framework, is that most delphi applications relies heavily on the VCL. Some tidbits of the VCL can be easily ported/recognized, such as showmessage() etc.. which have their equivilents in the .net framework, but take something like database access and custom DB objects (ORM) and it will be harder.

But object pascal as a language is more than capable of being ported to C# or something else.

from delphiast.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.