delphiast's People

Contributors

darnocian, gabr42, kjpowerworld, luebbe, masonwheeler, romanyankovsky, sglienke, uweraabe, vincentparrett, vintagedave, vladislavasomov, wosi, xqzmeplz

delphiast's Issues

EParserException at array property parameter with out or var

Using out or var for a parameter in an array property leads to EParserException.

Sample

type
  IAccessibleDisp = dispinterface
    ['{5DF0DE2C-0FC1-4600-9C74-74B2171EC74D}']
    property ArrayPropConst[const AParam: Integer]: Integer readonly dispid -5009;//works
    property ArrayPropOut[out AParam: Integer]: Integer readonly dispid -5009;
    property ArrayPropVar[var AParam: Integer]: Integer readonly dispid -5009;
  end;

Found while checking $(BDS)\source\rtl\win\Winapi.oleacc.pas

FOR statement "kind" (TO / DOWNTO / IN) not parsed by DelphiAST

The AST builder drops an important piece of information about a FOR statement. A FOR statement can be:

FOR identifier := startvalue TO endvalue ...
FOR identifier := startvalue DOWNTO endvalue ...
FOR identifier IN enumerable ...

The base class TmwSimplePasPar parses these tokens (TO / DOWNTO / IN) but simply skips them.
(TmwSimplePasPar.ForStatement calls NextToken for the ptTo and ptDownTo tokens and Expected for the ptIn token.)

Suggested enhancement:

  • 'Clone' TmwSimplePasPar.ForStatement code to DelphiAST TPasSyntaxTreeBuilder class
  • Change NextToken calls for ptTo, ptDownTo tokens and Expected(ptIn) for ptIn token to:
FStack.Push(Lexer.Token);
try
  Expected(ptTo); // ptDownTo, ptIn
finally
  FStack.Pop;
end;

This solution generates a separate child node of the FOR statement (between the expression nodes) that carries the for-kind information as the child node's local name (TO / DOWNTO / IN).
(Alternatively, this information could be stored as an attribute of the FOR statement node.)

Address compiler warnings/hints

For the sake of "best practise", I've addressed all of the hints and warnings raised within the Castalia parser.

These are fixed in my fork here: https://github.com/LaKraven/DelphiAST

No other changes have been made beyond appropriately resolving these hints/warnings (no "hide and forget" tactics used there)

Improve complex $IF and $ELSEIF directives handling

Partial support for the $IF and $ELSEIF compiler directives was introduced previously; this work was continued by @LaKraven in #20.

The current implementation is enough in many cases, but the $IF and $ELSEIF directives still do not support any expression other than "DEFINED" or "NOT DEFINED".

For instance:

{$IF SOME_CONSTANT >= SOME_INTEGER}
{$IF SOME_CONSTANT > SOME_INTEGER}
{$IF SOME_CONSTANT = SOME_INTEGER}
{$IF SOME_CONSTANT < SOME_INTEGER}
{$IF SOME_CONSTANT <= SOME_INTEGER}

Brackets are not supported either:

{$IF DEFINED(A) AND (DEFINED(B) OR DEFINED(C))}
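
To make the bracket case concrete, here is a minimal sketch of a recursive-descent evaluator for expressions built from DEFINED(x), NOT, AND, OR and brackets. It is not DelphiAST code: EvaluateIfExpr and everything inside it are hypothetical names, and the comparison forms listed above (>=, >, =, <, <=) are deliberately left out because they need constant values the lexer does not have.

```pascal
program IfDirectiveSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper, not part of DelphiAST. Evaluates an expression built
// from DEFINED(x), NOT, AND, OR and brackets against a list of defines.
function EvaluateIfExpr(const AText: string; ADefines: TStrings): Boolean;
var
  Tokens: TArray<string>;
  Cur: Integer;
  Spaced: string;

  // Consumes the current token if it matches AToken (case-insensitive).
  function Accept(const AToken: string): Boolean;
  begin
    Result := (Cur < Length(Tokens)) and SameText(Tokens[Cur], AToken);
    if Result then
      Inc(Cur);
  end;

  procedure Expect(const AToken: string);
  begin
    if not Accept(AToken) then
      raise Exception.CreateFmt('"%s" expected at token %d', [AToken, Cur]);
  end;

  function ParseOr: Boolean; forward;

  // factor = NOT factor | DEFINED ( name ) | ( expression )
  function ParseFactor: Boolean;
  begin
    if Accept('NOT') then
      Result := not ParseFactor()
    else if Accept('DEFINED') then
    begin
      Expect('(');
      Result := (Cur < Length(Tokens)) and (ADefines.IndexOf(Tokens[Cur]) >= 0);
      Inc(Cur);
      Expect(')');
    end
    else
    begin
      Expect('(');
      Result := ParseOr;
      Expect(')');
    end;
  end;

  function ParseAnd: Boolean;
  var
    Right: Boolean;
  begin
    Result := ParseFactor;
    while Accept('AND') do
    begin
      // Parse the operand first so short-circuit evaluation cannot skip it.
      Right := ParseFactor;
      Result := Result and Right;
    end;
  end;

  function ParseOr: Boolean;
  var
    Right: Boolean;
  begin
    Result := ParseAnd;
    while Accept('OR') do
    begin
      Right := ParseAnd;
      Result := Result or Right;
    end;
  end;

begin
  // Put spaces around brackets so that a plain split yields the tokens.
  Spaced := StringReplace(AText, '(', ' ( ', [rfReplaceAll]);
  Spaced := StringReplace(Spaced, ')', ' ) ', [rfReplaceAll]);
  Tokens := Spaced.Split([' '], TStringSplitOptions.ExcludeEmpty);
  Cur := 0;
  Result := ParseOr;
end;

var
  Defines: TStringList;
begin
  Defines := TStringList.Create;
  try
    Defines.Add('A');
    Defines.Add('C');
    // A and C are defined, B is not, so this evaluates to True.
    Writeln(EvaluateIfExpr('DEFINED(A) AND (DEFINED(B) OR DEFINED(C))', Defines));
    // NOT binds tighter than OR, so this evaluates to False.
    Writeln(EvaluateIfExpr('NOT DEFINED(A) OR DEFINED(B)', Defines));
  finally
    Defines.Free;
  end;
end.
```

The precedence follows Delphi's own rules (NOT, then AND, then OR). In the real lexer the list of defines would come from whatever AddDefine has collected; the TStrings parameter is only an assumption for this sketch.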

Type name can be in either a node attribute or a child node - should be one or the other

When parsing a 'type' section, the AST records the name of a type differently depending on whether the type is a class or anything else (a record, etc.).

For classes, the name of the type is an attribute 'name'.

For records, the name of the type is in a child node with node name 'type'.

So, for example, parsing code:
type
TFoo = class
end;

TBar = record
end;

TAbc = Integer;

The syntax node for the first (TFoo) will have an attribute 'name' with value 'TFoo'. The second and third (TBar, TAbc) will instead have child nodes with node name 'type', whose name attribute is 'TBar' or 'TAbc'.

I am happy to fix this but am not sure which alternative is better. (Personally, I lean towards the first behaviour, having the type name as an attribute for the typedecl node. A whole new child node seems overkill to me, unless it may contain other information in future.)

Implement new TSyntaxTreeWriter.ToXML

We don't have to use TXMLDocument to convert a simple tree of classes to an XML string. This can be done by writing directly into, say, a TStringStream, which should be much faster than the current implementation and could be a valuable performance enhancement.
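
As a rough illustration of the idea, the sketch below serializes a small tree by appending to a TStringBuilder instead of building a DOM. TDemoNode is a hypothetical stand-in, not DelphiAST's TSyntaxNode, and WriteNode and XmlEscape are names made up for this example; TStringBuilder is used here in place of the TStringStream mentioned above.

```pascal
program XmlWriterSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

type
  // Hypothetical stand-in for a syntax tree node; DelphiAST's real
  // TSyntaxNode has a different interface.
  TDemoNode = class
  public
    Name: string;
    Attributes: TStringList;            // name=value pairs, insertion order kept
    Children: TObjectList<TDemoNode>;   // owns the child nodes
    constructor Create(const AName: string);
    destructor Destroy; override;
  end;

constructor TDemoNode.Create(const AName: string);
begin
  inherited Create;
  Name := AName;
  Attributes := TStringList.Create;
  Children := TObjectList<TDemoNode>.Create(True);
end;

destructor TDemoNode.Destroy;
begin
  Children.Free;
  Attributes.Free;
  inherited;
end;

// Escapes the characters that are not allowed inside an attribute value.
function XmlEscape(const S: string): string;
begin
  Result := StringReplace(S, '&', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  Result := StringReplace(Result, '>', '&gt;', [rfReplaceAll]);
  Result := StringReplace(Result, '"', '&quot;', [rfReplaceAll]);
end;

// Appends ANode and, recursively, its children to ABuilder.
procedure WriteNode(ABuilder: TStringBuilder; ANode: TDemoNode; AIndent: Integer);
var
  I: Integer;
  Child: TDemoNode;
begin
  ABuilder.Append(StringOfChar(' ', AIndent)).Append('<').Append(ANode.Name);
  for I := 0 to ANode.Attributes.Count - 1 do
    ABuilder.Append(' ').Append(ANode.Attributes.Names[I]).Append('="')
      .Append(XmlEscape(ANode.Attributes.ValueFromIndex[I])).Append('"');
  if ANode.Children.Count = 0 then
    ABuilder.AppendLine('/>')
  else
  begin
    ABuilder.AppendLine('>');
    for Child in ANode.Children do
      WriteNode(ABuilder, Child, AIndent + 2);
    ABuilder.Append(StringOfChar(' ', AIndent)).Append('</').Append(ANode.Name).AppendLine('>');
  end;
end;

var
  Root, Child: TDemoNode;
  Builder: TStringBuilder;
begin
  Root := TDemoNode.Create('UNIT');
  Builder := TStringBuilder.Create;
  try
    Root.Attributes.Values['name'] := 'Test';
    Child := TDemoNode.Create('INTERFACE');
    Root.Children.Add(Child);
    WriteNode(Builder, Root, 0);
    Write(Builder.ToString);
  finally
    Builder.Free;
    Root.Free;
  end;
end.
```

Whether this is actually faster than the TXMLDocument route would need to be measured on real units; the sketch only shows that the conversion itself needs nothing beyond string appends.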

Multidimensional array-constant's expression list not separated by dimensions

In a multidimensional array-constant definition, e.g.:

const
  SAMPLE: array[1..2, 2..3] of String = ( ( 'a', 'b' ), ( 'c', 'd' ) );

the expression list generated for the RHS value is flat; the dimension boundaries are lost.
(The AST contains EXPR('a') - EXPR('b') - EXPR('c') - EXPR('d').)

Suggested solution:
Override the ArrayConstant method in the TPasSyntaxTreeBuilder class with the following code:

procedure TPasSyntaxTreeBuilder.ArrayConstant;
begin
  FStack.Push(sROUNDOPEN);
  try
    inherited;
  finally
    FStack.Pop;
  end;
end;

which will insert ROUNDOPEN tags at the dimension boundaries. The generated AST nodes will be:
RO - RO - EXPR('a') - EXPR('b') - RO/ - RO - EXPR('c') - EXPR('d') - RO/ - RO/

Two remarks:

  • ROUNDOPEN is just one acceptable tag; it could just as well be an 'EXPRESSIONS' tag (or something else).
  • DelphiAST parses the dimension definitions of the constant correctly, and the boundaries of the expression list could be derived from them. However, I am writing a 'Delphi to JavaScript' compiler on top of DelphiAST, and the dimension expressions can be complicated (any kind of constant expression, whose value cannot be evaluated while the AST is being built). So I think the best way is to add the expression list boundaries to the AST explicitly.

Move handling of Defines from Lexer to Parser

At present, the Castalia Lexer is taking responsibility for dealing with Defines... which means that tokens are only produced for source for which any conditional Defines evaluate as True.

This is incorrect behavior, as the Lexer should generate tokens for the ENTIRE source, and elimination of Define-specific tokens should be handled at the Parser level.

Understandably, this is a non-trivial change to make... however, one possible use-case would be the ability to warn a developer when their source consumes an identifier that is limited by one or more Defines (where the consuming source is not limited by suitable Defines itself).

Parser fails at attribute in interface directly after guid

The following code will fail with error: 'End' expected found '['

unit Unit1;

interface

type
IFoo = interface
['{F208B04E-2612-4E64-8728-1A94A1D19C27}']
[Test]
procedure Test;
end;

implementation

end.

P.S. It also fails for 2 attributes following each other.

Ability to include and exclude specific constructs in AST

It would be useful to provide the implementing developer with the ability to specify what constructs of the source to include within the Syntax Tree.

This could be done as a Set, so the implementing developer can choose to, for example, exclude the complete STATEMENTS section from each method implementation from the Syntax Tree, or to exclude global variables/constants etc.

This way, if a tool using the AST is only interested in a particular area of the source, they can omit the rest and make the subsequent scraping/transformation operation(s) of their implementing tool quicker.
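
A minimal sketch of what such a set could look like follows. All type and value names (TAstConstruct, acStatements and so on) are hypothetical; DelphiAST does not define them, and where the builder would consult the set is only indicated by a comment.

```pascal
program IncludeSetSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical names; DelphiAST does not define these types.
  TAstConstruct = (acStatements, acGlobalVariables, acConstants, acTypes, acMethods);
  TAstConstructs = set of TAstConstruct;

const
  AllConstructs = [Low(TAstConstruct)..High(TAstConstruct)];

// A tree builder would call this before pushing a node for the construct
// and skip the node (and its children) when the result is False.
function ShouldInclude(AConstruct: TAstConstruct; const AIncluded: TAstConstructs): Boolean;
begin
  Result := AConstruct in AIncluded;
end;

var
  Included: TAstConstructs;
begin
  // Keep everything except method bodies and global variables.
  Included := AllConstructs - [acStatements, acGlobalVariables];
  Writeln('statements kept: ', ShouldInclude(acStatements, Included));
  Writeln('types kept: ', ShouldInclude(acTypes, Included));
end.
```

Defaulting to the full set would keep existing callers unaffected, since they would get the same tree as today unless they opt out of something.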

Wish: Unit namespaces

Wish: Recognize unit namespaces

unit Test.Foo;
interface
uses Bar.Unit1;
implementation
end.

Foo and Unit1 aren't recognized:

<?xml version="1.0"?>
<UNIT line="0" col="5" name="Test">
  <INTERFACE line="1" col="0">
    <USES line="2" col="0">
      <UNIT line="2" col="5" name="Bar"/>
    </USES>
  </INTERFACE>
  <IMPLEMENTATION line="3" col="0"/>
</UNIT>

Type nodes are duplicated

Example:

Stats := TDictionary<string, Integer>.Create;

Output:

        <ASSIGN line="14" col="1">
          <LHS>
            <IDENTIFIER line="14" col="1" name="Stats"/>
          </LHS>
          <RHS>
            <EXPRESSION line="14" col="10">
              <GENERIC>
                <IDENTIFIER line="14" col="10" name="TDictionary"/>
                <DOT>
                  <TYPEARGS line="14" col="22">
                    <TYPE line="14" col="22" name="string"/>
                    <TYPE line="14" col="22" name="string"/>
                    <TYPE line="14" col="30" name="Integer"/>
                    <TYPE line="14" col="30" name="Integer"/>
                  </TYPEARGS>
                  <IDENTIFIER line="14" col="39" name="Create"/>
                </DOT>
              </GENERIC>
            </EXPRESSION>
          </RHS>
        </ASSIGN>

Delphi string literals with '#' prefixed character parts - erroneous parsing

For example the Delphi statement (which is valid in Delphi):

const
  SAMPLE = 'apple#20'#13#10;

generates the RHS expression in AST as:

<LITERAL type="string" value="apple#20#13#10"/>

which loses the original (intended) string value.
The intended value, "apple#20" + Char(13) + Char(10), is parsed as the single text "apple#20#13#10".

I saw a remark //TODO support ptAsciiChar in TPasSyntaxTreeBuilder.StringConstSimple, maybe it can be handled here...
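
For reference, this is roughly the decoding that the literal's source text implies. The sketch is standalone and does not touch the DelphiAST lexer; DecodeStringConstant is a hypothetical helper name, and it assumes it receives the complete source text of the constant (quoted parts and #-parts concatenated, without spaces).

```pascal
program StringConstSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: turns the source text of a Delphi string constant,
// e.g. 'apple#20'#13#10, into the value it denotes.
function DecodeStringConstant(const ASource: string): string;
var
  I, Start: Integer;
begin
  Result := '';
  I := 1;
  while I <= Length(ASource) do
    if ASource[I] = '''' then
    begin
      // Quoted part: copy up to the closing quote; '' is an escaped quote.
      Inc(I);
      while I <= Length(ASource) do
      begin
        if ASource[I] = '''' then
        begin
          Inc(I);
          if (I > Length(ASource)) or (ASource[I] <> '''') then
            Break;
        end;
        Result := Result + ASource[I];
        Inc(I);
      end;
    end
    else if ASource[I] = '#' then
    begin
      // Control part: # followed by a decimal or $-prefixed hex code.
      Start := I + 1;
      I := Start;
      while (I <= Length(ASource)) and
        CharInSet(ASource[I], ['$', '0'..'9', 'A'..'F', 'a'..'f']) do
        Inc(I);
      Result := Result + Chr(StrToInt(Copy(ASource, Start, I - Start)));
    end
    else
      raise Exception.CreateFmt('Unexpected character at position %d', [I]);
end;

var
  Value: string;
begin
  // The Delphi literal below spells the source text 'apple#20'#13#10.
  Value := DecodeStringConstant('''apple#20''#13#10');
  // Prints 10: the eight characters of apple#20 plus CR and LF.
  Writeln(Length(Value));
end.
```

Note that the #20 inside the quotes stays literal text, while the #13 and #10 outside the quotes become single characters; that distinction is exactly what the flattened value attribute loses.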

EParserException at record array constant with function in expression

A function with parameters within an expression as value for a record field in an array constant leads to EParserException.

Sample

const
  FilerSignatures: array [0..0] of TBinarySignature = (
   (BinarySignature: Func(1) + 1;
    SignatureLength: 1));

"Func(1) + 1" -> exception
"Func + 1" -> no exception
"Func(1)" -> no exception

Found while checking $(BDS)\source\rtl\common\System.Classes.pas

Type arguments are missing in VARIABLES section

Example:

var
  Stats: TDictionary<string, Integer>;

Output:

      <VARIABLES line="13" col="1">
        <VARIABLE>
          <NAME line="14" col="3" value="Stats"/>
          <TYPE line="14" col="10" name="TDictionary"/>
        </VARIABLE>
      </VARIABLES>

EParserException at subrange type using scoped enums

Subrange type definitions using scoped enums lead to EParserException.

Sample

type
  TAlignment = (taLeftJustify, taRightJustify, taCenter);
  TLeftRight = TAlignment.taLeftJustify..TAlignment.taRightJustify;

Found while checking $(BDS)\source\rtl\common\System.Classes.pas

Better dialect management

Introduce parser/lexer compiler-switches for finer control of language features.
First and foremost, this concerns language features that counteract platform and compiler independence.

Being able to easily adjust the parser to support "Delphi 7" dialect, or "Delphi XE" dialect -- and also turn off pointer support, ASM support and so on - broadens the uses of the AST profoundly.

-Pointers enable/disable
-ASM section enable/disable
-external library references enable/disable
-symbol export (export keyword) enable/disable
-WinAPI messages (message keyword and mapping)
-Generics enable/disable
-Records enable/disable

Also introduce switches for the allowed value types:

-Integer
-int64
-Boolean
-Currency
-ShortString
-Variant

-... and all other intrinsic data types

The purpose of such customization is to make it easier to use the AST as a drop-in module for existing code generators. Smart Mobile Studio, for instance, presently uses DWScript as its primary parser/lexer/tokenizer and then generates JavaScript from the AST produced by DWS.
The downside is that any change to the codegen affects the existing DWS dialect, and any change to DWScript affects Smart Pascal.

DelphiAST could in many ways replace the use of DWS, allowing for a much richer translation using XML transformation. A draft for this was actually written by me (Jon Lennart Aasenden) for the quartex pascal IDE (quartexpascal.wordpress.com).

Since DelphiAST is more or less 90% compatible with my own architecture for a portable, source to source based transformation format, it makes more sense to improve DelphiAST than writing yet another module.

If DelphiAST could introduce finer control of the dialect, where we could toggle support for various language features (pointers and so on), it would be more than capable of becoming the de-facto AST generator for my projects.

As of writing: JavaScript, C# and C++ are my primary planned targets. Although JavaScript is already covered by Smart Mobile Studio, and it's so efficient that no new compilers are required.
C#, on the other hand, is very interesting, since it would give us access to Mono and its frameworks.

Missing data from "compound" constants/variables

Parse the following using the DelphiAST demo project:

const
    BOOLEAN_NAMES: Array[Boolean] of String = ('True', 'False');

You'll get the following:

<?xml version="1.0"?>
<UNIT line="0" col="0">
  <CONSTANTS line="3" col="0">
    <CONSTANT>
      <NAME line="2" col="1" value="BOOLEAN_NAMES"/>
      <TYPE compound="true" line="2" col="11" name="ArrayOf">
        <TYPE line="2" col="29" name="String"/>
      </TYPE>
      <VALUE line="2" col="38">
        <LITERAL line="2" col="45" type="string" value="True"/>
        <LITERAL line="2" col="54" type="string" value="False"/>
      </VALUE>
    </CONSTANT>
  </CONSTANTS>
</UNIT>

Notice that the array's index type (Boolean) is not included anywhere in the "compound" TYPE node.

This is crucial information, so it needs to be implemented.

Attributes are not recognized

Attributes of classes, fields or properties are not recognized.
A file with this code

unit PlainOldData;

interface

uses
  Classes;

type
  InfoAttribute = class(TCustomAttribute)

  end;

  [Info]
  TPlainOldData = class(TPersistent)
  private
    [Info]
    FName: string;
  public
    [Info]
    property Name: string read FName write FName;
  end;

implementation

end.

generates this tree

<?xml version="1.0"?>
<UNIT line="1" col="1" name="PlainOldData">
  <INTERFACE line="3" col="1">
    <USES line="5" col="1">
      <UNIT line="6" col="3" name="Classes"/>
    </USES>
    <TYPESECTION line="8" col="1">
      <TYPEDECL line="9" col="3" name="InfoAttribute" type="class">
        <TYPE line="9" col="25" name="TCustomAttribute"/>
      </TYPEDECL>
      <TYPEDECL line="14" col="3" name="TPlainOldData" type="class">
        <TYPE line="14" col="25" name="TPersistent"/>
        <PRIVATE line="15" col="3" Visibility="True">
          <FIELD>
            <NAME line="17" col="5" value="FName"/>
            <TYPE line="17" col="12" name="string"/>
          </FIELD>
        </PRIVATE>
        <PUBLIC line="18" col="3" Visibility="True">
          <PROPERTY line="20" col="5" name="Name">
            <TYPE line="20" col="20" name="string">
              <TYPE line="20" col="20" name="string"/>
            </TYPE>
            <READ line="20" col="27">
              <IDENTIFIER line="20" col="32" name="FName"/>
            </READ>
            <WRITE line="20" col="38">
              <IDENTIFIER line="20" col="44" name="FName"/>
            </WRITE>
          </PROPERTY>
        </PUBLIC>
      </TYPEDECL>
    </TYPESECTION>
  </INTERFACE>
  <IMPLEMENTATION line="23" col="1"/>
</UNIT>

The definition of InfoAttribute does exist, but its usage does not.

Bad expression tree, when INDEXED and DOT operators mixed in an expression

If an expression contains INDEXED and DOT operators mixed e.g.:

a := SomeObject.SomeIndexedProperty[a].SomeSubproperty[b].Items[c].Name;

the parsed expression tree is wrong: the nodes are in the wrong order, so the original expression cannot be restored from it.

I think the cause is wrong operator priority in OperatorsInfo constant, in the DelphiAST.Classes unit.

Suggested fix ([o] : original, [*]: change to):

[o] OperatorsInfo: array [0..27] of TOperatorInfo =
...
[o] (Name: sDOT; Priority: 1; Kind: okBinary; AssocType: atRight),
[*] (Name: sDOT; Priority: 2; Kind: okBinary; AssocType: atRight), // Change Priority to 2
...
[o] (Name: sINDEXED; Priority: 4; Kind: okUnary;  AssocType: atLeft),
[*] (Name: sINDEXED; Priority: 1; Kind: okUnary;  AssocType: atLeft), // Change Priority to 1
...

FPC/Lazarus support

Most of the code should be FPC and Lazarus compatible, but dot-notation in filenames fails, also some specialization might be required with respect to generics to make it compile.
Could be very valuable for cross-platform work and being able to use FPC on linux/mac.

What are the aims of this project?

Hi All

A Delphi parser that produces a usable AST is long overdue and I'm excited to see the beginnings of one.. but I have to ask, what are the aims of or plans for this project?

I ask because the design of the parser will very much influence what is achievable. I know there are not exactly a lot of open source (or closed source/commercial) parsers out there, so it's understandable that this is based on the SimpleParser. The problem I see with this parser is that it has zero error recovery, it bails out on the first error it encounters. To make this useful for scenarios such as codeinsight, code analysis, refactoring etc, it needs to be able to recover from syntax errors (ideally without skipping too much code) and continue parsing. The lexer & parser are also a little too intertwined, for example the lexer parses compiler directives rather than the parser.

What sort of contributions are you happy to accept from the community? I'd very much like to see a more complex AST with typed AST nodes, visitor support, etc., error recovery in the parser, and perhaps a symbol table.

Not a small task though... I guess the question is where to start?

I'm working on an IDE plugin for DUnitX (to enable running tests from within the IDE), and of course, I need a parser.

Here's some food for thought.

There are a bunch of IDE plugins that make use of syntax parsers (most likely a derivative of the SimpleParser), so if, for example, you have GExperts, CnPack, Castalia, etc. all installed, your code is getting parsed a lot on top of the IDE's own (somewhat broken) code insight parser. What if there were an IDE plugin that provided the parser as a service, which other IDE plugins could then use? Instead of multiple parsers from all the plugins, there would be one. The parser plugin would be loaded as an expert and have an API so that other plugins can subscribe to be notified when the AST changes, and an API to force a reparse. Ideally the parse would happen in a background thread and build its own symbol table (it could then replace Error Insight!).

Fix a bug in the parser (or add new language features) and all the plugins benefit. Imagine the possibilities!
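
To make the subscribe/reparse idea a little more tangible, here is a very small sketch of what the surface of such a service might look like. Every name in it (TParserService, TAstChangedEvent, Subscribe, Reparse) is hypothetical; nothing like this exists in the Tools API or in DelphiAST, and the actual parsing and threading are left out.

```pascal
program ParserServiceSketch;

{$APPTYPE CONSOLE}

uses
  System.Generics.Collections;

type
  // All names are hypothetical; no such API exists in the IDE or DelphiAST.
  TAstChangedEvent = reference to procedure(const AFileName: string);

  TParserService = class
  private
    FSubscribers: TList<TAstChangedEvent>;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Subscribe(const AHandler: TAstChangedEvent);
    procedure Reparse(const AFileName: string);
  end;

constructor TParserService.Create;
begin
  inherited Create;
  FSubscribers := TList<TAstChangedEvent>.Create;
end;

destructor TParserService.Destroy;
begin
  FSubscribers.Free;
  inherited;
end;

procedure TParserService.Subscribe(const AHandler: TAstChangedEvent);
begin
  FSubscribers.Add(AHandler);
end;

procedure TParserService.Reparse(const AFileName: string);
var
  Handler: TAstChangedEvent;
begin
  // A real service would parse the file here, ideally on a background
  // thread, and hand the new tree to the subscribers.
  for Handler in FSubscribers do
    Handler(AFileName);
end;

var
  Service: TParserService;
begin
  Service := TParserService.Create;
  try
    Service.Subscribe(
      procedure(const AFileName: string)
      begin
        Writeln('Plugin A: tree changed for ', AFileName);
      end);
    Service.Subscribe(
      procedure(const AFileName: string)
      begin
        Writeln('Plugin B: tree changed for ', AFileName);
      end);
    Service.Reparse('Unit1.pas');
  finally
    Service.Free;
  end;
end.
```

The point of the sketch is only the shape: one parse, many consumers. How the tree itself is shared safely between plugins is the hard part and is not addressed here.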

Ideally, this would be something that Embarcadero would have provided as part of the tools api, but we've been asking for this for years and I'm not holding my breath.

Interested to read people's thoughts on this.

EParserException at boolean expressions in const section

Boolean expressions in the const section lead to EParserException.

Sample

const
  C1 = 1 <> 2;
  C2 = 1 in [1..4];
  C3 = (1 <> 2) <> (2 <> 3);
  C4 = (1 <> 2) <> (2 <> 3) <> (2 <> 3);

Found it while checking
https://svn.code.sf.net/p/radstudioverins/code/trunk/svn/svn_client.pas

It can be fixed by changing SimpleExpression in TmwSimplePasPar.ConstantExpression into Expression, but that would be a breaking change regarding the tree for the constant values. (I don't care that it is a breaking change.)

Parser loops forever on .dpr file (On line 0, if I understand correctly)

If the parser is not meant to handle .dpr files, it should fail spectacularly ;)

In the same way, it should fail on wrong file types.

This also happens with a unit that has a comment at the beginning, and with an empty, IDE-generated Unit1.pas.

// Code

var
  LTreeBuilder: TPasSyntaxTreeBuilder;
  LData: TMemoryStream;
  LSyntaxTree: TSyntaxNode;
begin
  LSyntaxTree := nil; // keeps the Free in the finally block safe if Run raises
  LTreeBuilder := TPasSyntaxTreeBuilder.Create;
  LData := TMemoryStream.Create;
  try
    LData.LoadFromFile(EditFileName.Text);
    LData.Position := 0;

    LTreeBuilder.AddDefine('MSWINDOWS');
    LTreeBuilder.AddDefine('WIN32');

    LSyntaxTree := LTreeBuilder.Run(LData);
  finally
    LTreeBuilder.Free;
    LSyntaxTree.Free;
    LData.Free;
  end;
end;

Method parameters default values not parsed

If a method has a parameter with a default value e.g.:

procedure SomeMethod(a: Integer = 1);

the default value expression is not parsed by DelphiAST.

Suggested enhancement ([+] : added, [o] : original):

function TPasSyntaxTreeBuilder.FormalParameterList;
[o] var
...
[+] ParamExpr: TSyntaxNode;
...
[o] ParamKind := ParamList.GetAttribute('kind');
[+] ParamExpr := ParamList.FindNode(sEXPRESSION);
...
[o] if Assigned(TypeInfo) then
[o]   FStack.AddChild(TypeInfo.Clone);
[o]
[+] if Assigned(ParamExpr) then
[+]   FStack.AddChild(ParamExpr.Clone);
...

DelphiAST failed parsing external directive

Example:

function SHMessageBoxCheckW(wnd: HWND; pszText: LPCTSTR; pszTitle: LPCTSTR; uType: UINT; iDefault: Integer; pszRegVal: LPCTSTR): Integer; stdcall; external 'shlwapi.dll' index 191 delayed;

EParserException at type helper calls for keyword types

Calling a method of the string type helper (TStringHelper) on the string type itself leads to EParserException. I guess this issue exists for all types that are keywords or have their own kind in TptTokenKind.

Sample

procedure Test;
begin
  string.LowerCase('X');
end;

Found while checking $(BDS)\source\rtl\common\System.Internal.ICU.pas
