romanyankovsky / delphiast
Abstract syntax tree builder for Delphi
License: Mozilla Public License 2.0
Using out or var for a parameter in an array property leads to EParserException.
Sample
type
IAccessibleDisp = dispinterface
['{5DF0DE2C-0FC1-4600-9C74-74B2171EC74D}']
property ArrayPropConst[const AParam: Integer]: Integer readonly dispid -5009;//works
property ArrayPropOut[out AParam: Integer]: Integer readonly dispid -5009;
property ArrayPropVar[var AParam: Integer]: Integer readonly dispid -5009;
end;
Found while checking $(BDS)\source\rtl\win\Winapi.oleacc.pas
The AST builder drops an important piece of information about a FOR statement. A FOR statement can be:
FOR identifier := startvalue TO endvalue ...
FOR identifier := startvalue DOWNTO endvalue ...
FOR identifier IN enumerable ...
The base class TmwSimplePasPar parses these tokens (TO / DOWNTO / IN) but simply skips them.
(TmwSimplePasPar.ForStatement calls NextToken for the ptTo and ptDownTo tokens and Expected for the ptIn token.)
Suggested enhancement:
FStack.Push(Lexer.Token);
try
Expected(ptTo); // ptDownTo, ptIn
finally
FStack.Pop;
end;
This generates a separate child node for the FOR statement (between the expression nodes) whose node name carries the for-kind (TO / DOWNTO / IN).
(Alternatively, this information could be stored as an attribute of the FOR statement node.)
Example:
constructor Create(typeInfo: PTypeInfo;
const interceptor: TMockInterceptor; const proxy: TValue) overload;
Note: semicolon is missing before overload keyword.
For the sake of "best practise", I've addressed all of the hints and warnings raised within the Castalia parser.
These are fixed in my fork here: https://github.com/LaKraven/DelphiAST
No other changes have been made beyond appropriately resolving these hints/warnings (no "hide and forget" tactics used there)
Tried with XE6 and XE7.
'Begin' expected, found 'end'
Partial support for the $IF and $ELSEIF compiler directives was introduced previously; this work was continued by @LaKraven in #20.
The current implementation is enough in many cases, but the $IF and $ELSEIF directives still do not support any expression other than "DEFINED" or "NOT DEFINED".
For instance:
{$IF SOME_CONSTANT >= SOME_INTEGER}
{$IF SOME_CONSTANT > SOME_INTEGER}
{$IF SOME_CONSTANT = SOME_INTEGER}
{$IF SOME_CONSTANT < SOME_INTEGER}
{$IF SOME_CONSTANT <= SOME_INTEGER}
Parentheses are not supported either:
{$IF DEFINED(A) AND (DEFINED(B) OR DEFINED(C))}
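One possible way to handle boolean combinations with parentheses is a small recursive-descent evaluator over the directive text. The following is only a sketch under stated assumptions: EvalIf and its nested routines are hypothetical names, not DelphiAST or Castalia code, and the sketch covers DEFINED(X), NOT, AND, OR and parentheses only (no constant comparisons such as >=).

```pascal
program IfDirectiveSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

// Hypothetical helper, not DelphiAST API. Evaluates the text after "$IF".
// Handles DEFINED(X), NOT, AND, OR and parentheses; no comparisons.
function EvalIf(const AText: string; ADefines: TStrings): Boolean;
var
  P: Integer; // current 1-based position in AText
  procedure SkipSpaces;
  begin
    while (P <= Length(AText)) and (AText[P] = ' ') do
      Inc(P);
  end;
  procedure Expect(AChar: Char);
  begin
    SkipSpaces;
    if (P > Length(AText)) or (AText[P] <> AChar) then
      raise Exception.Create('Unexpected input at position ' + IntToStr(P));
    Inc(P);
  end;
  function NextWord: string;
  var
    Start: Integer;
  begin
    SkipSpaces;
    Start := P;
    while (P <= Length(AText)) and
      CharInSet(AText[P], ['A'..'Z', 'a'..'z', '0'..'9', '_']) do
      Inc(P);
    Result := Copy(AText, Start, P - Start);
  end;
  // Consumes AWord if it is the next identifier; otherwise rewinds.
  function TryKeyword(const AWord: string): Boolean;
  var
    Saved: Integer;
  begin
    Saved := P;
    Result := SameText(NextWord(), AWord);
    if not Result then
      P := Saved;
  end;
  function ParseOr: Boolean; forward;
  function ParseFactor: Boolean;
  begin
    if TryKeyword('NOT') then
      Result := not ParseFactor()
    else if TryKeyword('DEFINED') then
    begin
      Expect('(');
      Result := ADefines.IndexOf(NextWord()) >= 0;
      Expect(')');
    end
    else
    begin
      Expect('(');
      Result := ParseOr();
      Expect(')');
    end;
  end;
  function ParseAnd: Boolean;
  begin
    Result := ParseFactor();
    while TryKeyword('AND') do
      Result := ParseFactor() and Result; // operand is parsed first
  end;
  function ParseOr: Boolean;
  begin
    Result := ParseAnd();
    while TryKeyword('OR') do
      Result := ParseAnd() or Result; // operand is parsed first
  end;
begin
  P := 1;
  Result := ParseOr();
  SkipSpaces;
  if P <= Length(AText) then
    raise Exception.Create('Unexpected input at position ' + IntToStr(P));
end;

var
  Defines: TStringList;
begin
  Defines := TStringList.Create;
  try
    Defines.Add('A');
    Defines.Add('C');
    // True: A is defined, and C satisfies the parenthesized OR.
    WriteLn(EvalIf('DEFINED(A) AND (DEFINED(B) OR DEFINED(C))', Defines));
    // False: C is defined, so NOT DEFINED(C) fails.
    WriteLn(EvalIf('DEFINED(A) AND NOT DEFINED(C)', Defines));
  finally
    Defines.Free;
  end;
end.
```

The grammar gives NOT the highest precedence, then AND, then OR, which matches Delphi's operator precedence. Supporting comparisons would need an additional level that resolves constant values, which this sketch deliberately leaves out.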
When parsing a 'type' section, the AST records the name of a type differently depending on whether the type is a class or anything else (record etc.).
For classes, the name of the type is an attribute 'name'.
For records, the name of the type is in a child node with node name 'type'.
So, for example, parsing code:
type
TFoo = class
end;
TBar = record
end;
TAbc = Integer;
The syntax node for the first (TFoo) will have an attribute 'name' with value 'TFoo'. The second and third (TBar, TAbc) will have child nodes with node name 'type' and a name attribute of 'TBar' or 'TAbc'.
I am happy to fix this but am not sure which alternative is better. (Personally, I lean towards the first behaviour, having the type name as an attribute for the typedecl node. A whole new child node seems overkill to me, unless it may contain other information in future.)
DelphiAST is able to process the {$INCLUDE ...} directive, but it does not have any handler for it at the moment.
Example:
With FBinary / 1 Do ;
(Note: it is an overloaded operator)
Using concatenated names in the exports clause leads to EParserException.
Sample
exports
foo name 'prefix' + 'foo',
bar name _prefix + 'bar';
Found while checking $(BDS)\source\rtl\sys\System.Internal.MachExceptions.pas
We don't have to use TXMLDocument to convert a simple tree of classes to an XML string. This can be done by writing into, say, a TStringStream, which would be much faster than the current implementation and could be a valuable performance enhancement.
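A minimal sketch of that idea follows. It assumes a stand-in node class (TDemoNode, hypothetical; DelphiAST's own TSyntaxNode has a different interface) and writes the tree recursively into a TStringStream, escaping attribute values by hand. WriteNode, NodeToXml and XmlEscape are illustrative names as well.

```pascal
program XmlWriterSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

type
  // Stand-in node class for this sketch; DelphiAST's TSyntaxNode differs.
  TDemoNode = class
  public
    Name: string;
    Attributes: TStringList; // stored as name=value pairs
    Children: array of TDemoNode;
    constructor Create(const AName: string);
    destructor Destroy; override;
    function AddChild(const AName: string): TDemoNode;
  end;

constructor TDemoNode.Create(const AName: string);
begin
  inherited Create;
  Name := AName;
  Attributes := TStringList.Create;
end;

destructor TDemoNode.Destroy;
var
  I: Integer;
begin
  for I := 0 to High(Children) do
    Children[I].Free;
  Attributes.Free;
  inherited;
end;

function TDemoNode.AddChild(const AName: string): TDemoNode;
begin
  Result := TDemoNode.Create(AName);
  SetLength(Children, Length(Children) + 1);
  Children[High(Children)] := Result;
end;

// Escapes the characters that are not allowed inside an attribute value.
function XmlEscape(const S: string): string;
begin
  Result := StringReplace(S, '&', '&amp;', [rfReplaceAll]); // must come first
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  Result := StringReplace(Result, '>', '&gt;', [rfReplaceAll]);
  Result := StringReplace(Result, '"', '&quot;', [rfReplaceAll]);
end;

// Writes ANode and its subtree as indented XML text into AOut.
procedure WriteNode(ANode: TDemoNode; AOut: TStringStream; const AIndent: string);
var
  I: Integer;
begin
  AOut.WriteString(AIndent + '<' + ANode.Name);
  for I := 0 to ANode.Attributes.Count - 1 do
    AOut.WriteString(' ' + ANode.Attributes.Names[I] + '="' +
      XmlEscape(ANode.Attributes.ValueFromIndex[I]) + '"');
  if Length(ANode.Children) = 0 then
    AOut.WriteString('/>' + sLineBreak)
  else
  begin
    AOut.WriteString('>' + sLineBreak);
    for I := 0 to High(ANode.Children) do
      WriteNode(ANode.Children[I], AOut, AIndent + '  ');
    AOut.WriteString(AIndent + '</' + ANode.Name + '>' + sLineBreak);
  end;
end;

function NodeToXml(ANode: TDemoNode): string;
var
  Stream: TStringStream;
begin
  Stream := TStringStream.Create('');
  try
    WriteNode(ANode, Stream, '');
    Result := Stream.DataString;
  finally
    Stream.Free;
  end;
end;

var
  Root: TDemoNode;
begin
  Root := TDemoNode.Create('UNIT');
  try
    Root.Attributes.Values['name'] := 'Test';
    Root.AddChild('INTERFACE').AddChild('USES');
    Write(NodeToXml(Root));
  finally
    Root.Free;
  end;
end.
```

Whether this is actually faster than the TXMLDocument route would need to be measured on real units; the sketch only shows that no DOM is required to produce the text.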
At a multidimensional array-constant definition e.g.:
const
SAMPLE: array[1..2, 2..3] of String = ( ( 'a', 'b' ), ( 'c', 'd' ) );
the generated expression list for the RHS value is flat: the boundaries between dimensions are lost.
(EXPR('a') - EXPR('b') - EXPR('c') - EXPR('d') is generated in the AST)
Suggested solution:
Override the ArrayConstant method in the TPasSyntaxTreeBuilder class with the following code:
procedure TPasSyntaxTreeBuilder.ArrayConstant;
begin
FStack.Push(sROUNDOPEN);
try
inherited;
finally
FStack.Pop;
end;
end;
which will insert ROUNDOPEN tags at the dimension boundaries. The generated AST nodes will be:
RO - RO - EXPR('a') - EXPR('b') - RO/ - RO - EXPR('c') - EXPR('d') - RO/ - RO/
Two remarks:
Related to #38, it would be useful to include code comments within the Syntax Tree.
Example:
(* $IFDEF UNICODE *)
Result := True;
(* $ELSE *)
Result := False;
(* $ENDIF *)
At present, the Castalia Lexer is taking responsibility for dealing with Defines... which means that tokens are only produced for source for which any conditional Defines evaluate as True.
This is incorrect behavior, as the Lexer should generate tokens for the ENTIRE source, and elimination of Define-specific tokens should be handled at the Parser level.
Understandably, this is a non-trivial change to make... however, one possible use-case would be the ability to warn a developer when their source consumes an identifier that is limited by one or more Defines (where the consuming source is not limited by suitable Defines itself).
A forward declaration with overload directive leads to EParserException
Sample
unit TestUnit;
interface
implementation
procedure Test; forward; overload;
procedure Test(AParam: Integer); overload;
begin
end;
procedure Test;
begin
end;
end.
The following code will fail with error: 'End' expected found '['
unit Unit1;
interface
type
IFoo = interface
['{F208B04E-2612-4E64-8728-1A94A1D19C27}']
[Test]
procedure Test;
end;
implementation
end.
P.S. It also fails for 2 attributes following each other.
It would be useful to provide the implementing developer with the ability to specify what constructs of the source to include within the Syntax Tree.
This could be done as a Set, so the implementing developer can choose to, for example, exclude the complete STATEMENTS section from each method implementation from the Syntax Tree, or to exclude global variables/constants etc.
This way, if a tool using the AST is only interested in a particular area of the source, they can omit the rest and make the subsequent scraping/transformation operation(s) of their implementing tool quicker.
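As a rough illustration of the Set-based option, the sketch below uses an enumeration of tree sections and a set type built from it. All names here (TTreeSection, TTreeSections, WantsSection) are hypothetical and do not exist in DelphiAST; a real builder would have to consult such a set before creating nodes.

```pascal
program TreeOptionsSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical names; DelphiAST does not define these.
  TTreeSection = (tsUses, tsTypes, tsConstants, tsVariables, tsStatements);
  TTreeSections = set of TTreeSection;

const
  AllSections = [Low(TTreeSection)..High(TTreeSection)];

// A builder could call this before creating a node that belongs to the
// given section, and skip the node when the result is False.
function WantsSection(ASection: TTreeSection; AIncluded: TTreeSections): Boolean;
begin
  Result := ASection in AIncluded;
end;

var
  Included: TTreeSections;
begin
  // Keep everything except method bodies and global variables.
  Included := AllSections - [tsStatements, tsVariables];
  WriteLn(WantsSection(tsTypes, Included));      // type declarations are kept
  WriteLn(WantsSection(tsStatements, Included)); // statements are left out
end.
```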
Example:
property aProperty
[ aIndex: Integer;
doCreate: Boolean = False] : aType;
I've added support in my fork (https://github.com/LaKraven/DelphiAST) for the default defines of newer Delphi versions and of other platforms (Android, iOS, MacOSX, Win64) in TmwBasePasLex.InitDefines (SimpleParser.Lexer.pas).
These should be merged over into the official repository.
Wish: Recognize unit namespaces
unit Test.Foo;
interface
uses Bar.Unit1;
implementation
end.
Foo and Unit1 aren't recognized:
<?xml version="1.0"?>
<UNIT line="0" col="5" name="Test">
<INTERFACE line="1" col="0">
<USES line="2" col="0">
<UNIT line="2" col="5" name="Bar"/>
</USES>
</INTERFACE>
<IMPLEMENTATION line="3" col="0"/>
</UNIT>
Example:
function CreateIndex( { Create, and add an index }
const [REF] IdxDesc : DSIDXDesc
): DBResult; stdcall;
Example:
Stats := TDictionary<string, Integer>.Create;
Output:
<ASSIGN line="14" col="1">
<LHS>
<IDENTIFIER line="14" col="1" name="Stats"/>
</LHS>
<RHS>
<EXPRESSION line="14" col="10">
<GENERIC>
<IDENTIFIER line="14" col="10" name="TDictionary"/>
<DOT>
<TYPEARGS line="14" col="22">
<TYPE line="14" col="22" name="string"/>
<TYPE line="14" col="22" name="string"/>
<TYPE line="14" col="30" name="Integer"/>
<TYPE line="14" col="30" name="Integer"/>
</TYPEARGS>
<IDENTIFIER line="14" col="39" name="Create"/>
</DOT>
</GENERIC>
</EXPRESSION>
</RHS>
</ASSIGN>
Example:
const
cHexChars: Array[0..15] of Byte = (Ord('0') .. Ord('9'), Ord('A')..Ord('F'));
For example the Delphi statement (which is valid in Delphi):
const
SAMPLE = 'apple#20'#13#10;
generates the RHS expression in AST as:
<LITERAL type="string" value="apple#20#13#10"/>
which loses the original (intended) string value.
Original meaning: "apple#20" + Char(13) + Char(10) is parsed as "apple#20#13#10"
I saw a remark //TODO support ptAsciiChar in TPasSyntaxTreeBuilder.StringConstSimple, maybe it can be handled here...
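To make the intended value concrete, here is a small sketch that decodes the source text of such a constant into its runtime value, treating quoted parts literally and #nn parts as control characters. DecodeStringConst is a hypothetical helper written for this illustration; it is not how TPasSyntaxTreeBuilder.StringConstSimple works today.

```pascal
program StringConstSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper (not DelphiAST code): turns the source text of a
// Delphi string constant such as 'apple#20'#13#10 into its runtime value.
function DecodeStringConst(const ASource: string): string;
var
  P, Start: Integer;
begin
  Result := '';
  P := 1;
  while P <= Length(ASource) do
  begin
    if ASource[P] = '''' then
    begin
      Inc(P); // skip the opening quote
      while P <= Length(ASource) do
      begin
        if ASource[P] = '''' then
        begin
          Inc(P);
          // A doubled quote inside a literal stands for one quote.
          if (P <= Length(ASource)) and (ASource[P] = '''') then
            Result := Result + ''''
          else
            Break; // that was the closing quote
        end
        else
          Result := Result + ASource[P]; // '#' inside quotes stays literal
        Inc(P);
      end;
    end
    else if ASource[P] = '#' then
    begin
      Inc(P);
      Start := P;
      while (P <= Length(ASource)) and
        CharInSet(ASource[P], ['$', '0'..'9', 'A'..'F', 'a'..'f']) do
        Inc(P);
      // StrToInt accepts decimal digits as well as $-prefixed hex digits.
      Result := Result + Chr(StrToInt(Copy(ASource, Start, P - Start)));
    end
    else
      raise Exception.Create('Unexpected character at position ' + IntToStr(P));
  end;
end;

var
  Value: string;
begin
  Value := DecodeStringConst('''apple#20''#13#10');
  WriteLn(Length(Value)); // 10: eight literal characters plus CR and LF
end.
```

An alternative that keeps the tree lossless would be to store the quoted and #nn parts as separate child nodes instead of one merged value.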
Example:
function CreateJobObjectA; external Kernel32 name 'CreateJobObjectA';
A function with parameters within an expression as value for a record field in an array constant leads to EParserException.
Sample
const
FilerSignatures: array [0..0] of TBinarySignature = (
(BinarySignature: Func(1) + 1;
SignatureLength: 1));
"Func(1) + 1" -> exception
"Func + 1" -> no exception
"Func(1)" -> no exception
Found while checking $(BDS)\source\rtl\common\System.Classes.pas
Example:
var
Stats := TDictionary<string, Integer>.Create;
Output:
<VARIABLES line="13" col="1">
<VARIABLE>
<NAME line="14" col="3" value="Stats"/>
<TYPE line="14" col="10" name="TDictionary"/>
</VARIABLE>
</VARIABLES>
Subrange type definitions using scoped enums lead to EParserException.
Sample
type
TAlignment = (taLeftJustify, taRightJustify, taCenter);
TLeftRight = TAlignment.taLeftJustify..TAlignment.taRightJustify;
Found while checking $(BDS)\source\rtl\common\System.Classes.pas
Introduce parser/lexer compiler-switches for finer control of language features.
First and foremost, language features that counteract platform and compiler independence.
Being able to easily adjust the parser to support the "Delphi 7" dialect or the "Delphi XE" dialect, and to turn off pointer support, ASM support and so on, broadens the uses of the AST profoundly.
-Pointers enable/disable
-ASM section enable/disable
-external library references enable/disable
-symbol export (export keyword) enable/disable
-WinAPI messages (message keyword and mapping)
-Generics enable/disable
-Records enable/disable
Also Introduce switches for allowed value-types:
-Integer
-int64
-Boolean
-Currency
-ShortString
-Variant
The purpose of such customization is to make it easier to use the AST as a drop-in module for existing code generators. Smart Mobile Studio, for instance, presently uses DWScript as its primary parser/lexer/tokenizer and then generates JavaScript from the AST produced by DWS.
This however has the downside of: any changes to the codegen affects the existing DWS dialect, and any changes to DWScript affects Smart Pascal.
DelphiAST could in many ways replace the use of DWS, allowing for a much richer translation using XML transformation. A draft for this was actually written by me (Jon Lennart Aasenden) for the quartex pascal IDE (quartexpascal.wordpress.com).
Since DelphiAST is more or less 90% compatible with my own architecture for a portable, source to source based transformation format, it makes more sense to improve DelphiAST than writing yet another module.
If DelphiAST could introduce finer control of the dialect, where we could toggle support for various language features (pointers and so on), it would be more than capable of becoming the de-facto AST generator for my projects.
At the time of writing, JavaScript, C# and C++ are my primary planned targets, although JavaScript is already covered by Smart Mobile Studio, which is so efficient that no new compilers are required.
C#, on the other hand, is very interesting, since it would give us access to Mono and its frameworks.
Parse the following using the DelphiAST demo project:
const
BOOLEAN_NAMES: Array[Boolean] of String = ('True', 'False');
You'll get the following:
<?xml version="1.0"?>
<UNIT line="0" col="0">
<CONSTANTS line="3" col="0">
<CONSTANT>
<NAME line="2" col="1" value="BOOLEAN_NAMES"/>
<TYPE compound="true" line="2" col="11" name="ArrayOf">
<TYPE line="2" col="29" name="String"/>
</TYPE>
<VALUE line="2" col="38">
<LITERAL line="2" col="45" type="string" value="True"/>
<LITERAL line="2" col="54" type="string" value="False"/>
</VALUE>
</CONSTANT>
</CONSTANTS>
</UNIT>
Notice that the array's index type (Boolean) is not included anywhere in the "compound" TYPE node.
This is crucial information, so needs to be implemented.
TGenericsCast < T, TT >= class(TObject)
/// <summary> Cast T type to TT type .
/// </summary>
class function Cast(const T: T): TT;
end;
https://code.google.com/p/delphi-detours-library/source/browse/trunk/GenericsCast.pas
Attributes of classes, fields or properties are not recognized.
A file with this code
unit PlainOldData;
interface
uses
Classes;
type
InfoAttribute = class(TCustomAttribute)
end;
[Info]
TPlainOldData = class(TPersistent)
private
[Info]
FName: string;
public
[Info]
property Name: string read FName write FName;
end;
implementation
end.
generates this tree
<?xml version="1.0"?>
<UNIT line="1" col="1" name="PlainOldData">
<INTERFACE line="3" col="1">
<USES line="5" col="1">
<UNIT line="6" col="3" name="Classes"/>
</USES>
<TYPESECTION line="8" col="1">
<TYPEDECL line="9" col="3" name="InfoAttribute" type="class">
<TYPE line="9" col="25" name="TCustomAttribute"/>
</TYPEDECL>
<TYPEDECL line="14" col="3" name="TPlainOldData" type="class">
<TYPE line="14" col="25" name="TPersistent"/>
<PRIVATE line="15" col="3" Visibility="True">
<FIELD>
<NAME line="17" col="5" value="FName"/>
<TYPE line="17" col="12" name="string"/>
</FIELD>
</PRIVATE>
<PUBLIC line="18" col="3" Visibility="True">
<PROPERTY line="20" col="5" name="Name">
<TYPE line="20" col="20" name="string">
<TYPE line="20" col="20" name="string"/>
</TYPE>
<READ line="20" col="27">
<IDENTIFIER line="20" col="32" name="FName"/>
</READ>
<WRITE line="20" col="38">
<IDENTIFIER line="20" col="44" name="FName"/>
</WRITE>
</PROPERTY>
</PUBLIC>
</TYPEDECL>
</TYPESECTION>
</INTERFACE>
<IMPLEMENTATION line="23" col="1"/>
</UNIT>
The definition of InfoAttribute does exist, but its usage does not.
Parser error at "register"
procedure FillDWord(var Destination; Count: Integer; Value: DWord ); assembler; register;
Example:
var
[unsafe] Obj: TObject;
TTreeData class instances are leaking.
type
INT16 = type Smallint;
If an expression contains INDEXED and DOT operators mixed e.g.:
a := SomeObject.SomeIndexedProperty[a].SomeSubproperty[b].Items[c].Name;
the parsed expression tree is wrong: the nodes are in the wrong order, so the original expression cannot be restored from it.
I think the cause is wrong operator priority in OperatorsInfo constant, in the DelphiAST.Classes unit.
Suggested fix ([o] : original, [*]: change to):
[o] OperatorsInfo: array [0..27] of TOperatorInfo =
...
[o] (Name: sDOT; Priority: 1; Kind: okBinary; AssocType: atRight),
[*] (Name: sDOT; Priority: 2; Kind: okBinary; AssocType: atRight), // Change Priority to 2
...
[o] (Name: sINDEXED; Priority: 4; Kind: okUnary; AssocType: atLeft),
[*] (Name: sINDEXED; Priority: 1; Kind: okUnary; AssocType: atLeft), // Change Priority to 1
...
Example:
TEventHandler = TMulticastMethod<procedure (ASender: TObject; E: TObject) of object>;
DelphiAST works fine with simple compiler directives like {$IFDEF A} or {$IF DEFINED(A)}, but does not support complex compiler directives like {$IF DEFINED(A) AND NOT DEFINED(B)}.
Most of the code should be compatible with FPC and Lazarus, but dot notation in file names fails, and some specialization with respect to generics might be required to make it compile.
Could be very valuable for cross-platform work and being able to use FPC on linux/mac.
Example:
jpeg_marker_parser_method = function(cinfo : j_decompress_ptr) : LongBool; {$IFDEF LINUX} cdecl; {$ENDIF}
jpeg_saved_marker_ptr = ^jpeg_marker_struct;
Hi All
A Delphi parser that produces a usable AST is long overdue and I'm excited to see the beginnings of one.. but I have to ask, what are the aims of or plans for this project?
I ask because the design of the parser will very much influence what is achievable. I know there are not exactly a lot of open source (or closed source/commercial) parsers out there, so it's understandable that this is based on the SimpleParser. The problem I see with this parser is that it has zero error recovery, it bails out on the first error it encounters. To make this useful for scenarios such as codeinsight, code analysis, refactoring etc, it needs to be able to recover from syntax errors (ideally without skipping too much code) and continue parsing. The lexer & parser are also a little too intertwined, for example the lexer parses compiler directives rather than the parser.
What sort of contributions are you happy to accept from the community? I'd very much like to see a more complex AST with typed AST nodes, visitor support etc., error recovery in the parser, and perhaps a symbol table.
Not a small task though... I guess the question is where to start?
I'm working on an IDE plugin for DUnitX (to enable running tests from within the IDE), and of course, I need a parser.
Here's some food for thought.
There are a bunch of IDE plugins that make use of syntax parsers (most likely a derivative of the SimpleParser), so if you have, for example, GExperts, CNPack, Castalia etc. all installed, your code is getting parsed a lot, on top of the IDE's own (somewhat broken) code insight parser. What if there were an IDE plugin that provided the parser as a service, which other IDE plugins could then use? Instead of multiple parsers from all the plugins, there would be one. The parser plugin would be loaded as an expert and have an API so that other plugins can subscribe to be notified when the AST changes, and an API to force a reparse. Ideally the parse would happen in a background thread and build its own symbol table (which could then replace Error Insight!).
Fix a bug in the parser (or add new language features) and all the plugins benefit. Imagine the possibilities!
Ideally, this would be something that Embarcadero would have provided as part of the tools api, but we've been asking for this for years and I'm not holding my breath.
Interested to read people's thoughts on this.
Boolean expressions in the const section lead to EParserException.
Sample
const
C1 = 1 <> 2;
C2 = 1 in [1..4];
C3 = (1 <> 2) <> (2 <> 3);
C4 = (1 <> 2) <> (2 <> 3) <> (2 <> 3);
Found it while checking
https://svn.code.sf.net/p/radstudioverins/code/trunk/svn/svn_client.pas
It can be fixed by changing SimpleExpression in TmwSimplePasPar.ConstantExpression into Expression, but that would be a breaking change regarding the tree for the constant values. (I don't care that it is a breaking change.)
Example:
TRttiMethodHelper = class helper(Spring.TRttiMethodHelper) for TRttiMethod
Example:
var
Foo: String = 'Bar';
DelphiAST will only provide attributes for the Identifier and its Type, but not the Value.
If it is not meant to parse a .dpr file, it should fail spectacularly ;)
In the same way, it should fail on wrong file types.
This also happens with a unit that has a comment at the beginning, and with an empty IDE-generated Unit1.pas.
// Code
var
LTreeBuilder : TPasSyntaxTreeBuilder;
LData : TMemoryStream;
LSyntaxTree : TSyntaxNode;
begin
LTreeBuilder := TPasSyntaxTreeBuilder.Create;
LData := TMemoryStream.Create;
LSyntaxTree := nil; // so that Free in the finally block is safe if Run raises
try
LData.LoadFromFile(EditFileName.Text);
LData.Position := 0;
LTreeBuilder.AddDefine('MSWINDOWS');
LTreeBuilder.AddDefine('WIN32');
LSyntaxTree := LTreeBuilder.Run(LData);
finally
LTreeBuilder.Free;
LSyntaxTree.Free;
LData.Free;
end;
If a method has a parameter with a default value e.g.:
procedure SomeMethod(a: Integer = 1);
the default value expression is not parsed by DelphiAST.
Suggested enhancement ([+] : added, [o] : original):
function TPasSyntaxTreeBuilder.FormalParameterList;
[o] var
...
[+] ParamExpr: TSyntaxNode;
...
[o] ParamKind := ParamList.GetAttribute('kind');
[+] ParamExpr := ParamList.FindNode(sEXPRESSION);
...
[o] if Assigned(TypeInfo) then
[o] FStack.AddChild(TypeInfo.Clone);
[o]
[+] if Assigned(ParamExpr) then
[+] FStack.AddChild(ParamExpr.Clone);
...
Example:
function SHMessageBoxCheckW(wnd: HWND; pszText: LPCTSTR; pszTitle: LPCTSTR; uType: UINT; iDefault: Integer; pszRegVal: LPCTSTR): Integer; stdcall; external 'shlwapi.dll' index 191 delayed;
Calling a method of the string type helper (TStringHelper) on the string type itself leads to EParserException. I guess this issue exists for all types that are keywords or have their own value in TptTokenKind.
Sample
procedure Test;
begin
string.LowerCase('X');
end;
Found while checking $(BDS)\source\rtl\common\System.Internal.ICU.pas