parser's Introduction

PHP-Parser

A handwritten fault-tolerant, recursive-descent parser for PHP written in Rust.

Warning: this is alpha software and the public API is subject to change. Use at your own risk.


Usage

Add php-parser-rs to the dependencies section of your Cargo.toml:

[dependencies]
php-parser-rs = { git = "https://github.com/php-rust-tools/parser" }

Or use cargo add:

cargo add php-parser-rs --git https://github.com/php-rust-tools/parser

Example

use std::io::Result;

use php_parser_rs::parser;

const CODE: &str = r#"<?php

final class User {
    public function __construct(
        public readonly string $name,
        public readonly string $email,
        public readonly string $password,
    ) {
    }
}
"#;

fn main() -> Result<()> {
    match parser::parse(CODE) {
        Ok(ast) => {
            println!("{:#?}", ast);
        }
        Err(err) => {
            println!("{}", err.report(CODE, None, true, false)?);

            println!("parsed so far: {:#?}", err.partial);
        }
    }

    Ok(())
}

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Credits

parser's People

Contributors

azjezz, danog, dohxis, dominikzogg, drupol, edsrzf, kennedytedesco, ryangjchandler, williamdes

parser's Issues

Add support for heredocs / nowdocs

Not sure on the best approach yet. I think it'll be something like producing a start & end token, then parsing everything in the middle as either a constant string or an interpolated string.
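
A rough sketch of the start/body/end idea, assuming a hypothetical DocToken type and lex_doc_string function (neither exists in the crate). It works on a string slice that begins at `<<<`, treats the body as a plain constant string, and skips interpolation and indentation stripping entirely. The closing-label check roughly follows the relaxed PHP 7.3+ rule.

```rust
/// Hypothetical tokens for a heredoc/nowdoc; not the crate's real token set.
#[derive(Debug, Clone, PartialEq)]
pub enum DocToken {
    Start { label: String, nowdoc: bool },
    Body(String),
    End,
}

/// Lexes `src`, which must begin at `<<<`, into start, body and end tokens.
/// Simplifications: no interpolation, no indentation stripping.
pub fn lex_doc_string(src: &str) -> Option<Vec<DocToken>> {
    let rest = src.strip_prefix("<<<")?;
    let (header, body) = rest.split_once('\n')?;
    let header = header.trim();
    // 'LABEL' marks a nowdoc; LABEL or "LABEL" marks a heredoc.
    let (label, nowdoc) = match header.strip_prefix('\'').and_then(|h| h.strip_suffix('\'')) {
        Some(label) => (label, true),
        None => (header.trim_matches('"'), false),
    };
    if label.is_empty() {
        return None;
    }
    let mut lines: Vec<&str> = Vec::new();
    for line in body.lines() {
        // The closing label may be indented and must not run into an identifier character.
        if let Some(after) = line.trim_start().strip_prefix(label) {
            let closes = after
                .chars()
                .next()
                .map_or(true, |c| !(c.is_alphanumeric() || c == '_'));
            if closes {
                return Some(vec![
                    DocToken::Start { label: label.to_string(), nowdoc },
                    DocToken::Body(lines.join("\n")),
                    DocToken::End,
                ]);
            }
        }
        lines.push(line);
    }
    None // the closing label was never found
}
```

For a heredoc, the Body token would be handed to whatever already parses interpolated strings; for a nowdoc it can stay a constant string.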

Parse all type declarations

There are several varieties of type declaration that Trunk can't parse:

  • Unions of more than two members (eg string|int|bool)
  • Intersections of more than two members (eg string&int&bool)
  • void
  • false
  • true
  • null
  • Disjunctive Normal Form types (PHP 8.2)

It also might be nice to have a richer type structure to deal with so that not everything is a string, but this is probably a separate issue.
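
To make the "richer type structure" idea concrete, here is a minimal sketch. The Type enum and parse_type function are hypothetical, not the crate's API, and the function splits a type string directly instead of consuming lexer tokens. It does no validation (for example, it will happily accept `void|int`).

```rust
/// Hypothetical AST node for a PHP type declaration.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Named(String),
    Void,
    Null,
    True,
    False,
    Nullable(Box<Type>),
    Union(Vec<Type>),
    Intersection(Vec<Type>),
}

fn parse_atom(s: &str) -> Type {
    match s.to_ascii_lowercase().as_str() {
        "void" => Type::Void,
        "null" => Type::Null,
        "true" => Type::True,
        "false" => Type::False,
        _ => Type::Named(s.to_string()),
    }
}

/// Parses `A&B&C`, optionally wrapped in parentheses as in DNF types.
fn parse_intersection(s: &str) -> Type {
    let s = s.trim();
    let s = s
        .strip_prefix('(')
        .and_then(|inner| inner.strip_suffix(')'))
        .unwrap_or(s);
    let mut parts: Vec<Type> = s.split('&').map(|p| parse_atom(p.trim())).collect();
    if parts.len() == 1 {
        parts.remove(0)
    } else {
        Type::Intersection(parts)
    }
}

/// Parses a whole type string: `?T`, unions of any length, and DNF types.
pub fn parse_type(src: &str) -> Type {
    let src = src.trim();
    if let Some(rest) = src.strip_prefix('?') {
        return Type::Nullable(Box::new(parse_atom(rest.trim())));
    }
    let mut parts: Vec<Type> = src.split('|').map(parse_intersection).collect();
    if parts.len() == 1 {
        parts.remove(0)
    } else {
        Type::Union(parts)
    }
}
```

Because unions and intersections hold a Vec, there is no two-member limit, and a DNF type such as `(A&B)|null` falls out as a Union containing an Intersection.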

Allow semi-reserved words in various places

PHP has semi-reserved words which can be used as names in some contexts. The Trunk parser appears to know about these already, but does not allow them to be used in all the places they should be allowed.

See test cases:

Also see uses of semi_reserved in the php-parser grammar: https://github.com/nikic/PHP-Parser/blob/master/grammar/php.y

I'll try to fill out this issue with more specific details in the future.

Parse close tags

The Trunk parser doesn't handle PHP close tags. eg:

<?php
// Doesn't parse:
?>
<html>

Add an `attributes` field to all statements

I want to introduce a new Attributes structure and relevant field to all statements so that we can store things like start_line, end_line and any comments on the statement.
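
One possible shape for this, as a sketch only: the field names come from the description above, while the Statement wrapper, StatementKind and the constructor are hypothetical. Wrapping the statement kind in a struct is just one option; putting an attributes field on every variant would work too.

```rust
/// Hypothetical metadata attached to every statement.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Attributes {
    pub start_line: usize,
    pub end_line: usize,
    pub comments: Vec<String>,
}

/// Placeholder statement kinds, just enough to show the shape.
#[derive(Debug, Clone, PartialEq)]
pub enum StatementKind {
    Echo(String),
    Noop,
}

/// A statement is its kind plus the shared attributes.
#[derive(Debug, Clone, PartialEq)]
pub struct Statement {
    pub kind: StatementKind,
    pub attributes: Attributes,
}

impl Statement {
    /// Builds a statement with line information and no comments yet.
    pub fn new(kind: StatementKind, start_line: usize, end_line: usize) -> Self {
        let attributes = Attributes {
            start_line,
            end_line,
            ..Attributes::default()
        };
        Statement { kind, attributes }
    }
}
```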

Parse __halt_compiler

__halt_compiler is a pseudo-function that stops compilation past the point it's "called". Any arbitrary data can follow and it doesn't need to be valid PHP.

<?php

__halt_compiler();
We can put anything we want here, and it should still parse!
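
A minimal sketch of locating the trailing data, assuming a hypothetical halt_compiler_data_offset helper that scans raw bytes. A real lexer would only recognise the keyword as a token (this naive scan would also match inside a string or comment), and this version only handles the `;` terminator.

```rust
/// Returns the byte offset just past `__halt_compiler();`, i.e. where the
/// arbitrary trailing data begins, or None if the construct is not found.
pub fn halt_compiler_data_offset(src: &[u8]) -> Option<usize> {
    const KEYWORD: &[u8] = b"__halt_compiler";
    // The keyword is matched case-insensitively.
    let start = src
        .windows(KEYWORD.len())
        .position(|window| window.eq_ignore_ascii_case(KEYWORD))?;
    let mut i = start + KEYWORD.len();
    // Expect `(`, `)` and `;` in that order, with optional whitespace between them.
    for expected in [b'(', b')', b';'] {
        while i < src.len() && src[i].is_ascii_whitespace() {
            i += 1;
        }
        if src.get(i) != Some(&expected) {
            return None;
        }
        i += 1;
    }
    Some(i)
}
```

Everything from the returned offset onward can be stored as an opaque byte slice on the AST node instead of being lexed as PHP.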

Move expressions to use a struct-type enum

The current setup on the Expression enum is a tuple/tagged union. This makes things a little hard to recognise in some cases, like anonymous classes. We should either store a single struct inside of the enum or use a structure on the variant directly and have named fields.
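
For illustration, a struct-style enum could look something like this. The variants and field names below are hypothetical and far smaller than the real Expression enum; the point is only that named fields make a match arm self-describing.

```rust
/// Hypothetical, cut-down expression enum using named fields on each variant.
#[derive(Debug, Clone, PartialEq)]
pub enum Expression {
    Variable { name: String },
    Int { value: i64 },
    Infix { lhs: Box<Expression>, op: String, rhs: Box<Expression> },
    AnonymousClass {
        extends: Option<String>,
        implements: Vec<String>,
        members: Vec<String>,
    },
}

/// Renders an expression; named fields document what each position means.
pub fn describe(expr: &Expression) -> String {
    match expr {
        Expression::Variable { name } => format!("${}", name),
        Expression::Int { value } => value.to_string(),
        Expression::Infix { lhs, op, rhs } => {
            format!("({} {} {})", describe(lhs), op, describe(rhs))
        }
        Expression::AnonymousClass { extends, .. } => match extends {
            Some(parent) => format!("new class extends {}", parent),
            None => "new class".to_string(),
        },
    }
}
```

With a tuple variant, the anonymous class arm would be something like `AnonymousClass(a, b, c)`, and the reader has to remember which position is which.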

Parse references in loops and functions

These should parse:

<?php
function a(&$b) {}
function &a($b) {}
function &($a) {};
static fn(&$x) => $x;
fn&($x) => $x;
foreach ($a as &$b) {}
foreach ($a as $b => &$c) {}

Somewhat related to #19, since & can be a unary operator, but here we're talking about non-expression contexts.

Get rid of comment hacks

We just have loads of checks for comments at the moment to essentially make them noop statements, etc.

We should remove this and instead have some kind of store_comments() method that will store them on the struct and then another method that goes with #10 to clear them from the parser and stores them in the attributes.
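
Roughly what I have in mind, as a sketch: store_comments comes from the description above, while CommentBuffer and take_comments are hypothetical names. The parser would own one of these and drain it whenever it finishes building a statement.

```rust
/// Hypothetical holder for comments seen since the last statement.
#[derive(Debug, Default)]
pub struct CommentBuffer {
    pending: Vec<String>,
}

impl CommentBuffer {
    /// Records comments as they are encountered instead of emitting noop statements.
    pub fn store_comments<I, S>(&mut self, comments: I)
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        self.pending.extend(comments.into_iter().map(Into::into));
    }

    /// Hands the stored comments to the caller and clears the buffer,
    /// so they can be moved into the next statement's attributes.
    pub fn take_comments(&mut self) -> Vec<String> {
        std::mem::take(&mut self.pending)
    }
}
```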

Implement all of PHP's operators

https://www.php.net/manual/en/language.operators.precedence.php provides a list of operators ordered by their precedence. We support quite a few of these already, but some, such as bitwise ops and casts, still need to be implemented.

Unary

  • All casts ((int), (bool), etc.)
  • Plus (+)
  • Bitwise not (~)
  • Prefix decrement (--$i)
  • Prefix increment (++$i)
  • Error control (@)
  • Print (print)

Binary

  • Modulo (%)
  • Left shift (<<)
  • Right shift (>>)
  • Bitwise and (&)
  • Bitwise or (|)
  • Bitwise xor (^)
  • Not equal (<>)
  • Spaceship (<=>)
  • Logical and (and)
  • Logical or (or)
  • Logical xor (xor)

Assignment

  • Assign =
  • Add assign +=
  • Sub assign -=
  • Mul assign *=
  • Pow assign **=
  • Div assign /=
  • Concat assign .=
  • Mod assign %=
  • Bitwise and assign &=
  • Bitwise or assign |=
  • Bitwise xor assign ^=
  • Left shift assign <<=
  • Right shift assign >>=
  • Null coalesce assign ??=

Parse braced global namespace

This is a valid way to access the global namespace, but doesn't parse:

<?php
namespace {
    function globalFunc() {}
}

Parse array spread

These should parse:

<?php
[...[]];
[...[1, 2, 3]];
[...$array];
[...getArr()];
[...arrGen()];
[...new ArrayIterator(['a', 'b', 'c'])];
[0, ...$array, ...getArr(), 6, 7, 8, 9, 10, ...arrGen()];
[0, ...$array, ...$array, 'end'];

Lex binary strings

String literals can be prefixed with b or B:

<?php

b'';
b"";
b'Hi';
B'Hi';

// Heredocs and nowdocs too!
b<<<EOS
Binary
EOS;

I can't find any official documentation for this, but there are test cases for it and I found a Stack Overflow question about it. Apparently it was intended as a forward-compatibility feature for PHP 6, but PHP 6 was never released.
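
A small sketch of how the lexer could detect the prefix, assuming a hypothetical split_binary_prefix helper that works on bytes. The prefix only counts when a quote or `<<<` follows directly, so ordinary identifiers starting with b are left alone.

```rust
/// True if `rest` begins a single-quoted, double-quoted, or heredoc/nowdoc string.
fn opens_string(rest: &[u8]) -> bool {
    rest.starts_with(b"'") || rest.starts_with(b"\"") || rest.starts_with(b"<<<")
}

/// Splits an optional `b`/`B` binary-string prefix off the input.
/// Returns whether the prefix was present and the remaining bytes.
pub fn split_binary_prefix(src: &[u8]) -> (bool, &[u8]) {
    match src {
        [b'b' | b'B', rest @ ..] if opens_string(rest) => (true, rest),
        _ => (false, src),
    }
}
```

After the split, the remaining bytes can go through the existing string lexing paths unchanged.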

Closures trying to be parsed as function statements

function () {};

The parser expects an identifier after function because it sees the function keyword. I think we just need to check whether the peeked token is an identifier; if it isn't, parsing falls through to the expression logic.
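
The peek check could be sketched like this. The Token and FunctionKind enums are hypothetical stand-ins for the real lexer types, and the optional `&` is skipped because both named functions and closures can return by reference.

```rust
/// Hypothetical, cut-down token type.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Function,
    Ampersand,
    Identifier(String),
    LeftParen,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FunctionKind {
    Statement,
    Closure,
}

/// Looks past `function` (and an optional `&`) to decide which path to take.
pub fn classify_function(tokens: &[Token]) -> Option<FunctionKind> {
    let mut rest = match tokens.split_first() {
        Some((Token::Function, rest)) => rest,
        _ => return None,
    };
    if let Some((Token::Ampersand, after)) = rest.split_first() {
        rest = after;
    }
    match rest.first() {
        // `function name(` is a function statement.
        Some(Token::Identifier(_)) => Some(FunctionKind::Statement),
        // `function (` has no name, so hand it to the expression parser.
        Some(Token::LeftParen) => Some(FunctionKind::Closure),
        _ => None,
    }
}
```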

Parse `global`

<?php
$a = 1;
$b = 2;

function Sum()
{
    global $a, $b;

    $b = $a + $b;
} 

Parse top-level constants

Constants inside classes will parse, but top-level constants currently don't:

<?php

const A = 0, B = 1.0;
const C = 'const';

Prevent named arguments being used before positional arguments

function foo($bar, $baz) {}


foo(baz: "Baz", "Bar");

nikic/php-parser produces a valid AST for this since it doesn't do any work to validate the AST, but we can do one better and actually produce a ParseError for this instance since it's not allowed at runtime.
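
The check itself is a single pass over the argument list. In this sketch the Argument and ParseError types are hypothetical and much simpler than whatever the parser really uses.

```rust
/// Hypothetical argument node: `name` is Some for `name: value` arguments.
#[derive(Debug, Clone, PartialEq)]
pub struct Argument {
    pub name: Option<String>,
    pub value: String,
}

/// Hypothetical error type carrying the offending argument's position.
#[derive(Debug, Clone, PartialEq)]
pub struct ParseError {
    pub message: String,
    pub argument_index: usize,
}

/// Rejects any positional argument that appears after a named one.
pub fn validate_argument_order(args: &[Argument]) -> Result<(), ParseError> {
    let mut seen_named = false;
    for (index, arg) in args.iter().enumerate() {
        if arg.name.is_some() {
            seen_named = true;
        } else if seen_named {
            return Err(ParseError {
                message: "cannot use positional argument after named argument".to_string(),
                argument_index: index,
            });
        }
    }
    Ok(())
}
```

For `foo(baz: "Baz", "Bar")` this reports the second argument (index 1).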

Move away from binding powers

Binding powers are kind of annoying given how complicated PHP's precedence rules can get. I think I should just move it all over to a regular Precedence enum or something and then have an Associativity rule too.

(This is where parser generators kind of dominate).
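
As a sketch of the Precedence plus Associativity idea: the enums and the tiny precedence-climbing loop below are hypothetical and cover only a handful of operators. Tokens are plain strings and the output is an s-expression, which keeps the example short. Non-associative operators are tagged but chaining them is not rejected here.

```rust
/// Ordered from loosest to tightest; the derived Ord follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Precedence {
    Lowest,
    Assignment,
    NullCoalesce,
    Comparison,
    AddSub,
    MulDiv,
    Pow,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Associativity {
    Left,
    Right,
    Non,
}

pub fn infix_info(op: &str) -> Option<(Precedence, Associativity)> {
    use Associativity::*;
    Some(match op {
        "=" | "+=" | "-=" => (Precedence::Assignment, Right),
        "??" => (Precedence::NullCoalesce, Right),
        "<" | ">" | "<=" | ">=" => (Precedence::Comparison, Non),
        "+" | "-" => (Precedence::AddSub, Left),
        "*" | "/" | "%" => (Precedence::MulDiv, Left),
        "**" => (Precedence::Pow, Right),
        _ => return None,
    })
}

fn parse_expr(tokens: &[&str], pos: &mut usize, min: Precedence) -> Option<String> {
    let mut lhs = tokens.get(*pos)?.to_string();
    *pos += 1;
    while let Some(op) = tokens.get(*pos).copied() {
        let (prec, assoc) = match infix_info(op) {
            Some(info) => info,
            None => break,
        };
        // Keep going only for tighter operators, or equal ones that associate right.
        if !(prec > min || (prec == min && assoc == Associativity::Right)) {
            break;
        }
        *pos += 1;
        let rhs = parse_expr(tokens, pos, prec)?;
        lhs = format!("({} {} {})", op, lhs, rhs);
    }
    Some(lhs)
}

/// Parses a flat token list into an s-expression string.
pub fn to_sexpr(tokens: &[&str]) -> Option<String> {
    let mut pos = 0;
    let out = parse_expr(tokens, &mut pos, Precedence::Lowest)?;
    if pos == tokens.len() { Some(out) } else { None }
}
```

Adding an operator then means adding one match arm rather than picking a pair of numbers that happens to sort correctly.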

Parse first-class callables

These should all parse:

<?php
foo(...);
$this->foo(...);
A::foo(...);

nikic/php-parser also accepts these, even though they result in runtime errors:

<?php
new Foo(...);
$this?->foo(...);

Trunk can't handle invalid UTF-8 in PHP source code

The Trunk lexer accepts a string and operates on chars. PHP source does not need to be valid UTF-8, and so in degenerate cases can't be represented as a Rust string or as individual chars. In particular, PHP docs state:

A valid variable name starts with a letter or underscore, followed by any number of letters, numbers, or underscores. As a regular expression, it would be expressed thus: ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$

For instance, you can run this on the CLI:

$ printf '<?php $var\xff = 1; echo $var\xff . "\n";'|php
1

...and it works! This uses the character \xff as part of a variable name, which is not otherwise printable and is invalid UTF-8:

$ printf '<?php $var\xff = 1; echo $var\xff . "\n";'|iconv -f UTF-8
<?php $var
iconv: (stdin):1:10: cannot convert

This is admittedly a fairly bizarre corner of the PHP language, but if this toolchain wants to be compatible with the main PHP implementation, it should be fixed.
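
One way to fix this is to have the lexer operate on bytes (`&[u8]`) instead of chars. The sketch below applies the regular expression quoted above byte by byte; lex_variable and the two helper functions are hypothetical names, not existing lexer functions.

```rust
/// Matches [a-zA-Z_\x80-\xff] from the PHP documentation.
fn is_ident_start(byte: u8) -> bool {
    byte.is_ascii_alphabetic() || byte == b'_' || byte >= 0x80
}

/// Matches [a-zA-Z0-9_\x80-\xff].
fn is_ident_continue(byte: u8) -> bool {
    is_ident_start(byte) || byte.is_ascii_digit()
}

/// If `src` starts with a variable such as `$name`, returns the name bytes
/// (without the `$`). The result need not be valid UTF-8.
pub fn lex_variable(src: &[u8]) -> Option<&[u8]> {
    if src.first() != Some(&b'$') {
        return None;
    }
    let name = &src[1..];
    if !name.first().copied().map_or(false, is_ident_start) {
        return None;
    }
    let len = name.iter().take_while(|&&b| is_ident_continue(b)).count();
    Some(&name[..len])
}
```

Identifiers would then be stored as byte strings in the AST, with lossy conversion to `String` only when printing.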

Parse grouped use statements

We already support basic use statements at the top of a file.

use Foo;
use Bar;

We should also support grouped statements though.

use App\{Foo, Bar, Baz as Car};
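
As a sketch of what the parsed result could carry, the hypothetical expand_group_use function below expands a grouped statement into one entry per imported name. It works on the statement text for brevity (the real parser would consume tokens), matches `as` case-sensitively, and ignores `use function` and `use const`.

```rust
/// Hypothetical flattened import: a fully qualified name and an optional alias.
#[derive(Debug, Clone, PartialEq)]
pub struct Use {
    pub name: String,
    pub alias: Option<String>,
}

/// Expands `use Prefix\{A, B as C};` into individual imports.
pub fn expand_group_use(src: &str) -> Option<Vec<Use>> {
    let body = src.trim().strip_prefix("use ")?.strip_suffix(';')?;
    let (prefix, group) = body.split_once('{')?;
    let group = group.trim_end().strip_suffix('}')?;
    let prefix = prefix.trim();
    let uses = group
        .split(',')
        .map(str::trim)
        .filter(|item| !item.is_empty()) // tolerate a trailing comma
        .map(|item| {
            let (name, alias) = match item.split_once(" as ") {
                Some((name, alias)) => (name.trim(), Some(alias.trim().to_string())),
                None => (item, None),
            };
            Use { name: format!("{}{}", prefix, name), alias }
        })
        .collect();
    Some(uses)
}
```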

Parse readonly properties and classes

Readonly properties, which are required to have type declarations:

<?php

class C {
    readonly int $a;
    readonly protected string $b;
    readonly static stdClass $c;

    function __construct(readonly bool $d) {}
}

Readonly classes: (coming in PHP 8.2)

<?php

readonly class C {}

Nullsafe operator doesn't parse

Nullsafe parsing is implemented, but it doesn't seem to work correctly. A simple test case fails to parse at all:

<?php $a?->b;

Handle all types of numeric literals

The lexer doesn't handle various cases for numeric literals:

  • e-notation for floats (eg 0E0, 1e10000)
  • Hexadecimal literals (0xCAFEF00D)
  • Octal literals (0700)
  • Explicit octal literals (0o700)
  • Binary literals (0b01011111)
  • Large integer literals should overflow to floating-point (18446744073709551615) - currently they result in an error.
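
The cases above could be handled along these lines. This is a sketch with hypothetical names (Number, parse_number); it takes the already-scanned literal text, ignores `_` digit separators, and for non-decimal overflow builds the float digit by digit, which may differ from PHP in the last bit.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Number {
    Int(i64),
    Float(f64),
}

/// Converts the text of a numeric literal into an integer or float.
pub fn parse_number(literal: &str) -> Option<Number> {
    let lower = literal.to_ascii_lowercase();
    // Must start like a number; this also rules out words such as "inf" or "nan".
    if !lower.starts_with(|c: char| c.is_ascii_digit() || c == '.') {
        return None;
    }
    let (digits, radix): (&str, u32) = if let Some(d) = lower.strip_prefix("0x") {
        (d, 16)
    } else if let Some(d) = lower.strip_prefix("0b") {
        (d, 2)
    } else if let Some(d) = lower.strip_prefix("0o") {
        (d, 8) // explicit octal
    } else if lower.contains(|c: char| c == '.' || c == 'e') {
        return lower.parse::<f64>().ok().map(Number::Float);
    } else if lower.len() > 1 && lower.starts_with('0') {
        (&lower[1..], 8) // legacy octal such as 0700
    } else {
        (lower.as_str(), 10)
    };
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    match i64::from_str_radix(digits, radix) {
        Ok(n) => Some(Number::Int(n)),
        // The digits are valid, so the only possible failure is overflow:
        // fall back to a float instead of reporting an error.
        Err(_) if radix == 10 => digits.parse::<f64>().ok().map(Number::Float),
        Err(_) => Some(Number::Float(
            digits
                .chars()
                .filter_map(|c| c.to_digit(radix))
                .fold(0.0, |acc, d| acc * f64::from(radix) + f64::from(d)),
        )),
    }
}
```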

See also these test cases:

Ensure that identifiers also include non-reserved keywords

The current expectation in the parser is that all identifiers are non-keywords. This isn't quite right though since PHP has introduced new keywords in releases and wanted to preserve compatibility with method names etc.

This should be quite an easy change now that we have dedicated methods for the expectations. If something is a keyword, we just need to return the string form instead of the captured identifier.

We could use nikic/php-parser as a source for the soft-reserved/non-reserved keywords.

Allow forcing type strings in parser

At the moment the parser will only parse a type string if it could potentially exist. We should introduce a new ParserOptions struct that allows forcing things like type strings in appropriate positions.

Add support for short (`:`) control structures

It's common for templating engines to use short control structures, i.e. foreach and endforeach.

The parsing is quite simple here really since we just need to check if there is a { or a : and change how we check for the end token.

  • If/elseif/else
  • Foreach
  • For
  • While
  • Switch
  • Declare
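
The `{` versus `:` check could be sketched like this. The Token and Construct enums and the block_terminator function are hypothetical; for if, the real parser would also have to stop the block at elseif and else, which this sketch leaves out.

```rust
/// Hypothetical, cut-down token type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token {
    LeftBrace,
    RightBrace,
    Colon,
    EndIf,
    EndForeach,
    EndFor,
    EndWhile,
    EndSwitch,
    EndDeclare,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Construct {
    If,
    Foreach,
    For,
    While,
    Switch,
    Declare,
}

/// Given the token that opens a block, returns the token that should close it.
pub fn block_terminator(construct: Construct, opener: Token) -> Option<Token> {
    match opener {
        // Regular braced block.
        Token::LeftBrace => Some(Token::RightBrace),
        // Short form: the end keyword depends on the construct.
        Token::Colon => Some(match construct {
            Construct::If => Token::EndIf,
            Construct::Foreach => Token::EndForeach,
            Construct::For => Token::EndFor,
            Construct::While => Token::EndWhile,
            Construct::Switch => Token::EndSwitch,
            Construct::Declare => Token::EndDeclare,
        }),
        _ => None,
    }
}
```

The block-parsing loop can then stay the same for both forms and only compare against the returned terminator.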
