
php's Introduction

php

Archived: This project only supported PHP 5, and never matured beyond a basic parser and AST visualizer. Since I lost interest, it has fallen into disrepair, beyond the more conventional bugs.


Parser for PHP written in Go

See this post for an introduction.


Test console:

console

Project Status

This project is under heavy development, though some pieces are more or less stable. Listed here are components that are in progress or are ideas for future development.

Feature                         Status
Lexer and Parser                Mostly complete. There are probably a few gaps still.
Scoping                         Complete for simple cases. Probably some gaps still, most notably that conditional definitions are treated as if they are always defined.
Code search and symbol lookup   Basic idea implemented, many details missing.
Code formatting                 Basic idea implemented. Formatting needs to narrow down to PSR-2.
Transpilation to Go             Basic idea implemented. Needs follow-through with more node types.
Type inferencing                Not begun.
Dead code analysis              Basic idea implemented, but only for some types of code. Also, this suffers from the same caveats as scoping.

Project Components

Directory             Description
php/ast               (abstract syntax tree) describes the nodes in PHP as parsed by the parser
php/ast/printer       prints an AST back to source code
php/cmd               a tool used to debug the parser
php/lexer             reads a stream of tokens from source code
php/parser            the core parser
php/passes            tools and packages related to modifying or analyzing PHP code (heavily a work in progress)
php/passes/togo       transpiler
php/passes/deadcode   dead code analyzer
php/query             tools and packages related to analyzing and finding things in PHP code (heavily a work in progress)
php/testdata          simple examples of PHP that must parse with no errors for tests to pass
php/token             describes the tokens read by the lexer

php's People

Contributors

jostillmanns, mcuadros, mtibben, rwos, stephens2424, stephensearles, xsavitar


php's Issues

Create Package for PHP Standard Library and Common Packages

To do an intermediate transpile, we will need to translate common PHP functions and classes to Go equivalents. This may happen at two levels. The first is to provide functionally equivalent implementations of the same functions. These functions would be available to transpiled programs as a package. The second is to provide compiler passes that translate these functions to their idiomatic Go equivalents.

This issue will be closed when the first of these is completed.
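
A minimal sketch of what the first level might look like, assuming a support package whose functions mirror PHP's names and argument order. The names `Implode`, `StrRepeat`, and `InArray` are illustrative only and are not part of this project; the package is folded into `main` here so the example runs on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// Implode mirrors PHP's implode(glue, pieces) for string slices.
// Hypothetical helper, not part of this project.
func Implode(glue string, pieces []string) string {
	return strings.Join(pieces, glue)
}

// StrRepeat mirrors PHP's str_repeat(input, multiplier).
// This sketch returns "" for a non-positive multiplier instead of
// reproducing PHP's error behavior for negative values.
func StrRepeat(input string, multiplier int) string {
	if multiplier <= 0 {
		return ""
	}
	return strings.Repeat(input, multiplier)
}

// InArray mirrors a strict in_array(needle, haystack, true) for strings.
func InArray(needle string, haystack []string) bool {
	for _, v := range haystack {
		if v == needle {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(Implode(", ", []string{"a", "b", "c"}))
	fmt.Println(StrRepeat("ab", 3))
	fmt.Println(InArray("b", []string{"a", "b"}))
}
```

The second level would instead rewrite a call such as implode(...) directly into strings.Join(...) during a compiler pass, so the transpiled program would not need the support package at all.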

Inconsistent parsing of function calls

Hi,
I think I have found a problem with the parsing of simple function calls. Given the following code:

<?php
sort($myArray);

sort() will be parsed to ast.ExpressionStmt. If you use the function call in an assignment like this:

<?php
$ok = sort($myArray);

sort() will be parsed as ast.FunctionCallExpression. Maybe I missed some point but it seems a bit confusing.
If this is a real bug I could fix it by myself... I would appreciate some hints :)

Line number in AST Node?

Hi,

I'm using your tool to parse PHP code with the secret hope of building some kind of static analysis tool with it; I wondered if it would be feasible and/or desirable to track line numbers in Node structures? I don't mind lending a hand for that, but if it's definitely unwanted I'd rather not waste time :)
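
One possible approach, sketched under the assumption that each node (or the token it starts at) records a byte offset into the source: the line number can then be derived on demand instead of being stored in every node. The helper `lineOf` is hypothetical and not part of this project.

```go
package main

import (
	"fmt"
	"strings"
)

// lineOf returns the 1-based line number of the byte at offset in src.
// Out-of-range offsets are clamped to the start or end of src.
func lineOf(src string, offset int) int {
	if offset < 0 {
		offset = 0
	}
	if offset > len(src) {
		offset = len(src)
	}
	// Every newline before the offset starts a new line.
	return strings.Count(src[:offset], "\n") + 1
}

func main() {
	src := "<?php\n$a = 1;\necho $a;\n"
	fmt.Println(lineOf(src, strings.Index(src, "echo")))
}
```

Counting newlines on each lookup is linear in the offset; a tool that reports many positions would probably precompute the offsets of all line starts once and binary-search them.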

"or" logical operator error

Hey Stephen! Thanks for this project. I'm finally getting around to playing with it!

There appears to be an issue when an 'OR' is reached.

The simplest test case I can give you is this:

<?php defined('BASEPATH') OR exit('No direct script access allowed');

This gives the following error:

(ast.ExitStmt)Literal-string: 'No direct script access allowed'
Compiled 1 files. 0 files with errors - 100.000000% success

Remove reference-or-not inconsistencies when manipulating AST objects

Several places in the code look for ast.Foo but also *ast.Foo.
Example in ast/printer/printer.go:

    case ast.AssignmentExpression:
        p.PrintAssignmentExpression(&n)
    case *ast.AssignmentExpression:
        p.PrintAssignmentExpression(n)

As a result, we can never be sure whether the AST holds a value or a pointer for a given node type.
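
Until the parser emits a single form, one way to contain the duplication is a small helper that normalizes both forms to a pointer, so each consumer needs only one code path. This is a sketch with simplified stand-in types; `Node`, `AssignmentExpression`, and `asAssignment` here are hypothetical and do not reproduce the project's actual definitions.

```go
package main

import "fmt"

// Node is a simplified stand-in for the project's ast.Node interface.
type Node interface{}

// AssignmentExpression is a simplified stand-in for ast.AssignmentExpression.
type AssignmentExpression struct {
	Operator string
}

// asAssignment returns a pointer whether n holds the value form or the
// pointer form. For the value form, the pointer refers to a copy.
func asAssignment(n Node) (*AssignmentExpression, bool) {
	switch v := n.(type) {
	case AssignmentExpression:
		return &v, true
	case *AssignmentExpression:
		return v, v != nil
	}
	return nil, false
}

func main() {
	nodes := []Node{
		AssignmentExpression{Operator: "="},
		&AssignmentExpression{Operator: "+="},
		"not an assignment",
	}
	for _, n := range nodes {
		if a, ok := asAssignment(n); ok {
			fmt.Println("assignment", a.Operator)
		} else {
			fmt.Println("other node")
		}
	}
}
```

Because the value case yields a copy, mutations made through the returned pointer would not reach the original tree, which is one more reason to settle on pointers throughout.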

Prefix vs Postfix Operators

I am attempting to help flesh out the fmt passes. I do not see how you can differentiate between ++$a and $a++. Both appear to have the same values:

node.Children() # => ['$a']
node.Position() # =>  0
node.Operator # => '++'
node.Type # => 'integer|float'
node.Operand1 # => '$a'
node.Operand2 # => nil
node.Operand3 # => nil
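
One way the two forms could be told apart is an explicit flag on the node, set by the parser depending on whether the operator token came before or after the operand. The sketch below uses a hypothetical `UnaryExpression` type with a `Prefix` field; it is an illustration, not the project's actual node definition.

```go
package main

import "fmt"

// UnaryExpression is a hypothetical stand-in for a unary operator node.
type UnaryExpression struct {
	Operator string // "++" or "--"
	Operand  string // e.g. "$a"
	Prefix   bool   // true for ++$a, false for $a++
}

// Source renders the expression back to PHP, placing the operator
// before or after the operand according to Prefix.
func (u UnaryExpression) Source() string {
	if u.Prefix {
		return u.Operator + u.Operand
	}
	return u.Operand + u.Operator
}

func main() {
	pre := UnaryExpression{Operator: "++", Operand: "$a", Prefix: true}
	post := UnaryExpression{Operator: "++", Operand: "$a"}
	fmt.Println(pre.Source(), post.Source())
}
```

A formatting pass could then round-trip both forms without consulting source positions.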

Lexer - Infinite loop on identifiers that contain multibyte runes

The parser will hog the CPU when working on data like

<?php
$a = ‘test‘; // Note that ‘ is a 3-byte UTF-8 character %E2%80%98, not a single quote 

I get those files from users that copy-paste PHP code from online articles that apply formatting on PHP code. The reason it hangs is that *lexer.acceptRun doesn't accept multibyte characters since they are not in the valid set. A quick fix I did was add an additional check for utf8.RuneLen to accept multibyte strings, but I'm not sure it's the best workaround.
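
A sketch of the kind of check described above, using a hypothetical `scanner` type rather than the project's lexer. It decodes one rune at a time and always advances by the decoded width, so the loop cannot stall on multibyte or invalid input. Treating every non-ASCII rune as acceptable matches PHP's own rule that bytes 0x80 through 0xff are allowed in identifiers.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// scanner is a hypothetical, stripped-down stand-in for the lexer state.
type scanner struct {
	input string
	pos   int
}

// acceptRun consumes runes while they are either in the valid set or
// outside the ASCII range. It stops at the first ASCII rune not in valid.
func (s *scanner) acceptRun(valid string) {
	for s.pos < len(s.input) {
		r, width := utf8.DecodeRuneInString(s.input[s.pos:])
		if r < utf8.RuneSelf && !strings.ContainsRune(valid, r) {
			return
		}
		// width is at least 1 even for invalid UTF-8, so pos always moves.
		s.pos += width
	}
}

func main() {
	const ident = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
	s := &scanner{input: "naïve_1 = 2"}
	s.acceptRun(ident)
	fmt.Println(s.input[:s.pos])
}
```

Whether a curly quote should end up inside an identifier or be reported as a syntax error is a separate decision; the point of the sketch is only that scanning always terminates.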

Create Parser Passes to Translate PHP Standard Library and Common Packages to Idiomatic Go

To do an intermediate transpile, we will need to translate common PHP functions and classes to Go equivalents. This may happen at two levels. The first is to provide functional equivalent implementations of the same functions. These functions would be available to transpiled programs as a package. The second is to provide compiler passes that translate these functions to their idiomatic Go equivalents.

This issue will be closed when both of these are completed.

Integrate with srclib?

Do you have plans to integrate your parser with Srclib? https://srclib.org/
This would allow for intelligent PHP code completion in a number of editors. I'm working on a vim plugin for srclib right now, so it could be very cool!

Lexer hangs on bad PHP file from Codeception

Was looking at this project to look into making a fast PHP autocomplete daemon. Out of curiosity, any reason why the lexer is running in a Goroutine? Wouldn't it be the job of the Parser / user-code to run the lexer in a go-routine (if at all?)

File:
https://github.com/Codeception/Codeception/blob/2.2/tests/data/Invalid.php

File Contents:

<?php
$I do nothing here

My Golang loop:

for {
	token := lexer.Next()
	if token.Typ == PHPToken.EOF {
		break
	}
	// ... more
}

Nested if-else block not parsing correctly

Add a testdata file with a simple foreach.php file, for example:

<?php foreach ( $arr as $key => $val ){ echo 1; }

Now tune it to have actual logic inside:

<?php
foreach ( $arr as $key => $val ){
    if ($x === $y = $z) {
      echo "true";
    } else {
      echo "false";
    }
}

And this breaks with:

--- FAIL: TestFiles (0.01s)
    bulk_test.go:28: foreach.php :10: Expected expression. Found EOF
        :10: Found EOF, statement or expression
        :10: Found EOF, statement or expression

Line-comments eat up everything on files using only carriage-return

If the file uses only CR instead of LF/CRLF as line endings, the parser will treat everything after a line comment as part of that line comment.

Example (the \r character represents carriage-return, there's no \n anywhere in the file):

<?php\r
// comment\r
$a = 1;

Will yield // comment\r$a = 1; as a line comment.

To fix that I changed

func lexLineComment(l *lexer) stateFn {
    lineLength := strings.Index(l.input[l.pos:], "\n") + 1

to

func lexLineComment(l *lexer) stateFn {
    lineLength := strings.IndexAny(l.input[l.pos:], "\r\n") + 1

and it seems to work fine.
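
A slightly more defensive variant, sketched as a standalone helper. The name `lineCommentLen` is hypothetical, and the function takes the remaining input as a plain string instead of the lexer state. In addition to CR-only files, it treats CRLF as a single terminator and handles a comment on the last line with no terminator at all, where strings.Index and strings.IndexAny return -1 and the "+ 1" above would yield a length of 0.

```go
package main

import (
	"fmt"
	"strings"
)

// lineCommentLen returns the number of bytes occupied by a line comment
// that starts at the beginning of s, including its line terminator.
func lineCommentLen(s string) int {
	i := strings.IndexAny(s, "\r\n")
	if i < 0 {
		return len(s) // no terminator: the comment runs to end of input
	}
	if s[i] == '\r' && i+1 < len(s) && s[i+1] == '\n' {
		return i + 2 // CRLF counts as one terminator
	}
	return i + 1 // lone CR or lone LF
}

func main() {
	for _, s := range []string{"// c\r$a = 1;", "// c\r\n$a = 1;", "// c\n$a = 1;", "// c"} {
		fmt.Println(lineCommentLen(s))
	}
}
```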

Print Golang Equivalent to PHP

Hey @stephens2424, I stumbled onto your blog post and was intrigued. I wanted to test and play. What I'd like to do is print the golang equivalent of a php file. I edited the bulk_test.go file as a broken attempt.

...
        p := NewParser(string(src))
        astNodes, errs := p.Parse() // Returns []ast.Node
        var buf bytes.Buffer
        //  printer.Fprint(os.Stdout, nil, astNodes)
        printer.Fprint(&buf, nil, astNodes)
        s := buf.String()
        fmt.Println(s, astNodes)
        //The Go parser and printer package are 

I'm trying to figure out what I need to do with astNodes and how to create a token.FileSet.
Thanks in advance.
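
For reference, a sketch of how a token.FileSet is created and used with Go's standard go/printer. Note that printer.Fprint expects Go syntax tree nodes from the go/ast package, so PHP nodes would presumably have to be translated to Go nodes first (the role the components list assigns to php/passes/togo) before they could be printed this way. The `render` helper and the file name "example.go" are illustrative only.

```go
package main

import (
	"bytes"
	"fmt"
	"go/parser"
	"go/printer"
	"go/token"
)

// render parses Go source text and prints it back through go/printer.
func render(src string) (string, error) {
	// A FileSet records position information for every parsed file.
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := printer.Fprint(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := render("package main\n\nfunc main() {}\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```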

Comments are not available in AST

I have a test PHP file:

<?php
/* Foo   */
// Bar
$baz = "hey";

and a small go tool to parse this:

package main

import (
    "fmt"
    "github.com/stephens2424/php"
    "github.com/stephens2424/php/passes/printing"
    "io/ioutil"
    "log"
)

func main() {
    content, err := ioutil.ReadFile("./test.php")
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(string(content))

    p := php.NewParser()
    a, err := p.Parse("test.php", string(content))
    if err != nil {
        log.Fatal(err)
    }

    w := printing.NewWalker()

    for _, node := range a.Nodes {
        w.Walk(node)
    }
}

The output looks like:

go run main.go
<?php
/* Foo   */
// Bar
$baz = "hey";

(ast.ExpressionStmt)=
    (ast.AssignmentExpression)=
        (*ast.Variable)$baz
            (*ast.Identifier)baz
        (*ast.Literal)Literal-string: "hey"

As you can see, neither the single-line nor the multi-line comments are parsed into the AST.
Am I doing something wrong, or are comments not represented as nodes?

Missing namespace node

Working with the PHP parser, I cannot find the namespace node.
How can I access the namespace node?

Best regards
