
rply's Introduction

RPLY


Welcome to RPLY! A pure Python parser generator that also works with RPython. It is a more-or-less direct port of David Beazley's awesome PLY, with a new public API and RPython support.

You can find the documentation online.

Basic API:

from rply import ParserGenerator, LexerGenerator
from rply.token import BaseBox

lg = LexerGenerator()
# Add takes a rule name, and a regular expression that defines the rule.
lg.add("PLUS", r"\+")
lg.add("MINUS", r"-")
lg.add("NUMBER", r"\d+")

lg.ignore(r"\s+")

# ParserGenerator takes a list of the token names. precedence is an
# optional list of tuples which specifies the order of operations, to
# avoid ambiguity. The first element of each tuple must be one of
# "left", "right", or "nonassoc".
# cache_id is an optional string which specifies an ID to use for
# caching. It should *always* be safe to use caching; RPly will
# automatically detect when your grammar is changed and refresh the
# cache for you.
pg = ParserGenerator(["NUMBER", "PLUS", "MINUS"],
        precedence=[("left", ['PLUS', 'MINUS'])], cache_id="myparser")

@pg.production("main : expr")
def main(p):
    # p is a list of the pieces on the right-hand side of the
    # grammar rule
    return p[0]

class BoxInt(BaseBox):
    def __init__(self, value):
        self.value = value

    def getint(self):
        return self.value

@pg.production("expr : expr PLUS expr")
@pg.production("expr : expr MINUS expr")
def expr_op(p):
    lhs = p[0].getint()
    rhs = p[2].getint()
    if p[1].gettokentype() == "PLUS":
        return BoxInt(lhs + rhs)
    elif p[1].gettokentype() == "MINUS":
        return BoxInt(lhs - rhs)
    else:
        raise AssertionError("This is impossible, abort the time machine!")

@pg.production("expr : NUMBER")
def expr_num(p):
    return BoxInt(int(p[0].getstr()))

lexer = lg.build()
parser = pg.build()

Then you can do:

parser.parse(lexer.lex("1 + 3 - 2+12-32"))

You can also substitute your own lexer. A lexer is an object with a next() method that returns either the next token in sequence, or None if the token stream has been exhausted.
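For instance, a hand-rolled lexer can simply wrap a list of tokens. Here is an illustrative sketch; the Token class below is a minimal stand-in for rply.token.Token, which real code should use so the parser's gettokentype()/getstr() calls work:

```python
class Token:
    # Stand-in for rply.token.Token, which real code should use instead.
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def gettokentype(self):
        return self.name

    def getstr(self):
        return self.value


class ListLexer:
    # Satisfies the lexer contract: next() returns the next token, or
    # None once the stream is exhausted.
    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._idx = 0

    def next(self):
        if self._idx >= len(self._tokens):
            return None
        token = self._tokens[self._idx]
        self._idx += 1
        return token


stream = ListLexer([Token("NUMBER", "1"), Token("PLUS", "+")])
```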

Why do we have the boxes?

In RPython, as in other statically typed languages, a variable must have a specific type, so we take advantage of polymorphism and keep values in boxes; that way everything stays statically typed. You can write whatever boxes you need for your project.

If you don't intend to use your parser from RPython and just want a cool pure Python parser, you can ignore all the box stuff and return whatever you like from each production method.

Error handling

By default, when a parsing error is encountered, an rply.ParsingError is raised; it has a method getsourcepos(), which returns an rply.token.SourcePosition object.

You may also provide an error handler, which, at the moment, must raise an exception. It receives the Token object that the parser errored on.

pg = ParserGenerator(...)

@pg.error
def error_handler(token):
    raise ValueError("Ran into a %s where it wasn't expected" % token.gettokentype())

Python compatibility

RPly is tested and known to work under Python 2.7, 3.4+, and PyPy. It is also valid RPython for PyPy checkouts from 6c642ae7a0ea onwards.

rply's People

Contributors

alex, benjaminp, bivab, cfbolz, dasich, djc, dnet, felixonmars, hirochachacha, jwilk, lucian1900, markush, michal-pawlik, nobodxbodon, passy, plafer, qingyunha, scottbelden, seanfisk, simonsapin, sn6uv


rply's Issues

Precedence on rules not working

I'm trying to use precedence on production rules and it doesn't seem to be working.

I'm using the same grammar of simple mathematical expressions used in the docs. All I've done is copy-and-paste the lexer, ast, parser, and implicit_multiplication code from the docs into a file.

The result of that is:

>>> parser.parse(lexer.lex('1 + 2 * 3')).eval()
7
>>> parser.parse(lexer.lex('1 + 2 3')).eval()
9

This is on Python 3.6.1 with rply 0.7.4.

Possible to have empty productions?

I have two modes: one where I do a full parse and another where I just want to parse out certain productions. Is this possible? Do I need to set up the Lexer differently, or can I handle it in the Parser? If so, for either, any hints would be helpful.

test failures in 0.7.2; test_simple, test_empty_production under py3[3-4]

from the source, with system python set to Python 3.3.5;

rply-0.7.2 $ py.test tests/

yields

======================== test session starts
========================
platform linux -- Python 3.3.5 -- py-1.4.20 -- pytest-2.5.2
collected 35 items / 1 skipped

tests/test_both.py .
tests/test_lexer.py ...
tests/test_parser.py ..........
tests/test_parsergenerator.py FF.......
tests/test_tokens.py ...
tests/test_utils.py .....
tests/test_warnings.py ....

========================== FAILURES
==========================
_________________________ TestParserGenerator.test_simple
_________________________
self = <tests.test_parsergenerator.TestParserGenerator object at 0x7fcbc174e850>

    def test_simple(self):
        pg = ParserGenerator(["VALUE"])

        @pg.production("main : VALUE")
        def main(p):
            return p[0]

        parser = pg.build()

>       assert parser.lr_table.lr_action == [
            {"VALUE": 2},
            {"$end": 0},
            {"$end": -1},
        ]
E       assert [{'VALUE': 1}..., {'$end': 0}] == [{'VALUE': 2},... {'$end': -1}]
E         At index 0 diff: {'VALUE': 1} != {'VALUE': 2}

tests/test_parsergenerator.py:19: AssertionError
______________________ TestParserGenerator.test_empty_production
______________________

self = <tests.test_parsergenerator.TestParserGenerator object at 0x7fcbbeac9b50>
    def test_empty_production(self):
        pg = ParserGenerator(["VALUE"])

        @pg.production("main : values")
        def main(p):
            return p[0]

        @pg.production("values : VALUE values")
        def values_value(p):
            return [p[0]] + p[1]

        @pg.production("values :")
        def values_empty(p):
            return []

        parser = pg.build()
>       assert parser.lr_table.lr_action == [
            {"$end": -3, "VALUE": 3},
            {"$end": 0},
            {"$end": -1},
            {"$end": -3, "VALUE": 3},
            {"$end": -2},
        ]
E       assert [{'$end': -3,... {'$end': -2}] == [{'$end': -3, ... {'$end': -2}]
E         At index 0 diff: {'VALUE': 1, '$end': -3} != {'VALUE': 3, '$end': -3}

with system Python 3.4.0

tests/test_parsergenerator.py FF.......

__ TestParserGenerator.test_simple __

as for Python 3.3.5

__ TestParserGenerator.test_empty_production __

>       assert parser.lr_table.lr_action == [
            {"$end": -3, "VALUE": 3},
            {"$end": 0},
            {"$end": -1},
            {"$end": -3, "VALUE": 3},
            {"$end": -2},
        ]
E       assert [{'$end': -3,... {'$end': -2}] == [{'$end': -3, ... {'$end': -2}]
E         At index 0 diff: {'VALUE': 2, '$end': -3} != {'$end': -3, 'VALUE': 3}

tests/test_parsergenerator.py:41: AssertionError

Can you replicate? py3.2 passes fine. These may be python minor version sensitive.
Do you require anything further?

How to setup productions for optional (0 or more)

With the following EBNF:

start = MODULE SYMBOL {declaration}

declaration = INCLUDE SYMBOL
                    | VAR SYMBOL expression
                    | FUNC SYMBOL LIST {expression}

It's not clear how to declare the pg.production(...) rules. Basically, how do I indicate 0 or more occurrences of a production?

Any help would be appreciated.

fix latest build error

This seems to be caused by a changed pypy repo URL.

$ wget https://bitbucket.org/pypy/pypy/get/default.tar.bz2 -O `pwd`/../pypy.tar.bz2

--2020-09-12 16:50:53--  https://bitbucket.org/pypy/pypy/get/default.tar.bz2

Resolving bitbucket.org (bitbucket.org)... 18.205.93.2, 18.205.93.0, 18.205.93.1, ...

Connecting to bitbucket.org (bitbucket.org)|18.205.93.2|:443... connected.

HTTP request sent, awaiting response... 404 Not Found

2020-09-12 16:50:53 ERROR 404: Not Found.

Should the pypy url in .travis.yml be changed to below?

https://foss.heptapod.net/pypy/pypy/-/archive/branch/default/pypy-branch-default.tar.bz2

Chaining multiple functions

Guys, is there documentation somewhere on the best way to parse multiple chained function calls?
For example:

func1(func3(arg1, arg2), arg1)

Supporting Python 3.3.5

Hello!

I am working on implementing the 'yield from' (PEP 380) construct in the baron project. PyPy supports Python 3.3.5, and I wonder if rply will support that version too.

If no one is going to add this in the near future, I can try to improve rply myself.

Best regards,
Mehti

Add support for "|" inside productions

Cleans up the use case where one has many small productions. For example,

@pg.production("prod: TOKEN_1")
# ... <24 lines>
@pg.production("prod: TOKEN_26")
def func(p):
    pass

Would be simplified to

@pg.production("prod: TOKEN_1 | ... | TOKEN_26")
def func(p):
    pass

More specifically, I need this right now, as I need to accept (and throw away) all tokens until I see a newline.

I'd be happy to submit a PR if you guys think it's a valuable feature to have.

pypy fails to translate basic rply example

Using the latest pypy and rply, pypy translation fails when the build method of the parser generator is called. This is run with the example provided on the rply homepage.

[translation:ERROR]  TypeError: ('builtin_enumerate() takes exactly 1 argument (2 given)', <
[translation:ERROR] Occurred processing the following simple_call:
[translation:ERROR]       (KeyError getting at the binding!)
[translation:ERROR]  v1 = simple_call((type enumerate), v0, (1))
[translation:ERROR] In <FunctionGraph of (rply.parsergenerator:96)ParserGenerator.build at 0x25ac550>:
[translation:ERROR] Happened at file /usr/lib/python2.6/site-packages/rply/parsergenerator.py line 99
[translation:ERROR]
[translation:ERROR]             g = Grammar(self.tokens)
[translation:ERROR]
[translation:ERROR] ==>         for level, (assoc, terms) in enumerate(self.precedence, 1):

Parsing IF and IF-ELSE statement.

Hey, I made a parser with these productions:

@self.pg.production('if_statement : IF expression OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO')
        def ifexp(p):
            return If(p[1], p[4])

        @self.pg.production('if_statement : IF expression NEWLINE OPEN_CRO NEWLINE statementlist NEWLINE '
                            'CLOSE_CRO')
        def ifexp2(p):
            return If(p[1], p[5])

        @self.pg.production('if_statement : IF expression OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO '
                            'ELSE OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO')
        def ifelse(p):
            return IfElse(p[1], p[4], p[10])

        @self.pg.production('if_statement : IF expression OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO '
                            'NEWLINE ELSE OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO')
        @self.pg.production('if_statement : IF expression OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO '
                            'ELSE NEWLINE OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO')
        def ifelse2(p):
            return IfElse(p[1], p[4], p[11])

        @self.pg.production('if_statement : IF expression OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO '
                            'NEWLINE ELSE NEWLINE OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO')
        def ifelse3(p):
            return IfElse(p[1], p[4], p[12])

        @self.pg.production('if_statement : IF expression NEWLINE OPEN_CRO NEWLINE statementlist NEWLINE '
                            'CLOSE_CRO ELSE OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO')
        def ifelse4(p):
            return IfElse(p[1], p[5], p[11])

        @self.pg.production('if_statement : IF expression NEWLINE OPEN_CRO NEWLINE statementlist NEWLINE '
                            'CLOSE_CRO NEWLINE ELSE OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO')
        @self.pg.production('if_statement : IF expression NEWLINE OPEN_CRO NEWLINE statementlist NEWLINE '
                            'CLOSE_CRO ELSE NEWLINE OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO')
        def ifelse5(p):
            return IfElse(p[1], p[5], p[12])

        @self.pg.production('if_statement : IF expression NEWLINE OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO '
                            'NEWLINE ELSE NEWLINE OPEN_CRO NEWLINE statementlist NEWLINE CLOSE_CRO')
        def ifelse6(p):
            return IfElse(p[1], p[5], p[13])

But if I parse this code:

a = enter("Votre age : ")
a = int(a)
if a >= 18
{
    show("Vous etes majeur")
}

show(a)

I get an error on show. So I think the parser treats this as an IF-ELSE statement, but it isn't one.

Can you help me?

Pass *args to parser rules

I've been writing a lot of code like this:

@pg.production('expr : expr PLUS expr')
def plus(p):
    left, _, right = p
    ...

I was thinking that it might be nicer if the rule functions were passed several args instead, so the former would become:

@pg.production('expr : expr PLUS expr')
def plus(*p):
    left, _, right = p
    ....

And this would become possible:

@pg.production('expr : expr PLUS expr')
def plus(left, op, right):
    ...

Sadly _ can't be used more than once in an argument list, but there's always *p for token-heavy rules.
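For what it's worth, this shape is achievable today without changing rply: a small adapter decorator can spread the list into named parameters. unpacked below is a made-up helper, not part of rply:

```python
import functools

def unpacked(fn):
    """Adapt a handler written with named parameters to the
    single-list calling convention: wrapper(p) -> fn(*p)."""
    @functools.wraps(fn)
    def wrapper(p):
        return fn(*p)
    return wrapper

# With rply this would be stacked under the production decorator:
#   @pg.production('expr : expr PLUS expr')
#   @unpacked
#   def plus(left, op, right): ...
@unpacked
def plus(left, op, right):
    return (left, right)
```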

Missing LALRError Exception class

Patch:

diff --git a/rply/errors.py b/rply/errors.py
index 1844c5f..ad88ce3 100644
--- a/rply/errors.py
+++ b/rply/errors.py
@@ -2,6 +2,10 @@ class ParserGeneratorError(Exception):
     pass


+class LALRError(Exception):
+    pass
+
+
 class LexingError(Exception):
     """
     Raised by a Lexer, if no rule matches.
diff --git a/rply/parsergenerator.py b/rply/parsergenerator.py
index 407ff82..aa7961f 100644
--- a/rply/parsergenerator.py
+++ b/rply/parsergenerator.py
@@ -8,7 +8,7 @@ import sys
 import tempfile
 import warnings

-from rply.errors import ParserGeneratorError, ParserGeneratorWarning
+from rply.errors import LALRError, ParserGeneratorError, ParserGeneratorWarning
 from rply.grammar import Grammar
 from rply.parser import LRParser
 from rply.utils import IdentityDict, Counter, iteritems, itervalues
@@ -331,7 +331,6 @@ class LRTable(object):
                                     else:
                                         chosenp, rejectp = oldp, pp
                                     rr_conflicts.append((st, repr(chosenp), repr(rejectp)))
-                                else:
                                     raise LALRError("Unknown conflict in state %d" % st)
                             else:
                                 st_action[a] = -p.number

New release

Could we get a new release? It would be nice to get #84 for Hy.

Getting KeyError: 'var_value'

I am writing some code with rply and got a KeyError: 'var_value' when I ran it. The full error is listed below:

Traceback (most recent call last):
  File "/Users/User1/Project1/run.py", line 13, in <module>
    parser = pg.get_parser()
  File "/Users/User1/Project1/parser.py", line 74, in get_parser
    return self.pg.build()
  File "/usr/local/lib/python2.7/site-packages/rply/parsergenerator.py", line 172, in build
    g.compute_first()
  File "/usr/local/lib/python2.7/site-packages/rply/grammar.py", line 148, in compute_first
    for p in self.prod_names[n]:
KeyError: 'var_value'

Here is the snippet of my code where the error seems to have happened:

@self.pg.production('statement : VARIABLE var_name EQUAL var_value SEMI_COLON')

Why am I getting the KeyError: 'var_value' error? Is there something I can do to fix it?

no way to disable caching

There isn't any way to disable caching.
The README seems to imply that caching is only enabled when you supply cache_id, but this is not quite the case: if you don't provide cache_id, RPly generates a random cache id for you. This means the parser tables are stored on disk but most likely never read back, which seems wasteful.

precedence on productions

Is it possible to add precedence directly on productions?

I'm writing a parser for a language that reuses the ! symbol for both factorial (postfix) and negation (prefix). I'd like to be able to resolve the conflict with precedence but it seems like it's only possible to add precedence to tokens.

Example

I want 2 ! 3 to parse as Times[Factorial[2], 3] not Times[2, Not[3]]

What is the equivalent of PLY's "def t_FOO(t): ..."?

In a PLY lexer, I can implement certain weird things such as case-insensitive keywords by defining a function with the same name as I'd normally give the string variable containing the regexp for that token.

For example:

from ply import lex

tokens = ("GREET", "FIGHT", "WORD")
reserved = ("GREET", "FIGHT")

t_ignore = ' +'

def t_error(t):
    raise ValueError("oh noooo")

def t_WORD(t):
    "[a-zA-Z]+"
    upper = t.value.upper()
    if upper in reserved:
        t.value = upper
        t.type = upper
    return t

lexer = lex.lex()
lexer.input("grEEt samuel FIGHT tomato greet potato FIght pOEtRY")
for token in lexer:
    print token

#LexToken(GREET,'GREET',1,0)
#LexToken(WORD,'samuel',1,6)
#LexToken(FIGHT,'FIGHT',1,13)
#LexToken(WORD,'tomato',1,19)
#LexToken(GREET,'GREET',1,26)
#LexToken(WORD,'potato',1,32)
#LexToken(FIGHT,'FIGHT',1,39)
#LexToken(WORD,'pOEtRY',1,45)

I can't find anything in rply's documentation that explains how to do the equivalent of defining t_WORD as a function in the above program. Nor can I find anything that indicates that it can't be done.

Exception on $end Token

I'm getting this exception:

/x/parser.py:53: ParserGeneratorWarning: 4 shift/reduce conflicts
  return self.pg.build()
Traceback (most recent call last):
  File "core.py", line 33, in <module>
    parser.parse(tokens).eval()
  File "/usr/local/lib/python3.7/site-packages/rply/parser.py", line 60, in parse
    self.error_handler(lookahead)
  File "/x/parser.py", line 50, in error_handle
    "Ran into a %s where it wasn't expected" % token.gettokentype())
ValueError: Ran into a $end where it wasn't expected

When attempting to parse:

text_input = "print(4 + 4 - 2);"

Given this token list:

Token('PRINT', 'print')
Token('OPEN_PAREN', '(')
Token('NUMBER', '4')
Token('SUM', '+')
Token('NUMBER', '4')
Token('SUB', '-')
Token('NUMBER', '2')
Token('CLOSE_PAREN', ')')
Token('SEMI_COLON', ';')

And this parser:

from rply import ParserGenerator
from ast import Number, Sum, Sub, Print


class Parser():
    def __init__(self, mlex):
        # A list of all token names accepted by the parser.
        self.pg = ParserGenerator(mlex.get_tokens())

    def parse(self):
        @self.pg.production(
            'program : PRINT OPEN_PAREN expression CLOSE_PAREN SEMI_COLON')
        def program(p):
            return Print(p[2])

        @self.pg.production('expression : expression SUM expression')
        @self.pg.production('expression : expression SUB expression')
        def expression(p):
            left = p[0]
            right = p[2]
            operator = p[1]
            if operator.gettokentype() == 'SUM':
                return Sum(left, right)
            elif operator.gettokentype() == 'SUB':
                return Sub(left, right)

        @self.pg.production('expression : NUMBER')
        def number(p):
            return Number(p[0].value)

        @self.pg.error
        def error_handle(token):
            raise ValueError(
                "Ran into a %s where it wasn't expected" % token.gettokentype())

    def get_parser(self):
        return self.pg.build()

can't lex byte strings on Python 3

I wanted a Python 3 lexer that consumes byte strings, but this doesn't seem possible with LexerGenerator. For example, for this test program:

from rply import LexerGenerator
lg = LexerGenerator()
lg.add('NUMBER', br'\d+')
lg.add('ADD', br'\+')
lg.ignore(br'\s+')
lexer = lg.build()
for token in lexer.lex(b'1 + 1'):
    print(token)

you get:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    for token in lexer.lex(b'1 + 1'):
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 56, in __next__
    return self.next()
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 46, in next
    colno = self._update_pos(match)
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 27, in _update_pos
    self._lineno += self.s.count("\n", match.start, match.end)
TypeError: a bytes-like object is required, not 'str'

(I ended up writing my own lexer for unrelated reasons, so this is not a show-stopper for me, but I thought you might want to fix it.)

rply fails in new release of pypy

There were some changes in pypy's regular expression engine, and rply stopped working after commit ac140c11bea3.

Error:

[translation:info] 2.7.10 (5.1.2+dfsg-1~16.04, Jun 16 2016, 17:37:42)
[PyPy 5.1.2 with GCC 5.3.1 20160413]
[platform:msg] Set platform with 'host' cc=None, using cc='gcc', version='Unknown'
[translation:info] Translating target as defined by src/tinySelf/target
[translation] translate.py configuration:
[translation] [translate]
    targetspec = src/tinySelf/target
[translation] translation configuration:
[translation] [translation]
    gc = incminimark
    gctransformer = framework
    list_comprehension_operations = True
    withsmallfuncsets = 5
[translation:info] Annotating&simplifying...
[2c] {translation-task
starting annotate
[translation:info] with policy: rpython.annotator.policy.AnnotatorPolicy
.............[60] translation-task}

[Timer] Timings:
[Timer] annotate                       --- 3.4 s
[Timer] ========================================
[Timer] Total:                         --- 3.4 s
[translation:info] Error:
   File "/home/bystrousak/Plocha/tests/pypy/rpython/translator/goal/translate.py", line 318, in main
    drv.proceed(goals)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/translator/driver.py", line 554, in proceed
    result = self._execute(goals, task_skip = self._maybe_skip())
   File "/home/bystrousak/Plocha/tests/pypy/rpython/translator/tool/taskengine.py", line 114, in _execute
    res = self._do(goal, taskcallable, *args, **kwds)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/translator/driver.py", line 278, in _do
    res = func()
   File "/home/bystrousak/Plocha/tests/pypy/rpython/translator/driver.py", line 315, in task_annotate
    s = annotator.build_types(self.entry_point, self.inputtypes)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 92, in build_types
    return self.build_graph_types(flowgraph, inputs_s, complete_now=complete_now)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 140, in build_graph_types
    self.complete()
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 229, in complete
    self.complete_pending_blocks()
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 224, in complete_pending_blocks
    self.processblock(graph, block)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 398, in processblock
    self.flowin(graph, block)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 501, in flowin
    self.consider_op(op)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 653, in consider_op
    resultcell = op.consider(self)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/flowspace/operation.py", line 104, in consider
    return spec(annotator, *self.args)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/flowspace/operation.py", line 189, in specialized
    return impl(*[annotator.annotation(x) for x in other_args])
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/unaryop.py", line 949, in simple_call
    return self.analyser(self.s_self, *args)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/tool/descriptor.py", line 18, in __call__
    return self.im_func(firstarg, *args, **kwds)
   File "/usr/lib/pypy/dist-packages/rply/lexergenerator.py", line 141, in method_matches
    model.SomeInteger(nonneg=True),
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/bookkeeper.py", line 572, in emulate_pbc_call
    return self.pbc_call(pbc, args, emulated=emulated)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/bookkeeper.py", line 535, in pbc_call
    results.append(desc.pycall(whence, args, s_previous_result, op))
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/description.py", line 284, in pycall
    inputcells = self.parse_arguments(args)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/description.py", line 269, in parse_arguments
    (self.name, e.getmsg()))
[translation:ERROR] AnnotatorError: 

signature mismatch: __init__() takes exactly 5 arguments (6 given)


Occurred processing the following simple_call:
      (AttributeError getting at the binding!)
    match_0 = simple_call(v0, v1, v2)

In <FunctionGraph of (rply.lexer:34)LexerStream.next at 0x5177168>:
Happened at file /usr/lib/pypy/dist-packages/rply/lexer.py line 43

==>             match = rule.matches(self.s, self.idx)
                if match:

Known variable annotations:
 v0 = SomeBuiltinMethod(analyser=<rpython.tool.descriptor.InstanceMethod object at 0x0000000005d69830>, methodname='matches', s_self=SomeRule())
 v1 = SomeChar(const='1', no_nul=True)
 v2 = SomeInteger(const=0, knowntype=int, nonneg=True, unsigned=False)

Processing block:
 block@131[rule_0...] is a <class 'rpython.flowspace.flowcontext.SpamBlock'> 
 in (rply.lexer:34)LexerStream.next 
 containing the following operations: 
       v0 = getattr(rule_0, ('matches')) 
       v1 = getattr(self_0, ('s')) 
       v2 = getattr(self_0, ('idx')) 
       match_0 = simple_call(v0, v1, v2) 
       v3 = bool(match_0) 
 --end--
[translation] start debugger...
> /home/bystrousak/Plocha/tests/pypy/rpython/annotator/description.py(269)parse_arguments()
-> (self.name, e.getmsg()))

IRC:

20:05 < RemoteFox> I have a slightly lame question - which of the branches are stable?
20:05 < cfbolz> RemoteFox: default (for CPython 2.7 compat) and py3.5
20:05 < RemoteFox> I was building my rpython project against the default, but it now can not translate
20:05 < cfbolz> what's the error?
20:07 < RemoteFox> https://gist.github.com/Bystroushaak/2913687c0bcac672bba9ca58cb3d5d18
20:07 < RemoteFox> last revision where it works is 94157
20:08 < cfbolz> RemoteFox: are you sure that's not a change in rply?
20:09 < RemoteFox> when I update the repo to the 94157, it translates
20:09 < RemoteFox> when I then swicht to the latest default, it doesn't
20:10 < RemoteFox> *switch
20:11 < RemoteFox> it may be a bug in rply, but I find strange that it would manifest itself between two revisons of pypy / rpython
20:12 < cfbolz> annoying
20:12 < cfbolz> no, I have a vague memory that we changed some stuff in the regular expression engine of rpython
20:12 < cfbolz> maybe that's the problem
20:13 -!- nunatak [~nunatak@unaffiliated/nunatak] has joined #pypy
20:13 < cfbolz> RemoteFox: can you see whether it is broken in ac140c11bea3
20:14 < cfbolz> and whether it works in b437cad15ce6
20:14 < RemoteFox> yeah, it is broken in ac140c11bea3
20:15 < RemoteFox> and yeah, works in b437cad15ce6
20:15 < cfbolz> ok, so I fear somebody needs to fix rply
20:16 < cfbolz> (it was really a severe miscompilation in the regular expression engine, which we couldn't easily fix without an API change)
20:16 < RemoteFox> oh, in which commits?
20:17 < cfbolz> ac140c11bea3
20:17 < RemoteFox> maybe I can just update my lexer's regexps
20:17 < RemoteFox> I will create a issue in rply
20:17 < RemoteFox> but I have my doubts whether it will be fixed
20:18 < cfbolz> I'll take a look
20:18 < cfbolz> why? rply is essentially dead?
20:19 < RemoteFox> I don't know, I can see some commits from this month, but there has been no reaction on my other issue
20:19 < cfbolz> Alex_Gaynor: you around?
20:19 <@Alex_Gaynor> cfbolz: yes
20:20 < cfbolz> Alex_Gaynor: I guess rply is not really on your agenda much, right?
20:20 <@Alex_Gaynor> cfbolz: uhh, I haven't spent a lot of time on it lately *reads scrollback*, it looks like rsre's API changed and I hven't fixed rply for it
20:21 < cfbolz> Alex_Gaynor: I can try to take a look
20:21 <@Alex_Gaynor> cfbolz: that'd be awesome, I'm happy to review/merge a PR if you have the time to figure out how the rsre API changed
20:22 < cfbolz> Alex_Gaynor: I did the change, so I should be able to ;-)
20:22 <@Alex_Gaynor> haha, perfect :D
20:22 < cfbolz> yeah, I see the test failures
20:22 < cfbolz> let me see 
20:49 < cfbolz> RemoteFox: I'll probably won't finish tonight, can you please file a bug and cc me? 
20:49 < cfbolz> So I don't forget 

Highlighting @cfbolz as requested.

Better warnings for conflicts

It would be nice if conflicts would report something a little more actionable than just the fact that a conflict exists. For example, reporting a relevant LRItem provides a clue, at least.

unbounded recursion in LexerStream.next()

This test program

from rply import LexerGenerator
lg = LexerGenerator()
lg.ignore(r'\s')
for token in lg.build().lex(' ' * 1000):
    pass

makes the Python interpreter sad:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    for token in lg.build().lex(' ' * 1000):
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 56, in __next__
    return self.next()
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 41, in next
    return self.next()
...
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 41, in next
    return self.next()
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 38, in next
    match = rule.matches(self.s, self.idx)
  File "/usr/lib/python3/dist-packages/rply/lexergenerator.py", line 33, in matches
    return Match(*m.span(0)) if m is not None else None
RecursionError: maximum recursion depth exceeded
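The recursion comes from next() calling itself after each ignored match, so long ignorable runs grow the stack linearly. The standard fix is to turn that tail call into a loop; the stand-alone skeleton below illustrates the iterative shape (it is not rply's actual code):

```python
import re

def lex(s, rules, ignores):
    """Iterative lexing loop: skipping an ignored match advances idx
    and loops instead of recursing, so ' ' * 1000 cannot blow the
    stack. rules is a list of (name, regex) pairs; illustrative only."""
    idx = 0
    out = []
    while idx < len(s):
        for pat in ignores:
            m = re.compile(pat).match(s, idx)
            if m and m.end() > idx:  # guard against zero-width loops
                idx = m.end()
                break
        else:
            for name, pat in rules:
                m = re.compile(pat).match(s, idx)
                if m:
                    out.append((name, m.group()))
                    idx = m.end()
                    break
            else:
                raise ValueError("no rule matches at index %d" % idx)
    return out

tokens = lex(" " * 1000 + "12", [("NUMBER", r"\d+")], [r"\s"])
```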

meaning of "Production 'xxx' is not reachable"?

Take the test case for example:

def test_unused_production(self):

Below are three cases with changed unused production rule:

case 1 -- No Warning

        @pg.production("main : VALUE")
        def main(p):
            return p[0]

        @pg.production("unused : main") 
        def unused(p):
            pass

case 2 -- Show Warning

        @pg.production("main : VALUE")
        def main(p):
            return p[0]

        @pg.production("unused : OTHER main")
        def unused(p):
            pass

case 3 -- No Warning

        @pg.production("main : VALUE")
        def main(p):
            return p[0]

        @pg.production("unused : VALUE main")
        def unused(p):
            pass

Now I'm quite puzzled about the meaning of this warning.

rply 0.7.1 uses os.getuid() which does not exist on Windows

The recent commit fc9bbcd uses os.getuid(), which is only available on Unix systems. On Windows, calling ParserGenerator.build() results in an AttributeError:

>>> from rply import ParserGenerator
>>> pg = ParserGenerator(["VALUE"])
>>> @pg.production("main : VALUE")
... def main(p):
...     return p[0]
...
>>> parser = pg.build()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python3\lib\site-packages\rply-0.7.1-py3.3.egg\rply\parsergenerator.py", line 128, in build
AttributeError: 'module' object has no attribute 'getuid'
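
A portable fix would need a fallback for platforms without os.getuid(). The helper below is a sketch, not rply's actual fix; the function name and the USERNAME fallback are illustrative assumptions:

```python
import os

def cache_owner_tag():
    # os.getuid() exists only on Unix. On Windows, fall back to an
    # environment-derived identifier so per-user cache paths still
    # differ between users (hypothetical helper, not rply's code).
    if hasattr(os, "getuid"):
        return str(os.getuid())
    return os.environ.get("USERNAME", "anonymous")
```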

LexerGenerator has no error handler hooks

I can define a custom error handler for ParserGenerator with the ParserGenerator.error decorator, but if there's a lexing error I have to live with the default Python exception handling behaviour.

Looking at the code, I see the else clause in LexerStream.next just raises a LexingError carrying a useless SourcePosition with both lineno and colno set to -1, so even if I were to catch the LexingError I wouldn't get any useful information.
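
As a workaround, line and column can be recovered by hand, assuming the raw character index on the error's SourcePosition is populated (only lineno/colno appear to be -1). A minimal sketch:

```python
def line_and_col(source, idx):
    # Recover a 1-based (line, column) pair from a raw character
    # index into the source string, e.g. the idx field of the
    # SourcePosition attached to a LexingError.
    line = source.count("\n", 0, idx) + 1
    col = idx - (source.rfind("\n", 0, idx) + 1) + 1
    return line, col
```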

Lexing error being thrown causes Segmentation Fault

I have a simple calculator written in RPython. If a lexing error is thrown, the program segfaults. It works fine under CPython and PyPy, however. Furthermore, if after a valid token, there is an invalid token, the invalid token and everything that follows it is completely ignored, rather than throwing a ParsingError.

The problem only occurs with RPython. I have the latest checkout (i.e. I just updated it right now).

Example:

[screenshot attached: 2013-11-22 15:14:10]

I am on Ubuntu Precise.

translation failure with rpython in default branch of pypy but success with py3.6 branch

While trying to translate cycy, I first used the rpython from the default branch of a cloned pypy source tree, which ended in this error:

(py27) Xuans-MBP:cycy xuanwu$ python ~/git/pypy/rpython/bin/rpython cycy/target.py
[translation:info] 2.7.17 |Anaconda, Inc.| (default, Oct 21 2019, 14:10:59) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
[platform:msg] Set platform with 'host' cc=None, using cc='clang -arch x86_64', version='Unknown'
[translation:info] Translating target as defined by cycy/target
/Users/xuanwu/git/cycy/cycy/parser/core.py:507: ParserGeneratorWarning: 184 shift/reduce conflicts
  _parser = _pg.build()
/Users/xuanwu/git/cycy/cycy/parser/core.py:507: ParserGeneratorWarning: 30 reduce/reduce conflicts
  _parser = _pg.build()
[translation] translate.py configuration:
[translation] [translate]
    targetspec = cycy/target
[translation] translation configuration:
[translation] [translation]
    gc = incminimark
    gctransformer = framework
    list_comprehension_operations = True
    withsmallfuncsets = 5
[translation:info] Annotating&simplifying...
[f7] {translation-task
starting annotate
[translation:info] with policy: rpython.annotator.policy.AnnotatorPolicy
.....++++++++++++++****************************%%%%%%%%%%%%%%%%%%%%%###%########%%%#####################################%%%%%%%%%%%%###%***************
..++++++++++++++****[1b6] translation-task}

[Timer] Timings:
[Timer] annotate                       --- 3.5 s
[Timer] ========================================
[Timer] Total:                         --- 3.5 s
[translation:info] Error:
   File "/Users/xuanwu/git/pypy/rpython/translator/goal/translate.py", line 318, in main
    drv.proceed(goals)
   File "/Users/xuanwu/git/pypy/rpython/translator/driver.py", line 555, in proceed
    result = self._execute(goals, task_skip = self._maybe_skip())
   File "/Users/xuanwu/git/pypy/rpython/translator/tool/taskengine.py", line 114, in _execute
    res = self._do(goal, taskcallable, *args, **kwds)
   File "/Users/xuanwu/git/pypy/rpython/translator/driver.py", line 278, in _do
    res = func()
   File "/Users/xuanwu/git/pypy/rpython/translator/driver.py", line 315, in task_annotate
    s = annotator.build_types(self.entry_point, self.inputtypes)
   File "/Users/xuanwu/git/pypy/rpython/annotator/annrpython.py", line 92, in build_types
    return self.build_graph_types(flowgraph, inputs_s, complete_now=complete_now)
   File "/Users/xuanwu/git/pypy/rpython/annotator/annrpython.py", line 140, in build_graph_types
    self.complete()
   File "/Users/xuanwu/git/pypy/rpython/annotator/annrpython.py", line 229, in complete
    self.complete_pending_blocks()
   File "/Users/xuanwu/git/pypy/rpython/annotator/annrpython.py", line 224, in complete_pending_blocks
    self.processblock(graph, block)
   File "/Users/xuanwu/git/pypy/rpython/annotator/annrpython.py", line 398, in processblock
    self.flowin(graph, block)
   File "/Users/xuanwu/git/pypy/rpython/annotator/annrpython.py", line 501, in flowin
    self.consider_op(op)
   File "/Users/xuanwu/git/pypy/rpython/annotator/annrpython.py", line 653, in consider_op
    resultcell = op.consider(self)
   File "/Users/xuanwu/git/pypy/rpython/flowspace/operation.py", line 104, in consider
    return spec(annotator, *self.args)
   File "/Users/xuanwu/git/pypy/rpython/annotator/unaryop.py", line 118, in simple_call_SomeObject
    return s_func.call(argspec)
   File "/Users/xuanwu/git/pypy/rpython/annotator/unaryop.py", line 978, in call
    return bookkeeper.pbc_call(self, args)
   File "/Users/xuanwu/git/pypy/rpython/annotator/bookkeeper.py", line 535, in pbc_call
    results.append(desc.pycall(whence, args, s_previous_result, op))
   File "/Users/xuanwu/git/pypy/rpython/annotator/classdesc.py", line 732, in pycall
    s_init.call(args)
   File "/Users/xuanwu/git/pypy/rpython/annotator/unaryop.py", line 978, in call
    return bookkeeper.pbc_call(self, args)
   File "/Users/xuanwu/git/pypy/rpython/annotator/bookkeeper.py", line 535, in pbc_call
    results.append(desc.pycall(whence, args, s_previous_result, op))
   File "/Users/xuanwu/git/pypy/rpython/annotator/description.py", line 284, in pycall
    inputcells = self.parse_arguments(args)
   File "/Users/xuanwu/git/pypy/rpython/annotator/description.py", line 269, in parse_arguments
    (self.name, e.getmsg()))
[translation:ERROR] AnnotatorError: 

signature mismatch: __init__() takes exactly 4 arguments (5 given)


Occurred processing the following simple_call:
  function StrMatchContext.__init__ </Users/xuanwu/git/pypy/rpython/rlib/rsre/rsre_core.py, line 256> returning

    ctx_0 = simple_call((type StrMatchContext), s_0, pos_0, v6, v7)

In <FunctionGraph of (rply.lexergenerator:30)matches at 0x10911fb10>:
Happened at file /opt/miniconda3/envs/py27/lib/python2.7/site-packages/rply/lexergenerator.py line 36

==>             ctx = rsre_core.StrMatchContext(s, pos, len(s), self.flags)
    
                matched = rsre_core.match_context(ctx, self._pattern)
                if matched:

Known variable annotations:
 s_0 = SomeString(no_nul=True)
 pos_0 = SomeInteger(const=0, knowntype=int, nonneg=True, unsigned=False)
 v6 = SomeInteger(knowntype=int, nonneg=True, unsigned=False)
 v7 = SomeInteger(const=0, knowntype=int, nonneg=True, unsigned=False)

Processing block:
 block@101[pos_0...] is a <class 'rpython.flowspace.flowcontext.SpamBlock'> 
 in (rply.lexergenerator:30)matches 
 containing the following operations: 
       v6 = len(s_0) 
       v7 = getattr(self_0, ('flags')) 
       ctx_0 = simple_call((type StrMatchContext), s_0, pos_0, v6, v7) 
       v8 = getattr(self_0, ('_pattern')) 
       matched_0 = simple_call((function match_context), ctx_0, v8) 
       v9 = bool(matched_0) 
 --end--
[translation] start debugger...
> /Users/xuanwu/git/pypy/rpython/annotator/description.py(269)parse_arguments()
-> (self.name, e.getmsg()))

Then I switched to the py3.6 branch with $ hg update py3.6 in the pypy source tree, and the translation succeeded.

Being new to both pypy and rply, I wonder what differs in rpython between these two branches?

Thanks.

Allow additional state in Parser

It would be useful if an object could be passed to the parser that is then passed on to every production and to the error handler.

Such an object could be used to provide information about the current file being parsed or to maintain additional state within the parser.
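
Until the parser itself supports this, shared state can be bound into the production callbacks and the error handler through a closure. The names below (ParserState, bind_productions) are illustrative, not part of rply:

```python
class ParserState(object):
    # Illustrative per-parse context: the file being parsed plus
    # any errors accumulated along the way.
    def __init__(self, filename):
        self.filename = filename
        self.errors = []

def bind_productions(state):
    # Close over the state so every callback sees the same object,
    # which is the effect the requested feature would provide.
    def main(p):
        return p[0]

    def on_error(token):
        state.errors.append((state.filename, token))

    return main, on_error
```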

Implement indentation-sensitive grammar

This paper presents a formalism for expressing indentation-sensitive grammars in GLR and LR(k) parsers, which the authors claim generalises easily to various other parsers, including LALR.

Would it be possible to extend the production syntax accepted by ParserGenerator to accept this and DTRT? I'd prefer not to insert a hack between the lexing and parsing stages of my compiler to convert whitespace into INDENT/DEDENT tokens.
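
For context, the token-filter workaround being avoided here looks roughly like the sketch below: a pass between lexing and parsing that compares each line's leading whitespace against a stack of indent widths and emits synthetic INDENT/DEDENT tokens (token names and the simple tuple representation are illustrative):

```python
def insert_indent_tokens(lines):
    # Convert leading spaces on each line into INDENT/DEDENT tokens,
    # the classic workaround for LR parsers that cannot express
    # indentation sensitivity in the grammar itself.
    stack = [0]
    tokens = []
    for line in lines:
        stripped = line.lstrip(" ")
        width = len(line) - len(stripped)
        if width > stack[-1]:
            stack.append(width)
            tokens.append(("INDENT", width))
        while width < stack[-1]:
            stack.pop()
            tokens.append(("DEDENT", width))
        tokens.append(("LINE", stripped))
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", 0))
    return tokens
```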

Weird SourcePosition for \n characters

It seems that \n characters get the line number of the next line, but a column number that corresponds to the last column seen on the previous line, plus 1. I.e.:

Token('NAME', 'def') 1 1
Token('NAME', 'main') 1 5
Token('LPAR', '(') 1 9
Token('RPAR', ')') 1 10
Token('NL', '\n') 2 11
Token('TABS', '\t') 2 1
Token('NAME', 'pass') 2 2

I'd argue that it should either be the last character of the previous line (e.g. 1, 11 in this example) or the first character of the next line (that would be 2, 1 here), but not some weird amalgam of both.

How to read the commands for all lines?

I have a file with the following commands:

print(if 1 == 1)
print(if 1 == 1)

When my program reads the first line it returns True, but on the second line it raises a ValueError. This is because my program reads only the first line. Is there any way to read all the lines?

What is the $end token?

I have used the RPLY library to write a compiler and one error seems to be cropping up the most. That error is the following:

ValueError: Token('$end', '$end')

What is the $end token and how can I fix the error?

How would you parse a standard if - else if - else statement?

I tried quite a few things and still did not find a way to do this.

It seems to me as if the parser desperately tries to follow one path and, when it fails, simply stops instead of looking for another.

Here is my current implementation of a parser:

        @self.pg.production('file : ')
        @self.pg.production('file : expression_seq')

        @self.pg.production('block : INDENT expression_seq DEDENT')

        @self.pg.production('expression_seq : expression')
        @self.pg.production('expression_seq : expression NEWLINE expression_seq')

        @self.pg.production('else_clause : else NEWLINE block')

        @self.pg.production('else_if_clause : else_if expression NEWLINE block')

        @self.pg.production('else_if_clause_seq : else_if_clause')
        @self.pg.production('else_if_clause_seq : else_if_clause NEWLINE else_if_clause_seq')

        @self.pg.production('expression : if expression NEWLINE block')
        @self.pg.production('expression : if expression NEWLINE block NEWLINE else_if_clause_seq')
        @self.pg.production('expression : if expression NEWLINE block NEWLINE else_clause')
        @self.pg.production('expression : if expression NEWLINE block NEWLINE else_if_clause_seq NEWLINE else_clause')

        @self.pg.production('expression : INTEGER')

        @self.pg.production('expression : false')
        @self.pg.production('expression : true')

Here is the grammar:

file = [ expression_seq ] ;
expression_seq = expression , { NEWLINE , expression } ;
block = INDENT , expression_seq , DEDENT ;
expression = if | INTEGER | 'false' | 'true' ;
if = 'if' , expression , NEWLINE , block , { NEWLINE , else_if_clause_seq } , [ NEWLINE , else_clause ] ;
else_clause = 'else' , block ;
else_if_clause = 'else if' , expression , NEWLINE , block ;
else_if_clause_seq = else_if_clause , { NEWLINE , else_if_clause } ;

Is there something wrong with my rules? How would you implement such a (common) grammar?

ParserGenerator creates cache files non-atomically

RPLY doesn't create cache files atomically.
Therefore it's possible for one ParserGenerator to read a cache file that has been created but not yet fully written by another ParserGenerator.
Here's a simple reproducer, which tries to create two grammars in parallel:

import concurrent.futures
import random

import rply

def build_grammar():
    pg = rply.ParserGenerator(['VALUE'], cache_id=cache_id)
    @pg.production('main : VALUE')
    def main(p):
        return p[0]
    pg.build()
    return pg.build()

while True:
    cache_id = 'simple-' + ''.join(str(random.randint(0, 9)) for x in range(1, 10))
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as tpe:
        fu1 = tpe.submit(build_grammar)
        fu2 = tpe.submit(build_grammar)
        print(fu1.result(), fu2.result())

Sooner or later it fails with:

Traceback (most recent call last):
  File "parallel-rply.py", line 19, in <module>
    print(fu1.result(), fu2.result())
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 395, in result
    return self.__get_result()
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 354, in __get_result
    raise self._exception
  File "/usr/lib/python3.4/concurrent/futures/thread.py", line 54, in run
    result = self.fn(*self.args, **self.kwargs)
  File "parallel-rply.py", line 11, in build_grammar
    pg.build()
  File "/usr/lib/python3/dist-packages/rply/parsergenerator.py", line 189, in build
    data = json.load(f)
  File "/usr/lib/python3.4/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.4/json/decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.4/json/decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
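
The standard remedy is to write the cache to a temporary file in the same directory and then rename it over the target, since the rename is atomic. A sketch of such a writer (the function name is illustrative, not rply's API):

```python
import json
import os
import tempfile

def atomic_write_json(path, data):
    # Write to a temp file in the target's directory, then rename.
    # os.replace is atomic, so a concurrent reader sees either the
    # old cache or the complete new one, never a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)
    except BaseException:
        os.remove(tmp)
        raise
```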
