Comments (12)

MegaIng commented on July 16, 2024

See #1371

It's not exactly easy, and for C-like syntax, which is already tricky to parse with lark, it might be quite annoying.

from lark.

jjalowie commented on July 16, 2024

Wow, thanks for a quick response.
I'm thinking about moving on with Earley using the start: /.*/ rule /.*/ approach (my inputs aren't very big, so it should be fine performance-wise).
While experimenting, though, I ran into the following problem:

import lark
grammar = f"""
    start: /.*/ expr /.*/
    expr: "qwerty"
    %import common.WS
    %ignore WS
"""
parser = lark.Lark(grammar, start='start', parser='earley', lexer='dynamic_complete')

ast = parser.parse("""asdf qwerty poiuy""")
print(ast.pretty())

This gives the following error:

[...]
  File "/home/jjalowie/.bin/mlir_parser/venv/lib/python3.10/site-packages/lark/parser_frontends.py", line 215, in create_earley_parser
    return f(lexer_conf, parser_conf, resolve_ambiguity=resolve_ambiguity,
  File "/home/jjalowie/.bin/mlir_parser/venv/lib/python3.10/site-packages/lark/parser_frontends.py", line 192, in create_earley_parser__dynamic
    earley_matcher = EarleyRegexpMatcher(lexer_conf)
  File "/home/jjalowie/.bin/mlir_parser/venv/lib/python3.10/site-packages/lark/parser_frontends.py", line 178, in __init__
    raise GrammarError("Dynamic Earley doesn't allow zero-width regexps", t)
lark.exceptions.GrammarError: ("Dynamic Earley doesn't allow zero-width regexps", TerminalDef('__ANON_0', '.*'))

I'm using lark 1.1.9.

MegaIng commented on July 16, 2024

As it says in the error message, the terminal can't be zero width. Use /.+/? instead.
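To see why /.*/ is rejected while /.+/? is accepted, it helps to check which patterns can match the empty string. The sketch below uses only Python's re module; can_match_empty is a hypothetical helper for illustration, and it only approximates lark's own width check rather than reproducing it.

```python
import re


def can_match_empty(pattern: str) -> bool:
    """Return True if `pattern` can match the empty string (i.e. is zero-width).

    Hypothetical helper for illustration; lark performs its own width check.
    """
    return re.fullmatch(pattern, "") is not None


# `.*` can match nothing at all, which is what the GrammarError complains about.
print(can_match_empty(r".*"))  # True
# `.+` needs at least one character; the grammar-level `?` makes it optional instead.
print(can_match_empty(r".+"))  # False
```

With /.+/? the optionality lives in the grammar rule, so the terminal itself always consumes at least one character.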

jjalowie commented on July 16, 2024

Hm, I can't figure out how to proceed.
Here is an example:

import lark
grammar = f"""
    start: /.+/? expr+ /.+/?
    expr: "qwerty" | "asdf"
"""
parser = lark.Lark(grammar, start='start', parser='earley', lexer='dynamic_complete')
ast = parser.parse("""asdf qwerty""")
print(ast.pretty())
# output:
# start
#   expr
#    qwerty

I can't force asdf to be parsed as an expr because /.+/? greedily consumes it. Any tips on how to cope with that?

MegaIng commented on July 16, 2024

Use ambiguity='explicit' and manually select the correct parse, see https://github.com/lark-parser/lark/blob/master/examples/advanced/dynamic_complete.py

Or choose the easier and quicker route and use the scan function I coded up in the other issue.

jjalowie commented on July 16, 2024

From what I see, MegaIng@4975608 doesn't expose a way to use transformers on the matched code. Could this be improved so I can also use transformers? What steps would be needed?

For now I went with the Earley parser. I'm stuck on the code below. How should it behave?

import lark

grammar = f"""
    start: /.+/? expr+ /.+/?
    expr: "asdf" | "qwerty"
"""

parser = lark.Lark(grammar, parser='earley', lexer='dynamic_complete', ambiguity='explicit')
ast = parser.parse("""asdf qwerty""")
print(ast.pretty())

It yields this output:

start
  expr
   qwerty

Shouldn't there be an ambiguity between /.+/? and expr visible in the AST this time because of ambiguity='explicit'? Isn't this a bug in lark?

MegaIng commented on July 16, 2024

Yeah, that does indeed look like a bug. Maybe @erezsh has an idea of what is going on.

With regard to scan: You have to do a bit of extra work right now, but you get the start and end positions of the segments that match, which you can then replace in the original. This would make a good recipe to add to the documentation when scan gets officially added.

MegaIng commented on July 16, 2024

This function should do what you want using the scan method:

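from typing import Callable, Optional

import lark


def scan_and_replace(parser: lark.Lark, text: str, replacement: Callable[[lark.ParseTree], str],
                     start: Optional[str] = None) -> str:
    """
    Scans the `text` and replaces all matches of `parser` by the value returned by `replacement`
    given the corresponding tree.

    `start` is for passing in the start rule if required, not the starting position of the scanning.
    """
    last = 0
    res = ""
    # `parser.scan` is the scan function from the other issue (#1371)
    for (start_pos, end_pos), tree in parser.scan(text, start=start):
        res += text[last:start_pos]  # copy the unmatched text before this match
        res += replacement(tree)     # substitute the match
        last = end_pos
    res += text[last:]  # keep the unmatched text after the last match
    return res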
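The splice-and-replace pattern can be tried without the scan branch. The sketch below is an assumption-laden stand-in: re.finditer plays the role of the scanner, and splice and regex_spans are hypothetical names, not part of lark. It shows the three pieces that have to be stitched together: the text before each match, the replacement, and the tail after the last match.

```python
import re
from typing import Callable, Iterable, Iterator, Tuple


def splice(text: str, spans: Iterable[Tuple[int, int, str]]) -> str:
    """Rebuild `text`, swapping each (start, end) span for its replacement.

    `spans` must be sorted by position and must not overlap.
    """
    last = 0
    parts = []
    for start_pos, end_pos, new_text in spans:
        parts.append(text[last:start_pos])  # untouched text before the match
        parts.append(new_text)              # replacement for the match
        last = end_pos
    parts.append(text[last:])               # untouched text after the last match
    return "".join(parts)


def regex_spans(pattern: str, text: str,
                replacement: Callable[[str], str]) -> Iterator[Tuple[int, int, str]]:
    """Stand-in for a scanner: yields (start, end, replacement) per regex match."""
    for m in re.finditer(pattern, text):
        yield m.start(), m.end(), replacement(m.group())


text = "x = asdf + qwerty;"
print(splice(text, regex_spans(r"asdf|qwerty", text, str.upper)))
# x = ASDF + QWERTY;
```

With a parser-based scanner, the replacement callback would receive the parse tree for each match instead of the matched string, so a transformer could be applied there.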

jjalowie commented on July 16, 2024

This function should do what you want using the scan method:
[...]

That looks very promising. I will pick up from there. Thanks a lot!

erezsh commented on July 16, 2024

For now I went with the Earley parser. I'm stuck on the code below. How should it behave?

import lark

grammar = f"""
    start: /.+/? expr+ /.+/?
    expr: "asdf" | "qwerty"
"""

It's not a bug: you have a space in your text, and only ANY (the /.+/ terminal) can handle it.

When I change to this grammar:

    !start: any? expr+ any?
    any: /.+/
    !expr: "asdf" | "qwerty"

    %ignore " "

I get -

_ambig
  start
    expr        asdf
    any  qwerty
  start
    expr        asdf
    any qwerty
  start
    expr        asdf
    expr        qwerty

(this is without dynamic_complete, which adds more derivations)
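Given an _ambig tree like the one above, manually selecting a derivation could look roughly like this. This is a minimal sketch, not lark's API: Node is a hypothetical stand-in that only mimics the data and children attributes of lark.Tree so the example runs on its own, and count_rule and pick_derivation are illustrative names. It assumes the ambiguity sits at the root; nested _ambig nodes would need a recursive walk.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Node:
    """Hypothetical stand-in exposing the `data`/`children` attributes of lark.Tree."""
    data: str
    children: List[Union["Node", str]] = field(default_factory=list)


def count_rule(tree, rule: str) -> int:
    """Count subtrees named `rule`, including `tree` itself."""
    if not hasattr(tree, "children"):  # a token / plain string leaf
        return 0
    own = 1 if tree.data == rule else 0
    return own + sum(count_rule(child, rule) for child in tree.children)


def pick_derivation(tree, score: Callable[[object], int]):
    """If the root is `_ambig`, return its best-scoring alternative; else the tree."""
    if getattr(tree, "data", None) != "_ambig":
        return tree
    return max(tree.children, key=score)


# Mirrors the three derivations listed above (two of them look alike when printed).
ambiguous = Node("_ambig", [
    Node("start", [Node("expr", ["asdf"]), Node("any", ["qwerty"])]),
    Node("start", [Node("expr", ["asdf"]), Node("any", ["qwerty"])]),
    Node("start", [Node("expr", ["asdf"]), Node("expr", ["qwerty"])]),
])

# Prefer the derivation that recognizes the most `expr` nodes.
best = pick_derivation(ambiguous, lambda t: count_rule(t, "expr"))
print([child.data for child in best.children])  # ['expr', 'expr']
```

The scoring function is the design choice here: counting expr nodes prefers derivations where as much input as possible is parsed as real expressions rather than swallowed by any.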

MegaIng commented on July 16, 2024

@erezsh No, what actually fixed it is moving /.+/? into a separate rule. These two grammars should have the same behavior, but don't:

grammar = f"""
    start: any? expr+ any?
    any: /.+/
    !expr: "asdf" | "qwerty"
"""
grammar = f"""
    start: /.+/? expr+ /.+/?
    !expr: "asdf" | "qwerty"
"""

The latter doesn't produce any ambiguities with dynamic_complete; the former does.

Similar for your grammar, with dynamic_complete it produces fewer derivations when any is inlined.

erezsh commented on July 16, 2024

I tested @MegaIng's example again on the latest master, and it looks like this bug is fixed!
