Comments (12)
See #1371
It's not exactly easy, and for C-like syntax, which is already tricky to parse with lark, it might be quite annoying.
from lark.
Wow, thanks for a quick response.
I'm thinking about moving on with Earley using the `start: /.*/ rule /.*/` approach (my inputs aren't very big, so it should be fine performance-wise).
While experimenting, though, I ran into the following problem:
import lark
grammar = f"""
start: /.*/ expr /.*/
expr: "qwerty"
%import common.WS
%ignore WS
"""
parser = lark.Lark(grammar, start='start', parser='earley', lexer='dynamic_complete')
ast = parser.parse("""asdf qwerty poiuy""")
print(ast.pretty())
This gives the following error:
[...]
File "/home/jjalowie/.bin/mlir_parser/venv/lib/python3.10/site-packages/lark/parser_frontends.py", line 215, in create_earley_parser
return f(lexer_conf, parser_conf, resolve_ambiguity=resolve_ambiguity,
File "/home/jjalowie/.bin/mlir_parser/venv/lib/python3.10/site-packages/lark/parser_frontends.py", line 192, in create_earley_parser__dynamic
earley_matcher = EarleyRegexpMatcher(lexer_conf)
File "/home/jjalowie/.bin/mlir_parser/venv/lib/python3.10/site-packages/lark/parser_frontends.py", line 178, in __init__
raise GrammarError("Dynamic Earley doesn't allow zero-width regexps", t)
lark.exceptions.GrammarError: ("Dynamic Earley doesn't allow zero-width regexps", TerminalDef('__ANON_0', '.*'))
I'm using lark 1.1.9.
from lark.
As it says in the error message, the terminal can't be zero width. Use `/.+/?` instead.
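To see why `/.*/` is rejected while `/.+/` is not, it can help to check whether a pattern accepts the empty string. The sketch below uses only Python's `re` module. `can_match_empty` is a hypothetical helper, not part of lark, and lark's own width check may be implemented differently.

```python
import re


def can_match_empty(pattern: str) -> bool:
    """Return True if `pattern` accepts the empty string (a zero-width match)."""
    return re.fullmatch(pattern, "") is not None


print(can_match_empty(r".*"))  # True: zero-width, so dynamic Earley refuses it
print(can_match_empty(r".+"))  # False: always consumes at least one character
```

Writing `/.+/?` keeps the terminal itself at least one character wide and moves the optionality into the grammar rule.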
from lark.
Hm, I can't figure out how to proceed.
Here is an example:
import lark
grammar = f"""
start: /.+/? expr+ /.+/?
expr: "qwerty" | "asdf"
"""
parser = lark.Lark(grammar, start='start', parser='earley', lexer='dynamic_complete')
ast = parser.parse("""asdf qwerty""")
print(ast.pretty())
# output:
# start
#   expr
#   qwerty
I can't force `asdf` to be parsed as an `expr` because it gets parsed as `/.+/?` greedily. Any tips on how to cope with that?
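The greediness in question can be illustrated with plain `re`, independent of lark. This is only a model of the idea: it does not reproduce how lark's dynamic lexer works internally, and the variable names are made up for the illustration.

```python
import re

text = "asdf qwerty"

# A greedy `.+` takes as much as it can: here, the whole input.
greedy = re.match(r".+", text).group(0)
print(greedy)  # asdf qwerty

# Every non-empty prefix is also a valid match for `.+`, so a parser that
# considers all of these lengths has many ways to split the input.
prefixes = [text[:i] for i in range(1, len(text) + 1) if re.fullmatch(r".+", text[:i])]
print(len(prefixes))  # 11
```

Because so many splits are valid, a catch-all terminal next to specific rules tends to produce ambiguous parses that have to be resolved somehow.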
from lark.
Use `ambiguity='explicit'` and manually select the correct parse; see https://github.com/lark-parser/lark/blob/master/examples/advanced/dynamic_complete.py
Or take the easier and quicker route and use the `scan` function I coded up in the other issue.
from lark.
From what I can see, MegaIng@4975608 doesn't expose a way to use transformers on the matched code. Could this be improved so I can also use transformers? What steps would be needed?
For now I went with the Earley parser. I'm stuck on the code below. How should it behave?
import lark
grammar = f"""
start: /.+/? expr+ /.+/?
expr: "asdf" | "qwerty"
"""
parser = lark.Lark(grammar, parser='earley', lexer='dynamic_complete', ambiguity='explicit')
ast = parser.parse("""asdf qwerty""")
print(ast.pretty())
It yields this output:
start
  expr
  qwerty
Shouldn't there be an ambiguity between `/.+/?` and `expr` visible in the AST this time, because of `ambiguity='explicit'`? Isn't this a bug in lark?
from lark.
Yeah, that does indeed look like a bug; maybe @erezsh has an idea what is going on.
With regard to `scan`: you have to do a bit of extra work right now, but you get the start and end positions of the segments that match, which you can then replace in the original. This would make a good recipe to add to the documentation when `scan` gets officially added.
from lark.
This function should do what you want using the scan method:
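from typing import Callable, Optional

import lark


def scan_and_replace(parser: lark.Lark, text: str, replacement: Callable[[lark.ParseTree], str],
                     start: Optional[str] = None) -> str:
    """
    Scans the `text` and replaces all matches of `parser` by the value returned by `replacement`
    given the corresponding tree.
    `start` is for passing in the start rule if required, not the starting position of the scanning.
    """
    last = 0
    res = ""
    # `parser.scan` is not officially part of lark yet; it comes from the commit mentioned earlier in the thread.
    for (start_pos, end_pos), tree in parser.scan(text, start=start):
        res += text[last:start_pos]  # copy the unmatched text before this match
        res += replacement(tree)     # substitute the matched segment
        last = end_pos
    res += text[last:]  # keep whatever follows the last match
    return res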
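Since `scan` is not in a released lark yet, the same splice-by-position pattern can be tried with the standard library. The sketch below swaps in `re.finditer` as a stand-in for `parser.scan`. `regex_scan_and_replace` is a hypothetical name, and a regex is of course far less capable than a grammar.

```python
import re
from typing import Callable


def regex_scan_and_replace(pattern: str, text: str,
                           replacement: Callable[[re.Match], str]) -> str:
    """Replace every match of `pattern` in `text` with `replacement(match)`."""
    last = 0
    parts = []
    for match in re.finditer(pattern, text):
        parts.append(text[last:match.start()])  # unmatched text before the match
        parts.append(replacement(match))        # substituted segment
        last = match.end()
    parts.append(text[last:])                   # text after the last match
    return "".join(parts)


result = regex_scan_and_replace(r"qwerty|asdf", "xx asdf yy qwerty zz",
                                lambda m: m.group(0).upper())
print(result)  # xx ASDF yy QWERTY zz
```

With the lark version, the `replacement` callback receives a parse tree instead of a match object, so it could run a transformer on that tree and return the resulting string.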
from lark.
> This function should do what you want using the scan method:
> [...]
That looks very promising. I will pick up from there. Thanks a lot!
from lark.
> For now I went with the Earley parser. I'm stuck on the code below. How should it behave?
>
>     import lark
>     grammar = f"""
>     start: /.+/? expr+ /.+/?
>     expr: "asdf" | "qwerty"
>     """
It's not a bug: you have a space in your text, and only the catch-all `/.+/` terminal can match it.
When I change to this grammar:
!start: any? expr+ any?
any: /.+/
!expr: "asdf" | "qwerty"
%ignore " "
I get:
_ambig
  start
    expr asdf
    any qwerty
  start
    expr asdf
    any qwerty
  start
    expr asdf
    expr qwerty
(this is without `dynamic_complete`, which adds more derivations)
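One way to "manually select the correct parse" from alternatives like these is to score each one and keep the best. The sketch below is a plain-Python model of that selection step. The flat `(rule, text)` layout is a hypothetical stand-in for lark's tree objects, and preferring fewer `any` nodes is just one possible policy.

```python
from typing import List, Tuple

# Each derivation is a list of (rule_name, matched_text) pairs, mirroring
# the three alternatives listed under `_ambig` above.
Derivation = List[Tuple[str, str]]

derivations: List[Derivation] = [
    [("expr", "asdf"), ("any", "qwerty")],
    [("expr", "asdf"), ("any", "qwerty")],
    [("expr", "asdf"), ("expr", "qwerty")],
]


def count_any(derivation: Derivation) -> int:
    """Number of catch-all `any` nodes in a derivation."""
    return sum(1 for rule, _ in derivation if rule == "any")


# Prefer the derivation that relies least on the catch-all rule.
best = min(derivations, key=count_any)
print(best)  # [('expr', 'asdf'), ('expr', 'qwerty')]
```

In real code the same comparison would run over the children of the `_ambig` node, for example inside a transformer callback as in the linked `dynamic_complete.py` example.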
from lark.
@erezsh No, what actually fixed it is moving `/.+/?` into a separate rule. These two grammars should have the same behavior, but don't:
grammar = f"""
start: any? expr+ any?
any: /.+/
!expr: "asdf" | "qwerty"
"""
grammar = f"""
start: /.+/? expr+ /.+/?
!expr: "asdf" | "qwerty"
"""
The latter doesn't produce any ambiguities with `dynamic_complete`; the former does.
Similarly for your grammar: with `dynamic_complete` it produces fewer derivations when `any` is inlined.
from lark.
I tested @MegaIng's example again on the latest master, and it looks like this bug is fixed!
from lark.