Comments (8)
The first recommendation is always to try to use parser='lalr' instead of parser='earley'. For most sane languages that should be perfectly possible, but it does require reworking the grammar.
Try to have as much of your grammar as possible be terminals, not rules. However, you shouldn't use non-regular regex features like lookaheads/lookbehinds.
If you have to use earley because your grammar is ambiguous for some reason or just doesn't fit into LALR, you should make sure that your regex patterns are as simple and non-overlapping as possible, and make sure that your grammar is left-recursive, i.e. a rule only (or at least primarily) appears as its own first child.
You can try switching to lexer='basic' if your tokens are cleanly distinct, although I am not sure if this will give a performance benefit.
You can also switch to PyPy instead of CPython, which almost always gives a very large boost.
from lark.
Oh, also use a newer version of lark than 0.7.8.
from lark.
Try to let the regex engine capture as much as possible. For example, this is VERY bad:
bare_id : (letter| underscore) (letter|digit|underscore|id_chars)*
Try making bare_id (and its dependencies) into a terminal. It should help a lot.
Avoid unnecessary right-recursion. This is very bad:
semi_affine_expr : "(" semi_affine_expr ")" -> semi_affine_parens
| semi_affine_expr "+" semi_affine_expr -> semi_affine_add
See the calculator example.
Also, try reducing the number of rules. You have a lot of unnecessary duplication of rules.
It's not worth it to use Earley to validate every corner of the input. Just make sure it parses into the correct structure, and validate afterwards.
from lark.
@erezsh, @MegaIng, thank you so much for the quick response. I'll try some of these suggestions and update the thread accordingly.
from lark.
@erezsh, I have a question on the difference between rules and terminals. Is there any difference in the way a rule and a terminal are processed internally by lark, specifically with respect to your suggestion on keeping regexes in terminals?
from lark.
Yes, there's an important difference. A terminal turns into a single regular expression. So in the following:
X: "a" "b" "c"
x: "a" "b" "c"
X will be a regexp "abc", while x will be 3 separate regexes, and their matching will be managed by the parser. The terminal is a lot faster to match, but the rule is more flexible and can do more things. Only use rules when it makes sense.
from lark.
https://lark-parser.readthedocs.io/en/stable/grammar.html#
Hi @erezsh, the document above is where I started learning about lark grammar definitions. Are there any other documents that describe how to define a grammar using lark? Specifically, guidelines on best practices for grammar definition.
from lark.
I think we have good information to move forward here. So, closing the issue for now.
from lark.