Comments (14)
If you mean the terminal values, then yes, of course. You do it by adding "i" to the end of the string or regexp.
Examples:
HELLO: "hello"i // Match HELLO, Hello, hello, etc.
REGEXP: /this is [also] case? insensitive/i
You can mix them too:
some_rule: "Case Sensitive" "insensitive"i "Sensitive again"
Does that answer your question?
from lark.
No. You can open an issue with that feature request.
from lark.
This is the only thing I don't like about Lark.
Today I implemented a grammar described in the Unicode standard.
// Grammar based on grammar in The Unicode Standard v13.0 (2020)
// ch. 18, sec. 2, "Ideographic Description Characters"
//
// See http://www.unicode.org/versions/Unicode13.0.0/ch18.pdf p. 733
// Also, FYI, this library differentiates terminals/non-terminals by
// whether or not they're capitalized. ugh
?start : icds*
icds : UNICODE_CLASS
| IDS_BINARYOPERATOR icds icds
| IDS_TRINARYOPERATOR icds icds icds
UNICODE_CLASS : IDEOGRAPHIC | RADICAL | CJK_STROKE | PRIVATE_USE | "\uFF1F"
IDEOGRAPHIC : /[\p{Ideo}]/
RADICAL : /[\p{Radical}]/
PRIVATE_USE : /[\p{Private_Use}]/
CJK_STROKE : "\u31C0" .. "\u31E3"
IDS_BINARYOPERATOR : "\u2FF0" | "\u2FF1" | "\u2FF4" .. "\u2FFB"
IDS_TRINARYOPERATOR : "\u2FF2" | "\u2FF3"
It triggers my OCD big time that icds has to be like that. Not even trying to alias it with -> IDS works... 😖
This is a great little library, but yeah, this part bites.
from lark.
Yes, the grammar is case sensitive.
Rules consist of lowercase letters (such as stmt)
Terminals consist of uppercase letters (such as STMT)
This distinction is used in many parsers (such as yacc), and it affects the lexing stage (if any) and the resulting parse tree.
This is also explained in the json tutorial (https://github.com/erezsh/lark/blob/master/docs/json_tutorial.md)
from lark.
Can the tokens themselves in the grammar be case insensitive? It doesn't look like Lark can handle that.
from lark.
thankyoU!
from lark.
Is there a global flag to set all terminals to be case insensitive?
from lark.
Let me know please if this issue is not the right place/format for a feature request.
from lark.
Looks good.
from lark.
@ctrlcctrlv And what is your suggestion? Terminals are by definition regex. If you can't parse it with a regex, it can't be a terminal.
from lark.
@MegaIng I suggest that aliases be allowed to go to a name of any casing.
from lark.
@ctrlcctrlv @erezsh It might be interesting to consider this meaning 'combine everything into a single Token'.
This would mean: just take all children, fully expand all subtrees, then join all tokens into a single token named IDS. I will see if I can manage to implement that.
It is backwards compatible, since the syntax was not allowed previously.
from lark.
@ctrlcctrlv I don't really understand the complaint. Using lowercase/uppercase for rules and terminals is standard practice in many (if not most) parsing tools. Also, rules and terminals behave differently, so it makes sense to differentiate between them somehow.
And actually, I think you're using too many terminals. For example, it makes more sense for UNICODE_CLASS to be a rule, because you would want to process the result afterwards and be able to know which of its constituent terminals was actually matched.
@MegaIng Not sure about your solution. Sounds a bit strange. If anything, it makes more sense to me that everything should be rules only, and we will automatically turn what we can into terminals. (ofc, that's not necessarily so simple to do)
from lark.
I would prefer if the standard and my parser used the same names, is all. Also, with all due respect, you're wrong for this use case: I do want this many terminals, and each ideograph to be its own node on the tree. :-)
from lark.