
Comments (14)

erezsh commented on July 30, 2024

If you mean the terminal values, then yes, of course. You do it by adding "i" to the end of the string or regexp.

Examples:

HELLO: "hello"i   // Match HELLO, Hello, hello, etc.
REGEXP: /this is [also] case? insensitive/i

You can mix them too:

some_rule: "Case Sensitive"  "insensitive"i  "Sensitive again"

Does that answer your question?

from lark.

erezsh commented on July 30, 2024

No. You can open an issue with that feature request.

ctrlcctrlv commented on July 30, 2024

This is the only thing I don't like about Lark.

Today I implemented a grammar described in the Unicode standard.

// Grammar based on grammar in The Unicode Standard v13.0 (2020)
// ch. 18, sec. 2, "Ideographic Description Characters"
//
// See http://www.unicode.org/versions/Unicode13.0.0/ch18.pdf p. 733

// Also, FYI, this library differentiates terminals/non-terminals by
// whether or not they're capitalized. ugh

?start              : icds*
icds                : UNICODE_CLASS
                    | IDS_BINARYOPERATOR icds icds
                    | IDS_TRINARYOPERATOR icds icds icds

UNICODE_CLASS       : IDEOGRAPHIC | RADICAL | CJK_STROKE | PRIVATE_USE | "\uFF1F"
IDEOGRAPHIC         : /[\p{Ideo}]/
RADICAL             : /[\p{Radical}]/
PRIVATE_USE         : /[\p{Private_Use}]/
CJK_STROKE          : "\u31C0" .. "\u31E3"

IDS_BINARYOPERATOR  : "\u2FF0" | "\u2FF1" | "\u2FF4" .. "\u2FFB"
IDS_TRINARYOPERATOR : "\u2FF2" | "\u2FF3"

It triggers my OCD big time that icds has to be like that. Not even trying to alias it with -> IDS works... 😖

This is a great little library, but yeah, this part bites.

erezsh commented on July 30, 2024

Yes, the grammar is case sensitive.
Rules consist of lowercase letters (such as stmt)
Terminals consist of uppercase letters (such as STMT)
This distinction is used in many parsers (such as yacc), and it affects the lexing stage (if any) and the resulting parse tree.
This is also explained in the JSON tutorial (https://github.com/erezsh/lark/blob/master/docs/json_tutorial.md)

bjourne commented on July 30, 2024

Can the tokens themselves in the grammar be case insensitive? It doesn't look like Lark can handle that.

bjourne commented on July 30, 2024

thankyoU!

rdeoliveira commented on July 30, 2024

Is there a global flag to set all terminals to be case insensitive?

rdeoliveira commented on July 30, 2024

Let me know please if this issue is not the right place/format for a feature request.

erezsh commented on July 30, 2024

Looks good.

MegaIng commented on July 30, 2024

@ctrlcctrlv And what is your suggestion? Terminals are, by definition, regular expressions. If something can't be matched by a regex, it can't be a terminal.

ctrlcctrlv commented on July 30, 2024

@MegaIng I suggest that aliases be allowed to go to a name of any casing.

MegaIng commented on July 30, 2024

@ctrlcctrlv @erezsh It might be interesting to treat an uppercase alias (such as -> IDS) as meaning 'combine everything into a single Token'.

That would mean: take all children, fully expand all subtrees, then join all the tokens into a single token named IDS. I will see if I can implement that.

It is backwards compatible, since the syntax was not allowed previously.

erezsh commented on July 30, 2024

@ctrlcctrlv I don't really understand the complaint. Using lowercase/uppercase for rules and terminals is standard practice in many (if not most) parsing tools. Also, rules and terminals behave differently, so it makes sense to differentiate between them somehow.

And actually, I think you're using too many terminals. For example, it makes more sense for UNICODE_CLASS to be a rule, because you would want to process the result afterwards and be able to tell which of its constituent terminals was actually matched.

@MegaIng Not sure about your solution. Sounds a bit strange. If anything, it makes more sense to me that everything should be rules only, and we will automatically turn what we can into terminals. (ofc, that's not necessarily so simple to do)

ctrlcctrlv commented on July 30, 2024

I would prefer if the standard and my parser used the same names, is all. Also, with all due respect, you're wrong for this use case, I do want this many terminals, and each ideograph to be its own node on the tree. :-)
