Giter Club home page Giter Club logo

Comments (4)

goodmami avatar goodmami commented on August 10, 2024 1

I think it would be good to always error on immediate chaining of those operators as in e.g. A*+ regardless of whether or not this can be interpreted in a reasonable non-infinite-loop-producing way.

These are currently blocked at the syntax level (aside: maybe we need a new error class like GrammarSyntaxError), as they are not part of Ford's original PEG metasyntax. I'm not entirely opposed to allowing it if we allow (A+)?, etc.

However, as soon as we deal with indirections across rules, I think all repetition operators should be allowed as long as the referenced rule can not match an empty string.

One part of pe's design is that non-terminals are not inherently different from inline forms. That is, the A1 and A2 productions below have exactly equivalent behavior:

A1 <- B C
B  <- "b"
C  <- "c"

A2 <- "b" "c"

This design is what enables pe to reason about when to optimize patterns. Semantic changes, such as using a ~ capture or attaching an action to a rule, introduce differences that block optimization.

Thus, if we allow

A <- B?
B <- "b"+

Then we must also allow

A <- ("b"+)?

This is also why the capture in (~"a"*)* above causes the infinite loop. So maybe we need the ability to see across these semantic effects for detecting potential infinite loops.

However, as long as the referenced rule definitely matches at least one character/byte, applying a repetition operator to it makes sense.

Apologies if I misunderstood, but I want to be able to detect these things at compile time, not parse time. So I don't want to care if, at parse time, A* matches more than zero characters so that we can allow a second * to repeat over it. I want to detect at compile time that this pattern can potentially match an empty string infinitely.

Furthermore, I would like to point out that [...] is not equal to [...] nor to [...] this is due to auto-ignoring (of whitespace).

Yes, I did not bring up auto-ignore in my reply above because I wanted to keep the conversation focused, but I agree that we would need to consider its effects in the multiple repeat operator issue. It's a relatively new feature of pe and I'm still not as certain about its ramifications.

from pe.

goodmami avatar goodmami commented on August 10, 2024

Hmm, I think you're right, if we only relax the check for an outer ?. The purpose of this error is to avoid infinite loops. For the three repeat operators *, +, and ?, we have the following combinations:

Pattern Effect in pe* Allowed by re**
(X*)* infinite loop a**
(X*)+ infinite loop a*+ ✔️
(X*)? redundant a*? ✔️
(X+)* redundant a+*
(X+)+ redundant a++ ✔️
(X+)? equivalent to X* a+? ✔️
(X?)* infinite loop a?*
(X?)+ infinite loop a?+ ✔️
(X?)? redundant a?? ✔️

* You might not see the infinite loop in pe because the packrat parser has a simple guard for loops that do not advance the position, and, secondly, optimized grammars might turn the pattern into a single regex which does not have the infinite loop.

** The re module gives an error for a**, a+*, and a?*, but you can avoid the error with a group, like (a*)* or (?:a*)*, and using it to match does not result in an infinite loop.

In spite of protections against it, there are some ways to induce the infinite loop. For instance, if you use a capture (~) to prevent the grammar error and then use the machine parser, it will get stuck:

import pe
pe.match('(~"a"*)*', "b", parser="machine")  # DANGER: infinite loop

I'm prepared to allow (X+)?, because even if the effect is the same as X*, it seems natural to write grammars like this:

Something <- Token?
Token     <- Char+

For the "redundant" cases above, it should be safe to allow them, perhaps with a GrammarWarning instead of a GrammarError. For now I'd rather continue raising a GrammarError on the infinite loop cases.

(Aside: while investigating this issue, I came across another bug. See #38)

from pe.

Krzmbrzl avatar Krzmbrzl commented on August 10, 2024

I think it would be good to always error on immediate chaining of those operators as in e.g. A*+ regardless of whether or not this can be interpreted in a reasonable non-infinite-loop-producing way.
However, as soon as we deal with indirections across rules, I think all repetition operators should be allowed as long as the referenced rule can not match an empty string. If it can, all of those operators either don't make sense to begin with (?) or can create an infinite loop (* and +).
However, as long as the referenced rule definitely matches at least one character/byte, applying a repetition operator to it makes sense.

The fact that some of this chaining is redundant and can be rewritten with only a single operator should be an implementation detail of pe when it optimizes the grammar/parser.
However, from a user's POV it's quite convenient to write a rule matching a specific input and then reuse that rule elsewhere with different repetition operators. Just think of e.g. an ID rule such as

ID <- [a-zA-Z_0-9]+

which can then be used as e.g.

name <- ID ID? # first name and optional last name
name_list <- ID+ # Multiple names after another

Furthermore, I would like to point out that e.g.

A < B+
B <- [a-z]+

is not equal to

A < [a-z]+

nor to

A <- [a-z]+

this is due to auto-ignoring (of whitespace).

from pe.

Krzmbrzl avatar Krzmbrzl commented on August 10, 2024

I want to detect at compile time that this pattern can potentially match an empty string infinitely.

Indeed that is what I meant. If the formal specification of the rule allows it to potentially match an empty string, don't allow repetition operators on it.

from pe.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.