The following snippet <div class="snippet-clipboard-content notranslate position-r

Hmm, I think you're right, if we only relax the check for an outer <code class="notran

Multiple repeat operators about pe HOT 4 OPEN

Krzmbrzl commented on August 10, 2024

Multiple repeat operators

from pe.

Comments (4)

goodmami commented on August 10, 2024 1

I think it would be good to always error on immediate chaining of those operators as in e.g. A*+ regardless of whether or not this can be interpreted in a reasonable non-infinite-loop-producing way.

These are currently blocked at the syntax level (aside: maybe we need a new error class like GrammarSyntaxError), as they are not part of Ford's original PEG metasyntax. I'm not entirely opposed to allowing it if we allow (A+)?, etc.

However, as soon as we deal with indirections across rules, I think all repetition operators should be allowed as long as the referenced rule can not match an empty string.

One part of pe's design is that non-terminals are not inherently different from inline forms. That is, the A1 and A2 productions below have exactly equivalent behavior:

A1 <- B C
B  <- "b"
C  <- "c"

A2 <- "b" "c"

This design is what enables pe to reason about when to optimize patterns. Semantic changes, such as using a ~ capture or attaching an action to a rule, introduce differences that block optimization.

Thus, if we allow

A <- B?
B <- "b"+

Then we must also allow

A <- ("b"+)?

This is also why the capture in (~"a"*)* above causes the infinite loop. So maybe we need the ability to see across these semantic effects for detecting potential infinite loops.

However, as long as the referenced rule definitely matches at least one character/byte, applying a repetition operator to it makes sense.

Apologies if I misunderstood, but I want to be able to detect these things at compile time, not parse time. So I don't want to care if, at parse time, A* matches more than zero characters so that we can allow a second * to repeat over it. I want to detect at compile time that this pattern can potentially match an empty string infinitely.

Furthermore, I would like to point out that [...] is not equal to [...] nor to [...] this is due to auto-ignoring (of whitespace).

Yes, I did not bring up auto-ignore in my reply above because I wanted to keep the conversation focused, but I agree that we would need to consider its effects in the multiple repeat operator issue. It's a relatively new feature of pe and I'm still not as certain about its ramifications.

from pe.

goodmami commented on August 10, 2024

Hmm, I think you're right, if we only relax the check for an outer ?. The purpose of this error is to avoid infinite loops. For the three repeat operators *, +, and ?, we have the following combinations:

Pattern	Effect in pe*	Allowed by re**
`(X)`	infinite loop	`a**` ❌
`(X*)+`	infinite loop	`a*+` ✔️
`(X*)?`	redundant	`a*?` ✔️
`(X+)*`	redundant	`a+*` ❌
`(X+)+`	redundant	`a++` ✔️
`(X+)?`	equivalent to `X*`	`a+?` ✔️
`(X?)*`	infinite loop	`a?*` ❌
`(X?)+`	infinite loop	`a?+` ✔️
`(X?)?`	redundant	`a??` ✔️

* You might not see the infinite loop in pe because the packrat parser has a simple guard for loops that do not advance the position, and, secondly, optimized grammars might turn the pattern into a single regex which does not have the infinite loop.

** The re module gives an error for a**, a+*, and a?*, but you can avoid the error with a group, like (a*)* or (?:a*)*, and using it to match does not result in an infinite loop.

In spite of protections against it, there are some ways to induce the infinite loop. For instance, if you use a capture (~) to prevent the grammar error and then use the machine parser, it will get stuck:

import pe
pe.match('(~"a"*)*', "b", parser="machine")  # DANGER: infinite loop

I'm prepared to allow (X+)?, because even if the effect is the same as X*, it seems natural to write grammars like this:

Something <- Token?
Token     <- Char+

For the "redundant" cases above, it should be safe to allow them, perhaps with a GrammarWarning instead of a GrammarError. For now I'd rather continue raising a GrammarError on the infinite loop cases.

(Aside: while investigating this issue, I came across another bug. See #38)

from pe.

Krzmbrzl commented on August 10, 2024

I think it would be good to always error on immediate chaining of those operators as in e.g. A*+ regardless of whether or not this can be interpreted in a reasonable non-infinite-loop-producing way.
However, as soon as we deal with indirections across rules, I think all repetition operators should be allowed as long as the referenced rule can not match an empty string. If it can, all of those operators either don't make sense to begin with (?) or can create an infinite loop (* and +).
However, as long as the referenced rule definitely matches at least one character/byte, applying a repetition operator to it makes sense.

The fact that some of this chaining is redundant and can be rewritten with only a single operator should be an implementation detail of pe when it optimizes the grammar/parser.
However, from a user's POV it's quite convenient to write a rule matching a specific input and then reuse that rule elsewhere with different repetition operators. Just think of e.g. an ID rule such as

ID <- [a-zA-Z_0-9]+

which can then be used as e.g.

name <- ID ID? # first name and optional last name
name_list <- ID+ # Multiple names after another

Furthermore, I would like to point out that e.g.

A < B+
B <- [a-z]+

is not equal to

A < [a-z]+

nor to

A <- [a-z]+

this is due to auto-ignoring (of whitespace).

from pe.

Krzmbrzl commented on August 10, 2024

I want to detect at compile time that this pattern can potentially match an empty string infinitely.

Indeed that is what I meant. If the formal specification of the rule allows it to potentially match an empty string, don't allow repetition operators on it.

from pe.

Multiple repeat operators about pe HOT 4 OPEN

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent