Comments (5)

isaacs avatar isaacs commented on June 15, 2024

Simple repro:

> minimatch.makeRe('@(!(a)|@(b))x@(c)')
/^(?:(?!\.)(?=.)(?:(?:(?!(?:a)|(?:b)x(?:c))[^/]*?)|(?:b))x(?:c))$/
> minimatch.makeRe('@(@(a)|!(b))x@(c)')
/^(?:)$/  <-- ??

The problem occurs when an extglob pattern contains a negative extglob as its final member, and another extglob follows later in the pattern, outside the containing one.

from minimatch.

isaacs avatar isaacs commented on June 15, 2024

Wow, yeah, this is super broken: 5d3bf66#diff-3aa183b9f29dded13d8765c75d4a39d31487e55bc8603c08e8e631f97fa582b9R8

!(!(a)|!(b))x!(c) should be an unmatchable pattern, because the first group means "none of (not a, not b)", and every string is either not a or not b (or both). So how come that shows matches??

Very broken.
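
The unmatchability argument can be checked with plain predicates, without involving any glob library. This is only a sketch: literal and not are hypothetical helpers that model !(...) as "the whole segment matches none of the alternatives", which is an assumption about the intended semantics and not how minimatch is implemented.

```javascript
// Hypothetical helpers: each extglob part is modelled as a predicate on a whole segment.
const literal = (text) => (s) => s === text;
// !(p1|p2|...) accepts a segment only if none of the alternatives accept it.
const not = (...parts) => (s) => !parts.some((p) => p(s));

// !(!(a)|!(b)): neither "not a" nor "not b" may accept the segment.
const group = not(not(literal('a')), not(literal('b')));

const candidates = ['', 'a', 'b', 'ab', 'c', 'xyz'];
const accepted = candidates.filter(group);
console.log(accepted); // prints [] because no segment equals both "a" and "b"
```

Whatever follows the group (here x!(c)) cannot rescue the match, since the group itself never accepts anything.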

isaacs avatar isaacs commented on June 15, 2024

Oh, wait, I'm dumb, I forgot to pass nonegate:true.

Still a bunch of broken stuff in here tho: https://github.com/isaacs/minimatch/blob/isaacs/fix-nested-extglob/tap-snapshots/test/nested-extglobs.js.test.cjs

isaacs avatar isaacs commented on June 15, 2024

This is going to require a rewrite of the parser, I think.

The challenge is that there's no way to "anchor" on "the end of the pattern matched by the current capture group". So, the only way to anchor a negative extglob properly is to include the entire regexp tail that follows the negated group in the negated extglob pattern.

So for example, a!(b|c)d becomes /^a(?:(?!(?:b|c)d$).*)d$/. If they're nested, we have to recursively apply the parent tail along with the child tail, so a!(!(b|c)d|e)f becomes:

   v---------!(!(b|c)d|e)------------------v
   |        v--!(b|c)d-----------v         |
   |        v--!(b|c)-----------v|         |
/^a(?:(?!(?:(?:(?!(?:b|c)df$).*?)d|e)f$).*?)f$/
                         ||      |   ^-- outer tail "f"
                         ||      ^-- inner tail "d"
                         ^^--- combined recursive tail "df"
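
To see why the tail has to live inside the lookahead, the regexps can be run directly. The sketch below compares a deliberately naive translation (my own strawman, not minimatch output) against the tail-anchored form for a!(b|c)d, and then exercises the nested regexp for a!(!(b|c)d|e)f with each negative lookahead written as (?!...). The variable names are made up for illustration.

```javascript
// Strawman: the lookahead only inspects the start of the group, with no tail.
const naive = /^a(?:(?!(?:b|c)).*?)d$/;
// Tail-anchored form of a!(b|c)d from the comment above.
const anchored = /^a(?:(?!(?:b|c)d$).*)d$/;
// Tail-anchored form of the nested pattern a!(!(b|c)d|e)f.
const nested = /^a(?:(?!(?:(?:(?!(?:b|c)df$).*?)d|e)f$).*?)f$/;

// "abbd" is the interesting case: "bb" is neither "b" nor "c", so it should
// match, but the naive form rejects it because the group starts with "b".
for (const s of ['abd', 'acd', 'ad', 'axd', 'abbd']) {
  console.log(s, 'naive:', naive.test(s), 'anchored:', anchored.test(s));
}

// For the nested pattern, the middle may be "bd" or "cd" (the inner negation
// rejects them, so the outer one accepts), but not "e" or "xd".
for (const s of ['abdf', 'acdf', 'axf', 'af', 'aef', 'axdf']) {
  console.log(s, 'nested:', nested.test(s));
}
```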

Algorithm:

  • Parse from the tail to the head, swapping out classes, magic chars, and extglobs, so that we have the "tail" of the pattern at all times.
  • Parse negative extglobs by replacing !(${...parts}) with (?:(?!(?:${...parts})${tail})${extra})
  • If ${parts} includes '', then extra is .+?, otherwise .*?. Remove '' from parts.
  • If any parts include an extglob, recursively swap those out, appending the parent's tail to the child's tail.
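
The steps above can be sketched in a few dozen lines. This is a toy, not the minimatch parser: it assumes the pattern contains only literal characters and (possibly nested) !(...) groups, it uses a plain $ as the outermost tail, and parseGlob, renderSeq and toRegExpSource are hypothetical names. Instead of scanning the raw string backwards, it parses into a small tree first and then renders each sequence from its last node to its first, which keeps the tail available in the same way.

```javascript
// Escape a literal character for use inside a regular expression.
const escapeChar = (c) => c.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');

// Parse into a sequence of nodes: a one-character string, or { neg: [seq, ...] }.
function parseGlob(src) {
  let i = 0;
  const parseSeq = () => {
    const nodes = [];
    while (i < src.length && src[i] !== '|' && src[i] !== ')') {
      if (src[i] === '!' && src[i + 1] === '(') {
        i += 2;
        const parts = [parseSeq()];
        while (src[i] === '|') {
          i += 1;
          parts.push(parseSeq());
        }
        if (src[i] !== ')') throw new Error(`unclosed !( in ${src}`);
        i += 1;
        nodes.push({ neg: parts });
      } else {
        nodes.push(src[i]);
        i += 1;
      }
    }
    return nodes;
  };
  const seq = parseSeq();
  if (i < src.length) throw new Error(`unexpected "${src[i]}" at index ${i}`);
  return seq;
}

// Render a sequence from its last node to its first, so the regexp for
// everything after the current node (the "tail") is always known.
function renderSeq(nodes, tail) {
  let out = '';
  for (let n = nodes.length - 1; n >= 0; n -= 1) {
    const node = nodes[n];
    if (typeof node === 'string') {
      out = escapeChar(node) + out;
      continue;
    }
    const rest = out + tail; // this group's tail, parent's tail included
    const hasEmpty = node.neg.some((part) => part.length === 0);
    const alts = node.neg
      .filter((part) => part.length > 0) // drop '' from the parts
      .map((part) => renderSeq(part, rest)); // children inherit the tail
    const extra = hasEmpty ? '.+?' : '.*?';
    out = `(?:(?!(?:${alts.join('|')})${rest})${extra})${out}`;
  }
  return out;
}

const toRegExpSource = (glob) => `^${renderSeq(parseGlob(glob), '$')}$`;

console.log(toRegExpSource('a!(b|c)d'));
// ^a(?:(?!(?:b|c)d$).*?)d$
console.log(toRegExpSource('a!(!(b|c)d|e)f'));
// ^a(?:(?!(?:(?:(?!(?:b|c)df$).*?)d|e)f$).*?)f$
```

A real implementation would also have to handle the other extglob types, character classes, * and ?, dotfile rules, and path separators, none of which this sketch attempts.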

Walking through this with a more extreme example:

o!(p|!(a|b)c|q|)r!(i|!(j|k)l|)s

  > parse s, eof
o!(p|!(a|b)c|q|)r!(i|!(j|k)l|)s(?:$|\/)

!(i|!(j|k)l|) -> parts={i|!(j|k)l}, contains '', tail=s(?:$|\/)
-> (?:(?!(?:${parts})${tail}).+?)
  -> !(j|k) -> parts=(j|k), no '', tail=ls(?:$|\/)
  -> (?:(?!(?:j|k)ls(?:$|\/)).*?)
!(i|!(j|k)l|) -> parts={i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l}, contains '', tail=s(?:$|\/)
  -> (?:(?!(?:${parts})${tail}).+?)
  -> (?:(?!(?:i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l)s(?:$|\/)).+?)

o!(p|!(a|b)c|q|)r(?:(?!(?:i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l)s(?:$|\/)).+?)s(?:$|\/)

-> !(p|!(a|b)c|q|)
  -> parts={p|!(a|b)c|q}, contains '', tail=r(?:(?!(?:i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l)s(?:$|\/)).+?)s(?:$|\/)
    -> !(a|b) -> parts={a|b}, no '', tail=cr(?:(?!(?:i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l)s(?:$|\/)).+?)s(?:$|\/)
    -> (?:(?!(?:${parts})${tail}).*?)
    -> (?:(?!(?:a|b)cr(?:(?!(?:i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l)s(?:$|\/)).+?)s(?:$|\/)).*?)
  -> parts={p|(?:(?!(?:a|b)cr(?:(?!(?:i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l)s(?:$|\/)).+?)s(?:$|\/)).*?)c|q}, contains '', tail=r(?:(?!(?:i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l)s(?:$|\/)).+?)s(?:$|\/)
-> (?:(?!(?:${parts})${tail}).+?)
  -> (?:(?!(?:p|(?:(?!(?:a|b)cr(?:(?!(?:i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l)s(?:$|\/)).+?)s(?:$|\/)).*?)c|q)r(?:(?!(?:i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l)s(?:$|\/)).+?)s(?:$|\/)).+?)

o(?:(?!(?:p|(?:(?!(?:a|b)cr(?:(?!(?:i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l)s(?:$|\/)).+?)s(?:$|\/)).*?)c|q)r(?:(?!(?:i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l)s(?:$|\/)).+?)s(?:$|\/)).+?)r(?:(?!(?:i|(?:(?!(?:j|k)ls(?:$|\/)).*?)l)s(?:$|\/)).+?)s(?:$|\/)
 |          ^--!(a|b)--------------------------------------------------------------------------^                                                                         | |          ^-!(j|k)-------------------^               |
 ^-!(p|!(a|b)c|q|)-------------------------------------------------------------------------------------------------------------------------------------------------------^ ^-!(i|!(j|k)l|)---------------------------------------^

So these patterns can get pretty long and impenetrable, but I don't really see any way around it.
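
One way to keep the long result readable is to rebuild it from the pieces of the walkthrough and then probe it with a few strings. The sketch below does that for o!(p|!(a|b)c|q|)r!(i|!(j|k)l|)s. The variable names are hypothetical, and the leading ^ is my addition so that the test strings are matched from the start; everything else follows the steps above.

```javascript
const end = '(?:$|\\/)'; // end of the path portion
const notJK = `(?:(?!(?:j|k)ls${end}).*?)`; // !(j|k), tail "ls" + end
const second = `(?:(?!(?:i|${notJK}l)s${end}).+?)`; // !(i|!(j|k)l|)
const tail = `r${second}s${end}`; // everything after the first group
const notAB = `(?:(?!(?:a|b)c${tail}).*?)`; // !(a|b), tail "c" + tail
const first = `(?:(?!(?:p|${notAB}c|q)${tail}).+?)`; // !(p|!(a|b)c|q|)
const walkthrough = new RegExp(`^o${first}${tail}`);

const samples = [
  'oacrjls', // "ac" and "jl" are rejected by the inner negations, so accepted
  'oacrzs', // "z" is not i, not empty, and not "<something>l"
  'oxcrjls', // "xc" is "<not a|b>c", so the first group rejects it
  'oprjls', // "p" is listed explicitly
  'orjls', // the first group may not be empty
  'oacris', // "i" is listed explicitly in the second group
];
for (const s of samples) {
  console.log(s, walkthrough.test(s));
}
```

The first two samples match and the remaining four do not.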

isaacs avatar isaacs commented on June 15, 2024

There were a few other fixes to the extglob parsing in v5.1 (short of the full rewrite described here, and still with a few shortcomings). Incidentally, the pattern in the original post actually does have an error in it.

                    v-- not an extglob, need one of [+!?*@] here!
minimatch.makeRe('*((*.py|*.js)|!(*.json))das*(*.js|!(*.json))')
                              ^-- so this ends the pattern

It still shouldn't necessarily fail entirely. A better failure mode would be to escape the extra ) and treat the result as a pattern that just won't match what @zwhitchcox is probably expecting.

Version 5.1 does properly handle the other extglobs posted above as minimal examples of failures, so I'm going to call this done for now.
