Giter Club home page Giter Club logo

Comments (7)

joaomoreno avatar joaomoreno commented on May 17, 2024 3

Together with @bpasero and @lszomoru, I've spend some time figuring out a way forward for this feature request, assuming you'd consider it. As @bpasero already mentioned, today VS Code matches globs at the JS level. Doing that at the C++ level would be ideal for obvious performance reasons.

Scenario

Let's consider the classic glob scenario. The user should be able to provide a set of glob patterns, relative to the watched directory. Glob patterns can be positive or negative (starting with !). Events would only be emitted if they match any of the positive glob patterns and none of the negative ones.

Consider the following example:

  • Directory to watch: /work
  • Glob Patterns:
    • src/**
    • build/**
    • !src/**/test/**
    • !build/output/**

The following files and directories would match the glob patterns:

  • /work/src/
  • /work/src/main.js
  • /work/src/widgets/
  • /work/src/widgets/bar.js
  • /work/build/compile.sh

The following ones would not:

  • /work/README.md
  • /work/src/test/
  • /work/src/test/main.js
  • /work/build/output/
  • /work/build/output/a.out

Implementation

Matching glob patterns against paths is commonly solved by converting glob patterns to regular expressions, eg VS Code, minimatch, etc. We suggest doing the same here. One implementation possibility would be to port VS Code's conversion code to C++, leveraging std::regex. Another, to depend on or embed an existing C++ implementation.

API

Then, there's the question of API. An option of a set of glob patterns conflicts with the existing ignore option. Also, it is not ideal to drop the ignore option altogether since it actually has performance benefits by being forwarded to some underlying native watcher implementations. I see two solutions.

We can have both ignore and glob options. The former would be kept to perserve today's behavior of forwarding that value to the underlying watcher as well as using it for filtering, while the former would only be used for additional filtering. This is simple and backwards-compatible, but not elegant: what would it mean to have { ignore: ['foo'], glob: ['foo/**'] }? By definition, none of the files within foo would be detected, despite being included in the glob patterns. Another unhappy example is { ignore: [], glob: ['!build/**'] }, in which we would lose an opportunity to provide build to be ignored by the underlying watcher and improve performance.

Another solution involves breaking API: drop the ignore option and have a single glob option. We would then detect specific negative glob patterns to be automatically provided as ignore rules to the underlying watcher. Specifically, if a negative pattern ends with /** and contains no other ** part, then it can be automatically converted to a path to be ignored. Take the example from above once again: the !build/output/** pattern is negative, ends with /** and contains no other ** parts, so the path /work/build/output should be provided to the underlying watcher to be fully ignored. Such patterns are extremely common in glob scenarios, so we can assume a high rate of successful detections.

Next Steps

Asking again, do you still consider glob pattern matching useful? Would you be open to a contribution from our side? What would be your the preferred approach? Do you maybe have other thoughts on the problem and/or approaches? We'd love to hear more from you and help move the Parcel watcher forward as it has provided real value to VS Code since we started using it.

from watcher.

devongovett avatar devongovett commented on May 17, 2024 2

Thanks for the writeup! I think this would be a great feature for you to contribute. 🥳

In terms of implementation, I like the idea of converting globs to regexes on the JavaScript side and passing them to C++. This only needs to happen once at startup and shouldn't be too expensive performance wise. JS glob libraries are widely used and well tested, and managing C++ dependencies is much more complex, so I think this is a good approach.

For the API, another option could be to keep the ignore option, but allow both paths and globs. We can detect whether each ignore pattern is a glob e.g. using https://github.com/micromatch/is-glob, and perhaps pass them as separate lists to the C++ backend but keep the JS API unified. Would that make sense to you?

from watcher.

bpasero avatar bpasero commented on May 17, 2024 1

What a fantastic idea to convert from JS to RegEx, I was not even thinking about that. We can then just use our existing code to convert from VSCode 💯

from watcher.

mischnic avatar mischnic commented on May 17, 2024 1

That is, assuming JavaScript regular expressions are fairly compatible with C++11's... 🤔

There is an ECMAScript std::regex flavour which is also the default if not set explicitly in the constructor: https://en.cppreference.com/w/cpp/regex/syntax_option_type

from watcher.

devongovett avatar devongovett commented on May 17, 2024

Glob patterns would indeed be useful! There are two ways we could implement them. The simplest approach is to do it in JavaScript, as there are already lots of glob libraries out there. No operating system API supports passing globs directly, so we will have to filter after receiving events anyway.

The other way would be to do it in C++. The benefit there would be performance. Since watching occurs on a background thread, we wouldn't overwhelm the JS thread with events that will be ignored anyway. However, implementing glob matching from scratch sounds hard. Perhaps there are some C++ libraries that implement globbing, though I also wouldn't want to bring in a large library if it has the potential for security issues. I suppose we could use Rust for the glob matching... 🤔

Any thoughts on that from your perspective?

from watcher.

bpasero avatar bpasero commented on May 17, 2024

@devongovett unfortunately the JavaScript approach wouldn't be a good solution for VS Code at least: for Linux it is important to apply the ignore rules at the level of where folders are recursively resolved and added to be watched. Since Linux has a natural upper limit of opened file handles, the ignore pattern is the only way for Linux users to avoid hitting that limit (related issue: microsoft/vscode#40898).

VS Code is already filtering the events on the receiving end but I would much prefer a native solution in C++ within the watcher.

Btw I am not necessarily saying you need full glob support to achieve this. Even VS Code itself implements a simple glob library that supports patterns such as ** and * but not everything the spec allows you to do. It is optimised for performance by avoiding to run a RegEx for all patterns when patterns have a specific form.

When looking into nsfw I noticed this pull request that seems to implement simple globbing in C++: https://github.com/Axosoft/nsfw/pull/124/files?authenticity_token=ifa16H3MpIoYvE3UXvzRn%2FP3eXbU9xziU2OqaQuUTyK99mBonpsX9DPygK%2BJyZCGMgrDUG2aXOne1QvQBXYGtw%3D%3D&file-filters%5B%5D=.cpp&file-filters%5B%5D=.gyp&file-filters%5B%5D=.h&file-filters%5B%5D=.md&file-filters%5B%5D=.ts#diff-7f14a4a5a7200888990f03862d86776107f7825be09854976d56e0760d1d01bb

Maybe that could be used as foundation going forward.

from watcher.

joaomoreno avatar joaomoreno commented on May 17, 2024

Lovely, great idea! I also like the API suggestion, I feel like that's actually the minimal API change we'd need and stays aligned with today's API. Given that, what do you think of directly using minimatch's makeRe API? That is, assuming JavaScript regular expressions are fairly compatible with C++11's... 🤔

from watcher.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.