Comments (7)
Together with @bpasero and @lszomoru, I've spend some time figuring out a way forward for this feature request, assuming you'd consider it. As @bpasero already mentioned, today VS Code matches globs at the JS level. Doing that at the C++ level would be ideal for obvious performance reasons.
Scenario
Let's consider the classic glob scenario. The user should be able to provide a set of glob patterns, relative to the watched directory. Glob patterns can be positive or negative (starting with !
). Events would only be emitted if they match any of the positive glob patterns and none of the negative ones.
Consider the following example:
- Directory to watch:
/work
- Glob Patterns:
src/**
build/**
!src/**/test/**
!build/output/**
The following files and directories would match the glob patterns:
/work/src/
/work/src/main.js
/work/src/widgets/
/work/src/widgets/bar.js
/work/build/compile.sh
The following ones would not:
/work/README.md
/work/src/test/
/work/src/test/main.js
/work/build/output/
/work/build/output/a.out
Implementation
Matching glob patterns against paths is commonly solved by converting glob patterns to regular expressions, eg VS Code, minimatch, etc. We suggest doing the same here. One implementation possibility would be to port VS Code's conversion code to C++, leveraging std::regex
. Another, to depend on or embed an existing C++ implementation.
API
Then, there's the question of API. An option of a set of glob patterns conflicts with the existing ignore
option. Also, it is not ideal to drop the ignore
option altogether since it actually has performance benefits by being forwarded to some underlying native watcher implementations. I see two solutions.
We can have both ignore
and glob
options. The former would be kept to perserve today's behavior of forwarding that value to the underlying watcher as well as using it for filtering, while the former would only be used for additional filtering. This is simple and backwards-compatible, but not elegant: what would it mean to have { ignore: ['foo'], glob: ['foo/**'] }
? By definition, none of the files within foo
would be detected, despite being included in the glob patterns. Another unhappy example is { ignore: [], glob: ['!build/**'] }
, in which we would lose an opportunity to provide build
to be ignored by the underlying watcher and improve performance.
Another solution involves breaking API: drop the ignore
option and have a single glob
option. We would then detect specific negative glob patterns to be automatically provided as ignore rules to the underlying watcher. Specifically, if a negative pattern ends with /**
and contains no other **
part, then it can be automatically converted to a path to be ignored. Take the example from above once again: the !build/output/**
pattern is negative, ends with /**
and contains no other **
parts, so the path /work/build/output
should be provided to the underlying watcher to be fully ignored. Such patterns are extremely common in glob scenarios, so we can assume a high rate of successful detections.
Next Steps
Asking again, do you still consider glob pattern matching useful? Would you be open to a contribution from our side? What would be your the preferred approach? Do you maybe have other thoughts on the problem and/or approaches? We'd love to hear more from you and help move the Parcel watcher forward as it has provided real value to VS Code since we started using it.
from watcher.
Thanks for the writeup! I think this would be a great feature for you to contribute. 🥳
In terms of implementation, I like the idea of converting globs to regexes on the JavaScript side and passing them to C++. This only needs to happen once at startup and shouldn't be too expensive performance wise. JS glob libraries are widely used and well tested, and managing C++ dependencies is much more complex, so I think this is a good approach.
For the API, another option could be to keep the ignore
option, but allow both paths and globs. We can detect whether each ignore pattern is a glob e.g. using https://github.com/micromatch/is-glob, and perhaps pass them as separate lists to the C++ backend but keep the JS API unified. Would that make sense to you?
from watcher.
What a fantastic idea to convert from JS to RegEx, I was not even thinking about that. We can then just use our existing code to convert from VSCode 💯
from watcher.
That is, assuming JavaScript regular expressions are fairly compatible with C++11's... 🤔
There is an ECMAScript std::regex flavour which is also the default if not set explicitly in the constructor: https://en.cppreference.com/w/cpp/regex/syntax_option_type
from watcher.
Glob patterns would indeed be useful! There are two ways we could implement them. The simplest approach is to do it in JavaScript, as there are already lots of glob libraries out there. No operating system API supports passing globs directly, so we will have to filter after receiving events anyway.
The other way would be to do it in C++. The benefit there would be performance. Since watching occurs on a background thread, we wouldn't overwhelm the JS thread with events that will be ignored anyway. However, implementing glob matching from scratch sounds hard. Perhaps there are some C++ libraries that implement globbing, though I also wouldn't want to bring in a large library if it has the potential for security issues. I suppose we could use Rust for the glob matching... 🤔
Any thoughts on that from your perspective?
from watcher.
@devongovett unfortunately the JavaScript approach wouldn't be a good solution for VS Code at least: for Linux it is important to apply the ignore rules at the level of where folders are recursively resolved and added to be watched. Since Linux has a natural upper limit of opened file handles, the ignore pattern is the only way for Linux users to avoid hitting that limit (related issue: microsoft/vscode#40898).
VS Code is already filtering the events on the receiving end but I would much prefer a native solution in C++ within the watcher.
Btw I am not necessarily saying you need full glob support to achieve this. Even VS Code itself implements a simple glob library that supports patterns such as **
and *
but not everything the spec allows you to do. It is optimised for performance by avoiding to run a RegEx for all patterns when patterns have a specific form.
When looking into nsfw
I noticed this pull request that seems to implement simple globbing in C++: https://github.com/Axosoft/nsfw/pull/124/files?authenticity_token=ifa16H3MpIoYvE3UXvzRn%2FP3eXbU9xziU2OqaQuUTyK99mBonpsX9DPygK%2BJyZCGMgrDUG2aXOne1QvQBXYGtw%3D%3D&file-filters%5B%5D=.cpp&file-filters%5B%5D=.gyp&file-filters%5B%5D=.h&file-filters%5B%5D=.md&file-filters%5B%5D=.ts#diff-7f14a4a5a7200888990f03862d86776107f7825be09854976d56e0760d1d01bb
Maybe that could be used as foundation going forward.
from watcher.
Lovely, great idea! I also like the API suggestion, I feel like that's actually the minimal API change we'd need and stays aligned with today's API. Given that, what do you think of directly using minimatch
's makeRe
API? That is, assuming JavaScript regular expressions are fairly compatible with C++11's... 🤔
from watcher.
Related Issues (20)
- Separate packages for precompiled binaries HOT 1
- @parcel/watcher not working with Docker node:alpine image HOT 3
- Watcher not reporting events when used in docker. HOT 2
- npm ci cannot locate a prebuild for any given architecture HOT 3
- Error: No prebuild or local build of @parcel/watcher found. Tried @parcel/watcher-linux-arm64-musl. HOT 14
- v2.2.0 node-gyp rebuild error, incorrect scripts in npm registry HOT 11
- Specifying path to `watchman` executable HOT 1
- When using Watchman on Windows, a console window pops up when subscribing HOT 1
- gyp: binding.gyp not found (...) while trying to load binding.gyp HOT 33
- error when use node v18.18 on github runner images
- InotifyBackend hungs forever when inotify_init1 returns error HOT 1
- AWS Amplify build error HOT 2
- Github M1 Build Error sh: node-gyp-build: command not found HOT 1
- Change event is not emitted on Windows when continuously appending to an initially existing file
- when moving folder to a watched folder , only event for the moved folder is generated and not for files inside the folder
- Support for negated Ignores Globs
- node-addon-api should be a dev dependency? HOT 1
- With Watchman installed on Windows subscribe hangs
- Segmentation fault using wildcard in ignore option on Node.js v20.12.x (LTS) HOT 6
- FSEvents behaves differently compared to inotify with glob excludes HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from watcher.