Comments (4)
IMO, this is an egregious misuse of regular expressions. Between that and the ~6yo version of RE2, it wouldn't be reasonable for me to spend time (either professionally or personally) digging into why the linear-time constant seems to be larger for the single pattern. There are better (i.e. more readable, more maintainable, more efficient) ways of parsing data in such a format. (The data would be in a better format, ideally, but that may or may not be within your control.)
from re2.
Thank you for the quick response. Unfortunately, we have little control over either the data or the pattern (parse query), as both come from our customers. We agree that such a pattern is not ideal. To us, it appears that the latency grows exponentially as the number of match groups in the pattern increases. If you can confirm that RE2::Match always maintains linear performance in the latest version of RE2, even for the edge case we encountered, it would greatly help us determine whether upgrading RE2 is a viable solution.
If you have a test case you should be able to try the latest RE2 yourself.
Also, it seems unsound to draw conclusions about asymptotic complexity from two (2) data points. For now, my guess is that the DFA and NFA execution engines incur overhead (i.e. increase the linear-time constant) due to the large number of (.*?) subexpressions. (Specifically, combining so much ambiguity with so many capturing groups would amplify cost significantly.) Updating the version of RE2 could be a mitigation, but considering it a solution would be a categorical mistake.
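One way to get more than two data points is to sweep both the number of (.*?) groups and the input size and record the per-match time for each combination. The following is a minimal sketch, not the reporter's actual test case: the comma-separated pattern and input shapes, and the names `BuildPattern`, `BuildInput`, `TimeMatchMicros` and `Sweep`, are all hypothetical. The matcher is passed in as a callback so the harness itself depends only on the standard library; a possible RE2 wiring is shown in the trailing comment.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Assumed pattern shape: "^(.*?),(.*?),...,(.*?)$" with `groups` capturing groups.
std::string BuildPattern(int groups, const std::string& sep = ",") {
  std::string p = "^";
  for (int i = 0; i < groups; ++i) {
    if (i > 0) p += sep;
    p += "(.*?)";
  }
  return p + "$";
}

// Assumed input shape: `fields` runs of 'x', each `field_len` long, joined by `sep`.
std::string BuildInput(int fields, std::size_t field_len, const std::string& sep = ",") {
  std::string s;
  for (int i = 0; i < fields; ++i) {
    if (i > 0) s += sep;
    s.append(field_len, 'x');
  }
  return s;
}

// A matcher runs an already-compiled pattern against one text.
using Matcher = std::function<bool(const std::string& text)>;
// A factory compiles a pattern once and returns a matcher for it.
using MatcherFactory = std::function<Matcher(const std::string& pattern)>;

// Average wall-clock time per call, in microseconds (compilation is excluded).
double TimeMatchMicros(const Matcher& match, const std::string& text, int reps) {
  assert(reps > 0);
  volatile bool sink = false;  // keeps the calls from being optimized away
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < reps; ++i) sink = match(text);
  const auto stop = std::chrono::steady_clock::now();
  (void)sink;
  return std::chrono::duration<double, std::micro>(stop - start).count() / reps;
}

struct Sample {
  int groups;
  std::size_t text_bytes;
  double micros_per_match;
};

// Measures every (group count, field length) combination.
std::vector<Sample> Sweep(const MatcherFactory& make,
                          const std::vector<int>& group_counts,
                          const std::vector<std::size_t>& field_lens,
                          int reps) {
  std::vector<Sample> out;
  for (int g : group_counts) {
    const Matcher match = make(BuildPattern(g));
    for (std::size_t len : field_lens) {
      const std::string text = BuildInput(g, len);
      out.push_back({g, text.size(), TimeMatchMicros(match, text, reps)});
    }
  }
  return out;
}

// Possible RE2 wiring (requires <re2/re2.h> and <memory>; older releases use
// re2::StringPiece where newer ones use absl::string_view):
//
//   MatcherFactory make = [](const std::string& pattern) -> Matcher {
//     auto re = std::make_shared<re2::RE2>(pattern);
//     const int n = re->NumberOfCapturingGroups();
//     return [re, n](const std::string& text) {
//       std::vector<absl::string_view> sub(n + 1);
//       return re->Match(text, 0, text.size(), re2::RE2::ANCHOR_BOTH,
//                        sub.data(), n + 1);
//     };
//   };
```

For a fixed group count, per-match time that grows roughly in proportion to `text_bytes` is consistent with linear-time matching and a large constant. Comparing rows with the same `text_bytes` but different `groups` shows how much that constant depends on the number of capturing groups. Running the same sweep against the old and the latest RE2 builds would show whether an upgrade changes either trend.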