Comments (7)
the problem is this regex specifically /^\s*(\d+\s*)+$/
can you explain what this is for?
from changedetection.io.
Yes, it's quadratic time. If the string being searched has N characters, first it fails to find "x" in all N of 'em, then `.*` advances by one and it fails to find "x" in the trailing N-1 characters, then again in the trailing N-2, and so on. N + N-1 + N-2 + ... + 1 is quadratic in N.
That's how this kind of regexp engine works. And it's mild, as such things go: you can also create poor regexps that take time _exponential_ in N that fail to match certain strings.
It's unlikely this will change without replacing Python's regexp implementation entirely. For why, see Jeffrey Friedl's book "Mastering Regular Expressions" (published by O'Reilly). That book also spells out techniques for crafting regexps that don't suck ;-) It's not a small topic, alas.
https://bugs.python.org/issue35915
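As a rough illustration of the arithmetic in the quoted comment, the toy model below counts character checks when a search restarts from every position and never finds its target. It is only a sketch: `naive_scan_steps` is a hypothetical helper, not how Python's `re` engine is actually implemented.

```python
def naive_scan_steps(text: str, needle: str = "x") -> int:
    """Count character checks in a toy model of 'retry from every start position'."""
    steps = 0
    for start in range(len(text)):
        # From each start position, scan the remaining characters for the needle.
        for ch in text[start:]:
            steps += 1
            if ch == needle:
                return steps
    return steps


for n in (10, 100, 1000):
    # When the needle is absent, the count equals N + (N-1) + ... + 1 = N*(N+1)/2.
    print(n, naive_scan_steps("a" * n), n * (n + 1) // 2)
```

Multiplying the input length by 10 multiplies the work by roughly 100, which is what "quadratic in N" means here.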
from changedetection.io.
The problem is your regex. The other problem is that the system doesn't time out: the regex works, but it takes an exponentially long time to run. Your regex is bad and the system doesn't catch it.
The fix is to place the regex call in a thread and wrap that thread with a timeout. On timeout, it should throw an error that suggests checking all regexes, etc. Maybe something like:
    import re
    import threading

    def search_with_timeout(pattern, text, timeout=3):
        result = [None]

        def run_search():
            result[0] = re.search(pattern, text)

        # Create and start the thread (daemon, so it cannot block interpreter exit)
        search_thread = threading.Thread(target=run_search, daemon=True)
        search_thread.start()

        # Wait for the thread to finish or timeout
        search_thread.join(timeout)

        # If thread is still alive, it means it has exceeded the timeout
        if search_thread.is_alive():
            # Python threads cannot be terminated from outside, so the search
            # keeps running in the background; all we can do is report it.
            raise TimeoutError("Search operation timed out, check your regexes")

        return result[0]

    # Example usage
    pattern = r'your_pattern_here'
    text = 'your_text_here'
    try:
        result = search_with_timeout(pattern, text)
    except TimeoutError as err:
        print(err)
    else:
        if result:
            print("Match found:", result.group())
        else:
            print("No match found.")
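A thread-based timeout has limits in CPython: a thread cannot be killed, and the `re` engine is generally reported to hold the GIL while matching (worth verifying on the interpreter in use), so `join(timeout)` may not return promptly. One alternative is to run the search in a child process, which can be killed. The sketch below is an assumption-laden illustration, not changedetection.io's actual code: `regex_matches_with_timeout` and the exit code `3` are hypothetical, and the text is passed on the command line only for brevity.

```python
import subprocess
import sys

# Code run by the child interpreter: exit 0 on a match, 3 on no match.
_CHILD_CODE = (
    "import re, sys\n"
    "sys.exit(0 if re.search(sys.argv[1], sys.argv[2]) else 3)\n"
)


def regex_matches_with_timeout(pattern: str, text: str, timeout: float = 3.0) -> bool:
    """Return True/False for match/no match; raise TimeoutError if the search runs too long."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE, pattern, text],
            timeout=timeout,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        raise TimeoutError(
            f"regex {pattern!r} timed out after {timeout}s; check your regexes"
        ) from None
    if proc.returncode == 0:
        return True
    if proc.returncode == 3:
        return False
    # Any other exit code means the child failed, e.g. an invalid pattern.
    raise RuntimeError(f"regex search failed with exit code {proc.returncode}")


print(regex_matches_with_timeout(r"^\s*\d+$", " 42"))
try:
    # A long run of digits followed by a non-digit triggers the slow path.
    regex_matches_with_timeout(r"^\s*(\d+\s*)+$", "1" * 40 + "x", timeout=1)
except TimeoutError as err:
    print(err)
```

Starting a process per search is slow, so a real implementation would more likely reuse a worker process and send the text over a pipe.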
from changedetection.io.
the problem is this regex specifically
/^\s*(\d+\s*)+$/
can you explain what this is for?
I am not good at regex. It was created by ChatGPT4. My intention was to ignore lines that have just numbers or spaces. Below is the entire ChatGPT 4 response.
To match a line containing only numbers that may include spaces between them, such as "8 4 6 2 1 9 6 6", and also match lines that have just a number without spaces, you can use the following regular expression:
/^\s*(\d+\s*)+$/
This regular expression does the following:
- `^` asserts the position at the start of the line.
- `\s*` matches any whitespace characters (like spaces or tabs) zero or more times.
- `(\d+\s*)` is a group that matches one or more digits followed by zero or more whitespace characters.
- `+` after the group `(\d+\s*)` means that this group can appear one or more times, allowing for multiple numbers separated by spaces.
- `$` asserts the position at the end of the line.

So, this regex will match a line with a single number, as well as lines with multiple numbers separated by spaces.
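A quick way to check the explanation above is to run the pattern against a few short lines. This is a minimal sketch: `is_number_line` and the sample lines are made up for illustration, and the inputs are kept short on purpose because of the backtracking problem discussed in this thread.

```python
import re

# The pattern from the ChatGPT response, without the /.../ delimiters.
LINE_OF_NUMBERS = re.compile(r"^\s*(\d+\s*)+$")


def is_number_line(line: str) -> bool:
    """True if the line consists only of numbers and whitespace, per the pattern."""
    return LINE_OF_NUMBERS.search(line) is not None


for line in ["8 4 6 2 1 9 6 6", "42", "  7  ", "price 42", "   ", ""]:
    print(repr(line), is_number_line(line))
```

Note that a whitespace-only line does not match, because the group requires at least one digit. Ignoring lines that contain only spaces would need a separate rule.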
from changedetection.io.
The following works and should be better, as per ChatGPT4.
The regular expression you've provided, `/^\s*(\d+\s*)+$/`, is intended to match lines that consist solely of numbers with optional whitespace characters between them. However, it is susceptible to what's known as "catastrophic backtracking," which can occur when the regex engine has to evaluate a large number of possible ways to match a pattern. This happens because the pattern `(\d+\s*)+` is highly ambiguous: `\d+` can match as many digit characters as possible, and `\s*` can match as many whitespace characters as possible. The `+` at the end allows this entire group to repeat, creating many possible combinations for the regex engine to try and match.

To improve the performance of this regex, we can try to make the quantifiers less ambiguous and remove unnecessary repetition. A revised version might look like this:
/^\s*\d+(?:\s+\d+)*\s*$/
Here’s what’s changed:
- Instead of `(\d+\s*)+`, it now uses `\d+(?:\s+\d+)*`, which will match a number, followed by zero or more groups of one or more whitespace characters followed by another number. This pattern is less prone to backtracking because the `+` inside the non-capturing group `(?: ... )` requires at least one whitespace character to be present for a match to continue, eliminating the ambiguity of `\s*`.
- The non-capturing group `(?: ... )` is used with `*` to match any additional numbers separated by whitespace without capturing them, which is more efficient in many regex engines.

This optimized pattern should perform much better because it guides the regex engine more precisely, reducing the potential for excessive backtracking.
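The claim that the two patterns accept the same lines can be spot-checked on short inputs, and the revised pattern can then be tried on an input that the original struggles with. This is a sketch with made-up sample lines; the original pattern is deliberately not run on the long input, since it could take a very long time.

```python
import re

ORIGINAL = re.compile(r"^\s*(\d+\s*)+$")
REVISED = re.compile(r"^\s*\d+(?:\s+\d+)*\s*$")

# Short samples only: safe for both patterns.
samples = ["8 4 6 2 1 9 6 6", "42", "  7  ", "12  34 ", "", "   ", "price 42", "4 2x"]
for line in samples:
    same = bool(ORIGINAL.search(line)) == bool(REVISED.search(line))
    print(repr(line), "same verdict:", same)

# A long run of digits followed by a non-digit is the problematic case
# for the original pattern, so only the revised pattern is run on it.
hard_input = "1" * 5000 + "x"
print(REVISED.search(hard_input))
```

The revised pattern rejects the long input quickly because a run of digits can only be consumed by a single `\d+`. The original could split the same run across repetitions of the group in many different ways.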
from changedetection.io.
I guess this is the classic old problem of pasting code that you don't fully understand.
from changedetection.io.
I guess this is the classic old problem of pasting code that you don't fully understand.
So true. That's exactly why good software should have all kinds of safety nets wherever it allows users to inject code in an input box.
Your proposed solution to time out and throw an error looks good.
from changedetection.io.