Comments (7)
grep and grepi can be used directly, so you could do something like:
filter:
  - grepi: 'price: <span>.*</span>'
  - re.sub:
      pattern: '^.*(price: <span>.*</span>).*$'
      repl: '\1'
from urlwatch.
findall might be easier though, what are you thinking for the output, just put each match on a new line?
> findall might be easier though, what are you thinking for the output, just put each match on a new line?
Yes, that's what I was thinking too.
I haven't checked the source yet to see how it is implemented, but it felt like it could be integrated easily. Maybe the similar names of re.sub in urlwatch and the re package fooled me.
Yes, it's not hard to add. It takes a little more than re.sub because you have to do something with the matches, whereas re.sub just gives you the string to return.
https://github.com/thp/urlwatch/blob/master/lib/urlwatch/filters.py#L831
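To illustrate the difference described above, here is a minimal sketch using only Python's standard re module. It is not urlwatch's filter code; the helper name join_matches and the sample text are hypothetical. re.sub hands back a finished string, while re.findall returns a list that still has to be turned into output, here by putting each match on its own line.

```python
import re


def join_matches(pattern: str, text: str) -> str:
    """Hypothetical helper: return every match of pattern, one per line."""
    matches = re.findall(pattern, text)
    # With more than one capture group, findall yields tuples; flatten them.
    lines = [" ".join(m) if isinstance(m, tuple) else m for m in matches]
    return "\n".join(lines)


# Assumed sample input, not taken from a real page.
text = "price: <span>1.00</span> foo\nprice: <span>2.50</span> bar"

# re.sub already returns the string to hand back.
sub_result = re.sub(r"<span>(.*?)</span>", r"\1", text)

# re.findall returns a list, so the matches need to be joined first.
findall_result = join_matches(r"<span>(.*?)</span>", text)
```

Joining with a newline is only one possible choice for the output. It matches the "each match on a new line" idea discussed in this thread.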
I actually have a couple of places where this would simplify my filters, so I have put an implementation in #805. See what you think.
For filtering out HTML elements, use the CSS or XPath filters. Never use regex.
> For filtering out HTML elements, use the CSS or XPath filters. Never use regex.
For me, this was not the intention here. It's more that you want to extract certain data. I just tried to make my problem clearer by reusing the previous example from the urlwatch docs. The scenario is rather that you have a p element and want to extract some data from it, e.g. "[...] and therefore for blablabla we set the price of 2.39€. [...]". The idea is to grab only the data 2.39€ without the rest of the text.
With re.sub you always have to build a regex that matches the whole paragraph, which is error-prone. And the grep solution only works on full lines.
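A small sketch of that contrast, using plain Python re rather than urlwatch filters. The surrounding sentence text and the price pattern are assumptions made up for the example.

```python
import re

# Assumed paragraph text around the sample sentence.
paragraph = (
    "Some intro text and therefore for blablabla we set "
    "the price of 2.39€. Some more text."
)

# re.sub approach: the pattern has to consume the whole paragraph
# so that everything except the captured group is discarded.
whole_pattern = r"^.*price of (\d+\.\d+€).*$"
via_sub = re.sub(whole_pattern, r"\1", paragraph, flags=re.DOTALL)

# If the pattern does not match, re.sub silently returns the input unchanged.
unchanged = re.sub(whole_pattern, r"\1", "no price here", flags=re.DOTALL)

# findall approach: the pattern only describes the wanted data.
via_findall = "\n".join(re.findall(r"\d+\.\d+€", paragraph))
```

Both approaches yield the price for this input, but only the findall pattern stays independent of the text around it.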
I was actually inspired by changedetection.io, which I tried recently because of its GUI, and it has this nice data extraction feature. However, its scripting is much more troublesome, so I would like to stick with urlwatch.
OT: It's a bit frustrating that open source so often reinvents the wheel instead of joining forces. It would have been amazing if changedetection.io had used urlwatch under the hood to build a more powerful solution.