Comments (2)
Hey, thanks for the list. Most of these are carried over from Instapaper, from when I imported their site rules. Instapaper no longer makes the rules public, but they used to be open for anyone to contribute to (like this repository). I didn't implement all their directives, so most of these will just be ignored. Here's the list from Instapaper (at least I think all of these are from them; some might be users experimenting or guessing):
- convert_double_br_tags
- strip_comments
- move_into
- autodetect_next_page
- dissolve
- footnotes
- wrap_in
Of these, I'd like to implement dissolve. I think it removes the containing element without removing its contents. It would've been useful for that French site which had special links on regular words (linked to a dictionary, I think). We ended up with a somewhat hacky solution instead.
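To make that concrete, here is a minimal sketch of what I assume dissolve would do, written in Python with the standard library's minidom rather than in Full-Text RSS itself. The function name `dissolve` and the sample markup are made up for illustration, and the behaviour is my guess at Instapaper's semantics, not something confirmed by their rules.

```python
from xml.dom import minidom


def dissolve(node):
    """Remove `node` from the tree but keep its children in its place."""
    parent = node.parentNode
    # insertBefore() detaches the child from `node` before re-attaching it,
    # so node.firstChild advances on every iteration and order is preserved.
    while node.firstChild is not None:
        parent.insertBefore(node.firstChild, node)
    parent.removeChild(node)


# Hypothetical input: a regular word wrapped in a dictionary link.
doc = minidom.parseString(
    '<p>A <a href="/dict/word">word</a> linked to a dictionary.</p>'
)

# Copy the node list first, since we modify the tree while iterating.
for link in list(doc.getElementsByTagName("a")):
    dissolve(link)

# Merge the text nodes that are now adjacent.
doc.documentElement.normalize()
result = doc.documentElement.toxml()
print(result)
```

With this reading, the link element disappears and only its text stays in the paragraph.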
These others are implemented in Full-Text RSS:
native_ad_clue
Introduced in Full-Text RSS 3.4. Used to identify if a given article is a native ad. Ad Detector has a lot of rules.
if_page_contains
Introduced in Full-Text RSS 3.5. This is only used with single_page_link at the moment. It was added to make single_page_link directives conditional. Sometimes these rules use XPath functions such as concat, as in the example you linked to:
single_page_link: concat(//meta[@property="og:url"]/@content, '?print=1')
if_page_contains: //a[contains(@class, "articleNav")]
Here, single_page_link will always return a string, so even if the meta element doesn't exist, you'll get '?print=1'. For some sites, the single page view is only available on multi-page articles. When constructing URLs like this, we need a way to make it conditional. Otherwise we'd end up redirecting to a non-existent page, or simply unnecessarily requesting another page when the current one contains everything we need. So that's what if_page_contains does at the moment.
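The logic can be sketched roughly like this. It's a standalone Python illustration, not the Full-Text RSS code: the class `PageFacts`, the function `single_page_url`, and the sample pages are hypothetical, and the two XPath expressions from the example are hand-coded as simple attribute checks instead of being evaluated by a real XPath engine.

```python
from html.parser import HTMLParser


class PageFacts(HTMLParser):
    """Collects the two things the example rules look at."""

    def __init__(self):
        super().__init__()
        self.og_url = ""              # stands in for //meta[@property="og:url"]/@content
        self.has_article_nav = False  # stands in for //a[contains(@class, "articleNav")]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:url" and not self.og_url:
            self.og_url = attrs.get("content") or ""
        elif tag == "a" and "articleNav" in (attrs.get("class") or ""):
            self.has_article_nav = True


def single_page_url(html):
    """Return the single-page URL, or None when the condition does not match."""
    facts = PageFacts()
    facts.feed(html)
    # if_page_contains: skip the rule unless the page looks multi-page.
    if not facts.has_article_nav:
        return None
    # single_page_link: concat() always yields a string, even if og_url is "".
    return facts.og_url + "?print=1"


head = '<head><meta property="og:url" content="https://example.org/story"></head>'
multi_page = "<html>" + head + '<body><a class="articleNav next" href="?p=2">Next</a></body></html>'
single_page = "<html>" + head + "<body><p>Everything is here.</p></body></html>"

print(single_page_url(multi_page))
print(single_page_url(single_page))
```

Without the `has_article_nav` check, the second page would also produce a URL and trigger an extra request for a page that may not exist.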
single_page_link_in_feed
This one should be documented, but it's not widely used. It's basically the same as single_page_link, but applied to the original feed item's description, so it's safe to ignore if the input URL is not a feed. See this question and our help page.
from ftr-site-config.
Of these, I'd like to implement dissolve.
It might be a good idea. From what I understand, it'll flatten the target node?
Like:
<ul>
  <li>
    <div>my text</div>
  </li>
</ul>
If I have dissolve: //ul/li, it'll turn the node into:
<div>my text</div>
Am I right?
Thanks for the explanation of the patterns implemented in Full-Text RSS.
For the unused ones, maybe we can just remove them from the site config files to avoid confusion?
- convert_double_br_tags
- strip_comments
- move_into
- autodetect_next_page
- footnotes
- wrap_in