
Comments (2)

fivefilters commented on June 23, 2024

Hey, thanks for the list. Most of these were carried over from Instapaper when I imported their site rules. Instapaper no longer makes the rules public, but they used to be open for anyone to contribute (like this repository). I didn't implement all their directives, so most of these will just be ignored. Here's the list from Instapaper (at least I think all of these are from them; some might be users experimenting or guessing):

  • convert_double_br_tags
  • strip_comments
  • move_into
  • autodetect_next_page
  • dissolve
  • footnotes
  • wrap_in

Of these, I'd like to implement dissolve. I think it removes the containing element while keeping its contents. It would have been useful for that French site which wrapped ordinary words in special links (to a dictionary, I think); we ended up with a somewhat hacky solution there.
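As a rough illustration of that reading of dissolve (drop the wrapper, keep what is inside), here is a minimal Python sketch using only the standard library. It is not how Instapaper or Full-Text RSS implement anything: the `dissolve` function and the sample markup are hypothetical, and ElementTree only understands a subset of XPath, so the path is written as `.//a` rather than `//a`.

```python
import xml.etree.ElementTree as ET


def dissolve(root, path):
    """Remove each element matching `path`, splicing its text and children
    into its parent. Hypothetical helper, not part of any site-config tool."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for el in root.findall(path):
        parent = parent_of.get(el)
        if parent is None:  # the root itself has no parent to splice into
            continue
        index = list(parent).index(el)
        children = list(el)
        if children:
            # Text that followed the wrapper now follows its last child.
            children[-1].tail = (children[-1].tail or "") + (el.tail or "")
            lead = el.text or ""
        else:
            lead = (el.text or "") + (el.tail or "")
        # Leading text joins the previous sibling's tail or the parent's text.
        if index > 0:
            prev = parent[index - 1]
            prev.tail = (prev.tail or "") + lead
        else:
            parent.text = (parent.text or "") + lead
        parent.remove(el)
        for offset, child in enumerate(children):
            parent.insert(index + offset, child)
            parent_of[child] = parent  # keep the map valid for nested matches
    return root


# Made-up markup in the spirit of the dictionary-link case.
sample = ET.fromstring('<p>Le <a href="/dico/chat">chat</a> dort.</p>')
dissolve(sample, ".//a")
print(ET.tostring(sample, encoding="unicode"))  # <p>Le chat dort.</p>
```

The link element disappears but its text stays in place, which is the behaviour guessed at above.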

These others are implemented in Full-Text RSS:

native_ad_clue
Introduced in Full-Text RSS 3.4. Used to identify if a given article is a native ad. Ad Detector has a lot of rules.

if_page_contains
Introduced in Full-Text RSS 3.5. This is only used with single_page_link at the moment. Added to make single_page_link directives conditional. Sometimes these rules use XPath functions like concat, like in the example you linked to:

  single_page_link: concat(//meta[@property="og:url"]/@content, '?print=1')
  if_page_contains: //a[contains(@class, "articleNav")]

Here, single_page_link will always return a string, so even if the meta element doesn't exist, you'll get '?print=1'. For some sites, the single page view is only available on multi-page articles. When constructing URLs like this, we need a way to make it conditional. Otherwise we'd end up redirecting to a non-existent page, or simply unnecessarily requesting another page when the current one contains everything we need. So that's what if_page_contains does at the moment.
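To make the problem concrete, here is a small Python sketch that emulates the two directives from the example with the standard library. It is only an approximation: ElementTree does not support `concat()` or `contains()`, so both are imitated in plain Python, and the function names and example.com pages are hypothetical rather than anything from Full-Text RSS.

```python
import xml.etree.ElementTree as ET


def concat_link(page, suffix="?print=1"):
    """Imitates concat(//meta[@property="og:url"]/@content, '?print=1').
    Like XPath concat, it always returns a string."""
    meta = page.find(".//meta[@property='og:url']")
    base = meta.get("content", "") if meta is not None else ""
    return base + suffix


def conditional_link(page):
    """Imitates pairing the rule with
    if_page_contains: //a[contains(@class, "articleNav")]."""
    has_nav = any("articleNav" in a.get("class", "") for a in page.iter("a"))
    return concat_link(page) if has_nav else None


# Hypothetical pages: one multi-page article, one single-page article,
# and one page with no og:url meta element at all.
multi = ET.fromstring(
    '<html><head><meta property="og:url" content="https://example.com/story"/>'
    '</head><body><a class="articleNav next" href="?page=2">Next</a></body></html>'
)
single = ET.fromstring(
    '<html><head><meta property="og:url" content="https://example.com/story"/>'
    "</head><body><p>Whole article.</p></body></html>"
)
no_meta = ET.fromstring("<html><head/><body><p>Short piece.</p></body></html>")

print(concat_link(no_meta))      # ?print=1
print(conditional_link(single))  # None
print(conditional_link(multi))   # https://example.com/story?print=1
```

Without the condition, the page lacking the meta element still yields a bare '?print=1', and the single-page article would trigger an unnecessary second request.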

single_page_link_in_feed
This one should be documented, but it's not widely used. Basically the same as single_page_link but applied to the original feed item's description. So safe to ignore if the input URL is not a feed. See this question and our help page.

from ftr-site-config.

j0k3r commented on June 23, 2024

> Of these, I'd like to implement dissolve.

It might be a good idea. From what I understand, it'll flatten the target node?
Like:

<ul>
  <li>
    <div>my text</div>
  </li>
</ul>

If I have dissolve: //ul/li, it'll turn the node into:

    <div>my text</div>

Am I right?

Thanks for the explanation of the patterns implemented in Full-Text RSS.

For the unused list, maybe we can just remove them from siteconfig to avoid confusion?

  • convert_double_br_tags
  • strip_comments
  • move_into
  • autodetect_next_page
  • footnotes
  • wrap_in

