
fivefilters commented on June 28, 2024

Hi Holger, to be clear, Full-Text RSS itself doesn't do anything magic here. It simply carries out the string replacement. So in your example, with the <p> replacement, the input HTML you provided:

<h2 class="clay-subheader" ...>
   <span class="ordered-list-item">
      <p class="list-item-text">1.</p>
   </span>
   You don’t have to read everyone’s book.
</h2>

becomes the following after string replacement:

<h2 class="clay-subheader" ...>
   <span class="ordered-list-item">
      <anydifferenttag>1.</p>
   </span>
   You don’t have to read everyone’s book.
</h2>

The magic that you're seeing happens with the HTML parser when it parses the above. Full-Text RSS relies on HTML5-PHP to parse HTML, which tries to follow HTML5 parsing rules. It's kind of the same way a browser will try to make sense of a malformed HTML document.
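
To make the two steps concrete, here is a minimal Python sketch. It is not how Full-Text RSS is implemented (that is PHP, with HTML5-PHP doing the parsing). The `replace_string` helper and `TagLogger` class are hypothetical names, and the elided `...` attributes on the `<h2>` are simply left out. Python's `html.parser` only tokenizes and does no HTML5 tree repair, so it shows exactly what the parser is handed: an opening `<anydifferenttag>` followed by a `</p>` that no longer has a matching start tag.

```python
from html.parser import HTMLParser

# Simplified copy of the input from the example above (elided attributes omitted).
SOURCE = (
    '<h2 class="clay-subheader">'
    '<span class="ordered-list-item">'
    '<p class="list-item-text">1.</p>'
    '</span>'
    'You don’t have to read everyone’s book.'
    '</h2>'
)


def replace_string(html, find, replace):
    """Hypothetical stand-in for a replace_string rule: a plain substring swap."""
    return html.replace(find, replace)


class TagLogger(HTMLParser):
    """Records start and end tags in order, without trying to repair anything."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(html):
    logger = TagLogger()
    logger.feed(html)
    logger.close()
    return logger.events


replaced = replace_string(SOURCE, '<p class="list-item-text">', '<anydifferenttag>')
events = tag_events(replaced)

for kind, tag in events:
    print(kind, tag)
```

The event list contains an `end p` with no `start p` before it. Deciding what to do with that stray end tag is the job of the tree-building parser, which is where the restructuring you noticed comes from.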

I think the ideal way to deal with such changes is with more advanced rules, some of which were supported by the original Instapaper rules that these are based on. Things such as unwrap: XPath or move_into() would let you manipulate the DOM without string replacement. We haven't seen many cases where those are essential, so we haven't added support for them.

As for whether to use the string replacement method you've highlighted and rely on the parser to figure out the correct structure, I'm not sure. It's a bit hacky. I personally wouldn't do it for minor formatting improvements, but I don't have very strong feelings about it.

There are some site config files where we've used string replacement to signal an end to the document where a suitable XPath couldn't be found or it was just much simpler than constructing one to isolate the content. Imagine something like this:

replace_string(<!--end of article-->):</body></html>

We're not creating well-formed HTML, but we're hoping the HTML parser does what we want and ends the document at the point the comment is encountered and ignores all the other elements that follow.
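
As a rough illustration of what that rule does to the input, here is a small Python sketch. The page markup and the `end_document_at` helper are made up for the example and are not taken from any real site config or from Full-Text RSS itself. It only performs the substring swap and then splits the result at the injected closing tags, to show that the trailing markup is still physically present in the string. Whether it ends up ignored depends entirely on how the parser treats content after `</body></html>`.

```python
END_MARKER = "<!--end of article-->"


def end_document_at(html, marker=END_MARKER):
    """Hypothetical helper: swap the marker comment for closing body/html tags."""
    return html.replace(marker, "</body></html>")


# Made-up page: article text, the marker comment, then unwanted trailing content.
page = (
    "<html><body>"
    "<p>Article text.</p>"
    "<!--end of article-->"
    '<div class="related">Related links</div>'
    "</body></html>"
)

result = end_document_at(page)

# Split at the first injected closing pair to see what sits on either side.
before, _, after = result.partition("</body></html>")

print("before:", before)
print("after: ", after)
```

Everything in `after` is the part we are hoping the parser discards. The replacement itself does not remove it.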

from ftr-site-config.
