Comments (3)

fivefilters commented on June 23, 2024

Hi Vladimir, I think the pages still contain the content, but for some reason the HTML5 parser doesn't seem able to parse it properly.

In my tests, switching to the libxml parser does appear to work. Could you try the updated site config for Gizmodo and see if you have any luck? https://github.com/fivefilters/ftr-site-config/blob/master/gizmodo.com.txt

If it doesn't work for you, can you please give us a URL to test with, so we can see if there's another solution?

When I looked, the content in the JSON-LD didn't contain any HTML markup, and at the moment the Full-Text RSS code only uses JSON-LD for other metadata, not for the content.
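To check what a page's JSON-LD actually carries, its script type="application/ld+json" blocks can be pulled out and inspected. The following is a minimal Python sketch using only the standard library. The sample page, its schema.org headline/articleBody values, and the helper names (extract_jsonld, body_has_markup) are hypothetical, and this is not how Full-Text RSS itself reads JSON-LD.

```python
import json
import re
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so concatenate them
        if self._in_jsonld:
            self.blocks[-1] += data


def extract_jsonld(html):
    """Return the parsed JSON object from each non-empty JSON-LD block."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return [json.loads(block) for block in collector.blocks if block.strip()]


def body_has_markup(text):
    """Rough check: does the string contain anything that looks like an HTML tag?"""
    return bool(re.search(r"<[a-zA-Z/][^>]*>", text))


# Hypothetical page: the article body exists only as plain text in the JSON-LD
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@type": "NewsArticle", "headline": "Example headline",
 "articleBody": "First paragraph. Second paragraph."}
</script>
</head><body><div id="root"></div></body></html>"""

if __name__ == "__main__":
    article = extract_jsonld(SAMPLE_PAGE)[0]
    print(article["headline"])
    print(body_has_markup(article["articleBody"]))
```

A plain-text articleBody like this one loses paragraph and link structure, which is one reason it makes a poor substitute for the HTML content.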

from ftr-site-config.

vshabanov commented on June 23, 2024

Oh, that's great. I had tried searching the HTML for some of the article text, but they randomly insert <!-- --> comments into the HTML, so I didn't find it.

Both Gizmodo and Lifehacker are working now. Thank you!
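Because empty <!-- --> comments can sit in the middle of a sentence, a plain text search over the raw HTML can miss a phrase that is really there. A rough workaround, sketched in Python, is to strip comments first and search afterwards. This assumes the comments are what splits the phrase; the sample markup and the function names are made up for illustration.

```python
import re

# Matches any HTML comment, including empty "<!-- -->" ones. Non-greedy so that
# separate comments are removed individually; DOTALL so multi-line comments match too.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(html):
    """Remove all HTML comments from a raw HTML string."""
    return COMMENT_RE.sub("", html)


def contains_text(html, needle):
    """Search raw HTML for a phrase after removing comments that may split it."""
    return needle in strip_comments(html)


raw = "<p>The quick <!-- -->brown fox</p>"
print("quick brown" in raw)            # naive search over the raw source
print(contains_text(raw, "quick brown"))
```

The naive search fails and the comment-stripped search succeeds. Phrases that cross a tag boundary (for example into a link) would still be missed, since only comments are removed.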

fivefilters commented on June 23, 2024

It's pretty strange. The HTML they serve is a mess. If you disable JavaScript in the browser, nothing loads (just a blank screen). For some reason, Firefox's developer tools don't seem to be able to parse the HTML without JavaScript either, so Firefox produces a very minimal DOM tree compared to the actual source HTML returned by the server. I think the same thing happens when Full-Text RSS tries to use the HTML5 parser (HTML5PHP), although I haven't tested extensively. But the HTML does contain the content, and libxml can parse it, so the main change in the site config file that makes this work again is this line: parser: libxml. Very odd.
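One way to confirm that the server's HTML really contains the article, without running any JavaScript, is to feed the raw source to a lenient parser and look for the text. This Python sketch uses the standard library's html.parser as a stand-in. It is not libxml or the HTML5 parser that Full-Text RSS chooses between, and the sample markup and class/function names are invented. It skips script and style contents and ignores comments, so text split by <!-- --> comes back joined.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes from raw HTML, skipping <script> and <style> contents.

    Comments are routed to handle_comment (not overridden here), so they are
    dropped and text split by "<!-- -->" is rejoined when chunks are concatenated.
    """

    def __init__(self):
        super().__init__()
        self._skip = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def visible_text(html):
    """Return the document's text with whitespace runs collapsed to single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Hypothetical messy page: empty app root, unclosed <p>, comment mid-sentence
page = ('<div id="root"></div><article><p>The article <!-- -->text is here.'
        '<p>Second paragraph</article><script>render()</script>')
print("The article text is here." in visible_text(page))
```

This only confirms the text is present in the source. Unlike a site config rule, it doesn't locate the article element, and text from adjacent block elements is joined without a space.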
