Comments (3)
Hi Vladimir, I think they still contain the content, but for some reason the HTML5 parser does not seem to be able to parse it properly.
In my tests switching to the libxml parser does appear to work. Are you able to try the updated site config for gizmodo and see if you have any luck: https://github.com/fivefilters/ftr-site-config/blob/master/gizmodo.com.txt
If not, could you please give us a URL so we can test with it and see if there's another solution?
When I looked, the content in JSON-LD did not contain any HTML markup, and at the moment the Full-Text RSS code only uses JSON-LD for other metadata, not content.
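To illustrate the JSON-LD point above, here is a minimal Python sketch (not Full-Text RSS code, which is PHP) that pulls JSON-LD blocks out of a page and checks whether the body text carries any markup. The sample HTML, the `extract_json_ld` helper, and the use of the schema.org `articleBody` property are assumptions made for the example, not taken from the actual Gizmodo pages.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script content can arrive in several pieces, so buffer it.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._chunks))


def extract_json_ld(html):
    """Hypothetical helper: return the parsed JSON-LD objects found in html."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return [json.loads(block) for block in collector.blocks]


# Made-up page for illustration only.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@type": "NewsArticle", "headline": "Example headline",
 "articleBody": "First paragraph. Second paragraph."}
</script>
</head><body></body></html>"""

article = extract_json_ld(SAMPLE_HTML)[0]
print(article["headline"])
print("<" in article["articleBody"])
```

In this sample the body is plain text with no tags, which is why it would work for metadata such as the headline but not as a replacement for the formatted article content.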
from ftr-site-config.
Oh, that's great. I tried searching for some of the text, but they randomly insert `<!-- -->` comments in the HTML, so I couldn't find it.
Both Gizmodo and Lifehacker are working now. Thank you!
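For anyone hitting the same search problem: one way around the stray `<!-- -->` comments is to strip them from the source before searching. This is a rough Python sketch, assuming the comments are simple and not nested; the sample markup and the `strip_comments` helper are made up for illustration.

```python
import re

# Matches HTML comments, including ones that span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(html):
    """Hypothetical helper: remove HTML comments so text can be searched."""
    return COMMENT_RE.sub("", html)


# Made-up sample showing comments inserted in the middle of a sentence.
SAMPLE = "<p>The quick<!-- --> brown <!-- -->fox</p>"
needle = "The quick brown fox"

print(needle in SAMPLE)                  # False
print(needle in strip_comments(SAMPLE))  # True
```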
It's pretty strange. The HTML they serve is a mess. If you disable JavaScript in the browser, nothing loads (just a blank screen), and Firefox's developer tools don't seem to be able to parse the HTML without JavaScript for some reason, so the DOM tree Firefox produces is very minimal compared to the actual source HTML returned by the server. I think the same thing happens when Full-Text RSS tries to use the HTML5 parser (HTML5PHP), although I haven't tested extensively. But the HTML does contain the content, and libxml can parse it, so the main change in the site config file that makes this work again is `parser: libxml`. Very odd.