Comments (14)
Could you provide an example? I just tried the first article that loaded and got this result (plenty of images):
from ftr-site-config.
I'll try to explain.
Even in the link you provided, if you visit The Verge page, you'll notice that the header image (or the first image, whatever you want to call it) doesn't load. For a lot of pages, this is the only image in the entire post.
Take a look at this page: it loads no image, but The Verge's website certainly has one:
from ftr-site-config.
Ah, I see what you mean. Yes, this is an issue which we hope to improve. The main problem is that these feature images are often outside the main body element. It's possible to include them with custom rules (I'll try to add one for theverge.com) but the ideal solution would be something a little smarter that can try to detect them.
A few versions ago we actually added code to Full-Text RSS that would look for the og:image meta element and insert that into the start of the extracted article if and only if the extracted HTML contained no image elements. I need to see why it's not working for the example you provided, as it should really be including this image.
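For readers following along, the fallback described above can be sketched roughly as follows. This is only an illustration in Python using the standard library, not the actual Full-Text RSS code; the function name `add_og_image_fallback` and the helper class are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class _TagScanner(HTMLParser):
    """Records whether any <img> was seen and the first og:image URL."""

    def __init__(self):
        super().__init__()
        self.has_img = False
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self.has_img = True
        elif tag == "meta" and attrs.get("property") == "og:image":
            if self.og_image is None:
                self.og_image = attrs.get("content")


def _scan(html):
    scanner = _TagScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


def add_og_image_fallback(page_html, extracted_html):
    """Prepend the page's og:image to the extracted article, but only
    when the extracted HTML contains no <img> elements (hypothetical helper)."""
    if _scan(extracted_html).has_img:
        return extracted_html  # article already has an image: leave it alone
    og_image = _scan(page_html).og_image
    if not og_image:
        return extracted_html  # nothing to fall back on
    return '<img src="{}">'.format(escape(og_image, quote=True)) + extracted_html
```

Note that under this logic a single unrelated image inside the extracted body (an icon, for instance) would be enough to suppress the fallback.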
from ftr-site-config.
I've tried this using various XPath patterns, but the output remains the same. Another feed with a similar issue is nytimes.com.
For some weird reason, even when I select the topmost div element, the main image (placeholder image) is always skipped. Same with this.
from ftr-site-config.
Just updated the site config for The Verge, so this issue should be fixed for this site if you try the links above again.
This line in the site config
body: //div[contains(@class, 'c-entry-content') or contains(@class, 'c-entry-hero__image')]
was changed to
body: //picture[contains(@class, 'c-picture')] | //div[contains(@class, 'c-entry-content') or contains(@class, 'c-entry-hero__image')]
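To illustrate why the extra `//picture[...]` branch matters, here is a rough Python emulation of what the two body patterns select. It only mimics `//tag[contains(@class, '...')]` with substring matching via the standard library HTML parser; it is not a real XPath engine, and the sample markup is a made-up stand-in, not The Verge's actual HTML.

```python
from html.parser import HTMLParser

# Each rule emulates //tag[contains(@class, 'needle')]
OLD_RULES = [("div", "c-entry-content"), ("div", "c-entry-hero__image")]
NEW_RULES = [("picture", "c-picture")] + OLD_RULES


class BodyMatcher(HTMLParser):
    """Collects (tag, class) for every start tag that matches any rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.matches = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if any(tag == t and needle in css_class for t, needle in self.rules):
            self.matches.append((tag, css_class))


def matched_elements(html, rules):
    matcher = BodyMatcher(rules)
    matcher.feed(html)
    matcher.close()
    return matcher.matches


# Hypothetical page: the hero image sits in a <picture> outside the body div.
SAMPLE = """
<picture class="c-picture"><img src="hero.jpg"></picture>
<div class="c-entry-content"><p>Article text.</p></div>
"""

print(matched_elements(SAMPLE, OLD_RULES))  # only the content div
print(matched_elements(SAMPLE, NEW_RULES))  # the picture and the content div
```

With the old rules the `<picture>` element is never selected, so the header image cannot appear in the output; the union in the new pattern picks it up alongside the article body.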
from ftr-site-config.
Thanks a lot. For some weird reason, my own installation (v3.5) doesn't show images even with the latest config update. Strange.
from ftr-site-config.
Could you test whether this works with older versions? I tried your config with two older versions and it didn't work with either.
Thanks again. Appreciate the help.
from ftr-site-config.
No time at the moment to test older versions, but I can't see why it'd be an issue. It might have something to do with the parser being used or the lazy image replacement. My suggestion is to try enabling debug on our hosted version and on your own version, then compare the results.
from ftr-site-config.
Great suggestion. Appreciate all the help. @fivefilters
from ftr-site-config.
I solved it. Turning off the html5php parser resolved the issue. Must be something with my system. @fivefilters
from ftr-site-config.
I'll just put this link here. The same thing happens with NYTimes too: the header image is missing.
from ftr-site-config.
Fixed for that too: https://github.com/fivefilters/ftr-site-config/blob/master/mobile.nytimes.com.txt
from ftr-site-config.
Thank you @fivefilters! I'm terribly sorry, but the image still doesn't show up:
from ftr-site-config.
It does now, yay!
Actually sometimes it does, sometimes it doesn't. Weird.
from ftr-site-config.