bndr / node-read
Get Readable Content from any page. Based on Arc90's readability project using cheerio engine.
License: Apache License 2.0
First off, node-read is an awesome project and has been working great! Thanks a ton.
This issue has come up a few times -- it seems like links inside article content get removed, which can be jarring when the link is wrapping a sentence mid-paragraph.
An example is this Vox article: http://www.vox.com/2015/7/27/9044485/rush-limbaugh-donald-trump
See in-browser:
and in reader-mode:
I think the issue is this comparison not using the length of the link's inner-text: https://github.com/bndr/node-read/blob/master/lib/utils.js#L100
I'll make a PR shortly.
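To make the point about inner-text length concrete, here is a minimal sketch of a link-density measure based on the text inside links rather than on their markup. It is not node-read's code: `linkDensity` and its two parameters are hypothetical names, and the sample paragraph is invented for illustration.

```javascript
// Hypothetical helper, not node-read's implementation.
// totalText: all text inside the node; linkTexts: the inner text of each <a> in it.
function linkDensity(totalText, linkTexts) {
  const totalLength = totalText.length;
  if (totalLength === 0) return 0; // avoid dividing by zero on empty nodes
  const linkLength = linkTexts.reduce((sum, text) => sum + text.length, 0);
  return linkLength / totalLength;
}

// A sentence with one short link in the middle of the paragraph.
const paragraph = 'Limbaugh said on his show that the remarks were a mistake.';
const links = ['on his show'];
const density = linkDensity(paragraph, links);
```

Measured this way, a short link wrapped around a few words contributes only those few characters, so a paragraph like the one above stays well under a 0.25 threshold.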
I tested https://github.com/bndr/node-read, it only until
so it's easy to switch
the rest is invalid? what happens?
node-readability is able to get the main content from this website (http://www.theguardian.com/world/2014/apr/27/ukraine-kidnapped-observers-slavyansk-vyacheslav-ponomarev), but node-read returns an empty string.
Not sure if this is really an 'issue', but I got the idea from the description that node-read would be able to return content for sites that node-readability could?
I'm just curious: any chance of an API to fetch the main image, videos, and favicon of the article soon?
var linkDensity = getLinkDensity(node, $);
var len = node.text().length;
if (len < 3) return;
if (len > 80 && linkDensity < 0.25) {
  append = true;
} else if (len < 80 && linkDensity == 0 && node.text().replace(regexps.trimRe, "").length > 0) {
  append = true;
}
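The thresholds in the snippet above are easier to reason about as a standalone function. The sketch below restates them under some assumptions: `shouldAppend` is a hypothetical name, the link density is passed in as a number, and `String.prototype.trim` stands in for the `regexps.trimRe` replacement, whose exact pattern is not shown here.

```javascript
// Hypothetical restatement of the quoted thresholds; not node-read's actual function.
function shouldAppend(text, linkDensity) {
  const len = text.length;
  if (len < 3) return false; // too short to be content
  if (len > 80 && linkDensity < 0.25) return true; // long text, few links
  // Short text is kept only when it has no links and is not just whitespace.
  if (len < 80 && linkDensity === 0 && text.trim().length > 0) return true;
  return false;
}

const longWithLinks = shouldAppend('x'.repeat(81), 0.1);  // long branch
const shortNoLinks = shouldAppend('short text', 0);       // short branch
const shortWithLink = shouldAppend('short text', 0.1);    // short text containing a link
const exactlyEighty = shouldAppend('x'.repeat(80), 0);    // matches neither branch
```

Two things stand out when the rules are written this way: text of exactly 80 characters satisfies neither comparison, and any short text with a non-zero link density is rejected.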
I parsed this web page: http://techdaily.cn/ios-android-pay-wear-zhiwenshibie.html.
The article content came back cleaned; for example, image tags were removed.
Why isn't the content the original HTML? It shouldn't be cleaned or modified.
I'm getting:
Uncaught TypeError: Cannot convert undefined or null to object
Seems to happen here:
function getInverseObj(obj){
  return Object.keys(obj).sort().reduce(function(inverse, name){
    inverse[obj[name]] = "&" + name + ";";
    return inverse;
  }, {});
}
Without the require my project runs without an issue...
Am I missing some dependency or anything?
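The error message is the one `Object.keys` throws when its argument is `undefined` or `null`, which suggests `obj` was missing when `getInverseObj` was called. The sketch below reproduces that with the function quoted above; the entity map passed in the second call is a made-up two-entry example, and why the real map would be missing (for instance, a data file that was not loaded) is an assumption the report does not confirm.

```javascript
// Same function as quoted in the report.
function getInverseObj(obj) {
  return Object.keys(obj).sort().reduce(function (inverse, name) {
    inverse[obj[name]] = '&' + name + ';';
    return inverse;
  }, {});
}

// Calling it without a map reproduces the TypeError.
let caught = null;
try {
  getInverseObj(undefined);
} catch (err) {
  caught = err;
}

// With a map present it inverts name -> character into character -> entity.
const inverse = getInverseObj({ amp: '&', lt: '<' });
```

If `caught` is a `TypeError` in your environment as well, the next thing to check is what value reaches `getInverseObj` when the module is required.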
When a page has more than one title tag, node-read concatenates all of them into the title.
For example a page like this one: http://news.nationalgeographic.com/2017/06/faceless-fish-deep-sea-voyage-australia/
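One way to avoid the concatenation is to take only the first title element. The sketch below does this with a regular expression so that it runs without dependencies; `firstTitle` is a hypothetical helper, the sample HTML is invented, and the guess that the extra titles come from elements outside the document head (such as SVG titles) is an assumption, not something verified against the page.

```javascript
// Hypothetical helper: returns the text of the first <title> element only.
// A regular expression is a simplification; it is not a full HTML parser.
function firstTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : '';
}

// Invented page with a document title and a second title inside an SVG.
const page = [
  '<html><head><title>Faceless Fish Found</title></head>',
  '<body><svg><title>Logo</title></svg></body></html>'
].join('\n');
const title = firstTitle(page);
```

Inside a cheerio-based parser the equivalent would probably be `$('title').first().text()`, since calling `.text()` on a multi-element selection returns the combined text of every match.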
Parsing this page returns unrelated content from a different block, the one that lists related articles.
read('http://www.tvnet.lv/zinas/arvalstis/507357-krievija_pie_ukrainas_robezas_sakoncentrejusi_200_tanku', function(err, article, res){ console.log(article.content);})
Output (formatted to fit):
<div id="article" class="article">
<a href="http://www.tvnet.lv/zinas/arvalstis/507296-krievija_draud_ukraina_ievest_miera_uzturetajus"
class="thumb330_4-3"><img src="http://itvnet.lv/article/zinas/507296_330x248.jpg" alt=""></a>
<p>Ukrainā attīstoties sliktākajam scenārijam, Maskava «atcerēsies» par Krievijas parlamenta
augšpalātas doto atļauju ievest kaimiņvalstī armiju, paziņojis Krievijas vēstnieks ANO Vitālijs
Čurkins. Viņš piebildis, ka gadījumā, ja vardarbība Ukrainas dienvidaustrumos nerimsies, tad
Krievija sasaukšot ANO Drošības padomes ārkārtas sēdi.</p> </div>
If you find any links that node-read cannot correctly parse, please post them here.