stantheman / rss2text
Takes a feed and a format string and prints
License: MIT License
According to the ambiguous docs:
If given an improperly formatted string, this method may die.
I'm not sure what scenarios there are where the string is improperly formatted and it dies, but apparently neither are they. Wrap the call with Try::Tiny.
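One way to contain the "may die" behaviour is to catch the failure and treat it as "no usable date". The following is a minimal sketch in Python rather than the project's Perl (where Try::Tiny would play this role). `parse_w3cdtf` is a hypothetical helper name, and `datetime.fromisoformat` is assumed to be an acceptable stand-in parser even though it only handles a subset of W3CDTF strings.

```python
from datetime import datetime
from typing import Optional


def parse_w3cdtf(value: str) -> Optional[datetime]:
    """Return a datetime, or None if the string cannot be parsed.

    Hypothetical helper: the point is the guarded call, not the parser.
    """
    try:
        # Older Python versions reject a trailing "Z", so normalise it first.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except (ValueError, AttributeError, TypeError):
        # Malformed or missing input: report "no date" instead of dying.
        return None
```

The caller can then decide what a missing date means (skip the entry, treat it as new, and so on) instead of having the whole run abort.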
Too tired to do this right this second, but it should allow for better separation.
W3CDTF is not the only format that's allowed, and apparently XML::FeedPP has an undocumented get_pubDate_epoch sub that I just found, which would have let me bypass all of the W3CDTF garbage.
I always hate when I use feed-parsers that don't let me auth. I'll hate my tongue for this, but they should do one thing and do it well. RSS will certainly make a hypocrite out of me, but I'll blame XML::FeedPP first :)
fuse-colors would have been a nightmare to write tests for, but this should be pretty straightforward and would be good to do. It would be cool if I was a TDD guy, but this is a cool first step.
I like keeping this as a single file for simplicity's sake, but it'd be much more easily maintained if I could split out the Cache object and maybe add a wrapper RSS object. I could keep the single file by fatpacking in the end.
I'm not 100% sure how I feel about this yet since it's nearly done (even with all of these issues open), so we'll see.
Should let you decide if you want feeds in newest to oldest or oldest to newest. Sorting by pubDate and cutting is beat.
Thanks @trevorparker
You should be able to pass multiple URLs in via STDIN or by passing them as distinct args
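A rough sketch of how both input paths could be merged, written in Python for illustration only. `collect_urls` is a hypothetical name, and the one-URL-per-line format on STDIN is an assumption.

```python
import sys


def collect_urls(argv, stdin=None):
    """Combine URLs given as arguments with URLs read line by line.

    `stdin` is any iterable of lines (assumed: one URL per line).
    """
    urls = list(argv)
    if stdin is not None:
        for line in stdin:
            line = line.strip()
            if line:  # ignore blank lines
                urls.append(line)
    return urls


if __name__ == "__main__":
    # Only read STDIN when something is actually piped in.
    piped = None if sys.stdin.isatty() else sys.stdin
    for url in collect_urls(sys.argv[1:], piped):
        print(url)
```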
When an article is edited, many content management systems update the publication date in the RSS feed. This has the unfortunate side effect of invalidating our cached timestamp, causing the user to see the same entry an additional time. Add some mechanism for preventing this.
I'd like to run through some of my collected feeds to see how many populate the guid field for feed elements. In that case, I should be able to store the last-seen guid as a fallback. Think more about this.
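The guid idea could look something like this. It is a sketch in Python, not the project's Perl, and the cache layout (`seen_guids`, `last_epoch`) and the entry keys (`guid`, `epoch`) are all assumptions made for the example. It also remembers a set of guids rather than only the last-seen one, which is a variation on what is described above.

```python
def is_new(entry, cache):
    """Decide whether an entry should be shown.

    Prefer the guid when the feed provides one, so that an edited article
    with a bumped pubDate is not shown twice. Fall back to the timestamp
    comparison when there is no guid.
    """
    guid = entry.get("guid")
    if guid:
        return guid not in cache["seen_guids"]
    return entry["epoch"] > cache["last_epoch"]


def remember(entry, cache):
    """Record an entry as seen (hypothetical cache update)."""
    if entry.get("guid"):
        cache["seen_guids"].add(entry["guid"])
    cache["last_epoch"] = max(cache["last_epoch"], entry["epoch"])
```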
Something about ThreeWordPhrase's RSS feed causes the builtin "pubDate" call not to work, but calling "get('pubDate')" works fine:
perl -MXML::FeedPP -E '$f=XML::FeedPP->new("http://www.threewordphrase.com/rss.xml"); say $f->get_item(0)->pubDate(); say $f->get_item(0)->get("pubDate")'
/var/cache isn't friendly to users; have it live in /tmp and offer an option to point to a different dir
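For illustration, the option could be shaped like this. The sketch is in Python, and the `--cache-dir` flag name and the `rss2text` subdirectory are hypothetical, not options the script is known to have.

```python
import argparse
import os
import tempfile


def build_parser():
    """Build a parser with a user-overridable cache directory.

    The flag name and default subdirectory are assumptions for this sketch.
    """
    parser = argparse.ArgumentParser(prog="rss2text")
    parser.add_argument(
        "--cache-dir",
        default=os.path.join(tempfile.gettempdir(), "rss2text"),
        help="directory for cached feed state (defaults to the system temp dir)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    os.makedirs(args.cache_dir, exist_ok=True)
    print(args.cache_dir)
```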
Every time I think I want a verbose flag, it means I've been looking at a problem for too long. I think what I'd really like to do is dup STDERR, and send major errors to it, and minor updates on stderr. That way you can 2>/dev/null and still see major errors, or 3>/dev/null 2>&3 for silence. Need to double check man page since forking gives you open file descriptors -- can't remember if I can be sure that 3 is open.
I'm thinking there's some problem with this, because otherwise this seems like a really cool way to deal with log levels
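On the "can I be sure that 3 is open" question, one possible approach is to probe the descriptor and fall back to stderr when it is closed. A sketch in Python follows; both function names are hypothetical. Note that an open descriptor 3 inherited from the parent is not necessarily meant for this program, which may be the problem hinted at above.

```python
import os
import sys


def fd_is_open(fd):
    """Return True if the file descriptor is currently open."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def major_error_stream(fd=3):
    """Return a writable stream for major errors.

    Uses `fd` when the caller opened it (for example `3>errors.log`),
    otherwise falls back to the normal stderr.
    """
    if fd_is_open(fd):
        # Work on a duplicate so closing our stream leaves `fd` itself alone.
        return os.fdopen(os.dup(fd), "w")
    return sys.stderr
```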
Now that multiple URLs can be specified, the script should be able to process multiple URLs at once so it doesn't take so long to run
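A thread pool is one simple way to overlap the network waits. This is a sketch in Python rather than Perl; `fetch_all` and the `fetch` callable passed into it are hypothetical, and the worker count is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL concurrently.

    Results come back in the same order as `urls`. An exception raised
    by any fetch is re-raised here when its result is collected.
    """
    if not urls:
        return []
    workers = min(max_workers, len(urls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Keeping the results in input order means the output stays deterministic even though the downloads finish in arbitrary order.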
rss2text relies on the Last-Modified response header when sending an If-Modified-Since request header to a server. This makes sense, but it assumes that servers hosting the same content will reliably send identical Last-Modified headers. If they don't, then you get a fun split-brain condition where you bounce between getting 200 OK and 304 Not Modified as responses.
It looks like you're stashing last_pulled_dt -- would there be any downsides to basing If-Modified-Since on that instead?
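To make the suggestion concrete, the header could be built from the stored pull time instead of the server's Last-Modified value. The sketch below is in Python and assumes last_pulled_dt is available as epoch seconds; `conditional_headers` is a hypothetical name.

```python
from email.utils import formatdate


def conditional_headers(last_pulled_epoch):
    """Build request headers from our own record of the last pull.

    `last_pulled_epoch` is assumed to be seconds since the epoch,
    or None when the feed has never been fetched.
    """
    if last_pulled_epoch is None:
        return {}
    # usegmt=True yields the HTTP date form, e.g. "... 00:00:00 GMT".
    return {"If-Modified-Since": formatdate(last_pulled_epoch, usegmt=True)}
```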
Calling get($token) for a feed item doesn't work as well as its real function counterpart (if it's available). A format string "link" for a github rss feed (Atom 1.0) will fail using ->get("link"), but succeeds for ->link, because there's more smarts in link.
It kind of ruins the awesomeness of templates. For now, check if $token is a sub provided by XML::FeedPP and call that instead, otherwise fall back to get. Might also need to investigate alternative RSS modules.
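The "call the real method if there is one, otherwise fall back to get" rule can be sketched as below. This is Python standing in for the Perl check, and `render_token` plus the item interface (named accessor methods and a generic `get`) are assumptions for the example.

```python
def render_token(item, token):
    """Resolve one format-string token against a feed item.

    Prefer a dedicated accessor method named after the token, since it
    may know about format differences; otherwise use the generic getter.
    """
    accessor = None
    # Never dispatch to private names or to the generic getter itself.
    if not token.startswith("_") and token != "get":
        accessor = getattr(item, token, None)
    if callable(accessor):
        return accessor()
    return item.get(token)
```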
Don't forget to increase max_redirects and make the UA something nice.
Need to take a minute and add a real license
In the beginning it was fine to use the first entry's pubDate as the most recent update. rss2text should first check for a pubDate in the channel, or a lastBuildDate in the channel.
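The lookup order described here could be sketched as follows, in Python with plain dictionaries standing in for the parsed feed. `latest_update` is a hypothetical name, and falling back to the first entry's pubDate when the channel has neither field is an assumption about the intended behaviour.

```python
def latest_update(channel, items):
    """Pick the feed's most recent update marker.

    Order: channel pubDate, then channel lastBuildDate, then the first
    entry's pubDate. Returns None when none of them is present.
    """
    for key in ("pubDate", "lastBuildDate"):
        if channel.get(key):
            return channel[key]
    if items and items[0].get("pubDate"):
        return items[0]["pubDate"]
    return None
```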