rss2text's People

Contributors

stantheman

rss2text's Issues

Better detection of new entries

When an article is edited, many content management systems update the publication date in the RSS feed. This has the unfortunate side effect of invalidating our cached timestamp, causing the user to see the same entry an additional time. Add some mechanism for preventing this.

I'd like to run through some of my collected feeds to see how many populate the guid field for feed elements. In that case, I should be able to store the last-seen guid as a fallback. Think more about this.
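
As a rough illustration of the guid fallback, here is a minimal sketch in Python (not the project's Perl). The `Entry` class, the `new_entries` function, and the idea of keeping a set of seen guids rather than only the last one are assumptions made for the example. Entries that carry a guid are matched by guid; entries without one fall back to the cached timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Entry:
    # Hypothetical stand-in for a parsed feed item.
    title: str
    pub_date: datetime
    guid: Optional[str] = None


def new_entries(entries, last_seen_dt, seen_guids):
    """Return entries not shown before; seen_guids is updated in place."""
    fresh = []
    for entry in entries:
        if entry.guid is not None:
            # A guid survives edits, so a bumped pub_date alone
            # does not make the entry look new again.
            if entry.guid in seen_guids:
                continue
            seen_guids.add(entry.guid)
            fresh.append(entry)
        elif entry.pub_date > last_seen_dt:
            # No guid available: fall back to the cached timestamp.
            fresh.append(entry)
    return fresh
```

With this shape, an edited article whose publication date moved forward is skipped as long as its guid was recorded on an earlier run.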

Process in parallel

Now that multiple URLs can be specified, the script should be able to process them concurrently so a run doesn't take so long.
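
One possible shape for this, sketched in Python with a thread pool (not the project's Perl). `process_all` and the `process_one` callback are hypothetical names. The callback stands in for whatever fetches and formats a single feed, and a failure on one URL is captured instead of aborting the rest.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(urls, process_one, max_workers=4):
    """Run process_one(url) for every URL concurrently.

    Returns a list of (url, result, error) tuples in input order.
    """
    def safe(url):
        try:
            return url, process_one(url), None
        except Exception as exc:  # keep going when one feed fails
            return url, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input URLs.
        return list(pool.map(safe, urls))
```

Threads suit this workload because most of the time is spent waiting on the network; output still comes back in the order the URLs were given.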

LWP redirects + UA

Don't forget to increase max_redirect (LWP::UserAgent's redirect limit) and make the UA string something nice.

Split-brain when relying on Last-Modified header

rss2text relies on the Last-Modified response header when sending an If-Modified-Since request header to a server. This makes sense, but it assumes that servers hosting the same content will reliably send identical Last-Modified headers. If they don't, then you get a fun split-brain condition where you bounce between getting 200 OK and 304 Not Modified as responses.

It looks like you're stashing last_pulled_dt -- would there be any downsides to basing If-Modified-Since on that instead?
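
To make the suggestion concrete, here is a small Python sketch (not the project's Perl) that builds If-Modified-Since from a locally stored pull time instead of echoing the server's Last-Modified value. `conditional_headers` is a hypothetical helper, and it assumes `last_pulled_dt` is a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def conditional_headers(last_pulled_dt):
    """Build request headers from the time of the last successful pull."""
    if last_pulled_dt is None:
        # First run: nothing cached, so send an unconditional request.
        return {}
    # HTTP dates are always expressed in GMT.
    as_utc = last_pulled_dt.astimezone(timezone.utc)
    return {"If-Modified-Since": format_datetime(as_utc, usegmt=True)}
```

Because the value comes from one clock, it does not change depending on which backend answered last. One thing to watch is skew between the local clock and the server's clock.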

fake autoload on certain available commands

Calling get($token) for a feed item doesn't work as well as its real function counterpart (if one is available). A format string "link" for a GitHub RSS feed (Atom 1.0) will fail using ->get("link"), but succeeds for ->link, because there's more smarts in link.

It kind of ruins the awesomeness of templates. For now, check if $token is a sub provided by XML::FeedPP and call that instead, otherwise fall back to get. Might also need to investigate alternative RSS modules.
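
The dispatch described above can be sketched as follows, in Python rather than Perl. `FakeItem` is a hypothetical stand-in for a feed item, not the XML::FeedPP API, and the "link@href" key is invented to mimic an accessor that knows more than a plain lookup. In Perl the analogous check would presumably be `$item->can($token)`.

```python
class FakeItem:
    """Hypothetical feed item with a generic get() and one smart accessor."""

    def __init__(self, fields):
        self._fields = fields

    def get(self, key):
        # Plain lookup: knows nothing about format-specific layouts.
        return self._fields.get(key)

    def link(self):
        # "Smarter" accessor: also checks an attribute-style key.
        return self._fields.get("link@href") or self._fields.get("link")


def resolve(item, token):
    """Prefer a dedicated accessor named after the token, else use get()."""
    if not token.startswith("_") and token != "get":
        method = getattr(item, token, None)
        if callable(method):
            return method()
    return item.get(token)
```

Since template tokens come from user input, restricting the lookup to a known list of accessor names may be safer than calling any method that happens to exist.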

Output order as an option

Should let you decide if you want feeds in newest to oldest or oldest to newest. Sorting by pubDate and cutting is beat.

Break it into proper separate files and fatpack/PAR it

I like keeping this as a single file for simplicity's sake, but it'd be much more easily maintained if I could split out the Cache object and maybe add a wrapper RSS object. I could keep the single file by fatpacking in the end.

I'm not 100% sure how I feel about this yet since it's nearly done (even with all of these issues open), so we'll see.

Write tests

fuse-colors would have been a nightmare to write tests for, but this should be pretty straightforward and would be good to do. It would be cool if I was a TDD guy, but this is a cool first step.

Verbose logging by duping stderr

Every time I think I want a verbose flag, it means I've been looking at a problem for too long. I think what I'd really like to do is dup STDERR, and send major errors to it, and minor updates on stderr. That way you can 2>/dev/null and still see major errors, or 3>/dev/null 2>&3 for silence. Need to double check man page since forking gives you open file descriptors -- can't remember if I can be sure that 3 is open.

I'm thinking there's some problem with this, because otherwise it seems like a really cool way to deal with log levels.
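
A small Python sketch (not the project's Perl) of what duplicating a descriptor gives you. `dup_stream` is a hypothetical helper. It shows two properties relevant to the plan: the new descriptor number is whatever the OS hands out (the lowest free one, not necessarily 3), and the duplicate writes to the same destination as the original. The second point suggests that a redirect applied by the shell before the script starts, such as 2>/dev/null, would also apply to a duplicate made afterwards.

```python
import os


def dup_stream(fd):
    """Duplicate fd and wrap the copy in a line-buffered text stream.

    Returns (new_fd, stream). Both descriptors refer to the same
    destination that fd pointed at when the duplicate was made.
    """
    new_fd = os.dup(fd)  # lowest free descriptor, chosen by the OS
    return new_fd, os.fdopen(new_fd, "w", buffering=1)


if __name__ == "__main__":
    major_fd, major = dup_stream(2)  # 2 is stderr
    major.write("major: sent through descriptor %d\n" % major_fd)
    os.write(2, b"minor: sent through stderr itself\n")
```

If descriptor 3 is meant to be supplied by the caller (as in 3>/dev/null 2>&3), the script would need to check that it is actually open before writing to it.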

Add support for feeds that need cookies/auth

I always hate when I use feed-parsers that don't let me auth. I'll hate my tongue for this, but they should do one thing and do it well. RSS will certainly make a hypocrite out of me, but I'll blame XML::FeedPP first :)

Capture DateTime::Format::W3CDTF death

According to the ambiguous docs:

"If given an improperly formatted string, this method may die."

I'm not sure what scenarios there are where the string is improperly formatted and it dies, but apparently neither are they. Try::Tiny.
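
The guard can be sketched like this in Python (not the project's Perl, where Try::Tiny's try/catch would play the same role). `parse_w3cdtf` is a hypothetical helper built on the standard library's `datetime.fromisoformat`; it only covers full date-time strings, not every form W3CDTF allows.

```python
from datetime import datetime, timezone


def parse_w3cdtf(text):
    """Parse a W3CDTF-style timestamp, returning None instead of raising."""
    try:
        cleaned = text.strip()
        if cleaned.endswith("Z"):
            # Older Python versions do not accept a trailing "Z".
            cleaned = cleaned[:-1] + "+00:00"
        return datetime.fromisoformat(cleaned)
    except (ValueError, AttributeError):
        # ValueError: malformed string. AttributeError: text was None.
        return None
```

The caller can then treat None as "no usable date" and skip or log the entry rather than letting one bad timestamp end the whole run.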
