rsspls's Introduction


RSS Please

A small tool (rsspls) to generate RSS feeds from web pages that lack them. It runs on BSD, Linux, macOS, Windows, and more.


rsspls generates RSS feeds from web pages. Example use cases:

  • Create a feed for a blog that does not have one so that you will know when there are new posts.
  • Create a feed from the search results on a real estate agent's website so that you know when there are new listings, without having to check manually all the time.
  • Create a feed of the upcoming tour dates of your favourite band or DJ.
  • Create a feed of the product page for a company, so you know when new products are added.

The idea is that you will then subscribe to the generated feeds in your feed reader. This will typically require the feeds to be hosted via a web server.

For more information, including installation instructions, documentation, and news, visit the RSS Please website.

Build From Source

Minimum Supported Rust Version: 1.70.0

rsspls is implemented in Rust. See the Rust website for instructions on installing the toolchain.

From Git Checkout or Release Tarball

Build the binary with cargo build --release --locked. The binary will be in target/release/rsspls.

From crates.io

cargo install rsspls

Credits

Licence

This project is dual licenced under either of:

at your option.

rsspls's People

Contributors

dependabot[bot], lcchy, regexident, wezm

rsspls's Issues

Windows support

Everything should work on Windows except that the xdg crate doesn't build. Need to swap it with something else on Windows (like dirs) or manually determine the paths to use for config and cache.
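
One way to "manually determine the paths" is sketched below using only the standard library. The choice of the APPDATA variable on Windows and the "rsspls" subdirectory name are assumptions made for illustration, not the project's actual behaviour; the non-Windows branch follows the XDG convention of $XDG_CONFIG_HOME with a fallback to $HOME/.config. The function name config_dir is hypothetical.

```rust
use std::path::PathBuf;

/// Pick a config directory from environment-style lookups.
/// `windows` selects the Windows convention; `get` looks up a variable,
/// which keeps the function testable without touching the real environment.
fn config_dir(windows: bool, get: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    if windows {
        // Assumption: per-user roaming application data holds the config.
        get("APPDATA").map(|base| PathBuf::from(base).join("rsspls"))
    } else {
        // XDG convention: $XDG_CONFIG_HOME if set and non-empty, else $HOME/.config.
        get("XDG_CONFIG_HOME")
            .filter(|value| !value.is_empty())
            .map(PathBuf::from)
            .or_else(|| get("HOME").map(|home| PathBuf::from(home).join(".config")))
            .map(|base| base.join("rsspls"))
    }
}

fn main() {
    // Use the real environment and the platform this binary was built for.
    let dir = config_dir(cfg!(windows), |key| std::env::var(key).ok());
    println!("{:?}", dir);
}
```

The same shape would work for a cache directory by swapping in the relevant variables (for example XDG_CACHE_HOME on Unix-like systems).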

Error selecting date field - `invalid selector for date: a.rounded-md`

Given the following configuration:

[rsspls]
output = "/tmp"

[[feed]]
title = "DeepLearning.ai"
filename = "deeplearning-ai.rss"

[feed.config]
url = "https://www.deeplearning.ai/the-batch/"
item = "article"
heading = "h2"
link = "a.absolute"

[feed.config.date]
selector = "a.rounded-md" # or "a.relative"
type = "Date"
format = "[month repr:short] [day], [year]"

It always fails with the following error:

 INFO  rsspls > processing https://www.deeplearning.ai/the-batch/
 ERROR rsspls > error processing feed for https://www.deeplearning.ai/the-batch/

Caused by:
    invalid selector for date: a.rounded-md

I'm pretty sure document.querySelectorAll("article a.rounded-md") returns the date elements in my browser. What could be the problem?

As a workaround, I used selector = "div a", which seems to work, but with some warnings:

 WARN  rsspls > unable to parse date ''
 WARN  rsspls > unable to parse date ''
 WARN  rsspls > unable to parse date ''

Invalidate cache items when version changes

Since using a new release of rsspls might result in different RSS being generated for the same input HTML, the cache should consider the version. A version field needs to be added to the cache files and then used when deciding whether to deserialise the headers.
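
A minimal sketch of the version check, assuming a hypothetical CacheEntry struct (the field names here are illustrative and not taken from the rsspls source). Entries written by a different version, or written before the field existed, are treated as absent so the page is fetched and processed again.

```rust
/// Hypothetical cached metadata; field names are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
struct CacheEntry {
    /// Version of the tool that wrote this entry; `None` for entries
    /// written before the field existed.
    version: Option<String>,
    etag: Option<String>,
    last_modified: Option<String>,
}

/// Return the entry only if it was written by the running version.
fn usable_cache(entry: CacheEntry, current_version: &str) -> Option<CacheEntry> {
    let same_version = entry.version.as_deref() == Some(current_version);
    if same_version {
        Some(entry)
    } else {
        None
    }
}

fn main() {
    let old = CacheEntry {
        version: Some("0.8.0".to_string()),
        etag: Some("\"abc\"".to_string()),
        last_modified: None,
    };
    // In a Cargo build the running version could come from the
    // CARGO_PKG_VERSION environment variable; a literal is used here.
    println!("{:?}", usable_cache(old, "0.8.1"));
}
```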

Parse date with ordinals

These dates fail to parse:

 WARN  rsspls > unable to parse date 'Sunday, April 18th, 2021'
 WARN  rsspls > unable to parse date 'Friday, January 8th, 2021'
 WARN  rsspls > unable to parse date 'Tuesday, September 15th, 2020'
 WARN  rsspls > unable to parse date 'Monday, April 6th, 2020'
 WARN  rsspls > unable to parse date 'Friday, March 13th, 2020'
 WARN  rsspls > unable to parse date 'Sunday, June 2nd, 2019'
 WARN  rsspls > unable to parse date 'Wednesday, May 29th, 2019'
 WARN  rsspls > unable to parse date 'Saturday, May 25th, 2019'
 WARN  rsspls > unable to parse date 'Thursday, May 16th, 2019'
 WARN  rsspls > unable to parse date 'Tuesday, May 7th, 2019'
 WARN  rsspls > unable to parse date 'Friday, April 26th, 2019'
 WARN  rsspls > unable to parse date 'Tuesday, April 23rd, 2019'
 WARN  rsspls > unable to parse date 'Wednesday, April 17th, 2019'
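
One possible approach, sketched here as an assumption rather than the fix that was adopted, is to strip English ordinal suffixes before handing the string to the date parser. The helper name strip_ordinals is hypothetical. It only removes "st", "nd", "rd", or "th" when the suffix directly follows a digit and is not followed by another letter or digit, so words like "August" are left alone.

```rust
/// Remove English ordinal suffixes (st, nd, rd, th) that directly follow a digit.
fn strip_ordinals(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len());
    let mut i = 0;
    while i < chars.len() {
        out.push(chars[i]);
        i += 1;
        // After a digit, look at the next two characters for a suffix.
        if chars[i - 1].is_ascii_digit() && i + 1 < chars.len() {
            let suffix = chars[i..i + 2].iter().collect::<String>().to_ascii_lowercase();
            // The suffix must end at a word boundary (or the end of input).
            let at_boundary = chars.get(i + 2).map_or(true, |c| !c.is_alphanumeric());
            if matches!(suffix.as_str(), "st" | "nd" | "rd" | "th") && at_boundary {
                i += 2; // skip the suffix
            }
        }
    }
    out
}

fn main() {
    println!("{}", strip_ordinals("Sunday, April 18th, 2021"));
    println!("{}", strip_ordinals("Tuesday, April 23rd, 2019"));
}
```

The cleaned string could then be parsed with an ordinary format description; which format applies depends on the feed configuration.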

unable to write cache on windows

PS C:\Users\wes\Downloads\rsspls-0.8.1-x86_64-pc-windows-msvc> .\rsspls.exe --config C:\Users\wes\Documents\rsspls.toml --output .
 INFO  rsspls::feed > processing https://example.com/
 INFO  rsspls::feed > no explicit link selector provided, falling back to heading selector: "a"
 INFO  rsspls       > write .\example.rss
 ERROR rsspls       > error processing feed for https://example.com/

Caused by:
   0: unable to write to cache
   1: The system cannot find the path specified. (os error 3)
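
OS error 3 on Windows means a path component does not exist. One plausible cause, stated here as an assumption since the log does not show the cache path, is that the cache directory had not been created yet. A defensive sketch that creates missing parent directories before writing (write_cache is a hypothetical helper name):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write `contents` to `path`, creating missing parent directories first.
fn write_cache(path: &Path, contents: &[u8]) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        // `create_dir_all` also succeeds when the directory already exists.
        fs::create_dir_all(parent)?;
    }
    fs::write(path, contents)
}

fn main() -> io::Result<()> {
    // Illustrative location under the system temp directory.
    let path = std::env::temp_dir()
        .join("rsspls-sketch")
        .join("cache")
        .join("example.toml");
    write_cache(&path, b"version = \"0.8.1\"\n")?;
    println!("wrote {}", path.display());
    Ok(())
}
```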

Publish new version to crates.io

For 0.5.0 I started depending on the git version of the time crate in order to get the feature mentioned in time-rs/time#541 (comment). When a new release of time is out, I can go back to depending on a published time version and update rsspls on crates.io.

Adding the option to route requests through a proxy

For my use of rsspls I have added the option to route all requests through a proxy URL (my home residential IP, while rsspls runs on a VPS which gets blocked) specified in the [rsspls] section of feeds.toml. It supports socks5, http and https proxies with a fairly minimal code change.

Would you be ok with a PR for this feature?

400 Bad Request for a valid url

If I put url = "https://www.stephenspencerdavis.com/recent-work/" into feeds.toml, then it gives:

 INFO  rsspls::feed > processing https://www.stephenspencerdavis.com/recent-work/
 ERROR rsspls       > error processing feed for https://www.stephenspencerdavis.com/recent-work/

Caused by:
    failed to fetch https://www.stephenspencerdavis.com/recent-work/: 400 Bad Request

I'm just starting to use rsspls. Another test url I put in seems to not encounter this problem. But the url opens fine in a browser, and the documentation includes another example of a url with a trailing slash, so I don't see why that would be the issue.

Any help you can give would be appreciated.

This is with the Linux x86_64 binary download v. 0.8.1 running on Ubuntu.

Optional media

If a media selector is specified it's currently required that all items match this selector, otherwise the feed will not generate. Since the enclosure is an optional element in the feed I think it would make sense to skip the enclosure if the media selector does not match anything. What do you think @Lcchy?

Adding a User agent header to the requests made

Hi there,

First of all, I'd like to thank you for the great project. I was desperate to find a nice RSS scraper service, and have found it! It being written in Rust adds to my excitement.

I would like to add some features to rsspls that I need for my own usage, so I wanted to ask whether you are open to PRs.

The first one would be the option to add a user-agent header to the requests made, as some websites don't allow an empty one.

I have already added it to my fork and can open a PR if it sounds good to you.

Support providing separate selectors for item heading and link

For websites that list a bunch of articles but only link to each item from a generic anchor title such as "read more", the generated feed is not very user-friendly.

As such it would be nice to be able to grab the heading title and link url from different elements.

Add key-value argument for dynamic feed config

Hi there,

For my personal usage I have been adding a feature to the feeds config.

I am subscribing to a website that has different "profiles" (a bit like mastodon or instagram).
The website is the same but the content is different for each profile and I would like to subscribe to different profiles as separate feeds.

I could duplicate the feed config in the feeds.toml but I have close to 100 subscriptions so that would become cumbersome.

That is why I implemented this feature which enables the passing of an arbitrary key-value parameter that will be inserted into the config.

Example:

rsspls --parameter username=john

with the config:

[rsspls]

[[feed]]
title = "%<username> - Feed"
filename = "feed_%<username>.rss"

[feed.config]
url = "https://www.example.com/profile/%<username>/"
...

will simply replace each "%<username>" with "john" and then generate the feed as before.
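
The substitution described above could look roughly like the following. This is a sketch of the idea, not the code from the PR; the function names parse_parameter and substitute are hypothetical. Placeholders with no matching key are left untouched.

```rust
use std::collections::HashMap;

/// Parse a `key=value` argument as passed to the proposed `--parameter` option.
fn parse_parameter(arg: &str) -> Option<(String, String)> {
    let (key, value) = arg.split_once('=')?;
    if key.is_empty() {
        return None;
    }
    Some((key.to_string(), value.to_string()))
}

/// Replace every `%<key>` placeholder in `template` with its value.
fn substitute(template: &str, params: &HashMap<String, String>) -> String {
    let mut out = template.to_string();
    for (key, value) in params {
        let placeholder = format!("%<{}>", key);
        out = out.replace(placeholder.as_str(), value.as_str());
    }
    out
}

fn main() {
    let mut params = HashMap::new();
    if let Some((key, value)) = parse_parameter("username=john") {
        params.insert(key, value);
    }
    println!("{}", substitute("feed_%<username>.rss", &params));
    println!("{}", substitute("https://www.example.com/profile/%<username>/", &params));
}
```

Because replacements are applied one key at a time, a value that itself contains a placeholder could be substituted again by a later key; a stricter implementation would scan the template once.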

I already opened the PR so that it's easier to discuss, but feel free to reject it if you don't think it's applicable.

Trouble with selector for "summary"

I'm trying to get this working to pull articles and abstracts from here. Article titles are working fine, but I'm struggling to pull the abstracts as well. The abstracts are folded away by default, but the text is there in the HTML anyhow, so I think this should be possible. The css selector ".abstract p" targets the abstracts I'm after. But when I use summary = ".abstract p" in my feeds.toml, I get the error "Invalid selector for summary: .abstract p". Can you help me understand the problem I'm running into?
