docker-fivefilters-full-text-rss

Docker Image for fivefilters Full-Text RSS service

A Dockerfile for the open (older) version of Full-Text RSS made by FiveFilters.org. It is built from my own fork, which fixes the site config updates.

Mounting a volume at /var/www/html/site_config/ is recommended, especially if you use custom site configs.
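
As a rough illustration of the volume recommendation combined with the environment variables documented under "Config via Environment Variables", the following sketch assembles a `docker run` command line. The image name `example/full-text-rss`, the host path `/srv/ftr/site_config`, and the helper `docker_run_args` are hypothetical, and it assumes the container accepts plain `true`/`false` strings and integers as values; check the Dockerfile before relying on that.

```python
import shlex


def docker_run_args(image, settings, site_config_dir=None):
    """Build a `docker run` argument list from ftr_* settings (hypothetical helper)."""
    args = ["docker", "run", "-d"]
    if site_config_dir is not None:
        # Container path taken from the README's volume recommendation.
        args += ["-v", f"{site_config_dir}:/var/www/html/site_config/"]
    for name, value in sorted(settings.items()):
        # Assumption: booleans are passed as the lowercase strings true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += ["-e", f"{name}={value}"]
    args.append(image)
    return args


settings = {
    "ftr_debug": False,
    "ftr_max_entries": 20,
    # Format given in the README: admin:my-secret-password
    "ftr_admin_credentials": "admin:my-secret-password",
}

# Image name and host directory are placeholders, not the real published names.
cmd = docker_run_args("example/full-text-rss", settings, "/srv/ftr/site_config")
print(shlex.join(cmd))
```

Building the command as a list and only joining it for display avoids quoting mistakes when a value (for example a prepend message) contains spaces.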

Because this old version runs on PHP 7.3 (whose security support runs out on 6 Dec 2021), additional protection measures are recommended!

Not affiliated with fivefilters.org. The Dockerfile is licensed under the Unlicense.

Config via Environment Variables

| ENV | Default¹ | Accepted | Description |
|---|---|---|---|
| ftr_enabled | true | true/false | Set this to false if you want to disable the service. |
| ftr_debug | true | true/'user'/'admin'/'false' | Enable or disable debugging. When enabled, debugging works by passing &debug to the makefulltextfeed.php querystring. |
| ftr_default_entries | 5 | int | The number of feed items to process when no API key is supplied and no &max=x value is supplied in the querystring. |
| ftr_max_entries | 10 | int | The maximum number of feed items to process when no access key is supplied. |
| ftr_content | 'user' | true/false/'user' | By default Full-Text RSS includes the extracted content in the output. You can exclude this from the output by passing '&content=0' in the querystring. |
| ftr_html5_output | 'user' | true/false/'user' | Full-Text RSS used to rely on libxml to output HTML extracted from a web page. Since version 3.8 we use HTML5-PHP by default. |
| ftr_summary | 'user' | true/false/'user' | By default Full-Text RSS does not include excerpts in the output. You can enable this by passing '&summary=1' in the querystring. This will include a plain text excerpt from the extracted content. |
| ftr_rewrite_relative_urls | true | true/false | With this enabled, relative URLs found in the extracted content block are automatically rewritten as absolute URLs. |
| ftr_exclude_items_on_fail | 'user' | true/false/'user' | Excludes items from the resulting feed if we cannot extract any content from the item URL. |
| ftr_singlepage | true | true/false | If enabled, we will try to follow single page links (e.g. print view) on multi-page articles (if defined in a site config file). |
| ftr_multipage | true | true/false | If enabled, we will try to follow next page links on multi-page articles (if defined in a site config file). |
| ftr_caching | false | true/false | Enable this if you'd like to cache results on disk. |
| ftr_cache_time | 10 | int | How long should a response be cached (minutes)? |
| ftr_message_to_prepend | '' | str | HTML to insert at the beginning of each feed item when no access key is supplied. |
| ftr_message_to_append | '' | str | HTML to insert at the end of each feed item when no access key is supplied. |
| ftr_error_message | '[unable to retrieve full-text content]' | str | Error message when content extraction fails (without access key). |
| ftr_keep_enclosures | true | true/false | If enabled, we will try to preserve enclosures if present. |
| ftr_detect_language | 'user' | Ignore language: 0<br>Use article/feed metadata (e.g. HTML lang attribute): 1<br>As above, but guess if not present: 2<br>Always guess: 3<br>User decides: 'user' | Should we try and find/guess the language of the article being processed? |
| ftr_user_submitted_config | false | true/false | If enabled, a user can submit site config rules directly in the request using the siteconfig request parameter. Disabled (false) by default. |
| ftr_remove_native_ads | false | true/false | Many news sites now carry native advertising - articles which have been paid for by a corporation to promote their brand or product. |
| ftr_admin_credentials | array('username'=>'admin', 'password'=>'') | Format like this: admin:my-secret-password | Certain pages/actions, e.g. updating site patterns with our online tool, will require admin credentials. |
| ftr_allowed_urls | array() | 🤷‍♂️ | List of URLs (or parts of a URL) which the service will accept. |
| ftr_blocked_urls | array() | 🤷‍♂️ | List of URLs (or parts of a URL) which the service will not accept. |
| ftr_blocked_message | 'URL blocked' | str | If a request is blocked outright because of the two rules above, this is the message that is shown. |
| ftr_key_required | false | true/false | Set this to true if you want to restrict access only to those with a key. |
| ftr_api_keys | array() | 🤷‍♂️ | Keys let you group users - those with a key and those without - and restrict access to the service to those without a key. If you want everyone to access the service in the same way, you can leave the array below empty and ignore the access key options further down. |
| ftr_default_entries_with_key | 5 | int | The number of feed items to process when a valid access key is supplied. |
| ftr_max_entries_with_key | 10 | int | The maximum number of feed items to process when a valid access key is supplied. |
| ftr_xss_filter | 'user' | true/false/'user' | We have not enabled this by default because we assume the majority of our users do not display the HTML retrieved by Full-Text RSS in a web page without further processing. If you subscribe to our generated feeds in your news reader application, it should, if it's good software, already filter the resulting HTML for XSS attacks, making it redundant for Full-Text RSS to do the same. |
| ftr_favour_effective_url | 'user' | true/false/'user' | When we extract content for feed items, we often end up at a different URL than the one in the original feed. This is often a result of URL shorteners or tracking services being used by the feed publisher. We include the final (effective) URL we reached to get the content inside the dc:identifier field. If you enable this, we'll also use this URL in place of the original item URL in the new feed we produce. |
| ftr_favour_feed_titles | 'user' | true/false/'user' | By default, when processing feeds, we assume item titles in the feed have not been truncated. So after processing web pages, the extracted titles are not used in the generated feed. |
| ftr_allowed_parsers | array('libxml', 'html5php') | 🤷‍♂️ | Full-Text RSS attempts to use PHP's libxml extension to process HTML. While fast, on some sites it may not always produce good results. |
| ftr_allow_parser_override | true | true/false | If enabled, a user can pass &parser=html5php to override the default parser. |
| ftr_cors | false | true/false | If enabled, we'll send the following HTTP header: Access-Control-Allow-Origin: * |
| ftr_proxy_servers | array() | 🤷‍♂️<br>array('example2'=>array('host'=>'127.0.0.1:8888', 'auth'=>'user:pass')) | You can specify proxy servers here and ask Full-Text RSS to route HTTP requests through these servers. If no proxy server is listed, all requests will be made directly. |
| ftr_proxy | true | Disable: false (no proxy will be used)<br>Named: specify which server should be used (e.g. 'example1')<br>Random: true (default), a random one from the set above will be used each time Full-Text RSS is called. | How the proxy servers above should be used. |
| ftr_allow_proxy_override | true | true/false | If enabled, a user can disable or change the proxy server used. |
| ftr_apc | true | true/false | If enabled, we will store site config files (when requested for the first time) in APC's user cache. [Since there is no APC in this Dockerfile, this setting doesn't do anything.] |
| ftr_smart_cache | true | true/false | With this option enabled we will not cache to disk immediately. We will store the cache key in APC and if it's requested again we will cache results to disk. Keys prefixed with 'cache.' |
| ftr_cache_cleanup | 100 | 0 = script will not clean cache (rename cachecleanup.php and use it for scheduled (e.g. cron) cache cleanup)<br>1 = clean cache every time the script runs (not recommended)<br>100 = clean cache roughly once every 100 script runs | How often the cache is cleared. |
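
Several of the settings above mention per-request querystring switches (`&max=x`, `&content=0`, `&summary=1`, `&debug`). The sketch below shows one way such a request URL for makefulltextfeed.php could be assembled. The helper name `full_text_feed_url`, the host and port, and the `url` parameter carrying the feed address are assumptions that do not come from this README; the per-request switches presumably only take effect where the matching setting is left at 'user'.

```python
from urllib.parse import urlencode


def full_text_feed_url(base, feed_url, max_items=None, content=None,
                       summary=None, debug=False):
    """Assemble a makefulltextfeed.php request URL (hypothetical helper)."""
    # Assumption: the feed address is passed in a parameter named `url`.
    params = {"url": feed_url}
    if max_items is not None:
        params["max"] = max_items          # &max=x
    if content is not None:
        params["content"] = int(content)   # &content=0 excludes the content
    if summary is not None:
        params["summary"] = int(summary)   # &summary=1 includes an excerpt
    query = urlencode(params)
    if debug:
        query += "&debug"                  # bare flag, no value
    return f"{base}/makefulltextfeed.php?{query}"


# Host and port are placeholders for wherever the container is published.
print(full_text_feed_url("http://localhost:8080",
                         "https://example.com/feed.xml",
                         max_items=3, summary=True))
```

Using `urlencode` keeps the feed address safely percent-encoded, which matters because feed URLs usually contain `:` and `/` and often their own query strings.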

Footnotes

  1. As of commit 384d52f.

docker-fivefilters-full-text-rss's People

Contributors

zottelchen, dependabot[bot]
