poca's Issues

feature: limit by number of entries as well as/rather than size

Currently you assign a fixed limit in MB to the subscription. Make this an optional setting (if not set, there is no limit) and add a second optional setting that limits the number of entries to keep (e.g. I always want the latest newshour and the one from the day before).
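
A minimal sketch of how the two optional limits could combine. The function name, the 'size_mb' key and the newest-first ordering are assumptions for illustration, not poca's actual data model.

```python
def trim_entries(entries, max_number=None, max_mb=None):
    """Keep the newest entries that fit within the optional limits.

    `entries` is assumed to be ordered newest first. Each entry is a dict
    with a hypothetical 'size_mb' key. A limit of None means "no limit".
    """
    kept = []
    total_mb = 0.0
    for entry in entries:
        # Stop once the entry count limit is reached.
        if max_number is not None and len(kept) >= max_number:
            break
        # Stop once the next entry would exceed the size limit.
        if max_mb is not None and total_mb + entry["size_mb"] > max_mb:
            break
        kept.append(entry)
        total_mb += entry["size_mb"]
    return kept
```

With neither limit set everything is kept; with max_number=2 only the latest two entries survive, regardless of size.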

poca-subscribe: review xmlconf

xmlconf seems a rather antiquated way of doing things (it just writes out a huge string) when we have lxml.objectify. Also, its style is a bit different from the output of poca-subscribe. Is there a better way of producing a default template than giant string-writing?

Multiprocessing

Set up a socket for receiving feed updates and fire off one process for each subscription. The feed processes report back to the socket. On that socket runs a single, serial downloader that processes the updates (little Wanted+Unwanted+Lacking etc. packages). The processing includes deletes, downloads, and reports to user. The downloader/main process simply deals with the updates in the order they appear on the socket, i.e. more responsive servers will be first in line.

The proposed distinction between multiple update processes and main process is identifiable in the current code as that between 'plans' and 'execution'.

Since the downloading will still be serial, multiprocessing won't accomplish much in terms of speed gains but it should minimize 'lag' and waiting. We stay away from parallel downloads partly because each download would steal bandwidth from the others, partly because most updates won't see multiple downloads if your average user subscribes to, say, 10-20 podcasts and updates once an hour (an assumption). Finally and most importantly, total multiprocessing invites far more chaos when things go wrong and would require a greater UI rethink.
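
A rough sketch of the plans/execution split, under stated assumptions: a multiprocessing.Queue stands in for the socket, and plan_updates, feed_worker, execute_updates and run are hypothetical names. The package contents are placeholders, not poca's real Wanted/Unwanted objects.

```python
import multiprocessing as mp


def plan_updates(subscription):
    """Stand-in for the 'plans' stage: fetch the feed, decide what to do."""
    # A real worker would parse the feed here; this returns a fixed package.
    return {"title": subscription["title"], "wanted": [], "unwanted": []}


def feed_worker(subscription, updates):
    """Runs in its own process and reports one update package back."""
    updates.put(plan_updates(subscription))


def execute_updates(updates, expected):
    """The 'execution' stage: handle packages serially, in arrival order."""
    handled = []
    for _ in range(expected):
        package = updates.get()  # blocks until the next feed reports back
        # Deletes, downloads and user reports would happen here, one at a time.
        handled.append(package["title"])
    return handled


def run(subscriptions):
    """Start one process per subscription and consume their reports."""
    updates = mp.Queue()
    workers = [mp.Process(target=feed_worker, args=(sub, updates))
               for sub in subscriptions]
    for worker in workers:
        worker.start()
    # Consume before joining so no worker blocks on a full queue.
    handled = execute_updates(updates, expected=len(subscriptions))
    for worker in workers:
        worker.join()
    return handled
```

run() would be called from under an `if __name__ == "__main__":` guard. The order of the returned titles depends on which feeds answer first, which is the 'responsive servers go first' behaviour described above.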

Documentation: Architecture, configuration missing

  • Architecture: A wiki page detailing the inner workings and categories ('fruit' labels and color codes)
  • Configuration: An overview of the configuration file, a listing of all settings and an example configuration

Feed failures are saved to .poca db but episode failures are not

The following download failure entries in the file log


2017-02-27 13:10 RADIOAVISEN. Removed: radioavisen-2017-02-24-12-00-2.mp3
2017-02-27 13:10 RADIOAVISEN. Failed: radioavisen-2017-02-27-12-00-2.mp3
2017-02-27 14:10 RADIOAVISEN. Failed: radioavisen-2017-02-27-12-00-2.mp3
2017-02-27 15:10 RADIOAVISEN. Failed: radioavisen-2017-02-27-12-00-2.mp3
2017-02-27 16:10 RADIOAVISEN. Downloaded: radioavisen-2017-02-27-12-00-2.mp3

are not added to the buffer:

In [2]: fname = '/home/mads/.poca/db/.poca'

In [3]: with open(fname, 'rb') as f:
    jar = pickle.load(f)

In [4]: jar.buffer
Out[4]: []

It seems only failures in the feedparser part are added to the buffer. Is this how we want it to work?

Email log

It should be comparatively easy to add support for mailing the changes to yourself, at least with a local mail server. See dispatches for inspiration.
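
A minimal sketch using only the standard library, assuming a mail server listening on localhost. The function names, the subject line and the addresses are made up for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_log_mail(lines, sender, recipient):
    """Wrap a list of log lines in a plain-text mail message."""
    msg = EmailMessage()
    msg["Subject"] = "poca: %d change(s)" % len(lines)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines))
    return msg


def send_via_local_server(msg, host="localhost"):
    """Hand the message to a local SMTP server (assumed to be running)."""
    with smtplib.SMTP(host) as server:
        server.send_message(msg)
```

Building the message is separate from sending it, so the content can be checked without a mail server at hand.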

logging actions: standardize on either entry or filename for both file and stream log

Currently, file logging of user deletions is handed uids and therefore logs uids. This is due to confusion over what sort of entity we're handing over to output. Standardizing on either filenames or entries would help avoid this confusion.

Pro filenames:

  • It's all output needs. Output is a dumb function that should only be given the very basic necessities, unlike the central machinery of the Feed/Combo/Wanted classes.

Pro entries:

  • Entries are the standard of data exchange throughout the program
  • entry['poca_filename'] is instantly recognisable - you know what that is. Plain filename could be anything.

Ogg file support

Currently only mp3 files are tagged. We should extend support to ogg. (test case: Linux Voice)

Looking up file sizes imposes serious lag

When the amount was governed by file sizes we needed file sizes on every file. As part of creating a combo instance an expansion was done on all file entries, including adding information about file size. When this is not included in the feed, we resort to pinging each url in turn to gather this information. For a long feed this can take several minutes.

This should only happen once, because the entryinfo.expand function is only run on entries not in jar. However, it seems to be a recurring issue in some cases?

Options:

  • Investigate if it is indeed a recurring issue or just a one-time thing per feed
  • Remove all references to file size (we aren't using it currently but it might return?)
  • Work around the fact that some entries will not have file size information

'Terse' output

Terse output would mean only outputting actual changes. So the user would only see lines saying episode removed and episode downloaded. And probably the error ones as well. This could be useful for logging, especially as a prerequisite for email logging (issue #26)

Syntastic moaning

Syntastic has a ton of (style?) complaints. Go through them and either dismiss or adjust.

Download cover.jpg image from feed

Most podcast MP3s come with an embedded image these days, but some seem to rely on some iTunes magic with images inserted into the feed (usually iTunes-specific tags). Does feedparser report these? Can we access them? Download them as a fallback cover.jpg in the folder?

Files that by some error drop out of db are never removed

An error in some (previous?) version seems to have caused some files in Savage Love and TAC to 'drop out' of the db. These files are then invisible to poca and are never removed unless by hand.

Solutions:

  • Ideally not to have files drop out
  • Abandon db, embrace file-on-disk-is-history
  • Some check-up/reset loop that cleans up discrepancies

Unicode testing

We never really probe for what sort of strings we are tossing around. More testing to make sure that we don't run into trouble with unicode/non-unicode strings in filenames, feeds or tags.

Check validity of config

We do a select few checks on config settings but not in any consistent way. E.g. if an incorrect date format is used in after_date the program simply crashes with a ValueError.

There are actually a number of distinct jobs here:

  • Checking if needed elements are present (like settings and subscriptions in the global part and title + url in each subscription)
  • Checking if the values are valid - like a correct url, the proper date format, a path, etc.
  • Converting certain values, like a string into an integer (max_number) or an XXXX-XX-XX date string into a struct_time instance.

Currently a selection of these tasks is performed in between harvesting XML and creating poca's own data holding objects. This raises three questions:

  • Could we make config less of a jungle if we separated these tasks into their own functions to be performed one after the other?
  • What are the criteria for testing values and element presence? Should we test a select few, all or none?
  • Should we perform all needed conversions in config or is it ok to pass on max_number as a string, to be converted at convenience?
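
As one possible answer to the first question, the jobs could be split into small functions that collect errors instead of crashing. This is only a sketch: the function names are hypothetical, subscriptions are treated as plain dicts, and '%Y-%m-%d' is assumed to be the meaning of the XXXX-XX-XX format.

```python
import time

REQUIRED_KEYS = ("title", "url")


def check_presence(sub):
    """Return one error message per required element that is missing."""
    return ["missing element: " + key for key in REQUIRED_KEYS
            if key not in sub]


def convert_values(sub):
    """Return (converted, errors) without raising on bad input."""
    converted = dict(sub)
    errors = []
    if "max_number" in sub:
        try:
            converted["max_number"] = int(sub["max_number"])
        except ValueError:
            errors.append("max_number is not an integer: %r"
                          % sub["max_number"])
    if "after_date" in sub:
        try:
            # Turns the date string into a time.struct_time instance.
            converted["after_date"] = time.strptime(sub["after_date"],
                                                    "%Y-%m-%d")
        except ValueError:
            errors.append("after_date is not a valid date: %r"
                          % sub["after_date"])
    return converted, errors
```

Run one after the other, the two steps give a list of messages to show the user rather than a ValueError traceback.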

Filter entries

Similar to other restrictions on combo.lst we could restrict it further by filtering based on

  • feed info
  • filename
  • size
  • date and time

Old (ancient) RFI entries are not replaced

Some entries in the RFI feed are not being replaced despite being from November 2015. How they got in there is a mystery. More importantly: why aren't they being replaced by newer ones? They have clear-cut entries in both jar.lst and jar.dic, though they may not conform to the 0.5 specs. Maybe 'valid' is not in entry?

logging: filter solution on stream logger is a bad hack

In order to avoid getting summaries of file actions on the stream (in addition to the one-per-line +/-/%) we use logger.warn() but filter warnings out from the stream handler. This works but is utterly incomprehensible to anybody not in on it. Requires explanation or reworking.
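
One possible reworking, sketched under assumptions: mark summary records explicitly and let a named filter keep them off the stream. SummaryFilter, the 'summary' attribute and build_logger are hypothetical names, and the second handler writes to a stream here where the real program would presumably use a file handler.

```python
import logging


class SummaryFilter(logging.Filter):
    """Drop records that were marked as file-only summaries."""

    def filter(self, record):
        # Records logged with extra={"summary": True} are rejected.
        return not getattr(record, "summary", False)


def build_logger(stream, file_stream):
    """Return a logger whose stream handler skips summary records."""
    logger = logging.getLogger("poca_sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    stream_handler = logging.StreamHandler(stream)
    stream_handler.addFilter(SummaryFilter())
    # Stands in for the file handler; it receives every record.
    file_handler = logging.StreamHandler(file_stream)
    logger.addHandler(stream_handler)
    logger.addHandler(file_handler)
    return logger
```

A summary would then be logged as logger.info("Downloaded: x.mp3", extra={"summary": True}), which states the intent at the call site instead of relying on the warning level.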

Global subscription settings

Subscription settings for

  • max_number
  • metadata
  • filters

should inherit global settings for same. Overrides should be possible on a per-subscription basis.
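
A small sketch of the inheritance rule, treating both levels as plain dicts. The function name and the dict representation are assumptions; the three keys are the ones listed above.

```python
INHERITED = ("max_number", "metadata", "filters")


def effective_settings(global_settings, subscription):
    """Start from the global values, then apply per-subscription overrides."""
    merged = {key: global_settings[key]
              for key in INHERITED if key in global_settings}
    merged.update({key: subscription[key]
                   for key in INHERITED if key in subscription})
    return merged
```

Note that this replaces a whole setting at a time. Whether metadata or filters should instead be merged field by field is left open here.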

Outdated man file

Man file makes references to google code and other outdated information.

Use variables in metadata (aka make up consecutive track numbers because the podcast's own are useless)

Some podcasts leave out track numbering or play fast and loose with it, occasionally inserting 'special' shows, that do not get a track number. This can be a problem for audio players.

A solution could be to allow the user to draw on variables for insertion into the metadata, specifically:

  • Consecutive track numbering: We don't care what number the episode has in the mind of the creator, we simply label the first one '001' and take it from there, incrementing by one with each new episode.
  • 'Reverse date' into title or album? Or track (if ID3 fields accept it)? So January 18th 2017 would be 20170118, ensuring that sorting by title or album follows the order in which the episodes appear.
  • Other feed data -> metadata?

This raises two related questions:

  • whether any or all of this applies equally to file naming (see issue #16)
  • whether we want to go down the 'put variables into users' hands' route or just add a toggle switch ("yes, please overwrite this subscription's track numbers with made up stuff")

If we want it in file names, variables are the best option. If not, it might be best to restrict it to a few select scenarios.
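
The two variables suggested above are cheap to compute. A sketch with hypothetical function names, assuming the publication date is available as a time.struct_time:

```python
import time


def reverse_date(published):
    """Format a struct_time as YYYYMMDD so that text sorting is date sorting."""
    return time.strftime("%Y%m%d", published)


def track_number(position, width=3):
    """Zero-padded running number: 1 becomes '001'."""
    return str(position).zfill(width)
```

For example, reverse_date(time.strptime("2017-01-18", "%Y-%m-%d")) gives '20170118'.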

Option to start a podcast from the beginning rather than the latest episodes

Working your way through: When a narrative podcast (e.g. Welcome to Night Vale) has a large back archive, you'll want to start at the beginning and work your way through. We need a setting that will give you the first ten episodes and then, when you send the signal, replace those with the next ten and so forth.
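
The selection itself could be as simple as a slice over the reversed feed. A sketch, assuming the feed lists entries newest first and that a batch counter is stored somewhere between runs (both assumptions, as is the function name):

```python
def episode_window(entries_newest_first, batch, size=10):
    """Return the batch-th group of `size` episodes, counted from the oldest."""
    oldest_first = list(reversed(entries_newest_first))
    start = batch * size
    return oldest_first[start:start + size]
```

'Sending the signal' would then amount to incrementing the stored batch number by one.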

Options to rename files

Many podcasts have either random UID filenames or just name the files inconsistently. This can cause problems with the order in which the files appear in a player.

We should have options for each podcast to rename files based on:

  • Metadata
  • feed data
  • pubDate
  • serial number running from first downloaded to latest
  • ?

Instead of giving free rein we could simply start by having a few simple prepackaged solutions for misbehaving podcasts.

We might also need sign scrubbing similar to derailleur?

Filter: Add per-day-quota to deal with too-frequent updates

Some podcasts, typically news, update more frequently than you might need them to. Limiting the max_number doesn't do anything to combat that as you may only ever have one episode but it will constantly be a new one.

One way to deal with this is using the hour filter, which filters according to pubdate. However, some feeds either vary in the hour of publishing or simply neglect to set the hour on pubdate.

To deal with this we add a quota filter. This will simply instruct poca to filter the feed so that only X entries from any single day remain in the feed. So it will still rely on pubdate but to a lesser extent - hopefully.
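
A sketch of such a quota filter. It assumes each entry carries its publication date as a time.struct_time under the key 'published_parsed' and that the feed order should be preserved; the function name is hypothetical.

```python
from collections import Counter


def apply_daily_quota(entries, quota):
    """Keep at most `quota` entries per calendar day, in feed order."""
    seen = Counter()
    kept = []
    for entry in entries:
        # The first three fields of a struct_time are year, month, day.
        day = tuple(entry["published_parsed"][:3])
        if seen[day] < quota:
            kept.append(entry)
            seen[day] += 1
    return kept
```

Because only the date part of pubdate is used, a feed that sets a varying or missing hour is still handled.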

poca-subscribe: various questions

  • Should delete with no title/url parameters loop through each and every sub? Or just inform user of option to match using --title/--url? Or both? (yes)
  • Should add inform user of defaults? (no, there are not enough settings that would benefit)
  • Should add have a better way to apply metadata/filters to subs? (yes but this depends on the one below - add a new issue to 0.9)
  • Should add sample metadata from most recent file? (This belongs in 0.9)
  • Should we add a check command to poca-subscribe that runs through the settings, defaults and subscriptions and informs user of validity and consequences? (no)

Download: socket.gaierror is not caught

If a download starts up without an internet connection, a socket.gaierror is generated but we're unable to catch it. Instead we use a generic catch-all exception.

files.py, line 47:

except: return Outcome(False, "Unknown error")
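
If the download goes through urllib (an assumption, the issue does not say), a failed name lookup arrives wrapped in a urllib.error.URLError with the socket.gaierror in its reason attribute, which would explain why catching socket.gaierror directly never matches. A sketch of sorting the errors apart; the Outcome field names and the function name are made up here:

```python
import socket
import urllib.error
from collections import namedtuple

# Stand-in for poca's Outcome; the field names are assumptions.
Outcome = namedtuple("Outcome", "success msg")


def classify_download_error(error):
    """Map an exception raised during download to an Outcome."""
    # urllib wraps lower-level errors; the original sits in .reason.
    if isinstance(error, urllib.error.URLError) and \
            isinstance(error.reason, socket.gaierror):
        return Outcome(False, "Name lookup failed (no connection?)")
    if isinstance(error, socket.gaierror):
        return Outcome(False, "Name lookup failed (no connection?)")
    # URLError and socket errors are both subclasses of OSError.
    if isinstance(error, OSError):
        return Outcome(False, "Network error: %s" % error)
    return Outcome(False, "Unknown error")
```

It would be called from an `except Exception as error:` clause, which also avoids the bare except swallowing KeyboardInterrupt.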

Defaults

Some settings are fairly complex to most users. Some may edit them without knowing what they mean. It may therefore be necessary to load defaults and only override them with legitimate settings and settings combinations (utf8/2.4)

poca-subscribe: subscription attributes

Feature: Add tags to subscriptions by way of tag attributes. Specifically:

  • Categories: <subscription category="news">...</subscription> This ties mostly into the list command that would be able to group subscriptions in the same category together
  • State: <subscription state="inactive">...</subscription> A way to temporarily opt out of a podcast without having to save it somewhere else. Should delete audio files but keep db.
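
A sketch of how a list command might read these attributes. It uses the standard library's xml.etree.ElementTree rather than lxml to stay self-contained, and the element layout, the 'uncategorised' fallback and the function name are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_by_category(xml_text):
    """Return {category: [(title, state), ...]} for all subscriptions."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for sub in root.iter("subscription"):
        category = sub.get("category", "uncategorised")
        state = sub.get("state", "active")
        groups[category].append((sub.findtext("title"), state))
    return dict(groups)
```

Subscriptions without either attribute still work, so existing configuration files would not need to change.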

README.md needs work

The readme could be updated after all these years.

  1. Only Python 3 instructions
  2. Pip install
  3. Overhauling / rewriting the description, making note of the prevalence of smart phones (rsync anybody?)
  4. Jazzing up: Add a recording of the tool working (https://asciinema.org/ ?)

Use symbols in output for easier parsing

While the less verbose output is easier to interpret, it is still not immediately obvious when there are changes as opposed to when nothing new is in the pipes. One way to make the output quicker to eye-parse is by changing the output ("No changes", "1 file(s) to download", etc.) to signs indicating what's going on.

There shouldn't be an issue with the encoding seeing as we're running Python 3 and Bash (from which we would be cat/less-reading the log) shouldn't have an issue with it either. I believe.

It isn't customary in a CLI program but why not? It could also be an option in preferences:
<pictogram_output>yes</pictogram_output>

Suggestions:

Error: ⚠ (http://unicode-table.com/en/26A0/)
Download: ➕ (http://unicode-table.com/en/2795/)
Remove: ➖ (http://unicode-table.com/en/2796/)
Exit: ❌ (http://unicode-table.com/en/274C/)
Downloading: ⇵ (http://unicode-table.com/en/21C5/)
Failed download: ☇ (http://unicode-table.com/en/2607/)
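
A sketch of the lookup, using the code points suggested above. The dict keys, the function name and the plain-text fallback are assumptions; the pictograms flag stands for the proposed pictogram_output preference.

```python
SYMBOLS = {
    "error": "\u26a0",        # warning sign
    "download": "\u2795",     # heavy plus sign
    "remove": "\u2796",       # heavy minus sign
    "exit": "\u274c",         # cross mark
    "downloading": "\u21c5",  # upwards arrow leftwards of downwards arrow
    "failed": "\u2607",       # lightning
}


def format_line(action, text, pictograms=True):
    """Prefix a log line with a symbol, or with a plain word if disabled."""
    prefix = SYMBOLS[action] if pictograms else action.capitalize()
    return "%s %s" % (prefix, text)
```

Keeping the mapping in one place means the symbols can be swapped without touching the logging calls.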
