martijnboers / blottertrax

/r/listentothis submissions reddit bot

License: GNU General Public License v3.0

Python 99.66% Dockerfile 0.34%

bot lastfm musicbrainz reddit

blottertrax's People

Forkers

gaff nedlinin

blottertrax's Issues

Use SoundCloud service to enforce rules

Looking at new posts in /r/listentothis I see a lot of SoundCloud links. It would be nice if this service were also implemented so it can be used to enforce the rules.

Send report instead of deleting submissions

A requirement to test the bot is that as a first step it should send modmail instead of deleting submissions when they exceed the threshold.

Should it also not reply with the artist bio?

Link to artist's Last.fm page should be included in automatic removal message

This will help users understand any false positives that might occur.

Move Python source to src directory

Remove Python source from project root
Create start.sh bash file that executes python src/main.py

Make exclusion list of artists

To prevent things such as 'Various Artists' being flagged for too many Last.fm plays:

https://www.reddit.com/r/listentothis/comments/fdtbst/various_artists_various_tracks_instrumental_the/fjjmu1g/?context=3
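
A minimal sketch of such an exclusion check, assuming the artist name has already been parsed from the title. `EXCLUDED_ARTISTS` and `is_excluded_artist` are hypothetical names, and the entries are only illustrative.

```python
# Hypothetical exclusion list; the entries are illustrative, not taken from the project.
EXCLUDED_ARTISTS = {"various artists", "various", "unknown artist"}


def is_excluded_artist(artist_name: str) -> bool:
    """Return True when the artist should be skipped by the play-count checks."""
    # casefold() makes the comparison case-insensitive.
    return artist_name.strip().casefold() in EXCLUDED_ARTISTS
```

The threshold check could call this first and skip the Last.fm lookup when it returns True.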

Multiprocessing output not logged in docker

https://stackoverflow.com/questions/63051943/logging-with-multiprocessing-in-docker

Find a better way of logging than printing everywhere. Database logging could also be achieved if we use Python's logging package.
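
A rough sketch of what moving from `print` to the standard `logging` package could look like. `build_logger` is a hypothetical helper, and whether this alone resolves the multiprocessing-in-Docker problem from the linked question is untested here.

```python
import logging
import sys


def build_logger(name: str = "blottertrax") -> logging.Logger:
    """Create a logger that writes to stdout, which Docker captures as container logs."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(processName)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A database handler could later be attached to the same logger by subclassing `logging.Handler`.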

Feature request: Include Track Information

Hi,

This is a feature request really: Have you considered including track information?

I was thinking of trying to build up a playlist based on the top daily tracks and I stumbled across your code here. My thinking is I could enhance each of your service objects to return standard info (artist, track, album, streamcount, url, confidence_level). Then you could put the track URLs in the message body. Another script could then scrape these up and build a playlist for each platform.

Have you got any interest in this idea?
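
To make the proposal concrete, here is a sketch of the standard record, using the field names listed above. `TrackInfo` and `format_track_links` are hypothetical names, and the Markdown list layout is an assumption about how the links might appear in the message body.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackInfo:
    """Hypothetical common return type for every service object."""
    artist: str
    track: str
    album: Optional[str]
    stream_count: Optional[int]
    url: str
    confidence_level: float


def format_track_links(tracks: List[TrackInfo]) -> str:
    """Render one Markdown bullet per track so another script can scrape the URLs."""
    return "\n".join(
        "* [{} - {}]({})".format(t.artist, t.track, t.url) for t in tracks
    )
```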

Last.FM license

The Last.FM summary is licensed under CC BY-SA, which means the bot has to link to the license to be allowed to use it.

The issue can be solved either by having the bot state the license with a link to CC BY-SA or, if possible, by switching to Discogs, as they use CC0.

Ensure description gets sanitized for markdown formatting

https://www.reddit.com/r/listentothis/comments/frhf3e/caravan_golf_girl_progressive_rock_1971/flvs3pr/

Ends up with the following contained in its markdown:

[wikipedia](https://en.wikipedia.org/wiki/Caravan_(band))

When in reality it needs to be

[wikipedia](https://en.wikipedia.org/wiki/Caravan_(band\))

We should probably escape any closing parens contained within links so that reddit will properly parse the URL.
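
One possible way to do this, sketched under the assumption that links appear as inline `[text](url)` Markdown with at most one level of parentheses inside the URL. `escape_link_parens` is a hypothetical helper, not existing project code.

```python
import re

# Inline Markdown link: [text](url), allowing one level of parentheses inside the URL.
_LINK = re.compile(r"\[([^\]]*)\]\(((?:[^()\s]|\([^()\s]*\))*)\)")


def escape_link_parens(markdown: str) -> str:
    """Backslash-escape closing parens inside link URLs so reddit parses them whole."""
    def _escape(match):
        text, url = match.group(1), match.group(2)
        # Skip parens that are already escaped, so the function is safe to run twice.
        escaped_url = re.sub(r"(?<!\\)\)", r"\\)", url)
        return "[{}]({})".format(text, escaped_url)

    return _LINK.sub(_escape, markdown)
```

Run on the Caravan link above, this yields the escaped form shown in the second example.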

Set requirement dependencies to update until next major version

I set up Dependabot for this repository so dependencies don't get outdated

Add ability to override limits in config file

Right now if we want to change say the YouTube listeners limit we are forced to change source code and recreate the docker image. Instead, maybe keep the defaults in source but allow a way to override it in our configuration file.
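
A sketch of the override idea using the standard `configparser` module. The `[limits]` section name, the keys, and the default numbers are all placeholders, since the project's real config layout and limits are not shown here.

```python
import configparser

# Defaults stay in source. These numbers are placeholders, not the project's real limits.
DEFAULT_LIMITS = {"youtube_views": 1000000, "lastfm_listeners": 250000}


def load_limits(config_text: str, section: str = "limits") -> dict:
    """Return the defaults, overridden by any values found in the config section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    limits = dict(DEFAULT_LIMITS)
    if parser.has_section(section):
        for key in limits:
            limits[key] = parser.getint(section, key, fallback=limits[key])
    return limits
```

With this shape, changing a limit only needs a config edit and a container restart rather than a rebuilt image.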

Add LICENSE file

I'm a GPL kind of guy but this is open to discussion

Save failed artists lookup

I want to get some insight into why the bot can't find an artist, so failed attempts should get saved

Youtu.be shortened links don't have the v query in the url

Example: https://youtu.be/XnZuDiF1w8I

The YouTube service should adapt to this, as it currently fails:

BlotterTrax/youtube.py

Lines 36 to 39 in 2457a4e

    request = self.youtubeClient.videos().list(
        part="statistics",
        id=query['v'][0]
    )
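
A possible fix, sketched with the standard library only. `extract_youtube_video_id` is a hypothetical helper whose result would be passed as `id=` in the call above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_youtube_video_id(url: str) -> Optional[str]:
    """Return the video id for youtube.com and youtu.be links, or None if absent."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the id as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None
```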

It's possible that a double > can end up in the artist bio post

See: https://www.reddit.com/r/listentothis/comments/f65cxl/mischief_brew_olde_tyme_memry_folk_2016/fi4q7up/?context=3

This could possibly be caused by a double newline character, as newlines get rewritten here

BlotterTrax/lastfm.py

Lines 51 to 52 in 1331c76

    # Fix formatting for linebreaks
    description = description.replace("\n", "\n>")
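
If consecutive newlines really are the cause (that is only a guess), one option is to collapse them before adding the quote markers. `quote_description` is a hypothetical helper. Unlike the snippet above, it also prefixes the first line, so the template would need adjusting if it already adds a leading `>`.

```python
import re


def quote_description(description: str) -> str:
    """Render the bio as a single Markdown blockquote without empty quoted lines."""
    # Normalise line endings and collapse runs of newlines into one.
    text = re.sub(r"(?:\r\n|\r|\n)+", "\n", description.strip())
    # Prefix every line, including the first, with the quote marker.
    return "\n".join(">" + line for line in text.split("\n"))
```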

Last.fm's get_bio_summary() doesn't format nicely when there are multiple bands named the same

Some examples:

Cut off at a weird position, actually two bands with this name https://www.reddit.com/r/listentothis/comments/f6hrys/vaz_visiting_hours_noise_rock_2013/fi4t8ml/?context=3
Same here https://www.reddit.com/r/listentothis/comments/f6hrag/the_busy_signals_look_the_other_way_punkpowerpop/fi4t4co?utm_source=share&utm_medium=web2x
Only shows one artist https://www.reddit.com/r/listentothis/comments/f6dot5/aviators_ghosts_of_our_fathers_dark_alternative/fi4s5ay?utm_source=share&utm_medium=web2x
Same here https://www.reddit.com/r/listentothis/comments/f6bqmf/eggplant_pity_alternative1989/

Don't know what the best approach is, maybe the API exposes when an artist name has multiple records and it can be formatted differently

_extract_artist_post_title can match on artist name itself

The function _extract_artist_post_title currently matches based on " -" appearing in the post title. However, this could be within the artist name itself. Look into making this function more robust in the hopes of not hitting false matches.

/r/listentothis currently requires double dashes in the title, but in my experience many users do not follow this and use a single dash, so we can't necessarily rely on that.

https://www.reddit.com/r/listentothis/wiki/rules#wiki_title_format.3A_artistname_--_trackname_.5Bgenre_.2F_genres.5D_.28year.29
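
One way to make the split more robust, as a sketch: try the documented double dash first and only then fall back to a single dash with whitespace on both sides, so hyphenated artist names are left alone. `split_artist_and_track` is a hypothetical stand-in for `_extract_artist_post_title`, not its actual implementation.

```python
import re
from typing import Optional, Tuple

# Prefer the documented " -- " separator, then fall back to a spaced single dash.
_SEPARATORS = (
    re.compile(r"\s+--\s+"),
    re.compile(r"\s+[-\u2013\u2014]\s+"),
)


def split_artist_and_track(title: str) -> Optional[Tuple[str, str]]:
    """Split a post title into (artist, rest of title), or return None if no separator."""
    for separator in _SEPARATORS:
        parts = separator.split(title, maxsplit=1)
        if len(parts) == 2 and all(part.strip() for part in parts):
            return parts[0].strip(), parts[1].strip()
    return None
```

This still misfires when an artist name itself contains a spaced dash and the poster also used a single dash as the separator, so it reduces false matches rather than removing them.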

Add Python 3.7 method return types

Abstract title regex to own class so tests can work in actions step

Now it will crash because the config isn't set in the workflow

Dependabot couldn't authenticate with https://pypi.python.org/simple/

Dependabot couldn't authenticate with https://pypi.python.org/simple/.

You can provide authentication details in your Dependabot dashboard by clicking into the account menu (in the top right) and selecting 'Config variables'.

View the update logs.

Provide descriptive function names

Ensure all function names are descriptive as to what they actually do. Preferably ensure comments above the function to describe their intended purpose.

For instance: "Perhaps exceeds_threshold should be named something more descriptive as it is returning a full object rather than a boolean. "get_artist_info" or "get_service_info" or similar?"

Track name returned instead of album title

DescriptionProvider uses the track title instead of the album title in the template. Patch included.

diff --git a/blottertrax/description_provider.py b/blottertrax/description_provider.py
index 85e2901..b24800f 100644
--- a/blottertrax/description_provider.py
+++ b/blottertrax/description_provider.py
@@ -33,6 +33,7 @@ class DescriptionProvider:
         recording = result['recording-list'][0]
         artist = self._get_artist_by_id(recording.get('artist-credit')[0]['artist']['id'])
 
+        album_title = ArrayUtil.safe_list_get(recording, recording['title'], 'release-list', 0, 'title')
         album_release_date = ArrayUtil.safe_list_get(recording, False, 'release-list', 0, 'date')
         life_span_begin = ArrayUtil.safe_list_get(artist, '?', 'life-span', 'begin')
         life_span_end = ArrayUtil.safe_list_get(artist, 'now', 'life-span', 'end')
@@ -55,7 +56,7 @@ class DescriptionProvider:
         return templates.musicbrainz_artist_info.strip().format(
             artist['name'],
             life_span,
-            recording['title'],
+            album_title,
             album_release_date,
             tags,
             'none' if not socials else socials,

Ignore posts with [Playlist] tag

TitleParser exception causing crash

Looks like we need to test song_title for null in TitleParser before we try any further ops. Otherwise, we sometimes crash when the parsing fails out.

Traceback (most recent call last):
  File "./main.py", line 108, in daemon
    self._run()
  File "./main.py", line 53, in _run
    parsed_submission = TitleParser.create_parsed_submission_from_submission(submission)
  File "/usr/src/app/blottertrax/helper/title_parser.py", line 46, in create_parsed_submission_from_submission
    song_title = re.sub(r"\(.*\)", "", song_title)
  File "/usr/local/lib/python3.7/re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./main.py", line 108, in daemon
    self._run()
  File "./main.py", line 53, in _run
    parsed_submission = TitleParser.create_parsed_submission_from_submission(submission)
  File "/usr/src/app/blottertrax/helper/title_parser.py", line 46, in create_parsed_submission_from_submission
    song_title = re.sub(r"\(.*\)", "", song_title)
  File "/usr/local/lib/python3.7/re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./main.py", line 116, in <module>
    BlotterTrax().daemon()
  File "./main.py", line 112, in daemon
    self.daemon()
  File "./main.py", line 112, in daemon
    self.daemon()
  File "./main.py", line 112, in daemon
    self.daemon()
  [Previous line repeated 328 more times]
  File "./main.py", line 110, in daemon
    traceback.print_exc(file=sys.stdout)
  File "/usr/local/lib/python3.7/traceback.py", line 163, in print_exc
    print_exception(*sys.exc_info(), limit=limit, file=file, chain=chain)
  File "/usr/local/lib/python3.7/traceback.py", line 104, in print_exception
    type(value), value, tb, limit=limit).format(chain=chain):
  File "/usr/local/lib/python3.7/traceback.py", line 497, in __init__
    _seen=_seen)
  File "/usr/local/lib/python3.7/traceback.py", line 497, in __init__
    _seen=_seen)
  File "/usr/local/lib/python3.7/traceback.py", line 497, in __init__
    _seen=_seen)
  [Previous line repeated 328 more times]
  File "/usr/local/lib/python3.7/traceback.py", line 508, in __init__
    capture_locals=capture_locals)
  File "/usr/local/lib/python3.7/traceback.py", line 333, in extract
    limit = getattr(sys, 'tracebacklimit', None)
RecursionError: maximum recursion depth exceeded while calling a Python object
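
The traceback points at two separate problems: `re.sub` receives `None`, and `daemon` restarts itself by calling `self.daemon()` until the recursion limit is hit. A sketch of both fixes follows. `strip_parenthesised_text` and `run_daemon` are hypothetical names, and `max_iterations` exists only to keep the example finite (the real bot would presumably loop indefinitely).

```python
import re
import traceback
from typing import Callable, Optional


def strip_parenthesised_text(song_title: Optional[str]) -> Optional[str]:
    """Remove "(...)" from a title, passing None through when parsing failed."""
    if song_title is None:
        return None
    return re.sub(r"\(.*\)", "", song_title).strip()


def run_daemon(run_once: Callable[[], None], max_iterations: int) -> int:
    """Retry in a loop instead of recursing, and return the number of failures."""
    failures = 0
    for _ in range(max_iterations):
        try:
            run_once()
        except Exception:
            failures += 1
            traceback.print_exc()
    return failures
```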

Ensure artist description links are good

https://www.reddit.com/r/listentothis/comments/fru372/brooke_waggoner_lungs_speed_lungs_sped_indie_folk/flxn6v6/

The artist description pulled from Discogs has a link for lyrics. This link redirects multiple times and ends up on an ad-filled hellscape of a website.

Maybe we should whitelist a handful of sites and remove any links that don't match that whitelist? For instance, the main socials, amazon, wiki, etc.
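
A sketch of the whitelist idea, assuming the description contains inline `[text](url)` Markdown links without parentheses in the URL. The domain list is illustrative only, and `strip_untrusted_links` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse

# Illustrative allow-list; the real set of trusted sites is for the maintainers to decide.
ALLOWED_DOMAINS = {"wikipedia.org", "bandcamp.com", "facebook.com", "twitter.com", "amazon.com"}

_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def is_allowed(url: str) -> bool:
    """True when the URL's host is an allowed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


def strip_untrusted_links(markdown: str) -> str:
    """Keep links to allowed domains and reduce every other link to its plain text."""
    def _replace(match):
        text, url = match.group(1), match.group(2)
        return match.group(0) if is_allowed(url) else text

    return _LINK.sub(_replace, markdown)
```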

Free streaming links should be parsed to show the streaming service

Basically the same as #61 but for streaming services

Example: https://www.reddit.com/r/listentothisbottest/comments/fn6x9o/black_kirin_wangchuan_river_black_metalfolk_2017/fm959ys?utm_source=share&utm_medium=web2x

Add better README

Add MusicBrainz link to ‘Socials’

Add the MusicBrainz link to ‘Socials’ - although it is down by ‘submit corrections’, for a hot second I couldn’t find the MB link. This would be consistent with the ‘other databases’ links there already.

Disclaimer: I work part-time for the MetaBrainz foundation - I see this tool pop up a lot on my Reddit alerts :)
Feel free to decline the change, of course!

Report possible self promo

Would like to see us try to detect self promotion. One simple step would be to take the artist name, remove all spaces and drop it to lowercase and compare to lowercased reddit account name. Perhaps if the artist name appears in the name we report the post for mods to double check.
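
The comparison described above could look roughly like this. Stripping all non-alphanumeric characters (not just spaces) and requiring a minimum artist-name length are extra assumptions added to cut down on false reports, and `looks_like_self_promotion` is a hypothetical name.

```python
import re


def normalise(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def looks_like_self_promotion(artist_name: str, reddit_username: str, min_length: int = 4) -> bool:
    """True when the normalised artist name appears inside the normalised username."""
    artist = normalise(artist_name)
    user = normalise(reddit_username)
    # Very short artist names match too many usernames, so they are skipped.
    return len(artist) >= min_length and artist in user
```

A match would only trigger a report for the mods to double check, not a removal.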

Determine hosting provider and details

Need to determine how we are intending to host the bot.

Hosting provider: Amazon EC2 Micro? Digital Ocean?
Organize login methods for @martijnboers and @Nedlinin for maintenance needs.

Continuous deployment? Docker container desired?

Key errors in artist description

Traceback (most recent call last):
  File "./main.py", line 133, in daemon
    self._run()
  File "./main.py", line 63, in _run
    self._reply_with_sticky_post(submission, self.description_provider.get_reply(parsed_submission))
  File "/usr/src/app/blottertrax/description_provider.py", line 51, in get_reply
    recording['release-list'][0]['date'],
KeyError: 'date'
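
A defensive lookup avoids the crash when a release has no date. The sketch below mirrors the argument order of the `ArrayUtil.safe_list_get` calls quoted in the patch elsewhere on this page, but `safe_get` itself is a hypothetical implementation, not the project's.

```python
from typing import Any


def safe_get(data: Any, default: Any, *keys: Any) -> Any:
    """Walk nested dicts and lists, returning default if any step is missing."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current
```

For example, `safe_get(recording, None, 'release-list', 0, 'date')` returns `None` instead of raising when the first release has no `date` key.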

Properly label social networks in the description reply

https://www.reddit.com/r/listentothis/comments/frnzfw/squid_sludge_experimental_rock_2020/flwrnku/

This contains

socials: bandcamp, discogs, free streaming, purchase for download, social network, social network, social network, soundcloud

as the reply. Should be fairly trivial to check the URL for containing facebook, twitter, instagram, etc and replace the text with the actual social network being linked to rather than forcing the user to click/hover each.
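
A sketch of that URL check. The domain-to-label mapping is illustrative, and `label_for_link` is a hypothetical helper that falls back to the original label when the host is not recognised.

```python
from urllib.parse import urlparse

# Illustrative mapping; extend with whichever networks show up in practice.
KNOWN_NETWORKS = {
    "facebook.com": "facebook",
    "twitter.com": "twitter",
    "instagram.com": "instagram",
}


def label_for_link(url: str, fallback: str) -> str:
    """Return the network name for a known host, otherwise the original label."""
    host = (urlparse(url).hostname or "").lower()
    for domain, label in KNOWN_NETWORKS.items():
        if host == domain or host.endswith("." + domain):
            return label
    return fallback
```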

        ('007Bonez feat Adro - Motion [Hip-Hop / Rap] (2019)', '007Bonez feat Adro', None, 'Motion'),

This should be

       ('007Bonez feat Adro - Motion [Hip-Hop / Rap] (2019)', '007Bonez', 'Adro', 'Motion'),

But, worse yet, artists that actually have & in the name are detected as a featured artist. For example:

       'Simon & Garfunkel - The Sound of Silence'

Would list the main artist as Simon and the featured artist as Garfunkel.

Not sure there will be an easy way to solve that second one though.
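
One partial approach, sketched below, is to split only on explicit "feat"/"ft"/"featuring" markers and never on "&". That keeps 'Simon & Garfunkel' intact, at the cost of not splitting genuine "A & B" collaborations. `split_featured_artist` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Only explicit featuring markers count as separators; "&" is deliberately ignored.
_FEATURING = re.compile(r"\s+(?:feat\.?|ft\.?|featuring)\s+", re.IGNORECASE)


def split_featured_artist(artist: str) -> Tuple[str, Optional[str]]:
    """Return (main artist, featured artist or None)."""
    parts = _FEATURING.split(artist, maxsplit=1)
    if len(parts) == 2:
        return parts[0].strip(), parts[1].strip()
    return artist.strip(), None
```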

It should use the first album release, not the last

Currently it shows compilation albums etc. if there's more than one release for the track

https://github.com/martijnboers/BlotterTrax/blob/master/blottertrax/description_provider.py#L36
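
A sketch of picking the earliest release, assuming each entry in the release list is a dict with an optional `date` string in `YYYY`, `YYYY-MM`, or `YYYY-MM-DD` form, which sorts correctly as plain text. `earliest_release` is a hypothetical helper.

```python
from typing import List, Optional


def earliest_release(releases: List[dict]) -> Optional[dict]:
    """Return the release with the earliest date, ignoring undated entries."""
    dated = [release for release in releases if release.get("date")]
    if not dated:
        # Nothing has a date, so fall back to the first entry if there is one.
        return releases[0] if releases else None
    return min(dated, key=lambda release: release["date"])
```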
