sofaworks / nosleepautobot



Submission-checking bot for r/nosleep

License: Apache License 2.0

Python 99.37% Dockerfile 0.63%

nosleep reddit bot

nosleepautobot's People

craigralston watchful1 binarybrat

nosleepautobot's Issues

Post a reply for bad tags rather than send a PM

Don't clutter up the modmail with a bunch of tag removal messages. Since posts with bad titles are already deleted anyway, just send a reply within the post like the bot does for posts that violate the time limit rule.

Long paragraph checker edge cases

The recent paragraph checker feature implemented in #6 does not account for the following edge case.

A problem arises when a block of text contains line breaks like this:

I am text.\n \nI am more text.

The long paragraph checker splits on consecutive \n characters, but if a user for whatever reason has spaces between their newlines, it throws the checker off.
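A minimal sketch of a split that tolerates whitespace between the newlines. The helper name split_paragraphs is hypothetical and is not taken from the bot's source; it only illustrates the idea.

```python
import re

# Hypothetical helper, not the bot's actual code.
# A paragraph break is a newline, optional spaces/tabs, then another newline;
# any further whitespace-only lines are swallowed by the trailing \s*.
_PARAGRAPH_BREAK = re.compile(r"\n[ \t\r]*\n\s*")


def split_paragraphs(text):
    """Split text into paragraphs, ignoring spaces or tabs between newlines."""
    return [p for p in _PARAGRAPH_BREAK.split(text.strip()) if p]


paragraphs = split_paragraphs("I am text.\n \nI am more text.")
print(paragraphs)
```

With this pattern, the example from the issue is treated as two paragraphs rather than one.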

Enhancements to /moderation/activity_tracker.py

We need a post count-specific report that can be generated on request which includes total number of posts made to /r/nosleep, number of posts approved by any/all mods, and number of posts removed by any/all mods within a specific time period.

Sample Input (this is just an example and can be adjusted as makes sense):

posts --startdate 2019-08-19 --enddate 2019-08-21

Note: if dates are the same, only that one day is being requested. Or you can make up a different way to only request a single day of data, I don't care. But single day requests should be allowed.

Generates a table like:

Date	Total Posts to NoSleep	Approved Posts	Removed Posts
2019-08-19	50	25	25
2019-08-20	70	40	30
2019-08-21	60	30	30

Bot is flaky on enforcing 24-hour posting rule

User /u/CigarettesAndSongs triggered a bug in the bot: they posted the same story (same title and text) twice within a 45-minute period, and the bot did not remove the second, duplicate posting.

The stories were:

The bot should always remove a story if the user already has an active (non-removed) story on /r/NoSleep from within the last 24 hours.

Two spaces at the end of a line followed by a single return should be considered valid formatting for a post

We recently ran into a story on /r/NoSleep that utilized the following format:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc consequat consequat elit. Donec ut dignissim tellus.
Integer vehicula, sem ut condimentum porta, lacus mauris venenatis elit, eget cursus lectus ante sed leo. Mauris at leo porta, semper diam eget, viverra nulla. Praesent sit amet felis nisl. Etiam quis luctus diam. Ut tempor, quam a pellentesque dictum, mauris metus blandit nibh, et aliquam enim quam at ipsum. Pellentesque nec tellus suscipit nisi gravida fringilla id at augue. Nam auctor neque sed luctus aliquam.
Donec quis arcu at ex consequat accumsan. Proin eu fermentum tortor. Ut malesuada lorem sed tortor sagittis, eu pharetra augue egestas. Nunc aliquam fermentum porta.
Aliquam volutpat erat nec turpis cursus rhoncus. Quisque tempor sapien a ipsum hendrerit, non tristique velit bibendum. Cras mattis luctus nulla, sed ultrices velit. Praesent a magna nec tellus condimentum aliquam.
Nulla nibh turpis, pharetra quis semper sed, posuere vel quam. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Duis ultricies est in metus tincidunt euismod.

This format is: lines or paragraphs which have two spaces at the end of them, followed by a single return. Since the bot doesn't recognize this as a valid paragraph break style, if the story is more than 350 words, the bot will reject it for formatting issues.

Although this formatting isn't a common case and doesn't conform to the "double-return to create new paragraphs" format, the mods have deemed it acceptable. The bot should not reject stories formatted like this for formatting issues.
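One possible way to accept both break styles is sketched below. It assumes a break is either a blank (or whitespace-only) line, or a line ending in two or more spaces followed by a single return. The function name is hypothetical and the bot's real splitter may look different.

```python
import re

# Assumption: two accepted break styles.
#   1. one or more blank / whitespace-only lines (the "double return" style)
#   2. two or more trailing spaces followed by a single newline
_BREAK = re.compile(r"(?:\n[ \t]*)+\n| {2,}\n")


def split_with_hard_breaks(text):
    """Split a post body into paragraphs using either break style."""
    return [part.strip() for part in _BREAK.split(text) if part.strip()]


sample = "one  \ntwo\nthree\n\nfour"
print(split_with_hard_breaks(sample))
```

A single return without trailing spaces (between "two" and "three" in the sample) still does not count as a break.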

NoSleepAutoBot posted in another subreddit

NoSleepAutoBot posted a comment in the /r/Politiek subreddit here:

https://www.reddit.com/r/Politiek/comments/sxuwru/comment/ifrvjyc/

Interestingly, the post is five months old, but it did this three days ago...

Move off Heroku

Heroku has finally decided to end their free tier, so the bot will need to be migrated to a different service since it certainly doesn't need the resources of their $7 tier.

Add 'Days Active' concept to mod activity tools

For both the weekly report sent to mods (/tools/track_mod_activity.py) and the on-demand activity reports (/moderation/activity_tracker.py), can we additionally have 'days active' reported?

'Days Active' is the number of days within the requested time window that the moderator did any sort of activity (even so much as one removal/approval of a post or comment).

Now the weekly PM to mods should read:

Post Approvals:
Post Rejections:
Comment Approvals:
Comment Rejections:
Days Active:

And the on-demand activity report should include an additional column for 'Days Active'.
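A rough sketch of how 'Days Active' could be counted, assuming each moderator action is available as a UTC epoch timestamp and that days are split on UTC midnight. Both points are assumptions, as is the function name.

```python
from datetime import datetime, timezone


def days_active(action_timestamps):
    """Count distinct UTC calendar days with at least one mod action.

    action_timestamps: iterable of epoch seconds (assumed to be UTC).
    """
    return len({
        datetime.fromtimestamp(ts, tz=timezone.utc).date()
        for ts in action_timestamps
    })


# Two actions on 1970-01-01 and one on 1970-01-02
print(days_active([0, 3600, 86400]))
```

Collapsing timestamps into a set of dates means a single approval or removal is enough to mark the day as active.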

Bot removes stories when there are 4 or more spaces on a blank line

The bot incorrectly identified this story as a formatting issue and removed it for "4 or more spaces at the beginning of a line".

On closer inspection (i.e. checking the source of the story), you can see that many of the blank lines between paragraphs have a big block of blank spaces/tabs on them.

Example clip from the story:

I did find something just as weird, though. As I was heading back toward the shop I came in through, I saw an open door that I hadn't seen before. Behind it was a staircase leading down. I figured it must just lead to the boiler or something, but since I was here, why not explore the whole place?

The stairs went a lot farther down than I expected. I'd probably descended five floors or more before I finally reached the bottom.
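One way to avoid this false positive is to skip whitespace-only lines before applying the four-space check. The sketch below uses a hypothetical function name and only covers the leading-spaces case described in this issue.

```python
def has_code_block_line(text):
    """Return True if any non-blank line starts with four spaces."""
    for line in text.splitlines():
        if not line.strip():
            # Whitespace-only lines render as blank lines, not code blocks.
            continue
        if line.startswith("    "):
            return True
    return False


story = "First paragraph.\n        \nSecond paragraph."
print(has_code_block_line(story))
```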

Lucene search syntax doesn't work for users with hyphenated names during recent submission search

Submissions from users with names like -Bat-Country or Strawberry-Sunrise result in incorrect search results because Reddit's Lucene indexes aren't searchable using - characters (probably using StandardAnalyzer in Lucene).

Searching for submissions by users should be done using cloudsearch syntax.

Repro case example:

#!/usr/bin/env python
import praw

reddit = praw.Reddit()
subreddit = reddit.subreddit('nosleep')
submissions = list(subreddit.search("author:-Bat-Country-", time_filter='day', syntax='lucene', sort='new'))

The above will return []

Running the search with the following works:

submissions = list(subreddit.search("author:-Bat-Country-", time_filter='day', syntax='cloudsearch', sort='new'))

Set TTLs on indexes

Since the bot uses the Walrus Redis wrapper, it also generates several indexes on the AutoBotSubmission model objects that are saved to the database. The indexes are saved to Redis as SET keys, but are never removed.

Add users to approved users list in /moderation/activity_tracker.py

NoSleep now has moderator team leads and we’d like them to be able to use the Activity Tracker portion of the bot’s functionality. Please add the following users as approved users so that they can use bot PMs to track moderator activity:

/u/flard
/u/onyxoctopus
/u/desidarling

Rewritten bot does not properly enforce post time limit

This is a tracking issue because the problem was already fixed in source. cache_activity_maybe stored the name (fullname) of a submission rather than its id, and is_post_deleted would then return True because the praw Submission model must be constructed with just the base36 id, not a fullname such as t3_base36id.

nosleepautobot/autobot/autobot.py

Lines 284 to 294 in 3b21414

            last_post_id=submission.name,
            last_post_time=submission.created_utc
        )
        ttl = tl - int(diff)
        logger.info("Caching activity", info=activity, ttl=ttl)
        self.activity_db.persist(activity.author, activity, ttl=ttl)
    else:
        logger.info("Not caching activity for post outside timelimit",
                    author=submission.author.name,
                    subreddit=submission.subreddit.display_name,
                    id=submission.name)
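For illustration, the difference between a fullname and a bare id can be handled with a small normalizing helper. This is a hypothetical sketch, not the fix that landed in source; storing submission.id instead of submission.name when caching would avoid the problem as well.

```python
def fullname_to_id(name):
    """Strip a Reddit type prefix such as 't3_' from a fullname.

    Values that are already bare base36 ids are returned unchanged.
    """
    prefix, sep, base36 = name.partition("_")
    if sep and prefix.startswith("t") and prefix[1:].isdigit():
        return base36
    return name


print(fullname_to_id("t3_tglfmw"))
```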

Disable 24-Hour Rule For "Beyond Belief" Event

Flip the 24-Hour Rule flag in Heroku sometime on the evening of March 1st in preparation for "Beyond Belief" event which will run from March 2nd until March 5th. The subreddit will be closed for entries prior to that, so the exact time doesn't matter as long as it's done before 11:59PM.

Search for "Redditor's Most Recent Posts" being performed twice

Issue has existed since the original write of the bot, and was carried over into the rewrite.

When processing new posts, a search is done on the author's posts to the subreddit, with a time_filter=day to see if they have posted within the last day already. This search is being done twice - once in a method to check if they have posted already, and once again in the method that generates the comment that tells them they're not allowed to post in the time limit.

nosleepautobot/autobot/autobot.py

Lines 194 to 199 in 8ce8977

    # TODO this is duplicated in process_time_limit_message
    most_recent = min(
        self.reddit.get_redditor_posts(post.author),
        key=attrgetter("created_utc"),
        default=None
    )

nosleepautobot/autobot/autobot.py

Lines 223 to 228 in 8ce8977

    # TODO this is duplicated in reject_submission_by_timelimit
    most_recent = min(
        self.reddit.get_redditor_posts(post.author),
        key=attrgetter("created_utc"),
        default=None
    )

Usernames with hyphens aren't processed/handled properly

It appears that the reddit search functionality isn't handling users with hyphens in their usernames properly.

Example:

This user posted three stories over the course of two 24-hour periods, which should have been removed by the bot but weren't.

https://www.reddit.com/user/then-system-1022

Family Mystery (tglfmw) @ 5:56pm Mar 17
Lucky Rabbit Foot (tg01n8) @ 10:43pm Mar 16
Reverse Surgery (tfoav4) @ 2:10pm Mar 16

Codeblock checking corner cases

Right now, the code determines whether code blocks exist in a post by checking if a paragraph starts with 4 spaces.

if some_string.startswith('    '):

However, posts with hard tab \t characters leading off paragraphs will also generate code blocks.

Additionally, this means that leading spaces followed by tabs can also trigger a code block to be generated.
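A sketch of a broader check that covers the cases above: four leading spaces, a leading tab, or up to three spaces followed by a tab. It assumes Reddit's renderer treats a tab as reaching the next four-column stop, as common Markdown implementations do. The function name is hypothetical.

```python
import re

# Four spaces, or zero to three spaces followed by a tab.
_CODE_INDENT = re.compile(r"^(?: {4}| {0,3}\t)")


def starts_code_block(paragraph):
    """Return True if a non-blank paragraph begins with code-block indentation."""
    if not paragraph.strip():
        return False
    return _CODE_INDENT.match(paragraph) is not None


for candidate in ["    spaces", "\ttab", "  \tmixed", "   three spaces"]:
    print(repr(candidate), starts_code_block(candidate))
```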

Bot should allow "Update [X]" and "Update #[X]" tags

In addition to "Update", valid tags should also include "Update" + a number and "Update" + a space + "#" + a number.

For example:

Update 2
Update 3
Update #4
Update #5
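The requested forms could be matched with a pattern along these lines. This is only a sketch: the function name is hypothetical, and case-insensitive matching is an assumption.

```python
import re

# "Update", optionally followed by a space, an optional "#", and a number.
_UPDATE_TAG = re.compile(r"update(?: #?\d+)?", re.IGNORECASE)


def is_update_tag(tag):
    """Return True for 'Update', 'Update 2', 'Update #4', and similar."""
    return _UPDATE_TAG.fullmatch(tag.strip()) is not None


for tag in ["Update", "Update 2", "Update #4", "Update two"]:
    print(tag, is_update_tag(tag))
```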

Add the pipe character ( | ) to the title tagging rules

In addition to brackets/braces, the only thing allowed in titles between pipes should be part numbers.

Example:

Allowed: |Part 1|
Not Allowed: |HORROR|

Essentially, add | into the match list at line 206?

Bot is not posting series reminder comment if post is tagged "NSFW"

It looks like the bot isn't posting the "series reminder" comment on series posts if they've been marked "NSFW" through reddit's interface, even if they are properly flaired as "Series" or have allowable series tags in their title.

Examples:

https://www.reddit.com/r/nosleep/comments/bh3jyv/living_in_an_rv_park_pt_2/
https://www.reddit.com/r/nosleep/comments/bgx5ut/hell_is_other_rabbits_father/
https://www.reddit.com/r/nosleep/comments/bgd2nb/weird_shit_ive_seen_in_the_army_fob_scorpion/

Formatting for AutoBot removals/re-approval requests sent to modmail is ugly

Not sure when this happened or why, but text formatting for the removal messages (the ones users send to modmail to request re-approvals) is sometimes ugly. See here:

I'm not sure if this is something that can be fixed on our side, or if it's something that has to do specifically with the client a user is submitting the message through...

Provide option to filter out 'new()' posts that are older

The bot only caches post data for post_time_limit * 2, and depending on how many posts are making it into the subreddit, it's very possible for new() to return old posts in the initial request, which means that the bot might end up double-processing posts.
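A minimal sketch of such a filter, assuming each post exposes created_utc in epoch seconds (as praw submissions do). The function name and the stand-in post objects are hypothetical. Using the cache window (post_time_limit * 2) as the cutoff would keep the two in step.

```python
import time
from types import SimpleNamespace


def drop_stale_posts(posts, max_age_seconds, now=None):
    """Keep only posts created within the last max_age_seconds."""
    now = time.time() if now is None else now
    return [p for p in posts if now - p.created_utc <= max_age_seconds]


# Stand-in objects for illustration; real posts would come from new().
posts = [
    SimpleNamespace(id="old", created_utc=0),
    SimpleNamespace(id="new", created_utc=900),
]
fresh = drop_stale_posts(posts, max_age_seconds=600, now=1000)
print([p.id for p in fresh])
```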

Make the bot remove "unreadable" wall-of-text posts

One of /r/NoSleep's rules is that all story posts must be formatted in a readable fashion. One of the biggest issues we have is people posting "walls of text" -- entire stories without paragraph breaks, like this (and worse -- that's actually a really mild example).

We'd like to have Noxbot automatically remove posts that have any paragraphs of text with more than 350 words in them.

When a post is removed, a PM should be sent (from the subreddit) to the user with the following text:

[[YOUR POST]] has been temporarily removed from /r/NoSleep because it appears to have formatting issues which make it unreadable. In this case, you have one or more paragraphs with more than 350 words. Please break up your story into smaller paragraphs. You can create paragraphs by pressing "Enter" twice at the end of a line.

Once you have fixed the formatting, please respond to this PM for re-approval. The re-approval process is manual, so send a single request only. Multiple requests do not mean faster approval; in fact they will clog the modqueue and result in re-approvals taking even more time.

Please test this change before implementation and alert /u/Himekat so she can watch the bot's behavior on /r/NoSleep.

PS. If it would make it easier to do this by character count instead of word count, I can provide a character count that would be appropriate.
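The word-count version of the check could look roughly like this. The names are hypothetical; the sketch splits paragraphs on blank lines and counts whitespace-separated words.

```python
import re

MAX_WORDS = 350  # limit taken from the request above


def has_long_paragraph(text, max_words=MAX_WORDS):
    """Return True if any paragraph contains more than max_words words."""
    paragraphs = re.split(r"\n\s*\n", text)
    return any(len(p.split()) > max_words for p in paragraphs)


wall_of_text = "word " * 351
broken_up = ("word " * 200) + "\n\n" + ("word " * 200)
print(has_long_paragraph(wall_of_text), has_long_paragraph(broken_up))
```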

Re-enable 24 Hour Rule After "Beyond Belief" Event

Flip the 24-Hour Rule flag back to active on the night of March 4th at 11:59 PM (since the event ends a minute later at midnight on March 5th).

Make bot remove "unreadable" code block posts

A common formatting issue we see in /r/NoSleep is for people to have a tab or series of spaces at the beginning of their paragraphs (like you might in a traditional word processor or book). Unfortunately, reddit renders four spaces at the beginning of a line as a block of code, which almost 100% of the time makes the story unreadable.

Please have the bot check the beginning of every paragraph/line in a story. If any paragraph/line starts with four or more spaces, the story should be removed (as it is more than likely unreadable).

When a post is removed, a PM should be sent (from the subreddit) to the user with the following text:

[[YOUR POST]] has been temporarily removed from /r/NoSleep because it appears to have formatting issues which make it unreadable. In this case, you have one or more paragraphs or lines which begin with four or more spaces.

On reddit, lines beginning with four or more spaces are treated as blocks of code and make your story unreadable. Please remove spaces at the beginning of paragraphs/lines. You can create paragraphs by pressing "Enter" twice at the end of a line if you haven't already done so.

Once you have fixed the formatting, please respond to this PM for re-approval. The re-approval process is manual, so send a single request only. Multiple requests do not mean faster approval; in fact they will clog the modqueue and result in re-approvals taking even more time.

Please test this change before implementation and alert /u/Himekat so she can watch the bot's behavior on /r/NoSleep.

24-hour rule possibly not working for select users?

A user's posts slipped through the 24-hour rule implementation:

Posted Jul 13 1pm: https://www.reddit.com/r/nosleep/comments/vy92xd/i_quit_my_job_because_of_this_encounter/

Posted July 14th 6am: https://www.reddit.com/r/nosleep/comments/vyspi6/i_will_never_know_what_took_my_family_away_and_i/

Posted July 14th 8am: https://www.reddit.com/r/nosleep/comments/vyv1u6/the_crooked_men/

24-hour rule comment (by bot) doesn't display time remaining correctly

The comment that the bot leaves on posts it removes should display the time remaining until the user is allowed to post again in the subreddit. Right now, it doesn't display the time remaining correctly.
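For reference, one way to compute and format the remaining time is sketched below. It assumes the limit is counted from the creation time of the user's earlier post, and both function names are hypothetical.

```python
def time_remaining(last_post_utc, now_utc, limit_seconds=24 * 3600):
    """Seconds left until the user may post again (never negative)."""
    return max(0, limit_seconds - (now_utc - last_post_utc))


def format_time_remaining(seconds_left):
    """Render a duration as hours and minutes, e.g. '23 hours, 15 minutes'."""
    seconds_left = max(0, int(seconds_left))
    hours, remainder = divmod(seconds_left, 3600)
    minutes = remainder // 60
    parts = []
    if hours:
        parts.append(f"{hours} hour{'s' if hours != 1 else ''}")
    if minutes or not hours:
        parts.append(f"{minutes} minute{'s' if minutes != 1 else ''}")
    return ", ".join(parts)


# A second post made 45 minutes after the first one
print(format_time_remaining(time_remaining(0, 45 * 60)))
```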

Adjust caching mechanism to accommodate for people who delete/remove posts and repost within 24 hours

Each user can only have one active/approved post to /r/NoSleep within a 24 hour period. If they have more than one post but, for some reason, the other post was removed by the mods, that shouldn't count against their limit. If they posted but then deleted the post, that also shouldn't count against their limit.

Change formatting removal from PM to comment on post

When the bot removes a post for formatting issues, instead of sending a PM about it (from the subreddit), please have the bot comment on the story directly. In addition to the current stuff the message says, please have a link at the bottom that says:

Once you have fixed your formatting issues, please [[click here]] to request re-approval. The re-approval process is manual, so send a single request only. Multiple requests do not mean faster approval; in fact they will clog the modqueue and result in re-approvals taking even more time.

The "click here" part should either send a PM to the subreddit with the below message/information, or open a message window pre-populated with the below information so that the user can manually send it:

[[MY POST]] to /r/NoSleep was flagged for formatting issues. I have fixed those issues and am now requesting re-approval.

Note to moderation team: if this story is eligible for re-approval, remember to remove the bot's comment from it.

Handle new exception raised on sending PMs to authors with restricted PMs on

Reddit API seems to have enabled a new exception when sending PMs to people who have restricted PM access on, resulting in messages like this:

APIException: NOT_WHITELISTED_BY_USER_MESSAGE: 'no puedes enviar un mensaje a ese usuario' on field 'to'

[LOW PRIORITY] Lock NoSleepAutoBot's series reminder comment

Reddit just added a new feature: the ability to lock comments.

When NoSleepAutoBot leaves a series reminder comment on a Series story, that comment should be locked so that no one can reply to it.

Bot doesn't send series PM after a non-tagged, processed post gets a "Series" flair

Note

This is a tracking issue because this has been fixed with 0b7094f

Reproduction Steps

Make a post meeting the following criteria:
- Title does not contain tags (because all tags identify a story as being a 'series')
- Do not flair the post as Series (or whatever series flair it is)
Run bot to analyze the post and have it cache the analysis
Flair the post Series
Re-run bot on post
Bot identifies the post as having been tagged Series after the fact
Bot doesn't PM user the "Series PM"

Description

This is a bug that has probably existed since the creation of the bot, but I noticed that it happened while testing the Major Overhaul PR #107.

When an already-processed post was rechecked during subsequent runs to figure out if it had been flaired as a Series, the code did this:

if obj:
    # Do processing on previous submissions to see if we need to add the series message
    # if we saw this before and it's not a series but then later flaired as one, send
    # the message
    if not obj.is_series and (p.link_flair_text == 'Series'):
        obj.is_series = True
        self.post_series_reminder(p)
        obj.save()

So it would post the comment like "Oh, there may be more to this story," but it wouldn't send the series PM or mark it as having been sent when it updated the object in cache.

Rule-breaking title issue

The following story has a rule-breaking title: https://www.reddit.com/r/nosleep/comments/zzenot/the_vulture_house_part_1_of_2/

(Part 1 of 2) shouldn't be allowed as a title, yet it appears here in "The Vulture House (Part 1 of 2)".

Only a single part should be allowed (e.g., Part 1 or Part One).
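A sketch of a stricter part-tag check that accepts a single part only. The list of number words and the function name are assumptions, and the bot's actual title rules may be broader.

```python
import re

# Assumed set of spelled-out numbers; extend as needed.
_NUMBER_WORDS = "one|two|three|four|five|six|seven|eight|nine|ten"
_PART_TAG = re.compile(rf"part (?:\d+|{_NUMBER_WORDS})", re.IGNORECASE)


def is_valid_part_tag(tag):
    """Accept 'Part 1' or 'Part One'; reject forms like 'Part 1 of 2'."""
    return _PART_TAG.fullmatch(tag.strip()) is not None


for tag in ["Part 1", "Part One", "Part 1 of 2"]:
    print(tag, is_valid_part_tag(tag))
```

Using fullmatch rather than search is what rejects the trailing "of 2".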

Add user to approved users list in Heroku for Activity Tracker

Please grant user survivalprocedure the ability to pull Activity Reports via the setting in Heroku.

/tools/track_mod_activity.py enhancement

Our head mod, cmd102, would like the bot to be able to help her monitor moderator activity.

She should be able to send a PM to /u/NoSleepAutoBot with a command and receive back moderator activity for the entire month (first day of the current month to current day of month, like the current mod activity script).

If she sends the command activity all, she should get back every NoSleep moderator's post approvals, post removals, comment approvals, and comment removals. The format should be readable and I wouldn't spend too much time on it, although bonus points if it were a well-laid-out grid. Otherwise, just a big long list is fine.

If she sends the command activity {USERNAME}, she should get back the activity for that one moderator only. Bonus points if you support comma-separated lists for the purposes of submitting multiple usernames, but not entirely necessary.

Additional notes:

Right now, only /u/cmd102 and /u/Himekat (for testing purposes) should be able to receive a response to these commands. The bot should ignore requests from anyone else. But it might make sense to make it an easily-editable whitelist in case we need to add more people in the future.
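The command handling described above might be sketched like this. The function name, the return convention, and the hard-coded set are illustrative assumptions; an easily editable whitelist could equally live in configuration.

```python
# Illustrative whitelist; compared case-insensitively.
APPROVED_USERS = {"cmd102", "himekat"}


def parse_activity_command(sender, body):
    """Parse 'activity all' or 'activity user1,user2'.

    Returns None when the request should be ignored, the string "all",
    or a list of usernames.
    """
    if sender.lower() not in APPROVED_USERS:
        return None
    parts = body.strip().split(maxsplit=1)
    if len(parts) != 2 or parts[0].lower() != "activity":
        return None
    target = parts[1].strip()
    if target.lower() == "all":
        return "all"
    return [name.strip() for name in target.split(",") if name.strip()]


print(parse_activity_command("cmd102", "activity all"))
print(parse_activity_command("Himekat", "activity modA, modB"))
```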

Change moderation/activity_tracker.py to have on-demand date ranges

Currently, PMing the bot to request an activity tracking report uses an inflexible date range (first day of the current month until current day of the current month).

Please change it so that a date range can be specified in the activity report request. Something like:

activity all start_day {STARTDATE} end_day {ENDDATE}

Exact command and date format are flexible, but should be human-readable (e.g. 2019-05-28). Day is the most granular you need to get, no need to mess with times.
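A sketch of parsing the date-range portion of such a command, assuming the suggested start_day/end_day keywords and ISO dates (YYYY-MM-DD). The function name is hypothetical.

```python
from datetime import date


def parse_date_range(args):
    """Parse 'start_day YYYY-MM-DD end_day YYYY-MM-DD' into two dates."""
    tokens = args.split()
    # Pair each keyword with the value that follows it.
    options = dict(zip(tokens[::2], tokens[1::2]))
    try:
        start = date.fromisoformat(options["start_day"])
        end = date.fromisoformat(options["end_day"])
    except KeyError as missing:
        raise ValueError(f"missing option: {missing}") from None
    if end < start:
        raise ValueError("end_day is before start_day")
    return start, end


start, end = parse_date_range("start_day 2019-05-01 end_day 2019-05-28")
print(start, end)
```

A request with identical start and end dates yields a single-day range.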

Bot should not post a series reminder if the series tag in the title was "Final"

Obviously if someone posts a story with the tag [Final], they are done writing the series. In these cases, NoSleepAutoBot should not post the series reminder comment on the post.

NoSleepAutoBot should add a comment to a post if it's a series to remind about updates

Context

Users of NoSleep like series posts and want to be reminded to go back to see if there are updates to popular stories and series. We used to have a third-party-maintained bot to remind people to check for more entries to a series, but that is gone now. We'd like to use RemindMeBot's PMing functionality to allow people to set reminders for series posts in an easy and contained way.

Implementation

If a post passes all of NoSleepAutoBot's other checks, and NoSleepAutoBot has detected and flaired it as a series, NoSleepAutoBot should also post a stickied comment in the thread with the following text:

It looks like there may be more to this story. Click here to get a reminder to check back later.

"Click here" should be the following link:

https://www.reddit.com/message/compose?to=RemindMeBot&subject=Reminder&message=[STORY LINK HERE] %0A%0ANOTE%3A Don't forget to add the time options after the command below, such as "1 Day" or "48 Hours".%0A%0ARemindMe!

Where it says "STORY LINK HERE" (retain the brackets, as those are important to the RemindMeBot), you should substitute the story's link.

It would also be great if NoSleepAutoBot could look back over the most recent hour's worth of stories and look for anything that was tagged "Series" by a user (manually) and also post a comment in those threads.
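Building the link could look roughly like the sketch below, which percent-encodes the whole message with the standard library. The function name is hypothetical, and the message text approximates the template above rather than reproducing it byte for byte.

```python
from urllib.parse import quote

_COMPOSE_URL = (
    "https://www.reddit.com/message/compose"
    "?to=RemindMeBot&subject=Reminder&message="
)


def build_reminder_link(story_url):
    """Return a compose link pre-filled with a RemindMeBot request."""
    message = (
        f"[{story_url}]\n\n"  # brackets are kept, as RemindMeBot expects them
        "NOTE: Don't forget to add the time options after the command "
        'below, such as "1 Day" or "48 Hours".\n\n'
        "RemindMe!"
    )
    return _COMPOSE_URL + quote(message, safe="")


link = build_reminder_link("https://example.com/story")
print(link)
```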

Support tags of the form (update 1), {update 10}, etc.

A post with the tag (update 2) was removed by the bot (and later re-approved by moderators). Support this tag form as well.

Minor series reminder text change

Please change series message in the post_series_reminder method to read:

series_message = "It looks like there may be more to this story. Click [here]({}) to get a reminder to check back later. Comment replies will be ignored by me."

Increase speed of submission polling/checking to beat eager moderation

Since the bot's inception, it's been designed to do periodic polling via an interval (right now, 2 minutes in production) but this does have some major shortcomings:

It does periodic submission polling by using the subreddit/search endpoint, and Reddit's search indexing can be quite slow (ranging anywhere from immediate to nearly 10 minutes from observation)
Similarly, it uses subreddit/search to find a user's previous posts, and this search has the same indexing speed issues as submissions
Moderators who are very eager in moderation and immediately see posts will then do things ahead of the bot, which can lead to confusion when a post is perhaps approved but then immediately removed by the bot because it's invalid for whatever reason.

The following things would need to be done for this:

Add functionality to use the new() listing generator for subreddits
Cache subredditor information so that the bot can quickly see when a subredditor last posted

Invalid word allowed in NoSleep post title

This post allowed the word "expanded" to be listed in the parentheses of a title, which should not be allowed:

https://www.reddit.com/r/nosleep/comments/vysa15/when_the_world_fell_away_expanded/

See https://github.com/sofaworks/nosleepautobot/blob/master/bot.py#L202

Bot should remove any post with "NSFW" in the title

NoSleep does not allow "NSFW" to be part of a post's title. If "NSFW" appears in the title (either in brackets or outside of brackets), the bot should remove the post and comment with the following text:

[[YOUR POST]] has been removed from /r/NoSleep because it appears to include "NSFW" in the title. NoSleep does not allow "NSFW" to be stated in the titles of stories. Stories can be marked "NSFW" after they are posted by clicking "NSFW" or "Add Trigger Warning" (depending on your UI) at the bottom of the post.

Because reddit does not allow you to edit titles, you will have to repost your story with a corrected title.

I am a bot, blah, blah, blah...
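The title check itself could be as small as the sketch below. Matching case-insensitively and on word boundaries are assumptions about the desired behavior, and the function name is hypothetical.

```python
import re

# "NSFW" as a standalone word, inside or outside brackets.
_NSFW = re.compile(r"\bnsfw\b", re.IGNORECASE)


def title_contains_nsfw(title):
    """Return True if the title mentions NSFW anywhere."""
    return _NSFW.search(title) is not None


for title in ["[NSFW] My story", "My story (nsfw)", "A safe title"]:
    print(title, title_contains_nsfw(title))
```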

