
nosleepautobot's People

Contributors

dependabot[bot], leikahing, sofaassassin, watchful1

nosleepautobot's Issues

Post a reply for bad tags rather than send a PM

Don't clutter up the modmail with a bunch of tag removal messages. Posts with bad titles are already removed anyway, so just reply on the post, as the bot already does for posts that violate the time limit rule.

Long paragraph checker edge cases

The recent paragraph checker feature implemented in #6 does not account for the following edge case.

When a block of text uses line breaks like this, it is a problem:

I am text.\n \nI am more text.

The long paragraph checker splits on consecutive \n characters, so if a user, for whatever reason, puts spaces between their newlines, the checker no longer sees a paragraph break.
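One way to make the split tolerant of this, sketched below under the assumption that the checker receives the raw selftext as a string, is to treat a newline followed by one or more whitespace-only lines as a paragraph break. The name `split_paragraphs` is illustrative, not taken from the bot's source.

```python
import re

# A paragraph break is a newline followed by one or more lines that contain
# nothing but whitespace (spaces, tabs, non-breaking spaces).
PARAGRAPH_BREAK = re.compile(r"\n(?:[^\S\n]*\n)+")


def split_paragraphs(text: str) -> list[str]:
    """Split selftext into paragraphs, treating whitespace-only lines as blank."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings
    return [p for p in PARAGRAPH_BREAK.split(text) if p.strip()]
```

Leading indentation on the following paragraph is preserved, so a separate four-space code-block check can still run on the results.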

Enhancements to /moderation/activity_tracker.py

We need a post count-specific report that can be generated on request which includes total number of posts made to /r/nosleep, number of posts approved by any/all mods, and number of posts removed by any/all mods within a specific time period.

Sample Input (this is just an example and can be adjusted as makes sense):

posts --startdate 2019-08-19 --enddate 2019-08-21

Note: if dates are the same, only that one day is being requested. Or you can make up a different way to only request a single day of data, I don't care. But single day requests should be allowed.

Generates a table like:

Date       | Total Posts to NoSleep | Approved Posts | Removed Posts
2019-08-19 | 50                     | 25             | 25
2019-08-20 | 70                     | 40             | 30
2019-08-21 | 60                     | 30             | 30
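A minimal sketch of the report, assuming the posts for the window have already been collected (for example from the mod log) into simple records holding a creation timestamp and a status. `PostRecord`, `parse_posts_command` and `daily_post_report` are hypothetical names, and days are bucketed in UTC.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone


@dataclass
class PostRecord:
    # Hypothetical record shape: one entry per submission in the window.
    created_utc: float
    status: str  # assumed values: "approved", "removed", or anything else


def parse_posts_command(text):
    """Parse 'posts --startdate YYYY-MM-DD [--enddate YYYY-MM-DD]'."""
    tokens = text.split()
    if not tokens or tokens[0] != "posts":
        raise ValueError("not a posts command")
    opts = dict(zip(tokens[1::2], tokens[2::2]))
    if "--startdate" not in opts:
        raise ValueError("--startdate is required")
    start = date.fromisoformat(opts["--startdate"])
    # A missing --enddate (or one equal to --startdate) requests a single day.
    end = date.fromisoformat(opts.get("--enddate", opts["--startdate"]))
    if end < start:
        raise ValueError("--enddate is before --startdate")
    return start, end


def daily_post_report(records, start, end):
    """Build a Reddit-markdown table with one row per UTC day, start..end inclusive."""
    totals, approved, removed = Counter(), Counter(), Counter()
    for record in records:
        day = datetime.fromtimestamp(record.created_utc, tz=timezone.utc).date()
        if not start <= day <= end:
            continue
        totals[day] += 1
        if record.status == "approved":
            approved[day] += 1
        elif record.status == "removed":
            removed[day] += 1

    rows = [
        "Date | Total Posts to NoSleep | Approved Posts | Removed Posts",
        "---|---|---|---",
    ]
    day = start
    while day <= end:
        rows.append(f"{day.isoformat()} | {totals[day]} | {approved[day]} | {removed[day]}")
        day += timedelta(days=1)
    return "\n".join(rows)
```

Days with no posts still get a row of zeros, so a single-day request always returns exactly one data row.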

Bot is flaky on enforcing 24-hour posting rule

User /u/CigarettesAndSongs triggered a bug in the bot: they posted the same story (same title and text) twice in a 45-minute period, and the bot did not remove the second, duplicate post.

The stories were:

The bot should always remove stories if the user has an active (non-removed) story on /r/NoSleep within the last 24 hours.
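A sketch of the intended check, assuming the author's recent posts are already available as objects with an id, a `created_utc` timestamp and a removal flag. The `Post` class and `violates_time_limit` are hypothetical; in particular, how removal is detected on a real praw submission is left open.

```python
from dataclasses import dataclass

POST_TIME_LIMIT = 24 * 60 * 60  # the 24-hour rule, in seconds


@dataclass
class Post:
    # Hypothetical minimal view of a submission.
    id: str
    created_utc: float
    removed: bool = False


def violates_time_limit(new_post, author_posts, limit=POST_TIME_LIMIT):
    """Return the earliest active post that blocks new_post, or None."""
    for post in sorted(author_posts, key=lambda p: p.created_utc):
        if post.id == new_post.id or post.removed:
            continue  # ignore the post itself and anything already removed
        age_difference = new_post.created_utc - post.created_utc
        # Only earlier posts inside the window count; a duplicate repost
        # 45 minutes later gives 2700 seconds, well under the limit.
        if 0 <= age_difference < limit:
            return post
    return None
```

The check is only as reliable as the lookup that produces `author_posts`: if that lookup has not yet indexed the first post, the duplicate will still slip through.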

Two spaces at the end of a line followed by a single return should be considered valid formatting for a post

We recently ran into a story on /r/NoSleep that utilized the following format:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc consequat consequat elit. Donec ut dignissim tellus.
Integer vehicula, sem ut condimentum porta, lacus mauris venenatis elit, eget cursus lectus ante sed leo. Mauris at leo porta, semper diam eget, viverra nulla. Praesent sit amet felis nisl. Etiam quis luctus diam. Ut tempor, quam a pellentesque dictum, mauris metus blandit nibh, et aliquam enim quam at ipsum. Pellentesque nec tellus suscipit nisi gravida fringilla id at augue. Nam auctor neque sed luctus aliquam.
Donec quis arcu at ex consequat accumsan. Proin eu fermentum tortor. Ut malesuada lorem sed tortor sagittis, eu pharetra augue egestas. Nunc aliquam fermentum porta.
Aliquam volutpat erat nec turpis cursus rhoncus. Quisque tempor sapien a ipsum hendrerit, non tristique velit bibendum. Cras mattis luctus nulla, sed ultrices velit. Praesent a magna nec tellus condimentum aliquam.
Nulla nibh turpis, pharetra quis semper sed, posuere vel quam. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Duis ultricies est in metus tincidunt euismod.

This format is: lines or paragraphs which have two spaces at the end of them, followed by a single return. Since the bot doesn't recognize this as a valid paragraph break style, if the story is more than 350 words, the bot will reject it for formatting issues.

Although this formatting isn't a common case and doesn't conform to the "double-return to create new paragraphs" format, the mods have deemed it acceptable. The bot should not reject stories formatted like this for formatting issues.
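Markdown treats two or more trailing spaces followed by a single newline as a hard line break, so one option is to split on those as well as on blank lines before counting words. This is a sketch under that assumption; `readable_chunks` and `longest_chunk_words` are hypothetical helpers, and the 350-word limit would be compared against the value the latter returns.

```python
import re

# Blank lines, including lines holding only whitespace, separate paragraphs.
BLANK_LINES = re.compile(r"\n(?:[^\S\n]*\n)+")
# Two or more trailing spaces before a single newline are a Markdown hard break.
HARD_BREAK = re.compile(r" {2,}\n")


def readable_chunks(text: str) -> list[str]:
    """Split text on blank lines and on two-space hard line breaks."""
    chunks = []
    for paragraph in BLANK_LINES.split(text.replace("\r\n", "\n")):
        chunks.extend(c for c in HARD_BREAK.split(paragraph) if c.strip())
    return chunks


def longest_chunk_words(text: str) -> int:
    """Word count of the longest chunk, for comparison against the 350 limit."""
    return max((len(c.split()) for c in readable_chunks(text)), default=0)
```

A single newline without trailing spaces is still treated as part of the same paragraph, matching how Reddit renders it.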

Move off Heroku

Heroku has finally decided to end their free tier, so the bot will need to be migrated to a different service since it certainly doesn't need the resources of their $7 tier.

Add 'Days Active' concept to mod activity tools

For both the weekly report sent to mods (/tools/track_mod_activity.py) and the on-demand activity reports (/moderation/activity_tracker.py), can we additionally have 'days active' reported?

'Days Active' is the number of days within the requested time window that the moderator did any sort of activity (even so much as one removal/approval of a post or comment).

Now the weekly PM to mods should read:

Post Approvals:
Post Rejections:
Comment Approvals:
Comment Rejections:
Days Active:

And the on-demand activity report should include an additional column for 'Days Active'.
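A sketch of the weekly summary with Days Active added, assuming each moderator's actions arrive as `(action_name, created_utc)` pairs (for example built from praw `ModAction.action` and `ModAction.created_utc`). The mapping from Reddit mod-log action names to report labels is an assumption, and `weekly_summary` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timezone

# Assumed Reddit mod-log action names mapped to the report labels in this issue.
REPORT_LABELS = {
    "approvelink": "Post Approvals",
    "removelink": "Post Rejections",
    "approvecomment": "Comment Approvals",
    "removecomment": "Comment Rejections",
}


def weekly_summary(actions):
    """Summarise one moderator's (action_name, created_utc) pairs for a report window."""
    actions = list(actions)
    counts = Counter(name for name, _ in actions)
    # Every supplied action counts toward Days Active; days are UTC calendar days.
    active_days = {
        datetime.fromtimestamp(created, tz=timezone.utc).date() for _, created in actions
    }
    lines = [f"{label}: {counts[name]}" for name, label in REPORT_LABELS.items()]
    lines.append(f"Days Active: {len(active_days)}")
    # Blank lines keep each stat on its own line in Reddit markdown.
    return "\n\n".join(lines)
```

For the on-demand report, the same `len(active_days)` value can be emitted as an extra column per moderator.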

Bot removes stories when there are 4 or more spaces on a blank line

The bot incorrectly identified this story as a formatting issue and removed it for "4 or more spaces at the beginning of a line".

On closer inspection (i.e. checking the source of the story), you can see that many of the blank lines between paragraphs have a big block of blank spaces/tabs on them.

Example clip from the story:

I did find something just as weird, though. As I was heading back toward the shop I came in through, I saw an open door that I hadn't seen before. Behind it was a staircase leading down. I figured it must just lead to the boiler or something, but since I was here, why not explore the whole place?

The stairs went a lot farther down than I expected. I'd probably descended five floors or more before I finally reached the bottom.
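A sketch of a line check that ignores whitespace-only lines, which Markdown renders as ordinary blank lines rather than code. It assumes the check runs line by line over the raw selftext; `has_code_block_line` is a hypothetical name.

```python
def has_code_block_line(text: str) -> bool:
    """True if any non-blank line starts with four or more spaces."""
    for line in text.splitlines():
        # Whitespace-only lines render as blank lines, not code blocks.
        if not line.strip():
            continue
        if line.startswith("    "):
            return True
    return False
```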

Lucene search syntax doesn't work for users with hyphenated names during recent submission search

Searches for submissions by users with names like -Bat-Country or Strawberry-Sunrise return incorrect results because Reddit's Lucene index isn't searchable on - characters (probably because it uses Lucene's StandardAnalyzer).

Searching for submissions by users should be done using cloudsearch syntax.

Repro case example:

#!/usr/bin/env python
import praw

reddit = praw.Reddit()
subreddit = reddit.subreddit('nosleep')
submissions = list(subreddit.search("author:-Bat-Country-", time_filter='day', syntax='lucene', sort='new'))

The above will return []

Running the search with the following works:

submissions = list(subreddit.search("author:-Bat-Country-", time_filter='day', syntax='cloudsearch', sort='new'))
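If every author lookup should use cloudsearch, the search can be wrapped once so no call site forgets the `syntax` argument. This sketch assumes a praw `Subreddit`, whose `search` method accepts `sort`, `syntax` and `time_filter`; `recent_posts_by` is a hypothetical helper, and any object with a compatible `search` method works, which makes it easy to test with a fake.

```python
def recent_posts_by(subreddit, author, time_filter="day"):
    """Return an author's recent submissions using cloudsearch syntax.

    Per this issue, cloudsearch matches names containing '-' that the
    lucene syntax misses.
    """
    query = f"author:{author}"
    return list(
        subreddit.search(query, sort="new", syntax="cloudsearch", time_filter=time_filter)
    )
```

Usage: `recent_posts_by(reddit.subreddit('nosleep'), '-Bat-Country-')`.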

Set TTLs on indexes

Because the bot uses the Walrus Redis wrapper, Walrus also generates several indexes for the AutoBotSubmission model objects saved to the database. These indexes are stored in Redis as SET keys but are never removed.
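A maintenance sketch, assuming a redis-py client (whose `scan_iter`, `ttl` and `expire` methods are used here). The key pattern is hypothetical and must be adjusted to whatever names Walrus actually gives its index sets; `expire_stale_indexes` is likewise an illustrative name.

```python
def expire_stale_indexes(client, pattern, ttl_seconds):
    """Give a TTL to every key matching `pattern` that currently has no expiry.

    `client` is assumed to be a redis-py client (anything providing
    scan_iter/ttl/expire works); `pattern` is a hypothetical glob.
    """
    updated = 0
    # SCAN iterates incrementally instead of blocking Redis the way KEYS would.
    for key in client.scan_iter(match=pattern):
        # Redis TTL returns -1 for a key that exists but never expires.
        if client.ttl(key) == -1:
            client.expire(key, ttl_seconds)
            updated += 1
    return updated
```

One design caveat: expiring a SET drops every member at once, including recently added ones, so the TTL should be at least as long as the model objects it indexes live, or be refreshed whenever an object is saved.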

Add users to approved users list in /moderation/activity_tracker.py

NoSleep now has moderator team leads and we’d like them to be able to use the Activity Tracker portion of the bot’s functionality. Please add the following users as approved users so that they can use bot PMs to track moderator activity:

  • /u/flard
  • /u/onyxoctopus
  • /u/desidarling

Rewritten bot does not properly enforce post time limit

This is a tracking issue; the bug has already been fixed in source. The problem was that cache_activity_maybe stored the submission's name (its fullname, such as t3_base36id) rather than its id, and is_post_deleted then returned True, because the praw Submission model must be constructed from the bare base36 id, not from a fullname like t3_base36id.

last_post_id=submission.name,
last_post_time=submission.created_utc
)
ttl = tl - int(diff)
logger.info("Caching activity", info=activity, ttl=ttl)
self.activity_db.persist(activity.author, activity, ttl=ttl)
else:
logger.info("Not caching activity for post outside timelimit",
author=submission.author.name,
subreddit=submission.subreddit.display_name,
id=submission.name)
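praw submissions expose both the bare base36 `id` and the fullname `name`, so storing `submission.id` avoids the problem entirely. For entries already cached with a fullname, a small normalising helper like the hypothetical one below could be applied before building a `Submission`.

```python
def fullname_to_id(value: str) -> str:
    """Return the bare base36 id from a Reddit fullname such as 't3_abc123'.

    Values without a type prefix are assumed to be ids already and are
    returned unchanged (base36 ids never contain an underscore).
    """
    prefix, sep, rest = value.partition("_")
    # Fullnames look like "t<digit>_<base36 id>"; t3 is the submission prefix.
    if sep and len(prefix) == 2 and prefix[0] == "t" and prefix[1].isdigit():
        return rest
    return value
```

Usage: `reddit.submission(id=fullname_to_id(cached_id))`, where `cached_id` stands for whatever value was read from the activity cache.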

Disable 24-Hour Rule For "Beyond Belief" Event

Flip the 24-Hour Rule flag in Heroku sometime on the evening of March 1st in preparation for "Beyond Belief" event which will run from March 2nd until March 5th. The subreddit will be closed for entries prior to that, so the exact time doesn't matter as long as it's done before 11:59PM.

Search for "Redditor's Most Recent Posts" being performed twice

Issue has existed since the original write of the bot, and was carried over into the rewrite.

When processing new posts, a search is done on the author's posts to the subreddit, with a time_filter=day to see if they have posted within the last day already. This search is being done twice - once in a method to check if they have posted already, and once again in the method that generates the comment that tells them they're not allowed to post in the time limit.

# TODO this is duplicated in process_time_limit_message
most_recent = min(
    self.reddit.get_redditor_posts(post.author),
    key=attrgetter("created_utc"),
    default=None
)

# TODO this is duplicated in reject_submission_by_timelimit
most_recent = min(
    self.reddit.get_redditor_posts(post.author),
    key=attrgetter("created_utc"),
    default=None
)
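One way to remove the duplicate search without restructuring both methods is to memoise the lookup for the duration of a single polling run, so both call sites ask the same object. `RedditorPostCache` is a hypothetical helper; it assumes `get_redditor_posts` is the bot's existing wrapper method and keeps the same `min(...)` expression as the two blocks above.

```python
from operator import attrgetter


class RedditorPostCache:
    """Memoise the author search for the duration of one polling run.

    Hypothetical helper: create a fresh instance per run so results never go
    stale, and let both call sites ask it instead of searching directly.
    """

    def __init__(self, reddit):
        self._reddit = reddit
        self._most_recent = {}

    def most_recent(self, author):
        key = str(author)  # praw Redditor objects stringify to the username
        if key not in self._most_recent:
            posts = self._reddit.get_redditor_posts(author)
            # Same expression as the two duplicated blocks above.
            self._most_recent[key] = min(posts, key=attrgetter("created_utc"), default=None)
        return self._most_recent[key]
```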

Codeblock checking corner cases

Right now, the bot decides whether a post contains code blocks by checking whether a paragraph starts with 4 spaces:

if some_string.startswith('    '):

However, paragraphs that begin with a hard tab (\t) also render as code blocks.

Likewise, leading spaces followed by a tab can trigger a code block.
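Markdown measures indentation in columns, with a tab advancing to the next multiple of four, so a check on the expanded indentation width catches tabs and space/tab mixtures as well as plain spaces. This is a sketch under that assumption; `leading_columns` and `renders_as_code` are hypothetical names.

```python
def leading_columns(line: str) -> int:
    """Width of a line's leading whitespace, with tab stops every 4 columns."""
    expanded = line.expandtabs(4)
    return len(expanded) - len(expanded.lstrip(" "))


def renders_as_code(line: str) -> bool:
    # Blank (whitespace-only) lines never start a code block.
    return bool(line.strip()) and leading_columns(line) >= 4
```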

Bot is not posting series reminder comment if post is tagged "NSFW"

It looks like the bot isn't posting the "series reminder" comment on series posts if they've been marked "NSFW" through reddit's interface, even if they are properly flaired as "Series" or have allowable series tags in their title.

Examples:

https://www.reddit.com/r/nosleep/comments/bh3jyv/living_in_an_rv_park_pt_2/
https://www.reddit.com/r/nosleep/comments/bgx5ut/hell_is_other_rabbits_father/
https://www.reddit.com/r/nosleep/comments/bgd2nb/weird_shit_ive_seen_in_the_army_fob_scorpion/

Formatting for AutoBot removals/re-approval requests sent to modmail is ugly

Not sure when this happened or why, but the text formatting of the removal messages (the ones users send to modmail to request re-approval) is sometimes ugly, as shown in the screenshot attached to the original issue (not preserved here).

I'm not sure if this is something that can be fixed on our side, or if it's something that has to do specifically with the client a user is submitting the message through...

Provide option to filter out 'new()' posts that are older

The bot only caches post data for post_time_limit * 2. Depending on how many posts are reaching the subreddit, new() can easily return posts older than that in the initial request, so the bot may end up processing the same posts twice.
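A sketch of the filter, assuming `post_time_limit` is in seconds and the submissions are praw-style objects exposing `created_utc`. `fresh_submissions` is a hypothetical name.

```python
import time


def fresh_submissions(submissions, post_time_limit, now=None):
    """Keep only submissions young enough to still be covered by the cache.

    Assumes the cache window is post_time_limit * 2 seconds.
    """
    now = time.time() if now is None else now
    cutoff = now - 2 * post_time_limit
    # Older posts may have aged out of the cache, so skip them rather than
    # risk removing or commenting on them a second time.
    return [s for s in submissions if s.created_utc >= cutoff]
```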

Make the bot remove "unreadable" wall-of-text posts

One of /r/NoSleep's rules is that all story posts must be formatted in a readable fashion. One of the biggest issues we have is people posting "walls of text" -- entire stories without paragraph breaks, like this (and worse -- that's actually a really mild example).

We'd like to have Noxbot automatically remove posts that have any paragraphs of text with more than 350 words in them.

When a post is removed, a PM should be sent (from the subreddit) to the user with the following text:

[[YOUR POST]] has been temporarily removed from /r/NoSleep because it appears to have formatting issues which make it unreadable. In this case, you have one or more paragraphs with more than 350 words. Please break up your story into smaller paragraphs. You can create paragraphs by pressing "Enter" twice at the end of a line.

Once you have fixed the formatting, please respond to this PM for re-approval. The re-approval process is manual, so send a single request only. Multiple requests do not mean faster approval; in fact they will clog the modqueue and result in re-approvals taking even more time.

Please test this change before implementation and alert /u/Himekat so she can watch the bot's behavior on /r/NoSleep.

PS. If it would make it easier to do this by character count instead of word count, I can provide a character count that would be appropriate.

Make bot remove "unreadable" code block posts

A common formatting issue we see in /r/NoSleep is for people to have a tab or series of spaces at the beginning of their paragraphs (like you might in a traditional word processor or book). Unfortunately, reddit renders four spaces at the beginning of a line as a block of code, which almost 100% of the time makes the story unreadable.

Please have the bot check the beginning of every paragraph/line in a story. If any paragraph/line starts with four or more spaces, the story should be removed (as it is more than likely unreadable).

When a post is removed, a PM should be sent (from the subreddit) to the user with the following text:

[[YOUR POST]] has been temporarily removed from /r/NoSleep because it appears to have formatting issues which make it unreadable. In this case, you have one or more paragraphs or lines which begin with four or more spaces.

On reddit, lines beginning with four or more spaces are treated as blocks of code and make your story unreadable. Please remove spaces at the beginning of paragraphs/lines. You can create paragraphs by pressing "Enter" twice at the end of a line if you haven't already done so.

Once you have fixed the formatting, please respond to this PM for re-approval. The re-approval process is manual, so send a single request only. Multiple requests do not mean faster approval; in fact they will clog the modqueue and result in re-approvals taking even more time.

Please test this change before implementation and alert /u/Himekat so she can watch the bot's behavior on /r/NoSleep.

Change formatting removal from PM to comment on post

When the bot removes a post for formatting issues, instead of sending a PM about it (from the subreddit), please have the bot comment on the story directly. In addition to the current message text, please add a link at the bottom that says:

Once you have fixed your formatting issues, please [[click here]] to request re-approval. The re-approval process is manual, so send a single request only. Multiple requests do not mean faster approval; in fact they will clog the modqueue and result in re-approvals taking even more time.

The "click here" part should either send a PM to the subreddit with the below message/information, or open a message window pre-populated with the below information so that the user can manually send it:

[[MY POST]] to /r/NoSleep was flagged for formatting issues. I have fixed those issues and am now requesting re-approval.

Note to moderation team: if this story is eligible for re-approval, remember to remove the bot's comment from it.
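Reddit's compose page can be opened with the recipient, subject and body pre-filled through the `to`, `subject` and `message` query parameters, which covers the second option. A sketch, assuming that URL form; the subject line and both function names are hypothetical, and the body reuses the message text above with the post link substituted for [[MY POST]].

```python
from urllib.parse import quote, urlencode

COMPOSE_URL = "https://www.reddit.com/message/compose"


def reapproval_link(subreddit: str, post_url: str) -> str:
    """Build a link that opens a pre-filled modmail message to the subreddit."""
    params = {
        "to": f"/r/{subreddit}",
        "subject": "Re-approval request",  # hypothetical subject line
        "message": (
            f"[My post]({post_url}) to /r/{subreddit} was flagged for formatting "
            "issues. I have fixed those issues and am now requesting re-approval."
        ),
    }
    # urlencode escapes "(" and ")", so the URL cannot close the Markdown link
    # early; quote_via=quote writes spaces as %20 instead of "+".
    return f"{COMPOSE_URL}?{urlencode(params, quote_via=quote)}"


def removal_footer(link: str) -> str:
    """Footer for the bot's formatting-removal comment."""
    return (
        f"Once you have fixed your formatting issues, please [click here]({link}) "
        "to request re-approval. The re-approval process is manual, so send a single "
        "request only. Multiple requests do not mean faster approval; in fact they "
        "will clog the modqueue and result in re-approvals taking even more time."
    )
```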

Bot doesn't send series PM after a non-tagged, processed post gets a "Series" flair

Note

This is a tracking issue because this has been fixed with 0b7094f

Reproduction Steps

  1. Make a post meeting the following criteria:
    • Title does not contain tags (because all tags identify a story as being a 'series')
    • Do not flair the post as Series (or whatever series flair it is)
  2. Run bot to analyze the post and have it cache the analysis
  3. Flair the post Series
  4. Re-run bot on post
  5. Bot identifies the post as having been tagged Series after the fact
  6. Bot doesn't send the user the "Series PM"

Description

This is a bug that has probably existed since the creation of the bot, but I noticed that it happened while testing the Major Overhaul PR #107.

When an already-processed post was rechecked during subsequent runs to figure out if it had been flaired as a Series, the code did this:

if obj:
    # Do processing on previous submissions to see if we need to add the series message
    # if we saw this before and it's not a series but then later flaired as one, send
    # the message
    if not obj.is_series and (p.link_flair_text == 'Series'):
        obj.is_series = True
        self.post_series_reminder(p)
        obj.save()

So it would post the "Oh, there may be more to this story" comment, but it neither sent the series PM nor recorded that the PM had been sent when it updated the object in cache.
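A sketch of the recheck with the missing step added. `send_series_pm` and the `sent_series_pm` flag are hypothetical names standing in for however the bot actually sends and records the series PM; this is not necessarily what 0b7094f does.

```python
def recheck_series(obj, submission, bot):
    """Handle a cached post that was flaired 'Series' after it was first processed.

    `obj` is the cached record, `submission` the praw submission and `bot` whatever
    owns post_series_reminder; send_series_pm and sent_series_pm are hypothetical.
    """
    if obj.is_series or submission.link_flair_text != "Series":
        return False
    obj.is_series = True
    bot.post_series_reminder(submission)
    # The step the original code skipped: send the series PM and remember it,
    # so a later run neither skips nor repeats it.
    if not obj.sent_series_pm:
        bot.send_series_pm(submission)
        obj.sent_series_pm = True
    obj.save()
    return True
```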

/tools/track_mod_activity.py enhancement

Our head mod, cmd102, would like the bot to be able to help her monitor moderator activity.

She should be able to send a PM to /u/NoSleepAutoBot with a command and receive back moderator activity for the entire month (from the first day of the current month to the current day, like the current mod activity script).

If she sends the command activity all, she should get back every NoSleep moderator's post approvals, post removals, comment approvals, and comment removals. The format should be readable, but I wouldn't spend too much time on it; bonus points if it were a well-laid-out grid. Otherwise, just a big long list is fine.

If she sends the command activity {USERNAME}, she should get back the activity for that one moderator only. Bonus points if you support comma-separated lists for the purposes of submitting multiple usernames, but not entirely necessary.

Additional notes:

  • Right now, only /u/cmd102 and /u/Himekat (for testing purposes) should be able to receive a response to these commands. The bot should ignore requests from anyone else. But it might make sense to make it an easily-editable whitelist in case we need to add more people in the future.
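A sketch of the command parsing with an editable whitelist, assuming the bot has the PM's author and body as plain values. `APPROVED_USERS` and `parse_activity_command` are hypothetical names, and usernames are compared case-insensitively.

```python
# Easily editable whitelist of users allowed to request reports (stored lowercase).
APPROVED_USERS = {"cmd102", "himekat"}


def parse_activity_command(author, body):
    """Return None to ignore a message, "all", or a list of moderator usernames."""
    if str(author).lower() not in APPROVED_USERS:
        return None  # requests from anyone else are silently ignored
    parts = body.strip().split(maxsplit=1)
    if len(parts) != 2 or parts[0].lower() != "activity":
        return None
    target = parts[1].strip()
    if target.lower() == "all":
        return "all"
    # Bonus: accept comma-separated lists such as "activity mod_a, /u/mod_b".
    names = []
    for raw in target.split(","):
        name = raw.strip().removeprefix("/u/").removeprefix("u/")
        if name:
            names.append(name)
    return names
```

Granting access to someone new is then a one-line change to `APPROVED_USERS`.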

Change moderation/activity_tracker.py to have on-demand date ranges

Currently, PMing the bot to request an activity tracking report uses an inflexible date range (first day of the current month until current day of the current month).

Please change it so that a date range can be specified in the activity report request. Something like:

activity all start_day {STARTDATE} end_day {ENDDATE}

Exact command and date format are flexible, but should be human-readable (e.g. 2019-05-28). Day is the most granular you need to get, no need to mess with times.
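A sketch of one possible syntax, assuming ISO dates and keeping the current month-to-date range as the default when no dates are given. `parse_activity_range` is a hypothetical name.

```python
import re
from datetime import date

COMMAND = re.compile(
    r"^activity\s+(?P<who>\S+)"
    r"(?:\s+start_day\s+(?P<start>\d{4}-\d{2}-\d{2})"
    r"\s+end_day\s+(?P<end>\d{4}-\d{2}-\d{2}))?\s*$",
    re.IGNORECASE,
)


def parse_activity_range(body, today=None):
    """Parse 'activity <who> [start_day YYYY-MM-DD end_day YYYY-MM-DD]'."""
    match = COMMAND.match(body.strip())
    if not match:
        return None
    today = today or date.today()
    if match["start"]:
        start = date.fromisoformat(match["start"])
        end = date.fromisoformat(match["end"])
    else:
        # Default: first day of the current month through today (current behaviour).
        start, end = today.replace(day=1), today
    if end < start:
        raise ValueError("end_day is before start_day")
    return match["who"], start, end
```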

NoSleepAutoBot should add a comment to a post if it's a series to remind about updates

Context

Users of NoSleep like series posts and want to be reminded to go back to see if there are updates to popular stories and series. We used to have a third-party-maintained bot to remind people to check for more entries to a series, but that is gone now. We'd like to use RemindMeBot's PMing functionality to allow people to set reminders for series posts in an easy and contained way.

Implementation

If a post passes all of NoSleepAutoBot's other checks, and NoSleepAutoBot has detected and flaired it as a series, NoSleepAutoBot should also post a stickied comment in the thread with the following text:

It looks like there may be more to this story. Click here to get a reminder to check back later.

"Click here" should be the following link:

Where it says "STORY LINK HERE" (retain the brackets, as those are important to the RemindMeBot), you should substitute the story's link.

It would also be great if NoSleepAutoBot could look back over the most recent hour's worth of stories and look for anything that was tagged "Series" by a user (manually) and also post a comment in those threads.

Minor series reminder text change

Please change the series message in the post_series_reminder method to read:

series_message = "It looks like there may be more to this story. Click [here]({}) to get a reminder to check back later. Comment replies will be ignored by me."
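A sketch of how the new message could be posted and stickied, assuming a praw `Submission`: `reply` returns the new `Comment`, and `comment.mod.distinguish(how="yes", sticky=True)` pins it as a moderator comment. The RemindMeBot link is passed in as `reminder_url` because its exact form isn't reproduced on this page.

```python
SERIES_MESSAGE = (
    "It looks like there may be more to this story. Click [here]({}) to get a "
    "reminder to check back later. Comment replies will be ignored by me."
)


def post_series_reminder(submission, reminder_url):
    """Reply to a series post with the reminder text and sticky the comment."""
    comment = submission.reply(SERIES_MESSAGE.format(reminder_url))
    # Distinguishing with sticky=True pins the comment to the top of the thread.
    comment.mod.distinguish(how="yes", sticky=True)
    return comment
```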

Increase speed of submission polling/checking to beat eager moderation

Since the bot's inception, it's been designed to do periodic polling via an interval (right now, 2 minutes in production) but this does have some major shortcomings:

  1. It does periodic submission polling by using the subreddit/search endpoint, and Reddit's search indexing can be quite slow (ranging anywhere from immediate to nearly 10 minutes from observation)
  2. Similarly, it uses subreddit/search to find a user's previous posts, and this search has the same indexing speed issues as submissions
  3. Eager moderators who see posts immediately can act on them before the bot does, which causes confusion when, for example, a post is approved and then immediately removed by the bot because it is invalid for some reason.

The following things would need to be done for this:

  1. Add functionality to use the new() listing generator for subreddits
  2. Cache subredditor information so that the bot can quickly see when a subredditor last posted
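A sketch of both items, assuming batches come from praw's `Subreddit.new()` listing and that an in-memory dict is acceptable for illustration. `AuthorActivityCache` and `process_new_batch` are hypothetical names; production code would more likely keep this data in the bot's existing Redis store.

```python
import time


class AuthorActivityCache:
    """Remember each author's latest post so new() batches need no search."""

    def __init__(self, ttl_seconds=24 * 60 * 60):
        self.ttl = ttl_seconds
        self._last_post = {}  # author name -> (post id, created_utc)

    def record(self, author, post_id, created_utc):
        previous = self._last_post.get(author)
        if previous is None or created_utc > previous[1]:
            self._last_post[author] = (post_id, created_utc)

    def last_post(self, author, now=None):
        now = time.time() if now is None else now
        entry = self._last_post.get(author)
        if entry is not None and now - entry[1] >= self.ttl:
            del self._last_post[author]  # outside the window: forget it
            entry = None
        return entry


def process_new_batch(submissions, cache, seen_ids, now=None):
    """Return (submission, author's previous post or None) for unseen posts, oldest first."""
    results = []
    for submission in sorted(submissions, key=lambda s: s.created_utc):
        if submission.id in seen_ids:
            continue  # new() keeps returning posts that were already handled
        seen_ids.add(submission.id)
        author = str(submission.author)
        previous = cache.last_post(author, now=now)
        cache.record(author, submission.id, submission.created_utc)
        results.append((submission, previous))
    return results
```

Processing oldest first means a second post in the same batch already sees the first one in the cache, without waiting for search indexing.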

Bot should remove any post with "NSFW" in the title

NoSleep does not allow "NSFW" to be part of a post's title. If "NSFW" appears in the title (either in brackets or outside of brackets), the bot should remove the post and comment with the following text:

[[YOUR POST]] has been removed from /r/NoSleep because it appears to include "NSFW" in the title. NoSleep does not allow "NSFW" to be stated in the titles of stories. Stories can be marked "NSFW" after they are posted by clicking "NSFW" or "Add Trigger Warning" (depending on your UI) at the bottom of the post.

Because reddit does not allow you to edit titles, you will have to repost your story with a corrected title.

I am a bot, blah, blah, blah...
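A sketch of the title check, assuming a case-insensitive whole-word match is what the mods want. `title_has_nsfw` is a hypothetical name.

```python
import re

# "NSFW" as a whole word in any case: [NSFW], (nsfw), "NSFW:" or bare NSFW.
NSFW_IN_TITLE = re.compile(r"\bnsfw\b", re.IGNORECASE)


def title_has_nsfw(title: str) -> bool:
    """True if the title mentions NSFW, inside or outside brackets."""
    return NSFW_IN_TITLE.search(title) is not None
```

The word boundaries mean spellings such as "N.S.F.W." or words that merely contain the letters are not caught; whether those should also trigger a removal is a moderation decision.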
