samliew / se-electionbot



A Stack Exchange/Stack Overflow election chatbot written in Node.js to handle FAQs in an election chat room

License: MIT License

JavaScript 94.04% HTML 4.20% Procfile 0.01% TypeScript 0.54% Shell 0.11% SCSS 1.10%

chat chatbot stackoverflow stackexchange elections bot nodejs

se-electionbot's People

Contributors

thatryanperson ryanmentley oaphi double-beep ndattani makyen

se-electionbot's Issues

Eligible visitors to the election using Caucus badge

We can also count how many eligible users have visited the election using the Caucus badge.

getSiteUserIdFromChatStackExchangeId network user id lookup is broken

The getSiteUserIdFromChatStackExchangeId function relies on scraping the user's profile page to get the network user ID when it cannot get one from the "linked site". Unfortunately, this is now broken (as indicated by the failing CI). I strongly suspect this is due to the recent change SE made to profile pages (see MSE post).

I already know how to fix it and will make a PR a bit later, but I'm opening the issue for documentation purposes 😇

Bot needs to either change rooms (leaving the old one) or restart the process if an election chat room is found after rescraping

(Non-debug/Production mode only)

Because election chat rooms are created manually by a CM and only then linked on the election page, the bot may already be in a default room and then need to switch to the official election chat room.

To do this we have these options:

Leave the current room, and join a different room
Restart the Heroku dyno (or exit the process, or throw a fatal error)
Change an environment variable via the Heroku API

Message guards should cross-reference all other guard matches

Currently, guard unit tests are added and cross-referenced manually. We should auto-generate cross-referencing tests instead.

Solving this will also allow us to move forward with removing the rest of the if...else statements in favor of a reducer.
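A minimal sketch of the idea, assuming each guard is a predicate over the message text. The guard names, regexes and sample messages are hypothetical, not the bot's real ones. Every sample is checked against every guard, and mismatches are collected instead of being written as hand-made test cases. A reducer then replaces an if...else chain by returning the first matching rule.

```javascript
// Hypothetical guards; names and patterns are illustrative only.
const guards = {
    isAskingForCandidateScore: (text) => /\bcandidate\s+score\b/i.test(text),
    isAskingForNominating: (text) => /\bhow\s+(?:can|do)\s+i\s+nominate\b/i.test(text),
    isAskingAboutVoting: (text) => /\bhow\s+(?:can|do)\s+i\s+vote\b/i.test(text),
};

// Messages each guard should match, and which no other guard should match.
const samples = {
    isAskingForCandidateScore: ["what is my candidate score?"],
    isAskingForNominating: ["how do I nominate myself?"],
    isAskingAboutVoting: ["how can I vote?"],
};

/**
 * Cross-references every sample with every guard.
 * Returns a list of { message, guard, expected } mismatches (empty = all good).
 */
function crossReferenceGuards(guardMap, sampleMap) {
    const failures = [];
    for (const [owner, messages] of Object.entries(sampleMap)) {
        for (const message of messages) {
            for (const [name, guard] of Object.entries(guardMap)) {
                const expected = name === owner;
                if (guard(message) !== expected) {
                    failures.push({ message, guard: name, expected });
                }
            }
        }
    }
    return failures;
}

// Reducer-style dispatch: the first rule whose guard matches wins.
const rules = Object.entries(guards).map(([name, guard]) => ({ name, guard }));
const matchRule = (text) =>
    rules.reduce((found, { name, guard }) => found ?? (guard(text) ? name : null), null);
```

In a mocha suite, each `{ owner, message, guard }` combination could be turned into its own generated `it()` call, so that a new guard is automatically tested against every existing sample.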

When a candidate is withdrawn involuntarily during an election, the bot doesn't add the user to the list of withdrawn candidates

Debug info:

https://tex.stackexchange.com/election/2

User withdrawn: https://tex.stackexchange.com/election/2#post-620144

No update

The withdrawn status was announced in the election room only (#148), but with the wrong link.
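One way to catch involuntary withdrawals is to derive them from the difference between consecutive scrapes, regardless of how the withdrawal happened. This assumes a withdrawn nomination no longer appears among the active nominees in a scrape, which may not match how the election page actually marks it. The helper name is hypothetical.

```javascript
/**
 * Hypothetical helper: returns nominee IDs present in the previous scrape but
 * missing from the current one, skipping anyone already recorded as withdrawn.
 * @param {number[]} previousIds - nominee user IDs from the last scrape
 * @param {number[]} currentIds - nominee user IDs from this scrape
 * @param {Set<number>} alreadyWithdrawn - IDs already announced as withdrawn
 * @returns {number[]}
 */
function findNewlyWithdrawn(previousIds, currentIds, alreadyWithdrawn = new Set()) {
    const current = new Set(currentIds);
    return previousIds.filter((id) => !current.has(id) && !alreadyWithdrawn.has(id));
}
```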

Undefined behaviour in some cases when calculating candidate score

There are some cases that the candidate score does not handle properly right now that we need to address:

The API returned a 503 error response. The upgraded network ID getter seems to have increased the risk of hitting the API throttle. This caused the bot to respond with the following (the missing whitespace was my fault, though :(, and is fixed by cf55705):

RESPONSE Wow! You have a maximum candidate score of 40!Alas, the nomination period is over Hope to see your candidature next election!

Up to that point the response had been built correctly. The real score should have been 34/40, with 6 badges missing and reputation maxed out:

Missing Badges: Electorate,Marshal,Reviewer,Steward,Investor,Copy Editor

Not sure of the root cause yet; will investigate.

Lacking an account on the site causes the bot to respond with a calculation error. That is somewhat expected, but we could look into returning a user-friendly message in this case. I tested this on myself by accident, having forgotten that I do not have an account on Academia. I'm not sure whether it is easy to distinguish between an API failure and a missing account, though.
We should also guard against null from the utility that fetches network accounts (312c1ff fixes this). It does not help us much if an error happens at this point, but it reduces log clutter and allows the utility to exit gracefully.
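A sketch of a calculation that returns null on missing API data instead of silently producing the maximum. It assumes the usual Stack Exchange candidate score formula: 1 point per 1,000 reputation (capped at 20) plus 1 point for each of 20 qualifying badges, for a maximum of 40. With maxed-out reputation and the 6 missing badges listed above, this gives 34/40. The badge list and function name are assumptions, not taken from the bot's code.

```javascript
// Assumed list of the 20 badges counted toward the candidate score.
const ELECTION_BADGES = [
    "Civic Duty", "Cleanup", "Deputy", "Electorate", "Marshal", "Sportsmanship", "Reviewer", "Steward",
    "Strunk & White", "Copy Editor", "Explainer", "Refiner", "Tag Editor", "Research Assistant",
    "Constituent", "Convention", "Enthusiast", "Investor", "Quorum", "Yearling",
];

/**
 * Hypothetical score calculation.
 * @param {number | null | undefined} reputation - null/undefined if the API call failed
 * @param {string[] | null | undefined} earnedBadgeNames - null/undefined if the API call failed
 * @param {string[]} electionBadgeNames - badges that count toward the score
 * @returns {{ score: number, max: number, missingBadges: string[] } | null}
 */
function calculateCandidateScore(reputation, earnedBadgeNames, electionBadgeNames = ELECTION_BADGES) {
    if (reputation == null || earnedBadgeNames == null) {
        // API error (e.g. a 503) or missing account: report failure instead of guessing.
        return null;
    }
    const earned = new Set(earnedBadgeNames);
    const missingBadges = electionBadgeNames.filter((name) => !earned.has(name));
    const repScore = Math.min(Math.floor(reputation / 1000), 20);
    const badgeScore = electionBadgeNames.length - missingBadges.length;
    return { score: repScore + badgeScore, max: 20 + electionBadgeNames.length, missingBadges };
}
```

The caller could then map a `null` result to a user-friendly "couldn't fetch your data" reply rather than a score.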

Drop Babel

Babel requires a significant amount of setup for little benefit nowadays, especially since we already use many of TypeScript's features. I propose dropping Babel in favor of setting allowJs to true and using what TS emits directly. This needs no extra setup, because it already works this way.

This discussion is inspired by an upcoming PR I'll link soon 😇

Get Heroku free dyno hours via CLI

Display free dyno hours on dashboard

https://devcenter.heroku.com/articles/free-dyno-hours

Bot needs to reschedule cron jobs if the election schedule (dates) changes partway through

Election dates could change partway through (e.g., when there are not enough candidates and the nomination period is extended).

The bot will need to cancel existing cron jobs and reschedule them, otherwise we will need to manually restart the bot/instance.

This feature will be necessary if the bot gains automation abilities in the future (#61), as restarting the process/instance may be more difficult then(?)
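The first step in rescheduling is knowing which jobs are stale. A minimal sketch, assuming the bot keeps the phase dates its jobs were scheduled from and compares them with each rescrape. The phase names and helper are hypothetical. Each returned phase's existing job would then be cancelled and recreated.

```javascript
/**
 * Hypothetical helper: returns the phases whose dates changed between the
 * dates the cron jobs were created from and the freshly scraped ones.
 * @param {Record<string, string | undefined>} scheduledDates - phase -> ISO date
 * @param {Record<string, string | undefined>} scrapedDates - phase -> ISO date
 * @returns {string[]}
 */
function getChangedPhases(scheduledDates, scrapedDates) {
    const phases = new Set([...Object.keys(scheduledDates), ...Object.keys(scrapedDates)]);
    return [...phases].filter((phase) => {
        const before = Date.parse(scheduledDates[phase]);
        const after = Date.parse(scrapedDates[phase]);
        // NaN !== NaN, so "missing on both sides" must be treated as unchanged explicitly.
        if (Number.isNaN(before) && Number.isNaN(after)) return false;
        return before !== after;
    });
}
```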

Investigate and set up persistent database

For #60 and #61, we will need to store the scraped data and bot instances persistently, which can be accessed by all instances/processes.

Possible suggestions:

https://elements.heroku.com/addons/heroku-postgresql (Heroku recommended, documentation)
https://cloud.google.com/free/docs/gcp-free-tier#free-tier

Markdown to HTML conversion incorrectly renders strikethrough

The Markdown to HTML conversion renders this table header as strikethrough when it is displayed in the dashboard

Return a dedicated error page on invalid election data

Currently, if the election data ends up invalid, the outcome depends on where the check occurs:

- The server does not start at all if the failure happens early.
- The server crashes with a 500 on subsequent connections if the rescraper leads to an invalid election state.

The latest bot downtime to date resulted from Stack Exchange breaking access to nomination pages. We want to avoid crashing when that happens. We can still serve some useful info in the meantime and rescrape periodically, so that we are up and running as soon as the issue is fixed.

Already working on this; documenting it here to keep track of the updates.

There are two sub-issues that need to be solved as part of this issue:

The server is started long after the Election#validate method is called, and the latter exits early
The ScheduledAnnouncement#rescrape method ends up breaking the bot if the election state is invalid

Test: Defaulting to Chat.SE for network sites with non-SE domains - failing when SO election

A test fails under a certain condition:

  1) Election
       getters
         chatDomain
           should default to stackexchange.com for network sites with non-SE domains:

      AssertionError: expected 'stackoverflow.com' to equal 'stackexchange.com'
      + expected - actual

      -stackoverflow.com
      +stackexchange.com

Config variables when test fails:

CHAT_DOMAIN=stackoverflow.com
CHAT_ROOM_ID=190503
DEBUG=true
ELECTION_URL=https://stackoverflow.com/election/13

Removing the above values for CHAT_DOMAIN and CHAT_ROOM_ID makes the test pass.
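One way to make such a test independent of a developer's local config is to unset the relevant variables only for the duration of the test. This sketch assumes the chatDomain getter falls back to the CHAT_DOMAIN config variable read from `process.env`. The `withEnv` helper is hypothetical.

```javascript
/**
 * Runs `fn` with some environment variables temporarily overridden
 * (`undefined` means "unset"), then restores the original values.
 * @template T
 * @param {Record<string, string | undefined>} overrides
 * @param {() => T} fn
 * @returns {T}
 */
function withEnv(overrides, fn) {
    const saved = {};
    for (const [key, value] of Object.entries(overrides)) {
        saved[key] = process.env[key];
        // Assigning undefined would store the string "undefined", so delete instead.
        if (value === undefined) delete process.env[key];
        else process.env[key] = value;
    }
    try {
        return fn();
    } finally {
        for (const [key, value] of Object.entries(saved)) {
            if (value === undefined) delete process.env[key];
            else process.env[key] = value;
        }
    }
}
```

In the failing test, the assertion would run inside `withEnv({ CHAT_DOMAIN: undefined, CHAT_ROOM_ID: undefined }, () => ...)`.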

Investigate whether running concurrent election instances under a single account may cause throttling

If this bot runs in multiple election rooms, will it cause an issue if they all use the same Stack Exchange chat account?

Do we need to create multiple accounts for the bot to use in chat?

If possible, we should run everything via a single account. That means keeping track of when the last message was sent and backing off.
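A minimal sketch of "track the last message and back off". It assumes every instance sharing the account sends through one limiter. The gap values are placeholders, not documented chat limits, and the class name is hypothetical. The clock is injectable for testing.

```javascript
/**
 * Hypothetical per-account send throttle: enforces a minimum gap between
 * messages and doubles the gap (up to a cap) after a throttling response.
 */
class SendThrottle {
    constructor({ minGapMs = 2000, maxGapMs = 60000, now = Date.now } = {}) {
        this.baseGapMs = minGapMs;
        this.gapMs = minGapMs;
        this.maxGapMs = maxGapMs;
        this.now = now;
        this.lastSentAt = -Infinity;
    }

    /** Milliseconds to wait before the next message may be sent. */
    delayBeforeNext() {
        return Math.max(0, this.lastSentAt + this.gapMs - this.now());
    }

    /** Record a successful send and relax the gap back to its base value. */
    markSent() {
        this.lastSentAt = this.now();
        this.gapMs = this.baseGapMs;
    }

    /** Record a throttling response and back off exponentially. */
    markThrottled() {
        this.lastSentAt = this.now();
        this.gapMs = Math.min(this.gapMs * 2, this.maxGapMs);
    }
}
```

This state lives in memory, so it only coordinates instances within one process. Separate processes would need the timestamps in a shared store.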

Cancelled election is displayed as ended

For some reason, the cancelled French Language election is showing up as ended, not cancelled. Also, there is a problem with the ballot link that surfaces because of this bug. On it.

Preliminary investigation: notice parsing issue

Fall back to the API for finding election announcements

https://stackexchange.com/filters/421979/all-elections is a relatively new addition and only goes back to 2021, so our current scraping can only go so far. We could instead fall back to the API and fetch all per-site Meta posts tagged [election] before processing. This will most likely add only 1-2 API calls on startup and during election hopping.
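For illustration, such an API fallback could build a request to the Stack Exchange API `/questions` endpoint with `tagged=election`. This sketch assumes the API site parameter for a per-site meta is `meta.<site>` (e.g. `meta.academia`). API key and filter parameters are omitted, and the function name is hypothetical.

```javascript
/**
 * Hypothetical URL builder for [election]-tagged posts on a site's per-site meta.
 * @param {string} site - API site name, e.g. "academia" or "stackoverflow"
 * @param {{ page?: number, pageSize?: number, fromDate?: Date }} [options]
 * @returns {string}
 */
function buildElectionMetaPostsUrl(site, { page = 1, pageSize = 100, fromDate } = {}) {
    const url = new URL("https://api.stackexchange.com/2.3/questions");
    url.searchParams.set("site", `meta.${site}`);
    url.searchParams.set("tagged", "election");
    url.searchParams.set("sort", "creation");
    url.searchParams.set("order", "desc");
    url.searchParams.set("page", String(page));
    url.searchParams.set("pagesize", String(pageSize));
    if (fromDate instanceof Date) {
        // The API expects Unix epoch seconds.
        url.searchParams.set("fromdate", String(Math.floor(fromDate.getTime() / 1000)));
    }
    return url.toString();
}
```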

Improve display of oneboxed messages

Oneboxed messages currently get only a very simple display that needs some love. Backlogging this as a low-priority item.

Add a unit test to check if the dashboard page can be loaded

Bot should rescrape election page for winners during election end

Right now, the bot only checks elections for changes every scrapeIntervalMins minutes (default: 5).

This means that when an election ends, the bot could take up to an additional 5 minutes to announce the winners.

Since SO has automated the elections, the bot should automatically rescrape 30 seconds after the election ends (or after whatever period allows the votes to be tallied).

This may complicate #60
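The delay for a one-off winners rescrape can be computed from the election end date plus a grace period. The 30-second default follows the issue's suggestion; the actual tallying time is unknown. The function name is hypothetical.

```javascript
/**
 * Milliseconds to wait before rescraping for winners: election end + grace period.
 * Returns 0 if that moment has already passed, so the caller can rescrape at once.
 * @param {string} electionEndIso - election end date as an ISO string
 * @param {number} [nowMs] - current time in ms (injectable for testing)
 * @param {number} [graceMs] - assumed time needed to tally the votes
 */
function msUntilWinnersRescrape(electionEndIso, nowMs = Date.now(), graceMs = 30_000) {
    const endMs = Date.parse(electionEndIso);
    if (Number.isNaN(endMs)) throw new TypeError(`invalid election end date: ${electionEndIso}`);
    return Math.max(0, endMs + graceMs - nowMs);
}
```

A caller might pass the result to `setTimeout`. Node.js caps `setTimeout` delays at 2,147,483,647 ms (about 24.8 days) and fires larger values almost immediately with a warning. The one-off timer should therefore only be armed when the end date is near, for example during the regular rescrape.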

Fix election history numbers in the dashboard

Create a separate program/process to handle bot instances for each election on the network

Following the implementation of #60 (Create a separate program/process to scrape election status on all network sites), we could have another "main" process. It would spin up an instance/process for each election when one is detected and terminate it automatically N days after the election ends or is cancelled.

This main process then could also "own" the development chatroom test instance if started by a dev.

We then may need to create a dev-only UI/API to manually start/stop instances, as well as override variables for each election instance.

When election has ended, do not scrape withdrawn candidates from chat transcript on startup

Because the bot could be in the development chat room

Automatically detect election permalink/election number

Currently, the election permalink/number is found manually by trial and error, because the past election history page is no longer accessible during an active election.

e.g. https://stackoverflow.com/election/12

This is then inserted into the bot's environment variable so that the bot scrapes the right page.

To automate the bot (#60 and #61), we need a method to automatically find the current active election's permalink/number by scraping /election for each site and detecting an active election.
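As a starting point, the latest election number could be extracted from the HTML of a site's /election page. This sketch assumes election links look like `href="/election/<n>"`, optionally with an absolute URL. A real implementation would still need to check that election's phase before using it. The function name is hypothetical.

```javascript
/**
 * Hypothetical parser: returns the highest election number linked from the
 * /election page HTML (e.g. 12 for a link to /election/12), or null if none.
 * @param {string} html
 * @returns {number | null}
 */
function findLatestElectionNumber(html) {
    const numbers = [...html.matchAll(/href="(?:https?:\/\/[^"\/]+)?\/election\/(\d+)/g)]
        .map((match) => Number(match[1]));
    return numbers.length ? Math.max(...numbers) : null;
}
```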

Chat transcript should be fetched until TRANSCRIPT_SIZE is reached

Currently, fetchChatTranscript only fetches messages for the last day. This is a known limitation. We can (and should) extend it to keep fetching transcripts until the TRANSCRIPT_SIZE threshold is met.
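A sketch of the loop: step back one UTC day at a time until enough messages are collected or a day limit is reached. `fetchDay` is a hypothetical async function returning one day's messages in chronological order. The real fetchChatTranscript signature may differ.

```javascript
/**
 * Hypothetical loop: fetches earlier days until at least `transcriptSize`
 * messages are collected (or `maxDays` is reached), keeping chronological order.
 * @template M
 * @param {(day: Date) => Promise<M[]>} fetchDay
 * @param {number} transcriptSize
 * @param {{ from?: Date, maxDays?: number }} [options]
 * @returns {Promise<M[]>}
 */
async function fetchTranscriptUntil(fetchDay, transcriptSize, { from = new Date(), maxDays = 30 } = {}) {
    let messages = [];
    const day = new Date(Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), from.getUTCDate()));
    for (let i = 0; i < maxDays && messages.length < transcriptSize; i++) {
        const dayMessages = await fetchDay(new Date(day));
        // Earlier days go in front so the result stays chronological.
        messages = [...dayMessages, ...messages];
        day.setUTCDate(day.getUTCDate() - 1);
    }
    // Keep only the most recent `transcriptSize` messages.
    return transcriptSize > 0 ? messages.slice(-transcriptSize) : [];
}
```

The `maxDays` cap keeps a quiet room from triggering an unbounded number of transcript requests.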

i18n for scraping of non-English sites

Scraping of the election page and regex matching/testing of its text may not work as intended for elections on non-English sites. Date formats will differ, some text may be RTL or in non-Latin scripts, etc.

Site examples:

https://es.stackoverflow.com/election/2
https://pt.stackoverflow.com/election/4
https://ru.stackoverflow.com/election/4
https://rus.stackexchange.com/election/2 (cancelled election)
https://ja.stackoverflow.com/election (no elections yet)

Nomination period extension before cancellation

If the election is to be extended for a week due to insufficient candidates, the bot should not say it is cancelled.

https://chat.stackexchange.com/transcript/message/61237556#61237556

Bot instance should only listen to events for the room it is connected to

Currently, when the bot is in multiple rooms, a triggering message in any of those rooms causes the bot to respond in the room it is listening to.

We must be careful here because the bot can't leave rooms by itself. If we are testing in another room using a local instance, the main instance still gets those events. It then also responds in the election chat room (or in all instances).

We need to get each event's room ID and ignore the event unless it comes from the room the bot is listening to/connected to.

A temporary fix would be to kick the bot from all rooms except one. However, this raises automatic flags for privileged users (mods) in the other rooms. Another option would be to log in as that account and leave the rooms.
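The room check itself can be a one-line guard at the top of the event handler. This assumes chat events carry a `room_id` field, as Stack Exchange chat websocket events do. The helper name is hypothetical.

```javascript
/**
 * Returns true only for events from the room this instance is connected to.
 * @param {{ room_id?: number | string }} event - chat event (assumed shape)
 * @param {number | string} connectedRoomId - e.g. the CHAT_ROOM_ID config value
 */
function isFromConnectedRoom(event, connectedRoomId) {
    return event.room_id !== undefined && Number(event.room_id) === Number(connectedRoomId);
}
```

Usage: `if (!isFromConnectedRoom(event, roomId)) return;` before any rule matching, so events from other rooms the account sits in are ignored.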

Add an environment variable to not autoscale to Hobby

For some elections, autoscaling to Hobby dynos simply wastes resources due to low or non-existent activity in the election chat room.

Currently, the only way to prevent autoscaling is to launch the bot in debug mode. The downside is that the bot then posts a debug message every time it restarts. That would not be an issue without Heroku's dyno cycling, which happens at least once every 24 hours (expected behavior) and causes the bot to post the debug message repeatedly.

Move some bot moderator/privileged commands to dev-only

With the addition of some developer-only testing commands to the privileged triggers, mods on the SE network may accidentally trigger an unwanted action, e.g.: time-travelling during an active election.

Commands that may require moving to a separate dev-only menu:

timetravel
debug
test cron
get cron
die (currently doesn't do any good since Heroku restarts the bot instance)

We could continue using the current env variable ADMIN_IDS for dev IDs.
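One possible design is to tag each command with an access bit and derive both authorization and the help menu from the same registry. Mods then never see dev-only commands. The bit values, the `greet` command and the helper names are illustrative assumptions. The dev-only names come from the list in this issue.

```javascript
// Hypothetical access bits: a user's mask is the OR of every level they hold.
const AccessLevel = Object.freeze({ user: 1, privileged: 2, dev: 4 });

// Illustrative registry; "greet" is a made-up mod-level command.
const commands = [
    { name: "greet", access: AccessLevel.privileged },
    { name: "timetravel", access: AccessLevel.dev },
    { name: "debug", access: AccessLevel.dev },
    { name: "test cron", access: AccessLevel.dev },
    { name: "get cron", access: AccessLevel.dev },
    { name: "die", access: AccessLevel.dev },
];

/**
 * Resolves a chat user's access mask. `devIds` would come from ADMIN_IDS,
 * `isModerator` from the user's chat moderator status.
 * @param {number} userId
 * @param {{ devIds: Set<number>, isModerator: boolean }} context
 */
function getAccessMask(userId, { devIds, isModerator }) {
    let mask = AccessLevel.user;
    const isDev = devIds.has(userId);
    if (isModerator || isDev) mask |= AccessLevel.privileged;
    if (isDev) mask |= AccessLevel.dev;
    return mask;
}

/** Commands the user may run; the help menu would list only these. */
function listAvailableCommands(mask) {
    return commands
        .filter(({ access }) => (mask & access) === access)
        .map(({ name }) => name);
}
```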

Create a separate program/process to scrape election status on all network sites

Currently, the bot is set up manually by changing environment variables in Heroku:

Set CHAT_DOMAIN, if the election is on another site on the Stack Exchange network
Find the election chat room, or the default chat room for that site, and set CHAT_ROOM_ID to it
Set ELECTION_PAGE_URL to the active election page
Set ELECTION_QA_URL to override the link if the bot couldn't detect the correct site meta question

We could create a program that scrapes the election pages of all network sites and stores the status in a persistent database. We could then use it to automatically spin up a new instance for a site when a scheduled election is detected. The announcement on per-site meta usually comes later, although SE is automating the election process, so look out for changes.

We may not need to find/scrape default chat room URLs for each site, since those are unlikely to change. The site list can be fetched from https://api.stackexchange.com/docs/sites#filter=default&run=true. The room with the most users and messages should be the default (e.g.: https://chat.stackexchange.com/?tab=site&sort=people&host=serverfault.com).

But it would be nice for the bot to detect the election chat room if there is one, using the following methods (in order):

chat link from active election post
go to https://chat.stackexchange.com/?tab=site&sort=people&host=serverfault.com and find a room with the word "election" in the room name with the most users
last resort, a dev should be able to set this value
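The detection order above can be sketched as a single function. The room objects are a hypothetical shape (for example, scraped from the chat site's room list sorted by people) rather than an existing API. The function name is hypothetical.

```javascript
/**
 * Hypothetical picker following the order above: linked room, then the most
 * populated room with "election" in its name, then a dev-provided override.
 * @param {{ id: number, name: string, userCount: number }[]} rooms
 * @param {{ linkedRoomId?: number, overrideRoomId?: number }} [hints]
 * @returns {number | null}
 */
function pickElectionRoom(rooms, { linkedRoomId, overrideRoomId } = {}) {
    if (linkedRoomId !== undefined) return linkedRoomId;
    const candidates = rooms
        .filter(({ name }) => /election/i.test(name))
        .sort((a, b) => b.userCount - a.userCount);
    if (candidates.length) return candidates[0].id;
    return overrideRoomId ?? null;
}
```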

Investigate whether it is possible to detect a pro-tem election

and display it on the bot dashboard

Consider upgrading to Heroku 22 stack

Heroku is slowly starting to nudge users towards upgrading from stack 20 to 22. To avoid reaching the EoL of stack 20, we need to consider upgrading. So far, the upgrade seems harmless, but I think we should wait until few or no elections are running before experimenting with it.

See https://devcenter.heroku.com/articles/heroku-22-stack for reference.

Ensure PRs from outside contributors do not fail the build

As seen during the approval of PR #147, we need to tweak the GitHub workflow so that the build does not fail with no way around it. The failure happens because secrets are not passed to normal pull_request workflows, as per the docs:

With the exception of GITHUB_TOKEN, secrets are not passed to the runner when a workflow is triggered from a forked repository. The permissions for the GITHUB_TOKEN in forked repositories is read-only.

The easiest way to solve this is to switch to pull_request_target, but that comes with its own set of issues; see Keeping your GitHub Actions and workflows secure Part 1: Preventing pwn requests.

The most promising approach is to use types: [labeled], which runs when labels on the PR change. We would also create a "safe" (or similar) label, since external PRs have to be reviewed closely either way.

Type heroku-client package

This is a tracking issue to keep tabs on the PR submitted to DefinitelyTyped to add definitions for the package:

DefinitelyTyped/DefinitelyTyped#55984

Ensure nomination posts are not scraped for chat room links

On the 12th Stack Overflow election page, there is no link to the official chat room. As a result, the scraper tries to find the room's URL in nomination posts and picks up a semi-random, incorrect room (Charcoal HQ, SOCVR, etc.).

get maintainerChatIds should default to empty array if config var not set

SERVER - failed to render home route: TypeError: Cannot read properties of undefined (reading 'stackexchange.com')
at BotConfig.get maintainerChatIds [as maintainerChatIds] (file:///app/dist/bot/config.js:115:22)

Also reduce dependency on this in case it is not set.
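A sketch of a getter that always returns an array. The stack trace suggests the value is keyed by chat domain, so this assumes a hypothetical `MAINTAINER_CHAT_IDS` variable holding JSON like `{"stackexchange.com": [...]}`. The variable name, its format and the class are all assumptions.

```javascript
/** Hypothetical stand-in for BotConfig showing only the defensive getter. */
class BotConfigSketch {
    /**
     * @param {Record<string, string | undefined>} env - environment to read from
     * @param {string} chatDomain - e.g. "stackexchange.com"
     */
    constructor(env = process.env, chatDomain = "stackexchange.com") {
        this.env = env;
        this.chatDomain = chatDomain;
    }

    /** Maintainer IDs for the current chat domain, or [] if unset/invalid. */
    get maintainerChatIds() {
        let byDomain;
        try {
            byDomain = JSON.parse(this.env.MAINTAINER_CHAT_IDS ?? "{}");
        } catch {
            byDomain = {};
        }
        const ids = byDomain?.[this.chatDomain];
        return Array.isArray(ids) ? ids : [];
    }
}
```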

Untie contributor list from GitHub "Insights" as the page might take a very long time to load

Following up on this comment, let's switch the command from relying on GitHub's contributor list to just using what's available in the contributors array from the package.json file.
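npm allows each entry of the package.json `contributors` array to be either an object (`{ name, email, url }`) or a shorthand string like `"Name <email> (url)"`. The command therefore needs to normalize both forms. A minimal sketch with a hypothetical function name:

```javascript
/**
 * Normalizes package.json contributors into { name, url } pairs.
 * @param {Array<string | { name: string, email?: string, url?: string }>} contributors
 * @returns {{ name: string, url: string | null }[]}
 */
function normalizeContributors(contributors = []) {
    return contributors.map((person) => {
        if (typeof person !== "string") {
            return { name: person.name, url: person.url ?? null };
        }
        // Shorthand form: "Name <email> (url)", where email and url are optional.
        const match = person.match(/^([^<(]+?)\s*(?:<[^>]*>)?\s*(?:\(([^)]*)\))?\s*$/);
        return match
            ? { name: match[1], url: match[2] ?? null }
            : { name: person.trim(), url: null };
    });
}
```

The array itself could be loaded once at startup, e.g. with `JSON.parse(fs.readFileSync("package.json", "utf8")).contributors`.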

Incorrect revision link when candidate withdraws

See https://chat.stackexchange.com/messages/59451396/history

Use a dedicated randomizer package

As per discussion, we are moving away from the pseudo-random Math.random() to a dedicated package for getting random values. This is a tracking issue. Current candidates are:

Consider storing a separate list of adminIds and devIds for different chat domains

Currently, the environment variables ADMIN_IDS and DEV_IDS hold our account IDs on Chat.SO.

Our IDs are different on Chat.SE and Chat.Meta.SE, so it makes sense to check against a different set when the bot is on those domains.

However unlikely, a fourth chat server may be spun up in the future, so we may need to account for that as well.

We also need to consider support for the bot spawning child processes for different chat servers/rooms (#61).

E.g.: my Chat.SO id is 584192, while on Chat.SE is 86504

I've made you an RO of the network test room: https://chat.stackexchange.com/rooms/info/92073/?tab=access

Investigate rescraper not triggering more than once

this.start(); doesn't queue another scrape.

The bot scrapes election once on startup, one more scrape gets triggered 5 minutes after, and then no further ones occur.
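A common way to guarantee that a periodic task keeps running is to re-arm the timer after every run, including failed ones, instead of relying on a single `start()` call. This is a sketch under that assumption, not the bot's ScheduledAnnouncement code. `scrape` is a hypothetical async function, and the timer functions are injectable so the loop can be tested without waiting.

```javascript
/**
 * Hypothetical self-rearming rescraper: each run schedules the next one once
 * the current scrape settles, so the chain survives errors.
 */
class Rescraper {
    constructor(scrape, intervalMs, { setTimer = setTimeout, clearTimer = clearTimeout } = {}) {
        this.scrape = scrape;
        this.intervalMs = intervalMs;
        this.setTimer = setTimer;
        this.clearTimer = clearTimer;
        this.timer = null;
        this.running = false;
    }

    start() {
        if (this.running) return; // avoid queuing duplicate loops
        this.running = true;
        this.scheduleNext();
    }

    stop() {
        this.running = false;
        if (this.timer !== null) this.clearTimer(this.timer);
        this.timer = null;
    }

    scheduleNext() {
        this.timer = this.setTimer(async () => {
            try {
                await this.scrape();
            } catch (error) {
                console.error("rescrape failed:", error);
            } finally {
                // Re-arm unconditionally (unless stopped) so one failure can't end the loop.
                if (this.running) this.scheduleNext();
            }
        }, this.intervalMs);
    }
}
```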

Do not show dev-only commands to mods; remove or split to separate dev menu

Dev-only commands should not be shown to mods when using @ElectionBot commands, otherwise they will get confused when a command doesn't get a response.

I've already removed the keyword help from displaying the mod-only menu, as I intend this to display the user help menu.

After time travelling, the bot flags need to be reset

Note to self:

See se-electionbot/src/commands/commands.js, lines 61 to 62 in 1e592d0:

    config.flags.announcedWinners = false;
    config.flags.saidElectionEndingSoon = false;

On startup, bot scrapes transcript and detects that the winners have already been announced.

Then we timetravel back to the end of election and try to trigger announce winners command.

This may be setting a local copy of BotConfig, so we are unable to trigger announce winners again if the election has already ended.

Pro-tem: which mods are running/have not nominated yet

If it's a graduation election, the appointed mods have to run if they wish to continue as mods.

We could add a new command to ask which of them are running and which have not submitted their nomination. During the nomination phase, the latter may simply not have nominated yet; past the nomination phase, they are stepping down.

https://chat.stackexchange.com/transcript/message/61292331#61292331

Daily Greet should not occur after election has ended

This is because we may not be using the election chat room (site default chat room instead), and will cause noise.

Add a way to bulk-update environment variables on all bot instances

To resolve #70 without manually going through each and every instance, we need a way to bulk-update the environment variables of all Heroku instances. Ideas:

shell script (the simplest approach)
enhancement for the /config server endpoint.
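Either approach ultimately boils down to one Heroku Platform API call per app, `PATCH /apps/{app}/config-vars`, where setting a variable to `null` removes it. This sketch only builds the requests. App names and the token are placeholders, and the function name is hypothetical.

```javascript
/**
 * Builds (but does not send) one config-vars update request per Heroku app.
 * @param {string[]} appNames - placeholder app names
 * @param {Record<string, string | null>} vars - null removes a variable
 * @param {string} apiToken - Heroku API token
 */
function buildConfigVarUpdates(appNames, vars, apiToken) {
    return appNames.map((app) => ({
        url: `https://api.heroku.com/apps/${encodeURIComponent(app)}/config-vars`,
        init: {
            method: "PATCH",
            headers: {
                Accept: "application/vnd.heroku+json; version=3",
                Authorization: `Bearer ${apiToken}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify(vars),
        },
    }));
}
```

On Node.js 18+, the requests could be sent with `Promise.all(requests.map(({ url, init }) => fetch(url, init)))`. The heroku-client package could also send the same PATCH.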

Remove @-mentions of the bot from anywhere in the message for the purposes of rule matching

Originally reported: https://chat.stackexchange.com/transcript/message/60660156#60660156

Currently, the removal of the @-mention of the bot only happens at the start of the string.
However, it is not guaranteed that users always follow the @-mention first convention.
This sometimes leads to confusing matches as some of the regular expressions used are more permissive than others.
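A minimal sketch that strips every full-name mention of the bot, wherever it appears, before rule matching. It is case-insensitive and requires a word boundary after the name, so `@ElectionBots` is kept. Partial-name mentions are not handled. The function name and default bot name are assumptions.

```javascript
/**
 * Removes all @-mentions of the bot from a message and tidies whitespace.
 * @param {string} text
 * @param {string} [botName] - assumed chat display name without spaces
 */
function stripBotMentions(text, botName = "ElectionBot") {
    // Escape regex metacharacters in the name before building the pattern.
    const escaped = botName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return text
        .replace(new RegExp(`\\s*@${escaped}\\b`, "gi"), "")
        .replace(/\s{2,}/g, " ")
        .trim();
}
```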

Badge IDs differ between sites on the network, breaking candidate score calculation

E.g.: IDs of badges with the same name are different between the sites.

Stack Overflow named badges
https://api.stackexchange.com/docs/badges-by-name#pagesize=100&order=desc&sort=rank&filter=default&site=stackoverflow&run=true

Academia named badges
https://api.stackexchange.com/docs/badges-by-name#pagesize=100&order=desc&sort=rank&filter=default&site=academia&run=true

On bot start, we should call the API once to fetch named badges and update IDs of badges in electionBadges.
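The update step can match on badge name. This sketch assumes electionBadges entries look like `{ name, id }` and that the API items (e.g. from `/badges/name`) carry `badge_id` and `name`. Badge names returned by the API may need HTML-entity decoding (e.g. for Strunk & White). The function name and input shapes are hypothetical.

```javascript
/**
 * Replaces badge IDs with the current site's IDs, matched by badge name.
 * Badges not returned by the API keep their old ID and are reported.
 * @param {{ name: string, id: number }[]} electionBadges
 * @param {{ badge_id: number, name: string }[]} apiItems - `items` from the API response
 */
function updateBadgeIds(electionBadges, apiItems) {
    const idsByName = new Map(apiItems.map(({ name, badge_id }) => [name, badge_id]));
    const missing = [];
    const updated = electionBadges.map((badge) => {
        const id = idsByName.get(badge.name);
        if (id === undefined) {
            missing.push(badge.name);
            return badge;
        }
        return { ...badge, id };
    });
    return { updated, missing };
}
```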

Weird cancellation text in HTML

Something weird happened 2 days after the Tor election was cancelled. Both dev and prod instances finally announced the cancellation, which is strange by itself. On top of that, the text contained HTML for some reason: https://chat.stackexchange.com/transcript/message/61312901#61312901

Apparently, this is because SE uses a different notice template when the election is cancelled in the nomination phase and is not a pro tempore election. Compare the notice to Joomla's election, which is scraped correctly:

And this is what we got on French, for example:

Add a response for BLT file format questions

When the election ends, a ballot file becomes available. We should add responses explaining what it is and how to find it, even if only as an annotated link to OpaVote's website. This is mostly a note not to forget to implement it before the end of the election phase, by October 22nd, to leave some time in advance.
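To help word such a response, here is a minimal parser for the BLT format as used by OpenSTV/OpaVote. It assumes this layout:

- A header line with the candidate and seat counts.
- An optional line of negative numbers listing withdrawn candidates.
- Ballot lines of the form `weight pref1 pref2 ... 0`.
- A lone `0` ending the ballots.
- One quoted name per candidate, followed by a quoted title.

This is a sketch for illustration, not a validated parser.

```javascript
/**
 * Minimal BLT parser (sketch).
 * @param {string} text - contents of a .blt ballot file
 */
function parseBlt(text) {
    const lines = text.split(/\r?\n/).map((line) => line.trim()).filter(Boolean);
    const [candidateCount, seats] = lines[0].split(/\s+/).map(Number);
    let i = 1;
    let withdrawn = [];
    if (/^-\d/.test(lines[i] ?? "")) {
        // e.g. "-2" means candidate 2 withdrew.
        withdrawn = lines[i].split(/\s+/).map((n) => -Number(n));
        i++;
    }
    const ballots = [];
    while (i < lines.length && lines[i] !== "0") {
        const [weight, ...prefs] = lines[i].split(/\s+/).map(Number);
        ballots.push({ weight, preferences: prefs.filter((p) => p !== 0) });
        i++;
    }
    if (i >= lines.length) throw new Error("missing ballot terminator line");
    i++; // skip the "0" terminator
    const unquote = (s) => s.replace(/^"|"$/g, "");
    const candidates = lines.slice(i, i + candidateCount).map(unquote);
    const titleLine = lines[i + candidateCount];
    return { candidateCount, seats, withdrawn, ballots, candidates, title: titleLine ? unquote(titleLine) : null };
}
```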
