robot's Introduction

Robot

Robot is a bot for Twitch.TV IRC that learns from people and responds to them with things that it has learned.

Tools for broadcasters and mods

Robot has a number of features for managing activity level and knowledge. Most are automatic: for example, by default, the bot is configured not to send more than one message per two seconds (although this can be changed), and it deletes recently learned information from users who get banned or timed out, or from messages that are individually deleted.

There are a few commands for more explicit management. All of these commands require admin privileges (which are assigned automatically to the broadcaster and mods). The most relevant ones are:

  • forget pattern deletes all recent messages containing the supplied pattern. E.g., if the bot's username is "Robot", then saying @Robot forget anime is trash makes the bot remove all messages received in the last fifteen minutes that contain "anime is trash" anywhere in the message.
  • you're too active reduces the random response rate, making the bot speak less often when not addressed.
  • set response probability to nn% sets the random response rate to a particular value. This is a good way to make the bot more talkative. Depending on the channel's activity level, the most reasonable values for this are usually somewhere around 2% to 10%.
  • be quiet for 2 hours makes Robot neither learn from nor speak in the channel for two hours. You can use other amounts of time, but the bot limits the length to 12 hours – if you really need longer, contact the bot owner. If you don't provide an amount of time, it defaults to one hour.
  • you may speak disables a previous use of the "be quiet" command.

For the exact syntax to use these commands, see the relevant section.

What information does Robot store?

Robot stores five types of information:

  • Configuration details. This includes things like channels to connect to, how frequently to send messages, and who has certain privileges (including "privacy" privileges). For the most part, this information is relevant only to bot owners, broadcasters, and mods.
  • Fifteen-minute history. Robot records all chat messages received in the last fifteen minutes, storing a hash specific to the sender, the channel it was sent to, the time it was received, and the full message text. Robot uses this information to delete messages it's learned under certain circumstances. Whenever Robot receives a new message, all records older than fifteen minutes are removed. Robot also records the messages it's generated in the last fifteen minutes.
  • One-week privileged command audit. Robot records uses of most admin- and owner-level commands for seven days, including the user, the command that was used with the full message text, the channel in which it was used, and the time the message was received. For security reasons, there is no way to opt out of this data collection.
  • Markov chain tuples. This is the majority of Robot's data, a simple list of prefix and suffix words tagged with the location that prefix and suffix may be used. This data is anonymous; Robot does not know who sent the messages that were used to obtain this information.
  • Affection information. If you use the marriage command, Robot associates your Twitch user ID (which is a number unrelated to your username) with an "affection level" roughly based on how often you cause her to speak.

If you want Robot not to record information from you for any reason, simply use the give me privacy command. Once you're set up to be private, none of your messages will enter her history or Markov chain data. You'll still be able to ask Robot for messages. If you'd like the bot to learn from you again after going private, use the learn from me again command.

How Robot works

Robot uses the mathematical concept of Markov chains, extended in some interesting ways, to learn from chat and apply its knowledge. Here's an example.

Let's say Robot receives this chat message: Can you provide a better example for me please? The first thing it will do is run some preliminary checks to make sure it's ok to learn from the message, e.g. no links, sender isn't a bot, &c.

This particular message is fine. Robot's next step is to break it up into a list of tokens – basically words, except that the English articles "a," "an," and "the" are usually combined with the next word as well, along with invisible tokens for the start and end of a message. The tokens here are Can, you, provide, a better, example, for, me, please?.

Robot is configured with an "order", a number governing how much context matters when learning from messages. Let's say in this case that the bot has an order of 4. This means Robot takes groups of five tokens at a time and learns that the first four can be followed by the fifth. So, the bot will learn:

  • Can may be used to start a message.
  • you may follow can at the start. (Robot learns the exact capitalization of the "to" word, but ignores it for the "from" words.)
  • provide may follow can you at the start.
  • a better may follow can you provide at the start.
  • example may follow can you provide a better. (At this point, the start of the message is more than four tokens old, so she doesn't consider it anymore.)
  • for may follow you provide a better example.
  • me may follow provide a better example for.
  • please? may follow a better example for me.
  • A message may end after example for me please?.

Learning the message is finished. But, robots don't like learning things they'll never use.

When it's time for Robot to think of something to say, the bot does a "random walk" on everything it's learned. Starting with the invisible token for the start of a message, the bot picks out everything it knows can follow, then chooses one of those words entirely at random. Let's say it picks the word You. Robot records that the random walk went to You, then looks for everything that can follow you at the start. It might pick SHOULD; record it and look from you should, and maybe choose HAVE; then waited.

Now that the walk has chosen four words, the same as the order, Robot stops caring about the start again. So, it's looking for words that can follow you should have waited anywhere in a message. In my database at the time I write this, the only option that can follow is till. Since there are few options, and Robot wants to be clever, the bot tries to find matches with less context: where the previous three tokens were should have waited, but the token before that is not you. (Note, it's just a coincidence this happened right when the beginning-of-message token fell out of context. Robot could potentially look for extra matches this way at any point.)

It turns out there are no more possibilities for should have waited (with you dropped), but have waited (with you should dropped) has a few. Let's say so is next. It turns out should have waited so has zero matches, because so came from the search with the first two tokens eliminated. Searching have waited so gives long! as the only option, and waited so only gives long (the idea of applying Markov chains to natural language is that regularities like this exist). If we pick long!, the only option we'll find with any of the searches we try will be the invisible end-of-message token. So, the generated message is You SHOULD HAVE waited so long!.
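A rough Go sketch of such a walk, including the search with less context, is below. Everything here is an assumption made for illustration: the `tuple` type, the `minOpts` threshold that decides when there are "few options", the `maxLen` guard, and the linear scan over an in-memory slice instead of a database query.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

type tuple struct {
	prefix []string // lowercased context; "" marks the start of a message
	suffix string   // "" marks the end of a message
}

// matches reports whether t's prefix agrees with ctx once the first
// skip tokens of both are ignored.
func matches(t tuple, ctx []string, skip int) bool {
	if len(t.prefix) != len(ctx) {
		return false
	}
	for i := skip; i < len(ctx); i++ {
		if t.prefix[i] != ctx[i] {
			return false
		}
	}
	return true
}

// options gathers suffixes for ctx, retrying with less context while
// fewer than minOpts (a hypothetical threshold) have been found.
func options(db []tuple, ctx []string, minOpts int) []string {
	var opts []string
	for skip := 0; skip < len(ctx) && (skip == 0 || len(opts) < minOpts); skip++ {
		for _, t := range db {
			// Count each tuple only at the first level where it matches.
			if matches(t, ctx, skip) && (skip == 0 || !matches(t, ctx, skip-1)) {
				opts = append(opts, t.suffix)
			}
		}
	}
	return opts
}

// walk generates a message; pick chooses an index in [0, n).
func walk(db []tuple, order, minOpts, maxLen int, pick func(n int) int) string {
	ctx := make([]string, order)
	var out []string
	for len(out) < maxLen {
		opts := options(db, ctx, minOpts)
		if len(opts) == 0 {
			break
		}
		next := opts[pick(len(opts))]
		if next == "" { // end-of-message token
			break
		}
		out = append(out, next)
		ctx = append(ctx[1:], strings.ToLower(next))
	}
	return strings.Join(out, " ")
}

func main() {
	db := []tuple{
		{[]string{"", ""}, "You"},
		{[]string{"", "you"}, "SHOULD"},
		{[]string{"you", "should"}, "wait"},
		{[]string{"they", "should"}, "go"},
		{[]string{"should", "wait"}, ""},
	}
	fmt.Println(walk(db, 2, 2, 20, rand.Intn))
}
```

With this toy data and an order of 2, the walk reaches the context you should, finds only one full match (wait), and widens the search to anything following should, which adds go as a second candidate.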

Commands

Robot acknowledges chat messages which start or end with the bot's username, ignoring case, possibly preceded by an @ character or followed by punctuation. For example, if the bot's name is "Robot", then it will recognize these as command messages:

  • @Robot madoka
  • madoka @rObOt
  • robot madoka
  • Robot: madoka
  • madoka Robot ?

These are not recognized as commands:

  • madoka @Robot homura
  • ¡Robot madoka!

When Robot recognizes a command, it strips the triggering portion from the message, and the remainder is the command invocation. So, in all of the command examples above, the command invocation is madoka. If a message both starts and ends by addressing Robot, only the start is removed for the invocation; e.g. the invocation for @Robot madoka @Robot is madoka @Robot.
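The addressing rules can be approximated with two regular expressions, as in the sketch below. The `invocation` function and both patterns are assumptions written to match the examples in this section; the exact expressions Robot uses are not shown here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// invocation returns the command invocation if msg addresses the bot at
// its start or end, and false otherwise. The start is checked first, so a
// message addressing the bot at both ends keeps the trailing mention.
func invocation(name, msg string) (string, bool) {
	n := regexp.QuoteMeta(name)
	// Optional @, the name in any case, optional ASCII punctuation, then space.
	start := regexp.MustCompile(`(?i)^@?` + n + `[[:punct:]]*\s+`)
	// Space, optional @, the name, then optional space and punctuation.
	end := regexp.MustCompile(`(?i)\s+@?` + n + `\s*[[:punct:]]*$`)
	msg = strings.TrimSpace(msg)
	if loc := start.FindStringIndex(msg); loc != nil {
		return msg[loc[1]:], true
	}
	if loc := end.FindStringIndex(msg); loc != nil {
		return msg[:loc[0]], true
	}
	return "", false
}

func main() {
	for _, m := range []string{"@Robot madoka", "madoka Robot ?", "madoka @Robot homura"} {
		inv, ok := invocation("Robot", m)
		fmt.Printf("%q -> %q %v\n", m, inv, ok)
	}
}
```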

The command invocation is checked against the list of commands for which the user who sent the command message has appropriate privileges. Robot is designed to understand some amount of English for command invocations; there are usually multiple forms you can use to perform a given command.

Commands for everyone

  • where is your source code? provides a link to this page, along with a short summary of prominent technologies leveraged.
  • what information do you collect on me? provides a link to the section on privacy on this page.
  • give me privacy makes the bot never record any information from your messages.
  • learn from me again undoes give me privacy.
  • generate something with starting chain generates a message that starts with starting chain. (Nothing happens if the bot doesn't know anything to say from there.)
  • uwu genyewates an especiawwy uwu message.
  • how are you? AAAAAAAAA A AAAAAAA AAA AAAA AAAAAAAA AA AA AAAAA.
  • roar makes the bot go rawr ;3
  • will you marry me? asks the bot to be your waifu, husbando, or whatever other label for a domestic partner is appropriate. Robot is choosy and capricious.

If a command invocation doesn't match any command, it instead prompts Robot to speak.

Commands for admins

  • forget <pattern to forget> un-learns all recent messages that contain the text "pattern to forget".
  • help <command-name> displays a brief help message on a command.
  • invocation <command-name> displays the exact regular expression used to match a command.
  • list commands displays all admin and regular command names.
  • be quiet until tomorrow causes the bot to neither learn from nor randomly speak in the channel for twelve hours.
  • be quiet for 2 hours is the same, but for 2 hours, or any other duration.
  • you may speak undoes "be quiet" commands, allowing the bot to learn and speak again immediately.
  • you're too active reduces Robot's random response rate.
  • set response probability to <prob>% sets Robot's random response rate to a particular value.
  • speak <n> times generates up to n messages at once, bypassing the bot's rate limit. The maximum for n is 5.
  • raid generates five messages at once.
  • echo <message> repeats an arbitrary message.

Effects

Robot can apply effects to randomly generated messages, modifying the actual output text. Effects can be configured per channel. The possible effects are:

  • uwu: Transform using the uwu command.
  • me: Use /me (CTCP ACTION) for the message.
  • o: Replace vowels with o.
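As an illustration of how simple an effect can be, here is a sketch of the o effect. The `oEffect` name is hypothetical, and keeping uppercase vowels uppercase is an assumption, since the description above doesn't say how case is handled.

```go
package main

import (
	"fmt"
	"strings"
)

// oEffect replaces every ASCII vowel with "o", preserving case.
func oEffect(msg string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case strings.ContainsRune("aeiou", r):
			return 'o'
		case strings.ContainsRune("AEIOU", r):
			return 'O'
		}
		return r
	}, msg)
}

func main() {
	fmt.Println(oEffect("You SHOULD HAVE waited so long!"))
}
```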

Privileges

Robot has six privilege levels:

  • owner is a special privilege level for the bot owner.
  • admin gives access to extra commands for moderating robot's activity levels and knowledge.
  • regular is the default privilege level, for basic fun with the Markov chain features.
  • ignore removes access to any commands, including Markov chain features. Robot also does not learn from ignored users.
  • bot is a mix of admin and ignore privileges. Users with bot privileges can invoke admin commands, but Robot does not learn from their other messages.
  • privacy is a mix of regular and ignore privileges. Users with privacy privileges can invoke regular-level commands, but Robot does not learn from their messages.

Robot scans a user's chat badges to assign default privileges. Unless overridden per user, broadcasters, mods, and Twitch staff have admin privileges, and everyone else (including VIPs and subscribers) has regular privileges.
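A minimal sketch of the badge scan is below. It assumes the Twitch IRC badges tag format of comma-separated name/version pairs and assumes, as the section on tools for broadcasters and mods states, that these badges map to admin privileges. `defaultPrivilege` is a hypothetical name, and per-user overrides from the privs table are not modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPrivilege maps a badges tag value such as
// "moderator/1,subscriber/12" to a default privilege level.
func defaultPrivilege(badges string) string {
	for _, b := range strings.Split(badges, ",") {
		name, _, _ := strings.Cut(b, "/") // drop the badge version
		switch name {
		case "broadcaster", "moderator", "staff":
			return "admin"
		}
	}
	// VIPs, subscribers, and everyone else.
	return "regular"
}

func main() {
	fmt.Println(defaultPrivilege("moderator/1,subscriber/12"))
	fmt.Println(defaultPrivilege("vip/1"))
}
```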

Rate limits

When Robot wants to generate a message, whether randomly or through a command, the bot requests a ticket from its rate limiter for the channel it would send to. If there is no ticket available, Robot does not generate the message.

There are two knobs on Robot's rate limiters: the "rate" and the "burst size." I have done a poor job of explaining what these mean elsewhere, but essentially, the burst size is the maximum number of tickets Robot can take, and the rate is how many tickets regenerate per second.

Say a channel has a rate of 0.1 and a burst size of 3. Robot hasn't said anything for a couple minutes. Someone asks the bot to generate a message at 10:53:00; Robot takes a ticket, leaving two remaining, and sends a message. At 10:53:02, another person talks and triggers a random message; Robot takes another ticket, and now there are 1.2 tickets. Another person demands an uwu at 10:53:07, leaving 0.7 tickets. Someone in the channel needs a meme at 10:53:10; the rate limiter has just regenerated a full ticket, which Robot takes, leaving 0.0. Then, at 10:53:19, another person asks for an uwu, because they're beautiful, but the rate limiter has only 0.9 tickets, so Robot cannot take one and so does not speak.

In addition to the rate limiter described above, Robot has a global rate limiter that prevents her from trying to send more than 100 messages per thirty seconds on average, per Twitch's documentation. However, for that global limiter, she waits for a ticket to become available instead of giving up if there isn't one already.

Running your own instance

Robot is designed to be reasonable to install and use on your own. Doing so lets you control exactly when the bot runs and what information is in its database. If you can program in Go and SQL, you can even modify how Robot works to add new features or to remove existing ones, as long as you follow the GPLv3 license.

Installing and running

First, make sure you have the latest versions of Go and GCC installed. If you open a command prompt or terminal, entering go version should print something like go version go1.15.2 windows/amd64, and entering gcc --version should print something like gcc.exe (tdm64-1) 9.2.0. If either of these commands fails, follow the installation instructions for Go and, on Windows, TDM-GCC. (If you aren't on Windows, GCC is almost certainly installed already.)

It's also a good idea to install an SQLite3 database interface, like SQLiteStudio.

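With at least Go and GCC installed, simply enter go get github.com/zephyrtronium/robot/.... (On newer Go releases, where go get no longer installs commands, use go install github.com/zephyrtronium/robot/...@latest instead.) This installs the robot, robot-convert, robot-init, and robot-talk commands. At this point, you should be able to enter robot -help to see a basic help message.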

Before you can run Robot, you'll need to use robot-init to initialize a database. You'll probably want to copy and modify the example configuration, then do robot-init -conf modified.json -source robot.sqlite3. See the README for robot-init for more information.

You'll also need an OAuth token for Twitch, with at least chat:read and chat:edit scopes. If you don't already have one, you can get one through the TMI OAuth generator.

Finally, run Robot using robot -source robot.sqlite3 -token y0UrOAuth70Ken.

Database structure

Robot's database tables are:

  • audit - log of uses of most admin- and owner-level commands
    • time - time the command was received
    • chan - channel in which the command was received
    • sender - username of the sender
    • cmd - command name that was executed
    • msg - full message text, including the command activation
  • chans - configuration for channels known to the bot.
    • name - primary key, name of the channel; must begin with '#' and be all lower case, otherwise Twitch will silently ignore the bot trying to join.
    • learn - tag to use for learned messages
    • send - tag to select from to generate messages
    • lim - maximum length of generated messages
    • prob - probability that a non-command message will trigger generating a message; must be between 0 and 1
    • rate - maximum average messages per second to send
    • burst - maximum number of messages to send in a burst
    • block - regular expression matching messages to block, in addition to the block expression in config
    • respond - whether commands can generate messages, separate from the random speaking chance
    • silence - datetime before which the bot will not learn or speak
    • echo - whether to report an echo directory for this channel (part of a personal experiment)
  • config - global configuration, only row 1 is used
    • me - bot's username, used as nick
    • pfix - Markov chain order, as described above
    • block - regular expression matching messages to block
  • copypasta - copypasta detection configuration
    • chan - channel to which this configuration applies
    • min - number of messages required to trigger copypasta detection
    • lim - time in seconds to consider messages for copypasta
  • effects - global and per-channel effects to apply to messages
    • tag - send tag where used, or everywhere if null
    • effect - effect name, see above
    • weight - integer weight; higher values mean more likely to select
  • emotes - global and per-channel emotes to append to messages
    • tag - send tag where used, or everywhere if null
    • emote - emote text
    • weight - integer weight; higher values mean more likely to select
  • generated - messages generated in the last fifteen minutes
    • time - timestamp of generated message
    • tag - tag used to generate the message
    • msg - generated message text
  • history - messages learned from in the last fifteen minutes
    • tid - Twitch IRC message ID
    • time - timestamp of message receipt
    • tags - formatted tags on the message, with display-name and user-id stripped
    • senderh - hash corresponding to the message sender
    • chan - channel received in
    • tag - tag used to learn the message
    • msg - message text
  • marriages - users the bot is currently "married" to
    • chan - channel to which this marriage applies
    • userid - Twitch user ID for the partner in this channel
    • time - time at which the partnership was affirmed
  • memes - copypasta messages in the last fifteen minutes
    • time - timestamp of copypasta
    • chan - channel message was copypasted in
    • msg - copypasta message text
  • privs - global and per-channel user privileges
    • user - username receiving this privilege
    • chan - channel where applicable, or NULL if a global default
    • priv - privilege type, one of "owner", "admin", "bot", "privacy", or "ignore". (Regular privs are implied by not being in the table, or can be forced by setting this to the empty string.)
  • scores - "affection" levels for marriages
    • chan - channel to which this affection level applies
    • userid - Twitch user ID
    • score - affection level for this user
  • tuplesn, where n is a number - Markov chain data with n prefix words
    • tag - tag with which this chain was learned
    • p0, p1, ... - prefix words; null means before start of message
    • suffix - suffix word; null means end of message

Owner commands

  • warranty prints a brief "NO WARRANTY" message, extracted from the GPLv3, on the bot's terminal.
  • disable <command-name> globally disables a command.
  • enable <command-name> globally re-enables a command.
  • resync synchronizes the bot's channel configurations with what is in the database. Use this after modifying the chans, emotes, or privs tables.
  • EXEC <query> executes an arbitrary mutating SQL query.
  • raw <command> <params> :<trailing> sends a raw IRC message to the server.
  • join <channel> [<learn-tag> <send-tag>] joins a channel and adds its configuration to the database. If the tags are not given, the inserted values will be NULL.
  • give <user> {owner|admin|bot|regular|ignore} privileges [in <channel>|everywhere] sets a user's privilege level, in the current channel if the location is omitted.
  • quit disconnects from the IRC server and closes the bot.
  • reconnect disconnects from and then reconnects to the IRC server.
  • list commands (overriding the admin version) lists all commands, including owner-only and disabled ones. The latter are marked by a *.
  • debug [<channel>] lists (and prints to terminal) the in-memory configuration of the given channel, or the current one if omitted.

robot's People

Contributors

zephyrtronium

robot's Issues

move off sqlite

While trying to reproduce #39, it came to light that the new SQL approach is overwhelmingly slow. We could revert to the old algorithm, but we switched off it for many reasons, especially that it's way too memory-hungry. I want to try a NoSQL/KV solution anyway, so let's just go for it.

The design is roughly as follows:

  • Keys are the list of entropy-reduced tokens in reverse order. That allows us to match prefixes of arbitrary length by prefix scan. Keys are prepended with the tenant tag.
  • Values include lists of suffixes at each position along the key after the first or second. We need all suffixes so that we can match chains shorter than the full message without entropy reduction.
  • Each suffix tracks the list of message IDs that produced that suffix. Then we can check for deletions in a separate table/DB. (Alternatively, we could put message ID in the key, so that every learn is an insert rather than some being inserts and others being updates.)
  • Every prefix of a message is saved this way. In particular, this lets us easily search for new prompts by looking for just an end-of-message sentinel.
  • Out of consideration for storage, we keep a lexicography to map tokens to integer IDs. uint32 will suffice for these; after about four years in operation, RobotIsBroken has only 1418123 distinct words, or 1141243 post entropy reduction.

Based on past investigation into this idea, we probably want Badger for knowledge and bbolt for message deletes and lexicography.

robot-init not recognized

I'm getting the following error when I try to initialize the database for my bot:
(screenshot of the error; image not available)

robot -help works, so I'm wondering what I might be doing wrong here?

change ForgetUserSince to just ForgetUser

The caller is the one who needs to arrange the "since" part because message times are encoded in user hashes. The time parameter doesn't add anything except possibly limiting result sets on the SQL brain, which we almost certainly aren't going to use anyway.

refresh storage stops loading

After a couple refreshes:

couldn't obtain access token for TMI login: couldn't retrieve current token: couldn't load saved token: chacha20poly1305: message authentication failed

Installation instructions out of date?

I am attempting to install this bot on my own personal server but I'm a bit lost on the instructions. I did some searching and saw that I needed to run go install github.com/zephyrtronium/robot/...@latest instead of the existing command in the readme - that all works fine as it appears that the installation goes well (no errors in the logs). My issue happens when I try to execute robot -help. For some reason I'm only able to execute said function if I go deep into the go/bin directory and run ./robot -help that way - the problem here is that my config file is in the cloned robot repo - am I going to have to run things from inside the pkg directory every time? That seems like it's a bit counterintuitive, am I just missing something or are the instructions out of date?

bad word filter

The bot produced a slur by combining inputs typed out like t h i s that were individually fine. Passing fully generated messages through a bad word filter would be a good step to prevent things like this. A straightforward place to add it would be in func badmatch.

Dashboards

Make web interfaces for bot management, separate for admin and owner. Both will eventually require Twitch OIDC, implying a redirect URI, but can be implemented without auth for LAN for now.

Owner dashboard: An owner connects to e.g. /o. Webpage displays:

  • Messages in history, generated messages (on tabs)
  • SQL result pane & query entry
  • Command output & entry (behaves as current terminal command entry)

On mobile breakpoints, SQL and commands are also tabs.

Admin dashboard: An admin in #channel connects to e.g. /m/channel. Webpage displays a menu of options corresponding to admin commands which modify the database: forget, silence, set-prob, &c. when others are added. Selecting one displays a page appropriate to the command.

kvbrain forgets wrongly

We need to be able to effectively forget a message before we actually record tuples from that message. #41 lays out that we should do this by recording deletes in bbolt. Don't forget to actually implement that.

describe self

Add a command to describe the bot's mechanism in chat. Should respond to:

  • who are you?
  • how do you work?

fixed chain responses

Extend respond to make the bot generate messages starting with a given chain. E.g. have AYAYA AYAYA @RobotIsBroken take (assuming prefix length 3) \x01 AYAYA AYAYA as the working chain to generate the rest of a message, or AYAYA AYAYA AYAYA AYAYA AYAYA @RobotIsBroken use AYAYA AYAYA AYAYA with AYAYA AYAYA as a fixed prefix. May require slightly reworking Walk.

add a send closure to channels

Channels should each record how to send to them. Add a field containing a closure that implements it, including with global rate limiting.

improve user privacy

  • Add a "privacy" privilege level. Users of this level can trigger commands, but the bot does not record any information from them, in history or in tuples.
  • Add a "priv" channel config option, setting the default privilege level for each channel. Usually this will be null or "privacy", depending on the broadcaster's wishes.
  • Add a "privacy" command. Responds with a link to a summary of the information Robot collects (like in the current top-level README) and mentions commands to manage a user's own privacy.
  • Add "ignore me" and "unignore me" commands. The former sets a user's privileges to "privacy", or to "bot" if they're currently an admin. The latter sets a user's privileges to regular, or to "admin" if they're currently "bot".
  • Save hashes of usernames in history instead of full usernames.
  • Remove mentions of users with "privacy" privileges when learning chains. Replace with self, triggering user, broadcaster?

Problems while trying to install

I've been trying for a few days now to install the bot on my computer, but so far I've been unsuccessful. I downloaded the master file and installed Go (version 1.15.2) and GCC (tdm64-1 10.3.0), but as soon as I get to the ''go get github.com/zephyrtronium/robot/'' step I have an issue: it gives a prompt that the ''go get'' function is deprecated and that I should use the ''go install'' function instead. So I tried that, and after numerous tries and even visiting their site to see what could be going wrong, I simply can't install it, because it says it needs the version of the repository, and nothing that I tried works. I even tried downloading an older version of Go to see if I could use the go get command (hence why I'm using 1.15.2 right now). Having said that, I would like to ask for some help on how to properly install Robot. Sorry if this sounds awkwardly amateur, but I would really appreciate it if you could give me an answer.
Thanks for your attention, Diogo.

expand tuples info

Some additional info about tuples will not only improve the suitability of Robot's data for research if that direction is pursued, but will also improve maintainability and consistency of the data.

  • timestamp
  • userhash
  • forgotten, or forget reason
  • msgid (maybe use this to relate tuples to this other data instead of recording those for every message)

handle USERSTATE tags

Use USERSTATE tags to find the bot's own badges in a channel and determine whether it can send 20 or 100 PRIVMSGs per thirty seconds.
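A sketch of what this could look like is below, assuming the bot gets the higher limit when its own badges include moderator or broadcaster. `parseTags` and `messageLimit` are hypothetical helpers, and tag value escaping is ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTags splits an IRCv3 tag string like "@a=1;b=2" into a map.
// Escape sequences in values are not handled in this sketch.
func parseTags(raw string) map[string]string {
	tags := make(map[string]string)
	for _, kv := range strings.Split(strings.TrimPrefix(raw, "@"), ";") {
		k, v, _ := strings.Cut(kv, "=")
		tags[k] = v
	}
	return tags
}

// messageLimit returns how many PRIVMSGs per thirty seconds the bot may
// send in the channel the USERSTATE was received for.
func messageLimit(tags map[string]string) int {
	for _, b := range strings.Split(tags["badges"], ",") {
		name, _, _ := strings.Cut(b, "/")
		if name == "moderator" || name == "broadcaster" {
			return 100
		}
	}
	return 20
}

func main() {
	tags := parseTags("@badge-info=;badges=moderator/1;color=#FF0000")
	fmt.Println(messageLimit(tags))
}
```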

Running the bot as a background process?

I did a bit of research on my own and couldn't find an answer that seemed to make sense to my brain - is there a way a person could run the bot as a background process at all? I have it set up on a VPS and I would like to simply turn it on and let it run, is this at all possible via a flag or something similar?

follow raids

Optionally follow raids issued by the broadcaster and allow copypasta detection to trigger to join in with the raid message, then leave after a few minutes.

thank sub gifters

Robot should thank people who gift her subs. Look for USERNOTICE with msg-id either "subgift" or "anonsubgift" and with msg-param-recipient-user-name equal to the bot's lowercase username or msg-param-recipient-id equal to the bot's userid if we get that at some point. Additional relevant docs: https://dev.twitch.tv/docs/irc/tags#usernotice-twitch-tags

Thank with a chat message, possibly a special set of emote options for doing so, maybe a big chunk of affection.
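The detection described above might look like the following sketch. It assumes the USERNOTICE tags have already been parsed into a map; `isGiftToBot` is a hypothetical name, and an empty botID means the bot's user ID is not known yet.

```go
package main

import (
	"fmt"
	"strings"
)

// isGiftToBot reports whether USERNOTICE tags describe a sub gifted to the bot.
func isGiftToBot(tags map[string]string, botName, botID string) bool {
	switch tags["msg-id"] {
	case "subgift", "anonsubgift":
		// A gift; check the recipient below.
	default:
		return false
	}
	if botID != "" && tags["msg-param-recipient-id"] == botID {
		return true
	}
	return tags["msg-param-recipient-user-name"] == strings.ToLower(botName)
}

func main() {
	tags := map[string]string{
		"msg-id":                        "subgift",
		"msg-param-recipient-user-name": "robotisbroken",
	}
	fmt.Println(isGiftToBot(tags, "RobotIsBroken", ""))
}
```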

channel config

This probably doesn't need to go into a database, as long as it's possible to update configs and particularly the list of channels while the bot is running.

Also be careful to avoid making this IRC-centric. What do we need for e.g. Discord?

Discord

Robot should be able to connect to Discord servers and channels within them. This might involve making fake IRC messages out of Discord messages and vice-versa, or making an interface to capture both kinds. It might be possible to add Discord info directly into the chans table, or it might be necessary to add a new table.
