The chaindog from caseyjohnsonwv

Refactor for Error Handling System-Wide

Related to #7 - the current architecture limits all SMS interaction to one Lambda function, invoked by a synchronous API call. At scale, this will be more cost-effective as a container rather than a serverless function. It will also be easier to perform proper error handling in a long-running container.

1. Twilio invokes the API endpoint.
2. API endpoint invokes a Lambda, which pushes the payload to a queue and returns an empty 200 OK to Twilio.
3. A long-running ECS container consumes the queue, performs NLP etc., then invokes the SMS Lambda.

This has a few advantages:

No more worrying about Lambda package size OR memory / runtime for intensive NLP processes.
SMS message response templates can be loaded into the NLP container from a CSV / XLSX file in S3.
Cleaner architecture for successes AND failures with all outbound messages being sent via the SMS Lambda.

Implementation

Create long-running NLP loop that can poll SQS and publish to SNS.
Containerize the NLP code and add ECS task / service to the IaC config.
Add an SQS queue to the IaC config.
Modify the Watch Lambda source to simply push messages to the queue.
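
A minimal sketch of the slimmed-down Watch Lambda from the last step, assuming the queue write is injected as a plain callable. The `enqueue` parameter and `make_intake_handler` name are hypothetical; in AWS the callable would presumably wrap an SQS client's send call. An empty TwiML document is used for the "empty 200 OK" here; whether a completely empty body would also work is not verified in this sketch.

```python
import json

# Empty TwiML document: tells Twilio to send no immediate reply.
EMPTY_TWIML = '<?xml version="1.0" encoding="UTF-8"?><Response></Response>'


def make_intake_handler(enqueue):
    """Build a Lambda-style handler that only enqueues the inbound SMS.

    `enqueue` is a hypothetical callable taking one string (the message body).
    """
    def handler(event, context=None):
        enqueue(json.dumps(event))  # hand the raw payload to the NLP container
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/xml"},
            "body": EMPTY_TWIML,
        }
    return handler


# Usage with an in-memory list standing in for the queue.
queued = []
handler = make_intake_handler(queued.append)
response = handler({"From": "+11234567890", "Body": "Watch Batman at Six Flags."})
```

Keeping the queue client out of the handler body also makes the function testable without AWS credentials.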

Require Y/N Response to Delete a Watch

Similar to #14 - I don't want to delete a watch under most circumstances, bulk delete or single delete, unless the user confirms via Y/N response that this is what they intended to do. The exception is watches created in the last 5 minutes - those are probably mistakes and can be deleted without confirmation.
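
The 5-minute exception could be a single check, sketched here under the assumption that each watch stores a timezone-aware creation timestamp. The `created_at` argument and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Grace period from the issue text: very recent watches are probably mistakes.
NO_CONFIRM_WINDOW = timedelta(minutes=5)


def needs_delete_confirmation(created_at, now=None):
    """Return True when a Y/N confirmation should precede the delete."""
    now = now or datetime.now(timezone.utc)
    return now - created_at > NO_CONFIRM_WINDOW


# Usage with a fixed clock so the result is deterministic.
now = datetime(2022, 9, 12, 3, 0, tzinfo=timezone.utc)
fresh = now - timedelta(minutes=2)
old = now - timedelta(hours=1)
```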

Creating / updating watches without confirmation is okay in my opinion. If these somehow end up with bad or wrong data, it's easy for the user to simply send another message correcting it.

Dependent on #13, as this will require state tracking.

Allow All SMS Interactions to Handle Ambiguity Gracefully

Currently, the entire application is stateless: requests and responses have to be specified in their entirety. It would be nice if they didn't.

For example, the user requests a watch for "Batman at Six Flags."

1. Create a partial watch "transaction" with their phone number and commit it to the DB.
2. Pull every park they could have meant and ask them to specify. Repeat as needed until only one park.
3. Update the transaction in the DB.
4. Repeat step 2 for ride name if needed.
5. Finish creating the watch in the DB, setting its expiration based on the time it was finished.

Architecture To Enable This Change

I think this will require a separate Dynamo table for "Transactions" - this can store metadata about the user's interactions with the bot as well as a partial watch that is being populated through the conversation:

{
  "phone_number" : "+11234567890",
  "user_message" : "Watch Batman at Six Flags.",
  "our_message" : "Which park are you referring to? Six Flags Great Adventure, Six Flags Great America, [etc.]",
  "timestamp_utc" : "2022-09-12T02:58:00.123456+00:00",
  "expecting_yn_response" : false,
  "expected_responses": ["Six Flags Great Adventure", "Six Flags Great America", ...],
  "watch" : {
    (all the fields of a normal watch object, but null if needed)
  }
}

With each successive message, we can populate the watch object, finally adding it to the "Watches" table in Dynamo when none of its fields are null. This way, we can always assume a watch in the "Watches" table is complete and valid.

Implementation

Add "Transactions" table (PK = phone_number) + IAM permissions to Terraform
Modify fuzzy matching to return all options above the specified threshold, then make Watch Lambda create / update a transaction if there are multiple good options.
Make Watch Lambda check for transactions in progress on each invocation. Use the expected_responses to determine if the user has answered the question. Determine what question to ask next based on the null fields remaining in the nested watch object. If no fields are null, write the watch to the "Watches" table.
Prune incomplete transactions that are older than 5 minutes (?) with a new Lambda function. Text the user that we didn't create the watch. This can be invoked by the same SNS topic that triggers the Notification Lambda.
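
A rough sketch of the per-invocation checks described above: matching the reply against `expected_responses`, finding the next null field, and flagging stale transactions. The watch field names in `WATCH_FIELDS` are hypothetical, since the real watch schema is not shown here, and the 5-minute window is the tentative value from the last step.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field order; the real watch schema is not shown in the issue.
WATCH_FIELDS = ("park_name", "ride_name", "target_wait_minutes")
TRANSACTION_TTL = timedelta(minutes=5)


def matched_response(transaction, user_reply):
    """Return the expected response the reply matches (case-insensitive), else None."""
    reply = user_reply.strip().casefold()
    for option in transaction["expected_responses"]:
        if option.casefold() == reply:
            return option
    return None


def next_missing_field(watch):
    """Return the first watch field that is still null, or None if complete."""
    for field in WATCH_FIELDS:
        if watch.get(field) is None:
            return field
    return None


def is_stale(transaction, now):
    """True when the transaction is older than the pruning window."""
    started = datetime.fromisoformat(transaction["timestamp_utc"])
    return now - started > TRANSACTION_TTL


# Usage: the user answers the park question from the example transaction.
transaction = {
    "phone_number": "+11234567890",
    "timestamp_utc": "2022-09-12T02:58:00.123456+00:00",
    "expected_responses": ["Six Flags Great Adventure", "Six Flags Great America"],
    "watch": {"park_name": None, "ride_name": "Batman", "target_wait_minutes": None},
}
choice = matched_response(transaction, " six flags great america ")
if choice is not None:
    transaction["watch"]["park_name"] = choice
```

When `next_missing_field` returns None, the watch would be ready to move to the "Watches" table.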

List All of a User's Watches

Related to #4 - if no park name, ride name, or wait time... this is probably what's being requested.

Request: What rides am I watching?
Response: You are watching: Maverick, under 20 min (currently 45); Steel Vengeance, under 30 min (currently 60).
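
The response format above could be produced by something like the following sketch. The dictionary keys (`ride_name`, `target_wait`, `current_wait`) and the wording for the empty case are assumptions, not part of the current code.

```python
def format_watch_list(watches):
    """Render a user's watches as a single SMS reply."""
    if not watches:
        # Assumed wording; the issue does not specify the empty case.
        return "You are not watching any rides."
    parts = [
        f"{w['ride_name']}, under {w['target_wait']} min (currently {w['current_wait']})"
        for w in watches
    ]
    return "You are watching: " + "; ".join(parts) + "."


reply = format_watch_list([
    {"ride_name": "Maverick", "target_wait": 20, "current_wait": 45},
    {"ride_name": "Steel Vengeance", "target_wait": 30, "current_wait": 60},
])
```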

Deduce Park Name from Existing Watches

Similar to #3, #15, probably others - if the user already has watches open, they shouldn't need to include the park name on subsequent requests.

As a future enhancement, it would be really cool to cache the park they're visiting in a separate table - related to #13. This way only the first request of the day needs to include park name.

Robust Error Handling

If the Watch Lambda errors out, it is unable to respond to the end user via Twilio. This is because the default error response is in JSON format and Twilio requires an application/xml TwiML response. I have a few ideas to make this work, probably need to just try them.

Idea 1 - Likely impossible

~~Link the Watch Lambda to the SMS Topic with an event destination. Not sure if this can pass data with the failure message. If it can, this may be the easiest way.~~

Idea 2 - More possible, but ugly

Figure out how to catch a Lambda function failure inside the function itself and respond with a different TwiML. Not sure if this is possible, but it's probably some sort of C signal or something that we could catch.

EDIT: I guess we could just wrap the whole thing in a try/catch, log the error, and return a canned answer... this is probably the most feasible solution, but it's definitely the ugliest.
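
The try/catch approach could be kept out of the handler body with a decorator, roughly as sketched below. This assumes the handler returns a TwiML string directly; the actual return shape depends on how the API Gateway integration is configured. The function names and the canned message are hypothetical.

```python
import functools
import logging
from xml.sax.saxutils import escape

logger = logging.getLogger(__name__)


def twiml_message(text):
    """Wrap plain text in a minimal TwiML reply."""
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            f"<Response><Message>{escape(text)}</Message></Response>")


def twiml_on_error(handler):
    """Catch any exception from the handler and reply with a canned TwiML."""
    @functools.wraps(handler)
    def wrapper(event, context=None):
        try:
            return handler(event, context)
        except Exception:
            logger.exception("Watch Lambda failed")  # keep the traceback in the logs
            return twiml_message("Sorry, something went wrong. Please try again.")
    return wrapper


@twiml_on_error
def watch_handler(event, context=None):
    raise KeyError("park_name")  # simulate an unexpected failure


reply = watch_handler({})
```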

Idea 3 - Most possible, but least customizable

~~Configure an API Gateway default response for failures. If the Lambda returns a 4xx or 5xx code, send the user a canned TwiML saying something went wrong.~~

Delete Watches Without Requiring Park Name

Related to #3 and #10 - watches must be unique by phone number + ride name. We can query all of a user's watches when deleting to determine which is being closed without needing park name / park id.

Notification Lambda Says "0 Minutes" for Closed Rides

When a ride closes due to downtime, the Notification Lambda incorrectly notifies the user of a "0 minute" wait. This should instead do nothing, as the ride may reopen, or follow the extension / expiration logic we've already implemented.
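
One possible guard, assuming each ride record carries an open/closed flag next to its wait time. The `is_open` and `wait_time` keys are assumptions about the stored ride JSON, and this sketch only covers the "do nothing" option.

```python
def should_notify(ride, target_wait):
    """Notify only when the ride is open and its wait is under the target."""
    if not ride.get("is_open", False):
        return False  # a closed ride reports 0 minutes; that is not a short line
    return ride["wait_time"] < target_wait
```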

Improve Natural Language Processing

In #2, we left an item for later implementation. There are also things that could be improved otherwise.

Extract a time keyword such as "hours" or "minutes." Convert to minutes on the backend.
#12
Implement actual NLP rather than just fuzzy matching everything.
#13
Properly extract an action (create, update, delete) rather than using regular expressions.
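
For the time keyword item, a regex-based sketch like the one below might be enough as a first pass. The function name and the accepted abbreviations are assumptions; it only handles a single integer followed by a unit.

```python
import re

# Minutes per unit; covers the keywords named in the issue plus common abbreviations.
_UNIT_MINUTES = {"hour": 60, "hr": 60, "minute": 1, "min": 1}
_WAIT_PATTERN = re.compile(r"(\d+)\s*(hours?|hrs?|minutes?|mins?)\b", re.IGNORECASE)


def extract_wait_minutes(message):
    """Return the first wait time in the message, converted to minutes, or None."""
    match = _WAIT_PATTERN.search(message)
    if match is None:
        return None
    amount = int(match.group(1))
    unit = match.group(2).lower().rstrip("s")  # "hours" -> "hour", "mins" -> "min"
    return amount * _UNIT_MINUTES[unit]
```

Mixed expressions such as "1 hour 30 minutes" would need more work; this version returns only the first match.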

EDIT: It may be worth exploring AWS Comprehend - this is cheaper than I thought.

Update a Watch if it Already Exists

Watches are assigned a unique UUID for watch_id, but they also must be unique for the combination of phone_number and ride_id. It would be nice to update/extend a watch if a duplicate request is created. Currently, the 2nd request is being rejected and a rejection message is returned to the end user.

Do Not Require Park Name If Ride Name Is Unique

Kind of a weird one, kind of related to #2. This was a feature I really wanted in Firewatch that just wasn't feasible. If a user requests a watch for a ride with a unique name - it's the only ride in the world with that exact name - they shouldn't have to supply the park name.

Request that should work:
Watch Steel Vengeance for a line shorter than 30 minutes.

Request that should fail, because the ride name is a duplicate:
Tell me when Boomerang has a wait under 1 hour.
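
The uniqueness check could work off an index from ride name to parks, sketched below. The park and ride data is illustrative only (the two "Example Park" entries are made up), and how the index would be built from the stored wait time data is left open.

```python
from collections import defaultdict


def build_ride_index(parks):
    """Map each case-folded ride name to the set of parks that have it."""
    index = defaultdict(set)
    for park_name, ride_names in parks.items():
        for ride_name in ride_names:
            index[ride_name.casefold()].add(park_name)
    return index


def deduce_park(index, ride_name):
    """Return the only park with this ride, or None if missing or ambiguous."""
    parks = index.get(ride_name.casefold(), set())
    return next(iter(parks)) if len(parks) == 1 else None


# Illustrative data, not the real park list.
index = build_ride_index({
    "Cedar Point": ["Steel Vengeance", "Maverick"],
    "Example Park A": ["Boomerang"],
    "Example Park B": ["Boomerang"],
})
```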

Looking at the overall architecture, this will probably require us to ditch the S3 storage in favor of more Dynamo tables. I'm not thrilled about the idea because the current S3 approach is a lot more efficient than parsing every park's entire wait time JSON every 5 minutes. Maybe a cronjob can periodically update a Dynamo table that we use as a cache?

Make Park and Ride Names Case-Insensitive

The database queries currently fail for case sensitivity issues. You have to ask for GhostRider because Ghostrider doesn't exactly match any ride name at Knott's Berry Farm. Pls fix.

Twilio API Request Validation

Twilio sends signed requests with the X-Twilio-Signature header. The API endpoint should be modified to validate this signature before processing the request. ~~It should probably also use an API key to ensure only Twilio can send requests.~~

EDIT: this will require some modifications to the API Gateway integration - only the application/x-www-form-urlencoded payload is being converted to JSON and passed to Lambda; the X-Twilio-Signature (and all other headers) are getting left out by the translation at the API Gateway -> Lambda step.
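
For reference, a standard-library sketch of the signature check, based on my understanding of Twilio's documented scheme for form-encoded POSTs: HMAC-SHA1 over the full URL plus the sorted parameter names and values, keyed with the account auth token, then Base64. The token, URL, and function names below are placeholders. Twilio's official helper library ships its own request validator, which is probably the safer choice in production.

```python
import base64
import hashlib
import hmac


def compute_twilio_signature(auth_token, url, params):
    """Sign the URL plus the sorted form parameters with HMAC-SHA1."""
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(auth_token, url, params, signature_header):
    """Compare signatures in constant time to avoid timing leaks."""
    expected = compute_twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected, signature_header)


# Placeholder values for illustration only.
auth_token = "example-auth-token"
url = "https://example.com/sms"
params = {"From": "+11234567890", "Body": "Watch Batman at Six Flags."}
signature = compute_twilio_signature(auth_token, url, params)
```

The URL has to match what Twilio actually called, so the Lambda needs the original request URL and headers, which is exactly what the API Gateway mapping currently drops.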

Natural Language Processing

Incoming messages currently have to be in the format:

exact park name
exact ride name
integer wait time

This should be replaced with natural language processing in a few parts:

Extract park name, ride name, and wait time
Use fuzzy matching to allow partial names

Canceled:

~~Extract "hours," "minutes," etc, then convert~~

We did something like this for Firewatch, so I know it's possible. Even a naive implementation would be a huge improvement.

Fix Park Name and Ride Name Detection to Handle Edge Cases

I found an edge case for the current park name / ride name implementation: "Adventure Island" takes precedence over "Islands of Adventure at Universal Orlando" if you search by "Islands of Adventure." This is obviously not the intended behavior. We are using token_set_ratio() to pick the best match; we should match by a composite score that also includes token_sort_ratio().

Cancel All of a User's Watches in Bulk

Related to #10 - if a user wants to open watches for a different park, they will have to close all of their watches first. This currently requires a separate message for each watch (i.e., 10 messages for 10 watches). It would be nice to close all of a user's watches for a given park with one message.

Dependent on #13 - I don't want to delete all of a user's watches without confirming that's what they intended to do. However, this will require some semblance of state tracking, which is what #13 aims to resolve.

Get All Rides With a Short Line Right Now

This is a little more complicated than it seems because Queue-Times tracks every single ride on the park's app. For some parks, such as Cedar Point, that includes flat rides. Some parks' JSONs delineate "Coasters" into their own grouping, but some do not. What I don't want this to do is tell the user every single ride in the park.

Request: What rides at Cedar Point have a wait under 15 minutes right now?
Response: You can ride Blue Streak (5), Gemini (10), or Magnum XL-200 (10)!
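
A sketch of the filtering and formatting for that response, assuming each ride record has `name`, `wait_time`, and `is_open` keys (assumed names). It does not address the coaster versus flat ride problem; it simply lists every open ride under the threshold.

```python
def join_with_or(items):
    """Join items as 'a', 'a or b', or 'a, b, or c'."""
    if len(items) <= 2:
        return " or ".join(items)
    return ", ".join(items[:-1]) + ", or " + items[-1]


def short_line_reply(rides, max_wait):
    """List open rides whose wait is under max_wait, shortest first."""
    short = sorted(
        (r for r in rides if r["is_open"] and r["wait_time"] < max_wait),
        key=lambda r: (r["wait_time"], r["name"]),
    )
    if not short:
        # Assumed wording for the empty case.
        return f"No rides have a wait under {max_wait} minutes right now."
    labels = [f"{r['name']} ({r['wait_time']})" for r in short]
    return "You can ride " + join_with_or(labels) + "!"


# Illustrative wait times, not live data.
rides = [
    {"name": "Magnum XL-200", "wait_time": 10, "is_open": True},
    {"name": "Maverick", "wait_time": 0, "is_open": False},
    {"name": "Blue Streak", "wait_time": 5, "is_open": True},
    {"name": "Steel Vengeance", "wait_time": 60, "is_open": True},
    {"name": "Gemini", "wait_time": 10, "is_open": True},
]
reply = short_line_reply(rides, 15)
```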

Maybe as a rudimentary implementation, just return a link to the park on Queue-Times for the user to explore themselves.

Delete a Watch

Surprised I didn't have an issue for this already... definitely need functionality to delete a watch via SMS. Could be done quick & dirty with a keyword, but a proper solution depends on #2.

Check Wait Time Without Creating a Watch

Another feature I wanted in Firewatch that was just too painful to build. Sometimes you just want to know the wait time for a ride - it would be great to query this and respond to a text without creating a watch.

Example request:
How long is the line for Ride of Steel at Six Flags Darien Lake?

Example response:
The line for Ride of Steel is currently 15 minutes.

As a naive implementation: if the user doesn't include a target wait time, assume they're querying the ride's current wait.
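
The naive rule could sit in a small dispatcher keyed on which fields were extracted, sketched below. The action names are hypothetical, and the "nothing extracted means list watches" branch comes from the "List All of a User's Watches" issue.

```python
def classify_request(park_name, ride_name, target_wait):
    """Pick an action from which fields the parser managed to extract."""
    if park_name is None and ride_name is None and target_wait is None:
        return "list_watches"   # nothing extracted: "What rides am I watching?"
    if ride_name is not None and target_wait is None:
        return "check_wait"     # ride but no target: report the current wait
    if ride_name is not None:
        return "create_watch"   # ride and target wait are both present
    return "unknown"
```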

Only Allow Watches for One Park Per User

This was a feature I explored for Firewatch that ended up being ditched - I like the idea of only allowing the user to watch rides at one park at any given time. If the user has a watch open, only allow additional watches to be opened for the same park.

As a side effect, this could help with #3, because we'd already know what park the user is visiting. This also doesn't require any additional database querying, as we're already checking to see if the user has a watch open - we can just grab the park data from there.
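
A sketch of how the one-park rule and the park deduction could share one function, assuming the user's open watches have already been fetched as a list of dicts with a `park_name` key (an assumed name). Raising `ValueError` is just a placeholder for whatever rejection message is sent back.

```python
def resolve_park(requested_park, open_watches):
    """Return the park a new watch should use under the one-park rule.

    Raises ValueError when the request conflicts with the user's open watches
    or when no park can be determined at all.
    """
    current_parks = {watch["park_name"] for watch in open_watches}
    if not current_parks:
        if requested_park is None:
            raise ValueError("park name required for the first watch")
        return requested_park
    current = next(iter(current_parks))  # the rule guarantees at most one park
    if requested_park is None or requested_park == current:
        return current
    raise ValueError(f"watches are already open at {current}")


open_watches = [{"park_name": "Cedar Point", "ride_name": "Maverick"}]
deduced = resolve_park(None, open_watches)
```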
