levwtech / wa-gpt

💬 Chat, Create Stickers & Images, Send Voice Notes, Files, and More! 🎨 Start now for free! ✨

Home Page: https://whatsapp-assistant.com

JavaScript 100.00%

aws dynamodb lambda-functions nodejs s3 openai-api dalle-3 gpt gumroad

wa-gpt's Introduction

WhatsApp AI Assistant

Chat, Create Stickers & Images, Send Voice Notes, Files, and More!

Features

Users can easily communicate with friends and family using visuals, sending stickers or images instantly to convey ideas, emotions, and more, enhancing the conversation with creativity and expression. 🌟💬🎨
The product leverages the most advanced AI models' capabilities to answer questions, tell stories, and perform diverse tasks seamlessly, offering a blend of knowledge and creativity. 🤖
It has a built-in feature to generate images using the /image command, allowing users to visualize their ideas instantly. 🖼️✨
Users can create custom stickers effortlessly with the /sticker command, adding a fun and personalized touch to their conversations. 🎉📌
Both image and sticker generation functionalities are text-based, making it incredibly easy for users to express themselves creatively. ✏️🎨
With its magical capabilities, users can type anything and receive a corresponding sticker, fostering limitless creativity and engagement. 🌟🔮
The chatbot seamlessly understands voice notes, enabling users to send voice messages on the fly, transforming communication into a convenient and interactive experience. 🔊

Technologies Used

Node.js
AWS Lambda
DynamoDB
API Gateway
S3
Sharp for Image Processing
OpenAI GPT, DALL-E and Whisper
Gumroad APIs for payment
GitHub Actions for automatic AWS deployments
Next.js for Frontend

Website URL

whatsapp-assistant.com

Open Source

Curious about the journey behind WhatsApp AI Assistant and why it was open-sourced? Check out the detailed story and technical dive in this LinkedIn article.

wa-gpt's Issues

Enhance the prompt of sticker generation

Find a prompt that prevents text from appearing on the image. Negative prompts such as "no words" don't work with DALL-E 2; try "visual" words, or consider giving the prompt to GPT-3 text generation and asking it to write the sticker generation prompt for DALL-E.

Encourage the assistant to use emojis

Prompt request take only N latest messages

Lower the message expiration time to hours rather than days.
For GPT requests, consider not sending the entire conversation: if it has more than N messages, send only the last N. Some conversations have 300+ messages sent with each request, so study the performance impact.
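
The trimming step could look something like the sketch below. The function name `takeLatestMessages` is hypothetical, and the sketch assumes the messages are already ordered oldest to newest; it is an illustration, not the project's actual code.

```javascript
// Hypothetical helper: keep only the `n` most recent messages.
// Assumes `messages` is ordered from oldest to newest.
function takeLatestMessages(messages, n) {
  if (!Number.isInteger(n) || n <= 0) return [];
  // slice(-n) returns the whole array when it has fewer than n items.
  return messages.slice(-n);
}

// Example: a long conversation trimmed to its last 3 messages.
const conversation = Array.from({ length: 300 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `message ${i + 1}`,
}));
const trimmed = takeLatestMessages(conversation, 3);
console.log(trimmed.map((m) => m.content));
```

If a system message has to be kept, it would need to be prepended separately after trimming.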

stickers & images generated multiple times sometimes

change image generation model

dall-e-2 is not so good for image/sticker generation.

consider:

dall-e-3: move to tier one by paying at least 5 USD. Once on the higher tier, stickers will need to be resized to 512x512 afterwards, as this size is refused for dall-e-3.
Gemini, once image generation is live: https://ai.google.dev/pricing?hl=en
Midjourney: https://docs.midjourney.com/docs/plans
Sadly, there is no pay-as-you-go pricing model :(
The first plan has limited generations (about 200 a month), so we can't work with it.
The second plan is fine but a bit costly (30 USD).
Deploy your own AI model:
Can use juggernaut + a sticker LoRA from https://civit.ai/ and run it on a serverless GPU at runpod.io
But this is the last option.

Some messages are sent multiple times over the webhook

This happens because when Meta's webhook request does not receive a response with a statusCode of 200, Meta tries to send the message again over the next 7 days, according to its retry mechanism.

Therefore, find a way to always return statusCode 200, even on errors.

Alternatively, save the messageId and ignore the message if it already exists in the conversation. Identify all possible errors and find a way to resolve every case to prevent duplicate messages.
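
Both ideas can be combined in one handler, roughly as sketched below. Everything here is an assumption for illustration: the flat `messageId` field simplifies the real webhook payload, the in-memory `Set` stands in for a persistent lookup such as a DynamoDB table, and `processMessage` is a hypothetical callback.

```javascript
// In-memory stand-in for a persistent store of already-seen message IDs.
const seenMessageIds = new Set();

// Hypothetical Lambda-style handler: always answers 200 so the sender
// has no reason to retry, and skips messages that were already seen.
async function handleWebhook(event, processMessage) {
  try {
    const body = JSON.parse(event.body || "{}");
    const messageId = body.messageId; // assumed field name
    if (messageId && seenMessageIds.has(messageId)) {
      return { statusCode: 200, body: "duplicate ignored" };
    }
    if (messageId) seenMessageIds.add(messageId);
    await processMessage(body);
  } catch (err) {
    // Log and swallow the error: the response below is still a 200.
    console.error("webhook processing failed:", err.message);
  }
  return { statusCode: 200, body: "ok" };
}
```

In this sketch a message whose processing failed is still marked as seen, so a later redelivery would be ignored rather than reprocessed.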

Rate limit at organisation level

OpenAI's rate limiting is applied at the organisation level, not user level.

If you have just started using the API, the rate limit for DALL-E 2 is about 5 images per minute.

This limit is shared across all users; it is not applied per user.

OpenAI suggests on its website using a backoff algorithm, but I don't think this is a good idea, as each failed attempt contributes to the rate limit.

I suggest creating an SQS message queue whose consumption is adjusted to the rate limit.

Each message in the queue represents a request that needs to be sent to DALL-E 2.

The Lambda function that consumes the messages should process only 5 requests per minute. This can be done using different methods:

Create a DynamoDB table and insert each message into it after it has been processed, with a TTL field of 1 minute. If at any given time this table already has 5 items, don't process the message and keep it in the queue.
Create a scheduled event on the SQS consumer function to run every minute and process 5 requests. The function should run only on that 1-minute timer event, and not be triggered on SQS additions.

This will require you to have the sendMessage logic in another function that also consumes from an SQS queue, because the consumer of the DALL-E SQS queue will send to the sendMessage SQS queue. You don't need to worry about Meta/WhatsApp's rate limit for this one.
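
The capacity check behind the first method can be sketched as a pure function. This is only an illustration under assumptions: `takeWithinRateLimit` is a hypothetical name, and the `processedAt` array of timestamps stands in for the items that would live in the DynamoDB table.

```javascript
// Hypothetical helper: decide how many queued requests may be processed now
// without exceeding `limit` requests per `windowMs` milliseconds.
// `processedAt` holds the timestamps (ms) of previously processed requests.
function takeWithinRateLimit(queue, processedAt, nowMs, limit = 5, windowMs = 60_000) {
  // Only requests processed inside the current window count against the limit.
  const recent = processedAt.filter((t) => nowMs - t < windowMs);
  const capacity = Math.max(0, limit - recent.length);
  return {
    toProcess: queue.slice(0, capacity),
    remaining: queue.slice(capacity), // these stay in the queue
  };
}

// Example: 7 queued requests, one request processed 10 seconds ago
// and one processed 70 seconds ago (outside the window).
const now = Date.now();
const queued = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"];
const batch = takeWithinRateLimit(queued, [now - 10_000, now - 70_000], now);
console.log(batch.toProcess.length, batch.remaining.length);
```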

conversations db performance concerns

Consider sorting the messages on db in get messages query and not on the server, see if that would result in faster response times.
Consider removing id field and putting the ttl as sort key instead.
Consider a conversation table with a field of messages that is a list, queries will be super quick because of a single document with a unique number key, but how do we apply TTL on the list elements? How about a daily Cron job that filters the list inside each conversation?
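
For the first point, a DynamoDB Query returns items ordered by sort key, and `ScanIndexForward: false` reverses that order, so the newest messages can come back first without sorting on the server. The sketch below only builds the request parameters in the plain-value form used by the document client; the table name `conversations`, the partition key `userNumber`, and the assumption that the sort key is time-based are all hypothetical.

```javascript
// Hypothetical parameter builder for a DynamoDB Query (document-client style).
// Assumes a table keyed by `userNumber` with a time-based sort key.
function buildLatestMessagesQuery(userNumber, limit) {
  return {
    TableName: "conversations", // assumed table name
    KeyConditionExpression: "userNumber = :u",
    ExpressionAttributeValues: { ":u": userNumber },
    ScanIndexForward: false, // descending sort-key order: newest first
    Limit: limit, // stop after `limit` items
  };
}

const queryParams = buildLatestMessagesQuery("15550001111", 20);
console.log(queryParams);
```

Because the results arrive newest first, they would need to be reversed before being sent to the model in chronological order.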

Create users dynamoDB table

Each user document should have 1) userNumber, the partition key that uniquely identifies the user, 2) the number of tokens the user has, and 3) a subscribed flag.

When the user first joins, assign them 20 tokens. Treat each text prompt as 1 token and each media prompt as 4 tokens. Once the user's tokens reach 0, and if the subscribed flag is false, reply to every message with the payment link.
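
The token rules can be sketched as follows. The function names are hypothetical, and two behaviours are assumptions not stated above: subscribed users are never charged, and a user with fewer tokens than a prompt costs is still served, with the balance clamped to 0.

```javascript
const INITIAL_TOKENS = 20;
const PROMPT_COST = { text: 1, media: 4 };

// Hypothetical shape of a new user document.
function createUser(userNumber) {
  return { userNumber, tokens: INITIAL_TOKENS, subscribed: false };
}

// Returns whether the prompt may be served, plus the updated user.
// When `allowed` is false the caller would reply with the payment link.
function chargeUser(user, promptType) {
  const cost = PROMPT_COST[promptType];
  if (cost === undefined) throw new Error(`unknown prompt type: ${promptType}`);
  if (user.subscribed) return { allowed: true, user }; // assumption: no charge
  if (user.tokens <= 0) return { allowed: false, user };
  // Assumption: clamp at 0 rather than rejecting a partially covered prompt.
  return { allowed: true, user: { ...user, tokens: Math.max(0, user.tokens - cost) } };
}

// Example: five media prompts use up the 20 starting tokens.
let demoUser = createUser("15550001111");
for (let i = 0; i < 5; i++) demoUser = chargeUser(demoUser, "media").user;
console.log(demoUser.tokens, chargeUser(demoUser, "text").allowed);
```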

check S3 and ensure the delete policy is applied

All stickers in the bucket should be deleted after they have been sent to Meta's servers; there is no reason to retain them.

Remove background from stickers

Current sticker implementation is fine for a v1.0

Remove the /phone endpoint and API Gateway, and attach the Lambda URL directly as the webhook to cut costs

handle empty sticker/image messages

New users send /sticker or /image without any text.

Send them a message saying that they must provide a text.
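
A minimal sketch of that check, assuming the /sticker and /image commands from the feature list; the function name `parseMediaCommand` and the wording of the reply are hypothetical.

```javascript
// Hypothetical reply text for a command sent without a description.
const MISSING_TEXT_REPLY =
  "You must provide a text, for example: /sticker a happy cat";

// Classifies an incoming message as a media command, an empty media
// command (error), or a normal chat message.
function parseMediaCommand(message) {
  const match = /^\/(sticker|image)\b\s*([\s\S]*)$/i.exec(message.trim());
  if (!match) return { type: "chat" };
  const text = match[2].trim();
  if (text === "") return { type: "error", reply: MISSING_TEXT_REPLY };
  return { type: match[1].toLowerCase(), text };
}

console.log(parseMediaCommand("/sticker"));
console.log(parseMediaCommand("/image a red bicycle"));
```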

multilingual chat bot

Make the bot multilingual.

GPT can already respond in any language, but you need to determine the user's language to send the automatic messages accordingly. Automated messages such as "Thank you for subscribing", "Your free trial has ended", etc. should each be an object with ar, en, etc. fields, and each user has a lang field stored. You get the message using message[user.lang] or getMessage(user.lang), and user.lang always defaults to en.

When creating a new user, check whether the first message is one of the supported starting messages and pick the language accordingly. If the message is not one of the 3 starting messages, use this library to detect the language: https://www.npmjs.com/package/languagedetect. If the detected language is not one of the supported languages, default to English. Store the result as user.lang.
Add another command such as /lang arabic to change the user's language. Accommodate all possible values such as arabic, Arabic, عربي, AR, etc. and respond with "Your language has been changed". Add this instruction in the system message.
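
The message lookup and the /lang value handling might be sketched like this. The message keys, the alias table, and the function names are hypothetical, only two languages are shown, and the Arabic strings are placeholder translations. Language detection with the languagedetect package is left out of the sketch.

```javascript
// Hypothetical automated messages, one object per message with a field per language.
const MESSAGES = {
  subscribed: { en: "Thank you for subscribing", ar: "شكرا لاشتراكك" },
  trialEnded: { en: "Your free trial has ended", ar: "انتهت الفترة التجريبية المجانية" },
};

// Accepted spellings for the /lang command, mapped to language codes.
const LANG_ALIASES = new Map([
  ["en", "en"], ["english", "en"],
  ["ar", "ar"], ["arabic", "ar"], ["عربي", "ar"],
]);

// Returns a supported language code, or null when the value is not recognised.
function normalizeLang(input) {
  return LANG_ALIASES.get(String(input).trim().toLowerCase()) || null;
}

// Looks up a message for the user's language, falling back to English.
function getMessage(name, lang) {
  const entry = MESSAGES[name];
  return Object.prototype.hasOwnProperty.call(entry, lang) ? entry[lang] : entry.en;
}

console.log(normalizeLang("Arabic"), getMessage("trialEnded", "fr"));
```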

Payment link

The payment link should uniquely identify the user. Consider a PayPal invoice if successful payment notifications are received via webhook, or consider Gumroad's tier creation.

On successful subscriptions from the webhook, if token based, give N extra tokens (N depends on the payment amount, which is still to be determined).
If subscription based (like Gumroad), set the subscribed flag to true; when you receive a cancellation notification on the webhook, set it back to false.
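
Either variant reduces to a small update of the user document, sketched below. The event shape and the event type names are hypothetical and do not reflect any payment provider's real webhook payload.

```javascript
// Hypothetical reducer: applies a payment webhook event to a user document.
// Event types and fields are illustrative, not a real provider's schema.
function applyPaymentEvent(user, event) {
  switch (event.type) {
    case "tokens_purchased":
      return { ...user, tokens: user.tokens + event.tokens };
    case "subscription_started":
      return { ...user, subscribed: true };
    case "subscription_cancelled":
      return { ...user, subscribed: false };
    default:
      return user; // unknown events leave the user unchanged
  }
}

const payingUser = { userNumber: "15550001111", tokens: 0, subscribed: false };
const afterStart = applyPaymentEvent(payingUser, { type: "subscription_started" });
const afterCancel = applyPaymentEvent(afterStart, { type: "subscription_cancelled" });
console.log(afterStart.subscribed, afterCancel.subscribed);
```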

Real Time Information

A GPT function checks whether the request requires real-time information and, if so, returns a search term.
If it does, we send the search term to the Google API.
We then send the results to GPT-4o together with the original question and return the response.
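
The three steps could be wired together roughly as below. The three dependencies (`classify`, `search`, `complete`) are hypothetical injected functions standing in for the GPT function call, the Google API request, and the final GPT-4o call; no real API is invoked in this sketch.

```javascript
// Hypothetical orchestration of the real-time information flow.
// deps.classify(question) -> { needsSearch: boolean, searchTerm: string }
// deps.search(term)       -> array of search results
// deps.complete(q, res)   -> final answer text
async function answerWithRealTimeInfo(question, deps) {
  const { needsSearch, searchTerm } = await deps.classify(question);
  if (!needsSearch) return deps.complete(question, []);
  const results = await deps.search(searchTerm);
  return deps.complete(question, results);
}

// Stub dependencies, used only to show the control flow.
const stubDeps = {
  classify: async (q) => ({ needsSearch: q.includes("today"), searchTerm: q }),
  search: async (term) => [`result for: ${term}`],
  complete: async (q, results) => `${q} | sources: ${results.length}`,
};
```

Keeping the external calls behind injected functions makes the flow easy to exercise with stubs before wiring in real clients.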