
wa-gpt's Introduction

WhatsApp AI Assistant

Chat, Create Stickers & Images, Send Voice Notes, Files, and More!

Features

  • Users can communicate with friends and family visually, instantly sending stickers or images to convey ideas and emotions and adding creativity and expression to the conversation. 🌟💬🎨
  • The assistant uses OpenAI's GPT models to answer questions, tell stories, and perform a wide range of tasks, blending knowledge and creativity. 🤖
  • A built-in /image command generates images from a text description, so users can visualize their ideas instantly. 🖼️✨
  • The /sticker command creates custom stickers effortlessly, adding a fun, personal touch to conversations. 🎉📌
  • Both image and sticker generation are text-based, which makes it easy for users to express themselves creatively. ✍️🎨
  • Users can type anything and receive a matching sticker, encouraging limitless creativity and engagement. 🌟🔮
  • The chatbot understands voice notes, so users can send voice messages on the fly for a more convenient, interactive experience. 🔊

Technologies Used

  • Node.js
  • AWS Lambda
  • DynamoDB
  • API Gateway
  • S3
  • Sharp for image processing
  • OpenAI GPT, DALL-E and Whisper
  • Gumroad APIs for payments
  • GitHub Actions for automatic AWS deployments
  • Next.js for Frontend

Website URL

whatsapp-assistant.com

Open Source

Curious about the journey behind WhatsApp AI Assistant and why it was open-sourced? Check out the detailed story and technical deep dive in this LinkedIn article.

wa-gpt's People

Contributors

levwtech

wa-gpt's Issues

Enhance the prompt of sticker generation

Find a prompt that prevents text from appearing on the image. Negative prompts such as "no words" don't work with DALL-E 2; try "visual" words instead, or consider giving the request to GPT-3 text generation and asking it to write the sticker-generation prompt for DALL-E.
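For the GPT-3 route, a minimal sketch of the request the bot could send to a text model, asking it to write a visuals-only sticker prompt. The system wording, the default model name `gpt-3.5-turbo` and the helper `buildStickerPromptRequest` are assumptions; the function only builds a chat-completions-style request body and does not call any API.

```typescript
// Hypothetical helper: builds a chat request asking a text model to write a
// DALL-E sticker prompt that describes only visual elements (no text on the image).
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

function buildStickerPromptRequest(userIdea: string, model: string = "gpt-3.5-turbo"): ChatRequest {
  // Negations work for the text model even though they fail as DALL-E 2 negative prompts,
  // so the "no text" constraint is stated here instead of in the image prompt.
  const system =
    "You write prompts for an image model that generates stickers. " +
    "Describe only visual elements: subject, pose, colours, outline and background. " +
    "Never mention words, letters, captions or signs, because the image model tends to render them.";
  return {
    model,
    messages: [
      { role: "system", content: system },
      { role: "user", content: `Sticker idea: ${userIdea.trim()}` },
    ],
  };
}
```

The model's reply would then be passed unchanged as the DALL-E prompt, keeping the user's raw wording (which may contain text to be rendered) out of the image request.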

Prompt requests should take only the N latest messages

  • Make the message expiration time shorter: hours rather than days.

  • For GPT requests, don't send the entire conversation when it is longer than N messages; send only the last N. Some conversations can have 300+ messages sent with each request, so study the performance impact of this.
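A minimal sketch of both points, assuming messages are stored oldest first with a DynamoDB-style `ttl` attribute in epoch seconds. `MAX_HISTORY`, `EXPIRY_HOURS`, `lastN` and `expiryFromNow` are hypothetical names, and the values 20 and 6 are placeholders, since the issue leaves N and the expiry window open.

```typescript
interface StoredMessage {
  role: "user" | "assistant";
  content: string;
  ttl: number; // epoch seconds after which DynamoDB may delete the item
}

// Placeholder values: the issue does not fix N or the expiry window.
const MAX_HISTORY = 20;
const EXPIRY_HOURS = 6;

// Expiry timestamp in epoch seconds, the format DynamoDB TTL expects.
function expiryFromNow(nowMs: number, hours: number = EXPIRY_HOURS): number {
  return Math.floor(nowMs / 1000) + hours * 3600;
}

// Keep only the newest `n` messages (input assumed sorted oldest to newest).
// The n <= 0 guard matters because slice(-0) would return the whole array.
function lastN<T>(messages: T[], n: number = MAX_HISTORY): T[] {
  return n <= 0 ? [] : messages.slice(-n);
}
```

Trimming before building the GPT request keeps the request size bounded regardless of how long the stored conversation grows.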

change image generation model

DALL-E 2 does not give good results for image and sticker generation.

Options to consider:

  1. DALL-E 3: move to usage tier 1 by paying at least 5 USD. Once on that tier, stickers will need to be resized to 512x512 after generation, because the DALL-E 3 API does not accept 512x512 as an output size.

  2. Gemini, once its image generation is live: https://ai.google.dev/pricing?hl=en

  3. Midjourney https://docs.midjourney.com/docs/plans
    Sadly, there is no pay-as-you-go pricing model :(
    The first plan has limited generations (about 200 a month), so we can't work with it.
    The second plan is fine but a bit costly (30 USD).

  4. Deploy your own AI model:
    We could use Juggernaut plus a sticker LoRA from https://civit.ai/ and run it on a serverless GPU at runpod.io.
    This is the last option, though.

Some messages are sent multiple times over the webhook

This happens because, when Meta's webhook request does not receive a 200 status code in response, Meta retries delivering the message over the next 7 days, according to its retry mechanism.

Therefore, find a way to always return status code 200, even on errors.

Alternatively, save the messageId and ignore the message if it already exists in the conversation. Identify all possible errors and find a way to resolve each case to prevent duplicate messages.
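Combining both suggestions, a minimal sketch might look like this. The in-memory `Set` stands in for a lookup on the stored messageId, and `IncomingMessage`, `handleWebhook` and `processMessage` are hypothetical names, not part of the WhatsApp Cloud API.

```typescript
// Hypothetical shape of one incoming WhatsApp message after parsing the webhook body.
interface IncomingMessage {
  id: string; // the message id Meta includes with each message
  from: string;
  text: string;
}

interface HandlerResult {
  statusCode: number;
  body: string;
}

// Stand-in for a database lookup: a Lambda instance's memory is not shared
// across instances, so a real version would persist ids (e.g. in DynamoDB).
const seenMessageIds = new Set<string>();

async function handleWebhook(
  msg: IncomingMessage,
  processMessage: (m: IncomingMessage) => Promise<void>,
): Promise<HandlerResult> {
  // Duplicates are dropped but still acknowledged with 200 so Meta stops retrying.
  if (seenMessageIds.has(msg.id)) {
    return { statusCode: 200, body: "duplicate ignored" };
  }
  seenMessageIds.add(msg.id);
  try {
    await processMessage(msg);
  } catch (err) {
    // Log and swallow: returning a non-200 would trigger a retry and a duplicate reply.
    console.error("processing failed", err);
  }
  return { statusCode: 200, body: "ok" };
}
```

One trade-off: the id is recorded before processing, so a message that fails is acknowledged and never retried. That prevents duplicates, but failures then have to be surfaced through logging rather than Meta's retries.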

Rate limit at organisation level

OpenAI's rate limiting is applied at the organisation level, not user level.

If you have just started using the API, the rate limit for DALL-E 2 is about 5 images per minute.

This limit is shared by all users, not applied per user.

Even though OpenAI suggests using a backoff algorithm on their website, I don't think this is a good idea, since each failed attempt counts toward the rate limit.

I suggest creating an SQS message queue whose consumption rate is adjusted to the rate limit.

Each message in the queue represents a request that needs to be sent to DALL-E 2.

The Lambda function that consumes these messages should process only 5 requests per minute. This can be done in different ways:

  1. Create a DynamoDB table and insert each message into it after it has been processed, with a TTL of 1 minute. If at any given time the table holds 5 or more items, don't process the message and keep it in the queue.

  2. Create a scheduled event that runs the SQS consumer function every minute and processes 5 requests. The function should run only on that 1-minute timer, not be triggered by SQS additions.

This requires moving the sendMessage logic into another function that consumes from its own SQS queue, because the consumer of the DALL-E queue will send to the sendMessage queue; you don't need to worry about Meta/WhatsApp's rate limit for that one.
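Option 1 amounts to a sliding one-minute window. A minimal sketch, assuming the limit of 5 per minute quoted above; the timestamp array stands in for the DynamoDB table, and `MinuteWindowLimiter` and `tryAcquire` are hypothetical names. Note that DynamoDB TTL deletion is not immediate (expired items are removed in the background), so a real table would need to filter on the TTL attribute rather than count raw items, which is what the `filter` step models.

```typescript
const LIMIT_PER_MINUTE = 5; // the starting DALL-E 2 limit mentioned above
const WINDOW_MS = 60_000;

class MinuteWindowLimiter {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = LIMIT_PER_MINUTE, windowMs: number = WINDOW_MS) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if it may be sent now;
  // false means the SQS message should stay in the queue for later.
  tryAcquire(nowMs: number): boolean {
    // Ignore entries older than the window, as if their TTL had expired.
    this.timestamps = this.timestamps.filter((t) => nowMs - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}
```

Because only successful sends are recorded, rejected attempts never count against the organisation's quota, which is the advantage over client-side backoff argued above.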

conversations db performance concerns

  • Consider sorting the messages in the DB as part of the get-messages query rather than on the server, and check whether that results in faster response times.

  • Consider removing the id field and using the ttl as the sort key instead.

  • Consider a conversation table where each item holds a list field of messages. Queries would be very fast because each conversation is a single document keyed by the user's unique number, but how do we apply TTL to the list elements? Perhaps a daily cron job that filters the list inside each conversation?
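The list-per-conversation idea could be sketched as below, assuming each message carries its own `createdAt` and `expiresAt` in epoch seconds; `Conversation`, `ConversationMessage` and `pruneConversation` are hypothetical names. The function shows only what the daily cron job would do to one conversation; reading and writing the item are left out. One constraint worth weighing: a single DynamoDB item is limited to 400 KB, so very long conversations stored as one list could hit that ceiling.

```typescript
interface ConversationMessage {
  content: string;
  createdAt: number; // epoch seconds
  expiresAt: number; // epoch seconds
}

interface Conversation {
  userNumber: string; // partition key
  messages: ConversationMessage[];
}

// Drop expired messages and store the rest in chronological order,
// so later reads can return the list without sorting on the server.
// Returns a new object; the input conversation is not mutated.
function pruneConversation(conv: Conversation, nowSec: number): Conversation {
  const live = conv.messages
    .filter((m) => m.expiresAt > nowSec)
    .sort((a, b) => a.createdAt - b.createdAt);
  return { ...conv, messages: live };
}
```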

Create users dynamoDB table

Each user document should have: 1) userNumber, the partition key that uniquely identifies the user; 2) the number of tokens the user has; and 3) a subscribed flag.

When a user first joins, assign them 20 tokens. Treat each text prompt as 1 token and each media prompt as 4 tokens. Once the user's tokens reach 0 and the subscribed flag is false, reply to every message with the payment link.
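A minimal sketch of those token rules, assuming subscribed users are never charged and that a request the remaining balance cannot cover is refused (so a user with 2 tokens left cannot start a 4-token media prompt; the issue leaves this case open). `newUser`, `charge` and `PAYMENT_LINK` are hypothetical, and the link is a placeholder.

```typescript
interface User {
  userNumber: string; // partition key
  tokens: number;
  subscribed: boolean;
}

const STARTING_TOKENS = 20;
const TEXT_COST = 1;
const MEDIA_COST = 4;
const PAYMENT_LINK = "https://example.com/pay"; // placeholder, not a real payment URL

function newUser(userNumber: string): User {
  return { userNumber, tokens: STARTING_TOKENS, subscribed: false };
}

type ChargeResult = { allowed: true; user: User } | { allowed: false; reply: string };

// Returns the updated user when the prompt may run, or the reply to send instead.
function charge(user: User, kind: "text" | "media"): ChargeResult {
  if (user.subscribed) return { allowed: true, user }; // assumption: subscribers are unlimited
  const cost = kind === "media" ? MEDIA_COST : TEXT_COST;
  if (user.tokens < cost) {
    return { allowed: false, reply: `Your free tokens have run out. Subscribe here: ${PAYMENT_LINK}` };
  }
  return { allowed: true, user: { ...user, tokens: user.tokens - cost } };
}
```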

multilingual chat bot

Make the bot multilingual.

GPT can already respond in any language, but you need to determine the user's language to send the automated messages accordingly. Automated messages such as "Thank you for subscribing" and "Your free trial has ended" should each be an object with ar, en, etc. fields. Each user has a stored lang field, and you get the message using message[user.lang] or getMessage(user.lang); user.lang always defaults to en.

When creating a new user, check whether their first message is one of the supported starting messages and pick the language accordingly. If the message is not one of the 3 starting messages, use this library to detect the language: https://www.npmjs.com/package/languagedetect. If the detected language is not one of the supported languages, default to English, and store the result as user.lang. Add another command such as /lang arabic to change the user's language. Accommodate all possible values such as arabic, Arabic, عربي, AR, etc., and respond with "Your language has been changed". Add this instruction to the system message.
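A sketch of the message lookup and the /lang parsing, assuming English and Arabic as the supported languages (the issue mentions 3 starting languages without naming them). The message keys, the Arabic translations and the alias table are illustrative, and detection with languagedetect is left out.

```typescript
type Lang = "en" | "ar";

// Each automated message is an object with one field per supported language.
const MESSAGES = {
  subscribed: { en: "Thank you for subscribing!", ar: "شكراً لاشتراكك!" },
  trialEnded: { en: "Your free trial has ended.", ar: "انتهت فترتك التجريبية المجانية." },
  langChanged: { en: "Your language has been changed.", ar: "تم تغيير لغتك." },
} as const;

type MessageKey = keyof typeof MESSAGES;

// user.lang defaults to English when it is missing.
function getMessage(key: MessageKey, lang: Lang = "en"): string {
  return MESSAGES[key][lang];
}

// Aliases are matched after lower-casing, so "Arabic", "ARABIC" and "AR" all resolve.
const LANG_ALIASES: Record<string, Lang> = {
  en: "en",
  english: "en",
  ar: "ar",
  arabic: "ar",
  "عربي": "ar",
  "العربية": "ar",
};

// Parses "/lang <name>"; returns null for other text or an unsupported language.
function parseLangCommand(text: string): Lang | null {
  const match = /^\/lang\s+(.+)$/i.exec(text.trim());
  if (!match) return null;
  const arg = (match[1] ?? "").trim().toLowerCase();
  return LANG_ALIASES[arg] ?? null;
}
```

Returning null for an unknown language lets the caller keep the current user.lang and reply with the list of supported languages instead of silently switching.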

Payment link

The payment link should uniquely identify the user. Consider PayPal invoices, if successful-payment notifications can be received via webhook, or consider Gumroad's tier creation.

On successful payments received from the webhook: if the model is token-based, give the user N extra tokens (depending on the payment amount, which is yet to be determined).
If it is subscription-based (like Gumroad), set the subscribed flag to true, and when a cancellation notification arrives on the webhook, set it back to false.
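The webhook side could reduce to a small state update like the one below. The event names and payload shape are assumptions for illustration, not Gumroad's or PayPal's actual webhook format, and the userNumber is assumed to be recoverable from the uniquely identifying payment link.

```typescript
interface Subscriber {
  userNumber: string;
  tokens: number;
  subscribed: boolean;
}

// Hypothetical normalized events; a real handler would map the provider's payload to these.
type PaymentEvent =
  | { type: "token_purchase"; userNumber: string; tokensGranted: number }
  | { type: "subscription_started"; userNumber: string }
  | { type: "subscription_cancelled"; userNumber: string };

function applyPaymentEvent(user: Subscriber, event: PaymentEvent): Subscriber {
  if (event.userNumber !== user.userNumber) {
    throw new Error("event does not belong to this user");
  }
  switch (event.type) {
    case "token_purchase": // token-based model: add N extra tokens
      return { ...user, tokens: user.tokens + event.tokensGranted };
    case "subscription_started": // subscription model: turn the flag on
      return { ...user, subscribed: true };
    case "subscription_cancelled": // cancellation notification: turn it back off
      return { ...user, subscribed: false };
    default: {
      // Compile-time exhaustiveness check over the event union.
      const unreachable: never = event;
      throw new Error(`unknown event ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Keeping this a pure function makes it easy to test; the webhook handler would load the user, apply the event and write the result back.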

Real Time Information

  1. Use a GPT function call to check whether the request requires real-time information and, if so, return a search term.
  2. If it does, send the search term to the Google API.
  3. Send the results to GPT-4o along with the original question and return its response.
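The three steps could be wired together as below. The model and search calls are injected as functions, so no particular OpenAI or Google SDK is assumed; `RealtimeDeps`, `classify`, `search`, `answer` and `answerWithRealtime` are hypothetical names.

```typescript
interface RealtimeDeps {
  // Step 1: ask GPT whether live data is needed; returns a search term, or null if not.
  classify: (question: string) => Promise<string | null>;
  // Step 2: query a Google search API with the term and return result snippets.
  search: (term: string) => Promise<string[]>;
  // Step 3: ask GPT-4o to answer the original question using the given context.
  answer: (question: string, context: string[]) => Promise<string>;
}

async function answerWithRealtime(question: string, deps: RealtimeDeps): Promise<string> {
  const term = await deps.classify(question);
  if (term === null) {
    // Assumption: questions that need no live data are answered without search context.
    return deps.answer(question, []);
  }
  const results = await deps.search(term);
  return deps.answer(question, results);
}
```

Injecting the calls keeps the control flow testable with stubs and avoids a search request, and its cost, for questions GPT can answer on its own.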
