
AWS Lex Conversational FAQ Demo

Demonstration of LLM integration into an Amazon Lex bot using Lambda codehooks and a SageMaker endpoint.

Diagram

What resources will be created?

This CDK code will create the following:

  • 1 SageMaker endpoint hosting a model (the default configuration is falcon-7b-instruct on ml.g5.8xlarge, but you can configure the model or hardware)
  • 1 Lex bot
  • 2 S3 buckets (one for your uploaded source material, one for the created index)
  • 2 Lambda functions (one to ingest the source material and create an index, one to be invoked as a codehook by the Lex bot and provide an FAQ answer when needed)
  • 1 event listener attached to an S3 bucket to call the index-creation Lambda automatically when a file is uploaded
  • 2 IAM roles (one for the Lex bot to call Lambda, one for the Lambdas to call SageMaker and S3)
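
For orientation, here is a minimal, illustrative CDK sketch of the bucket-to-Lambda wiring; the construct names are made up for this example, and the real stack in this repo also defines the endpoint, the Lex bot, and the IAM roles:

# Illustrative sketch only (names are hypothetical): an S3 bucket whose
# uploads trigger a Docker-image-based index-creation Lambda.
from aws_cdk import Stack, aws_s3 as s3, aws_lambda as _lambda, aws_s3_notifications as s3n
from constructs import Construct

class SketchStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        source_bucket = s3.Bucket(self, "SourceBucket")
        index_bucket = s3.Bucket(self, "IndexBucket")
        ingest_fn = _lambda.DockerImageFunction(
            self, "IndexCreationFn",
            code=_lambda.DockerImageCode.from_image_asset("index-creation-docker-image"),
        )
        source_bucket.grant_read(ingest_fn)
        index_bucket.grant_read_write(ingest_fn)
        # Invoke the ingestion Lambda whenever a file lands in the source bucket
        source_bucket.add_event_notification(
            s3.EventType.OBJECT_CREATED, s3n.LambdaDestination(ingest_fn)
        )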

Requirements

AWS setup

Region

If you have not yet run aws configure and set a default region, you must do so now, or alternatively run export AWS_DEFAULT_REGION=<your-region>

Authorization

You must use a role that has sufficient permissions to create IAM roles as well as CloudFormation resources.

Python >=3.7

Make sure you have python3 installed at a version >=3.7.x

Docker

Make sure you have Docker installed on your machine and running in the background

AWS CDK

Make sure you have the AWS CDK installed on your machine

Setup

Set up a virtual environment and gather packages

cd src/lex-gen-ai-demo-cdk-files

Install the required dependencies (aws-cdk-lib and constructs) into your Python environment

pip install -r requirements.txt

Gather and deploy resources with the CDK

First synthesize, which executes the application, defines which resources will be created, and translates this into a CloudFormation template

cdk synth

Now bootstrap, which provisions the resources you'll use when deploying the application

cdk bootstrap

and deploy with

cdk deploy LexGenAIDemoFilesStack

The deployment will create a Lex bot and S3 buckets, and will dockerize the code in the lex-gen-ai-demo-cdk/index-creation-docker-image and lex-gen-ai-demo-cdk/lex-gen-ai-demo-docker-image directories and push those images to ECR so they can run in Lambda. Don't worry if this step takes a long time while pushing to ECR; we are bundling up and uploading two Docker images, so it will take a while.

Usage

Once all the resources are created after cdk deploy finishes running, you must upload a .pdf or .txt file at least once so an index can be created. You can use our upload script (upload_file_to_s3.py path/to/your/file) or you can navigate to the S3 console and manually upload a file. On upload, the ingestion Lambda will read the file and create an embedding, which it will upload to the other S3 bucket. Once the embedding exists, you can go to your bot and begin using it. If you want to update the embedding, upload a new file and the new embedding will overwrite the old one. Once you have a new embedding, you must restart the runtime Lambda function for it to start using the new embedding.

Note: the first time the embedding Lambda and the runtime Lambda are called, latency will be much higher, as each must load resources and save them in the Lambda environment. Once loaded, these resources stay in the environment as long as the ECR image is not deleted. This means your first request will be slow, but subsequent requests will be faster because the resources are cached.

Uploading files

Now you have to upload your source file so the indexing Lambda can create an index for the runtime Lambda to use. You can use our script with any .pdf or .txt file by running

python3 upload_file_to_s3.py path/to/your/file

or you can open the S3 bucket in the console and manually upload a file. On upload, an index will automatically be generated. Note: if you upload a large file, the index will be large and the S3 read time on cold start may grow accordingly.
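
For reference, the upload roughly amounts to a single S3 PutObject call; here is a minimal boto3 sketch, where the bucket name is a placeholder and you should look up the actual source-material bucket created by the stack:

import sys
import boto3

def upload(path: str) -> None:
    s3 = boto3.client("s3")
    # Uploading the object fires the OBJECT_CREATED event that
    # triggers the index-creation Lambda.
    # "your-source-materials-bucket" is a placeholder name.
    s3.upload_file(path, "your-source-materials-bucket", path.split("/")[-1])

if __name__ == "__main__":
    upload(sys.argv[1])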

Once you've uploaded your file, wait a little for your index to be created, and then you can go into the Lex console and test your bot (no need to build the bot unless you've made changes after creation). The first time you create an index and the first time you query the bot, it will take a little longer (around 90 seconds) because we need to load models and cache them in the Lambda/ECR environment; once they are cached, there is no need to download them again and latency will be much lower. These resources remain cached as long as the ECR image is not deleted. Additionally, for better cold-start performance you can provision an instance for your runtime Lambda function; there are directions to do so below.

Configurations

🚨 Remember to shut down your endpoint when you're done using it! 🚨

We have provided a script to deactivate an endpoint and endpoint configuration with whatever name is in the endpoint creation script. To run:

python3 shut_down_endpoint.py
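
Under the hood this amounts to two SageMaker API calls; here is a hedged boto3 sketch, assuming the default endpoint name and that the endpoint configuration shares that name (check endpoint_handler.py for the authoritative values):

import boto3

# Assumed default; use whatever SAGEMAKER_ENDPOINT_NAME is set to
# in endpoint_handler.py if you changed it.
ENDPOINT_NAME = "huggingface-pytorch-sagemaker-endpoint"

sm = boto3.client("sagemaker")
sm.delete_endpoint(EndpointName=ENDPOINT_NAME)
# Also remove the matching endpoint configuration (assumed same name).
sm.delete_endpoint_config(EndpointConfigName=ENDPOINT_NAME)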

Custom model and instance type configuration:

The function create_endpoint_from_HF_image() is called in app.py. This function accepts the following arguments:

  • hf_model_id (required): For the purposes of the demo we have this set to tiiuae/falcon-7b-instruct. You can find any model on https://huggingface.co/ and feed it in
  • instance_type (optional, default is ml.g5.8xlarge): If you don't give an argument we'll use ml.g5.8xlarge. You can use any SageMaker endpoint instance type
  • endpoint_name (optional, default is whatever SAGEMAKER_ENDPOINT_NAME is set to in the file endpoint_handler.py): You can give your endpoint a custom name. It is recommended that you don't do this, but if you do, you have to change it in the Lambda images (the constant is called ENDPOINT_NAME in index_creation_app.py and runtime_lambda_app.py)
  • number_of_gpu (optional, default is 1): Set this to any number of GPUs the hardware you chose allows.
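
For example, a call in app.py with the documented defaults made explicit might look like:

# Example invocation; these values are the documented defaults.
create_endpoint_from_HF_image(
    hf_model_id="tiiuae/falcon-7b-instruct",  # any Hugging Face model id
    instance_type="ml.g5.8xlarge",            # optional; this is the default
    number_of_gpu=1,                          # optional; match your hardware
)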

If you have an invalid configuration, the endpoint will fail to create; you can see the specific error in the CloudWatch logs. If creation fails, you can run python3 shut_down_endpoint.py to clean up the endpoint, but if you clean up manually in the console you must delete both the endpoint and the endpoint configuration.

Further configuration

If you would like to further configure the endpoint you can change the specific code in endpoint_handler.py

The LLM is hosted on a SageMaker endpoint and deployed as a SageMaker HuggingFaceModel. We are also using a HuggingFace LLM image. You can read more about it here. For further model configuration you can read about SageMaker model deployments here.
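
For reference, a deployment with the SageMaker Python SDK generally looks like the sketch below; the role ARN and image URI are placeholders, and endpoint_handler.py remains the authoritative version for this demo:

# Hedged sketch of a HuggingFaceModel deployment (placeholders marked);
# see endpoint_handler.py for the demo's actual configuration.
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    role="arn:aws:iam::123456789012:role/your-sagemaker-role",  # placeholder
    image_uri="<huggingface-llm-container-image-uri>",          # placeholder
    env={
        "HF_MODEL_ID": "tiiuae/falcon-7b-instruct",  # model to serve
        "SM_NUM_GPUS": "1",                          # GPUs per replica
    },
)
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.8xlarge",
    endpoint_name="huggingface-pytorch-sagemaker-endpoint",
)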

For our indexing and retrieval we are using llama-index. If you would like to configure the index retriever, you can do so in the runtime_lambda_app.py file, in the VectorIndexRetriever object on line 70. If you want to update index creation, you can update the constants defined at the top of the index-creation and runtime Lambdas (index_creation_app.py and runtime_lambda_app.py). Make sure to familiarize yourself with llama-index terms and the llama-index PromptHelper for best results.
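
As a rough sketch of the kind of retrieval tweak this enables (the import path and defaults depend on the llama-index version pinned in the demo's Docker images, so treat this as an assumption):

# Rough sketch of tuning retrieval (llama-index 0.x-style API assumed).
from llama_index.retrievers import VectorIndexRetriever

retriever = VectorIndexRetriever(
    index=index,          # the vector index loaded from the index bucket
    similarity_top_k=3,   # raise to retrieve more chunks per query
)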

Tips for best results

Keep your Lambda perpetually warm by provisioning an instance for the runtime Lambda (lex-codehook-fn)

Go to Lambda console > select the function lex-codehook-fn

Versions > Publish new version

Under this version

  • Provisioned Concurrency > set value to 1
  • Permissions > Resource based policy statements > Add Permissions > AWS Service > Other, your-policy-name, lexv2.amazonaws.com, your-lex-bot-arn, lambda:InvokeFunction

Go to your Lex Bot (LexGenAIDemoBotCfn)

Aliases > your-alias > your-language > change lambda function version or alias > change to your-version

This will keep an instance running at all times so your Lambda stays ready and you won't hit cold-start latency. It costs a bit extra (https://aws.amazon.com/lambda/pricing/), so use it thoughtfully. The same setup can also be scripted, as sketched below.
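
A hedged boto3 equivalent of the console steps above; the statement id and bot alias ARN are placeholders:

import boto3

lam = boto3.client("lambda")
# Publish a new version of the runtime Lambda and warm one instance of it.
version = lam.publish_version(FunctionName="lex-codehook-fn")["Version"]
lam.put_provisioned_concurrency_config(
    FunctionName="lex-codehook-fn",
    Qualifier=version,
    ProvisionedConcurrentExecutions=1,
)
# Allow your Lex bot alias to invoke this version.
lam.add_permission(
    FunctionName="lex-codehook-fn",
    Qualifier=version,
    StatementId="AllowLexInvoke",            # placeholder policy name
    Action="lambda:InvokeFunction",
    Principal="lexv2.amazonaws.com",
    SourceArn="<your-lex-bot-alias-arn>",    # placeholder
)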


Issues

unable to redeploy

I made a successful build of the demo once, and I deleted all the resources at the end. Now I'm trying to do the same process again and it doesn't create all the resources as it did the first time.
I have already deleted the CDKToolkit template in CloudFormation, and:

  • Deleted all buckets
  • Deleted all Lambdas
  • Deleted the Lex bot
  • Deleted the ECR Images and Repository
  • Deleted the Instance in Sagemaker with configuration and model
  • Deleted the SSM configuration

Now when I run cdk deploy it just creates the asset bucket + ECR repository + images, and that's it.
I'm attempting to do it in the same account and the same region as before.

MemorySize issue with lambda function

While running the "cdk deploy LexGenAIDemoFilesStack" step I receive the following error:

❌ LexGenAIDemoFilesStack failed: Error: The stack named LexGenAIDemoFilesStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Resource handler returned message: "'MemorySize' value failed to satisfy constraint: Member must have value less than or equal to 3008 (Service: Lambda, Status Code: 400, Request ID: c44f0e2d-457d-4c11-837e-35da515dfa2f)

How can I resolve it?
I also requested a service quota increase for Lambda concurrency, but even after the increase the problem is still there.

[ERROR] Runtime.ImportModuleError: Unable to import module 'index_creation_app': cannot import name 'BaseCache' from 'langchain'

Read-source-and-build-index-fn Lambda Function is failing with:

[ERROR] Runtime.ImportModuleError: Unable to import module 'index_creation_app': cannot import name 'BaseCache' from 'langchain' (/var/task/langchain/__init__.py)
Traceback (most recent call last):

ValueError: Unsupported model type falcon

I am getting the following error. Will update with FLAN instead and retry.

2023-10-06T18:32:49.087661Z  INFO text_generation_launcher: Args { model_id: "tiiuae/falcon-7b-instruct", revision: None, sharded: None, num_shard: Some(1), quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 8080, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/tmp"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-10-06T18:32:49.087762Z  INFO text_generation_launcher: Starting download process.
2023-10-06T18:32:51.646683Z  WARN download: text_generation_launcher: No safetensors weights found for model tiiuae/falcon-7b-instruct at revision None. Downloading PyTorch weights.
2023-10-06T18:32:51.722142Z  INFO download: text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin
2023-10-06T18:33:02.752506Z  INFO download: text_generation_launcher: Downloaded /tmp/models--tiiuae--falcon-7b-instruct/snapshots/cf4b3c42ce2fdfe24f753f0f0d179202fea59c99/pytorch_model-00001-of-00002.bin in 0:00:11.
2023-10-06T18:33:02.752599Z  INFO download: text_generation_launcher: Download: [1/2] -- ETA: 0:00:11
2023-10-06T18:33:02.752840Z  INFO download: text_generation_launcher: Download file: pytorch_model-00002-of-00002.bin
2023-10-06T18:33:07.745517Z  INFO download: text_generation_launcher: Downloaded /tmp/models--tiiuae--falcon-7b-instruct/snapshots/cf4b3c42ce2fdfe24f753f0f0d179202fea59c99/pytorch_model-00002-of-00002.bin in 0:00:04.
2023-10-06T18:33:07.745585Z  INFO download: text_generation_launcher: Download: [2/2] -- ETA: 0
2023-10-06T18:33:07.745660Z  WARN download: text_generation_launcher: No safetensors weights found for model tiiuae/falcon-7b-instruct at revision None. Converting PyTorch weights to safetensors.
2023-10-06T18:33:07.745780Z  INFO download: text_generation_launcher: Convert /tmp/models--tiiuae--falcon-7b-instruct/snapshots/cf4b3c42ce2fdfe24f753f0f0d179202fea59c99/pytorch_model-00001-of-00002.bin to /tmp/models--tiiuae--falcon-7b-instruct/snapshots/cf4b3c42ce2fdfe24f753f0f0d179202fea59c99/model-00001-of-00002.safetensors.
2023-10-06T18:33:19.153703Z  INFO download: text_generation_launcher: Convert: [1/2] -- Took: 0:00:11.407687
2023-10-06T18:33:19.153776Z  INFO download: text_generation_launcher: Convert /tmp/models--tiiuae--falcon-7b-instruct/snapshots/cf4b3c42ce2fdfe24f753f0f0d179202fea59c99/pytorch_model-00002-of-00002.bin to /tmp/models--tiiuae--falcon-7b-instruct/snapshots/cf4b3c42ce2fdfe24f753f0f0d179202fea59c99/model-00002-of-00002.safetensors.
2023-10-06T18:33:24.271356Z  INFO download: text_generation_launcher: Convert: [2/2] -- Took: 0:00:05.117383
2023-10-06T18:33:24.829104Z  INFO text_generation_launcher: Successfully downloaded weights.
2023-10-06T18:33:24.829294Z  INFO text_generation_launcher: Starting shard 0
2023-10-06T18:33:28.178304Z ERROR shard-manager: text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
    server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 314, in get_model
    raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type falcon (rank=0)
2023-10-06T18:33:28.832440Z ERROR text_generation_launcher: Shard 0 failed to start:
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
    server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 314, in get_model
    raise ValueError(f"Unsupported model type {model_type}")


[Error] when uploading txt file to source materials bucket

I receive the following error from the read-source-and-build-index Lambda function when I try to upload the attached .txt file to the source materials bucket. I am able to execute with a PDF instead.
allergies_faq.txt

[ERROR] TypeError: __init__() takes 1 positional argument but 2 were given
Traceback (most recent call last):
  File "/var/task/index_creation_app.py", line 63, in handler
    documents = [Document(t) for t in text_list]
  File "/var/task/index_creation_app.py", line 63, in <listcomp>
    documents = [Document(t) for t in text_list]

[Error] exec ./entrypoint.sh: exec format error

When I run cdk synth on an M1 Mac, my endpoint health checks fail with the following CloudWatch logs:

exec ./entrypoint.sh: exec format error

I have read that this is due to architecture incompatibilities when building on an M1 chip; however, I'm not quite sure where in the repository I should update the target platform configuration (one of the included Docker images? A HuggingFace image change? Something else?)

Thanks for any advice.

CloudFormation Issues

Hi Guys,

I'm having a CloudFormation issue on step 2/9 while creating the StagingBucket during "cdk bootstrap". Any suggestion?

Endpoint with name huggingface-pytorch-sagemaker-endpoint found!
⏳ Bootstrapping environment aws:///us-east-1...
Trusted accounts for deployment: (none)
Trusted accounts for lookup: (none)
Using default execution policy of 'arn:aws:iam::aws:policy/AdministratorAccess'. Pass '--cloudformation-execution-policies' to customize.
CDKToolkit: creating CloudFormation changeset...
CDKToolkit | 0/9 | 17:09:44 | UPDATE_IN_PROGRESS | AWS::CloudFormation::Stack | CDKToolkit User Initiated
CDKToolkit | 0/9 | 17:09:48 | UPDATE_IN_PROGRESS | AWS::S3::Bucket | StagingBucket
CDKToolkit | 0/9 | 17:09:48 | UPDATE_IN_PROGRESS | AWS::ECR::Repository | ContainerAssetsRepository
CDKToolkit | 0/9 | 17:09:48 | UPDATE_IN_PROGRESS | AWS::SSM::Parameter | CdkBootstrapVersion
CDKToolkit | 1/9 | 17:09:49 | UPDATE_COMPLETE | AWS::ECR::Repository | ContainerAssetsRepository
CDKToolkit | 2/9 | 17:09:50 | UPDATE_COMPLETE | AWS::SSM::Parameter | CdkBootstrapVersion
2/9 Currently in progress: CDKToolkit, StagingBucket
CDKToolkit | 2/9 | 17:12:08 | UPDATE_FAILED | AWS::S3::Bucket | StagingBucket The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: RB2EEZ0KN7JXF5R6; S3 Extended Request ID: HqbbLiAAY08j0SY7S/swfbNvWMU8V/N+9UL9NZ9dYHRA+cEOhnsw5Vpi56vl+v482yKx4s3LAzE=; Proxy: null)
CDKToolkit | 2/9 | 17:12:09 | UPDATE_ROLLBACK_IN_P | AWS::CloudFormation::Stack | CDKToolkit The following resource(s) failed to update: [StagingBucket].
CDKToolkit | 3/9 | 17:12:12 | UPDATE_COMPLETE | AWS::S3::Bucket | StagingBucket
CDKToolkit | 3/9 | 17:12:12 | UPDATE_IN_PROGRESS | AWS::ECR::Repository | ContainerAssetsRepository
CDKToolkit | 3/9 | 17:12:12 | UPDATE_IN_PROGRESS | AWS::SSM::Parameter | CdkBootstrapVersion
CDKToolkit | 2/9 | 17:12:13 | UPDATE_COMPLETE | AWS::ECR::Repository | ContainerAssetsRepository
CDKToolkit | 1/9 | 17:12:13 | UPDATE_COMPLETE | AWS::SSM::Parameter | CdkBootstrapVersion
CDKToolkit | 2/9 | 17:12:14 | UPDATE_ROLLBACK_COMP | AWS::CloudFormation::Stack | CDKToolkit
CDKToolkit | 3/9 | 17:12:15 | UPDATE_ROLLBACK_COMP | AWS::CloudFormation::Stack | CDKToolkit
