
ai-lab-recipes's People

Contributors

axel7083, cgwalters, cooktheryan, danmcp, enriquebelarte, ericcurtin, fabiendupont, gregory-pereira, hemajv, jaideepr97, javipolo, jeffmaury, johnmcollier, kwozyman, lmilbaum, lstocchi, markmc, michaelclifford, n1hility, omertuc, pastequo, paulyuuu, platform-engineering-bot, ralphbean, rhatdan, sallyom, shreyanand, tiran, tsorya, vrothberg


ai-lab-recipes's Issues

Document how to enable use of GPU

It's not clear how to utilize a GPU with the recipes when one is available. We should add documentation for each app where GPU use is possible.

make run doesn't work on mac

Following the README instructions on a Mac, I ran:

make -f Makefile build && make -f Makefile run

The build step completes, and then it runs:

cd ../../models && \
        podman run -it -d -p 8001:8001 -v ./mistral-7b-instruct-v0.1.Q4_K_M.gguf:/locallm/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf:ro -e MODEL_PATH=/locallm/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf -e HOST=0.0.0.0 -e PORT=8001 quay.io/ai-lab/model_servers/llamacpp_python:latest

The container instantly crashes with:

llama_model_load: error loading model: failed to open /locallm/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf: Permission denied
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/__main__.py", line 88, in <module>
    main()
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/__main__.py", line 74, in main
    app = create_app(
          ^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/app.py", line 138, in create_app
    set_llama_proxy(model_settings=model_settings)
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/app.py", line 75, in set_llama_proxy
    _llama_proxy = LlamaProxy(models=model_settings)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/model.py", line 31, in __init__
    self._current_model = self.load_llama_from_model_settings(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/model.py", line 138, in load_llama_from_model_settings
    _model = create_fn(
             ^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/llama.py", line 314, in __init__
    self._model = _LlamaModel(
                  ^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/_internals.py", line 55, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: /locallm/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf

I think it might be an issue with the BIND_MOUNT_OPTION in the model_servers common Makefile.

add memory to rag and chat applications

Neither chat_langchain nor rag_langchain has a memory component, meaning you cannot ask follow-up questions during a chat session.

We should add a chat memory component that allows for asking follow-up questions, as sketched below. It should also gracefully handle the case in which the chat history exceeds the model's context window.
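A minimal sketch of the first part, using LangChain's built-in window memory (this API exists in the langchain 0.1/0.2 releases pinned in this repo, but the value of k and the wiring into the apps are assumptions; handling context-window overflow would still need model-aware trimming):

from langchain.memory import ConversationBufferWindowMemory

# Keep only the last k exchanges so the prompt is less likely to overflow
# the model's context window; k would need tuning per model.
memory = ConversationBufferWindowMemory(k=5, return_messages=True)

# After each turn, persist the exchange...
memory.save_context({"input": "What is RAG?"}, {"output": "Retrieval-augmented generation ..."})

# ...and load it back into the prompt for the next turn.
history = memory.load_memory_variables({})["history"]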

Convert Mirror Repo strategy to self-hosted GitHub Runners

The current repo mirror strategy for driving builds is not scalable. We should look to move to self-hosted GitHub runners where we can mount the models, stored on persistent storage, into the filesystem in such a way that our tests will not run out of storage and will not be flaky due to multi-gigabyte model downloads. Even if we could limp along with our current solution, swapping to this strategy will be a requirement for testing the multi-model feature in the llamacpp_python model_server.

Initial idea was discussed in the thread beginning with: https://redhat-internal.slack.com/archives/C06S75ZF9JT/p1713089733094399?thread_ts=1712828397.645709&cid=C06S75ZF9JT .

We plan to implement this after Release 1.0 so as to not interfere, but the POC can be developed and run alongside our workloads leading up to and during release.

/assign @lmilbaum
/assign @Gregory-Pereira

Add Whisper Client App

Once #63 is complete and we have an API for the whisper model service, we should add an app that allows a user to upload an audio file and get a text response.

Ideally, we'd like to be able to send the output of this model to the summarizer or rag app to interact with it further.

Add audio file converter

whisper.cpp expects 16 kHz mono 16-bit PCM WAV input, so we should add a tool into the whisper_playground workflow that does this file conversion for the user, e.g.:

ffmpeg -i <input.mp3> -ar 16000 -ac 1 -c:a pcm_s16le <output.wav>
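A thin helper around that command could live in the playground app. A minimal sketch, assuming ffmpeg is on PATH (the function name and placement are placeholders):

import subprocess

def convert_to_wav(input_path: str, output_path: str) -> None:
    """Convert an audio file to 16 kHz mono 16-bit PCM WAV for whisper.cpp."""
    subprocess.run(
        ["ffmpeg", "-i", input_path,
         "-ar", "16000",       # 16 kHz sample rate
         "-ac", "1",           # mono
         "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
         output_path],
        check=True,
    )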

docs reference XDG_RUNTIME_DIR

The docs mention XDG_RUNTIME_DIR for mounting the auth.json file:

podman build --build-arg "sshpubkey=$(cat ~/.ssh/id_rsa.pub)" \
           --security-opt label=disable \
	   -v ${XDG_RUNTIME_DIR}/containers/auth.json:/run/containers/0/auth.json \
	   --cap-add SYS_ADMIN \
	   -t quay.io/yourrepo/youros:tag .

The docs also mention building on the Mac, but there the file resides in $HOME/.config/containers/auth.json.

I think the docs need some tweaking to be portable across Linux/Mac/Win.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Awaiting Schedule

These updates are awaiting their schedule. Click on a checkbox to get an update now.

  • chore(deps): update dependency safetensors to v0.4.4

Edited/Blocked

These updates have been manually edited so Renovate will no longer make changes. To discard all commits and start over, click on a checkbox.

  • chore(deps): update all dependencies (Pygments, attrs, aws-actions/configure-aws-credentials, blinker, cachetools, certifi, containers.podman, fastapi, huggingface-hub, jsonschema, langchain, mwader/static-ffmpeg, numpy, nvcr.io/nvidia/cuda, packaging, pillow, pip, protobuf, pyarrow, pydantic, pydantic_core, pydeck, pypdf, pytest, quay.io/containers/podman, referencing, regex, requests, rpds-py, selenium, sentence-transformers, starlette, streamlit, sympy, tenacity, torch, torchvision, transformers, typing_extensions)

Detected dependencies

ansible-galaxy
recipes/natural_language_processing/chatbot/provision/requirements.yml
  • containers.podman 1.13.0
recipes/natural_language_processing/codegen/provision/requirements.yml
  • containers.podman 1.13.0
recipes/natural_language_processing/rag/provision/requirements.yml
  • containers.podman 1.13.0
recipes/natural_language_processing/summarizer/provision/requirements.yml
  • containers.podman 1.13.0
dockerfile
.devcontainer/Containerfile
  • quay.io/containers/podman v5.0.2
convert_models/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
eval/promptfoo/base/Containerfile
  • registry.access.redhat.com/ubi9/nodejs-20-minimal 1-57
model_servers/llamacpp_python/base/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
model_servers/llamacpp_python/cuda/Containerfile
model_servers/llamacpp_python/vulkan/amd64/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
model_servers/llamacpp_python/vulkan/arm64/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
model_servers/object_detection_python/base/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
model_servers/ollama/base/Containerfile
model_servers/whispercpp/base/Containerfile
  • mwader/static-ffmpeg 6.1.1
  • mwader/static-ffmpeg 6.1.1
models/Containerfile
  • registry.access.redhat.com/ubi9/ubi-micro 9.4-13
recipes/audio/audio_to_text/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/audio/audio_to_text/bootc/Containerfile
recipes/audio/audio_to_text/bootc/Containerfile.nocache
recipes/computer_vision/object_detection/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/multimodal/image_understanding/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/natural_language_processing/chatbot/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/natural_language_processing/chatbot/bootc/Containerfile
recipes/natural_language_processing/chatbot/bootc/Containerfile.nocache
recipes/natural_language_processing/codegen/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/natural_language_processing/codegen/bootc/Containerfile
recipes/natural_language_processing/codegen/bootc/Containerfile.nocache
recipes/natural_language_processing/rag/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/natural_language_processing/rag/bootc/Containerfile
recipes/natural_language_processing/rag/bootc/Containerfile.nocache
recipes/natural_language_processing/summarizer/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/natural_language_processing/summarizer/bootc/Containerfile
recipes/natural_language_processing/summarizer/bootc/Containerfile.nocache
training/amd-bootc/Containerfile
training/common/driver-toolkit/Containerfile
training/deepspeed/Containerfile
  • nvcr.io/nvidia/cuda 12.1.1-cudnn8-devel-ubi9
training/intel-bootc/Containerfile
training/model/Containerfile
training/nvidia-bootc/Containerfile
training/vllm/Containerfile
vector_dbs/chromadb/Containerfile
vector_dbs/milvus/Containerfile
github-actions
.github/workflows/chatbot.yaml
  • actions/checkout v4.1.7
  • actions/setup-python v5.1.1
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • registry 2.8.3
  • ubuntu 24.04
.github/workflows/codegen.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • actions/setup-python v5.1.1
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • registry 2.8.3
  • ubuntu 24.04
.github/workflows/instructlab.yaml
  • actions/checkout v4.1.7
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • slackapi/slack-github-action v1.26.0
.github/workflows/manual_build_trigger.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2.8
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2.8
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2.8
  • ubuntu 24.04
  • ubuntu 24.04
  • ubuntu 24.04
  • ubuntu 24.04
  • ubuntu 24.04
  • ubuntu 24.04
  • ubuntu 24.04
  • ubuntu 24.04
.github/workflows/mirror_repository.yaml
  • actions/checkout v4.1.7
  • pixta-dev/repository-mirroring-action v1.1.1
  • slackapi/slack-github-action v1.26.0
  • ubuntu 24.04
.github/workflows/model_converter.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • ubuntu 24.04
.github/workflows/model_servers.yaml
  • actions/checkout v4.1.7
  • actions/setup-python v5.1.1
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • registry 2.8.3
  • ubuntu 24.04
.github/workflows/models.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • ubuntu 24.04
.github/workflows/object_detection.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • actions/setup-python v5.1.1
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • registry 2.8.3
  • ubuntu 24.04
.github/workflows/rag.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • actions/setup-python v5.1.1
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • registry 2.8.3
  • ubuntu 24.04
.github/workflows/summarizer.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • actions/setup-python v5.1.1
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • registry 2.8.3
  • ubuntu 24.04
.github/workflows/test-trace-steps.yaml
  • actions/checkout v4.1.7
  • actions/setup-python v5.1.1
  • ubuntu 24.04
.github/workflows/testing_framework.yaml
  • actions/checkout v4.1.7
  • actions/setup-python v5.1.1
  • actions/checkout v4.1.7
  • hashicorp/setup-terraform v3.1.1
  • slackapi/slack-github-action v1.26.0
  • redhat-actions/podman-login v1.7
  • slackapi/slack-github-action v1.26.0
  • actions/checkout v4.1.7
  • actions/setup-python v5.1.1
  • slackapi/slack-github-action v1.26.0
  • ubuntu 24.04
  • ubuntu 24.04
.github/workflows/training-e2e.yaml
  • actions/checkout v4.1.7
  • actions/checkout v4.1.7
  • hashicorp/setup-terraform v3.1.1
  • mxschmitt/action-tmate v3.18
  • slackapi/slack-github-action v1.26.0
  • ubuntu 24.04
.github/workflows/training_bootc.yaml
  • aws-actions/configure-aws-credentials v1
  • machulav/ec2-github-runner v2
  • actions/checkout v4.1.7
  • redhat-actions/push-to-registry v2.8
  • redhat-actions/push-to-registry v2.8
  • slackapi/slack-github-action v1.26.0
  • actions/checkout v4.1.7
  • redhat-actions/push-to-registry v2.8
  • slackapi/slack-github-action v1.26.0
  • actions/checkout v4.1.7
  • redhat-actions/push-to-registry v2.8
  • slackapi/slack-github-action v1.26.0
  • aws-actions/configure-aws-credentials v1
  • machulav/ec2-github-runner v2
pip_requirements
convert_models/requirements.txt
model_servers/llamacpp_python/src/requirements.txt
  • llama-cpp-python ==0.2.85
  • transformers ==4.41.2
  • pip ==24.0
model_servers/object_detection_python/src/requirements-unlocked.txt
model_servers/object_detection_python/src/requirements.txt
  • annotated-types ==0.7.0
  • anyio ==4.4.0
  • certifi ==2024.6.2
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • dnspython ==2.6.1
  • email_validator ==2.2.0
  • fastapi ==0.111.1
  • fastapi-cli ==0.0.5
  • filelock ==3.15.4
  • fsspec ==2024.6.1
  • h11 ==0.14.0
  • httpcore ==1.0.5
  • httptools ==0.6.1
  • httpx ==0.27.0
  • huggingface-hub ==0.23.4
  • idna ==3.7
  • Jinja2 ==3.1.4
  • markdown-it-py ==3.0.0
  • MarkupSafe ==2.1.5
  • mdurl ==0.1.2
  • mpmath ==1.3.0
  • networkx ==3.3
  • numpy ==2.0.1
  • orjson ==3.10.6
  • packaging ==24.1
  • pillow ==10.3.0
  • pydantic ==2.7.4
  • pydantic_core ==2.18.4
  • Pygments ==2.18.0
  • python-dotenv ==1.0.1
  • python-multipart ==0.0.9
  • PyYAML ==6.0.1
  • regex ==2024.5.15
  • requests ==2.32.3
  • rich ==13.7.1
  • safetensors ==0.4.3
  • shellingham ==1.5.4
  • sniffio ==1.3.1
  • starlette ==0.37.2
  • sympy ==1.12.1
  • timm ==1.0.8
  • tokenizers ==0.19.1
  • torch ==2.3.1
  • torchvision ==0.18.1
  • tqdm ==4.66.5
  • transformers ==4.41.2
  • typer ==0.12.3
  • typing_extensions ==4.12.2
  • ujson ==5.10.0
  • urllib3 ==2.2.2
  • uvicorn ==0.30.5
  • uvloop ==0.19.0
  • watchfiles ==0.22.0
  • websockets ==12.0
recipes/audio/audio_to_text/app/requirements.txt
recipes/computer_vision/object_detection/app/requirements.txt
  • altair ==5.3.0
  • attrs ==23.2.0
  • blinker ==1.7.0
  • cachetools ==5.3.3
  • certifi ==2024.2.2
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • gitdb ==4.0.11
  • GitPython ==3.1.43
  • idna ==3.7
  • Jinja2 ==3.1.4
  • jsonschema ==4.21.1
  • jsonschema-specifications ==2023.12.1
  • markdown-it-py ==3.0.0
  • MarkupSafe ==2.1.5
  • mdurl ==0.1.2
  • numpy ==1.26.4
  • packaging ==24.0
  • pandas ==2.2.2
  • pillow ==10.3.0
  • protobuf ==4.25.3
  • pyarrow ==15.0.2
  • pydeck ==0.8.1b0
  • Pygments ==2.17.2
  • python-dateutil ==2.9.0.post0
  • pytz ==2024.1
  • referencing ==0.34.0
  • requests ==2.31.0
  • rich ==13.7.1
  • rpds-py ==0.18.1
  • six ==1.16.0
  • smmap ==5.0.1
  • streamlit ==1.33.0
  • tenacity ==8.2.3
  • toml ==0.10.2
  • toolz ==0.12.1
  • tornado ==6.4.1
  • typing_extensions ==4.11.0
  • tzdata ==2024.1
  • urllib3 ==2.2.2
recipes/multimodal/image_understanding/app/requirements.txt
recipes/natural_language_processing/chatbot/app/requirements.txt
  • langchain ==0.2.3
  • langchain-openai ==0.1.7
  • langchain-community ==0.2.4
  • streamlit ==1.34.0
recipes/natural_language_processing/codegen/app/requirements.txt
  • langchain ==0.1.20
  • langchain-openai ==0.1.7
  • streamlit ==1.34.0
recipes/natural_language_processing/rag/app/requirements.txt
  • langchain-openai ==0.1.7
  • langchain ==0.1.20
  • chromadb ==0.5.5
  • sentence-transformers ==2.7.0
  • streamlit ==1.34.0
  • pypdf ==4.2.0
  • pymilvus ==2.4.1
recipes/natural_language_processing/summarizer/app/requirements.txt
  • langchain ==0.1.20
  • langchain-openai ==0.1.7
  • streamlit ==1.34.0
  • PyMuPDF ==1.24.9
  • rouge_score ==0.1.2
requirements-test.txt
  • pip ==24.0
  • pytest-container ==0.4.2
  • pytest-selenium ==4.1.0
  • pytest-testinfra ==10.1.1
  • pytest ==8.2.2
  • requests ==2.31.0
  • selenium ==4.20.0
  • tenacity ==8.2.3

update descriptions of recipes

The AI Lab extension displays the descriptions of the recipes, which at present are quite repetitive:

  • They all start with "This is a[n] ..."
  • They all claim to be a "demo application", which doesn't sound fit for production
  • I'd prefer a short description of the use cases. For instance, "Summarizer: Summarize text files in a web front end." (or something similar).

Screenshot 2024-04-22 at 09 47 35

Release coordination for release `1.0`

We will be releasing version 1.0 to match up with Podman Desktop AI Lab.
Release Date: Wednesday, March 17th

Release Criteria:

  • Brief change log or summary of issues tackled and new features
    • (OPTIONAL): A comparison of where we are in relation to the Podman Desktop team
  • A commit is made to trigger every workflow in the repo with tests passing
  • Potentially switch to the granite model by default, if it is released?

...
Thoughts @lmilbaum @sallyom @MichaelClifford @rhatdan ?

confusing recipe descriptions for Podman Desktop users

We received user feedback that the recipe descriptions can be confusing. Let's take the ChatBot as an example (see screenshot below). The description does not target the desktop use case but the terminal/CLI one, so there is some friction.

Since the AI Lab Recipes target both use cases, I guess the READMEs should first target the desktop use case and then the CLI one, with clear descriptions of the target audience?

Screenshot 2024-04-11 at 10 59 31

object_detection recipe does not start in AI Lab

The object_detection_server fails to start with the following error:

Traceback (most recent call last):
  File "/opt/app-root/bin/uvicorn", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/main.py", line 409, in main
    run(
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/main.py", line 575, in run
    server.run()
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/server.py", line 65, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/server.py", line 69, in serve
    await self._serve(sockets)
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/server.py", line 76, in _serve
    config.load()
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/config.py", line 433, in load
    self.loaded_app = import_from_string(self.app)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/importer.py", line 19, in import_from_string
    module = importlib.import_module(module_str)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/locallm/object_detection_server.py", line 15, in <module>
    processor = AutoImageProcessor.from_pretrained(model, revision=revision)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/transformers/models/auto/image_processing_auto.py", line 358, in from_pretrained
    config_dict, _ = ImageProcessingMixin.get_image_processor_dict(pretrained_model_name_or_path, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/transformers/image_processing_utils.py", line 363, in get_image_processor_dict
    text = reader.read()
           ^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

Add API to whisper_playground model service

Now that we have a whispercpp image that can be used to convert audio to text, we want to use it the same way we use the existing playground image: as a model service with an exposed API. Unlike llamacpp_python, whisper.cpp does not appear to come with a prebuilt API server for the model, so we should create a lightweight API for this model type; a sketch follows.
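A rough sketch of what that lightweight API could look like with FastAPI (already used elsewhere in this repo). The whisper.cpp binary and model paths are assumptions, not the actual image layout:

import subprocess
import tempfile

from fastapi import FastAPI, File, UploadFile

app = FastAPI()

WHISPER_BIN = "/app/whisper.cpp/main"     # assumed location of the whisper.cpp CLI
MODEL_PATH = "/app/models/ggml-base.bin"  # assumed location of the model

@app.post("/inference")
async def transcribe(file: UploadFile = File(...)):
    # whisper.cpp expects 16 kHz mono WAV; conversion is handled upstream.
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(await file.read())
        tmp.flush()
        result = subprocess.run(
            [WHISPER_BIN, "-m", MODEL_PATH, "-f", tmp.name, "--no-timestamps"],
            capture_output=True, text=True, check=True,
        )
    return {"text": result.stdout.strip()}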

bug(llamacpp): GPU access blocked by the operating system

System information

  • Windows 11
  • WSL version: 2.1.5.0

Details

To access the GPU in a container while using podman on Windows, we have to run a few extra commands.¹

In the following example, we are using official NVIDIA images, not images provided by this repository:

podman run \
    --device=/dev/dxg \
    --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl \
    --gpus all \
    --entrypoint=sh \
    docker.io/nvidia/cuda:12.3.1-devel-ubuntu22.04 \
    -c '/usr/bin/ln -s /usr/lib/wsl/lib/* /usr/lib/x86_64-linux-gnu/ && PATH="${PATH}:/usr/lib/wsl/lib/" && nvidia-smi'

This will output the following on my system

Fri Mar 22 17:42:48 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.53       Driver Version: 497.29       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:08:00.0  On |                  N/A |
| 34%   38C    P0    28W / 120W |   1344MiB /  6144MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Using model_servers/llamacpp/cuda/Containerfile

In the following, I will try to reproduce the GPU access I had with the official NVIDIA images, using the one from this repository.

Reproduce

git clone https://github.com/redhat-et/locallm
cd locallm\model_servers\llamacpp\
podman build --platform linux/amd64 -t chatbot:service-cuda -f cuda/Containerfile .

Once I have the chatbot:service-cuda image, I run the same scenario as with the previous Ubuntu-based image:

podman run \
    --device=/dev/dxg \
    --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl \
    --gpus all \
    --entrypoint=sh \
    localhost/chatbot:service-cuda \
    -c '/usr/bin/ln -s /usr/lib/wsl/lib/* /usr/lib/x86_64-linux-gnu/ && PATH="${PATH}:/usr/lib/wsl/lib/" && nvidia-smi'

but this operation results in the following error:

ln: target '/usr/lib/x86_64-linux-gnu/': No such file or directory

Footnotes

  1. https://github.com/microsoft/WSL/issues/8666#issue-1322829203

Containerfile.nocache won't build w/ podman system reset command

STEP 11/11: RUN podman system reset --force 2>/dev/null
Error: building at STEP "RUN podman system reset --force 2>/dev/null": while running runtime: exit status 125

Looking at the Containerfile, I'm not sure we need this line. It fails for me regardless of ARCH=aarch64 or x86_64, and it seems to work when I drop it. I'm testing with make FROM=registry.redhat.io/rhel9-beta/rhel-bootc:9.4 CONTAINERFILE=Containerfile.nocache bootc with the summarizer app.

Restructuring Proposal

I would like to change the structure of this repo a bit to make it more intuitive and easier to maintain as it continues to grow. Below please find a sketch of the proposed structure.

The main changes would be:

  • playground/ would be moved into model_servers/ and renamed llamacpp_python/
  • A new directory called recipes/ with N subdirectories for each recipe category. Right now that is natural_language_processing/, computer_vision/, audio/ and multimodal/.
  • All existing recipes will be moved to the correct subdirectory.
  • A new directory called VectorDBs/, with the intent that any new component types arising in the future are added to the root directory in the same way.
  • Move chromadb/ into VectorDBs/

Please let me know what you all think of the proposed changes and how they might affect any dependencies on this repo.

locallm:
|-- assets
    |-- image.1.png
    |-- image2.png
    |-- ...
|-- data
    |-- file.txt
    |-- ...
|-- models
    |-- Model1Folder
    |-- Model2Folder
    |-- Convert_Models
    |-- ...
|-- model_servers
    |-- llamacpp_python
        |-- base
        |-- cuda
        |-- vulkan
        |-- ....
    |-- ollama
        |-- base
        |-- ....
    |-- caikit
        |-- base
        |-- ....
    |-- WhisperCPP
        |-- base
        |-- ....
    |-- ...
|-- recipes
    |-- Natural Language
        |-- Chat
        |-- Summary
        |-- RAG
        |-- Fine Tune
        |-- ...
    |-- Computer Vision
        |-- Object Detection
        |-- ...
    |-- Audio
        |-- Transcription
        |-- ...
    |-- Multimodal
        |-- Image Description
        |-- ...
|-- VectorDBs
    |-- ChromaDB
    |-- ...

Text to Json Recipe

A recipe that does the following:

Given a small set of JSON schemas (OK to start with 1), create an LLM application where a user inputs their responses to the fields of the JSON schema using unstructured natural language and the model returns a correctly populated JSON document.

A simple example to illustrate how this should work would be placing an order at a fast food restaurant. The user simply states their order, and the LLM generates the appropriate JSON that can be submitted to the order management system.
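A rough sketch of the core interaction, hitting the OpenAI-compatible endpoint the llamacpp_python model server already exposes on port 8001 (the URL, model name, and schema here are illustrative assumptions):

import json

from openai import OpenAI

# Hypothetical schema for the fast-food example.
ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "items": {"type": "array", "items": {"type": "string"}},
        "to_go": {"type": "boolean"},
    },
}

client = OpenAI(base_url="http://localhost:8001/v1", api_key="none")
response = client.chat.completions.create(
    model="mistral-7b-instruct",
    messages=[
        {"role": "system",
         "content": "Fill out this JSON schema from the user's order and "
                    "reply with JSON only: " + json.dumps(ORDER_SCHEMA)},
        {"role": "user",
         "content": "Two cheeseburgers and a large coke, to go please."},
    ],
)
order = json.loads(response.choices[0].message.content)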

Add links to the repos (make files) in each of the AI recipe readme

Discussed during the Weekly Demo on Apr 15. It's challenging for a user running an AI recipe in Podman Desktop to find the Makefiles needed to convert the images with bootc, so adding links to the repo's Makefiles in each recipe's README should be helpful.

llamacpp_python container may not work on all CPUs

llama-cpp detects CPU features like AVX, AVX2, FMA3, and F16C at build time. If the container is built on a machine that supports these instruction sets, then the binary won't work on CPUs without these instructions.

RUN pip install --upgrade pip
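# NOTE: CMAKE_ARGS below is read when llama-cpp-python is compiled, so the CPU
# features (AVX/AVX2/FMA/F16C) of the build machine get baked into the wheel.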
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=on"
ENV FORCE_CMAKE=1
RUN pip install --no-cache-dir --upgrade -r /locallm/requirements.txt

References:

Credits to @bbrowning for figuring this out. He suggested building llama-cpp-python with CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF"

Deploy Chatbot to the ET cluster in a persistent way

/cc @MichaelClifford

Initial idea: Deploy the chatbot LLM to the ET cluster so we can start using our own tools and ideating on how we test and evaluate model performance.

Spinoff Idea: Might be cool to also do this with the RAG application once it's ready, so we can evaluate RAG cloud vs edge models and how we reconcile/update the two with data.

Stretch goal: It would be cool if we can deploy chatbot to review readme changes in this repo.

Example with Guard Rails

  • Research a few tools + methods for implementing guard rails for LLM applications.
  • Add a recipe that provides developers with an example of implementing guard rails, as sketched below.
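A toy illustration of the second point (no particular guard-rails library implied; the blocklist pattern is a placeholder):

import re

# Placeholder pattern; a real recipe would draw on a proper guard-rails toolkit.
BLOCKLIST = [r"(?i)ignore (all )?previous instructions"]

def check_prompt(prompt: str) -> str:
    """Reject prompts that trip a guard rail before they reach the model."""
    for pattern in BLOCKLIST:
        if re.search(pattern, prompt):
            raise ValueError("Prompt rejected by guard rail")
    return prompt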
