
ai-lab-recipes's People

Contributors

axel7083, cgwalters, cooktheryan, danmcp, enriquebelarte, ericcurtin, fabiendupont, gregory-pereira, hemajv, jaideepr97, javipolo, jeffmaury, johnmcollier, kwozyman, lmilbaum, lstocchi, markmc, michaelclifford, n1hility, omertuc, pastequo, paulyuuu, platform-engineering-bot, ralphbean, rhatdan, sallyom, shreyanand, tiran, tsorya, vrothberg


ai-lab-recipes's Issues

Document how to enable use of GPU

It's not clear how to utilize a GPU with the recipes when one is available. We should add documentation for each app where GPU use is possible.

make run doesn't work on mac

Following the README instructions on a Mac, I ran:

make -f Makefile build && make -f Makefile run

The build step completes, and then it runs:

cd ../../models && \
        podman run -it -d -p 8001:8001 -v ./mistral-7b-instruct-v0.1.Q4_K_M.gguf:/locallm/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf:ro -e MODEL_PATH=/locallm/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf -e HOST=0.0.0.0 -e PORT=8001 quay.io/ai-lab/model_servers/llamacpp_python:latest

The container instantly crashes with:

llama_model_load: error loading model: failed to open /locallm/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf: Permission denied
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/__main__.py", line 88, in <module>
    main()
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/__main__.py", line 74, in main
    app = create_app(
          ^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/app.py", line 138, in create_app
    set_llama_proxy(model_settings=model_settings)
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/app.py", line 75, in set_llama_proxy
    _llama_proxy = LlamaProxy(models=model_settings)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/model.py", line 31, in __init__
    self._current_model = self.load_llama_from_model_settings(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/model.py", line 138, in load_llama_from_model_settings
    _model = create_fn(
             ^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/llama.py", line 314, in __init__
    self._model = _LlamaModel(
                  ^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/_internals.py", line 55, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: /locallm/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf

I think it might be an issue with the BIND_MOUNT_OPTION in the model_servers common Makefile.

add memory to rag and chat applications

Neither chat_langchain nor rag_langchain has a memory component, meaning you cannot ask follow-up questions during a chat session.

We should add a chat memory component that allows for asking follow-up questions, as sketched below. It should also gracefully handle the case in which the chat history exceeds the model's context window.
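A minimal sketch of the first part, using LangChain's built-in window memory (this API exists in the langchain 0.1/0.2 releases pinned in this repo, but the value of k and the wiring into the apps are assumptions; handling context-window overflow would still need model-aware trimming):

from langchain.memory import ConversationBufferWindowMemory

# Keep only the last k exchanges so the prompt is less likely to overflow
# the model's context window; k would need tuning per model.
memory = ConversationBufferWindowMemory(k=5, return_messages=True)

# After each turn, persist the exchange...
memory.save_context({"input": "What is RAG?"}, {"output": "Retrieval-augmented generation ..."})

# ...and load it back into the prompt for the next turn.
history = memory.load_memory_variables({})["history"]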

Convert Mirror Repo strategy to self-hosted GitHub Runners

The current repo mirror strategy for driving builds is not scalable. We should look to move to self-hosted GitHub runners where we can mount the models, stored on persistent storage, into the filesystem in such a way that our tests will not run out of storage and will not be flaky due to multi-gigabyte model downloads. Even if we could limp along with our current solution, swapping to this strategy will be a requirement for testing the multi-model feature in the llamacpp_python model_server.

Initial idea was discussed in the thread beginning with: https://redhat-internal.slack.com/archives/C06S75ZF9JT/p1713089733094399?thread_ts=1712828397.645709&cid=C06S75ZF9JT .

We plan to implement this after Release 1.0 so as to not interfere, but the POC can be developed and run alongside our workloads leading up to and during release.

/assign @lmilbaum
/assign @Gregory-Pereira

Add Whisper Client App

Once #63 is complete and we have an API for the whisper model service, we should add an app that allows a user to upload an audio file and get a text response.

Ideally, we'd like to be able to send the output of this model to the summarizer or rag app to interact with it further.

Add audio file converter

whisper.cpp expects 16 kHz mono 16-bit PCM WAV input, so we should add a tool into the whisper_playground workflow that does this file conversion for the user, e.g.:

ffmpeg -i <input.mp3> -ar 16000 -ac 1 -c:a pcm_s16le <output.wav>
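A thin helper around that command could live in the playground app. A minimal sketch, assuming ffmpeg is on PATH (the function name and placement are placeholders):

import subprocess

def convert_to_wav(input_path: str, output_path: str) -> None:
    """Convert an audio file to 16 kHz mono 16-bit PCM WAV for whisper.cpp."""
    subprocess.run(
        ["ffmpeg", "-i", input_path,
         "-ar", "16000",       # 16 kHz sample rate
         "-ac", "1",           # mono
         "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
         output_path],
        check=True,
    )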

docs reference XDG_RUNTIME_DIR

The docs mention XDG_RUNTIME_DIR for mounting the auth.json file:

podman build --build-arg "sshpubkey=$(cat ~/.ssh/id_rsa.pub)" \
           --security-opt label=disable \
	   -v ${XDG_RUNTIME_DIR}/containers/auth.json:/run/containers/0/auth.json \
	   --cap-add SYS_ADMIN \
	   -t quay.io/yourrepo/youros:tag .

The docs also mention building on the Mac, but there the file resides in $HOME/.config/containers/auth.json.

I think the docs need some tweaking to be portable across Linux/Mac/Win.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Awaiting Schedule

These updates are awaiting their schedule. Click on a checkbox to get an update now.

  • chore(deps): update dependency safetensors to v0.4.4

Edited/Blocked

These updates have been manually edited so Renovate will no longer make changes. To discard all commits and start over, click on a checkbox.

  • chore(deps): update all dependencies (Pygments, attrs, aws-actions/configure-aws-credentials, blinker, cachetools, certifi, containers.podman, fastapi, huggingface-hub, jsonschema, langchain, mwader/static-ffmpeg, numpy, nvcr.io/nvidia/cuda, packaging, pillow, pip, protobuf, pyarrow, pydantic, pydantic_core, pydeck, pypdf, pytest, quay.io/containers/podman, referencing, regex, requests, rpds-py, selenium, sentence-transformers, starlette, streamlit, sympy, tenacity, torch, torchvision, transformers, typing_extensions)

Detected dependencies

ansible-galaxy
recipes/natural_language_processing/chatbot/provision/requirements.yml
  • containers.podman 1.13.0
recipes/natural_language_processing/codegen/provision/requirements.yml
  • containers.podman 1.13.0
recipes/natural_language_processing/rag/provision/requirements.yml
  • containers.podman 1.13.0
recipes/natural_language_processing/summarizer/provision/requirements.yml
  • containers.podman 1.13.0
dockerfile
.devcontainer/Containerfile
  • quay.io/containers/podman v5.0.2
convert_models/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
eval/promptfoo/base/Containerfile
  • registry.access.redhat.com/ubi9/nodejs-20-minimal 1-57
model_servers/llamacpp_python/base/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
model_servers/llamacpp_python/cuda/Containerfile
model_servers/llamacpp_python/vulkan/amd64/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
model_servers/llamacpp_python/vulkan/arm64/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
model_servers/object_detection_python/base/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
model_servers/ollama/base/Containerfile
model_servers/whispercpp/base/Containerfile
  • mwader/static-ffmpeg 6.1.1
  • mwader/static-ffmpeg 6.1.1
models/Containerfile
  • registry.access.redhat.com/ubi9/ubi-micro 9.4-13
recipes/audio/audio_to_text/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/audio/audio_to_text/bootc/Containerfile
recipes/audio/audio_to_text/bootc/Containerfile.nocache
recipes/computer_vision/object_detection/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/multimodal/image_understanding/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/natural_language_processing/chatbot/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/natural_language_processing/chatbot/bootc/Containerfile
recipes/natural_language_processing/chatbot/bootc/Containerfile.nocache
recipes/natural_language_processing/codegen/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/natural_language_processing/codegen/bootc/Containerfile
recipes/natural_language_processing/codegen/bootc/Containerfile.nocache
recipes/natural_language_processing/rag/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/natural_language_processing/rag/bootc/Containerfile
recipes/natural_language_processing/rag/bootc/Containerfile.nocache
recipes/natural_language_processing/summarizer/app/Containerfile
  • registry.access.redhat.com/ubi9/python-311 1-72.1722518949
recipes/natural_language_processing/summarizer/bootc/Containerfile
recipes/natural_language_processing/summarizer/bootc/Containerfile.nocache
training/amd-bootc/Containerfile
training/common/driver-toolkit/Containerfile
training/deepspeed/Containerfile
  • nvcr.io/nvidia/cuda 12.1.1-cudnn8-devel-ubi9
training/intel-bootc/Containerfile
training/model/Containerfile
training/nvidia-bootc/Containerfile
training/vllm/Containerfile
vector_dbs/chromadb/Containerfile
vector_dbs/milvus/Containerfile
github-actions
.github/workflows/chatbot.yaml
  • actions/checkout v4.1.7
  • actions/setup-python v5.1.1
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • registry 2.8.3
  • ubuntu 24.04
.github/workflows/codegen.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • actions/setup-python v5.1.1
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • registry 2.8.3
  • ubuntu 24.04
.github/workflows/instructlab.yaml
  • actions/checkout v4.1.7
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • slackapi/slack-github-action v1.26.0
.github/workflows/manual_build_trigger.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2.8
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2.8
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1
  • redhat-actions/push-to-registry v2.8
  • ubuntu 24.04
  • ubuntu 24.04
  • ubuntu 24.04
  • ubuntu 24.04
  • ubuntu 24.04
  • ubuntu 24.04
  • ubuntu 24.04
  • ubuntu 24.04
.github/workflows/mirror_repository.yaml
  • actions/checkout v4.1.7
  • pixta-dev/repository-mirroring-action v1.1.1
  • slackapi/slack-github-action v1.26.0
  • ubuntu 24.04
.github/workflows/model_converter.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • ubuntu 24.04
.github/workflows/model_servers.yaml
  • actions/checkout v4.1.7
  • actions/setup-python v5.1.1
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • registry 2.8.3
  • ubuntu 24.04
.github/workflows/models.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • ubuntu 24.04
.github/workflows/object_detection.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • actions/setup-python v5.1.1
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • registry 2.8.3
  • ubuntu 24.04
.github/workflows/rag.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • actions/setup-python v5.1.1
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • registry 2.8.3
  • ubuntu 24.04
.github/workflows/summarizer.yaml
  • actions/checkout v4.1.7
  • redhat-actions/buildah-build v2.13
  • actions/setup-python v5.1.1
  • redhat-actions/podman-login v1.7
  • redhat-actions/push-to-registry v2.8
  • registry 2.8.3
  • ubuntu 24.04
.github/workflows/test-trace-steps.yaml
  • actions/checkout v4.1.7
  • actions/setup-python v5.1.1
  • ubuntu 24.04
.github/workflows/testing_framework.yaml
  • actions/checkout v4.1.7
  • actions/setup-python v5.1.1
  • actions/checkout v4.1.7
  • hashicorp/setup-terraform v3.1.1
  • slackapi/slack-github-action v1.26.0
  • redhat-actions/podman-login v1.7
  • slackapi/slack-github-action v1.26.0
  • actions/checkout v4.1.7
  • actions/setup-python v5.1.1
  • slackapi/slack-github-action v1.26.0
  • ubuntu 24.04
  • ubuntu 24.04
.github/workflows/training-e2e.yaml
  • actions/checkout v4.1.7
  • actions/checkout v4.1.7
  • hashicorp/setup-terraform v3.1.1
  • mxschmitt/action-tmate v3.18
  • slackapi/slack-github-action v1.26.0
  • ubuntu 24.04
.github/workflows/training_bootc.yaml
  • aws-actions/configure-aws-credentials v1
  • machulav/ec2-github-runner v2
  • actions/checkout v4.1.7
  • redhat-actions/push-to-registry v2.8
  • redhat-actions/push-to-registry v2.8
  • slackapi/slack-github-action v1.26.0
  • actions/checkout v4.1.7
  • redhat-actions/push-to-registry v2.8
  • slackapi/slack-github-action v1.26.0
  • actions/checkout v4.1.7
  • redhat-actions/push-to-registry v2.8
  • slackapi/slack-github-action v1.26.0
  • aws-actions/configure-aws-credentials v1
  • machulav/ec2-github-runner v2
pip_requirements
convert_models/requirements.txt
model_servers/llamacpp_python/src/requirements.txt
  • llama-cpp-python ==0.2.85
  • transformers ==4.41.2
  • pip ==24.0
model_servers/object_detection_python/src/requirements-unlocked.txt
model_servers/object_detection_python/src/requirements.txt
  • annotated-types ==0.7.0
  • anyio ==4.4.0
  • certifi ==2024.6.2
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • dnspython ==2.6.1
  • email_validator ==2.2.0
  • fastapi ==0.111.1
  • fastapi-cli ==0.0.5
  • filelock ==3.15.4
  • fsspec ==2024.6.1
  • h11 ==0.14.0
  • httpcore ==1.0.5
  • httptools ==0.6.1
  • httpx ==0.27.0
  • huggingface-hub ==0.23.4
  • idna ==3.7
  • Jinja2 ==3.1.4
  • markdown-it-py ==3.0.0
  • MarkupSafe ==2.1.5
  • mdurl ==0.1.2
  • mpmath ==1.3.0
  • networkx ==3.3
  • numpy ==2.0.1
  • orjson ==3.10.6
  • packaging ==24.1
  • pillow ==10.3.0
  • pydantic ==2.7.4
  • pydantic_core ==2.18.4
  • Pygments ==2.18.0
  • python-dotenv ==1.0.1
  • python-multipart ==0.0.9
  • PyYAML ==6.0.1
  • regex ==2024.5.15
  • requests ==2.32.3
  • rich ==13.7.1
  • safetensors ==0.4.3
  • shellingham ==1.5.4
  • sniffio ==1.3.1
  • starlette ==0.37.2
  • sympy ==1.12.1
  • timm ==1.0.8
  • tokenizers ==0.19.1
  • torch ==2.3.1
  • torchvision ==0.18.1
  • tqdm ==4.66.5
  • transformers ==4.41.2
  • typer ==0.12.3
  • typing_extensions ==4.12.2
  • ujson ==5.10.0
  • urllib3 ==2.2.2
  • uvicorn ==0.30.5
  • uvloop ==0.19.0
  • watchfiles ==0.22.0
  • websockets ==12.0
recipes/audio/audio_to_text/app/requirements.txt
recipes/computer_vision/object_detection/app/requirements.txt
  • altair ==5.3.0
  • attrs ==23.2.0
  • blinker ==1.7.0
  • cachetools ==5.3.3
  • certifi ==2024.2.2
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • gitdb ==4.0.11
  • GitPython ==3.1.43
  • idna ==3.7
  • Jinja2 ==3.1.4
  • jsonschema ==4.21.1
  • jsonschema-specifications ==2023.12.1
  • markdown-it-py ==3.0.0
  • MarkupSafe ==2.1.5
  • mdurl ==0.1.2
  • numpy ==1.26.4
  • packaging ==24.0
  • pandas ==2.2.2
  • pillow ==10.3.0
  • protobuf ==4.25.3
  • pyarrow ==15.0.2
  • pydeck ==0.8.1b0
  • Pygments ==2.17.2
  • python-dateutil ==2.9.0.post0
  • pytz ==2024.1
  • referencing ==0.34.0
  • requests ==2.31.0
  • rich ==13.7.1
  • rpds-py ==0.18.1
  • six ==1.16.0
  • smmap ==5.0.1
  • streamlit ==1.33.0
  • tenacity ==8.2.3
  • toml ==0.10.2
  • toolz ==0.12.1
  • tornado ==6.4.1
  • typing_extensions ==4.11.0
  • tzdata ==2024.1
  • urllib3 ==2.2.2
recipes/multimodal/image_understanding/app/requirements.txt
recipes/natural_language_processing/chatbot/app/requirements.txt
  • langchain ==0.2.3
  • langchain-openai ==0.1.7
  • langchain-community ==0.2.4
  • streamlit ==1.34.0
recipes/natural_language_processing/codegen/app/requirements.txt
  • langchain ==0.1.20
  • langchain-openai ==0.1.7
  • streamlit ==1.34.0
recipes/natural_language_processing/rag/app/requirements.txt
  • langchain-openai ==0.1.7
  • langchain ==0.1.20
  • chromadb ==0.5.5
  • sentence-transformers ==2.7.0
  • streamlit ==1.34.0
  • pypdf ==4.2.0
  • pymilvus ==2.4.1
recipes/natural_language_processing/summarizer/app/requirements.txt
  • langchain ==0.1.20
  • langchain-openai ==0.1.7
  • streamlit ==1.34.0
  • PyMuPDF ==1.24.9
  • rouge_score ==0.1.2
requirements-test.txt
  • pip ==24.0
  • pytest-container ==0.4.2
  • pytest-selenium ==4.1.0
  • pytest-testinfra ==10.1.1
  • pytest ==8.2.2
  • requests ==2.31.0
  • selenium ==4.20.0
  • tenacity ==8.2.3

update descriptions of recipes

The AI Lab extension displays the descriptions of the recipes, which at present are quite repetitive:

  • They all start with "This is a[n] ..."
  • They all claim to be a "demo application", which doesn't sound fit for production
  • I'd prefer a short description of the use cases. For instance, "Summarizer: Summarize text files in a web front end." (or something similar).

Screenshot 2024-04-22 at 09 47 35

Release coordination for release `1.0`

We will be releasing version 1.0 to match up with Podman Desktop AI Lab.
Release Date: Wednesday, March 17th

Release Criteria:

  • Brief change log or summary of issues tackled and new features
    • (OPTIONAL): A comparison of where we are in relation to the Podman Desktop team
  • A commit is made to trigger every workflow in the repo with tests passing
  • Potentially switch to the granite model by default, if it is released?

...
Thoughts @lmilbaum @sallyom @MichaelClifford @rhatdan ?

confusing recipe descriptions for Podman Desktop users

We received user feedback that the recipe descriptions can be confusing. Let's take the ChatBot as an example (see screenshot below). The description does not target the desktop use case but the terminal/CLI one, so there is some friction.

Since the AI Lab Recipes target both use cases, I guess the READMEs should first target the desktop use case and then the CLI one, with clear descriptions of the target audience?

Screenshot 2024-04-11 at 10 59 31

object_detection recipe does not start in AI Lab

The object_detection_server fails to start with the following error:

Traceback (most recent call last):
  File "/opt/app-root/bin/uvicorn", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/main.py", line 409, in main
    run(
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/main.py", line 575, in run
    server.run()
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/server.py", line 65, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/server.py", line 69, in serve
    await self._serve(sockets)
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/server.py", line 76, in _serve
    config.load()
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/config.py", line 433, in load
    self.loaded_app = import_from_string(self.app)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/uvicorn/importer.py", line 19, in import_from_string
    module = importlib.import_module(module_str)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/locallm/object_detection_server.py", line 15, in <module>
    processor = AutoImageProcessor.from_pretrained(model, revision=revision)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/transformers/models/auto/image_processing_auto.py", line 358, in from_pretrained
    config_dict, _ = ImageProcessingMixin.get_image_processor_dict(pretrained_model_name_or_path, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/transformers/image_processing_utils.py", line 363, in get_image_processor_dict
    text = reader.read()
           ^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

Add API to whisper_playground model service

Now that we have a whispercpp image that can be used to convert audio to text, we want to use it the same way we use the existing playground image: as a model service with an exposed API. Unlike llamacpp_python, whisper.cpp does not appear to come with a prebuilt API server for the model, so we should create a lightweight API for this model type; a sketch follows.
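A rough sketch of what that lightweight API could look like with FastAPI (already used elsewhere in this repo). The whisper.cpp binary and model paths are assumptions, not the actual image layout:

import subprocess
import tempfile

from fastapi import FastAPI, File, UploadFile

app = FastAPI()

WHISPER_BIN = "/app/whisper.cpp/main"     # assumed location of the whisper.cpp CLI
MODEL_PATH = "/app/models/ggml-base.bin"  # assumed location of the model

@app.post("/inference")
async def transcribe(file: UploadFile = File(...)):
    # whisper.cpp expects 16 kHz mono WAV; conversion is handled upstream.
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(await file.read())
        tmp.flush()
        result = subprocess.run(
            [WHISPER_BIN, "-m", MODEL_PATH, "-f", tmp.name, "--no-timestamps"],
            capture_output=True, text=True, check=True,
        )
    return {"text": result.stdout.strip()}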

bug(llamacpp): GPU access blocked by the operating system

System information

  • Windows 11
  • WSL version: 2.1.5.0

Details

To access the GPU in a container while using podman on Windows, we have to run a few extra commands.¹

In the following example, we are using official NVIDIA images, not images provided by this repository:

podman run \
    --device=/dev/dxg \
    --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl \
    --gpus all \
    --entrypoint=sh \
    docker.io/nvidia/cuda:12.3.1-devel-ubuntu22.04 \
    -c '/usr/bin/ln -s /usr/lib/wsl/lib/* /usr/lib/x86_64-linux-gnu/ && PATH="${PATH}:/usr/lib/wsl/lib/" && nvidia-smi'

This will output the following on my system

Fri Mar 22 17:42:48 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.53       Driver Version: 497.29       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:08:00.0  On |                  N/A |
| 34%   38C    P0    28W / 120W |   1344MiB /  6144MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Using model_servers/llamacpp/cuda/Containerfile

In the following, I will try to reproduce the GPU access I had with the official NVIDIA images, using the one from this repository.

Reproduce

git clone https://github.com/redhat-et/locallm
cd locallm\model_servers\llamacpp\
podman build --platform linux/amd64 -t chatbot:service-cuda -f cuda/Containerfile .

Once I have the chatbot:service-cuda image, I run the same scenario as with the previous Ubuntu-based image:

podman run \
    --device=/dev/dxg \
    --mount type=bind,source=/usr/lib/wsl,target=/usr/lib/wsl \
    --gpus all \
    --entrypoint=sh \
    localhost/chatbot:service-cuda \
    -c '/usr/bin/ln -s /usr/lib/wsl/lib/* /usr/lib/x86_64-linux-gnu/ && PATH="${PATH}:/usr/lib/wsl/lib/" && nvidia-smi'

but this operation results in the following error:

ln: target '/usr/lib/x86_64-linux-gnu/': No such file or directory

Footnotes

  1. https://github.com/microsoft/WSL/issues/8666#issue-1322829203

Containerfile.nocache won't build w/ podman system reset command

STEP 11/11: RUN podman system reset --force 2>/dev/null
Error: building at STEP "RUN podman system reset --force 2>/dev/null": while running runtime: exit status 125

Looking at the Containerfile, I'm not sure we need this line. It fails for me regardless of ARCH=aarch64 or x86_64, and it seems to work when I drop it. I'm testing with make FROM=registry.redhat.io/rhel9-beta/rhel-bootc:9.4 CONTAINERFILE=Containerfile.nocache bootc with the summarizer app.

Restructuring Proposal

I would like to change the structure of this repo a bit to make it more intuitive and easier to maintain as it continues to grow. Below please find a sketch of the proposed structure.

The main changes would be:

  • playground/ would be moved into model_servers/ and renamed llamacpp_python/
  • A new directory called recipes/ with N subdirectories for each recipe category. Right now that is natural_language_processing/, computer_vision/, audio/ and multimodal/.
  • All existing recipes will be moved to the correct subdirectory.
  • A new directory called VectorDBs/, with the intent that any new component types arising in the future are added to the root directory in the same way.
  • Move chromadb/ into VectorDBs/

Please let me know what you all think of the proposed changes and how they might affect any dependencies on this repo.

locallm:
|-- assets
    |-- image.1.png
    |-- image2.png
    |-- ...
|-- data
    |-- file.txt
    |-- ...
|-- models
    |-- Model1Folder
    |-- Model2Folder
    |-- Convert_Models
    |-- ...
|-- model_servers
    |-- llamacpp_python
        |-- base
        |-- cuda
        |-- vulkan
        |-- ....
    |-- ollama
        |-- base
        |-- ....
    |-- caikit
        |-- base
        |-- ....
    |-- WhisperCPP
        |-- base
        |-- ....
    |-- ...
|-- recipes
    |-- Natural Language
        |-- Chat
        |-- Summary
        |-- RAG
        |-- Fine Tune
        |-- ...
    |-- Computer Vision
        |-- Object Detection
        |-- ...
    |-- Audio
        |-- Transcription
        |-- ...
    |-- Multimodal
        |-- Image Description
        |-- ...
|-- VectorDBs
    |-- ChromaDB
    |-- ...

Text to Json Recipe

A recipe that does the following:

Given a small set of JSON schemas (OK to start with 1), create an LLM application where a user inputs their responses to the fields of the JSON schema using unstructured natural language and the model returns a correctly populated JSON document.

A simple example to illustrate how this should work would be placing an order at a fast food restaurant. The user simply states their order, and the LLM generates the appropriate JSON that can be submitted to the order management system.
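A rough sketch of the core interaction, hitting the OpenAI-compatible endpoint the llamacpp_python model server already exposes on port 8001 (the URL, model name, and schema here are illustrative assumptions):

import json

from openai import OpenAI

# Hypothetical schema for the fast-food example.
ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "items": {"type": "array", "items": {"type": "string"}},
        "to_go": {"type": "boolean"},
    },
}

client = OpenAI(base_url="http://localhost:8001/v1", api_key="none")
response = client.chat.completions.create(
    model="mistral-7b-instruct",
    messages=[
        {"role": "system",
         "content": "Fill out this JSON schema from the user's order and "
                    "reply with JSON only: " + json.dumps(ORDER_SCHEMA)},
        {"role": "user",
         "content": "Two cheeseburgers and a large coke, to go please."},
    ],
)
order = json.loads(response.choices[0].message.content)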

Add links to the repos (make files) in each of the AI recipe readme

Discussed during the Weekly Demo on Apr 15. It's challenging for a user running an AI recipe in Podman Desktop to find the Makefiles needed to convert the images with bootc, so adding links to the repo's Makefiles in each recipe's README should be helpful.

llamacpp_python container may not work on all CPUs

llama-cpp detects CPU features like AVX, AVX2, FMA3, and F16C at build time. If the container is built on a machine that supports these instruction sets, then the binary won't work on CPUs without these instructions.

RUN pip install --upgrade pip
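# NOTE: CMAKE_ARGS below is read when llama-cpp-python is compiled, so the CPU
# features (AVX/AVX2/FMA/F16C) of the build machine get baked into the wheel.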
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=on"
ENV FORCE_CMAKE=1
RUN pip install --no-cache-dir --upgrade -r /locallm/requirements.txt

References:

Credits to @bbrowning for figuring this out. He suggested building llama-cpp-python with CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF"

Deploy Chatbot to the ET cluster in a persistent way

/cc @MichaelClifford

Initial idea: Deploy the chatbot LLM to the ET cluster so we can start using our own tools and ideating on how we test and evaluate model performance.

Spinoff Idea: Might be cool to also do this with the RAG application once it's ready, so we can evaluate RAG cloud vs edge models and how we reconcile/update the two with data.

Stretch goal: It would be cool if we can deploy chatbot to review readme changes in this repo.

Example with Guard Rails

  • Research a few tools + methods for implementing guard rails for LLM applications.
  • Add a recipe that provides developers with an example of implementing guard rails, as sketched below.
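A toy illustration of the second point (no particular guard-rails library implied; the blocklist pattern is a placeholder):

import re

# Placeholder pattern; a real recipe would draw on a proper guard-rails toolkit.
BLOCKLIST = [r"(?i)ignore (all )?previous instructions"]

def check_prompt(prompt: str) -> str:
    """Reject prompts that trip a guard rail before they reach the model."""
    for pattern in BLOCKLIST:
        if re.search(pattern, prompt):
            raise ValueError("Prompt rejected by guard rail")
    return prompt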
