A Hybrid RAG Project on AI Workbench

This is an NVIDIA AI Workbench project for developing a Retrieval Augmented Generation application with a customizable Gradio Chat app. It lets you:

  • Embed your documents into a locally running vector database.
  • Run inference locally on a Hugging Face TGI server, in the cloud using NVIDIA inference endpoints, or self-hosted using NVIDIA Inference Microservices (NIMs):
    • 4-bit, 8-bit, and no quantization options are supported for locally running models served by TGI.
    • Other models may be specified to run locally using their Hugging Face tag.
    • The locally running microservice option is supported for Docker users only.

Table 1: Default Supported Models by Inference Mode

| Model                       | Local Inference (TGI) | Cloud Endpoints | Microservices (Local, Remote) |
| Llama3-ChatQA-1.5           |                       | Y               | *                             |
| Mistral-7B-Instruct-v0.1    | Y (gated)             |                 | *                             |
| Mistral-7B-Instruct-v0.2    | Y (gated)             | Y               | *                             |
| Mistral-Large               |                       | Y               | *                             |
| Mixtral-8x7B-Instruct-v0.1  |                       | Y               | *                             |
| Mixtral-8x22B-Instruct-v0.1 |                       | Y               | *                             |
| Llama-2-7B-Chat             | Y (gated)             |                 | *                             |
| Llama-2-13B-Chat            |                       |                 | *                             |
| Llama-2-70B-Chat            |                       | Y               | *                             |
| Llama-3-8B-Instruct         | Y (gated)             | Y               | Y (default) *                 |
| Llama-3-70B-Instruct        |                       | Y               | *                             |
| Gemma-2B                    |                       | Y               | *                             |
| Gemma-7B                    |                       | Y               | *                             |
| CodeGemma-7B                |                       | Y               | *                             |
| Phi-3-Mini-4k-Instruct      |                       | Y               | *                             |
| Phi-3-Mini-128k-Instruct    | Y                     | Y               | *                             |
| Phi-3-Small-8k-Instruct     |                       | Y               | *                             |
| Phi-3-Small-128k-Instruct   |                       | Y               | *                             |
| Phi-3-Medium-4k-Instruct    |                       | Y               | *                             |
| Arctic                      |                       | Y               | *                             |
| Granite-8B-Code-Instruct    |                       | Y               | *                             |
| Granite-34B-Code-Instruct   |                       | Y               | *                             |

*NIM containers for LLMs are starting to roll out under General Availability (GA). If you set up any accessible language model NIM running on another system, it is supported under Remote NIM inference inside this project. For Local NIM inference, this project provides a flow for setting up the default meta/llama3-8b-instruct NIM locally as an example. Advanced users may choose to swap this NIM Container Image out with other NIMs as they are released.

Quickstart

This section demonstrates how to use this project to run RAG via NVIDIA Inference Endpoints hosted on the NVIDIA API Catalog. For other inference options, including local inference, see the Advanced Tutorials section for setup and instructions.

Prerequisites

  • An NGC account is required to generate an NVCF run key.
  • A valid NVCF key is required to access NVIDIA API endpoints. Generate a key from any NVIDIA API catalog model card by clicking "Get API Key".
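
Optionally, you can sanity-check your key before starting by hitting the catalog's OpenAI-compatible endpoint directly. This is a sketch rather than a project step; the endpoint URL and model name below are assumptions based on the NVIDIA API catalog's current interface:

  # Send a one-off chat completion with your NVCF run key. A JSON
  # completion back means the key works; a 401 means it does not.
  export NVCF_RUN_KEY="nvapi-..."   # your key here
  curl -s https://integrate.api.nvidia.com/v1/chat/completions \
    -H "Authorization: Bearer $NVCF_RUN_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "meta/llama3-8b-instruct",
         "messages": [{"role": "user", "content": "Say hello."}],
         "max_tokens": 32}'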

Tutorial: Using a Cloud Endpoint

  1. Install and configure AI Workbench locally, then open AI Workbench and select a location of your choice.
  2. Fork this repo into your own GitHub account.
  3. Inside AI Workbench:
    • Click Clone Project and enter the repo URL of your newly-forked repo.
    • AI Workbench will automatically clone the repo and build out the project environment, which can take several minutes to complete.
    • Upon Build Complete, navigate to Environment > Secrets > NVCF_RUN_KEY > Configure and paste in your NVCF run key as a project secret.
    • Select Open Chat on the top right of the AI Workbench window, and the Gradio app will open in a browser.
  4. In the Gradio Chat app:
    • Click Set up RAG Backend. This triggers a one-time backend build which can take a few moments to initialize.
    • Select the Cloud option, select a model family and model name, and submit a query.
    • To perform RAG, select Upload Documents Here from the right-hand panel of the chat UI.
      • You may see a warning that the vector database is not ready yet. If so, wait a moment and try again.
    • When the database starts, select Click to Upload and choose the text files to upload.
    • Once the files upload, the Toggle to Use Vector Database next to the text input box will turn on.
    • Now query your documents! What are they telling you?
    • To change the endpoint, choose a different model from the dropdown on the right-hand settings panel and continue querying.

Next Steps:

  • If you get stuck, check out the "Troubleshooting" section.
  • For tutorials on other supported inference modes, check out the "Advanced Tutorials" section below. Note: All subsequent tutorials will assume NVCF_RUN_KEY is already configured with your credentials.

NVIDIA AI Workbench

Note: NVIDIA AI Workbench is the easiest way to get this RAG app running.

  • NVIDIA AI Workbench is a free client application that you can install on your own machines.
  • It provides portable and reproducible dev environments by handling Git repos and containers for you.
  • Installing on a local system? Check out our guides for Windows, local Ubuntu 22.04, and macOS 12 or higher.
  • Installing on a remote system? Check out our guide for Remote Ubuntu 22.04

Troubleshooting

Need help? Submit any questions, bugs, feature requests, and feedback at the Developer Forum for AI Workbench. The dedicated thread for this Hybrid RAG example project is located here.

How do I open AI Workbench?

  • Make sure you installed AI Workbench. There should be a desktop icon on your system. Double click it to start AI Workbench.

How do I clone this repo with AI Workbench?

  • Make sure you have opened AI Workbench.

  • Click on the Local location (or whatever location you want to clone into).

  • If this is your first project, click the green Clone Existing Project button.

    • Otherwise, click Clone Project in the top right.
  • Drop in the repo URL, leave the default path, and click Clone.

I've cloned the project, but now nothing seems to be happening?

  • The container is likely building and can take several minutes.

  • Look at the very bottom of the Workbench window; you will see a Build Status widget.

  • Click it to expand the build output.

  • When the container is built, the widget will say Build Ready.

  • Now you can begin.

How do I start the Chat application?

  • Check that the container finished building.

  • When it finishes, click the green Open Chat button at the top right.

Something went wrong, how do I debug the Chat application?

  • Look at the bottom left of the AI Workbench window; you will see an Output widget.

  • Click it to expand the output.

  • Expand the dropdown and navigate to Applications > Chat.

  • You can now view all debug messages in the Output window in real time.

How can I customize this project with AI Workbench?

  • Check that the container is built.

  • Then click the green dropdown next to the Open Chat button at the top right.

  • Select JupyterLab to start editing the code. Alternatively, you may configure VSCode support here.

Advanced Tutorials

This section shows you how to use different inference modes with this RAG project. For these tutorials, a GPU with at least 12 GB of vRAM is recommended. If you don't have one, go back to the Quickstart tutorial, which shows how to use Cloud Endpoints.

Tutorial 1: Using a Local GPU

This tutorial assumes you already cloned this Hybrid RAG project to your AI Workbench. If not, please follow the beginning of the Quickstart Tutorial.

Additional Configurations

Ungated Models

The following models are ungated and can be accessed, downloaded, and run locally inside the project with no additional configuration: those not marked "(gated)" in the Local Inference (TGI) column of Table 1.

Gated Models

Some additional configuration in AI Workbench is required to run certain listed models. Unlike the previous tutorials, these configs are not added to the project by default, so follow these instructions closely to ensure a proper setup. Namely, a Hugging Face API token is required to run gated models locally. See how to create a token here.

The models marked "(gated)" in Table 1 are gated. Verify that "You have been granted access to this model" appears on the model card for any model you are interested in running locally.

Then, complete the following steps:

  1. If the project is already running, shut down the project environment under Environment > Stop Environment. This ensures that restarting the environment will incorporate all of the configurations below.
  2. In AI Workbench, add the following entries under Environment > Secrets.
    • Your Hugging Face Token: This is used to clone the model weights locally from Hugging Face.
      • Name: HUGGING_FACE_HUB_TOKEN
      • Value: (Your HF API Key)
      • Description: HF Token for cloning model weights locally
  3. Rebuild the project if needed to incorporate changes. (A quick token check is sketched below.)
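
If you want to confirm the token works outside of AI Workbench, the Hugging Face Hub exposes a whoami endpoint. A minimal check, not a project step:

  # Expect your account JSON back; a 401 response means the token is bad.
  curl -s -H "Authorization: Bearer $HUGGING_FACE_HUB_TOKEN" \
    https://huggingface.co/api/whoami-v2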

Note: All subsequent tutorials will assume both NVCF_RUN_KEY and HUGGING_FACE_HUB_TOKEN are already configured with your credentials.

Inference

  1. Select the green Open Chat button on the top right of the AI Workbench project window.
  2. Once the UI opens, click Set up RAG Backend. This triggers a one-time backend build which can take a few moments to initialize.
  3. Select the Local System inference mode under Inference Settings > Inference Mode.
  4. Select a model from the dropdown on the right-hand settings panel. You can filter by gated vs. ungated models for convenience.
    • Ensure you have proper access permissions for the model; instructions are here.
    • You can also input a custom model from Hugging Face, following the same format. Be careful: not all models and quantization levels may be supported by the current TGI version.
  5. Select a quantization level. The recommended precision for your system is pre-selected, but full, 8-bit, and 4-bit bitsandbytes precision levels are currently supported (see the launch sketch after this list).
Table 2: System Resources vs. Model Size and Quantization

| vRAM    | System RAM | Disk Storage | Model Size & Quantization |
| >=12 GB | 32 GB      | 40 GB        | 7B & int4                 |
| >=24 GB | 64 GB      | 40 GB        | 7B & int8                 |
| >=40 GB | 64 GB      | 40 GB        | 7B & none                 |
  6. Select Load Model to pre-fetch the model. The initial download of the model to the project cache can take several minutes; subsequent loads will detect the cached model.
  7. Select Start Server to start the inference server on your current local GPU. This may take a moment to warm up.
  8. Now, start chatting! Queries will be made to the model running on your local system whenever this inference mode is selected.
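
Roughly speaking, the quantization choice above maps onto TGI's --quantize flag. Below is a minimal launch sketch for reference only; the model ID and port are illustrative, and the project's actual launch arguments may differ:

  # Launch TGI with 4-bit bitsandbytes quantization (a sketch, not the
  # project's exact invocation). Omit --quantize for full precision, or
  # use --quantize bitsandbytes for int8.
  text-generation-launcher \
    --model-id mistralai/Mistral-7B-Instruct-v0.2 \
    --quantize bitsandbytes-nf4 \
    --port 8080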

Using RAG

  1. In the right-hand panel of the Chat UI, select Upload Documents Here. Click to upload or drag and drop the desired text files.
    • You may see a warning that the vector database is not ready yet. If so, wait a moment and try again.
  2. Once the files upload, the Toggle to Use Vector Database next to the text input box will turn on by default.
  3. Now query your documents! To use a different model, stop the server, make your selections, and restart the inference server.

Tutorial 2: Using a Remote Microservice

This tutorial assumes you already cloned this Hybrid RAG project to your AI Workbench. If not, please follow the beginning of the Quickstart Tutorial.

Additional Configurations

  • Set up your NVIDIA Inference Microservice (NIM) to run self-hosted on another system of your choice. The playbook to get started is located here. Remember the model name (if not the default meta/llama3-8b-instruct) and the IP address of the running microservice. Ports for NIMs generally default to 8000.
  • Alternatively, you may set up any other third-party service that supports the OpenAI API specification, such as Ollama. Remember the model name, port, and IP address when you set this up. (A request sketch follows this list.)
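
Before pointing the chat app at a remote endpoint, it can help to confirm the microservice answers an OpenAI-style request. A sketch assuming the NIM defaults noted above; <remote-ip> stands in for your system's address:

  # Minimal OpenAI-compatible request against a remote NIM (or any
  # OpenAI-spec server). Adjust the port and model name if yours differ.
  curl -s http://<remote-ip>:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta/llama3-8b-instruct",
         "messages": [{"role": "user", "content": "Say hello."}],
         "max_tokens": 32}'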

Inference

  1. Select the green Open Chat button on the top right of the AI Workbench project window.
  2. Once the UI opens, click Set up RAG Backend. This triggers a one-time backend build which can take a few moments to initialize.
  3. Select the Self-hosted Microservice inference mode under Inference Settings > Inference Mode.
  4. Select the Remote tab in the right-hand settings panel. Input the IP address of the accessible system running the microservice, the port if different from the NIM default of 8000, and the model name to run if different from the default meta/llama3-8b-instruct.
  5. Now start chatting! Queries will be made to the microservice running on a remote system whenever this inference mode is selected.

Using RAG

  1. In the right-hand panel of the Chat UI, select Upload Documents Here. Click to upload or drag and drop the desired text files.
    • You may see a warning that the vector database is not ready yet. If so, wait a moment and try again.
  2. Once uploaded successfully, the Toggle to Use Vector Database should turn on by default next to your text input box.
  3. Now you may query your documents!

Tutorial 3: Using a Local Microservice

This tutorial assumes you already cloned this Hybrid RAG project to your AI Workbench. If not, please follow the beginning of the Quickstart Tutorial.

Here are some important PREREQUISITES:

  • Your AI Workbench must be running with a Docker container runtime. Podman is unsupported.
  • You must have access to the NeMo Inference Microservice (NIM) General Availability Program.
  • Shut down any other processes running locally on the GPU, as these may cause memory issues when running the microservice locally (see the check below).
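
To see what is currently occupying the GPU, a standard nvidia-smi query works; this is a quick sanity check, not a project requirement:

  # List compute processes and their GPU memory usage.
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv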

Additional Configurations

Some additional configuration in AI Workbench is required for this tutorial. Unlike the previous tutorials, these configs are not added to the project by default, so follow these instructions closely to ensure a proper setup.

  1. If running, shut down the project environment under Environment > Stop Environment. This ensures that restarting the environment will incorporate all of the configurations below.
  2. In AI Workbench, add the following entries under Environment > Secrets:
    • Your NGC API Key: This is used to authenticate when pulling the NIM container from NGC. Remember, you must be in the General Availability Program to access this container.
      • Name: NGC_CLI_API_KEY
      • Value: (Your NGC API Key)
      • Description: NGC API Key for NIM authentication
  3. Add and/or modify the following under Environment > Variables:
    • DOCKER_HOST: the location of your Docker socket, e.g. unix:///var/host-run/docker.sock
    • LOCAL_NIM_HOME: the location where your NIM files will be stored, for example /mnt/c/Users/<my-user> for Windows or /home/<my-user> for Linux
  4. Add the following under Environment > Mounts:
    • A Docker Socket Mount: a mount of the Docker socket so the project container can properly interact with the host Docker Engine.
      • Type: Host Mount
      • Target: /var/host-run
      • Source: /var/run
      • Description: Docker socket Host Mount
    • A Filesystem Mount: a mount used to run and manage your LOCAL_NIM_HOME on the host from inside the project container for generating the model repo.
      • Type: Host Mount
      • Target: /mnt/host-home
      • Source: (Your LOCAL_NIM_HOME location), for example /mnt/c/Users/<my-user> for Windows or /home/<my-user> for Linux
      • Description: Host mount for LOCAL_NIM_HOME
  5. Rebuild the project if needed. (A sketch of what these settings wire together follows this list.)
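
For orientation, here is a rough sketch of what the settings above wire together. The project automates these steps, so the commands are purely illustrative:

  # NGC_CLI_API_KEY authenticates container pulls from NGC;
  # '$oauthtoken' is the literal username NGC expects.
  echo "$NGC_CLI_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin

  # DOCKER_HOST points tools inside the project container at the host's
  # Docker Engine through the mounted socket:
  export DOCKER_HOST=unix:///var/host-run/docker.sock

  # LOCAL_NIM_HOME (mounted at /mnt/host-home) is where model files live
  # on the host so the NIM container can read them.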

Inference

  1. Select the green Open Chat button on the top right of the AI Workbench project window.
  2. Once the UI opens, click Set up RAG Backend. This triggers a one-time backend build which can take a few moments to initialize.
  3. Select the Self-hosted Microservice inference mode under Inference Settings > Inference Mode.
  4. Select the Local sub-tab in the right-hand settings panel.
  5. Bring your NIM container image (the placeholder can be used for the default flow), and select Prefetch NIM. This one-time process can take a few moments to pull the NIM container.
  6. Select Start Microservice. This may take a few moments to complete.
  7. Now, you can start chatting! Queries will be made to your microservice running on the local system whenever this inference mode is selected. (A health-check sketch follows.)
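
To confirm the microservice came up, NIMs expose a standard health endpoint; a quick check, assuming the default port of 8000:

  # Returns HTTP 200 once the NIM is ready to serve requests.
  curl -s http://localhost:8000/v1/health/ready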

Using RAG

  1. In the right-hand panel of the Chat UI, select Upload Documents Here. Click to upload or drag and drop the desired text files.
    • You may see a warning that the vector database is not ready yet. If so, wait a moment and try again.
  2. Once uploaded successfully, the Toggle to Use Vector Database should turn on by default next to your text input box.
  3. Now you may query your documents!

Tutorial 4: Customizing the Gradio App

By default, you can customize the Gradio app using the JupyterLab container application. Alternatively, you may configure VSCode support here.

  1. In AI Workbench, select the green dropdown from the top right and select Open JupyterLab.
  2. Go into the code/chatui/ folder and start editing the files.
  3. Save the files.
  4. To see your changes, stop the Chat UI and restart it.
  5. To version your changes, commit them in the Workbench project window and push to your GitHub repo.

In addition to modifying the Gradio frontend, you can also use JupyterLab or another IDE to customize other aspects of the project, e.g. custom chains, the backend server, scripts, and configs.

License

This NVIDIA AI Workbench example project is released under the Apache 2.0 License.

This project may download and install additional third-party open source software projects. Review the license terms of these open source projects before use. Third party components used as part of this project are subject to their separate legal notices or terms that accompany the components. You are responsible for confirming compliance with third-party component license terms and requirements.


workbench-example-hybrid-rag's Issues

Build error - command not found

I have just started looking at this, but if this is meant to be an out-of-the-box demo, the build looks broken. I just did a fresh install, let Workbench deploy Podman for me, and forked this repo. The results are below.

Hit:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libgl1
E: Unable to locate package libglib2.0-0
E: Couldn't find any package by glob 'libglib2.0-0'
E: Couldn't find any package by regex 'libglib2.0-0'
E: Unable to locate package git
Error: building at STEP "RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1 libglib2.0-0 git jq": while running runtime: exit status 100

System Info

Not that it matters much, since the build appears to be failing on packages it can't locate from inside the container, but the host system is:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.4 LTS"
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy


Linux linux-desktop 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr  4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Error during build process: failed to fetch http://security.ubuntu.com/ubuntu/dists/jammy-security/InRelease

#0 building with "default" instance using docker driver

#1 [internal] load build definition from Containerfile
#1 transferring dockerfile: 1.66kB done
#1 DONE 0.0s

#2 [internal] load metadata for ghcr.io/huggingface/text-generation-inference:latest
#2 ...

#3 [auth] huggingface/text-generation-inference:pull token for ghcr.io
#3 DONE 0.0s

#2 [internal] load metadata for ghcr.io/huggingface/text-generation-inference:latest
#2 DONE 0.8s

#4 [internal] load .dockerignore
#4 transferring context: 2B done
#4 DONE 0.0s

#5 [ 1/17] FROM ghcr.io/huggingface/text-generation-inference:latest@sha256:cabd2a49a3afa7106aaa0bb3458c11904007fcd0369a83fcc1fa2fab76d444b4
#5 DONE 0.0s

#6 [internal] load build context
#6 transferring context: 3.94kB done
#6 DONE 0.0s

#7 [ 7/17] COPY --chown=1000:1000 [preBuild.bash, /opt/project/build/]
#7 CACHED

#8 [ 5/17] RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y sudo
#8 CACHED

#9 [ 2/17] WORKDIR /opt/project/build/
#9 CACHED

#10 [ 4/17] RUN useradd -u 1000 -g 1000 -rm -d /home/workbench -s /bin/bash workbench || usermod -l workbench $(getent passwd 1000 | cut -d: -f1)
#10 CACHED

#11 [ 9/17] RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y wget
#11 CACHED

#12 [ 6/17] RUN echo "workbench ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/workbench
#12 CACHED

#13 [ 3/17] RUN groupadd -g 1000 workbench || true
#13 CACHED

#14 [ 8/17] RUN ["/bin/bash", "/opt/project/build/preBuild.bash"]
#14 CACHED

#15 [10/17] RUN dpkgArch="$(dpkg --print-architecture | awk -F- '{ print $NF }')"; wget -O- https://github.com/tianon/gosu/releases/download/1.17/gosu-${dpkgArch} | install /dev/stdin /usr/local/bin/gosu
#15 CACHED

#16 [11/17] COPY --chmod=755 [entrypoint.sh, /]
#16 CACHED

#17 [12/17] RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1 libglib2.0-0 git jq
#17 0.416 Hit:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 InRelease
#17 21.40 Err:2 http://archive.ubuntu.com/ubuntu jammy InRelease
#17 21.40 403 connecting to archive.ubuntu.com:80: connecting to 185.125.190.39:80: dial tcp 185.125.190.39:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. [IP: 185.125.190.39 80]
#17 44.00 Ign:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
#17 66.06 Err:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
#17 66.06 Connection failed [IP: 185.125.190.36 80]
#17 72.48 Err:4 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
#17 72.48 403 connecting to archive.ubuntu.com:80: connecting to 185.125.190.39:80: dial tcp 185.125.190.39:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. [IP: 185.125.190.39 80]
#17 123.6 Err:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
#17 123.6 403 connecting to archive.ubuntu.com:80: connecting to 185.125.190.36:80: dial tcp 185.125.190.36:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. [IP: 185.125.190.36 80]
#17 123.6 Reading package lists...
#17 124.3 E: The repository 'http://archive.ubuntu.com/ubuntu jammy InRelease' is no longer signed.
#17 124.3 E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/jammy/InRelease 403 connecting to archive.ubuntu.com:80: connecting to 185.125.190.39:80: dial tcp 185.125.190.39:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. [IP: 185.125.190.39 80]
#17 124.3 E: The repository 'http://security.ubuntu.com/ubuntu jammy-security InRelease' is no longer signed.
#17 124.3 E: Failed to fetch http://security.ubuntu.com/ubuntu/dists/jammy-security/InRelease Connection failed [IP: 185.125.190.36 80]
#17 124.3 E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/jammy-updates/InRelease 403 connecting to archive.ubuntu.com:80: connecting to 185.125.190.39:80: dial tcp 185.125.190.39:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. [IP: 185.125.190.39 80]
#17 124.3 E: The repository 'http://archive.ubuntu.com/ubuntu jammy-updates InRelease' is no longer signed.
#17 124.3 E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/jammy-backports/InRelease 403 connecting to archive.ubuntu.com:80: connecting to 185.125.190.36:80: dial tcp 185.125.190.36:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. [IP: 185.125.190.36 80]
#17 124.3 E: The repository 'http://archive.ubuntu.com/ubuntu jammy-backports InRelease' is no longer signed.
#17 ERROR: process "/bin/bash -c apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1 libglib2.0-0 git jq" did not complete successfully: exit code: 100

[12/17] RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1 libglib2.0-0 git jq:
123.6 403 connecting to archive.ubuntu.com:80: connecting to 185.125.190.36:80: dial tcp 185.125.190.36:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. [IP: 185.125.190.36 80]

124.3 E: The repository 'http://archive.ubuntu.com/ubuntu jammy InRelease' is no longer signed.
124.3 E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/jammy/InRelease 403 connecting to archive.ubuntu.com:80: connecting to 185.125.190.39:80: dial tcp 185.125.190.39:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. [IP: 185.125.190.39 80]
124.3 E: The repository 'http://security.ubuntu.com/ubuntu jammy-security InRelease' is no longer signed.
124.3 E: Failed to fetch http://security.ubuntu.com/ubuntu/dists/jammy-security/InRelease Connection failed [IP: 185.125.190.36 80]
124.3 E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/jammy-updates/InRelease 403 connecting to archive.ubuntu.com:80: connecting to 185.125.190.39:80: dial tcp 185.125.190.39:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. [IP: 185.125.190.39 80]
124.3 E: The repository 'http://archive.ubuntu.com/ubuntu jammy-updates InRelease' is no longer signed.
124.3 E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/jammy-backports/InRelease 403 connecting to archive.ubuntu.com:80: connecting to 185.125.190.36:80: dial tcp 185.125.190.36:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. [IP: 185.125.190.36 80]
124.3 E: The repository 'http://archive.ubuntu.com/ubuntu jammy-backports InRelease' is no longer signed.

Containerfile:45
--------------------
  44 |     
  45 | >>> RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
  46 | >>>     libgl1 \
  47 | >>>     libglib2.0-0 \
  48 | >>>     git \
  49 | >>>     jq
  50 |     
--------------------

ERROR: failed to solve: process "/bin/bash -c apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1 libglib2.0-0 git jq" did not complete successfully: exit code: 100

Build Failed

Preservation of vector database

The vector database seems to be cleared when Docker and NVIDIA AI Workbench are shut down. Does anyone know how to preserve and reload the vector database between instantiations?

Build error on postBuild script

I'm trying to build this example, both with the AI Workbench GUI and with the nvwb CLI binary, and am getting the following error:

#20 311.6    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85.6/85.6 kB 3.2 MB/s eta 0:00:00
#20 311.6 Downloading pymilvus-2.3.1-py3-none-any.whl (168 kB)
#20 311.6    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168.9/168.9 kB 8.5 MB/s eta 0:00:00
#20 311.7 Downloading grpcio-1.58.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.3 MB)
#20 311.8    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.3/5.3 MB 37.2 MB/s eta 0:00:00
#20 313.9 Installing collected packages: grpcio, anyio, pymilvus
#20 313.9   Attempting uninstall: grpcio
#20 313.9     Found existing installation: grpcio 1.62.1
#20 313.9     Uninstalling grpcio-1.62.1:
#20 313.9       Successfully uninstalled grpcio-1.62.1
#20 314.3 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
#20 314.3 chromadb 0.4.22 requires fastapi>=0.95.2, which is not installed.
#20 314.3 chromadb 0.4.22 requires onnxruntime>=1.14.1, which is not installed.
#20 314.3 chromadb 0.4.22 requires uvicorn[standard]>=0.18.3, which is not installed.
#20 314.3 grpcio-reflection 1.62.1 requires grpcio>=1.62.1, but you have grpcio 1.58.0 which is incompatible.
#20 314.3 grpcio-status 1.62.1 requires grpcio>=1.62.1, but you have grpcio 1.58.0 which is incompatible.
#20 314.3 text-generation-server 1.4.5 requires typer<0.7.0,>=0.6.1, but you have typer 0.12.0 which is incompatible.
#20 314.3 Successfully installed anyio-4.3.0 grpcio-1.58.0 pymilvus-2.3.1
#20 314.3 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#20 314.6 groupadd: GID '1001' already exists
#20 ERROR: process "/bin/bash /opt/project/build/postBuild.bash" did not complete successfully: exit code: 4
------
 > [16/17] RUN ["/bin/bash", "/opt/project/build/postBuild.bash"]:
314.3 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
314.3 chromadb 0.4.22 requires fastapi>=0.95.2, which is not installed.
314.3 chromadb 0.4.22 requires onnxruntime>=1.14.1, which is not installed.
314.3 chromadb 0.4.22 requires uvicorn[standard]>=0.18.3, which is not installed.
314.3 grpcio-reflection 1.62.1 requires grpcio>=1.62.1, but you have grpcio 1.58.0 which is incompatible.
314.3 grpcio-status 1.62.1 requires grpcio>=1.62.1, but you have grpcio 1.58.0 which is incompatible.
314.3 text-generation-server 1.4.5 requires typer<0.7.0,>=0.6.1, but you have typer 0.12.0 which is incompatible.
314.3 Successfully installed anyio-4.3.0 grpcio-1.58.0 pymilvus-2.3.1
314.3 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
314.6 groupadd: GID '1001' already exists
------
Containerfile:60
--------------------
  58 |     COPY --chown=$NVWB_UID:$NVWB_GID  ["postBuild.bash", "/opt/project/build/"]
  59 |     
  60 | >>> RUN ["/bin/bash", "/opt/project/build/postBuild.bash"]
  61 |     
  62 |     USER $NVWB_USERNAME
--------------------
ERROR: failed to solve: process "/bin/bash /opt/project/build/postBuild.bash" did not complete successfully: exit code: 4

This is on an NVIDIA DGX box running the latest release.
