nvidia / generativeaiexamples Goto Github PK

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

License: Apache License 2.0

Dockerfile 0.65% Python 52.59% CSS 0.09% HTML 1.07% JavaScript 0.02% Jupyter Notebook 44.31% Shell 1.27%

gpu-acceleration large-language-models llm llm-inference microservice nemo rag retrieval-augmented-generation tensorrt triton-inference-server

generativeaiexamples's Introduction

NVIDIA Generative AI Examples

This repository serves as a starting point for generative AI developers looking to integrate with the NVIDIA software ecosystem to accelerate their generative AI systems. Whether you are building RAG pipelines, agentic workflows, or finetuning models, this repository will help you integrate NVIDIA, seamlesly and natively, with your development stack.

What's new?

Knowledge Graph RAG

The example implements a GPU-accelerated pipeline for creating and querying knowledge graphs using RAG by leveraging NIM microservices and the RAPIDS ecosystem for efficient processing of large-scale datasets.

Knowledge Graphs for RAG with NVIDIA AI Foundation Models and Endpoints

Agentic Workflows with Llama 3.1

Build an Agentic RAG Pipeline with Llama 3.1 and NVIDIA NeMo Retriever NIM microservices [Blog, notebook]
NVIDIA Morpheus, NIM microservices, and RAG pipelines integrated to create LLM-based agent pipelines

RAG with local NIM deployment and Langchain

Tips for Building a RAG Pipeline with NVIDIA AI LangChain AI Endpoints by Amit Bleiweiss. [Blog, notebook]

NeMo Guardrails with RAG

Notebook for demonstrating how to integrate NeMo Guardrails with a basic RAG pipeline in LangChain to ensure safe and accurate LLM responses using NVIDIA NIM microservices. [Blog, notebook]

For more details view the releases.

Try it now!

Experience NVIDIA RAG Pipelines with just a few steps!

Get your NVIDIA API key.

Visit the NVIDIA API Catalog, select on any model, then click on Get API Key

Afterward, run export NVIDIA_API_KEY=nvapi-....

Clone the repository and then build and run the basic RAG pipeline:

git clone https://github.com/nvidia/GenerativeAIExamples.git
cd GenerativeAIExamples/RAG/examples/basic_rag/langchain/
docker compose up -d --build

Open a browser to https://localhost:8090/ and submit queries to the sample RAG Playground.

When done, stop containers by running docker compose down.

End to end RAG Examples and Notebooks

NVIDIA has first class support for popular generative AI developer frameworks like LangChain, LlamaIndex and Haystack. These notebooks will show you how to integrate NIM microservices using your preferred generative AI development framework.

Notebooks

Use the notebooks to learn about the LangChain and LlamaIndex connectors.

LangChain Notebooks

LlamaIndex Notebooks

Basic RAG with LlamaIndex Integration

End to end RAG Examples

By default, the examples use preview NIM endpoints on NVIDIA API Catalog. Alternatively, you can run any of the examples on premises.

Basic RAG Examples

Advanced RAG Examples

How To Guides

Change the inference or embedding model
Customize the vector database
Customize the chain server:
- Chunking strategy
- Prompt template engineering
Support multiturn conversations
Configure LLM parameters at runtime
Speak queries and listen to responses with NVIDIA Riva.

Tools

Example tools and tutorials to enhance LLM development and productivity when using NVIDIA RAG pipelines.

Community

We're posting these examples on GitHub to support the NVIDIA LLM community and facilitate feedback. We invite contributions! Open a GitHub issue or pull request!

Check out the community examples and notebooks.

Related NVIDIA RAG Projects

NVIDIA Tokkio LLM-RAG: Use Tokkio to add avatar animation for RAG responses.
Hybrid RAG Project on AI Workbench: Run an NVIDIA AI Workbench example project for RAG.

generativeaiexamples's People

Contributors

Stargazers

Watchers

Forkers

shdchen a-rich flg77 jdcfd madhurima-nath jackyh zvonkok jdye64 mikemckiernan gilvbp sumitkbh jihyun-nv yanwei23 sashidhar-rafay rafaysystems ashbrainwave wbhm dearborn-open-ai gimmyalex danniellcheong durume dougrao benbastaki malimszen jordimartos davidgao7 matthewkfho tspannhw lenceai rajaramkuberan hwolff99 ken2190 3a1b2c3 ikad95 liuqi mirrorcy y-h-lin ameliataihui harperjuanl gyanachand1 vineetvermaml ironfistofgod rkp64 nadav-nesher readerenesgotiz chermark-y sovyborn2hannah lint888 channetr-targetcoops deltatana-softonal geniusboaairqu chattymppromnica idealed-godzillayellow chamerlireackste cyogf mbrukman coolig-l hulkferdy58 mohammedpithapur tutumomo ailabteam hhy5277 glauciaschnoeller awesome-release deecode123 clever-boy greydoubt gab-e-ai benelgiz jaedukseo jay3ss arrrsh lisabuilds dicode04 xinghan yuwang881 gayathrirajpatel jliberma evelynmitchell techthiyanes linecode asdlei99 yanxg mostalt haozj pvjammer jtmancilla victorsemenovgithub trombone1 ejhortala xxlest stackshareiodev suchismitasahu1993 rohrao edwardplataacn nv-pranjald hot5auce dtripathilenovo aicodehunt david-fox-chang

generativeaiexamples's Issues

a small issues in v0.6.0

Issue - deploy/compose/rag-app-text-chatbot.yaml

docker env got error and suggest ENABLE_TRACING should be int ,string or null

      ENABLE_TRACING: null

chain-server container keeps crashing (rag-app-text-chatbot.yaml)

I'm trying to deploy a basic RAG chatbot using the rag-app-text-chatbot.yaml file, but I'm running into issues with the chain-server container crashing shortly after startup. I believe I've properly followed the directions on https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html. I'm using the v0.6.0 tag on the github repository. If I run docker logs on the chain-server container, here's the output I see:

===
INFO:     Started server process [1]
INFO:     Waiting for application startup.
/usr/local/lib/python3.10/dist-packages/langchain/embeddings/__init__.py:29: LangChainDeprecationWarning: Importing embeddings from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead:

`from langchain_community.embeddings import HuggingFaceEmbeddings`.

To install langchain-community run `pip install -U langchain-community`.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/__init__.py:35: LangChainDeprecationWarning: Importing vector stores from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead:

`from langchain_community.vectorstores import FAISS`.

To install langchain-community run `pip install -U langchain-community`.
  warnings.warn(
INFO:faiss.loader:Loading faiss with AVX2 support.
INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/service_pb2_grpc.py:21: RuntimeWarning: The grpc package installed is at version 1.60.0, but the generated code in grpc_service_pb2_grpc.py depends on grpcio>=1.64.0. Please upgrade your grpc module to grpcio>=1.64.0 or downgrade your generated code using grpcio-tools<=1.60.0. This warning will become an error in 1.65.0, scheduled for release on June 25, 2024.
  warnings.warn(
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
INFO:RetrievalAugmentedGeneration.common.utils:Using huggingface as model engine and WhereIsAI/UAE-Large-V1 and model for embeddings
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: WhereIsAI/UAE-Large-V1
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO:RetrievalAugmentedGeneration.common.utils:Using triton-trt-llm as model engine for llm. Model name: ensemble
ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 734, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 610, in __aenter__
    await self._router.startup()
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 713, in startup
    handler()
  File "/opt/RetrievalAugmentedGeneration/common/server.py", line 158, in import_example
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/RetrievalAugmentedGeneration/example/chains.py", line 56, in <module>
    set_service_context()
  File "/opt/RetrievalAugmentedGeneration/common/utils.py", line 131, in wrapper
    return func(*args_hashable, **kwargs_hashable)
  File "/opt/RetrievalAugmentedGeneration/common/utils.py", line 138, in set_service_context
    llm = LangChainLLM(get_llm(**kwargs))
  File "/opt/RetrievalAugmentedGeneration/common/utils.py", line 131, in wrapper
    return func(*args_hashable, **kwargs_hashable)
  File "/opt/RetrievalAugmentedGeneration/common/utils.py", line 270, in get_llm
    trtllm = TensorRTLLM(  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/load/serializable.py", line 120, in __init__
    super().__init__(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for TensorRTLLM
__root__
  Channel.unary_unary() got an unexpected keyword argument '_registered_method' (type=type_error)

ERROR:    Application startup failed. Exiting.
Exception ignored in: <function InferenceServerClient.__del__ at 0x7561548a9750>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 257, in __del__
    self.close()
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 264, in close
    self.stop_stream()
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 1811, in stop_stream
    if self._stream is not None:
AttributeError: 'InferenceServerClient' object has no attribute '_stream'
===

Here's some docker ps output:

$ docker ps -a --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
CONTAINER ID   NAMES                  STATUS
f025cd96cc5c   milvus-standalone      Up 13 minutes
ca017bfe8648   milvus-etcd            Up 13 minutes (healthy)
b44caa6c6e9a   milvus-minio           Up 13 minutes (healthy)
4b812c48035b   rag-playground         Up 13 minutes
a686d2b3938f   chain-server           Exited (3) 13 minutes ago
7fe575e94855   llm-inference-server   Up 13 minutes
80f535f5a462   notebook-server        Up 13 minutes

Text appears and disappears when POD is launched with models larger than 10B

hello.

As the title says, when I use a model over 10B and reference it, the text comes up and then disappears.

I would appreciate your opinion on what could be the cause and how to fix it.

Thank you.

I'm also attaching the triton pod related error.

Received stop request for requestId 7796140 but it's not active (might be completed already).

RIVA integrated with Chat UI does not transcribe speech to text correctly and completely

I am trying to use RIVA ASR with frontend as given in example, it fails to transcribe speech to text. Most of the time it fails catch my voice correctly.

Aurora Mpox Sentinela OMS

Aurora Mpox Sentinel.

1. `data_collection.py`

Este módulo coleta e armazena dados em tempo real.

import pandas as pd
import requests

def fetch_health_data(api_endpoint):
    try:
        response = requests.get(api_endpoint)
        response.raise_for_status()
        data = response.json()
        df = pd.DataFrame(data)
        df.to_csv('health_data.csv', index=False)
        return df
    except requests.exceptions.RequestException as e:
        print(f"Error fetching data: {e}")
        return pd.DataFrame()

2. `data_analysis.py`

Este módulo realiza a análise e previsão usando modelos avançados.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import tensorflow as tf

def load_model(model_path):
    return tf.keras.models.load_model(model_path)

def preprocess_data(df):
    # Example preprocessing
    df.fillna(0, inplace=True)
    X = df[['feature1', 'feature2']]  # Replace with actual features
    return X

def train_model(X_train, y_train):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=10)
    return model

def predict(model, X):
    return model.predict(X)

# Example usage
if __name__ == "__main__":
    df = pd.read_csv('health_data.csv')
    X = preprocess_data(df)
    y = df['target']  # Example target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = train_model(X_train, y_train)
    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred))

3. `response_system.py`

Este módulo lida com a resposta e envio de alertas.

import smtplib
from email.mime.text import MIMEText

def send_alert(email_recipient, subject, message):
    try:
        msg = MIMEText(message)
        msg['Subject'] = subject
        msg['From'] = '[email protected]'
        msg['To'] = email_recipient

        with smtplib.SMTP('smtp.yourdomain.com', 587) as server:
            server.starttls()
            server.login('your_username', 'your_password')
            server.sendmail(msg['From'], [msg['To']], msg.as_string())
    except Exception as e:
        print(f"Error sending alert: {e}")

4. `app.py`

Este módulo cria uma interface web usando Flask.

from flask import Flask, request, jsonify
import pandas as pd
from data_analysis import load_model, preprocess_data, predict
from response_system import send_alert

app = Flask(__name__)

# Load pre-trained model
model = load_model('model_path')

@app.route('/predict', methods=['POST'])
def predict_endpoint():
    data = request.json
    df = pd.DataFrame(data)
    preprocessed_data = preprocess_data(df)
    predictions = predict(model, preprocessed_data)
    return jsonify(predictions.tolist())

@app.route('/alert', methods=['POST'])
def alert():
    data = request.json
    email = data['email']
    subject = data['subject']
    message = data['message']
    send_alert(email, subject, message)
    return 'Alert sent!', 200

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0')

Notas Finais

Instale as Dependências:

Certifique-se de que você tem todas as bibliotecas necessárias instaladas:
```
pip install pandas scikit-learn tensorflow flask requests
```
Modelo de Machine Learning:

Certifique-se de que o modelo treinado esteja salvo e acessível no caminho especificado (model_path). Se você não tiver um modelo treinado, pode usar o código de treinamento fornecido em data_analysis.py para criar um.
Segurança e Configuração:
- Email: Configure o servidor SMTP e as credenciais no módulo response_system.py.
- Proteção de Dados: Certifique-se de que todas as medidas de segurança e privacidade estão implementadas conforme necessário.

Exception: [500] Internal Server Error

Hi,

After uploading a PDF, I am able to see the screen below-

But it is showing error while returning a response like below-
Traceback (most recent call last): File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_nvidia_ai_endpoints\_common.py", line 203, in _try_raise response.raise_for_status() File "d:\nvidia_learning\llm-env\lib\site-packages\requests\models.py", line 1021, in raise_for_status raise HTTPError(http_error_msg, response=self) requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/8f4118ba-60a8-4e6b-8574-e38a4067a4a3

Is this from the endpoint? Please suggest how to resolve this?

Complete logs are-
`The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "d:\nvidia_learning\llm-env\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 535, in _run_script
exec(code, module.dict)
File "D:\NVIDIA_Learning\nvidia_streamlit_llm__main.py", line 126, in
for response in chain.stream({"input": augmented_user_input}):
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_core\runnables\base.py", line 2424, in stream
yield from self.transform(iter([input]), config, **kwargs)
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_core\runnables\base.py", line 2411, in transform
yield from self._transform_stream_with_config(
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_core\runnables\base.py", line 1497, in _transform_stream_with_config
chunk: Output = context.run(next, iterator) # type: ignore
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_core\runnables\base.py", line 2375, in _transform
for output in final_pipeline:
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_core\output_parsers\transform.py", line 50, in transform
yield from self._transform_stream_with_config(
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_core\runnables\base.py", line 1473, in _transform_stream_with_config
final_input: Optional[Input] = next(input_for_tracing, None)
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_core\runnables\base.py", line 1045, in transform
yield from self.stream(final, config, **kwargs)
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_core\language_models\chat_models.py", line 249, in stream
raise e
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_core\language_models\chat_models.py", line 233, in stream
for chunk in self._stream(
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_nvidia_ai_endpoints\chat_models.py", line 123, in _stream
for response in self.get_stream(
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_nvidia_ai_endpoints_common.py", line 484, in get_stream
return self.client.get_req_stream(self.model, stop=stop, payload=payload)
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_nvidia_ai_endpoints_common.py", line 371, in get_req_stream
self._try_raise(response)
File "d:\nvidia_learning\llm-env\lib\site-packages\langchain_nvidia_ai_endpoints_common.py", line 218, in _try_raise
raise Exception(f"{title}\n{body}") from e
Exception: [500] Internal Server Error
Internal error while making inference request`

When I run /RetrievalAugmentedGeneration/examples/developer_rag/chains.py

My setting about rag-app-text-chatbot.yaml is services:
jupyter-server:
container_name: notebook-server
image: notebook-server:${TAG:-latest}
build:
context: ../../
dockerfile: ./notebooks/Dockerfile.notebooks # replace GPU enabled Dockerfile ./notebooks/Dockerfile.gpu_notebook
ports:
- "8888:8888"
expose:
- "8888"
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]

chain-server:
container_name: chain-server
image: chain-server:${TAG:-latest}
build:
context: ../../
dockerfile: ./RetrievalAugmentedGeneration/Dockerfile
args:
EXAMPLE_NAME: developer_rag
command: --port 8081 --host 0.0.0.0
environment:
APP_VECTORSTORE_URL: "http://milvus:19530"
APP_VECTORSTORE_NAME: "milvus"
APP_EMBEDDINGS_MODELNAME: ${APP_EMBEDDINGS_MODELNAME:-G:/jjx/moxing/snowflake-arctic-embed-l}
APP_EMBEDDINGS_MODELENGINE: ${APP_EMBEDDINGS_MODELENGINE:-local}
APP_EMBEDDINGS_SERVERURL: ${APP_EMBEDDINGS_SERVERURL:-""}
APP_LLM_SERVERURL: ${APP_LLM_SERVERURL:-""}
APP_LLM_MODELNAME: ${APP_LLM_MODELNAME:-"G:/jjx/moxing/llama-2-13b-chat-hf"}
APP_LLM_MODELENGINE: ${APP_LLM_MODELENGINE:-local}
NVIDIA_API_KEY: ${NVIDIA_API_KEY}
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-password}
POSTGRES_USER: ${POSTGRES_USER:-postgres}
POSTGRES_DB: ${POSTGRES_DB:-api}
COLLECTION_NAME: ${COLLECTION_NAME:-developer_rag}
APP_RETRIEVER_TOPK: 4
APP_RETRIEVER_SCORETHRESHOLD: 0.25
OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL: grpc
ENABLE_TRACING: false
APP_TEXTSPLITTER_MODELNAME: Snowflake/snowflake-arctic-embed-l
APP_TEXTSPLITTER_CHUNKSIZE: 506
APP_TEXTSPLITTER_CHUNKOVERLAP: 200
LOGLEVEL: ${LOGLEVEL:-INFO}
ports:
- "8081:8081"
expose:
- "8081"
shm_size: 5gb
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]

rag-playground:
container_name: rag-playground
image: rag-playground:${TAG:-latest}
build:
context: ../.././RetrievalAugmentedGeneration/frontend/
dockerfile: Dockerfile
command: --port 8090
environment:
APP_SERVERURL: http://chain-server
APP_SERVERPORT: 8081
APP_MODELNAME: ${APP_LLM_MODELNAME:-"meta/llama3-8b-instruct"}
OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL: grpc
ENABLE_TRACING: false
RIVA_API_URI: ${RIVA_API_URI:-}
RIVA_API_KEY: ${RIVA_API_KEY:-}
RIVA_FUNCTION_ID: ${RIVA_FUNCTION_ID:-}
TTS_SAMPLE_RATE: ${TTS_SAMPLE_RATE:-48000}
ports:
- "8090:8090"
expose:
- "8090"
depends_on:
- chain-server

networks:
default:
name: nvidia-rag
What should I do?

Projeto liliti stk 3.6.9 acabou

LMMS/lmms.io#379

[Kubernetes Deployment Issue] milvus-minio pod not coming up

kubectl get po -n rag-llm-pipeline
NAME READY STATUS RESTARTS AGE
jupyter-notebook-server-78f5bdd7cb-nx7hv 0/1 CrashLoopBackOff 9 (3m21s ago) 24m
llm-playground-5559f8499-qwtzw 1/1 Running 0 24m
milvu-etcd-5c98d7c546-jwh74 0/1 CrashLoopBackOff 9 (3m52s ago) 24m
milvus-minio-54fbffdcfd-d6w7w 0/1 CrashLoopBackOff 9 (3m23s ago) 24m
milvus-standalone-794d77777b-7tqtb 0/1 Pending 0 24m
query-router-6b5dcf4f97-q29ck 0/1 Pending 0 24m
triton-inference-server-76b58dcb4f-7j5p8 0/1 Pending 0 24m
(venv39) [root@BM48-aiocp-worker-0 templates]#

oc logs milvus-minio-54fbffdcfd-d6w7w -n rag-llm-pipeline
WARNING: MINIO_ACCESS_KEY and MINIO_SECRET_KEY are deprecated.
Please use MINIO_ROOT_USER and MINIO_ROOT_PASSWORD

API: SYSTEM()
Time: 10:20:17 UTC 01/10/2024
Error: unable to rename (/minio_data/.minio.sys/tmp -> /minio_data/.minio.sys/tmp-old/1b3cd0ff-d3b4-4be1-9267-d7765af26ff5) file access denied, drive may be faulty please investigate (*fmt.wrapError)
6: internal/logger/logger.go:258:logger.LogIf()
5: cmd/prepare-storage.go:88:cmd.bgFormatErasureCleanupTmp()
4: cmd/xl-storage.go:250:cmd.newXLStorage()
3: cmd/object-api-common.go:61:cmd.newStorageAPI()
2: cmd/format-erasure.go:673:cmd.initStorageDisksWithErrors.func1()
1: internal/sync/errgroup/errgroup.go:123:errgroup.(*Group).Go.func1()

API: SYSTEM()
Time: 10:20:17 UTC 01/10/2024
Error: unable to create (/minio_data/.minio.sys/tmp) file access denied, drive may be faulty please investigate (*fmt.wrapError)
6: internal/logger/logger.go:258:logger.LogIf()
5: cmd/prepare-storage.go:95:cmd.bgFormatErasureCleanupTmp()
4: cmd/xl-storage.go:250:cmd.newXLStorage()
3: cmd/object-api-common.go:61:cmd.newStorageAPI()
2: cmd/format-erasure.go:673:cmd.initStorageDisksWithErrors.func1()
1: internal/sync/errgroup/errgroup.go:123:errgroup.(*Group).Go.func1()
ERROR Unable to use the drive /minio_data: drive access denied: Invalid arguments specified
(venv39) [root@BM48-aiocp-worker-0 templates]#

oc describe po milvus-minio-54fbffdcfd-d6w7w -n rag-llm-pipeline
Name: milvus-minio-54fbffdcfd-d6w7w
Namespace: rag-llm-pipeline
Priority: 0
Service Account: default
Node: bm92-aiocp-worker-1.aiocp.hpelab.local/192.168.22.117
Start Time: Wed, 10 Jan 2024 03:53:51 -0600
Labels: app.kubernetes.io/name=milvus-minio
app.trailblazer.nvidia.com/owned-by=HelmOrchard
pod-template-hash=54fbffdcfd
Annotations: k8s.v1.cni.cncf.io/network-status:
[{
"name": "openshift-sdn",
"interface": "eth0",
"ips": [
"10.128.2.46"
],
"default": true,
"dns": {}
}]
openshift.io/scc: privileged
Status: Running
IP: 10.128.2.46
IPs:
IP: 10.128.2.46
Controlled By: ReplicaSet/milvus-minio-54fbffdcfd
Containers:
milvus-minio:
Container ID: cri-o://3d41e290e1e5bb0138ac348e59bfae93fb2405c9223212df054515a4b45d7afe
Image: minio/minio:RELEASE.2023-03-20T20-16-18Z
Image ID: docker.io/minio/minio@sha256:6d770d7f255cda1f18d841ffc4365cb7e0d237f6af6a15fcdb587480cd7c3b93
Ports: 9001/TCP, 9000/TCP
Host Ports: 0/TCP, 0/TCP
Command:
minio
server
/minio_data
--console-address
:9001
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 10 Jan 2024 04:15:16 -0600
Finished: Wed, 10 Jan 2024 04:15:16 -0600
Ready: False
Restart Count: 9
Readiness: exec [curl -f http://localhost:9000/minio/health/live] delay=20s timeout=1s period=5s #success=1 #failure=3
Environment:
MINIO_ACCESS_KEY: minioadmin
MINIO_SECRET_KEY: minioadmin
Mounts:
/minio_data from minio-data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6dzck (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
minio-data:
Type: HostPath (bare host directory volume)
Path: /minio_data
HostPathType: DirectoryOrCreate
kube-api-access-6dzck:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional:
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message

Normal Scheduled 25m default-scheduler Successfully assigned rag-llm-pipeline/milvus-minio-54fbffdcfd-d6w7w to bm92-aiocp-worker-1.aiocp.hpelab.local
Normal AddedInterface 25m multus Add eth0 [10.128.2.46/23] from openshift-sdn
Normal Pulling 25m kubelet Pulling image "minio/minio:RELEASE.2023-03-20T20-16-18Z"
Normal Pulled 25m kubelet Successfully pulled image "minio/minio:RELEASE.2023-03-20T20-16-18Z" in 7.131695968s (7.13170319s including waiting)
Normal Started 24m (x4 over 25m) kubelet Started container milvus-minio
Normal Pulled 24m (x4 over 25m) kubelet Container image "minio/minio:RELEASE.2023-03-20T20-16-18Z" already present on machine
Normal Created 24m (x5 over 25m) kubelet Created container milvus-minio
Warning BackOff 47s (x123 over 25m) kubelet Back-off restarting failed container milvus-minio in pod milvus-minio-54fbffdcfd-d6w7w_rag-llm-pipeline(e0f9dfc3-935c-4013-ae89-69c45a5cc3ce)

I had updated the milvus-minio.yaml file in my environment to point the volume to a pvc...

cat milvus-minio.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: milvus-minio
labels:
app.kubernetes.io/name: milvus-minio
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: milvus-minio
template:
metadata:
labels:
app.kubernetes.io/name: milvus-minio
spec:
containers:
- name: milvus-minio
image: minio/minio:RELEASE.2023-03-20T20-16-18Z
command:
- minio
- server
- /minio_data
- --console-address
- :9001
env:
- name: MINIO_ACCESS_KEY
value: minioadmin
- name: MINIO_SECRET_KEY
value: minioadmin
ports:
- containerPort: 9001
- containerPort: 9000
volumeMounts:
- mountPath: /minio_data
name: minio-data
readinessProbe:
exec:
command:
- curl
- -f
- http://localhost:9000/minio/health/live
initialDelaySeconds: 20
periodSeconds: 5
volumes:
- name: minio-data
persistentVolumeClaim:
claimName: ashish-scalable-ai-pipeline-volume-claim-2

apiVersion: v1
kind: Service
metadata:
name: milvus-minio
spec:
selector:
app.kubernetes.io/name: milvus-minio
ports:
- protocol: TCP
port: 9000
targetPort: 9000

(venv39) [root@BM48-aiocp-worker-0 templates]#

triton-inference-server cannot be started

NAME READY STATUS RESTARTS AGE
jupyter-notebook-server-5f785cd7c8-x8qd6 1/1 Running 0 45m
llm-playground-7d8c999487-fgmj5 1/1 Running 0 45m
milvu-etcd-7cf545456f-m8q9m 1/1 Running 0 45m
milvus-minio-7ff64c76f-4njkz 1/1 Running 0 45m
milvus-standalone-7479bf9ddd-n6s6f 1/1 Running 0 45m
query-router-65c6f864ff-fstkb 1/1 Running 0 45m
triton-inference-server-7cd84c8f4b-wzsk9 0/1 CrashLoopBackOff 8 (18s ago) 23m

[triton-inference-server-7cd84c8f4b-wzsk9:30 :0:30] Caught signal 7 (Bus error: nonexistent physical address)
backtrace (tid: 30)
0 0x0000000000042520 __sigaction() ???:0
1 0x000000000001678b uct_iface_mp_chunk_alloc_inner() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/base/uct_mem.c:469
2 0x000000000001678b uct_iface_mp_chunk_alloc() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/base/uct_mem.c:443
3 0x000000000005407b ucs_mpool_grow() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucs/datastruct/mpool.c:266
4 0x00000000000542c9 ucs_mpool_get_grow() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucs/datastruct/mpool.c:312
5 0x000000000001b488 uct_mm_iface_t_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/sm/mm/base/mm_iface.c:822
6 0x000000000001b9f2 uct_mm_iface_t_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/sm/mm/base/mm_iface.c:720
7 0x0000000000014f02 uct_iface_open() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/base/uct_md.c:284
8 0x000000000004a017 ucp_worker_iface_open() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucp/core/ucp_worker.c:1357
9 0x000000000004afe0 ucp_worker_add_resource_ifaces() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucp/core/ucp_worker.c:1101
10 0x000000000004d2db ucp_worker_create() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucp/core/ucp_worker.c:2441
11 0x000000000000702f mca_pml_ucx_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/mca/pml/ucx/pml_ucx.c:306
12 0x00000000000093a5 mca_pml_ucx_component_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/mca/pml/ucx/pml_ucx_component.c:136
13 0x00000000000c7022 mca_pml_base_select() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/mca/pml/base/pml_base_select.c:127
14 0x00000000000d01c9 ompi_mpi_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/runtime/ompi_mpi_init.c:647
15 0x0000000000075899 PMPI_Init_thread() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/mpi/c/profile/pinit_thread.c:69
16 0x00000000000327a8 __pyx_f_6mpi4py_3MPI_bootstrap() /tmp/pip-install-05lukizf/mpi4py_8cc4cad65d414a8995a9d1c890fac173/src/mpi4py.MPI.c:8115
17 0x00000000000327a8 __pyx_pymod_exec_MPI() /tmp/pip-install-05lukizf/mpi4py_8cc4cad65d414a8995a9d1c890fac173/src/mpi4py.MPI.c:176976
18 0x000000000023b2d3 PyModule_ExecDef() ???:0
19 0x000000000023bda0 PyInit__thread() ???:0
20 0x000000000015f854 PyObject_GenericGetAttr() ???:0
21 0x000000000014b2c1 _PyEval_EvalFrameDefault() ???:0
22 0x000000000016070c _PyFunction_Vectorcall() ???:0
23 0x000000000014e8a2 _PyEval_EvalFrameDefault() ???:0
24 0x000000000016070c _PyFunction_Vectorcall() ???:0
25 0x0000000000148f52 _PyEval_EvalFrameDefault() ???:0
26 0x000000000016070c _PyFunction_Vectorcall() ???:0
27 0x0000000000148e0d _PyEval_EvalFrameDefault() ???:0
28 0x000000000016070c _PyFunction_Vectorcall() ???:0
29 0x0000000000148e0d _PyEval_EvalFrameDefault() ???:0
30 0x000000000016070c _PyFunction_Vectorcall() ???:0
31 0x000000000015fb24 PyObject_CallFunctionObjArgs() ???:0
32 0x000000000023f4af _PyObject_CallMethodIdObjArgs() ???:0
33 0x00000000001740ca PyImport_ImportModuleLevelObject() ???:0
34 0x0000000000184458 PyImport_Import() ???:0
35 0x000000000015fe0e PyObject_CallFunctionObjArgs() ???:0
36 0x000000000016f12b PyObject_Call() ???:0
37 0x000000000014b2c1 _PyEval_EvalFrameDefault() ???:0
38 0x000000000016070c _PyFunction_Vectorcall() ???:0
39 0x0000000000148e0d _PyEval_EvalFrameDefault() ???:0
40 0x000000000016070c _PyFunction_Vectorcall() ???:0
41 0x000000000015fb24 PyObject_CallFunctionObjArgs() ???:0
42 0x000000000023f4af _PyObject_CallMethodIdObjArgs() ???:0
43 0x0000000000174cda PyImport_ImportModuleLevelObject() ???:0
44 0x000000000014b9e5 _PyEval_EvalFrameDefault() ???:0
45 0x0000000000239e56 PyEval_EvalCode() ???:0
46 0x0000000000239cf6 PyEval_EvalCode() ???:0
47 0x000000000023fb0d PyFrozenSet_New() ???:0
48 0x0000000000160969 PyCell_New() ???:0
49 0x000000000014b2c1 _PyEval_EvalFrameDefault() ???:0
50 0x000000000016070c _PyFunction_Vectorcall() ???:0
51 0x000000000014e8a2 _PyEval_EvalFrameDefault() ???:0
52 0x000000000016070c _PyFunction_Vectorcall() ???:0
53 0x0000000000148f52 _PyEval_EvalFrameDefault() ???:0
54 0x000000000016070c _PyFunction_Vectorcall() ???:0
55 0x0000000000148e0d _PyEval_EvalFrameDefault() ???:0
56 0x000000000016070c _PyFunction_Vectorcall() ???:0

[triton-inference-server-7cd84c8f4b-wzsk9:00030] *** Process received signal ***
[triton-inference-server-7cd84c8f4b-wzsk9:00030] Signal: Bus error (7)
[triton-inference-server-7cd84c8f4b-wzsk9:00030] Signal code: (-6)
[triton-inference-server-7cd84c8f4b-wzsk9:00030] Failing at address: 0x1e
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f9d7caa7520]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 1] /opt/hpcx/ucx/lib/libuct.so.0(uct_iface_mp_chunk_alloc+0x7b)[0x7f9d3689178b]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 2] /opt/hpcx/ucx/lib/libucs.so.0(ucs_mpool_grow+0x7b)[0x7f9d3691607b]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 3] /opt/hpcx/ucx/lib/libucs.so.0(ucs_mpool_get_grow+0x19)[0x7f9d369162c9]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 4] /opt/hpcx/ucx/lib/libuct.so.0(+0x1b488)[0x7f9d36896488]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 5] /opt/hpcx/ucx/lib/libuct.so.0(uct_mm_iface_t_new+0xb2)[0x7f9d368969f2]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 6] /opt/hpcx/ucx/lib/libuct.so.0(uct_iface_open+0xe2)[0x7f9d3688ff02]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 7] /opt/hpcx/ucx/lib/libucp.so.0(ucp_worker_iface_open+0x317)[0x7f9d36a93017]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 8] /opt/hpcx/ucx/lib/libucp.so.0(+0x4afe0)[0x7f9d36a93fe0]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 9] /opt/hpcx/ucx/lib/libucp.so.0(ucp_worker_create+0x7cb)[0x7f9d36a962db]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [10] /opt/hpcx/ompi/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_init+0x9f)[0x7f9d36b2f02f]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [11] /opt/hpcx/ompi/lib/openmpi/mca_pml_ucx.so(+0x93a5)[0x7f9d36b313a5]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [12] /opt/hpcx/ompi/lib/libmpi.so.40(mca_pml_base_select+0x1e2)[0x7f9c1bc35022]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [13] /opt/hpcx/ompi/lib/libmpi.so.40(ompi_mpi_init+0x6c9)[0x7f9c1bc3e1c9]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [14] /opt/hpcx/ompi/lib/libmpi.so.40(PMPI_Init_thread+0x79)[0x7f9c1bbe3899]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [15] /usr/local/lib/python3.10/dist-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x327a8)[0x7f9c1bcbf7a8]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [16] /usr/bin/python3(PyModule_ExecDef+0x73)[0x55f3c471e2d3]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [17] /usr/bin/python3(+0x23bda0)[0x55f3c471eda0]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [18] /usr/bin/python3(+0x15f854)[0x55f3c4642854]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [19] /usr/bin/python3(_PyEval_EvalFrameDefault+0x2b71)[0x55f3c462e2c1]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [20] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [21] /usr/bin/python3(_PyEval_EvalFrameDefault+0x6152)[0x55f3c46318a2]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [22] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [23] /usr/bin/python3(_PyEval_EvalFrameDefault+0x802)[0x55f3c462bf52]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [24] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [25] /usr/bin/python3(_PyEval_EvalFrameDefault+0x6bd)[0x55f3c462be0d]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [26] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [27] /usr/bin/python3(_PyEval_EvalFrameDefault+0x6bd)[0x55f3c462be0d]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [28] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [29] /usr/bin/python3(+0x15fb24)[0x55f3c4642b24]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] *** End of error message ***
[23] May 29 04:16:21 [ ERROR] - main - TensorRT conversion returned a non-zero exit code.

Asking question which does not have a relevant snippet in the knowledge base leads to frontend error

Asking a question, unrelated to knowledge base content, causes frontend to fail with error.

Without knowledge base:

With knowledge base:

Unable to upload files in Q&A Chatbot's RAG service

Hi there,
I'm currently trying to build a Q&A Chatbot using the example here. Since my GPU model is A30 (with two 20GB GPUs), I'm using a smaller-scale model: llama-2-7B-chat. I followed the instructions in the example to set up the service, but I'm having trouble uploading knowledge documents. What can I do here to solve the problem?

Logo

Request to Modify Code to Enable TEXT_SPLITTER_EMBEDDING_MODEL Customization through Configuration File

I am looking to create a Chinese RAG demo service using RetrievalAugmentedGeneration.

However, I encountered an issue where the default SentenceTransformersTokenTextSplitter model used in the RetrievalAugmentedGeneration/common/utils.py file is hardcoded as 'intfloat/e5-large-v2'. This model generates a significant number of [UNK] tokens when processing Chinese text.

I would like the ability to specify a specific model for the text splitter, similar to how the embedding model can be specified through the config.yaml file.

Thank you for your assistance and support.

ImportError: DLL load failed while importing _rust: The specified procedure could not be found.

Hi,

I am trying to run the given example. But when I upload a PDF file I get the below error-
ImportError: DLL load failed while importing _rust: The specified procedure could not be found.

The complete log trace is here-
error_logs.txt

My system config is-
Python 3.9.2rc1
Windows 11

upload_pdf_files does not check for file type or format

Experimenting with notebooks can create .ipynb_checkpoints folder in RetrievalAugmentedGeneration/notebooks/dataset.

This will cause upload_pdf_files function to fail when increasing NUM_DOCS_TO_UPLOAD. The issue does not occur with default NUM_DOCS_TO_UPLOAD (100), as the number of .pdf files in dataset.zip is greater than 100.

Any plans to deploy in Kubernetes?

You will need to set `allow_dangerous_deserialization` to `True`

Example 10_RAG_for_HTML_docs_with_Langchain_NVIDIA_AI_Endpoints fails with error:

ValueError: The de-serialization relies loading a pickle file. Pickle files can be modified to deliver a malicious payload that results in execution of arbitrary code on your machine.You will need to set allow_dangerous_deserialization to True to enable deserialization. If you do this, make sure that you trust the source of the data. For example, if you are loading a file that you created, and no that no one else has modified the file, then this is safe to do. Do not set this to True if you are loading a file from an untrusted source (e.g., some random site on the internet.).

RAG using Llama2 7b model

In 05_dataloader.ipynb in examples folder https://github.com/NVIDIA/GenerativeAIExamples/blob/v0.4.0/notebooks/05_dataloader.ipynb
is there a way to use Llama2 7b model instead of default Llama2 13b model

Internal server error for role orders in LLM inference

Hi,

I am getting 500s with description

chat messages must alternate roles between 'user' and 'assistant'.  Message may have a leading 'system' role message

and

"Internal Server Error\",\"status\":500,\"detail\":\"Last message role should be 'user'

I think this order validation is unnecessary. Most of the models (e.g. mixtrals, llama3, gemma etc) are perfectly fine with any order of roles. This claim can be validated on groq's playground: https://console.groq.com/playground

This issue currently breaks some existing patterns, like continuations (without the user explicitly saying "continue") or in some cases running agents with observations etc.

Correct typo "compatiable" in the Readme.md

Correct typo "compatiable"->should be "compatible" in phrase: "Gemma models are compatiable with NeMo Framework." first line of the "Customizing Gemma with NeMo Framework" section. https://github.com/NVIDIA/GenerativeAIExamples/tree/main/models/Gemma/

Missing url

on rag in 5 minutes example

this hyperlink linked to here which has no file.

Missing info / incomplete documentation found

Hi,

Sorry this is my first time putting an issue with such a big company! Not sure if there's any decorum to abide by but, here is the issue:

[] (https://github.com/NVIDIA/GenerativeAIExamples/docs/api-catalog.md)

At the very end there is a missing bullet point and an incomplete sentence that would be nice to have the info for. Here is the sentence:

Next Steps

Access the web interface for the chat server. Refer to missing_info for information about using the web interface.

extra bullet or missing info

Stop the containers by running docker compose -f deploy/compose/rag-app-api-catalog-text-chatbot.yaml down and docker compose -f deploy/compose/docker-compose-vectordb.yaml down.

Standalone examples.md has a bad link for obtaining API key.

Visit https://github.com/NVIDIA/GenerativeAIExamples/tree/main/examples
Click on link embedded in step 3. (Follow the steps mentioned here to get this.) Notice the 404 for the referenced file: https://github.com/NVIDIA/GenerativeAIExamples/blob/main/docs/ai-foundation-models.md#get-an-api-key-for-the-mixtral-8x7b-instruct-api-endpoint

ImportError: Apex was not found. Please see the NeMo README for installation instructions: https://github.com/NVIDIA/NeMo#megatron-gpt.

I am trying the below Lora training notebook
https://github.com/NVIDIA/GenerativeAIExamples/blob/main/models/Gemma/lora.ipynb
When running the below code segment

from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder
from nemo.utils.exp_manager import exp_manager

trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer()
exp_manager(trainer, cfg.exp_manager)

I am getting the error

ImportError: Apex was not found. Please see the NeMo README for installation instructions: https://github.com/NVIDIA/NeMo#megatron-gpt.

So as per the official documentation when I try to install Apex,

git clone https://github.com/NVIDIA/apex.git
cd apex
git checkout b496d85fb88a801d8e680872a12822de310951fd
pip install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam" ./

this installation raises the given error


Usage:
  pip install [options] <requirement specifier> [package-index-options] ...
  pip install [options] -r <requirements file> [package-index-options] ...
  pip install [options] [-e] <vcs project url> ...
  pip install [options] [-e] <local project path> ...
  pip install [options] <archive url/path> ...
no such option: --config-settings

triton-inference-server bring-up on kubernetes error

I am facing issues related to incompatible CUDA. My host cuda version is CUDA 12.3 and it seems the Nemo Inference Server for RAG image is built with CUDA 12.2.

kubectl logs pod/triton-inference-server-58f477b7d7-2gnps -n rag-llm-pipeline

/usr/bin/python3 -m model_server llama --max-input-length 3000 --max-output-length 512 --quantization int4_awq
[22] Jan 19 11:39:07 [ INFO] - model_server - Reading the model directory.
[22] Jan 19 11:39:07 [ INFO] - model_server - Model file format: PYTORCH
[22] Jan 19 11:39:07 [ INFO] - model_server - World Size: 1
[22] Jan 19 11:39:07 [ INFO] - model_server - Compute Capability: 7.0
[22] Jan 19 11:39:07 [ INFO] - model_server - Quantization: int4_awq
[22] Jan 19 11:39:07 [ INFO] - model_server - Starting TensorRT Conversion.
[22] Jan 19 11:39:07 [ INFO] - model_server.conversion.llama - Model Format: PYTORCH
[01/19/2024-11:39:10] [TRT-LLM] [I] Setting inter_size to 11008.
[01/19/2024-11:39:10] [TRT-LLM] [I] Serially build TensorRT engines.
[01/19/2024-11:39:10] [TRT] [W] Unable to determine GPU memory usage: out of memory
[01/19/2024-11:39:10] [TRT] [W] Unable to determine GPU memory usage: out of memory
[01/19/2024-11:39:10] [TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 228, GPU 0 (MiB)
[01/19/2024-11:39:10] [TRT] [E] 6: CUDA initialization failure with error: 2. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Traceback (most recent call last):
File "/opt/conversion_scripts/llama/build.py", line 772, in
build(0, args)
File "/opt/conversion_scripts/llama/build.py", line 702, in build
builder = Builder()
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/builder.py", line 75, in init
self._trt_builder = trt.Builder(logger.trt_logger)
TypeError: pybind11::init(): factory function returned nullptr
[22] Jan 19 11:39:11 [ ERROR] - main - TensorRT conversion returned a non-zero exit code.

Error 401 when running application

Hi, the app was running fine last week but last Friday there were the following error, I am using Nvidia's Neva-22b api key.

Exception: [401] Unknown Error {'timestamp': '2024-05-27T01:43:25.029+00:00', 'path': '/v2/nvcf/functions', 'status': 401, 'error': 'Unauthorized', 'requestId': 'd2360345-1860488'}

Tasks

Beta Give feedback

No tasks being tracked yet.

Options

Can't connect to the pgvector database with the developer RAG

Steps to reproduce:

In compose.env, set POSTGRES_PASSWORD=password; POSTGRES_USER=pgadmin; POSTGRES_DB=api.
Start the developer RAG with docker-compose-pgvector.yaml
On the developer RAG UI, swith to the 'kb' page, click 'Add file' to upload a pdf file.

Problem description: The UI show 'Error' for the document upload action. In the logs, saw these messages:

chain-server | connection to server at "xx.xx.xx.xx", port 5432 failed: FATAL: no pg_hba.conf entry for host "192.168.0.1", user "pgadmin", database "pgadmin", no encryption

Root cause: it seems the connection string is not correctly specified in the code. Database name should not be the same as the user name.

Other information: if the default user name 'postgres' is used, then this problem will not happen.

Vamos criar os apps e web sites mais simples para pessoas comuns poderem entender e usar de forma fácil

ImportError: cannot import name 'BaseHTTPResponse' from 'urllib3.response'

chain-server container start error, error info as follows:

log link error_log.txt.txt

langchain_nvidia_trt not working

I have gone through the notebooks but couldn't able to stream the tokens from the TensorRTLLM.
Here's the issue:

Code used:

from langchain_nvidia_trt.llms import TritonTensorRTLLM
import time
import random

triton_url = "localhost:8001"
pload = {
            'tokens':300,
            'server_url': triton_url,
            'model_name': "ensemble",
            'temperature':1.0,
            'top_k':1,
            'top_p':0,
            'beam_width':1,
            'repetition_penalty':1.0,
            'length_penalty':1.0
}
client = TritonTensorRTLLM(**pload)

LLAMA_PROMPT_TEMPLATE = (
 "<s>[INST] <<SYS>>"
 "{system_prompt}"
 "<</SYS>>"
 "[/INST] {context} </s><s>[INST] {question} [/INST]"
)
system_prompt = "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Please ensure that your responses are positive in nature."
context=""
question='What is the fastest land animal?'
prompt = LLAMA_PROMPT_TEMPLATE.format(system_prompt=system_prompt, context=context, question=question)

start_time = time.time()
tokens_generated = 0

for val in client._stream(prompt):
    tokens_generated += 1
    print(val, end="", flush=True)

total_time = time.time() - start_time
print(f"\n--- Generated {tokens_generated} tokens in {total_time} seconds ---")
print(f"--- {tokens_generated/total_time} tokens/sec")

Error message has incorrect model engine name of nemo-infer instead of ai-playground

RetrievalAugmentedGeneration.common.utils.get_llm() reports the incorrect model_engine name in the last line of the function in error string. Says "Supported engines are triton-trt-llm and nemo-infer", but should say "Supported engines are triton-trt-llm and ai-playground".

In the config.yaml file, we must specify llm.model_engine as one of [triton-trt-llm, ai-playground].

Fase 3 foco educação precose de linguagem de programação escolar para futuros gênios

// Importando os módulos necessários
import React from 'react';
import { View, Text, Button } from 'react-native';

// Componente da página inicial
function HomeScreen({ navigation }) {
return (
<View style={{ flex: 1, alignItems: 'center', justifyContent: 'center' }}>
Bem-vindo ao EduConnect!
<Button
title="Ir para a página de aulas"
onPress={() => navigation.navigate('Aulas')}
/>

);
}

// Componente da página de aulas
function AulasScreen() {
return (
<View style={{ flex: 1, alignItems: 'center', justifyContent: 'center' }}>
Aulas

);
}

// Configuração do navegador
const Stack = createStackNavigator();

function App() {
return (

<Stack.Navigator initialRouteName="Home">
<Stack.Screen name="Home" component={HomeScreen} options={{ title: 'Início' }} />
<Stack.Screen name="Aulas" component={AulasScreen} />
</Stack.Navigator>

);
}

export default App;

k8s deploy gives an error in llm-playground

The error occurs even though the module is installed.

Traceback (most recent call last): │
│ File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main │
│ return _run_code(code, main_globals, None, │
│ File "/usr/lib/python3.8/runpy.py", line 87, in _run_code │
│ exec(code, run_globals) │
│ File "/app/frontend/main.py", line 25, in │
│ import uvicorn │
│ ModuleNotFoundError: No module named 'uvicorn' │
│ Stream closed EOF for rag-llm-pipeline/llm-playground-7f7775fb6c-fvnxr (llm-playground)

Please check and get back to us.

Unauthorized issue

File "/root/Python-3.10.14/oranbot/lib/python3.10/site-packages/langchain_nvidia_ai_endpoints/_common.py", line 311, in _try_raise
raise Exception(f"{header}\n{body}") from None
Exception: [401] Unauthorized
Bearer error="invalid_token"
error_description="Bearer token is malformed"
error_uri="https://tools.ietf.org/html/rfc6750#section-3.1"
Please check or regenerate your API key.

This is the error we are getting in the vector db creation

Data not written to the specified pgvector database with the developer RAG