rh-aiservices-bu / llm-on-openshift

Resources, demos, and recipes for working with LLMs on OpenShift with OpenShift AI or Open Data Hub.

License: Apache License 2.0

Dockerfile 69.92% Python 12.73% Jupyter Notebook 17.35%

llm-on-openshift's Introduction

LLM on OpenShift

In this repo you will find resources, demos, and recipes for working with LLMs on OpenShift with OpenShift AI or Open Data Hub.

Content

Inference Servers

The following Inference Servers for LLMs can be deployed standalone on OpenShift:

vLLM: how to deploy vLLM, the "Easy, fast, and cheap LLM serving for everyone".
Hugging Face TGI: how to deploy the Text Generation Inference server from Hugging Face.
Caikit-TGIS-Serving (external): how to deploy the Caikit-TGIS-Serving stack, from OpenDataHub.
Ollama: how to deploy Ollama using CPU only for inference.
SBERT: runtime to serve Sentence Transformers models.

Serving Runtimes deployment

The following Runtimes can be imported in the Single-Model Serving stack of Open Data Hub or OpenShift AI.

Vector Databases

The following Databases can be used as a Vector Store for Retrieval Augmented Generation (RAG) applications:

Milvus: Full recipe to deploy the Milvus vector store, in standalone or cluster mode.
PostgreSQL+pgvector: Full recipe to create an instance of PostgreSQL with the pgvector extension, making it usable as a vector store.
Redis: Full recipe to deploy Redis, create a Cluster and a suitable Database for a Vector Store.

Inference and application examples

Caikit: Basic example demonstrating how to work with Caikit+TGIS for LLM serving.
Langchain examples: Various notebooks demonstrating how to work with Langchain. Examples are provided for different types of LLM servers (standalone or using the Single-Model Serving stack of Open Data Hub or OpenShift AI) and different vector databases.
Langflow examples: Various examples demonstrating how to work with Langflow.
UI examples: Various examples on how to create and deploy a UI to interact with your LLM.

llm-on-openshift's Issues

docs: add updated example with caikit-nlp-client

The Python client has been released on PyPI and is available at https://github.com/opendatahub-io/caikit-nlp-client. The docs should be updated to use this new client.

Enable vLLM for FIPS enabled environments

vLLM Deployment fails on FIPS enabled clusters.
Adding the following to the Dockerfile of the vLLM build fixes that issue:

USER 0
RUN sed -i s/md5/sha1/g /opt/app-root/lib64/python3.11/site-packages/triton/runtime/jit.py
USER 1001
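
For context, here is a minimal sketch of why swapping the hash might help. It assumes the failure comes from md5 being disabled by the FIPS-enabled OpenSSL build, which the issue does not state explicitly. The helper name stable_digest is hypothetical and is not part of Triton.

```python
import hashlib


def stable_digest(data: bytes) -> str:
    """Return a hex digest suitable for a non-security cache key.

    Assumption: on a FIPS-enabled host, hashlib.md5() raises ValueError
    because the algorithm is disabled, while sha1 remains callable.
    """
    try:
        # usedforsecurity=False (Python 3.9+) declares that the hash is not
        # used for security purposes, which FIPS builds may permit.
        return hashlib.md5(data, usedforsecurity=False).hexdigest()
    except ValueError:
        # Same substitution as the sed patch in the issue: md5 -> sha1.
        return hashlib.sha1(data).hexdigest()
```

Either branch returns a deterministic digest for the same input, which is all a cache key needs.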

the container fails when built using the container file given in the example

Please refer to the file https://github.com/rh-aiservices-bu/llm-on-openshift/blob/main/examples/ui/gradio/gradio-hftgi-rag-redis/Containerfile. When you try to run the container, it fails with the error: python: can't open file '/opt/app-root/src/app.py': [Errno 13] Permission denied. I was using my own app.py. @guimou FYI

Multi GPU setup for VLLM in Openshift does not work

Hi, I tried using your deployment.yaml. The single-GPU instance works, but the multi-GPU setup stalls. Here is the log output:

/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
WARNING 06-10 20:49:01 config.py:1155] Casting torch.bfloat16 to torch.float16.
2024-06-10 20:49:04,319 INFO worker.py:1749 -- Started a local Ray instance.
INFO 06-10 20:49:04 llm_engine.py:161] Initializing an LLM engine (v0.4.3) with config: model='mistralai/Mistral-7B-Instruct-v0.2', speculative_config=None, tokenizer='mistralai/Mistral-7B-Instruct-v0.2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=6144, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=mistralai/Mistral-7B-Instruct-v0.2)

And here is the modified deployment file:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: vllm
  labels:
    app: vllm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: vllm
    spec:
      imagePullSecrets:
      - name: regcred
      restartPolicy: Always
      schedulerName: default-scheduler
      affinity: {}
      terminationGracePeriodSeconds: 120
      securityContext: {}
      containers:
        - resources:
            limits:
              cpu: '8'
              memory: 24Gi
              nvidia.com/gpu: '2'
            requests:
              cpu: '6'
          readinessProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: server
          livenessProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            timeoutSeconds: 8
            periodSeconds: 100
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: HUGGING_FACE_HUB_TOKEN
              value: xxxxxx
          args: [
            "--model",
            "mistralai/Mistral-7B-Instruct-v0.2",
            "--dtype", "float16",
            "--max-model-len", "6144",
            "--tensor-parallel-size", "2"]
          securityContext:
            capabilities:
              drop:
                - ALL
            runAsNonRoot: false
            allowPrivilegeEscalation: True
            seccompProfile:
              type: RuntimeDefault
          ports:
            - name: http
              containerPort: 8000
              protocol: TCP
          imagePullPolicy: IfNotPresent
          startupProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            timeoutSeconds: 1
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 24
          volumeMounts:
            - mountPath: /opt/app-root/src/.cache/huggingface/hub
              name: model
            - name: shm
              mountPath: /dev/shm
          terminationMessagePolicy: File
          image: 'quay.io/rh-aiservices-bu/vllm-openai-ubi9:0.4.2'
      volumes:
        - name: model
          persistentVolumeClaim:
            claimName: hub-pv-filesystem
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 10Gi
      dnsPolicy: ClusterFirst
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
  strategy:
    type: Recreate
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600

Granite support in vLLM containers

It looks like full Granite support was added to vLLM in v0.4.3, and there are some enhancements coming in the 'vtest' release.

Currently, I only see v0.4.2 available in Quay: https://quay.io/rh-aiservices-bu/vllm-openai-ubi9:0.4.2

Could the vllm-openai-ubi9 image be rebuilt with vLLM v0.4.3?

Does it make sense to also provide vtest release images?

new serving container image would be needed with trust_remote_code=True

Getting this error

Traceback (most recent call last):
  File "/opt/app-root/src/app.py", line 35, in <module>
    model = SentenceTransformer(args.model_path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 287, in __init__
    modules = self._load_sbert_model(
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 1487, in _load_sbert_model
    module = Transformer(model_name_or_path, cache_dir=cache_folder, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/sentence_transformers/models/Transformer.py", line 53, in __init__
    config = AutoConfig.from_pretrained(model_name_or_path, **config_args, cache_dir=cache_dir)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 937, in from_pretrained
    trust_remote_code = resolve_trust_remote_code(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/transformers/dynamic_module_utils.py", line 639, in resolve_trust_remote_code
    raise ValueError(
ValueError: Loading /mnt/models requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error.

As per the Slack conversation:

llm-on-openshift/llm-servers/sbert/gpu/app.py

Line 35 in 1b864db

model = SentenceTransformer(args.model_path)

So the fix would be: model = SentenceTransformer(args.model_path, trust_remote_code=True)
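
Since trust_remote_code=True executes code shipped with the model, one option is to make it opt-in rather than hard-coded. The following is only a sketch: the flag names --model_path and --trust_remote_code and the helper functions are assumptions, not the actual contents of app.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical opt-in flag for remote code."""
    parser = argparse.ArgumentParser(description="SBERT runtime (sketch)")
    # /mnt/models is the model location shown in the error message.
    parser.add_argument("--model_path", default="/mnt/models")
    parser.add_argument(
        "--trust_remote_code",
        action="store_true",
        help="Allow execution of custom code bundled with the model",
    )
    return parser


def load_model(args: argparse.Namespace):
    """Load the model, forwarding the opt-in flag to SentenceTransformer."""
    # Imported lazily so the parser can be exercised without the dependency.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(
        args.model_path, trust_remote_code=args.trust_remote_code
    )
```

With this layout, the serving image would keep the safe default, and a deployment that needs custom model code would add the flag to its container args.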

https://redhat-internal.slack.com/archives/C03UGJY6Z1A/p1721669067245059?thread_ts=1721160037.008339&cid=C03UGJY6Z1A

ValueError: "CaikitLLM" object has no field "inference_server"

I am trying to run through one of the example notebooks with the Minimal image from OpenShift AI and I am getting the error below.

Langchain-Caikit-Basic.ipynb

# Basic llm object definition, no text streaming
llm = caikit_tgis_langchain.CaikitLLM(
    inference_server_url=inference_server_url,
    model_id=model_id,
    certificate_chain=certificate_chain_file,
    streaming=False
)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 2
      1 # Basic llm object definition, no text streaming
----> 2 llm = caikit_tgis_langchain.CaikitLLM(
      3     inference_server_url=inference_server_url,
      4     model_id=model_id,
      5     certificate_chain=certificate_chain_file,
      6     streaming=False
      7 )

File ~/llm-on-openshift/examples/notebooks/langchain/caikit_tgis_langchain.py:21, in CaikitLLM.__init__(self, inference_server_url, model_id, certificate_chain, streaming)
     12 def __init__(
     13     self,
     14     inference_server_url: str,
   (...)
     17     streaming: bool = False,
     18 ):
     19     super().__init__()
---> 21     self.inference_server = inference_server_url
     22     self.model_id = model_id
     24     if certificate_chain:

File /opt/app-root/lib64/python3.9/site-packages/pydantic/v1/main.py:357, in BaseModel.__setattr__(self, name, value)
    354     return object_setattr(self, name, value)
    356 if self.__config__.extra is not Extra.allow and name not in self.__fields__:
--> 357     raise ValueError(f'"{self.__class__.__name__}" object has no field "{name}"')
    358 elif not self.__config__.allow_mutation or self.__config__.frozen:
    359     raise TypeError(f'"{self.__class__.__name__}" is immutable and does not support item assignment')

ValueError: "CaikitLLM" object has no field "inference_server"
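
The traceback shows the pydantic v1 base class rejecting assignment to an attribute that was never declared as a field. The sketch below reproduces that check with a stdlib-only stand-in (StrictModel is hypothetical and is not pydantic itself), to illustrate why declaring the attribute on the class is one plausible fix. The actual remedy in caikit_tgis_langchain.py may differ depending on the LangChain and pydantic versions installed.

```python
class StrictModel:
    """Stand-in mimicking the pydantic v1 __setattr__ check in the traceback."""

    def __setattr__(self, name, value):
        # Only names declared as class-level annotations count as fields.
        declared = getattr(type(self), "__annotations__", {})
        if name not in declared:
            raise ValueError(
                f'"{type(self).__name__}" object has no field "{name}"'
            )
        object.__setattr__(self, name, value)


class BrokenLLM(StrictModel):
    def __init__(self, inference_server_url: str):
        # Undeclared attribute: the assignment raises ValueError.
        self.inference_server = inference_server_url


class FixedLLM(StrictModel):
    # Declared as a field, so the assignment in __init__ is accepted.
    inference_server: str

    def __init__(self, inference_server_url: str):
        self.inference_server = inference_server_url
```

Constructing BrokenLLM raises the same kind of message as the notebook, while FixedLLM accepts the value.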

redis database failed to create

Hi,

I am trying this demo on OpenShift 4.13.15. When I try to create the Redis database using this manifest:

apiVersion: app.redislabs.com/v1alpha1
kind: RedisEnterpriseDatabase
metadata:
  name: my-doc
spec:
  memorySize: 4GB
  modulesList:
    - name: search
      version: 2.8.4
  persistence: snapshotEvery12Hour
  replication: true
  tlsMode: disabled
  type: redis

I get this error:

Error from server: error when creating "/home/3node//tmp/redis-db.yaml": admission webhook "redisenterprise.admission.redislabs" denied the request: cluster missing this module: search version: 2.8.4

The installed Redis Enterprise Operator version is 7.2.4-7.0, provided by Redis.

I also created a route to the Redis management UI and found that no module is loaded.

But I can see the modules under /opt/persistent after logging into the rec-0 pod:

[3node@ocp4-helper ~]$ oc exec -n llm-demo rec-0 -- ls persistent/modules
Defaulted container "redis-enterprise-node" out of: redis-enterprise-node, bootstrapper
ReJSON
bf
graph
redisgears_2
search
timeseries

I also tried to upload the module using the management UI, but the module failed to import.

Could you please point out what to troubleshoot next?

Regards,
George

vLLM 0.4.0.post1 image missing libnccl

Trying to run multi-GPU inference with this image, I get the below error:

INFO 04-23 19:30:01 pynccl_utils.py:17] Failed to import NCCL library: libnccl.so.2: cannot open shared object file: No such file or directory
INFO 04-23 19:30:01 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
INFO 04-23 19:30:03 selector.py:16] Using FlashAttention backend.
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:04 pynccl.py:53] Failed to load NCCL library from libnccl.so.2 .It is expected if you are not running on NVIDIA/AMD GPUs.Otherwise please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
(RayWorkerVllm pid=1209) INFO 04-23 19:30:04 pynccl_utils.py:17] Failed to import NCCL library: libnccl.so.2: cannot open shared object file: No such file or directory
(RayWorkerVllm pid=1209) INFO 04-23 19:30:04 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
(RayWorkerVllm pid=1209) INFO 04-23 19:30:05 selector.py:16] Using FlashAttention backend.
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] Traceback (most recent call last):
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/ray_utils.py", line 37, in execute_method
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44]     return executor(*args, **kwargs)
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 100, in init_device
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44]     init_distributed_environment(self.parallel_config, self.rank,
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 287, in init_distributed_environment
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44]     pynccl_utils.init_process_group(
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/parallel_utils/pynccl_utils.py", line 45, in init_process_group
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44]     logger.info(f"vLLM is using nccl=={ncclGetVersion()}")
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44]                                        ^^^^^^^^^^^^^^
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] NameError: name 'ncclGetVersion' is not defined

Looking at the installed packages, I don't see any libnccl installed. It looks like from

llm-on-openshift/llm-servers/vllm/Containerfile

Line 53 in 6864d21

ENV NCCL_VERSION 2.17.1

that the intention was perhaps to install a matching libnccl, but it just got missed?
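
A quick way to confirm the diagnosis from inside the container is to try loading the library directly. This is a minimal sketch using only the standard library. The function names are hypothetical, and the VLLM_NCCL_SO_PATH variable is taken from the log message above.

```python
import ctypes
import os


def nccl_library_name() -> str:
    """Return the NCCL library to probe, honoring VLLM_NCCL_SO_PATH if set."""
    return os.environ.get("VLLM_NCCL_SO_PATH", "libnccl.so.2")


def can_load_library(name: str) -> bool:
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the file is missing or cannot be loaded.
        return False
    return True
```

Running can_load_library(nccl_library_name()) in the pod should return False on the affected image if libnccl is indeed absent.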

databaseSecretName throws error

Issue:
When trying to apply the RedisEnterpriseDatabase, I get the error below:

Solution:

(either) create the secret with name "redb-my-doc" following the instructions here
(or) remove the databaseSecretName attribute altogether.

Update Ollama to at least 0.2.1

I'd like to use the v1/models endpoint that was added last week. Could you please rebuild and publish a new image on Quay.io based on Ollama 0.2.1?

llm-on-openshift/llm-servers/ollama/Containerfile

Line 2 in 1b864db

ARG OLLAMA_VERSION=v0.1.45
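
Once a newer image is available, the endpoint could be checked with a few lines of standard-library Python. This is a sketch under assumptions: the base URL is a placeholder for your own Ollama route or service, and the response is assumed to follow the OpenAI-style list format (a "data" array of objects with an "id").

```python
import json
from urllib.request import urlopen


def model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-style list-models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:11434") -> list:
    """Query <base_url>/v1/models and return the model ids it reports."""
    # base_url is a placeholder; 11434 is Ollama's default port.
    with urlopen(f"{base_url}/v1/models", timeout=10) as response:
        return model_ids(json.load(response))
```

On an image still built from v0.1.45, the request would be expected to fail, since the issue reports the endpoint as newly added.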

Vector Extension not enabled in PGVector

In the following notebook:

https://github.com/rh-aiservices-bu/llm-on-openshift/blob/main/examples/notebooks/langchain/Langchain-PgVector-Ingest.ipynb

The "Create the index and ingest the documents" step fails with the following error message:

Exception: Failed to create vector extension: (psycopg.errors.InsufficientPrivilege) permission denied to create extension "vector"
HINT:  Must be superuser to create this extension.
[SQL: BEGIN;SELECT pg_advisory_xact_lock(1573678846307946496);CREATE EXTENSION IF NOT EXISTS vector;COMMIT;]
(Background on this error at: https://sqlalche.me/e/20/f405)

The full stack trace can be found here:

---------------------------------------------------------------------------
InsufficientPrivilege                     Traceback (most recent call last)
File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:1971, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1970     if not evt_handled:
-> 1971         self.dialect.do_execute(
   1972             cursor, str_statement, effective_parameters, context
   1973         )
   1975 if self._has_events or self.engine._has_events:

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/default.py:919, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
    918 def do_execute(self, cursor, statement, parameters, context=None):
--> 919     cursor.execute(statement, parameters)

File /opt/app-root/lib64/python3.9/site-packages/psycopg/cursor.py:732, in Cursor.execute(self, query, params, prepare, binary)
    731 except e._NO_TRACEBACK as ex:
--> 732     raise ex.with_traceback(None)
    733 return self

InsufficientPrivilege: permission denied to create extension "vector"
HINT:  Must be superuser to create this extension.

The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)
File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:383, in PGVector.create_vector_extension(self)
    377 statement = sqlalchemy.text(
    378     "BEGIN;"
    379     "SELECT pg_advisory_xact_lock(1573678846307946496);"
    380     "CREATE EXTENSION IF NOT EXISTS vector;"
    381     "COMMIT;"
    382 )
--> 383 session.execute(statement)
    384 session.commit()

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/orm/session.py:2306, in Session.execute(self, statement, params, execution_options, bind_arguments, _parent_execute_state, _add_event)
   2255 r"""Execute a SQL expression construct.
   2256 
   2257 Returns a :class:`_engine.Result` object representing
   (...)
   2304 
   2305 """
-> 2306 return self._execute_internal(
   2307     statement,
   2308     params,
   2309     execution_options=execution_options,
   2310     bind_arguments=bind_arguments,
   2311     _parent_execute_state=_parent_execute_state,
   2312     _add_event=_add_event,
   2313 )

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/orm/session.py:2200, in Session._execute_internal(self, statement, params, execution_options, bind_arguments, _parent_execute_state, _add_event, _scalar_result)
   2199 else:
-> 2200     result = conn.execute(
   2201         statement, params or {}, execution_options=execution_options
   2202     )
   2204 if _scalar_result:

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:1422, in Connection.execute(self, statement, parameters, execution_options)
   1421 else:
-> 1422     return meth(
   1423         self,
   1424         distilled_parameters,
   1425         execution_options or NO_OPTIONS,
   1426     )

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/sql/elements.py:514, in ClauseElement._execute_on_connection(self, connection, distilled_params, execution_options)
    513         assert isinstance(self, Executable)
--> 514     return connection._execute_clauseelement(
    515         self, distilled_params, execution_options
    516     )
    517 else:

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:1644, in Connection._execute_clauseelement(self, elem, distilled_parameters, execution_options)
   1636 compiled_sql, extracted_params, cache_hit = elem._compile_w_cache(
   1637     dialect=dialect,
   1638     compiled_cache=compiled_cache,
   (...)
   1642     linting=self.dialect.compiler_linting | compiler.WARN_LINTING,
   1643 )
-> 1644 ret = self._execute_context(
   1645     dialect,
   1646     dialect.execution_ctx_cls._init_compiled,
   1647     compiled_sql,
   1648     distilled_parameters,
   1649     execution_options,
   1650     compiled_sql,
   1651     distilled_parameters,
   1652     elem,
   1653     extracted_params,
   1654     cache_hit=cache_hit,
   1655 )
   1656 if has_events:

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:1850, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
   1849 else:
-> 1850     return self._exec_single_context(
   1851         dialect, context, statement, parameters
   1852     )

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:1990, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1989 except BaseException as e:
-> 1990     self._handle_dbapi_exception(
   1991         e, str_statement, effective_parameters, cursor, context
   1992     )
   1994 return result

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:2357, in Connection._handle_dbapi_exception(self, e, statement, parameters, cursor, context, is_sub_exec)
   2356     assert sqlalchemy_exception is not None
-> 2357     raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
   2358 else:

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:1971, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1970     if not evt_handled:
-> 1971         self.dialect.do_execute(
   1972             cursor, str_statement, effective_parameters, context
   1973         )
   1975 if self._has_events or self.engine._has_events:

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/default.py:919, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
    918 def do_execute(self, cursor, statement, parameters, context=None):
--> 919     cursor.execute(statement, parameters)

File /opt/app-root/lib64/python3.9/site-packages/psycopg/cursor.py:732, in Cursor.execute(self, query, params, prepare, binary)
    731 except e._NO_TRACEBACK as ex:
--> 732     raise ex.with_traceback(None)
    733 return self

ProgrammingError: (psycopg.errors.InsufficientPrivilege) permission denied to create extension "vector"
HINT:  Must be superuser to create this extension.
[SQL: BEGIN;SELECT pg_advisory_xact_lock(1573678846307946496);CREATE EXTENSION IF NOT EXISTS vector;COMMIT;]
(Background on this error at: https://sqlalche.me/e/20/f405)

The above exception was the direct cause of the following exception:

Exception                                 Traceback (most recent call last)
Cell In[14], line 3
      1 embeddings = HuggingFaceEmbeddings()
----> 3 db = PGVector.from_documents(
      4     documents=all_splits,
      5     embedding=embeddings,
      6     collection_name=COLLECTION_NAME,
      7     connection_string=CONNECTION_STRING,
      8     #pre_delete_collection=True # This deletes existing collection and its data, use carefully!
      9 )

File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:1139, in PGVector.from_documents(cls, documents, embedding, collection_name, distance_strategy, ids, pre_delete_collection, use_jsonb, **kwargs)
   1135 connection_string = cls.get_connection_string(kwargs)
   1137 kwargs["connection_string"] = connection_string
-> 1139 return cls.from_texts(
   1140     texts=texts,
   1141     pre_delete_collection=pre_delete_collection,
   1142     embedding=embedding,
   1143     distance_strategy=distance_strategy,
   1144     metadatas=metadatas,
   1145     ids=ids,
   1146     collection_name=collection_name,
   1147     use_jsonb=use_jsonb,
   1148     **kwargs,
   1149 )

File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:1011, in PGVector.from_texts(cls, texts, embedding, metadatas, collection_name, distance_strategy, ids, pre_delete_collection, use_jsonb, **kwargs)
   1003 """
   1004 Return VectorStore initialized from texts and embeddings.
   1005 Postgres connection string is required
   1006 "Either pass it as a parameter
   1007 or set the PGVECTOR_CONNECTION_STRING environment variable.
   1008 """
   1009 embeddings = embedding.embed_documents(list(texts))
-> 1011 return cls.__from(
   1012     texts,
   1013     embeddings,
   1014     embedding,
   1015     metadatas=metadatas,
   1016     ids=ids,
   1017     collection_name=collection_name,
   1018     distance_strategy=distance_strategy,
   1019     pre_delete_collection=pre_delete_collection,
   1020     use_jsonb=use_jsonb,
   1021     **kwargs,
   1022 )

File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:481, in PGVector.__from(cls, texts, embeddings, embedding, metadatas, ids, collection_name, distance_strategy, connection_string, pre_delete_collection, use_jsonb, **kwargs)
    478 if connection_string is None:
    479     connection_string = cls.get_connection_string(kwargs)
--> 481 store = cls(
    482     connection_string=connection_string,
    483     collection_name=collection_name,
    484     embedding_function=embedding,
    485     distance_strategy=distance_strategy,
    486     pre_delete_collection=pre_delete_collection,
    487     use_jsonb=use_jsonb,
    488     **kwargs,
    489 )
    491 store.add_embeddings(
    492     texts=texts, embeddings=embeddings, metadatas=metadatas, ids=ids, **kwargs
    493 )
    495 return store

File /opt/app-root/lib64/python3.9/site-packages/langchain_core/_api/deprecation.py:183, in deprecated.<locals>.deprecate.<locals>.finalize.<locals>.warn_if_direct_instance(self, *args, **kwargs)
    181     warned = True
    182     emit_warning()
--> 183 return wrapped(self, *args, **kwargs)

File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:341, in PGVector.__init__(self, connection_string, embedding_function, embedding_length, collection_name, collection_metadata, distance_strategy, pre_delete_collection, logger, relevance_score_fn, connection, engine_args, use_jsonb, create_extension)
    320 if not use_jsonb:
    321     # Replace with a deprecation warning.
    322     warn_deprecated(
    323         "0.0.29",
    324         pending=True,
   (...)
    339         ),
    340     )
--> 341 self.__post_init__()

File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:348, in PGVector.__post_init__(self)
    346 """Initialize the store."""
    347 if self.create_extension:
--> 348     self.create_vector_extension()
    350 EmbeddingStore, CollectionStore = _get_embedding_collection_store(
    351     self._embedding_length, use_jsonb=self.use_jsonb
    352 )
    353 self.CollectionStore = CollectionStore

File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:386, in PGVector.create_vector_extension(self)
    384         session.commit()
    385 except Exception as e:
--> 386     raise Exception(f"Failed to create vector extension: {e}") from e

Exception: Failed to create vector extension: (psycopg.errors.InsufficientPrivilege) permission denied to create extension "vector"
HINT:  Must be superuser to create this extension.
[SQL: BEGIN;SELECT pg_advisory_xact_lock(1573678846307946496);CREATE EXTENSION IF NOT EXISTS vector;COMMIT;]
(Background on this error at: https://sqlalche.me/e/20/f405)

Workaround:

Access the terminal for the PGVector pod.
Open the PostgreSQL CLI: psql
Select the vector database: \c vectordb
Create the vector extension: CREATE EXTENSION vector;
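
The interactive steps above can also be collapsed into a single non-interactive psql call, which is easier to script. This is a sketch: the function names are hypothetical, and it assumes it runs inside the PGVector pod as a user allowed to create extensions.

```python
import subprocess


def psql_create_extension_command(database: str = "vectordb") -> list:
    """Build a psql invocation that enables pgvector in one database."""
    # -d selects the database, -c runs a single SQL command and exits.
    return [
        "psql",
        "-d",
        database,
        "-c",
        "CREATE EXTENSION IF NOT EXISTS vector;",
    ]


def create_vector_extension(database: str = "vectordb") -> None:
    """Run the command; raises CalledProcessError if psql reports failure."""
    subprocess.run(psql_create_extension_command(database), check=True)
```

The traceback also shows PGVector.__init__ accepting a create_extension parameter. Passing create_extension=False from the notebook once the extension exists may avoid the privileged statement entirely, though that is untested here.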

The yq script for setting the security context in the vectordb-minio deployment for Milvus is not correct

Running the yq command as described here produces an invalid deployment for vectordb-minio. Trying to apply the resulting deployment produces this error:

$ oc -n milvus apply -f milvus_manifest_standalone.yaml
secret/vectordb-minio unchanged
configmap/vectordb-minio unchanged
configmap/vectordb-milvus unchanged
persistentvolumeclaim/vectordb-minio unchanged
persistentvolumeclaim/vectordb-milvus unchanged
service/vectordb-minio unchanged
service/vectordb-milvus unchanged
deployment.apps/vectordb-milvus-standalone configured
the namespace from the provided object "default" does not match the namespace "milvus". You must pass '--namespace=default' to perform this operation.
Error from server (BadRequest): error when creating "milvus_manifest_standalone.yaml": Deployment in version "v1" cannot be handled as a Deployment: strict decoding error: unknown field "spec.template.spec.securityContext.allowPrivilegeEscalation", unknown field "spec.template.spec.securityContext.capabilities"

This is because the command:

yq '(select(.kind == "Deployment" and .metadata.name == "vectordb-minio") | .spec.template.spec.securityContext) = {"capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "allowPrivilegeEscalation": false}' -i milvus_manifest_standalone.yaml

applies pod-level security context fields that should be set at container level.

The correct setting should be:

yq '(select(.kind == "Deployment" and .metadata.name == "vectordb-minio") | .spec.template.spec.containers[0].securityContext) = {"capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "allowPrivilegeEscalation": false, "seccompProfile": {"type": "RuntimeDefault"} }' -i milvus_manifest_standalone.yaml
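
To make the pod-level versus container-level distinction concrete, here is a small Python sketch of the same patch applied to a manifest that has already been loaded into a dict. The function name and the sample manifest are hypothetical. The security context values are the ones from the corrected command.

```python
import copy

# Fields such as capabilities and allowPrivilegeEscalation are only valid
# on a container's securityContext, not on the pod's.
CONTAINER_SECURITY_CONTEXT = {
    "capabilities": {"drop": ["ALL"]},
    "runAsNonRoot": True,
    "allowPrivilegeEscalation": False,
    "seccompProfile": {"type": "RuntimeDefault"},
}


def harden_first_container(deployment: dict) -> dict:
    """Return a copy of the Deployment with the first container hardened."""
    patched = copy.deepcopy(deployment)
    container = patched["spec"]["template"]["spec"]["containers"][0]
    # Set the context on the container, mirroring the corrected yq path
    # .spec.template.spec.containers[0].securityContext
    container["securityContext"] = copy.deepcopy(CONTAINER_SECURITY_CONTEXT)
    return patched
```

The input dict is left untouched, so the original and patched manifests can be compared before applying anything to the cluster.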