milvus-io / milvus-helm
The helm chart to deploy Milvus
License: Apache License 2.0
My goal is: change the default mysql password and remove it from the helm chart values, because both constitute a potential security issue.
Milvus uses this connection string to authenticate to mysql:
mysql://root:{{ .Values.mysql.mysqlRootPassword }}@{{ .Release.Name }}-mysql:3306/{{ .Values.mysql.mysqlDatabase }}
I could use an existing secret for the mysql installation, as the documentation suggests, via something like mysql.secretName, but then milvus ignores it via the line above and still tries to log in using the credentials from the chart. The desired behaviour is to somehow use the existing mysql secret here.
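A sketch of what this could look like if the chart supported an existing secret (mysql.existingSecret and MILVUS_META_URI are hypothetical names, not current chart values): the password is injected as an env var from the secret and the URI is assembled from it, so the plaintext password never appears in the rendered manifest.

```yaml
# Hypothetical template sketch -- not the current chart behaviour.
env:
  - name: MYSQL_PASSWORD
    valueFrom:
      secretKeyRef:
        name: {{ .Values.mysql.existingSecret }}   # e.g. an existing mysql secret
        key: mysql-root-password
  - name: MILVUS_META_URI
    value: "mysql://root:$(MYSQL_PASSWORD)@{{ .Release.Name }}-mysql:3306/{{ .Values.mysql.mysqlDatabase }}"
```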
I was configuring this chart to run on a kubernetes cluster (kops) on Amazon, and when I configure the volume with:
persistence:
  mountPath: "/var/lib/milvus/db"
  ## If true, milvus will create/use a Persistent Volume Claim
  ## If false, use emptyDir
  ##
  enabled: true
  annotations: {}
  #  helm.sh/resource-policy: keep
  persistentVolumeClaim:
    existingClaim: ""
    ## milvus data Persistent Volume Storage Class
    ## If defined, storageClassName: <storageClass>
    ## If set to "-", storageClassName: "", which disables dynamic provisioning
    ## If undefined (the default) or set to null, no storageClassName spec is
    ##   set, choosing the default provisioner. (gp2 on AWS, standard on
    ##   GKE, AWS & OpenStack)
    ##
    storageClass: "gp2"
    accessModes: ReadWriteOnce
    size: 50Gi
    subPath: ""
I get the result (No error message):
__ _________ _ ____ ______
/ |/ / _/ /| | / / / / / __/
/ /|_/ // // /_| |/ / /_/ /\ \
/_/ /_/___/____/___/\____/___/
Welcome to use Milvus!
Milvus Release version: v0.10.0, built at 2020-06-15 14:51.51, with OpenBLAS library.
You are using Milvus CPU edition
Last commit id: 5f3c0052478a08d07d68c5dc1aab57a42293f430
Loading configuration from: ../conf/server_config.yaml
Supported CPU instruction sets: avx2, sse4_2
FAISS hook AVX2
Milvus server started successfully!
Milvus server is going to shutdown ...
If I leave the chart's default configuration I have no problem, but if the pod is recreated I lose the data.
Failed to install app milvus. Error: rendered manifests contain a resource that already exists. Unable to continue with install: Service "milvus" in namespace "milvus" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "milvus"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "milvus"
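If the goal is to keep the existing Service and have Helm adopt it, one common fix (Helm 3.2+) is to add exactly the ownership metadata the error message asks for, then retry the install:

```shell
# Label and annotate the existing Service so Helm can adopt it into the release.
kubectl -n milvus label service milvus "app.kubernetes.io/managed-by=Helm"
kubectl -n milvus annotate service milvus \
  "meta.helm.sh/release-name=milvus" \
  "meta.helm.sh/release-namespace=milvus"
```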
I have a k8s cluster running with 0.10.5, using AWS-EFS as external storage.
❯ helm history poc-release
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
4 Thu Feb 4 18:22:01 2021 superseded milvus-0.10.5 0.10.5 Upgrade complete
5 Wed Feb 10 11:47:46 2021 superseded milvus-0.10.5 0.10.5 Upgrade complete
6 Wed Feb 10 14:54:56 2021 superseded milvus-0.10.5 0.10.5 Upgrade complete
7 Wed Feb 10 15:47:03 2021 superseded milvus-0.10.5 0.10.5 Upgrade complete
8 Tue Aug 24 11:28:22 2021 pending-upgrade milvus-1.1.6 1.1.1 Preparing upgrade
9 Tue Aug 24 11:39:02 2021 superseded milvus-0.10.5 0.10.5 Rollback to 7
10 Tue Aug 24 11:39:24 2021 superseded milvus-1.1.6 1.1.1 Upgrade complete
11 Tue Aug 24 11:40:38 2021 superseded milvus-0.10.5 0.10.5 Rollback to 7
12 Tue Aug 24 11:43:31 2021 superseded milvus-1.1.6 1.1.1 Upgrade complete
13 Tue Aug 24 11:54:39 2021 deployed milvus-0.10.5 0.10.5 Rollback to 9
When I tried to upgrade my cluster to 1.1.1 with (https://artifacthub.io/packages/helm/milvus/milvus/1.1.6):
cd milvus-helm/
git checkout 1.1
<update values.yaml>
helm upgrade <release> .
It seems only -milvus-mishards and -milvus-readonly actually init new pods, and the newly created mishards node crashes with:
2021-08-24 15:49:24,501 | INFO: ----------- DiscoveryConfig ----------------- (__init__.py:18) (MainThread)
2021-08-24 15:49:24,501 | INFO: DISCOVERY_CLASS_NAME: kubernetes (__init__.py:20) (MainThread)
2021-08-24 15:49:24,501 | INFO: DISCOVERY_KUBERNETES_NAMESPACE: default (__init__.py:20) (MainThread)
2021-08-24 15:49:24,501 | INFO: DISCOVERY_KUBERNETES_IN_CLUSTER: True (__init__.py:20) (MainThread)
2021-08-24 15:49:24,501 | INFO: DISCOVERY_KUBERNETES_POLL_INTERVAL: 10 (__init__.py:20) (MainThread)
2021-08-24 15:49:24,501 | INFO: DISCOVERY_KUBERNETES_POD_PATT: poc-release-milvus-readonly-.* (__init__.py:20) (MainThread)
2021-08-24 15:49:24,501 | INFO: DISCOVERY_KUBERNETES_LABEL_SELECTOR: component=readonly (__init__.py:20) (MainThread)
2021-08-24 15:49:24,501 | INFO: --------------------------------------------- (__init__.py:23) (MainThread)
2021-08-24 15:49:24,589 | INFO: Plugin '/source/tracer/./plugins/jaeger_factory.py' Installed In Package: tracer.plugins (jaeger_factory.py:36) (MainThread)
2021-08-24 15:49:24,602 | INFO: Plugin '/source/mishards/router/./plugins/file_based_hash_ring_router.py' Installed In Package: mishards.router.plugins (file_based_hash_ring_router.py:155) (MainThread)
2021-08-24 15:49:24,603 | DEBUG: Init grpc server with max_workers: 50 (server.py:43) (MainThread)
2021-08-24 15:49:24,605 | INFO: Regiterring <bound method Server.pre_run_handler of <mishards.server.Server object at 0x7f52b93639e8>> into server pre_run_handlers (server.py:65) (MainThread)
2021-08-24 15:49:24,606 | INFO: Milvus server start ...... (server.py:109) (MainThread)
2021-08-24 15:49:24,608 | INFO: Adding group "<TopoGroup: default>" (topology.py:108) (MainThread)
Traceback (most recent call last):
File "mishards/main.py", line 15, in <module>
sys.exit(main())
File "mishards/main.py", line 10, in main
server.run(port=settings.SERVER_PORT)
File "/source/mishards/server.py", line 111, in run
ok = self.on_pre_run()
File "/source/mishards/server.py", line 93, in on_pre_run
handler()
File "/source/mishards/server.py", line 62, in pre_run_handler
group.create(name='WOSERVER', uri='{}://{}:{}'.format(url.scheme, ip, url.port or 80))
File "/source/mishards/connections.py", line 268, in create
pool = Milvus(name=name, **milvus_args)
File "/usr/local/lib/python3.6/site-packages/milvus/client/stub.py", line 98, in __init__
self._pool = SingletonThreadPool(pool_uri, **pool_kwargs)
File "/usr/local/lib/python3.6/site-packages/milvus/client/pool.py", line 226, in __init__
self._prepare()
File "/usr/local/lib/python3.6/site-packages/milvus/client/pool.py", line 241, in _prepare
support_versions))
milvus.client.exceptions.VersionError: Version of python SDK(1.1.1) not match that of server0.10.5, excepted is ('1.1.x',)
Any idea what I'm missing? Thanks!
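For reference, the upgrade above was run from a git checkout; it can also be driven from the published chart repository, pinning the chart version explicitly so chart and image versions move together (repo URL as commonly documented for milvus-helm; verify it for your setup):

```shell
# Sketch: upgrade from the published chart repo at a pinned version.
helm repo add milvus https://milvus-io.github.io/milvus-helm/
helm repo update
helm upgrade <release> milvus/milvus --version 1.1.6 -f values.yaml
```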
When deploying these charts to a Kubernetes 1.19+ cluster, the deployment fails with the following error:
'failed to create resource: Ingress.extensions "milvus" is invalid: spec.rules[0].http.paths[0].pathType: Required value: pathType must be specified'
The ingress templates for Milvus are missing this field. It is required for both networking.k8s.io/v1 and extensions/v1beta1 Ingress objects.
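For reference, a minimal networking.k8s.io/v1 rule with the required field set (host, service name, and port here are placeholders):

```yaml
spec:
  rules:
    - host: milvus.example.com      # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix        # the required field the template is missing
            backend:
              service:
                name: milvus
                port:
                  number: 19530
```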
Milvus 2.0rc is unstable and has been crashing for various reasons, which is understandable. If I redeploy Milvus with a fix, I don't want to recreate every collection all over again; I would just like to connect to the existing PVC, but I don't know where in the helm chart to specify that in my redeployment.
Is there a way for a new deployment to connect to an existing etcd PVC that was created via the helm chart? The default PVC that is created is named "data-milvus-etcd-0".
Your help is greatly appreciated.
Currently working with the 2.1.5 helm chart.
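One approach that may work here, since the etcd PVC comes from a StatefulSet volumeClaimTemplate (a sketch, assuming the PVC survived the previous uninstall — helm does not delete volumeClaimTemplate PVCs on uninstall): reinstall under the same release name so the new etcd-0 pod binds the same claim name.

```shell
kubectl get pvc data-milvus-etcd-0   # confirm the old claim still exists
# Reinstalling with the same release name ("milvus") makes the StatefulSet
# generate the same claim name, so etcd-0 re-binds the existing data.
helm install milvus milvus/milvus -f values.yaml
```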
[root@2-62-godfs-01 milvus-gpu]# kubectl describe pods milvus-milvus-gpu-engine-5bc644dd9f-5kbjd | grep image
Normal Pulling 40s (x4 over 2m51s) kubelet, 192.168.2.63 Pulling image "registry.zilliz.com/milvus/engine:branch-0.5.0-release"
Warning Failed 25s (x4 over 2m32s) kubelet, 192.168.2.63 Failed to pull image "registry.zilliz.com/milvus/engine:branch-0.5.0-release": rpc error: code = Unknown desc = Error response from daemon: Get https://registry.zilliz.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal BackOff 11s (x5 over 2m32s) kubelet, 192.168.2.63 Back-off pulling image "registry.zilliz.com/milvus/engine:branch-0.5.0-release"
[root@2-62-godfs-01 milvus-gpu]#
I guess there is no way to set the config cache.preload_collection when installing Milvus via the helm chart.
Hi guys, I am interested in using Milvus on kubernetes with farm-haystack, and hence I need to use v1.1.0 of Milvus. I have managed to use it with docker-compose, but now I want to use Kubernetes instead of docker for my tasks, and I don't know how to start v1.1.0 in Kubernetes. I tried it once, but it kept giving a CrashLoopBackOff, so I am stuck now, as I can't figure out what I have done wrong. Please guide me if possible; I am completely new to Kubernetes and want to learn all these things. Thank you.
I know it won't support cluster/mishards, but I hope that's not a dealbreaker for now.
milvus-io/milvus#4738 supports logging to stdout as an option.
logs:
  log_to_stdout: false
  log_to_file: true
But the same does not seem to be supported by the helm chart. The Milvus version used is 1.0.0; the helm chart version is 1.1.0.
I've been trying to get Milvus (0.10.2) to run on GPU in a kubernetes installation. It is running on the correct nodepool, but milvus stays in CrashLoopBackOff with the error /var/lib/milvus/bin/milvus_server: error while loading shared libraries: /lib64/libnvidia-ml.so.1: file too short. Here's the values.yaml I'm using:
nodeSelector:
  cloud.google.com/gke-nodepool: ai-gpu2
tolerations:
  - key: "nvidia.com/gpu"
    operator: "Equal"
    value: "present"
    effect: "NoSchedule"
gpu:
  enabled: true
image:
  repository: milvusdb/milvus
  tag: 0.10.2-gpu-d081520-8a2393
  pullPolicy: Always
resources:
  limits:
    nvidia.com/gpu: 1
  requests:
    nvidia.com/gpu: 1
I've tried this with all of the 0.10.x versions with the same outcome.
When upgrading Milvus from 0.9.0 to 0.9.1, kubernetes first starts a new pod with 0.9.1 before stopping 0.9.0 (as the deployment is currently set to RollingUpdate).
However the 0.9.1 pod gets stuck restarting over and over with the following error:
Permission denied. Could not get lock.
Milvus server exit...
deploy_mode: single instance lock wal path failed.
So the upgrade never takes place until someone manually deletes the 0.9.0 pod.
Maybe the deployment strategy should be set to Recreate instead?
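A sketch of the suggested change in the deployment spec (Recreate stops the old pod, releasing the WAL lock, before the new pod starts):

```yaml
spec:
  strategy:
    type: Recreate   # old 0.9.0 pod terminates before 0.9.1 starts
```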
For our k8s setup we are required to add a custom label
labels:
team: our_ai_team
to all team-owned resources.
Unfortunately the values are hardcoded here and here, like this:
{{/* Helm required labels */}}
{{- define "milvus.labels" -}}
helm.sh/chart: {{ include "milvus.chart" . }}
{{ include "milvus.matchLabels" . }}
{{- if or .Chart.AppVersion .Values.image.tag }}
app.kubernetes.io/version: {{ .Values.image.tag | default .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end -}}
{{/* matchLabels */}}
{{- define "milvus.matchLabels" -}}
app.kubernetes.io/name: {{ include "milvus.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end -}}
and then used, e.g., here or here, like this:
  template:
    metadata:
      labels:
{{ include "milvus.matchLabels" . | indent 8 }}
        component: "writable"
      annotations:
so I don't see an easy way of fixing it at the moment.
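One possible fix, sketched against the helper above (.Values.extraLabels is a hypothetical new value, not something the chart currently exposes): merge user-supplied labels into the "milvus.labels" define so every resource picks them up.

```yaml
{{/* Sketch: append user-supplied labels from a hypothetical .Values.extraLabels */}}
{{- define "milvus.labels" -}}
helm.sh/chart: {{ include "milvus.chart" . }}
{{ include "milvus.matchLabels" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- with .Values.extraLabels }}
{{ toYaml . }}
{{- end }}
{{- end -}}
```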
release name is test-standalone-pod-kill-2021-12-31-16-42-49
Pod name is test-standalone-pod-kill-2021-12-31-16-42-49-milvus-standaq8d4r
error log
[2021/12/31 08:56:29.230 +00:00] [DEBUG] [index_coord.go:859] ["IndexCoord find unassigned tasks "] ["Unassigned tasks number"=4] ["Available IndexNode IDs"="[136]"]
[2021/12/31 08:56:29.230 +00:00] [DEBUG] [meta_table.go:212] ["IndexCoord metaTable update UpdateVersion"] [IndexBuildId=430162685561143298]
[2021/12/31 08:56:29.230 +00:00] [DEBUG] [meta_table.go:224] ["IndexCoord metaTable update UpdateVersion"] [IndexBuildId=430162685561143298] [Version=2]
[2021/12/31 08:56:29.232 +00:00] [DEBUG] [index_coord.go:792] ["IndexCoord watchMetaLoop find meta updated."]
[2021/12/31 08:56:29.232 +00:00] [DEBUG] [meta_table.go:102] ["IndexCoord metaTable saveIndexMeta "] [key=indexes/430162685561143298] []
[2021/12/31 08:56:29.232 +00:00] [DEBUG] [meta_table.go:108] ["IndexCoord metaTable saveIndexMeta success"] [meta.revision=3]
[2021/12/31 08:56:29.232 +00:00] [DEBUG] [index_coord.go:867] ["The version of the task has been updated"] [indexBuildID=430162685561143298]
[2021/12/31 08:56:29.232 +00:00] [DEBUG] [node_manager.go:106] ["IndexCoord NodeManager PeekClient"]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x16a17c7]
goroutine 334 [running]:
github.com/milvus-io/milvus/internal/indexcoord.(*NodeManager).PeekClient(0xc0005b0120, 0xc001654360, 0x2, 0x0, 0x0, 0x0)
/go/src/github.com/milvus-io/milvus/internal/indexcoord/node_manager.go:113 +0x107
github.com/milvus-io/milvus/internal/indexcoord.(*IndexCoord).assignTaskLoop(0xc000323700)
/go/src/github.com/milvus-io/milvus/internal/indexcoord/index_coord.go:869 +0x65f
created by github.com/milvus-io/milvus/internal/indexcoord.(*IndexCoord).Start.func1
/go/src/github.com/milvus-io/milvus/internal/indexcoord/index_coord.go:264 +0xe7
The labels we use to manage the milvus dependency charts are different.
Let's consider using a unified label for easier management. @LoveEachDay @Bennu-Li
For example, the milvus release benchmark-no-clean-cs79g-1
, we got:
labels for etcd as:
labels:
app.kubernetes.io/instance: benchmark-no-clean-cs79g-1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: etcd
labels for minio as:
labels:
app: minio
release: benchmark-no-clean-cs79g-1
labels for pulsar as:
labels:
app: pulsar
component: bookkeeper
release: benchmark-no-clean-cs79g-1
When I install milvus with this command:
helm install --set cluster.enabled=true --set cache.insertBufferSize=4GB --set cache.cacheSize=6GB --set metrics.enabled=true --set readonly.cache.insertBufferSize=2GB --set readonly.cache.cacheSize=5GB --set replicas=3 milvus milvus/milvus
the milvus-readonly pod gets stuck in initialization.
In tag milvus-1.1.1, the milvus version is 1.1.0. When will the version in the helm-chart tag match the version of milvus?
Hi,
When I run helm lint on this chart I get:
==> Linting milvus-helm/charts/milvus
[ERROR] templates/hpa.yaml: unable to parse YAML: error converting YAML to JSON: yaml: line 6: did not find expected key
[ERROR] templates/hpa.yaml: unable to parse YAML: error converting YAML to JSON: yaml: line 6: did not find expected key
[ERROR] templates/hpa.yaml: unable to parse YAML: error converting YAML to JSON: yaml: line 6: did not find expected key
[ERROR] templates/hpa.yaml: unable to parse YAML: error converting YAML to JSON: yaml: line 6: did not find expected key
Error: 1 chart(s) linted, 1 chart(s) failed
I also see this helm error when I try to install the chart while overriding the autoscaling values:
helm upgrade --install example milvus/milvus \
--set queryNode.autoscaling.enabled=true \
--set dataNode.autoscaling.enabled=true \
--set indexNode.autoscaling.enabled=true \
--set proxy.autoscaling.enabled=true
Error:
Error: YAML parse error on milvus/templates/hpa.yaml: error converting YAML to JSON: yaml: line 6: did not find expected key
helm.go:88: [debug] error converting YAML to JSON: yaml: line 6: did not find expected key
YAML parse error on milvus/templates/hpa.yaml
helm.sh/helm/v3/pkg/releaseutil.(*manifestFile).sort
helm.sh/helm/v3/pkg/releaseutil/manifest_sorter.go:146
helm.sh/helm/v3/pkg/releaseutil.SortManifests
helm.sh/helm/v3/pkg/releaseutil/manifest_sorter.go:106
helm.sh/helm/v3/pkg/action.(*Configuration).renderResources
helm.sh/helm/v3/pkg/action/action.go:165
helm.sh/helm/v3/pkg/action.(*Install).Run
helm.sh/helm/v3/pkg/action/install.go:247
main.runInstall
helm.sh/helm/v3/cmd/helm/install.go:242
main.newUpgradeCmd.func2
helm.sh/helm/v3/cmd/helm/upgrade.go:115
github.com/spf13/cobra.(*Command).execute
github.com/spf13/[email protected]/command.go:852
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/[email protected]/command.go:960
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/[email protected]/command.go:897
main.main
helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
runtime/proc.go:225
runtime.goexit
runtime/asm_amd64.s:1371
Thanks!
The template should replace .Values.image.resources with .Values.readonly.resources.
Doesn't Milvus standalone just use one?
error info:
Warning FailedScheduling 15h default-scheduler 0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims.
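Some commands that may help narrow down why the claims stay unbound (typically a missing default StorageClass or no matching PV in the cluster):

```shell
kubectl get pvc                       # look for claims stuck in Pending
kubectl describe pvc <pending-claim>  # the Events section names the cause
kubectl get storageclass              # check that a (default) provisioner exists
```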
I deployed milvus-cluster via helm, configured as follows:
➜ ~ kc get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
milvus-cluster-test-admin 1/1 1 1 2d16h
milvus-cluster-test-mishards 2/2 2 2 2d16h
milvus-cluster-test-mysql 1/1 1 1 2d16h
milvus-cluster-test-readonly 3/3 3 3 2d16h
milvus-cluster-test-writable 1/1 1 1 2d16h
nfs-client-nfs-client-provisioner 1/1 1 1 2d16h
But it seems that queries are handled by only one RO instance:
➜ ~ kc top pod
NAME CPU(cores) MEMORY(bytes)
milvus-cluster-test-admin-57d9f68488-l4zfx 1m 7Mi
milvus-cluster-test-mishards-648dd788f6-cc9x2 2m 150Mi
milvus-cluster-test-mishards-648dd788f6-l72fv 1m 347Mi
milvus-cluster-test-mysql-768d99ddd7-tnnzm 7m 207Mi
milvus-cluster-test-readonly-69d4b8765b-9j9t8 2m 8461Mi
milvus-cluster-test-readonly-69d4b8765b-hlvd8 2m 70Mi
milvus-cluster-test-readonly-69d4b8765b-k4mv4 2m 8Mi
milvus-cluster-test-writable-69dfb5c88d-2jl2c 3919m 4523Mi
nfs-client-nfs-client-provisioner-69475b9bfc-wbnkg 4m 10Mi
Only the pod milvus-cluster-test-readonly-69d4b8765b-9j9t8 has logs, and only its CPU and RAM usage increase when queries come in.
Problem: after removal of the chart (e.g. by mistake, or because some breaking clean-up was needed) the persistent volume is gone for good. In our case we use Milvus as the primary storage for our vectors, and re-creating all vectors may take several days, keeping the production system unavailable in the meantime.
The basic reason is that the PVCs do not have "persistentVolumeReclaimPolicy":"Retain" set; it is Delete instead. I am not really a k8s expert, so I am trying to fix it by creating two PVCs (milvus + mysql) and two PVs, linking them to each other and to the milvus values.yaml. I find it really ugly and not user-friendly; maybe there is (or should be) a better way?
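As a stopgap, the reclaim policy of the already-bound PVs can be flipped so that deleting the chart (and its PVCs) no longer deletes the underlying volumes (<pv-name> is a placeholder; list PVs with kubectl get pv):

```shell
# Flip the reclaim policy of a bound PV from Delete to Retain.
kubectl patch pv <pv-name> \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```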
We install milvus through Helm with a local values.yaml file including the following items:
cluster:
  enabled: true
minio:
  enabled: false
pulsarStandalone:
  persistence:
    storageClass: milvus-pulsar
    size: 100Gi
etcd:
  persistence:
    storageClass: milvus-ectd
    size: 20Gi
PVs named milvus-pulsar and milvus-ectd were created manually beforehand. Two PVCs for pulsar and etcd are created by Helm, but there is no storageClassName item in their yaml, like this:
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  volumeMode: Filesystem
  # "storageClassName: milvus-etcd"  <-- normally we expected a storageClassName here
  ...
Is the values.yaml incorrect, or is this an undiscovered bug?
If the consumer hangs for some reason while users keep writing requests into the channel with no response, the pulsar backlog will keep increasing and may cause the user to run out of disk space.
From reading this document https://pulsar.apache.org/docs/zh-CN/cookbooks-retention-expiry/#backlog-quotas
I would suggest configuring the backlog quota policy to producer_exception and letting the produce fail.
Once the consumer keeps processing and catches up, the SDK can resume searching.
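A sketch of the suggested policy via pulsar-admin (the tenant/namespace and the limit are assumptions; check which namespace Milvus actually writes to):

```shell
# Set a backlog quota that makes producers fail instead of filling the disk.
pulsar-admin namespaces set-backlog-quota public/default \
  --limit 10G \
  --policy producer_exception
```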
k8s: v1.17.17 on rancher
root@10-102-35-35:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.17", GitCommit:"f3abc15296f3a3f54e4ee42e830c61047b13895f", GitTreeState:"clean", BuildDate:"2021-01-13T13:21:12Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.17", GitCommit:"f3abc15296f3a3f54e4ee42e830c61047b13895f", GitTreeState:"clean", BuildDate:"2021-01-13T13:13:00Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
root@10-102-35-35:~# helm list
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
my-release default 3 2021-11-04 11:10:39.708777253 +0800 CST deployed milvus-2.3.1 2.0.0-rc.8
root@10-102-35-35:~# helm version
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
version.BuildInfo{Version:"v3.7.0", GitCommit:"eeac83883cb4014fe60267ec6373570374ce770b", GitTreeState:"clean", GoVersion:"go1.16.8"}
root@10-102-35-35:~# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
local-path (default) rancher.io/local-path Delete WaitForFirstConsumer false 4d16h
root@10-102-35-35:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-system cattle-cluster-agent-568889bd6c-62flb 1/1 Running 0 4d16h
cattle-system cattle-node-agent-h725h 1/1 Running 0 3d22h
cattle-system cattle-node-agent-xnzwp 1/1 Running 0 4d16h
cattle-system kube-api-auth-wpk9l 1/1 Running 0 4d16h
default my-release-etcd-0 0/1 CrashLoopBackOff 1108 3d22h
default my-release-etcd-1 1/1 Running 0 3d22h
default my-release-etcd-2 1/1 Running 0 3d22h
default my-release-milvus-datacoord-6765bcffdf-hqknk 1/1 Running 0 4d16h
default my-release-milvus-datanode-699f55794d-5skbk 1/1 Running 0 4d16h
default my-release-milvus-indexcoord-f75f84786-8vm44 1/1 Running 0 3d22h
default my-release-milvus-indexnode-7fdb75f647-nhbkd 1/1 Running 0 3d22h
default my-release-milvus-proxy-6d4d594875-crzb6 1/1 Running 0 4d16h
default my-release-milvus-querycoord-55dd8dbfd5-8qjqp 1/1 Running 0 3d22h
default my-release-milvus-querynode-65c577b69-lhlqf 1/1 Running 0 4d16h
default my-release-milvus-rootcoord-59545d48f7-9xmx5 1/1 Running 4 3d22h
default my-release-minio-0 1/1 Running 0 3d22h
default my-release-minio-1 1/1 Running 0 4d16h
default my-release-minio-2 1/1 Running 0 3d22h
default my-release-minio-3 1/1 Running 0 4d16h
default my-release-pulsar-autorecovery-7cbfd6ccc-mcwlf 1/1 Running 0 3d22h
default my-release-pulsar-bastion-85886c49b7-4hp9t 1/1 Running 0 4d16h
default my-release-pulsar-bookkeeper-0 1/1 Running 0 3d22h
default my-release-pulsar-bookkeeper-1 1/1 Running 1 4d16h
default my-release-pulsar-broker-bcf858d9c-blz9c 1/1 Running 6 4d16h
default my-release-pulsar-proxy-678998cb5f-d88dm 2/2 Running 0 4d16h
default my-release-pulsar-zookeeper-0 1/1 Running 0 3d22h
default my-release-pulsar-zookeeper-1 1/1 Running 0 4d16h
default my-release-pulsar-zookeeper-2 1/1 Running 0 3d22h
kube-system coredns-6b84d75d99-94mwh 1/1 Running 0 3d22h
kube-system coredns-6b84d75d99-b2p77 1/1 Running 0 4d16h
kube-system coredns-autoscaler-5c4b6999d9-zwdvm 1/1 Running 0 3d22h
kube-system kube-flannel-6c795 2/2 Running 0 4d16h
kube-system kube-flannel-8jfkx 2/2 Running 0 3d22h
kube-system metrics-server-7579449c57-t6n7l 1/1 Running 0 4d16h
kube-system rke-coredns-addon-deploy-job-hdc62 0/1 Completed 0 3d22h
kube-system rke-metrics-addon-deploy-job-mmkzs 0/1 Completed 0 4d16h
kube-system rke-network-plugin-deploy-job-wc6mc 0/1 Completed 0 4d16h
local-path-storage local-path-provisioner-85cff57c57-2d9mh 1/1 Running 0 3d22h
root@10-102-35-35:~# kubectl describe pod my-release-etcd-0
Name: my-release-etcd-0
Namespace: default
Priority: 0
Node: 10-102-35-36/10.102.35.36
Start Time: Thu, 04 Nov 2021 11:57:34 +0800
Labels: app.kubernetes.io/instance=my-release
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=etcd
controller-revision-hash=my-release-etcd-666797d59
helm.sh/chart=etcd-6.3.3
statefulset.kubernetes.io/pod-name=my-release-etcd-0
Annotations: <none>
Status: Running
IP: 10.42.1.4
IPs:
IP: 10.42.1.4
Controlled By: StatefulSet/my-release-etcd
Containers:
etcd:
Container ID: docker://824f1259aec2dc36b0837c2bd52be9944604cde6df0463dabdded445c47b9ef7
Image: docker.io/bitnami/etcd:3.5.0-debian-10-r24
Image ID: docker-pullable://bitnami/etcd@sha256:914039ec8f4ba2c53580195bb21f487a1a86f6c3cd7275a1ec451e03c6c52dd1
Ports: 2379/TCP, 2380/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 08 Nov 2021 09:56:43 +0800
Finished: Mon, 08 Nov 2021 09:56:43 +0800
Ready: False
Restart Count: 1107
Liveness: exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
Readiness: exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
Environment:
BITNAMI_DEBUG: false
MY_POD_IP: (v1:status.podIP)
MY_POD_NAME: my-release-etcd-0 (v1:metadata.name)
ETCDCTL_API: 3
ETCD_ON_K8S: yes
ETCD_START_FROM_SNAPSHOT: no
ETCD_DISASTER_RECOVERY: no
ETCD_NAME: $(MY_POD_NAME)
ETCD_DATA_DIR: /bitnami/etcd/data
ETCD_LOG_LEVEL: info
ALLOW_NONE_AUTHENTICATION: yes
ETCD_ADVERTISE_CLIENT_URLS: http://$(MY_POD_NAME).my-release-etcd-headless.default.svc.cluster.local:2379
ETCD_LISTEN_CLIENT_URLS: http://0.0.0.0:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS: http://$(MY_POD_NAME).my-release-etcd-headless.default.svc.cluster.local:2380
ETCD_LISTEN_PEER_URLS: http://0.0.0.0:2380
ETCD_AUTO_COMPACTION_MODE: revision
ETCD_AUTO_COMPACTION_RETENTION: 1000
ETCD_INITIAL_CLUSTER_TOKEN: etcd-cluster-k8s
ETCD_INITIAL_CLUSTER_STATE: existing
ETCD_INITIAL_CLUSTER: my-release-etcd-0=http://my-release-etcd-0.my-release-etcd-headless.default.svc.cluster.local:2380,my-release-etcd-1=http://my-release-etcd-1.my-release-etcd-headless.default.svc.cluster.local:2380,my-release-etcd-2=http://my-release-etcd-2.my-release-etcd-headless.default.svc.cluster.local:2380
ETCD_CLUSTER_DOMAIN: my-release-etcd-headless.default.svc.cluster.local
ETCD_QUOTA_BACKEND_BYTES: 4294967296
Mounts:
/bitnami/etcd from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tfhpq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-my-release-etcd-0
ReadOnly: false
default-token-tfhpq:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-tfhpq
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 115s (x26604 over 3d22h) kubelet, 10-102-35-36 Back-off restarting failed container
root@10-102-35-35:~# kubectl logs my-release-etcd-0
etcd 01:56:43.05
etcd 01:56:43.06 Welcome to the Bitnami etcd container
etcd 01:56:43.06 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-etcd
etcd 01:56:43.06 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-etcd/issues
etcd 01:56:43.07
etcd 01:56:43.07 INFO ==> ** Starting etcd setup **
etcd 01:56:43.09 INFO ==> Validating settings in ETCD_* env vars..
etcd 01:56:43.10 WARN ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 01:56:43.10 INFO ==> Initializing etcd
etcd 01:56:43.12 INFO ==> Detected data from previous deployments
etcd 01:56:43.26 INFO ==> Updating member in existing cluster
{"level":"warn","ts":"2021-11-08T01:56:43.331Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000230000/#initially=[my-release-etcd-2.my-release-etcd-headless.default.svc.cluster.local:2379;my-release-etcd-1.my-release-etcd-headless.default.svc.cluster.local:2379]","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: member not found"}
Error: etcdserver: member not found
Do note this part etcd-endpoints://0xc000230000/#initially=xxxxxx.
There is a pointer-address-like string in the URL part; I think it is a bug related to golang.
And the etcd-0 yaml looks OK to me:
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2021-11-04T03:49:38Z"
generateName: my-release-etcd-
labels:
app.kubernetes.io/instance: my-release
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: etcd
controller-revision-hash: my-release-etcd-666797d59
helm.sh/chart: etcd-6.3.3
statefulset.kubernetes.io/pod-name: my-release-etcd-0
name: my-release-etcd-0
namespace: default
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: StatefulSet
name: my-release-etcd
uid: ea918741-d390-426c-b814-7ccaa66c1245
resourceVersion: "1140611"
selfLink: /api/v1/namespaces/default/pods/my-release-etcd-0
uid: 3e306390-41cc-4850-a292-dfcda8853df3
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/instance: my-release
app.kubernetes.io/name: etcd
namespaces:
- default
topologyKey: kubernetes.io/hostname
weight: 1
containers:
- env:
- name: BITNAMI_DEBUG
value: "false"
- name: MY_POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: MY_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: ETCDCTL_API
value: "3"
- name: ETCD_ON_K8S
value: "yes"
- name: ETCD_START_FROM_SNAPSHOT
value: "no"
- name: ETCD_DISASTER_RECOVERY
value: "no"
- name: ETCD_NAME
value: $(MY_POD_NAME)
- name: ETCD_DATA_DIR
value: /bitnami/etcd/data
- name: ETCD_LOG_LEVEL
value: info
- name: ALLOW_NONE_AUTHENTICATION
value: "yes"
- name: ETCD_ADVERTISE_CLIENT_URLS
value: http://$(MY_POD_NAME).my-release-etcd-headless.default.svc.cluster.local:2379
- name: ETCD_LISTEN_CLIENT_URLS
value: http://0.0.0.0:2379
- name: ETCD_INITIAL_ADVERTISE_PEER_URLS
value: http://$(MY_POD_NAME).my-release-etcd-headless.default.svc.cluster.local:2380
- name: ETCD_LISTEN_PEER_URLS
value: http://0.0.0.0:2380
- name: ETCD_AUTO_COMPACTION_MODE
value: revision
- name: ETCD_AUTO_COMPACTION_RETENTION
value: "1000"
- name: ETCD_INITIAL_CLUSTER_TOKEN
value: etcd-cluster-k8s
- name: ETCD_INITIAL_CLUSTER_STATE
value: existing
- name: ETCD_INITIAL_CLUSTER
value: my-release-etcd-0=http://my-release-etcd-0.my-release-etcd-headless.default.svc.cluster.local:2380,my-release-etcd-1=http://my-release-etcd-1.my-release-etcd-headless.default.svc.cluster.local:2380,my-release-etcd-2=http://my-release-etcd-2.my-release-etcd-headless.default.svc.cluster.local:2380
- name: ETCD_CLUSTER_DOMAIN
value: my-release-etcd-headless.default.svc.cluster.local
- name: ETCD_QUOTA_BACKEND_BYTES
value: "4294967296"
image: docker.io/bitnami/etcd:3.5.0-debian-10-r24
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
exec:
command:
- /opt/bitnami/scripts/etcd/prestop.sh
livenessProbe:
exec:
command:
- /opt/bitnami/scripts/etcd/healthcheck.sh
failureThreshold: 5
initialDelaySeconds: 60
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
name: etcd
ports:
- containerPort: 2379
name: client
protocol: TCP
- containerPort: 2380
name: peer
protocol: TCP
readinessProbe:
exec:
command:
- /opt/bitnami/scripts/etcd/healthcheck.sh
failureThreshold: 5
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources: {}
securityContext:
runAsNonRoot: true
runAsUser: 1001
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /bitnami/etcd
name: data
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-tfhpq
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostname: my-release-etcd-0
nodeName: 10-102-35-36
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1001
serviceAccount: default
serviceAccountName: default
subdomain: my-release-etcd-headless
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: data
persistentVolumeClaim:
claimName: data-my-release-etcd-0
- name: default-token-tfhpq
secret:
defaultMode: 420
secretName: default-token-tfhpq
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2021-11-04T03:57:34Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2021-11-04T03:57:34Z"
message: 'containers with unready status: [etcd]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2021-11-04T03:57:34Z"
message: 'containers with unready status: [etcd]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2021-11-04T03:57:34Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://eabe664e022b6e667209a63f133befe054e944fcab4d834446f591032c5514b1
image: bitnami/etcd:3.5.0-debian-10-r24
imageID: docker-pullable://bitnami/etcd@sha256:914039ec8f4ba2c53580195bb21f487a1a86f6c3cd7275a1ec451e03c6c52dd1
lastState:
terminated:
containerID: docker://eabe664e022b6e667209a63f133befe054e944fcab4d834446f591032c5514b1
exitCode: 1
finishedAt: "2021-11-08T02:01:44Z"
reason: Error
startedAt: "2021-11-08T02:01:44Z"
name: etcd
ready: false
restartCount: 1108
started: false
state:
waiting:
message: back-off 5m0s restarting failed container=etcd pod=my-release-etcd-0_default(3e306390-41cc-4850-a292-dfcda8853df3)
reason: CrashLoopBackOff
hostIP: 10.102.35.36
phase: Running
podIP: 10.42.1.4
podIPs:
- ip: 10.42.1.4
qosClass: BestEffort
startTime: "2021-11-04T03:57:34Z"
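The status above shows the etcd container in CrashLoopBackOff with over a thousand restarts. The previous (failed) run's logs usually contain the actual exit reason; two standard kubectl commands, using the pod name and namespace from the dump above:

```shell
# Logs from the last failed run of the etcd container
kubectl logs my-release-etcd-0 -c etcd --previous -n default

# Events, probe results and restart history for the pod
kubectl describe pod my-release-etcd-0 -n default
```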
version: 0.10.1
cmd:
helm install --wait --timeout 300s --set cluster.enabled=true --set persistence.enabled=true --set image.repository=registry.zilliz.com/milvus/engine --set mishards.image.tag=test --set mishards.image.pullPolicy=Always --set image.tag=0.10.1-gpu-centos7-release --set image.pullPolicy=Always --set service.type=LoadBalancer -f ci/db_backend/mysql_gpu_values.yaml --namespace milvus test1 .
result:
readonly: CPU
writable: GPU
expected:
readonly: GPU
writable: GPU
I found this error when trying to enable the Milvus cluster with Helm.
Below is how the installation goes with its configuration:
helm install \
--set cluster.enabled=true \
--set persistence.enabled=true \
--set mysql.enabled=false \
--set mishards.replica=3 \
--set readonly.replica=3 \
--set externalMysql.enable=true \
--set externalMysql.ip=192.168.99.99 \
--set externalMysql.port=3306 \
--set externalMysql.user=root \
--set externalMysql.password=example \
--set externalMysql.database=db_milvus_cluster \
milvus-release milvus/milvus
After some trial and error, it turns out that the error comes from the --set mysql.enabled=false
configuration. I need to disable the internal mysql pods because I have already set up an external mysql service.
Error message:
Error: YAML parse error on milvus/templates/writable-deployment.yaml: error converting YAML to JSON: yaml: line 43: did not find expected '-' indicator
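When rendering fails like this, helm template can isolate the one manifest that breaks; a sketch reusing some of the overrides from the install command above (--show-only is a Helm 3 flag):

```shell
# Render only the failing template locally, with verbose output
helm template milvus-release milvus/milvus \
  --set cluster.enabled=true \
  --set mysql.enabled=false \
  --set externalMysql.enable=true \
  --show-only templates/writable-deployment.yaml --debug
```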
I'm having issues deploying the chart to multiple namespaces due to the ClusterRole defined in mishards-rbac.yaml
Does this role need to be at the cluster level?
Helm output when trying to deploy to namespace test while milvus is already deployed to namespace dev:
Error: rendered manifests contain a resource that already exists. Unable to continue with install: ClusterRole "pods-list" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-namespace" must equal "test": current value is "dev"
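If listing pods is only needed inside the release's own namespace, a namespaced Role/RoleBinding would avoid the cross-namespace ownership clash entirely; a minimal sketch (the resource name and the service account are illustrative, not the chart's actual templates):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pods-list
  namespace: {{ .Release.Namespace }}
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pods-list
  namespace: {{ .Release.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pods-list
subjects:
  - kind: ServiceAccount
    name: default
    namespace: {{ .Release.Namespace }}
```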
Hi and thanks for this project! Our data scientists have not been this happy in a long time :-)
I think it would be great to include https://hub.docker.com/r/milvusdb/milvus-admin as part of this repo, either as an option to the current chart or as a separate chart. What do you think?
Many thanks
Error message: /var/lib/milvus/bin/milvus_server: error while loading shared libraries: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file too short
Used image tag : "0.10.0-gpu-d061620-5f3c00"
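A "file too short" libnvidia-ml.so.1 often means an empty driver stub was mounted instead of the host's real NVIDIA library, which usually points at the node's NVIDIA container runtime not being configured. That diagnosis is a guess, but it can be checked from inside the pod (the pod name is a placeholder):

```shell
# If the size is 0, a stub was mounted rather than the real driver library
kubectl exec <milvus-pod> -- ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1

# Confirm the node's GPU driver is reachable from inside the container
kubectl exec <milvus-pod> -- nvidia-smi
```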
Helm chart best practices generally recommend not specifying resource requests and limits by default. I think it would be good to follow this recommendation.
Is there a way to add an environment variable, for instance OMP_NUM_THREADS, when starting Milvus?
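I'm not sure the chart currently exposes an extraEnv-style value, but as a workaround an environment variable can be patched onto the deployment after install; the deployment name below is illustrative:

```shell
# Adds OMP_NUM_THREADS to every container in the deployment and triggers a rollout
kubectl set env deployment/milvus-release-milvus OMP_NUM_THREADS=8
```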
I was installing the chart as a dependency and found that the package has milvus-0.10.0.tgz and milvus-0.9.1.tgz files inside. Those packages in turn include milvus-0.10.0.tgz and milvus-0.9.1.tgz recursively. There is also a ci folder containing some extra files.
It looks like the package contains many unintended files.
I recommend checking the CI process and/or revising the .helmignore contents.
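A .helmignore along these lines would keep previously packaged archives and the CI files out of future packages (patterns are a sketch, not the repo's current file):

```
# Exclude earlier packaged charts and CI material from `helm package`
*.tgz
ci/
.git/
```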
milvus.yaml:
dataCoord:
gc:
interval: 3600 # gc interval in seconds
missingTolerance: 86400 # file meta missing tolerance duration in seconds (86400 = 60*60*24)
dropTolerance: 86400 # file belongs to dropped entity tolerance duration in seconds (86400 = 60*60*24)
Hi folks. Is there any way to place toleration and nodeSelector in only one place? The current helm chart creates a lot of deployments, so it's a little hard to change all tolerations and nodeSelectors. Also, there is no pulsar-bastion in the helm config, so by default it has no toleration or nodeSelector that the user can change while installing.
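Until the chart offers global scheduling values, YAML anchors in a custom values file can define the block once and reuse it per component; a sketch assuming components named mishards and readonly accept nodeSelector/tolerations keys (anchor and merge-key support depends on the YAML parser, but Helm's handles them):

```yaml
scheduling: &scheduling
  nodeSelector:
    role: milvus
  tolerations:
    - key: dedicated
      operator: Equal
      value: milvus
      effect: NoSchedule

mishards:
  <<: *scheduling
readonly:
  <<: *scheduling
```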
Add global annotations and labels in values.yaml of milvus-helm.
When deploying a Milvus cluster, we need to add application and applicationinstance to the annotations and labels.
error getting logo image https://raw.githubusercontent.com/milvus-io/docs/master/assets/milvus_logo.png: unexpected status code received: 404 (package: milvus version: 2.0.0)
When using the flag --set nodeSelector={'middlerware: common'} with the helm command, I got the following error:
error validating "milvus.yaml": error validating data: ValidationError(Deployment.spec.template.spec.nodeSelector): invalid type for io.k8s.api.core.v1.PodSpec.nodeSelector: got "array", expected "map"; if you choose to ignore these errors, turn validation off with --validate=false
If I change --set nodeSelector={'middlerware: common'} to --set nodeSelector={'middlerware': 'common'}, I get this error:
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/shenshouer/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/shenshouer/.kube/config
install.go:173: [debug] Original chart version: ""
Error: expected at most two arguments, unexpected arguments: ./milvus/
helm.go:81: [debug] expected at most two arguments, unexpected arguments: ./milvus/
helm.sh/helm/v3/pkg/action.(*Install).NameAndChart
/home/circleci/helm.sh/helm/pkg/action/install.go:547
main.runInstall
/home/circleci/helm.sh/helm/cmd/helm/install.go:179
main.newTemplateCmd.func2
/home/circleci/helm.sh/helm/cmd/helm/template.go:73
github.com/spf13/cobra.(*Command).execute
/go/pkg/mod/github.com/spf13/[email protected]/command.go:850
github.com/spf13/cobra.(*Command).ExecuteC
/go/pkg/mod/github.com/spf13/[email protected]/command.go:958
github.com/spf13/cobra.(*Command).Execute
/go/pkg/mod/github.com/spf13/[email protected]/command.go:895
main.main
/home/circleci/helm.sh/helm/cmd/helm/helm.go:80
runtime.main
/usr/local/go/src/runtime/proc.go:204
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1374
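The brace syntax confuses both the shell and Helm's parser: nodeSelector is a map, so --set wants dotted keys, and the second error ("unexpected arguments") comes from the shell splitting the unquoted braces into extra arguments. Something along these lines should parse (keeping the key spelling from the report):

```shell
helm install milvus-release ./milvus/ \
  --set nodeSelector.middlerware=common

# or, to force string typing:
helm install milvus-release ./milvus/ \
  --set-string 'nodeSelector.middlerware=common'
```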