zilliztech / milvus-helm
License: Apache License 2.0
When installing on a newer Kubernetes version, helm always reports the error: no matches for kind "PodDisruptionBudget" in version "policy/v1beta1".
I found we use the pulsar chart dependency 2.7.8, which doesn't support the newer PDB API version. Maybe we should upgrade the pulsar chart version.
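For reference, this is the shape the rendered manifest needs on Kubernetes 1.25+, where the beta API group was removed (a sketch; the name and selector are hypothetical):

```yaml
apiVersion: policy/v1        # policy/v1beta1 was removed in Kubernetes 1.25
kind: PodDisruptionBudget
metadata:
  name: pulsar-broker-pdb    # hypothetical name
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      component: broker
```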
Support enabling authentication through the --set option
The bitnami etcd chart can assign a nodeSelector: https://github.com/bitnami/charts/blob/main/bitnami/etcd/templates/cronjob.yaml
Hi folks,
I'm trying to enable the ingress with the nginx ingress controller on an AKS cluster but am facing multiple issues, starting with a lack of documentation (which I would happily contribute to if I can solve this).
Chart:
appVersion: 2.3.1
name: milvus
sources:
https://github.com/zilliztech/milvus
version: 4.1.4
Nginx:
appVersion: 1.7.1
name: nginx-ingress-controller
sources:
https://github.com/bitnami/charts/tree/main/bitnami/nginx-ingress-controller
version: 9.7.1
AKS:
Docs consulted:
https://milvus.io/docs/azure.md
https://milvus.io/docs/tls.md#Encryption-in-Transit
https://milvus.io/docs/gcp_layer7.md#Set-up-a-Layer-7-Load-Balancer-for-Milvus-on-GCP
First of all, this instruction is incorrect:
helm upgrade my-release milvus/milvus --set common.security.tlsMode=1
since in the values.yaml file we need to use the following to set variables for milvus.yaml (mounted as a ConfigMap in the application):
extraConfigFiles:
  user.yaml: |+
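A hedged sketch of the complete override (the key path follows milvus.yaml; adjust to your chart version):

```yaml
extraConfigFiles:
  user.yaml: |+
    common:
      security:
        tlsMode: 1
```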
Ingress is configured as requested in the helm values file with:
ingress:
  enabled: true
  annotations:
    # Annotation example: set nginx ingress type
    #kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/listen-ports-ssl: '[19530]'
    nginx.ingress.kubernetes.io/proxy-body-size: 4m
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  labels: {}
  rules:
    - host: "subdomain.example.com"
      path: "/"
      pathType: "Prefix"
    # - host: "milvus-example2.local"
    #   path: "/otherpath"
    #   pathType: "Prefix"
  tls:
    - secretName: subdomain.example.com
      hosts:
        - subdomain.example.com
I would expect this to work, since TLS termination should happen at the nginx level and the ingress-to-backend (milvus proxy) traffic should be unencrypted inside the cluster, i.e. plain gRPC. However, if I follow this doc https://milvus.io/docs/gcp_layer7.md#Set-up-a-Layer-7-Load-Balancer-for-Milvus-on-GCP, I should enable tlsMode=1 for the milvus proxy, which I would expect to require the ingress not to terminate TLS but to forward traffic with a GRPCS annotation, which is not shown in any doc.
However, I have tried it both ways, with and without tlsMode set, without success.
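For the record, the annotation I would expect in the tlsMode=1 case (my assumption, since no doc shows it) is nginx's GRPCS backend protocol:

```yaml
annotations:
  # when tlsMode=1 the milvus proxy terminates TLS itself,
  # so nginx must proxy gRPC over TLS to the upstream
  nginx.ingress.kubernetes.io/backend-protocol: GRPCS
```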
Current error is:
192.168.2.1 - - [16/Oct/2023:00:13:23 +0000] "GET / HTTP/2.0" 502 150 "-" "curl/8.1.2" 36 0.009 [milvusdb-milvusdb-19530] [] IP:19530 0 0.009 502 1b0d807fb0eb67979d2fda9c6406f916
2023/10/16 00:13:23 [error] 2865#2865: *19639942 upstream sent too large http2 frame: 4740180 while reading response header from upstream, client: 192.168.2.1, server: subdomain.example.com, request:
"GET / HTTP/2.0", upstream: "grpc://192.168.3.82:19530", host: "subdomain.example.com"
2023/10/16 00:14:52 [error] 2864#2864: *19641510 upstream sent too large http2 frame: 4740180 while reading response header from upstream, client: IP, server: subdomain.example.com, request
: "GET / HTTP/2.0", upstream: "grpc://192.168.3.88:19530", host: "subdomain.example.com"
IP - - [16/Oct/2023:00:14:52 +0000] "GET / HTTP/2.0" 502 150 "-" "curl/8.1.2" 36 0.003 [milvusdb-milvusdb-19530] [] 192.168.3.88:19530 0 0.002 502 990d7afd06c998beac44e22f1a24c135
2023/10/16 00:15:11 [error] 2865#2865: *19641848 upstream sent too large http2 frame: 4740180 while reading response header from upstream, client: 192.168.2.1, server: subdomain.example.com, request:
"GET / HTTP/2.0", upstream: "grpc://192.168.3.88:19530", host: "subdomain.example.com"
192.168.2.1 - - [16/Oct/2023:00:15:11 +0000] "GET / HTTP/2.0" 502 150 "-" "curl/8.1.2" 36 0.003 [milvusdb-milvusdb-19530] [] 192.168.3.88:19530 0 0.003 502 5903aa3c0173896ccc3e5224a669795c
Has anybody tried to enable encryption in transit on AKS? Is there any doc I can check and correct/extend with anything I'm missing?
Thanks!
Gents,
I don't know why this wasn't requested earlier (maybe there is a justification), but would you be so kind as to adapt the Ingress template to support https://kubernetes-sigs.github.io/aws-load-balancer-controller/?
ALB supports the gRPC backend protocol out of the box: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#target-group-protocol-version
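A sketch of what the requested values could look like with the AWS Load Balancer Controller (annotation names come from its documentation; untested here):

```yaml
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    # tell the target group to speak gRPC to the milvus proxy
    alb.ingress.kubernetes.io/backend-protocol-version: GRPC
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
```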
Hello, I'm trying to install milvus on a k8s cluster using helm:
helm install milvus milvus/milvus --values='/home/siradjedd/airstream/application/k8s/helm/milvus/values/milvus.yml' --namespace milvus
NAME: milvus
LAST DEPLOYED: Fri May 31 09:34:37 2024
NAMESPACE: milvus
STATUS: deployed
REVISION: 1
TEST SUITE: None
But I got problems on some pods saying:
read source yaml failed: error converting YAML to JSON: yaml: line 22: block sequence entries are not allowed in this context
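That parser error usually means a `-` sequence entry sits at an indentation level where a mapping value is expected — for example (a minimal reproduction, not taken from the file below):

```yaml
# broken: the list item is indented as if it were a value of the tech key
nodeSelector:
  tech: milvus
  - key: "milvus"   # -> "block sequence entries are not allowed in this context"

# fixed: the list belongs under its own key, at that key's level
tolerations:
  - key: "milvus"
```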
here is my values.yaml:
## Enable or disable Milvus Cluster mode
cluster:
  enabled: true

image:
  all:
    repository: milvusdb/milvus
    tag: v2.2.13 #v2.2.4
    pullPolicy: IfNotPresent
    ## Optionally specify an array of imagePullSecrets.
    ## Secrets must be manually created in the namespace.
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
    ##
    # pullSecrets:
    #   - myRegistryKeySecretName
  tools:
    repository: milvusdb/milvus-config-tool
    tag: v0.1.1
    pullPolicy: IfNotPresent

# Global node selector
# If set, this will apply to all milvus components
# Individual components can be set to a different node selector
nodeSelector:
  tech: milvus

# Global tolerations
# If set, this will apply to all milvus components
# Individual components can be set to a different tolerations
tolerations:
  - key: "milvus"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"

# Global affinity
# If set, this will apply to all milvus components
# Individual components can be set to a different affinity
affinity: {}

# Global labels and annotations
# If set, this will apply to all milvus components
labels: {}
annotations: {}

# Extra configs for milvus.yaml
# If set, this config will merge into milvus.yaml
# Please follow the config structure in the milvus.yaml
# at https://github.com/milvus-io/milvus/blob/master/configs/milvus.yaml
# Note: this config will be the top priority which will override the config
# in the image and helm chart.
extraConfigFiles:
  user.yaml: |+
    # For example enable rest http for milvus proxy
    # proxy:
    #   http:
    #     enabled: true

## Expose the Milvus service to be accessed from outside the cluster (LoadBalancer service).
## or access it from within the cluster (ClusterIP service). Set the service type and the port to serve it.
## ref: http://kubernetes.io/docs/user-guide/services/
##
service:
  type: LoadBalancer
  port: 19530
  nodePort: ""
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
  labels: {}
  ## List of IP addresses at which the Milvus service is available
  ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
  ##
  externalIPs: []
  #   - externalIp1
  # LoadBalancerSourcesRange is a list of allowed CIDR values, which are combined with ServicePort to
  # set allowed inbound rules on the security group assigned to the master load balancer
  loadBalancerSourceRanges:
    #- 172.254.0.0/16
    - 0.0.0.0/0
  # Optionally assign a known public LB IP
  # loadBalancerIP: 1.2.3.4

ingress:
  enabled: false
  annotations:
    # Annotation example: set nginx ingress type
    # kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/listen-ports-ssl: '[19530]'
    nginx.ingress.kubernetes.io/proxy-body-size: 4m
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  labels: {}
  hosts:
    - milvus-example.local
  tls: []
  # - secretName: chart-example-tls
  #   hosts:
  #     - milvus-example.local

serviceAccount:
  create: false
  name:
  annotations:
  labels:

metrics:
  enabled: true
  serviceMonitor:
    # Set this to `true` to create ServiceMonitor for Prometheus operator
    enabled: false
    interval: "30s"
    scrapeTimeout: "10s"
    # Additional labels that can be used so ServiceMonitor will be discovered by Prometheus
    additionalLabels: {}

livenessProbe:
  enabled: true
  initialDelaySeconds: 90
  periodSeconds: 30
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5

readinessProbe:
  enabled: true
  initialDelaySeconds: 90
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5

log:
  level: "info"
  file:
    maxSize: 300 # MB
    maxAge: 5 # day
    maxBackups: 20
  format: "text" # text/json
  persistence:
    mountPath: "/milvus/logs"
    ## If true, create/use a Persistent Volume Claim
    ## If false, use emptyDir
    ##
    enabled: true
    annotations:
      helm.sh/resource-policy: keep
    persistentVolumeClaim:
      existingClaim: ""
      ## Milvus Logs Persistent Volume Storage Class
      ## If defined, storageClassName: <storageClass>
      ## If set to "-", storageClassName: "", which disables dynamic provisioning
      ## If undefined (the default) or set to null, no storageClassName spec is
      ## set, choosing the default provisioner.
      ## ReadWriteMany access mode required for milvus cluster.
      ##
      storageClass: efs-csi-sc
      accessModes: ReadWriteMany
      size: 10Gi
      subPath: ""

## Heaptrack traces all memory allocations and annotates these events with stack traces.
## See more: https://github.com/KDE/heaptrack
## Enable heaptrack in production is not recommended.
heaptrack:
  image:
    repository: milvusdb/heaptrack
    tag: v0.1.0
    pullPolicy: IfNotPresent

# standalone:
#   replicas: 1  # Run standalone mode with replication disabled
#   resources: {}
#   # Set local storage size in resources
#   # limits:
#   #   ephemeral-storage: 100Gi
#   nodeSelector: {}
#   affinity: {}
#   tolerations: []
#   extraEnv: []
#   heaptrack:
#     enabled: false
#   disk:
#     enabled: true
#     size:
#       enabled: false  # Enable local storage size limit
#   profiling:
#     enabled: false  # Enable live profiling
#   ## Default message queue for milvus standalone
#   ## Supported value: rocksmq, pulsar and kafka
#   messageQueue: rocksmq
#   persistence:
#     mountPath: "/var/lib/milvus"
#     ## If true, alertmanager will create/use a Persistent Volume Claim
#     ## If false, use emptyDir
#     ##
#     enabled: true
#     annotations:
#       helm.sh/resource-policy: keep
#     persistentVolumeClaim:
#       existingClaim: ""
#       ## Milvus Persistent Volume Storage Class
#       ## If defined, storageClassName: <storageClass>
#       ## If set to "-", storageClassName: "", which disables dynamic provisioning
#       ## If undefined (the default) or set to null, no storageClassName spec is
#       ## set, choosing the default provisioner.
#       ##
#       storageClass: efs-csi-sc
#       accessModes: ReadWriteOnce
#       size: 50Gi
#       subPath: ""

proxy:
  enabled: true
  replicas: 1
  resources: {}
  nodeSelector: {}
  affinity: {}
  tolerations: []
  extraEnv: []
  heaptrack:
    enabled: false
  profiling:
    enabled: false  # Enable live profiling
  http:
    enabled: true  # whether to enable http rest server
    debugMode:
      enabled: false

rootCoordinator:
  enabled: true
  # You can set the number of replicas greater than 1, only if enable active standby
  replicas: 1  # Run Root Coordinator mode with replication disabled
  resources: {}
  nodeSelector: {}
  affinity: {}
  tolerations: []
  extraEnv: []
  heaptrack:
    enabled: false
  profiling:
    enabled: true  # Enable live profiling
  activeStandby:
    enabled: false  # Enable active-standby when you set multiple replicas for root coordinator
  service:
    port: 53100
    annotations: {}
    labels: {}
    clusterIP: ""

queryCoordinator:
  enabled: true
  # You can set the number of replicas greater than 1, only if enable active standby
  replicas: 1  # Run Query Coordinator mode with replication disabled
  resources: {}
  nodeSelector: {}
  affinity: {}
  tolerations: []
  extraEnv: []
  heaptrack:
    enabled: false
  profiling:
    enabled: true  # Enable live profiling
  activeStandby:
    enabled: false  # Enable active-standby when you set multiple replicas for query coordinator
  service:
    port: 19531
    annotations: {}
    labels: {}
    clusterIP: ""

queryNode:
  enabled: true
  replicas: 2
  resources: {}
  # Set local storage size in resources
  # limits:
  #   ephemeral-storage: 100Gi
  nodeSelector: {}
  affinity: {}
  tolerations: []
  extraEnv: []
  heaptrack:
    enabled: false
  disk:
    enabled: true  # Enable querynode load disk index, and search on disk index
    size:
      enabled: false  # Enable local storage size limit
  profiling:
    enabled: false  # Enable live profiling

indexCoordinator:
  enabled: true
  # You can set the number of replicas greater than 1, only if enable active standby
  replicas: 1  # Run Index Coordinator mode with replication disabled
  resources: {}
  nodeSelector: {}
  affinity: {}
  tolerations: []
  extraEnv: []
  heaptrack:
    enabled: false
  profiling:
    enabled: false  # Enable live profiling
  activeStandby:
    enabled: false  # Enable active-standby when you set multiple replicas for index coordinator
  service:
    port: 31000
    annotations: {}
    labels: {}
    clusterIP: ""

indexNode:
  enabled: true
  replicas: 2
  resources: {}
  # Set local storage size in resources
  # limits:
  #   ephemeral-storage: 100Gi
  nodeSelector: {}
  affinity: {}
  tolerations: []
  extraEnv: []
  heaptrack:
    enabled: false
  profiling:
    enabled: true  # Enable live profiling
  disk:
    enabled: true  # Enable index node build disk vector index
    size:
      enabled: false  # Enable local storage size limit

dataCoordinator:
  enabled: true
  # You can set the number of replicas greater than 1, only if enable active standby
  replicas: 1  # Run Data Coordinator mode with replication disabled
  resources: {}
  nodeSelector: {}
  affinity: {}
  tolerations: []
  extraEnv: []
  heaptrack:
    enabled: false
  profiling:
    enabled: true  # Enable live profiling
  activeStandby:
    enabled: false  # Enable active-standby when you set multiple replicas for data coordinator
  service:
    port: 13333
    annotations: {}
    labels: {}
    clusterIP: ""

dataNode:
  enabled: true
  replicas: 2
  resources: {}
  nodeSelector: {}
  affinity: {}
  tolerations: []
  extraEnv: []
  heaptrack:
    enabled: false
  profiling:
    enabled: true  # Enable live profiling

## mixCoordinator contains all coord
## If you want to use mixcoord, enable this and disable all of other coords
mixCoordinator:
  enabled: false
  # You can set the number of replicas greater than 1, only if enable active standby
  replicas: 1  # Run Mixture Coordinator mode with replication disabled
  resources: {}
  nodeSelector: {}
  affinity: {}
  tolerations: []
  extraEnv: []
  heaptrack:
    enabled: false
  profiling:
    enabled: false  # Enable live profiling
  activeStandby:
    enabled: false  # Enable active-standby when you set multiple replicas for Mixture coordinator
  service:
    annotations: {}
    labels: {}
    clusterIP: ""

attu:
  enabled: true
  name: attu
  image:
    repository: zilliz/attu
    tag: v2.2.3
    pullPolicy: IfNotPresent
  service:
    type: ClusterIP
    port: 3000
  resources: {}
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      cert-manager.io/cluster-issuer: letsencrypt-prod
      # Annotation example: set nginx ingress type
      # kubernetes.io/ingress.class: nginx
    labels: {}
    hosts:
      # - milvus.padasiradjme.actops.io
      - milvus.padasiradjmeplus.com
    tls:
      - secretName: milvus-tls
        hosts:
          #- milvus.padasiradjme.actops.io
          - milvus.padasiradjmeplus.com

## Configuration values for the minio dependency
## ref: https://github.com/minio/charts/blob/master/README.md
##
minio:
  enabled: false

## Configuration values for the etcd dependency
## ref: https://artifacthub.io/packages/helm/bitnami/etcd
##
etcd:
  enabled: true
  name: etcd
  replicaCount: 3
  pdb:
    create: false
  image:
    repository: "milvusdb/etcd"
    tag: "3.5.5-r2"
    pullPolicy: IfNotPresent
  service:
    type: ClusterIP
    port: 2379
    peerPort: 2380
  auth:
    rbac:
      enabled: false
  persistence:
    enabled: true
    storageClass: efs-csi-sc
    accessMode: ReadWriteOnce
    size: 10Gi
  ## Enable auto compaction
  ## compaction by every 1000 revision
  ##
  autoCompactionMode: revision
  autoCompactionRetention: "1000"
  nodeSelector:
    tech: milvus
  tolerations:
    - key: "milvus"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  ## Increase default quota to 4G
  ##
  extraEnvVars:
    - name: ETCD_QUOTA_BACKEND_BYTES
      value: "4294967296"
    - name: ETCD_HEARTBEAT_INTERVAL
      value: "500"
    - name: ETCD_ELECTION_TIMEOUT
      value: "2500"

## Configuration values for the pulsar dependency
## ref: https://github.com/apache/pulsar-helm-chart
##
pulsar:
  enabled: false

kafka:
  enabled: true
  name: kafka
  replicaCount: 3
  nodeSelector:
    tech: milvus
  tolerations:
    - key: "milvus"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  image:
    repository: bitnami/kafka
    tag: 3.1.0-debian-10-r52
  ## Increase graceful termination for kafka graceful shutdown
  terminationGracePeriodSeconds: "90"
  pdb:
    create: false
  ## Enable startup probe to prevent pod restart during recovering
  startupProbe:
    enabled: true
  ## Kafka Java Heap size
  heapOpts: "-Xmx4096m -Xms4096m"
  maxMessageBytes: _10485760
  defaultReplicationFactor: 3
  offsetsTopicReplicationFactor: 3
  ## Only enable time based log retention
  logRetentionHours: 168
  logRetentionBytes: _-1
  extraEnvVars:
    - name: KAFKA_CFG_MAX_PARTITION_FETCH_BYTES
      value: "5242880"
    - name: KAFKA_CFG_MAX_REQUEST_SIZE
      value: "5242880"
    - name: KAFKA_CFG_REPLICA_FETCH_MAX_BYTES
      value: "10485760"
    - name: KAFKA_CFG_FETCH_MESSAGE_MAX_BYTES
      value: "5242880"
    - name: KAFKA_CFG_LOG_ROLL_HOURS
      value: "24"
  persistence:
    enabled: true
    storageClass: efs-csi-sc
    accessMode: ReadWriteOnce
    size: 100Gi
  metrics:
    ## Prometheus Kafka exporter: exposes complimentary metrics to JMX exporter
    kafka:
      enabled: false
    ## Prometheus JMX exporter: exposes the majority of Kafkas metrics
    jmx:
      enabled: false
    ## To enable serviceMonitor, you must enable either kafka exporter or jmx exporter.
    ## And you can enable them both
    serviceMonitor:
      enabled: false
  service:
    type: ClusterIP
    ports:
      client: 9092
  zookeeper:
    enabled: true
    replicaCount: 3
    nodeSelector:
      tech: milvus
    tolerations:
      - key: "milvus"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"

## Configuration values for the mysql dependency
## ref: https://artifacthub.io/packages/helm/bitnami/mysql
##
## MySQL used for meta store is testing internally
mysql:
  enabled: false

###################################
# External S3
# - these configs are only used when `externalS3.enabled` is true
###################################
externalS3:
  enabled: true
  host: "s3.eu-west-3.amazonaws.com"
  port: "80"
  accessKey: "-"
  secretKey: "-"
  useSSL: false
  bucketName: "milvus-match-video-objects-bucket-prod"
  rootPath: ""
  useIAM: false
  iamEndpoint: ""
Can two different versions of the milvus helm chart be installed in the same k8s cluster to create two clusters?
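Presumably yes, as long as the two releases use different release names (and ideally separate namespaces) — a sketch with hypothetical release and namespace names:

```shell
# two independent Milvus clusters from different chart versions,
# isolated by release name and namespace
helm install milvus-a milvus/milvus --version 4.0.31 -n team-a --create-namespace
helm install milvus-b milvus/milvus --version 4.1.4  -n team-b --create-namespace
```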
Azure added Azure Workload Identities (see https://github.com/Azure/azure-workload-identity and https://azure.github.io/azure-workload-identity/docs/) to AKS. The functionality is stable and is the recommended approach for production situations.
AWI is essentially the equivalent of AWS's IAM Roles for Service Accounts and works the same way: your cluster becomes an OIDC identity provider, and a specific service account in a specific namespace can be designated as a federated principal to which Azure IAM roles can be attached. This is significantly more secure than using credentials (StorageAccountName + secret).
Can you please make minio support this?
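A sketch of what the values could look like once supported (the service-account annotation and pod label come from the azure-workload-identity docs; the client ID is a placeholder and the `externalS3` keys here are assumptions):

```yaml
serviceAccount:
  create: true
  annotations:
    # client ID of the Azure managed identity (placeholder value)
    azure.workload.identity/client-id: "00000000-0000-0000-0000-000000000000"
labels:
  # opt the milvus pods into the workload-identity mutating webhook
  azure.workload.identity/use: "true"
externalS3:
  enabled: true
  cloudProvider: azure
  useIAM: true   # assumed switch: federated credentials instead of an account key
```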
I need to deploy milvus on a kubernetes cluster that is run through Rancher. To do that I need to modify the security context both for the pod and the containers in the following way (below is an example manifest):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      securityContext:  # Pod security context
        fsGroupChangePolicy: OnRootMismatch
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      containers:
        - image: ubuntu
          name: example
          securityContext:  # Container security context
            runAsUser: 1000
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
However, I am not exactly sure how I should modify the sections of the default values.yaml, or which sections require these modifications. Any tips on that would be greatly appreciated.
Thank you very much in advance!
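For the Bitnami subcharts such settings are already exposed; a sketch for etcd (key names follow the bitnami/etcd chart and may differ across chart versions — the Milvus components themselves may not expose equivalent keys):

```yaml
etcd:
  podSecurityContext:
    enabled: true
    fsGroup: 1001
    fsGroupChangePolicy: OnRootMismatch
    seccompProfile:
      type: RuntimeDefault
  containerSecurityContext:
    enabled: true
    runAsUser: 1001
    runAsNonRoot: true
    allowPrivilegeEscalation: false
    capabilities:
      drop: ["ALL"]
```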
Hello Milvus team,
I wanted to bring to your attention that the helm chart in the helm repo has been outdated since the release of version 2.3. For example, the latest version available in the repo is 4.0.31; the 4.1.1 chart version was released yesterday, but it cannot be deployed using the remote repo.
Could you please investigate and provide an explanation for this issue?
Thank you.
externalS3.port is not used no matter what value is configured.
externalS3.host has to contain both hostname and port, or else the pod throws an exception if minio is deployed in the same namespace.
datacoord error log:
[2024/06/26 06:34:52.708 +00:00] [ERROR] [datacoord/server.go:548] ["chunk manager init failed"] [error="Endpoint url cannot have fully qualified paths."] [stack="github.com/milvus-io/milvus/internal/datacoord.(*Server).newChunkManagerFactory\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:548\ngithub.com/milvus-io/milvus/internal/datacoord.(*Server).initDataCoord\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:348\ngithub.com/milvus-io/milvus/internal/datacoord.(*Server).Init\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:334\ngithub.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).init\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/service.go:129\ngithub.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).Run\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/service.go:256\ngithub.com/milvus-io/milvus/cmd/components.(*DataCoord).Run\n\t/go/src/github.com/milvus-io/milvus/cmd/components/data_coord.go:52\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:113"] [2024/06/26 06:34:52.708 +00:00] [ERROR] [datacoord/service.go:130] ["dataCoord init error"] [error="Endpoint url cannot have fully qualified paths."] [stack="github.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).init\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/service.go:130\ngithub.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).Run\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/service.go:256\ngithub.com/milvus-io/milvus/cmd/components.(*DataCoord).Run\n\t/go/src/github.com/milvus-io/milvus/cmd/components/data_coord.go:52\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:113"] [2024/06/26 06:34:52.708 +00:00] [ERROR] [components/data_coord.go:53] ["DataCoord starts 
error"] [error="Endpoint url cannot have fully qualified paths."] [stack="github.com/milvus-io/milvus/cmd/components.(*DataCoord).Run\n\t/go/src/github.com/milvus-io/milvus/cmd/components/data_coord.go:53\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:113"] panic: Endpoint url cannot have fully qualified paths.
configma/milvus/default.yaml
minio:
  address: minio
  port: 9000
  accessKeyID: admin
  secretAccessKey: admin.password
  useSSL: false
  bucketName: milvus
  rootPath: /
  useIAM: false
  cloudProvider: minio
  iamEndpoint:
  region:
  useVirtualHost: false
pod/configs/milvus.yaml
minio:
  accessKeyID: admin
  address: minio
  bucketName: milvus
  cloudProvider: minio
  iamEndpoint: null
  listObjectsMaxKeys: 0
  logLevel: fatal
  port: 9000
  region: null
  requestTimeoutMs: 10000
  rootPath: /
  secretAccessKey: admin.password
  ssl:
    tlsCACert: /path/to/public.crt
  useIAM: false
  useSSL: false
  useVirtualHost: false
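Until externalS3.port is honored, a workaround consistent with the report above would be to fold the port into the host value (a sketch; the in-cluster minio hostname is hypothetical):

```yaml
externalS3:
  enabled: true
  # workaround: embed the port in the host, since externalS3.port is ignored
  host: "minio.milvus.svc.cluster.local:9000"
  useSSL: false
```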
See detailed discussion in https://milvusio.slack.com/archives/CMNHUC371/p1695943308429629
Basically, for a K8s deployment, users need to be able to specify a mountPath inside the proxy pod so they can use their own cert and key files to enable TLS between the client and the service.
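A sketch of the shape such values could take (hypothetical keys — the point of the request is that the chart does not expose them yet):

```yaml
proxy:
  # hypothetical keys, not in the current chart
  extraVolumes:
    - name: milvus-tls
      secret:
        secretName: milvus-tls
  extraVolumeMounts:
    - name: milvus-tls
      mountPath: /milvus/tls
      readOnly: true
```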
I looked through some deployment documents and did not find any place to configure etcd SSL certificates. There is only a bit of configuration below. Does this support etcd cluster certificates?
externalEtcd:
  enabled: true
  ## the endpoints of the external etcd
  ##
  endpoints:
    - 10.0.0.9:2379
  rootPath: milvus
milvus_version: 2.2.13
mode: standalone
Installed through helm in minikube and received the following error:
grpc_message:"grpc: received message larger than max (363335405 vs. 67108864)", grpc_status:8
and there are no configuration parameters for grpc max size in the values.yaml file when installing through helm.
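Since values.yaml has no dedicated key, the gRPC limits can presumably be raised through `extraConfigFiles` (key names follow milvus.yaml's `grpc` section; the sizes are illustrative, and whether the values are bytes or MB depends on the Milvus version):

```yaml
extraConfigFiles:
  user.yaml: |
    grpc:
      serverMaxRecvSize: 536870912   # default limit in the error above is 67108864
      serverMaxSendSize: 536870912
      clientMaxRecvSize: 536870912
      clientMaxSendSize: 536870912
```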
Hello,
I'm using milvus db in k8s as standalone. I have tls.crt and tls.key for my ingress DNS name and mounted them into the standalone pod via secretName: milvus-tls; the CA cert is also added to the standalone pod in /etc/ssl/certs. The certs are valid. Milvus TLS config:
Python 3.10.12
protobuf 3.20.0
milvus-4.1.17
grpcio-tools 1.53.0
Milvus cli version: 0.4.2
Pymilvus version: 2.3.4
extraConfigFiles:
  user.yaml: |
    tls:
      serverPemPath: /tmp/tls.crt
      serverKeyPath: /tmp/tls.key
    common:
      security:
        tlsMode: 1
Ingress by default:
nginx.ingress.kubernetes.io/backend-protocol: GRPC
nginx.ingress.kubernetes.io/listen-ports-ssl: '[19530]'
nginx.ingress.kubernetes.io/proxy-body-size: 4m
nginx.ingress.kubernetes.io/ssl-redirect: "true"
rules:
  - host: k8s-milvus.example.com
    http:
      paths:
        - backend:
            service:
              name: my-release-milvus
              port:
                number: 19530
          path: /
          pathType: Prefix
tls:
  - hosts:
      - k8s-milvus.example.com
    secretName: milvus-tls
I get 502 in the browser, and Handshake failed with fatal error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED when trying to connect via a python script like connections.connect("default", host="k8s-milvus.example.com", port="443", secure=True, server_pem_path="/home/testuser/milvus/ca.pem")
What I tried:
If I disable TLS on milvus, drop the ingress line nginx.ingress.kubernetes.io/backend-protocol: GRPC and keep TLS on the ingress, I get 404 in the browser (that's good) and CERTIFICATE_VERIFY_FAILED via the script.
If I connect via port 80 without milvus-tls, I get Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
I tried to mix params like server_pem_path, ca_pem_path, client_pem_path, etc.
Without milvus-tls in minikube, port-forwarding to the standalone pod connects fine. Through the ingress it doesn't work even with the simple/default milvus ingress. Maybe that's the main problem.
All pods are running without errors in their logs. How can I connect to the milvus db via a python script? How do I fix the SSL error? I can't disable TLS on the ingress, but I can disable it on milvus, if a config of Ingress TLS + Milvus no TLS is possible.
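When debugging CERTIFICATE_VERIFY_FAILED behind an ingress, it helps to first check which certificate nginx actually presents for the host (standard openssl usage; the hostname is the one from this setup):

```shell
# show subject/issuer/validity of the cert served for the milvus host via SNI
openssl s_client -connect k8s-milvus.example.com:443 \
  -servername k8s-milvus.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```

If the subject or issuer is not the one in your milvus-tls secret (e.g. it is the ingress controller's default fake certificate), the ingress TLS block is not matching the host.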
Using milvus helm 4.1.9, the etcd image is 3.5.5-r2.
In this image, /opt/bitnami/scripts/etcd/snapshot.sh uses /opt/bitnami/scripts/libetcd.sh, which defines:
etcdctl_get_endpoints() {
    echo "$ETCD_INITIAL_CLUSTER" | sed 's/^[^=]+=http/http/g' | sed 's/,[^=]+=/,/g'
}
I need to add the env ETCD_INITIAL_CLUSTER to the cronjob; without this env, it shows the error "all etcd endpoints are unhealthy!".
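A sketch of the workaround via the chart's `extraEnvVars`, assuming the snapshot cronjob inherits them (the value is for a hypothetical 3-member cluster from a release named "my-release" in namespace "milvus"; adjust names to your deployment):

```yaml
etcd:
  extraEnvVars:
    - name: ETCD_INITIAL_CLUSTER
      value: "my-release-etcd-0=http://my-release-etcd-0.my-release-etcd-headless.milvus.svc.cluster.local:2380,my-release-etcd-1=http://my-release-etcd-1.my-release-etcd-headless.milvus.svc.cluster.local:2380,my-release-etcd-2=http://my-release-etcd-2.my-release-etcd-headless.milvus.svc.cluster.local:2380"
```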
In the etcd:3.5.5-debian-11-r23 image, /opt/bitnami/scripts/libetcd.sh has:
etcdctl_get_endpoints() {
    local only_others=${1:-false}
    local -a endpoints=()
    local host domain port

    ip_has_valid_hostname() {
        local ip="${1:?ip is required}"
        local parent_domain="${1:?parent_domain is required}"

        # 'getent hosts $ip' can return hostnames in 2 different formats:
        #   POD_NAME.HEADLESS_SVC_DOMAIN.NAMESPACE.svc.cluster.local (using headless service domain)
        #   10-237-136-79.SVC_DOMAIN.NAMESPACE.svc.cluster.local (using POD's IP and service domain)
        # We need to discard the latter to avoid issues when TLS verification is enabled.
        [[ "$(getent hosts "$ip")" = *"$parent_domain"* ]] && return 0
        return 1
    }

    hostname_has_ips() {
        local hostname="${1:?hostname is required}"
        [[ "$(getent ahosts "$hostname")" != "" ]] && return 0
        return 1
    }

    # This piece of code assumes this code is executed on a K8s environment
    # where etcd members are part of a statefulset that uses a headless service
    # to create a unique FQDN per member. Under these circumstances, the
    # ETCD_ADVERTISE_CLIENT_URLS env. variable is created as follows:
    #   SCHEME://POD_NAME.HEADLESS_SVC_DOMAIN:CLIENT_PORT,SCHEME://SVC_DOMAIN:SVC_CLIENT_PORT
    #
    # Assuming this, we can extract the HEADLESS_SVC_DOMAIN and obtain
    # every available endpoint
    read -r -a advertised_array <<<"$(tr ',;' ' ' <<<"$ETCD_ADVERTISE_CLIENT_URLS")"
    host="$(parse_uri "${advertised_array[0]}" "host")"
    port="$(parse_uri "${advertised_array[0]}" "port")"
    domain="${host#"${ETCD_NAME}."}"
    # When ETCD_CLUSTER_DOMAIN is set, we use that value instead of extracting
    # it from ETCD_ADVERTISE_CLIENT_URLS
    ! is_empty_value "$ETCD_CLUSTER_DOMAIN" && domain="$ETCD_CLUSTER_DOMAIN"
    # Depending on the K8s distro & the DNS plugin, it might need
    # a few seconds to associate the POD(s) IP(s) to the headless svc domain
    if retry_while "hostname_has_ips $domain"; then
        local -r ahosts="$(getent ahosts "$domain" | awk '{print $1}' | uniq | wc -l)"
        for i in $(seq 0 $((ahosts - 1))); do
            # We use the StatefulSet name stored in MY_STS_NAME to get the peer names
            # based on the number of IPs registered in the headless service
            pod_name="${MY_STS_NAME}-${i}"
            if ! { [[ $only_others = true ]] && [[ "$pod_name" = "$MY_POD_NAME" ]]; }; then
                endpoints+=("${pod_name}.${ETCD_CLUSTER_DOMAIN}:${port:-2380}")
            fi
        done
    fi
    echo "${endpoints[*]}" | tr ' ' ','
}
The bitnami helm template has the envs ETCD_CLUSTER_DOMAIN and MY_STS_NAME, so there the snapshot runs successfully.
I think this is the problem.
For security reasons, we use Kyverno's admission controller on our cluster to ensure that certain Linux capabilities are dropped and that containers run as non-root, along with other policies. While we can change the security contexts of the components using the Bitnami Helm charts (etcd, Kafka, etc.), we are unable to do this for MinIO.
In addition, in order to improve resiliency, we would like to be able to set Pod Topology Spread Constraints for the same components.
This is a feature request to expose these in the MinIO Helm chart.
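The requested surface might look like this in values.yaml (hypothetical keys mirroring the Bitnami conventions; the point of the request is that the bundled MinIO chart lacks them):

```yaml
minio:
  # hypothetical keys — not currently exposed by the bundled chart
  podSecurityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containerSecurityContext:
    runAsUser: 1000
    allowPrivilegeEscalation: false
    capabilities:
      drop: ["ALL"]
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: minio
```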
Related to zilliztech/milvus-operator#144
I want to ask whether there is any update or expected date for the milvus helm release for milvus-2.3.17 or milvus-2.3.18.
`[root@master containers]# kubectl describe pod my-milvus-zookeeper-0
Name: my-milvus-zookeeper-0
Namespace: default
Priority: 0
Node: master/192.168.6.242
Start Time: Fri, 17 May 2024 10:24:16 +0800
Labels: app.kubernetes.io/component=zookeeper
app.kubernetes.io/instance=my-milvus
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=zookeeper
controller-revision-hash=my-milvus-zookeeper-76fd4b8cf7
helm.sh/chart=zookeeper-8.1.2
statefulset.kubernetes.io/pod-name=my-milvus-zookeeper-0
Annotations: cni.projectcalico.org/containerID: 3f63624af437de0a6227b24d364ac49377704c9e36a93251b81d0b748babcad4
cni.projectcalico.org/podIP: 10.244.219.115/32
cni.projectcalico.org/podIPs: 10.244.219.115/32
Status: Running
IP: 10.244.219.115
IPs:
IP: 10.244.219.115
Controlled By: StatefulSet/my-milvus-zookeeper
Containers:
zookeeper:
Container ID: docker://a8f4be73fc54aabf0265a8a181c29a36f4fe4badcb1b949dd30c41b464603293
Image: docker.io/bitnami/zookeeper:3.7.0-debian-10-r320
Image ID: docker-pullable://bitnami/zookeeper@sha256:c19c5473ef3feb8a0db00b92891c859915d06f7b888be4b3fdb78aaca109cd1f
Ports: 2181/TCP, 2888/TCP, 3888/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Command:
/scripts/setup.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 17 May 2024 14:50:32 +0800
Finished: Fri, 17 May 2024 14:50:32 +0800
Ready: False
Restart Count: 57
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 250m
memory: 256Mi
Liveness: exec [/bin/bash -c echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok] delay=30s timeout=5s period=10s #success=1 #failure=6
Readiness: exec [/bin/bash -c echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok] delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
BITNAMI_DEBUG: false
ZOO_DATA_LOG_DIR:
ZOO_PORT_NUMBER: 2181
ZOO_TICK_TIME: 2000
ZOO_INIT_LIMIT: 10
ZOO_SYNC_LIMIT: 5
ZOO_PRE_ALLOC_SIZE: 65536
ZOO_SNAPCOUNT: 100000
ZOO_MAX_CLIENT_CNXNS: 60
ZOO_4LW_COMMANDS_WHITELIST: srvr, mntr, ruok
ZOO_LISTEN_ALLIPS_ENABLED: no
ZOO_AUTOPURGE_INTERVAL: 0
ZOO_AUTOPURGE_RETAIN_COUNT: 3
ZOO_MAX_SESSION_TIMEOUT: 40000
ZOO_SERVERS: my-milvus-zookeeper-0.my-milvus-zookeeper-headless.default.svc.cluster.local:2888:3888::1 my-milvus-zookeeper-1.my-milvus-zookeeper-headless.default.svc.cluster.local:2888:3888::2 my-milvus-zookeeper-2.my-milvus-zookeeper-headless.default.svc.cluster.local:2888:3888::3
ZOO_ENABLE_AUTH: no
ZOO_HEAP_SIZE: 1024
ZOO_LOG_LEVEL: ERROR
ALLOW_ANONYMOUS_LOGIN: yes
POD_NAME: my-milvus-zookeeper-0 (v1:metadata.name)
Mounts:
/bitnami/zookeeper from data (rw)
/scripts/setup.sh from scripts (rw,path="setup.sh")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sbjvl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-my-milvus-zookeeper-0
ReadOnly: false
scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: my-milvus-zookeeper-scripts
Optional: false
kube-api-access-sbjvl:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
Normal Pulled 39m (x51 over 4h29m) kubelet Container image "docker.io/bitnami/zookeeper:3.7.0-debian-10-r320" already present on machine
Warning BackOff 4m16s (x1319 over 4h29m) kubelet Back-off restarting failed container`
While deploying Milvus on Azure, we want to use Azure Blob Storage as the remote object storage; however, it doesn't support the S3 protocol. To distinguish Azure Blob Storage from other object storages that do support the S3 protocol, we use the configuration items common.storageType and minio.cloudProvider.
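A minimal values sketch for this setup might look like the following. This is a sketch only: the key names assume a recent milvus-helm chart (externalS3.cloudProvider appears in newer versions), and the bucket name is a placeholder, so verify against your chart's values.yaml:

```yaml
# Sketch only: verify key names against your chart version's values.yaml
minio:
  enabled: false            # disable the bundled MinIO
externalS3:
  enabled: true
  cloudProvider: azure      # tells Milvus to use the Blob Storage API instead of S3
  host: core.windows.net
  bucketName: my-container  # placeholder: your Blob Storage container name
```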
Milvus Version: 2.3.1
Executing the operations below on a Milvus client using the Java client:
1. Created a collection with 4 fields
2. Inserted 10 elements/entries into the collection created in step 1
3. Created an index
4. Loaded the collection
The Load Collection API gets stuck and finishes after some time, but the load is not actually performed. In the Attu UI it initially displayed 50%; after the Load API finished executing, the collection is shown as unloaded.
Attaching the Attu UI screenshot and the queryNode logs for reference:
milvus-querynodelogs-loadAPI.txt
Many providers and Kubernetes flavors require resources to satisfy the Restricted
Pod Security Standard.
This includes setting seccompProfile to RuntimeDefault or Localhost.
Running containers with such restrictions effectively reduces security risks.
It would be great if the Milvus Helm chart supported these restrictions, either by setting the securityContext by default or by providing an option to modify the securityContext via Helm values.
For the stated setting to be compatible, all containers must run as a non-root user. Currently, this is not the case, e.g. for Milvus itself: milvus-io/milvus #25565
Providing securityContext configuration options for all components via Helm values would address this.
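As a sketch of what such an option could look like in the values file — note the securityContext key shown here is hypothetical, not an existing chart value:

```yaml
# Hypothetical values sketch for a Restricted-compliant container securityContext
standalone:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    allowPrivilegeEscalation: false
    capabilities:
      drop: ["ALL"]
    seccompProfile:
      type: RuntimeDefault
```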
The Kubernetes documentation notes: "A container using a ConfigMap as a subPath volume mount will not receive ConfigMap updates."
And the Helm chart uses subPath:
- name: milvus-config
mountPath: /milvus/configs/user.yaml
subPath: user.yaml
readOnly: true
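A commonly suggested workaround (sketch only) is to mount the ConfigMap as a directory without subPath, so that the kubelet can propagate updates; note that this would require Milvus to read user.yaml from that directory (the mount path below is hypothetical) and would shadow any other files at the mount path:

```yaml
# Sketch: directory mount instead of a subPath file mount, so updates propagate
- name: milvus-config
  mountPath: /milvus/configs/user-config   # hypothetical directory path
  readOnly: true
```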
I have a Milvus standalone deployment, installed through the Milvus Helm chart on a Kubernetes cluster with an S3 backend.
The Milvus pod throws an error when attempting to load an index:
[2023/08/28 20:15:12.506 +00:00] [INFO] [querycoordv2/services.go:77] ["show collections request received"] [msgID=0] [collections="[443370269700382014]"]
[2023/08/28 20:15:12.506 +00:00] [INFO] [proxy/impl.go:2016] ["GetLoadingProgress done"] [traceID=5c81b4b44103a701] [request="base:<msg_type:ShowCollections sourceID:280 > collection_name:\"luke_entities\" db_name:\"default\" "]
2023-08-28 20:15:12,564 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-08-28 20:15:12.564 CurlHttpClient [140134785742592] Curl returned error code 3 - URL using bad/illegal format or missing URL
2023-08-28 20:15:12,565 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-08-28 20:15:12.565 AWSClient [140134785742592] HTTP response code: -1
Resolved remote host IP address:
Request ID:
Exception name:
Error message: curlCode: 3, URL using bad/illegal format or missing URL
0 response headers:
2023-08-28 20:15:12,565 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-08-28 20:15:12.565 CurlHttpClient [140134405990144] Curl returned error code 3 - URL using bad/illegal format or missing URL
My Milvus configuration:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: milvus
namespace: argocd
spec:
project: default
source:
chart: milvus
repoURL: https://zilliztech.github.io/milvus-helm/
targetRevision: 4.0.34
helm:
parameters:
- name: serviceAccount.create
value: "true"
- name: serviceAccount.name
value: "milvus-sa"
- name: serviceAccount.annotations.eks\.amazonaws\.com/role-arn
value: REDACTED
- name: image.pullSecrets
value: "regcred"
- name: cluster.enabled
value: "false"
- name: minio.enabled
value: "false"
- name: pulsar.enabled
value: "false"
- name: etcd.replicaCount
value: "3"
- name: externalS3.enabled
value: "true"
- name: externalS3.bucketName
value: REDACTED
- name: externalS3.rootPath
value: "milvus"
- name: externalS3.host
value: "s3.us-east-1.amazonaws.com"
- name: externalS3.useSSL
value: "true"
- name: externalS3.useSSL
value: "true"
- name: externalS3.useIAM
value: "true"
- name: standalone.persistence.persistentVolumeClaim.existingClaim
value: "milvus-pvc-a062923c-e99b-4cce-ab24-f091da914d26"
- name: standalone.resources.requests.cpu
value: "4"
- name: standalone.resources.requests.memory
value: "48Gi"
- name: standalone.resources.limits.cpu
value: "6"
- name: standalone.resources.limits.memory
value: "64Gi"
- name: etcd.resources.requests.cpu
value: "2"
- name: etcd.resources.limits.cpu
value: "4"
- name: etcd.resources.requests.memory
value: "16Gi"
- name: etcd.resources.limits.memory
value: "24Gi"
destination:
server: https://kubernetes.default.svc
namespace: milvus
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
Milvus has supported both x86 and ARM since v2.3.0. The master branch Milvus version is v2.3.3, and its etcd dependency version is 3.5.5-r2, hosted in the Docker Hub repository "milvusdb/etcd".
Milvus v2.3.2 has both x86 and ARM images:
https://hub.docker.com/r/milvusdb/milvus/tags?page=1&name=2.3.2
But milvusdb/etcd only has x86 images; an ARM image does not exist:
https://hub.docker.com/r/milvusdb/etcd/tags?page=1&name=3.5.5-r2
I found that the bitnami/etcd repo supports x86 and ARM for 3.5.7, while neither 3.5.5 nor 3.5.6 does:
https://hub.docker.com/r/bitnami/etcd/tags?page=1&name=3.5.7
https://hub.docker.com/r/bitnami/etcd/tags?page=1&name=3.5.5
https://hub.docker.com/r/bitnami/etcd/tags?page=1&name=3.5.6
Would it be possible to upgrade the etcd version from 3.5.5 to 3.5.7 for Milvus v2.3.3, so that both x86 and ARM architectures are supported?
Attention: we should probably also check the Milvus Helm charts for v2.3.0, v2.3.1, and v2.3.2 for this same problem.
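In the meantime, if the chart follows the usual Bitnami image conventions (registry/repository/tag keys), the etcd image could presumably be overridden in the values file until the default is bumped. The tag below is a placeholder; pick an actual multi-arch 3.5.7 tag from Docker Hub:

```yaml
# Sketch: override the etcd image with a multi-arch bitnami build
etcd:
  image:
    registry: docker.io
    repository: bitnami/etcd
    tag: 3.5.7-debian-11-r0   # placeholder; choose an existing multi-arch 3.5.7 tag
```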
When using 4.1.17 with Flux I get the following error:
Helm install failed for release milvus/milvus-milvus with chart [email protected]: error while running post render on files: map[string]interface {}(nil): yaml: unmarshal errors:
line 48: mapping key "httpNumThreads" already defined at line 36
Here are the values I'm passing:
---
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
name: milvus
namespace: milvus
spec:
targetNamespace: milvus
interval: 1m
chart:
spec:
chart: milvus
version: "4.1.17"
sourceRef:
kind: HelmRepository
name: milvus
namespace: milvus
interval: 1m
values:
cluster:
enabled: true
serviceAccount:
create: true
name: milvus-s3-access-sa
annotations:
eks.amazonaws.com/role-arn: "my-s3-arn"
service:
type: LoadBalancer
port: 19530
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-name: milvus-service
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
minio:
enabled: false
externalS3:
enabled: true
host: "s3.us-east-2.amazonaws.com"
port: "443"
useSSL: true
bucketName: "milvusbucket"
useIAM: true
cloudProvider: "aws"
iamEndpoint: ""
rootCoordinator:
replicas: 2
activeStandby:
enabled: true
resources:
limits:
cpu: 1
memory: 2Gi
indexCoordinator:
replicas: 2
activeStandby:
enabled: true
resources:
limits:
cpu: "0.5"
memory: 0.5Gi
queryCoordinator:
replicas: 2
activeStandby:
enabled: true
resources:
limits:
cpu: "0.5"
memory: 0.5Gi
dataCoordinator:
replicas: 2
activeStandby:
enabled: true
resources:
limits:
cpu: "0.5"
memory: 0.5Gi
proxy:
replicas: 2
resources:
limits:
cpu: 1
memory: 2Gi
settings:
clusterName: "basis"
clusterEndpoint: "myclusterendpoint"
logLevel: info
install:
crds: CreateReplace
upgrade:
crds: CreateReplace
Hi !
If users configure an HPA on their side to control the index/data/query/proxy Deployments, we need to be able to remove the replicas field to avoid the well-known bad HPA behavior during a rollout-update/apply.
Maybe by setting its value to 0 or "none" in values.yaml?
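One way this is commonly handled in charts is to omit replicas from the rendered Deployment when an autoscaler is in charge. This is a sketch only; the autoscaling.enabled flag below is a hypothetical value, not an existing one in this chart:

```yaml
# Hypothetical template fragment for the proxy Deployment spec
spec:
  {{- if not .Values.proxy.autoscaling.enabled }}
  replicas: {{ .Values.proxy.replicas }}
  {{- end }}
```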
This feature will also be necessary when HPA becomes available in this chart again :)
Regards,
externalPulsar:
enabled: true
host: "xxxx"
port: 30012
maxMessageSize: "5242880" # 5 * 1024 * 1024 Bytes, Maximum size of each message in pulsar.
tenant: "xxxx"
namespace: "xxx"
authPlugin: "org.apache.pulsar.client.impl.auth.AuthenticationToken"
authParams: {"token": "eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJsbG0tcGxhdGZvcm0tdG9rZW4ifQ.g4cejYFJtfUlPHbeOdmNRjMlNWImhvLkcF-YR71YpPjiXihqCTOM4vghlRIDYECR4Oc7xxxxxx"}
Looking at the source code, I found that the Go JWT parsing fails to parse this token. Is there a solution for this?
Hi Team,
We tried to use your Helm templates (https://github.com/zilliztech/milvus-helm/tree/master/charts/milvus) to deploy Milvus on our OpenShift cluster. Our OpenShift team and Kubernetes cluster admins won't let us specify any security context (runAsUser, runAsGroup, fsGroup) for pods/deployments/replicasets/statefulsets, so we should not be specifying the below:
# runAsUser: 1000
# runAsGroup: 1000
# fsGroup: 1000
So I had to comment them out and then tried to install Milvus, but it does not work: none of my pods start and I see the following errors.
Please assist on how to proceed further.
Thanks!
Tharun M
Team,
It has come to our attention that milvus-io/milvus-helm is now archived and that zilliztech/milvus-helm is allegedly its successor.
However, the Milvus Helm charts as outlined in the README still point to milvus.io.
Could someone comment on whether this is expected, and if not, how do you suggest we pull Helm artifacts (helm pull) from zilliztech?
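For reference, the zilliztech index URL is already used elsewhere in this thread, so pulling through a standard repo entry should work; with Flux, the equivalent would be a HelmRepository pointing at that URL (sketch, assuming the standard Flux source API):

```yaml
# Sketch: Flux HelmRepository for the zilliztech chart index
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: milvus
  namespace: milvus
spec:
  interval: 1m
  url: https://zilliztech.github.io/milvus-helm/
```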
I faced an issue where the ingress for MinIO can't be deployed in the k8s cluster due to this error:
The Kubernetes API could not find version "v1beta1" of networking.k8s.io/Ingress for requested resource milvus-minio. Version "v1" of networking.k8s.io/Ingress is installed on the destination cluster.
As I can see, you are now using this MinIO chart (from a side repo):
https://github.com/zilliztech/milvus-helm/blob/minio-8.0.16/charts/minio/templates/_helpers.tpl#L70
And from the helper I can see that there is no option for the networking.k8s.io/v1 API. Please update it to:
{{/*
Return the appropriate apiVersion for ingress.
*/}}
{{- define "minio.ingress.apiVersion" -}}
{{- if semverCompare "<1.14-0" .Capabilities.KubeVersion.GitVersion -}}
{{- print "extensions/v1beta1" -}}
{{- else if semverCompare "<1.19-0" .Capabilities.KubeVersion.GitVersion -}}
{{- print "networking.k8s.io/v1beta1" -}}
{{- else -}}
{{- print "networking.k8s.io/v1" -}}
{{- end -}}
{{- end -}}
Thank you!
Both Milvus Backup and CDC require filling out a considerable amount of configuration during deployment, and the deployment process can be quite complicated. It would be convenient if, like Attu, Backup and CDC could be optional deployment options.
Hi !
We use MixCoord and wanted to enable the active standby feature.
According to this chart, you should just have to do this:
rootCoordinator:
enabled: false
queryCoordinator:
enabled: false
indexCoordinator:
enabled: false
dataCoordinator:
enabled: false
mixCoordinator:
replicas: 2
activeStandby:
enabled: true
But the MixCoord pod will fail and loop with this error in its logs for each type of coord (query, index, data...):
[2023/09/13 08:12:21.852 +00:00] [ERROR] [sessionutil/session_util.go:451] ["retry func failed"] ["retry time"=16] [error="function
CompareAndSwap error for compare is false for key: querycoord"] [stack="github.com/milvus-io/milvus/internal/util/sessionutil.
(*Session).registerService\n\t/go/src/github.com/milvus-io/milvus/internal/util/sessionutil/session_util.go:451\ngithub.com/milvus-
io/milvus/internal/util/sessionutil.(*Session).Register\n\t/go/src/github.com/milvus-io/milvus/internal/util/sessionutil
/session_util.go:271\ngithub.com/milvus-io/milvus/internal/querycoordv2.(*Server).Register\n\t/go/src/github.com/milvus-io/milvus
/internal/querycoordv2/server.go:137\ngithub.com/milvus-io/milvus/internal/distributed/querycoord.(*Server).start\n\t/go
/src/github.com/milvus-io/milvus/internal/distributed/querycoord/service.go:262\ngithub.com/milvus-io/milvus/internal/distributed
/querycoord.(*Server).Run\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/service.go:100\ngithub.com
/milvus-io/milvus/cmd/components.(*QueryCoord).Run\n\t/go/src/github.com/milvus-io/milvus/cmd/components
/query_coord.go:53\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/go/src/github.com/milvus-io/milvus
/cmd/roles/roles.go:111"]
To fix the issue, after quickly reading the error "function CompareAndSwap error for compare is false for key: XXXX",
I bet that I needed to enable activeStandby even with all the other coordinators disabled:
rootCoordinator:
enabled: false
activeStandby:
enabled: true
queryCoordinator:
enabled: false
activeStandby:
enabled: true
indexCoordinator:
enabled: false
activeStandby:
enabled: true
dataCoordinator:
enabled: false
activeStandby:
enabled: true
mixCoordinator:
replicas: 2
activeStandby:
enabled: true
And indeed, now it works \o/
I think the documentation (of the values.yaml file?) needs to be updated somehow.
Regards,
Hello,
My company has a policy where we have to upload images to our internal Artifactory via Docker.
When running our CI/CD pipeline, we noticed that the 'docker.io/' prefix was being appended to our internal image reference.
Does anyone know why this happens only for etcd, and how we can remove the 'docker.io/' prefix?
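If the etcd image values follow the Bitnami convention (separate registry/repository/tag keys), the prefix can presumably be overridden like this; the registry host below is a placeholder for your internal Artifactory:

```yaml
# Sketch: point the etcd image at an internal registry instead of docker.io
etcd:
  image:
    registry: artifactory.example.com   # placeholder: your internal registry host
    repository: milvusdb/etcd
    tag: 3.5.5-r2
```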