druid-io / druid-operator
Druid Kubernetes Operator
License: Other
Brokers and routers can be exposed using an Ingress; the operator should support an Ingress spec.
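A minimal sketch of what exposing a router through an Ingress could look like, assuming a router Service named druid-tiny-cluster-routers listening on port 8888 (the Service name, host and port are illustrative, not something the operator creates today):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: druid-router
spec:
  rules:
    - host: druid.example.com                          # illustrative hostname
      http:
        paths:
          - path: /
            backend:
              serviceName: druid-tiny-cluster-routers  # hypothetical Service name
              servicePort: 8888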
Is there any reason not to tag Docker images with the latest tag? Generally people are interested in the latest version and are too lazy to check which versions are available ;)
Error response from daemon: manifest for druidio/druid-operator:latest not found: manifest unknown: manifest unknown
autoscaling/v2beta1: v2beta2 has nice features such as cooldown and stabilization for pods in a rapid-scaling environment, which can be useful for MMs and Brokers.
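A rough sketch of an autoscaling/v2beta2 HPA targeting a MiddleManager StatefulSet, assuming Kubernetes 1.18+ for the behavior/stabilization fields (the StatefulSet name and thresholds are illustrative):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: druid-cluster-middlemanagers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: druid-cluster-middlemanagers   # illustrative StatefulSet name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # cooldown before pods are scaled back down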
It's not possible to use nodeSelector inside a DruidNodeSpec, as it's defined only at the global level. A trivial workaround is to use affinity, but it's inconsistent.
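For reference, the affinity workaround in a nodeSpec looks roughly like this (the node label key and value are illustrative):

nodes:
  historicals:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: node-type          # illustrative node label
                  operator: In
                  values:
                    - druid-data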
Hi,
Today I received a lot of errors from the MiddleManager:
org.apache.druid.java.util.common.RetryUtils - Retrying (1 of 9) in 718ms.
com.amazonaws.SdkClientException: Unable to execute HTTP request: druidtest.storagegw.estaleiro.XXX
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1113) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1063) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4229) ~[aws-java-sdk-s3-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4176) ~[aws-java-sdk-s3-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.getAcl(AmazonS3Client.java:3381) ~[aws-java-sdk-s3-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.getBucketAcl(AmazonS3Client.java:1160) ~[aws-java-sdk-s3-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.getBucketAcl(AmazonS3Client.java:1150) ~[aws-java-sdk-s3-1.11.199.jar:?]
at org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3.getBucketAcl(ServerSideEncryptingAmazonS3.java:71) ~[?:?]
at org.apache.druid.storage.s3.S3Utils.grantFullControlToBucketOwner(S3Utils.java:181) ~[?:?]
at org.apache.druid.storage.s3.S3Utils.uploadFileIfPossible(S3Utils.java:263) ~[?:?]
at org.apache.druid.storage.s3.S3TaskLogs.lambda$pushTaskFile$0(S3TaskLogs.java:138) ~[?:?]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:86) [druid-core-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:114) [druid-core-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:104) [druid-core-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.storage.s3.S3Utils.retryS3Operation(S3Utils.java:86) [druid-s3-extensions-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.storage.s3.S3TaskLogs.pushTaskFile(S3TaskLogs.java:136) [druid-s3-extensions-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.storage.s3.S3TaskLogs.pushTaskLog(S3TaskLogs.java:122) [druid-s3-extensions-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:374) [druid-indexing-service-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:132) [druid-indexing-service-0.16.0-incubating.jar:0.16.0-incubating]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
Caused by: java.net.UnknownHostException: druidtest.storagegw.estaleiro.serpro.gov.br
at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_222]
at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_222]
at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_222]
at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38) ~[aws-java-sdk-core-1.11.199.jar:?]
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:359) ~[httpclient-4.5.3.jar:4.5.3]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_222]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_222]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_222]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.conn.$Proxy76.connect(Unknown Source) ~[?:?]
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.3.jar:4.5.3]
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1235) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1055) ~[aws-java-sdk-core-1.11.199.jar:?]
... 27 more
2020-08-10T19:34:38,460 WARN [forking-task-runner-0] org.apache.druid.java.util.common.RetryUtils - Retrying (2 of 9) in 2,069ms.
com.amazonaws.SdkClientException: Unable to execute HTTP request: druidtest.storagegw.estaleiro.serpro.gov.br
(same stack trace as the first retry above)
My deep storage configuration is:
deepStorage:
spec:
properties: |-
druid.storage.type=s3
druid.s3.accessKey=...
druid.s3.secretKey=...
druid.s3.endpoint.url=https://storagegw.estaleiro.XXXXX
druid.storage.bucket=druidtest
druid.storage.baseKey=druid/segments
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=druidtest
druid.indexer.logs.s3Prefix=druid/indexing-logs
type: default
I see that the MiddleManager tries to resolve the URL:
druidtest.storagegw.estaleiro.XXXX
Shouldn't it resolve storagegw.estaleiro.XXXXX instead? What could be wrong? Thanks.
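For what it's worth, this looks like the AWS SDK's default virtual-hosted-style addressing, where the bucket name (druidtest) is prepended to the endpoint hostname. If the S3-compatible gateway does not support that, forcing path-style requests may help; a hedged sketch, assuming your Druid version exposes the druid.s3.enablePathStyleAccess property (check the S3 extension docs for your release):

deepStorage:
  spec:
    properties: |-
      druid.storage.type=s3
      druid.s3.endpoint.url=https://storagegw.estaleiro.XXXXX
      # Use path-style requests (https://endpoint/bucket/key) instead of
      # virtual-hosted-style (https://bucket.endpoint/key).
      druid.s3.enablePathStyleAccess=true
  type: default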
Hi there, I'm hoping you could add some documentation on the features this operator provides. Obviously it creates a druid cluster, but what else does it do? Why should I use this? Thanks!
Hi,
Could you please attach some more descriptive examples, including the console UI, MiddleManager, etc.?
An example of a full production deployment configuration would be much appreciated.
Thanks!
I did an update on a running Druid cluster, and what I notice is that the historicals do not come up. On describing the Druid cluster, here is what I see. Trying to understand the output: does this mean that the updated historical pod is trying to load all the segments which were on the older historical pod?
Since historicals are the first process to get updated, the other Druid processes are stuck.
Not sure what is happening.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal UPDATE_SUCCESS 8m druid-operator Updated [ConfigMap:cluster-druid-common-config].
Normal UPDATE_SUCCESS 8m druid-operator Updated [StatefulSet:druid-cluster-historicals].
Normal ROLLING_DEPLOYMENT_WAIT 8m druid-operator StatefulSet[druid-cluster-historicals] roll out is in progress CurrentRevision[druid-cluster-historicals-75c77695db] != UpdateRevision[druid-cluster-historicals-844d58fc54], UpdatedReplicas[0/824650262508]
Normal ROLLING_DEPLOYMENT_WAIT 8m druid-operator StatefulSet[druid-cluster-historicals] roll out is in progress CurrentRevision[druid-cluster-historicals-75c77695db] != UpdateRevision[druid-cluster-historicals-844d58fc54], UpdatedReplicas[0/824648533800]
Normal ROLLING_DEPLOYMENT_WAIT 8m druid-operator StatefulSet[druid-cluster-historicals] roll out is in progress CurrentRevision[druid-cluster-historicals-75c77695db] != UpdateRevision[druid-cluster-historicals-844d58fc54], UpdatedReplicas[0/824646252040]
I think there is a strange behaviour in the Makefile, in this part:
fmt:
test -z $$(gofmt -l -s pkg/apis/druid/v1alpha1/druid_types.go)
test -z $$(gofmt -l -s pkg/controller/druid)
I'm not sure about the best way to fix it
Currently the termination grace period is set to the default, i.e. 30 seconds. Can we add support in the spec so that we can modify the grace period?
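A sketch of how this might look in a nodeSpec, assuming a pass-through terminationGracePeriodSeconds field is added (the field is the proposal here, not something the operator supports yet):

nodes:
  middlemanagers:
    nodeType: "middleManager"
    replicas: 2
    terminationGracePeriodSeconds: 300   # proposed field, copied into the pod template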
Here are some of my coordinator issues/doubts while running Druid on k8s:
Coordinators follow leader election, meaning that at any single time only one coordinator is active. So in the case of HPA, if the coordinator scales to, let's say, 2 pods, requests to the coordinators are still not load balanced, as only one of them is the leader. The idea of HPA is to scale so that multiple pods can handle requests, with the Service load balancing them.
For an HA architecture, keeping quorum, fault tolerance etc. in mind, should we always have at least 3 coordinators? Is that recommended?
I want to expose the coordinator service to end users so that they can POST their ingestion specs to the coordinator. If I use an Ingress to expose it, I need the Ingress to point to the coordinator Service. Now, how does the k8s Service know which of the 3 coordinators is active? I know Druid has ZooKeeper for service discovery, but that's for Druid's internal service discovery; an HTTP request routed through an Ingress and a k8s Service doesn't know which coordinator to hit.
I may not be conceptually aware of the Druid coordinator mechanisms; I am thinking about these more from a k8s perspective.
Thank you
On adding a toleration in the Druid cluster spec and in the makeStatefulSet function return type, the StatefulSet fails to include the toleration in its spec.
I plan to add HPA autoscaling support in the types definition for all the nodes, i.e. historicals, MiddleManagers and brokers. Currently you need to add it manually after the operator deploys the Druid cluster. Here is a prototype which can be included in the nodeSpec; we can keep this feature optional.
type AutoScaleSpec struct {
	Autoscale   *bool  `json:"autoscale,omitempty"`
	Replicas    *int32 `json:"replicas,omitempty"`
	MinReplicas *int32 `json:"minReplicas,omitempty"`
	MaxReplicas *int32 `json:"maxReplicas,omitempty"`
}
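Rendered into a nodeSpec, the proposed fields would look something like this (field names follow the json tags above; the whole block is hypothetical until the feature lands):

nodes:
  brokers:
    nodeType: "broker"
    autoscale: true       # hypothetical field from the proposed AutoScaleSpec
    minReplicas: 2
    maxReplicas: 6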
I am using kind to try some things.
When I deploy the tiny-cluster example with default values, the init script of the Docker image fails.
mkdir -p var/tmp var/druid/segments var/druid/indexing-logs var/druid/task var/druid/hadoop-tmp var/druid/segment-cache
The above part of the init script fails with the error "permission denied: cannot create directory".
When I run the containers as root (I changed the securityContext.runAsUser value to 0) everything works as expected.
Am I doing something wrong?
Add
Name string `json:"name"`
in types.go and then use it as a unique identifier for the different nodes. This gives more flexibility: I can define whatever combination my node is.

func makeNodeSpecificUniqueString(m *v1alpha1.Druid, n *v1alpha1.DruidNodeSpec) string {
	return fmt.Sprintf("druid-%s-%s", m.Name, n.Name)
}

Looking for any other approach with minimal changes to the existing code base.
@himanshug
router is needed!
@himanshug
Please cut a release. Here's a changelog to refer to.
Today, if I try to update the storage size of my volumeClaimTemplates on a NodeSpec and apply my change, the operator returns the following error:
{"level":"error","ts":1585819955.9165432,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"druid-controller","request":"partner-stats/druid-partner-stats","error":"Failed to update [StatefulSet:druid-druid-partner-stats-historicals] due to [StatefulSet.apps \"druid-druid-partner-stats-historicals\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden].","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/druid-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/druid-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/druid-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/druid-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/druid-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/druid-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/druid-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
Most Kubernetes clusters now handle PVC expansion, which only needs a pod restart. We could manage this feature.
The current implementation of this method first tries to create the resource and then checks for the metav1.StatusReasonAlreadyExists error. I have come across some custom admission webhooks that might not return the metav1.StatusReasonAlreadyExists error to indicate an already existing resource.
This proposal is to change the sdkCreateOrUpdateAsNeeded method (and also rename it to sdkCheckAndCreate) so that it first gets the resource and creates it only if it doesn't exist.
I tested the above changes in our prod environment and will create a PR for them.
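A minimal sketch of the proposed get-then-create flow, assuming a sigs.k8s.io/controller-runtime client; the function name, signature and surrounding wiring in the operator may differ:

package druid

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// sdkCheckAndCreate fetches the resource first and creates it only when it is
// genuinely absent, so admission webhooks that don't return
// StatusReasonAlreadyExists are still handled correctly.
func sdkCheckAndCreate(ctx context.Context, c client.Client, obj runtime.Object) error {
	key, err := client.ObjectKeyFromObject(obj)
	if err != nil {
		return err
	}
	existing := obj.DeepCopyObject()
	if err := c.Get(ctx, key, existing); apierrors.IsNotFound(err) {
		return c.Create(ctx, obj)
	} else if err != nil {
		return err
	}
	// The resource already exists; nothing to create.
	return nil
}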
with below....
kind: "Druid"
metadata:
name: tiny-cluster
spec:
image: apache/incubator-druid:0.16.0-incubating
startScript: /druid.sh
securityContext:
fsGroup: 1000
runAsUser: 1000
runAsGroup: 1000
services:
- spec:
type: ClusterIP
clusterIP: None
nodeSelector:
type: "druid"
commonConfigMountPath: "/opt/druid/conf/druid/cluster/_common"
....
I have a small doubt: since this operator creates all the Druid services as StatefulSets, do we really need a StatefulSet for coordinators and overlords? Is it necessary to use persistence such as EBS for them, or would something like hostPath or emptyDir do the job as well?
I propose to deploy the Druid cluster with a service account so that adequate RBACs can be applied in a prod environment.
Currently the operator cannot pull custom-built images from private registries which require authentication; add support for an imagePullSecrets option in the StatefulSet.
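The example CR in this repo already shows the field commented out; a sketch of how it might look at the cluster level (the image and secret names are illustrative):

spec:
  image: registry.example.com/acme/druid:0.16.0-incubating   # illustrative private image
  imagePullSecrets:
    - name: my-registry-secret                                # illustrative pull secret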
Currently only the operator runs with a ServiceAccount. I am not sure whether adding a ServiceAccount to the Druid cluster spec should be a required or an optional parameter.
@himanshug could you advise on the service account?
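A sketch under the assumption that an optional serviceAccountName field is added to the cluster spec and copied into each pod template (the field and the account name are hypothetical):

spec:
  image: apache/incubator-druid:0.16.0-incubating
  serviceAccountName: druid-cluster   # hypothetical optional field, propagated to all Druid pods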
nodeSelector is only supported in the Druid cluster spec. Affinities should work, but it would be good to have nodeSelector in both the cluster spec and the nodeSpec.
Hello,
I am trying to set up Druid locally on minikube; could anyone help, please? Right now, when I try to do it, the cluster pods don't run due to the resource limits set.
Thanks!
@himanshug can we add a roadmap for this operator, plus some contribution guidelines, how we can handle seamless version upgrades, and maybe running on autopilot? Can you highlight those?
Scenario:
MMs are running, and in the case of dynamic scaling there are times when HPA scales down an MM even while it is running a task. Even regardless of HPA, there should be a mechanism which protects an MM pod from being deleted whenever a task is associated with it, at all times.
Not exactly sure how to solve this, but maybe finalizers can be useful. Finalizers can make sure a pre-deletion hook runs and keep the pod in the terminating phase until the task is complete (not sure how feasible this option is, as I mostly see finalizers associated with CRDs; how can I associate one with a StatefulSet?).
Alternatively, have a preStop hook and extend the terminationGracePeriod (which is supported by the operator) to make sure the tasks can finish.
Some questions arise:
How do we make a mapping between an MM pod and the tasks associated with it? The endpoint /druid/v1/indexer/runningTasks
has the following JSON output:
[{"id":"ds_2020-04-09T17:41:09.220Z","groupId":"ds",09T17:41:09.220Z","type":"compact","createdTime":"2020-04-09T18:19:05.795Z","queueInsertionTime":"2020-04-09T18:19:05.795Z","statusCode":"RUNNING","status":"RUNNING","runnerStatusCode":"RUNNING","duration":null,"location":{"host":"10.28.6.38","port":8100,"tlsPort":-1},"dataSource":"ds","errorMsg":null}]
Here the "host":"10.28.6.38"
maps to a kubernetes ep ( kubectl get ep
). Now i need to make sure anyhow the pod associated with this host does not scale down or gets deleted at all times.
Any suggestions, ideas would be helpful.
@vDMG
Today, all our passwords are in clear text in our configuration, which is not a good security practice.
Example:
druid.metadata.storage.connector.password=toto
druid.zk.service.pwd
We could allow the user to specify some environment variables to override certain values.
Example:
env:
- name: DRUID_METADATA_STORAGE_CONNECTOR_PASSWORD
valueFrom:
secretKeyRef:
name: druid-connector
key: password
We could then substitute those values into the configuration.
We need to define strict norms for those environment variables ("_" replaces ".", all upper case, and starting with "DRUID_").
What are your thoughts on this idea? @himanshug @AdheipSingh
Everything starts fine without any errors in the logs. I am able to call APIs to check the health of brokers, coordinator and historicals, but when I call the overlord API https:///druid/indexer/v1/leader or any other indexer API, or open the web console, I get errors in the logs:
ServerDiscoverySelector - No server instance found for [druid/overlord]
2019-11-17T03:48:53,704 WARN [qtp715602332-103] org.eclipse.jetty.server.HttpChannel - /druid/indexer/v1/scaling
org.apache.druid.java.util.common.ISE: Couldn't find leader.
at org.apache.druid.discovery.DruidLeaderClient.findCurrentLeader(DruidLeaderClient.java:262) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.server.http.OverlordProxyServlet.rewriteTarget(OverlordProxyServlet.java:63) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.eclipse.jetty.proxy.ProxyServlet.service(ProxyServlet.java:63) ~[jetty-proxy-9.4.10.v20180503.jar:9.4.10.v20180503]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.http.RedirectFilter.doFilter(RedirectFilter.java:73) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.security.PreResponseAuthorizationCheckFilter.doFilter(PreResponseAuthorizationCheckFilter.java:82) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.security.AllowOptionsResourceFilter.doFilter(AllowOptionsResourceFilter.java:75) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.security.AllowAllAuthenticator$1.doFilter(AllowAllAuthenticator.java:84) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.server.security.AuthenticationWrappingFilter.doFilter(AuthenticationWrappingFilter.java:59) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.security.SecuritySanityCheckFilter.doFilter(SecuritySanityCheckFilter.java:86) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:724) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.Server.handle(Server.java:531) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281) [jetty-io-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102) [jetty-io-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118) [jetty-io-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
Caused by: org.apache.druid.java.util.common.IOE: No known server
at org.apache.druid.discovery.DruidLeaderClient.getCurrentKnownLeader(DruidLeaderClient.java:297) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.discovery.DruidLeaderClient.makeRequest(DruidLeaderClient.java:132) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.discovery.DruidLeaderClient.makeRequest(DruidLeaderClient.java:140) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.discovery.DruidLeaderClient.findCurrentLeader(DruidLeaderClient.java:259) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
I'm having a hard time getting the operator to launch a working cluster. From what I can tell from the logs, the coordinator and overlord do not seem to be initialized with the config in the YAML deployment (attached as help.txt). The other parts of the cluster seem to be able to find each other.
Logs for coordinator:
2020-03-09T20:21:01.946365074Z 2020-03-09T20:21:01+0000 startup service coordinator
2020-03-09T20:21:01.971997182Z Setting druid.host=10.44.2.78 in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
2020-03-09T20:21:03.19655891Z ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'org.apache.logging.log4j.simplelog.StatusLogger.level' to TRACE to show Log4j2 internal initialization logging.
Logs for overlord:
2020-03-09T21:58:57+0000 startup service overlord
Setting druid.host=10.44.3.7 in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'org.apache.logging.log4j.simplelog.StatusLogger.level' to TRACE to show Log4j2 internal initialization logging.
Logs for Broker:
logs-from-druid-cluster-brokers-in-druid-cluster-brokers-0.txt
At first I was using the prebuilt image from Docker Hub to run the operator and ran into the same errors. I am now using a Docker image built locally from a clone of the repo.
The error also shows up if I use image: "apache/incubator-druid:0.16.0-incubating"
in the configs.
I'm running out of ideas for what could be going wrong. Help would be greatly appreciated!
Hello,
Today I'm facing a blocking situation: I'm not able to add podAnnotations on my DruidNodeSpec because of this error:
Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden
Does someone know a workaround?
How should this be properly fixed in the operator?
Thanks.
On the basis of https://github.com/kubernetes-sigs/kubebuilder/tree/master/docs/testing, I am planning to add test cases for druid_types.go. The two specs to be tested are DruidClusterSpec and DruidNodeSpec, and what to assert are the // Required Params in the structs //.
Here is some sample playground code.
package v1alpha1_test
import (
. "github.com/onsi/ginkgo"
. "github.com/onsi/gomega"
"golang.org/x/net/context"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
. "github.com/druid-io/druid-operator/pkg/apis/druid/v1alpha1"
)
var _ = Describe("Druids", func() {
var (
key types.NamespacedName
created, fetched *Druid
)
BeforeEach(func() {
// Add any setup steps that needs to be executed before each test
})
AfterEach(func() {
// Add any teardown steps that needs to be executed after each test
})
// Avoid adding tests for vanilla CRUD operations because they would
// test Kubernetes API server, which isn't the goal here.
Context("Create API", func() {
It("should create an object successfully", func() {
key = types.NamespacedName{
Name: "foo",
Namespace: "default",
}
created = &Druid{
ObjectMeta: metav1.ObjectMeta{
Name: "foo",
Namespace: "default",
},
Spec: DruidClusterSpec{
// Add all the params to be tested
JvmOptions: "",
}}
// test same for nodespec
// add another context for nodespec
By("creating an API obj")
Expect(k8sClient.Create(context.TODO(), created)).To(Succeed())
fetched = &Druid{}
Expect(k8sClient.Get(context.TODO(), key, fetched)).To(Succeed())
Expect(fetched).To(Equal(created))
By("deleting the created object")
Expect(k8sClient.Delete(context.TODO(), created)).To(Succeed())
Expect(k8sClient.Get(context.TODO(), key, created)).ToNot(Succeed())
})
})
})
Any specifics you feel need to be looked at... @himanshug @akashdw
Since the YAML is large, a lot of parameters sometimes go unnoticed. Can we add OpenAPI validation for the custom resource so that it can be checked whether a REQUIRED parameter is missing?
@himanshug this requires a change in the directory structure. I'll create a WIP draft and test it out, to make sure it is backward compatible and does not break the existing structure.
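A rough sketch of what the validation could look like on the v1beta1 CRD; the exact required list should follow the // Required Params markers in druid_types.go (the fields below are only an assumption):

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: druids.druid.apache.org
spec:
  # ...existing group/names/scope/version fields...
  validation:
    openAPIV3Schema:
      type: object
      properties:
        spec:
          type: object
          required:        # reject CRs that omit required params at admission time
            - image
            - startScript
            - nodes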
add support to customize preStop?
spec:
template:
spec:
terminationGracePeriodSeconds: 4200
containers:
- name: druid-xapi-druid-middlemanagers
lifecycle:
preStop:
exec:
command:
- sh
- -c
- |
wget --post-data='{}' -q -O - http://localhost:8088/druid/worker/v1/disable > /dev/null && while [[ "$(wget -q -O - http://localhost:8088/druid/worker/v1/tasks)" != "[]" ]]; do sleep 10; done
Once #54 is done, we can submit this operator to operatorhub.io. This will help in gaining more community and contributions to the operator.
Here are the specific criteria and the checklist to be completed before submitting:
https://operatorhub.io/contribute
OLM must be integrated with the current setup; CRDs and RBAC will be deployed through this catalog:
https://github.com/operator-framework/operator-lifecycle-manager
Currently the operator does validation at the operator level. So in case we miss or misconfigure something, after we apply the CR we need to check the operator logs and describe the CR to figure out what went wrong, then manually delete the CR and apply it again.
With the SDK supporting validating webhooks, we will be able to reject invalid specs at the Kube API level.
While using a custom entrypoint for Druid which substitutes some env values during pod initialization, I saw that the mounted ConfigMaps could not be copied to the right directory by the entrypoint script (something equivalent to https://github.com/apache/druid/blob/master/distribution/docker/druid.sh#L49).
I ended up adding subPaths to the ConfigMap mounts, which basically solved that problem.
We should have this in general, too, to avoid confusion.
volumeMount := []v1.VolumeMount{
	{
		MountPath: m.Spec.CommonConfigMountPath + "/common.runtime.properties",
		Name:      "common-config-volume",
		ReadOnly:  true,
		SubPath:   "common.runtime.properties",
	},
	{
		MountPath: m.Spec.CommonConfigMountPath + "/log4j2.xml",
		Name:      "common-config-volume",
		ReadOnly:  true,
		SubPath:   "log4j2.xml",
	},
	{
		MountPath: nodeSpec.NodeConfigMountPath + "/jvm.config",
		Name:      "nodetype-config-volume",
		ReadOnly:  true,
		SubPath:   "jvm.config",
	},
	{
		MountPath: nodeSpec.NodeConfigMountPath + "/runtime.properties",
		Name:      "nodetype-config-volume",
		ReadOnly:  true,
		SubPath:   "runtime.properties",
	},
}
$ kubectl get po
NAME READY STATUS RESTARTS AGE
druid-operator-5868c9bb8c-stvsm 0/1 InvalidImageName 0 3m50s
$ kubectl logs druid-operator-5868c9bb8c-stvsm
Error from server (BadRequest): container "druid-operator" in pod "druid-operator-5868c9bb8c-stvsm" is waiting to start: InvalidImageName
What are your thoughts/plans on provisioning/terminating deep storage, mainly buckets on cloud platforms (S3, GCS, etc.), using this interface?
Should we have a separate CR in the operator with kind: DeepStorage, and a separate controller for this CR? (just a thought)
Or we could just add methods to the deepStorageManager interface, and maybe have S3 and GCS types there (currently there is only default); I think that is mainly why you added this interface.
It would be nice to have some management around deep storage.
Hello,
I'm trying to run Druid on k8s and I've been stuck for the past few days: all the pods are running but I cannot get the "wiki example" working.
The overlord keeps logging the following exception:
error":"org.skife.jdbi.v2.exceptions.CallbackFailedException: java.lang.NullPointerException: tuples must be non-null
I can see the tables created in PostgreSQL. I can also see the segments created in my S3 bucket (even though it's very slow), but the datasource is not created.
Here is my configuration:
apiVersion: "druid.apache.org/v1alpha1"
kind: "Druid"
metadata:
name: druid-cluster
spec:
image: apache/incubator-druid:0.16.1-incubating
# Optionally specify image for all nodes. Can be specify on nodes also
# imagePullSecrets:
# - name: tutu
env:
- name: AWS_REGION
value: eu-west-1
startScript: /druid.sh
securityContext:
fsGroup: 1000
runAsUser: 1000
runAsGroup: 1000
services:
- spec:
type: ClusterIP
clusterIP: None
commonConfigMountPath: "/opt/druid/conf/druid/cluster/_common"
jvm.options: |
-server
-XX:MaxDirectMemorySize=10240g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Dlog4j.debug
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
log4j.config: |
<?xml version="1.0" encoding="UTF-8" ?>
<Configuration status="WARN">
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>
common.runtime.properties: |
# Zookeeper
druid.zk.service.host=druid-cluster-zk.druid.svc.cluster.local
druid.zk.paths.base=/druid
druid.zk.service.compress=false
# Metadata Store
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://postgres.storage.svc.cluster.local/druid-metadata
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=#{POSTGRES_DRUID_PASSWORD}#
druid.metadata.storage.connector.createTables=true
# druid.metadata.storage.type=derby
# druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527//opt/var/druid_state/derby;create=true
# Deep Storage
# druid.storage.type=local
# druid.storage.storageDirectory=/druid/data/deepstorage
druid.storage.type=s3
druid.storage.bucket=#{S3_BUCKET}#
druid.storage.baseKey=druid/segments
druid.indexer.logs.directory=data/logs/
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=#{S3_BUCKET}#
druid.indexer.logs.s3Prefix=druid/indexing-logs
druid.s3.accessKey=#{S3_ACCESS_KEY}#
druid.s3.secretKey=#{S3_SECRET_KEY}#
#
# Extensions
#
druid.extensions.loadList=["druid-s3-extensions","druid-kafka-indexing-service","postgresql-metadata-storage"]
#
# Service discovery
#
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator
#
# Monitoring
#
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
#druid.emitter=noop
druid.emitter.logging.logLevel=debug
nodes:
brokers:
nodeType: "broker"
# Optionally specify for broker nodes
# imagePullSecrets:
# - name: tutu
druid.port: 8088
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/broker"
replicas: 1
livenessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8088
readinessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8088
log4j.config: |
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
services:
- metadata:
name: broker-%s-service
spec:
clusterIP: None
ports:
- name: tcp-service-port
port: 8088
targetPort: 8088
type: ClusterIP
runtime.properties: |
druid.service=druid/broker
# HTTP server threads
druid.broker.http.numConnections=5
druid.server.http.numThreads=10
# Processing threads and buffers
druid.processing.buffer.sizeBytes=1
druid.processing.numMergeBuffers=1
druid.processing.numThreads=1
druid.sql.enable=true
extra.jvm.options: |
-Xmx1G
-Xms1G
volumeMounts:
- mountPath: /druid/data
name: data-volume
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-cx-druid-cluster-broker-0
coordinators:
nodeType: "coordinator"
druid.port: 8088
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/master/coordinator-overlord"
replicas: 1
livenessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8088
readinessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8088
log4j.config: |
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
services:
- metadata:
name: coordinator-%s-service
spec:
clusterIP: None
ports:
- name: tcp-service-port
port: 8088
targetPort: 8088
type: ClusterIP
runtime.properties: |
druid.service=druid/coordinator
# HTTP server threads
druid.coordinator.startDelay=PT30S
druid.coordinator.period=PT30S
druid.coordinator.asOverlord.enabled=false
# Configure this coordinator to also run as Overlord
# druid.coordinator.asOverlord.enabled=true
# druid.coordinator.asOverlord.overlordService=druid/overlord
# druid.indexer.queue.startDelay=PT30S
# druid.indexer.runner.type=remote
# druid.indexer.storage.type=metadata
extra.jvm.options: |
-Xmx1G
-Xms1G
volumeMounts:
- mountPath: /druid/data
name: data-volume
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-cx-druid-cluster-coordinator-0
historicals:
nodeType: "historical"
druid.port: 8088
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/historical"
replicas: 1
livenessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8088
readinessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8088
log4j.config: |
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
services:
- metadata:
name: historical-%s-service
spec:
clusterIP: None
ports:
- name: tcp-service-port
port: 8088
targetPort: 8088
type: ClusterIP
runtime.properties: |
druid.service=druid/historical
druid.server.http.numThreads=5
druid.processing.buffer.sizeBytes=1
druid.processing.numMergeBuffers=1
druid.processing.numThreads=1
# Segment storage
druid.segmentCache.locations=[{\"path\":\"/druid/data/segments\",\"maxSize\":10737418240}]
druid.server.maxSize=10737418240
extra.jvm.options: |
-Xmx1G
-Xms1G
volumeMounts:
- mountPath: /druid/data
name: data-volume
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-cx-druid-cluster-historical-0
middlemanagers:
nodeType: "middleManager"
druid.port: 8091
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/middleManager"
replicas: 1
ports:
-
containerPort: 8100
name: peon-0-pt
-
containerPort: 8101
name: peon-1-pt
-
containerPort: 8102
name: peon-2-pt
-
containerPort: 8103
name: peon-3-pt
-
containerPort: 8104
name: peon-4-pt
livenessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8091
readinessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8091
log4j.config: |
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
runtime.properties: |
druid.service=druid/middleManager
druid.worker.capacity=4
druid.indexer.runner.javaOpts=-server -XX:MaxDirectMemorySize=10240g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/druid/data/tmp -Dlog4j.debug -XX:+UnlockDiagnosticVMOptions -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=50 -XX:GCLogFileSize=10m -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:+UseG1GC -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -XX:HeapDumpPath=/druid/data/logs/peon.%t.%p.hprof -Xms10G -Xmx10G
druid.indexer.task.baseTaskDir=/druid/data/baseTaskDir
druid.server.http.numThreads=10
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=1
druid.indexer.fork.property.druid.processing.numMergeBuffers=1
druid.indexer.fork.property.druid.processing.numThreads=1
# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
druid.indexer.fork.property.druid.processing.numThreads=1
services:
- metadata:
name: middlemanager-%s-service
spec:
clusterIP: None
ports:
-
name: tcp-service-port
port: 8091
targetPort: 8091
-
name: peon-port-0
port: 8100
targetPort: 8100
-
name: peon-port-1
port: 8101
targetPort: 8101
-
name: peon-port-2
port: 8102
targetPort: 8102
-
name: peon-port-3
port: 8103
targetPort: 8103
-
name: peon-port-4
port: 8104
targetPort: 8104
type: ClusterIP
extra.jvm.options: |
-Xmx1G
-Xms1G
volumeMounts:
- mountPath: /druid/data
name: data-volume
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-cx-druid-cluster-middlemanager-0
overlords:
livenessProbe:
initialDelaySeconds: 50
httpGet:
path: /status/health
port: 8090
readinessProbe:
initialDelaySeconds: 50
httpGet:
path: /status/health
port: 8090
log4j.config: |
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
druid.port: 8090
extra.jvm.options: |
-Xmx4G
-Xms4G
nodeType: overlord
nodeConfigMountPath: /opt/druid/conf/druid/cluster/master/coordinator-overlord
replicas: 1
runtime.properties: |
druid.service=druid/overlord
druid.indexer.queue.startDelay=PT2M
druid.indexer.queue.restartDelay=PT2M
druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata
services:
- metadata:
name: overlord-%s-service
spec:
clusterIP: None
ports:
- name: tcp-service-port
port: 8090
targetPort: 8090
type: ClusterIP
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-cx-druid-cluster-overlord-0
volumeMounts:
- mountPath: /druid/data
name: data-volume
routers:
livenessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8888
readinessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8888
log4j.config: |
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
druid.port: 8888
extra.jvm.options: |
-Xmx512m
-Xms512m
nodeType: router
nodeConfigMountPath: /opt/druid/conf/druid/cluster/query/router
replicas: 1
runtime.properties: |
druid.service=druid/router
druid.plaintextPort=8888
# HTTP proxy
druid.router.http.numConnections=50
druid.router.http.readTimeout=PT5M
druid.router.http.numMaxThreads=100
druid.server.http.numThreads=100
# Service discovery
druid.router.defaultBrokerServiceName=druid/broker
druid.router.coordinatorServiceName=druid/coordinator
# Management proxy to coordinator / overlord: required for unified web console.
druid.router.managementProxy.enabled=true
services:
- metadata:
name: router-%s-service
spec:
clusterIP: None
ports:
- name: tcp-service-port
port: 8888
targetPort: 8888
type: ClusterIP
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-cx-druid-cluster-routeur-0
volumeMounts:
- mountPath: /druid/data
name: data-volume
Hi, can we roll out a release mentioning the features added?
What I would like to automate here is the second step. I can easily delete the StatefulSet using the operator with cascade; what I am not sure about is from which events the operator will know that the k8s API will not update the StatefulSet. If we can get this, it will be a big help.
@himanshug any thoughts on what could help us trigger this command?
I have my deep storage set to local. I am using hostPath in the MiddleManager to push data to a local SSD. Somehow it keeps failing to write to deep storage.
Here is my MiddleManager spec.
I am using deep storage set as below; not sure if I am missing anything.
deepStorage:
spec:
properties: |-
druid.storage.type=local
druid.storage.storageDirectory=data/segments
druid.indexer.logs.type=file
druid.indexer.logs.directory=data/logs/
type: default
middlemanagers:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
-
matchExpressions:
-
key: node-type
operator: In
values:
- druid-data
druid.port: 8091
extra.jvm.options: |-
-Xmx4G
-Xms4G
nodeType: middleManager
nodeConfigMountPath: /opt/druid/conf/druid/cluster/data/middlemanager
log4j.config: |-
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="info">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
podDisruptionBudgetSpec:
maxUnavailable: 1
ports:
-
containerPort: 8100
name: peon-0-pt
-
containerPort: 8101
name: peon-1-pt
-
containerPort: 8102
name: peon-2-pt
-
containerPort: 8103
name: peon-3-pt
-
containerPort: 8104
name: peon-4-pt
replicas: 2
resources:
limits:
cpu: "2"
memory: 5Gi
requests:
cpu: "2"
memory: 5Gi
runtime.properties: |-
druid.service=druid/middleManager
druid.plaintextPort=8091
# Number of tasks per middleManager
druid.worker.capacity=4
# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task
# HTTP server threads
druid.server.http.numThreads=60
# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
druid.indexer.fork.property.druid.processing.numThreads=1
services:
-
spec:
clusterIP: None
ports:
-
name: tcp-service-port
port: 8091
targetPort: 8091
-
name: peon-port-0
port: 8100
targetPort: 8100
-
name: peon-port-1
port: 8101
targetPort: 8101
-
name: peon-port-2
port: 8102
targetPort: 8102
-
name: peon-port-3
port: 8103
targetPort: 8103
-
name: peon-port-4
port: 8104
targetPort: 8104
type: ClusterIP
tolerations:
-
effect: NoSchedule
key: node-role.kubernetes.io/master
operator: Exists
volumeMounts:
-
mountPath: /opt/apache-druid-0.16.0-incubating/data
name: data-volume
volumes:
-
hostPath:
path: /data
name: data-volume
overlords:
status:
acceptedNames:
kind: ""
plural: ""
conditions: []
storedVersions: []
kubectl get druid my-druid -o yaml -n my-namespace
In the above output we are not getting a status; basically, all k8s objects using the runtime.Object interface are responsible for sending out a status, which can also be polled via the /status endpoint.
Just curious whether this is something we did on purpose or missed out in the CRD spec.
I'll be evaluating and fixing this anyway. :)
@himanshug
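For reference, a sketch of enabling the status subresource on the v1beta1 CRD, assuming the Druid type then gains a status struct for the operator to write:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: druids.druid.apache.org
spec:
  # ...existing fields...
  subresources:
    status: {}   # exposes the /status endpoint so status can be written and polled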