druid-io / druid-operator
Druid Kubernetes Operator
License: Other
Brokers and routers can be exposed using an Ingress; the operator should support an Ingress spec.
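A minimal sketch of what exposing a router through an Ingress could look like, assuming a router Service named druid-tiny-cluster-routers listening on port 8888 (the Service name, host and port are illustrative, not something the operator creates today):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: druid-router
spec:
  rules:
    - host: druid.example.com                          # illustrative hostname
      http:
        paths:
          - path: /
            backend:
              serviceName: druid-tiny-cluster-routers  # hypothetical Service name
              servicePort: 8888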
Is there any reason not to tag Docker images with the latest tag? Generally people are interested in the latest version and are too lazy to check which versions are available ;)
Error response from daemon: manifest for druidio/druid-operator:latest not found: manifest unknown: manifest unknown
autoscaling/v2beta1: v2beta2 has nice features such as cooldown and stabilization for pods in a rapid-scaling environment, which can be useful for MMs and Brokers.
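A rough sketch of an autoscaling/v2beta2 HPA targeting a MiddleManager StatefulSet, assuming Kubernetes 1.18+ for the behavior/stabilization fields (the StatefulSet name and thresholds are illustrative):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: druid-cluster-middlemanagers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: druid-cluster-middlemanagers   # illustrative StatefulSet name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # cooldown before pods are scaled back down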
It's not possible to use nodeSelector inside a DruidNodeSpec, as it's defined only at the global level. A trivial workaround is to use affinity, but it's inconsistent.
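For reference, the affinity workaround in a nodeSpec looks roughly like this (the node label key and value are illustrative):

nodes:
  historicals:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: node-type          # illustrative node label
                  operator: In
                  values:
                    - druid-data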
Hi,
Today I received a lot of errors from the MiddleManager:
org.apache.druid.java.util.common.RetryUtils - Retrying (1 of 9) in 718ms.
com.amazonaws.SdkClientException: Unable to execute HTTP request: druidtest.storagegw.estaleiro.XXX
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1113) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1063) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4229) ~[aws-java-sdk-s3-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4176) ~[aws-java-sdk-s3-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.getAcl(AmazonS3Client.java:3381) ~[aws-java-sdk-s3-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.getBucketAcl(AmazonS3Client.java:1160) ~[aws-java-sdk-s3-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.getBucketAcl(AmazonS3Client.java:1150) ~[aws-java-sdk-s3-1.11.199.jar:?]
at org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3.getBucketAcl(ServerSideEncryptingAmazonS3.java:71) ~[?:?]
at org.apache.druid.storage.s3.S3Utils.grantFullControlToBucketOwner(S3Utils.java:181) ~[?:?]
at org.apache.druid.storage.s3.S3Utils.uploadFileIfPossible(S3Utils.java:263) ~[?:?]
at org.apache.druid.storage.s3.S3TaskLogs.lambda$pushTaskFile$0(S3TaskLogs.java:138) ~[?:?]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:86) [druid-core-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:114) [druid-core-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:104) [druid-core-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.storage.s3.S3Utils.retryS3Operation(S3Utils.java:86) [druid-s3-extensions-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.storage.s3.S3TaskLogs.pushTaskFile(S3TaskLogs.java:136) [druid-s3-extensions-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.storage.s3.S3TaskLogs.pushTaskLog(S3TaskLogs.java:122) [druid-s3-extensions-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:374) [druid-indexing-service-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:132) [druid-indexing-service-0.16.0-incubating.jar:0.16.0-incubating]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
Caused by: java.net.UnknownHostException: druidtest.storagegw.estaleiro.serpro.gov.br
at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_222]
at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_222]
at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_222]
at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38) ~[aws-java-sdk-core-1.11.199.jar:?]
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:359) ~[httpclient-4.5.3.jar:4.5.3]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_222]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_222]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_222]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.conn.$Proxy76.connect(Unknown Source) ~[?:?]
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.3.jar:4.5.3]
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1235) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1055) ~[aws-java-sdk-core-1.11.199.jar:?]
... 27 more
2020-08-10T19:34:38,460 WARN [forking-task-runner-0] org.apache.druid.java.util.common.RetryUtils - Retrying (2 of 9) in 2,069ms.
com.amazonaws.SdkClientException: Unable to execute HTTP request: druidtest.storagegw.estaleiro.serpro.gov.br
(same stack trace as the first retry above)
My deep storage configuration is:
deepStorage:
spec:
properties: |-
druid.storage.type=s3
druid.s3.accessKey=...
druid.s3.secretKey=...
druid.s3.endpoint.url=https://storagegw.estaleiro.XXXXX
druid.storage.bucket=druidtest
druid.storage.baseKey=druid/segments
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=druidtest
druid.indexer.logs.s3Prefix=druid/indexing-logs
type: default
I see that the MiddleManager tries to resolve the URL:
druidtest.storagegw.estaleiro.XXXX
Shouldn't it resolve storagegw.estaleiro.XXXXX instead? What could be wrong? Thanks.
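For what it's worth, this looks like the AWS SDK's default virtual-hosted-style addressing, where the bucket name (druidtest) is prepended to the endpoint hostname. If the S3-compatible gateway does not support that, forcing path-style requests may help; a hedged sketch, assuming your Druid version exposes the druid.s3.enablePathStyleAccess property (check the S3 extension docs for your release):

deepStorage:
  spec:
    properties: |-
      druid.storage.type=s3
      druid.s3.endpoint.url=https://storagegw.estaleiro.XXXXX
      # Use path-style requests (https://endpoint/bucket/key) instead of
      # virtual-hosted-style (https://bucket.endpoint/key).
      druid.s3.enablePathStyleAccess=true
  type: default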
Hi there, I'm hoping you could add some documentation on the features this operator provides. Obviously it creates a druid cluster, but what else does it do? Why should I use this? Thanks!
Hi,
Could you please attach some more descriptive examples, including the console UI, MiddleManager, etc.?
An example of a full production deployment configuration would be much appreciated.
Thanks!
I did an update on a running Druid cluster, and what I notice is that the historicals do not come up. On describing the Druid cluster, here is what I see. Trying to understand the output: does this mean that the updated historical pod is trying to load all the segments which were on the older historical pod?
Since historicals are the first process to get updated, the other Druid processes are stuck.
Not sure what is happening.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal UPDATE_SUCCESS 8m druid-operator Updated [ConfigMap:cluster-druid-common-config].
Normal UPDATE_SUCCESS 8m druid-operator Updated [StatefulSet:druid-cluster-historicals].
Normal ROLLING_DEPLOYMENT_WAIT 8m druid-operator StatefulSet[druid-cluster-historicals] roll out is in progress CurrentRevision[druid-cluster-historicals-75c77695db] != UpdateRevision[druid-cluster-historicals-844d58fc54], UpdatedReplicas[0/824650262508]
Normal ROLLING_DEPLOYMENT_WAIT 8m druid-operator StatefulSet[druid-cluster-historicals] roll out is in progress CurrentRevision[druid-cluster-historicals-75c77695db] != UpdateRevision[druid-cluster-historicals-844d58fc54], UpdatedReplicas[0/824648533800]
Normal ROLLING_DEPLOYMENT_WAIT 8m druid-operator StatefulSet[druid-cluster-historicals] roll out is in progress CurrentRevision[druid-cluster-historicals-75c77695db] != UpdateRevision[druid-cluster-historicals-844d58fc54], UpdatedReplicas[0/824646252040]
I think there is a strange behaviour in the Makefile, in this part:
fmt:
test -z $$(gofmt -l -s pkg/apis/druid/v1alpha1/druid_types.go)
test -z $$(gofmt -l -s pkg/controller/druid)
I'm not sure about the best way to fix it
Currently the termination grace period is set to the default, i.e. 30 seconds. Can we add support in the spec so that we can modify the grace period?
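A sketch of how this might look in a nodeSpec, assuming a pass-through terminationGracePeriodSeconds field is added (the field is the proposal here, not something the operator supports yet):

nodes:
  middlemanagers:
    nodeType: "middleManager"
    replicas: 2
    terminationGracePeriodSeconds: 300   # proposed field, copied into the pod template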
Here are some of my coordinator issues/doubts while running Druid on k8s:
Coordinators follow leader election, meaning that at any single time only one coordinator is active. So in the case of HPA, if the coordinator scales to, let's say, 2 pods, requests to the coordinators are still not load balanced, as only one of them is the leader. The idea of HPA is to scale so that multiple pods can handle requests, with the Service load balancing them.
For an HA architecture, keeping quorum, fault tolerance etc. in mind, should we always have at least 3 coordinators? Is that recommended?
I want to expose the coordinator service to end users so that they can POST their ingestion specs to the coordinator. If I use an Ingress to expose it, I need the Ingress to point to the coordinator Service. Now, how does the k8s Service know which of the 3 coordinators is active? I know Druid has ZooKeeper for service discovery, but that's for Druid's internal service discovery; an HTTP request routed through an Ingress and a k8s Service doesn't know which coordinator to hit.
I may not be conceptually aware of the Druid coordinator mechanisms; I am thinking about these more from a k8s perspective.
Thank you
On adding a toleration in the Druid cluster spec and in the makeStatefulSet function return type, the StatefulSet fails to include the toleration in its spec.
I plan to add HPA autoscaling support in the types definition for all the nodes, i.e. historicals, MiddleManagers and brokers. Currently you need to add it manually after the operator deploys the Druid cluster. Here is a prototype which can be included in the nodeSpec; we can keep this feature optional.
type AutoScaleSpec struct {
	Autoscale   *bool  `json:"autoscale,omitempty"`
	Replicas    *int32 `json:"replicas,omitempty"`
	MinReplicas *int32 `json:"minReplicas,omitempty"`
	MaxReplicas *int32 `json:"maxReplicas,omitempty"`
}
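Rendered into a nodeSpec, the proposed fields would look something like this (field names follow the json tags above; the whole block is hypothetical until the feature lands):

nodes:
  brokers:
    nodeType: "broker"
    autoscale: true       # hypothetical field from the proposed AutoScaleSpec
    minReplicas: 2
    maxReplicas: 6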
I am using kind to try some things.
When I deploy the tiny-cluster example with default values, the init script of the Docker image fails.
mkdir -p var/tmp var/druid/segments var/druid/indexing-logs var/druid/task var/druid/hadoop-tmp var/druid/segment-cache
The above part of the init script fails with the error "permission denied: cannot create directory".
When I run the containers as root (I changed the securityContext.runAsUser value to 0) everything works as expected.
Am I doing something wrong?
Add
Name string `json:"name"`
in types.go and then use it as a unique identifier for the different nodes. This gives more flexibility: I can define whatever combination my node is.

func makeNodeSpecificUniqueString(m *v1alpha1.Druid, n *v1alpha1.DruidNodeSpec) string {
	return fmt.Sprintf("druid-%s-%s", m.Name, n.Name)
}

Looking for any other approach with minimal changes to the existing code base.
@himanshug
router is needed!
@himanshug
Please cut a release. Here's a changelog to refer to.
Today, if I try to update the storage size of my volumeClaimTemplates on a NodeSpec and apply my change, the operator returns the following error:
{"level":"error","ts":1585819955.9165432,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"druid-controller","request":"partner-stats/druid-partner-stats","error":"Failed to update [StatefulSet:druid-druid-partner-stats-historicals] due to [StatefulSet.apps \"druid-druid-partner-stats-historicals\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden].","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/druid-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/druid-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/druid-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/druid-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/druid-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/druid-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/druid-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
Most Kubernetes clusters now handle PVC expansion, which only needs a pod restart. We could manage this feature.
The current implementation of this method first tries to create the resource and then checks for the metav1.StatusReasonAlreadyExists error. I have come across some custom admission webhooks that might not return the metav1.StatusReasonAlreadyExists error to indicate an already existing resource.
This proposal is to change the sdkCreateOrUpdateAsNeeded method (and also rename it to sdkCheckAndCreate) so that it first gets the resource and creates it only if it doesn't exist.
I tested the above changes in our prod environment and will create a PR for them.
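A minimal sketch of the proposed get-then-create flow, assuming a sigs.k8s.io/controller-runtime client; the function name, signature and surrounding wiring in the operator may differ:

package druid

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// sdkCheckAndCreate fetches the resource first and creates it only when it is
// genuinely absent, so admission webhooks that don't return
// StatusReasonAlreadyExists are still handled correctly.
func sdkCheckAndCreate(ctx context.Context, c client.Client, obj runtime.Object) error {
	key, err := client.ObjectKeyFromObject(obj)
	if err != nil {
		return err
	}
	existing := obj.DeepCopyObject()
	if err := c.Get(ctx, key, existing); apierrors.IsNotFound(err) {
		return c.Create(ctx, obj)
	} else if err != nil {
		return err
	}
	// The resource already exists; nothing to create.
	return nil
}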
with below....
kind: "Druid"
metadata:
name: tiny-cluster
spec:
image: apache/incubator-druid:0.16.0-incubating
startScript: /druid.sh
securityContext:
fsGroup: 1000
runAsUser: 1000
runAsGroup: 1000
services:
- spec:
type: ClusterIP
clusterIP: None
nodeSelector:
type: "druid"
commonConfigMountPath: "/opt/druid/conf/druid/cluster/_common"
....
I have a small doubt: since this operator creates all the Druid services as StatefulSets, do we really need a StatefulSet for coordinators and overlords? Is it necessary to use persistence such as EBS for them, or would something like hostPath or emptyDir do the job as well?
I propose to deploy the Druid cluster with a service account so that adequate RBACs can be applied in a prod environment.
Currently the operator cannot pull custom-built images from private registries which require authentication; add support for an imagePullSecrets option in the StatefulSet.
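The example CR in this repo already shows the field commented out; a sketch of how it might look at the cluster level (the image and secret names are illustrative):

spec:
  image: registry.example.com/acme/druid:0.16.0-incubating   # illustrative private image
  imagePullSecrets:
    - name: my-registry-secret                                # illustrative pull secret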
Currently only the operator runs with a ServiceAccount. I am not sure whether adding a ServiceAccount to the Druid cluster spec should be a required or an optional parameter.
@himanshug could you advise on the service account?
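A sketch under the assumption that an optional serviceAccountName field is added to the cluster spec and copied into each pod template (the field and the account name are hypothetical):

spec:
  image: apache/incubator-druid:0.16.0-incubating
  serviceAccountName: druid-cluster   # hypothetical optional field, propagated to all Druid pods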
nodeSelector is only supported in the Druid cluster spec. Affinities should work, but it would be good to have nodeSelector in both the cluster spec and the nodeSpec.
Hello,
I am trying to set up Druid locally on minikube; could anyone help, please? Right now, when I try to do it, the cluster pods don't run due to the resource limits set.
Thanks!
@himanshug can we add a roadmap for this operator, plus some contribution guidelines, how we can handle seamless version upgrades, and maybe running on autopilot? Can you highlight those?
Scenario:
MMs are running, and in the case of dynamic scaling there are times when HPA scales down an MM even while it is running a task. Even regardless of HPA, there should be a mechanism which protects an MM pod from being deleted whenever a task is associated with it, at all times.
Not exactly sure how to solve this, but maybe finalizers can be useful. Finalizers can make sure a pre-deletion hook runs and keep the pod in the terminating phase until the task is complete (not sure how feasible this option is, as I mostly see finalizers associated with CRDs; how can I associate one with a StatefulSet?).
Alternatively, have a preStop hook and extend the terminationGracePeriod (which is supported by the operator) to make sure the tasks can finish.
Some questions arise:
How do we make a mapping between an MM pod and the tasks associated with it? The endpoint /druid/v1/indexer/runningTasks
has the following JSON output:
[{"id":"ds_2020-04-09T17:41:09.220Z","groupId":"ds",09T17:41:09.220Z","type":"compact","createdTime":"2020-04-09T18:19:05.795Z","queueInsertionTime":"2020-04-09T18:19:05.795Z","statusCode":"RUNNING","status":"RUNNING","runnerStatusCode":"RUNNING","duration":null,"location":{"host":"10.28.6.38","port":8100,"tlsPort":-1},"dataSource":"ds","errorMsg":null}]
Here the "host":"10.28.6.38"
maps to a kubernetes ep ( kubectl get ep
). Now i need to make sure anyhow the pod associated with this host does not scale down or gets deleted at all times.
Any suggestions, ideas would be helpful.
@vDMG
Today, all our passwords are in clear text in our configuration, which is not a good security practice.
Example:
druid.metadata.storage.connector.password=toto
druid.zk.service.pwd
We could allow the user to specify some environment variables to override certain values.
Example:
env:
- name: DRUID_METADATA_STORAGE_CONNECTOR_PASSWORD
valueFrom:
secretKeyRef:
name: druid-connector
key: password
We could then substitute those values into the configuration.
We need to define strict norms for those environment variables ("_" replaces ".", all upper case, and starting with "DRUID_").
What are your thoughts on this idea? @himanshug @AdheipSingh
Everything starts fine without any errors in the logs. I am able to call APIs to check the health of brokers, coordinator and historicals, but when I call the overlord API https:///druid/indexer/v1/leader or any other indexer API, or open the web console, I get errors in the logs:
ServerDiscoverySelector - No server instance found for [druid/overlord]
2019-11-17T03:48:53,704 WARN [qtp715602332-103] org.eclipse.jetty.server.HttpChannel - /druid/indexer/v1/scaling
org.apache.druid.java.util.common.ISE: Couldn't find leader.
at org.apache.druid.discovery.DruidLeaderClient.findCurrentLeader(DruidLeaderClient.java:262) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.server.http.OverlordProxyServlet.rewriteTarget(OverlordProxyServlet.java:63) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.eclipse.jetty.proxy.ProxyServlet.service(ProxyServlet.java:63) ~[jetty-proxy-9.4.10.v20180503.jar:9.4.10.v20180503]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.http.RedirectFilter.doFilter(RedirectFilter.java:73) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.security.PreResponseAuthorizationCheckFilter.doFilter(PreResponseAuthorizationCheckFilter.java:82) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.security.AllowOptionsResourceFilter.doFilter(AllowOptionsResourceFilter.java:75) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.security.AllowAllAuthenticator$1.doFilter(AllowAllAuthenticator.java:84) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.server.security.AuthenticationWrappingFilter.doFilter(AuthenticationWrappingFilter.java:59) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.security.SecuritySanityCheckFilter.doFilter(SecuritySanityCheckFilter.java:86) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473) ~[jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:724) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.Server.handle(Server.java:531) ~[jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281) [jetty-io-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102) [jetty-io-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118) [jetty-io-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
Caused by: org.apache.druid.java.util.common.IOE: No known server
at org.apache.druid.discovery.DruidLeaderClient.getCurrentKnownLeader(DruidLeaderClient.java:297) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.discovery.DruidLeaderClient.makeRequest(DruidLeaderClient.java:132) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.discovery.DruidLeaderClient.makeRequest(DruidLeaderClient.java:140) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
at org.apache.druid.discovery.DruidLeaderClient.findCurrentLeader(DruidLeaderClient.java:259) ~[druid-server-0.16.0-incubating.jar:0.16.0-incubating]
I'm having a hard time getting the operator to launch a working cluster. From what I can tell from the logs, the coordinator and overlord do not seem to be initialized with the config in the YAML deployment (attached as help.txt). The other parts of the cluster seem to be able to find each other.
Logs for coordinator:
2020-03-09T20:21:01.946365074Z 2020-03-09T20:21:01+0000 startup service coordinator
2020-03-09T20:21:01.971997182Z Setting druid.host=10.44.2.78 in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
2020-03-09T20:21:03.19655891Z ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'org.apache.logging.log4j.simplelog.StatusLogger.level' to TRACE to show Log4j2 internal initialization logging.
Logs for overlord:
2020-03-09T21:58:57+0000 startup service overlord
Setting druid.host=10.44.3.7 in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'org.apache.logging.log4j.simplelog.StatusLogger.level' to TRACE to show Log4j2 internal initialization logging.
Logs for Broker:
logs-from-druid-cluster-brokers-in-druid-cluster-brokers-0.txt
At first I was using the prebuilt image from Docker Hub to run the operator and ran into the same errors. I am now using a Docker image built locally from a clone of the repo.
The error also shows up if I use image: "apache/incubator-druid:0.16.0-incubating"
in the configs.
I'm running out of ideas for what could be going wrong. Help would be greatly appreciated!
Hello,
Today I'm facing a blocking situation: I'm not able to add podAnnotations on my DruidNodeSpec because of this error:
Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden
Does someone know a workaround?
How should this be properly fixed in the operator?
Thanks.
On the basis of https://github.com/kubernetes-sigs/kubebuilder/tree/master/docs/testing, I am planning to add test cases for druid_types.go. The two specs to be tested are DruidClusterSpec and DruidNodeSpec, and what to assert are the // Required Params in the structs //.
Here is some sample playground code.
package v1alpha1_test
import (
. "github.com/onsi/ginkgo"
. "github.com/onsi/gomega"
"golang.org/x/net/context"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
. "github.com/druid-io/druid-operator/pkg/apis/druid/v1alpha1"
)
var _ = Describe("Druids", func() {
var (
key types.NamespacedName
created, fetched *Druid
)
BeforeEach(func() {
// Add any setup steps that needs to be executed before each test
})
AfterEach(func() {
// Add any teardown steps that needs to be executed after each test
})
// Avoid adding tests for vanilla CRUD operations because they would
// test Kubernetes API server, which isn't the goal here.
Context("Create API", func() {
It("should create an object successfully", func() {
key = types.NamespacedName{
Name: "foo",
Namespace: "default",
}
created = &Druid{
ObjectMeta: metav1.ObjectMeta{
Name: "foo",
Namespace: "default",
},
Spec: DruidClusterSpec{
// Add all the params to be tested
JvmOptions: "",
}}
// test same for nodespec
// add another context for nodespec
By("creating an API obj")
Expect(k8sClient.Create(context.TODO(), created)).To(Succeed())
fetched = &Druid{}
Expect(k8sClient.Get(context.TODO(), key, fetched)).To(Succeed())
Expect(fetched).To(Equal(created))
By("deleting the created object")
Expect(k8sClient.Delete(context.TODO(), created)).To(Succeed())
Expect(k8sClient.Get(context.TODO(), key, created)).ToNot(Succeed())
})
})
})
Any specifics you feel need to be looked at... @himanshug @akashdw
Since the YAML is large, a lot of parameters sometimes go unnoticed. Can we add OpenAPI validation for the custom resource so that it can be checked whether a REQUIRED parameter is missing?
@himanshug this requires a change in the directory structure. I'll create a WIP draft and test it out, to make sure it is backward compatible and does not break the existing structure.
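A rough sketch of what the validation could look like on the v1beta1 CRD; the exact required list should follow the // Required Params markers in druid_types.go (the fields below are only an assumption):

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: druids.druid.apache.org
spec:
  # ...existing group/names/scope/version fields...
  validation:
    openAPIV3Schema:
      type: object
      properties:
        spec:
          type: object
          required:        # reject CRs that omit required params at admission time
            - image
            - startScript
            - nodes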
add support to customize preStop?
spec:
template:
spec:
terminationGracePeriodSeconds: 4200
containers:
- name: druid-xapi-druid-middlemanagers
lifecycle:
preStop:
exec:
command:
- sh
- -c
- |
wget --post-data='{}' -q -O - http://localhost:8088/druid/worker/v1/disable > /dev/null && while [[ "$(wget -q -O - http://localhost:8088/druid/worker/v1/tasks)" != "[]" ]]; do sleep 10; done
Once #54 is done, we can submit this operator to operatorhub.io. This will help in gaining more community and contributions to the operator.
Here are the specific criteria and the checklist to be completed before submitting:
https://operatorhub.io/contribute
OLM must be integrated with the current setup; CRDs and RBAC will be deployed through this catalog:
https://github.com/operator-framework/operator-lifecycle-manager
Currently the operator does validation at the operator level. So in case we miss or misconfigure something, after we apply the CR we need to check the operator logs and describe the CR to figure out what went wrong, then manually delete the CR and apply it again.
With the SDK supporting validating webhooks, we will be able to reject invalid specs at the Kube API level.
While using a custom entrypoint for Druid which substitutes some env values during pod initialization, I saw that the mounted ConfigMaps could not be copied to the right directory by the entrypoint script (something equivalent to https://github.com/apache/druid/blob/master/distribution/docker/druid.sh#L49).
I ended up adding subPaths to the ConfigMap mounts, which basically solved that problem.
We should have this in general, too, to avoid confusion.
volumeMount := []v1.VolumeMount{
	{
		MountPath: m.Spec.CommonConfigMountPath + "/common.runtime.properties",
		Name:      "common-config-volume",
		ReadOnly:  true,
		SubPath:   "common.runtime.properties",
	},
	{
		MountPath: m.Spec.CommonConfigMountPath + "/log4j2.xml",
		Name:      "common-config-volume",
		ReadOnly:  true,
		SubPath:   "log4j2.xml",
	},
	{
		MountPath: nodeSpec.NodeConfigMountPath + "/jvm.config",
		Name:      "nodetype-config-volume",
		ReadOnly:  true,
		SubPath:   "jvm.config",
	},
	{
		MountPath: nodeSpec.NodeConfigMountPath + "/runtime.properties",
		Name:      "nodetype-config-volume",
		ReadOnly:  true,
		SubPath:   "runtime.properties",
	},
}
$ kubectl get po
NAME READY STATUS RESTARTS AGE
druid-operator-5868c9bb8c-stvsm 0/1 InvalidImageName 0 3m50s
$ kubectl logs druid-operator-5868c9bb8c-stvsm
Error from server (BadRequest): container "druid-operator" in pod "druid-operator-5868c9bb8c-stvsm" is waiting to start: InvalidImageName
What are your thoughts/plans on provisioning/terminating deep storage, mainly buckets on cloud platforms (S3, GCS, etc.), using this interface?
Should we have a separate CR in the operator with kind: DeepStorage, and a separate controller for this CR? (just a thought)
Or we could just add methods to the deepStorageManager interface, and maybe have S3 and GCS types there (currently there is only default); I think that is mainly why you added this interface.
It would be nice to have some management around deep storage.
Hello,
I'm trying to run Druid on k8s and I've been stuck for the past few days: all the pods are running but I cannot get the "wiki example" working.
The overlord keeps logging the following exception:
error":"org.skife.jdbi.v2.exceptions.CallbackFailedException: java.lang.NullPointerException: tuples must be non-null
I can see the tables created in PostgreSQL. I can also see the segments created in my S3 bucket (even though it's very slow), but the datasource is not created.
Here is my configuration:
apiVersion: "druid.apache.org/v1alpha1"
kind: "Druid"
metadata:
name: druid-cluster
spec:
image: apache/incubator-druid:0.16.1-incubating
# Optionally specify image for all nodes. Can be specify on nodes also
# imagePullSecrets:
# - name: tutu
env:
- name: AWS_REGION
value: eu-west-1
startScript: /druid.sh
securityContext:
fsGroup: 1000
runAsUser: 1000
runAsGroup: 1000
services:
- spec:
type: ClusterIP
clusterIP: None
commonConfigMountPath: "/opt/druid/conf/druid/cluster/_common"
jvm.options: |
-server
-XX:MaxDirectMemorySize=10240g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Dlog4j.debug
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
log4j.config: |
<?xml version="1.0" encoding="UTF-8" ?>
<Configuration status="WARN">
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>
common.runtime.properties: |
# Zookeeper
druid.zk.service.host=druid-cluster-zk.druid.svc.cluster.local
druid.zk.paths.base=/druid
druid.zk.service.compress=false
# Metadata Store
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://postgres.storage.svc.cluster.local/druid-metadata
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=#{POSTGRES_DRUID_PASSWORD}#
druid.metadata.storage.connector.createTables=true
# druid.metadata.storage.type=derby
# druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527//opt/var/druid_state/derby;create=true
# Deep Storage
# druid.storage.type=local
# druid.storage.storageDirectory=/druid/data/deepstorage
druid.storage.type=s3
druid.storage.bucket=#{S3_BUCKET}#
druid.storage.baseKey=druid/segments
druid.indexer.logs.directory=data/logs/
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=#{S3_BUCKET}#
druid.indexer.logs.s3Prefix=druid/indexing-logs
druid.s3.accessKey=#{S3_ACCESS_KEY}#
druid.s3.secretKey=#{S3_SECRET_KEY}#
#
# Extensions
#
druid.extensions.loadList=["druid-s3-extensions","druid-kafka-indexing-service","postgresql-metadata-storage"]
#
# Service discovery
#
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator
#
# Monitoring
#
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
#druid.emitter=noop
druid.emitter.logging.logLevel=debug
nodes:
brokers:
nodeType: "broker"
# Optionally specify for broker nodes
# imagePullSecrets:
# - name: tutu
druid.port: 8088
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/broker"
replicas: 1
livenessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8088
readinessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8088
log4j.config: |
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
services:
- metadata:
name: broker-%s-service
spec:
clusterIP: None
ports:
- name: tcp-service-port
port: 8088
targetPort: 8088
type: ClusterIP
runtime.properties: |
druid.service=druid/broker
# HTTP server threads
druid.broker.http.numConnections=5
druid.server.http.numThreads=10
# Processing threads and buffers
druid.processing.buffer.sizeBytes=1
druid.processing.numMergeBuffers=1
druid.processing.numThreads=1
druid.sql.enable=true
extra.jvm.options: |
-Xmx1G
-Xms1G
volumeMounts:
- mountPath: /druid/data
name: data-volume
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-cx-druid-cluster-broker-0
coordinators:
nodeType: "coordinator"
druid.port: 8088
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/master/coordinator-overlord"
replicas: 1
livenessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8088
readinessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8088
log4j.config: |
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
services:
- metadata:
name: coordinator-%s-service
spec:
clusterIP: None
ports:
- name: tcp-service-port
port: 8088
targetPort: 8088
type: ClusterIP
runtime.properties: |
druid.service=druid/coordinator
# HTTP server threads
druid.coordinator.startDelay=PT30S
druid.coordinator.period=PT30S
druid.coordinator.asOverlord.enabled=false
# Configure this coordinator to also run as Overlord
# druid.coordinator.asOverlord.enabled=true
# druid.coordinator.asOverlord.overlordService=druid/overlord
# druid.indexer.queue.startDelay=PT30S
# druid.indexer.runner.type=remote
# druid.indexer.storage.type=metadata
extra.jvm.options: |
-Xmx1G
-Xms1G
volumeMounts:
- mountPath: /druid/data
name: data-volume
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-cx-druid-cluster-coordinator-0
historicals:
nodeType: "historical"
druid.port: 8088
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/historical"
replicas: 1
livenessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8088
readinessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8088
log4j.config: |
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
services:
- metadata:
name: historical-%s-service
spec:
clusterIP: None
ports:
- name: tcp-service-port
port: 8088
targetPort: 8088
type: ClusterIP
runtime.properties: |
druid.service=druid/historical
druid.server.http.numThreads=5
druid.processing.buffer.sizeBytes=1
druid.processing.numMergeBuffers=1
druid.processing.numThreads=1
# Segment storage
druid.segmentCache.locations=[{\"path\":\"/druid/data/segments\",\"maxSize\":10737418240}]
druid.server.maxSize=10737418240
extra.jvm.options: |
-Xmx1G
-Xms1G
volumeMounts:
- mountPath: /druid/data
name: data-volume
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-cx-druid-cluster-historical-0
middlemanagers:
nodeType: "middleManager"
druid.port: 8091
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/middleManager"
replicas: 1
ports:
-
containerPort: 8100
name: peon-0-pt
-
containerPort: 8101
name: peon-1-pt
-
containerPort: 8102
name: peon-2-pt
-
containerPort: 8103
name: peon-3-pt
-
containerPort: 8104
name: peon-4-pt
livenessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8091
readinessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8091
log4j.config: |
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
runtime.properties: |
druid.service=druid/middleManager
druid.worker.capacity=4
druid.indexer.runner.javaOpts=-server -XX:MaxDirectMemorySize=10240g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/druid/data/tmp -Dlog4j.debug -XX:+UnlockDiagnosticVMOptions -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=50 -XX:GCLogFileSize=10m -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:+UseG1GC -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -XX:HeapDumpPath=/druid/data/logs/peon.%t.%p.hprof -Xms10G -Xmx10G
druid.indexer.task.baseTaskDir=/druid/data/baseTaskDir
druid.server.http.numThreads=10
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=1
druid.indexer.fork.property.druid.processing.numMergeBuffers=1
druid.indexer.fork.property.druid.processing.numThreads=1
# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
druid.indexer.fork.property.druid.processing.numThreads=1
services:
- metadata:
name: middlemanager-%s-service
spec:
clusterIP: None
ports:
-
name: tcp-service-port
port: 8091
targetPort: 8091
-
name: peon-port-0
port: 8100
targetPort: 8100
-
name: peon-port-1
port: 8101
targetPort: 8101
-
name: peon-port-2
port: 8102
targetPort: 8102
-
name: peon-port-3
port: 8103
targetPort: 8103
-
name: peon-port-4
port: 8104
targetPort: 8104
type: ClusterIP
extra.jvm.options: |
-Xmx1G
-Xms1G
volumeMounts:
- mountPath: /druid/data
name: data-volume
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-cx-druid-cluster-middlemanager-0
overlords:
livenessProbe:
initialDelaySeconds: 50
httpGet:
path: /status/health
port: 8090
readinessProbe:
initialDelaySeconds: 50
httpGet:
path: /status/health
port: 8090
log4j.config: |
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
druid.port: 8090
extra.jvm.options: |
-Xmx4G
-Xms4G
nodeType: overlord
nodeConfigMountPath: /opt/druid/conf/druid/cluster/master/coordinator-overlord
replicas: 1
runtime.properties: |
druid.service=druid/overlord
druid.indexer.queue.startDelay=PT2M
druid.indexer.queue.restartDelay=PT2M
druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata
services:
- metadata:
name: overlord-%s-service
spec:
clusterIP: None
ports:
- name: tcp-service-port
port: 8090
targetPort: 8090
type: ClusterIP
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-cx-druid-cluster-overlord-0
volumeMounts:
- mountPath: /druid/data
name: data-volume
routers:
livenessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8888
readinessProbe:
initialDelaySeconds: 30
httpGet:
path: /status/health
port: 8888
log4j.config: |
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
druid.port: 8888
extra.jvm.options: |
-Xmx512m
-Xms512m
nodeType: router
nodeConfigMountPath: /opt/druid/conf/druid/cluster/query/router
replicas: 1
runtime.properties: |
druid.service=druid/router
druid.plaintextPort=8888
# HTTP proxy
druid.router.http.numConnections=50
druid.router.http.readTimeout=PT5M
druid.router.http.numMaxThreads=100
druid.server.http.numThreads=100
# Service discovery
druid.router.defaultBrokerServiceName=druid/broker
druid.router.coordinatorServiceName=druid/coordinator
# Management proxy to coordinator / overlord: required for unified web console.
druid.router.managementProxy.enabled=true
services:
- metadata:
name: router-%s-service
spec:
clusterIP: None
ports:
- name: tcp-service-port
port: 8888
targetPort: 8888
type: ClusterIP
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-cx-druid-cluster-routeur-0
volumeMounts:
- mountPath: /druid/data
name: data-volume
Hi, can we roll out a release mentioning the features added?
What I would like to automate here is the second step. I can easily delete the StatefulSet using the operator with cascade; what I am not sure about is from which events the operator will know that the k8s API will not update the StatefulSet. If we can get this, it will be a big help.
@himanshug any thoughts on what could help us trigger this command?
I have my deep storage set to local. I am using hostPath in the MiddleManager to push data to a local SSD. Somehow it keeps failing to write to deep storage.
Here is my MiddleManager spec.
I am using deep storage set as below; not sure if I am missing anything.
deepStorage:
spec:
properties: |-
druid.storage.type=local
druid.storage.storageDirectory=data/segments
druid.indexer.logs.type=file
druid.indexer.logs.directory=data/logs/
type: default
middlemanagers:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
-
matchExpressions:
-
key: node-type
operator: In
values:
- druid-data
druid.port: 8091
extra.jvm.options: |-
-Xmx4G
-Xms4G
nodeType: middleManager
nodeConfigMountPath: /opt/druid/conf/druid/cluster/data/middlemanager
log4j.config: |-
<Configuration status="WARN">
<Appenders>
<Console name="logline" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
<Console name="msgonly" target="SYSTEM_OUT">
<PatternLayout pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="logline"/>
</Root>
<Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="info">
<AppenderRef ref="msgonly"/>
</Logger>
</Loggers>
</Configuration>
podDisruptionBudgetSpec:
maxUnavailable: 1
ports:
-
containerPort: 8100
name: peon-0-pt
-
containerPort: 8101
name: peon-1-pt
-
containerPort: 8102
name: peon-2-pt
-
containerPort: 8103
name: peon-3-pt
-
containerPort: 8104
name: peon-4-pt
replicas: 2
resources:
limits:
cpu: "2"
memory: 5Gi
requests:
cpu: "2"
memory: 5Gi
runtime.properties: |-
druid.service=druid/middleManager
druid.plaintextPort=8091
# Number of tasks per middleManager
druid.worker.capacity=4
# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task
# HTTP server threads
druid.server.http.numThreads=60
# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
druid.indexer.fork.property.druid.processing.numThreads=1
services:
-
spec:
clusterIP: None
ports:
-
name: tcp-service-port
port: 8091
targetPort: 8091
-
name: peon-port-0
port: 8100
targetPort: 8100
-
name: peon-port-1
port: 8101
targetPort: 8101
-
name: peon-port-2
port: 8102
targetPort: 8102
-
name: peon-port-3
port: 8103
targetPort: 8103
-
name: peon-port-4
port: 8104
targetPort: 8104
type: ClusterIP
tolerations:
-
effect: NoSchedule
key: node-role.kubernetes.io/master
operator: Exists
volumeMounts:
-
mountPath: /opt/apache-druid-0.16.0-incubating/data
name: data-volume
volumes:
-
hostPath:
path: /data
name: data-volume
overlords:
status:
acceptedNames:
kind: ""
plural: ""
conditions: []
storedVersions: []
kubectl get druid my-druid -o yaml -n my-namespace
In the above output we are not getting a status; basically, all k8s objects using the runtime.Object interface are responsible for sending out a status, which can also be polled via the /status endpoint.
Just curious whether this is something we did on purpose or missed out in the CRD spec.
I'll be evaluating and fixing this anyway. :)
@himanshug
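For reference, a sketch of enabling the status subresource on the v1beta1 CRD, assuming the Druid type then gains a status struct for the operator to write:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: druids.druid.apache.org
spec:
  # ...existing fields...
  subresources:
    status: {}   # exposes the /status endpoint so status can be written and polled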