doris-operator's Introduction

doris-operator

Doris-operator creates, configures, and manages Doris clusters running on Kubernetes. The operator deploys and manages the fe, be, cn, and broker components. Users define a custom DorisCluster resource to deploy Doris as needed.

Features

  • Create Doris clusters through a custom DorisCluster resource
  • Customized storage provisioning (VolumeClaim templates)
  • Customized pod templates
  • Doris configuration management
  • Doris version upgrades
  • HorizontalPodAutoscaler v1 and v2 support for compute nodes

Requirements

  • Kubernetes 1.19+
  • Each Doris component needs at least 8 CPU cores and 8 GB of memory to start normally.

Installation

  1. Install custom resource definitions:
kubectl create -f https://raw.githubusercontent.com/selectdb/doris-operator/$(curl -s  https://api.github.com/repos/selectdb/doris-operator/releases/latest | grep tag_name | cut -d '"' -f4)/config/crd/bases/doris.selectdb.com_dorisclusters.yaml
  2. Install the operator with its RBAC rules:
    The default namespace is doris. To deploy into a different namespace, download the YAML and update its namespace fields.
kubectl apply -f https://raw.githubusercontent.com/selectdb/doris-operator/$(curl -s  https://api.github.com/repos/selectdb/doris-operator/releases/latest | grep tag_name | cut -d '"' -f4)/config/operator/operator.yaml
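
Both install commands resolve the latest release tag inline. The sketch below resolves it once and optionally rewrites the namespace before applying. The function names are hypothetical, and set_namespace assumes the manifest spells the default namespace as the literal field `namespace: doris`; any Namespace object in the file would still need to be renamed by hand.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of doris-operator.

# Read the GitHub "latest release" JSON on stdin and print the tag name,
# using the same grep/cut approach as the install commands.
extract_tag() {
  grep '"tag_name"' | cut -d '"' -f4
}

# Print the raw.githubusercontent.com URL of a repository file at a tag.
# usage: manifest_url <tag> <path-in-repo>
manifest_url() {
  printf 'https://raw.githubusercontent.com/selectdb/doris-operator/%s/%s\n' "$1" "$2"
}

# Read a manifest on stdin and rewrite "namespace: doris" fields to $1.
# Assumption: the default namespace appears exactly as "namespace: doris".
set_namespace() {
  sed "s/^\([[:space:]]*namespace:[[:space:]]*\)doris[[:space:]]*\$/\1$1/"
}

# Possible usage (needs network access and a configured kubectl):
# TAG=$(curl -s https://api.github.com/repos/selectdb/doris-operator/releases/latest | extract_tag)
# curl -s "$(manifest_url "$TAG" config/operator/operator.yaml)" \
#   | set_namespace my-namespace | kubectl apply -f -
```

Review the rewritten manifest before applying it, since a plain text substitution cannot know about every place a namespace may be referenced.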

Get Started to Deploy Doris

The Quick Start Guide has several examples of deploying Doris on Kubernetes, each covering a different deployment scenario.
The example requests 8 cores and 16 GB of memory for every fe and be, and deploys 3 fe and 3 be. Please confirm that the K8s cluster has enough resources.
To deploy only fe and be without a persistentVolume:

kubectl apply -f https://raw.githubusercontent.com/selectdb/doris-operator/$(curl -s  https://api.github.com/repos/selectdb/doris-operator/releases/latest | grep tag_name | cut -d '"' -f4)/doc/examples/doriscluster-sample.yaml

doriscluster-sample-storageclass.yaml shows how to deploy Doris with a StorageClass that provides persistent volumes.

Notice

  1. Currently the operator supports only fqdn mode for deploying Doris on Kubernetes. When the operator deploys containers from the official image, the relevant service sets enable_fqdn_mode to true automatically. When the Doris Docker container runs without the k8s operator, fqdn mode is off by default. For other configuration options for deploying Doris on Kubernetes, refer to example/doriscluster-sample-configmap.yaml.
  2. fe and be logs are available through kubectl logs -ndoris -f ${pod_name} and are also written to /opt/apache-doris/fe/log and /opt/apache-doris/be/log inside the pod. If the K8s cluster has no log processing system, mounting a volume for the log directory is a good idea. For the configuration that mounts a log volume, see docexample/doriscluster-sample-storageclass.yaml.

doris-operator's People

Contributors

austinlmayes, catpineapple, freeoneplus, intelligentfu, kpfly, lemonlitree, mel3c, mklzl, xiedeyantu

doris-operator's Issues

Pod config annotations do not take effect

Annotations in baseSpec are currently not added to the pod annotations. Allow annotations to be set so that the operator can adapt to private deployment environments, for example using an annotation to assign a static IP.

Updating a service fails

When the service config is updated, the doris operator reports this error:
metadata.resourceversion invalid value ''’ must be specified for an update.

bug: volume resize does not work

Description

I provisioned the cluster with the following spec and it works.

DorisCluster spec
  feSpec:
    replicas: 2
    image: selectdb/doris.fe-ubuntu:2.1.2
    service:
      type: LoadBalancer
    persistentVolumes:
    - mountPath: /opt/apache-doris/fe/doris-meta
      name: fetest
      persistentVolumeClaimSpec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
    - mountPath: /opt/apache-doris/fe/log
      name: felog
      persistentVolumeClaimSpec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  beSpec:
    replicas: 3
    image: selectdb/doris.be-ubuntu:2.1.2
    service:
      type: LoadBalancer
    persistentVolumes:
    - mountPath: /opt/apache-doris/be/storage
      name: betest
      persistentVolumeClaimSpec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 40Gi
    - mountPath: /opt/apache-doris/be/log
      name: belog
      persistentVolumeClaimSpec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi

I changed the storage size for all volumes, and the DorisCluster CR spec was updated. However, the change is not applied to the underlying resources such as the StatefulSet and PVCs. It looks like there is no reconcile logic for this.

There are no error logs in the doris operator. Here are the logs.

doris-operator log
I0607 09:52:22.652461       1 doriscluster_controller.go:91] DorisClusterReconciler reconcile the update crd name test-c1 namespace doris
I0607 09:52:22.652602       1 client.go:35] CreateOrUpdateService service Name, Ports, Selector, ServiceType, Labels have not change namespace doris name test-c1-fe-internal
I0607 09:52:22.652670       1 client.go:35] CreateOrUpdateService service Name, Ports, Selector, ServiceType, Labels have not change namespace doris name test-c1-fe-service
I0607 09:52:22.653179       1 statefulset.go:96] the statefulset name test-c1-fe new hash value 1386567562 old have value 1386567562
I0607 09:52:22.653196       1 client.go:54] ApplyStatefulSet Sync exist statefulset name=test-c1-fe, namespace=doris, equals to new statefulset.
I0607 09:52:22.653247       1 client.go:35] CreateOrUpdateService service Name, Ports, Selector, ServiceType, Labels have not change namespace doris name test-c1-be-internal
I0607 09:52:22.653365       1 client.go:35] CreateOrUpdateService service Name, Ports, Selector, ServiceType, Labels have not change namespace doris name test-c1-be-service
I0607 09:52:22.653812       1 statefulset.go:96] the statefulset name test-c1-be new hash value 3597274972 old have value 3597274972
I0607 09:52:22.653826       1 client.go:54] ApplyStatefulSet Sync exist statefulset name=test-c1-be, namespace=doris, equals to new statefulset.
I0607 09:52:22.653837       1 controller.go:201] Doris cluster is not have cn
I0607 09:52:22.653845       1 controller.go:201] Doris cluster is not have cn

The DorisCluster status is also available.

NAME     FESTATUS    BESTATUS    CNSTATUS   BROKERSTATUS
test-c1   available   available 

Expectation

The volume size change in the DorisCluster spec should be applied.

How to reproduce

  1. Create a cluster with any size of volume
  2. Increase that volume size after the cluster is up
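
While the operator does not reconcile storage changes, one possible manual workaround is to patch the PVCs directly. This is a sketch under assumptions, not the operator's behavior: the StorageClass must allow volume expansion, and the PVCs are assumed to follow the usual StatefulSet naming `<claim-template>-<statefulset>-<ordinal>` with the persistentVolumes name (fetest, belog, ...) as the claim template name. Check the real names with kubectl get pvc first. The helper names are hypothetical, and the commands are only printed, never executed.

```shell
#!/bin/sh
# Print the PVC names a StatefulSet would create for one claim template.
# usage: pvc_names <claim-template> <statefulset> <replicas>
pvc_names() {
  i=0
  while [ "$i" -lt "$3" ]; do
    printf '%s-%s-%s\n' "$1" "$2" "$i"
    i=$((i + 1))
  done
}

# Print (do not run) one kubectl patch command per PVC requesting a new size.
# usage: resize_commands <namespace> <claim-template> <statefulset> <replicas> <size>
resize_commands() {
  pvc_names "$2" "$3" "$4" | while read -r pvc; do
    printf 'kubectl -n %s patch pvc %s --type merge -p %s\n' "$1" "$pvc" \
      "'{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$5\"}}}}'"
  done
}

# Example for the be storage volume of the cluster in this report:
# resize_commands doris betest test-c1-be 3 80Gi
```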

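Is there a plan to support scale-in of FE and BE nodes?

In the current 1.2.0 operator, FE and BE scale-out appears to be handled by each component's entrypoint.sh script, which registers the node with the cluster.

When scaling in, however, the operator only reduces the Pods through the sts and does not handle the removed nodes inside the cluster, for example by safely decommissioning BE nodes.

After a be scale-in, show backends reports the removed nodes in an error state.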

Rewrite the entrypoint scripts

The entrypoint scripts register each Doris component in the cluster.
The scripts rely on the output of show frontends; however, they use a fixed column index to fetch values, which breaks if the column order of the output changes.

  • fe_entrypoint.sh: find the master by parsing the IsMaster column by name, so that a change in column order does not matter. (fe entrypoint)
  • be_entrypoint.sh: find the master by parsing the IsMaster column by name, so that a change in column order does not matter. (be entrypoint)
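
A minimal sketch of looking up columns by name rather than by position. It is not the actual entrypoint code; it assumes the query is run with the mysql client in batch mode, which prints a tab-separated header row followed by data rows. The function name find_master_host is hypothetical.

```shell
#!/bin/sh
# Read tab-separated `show frontends` output (header row first) on stdin and
# print the Host of the first row whose IsMaster column is "true".
# Columns are located by header name, so their order does not matter.
# Exit status: 0 = master found, 1 = no master row, 2 = header lacks a column.
find_master_host() {
  awk -F '\t' '
    BEGIN { rc = 1 }
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("Host" in col) || !("IsMaster" in col)) { rc = 2; exit }
      next
    }
    $(col["IsMaster"]) == "true" { print $(col["Host"]); rc = 0; exit }
    END { exit rc }
  '
}

# Possible usage inside an entrypoint (FE_HOST and the port are placeholders):
# mysql -uroot -P9030 -h "$FE_HOST" --batch -e 'show frontends;' | find_master_host
```

Because the header is parsed on every call, a newly inserted column such as ArrowFlightSqlPort does not shift the result.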

[Feature] support pod crash debug

When a Doris pod crashes and cannot restart successfully, we should be able to exec into the pod for manual debugging.
When a pod crashes, we can add the annotation selectdb.com.doris/runmode=debug so that the next start of the pod runs in debug mode, and the pod re-enters Running status on that restart. When debugging is finished, delete the pod or remove the annotation, and the pod runs in normal mode at its next start.
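
The annotation workflow proposed above could be driven with kubectl annotate, sketched below. The helper names are hypothetical, and whether a given operator or image version honors the annotation is not confirmed here. Setting KUBECTL=echo prints the arguments instead of running kubectl.

```shell
#!/bin/sh
# Annotation key taken from the proposal; the value "debug" selects debug mode.
RUNMODE_KEY="selectdb.com.doris/runmode"

# Mark a pod so that its next start runs in debug mode.
# usage: enable_debug <namespace> <pod>
enable_debug() {
  "${KUBECTL:-kubectl}" annotate pod "$2" -n "$1" "${RUNMODE_KEY}=debug" --overwrite
}

# Remove the annotation; a trailing "-" after the key is kubectl's removal syntax.
# usage: disable_debug <namespace> <pod>
disable_debug() {
  "${KUBECTL:-kubectl}" annotate pod "$2" -n "$1" "${RUNMODE_KEY}-"
}

# Example: enable_debug doris test-c1-be-0
```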

[Feature] Support for Arrow Flight SQL Port Configuration in Doris Operator

Description:

Apache Doris v2.1 supports the Arrow Flight SQL channel out of the box. To enable Arrow Flight SQL, arrow_flight_sql_port needs to be configured in fe/conf/fe.conf and be/conf/be.conf [1]. Adding these properties to feSpec.configMap.fe.conf and beSpec.configMap.be.conf enables the Arrow Flight SQL port on BE and FE.

However, this configuration does not expose the port on the pod and the service when deploying Doris using the DorisCluster CRD. Any attempt to manually expose it by editing the StatefulSet and Service results in the following error (with PVC as well):

2024-06-03 16:28:33,158 INFO (UNKNOWN fe_a9d8b97b_7b57_4961_ac21_0ffa0bbcc532(-1)|1) [Env.waitForReady():1067] wait catalog to be ready. feType:UNKNOWN isReady:false, counter:101 reason:

This issue persists even when metadata_failure_recovery=true is set in fe.conf [2].
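
For the fe.conf and be.conf entries mentioned in this description, a small helper can set a key without creating duplicate lines before the file is placed in a ConfigMap. This is only a sketch: set_conf is a hypothetical name, and the port number in the usage line is an arbitrary placeholder, not a recommended value.

```shell
#!/bin/sh
# Set "key = value" in a properties-style file: replace an existing
# uncommented entry for the key, or append a new line if there is none.
# usage: set_conf <file> <key> <value>
set_conf() {
  tmp="$1.tmp.$$"
  awk -v k="$2" -v v="$3" '
    BEGIN { FS = "=" }
    { key = $1; gsub(/[[:space:]]/, "", key) }
    key == k { if (!done) print k " = " v; done = 1; next }
    { print }
    END { if (!done) print k " = " v }
  ' "$1" > "$tmp" && mv "$tmp" "$1"
}

# Example (placeholder port): set_conf fe.conf arrow_flight_sql_port 9090
```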

Request:

It would be highly beneficial if the doris-operator could support the configuration for the Arrow Flight SQL port on BE and FE. This enhancement will facilitate seamless deployment and usage of the Arrow Flight SQL feature in Doris.

References:
[1] Arrow Flight SQL in Apache Doris
[2] apache/doris#4322

AdminUser does not work as expected

Hi team,

Following https://doris.apache.org/zh-CN/docs/install/cluster-deployment/k8s-deploy/root-user-use/, I am trying to set the root password.

Step 1: Apply the DorisCluster without adminUser.name and adminUser.password. The default empty password works for me, and both be and fe start well.
Step 2: Log in to Doris with the mysql CLI, run SHOW ALL GRANTS; and set password for 'root' = password('pwd'), then verify login with the new password from the mysql CLI.
Step 3: Update the DorisCluster to add adminUser.name and adminUser.password, and apply it with kubectl. The newly started BE keeps logging the error message.

The BE pod logs the following error:

[Thu Jun 13 03:42:23 UTC 2024] [info] use root no password show frontends result ERROR 1045 (28000): Access denied for user '[email protected]' (using password: NO) .
ERROR 1045 (28000): Access denied for user '[email protected]' (using password: YES)

Images:

  1. selectdb/doris.k8s-operator:1.5.2
  2. selectdb/doris.be-ubuntu:2.1.3
  3. selectdb/doris.fe-ubuntu:2.1.3

Also check the issue: #131

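Cluster name and namespace given on the helm install command line do not take effect

helm version: v3.7.1

When installing with the following command, the specified cluster name and namespace do not take effect:

helm install -n default doris-test ./ -f values.yaml

Output (which looks correct):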
NAME: doris-test
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing doris-1.4.1

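However, the cluster was still deployed into the doris namespace, with the cluster name doriscluster-helm.

It looks like the templates defined in _helpers.tpl have a problem.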

Support more than one ConfigMap for Doris

The DorisCluster CR currently provides one ConfigMap per Doris component. The ConfigMap is mounted as files in a read-only volume at "/etc/doris/" for the application to read.
The files in "/etc/doris" are used to start the Doris components. However, the plugins provided by Doris also need configuration to start. Doris Operator should provide the ability to add config files for plugins.

[Feature] Pre-initialize host configuration for deploying Doris

Present situation

Hosts that run Doris must satisfy some restrictive limits, for example:
vm.max_map_count,
ulimit -n
swapoff -a
In user environments these limits are often not initialized, and users must then configure them by themselves, which is a burden. Doris-Operator should pre-configure these limits.

Solution

Add a DisableDefaultPreInitial field to the CRD.
Construct an initContainer that performs the initial configuration of the host.
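
As an illustration of what such pre-initialization would have to verify, the sketch below checks the three limits on a Linux host. It is not the proposed initContainer: the helper names are hypothetical, and the thresholds in the usage lines are placeholders to be replaced with the values documented for your Doris version.

```shell
#!/bin/sh
# Return 0 when <actual> is at least <required>; otherwise warn on stderr.
# usage: check_min <name> <actual> <required>
check_min() {
  if [ "$2" -ge "$3" ]; then
    return 0
  fi
  printf '%s is %s, expected at least %s\n' "$1" "$2" "$3" >&2
  return 1
}

# Count active swap devices from /proc/swaps-style input on stdin
# (the first line is a header).
swap_devices() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# Possible usage on a host (placeholder thresholds):
# check_min vm.max_map_count "$(sysctl -n vm.max_map_count)" 2000000
# check_min "ulimit -n" "$(ulimit -n)" 65536
# [ "$(swap_devices < /proc/swaps)" -eq 0 ] || echo "swap is enabled" >&2
```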

[bug] Running the example from the docs, only one fe starts successfully

Reference documentation: https://doris.apache.org/zh-CN/docs/install/k8s-deploy

kubectl apply -f https://raw.githubusercontent.com/selectdb/doris-operator/master/config/crd/bases/doris.selectdb.com_dorisclusters.yaml    
kubectl apply -f https://raw.githubusercontent.com/selectdb/doris-operator/master/config/operator/operator.yaml
kubectl apply -f https://raw.githubusercontent.com/selectdb/doris-operator/master/doc/examples/doriscluster-sample.yaml

The only change was renaming doriscluster-sample to doriscluster.

Error message:

ERROR (stateListener|97) [Env.checkCurrentNodeExist():1601] current node doriscluster-fe-1.doriscluster-fe-internal.doris.svc.cluster.local:9010 is not added to the cluster, will exit. Your FE IP maybe changed, please set 'priority_networks' config in fe.conf properly.

The message says that priority_networks needs to be set. I added the following parameters to fe.conf and mounted it into the pod, but it had no effect:

priority_networks = 10.42.0.0/16
enable_fqdn_mode = true

Could you help figure out what is wrong?

Doris deploys successfully, but after updating the yaml and redeploying, the be, fe, and other pods are not restarted automatically

The operator reports an error: 2024-04-10T12:04:12Z ERROR Reconciler error {"controller": "doriscluster", "controllerGroup": "doris.selectdb.com", "controllerKind": "DorisCluster", "DorisCluster": {"name":"doriscluster-zelos","namespace":"doris"}, "namespace": "doris", "name": "doriscluster-zelos", "reconcileID": "cb2fa73c-c75e-4f07-9807-388357fb09fb", "error": "Service "doriscluster-zelos-fe-service" is invalid: spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset"}

Doris 2.1 adjusted the output of show frontends

The changed result affects cluster initialization and master node discovery, so the operator needs to adapt.

mysql> show frontends;
+-----------------------------------------+----------------------------------------------------------------------------------+-------------+----------+-----------+---------+--------------------+----------+----------+-----------+------+-------+-------------------+---------------------+---------------------+----------+--------+------------------------------+------------------+
| Name | Host | EditLogPort | HttpPort | QueryPort | RpcPort | ArrowFlightSqlPort | Role | IsMaster | ClusterId | Join | Alive | ReplayedJournalId | LastStartTime | LastHeartbeat | IsHelper | ErrMsg | Version | CurrentConnected |
+-----------------------------------------+----------------------------------------------------------------------------------+-------------+----------+-----------+---------+--------------------+----------+----------+-----------+------+-------+-------------------+---------------------+---------------------+----------+--------+------------------------------+------------------+
| fe_0dba4cb9_55a4_44ac_b82c_a81d8137532f | doriscluster-sample-fe-0.doriscluster-sample-fe-internal.doris.svc.cluster.local | 9010 | 8030 | 9030 | 9020 | -1 | FOLLOWER | true | 847488715 | true | true | 39 | 2024-01-16 05:26:10 | 2024-01-16 05:28:41 | true | | doris-0.0.0-trunk-9ef4e49307 | Yes |
+-----------------------------------------+----------------------------------------------------------------------------------+-------------+----------+-----------+---------+--------------------+----------+----------+-----------+------+-------+-------------------+---------------------+---------------------+----------+--------+------------------------------+------------------+
1 row in set (0.00 sec)
