datavane / datasophon

The next generation of cloud-native big data management expert, aiming to help users rapidly build stable, efficient, and scalable cloud-native platforms for big data.

Home Page: https://datasophon.github.io/datasophon-website/

License: Apache License 2.0

Java 66.49% Shell 10.26% FreeMarker 1.90% Vue 20.62% Stylus 0.01% Less 0.70% Dockerfile 0.02%

doris kubernetes spark yarn cloudnative easy-to-use

datasophon's People

miyingljw wky15163053670 lovingfeel kxg916361108 wuchunfu shuiliuwusheng gtk96 wolfboys xiaozhch5 xingjieshitu gwxog lishijiang diaohancai meitianjinbu liuxuchen hxp0618 emotioncai wanghe-sudo jred1989 lugela tieke1121 webzmj530 jackchenc-lang wmtbnbo gavinke seewei1985 liuzhao-lz jon-qj boy-xiaozhang airtosupply wuhui23 quanzhian yixingod yepeng05 avatartaier kerwinchina lidengtai eyys zzm0809 zhenhb liugddx cn-tingguo bigdatas marlon4dashen phoenixhadoop hitozhu zhunzhong chyx018 rookeyframe cghhnty lipengyu gongweibiao1986 zhu-ming-ye goomhow ruanwenjun sqlking22 guolianwei princesssipi gkcreate freeoneplus shinelord erzhen1379 xedcoing liangxinji duowan1520 hyzhang13 stopit ijihang castiel-chenxian zmusn44 zq0757 gzdproc wwhui alexander1005 minge-b chenss-1 adam1iu synpore liuxin319 olderbabyml yuemeng001 zhaoyangjun0328 zhzhenqin jehol1983 kick156 zhangdw123 itsharex leeoo caishunfeng hhhcommon xiaoxiaoxing88 zuoqunrpc gankhe waylee2015 guanpx jl570156375 killerddd3 e1asy zhangxin1124 zhouyinyan

datasophon's Issues

[Bug]: New version started from the backend reports Missing Hive Execution Jar: /opt/datasophon/hive-3.1.0/lib/hive-exec-*.jar

What happened?

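A bug happened!
When starting from the backend in the new version, the following error is reported: Missing Hive Execution Jar: /opt/datasophon/hive-3.1.0/lib/hive-exec-*.jar.
The path should be /opt/datasophon/hive-3.1.3/lib/hive-exec-*.jar. The registered Hive environment variable also looks wrong; it is not the latest one.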

Additional Information

No response

[Feature][controller]Add component dependency verification

For example, when adding HDFS, verify that its dependency ZooKeeper has already been added.
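A minimal sketch of such a check follows. The class name `DependencyVerifier`, the service identifiers, and the dependency table are all hypothetical and only illustrate the idea; they are not taken from the DataSophon code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: reports which dependencies of a service are not installed yet. */
final class DependencyVerifier {

    // Assumed dependency table (service -> services that must already be added).
    static final Map<String, List<String>> DEPENDENCIES = Map.of(
            "HDFS", List.of("ZOOKEEPER"),
            "KAFKA", List.of("ZOOKEEPER"));

    private DependencyVerifier() {
    }

    /** Returns the dependencies of the given service that are missing from the installed set. */
    static List<String> missingDependencies(String service, Set<String> installed) {
        List<String> missing = new ArrayList<>();
        for (String dependency : DEPENDENCIES.getOrDefault(service, List.of())) {
            if (!installed.contains(dependency)) {
                missing.add(dependency);
            }
        }
        return missing;
    }
}
```

A controller could reject the request when the returned list is not empty and show the missing services to the user.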

[Feature Request]: add param check on controller

Tell us what feature you want?

Add parameter checks in the controller.

[Feature Request]: Processing constants in ActorUtils class

Tell us what feature you want?

The constants can be declared as "public static final String ...".

[Bug]: flink/spark on yarn: submitted by user root application rejected by placement rules.

/opt/datasophon/flink-1.15.2/bin/flink run -t yarn-per-job /opt/datasophon/flink-1.15.2/examples/batch/WordCount.jar --input /test/input/word.txt --output /test/output/fwordcount/

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/datasophon/flink-1.15.2/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/datasophon/hadoop-3.3.3/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-12-09 17:38:31,895 WARN org.apache.flink.yarn.configuration.YarnLogConfigUtil [] - The configuration directory ('/opt/datasophon/flink-1.15.2/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2022-12-09 17:38:32,108 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2022-12-09 17:38:32,120 WARN org.apache.flink.yarn.YarnClusterDescriptor [] - Job Clusters are deprecated since Flink 1.15. Please use an Application Cluster/Application Mode instead.
2022-12-09 17:38:32,243 INFO org.apache.hadoop.conf.Configuration [] - resource-types.xml not found
2022-12-09 17:38:32,243 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils [] - Unable to find 'resource-types.xml'.
2022-12-09 17:38:32,286 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - The configured JobManager memory is 1600 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 448 MB may not be used by Flink.
2022-12-09 17:38:32,286 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2022-12-09 17:38:32,286 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Cluster specification: ClusterSpecification{masterMemoryMB=1600, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2022-12-09 17:38:33,769 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Removing 'localhost' Key: 'jobmanager.bind-host' , default: null (fallback keys: []) setting from effective configuration; using '0.0.0.0' instead.
2022-12-09 17:38:33,770 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Removing 'localhost' Key: 'taskmanager.bind-host' , default: null (fallback keys: []) setting from effective configuration; using '0.0.0.0' instead.
2022-12-09 17:38:33,799 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Submitting application master application_1670577058759_0008

The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Could not deploy Yarn job cluster.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:836)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:247)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1078)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1156)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1156)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:491)
at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:82)
at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1053)
at org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:132)
at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
... 11 more
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1670577058759_0008 to YARN : Reject application application_1670577058759_0008 submitted by user root application rejected by placement rules.
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:336)
at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:1240)
at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:616)
at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:484)
... 21 more
2022-12-09 17:38:33,842 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Cancelling deployment from Deployment Failure Hook
2022-12-09 17:38:33,843 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Killing YARN application
2022-12-09 17:38:33,848 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl [] - Killed application application_1670577058759_0008
2022-12-09 17:38:33,849 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deleting files in hdfs://nameservice1/user/root/.flink/application_1670577058759_0008.

[Feature][server] Add Kafka Manager

Extensibility, can I customize big data components and versions through configuration files?

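Will the manager and the front end be open-sourced?

Which chapter do you think needs improvement?

Will the manager and the front end be open-sourced?

What information do you think needs to be added?

Will the manager and the front end be open-sourced?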

[Feature Request]: add hdfs and yarn decommission

Tell us what feature you want?

add hdfs and yarn decommission

[Bug]: Optimize state synchronization

What happened?

A bug happened!

Additional Information

When 113 DataNodes are added in a batch, only 23 are shown as successful on the page; the rest are shown as failed and only slowly change to successful later.
Please optimize status synchronization and display intermediate states (such as a progress bar) to avoid misleading users.

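[Feature][server] Add disk failure alert metric

When a disk fails and can no longer be read or written, raise an alert that shows the host and the disk location.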

[Feature Request]: If the operation on the master node fails, do not execute it on the slave nodes

Tell us what feature you want?

If the operation on the master node fails, it should not be executed on the slave nodes.
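The requested behaviour can be sketched as follows. `MasterFirstExecutor` and its `operation` callback are hypothetical names used only for illustration; the callback is assumed to return `true` when the operation succeeds on a node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: runs an operation on the master first and on slaves only if it succeeds. */
final class MasterFirstExecutor {

    private MasterFirstExecutor() {
    }

    /** Returns the nodes on which the operation was attempted, in order. */
    static List<String> run(String master, List<String> slaves, Predicate<String> operation) {
        List<String> attempted = new ArrayList<>();
        attempted.add(master);
        if (!operation.test(master)) {
            // The master failed, so the slaves are skipped entirely.
            return attempted;
        }
        for (String slave : slaves) {
            attempted.add(slave);
            operation.test(slave);
        }
        return attempted;
    }
}
```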

[Bug]: Cannot cancel the installation task when blocked

What happened?

Cannot cancel the installation task when blocked

Additional Information

No response

[Feature Request]: Add support for Dinky.

Tell us what feature you want?

Dinky( https://github.com/DataLinkDC/dlink ) is an out of the box one-stop real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse.

Can Dinky be integrated?

[Feature][website] translate file to english

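[Feature Request]: Retry button

Tell us what feature you want?

1. When an installation fails, could there be a retry button?
2. Installation often gets stuck at some step and stops making progress.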

[Feature Request]: add hdfs disk balancer

Tell us what feature you want?

add hdfs disk balancer

[Feature][server]prometheus web ui port can be dynamically configured

[Feature][server]Add NodeManager Lost Alert

[Feature][server] Support Kerberos management, with user-friendly enabling and disabling of Kerberos

[Elasticsearch]: package is not found

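Elasticsearch can be added from the dashboard, but its installation package cannot be found.

After deploying the framework according to the guide, Elasticsearch appears on the page, but there is no installation package for it in the packages directory.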

[Feature][controller]add a prompt when deleting a service role

If the service role is running, the delete operation should prompt that the role is still running.

[Feature Request]: find hdfs and yarn master role

Tell us what feature you want?

In the instance list, you should be able to see which NameNode or ResourceManager is the master.

[Feature][server] Add Apache Kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on Data Warehouses and Lakehouses.[1]

Kyuubi provides a docker compose based playground[2] and online try me[3](based-on arm64).

Kyuubi was adopted by AliCloud EMR[4] and TencentCloud EMR[5].

The minimal dependencies of Kyuubi are ZooKeeper 3.4+ and Spark 3.1+ (assuming you want to use Kyuubi as the Spark Thrift Gateway).

[1] https://mp.weixin.qq.com/s/5Sj12_qbQTnCOcZVZUSNhg
[2] https://github.com/apache/incubator-kyuubi/tree/master/docker/playground
[3] https://try.kyuubi.cloud/
[4] https://help.aliyun.com/document_detail/439451.html
[5] https://cloud.tencent.com/document/product/589/72001

[Bug]: Add Service Error

What happened?

Additional Information

When I add the HDFS service but abandon the process at the last step, an error occurs.
Two HDFS services then appear in the list, and they cannot be deleted.

[Feature Request]:add rolling restart

Tell us what feature you want?

Add rolling restart for HDFS and YARN.
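One possible shape of a rolling restart is sketched below: nodes are restarted one at a time, and the rollout stops at the first node that does not come back healthy so that the remaining nodes keep serving. `RollingRestart` and the `restartAndCheck` callback are hypothetical; how a node is actually restarted and health-checked is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: restarts nodes sequentially and stops at the first failure. */
final class RollingRestart {

    private RollingRestart() {
    }

    /**
     * Restarts each node in order. The callback is assumed to restart one node and
     * return true once that node is healthy again. Returns the nodes restarted successfully.
     */
    static List<String> restart(List<String> nodes, Predicate<String> restartAndCheck) {
        List<String> restarted = new ArrayList<>();
        for (String node : nodes) {
            if (!restartAndCheck.test(node)) {
                // Stop here so the untouched nodes stay in service.
                break;
            }
            restarted.add(node);
        }
        return restarted;
    }
}
```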

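[Feature][server] StarRocks component supports FE high availability

[Bug]: The online documentation cannot be opened

What happened?

The online documentation at https://gaodayu168.github.io/datasophon-website cannot be opened; the site is inaccessible.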

Additional Information

No response

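[Feature Request]: Customizable installation path and HDFS nameservice name

Tell us what feature you want?

1. The installation directory currently cannot be customized. The code hard-codes the installation path as /opt/datasophon in many places, so installation fails after the install.path property is changed in common.properties. We hope custom installation paths can be truly supported.
2. After the HDFS nameservice is changed, several configuration items that depend on the nameservice value, such as dfs.ha.namenodes.nameservice1 and dfs.namenode.rpc-address.nameservice1.nn1, are not updated to the configured nameservice. We hope this can be improved.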
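For the second point, the nameservice-dependent keys could be derived from the configured value instead of being fixed to nameservice1. The sketch below is an assumption about how that might look; `NameserviceConfig`, the example nameservice, and the host addresses are hypothetical, while the key patterns are the standard HDFS HA property names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds HDFS HA keys from the configured nameservice name. */
final class NameserviceConfig {

    private NameserviceConfig() {
    }

    /**
     * @param nameservice          the configured nameservice name
     * @param namenodeRpcAddresses NameNode id (for example nn1) mapped to its RPC address
     */
    static Map<String, String> haKeys(String nameservice, Map<String, String> namenodeRpcAddresses) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("dfs.nameservices", nameservice);
        // The key itself embeds the nameservice, so it must be rebuilt when the name changes.
        conf.put("dfs.ha.namenodes." + nameservice, String.join(",", namenodeRpcAddresses.keySet()));
        for (Map.Entry<String, String> entry : namenodeRpcAddresses.entrySet()) {
            conf.put("dfs.namenode.rpc-address." + nameservice + "." + entry.getKey(), entry.getValue());
        }
        return conf;
    }
}
```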

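[Test][service] Add a test class for /etc/hosts file parsing

[Feature][server] Support YARN label-based scheduling management

[Feature][server] Add cluster network anomaly alert

[Bug][service] Memory usage is shown as a negative number, and total memory is displayed incorrectly on some machines

[Feature][server] Add Kafka disk usage distribution metrics and charts

[Test][service] Add a test class for host list parsing

[Feature][api] Add system tenant management

[Feature][server] Integrate Apache DolphinScheduler, with support for elastic deployment and configuration-based management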

[Feature][server] Add Kibana service

Tell us what feature you want?

Web-based operation of Elasticsearch needs Kibana.

[Feature Request]: Translate README.md to English

Tell us what feature you want?

Translate README.md to English.

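[Feature][service] Add support for deleting services/components

Service roles must be stopped before deletion.
When services depend on each other, the user must be warned. For example, Kafka depends on ZooKeeper, so deleting ZooKeeper should warn that Kafka will become unavailable.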
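The two rules above could be checked before a deletion roughly as follows. This is only a sketch: `ServiceDeletionCheck`, the service identifiers, and the dependency table are hypothetical, and the wording of the warnings is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical pre-deletion check: running roles and dependent services produce warnings. */
final class ServiceDeletionCheck {

    // Assumed dependency table (service -> services it depends on).
    static final Map<String, List<String>> DEPENDENCIES = Map.of("KAFKA", List.of("ZOOKEEPER"));

    private ServiceDeletionCheck() {
    }

    /** Returns the installed services that depend on the service to delete, sorted by name. */
    static Set<String> affectedServices(String toDelete, Set<String> installed) {
        Set<String> affected = new TreeSet<>();
        for (String service : installed) {
            if (DEPENDENCIES.getOrDefault(service, List.of()).contains(toDelete)) {
                affected.add(service);
            }
        }
        return affected;
    }

    /** Returns warnings to show to the user; an empty list means the deletion can proceed. */
    static List<String> warnings(String toDelete, Set<String> installed, Set<String> runningRoles) {
        List<String> warnings = new ArrayList<>();
        if (!runningRoles.isEmpty()) {
            warnings.add("Stop these roles first: " + new TreeSet<>(runningRoles));
        }
        Set<String> affected = affectedServices(toDelete, installed);
        if (!affected.isEmpty()) {
            warnings.add("These services will become unavailable: " + affected);
        }
        return warnings;
    }
}
```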

Who is using DataSophon?

Sincerely thank everyone who constantly keeps on using and supporting DataSophon. We will try our best to make DataSophon better and make the community and ecology more prosperous.

The original intention of this issue
We’d like to listen to the community to make DataSophon better.
Learn more about the practical use scenarios of DataSophon to facilitate the next step of planning.
What we expect from you
Please submit a comment in this issue to include the following information:

logo: your company/school/organization logo.
name: your company/school/organization name
website: your company/school/organization website
contact: contact info, e.g: blog, email, Twitter (at least one).
usage scenario: for what business scenario do you use DataSophon.

[Feature Request]: Service role operations can be executed concurrently

Tell us what feature you want?

Service role operations such as start, stop, and restart should be able to run concurrently.

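[Feature][server] Add support for GPU and FPGA resource scheduling management

[Bug]: The service clearly did not start, but the page shows a normal status

What happened?

1. The ZooKeeper service did not start, and services such as HDFS and Kafka depend on ZooKeeper, yet the page keeps showing a normal status.
2. Whether or not a service starts successfully, the task progress bar always shows 100% OK.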

Additional Information

No response

[Feature Request]: Could DataSophon support taking over big data clusters already deployed with Apache releases?

Tell us what feature you want?

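Background

Many small companies have no budget to buy EMR, and HDP and CDH are too tightly integrated, so they may not offer a set of component versions that meets business requirements. For example, HDP 3 ships Hadoop 3.x, but its Spark is still 2.x.

So some small companies choose to integrate Apache releases themselves. We did: the integration took me a lot of time, and afterwards the cluster was still hard to manage. We could only write automation scripts to simplify routine operations, with very little visualization.

Wants

It would be great if there were plans to support taking over clusters that are already running in production. Otherwise we may have to build this ourselves, using an operations tool such as Salt to implement something like a Supervisor.

Please consider whether this is feasible for DataSophon.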

[Feature][server] Flink integrates Iceberg and Hudi by default

[Bug]: NullPointerException when dispatch agent

What happened?

When processing reaches this step, the runtime log shows the following exception:

[INFO] 2022-12-04 14:15:45 com.datasophon.api.master.handler.host.StartWorkerHandler:[66] - end dispatcher host agent :ctyun9
[INFO] 2022-12-04 14:15:51 com.datasophon.api.master.alert.HostCheckActor:[27] - start to check host info
[INFO] 2022-12-04 14:15:56 com.datasophon.api.master.WorkerStartActor:[39] - receive message when worker first start :ctyun9
[INFO] 2022-12-04 14:15:56 com.datasophon.api.master.WorkerStartActor:[55] - host install set to 100%
[INFO] 2022-12-04 14:15:56 com.datasophon.api.master.WorkerStartActor:[69] - host install save to database
[ERROR] [12/04/2022 14:15:56.773] [datasophon-akka.actor.default-dispatcher-156] [akka://datasophon/user/master/prometheusActor] null
java.lang.NullPointerException
at com.datasophon.api.master.PrometheusActor.onReceive(PrometheusActor.java:144)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

Additional Information

No response

[Enhancement]: Eliminate several issues of com.datasophon.api.master.ActorUtils

Which chapter do you think needs improvement?

1. Remove unnecessary imports.
2. Provide a private no-argument constructor.
3. "seconds" should be eliminated.
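The second point is the usual pattern for utility classes. The sketch below shows it on a hypothetical class `ActorNames`; the constant and the `userPath` helper are invented for illustration and do not reflect the actual contents of ActorUtils, which the issue does not show.

```java
/** Hypothetical utility class: only static members, so instantiation is blocked. */
final class ActorNames {

    /** Illustrative constant declared as public static final String. */
    public static final String MASTER_ACTOR = "master";

    private ActorNames() {
        // Private no-argument constructor prevents accidental instantiation.
        throw new AssertionError("utility class, do not instantiate");
    }

    /** Builds a path under the user guardian, for example /user/master. */
    static String userPath(String actorName) {
        return "/user/" + actorName;
    }
}
```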

What information do you think needs to be added?

No response

[Feature Request]: remove jsch

Tell us what feature you want?

Because JSch behaves abnormally on different operating systems, we need to remove it and use the Linux ssh command to implement its functions.
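One way to replace a Java SSH library with the system client is to launch `ssh` through `ProcessBuilder`. The sketch below assumes an `ssh` binary on the PATH and key-based authentication; the class name `SshCommand` and the chosen options are illustrative, not the project's implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper around the system ssh client. */
final class SshCommand {

    private SshCommand() {
    }

    /** Builds the argument list for ssh; nothing is executed here. */
    static List<String> build(String user, String host, int port, String remoteCommand) {
        List<String> args = new ArrayList<>();
        args.add("ssh");
        args.add("-p");
        args.add(Integer.toString(port));
        args.add("-o");
        args.add("BatchMode=yes"); // fail instead of prompting for a password
        args.add(user + "@" + host);
        args.add(remoteCommand);
        return args;
    }

    /** Runs the command, forwarding its output, and returns the exit code. */
    static int run(List<String> args) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(args).inheritIO().start();
        return process.waitFor();
    }
}
```

Passing the arguments as a list avoids going through a local shell, so the remote command string is handed to ssh as a single argument.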
