gradiant / bigdata-charts
Curated Big Data Applications for Kubernetes
License: Apache License 2.0
Hi,
Please update the information on Artifact Hub: the new versions are not visible there since the repository was renamed.
https://artifacthub.io/packages/helm/gradiant/jupyter
I am trying to copy a local file to my HDFS, deployed with the helm chart.
I am running:
helm install hdfs https://github.com/Gradiant/charts/releases/download/hdfs-0.1.0/hdfs-0.1.0.tgz -f hdfs-values.yaml
My hdfs-values.yaml file:
conf:
  coreSite:
  hdfsSite:
    dfs.replication: 2
dataNode:
  replicas: 2
  pdbMinAvailable: 2
resources:
  requests:
    memory: "256Mi"
    cpu: "10m"
  limits:
    memory: "2048Mi"
    cpu: "1000m"
kubectl get pods shows that all pods are running and ready:
NAME READY STATUS RESTARTS AGE
hdfs-httpfs-5686fd75df-2pgk7 1/1 Running 0 59m
hdfs-namenode-0 2/2 Running 1 59m
hdfs-datanode-0 1/1 Running 0 59m
hdfs-datanode-1 1/1 Running 0 58m
I use port-forward to access the HDFS in Kubernetes from my local machine:
# namenode web UI
kubectl port-forward svc/hdfs-namenode 50070:50070
# hdfs port
kubectl port-forward hdfs-namenode-0 8020:8020
On my local machine, I have just unzipped a Hadoop 2 distribution (2.10.0) and updated core-site.xml like this, to use the forwarded port:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
  <description>The name of the default file system. Either the
  literal string "local" or a host:port for NDFS.
  </description>
  <final>true</final>
</property>
# ok:
hdfs dfs -mkdir /jars
# not ok:
hdfs dfs -put helloSpark.jar /jars
20/12/11 09:50:53 INFO hdfs.DataStreamer: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:259)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1699)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1655)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:710)
20/12/11 09:50:53 WARN hdfs.DataStreamer: Abandoning BP-831521929-10.42.1.6-1607678530556:blk_1073741827_1003
20/12/11 09:50:53 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[10.42.1.5:50010,DS-4153f502-30da-42d7-a415-69601658066a,DISK]
I don't have any error in the datanode logs, and in the namenode, the only error is the same as above.
Did I miss something in the configuration?
Thanks :)
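A note on why -mkdir succeeds while -put fails (my reading of the trace, not a confirmed diagnosis): metadata operations only talk to the namenode, but writing a block makes the client connect directly to each datanode on its transfer port (dfs.datanode.address, 50010 by default in Hadoop 2), which a namenode-only port-forward does not expose. A sketch of what would additionally be needed:

```sh
# Hypothetical: forward one datanode's transfer port as well
kubectl port-forward hdfs-datanode-0 50010:50010
```

Even then the namenode hands back pod IPs, so the client may also need dfs.client.use.datanode.hostname set to true in hdfs-site.xml, plus locally resolvable datanode hostnames.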
When we deploy the hbase helm chart, the namenode replicas should be configurable. Currently it is hard-coded to 1.
This is a single point of failure. See the output of kubectl get pods | grep hbase:
[ec2-user@ip-xyz ~]$ kubectl get pods | grep hbase
dev-hbase-hbase-master-0 1/1 Running 1 16m
dev-hbase-hbase-master-1 1/1 Running 0 14m
dev-hbase-hbase-master-2 1/1 Running 0 14m
dev-hbase-hbase-regionserver-0 1/1 Running 2 16m
dev-hbase-hbase-regionserver-1 1/1 Running 0 14m
dev-hbase-hbase-regionserver-2 1/1 Running 0 14m
dev-hbase-hdfs-datanode-0 1/1 Running 0 16m
dev-hbase-hdfs-datanode-1 1/1 Running 0 15m
dev-hbase-hdfs-datanode-2 1/1 Running 0 15m
dev-hbase-hdfs-httpfs-5fff67bcb8-8mkpj 1/1 Running 0 16m
dev-hbase-hdfs-namenode-0 2/2 Running 2 16m
dev-hbase-zookeeper-0 1/1 Running 0 16m
dev-hbase-zookeeper-1 1/1 Running 0 16m
dev-hbase-zookeeper-2 1/1 Running 0 16m
We are deploying this chart:
https://artifacthub.io/packages/helm/gradiant/hbase
Environment:
Steps to reproduce:
from impala.dbapi import connect
from impala.util import as_pandas
import pandas as pd
import os

# Connection
conn = connect(host=os.environ['IP_HIVE'], port=10000, user=os.environ['USER'],
               password=os.environ['PASSWORD'], auth_mechanism='PLAIN')

# Writing to a Hive table
cursor = conn.cursor()
cursor.execute('CREATE TABLE default.helloworld (hello STRING,world STRING)')
cursor.execute("insert into default.helloworld values ('hello1','world1')")
Expected: values are inserted
Actual:
df = as_pandas(cursor)
print(df.head())
  tab_name
0 helloworld
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: File file:/user/hive/warehouse/helloworld2 does not exist
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2886)
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:3297)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:2022)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:360)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1232)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:255)
... 11 more
Caused by: java.io.FileNotFoundException: File file:/user/hive/warehouse/helloworld2 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
at org.apache.hadoop.hive.io.HdfsUtils$HadoopFileStatus.<init>(HdfsUtils.java:211)
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2884)
... 22 more```
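For reference, the file: scheme in the exception suggests Hive resolved the warehouse path against the local filesystem instead of HDFS. A hedged guess at a fix (the key names below are assumptions; check the chart's values.yaml for the actual structure):

```yaml
# Hypothetical values fragment: point the warehouse at HDFS explicitly,
# replacing <namenode-service> with the actual namenode service name.
conf:
  hiveSite:
    hive.metastore.warehouse.dir: "hdfs://<namenode-service>:8020/user/hive/warehouse"
```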
ISSUE
The main opentsdb pod crashes after some time / some amount of data processed. In our case it is regular: after about two weeks of usage.
The opentsdb logs:
opentsdb-create-hbase-tables Checking if given compression is supported...
opentsdb-create-hbase-tables 2022-08-16 07:59:33,333 INFO [main] metrics.MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesIm
opentsdb-create-hbase-tables 2022-08-16 07:59:33,350 INFO [main] hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled
opentsdb-create-hbase-tables 2022-08-16 07:59:33,523 INFO [main] zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
opentsdb-create-hbase-tables 2022-08-16 07:59:33,538 INFO [main] compress.CodecPool: Got brand-new compressor [.gz]
opentsdb-create-hbase-tables 2022-08-16 07:59:33,542 INFO [main] compress.CodecPool: Got brand-new compressor [.gz]
opentsdb-create-hbase-tables 2022-08-16 07:59:33,727 INFO [main] hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled
opentsdb-create-hbase-tables 2022-08-16 07:59:33,741 INFO [main] compress.CodecPool: Got brand-new decompressor [.gz]
opentsdb-create-hbase-tables SUCCESS
opentsdb-create-hbase-tables HBase script:
opentsdb-create-hbase-tables ----
opentsdb-create-hbase-tables create 'tsdb-uid',
opentsdb-create-hbase-tables {NAME => 'id', COMPRESSION => 'GZ', BLOOMFILTER => 'ROW'},
opentsdb-create-hbase-tables {NAME => 'name', COMPRESSION => 'GZ', BLOOMFILTER => 'ROW'}
opentsdb-create-hbase-tables create 'tsdb',
opentsdb-create-hbase-tables {NAME => 't', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW'}
opentsdb-create-hbase-tables create 'tsdb-tree',
opentsdb-create-hbase-tables {NAME => 't', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW'}
opentsdb-create-hbase-tables create 'tsdb-meta',
opentsdb-create-hbase-tables {NAME => 'name', COMPRESSION => 'GZ', BLOOMFILTER => 'ROW'}
opentsdb-create-hbase-tables ----
opentsdb-create-hbase-tables Checking if opentsdb tsdb-uid hbase table exists
opentsdb-create-hbase-tables java exception
PRECONDITIONS
chart version: 0.1.7
custom values:
config: {"tsd.http.request.enable_chunked":true,"tsd.http.request.max_chunk":65535}
daemons: 1
hbase:
  enabled: true
  hbase:
    master:
      replicas: 3
    regionServer:
      replicas: 3
    resources:
      requests:
        memory: 1Gi
        cpu: 100m
  hdfs:
    enabled: true
    dataNode:
      replicas: 3
    persistence:
      nameNode:
        enabled: true
        accessMode: ReadWriteOnce
        size: 5Gi
      dataNode:
        enabled: true
        accessMode: ReadWriteOnce
        size: 10Gi
  zookeeper:
    enabled: true
    replicaCount: 1
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchLabels:
                release: zookeeper
    persistence:
      enabled: true
      accessMode: ReadWriteOnce
      size: 1Gi
    resources:
      requests:
        memory: 1Gi
        cpu: 100m
STEPS TO REPRODUCE
Install the chart with custom values and use it as normal for a while (about a week or two, depending on configuration).
When using an external zookeeper and deploying hbase, the following error occurs.
helm install hbase --namespace hadoop --set hdfs.enabled=false --set conf.hbaseSite."hbase.rootdir"="hdfs://hive-hdfs-namenode:8020/hbase" --set zookeeper.enabled=false --set conf.hbaseSite."hbase.zookeeper.quorum"="kafka-zookeeper-headless:2181" ./
---------logs------------
==> /opt/hbase-2.0.1/logs/hbase--master-hbase-hbase-master-0.log <==
2020-05-12 09:27:40,835 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.IllegalArgumentException: java.net.UnknownHostException: hbase-hdfs-namenode
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:320)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:358)
at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:407)
at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:383)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:681)
at org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:590)
at org.apache.hadoop.hbase.master.HMaster.(HMaster.java:472)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2903)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2921)
Caused by: java.net.UnknownHostException: hbase-hdfs-namenode
... 28 more
2020-05-12 09:27:40,846 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster.
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2910)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2921)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hbase-hdfs-namenode
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:320)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:358)
at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:407)
at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:383)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:681)
at org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:590)
at org.apache.hadoop.hbase.master.HMaster.(HMaster.java:472)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2903)
... 5 more
Caused by: java.net.UnknownHostException: hbase-hdfs-namenode
... 28 more
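One possible cause (an assumption on my part, not verified against the chart): the shell strips the inner quotes before helm parses --set, so the dots inside hbase.rootdir are read as nested keys and the override never lands, leaving the chart's default hbase-hdfs-namenode host in place. Helm requires backslash-escaping literal dots in keys:

```sh
# Hypothetical corrected invocation: escape dots inside key names
helm install hbase ./ --namespace hadoop \
  --set hdfs.enabled=false \
  --set zookeeper.enabled=false \
  --set 'conf.hbaseSite.hbase\.rootdir=hdfs://hive-hdfs-namenode:8020/hbase' \
  --set 'conf.hbaseSite.hbase\.zookeeper\.quorum=kafka-zookeeper-headless:2181'
```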
I am trying to install HBase on an OpenShift k8s cluster, and I am getting the below exception when HMaster starts:
==> /opt/hbase-2.0.1/logs/hbase--master-my-hbase-hbase-master-0.log <==
2021-07-21 23:02:16,154 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.IllegalArgumentException: java.net.UnknownHostException: my-hbase-hdfs-namenode
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
Caused by: java.net.UnknownHostException: cstone-hbase-hdfs-namenode
... 28 more
2021-07-21 23:02:16,156 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
I feel it is because of the HDFS namenode connection, so I tried installing the hdfs chart. The pods run fine, but issuing any command throws the below error (which is similar to the one above):
1000760000@hadoop-hdfs-httpfs-9c76bf768-njdx6:/opt/hadoop-2.7.7$ hadoop fs -ls .
-ls: java.net.UnknownHostException: hadoop-hdfs-namenode
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [ ...]
cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.32.8.177 hadoop-hdfs-httpfs-9c76bf768-njdx6
Can somebody help me with this?
@cgiraldo I came across your prestosql repo on Helm Hub, but when I went to fetch it I got a nice little error that the release is missing :(
helm fetch gradiant/prestosql --version 0.1.0
Error: failed to fetch https://github.com/Gradiant/charts/releases/download/prestosql-0.1.0/prestosql-0.1.0.tgz : 404 Not Found
I'm not sure if this is intentional. I'm not using gitNotebooks, and my PV for /home/jovyan has several things in it (e.g. some special ssh keys I need). Now, even with gitNotebooks disabled, only the subpath git-notebooks.sh is mounted, so everything else from the PV is no longer there.
Looks like most of the stack pods fail to start after pod termination because DNS entries are not registered fast enough in headless services (this depends on the cloud vendor; in GKE it can take up to 60s), for example:
hdfs-namenode-0 namenode java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfs-namenode
This may also influence other services (such as kafka).
Fix:
Set spec.publishNotReadyAddresses: true, which will enforce registration of DNS hosts even if they are not ready. Ready state is based on pod readiness/liveness probes; if they pass, the given pod is added to the endpoints. In this setup, publishing DNS entries that are not ready is not an issue; it is actually expected because of the way the Hadoop stack was designed. So the DNS entries should be added, while the Java apps themselves handle the actual availability of the processes within the pods.
Reference:
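A minimal sketch of the proposed fix on the namenode service (the selector and port below are illustrative, not taken from the chart's actual template):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hdfs-namenode
spec:
  clusterIP: None
  publishNotReadyAddresses: true   # register DNS records even before readiness probes pass
  selector:
    app: hdfs-namenode             # illustrative selector
  ports:
    - name: fs
      port: 8020
```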
It's not DNS
There's no way it's DNS
It was DNS
Right now there is an antiaffinity setting ('soft' and 'hard'), but it is hardcoded to specific entries.
I need more fine-grained affinity rules, such as availability zone in GKE.
So there are two options:
Init containers use a shell script to set up the OpenTSDB tables in HBase.
This script uses certain env vars to adjust the table initialization, but the current implementation does not allow passing any env vars to the pods, so it is not really usable: the user is forced to use the defaults or manually control that part of the process.
Fix:
Other notes:
thrift://{{.Release.Name}}-metastore:9083 <-- what is release name var?
HDFS uses rack awareness to control where to store replicas: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html
Notice this is not the same as antiaffinity, and antiaffinity can only solve this issue partially.
The current setup does not support rack awareness, not to mention that it is quite complex and actually requires additional info about the nodes used to host the hdfs datanode pods, such as labels on the nodes.
Need to investigate this in the future, just adding this as issue to keep it in mind.
When I install the current chart version like this:
helm install hdfs gradiant/hdfs
I am unable to use httpfs to create directories or files:
http://httpfs-host/webhdfs/v1/user/loic/test?op=MKDIRS&user.name=loic
gives me a 500 Internal Server Error:
{
"RemoteException": {
"message": "User: hdfs is not allowed to impersonate loic",
"exception": "RemoteException",
"javaClassName": "org.apache.hadoop.ipc.RemoteException"
}
}
However, it works when I use the 0.1.0 version like this:
helm install hdfs https://github.com/Gradiant/charts/releases/download/hdfs-0.1.0/hdfs-0.1.0.tgz
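The error is the classic Hadoop proxy-user restriction: httpfs runs as the hdfs user and must be explicitly allowed to impersonate end users. A hedged sketch of core-site overrides that would lift the restriction (the property names follow Hadoop's proxyuser convention; whether the chart exposes them under conf.coreSite is an assumption):

```yaml
# Hypothetical values fragment for the hdfs chart
conf:
  coreSite:
    hadoop.proxyuser.hdfs.hosts: "*"
    hadoop.proxyuser.hdfs.groups: "*"
```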
Hi,
Is there any special reason for the missing "clusterIP: None" in the HDFS namenode service, compared to the one in helm stable? If the service type is ClusterIP, the datanode IP will be masqueraded by the kubernetes iptables, and the namenode will register it under the worker node's IP address. This fails in, for example, a single-node MicroK8s with multiple datanodes, as the namenode overwrites the registration and only one datanode is registered at a time. I had some headache over this ;-)
BR
/Peter
When hive was installed by helm under k8s, the hive-server pod reported the following error:
"Readiness probe failed: Get http://10.42.1.234:10002/: dial tcp 10.42.1.234:10002: connect: connection refused".
--------------logs----------
Configuring core
chmod: `/tmp': No such file or directory
chmod: `/user/hive/warehouse': No such file or directory
The hive chart never passes the livenessProbe check. I tried versions 0.1.5 and 0.1.4.
The hive-server error info is:
Readiness probe failed: Get "http://172.17.0.9:10002/": dial tcp 172.17.0.9:10002: connect: connection refused
Hi,
Thanks for this great helm chart, but when I installed it without changing any of the default values, hive-server kept crashing and restarting. I used the command helm install my-hive gradiant/hive --version 0.1.4
What else can I try? Any help would be appreciated.
I don't see a reason why opentsdb needs to be a statefulset.
It's a pretty stateless service because it does not store anything on disk except temporary graphs.
I believe it is better to convert it to a deployment with a 60s graceful shutdown, so it can flush data to the hbase backend.
It would also make it much easier to scale, especially based on CPU.
Hi!
I'm testing hbase and found your repository (and used it while it was called gradiant/charts), specifically for hbase.
As testing goes, I had to reinstall and then I hit a snag; I think the problem is that hbase is missing a release asset. Would you be so kind as to publish hbase-0.1.6?
helm install hbase bigdata-gradiant/hbase --set persistence.dataNode.storageClass=local-path,persistence.storageClass=local-path --debug
install.go:173: [debug] Original chart version: ""
Error: failed to fetch https://github.com/Gradiant/bigdata-charts/releases/download/hbase-0.1.6/hbase-0.1.6.tgz : 404 Not Found
helm.go:88: [debug] failed to fetch https://github.com/Gradiant/bigdata-charts/releases/download/hbase-0.1.6/hbase-0.1.6.tgz : 404 Not Found
helm.sh/helm/v3/pkg/getter.(*HTTPGetter).get
helm.sh/helm/v3/pkg/getter/httpgetter.go:90
helm.sh/helm/v3/pkg/getter.(*HTTPGetter).Get
helm.sh/helm/v3/pkg/getter/httpgetter.go:42
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).DownloadTo
helm.sh/helm/v3/pkg/downloader/chart_downloader.go:99
helm.sh/helm/v3/pkg/action.(*ChartPathOptions).LocateChart
helm.sh/helm/v3/pkg/action/install.go:704
main.runInstall
helm.sh/helm/v3/cmd/helm/install.go:185
main.newInstallCmd.func2
helm.sh/helm/v3/cmd/helm/install.go:120
github.com/spf13/cobra.(*Command).execute
github.com/spf13/[email protected]/command.go:852
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/[email protected]/command.go:960
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/[email protected]/command.go:897
main.main
helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
runtime/proc.go:225
runtime.goexit
runtime/asm_amd64.s:1371
The Hadoop startup script uses /opt/hadoop/sbin/hadoop-daemon.sh in the bootstrap process.
hadoop-daemon.sh uses some env vars that are expected to be set (some are optional) and tries defaults for others; for example, HADOOP_IDENT_STRING falls back to the USER env var if not set.
Unfortunately, in Docker the USER var is not set, so HADOOP_IDENT_STRING ends up as an empty string.
This means some commands that rely on it fail or misbehave:
chown: missing operand after ‘/opt/hadoop/logs’
bash-5.0# ls -la /opt/hadoop/logs/
total 208
drwxr-xr-x 2 root root 4096 Apr 3 13:21 .
drwxr-xr-x 1 root root 4096 Apr 3 13:21 ..
-rw-r--r-- 1 root root 0 Apr 3 13:21 SecurityAuth-root.audit
-rw-r--r-- 1 root root 188934 Apr 3 17:51 hadoop--datanode-hdfs-datanode-2.log
-rw-r--r-- 1 root root 730 Apr 3 13:21 hadoop--datanode-hdfs-datanode-2.out
... and so on.
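The fallback can be reproduced outside the image; this is a paraphrase of what hadoop-daemon.sh does, not the actual script:

```shell
#!/bin/sh
# Paraphrased sketch of hadoop-daemon.sh's fallback (not the real script):
# HADOOP_IDENT_STRING defaults to $USER, which is unset in most Docker images.
unset USER HADOOP_IDENT_STRING
HADOOP_IDENT_STRING="${HADOOP_IDENT_STRING:-$USER}"
echo "ident='${HADOOP_IDENT_STRING}'"
# With USER unset this prints ident='', which is why the later
# chown "$HADOOP_IDENT_STRING" /opt/hadoop/logs ends up missing an operand.
```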
Unfortunately, the current chart implementation does not support adding custom env vars to the pods.
Fix:
There are two ways:
env:
  - name: HADOOP_IDENT_STRING
    value: "root"
Not sure if this is done on purpose or not.
(Probably all) charts use helpers in such a way that they define helm.sh/chart:
{{- define "hdfs.labels" -}}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
app.kubernetes.io/instance: {{ .Release.Name | quote }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/part-of: {{ .Chart.Name }}
{{- end -}}
and then those labels are used in selectors for statefulsets (and actually anything else as well).
Unfortunately kubernetes does not allow changing those and throws error in case of upgrade:
Error: UPGRADE FAILED: cannot patch "h-hdfs-httpfs" with kind Deployment: Deployment.apps "h-hdfs-httpfs" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"httpfs", "app.kubernetes.io/instance":"h", "app.kubernetes.io/managed-by":"Helm", "app.kubernetes.io/name":"hdfs", "app.kubernetes.io/part-of":"hdfs", "app.kubernetes.io/version":"2.7.7", "helm.sh/chart":"hdfs-0.1.7"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && cannot patch "h-hdfs-datanode" with kind StatefulSet: StatefulSet.apps "h-hdfs-datanode" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden && cannot patch "h-hdfs-namenode" with kind StatefulSet: StatefulSet.apps "h-hdfs-namenode" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden
Fix:
Either drop helm.sh/chart, app.kubernetes.io/version and the other offending labels from the selectors (or use a stripped-down selector of .../managed-by, .../part-of, .../instance), or pin the helm.sh/chart version to be the same as on the current deployment.
Just tried to deploy hdfs using your helm charts and test it using the rclone beta client.
Unfortunately I am unable to copy files over to hdfs. The client hangs, and in the logs I see:
2021-01-12 11:57:06,270 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_1073741839_1015{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-1e01efed-b1b9-40bc-adc6-0abf2ae3ca00:NORMAL:10.244.0.25:50010|RBW], ReplicaUC[[DISK]DS-ab65c535-fabb-4e29-ad48-fb15ac3665a9:NORMAL:10.244.0.23:50010|RBW], ReplicaUC[[DISK]DS-52242411-b848-490b-9b28-9de6d5ad63ff:NORMAL:10.244.0.27:50010|RBW]]} for /test-folder/test.txt
hdfs support issue in rclone: rclone/rclone#42
https://gradiant.github.io/bigdata-charts/ — this address returns 404 Not Found.
I've noticed that opentsdb uses terminationGracePeriodSeconds: 0, which is quite unsafe [1] and can also lead to data loss if opentsdb is not able to flush data to the hbase backend.
I suggest making it configurable with a safer default.
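A minimal sketch of the suggested change in the opentsdb pod template (the 60s value is my suggestion, not a chart default):

```yaml
# Fragment of the opentsdb workload spec
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # give opentsdb time to flush to hbase
```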
Issue:
charts/charts/hbase/templates/hbase-master-svc.yaml defines the service for hbase-master, but it does not set .spec.clusterIP: None, so per-pod DNS names like steve-hbase-master-0.steve-hbase-master.my-ns.svc.cluster.local do not resolve.
Fix:
In charts/charts/hbase/templates/hbase-master-svc.yaml, add clusterIP: None under the spec section:
spec:
  clusterIP: None
Hbase regionserver logs:
steve-hbase-regionserver-0 regionserver 2020-04-02 10:26:17,206 WARN [regionserver/steve-hbase-regionserver-0:16020] regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
steve-hbase-regionserver-0 regionserver java.net.UnknownHostException: can not resolve steve-hbase-master-0.steve-hbase-master.data-metrics.svc.cluster.local,16000,1585809657345
steve-hbase-regionserver-0 regionserver at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createAddr(AbstractRpcClient.java:430)
steve-hbase-regionserver-0 regionserver at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:507)
steve-hbase-regionserver-0 regionserver at org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionServerStatusStub(HRegionServer.java:2498)
steve-hbase-regionserver-0 regionserver at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2546)
steve-hbase-regionserver-0 regionserver at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:930)
steve-hbase-regionserver-0 regionserver at java.lang.Thread.run(Thread.java:748)
Is there a way we can work together to make better docs, so external tools like Spark, Hue or Drill can connect to the Hive cluster?
Since Kubernetes 1.19 the extensions/v1beta1 API for Ingress has been deprecated, and with the recently released 1.22 it got dropped. It would be good to upgrade this to networking.k8s.io/v1; see https://kubernetes.io/docs/reference/using-api/deprecation-guide/
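For illustration, an Ingress expressed against networking.k8s.io/v1 (the name, host and backend below are placeholders, not the chart's values):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example            # placeholder name
spec:
  rules:
    - host: example.local  # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix     # required field in the v1 API
            backend:
              service:
                name: example-svc  # placeholder service
                port:
                  number: 80
```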
What is the value for conf.hiveSite."hive.metastore.uris"? I am assuming it's the thrift:// URL, but how do we get it?
I cannot create tables or run queries from any client.
Hi! We are using the hdfs service from outside the kubernetes cluster, so we are facing the HDFS datanode hostname issue.
Since the namenode only returns the internal datanode hostname/IP, a service outside the kubernetes cluster cannot reach the datanode through that hostname/IP. Can anyone help with this scenario? Thanks!
I am getting the below exception while running the Hbase chart on OpenShift K8S. It looks like a permission issue; can anyone help me with a resolution?
cp: can't create '/opt/hbase-2.0.1/conf/hbase-site.xml': File exists
cp: can't create '/opt/hbase-2.0.1/conf/hbase-env.sh': File exists
running master, logging to /opt/hbase-2.0.1/bin/../logs/hbase--master-hbase-hbase-master-0.out
/opt/hbase-2.0.1/bin/hbase-daemon.sh: line 195: /opt/hbase-2.0.1/bin/../logs/hbase--master-hbase-hbase-master-0.out: Permission denied
head: /opt/hbase-2.0.1/bin/../logs/hbase--master-hbase-hbase-master-0.out: No such file or directory
running thrift, logging to /opt/hbase-2.0.1/bin/../logs/hbase--thrift-hbase-hbase-master-0.out
/opt/hbase-2.0.1/bin/hbase-daemon.sh: line 195: /opt/hbase-2.0.1/bin/../logs/hbase--thrift-hbase-hbase-master-0.out: Permission denied
head: /opt/hbase-2.0.1/bin/../logs/hbase--thrift-hbase-hbase-master-0.out: No such file or directory
Thu Jul 22 03:48:38 UTC 2021: Waiting for logs...
tail: can't open '/opt/hbase-2.0.1/logs/*': No such file or directory
> helm install hdfs bigdata-gradiant/hdfs
Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: [resource mapping not found for name: "hdfs-datanode" namespace: "" from "": no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"
ensure CRDs are installed first, resource mapping not found for name: "hdfs-namenode" namespace: "" from "": no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"
ensure CRDs are installed first]
macOS 13.4.1
helm version: version.BuildInfo{Version:"v3.12.2", GitCommit:"1e210a2c8cc5117d1055bfaa5d40f51bbc2e345e", GitTreeState:"clean", GoVersion:"go1.20.6"}
kubectl version:
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:14:48Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2", GitCommit:"7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647", GitTreeState:"clean", BuildDate:"2023-05-17T14:13:28Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}
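For context, the PodDisruptionBudget API moved to policy/v1 in Kubernetes 1.21 and policy/v1beta1 was removed in 1.25, which is why a 1.27 cluster rejects the manifest. An illustrative sketch of the migrated resource (the selector labels are placeholders, not the chart's actual ones):

```yaml
apiVersion: policy/v1             # was policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: hdfs-datanode
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: hdfs-datanode          # placeholder selector
```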
The current helm chart template does not support setting replicas for HBase masters.
This may lead to noticeable unavailability of HBase in general.
HBase uses zookeeper to store information about which node is the master, so only one is active at a time.
Fix:
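A possible shape for such a fix, expressed as a values.yaml extension — the `replicas` key is hypothetical, since the chart does not currently expose it:

```yaml
# Hypothetical values.yaml extension; the chart's master StatefulSet
# would need to be templated on this value. Because ZooKeeper elects
# a single active master, additional replicas act as hot standbys.
hbase:
  master:
    replicas: 2
```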
Hello, the Helm chart URL https://gradiant.github.io/charts/ is broken. Is this something you can look into, or could you confirm whether the chart has been moved elsewhere and, if so, where? Thank you and regards.
Hi there!
My Kubernetes cluster is installed using kops. Here is the Kubernetes and kubectl info:
Client Version: v1.24.3
Kustomize Version: v4.5.4
Server Version: v1.24.3
I first installed hdfs and then proceeded to install hive-metastore. I have not overridden any values in the yaml.
Below is the current status of the pods:
NAME READY STATUS RESTARTS AGE
hdfs-datanode-0 1/1 Running 0 8m40s
hdfs-datanode-1 1/1 Running 0 8m9s
hdfs-datanode-2 1/1 Running 0 7m49s
hdfs-httpfs-67d6d56bb6-w2z69 1/1 Running 0 8m40s
hdfs-namenode-0 2/2 Running 0 8m40s
hive-metastore-0 0/1 CrashLoopBackOff 5 (72s ago) 5m34s
hive-metastore-postgresql-0 1/1 Running 0 5m34s
The error I received is the following:
2022-08-22T16:16:27,275 ERROR [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Metastore Thrift Server threw an exception...
org.apache.hadoop.hive.metastore.api.MetaException: java.net.UnknownHostException: hive-metastore-hdfs
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:83) ~[hive-exec-2.3.2.jar:2.3.2]
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92) ~[hive-exec-2.3.2.jar:2.3.2]
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6887) ~[hive-exec-2.3.2.jar:2.3.2]
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6882) ~[hive-exec-2.3.2.jar:2.3.2]
    at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:7140) ~[hive-exec-2.3.2.jar:2.3.2]
    at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:7067) [hive-exec-2.3.2.jar:2.3.2]
Can you please advise what I am missing?
Thank you!
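For what it's worth, `java.net.UnknownHostException: hive-metastore-hdfs` usually means the metastore's default filesystem points at a hostname derived from its own release name rather than at the running namenode service. A hedged values override sketch — the key names are assumptions, check the chart's values.yaml:

```yaml
# Point the metastore at the namenode service created by the hdfs
# release; "hdfs-namenode" matches the pod names in the listing above.
conf:
  coreSite:
    fs.defaultFS: hdfs://hdfs-namenode:8020
```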
The current implementation of log processing has some drawbacks: running tail -f on multiple files generates additional garbage output (due to tail itself).
Fix:
An alternative solution is to add sidecar containers for log processing.
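A minimal sketch of the sidecar approach, assuming the logs are written into a shared emptyDir; the container names, image, and paths are illustrative, not the chart's:

```yaml
# Pod spec fragment: the main container writes logs into the shared
# volume, and a small sidecar streams them to stdout so they reach
# `kubectl logs` without tail-induced noise in the main container.
containers:
  - name: log-tailer
    image: busybox:1.36
    command: ["sh", "-c", "tail -n +1 -F /logs/hbase-master.log"]
    volumeMounts:
      - name: logs
        mountPath: /logs
volumes:
  - name: logs
    emptyDir: {}
```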
Looks like hbase 0.1.3 is not available on GitHub Pages.
But hdfs 0.1.6 was published.
Because the required files are not created/copied, the canary container actually creates files within the container itself.
Running any other command returns the message
"FileSystem is file:///"
...and it does not exit with an error.
Fix:
Will do PR.
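One way this could be addressed, sketched under the assumption that the chart ships its Hadoop config in a ConfigMap (the ConfigMap name and image are assumptions): mount it into the canary so `hdfs` commands resolve `fs.defaultFS` instead of falling back to the local default, `file:///`.

```yaml
# Pod spec fragment: give the canary the real Hadoop configuration
# so its HDFS commands talk to the namenode, not the local filesystem.
containers:
  - name: canary
    image: apache/hadoop:2   # image is illustrative
    env:
      - name: HADOOP_CONF_DIR
        value: /etc/hadoop
    volumeMounts:
      - name: hadoop-config
        mountPath: /etc/hadoop
volumes:
  - name: hadoop-config
    configMap:
      name: hdfs-config   # assumed ConfigMap name
```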
I have an issue while installing hdfs: the error says apiVersion v2 should be v1. How can I solve this problem?
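That message typically means a Helm 2 client is reading a chart whose Chart.yaml declares `apiVersion: v2`, a format only Helm 3 understands; upgrading the client is the usual fix rather than editing the chart. For reference, the field in question looks like:

```yaml
# Chart.yaml (values illustrative): apiVersion v2 marks a Helm 3 chart;
# Helm 2 only accepts apiVersion v1 and rejects this file.
apiVersion: v2
name: hdfs
version: 0.1.10
```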
According to the official Hadoop docs, running more than one namenode requires shared storage, for example NFS, or running a Quorum Journal Manager. [1]
Fix:
Add a hadoop-journalnode component (building and publishing the images for it) with at least 3 replicas and anti-affinity rules.
References:
[1] https://hadoop.apache.org/docs/r2.7.7/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
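The HA-related HDFS properties from the referenced docs could be surfaced through the chart's existing `conf.hdfsSite` block; a hedged sketch, where the nameservice and journalnode service names are assumptions:

```yaml
# QJM HA settings from the Hadoop HA docs, expressed as chart values.
conf:
  hdfsSite:
    dfs.nameservices: hdfscluster
    dfs.ha.namenodes.hdfscluster: nn0,nn1
    # Quorum Journal Manager: shared edit log across 3 journalnodes
    dfs.namenode.shared.edits.dir: >-
      qjournal://hdfs-journalnode-0.hdfs-journalnode:8485;hdfs-journalnode-1.hdfs-journalnode:8485;hdfs-journalnode-2.hdfs-journalnode:8485/hdfscluster
```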