
bigdata-charts's Issues

HDFS chart : unable to put files in the filesystem

I am trying to copy a local file to HDFS, deployed with the Helm chart.

I am running:
helm install hdfs https://github.com/Gradiant/charts/releases/download/hdfs-0.1.0/hdfs-0.1.0.tgz -f hdfs-values.yaml

My hdfs-values.yaml file:

conf:
  coreSite:
  hdfsSite:
    dfs.replication: 2
dataNode:
  replicas: 2  
  pdbMinAvailable: 2  
  resources:
    requests:
      memory: "256Mi"
      cpu: "10m"
    limits:
      memory: "2048Mi"
      cpu: "1000m"

kubectl get pods shows that all pods are running and ready.

NAME                                  READY   STATUS      RESTARTS   AGE
hdfs-httpfs-5686fd75df-2pgk7          1/1     Running     0          59m
hdfs-namenode-0                       2/2     Running     1          59m
hdfs-datanode-0                       1/1     Running     0          59m
hdfs-datanode-1                       1/1     Running     0          58m

I use port forwarding to access the in-cluster HDFS from my local machine:

# namenode web UI
kubectl port-forward svc/hdfs-namenode 50070:50070

# hdfs port
kubectl port-forward hdfs-namenode-0 8020:8020

On my local machine, I have just unzipped a Hadoop 2 distribution (2.10.0) and updated core-site.xml like this, to use the forwarded port:

<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
    <description>The name of the default file system.  Either the
      literal string "local" or a host:port for NDFS.
    </description>
    <final>true</final>
  </property>
# ok:
hdfs dfs -mkdir /jars
# not ok:
hdfs dfs -put helloSpark.jar /jars
20/12/11 09:50:53 INFO hdfs.DataStreamer: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
        at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:259)
        at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1699)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1655)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:710)
20/12/11 09:50:53 WARN hdfs.DataStreamer: Abandoning BP-831521929-10.42.1.6-1607678530556:blk_1073741827_1003
20/12/11 09:50:53 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[10.42.1.5:50010,DS-4153f502-30da-42d7-a415-69601658066a,DISK]

I don't see any errors in the datanode logs, and in the namenode log the only error is the same as above.

Did I miss something in the configuration?

Thanks :)
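For context: an HDFS client first asks the namenode for block locations and then writes directly to the datanodes (port 50010 here), so forwarding only ports 50070 and 8020 is not enough; the pod IP 10.42.1.5 in the log is unreachable from the local machine. A workaround often used in this situation (an assumption about this setup, not something confirmed in the thread) is to also forward the datanode port and tell the client to address datanodes by hostname, e.g. in hdfs-site.xml:

<property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
    <description>Connect to datanodes by hostname instead of the
      (unroutable) pod IP returned by the namenode.
    </description>
</property>

combined with kubectl port-forward hdfs-datanode-0 50010:50010 and matching entries for the datanode hostnames in /etc/hosts.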

NameNode replicas not configurable

When we deploy the hbase helm chart, the namenode replicas should be configurable. Currently it is hard-coded to 1.

https://github.com/Gradiant/bigdata-charts/blob/master/charts/hdfs/templates/hdfs-nn-statefulset.yaml

This is a single point of failure. See output of kubectl get pods | grep hbase:

[ec2-user@ip-xyz ~]$ kubectl  get pods | grep hbase
dev-hbase-hbase-master-0                             1/1     Running   1          16m
dev-hbase-hbase-master-1                             1/1     Running   0          14m
dev-hbase-hbase-master-2                             1/1     Running   0          14m
dev-hbase-hbase-regionserver-0                       1/1     Running   2          16m
dev-hbase-hbase-regionserver-1                       1/1     Running   0          14m
dev-hbase-hbase-regionserver-2                       1/1     Running   0          14m
dev-hbase-hdfs-datanode-0                            1/1     Running   0          16m
dev-hbase-hdfs-datanode-1                            1/1     Running   0          15m
dev-hbase-hdfs-datanode-2                            1/1     Running   0          15m
dev-hbase-hdfs-httpfs-5fff67bcb8-8mkpj               1/1     Running   0          16m
dev-hbase-hdfs-namenode-0                            2/2     Running   2          16m
dev-hbase-zookeeper-0                                1/1     Running   0          16m
dev-hbase-zookeeper-1                                1/1     Running   0          16m
dev-hbase-zookeeper-2                                1/1     Running   0          16m

We are deploying this chart:

https://artifacthub.io/packages/helm/gradiant/hbase

File file:/user/hive/warehouse/helloworld does not exist

Environment:

  • GKE
  • Helm 3
  • Jupyter server

Steps to reproduce:

  1. helm install gradiant/hive --generate-name
  2. get ip address of hive server (thrift)
  3. Run this simple example code: https://saagie.zendesk.com/hc/en-us/articles/360007829439-Read-Write-from-Hive
from impala.dbapi import connect
from impala.util import as_pandas
import pandas as pd
import os
# Connection
conn = connect(host=os.environ['IP_HIVE'], port=10000, user=os.environ['USER'], 
              password=os.environ['PASSWORD'], auth_mechanism='PLAIN')
# Writing to a Hive table
cursor = conn.cursor()
cursor.execute('CREATE TABLE default.helloworld (hello STRING,world STRING)')
cursor.execute("insert into default.helloworld values ('hello1','world1')")

Expected: values are inserted

Actual:

  • Table is created
df = as_pandas(cursor)
print(df.head())

  tab_name
0 helloworld

  • Insert fails
	at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
	at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: File file:/user/hive/warehouse/helloworld2 does not exist
	at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2886)
	at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:3297)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:2022)
	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:360)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1232)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:255)
	... 11 more
Caused by: java.io.FileNotFoundException: File file:/user/hive/warehouse/helloworld2 does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
	at org.apache.hadoop.hive.io.HdfsUtils$HadoopFileStatus.<init>(HdfsUtils.java:211)
	at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2884)
	... 22 more

java exception during startup of opentsdb init container

ISSUE
The main opentsdb pod crashes after some time / some amount of data processed. In our case it is regular: after about two weeks of usage.
The opentsdb logs:

opentsdb-create-hbase-tables Checking if given compression is supported...
opentsdb-create-hbase-tables 2022-08-16 07:59:33,333 INFO  [main] metrics.MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesIm
opentsdb-create-hbase-tables 2022-08-16 07:59:33,350 INFO  [main] hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled
opentsdb-create-hbase-tables 2022-08-16 07:59:33,523 INFO  [main] zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
opentsdb-create-hbase-tables 2022-08-16 07:59:33,538 INFO  [main] compress.CodecPool: Got brand-new compressor [.gz]
opentsdb-create-hbase-tables 2022-08-16 07:59:33,542 INFO  [main] compress.CodecPool: Got brand-new compressor [.gz]
opentsdb-create-hbase-tables 2022-08-16 07:59:33,727 INFO  [main] hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled
opentsdb-create-hbase-tables 2022-08-16 07:59:33,741 INFO  [main] compress.CodecPool: Got brand-new decompressor [.gz]
opentsdb-create-hbase-tables SUCCESS
opentsdb-create-hbase-tables HBase script:
opentsdb-create-hbase-tables ----
opentsdb-create-hbase-tables create 'tsdb-uid',
opentsdb-create-hbase-tables {NAME => 'id', COMPRESSION => 'GZ', BLOOMFILTER => 'ROW'},
opentsdb-create-hbase-tables {NAME => 'name', COMPRESSION => 'GZ', BLOOMFILTER => 'ROW'}
opentsdb-create-hbase-tables create 'tsdb',
opentsdb-create-hbase-tables {NAME => 't', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW'}
opentsdb-create-hbase-tables create 'tsdb-tree',
opentsdb-create-hbase-tables {NAME => 't', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW'}
opentsdb-create-hbase-tables create 'tsdb-meta',
opentsdb-create-hbase-tables {NAME => 'name', COMPRESSION => 'GZ', BLOOMFILTER => 'ROW'}
opentsdb-create-hbase-tables ----
opentsdb-create-hbase-tables Checking if opentsdb tsdb-uid hbase table exists
opentsdb-create-hbase-tables java exception

PRECONDITIONS
chart version - 0.1.7
custom values:

config: {"tsd.http.request.enable_chunked":true,"tsd.http.request.max_chunk":65535}
daemons: 1
hbase:
  enabled: true
  hbase:
    master:
      replicas: 3
    regionServer:
      replicas: 3
      resources:
        requests:
          memory: 1Gi
          cpu: 100m
  hdfs:
    enabled: true
    dataNode:
      replicas: 3
    persistence:
      nameNode:
        enabled: true
        accessMode: ReadWriteOnce
        size: 5Gi
      dataNode:
        enabled: true
        accessMode: ReadWriteOnce
        size: 10Gi
  zookeeper:
    enabled: true
    replicaCount: 1
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchLabels:
                release: zookeeper
    persistence:
      enabled: true
      accessMode: ReadWriteOnce
      size: 1Gi
resources:
  requests:
    memory: 1Gi
    cpu: 100m

STEPS TO REPRODUCE
Install the chart with custom values and use it as normal for a while (about a week or two, depending on configuration).

HBASE-Readiness probe failed: Get http://10.42.1.252:16010/: dial tcp 10.42.1.252:16010: connect: connection refused

When using external zookeeper and deploying hbase, the following error occurs.

helm install hbase --namespace hadoop --set hdfs.enabled=false --set conf.hbaseSite."hbase.rootdir"="hdfs://hive-hdfs-namenode:8020/hbase" --set zookeeper.enabled=false --set conf.hbaseSite."hbase.zookeeper.quorum"="kafka-zookeeper-headless:2181" ./
---------logs------------
==> /opt/hbase-2.0.1/logs/hbase--master-hbase-hbase-master-0.log <==
2020-05-12 09:27:40,835 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.IllegalArgumentException: java.net.UnknownHostException: hbase-hdfs-namenode
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:320)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:358)
at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:407)
at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:383)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:681)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:590)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:472)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2903)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2921)
Caused by: java.net.UnknownHostException: hbase-hdfs-namenode
... 28 more
2020-05-12 09:27:40,846 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster.
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2910)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2921)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hbase-hdfs-namenode
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:320)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:358)
at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:407)
at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:383)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:681)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:590)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:472)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2903)
... 5 more
Caused by: java.net.UnknownHostException: hbase-hdfs-namenode
... 28 more

java.net.UnknownHostException: hadoop-hdfs-namenode

I am trying to install Hbase on openshift k8s cluster , I am getting below exception for HMaster to start

==> /opt/hbase-2.0.1/logs/hbase--master-my-hbase-hbase-master-0.log <==
2021-07-21 23:02:16,154 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.IllegalArgumentException: java.net.UnknownHostException: my-hbase-hdfs-namenode

at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)

Caused by: java.net.UnknownHostException: cstone-hbase-hdfs-namenode
... 28 more
2021-07-21 23:02:16,156 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster

I believe this is because of the HDFS NameNode connection, so I tried installing the hdfs chart. The pods run fine, but issuing any command throws the error below (similar to the one above):

1000760000@hadoop-hdfs-httpfs-9c76bf768-njdx6:/opt/hadoop-2.7.7$ hadoop fs -ls .
-ls: java.net.UnknownHostException: hadoop-hdfs-namenode
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]

cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.32.8.177 hadoop-hdfs-httpfs-9c76bf768-njdx6

Can somebody help me with this?

Presto Chart missing?

@cgiraldo I came across your prestosql repo on Helm Hub, but when I went to fetch it I got a nice little error that the release is missing :(

helm fetch gradiant/prestosql --version 0.1.0
Error: failed to fetch https://github.com/Gradiant/charts/releases/download/prestosql-0.1.0/prestosql-0.1.0.tgz : 404 Not Found

jupyter-0.1.7 chart only mounts /home/jovyan/git-notebooks.sh

I'm not sure if this is intentional. I'm not using gitNotebooks, and my PV for /home/jovyan contains several things (e.g. some special SSH keys I need). Even with gitNotebooks disabled, the chart only mounts the subpath git-notebooks.sh, so everything else from the PV is no longer there.

DNS issues in the Hadoop/HBase stack

Looks like most of the stack pods are failing to start after pod termination because DNS entries are not registered fast enough in headless services (this depends on the cloud vendor; in GKE it's up to 60s), for example:

hdfs-namenode-0 namenode java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfs-namenode

This may also influence other services (such as kafka).

Fix:

  • set spec.publishNotReadyAddresses: true in the headless services, which enforces registration of DNS entries even for pods that are not ready. Ready state is based on the pod readiness/liveness probes; if they pass, the pod is added to the endpoints. In this setup, publishing DNS entries for not-ready pods is not an issue; it is actually expected given the way the Hadoop stack was designed. The DNS entries should be added, and the Java apps themselves will handle the actual availability of the processes within the pods.
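A minimal sketch of the change on a headless service (resource names are illustrative, not the chart's rendered output):

apiVersion: v1
kind: Service
metadata:
  name: hdfs-namenode
spec:
  clusterIP: None                  # headless service
  publishNotReadyAddresses: true   # register DNS entries even before probes pass
  selector:
    app.kubernetes.io/component: namenode
  ports:
    - name: rpc
      port: 8020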

Reference:

It's not DNS

There's no way it's DNS

It was DNS

antiaffinity rules should allow custom rules

Right now there are anti-affinity settings 'soft' and 'hard', but they are hard-coded to specific entries.
I need more fine-grained affinity rules, such as spreading across availability zones in GKE.

So there are two options:

  • alter the current anti-affinity settings and move them from the templates to values.yaml - this is the most commonly used option
  • keep it as is and introduce a 'custom' setting, which would then use something like antiAffinitySpec as a passthrough value (see the sketch below)
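A rough sketch of the second option, using a hypothetical antiAffinitySpec passthrough key (not an existing chart value):

# values.yaml
antiAffinity: custom
antiAffinitySpec:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: topology.kubernetes.io/zone   # e.g. spread across GKE zones
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: hdfs

# statefulset template
      {{- if eq .Values.antiAffinity "custom" }}
      affinity:
        {{- toYaml .Values.antiAffinitySpec | nindent 8 }}
      {{- end }}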

[opentsdb] OpenTSDB chart should support passing env vars to init containers

Init containers use a shell script to set up the OpenTSDB tables in HBase.
This script uses certain env vars to adjust the table initialization, but the current implementation does not allow passing any env vars to the pods, so it is not really usable: the user is forced to use the defaults or control that part of the process manually.

Fix:

  • allow passing env vars from values.yaml to the containers in the statefulset pods (a sketch follows)
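A sketch of what this could look like; the env map in values.yaml is an assumed new key, while TSDB_TABLE and COMPRESSION are examples of variables the table-creation script reads:

# values.yaml
env:
  TSDB_TABLE: tsdb
  COMPRESSION: GZ

# init container in the statefulset template
          env:
            {{- range $name, $value := .Values.env }}
            - name: {{ $name }}
              value: {{ $value | quote }}
            {{- end }}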

Other notes:

  • AFAIR OpenTSDB does not have to be a statefulset; it can be a deployment (we have that implemented elsewhere, but with BigTable support and an extended termination period so that OpenTSDB can flush data from memory to HBase safely)
  • it would be rather beneficial to run the table creation as a separate batch job instead of an init container with each instance (create a new issue?)

hdfs rack awareness

HDFS uses rack awareness to control where to store replicas: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html

Notice this is not the same as anti-affinity; anti-affinity can only solve this issue partially.

The current setup does not support rack awareness. It is also quite complex and actually requires additional info about the nodes used to host the hdfs datanode pods, such as labels on the nodes.

Need to investigate this in the future; just adding this as an issue to keep it in mind.

HDFS HttpFS Authentication errors using current chart version

When I install the current chart version like this:

helm install hdfs gradiant/hdfs

I am unable to use httpfs to create directories or files:

http://httpfs-host/webhdfs/v1/user/loic/test?op=MKDIRS&user.name=loic gives me a 500 Internal Server Error:

{
  "RemoteException": {
    "message": "User: hdfs is not allowed to impersonate loic",
    "exception": "RemoteException",
    "javaClassName": "org.apache.hadoop.ipc.RemoteException"
  }
}

However, it works when I use the 0.1.0 version like this:

helm install hdfs https://github.com/Gradiant/charts/releases/download/hdfs-0.1.0/hdfs-0.1.0.tgz
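For reference, HttpFS impersonation is governed by Hadoop's proxyuser settings; a core-site.xml configuration along these lines normally allows the hdfs service user to impersonate end users (shown as a hint at the likely regression between the two chart versions, not a verified diff):

<property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hdfs.groups</name>
    <value>*</value>
</property>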

Hdfs NameNode Service not headless

Hi,

Is there any special reason for the missing "clusterIP: None" in the HDFS NameNode service, compared to the one in helm stable? If the service type is ClusterIP, the datanode IP will be masqueraded by the Kubernetes iptables and the namenode will register it under the worker node's IP address. This fails, for example, on a single-node MicroK8s with multiple datanodes: the namenode keeps overwriting the registration and only one datanode is registered at a time. I had some headache over this ;-)

BR
/Peter

Readiness probe failed: Get http://10.42.1.234:10002/: dial tcp 10.42.1.234:10002: connect: connection refused

When Hive was installed with Helm on Kubernetes, the hive-server pod reported the following error:
"Readiness probe failed: Get http://10.42.1.234:10002/: dial tcp 10.42.1.234:10002: connect: connection refused".
--------------logs----------
Configuring core
 - Setting fs.defaultFS=hdfs://hive-hdfs-namenode:8020
Configuring hdfs
Configuring yarn
Configuring httpfs
Configuring kms
Configuring mapred
Configuring hive
Configuring for multihomed network
mkdir: Permission denied: user=root, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
mkdir: Permission denied: user=root, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
chmod: `/tmp': No such file or directory
chmod: `/user/hive/warehouse': No such file or directory
2020-05-12 05:48:56: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Deployed the hive helm chart as is but liveness and readiness fails

Hi,
Thanks for this great helm chart, but when I installed it without changing any of the default values, hive-server kept crashing and restarting. I used the command helm install my-hive gradiant/hive --version 0.1.4.

Things I have already tried

  • I pinpointed the issue to the liveness and readiness probes of the hive-server statefulset failing.
  • I increased the resources given to hive and hive-metastore as well.
  • I made sure the firewall wasn't blocking any ports, but the probes still fail.

Environment

  • GKE
  • Master version - 1.16.13-gke.401

What else can I try? Any help would be appreciated

OpenTSDB does not need to be statefulset

I don't see a reason why opentsdb needs to be a statefulset.
It's a pretty stateless service, because it does not store anything on disk except temporary graphs.

I believe it is better to convert it to a deployment with a 60s graceful shutdown so it can flush data to the HBase backend.
It would also make it much easier to scale, especially based on CPU.

missing release for hbase 0.1.6

Hi!
I'm testing hbase and found your repository (I used it while it was still called gradiant/charts), specifically for hbase.

As testing goes, I had to reinstall and then hit a snag; I think the problem is that hbase is missing a release asset. Would you be so kind as to publish hbase-0.1.6?

helm install hbase bigdata-gradiant/hbase --set persistence.dataNode.storageClass=local-path,persistence.storageClass=local-path --debug
install.go:173: [debug] Original chart version: ""
Error: failed to fetch https://github.com/Gradiant/bigdata-charts/releases/download/hbase-0.1.6/hbase-0.1.6.tgz : 404 Not Found
helm.go:88: [debug] failed to fetch https://github.com/Gradiant/bigdata-charts/releases/download/hbase-0.1.6/hbase-0.1.6.tgz : 404 Not Found
helm.sh/helm/v3/pkg/getter.(*HTTPGetter).get
	helm.sh/helm/v3/pkg/getter/httpgetter.go:90
helm.sh/helm/v3/pkg/getter.(*HTTPGetter).Get
	helm.sh/helm/v3/pkg/getter/httpgetter.go:42
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).DownloadTo
	helm.sh/helm/v3/pkg/downloader/chart_downloader.go:99
helm.sh/helm/v3/pkg/action.(*ChartPathOptions).LocateChart
	helm.sh/helm/v3/pkg/action/install.go:704
main.runInstall
	helm.sh/helm/v3/cmd/helm/install.go:185
main.newInstallCmd.func2
	helm.sh/helm/v3/cmd/helm/install.go:120
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@…/command.go:852
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@…/command.go:960
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@…/command.go:897
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
	runtime/proc.go:225
runtime.goexit
	runtime/asm_amd64.s:1371

[hdfs] Datanode HADOOP_IDENT_STRING and USER env var not set in containers

The Hadoop startup script uses /opt/hadoop/sbin/hadoop-daemon.sh in the bootstrap process.
hadoop-daemon.sh uses some env vars that are expected to be set (some are optional), falling back to defaults for others; for example, if HADOOP_IDENT_STRING is not set, it falls back to the USER env var.

Unfortunately, in Docker the USER var is not set, so HADOOP_IDENT_STRING ends up as an empty string.
This means some commands relying on it fail or misbehave:

  • chown for the logs dir will fail with a message similar to chown: missing operand after ‘/opt/hadoop/logs’
  • log files will be named with two dashes:
bash-5.0# ls -la /opt/hadoop/logs/
total 208
drwxr-xr-x 2 root root   4096 Apr  3 13:21 .
drwxr-xr-x 1 root root   4096 Apr  3 13:21 ..
-rw-r--r-- 1 root root      0 Apr  3 13:21 SecurityAuth-root.audit
-rw-r--r-- 1 root root 188934 Apr  3 17:51 hadoop--datanode-hdfs-datanode-2.log
-rw-r--r-- 1 root root    730 Apr  3 13:21 hadoop--datanode-hdfs-datanode-2.out

... and so on.

Unfortunately, the current chart implementation does not support adding custom env vars to the pods.

Fix:
There are two ways:

  • quick: add to the current sts template:
env:
  - name: HADOOP_IDENT_STRING
    value: "root"
  • more complex - extend the templates to allow passing env vars per datanode/namenode from a map, as the incubating/zookeeper chart does (sketched below)
  • also, in the future, run as a non-root user
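A sketch of the more complex option, assuming a new per-component env map in values.yaml (key names are illustrative):

# values.yaml
dataNode:
  env:
    HADOOP_IDENT_STRING: root

# datanode statefulset template
          env:
            {{- range $name, $value := .Values.dataNode.env }}
            - name: {{ $name }}
              value: {{ $value | quote }}
            {{- end }}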

labels defined in helpers are used in selectors - makes normal upgrades impossible

Not sure if this is done on purpose or not.

(Probably all) charts use helpers in such a way that the common labels include helm.sh/chart:

{{- define "hdfs.labels" -}}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
app.kubernetes.io/instance: {{ .Release.Name | quote }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/part-of: {{ .Chart.Name }}
{{- end -}}

and then those labels are used in the selectors for statefulsets (and actually everything else as well).
Unfortunately, Kubernetes does not allow changing selectors and throws an error on upgrade:

Error: UPGRADE FAILED: cannot patch "h-hdfs-httpfs" with kind Deployment: Deployment.apps "h-hdfs-httpfs" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"httpfs", "app.kubernetes.io/instance":"h", "app.kubernetes.io/managed-by":"Helm", "app.kubernetes.io/name":"hdfs", "app.kubernetes.io/part-of":"hdfs", "app.kubernetes.io/version":"2.7.7", "helm.sh/chart":"hdfs-0.1.7"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && cannot patch "h-hdfs-datanode" with kind StatefulSet: StatefulSet.apps "h-hdfs-datanode" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden && cannot patch "h-hdfs-namenode" with kind StatefulSet: StatefulSet.apps "h-hdfs-namenode" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

Fix:

  • proper - remove helm.sh/chart, app.kubernetes.io/version and other offending labels from the selectors, or use a different, stripped-down selector (e.g. only .../managed-by, .../part-of, .../instance); see the sketch below
  • (workaround) use a local chart with the helm.sh/chart version manually changed to match the current deployment
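A sketch of the proper fix: keep a stable, version-free subset of labels for selectors and apply the full label set only to object metadata (helper names follow the common Helm convention and are not taken from this chart):

{{- define "hdfs.selectorLabels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name | quote }}
{{- end -}}

# in the statefulset/deployment:
  selector:
    matchLabels:
      {{- include "hdfs.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "hdfs.labels" . | nindent 8 }}
        {{- include "hdfs.selectorLabels" . | nindent 8 }}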

cannot copy files to hdfs: block under construction

Just tried to deploy hdfs using your helm charts and test it with the rclone beta client.

Unfortunately I am unable to copy files to hdfs. The client hangs, and in the logs I see

2021-01-12 11:57:06,270 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_1073741839_1015{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-1e01efed-b1b9-40bc-adc6-0abf2ae3ca00:NORMAL:10.244.0.25:50010|RBW], ReplicaUC[[DISK]DS-ab65c535-fabb-4e29-ad48-fb15ac3665a9:NORMAL:10.244.0.23:50010|RBW], ReplicaUC[[DISK]DS-52242411-b848-490b-9b28-9de6d5ad63ff:NORMAL:10.244.0.27:50010|RBW]]} for /test-folder/test.txt

hdfs support issue in rclone: rclone/rclone#42

[hbase] HBase master service should have clusterIP: None

Issue:

  • let's assume the helm chart release name was set to 'steve'
  • hadoop, hbase, zookeeper are deployed separately as standalone charts (not using umbrella)
  • charts/charts/hbase/templates/hbase-master-svc.yaml defines service for hbase-master, but it does not set .spec.clusterIP: None
  • when the hbase statefulset starts, it registers in zookeeper under /hbase/master with the hostname steve-hbase-master-0.steve-hbase-master.my-ns.svc.cluster.local
  • when the hbase region server connects to zookeeper, it reads /hbase/master and tries to resolve that hostname, but fails
  • as a result, the hbase region server is up and running but not connected to the hbase masters.

Fix:
In charts/charts/hbase/templates/hbase-master-svc.yaml, add clusterIP: None under the spec section:

spec:
  clusterIP: None

Hbase regionserver logs:

steve-hbase-regionserver-0 regionserver 2020-04-02 10:26:17,206 WARN  [regionserver/steve-hbase-regionserver-0:16020] regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
steve-hbase-regionserver-0 regionserver java.net.UnknownHostException: can not resolve steve-hbase-master-0.steve-hbase-master.data-metrics.svc.cluster.local,16000,1585809657345
steve-hbase-regionserver-0 regionserver   at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createAddr(AbstractRpcClient.java:430)
steve-hbase-regionserver-0 regionserver   at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:507)
steve-hbase-regionserver-0 regionserver   at org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionServerStatusStub(HRegionServer.java:2498)
steve-hbase-regionserver-0 regionserver   at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2546)
steve-hbase-regionserver-0 regionserver   at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:930)
steve-hbase-regionserver-0 regionserver   at java.lang.Thread.run(Thread.java:748)

Better docs

Is there a way we can work together to make better docs, so external tools like Spark, Hue or Drill can connect to the Hive cluster?

HDFS data node hostname issue

Hi! We are using the HDFS service from outside the Kubernetes cluster, so we are facing the HDFS datanode hostname issue.

Since the namenode only returns the internal datanode hostname/IP, services outside the Kubernetes cluster cannot reach the datanodes through that hostname/IP. Can anyone help with this scenario? Thanks!

Hbase Master: Permission Denied

I am getting the exception below while running the HBase chart on OpenShift. It looks like a permission issue; can anyone help me find a resolution?

cp: can't create '/opt/hbase-2.0.1/conf/hbase-site.xml': File exists
cp: can't create '/opt/hbase-2.0.1/conf/hbase-env.sh': File exists
running master, logging to /opt/hbase-2.0.1/bin/../logs/hbase--master-hbase-hbase-master-0.out
/opt/hbase-2.0.1/bin/hbase-daemon.sh: line 195: /opt/hbase-2.0.1/bin/../logs/hbase--master-hbase-hbase-master-0.out: Permission denied
head: /opt/hbase-2.0.1/bin/../logs/hbase--master-hbase-hbase-master-0.out: No such file or directory
running thrift, logging to /opt/hbase-2.0.1/bin/../logs/hbase--thrift-hbase-hbase-master-0.out
/opt/hbase-2.0.1/bin/hbase-daemon.sh: line 195: /opt/hbase-2.0.1/bin/../logs/hbase--thrift-hbase-hbase-master-0.out: Permission denied
head: /opt/hbase-2.0.1/bin/../logs/hbase--thrift-hbase-hbase-master-0.out: No such file or directory
Thu Jul 22 03:48:38 UTC 2021: Waiting for logs...
tail: can't open '/opt/hbase-2.0.1/logs/*': No such file or directory

helm install hdfs chart causes exception

> helm install hdfs bigdata-gradiant/hdfs

Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: [resource mapping not found for name: "hdfs-datanode" namespace: "" from "": no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"
ensure CRDs are installed first, resource mapping not found for name: "hdfs-namenode" namespace: "" from "": no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"
ensure CRDs are installed first]

macOS 13.4.1

helm version: version.BuildInfo{Version:"v3.12.2", GitCommit:"1e210a2c8cc5117d1055bfaa5d40f51bbc2e345e", GitTreeState:"clean", GoVersion:"go1.20.6"}

kubectl version:
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:14:48Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2", GitCommit:"7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647", GitTreeState:"clean", BuildDate:"2023-05-17T14:13:28Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}
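The failure comes from the PodDisruptionBudget templates still using policy/v1beta1, which was removed in Kubernetes 1.25 (the server here is 1.27). A sketch of what the manifest needs to render as on newer clusters (names and selector are illustrative):

apiVersion: policy/v1   # policy/v1beta1 no longer exists in k8s >= 1.25
kind: PodDisruptionBudget
metadata:
  name: hdfs-datanode
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: datanode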

[hbase] HBase master should support setting number of replicas

The current helm chart template does not support setting the number of replicas for HBase masters.
This may lead to noticeable unavailability of HBase in general.
HBase uses ZooKeeper to store information about which node is the master, so only one is active at a time.

Fix:

  • add support in the templates and values file for setting hbase.master.replicas (see the sketch below)
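A sketch of the change (the values path mirrors the issue description; the template file name is assumed):

# values.yaml
hbase:
  master:
    replicas: 3

# hbase-master-statefulset.yaml
spec:
  replicas: {{ .Values.hbase.master.replicas }}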

Error launching hive-metastore on k8 cluster on aws

Hi there!

My k8s cluster is installed using kops. Here is the Kubernetes and kubectl info:
Client Version: v1.24.3
Kustomize Version: v4.5.4
Server Version: v1.24.3

I first deployed hdfs and then proceeded to install hive-metastore. I have not overridden any values in the YAML.
Below is the current status of the pods:
NAME                           READY   STATUS             RESTARTS      AGE
hdfs-datanode-0                1/1     Running            0             8m40s
hdfs-datanode-1                1/1     Running            0             8m9s
hdfs-datanode-2                1/1     Running            0             7m49s
hdfs-httpfs-67d6d56bb6-w2z69   1/1     Running            0             8m40s
hdfs-namenode-0                2/2     Running            0             8m40s
hive-metastore-0               0/1     CrashLoopBackOff   5 (72s ago)   5m34s
hive-metastore-postgresql-0    1/1     Running            0             5m34s

The error I received is the following:
2022-08-22T16:16:27,275 ERROR [main] org.apache.hadoop.hive.metastore.HiveMetaStore - Metastore Thrift Server threw an exception...
org.apache.hadoop.hive.metastore.api.MetaException: java.net.UnknownHostException: hive-metastore-hdfs
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:83) ~[hive-exec-2.3.2.jar:2.3.2]
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92) ~[hive-exec-2.3.2.jar:2.3.2]
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6887) ~[hive-exec-2.3.2.jar:2.3.2]
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6882) ~[hive-exec-2.3.2.jar:2.3.2]
	at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:7140) ~[hive-exec-2.3.2.jar:2.3.2]
	at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:7067) [hive-exec-2.3.2.jar:2.3.2]

Can you please advise what I am missing?

Thank you!

Reconfigure logging of the apps so that they log to stdout and stderr

Current implementation of log processing has some drawbacks:

  • it spawns multiple processes in the containers - the main Java app plus additional tools like tail
  • tail -f on multiple files generates additional garbage (from tail itself)
  • logs arrive out of order and are hard to process with other tools

Fix:

  • for each application, adjust the logging parameters
    -- for the Hadoop stack, via log4j properties
    -- for OpenTSDB, by providing a logback.xml config

Another solution is to add sidecar containers for log processing.
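For the Hadoop stack, the log4j change would look roughly like this (a sketch; exact logger names depend on the images in use):

# log4j.properties
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

combined with running the daemons in the foreground (e.g. hdfs namenode rather than hadoop-daemon.sh start namenode), so nothing has to tail log files.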

[hdfs] canary pod actually does not test hdfs storage

Because the required files are not created/copied, the canary container is actually creating files within the container itself.
Running any other command will return the message

"FileSystem is file:///"

...and it does not exit with an error.

Fix:

  • ensure the canary runs under bash, imports the bootstrap, and sets exit codes (see the sketch below)

Will do PR.
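A rough sketch of what the fixed canary could run (the bootstrap path is an assumption based on the issue description):

#!/bin/bash
set -euo pipefail

# Load the same bootstrap/config the other pods use, so that
# fs.defaultFS points at the cluster and not the local filesystem.
. /opt/hadoop/bootstrap.sh   # assumed path

# Fail fast if we are still on the local filesystem.
fs=$(hdfs getconf -confKey fs.defaultFS)
if [[ "$fs" == file://* ]]; then
  echo "FileSystem is $fs - HDFS config not loaded" >&2
  exit 1
fi

# Round-trip a test file through HDFS; set -e turns any failure into a non-zero exit.
echo "canary $(date +%s)" > /tmp/canary.txt
hdfs dfs -put -f /tmp/canary.txt /tmp/canary.txt
hdfs dfs -cat /tmp/canary.txt
hdfs dfs -rm -skipTrash /tmp/canary.txt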

[hdfs] setting namenode replicas to something else than 1 may lead to cluster instability

According to the official Hadoop docs, running more than one namenode requires shared storage, for example NFS, or running the Quorum Journal Manager. [1]

Fix:

  • temporary - add a comment that namenode.replicas: 1 should not be changed (or drop the setting)
  • proper - implement journalnodes (this also requires changes in https://github.com/Gradiant/dockerized-hadoop to add hadoop-journalnode, plus building and publishing the images) with at least 3 replicas and anti-affinity rules.

References:
[1] https://hadoop.apache.org/docs/r2.7.7/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
