Hello,
Objective
I want to deploy this chart to an on-prem Kubernetes cluster with a catalog that uses minio for the hive metastore and backend storage. For reference, see -
https://blog.minio.io/building-an-on-premise-ml-ecosystem-with-minio-powered-by-presto-weka-r-and-s3select-feature-fefbbaa87054
Reason to think it should work
I have managed to use the prestosql/presto docker image to achieve this successfully on a dev machine (just docker, no k8s). This proof of concept used 1 minio container, 1 presto container, and 1 jupyter python container to connect to presto and push/pull data.
To get it working, I had to sort out networking and create the right catalog file. The catalog file I'm using is -
# lake.catalog
connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=s3://presto/
hive.allow-drop-table=true
hive.s3.aws-access-key=<USER>
hive.s3.aws-secret-key=<PASSWORD>
hive.s3.endpoint=<URL>
hive.s3.path-style-access=true
hive.s3.ssl.enabled=false
hive.s3select-pushdown.enabled=true
hive.storage-format=parquet
How far I've gotten with wiwdata/presto-chart
I have a jupyter notebook up in the cluster. I have minio up in the cluster.
- I am able to bring up a wiwdata presto cluster with defaults.
- I am able to interact with presto using the python notebook. I am also able to interact with the minio server using the python notebook. So networking is posing no issues.
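For reference, the kind of reachability check I mean can be as simple as a TCP connect from the notebook. This is only a minimal sketch: the helper name `can_connect` is made up for illustration, and the host and port in the usage comment are the minio service address from my configmap.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and opens the socket;
        # DNS failures, refusals and timeouts are all subclasses of OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (assumed service address, run from a pod inside the cluster):
# can_connect("minio-service.default.svc.cluster.local", 9000)
```

This only shows that the port is open, not that the S3 credentials are accepted.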
Where I am stuck
I cannot bring up a presto cluster with a working catalog configmap. This is what my configmap looks like -
---
# Source: presto/templates/configmap-catalog.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: presto-catalog
  labels:
    app: presto
    chart: presto-1
    release: presto-1579859901
data:
  hive.properties: |
    connector.name=hive-hadoop2
    hive.metastore=file
    hive.metastore.catalog.dir=s3://presto/
    hive.allow-drop-table=true
    hive.s3.aws-access-key=<USER>
    hive.s3.aws-secret-key=<PASSWORD>
    hive.s3.endpoint="minio-service.default.svc.cluster.local:9000"
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=false
    hive.s3select-pushdown.enabled=true
    hive.storage-format=parquet
---
**All other values in values.yaml are unchanged.**
Unfortunately, the cluster keeps entering a crashloop.
riaz@k3s-dev:~/presto-chart$ sudo kubectl get all
NAME READY STATUS RESTARTS AGE
pod/kubernetes-cockpit-tlnsw 1/1 Running 0 3h54m
pod/minio-69c5c44c7c-74dkh 1/1 Running 0 136m
pod/presto-1579859901-worker-845cd7cb9c-2tkzz 0/1 CrashLoopBackOff 3 3m45s
pod/presto-1579859901-worker-845cd7cb9c-hh295 0/1 CrashLoopBackOff 3 3m45s
pod/presto-1579859901-coordinator-7df8fc5c45-m699k 0/1 CrashLoopBackOff 3 3m45s
NAME DESIRED CURRENT READY AGE
replicationcontroller/kubernetes-cockpit 1 1 1 3h54m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 13h
service/workbench ExternalName <none> proxy-public.workbench.svc.cluster.local 80/TCP 13h
service/kubernetes-cockpit ClusterIP 10.43.32.92 <none> 443/TCP 3h54m
service/minio-service ClusterIP 10.43.102.159 <none> 9000/TCP 136m
service/presto-1579859901 ClusterIP 10.43.214.17 <none> 80/TCP 3m45s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/minio 1/1 1 1 136m
deployment.apps/presto-1579859901-worker 0/2 2 0 3m45s
deployment.apps/presto-1579859901-coordinator 0/1 1 0 3m45s
NAME DESIRED CURRENT READY AGE
replicaset.apps/minio-69c5c44c7c 1 1 1 136m
replicaset.apps/presto-1579859901-worker-845cd7cb9c 2 2 0 3m45s
replicaset.apps/presto-1579859901-coordinator-7df8fc5c45 1 1 0 3m45s
I will post the logs in the next message so that they don't clog up this one, but the salient error (I think) is this -
2020-01-24T11:13:07.350Z ERROR main com.facebook.presto.server.PrestoServer Unable to create injector, see the following errors:
1) Explicit bindings are required and com.facebook.presto.hive.authentication.HdfsAuthentication is not explicitly bound.
while locating com.facebook.presto.hive.authentication.HdfsAuthentication
for the 3rd parameter of com.facebook.presto.hive.HdfsEnvironment.<init>(HdfsEnvironment.java:50)
at com.facebook.presto.hive.HiveClientModule.configure(HiveClientModule.java:68)
2) Explicit bindings are required and com.facebook.presto.hive.s3.S3ConfigurationUpdater is not explicitly bound.
while locating com.facebook.presto.hive.s3.S3ConfigurationUpdater
for the 2nd parameter of com.facebook.presto.hive.HdfsConfigurationUpdater.<init>(HdfsConfigurationUpdater.java:77)
at com.facebook.presto.hive.HiveClientModule.configure(HiveClientModule.java:66)
3) Error: Could not coerce value 'parquet' to com.facebook.presto.hive.HiveStorageFormat (property 'hive.storage-format') in order to call [public com.facebook.presto.hive.HiveClientConfig com.facebook.presto.hive.HiveClientConfig.setHiveStorageFormat(com.facebook.presto.hive.HiveStorageFormat)]
4) Configuration property 'hive.s3.aws-access-key' was not used
at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)
5) Configuration property 'hive.s3.aws-secret-key' was not used
at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)
6) Configuration property 'hive.s3.endpoint' was not used
at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)
7) Configuration property 'hive.s3.path-style-access' was not used
at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)
8) Configuration property 'hive.s3.ssl.enabled' was not used
at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)
9) Configuration property 'hive.s3select-pushdown.enabled' was not used
at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)
10) Configuration property 'hive.storage-format' was not used
at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)
10 errors
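To pull the rejected keys out of the full log rather than reading them by eye, something like the following works as a rough sketch. It assumes only the "was not used" wording seen in the errors above; `unused_properties` is a hypothetical helper name, not part of presto or any library.

```python
import re
from typing import List

# Matches lines such as:
#   4) Configuration property 'hive.s3.aws-access-key' was not used
UNUSED_RE = re.compile(r"Configuration property '([^']+)' was not used")


def unused_properties(log_text: str) -> List[str]:
    """Return the property names reported as unused, in order, without duplicates."""
    seen: List[str] = []
    for name in UNUSED_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample = """\
4) Configuration property 'hive.s3.aws-access-key' was not used
at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)
6) Configuration property 'hive.s3.endpoint' was not used
"""
print(unused_properties(sample))
```

Comparing that list against the keys in hive.properties shows which settings the running build does not recognise.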
This suggests to me that the docker image used by this chart has been built with a version of presto that doesn't support the s3-backed hive metastore.
Is this correct? If so, could I build an updated one?