cuebook / cuelake
Use SQL to build ELT pipelines on a data lakehouse.
Home Page: https://cuelake.cuebook.ai
License: Apache License 2.0
During development, CueLake should automatically port-forward zeppelin-server in a given namespace. The namespace can be provided via an environment variable.
Similarly, create port-forwards for zeppelin-job-server when running a job in the dev environment.
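A minimal sketch of how the port-forward command could be assembled, assuming `kubectl` is available on the developer's machine. The environment variable name `POD_NAMESPACE`, the helper name `port_forward_command`, and the port numbers in the usage note are assumptions for illustration, not something the issue specifies.

```python
import os
import subprocess


def port_forward_command(service, local_port, remote_port, namespace=None):
    """Build the kubectl argument list for forwarding a service port.

    The namespace falls back to the POD_NAMESPACE environment variable
    (an assumed name) and then to "default".
    """
    ns = namespace or os.environ.get("POD_NAMESPACE", "default")
    return [
        "kubectl",
        "port-forward",
        "--namespace",
        ns,
        f"service/{service}",
        f"{local_port}:{remote_port}",
    ]


def start_port_forward(service, local_port, remote_port, namespace=None):
    """Start kubectl in the background and return the process handle."""
    cmd = port_forward_command(service, local_port, remote_port, namespace)
    return subprocess.Popen(cmd)
```

In a dev setup this might be called as `start_port_forward("zeppelin-server", 8080, 80)` at startup, and again with the job server's service name when a job is launched; the caller is responsible for terminating the returned process.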
Currently the sorting logic is very complicated; replace it with a simpler, more readable function.
Test the following scenarios on Hive metastore for Iceberg, Delta and Parquet tables:
Test and fix the behaviour of hive metastore on both S3 and GCS.
Use the latest version of iceberg and delta jars and also upgrade the spark version if required.
Do not inherit from PeriodicTask; use a foreign key instead.
Currently, logs are just JSON dumps. Copy the parser code from Zeppelin and implement it in CueLake so that the logs look the same as they do in Zeppelin.
Is your feature request related to a problem? Please describe.
Your current system supports Zeppelin notebooks. We have a lot of notebooks designed with Jupyter, and we have tons of tooling around them. It's a tremendous effort to shift these. Requesting support for Jupyter notebooks besides Zeppelin.
Describe the solution you'd like
Ability to run Jupyter notebooks
Describe alternatives you've considered
Tools to convert from Jupyter to Zeppelin. But that's a lot of work internally.
Is your feature request related to a problem? Please describe.
Can we use MinIO as an S3-compatible store for Apache Iceberg?
Describe the solution you'd like
Can we use MinIO as an S3-compatible store for Apache Iceberg?
Describe alternatives you've considered
If we can use MinIO, we need the steps to configure MinIO with CueLake.
Additional context
Can we use MinIO as an S3-compatible store for Apache Iceberg?
There is a syntax error (missing comma) on line 1275 in https://raw.githubusercontent.com/cuebook/cuelake/main/zeppelinConf/interpreter.json
Also, there is a \t on lines 1271 and 1272 that I suspect is incorrect.
And finally, if you use less to view the content, the C in the word Comma on line 201 is not displayed as a plain letter C, which suggests a non-ASCII look-alike character.
Below is a diff of the changes that I made to the file.
201c201
< "description": "Сomma separated schema (schema \u003d catalog \u003d database) filters to get metadata for completions. Supports \u0027%\u0027 symbol is equivalent to any set of characters. (ex. prod_v_%,public%,info)"
---
> "description": "Comma separated schema (schema \u003d catalog \u003d database) filters to get metadata for completions. Supports \u0027%\u0027 symbol is equivalent to any set of characters. (ex. prod_v_%,public%,info)"
1271,1272c1271,1272
< "spark.executor.extraJavaOptions\t": {
< "name": "spark.executor.extraJavaOptions\t",
---
> "spark.executor.extraJavaOptions": {
> "name": "spark.executor.extraJavaOptions",
1275c1275
< }
---
> },
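The three kinds of problem reported above (a syntax error, stray whitespace in a key or value, and a non-ASCII look-alike character) can be caught mechanically. The following is a rough sketch using only Python's standard `json` module; the function name `lint_json_text` is hypothetical and the checks are deliberately simple, so legitimate non-ASCII text would also be reported.

```python
import json


def lint_json_text(text):
    """Return a list of human-readable problems found in a JSON document."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # A missing comma surfaces here as a decode error.
        return [f"syntax error at line {exc.lineno}, column {exc.colno}: {exc.msg}"]

    problems = []

    def check_string(value, path, kind):
        # Flag tabs/spaces at either end, e.g. "spark.executor.extraJavaOptions\t".
        if value != value.strip():
            problems.append(f"{path}: {kind} {value!r} has leading or trailing whitespace")
        # Flag characters outside ASCII, e.g. a Cyrillic letter that looks like "C".
        odd = sorted({ch for ch in value if ord(ch) > 127})
        if odd:
            codes = ", ".join(f"U+{ord(ch):04X}" for ch in odd)
            problems.append(f"{path}: {kind} contains non-ASCII characters ({codes})")

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                check_string(key, path, "key")
                walk(value, f"{path}/{key.strip()}")
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}/{index}")
        elif isinstance(node, str):
            check_string(node, path, "value")

    walk(data, "")
    return problems
```

Running it over the contents of `interpreter.json` before committing would report each issue with the path of the offending key, which makes a hand-written diff like the one above easier to verify.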
Some model names are not very apt. Change the following model names:
RunStatus -> NotebookRunLogs
WorkflowRuns -> WorkflowRunLogs
Jobs stay in running status when the pod gets killed or evicted due to constraints. Save the state in the database and start the job again when the pod comes back up.
The dashboard will show all the workspaces and their resources.
CueLake will start with 0 workspaces.
The user can add a workspace from the dashboard.
For a workspace, the following info will be shown:
Ask for the following info while creating a workspace:
Describe the bug
The default RBAC role is missing pods as a resource, which causes exceptions in lakehouse as shown below.
```
27.0.0.1 - - [27/May/2021:06:14:14 +0000] "GET /api/genie/notebooks/0 HTTP/1.1" 200 68 "http://127.0.0.1:8080/notebooks" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
Internal Server Error: /api/genie/driverAndExecutorStatus/
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.7/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/django/views/generic/base.py", line 70, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/code/genie/views.py", line 243, in get
    res = KubernetesServices.getDriversCount()
  File "/code/genie/services/services.py", line 657, in getDriversCount
    ret = v1.list_namespaced_pod(POD_NAMESPACE, watch=False)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 15302, in list_namespaced_pod
    return self.list_namespaced_pod_with_http_info(namespace, **kwargs) # noqa: E501
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 15427, in list_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 377, in request
    headers=headers)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 243, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 233, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '96c45951-281d-41d5-908d-b6429974a4dd', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Thu, 27 May 2021 06:14:14 GMT', 'Content-Length': '282'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:cuelake:default\" cannot list resource \"pods\" in API group \"\" in the namespace \"cuelake\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
```
***Workaround***
A workaround is to add "pods" as a resource in the default-role in cuelake.yaml.
```
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: default-role
rules:
- apiGroups: [""]
  resources: ["pods", "configmaps"]
  verbs: ["create", "get", "update", "patch", "list", "delete", "watch"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["roles", "rolebindings"]
  verbs: ["bind", "create", "get", "update", "patch", "list", "delete", "watch"]
```
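To see why adding "pods" resolves the 403, it can help to model how a Role's rules are matched against a request. The sketch below is a simplified approximation in plain Python, not the Kubernetes authorizer itself: it ignores `resourceNames`, subresources, and non-resource URLs, and the function name `role_allows` is hypothetical. The rules are written as Python dicts mirroring the YAML above.

```python
def role_allows(rules, api_group, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `api_group`.

    Simplified model: a rule matches when the group, resource, and verb
    are each listed explicitly or covered by the "*" wildcard.
    """
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])
        if (
            (api_group in groups or "*" in groups)
            and (resource in resources or "*" in resources)
            and (verb in verbs or "*" in verbs)
        ):
            return True
    return False


# Rules from the workaround above, expressed as Python data.
WORKAROUND_RULES = [
    {
        "apiGroups": [""],
        "resources": ["pods", "configmaps"],
        "verbs": ["create", "get", "update", "patch", "list", "delete", "watch"],
    },
    {
        "apiGroups": ["rbac.authorization.k8s.io"],
        "resources": ["roles", "rolebindings"],
        "verbs": ["bind", "create", "get", "update", "patch", "list", "delete", "watch"],
    },
]

# The failing call was a "list" on "pods" in the core ("") API group.
print(role_allows(WORKAROUND_RULES, "", "pods", "list"))
```

On a live cluster, `kubectl auth can-i list pods --as=system:serviceaccount:cuelake:default -n cuelake` performs the authoritative version of this check.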