googlecloudplatform / spark-on-k8s-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
License: Apache License 2.0
Right now the onDelete handler in the SparkApplication controller doesn't really do anything. This is the last opportunity to clean up the pods and services created for the application, so it seems like deleteDriverAndUIService() should be called in onDelete().
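A minimal sketch of that idea (the receiver type and helper signature here are hypothetical; the real controller's types differ):

// Sketch only: last chance to clean up the driver pod and UI service created
// for the application before the SparkApplication object disappears.
func (c *Controller) onDelete(obj interface{}) {
	app, ok := obj.(*v1alpha1.SparkApplication)
	if !ok {
		return
	}
	if err := c.deleteDriverAndUIService(app); err != nil {
		glog.Errorf("failed to clean up driver pod and UI service for SparkApplication %s: %v", app.Name, err)
	}
}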
The operator currently does not support PySpark, which is available now in the master branch of Spark. The following changes are needed to make the operator support PySpark in the master branch:
Mounting volumes can be a way of making users' application dependencies available, in addition to HTTP/HTTPS servers, GCS, S3, etc.
Currently, there is no implementation for s3 - https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/sparkctl/cmd/create.go#L273
One solution, reusing the existing flags and environment variables (--project and GOOGLE_APPLICATION_CREDENTIALS), could be to treat --project as the S3 profile and use some other environment variable to point to a configuration file, structured as:
[<project>]
s3_access_key_id=<key>
s3_secret_access_key=<secret>
region=<region>
output=json
Would that be a reasonable approach?
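For illustration, reading such a profile file could look roughly like this (a self-contained sketch, not sparkctl code; the file path and profile name are made up):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// readS3Profile reads an AWS-CLI-style INI file and returns the key/value
// pairs under the given [profile] section. Purely illustrative.
func readS3Profile(path, profile string) (map[string]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	values := make(map[string]string)
	inSection := false
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			inSection = strings.Trim(line, "[]") == profile
			continue
		}
		if !inSection || line == "" {
			continue
		}
		if parts := strings.SplitN(line, "=", 2); len(parts) == 2 {
			values[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
		}
	}
	return values, scanner.Err()
}

func main() {
	creds, err := readS3Profile("/path/to/s3-credentials", "my-project")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(creds["s3_access_key_id"], creds["region"])
}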
Spark driver pods need certain permissions to be able to create and watch executor pods. The service account used by the driver pod must have the appropriate permissions for the driver to be able to do its work. The Spark operator should support automatically creating a service account with the necessary permissions for the driver pods to run. Specifically, it needs to create a service account, a ClusterRole, and a ClusterRoleBinding. It probably also should support automatically creating the namespace of a SparkApplication if it doesn't exist.
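For illustration only, creating those objects with client-go could look roughly like this (assumes a recent client-go; the names and the rule set are placeholders, not necessarily what the operator would use):

package driverrbac

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ensureDriverRBAC sketches creating a service account plus a ClusterRole and
// ClusterRoleBinding granting the driver the pod-related permissions it needs.
func ensureDriverRBAC(ctx context.Context, client kubernetes.Interface, ns string) error {
	sa := &corev1.ServiceAccount{ObjectMeta: metav1.ObjectMeta{Name: "spark-driver", Namespace: ns}}
	if _, err := client.CoreV1().ServiceAccounts(ns).Create(ctx, sa, metav1.CreateOptions{}); err != nil {
		return err
	}
	role := &rbacv1.ClusterRole{
		ObjectMeta: metav1.ObjectMeta{Name: "spark-driver-role"},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{""},
			Resources: []string{"pods", "services", "configmaps"},
			Verbs:     []string{"create", "get", "list", "watch", "delete"},
		}},
	}
	if _, err := client.RbacV1().ClusterRoles().Create(ctx, role, metav1.CreateOptions{}); err != nil {
		return err
	}
	binding := &rbacv1.ClusterRoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "spark-driver-role-binding"},
		Subjects:   []rbacv1.Subject{{Kind: "ServiceAccount", Name: sa.Name, Namespace: ns}},
		RoleRef:    rbacv1.RoleRef{APIGroup: "rbac.authorization.k8s.io", Kind: "ClusterRole", Name: role.Name},
	}
	_, err := client.RbacV1().ClusterRoleBindings().Create(ctx, binding, metav1.CreateOptions{})
	return err
}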
Hello,
We are using our private Docker repository for our images, and as a result, when specifying an image for a pod's container we have to specify imagePullSecrets.
Right now, as far as I can see, this is not supported. I can see that normal Secrets are supported, but not imagePullSecrets, which are used to authenticate with a Docker repository. Please correct me if I'm wrong.
Do you plan to add this support? I can create a PR for that if that helps.
Thank you.
AG
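For reference, imagePullSecrets is a standard field on the Kubernetes pod spec, so supporting it would essentially mean letting the operator set something like the following on the driver and executor pods (illustrative snippet; the secret name is made up and the secret must already exist in the namespace):

package podutil

import corev1 "k8s.io/api/core/v1"

// addImagePullSecret appends a reference to an existing docker-registry secret
// so the kubelet can authenticate against a private image registry,
// e.g. addImagePullSecret(driverPod, "my-registry-secret").
func addImagePullSecret(pod *corev1.Pod, secretName string) {
	pod.Spec.ImagePullSecrets = append(pod.Spec.ImagePullSecrets,
		corev1.LocalObjectReference{Name: secretName})
}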
Currently we repeatedly run the sparkctl log command to get new log lines. It would be good to have a -f option that shows the log as it's updated.
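Agreed; under the hood this could use the same pod log-streaming API that kubectl logs -f uses. A rough, self-contained sketch (not sparkctl's actual code; assumes a recent client-go, and the pod name is made up):

package main

import (
	"context"
	"io"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// followDriverLog streams a pod's log to stdout until the pod terminates,
// similar in spirit to `kubectl logs -f`.
func followDriverLog(client kubernetes.Interface, namespace, podName string) error {
	req := client.CoreV1().Pods(namespace).GetLogs(podName, &corev1.PodLogOptions{Follow: true})
	stream, err := req.Stream(context.Background())
	if err != nil {
		return err
	}
	defer stream.Close()
	_, err = io.Copy(os.Stdout, stream)
	return err
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	if err := followDriverLog(client, "default", "spark-pi-driver"); err != nil {
		panic(err)
	}
}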
SparkApplication events can be exported to Stackdriver logging using https://github.com/GoogleCloudPlatform/k8s-stackdriver/tree/master/event-exporter. Event exporting to Stackdriver logging is enabled by default on GKE. We need better documentation.
Starting a couple of discussions around specification of properties.
spec:
  deps: {}
  driver:
    cores: "0.1"
    image: liyinan926/spark-driver:v2.3.0
  executor:
    image: liyinan926/spark-executor:v2.3.0
    instances: 1
    memory: 512m
Should this mimic the driver/executor properties more closely? For example: spark.kubernetes.driver.limit.cores.
/xref: https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/running-on-kubernetes.html
It's been changed upstream to combine the images for the driver, executor, and init-container into a single image (apache/spark#20192). The Spark operator needs to be updated to support that.
I noticed that whenever both hadoopConfigMap and sparkConfigMap are specified, an exception is raised. However, if I specify either one alone, it works, e.g.:
apiVersion: "sparkoperator.k8s.io/v1alpha1"
kind: SparkApplication
metadata:
name: secure-eos-events-select
namespace: default
spec:
type: Scala
mode: cluster
image: "gitlab-registry.cern.ch/db/spark-service/docker-registry/spark-on-k8s:v2.3.0-xrootd-s3"
imagePullPolicy: Always
mainClass: org.sparkservice.sparkrootapplications.examples.EventsSelect
mainApplicationFile: "local:///opt/spark/examples/jars/spark-service-examples_2.11-0.0.1.jar"
arguments:
- "root://eosuser.cern.ch/eos/user/p/pmrowczy/Downloads/evsel.root"
deps:
jars:
- http://central.maven.org/maven2/org/diana-hep/root4j/0.1.6/root4j-0.1.6.jar
- http://central.maven.org/maven2/org/apache/bcel/bcel/5.2/bcel-5.2.jar
- http://central.maven.org/maven2/org/diana-hep/spark-root_2.11/0.1.16/spark-root_2.11-0.1.16.jar
#hadoopConfigMap: secure-eos-events-select-hadoop-conf #<- does not work with sparkConfigMap
sparkConfigMap: secure-eos-events-select-spark-conf #<- does not work with hadoopConfigMap
sparkConf:
"spark.kubernetes.driverEnv.KRB5CCNAME": /usr/lib/hadoop/etc/hadoop/krb5cc_0
"spark.executorEnv.KRB5CCNAME": /usr/lib/hadoop/etc/hadoop/krb5cc_0
# "spark.kubernetes.driverEnv.KRB5CCNAME": /etc/hadoop/conf/krb5cc_0
# "spark.executorEnv.KRB5CCNAME": /etc/hadoop/conf/krb5cc_0
driver:
cores: 1
coreLimit: "1000m"
memory: "1024m"
labels:
version: 2.3.0
serviceAccount: spark
configMaps:
- name: secure-eos-events-select-hadoop-conf #<- does work
path: /usr/lib/hadoop/etc/hadoop/
# configMaps:
# - name: secure-eos-events-select-spark-conf #<- does work
# path: /opt/spark/conf
executor:
instances: 1
cores: 2
memory: "2048m"
labels:
version: 2.3.0
configMaps:
- name: secure-eos-events-select-hadoop-conf #<- does work
path: /usr/lib/hadoop/etc/hadoop/
# configMaps:
# - name: secure-eos-events-select-spark-conf #<- does work
# path: /opt/spark/conf
restartPolicy: Never
Below is the exception raised:
Name: secure-eos-events-select
Namespace: default
Labels: <none>
Annotations: <none>
API Version: sparkoperator.k8s.io/v1alpha1
Kind: SparkApplication
Metadata:
Cluster Name:
Creation Timestamp: 2018-05-24T13:51:57Z
Generation: 1
Resource Version: 104967
Self Link: /apis/sparkoperator.k8s.io/v1alpha1/namespaces/default/sparkapplications/secure-eos-events-select
UID: a028cf32-5f59-11e8-a690-02163e01c159
Spec:
Arguments:
root://eosuser.cern.ch/eos/user/p/pmrowczy/Downloads/evsel.root
Deps:
Jars:
http://central.maven.org/maven2/org/diana-hep/root4j/0.1.6/root4j-0.1.6.jar
http://central.maven.org/maven2/org/apache/bcel/bcel/5.2/bcel-5.2.jar
http://central.maven.org/maven2/org/diana-hep/spark-root_2.11/0.1.16/spark-root_2.11-0.1.16.jar
Driver:
Core Limit: 1000m
Cores: 1
Labels:
Version: 2.3.0
Memory: 1024m
Service Account: spark
Executor:
Cores: 2
Instances: 1
Labels:
Version: 2.3.0
Memory: 2048m
Hadoop Config Map: secure-eos-events-select-hadoop-conf
Image: gitlab-registry.cern.ch/db/spark-service/docker-registry/spark-on-k8s:v2.3.0-xrootd-s3
Image Pull Policy: Always
Main Application File: local:///opt/spark/examples/jars/spark-service-examples_2.11-0.0.1.jar
Main Class: org.sparkservice.sparkrootapplications.examples.EventsSelect
Mode: cluster
Restart Policy: Never
Spark Config Map: secure-eos-events-select-spark-conf
Type: Scala
Status:
App Id: secure-eos-events-select-4075353935
Application State:
Error Message: Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [default] failed.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:363)
at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$3.apply(KubernetesClientApplication.scala:145)
at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$3.apply(KubernetesClientApplication.scala:144)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2585)
at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:144)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:235)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:227)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2585)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:227)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:192)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketTimeoutException: timeout
at okio.Okio$4.newTimeoutException(Okio.java:230)
at okio.AsyncTimeout.exit(AsyncTimeout.java:285)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:345)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:217)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:211)
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
at okhttp3.RealCall.execute(RealCall.java:69)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:226)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:769)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:356)
... 14 more
Caused by: java.net.SocketException: Socket closed
at java.net.SocketInputStream.read(SocketInputStream.java:204)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at okio.Okio$2.read(Okio.java:139)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
... 41 more
State: SUBMISSION_FAILED
Completion Time: <nil>
Driver Info:
Web UI Port: 30289
Web UI Service Name: secure-eos-events-select-4075353935-ui-svc
Submission Time: 2018-05-24T13:51:57Z
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SparkApplicationAdded 17s spark-operator SparkApplication secure-eos-events-select was added, enqueued it for submission
Warning SparkApplicationSubmissionFailed 3s spark-operator SparkApplication secure-eos-events-select failed submission: Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [default] failed.
Here's the full stack trace:
I0117 19:27:35.500409 1 main.go:71] Checking the kube-dns add-on
I0117 19:27:35.523940 1 main.go:76] Starting the Spark operator
I0117 19:27:35.524650 1 controller.go:136] Starting the SparkApplication controller
I0117 19:27:35.524664 1 controller.go:138] Creating the CustomResourceDefinition sparkapplications.sparkoperator.k8s.io
W0117 19:27:35.539429 1 crd.go:69] CustomResourceDefinition sparkapplications.sparkoperator.k8s.io already exists
I0117 19:27:35.539489 1 controller.go:144] Starting the SparkApplication informer
I0117 19:27:35.639770 1 controller.go:151] Starting the workers of the SparkApplication controller
I0117 19:27:35.640005 1 spark_pod_monitor.go:109] Starting the Spark Pod monitor
I0117 19:27:35.640012 1 spark_pod_monitor.go:112] Starting the Pod informer of the Spark Pod monitor
I0117 19:27:35.640065 1 initializer.go:112] Starting the Spark Pod initializer
I0117 19:27:35.640073 1 initializer.go:164] Adding the InitializerConfiguration spark-pod-initializer-config
I0117 19:27:35.640194 1 submission_runner.go:58] Starting the spark-submit runner
F0117 19:27:35.645496 1 main.go:96] failed to create InitializerConfiguration spark-pod-initializer-config: the server could not find the requested resource (post initializerconfigurations.admissionregistration.k8s.io)
I am able to run Python Spark applications without any issues, but when I add the schedule and change from SparkApplication to ScheduledSparkApplication (with other required changes), it fails to start any pods. It creates the UI service, which fails with the following, and nothing else happens.
Error creating load balancer (will retry): error getting LB for service default/spark-my-wc-py-1525717818370411106-365697849-ui-svc: Service(default/spark-my-wc-py-1525717818370411106-365697849-ui-svc) - Loadbalancer not found
Env:
I am running the 2.2 fork which has the python support.
kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-09T21:51:54Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Here is the deployment YAML for the ScheduledSparkApplication:
apiVersion: "sparkoperator.k8s.io/v1alpha1"
kind: ScheduledSparkApplication
metadata:
name: pyspark-example
namespace: default
spec:
schedule: "*/10 * * * *"
concurrencyPolicy: Replace
runHistoryLimit: 3
template:
type: Python
mode: cluster
mainApplicationFile: "/code/pyspark-example.py"
Arguments:
- "wasb://[email protected]/episodes.avro"
deps:
jars:
- "local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar"
sparkConf:
"spark.hadoop.fs.azure.account.key.sparkdkakadia.blob.core.windows.net": "4gIguFMiymTI4BoHL2LsbIdEVjDYUZfcgj+/eUXWH9RzJPQAlQpUsT1+hAAsVJdlZd9fBQ+ctZV+I55A=="
driver:
cores: 0.1
image: dharmeshkakadia/spark-py-example:latest
coreLimit: "200m"
memory: "1024m"
labels:
version: 2.2.0
serviceAccount: spark
executor:
cores: 1
instances: 2
image: dharmeshkakadia/spark-executor-py:latest
memory: "1024m"
labels:
version: 2.2.0
restartPolicy: Never
spark-operator-rbac.yaml currently grants ALL permissions to the spark-operator service account:
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: sparkoperator
rules:
- apiGroups:
  - "*"
  resources:
  - "*"
  verbs:
  - "*"
It should only be granted the permissions it actually needs.
This is an issue for tracking the 2018 roadmap for the Spark Operator. The list below is tentative and may be constantly updated as ideas and thoughts pop up.
- sparkctl
- Exporting SparkApplication events to Stackdriver (supported by default on GKE, supported on other providers through the Stackdriver event exporter).
- SparkApplication API.
Notable changes in apache/spark/master post 2.3.0:
- SPARK_CONF_DIR is set to point to the mount path /opt/spark/conf of the ConfigMap. This is in conflict with what spec.sparkConfigMap is designed to do. Created #216.
Hi, thanks for this project.
Could you point me to the source code of this base docker image?
I have a docker image for a Spark 2.3 job that I could run successfully on Kubernetes using spark-submit. But when I started the job using the operator, the only things that got started were the driver pod and the UI svc, no Spark executors were launched. I didn't see any error in the sparkoperator pod and my app's driver pod. Anyone know what might be causing this issue? Btw, running the example spark-pi job using the operator works fine (executors got started etc).
When I used the RESTful API to create a SparkApplication for SparkPi, I got duplicate drivers and services.
I saw issue #155, and I pulled the new master and rebuilt it, but it keeps happening.
My k8s cluster has one master and two workers.
My RESTful request:
curl --basic -u username:passwd -k -l -H "Content-type: application/json" -X POST -d '{"apiVersion":"sparkoperator.k8s.io/v1alpha1","kind":"SparkApplication","metadata":{"name":"spark-pi001","namespace":"default"},"spec":{"driver":{"memory":"512m","coreLimit":"200m","cores":0.1,"labels":{"version":"2.3.0"},"serviceAccount":"spark","volumeMounts":[{"mountPath":"/SkyDiscovery/cephfs/user/username","name":"test-volume"}]},"executor":{"cores":1,"instances":2,"labels":{"version":"2.3.0"},"memory":"512m","volumeMounts":[{"mountPath":"/SkyDiscovery/cephfs/user/username","name":"test-volume"}]},"image":"spark:v2.3.0_skydata","mainApplicationFile":"local:///SkyDiscovery/cephfs/user/username/spark-examples_2.11-2.3.0.jar","arguments":[],"mainClass":"org.apache.spark.examples.SparkPi","mode":"cluster","restartPolicy":"Never","type":"Scala","volumes":[{"name":"test-volume","hostPath":{"path":"/SkyDiscovery/cephfs/user/username"}}]}}' https://*****:6443/apis/sparkoperator.k8s.io/v1alpha1/namespaces/default/sparkapplications
Duplicate drivers:
spark-pi001-090fd4fdefb131e08b4142efea55eca9-driver 0/1 ContainerCreating 0 20s
spark-pi001-af2e021c6dcc37f3b404093f37da3fe8-driver 0/1 ContainerCreating 0 19s
Duplicate services:
spark-pi001-2249495700-ui-svc
spark-pi001-2284845769-ui-svc
spark-pi001-090fd4fdefb131e08b4142efea55eca9-driver-svc
spark-pi001-af2e021c6dcc37f3b404093f37da3fe8-driver-svc
When I check the operator log, spark-pi001 was updated right after it was added, and then spark-submit was executed twice.
I0528 06:20:17.754806 1 controller.go:169] SparkApplication spark-pi001 was added, enqueueing it for submission
I0528 06:20:17.760060 1 sparkui.go:75] Creating a service spark-pi001-2284845769-ui-svc for the Spark UI for application spark-pi001
I0528 06:20:17.838537 1 submission_runner.go:81] spark-submit arguments: [/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master k8s://https://10.96.0.1:443 --deploy-mode cluster --conf spark.kubernetes.namespace=default --conf spark.app.name=spark-pi001 --conf spark.kubernetes.container.image=192.168.61.32:9980/skydiscovery/spark:v2.3.0_skydata --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=spark-pi001 --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-id=spark-pi001-2284845769 --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.driver.cores=0.100000 --conf spark.kubernetes.driver.limit.cores=200m --conf spark.driver.memory=512m --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.driver.label.version=2.3.0 --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/volumemounts.test-volume=Cgt0ZXN0LXZvbHVtZRAAGiEvU2t5RGlzY292ZXJ5L2NlcGhmcy91c2VyL3NqbHRlc3QiAA== --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/volumes.test-volume=Cgt0ZXN0LXZvbHVtZRIlCiMKIS9Ta3lEaXNjb3ZlcnkvY2VwaGZzL3VzZXIvc2psdGVzdA== --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=spark-pi001 --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-id=spark-pi001-2284845769 --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.executor.instances=2 --conf spark.executor.cores=1 --conf spark.executor.memory=512m --conf spark.kubernetes.executor.label.version=2.3.0 --conf spark.kubernetes.executor.annotation.sparkoperator.k8s.io/volumemounts.test-volume=Cgt0ZXN0LXZvbHVtZRAAGiEvU2t5RGlzY292ZXJ5L2NlcGhmcy91c2VyL3NqbHRlc3QiAA== --conf spark.kubernetes.executor.annotation.sparkoperator.k8s.io/volumes.test-volume=Cgt0ZXN0LXZvbHVtZRIlCiMKIS9Ta3lEaXNjb3ZlcnkvY2VwaGZzL3VzZXIvc2psdGVzdA== --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/ownerreference=ChBTcGFya0FwcGxpY2F0aW9uGgtzcGFyay1waTAwMSIkMzBlM2ViMzctNjIzZi0xMWU4LWE2NzItZmExNjNlNWE4ZDg3Kh1zcGFya29wZXJhdG9yLms4cy5pby92MWFscGhhMQ== local:///SkyDiscovery/cephfs/user/username/spark-examples_2.11-2.3.0.jar]
I0528 06:20:18.338777 1 controller.go:218] SparkApplication spark-pi001 was updated, enqueueing it for submission
I0528 06:20:18.344042 1 sparkui.go:75] Creating a service spark-pi001-2249495700-ui-svc for the Spark UI for application spark-pi001
I0528 06:20:18.433535 1 submission_runner.go:81] spark-submit arguments: [/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master k8s://https://10.96.0.1:443 --deploy-mode cluster --conf spark.kubernetes.namespace=default --conf spark.app.name=spark-pi001 --conf spark.kubernetes.container.image=192.168.61.32:9980/skydiscovery/spark:v2.3.0_skydata --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=spark-pi001 --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-id=spark-pi001-2249495700 --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.driver.cores=0.100000 --conf spark.kubernetes.driver.limit.cores=200m --conf spark.driver.memory=512m --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.driver.label.version=2.3.0 --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/volumemounts.test-volume=Cgt0ZXN0LXZvbHVtZRAAGiEvU2t5RGlzY292ZXJ5L2NlcGhmcy91c2VyL3NqbHRlc3QiAA== --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/volumes.test-volume=Cgt0ZXN0LXZvbHVtZRIlCiMKIS9Ta3lEaXNjb3ZlcnkvY2VwaGZzL3VzZXIvc2psdGVzdA== --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=spark-pi001 --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-id=spark-pi001-2249495700 --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.executor.instances=2 --conf spark.executor.cores=1 --conf spark.executor.memory=512m --conf spark.kubernetes.executor.label.version=2.3.0 --conf spark.kubernetes.executor.annotation.sparkoperator.k8s.io/volumemounts.test-volume=Cgt0ZXN0LXZvbHVtZRAAGiEvU2t5RGlzY292ZXJ5L2NlcGhmcy91c2VyL3NqbHRlc3QiAA== --conf spark.kubernetes.executor.annotation.sparkoperator.k8s.io/volumes.test-volume=Cgt0ZXN0LXZvbHVtZRIlCiMKIS9Ta3lEaXNjb3ZlcnkvY2VwaGZzL3VzZXIvc2psdGVzdA== --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/ownerreference=ChBTcGFya0FwcGxpY2F0aW9uGgtzcGFyay1waTAwMSIkMzBlM2ViMzctNjIzZi0xMWU4LWE2NzItZmExNjNlNWE4ZDg3Kh1zcGFya29wZXJhdG9yLms4cy5pby92MWFscGhhMQ== local:///SkyDiscovery/cephfs/user/username/spark-examples_2.11-2.3.0.jar]
I0528 06:20:20.672806 1 initializer.go:292] Processing Spark driver pod spark-pi001-090fd4fdefb131e08b4142efea55eca9-driver
I0528 06:20:20.672905 1 initializer.go:439] Removed initializer on pod spark-pi001-090fd4fdefb131e08b4142efea55eca9-driver
I0528 06:20:21.194235 1 initializer.go:292] Processing Spark driver pod spark-pi001-af2e021c6dcc37f3b404093f37da3fe8-driver
I0528 06:20:21.194350 1 initializer.go:439] Removed initializer on pod spark-pi001-af2e021c6dcc37f3b404093f37da3fe8-driver
I0528 06:20:21.238599 1 submission_runner.go:99] spark-submit completed for SparkApplication spark-pi001 in namespace default
I0528 06:20:21.749293 1 submission_runner.go:99] spark-submit completed for SparkApplication spark-pi001 in namespace default
I0528 06:21:53.275856 1 initializer.go:292] Processing Spark executor pod spark-pi001-090fd4fdefb131e08b4142efea55eca9-exec-2
I0528 06:21:53.275968 1 initializer.go:439] Removed initializer on pod spark-pi001-090fd4fdefb131e08b4142efea55eca9-exec-2
I0528 06:21:53.576619 1 initializer.go:292] Processing Spark executor pod spark-pi001-090fd4fdefb131e08b4142efea55eca9-exec-1
I0528 06:21:53.576683 1 initializer.go:439] Removed initializer on pod spark-pi001-090fd4fdefb131e08b4142efea55eca9-exec-1
I0528 06:21:58.583400 1 initializer.go:292] Processing Spark executor pod spark-pi001-af2e021c6dcc37f3b404093f37da3fe8-exec-2
I0528 06:21:58.583499 1 initializer.go:439] Removed initializer on pod spark-pi001-af2e021c6dcc37f3b404093f37da3fe8-exec-2
I0528 06:21:58.779218 1 initializer.go:292] Processing Spark executor pod spark-pi001-af2e021c6dcc37f3b404093f37da3fe8-exec-1
I0528 06:21:58.779334 1 initializer.go:439] Removed initializer on pod spark-pi001-af2e021c6dcc37f3b404093f37da3fe8-exec-1
However, when I use "kubectl apply -f .yaml" to create the SparkApplication, I get only one driver.
I am new to k8s. Is there any way to get only one driver and service?
Before the RSS is upstreamed, an intermediate alternative is to support staging local dependencies to a simple HTTP-based file server running inside the K8s cluster, without security baked in for the download path, which is in-cluster communication. The file server should probably support HTTPS for the upload path, which often originates from outside the cluster.
The current Spark operator setup assumes that it runs in its own namespace and manages the SparkApplication CRDs for the whole cluster. We were hoping that we could deploy it alongside our application, which currently requires namespace isolation. With our current setup, if we have multiple applications on our cluster, each spark-operator sees all the CRDs on the cluster regardless of namespace. I was hoping I could restrict this with different role bindings, but alas...
E0529 21:44:38.579020 1 reflector.go:205] k8s.io/spark-on-k8s-operator/pkg/client/informers/externalversions/factory.go:117: Failed to list *v1alpha1.SparkApplication: sparkapplications.sparkoperator.k8s.io is forbidden: User "system:serviceaccount:sreisor:ideal-tuatara-sparkoperator" cannot list sparkapplications.sparkoperator.k8s.io at the cluster scope
E0529 21:44:38.579370 1 reflector.go:205] k8s.io/spark-on-k8s-operator/pkg/client/informers/externalversions/factory.go:117: Failed to list *v1alpha1.ScheduledSparkApplication: scheduledsparkapplications.sparkoperator.k8s.io is forbidden: User "system:serviceaccount:sreisor:ideal-tuatara-sparkoperator" cannot list scheduledsparkapplications.sparkoperator.k8s.io at the cluster scope
Is there anything we can do to restrict the Spark operator to listing only within its own namespace or a specific namespace? It's our first dependency that would break our ability to helm install our application into an isolated namespace.
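One possible direction, if the vendored generated informer factory exposes these options (that depends on the code-generator version, so treat this purely as a sketch): add a namespace flag to the operator and scope its informers to that namespace, roughly:

package operatorutil

import (
	"time"

	crdclientset "github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/client/clientset/versioned"
	crdinformers "github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/client/informers/externalversions"
)

// newNamespacedFactory scopes SparkApplication informers to a single namespace,
// so the operator only needs list/watch permissions within that namespace
// instead of at the cluster scope.
func newNamespacedFactory(client crdclientset.Interface, ns string) crdinformers.SharedInformerFactory {
	return crdinformers.NewSharedInformerFactoryWithOptions(
		client,
		30*time.Second, // resync period; illustrative value
		crdinformers.WithNamespace(ns),
	)
}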
Batch-only training is cool, but do you know what's cooler? E2E training to serving!
We could look at the ML pipeline stage as the starting point, but maybe add a trait for exportability.
Currently pastRunNames doesn't differentiate between successful and failed runs. Separating them like Kubernetes CronJobs do (failedJobsHistoryLimit and successfulJobsHistoryLimit) would allow retaining failed jobs for later debugging.
Currently, cache resync is disabled for SparkApplication objects. This has a potential problem: the controller may miss updates to SparkApplication objects if the watch connection is lost and re-established later, or if the controller gets restarted during an upgrade. To prevent the controller from missing update events, it's critical to enable cache re-sync (a sketch follows below).
A related issue is how to handle re-listed SparkApplication objects upon controller restart: the controller needs to determine whether it should submit each existing SparkApplication object.
Ref: liyinan926/spark-operator#1.
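For reference, a minimal sketch of constructing the generated informer factory with a non-zero resync period instead of 0 (assuming the generated clientset/informers packages referenced elsewhere on this page):

package operatorutil

import (
	"time"

	crdclientset "github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/client/clientset/versioned"
	crdinformers "github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/client/informers/externalversions"
)

// newResyncingFactory returns a factory whose informers periodically re-deliver
// every cached SparkApplication to the update handler, so updates missed while
// the watch connection was down are eventually observed and reconciled.
func newResyncingFactory(client crdclientset.Interface) crdinformers.SharedInformerFactory {
	return crdinformers.NewSharedInformerFactory(client, 30*time.Second)
}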
@sylus, I moved the issue over to this new repository where future development will be done.
Currently only container-local dependencies or remote dependencies that can be downloaded are supported. This means users need to either bake their dependencies into custom images or manually upload them to, e.g., a Google Cloud Storage bucket.
I'm trying to make use of the generated client code, but I can't even fetch the dependency because the client refers to a repo that doesn't exist...
import (
    sparkClientset "github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/client/clientset/versioned"
    sparkV1aplpha1 "github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/apis/sparkoperator.k8s.io/v1alpha1"
)
$ go get
# cd .; git clone https://github.com/kubernetes/spark-on-k8s-operator /Users/scott/go/src/k8s.io/spark-on-k8s-operator
Cloning into '/Users/scott/go/src/k8s.io/spark-on-k8s-operator'...
ERROR: Repository not found.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
package k8s.io/spark-on-k8s-operator/pkg/apis/sparkoperator.k8s.io: exit status 128
package k8s.io/spark-on-k8s-operator/pkg/client/clientset/versioned/typed/sparkoperator.k8s.io/v1alpha1: cannot find package "k8s.io/spark-on-k8s-operator/pkg/client/clientset/versioned/typed/sparkoperator.k8s.io/v1alpha1" in any of:
/usr/local/Cellar/go/1.10/libexec/src/k8s.io/spark-on-k8s-operator/pkg/client/clientset/versioned/typed/sparkoperator.k8s.io/v1alpha1 (from $GOROOT)
/Users/scott/go/src/k8s.io/spark-on-k8s-operator/pkg/client/clientset/versioned/typed/sparkoperator.k8s.io/v1alpha1 (from $GOPATH)
Suppose a user's cluster has a Spark operator bundled with version 2.3 of Spark Submit. Suppose then we introduce a breaking change in the contract between Spark Submit and the Spark application docker containers. Therefore any Spark application using Spark 2.4 needs to use Spark 2.4's spark submit, and conversely, all applications using Spark 2.3 need to use Spark 2.3's spark submit.
Unfortunately it's not easy right now to support having a single Spark operator deploying both Spark 2.3 and Spark 2.4 applications. If you upgrade the Spark operator's bundled Spark submit version to 2.4, suddenly all applications running Spark 2.3 will break.
There are multiple ways to tackle this problem. First of all, we should be ensuring that the contract between spark submit and the Spark driver it launches is stable. This has been the case for a long time with Spark on YARN.
However, we can be clever and allow for the above use case even if the above contract breaks, as follows:
- Add a sparkVersion field to the spec of SparkApplication.
- Have an operator instance act on a SparkApplication only if the sparkVersion declared by that application is compatible with its bundled spark-submit version (a sketch of such a check follows after this list).
This allows one to implement something like a rolling upgrade strategy for Spark applications run on their cluster. As an example, a cluster administrator can install a Spark 2.4-compatible Spark operator and leave the Spark 2.3 operator running on the cluster. Once the organization has upgraded all Spark applications to Spark 2.4 and higher, the Spark 2.3 operator can be decommissioned.
Tasks:
- Use a Job to run spark-submit.
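As an illustration of the compatibility check in the second item above, a hypothetical helper (not part of the operator) that simply compares major.minor versions:

package operatorutil

import "strings"

// isCompatible reports whether an application declaring sparkVersion can be
// handled by an operator bundled with spark-submit from bundledVersion.
// Assumes version strings of the form "X.Y" or "X.Y.Z", e.g. "2.3.0".
func isCompatible(sparkVersion, bundledVersion string) bool {
	majorMinor := func(v string) string {
		parts := strings.SplitN(v, ".", 3)
		if len(parts) >= 2 {
			return parts[0] + "." + parts[1]
		}
		return v
	}
	return majorMinor(sparkVersion) == majorMinor(bundledVersion)
}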
A better strategy for life cycle management of the CustomResourceDefinition resource for SparkApplication is needed. Currently, the controller creates the CustomResourceDefinition resource upon starting and deletes it upon terminating. During a controller restart, existing SparkApplication objects will be gone and users won't be able to create new SparkApplication objects before the controller starts up again.
Installing from the user guide and running the sample spark-pi.yaml application, modified to use restartPolicy: Always, causes two drivers to be created after the first run completes.
Install the operator onto a 1.8.10 cluster:
$ kubectl create -f manifest/
Nothing is running:
$ kubectl get pods
No resources found.
Modify the restart policy to Always, and submit the sample application:
$ kubectl create -f examples/spark-pi.yaml
sparkapplication "spark-pi" created
Logs of the submission:
$ kubectl logs -n sparkoperator sparkoperator-7598b5bb6-k2jm6 -f
I0518 20:59:48.813207 1 sparkui.go:74] Creating a service spark-pi-3506525398-ui-svc for the Spark UI for application spark-pi
I0518 20:59:48.837560 1 submission_runner.go:81] spark-submit arguments: [/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master k8s://https://10.3.0.1:443 --deploy-mode cluster --conf spark.kubernetes.namespace=default --conf spark.app.name=spark-pi --conf spark.kubernetes.container.image=gcr.io/ynli-k8s/spark:v2.3.0 --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=spark-pi --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-id=spark-pi-3506525398 --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.driver.cores=0.100000 --conf spark.kubernetes.driver.limit.cores=200m --conf spark.driver.memory=512m --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.driver.label.version=2.3.0 --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/volumemounts.test-volume=Cgt0ZXN0LXZvbHVtZRAAGgQvdG1wIgA= --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/volumes.test-volume=Cgt0ZXN0LXZvbHVtZRITChEKBC90bXASCURpcmVjdG9yeQ== --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=spark-pi --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-id=spark-pi-3506525398 --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.executor.instances=1 --conf spark.executor.cores=1 --conf spark.executor.memory=512m --conf spark.kubernetes.executor.label.version=2.3.0 --conf spark.kubernetes.executor.annotation.sparkoperator.k8s.io/volumemounts.test-volume=Cgt0ZXN0LXZvbHVtZRAAGgQvdG1wIgA= --conf spark.kubernetes.executor.annotation.sparkoperator.k8s.io/volumes.test-volume=Cgt0ZXN0LXZvbHVtZRITChEKBC90bXASCURpcmVjdG9yeQ== --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/ownerreference=ChBTcGFya0FwcGxpY2F0aW9uGghzcGFyay1waSIkNjZjMzE3ZmQtNWFkZS0xMWU4LWE3ZDgtMDI0Y2Y1NjlmMTZhKh1zcGFya29wZXJhdG9yLms4cy5pby92MWFscGhhMQ== local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar]
First driver and executor running:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
spark-pi-f01a49537f403ab39455f44f9132e36a-driver 1/1 Running 0 50s
spark-pi-f01a49537f403ab39455f44f9132e36a-exec-1 1/1 Running 0 2s
The application completes the calculation of Pi. The controller logs:
I0518 21:00:58.948971 1 controller.go:597] SparkApplication spark-pi failed or terminated, restarting it with RestartPolicy Always
I0518 21:00:59.046457 1 sparkui.go:74] Creating a service spark-pi-1742375757-ui-svc for the Spark UI for application spark-pi
I0518 21:00:59.052603 1 controller.go:597] SparkApplication spark-pi failed or terminated, restarting it with RestartPolicy Always
I0518 21:00:59.082823 1 submission_runner.go:81] spark-submit arguments: [/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master k8s://https://10.3.0.1:443 --deploy-mode cluster --conf spark.kubernetes.namespace=default --conf spark.app.name=spark-pi --conf spark.kubernetes.container.image=gcr.io/ynli-k8s/spark:v2.3.0 --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=spark-pi --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-id=spark-pi-1742375757 --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.driver.cores=0.100000 --conf spark.kubernetes.driver.limit.cores=200m --conf spark.driver.memory=512m --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.driver.label.version=2.3.0 --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/volumemounts.test-volume=Cgt0ZXN0LXZvbHVtZRAAGgQvdG1wIgA= --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/volumes.test-volume=Cgt0ZXN0LXZvbHVtZRITChEKBC90bXASCURpcmVjdG9yeQ== --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=spark-pi --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-id=spark-pi-1742375757 --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.executor.instances=1 --conf spark.executor.cores=1 --conf spark.executor.memory=512m --conf spark.kubernetes.executor.label.version=2.3.0 --conf spark.kubernetes.executor.annotation.sparkoperator.k8s.io/volumemounts.test-volume=Cgt0ZXN0LXZvbHVtZRAAGgQvdG1wIgA= --conf spark.kubernetes.executor.annotation.sparkoperator.k8s.io/volumes.test-volume=Cgt0ZXN0LXZvbHVtZRITChEKBC90bXASCURpcmVjdG9yeQ== --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/ownerreference=ChBTcGFya0FwcGxpY2F0aW9uGghzcGFyay1waSIkNjZjMzE3ZmQtNWFkZS0xMWU4LWE3ZDgtMDI0Y2Y1NjlmMTZhKh1zcGFya29wZXJhdG9yLms4cy5pby92MWFscGhhMQ== local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar]
E0518 21:00:59.408970 1 controller.go:612] failed to delete old driver pod spark-pi-f01a49537f403ab39455f44f9132e36a-driver of SparkApplication spark-pi: pods "spark-pi-f01a49537f403ab39455f44f9132e36a-driver" not found
I0518 21:00:59.414244 1 sparkui.go:74] Creating a service spark-pi-1588551100-ui-svc for the Spark UI for application spark-pi
I0518 21:00:59.836781 1 submission_runner.go:81] spark-submit arguments: [/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master k8s://https://10.3.0.1:443 --deploy-mode cluster --conf spark.kubernetes.namespace=default --conf spark.app.name=spark-pi --conf spark.kubernetes.container.image=gcr.io/ynli-k8s/spark:v2.3.0 --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=spark-pi --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-id=spark-pi-1588551100 --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.driver.cores=0.100000 --conf spark.kubernetes.driver.limit.cores=200m --conf spark.driver.memory=512m --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.driver.label.version=2.3.0 --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/volumemounts.test-volume=Cgt0ZXN0LXZvbHVtZRAAGgQvdG1wIgA= --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/volumes.test-volume=Cgt0ZXN0LXZvbHVtZRITChEKBC90bXASCURpcmVjdG9yeQ== --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=spark-pi --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-id=spark-pi-1588551100 --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.executor.instances=1 --conf spark.executor.cores=1 --conf spark.executor.memory=512m --conf spark.kubernetes.executor.label.version=2.3.0 --conf spark.kubernetes.executor.annotation.sparkoperator.k8s.io/volumemounts.test-volume=Cgt0ZXN0LXZvbHVtZRAAGgQvdG1wIgA= --conf spark.kubernetes.executor.annotation.sparkoperator.k8s.io/volumes.test-volume=Cgt0ZXN0LXZvbHVtZRITChEKBC90bXASCURpcmVjdG9yeQ== --conf spark.kubernetes.driver.annotation.sparkoperator.k8s.io/ownerreference=ChBTcGFya0FwcGxpY2F0aW9uGghzcGFyay1waSIkNjZjMzE3ZmQtNWFkZS0xMWU4LWE3ZDgtMDI0Y2Y1NjlmMTZhKh1zcGFya29wZXJhdG9yLms4cy5pby92MWFscGhhMQ== local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar]
I0518 21:01:02.913068 1 submission_runner.go:99] spark-submit completed for SparkApplication spark-pi in namespace default
I0518 21:01:03.915572 1 submission_runner.go:99] spark-submit completed for SparkApplication spark-pi in namespace default
And now, two drivers and their executors are running:
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
spark-pi-55f4dc6f838933ea9cde3cb54826a448-driver 1/1 Running 0 48s
spark-pi-55f4dc6f838933ea9cde3cb54826a448-exec-1 1/1 Running 0 2s
spark-pi-98bfa61b65903a70859b66af3f7529f8-driver 1/1 Running 0 49s
spark-pi-98bfa61b65903a70859b66af3f7529f8-exec-1 1/1 Running 0 2s
Any ideas for ensuring that only one runs?
Side question: if the driver or any one of the executors were to fail/stop/get rescheduled, does the controller notice and fully delete/redeploy the sparkapplication?
Thanks!
Currently, flags such as --upload-to, --project, and --namespace are passed separately from the application YAML file. Could it be an interesting addition to allow defining these parameters inside the YAML passed to sparkctl create?
This way the SparkApplication YAML could specify which GCP project (in the case of S3, which profile) to use, and the location to upload dependencies to for this specific app.
I am getting an RPC timeout between the driver and the executors. The simple SparkApplication works fine.
Here are the logs from the executor pod:
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
at scala.util.Try$.apply(Try.scala:192)
at scala.util.Failure.recover(Try.scala:216)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from pyspark-example-1525728618655618581-1525728619985-driver-svc.default.svc.cluster.local:7078 in 120 seconds
... 8 more
The simple SparkApplication works fine:
apiVersion: "sparkoperator.k8s.io/v1alpha1"
kind: SparkApplication
metadata:
name: pyspark-job-example
namespace: default
spec:
type: Python
mode: cluster
mainApplicationFile: "local:///code/pyspark-example.py"
Arguments:
- "wasb://[email protected]/episodes.avro"
deps:
jars:
- "local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar"
sparkConf:
"spark.hadoop.fs.azure.account.key.sparkdkakadia.blob.core.windows.net": "4gIguFMiymTI4BoHL2LsbIdEVjDYUZfcgjYqAP6T+/eUXWH9RzJPQAlQpUsT1+hAAsVJdlZd9fBQ+ctZV+I55A=="
driver:
cores: 0.5
image: dharmeshkakadia/spark-py-example:latest
memory: "2048m"
labels:
version: 2.2.0
serviceAccount: spark
executor:
cores: 2
instances: 2
image: dharmeshkakadia/spark-executor-py:latest
memory: "1024m"
labels:
version: 2.2.0
restartPolicy: Never
However, the same spec does not work under ScheduledSparkApplication.
apiVersion: "sparkoperator.k8s.io/v1alpha1"
kind: ScheduledSparkApplication
metadata:
name: pyspark-example
namespace: default
spec:
schedule: "*/10 * * * *"
concurrencyPolicy: Replace
runHistoryLimit: 3
template:
type: Python
mode: cluster
mainApplicationFile: "local:///code/pyspark-example.py"
Arguments:
- "wasb://[email protected]/episodes.avro"
deps:
jars:
- "local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar"
sparkConf:
"spark.hadoop.fs.azure.account.key.sparkdkakadia.blob.core.windows.net": "4gIguFMiymTI4BoHL2LsbIdEVjDYUZfcgjYqAP6T+/eUXWH9RzJPQAlQpUsT1+hAAsVJdlZd9fBQ+ctZV+I55A=="
driver:
cores: 0.5
image: dharmeshkakadia/spark-py-example:latest
memory: "2048m"
labels:
version: 2.2.0
serviceAccount: spark
executor:
cores: 2
instances: 2
image: dharmeshkakadia/spark-executor-py:latest
memory: "1024m"
labels:
version: 2.2.0
restartPolicy: Never
Hi,
Thanks for the operator.
I was trying to play around and, looking at the documentation (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md), I see Python as a supported type in the SparkApplication spec, but I am running into "Python applications are currently not supported for Kubernetes.".
I am assuming it's stemming from the base Spark image: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L332. Thoughts?
I can share more details if you believe this can be an issue on my end.
We have a lot of Streaming applications that need to be restarted if they ever die.
Currently, the controller considers an application finished if the driver pod dies for any reason.
We would like to be able to specify restartPolicy: Always or restartPolicy: OnFailure to get behavior similar to Kubernetes Deployments and Jobs, respectively, i.e. have the controller re-submit a completed or failed application.
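In other words, the decision logic being asked for is roughly the following (a sketch of the desired semantics, not existing operator code):

package operatorutil

// shouldResubmit mirrors pod restart policies: Always re-submits in every case,
// OnFailure re-submits only after a failure, and Never (the current behavior)
// leaves a terminated application alone.
func shouldResubmit(restartPolicy string, failed bool) bool {
	switch restartPolicy {
	case "Always":
		return true
	case "OnFailure":
		return failed
	default: // "Never"
		return false
	}
}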