
Spark with History Server

This spotguide deploys Spark with the Spark History Server to a Kubernetes cluster, allowing users to submit and run Spark applications in a namespace on the cluster.

The Spark being deployed is upstream Spark with a couple of additions from Banzai Cloud, such as:

  • Resilient Spark application execution, implemented using Kubernetes jobs. More details can be found here
  • Checkpointing for Spark streaming applications. More details can be found here
  • Encrypted RPC communication between the driver and executors. Upstream Spark 2.4.3 on Kubernetes does not support it; support will be added starting from version 3.0.0 (not yet released), so we backported it into Spark 2.4.3 to make this functionality available through this spotguide.

You can choose between the following Spark versions: 2.4.3 and 2.3.2. With version 2.3.2 you can run only applications written in Scala, while with 2.4.3 you can run Python and R code as well. The following encryption options are also available in 2.4.3:

spark.authenticate=true
spark.network.crypto.enabled=true
spark.io.encryption.enabled=true
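As a minimal sketch, the three properties above can be passed to spark-submit as repeated --conf key=value arguments. The helper name conf_args is hypothetical and not part of the spotguide; it only assembles the argument list and does not validate that your Spark version supports these options.

```python
# Hypothetical helper: turns Spark configuration properties into the
# repeated "--conf key=value" arguments accepted by spark-submit.
ENCRYPTION_CONF = {
    "spark.authenticate": "true",
    "spark.network.crypto.enabled": "true",
    "spark.io.encryption.enabled": "true",
}


def conf_args(conf: dict[str, str]) -> list[str]:
    """Flatten a property dict into spark-submit --conf arguments."""
    args: list[str] = []
    for key, value in conf.items():
        # Each property becomes its own "--conf key=value" pair.
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" ".join(conf_args(ENCRYPTION_CONF)))
```

Keeping the properties in a dict makes it easy to merge them with the rest of a job's configuration before building the command line.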

Once your spotguide is deployed successfully, you should have a running History Server plus the RBAC resources needed to submit Spark applications with spark-submit. The Kubernetes-related spark-submit options are described in the Running on Kubernetes page of the Spark documentation.
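The sketch below assembles a cluster-mode spark-submit command targeting Kubernetes, using standard upstream Spark properties (spark.kubernetes.namespace, spark.kubernetes.container.image, spark.kubernetes.authenticate.driver.serviceAccountName, spark.eventLog.enabled, spark.eventLog.dir). The API server URL, namespace, service account name, image and event log bucket are hypothetical placeholders, not values created by the spotguide; substitute the ones from your deployment. It also assumes the event log directory is the one your History Server reads from.

```python
import shlex


def build_submit_command(
    api_server: str,
    namespace: str,
    service_account: str,
    image: str,
    app_uri: str,
    event_log_dir: str,
    name: str = "spark-pi",
) -> list[str]:
    """Assemble a cluster-mode spark-submit command for Kubernetes."""
    conf = {
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.container.image": image,
        # Service account (RBAC) the driver uses to create executor pods.
        "spark.kubernetes.authenticate.driver.serviceAccountName": service_account,
        # Event logs are what the History Server displays.
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": event_log_dir,
    }
    cmd = [
        "spark-submit",
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--name", name,
    ]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    # The application file comes last and must not be a client-local path.
    cmd.append(app_uri)
    return cmd


if __name__ == "__main__":
    cmd = build_submit_command(
        api_server="https://kubernetes.example.com:443",  # hypothetical
        namespace="spark",                                # hypothetical
        service_account="spark",                          # hypothetical
        image="example/spark-py:v2.4.3",                  # hypothetical image
        app_uri="local:///opt/spark/examples/src/main/python/pi.py",
        event_log_dir="s3a://bucketName/eventLog",        # hypothetical
    )
    print(shlex.join(cmd))
```

The script only prints the command; passing the list to subprocess.run would launch it directly without shell quoting issues.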

Spark on Kubernetes currently cannot access the submission client's local file system, so you must either build your source files and dependencies into the Spark Docker image, or host them in a remote location such as HDFS or HTTP. Object storages such as S3, Azure Blob, Google Cloud Storage, Oracle Cloud Storage and OSS are supported through the corresponding Hadoop FileSystem connectors. The Spark File API can also access these storages, and event logs can be placed there as well. Be aware, however, that apart from Azure Storage you cannot use different credentials for the same object storage; for example, you cannot specify different AWS credentials for your source files and your Spark event log files in S3.

The table below shows which prefix to use for each storage:

| Storage | Prefix | Example | Credentials |
|---|---|---|---|
| Local file in Docker image | local | local:///opt/spark/examples/src/main/python/pi.py | - |
| Amazon S3 | s3, s3a | s3a://bucketName/sample.txt | AWS credentials must be set in the following spark-submit configuration properties: spark.hadoop.fs.s3a.access.key and spark.hadoop.fs.s3a.secret.key |
| Azure Blob | wasb, wasbs | wasb://exampleAzureBlobName@exampleStorageAccountName.blob.core.windows.net | The Azure storage account access key must be set as follows: spark.hadoop.fs.azure.account.key.exampleStorageAccountName.blob.core.windows.net=storageAccountAccessKey |
| Google Cloud Storage | gs | gs://bucketName | - |
| Alibaba Object Storage | oss | oss://bucketName | - |
| Oracle Cloud Storage | oci | oci://bucketName | - |
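The prefixes and credential properties in the table above can be checked before submitting. This is a minimal sketch with hypothetical helper names (storage_for, s3_credentials, azure_credentials): it maps a URI's scheme to a storage, also accepting the HDFS and HTTP locations mentioned earlier, and rejects client-local paths such as /home/me/app.py or file:// URIs. It does not contact any storage or verify credentials.

```python
from urllib.parse import urlparse

# Prefix -> storage, following the table, plus HDFS/HTTP remote locations.
STORAGE_BY_SCHEME = {
    "local": "Local file in Docker image",
    "s3": "Amazon S3",
    "s3a": "Amazon S3",
    "wasb": "Azure Blob",
    "wasbs": "Azure Blob",
    "gs": "Google Cloud Storage",
    "oss": "Alibaba Object Storage",
    "oci": "Oracle Cloud Storage",
    "hdfs": "HDFS",
    "http": "HTTP",
    "https": "HTTP",
}


def storage_for(uri: str) -> str:
    """Return the storage a URI points to, or raise for client-local paths."""
    scheme = urlparse(uri).scheme
    if scheme not in STORAGE_BY_SCHEME:
        raise ValueError(f"unsupported or client-local URI: {uri!r}")
    return STORAGE_BY_SCHEME[scheme]


def s3_credentials(access_key: str, secret_key: str) -> dict[str, str]:
    # A single key pair applies to every s3a:// path in the job.
    return {
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }


def azure_credentials(account: str, access_key: str) -> dict[str, str]:
    # The property name embeds the storage account, so each account
    # can carry its own key.
    return {
        f"spark.hadoop.fs.azure.account.key.{account}.blob.core.windows.net": access_key
    }
```

Because the Azure property name contains the account name, two accounts produce two distinct properties, whereas the S3 properties are global to the job, which matches the credential limitation described above.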
