A barebones Scala and Apache Spark project built using Gradle. The spark-shell provides pre-initialised `spark` and `sc` variables; here the same is done with a Scala trait that you can extend.
```shell
git clone [email protected]:sumedhav/spark-gradle-template.git
./gradlew clean build
./gradlew run
./gradlew clean run
```
Take a look at the `src/main/scala/template/spark` directory. We have two items here:

- The trait `InitSpark`, which is extended by any class that wants to run Spark code. This trait has all the initialization code. I have also suppressed logging to the error level for less noise.
- The file `Main.scala`, which has the executable class `Main`.
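A minimal sketch of what such an initialization trait might look like (the exact app name, master setting, and field names here are assumptions, not necessarily what the repository uses):

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

trait InitSpark {
  // Suppress noisy logging down to errors only
  Logger.getLogger("org").setLevel(Level.ERROR)
  Logger.getLogger("akka").setLevel(Level.ERROR)

  // Build a local SparkSession, mirroring what spark-shell gives you
  val spark: SparkSession = SparkSession.builder()
    .appName("spark-example")   // assumed name
    .master("local[*]")         // assumed: run locally on all cores
    .getOrCreate()

  // Expose sc the same way spark-shell does
  val sc = spark.sparkContext

  def close(): Unit = spark.close()
}
```

Any class extending this trait gets `spark` and `sc` ready to use, just like in the shell.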
In this class, I do four things:

- Print the Spark version.
- Find the sum from 1 to 100 (inclusive).
- Read a CSV file into a structured `Dataset`.
- Find the average age of persons from the CSV.
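The four steps above could be sketched roughly like this (the `Person` fields and CSV path are illustrative assumptions; the repository's `Main` may differ):

```scala
import org.apache.spark.sql.functions.avg

// Assumed schema for the sample CSV
final case class Person(name: String, age: Int)

object Main extends App with InitSpark {
  import spark.implicits._

  // 1. Print the Spark version
  println(s"Spark version: ${spark.version}")

  // 2. Sum 1 to 100 (inclusive)
  val sum = sc.parallelize(1 to 100).reduce(_ + _)
  println(s"Sum 1..100 = $sum")

  // 3. Read a CSV into a typed Dataset (path is hypothetical)
  val people = spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("data/people.csv")
    .as[Person]

  // 4. Average age across all rows
  val avgAge = people.agg(avg("age")).first.getDouble(0)
  println(s"Average age: $avgAge")

  close()
}
```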
Just import it into your favorite IDE as a Gradle project (tested to work with IntelliJ), or use your favorite editor and build from the command line with Gradle.
Versions:

- Spark - 2.2.4
- Scala - 2.11.12