
learning-spark's Introduction

Examples for Learning Spark

Examples for the Learning Spark book. These examples require a number of libraries and therefore have long build files. We have also added a standalone example with minimal dependencies and a small build file in the mini-complete-example directory.

These examples have been updated to run against Spark 1.3, so they may differ slightly from the versions in your copy of "Learning Spark".

Requirements

  • JDK 1.7 or higher
  • Scala 2.10.3 (see scala-lang.org)
  • Spark 1.3
  • Protobuf compiler (on Debian you can install it with sudo apt-get install protobuf-compiler)
  • R and the CRAN package Imap (required for the ChapterSixExample)
  • urllib3 (required for the Python examples)

Python examples

From the Spark directory, run ./bin/pyspark ./src/python/[example]

Spark Submit

You can also create an assembly JAR with all of the dependencies for running either the Java or Scala versions of the code, and run the job with the spark-submit script:

./sbt/sbt assembly   # or: mvn package
cd $SPARK_HOME
./bin/spark-submit --class com.oreilly.learningsparkexamples.[lang].[example] ../learning-spark-examples/target/scala-2.10/learning-spark-examples-assembly-0.0.1.jar

Learning Spark

learning-spark's People

Contributors

aalkilani, cesararevalo, holdenk, huydx, mrt, tabdulradi, vidaha


learning-spark's Issues

Difference between running Spark as local[*] vs yarn-client vs yarn-cluster in terms of performance

Kindly consider this as an inquiry if not an issue.

Hi,
I am evaluating Spark for use at my workplace.

We have an existing Hortonworks HDP 2.3 install.

I am trying to work out whether I should use local or client or cluster to submit a job in Spark.

Consider that I am running my job as:
sudo -u hdfs spark-submit --class "org.xyz.Spark_ES_Java_V4" --master "local[*]" target/xyz-1.1-jar-with-dependencies.jar 192.168.0.185 55555 > prashant.txt

With this configuration, the task completes in 14 seconds.

When I run the same job like this:
sudo -u hdfs spark-submit --class "org.xyz.Spark_ES_Java_V4" --master "yarn-client" target/xyz-1.1-jar-with-dependencies.jar 192.168.0.185 55555 > prashant.txt

it takes 16 seconds.

And this one
sudo -u hdfs spark-submit --class "org.xyz.Spark_ES_Java_V4" --master "yarn-cluster" target/xyz-1.1-jar-with-dependencies.jar 192.168.0.185 55555 > prashant.txt

takes 18 seconds.

In the first case I am running it locally, meaning it runs on one machine and takes less time, whereas in the later cases I am submitting the job to a cluster with 4 nodes.

So can anyone tell me what the point of running the same job on the cluster is, given that I see a performance degradation with the cluster? Or is there a way I can improve the performance on the cluster?

I would love to hear from someone regarding this soon.

~Prashant

Example 3-13 TypeError

In the process of putting together an IPython Notebook with convenient worked examples from Learning Spark, I found a simple Python semantic error.

Example 3-15 says:

print "Input had " + badLinesRDD.count() + " concerning lines"

which results in

TypeError                                 Traceback (most recent call last)
<ipython-input-10-078b22c97d4b> in <module>()
 ----> 1 print "Input had " + badLinesRDD.count() + " concerning lines"
  2 print "Here are 10 examples:"
  3 for line in badLinesRDD.take(10):
  4     print line

TypeError: cannot concatenate 'str' and 'int' objects

It should say something more like this:

print "Input had %d worrisome lines" % (badLinesRDD.count())

I made a gist of an ipython notebook showing the problem and the fix, with simple worked examples, which you can see (and download) here:

http://nbviewer.ipython.org/gist/nealmcb/b6d989a83adddcdd459f

I suggest including such notebooks in future editions.

sbt assembly - FileNotFoundException (Interrupted system call)

When trying to run sbt assembly, it gets partway through and then gives a ton of FileNotFoundExceptions with (Interrupted system call). This output shows the first couple:

...
[info] Including: chill-java-0.5.0.jar
[info] Including: spark-cassandra-connector-java_2.10-1.0.0-rc5.jar
[info] Including: scopt_2.10-3.2.0.jar
scala.collection.parallel.CompositeThrowable: Multiple exceptions thrown during a parallel computation: java.io.FileNotFoundException: /Users/kcpaul/projects/learning-spark/target/streams/$global/assemblyOption/$global/streams/assembly/be9fe88db32697a9ad164c2f7c93714b1afaa465_2dc57d2aeaf2dee095ca4149dbe0163f2d66a845.jarName (Interrupted system call)
java.io.FileInputStream.open(Native Method)
java.io.FileInputStream.<init>(FileInputStream.java:146)
sbt.Using$$anonfun$fileInputStream$1.apply(Using.scala:73)
sbt.Using$$anonfun$fileInputStream$1.apply(Using.scala:73)
sbt.Using$$anon$2.openImpl(Using.scala:65)
sbt.OpenFile$class.open(Using.scala:43)
sbt.Using$$anon$2.open(Using.scala:64)
sbt.Using$$anon$2.open(Using.scala:64)
sbt.Using.apply(Using.scala:23)
sbt.IO$.transfer(IO.scala:252)
.
.
.

java.io.FileNotFoundException: /Users/kcpaul/.ivy2/cache/org.json4s/json4s-core_2.10/jars/json4s-core_2.10-3.2.10.jar (Interrupted system call)
java.io.FileInputStream.open(Native Method)
java.io.FileInputStream.<init>(FileInputStream.java:146)
sbt.Using$$anonfun$fileInputStream$1.apply(Using.scala:73)
sbt.Using$$anonfun$fileInputStream$1.apply(Using.scala:73)
sbt.Using$$anon$2.openImpl(Using.scala:65)
sbt.OpenFile$class.open(Using.scala:43)
sbt.Using$$anon$2.open(Using.scala:64)
sbt.Using$$anon$2.open(Using.scala:64)
sbt.Using.apply(Using.scala:23)
sbtassembly.AssemblyUtils$.unzip(AssemblyUtils.scala:37)
.
.
.

java.io.FileNotFoundException: /Users/kcpaul/.sbt/boot/scala-2.10.4/lib/scala-reflect.jar (Interrupted system call)
java.io.FileInputStream.open(Native Method)
java.io.FileInputStream.<init>(FileInputStream.java:146)
sbt.Using$$anonfun$fileInputStream$1.apply(Using.scala:73)
sbt.Using$$anonfun$fileInputStream$1.apply(Using.scala:73)
sbt.Using$$anon$2.openImpl(Using.scala:65)
sbt.OpenFile$class.open(Using.scala:43)
sbt.Using$$anon$2.open(Using.scala:64)
sbt.Using$$anon$2.open(Using.scala:64)
sbt.Using.apply(Using.scala:23)
sbtassembly.AssemblyUtils$.unzip(AssemblyUtils.scala:37)
.
.
.

I found a similar report which says "this is caused by missing particular dependencies that may be excluded directly or indirectly in a build file. Just fix this and it is resolved."

Example 4-12 - PerKeyAvg for Python Incorrect

In the example, the map method is shown taking a lambda with two parameters (key and xy), but the Python version of Spark only has a map method that expects a lambda with a single parameter (the key-value pair).

So instead of the following

r = sumCount.map(lambda key, xy: (key, xy[0]/xy[1])).collectAsMap()

We should use

r = sumCount.map(lambda kvp: (kvp[0], kvp[1][0] / kvp[1][1])).collectAsMap()

Manually adding scala.binary.version to Maven pom file

I had an issue while building the project due to the missing property scala.binary.version, which is used in:

<dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.binary.version}</artifactId>
      <version>2.2.1</version>
</dependency>

The property doesn't appear to be set in the pom file. I fixed the issue by adding my Scala version. Should this be added by default? I didn't see any documentation saying that this needs to be set, or am I doing it wrong?

<properties>
    <java.version>1.7</java.version>
    <scala.binary.version>2.10</scala.binary.version>
 </properties>

Maven dependencies incorrect

I had issues building the learning-spark-master Maven project due to missing artifacts. I fixed this by modifying the following dependencies (just adding "_2.10" to the end) to match what's in the Central Maven repository:

    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.10</artifactId>
      <version>1.3.1</version>
    </dependency>
    <dependency> <!-- Cassandra -->
      <groupId>com.datastax.spark</groupId>
      <artifactId>spark-cassandra-connector_2.10</artifactId>
      <version>1.0.0-rc5</version>
    </dependency>
    <dependency> <!-- Cassandra -->
      <groupId>com.datastax.spark</groupId>
      <artifactId>spark-cassandra-connector-java_2.10</artifactId>
      <version>1.0.0-rc5</version>
    </dependency>

Example 4-17 incorrect

Two issues:

  1. The initialization syntax is incorrect: { (Store(...) ...}
  2. No code is provided for the Store class. Creating a case class Store(name: String) doesn't work properly. The joins don't work with Store objects as keys. I suspect one needs to override the hashCode method.

Any suggestions?
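For what it's worth, a Scala case class already derives equals and hashCode from its constructor fields, so in principle it can serve as a join key. A minimal sketch (assuming an existing SparkContext sc and a single-field Store, which may not match the book's definition):

    case class Store(name: String)

    // Hypothetical pair RDDs keyed by Store, just to exercise the join.
    val sales = sc.parallelize(List((Store("panda mart"), 10), (Store("happy bakery"), 20)))
    val addresses = sc.parallelize(List((Store("panda mart"), "1 Bamboo Way")))

    // join matches keys via equals/hashCode, which the case class derives from its fields.
    val joined = sales.join(addresses)
    joined.collect().foreach(println)  // expect: (Store(panda mart),(10,1 Bamboo Way))

If joins still misbehave with keys like this, the cause is more likely in how the Store values are constructed than in hashCode itself.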

Task not serializable

Using a Jupyter notebook with Apache Toree as the kernel, the following

import play.api.libs.json._

case class Person(name: String, lovesPandas: Boolean)
implicit val personFormat = Json.format[Person]

val text = """{"name":"Sparky The Bear", "lovesPandas":true}"""

val jsonParse = Json.parse(text)
val result = Json.fromJson[Person](jsonParse)
result.get

works, while

import org.apache.spark._
import play.api.libs.json._
import play.api.libs.functional.syntax._

case class Person(name: String, lovesPandas: Boolean)
implicit val personReads = Json.format[Person]

val text = """{"name":"Sparky The Bear", "lovesPandas":true}"""

val input = sc.parallelize(List(text))
val parsed = input.map(Json.parse(_))
val result = parsed.flatMap(record => {    
    personReads.reads(record).asOpt
})
result.filter(_.lovesPandas).map(Json.toJson(_)).saveAsTextFile("files/out/pandainfo.json")

fails with

Name: org.apache.spark.SparkException
Message: Task not serializable
StackTrace: org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
[...]

It seems there is a more general problem here...?
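One workaround that often helps in REPL/notebook environments (offered as a sketch under the same assumptions as above, not a confirmed fix for Toree) is to avoid capturing a driver-side implicit val in the closure and build the JSON machinery inside the task instead:

    import play.api.libs.json._

    case class Person(name: String, lovesPandas: Boolean)

    val text = """{"name":"Sparky The Bear", "lovesPandas":true}"""
    val input = sc.parallelize(List(text))
    val parsed = input.map(Json.parse(_))

    // Build the Reads inside the task so the closure does not need to capture
    // a notebook-defined implicit val, a frequent cause of "Task not serializable".
    val result = parsed.flatMap { record =>
      val personReads = Json.reads[Person]
      personReads.reads(record).asOpt
    }

    // Serialize back to JSON strings without capturing driver-side implicits.
    result.filter(_.lovesPandas)
      .map(p => Json.stringify(Json.obj("name" -> p.name, "lovesPandas" -> p.lovesPandas)))
      .saveAsTextFile("files/out/pandainfo.json")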

sbt assembly error

I downloaded the usb.zip and am following the O'Reilly video course Introduction to Apache Spark. There are some places where the presentation is out of sync with the usb.zip contents, but this is where it really hurts: I can't seem to compile the examples with the instructions provided in the video, that is, running the sbt.cmd file with the parameter assembly. Is there a fix for this? This is the error I get:

D:\Spark\usb\spark-training\sbt>sbt assembly
[info] Set current project to sbt (in build file:/D:/Spark/usb/spark-training/sbt/)
[error] Not a valid command: assembly
[error] Not a valid project ID: assembly
[error] Expected ':' (if selecting a configuration)
[error] Not a valid key: assembly
[error] assembly
[error] ^

D:\Spark\usb\spark-training\sbt>

'sbt package' fails due to schema proto files

sbt package

Compiling schema /databricks/learning-spark-master/src/main/protobuf/places.proto
java.lang.RuntimeException: error occured while compiling protobuf files: Cannot run program "protoc": error=2, No such file or directory

Caused by: java.io.IOException: Cannot run program "protoc": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)

Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)

error error occured while compiling protobuf files: Cannot run program "protoc": error=2, No such file or directory

Doesn't even run...

I would like to run BasicLoadJson.java, but it does not work. I cloned the repo, ran a Maven clean build, and tried to run it, but I get this error:

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/FlatMapFunction
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
    at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
    at java.lang.Class.getMethod0(Class.java:3018)
    at java.lang.Class.getMethod(Class.java:1784)
    at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.api.java.function.FlatMapFunction
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

Why is that? What can I do?

Spark RDD pipe example got "/usr/bin/env: RScript: No such file or directory" error

I ran PipeExample with master = local using the original code, and I got:
java.io.IOException: Cannot run program "/tmp/spark-8db4969b-f76c-4377-b40a-acb4c6c6a7bb/userFiles-988a7579-afc7-43be-ad86-dcd98861f269/home/dev/projects/samples/learning-spark-master/src/R/finddistance.R": error=2, No such file or directory

I think this is a mistake. We should give the file name instead of the full file path to SparkFiles.get; Spark will locate the file in the designated tmp folder for us. Therefore, I rewrote the code following ChapterSixExample as below, and I still got errors:
16/01/01 17:16:17 INFO Utils: /home/dev/projects/samples/spark-tutorial/src/R/finddistance.R has been previously copied to /tmp/spark-8a56a6c2-32cc-4ec4-ace4-bfd62c796881/userFiles-6c9e6b58-3d70-49f4-bb6b-7f55178adabc/finddistance.R
/usr/bin/env: RScript: No such file or directory
There should not be any permission issue with the tmp folder; however, I cannot find that specific folder. Any idea how I can resolve this issue and get either PipeExample or ChapterSixExample to work? Thanks.

val pwd = System.getProperty("user.dir")
val distScript = pwd + "/src/R/finddistance.R"
val distScriptName = "finddistance.R"
sc.addFile(distScript)  // ship the script to every executor
val piped = rdd.pipe(Seq(SparkFiles.get(distScriptName)))  // resolve the executor-local copy by name

build failure due to protocol buffer

Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
at sbt.SimpleProcessBuilder.run(ProcessImpl.scala:349)
at sbt.AbstractProcessBuilder.run(ProcessImpl.scala:128)
at sbt.AbstractProcessBuilder$$anonfun$runBuffered$1.apply(ProcessImpl.scala:159)
at sbt.AbstractProcessBuilder$$anonfun$runBuffered$1.apply(ProcessImpl.scala:159)
at sbt.BufferedLogger.buffer(BufferedLogger.scala:25)
at sbt.AbstractProcessBuilder.runBuffered(ProcessImpl.scala:159)
at sbt.AbstractProcessBuilder.$bang(ProcessImpl.scala:156)
at sbtprotobuf.ProtobufPlugin$$anonfun$protobufSettings$6$$anonfun$apply$1.apply(ProtobufPlugin.scala:27)
at sbtprotobuf.ProtobufPlugin$$anonfun$protobufSettings$6$$anonfun$apply$1.apply(ProtobufPlugin.scala:27)
at sbtprotobuf.ProtobufPlugin$.executeProtoc(ProtobufPlugin.scala:66)
at sbtprotobuf.ProtobufPlugin$.sbtprotobuf$ProtobufPlugin$$compile(ProtobufPlugin.scala:81)
at sbtprotobuf.ProtobufPlugin$$anonfun$sourceGeneratorTask$1$$anonfun$5.apply(ProtobufPlugin.scala:107)
at sbtprotobuf.ProtobufPlugin$$anonfun$sourceGeneratorTask$1$$anonfun$5.apply(ProtobufPlugin.scala:106)
at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:235)
at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:235)
at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:249)
at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:245)
at sbt.Difference.apply(Tracked.scala:224)
at sbt.Difference.apply(Tracked.scala:206)
at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:245)
at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:244)
at sbt.Difference.apply(Tracked.scala:224)
at sbt.Difference.apply(Tracked.scala:200)
at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:244)
at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:242)
at sbtprotobuf.ProtobufPlugin$$anonfun$sourceGeneratorTask$1.apply(ProtobufPlugin.scala:109)
at sbtprotobuf.ProtobufPlugin$$anonfun$sourceGeneratorTask$1.apply(ProtobufPlugin.scala:104)
at scala.Function7$$anonfun$tupled$1.apply(Function7.scala:35)
at scala.Function7$$anonfun$tupled$1.apply(Function7.scala:34)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
at sbt.std.Transform$$anon$4.work(System.scala:63)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
at sbt.Execute.work(Execute.scala:235)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
error error occured while compiling protobuf files: Cannot run program "protoc": error=2, No such file or directory

java.lang.NoClassDefFoundError: org/apache/spark/SparkConf

package com.oreilly.learningsparkexamples.java;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class JavaSparkTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("My App");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // JavaRDD rdd = sc.textFile("D:\工作\20171227\newdesc.xml");
        // JavaRDD lineee = rdd.filter(line -> line.contains("desc"));
        // System.out.println(lineee.count());
    }
}

I cloned the repository directly and all of the dependency packages were downloaded, but it keeps failing with an error that the jar cannot be found:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
at com.oreilly.learningsparkexamples.java.JavaSparkTest.main(JavaSparkTest.java:12)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 6 more

The IDE I am using is IntelliJ IDEA.

sbt assembly error

After running "sbt assembly", I got the following error:

[error] learning-spark/src/main/scala/com/oreilly/learningsparkexamples/scala/BasicParseJsonWithJackson.scala:47: not found: type ioRecord
[error] Some(mapper.readValue(record, classOf[ioRecord]))
[error] ^
[error] learning-spark/src/main/scala/com/oreilly/learningsparkexamples/scala/BasicParseJsonWithJackson.scala:53: value lovesPandas is not a member of Nothing
[error] result.filter(_.lovesPandas).map(mapper.writeValueAsString(_))
[error] ^
[error] two errors found
error Compilation failed
[error] Total time: 24 s, completed Jun 17, 2015 10:00:31 PM

Removing BasicParseJsonWithJackson.scala and recompiling would result in other errors related to protocol buffer. Has anyone successfully built the project? How did you do it?

Thanks!
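For reference, the pattern that example is aiming for with Jackson looks roughly like this (a sketch with an assumed Person case class, not the repository's exact source):

    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule

    case class Person(name: String, lovesPandas: Boolean)

    // Register the Scala module so Jackson can bind JSON to case classes.
    val mapper = new ObjectMapper()
    mapper.registerModule(DefaultScalaModule)

    val record = """{"name":"Sparky The Bear", "lovesPandas":true}"""
    val person = mapper.readValue(record, classOf[Person])  // throws on malformed input
    if (person.lovesPandas) println(mapper.writeValueAsString(person))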

sample 2.7

This code does not work:
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf = conf)

How to read a flattened-type field from Elasticsearch using the PySpark connector?

After reading an index from Elasticsearch using PySpark in Databricks, all fields appeared except one field, which is of type 'flattened'. Is there any option that I need to include during the read? The following is my snippet for reading the index:

spark.read.format("org.elasticsearch.spark.sql")\
      .option("es.nodes", ",".join(db['nodes']))\
      .option("es.mapping.date.rich", "false")\
      .option("es.net.http.auth.user", 'abc')\
      .option("es.net.http.auth.pass", '123abc')\
      .load('indexname')

elasticsearch: 7.5.1
spark: 2.4.3

java.lang.ClassNotFoundException: com.oreilly.learningsparkexamples.mini.scala.WordCount

I ran sbt clean package successfully, and then executed
spark-submit --class com.oreilly.learningsparkexamples.mini.scala.WordCount ./target/.. ./README.md ./wordcount
which throws a ClassNotFoundException.
build.sbt

version := "0.0.1"   
scalaVersion := "2.10.6"   
// additional libraries   
libraryDependencies ++= Seq(   
  "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"   
)    

spark version is 2.2.0

scala version is   
`Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL` 
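One thing worth double-checking (a guess, since the jar path above is truncated): sbt derives the jar location from the project name, so an explicit name setting makes the artifact path predictable. A sketch of a complete build.sbt, where the name value is an assumption rather than the repository's actual setting:

    // build.sbt (sketch; the name below is assumed, not taken from the repo)
    name := "learning-spark-mini-example"

    version := "0.0.1"

    scalaVersion := "2.10.6"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"
    )

    // sbt package then writes target/scala-2.10/learning-spark-mini-example_2.10-0.0.1.jar,
    // which is the jar path to pass to spark-submit.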

FTP example does not work...

I keep getting:
org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed without indication.
at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:298)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:495)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:537)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:586)
at org.apache.commons.net.ftp.FTP.quit(FTP.java:794)
at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:788)
at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1642)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:257)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)

Please note I have also tried connecting to the SEC FTP (publicly available using anonymous:youremailaddress) and I am still receiving issues.
Could you please assist in guiding me on how to open an FTP file using Spark?
My email address is [email protected]

kind regards
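As a general pointer, Spark reads ftp:// URIs through Hadoop's FTPFileSystem, so a minimal sketch looks like the following (hostname, credentials, and path are placeholders, and the server must accept the connection mode Hadoop's FTP client uses):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setMaster("local[*]").setAppName("FtpReadSketch")
    val sc = new SparkContext(conf)

    // Placeholder URI; replace user, password, host, and path with real values.
    val lines = sc.textFile("ftp://user:password@ftp.example.com/path/to/file.txt")
    println(lines.count())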
