
sparklens's Introduction



Sparklens is a profiling tool for Spark with a built-in Spark scheduler simulator. Its primary goal is to make it easy to understand the scalability limits of Spark applications. It helps in understanding how efficiently a given Spark application is using the compute resources provided to it. Maybe your application will run faster with more executors, and maybe it won't. Sparklens can answer this question by looking at a single run of your application.

It helps you narrow down to the few stages (or the driver, or skew, or lack of tasks) that are limiting your application from scaling out, and provides contextual information about what could be going wrong with these stages. Primarily, it helps you approach Spark application tuning as a well-defined method/process instead of something you learn by trial and error, saving both developer and compute time.

Sparklens Reporting as a Service

http://sparklens.qubole.com is a reporting service built on top of Sparklens. This service was built to lower the pain of sharing and discussing Sparklens output. Users can upload the Sparklens JSON file to this service and retrieve a global sharable link. The link delivers the Sparklens report in an easy-to-consume HTML format with intuitive charts and animations. It is also useful to have a link for easy reference for yourself, in case some code changes result in lower utilization or make the application slower.

What does it report?

  • Estimated completion time and estimated cluster utilisation with different numbers of executors
Executor count    31  ( 10%) estimated time 87m 29s and estimated cluster utilization 92.73%
Executor count    62  ( 20%) estimated time 47m 03s and estimated cluster utilization 86.19%
Executor count   155  ( 50%) estimated time 22m 51s and estimated cluster utilization 71.01%
Executor count   248  ( 80%) estimated time 16m 43s and estimated cluster utilization 60.65%
Executor count   310  (100%) estimated time 14m 49s and estimated cluster utilization 54.73%

Given a single run of a Spark application, Sparklens can estimate how your application will perform given any arbitrary number of executors. This helps you understand the ROI on adding executors.

  • Job/stage timeline which shows how the parallel stages were scheduled within a job. This makes it easy to visualise the DAG with stage dependencies at the job level.
07:05:27:666 JOB 151 started : duration 01m 39s 
[    668 ||||||||||||||||||||||                                                          ]
[    669 |||||||||||||||||||||||||||                                                     ]
[    673                                                                                 ]
[    674                         ||||                                                    ]
[    675                            |||||||                                              ]
[    676                                   ||||||||||||||                                ]
[    677                         |||||||                                                 ]
[    678                                                 |                               ]
[    679                                                  |||||||||||||||||||||||||||||||]

  • Lots of interesting per-stage metrics, like Input, Output, Shuffle Input and Shuffle Output per stage, plus OneCoreComputeHours available and used per stage to discover inefficient stages.

Total tasks in all stages 189446
Per Stage  Utilization
Stage-ID   Wall    Task      Task     IO%    Input     Output    ----Shuffle-----    -WallClockTime-    --OneCoreComputeHours---   MaxTaskMem
          Clock%  Runtime%   Count                               Input  |  Output    Measured | Ideal   Available| Used%|Wasted%                                  
       0    0.00    0.00         2    0.0  254.5 KB    0.0 KB    0.0 KB    0.0 KB    00m 04s   00m 00s    05h 21m    0.0  100.0    0.0 KB 
       1    0.00    0.01        10    0.0  631.1 MB    0.0 KB    0.0 KB    0.0 KB    00m 07s   00m 00s    08h 18m    0.2   99.8    0.0 KB 
       2    0.00    0.40      1098    0.0    2.1 GB    0.0 KB    0.0 KB    5.7 GB    00m 14s   00m 00s    16h 25m    3.2   96.8    0.0 KB 
       3    0.00    0.09       200    0.0    0.0 KB    0.0 KB    5.7 GB    2.3 GB    00m 03s   00m 00s    04h 35m    2.6   97.4    0.0 KB 
       4    0.00    0.03       200    0.0    0.0 KB    0.0 KB    2.3 GB    0.0 KB    00m 01s   00m 00s    01h 13m    2.9   97.1    0.0 KB 
       7    0.00    0.03       200    0.0    0.0 KB    0.0 KB    2.3 GB    2.7 GB    00m 02s   00m 00s    02h 27m    1.7   98.3    0.0 KB 
       8    0.00    0.03        38    0.0    0.0 KB    0.0 KB    2.7 GB    2.7 GB    00m 05s   00m 00s    06h 20m    0.6   99.4    0.0 KB 

Internally, Sparklens has the concept of an analyzer, a generic component for emitting interesting events. The following analyzers are currently available:

  1. AppTimelineAnalyzer
  2. EfficiencyStatisticsAnalyzer
  3. ExecutorTimelineAnalyzer
  4. ExecutorWallclockAnalyzer
  5. HostTimelineAnalyzer
  6. JobOverlapAnalyzer
  7. SimpleAppAnalyzer
  8. StageOverlapAnalyzer
  9. StageSkewAnalyzer

We are hoping that Spark experts the world over will help us with ideas or contributions to extend this set. Similarly, Spark users can help us in finding what is missing here by raising challenging tuning questions.

How to use Sparklens?

1. Using the Sparklens package while running your app

Note: Apart from the console-based report, you can also get a UI-based report similar to this one in your email. You have to pass --conf spark.sparklens.report.email=<email> along with the other relevant confs mentioned below. This functionality is available in Sparklens 0.3.2 and above.

Use the following arguments to spark-submit or spark-shell:

--packages qubole:sparklens:0.3.2-s_2.11
--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
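
For example, to also receive the emailed report mentioned above (the address below is only a placeholder), the options could be combined as follows:

--packages qubole:sparklens:0.3.2-s_2.11
--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
--conf spark.sparklens.report.email=you@example.com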

2. Run from Sparklens offline data

You can choose not to run Sparklens inside the app, but at a later time instead. Run your app as above, with these additional configuration parameters:

--packages qubole:sparklens:0.3.2-s_2.11
--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
--conf spark.sparklens.reporting.disabled=true

This will not run reporting, but instead create a Sparklens JSON file for the application, which is stored in the spark.sparklens.data.dir directory (by default, /tmp/sparklens/). Note that this will be stored on HDFS by default. To save this file to S3, set spark.sparklens.data.dir to an S3 path (see the example at the end of this section). This data file can now be used to run Sparklens reporting independently, using the spark-submit command as follows:

./bin/spark-submit --packages qubole:sparklens:0.3.2-s_2.11 --class com.qubole.sparklens.app.ReporterApp qubole-dummy-arg <filename>

<filename> should be replaced by the full path of the Sparklens JSON file. If the file is on S3, use the full S3 path. For files on the local file system, use the file:// prefix with the local file location. HDFS is supported as well.
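
For example, to run reporting against a Sparklens JSON file on the local file system (the path below is only a placeholder), the command might look like:

./bin/spark-submit --packages qubole:sparklens:0.3.2-s_2.11 --class com.qubole.sparklens.app.ReporterApp qubole-dummy-arg file:///path/to/sparklens-data.json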

You can also upload a Sparklens JSON data file to http://sparklens.qubole.com to see this report as an HTML page.
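
As noted above, where the data file is written is controlled by spark.sparklens.data.dir. A minimal sketch of pointing it at S3 when running your application (bucket and prefix are placeholders):

--conf spark.sparklens.data.dir=s3://your-bucket/sparklens/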

3. Run from Spark event-history file

You can also run Sparklens on a previously run Spark app using an event history file (similar to running via a Sparklens JSON file above), with another option specifying that this file is an event history file. The file can be in any of the formats that event history files support, i.e. text, snappy, lz4 or lzf. Note the extra source=history parameter in this example:

./bin/spark-submit --packages qubole:sparklens:0.3.2-s_2.11 --class com.qubole.sparklens.app.ReporterApp qubole-dummy-arg <filename> source=history

It is also possible to convert an event history file to a Sparklens json file using the following command:

./bin/spark-submit --packages qubole:sparklens:0.3.2-s_2.11 --class com.qubole.sparklens.app.EventHistoryToSparklensJson qubole-dummy-arg <srcDir> <targetDir>

EventHistoryToSparklensJson is designed to work on the local file system only. Please make sure that the source and target directories are on the local file system.
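
For example, assuming both directories are on the local file system (the paths below are placeholders), the conversion command might look like:

./bin/spark-submit --packages qubole:sparklens:0.3.2-s_2.11 --class com.qubole.sparklens.app.EventHistoryToSparklensJson qubole-dummy-arg /tmp/spark-events /tmp/sparklens-json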

4. Check out the code and use the normal sbt commands:

sbt compile 
sbt package 
sbt clean 

You will find the Sparklens jar in the target/scala-2.11 directory. Make sure the Scala and Java versions correspond to those required by your Spark cluster. We have tested it with Java 7/8, Scala 2.11.8 and Spark versions 2.0.0 and onwards.

Once you have the Sparklens JAR available, add the following options to your spark-submit command line:

--jars /path/to/sparklens_2.11-0.3.2.jar 
--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener

You could also add this to your cluster's spark-defaults.conf so that it is automatically available for all applications.
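
For example, the equivalent spark-defaults.conf entries might look like the following sketch (spark.jars is the property form of --jars; adjust the path to wherever the jar lives on your cluster):

spark.jars              /path/to/sparklens_2.11-0.3.2.jar
spark.extraListeners    com.qubole.sparklens.QuboleJobListener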

Working with Notebooks

It is possible to use Sparklens in your development cycle using notebooks. Sparklens keeps lots of information in memory. To make it work with notebooks, it tries to minimize the amount of memory used by keeping only a limited history of the jobs executed in Spark.

How to use Sparklens with Python notebooks (e.g. Zeppelin)

  1. Add this as the first cell
QNL = sc._jvm.com.qubole.sparklens.QuboleNotebookListener.registerAndGet(sc._jsc.sc())
import time

def profileIt(callableCode, *args):
    # Purge Sparklens' in-memory history if it has grown too large
    if QNL.estimateSize() > QNL.getMaxDataSize():
        QNL.purgeJobsAndStages()
    startTime = long(round(time.time() * 1000))  # use int(...) instead of long(...) on Python 3
    result = callableCode(*args)
    endTime = long(round(time.time() * 1000))
    time.sleep(QNL.getWaiTimeInSeconds())
    print(QNL.getStats(startTime, endTime))
    return result  # return the wrapped code's result so it isn't lost
  2. Wrap your code in some Python function, say myFunc
  3. Call profileIt(myFunc) (see the example below)
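
For example, with a hypothetical myFunc that runs a small aggregation (the function body is purely illustrative; any code that triggers Spark actions will work):

def myFunc():
    # any Spark work you want to profile
    return sc.parallelize(range(1, 1000001)).map(lambda x: x * x).sum()

profileIt(myFunc)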

As you can see, this is not the only way to use it from Python. The core function is QNL.getStats(startTime, endTime).

Another way to use this tool, so that we don't need to worry about objects going out of scope, is:

Create the QNL object as part of the first paragraph. Then, for every piece of code that requires profiling:

if (QNL.estimateSize() > QNL.getMaxDataSize()):
  QNL.purgeJobsAndStages()
startTime = long(round(time.time() * 1000))

<-- Your Python code here -->

endTime = long(round(time.time() * 1000))
time.sleep(QNL.getWaiTimeInSeconds())
print(QNL.getStats(startTime, endTime))

QNL.purgeJobsAndStages() is responsible for making sure that the tool doesn’t use too much memory. It removes historical information, throwing away data about old stages to keep the memory usage by the tool modest.

How to use Sparklens with Scala notebooks (e.g. Zeppelin)

  1. Add this as the first cell
import com.qubole.sparklens.QuboleNotebookListener
val QNL = new QuboleNotebookListener(sc.getConf)
sc.addSparkListener(QNL)
  2. Anywhere you need to profile the code:
QNL.profileIt {
    //Your code here
}

It is important to realize that QNL.profileIt takes a block of code as input. Hence any variables declared in this part are not accessible after the method returns. Of course it can refer to other code/variables in scope.

The way to go about using this tool with notebooks is to have only one cell in the profiling scope. The moment you are happy with the results, just remove the profiling wrapper and execute the same cell again. This ensures that your variables come back into scope and are accessible to the next cell. Also note that the output of the tool in notebooks is a little different from what you would see on the command line. This is just to keep the information concise. We will be making this part configurable.

Working with Streaming Applications

For using Sparklens with Spark Streaming applications, check out our new project Streaminglens.

More information?

Release Notes

  • [03/20/2018] Version 0.1.1 - Sparklens Core
  • [04/06/2018] Version 0.1.2 - Package name fixes
  • [08/07/2018] Version 0.2.0 - Support for offline reporting
  • [01/10/2019] Version 0.2.1 - Stability fixes
  • [05/10/2019] Version 0.3.0 - Support for handling parallel Jobs
  • [05/10/2019] Version 0.3.1 - Fixed JSON parsing issue with Spark 2.4.0 and above
  • [05/06/2020] Version 0.3.2 - Support for generating email based report using sparklens.qubole.com

Contributing

We haven't given this much thought. Just raise a PR and if you don't hear from us, shoot an email to [email protected] to get our attention.

Reporting bugs or feature requests

Please use the GitHub issues for the Sparklens project to report issues or raise feature requests. If you can code, better raise a PR.

sparklens's People

Contributors

abhishekd0907, beriaanirudh, emlyn, fdemesmaeker, iamrohit, mayurdb, michaelmior, rishitesh, vrajat


sparklens's Issues

QuboleJobListener threw an exception java.lang.NullPointerException

I am getting this error, which leads to my job failing. The exception results in this error: org.apache.spark.SparkException: Job 33 canceled because SparkContext was shut down

18/11/22 08:47:10 ERROR AsyncEventQueue: Listener QuboleJobListener threw an exception
java.lang.NullPointerException
	at scala.collection.mutable.HashTable$$anon$1.next(HashTable.scala:214)
	at scala.collection.mutable.HashTable$$anon$1.next(HashTable.scala:206)
	at scala.collection.mutable.HashMap$$anon$3.next(HashMap.scala:115)
	at scala.collection.IterableLike$class.head(IterableLike.scala:107)
	at scala.collection.AbstractIterable.head(Iterable.scala:54)
	at scala.collection.TraversableLike$class.last(TraversableLike.scala:431)
	at scala.collection.AbstractTraversable.last(Traversable.scala:104)
	at com.qubole.sparklens.common.AppContext$.getMap(AppContext.scala:100)
	at com.qubole.sparklens.timespan.JobTimeSpan.getMap(JobTimeSpan.scala:91)
	at com.qubole.sparklens.common.AppContext$$anonfun$getMap$1.apply(AppContext.scala:102)
	at com.qubole.sparklens.common.AppContext$$anonfun$getMap$1.apply(AppContext.scala:102)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
	at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
	at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
	at scala.collection.SetLike$class.map(SetLike.scala:92)
	at scala.collection.AbstractSet.map(Set.scala:47)
	at com.qubole.sparklens.common.AppContext$.getMap(AppContext.scala:102)
	at com.qubole.sparklens.common.AppContext.toString(AppContext.scala:58)
	at com.qubole.sparklens.QuboleJobListener.dumpData(QuboleJobListener.scala:137)
	at com.qubole.sparklens.QuboleJobListener.onApplicationEnd(QuboleJobListener.scala:167)
	at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1323)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)

This is the command I am using
spark-submit --jars /home/hadoop/ayush/sparklens_2.11-0.2.0.jar --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener --class com.oyo.spark.application.MergeIncrementalData --master yarn --deploy-mode cluster --queue ingestion /home/hadoop/jp/application-0.0.1-SNAPSHOT/application-0.0.1-SNAPSHOT-jar-with-dependencies.jar prod ingestiondb.bookings

Executor IDs not found when analyzing spark history log

I am running Sparklens with a Spark history file passed as a parameter and I get a bunch of java.util.NoSuchElementException: key not found exceptions from com.qubole.sparklens.QuboleJobListener.onExecutorRemoved.
Note that after several exceptions, the analysis completes fine.

If I understood correctly, this code is triggered when an executor is removed and looks for the executor data in the executorMap, implying that this executor id should be present in the executor map, which makes sense. Do you have any idea why this happens?
I see this happening in my stacktrace for 4-5 executor ids.
Does it make sense to access the map with a get and catch this exception, so as not to pollute the output of the analysis?

I am running on Spark 2.3 with speculation and dynamic allocation enabled (not sure if the latter can be related to the problem).

The full stacktrace is here https://www.pastiebin.com/5bef1290999d2

UPDATE: I see other exceptions being raised that look like the following

Exception in thread "pool-4-thread-3" java.util.NoSuchElementException: key not found: executorRuntime

for different thread names

SparkLens reporting from event-history files

After sparklens-9 (de-coupling reporting from spark-app), we should be able to run SparkLens simulations from event-history files as well. This will give 2 advantages:

  1. Jobs run earlier without Sparklens will also be profilable.
  2. Usability: users have an additional option for profiling (event-history log files), in addition to in-app and Sparklens JSON files.

Crash

I had a run crash with the following stacktrace (IIUC caused by a job with no stages?) causing my spark app to fail:

java.lang.UnsupportedOperationException: empty.max
	at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
	at scala.collection.AbstractTraversable.max(Traversable.scala:104)
	at com.qubole.sparklens.timespan.JobTimeSpan.computeCriticalTimeForJob(JobTimeSpan.scala:52)
	at com.qubole.sparklens.analyzer.EfficiencyStatisticsAnalyzer$$anonfun$4.apply(EfficiencyStatisticsAnalyzer.scala:62)
	at com.qubole.sparklens.analyzer.EfficiencyStatisticsAnalyzer$$anonfun$4.apply(EfficiencyStatisticsAnalyzer.scala:62)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	at com.qubole.sparklens.analyzer.EfficiencyStatisticsAnalyzer.analyze(EfficiencyStatisticsAnalyzer.scala:62)
	at com.qubole.sparklens.analyzer.AppAnalyzer$class.analyze(AppAnalyzer.scala:30)
	at com.qubole.sparklens.analyzer.EfficiencyStatisticsAnalyzer.analyze(EfficiencyStatisticsAnalyzer.scala:26)
	at com.qubole.sparklens.QuboleJobListener$$anonfun$onApplicationEnd$3.apply(QuboleJobListener.scala:159)
	at com.qubole.sparklens.QuboleJobListener$$anonfun$onApplicationEnd$3.apply(QuboleJobListener.scala:158)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
	at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
	at com.qubole.sparklens.QuboleJobListener.onApplicationEnd(QuboleJobListener.scala:158)
	at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
	at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
	at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
	at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
	at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
	at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
	at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
	at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1279)
	at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)

Emailing report feature Not Working - Unresponsive post

Upon passing an email address to the conf --conf spark.sparklens.report.email= in spark-submit, the jar internally posts a report-generation request to http://sparklens.qubole.com/generate_report/request_generate_report. This website is unresponsive, because of which an I/O exception (java.net.SocketException) is thrown.

Object and method name where the request is made-
HttpRequestHandler.requestReport

Snippet of the Exception -
21/06/27 12:10:48.913 INFO org.apache.http.impl.execchain.RetryExec: I/O exception (java.net.SocketException) caught when processing request to {}->http://sparklens.qubole.com:80: Connection reset
21/06/27 12:10:48.915 INFO org.apache.http.impl.execchain.RetryExec: Retrying request to {}->http://sparklens.qubole.com:80
21/06/27 12:11:00.928 INFO org.apache.http.impl.execchain.RetryExec: I/O exception (java.net.SocketException) caught when processing request to {}->http://sparklens.qubole.com:80: Connection reset
21/06/27 12:11:00.930 INFO org.apache.http.impl.execchain.RetryExec: Retrying request to {}->http://sparklens.qubole.com:80
21/06/27 12:11:12.939 INFO org.apache.http.impl.execchain.RetryExec: I/O exception (java.net.SocketException) caught when processing request to {}->http://sparklens.qubole.com:80: Connection reset
21/06/27 12:11:12.941 INFO org.apache.http.impl.execchain.RetryExec: Retrying request to {}->http://sparklens.qubole.com:80
Error while trying to generate email report: Connection reset
Try to use sparklens.qubole.com to generate the report manually

Scalability aware Autoscaling with Sparklens

With repetitive workloads (such as ETLs), Sparklens can leverage the knowledge of resource requirements from previous runs of a Spark application, and use it to autoscale executor requirements such that the same application latency is met with the minimum executors needed at every job, provided all other configurations of the application remain the same.

This can be done by the following:

  1. On the first run of an app, the Sparklens JSON will contain all the information needed for this. We can then show graphs comparing the actual executor scaling vs. the minimum executor autoscaling with which the same app latency can be achieved. This minimum number is on a per-job basis for the application.
  2. When the same app is run again, the user can pass the Sparklens JSON from the previous run, along with another configuration, to let Sparklens dictate the autoscaling of executors for this run.

Issue with S3 directory as spark.sparklens.data.dir

Hi,
I tried specifying an S3 location as spark.sparklens.data.dir and I am getting the following error.
The issue is reproduced in every job.

As per my understanding, this is the code which dumps the Sparklens JSON file: https://github.com/qubole/sparklens/blob/master/src/main/scala/com/qubole/sparklens/QuboleJobListener.scala#L135
It should work with S3 directories as well.

19/02/27 07:23:18 ERROR Utils: uncaught error in thread SparkListenerBus, stopping SparkContext
java.lang.ExceptionInInitializerError
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoriesGenerator.createAndTrack(TemporaryDirectoriesGenerator.java:145)
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoriesGenerator.createTemporaryDirectories(TemporaryDirectoriesGenerator.java:94)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:642)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:932)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:913)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:810)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:171)
at com.qubole.sparklens.QuboleJobListener.dumpData(QuboleJobListener.scala:136)
at com.qubole.sparklens.QuboleJobListener.onApplicationEnd(QuboleJobListener.scala:164)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1279)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
Caused by: java.lang.IllegalStateException: Shutdown in progress
at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:66)
at java.lang.Runtime.addShutdownHook(Runtime.java:211)
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoryShutdownHook.(TemporaryDirectoryShutdownHook.java:17)
... 22 more
19/02/27 07:23:18 ERROR Utils: throw uncaught fatal error in thread SparkListenerBus
java.lang.ExceptionInInitializerError
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoriesGenerator.createAndTrack(TemporaryDirectoriesGenerator.java:145)
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoriesGenerator.createTemporaryDirectories(TemporaryDirectoriesGenerator.java:94)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:642)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:932)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:913)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:810)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:171)
at com.qubole.sparklens.QuboleJobListener.dumpData(QuboleJobListener.scala:136)
at com.qubole.sparklens.QuboleJobListener.onApplicationEnd(QuboleJobListener.scala:164)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1279)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
Caused by: java.lang.IllegalStateException: Shutdown in progress
at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:66)
at java.lang.Runtime.addShutdownHook(Runtime.java:211)
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoryShutdownHook.(TemporaryDirectoryShutdownHook.java:17)
... 22 more
19/02/27 07:23:18 INFO SparkContext: SparkContext already stopped.
Exception in thread "SparkListenerBus" java.lang.ExceptionInInitializerError
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoriesGenerator.createAndTrack(TemporaryDirectoriesGenerator.java:145)
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoriesGenerator.createTemporaryDirectories(TemporaryDirectoriesGenerator.java:94)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:642)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:932)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:913)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:810)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:171)
at com.qubole.sparklens.QuboleJobListener.dumpData(QuboleJobListener.scala:136)
at com.qubole.sparklens.QuboleJobListener.onApplicationEnd(QuboleJobListener.scala:164)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1279)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
Caused by: java.lang.IllegalStateException: Shutdown in progress
at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:66)
at java.lang.Runtime.addShutdownHook(Runtime.java:211)
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoryShutdownHook.(TemporaryDirectoryShutdownHook.java:17)
... 22 more
19/02/27 07:23:18 INFO YarnClusterSchedulerBackend: Shutting down all executors

Cannot convert Spark event log to SparkLens report (error is null)

Hi, thanks for the work on Sparklens!
I'm trying to convert a historical Spark event log file into a SparkLens report with the following command:

spark-submit --packages qubole:sparklens:0.2.1-s_2.11 --class com.qubole.sparklens.app.EventHistoryToSparklensJson dummy-arg spark-event-log-file report.json

where spark-event-log-file is an uncompressed Spark event log file.
But I get the following error:

Converting Event History files to Sparklens Json files
src: /Users/jrjd/spark-event-log-file destination: /Users/jrjd/report.json
Failed to process file: /Users/jrjd/spark-event-log-file error: null
19/06/18 11:24:18 INFO ShutdownHookManager: Shutdown hook called
19/06/18 11:24:18 INFO ShutdownHookManager: Deleting directory /private/var/folders/8z/fvg7d2fd7td98rzbvs7mjvzr0000gn/T/spark-0db82d35-d70a-40ce-ad50-3cfb03395168

Here is my version of Spark:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_25

By any chance do you have any idea about what is happening? I have a hard time debugging because of the error: null. Thanks!

Sparklens with spark-submit

Once we run the Spark application with spark-submit, adding the Sparklens configs below, where can we find the Sparklens results? Or does Sparklens only work with notebooks?

--packages qubole:sparklens:0.1.2-s_2.11
--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener

JAR version issue while implementing StreamingLens

I am trying to implement StreamingLens in a Spark application. I have added the 2 lines below to the existing code, as suggested here: https://github.com/qubole/streaminglens


    1. class StreamingLens_POC(spark: SparkSession, options: RequestBuilder){}
    2. val streamingLens = new StreamingLens_POC(spark, options) 

    // Added New Block For StreamingLense
    class StreamingLens_POC(spark: SparkSession, options: RequestBuilder)

   // Existing Code which was working fine without any issue.
    object StreamingLens_POC {
    def main(args: Array[String]): Unit = {
    val applicationName = args(0) 
    val spark = SparkSession
   .builder()
   .appName(applicationName)
   //.config("spark.master", "local") //Addition code to execute in local
   .getOrCreate()
  println("Spark Streaming Lens POC Program Started")
  val streamingLens = new StreamingLens_POC(spark, options)   // added this new line for StreamingLense
 //..... existing code Code....
..
..
..
..
}

After that, when I try to execute this application on the server using the spark-submit command below:

    spark-submit \
    --name SPARK_STREAMING_POC \
    --num-executors 1 \
    --jars  /home/username/jar/spark-streaminglens_2.11-0.5.3.jar , /home/username/jar/logstash-gelf-1.3.1.jar, ..(other required jar) \
    --master yarn --deploy-mode cluster --driver-cores 1 --driver-memory 2G --executor-cores 1 --executor-memory 2G \
    --supervise --class com.pkg.data.StreamingLens_POC /home/username/jar/PrjectJarName.jar \
    SPARK_STREAMING_POC

But it gives the error below:

     21/09/24 11:50:26 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: biz.paluch.logging.gelf.log4j.GelfLogAppender.setAdditionalFieldTypes(Ljava/lang/String;)V
    java.lang.NoSuchMethodError: biz.paluch.logging.gelf.log4j.GelfLogAppender.setAdditionalFieldTypes(Ljava/lang/String;)V

Can someone kindly suggest whether I need to do any additional task here?

Error parsing spark 2.2.0 event logs

Certain Spark 2.2.0 event logs (dumped from Qubole) cannot be parsed, failing with this error:

18/08/20 16:52:05 ERROR ReplayListenerBus: Malformed line #2: {"Event":"org.apache.spark.scheduler.SparkListenerAMStart","containerId":"container_1534754524533_0006_01_000001","hostname":"$REDACTED"}
java.lang.ClassNotFoundException: org.apache.spark.scheduler.SparkListenerAMStart
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
	at org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:521)
	at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.qubole.sparklens.app.EventHistoryReporter.<init>(EventHistoryReporter.scala:33)

Spark tries to look up the event class from the event type in the JSON file, but it encountered an event type whose class mapping is not found. I could not find any references to SparkListenerAMStart in Spark's source code, so I'm curious how Spark could have generated this event.

My guesses as to what happened:

  • Spark compatibility issues between old and new versions of Spark. The event log was written with Spark 2.2 but the parser assumes Spark 2.0, i.e. the deserialization code from Spark 2.0 does not have the proper mapping to handle Spark 2.2 events.
  • Some qubole-custom extension to Spark is writing out these non-standard events.

resolver-fix

plugins.sbt has a resolver pointing to https://dl.bintray.com/spark-packages/maven, but this website is forbidden.
When creating a jar from this repo, it throws an error: Resolver Exception - sbt-spark-package 0.2.4 not found.

Solution: update plugins.sbt to use repos.spark-packages.org instead of dl.bintray.com/spark-packages/maven.
PR raised with the resolution: #69

Email report generation is not working

I have seen 2 issues with email report generation

  1. An email having a dot (.) is considered invalid, example: [email protected] (my office email has this pattern, and when I tried generating the email report, I was getting 'invalid email' in the YARN logs). I tried validating this regex https://github.com/qubole/sparklens/blob/7fa57b9606abaace2ff7459028a6bf5a68fd5fa9/src/main/scala/com/qubole/sparklens/helper/EmailReportHelper.scala#L15 through https://regex101.com/ for the above-mentioned email and I'm getting a full match as [email protected]


  2. I tried generating the report with a regular email id (without any dots in the email) and even then I was not getting any email reports.

Getting negative wallclock time in the Sparklens UI

Hi,
Thank you for the amazing tool for performance monitoring of Spark jobs. I was trying out some long-running Spark queries using Sparklens. However, I was getting some strange output in the Sparklens UI regarding the job time, like the wallclock time etc. (see the attached screenshots).
Could you please help me to resolve this issue?

Fix number of executors for calculating CoreComputeTime

Currently, while calculating CoreComputeTime (i.e. "time-of-app" X "total-num-of-cores"), we include cores from all executors irrespective of how long an executor lived in the spark-app time-frame. This could give a blown-up number for the available CoreComputeTime, because executors could go down and new executors could come up all the time.

On the other hand, finding the exact CoreComputeTime (i.e. the summation across executors of "time-of-executor" X "num-cores-per-executor") is also not a good number, since it does not tell us whether the machines had gone down along with the executors or not (which is often 'not'). So the compute wasted on hosts without executors would not be accounted for if we only look at executor up-time.

So a better number would be "maximum-concurrent-executors-that-ran" X "time-of-app", which would give us a better estimate of the compute available.

Spark streaming support

Sparklens seems to wait for the job to complete; however, it doesn't seem to work with Spark streaming jobs. Do you have any plan to fix this?

MaxTaskMem 0.0 KB always

Hello there! Thanks for the tool, it looks promising. Nice that it works on historical events data!

I tried that and got 'Max memory which an executor could have taken = 0.0 KB', and 0.0 KB everywhere in the 'MaxTaskMem' column, which is obviously wrong since the job ran and used some memory.
Other metrics look at least sane.

I wonder what the cause could be?

My ultimate goal is to find out how much I can lower the memory footprint of a job without getting an OOM; this 'MaxTaskMem' field looks like a good fit for that, right?

request: option to add spark.job.description to the report

While reading, for example, the printed application timeline, looking at 535 jobs it is hard to relate a number to an action in the Spark program. In the Spark UI there is a solution for this: set spark.job.description and/or spark.jobGroup.id in the driver for every (eager) action you want the Spark executors to do (see the answers to https://stackoverflow.com/questions/39123314/how-to-add-custom-description-to-spark-job-for-displaying-in-spark-web-ui).

Is it possible to add an option to Sparklens to choose to report the spark.job.description and/or spark.jobGroup.id with its Job id?

Not Able to See StreamingLens Report In Logs.

Hi,

I am trying to implement StreamingLens in my existing streaming application. My application is working fine and is loading data from one Kafka topic to another Kafka topic. But in Ambari I am not able to see the StreamingLens reports. When I did this for a batch application using Sparklens, I could see the logs generated by Sparklens with all the resource information, but I am not able to see the same report for the streaming application. Can someone suggest whether I need to add additional code, or where I should check for the reports that StreamingLens should generate?

My Sample code

      class SparkStreamingLens(spark: SparkSession, options: RequestBuilder)
      object SparkStreamingLens {
      def main(args: Array[String]): Unit = {
      println(" Spark Parameters are :")
      val igniteDetails = args(0)  
      val applicationName = args(1)
      val argumentTable = args(2)
      // options.addParameter("streamingLens.reporter.intervalMinutes", "1")
      val spark = SparkSession
      .builder()
      .appName(applicationName)
      .getOrCreate()
      val streamingLens = new SparkStreamingLens(spark, options)
      // Remaining Code to Read from Kafka and write Into Kafka(Streaming Data)
      }
      }

  spark-submit Command:

  spark-submit \
  --verbose \
  --name SparkStreamingLens \
  --num-executors 1  \
  --conf streamingLens.reporter.intervalMinutes=1  \
  --jars /home/abc/jars/spark-streaminglens_2.11-0.5.3.jar,\
 /home/abc/jars/kafka-clients-0.10.2.1.jar,\
  --master yarn \
  --deploy-mode cluster \
  --driver-cores 1  --driver-memory 2G  --executor-cores 1  --executor-memory 2G \
  --supervise --class com.data.datalake.SparkStreamingLens \
 /home/abc/jar/SparkStreamingLens-spark-utility_2.11-1.0.jar  \
 "jdbc:ignite:thin://00.000.00.00:00000;distributedJoins=true;user=aaaaaa;password=aaaaaaa;"  \
 SparkStreamingLens \
 argumentTable

Not able to run Sparklens on Spark history file

hi Guys,

I am trying to get a Sparklens report from my history file. However, it's not able to identify the arguments.

I am getting the following error:

[sa_awsprd_nztoaws@ip-10-160-1-109 gshah]$ spark-submit --jars sparklens-0.3.2-s_2.11.jar  --class com.qubole.sparklens.app.ReporterApp qubole-dummy-arg  file:////home/gshah/application_1589573238486_1924 source=file
20/09/03 14:57:22 WARN DependencyUtils: Local jar /home/gshah/qubole-dummy-arg does not exist, skipping.
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 3788"...
Killed

Support for zipped offline log files

When downloading from the Spark History Server, logs are zipped by default. Can Sparklens support reading this format without having to manually unpack?

Release for Scala 2.12

Hi,

Any plans to release for Scala 2.12?

Support for Scala 2.13 and Spark 3 would also be good to have.

Thanks,
Janek

How does source=history option work?

I am trying to run sparklens on event logs of my application.

I am using the following command:

./bin/spark-submit \
	--packages qubole:sparklens:0.2.0-s_2.11 \
	--master local[0] \
	--class com.qubole.sparklens.app.ReporterApp \
	qubole-dummy-arg file:///Users/shasidhar/interests/sparklens/eventlog.txt source=history

I see the following output in the console:

Ivy Default Cache set to: /Users/shasidhar/.ivy2/cache
The jars for the packages stored in: /Users/shasidhar/.ivy2/jars
:: loading settings :: url = jar:file:/Users/shasidhar/interests/spark/spark-2.3.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
qubole#sparklens added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
	found qubole#sparklens;0.2.0-s_2.11 in spark-packages
:: resolution report :: resolve 177ms :: artifacts dl 5ms
	:: modules in use:
	qubole#sparklens;0.2.0-s_2.11 from spark-packages in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
	confs: [default]
	0 artifacts copied, 1 already retrieved (0kB/6ms)
2019-01-03 15:46:11 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Warning: Local jar /Users/shasidhar/interests/spark/spark-2.3.0-bin-hadoop2.7/qubole-dummy-arg does not exist, skipping.

2019-01-03 15:46:52 INFO  ShutdownHookManager:54 - Shutdown hook called
2019-01-03 15:46:52 INFO  ShutdownHookManager:54 - Deleting directory /private/var/folders/3t/rfd2djjs1yg30mhmw8z_s7tw0000gp/T/spark-7a992110-6a4f-44f4-9473-1ddade11b53a

What exactly do I need to look at after this? Does it generate a Sparklens JSON file? If yes, where can I see the output file?

Decouple reporting from spark-app

Right now, the simulations are run inside the spark-app. This may add a penalty to the app runtime.

By de-coupling the simulation we also gain the ability to run simulations independently on any previously-run app. This is also a goal of the de-coupling.

EventHistoryToSparklensJson doesn't work with local or HDFS events file/directory

The EventHistoryToSparklensJson class treats the input events file argument as a local file or directory. However, the EventHistoryReporter class, used internally, reads it as an HDFS file.

This makes both local and HDFS events files unusable with EventHistoryToSparklensJson.
The doc mentions that the input file should be a local path.

To circumvent this issue, I had to keep the events file in both the local and HDFS filesystems at identical paths.

Jar used: https://mvnrepository.com/artifact/qubole/sparklens/0.3.1-s_2.11
Java 8/Scala 2.11/Spark 2.4.3/AWS EMR

Implementation of StreamingLens without changing existing code

Hi, I am trying to implement Sparklens for a streaming application. I was able to do the same for a Spark batch application by adding the additional lines below to my batch application's spark-submit command.

--jars /home/sparklens-0.1.2-s_2.11.jar
--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
--conf spark.sparklens.reporting.disabled=true

This worked without changing anything in the code.

Now I am trying to do the same with my streaming applications, using --jars /path/to/spark-streaminglens_2.11-0.5.3.jar in the spark-submit command. Can someone kindly suggest whether there is any way to run the same without changing anything in the code?

//all Imports

object StreamApp {

def main(args: Array[String]): Unit = {

//SparkSession With sparklens Properties
val spark = SparkSession
.builder()
.appName("SparkLence_With_Spark_Submit")
.getOrCreate()

  // code for streaming Application

  }
  }

The qubole#sparklens;0.3.2-s_2.11 module is intermittently not found in the SparkPackages repo

According to https://mvnrepository.com/artifact/qubole/sparklens/0.3.2-s_2.11, the module should be available in SparkPackages, but we've seen intermittent download failures over the past week or so. For instance, right now, the package is not available. It looks like potentially an issue with https://dl.bintray.com/, as all requests are getting a 403 Forbidden response. Do you have any plans to host the module on a different maven repo?

Sparklens for streaming

Hi,

Are there any plans to adjust Sparklens for streaming processing? I assume that right now it is suitable only for batch processes?

Best,
Dominika

What is the overhead of running sparklens with every job.

I am planning to keep Sparklens running for each job, providing me statistics for each job, so that later on I can compare different metrics across different jobs.
I wanted to figure out what the overhead of running Sparklens for a Spark job is.

Error while opening PySpark shell with the package and conf on my local

Error

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: scala/Product$class

pyspark --version

22/02/03 18:11:23 WARN Utils: Your hostname, BLRETV-C02G23EY.local resolves to a loopback address: 127.0.0.1; using 192.168.118.186 instead (on interface en0)
22/02/03 18:11:23 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/Cellar/apache-spark/3.2.0/libexec/jars/spark-unsafe_2.12-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 11.0.12
Branch HEAD
Compiled by user ubuntu on 2021-10-06T12:46:30Z
Revision 5d45a415f3a29898d92380380cfd82bfc7f579ea
Url https://github.com/apache/spark
Type --help for more information.

pyspark --packages qubole:sparklens:0.3.2-s_2.11 --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener

Python 3.8.2 (default, Jun  8 2021, 11:59:35)
[Clang 12.0.5 (clang-1205.0.22.11)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
22/02/03 18:01:16 WARN Utils: Your hostname, BLRETV-C02G23EY.local resolves to a loopback address: 127.0.0.1; using 192.168.118.186 instead (on interface en0)
22/02/03 18:01:16 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/Cellar/apache-spark/3.2.0/libexec/jars/spark-unsafe_2.12-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
:: loading settings :: url = jar:file:/usr/local/Cellar/apache-spark/3.2.0/libexec/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/aman.rv/.ivy2/cache
The jars for the packages stored in: /Users/aman.rv/.ivy2/jars
qubole#sparklens added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-876ed4ea-9694-4b7e-94dd-880032fb7c49;1.0
	confs: [default]
	found qubole#sparklens;0.3.2-s_2.11 in spark-packages
:: resolution report :: resolve 104ms :: artifacts dl 2ms
	:: modules in use:
	qubole#sparklens;0.3.2-s_2.11 from spark-packages in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-876ed4ea-9694-4b7e-94dd-880032fb7c49
	confs: [default]
	0 artifacts copied, 1 already retrieved (0kB/6ms)
22/02/03 18:01:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/02/03 18:01:18 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext should be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
py4j.ClientServerConnection.run(ClientServerConnection.java:106)
java.base/java.lang.Thread.run(Thread.java:829)
22/02/03 18:01:18 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
/usr/local/Cellar/apache-spark/3.2.0/libexec/python/pyspark/shell.py:42: UserWarning: Failed to initialize Spark session.
  warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
  File "/usr/local/Cellar/apache-spark/3.2.0/libexec/python/pyspark/shell.py", line 38, in <module>
    spark = SparkSession._create_shell_session()  # type: ignore
  File "/usr/local/Cellar/apache-spark/3.2.0/libexec/python/pyspark/sql/session.py", line 553, in _create_shell_session
    return SparkSession.builder.getOrCreate()
  File "/usr/local/Cellar/apache-spark/3.2.0/libexec/python/pyspark/sql/session.py", line 228, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/usr/local/Cellar/apache-spark/3.2.0/libexec/python/pyspark/context.py", line 392, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/usr/local/Cellar/apache-spark/3.2.0/libexec/python/pyspark/context.py", line 146, in __init__
    self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
  File "/usr/local/Cellar/apache-spark/3.2.0/libexec/python/pyspark/context.py", line 209, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/usr/local/Cellar/apache-spark/3.2.0/libexec/python/pyspark/context.py", line 329, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/usr/local/Cellar/apache-spark/3.2.0/libexec/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1573, in __call__
    return_value = get_return_value(
  File "/usr/local/Cellar/apache-spark/3.2.0/libexec/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: scala/Product$class
	at com.qubole.sparklens.common.ApplicationInfo.<init>(ApplicationInfo.scala:22)
	at com.qubole.sparklens.QuboleJobListener.<init>(QuboleJobListener.scala:42)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2876)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2868)
	at org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1(SparkContext.scala:2538)
	at org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1$adapted(SparkContext.scala:2537)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2537)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:641)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:829)

formula for ideal executor plot

Hi,

First of all, amazing project!

From the report generated by http://sparklens.qubole.com/, I see the ideal executor plot, which plots "the minimal number of executors (ideal) which could have finished the same work in the same amount of wall clock time".

I am curious what formulas or equations are behind this plot. If you can give some explanation of how you approach it, that would be great. Thanks!
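
For what it's worth, one way to reason about that number (a rough sketch of the intuition, not the actual Sparklens implementation; the names below are illustrative): if all of the task compute time observed in the run could be spread perfectly across executor cores, the smallest executor count that still finishes within the observed wall clock time is roughly the total one-core compute time divided by the product of wall clock time and cores per executor.

    // Illustrative sketch only, not the code Sparklens uses.
    // totalTaskTimeMs  : sum of task durations across all stages (the "OneCoreComputeHours used")
    // wallClockMs      : wall clock time of the application (or of a time bucket for the plot)
    // coresPerExecutor : spark.executor.cores
    def idealExecutors(totalTaskTimeMs: Long, wallClockMs: Long, coresPerExecutor: Int): Long = {
      require(wallClockMs > 0 && coresPerExecutor > 0)
      val perExecutorCapacityMs = wallClockMs * coresPerExecutor
      // ceil(totalTaskTimeMs / perExecutorCapacityMs)
      (totalTaskTimeMs + perExecutorCapacityMs - 1) / perExecutorCapacityMs
    }

    idealExecutors(totalTaskTimeMs = 3600000L, wallClockMs = 600000L, coresPerExecutor = 2)  // 3

In practice the plot is computed over time buckets and has to account for driver-only phases and task skew, so the real calculation is more involved than this sketch.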

IncompatibleClassChangeError when using sparklens with Spark 1.6

Following the instructions on the home page, I built the code and used it in the spark-submit: I added the jar to the list of jars and added the --conf.

Here is the error I get:

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IncompatibleClassChangeError: class com.qubole.sparklens.QuboleJobListener has interface org.apache.spark.scheduler.SparkListener as super class
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:175)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2134)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2131)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2131)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:589)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)

The same error occurs when running a Scala Spark application.

Implementation Of StreamingLens in Existing Spark Streaming Applications

I am trying to implement StreamingLens in a Spark application. I have added the two lines below to the existing code, as suggested at https://github.com/qubole/streaminglens.

    1. class StreamingLens_POC(spark: SparkSession, options: RequestBuilder) {}
    2. val streamingLens = new StreamingLens_POC(spark, options)

    // Added new block for StreamingLens
    class StreamingLens_POC(spark: SparkSession, options: RequestBuilder)

    // Existing code, which was working fine without any issue
    object StreamingLens_POC {
      def main(args: Array[String]): Unit = {
        val applicationName = args(0)
        val spark = SparkSession
          .builder()
          .appName(applicationName)
          //.config("spark.master", "local") // additional config to execute locally
          .getOrCreate()
        println("Spark Streaming Lens POC Program Started")
        val streamingLens = new StreamingLens_POC(spark, options) // added this new line for StreamingLens
        // ... existing code ...
      }
    }

After that, when I try to execute this application on the server using the spark-submit command below:

    spark-submit \
    --name SPARK_STREAMING_POC \
    --num-executors 1 \
    --jars  /home/username/jar/spark-streaminglens_2.11-0.5.3.jar , /home/username/jar/logstash-gelf-1.3.1.jar, ..(other required jar) \
    --master yarn --deploy-mode cluster --driver-cores 1 --driver-memory 2G --executor-cores 1 --executor-memory 2G \
    --supervise --class com.pkg.data.StreamingLens_POC /home/username/jar/PrjectJarName.jar \
    SPARK_STREAMING_POC

But it is giving the error below:

     21/09/24 11:50:26 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: biz.paluch.logging.gelf.log4j.GelfLogAppender.setAdditionalFieldTypes(Ljava/lang/String;)V
    java.lang.NoSuchMethodError: biz.paluch.logging.gelf.log4j.GelfLogAppender.setAdditionalFieldTypes(Ljava/lang/String;)V

Can someone kindly suggest whether I need to do any additional task here?
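
One thing worth double-checking, independent of the GelfLogAppender conflict (a NoSuchMethodError on a third-party class usually points to a version clash on the classpath rather than anything StreamingLens-specific): spark-submit expects --jars to be a single comma-separated list with no spaces around the commas, so with the spaces shown above everything after the first path is parsed as separate arguments. A corrected form of that option, using the paths from the command above, would be:

    --jars /home/username/jar/spark-streaminglens_2.11-0.5.3.jar,/home/username/jar/logstash-gelf-1.3.1.jar \

with any other required jars appended to the same comma-separated list.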

EventLoss analyzer for detecting event loss

Loss of critical events like StageEnd, JobEnd, ExecutorAdded, etc., leads to inaccurate reports. #56 highlights this problem. We should write an EventLossDetector analyzer which detects event loss and either informs the user that the report might not be accurate or does not show the report at all.

StageSkewAnalyzer: Arithmetic Exception: division by zero

I'm receiving this error when running Sparklens on a Spark history file:

Failed in Analyzer StageSkewAnalyzer
java.lang.ArithmeticException: / by zero
        at com.qubole.sparklens.analyzer.StageSkewAnalyzer$$anonfun$computePerStageEfficiencyStatistics$3.apply(StageSkewAnalyzer.scala:109)
        at com.qubole.sparklens.analyzer.StageSkewAnalyzer$$anonfun$computePerStageEfficiencyStatistics$3.apply(StageSkewAnalyzer.scala:90)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at com.qubole.sparklens.analyzer.StageSkewAnalyzer.computePerStageEfficiencyStatistics(StageSkewAnalyzer.scala:90)
        at com.qubole.sparklens.analyzer.StageSkewAnalyzer.analyze(StageSkewAnalyzer.scala:33)
        at com.qubole.sparklens.analyzer.AppAnalyzer$class.analyze(AppAnalyzer.scala:32)
        at com.qubole.sparklens.analyzer.StageSkewAnalyzer.analyze(StageSkewAnalyzer.scala:27)
        at com.qubole.sparklens.analyzer.AppAnalyzer$$anonfun$startAnalyzers$1.apply(AppAnalyzer.scala:91)
        at com.qubole.sparklens.analyzer.AppAnalyzer$$anonfun$startAnalyzers$1.apply(AppAnalyzer.scala:89)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
        at com.qubole.sparklens.analyzer.AppAnalyzer$.startAnalyzers(AppAnalyzer.scala:89)
        at com.qubole.sparklens.QuboleJobListener.onApplicationEnd(QuboleJobListener.scala:168)
        at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
        at org.apache.spark.scheduler.ReplayListenerBus.doPostEvent(ReplayListenerBus.scala:35)
        at org.apache.spark.scheduler.ReplayListenerBus.doPostEvent(ReplayListenerBus.scala:35)
        at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
        at org.apache.spark.scheduler.ReplayListenerBus.postToAll(ReplayListenerBus.scala:35)
        at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:85)
        at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at com.qubole.sparklens.app.EventHistoryReporter.<init>(EventHistoryReporter.scala:38)
        at com.qubole.sparklens.app.ReporterApp$.parseInput(ReporterApp.scala:54)
        at com.qubole.sparklens.app.ReporterApp$.delayedEndpoint$com$qubole$sparklens$app$ReporterApp$1(ReporterApp.scala:27)
        at com.qubole.sparklens.app.ReporterApp$delayedInit$body.apply(ReporterApp.scala:20)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.App$class.main(App.scala:76)
        at com.qubole.sparklens.app.ReporterApp$.main(ReporterApp.scala:20)
        at com.qubole.sparklens.app.ReporterApp.main(ReporterApp.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

In the output I can see that the total number of cores available is 10 and the total number of executors is 11. What could be the cause of this?
It results in the executorCores variable being equal to zero, which leads to the exception above.
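
A plausible explanation, sketched under the assumption that the per-executor core count is derived by dividing total cores by executor count (the names below are illustrative, not the exact Sparklens fields):

    // Illustrative only: integer division makes the derived core count zero
    val totalCores = 10
    val totalExecutors = 11
    val executorCores = totalCores / totalExecutors  // 10 / 11 == 0 with integer division
    // any later "something / executorCores" then throws
    // java.lang.ArithmeticException: / by zero

If that is what is happening, any run where more executors are counted than cores are available (for example when executors are added and removed over the lifetime of the application) would be enough to trigger it.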

How to understand the Sparklens statistics. Is there any wiki?

Hi @mayurdb / @iamrohit / @rishitesh

We have generated our application's performance report, but we are having trouble understanding the output (HTML) report. We generated the JSON file and uploaded it to your reporting service, which produced a nice HTML report (http://sparklens.qubole.com/report_view/616b961beb11dc0298b1).

Could you please direct us to any documentation that explains how to interpret the output HTML report? Thank you in advance.

Gaurav

Sparklens data directory not found

Hi There,

We are running Sparklens with the application itself, and we also want to check the offline data directory to generate the JSON, but no directory named /tmp/sparklens is found after the application has finished.

We are not able to understand why this is happening, or whether we are missing some configuration. Could you please give us some pointers here?

Thanks,
Gaurav

Not able to see the sparklens JSON file at the mentioned location

Hi, I have recently started learning Sparklens and am trying to generate a sample JSON file for my Spark application. I'm using the spark-submit command below.

    spark-submit \
    --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
    --conf spark.sparklens.reporting.disabled=true \
    --conf spark.sparklens.data.dir=/home/data/sparklens.json \
    --num-executors 1 \
    --jars /home/data/SparkLensJar/sparklens-0.1.2-s_2.11.jar \
    --master yarn \
    --deploy-mode cluster \
    --driver-cores 1 \
    --driver-memory 1G \
    --executor-cores 1 \
    --executor-memory 1G \
    --supervise --class com.spark.data.Sparklens \
    /home/data/SparkLensJar/data-spark-utility_2.11-1.0.jar

I am able to see the report/metrics in the log, but it is not creating any JSON file at the mentioned location. I have also tried an HDFS location (/user/username) instead of the local file path, and I still cannot see the JSON file.
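
In case it helps, spark.sparklens.data.dir is meant to point at a directory (the default is /tmp/sparklens), not at a file name; the listener writes its JSON file, named after the application id, inside that directory. In yarn cluster mode the driver runs on a cluster node, so a local path would be created on that node rather than on the submit host, which is why an HDFS directory is the safer choice. It may also be worth confirming that the Sparklens jar in use is recent enough to emit the JSON data file. A sketch of how the submit might look under those assumptions (the HDFS path below is illustrative):

    spark-submit \
    --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
    --conf spark.sparklens.reporting.disabled=true \
    --conf spark.sparklens.data.dir=hdfs:///user/username/sparklens/ \
    --num-executors 1 \
    --jars /home/data/SparkLensJar/sparklens-0.1.2-s_2.11.jar \
    --master yarn --deploy-mode cluster \
    --driver-cores 1 --driver-memory 1G --executor-cores 1 --executor-memory 1G \
    --class com.spark.data.Sparklens \
    /home/data/SparkLensJar/data-spark-utility_2.11-1.0.jar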

Can PySpark use Sparklens?

I have a PySpark program, running on Spark 2.4.0 and Hadoop 3.0.0. I use

spark-submit --jars sparklens-0.3.2-s_2.11.jar --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener --conf spark.sparklens.reporting.disabled=true

to submit the PySpark program, but an error occurs. Does anybody know why?
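
For reference, Sparklens does work with PySpark, since QuboleJobListener is a regular SparkListener attached on the driver. A minimal sketch of a submit using the spark-packages coordinate (the script name is a placeholder; the s_2.11 artifact matches Scala 2.11 builds of Spark such as 2.4.0):

    spark-submit \
    --packages qubole:sparklens:0.3.2-s_2.11 \
    --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
    your_pyspark_app.py

If --jars is used instead, the jar path just needs to be readable from wherever the driver runs.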
