dataflint / spark
Performance Observability for Apache Spark
License: Apache License 2.0
(I've been playing with it a bit; this tool looks pretty cool, thanks for sharing it!)
I tried a basic two-way join with an aggregate on TPC-DS tables.
When viewing a SQL query (Summary --> clicking a query), the join operator (ShuffledHashJoin) is shown as SELECT in both the BASIC view and the ADVANCED view.
I am using Spark 3.5.0; this is the physical plan:
AdaptiveSparkPlan (31)
+- == Final Plan ==
   TakeOrderedAndProject (20)
   +- * HashAggregate (19)
      +- AQEShuffleRead (18)
         +- ShuffleQueryStage (17), Statistics(sizeInBytes=399.0 KiB, rowCount=1.28E+4)
            +- Exchange (16)
               +- * HashAggregate (15)
                  +- * Project (14)
                     +- * ShuffledHashJoin Inner BuildRight (13)
                        :- AQEShuffleRead (6)
                        :  +- ShuffleQueryStage (5), Statistics(sizeInBytes=87.9 MiB, rowCount=2.88E+6)
                        :     +- Exchange (4)
                        :        +- * Filter (3)
                        :           +- * ColumnarToRow (2)
                        :              +- Scan parquet (1)
                        +- AQEShuffleRead (12)
                           +- ShuffleQueryStage (11), Statistics(sizeInBytes=281.3 KiB, rowCount=1.80E+4)
                              +- Exchange (10)
                                 +- * Filter (9)
                                    +- * ColumnarToRow (8)
                                       +- Scan parquet (7)
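To cross-check what the UI displays against the plan, the operator names and ids can be pulled out of the plan text. This is only a minimal sketch: `plan_operators` is a hypothetical helper, not part of Spark or DataFlint, and the pattern is tuned to the plan format pasted above.

```python
import re

# Matches lines such as "+- * ShuffledHashJoin Inner BuildRight (13)":
# an optional tree-drawing prefix, an optional codegen "*", the operator
# text, and the operator id in parentheses.
_NODE = re.compile(
    r"^[\s:+\-]*(?:\*\s*)?(?P<name>[A-Za-z][\w. ]*?)\s*\((?P<id>\d+)\)"
)


def plan_operators(plan_text):
    """Map operator id -> operator text for each node line in a plan dump."""
    ops = {}
    for line in plan_text.splitlines():
        match = _NODE.match(line)
        if match:
            ops[int(match.group("id"))] = match.group("name")
    return ops


sample = """\
+- * Project (14)
+- * ShuffledHashJoin Inner BuildRight (13)
:- AQEShuffleRead (6)
"""
ops = plan_operators(sample)
print(ops[13])  # ShuffledHashJoin Inner BuildRight
```

Filtering the result for names containing `Join` gives the operators that would be expected to appear as joins rather than SELECT in the query view.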
Hello everyone, I know that AWS Glue is not in the supported platforms list, but I decided to give it a try and see if it would work.
This attempt failed, resulting in an error when initializing the Spark Context.
I was wondering if this is a known issue, or if anyone managed to get this working.
Environment
spark version: 3.3
platform: Glue 4.0
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Session and context initialized and job running successfully.
Additional context
Returned error:
  File "/tmp/job.py", line 78, in <module>
    .getOrCreate()
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 269, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/context.py", line 491, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/context.py", line 197, in __init__
    self._do_init(
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/context.py", line 282, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/context.py", line 410, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1585, in __call__
    return_value = get_return_value(
  File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:529)
at scala.None$.get(Option.scala:527)
at org.apache.spark.dataflint.DataflintSparkUILoader$.install(DataflintSparkUILoader.scala:17)
at io.dataflint.spark.SparkDataflintDriverPlugin.registerMetrics(SparkDataflintPlugin.scala:26)
at org.apache.spark.internal.plugin.DriverPluginContainer.$anonfun$registerMetrics$1(PluginContainer.scala:75)
at org.apache.spark.internal.plugin.DriverPluginContainer.$anonfun$registerMetrics$1$adapted(PluginContainer.scala:74)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.internal.plugin.DriverPluginContainer.registerMetrics(PluginContainer.scala:74)
at org.apache.spark.SparkContext.$anonfun$new$41(SparkContext.scala:681)
at org.apache.spark.SparkContext.$anonfun$new$41$adapted(SparkContext.scala:681)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:681)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
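The `None.get` raised inside `DataflintSparkUILoader.install` means an `Option` that was expected to hold a value was empty. One plausible candidate, not confirmed by this report, is the Spark UI handle, which is absent when `spark.ui.enabled` is `false`. A minimal sketch of a pre-flight check on the configuration, assuming that reading; `dataflint_ui_conflict` is a hypothetical helper, and matching plugins by the substring "dataflint" is likewise an assumption:

```python
def dataflint_ui_conflict(conf):
    """Return True when a DataFlint plugin is configured but the Spark UI is off.

    `conf` is a plain dict of Spark configuration keys to string values.
    """
    raw_plugins = conf.get("spark.plugins", "")
    plugins = [p.strip() for p in raw_plugins.split(",") if p.strip()]
    # Assumption: any plugin class whose name contains "dataflint" needs the UI.
    has_dataflint = any("dataflint" in p.lower() for p in plugins)
    # spark.ui.enabled defaults to true when it is not set.
    ui_enabled = str(conf.get("spark.ui.enabled", "true")).strip().lower() == "true"
    return has_dataflint and not ui_enabled
```

With PySpark, the dict can be built from `dict(spark_conf.getAll())` before calling `getOrCreate()`. If the check returns `True`, that would be consistent with the failure above, though whether Glue 4.0 actually disables the UI is not established here.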
Describe the bug
The DataFlint tab in the Spark UI keeps showing "server disconnected" and then refreshes.
It also doesn't show Spark SQL queries (ongoing or completed) and keeps reporting that there are no Spark SQL queries.
Environment
spark version: 3.2
platform: standalone / Cloudera
To Reproduce
Steps to reproduce the behavior:
The tab keeps showing "server disconnected", then refreshes and shows the page for a few moments, and then shows "server disconnected" again.
Expected behavior
2 expectations,
Describe the bug
This is not a bug but an improvement request.
Environment
spark version: 3.2
platform: standalone
To Reproduce
Steps to reproduce the behavior:
Expected behavior
An option to turn off auto-refresh.
Screenshots
As you can see, the data keeps updating.