docker-dotnet-spark's People

Contributors

brunosaboia, indy-3rdman

docker-dotnet-spark's Issues

Azure Data Lake Store Support

Would you be interested in a PR that adds support for communicating with Azure Data Lake?

The changes I have made to enable it locally:

  • Switch the Spark package to the "without-hadoop" build
  • Download the full Hadoop distribution
  • Upgrade Hadoop to version 3.2.2
  • Add a couple of environment variables for Hadoop
  • Add the necessary Hadoop/Azure jar files to $SPARK_HOME/jars

I am specifically interested in having this in the runtime image, but I could see it being useful in the interactive image as well.
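The last step in the list above (staging the Azure connector jars) might look roughly like the following minimal shell sketch, for example inside a Dockerfile `RUN` instruction. The helper name `copy_azure_jars` and its glob patterns are assumptions for illustration. They do not come from the repository or the proposed PR, and the exact set of connector dependencies depends on the Hadoop version, so it should be checked against that release.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of docker-dotnet-spark): copy the Azure
# connector jars from a full Hadoop distribution into Spark's jars directory.
# Usage: copy_azure_jars <hadoop-jar-dir> <spark-jars-dir>
# Prints the number of jars copied.
copy_azure_jars() {
  local src="$1" dst="$2"
  local copied=0 jar
  mkdir -p "$dst"
  # hadoop-azure*.jar matches hadoop-azure and hadoop-azure-datalake;
  # azure-*.jar matches SDK dependencies such as azure-data-lake-store-sdk.
  for jar in "$src"/hadoop-azure*.jar "$src"/azure-*.jar; do
    [ -e "$jar" ] || continue  # an unmatched glob stays literal; skip it
    cp "$jar" "$dst"/
    copied=$((copied + 1))
  done
  echo "$copied"
}
```

For example, `copy_azure_jars "$HADOOP_HOME/share/hadoop/tools/lib" "$SPARK_HOME/jars"` would do the copy, assuming the Hadoop binary distribution keeps its optional tool jars under `share/hadoop/tools/lib`. For the "without-hadoop" Spark package, Spark's documentation on Hadoop-free builds sets `SPARK_DIST_CLASSPATH=$(hadoop classpath)` in `conf/spark-env.sh`. That is presumably one of the Hadoop environment variables mentioned in the list.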

Passing the --packages parameter in the interactive Docker image (Jupyter)

@indy-3rdman
I've run into a problem using the interactive Docker image (Jupyter).

To run my Spark program, I have to pass a parameter like this:
spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.6 ...

but I don't understand how to do that in Jupyter.

What should I do to solve this?

[2020-11-26T00:28:42.3520910Z] [c2435164f8aa] [Error] [JvmBridge] org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:733)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:248)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:232)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.api.dotnet.DotnetBackendHandler.handleMethodCall(DotnetBackendHandler.scala:159)
	at org.apache.spark.api.dotnet.DotnetBackendHandler.$anonfun$handleBackendRequest$2(DotnetBackendHandler.scala:99)
	at org.apache.spark.api.dotnet.ThreadPool$$anon$1.run(ThreadPool.scala:34)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
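One possible approach assumes that the notebook's .NET for Apache Spark session connects to a backend started inside the container with `spark-submit`. Spark's `spark.jars.packages` property is the configuration-file equivalent of `--packages`, and `spark-submit` reads it from `$SPARK_HOME/conf/spark-defaults.conf` at launch. The sketch below writes that property. The helper name `set_spark_packages` is hypothetical, and this sketch does not cover where the interactive image actually starts its backend. The Maven coordinates must also match the image's Spark and Scala versions. `spark-avro_2.11:2.4.6` fits Spark 2.4.6 built for Scala 2.11, whereas a Spark 3.0.1 image would need something like `org.apache.spark:spark-avro_2.12:3.0.1`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set spark.jars.packages in a spark-defaults.conf file,
# replacing any existing entry. spark-submit reads this file when it launches
# the JVM, so the setting only affects a backend started after this runs.
# Usage: set_spark_packages <spark-defaults.conf> <comma-separated Maven coordinates>
set_spark_packages() {
  local conf="$1" coords="$2"
  touch "$conf"
  # Keep every line except an existing spark.jars.packages entry
  # (separated from its value by whitespace or '=').
  # grep exits 1 when it prints nothing (e.g. an empty file); that is fine here.
  grep -v '^[[:space:]]*spark\.jars\.packages[[:space:]=]' "$conf" > "$conf.tmp" || true
  printf 'spark.jars.packages %s\n' "$coords" >> "$conf.tmp"
  mv "$conf.tmp" "$conf"
}
```

For example, run `set_spark_packages "$SPARK_HOME/conf/spark-defaults.conf" org.apache.spark:spark-avro_2.12:3.0.1`, then restart the backend (or the container). Packages are resolved when `spark-submit` launches the JVM, so setting the property from the notebook on an already running backend will most likely not add the jar to the driver's classpath. Adding `--packages ...` directly to the `spark-submit` command that starts the backend should have the same effect.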

Issue with some Microsoft.Spark.Sql methods when debugging from Visual Studio against Docker

Hello,
I'm getting an error while debugging from Visual Studio. The Docker container 3rdman/dotnet-spark:latest is up and running with the port mapping -p 5567:5567.

The sample code below fails at the last call, First(). Show() and Count() work fine, but First() fails with a connection error to 127.0.0.1:40687. Other methods have a similar issue.

Any idea what I've missed? Has anyone faced a similar problem?

Sample .NET code:

    dataFrameTest.Sort().Show();                             // works
    dataFrameTest.Count();                                   // works
    var firstRowFromDataFrameTest = dataFrameTest.First();   // fails: connection refused to 127.0.0.1:40687

Log from the .NET app in Visual Studio:

[Info] [ConfigurationService] Using port 5567 for connection.
[Info] [JvmBridge] JvMBridge port is 5567

System.Net.Internals.SocketExceptionFactory.ExtendedSocketException: 'No connection could be made because the target machine actively refused it. 127.0.0.1:40687'

Log from Docker:

INFO DotnetRunner: Starting DotnetBackend with .
INFO DotnetBackend: The number of DotnetBackend threads is set to 10.
INFO DotnetRunner: Port number used by DotnetBackend is 5050
INFO SparkContext: Running Spark version 3.0.1
...
* .NET Backend running debug mode. Press enter to exit *

...
INFO DAGScheduler: Job 0 finished: count at NativeMethodAccessorImpl.java:0, took 1.889003 s
Exception in thread "serve-DataFrame" java.net.SocketTimeoutException: Accept timed out
	at java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
	at java.net.ServerSocket.implAccept(ServerSocket.java:560)
	at java.net.ServerSocket.accept(ServerSocket.java:528)
	at org.apache.spark.security.SocketAuthServer$$anon$1.run(SocketAuthServer.scala:58)

It seems to fail inside the Microsoft.Spark.Sql.DataFrame class, where socket.Connect is called with the port and secret returned by GetConnectionInfo:

        private IEnumerable<Row> GetRows(string funcName, params object[] args)
        {
            (int port, string secret, _) = GetConnectionInfo(funcName, args);
            using ISocketWrapper socket = SocketFactory.CreateSocket();
            socket.Connect(IPAddress.Loopback, port, secret);
            foreach (Row row in new RowCollector().Collect(socket))
            {
                yield return row;
            }
        }
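The logs and the quoted `GetRows` suggest a possible explanation, though this is an inference rather than a confirmed diagnosis. Show() and Count() appear to get their results back over the bridge connection on port 5567, which is published. First() goes through `GetRows`, which asks the JVM to serve the rows on a separate, dynamically chosen port (40687 in the log) and then connects to it on `IPAddress.Loopback`. That port is not published with `-p`, so the host side is refused and the JVM's serving thread eventually logs "Accept timed out". The hypothetical helper `is_port_published` below is a minimal way to confirm this from the host by reading `docker port` output, whose lines look like `5567/tcp -> 0.0.0.0:5567`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a container port appears in `docker port`
# output read from stdin (lines look like "5567/tcp -> 0.0.0.0:5567").
# Usage: docker port <container> | is_port_published <port>
is_port_published() {
  local port="$1"
  # Anchor on "<port>/tcp ->" so that e.g. 556 does not match 5567.
  grep -q "^${port}/tcp ->"
}
```

For example, `docker port <container> | is_port_published 40687 || echo "40687 is not published"`. The serving port appears to be chosen per call, so publishing it in advance with `-p` is not practical. The check only shows where the connection fails.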
