
Comments (8)

skaarthik commented on May 10, 2024

PR #398


DaweiCai commented on May 10, 2024

Hi skaarthik,

When trying to run the streaming sample SparkClrHdfsWordCount with the following command:

.\sparkclr-submit.cmd --master local[*] --exe SparkClrHdfsWordCount.exe C:\Users\dacai\Mobius\examples\Streaming\HdfsWordCount\bin\Release C:\Users\dacai\Mobius\cp\ C:\Users\dacai\Mobius\dacaiTemp
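(Judging from the equivalent submit commands later in this thread, the three trailing arguments are, in order: the directory containing the sample exe, the checkpoint directory, and the input directory to watch.)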

I can't see the output of the following code in the cmd console:
wordCounts.ForeachRDD((time, rdd) =>
{
    Console.WriteLine("-------------------------------------------");
    Console.WriteLine("Time: {0}", time);
    Console.WriteLine("-------------------------------------------");

    // Print up to 10 records from this batch's RDD.
    object[] taken = rdd.Take(10);
    foreach (object record in taken)
    {
        Console.WriteLine(record);
    }
    Console.WriteLine();
});
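For context, wordCounts here is the tail of the sample's streaming pipeline. A minimal sketch of how it is built, assuming the usual Mobius streaming API (see the HdfsWordCount sample source for the authoritative version; checkpointDirectory and inputDirectory stand for the trailing command-line arguments):

var sparkContext = new SparkContext(new SparkConf().SetAppName("HdfsWordCount"));
var ssc = new StreamingContext(sparkContext, 30000L); // assumed 30-second batch interval
ssc.Checkpoint(checkpointDirectory);                  // e.g. C:\Users\dacai\Mobius\cp\

// Spark's file streams only pick up files that appear in the directory
// after the job starts; files already present at startup are ignored.
var lines = ssc.TextFileStream(inputDirectory);       // e.g. C:\Users\dacai\Mobius\dacaiTemp
var words = lines.FlatMap(line => line.Split(' '));
var wordCounts = words.Map(w => new KeyValuePair<string, int>(w, 1))
                      .ReduceByKey((x, y) => x + y);

If the sample relies on that default new-files-only behavior, output only appears when files are copied into the input directory while the job is running.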

But with the batch example (Mobius\examples\Batch\WordCount), I can see the output in the console.

Do you know why, or could you give me a clue on how to dig into this issue?

Thanks @skaarthik


hebinhuang commented on May 10, 2024

@kasuosuo
I do see the results in the console. Please make sure all pre-requisites are set correctly, the input directory (c:\users\dacai\Mobius\dacaiTemp) has files, and the checkpoint directory (c:\users\dacai\Mobius\cp) has not been created before you start the run.
(screenshot of the console output)
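A quick way to verify both preconditions before submitting is a check along these lines (a hypothetical helper, not part of the Mobius samples):

using System;
using System.IO;
using System.Linq;

static class PreFlight
{
    // Hypothetical helper mirroring the advice above: the input directory
    // must exist and contain files, and the checkpoint directory must not
    // be left over from an earlier run.
    public static void Check(string inputDir, string checkpointDir)
    {
        if (!Directory.Exists(inputDir) || !Directory.EnumerateFiles(inputDir).Any())
            Console.WriteLine("Input directory is missing or empty: " + inputDir);

        if (Directory.Exists(checkpointDir))
            Console.WriteLine("Stale checkpoint directory; delete it before a fresh run: " + checkpointDir);
    }
}

Usage would be PreFlight.Check(@"c:\users\dacai\Mobius\dacaiTemp", @"c:\users\dacai\Mobius\cp"); before calling sparkclr-submit.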


DaweiCai commented on May 10, 2024

Thanks for your comments, @hebinhuang.

Yes, I have checked:
1. c:\users\dacai\Mobius\dacaiTemp has files.
2. c:\users\dacai\Mobius\cp had not been created before starting the local job.
I still cannot see the output, and the console is stuck here:
(screenshot of the stuck console)

For the pre-requisites you mentioned, do you have a list?

Thanks


DaweiCai commented on May 10, 2024

And intermittently, I encountered the following error:
(screenshot of the error)


hebinhuang commented on May 10, 2024

Pre-Requisites:
https://github.com/Microsoft/Mobius/blob/master/notes/running-mobius-app.md#wordcount-example-batch

Can you pipe out the console text?


DaweiCai commented on May 10, 2024

Sure. I downloaded the release version and tested the streaming example again, and hit the same error.

PS C:\Windows\system32> sparkclr-submit.cmd --master local[4] --conf spark.local.dir=E:\sparktemp\wordcount\ --exe SparkClrHdfsWordCount.exe E:\Java\OpenSourceSoftware\spark-clr_2.10-1.6.100\examples\Streaming\HdfsWordCount E:\sparktemp\cp C:\Users\dacai\Mobius\dacaiTemp\ >E:\sparktemp\log.txt

log4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/13 14:48:58 INFO SparkContext: Running Spark version 1.6.1
16/05/13 14:48:58 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
16/05/13 14:48:58 INFO SecurityManager: Changing view acls to: dacai
16/05/13 14:48:58 INFO SecurityManager: Changing modify acls to: dacai
16/05/13 14:48:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(dacai); users with modify permissions: Set(dacai)
16/05/13 14:48:59 INFO Utils: Successfully started service 'sparkDriver' on port 44636.
16/05/13 14:49:00 INFO Slf4jLogger: Slf4jLogger started
16/05/13 14:49:00 INFO Remoting: Starting remoting
16/05/13 14:49:00 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.168.187.17:44649]
16/05/13 14:49:00 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 44649.
16/05/13 14:49:00 INFO SparkEnv: Registering MapOutputTracker
16/05/13 14:49:00 INFO SparkEnv: Registering BlockManagerMaster
16/05/13 14:49:01 INFO DiskBlockManager: Created local directory at E:\sparktemp\wordcount\blockmgr-49d2e80b-4de8-466c-97d4-db7077aee6bd
16/05/13 14:49:01 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/05/13 14:49:01 INFO SparkEnv: Registering OutputCommitCoordinator
16/05/13 14:49:01 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/05/13 14:49:01 INFO SparkUI: Started SparkUI at http://10.168.187.17:4040
16/05/13 14:49:01 INFO HttpFileServer: HTTP File server directory is E:\sparktemp\wordcount\spark-67034994-c974-4e95-8e5e-ce8b7a503aa8\httpd-eb5a5791-bfe5-428c-bf92-2398cd0c00a9
16/05/13 14:49:01 INFO HttpServer: Starting HTTP Server
16/05/13 14:49:01 INFO Utils: Successfully started service 'HTTP file server' on port 44656.
16/05/13 14:49:01 INFO SparkContext: Added JAR file:/E:/Java/OpenSourceSoftware/spark-clr_2.10-1.6.100/runtime/lib/spark-clr_2.10-1.6.100.jar at http://10.168.187.17:44656/jars/spark-clr_2.10-1.6.100.jar with timestamp 1463122141515
16/05/13 14:49:01 INFO Executor: Starting executor ID driver on host localhost
16/05/13 14:49:01 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44673.
16/05/13 14:49:01 INFO NettyBlockTransferService: Server created on 44673
16/05/13 14:49:01 INFO BlockManagerMaster: Trying to register BlockManager
16/05/13 14:49:01 INFO BlockManagerMasterEndpoint: Registering block manager localhost:44673 with 511.1 MB RAM, BlockManagerId(driver, localhost, 44673)
16/05/13 14:49:01 INFO BlockManagerMaster: Registered BlockManager
16/05/13 14:49:02 INFO FileInputDStream: Duration for remembering RDDs set to 60000 ms for org.apache.spark.streaming.dstream.FileInputDStream@76bba21e
16/05/13 14:49:02 INFO ForEachDStream: metadataCleanupDelay = -1
16/05/13 14:49:02 INFO CSharpDStream: metadataCleanupDelay = -1
16/05/13 14:49:02 INFO MappedDStream: metadataCleanupDelay = -1
16/05/13 14:49:02 INFO FileInputDStream: metadataCleanupDelay = -1
16/05/13 14:49:02 INFO FileInputDStream: Slide time = 30000 ms
16/05/13 14:49:02 INFO FileInputDStream: Storage level = StorageLevel(false, false, false, false, 1)
16/05/13 14:49:02 INFO FileInputDStream: Checkpoint interval = null
16/05/13 14:49:02 INFO FileInputDStream: Remember duration = 60000 ms
16/05/13 14:49:02 INFO FileInputDStream: Initialized and validated org.apache.spark.streaming.dstream.FileInputDStream@76bba21e
16/05/13 14:49:02 INFO MappedDStream: Slide time = 30000 ms
16/05/13 14:49:02 INFO MappedDStream: Storage level = StorageLevel(false, false, false, false, 1)
16/05/13 14:49:02 INFO MappedDStream: Checkpoint interval = null
16/05/13 14:49:02 INFO MappedDStream: Remember duration = 30000 ms
16/05/13 14:49:02 INFO MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@54c59c38
16/05/13 14:49:02 INFO CSharpDStream: Slide time = 30000 ms
16/05/13 14:49:02 INFO CSharpDStream: Storage level = StorageLevel(false, false, false, false, 1)
16/05/13 14:49:02 INFO CSharpDStream: Checkpoint interval = null
16/05/13 14:49:02 INFO CSharpDStream: Remember duration = 30000 ms
16/05/13 14:49:02 INFO CSharpDStream: Initialized and validated org.apache.spark.streaming.api.csharp.CSharpDStream@650c8b7c
16/05/13 14:49:02 INFO ForEachDStream: Slide time = 30000 ms
16/05/13 14:49:02 INFO ForEachDStream: Storage level = StorageLevel(false, false, false, false, 1)
16/05/13 14:49:02 INFO ForEachDStream: Checkpoint interval = null
16/05/13 14:49:02 INFO ForEachDStream: Remember duration = 30000 ms
16/05/13 14:49:02 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@74690abf
16/05/13 14:49:03 INFO RecurringTimer: Started timer for JobGenerator at time 1463122170000
16/05/13 14:49:03 INFO JobGenerator: Started JobGenerator at 1463122170000 ms
16/05/13 14:49:03 INFO JobScheduler: Started JobScheduler
16/05/13 14:49:03 INFO StreamingContext: StreamingContext started
16/05/13 14:49:30 INFO FileInputDStream: Finding new files took 11 ms
16/05/13 14:49:30 INFO FileInputDStream: New files at time 1463122170000 ms:

CSharp transform callback failed with java.net.SocketException: Connection reset
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:189)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.net.SocketInputStream.read(SocketInputStream.java:203)
at java.io.DataInputStream.readByte(DataInputStream.java:265)
at org.apache.spark.api.csharp.SerDe$.readObjectType(SerDe.scala:20)
at org.apache.spark.api.csharp.SerDe$.readObject(SerDe.scala:24)
at org.apache.spark.streaming.api.csharp.CSharpDStream$.callCSharpTransform(CSharpDStream.scala:59)
at org.apache.spark.streaming.api.csharp.CSharpDStream.compute(CSharpDStream.scala:105)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:346)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
at scala.Option.orElse(Option.scala:257)
at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:341)
at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:47)
at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:115)
at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:114)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:114)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:246)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:246)
at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:181)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
16/05/13 14:49:31 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
16/05/13 14:49:31 INFO JobGenerator: Stopping JobGenerator immediately
16/05/13 14:49:31 INFO RecurringTimer: Stopped timer for JobGenerator after time 1463122170000
16/05/13 14:49:31 INFO JobScheduler: No jobs added for time 1463122170000 ms
16/05/13 14:49:31 INFO CheckpointWriter: CheckpointWriter executor terminated ? true, waited for 0 ms.
16/05/13 14:49:31 INFO JobGenerator: Stopped JobGenerator
16/05/13 14:49:31 INFO JobScheduler: Stopped JobScheduler
16/05/13 14:49:31 INFO StreamingContext: StreamingContext stopped successfully
16/05/13 14:49:31 INFO SparkContext: Invoking stop() from shutdown hook
16/05/13 14:49:31 INFO SparkUI: Stopped Spark web UI at http://10.168.187.17:4040
16/05/13 14:49:31 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/05/13 14:49:31 INFO MemoryStore: MemoryStore cleared
16/05/13 14:49:31 INFO BlockManager: BlockManager stopped
16/05/13 14:49:31 INFO BlockManagerMaster: BlockManagerMaster stopped
16/05/13 14:49:31 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/05/13 14:49:31 INFO SparkContext: Successfully stopped SparkContext
16/05/13 14:49:31 INFO ShutdownHookManager: Shutdown hook called
16/05/13 14:49:31 INFO ShutdownHookManager: Deleting directory E:\sparktemp\wordcount\spark-67034994-c974-4e95-8e5e-ce8b7a503aa8\httpd-eb5a5791-bfe5-428c-bf92-2398cd0c00a9
16/05/13 14:49:31 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/05/13 14:49:31 INFO ShutdownHookManager: Deleting directory E:\sparktemp\wordcount\spark-67034994-c974-4e95-8e5e-ce8b7a503aa8
16/05/13 14:49:31 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

The log file E:\sparktemp\log.txt shows:
SPARKCLR_JAR=spark-clr_2.10-1.6.100.jar
[sparkclr-submit.cmd] Command to run --master local[4] --conf spark.local.dir=E:\sparktemp\wordcount\ --name SparkClrHdfsWordCount --class org.apache.spark.deploy.csharp.CSharpRunner E:\Java\OpenSourceSoftware\spark-clr_2.10-1.6.100\runtime\lib\spark-clr_2.10-1.6.100.jar E:\Java\OpenSourceSoftware\spark-clr_2.10-1.6.100\examples\Streaming\HdfsWordCount E:\Java\OpenSourceSoftware\spark-clr_2.10-1.6.100\examples\Streaming\HdfsWordCount\SparkClrHdfsWordCount.exe E:\sparktemp\cp\ C:\Users\dacai\Mobius\dacaiTemp
[CSharpRunner.main] Starting CSharpBackend!
[CSharpRunner.main] Port number used by CSharpBackend is 44600
[CSharpRunner.main] adding key=spark.jars and value=file:/E:/Java/OpenSourceSoftware/spark-clr_2.10-1.6.100/runtime/lib/spark-clr_2.10-1.6.100.jar to environment
[CSharpRunner.main] adding key=spark.app.name and value=SparkClrHdfsWordCount to environment
[CSharpRunner.main] adding key=spark.submit.deployMode and value=client to environment
[CSharpRunner.main] adding key=spark.master and value=local[4] to environment
[CSharpRunner.main] adding key=spark.local.dir and value=E:\sparktemp\wordcount\ to environment
[2016-05-13T06:48:54.9589864Z] [DAWEICAI-Z420] [Info] [ConfigurationService] ConfigurationService runMode is LOCAL
[2016-05-13T06:48:54.9609888Z] [DAWEICAI-Z420] [Info] [SparkCLRConfiguration] CSharpBackend successfully read from environment variable CSHARPBACKEND_PORT
[2016-05-13T06:48:54.9609888Z] [DAWEICAI-Z420] [Info] [SparkCLRIpcProxy] CSharpBackend port number to be used in JvMBridge is 44600
[2016-05-13T06:48:58.6821475Z] [DAWEICAI-Z420] [Info] [SparkConf] Spark app name set to HdfsWordCount
[2016-05-13T06:49:02.6525224Z] [DAWEICAI-Z420] [Info] [StreamingContextIpcProxy] Callback server port number is 44675
[CSharpBackendHandler] Connecting to a callback server at port 44675
[2016-05-13T06:49:30.0754966Z] [DAWEICAI-Z420] [Debug] [StreamingContextIpcProxy] New thread (id=5) created to process callback request
[2016-05-13T06:49:30.0924992Z] [DAWEICAI-Z420] [Error] [StreamingContextIpcProxy] Exception processing call back request. Thread id 5
[2016-05-13T06:49:30.0995051Z] [DAWEICAI-Z420] [Exception] [StreamingContextIpcProxy] Exception of type 'System.OutOfMemoryException' was thrown.
at Microsoft.Spark.CSharp.Interop.Ipc.SerDe.ReadBytes(Stream s, Int32 length) in c:\projects\sparkclr\csharp\Adapter\Microsoft.Spark.CSharp\Interop\Ipc\SerDe.cs:line 212
at Microsoft.Spark.CSharp.Interop.Ipc.SerDe.ReadBytes(Stream s) in c:\projects\sparkclr\csharp\Adapter\Microsoft.Spark.CSharp\Interop\Ipc\SerDe.cs:line 252
at Microsoft.Spark.CSharp.Proxy.Ipc.StreamingContextIpcProxy.ProcessCallbackRequest(Object socket) in c:\projects\sparkclr\csharp\Adapter\Microsoft.Spark.CSharp\Proxy\Ipc\StreamingContextIpcProxy.cs:line 284
[2016-05-13T06:49:30.0995051Z] [DAWEICAI-Z420] [Error] [StreamingContextIpcProxy] ProcessCallbackRequest fail, will exit ...
[CSharpRunner.main] closing CSharpBackend
Requesting to close all call back sockets.
[CSharpRunner.main] Return CSharpBackend code 1
Utils.exit() with status: 1, maxDelayMillis: 1000


hebinhuang commented on May 10, 2024

@kasuosuo Please try the command below:
sparkclr-submit.cmd --master local[*] --exe SparkClrHdfsWordCount.exe E:\Java\OpenSourceSoftware\spark-clr_2.10-1.6.100\examples\Streaming\HdfsWordCount E:\sparktemp\cp C:\Users\dacai\Mobius\dacaiTemp 1> 1.txt 2>2.txt
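With that redirection, 1.txt captures standard output (anything the ForeachRDD callback prints via Console.WriteLine) and 2.txt captures standard error (the Spark and log4j logging), which makes it easier to tell whether the C# side is producing any output at all.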

