zuinnote / hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
License: Apache License 2.0
Hello and thank you for creating this very useful open source project!
I started using it yesterday and ran into a couple of issues; I am hoping you can point me in the right direction. I'm including an example app here for reproducing the problems.
My setup:
My Geth node is synced to the mainnet and, following the advice here, I created multiple files holding the exported blockchain (up to block number 5M).
Here's a snippet of the directory listing; the file names use a slightly different nomenclature than the example, but were still produced with geth export:
➜ eth-blockchain ls -lah
total 25G
drwxrwxr-x 2 sancho sancho 4.0K Feb 13 16:22 .
drwxr-xr-x 43 sancho sancho 4.0K Feb 13 16:22 ..
-rwxrwxr-x 1 sancho sancho 126M Feb 13 03:01 blocks-0-200000
-rwxrwxr-x 1 sancho sancho 240M Feb 13 03:32 blocks-1000001-1200000
-rwxrwxr-x 1 sancho sancho 274M Feb 13 03:33 blocks-1200001-1400000
-rwxrwxr-x 1 sancho sancho 299M Feb 13 03:33 blocks-1400001-1600000
-rwxrwxr-x 1 sancho sancho 307M Feb 13 03:40 blocks-1600001-1800000
-rwxrwxr-x 1 sancho sancho 290M Feb 13 03:40 blocks-1800001-2000000
-rwxrwxr-x 1 sancho sancho 301M Feb 13 03:41 blocks-2000001-2200000
-rwxrwxr-x 1 sancho sancho 148M Feb 13 03:02 blocks-200001-400000
-rwxrwxr-x 1 sancho sancho 332M Feb 13 03:41 blocks-2200001-2400000
...
The code for the app I'm using:
package analytics
import collection.JavaConverters._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.BytesWritable
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.zuinnote.hadoop.ethereum.format.common.EthereumBlock
import org.zuinnote.hadoop.ethereum.format.mapreduce.EthereumBlockFileInputFormat
object TestApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .setAppName("test-app")
      .setMaster("local[*]")
    val spark = SparkSession.builder
      .config(sparkConf)
      .getOrCreate()
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoopcryptoledeger.ethereumblockinputformat.usedirectbuffer", "false")
    val rdd = spark.sparkContext.newAPIHadoopFile(
      "/home/sancho/eth-blockchain",
      classOf[EthereumBlockFileInputFormat], classOf[BytesWritable], classOf[EthereumBlock], hadoopConf
    )
    println("Number of transactions with negative gas price: " + rdd
      .flatMap(_._2.getEthereumTransactions.asScala)
      .filter(_.getGasPrice < 0)
      .count()
    )
    println("Number of transactions with negative gas limit: " + rdd
      .flatMap(_._2.getEthereumTransactions.asScala)
      .filter(_.getGasLimit < 0)
      .count()
    )
    val blockNumber = 4800251
    println(s"Number of transactions with negative gas price in block $blockNumber: " + rdd
      .filter(_._2.getEthereumBlockHeader.getNumber == blockNumber)
      .flatMap(_._2.getEthereumTransactions.asScala)
      .filter(_.getGasPrice < 0)
      .count()
    )
    println(s"Number of transactions with negative gas limit in block $blockNumber: " + rdd
      .filter(_._2.getEthereumBlockHeader.getNumber == blockNumber)
      .flatMap(_._2.getEthereumTransactions.asScala)
      .filter(_.getGasLimit < 0)
      .count()
    )
  }
}
This is the build.sbt file:
lazy val commonSettings = Seq(
  scalaVersion := "2.11.7",
  test in assembly := {}
)

lazy val ethBlockchainAnalytics = (project in file(".")).
  settings(commonSettings).
  settings(
    name := "EthBlockchainAnalytics",
    version := "0.1",
    libraryDependencies ++= Seq(
      "com.github.zuinnote" %% "spark-hadoopcryptoledger-ds" % "1.1.2",
      "org.apache.spark" %% "spark-core" % "2.2.1" % "provided",
      "org.apache.spark" %% "spark-sql" % "2.2.1" % "provided"),
    assemblyJarName in assembly := s"${name.value}_${scalaBinaryVersion.value}-${version.value}.jar",
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
      case PathList("org", "apache", xs @ _*) => MergeStrategy.last
      case x =>
        val oldStrategy = (assemblyMergeStrategy in assembly).value
        oldStrategy(x)
    }
  )
The launcher script I'm using:
#!/bin/sh
JAR=$1
/usr/local/lib/spark/bin/spark-submit \
--class analytics.TestApp \
--driver-memory 20G \
$JAR
And finally the command I'm using to run it:
➜ EthBlockchainAnalytics src/main/resources/launcher.sh /home/sancho/IdeaProjects/EthBlockchainAnalytics/target/scala-2.11/EthBlockchainAnalytics_2.11-0.1.jar
The output of the above application when run like this is:
Number of transactions with negative gas price: 8732263
Number of transactions with negative gas limit: 25699923
Number of transactions with negative gas price in block 4800251: 2
Number of transactions with negative gas limit in block 4800251: 8
As a quick sanity check, I ran the following:
➜ ~ geth attach
Welcome to the Geth JavaScript console!
instance: Geth/v1.7.3-stable-4bb3c89d/linux-amd64/go1.9
modules: admin:1.0 debug:1.0 eth:1.0 miner:1.0 net:1.0 personal:1.0 rpc:1.0 txpool:1.0 web3:1.0
> var txs = eth.getBlock(4800251).transactions
undefined
> for (var i=0; i<txs.length; i++) { if (eth.getTransaction(txs[i]).gasPrice < 0) console.log(txs[i]) }
undefined
Any idea why I'm seeing so many negative gas prices and gas limits when using Spark?
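For what it's worth, one plausible mechanism for numbers like these (a sketch of general signed-overflow behaviour in Java, not confirmed against this library's parser): gas price and gas limit are unsigned, variable-length quantities on the wire, but once an 8-byte value with the most significant bit set is stored in a signed Java long, it prints as negative.

```java
import java.math.BigInteger;

public class SignedOverflowDemo {
    public static void main(String[] args) {
        // 8 raw bytes with the most significant bit set (here: 2^63)
        byte[] raw = {(byte) 0x80, 0, 0, 0, 0, 0, 0, 0};

        // Interpreted as a signed two's-complement long, the value wraps negative
        long asSignedLong = new BigInteger(raw).longValue();
        System.out.println(asSignedLong);   // -9223372036854775808

        // Interpreted as an unsigned magnitude, it stays positive
        BigInteger asUnsigned = new BigInteger(1, raw);
        System.out.println(asUnsigned);     // 9223372036854775808
    }
}
```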
Create a Flink Table Source to facilitate working with the Flink Table API (cf. https://flink.apache.org/news/2017/03/29/table-sql-api-update.html)
ERC20 tokens are standardized currencies on the Ethereum blockchain (cf. https://github.com/ethereum/EIPs/blob/master/EIPS/eip-20.md). Although one could parse them directly without any issues in any Big Data application, it could make sense to make them part of the library, since they have many applications.
Examples, unit and integration tests need to be provided.
Create web scraper to fetch currency exchange rates for currencies based on cryptoledgers.
This should support multiple sources in a generic way and the data should be made available together with cryptoledger data provided already by the HadoopCryptoLedger library.
First we need to work on a design, an implementation, and unit & integration tests (the latter probably on an embedded Jetty and/or Tomcat).
Hadoop 3.1 has been released (cf. https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0.0+release).
We need to investigate its impact (especially for the mapred.* and mapreduce.* APIs and the JDK8-only support) and implement some test cases.
We also need to take into account the dependence of other Big Data platforms on Hadoop (most notably Hive and Spark, and to a lesser extent Flink).
Ethereum is a decentralized platform that runs smart contracts: applications that run exactly as programmed without any possibility of downtime, censorship, fraud or third party interference (https://www.ethereum.org/).
This implies adding additional HadoopInputFormats to process the Ethereum blockchain. Furthermore, a HiveSerde and a Spark data source should be created. Unit tests must be included.
Finally, an example for MapReduce, Hive and Spark should be provided. The examples should include unit and integration tests.
Ethereum classic is out of scope.
We are running into exceptions when attempting to do graph analysis using GraphFrames after creating a DataFrame using spark-hadoopcryptoledger-ds. It seems there are block sizes bigger than what can fit inside an int?
java.lang.InterruptedException: org.zuinnote.hadoop.ethereum.format.exception.EthereumBlockReadException: Error: This block size cannot be handled currently (larger then largest number in positive signed int)
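To illustrate the int limitation the exception refers to (a hypothetical sketch, not the library's actual reader code): a 4-byte size field read into Java's signed int wraps negative past Integer.MAX_VALUE, whereas widening it as unsigned into a long keeps the real value.

```java
public class UnsignedSizeDemo {
    public static void main(String[] args) {
        // A size of 0x90000000 bytes (~2.25 GiB) exceeds Integer.MAX_VALUE
        int rawSize = 0x90000000;
        System.out.println(rawSize);                          // -1879048192 (wrapped)
        System.out.println(Integer.toUnsignedLong(rawSize));  // 2415919104 (real value)
    }
}
```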
Although segwit blocks can be processed with the current hadoopcryptoledger library, dedicated support makes it easier to use.
Add unit tests for segwit2x compatible blocks:
https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki
fix issues if needed.
Increase the default max block size according to the formula (although the current default of 2M should be sufficient to process the new segwit blocks). Set the default max block size so that block weight <= 4,000,000.
BitcoinScriptpatternparser:
Add parsing for the new witness transaction types:
pay-to-witness-public-key-hash
pay-to-witness-script-hash
support compressed p2hash p2pub addresses
BitcoinUtil:
calculate segwit transaction hash
calculate transaction weight
calculate virtual transaction size
calculate base transaction size
calculate total transaction size
calculate block weight
calculate block base size
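The weight-related helpers above follow directly from BIP-141's definitions. A minimal sketch of the formulas (method names and the example sizes are illustrative, not the actual BitcoinUtil API): weight = 3 * base size + total size, and virtual size = weight / 4, rounded up.

```java
public class SegwitWeightSketch {
    // BIP-141: weight = 3 * base size + total size (all sizes in bytes)
    static long weight(long baseSize, long totalSize) {
        return 3 * baseSize + totalSize;
    }

    // BIP-141: virtual size = weight / 4, rounded up
    static long virtualSize(long weight) {
        return (weight + 3) / 4;
    }

    public static void main(String[] args) {
        // Hypothetical block: 900,000 bytes without witness data,
        // 1,600,000 bytes including it
        long w = weight(900_000, 1_600_000);
        System.out.println(w);               // 4300000 -> above the 4,000,000 limit
        System.out.println(virtualSize(w));  // 1075000
    }
}
```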
Slimcoin is, according to its website, "the first known cryptocurrency designed as a combined proof-of-burn/proof-of-stake/proof-of-work system".
Since it has proof-of-burn, it might be valuable to analyse this one as well to evaluate the effects of proof-of-burn.
There was a bug in the encoding and thus the calculation of transaction hash was wrong in some cases.
Fixed in 1.1.4
Monero is a secure, private, untraceable currency. It is open source and freely available to all (https://getmonero.org/home).
This implies adding additional HadoopInputFormats to process the Monero blockchain. Furthermore, a HiveSerde, a Flink data source and a Spark data source should be created. Unit tests must be included.
Finally, an example for MapReduce, Hive, Flink and Spark should be provided. The examples should include unit and integration tests.
At the moment BitcoinScriptPatternParser returns the key hash which can be used to identify the unique address of a transaction and also to search it in blockchain explorers.
Bitcoin also has the notion of Bitcoin addresses, which exist primarily to improve the user experience. The conversion of such addresses is described in https://github.com/bitcoin/bips/blob/master/bip-0013.mediawiki
The idea of this issue is to create a function BitcoinScriptPatternParser.getBitcoinAddress(String) that converts the output of BitcoinScriptPatternParser.getPaymentDestination to a Bitcoin address (if possible). Additionally, we create Hive UDFs. All new methods need to be unit tested and the documentation needs to be updated.
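For reference, the BIP-13 conversion itself is: prepend a version byte to the key hash, append the first four bytes of the double-SHA256 checksum, and Base58-encode. A self-contained sketch (not the proposed getBitcoinAddress implementation; class and method names are illustrative), using the well-known test vector from the Bitcoin wiki:

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.Arrays;

public class Base58CheckSketch {
    static final String ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        return out;
    }

    // version byte + payload + first 4 bytes of double-SHA256, Base58-encoded
    static String toBase58Check(byte version, byte[] payload) {
        try {
            byte[] data = new byte[payload.length + 5];
            data[0] = version;
            System.arraycopy(payload, 0, data, 1, payload.length);
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] checksum = sha.digest(sha.digest(Arrays.copyOf(data, payload.length + 1)));
            System.arraycopy(checksum, 0, data, payload.length + 1, 4);

            StringBuilder sb = new StringBuilder();
            BigInteger n = new BigInteger(1, data);
            while (n.signum() > 0) {
                BigInteger[] qr = n.divideAndRemainder(BigInteger.valueOf(58));
                sb.append(ALPHABET.charAt(qr[1].intValue()));
                n = qr[0];
            }
            // each leading zero byte becomes a leading '1'
            for (byte b : data) { if (b == 0) sb.append('1'); else break; }
            return sb.reverse().toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Well-known test vector from the Bitcoin wiki (P2PKH, version 0x00)
        byte[] keyHash = hexToBytes("010966776006953D5567439E5E39F86A0F73ECA8");
        System.out.println(toBase58Check((byte) 0x00, keyHash));
        // 16UwLL9Risc3QfPqBUvKofHmBQ7wMtjvM
    }
}
```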
Hive 2.x has been out for some time now, so we can drop Hive 1.2.x support and introduce Hive 2.x support.
Hyperledger provides various frameworks for developing cryptoledgers: Fabric, Iroha, Sawtooth.
The idea of this issue is to investigate how they can be included in hadoopcryptoledger with unit tests, provide examples on how to analyse those ledgers, and write integration tests for them.
My understanding is that the value field of an Ethereum transaction can be up to 32 bytes long; however, we store it as a long after parsing the RLP-encoded data. It seems we store 0L in case the data is longer than 8 bytes, and no warning or error is thrown when transaction values are discarded / replaced by zeros. Is this the case?
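To illustrate why 8 bytes is not enough (a sketch; the amounts chosen are illustrative): even a mid-sized transfer denominated in wei exceeds the 63 bits of magnitude a signed long offers, so the value would need to be kept as a BigInteger.

```java
import java.math.BigInteger;

public class ValueFieldSketch {
    public static void main(String[] args) {
        // A transfer of 10,000 ether, denominated in wei (10^18 wei per ether)
        BigInteger wei = BigInteger.valueOf(10_000).multiply(BigInteger.TEN.pow(18));

        System.out.println(wei);                      // 10000000000000000000000
        System.out.println(wei.bitLength());          // 74 -> more than a signed long's 63 bits
        System.out.println(wei.toByteArray().length); // 10 bytes, i.e. longer than 8
    }
}
```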
Investigate support and create examples plus unit tests for using hadoopcryptoledger with Apache Beam (https://beam.apache.org/).
Apache Beam supports writing Big Data jobs once and running them on multiple platforms (e.g. Flink, Spark, Apex, Google Cloud Dataflow...).
Presto is a distributed SQL query engine for Big Data supporting a variety of sources. The idea is to add the hadoopcryptoledger library as a source to directly query blockchains, such as Bitcoin & altcoins or Ethereum & altcoins.
Currently, the hadoopcryptoledger library can be used indirectly with Presto via the Hive connector.
Relevant documentation:
https://prestodb.io/docs/current/develop/connectors.html
@jornfranke
I have run the example according to the wiki:
https://github.com/ZuInnoTe/hadoopcryptoledger/wiki/Using-Spark-Scala-Graphx-to-analyze-the-Bitcoin-transaction-graph
It ran for over 4 hours, and then the issue below occurred.
I ran it again for about 4 hours, and the issue still happened.
It seems that there is not enough memory, but my Docker container has plenty of memory.
[root@quickstart scala-spark-graphx-bitcointransaction]# df -h
Filesystem Size Used Avail Use% Mounted on
overlay 2.0T 1.2T 797G 60% /
tmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/xvda1 2.0T 1.2T 797G 60% /etc/resolv.conf
/dev/xvda1 2.0T 1.2T 797G 60% /etc/hostname
/dev/xvda1 2.0T 1.2T 797G 60% /etc/hosts
shm 64M 0 64M 0% /dev/shm
[root@quickstart scala-spark-graphx-bitcointransaction]# free
total used free shared buffers cached
Mem: 65960180 25773772 40186408 686156 102472 14819768
-/+ buffers/cache: 10851532 55108648
Swap: 0 0 0
[root@quickstart scala-spark-graphx-bitcointransaction]# free -h
total used free shared buffers cached
Mem: 62G 24G 38G 670M 100M 14G
-/+ buffers/cache: 10G 52G
Swap: 0B 0B 0B
How can I deal with this problem? Thanks very much.
18/06/08 06:17:35 INFO collection.ExternalSorter: Thread 913 spilling in-memory map of 24.6 MB to disk (9 times so far)
18/06/08 06:17:35 INFO collection.ExternalSorter: Thread 947 spilling in-memory map of 25.0 MB to disk (28 times so far)
18/06/08 06:17:55 INFO collection.ExternalAppendOnlyMap: Thread 922 spilling in-memory map of 27.4 MB to disk (3 times so far)
18/06/08 06:18:02 INFO collection.ExternalSorter: Thread 947 spilling in-memory map of 24.7 MB to disk (29 times so far)
18/06/08 06:18:04 INFO collection.ExternalSorter: Thread 950 spilling in-memory map of 24.7 MB to disk (15 times so far)
18/06/08 06:18:09 INFO collection.ExternalSorter: Thread 909 spilling in-memory map of 24.8 MB to disk (18 times so far)
18/06/08 06:18:25 INFO collection.ExternalSorter: Thread 913 spilling in-memory map of 24.7 MB to disk (10 times so far)
18/06/08 06:18:26 WARN memory.TaskMemoryManager: leak 18.3 MB memory from org.apache.spark.util.collection.ExternalSorter@12719270
18/06/08 06:18:26 ERROR executor.Executor: Managed memory leak detected; size = 19172698 bytes, TID = 7855
18/06/08 06:18:26 ERROR executor.Executor: Exception in task 168.0 in stage 8.0 (TID 7855)
java.lang.OutOfMemoryError: Java heap space
at java.io.ObjectInputStream$HandleTable.grow(ObjectInputStream.java:3464)
at java.io.ObjectInputStream$HandleTable.assign(ObjectInputStream.java:3271)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1789)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:171)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.readNextItem(ExternalAppendOnlyMap.scala:515)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.hasNext(ExternalAppendOnlyMap.scala:535)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.org$apache$spark$util$collection$ExternalAppendOnlyMap$ExternalIterator$$readNextHashCode(ExternalAppendOnlyMap.scala:335)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:408)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:406)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:406)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:301)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
18/06/08 06:18:26 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-67,5,main]
java.lang.OutOfMemoryError: Java heap space
	[same stack trace as above]
18/06/08 06:18:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/06/08 06:18:26 INFO scheduler.TaskSetManager: Starting task 175.0 in stage 8.0 (TID 7862, localhost, executor driver, partition 175, PROCESS_LOCAL, 2049 bytes)
18/06/08 06:18:26 INFO executor.Executor: Running task 175.0 in stage 8.0 (TID 7862)
18/06/08 06:18:26 WARN scheduler.TaskSetManager: Lost task 168.0 in stage 8.0 (TID 7855, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
	[same stack trace as above]
18/06/08 06:18:26 ERROR scheduler.TaskSetManager: Task 168 in stage 8.0 failed 1 times; aborting job
18/06/08 06:18:26 INFO storage.ShuffleBlockFetcherIterator: Getting 1280 non-empty blocks out of 1280 blocks
18/06/08 06:18:26 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Cancelling stage 12
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks have all completed, from pool
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Stage 12 was cancelled
18/06/08 06:18:26 INFO scheduler.DAGScheduler: ShuffleMapStage 12 (map at SparkScalaBitcoinTransactionGraph.scala:89) failed in 18010.838 s due to Job aborted due to stage failure: Task 168 in stage 8.0 failed 1 times, most recent failure: Lost task 168.0 in stage 8.0 (TID 7855, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
	[same stack trace as above]
Driver stacktrace:
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Cancelling stage 9
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Stage 9 was cancelled
18/06/08 06:18:26 INFO scheduler.DAGScheduler: ShuffleMapStage 9 (map at SparkScalaBitcoinTransactionGraph.scala:81) failed in 9262.181 s due to Job aborted due to stage failure: Task 168 in stage 8.0 failed 1 times, most recent failure: Lost task 168.0 in stage 8.0 (TID 7855, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
	[same stack trace as above]
Driver stacktrace:
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Cancelling stage 8
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Stage 8 was cancelled
18/06/08 06:18:26 INFO scheduler.DAGScheduler: ShuffleMapStage 8 (map at SparkScalaBitcoinTransactionGraph.scala:85) failed in 1948.098 s due to Job aborted due to stage failure: Task 168 in stage 8.0 failed 1 times, most recent failure: Lost task 168.0 in stage 8.0 (TID 7855, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
	[same stack trace as above]
Driver stacktrace:
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 172.0 in stage 8.0 (TID 7859)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 169.0 in stage 8.0 (TID 7856)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 173.0 in stage 8.0 (TID 7860)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 170.0 in stage 8.0 (TID 7857)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 167.0 in stage 8.0 (TID 7854)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 174.0 in stage 8.0 (TID 7861)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 171.0 in stage 8.0 (TID 7858)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 175.0 in stage 8.0 (TID 7862)
18/06/08 06:18:26 INFO scheduler.DAGScheduler: Job 1 failed: top at SparkScalaBitcoinTransactionGraph.scala:100, took 18011.687877 s
18/06/08 06:18:26 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1280 blocks
18/06/08 06:18:26 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 168 in stage 8.0 failed 1 times, most recent failure: Lost task 168.0 in stage 8.0 (TID 7855, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
at java.io.ObjectInputStream$HandleTable.grow(ObjectInputStream.java:3464)
at java.io.ObjectInputStream$HandleTable.assign(ObjectInputStream.java:3271)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1789)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:171)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.readNextItem(ExternalAppendOnlyMap.scala:515)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.hasNext(ExternalAppendOnlyMap.scala:535)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.org$apache$spark$util$collection$ExternalAppendOnlyMap$ExternalIterator$$readNextHashCode(ExternalAppendOnlyMap.scala:335)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:408)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:406)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:406)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:301)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1457)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1445)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1444)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1444)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1668)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1627)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1616)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1862)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1982)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1.apply(RDD.scala:1397)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1384)
at org.apache.spark.rdd.RDD$$anonfun$top$1.apply(RDD.scala:1365)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.top(RDD.scala:1364)
at org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph$.jobTop5AddressInput(SparkScalaBitcoinTransactionGraph.scala:100)
at org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph$.main(SparkScalaBitcoinTransactionGraph.scala:53)
at org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph.main(SparkScalaBitcoinTransactionGraph.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.io.ObjectInputStream$HandleTable.grow(ObjectInputStream.java:3464)
at java.io.ObjectInputStream$HandleTable.assign(ObjectInputStream.java:3271)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1789)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:171)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.readNextItem(ExternalAppendOnlyMap.scala:515)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.hasNext(ExternalAppendOnlyMap.scala:535)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.org$apache$spark$util$collection$ExternalAppendOnlyMap$ExternalIterator$$readNextHashCode(ExternalAppendOnlyMap.scala:335)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:408)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:406)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:406)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:301)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
18/06/08 06:18:26 INFO executor.Executor: Executor killed task 175.0 in stage 8.0 (TID 7862)
18/06/08 06:18:26 WARN scheduler.TaskSetManager: Lost task 175.0 in stage 8.0 (TID 7862, localhost, executor driver): TaskKilled (killed intentionally)
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
18/06/08 06:18:26 INFO ui.SparkUI: Stopped Spark web UI at http://172.17.0.2:4040
18/06/08 06:18:26 INFO executor.Executor: Executor killed task 170.0 in stage 8.0 (TID 7857)
18/06/08 06:18:26 WARN scheduler.TaskSetManager: Lost task 170.0 in stage 8.0 (TID 7857, localhost, executor driver): TaskKilled (killed intentionally)
18/06/08 06:18:26 INFO executor.Executor: Executor killed task 172.0 in stage 8.0 (TID 7859)
18/06/08 06:18:26 ERROR scheduler.TaskSchedulerImpl: Exception in statusUpdate
java.util.concurrent.RejectedExecutionException: Task org.apache.spark.scheduler.TaskResultGetter$$anon$3@3eec2833 rejected from java.util.concurrent.ThreadPoolExecutor@67fd2e27[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 7857]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at org.apache.spark.scheduler.TaskResultGetter.enqueueFailedTask(TaskResultGetter.scala:104)
at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:387)
at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:361)
at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalBackend.scala:66)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:217)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
18/06/08 06:18:26 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/06/08 06:18:26 INFO executor.Executor: Executor killed task 171.0 in stage 8.0 (TID 7858)
18/06/08 06:18:26 INFO executor.Executor: Executor killed task 173.0 in stage 8.0 (TID 7860)
18/06/08 06:18:26 INFO executor.Executor: Executor killed task 174.0 in stage 8.0 (TID 7861)
18/06/08 06:18:27 ERROR executor.Executor: Exception in task 169.0 in stage 8.0 (TID 7856)
java.lang.IllegalArgumentException: requirement failed: File segment length cannot be negative (got -16024861)
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.storage.FileSegment.<init>(FileSegment.scala:28)
at org.apache.spark.storage.DiskBlockObjectWriter.fileSegment(DiskBlockObjectWriter.scala:236)
at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:729)
at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:721)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:721)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
18/06/08 06:18:27 INFO executor.Executor: Not reporting error to driver during JVM shutdown.
18/06/08 06:18:27 ERROR executor.Executor: Exception in task 167.0 in stage 8.0 (TID 7854)
java.io.FileNotFoundException: /tmp/blockmgr-3d6f28c7-9d39-4c84-ad36-cd4864d0474f/1a/temp_shuffle_5b3ac818-3769-4488-8a4a-1bd053c342d6 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.apache.spark.util.collection.ExternalSorter$SpillReader.nextBatchStream(ExternalSorter.scala:526)
at org.apache.spark.util.collection.ExternalSorter$SpillReader.org$apache$spark$util$collection$ExternalSorter$SpillReader$$readNextItem(ExternalSorter.scala:578)
at org.apache.spark.util.collection.ExternalSorter$SpillReader$$anon$5.hasNext(ExternalSorter.scala:601)
at scala.collection.TraversableOnce$FlattenOps$$anon$1.hasNext(TraversableOnce.scala:432)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:725)
at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:721)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:721)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
18/06/08 06:18:27 INFO executor.Executor: Not reporting error to driver during JVM shutdown.
18/06/08 06:18:33 INFO storage.MemoryStore: MemoryStore cleared
18/06/08 06:18:33 INFO storage.BlockManager: BlockManager stopped
18/06/08 06:18:33 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/06/08 06:18:33 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/06/08 06:18:33 INFO spark.SparkContext: Successfully stopped SparkContext
18/06/08 06:18:33 INFO util.ShutdownHookManager: Shutdown hook called
18/06/08 06:18:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-7aa73177-a289-47ce-ab1a-f2c3432aa428
The Ethereum Name Service (ENS) allows registration of Internet addresses on the Ethereum blockchain. In some sense, it is similar to Namecoin for the Bitcoin blockchain, but based on the smart contract VM.
The idea here is to add additional utility functionality and Hive UDFs to the library.
Of course, everything is supplied with an example and unit tests/integration tests.
The problem resides in EthereumUtil.java:294, where a 2-byte number is assumed to be signed.
The first such block is 403419 (which I have attached), with the listHeader of F98B1FF9021AA008741F.
403419.zip
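For illustration, the difference between the two interpretations can be reproduced in plain Java. This is a sketch of the pitfall only, not the actual EthereumUtil code; the bytes used are the first two bytes of the listHeader above.

```java
// Illustration of the signed vs. unsigned pitfall: reading two bytes as a
// Java short yields a negative value once the high bit is set.
public class TwoByteRead {
    // Naive (signed) interpretation: 0xF98B becomes a negative short.
    static int readSigned(byte hi, byte lo) {
        return (short) (((hi & 0xFF) << 8) | (lo & 0xFF));
    }
    // Unsigned interpretation: masking keeps the value in [0, 65535].
    static int readUnsigned(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }
    public static void main(String[] args) {
        byte hi = (byte) 0xF9, lo = (byte) 0x8B; // first bytes of the header above
        System.out.println(readSigned(hi, lo));   // -1653
        System.out.println(readUnsigned(hi, lo)); // 63883
    }
}
```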
Namecoin is an experimental open-source technology which improves decentralization, security, censorship resistance, privacy, and speed of certain components of the Internet infrastructure such as DNS and identities. (https://namecoin.org/)
Bitcoin is the underlying technology and the two share a lot with respect to data structures. Hence, an example application for processing Namecoin will be provided for the HadoopCryptoLedger library, based on its Bitcoin support.
This example should include unit tests and integration tests. It will be available for MapReduce, Hive and Spark.
This component takes as input a Bitcoin binary script and translates it into a human-readable script. More information: https://en.bitcoin.it/wiki/Script
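As a sketch of what such a translator could look like (hypothetical helper names; it covers only a handful of opcodes from the wiki page above, plus the direct-push case):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal opcode-to-name translator for a Bitcoin binary script.
// Hypothetical sketch, not the library's API.
public class ScriptDecoder {
    private static final Map<Integer, String> OPCODES = new HashMap<>();
    static {
        OPCODES.put(0x76, "OP_DUP");
        OPCODES.put(0xA9, "OP_HASH160");
        OPCODES.put(0x88, "OP_EQUALVERIFY");
        OPCODES.put(0xAC, "OP_CHECKSIG");
    }
    static String decode(byte[] script) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < script.length) {
            int op = script[i] & 0xFF;
            if (op >= 0x01 && op <= 0x4B) { // opcode value is a direct push of `op` bytes
                out.append("PUSH(").append(op).append(") ");
                i += 1 + op;                // skip over the pushed data
            } else {
                out.append(OPCODES.getOrDefault(op, "OP_UNKNOWN")).append(' ');
                i++;
            }
        }
        return out.toString().trim();
    }
    public static void main(String[] args) {
        // A P2PKH-style locking script: OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
        byte[] script = new byte[25];
        script[0] = 0x76; script[1] = (byte) 0xA9; script[2] = 0x14; // push 20 bytes
        script[23] = (byte) 0x88; script[24] = (byte) 0xAC;
        System.out.println(decode(script)); // OP_DUP OP_HASH160 PUSH(20) OP_EQUALVERIFY OP_CHECKSIG
    }
}
```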
JDK 9 is supposed to be released on the 21st of September.
The goal of this task is to integrate JDK 9 compilation and unit testing into the CI chain to validate early that it works.
Add a flink native data source.
Add example for Apache Flink (https://flink.apache.org/) including integration tests.
Downloaded the entire blockchain using the canonical bitcoind, copied it to a Hadoop (2.7.3) cluster on AWS and ran a Spark job over the data. It was OK for about 900 partitions, then failed with:
Caused by: java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:506)
at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:361)
at org.zuinnote.hadoop.bitcoin.format.common.BitcoinBlockReader.readBlock(BitcoinBlockReader.java:115)
at org.zuinnote.hadoop.bitcoin.format.mapreduce.BitcoinBlockRecordReader.nextKeyValue(BitcoinBlockRecordReader.java:83)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:207)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1126)
at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1132)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
...
Using version 1.0.4.
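The exception itself is easy to reproduce in isolation: ByteBuffer.getInt() throws BufferUnderflowException whenever fewer than four bytes remain, which is consistent with a truncated magic/size field at a read boundary. A minimal sketch of the failure mode, not the library's code:

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Reproduces the BufferUnderflowException failure mode in isolation:
// getInt() needs 4 bytes; a buffer with fewer remaining bytes throws.
public class UnderflowDemo {
    static boolean triggersUnderflow(int remainingBytes) {
        ByteBuffer buf = ByteBuffer.allocate(remainingBytes);
        try {
            buf.getInt(); // reads 4 bytes
            return false;
        } catch (BufferUnderflowException e) {
            return true;
        }
    }
    public static void main(String[] args) {
        System.out.println(triggersUnderflow(3)); // true
        System.out.println(triggersUnderflow(4)); // false
    }
}
```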
Hi,
I am having difficulty running the
sbt +clean +assembly +it:test
command, because of the sbt version. Could you please point out the sbt version you used?
Thanks.
Live streaming of Bitcoin blockchain data for immediate analysis to HDFS, but also other applications (e.g. via Kafka).
It could be done as a flume source or a Kafka producer.
Unit and integration tests must be provided for this Flume source.
An example manual needs to be provided for integrating the Flume source into any cluster that has Flume support deployed. At a minimum, it should show that Bitcoin blocks are stored in HDFS files using append mode with a configurable file size (e.g. 128M), and that metadata is stored in an updatable fashion in HBase.
Litecoin is a peer-to-peer Internet currency that enables instant, near-zero cost payments to anyone in the world. (https://litecoin.org/)
Bitcoin is the underlying technology and the two share a lot with respect to data structures. Hence, an example application for processing Litecoin will be provided for the HadoopCryptoLedger library, based on its Bitcoin support.
This example should include unit tests and integration tests. It will be available for MapReduce, Hive and Spark.
I am new to big data and Hadoop. I am trying to use the hadoopcryptoledger library to do some Bitcoin graph analysis, and I followed this tutorial: Using Spark Scala Graphx to analyze the Bitcoin transaction graph (https://github.com/ZuInnoTe/hadoopcryptoledger/wiki/Using-Spark-Scala-Graphx-to-analyze-the-Bitcoin-transaction-graph)
While executing the command
sbt clean assembly test it:test
I ran into an issue:
/home/jnikhil/hadoopcryptoledger/examples/scala-spark-graphx-bitcointransaction/build.sbt:30: error: not found: value assemblyJarName
assemblyJarName in assembly := "example-hcl-spark-scala-graphx-bitcointransaction.jar"
[error] Type error in expression
Does anyone know why I am facing this issue?
Scala 2.10 seems to have been mostly replaced by Scala 2.11.
Both versions have been supported from the beginning, and we aim at removing support for 2.10.
As discovered by liorregev, who thankfully provided a pull request:
#38
The pull request passes the unit tests.
The normal block-reading functionality already worked correctly; this is just about the header detection.
At the same time, a similar issue might exist in the BitcoinBlock reader for header (magic) detection. This still needs to be investigated.
This will be added to release 1.1.2.
@jornfranke
Hi, thanks for your great project. I ran into this issue when running gradlew build.
╷
├─ JMockit integration ✔
└─ JUnit Jupiter ✔
└─ MapReduceBitcoinBlockIntegrationTest ✔
├─ mapReduceGenesisBlock() ✘ Successfully executed mapreduce application ==> expected: <0> but was: <1>
└─ checkTestDataGenesisBlockAvailable() ✔
Failures (1):
JUnit Jupiter:MapReduceBitcoinBlockIntegrationTest:mapReduceGenesisBlock()
MethodSource [className = 'org.zuinnote.hadoop.bitcoin.example.MapReduceBitcoinBlockIntegrationTest', methodName = 'mapReduceGenesisBlock', methodParameterTypes = '']
=> org.opentest4j.AssertionFailedError: Successfully executed mapreduce application ==> expected: <0> but was: <1>
org.zuinnote.hadoop.bitcoin.example.MapReduceBitcoinBlockIntegrationTest.mapReduceGenesisBlock(MapReduceBitcoinBlockIntegrationTest.java:187)
I have tried for over one day, but I still can't work it out. I need your help, thanks very much.
My JDK is 1.8.0_171
hadoop version
Hadoop 2.6.0-cdh5.14.2
Subversion http://github.com/cloudera/hadoop -r 5724a4ad7a27f7af31aa725694d3df09a68bb213
Compiled by jenkins on 2018-03-27T20:40Z
Compiled with protoc 2.5.0
From source with checksum 302899e86485742c090f626a828b28
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.14.2.jar
spark-shell --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
/_/
Hi Jörn,
As discussed via email. I was hesitant to raise an issue, as I think this may be down to an error on my part.
For magic, I've tried "F9BEB4D9" and also "F9BEB4D9,FABFB5DA", as well as not setting it.
For maxblocksize, I've tried not setting it, as well as "2M", "2", "8M" and "8". Not sure if the M is supposed to be there.
I'm using the maven module. Spark 2.10. When using the method shown in SparkBitcoinBlockDataSource, I get the following error; I think it might actually be a problem with my Spark setup:
java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
I'll try using gradle to build your project and run the examples.
Hi,
Thanks a lot for developing this library. I was going through blocks and the transactions contained within them. When I tried to get the hashes of transactions, I could not find them. By hash I mean the tx id, e.g. d77435177fe87b96d8e751221bfd1dfba9dc23e0cd521df59ed1725dba2c74f3 as in https://blockchain.info/tx/d77435177fe87b96d8e751221bfd1dfba9dc23e0cd521df59ed1725dba2c74f3
My code is stuck at this point:
val noOfTransactionPair = bitcoinBlocksRDD.map(e => {
  val txList = new ListBuffer[(String, Int, Int)]()
  val blk: BitcoinBlock = e._2
  val transactions: util.List[BitcoinTransaction] = blk.getTransactions
  for (x <- 0 to transactions.size() - 1) {
    val transaction: BitcoinTransaction = transactions.get(x)
    transaction.getTxHash // this does not exist?
    txList += ((blk.getTime.toString, transaction.getListOfInputs.size, transaction.getListOfOutputs.size()))
  }
  txList
})
Please let me know what I am missing.
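For context: the txid shown on block explorers is not a stored field of the transaction; it is the double SHA-256 of the serialized transaction bytes, displayed in reversed byte order. Whether the library exposes a helper for this is not asserted here; a generic sketch in plain Java, with placeholder bytes standing in for a real serialized transaction:

```java
import java.security.MessageDigest;

// Computes a Bitcoin-style txid: double SHA-256 of the serialized
// transaction, rendered as hex in reversed byte order.
public class TxId {
    static byte[] doubleSha256(byte[] data) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        return sha.digest(sha.digest(data));
    }
    static String toReversedHex(byte[] hash) {
        StringBuilder sb = new StringBuilder();
        for (int i = hash.length - 1; i >= 0; i--) {
            sb.append(String.format("%02x", hash[i]));
        }
        return sb.toString();
    }
    public static void main(String[] args) throws Exception {
        byte[] serializedTx = new byte[]{0x01, 0x00, 0x00, 0x00}; // placeholder bytes
        byte[] hash = doubleSha256(serializedTx);
        System.out.println(hash.length);         // 32
        System.out.println(toReversedHex(hash)); // 64 hex characters
    }
}
```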
EIP-155 (cf. https://github.com/ethereum/EIPs/blob/master/EIPS/eip-155.md) describes an alternative way to calculate the sender address (from) of a transaction. This was not supported, which led to wrong sender addresses for EIP-155 transactions.
fixed in 1.1.4
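The arithmetic from the EIP-155 spec linked above can be sketched as follows: for replay-protected transactions, v = recoveryId + chainId * 2 + 35, so both values can be recovered from v before running ECDSA public-key recovery (the helper names here are illustrative, not the library's):

```java
// EIP-155 arithmetic: v = recoveryId + chainId * 2 + 35 for
// replay-protected transactions; v in {27, 28} is the legacy encoding.
public class Eip155 {
    static long chainIdFromV(long v) {
        if (v == 27 || v == 28) return -1; // pre-EIP-155 transaction, no chain id
        return (v - 35) / 2;
    }
    static int recoveryIdFromV(long v) {
        if (v == 27 || v == 28) return (int) (v - 27);
        return (int) ((v - 35) % 2);
    }
    public static void main(String[] args) {
        System.out.println(chainIdFromV(37));    // 1 (mainnet)
        System.out.println(recoveryIdFromV(38)); // 1
        System.out.println(chainIdFromV(27));    // -1: legacy encoding
    }
}
```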
Emercoin is another cryptocurrency, but at the same time it enables - similar to Namecoin - the storing of key/value pairs.
Emercoin seems to be largely Bitcoin-based, so we may already be able to read it out of the box.
We need to add special utility functions (similar to the ones we added for Namecoin) to read key/value pairs in a user-friendly way, and provide UDFs.
Finally, we should provide examples including unit/integration tests.
AuxPow is a system used by altcoins (mainly Namecoin) that changes the structure of a block (cf. https://en.bitcoin.it/wiki/Merged_mining_specification).
This would require parsing additional information. It seems that AuxPow can probably be detected by checking that there is only "one transaction" (in fact this refers to one txinput), followed by 32 0x00 bytes and ending in a previous output index (prevTxOutIndex) of FF FF FF FF.
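The heuristic described above could be sketched roughly like this (an assumption about the detection logic, not verified library behaviour; in Bitcoin-family chains this pattern is what marks a coinbase input):

```java
import java.util.Arrays;

// Sketch of the coinbase-input pattern used in the detection heuristic:
// a previous-tx hash of 32 zero bytes and a previous output index of
// 0xFFFFFFFF.
public class CoinbaseCheck {
    static boolean looksLikeCoinbaseInput(byte[] prevTxHash, long prevTxOutIndex) {
        byte[] zeros = new byte[32];
        return prevTxHash.length == 32
                && Arrays.equals(prevTxHash, zeros)
                && prevTxOutIndex == 0xFFFFFFFFL;
    }
    public static void main(String[] args) {
        System.out.println(looksLikeCoinbaseInput(new byte[32], 0xFFFFFFFFL)); // true
        byte[] nonZero = new byte[32];
        nonZero[0] = 1;
        System.out.println(looksLikeCoinbaseInput(nonZero, 0xFFFFFFFFL));      // false
    }
}
```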
Ripple "is a network of computers which use the Ripple consensus algorithm to atomically settle and record transactions on a secure distributed database, the Ripple Consensus Ledger (RCL). Because of its distributed nature, the RCL offers transaction immutability without a central operator. The RCL contains a built-in currency exchange and its path-finding algorithm finds competitive exchange rates across order books and currency pairs":
https://github.com/ripple/rippled
This implies to add additional HadoopInputFormats to process the Ripple blockchain. Furthermore a HiveSerde and Spark datasource should be created. Unit tests must be included.
Finally, an example for MapReduce, Flink, Hive and Spark should be provided. The examples should include unit and integration tests.
JDK 7 has been out of maintenance for a long time, hence we need to drop support for it.
This also means we should check the new features of JDK 8, such as lambdas for the Spark examples in Java, unsigned longs, and, where applicable, parallel collections.
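Two of the mentioned JDK 8 features, as a minimal sketch of how they could apply here (on-chain values and indices are unsigned, and lambdas shorten the Java Spark examples):

```java
import java.util.Arrays;
import java.util.List;

// JDK 8 features relevant to the library: unsigned-long helpers and
// lambdas/streams instead of anonymous inner classes.
public class Jdk8Features {
    public static void main(String[] args) {
        // Unsigned interpretation of a 64-bit value whose high bit is set.
        long raw = 0xFFFFFFFFFFFFFFFFL;
        System.out.println(Long.toUnsignedString(raw)); // 18446744073709551615

        // Lambda + stream in place of an explicit loop.
        List<Integer> sizes = Arrays.asList(3, 1, 2);
        int total = sizes.stream().mapToInt(Integer::intValue).sum();
        System.out.println(total); // 6
    }
}
```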
This is a quality management release that includes:
All source files and packages shall contain SPDX metadata information, especially about licenses, copyrights, creator, package download information.
See here for more information about SPDX: https://spdx.org/about
The goal of this task is to automate the system testing of this library.
Automated testing should also report the time needed for each step in the test scenario.
Currently we have to deactivate mutation testing (pitest), because it does not yet support JUnit 5. We need to wait for a fix.
EIP-721 has been created in the spirit of EIP-20, but focuses on non-fungible tokens. This allows you to trade a huge variety of assets on the Ethereum blockchain, such as physical property, virtual collectables or negative-value assets (loans, burdens etc.).
The goal of this utility function is to facilitate analysis of those standardized types of tokens.
Includes unit tests and an example application.
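One building block such a utility would likely need is recognizing standardized calls by their function selector (the first 4 bytes of the keccak256 hash of the function signature at the start of a transaction's input data). A hypothetical sketch, assuming the well-known selector 0x23b872dd for "transferFrom(address,address,uint256)":

```java
import java.util.Arrays;

public class Erc721SelectorCheck {
    // Selector = first 4 bytes of keccak256("transferFrom(address,address,uint256)")
    private static final byte[] TRANSFER_FROM_SELECTOR =
            {(byte) 0x23, (byte) 0xb8, (byte) 0x72, (byte) 0xdd};

    // Hypothetical helper (not part of the library): check whether a
    // transaction's input data starts with the transferFrom selector.
    public static boolean isTransferFromCall(byte[] txInputData) {
        return txInputData != null && txInputData.length >= 4
                && Arrays.equals(Arrays.copyOfRange(txInputData, 0, 4),
                                 TRANSFER_FROM_SELECTOR);
    }
}
```

Note that ERC-20's transferFrom has the same signature and therefore the same selector, so a real utility would need additional contract-level context (e.g. an ERC-165 interface check) to distinguish the two token standards.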
Convert to new Hadoop API:
http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api
I am trying to build and run spark-submit according to this wiki:
https://github.com/ZuInnoTe/hadoopcryptoledger/wiki/Using-Spark-Scala-Graphx-to-analyze-the-Bitcoin-transaction-graph
spark-submit --class org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph --master local[8] ./target/scala-2.10/example-hcl-spark-scala-graphx-bitcointransaction.jar /user/bitcoin/input /user/bitcoin/output
It appears that a method such as extractTransactionData is not found. How can I solve this? Thanks very much.
18/06/07 09:54:24 ERROR executor.Executor: Exception in task 10.0 in stage 0.0 (TID 10)
java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
at org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph$.extractTransactionData(SparkScalaBitcoinTransactionGraph.scala:108)
at org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph$$anonfun$1.apply(SparkScalaBitcoinTransactionGraph.scala:60)
at org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph$$anonfun$1.apply(SparkScalaBitcoinTransactionGraph.scala:60)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
JUnit 5 was published in September 2017 (cf. http://junit.org/junit5/). We should migrate to it, since it is the new maintained version and it has enhanced support for JDK8.
We have to check whether coverage via JaCoCo works with JUnit 5.
This is about adding examples for calculating transaction fees:
Hive: here you need to explode the transactions into a table, calculate their current hash, and join the inputs of a transaction based on the hash and input index. The difference between the total sum of input values and the total sum of output values corresponds to the transaction fee.
Flink: similar to Hive, using Flink SQL
Spark: similar to Hive, using Spark SQL
MapReduce: here a more detailed MapReduce job for calculating transaction fees needs to be specified
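The fee calculation itself is the same in all four variants. A minimal sketch with hypothetical plain types, assuming each input has already been joined to the output it spends (via previous transaction hash and output index):

```java
import java.util.Arrays;
import java.util.List;

public class TxFee {
    // Hypothetical sketch: once each input of a transaction has been joined
    // to the output it spends, the fee is simply the total input value
    // minus the total output value.
    public static long fee(List<Long> joinedInputValues, List<Long> outputValues) {
        long totalIn = joinedInputValues.stream().mapToLong(Long::longValue).sum();
        long totalOut = outputValues.stream().mapToLong(Long::longValue).sum();
        return totalIn - totalOut;
    }

    public static void main(String[] args) {
        // Inputs worth 100 + 50 satoshi, one output worth 140 satoshi -> fee 10
        System.out.println(fee(Arrays.asList(100L, 50L), Arrays.asList(140L)));
    }
}
```

In the SQL variants (Hive, Flink SQL, Spark SQL) the same difference would be expressed as a grouped aggregation over the joined input and output tables.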
Spark 2.2+ transitively requires Bouncy Castle version 1.51, which conflicts with version 1.58 required by the inputformat. I would suggest shading 1.58 and assembling it into the JAR to avoid this issue (if possible).
Reference: ZuInnoTe/spark-hadoopcryptoledger-ds#9
Attached is a file with the un-parsable block:
447533.zip