zuinnote / hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
License: Apache License 2.0
Hello and thank you for creating this very useful open source project!
I started using it yesterday and ran into a couple of issues; I am hoping you can point me in the right direction. I'm including an example app here for reproducing the problems.
My setup:
My Geth node is synced to the mainnet and, following the advice here, I created multiple files holding the exported blockchain (up to block number 5M).
Here's a snippet of the directory listing; the file names use a slightly different nomenclature than the example, but were still produced with geth export:
➜ eth-blockchain ls -lah
total 25G
drwxrwxr-x 2 sancho sancho 4.0K Feb 13 16:22 .
drwxr-xr-x 43 sancho sancho 4.0K Feb 13 16:22 ..
-rwxrwxr-x 1 sancho sancho 126M Feb 13 03:01 blocks-0-200000
-rwxrwxr-x 1 sancho sancho 240M Feb 13 03:32 blocks-1000001-1200000
-rwxrwxr-x 1 sancho sancho 274M Feb 13 03:33 blocks-1200001-1400000
-rwxrwxr-x 1 sancho sancho 299M Feb 13 03:33 blocks-1400001-1600000
-rwxrwxr-x 1 sancho sancho 307M Feb 13 03:40 blocks-1600001-1800000
-rwxrwxr-x 1 sancho sancho 290M Feb 13 03:40 blocks-1800001-2000000
-rwxrwxr-x 1 sancho sancho 301M Feb 13 03:41 blocks-2000001-2200000
-rwxrwxr-x 1 sancho sancho 148M Feb 13 03:02 blocks-200001-400000
-rwxrwxr-x 1 sancho sancho 332M Feb 13 03:41 blocks-2200001-2400000
...
The code for the app I'm using:
package analytics
import collection.JavaConverters._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.BytesWritable
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.zuinnote.hadoop.ethereum.format.common.EthereumBlock
import org.zuinnote.hadoop.ethereum.format.mapreduce.EthereumBlockFileInputFormat
object TestApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .setAppName("test-app")
      .setMaster("local[*]")
    val spark = SparkSession.builder
      .config(sparkConf)
      .getOrCreate()
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoopcryptoledeger.ethereumblockinputformat.usedirectbuffer", "false")
    val rdd = spark.sparkContext.newAPIHadoopFile(
      "/home/sancho/eth-blockchain",
      classOf[EthereumBlockFileInputFormat], classOf[BytesWritable], classOf[EthereumBlock], hadoopConf
    )
    println("Number of transactions with negative gas price: " + rdd
      .flatMap(_._2.getEthereumTransactions.asScala)
      .filter(_.getGasPrice < 0)
      .count()
    )
    println("Number of transactions with negative gas limit: " + rdd
      .flatMap(_._2.getEthereumTransactions.asScala)
      .filter(_.getGasLimit < 0)
      .count()
    )
    val blockNumber = 4800251
    println(s"Number of transactions with negative gas price in block $blockNumber: " + rdd
      .filter(_._2.getEthereumBlockHeader.getNumber == blockNumber)
      .flatMap(_._2.getEthereumTransactions.asScala)
      .filter(_.getGasPrice < 0)
      .count()
    )
    println(s"Number of transactions with negative gas limit in block $blockNumber: " + rdd
      .filter(_._2.getEthereumBlockHeader.getNumber == blockNumber)
      .flatMap(_._2.getEthereumTransactions.asScala)
      .filter(_.getGasLimit < 0)
      .count()
    )
  }
}
This is the build.sbt file:
lazy val commonSettings = Seq(
  scalaVersion := "2.11.7",
  test in assembly := {}
)

lazy val ethBlockchainAnalytics = (project in file(".")).
  settings(commonSettings).
  settings(
    name := "EthBlockchainAnalytics",
    version := "0.1",
    libraryDependencies ++= Seq(
      "com.github.zuinnote" %% "spark-hadoopcryptoledger-ds" % "1.1.2",
      "org.apache.spark" %% "spark-core" % "2.2.1" % "provided",
      "org.apache.spark" %% "spark-sql" % "2.2.1" % "provided"),
    assemblyJarName in assembly := s"${name.value}_${scalaBinaryVersion.value}-${version.value}.jar",
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
      case PathList("org", "apache", xs @ _*) => MergeStrategy.last
      case x =>
        val oldStrategy = (assemblyMergeStrategy in assembly).value
        oldStrategy(x)
    }
  )
The launcher script I'm using:
#!/bin/sh
JAR=$1
/usr/local/lib/spark/bin/spark-submit \
--class analytics.TestApp \
--driver-memory 20G \
$JAR
And finally the command I'm using to run it:
➜ EthBlockchainAnalytics src/main/resources/launcher.sh /home/sancho/IdeaProjects/EthBlockchainAnalytics/target/scala-2.11/EthBlockchainAnalytics_2.11-0.1.jar
The output of the above application when run like this is:
Number of transactions with negative gas price: 8732263
Number of transactions with negative gas limit: 25699923
Number of transactions with negative gas price in block 4800251: 2
Number of transactions with negative gas limit in block 4800251: 8
As a quick sanity check, I ran the following:
➜ ~ geth attach
Welcome to the Geth JavaScript console!
instance: Geth/v1.7.3-stable-4bb3c89d/linux-amd64/go1.9
modules: admin:1.0 debug:1.0 eth:1.0 miner:1.0 net:1.0 personal:1.0 rpc:1.0 txpool:1.0 web3:1.0
> var txs = eth.getBlock(4800251).transactions
undefined
> for (var i=0; i<txs.length; i++) { if (eth.getTransaction(txs[i]).gasPrice < 0) console.log(txs[i]) }
undefined
Any idea why I'm seeing so many negative gas prices and gas limits when using Spark?
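For what it's worth, one plausible mechanism for numbers like these (a sketch of general signed-overflow behaviour in Java, not confirmed against this library's parser): gas price and gas limit are unsigned, variable-length quantities on the wire, but once an 8-byte value with the most significant bit set is stored in a signed Java long, it prints as negative.

```java
import java.math.BigInteger;

public class SignedOverflowDemo {
    public static void main(String[] args) {
        // 8 raw bytes with the most significant bit set (here: 2^63)
        byte[] raw = {(byte) 0x80, 0, 0, 0, 0, 0, 0, 0};

        // Interpreted as a signed two's-complement long, the value wraps negative
        long asSignedLong = new BigInteger(raw).longValue();
        System.out.println(asSignedLong);   // -9223372036854775808

        // Interpreted as an unsigned magnitude, it stays positive
        BigInteger asUnsigned = new BigInteger(1, raw);
        System.out.println(asUnsigned);     // 9223372036854775808
    }
}
```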
Create a Flink Table Source to facilitate working with the Flink Table API (cf. https://flink.apache.org/news/2017/03/29/table-sql-api-update.html)
ERC20 tokens are standardized currencies on the Ethereum blockchain (cf. https://github.com/ethereum/EIPs/blob/master/EIPS/eip-20.md). Although one could parse them directly without any issues in any Big Data application, it could make sense to make them part of the library, since they have many applications.
Examples, unit and integration tests need to be provided.
Create web scraper to fetch currency exchange rates for currencies based on cryptoledgers.
This should support multiple sources in a generic way and the data should be made available together with cryptoledger data provided already by the HadoopCryptoLedger library.
First we need to work on a design, an implementation, and unit & integration tests (the latter probably on an embedded Jetty and/or Tomcat).
Hadoop 3.1 has been released (cf. https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0.0+release).
We need to investigate its impact (especially for the mapred.* and mapreduce.* APIs and the JDK8-only support) and implement some test cases.
We also need to take into account the dependence of other Big Data platforms on Hadoop (most notably Hive and Spark, and to a lesser extent Flink).
Ethereum is a decentralized platform that runs smart contracts: applications that run exactly as programmed without any possibility of downtime, censorship, fraud or third party interference (https://www.ethereum.org/).
This implies adding additional HadoopInputFormats to process the Ethereum blockchain. Furthermore, a HiveSerde and a Spark data source should be created. Unit tests must be included.
Finally, an example for MapReduce, Hive and Spark should be provided. The examples should include unit and integration tests.
Ethereum classic is out of scope.
We are running into exceptions when attempting to do graph analysis using GraphFrames after creating a DataFrame using spark-hadoopcryptoledger-ds. It seems there are block sizes bigger than what can fit inside an int?
java.lang.InterruptedException: org.zuinnote.hadoop.ethereum.format.exception.EthereumBlockReadException: Error: This block size cannot be handled currently (larger then largest number in positive signed int)
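To illustrate the int limitation the exception refers to (a hypothetical sketch, not the library's actual reader code): a 4-byte size field read into Java's signed int wraps negative past Integer.MAX_VALUE, whereas widening it as unsigned into a long keeps the real value.

```java
public class UnsignedSizeDemo {
    public static void main(String[] args) {
        // A size of 0x90000000 bytes (~2.25 GiB) exceeds Integer.MAX_VALUE
        int rawSize = 0x90000000;
        System.out.println(rawSize);                          // -1879048192 (wrapped)
        System.out.println(Integer.toUnsignedLong(rawSize));  // 2415919104 (real value)
    }
}
```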
Although segwit blocks can be processed with the current hadoopcryptoledger library, dedicated support makes it easier to use.
Add unit tests for segwit2x compatible blocks:
https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki
fix issues if needed.
Increase the default max block size according to the formula (although the current default of 2M should be sufficient to process the new segwit blocks). Set the default max block size so that block weight <= 4,000,000.
BitcoinScriptpatternparser:
Add parsing for the new witness transaction types:
pay-to-witness-public-key-hash
pay-to-witness-script-hash
support compressed p2hash p2pub addresses
BitcoinUtil:
calculate segwit transaction hash
calculate transaction weight
calculate virtual transaction size
calculate base transaction size
calculate total transaction size
calculate block weight
calculate block base size
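The weight-related helpers above follow directly from BIP-141's definitions. A minimal sketch of the formulas (method names and the example sizes are illustrative, not the actual BitcoinUtil API): weight = 3 * base size + total size, and virtual size = weight / 4, rounded up.

```java
public class SegwitWeightSketch {
    // BIP-141: weight = 3 * base size + total size (all sizes in bytes)
    static long weight(long baseSize, long totalSize) {
        return 3 * baseSize + totalSize;
    }

    // BIP-141: virtual size = weight / 4, rounded up
    static long virtualSize(long weight) {
        return (weight + 3) / 4;
    }

    public static void main(String[] args) {
        // Hypothetical block: 900,000 bytes without witness data,
        // 1,600,000 bytes including it
        long w = weight(900_000, 1_600_000);
        System.out.println(w);               // 4300000 -> above the 4,000,000 limit
        System.out.println(virtualSize(w));  // 1075000
    }
}
```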
Slimcoin is, according to its website, "the first known cryptocurrency designed as a combined proof-of-burn/proof-of-stake/proof-of-work system".
Since it has proof-of-burn, it might be valuable to analyse this one as well to evaluate the effects of proof-of-burn.
There was a bug in the encoding and thus the calculation of transaction hash was wrong in some cases.
Fixed in 1.1.4
Monero is a secure, private, untraceable currency. It is open source and freely available to all (https://getmonero.org/home).
This implies adding additional HadoopInputFormats to process the Monero blockchain. Furthermore, a HiveSerde, a Flink data source and a Spark data source should be created. Unit tests must be included.
Finally, an example for MapReduce, Hive, Flink and Spark should be provided. The examples should include unit and integration tests.
At the moment BitcoinScriptPatternParser returns the key hash which can be used to identify the unique address of a transaction and also to search it in blockchain explorers.
Bitcoin also has the notion of Bitcoin addresses, which exist primarily to improve the user experience. The conversion of such addresses is described in https://github.com/bitcoin/bips/blob/master/bip-0013.mediawiki
The idea of this issue is to create a function BitcoinScriptPatternParser.getBitcoinAddress(String) that converts the output of BitcoinScriptPatternParser.getPaymentDestination to a Bitcoin address (if possible). Additionally, we create Hive UDFs. All new methods need to be unit tested and the documentation needs to be updated.
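For reference, the BIP-13 conversion itself is: prepend a version byte to the key hash, append the first four bytes of the double-SHA256 checksum, and Base58-encode. A self-contained sketch (not the proposed getBitcoinAddress implementation; class and method names are illustrative), using the well-known test vector from the Bitcoin wiki:

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.Arrays;

public class Base58CheckSketch {
    static final String ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        return out;
    }

    // version byte + payload + first 4 bytes of double-SHA256, Base58-encoded
    static String toBase58Check(byte version, byte[] payload) {
        try {
            byte[] data = new byte[payload.length + 5];
            data[0] = version;
            System.arraycopy(payload, 0, data, 1, payload.length);
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] checksum = sha.digest(sha.digest(Arrays.copyOf(data, payload.length + 1)));
            System.arraycopy(checksum, 0, data, payload.length + 1, 4);

            StringBuilder sb = new StringBuilder();
            BigInteger n = new BigInteger(1, data);
            while (n.signum() > 0) {
                BigInteger[] qr = n.divideAndRemainder(BigInteger.valueOf(58));
                sb.append(ALPHABET.charAt(qr[1].intValue()));
                n = qr[0];
            }
            // each leading zero byte becomes a leading '1'
            for (byte b : data) { if (b == 0) sb.append('1'); else break; }
            return sb.reverse().toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Well-known test vector from the Bitcoin wiki (P2PKH, version 0x00)
        byte[] keyHash = hexToBytes("010966776006953D5567439E5E39F86A0F73ECA8");
        System.out.println(toBase58Check((byte) 0x00, keyHash));
        // 16UwLL9Risc3QfPqBUvKofHmBQ7wMtjvM
    }
}
```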
Hive 2.x has been out for some time now, so we can drop Hive 1.2.x support and introduce Hive 2.x support.
Hyperledger provides various frameworks for developing cryptoledgers: Fabric, Iroha, Sawtooth.
The idea of this issue is to investigate how they can be included in hadoopcryptoledger with unit tests, provide examples on how to analyse those ledgers, and write integration tests for them.
My understanding is that the value field of an Ethereum transaction can be up to 32 bytes long; however, we store it as a long after parsing the RLP-encoded data. It seems we store 0L in case the data is longer than 8 bytes, and no warning or error is thrown when transaction values are discarded / replaced by zeros. Is this the case?
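To illustrate why 8 bytes is not enough (a sketch; the amounts chosen are illustrative): even a mid-sized transfer denominated in wei exceeds the 63 bits of magnitude a signed long offers, so the value would need to be kept as a BigInteger.

```java
import java.math.BigInteger;

public class ValueFieldSketch {
    public static void main(String[] args) {
        // A transfer of 10,000 ether, denominated in wei (10^18 wei per ether)
        BigInteger wei = BigInteger.valueOf(10_000).multiply(BigInteger.TEN.pow(18));

        System.out.println(wei);                      // 10000000000000000000000
        System.out.println(wei.bitLength());          // 74 -> more than a signed long's 63 bits
        System.out.println(wei.toByteArray().length); // 10 bytes, i.e. longer than 8
    }
}
```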
Investigate support and create examples plus unit tests for using hadoopcryptoledger with Apache Beam (https://beam.apache.org/).
Apache Beam supports writing Big Data jobs once and running them on multiple platforms (e.g. Flink, Spark, Apex, Google Cloud Dataflow...).
Presto is a distributed SQL query engine for Big Data supporting a variety of sources. The idea is to add the hadoopcryptoledger library as a source to directly query blockchains, such as Bitcoin & altcoins or Ethereum & altcoins.
Currently, the hadoopcryptoledger library can be used indirectly with Presto via the Hive connector.
Relevant documentation:
https://prestodb.io/docs/current/develop/connectors.html
@jornfranke
I have run the example according to the wiki:
https://github.com/ZuInnoTe/hadoopcryptoledger/wiki/Using-Spark-Scala-Graphx-to-analyze-the-Bitcoin-transaction-graph
It ran for over 4 hours, and then the issue below occurred.
I ran it again for about 4 hours, and the issue still happened.
It seems that there is not enough memory, but my Docker container has plenty of memory.
[root@quickstart scala-spark-graphx-bitcointransaction]# df -h
Filesystem Size Used Avail Use% Mounted on
overlay 2.0T 1.2T 797G 60% /
tmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/xvda1 2.0T 1.2T 797G 60% /etc/resolv.conf
/dev/xvda1 2.0T 1.2T 797G 60% /etc/hostname
/dev/xvda1 2.0T 1.2T 797G 60% /etc/hosts
shm 64M 0 64M 0% /dev/shm
[root@quickstart scala-spark-graphx-bitcointransaction]# free
total used free shared buffers cached
Mem: 65960180 25773772 40186408 686156 102472 14819768
-/+ buffers/cache: 10851532 55108648
Swap: 0 0 0
[root@quickstart scala-spark-graphx-bitcointransaction]# free -h
total used free shared buffers cached
Mem: 62G 24G 38G 670M 100M 14G
-/+ buffers/cache: 10G 52G
Swap: 0B 0B 0B
How can I deal with this problem? Thanks very much.
18/06/08 06:17:35 INFO collection.ExternalSorter: Thread 913 spilling in-memory map of 24.6 MB to disk (9 times so far)
18/06/08 06:17:35 INFO collection.ExternalSorter: Thread 947 spilling in-memory map of 25.0 MB to disk (28 times so far)
18/06/08 06:17:55 INFO collection.ExternalAppendOnlyMap: Thread 922 spilling in-memory map of 27.4 MB to disk (3 times so far)
18/06/08 06:18:02 INFO collection.ExternalSorter: Thread 947 spilling in-memory map of 24.7 MB to disk (29 times so far)
18/06/08 06:18:04 INFO collection.ExternalSorter: Thread 950 spilling in-memory map of 24.7 MB to disk (15 times so far)
18/06/08 06:18:09 INFO collection.ExternalSorter: Thread 909 spilling in-memory map of 24.8 MB to disk (18 times so far)
18/06/08 06:18:25 INFO collection.ExternalSorter: Thread 913 spilling in-memory map of 24.7 MB to disk (10 times so far)
18/06/08 06:18:26 WARN memory.TaskMemoryManager: leak 18.3 MB memory from org.apache.spark.util.collection.ExternalSorter@12719270
18/06/08 06:18:26 ERROR executor.Executor: Managed memory leak detected; size = 19172698 bytes, TID = 7855
18/06/08 06:18:26 ERROR executor.Executor: Exception in task 168.0 in stage 8.0 (TID 7855)
java.lang.OutOfMemoryError: Java heap space
at java.io.ObjectInputStream$HandleTable.grow(ObjectInputStream.java:3464)
at java.io.ObjectInputStream$HandleTable.assign(ObjectInputStream.java:3271)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1789)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:171)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.readNextItem(ExternalAppendOnlyMap.scala:515)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.hasNext(ExternalAppendOnlyMap.scala:535)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.org$apache$spark$util$collection$ExternalAppendOnlyMap$ExternalIterator$$readNextHashCode(ExternalAppendOnlyMap.scala:335)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:408)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:406)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:406)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:301)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
18/06/08 06:18:26 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-67,5,main]
java.lang.OutOfMemoryError: Java heap space
	[same stack trace as above]
18/06/08 06:18:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/06/08 06:18:26 INFO scheduler.TaskSetManager: Starting task 175.0 in stage 8.0 (TID 7862, localhost, executor driver, partition 175, PROCESS_LOCAL, 2049 bytes)
18/06/08 06:18:26 INFO executor.Executor: Running task 175.0 in stage 8.0 (TID 7862)
18/06/08 06:18:26 WARN scheduler.TaskSetManager: Lost task 168.0 in stage 8.0 (TID 7855, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
	[same stack trace as above]
18/06/08 06:18:26 ERROR scheduler.TaskSetManager: Task 168 in stage 8.0 failed 1 times; aborting job
18/06/08 06:18:26 INFO storage.ShuffleBlockFetcherIterator: Getting 1280 non-empty blocks out of 1280 blocks
18/06/08 06:18:26 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Cancelling stage 12
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks have all completed, from pool
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Stage 12 was cancelled
18/06/08 06:18:26 INFO scheduler.DAGScheduler: ShuffleMapStage 12 (map at SparkScalaBitcoinTransactionGraph.scala:89) failed in 18010.838 s due to Job aborted due to stage failure: Task 168 in stage 8.0 failed 1 times, most recent failure: Lost task 168.0 in stage 8.0 (TID 7855, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
	[same stack trace as above]
Driver stacktrace:
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Cancelling stage 9
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Stage 9 was cancelled
18/06/08 06:18:26 INFO scheduler.DAGScheduler: ShuffleMapStage 9 (map at SparkScalaBitcoinTransactionGraph.scala:81) failed in 9262.181 s due to Job aborted due to stage failure: Task 168 in stage 8.0 failed 1 times, most recent failure: Lost task 168.0 in stage 8.0 (TID 7855, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
	[same stack trace as above]
Driver stacktrace:
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Cancelling stage 8
18/06/08 06:18:26 INFO scheduler.TaskSchedulerImpl: Stage 8 was cancelled
18/06/08 06:18:26 INFO scheduler.DAGScheduler: ShuffleMapStage 8 (map at SparkScalaBitcoinTransactionGraph.scala:85) failed in 1948.098 s due to Job aborted due to stage failure: Task 168 in stage 8.0 failed 1 times, most recent failure: Lost task 168.0 in stage 8.0 (TID 7855, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
	[same stack trace as above]
Driver stacktrace:
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 172.0 in stage 8.0 (TID 7859)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 169.0 in stage 8.0 (TID 7856)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 173.0 in stage 8.0 (TID 7860)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 170.0 in stage 8.0 (TID 7857)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 167.0 in stage 8.0 (TID 7854)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 174.0 in stage 8.0 (TID 7861)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 171.0 in stage 8.0 (TID 7858)
18/06/08 06:18:26 INFO executor.Executor: Executor is trying to kill task 175.0 in stage 8.0 (TID 7862)
18/06/08 06:18:26 INFO scheduler.DAGScheduler: Job 1 failed: top at SparkScalaBitcoinTransactionGraph.scala:100, took 18011.687877 s
18/06/08 06:18:26 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1280 blocks
18/06/08 06:18:26 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 168 in stage 8.0 failed 1 times, most recent failure: Lost task 168.0 in stage 8.0 (TID 7855, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
at java.io.ObjectInputStream$HandleTable.grow(ObjectInputStream.java:3464)
at java.io.ObjectInputStream$HandleTable.assign(ObjectInputStream.java:3271)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1789)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:171)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.readNextItem(ExternalAppendOnlyMap.scala:515)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.hasNext(ExternalAppendOnlyMap.scala:535)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.org$apache$spark$util$collection$ExternalAppendOnlyMap$ExternalIterator$$readNextHashCode(ExternalAppendOnlyMap.scala:335)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:408)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:406)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:406)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:301)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1457)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1445)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1444)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1444)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1668)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1627)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1616)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1862)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1982)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1.apply(RDD.scala:1397)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1384)
at org.apache.spark.rdd.RDD$$anonfun$top$1.apply(RDD.scala:1365)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.top(RDD.scala:1364)
at org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph$.jobTop5AddressInput(SparkScalaBitcoinTransactionGraph.scala:100)
at org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph$.main(SparkScalaBitcoinTransactionGraph.scala:53)
at org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph.main(SparkScalaBitcoinTransactionGraph.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.io.ObjectInputStream$HandleTable.grow(ObjectInputStream.java:3464)
at java.io.ObjectInputStream$HandleTable.assign(ObjectInputStream.java:3271)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1789)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:171)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.readNextItem(ExternalAppendOnlyMap.scala:515)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.hasNext(ExternalAppendOnlyMap.scala:535)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.org$apache$spark$util$collection$ExternalAppendOnlyMap$ExternalIterator$$readNextHashCode(ExternalAppendOnlyMap.scala:335)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:408)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:406)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:406)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:301)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
18/06/08 06:18:26 INFO executor.Executor: Executor killed task 175.0 in stage 8.0 (TID 7862)
18/06/08 06:18:26 WARN scheduler.TaskSetManager: Lost task 175.0 in stage 8.0 (TID 7862, localhost, executor driver): TaskKilled (killed intentionally)
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
18/06/08 06:18:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
18/06/08 06:18:26 INFO ui.SparkUI: Stopped Spark web UI at http://172.17.0.2:4040
18/06/08 06:18:26 INFO executor.Executor: Executor killed task 170.0 in stage 8.0 (TID 7857)
18/06/08 06:18:26 WARN scheduler.TaskSetManager: Lost task 170.0 in stage 8.0 (TID 7857, localhost, executor driver): TaskKilled (killed intentionally)
18/06/08 06:18:26 INFO executor.Executor: Executor killed task 172.0 in stage 8.0 (TID 7859)
18/06/08 06:18:26 ERROR scheduler.TaskSchedulerImpl: Exception in statusUpdate
java.util.concurrent.RejectedExecutionException: Task org.apache.spark.scheduler.TaskResultGetter$$anon$3@3eec2833 rejected from java.util.concurrent.ThreadPoolExecutor@67fd2e27[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 7857]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at org.apache.spark.scheduler.TaskResultGetter.enqueueFailedTask(TaskResultGetter.scala:104)
at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:387)
at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:361)
at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalBackend.scala:66)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:217)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
18/06/08 06:18:26 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/06/08 06:18:26 INFO executor.Executor: Executor killed task 171.0 in stage 8.0 (TID 7858)
18/06/08 06:18:26 INFO executor.Executor: Executor killed task 173.0 in stage 8.0 (TID 7860)
18/06/08 06:18:26 INFO executor.Executor: Executor killed task 174.0 in stage 8.0 (TID 7861)
18/06/08 06:18:27 ERROR executor.Executor: Exception in task 169.0 in stage 8.0 (TID 7856)
java.lang.IllegalArgumentException: requirement failed: File segment length cannot be negative (got -16024861)
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.storage.FileSegment.<init>(FileSegment.scala:28)
at org.apache.spark.storage.DiskBlockObjectWriter.fileSegment(DiskBlockObjectWriter.scala:236)
at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:729)
at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:721)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:721)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
18/06/08 06:18:27 INFO executor.Executor: Not reporting error to driver during JVM shutdown.
18/06/08 06:18:27 ERROR executor.Executor: Exception in task 167.0 in stage 8.0 (TID 7854)
java.io.FileNotFoundException: /tmp/blockmgr-3d6f28c7-9d39-4c84-ad36-cd4864d0474f/1a/temp_shuffle_5b3ac818-3769-4488-8a4a-1bd053c342d6 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.apache.spark.util.collection.ExternalSorter$SpillReader.nextBatchStream(ExternalSorter.scala:526)
at org.apache.spark.util.collection.ExternalSorter$SpillReader.org$apache$spark$util$collection$ExternalSorter$SpillReader$$readNextItem(ExternalSorter.scala:578)
at org.apache.spark.util.collection.ExternalSorter$SpillReader$$anon$5.hasNext(ExternalSorter.scala:601)
at scala.collection.TraversableOnce$FlattenOps$$anon$1.hasNext(TraversableOnce.scala:432)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:725)
at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:721)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:721)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
18/06/08 06:18:27 INFO executor.Executor: Not reporting error to driver during JVM shutdown.
18/06/08 06:18:33 INFO storage.MemoryStore: MemoryStore cleared
18/06/08 06:18:33 INFO storage.BlockManager: BlockManager stopped
18/06/08 06:18:33 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/06/08 06:18:33 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/06/08 06:18:33 INFO spark.SparkContext: Successfully stopped SparkContext
18/06/08 06:18:33 INFO util.ShutdownHookManager: Shutdown hook called
18/06/08 06:18:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-7aa73177-a289-47ce-ab1a-f2c3432aa428
The Ethereum Name Service (ENS) allows registration of Internet addresses on the Ethereum blockchain. In some sense, it is similar to Namecoin for the Bitcoin blockchain, but based on the smart contract VM.
The idea here is to add additional utility functionality and Hive UDFs to the library.
Of course, everything is supplied with an example and unit tests/integration tests.
The problem resides in EthereumUtil.java:294, where a 2-byte number is assumed to be signed.
The first such block is 403419 (which I have attached), with the listHeader of F98B1FF9021AA008741F.
403419.zip
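For illustration, the difference between the two interpretations can be reproduced in plain Java. This is a sketch of the pitfall only, not the actual EthereumUtil code; the bytes used are the first two bytes of the listHeader above.

```java
// Illustration of the signed vs. unsigned pitfall: reading two bytes as a
// Java short yields a negative value once the high bit is set.
public class TwoByteRead {
    // Naive (signed) interpretation: 0xF98B becomes a negative short.
    static int readSigned(byte hi, byte lo) {
        return (short) (((hi & 0xFF) << 8) | (lo & 0xFF));
    }
    // Unsigned interpretation: masking keeps the value in [0, 65535].
    static int readUnsigned(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }
    public static void main(String[] args) {
        byte hi = (byte) 0xF9, lo = (byte) 0x8B; // first bytes of the header above
        System.out.println(readSigned(hi, lo));   // -1653
        System.out.println(readUnsigned(hi, lo)); // 63883
    }
}
```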
Namecoin is an experimental open-source technology which improves decentralization, security, censorship resistance, privacy, and speed of certain components of the Internet infrastructure such as DNS and identities. (https://namecoin.org/)
Bitcoin is the underlying technology and the two share a lot with respect to data structures. Hence, an example application for processing Namecoin will be provided for the HadoopCryptoLedger library, based on its Bitcoin support.
This example should include unit tests and integration tests. It will be available for MapReduce, Hive and Spark.
This component takes as input a Bitcoin binary script and translates it into a human-readable script. More information: https://en.bitcoin.it/wiki/Script
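As a sketch of what such a translator could look like (hypothetical helper names; it covers only a handful of opcodes from the wiki page above, plus the direct-push case):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal opcode-to-name translator for a Bitcoin binary script.
// Hypothetical sketch, not the library's API.
public class ScriptDecoder {
    private static final Map<Integer, String> OPCODES = new HashMap<>();
    static {
        OPCODES.put(0x76, "OP_DUP");
        OPCODES.put(0xA9, "OP_HASH160");
        OPCODES.put(0x88, "OP_EQUALVERIFY");
        OPCODES.put(0xAC, "OP_CHECKSIG");
    }
    static String decode(byte[] script) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < script.length) {
            int op = script[i] & 0xFF;
            if (op >= 0x01 && op <= 0x4B) { // opcode value is a direct push of `op` bytes
                out.append("PUSH(").append(op).append(") ");
                i += 1 + op;                // skip over the pushed data
            } else {
                out.append(OPCODES.getOrDefault(op, "OP_UNKNOWN")).append(' ');
                i++;
            }
        }
        return out.toString().trim();
    }
    public static void main(String[] args) {
        // A P2PKH-style locking script: OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
        byte[] script = new byte[25];
        script[0] = 0x76; script[1] = (byte) 0xA9; script[2] = 0x14; // push 20 bytes
        script[23] = (byte) 0x88; script[24] = (byte) 0xAC;
        System.out.println(decode(script)); // OP_DUP OP_HASH160 PUSH(20) OP_EQUALVERIFY OP_CHECKSIG
    }
}
```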
JDK 9 is supposed to be released on the 21st of September.
The goal of this task is to integrate JDK 9 compilation and unit testing into the CI chain to validate early that it works.
Add a flink native data source.
Add example for Apache Flink (https://flink.apache.org/) including integration tests.
Downloaded the entire blockchain using the canonical bitcoind, copied it to a Hadoop (2.7.3) cluster on AWS and ran a Spark job over the data. It was OK for about 900 partitions, then failed with:
Caused by: java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:506)
at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:361)
at org.zuinnote.hadoop.bitcoin.format.common.BitcoinBlockReader.readBlock(BitcoinBlockReader.java:115)
at org.zuinnote.hadoop.bitcoin.format.mapreduce.BitcoinBlockRecordReader.nextKeyValue(BitcoinBlockRecordReader.java:83)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:207)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1126)
at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1132)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
...
Using version 1.0.4.
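The exception itself is easy to reproduce in isolation: ByteBuffer.getInt() throws BufferUnderflowException whenever fewer than four bytes remain, which is consistent with a truncated magic/size field at a read boundary. A minimal sketch of the failure mode, not the library's code:

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Reproduces the BufferUnderflowException failure mode in isolation:
// getInt() needs 4 bytes; a buffer with fewer remaining bytes throws.
public class UnderflowDemo {
    static boolean triggersUnderflow(int remainingBytes) {
        ByteBuffer buf = ByteBuffer.allocate(remainingBytes);
        try {
            buf.getInt(); // reads 4 bytes
            return false;
        } catch (BufferUnderflowException e) {
            return true;
        }
    }
    public static void main(String[] args) {
        System.out.println(triggersUnderflow(3)); // true
        System.out.println(triggersUnderflow(4)); // false
    }
}
```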
Hi,
I am having difficulty running the
sbt +clean +assembly +it:test
command, because of the sbt version. Could you please point out the sbt version you used?
Thanks.
Live streaming of Bitcoin blockchain data for immediate analysis to HDFS, but also other applications (e.g. via Kafka).
It could be done as a flume source or a Kafka producer.
Unit and integration tests must be provided for this Flume source.
An example manual needs to be provided for integrating the Flume source into any cluster that has Flume support deployed. At a minimum, it should show that Bitcoin blocks are stored in HDFS files using append mode with a configurable file size (e.g. 128M), and that metadata is stored in an updatable fashion in HBase.
Litecoin is a peer-to-peer Internet currency that enables instant, near-zero cost payments to anyone in the world. (https://litecoin.org/)
Bitcoin is the underlying technology and the two share a lot with respect to data structures. Hence, an example application for processing Litecoin will be provided for the HadoopCryptoLedger library, based on its Bitcoin support.
This example should include unit tests and integration tests. It will be available for MapReduce, Hive and Spark.
I am new to big data and Hadoop. I am trying to use the hadoopcryptoledger library to do some Bitcoin graph analysis, and I followed this tutorial: Using Spark Scala Graphx to analyze the Bitcoin transaction graph (https://github.com/ZuInnoTe/hadoopcryptoledger/wiki/Using-Spark-Scala-Graphx-to-analyze-the-Bitcoin-transaction-graph)
While executing the command
sbt clean assembly test it:test
I ran into an issue:
/home/jnikhil/hadoopcryptoledger/examples/scala-spark-graphx-bitcointransaction/build.sbt:30: error: not found: value assemblyJarName
assemblyJarName in assembly := "example-hcl-spark-scala-graphx-bitcointransaction.jar"
[error] Type error in expression
Does anyone know why I am facing this issue?
Scala 2.10 seems to have been mostly replaced by Scala 2.11.
Both versions have been supported from the beginning, and we aim at removing support for 2.10.
As discovered by liorregev, who thankfully provided a pull request:
#38
The pull request passes the unit tests.
The normal block-reading functionality already worked correctly; this is just about the header detection.
At the same time, a similar issue might exist in the BitcoinBlock reader for header (magic) detection. This still needs to be investigated.
This will be added to release 1.1.2.
@jornfranke
Hi, thanks for your great project. I ran into this issue when running gradlew build.
╷
├─ JMockit integration ✔
└─ JUnit Jupiter ✔
└─ MapReduceBitcoinBlockIntegrationTest ✔
├─ mapReduceGenesisBlock() ✘ Successfully executed mapreduce application ==> expected: <0> but was: <1>
└─ checkTestDataGenesisBlockAvailable() ✔
Failures (1):
JUnit Jupiter:MapReduceBitcoinBlockIntegrationTest:mapReduceGenesisBlock()
MethodSource [className = 'org.zuinnote.hadoop.bitcoin.example.MapReduceBitcoinBlockIntegrationTest', methodName = 'mapReduceGenesisBlock', methodParameterTypes = '']
=> org.opentest4j.AssertionFailedError: Successfully executed mapreduce application ==> expected: <0> but was: <1>
org.zuinnote.hadoop.bitcoin.example.MapReduceBitcoinBlockIntegrationTest.mapReduceGenesisBlock(MapReduceBitcoinBlockIntegrationTest.java:187)
I have tried for over one day, but I still can't work it out. I need your help, thanks very much.
My JDK is 1.8.0_171
hadoop version
Hadoop 2.6.0-cdh5.14.2
Subversion http://github.com/cloudera/hadoop -r 5724a4ad7a27f7af31aa725694d3df09a68bb213
Compiled by jenkins on 2018-03-27T20:40Z
Compiled with protoc 2.5.0
From source with checksum 302899e86485742c090f626a828b28
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.14.2.jar
spark-shell --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
/_/
Hi Jörn,
As discussed via email. I was hesitant to raise an issue, as I think this may be down to an error on my part.
For magic, I've tried "F9BEB4D9" and also "F9BEB4D9,FABFB5DA", as well as not setting it.
For maxblocksize, I've tried not setting it, as well as "2M", "2", "8M" and "8". Not sure if the M is supposed to be there.
I'm using the maven module. Spark 2.10. When using the method shown in SparkBitcoinBlockDataSource, I get the following error; I think it might actually be a problem with my Spark setup:
java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
I'll try using gradle to build your project and run the examples.
Hi,
Thanks a lot for developing this library. I was going through blocks and the transactions contained within them. When I tried to get the hashes of transactions, I could not find them. By hash I mean the tx id, e.g. d77435177fe87b96d8e751221bfd1dfba9dc23e0cd521df59ed1725dba2c74f3 as in https://blockchain.info/tx/d77435177fe87b96d8e751221bfd1dfba9dc23e0cd521df59ed1725dba2c74f3
My code is stuck at this point:
val noOfTransactionPair = bitcoinBlocksRDD.map(e => {
  val txList = new ListBuffer[(String, Int, Int)]()
  val blk: BitcoinBlock = e._2
  val transactions: util.List[BitcoinTransaction] = blk.getTransactions
  for (x <- 0 to transactions.size() - 1) {
    val transaction: BitcoinTransaction = transactions.get(x)
    transaction.getTxHash // this does not exist?
    txList += ((blk.getTime.toString, transaction.getListOfInputs.size, transaction.getListOfOutputs.size()))
  }
  txList
})
Please let me know what I am missing.
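For context: the txid shown on block explorers is not a stored field of the transaction; it is the double SHA-256 of the serialized transaction bytes, displayed in reversed byte order. Whether the library exposes a helper for this is not asserted here; a generic sketch in plain Java, with placeholder bytes standing in for a real serialized transaction:

```java
import java.security.MessageDigest;

// Computes a Bitcoin-style txid: double SHA-256 of the serialized
// transaction, rendered as hex in reversed byte order.
public class TxId {
    static byte[] doubleSha256(byte[] data) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        return sha.digest(sha.digest(data));
    }
    static String toReversedHex(byte[] hash) {
        StringBuilder sb = new StringBuilder();
        for (int i = hash.length - 1; i >= 0; i--) {
            sb.append(String.format("%02x", hash[i]));
        }
        return sb.toString();
    }
    public static void main(String[] args) throws Exception {
        byte[] serializedTx = new byte[]{0x01, 0x00, 0x00, 0x00}; // placeholder bytes
        byte[] hash = doubleSha256(serializedTx);
        System.out.println(hash.length);         // 32
        System.out.println(toReversedHex(hash)); // 64 hex characters
    }
}
```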
EIP-155 (cf. https://github.com/ethereum/EIPs/blob/master/EIPS/eip-155.md) describes an alternative way to calculate the sender address (from) of a transaction. This was not supported, which led to wrong sender addresses for EIP-155 transactions.
fixed in 1.1.4
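The arithmetic from the EIP-155 spec linked above can be sketched as follows: for replay-protected transactions, v = recoveryId + chainId * 2 + 35, so both values can be recovered from v before running ECDSA public-key recovery (the helper names here are illustrative, not the library's):

```java
// EIP-155 arithmetic: v = recoveryId + chainId * 2 + 35 for
// replay-protected transactions; v in {27, 28} is the legacy encoding.
public class Eip155 {
    static long chainIdFromV(long v) {
        if (v == 27 || v == 28) return -1; // pre-EIP-155 transaction, no chain id
        return (v - 35) / 2;
    }
    static int recoveryIdFromV(long v) {
        if (v == 27 || v == 28) return (int) (v - 27);
        return (int) ((v - 35) % 2);
    }
    public static void main(String[] args) {
        System.out.println(chainIdFromV(37));    // 1 (mainnet)
        System.out.println(recoveryIdFromV(38)); // 1
        System.out.println(chainIdFromV(27));    // -1: legacy encoding
    }
}
```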
Emercoin is another cryptocurrency, but at the same time it enables - similar to Namecoin - the storing of key/value pairs.
Emercoin seems to be largely Bitcoin-based, so we may already be able to read it out of the box.
We need to add special utility functions (similar to the ones we added for Namecoin) to read key/value pairs in a user-friendly way, and provide UDFs.
Finally, we should provide examples including unit/integration tests.
AuxPow is a system used by altcoins (mainly Namecoin) that changes the structure of a block (cf. https://en.bitcoin.it/wiki/Merged_mining_specification).
This would require parsing additional information. It seems that AuxPow can probably be detected by checking that there is only "one transaction" (in fact this refers to one txinput), followed by 32 0x00 bytes and ending in a previous output index (prevTxOutIndex) of FF FF FF FF.
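The heuristic described above could be sketched roughly like this (an assumption about the detection logic, not verified library behaviour; in Bitcoin-family chains this pattern is what marks a coinbase input):

```java
import java.util.Arrays;

// Sketch of the coinbase-input pattern used in the detection heuristic:
// a previous-tx hash of 32 zero bytes and a previous output index of
// 0xFFFFFFFF.
public class CoinbaseCheck {
    static boolean looksLikeCoinbaseInput(byte[] prevTxHash, long prevTxOutIndex) {
        byte[] zeros = new byte[32];
        return prevTxHash.length == 32
                && Arrays.equals(prevTxHash, zeros)
                && prevTxOutIndex == 0xFFFFFFFFL;
    }
    public static void main(String[] args) {
        System.out.println(looksLikeCoinbaseInput(new byte[32], 0xFFFFFFFFL)); // true
        byte[] nonZero = new byte[32];
        nonZero[0] = 1;
        System.out.println(looksLikeCoinbaseInput(nonZero, 0xFFFFFFFFL));      // false
    }
}
```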
Ripple "is a network of computers which use the Ripple consensus algorithm to atomically settle and record transactions on a secure distributed database, the Ripple Consensus Ledger (RCL). Because of its distributed nature, the RCL offers transaction immutability without a central operator. The RCL contains a built-in currency exchange and its path-finding algorithm finds competitive exchange rates across order books and currency pairs":
https://github.com/ripple/rippled
This implies to add additional HadoopInputFormats to process the Ripple blockchain. Furthermore a HiveSerde and Spark datasource should be created. Unit tests must be included.
Finally, an example for MapReduce, Flink, Hive and Spark should be provided. The examples should include unit and integration tests.
JDK 7 has been out of maintenance for a long time, hence we need to drop support for it.
This also means we should check the new features of JDK 8, such as lambdas for the Spark examples in Java, unsigned longs, and, where applicable, parallel collections.
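Two of the mentioned JDK 8 features, as a minimal sketch of how they could apply here (on-chain values and indices are unsigned, and lambdas shorten the Java Spark examples):

```java
import java.util.Arrays;
import java.util.List;

// JDK 8 features relevant to the library: unsigned-long helpers and
// lambdas/streams instead of anonymous inner classes.
public class Jdk8Features {
    public static void main(String[] args) {
        // Unsigned interpretation of a 64-bit value whose high bit is set.
        long raw = 0xFFFFFFFFFFFFFFFFL;
        System.out.println(Long.toUnsignedString(raw)); // 18446744073709551615

        // Lambda + stream in place of an explicit loop.
        List<Integer> sizes = Arrays.asList(3, 1, 2);
        int total = sizes.stream().mapToInt(Integer::intValue).sum();
        System.out.println(total); // 6
    }
}
```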
This is a quality management release that includes:
All source files and packages shall contain SPDX metadata information, especially about licenses, copyrights, creator, package download information.
See here for more information about SPDX: https://spdx.org/about
The goal of this task is to automate the system testing of this library.
Automated testing should also report the time needed for each step in the test scenario.
Currently we have to deactivate mutation testing (pitest), because it does not yet support JUnit 5. We need to wait for a fix.
EIP-721 has been created in the spirit of EIP-20, but focuses on non-fungible tokens. This allows you to trade a huge variety of assets on the Ethereum blockchain, such as physical property, virtual collectables or negative-value assets (loans, burdens etc.).
The goal of this utility function is to facilitate analysis of those standardized types of tokens.
Includes unit tests and an example application.
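One building block such a utility would likely need is recognizing standardized calls by their function selector (the first 4 bytes of the keccak256 hash of the function signature at the start of a transaction's input data). A hypothetical sketch, assuming the well-known selector 0x23b872dd for "transferFrom(address,address,uint256)":

```java
import java.util.Arrays;

public class Erc721SelectorCheck {
    // Selector = first 4 bytes of keccak256("transferFrom(address,address,uint256)")
    private static final byte[] TRANSFER_FROM_SELECTOR =
            {(byte) 0x23, (byte) 0xb8, (byte) 0x72, (byte) 0xdd};

    // Hypothetical helper (not part of the library): check whether a
    // transaction's input data starts with the transferFrom selector.
    public static boolean isTransferFromCall(byte[] txInputData) {
        return txInputData != null && txInputData.length >= 4
                && Arrays.equals(Arrays.copyOfRange(txInputData, 0, 4),
                                 TRANSFER_FROM_SELECTOR);
    }
}
```

Note that ERC-20's transferFrom has the same signature and therefore the same selector, so a real utility would need additional contract-level context (e.g. an ERC-165 interface check) to distinguish the two token standards.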
Convert to new Hadoop API:
http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api
I am trying to build and run spark-submit according to this wiki:
https://github.com/ZuInnoTe/hadoopcryptoledger/wiki/Using-Spark-Scala-Graphx-to-analyze-the-Bitcoin-transaction-graph
spark-submit --class org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph --master local[8] ./target/scala-2.10/example-hcl-spark-scala-graphx-bitcointransaction.jar /user/bitcoin/input /user/bitcoin/output
It appears that a method such as extractTransactionData is not found. How can I solve this? Thanks very much.
18/06/07 09:54:24 ERROR executor.Executor: Exception in task 10.0 in stage 0.0 (TID 10)
java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
at org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph$.extractTransactionData(SparkScalaBitcoinTransactionGraph.scala:108)
at org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph$$anonfun$1.apply(SparkScalaBitcoinTransactionGraph.scala:60)
at org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinTransactionGraph$$anonfun$1.apply(SparkScalaBitcoinTransactionGraph.scala:60)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
JUnit 5 was published in September 2017 (cf. http://junit.org/junit5/). We should migrate to it, since it is the new maintained version and it has enhanced support for JDK8.
We have to check whether coverage via JaCoCo works with JUnit 5.
This is about adding examples for calculating transaction fees:
Hive: here you need to explode the transactions into a table, calculate their current hash, and join the inputs of a transaction based on the hash and input index. The difference between the total sum of input values and the total sum of output values corresponds to the transaction fee.
Flink: similar to Hive, using Flink SQL
Spark: similar to Hive, using Spark SQL
MapReduce: here a more detailed MapReduce job for calculating transaction fees needs to be specified
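The fee calculation itself is the same in all four variants. A minimal sketch with hypothetical plain types, assuming each input has already been joined to the output it spends (via previous transaction hash and output index):

```java
import java.util.Arrays;
import java.util.List;

public class TxFee {
    // Hypothetical sketch: once each input of a transaction has been joined
    // to the output it spends, the fee is simply the total input value
    // minus the total output value.
    public static long fee(List<Long> joinedInputValues, List<Long> outputValues) {
        long totalIn = joinedInputValues.stream().mapToLong(Long::longValue).sum();
        long totalOut = outputValues.stream().mapToLong(Long::longValue).sum();
        return totalIn - totalOut;
    }

    public static void main(String[] args) {
        // Inputs worth 100 + 50 satoshi, one output worth 140 satoshi -> fee 10
        System.out.println(fee(Arrays.asList(100L, 50L), Arrays.asList(140L)));
    }
}
```

In the SQL variants (Hive, Flink SQL, Spark SQL) the same difference would be expressed as a grouped aggregation over the joined input and output tables.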
Spark 2.2+ transitively requires Bouncy Castle version 1.51, which conflicts with version 1.58 required by the inputformat. I would suggest shading 1.58 and assembling it into the JAR to avoid this issue (if possible).
Reference: ZuInnoTe/spark-hadoopcryptoledger-ds#9
Attached is a file with the un-parsable block:
447533.zip