scala-spark-examples's Introduction

Spark and Data Science on the JVM

#Nd4j

Nd4j is a scientific computing framework for the JVM, meant to emulate the coding practices of NumPy and MATLAB. JVM-based numerical code often relies on explicit loops and lacks n-dimensional arrays (ndarrays). Nd4j provides an interface to a number of BLAS implementations, such as jcublas and jblas.

##Spark

Deeplearning4j's Spark support trains with parallel iterative reduce and parameter averaging: the dataset is partitioned, and each partition is trained one mini-batch at a time. We expose an MLlib-like interface to the algorithms. The general idea is as follows:

  1. Instantiate a configuration the same way you would in deeplearning4j-core.

    // layerFactory and classifierOverride are assumed to be defined elsewhere
    val conf = new NeuralNetConfiguration.Builder().momentum(0.9)
      .activationFunction(Activations.tanh()).weightInit(WeightInit.VI)
      .optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)
      .iterations(100).visibleUnit(RBM.VisibleUnit.GAUSSIAN)
      .hiddenUnit(RBM.HiddenUnit.RECTIFIED).stepFunction(new GradientStepFunction())
      .nIn(4).nOut(3).layerFactory(layerFactory)
      .list(3).hiddenLayerSizes(3, 2).`override`(classifierOverride)
      .build()

  2. Set up a Spark conf and context as normal, then parallelize the data and train.

    // Local Spark master with 8 threads; AVERAGE_EACH_ITERATION is turned off
    val sparkConf = new SparkConf().setMaster("local[8]")
      .set(SparkDl4jMultiLayer.AVERAGE_EACH_ITERATION, "false")
      .set("spark.akka.frameSize", "100").setAppName("mnist")

    val sc = new JavaSparkContext(new SparkContext(sparkConf))

    // Load all 150 Iris examples as one DataSet, then normalize and shuffle
    val d: DataSet = new IrisDataSetIterator(150, 150).next
    d.normalizeZeroMeanZeroUnitVariance
    d.shuffle
    val next: java.util.List[DataSet] = d.asList

    // Distribute the examples, convert them for the MLlib-style interface, and train
    val data: JavaRDD[DataSet] = sc.parallelize(next)
    val examples = MLLibUtil.fromDataSet(sc, data).rdd
    val network2: MultiLayerNetwork = SparkDl4jMultiLayer.train(examples, conf)
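
The parameter averaging idea described above can be illustrated without Spark or Deeplearning4j. The following is a minimal sketch under stated assumptions, not the library's implementation: plain Scala collections stand in for partitions, the model is a two-parameter linear fit, and every name (`ParameterAveragingSketch`, `Example`, `Params`, `trainLocally`, `average`, `fit`) is hypothetical. It averages after every round, whereas the example above turns `AVERAGE_EACH_ITERATION` off.

```scala
// Spark-free sketch of iterative-reduce parameter averaging.
// All names are hypothetical and not part of Deeplearning4j.
object ParameterAveragingSketch {
  // One training example for the linear model y ≈ w * x + b.
  final case class Example(x: Double, y: Double)

  // The parameters each worker trains locally.
  final case class Params(w: Double, b: Double)

  // Run `steps` gradient-descent steps on one partition, starting from `start`.
  def trainLocally(start: Params, batch: Seq[Example], lr: Double, steps: Int): Params =
    (1 to steps).foldLeft(start) { (p, _) =>
      val n = batch.size.toDouble
      // Gradients of the mean squared error with respect to w and b.
      val gradW = batch.map(e => 2 * (p.w * e.x + p.b - e.y) * e.x).sum / n
      val gradB = batch.map(e => 2 * (p.w * e.x + p.b - e.y)).sum / n
      Params(p.w - lr * gradW, p.b - lr * gradB)
    }

  // Reduce step: average the parameters returned by all workers.
  def average(all: Seq[Params]): Params = {
    require(all.nonEmpty, "need at least one set of parameters")
    Params(all.map(_.w).sum / all.size, all.map(_.b).sum / all.size)
  }

  // Each round: hand the current parameters to every partition,
  // train locally, then average the results into new global parameters.
  def fit(data: Seq[Example], partitions: Int, rounds: Int,
          lr: Double, localSteps: Int): Params = {
    val batchSize = math.max(1, math.ceil(data.size.toDouble / partitions).toInt)
    val batches = data.grouped(batchSize).toVector
    (1 to rounds).foldLeft(Params(0.0, 0.0)) { (global, _) =>
      average(batches.map(b => trainLocally(global, b, lr, localSteps)))
    }
  }
}
```

Calling `ParameterAveragingSketch.fit(data, partitions = 4, rounds = 500, lr = 0.1, localSteps = 5)` on noise-free points from y = 2x + 1 with x in [0, 1) should return parameters close to w = 2 and b = 1. In a Spark setting the `batches.map(...)` step would presumably run on the executors and `average` on the driver.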

Contributions:

Scala community! One of the things we need is a Scala wrapper for Nd4j. Operator overloading for INDArray is a huge priority for us.
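
To give a rough idea of what operator overloading could look like, here is a small self-contained sketch. It is only an illustration: `Vec` and `Vec.of` are hypothetical stand-ins backed by a plain Scala `Vector`, not Nd4j classes, and a real wrapper would presumably delegate to the underlying INDArray instead of reimplementing the arithmetic.

```scala
// Hypothetical sketch: `Vec` is NOT an Nd4j type, only a stand-in
// showing how Scala operators can replace method-call chains.
object OperatorSketch {
  final case class Vec(values: Vector[Double]) {
    // Combine two vectors element by element; lengths must match.
    private def zipWith(that: Vec)(f: (Double, Double) => Double): Vec = {
      require(values.length == that.values.length, "length mismatch")
      Vec(values.zip(that.values).map { case (a, b) => f(a, b) })
    }

    // Element-wise addition.
    def +(that: Vec): Vec = zipWith(that)(_ + _)

    // Element-wise multiplication.
    def *(that: Vec): Vec = zipWith(that)(_ * _)

    // Multiplication by a scalar.
    def *(k: Double): Vec = Vec(values.map(_ * k))

    // Dot product: sum of the element-wise products.
    def dot(that: Vec): Double = (this * that).values.sum
  }

  object Vec {
    // Convenience factory so vectors can be written as Vec.of(1.0, 2.0).
    def of(xs: Double*): Vec = Vec(xs.toVector)
  }
}
```

With this in scope, `Vec.of(1.0, 2.0, 3.0) + Vec.of(4.0, 5.0, 6.0)` reads like the mathematics it expresses, which is the kind of ergonomics a wrapper would aim for.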

We also need a Scala wrapper for Deeplearning4j. There are definitely better ways we could be doing things. Please subscribe to our Google Group: http://groups.google.com/forum/#!forum/deeplearning4j

Submit issues: https://github.com/SkymindIO/deeplearning4j/issues

Vote for our Hadoop Summit talk as well!

https://hadoopsummit.uservoice.com/forums/283261-data-science-and-hadoop/filters/top
