h2o-sparkling

Makes interoperability between H2O and Spark trivial.

Requirements

  • Spark 1.0.0 (SQL component required)
  • Tachyon 0.4.1
  • Java 1.6+
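
As a rough illustration of the Java requirement above, the following shell sketch checks whether a version string satisfies "1.6+". The function name `java_at_least_1_6` is hypothetical and not part of h2o-sparkling. It assumes the legacy `1.x` numbering for older releases and plain major numbers (`9`, `11`, ...) for newer ones.

```shell
# Hypothetical helper (not part of h2o-sparkling): return 0 when the given
# Java version string is 1.6 or newer, non-zero otherwise.
java_at_least_1_6() {
    version=$1
    major=${version%%.*}                 # text before the first dot
    if [ "$major" = "1" ]; then
        # Legacy scheme such as 1.6.0_45: compare the second component.
        rest=${version#1.}
        minor=${rest%%[!0-9]*}
        [ "${minor:-0}" -ge 6 ]
    else
        # Newer scheme such as 11.0.2 or 17: any release from 9 on qualifies.
        major=${major%%[!0-9]*}
        [ "${major:-0}" -ge 9 ]
    fi
}
```

One possible way to feed it the installed version, assuming `java -version` prints a line of the form `... version "X"`:

```shell
java_at_least_1_6 "$(java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)" \
    || echo "Java 1.6 or newer is required" >&2
```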

Installation

  • First compile the latest version of Spark with the SQL component
git clone spark
cd spark
sbt/sbt assembly publish-local
  • Then build the h2o-sparkling-demo assembly
cd h2o-sparkling-demo
sbt assembly

Note: The assembly stage is important because the demo is a Spark driver that sends a jar file containing the implementation of the job to be run.
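
Because the jar must exist before the driver can send it, a quick check before running can save a failed run. The sketch below is only an illustration: `find_assembly_jar` is a hypothetical helper, and it assumes the jar lands somewhere under the project's `target` directory with `assembly` in its file name (the usual sbt-assembly default, not something this README states).

```shell
# Hypothetical helper (not part of h2o-sparkling): print the path of an
# assembly jar under <project>/target, or fail when none has been built.
# Assumption: sbt-assembly writes a *assembly*.jar somewhere below target/.
find_assembly_jar() {
    project_dir=${1:-h2o-sparkling-demo}
    jar=""
    if [ -d "$project_dir/target" ]; then
        jar=$(find "$project_dir/target" -type f -name '*assembly*.jar' | head -n 1)
    fi
    if [ -z "$jar" ]; then
        echo "no assembly jar under $project_dir/target; run 'sbt assembly' first" >&2
        return 1
    fi
    printf '%s\n' "$jar"
}
```

For example, `find_assembly_jar h2o-sparkling-demo` prints the jar path when the build has been run and returns a non-zero status otherwise.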

Run demo

Run local version

For this run no Spark cloud is required:

  • Start an H2O instance that embeds the Spark driver
cd h2o-sparkling-demo
sbt "run --local"

Run distributed version

For this run a Spark cloud is required:

  • Start a master and one worker on the local node
cd spark/sbin
./start-master.sh
./start-slave.sh 1 "spark://localhost:7077"
  • Assemble the h2o-sparkling-demo jar file, which the driver sends to the Spark cloud, then run the demo
cd h2o-sparkling-demo
sbt assembly
sbt "run --remote"

Run additional H2O node

cd h2o-sparkling-demo
sbt runH2O

Select different RDD2Frame extractor

Currently the demo supports three extractors:

  • dummy - pulls all data into the driver and creates a frame
  • file - asks Spark to save the RDD as a file on the local filesystem, then parses the stored file
  • tachyon - asks Spark to save the RDD to the Tachyon filesystem, then H2O loads the file from Tachyon

The extractor can be selected via the --extractor command line parameter, e.g., --extractor=tachyon
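
As a small illustration of the parameter above, this shell sketch rejects unknown extractor names before they reach the demo. The function `extractor_flag` is hypothetical and not part of h2o-sparkling. Only the three extractor names and the `--extractor` parameter come from this README.

```shell
# Hypothetical helper (not part of h2o-sparkling): print the --extractor
# flag for a supported extractor name, or fail for anything else.
extractor_flag() {
    case "$1" in
        dummy|file|tachyon)
            printf '%s\n' "--extractor=$1"
            ;;
        *)
            echo "unknown extractor: $1 (expected dummy, file, or tachyon)" >&2
            return 1
            ;;
    esac
}
```

It could then be used when launching the demo, for example `sbt "run --local $(extractor_flag file)"`.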

Running with Tachyon

  • Start Tachyon
cd tachyon/bin
./tachyon-start.sh

Example

Run the demo with the Tachyon-based extractor against a remote Spark cloud:

cd h2o-sparkling-demo
sbt assembly
sbt "run --remote --extractor=tachyon"

Run the airlines demo with the file-based extractor against a remote Spark cloud running at a non-default location:

sbt "run --remote --sparkMaster=spark://localhost:17077 --noshutdown --demo=airlines --extractor=file"
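
When several options are combined as in the example above, it can help to assemble the command line in one place. The following is a minimal sketch, not part of h2o-sparkling: `demo_command` is a hypothetical helper that only prints the command. It passes every option through as a `--OPTION` flag without checking that the demo actually supports it.

```shell
# Hypothetical helper (not part of h2o-sparkling): print the sbt command
# line for the demo.
# Usage: demo_command MODE [OPTION...]
#   MODE is "local" or "remote"; each OPTION becomes a --OPTION flag.
demo_command() {
    mode=$1
    case "$mode" in
        local|remote) ;;
        *)
            echo "mode must be local or remote" >&2
            return 1
            ;;
    esac
    shift
    args="run --$mode"
    for option in "$@"; do
        args="$args --$option"
    done
    printf 'sbt "%s"\n' "$args"
}
```

Calling `demo_command remote sparkMaster=spark://localhost:17077 noshutdown demo=airlines extractor=file` prints the airlines command shown above, which can be reviewed before it is run.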

Doc

Contributors

mmalohlava, srisatish