QCon 2015

Materials for the QCon San Francisco 2015 workshop. The goal for the day is to learn to use Spark, H2O, and Sparkling Water to build smart applications driven by machine learning models. The tutorials cover:

  • How to clean and munge data in Spark and H2O.
  • How to read in multiple datasets and join them to provide more features to the machine learning process.
  • How to use MLlib in conjunction with H2O's library of algorithms, using Sparkling Water to get the best of both platforms.
  • How to integrate the scoring engine from your Sparkling Water script into Spark Streaming to produce real-time predictions.
  • How to deploy smarter applications on top of Spark.
  • How to deploy simple models.

Outline

  1. Spark & Sparkling Water Introduction
    • H2O and Spark intro
    • Sparkling Water intro
    • Installation and setup of Spark
      • Running Spark shell
    • Installation and setup of Sparkling Water
    • Basic architecture and overview of functionalities
    • Hands-on demonstration of Sparkling Water
      • Running Sparkling Shell
  2. Simple Spam Detector
    • Use Spark to tokenize text
    • Use MLlib's TF-IDF model to transform the data into a table
    • Build a GBM model to label incoming text as spam or not spam (ham)
  3. Ask Craig(list) Application
    • Build a classifier to label job descriptions with the appropriate industry category
    • Deploy it as a Spark application
  4. Standalone application concepts
    • Deploy the classification model inside Spark Streaming
  5. Spark Streaming and Model Deployment
    • Loading a saved H2O binary model
    • Exposing the model via a Spark stream
  6. Spark Streaming and Model Deployment #2
    • Using an exported POJO model in a Spark stream
  7. Final Application
    • Assembling the final application: combining the front end and back end
  8. Lending Club Example
    • A smart app predicting loan interest
    • Off-line training pipeline driven from R
    • POJO models exposed via REST API
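
A minimal sketch of the tokenize-then-TF-IDF step from the Simple Spam Detector part of the outline, written in plain Python so it runs without a Spark cluster. It is not the workshop's code and does not call MLlib. The function names `tokenize` and `tf_idf_table`, the sample messages, and the smoothed IDF formula log((N + 1) / (df + 1)) are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf_table(messages):
    """Return (vocabulary, rows); each row holds TF-IDF weights for one message."""
    docs = [tokenize(m) for m in messages]
    vocabulary = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of messages that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency (assumed formula, see lead-in).
    idf = {tok: math.log((n_docs + 1) / (df[tok] + 1)) for tok in vocabulary}
    rows = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts for this message
        rows.append([tf[tok] * idf[tok] for tok in vocabulary])
    return vocabulary, rows


# Hypothetical sample messages, not taken from the workshop data.
messages = [
    "Win a FREE prize now",
    "Are we still meeting for lunch",
    "FREE entry to win cash now",
]
vocabulary, table = tf_idf_table(messages)
```

Each row of `table` is a fixed-width numeric feature vector, which is the shape a classifier such as GBM expects as input. Tokens that appear in several messages, such as "free", get a lower weight than tokens unique to one message.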

Requirements

  • Mac OS X or Linux
  • Java 7
  • Spark 1.5+
  • Sparkling Water 1.5.6
  • IntelliJ IDEA development environment
  • Scala SDK 2.10.4 for IDEA (can be fetched from the Ivy cache)
  • Maven dependencies (fetched by Gradle)

Goals

  • Get familiar with Spark
  • Understand Sparkling Water
  • Combine the power of Spark MLlib and the Sparkling Water library to write machine learning flows
  • Write a standalone Spark/Sparkling Water application
  • Deploy applications on a Spark cluster
  • Deploy models

Contributors

  • mmalohlava