Giter Club home page Giter Club logo

salt-core's Introduction

Salt  Build Status Coverage Status

http://uncharted.software/salt

Getting Started

Please note that this README represents an unreleased, bleeding-edge version of Salt. For up-to-date stable docs, please navigate to the appropriate release tag or visit the Salt website.

Examples

Example projects can be found at salt-examples.

Usage

Source data

We need something to tile. Let's start with a small sample of the NYC Taxi Dataset, which can be downloaded from here.

Tile generation, made easy!

Let's generate tiles which represent the mean number of passengers at each pickup location in the source dataset.

To begin, we'll need a spark-shell. If you have your own Spark cluster, skip ahead to Generation. Otherwise, continue to the next step to fire up a small Spark test cluster via Docker. You'll want at least 4GB of free RAM on your machine to use this latter method.

Using the Docker test container

Running Salt requires a Spark cluster.

Start and attach to the container that was just created:

$ ./test-environment
$ ./test-environment attach

You can remove the container at any time by running:

$ ./test-environment rm

Keep the container running! We'll need it to try the following example.

Generation

Attach to the container using ./test-environment attach. Then download the sample data file and launch a spark-shell with salt to try this example:

$ curl -OL http://assets.oculusinfo.com/pantera/taxi_micro.csv
$ spark-shell --packages "software.uncharted.salt:salt-core:4.1.0"

Now it's time to run a simple tiling job! Enter paste mode (:paste), and paste the following script:

import software.uncharted.salt.core.projection.numeric._
import software.uncharted.salt.core.generation.Series
import software.uncharted.salt.core.generation.TileGenerator
import software.uncharted.salt.core.analytic._
import software.uncharted.salt.core.generation.request._
import software.uncharted.salt.core.analytic.numeric._
import java.sql.Timestamp
import org.apache.spark.sql.Row

// source RDD
// It is STRONGLY recommended that you filter your input RDD
// down to only the columns you need for tiling.
val rdd = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("taxi_micro.csv")
  .select("pickup_lon", "pickup_lat", "passengers")
  .rdd

// cache the RDD to make things a bit faster
rdd.cache

// We use a value extractor function to retrieve data-space coordinates from rows
// In this case, that's column 0 (pickup_time, converted to a double millisecond value) and column 1 (distance)
val cExtractor = (r: Row) => {
  if (r.isNullAt(0) || r.isNullAt(1)) {
    None
  } else {
    Some((r.getDouble(0), r.getDouble(1)))
  }
}

// create a projection from data-space into mercator tile space, which is suitable for
// display on top of a map using a mapping library such as leaflet.js
// we specify the zoom levels we intend to support using a Seq[Int]
val projection = new MercatorProjection(Seq(0,1))

// a value extractor function to grab the number of passengers from a Row
val vExtractor = (r: Row) => {
  if (r.isNullAt(2)) {
    None
  } else {
    Some(r.getInt(2).toDouble)
  }
}

// A series ties the value extractors, projection and bin/tile aggregators together.
// We'll be tiling average passengers per bin, and max/min of the bin averages per tile
// We'll also divide our tiles into 8x8 bins so that the output is readable. We specify
// this using the maximum possible bin index, which is (7,7)
val avgPassengers = new Series((7, 7), cExtractor, projection, vExtractor, MeanAggregator, Some(MinMaxAggregator))

// which tiles are we generating? In this case, we'll use a TileSeqRequest
// which allows us to specify a list of tiles we're interested in, by coordinate.
// these tiles should be within the bounds of the Projection we created earlier
val request = new TileSeqRequest(Seq((0,0,0), (1,0,0)))

// Tile Generator object, which houses the generation logic
@transient val gen = TileGenerator(sc)

// Flip the switch by passing in the series and the request
// Note: Multiple series can be run against the source data
// at the same time, so a Seq[Series] is also supported as
// the second argument to generate()
val result = gen.generate(rdd, avgPassengers, request)

// Try to read some values from bins, from the first (and only) series
println(result.map(t => (avgPassengers(t).get.coords, avgPassengers(t).get.bins)).collect.deep.mkString("\n"))

Salt Library Contents

Salt is made of some simple, but vital, components:

Projections

A projection maps from data space to the tile coordinate space.

Salt currently supports three projections:

  • CartesianProjection (x, y, v)
  • MercatorProjection (x, y, v)
  • SeriesProjection (x, v)

Aggregators

Aggregators are used to aggregate values within a bin or a tile.

Salt includes seven sample aggregators:

  • CountAggregator
  • MaxAggregator
  • MinAggregator
  • MinMaxAggregator (for tile-level analytics)
  • MeanAggregator
  • SumAggregator
  • TopElementsAggregator

Additional aggregators can be implemented on-the-fly within your script as you see fit.

Requests

Salt allows tile batches to be phrased in several ways:

  • TileSeqRequest (built from a Seq[TC] of tile coordinates, requesting specific tiles)
  • TileLevelRequest (built from a Seq[Int] of levels, requesting all tiles at those levels)

Series

A Series pairs together a Projection with Aggregators. Multiple Series can be generated simultaneously, each operating on the source data in tandem.

salt-core's People

Contributors

ghnuberath avatar nkronenfeld avatar

Watchers

James Cloos avatar Peter Clarke avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.