Apache Spark Examples

Overview

These examples were put together for talks on Apache Spark given by AgilData [http://www.agildata.com/].

[8/5/16] This repo has been updated to use Apache Spark 2.0. The original code for Apache Spark 1.6.x is available on this branch: https://github.com/AgilData/apache-spark-examples/tree/spark_1.6.x

Note that some of the Java examples do not currently work. They are left that way to highlight some of the issues with using Java with the DataFrame API, which our talk covers (we will update this README with a link to the slides soon).

US Census 2010

This repo contains code samples in both Java and Scala that perform some trivial analytics on US census data.

To download the full US census data for Colorado:

http://www2.census.gov/census_2010/04-Summary_File_1/Colorado/

Download the zip file and unzip it into a testdata directory within this project.

usgeo2010.txt contains geographic information in fixed-width format. For the examples in this repo we are only interested in the following fields:

s.substring(18,25),   // Logical Record No
s.substring(226,316), // Name
s.substring(8,11)     // Summary Level (050 is county)

For full documentation on the file formats, download http://www.census.gov/prod/cen2010/doc/sf1.pdf
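
As a rough illustration of the field extraction listed above, here is a minimal Java sketch that reads the three fields from one fixed-width line. The offsets come from the substring calls in this README. The class name GeoRecord, the helper sampleLine(), the trimming of padded fields, and the sample values are assumptions made for this example and are not taken from the repo's code.

```java
// Hypothetical helper class for illustration; not part of the repo.
class GeoRecord {
    final String logicalRecordNo;
    final String name;
    final String summaryLevel;

    GeoRecord(String logicalRecordNo, String name, String summaryLevel) {
        this.logicalRecordNo = logicalRecordNo;
        this.name = name;
        this.summaryLevel = summaryLevel;
    }

    // Extracts the three fields using the offsets listed in the README.
    // Trimming the padded fields is an assumption about how they are used.
    static GeoRecord parse(String s) {
        if (s.length() < 316) {
            throw new IllegalArgumentException("line too short: " + s.length());
        }
        return new GeoRecord(
            s.substring(18, 25).trim(),   // Logical Record No
            s.substring(226, 316).trim(), // Name
            s.substring(8, 11));          // Summary Level
    }

    // Summary level 050 identifies a county record.
    boolean isCounty() {
        return "050".equals(summaryLevel);
    }

    // Builds a made-up 316-character line so the sketch runs without the data file.
    static String sampleLine() {
        StringBuilder sb = new StringBuilder(new String(new char[316]).replace('\0', ' '));
        sb.replace(8, 11, "050");
        sb.replace(18, 25, "0000042");
        sb.replace(226, 238, "Adams County");
        return sb.toString();
    }

    public static void main(String[] args) {
        GeoRecord r = parse(sampleLine());
        System.out.println(r.logicalRecordNo + " | " + r.name + " | county=" + r.isCounty());
    }
}
```

With real data, parse would be applied to each line of the geographic file, and isCounty() could be used to keep only county-level records.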

Word Count Examples

This repo also contains the classic word count examples, in Java and Scala, with some minor modifications.

You can use any text file as input. In our talk we used the complete works of Shakespeare in plain-text format, which can be downloaded here:

http://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt
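
For readers who want to see the counting logic on its own, here is a minimal plain-Java sketch of a word count. It deliberately does not use Spark, so it is not the code from this repo. It only shows the same split, group, and count steps on an in-memory list or a local text file. The class name WordCountSketch, the lower-casing, the split on non-word characters, and the ISO-8859-1 file encoding are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class for illustration; runs without Spark.
class WordCountSketch {

    // Splits each line into lower-cased words and counts occurrences.
    // Splitting on non-word characters is an assumption; it also breaks
    // words at apostrophes.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
            .filter(word -> !word.isEmpty())
            .collect(Collectors.groupingBy(
                Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, read that file; otherwise use two sample lines.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1)
            : Arrays.asList("To be, or not to be", "that is the question");
        countWords(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Passing the path of a downloaded text file as the first argument counts the words in that file. In a Spark version the same steps would run as distributed transformations rather than over a local list.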

Contributors

andygrove
