aws-emr's Introduction

AWS EMR Spark

Steps for Deploying Spark App on Amazon EMR

  • Step 1

    Test your application locally in Scala IDE with sample data.

  • Step 2

    • Remove all local file paths and the local master setting on the SparkContext from the Scala file. On EMR the master is supplied by spark-submit.

    • Use SBT to package your application

      • Create an empty directory named sbt
      • Inside it, run: sbt new scala/hello-world.g8
      • Add your Scala files under the sbt\movies\src\main\scala directory
      • Edit sbt\movies\build.sbt
    • name := "MostRatedMovies100k"
      
      version := "1.0"
      
      organization := "com.forsynet.sparkemr"
      
      scalaVersion := "2.11.12"
      
      libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.4.5" % "provided"
      )
    • At the command prompt, run "sbt assembly"

    • This creates a single assembly jar at sbt\movies\target\scala-2.11\MostRatedMovies100k-assembly-1.0.jar. Spark itself is marked "provided" in build.sbt, so it is not bundled; the EMR cluster supplies it at run time.
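The "assembly" task used above is not built into sbt; it comes from the sbt-assembly plugin, which has to be declared in project/plugins.sbt. A minimal sketch of that setup follows. The function names are hypothetical, and the plugin version (0.14.10) is an assumption rather than something taken from the original project, so adjust it to your sbt release.

```shell
# Sketch: enable sbt-assembly for a project, then build the assembly jar.
# The plugin version is an assumption; choose one that matches your sbt release.

# Write project/plugins.sbt so that the "assembly" task becomes available.
enable_assembly_plugin() {
  local project_dir="$1"
  mkdir -p "${project_dir}/project"
  printf '%s\n' 'addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")' \
    > "${project_dir}/project/plugins.sbt"
}

# Run the assembly task and list the jar it produced (Scala 2.11 output dir).
build_assembly_jar() {
  local project_dir="$1"
  (cd "${project_dir}" && sbt assembly && ls target/scala-2.11/*-assembly-*.jar)
}

# Example, run from the directory that contains the project:
#   enable_assembly_plugin movies
#   build_assembly_jar movies
```

Only enable_assembly_plugin touches the file system directly; build_assembly_jar needs sbt on the PATH.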

  • Step 3

    • Upload the jar file and the data files to an S3 bucket. Use the console UI or the CLI commands below.

    • aws configure
      aws s3api create-bucket --bucket rupeshemr
      aws s3 sync data/ s3://rupeshemr/
      
      • Verify that the data was uploaded to the S3 bucket
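The verification can also be done from the CLI. The sketch below wraps two existing AWS CLI calls, "aws s3 ls" and "aws s3api head-object", in helper functions; the function names are hypothetical, and the bucket and key in the example are the ones used elsewhere in this README.

```shell
# Sketch: confirm that the upload reached the bucket.

# List every object in the bucket with sizes and a total count.
list_bucket() {
  local bucket="$1"
  aws s3 ls "s3://${bucket}/" --recursive --human-readable --summarize
}

# Succeed only if the given key exists in the bucket.
object_exists() {
  local bucket="$1"
  local key="$2"
  aws s3api head-object --bucket "${bucket}" --key "${key}" > /dev/null
}

# Example:
#   list_bucket rupeshemr
#   object_exists rupeshemr MostRatedMovies-1.0.jar && echo "jar is in S3"
```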

Create an Amazon EMR cluster

  • Step 4

  • Use the AWS CLI or the console UI to create the cluster.

  • aws emr create-cluster \
        --applications Name=Spark \
        --instance-type m3.xlarge \
        --release-label emr-5.10.0 \
        --service-role EMR_DefaultRole \
        --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \
        --security-configuration mySecurityConfiguration \
        --kerberos-attributes file://kerberos_attributes.json
    • Verify that the cluster was created
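Cluster creation can be checked from the CLI as well. This sketch uses "aws emr describe-cluster" and the "aws emr wait cluster-running" waiter; the function names are hypothetical, and the cluster ID in the example is a placeholder for the ID that create-cluster prints.

```shell
# Sketch: check on a newly created EMR cluster.

# Print the current cluster state (for example STARTING, RUNNING, WAITING).
cluster_state() {
  local cluster_id="$1"
  aws emr describe-cluster --cluster-id "${cluster_id}" \
    --query 'Cluster.Status.State' --output text
}

# Block until the cluster is up, then print its state.
wait_for_cluster() {
  local cluster_id="$1"
  aws emr wait cluster-running --cluster-id "${cluster_id}" \
    && cluster_state "${cluster_id}"
}

# Example (placeholder cluster ID):
#   wait_for_cluster j-XXXXXXXXXXXXX
```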

  • Step 5

    • Add an inbound SSH rule (TCP port 22) to the master node's security group
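A possible CLI version of this step is sketched below. It assumes the cluster uses the EMR-managed master security group, and it opens port 22 only to the CIDR block you pass in. The function names, cluster ID, and address in the example are placeholders.

```shell
# Sketch: open SSH on the master node's security group.

# Look up the security group that EMR manages for the master node.
master_security_group() {
  local cluster_id="$1"
  aws emr describe-cluster --cluster-id "${cluster_id}" \
    --query 'Cluster.Ec2InstanceAttributes.EmrManagedMasterSecurityGroup' \
    --output text
}

# Allow inbound SSH (TCP 22) from a single CIDR block.
allow_ssh() {
  local group_id="$1"
  local cidr="$2"
  aws ec2 authorize-security-group-ingress --group-id "${group_id}" \
    --protocol tcp --port 22 --cidr "${cidr}"
}

# Example (placeholder cluster ID and address):
#   allow_ssh "$(master_security_group j-XXXXXXXXXXXXX)" 203.0.113.10/32
```

Restricting the rule to your own address (a /32) is safer than opening port 22 to 0.0.0.0/0.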

  • Step 6

  • SSH into the EMR master node

  • Copy the jar file from the S3 bucket

  • aws s3 cp s3://rupeshemr/MostRatedMovies-1.0.jar ./
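One way to reach the master node is sketched below. It assumes the cluster was launched with an EC2 key pair (the create-cluster command above does not set one, so this is an assumption) and that you log in as the default "hadoop" user. The function names, cluster ID, and key file are placeholders.

```shell
# Sketch: find the master node's public DNS name and SSH into it.

# Print the public DNS name of the master node.
master_dns() {
  local cluster_id="$1"
  aws emr describe-cluster --cluster-id "${cluster_id}" \
    --query 'Cluster.MasterPublicDnsName' --output text
}

# Open an SSH session to the master node as the hadoop user.
ssh_to_master() {
  local cluster_id="$1"
  local key_file="$2"
  ssh -i "${key_file}" "hadoop@$(master_dns "${cluster_id}")"
}

# Example (placeholder cluster ID and key file):
#   ssh_to_master j-XXXXXXXXXXXXX ~/mykey.pem
```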
    

Submit the Spark Job

  • Step 7

    • spark-submit MostRatedMovies-1.0.jar
  • Verify the list of top-rated movies in the job output

  • Use the Spark History Server UI to see the Spark Job History for submitted job

  • Check the Amazon S3 bucket for the logs created by the job
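The submission and log check can be sketched as follows. The sketch states the YARN master explicitly and passes any extra arguments through to the application. The log listing assumes the cluster was created with a log URI, which the create-cluster command above does not set. It also rewrites an s3n:// scheme to s3://, on the assumption that describe-cluster may report the log location in that form. The function names and cluster ID are placeholders.

```shell
# Sketch: submit the job on the master node and inspect its logs in S3.

# Submit the jar to YARN; extra arguments are passed to the application.
submit_job() {
  local jar="$1"
  shift
  spark-submit --master yarn --deploy-mode client "${jar}" "$@"
}

# Rewrite an s3n:// URI to s3:// so that "aws s3" accepts it.
normalize_log_uri() {
  printf '%s\n' "$1" | sed 's#^s3n://#s3://#'
}

# List the log files stored under the cluster's log URI.
list_job_logs() {
  local cluster_id="$1"
  local log_uri
  log_uri="$(aws emr describe-cluster --cluster-id "${cluster_id}" \
    --query 'Cluster.LogUri' --output text)"
  aws s3 ls "$(normalize_log_uri "${log_uri}")" --recursive
}

# Example (placeholder cluster ID):
#   submit_job MostRatedMovies-1.0.jar
#   list_job_logs j-XXXXXXXXXXXXX
```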

aws-emr's People

Contributors

rupeshtr78
