docker_spark_history_ui

A dockerised version of the Spark History Server, which lets us view metrics in the Spark UI from event logs generated by AWS Glue.

Background

AWS recently announced that it's possible to monitor and troubleshoot Glue ETL jobs in the Spark UI.

At first glance, the docs seem to suggest it's as simple as including:

'--enable-spark-ui': 'true',
'--spark-event-logs-path': 's3://s3-event-log-path'

in the job config options.

On more careful reading of the docs, it becomes clear that the Spark UI is not provided automatically as part of the AWS Glue GUI. Instead, AWS provides a CloudFormation template that allows you to run the Spark History Server. See here for a description. We probably do not want to use this, as it is another piece of complexity in the platform that we would need to look after.

Essentially, all AWS Glue does is output event logs in the format required by the Spark History Server. This means an alternative option is to run the Spark History Server locally and load the logs generated by Glue into this locally running server.

This repo provides a dockerised version of this server to make it as easy as possible to get up and running.

Instructions

1. Set your glue job going

Glue job

In your glue job, you need to enable the following options (this code uses etl manager):

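from etl_manager.etl import GlueJob

# bucket and my_role are assumed to be defined earlier in your script
job = GlueJob('my_dir/', bucket=bucket, job_role=my_role,
              job_arguments={'--enable-spark-ui': 'true',
                             '--spark-event-logs-path': 's3://my-bucket/path-where-i-want-logs-to-go'})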
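
If you set these options on several jobs, it may help to build the arguments in one place. The following is a minimal sketch, not part of this repo or of etl manager: `spark_ui_job_arguments` is a hypothetical helper name, and the `s3://` prefix check is an assumption about what a sensible log path looks like.

```python
def spark_ui_job_arguments(event_log_path):
    """Build the two Glue job arguments that switch on Spark UI event logging.

    Hypothetical helper for illustration only.
    """
    if not event_log_path.startswith("s3://"):
        # Assumption: the event log location is always an S3 URI
        raise ValueError(f"Expected an s3:// path, got: {event_log_path!r}")
    return {
        "--enable-spark-ui": "true",
        "--spark-event-logs-path": event_log_path,
    }
```

The returned dictionary could then be passed as `job_arguments` when creating the job, merged with any other arguments the job needs.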

2. Clone this repo

git clone git@github.com:moj-analytical-services/docker_spark_history_ui.git
cd docker_spark_history_ui

3. Build the Docker image from this repo's Dockerfile

docker build -t sparkhistoryserver .

4. Copy the events from the job to a local events folder

mkdir events
aws s3 sync s3://my-bucket/path-where-i-want-logs-to-go events
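
Before starting the server, it can be useful to check that the sync actually produced usable logs. Spark event logs are newline-delimited JSON, with each record carrying an `Event` field. The sketch below counts event types per file; it assumes the files in `events` are uncompressed, and the function names are illustrative rather than part of this repo.

```python
import json
from collections import Counter
from pathlib import Path


def count_spark_events(log_path):
    """Count events by type in one Spark event log file."""
    counts = Counter()
    with open(log_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            counts[record.get("Event", "<unknown>")] += 1
    return counts


def summarise_events_dir(events_dir="events"):
    """Return {file name: Counter of event types} for each file in events_dir."""
    return {
        path.name: count_spark_events(path)
        for path in sorted(Path(events_dir).iterdir())
        if path.is_file()
    }
```

Calling `summarise_events_dir("events")` after the sync should return one entry per log file. An empty result suggests the job has not written any logs to that S3 path yet.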

5. Run the spark history server and navigate to the web frontend

docker run -v ${PWD}/events:/tmp/spark-events -p 18080:18080 sparkhistoryserver

and go to http://127.0.0.1:18080 in your web browser
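
Besides the web frontend, the Spark History Server exposes a REST API under `/api/v1`, which is handy for confirming that your logs were picked up. This is a minimal sketch using only the Python standard library; it assumes the container from the previous command is running on port 18080, and the helper names are illustrative.

```python
import json
from urllib.request import urlopen


def applications_url(base_url="http://127.0.0.1:18080"):
    """Build the URL of the history server's application listing endpoint."""
    return base_url.rstrip("/") + "/api/v1/applications"


def list_applications(base_url="http://127.0.0.1:18080", timeout=5.0):
    """Fetch the applications the history server has loaded, as a list of dicts."""
    with urlopen(applications_url(base_url), timeout=timeout) as response:
        return json.load(response)
```

With the server running, `list_applications()` should return one dictionary per Spark application found in the mounted events folder, each including fields such as `id` and `name`.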

Notes on how I made this work

Note that in the Dockerfile, I set the SPARK_NO_DAEMONIZE environment variable, see here. This keeps the history server running in the foreground. Otherwise the container exits soon after it starts.
