jupyter-spark

Jupyter Notebook extension for Apache Spark integration.

Includes a progress indicator for the current Notebook cell if it invokes a Spark job. Queries the Spark UI service on the backend to get the required Spark job information.
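
As a rough illustration of what that backend query involves, the sketch below polls Spark's monitoring REST API and turns a job's task counts into a progress fraction. It assumes the Spark UI is reachable at http://localhost:4040 and that the `/api/v1/applications` and `/api/v1/applications/<app-id>/jobs` endpoints are available. The helper names are hypothetical, and this is not necessarily how the extension itself does it.

```python
import json
from urllib.request import urlopen

SPARK_UI = "http://localhost:4040"  # assumed default Spark UI address


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def running_jobs(base_url=SPARK_UI):
    """Collect the running jobs of every application the Spark UI reports."""
    jobs = []
    for app in fetch_json(f"{base_url}/api/v1/applications"):
        jobs_url = f"{base_url}/api/v1/applications/{app['id']}/jobs?status=running"
        jobs.extend(fetch_json(jobs_url))
    return jobs


def job_progress(job):
    """Return the fraction of completed tasks for one job (0.0 to 1.0)."""
    total = job.get("numTasks", 0)
    if total <= 0:
        return 0.0
    return min(job.get("numCompletedTasks", 0) / total, 1.0)
```

`running_jobs()` needs a live Spark application to talk to, while `job_progress` works on any job dictionary that carries `numTasks` and `numCompletedTasks`.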

To view all currently running jobs, click the "show running Spark jobs" button, or press Alt+S.

A proxied version of the Spark UI can be accessed at http://localhost:8888/spark.
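
Conceptually, a proxy like this maps each request path under /spark onto the corresponding path of the Spark UI. The following minimal sketch shows only that mapping. The function name and the assumed Spark address (http://localhost:4040) are illustrative, and the real handler may rewrite URLs differently.

```python
PROXY_PREFIX = "/spark"              # path served by the notebook server
SPARK_URL = "http://localhost:4040"  # assumed Spark UI address


def backend_url(request_path, spark_url=SPARK_URL, prefix=PROXY_PREFIX):
    """Translate a proxied request path into the matching Spark UI URL."""
    if request_path != prefix and not request_path.startswith(prefix + "/"):
        raise ValueError(f"{request_path!r} is not under {prefix!r}")
    # Drop the proxy prefix and append the remainder to the Spark UI base URL.
    remainder = request_path[len(prefix):].lstrip("/")
    return spark_url.rstrip("/") + "/" + remainder
```

With these defaults, `backend_url("/spark/jobs/")` resolves to the Spark UI's own `/jobs/` page.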

Installation

To install, simply run:

pip install jupyter-spark
jupyter serverextension enable --py jupyter_spark
jupyter nbextension install --py jupyter_spark
jupyter nbextension enable --py jupyter_spark
jupyter nbextension enable --py widgetsnbextension

The last step enables the widgetsnbextension extension that jupyter-spark depends on. Another extension may already have enabled it.

If these commands fail with configuration errors, try appending --user to each of them.

To double-check that the extension was installed correctly, run:

jupyter nbextension list
jupyter serverextension list

Feel free to also install lxml with your favorite package manager to improve the performance of the server-side communication with Spark, e.g.:

pip install lxml
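
An optional dependency of this kind is usually picked up with a fallback. The sketch below assumes the server-side code parses HTML through a library that accepts parser names such as "lxml" and "html.parser" (as Beautiful Soup does). That is an assumption about the implementation, and the function name is hypothetical.

```python
import importlib.util


def pick_html_parser():
    """Prefer lxml when it is installed, else use the standard library parser."""
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


PARSER = pick_html_parser()
```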

For development and testing, clone the project and run from a shell in the project's root directory:

pip install -e .
jupyter serverextension enable --py jupyter_spark
jupyter nbextension install --py jupyter_spark
jupyter nbextension enable --py jupyter_spark

To uninstall the extension, run:

jupyter serverextension disable --py jupyter_spark
jupyter nbextension disable --py jupyter_spark
jupyter nbextension uninstall --py jupyter_spark
pip uninstall jupyter-spark

Configuration

To change the URL of the Spark API from which the job metadata is fetched, override the Spark.url config value, e.g. on the command line:

jupyter notebook --Spark.url="http://localhost:4040"
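
When the notebook server is started from a script, the same option can be assembled programmatically. This is a small sketch with a hypothetical helper name. It only builds the argument list shown above and checks that the URL is an absolute HTTP(S) address.

```python
from urllib.parse import urlparse


def notebook_command(spark_url):
    """Build the `jupyter notebook` command line for a given Spark API URL."""
    parsed = urlparse(spark_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute HTTP(S) URL: {spark_url!r}")
    return ["jupyter", "notebook", f"--Spark.url={spark_url}"]


# To actually start the server (requires Jupyter and the extension):
# import subprocess
# subprocess.run(notebook_command("http://localhost:4040"), check=True)
```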

Changelog

0.3.0 (2016-07-04)

  • Rewrote proxy to use an async Tornado handler and HTTP client to fetch responses from Spark.

  • Simplified proxy processing to take Amazon EMR proxying into account.

  • Extended test suite to cover proxy handler, too.

  • Removed requests as a dependency.

0.2.0 (2016-06-30)

  • Refactored to fix a bunch of Python packaging and code quality issues

  • Added test suite for Python code

  • Set up continuous integration: https://travis-ci.org/mozilla/jupyter-spark

  • Set up code coverage reports: https://codecov.io/gh/mozilla/jupyter-spark

  • Added ability to override Spark API URL via command line option

  • IMPORTANT: Requires a manual step to enable the extension after running pip install (see the installation docs)!

    To update:

    1. Run pip uninstall jupyter-spark
    2. Delete spark.js from your nbextensions folder.
    3. Delete any references to jupyter_spark.spark in jupyter_notebook_config.json (in your .jupyter directory)
    4. Delete any references to spark in notebook.json (in .jupyter/nbconfig)
    5. Follow installation instructions to reinstall

0.1.1 (2016-05-03)

  • Initial release with a working prototype

Contributors

shaybeau731, jezdez, andershammar1, musicaljelly, mreid-moz, yeah568, niels-be, maurodoglio, vitillo
