
pixiedust's Introduction

PixieDust


PixieDust is a productivity tool for Python or Scala notebooks that lets developers encapsulate business logic into something easy for their customers to consume.

New Book now available: Thoughtful Data Science

This book, published by Packt Publishing, is the user and developer reference for PixieDust.

Pixiedust developer community

Wait! There is a developer community? Yes, there is! If you are already a member, log in. If you would like to contribute, please join us.

Why you need it

Notebooks are a powerful tool for fast and flexible data analysis. But the learning curve is steep.

Python data science notebooks were first popularized in academia, and there are some formalities to work through before you can get to your analysis. For example, in a Python interactive notebook, a mundane task like creating a simple chart or saving data into a persistence repository requires mastery of complex code like this matplotlib snippet:

All this for a chart?

Once you do create a notebook that provides great data insights, it's hard to share with business users, who don't want to slog through all that dry, hard-to-read code, much less tweak it and collaborate.

PixieDust to the rescue.

What is PixieDust?

PixieDust is an open source helper library that works as an add-on to Jupyter notebooks to improve the user experience of working with data. It also fills a gap for users who have no access to configuration files when a notebook is hosted on the cloud.

Use in Python or Scala

PixieDust greatly simplifies working with Python display libraries like matplotlib, but works just as effectively in Scala notebooks too. You no longer have to compromise your love of Scala to generate great charts. PixieDust lets you bring robust Python visualization options to your Scala notebooks. Installer and instructions to use Scala with PixieDust are coming soon...

Features

PixieDust's current capabilities include:

  • packageManager lets you install Spark packages inside a Python notebook. Hosted Jupyter notebooks don't otherwise let you do this today, which keeps developers from using a large number of Spark package add-ons.

  • Visualizations. A single API called display() lets you visualize your Spark object in different ways: tables, charts, maps, and more. This module is designed to be extensible, providing an API that lets anyone easily contribute a new visualization plugin.

    This sample visualization plugin uses d3 to show the different flight routes for each airport:

    graph map

  • Embedded apps. Let nonprogrammers actively use notebooks. Transform a hard-to-read notebook into a polished graphic app for business users. Check out these preliminary sample apps:

    • An app can feature embedded forms and responses, like flightpredict, which lets users enter flight details to see the likelihood of landing on time.
    • Or it can present a sophisticated workflow, like our Twitter demo, which delivers a real-time feed of tweets, trending hashtags, and aggregated sentiment charts with Watson Tone Analyzer.
  • Extensibility. Create your own visualizations or apps using the PixieDust extensibility APIs. If you know HTML and CSS, you can write and deliver amazing graphics without forcing notebook users to type a single line of code. Use the shape of the data to control when PixieDust shows your visualization in a menu.

  • Export. Notebook users can download data as CSV, HTML, JSON, and other formats, either locally to their laptop or into a variety of back-end data sources such as Cloudant, dashDB, and GraphDB.

    save as options

  • Scala Bridge. Use Scala directly in your Python notebook. Variables are automatically transferred from Python to Scala and vice versa.

    Or start in a Scala notebook. As mentioned, all these PixieDust features work not only in Python, but in Scala too. So if you prefer Scala, you'll soon be able to start there and use PixieDust to insert sophisticated Python graphic options within your Scala notebook. Instructions coming soon.

  • Spark progress monitor. Track the status of your Spark job. No more waiting in the dark. Notebook users can now see how a cell's code is running behind the scenes.
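packageManager takes a package identifier such as graphframes:graphframes:0 (the argument passed to pixiedust.installPackage() in the ~/data/libs issue further down). A minimal sketch, assuming such identifiers follow the Maven groupId:artifactId:version convention, of how one maps onto the standard Maven repository layout. maven_jar_path is a hypothetical helper, not PixieDust's actual resolver, and Spark packages may be served from a repository other than Maven Central.

```python
def maven_jar_path(coordinates: str) -> str:
    """Map 'groupId:artifactId:version' to its standard Maven repository path.

    Hypothetical helper for illustration; it is not part of PixieDust.
    """
    parts = coordinates.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected groupId:artifactId:version, got {coordinates!r}")
    group_id, artifact_id, version = parts
    # Dots in the groupId become directory separators; the jar itself is
    # named artifactId-version.jar inside the version directory.
    return "/".join(
        [group_id.replace(".", "/"), artifact_id, version, f"{artifact_id}-{version}.jar"]
    )


print(maven_jar_path("org.apache.kafka:kafka-clients:0.9.0.0"))
# org/apache/kafka/kafka-clients/0.9.0.0/kafka-clients-0.9.0.0.jar
```

The result matches the tail of the kafka-clients jar URL used in the _tkinter issue below, which is a quick sanity check on the layout.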

Watch this video to see PixieDust in action:

about PixieDust

Usage

You can use PixieDust locally or online within IBM's Watson Studio.

Use online

To use PixieDust online

Use locally

PixieDust supports:

  • Spark 1.6 or 2.0
  • Python 2.7 or 3.5

Sample notebooks

Wherever you prefer to work, try out the following sample notebooks:

Tutorials

Contribute

Note: PixieDust currently supports Spark DataFrames, Spark GraphFrames and Pandas DataFrames, with more to come. If you can't wait, write your own today and contribute it back.

Read how to contribute for details on our code of conduct and instructions for submitting pull requests to us.

Developer Guide

Dive into the PixieDust developer docs and learn how to build your own custom visualization or embedded app. You can also pitch in and contribute an enhancement to PixieDust's core features.

We can't wait to see what you build.

License

Apache License, Version 2.0.

For details and all the legalese, read LICENSE.

pixiedust's People

Contributors

anirudhbedre, benhuds, chetnawarade, dtaieb, elgalu, esafak, eyaltrabelsi, glynnbird, hmaarrfk, jessmantaro, jordangeorge, kastentx, margrietgroenendijk, markwatsonatx, mikebroberg, milhcbt, mobilizer, natashadsilva, nicklondhe, pete-may, ptitzler, rajrsingh, slipperypenguin, sshuklao, vabarbosa


pixiedust's Issues

ImportError: No module named _tkinter

I just tried to import pixiedust and hit this error. I tried restarting the kernel, but I still get the same error.

ImportErrorTraceback (most recent call last)
<ipython-input-1-fa75a171b8de> in <module>()
      1 get_ipython().system(u'pip install --user --upgrade --quiet pixiedust')
----> 2 import pixiedust
      3 
      4 jars = ["http://central.maven.org/maven2/org/apache/kafka/kafka-clients/0.9.0.0/kafka-clients-0.9.0.0.jar",
      5         "http://central.maven.org/maven2/org/apache/kafka/kafka_2.10/0.9.0.0/kafka_2.10-0.9.0.0.jar",

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/pixiedust/__init__.pyc in <module>()
     29 uninstallPackage=packageManager.uninstallPackage
     30 
---> 31 import display
     32 import services
     33 

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/pixiedust/display/__init__.py in <module>()
     16 
     17 from .display import *
---> 18 import chart,graph,table,tests,download
     19 from pixiedust.utils.printEx import *
     20 import traceback

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/pixiedust/display/chart/__init__.py in <module>()
     15 # -------------------------------------------------------------------------------
     16 
---> 17 from .barChartDisplay import BarChartDisplay
     18 from .lineChartDisplay import LineChartDisplay
     19 from .scatterPlotDisplay import ScatterPlotDisplay

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/pixiedust/display/chart/barChartDisplay.py in <module>()
     16 
     17 from .display import ChartDisplay
---> 18 from .mpld3ChartDisplay import Mpld3ChartDisplay
     19 import matplotlib.pyplot as plt
     20 import numpy as np

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/pixiedust/display/chart/mpld3ChartDisplay.py in <module>()
     15 # -------------------------------------------------------------------------------
     16 
---> 17 from .baseChartDisplay import BaseChartDisplay
     18 from .display import ChartDisplay
     19 from .plugins.chart import ChartPlugin

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/pixiedust/display/chart/baseChartDisplay.py in <module>()
     22 from pyspark.sql.types import StructType
     23 import matplotlib.cm as cm
---> 24 import matplotlib.pyplot as plt
     25 import mpld3
     26 import mpld3.plugins as plugins

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/matplotlib/pyplot.py in <module>()
    112 
    113 from matplotlib.backends import pylab_setup
--> 114 _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
    115 
    116 _IP_REGISTERED = None

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/matplotlib/backends/__init__.pyc in pylab_setup()
     30     # imports. 0 means only perform absolute imports.
     31     backend_mod = __import__(backend_name,
---> 32                              globals(),locals(),[backend_name],0)
     33 
     34     # Things we pull in from all backends

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/matplotlib/backends/backend_tkagg.py in <module>()
      4 
      5 from matplotlib.externals import six
----> 6 from matplotlib.externals.six.moves import tkinter as Tk
      7 from matplotlib.externals.six.moves import tkinter_filedialog as FileDialog
      8 

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/matplotlib/externals/six.pyc in load_module(self, fullname)
    197         mod = self.__get_module(fullname)
    198         if isinstance(mod, MovedModule):
--> 199             mod = mod._resolve()
    200         else:
    201             mod.__loader__ = self

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/matplotlib/externals/six.pyc in _resolve(self)
    111 
    112     def _resolve(self):
--> 113         return _import_module(self.mod)
    114 
    115     def __getattr__(self, attr):

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/matplotlib/externals/six.pyc in _import_module(name)
     78 def _import_module(name):
     79     """Import module, returning the module after the last dot."""
---> 80     __import__(name)
     81     return sys.modules[name]
     82 

/usr/local/src/bluemix_jupyter_bundle.v22/notebook/lib/python2.7/lib-tk/Tkinter.py in <module>()
     37     # Attempt to configure Tcl/Tk without requiring PATH
     38     import FixTk
---> 39 import _tkinter # If this fails your Python may not be configured for Tk
     40 tkinter = _tkinter # b/w compat for export
     41 TclError = _tkinter.TclError

ImportError: No module named _tkinter
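The traceback shows pyplot loading matplotlib's TkAgg backend (backend_tkagg.py), which imports Tkinter, and this Python build has no _tkinter extension. One common workaround, sketched here on the assumption that no interactive plot windows are needed inside the notebook, is to select the non-interactive Agg backend before anything imports matplotlib.pyplot. MPLBACKEND is matplotlib's own environment variable (matplotlib 1.5 and later); whether it resolves this particular hosted environment has not been verified.

```python
import os

# Select the non-interactive Agg backend *before* matplotlib.pyplot is
# imported, whether directly or indirectly (e.g. via `import pixiedust`).
# Agg renders to in-memory images and never imports Tkinter.
os.environ["MPLBACKEND"] = "Agg"

try:
    import matplotlib
    import matplotlib.pyplot as plt  # with TkAgg this is where _tkinter failed
except ImportError:
    matplotlib = None  # matplotlib itself is not installed in this environment

if matplotlib is not None:
    print(matplotlib.get_backend())  # Agg (reported as "agg" by newer versions)
```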

Support Decimal type in charts

mpld3 uses Python's json library, which does not support serializing Decimal values.

Potential fixes:

  1. Override default json encoder (if possible)
  2. Convert all incoming Decimals to Doubles (may cause issues with precision)
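With the standard json module, fix 1 is possible: subclass json.JSONEncoder and override its default() hook, which the encoder calls only for objects it cannot serialize natively. A minimal sketch, assuming a float approximation is acceptable (the precision caveat from fix 2 still applies). DecimalEncoder is a hypothetical name, and whether mpld3 lets callers pass a custom encoder class is not confirmed here.

```python
import json
from decimal import Decimal


class DecimalEncoder(json.JSONEncoder):
    """JSON encoder that serializes Decimal values as floats (may lose precision)."""

    def default(self, o):
        if isinstance(o, Decimal):
            return float(o)
        # Defer to the base class, which raises TypeError for unsupported types.
        return super().default(o)


payload = {"price": Decimal("19.99"), "qty": 3}
print(json.dumps(payload, cls=DecimalEncoder))  # {"price": 19.99, "qty": 3}
```

Doing the conversion in the encoder rather than up front keeps the DataFrame values as Decimals and limits the precision loss to the serialized chart data.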

Line/Bar Chart - Impossible to have 2 lines / 2 bars?

I have a DataFrame with columns A, B and C.
I want A to be the x-axis.
I want B to be the size of the bar or the height of the line.
I want C to be the category, so that each value of C gets its own bar or line in a different color.

In the PixieDust options, I can see "keys" and "values", but nothing for categories.
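The requested layout amounts to splitting a "long" table into one series per value of C. A minimal, library-agnostic sketch of that grouping step in plain Python. The column names A, B and C come from the issue; split_series is a hypothetical helper, not an existing PixieDust option.

```python
from collections import defaultdict


def split_series(rows, x_col, y_col, category_col):
    """Group (x, y) points into one series per category value.

    Hypothetical helper: each returned series would be drawn as one line
    or one set of bars, in its own color.
    """
    series = defaultdict(list)
    for row in rows:
        series[row[category_col]].append((row[x_col], row[y_col]))
    return dict(series)


rows = [
    {"A": 1, "B": 10, "C": "red"},
    {"A": 2, "B": 12, "C": "red"},
    {"A": 1, "B": 7, "C": "blue"},
    {"A": 2, "B": 9, "C": "blue"},
]
print(split_series(rows, "A", "B", "C"))
# {'red': [(1, 10), (2, 12)], 'blue': [(1, 7), (2, 9)]}
```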

As a user, I want to easily remove the automatically selected keys and values

While exploring some of the sample notebooks, I realized that the default options often don't work for many chart types. For example, pick the million dollar homes data set and Google as the renderer: the error "incompatible data table: Error: table contains more columns than expected" is displayed. To change the chart options, I have to manually remove each pre-selected field, which is labor intensive for data sets with many fields. A reset/clear button for the options would be nice to have.

As a data scientist, I want to see raw, unaggregated values in scatter and line plots.

Expected behavior

Plot raw data in line and scatter plots. When the user selects a line or scatter chart, do not display the aggregate option. For scatter plots in matplotlib, the Options dialog does not show aggregation.

Actual behavior

Currently, the PixieDust Options dialog aggregates values into SUM, AVG, MIN, MAX and COUNT. These are helpful for bar charts, pie charts and histograms, but not for scatter and line plots. In fact, aggregation changes the course of the analysis, because the raw data is also important to look at.

Steps to reproduce the behavior

Select a chart type and open the Options dialog.
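A minimal sketch of the requested behavior, assuming points arrive as (x, y) pairs: scatter and line charts bypass aggregation, while the existing SUM, AVG, MIN, MAX and COUNT options still apply to other chart types. prepare_points is hypothetical. The example shows AVG collapsing two distinct readings at x = 1 into a single 3.5, which is exactly the information loss the issue describes.

```python
from collections import defaultdict
from statistics import mean


def prepare_points(points, chart_type, aggregate="AVG"):
    """Return the points to plot for a chart type (hypothetical helper).

    Scatter and line charts keep the raw (x, y) pairs; other chart types
    collapse all y values that share an x into one aggregated value.
    """
    if chart_type in ("scatter", "line"):
        return list(points)
    reducers = {"SUM": sum, "AVG": mean, "MIN": min, "MAX": max, "COUNT": len}
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    return [(x, reducers[aggregate](ys)) for x, ys in groups.items()]


points = [(1, 2), (1, 5), (2, 3)]
print(prepare_points(points, "scatter"))  # [(1, 2), (1, 5), (2, 3)]
print(prepare_points(points, "bar"))      # [(1, 3.5), (2, 3)]
```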

Plot options before rendering

It would be nice to select plotting options before rendering, since it can take a while to render large amounts of data. Currently you have to sit and wait for the default rendering to finish before you can select options. Perhaps a modal window could pop up when you select the chart type, before rendering begins.

Pixiedust Zeppelin Support

Hi,

I am working in Zeppelin. It can visualize data, but its visualizations are limited. Is it possible to add PixieDust to Zeppelin? I tried to integrate it with Zeppelin, but when I call display(dataframe), it displays the following:

{"data": {"text/plain": "<IPython.core.display.HTML object>", "text/html": "\n<script>\n //Marker 706ab2be\n setTimeout(function(){\n var cells=IPython.notebook.get_cells().filter(function(cell){\n if(!cell.output_area || !cell.output_area.outputs){\n return false;\n }\n return cell.output_area.outputs.filter(function(output){\n if (output.output_type===\"display_data\"&&output.data&&output.data[\"text/html\"]){\n return output.data[\"text/html\"].includes(\"//Marker 706ab2be\")\n }\n return false;\n }).length > 0;\n });\n if(cells.length>0){\n var cell=cells[0];\n var cellId=cell.cell_id;\n var cellMetadata=cell._metadata.pixiedust;\n //cell.output_area.clear_output(false, true);\n var old_msg_id = cell.last_msg_id;\n if (old_msg_id) {\n cell.kernel.clear_callbacks_for_msg(old_msg_id);\n }\n var snifferOptions = [];\n \n snifferOptions.push({'nostore_bokeh':!!window.Bokeh})\n \n \n \n \n !\nfunction() {\n cellId = typeof cellId === \"undefined\" ? \"\" : cellId;\n var curCell=IPython.notebook.get_cells().filter(function(cell){\n return cell.cell_id==\"cellId\".replace(\"cellId\",cellId);\n });\n curCell=curCell.length>0?curCell[0]:null;\n console.log(\"curCell\",curCell);\n var startWallToWall;\n //Resend the display command\n var callbacks = {\n shell : {\n payload : {\n set_next_input : function(payload){\n if (curCell){\n curCell._handle_set_next_input(payload);\n }\n }\n }\n },\n iopub:{\n output:function(msg){\n console.log(\"msg\", msg);\n if (true){\n curCell.output_area.handle_output.apply(curCell.output_area, arguments);\n curCell.output_area.outputs=[];\n return;\n }\n var msg_type=msg.header.msg_type;\n var content = msg.content;\n var executionTime = $(\"#execution706ab2be\");\n if(msg_type===\"stream\"){\n $('#wrapperHTML706ab2be').html(content.text);\n }else if (msg_type===\"display_data\" || msg_type===\"execute_result\"){\n var html=null;\n if (!!content.data[\"text/html\"]){\n html=content.data[\"text/html\"];\n }else if 
(!!content.data[\"image/png\"]){\n html=html||\"\";\n html+=\"<img src='data:image/png;base64,\" +content.data[\"image/png\"]+\"'></img>\";\n }\n \n if (!!content.data[\"application/javascript\"]){\n try {\n eval(content.data[\"application/javascript\"]);\n } catch(err) {\n curCell.output_area.handle_output.apply(curCell.output_area, arguments);\n } \n return;\n }\n \n if (html){\n try{\n $('#wrapperHTML706ab2be').html(html);\n }catch(e){\n console.log(\"Invalid html output\", e, html);\n $('#wrapperHTML706ab2be').html( \"Invalid html output. <pre>\" \n + html.replace(/>/g,'&gt;').replace(/</g,'&lt;').replace(/\"/g,'&quot;') + \"<pre>\");\n }\n\n if(curCell&&curCell.output_area&&curCell.output_area.outputs){\n setTimeout(function(){\n var data = JSON.parse(JSON.stringify(content.data));\n if(!!data[\"text/html\"])data[\"text/html\"]=html;\n function savedData(data){\n \n var markup='<style type=\"text/css\">.pd_warning{display:none;}</style>';\n markup+='<div class=\"pd_warning\"><em>Hey, there\\'s something awesome here! 
To see it, open this notebook outside GitHub, in a viewer like Jupyter</em></div>';\n nodes = $.parseHTML(data[\"text/html\"], null, true);\n var s = $(nodes).wrap(\"<div>\").parent().find(\".pd_save\").not(\".pd_save .pd_save\")\n s.each(function(){\n var found = false;\n if ( $(this).attr(\"id\") ){\n var n = $(\"#\" + $(this).attr(\"id\"));\n if (n.length>0){\n found=true;\n n.each(function(){\n $(this).addClass(\"is-viewer-good\");\n });\n markup+=n.wrap(\"<div>\").parent().html();\n }\n }else{\n $(this).addClass(\"is-viewer-good\");\n }\n if (!found){\n markup+=$(this).parent().html();\n }\n });\n data[\"text/html\"] = markup;\n return data;\n }\n curCell.output_area.outputs.push({\"data\": savedData(data),\"metadata\":content.metadata,\"output_type\":msg_type});\n },2000);\n }\n }\n }else if (msg_type === \"error\") {\n require(['base/js/utils'], function(utils) {\n var tb = content.traceback;\n console.log(\"tb\",tb);\n if (tb && tb.length>0){\n var data = tb.reduce(function(res, frame){return res+frame+'\\\\n';},\"\");\n console.log(\"data\",data);\n data = utils.fixConsole(data);\n data = utils.fixCarriageReturn(data);\n data = utils.autoLinkUrls(data);\n $('#wrapperHTML706ab2be').html(\"<pre>\" + data +\"</pre>\");\n }\n });\n }\n\n //Append profiling info\n if (executionTime.length > 0 && $(\"#execution706ab2be\").length == 0 ){\n $('#wrapperHTML706ab2be').append(executionTime);\n }else if (startWallToWall && $(\"#execution706ab2be\").length > 0 ){\n $(\"#execution706ab2be\").append($(\"<div/>\").text(\"Wall to Wall time: \" + ( (new Date().getTime() - startWallToWall)/1000 ) + \"s\"));\n }\n }\n }\n }\n \n if (IPython && IPython.notebook && IPython.notebook.session && IPython.notebook.session.kernel){\n var command = \",cell_id='cellId'\".replace(\"cellId\",cellId);\n function addOptions(options){\n function getStringRep(v) {\n return \"'\" + v + \"'\";\n }\n for (var key in (options||{})){\n var value = options[key];\n var hasValue = value != null && 
typeof value !== 'undefined' && value !== '';\n var replaceValue = hasValue ? (key+\"=\" + getStringRep(value) ) : \"\";\n var pattern = (hasValue?\"\":\",\")+\"\\\\s*\" + key + \"\\\\s*=\\\\s*'(\\\\\\\\'|[^'])*'\";\n var rpattern=new RegExp(pattern);\n var n = command.search(rpattern);\n if ( n >= 0 ){\n command = command.replace(rpattern, replaceValue);\n }else if (hasValue){\n var n = command.lastIndexOf(\")\");\n command = [command.slice(0, n), (command[n-1]==\"(\"? \"\":\",\") + replaceValue, command.slice(n)].join('')\n } \n }\n }\n if(typeof cellMetadata != \"undefined\" && cellMetadata.displayParams){\n addOptions(cellMetadata.displayParams);\n addOptions({\"showchrome\":\"true\"});\n }else if ('False'=='True' && curCell && curCell._metadata.pixiedust ){\n addOptions(curCell._metadata.pixiedust.displayParams || {} );\n }\n addOptions({});\n \n \n \n for (var o in snifferOptions){\n addOptions(snifferOptions[o]);\n }\n \n \n var pattern = \"\\\\w*\\\\s*=\\\\s*'(\\\\\\\\'|[^'])*'\";\n var rpattern=new RegExp(pattern,\"g\");\n var n = command.match(rpattern);\n var displayParams={}\n for (var i = 0; i < n.length; i++){\n var parts=n[i].split(\"=\");\n var key = parts[0].trim();\n var value = parts[1].trim()\n if (!key.startsWith(\"nostore_\") && key != \"showchrome\" && key != \"prefix\" && key != \"cell_id\"){\n displayParams[key] = value.substring(1,value.length-1);\n }\n }\n if(curCell&&curCell.output_area){\n curCell._metadata.pixiedust = curCell._metadata.pixiedust || {}\n curCell._metadata.pixiedust.displayParams=displayParams\n curCell.output_area.outputs=[];\n var old_msg_id = curCell.last_msg_id;\n if (old_msg_id) {\n curCell.kernel.clear_callbacks_for_msg(old_msg_id);\n }\n }else{\n console.log(\"couldn't find the cell\");\n }\n $('#wrapperJS706ab2be').html(\"\")\n $('#wrapperHTML706ab2be').html('<div style=\"width:100px;height:60px;left:47%;position:relative\"><i class=\"fa fa-circle-o-notch fa-spin\" style=\"font-size:48px\"></i></div>'+\n '<div 
style=\"text-align:center\">Loading your data. Please wait...</div>');\n startWallToWall = new Date().getTime();\n \n console.log(\"Running command2\",command);\n IPython.notebook.session.kernel.execute(command, callbacks, {silent:true,store_history:false,stop_on_error:true});\n }\n}\n()\n }else{\n alert(\"An error occurred, unable to access cell id\");\n }\n },500);\n</script>"}, "metadata": {}}

Any help with this issue? Has anyone tried PixieDust with Zeppelin?

Getting more done in GitHub with ZenHub

Hola! @rajrsingh has created a ZenHub account for the ibm-cds-labs organization. ZenHub is the only project management tool integrated natively in GitHub – created specifically for fast-moving, software-driven teams.


How do I use ZenHub?

To get set up with ZenHub, all you have to do is download the browser extension and log in with your GitHub account. Once you do, you’ll get access to ZenHub’s complete feature-set immediately.

What can ZenHub do?

ZenHub adds a series of enhancements directly inside the GitHub UI:

  • Real-time, customizable task boards for GitHub issues;
  • Multi-Repository burndown charts, estimates, and velocity tracking based on GitHub Milestones;
  • Personal to-do lists and task prioritization;
  • Time-saving shortcuts – like a quick repo switcher, a “Move issue” button, and much more.


Still curious? See more ZenHub features or read user reviews. This issue was written by your friendly ZenHub bot, posted by request from @rajrsingh.


Add export to SVG option

Export graphics to SVG for editing in a graphics program (changing fonts, moving overlapping text, etc.).

log scale

It would be nice to have the option of displaying an axis on a log10 scale.
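On a log10 axis, a value y is placed at log10(y), so equal distances represent equal ratios and the natural tick marks fall on powers of ten. A small sketch of that tick computation, assuming strictly positive data (a log scale is undefined for zero or negative values). log10_decade_ticks is hypothetical and not tied to any renderer PixieDust uses.

```python
import math


def log10_decade_ticks(lo, hi):
    """Return tick positions at each power of ten spanning [lo, hi] (hypothetical).

    A log10 axis is only defined for positive values.
    """
    if lo <= 0 or hi <= 0:
        raise ValueError("log10 scale requires positive values")
    if lo > hi:
        lo, hi = hi, lo
    first = math.floor(math.log10(lo))  # largest power of ten <= lo
    last = math.ceil(math.log10(hi))    # smallest power of ten >= hi
    return [10 ** k for k in range(first, last + 1)]


print(log10_decade_ticks(3, 450))  # [1, 10, 100, 1000]
```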

Download DataFrame as JSON is broken

@vabarbosa: I get the following error in the browser console:
main.min.js:23119 Couldn't process kernel message TypeError: Failed to execute 'appendChild' on 'Node': parameter 1 is not of type 'Node'.

As a user, I want to know more about the sample data before I load it

Expected behavior

sampleData() should link to the curated public sample data set so a user can make an informed decision about which set to pick:

Id 	Name 	Topic 	Publisher
1 	[car performance data](link-to-public-data-set) 	transportation 	IBM

Even better, let the user somehow pick from any of the public data sets available on the exchange.

Actual behavior (0.80)

pixiedust.sampleData() currently displays

Id 	Name 	Topic 	Publisher
1 	Car performance data 	transportation 	IBM

Steps to reproduce the behavior

import pixiedust
pixiedust.sampleData()
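One way to meet the expected behavior is to render the catalog as a Markdown table in which each name links to its source. A minimal sketch, assuming every catalog entry carries a URL. render_catalog is hypothetical, and the example.com address is a placeholder because the real link for the car performance data set is not given in the issue.

```python
def render_catalog(datasets):
    """Render sample-data entries as a Markdown table with linked names.

    Hypothetical helper; each entry needs id, name, url, topic and publisher.
    """
    lines = ["| Id | Name | Topic | Publisher |", "|---|---|---|---|"]
    for d in datasets:
        lines.append(
            f"| {d['id']} | [{d['name']}]({d['url']}) | {d['topic']} | {d['publisher']} |"
        )
    return "\n".join(lines)


catalog = [
    {
        "id": 1,
        "name": "Car performance data",
        "url": "https://example.com/cars.csv",  # placeholder URL
        "topic": "transportation",
        "publisher": "IBM",
    }
]
print(render_catalog(catalog))
```

In a notebook, the resulting string could be shown with IPython.display.Markdown.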

As a data scientist, I want to see Options dialog display type of chart selected so that I remember which chart was chosen.

Expected behavior

When the user opens the Options dialog, display the selected chart type (and optionally the renderer, although that is already shown on the right-hand side), because the chart-type drop-down does not remember the last selected chart type.

Actual behavior

The Options dialog does not display the chart type, so it is difficult to remember the most suitable aggregate, x, and y values for the chart.

Steps to reproduce the behavior

Select a chart type. Click the Options button.

Ensure python 3 compatibility

I get the following errors upon installation:

>>> import pixiedust
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/esafak/Library/Python/3.5/lib/python/site-packages/pixiedust/__init__.py", line 19, in <module>
    import packageManager
ImportError: No module named 'packageManager'

>>> from pixiedust.packageManager import PackageManager
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/esafak/Library/Python/3.5/lib/python/site-packages/pixiedust/__init__.py", line 19, in <module>
    import packageManager
ImportError: No module named 'packageManager'

I think you have an issue with relative imports.
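That diagnosis fits the traceback: Python 3 dropped implicit relative imports (PEP 328), so a bare import packageManager inside pixiedust/__init__.py is looked up as a top-level module and not found. A self-contained sketch that rebuilds a similar layout in a temporary directory and uses the explicit relative form. demo_pkg and its installPackage function are stand-ins, not PixieDust's real modules. The `from . import x` form is also valid on Python 2.5 and later, so it should not break 2.7.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mirrors the failing layout (hypothetical names).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "packageManager.py").write_text(
    "def installPackage(name):\n    return 'installing ' + name\n"
)
# `import packageManager` here would fail on Python 3 (absolute import);
# the explicit relative form resolves against the enclosing package.
(pkg / "__init__.py").write_text(
    "from . import packageManager\n"
    "installPackage = packageManager.installPackage\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the new files are visible to the importer
demo_pkg = importlib.import_module("demo_pkg")
print(demo_pkg.installPackage("graphframes:graphframes:0"))
# installing graphframes:graphframes:0
```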

As a data scientist, I want Scala variables from one %%scala cell to be accessible in other %%scala cells

As a data scientist, I want Scala variables from one %%scala cell to be accessible in other %%scala cells, so that the code looks cleaner.

Expected behavior

%%scala should not lose the context or variables defined in other or previous %%scala cells.

Actual behavior

%%scala __df.show()
Output:-
pixiedustRunner.scala:20: error: not found: value __df
__df.show()
^
pixiedustRunner.scala:22: error: not found: value __df
Map( "__df"->__df )
^
two errors found
In [ ]:

Steps to reproduce the behavior

Create a new notebook with Python
Run below in one cell
%%scala
import org.apache.spark.sql._

print(sc.version)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val (__df) = sqlContext.read.json("people.json")

__df.show()

Next Cell, Use Python display() to visualize __df
display(__df)

Now in next cell run below:-
%%scala __df.show()

Thanks,
Charles.
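The error output (pixiedustRunner.scala: not found: value __df) suggests that each %%scala cell is compiled on its own, so a val defined in one cell is out of scope in the next. What the issue asks for amounts to keeping one long-lived scope per notebook session. The sketch below illustrates the difference with Python's exec() as an analogy only; it is not how PixieDust's Scala bridge is implemented.

```python
# Each string stands in for one %%scala cell; exec() is used purely as an
# analogy for a cell runner that does or does not keep state between cells.
cell_1 = "people = ['Ann', 'Bo']"
cell_2 = "count = len(people)"

# Fresh namespace per cell: the second cell cannot see `people`.
try:
    exec(cell_1, {})
    exec(cell_2, {})
except NameError as err:
    print("isolated:", err)  # isolated: name 'people' is not defined

# One namespace reused across cells: definitions carry over.
session = {}
exec(cell_1, session)
exec(cell_2, session)
print("shared:", session["count"])  # shared: 2
```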

pixiedust.sampleData(n) doesn't load any data on DSX

I've tried several data sets using PD v0.80 on DSX.

in [6] pixiedust.sampleData(6)

Downloading 'Million dollar home sales in NE Mass late 2016' from https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv

Creating pySpark DataFrame for 'Million dollar home sales in NE Mass late 2016'. Please wait...

Even for a small data set like the one above (~100k), DataFrame creation never completes.

Namespace pollution

This might sound a bit nit-picky, but one thing I'd like to see changed is to avoid using from XX import *, or imports like from pixiedust.display import display, in examples and documentation. I'm not sure what's in pixiedust (though I could look at the code), but there are probably a lot of modules that could have the same names as those in other graphing packages. Something better might be import pixiedust.display as pixd, or just using the full path, import pixiedust.display.

It would make it easier for folks to straight copy example code without worrying about namespace collisions.
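The concern is concrete: a star import silently rebinds any name it shares with what is already in scope. The sketch uses the standard library's math module instead of PixieDust so it runs anywhere. After from math import *, the built-in pow is shadowed by math.pow, which returns a float and does not accept the three-argument form.

```python
import math  # qualified import: math.pow stays behind the math. prefix

# Run `from math import *` in a separate namespace so its effect is visible.
star_ns = {}
exec("from math import *", star_ns)

print(pow(2, 3))                    # 8    (built-in pow, integer result)
print(star_ns["pow"](2, 3))         # 8.0  (math.pow has replaced it)
print(star_ns["pow"] is math.pow)   # True
```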

Feel free to close this issue if you think I'm being an arse.

pixiedust is unable to share python value with scala

See:

screen shot 2016-10-31 at 01 49 51

The full stacktrace:

Py4JJavaErrorTraceback (most recent call last)
<ipython-input-19-1128de37cfc4> in <module>()
----> 1 get_ipython().run_cell_magic(u'scala', u'', u'\nprintln(bootstrap_servers_string)\n\nimport org.apache.spark.streaming.Duration\nimport org.apache.spark.streaming.Seconds\nimport org.apache.spark.streaming.StreamingContext\nimport com.ibm.cds.spark.samples.config.MessageHubConfig\nimport com.ibm.cds.spark.samples.dstream.KafkaStreaming.KafkaStreamingContextAdapter\nimport org.apache.kafka.common.serialization.Deserializer\nimport org.apache.kafka.common.serialization.StringDeserializer\n\nval kafkaProps = new MessageHubConfig\n\nkafkaProps.setConfig("bootstrap.servers", bootstrap_servers_string.toString())\nkafkaProps.setConfig("kafka.user.name", sasl_plain_username.toString())\nkafkaProps.setConfig("kafka.user.password", sasl_plain_password.toString())\nkafkaProps.setConfig("kafka.topic", messagehub_topic_name.toString())\nkafkaProps.setConfig("api_key", api_key.toString())\nkafkaProps.setConfig("kafka_rest_url", kafka_rest_url.toString())\n\nkafkaProps.createConfiguration()\n\nval ssc = new StreamingContext( sc, Seconds(2) )\n\nval stream = ssc.createKafkaStream[String, String, StringDeserializer, StringDeserializer](\n                     kafkaProps,\n                     List(kafkaProps.getConfig("kafka.topic"))\n                     );\n\nstream.print()\nssc.start()\nssc.awaitTermination()')

/usr/local/src/bluemix_jupyter_bundle.v22/notebook/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
   2118             magic_arg_s = self.var_expand(line, stack_depth)
   2119             with self.builtin_trap:
-> 2120                 result = fn(magic_arg_s, cell)
   2121             return result
   2122 

<decorator-gen-125> in scala(self, line, cell)

/usr/local/src/bluemix_jupyter_bundle.v22/notebook/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

/gpfs/fs01/user/s85d-88ebffb000cc3e-39ca506ba762/.local/lib/python2.7/site-packages/pixiedust/utils/scalaBridge.pyc in scala(self, line, cell)
    166                 runnerObject.callMethod("set" + key[0].upper() + key[1:], val["initValue"])
    167 
--> 168         varMap = runnerObject.callMethod("runCell")
    169 
    170         #capture the return vars and update the interactive shell

/gpfs/fs01/user/s85d-88ebffb000cc3e-39ca506ba762/.local/lib/python2.7/site-packages/pixiedust/utils/javaBridge.pyc in callMethod(self, methodName, *args)
    145                             break;
    146                 if match:
--> 147                     return m.invoke(self.jHandle, jMethodArgs)
    148 
    149         raise ValueError("Method {0} that matches the given arguments not found".format(methodName) )

/usr/local/src/spark160master/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
    811         answer = self.gateway_client.send_command(command)
    812         return_value = get_return_value(
--> 813             answer, self.gateway_client, self.target_id, self.name)
    814 
    815         for temp_arg in temp_args:

/usr/local/src/spark160master/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/usr/local/src/spark160master/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    306                 raise Py4JJavaError(
    307                     "An error occurred while calling {0}{1}{2}.\n".
--> 308                     format(target_id, ".", name), value)
    309             else:
    310                 raise Py4JError(

Py4JJavaError: An error occurred while calling o513.invoke.
: java.lang.NullPointerException
    at com.ibm.pixiedust.PixiedustScalaRun$.runCell(pixiedustRunner.scala:125)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:507)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:507)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:785)

The notebook:

https://apsportal.ibm.com/analytics/notebooks/c94ee063-dab3-4440-9b90-125347ff80af/view?access_token=1c752de564c7c2df3fe924ce4122e71e658e73ff64160c80b667844527a8df88

Create ~/data/libs directory if not exists

On a fresh installation of PixieDust, installPixiedustJar() needs the data/libs directory to exist in the home directory; otherwise it fails with 'No such file or directory':

Cell code:
import pixiedust
pixiedust.installPackage("graphframes:graphframes:0")
pixiedust.printAllPackages()

Output:
Pixiedust database opened successfully
Pixiedust version 0.60


IOError Traceback (most recent call last)
in <module>()
----> 1 import pixiedust
2 pixiedust.installPackage("graphframes:graphframes:0")
3 pixiedust.printAllPackages()

/Users/chetnadwarade/pixiedust/pixiedust/__init__.py in <module>()
20 warnings.simplefilter("ignore")
21 #shortcut to logging
---> 22 import utils
23 import utils.pdLogging
24 logger = utils.pdLogging.getPixiedustLogger()

/Users/chetnadwarade/pixiedust/pixiedust/utils/__init__.py in <module>()
44
45 if copyFile:
---> 46 installPixiedustJar()

/Users/chetnadwarade/pixiedust/pixiedust/utils/__init__.py in installPixiedustJar()
32 def installPixiedustJar():
33 with pkg_resources.resource_stream(__name__, "resources/pixiedust.jar") as resJar:
---> 34 with open( jarFilePath, 'w+' ) as installedJar:
35 shutil.copyfileobj(resJar, installedJar)
36 print("Pixiedust runtime updated. Please restart kernel")

IOError: [Errno 2] No such file or directory: '/Users/chetnadwarade/data/libs/pixiedust.jar'
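The fix the title asks for is to create the parent directory before opening the jar for writing. A minimal sketch, assuming the caller supplies the resource stream and the target path. install_jar is a hypothetical stand-in for installPixiedustJar(), not its actual code. It also opens the file in binary mode, since a jar is binary data. The exist_ok argument requires Python 3; on 2.7 the equivalent is to call os.makedirs only when os.path.isdir returns False.

```python
import io
import os
import shutil
import tempfile


def install_jar(resource_stream, jar_file_path):
    """Copy a jar from a binary stream to jar_file_path, creating parent dirs.

    Hypothetical stand-in for installPixiedustJar(); the caller supplies the
    stream (e.g. from pkg_resources.resource_stream) and the target path.
    """
    # Create ~/data/libs (or whatever the parent directory is) if missing;
    # exist_ok=True makes this a no-op when it already exists.
    os.makedirs(os.path.dirname(jar_file_path), exist_ok=True)
    # Jars are binary, so write in "wb" rather than text mode.
    with open(jar_file_path, "wb") as installed_jar:
        shutil.copyfileobj(resource_stream, installed_jar)


home = tempfile.mkdtemp()  # stands in for the user's home directory
target = os.path.join(home, "data", "libs", "pixiedust.jar")
install_jar(io.BytesIO(b"PK\x03\x04 fake jar bytes"), target)
print(os.path.exists(target))  # True
```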

Multi series chart

I need to visualize a multi-series bar chart, a common requirement when visualizing, say, quarterly aggregated sales across years. Can this support be added?
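The data-preparation half of a multi-series bar chart is a group-and-pivot: aggregate per (year, quarter), then emit one series per year. A plain-Python sketch, assuming sales arrive as (year, quarter, amount) tuples. The sample figures and quarterly_series are made up for illustration, and drawing the bars is left to whichever charting library is in use.

```python
from collections import defaultdict


def quarterly_series(sales):
    """Sum sales per (year, quarter) and return one series per year.

    Hypothetical helper: each year becomes one bar series, with the
    quarters on the x-axis.
    """
    totals = defaultdict(float)
    for year, quarter, amount in sales:
        totals[(year, quarter)] += amount
    series = defaultdict(dict)
    for (year, quarter), total in sorted(totals.items()):
        series[year][quarter] = total
    return dict(series)


sales = [
    (2016, "Q1", 100.0), (2016, "Q1", 50.0), (2016, "Q2", 80.0),
    (2017, "Q1", 120.0), (2017, "Q2", 90.0),
]
print(quarterly_series(sales))
# {2016: {'Q1': 150.0, 'Q2': 80.0}, 2017: {'Q1': 120.0, 'Q2': 90.0}}
```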
