Python Helper library for Jupyter Notebooks

Home Page: https://pixiedust.github.io/pixiedust/

License: Apache License 2.0

Python 10.36% Jupyter Notebook 82.53% HTML 4.85% JavaScript 1.58% Java 0.09% Scala 0.22% Makefile 0.13% Batchfile 0.12% Smarty 0.01% CSS 0.10%

python-notebook spark scala-notebooks visualization jupyter-notebook python data-science pixiedust

pixiedust's Introduction

PixieDust

PixieDust is a productivity tool for Python or Scala notebooks, which lets developers encapsulate business logic into something easy for their customers to consume.

New Book now available: Thoughtful Data Science

This book, published by Packt Publishing, is the user and developer reference for PixieDust.

Pixiedust developer community

Wait! There is a developer community? Yes, there is! If you are already a member, log in. If you would like to contribute, please join us.

Why you need it

Notebooks are a powerful tool for fast and flexible data analysis. But the learning curve is steep.

Python data science notebooks were first popularized in academia, and there are some formalities to work through before you can get to your analysis. For example, in a Python interactive notebook, a mundane task like creating a simple chart or saving data into a persistence repository requires mastery of complex code like this matplotlib snippet:

All this for a chart?

Once you do create a notebook that provides great data insights, it's hard to share with business users, who don't want to slog through all that dry, hard-to-read code, much less tweak it and collaborate.

PixieDust to the rescue.

What is PixieDust?

PixieDust is an open source helper library that works as an add-on to Jupyter notebooks to improve the user experience of working with data. It also fills a gap for users who have no access to configuration files when a notebook is hosted on the cloud.

Use in Python or Scala

PixieDust greatly simplifies working with Python display libraries like matplotlib, but works just as effectively in Scala notebooks too. You no longer have to compromise your love of Scala to generate great charts. PixieDust lets you bring robust Python visualization options to your Scala notebooks. Installer and instructions to use Scala with PixieDust are coming soon.

Features

PixieDust's current capabilities include:

packageManager lets you install Spark packages inside a Python notebook. This is something that you can't do today on hosted Jupyter notebooks, which prevents developers from using a large number of Spark package add-ons.
Visualizations. One single API called display() lets you visualize your Spark object in different ways: tables, charts, maps, and so on. This module is designed to be extensible, providing an API that lets anyone easily contribute a new visualization plugin.

This sample visualization plugin uses d3 to show the different flight routes for each airport:
Embedded apps. Let nonprogrammers actively use notebooks. Transform a hard-to-read notebook into a polished graphic app for business users. Check out these preliminary sample apps:
- An app can feature embedded forms and responses, like flightpredict, which lets users enter flight details to see the likelihood of landing on time.
- Or present a sophisticated workflow, like our twitter demo, which delivers a real-time feed of tweets, trending hashtags, and aggregated sentiment charts with Watson Tone Analyzer.
Extensibility. Create your own visualizations or apps using the PixieDust extensibility APIs. If you know html and css, you can write and deliver amazing graphics without forcing notebook users to type one line of code. Use the shape of the data to control when PixieDust shows your visualization in a menu.
Export. Notebook users can download data as CSV, HTML, JSON, and other formats, either locally to their laptop or into a variety of back-end data sources such as Cloudant, dashDB, and GraphDB.
Scala Bridge. Use Scala directly in your Python notebook. Variables are automatically transferred from Python to Scala and vice versa. Learn more.

Or start in a Scala notebook. As mentioned, all these PixieDust features work not only in Python, but in Scala too. So if you prefer Scala, you'll soon be able to start there and use PixieDust to insert sophisticated Python graphic options within your Scala notebook. Instructions coming soon.
Spark progress monitor. Track the status of your Spark job. No more waiting in the dark. Notebook users can now see how a cell's code is running behind the scenes.

Watch this video to see PixieDust in action:

Usage

You can use PixieDust locally or online within IBM's Watson Studio.

Use online

To use PixieDust online

Sign up for a free trial on IBM's Watson Studio
Create a new notebook from URL using this template and learn the basics

https://github.com/pixiedust/pixiedust/blob/master/notebook/DSX/Welcome%20to%20PixieDust.ipynb
Review the documentation

Use locally

Pixiedust supports

Spark 1.6 or 2.0
Python 2.7 or 3.5

Sample notebooks

Wherever you prefer to work, try out the following sample notebooks:

Welcome to PixieDust The ultimate notebook to get started with PixieDust.
Intro to PixieDust. Uses PackageManager to install GraphFrames, generates a dataframe from a simple data set, and lets you try the display() API. See also: Intro to PixieDust for Spark 2.x
Mapping Intro lets you load sample data sets, explore display() API features, including maps.

Tutorials

Discover hidden Facebook usage insights
FlightPredict II: The Sequel shows how to predict flight delays with PixieDust. Includes an embedded app.
Sentiment Analysis of Twitter Hashtags with Spark revisits a Spark Streaming app, this time using PixieDust and Jupyter. Includes an embedded app.

Contribute

Note: PixieDust currently supports Spark DataFrames, Spark GraphFrames and Pandas DataFrames, with more to come. If you can't wait, write your own today and contribute it back.

Read how to contribute for details on our code of conduct and instructions for submitting pull requests to us.

Developer Guide

Dive into the PixieDust developer docs and learn how to build your own custom visualization or embedded app. You can also pitch in and contribute an enhancement to PixieDust's core features.

We can't wait to see what you build.

License

Apache License, Version 2.0.

For details and all the legalese, read LICENSE.

pixiedust's Issues

ImportError: No module named _tkinter

I just tried to import pixiedust and hit this error. I tried restarting the kernel, but I still get the same error.

ImportError                               Traceback (most recent call last)
<ipython-input-1-fa75a171b8de> in <module>()
      1 get_ipython().system(u'pip install --user --upgrade --quiet pixiedust')
----> 2 import pixiedust
      3 
      4 jars = ["http://central.maven.org/maven2/org/apache/kafka/kafka-clients/0.9.0.0/kafka-clients-0.9.0.0.jar",
      5         "http://central.maven.org/maven2/org/apache/kafka/kafka_2.10/0.9.0.0/kafka_2.10-0.9.0.0.jar",

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/pixiedust/__init__.pyc in <module>()
     29 uninstallPackage=packageManager.uninstallPackage
     30 
---> 31 import display
     32 import services
     33 

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/pixiedust/display/__init__.py in <module>()
     16 
     17 from .display import *
---> 18 import chart,graph,table,tests,download
     19 from pixiedust.utils.printEx import *
     20 import traceback

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/pixiedust/display/chart/__init__.py in <module>()
     15 # -------------------------------------------------------------------------------
     16 
---> 17 from .barChartDisplay import BarChartDisplay
     18 from .lineChartDisplay import LineChartDisplay
     19 from .scatterPlotDisplay import ScatterPlotDisplay

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/pixiedust/display/chart/barChartDisplay.py in <module>()
     16 
     17 from .display import ChartDisplay
---> 18 from .mpld3ChartDisplay import Mpld3ChartDisplay
     19 import matplotlib.pyplot as plt
     20 import numpy as np

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/pixiedust/display/chart/mpld3ChartDisplay.py in <module>()
     15 # -------------------------------------------------------------------------------
     16 
---> 17 from .baseChartDisplay import BaseChartDisplay
     18 from .display import ChartDisplay
     19 from .plugins.chart import ChartPlugin

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/pixiedust/display/chart/baseChartDisplay.py in <module>()
     22 from pyspark.sql.types import StructType
     23 import matplotlib.cm as cm
---> 24 import matplotlib.pyplot as plt
     25 import mpld3
     26 import mpld3.plugins as plugins

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/matplotlib/pyplot.py in <module>()
    112 
    113 from matplotlib.backends import pylab_setup
--> 114 _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
    115 
    116 _IP_REGISTERED = None

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/matplotlib/backends/__init__.pyc in pylab_setup()
     30     # imports. 0 means only perform absolute imports.
     31     backend_mod = __import__(backend_name,
---> 32                              globals(),locals(),[backend_name],0)
     33 
     34     # Things we pull in from all backends

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/matplotlib/backends/backend_tkagg.py in <module>()
      4 
      5 from matplotlib.externals import six
----> 6 from matplotlib.externals.six.moves import tkinter as Tk
      7 from matplotlib.externals.six.moves import tkinter_filedialog as FileDialog
      8 

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/matplotlib/externals/six.pyc in load_module(self, fullname)
    197         mod = self.__get_module(fullname)
    198         if isinstance(mod, MovedModule):
--> 199             mod = mod._resolve()
    200         else:
    201             mod.__loader__ = self

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/matplotlib/externals/six.pyc in _resolve(self)
    111 
    112     def _resolve(self):
--> 113         return _import_module(self.mod)
    114 
    115     def __getattr__(self, attr):

/gpfs/fs01/user/xxxxxx/.local/lib/python2.7/site-packages/matplotlib/externals/six.pyc in _import_module(name)
     78 def _import_module(name):
     79     """Import module, returning the module after the last dot."""
---> 80     __import__(name)
     81     return sys.modules[name]
     82 

/usr/local/src/bluemix_jupyter_bundle.v22/notebook/lib/python2.7/lib-tk/Tkinter.py in <module>()
     37     # Attempt to configure Tcl/Tk without requiring PATH
     38     import FixTk
---> 39 import _tkinter # If this fails your Python may not be configured for Tk
     40 tkinter = _tkinter # b/w compat for export
     41 TclError = _tkinter.TclError

ImportError: No module named _tkinter
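
The traceback shows matplotlib falling back to the TkAgg backend on a Python build that has no Tk support. A common workaround, sketched here as an assumption and not as the fix the maintainers applied, is to select the non-interactive Agg backend before anything imports matplotlib.pyplot. The MPLBACKEND environment variable and matplotlib.use() are standard matplotlib mechanisms; whether this alone is enough for PixieDust on this hosted environment is not confirmed by the report.

```python
import os

# Must run before the first import of matplotlib.pyplot, and therefore
# before importing any library that imports it
os.environ["MPLBACKEND"] = "Agg"

try:
    import matplotlib
    matplotlib.use("Agg")  # Agg renders to image buffers and needs no Tk
    backend = matplotlib.get_backend()
except ImportError:
    # matplotlib is not installed here; only the environment variable is set
    backend = os.environ["MPLBACKEND"]

print(backend)
```

In a notebook, this would go in the first cell, ahead of the import that fails in the traceback.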

As an inventory manager, I want to figure out what weather is coming so that I can adjust product stock

Notebook Users need detailed instructs for Display options

Edit Display(API) topic to explain how to work with chart options

Support Decimal type in charts

mpld3 uses Python json library which does not support serialization of Decimal types.

Potential fixes:

Override default json encoder (if possible)
Convert all incoming Decimals to Doubles (may cause issues with precision)
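
Both potential fixes can be sketched with the standard library alone. This is a minimal illustration, not PixieDust or mpld3 code: DecimalEncoder and decimals_to_float are hypothetical names, the sample row is made up, and how mpld3 would be made to use a custom encoder is left open.

```python
import json
from decimal import Decimal


class DecimalEncoder(json.JSONEncoder):
    """Fix 1: teach the JSON encoder to serialize Decimal values."""

    def default(self, o):
        if isinstance(o, Decimal):
            # float() may lose precision for very large or very precise values
            return float(o)
        return json.JSONEncoder.default(self, o)


def decimals_to_float(rows):
    """Fix 2: convert incoming Decimals to floats before charting."""
    return [
        {k: float(v) if isinstance(v, Decimal) else v for k, v in row.items()}
        for row in rows
    ]


# Made-up sample row
rows = [{"city": "Boston", "price": Decimal("1250000.50")}]

# The default encoder rejects Decimal with a TypeError
try:
    json.dumps(rows)
    default_fails = False
except TypeError:
    default_fails = True

encoded = json.dumps(rows, cls=DecimalEncoder)
converted = decimals_to_float(rows)
```

Either way the value ends up as a float, so the precision caveat applies to both options; serializing Decimals as strings would avoid it but would change what the chart code receives.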

Good to have a way to specify a title for the visual.

Matplotlib allows us to give specific text as titles. Pixiedust should offer same.

As a notebook user, I need instructs to load sample data

add/edit help topic

Line charts: "Options" > "Show legends" doesn't seem to do anything

The charts appear to be the same when this option is enabled or disabled.

Line/Bar Chart - Impossible to have 2 lines / 2 bars?

I have a dataframe with column A, B and C.
I want A to be the x-axis.
I want B to be the size of the bar or the height of the line.
I want C to be the category, so that each value of C gets its own bar or line in a different color.

From the options of PixieDust, I can see the "keys" and "values" but I see nothing for the categories.

As a user, I want to easily remove the automatically selected keys and values

Exploring some of the sample notebooks, I realized that more often than not the default options don't work for many chart types. For example, pick million dollar homes and google as the renderer. An error is displayed: incompatible data table: Error: table contains more columns than expected. If I want to change the chart options, I have to manually remove each pre-selected field, which is rather labor intensive for data sets with many fields. A reset/clear button that resets the options would be nice to have.

As a notebook user, I want to understand the promise of PD + Scala

Not part of 1.0, make sure Scala mentions explain the feature, (link to video demo?) and specify that it's coming soon.

display(): options > "values" should require at least one field

It's currently possible to only specify a key (but no value) in the options. I can't think of a scenario where this kind of setting would result in a useful chart. Can you?

Wrong formatting for bar chart display

Posted on SO: http://stackoverflow.com/questions/41990549/dsx-images-generated-by-pixiedust-display-command-are-ugly

As a data scientist, I want to not aggregate values for scatter & line plots, see raw data.

Expected behavior

Plot raw data via line and scatter plots. When the user selects line or scatter, do not display the aggregate option. For scatter in matplotlib, the Options dialog does not show aggregation.

Actual behavior

Currently the pixiedust 'Options' dialog aggregates values into SUM, AVG, MIN, MAX and COUNT. These are helpful with bar charts, pie charts and histograms, but not with scatter and line charts. In fact, aggregation changes the course of the analysis, because the raw data is also important to look at.

Steps to reproduce the behavior

Select type of chart, open Options dialog.
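
To see why aggregation matters for scatter and line charts, compare raw points with the same data summed per key. This is a pure-Python sketch with made-up numbers, not PixieDust code.

```python
from collections import defaultdict

# Made-up (key, value) observations; two of them share the key 1
raw_points = [(1, 10.0), (1, 30.0), (2, 5.0)]

# What a SUM aggregation hands to the chart: one point per key
sums = defaultdict(float)
for key, value in raw_points:
    sums[key] += value
aggregated_points = sorted(sums.items())

print(raw_points)         # three points, including the spread at key 1
print(aggregated_points)  # two points; the spread at key 1 is gone
```

A scatter plot of the aggregated points hides the fact that key 1 had two quite different observations, which is the information the issue asks to keep.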

display(): "Download SVG" doesn't work

I've tried Chrome and Safari on Mac.

Plot options before rendering

It would be nice to select plotting options before rendering, since it can take a while to render large amounts of data. Right now you sit around waiting for the default rendering to finish before you can select options. Perhaps a modal window could pop up when you select the chart option, before rendering begins.

Add support for dataframe complex type

Currently, rows with a complex type (e.g., a nested structure) are not handled correctly by the chart option dialog. The schema tab of the table visualization should also display nested structures more clearly (they are currently displayed as JSON).
@markwatsonatx @vabarbosa

Pixiedust Zeppelin Support

Hi,

I am working in Zeppelin. It has some visualization support, but it is limited. Is it possible to add pixiedust to Zeppelin? I tried to integrate it with Zeppelin, but when I call display(dataframe), it displays the following:

{"data": {"text/plain": "<IPython.core.display.HTML object>", "text/html": "\n<script>\n //Marker 706ab2be\n setTimeout(function(){\n var cells=IPython.notebook.get_cells().filter(function(cell){\n if(!cell.output_area || !cell.output_area.outputs){\n return false;\n }\n return cell.output_area.outputs.filter(function(output){\n if (output.output_type===\"display_data\"&&output.data&&output.data[\"text/html\"]){\n return output.data[\"text/html\"].includes(\"//Marker 706ab2be\")\n }\n return false;\n }).length > 0;\n });\n if(cells.length>0){\n var cell=cells[0];\n var cellId=cell.cell_id;\n var cellMetadata=cell._metadata.pixiedust;\n //cell.output_area.clear_output(false, true);\n var old_msg_id = cell.last_msg_id;\n if (old_msg_id) {\n cell.kernel.clear_callbacks_for_msg(old_msg_id);\n }\n var snifferOptions = [];\n \n snifferOptions.push({'nostore_bokeh':!!window.Bokeh})\n \n \n \n \n !\nfunction() {\n cellId = typeof cellId === \"undefined\" ? \"\" : cellId;\n var curCell=IPython.notebook.get_cells().filter(function(cell){\n return cell.cell_id==\"cellId\".replace(\"cellId\",cellId);\n });\n curCell=curCell.length>0?curCell[0]:null;\n console.log(\"curCell\",curCell);\n var startWallToWall;\n //Resend the display command\n var callbacks = {\n shell : {\n payload : {\n set_next_input : function(payload){\n if (curCell){\n curCell._handle_set_next_input(payload);\n }\n }\n }\n },\n iopub:{\n output:function(msg){\n console.log(\"msg\", msg);\n if (true){\n curCell.output_area.handle_output.apply(curCell.output_area, arguments);\n curCell.output_area.outputs=[];\n return;\n }\n var msg_type=msg.header.msg_type;\n var content = msg.content;\n var executionTime = $(\"#execution706ab2be\");\n if(msg_type===\"stream\"){\n $('#wrapperHTML706ab2be').html(content.text);\n }else if (msg_type===\"display_data\" || msg_type===\"execute_result\"){\n var html=null;\n if (!!content.data[\"text/html\"]){\n html=content.data[\"text/html\"];\n }else if 
(!!content.data[\"image/png\"]){\n html=html||\"\";\n html+=\"<img src='data:image/png;base64,\" +content.data[\"image/png\"]+\"'></img>\";\n }\n \n if (!!content.data[\"application/javascript\"]){\n try {\n eval(content.data[\"application/javascript\"]);\n } catch(err) {\n curCell.output_area.handle_output.apply(curCell.output_area, arguments);\n } \n return;\n }\n \n if (html){\n try{\n $('#wrapperHTML706ab2be').html(html);\n }catch(e){\n console.log(\"Invalid html output\", e, html);\n $('#wrapperHTML706ab2be').html( \"Invalid html output. <pre>\" \n + html.replace(/>/g,'>').replace(/</g,'<').replace(/\"/g,'"') + \"<pre>\");\n }\n\n if(curCell&&curCell.output_area&&curCell.output_area.outputs){\n setTimeout(function(){\n var data = JSON.parse(JSON.stringify(content.data));\n if(!!data[\"text/html\"])data[\"text/html\"]=html;\n function savedData(data){\n \n var markup='<style type=\"text/css\">.pd_warning{display:none;}</style>';\n markup+='<div class=\"pd_warning\"><em>Hey, there\\'s something awesome here! 
To see it, open this notebook outside GitHub, in a viewer like Jupyter</em></div>';\n nodes = $.parseHTML(data[\"text/html\"], null, true);\n var s = $(nodes).wrap(\"<div>\").parent().find(\".pd_save\").not(\".pd_save .pd_save\")\n s.each(function(){\n var found = false;\n if ( $(this).attr(\"id\") ){\n var n = $(\"#\" + $(this).attr(\"id\"));\n if (n.length>0){\n found=true;\n n.each(function(){\n $(this).addClass(\"is-viewer-good\");\n });\n markup+=n.wrap(\"<div>\").parent().html();\n }\n }else{\n $(this).addClass(\"is-viewer-good\");\n }\n if (!found){\n markup+=$(this).parent().html();\n }\n });\n data[\"text/html\"] = markup;\n return data;\n }\n curCell.output_area.outputs.push({\"data\": savedData(data),\"metadata\":content.metadata,\"output_type\":msg_type});\n },2000);\n }\n }\n }else if (msg_type === \"error\") {\n require(['base/js/utils'], function(utils) {\n var tb = content.traceback;\n console.log(\"tb\",tb);\n if (tb && tb.length>0){\n var data = tb.reduce(function(res, frame){return res+frame+'\\\\n';},\"\");\n console.log(\"data\",data);\n data = utils.fixConsole(data);\n data = utils.fixCarriageReturn(data);\n data = utils.autoLinkUrls(data);\n $('#wrapperHTML706ab2be').html(\"<pre>\" + data +\"</pre>\");\n }\n });\n }\n\n //Append profiling info\n if (executionTime.length > 0 && $(\"#execution706ab2be\").length == 0 ){\n $('#wrapperHTML706ab2be').append(executionTime);\n }else if (startWallToWall && $(\"#execution706ab2be\").length > 0 ){\n $(\"#execution706ab2be\").append($(\"<div/>\").text(\"Wall to Wall time: \" + ( (new Date().getTime() - startWallToWall)/1000 ) + \"s\"));\n }\n }\n }\n }\n \n if (IPython && IPython.notebook && IPython.notebook.session && IPython.notebook.session.kernel){\n var command = \",cell_id='cellId'\".replace(\"cellId\",cellId);\n function addOptions(options){\n function getStringRep(v) {\n return \"'\" + v + \"'\";\n }\n for (var key in (options||{})){\n var value = options[key];\n var hasValue = value != null && 
typeof value !== 'undefined' && value !== '';\n var replaceValue = hasValue ? (key+\"=\" + getStringRep(value) ) : \"\";\n var pattern = (hasValue?\"\":\",\")+\"\\\\s*\" + key + \"\\\\s*=\\\\s*'(\\\\\\\\'|[^'])*'\";\n var rpattern=new RegExp(pattern);\n var n = command.search(rpattern);\n if ( n >= 0 ){\n command = command.replace(rpattern, replaceValue);\n }else if (hasValue){\n var n = command.lastIndexOf(\")\");\n command = [command.slice(0, n), (command[n-1]==\"(\"? \"\":\",\") + replaceValue, command.slice(n)].join('')\n } \n }\n }\n if(typeof cellMetadata != \"undefined\" && cellMetadata.displayParams){\n addOptions(cellMetadata.displayParams);\n addOptions({\"showchrome\":\"true\"});\n }else if ('False'=='True' && curCell && curCell._metadata.pixiedust ){\n addOptions(curCell._metadata.pixiedust.displayParams || {} );\n }\n addOptions({});\n \n \n \n for (var o in snifferOptions){\n addOptions(snifferOptions[o]);\n }\n \n \n var pattern = \"\\\\w*\\\\s*=\\\\s*'(\\\\\\\\'|[^'])*'\";\n var rpattern=new RegExp(pattern,\"g\");\n var n = command.match(rpattern);\n var displayParams={}\n for (var i = 0; i < n.length; i++){\n var parts=n[i].split(\"=\");\n var key = parts[0].trim();\n var value = parts[1].trim()\n if (!key.startsWith(\"nostore_\") && key != \"showchrome\" && key != \"prefix\" && key != \"cell_id\"){\n displayParams[key] = value.substring(1,value.length-1);\n }\n }\n if(curCell&&curCell.output_area){\n curCell._metadata.pixiedust = curCell._metadata.pixiedust || {}\n curCell._metadata.pixiedust.displayParams=displayParams\n curCell.output_area.outputs=[];\n var old_msg_id = curCell.last_msg_id;\n if (old_msg_id) {\n curCell.kernel.clear_callbacks_for_msg(old_msg_id);\n }\n }else{\n console.log(\"couldn't find the cell\");\n }\n $('#wrapperJS706ab2be').html(\"\")\n $('#wrapperHTML706ab2be').html('<div style=\"width:100px;height:60px;left:47%;position:relative\"><i class=\"fa fa-circle-o-notch fa-spin\" style=\"font-size:48px\"></i></div>'+\n '<div 
style=\"text-align:center\">Loading your data. Please wait...</div>');\n startWallToWall = new Date().getTime();\n \n console.log(\"Running command2\",command);\n IPython.notebook.session.kernel.execute(command, callbacks, {silent:true,store_history:false,stop_on_error:true});\n }\n}\n()\n }else{\n alert(\"An error occurred, unable to access cell id\");\n }\n },500);\n</script>"}, "metadata": {}}

Is there any help for this issue? Has anyone tried pixiedust with Zeppelin?

Getting more done in GitHub with ZenHub

Hola! @rajrsingh has created a ZenHub account for the ibm-cds-labs organization. ZenHub is the only project management tool integrated natively in GitHub – created specifically for fast-moving, software-driven teams.

How do I use ZenHub?

To get set up with ZenHub, all you have to do is download the browser extension and log in with your GitHub account. Once you do, you’ll get access to ZenHub’s complete feature-set immediately.

What can ZenHub do?

ZenHub adds a series of enhancements directly inside the GitHub UI:

Real-time, customizable task boards for GitHub issues;
Multi-Repository burndown charts, estimates, and velocity tracking based on GitHub Milestones;
Personal to-do lists and task prioritization;
Time-saving shortcuts – like a quick repo switcher, a “Move issue” button, and much more.

Add ZenHub to GitHub

Still curious? See more ZenHub features or read user reviews. This issue was written by your friendly ZenHub bot, posted by request from @rajrsingh.

Add export to SVG option

export graphics to SVG for editing in a graphics program -- changing fonts, moving overlapping text, etc.

As a notebook user, I need to know how to invoke Spark Progress Monitor

Add topic with command and info

As a data scientist, I want to get the locations of my customers and stores so that I can make nice maps

As a developer, I want more visualization options, so that ...

Expected behavior

Actual behavior

Steps to reproduce the behavior

log scale

Would be nice to have axis on log10 scale.

Download DataFrame as JSON is broken

@vabarbosa : Get following error in the browser console:
main.min.js:23119 Couldn't process kernel message TypeError: Failed to execute 'appendChild' on 'Node': parameter 1 is not of type 'Node'.

As a notebook user, I need instructs/details re: download data & chart options

Add topic re:

data download options
- Stash to Cloudant is currently a Beta feature
- cover download SVG too

As a user, I want to access inline help so that I don't have to ask questions

Expected behavior

Proper support for pixiedust? and help(pixiedust)

Steps to reproduce the behavior

import pixiedust and run pixiedust? or help(pixiedust) or help(pixiedust.packageManager) ...

As a developer, I want to use an install script so that installation is very easy

Expected behavior

Actual behavior

Steps to reproduce the behavior

As a user, I want to know what prerequisites have to be met so a chart can be rendered

Example: I opened the million dollar homes sample notebook. Visualization wasn't working because

I was using an unsupported browser
I didn't know that for mapbox the chart options need to be customized (an access token must be provided)
I didn't know where to obtain the token

As a user, I want to know more about the sample data before I load it

Expected behavior

sampleData() should link to the curated public sample data set so a user can make an informed decision which set to pick:

Id 	Name 	Topic 	Publisher
1 	[car performance data](link-to-public-data-set) 	transportation 	IBM

Even better, let the user somehow pick from any of the public data sets available on the exchange.

Actual behavior (0.80)

pixiedust.sampleData() currently displays

Id 	Name 	Topic 	Publisher
1 	Car performance data 	transportation 	IBM

Steps to reproduce the behavior

import pixiedust
pixiedust.sampleData()

As a data scientist, I want to see Options dialog display type of chart selected so that I remember which chart was chosen.

Expected behavior

When the user opens the Options dialog, display the selected chart type (and optionally the renderer, although that is already visible on the right-hand side), because the chart-type drop-down does not retain the last selected chart type.

Actual behavior

Options does not display type of chart, so it is difficult to remember most suitable aggregate, x and y values for the chart.

Steps to reproduce the behavior

Select type of chart. Click Options button.

display(): "Stash to cloudant" can't edit or delete connection

As a <user type>, I want to <task> so that <goal> (make this the title)

Expected behavior

Actual behavior

Steps to reproduce the behavior

As a developer, I want a Code of Conduct so that my interaction with the community is good

Expected behavior

Actual behavior

Steps to reproduce the behavior

As a data analyst, I want to match customer zips to demographic data so that I can understand who these people are

Passing DataFrame from Python to Scala not working

When passing a DataFrame from Python to Scala using the Scala Bridge, the DataFrame comes across as null in Scala.

As a developer, I want to integrate mapping into my Notebook work so that I can do spatial analysis and visualization

These stories will highlight Mapbox, CartoDB, geopandas, etc.

As a developer, I want a comprehensive Read Me/Dev Guide so that I can learn how to contribute

Expected behavior

Actual behavior

Steps to reproduce the behavior

display(): Options > "# of rows to display" shouldn't accept 0 as valid input

Ensure python 3 compatibility

I get the following errors upon installation:

>>> import pixiedust
>>> Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/esafak/Library/Python/3.5/lib/python/site-packages/pixiedust/__init__.py", line 19, in <module>
    import packageManager
ImportError: No module named 'packageManager'

>>> from pixiedust.packageManager import PackageManager
>>> Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/esafak/Library/Python/3.5/lib/python/site-packages/pixiedust/__init__.py", line 19, in <module>
    import packageManager
ImportError: No module named 'packageManager'

I think you have an issue with relative imports.
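
The error is consistent with Python 3 dropping implicit relative imports: inside a package, import packageManager only searches top-level modules, while from . import packageManager looks in the package itself. Because the failing line sits in pixiedust/__init__.py, any import of a pixiedust submodule hits it too, which matches the second traceback. The sketch below builds two throwaway packages and imports each; pkg_implicit and pkg_explicit are hypothetical names, not PixieDust code.

```python
import importlib
import os
import sys
import tempfile


def make_package(root, name, init_source):
    """Create a package with an __init__.py and a sibling packageManager module."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(init_source)
    with open(os.path.join(pkg_dir, "packageManager.py"), "w") as f:
        f.write("LOADED = True\n")


root = tempfile.mkdtemp()
# Python 2 style: implicit relative import of a sibling module
make_package(root, "pkg_implicit", "import packageManager\n")
# Explicit relative import, valid on Python 2.7 and Python 3
make_package(root, "pkg_explicit", "from . import packageManager\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

try:
    importlib.import_module("pkg_implicit")
    implicit_ok = True
except ImportError:
    # On Python 3 the sibling module is not found as a top-level module
    implicit_ok = False

explicit = importlib.import_module("pkg_explicit")
explicit_ok = explicit.packageManager.LOADED
```

Run under Python 3, the implicit form fails with the same kind of ImportError as in the report, and the explicit form loads the sibling module.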

As a data scientist, I need Scala variables from one %%scala cell to be accessible in other %%scala cells

As a data scientist, I need Scala variables defined in one %%scala cell to be accessible in other %%scala cells so that the code looks cleaner.

Expected behavior

%%scala should not lose context or variables defined in other or previous %%scala cells.

Actual behavior

%%scala __df.show()
Output:-
pixiedustRunner.scala:20: error: not found: value __df
__df.show()
^
pixiedustRunner.scala:22: error: not found: value __df
Map( "__df"->__df )
^
two errors found
In [ ]:

Steps to reproduce the behavior

Create a new notebook with Python
Run below in one cell
%%scala
import org.apache.spark.sql._

print(sc.version)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val (__df) = sqlContext.read.json("people.json")

__df.show()

Next Cell, Use Python display() to visualize __df
display(__df)

Now in next cell run below:-
%%scala __df.show()

Thanks,
Charles.

Histogram binning

I'd like to be able to set histogram bin sizes in some way.

pixiedust.sampleData(n) doesn't load any data on DSX

I've tried several data sets using PD v0.80 on DSX.

in [6] pixiedust.sampleData(6)

Downloading 'Million dollar home sales in NE Mass late 2016' from https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv

Creating pySpark DataFrame for 'Million dollar home sales in NE Mass late 2016'. Please wait...

Even for a small data set like the one listed above (~100k) data frame creation never completes.

As a developer, I need to understand that extensibility instructs coming soon

Fill in gaps in Developer section, specify (coming soon) within topic titles.
Review and align for what's ready in 1.0 release.

Namespace pollution

This might sound a bit nit-picky, but one thing I'd like to see changed would be to avoid using
from XX import *, like from pixiedust.display import display in examples and documentation. I'm not sure what's in pixiedust (though I could look at the code), but there are probably a lot of modules that potentially have the same names as other graphing packages. Something better might be import pixiedust.display as pixd, or just use the full path: import pixiedust.display.

It would make it easier for folks to straight copy example code without worrying about namespace collisions.
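
The collision described here can be reproduced with two stand-in modules that both export a function named display. This is a sketch under stated assumptions: plotlib_a and plotlib_b are hypothetical modules built in memory, not real packages and not PixieDust code.

```python
import sys
import types

# Two stand-in modules that both export a function named display
lib_a = types.ModuleType("plotlib_a")
lib_a.display = lambda data: "plotlib_a:" + str(data)
lib_b = types.ModuleType("plotlib_b")
lib_b.display = lambda data: "plotlib_b:" + str(data)
sys.modules["plotlib_a"] = lib_a
sys.modules["plotlib_b"] = lib_b

# Star imports: the second import silently replaces the first display
star_ns = {}
exec("from plotlib_a import *\nfrom plotlib_b import *", star_ns)
star_result = star_ns["display"]([1, 2])

# Aliased imports: both functions stay reachable and unambiguous
import plotlib_a as pa
import plotlib_b as pb

aliased_results = (pa.display([1, 2]), pb.display([1, 2]))
```

With the star imports, whichever library is imported last wins and nothing warns about it; with aliases, copied example code keeps working regardless of what else the notebook has imported.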

Feel free to close this issue if you think I'm being an arse.

As a data scientist, I want to blog a Tutorial to show that Pixiedust works with Scala


pixiedust is unable to share Python values with Scala

See:

The full stacktrace:

Py4JJavaError Traceback (most recent call last)
<ipython-input-19-1128de37cfc4> in <module>()
----> 1 get_ipython().run_cell_magic(u'scala', u'', u'\nprintln(bootstrap_servers_string)\n\nimport org.apache.spark.streaming.Duration\nimport org.apache.spark.streaming.Seconds\nimport org.apache.spark.streaming.StreamingContext\nimport com.ibm.cds.spark.samples.config.MessageHubConfig\nimport com.ibm.cds.spark.samples.dstream.KafkaStreaming.KafkaStreamingContextAdapter\nimport org.apache.kafka.common.serialization.Deserializer\nimport org.apache.kafka.common.serialization.StringDeserializer\n\nval kafkaProps = new MessageHubConfig\n\nkafkaProps.setConfig("bootstrap.servers", bootstrap_servers_string.toString())\nkafkaProps.setConfig("kafka.user.name", sasl_plain_username.toString())\nkafkaProps.setConfig("kafka.user.password", sasl_plain_password.toString())\nkafkaProps.setConfig("kafka.topic", messagehub_topic_name.toString())\nkafkaProps.setConfig("api_key", api_key.toString())\nkafkaProps.setConfig("kafka_rest_url", kafka_rest_url.toString())\n\nkafkaProps.createConfiguration()\n\nval ssc = new StreamingContext( sc, Seconds(2) )\n\nval stream = ssc.createKafkaStream[String, String, StringDeserializer, StringDeserializer](\n                     kafkaProps,\n                     List(kafkaProps.getConfig("kafka.topic"))\n                     );\n\nstream.print()\nssc.start()\nssc.awaitTermination()')

/usr/local/src/bluemix_jupyter_bundle.v22/notebook/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
   2118             magic_arg_s = self.var_expand(line, stack_depth)
   2119             with self.builtin_trap:
-> 2120                 result = fn(magic_arg_s, cell)
   2121             return result
   2122 

<decorator-gen-125> in scala(self, line, cell)

/usr/local/src/bluemix_jupyter_bundle.v22/notebook/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

/gpfs/fs01/user/s85d-88ebffb000cc3e-39ca506ba762/.local/lib/python2.7/site-packages/pixiedust/utils/scalaBridge.pyc in scala(self, line, cell)
    166                 runnerObject.callMethod("set" + key[0].upper() + key[1:], val["initValue"])
    167 
--> 168         varMap = runnerObject.callMethod("runCell")
    169 
    170         #capture the return vars and update the interactive shell

/gpfs/fs01/user/s85d-88ebffb000cc3e-39ca506ba762/.local/lib/python2.7/site-packages/pixiedust/utils/javaBridge.pyc in callMethod(self, methodName, *args)
    145                             break;
    146                 if match:
--> 147                     return m.invoke(self.jHandle, jMethodArgs)
    148 
    149         raise ValueError("Method {0} that matches the given arguments not found".format(methodName) )

/usr/local/src/spark160master/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
    811         answer = self.gateway_client.send_command(command)
    812         return_value = get_return_value(
--> 813             answer, self.gateway_client, self.target_id, self.name)
    814 
    815         for temp_arg in temp_args:

/usr/local/src/spark160master/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/usr/local/src/spark160master/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    306                 raise Py4JJavaError(
    307                     "An error occurred while calling {0}{1}{2}.\n".
--> 308                     format(target_id, ".", name), value)
    309             else:
    310                 raise Py4JError(

Py4JJavaError: An error occurred while calling o513.invoke.
: java.lang.NullPointerException
    at com.ibm.pixiedust.PixiedustScalaRun$.runCell(pixiedustRunner.scala:125)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:507)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:507)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:785)

The notebook:

https://apsportal.ibm.com/analytics/notebooks/c94ee063-dab3-4440-9b90-125347ff80af/view?access_token=1c752de564c7c2df3fe924ce4122e71e658e73ff64160c80b667844527a8df88

Flight Predict notebook doesn't render in GH

I can't get this notebook to render.

https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/Flight%20Predict%20with%20Pixiedust.ipynb

Create ~/data/libs directory if not exists

On a fresh installation of pixiedust, installPixiedustJar() needs ~/data/libs to exist in the home directory; otherwise it fails with 'No such file or directory':

Cell code:
import pixiedust
pixiedust.installPackage("graphframes:graphframes:0")
pixiedust.printAllPackages()

Output:
Pixiedust database opened successfully
Pixiedust version 0.60

IOError Traceback (most recent call last)
in ()
----> 1 import pixiedust
2 pixiedust.installPackage("graphframes:graphframes:0")
3 pixiedust.printAllPackages()

/Users/chetnadwarade/pixiedust/pixiedust/__init__.py in <module>()
20 warnings.simplefilter("ignore")
21 #shortcut to logging
---> 22 import utils
23 import utils.pdLogging
24 logger = utils.pdLogging.getPixiedustLogger()

/Users/chetnadwarade/pixiedust/pixiedust/utils/__init__.py in <module>()
44
45 if copyFile:
---> 46 installPixiedustJar()

/Users/chetnadwarade/pixiedust/pixiedust/utils/__init__.py in installPixiedustJar()
32 def installPixiedustJar():
33 with pkg_resources.resource_stream(__name__, "resources/pixiedust.jar") as resJar:
---> 34 with open( jarFilePath, 'w+' ) as installedJar:
35 shutil.copyfileobj(resJar, installedJar)
36 print("Pixiedust runtime updated. Please restart kernel")

IOError: [Errno 2] No such file or directory: '/Users/chetnadwarade/data/libs/pixiedust.jar'
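One way to avoid this failure is to create the parent directory before opening the target file. The following is a minimal sketch, not the PixieDust implementation: the helper name copy_stream_to_path is hypothetical, it assumes Python 3 (where os.makedirs accepts exist_ok), and it writes in binary mode on the assumption that a jar should be copied byte for byte. The demonstration uses a temporary directory rather than the real home directory.

```python
import io
import os
import shutil
import tempfile


def copy_stream_to_path(source_stream, dest_path):
    """Copy a binary stream to dest_path, creating missing parent directories."""
    parent_dir = os.path.dirname(dest_path)
    if parent_dir:
        # exist_ok=True makes this a no-op when the directory is already there.
        os.makedirs(parent_dir, exist_ok=True)
    with open(dest_path, "wb") as dest_file:
        shutil.copyfileobj(source_stream, dest_file)
    return dest_path


# Demonstration against a throwaway directory instead of the real home directory.
scratch_home = tempfile.mkdtemp()
jar_path = os.path.join(scratch_home, "data", "libs", "pixiedust.jar")
copy_stream_to_path(io.BytesIO(b"placeholder jar bytes"), jar_path)
jar_exists = os.path.isfile(jar_path)
```

On Python 2.7, which the traceback above comes from, exist_ok is not available; an equivalent would need to catch OSError and ignore only the "already exists" case.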

As a developer, I want to test my work so that I don't make Pixiedust unstable


Missing label name in visualized pie chart

The label name is missing for a section of the pie chart.

Multi series chart

I need to visualize a multi-series bar chart. A common requirement is to visualize, say, quarterly aggregated sales across years. Can support for this be added?
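The data shape behind such a chart can be sketched in plain Python, independent of PixieDust: one series per year and one bar group per quarter. The rows below are made-up values and pivot_to_series is a hypothetical helper, both used purely for illustration.

```python
from collections import defaultdict

# Made-up (year, quarter, sales) rows, purely for illustration.
rows = [
    (2015, "Q1", 10.0), (2015, "Q2", 12.5),
    (2016, "Q1", 11.0), (2016, "Q2", 14.0),
    (2015, "Q1", 2.0),  # second row for the same cell, to show aggregation
]


def pivot_to_series(records):
    """Sum (series_key, category, value) rows into one value list per series."""
    categories = sorted({category for _, category, _ in records})
    series_keys = sorted({series_key for series_key, _, _ in records})
    totals = defaultdict(float)
    for series_key, category, value in records:
        totals[(series_key, category)] += value
    # Missing (series, category) cells default to 0.0 via the defaultdict.
    series = {
        key: [totals[(key, category)] for category in categories]
        for key in series_keys
    }
    return categories, series


categories, series = pivot_to_series(rows)
```

Each entry in series would become one set of bars, aligned position by position with categories on the x-axis.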
