ibm / visualize-data-with-python

A Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.

Home Page: https://developer.ibm.com/patterns/visualize-data-with-python/

License: Apache License 2.0

Jupyter Notebook 100.00%
notebook data-science pixiedust dsx ibm spark journey jupyter-notebook ibmcode ibm-developer-technology-cognitive

visualize-data-with-python's Introduction

WARNING: This repository is no longer maintained

This repository will not be updated. The repository will be kept available in read-only mode.

Visualize and analyze data from the 2017 flood in Houston, TX using a Jupyter Notebook on IBM Watson Studio

In this Code Pattern we will use some standard techniques for data science and data engineering running on IBM Watson Studio to analyze publicly available data for the 2017 flooding in Houston, TX. Watson Studio is an interactive, collaborative, cloud-based environment where data scientists, developers, and others interested in data science can use tools (e.g., RStudio, Jupyter Notebooks, Spark, etc.) to collaborate, share, and gather insight from their data.

When the reader has completed this Code Pattern, they will understand how to:

The intended audience for this Code Pattern is application developers and other stakeholders who wish to utilize the power of Data Science quickly and effectively.

Flow

architecture

  1. Load the Jupyter notebook onto the IBM Watson Studio platform.
  2. USGS data from the Houston flood of 2017 is loaded into the notebook.
  3. The notebook is used to clean the data, and then display it.
  4. A PixieApp dashboard is created and can be interacted with.
  5. Mapbox and Folium are used for map visualizations.

Included technologies

  • IBM Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
  • Jupyter Notebooks: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
  • PixieDust: Python helper library for Python notebooks.
  • PixieApps: Python library used to write UI elements for analytics, and run them directly in a Jupyter notebook.
  • Mapbox GL: JavaScript library that uses WebGL to render interactive maps.

Prerequisites

Steps

Follow these steps to set up and run this Code Pattern. The steps are described in detail below.

  1. Sign up for Watson Studio
  2. Create the notebook
  3. Run the notebook

1. Sign up for Watson Studio

Sign up for IBM's Watson Studio. Creating a project in Watson Studio also creates a free-tier Object Storage service in your IBM Cloud account. Take note of your service names, as you will need to select them in the following steps.

Note: When creating your Object Storage service, select the Free storage type in order to avoid having to pay an upgrade fee.

2. Create the notebook

  • In Watson Studio, click New Project + under Projects or, at the top of the page, click + New, choose the tile for Data Science, and then click Create Project.
  • In the project you've created, click + Add to project and choose the Notebook tile, or, in the Assets tab under Notebooks, choose + New notebook to create a notebook.
  • Select the From URL tab. [1]
  • Enter a name for the notebook. [2]
  • Optionally, enter a description for the notebook. [3]
  • Under Notebook URL provide the following url: https://raw.githubusercontent.com/IBM/visualize-data-with-python/master/notebooks/HoustonFlood2017.ipynb [4]
  • For Runtime select the Spark Python 3.6 option. [5]
  • Click the Create notebook button. [6]

Create Notebook

3. Run the notebook

NOTE: There are points in the notebook where you will have to enter your Mapbox Token to render the map.

When a notebook is executed, each code cell runs in order, from top to bottom.

Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:

  • A blank: the cell has never been executed.
  • A number: the relative order in which this cell was executed.
  • A *: the cell is currently executing.
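The tags described above correspond to the `execution_count` field that the notebook file stores for each code cell. The following is a minimal sketch, assuming the standard nbformat 4 JSON layout; the tiny inline notebook and the `cell_tags` helper are hypothetical and only serve as illustration. The `*` state exists only while a cell is running in the live session, so it does not appear in a saved file.

```python
import json

# Hypothetical, minimal notebook in nbformat 4 layout (not the Houston notebook).
notebook_json = """
{
  "nbformat": 4,
  "nbformat_minor": 2,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "code", "metadata": {}, "source": ["x = 1"],
     "execution_count": 1, "outputs": []},
    {"cell_type": "code", "metadata": {}, "source": ["x + 1"],
     "execution_count": null, "outputs": []}
  ]
}
"""


def cell_tags(notebook):
    """Return the In [x]: tag for every code cell, in document order."""
    tags = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown cells have no execution tag
        count = cell.get("execution_count")
        # A null count means the cell has never been executed.
        tags.append("In [ ]:" if count is None else f"In [{count}]:")
    return tags


tags = cell_tags(json.loads(notebook_json))
print(tags)
```

To inspect a downloaded notebook instead, the same helper could be applied to `json.load(open(path))` for a local `.ipynb` file.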

There are several ways to execute the code cells in your notebook:

  • One cell at a time.
    • Select the cell, and then press the Play button in the toolbar.
  • Batch mode, in sequential order.
    • From the Cell menu, several options are available. For example, you can Run All cells in your notebook, or you can Run All Below, which starts executing from the first cell under the currently selected cell and then continues through all cells that follow.
  • At a scheduled time.
    • Press the Schedule button located in the top right section of your notebook panel. Here you can schedule your notebook to be executed once at some future time, or repeatedly at your specified interval.

Sample Output

Note: Some interactive map functionality, like Options and Layers, will not work. To see these, you must run the notebook itself.

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Apache Software License (ASL) FAQ

visualize-data-with-python's People

Contributors

dolph, imgbot[bot], jamaya2001, kant, ljbennett62, markstur, rajrsingh, rhagarty, sanjeevghimire, scottdangelo, stevemar, stevemart, tanmayb123, yamachan

visualize-data-with-python's Issues

2.6.x Plot the Discharge against time gives Warning

plt.plot(df['datetime'],df['Discharge(cfs)'])
plt.title('Houston Flood discharge at Hunting Bayou stream gauge')
plt.ylabel('Discharge(cfs)')
plt.xlabel('datetime')
ax = plt.gca()
df.set_index('datetime')

# Only label every 100th value
ticks_to_use = df.index[::100]
# label ticks per day
dr = pd.date_range('2017-08-23', periods=9, freq='D')

## Now set the ticks and labels
ax.set_xticks(ticks_to_use)
ax.set_xticklabels(dr)
plt.xticks(rotation='vertical')

plt.show()

yields:

  (prop.get_family(), self.defaultFamily[fontext]))

change `convert_objects()` to `Series.infer_objects()`

df['GuageHeight(feet)'] = df['GuageHeight(feet)'].convert_objects(convert_numeric=True)
df['Discharge(cfs)'] = df['Discharge(cfs)'].convert_objects(convert_numeric=True)

yields:

/Users/scott/.pyenv/versions/3.6.5/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: convert_objects is deprecated.  To re-infer data dtypes for object columns, use Series.infer_objects()
For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  """Entry point for launching an IPython kernel.
/Users/scott/.pyenv/versions/3.6.5/lib/python3.6/site-packages/ipykernel_launcher.py:2: FutureWarning: convert_objects is deprecated.  To re-infer data dtypes for object columns, use Series.infer_objects()
For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
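One possible replacement, following the hint in the warning itself, is `pd.to_numeric` with `errors="coerce"`, which turns values that cannot be parsed into NaN. Note that `Series.infer_objects()` only performs a soft conversion and leaves string values as `object`, so it may not be a drop-in substitute here. The sketch below uses a small made-up frame; the sample values (including the "Ice" marker) are assumptions, not data from the notebook.

```python
import pandas as pd

# Hypothetical sample standing in for the gauge data; "Ice" is an
# assumed non-numeric marker for a missing reading.
df = pd.DataFrame({
    "GuageHeight(feet)": ["12.5", "13.1", "Ice"],
    "Discharge(cfs)": ["340", "Ice", "410"],
})

for column in ["GuageHeight(feet)", "Discharge(cfs)"]:
    # errors="coerce" converts unparseable strings to NaN instead of raising.
    df[column] = pd.to_numeric(df[column], errors="coerce")

print(df.dtypes)
```

Both columns end up with a floating-point dtype, because NaN cannot be stored in an integer column.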

problem with PixieDust versioning

From Armen Pischdotchian
Imported the notebook
https://raw.githubusercontent.com/IBM/pixiedust-traffic-analysis/master/notebooks/pixiedust-traffic-analysis.ipynb
On Cell 5, where it did the download, the error occurred. Importing it from a local file worked, but the instructions on GitHub no longer apply after that first display: they import libraries from pyspark, yet no such animal exists if you download from File.... or at least that is what I understood.
Basically, filter for all tutorials that have PixieDust in them, and therein lies the same error. I think it has to do with PixieDust versioning that has become incompatible with the latest Python and Spark versions.

Notebook is broken: Not able to open

Unreadable Notebook: /Users/mangeshpatankar/Documents/2018/Q2_2018/callforcode/city/pixiedust-traffic-analysis.ipynb NotJSONError("Notebook does not appear to be JSON: '\n\n\n\n\n\n\n<html lang...",)
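The error text begins with `<html lang...`, which suggests (this is an assumption, not something the report confirms) that an HTML page was saved instead of the raw `.ipynb` JSON. A minimal sketch of a check that could be run on a downloaded file before opening it; the `looks_like_notebook` helper and the sample strings are hypothetical:

```python
import json


def looks_like_notebook(text):
    """Return True if text parses as JSON with a top-level 'cells' list."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False  # e.g. an HTML page saved under an .ipynb name
    return isinstance(data, dict) and isinstance(data.get("cells"), list)


# Hypothetical samples: an HTML page and a minimal, empty notebook.
html_page = '\n\n\n<html lang="en"><body>page</body></html>'
raw_notebook = '{"nbformat": 4, "nbformat_minor": 2, "metadata": {}, "cells": []}'

print(looks_like_notebook(html_page))
print(looks_like_notebook(raw_notebook))
```

If the check fails, downloading the file again from the raw.githubusercontent.com URL given in the steps, rather than from the repository's HTML view, may resolve it.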

bar chart errors on undefined value

I am getting an error when trying to visualize the data as a bar chart:
Invalid html output: undefined is not an object (evaluating 'this._mapping[t[0]].value')
