pandas

Materials for teaching the introductory pandas workshop at UC Berkeley's D-Lab.

Set Up

For this workshop we'll be using a Jupyter notebook.

Software for the workshop

You will get the most out of the workshop if you can edit and run the code yourself, so please install pandas, Matplotlib, and Jupyter (or IPython) beforehand. There are several options for setting up your environment.

  1. Anaconda with Python 3.5+ (2.7 is okay).
  2. Python 3.5+ (2.7 is okay) and required packages installed using a package manager, such as conda (via Miniconda) or pip; you must install IPython 3.0+ with notebook support or IPython 4.0+/Jupyter 1.0+, pandas 0.17+, and Matplotlib 1.3+.
  3. (Perhaps as a last resort) BCE Summer 2015.

Both the Anaconda and BCE distributions install everything you need for this workshop (but BCE will most likely be out of date). If you decide to use pip, you can do the following (with Miniconda, use conda install in place of pip install and leave out --upgrade, which is a pip option):

# Install pandas and Matplotlib
$ pip install pandas matplotlib

# Install Jupyter
$ pip install --upgrade jupyter
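
To confirm that the installation worked, a short check along the following lines can be run in a Python session. This is a minimal sketch, not part of the workshop materials: the helper names version_tuple and check_environment are made up for illustration, and the minimum versions are the ones listed in the options above.

```python
import importlib

# Minimum versions taken from the set-up options above.
MINIMUMS = {"pandas": "0.17", "matplotlib": "1.3"}


def version_tuple(version):
    """Turn a string such as '1.5.3' or '2.0.0rc1' into a tuple of ints."""
    numbers = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        numbers.append(int(digits) if digits else 0)
    return tuple(numbers)


def check_environment(minimums=MINIMUMS):
    """Return {package: status}; status is 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        installed = getattr(module, "__version__", "0")
        is_ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if is_ok else "too old"
    return report


# Print one line per package, for example "pandas ok".
for package, status in check_environment().items():
    print(package, status)
```

A package reported as missing or too old can be installed or upgraded with the commands above.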

Files for the workshop

Once those are installed, you should get the necessary files for this workshop, which are contained in this repository. Get them by doing the following:

# Clone this repository
$ git clone https://github.com/dlab-berkeley/introduction-to-pandas.git

# Navigate to the repo
$ cd introduction-to-pandas

# Start the interactive session
$ jupyter notebook

# ...alternatively (older versions of IPython)
$ ipython notebook

Outline

For this workshop, we'll go through an example using European unemployment data. We'll load, view, and modify the data as well as calculate some descriptive statistics. The idea is to get a sense of what it would be like to use pandas as part of your workflow.

We plan to cover:

  • pandas data structures
  • loading data
  • subsetting and filtering
  • calculating summary statistics
  • dealing with missing values
  • merging data sets
  • creating new variables
  • basic plotting
  • exporting data
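
To give a rough feel for these steps before the workshop, here is a minimal sketch that touches most of the topics above. It is not taken from the workshop notebooks: the column names, country codes, and numbers are made up for illustration, and the data is defined inline rather than loaded from the repository's files. Plotting is left out so the sketch runs without Matplotlib.

```python
import io

import pandas as pd

# Made-up values for illustration only; the workshop uses real data files.
rates_csv = io.StringIO(
    "country,year,unemployment_rate\n"
    "at,2010,4.0\n"
    "at,2011,\n"
    "be,2010,8.0\n"
    "be,2011,7.0\n"
)
names = pd.DataFrame({"country": ["at", "be"], "name": ["Austria", "Belgium"]})

# Loading data: read_csv returns a DataFrame; each column is a Series.
rates = pd.read_csv(rates_csv)

# Subsetting and filtering with a boolean mask.
rates_2010 = rates[rates["year"] == 2010]

# Summary statistics: mean() skips missing values by default.
mean_by_country = rates.groupby("country")["unemployment_rate"].mean()

# Missing values: count them, then drop the incomplete rows.
n_missing = int(rates["unemployment_rate"].isnull().sum())
complete = rates.dropna(subset=["unemployment_rate"])

# Merging data sets on a shared key column.
merged = complete.merge(names, on="country", how="left")

# Creating a new variable from an existing column.
merged["rate_fraction"] = merged["unemployment_rate"] / 100

# Exporting: to_csv returns a string when no path is given.
csv_text = merged.to_csv(index=False)
print(csv_text)
```

Passing a file name to to_csv writes the result to disk instead of returning it as a string.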

Further resources

pandas Documentation
