Visualizing Time Series Data - Lab

Introduction

As mentioned in the lecture, visualizations play an important role in the analysis of time series data. Plotting a time series is often the first diagnostic step, because it makes temporal structures in the data visible.

In this lab, we'll cover the main techniques for visualizing time series data in Python, again using the minimum daily temperatures recorded over 10 years (1981-1990) in Melbourne, Australia. You might remember from the lesson that the units are degrees Celsius and there are 3,650 observations. The source of the data is credited as the Australian Bureau of Meteorology.

Objectives

You will be able to:

  • Explore the temporal structure of time series with line plots
  • Understand and describe the distribution of observations using histograms and density plots
  • Measure the change in distribution over intervals using box and whisker plots and heat map plots

Let's get started!

Import the necessary libraries

# Load required libraries
# Load the data from min_temp.csv and check the index

Check the info. Next, make sure the index is the timestamp.

Check the info again
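
As a minimal sketch of the loading and indexing steps, the snippet below reads a tiny inline sample instead of min_temp.csv. The column names "Date" and "Daily_min", the date format, and the four values are assumptions made for illustration only; adjust them to match the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for min_temp.csv: column names, date format and
# values are illustrative assumptions, not taken from the real file.
sample_csv = io.StringIO(
    "Date,Daily_min\n"
    "1981-01-01,20.0\n"
    "1981-01-02,18.0\n"
    "1981-01-03,19.0\n"
    "1981-01-04,15.0\n"
)

temp_data = pd.read_csv(sample_csv)
temp_data.info()  # before: default integer index, dates are still plain text

# Parse the date strings and use them as the index
temp_data["Date"] = pd.to_datetime(temp_data["Date"], format="%Y-%m-%d")
temp_data = temp_data.set_index("Date")
temp_data.info()  # after: the index is a DatetimeIndex
```

With a DatetimeIndex in place, pandas can slice by date strings and group by calendar frequency, which the later steps of this lab rely on.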

Time Series line plot

Create a time series line plot for temp_data

# Draw a line plot using temp_data 

Some distinguishable patterns appear when we plot the data. The temperatures peak at the beginning of each year and reach their minimum around the sixth month. This is expected for Melbourne, which lies in the Southern Hemisphere, where summer falls at the turn of the year. This cyclical pattern is known as seasonality and will be covered in later labs.

Time Series dot plot

For a dense time series like the one above, you may want to change the style of the line plot so that individual observations are easier to distinguish. One option is to replace the continuous line with dots, each representing one entry in the time series. This can be achieved with the style parameter of the line plot. Let's pass style='b.' as an argument to the .plot() function.

# Use dots instead of a continuous line and redraw the time series.
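
A possible way to draw the dot version is sketched below. Since the real file is not available here, temp_data is a synthetic series (a yearly cosine cycle plus noise), which is purely an assumption for illustration; the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for temp_data (assumption): seasonal cycle plus noise
idx = pd.date_range("1981-01-01", "1990-12-31", freq="D")
rng = np.random.default_rng(0)
values = (11 + 4 * np.cos(2 * np.pi * idx.dayofyear.to_numpy() / 365.25)
          + rng.normal(0, 2.5, len(idx)))
temp_data = pd.Series(values, index=idx, name="temp")

# 'b.' is a matplotlib format string: blue colour, point markers, no line
ax = temp_data.plot(figsize=(14, 5), style="b.")
ax.set_ylabel("minimum temperature (degrees Celsius)")
```

Dropping the style argument gives the plain line plot from the previous section, so the two can be compared by changing a single parameter.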

This plot helps us identify clear outliers in certain years!

Grouping and Visualizing time series data

Now, let's group the data by year and create a line plot for each year for direct comparison. You'll regroup the data per year using pandas' Grouper (pd.Grouper).

  • Import Grouper from pandas and use it to group values by year.
  • Rearrange the data so you can create subplots for each year.
# Use pandas grouper to group values using annual frequency
#Create a new DataFrame and store yearly values in columns 
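
The grouping and rearranging steps could look like the sketch below. It again uses a synthetic stand-in for temp_data, with 29 February removed so that every year has 365 values (both are assumptions for illustration; the real file may handle leap years differently). The names year_groups and temp_annual are chosen to match the text.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for temp_data (assumption), 365 values per year
idx = pd.date_range("1981-01-01", "1990-12-31", freq="D")
idx = idx[~((idx.month == 2) & (idx.day == 29))]
rng = np.random.default_rng(0)
values = (11 + 4 * np.cos(2 * np.pi * idx.dayofyear.to_numpy() / 365.25)
          + rng.normal(0, 2.5, len(idx)))
temp_data = pd.Series(values, index=idx, name="temp")

# Group by calendar year; "YS" labels each group by the start of its year
year_groups = temp_data.groupby(pd.Grouper(freq="YS"))

# One column per year; the dates are dropped so that rows line up by day number
temp_annual = pd.concat(
    {start.year: grp.reset_index(drop=True) for start, grp in year_groups},
    axis=1,
)

# One subplot per column (i.e. per year)
axes = temp_annual.plot(subplots=True, legend=True, figsize=(13, 8))
```

Resetting each group's index is the design choice that makes the years comparable: without it, the columns would keep their original dates and not share any rows.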

You can see 10 subplots corresponding to the number of columns in your new DataFrame. Each plot is 365 days in length, following the annual frequency.

Now, plot the same plots in an overlapping way.

# Plot overlapping yearly groups 

We can see in both plots above that, due to the dense nature of the time series (365 values per year) and the high correlation between the values in different years (i.e. similar temperatures each year), we cannot clearly identify any differences between these groups. However, if you try this on the CO2 dataset used in the last lab, you should be able to see a clear trend showing an increase every year.

Time Series Histogram

Create a histogram for your data.

# Plot a histogram of the temperature dataset
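
A short sketch of the histogram step follows, with np.histogram used alongside the plot to make the binning explicit. The data is a synthetic stand-in for temp_data (an assumption for illustration), and bins=10 is passed explicitly, which is also the default of the pandas hist() method.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for temp_data (assumption): seasonal cycle plus noise
idx = pd.date_range("1981-01-01", "1990-12-31", freq="D")
rng = np.random.default_rng(0)
values = (11 + 4 * np.cos(2 * np.pi * idx.dayofyear.to_numpy() / 365.25)
          + rng.normal(0, 2.5, len(idx)))
temp_data = pd.Series(values, index=idx, name="temp")

# Same binning the plot uses: 10 equal-width bins between min and max
counts, edges = np.histogram(temp_data.to_numpy(), bins=10)
print("bin width:", edges[1] - edges[0])

ax = temp_data.hist(bins=10, figsize=(8, 5))
ax.set_xlabel("minimum temperature (degrees Celsius)")
```

Increasing bins gives a finer view of the shape at the cost of noisier bar heights.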

The plot shows a distribution that looks strongly Gaussian/Normal. By default, the plotting function splits the range of the data into 10 equal-width bins, so the bin width follows from the spread of values in the data.

Time Series Density Plots

Create a time series density plot

# Plot a density plot for temperature dataset

The density plot provides a clearer summary of the distribution of observations. The distribution appears slightly asymmetrical and perhaps a little too peaked to be Gaussian.

Time Series Box and Whisker Plots by Interval

Let's use our groups by years to plot a box and whisker plot for each year for direct comparison using boxplot().

# Generate a box and whiskers plot for temp_annual dataframe

In our plot above, we don't see much difference in the median temperature over the years. However, we can spot some outliers showing extremely cold or hot days.

We can also plot the distribution across months within a single year. Perform the following tasks to achieve this.

  1. Extract observations for year 1990 only, the last year in the dataset.

  2. Group observations by month, and add each month to a new DataFrame as a column.

  3. Create 12 box and whisker plots, one for each month of 1990.

# Use temp Dataset to extract values for 1990


# Add each month to dataFrame as a column


# Set the column names for each month i.e. 1,2,3, .., 12

# Plot the box and whiskers plot for each month 
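
The three steps listed above could be sketched as follows. As before, temp_data here is a synthetic stand-in (an assumption for illustration), and the names temp_1990 and temp_monthly are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for temp_data (assumption): seasonal cycle plus noise
idx = pd.date_range("1981-01-01", "1990-12-31", freq="D")
rng = np.random.default_rng(0)
values = (11 + 4 * np.cos(2 * np.pi * idx.dayofyear.to_numpy() / 365.25)
          + rng.normal(0, 2.5, len(idx)))
temp_data = pd.Series(values, index=idx, name="temp")

# 1. Keep only 1990 (partial string indexing on the DatetimeIndex)
temp_1990 = temp_data.loc["1990"]

# 2. Group by month ("MS" = month start) and put each month in its own column
month_groups = temp_1990.groupby(pd.Grouper(freq="MS"))
temp_monthly = pd.concat(
    {start.month: grp.reset_index(drop=True) for start, grp in month_groups},
    axis=1,
)
temp_monthly.index = range(1, 32)  # rows are days of the month, 1 to 31

# 3. One box per month; shorter months simply contribute fewer values
ax = temp_monthly.boxplot(figsize=(12, 6))
ax.set_xlabel("month of 1990")
```

Months with fewer than 31 days are padded with NaN at the bottom of their column, which boxplot() ignores when computing the quartiles.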

We see 12 box and whisker plots, showing the significant change in distribution of minimum temperatures across the months of the year from the Southern Hemisphere summer in January to the Southern Hemisphere winter in the middle of the year, and back to summer again.

Time Series Heat Maps

Let's create a heatmap of the Minimum Daily Temperatures data. The matshow() function from the matplotlib library is used as no heatmap support is provided directly in Pandas.

  1. Transpose the temp_annual DataFrame into a new matrix so that each row represents one year and each column one day.
  2. Use the matshow() function to draw a heatmap of the transposed yearly matrix.
# Transpose the yearly group DataFrame and draw a heatmap with matshow()
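
One way to carry out these two steps is sketched below. The yearly DataFrame is rebuilt from a synthetic series with 29 February removed (assumptions for illustration); year_matrix is a hypothetical name, and aspect="auto" is a choice made here so the wide 10 x 365 matrix fills the figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the yearly DataFrame (assumption), 365 rows x 10 years
idx = pd.date_range("1981-01-01", "1990-12-31", freq="D")
idx = idx[~((idx.month == 2) & (idx.day == 29))]
rng = np.random.default_rng(0)
values = (11 + 4 * np.cos(2 * np.pi * idx.dayofyear.to_numpy() / 365.25)
          + rng.normal(0, 2.5, len(idx)))
temp_data = pd.Series(values, index=idx, name="temp")
temp_annual = pd.concat(
    {start.year: grp.reset_index(drop=True)
     for start, grp in temp_data.groupby(pd.Grouper(freq="YS"))},
    axis=1,
)

# Transpose: one row per year, one column per day of the year
year_matrix = temp_annual.T

fig, ax = plt.subplots(figsize=(12, 4))
im = ax.matshow(year_matrix.to_numpy(), aspect="auto")
ax.set_yticks(range(len(year_matrix.index)))
ax.set_yticklabels(year_matrix.index)
fig.colorbar(im, ax=ax, label="minimum temperature (degrees Celsius)")
```

The colorbar is optional, but it makes it possible to read actual temperature values off the colours.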

The plot shows the cooler minimum temperatures in the middle days of each year and the warmer minimum temperatures at the start and end of each year, with all the fading and complexity in between.

Following this intuition, let's draw another heatmap comparing the months of the year in 1990. Each column represents one month, with rows representing the days of the month from 1 to 31.

# draw a heatmap comparing the months of the year in 1990.
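
A sketch of the monthly heatmap is given below, using synthetic values for 1990 (an assumption for illustration) and a hypothetical temp_monthly DataFrame with one column per month. It also counts the empty cells, which matshow() leaves blank.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the 1990 observations (assumption)
idx = pd.date_range("1990-01-01", "1990-12-31", freq="D")
rng = np.random.default_rng(0)
values = (11 + 4 * np.cos(2 * np.pi * idx.dayofyear.to_numpy() / 365.25)
          + rng.normal(0, 2.5, len(idx)))
temp_1990 = pd.Series(values, index=idx, name="temp")

# Rows are days of the month (1 to 31), columns are months (1 to 12)
temp_monthly = pd.concat(
    {start.month: grp.reset_index(drop=True)
     for start, grp in temp_1990.groupby(pd.Grouper(freq="MS"))},
    axis=1,
)
temp_monthly.index = range(1, 32)

fig, ax = plt.subplots(figsize=(6, 8))
im = ax.matshow(temp_monthly.to_numpy(), aspect="auto")
ax.set_xticks(range(12))
ax.set_xticklabels(temp_monthly.columns)

# Cells with no observation: 12 * 31 - 365 = 7 in a non-leap year
missing = int(temp_monthly.isna().sum().sum())
print("empty cells:", missing)
```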

The plot shows the same macro trend seen for each year, now at the zoomed-in level of month to month. We can also see some white patches at the bottom of the plot. These are empty cells for the months that have fewer than 31 days, with February standing out at 28 days in 1990.

Summary

In this lab, we discovered how to explore and better understand a time series dataset in Python and Pandas. We learnt how to explore temporal structure with line plots, dot plots, and plots grouped by year. We also explored the distribution of observations with histograms and density plots, and the change in distribution over intervals with box and whisker plots and heat maps.
