The Probability Density Function (PDF) - Lab

Introduction

In this lab we will look at building visualizations known as density plots to estimate the probability density for a given set of data.

Objectives

You will be able to:

  • Calculate the PDF from a given dataset containing real-valued random variables
  • Plot density functions and comment on the shape of the plot
  • Plot density functions using seaborn

Let's get started!

We'll import all the required libraries for you for this lab.

# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import pandas as pd 

Import the dataset 'weight-height.csv' as a pandas DataFrame. Calculate the mean and standard deviation of height and weight for males and females separately.

Hint: Use your pandas DataFrame subsetting skills, such as the .loc and .iloc indexers and groupby().

data = None
male_df =  None
female_df =  None

  

# Male Height mean: 69.02634590621737
# Male Height sd: 2.8633622286606517
# Male Weight mean: 187.0206206581929
# Male Weight sd: 19.781154516763813
# Female Height mean: 63.708773603424916
# Female Height sd: 2.696284015765056
# Female Weight mean: 135.8600930074687
# Female Weight sd: 19.022467805319007
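One possible way to approach the subsetting and summary statistics is sketched below. Because the CSV itself is not reproduced here, a small synthetic DataFrame stands in for it; the column names `Gender`, `Height` and `Weight` are assumptions about the layout of 'weight-height.csv', and the numbers it prints will not match the expected values above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for weight-height.csv. The column names 'Gender',
# 'Height' and 'Weight' are assumed, not confirmed by the lab text.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Gender": ["Male"] * 500 + ["Female"] * 500,
    "Height": np.concatenate([rng.normal(69.0, 2.9, 500), rng.normal(63.7, 2.7, 500)]),
    "Weight": np.concatenate([rng.normal(187.0, 19.8, 500), rng.normal(135.9, 19.0, 500)]),
})

# Boolean-mask subsetting gives one DataFrame per group
male_df = data[data["Gender"] == "Male"]
female_df = data[data["Gender"] == "Female"]

print("Male Height mean:", male_df["Height"].mean())
print("Male Height sd:", male_df["Height"].std())

# groupby computes the same statistics for every group and column at once
summary = data.groupby("Gender")[["Height", "Weight"]].agg(["mean", "std"])
print(summary)
```

With the real file, the synthetic frame would be replaced by `data = pd.read_csv('weight-height.csv')`, and the rest should carry over if the assumed column names hold.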

Plot overlapping normalized histograms for male and female heights. Use 10 bins (bins=10) and set the alpha level so that the overlap can be visualized.
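A minimal sketch of such a plot follows, assuming synthetic heights in place of the male and female columns of the dataset. It relies on matplotlib's `density=True` option, which rescales each histogram so that its total bar area is 1, and on an `alpha` below 1 so both histograms stay visible where they overlap.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic heights standing in for the male and female Height columns
rng = np.random.default_rng(1)
male_heights = rng.normal(69.0, 2.9, 5000)
female_heights = rng.normal(63.7, 2.7, 5000)

fig, ax = plt.subplots(figsize=(8, 5))
# density=True normalizes each histogram to unit area;
# alpha=0.7 makes the bars semi-transparent so the overlap shows
m_counts, m_edges, _ = ax.hist(male_heights, bins=10, density=True,
                               alpha=0.7, label="Male")
f_counts, f_edges, _ = ax.hist(female_heights, bins=10, density=True,
                               alpha=0.7, label="Female")
ax.set_xlabel("Height")
ax.set_ylabel("Density")
ax.legend()
```

Because each histogram is normalized on its own, the two groups can be compared by shape and position even if they contain different numbers of observations.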


# Record your observations - are these in line with your personal observations?

Write a function density() that takes in a random variable and calculates the density function using np.histogram and interpolation. The function should return two lists carrying the x and y coordinates for plotting the density function.

def density(x):
    
    pass
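One possible reading of the task is sketched below, not necessarily the intended solution. It takes "interpolation" to mean placing each bin's density at the midpoint of that bin, so that a line plot joins the points with straight segments. The `bins` parameter is an addition for illustration, and the function returns NumPy arrays rather than Python lists, which `plt.plot` accepts in the same way.

```python
import numpy as np

def density(x, bins=10):
    """Approximate a density curve from a normalized histogram.

    Sketch only: each bin's density is assigned to the bin midpoint.
    The `bins` argument is an assumption added for illustration.
    """
    # density=True scales the counts so the histogram integrates to 1
    heights, edges = np.histogram(x, bins=bins, density=True)
    # Midpoint of each bin: the average of its left and right edges
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, heights

# Same test data as in the lab's commented-out test
np.random.seed(5)
s = np.random.normal(0, 0.1, 100)
x, y = density(s)
```

A quick sanity check is that the returned heights, multiplied by the common bin width, sum to 1.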



# Generate test data and test the function - uncomment to run the test
# np.random.seed(5)
# mu, sigma = 0, 0.1 # mean and standard deviation
# s = np.random.normal(mu, sigma, 100)
# x,y = density(s)
# plt.plot(x,y, label = 'test')
# plt.legend()

Add overlapping density plots for male and female heights to the histograms plotted earlier.

# Your code here
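A minimal sketch of the overlay, assuming synthetic heights in place of the dataset columns and a midpoint-based `density()` helper defined inline so the block runs on its own. The helper is one possible implementation, not necessarily the intended one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

def density(x, bins=10):
    # Hypothetical helper: normalized histogram heights at bin midpoints
    heights, edges = np.histogram(x, bins=bins, density=True)
    return 0.5 * (edges[:-1] + edges[1:]), heights

# Synthetic heights standing in for the male and female Height columns
rng = np.random.default_rng(2)
male_heights = rng.normal(69.0, 2.9, 5000)
female_heights = rng.normal(63.7, 2.7, 5000)

fig, ax = plt.subplots(figsize=(8, 5))
for label, values in (("Male", male_heights), ("Female", female_heights)):
    # Normalized histogram first, then the density curve on the same axes
    ax.hist(values, bins=10, density=True, alpha=0.7, label=label)
    x, y = density(values)
    ax.plot(x, y, label=f"{label} density")
ax.set_xlabel("Height")
ax.legend()
```

Since the curve is built from the same 10 bins as the histogram, each point of the line sits at the top center of a bar.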

Repeat the above exercise for male and female weights.

# Your code here 

Write your observations in the cell below.

# Record your observations - are these in line with your personal observations?


# So what's the takeaway when comparing male and female heights and weights?

Repeat the above experiments in seaborn and compare with your results.

[Figure: seaborn density plot, titled 'Comparing weights']

[Figure: seaborn density plot, titled 'Comparing Weights']

# Your comments on the two approaches here.
# Are they similar? If they differ, what makes them different?

Summary

In this lab we saw how to build probability density curves for given datasets and how to compare distributions visually by looking at the spread, center, and overlap of the data. This is a useful EDA technique that can answer some initial questions before embarking on a more complex analysis.

Contributors: shakeelraja, loredirick, erdosn
