ds-skills-correlation-iteration-atlanta-ds-100918's Introduction

import pandas as pd

df = pd.read_csv('causes_of_death.tsv', delimiter='\t')
print(len(df))
df.head()


	Notes	State	State Code	Ten-Year Age Groups	Ten-Year Age Groups Code	Gender	Gender Code	Race	Race Code	Deaths	Population	Crude Rate
0	NaN	Alabama	1	< 1 year	1	Female	F	American Indian or Alaska Native	1002-5	14	3579	Unreliable
1	NaN	Alabama	1	< 1 year	1	Female	F	Asian or Pacific Islander	A-PI	24	7443	322.5
2	NaN	Alabama	1	< 1 year	1	Female	F	Black or African American	2054-5	2093	169339	1236.0
3	NaN	Alabama	1	< 1 year	1	Female	F	White	2106-3	2144	347921	616.2
4	NaN	Alabama	1	< 1 year	1	Male	M	Asian or Pacific Islander	A-PI	33	7366	448.0
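The preview above shows the text value "Unreliable" in the Crude Rate column, so that column will not be read as numeric. The following is a minimal sketch of one way to check and convert it. It rebuilds three rows from the preview by hand rather than reading the file, and the names `sample` and `crude_rate` are illustrative only.

```python
import pandas as pd

# Three rows copied by hand from the preview above (only a few columns kept).
sample = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alabama"],
    "Deaths": [14, 24, 2093],
    "Population": [3579, 7443, 169339],
    "Crude Rate": ["Unreliable", "322.5", "1236.0"],
})

# A column that mixes numbers and text is stored with a non-numeric dtype.
print(sample.dtypes)

# errors="coerce" turns anything that cannot be parsed (here "Unreliable") into NaN.
crude_rate = pd.to_numeric(sample["Crude Rate"], errors="coerce")
print(crude_rate)
```

Whether to coerce, drop, or ignore such a column depends on the question being asked; Deaths and Population are already numeric in the preview.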

Practice Explorations

Group by State and sum the numeric features.

# Group by State
grouped = None  # Your code here
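As a hint for the step above, here is a minimal sketch of grouping and summing on a small made-up table. The values in `toy` are invented for illustration and are not taken from causes_of_death.tsv; selecting the numeric columns explicitly is just one possible way to leave out text columns.

```python
import pandas as pd

# Made-up numbers, for illustration only.
toy = pd.DataFrame({
    "State": ["A", "A", "B", "B", "C", "C"],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Deaths": [10, 20, 30, 50, 5, 5],
    "Population": [1000, 2000, 3000, 5000, 400, 600],
})

# Select the numeric columns before summing so text columns such as Gender are left out.
grouped = toy.groupby("State")[["Deaths", "Population"]].sum()
print(grouped)
```

The result has one row per State, with State as the index.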

Calculate the Correlation Coefficient between the Deaths and Population Columns (of your grouped dataframe)

#Your code here
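One possible way to compute the coefficient is sketched below on a tiny hand-built frame that stands in for a grouped result. The numbers are invented, and `r` is just an illustrative variable name.

```python
import pandas as pd

# Invented stand-in for a grouped result: one row per State.
grouped = pd.DataFrame(
    {"Deaths": [30, 80, 12], "Population": [3000, 8000, 1000]},
    index=pd.Index(["A", "B", "C"], name="State"),
)

# Series.corr computes the Pearson correlation coefficient by default.
r = grouped["Deaths"].corr(grouped["Population"])
print(round(r, 4))

# DataFrame.corr returns the full pairwise matrix; its off-diagonal entry is the same value.
print(grouped[["Deaths", "Population"]].corr())
```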

Repeat this process across multiple features

Iterate over the following columns: ['Race', 'Gender', 'Ten-Year Age Groups']. Within your for loop, create a temporary groupby aggregate as we did for the State column above. Then print any aggregate grouping where the correlation coefficient between Deaths and Population is less than 0.95.

#Your code here
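The loop described above could look roughly like the sketch below. It runs on invented data with only two of the three columns, and `threshold` and `low_corr` are hypothetical names chosen for this example.

```python
import pandas as pd

# Made-up numbers, for illustration only.
toy = pd.DataFrame({
    "Race": ["X", "X", "Y", "Y", "Z", "Z"],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Deaths": [10, 40, 20, 25, 90, 65],
    "Population": [1000, 1200, 2000, 2100, 3000, 3100],
})

threshold = 0.95
low_corr = {}  # feature name -> correlation, for features below the threshold

for feat in ["Race", "Gender"]:
    # Temporary aggregate: one row per category of the current feature.
    temp = toy.groupby(feat)[["Deaths", "Population"]].sum()
    r = temp["Deaths"].corr(temp["Population"])
    if r < threshold:
        low_corr[feat] = r
        print(f"{feat}: correlation {r:.3f}")
```

Note that a feature with only two groups, such as Gender in this toy table, can only produce a coefficient of +1 or -1 (or an undefined one), so the threshold check is more informative for features with several categories.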

Combining Features

We can expand on the exploration above by testing pairs of features together. Complete the code below to print any combination of features where the correlation between Population and Deaths is below 0.95 (or some other appropriate threshold).

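# This could also be accomplished with the combinations() function from the itertools module.
features = ['Race', 'Gender', 'Ten-Year Age Groups']  # same columns as in the loop above
for n, feat1 in enumerate(features):
    for feat2 in features[n + 1:]:  # start at n + 1 so a feature is never paired with itself
        # Your code here
        # Group by feat1 and feat2.
        # Repeat your code above to check whether the correlation is below a [high] threshold.
        pass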
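The itertools alternative mentioned in the comment can be sketched as follows. The data is invented, and `results` and `threshold` are hypothetical names; this is one possible shape for the loop, not a reference solution.

```python
from itertools import combinations

import pandas as pd

# Made-up numbers, for illustration only.
toy = pd.DataFrame({
    "Race": ["X", "X", "X", "X", "Y", "Y", "Y", "Y"],
    "Gender": ["Female", "Female", "Male", "Male"] * 2,
    "Ten-Year Age Groups": ["young", "old"] * 4,
    "Deaths": [5, 50, 8, 70, 3, 90, 6, 120],
    "Population": [1000, 900, 1100, 800, 2000, 1800, 2100, 1700],
})

features = ["Race", "Gender", "Ten-Year Age Groups"]
threshold = 0.95
results = {}  # (feat1, feat2) -> correlation between Deaths and Population

# combinations(features, 2) yields each unordered pair exactly once.
for feat1, feat2 in combinations(features, 2):
    temp = toy.groupby([feat1, feat2])[["Deaths", "Population"]].sum()
    r = temp["Deaths"].corr(temp["Population"])
    results[(feat1, feat2)] = r
    if r < threshold:
        print(f"{feat1} x {feat2}: correlation {r:.3f}")
```

Compared with the nested loops, `combinations` removes the index bookkeeping and cannot pair a feature with itself.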


nabil95 / ds-skills-correlation-iteration-atlanta-ds-100918
