Linear Transformations - Lab

Introduction

In this lab, you'll practice your linear transformation skills!

Objectives

You will be able to:

  • Determine if a linear transformation would be useful for a specific model or set of data
  • Identify an appropriate linear transformation technique for a specific model or set of data
  • Apply linear transformations to independent and dependent variables in linear regression
  • Interpret the coefficients of variables that have been transformed using a linear transformation

Ames Housing Data

Let's look at the Ames Housing data, where each record represents a home sale:

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')

ames = pd.read_csv('ames.csv', index_col=0)
ames

We'll use the subset of features listed below. These are all continuous numeric variables, so their mean values should be meaningful.

From the data dictionary (data_description.txt):

LotArea: Lot size in square feet

MasVnrArea: Masonry veneer area in square feet

TotalBsmtSF: Total square feet of basement area

GrLivArea: Above grade (ground) living area square feet

GarageArea: Size of garage in square feet

ames = ames[[
    "LotArea",
    "MasVnrArea",
    "TotalBsmtSF",
    "GrLivArea",
    "GarageArea",
    "SalePrice"
]].copy()
ames

We'll also drop any records with missing values for any of these features:

ames.dropna(inplace=True)
ames

And plot the distributions of the un-transformed variables:

ames.hist(figsize=(15,10), bins="auto");

Step 1: Build an Initial Linear Regression Model

SalePrice should be the target, and all other columns in ames should be predictors.

# Your code here - build a linear regression model with un-transformed features
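As a rough illustration of what this step involves, here is a minimal sketch of an ordinary least squares fit using NumPy. It runs on synthetic stand-in data rather than `ames`, and the variable names, the two made-up predictors, and the "true" coefficients are all assumptions for demonstration only. In the lab itself you would fit the model on the `ames` columns with whichever regression library you prefer.

```python
import numpy as np

# Synthetic stand-in data (hypothetical values, NOT the Ames dataset)
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 2000, size=(n, 2))      # two area-like predictors, in square feet
true_coefs = np.array([50.0, 90.0])        # made-up dollars per square foot
y = 10_000 + X @ true_coefs + rng.normal(0, 5_000, size=n)

# Prepend a column of ones so the model includes an intercept
design = np.column_stack([np.ones(n), X])

# Ordinary least squares fit
params, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, coefs = params[0], params[1:]

# R-squared: share of the variance in y explained by the model
ss_res = np.sum((y - design @ params) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("intercept:", intercept)
print("coefficients:", coefs)
print("R-squared:", r_squared)
```

Each fitted coefficient is in dollars per unit of its predictor, which is why the unit of the predictor matters for interpretation.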

Step 2: Evaluate Initial Model and Interpret Coefficients

Describe the model performance overall and interpret the meaning of each predictor coefficient. Make sure to refer to the explanations of what each feature means from the data dictionary!

# Your written answer here

Answer:

The model overall is statistically significant and explains about 68% of the variance in sale price.

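The coefficients are all statistically significant. Each one describes the change in sale price associated with a one-unit increase in that feature, holding the other features constant.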

  • LotArea: for each additional square foot of lot area, the price increases by about \$0.26
  • MasVnrArea: for each additional square foot of masonry veneer, the price increases by about \$55
  • TotalBsmtSF: for each additional square foot of basement area, the price increases by about \$44
  • GrLivArea: for each additional square foot of above-grade living area, the price increases by about \$64
  • GarageArea: for each additional square foot of garage area, the price increases by about \$93

Step 3: Express Model Coefficients in Metric Units

Your stakeholder gets back to you and says this is great, but they are interested in metric units.

Specifically they would like to measure area in square meters rather than square feet.

Report the same coefficients, except using square meters. You can do this by building a new model, or by transforming just the coefficients.

The conversion you can use is 1 square foot = 0.092903 square meters.

# Your code here - building a new model or transforming coefficients
# from initial model so that they are in square meters
# Your written answer here

Answer:
  • LotArea: for each additional square meter of lot area, the price increases by about \$2.76
  • MasVnrArea: for each additional square meter of masonry veneer, the price increases by about \$593
  • TotalBsmtSF: for each additional square meter of basement area, the price increases by about \$475
  • GrLivArea: for each additional square meter of above-grade living area, the price increases by about \$687
  • GarageArea: for each additional square meter of garage area, the price increases by about \$1,006
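The two routes mentioned in this step (refitting on converted predictors, or converting just the coefficients) should agree. The sketch below checks that on synthetic stand-in data. The helper `fit_ols` and all of the data values are hypothetical and not part of the lab; only the conversion factor comes from the text above.

```python
import numpy as np

SQFT_TO_SQM = 0.092903  # 1 square foot = 0.092903 square meters (from the lab text)

# Synthetic stand-in data (hypothetical values, NOT the Ames dataset)
rng = np.random.default_rng(1)
n = 200
X_sqft = rng.uniform(0, 2000, size=(n, 2))
y = 10_000 + X_sqft @ np.array([50.0, 90.0]) + rng.normal(0, 5_000, size=n)

def fit_ols(X, y):
    """Hypothetical helper: return (intercept, coefficients) from a least squares fit."""
    design = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]

intercept_sqft, coefs_sqft = fit_ols(X_sqft, y)

# Option 1: convert the predictors to square meters and refit
X_sqm = X_sqft * SQFT_TO_SQM
intercept_sqm, coefs_sqm = fit_ols(X_sqm, y)

# Option 2: divide the original coefficients by the conversion factor
coefs_converted = coefs_sqft / SQFT_TO_SQM

print("refit on square meters:", coefs_sqm)
print("converted coefficients:", coefs_converted)
```

Multiplying a predictor by a constant divides its coefficient by that constant, because a square meter is a larger unit than a square foot and so a one-unit change corresponds to a larger change in price. The intercept and the model fit are unchanged.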

Step 4: Center Data to Provide an Interpretable Intercept

Your stakeholder is happy with the metric results, but now they want to know what's happening with the intercept value. Negative \$17k for a home with zeros across the board...what does that mean?

Center the predictors so that each has a mean of 0 (leave SalePrice as-is), fit a new model, and report on the new intercept.

(It doesn't matter whether you use data that was scaled to metric units or not. The intercept should be the same either way.)

# Your code here - center data
# Your code here - build a new model
# Your written answer here - interpret the new intercept

Answer:

The new intercept is about \$181k. This means that a home with average lot area, average masonry veneer area, average total basement area, average above-grade living area, and average garage area would sell for about \$181k.
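To see why centering produces this kind of intercept, here is a minimal sketch on synthetic stand-in data. The helper `fit_ols` and the data values are hypothetical. It subtracts each predictor's mean, refits, and compares the new intercept with the mean of the target.

```python
import numpy as np

# Synthetic stand-in data (hypothetical values, NOT the Ames dataset)
rng = np.random.default_rng(2)
n = 200
X = rng.uniform(0, 2000, size=(n, 2))
y = 10_000 + X @ np.array([50.0, 90.0]) + rng.normal(0, 5_000, size=n)

def fit_ols(X, y):
    """Hypothetical helper: return (intercept, coefficients) from a least squares fit."""
    design = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]

intercept_raw, coefs_raw = fit_ols(X, y)

# Center each predictor by subtracting its own mean; the target is left as-is
X_centered = X - X.mean(axis=0)
intercept_centered, coefs_centered = fit_ols(X_centered, y)

print("raw intercept:     ", intercept_raw)
print("centered intercept:", intercept_centered)
print("mean of target:    ", y.mean())
```

Centering shifts the predictors without rescaling them, so the slopes stay the same and only the intercept moves. With every predictor at its mean (now 0), the prediction is the intercept, which for a least squares fit equals the mean of the target.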

Step 5: Identify the "Most Important" Feature

Finally, either build a new model with transformed predictors or transform the coefficients from the Step 4 model so that the most important feature can be identified.

Even though all of the features are measured in area, they are different kinds of area (e.g. lot area vs. masonry veneer area) that are not directly comparable as-is. So apply standardization (dividing predictors by their standard deviations) and identify the feature with the highest standardized coefficient as the "most important".

# Your code here - building a new model or transforming coefficients
# from centered model so that they are in standard deviations
# Your written answer here - identify the "most important" feature

Answer:

The feature with the highest standardized coefficient is GrLivArea. This means that above-grade living area is most important.
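The sketch below illustrates both routes to standardized coefficients on synthetic stand-in data. The predictor names `lot_area` and `living_area`, the helper `fit_ols`, and all values are hypothetical and chosen so that the predictor with the smaller raw coefficient also has a much larger spread.

```python
import numpy as np

# Synthetic stand-in data (hypothetical values, NOT the Ames dataset)
rng = np.random.default_rng(3)
n = 200
lot_area = rng.uniform(0, 20_000, size=n)     # wide spread, small effect per unit
living_area = rng.uniform(0, 2_000, size=n)   # narrow spread, large effect per unit
X = np.column_stack([lot_area, living_area])
y = 10_000 + X @ np.array([0.5, 60.0]) + rng.normal(0, 5_000, size=n)

def fit_ols(X, y):
    """Hypothetical helper: return (intercept, coefficients) from a least squares fit."""
    design = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]

X_centered = X - X.mean(axis=0)
_, coefs_centered = fit_ols(X_centered, y)

stds = X.std(axis=0)  # NumPy default: population standard deviation (ddof=0)

# Option 1: divide the centered predictors by their standard deviations and refit
_, coefs_refit = fit_ols(X_centered / stds, y)

# Option 2: multiply the centered-model coefficients by the standard deviations
coefs_standardized = coefs_centered * stds

names = ["lot_area", "living_area"]
most_important = names[int(np.argmax(np.abs(coefs_standardized)))]
print("standardized coefficients:", dict(zip(names, coefs_standardized)))
print("largest standardized coefficient:", most_important)
```

A standardized coefficient is the change in the target per one standard deviation of the predictor, which puts predictors with different spreads on a common scale. Note that pandas' `.std()` defaults to the sample standard deviation (`ddof=1`) while NumPy defaults to `ddof=0`; the choice rescales every coefficient by the same factor, so the ranking is unaffected.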

Summary

Great! You've now got some hands-on practice transforming data and interpreting the results!

Contributors

loredirick, hoffm386, cheffrey2000, mas16, sumedh10, fpolchow, lmcm18
