In this lecture, you'll be introduced to the multiple linear regression model. We'll start with an introductory example using linear regression, which you've seen before, to act as a segue into multiple linear regression. We'll conclude this lecture by looking at a real data example.
You will be able to:
- Understand and explain the use cases for linear and multiple regression analyses
- Describe multiple regression with >1 predictors
You have previously learned about linear regression models. In these models, what you try to do is fit a linear relationship between two variables. Let's refresh our memory with the example below. Here, we are trying to find a relationship between seniority and monthly income. It is definitely reasonable to assume that, on average, people with more seniority have a higher income than people who are newer to the job market and have less experience. A potential relationship could look like the plot below. The monthly income is shown in 1000 USD.
#import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1234)
sen = np.random.uniform(18, 65, 100)
income = np.random.normal((sen/10), 0.5)
sen = sen.reshape(-1,1)
fig = plt.figure(figsize=(7,5))
fig.suptitle('seniority vs. income', fontsize=16)
plt.scatter(sen, income)
plt.plot(sen, sen/10, c = "black")
plt.xlabel("seniority", fontsize=14)
plt.ylabel("monthly income", fontsize=14)
plt.show();
Here, seniority is the independent variable, and monthly income is the target variable.
Of course, you know that seniority is not the only factor that drives income. Other factors that can play a role are, among others, number of hours worked, years of education, and city of employment. Now You could create simple linear regression models for each of these factors and monthly outcome as a target, but more interestingly, you can also create a model where all these potential predictors serve as independent variables at once. How does this work? Let's start with an example with two predictors. Let's start by refreshing the expression for our linear model. As a general expression for our linear model, we had that
adding in years of education as a predictor, you can extend this model to:
What exactly does that look like?
As we have two predictors here, the simple line is replaced by a plane. Our
And it obviously doesn't stop here! We can add as many predictors as we like. What is important to note, however, is that for models with more than 2 predictors representing a multiple linear model becomes very difficult and even impossible! Still, it is done all the time, as linear models of all sorts are extensively used in many fields!
When thinking of lines and slopes statistically, slope parameters associated with a particular predictor
where
Congratulations! You have gained an initial understanding for a multinomial linear regression model. To illustrate the usage of the multiple linear regression model, we'll be using the auto-mpg dataset which contains information on the technical specifications of a variety of cars. In the labs, you'll be practicing your newly gained knowledge using the Boston Housing data again.