In this lab, you'll continue to formalize your knowledge of gradient descent by coding the algorithm yourself. In the upcoming labs, you'll apply similar procedures to implement logistic regression on your own.
In this lab you will:
- Implement gradient descent from scratch to minimize OLS
To practice gradient descent, you'll investigate a simple regression case in which you're looking to minimize the Residual Sum of Squares (RSS) between the predictions and the actual values. Remember that this is referred to as Ordinary Least Squares (OLS) regression. You'll compare two simplistic models and use gradient descent to improve upon these initial models.
- Import the file
'movie_data.xlsx'
using Pandas - Print the first five rows of the data
You can use the
read_excel()
function to import an Excel file.
# Import the data
import pandas as pd
df = None
# Print the first five rows of the data
Imagine someone is attempting to predict the domestic gross sales of a movie based on the movie's budget, or at least further investigate how these two quantities are related. Two models are suggested and need to be compared.
The two models are:
Here's a graph of the two models along with the actual data:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
x = np.linspace(start=df['budget'].min(), stop=df['budget'].max(), num=10**5)
plt.scatter(x, 1.575*x, label='Mean Ratio Model') # Model 1
plt.scatter(x, 1.331*x, label='Median Ratio Model') # Model 2
plt.scatter(df['budget'], df['domgross'], label='Actual Data Points')
plt.title('Gross Domestic Sales vs. Budget', fontsize=18)
plt.xlabel('Budget', fontsize=16)
plt.ylabel('Gross Domestic Sales', fontsize=16)
plt.legend(bbox_to_anchor=(1, 1))
plt.show()
To compare the two models (and future ones), a metric for evaluating and comparing models to each other is needed. Traditionally, this is the residual sum of squares. As such you are looking to minimize $ \sum(\hat{y}-y)^2$.
Write a function rss()
which calculates the residual sum of squares for a simplistic model:
def rss(m, X=df['budget'], y=df['domgross']):
pass
Which of the two models is better?
# Your code here
# Your response here
Now that you have a loss function, you can use numerical methods to find a minimum to the loss function. By minimizing the loss function, you have achieved an optimal solution according to the problem formulation. Here's the outline of gradient descent from the previous lesson:
- Define initial parameters:
- pick a starting point
- pick a step size
$\alpha$ (alpha) - choose a maximum number of iterations; the algorithm will terminate after this many iterations if a minimum has yet to be found
- (optionally) define a precision parameter; similar to the maximum number of iterations, this will terminate the algorithm early. For example, one might define a precision parameter of 0.00001, in which case if the change in the loss function were less than 0.00001, the algorithm would terminate. The idea is that we are very close to the bottom and further iterations would make a negligible difference
- Calculate the gradient at the current point (initially, the starting point)
- Take a step (of size alpha) in the direction of the gradient
- Repeat steps 2 and 3 until the maximum number of iterations is met, or the difference between two points is less then your precision parameter
To start, visualize the cost function. Plot the cost function output for a range of m values from -3 to 5.
# Your code here
As you can see, this is a simple cost function. The minimum is clearly around 1. With that, it's time to implement gradient descent in order to find the optimal value for m.
# Set a starting point
cur_x = None
# Initialize a step size
alpha = None
# Initialize a precision
precision = 0.0000001
# Helpful initialization
previous_step_size = 1
# Maximum number of iterations
max_iters = 10000
# Iteration counter
iters = 0
# Create a loop to iterate through the algorithm until either the max_iteration or precision conditions is met
# Your code here; create a loop as described above
# Calculate the gradient. This is often done by hand to reduce computational complexity.
# For here, generate points surrounding your current state, then calculate the rss of these points
# Finally, use the np.gradient() method on this survey region.
# This code is provided here to ease this portion of the algorithm implementation
x_survey_region = np.linspace(start = cur_x - previous_step_size , stop = cur_x + previous_step_size , num = 101)
rss_survey_region = [np.sqrt(rss(m)) for m in x_survey_region]
gradient = np.gradient(rss_survey_region)[50]
# Update the current x, by taking an "alpha sized" step in the direction of the gradient
# Update the iteration number
# The output for the above will be: ('The local minimum occurs at', 1.1124498053361267)
print("The local minimum occurs at", cur_x)
Replot the RSS cost curve as above. Add a red dot for the minimum of this graph using the solution from your gradient descent function above.
# Your code here
In this lab, you coded up a gradient descent algorithm from scratch! In the next lab, you'll apply this to logistic regression in order to create a full implementation yourself!