

Evaluating Logistic Regression Models - Lab

Introduction

In regression, you are predicting continuous values, so it makes sense to discuss error as a distance: how far off your estimates were. When classifying a binary variable, however, each prediction is either correct or incorrect. As a result, we tend to quantify performance in terms of how many false positives and false negatives the model produces. In particular, we examine a few specific measurements when evaluating the performance of a classification algorithm. In this lab, you'll review precision, recall, accuracy, and F1 score in order to evaluate logistic regression models.

Objectives

In this lab you will:

  • Implement evaluation metrics from scratch using Python

Terminology review

Let's take a moment and review some classification evaluation metrics:

$$ \text{Precision} = \frac{\text{Number of True Positives}}{\text{Number of Predicted Positives}} $$

$$ \text{Recall} = \frac{\text{Number of True Positives}}{\text{Number of Actual Total Positives}} $$

$$ \text{Accuracy} = \frac{\text{Number of True Positives + True Negatives}}{\text{Total Observations}} $$

$$ \text{F1 score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} $$
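As a quick check of these formulas, here is the arithmetic for a hypothetical confusion matrix with 40 true positives, 10 false positives, 20 false negatives, and 30 true negatives (made-up counts, 100 observations in total, for illustration only):

$$ \text{Precision} = \frac{40}{40 + 10} = 0.80 $$

$$ \text{Recall} = \frac{40}{40 + 20} = \frac{2}{3} \approx 0.667 $$

$$ \text{Accuracy} = \frac{40 + 30}{100} = 0.70 $$

$$ \text{F1 score} = 2 \cdot \frac{0.80 \cdot \tfrac{2}{3}}{0.80 + \tfrac{2}{3}} = \frac{8}{11} \approx 0.727 $$

Because F1 is the harmonic mean of precision and recall, it lands closer to the smaller of the two values.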

At times, it may be best to tune a classification algorithm to optimize for precision or recall rather than overall accuracy. For example, imagine predicting whether a patient is at risk for cancer and should be brought in for additional testing. In cases such as this, we often want to cast a slightly wider net, so it is preferable to optimize for recall (the proportion of actual cancer-positive patients that the model flags) rather than precision (the proportion of patients flagged as at risk who are actually positive).
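A common way to cast that wider net is to lower the probability threshold at which a patient is flagged as positive. The pure-Python sketch below uses made-up scores and labels (not from any real model) to show the trade-off: dropping the threshold from 0.5 to 0.25 raises recall from 0.5 to 1.0 while precision falls from about 0.67 to about 0.57.

```python
# Hypothetical true labels and predicted probabilities (illustrative values only)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]


def precision_recall_at(threshold, y_true, scores):
    """Flag an observation as positive when its score is >= threshold; return (precision, recall)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# A lower threshold flags more patients: recall rises, precision drops
for threshold in (0.5, 0.25):
    p, r = precision_recall_at(threshold, y_true, scores)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```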

Split the data into training and test sets

import pandas as pd
df = pd.read_csv('heart.csv')
df.head()

Split the data first into X and y, and then into training and test sets. Assign 25% to the test set and set the random_state to 0.

# Import train_test_split


# Split data into X and y
y = None
X = None

# Split the data into a training and a test set
X_train, X_test, y_train, y_test = None
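A minimal sketch of this split, assuming the label column in heart.csv is named `target` (an assumption; check `df.columns` before relying on it). So that the snippet runs on its own, a small synthetic DataFrame with hypothetical `age` and `chol` columns stands in for the CSV:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for heart.csv: 100 rows, two feature columns, and a 'target' label.
# Column names and values are assumptions made for illustration only.
df = pd.DataFrame({
    "age": range(100),
    "chol": [200 + (i % 7) for i in range(100)],
    "target": [i % 2 for i in range(100)],
})

# Separate the label from the features
y = df["target"]
X = df.drop(columns=["target"])

# Hold out 25% of the rows for testing; random_state=0 makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

print(X_train.shape, X_test.shape)  # (75, 2) (25, 2)
```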

Build a vanilla logistic regression model

  • Import and instantiate LogisticRegression
  • Make sure you do not use an intercept term and use the 'liblinear' solver (the later cells also set C=1e20, a very large value that makes regularization negligible)
  • Fit the model to training data
# Import LogisticRegression


# Instantiate LogisticRegression
logreg = None

# Fit to training data
model_log = None
model_log
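With no intercept term, the model's probability for an observation is the sigmoid of the weighted sum of its features alone, with no bias to shift it. The pure-Python sketch below illustrates that prediction rule with hypothetical weights (not fitted to heart.csv, and not scikit-learn's internal code). One visible consequence: an all-zero feature vector always gets a probability of exactly 0.5.

```python
import math


def predict_proba_no_intercept(x, weights):
    """Logistic model without an intercept: p = sigmoid(w . x)."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, threshold=0.5):
    """Predict class 1 when the estimated probability exceeds the threshold."""
    return int(predict_proba_no_intercept(x, weights) > threshold)


weights = [0.8, -1.2]  # hypothetical coefficients for two features

print(predict_proba_no_intercept([0.0, 0.0], weights))  # 0.5: nothing shifts z away from 0
print(predict([2.0, 0.5], weights))  # z = 1.6 - 0.6 = 1.0, p ≈ 0.73, so class 1
```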

Write a function to calculate the precision

def precision(y, y_hat):
    # Your code here
    pass

Write a function to calculate the recall

def recall(y, y_hat):
    # Your code here
    pass

Write a function to calculate the accuracy

def accuracy(y, y_hat):
    # Your code here
    pass

Write a function to calculate the F1 score

def f1_score(y, y_hat):
    # Your code here
    pass

Calculate the precision, recall, accuracy, and F1 score of your classifier

Do this for both the training and test sets.

# Your code here
y_hat_train = None
y_hat_test = None

Great job! Now it's time to check your work with sklearn.

Calculate metrics with sklearn

Each of the metrics we calculated above is also available inside the sklearn.metrics module.

In the cell below, import the following functions:

  • precision_score
  • recall_score
  • accuracy_score
  • f1_score

Compare the results of your performance metrics functions above with the sklearn functions. Calculate these values for both your train and test set.

# Your code here
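One pitfall when doing this in the same notebook: `from sklearn.metrics import f1_score` would overwrite the `f1_score` function you wrote above. Importing it under an alias (here `sk_f1_score`, a name chosen for this sketch) keeps both available. The labels below are hypothetical; in the lab you would pass `y_train, y_hat_train` and `y_test, y_hat_test` instead.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.metrics import f1_score as sk_f1_score  # alias avoids shadowing a custom f1_score

# Hypothetical labels and predictions: 2 TP, 2 FN, 1 FP, 3 TN
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

print("Precision:", precision_score(y_true, y_pred))  # ≈ 0.667
print("Recall:   ", recall_score(y_true, y_pred))     # 0.5
print("Accuracy: ", accuracy_score(y_true, y_pred))   # 0.625
print("F1 score: ", sk_f1_score(y_true, y_pred))      # ≈ 0.571
```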

Nicely done! Did the results from sklearn match your own?

Compare precision, recall, accuracy, and F1 score for train vs test sets

Calculate and then plot the precision, recall, accuracy, and F1 score for the test and training splits using different training set sizes. What do you notice?

import matplotlib.pyplot as plt
%matplotlib inline
training_precision = []
testing_precision = []
training_recall = []
testing_recall = []
training_accuracy = []
testing_accuracy = []
training_f1 = []
testing_f1 = []

for i in range(10, 95):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= None) # replace the "None" here
    logreg = LogisticRegression(fit_intercept=False, C=1e20, solver='liblinear')
    model_log = None
    y_hat_test = None
    y_hat_train = None 
    
    # Your code here

Create four scatter plots: train and test precision in the first, train and test recall in the second, train and test accuracy in the third, and train and test F1 score in the fourth.

We already created the scatter plot for precision:

# Train and test precision
plt.scatter(list(range(10, 95)), training_precision, label='training_precision')
plt.scatter(list(range(10, 95)), testing_precision, label='testing_precision')
plt.legend()
plt.show()
# Train and test recall
# Train and test accuracy
# Train and test F1 score
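The remaining three plots follow the same pattern as the precision plot. A minimal sketch that draws all four metrics on one 2-by-2 grid of subplots instead of separate figures (`plot_train_test` is a hypothetical helper name introduced here):

```python
import matplotlib.pyplot as plt


def plot_train_test(x, metric_pairs):
    """Draw one scatter panel per metric.

    metric_pairs maps a metric name to a (training_values, testing_values) tuple;
    up to four metrics fit on the 2-by-2 grid.
    """
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    for ax, (name, (train_vals, test_vals)) in zip(axes.flat, metric_pairs.items()):
        ax.scatter(x, train_vals, label=f"training_{name}")
        ax.scatter(x, test_vals, label=f"testing_{name}")
        ax.set_title(f"Train and test {name}")
        ax.legend()
    fig.tight_layout()
    return fig
```

In the notebook, after the loop has filled the lists, you could call `plot_train_test(list(range(10, 95)), {"precision": (training_precision, testing_precision), "recall": (training_recall, testing_recall), "accuracy": (training_accuracy, testing_accuracy), "F1 score": (training_f1, testing_f1)})` and then `plt.show()`.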

Summary

Nice! In this lab, you calculated evaluation metrics for classification algorithms from scratch in Python. Going forward, continue to think about scenarios in which you might prefer to optimize one of these metrics over another.

Contributors

fpolchow, loredirick, mas16, mathymitchell, sumedh10, taylorhawks