
Regression Trees and Model Optimization - Lab

Introduction

In this final lab, we'll see how to apply regression analysis using CART trees, making use of some hyperparameter tuning to improve our model. We'll work with the Boston Housing dataset, which will allow us to compare different regression approaches in terms of their predictive accuracy and the computational cost involved.

Objectives

You will be able to:

  • Apply predictive regression analysis with CART trees
  • Get the data ready for modeling
  • Tune key hyperparameters based on the various models developed during training
  • Study the impact of tree pruning on the quality of predictions

Boston Housing Dataset - Again!

The dataset is available in the repo as boston.csv.

  • Load the dataset and print its head and dimensions
# Your code here 
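A minimal sketch of this step is shown below; it assumes boston.csv sits in the repo root and that pandas is installed.

import pandas as pd

# Load the CSV, then inspect the first few rows and the dimensions
data = pd.read_csv('boston.csv')  # path is an assumption; adjust if the file lives elsewhere
print(data.shape)
data.head()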

Identify Features and Target Data

In this lab, we'll use three features from the Boston housing dataset: 'RM', 'LSTAT', and 'PTRATIO'. You'll find a brief description of each predictor below:

Features

  • 'RM' is the average number of rooms among homes in the neighborhood.
  • 'LSTAT' is the percentage of homeowners in the neighborhood considered "lower class" (working poor).
  • 'PTRATIO' is the ratio of students to teachers in primary and secondary schools in the neighborhood.

Target

  • 'MEDV' is the median value of the home.

  • Create dataframes for features and target as described above.

  • Inspect the contents for validity

# Your code here 
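One way to do this is sketched below; it assumes the column names in boston.csv match the descriptions above and that data is the dataframe loaded earlier.

# Select the three predictors and the target column
features = data[['RM', 'LSTAT', 'PTRATIO']]
target = data['MEDV']

# Basic validity checks: summary statistics and missing values
print(features.describe())
print(features.isna().sum())
print(target.isna().sum())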

Inspect Correlations

  • Use scatter plots to show the correlation between chosen features and target variable
  • Comment on each scatter plot
# Your code here 
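A possible sketch, assuming matplotlib is available and features/target are the dataframes created above:

import matplotlib.pyplot as plt

# One scatter plot per predictor against the target
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, col in zip(axes, ['RM', 'LSTAT', 'PTRATIO']):
    ax.scatter(features[col], target, alpha=0.5)
    ax.set_xlabel(col)
    ax.set_ylabel('MEDV')
plt.show()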

Create Evaluation Metrics

  • Create a function performance(true, predicted) to calculate and return the R-squared score and MSE for two equal-sized arrays of true and predicted values
  • Test the function with the given data
# Evaluation Metrics
# Import metrics

def performance(y_true, y_predict):
    """ Calculates and returns the performance score between 
        true and predicted values based on the metric chosen. """
    
    
    # Your code here 
    
    
    pass

# Calculate the performance - TEST
score = performance([3, -0.5, 2, 7, 4.2], [2.5, 0.0, 2.1, 7.8, 5.3])
score

# [0.9228556485355649, 0.4719999999999998]
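A minimal implementation using scikit-learn's metrics would look something like the sketch below; it reproduces the test output shown above.

from sklearn.metrics import r2_score, mean_squared_error

def performance(y_true, y_predict):
    """ Calculates and returns the R-squared score and MSE between 
        the true and predicted values. """
    r2 = r2_score(y_true, y_predict)
    mse = mean_squared_error(y_true, y_predict)
    return [r2, mse]

performance([3, -0.5, 2, 7, 4.2], [2.5, 0.0, 2.1, 7.8, 5.3])
# [0.9228556485355649, 0.4719999999999998]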

Supervised Training

  • For supervised learning, split the features and target datasets into training/test data (80/20).
  • For reproducibility, use random_state=42
# Your code here 
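A sketch of the split, assuming scikit-learn is installed and features/target are defined as above:

from sklearn.model_selection import train_test_split

# 80/20 train/test split with a fixed seed for reproducibility
x_train, x_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42)
print(x_train.shape, x_test.shape)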

Grow a Vanilla Regression Tree

  • Run a baseline model for later comparison using the datasets created above
  • Generate predictions for test dataset and calculate the performance measures using the function created above.
  • Use random_state=45 for the tree instance
  • Record your observations
# Your code here 

# (0.4712438851035674, 38.7756862745098)  - R2, MSE
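One way to produce this baseline is sketched below; the variable names follow the train/test split sketch above.

from sklearn.tree import DecisionTreeRegressor

# Baseline regression tree with default settings
regressor = DecisionTreeRegressor(random_state=45)
regressor.fit(x_train, y_train)
y_pred = regressor.predict(x_test)
performance(y_test, y_pred)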

Hyperparameter Tuning

  • Find the best tree depth for a depth range: 1-30
  • Run the regressor repeatedly in a for loop for each depth value.
  • Use random_state=45 for reproducibility
  • Calculate MSE and r-squared for each run
  • Plot both performance measures for all runs.
  • Comment on the output
# Your code here 
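A sketch of the depth search, assuming the training/test sets and the performance() function from the earlier sketches; the plotting layout is a stylistic choice.

import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor

max_depths = range(1, 31)
r2_results, mse_results = [], []

# Fit one tree per depth value and record both performance measures
for depth in max_depths:
    regressor = DecisionTreeRegressor(max_depth=depth, random_state=45)
    regressor.fit(x_train, y_train)
    r2, mse = performance(y_test, regressor.predict(x_test))
    r2_results.append(r2)
    mse_results.append(mse)

# Plot MSE and R-squared against tree depth
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(max_depths, mse_results, 'b-o')
ax1.set_xlabel('max_depth')
ax1.set_ylabel('MSE')
ax2.plot(max_depths, r2_results, 'r-o')
ax2.set_xlabel('max_depth')
ax2.set_ylabel('R-squared')
plt.show()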

More Hyperparameter Tuning

  • Repeat the above process for min_samples_split parameter

  • Use a range of values from 2 to 10 for this parameter

  • Use random_state=45 for reproducibility

  • Visualize the output and comment on results as above

# Your code here 
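The same pattern can be reused for min_samples_split; a sketch is below, with the plotting step identical to the previous one.

from sklearn.tree import DecisionTreeRegressor

min_splits = range(2, 11)
r2_results, mse_results = [], []

# Fit one tree per min_samples_split value and record both measures
for split in min_splits:
    regressor = DecisionTreeRegressor(min_samples_split=split, random_state=45)
    regressor.fit(x_train, y_train)
    r2, mse = performance(y_test, regressor.predict(x_test))
    r2_results.append(r2)
    mse_results.append(mse)

# Plot r2_results and mse_results against min_splits as in the previous step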

Run the "Optimized" Model

  • Use the best values for max_depth and min_samples_split found in previous runs and run an optimized model with these values.
  • Calculate the performance and comment on the output
# Your code here 
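A sketch of the final model is below; best_depth and best_split are placeholders for whatever values your own tuning runs suggest.

from sklearn.tree import DecisionTreeRegressor

best_depth = 6   # placeholder: substitute the best max_depth found above
best_split = 5   # placeholder: substitute the best min_samples_split found above

regressor = DecisionTreeRegressor(max_depth=best_depth,
                                  min_samples_split=best_split,
                                  random_state=45)
regressor.fit(x_train, y_train)
performance(y_test, regressor.predict(x_test))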

Level Up - Optional

  • How about bringing in some more features from the original dataset that may be good predictors?
  • Also, try tuning more hyperparameters, like max_features, to find the optimal version of the model. A sketch of one approach follows below.
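One way to approach the level-up is a grid search over an extended feature set; the sketch below is illustrative, and the extra column names ('NOX', 'CRIM') are assumptions that should be adjusted to whatever is actually present in boston.csv.

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Hypothetical extended feature set; adjust to the columns available in the data
features_plus = data[['RM', 'LSTAT', 'PTRATIO', 'NOX', 'CRIM']]

param_grid = {
    'max_depth': range(1, 11),
    'min_samples_split': range(2, 11),
    'max_features': range(1, 6),
}

grid = GridSearchCV(DecisionTreeRegressor(random_state=45),
                    param_grid, scoring='r2', cv=5)
grid.fit(features_plus, target)
print(grid.best_params_, grid.best_score_)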

Summary

In this lab, we looked at applying decision tree based regression analysis to the Boston Housing dataset. We saw how to train various models to find the optimal values for pruning and limiting the growth of the trees. We also looked at how to extract some rules by visualizing trees, which might be used for decision making later.
