This repo compiles work exploring possible features for improving Zillow models that predict tax assessed values of single family properties sold in 2017.
The goal of this project is to gain insight into how to improve models that predict the property tax assessed values of Single Family Properties that had a transaction in 2017.
Providing accurate data to our customer base is of utmost importance. Here at Zillow we pride ourselves on consistent improvement. Improving our predictive models can help us keep an edge in this highly competitive online space. In this project I will explore which features we can use to improve our current models for predicting tax assessed values of Single Family Properties that had a transaction in 2017.
- Does county affect price?
- Does the size of the house (area) affect price?
- Does the age affect price?
- Does the number of bedrooms or bathrooms affect price?
- In exploration, area and year_built were good indicators of taxable_value. Bathroom and bedroom counts also showed promise, with bathrooms performing the better of the two.
- In the modeling phase, the top performers all used the original features, so the newly created features could be dropped or reassessed, as they did not show significant improvement.
https://github.com/mwboiss/regression_project/blob/main/Report.ipynb
Variable | Meaning |
---|---|
'bathrooms' | Number of bathrooms in home including fractional bathrooms |
'bedrooms' | Number of bedrooms in home |
'bed_to_bath_ratio' | Bedrooms divided by Bathrooms |
'area' | Calculated total finished living area of the home |
'county' | County where the home was sold |
'parcelid' | Unique identifier for parcels (lots) |
'propertylandusetypeid' | Type of land use the property is zoned for |
'yearbuilt' | The Year the principal residence was built |
'taxvaluedollarcnt' | The total tax assessed value of the parcel |
'taxable_value' | The total tax assessed value of the parcel |
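
The dictionary lists both `taxvaluedollarcnt` and `taxable_value` with the same meaning, and defines `bed_to_bath_ratio` as bedrooms divided by bathrooms. A minimal sketch of how those two columns could be derived is shown below. The function name `add_features`, the assumption that `taxable_value` is a rename of `taxvaluedollarcnt`, and the handling of zero-bathroom homes are illustrative choices, not necessarily what `wrangle.py` does.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame with a renamed target and a bed_to_bath_ratio column.

    Hypothetical helper: the rename and the zero-bathroom handling are assumptions.
    """
    # rename() returns a new DataFrame, so the caller's frame is left untouched.
    out = df.rename(columns={"taxvaluedollarcnt": "taxable_value"})
    # Mask zero-bathroom rows so the ratio is missing instead of infinite.
    baths = out["bathrooms"].where(out["bathrooms"] > 0)
    out["bed_to_bath_ratio"] = out["bedrooms"] / baths
    return out


# Tiny made-up sample, only to show the shape of the result.
sample = pd.DataFrame(
    {
        "bedrooms": [3, 4],
        "bathrooms": [2.0, 0.0],
        "taxvaluedollarcnt": [300000.0, 450000.0],
    }
)
featured = add_features(sample)
print(featured)
```
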
- A locally stored env.py file containing the hostname, username and password for the MySQL database containing the zillow dataset is needed.
- Data Science Libraries needed: pandas, numpy, matplotlib.pyplot, seaborn, scipy.stats, sklearn
- All files in the repo should be cloned to reproduce this project.
- Ensure .gitignore is set up to protect env.py file data.
- Create and test acquire functions
- Add functions to wrangle.py module
- Create and test prepare functions
- Add functions to wrangle.py module
- Explore data for missing values
- Add code to prepare function to remove values
- Test function in notebook
- Assess data for outliers
- Remove outliers if needed
- Create function to remove outliers
- Add function to wrangle.py module
- Scale data appropriately
- Create function to scale data
- Add function to wrangle.py module
- Write code needed to split data into train, validate and test
- Add code to prepare function and test in notebook
- Does county affect price?
- Does the size of the house (area) affect price?
- Does the age affect price?
- Does the number of bedrooms or bathrooms affect price?
- Create visualizations exploring each question
- Run statistical tests relevant to each question
- Create a summary that answers the exploratory questions
- Evaluate which metrics best answer each question
- Evaluate a baseline metric used to compare models to the most present target variable
- Develop models to predict the Property Tax assessed value of Single Family Properties sold in 2017.
- Fit the models to Train data
- Evaluate on Validate data to ensure no overfitting
- Evaluate top model on test data
- Create report ensuring well documented code and clear summary of findings as well as next steps to improve research
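
The prepare and modeling steps above can be sketched end to end as follows. This is a rough illustration under stated assumptions, not the code in `wrangle.py` or the report: the helper names (`remove_outliers`, `split`, `rmse`), the IQR rule, the split proportions, the min-max scaler, the mean-of-train baseline, and the linear model are all choices made for the example, and the data is synthetic so the block runs without the database.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def remove_outliers(df, cols, k=1.5):
    """Keep rows within k * IQR of the quartiles for every listed column."""
    keep = pd.Series(True, index=df.index)
    for col in cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        keep &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


def split(df, seed=42):
    """Split into roughly 56% train, 24% validate and 20% test."""
    train_validate, test = train_test_split(df, test_size=0.2, random_state=seed)
    train, validate = train_test_split(train_validate, test_size=0.3, random_state=seed)
    return train, validate, test


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    return float(np.sqrt(mean_squared_error(actual, predicted)))


# Synthetic stand-in for the zillow data (assumption: real data comes from MySQL).
rng = np.random.default_rng(0)
n = 500
area = rng.uniform(800, 4000, n)
year_built = rng.integers(1920, 2017, n)
taxable_value = 150 * area + 1000 * (year_built - 1920) + rng.normal(0, 40000, n)
homes = pd.DataFrame(
    {"area": area, "year_built": year_built, "taxable_value": taxable_value}
)

features, target = ["area", "year_built"], "taxable_value"
homes = remove_outliers(homes, features + [target])
train, validate, test = split(homes)

# Fit the scaler on train only, then apply it to all three splits.
scaler = MinMaxScaler().fit(train[features])
X_train = scaler.transform(train[features])
X_validate = scaler.transform(validate[features])
X_test = scaler.transform(test[features])

# Baseline: predict the train mean for every home (one possible baseline).
baseline_rmse = rmse(validate[target], np.full(len(validate), train[target].mean()))

model = LinearRegression().fit(X_train, train[target])
validate_rmse = rmse(validate[target], model.predict(X_validate))
test_rmse = rmse(test[target], model.predict(X_test))

print(f"baseline RMSE (validate): {baseline_rmse:,.0f}")
print(f"model RMSE (validate):    {validate_rmse:,.0f}")
print(f"model RMSE (test):        {test_rmse:,.0f}")
```

Fitting the scaler on train alone keeps information from validate and test out of the model, and the test split is scored only once, after a model has been chosen on validate.
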