The goal of this project is for you to use EDA, visualization, data cleaning, preprocessing, and linear models to predict home prices given the features of the home, and to interpret your linear models to find out which features add value to a home! This project is a bit more open-ended than project 1.
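As a rough illustration of the "interpret your linear models" part, here is a minimal sketch that fits ordinary least squares with numpy on a handful of made-up rows. The column names GrLivArea and OverallQual come from the Ames data. The values, the hypothetical helper fit_linear_model, and the choice of numpy.linalg.lstsq instead of a library such as scikit-learn are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; column names follow the Ames data.
# Prices were generated as 20000 + 100 * GrLivArea + 10000 * OverallQual,
# so a correct fit should recover those numbers.
homes = pd.DataFrame({
    "GrLivArea":   [1000, 1500, 2000, 1200, 1800],
    "OverallQual": [5, 6, 8, 5, 7],
    "SalePrice":   [170000, 230000, 300000, 190000, 270000],
})


def fit_linear_model(df, features, target):
    """Fit ordinary least squares and return coefficients indexed by name.

    Hypothetical helper; in practice you might use scikit-learn instead.
    """
    # Prepend a column of ones so the model has an intercept term.
    X = np.column_stack([np.ones(len(df)), df[features].to_numpy(dtype=float)])
    y = df[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["intercept", *features])


coefs = fit_linear_model(homes, ["GrLivArea", "OverallQual"], "SalePrice")
print(coefs.round(2))
```

Each coefficient is the estimated change in SalePrice for a one-unit increase in that feature with the other features held fixed (here, about 100 per square foot of living area). On the real data, check for outliers, skew, and correlated features before reading too much into any single coefficient.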
Keep the following in mind:
- Think carefully about the choices you make with the data. Be ready to defend your decisions!
- Use lots of plots to dig deeper into the data! Describe the plots and convey what you learned from them.
- Don't forget to read the description of the data on the Kaggle website! It has valuable information that will help you clean and impute the data. NaN means something in many of the columns, so don't just drop or fill them!
- Try fitting many models! Document your work and note what you've tried.
- Apply what you've learned from class, books, videos, and blog posts.
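To make the NaN point concrete, here is a minimal pandas sketch on made-up rows. It assumes, following the data description, that NaN in Alley, PoolQC, and GarageType means the house has no alley access, no pool, or no garage, so it becomes an explicit "None" category. LotFrontage appears to have no such note, so its NaN is treated as truly missing and filled with the median, which is only one of several defensible choices. The helper fill_meaningful_nans is hypothetical.

```python
import numpy as np
import pandas as pd

# Per the Kaggle data description, NaN in these columns means the feature is
# absent (no alley access, no pool, no garage), not that the value is unknown.
NA_MEANS_NONE = ["Alley", "PoolQC", "GarageType"]


def fill_meaningful_nans(df, columns, label="None"):
    """Return a copy of df with NaN in `columns` replaced by an explicit label.

    Hypothetical helper for illustration.
    """
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(label)
    return out


# Made-up rows for illustration only.
homes = pd.DataFrame({
    "Alley":       ["Grvl", np.nan, np.nan],
    "PoolQC":      [np.nan, "Gd", np.nan],
    "GarageType":  ["Attchd", np.nan, "Detchd"],
    "LotFrontage": [65.0, np.nan, 80.0],
})

cleaned = fill_meaningful_nans(homes, NA_MEANS_NONE)

# LotFrontage NaN is treated as truly missing here, so impute it instead;
# the median is one simple choice among several.
cleaned["LotFrontage"] = cleaned["LotFrontage"].fillna(cleaned["LotFrontage"].median())

print(cleaned)
```

Filling with an explicit "None" label keeps these houses in the data as their own category, which a later one-hot encoding step can turn into a column the linear model can weigh, instead of silently dropping them.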
From the Kaggle competition website:
Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.