
Zillow Prize: Zillow's Home Value Prediction (Zestimate)

Observations after modeling

The test set is significantly larger than the training set. A linear regression is fitted over the range of feature values seen in training, so extrapolation becomes an issue when predicting on the test set. Where test rows fall outside that range and the features are not linearly related to the target, the model still extends the same straight line and its predictions degrade. Adding polynomial features during feature engineering may alleviate this issue.
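The extrapolation problem can be sketched in a few lines. This is a minimal illustration on made-up data, not the project's code: the quadratic target, the training range, and the names `fit_line`, `train_x` and `test_x` are all assumptions chosen for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical synthetic data: the target is quadratic in the feature.
train_x = [i / 10 for i in range(21)]  # training range: 0.0 to 2.0
train_y = [x ** 2 for x in train_x]
test_x = 5.0                           # well outside the training range
true_y = test_x ** 2

# Plain linear fit on x: the fitted line is simply extended to test_x.
slope, intercept = fit_line(train_x, train_y)
linear_pred = slope * test_x + intercept

# Same linear model, fitted on the engineered feature x**2 instead.
slope_sq, intercept_sq = fit_line([x ** 2 for x in train_x], train_y)
poly_pred = slope_sq * test_x ** 2 + intercept_sq

linear_error = abs(linear_pred - true_y)
poly_error = abs(poly_pred - true_y)
print(f"linear fit error at x={test_x}: {linear_error:.3f}")
print(f"x**2 feature error at x={test_x}: {poly_error:.3f}")
```

The polynomial feature helps here only because it matches the shape of the synthetic target. On real data a poorly chosen polynomial can extrapolate worse than a straight line, so this is worth validating rather than assuming.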

Since the goal is to predict the log error (the difference between the log of the Zestimate and the log of the sale price), new features are not aimed at predicting the sale price itself. For example, pool count and pool size could be outliers that contribute to overfitting. Filling missing values with KNN is time-consuming, and categorical data needs to be encoded compactly before modeling. Some of the fields may simply be noise for the model.
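To see why KNN imputation gets slow, here is a minimal pure-Python sketch of the idea. It is an assumption-laden illustration rather than the imputer used in the project: `knn_impute`, the toy `rows`, and the choice of a root-mean-square distance over shared columns are all hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over the columns that both
    rows have observed. Every missing cell scans every other row, which is
    why the cost grows roughly quadratically with the number of rows.
    """
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(rows):
                if other_idx == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical toy rows: two numeric columns, one missing value in the last row.
rows = [
    [1.0, 10.0],
    [1.1, 11.0],
    [5.0, 50.0],
    [1.05, None],
]
filled = knn_impute(rows, k=2)
print(filled[3])
```

On a data set with millions of rows, the inner scan over all other rows is what makes this approach expensive. A simpler fill (median, or a sentinel value for tree models) may be a reasonable trade-off.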

Feature importance in tree-based models reflects how much the splits on each feature reduce impurity. The split criterion may be purity (the Gini index) or another more specific error function, such as variance reduction for a regression target. Irrelevant features can decrease accuracy, because the model ends up learning from them.
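As a small worked example of the Gini criterion mentioned above, the sketch below computes the impurity decrease for one informative split and one uninformative split. The labels and function names are hypothetical. Because the log error is a continuous target, a regression tree would typically use variance or squared error in place of Gini, but the bookkeeping is the same.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Hypothetical class labels at one tree node.
parent = [0, 0, 1, 1]

# A relevant feature separates the classes cleanly.
useful_gain = impurity_decrease(parent, [0, 0], [1, 1])

# An irrelevant feature leaves both children as mixed as the parent.
noise_gain = impurity_decrease(parent, [0, 1], [0, 1])

print(f"useful split gain: {useful_gain}")
print(f"noise split gain:  {noise_gain}")
```

Summing these decreases over every split that uses a feature gives the usual impurity-based importance score, so a feature that only ever produces splits like the second one ends up near zero.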

Training time is an issue because the data set is large. Using fewer features and time-efficient models such as CatBoost and LightGBM is therefore well suited to this project. An ensemble, tuned by iterating over different parameters, could improve the score from 0.065 to 0.064. However, finding the right tuning can be challenging and time-consuming.
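One simple way to ensemble two models is a weighted average of their predictions, with the weight chosen on held-out data. The sketch below assumes the score is mean absolute error on the log error and uses made-up predictions: `pred_a` and `pred_b` stand in for the outputs of two hypothetical models and are not results from this project.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def blend(pred_a, pred_b, weight_a):
    """Weighted average of two prediction lists."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical validation targets and predictions from two models.
y_true = [0.02, -0.01, 0.03, 0.00]
pred_a = [0.03, 0.00, 0.04, 0.01]    # this model tends to over-predict
pred_b = [0.01, -0.02, 0.02, -0.01]  # this model tends to under-predict

# Try weights 0.0, 0.1, ..., 1.0 and keep the one with the lowest MAE.
weights = [i / 10 for i in range(11)]
best_weight = min(weights, key=lambda w: mae(y_true, blend(pred_a, pred_b, w)))
best_mae = mae(y_true, blend(pred_a, pred_b, best_weight))

print(f"MAE model A: {mae(y_true, pred_a):.4f}")
print(f"MAE model B: {mae(y_true, pred_b):.4f}")
print(f"best weight for A: {best_weight}, blended MAE: {best_mae:.4f}")
```

The toy errors cancel perfectly, which real models will not do. In practice the gain from blending is usually small, in line with the 0.065 to 0.064 improvement reported above, and the weight should be chosen on a validation split to avoid overfitting it.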

Contributors

billypyu
