osu_hackai_2024_triple_c's Introduction

OSU HackAI: House Price Prediction Analysis Overview

  • Team Name: Triple C
  • Team Members: Hao-Chun Chou, Lillian Chen, Larry Chen
  • Challenge: AI Club Track (House Price Prediction)

House Price Prediction Analysis

This project predicts house prices from a dataset of attributes such as area, number of bedrooms, and number of bathrooms. We ran the analysis in two tools, PySpark and Scikit-learn, to compare different models and identify the most effective approach for predicting house prices.

Data Preparation

Our dataset comprised 327 records with no missing values, duplicates, or inconsistencies. Although we considered removing outliers, we ultimately retained the full dataset to avoid losing valuable information, given its limited size.
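The checks described above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the helper names, the toy values, and the 1.5 × IQR outlier rule are all assumptions, since the README does not say how outliers were identified.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> dict:
    """Count rows, missing cells, and fully duplicated rows."""
    return {
        "rows": len(df),
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] without dropping them."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Made-up values for illustration only; this is not the project's dataset.
toy = pd.DataFrame({
    "price": [100_000, 120_000, 110_000, 115_000, 900_000],
    "area": [1000, 1200, 1100, 1150, 1180],
})

report = summarize_quality(toy)
outliers = iqr_outlier_mask(toy["price"])  # boolean mask; rows are kept
```

Returning a mask rather than a filtered frame makes it possible to inspect the flagged rows and still keep them, which fits the decision to retain the full dataset.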

Feature Engineering

In both PySpark and Scikit-learn, we converted categorical variables to numeric form, using StringIndexer in PySpark and manual encoding in Scikit-learn. Features were binary- or ordinal-encoded where applicable, and we applied log transformations to 'price' and 'area' to bring their distributions closer to normal. We initially considered a "price per sqft" feature but excluded it: it is computed from the target variable (price), so including it would leak the target into the model.
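A minimal pandas sketch of the Scikit-learn-side encoding and log transforms is shown below. The column names (`mainroad`, `furnishingstatus`), the category labels, and their ordering are hypothetical placeholders; the README does not list the dataset's actual schema.

```python
import numpy as np
import pandas as pd

# Hypothetical categories and orderings, chosen for illustration only.
BINARY_MAP = {"no": 0, "yes": 1}
FURNISHING_ORDER = {"unfurnished": 0, "semi-furnished": 1, "furnished": 2}


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Encode categoricals manually and log-transform price and area."""
    out = df.copy()
    out["mainroad"] = out["mainroad"].map(BINARY_MAP)  # binary encoding
    out["furnishingstatus"] = out["furnishingstatus"].map(FURNISHING_ORDER)  # ordinal
    out["log_price"] = np.log(out["price"])  # target on the log scale
    out["log_area"] = np.log(out["area"])
    # Deliberately no price-per-sqft column: it is derived from the target.
    return out.drop(columns=["price", "area"])


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "price": [100000.0, 250000.0, 180000.0],
    "area": [900.0, 2100.0, 1500.0],
    "mainroad": ["yes", "no", "yes"],
    "furnishingstatus": ["furnished", "unfurnished", "semi-furnished"],
})

features = engineer_features(toy)
```

The leakage is easy to see on the log scale: log(price / area) = log(price) - log(area), so a model given both log area and log price-per-sqft could reconstruct the log price exactly.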

Model Selection and Performance

The Gradient Boosting (GB) model implemented in Scikit-learn was the standout, achieving an R^2 of 0.687. Although we experimented with neural networks, their performance was significantly poorer, likely due to the dataset's small size, illustrating the challenges of using deep learning in this context.
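For readers who want to reproduce the general workflow, here is a hedged sketch of fitting a Gradient Boosting regressor in Scikit-learn and scoring it with R^2 on a held-out split. The data is synthetic, the two features are placeholders, and the default hyperparameters and 80/20 split are assumptions; none of this reproduces the project's model or its reported score.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 327  # same record count as the dataset described above; values are synthetic

# Synthetic stand-ins for log-transformed area and bedroom count.
log_area = rng.normal(8.5, 0.4, n)
bedrooms = rng.integers(1, 6, n)
log_price = 0.8 * log_area + 0.1 * bedrooms + rng.normal(0, 0.2, n)

X = np.column_stack([log_area, bedrooms])
X_train, X_test, y_train, y_test = train_test_split(
    X, log_price, test_size=0.2, random_state=0
)

# Default hyperparameters; the project's actual settings are not stated.
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
r2 = r2_score(y_test, predictions)  # R^2 on the held-out 20%
```

With only a few hundred rows, a single train/test split gives a noisy R^2 estimate, so repeating the evaluation with cross-validation is a reasonable extension.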

Conclusions

  • The GB model in Scikit-learn provided the highest accuracy in our experiments.
  • The project highlighted the importance of judicious feature selection, particularly in avoiding features such as "price per sqft" that leak the target variable into the model.
  • The limited dataset size was a key challenge, especially for more complex models like neural networks, indicating the need for larger datasets for these approaches.

Future Directions

Further research could involve acquiring more data to enhance the dataset and potentially improve model performance. Additionally, exploring alternative feature engineering techniques and model parameters may offer insights into achieving better results.
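One way to explore model parameters is a cross-validated grid search. The sketch below is only an illustration on synthetic data from `make_regression`; the grid values, the 3-fold setting, and the choice of R^2 as the scoring metric are assumptions rather than settings taken from the project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data for illustration only.
X, y = make_regression(n_samples=120, n_features=4, noise=10.0, random_state=0)

# Hypothetical search grid; a real search would be tuned to the dataset.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=3,          # 3-fold cross-validation
    scoring="r2",  # rank parameter combinations by mean R^2
)
search.fit(X, y)

best_params = search.best_params_
best_cv_r2 = search.best_score_  # mean cross-validated R^2 of the best combination
```

On a small dataset, shallow trees and lower learning rates are usually worth including in the grid, since they limit how closely the model can fit noise.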
