Capstone Project for Udacity’s Machine Learning Nanodegree
Python 3.5.3 with a Conda environment exported to environment.yaml.
To set up the environment, install Conda from here and follow these instructions to create the environment.
The dataset is available on the Kaggle competition page. A login is required to accept their terms & conditions.
Download, unzip and move the train.json and test.json files into the Data directory.
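Once the files are in place, they can be read into pandas for a first look. The following is a minimal sketch rather than the project's own loading code: the helper name `load_listings` is hypothetical, and it assumes each file is a JSON object mapping column names to `{row_id: value}` dictionaries, which should be checked against the downloaded data.

```python
import json
import os

import pandas as pd

DATA_DIR = "Data"  # directory named in the instructions above


def load_listings(path):
    """Load one JSON file into a DataFrame.

    Assumes a column-oriented layout: {"column": {"row_id": value, ...}, ...}.
    """
    with open(path) as f:
        records = json.load(f)
    # Outer keys become columns, inner keys become the row index.
    return pd.DataFrame(records)


if __name__ == "__main__":
    train_path = os.path.join(DATA_DIR, "train.json")
    if os.path.exists(train_path):
        train = load_listings(train_path)
        print(train.shape)
        print(train.columns.tolist())
    else:
        print("train.json not found in", DATA_DIR)
```

If the layout assumption holds, `pd.read_json(train_path)` should give an equivalent result in one call.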
RentHop is an online apartment rental listing site for the New York City area. One of its differentiating features is its relevancy score, a “HopScore”, by which it sorts listings by default. The company would also like to use data on rental properties to improve its product in other ways, like fraud detection and quality control. For this, Two Sigma, their data-focused managing investors, have partnered with Kaggle to hold a machine learning competition: Two Sigma Connect: Rental Listing Inquiries.
RentHop has back-end functions that could be improved with reliable predictions of how much interest individual listings will generate. These functions are:
- Fraud identification
- Quality control
- Guiding owners and agents toward better listings
By applying a variety of machine learning techniques to rental listing data (price, location, etc.), an algorithm can “learn” complex patterns that correspond to the levels of interest users will have in different listings. This algorithm can then provide reliable predictions of how much interest new listings will generate.
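To make the idea concrete, here is a minimal sketch of learning interest levels from listing features. It is not the model used in this project: the data is synthetic, the column names (`price`, `latitude`, `longitude`, `interest_level`), the three class labels and the rule that cheaper listings attract more interest are all assumptions for illustration, and a random forest from scikit-learn stands in for whatever techniques the project ends up using.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the listing data (assumed columns, made-up values).
rng = np.random.RandomState(0)
n = 300
listings = pd.DataFrame({
    "price": rng.uniform(1500, 6000, n),
    "latitude": rng.uniform(40.6, 40.9, n),
    "longitude": rng.uniform(-74.05, -73.85, n),
})

# Toy labelling rule, purely illustrative: cheaper listings get more interest.
listings["interest_level"] = pd.cut(
    listings["price"],
    bins=[0, 2500, 4000, np.inf],
    labels=["high", "medium", "low"],
).astype(str)

X = listings[["price", "latitude", "longitude"]]
y = listings["interest_level"]

# Hold out a quarter of the rows to play the role of "new" listings.
X_train, X_new, y_train, y_new = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# One probability per interest level for each held-out listing;
# columns follow the order of model.classes_.
probabilities = model.predict_proba(X_new)
print(model.classes_)
print(probabilities[:3])
```

Predicting a probability for each level, rather than a single label, keeps the model's uncertainty visible, which is useful when the predictions feed decisions such as flagging a listing for review.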