airbnb_rating_prediction

Introduction

This repository is the result of my Udacity Nanodegree. The goal is to analyse datasets from Inside Airbnb and predict potential ratings of listings that are not rated yet. Several graphs and key figures are shown during the analysis to reveal dependencies and assess the performance of the model. In a first step, the data is loaded and cleaned using a custom function. Because the cleaning process is time consuming, the cleaned data is saved to a separate spreadsheet. Data cleaning is performed on the listings in Amsterdam, Boston and Stockholm but can be done for any other city available at Inside Airbnb. The cleaned data is modeled with cross-validated lasso regression, which limits overfitting and gives a simple model that is quick to set up and provides additional insights. The notebook follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) in order to have a consistent structure.
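The modeling step can be illustrated with a minimal sketch, assuming scikit-learn and NumPy are installed. It is not the notebook's code: the data is synthetic, and the feature names `x0` to `x4` are placeholders chosen only for illustration. It shows how cross-validated lasso regression picks its regularization strength and shrinks uninformative coefficients towards zero.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for cleaned listing features (assumption, not real data).
rng = np.random.default_rng(0)
feature_names = ["x0", "x1", "x2", "x3", "x4"]
X = rng.normal(size=(500, len(feature_names)))

# The synthetic target depends only on x0 and x1; x2..x4 are pure noise.
y = 95 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardize features so the L1 penalty treats them on a common scale,
# then let LassoCV choose alpha by 5-fold cross-validation.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X_train, y_train)

lasso = model.named_steps["lassocv"]
coefs = dict(zip(feature_names, lasso.coef_))
r2 = model.score(X_test, y_test)  # R^2 on the held-out split

print("chosen alpha:", lasso.alpha_)
print("coefficients:", coefs)
print("test R^2:", r2)
```

The fitted coefficients are what make the model easy to interpret: features with coefficients at or near zero contribute little to the prediction.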

Data Acquisition

The data can be downloaded from Inside Airbnb. The zipped listings dataset of any city can be used for the analysis. The original and cleaned datasets for Boston, Amsterdam and Stockholm are available in this GitHub repository.
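As a rough sketch of the loading step, the snippet below reads a compressed listings file using only the Python standard library. It rests on assumptions not confirmed by this README: that the download is a gzip-compressed CSV, that the path `listings.csv.gz` is where it was saved, and that the rating column is named `review_scores_rating`. The notebook's own loading and cleaning function may work differently.

```python
import csv
import gzip


def load_listings(path):
    """Read a gzip-compressed CSV of listings into a list of dicts."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def split_by_rating(rows, rating_column="review_scores_rating"):
    """Separate listings that already have a rating from those that do not.

    The column name is an assumption; adjust it to the downloaded file.
    """
    rated, unrated = [], []
    for row in rows:
        has_rating = bool((row.get(rating_column) or "").strip())
        (rated if has_rating else unrated).append(row)
    return rated, unrated


# Example usage (hypothetical local path):
# rows = load_listings("listings.csv.gz")
# rated, unrated = split_by_rating(rows)
# print(len(rated), "rated listings,", len(unrated), "without a rating")
```

Splitting the rows this way mirrors the goal stated in the introduction: the rated listings can be used for training, and the unrated ones are the listings to predict.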

Files

This repository contains the Jupyter notebook "airbnb.ipynb" for data engineering and modeling, as well as the original and cleaned datasets of the Airbnb listings in Boston, Stockholm and Amsterdam.

Problem Formulation

The key questions are:

  • On which factors does the rating of a listing depend the most?
  • How accurately can we predict the actual rating?
  • How do different cities compare?

Results

The key finding is that the rating of a listing depends strongly on factors determined by the host rather than by the housing itself. The model performed well for highly rated offers, with ratings between 85 and 100. For lower ratings, not enough data was available to ensure reliable performance with the methods used. The model achieves good average prediction performance but cannot capture the full complexity of the underlying processes. More sophisticated feature engineering and more complex models are promising ways to improve the performance substantially. More detailed information can be found at Medium.

Further Notes

Please be aware that you have to enter your own Mapbox access token in order to use the scattermapbox capability of Plotly.

airbnb_rating_prediction's People

Contributors

edkahe
