No libraries beyond the Anaconda distribution of Python are needed to run the code. It should run without issues on any Python 3.* version.
- This is the first project of Udacity Data Scientist Term 2.
- In this report, we analyze the dataset to answer three questions:
- Is there any relationship between 'price' and 'review_score_rate'?
- Is there any pattern between the other review scores and the price?
- Is there any pattern between the location and the price?
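The first two questions come down to measuring how strongly price moves with a review score. A minimal, dependency-free sketch of such a check follows. The helper names `parse_price` and `pearson` are hypothetical, the sample values are made up for illustration, and the assumption that prices are stored as strings such as `$1,250.00` should be verified against the actual files.

```python
import math


def parse_price(raw):
    """Convert a price string such as '$1,250.00' to a float (assumed format)."""
    return float(raw.replace("$", "").replace(",", "").strip())


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, not taken from the dataset.
prices = [parse_price(p) for p in ["$85.00", "$150.00", "$1,250.00", "$60.00"]]
scores = [95.0, 96.0, 100.0, 90.0]
r = pearson(prices, scores)
print(f"Pearson r = {r:.3f}")
```

A value of r near 0 suggests little linear relationship, while values near 1 or -1 suggest a strong one. In pandas, the equivalent check is `Series.corr`, which uses the Pearson method by default.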
- The notebook 'airbnb.ipynb' answers the chosen questions using simple exploratory data analysis and descriptive statistics on the Airbnb dataset. It follows the Cross-Industry Standard Process for Data Mining (CRISP-DM).
- 'airbnb.html' is the static HTML version of the notebook.
Create a 'data' folder in the repository root and extract the data sets into it.
Both data sets contain the following files:
- calendar set (calendar.csv): the listing id, plus the price and availability for each day.
- listings set (listings.csv): full descriptions and the average review score.
- reviews set (reviews.csv): a unique id for each reviewer and detailed comments.
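Before opening the notebook, it can help to confirm that the three files are in place. The sketch below is one way to do that, assuming the files sit directly under `data/` (this README does not say whether the Seattle and Boston sets live in separate subfolders). The helper names `missing_files` and `read_rows` are hypothetical.

```python
import csv
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = ("calendar.csv", "listings.csv", "reviews.csv")


def missing_files(data_dir):
    """Return the expected file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def read_rows(path, limit=5):
    """Read up to `limit` rows of a CSV file as dictionaries keyed by header."""
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if len(rows) >= limit:
                break
            rows.append(row)
    return rows


if __name__ == "__main__":
    missing = missing_files("data")
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        # Show the first row of each file as a quick sanity check.
        for name in EXPECTED_FILES:
            print(name, read_rows(Path("data") / name, limit=1))
```

Reading only a few rows keeps the check fast, since the calendar and reviews files can be large.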
The main findings of the code can be found at the post available here.
The report finds that, although the Seattle and Boston data differ, price shows some correlation with location and with the listings' review scores in both cities.
Credit goes to Airbnb for the data. Licensing and other descriptive information are available at the Kaggle pages for Seattle AirBNB Data and Boston AirBNB Data, and the original source can be found here. Otherwise, feel free to use the code here as you would like!