This project applies several classification techniques (classifiers based on discrete multivariate distributions and on multivariate Gaussian distributions, and logistic regression) and regression techniques (multivariate regression and k-d trees).
Yelp.com, a social networking website, lets users rate local businesses such as schools and restaurants from 1 to 5 stars and write reviews of their products and services, which helps people find good local businesses. 78% of the businesses listed on the site have received a rating of 3 stars or better. A user who finds a review useful can give it a “thumbs-up”, so the number of votes a review receives indicates how useful it was.

In this project, we aim to classify whether a review gives a rating of more than 3 stars and to predict how many users found the review useful. We are given 6000 Yelp reviews, each described by 50 features: a bag-of-words representation over the 50 words that appeared most frequently. We train each method on 5000 reviews with known outputs and use the remaining 1000 reviews to validate it. We report the prediction performance of each method using appropriate evaluation measures and analyze the errors that occur in detail. Overall, the project serves as practice in applying machine learning techniques to data analysis.
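The setup described above (a binary "more than 3 stars" target, 50 word features, 5000 training and 1000 validation reviews) can be illustrated with a small logistic regression sketch. This is not the project's actual code: it assumes the features are in a NumPy array `X` of shape (6000, 50) and the star ratings in an array `stars`, and it assumes the first 5000 rows are the training set. The helper names `make_split`, `train_logistic`, and `predict` are hypothetical, and the demo uses synthetic data in place of the Yelp reviews.

```python
import numpy as np

def make_split(X, stars, n_train=5000):
    """Hypothetical helper: first n_train rows train, the rest validate.
    Binary target is 1 when the review has more than 3 stars, else 0."""
    y = (stars > 3).astype(float)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

def add_bias(X):
    """Prepend a column of ones so the first weight acts as an intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def train_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean cross-entropy (logistic) loss."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        z = np.clip(Xb @ w, -30, 30)      # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))      # predicted P(stars > 3)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient step
    return w

def predict(w, X):
    """Classify as 1 when the predicted probability exceeds 0.5."""
    return (add_bias(X) @ w > 0).astype(float)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for the data: 6000 reviews x 50 word counts.
    X = rng.poisson(1.0, size=(6000, 50)).astype(float)
    true_w = rng.normal(size=50)
    score = 3 + 0.3 * (X - 1.0) @ true_w + rng.normal(size=6000)
    stars = np.clip(np.round(score), 1, 5)
    X_tr, y_tr, X_va, y_va = make_split(X, stars)
    w = train_logistic(X_tr, y_tr)
    print("validation error:", np.mean(predict(w, X_va) != y_va))
```

The holdout error printed at the end is the fraction of the 1000 validation reviews that are misclassified; the same split could be reused for the other classifiers so their errors are directly comparable.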