review_text_classifier

This project predicts multi-class customer sentiment using text features from labelled customer reviews of restaurants from Yelp and Google.

Business understanding: Predictive insights into customer-related metrics can help retail establishments make decisions that improve customer experience. However, the quality of customer predictions depends on text data that correctly expresses the heterogeneity and salient points of customer voices in natural language, as well as on performance optimization across different algorithmic options.

Data understanding: Customer reviews for this project were sourced from 1) an aggregated subset of seven million Yelp customer reviews, made available by Yelp as a set of downloadable JSON files, and 2) three hundred test customer reviews queried from the Google Places API. For project purposes, the data was stored in, and queried from, a MongoDB database.

Data preparation: Raw strings of customer reviews were extracted, processed and represented to models through a bag-of-words approach under a variety of alternative specifications using the Scikit-learn library. The parameter specifications included, for example: 1) alternative tokenization strategies using a) only unigrams, b) only bigrams, and c) combined unigrams and bigrams; 2) dimensionality reduction techniques that included feature selection by ANOVA F-score; and 3) vectorization using term frequency-inverse document frequency (TF-IDF) weighting. The prediction target comprised three classes representing positive, neutral and negative customer sentiment, which we created by aggregating customer-defined scores on a 1 to 5 scale: our positive sentiment class = user score of 5, neutral sentiment class = score of 4, and negative sentiment class = scores of 1, 2, or 3.
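The preparation steps above can be sketched with Scikit-learn as follows. This is a minimal illustration, not the project's actual code: the toy reviews, the helper name `stars_to_sentiment`, and the choice of `k=10` are assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, f_classif


def stars_to_sentiment(stars):
    """Map a 1-5 score to the three sentiment classes described above."""
    if stars == 5:
        return "positive"
    if stars == 4:
        return "neutral"
    return "negative"  # scores of 1, 2, or 3


# Hypothetical toy reviews; the project itself queried reviews from MongoDB.
reviews = [
    ("The food was amazing and the staff were friendly", 5),
    ("Amazing pasta, friendly service, will return", 5),
    ("Good food but the wait was long", 4),
    ("Decent meal, nothing special, long wait", 4),
    ("Cold food and rude staff", 1),
    ("Rude service and the pasta was cold", 2),
]
texts = [text for text, _ in reviews]
labels = [stars_to_sentiment(stars) for _, stars in reviews]

# Combined unigrams and bigrams; ngram_range=(1, 1) gives unigrams only
# and ngram_range=(2, 2) gives bigrams only.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

# Keep the k features with the highest ANOVA F-scores.
# k=10 is an arbitrary value chosen for this small example.
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, labels)
```

`X_selected` is a sparse matrix with one row per review and one column per retained n-gram, ready to pass to a classifier.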

Modeling: The basic modelling task for this project was multi-class classification using text features. Models tested included logistic regression, support vector machines, and boosting ensemble models from the XGBoost library.
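As a rough sketch of the modelling step, a TF-IDF vectorizer and a logistic regression classifier can be chained in a Scikit-learn pipeline. The training texts, the `max_iter` setting, and the variable names below are assumptions for illustration only; the hyperparameters used in the project are not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data with the three sentiment classes.
train_texts = [
    "amazing food and friendly staff",
    "great pasta and lovely service",
    "good food but a long wait",
    "decent meal, nothing special",
    "cold food and rude staff",
    "terrible service, never again",
]
train_labels = [
    "positive", "positive",
    "neutral", "neutral",
    "negative", "negative",
]

# Vectorize with unigrams and bigrams, then fit a multi-class classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# predict_proba returns one probability per class, in the order of classes_.
classes = list(model.named_steps["logisticregression"].classes_)
probabilities = model.predict_proba(["friendly staff and great pasta"])
```

Swapping `LogisticRegression` for another estimator, such as a support vector machine, leaves the rest of the pipeline unchanged, which makes it straightforward to compare algorithms on the same features.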

Evaluation: We evaluated model performance and the associated parameter specifications using log loss and accuracy score metrics. On the test set, the initial top-performing model, a logistic regression classifier, achieved an accuracy of 0.75 and a log loss of 0.56.
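To make the two metrics concrete, the sketch below computes them by hand on made-up predictions. The labels and probabilities are illustrative values, not project results, and the function names are hypothetical; Scikit-learn's `accuracy_score` and `log_loss` in `sklearn.metrics` provide equivalent calculations.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def multiclass_log_loss(y_true, probabilities, classes, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probabilities):
        p = row[classes.index(label)]
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


# Made-up example: five reviews, three classes.
classes = ["negative", "neutral", "positive"]
y_true = ["positive", "neutral", "negative", "positive", "negative"]
probabilities = [
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],  # true class is "positive", so this one is wrong
    [0.8, 0.1, 0.1],
]

# The predicted class is the one with the highest probability.
y_pred = [classes[row.index(max(row))] for row in probabilities]

acc = accuracy(y_true, y_pred)
loss = multiclass_log_loss(y_true, probabilities, classes)
```

Accuracy only counts whether the top class is right, while log loss also penalizes low confidence in the correct class, which is why the two are reported together.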

Deployment: We deployed the top-performing model through a Flask app for basic user testing.

Contributors

teosoft7, glmack, cenuno
