Topic-Modelling-Amazon-Fine-Food-Reviews

We often face large collections of text documents. If we could get an overview of what the documents are about before reading them, the job would be much easier. For example, watching a trailer helps us decide whether a movie is worth watching. Similarly, an overview helps us decide whether the documents are worth the effort.

Alternatively, if we can label each document in a collection with its "topic", we can shortlist documents by topic of interest without having to read all of them.

In machine learning and natural language processing, topic models, a type of statistical model, give us the ability to discover topics in a collection of documents.

Heroku based App

Below is the link to the app, where you can submit your own reviews and find out what each review is about. Please keep in mind that the algorithm works best for food and beverage products of the kind commonly found among Amazon's fine foods.

App - https://topic-modelling-amzon-reviews.herokuapp.com/

Data

For this project, I used the publicly available Amazon Fine Food Reviews data. It can be accessed here. The data contains approximately 569,000 reviews from 256,000 users.

This is what a sample of the words looks like.

[Figure: sample of words from the reviews]

The words cover quite a range, from coffee to chocolate to dog. It is hard to tell what kinds of topics or themes are actually in the reviews.

Text Normalization

Cleaning the text was essential for getting good topics from the reviews. The process involved the following steps:

  • Removing HTML tags
  • Expanding contractions
  • Lowercasing the reviews
  • Removing numbers and additional white spaces
  • Removing punctuation
  • Tokenization
  • Removing stopwords (using a long list of words from rank.nl and domain-specific words)
  • Removing whitespace
  • Lemmatizing all reviews
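
The steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the `normalize` function, the tiny `CONTRACTIONS` and `STOPWORDS` tables, and the pluggable `lemmatize` argument are all hypothetical stand-ins. In particular, real lemmatization needs an NLP library, so the default here simply returns each token unchanged.

```python
import html
import re
import string

# Hypothetical, deliberately tiny lookup tables for illustration only.
# The project used a much longer stopword list plus domain-specific words.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is",
                "i'm": "i am", "don't": "do not"}
STOPWORDS = {"a", "an", "the", "is", "it", "i", "and", "this", "of", "to"}

# Translation table that turns every ASCII punctuation mark into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(review, lemmatize=lambda token: token):
    """Return a list of cleaned tokens for one review.

    `lemmatize` is an assumed hook: pass any callable that maps a token to
    its lemma. The default is the identity function (no lemmatization).
    """
    text = html.unescape(review)
    text = re.sub(r"<[^>]+>", " ", text)       # remove HTML tags
    # Lowercase before expanding contractions because the lookup table
    # above is keyed in lowercase.
    text = text.lower()
    for short, full in CONTRACTIONS.items():   # expand contractions
        text = text.replace(short, full)
    text = re.sub(r"\d+", " ", text)           # remove numbers
    text = text.translate(PUNCT_TO_SPACE)      # remove punctuation
    tokens = text.split()                      # tokenize, dropping extra whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [lemmatize(t) for t in tokens]


print(normalize("<p>I can't believe it's 100% GOOD coffee!</p>"))
```

Splitting on whitespace after punctuation has been replaced by spaces handles both the tokenization and the whitespace-removal steps in one pass.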

Modelling

  1. K-Means - Identified 15 topics using K-Means. Evaluated the topics using the sum of squared errors (SSE).

[Figure: K-Means results]
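
As a rough illustration of evaluating K-Means with SSE, the sketch below runs a small pure-Python version of Lloyd's algorithm for several values of k and reports the SSE for each. The function names, the toy 2-D points, and the range of k are assumptions made for the example. In the project the inputs would be vectorized reviews, and a library implementation would normally be used instead (scikit-learn's `KMeans`, for instance, exposes the same quantity as `inertia_`).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, iterations=20, seed=0):
    """Cluster points with Lloyd's algorithm and return the final SSE."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k),
                          key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)]
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    # SSE: squared distance from each point to its closest centroid.
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Toy data: three loose groups of 2-D points (stand-ins for document vectors).
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

SSE generally shrinks as k grows, so the usual approach is to plot it against k and look for the "elbow" where adding more clusters stops helping much.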

  2. LDA - Identified 16 topics using LDA. Evaluated the topics using coherence scores.

[Figure: final LDA visualization]

  3. NMF - Identified 11 topics using NMF. Evaluated the topics using coherence scores.

[Figure: NMF results]
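
The README does not say which coherence measure was used for LDA and NMF. As one possible illustration, the sketch below computes the UMass coherence of a single topic from document co-occurrence counts. The function name, the toy documents, and the choice of UMass are assumptions for the example. The score here is summed over word pairs, while some implementations average instead.

```python
import math


def umass_coherence(topic_words, documents):
    """UMass coherence of one topic.

    topic_words: top words of the topic, ordered from most to least probable.
    documents:   tokenized documents (lists of words).
    Every topic word is assumed to occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for i in range(1, len(topic_words)):
        for j in range(i):
            w_i, w_j = topic_words[i], topic_words[j]
            denominator = doc_freq(w_j)
            if denominator == 0:
                raise ValueError(f"topic word {w_j!r} not found in any document")
            # +1 smoothing keeps the logarithm defined when the pair never co-occurs.
            score += math.log((doc_freq(w_i, w_j) + 1) / denominator)
    return score


# Toy corpus standing in for normalized reviews.
documents = [
    ["coffee", "cup", "strong"],
    ["coffee", "cup"],
    ["dog", "treat"],
    ["dog", "food", "treat"],
]

coherent = umass_coherence(["coffee", "cup"], documents)
mixed = umass_coherence(["coffee", "dog"], documents)
print(coherent > mixed)
```

Words that tend to appear in the same reviews score higher than words that never co-occur, which is why a higher coherence is read as a more interpretable topic.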

Conclusion

LDA does a better job here. Both models did well at picking topics for the majority of documents, but LDA has a slight edge, so I chose it as my final model. The final LDA model was deployed using Flask and Heroku.
