Giter Club home page Giter Club logo

sumop's Introduction

SumOp

Summarizing the opinion of a batch of reviews by producing an instant views showing topics and sentiments.

Provides a visual summary of a batch of reviews. Given the url of a restaurant, this module scraps a list of reviews and estimates the sentiment. Each sentence is cut to isolate a sentiment in order to handle multi-opinionated texts. Aspects/Topics are detected within each fragment.

How to run it ?

  • Install Docker
  • Install docker-compose
  • Clone this repository : git clone https://github.com/vesran/SumOp.git
  • Navigate to .../sumop
  • Run docker_compose up
  • Go to localhost:8501 to see the dashboard.

Application

This project runs with Streamlit and shows the sentiments for each aspect from a chosen restaurant from Yelp. The link to the reviews to collect can be customized in the sumop/params.py file under the URL_FORMAT variable. Only Yelp can be parsed.

Example

Contributing

Adding a new website

Under sumop/scraper, add scripts to scrap the desired website. Data should be collected into a text file, each comment separated by \n. Change the PATH_TO_SAVE_SCRAPED_COMMENTS variable in sumop/params.py to indicate the new text file you have defined as well as the link to the website's article URL_FORMAT (# indicates page number but it is only used to scrap reviews so you can customize this variable as you wish). Next, run the app again.

Changing the list of aspects

The list of aspects is defined in the sumop/params.py file under the variable ASPECTS. This variable refers to a dictionary which maps the aspects to the minimum probability threshold which consider that a review is actually talking about the specified aspect. Experiments should be conducted to find the optimal threshold values. If not possible, the value 0.4 is suggested. These values are independent of each others.

Customizing the dashboard

The dashboard was built using Streamlit and Plotly. The script responsible of the dashboard is sumop/dashboard.py.

Method's details

TL;DR

One main assumption can be underlined after analysing multiple datasets : sentiments are usually grouped within a sentence (ie a sentence starts by positive statements and ends with a negative review). In order to estimate the sentiment within each aspect, a NN model is trained to estimate the polarity of each word in a sentence. It allows us to segment sentences with multiples sentiments. Each fragment of text may several aspects but one sentiment. From now, it's only a matter of detecting aspects.

For more information, you can have a look at the PDF document.

Pipeline

pipeline

Sentiment analysis

Exemple sentiment analysis

Aspect detection

Used a modiified version of the aspect detection procedure by Tulken and Cranenburgh (2020).

aspect detection

TODOs

  • Code
  • Add anasent notebook
  • Dashboard
  • Docker
  • Add pdf

References

sumop's People

Contributors

vesran avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.