This project forked from mlh-fellowship/0.1.2-sentiment-analysis-visualization

Machine Learning Web Application. Helps to visualize a word-by-word breakdown of how sentiment analysis classifies text

Home Page: https://mlh-fellowship.github.io/0.1.2-sentiment-analysis-visualization/

License: MIT License

Sentiment Analysis Visualization

Pod 0.1.2

A web-app that helps to visualize a word-by-word breakdown of how sentiment analysis classifies text

Frontend View


Major goals

  • Research and decide on a machine learning model/architecture
  • Pick out 2-3 datasets we can use to train
  • Build a training pipeline
  • Train and implement the model
  • Serve the model using BentoML as an API
  • Create a web app to take in input and visualize the output

Calling the API

Our service is hosted at https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/. To get a prediction, make a POST request to https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/predict.

# e.g. 
curl -X POST "https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/predict" \
     -H "accept: */*" -H "Content-Type: application/json" \
     -d "{\"text\":\"Some example text.\"}"

Make sure to set the Content-Type header to application/json and send a JSON body in the format

{
  "text": "content"
}

If the request succeeds, you should get a 200 OK status and a body along the lines of [[0.8614905476570129], [0.7018478512763977], [0.617088258266449]], with one entry per word giving that word's sentiment from 0 (negative) to 1 (positive).
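The same call can be made from Python with only the standard library. This is a minimal sketch, not part of the project: the helper names (build_request, predict, pair_words_with_scores) are made up for illustration, and pairing scores with words by splitting on whitespace is an assumption about how the service tokenizes its input.

```python
import json
import urllib.request

PREDICT_URL = "https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/predict"


def build_request(text, url=PREDICT_URL):
    """Build a POST request equivalent to the curl example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"accept": "*/*", "Content-Type": "application/json"},
        method="POST",
    )


def predict(text, url=PREDICT_URL, timeout=10):
    """Send the text to the service and return the decoded JSON body."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def pair_words_with_scores(text, scores):
    """Pair each word with its score.

    Assumption: the service scores whitespace-separated words in order,
    so the response has one single-element list per word.
    """
    words = text.split()
    flat = [row[0] for row in scores]
    if len(words) != len(flat):
        raise ValueError(f"got {len(flat)} scores for {len(words)} words")
    return list(zip(words, flat))
```

Calling pair_words_with_scores(text, predict(text)) then gives a list of (word, score) tuples that can be printed or passed to a visualization. If the word and score counts differ, the whitespace assumption does not hold for that input and the helper raises instead of silently misaligning them.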


Training a new model

Currently, we have only implemented a training pipeline for the IMDB dataset, but this is subject to change in the future. You can train a new classifier on the dataset by running

python train.py

This will replace the current model in /model. model.json stores the model architecture, weights.h5 stores trained weights, and tokenizer.json stores word indices.


Packaging it with BentoML

BentoML helps us to easily serve our Keras model through an API. You can package a new API by running

python bento_service_packager.py
> ...
> [0.07744759]
> [0.1166597 ]
> [0.18447165]
> [0.20329727]
> [0.24308157]
> [0.25030023]]
> _____
> saved model path: /Users/jzhao/bentoml/repository/SentimentClassifierService/20200604214004_F641D2

If you'd like to save the packaged API, copy the contents of the saved model path into /bento_deploy:

cp -r /Users/jzhao/bentoml/repository/SentimentClassifierService/20200604214004_F641D2/* bento_deploy
# or whatever the autogenerated path is

There are a few dependency nuances to be aware of before building the actual Docker image. To make sure the build doesn't error out, edit bento_deploy/requirements.txt so that it reads

tensorflow==2.1.0
sklearn
bentoml==0.7.8

Then, we can build and run the image as follows

docker build -t bento-classifier:latest .
docker run -p 5000:5000 bento-classifier:latest

Then, visit localhost:5000 to see the BentoML server!


Simple deep LSTM architecture

> model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, 100, 64)           320000
_________________________________________________________________
lstm (LSTM)                  (None, 100, 64)           33024
_________________________________________________________________
dropout (Dropout)            (None, 100, 64)           0
_________________________________________________________________
lstm_1 (LSTM)                (None, 64)                33024
_________________________________________________________________
FC1 (Dense)                  (None, 256)               16640
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0
_________________________________________________________________
out_layer (Dense)            (None, 1)                 257
_________________________________________________________________
activation (Activation)      (None, 1)                 0
=================================================================
Total params: 402,945
Trainable params: 402,945
Non-trainable params: 0
_________________________________________________________________
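The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for embedding, LSTM, and dense layers; the vocabulary size of 5,000 and the width of 64 are read off the summary and the preprocessing notes, and the helper names are illustrative only.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units


layers = {
    "embedding": embedding_params(5000, 64),
    "lstm": lstm_params(64, 64),
    "lstm_1": lstm_params(64, 64),
    "FC1": dense_params(64, 256),
    "out_layer": dense_params(256, 1),
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name:<10} {count:>8,}")
print(f"{'total':<10} {total:>8,}")
```

The dropout and activation layers have no weights, which is why they show 0 parameters in the summary.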

Data and training process

  • 85% / 15% train-test split
  • dataset is balanced (25k positive, 25k negative)
  • RMSprop with a learning rate of 1e-3 and early stopping with a patience of 2 epochs
  • preprocessing
    • converted to lowercase
    • removed punctuation
    • removed <br /> tags
    • tokenized with vocab size of 5k
    • max sequence length of 100
  • achieved 82.2% accuracy
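The preprocessing steps listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in train.py: the function names are hypothetical, index 0 is assumed to be reserved for padding, out-of-vocabulary words are dropped, and sequences are truncated and padded at the end (the project's tokenizer may pad on the other side).

```python
import re
import string
from collections import Counter

VOCAB_SIZE = 5000  # "vocab size of 5k"
MAX_LEN = 100      # "max sequence length of 100"


def clean(text):
    """Lowercase, replace <br /> tags with spaces, strip punctuation."""
    text = text.lower()
    text = re.sub(r"<br\s*/?>", " ", text)  # must run before punctuation removal
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())  # collapse repeated whitespace


def build_vocab(texts, vocab_size=VOCAB_SIZE):
    """Map the most frequent words to indices 1..vocab_size-1 (0 = padding)."""
    counts = Counter(word for t in texts for word in clean(t).split())
    most_common = counts.most_common(vocab_size - 1)
    return {word: i for i, (word, _) in enumerate(most_common, start=1)}


def encode(text, vocab, max_len=MAX_LEN):
    """Convert text to a fixed-length list of word indices."""
    ids = [vocab[w] for w in clean(text).split() if w in vocab]
    ids = ids[:max_len]                       # truncate long reviews
    return ids + [0] * (max_len - len(ids))   # pad short ones with zeros
```

For example, clean("Great movie!<br />Loved it.") yields "great movie loved it", and every encoded review has exactly MAX_LEN entries, matching the (None, 100) input shape the embedding layer expects.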
