Sentiment-Analysis

Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially to determine whether the writer's attitude toward a specific subject is positive or negative.

Dataset

We used the IMDB dataset, in which each review is labelled 0 (negative) or 1 (positive). We also consulted several other resources to get an overview of how other people solve this problem.

How we built it

Bag of Words

Our first model is a bag-of-words (BoW) classifier. We implemented it as a class, BoWClassifier, that inherits from torch's nn.Module. We also implemented early stopping to avoid overfitting and to reduce the time spent training the model.
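As an illustration of the early stopping idea, here is a minimal sketch in plain Python. It is not the project's actual code: the EarlyStopping class, the patience and min_delta parameters, and the epochs_run helper are hypothetical names, and we assume the stopping criterion is the dev loss.

```python
class EarlyStopping:
    """Stop training when the dev loss has not improved for `patience` epochs.

    Hypothetical helper for illustration; the criterion (dev loss) and the
    default patience are assumptions, not values taken from the project.
    """

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, dev_loss):
        """Record one epoch's dev loss; return True if training should stop."""
        if dev_loss < self.best_loss - self.min_delta:
            self.best_loss = dev_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(dev_losses, patience=2):
    """Return how many epochs are run before early stopping triggers."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(dev_losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(dev_losses)
```

In a real training loop, stopper.step(dev_loss) would be called once per epoch after evaluating on the dev set, and the loop would break as soon as it returns True.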

Results

In the last epoch (the point at which early stopping ended training), the dev accuracy is 87.92% and the dev loss is 0.418.

Screenshot 2021-05-09 at 16 49 51

Nevertheless, there is a fairly large discrepancy between the dev and train loss and accuracy, which suggests that the model is somewhat overfit. Since we provided a lot of training data, we suspect that the problem is that this model is too simple. This is why we also trained an LSTM classifier.

Screenshot 2021-05-09 at 16 49 14

Challenges we ran into

  • Hard to improve the accuracy
  • Memory issues

LSTM sequence classifier model with pretrained embeddings

For the second model we used sequence classification with an LSTM recurrent neural network. We again created a separate class, LSTMClassifier, which also inherits from nn.Module, and we tokenized the sentences using the Tokenizer class from Keras. We also introduced downsampling to speed up training and to check how it affects our neural network.
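The downsampling step could look roughly like the sketch below. This is an assumption about how it might be done, not the project's code: the downsample function, its per_class parameter, and the choice to keep the classes balanced are all hypothetical.

```python
import random
from collections import defaultdict


def downsample(texts, labels, per_class, seed=0):
    """Keep at most `per_class` randomly chosen examples for each label.

    Hypothetical helper: balanced sampling is an assumption made for this
    sketch, chosen so that the smaller dataset keeps both sentiments.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    sample = []
    for label, items in by_label.items():
        k = min(per_class, len(items))
        sample.extend((text, label) for text in rng.sample(items, k))

    rng.shuffle(sample)  # avoid all examples of one class coming first
    return [t for t, _ in sample], [l for _, l in sample]
```

For example, downsample(reviews, sentiments, per_class=5000) would return 10,000 reviews at most, half negative and half positive, provided each class has at least 5,000 examples.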

GloVe Embeddings

Global Vectors for Word Representation, or GloVe, is an “unsupervised learning algorithm for obtaining vector representations for words.” Training is performed on aggregated global word-word co-occurrence statistics from a corpus. It was developed at Stanford. We used the glove.6B.100d embeddings for the model.
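To show how pretrained vectors of this kind are typically wired into a model, here is a minimal sketch. GloVe text files store one word per line followed by its space-separated vector components. The load_glove and build_embedding_matrix helpers are hypothetical names, and we assume a word-to-index mapping that starts at 1 (as the word_index of the Keras Tokenizer does), with row 0 reserved for padding.

```python
def load_glove(lines):
    """Parse GloVe text lines ("word v1 v2 ...") into a word -> vector dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Build a matrix whose row i is the vector of the word with index i.

    Row 0 (padding) and words missing from GloVe stay all zeros.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix


# Usage with the real file (path is an assumption about where it is stored):
# with open("glove.6B.100d.txt", encoding="utf-8") as f:
#     vectors = load_glove(f)
# matrix = build_embedding_matrix(tokenizer.word_index, vectors, dim=100)
```

The resulting matrix can then be converted to a tensor and used to initialise the embedding layer of the classifier.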

Results

We reached only 46% dev accuracy with a dev loss of 0.824, which we consider very poor results.

Screenshot 2021-05-09 at 17 03 58

Screenshot 2021-05-09 at 17 04 16

Test accuracy: 51.5%

We decided to use other metrics to understand the poor results.

Precision

When evaluating the sentiment (positive, negative, neutral) of a given text document, the baseline precision lies around 80-85%. This is the baseline we try to meet or beat when training a sentiment scoring system. Our test precision is 75.9%, which is below this baseline.
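For reference, precision is the share of predicted positives that are actually positive, TP / (TP + FP). The sketch below computes it in plain Python; the precision function and the choice of label 1 as the positive class are assumptions made for illustration, not the project's evaluation code.

```python
def precision(y_true, y_pred, positive=1):
    """Return TP / (TP + FP) for the given positive label.

    Returns 0.0 when the model never predicts the positive class,
    which avoids a division by zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0
    return tp / (tp + fp)
```

Because precision ignores false negatives, it can be noticeably higher than accuracy when a model predicts the positive class only rarely.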

Confusion Matrix

A confusion matrix is a way of visualizing classification results. It shows whether the predictions match reality and, in more detail, how they match or differ for each class.

image

The confusion matrix helps us understand how many correct predictions the model makes.

image
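A confusion matrix for the two sentiment labels can be computed with a few lines of plain Python, as in the sketch below. The function names are hypothetical, and we assume the common convention that rows are true labels and columns are predicted labels.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Count predictions per (true label, predicted label) pair.

    matrix[i][j] is the number of examples whose true label is labels[i]
    and whose predicted label is labels[j].
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Correct predictions sit on the diagonal; divide by the total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0
```

With labels (0, 1), the top-left cell holds the true negatives, the bottom-right cell the true positives, and the off-diagonal cells the two kinds of errors.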

Challenges we ran into

  • The kernel stopped at random when using pandas
  • Memory issues
  • During the sanity check the notebook simply stopped because it required too much memory, so we commented that part out

What we learned

  • using Python libraries to create models
  • improving the accuracy by testing models with different parameters
  • solving an NLP problem from scratch
  • overcoming memory issues by decreasing the dataset or batch size

Contributors

airinb, sahejpalarneja
