This project forked from ajayshewale/sentiment-analysis-of-text-data-tweets-

Sentiment-Analysis-of-Text-Data-Tweets-

This project addresses the problem of sentiment analysis on Twitter. The goal was to predict the sentiment of a given Twitter post using Python. Sentiment analysis can distinguish many different emotions attached to a text, but in this report only three major classes were considered: positive, negative and neutral. The training dataset was small (just over 5900 examples) and highly skewed, which made it considerably harder to build a good classifier. After creating a number of custom features, using bag-of-words representations and applying the Extreme Gradient Boosting algorithm, a classification accuracy of about 58% was achieved. Analysing public sentiment is useful for firms trying to gauge the response to their products in the market, for predicting political elections, and for predicting socioeconomic phenomena such as the stock exchange.
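As a rough illustration of the feature step described above, the following sketch builds bag-of-words counts and appends a few hand-crafted features using only the standard library. The tokenizer, the three custom features and the sample tweets are hypothetical and chosen for illustration; the features actually used in the notebook may differ.

```python
import re
from collections import Counter

# Simple word tokenizer (an assumption; the notebook may tokenize differently)
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(tweet):
    """Lowercase a tweet and split it into simple word tokens."""
    return TOKEN_RE.findall(tweet.lower())


def build_vocabulary(tweets, max_size=1000):
    """Map the most frequent tokens to column indices."""
    counts = Counter(tok for t in tweets for tok in tokenize(t))
    return {tok: i for i, (tok, _) in enumerate(counts.most_common(max_size))}


def custom_features(tweet):
    """Hypothetical hand-crafted features, for illustration only."""
    return [
        tweet.count("!"),  # exclamation marks
        tweet.count("#"),  # hashtag markers
        sum(1 for w in tweet.split() if len(w) > 1 and w.isupper()),  # shouted words
    ]


def featurize(tweet, vocab):
    """Return bag-of-words counts followed by the custom features."""
    bow = [0] * len(vocab)
    for tok in tokenize(tweet):
        if tok in vocab:
            bow[vocab[tok]] += 1
    return bow + custom_features(tweet)


# Made-up tweets standing in for the training data
tweets = ["I LOVE this phone!!", "worst service ever #fail", "this phone is ok"]
vocab = build_vocabulary(tweets)
rows = [featurize(t, vocab) for t in tweets]
```

Rows of this shape could then be passed, together with the labels, to a gradient boosting classifier such as `xgboost.XGBClassifier`.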

Dataset Information

We use and compare several methods for sentiment analysis on tweets. The training dataset is expected to be a CSV file with rows of the form tweet_id,sentiment,tweet, where tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet text enclosed in double quotes (""). Similarly, the test dataset is a CSV file with rows of the form tweet_id,tweet. Please note that CSV headers are not expected and should be removed from the training and test datasets. The input data consisted of two CSV files:

  • train.csv (5971 tweets)
  • test.csv (4000 tweets)
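A minimal sketch of reading the headerless training format described above and counting the labels, which is a quick way to see how skewed the classes are. The function name and the in-memory sample rows are made up for illustration; labels are kept as strings so the sketch does not assume a particular encoding.

```python
import csv
import io
from collections import Counter


def read_training_rows(handle):
    """Parse headerless rows of the form tweet_id,sentiment,tweet."""
    rows = []
    for tweet_id, sentiment, tweet in csv.reader(handle):
        rows.append((int(tweet_id), sentiment, tweet))
    return rows


# Made-up sample standing in for the real training file
sample = io.StringIO(
    '1,1,"great day, loving it"\n'
    '2,0,"this is awful"\n'
    '3,1,"so happy"\n'
)

rows = read_training_rows(sample)

# Count how many tweets carry each label to inspect the class balance
label_counts = Counter(sentiment for _, sentiment, _ in rows)
print(label_counts)
```

For a real file, the same function can be called on a handle from `open(path, newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting for the actual data.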

Requirements

There are some general library requirements for the project and some which are specific to individual methods. The general requirements are as follows.

  • Python 3.x
  • pandas
  • numpy
  • scikit-learn
  • scipy
  • nltk
  • regex
  • matplotlib
  • seaborn

The library requirements specific to some methods are:

  • xgboost for XGBoost.

Note: It is recommended to use the Anaconda distribution of Python, which already ships with many of these libraries preinstalled.
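One way to check whether the libraries listed above are available before opening the notebook is sketched below. The helper name is hypothetical, and the list uses import names, which can differ from install names (for example, scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the requirements listed above (scikit-learn -> sklearn)
REQUIRED = [
    "pandas", "numpy", "sklearn", "scipy", "nltk",
    "regex", "matplotlib", "seaborn", "xgboost",
]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```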

Information about files

  • data/train.csv: Training dataset.
  • data/test.csv: Testing dataset.
  • data/emoticons.txt: Emoticons Dataset.
  • Sentiment Analysis of Text Data.ipynb: Heart of the beast; contains all the functions, models and results from the project.
  • report.pdf: Report of the project.
  • proposal.pdf: Proposal of the project.
  • plots: Plots produced in the project.
  • Sentiment Analysis of Text Data.html: Jupyter notebook in HTML format.
  • result.csv: Predictions for the test data.

How to Run the project

  • For the sake of simplicity, I have put everything in a single Jupyter Notebook file. Follow the notebook from top to bottom; each cell is commented to help you understand the project steps.
