Online reputation analysis of several brands using transformers

This project uses two transformer-based NLP techniques, topic modeling and sentiment analysis, to analyze the online reputation of several brands (Apple, Tesla, Amazon, Google and Microsoft) from content published on X (Twitter) between 01-06-2019 and 01-01-2020. Topic modeling uses BERTopic, a model based on BERT and designed specifically for this task. Sentiment analysis uses a BERTweet model (based on RoBERTa) hosted on Hugging Face, which is suited to classifying the sentiment of English tweets.

The analysis methodology was as follows:

Data selection
Cleaning and pre-processing
Descriptive analysis of n-grams (unigrams, bigrams, trigrams) using the TF-IDF algorithm
Topic modeling
Sentiment analysis
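
To make the n-gram step more concrete, here is a minimal sketch of TF-IDF ranking written with the standard library only. It is not the project's code: the function names `ngrams` and `top_ngrams`, the tokenization rule and the sample tweets are all assumptions made for illustration. The weight is the raw term count multiplied by a smoothed inverse document frequency, summed over all documents, with no vector normalization.

```python
import math
import re
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams of a lowercased text as space-joined strings."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(docs, n=1, k=3):
    """Rank n-grams by their TF-IDF weight summed over all documents."""
    per_doc = [Counter(ngrams(doc, n)) for doc in docs]
    # Document frequency: number of documents that contain each n-gram.
    df = Counter(term for counts in per_doc for term in counts)
    n_docs = len(docs)
    totals = Counter()
    for counts in per_doc:
        for term, tf in counts.items():
            # Smoothed IDF: terms found in every document keep a positive weight.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            totals[term] += tf * idf
    return totals.most_common(k)


# Made-up sample tweets, used only to exercise the functions.
tweets = [
    "tesla stock is up today",
    "tesla stock price target raised",
    "new iphone battery life is great",
]
print(top_ngrams(tweets, n=2, k=1))
```

Calling `top_ngrams` with `n=1`, `n=2` and `n=3` gives the unigram, bigram and trigram rankings respectively. In practice a library vectorizer would likely replace this hand-written version.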

The repository contains the following files:

Descriptive data analysis: loads the initial data, filters it by time period and company, and analyzes, among other things, the distribution of content by company and the evolution of the number of tweets over time.

N-Grams Analysis Apple and Tesla: performs the n-gram analysis of the sets of tweets about Apple and Tesla, applying the TF-IDF algorithm to obtain the most relevant unigrams, bigrams and trigrams. The most frequent terms are also visualized as word clouds.

Amazon, Google and Microsoft N-Grams analysis: repeats the same n-gram analysis procedure for the Amazon-Google-Microsoft set.

Apple topic modeling: performs topic modeling with BERTopic to extract the most relevant topics about Apple, along with their optimal number. It also includes several of the model's built-in visualizations, such as the intertopic distance map, hierarchical clustering, the similarity matrix and the evolution of the topics over the time span.

Tesla topic modeling: repeats the same topic modeling procedure for the Tesla set.

Amazon-Google-Microsoft topic modeling: repeats the same topic modeling procedure for the Amazon-Google-Microsoft set.

Sentiment analysis: contains the sentiment analysis of the three sets using the BERTweet model. Each tweet is classified as positive (POS), negative (NEG) or neutral (NEU) and assigned the corresponding confidence score.

Sentiment Analysis - Graphs: contains the code used to plot the overall distribution and temporal evolution of sentiment across sets, the evolution of the model's confidence score, and the distribution and temporal evolution of sentiment for a set of relevant topics.
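
The sentiment step and the aggregation behind its graphs could look roughly like the sketch below. Everything named here is an assumption: this README does not name the exact checkpoint, so `ASSUMED_MODEL` is a guess at a BERTweet sentiment model on the Hugging Face Hub that returns POS/NEG/NEU labels, and `classify_tweets` and `monthly_sentiment` are hypothetical helper names. The classifier needs the `transformers` package and a model download, while the aggregation part runs with the standard library alone.

```python
from collections import Counter, defaultdict
from datetime import date

# Assumption: the README does not name the checkpoint; this Hub id is a guess.
ASSUMED_MODEL = "finiteautomata/bertweet-base-sentiment-analysis"


def classify_tweets(texts, model_name=ASSUMED_MODEL):
    """Return one (label, score) pair per text. Requires `transformers`."""
    from transformers import pipeline  # lazy import: heavy optional dependency

    classifier = pipeline("sentiment-analysis", model=model_name)
    results = classifier(list(texts), truncation=True)
    return [(r["label"], r["score"]) for r in results]


def monthly_sentiment(records):
    """Aggregate (date, label, score) records into per-month shares and mean score."""
    counts = defaultdict(Counter)
    score_sums = defaultdict(float)
    for day, label, score in records:
        month = day.strftime("%Y-%m")
        counts[month][label] += 1
        score_sums[month] += score
    summary = {}
    for month in sorted(counts):
        total = sum(counts[month].values())
        summary[month] = {
            # Share of each label among the tweets of that month.
            "shares": {lab: counts[month][lab] / total for lab in ("POS", "NEU", "NEG")},
            # Average confidence score of the model for that month.
            "mean_score": score_sums[month] / total,
        }
    return summary


# Made-up records standing in for classifier output joined with tweet dates.
records = [
    (date(2019, 6, 3), "POS", 0.9),
    (date(2019, 6, 10), "NEG", 0.7),
    (date(2019, 7, 1), "NEU", 0.8),
]
summary = monthly_sentiment(records)
print(summary["2019-06"]["shares"])
```

The per-month shares and mean scores are the kind of series a plotting library would then draw as the temporal evolution of sentiment and of the confidence score.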

The initial data has been extracted from the following Kaggle dataset: Tweets about the Top Companies from 2015 to 2020
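
As a rough illustration of the company and time filtering applied to this data, the sketch below works on rows already loaded into memory. The field names `company`, `date` and `text`, the helper names and the sample rows are hypothetical and do not reflect the dataset's actual schema. The window assumes the dates above are written day first (1 June 2019 to 1 January 2020) and treats both ends as inclusive.

```python
from collections import Counter
from datetime import date


def filter_tweets(rows, companies, start, end):
    """Keep rows about the given companies with start <= date <= end."""
    wanted = {name.lower() for name in companies}
    return [
        row for row in rows
        if row["company"].lower() in wanted and start <= row["date"] <= end
    ]


def tweets_per_company(rows):
    """Count how many rows mention each company."""
    return Counter(row["company"] for row in rows)


# Made-up rows; a real run would build them from the downloaded CSV files.
sample_rows = [
    {"company": "Apple", "date": date(2019, 7, 15), "text": "new iphone rumors"},
    {"company": "Tesla", "date": date(2019, 5, 30), "text": "model 3 deliveries"},
    {"company": "Tesla", "date": date(2019, 12, 31), "text": "year end rally"},
    {"company": "Netflix", "date": date(2019, 8, 1), "text": "new series announced"},
]

# Assumption: day-first reading of the period stated in this README.
selected = filter_tweets(sample_rows, ["Apple", "Tesla"], date(2019, 6, 1), date(2020, 1, 1))
print(tweets_per_company(selected))
```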