matanbt / emotion-recognition-nlp-project

Emotion Recognition model based on BERT and VAD emotion representation.

Emotion Recognition - NLP Final Project

Introduction

This repository implements the emotion recognition model we present in the paper. It was created as the final project of the NLP course at TAU.

Our model combines BERT with VAD (valence, arousal, dominance) emotion representations, and is trained for emotion detection on the GoEmotions dataset. The model can extract specific emotions from text, as well as their 3D representations (in VAD space).
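To make the link between VAD points and emotion labels concrete, here is a minimal sketch of nearest-neighbor (1NN) labeling in VAD space, the kind of step the "1NN" experiments below suggest. The coordinates in `EMOTION_TO_VAD` and the function name `nearest_emotion` are hypothetical and chosen for illustration only; they are not the project's actual mapping or API.

```python
import math

# Hypothetical VAD coordinates (valence, arousal, dominance) in [0, 1].
# Illustrative values only; they are NOT taken from the project's mapping.
EMOTION_TO_VAD = {
    "joy": (0.95, 0.80, 0.75),
    "anger": (0.15, 0.85, 0.65),
    "sadness": (0.10, 0.30, 0.25),
    "neutral": (0.50, 0.20, 0.50),
}


def nearest_emotion(vad):
    """Return the emotion whose VAD point is closest (Euclidean) to `vad`."""
    return min(EMOTION_TO_VAD, key=lambda label: math.dist(EMOTION_TO_VAD[label], vad))


predicted_vad = (0.88, 0.70, 0.72)  # stand-in for a regression output
print(nearest_emotion(predicted_vad))
```

With a regression model that outputs a VAD vector, a lookup like this turns the continuous prediction back into a discrete emotion label.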

Model Architecture

Paper

Link to the paper

Usage

To train and evaluate a model, simply execute:

$ python main.py --config {my-config-file.json}

Here {my-config-file.json} stands for a configuration file located in ./config.

To reproduce the paper's experiments, use the configurations listed in the following table:

Experiment	Config
Classification model (baseline)	`paper_configs/ge_baseline.json`
MAE, 1NN	`paper_configs/mae.json`
MAE, SVM	`paper_configs/mae.json`
MSE, 1NN	`paper_configs/mse.json`
MAE+CE, 1NN	`paper_configs/mae_ce.json`
MAE, VAD scaled, 1NN	`paper_configs/mae_scaled.json`
MAE, VAD scaled, SVM	`paper_configs/mae_scaled.json`
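
As a convenience, the sketch below builds one `python main.py --config ...` command per distinct config file in the table. It only prints the commands and does not run them. Passing the paths exactly as written in the table is an assumption, and so is the idea that rows sharing a file (1NN and SVM) need a single run; how the 1NN or SVM variant is selected is not stated here. The helper name `build_commands` is hypothetical.

```python
import shlex

# Config paths as listed in the table; rows that share a file are listed once.
# Assumption: the paths are passed to --config exactly as written in the table.
PAPER_CONFIGS = [
    "paper_configs/ge_baseline.json",
    "paper_configs/mae.json",
    "paper_configs/mse.json",
    "paper_configs/mae_ce.json",
    "paper_configs/mae_scaled.json",
]


def build_commands(configs):
    """Return one argument list per config, in the form used under Usage."""
    return [["python", "main.py", "--config", path] for path in configs]


for command in build_commands(PAPER_CONFIGS):
    print(shlex.join(command))
```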

Pipelines Usage

To classify your own text by running the model as part of a full pipeline, use the single_example_pipeline.py script (you only need to change the constants that describe the model).
To train and evaluate on full datasets in the different ways presented in the paper, use the zeroshot_pipelines.py script.

Directories and Files

config: stores the JSON configurations of all the experiments we ran while writing the paper, including the baseline.
notebooks: all the notebooks we wrote during the research.
src: the code, based on a PyTorch implementation of the GoEmotions baseline.
- data_processing: package for preprocessing the different datasets. data_loader.py defines the core data processor for our datasets.
- models: package that defines each model we used.
- train_eval_run: package that holds the training and evaluation logic, along with run_model.py, which glues the two together.
- main.py: CLI entry point to train / evaluate the models.
data: all the data and mappings we use in this project.
- goemotions:
  - emotions.txt: list of emotions; each line number is the index of that emotion in the GoEmotions dataset.
  - train.tsv, dev.tsv, test.tsv: the GoEmotions dataset splits, as done by the paper.
- additional datasets we experimented with.
paper-stuff: visualizations of the model and of the VAD spaces we use in the paper.

Requirements:

Python 3.8 with the following dependencies:

torch==1.4.0
transformers==2.11.0
datasets
attrdict==2.0.1
pandas
tensorboard
scikit-learn

Acknowledgements

The original dataset (GoEmotions) and the baseline implementation can be found in the Google Research repo.
We loosely based the core implementation of the model on the PyTorch GoEmotions baseline implementation by @monologg, in this repo.
We made heavy use of the unified collection of emotion-related datasets in this repo.

emotion-recognition-nlp-project's Issues

Jonathan's idea: regression to pretrained embeddings of the labels

If successful --> Web GUI

We can fork this good-looking UI: https://github.com/nur-ag/emotion-ui.

Test `valence` dimension on IMDB reviews sentiment analysis

Switch [NAME] and [RELIGION] in GoEmotions examples with real words - might help our model
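
One possible way to try this idea is sketched below: each placeholder is replaced with a word sampled from a small pool. The pools in `REPLACEMENTS` and the helper `fill_placeholders` are hypothetical; the issue does not say which words to use or how to pick them.

```python
import random
import re

# Hypothetical replacement pools; which words to use is not specified anywhere.
REPLACEMENTS = {
    "[NAME]": ["Alex", "Dana", "Sam"],
    "[RELIGION]": ["Buddhism", "Judaism", "Islam"],
}

PLACEHOLDER = re.compile(r"\[(?:NAME|RELIGION)\]")


def fill_placeholders(text, rng):
    """Replace each placeholder occurrence with a word drawn from its pool."""
    return PLACEHOLDER.sub(lambda m: rng.choice(REPLACEMENTS[m.group(0)]), text)


rng = random.Random(0)  # seeded so that runs are repeatable
print(fill_placeholders("I think [NAME] converted to [RELIGION] last year.", rng))
```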

Linear-Regression from frozen BERT encoding as a baseline

Fine-tune with another dataset!

(meaning we add more training examples, ones that come with real VAD annotations!)

Find a template

Use the regression results as a “feature” (If regression result is not strong enough for labeling)

Data augmentation (generate more examples from the existing dataset)

Acknowledge / Address: we will train our regression model on a "sparse" space.

Solutions:

Add noise (to each VAD in our mapped GoEmotions)
Add special VAD examples from other datasets!
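
A minimal sketch of the noise idea, assuming independent Gaussian noise on each VAD dimension. The noise distribution, the `sigma` value, and the name `add_vad_noise` are all assumptions; the issue only says "add noise".

```python
import random


def add_vad_noise(vad, sigma, rng):
    """Return a copy of `vad` with independent Gaussian noise on each dimension."""
    return tuple(value + rng.gauss(0.0, sigma) for value in vad)


rng = random.Random(42)  # seeded so that runs are repeatable
base_vad = (0.95, 0.80, 0.75)  # hypothetical VAD target for one emotion
augmented = [add_vad_noise(base_vad, sigma=0.02, rng=rng) for _ in range(3)]
for point in augmented:
    print(point)
```

Spreading each emotion's single VAD point into a small cloud is one way to make the regression targets less sparse; whether the noisy values should be clipped to the valid VAD range is left open here.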

Multi-label classification: merging the VAD of several emotions

Search psychology papers for how the VAD of multiple emotions is viewed

Remove data["example_very_unclear"] (enhance the dataset)

zero shot classification task with another dataset

Add a penalty based on the NN result (meaning we'll have a new loss that weights the regression loss by the NN result)

For example: take the root of the regression loss iff the NN classification is wrong, or double the regression loss for a wrong classification, etc.
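
The "double the regression loss" variant could look roughly like the sketch below, written in plain Python for readability. The function name `penalized_mae`, its signature, and the choice of MAE as the regression loss are assumptions. The nearest-neighbor labels are taken as given inputs, so the penalty acts as a constant per-example weight rather than something gradients flow through.

```python
def penalized_mae(pred_vads, target_vads, pred_labels, true_labels, penalty=2.0):
    """Batch-mean MAE where an example's error is multiplied by `penalty`
    whenever its nearest-neighbor label differs from the true label."""
    total = 0.0
    for pred, target, p_label, t_label in zip(
        pred_vads, target_vads, pred_labels, true_labels
    ):
        # Per-example mean absolute error over the VAD dimensions.
        mae = sum(abs(p - t) for p, t in zip(pred, target)) / len(target)
        weight = penalty if p_label != t_label else 1.0
        total += weight * mae
    return total / len(target_vads)


loss = penalized_mae(
    pred_vads=[(0.5, 0.5, 0.5), (0.2, 0.2, 0.2)],
    target_vads=[(0.5, 0.5, 0.8), (0.2, 0.2, 0.5)],
    pred_labels=["joy", "anger"],      # hypothetical 1NN outputs
    true_labels=["joy", "sadness"],    # second example is misclassified
)
print(loss)
```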

Why not try RoBERTa as a base model?

EmoRoBERTa easily improved...
BUT we want to prove our point, so we should go that way only after adding our approach to the baseline.

Fine-tuning BERT with regression (our main model!)

An article for fine-tuning BERT with regression:
https://medium.com/@anthony.galtier/fine-tuning-bert-for-a-regression-task-is-a-description-enough-to-predict-a-propertys-list-price-cf97cd7cb98a
Fine-tuning BERT with Hugging Face's Trainer:
https://huggingface.co/docs/transformers/training
https://huggingface.co/course/chapter3/3?fw=pt

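import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time downloads: the stopword list, and the tokenizer data used by word_tokenize
# (newer NLTK releases may ask for 'punkt_tab' instead of 'punkt')
nltk.download('stopwords')
nltk.download('punkt')

stemmer = PorterStemmer()
stopword_list = stopwords.words('english')


def process_reddit_comment(strng):
    # remove the [NAME] placeholder (case-insensitive, so "[NAME]" and "[name]" both match)
    processed_strng = re.sub(r'\[name\]', '', strng, flags=re.IGNORECASE)
    # remove the reddit "/r" symbol
    processed_strng = re.sub('/r', '', processed_strng)
    return processed_strng


def punct_remover(strng):
    # punctuation marks to be completely removed
    # (inside a character class "|" is a literal pipe, so it is listed only once)
    clean_strng = re.sub(r'[?!\'"#|]', '', strng)
    # punctuation marks to be replaced with a space
    clean_strng = re.sub(r'[.,()/]', ' ', clean_strng)
    # collapse runs of spaces into a single space
    clean_strng = re.sub(r' +', ' ', clean_strng)
    return clean_strng


def tokenize_stem_no_stopwords(strng):
    # tokenize, drop English stopwords, and stem the remaining tokens
    return [stemmer.stem(w) for w in word_tokenize(strng) if w not in stopword_list]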

Side Evaluation: Evaluate on IMDB Reviews sentiment-analysis

By defining a threshold on the VAD prediction that separates positive from negative.
We should expect our model to reach at least 90% on the dev set, as this should be an easier task.
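
A small sketch of this thresholding idea, assuming the valence score lies in [0, 1] and using 0.5 as a placeholder threshold. Both the scale and the threshold are assumptions, as are the helper names; in practice the threshold would likely be tuned on a held-out set.

```python
def valence_to_sentiment(valence, threshold=0.5):
    """Map a valence score to a binary sentiment label."""
    return "positive" if valence >= threshold else "negative"


def accuracy(valences, gold_labels, threshold=0.5):
    """Fraction of examples whose thresholded valence matches the gold label."""
    predictions = [valence_to_sentiment(v, threshold) for v in valences]
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


# Hypothetical predicted valence scores and gold sentiment labels.
scores = [0.9, 0.2, 0.6]
gold = ["positive", "negative", "negative"]
print(accuracy(scores, gold))
```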