Emotion Recognition - NLP Final Project

Introduction

This repository contains the implementation of the emotion recognition model presented in our paper, created as the final project of the NLP course at TAU.

Our model combines BERT with VAD (valence, arousal, dominance) emotion representations and is trained on the GoEmotions dataset. It can extract specific emotions from text, as well as their 3D representations in VAD space.
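The README does not spell out how a predicted VAD point becomes a discrete emotion. A natural reading of the "1NN" experiments listed under Usage is a nearest-neighbour lookup: each emotion has a reference point in VAD space, and a prediction is assigned the emotion whose point is closest. The following is a minimal sketch under that assumption; `nearest_emotion` is a hypothetical helper, and the coordinates in `example_lexicon` are made-up placeholders, not the VAD values used in the paper.

```python
import math
from typing import Dict, Sequence, Tuple

# A point in VAD space: (valence, arousal, dominance)
VAD = Tuple[float, float, float]


def nearest_emotion(pred: Sequence[float], emotion_vad: Dict[str, VAD]) -> str:
    """Hypothetical 1-nearest-neighbour lookup: return the emotion whose
    reference VAD point has the smallest Euclidean distance to `pred`."""
    return min(emotion_vad, key=lambda name: math.dist(pred, emotion_vad[name]))


# Placeholder coordinates for illustration only (NOT taken from the paper)
example_lexicon = {
    "joy": (0.9, 0.6, 0.7),
    "sadness": (0.1, 0.3, 0.2),
    "anger": (0.1, 0.9, 0.6),
}

print(nearest_emotion((0.8, 0.5, 0.6), example_lexicon))  # joy
```

This returns a single label per input; since GoEmotions is multi-label, a full pipeline might instead keep every emotion within some distance of the prediction.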

Model Architecture

[Figure: model architecture diagram (image not included in this text)]

Paper

Link to the paper

Usage

To train and evaluate a model, simply execute:

$ python main.py --config {my-config-file.json}

where my-config-file.json is a configuration file located in ./config.
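Several configurations can be run back to back with a small driver around the command above. This is a minimal sketch, assuming main.py is in the working directory and that --config accepts the paths exactly as they appear in the table below; `PAPER_CONFIGS`, `build_command` and `run_experiments` are hypothetical names, not part of the repository. With `dry_run=True` it only prints the commands it would run.

```python
import subprocess
import sys
from typing import List

# Config files from the experiment table (assumed to be valid --config values)
PAPER_CONFIGS = [
    "paper_configs/ge_baseline.json",
    "paper_configs/mae.json",
    "paper_configs/mse.json",
    "paper_configs/mae_ce.json",
    "paper_configs/mae_scaled.json",
]


def build_command(config: str) -> List[str]:
    """Build the argument vector for one train/evaluate run."""
    return [sys.executable, "main.py", "--config", config]


def run_experiments(configs: List[str], dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute one main.py invocation per config."""
    commands = [build_command(c) for c in configs]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first experiment that exits with an error
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_experiments(PAPER_CONFIGS, dry_run=True)
```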

  • To reproduce the paper's experiments, use the following configurations:

    Experiment                          Config
    Classification model (baseline)     paper_configs/ge_baseline.json
    MAE, 1NN                            paper_configs/mae.json
    MAE, SVM                            paper_configs/mae.json
    MSE, 1NN                            paper_configs/mse.json
    MAE+CE, 1NN                         paper_configs/mae_ce.json
    MAE, VAD scaled, 1NN                paper_configs/mae_scaled.json
    MAE, VAD scaled, SVM                paper_configs/mae_scaled.json
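The loss names in the table refer to standard objectives. To illustrate how they differ on a VAD regression target, here is a plain-Python sketch. The softmax cross-entropy over a single gold class and the `alpha` weighting in `mae_ce` are assumptions for illustration only: the README does not say how the MAE and CE terms are combined, and since GoEmotions is multi-label the actual CE term may be a per-label binary cross-entropy.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and gold VAD vectors."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error; penalises large VAD deviations more than MAE does."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def cross_entropy(logits: Sequence[float], gold: int) -> float:
    """Softmax cross-entropy for one gold class index (assumed formulation)."""
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[gold]


def mae_ce(pred_vad: Sequence[float], gold_vad: Sequence[float],
           logits: Sequence[float], gold: int, alpha: float = 0.5) -> float:
    """Hypothetical weighted combination: alpha * MAE + (1 - alpha) * CE."""
    return alpha * mae(pred_vad, gold_vad) + (1 - alpha) * cross_entropy(logits, gold)
```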

Pipelines Usage

  • To classify your own text by running the model as part of a full pipeline, use the single_example_pipeline.py script (you only need to change the constants that describe the model).
  • To train and evaluate on full datasets in the different ways presented in the paper, use the zeroshot_pipelines.py script.

Directories and Files

  • config: all JSON configurations of the experiments we ran for the paper, including the baseline.
  • notebooks: all notebooks written during the research.
  • src: the source code, based on the PyTorch implementation of the GoEmotions baseline.
    • data_processing: package for preprocessing the various datasets; data_loader.py defines the core data processor for our datasets.
    • models: package defining each model we used.
    • train_eval_run: package holding the training and evaluation logic, plus run_model.py, which glues the two together.
    • main.py: CLI entry point to train / evaluate the models.
  • data: all the data and mappings used in this project.
    • goemotions:
      • emotions.txt: list of emotions, one per line; the line number of each emotion is its index in the GoEmotions dataset.
      • train.tsv, dev.tsv, test.tsv: the GoEmotions train/dev/test split, as defined in the original GoEmotions paper.
    • additional datasets we experimented with.
  • paper-stuff: visualizations of the model and of the VAD spaces used in the paper.
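A minimal sketch of reading the files under data/goemotions, assuming the layout of the original GoEmotions release: emotions.txt holds one emotion name per line (line i, counted from 0, is label id i), and each TSV row has three tab-separated columns with no header (comment text, comma-separated label ids, comment id). The function names are hypothetical; the repository's own loader is data_loader.py.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# One parsed example: (comment text, label ids, comment id)
Example = Tuple[str, List[int], str]


def parse_emotions(lines: Iterable[str]) -> Dict[int, str]:
    """Map label id -> emotion name, using the 0-based line number as the id."""
    return {i: line.strip() for i, line in enumerate(lines) if line.strip()}


def parse_split(lines: Iterable[str]) -> List[Example]:
    """Parse GoEmotions-style TSV rows: text, comma-separated ids, comment id."""
    examples = []
    # QUOTE_NONE: quote characters inside comments are literal text, not CSV quoting
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:  # skip blank lines
            continue
        text, labels, comment_id = row
        examples.append((text, [int(x) for x in labels.split(",")], comment_id))
    return examples


def load_emotions(path: Path) -> Dict[int, str]:
    """Read emotions.txt from disk."""
    return parse_emotions(path.read_text(encoding="utf-8").splitlines())


def load_split(path: Path) -> List[Example]:
    """Read one of train.tsv / dev.tsv / test.tsv from disk."""
    with path.open(encoding="utf-8", newline="") as f:
        return parse_split(f)
```

For example, `load_emotions(Path("data/goemotions/emotions.txt"))` gives the mapping needed to turn the label ids returned by `load_split(Path("data/goemotions/dev.tsv"))` back into emotion names.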

Requirements:

Python 3.8 with the following dependencies:

  • torch==1.4.0
  • transformers==2.11.0
  • datasets
  • attrdict==2.0.1
  • pandas
  • tensorboard
  • scikit-learn (imported as sklearn)
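Because several of these pins are old, it can help to check an existing environment against them before running anything. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); `PINS` and `find_mismatches` are hypothetical names, and only the packages pinned to an exact version above are checked.

```python
from importlib import metadata
from typing import Callable, Dict, List

# Exact pins from the requirements list; unpinned packages are not checked
PINS = {"torch": "1.4.0", "transformers": "2.11.0", "attrdict": "2.0.1"}


def find_mismatches(pins: Dict[str, str],
                    get_version: Callable[[str], str] = metadata.version) -> List[str]:
    """Return one message per pinned package that is missing or at another version."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    for message in find_mismatches(PINS):
        print(message)
```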

Acknowledgements

  • The original dataset (GoEmotions) and its baseline implementation can be found in the Google Research repo.
  • The core implementation of our model is loosely based on the PyTorch GoEmotions baseline implementation by @monologg, in this repo.
  • We made extensive use of the unified collection of emotion-related datasets in this repo.

Contributors

matanbt, shirfrenkel

Issues

Consider preprocessing comments

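import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time downloads: the stopword list and the tokenizer model used by
# word_tokenize (newer NLTK releases look for 'punkt_tab' instead of 'punkt')
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')

stemmer = PorterStemmer()
# a set gives constant-time membership checks
stopword_set = set(stopwords.words('english'))


def process_reddit_comment(strng):
    # remove the [NAME] placeholder GoEmotions uses for masked user names
    processed_strng = re.sub(r'\[NAME\]', '', strng)
    # remove subreddit prefixes such as "r/" and "/r/"
    processed_strng = re.sub(r'/?\br/', '', processed_strng)
    return processed_strng


def punct_remover(strng):
    # punctuation marks to be removed entirely
    clean_strng = re.sub(r'[?!\'"#|]', '', strng)
    # punctuation marks to be replaced with a space
    clean_strng = re.sub(r'[.,()/]', ' ', clean_strng)
    # collapse runs of spaces into a single space
    clean_strng = re.sub(r' +', ' ', clean_strng)
    return clean_strng


def tokenize_stem_no_stopwords(strng):
    # lowercase before the lookup, since the NLTK stopword list is lowercase
    return [stemmer.stem(w) for w in word_tokenize(strng)
            if w.lower() not in stopword_set]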
